Linear Regression Examples
This document demonstrates the linear regression capabilities of the Obsidian D3 plugin.
Example 1: Simple Linear Regression
data:
  source: '[{"age": 25, "salary": 45000}, {"age": 30, "salary": 55000}, {"age": 35, "salary": 65000}, {"age": 40, "salary": 75000}, {"age": 45, "salary": 85000}, {"age": 50, "salary": 95000}]'
type: scatter
engine: plot
x: age
y: salary
width: 600
height: 400
title: Age vs Salary (Simple Regression)
marks:
  - type: dot
    configuration:
      x: age
      y: salary
      fill: steelblue
  - type: regression
    configuration:
      x: age
      y: salary
      regression:
        confidence: 0.95
        showEquation: true
        showRSquared: true

Example 2: Regression with Confidence Bands
data:
  source: '[{"experience": 1, "productivity": 42}, {"experience": 2, "productivity": 48}, {"experience": 3, "productivity": 55}, {"experience": 4, "productivity": 58}, {"experience": 5, "productivity": 65}, {"experience": 6, "productivity": 68}, {"experience": 7, "productivity": 75}, {"experience": 8, "productivity": 78}, {"experience": 9, "productivity": 82}, {"experience": 10, "productivity": 88}]'
type: scatter
engine: plot
x: experience
y: productivity
width: 600
height: 400
title: Experience vs Productivity (with 95% Confidence Bands)
marks:
  - type: dot
    configuration:
      x: experience
      y: productivity
      fill: steelblue
  - type: regression
    configuration:
      x: experience
      y: productivity
      stroke: "#4a90e2"
      strokeWidth: 2
      bandFill: "#4a90e2"
      bandFillOpacity: 0.2
      regression:
        confidence: 0.95
        showConfidenceBand: true
        showEquation: true
        showRSquared: true

Example 3: Grouped Regression by Category
data:
  source: '[{"hours": 1, "score": 55, "method": "Online"}, {"hours": 2, "score": 60, "method": "Online"}, {"hours": 3, "score": 68, "method": "Online"}, {"hours": 4, "score": 72, "method": "Online"}, {"hours": 5, "score": 78, "method": "Online"}, {"hours": 1, "score": 58, "method": "In-Person"}, {"hours": 2, "score": 65, "method": "In-Person"}, {"hours": 3, "score": 74, "method": "In-Person"}, {"hours": 4, "score": 82, "method": "In-Person"}, {"hours": 5, "score": 90, "method": "In-Person"}]'
type: scatter
engine: plot
x: hours
y: score
color: method
width: 600
height: 400
title: Study Hours vs Test Score (by Learning Method)
marks:
  - type: dot
    configuration:
      x: hours
      y: score
      fill: method
  - type: regression
    configuration:
      x: hours
      y: score
      regression:
        groupBy: method
        confidence: 0.95
        showEquation: true

Example 4: Regression with Outlier Removal
data:
  source: '[{"x": 1, "y": 10}, {"x": 2, "y": 20}, {"x": 3, "y": 30}, {"x": 4, "y": 40}, {"x": 5, "y": 50}, {"x": 6, "y": 60}, {"x": 7, "y": 25}, {"x": 8, "y": 80}, {"x": 9, "y": 90}, {"x": 10, "y": 100}]'
type: scatter
engine: plot
x: x
y: y
width: 600
height: 400
title: Data with Outlier (x=7, y=25)
marks:
  - type: dot
    configuration:
      x: x
      y: y
      fill: steelblue
  - type: regression
    configuration:
      x: x
      y: y
      regression:
        removeOutliers: true
        outlierThreshold: 1.5
        showEquation: true
        showRSquared: true

Example 5: Extended Domain Regression
data:
  source: '[{"temperature": 20, "sales": 80}, {"temperature": 25, "sales": 120}, {"temperature": 30, "sales": 160}, {"temperature": 35, "sales": 200}]'
type: scatter
engine: plot
x: temperature
y: sales
width: 600
height: 400
title: Temperature vs Ice Cream Sales (Extended Domain)
marks:
  - type: dot
    configuration:
      x: temperature
      y: sales
      fill: orange
  - type: regression
    configuration:
      x: temperature
      y: sales
      regression:
        extendToDomain: true
        domain:
          x: [15, 40]
        showEquation: true
        showRSquared: true
        showConfidenceBand: true
        confidence: 0.90

Example 6: Real-World Example - Housing Prices
data:
  source: '[{"sqft": 1000, "price": 180000}, {"sqft": 1200, "price": 210000}, {"sqft": 1400, "price": 245000}, {"sqft": 1600, "price": 270000}, {"sqft": 1800, "price": 305000}, {"sqft": 2000, "price": 340000}, {"sqft": 2200, "price": 365000}, {"sqft": 2400, "price": 400000}, {"sqft": 2600, "price": 430000}, {"sqft": 2800, "price": 465000}]'
type: scatter
engine: plot
x: sqft
y: price
width: 700
height: 500
title: House Price vs Square Footage
scales:
  x:
    label: Square Feet
  y:
    label: Price ($)
marks:
  - type: dot
    configuration:
      x: sqft
      y: price
      fill: "#27ae60"
  - type: regression
    configuration:
      x: sqft
      y: price
      stroke: "#2ecc71"
      strokeWidth: 3
      bandFill: "#2ecc71"
      bandFillOpacity: 0.15
      regression:
        confidence: 0.95
        showConfidenceBand: true
        showEquation: true
        showRSquared: true

Regression Configuration Options
All regression examples support these configuration options:
Basic Options
- confidence: Confidence level (0.90, 0.95, 0.99)
- showConfidenceBand: Display confidence interval bands
- removeOutliers: Automatic outlier detection and removal
- outlierThreshold: IQR multiplier for outlier detection (default: 1.5)
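To make the IQR multiplier concrete, here is a minimal Python sketch of one common way to apply it: fit a line, compute residuals, and drop points whose residual falls outside the quartiles by more than `threshold` times the IQR. This is an assumption for illustration only; the plugin may apply the rule to different values or compute quartiles differently. The helper names `fit_line` and `drop_residual_outliers` are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def drop_residual_outliers(xs, ys, threshold=1.5):
    """Keep points whose residual lies within threshold * IQR of the quartiles.

    Assumption: outliers are judged on residuals from an initial fit.
    """
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    q1, _median, q3 = statistics.quantiles(residuals, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - threshold * iqr, q3 + threshold * iqr
    return [(x, y) for x, y, r in zip(xs, ys, residuals) if low <= r <= high]


# Data from Example 4: y = 10x except for the point (7, 25).
xs = list(range(1, 11))
ys = [10, 20, 30, 40, 50, 60, 25, 80, 90, 100]

kept = drop_residual_outliers(xs, ys)
slope, intercept = fit_line([x for x, _ in kept], [y for _, y in kept])
print(kept)
print(slope, intercept)
```

On the Example 4 data the initial fit has a slope of about 9.18, only the point (7, 25) falls outside the fences, and the refit on the remaining nine points recovers y = 10x.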
Display Options
- showEquation: Display regression equation
- showRSquared: Display R² coefficient
- extendToDomain: Extend regression line to plot boundaries
- domain: Custom domain for extension, e.g. {"x": [min, max]}
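Extending the line to a custom domain amounts to evaluating the fitted equation at the domain endpoints rather than at the smallest and largest observed x. The Python sketch below illustrates this with the Example 5 data; `fit_line` is a hypothetical helper, not the plugin's code.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Data from Example 5: observed temperatures span 20 to 35.
temperatures = [20, 25, 30, 35]
sales = [80, 120, 160, 200]

slope, intercept = fit_line(temperatures, sales)

# Evaluate the line at the custom domain endpoints, both outside the data range.
domain_x = (15, 40)
endpoints = [(x, slope * x + intercept) for x in domain_x]
print(endpoints)
```

Here the fitted line is y = 8x - 80, so the extended segment runs from (15, 40) to (40, 240). Both endpoints are extrapolations beyond the observed temperatures.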
Grouping
- groupBy: Field name for grouped regressions
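Conceptually, a grouped regression partitions the rows by the value of the grouping field and fits one line per partition. The following Python sketch shows the idea with the Example 3 data; it is an illustration under that assumption, and `fit_line` is a hypothetical helper.

```python
from collections import defaultdict

rows = [
    {"hours": 1, "score": 55, "method": "Online"},
    {"hours": 2, "score": 60, "method": "Online"},
    {"hours": 3, "score": 68, "method": "Online"},
    {"hours": 4, "score": 72, "method": "Online"},
    {"hours": 5, "score": 78, "method": "Online"},
    {"hours": 1, "score": 58, "method": "In-Person"},
    {"hours": 2, "score": 65, "method": "In-Person"},
    {"hours": 3, "score": 74, "method": "In-Person"},
    {"hours": 4, "score": 82, "method": "In-Person"},
    {"hours": 5, "score": 90, "method": "In-Person"},
]


def fit_line(points):
    """Least-squares slope and intercept for a list of (x, y) pairs."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Partition rows by the grouping field, then fit each group separately.
groups = defaultdict(list)
for row in rows:
    groups[row["method"]].append((row["hours"], row["score"]))

fits = {name: fit_line(points) for name, points in groups.items()}
for name, (slope, intercept) in fits.items():
    print(f"{name}: slope={slope:.2f}, intercept={intercept:.2f}")
```

For this data the two groups have clearly different slopes (about 5.8 points per hour for Online and 8.1 for In-Person), which a single pooled line would hide.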
Styling
- lineStyle: Regression line styling
  - stroke: Line color
  - strokeWidth: Line width
  - strokeDasharray: Dash pattern
- bandStyle: Confidence band styling
  - fill: Fill color
  - fillOpacity: Opacity from 0 (transparent) to 1 (opaque)
Statistical Output
Each regression provides:
- Slope: Rate of change (β coefficient)
- Intercept: Y-intercept (α coefficient)
- R²: Coefficient of determination (0-1, higher is better fit)
- Equation: y = mx + b format
- Standard Error: Measure of prediction accuracy
- Confidence Bands: Interval around the fitted line at the specified confidence level
- Residuals: Error terms for each observation
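The quantities listed above can be reproduced by hand with ordinary least squares. The Python sketch below computes them for the Example 1 data. It uses the textbook formulas and takes "standard error" to mean the residual standard error with n - 2 degrees of freedom, which is an assumption about the plugin's definition; `fit_line` is a hypothetical helper.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b, with summary statistics."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    ss_res = sum(r ** 2 for r in residuals)       # unexplained variation
    ss_tot = sum((y - y_mean) ** 2 for y in ys)   # total variation
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1 - ss_res / ss_tot,
        "std_error": math.sqrt(ss_res / (n - 2)),  # residual standard error
        "residuals": residuals,
    }


# Data from Example 1.
ages = [25, 30, 35, 40, 45, 50]
salaries = [45000, 55000, 65000, 75000, 85000, 95000]

fit = fit_line(ages, salaries)
print(fit["slope"], fit["intercept"], fit["r_squared"], fit["std_error"])
```

The Example 1 data are exactly linear (salary = 2000 × age - 5000), so the slope is 2000, the intercept is -5000, R² is 1, and every residual is zero.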
Tips for Best Results
- Sample Size: Use at least 10-15 data points for reliable results
- Outliers: Enable outlier removal for noisy data
- Confidence Level: Use 95% for most applications, 99% for critical analysis
- Grouped Regression: Useful when data has distinct categories with different trends
- Domain Extension: Helps visualize predictions beyond observed data range
- Visual Inspection: Always plot data first to verify linear relationship assumption
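Several of these tips follow from the textbook formula for a confidence band around a fitted line. It is shown here as a reference point, on the assumption that the plugin's bands behave similarly; the plugin's exact computation is not documented here. For a confidence level of 1 - α and n data points:

```latex
\hat{y}(x) \;\pm\; t_{1-\alpha/2,\;n-2}\; s\,
\sqrt{\frac{1}{n} + \frac{(x-\bar{x})^{2}}{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}},
\qquad
s = \sqrt{\frac{\sum_{i=1}^{n}\bigl(y_i-\hat{y}(x_i)\bigr)^{2}}{n-2}}
```

The band narrows as n grows, widens as the confidence level rises (larger t), and widens as x moves away from the mean of the observed x values. The last effect is why predictions in an extended domain carry more uncertainty than those inside the data range.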