Linear Regression Examples

This document demonstrates the linear regression capabilities of the Obsidian D3 plugin.

Example 1: Simple Linear Regression

data:
  source: '[{"age": 25, "salary": 45000}, {"age": 30, "salary": 55000}, {"age": 35, "salary": 65000}, {"age": 40, "salary": 75000}, {"age": 45, "salary": 85000}, {"age": 50, "salary": 95000}]'
type: scatter
engine: plot
x: age
y: salary
width: 600
height: 400
title: Age vs Salary (Simple Regression)
marks:
  - type: dot
    configuration:
      x: age
      y: salary
      fill: steelblue
  - type: regression
    configuration:
      x: age
      y: salary
      regression:
        confidence: 0.95
        showEquation: true
        showRSquared: true

Example 2: Regression with Confidence Bands

data:
  source: '[{"experience": 1, "productivity": 42}, {"experience": 2, "productivity": 48}, {"experience": 3, "productivity": 55}, {"experience": 4, "productivity": 58}, {"experience": 5, "productivity": 65}, {"experience": 6, "productivity": 68}, {"experience": 7, "productivity": 75}, {"experience": 8, "productivity": 78}, {"experience": 9, "productivity": 82}, {"experience": 10, "productivity": 88}]'
type: scatter
engine: plot
x: experience
y: productivity
width: 600
height: 400
title: Experience vs Productivity (with 95% Confidence Bands)
marks:
  - type: dot
    configuration:
      x: experience
      y: productivity
      fill: steelblue
  - type: regression
    configuration:
      x: experience
      y: productivity
      stroke: "#4a90e2"
      strokeWidth: 2
      bandFill: "#4a90e2"
      bandFillOpacity: 0.2
      regression:
        confidence: 0.95
        showConfidenceBand: true
        showEquation: true
        showRSquared: true
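
This page does not spell out how the band is computed. As a point of reference only, and as an assumption rather than the plugin's documented method, the textbook confidence band for the mean response of a simple linear fit at a point x_0 is shown below, with n observations, fitted value \hat{y}(x_0), and \alpha = 1 - confidence (so \alpha = 0.05 for confidence: 0.95):

```latex
\hat{y}(x_0) \;\pm\; t_{1-\alpha/2,\;n-2}\; s\,
\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}},
\qquad
s = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n-2}},
\qquad
S_{xx} = \sum_i (x_i - \bar{x})^2
```

The band is narrowest at x_0 = \bar{x} and widens toward the edges of the data. A prediction interval for a single new observation uses the same expression with an extra 1 added under the square root, so it is always wider than the confidence band.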

Example 3: Grouped Regression by Category

data:
  source: '[{"hours": 1, "score": 55, "method": "Online"}, {"hours": 2, "score": 60, "method": "Online"}, {"hours": 3, "score": 68, "method": "Online"}, {"hours": 4, "score": 72, "method": "Online"}, {"hours": 5, "score": 78, "method": "Online"}, {"hours": 1, "score": 58, "method": "In-Person"}, {"hours": 2, "score": 65, "method": "In-Person"}, {"hours": 3, "score": 74, "method": "In-Person"}, {"hours": 4, "score": 82, "method": "In-Person"}, {"hours": 5, "score": 90, "method": "In-Person"}]'
type: scatter
engine: plot
x: hours
y: score
color: method
width: 600
height: 400
title: Study Hours vs Test Score (by Learning Method)
marks:
  - type: dot
    configuration:
      x: hours
      y: score
      fill: method
  - type: regression
    configuration:
      x: hours
      y: score
      regression:
        groupBy: method
        confidence: 0.95
        showEquation: true

Example 4: Regression with Outlier Removal

data:
  source: '[{"x": 1, "y": 10}, {"x": 2, "y": 20}, {"x": 3, "y": 30}, {"x": 4, "y": 40}, {"x": 5, "y": 50}, {"x": 6, "y": 60}, {"x": 7, "y": 25}, {"x": 8, "y": 80}, {"x": 9, "y": 90}, {"x": 10, "y": 100}]'
type: scatter
engine: plot
x: x
y: y
width: 600
height: 400
title: Data with Outlier (x=7, y=25)
marks:
  - type: dot
    configuration:
      x: x
      y: y
      fill: steelblue
  - type: regression
    configuration:
      x: x
      y: y
      regression:
        removeOutliers: true
        outlierThreshold: 1.5
        showEquation: true
        showRSquared: true

Example 5: Extended Domain Regression

data:
  source: '[{"temperature": 20, "sales": 80}, {"temperature": 25, "sales": 120}, {"temperature": 30, "sales": 160}, {"temperature": 35, "sales": 200}]'
type: scatter
engine: plot
x: temperature
y: sales
width: 600
height: 400
title: Temperature vs Ice Cream Sales (Extended Domain)
marks:
  - type: dot
    configuration:
      x: temperature
      y: sales
      fill: orange
  - type: regression
    configuration:
      x: temperature
      y: sales
      regression:
        extendToDomain: true
        domain:
          x: [15, 40]
        showEquation: true
        showRSquared: true
        showConfidenceBand: true
        confidence: 0.90

Example 6: Real-World Example - Housing Prices

data:
  source: '[{"sqft": 1000, "price": 180000}, {"sqft": 1200, "price": 210000}, {"sqft": 1400, "price": 245000}, {"sqft": 1600, "price": 270000}, {"sqft": 1800, "price": 305000}, {"sqft": 2000, "price": 340000}, {"sqft": 2200, "price": 365000}, {"sqft": 2400, "price": 400000}, {"sqft": 2600, "price": 430000}, {"sqft": 2800, "price": 465000}]'
type: scatter
engine: plot
x: sqft
y: price
width: 700
height: 500
title: House Price vs Square Footage
scales:
  x:
    label: Square Feet
  y:
    label: Price ($)
marks:
  - type: dot
    configuration:
      x: sqft
      y: price
      fill: "#27ae60"
  - type: regression
    configuration:
      x: sqft
      y: price
      stroke: "#2ecc71"
      strokeWidth: 3
      bandFill: "#2ecc71"
      bandFillOpacity: 0.15
      regression:
        confidence: 0.95
        showConfidenceBand: true
        showEquation: true
        showRSquared: true

Regression Configuration Options

The regression mark accepts the following configuration options, as used in the examples above:

Basic Options

  • confidence: Confidence level (0.90, 0.95, 0.99)
  • showConfidenceBand: Display confidence interval bands
  • removeOutliers: Automatic outlier detection and removal
  • outlierThreshold: IQR multiplier for outlier detection (default: 1.5)
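
To make the IQR rule concrete, here is a minimal Python sketch using the data from Example 4. It assumes the rule is applied to the residuals of an initial fit; the page does not say whether the plugin filters residuals or raw values, and the function names fit_line and remove_outliers are hypothetical.

```python
from statistics import quantiles


def fit_line(points):
    """Ordinary least squares on (x, y) pairs; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def remove_outliers(points, threshold=1.5):
    """Drop points whose residual falls outside the IQR fences.

    Assumption: the fences are Q1 - threshold*IQR and Q3 + threshold*IQR,
    computed on residuals from a first-pass fit.
    """
    slope, intercept = fit_line(points)
    residuals = [y - (slope * x + intercept) for x, y in points]
    q1, _, q3 = quantiles(residuals, n=4)
    iqr = q3 - q1
    low, high = q1 - threshold * iqr, q3 + threshold * iqr
    return [p for p, r in zip(points, residuals) if low <= r <= high]


# Data from Example 4; (7, 25) is the outlier.
points = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50),
          (6, 60), (7, 25), (8, 80), (9, 90), (10, 100)]

kept = remove_outliers(points, threshold=1.5)
slope, intercept = fit_line(kept)
print(len(kept), round(slope, 3))
```

Under these assumptions only the point (7, 25) is dropped, and the refit on the remaining nine points recovers the underlying line with slope 10. A larger outlierThreshold widens the fences and removes fewer points.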

Display Options

  • showEquation: Display regression equation
  • showRSquared: Display R² coefficient
  • extendToDomain: Extend regression line to plot boundaries
  • domain: Custom domain for extension {"x": [min, max]}
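
The effect of extendToDomain and showEquation can be sketched in Python with the data from Example 5. This illustrates the idea (evaluate the fitted line at the domain endpoints and format the equation) and is not the plugin's implementation; fit_line and the equation format string are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Data from Example 5: observed temperatures span 20 to 35.
temperature = [20, 25, 30, 35]
sales = [80, 120, 160, 200]
slope, intercept = fit_line(temperature, sales)

# Extending the line means evaluating it at the domain endpoints,
# here the custom domain x: [15, 40] from the example.
domain_x = (15, 40)
endpoints = [(x, slope * x + intercept) for x in domain_x]

# One possible text form for the displayed equation (assumed format).
sign = "+" if intercept >= 0 else "-"
equation = f"y = {slope:.2f}x {sign} {abs(intercept):.2f}"

print(endpoints)  # [(15, 40.0), (40, 240.0)]
print(equation)   # y = 8.00x - 80.00
```

The segments from 15 to 20 and from 35 to 40 are extrapolation: the line is drawn there, but no observations support it.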

Grouping

  • groupBy: Field name for grouped regressions
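
A grouped regression can be thought of as one independent fit per distinct value of the groupBy field. The Python sketch below uses the data from Example 3; grouped_fit and fit_line are hypothetical helper names, and the plugin may organize this differently.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def grouped_fit(rows, x, y, group_by):
    """Split rows by the group_by field and fit each group separately."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append((row[x], row[y]))
    return {key: fit_line(*zip(*pairs)) for key, pairs in groups.items()}


# Data from Example 3.
online = [55, 60, 68, 72, 78]
in_person = [58, 65, 74, 82, 90]
rows = (
    [{"hours": h, "score": s, "method": "Online"}
     for h, s in zip(range(1, 6), online)]
    + [{"hours": h, "score": s, "method": "In-Person"}
       for h, s in zip(range(1, 6), in_person)]
)

fits = grouped_fit(rows, x="hours", y="score", group_by="method")
for method, (slope, intercept) in fits.items():
    print(f"{method}: slope={slope:.2f}, intercept={intercept:.2f}")
```

For this data the In-Person group has the steeper slope (about 8.1 points per hour against about 5.8 for Online), which a single pooled fit would hide.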

Styling

  • lineStyle: Regression line styling
    • stroke: Line color
    • strokeWidth: Line width
    • strokeDasharray: Dash pattern
  • bandStyle: Confidence band styling
    • fill: Fill color
    • fillOpacity: Transparency (0-1)

Statistical Output

Each regression provides:

  • Slope: Rate of change (β coefficient)
  • Intercept: Y-intercept (α coefficient)
  • R²: Coefficient of determination (0-1, higher is better fit)
  • Equation: y = mx + b format
  • Standard Error: Measure of prediction accuracy
  • Confidence Bands: Confidence intervals around the fitted line at the specified confidence level
  • Residuals: Error terms for each observation
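
The quantities listed above follow from standard ordinary least squares formulas. The Python sketch below computes them for the data from Example 2; regression_summary is a hypothetical helper, and "standard error" is taken here to mean the residual standard error with n - 2 degrees of freedom, which is an assumption about what the plugin reports.

```python
import math


def regression_summary(xs, ys):
    """Slope, intercept, R², residual standard error, and residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    ss_res = sum(r * r for r in residuals)         # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in ys)    # total variation

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1 - ss_res / ss_tot,
        "std_error": math.sqrt(ss_res / (n - 2)),  # assumed definition
        "residuals": residuals,
    }


# Data from Example 2.
experience = list(range(1, 11))
productivity = [42, 48, 55, 58, 65, 68, 75, 78, 82, 88]

summary = regression_summary(experience, productivity)
print(f"y = {summary['slope']:.2f}x + {summary['intercept']:.2f}")
print(f"R² = {summary['r_squared']:.3f}")
```

For a least squares fit with an intercept, the residuals always sum to zero (up to floating-point error), which makes a quick sanity check on any implementation.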

Tips for Best Results

  1. Sample Size: Use at least 10-15 data points for reliable results
  2. Outliers: Enable outlier removal for noisy data
  3. Confidence Level: Use 95% for most applications, 99% for critical analysis
  4. Grouped Regression: Useful when data has distinct categories with different trends
  5. Domain Extension: Helps visualize predictions beyond observed data range
  6. Visual Inspection: Always plot data first to verify linear relationship assumption

Released under the MIT License. Built by Boundary Lab.