# Machine Learning Visualizations
This example demonstrates various Plot marks to visualize machine learning metrics, model performance, and training data using the dg-plot handle.
## Dataset
The example dataset contains 50 data points with the following metrics:
- Training and validation loss over epochs
- Feature importance scores
- Model predictions vs actual values
- Confusion matrix metrics
- Cross-validation performance
- Hyperparameter tuning results
## Learning Curves
Track training and validation loss over epochs:
```yaml
data:
  source: |
    [
      {"epoch": 1, "train_loss": 0.82, "val_loss": 0.85, "train_acc": 0.65, "val_acc": 0.63},
      {"epoch": 2, "train_loss": 0.75, "val_loss": 0.79, "train_acc": 0.71, "val_acc": 0.68},
      {"epoch": 3, "train_loss": 0.68, "val_loss": 0.74, "train_acc": 0.76, "val_acc": 0.72},
      {"epoch": 4, "train_loss": 0.61, "val_loss": 0.69, "train_acc": 0.80, "val_acc": 0.75},
      {"epoch": 5, "train_loss": 0.55, "val_loss": 0.65, "train_acc": 0.83, "val_acc": 0.77},
      {"epoch": 6, "train_loss": 0.49, "val_loss": 0.62, "train_acc": 0.86, "val_acc": 0.79},
      {"epoch": 7, "train_loss": 0.44, "val_loss": 0.60, "train_acc": 0.88, "val_acc": 0.80},
      {"epoch": 8, "train_loss": 0.40, "val_loss": 0.58, "train_acc": 0.90, "val_acc": 0.81},
      {"epoch": 9, "train_loss": 0.36, "val_loss": 0.57, "train_acc": 0.91, "val_acc": 0.82},
      {"epoch": 10, "train_loss": 0.33, "val_loss": 0.56, "train_acc": 0.92, "val_acc": 0.82}
    ]
marks:
  - type: line
    x: epoch
    y: train_loss
    stroke: steelblue
    strokeWidth: 2
  - type: line
    x: epoch
    y: val_loss
    stroke: coral
    strokeWidth: 2
    strokeDasharray: [4, 4]
  - type: dot
    x: epoch
    y: train_loss
    fill: steelblue
    r: 4
  - type: dot
    x: epoch
    y: val_loss
    fill: coral
    r: 4
  - type: crosshair
    x: epoch
    y: train_loss
    opacity: 0.4
    tip: true
grid: true
title: Learning Curves
style:
  fontSize: 12
  width: 1000
```

## Feature Importance
Visualize relative importance of model features:
```yaml
data:
  source: |
    [
      {"feature": "feature_1", "importance": 0.85, "std": 0.05, "category": "Primary"},
      {"feature": "feature_2", "importance": 0.72, "std": 0.06, "category": "Primary"},
      {"feature": "feature_3", "importance": 0.65, "std": 0.04, "category": "Secondary"},
      {"feature": "feature_4", "importance": 0.58, "std": 0.07, "category": "Secondary"},
      {"feature": "feature_5", "importance": 0.45, "std": 0.05, "category": "Secondary"},
      {"feature": "feature_6", "importance": 0.38, "std": 0.04, "category": "Tertiary"},
      {"feature": "feature_7", "importance": 0.32, "std": 0.06, "category": "Tertiary"},
      {"feature": "feature_8", "importance": 0.25, "std": 0.03, "category": "Tertiary"}
    ]
marks:
  - type: bar
    x: importance
    y: feature
    fill: category
    sort: y
  - type: rule
    x1: "d => d.importance - d.std"
    x2: "d => d.importance + d.std"
    y: feature
    stroke: currentColor
    strokeOpacity: 0.4
grid: true
title: Feature Importance
style:
  fontSize: 12
scales:
  color:
    type: ordinal
    scheme: tableau10
```

## Model Performance
Compare predicted vs actual values, with point size encoding prediction confidence:
```yaml
data:
  source: |
    [
      {"actual": 10.2, "predicted": 9.8, "confidence": 0.92, "group": "Group A"},
      {"actual": 15.7, "predicted": 16.1, "confidence": 0.88, "group": "Group A"},
      {"actual": 20.5, "predicted": 19.9, "confidence": 0.95, "group": "Group A"},
      {"actual": 25.3, "predicted": 26.0, "confidence": 0.91, "group": "Group A"},
      {"actual": 30.8, "predicted": 29.5, "confidence": 0.89, "group": "Group A"},
      {"actual": 12.4, "predicted": 11.9, "confidence": 0.87, "group": "Group B"},
      {"actual": 17.9, "predicted": 18.5, "confidence": 0.93, "group": "Group B"},
      {"actual": 22.6, "predicted": 21.8, "confidence": 0.90, "group": "Group B"},
      {"actual": 27.1, "predicted": 28.0, "confidence": 0.86, "group": "Group B"},
      {"actual": 32.5, "predicted": 31.2, "confidence": 0.94, "group": "Group B"}
    ]
marks:
  - type: line
    x: actual
    y: actual
    stroke: gray
    strokeDasharray: [2, 2]
    strokeOpacity: 0.3
  - type: dot
    x: actual
    y: predicted
    r: "d => 3 + d.confidence * 4"
    fill: steelblue
    fillOpacity: 0.5
    stroke: steelblue
    tip: true
  - type: crosshair
    x: actual
    y: predicted
    opacity: 0.4
    tip: true
grid: true
title: Model Performance (Predicted vs Actual)
style:
  fontSize: 12
scales:
  x:
    domain: [0, 40]
    label: Actual Values
  y:
    domain: [0, 40]
    label: Predicted Values
```

## Cross-validation Performance
Visualize model performance across different folds:
```yaml
data:
  source: |
    [
      {"fold": 1, "accuracy": 0.82, "precision": 0.80, "recall": 0.83, "f1": 0.81},
      {"fold": 2, "accuracy": 0.85, "precision": 0.83, "recall": 0.86, "f1": 0.84},
      {"fold": 3, "accuracy": 0.79, "precision": 0.78, "recall": 0.81, "f1": 0.79},
      {"fold": 4, "accuracy": 0.83, "precision": 0.82, "recall": 0.85, "f1": 0.83},
      {"fold": 5, "accuracy": 0.81, "precision": 0.79, "recall": 0.82, "f1": 0.80}
    ]
marks:
  - type: rect
    x: fold
    y1: 0
    y2: accuracy
    fill: steelblue
    fillOpacity: 0.3
  - type: rule
    x: fold
    y1: precision
    y2: recall
    stroke: coral
    strokeWidth: 2
  - type: dot
    x: fold
    y: f1
    fill: purple
    r: 6
  - type: text
    x: fold
    y: "d => d.f1 + 0.05"
    text: "d => d.f1.toFixed(2)"
    fontSize: 10
    textAnchor: middle
grid: true
```

## Hyperparameter Tuning
Visualize hyperparameter search results:
```yaml
data:
  source: |
    [
      {"learning_rate": 0.001, "batch_size": 32, "score": 0.75, "runtime": 120},
      {"learning_rate": 0.001, "batch_size": 64, "score": 0.78, "runtime": 100},
      {"learning_rate": 0.001, "batch_size": 128, "score": 0.76, "runtime": 80},
      {"learning_rate": 0.01, "batch_size": 32, "score": 0.82, "runtime": 110},
      {"learning_rate": 0.01, "batch_size": 64, "score": 0.85, "runtime": 90},
      {"learning_rate": 0.01, "batch_size": 128, "score": 0.83, "runtime": 70},
      {"learning_rate": 0.1, "batch_size": 32, "score": 0.79, "runtime": 100},
      {"learning_rate": 0.1, "batch_size": 64, "score": 0.81, "runtime": 80},
      {"learning_rate": 0.1, "batch_size": 128, "score": 0.80, "runtime": 60}
    ]
marks:
  - type: dot
    x: learning_rate
    y: score
    r: "d => d.runtime / 10"
    fill: "d => d.batch_size"
    fillOpacity: 0.6
    title: "d => `Batch Size: ${d.batch_size}\nRuntime: ${d.runtime}s`"
  - type: crosshair
    x: learning_rate
    y: score
    opacity: 0.4
    tip: true
grid: true
title: Hyperparameter Tuning (Learning Rate vs Score)
scales:
  x:
    type: log
    domain: [0.0001, 1]
    label: Learning Rate (log scale)
  y:
    domain: [0.7, 0.9]
    label: Model Score
  color:
    scheme: viridis
    legend: true
```

## Model Residuals
Analyze prediction residuals:
```yaml
data:
  source: |
    [
      {"predicted": 25.3, "residual": -2.1, "confidence": 0.85, "feature_val": 12.5},
      {"predicted": 28.7, "residual": 1.8, "confidence": 0.92, "feature_val": 15.8},
      {"predicted": 31.2, "residual": -1.5, "confidence": 0.88, "feature_val": 18.2},
      {"predicted": 35.8, "residual": 2.3, "confidence": 0.90, "feature_val": 22.5},
      {"predicted": 38.5, "residual": -1.9, "confidence": 0.87, "feature_val": 25.7},
      {"predicted": 42.1, "residual": 1.6, "confidence": 0.91, "feature_val": 28.9},
      {"predicted": 45.6, "residual": -2.4, "confidence": 0.86, "feature_val": 32.4},
      {"predicted": 48.9, "residual": 2.0, "confidence": 0.89, "feature_val": 35.8},
      {"predicted": 52.3, "residual": -1.7, "confidence": 0.93, "feature_val": 38.9},
      {"predicted": 55.8, "residual": 1.9, "confidence": 0.90, "feature_val": 42.3}
    ]
marks:
  - type: dot
    x: predicted
    y: residual
    r: "d => d.confidence * 8"
    fill: "d => Math.abs(d.residual)"
    fillOpacity: 0.6
  - type: rule
    x1: 20
    x2: 60
    y: 0
    stroke: currentColor
    strokeOpacity: 0.2
  - type: area
    x: predicted
    y1: "d => Math.min(0, d.residual)"
    y2: "d => Math.max(0, d.residual)"
    fill: "d => d.residual > 0 ? 'coral' : 'steelblue'"
    fillOpacity: 0.2
  - type: crosshair
    x: predicted
    y: residual
    opacity: 0.4
    tip: true
grid: true
scales:
  color:
    type: linear
    scheme: rdbu
    legend: true
```

## Training Progress
Detailed view of training metrics over time:
```yaml
data:
  source: |
    [
      {"step": 100, "loss": 0.85, "gradient_norm": 0.95, "learning_rate": 0.01},
      {"step": 200, "loss": 0.75, "gradient_norm": 0.82, "learning_rate": 0.01},
      {"step": 300, "loss": 0.68, "gradient_norm": 0.75, "learning_rate": 0.008},
      {"step": 400, "loss": 0.62, "gradient_norm": 0.68, "learning_rate": 0.008},
      {"step": 500, "loss": 0.55, "gradient_norm": 0.62, "learning_rate": 0.006},
      {"step": 600, "loss": 0.50, "gradient_norm": 0.58, "learning_rate": 0.006},
      {"step": 700, "loss": 0.45, "gradient_norm": 0.52, "learning_rate": 0.004},
      {"step": 800, "loss": 0.42, "gradient_norm": 0.48, "learning_rate": 0.004},
      {"step": 900, "loss": 0.38, "gradient_norm": 0.45, "learning_rate": 0.002},
      {"step": 1000, "loss": 0.35, "gradient_norm": 0.42, "learning_rate": 0.002}
    ]
marks:
  - type: line
    x: step
    y: loss
    stroke: steelblue
    strokeWidth: 2
  - type: area
    x: step
    y: loss
    fill: steelblue
    fillOpacity: 0.1
  - type: dot
    x: step
    y: gradient_norm
    r: "d => d.learning_rate * 1000"
    fill: coral
    fillOpacity: 0.6
  - type: rule
    x: step
    y1: loss
    y2: gradient_norm
    stroke: gray
    strokeOpacity: 0.2
  - type: text
    x: step
    y: "d => Math.max(d.loss, d.gradient_norm) + 0.1"
    text: "d => d.learning_rate.toFixed(3)"
    fontSize: 8
    textAnchor: middle
  - type: crosshair
    x: step
    y: loss
    opacity: 0.4
    tip: true
grid: true
```

## Usage Tips
Each plot demonstrates different mark types and their combinations:
- `line` for trend visualization
- `area` for filled regions
- `dot` for scatter points
- `rule` for error bars and reference lines
- `bar` for categorical comparisons
- `rect` for rectangular marks
- `text` for labels
- `crosshair` for interactive data exploration
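Several of these marks accept computed channels rather than plain field names. As an illustration of the arithmetic behind the `rule` mark's error bars in the feature-importance example, the endpoint accessors are ordinary JavaScript functions (the row values below are illustrative, not taken from the dataset):

```javascript
// Error-bar endpoints for a rule mark: importance ± std.
// The row shape mirrors the feature-importance example; the values are illustrative.
const row = { feature: "feature_1", importance: 0.75, std: 0.25 };

const x1 = (d) => d.importance - d.std; // left end of the error bar
const x2 = (d) => d.importance + d.std; // right end

console.log(x1(row), x2(row)); // 0.5 1
```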
Common styling options:
- Use `fillOpacity` and `strokeOpacity` for layering
- Set `grid: true` for reference lines
- Use `tip: true` with crosshair for tooltips
- Apply color scales with `scheme` for consistent palettes
Interactive features:
- Hover over points for tooltips
- Use crosshair marks for precise data reading
- Combine multiple marks for rich visualizations
Data handling:
- Use computed values: `"d => expression"`
- Format text with templates
- Scale values for visual encoding (e.g., radius)
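Each computed-value string is an ordinary JavaScript arrow function: it receives one datum `d` and returns the encoded value. A minimal sketch of how two accessors from the examples above behave (the sample datum here is hypothetical):

```javascript
// A sample datum shaped like the hyperparameter rows above (values are illustrative).
const d = { batch_size: 64, runtime: 90, confidence: 0.9 };

// Radius scaled from a confidence score, as in "d => d.confidence * 8".
const radius = (d) => d.confidence * 8;

// Template-formatted tooltip text, as in the hyperparameter example.
const tooltip = (d) => `Batch Size: ${d.batch_size}\nRuntime: ${d.runtime}s`;

console.log(radius(d)); // 7.2
console.log(tooltip(d)); // prints "Batch Size: 64" and "Runtime: 90s" on two lines
```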