Data Transformation Pipeline Examples

This directory contains real-world examples of the data transformation pipeline in the Obsidian D3 plugin. Each example chains several transformations to prepare data for a chart.

1. Sales Dashboard

File: sales-dashboard.md

Shows how to combine multiple transformations to create a sales dashboard:

  • Derive: Calculate line totals (quantity × price)
  • Aggregate: Group by region and sum totals
  • Sort: Order by highest sales first
  • Select: Choose display columns and rename for UI

Use Case: Analyze regional sales performance
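
The four steps above might be configured roughly as follows. This is a sketch built from the configuration shapes in the Quick Reference below, not the contents of sales-dashboard.md. The column names (quantity, price, region, line_total) are hypothetical, and the output name line_total_sum assumes the same `<column>_sum` naming that the Chaining Transformations example uses.

```yaml
transformations:
  # Derive: line total per row (column names are hypothetical)
  - type: derive
    configuration:
      columns:
        line_total: "quantity * price"

  # Aggregate: total sales per region
  - type: aggregate
    configuration:
      groupBy: ["region"]
      sum: ["line_total"]

  # Sort: highest sales first (assumes the sum is exposed as line_total_sum)
  - type: sort
    configuration:
      by:
        - column: "line_total_sum"
          direction: "desc"

  # Select: keep the display columns and rename them for the UI
  - type: select
    configuration:
      columns: ["region", "line_total_sum"]
      rename:
        region: "Region"
        line_total_sum: "Total Sales"
```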

2. Time Series Analysis

File: time-series-analysis.md

Demonstrates time-based data analysis:

  • Derive: Calculate moving averages and growth rates
  • Filter: Focus on specific date ranges
  • Aggregate: Group by time periods (daily, weekly, monthly)
  • Sort: Order chronologically

Use Case: Track metrics over time, identify trends
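
A rough sketch of such a pipeline, under several assumptions that this document does not confirm: the data already has date, month, value, and previous_value columns; the gt operator compares ISO date strings correctly; and "asc" is accepted as the counterpart of the documented "desc". Moving averages are left out because no window or lag syntax is shown here. The filter comes first, in line with the Filter Early tip under Performance Tips.

```yaml
transformations:
  # Filter: keep rows after a start date (assumes gt works on ISO date strings)
  - type: filter
    configuration:
      where:
        date: { gt: "2024-01-01" }

  # Derive: growth rate in percent (assumes a previous_value column exists)
  - type: derive
    configuration:
      columns:
        growth_rate: "(value - previous_value) / previous_value * 100"

  # Aggregate: monthly totals (assumes a precomputed month column)
  - type: aggregate
    configuration:
      groupBy: ["month"]
      sum: ["value"]
      avg: ["growth_rate"]

  # Sort: chronological order ("asc" is an assumption)
  - type: sort
    configuration:
      by:
        - column: "month"
          direction: "asc"
```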

3. Data Cleaning

File: data-cleaning.md

Shows how to prepare messy data for visualization:

  • Filter: Remove nulls, invalid values, outliers
  • Select: Keep only needed columns
  • Rename: Standardize column names
  • Derive: Fix data inconsistencies

Use Case: Clean raw data before charting
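
As an illustration, the cleaning steps above could look like this for a hypothetical orders dataset. The column names (orderId, customerName, amountCents) are invented for the example. Null removal is not shown because this document does not list an operator for it.

```yaml
transformations:
  # Filter: drop rows whose amount is not positive (treated as invalid here)
  - type: filter
    configuration:
      where:
        amountCents: { gt: 0 }

  # Select: keep only the columns the chart needs
  - type: select
    configuration:
      columns: ["orderId", "customerName", "amountCents"]

  # Rename: standardize column names to snake_case
  - type: rename
    configuration:
      mapping:
        orderId: "order_id"
        customerName: "customer_name"
        amountCents: "amount_cents"

  # Derive: convert cents to a currency amount
  - type: derive
    configuration:
      columns:
        amount: "amount_cents / 100"
```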

4. Multi-Step Pipeline

File: multi-step-pipeline.md

A larger example that combines all transformation types:

  • Works through a real-world scenario in multiple steps
  • Shows transformation chaining
  • Demonstrates how the order of steps matters
  • Covers performance considerations

Use Case: End-to-end data processing workflows

Quick Reference: Transformation Types

Filter

Remove rows based on conditions

yaml
transformations:
  - type: filter
    configuration:
      where:
        status: { eq: "active" }

Aggregate

Group and summarize data

yaml
transformations:
  - type: aggregate
    configuration:
      groupBy: ["region"]
      sum: ["sales"]
      avg: ["price"]

Sort

Order data by columns

yaml
transformations:
  - type: sort
    configuration:
      by:
        - column: "sales"
          direction: "desc"

Limit

Take the first N rows, or paginate with an offset

yaml
transformations:
  - type: limit
    configuration:
      count: 10
      offset: 0
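
For pagination, a second page of ten rows might be requested as below. This assumes offset is the number of rows to skip before count rows are taken, which is the usual meaning but is not spelled out here.

```yaml
transformations:
  - type: limit
    configuration:
      count: 10   # page size
      offset: 10  # skip the first page (assumed semantics)
```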

Select

Choose and rename columns

yaml
transformations:
  - type: select
    configuration:
      columns: ["id", "name", "value"]
      rename:
        name: "Product Name"
        value: "Amount"

Rename

Rename columns for standardization

yaml
transformations:
  - type: rename
    configuration:
      mapping:
        firstName: "first_name"
        lastName: "last_name"

Derive

Add computed columns

yaml
transformations:
  - type: derive
    configuration:
      columns:
        total: "quantity * price"
        profit_margin: "(price - cost) / price * 100"

Chaining Transformations

Transformations execute in order. The output of one becomes the input to the next:

yaml
transformations:
  # Step 1: Add computed column
  - type: derive
    configuration:
      columns:
        total: "quantity * price"
  
  # Step 2: Filter for valid records
  - type: filter
    configuration:
      where:
        total: { gt: 0 }
  
  # Step 3: Group and aggregate
  - type: aggregate
    configuration:
      groupBy: ["category"]
      sum: ["total"]
  
  # Step 4: Sort by highest sales
  - type: sort
    configuration:
      by: [{ column: "total_sum", direction: "desc" }]
  
  # Step 5: Limit to top 10
  - type: limit
    configuration:
      count: 10
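
Because each step sees only the previous step's output, moving a step changes the result. As a sketch that reuses the columns from the example above, placing the filter after the aggregate makes it test the summed total of each category rather than individual rows. The threshold of 1000 is arbitrary.

```yaml
transformations:
  - type: derive
    configuration:
      columns:
        total: "quantity * price"

  - type: aggregate
    configuration:
      groupBy: ["category"]
      sum: ["total"]

  # Runs on one row per category, so it drops whole categories
  # whose summed total is not above 1000.
  - type: filter
    configuration:
      where:
        total_sum: { gt: 1000 }
```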

Performance Tips

  1. Filter Early: Remove unnecessary rows before aggregation
  2. Select Columns: Keep only the columns you need
  3. Aggregate Before Sort: Grouping reduces the number of rows left to sort
  4. Limit Last: Use limit as the final step for pagination
  5. Avoid Complex Derives: Keep expressions simple for performance
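
One ordering that follows tips 1 to 4 is sketched below. The status, region, and sales columns are hypothetical, and sales_sum assumes the `<column>_sum` naming from the chaining example.

```yaml
transformations:
  # 1. Filter early: drop rows before any grouping work
  - type: filter
    configuration:
      where:
        status: { eq: "active" }

  # 2. Select columns: carry forward only what later steps use
  - type: select
    configuration:
      columns: ["region", "sales"]

  # 3. Aggregate before sort: one row per region remains
  - type: aggregate
    configuration:
      groupBy: ["region"]
      sum: ["sales"]

  - type: sort
    configuration:
      by:
        - column: "sales_sum"
          direction: "desc"

  # 4. Limit last: trim the final, sorted result
  - type: limit
    configuration:
      count: 5
```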

Error Handling

  • Invalid expressions in derive produce null values instead of stopping the pipeline
  • Missing columns in select are silently skipped
  • A filter with no matches returns an empty result
  • The remaining transformations continue to run even if some fail
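
Going by the behaviors listed above, a pipeline like the following would be expected to still produce output. The malformed expression and the missing_column name are deliberate and purely illustrative. Whether a given malformed expression is reported anywhere is not covered here.

```yaml
transformations:
  - type: derive
    configuration:
      columns:
        total: "quantity * price"
        # Malformed on purpose: this column is expected to come back as null
        broken: "quantity * * price"

  - type: select
    configuration:
      # missing_column is not in the data and is expected to be skipped silently
      columns: ["id", "total", "broken", "missing_column"]
```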

Real-World Scenarios

See individual example files for:

  • Sales analytics dashboards
  • Time series trend analysis
  • Data quality improvements
  • API response formatting
  • Report generation pipelines

Released under the MIT License. Built by Boundary Lab.