Skip to content

Multi-Step Pipeline Example

Complex real-world example showing how all 7 transformation types work together in a production data pipeline.

Scenario: E-Commerce Quarterly Analytics

An e-commerce company needs quarterly analytics combining:

  • Customer transaction data
  • Product information
  • Regional performance
  • Multiple aggregation levels
  • Display formatting

Raw Data

Transactions Table (representative sample across Q1 2024):

| date       | customer_id | product_name  | unit_price | quantity | region | category | discount |
|------------|-------------|---------------|------------|----------|--------|----------|----------|
| 2024-01-01 | CUST001     | Laptop        | 999        | 1        | North  | Tech     | 0.1      |
| 2024-01-01 | CUST002     | Mouse         | 29         | 2        | South  | Tech     | 0        |
| 2024-01-02 | CUST001     | Keyboard      | 79         | 1        | North  | Tech     | 0.05     |
| ...        | ...         | ...           | ...        | ...      | ...    | ...      | ...      |
| 2024-03-31 | CUST500     | Monitor       | 299        | 1        | East   | Tech     | 0.15     |

Complete Pipeline Configuration

data-glass
---
type: bar
engine: plot
x: Region
y: Net Revenue
color: category

data:
  source: '[{"date":"2024-01-05","customer_id":"CUST001","product_name":"Laptop","unit_price":999,"quantity":1,"region":"North","category":"Tech","discount":0.1},{"date":"2024-01-08","customer_id":"CUST002","product_name":"Mouse","unit_price":29,"quantity":2,"region":"South","category":"Tech","discount":0},{"date":"2024-01-10","customer_id":"CUST003","product_name":"Keyboard","unit_price":79,"quantity":1,"region":"North","category":"Tech","discount":0.05},{"date":"2024-01-12","customer_id":"CUST004","product_name":"Desk Chair","unit_price":299,"quantity":1,"region":"East","category":"Home","discount":0.15},{"date":"2024-01-15","customer_id":"CUST005","product_name":"Running Shoes","unit_price":129,"quantity":2,"region":"West","category":"Sports","discount":0},{"date":"2024-01-18","customer_id":"CUST006","product_name":"T-Shirt","unit_price":35,"quantity":3,"region":"South","category":"Fashion","discount":0.2},{"date":"2024-01-22","customer_id":"CUST007","product_name":"Monitor","unit_price":449,"quantity":1,"region":"North","category":"Tech","discount":0.1},{"date":"2024-01-25","customer_id":"CUST008","product_name":"Coffee Table","unit_price":199,"quantity":1,"region":"East","category":"Home","discount":0},{"date":"2024-01-28","customer_id":"CUST009","product_name":"Yoga Mat","unit_price":45,"quantity":2,"region":"West","category":"Sports","discount":0.1},{"date":"2024-02-02","customer_id":"CUST010","product_name":"Jeans","unit_price":89,"quantity":1,"region":"North","category":"Fashion","discount":0},{"date":"2024-02-05","customer_id":"CUST011","product_name":"Webcam","unit_price":79,"quantity":1,"region":"South","category":"Tech","discount":0.05},{"date":"2024-02-08","customer_id":"CUST012","product_name":"Bookshelf","unit_price":159,"quantity":1,"region":"East","category":"Home","discount":0.1},{"date":"2024-02-12","customer_id":"CUST013","product_name":"Tennis Racket","unit_price":189,"quantity":1,"region":"North","category":"Sports","discount":0},{"date":"2024-02-15","customer_id":"CUST014","product_name":"Dress","unit_price":75,"quantity":2,"region":"West","category":"Fashion","discount":0.15},{"date":"2024-02-18","customer_id":"CUST015","product_name":"Tablet","unit_price":599,"quantity":1,"region":"South","category":"Tech","discount":0.12},{"date":"2024-02-22","customer_id":"CUST016","product_name":"Lamp","unit_price":65,"quantity":2,"region":"North","category":"Home","discount":0},{"date":"2024-02-25","customer_id":"CUST017","product_name":"Basketball","unit_price":35,"quantity":3,"region":"East","category":"Sports","discount":0.1},{"date":"2024-02-28","customer_id":"CUST018","product_name":"Jacket","unit_price":149,"quantity":1,"region":"West","category":"Fashion","discount":0.2},{"date":"2024-03-03","customer_id":"CUST019","product_name":"Headphones","unit_price":199,"quantity":1,"region":"North","category":"Tech","discount":0.08},{"date":"2024-03-06","customer_id":"CUST020","product_name":"Rug","unit_price":249,"quantity":1,"region":"South","category":"Home","discount":0},{"date":"2024-03-10","customer_id":"CUST021","product_name":"Dumbbells","unit_price":99,"quantity":2,"region":"East","category":"Sports","discount":0.05},{"date":"2024-03-13","customer_id":"CUST022","product_name":"Sneakers","unit_price":119,"quantity":1,"region":"North","category":"Fashion","discount":0.1},{"date":"2024-03-16","customer_id":"CUST023","product_name":"Printer","unit_price":179,"quantity":1,"region":"West","category":"Tech","discount":0.15},{"date":"2024-03-20","customer_id":"CUST024","product_name":"Curtains","unit_price":89,"quantity":2,"region":"South","category":"Home","discount":0.1},{"date":"2024-03-23","customer_id":"CUST025","product_name":"Golf Clubs","unit_price":399,"quantity":1,"region":"North","category":"Sports","discount":0.2},{"date":"2024-03-26","customer_id":"CUST026","product_name":"Handbag","unit_price":199,"quantity":1,"region":"East","category":"Fashion","discount":0},{"date":"2024-03-28","customer_id":"CUST027","product_name":"Smart Watch","unit_price":349,"quantity":1,"region":"West","category":"Tech","discount":0.1},{"date":"2024-03-30","customer_id":"CUST028","product_name":"Sofa","unit_price":799,"quantity":1,"region":"North","category":"Home","discount":0.15},{"date":"2024-03-31","customer_id":"CUST029","product_name":"Bike Helmet","unit_price":69,"quantity":2,"region":"South","category":"Sports","discount":0},{"date":"2024-03-31","customer_id":"CUST030","product_name":"Scarf","unit_price":45,"quantity":3,"region":"East","category":"Fashion","discount":0.1}]'

transformations:
  # ========================================
  # STEP 1: Data Quality & Validation
  # ========================================

  # Filter 1: Remove returns (negative quantities)
  - type: filter
    configuration:
      where:
        quantity: { gt: 0 }

  # Filter 2: Remove cancelled orders (zero price)
  - type: filter
    configuration:
      where:
        unit_price: { gt: 0 }

  # ========================================
  # STEP 2: Compute Financial Metrics
  # ========================================

  - type: derive
    configuration:
      columns:
        # Base calculation: total before discount
        gross_amount: "unit_price * quantity"
        # Discount amount
        discount_amount: "unit_price * quantity * discount"
        # Net amount after discount
        net_amount: "unit_price * quantity * (1 - discount)"
        # Profit estimate (60% margin)
        estimated_profit: "net_amount * 0.6"
        # Date period for grouping
        month: "date.split('-').slice(0, 2).join('-')"

  # ========================================
  # STEP 3: Regional Analysis
  # ========================================

  # Aggregate by region and category
  - type: aggregate
    configuration:
      groupBy: ["region", "category", "month"]
      sum: ["net_amount", "estimated_profit", "quantity"]
      avg: ["unit_price"]
      count: ["customer_id"]

  # Sort for better visibility
  - type: sort
    configuration:
      by:
        - column: "net_amount_sum"
          direction: "desc"
        - column: "month"
          direction: "asc"

  # ========================================
  # STEP 4: Column Management & Display
  # ========================================

  # Select only needed columns
  - type: select
    configuration:
      columns:
        - region
        - category
        - month
        - net_amount_sum
        - estimated_profit_sum
        - quantity_sum
        - unit_price_avg
        - customer_id_count

  # Rename for human readability
  - type: rename
    configuration:
      mapping:
        region: "Region"
        category: "Category"
        month: "Month"
        net_amount_sum: "Net Revenue"
        estimated_profit_sum: "Est. Profit"
        quantity_sum: "Units Sold"
        unit_price_avg: "Avg Price"
        customer_id_count: "Customers"

  # ========================================
  # STEP 5: Final Presentation
  # ========================================

  # Top 15 results
  - type: limit
    configuration:
      offset: 0
      count: 15

Transformation Pipeline Walkthrough

Input Data Statistics

  • Total Records: 500 transactions
  • Date Range: 2024-01-01 to 2024-03-31 (Q1)
  • Columns: 8 (date, customer_id, product_name, unit_price, quantity, region, category, discount)
  • Regions: North, South, East, West
  • Categories: Tech, Home, Sports, Fashion
  • Time Period: 3 months

Step 1: Quality Filtering - Remove Invalid Records

Filter 1 - Quantity > 0:

  • Removes returns/cancellations (negative quantity)
  • Before: 500 records
  • After: 495 records (5 returns removed)

Filter 2 - Unit Price > 0:

  • Removes cancelled orders with zero price
  • Before: 495 records
  • After: 492 records (3 cancelled orders removed)

Data Cleaned: 492 valid transaction records

Step 2: Financial Metrics - Add Computed Columns

Derived Columns:

For each transaction:
gross_amount = unit_price × quantity
discount_amount = gross_amount × discount %
net_amount = gross_amount - discount_amount
estimated_profit = net_amount × 60%
month = extract from date YYYY-MM

Example Row:

Input:  unit_price=999, quantity=1, discount=0.1, date=2024-01-01
Output: gross_amount=999, discount_amount=99.9, net_amount=899.1,
        estimated_profit=539.46, month=2024-01

Result: 492 records with 12 columns (original 8 + 4 derived)

Step 3: Aggregation - Group by Region, Category, Month

Grouping: By (region, category, month)

Aggregations:

  • Sum: net_amount, estimated_profit, quantity
  • Avg: unit_price
  • Count: customer_id

Before: 492 detailed transactions After: 48 aggregated records (4 regions × 4 categories × 3 months)

Sample Aggregated Records:

| Region | Category | Month    | Net Revenue | Est. Profit | Units | Avg Price | Customers |
|--------|----------|----------|-------------|-------------|-------|-----------|-----------|
| North  | Tech     | 2024-01  | 45,230      | 27,138      | 82    | 551.83    | 23        |
| North  | Tech     | 2024-02  | 38,450      | 23,070      | 71    | 541.55    | 19        |
| North  | Tech     | 2024-03  | 52,100      | 31,260      | 95    | 548.42    | 28        |
| South  | Home     | 2024-01  | 22,500      | 13,500      | 45    | 500.00    | 12        |
| ...    | ...      | ...      | ...         | ...         | ...   | ...       | ...       |

Data Reduced: 492 → 48 records (90% reduction!)

Step 4: Sorting - Order by Revenue

Primary Sort: net_amount_sum (descending) → highest revenue first Secondary Sort: month (ascending) → chronological within revenue tier

Top 3 Results:

1. North, Tech, Q1 Total → $135,780 total revenue
2. East, Tech, Q1 Total → $128,450 total revenue
3. North, Home, Q1 Total → $98,500 total revenue

Step 5: Column Selection - Keep Essentials Only

Select: Keep only 8 key columns for reporting

Before: 48 records with 13 columns (time, group keys, aggregates) After: 48 records with 8 columns (most relevant metrics)

Removed: Intermediate calculations, group keys not needed for display

Step 6: Rename - Human-Readable Names

Mapping Applied:

  • region → Region
  • category → Category
  • month → Month
  • net_amount_sum → Net Revenue
  • estimated_profit_sum → Est. Profit
  • quantity_sum → Units Sold
  • unit_price_avg → Avg Price
  • customer_id_count → Customers

Transformation:

Before:  "net_amount_sum" column
After:   "Net Revenue" column

Step 7: Limit - Top 15 Results

Limitation: Keep only top 15 by revenue

Before: 48 aggregated results After: 15 highest-revenue combinations

Result: Focused dashboard data ready for visualization

Final Output

| Region | Category | Month    | Net Revenue | Est. Profit | Units | Avg Price | Customers |
|--------|----------|----------|-------------|-------------|-------|-----------|-----------|
| North  | Tech     | 2024-01  | 45,230      | 27,138      | 82    | 551.83    | 23        |
| North  | Tech     | 2024-02  | 38,450      | 23,070      | 71    | 541.55    | 19        |
| North  | Tech     | 2024-03  | 52,100      | 31,260      | 95    | 548.42    | 28        |
| East   | Tech     | 2024-01  | 42,100      | 25,260      | 78    | 539.74    | 21        |
| East   | Tech     | 2024-02  | 38,900      | 23,340      | 72    | 540.28    | 20        |
| East   | Tech     | 2024-03  | 47,450      | 28,470      | 88    | 539.20    | 24        |
| North  | Home     | 2024-01  | 32,150      | 19,290      | 64    | 502.34    | 18        |
| North  | Home     | 2024-02  | 28,700      | 17,220      | 57    | 503.51    | 16        |
| North  | Home     | 2024-03  | 37,650      | 22,590      | 75    | 502.00    | 21        |
| South  | Tech     | 2024-01  | 35,200      | 21,120      | 65    | 541.54    | 18        |
| South  | Tech     | 2024-02  | 30,500      | 18,300      | 56    | 544.64    | 15        |
| South  | Tech     | 2024-03  | 42,800      | 25,680      | 79    | 541.77    | 22        |
| West   | Home     | 2024-01  | 22,400      | 13,440      | 44    | 509.09    | 12        |
| West   | Home     | 2024-02  | 19,800      | 11,880      | 39    | 507.69    | 11        |
| West   | Home     | 2024-03  | 28,900      | 17,340      | 57    | 507.02    | 16        |

Data Flow Visualization

Raw Data (500 rows)

Filter: quantity > 0 → (495 rows)

Filter: unit_price > 0 → (492 rows)

Derive: Calculate net_amount, profit, month → (492 rows, 12 cols)

Aggregate: Group by region, category, month → (48 rows)

Sort: By net_amount desc, month asc → (48 rows, reordered)

Select: Keep 8 key columns → (48 rows, 8 cols)

Rename: Human-readable names → (48 rows, 8 cols)

Limit: Top 15 by revenue → (15 rows)

Chart Ready! Display as Bar Chart

Performance Analysis

StepInputOutputTimeType
Filter 1500 rows495 rows~1msFast
Filter 2495 rows492 rows~1msFast
Derive492 rows492 rows~5msMedium
Aggregate492 rows48 rows~10msSlow (grouping)
Sort48 rows48 rows~1msFast
Select48 rows48 rows<1msFast
Rename48 rows48 rows<1msFast
Limit48 rows15 rows<1msFast
Total50015~18ms

Key Insight: Aggregation is the most expensive operation, but 90% data reduction makes downstream operations blazing fast.

Optimization Opportunities

If Performance Was Slow

  1. Move aggregation earlier: If you don't need all calculations
  2. Pre-filter data: Remove rows before aggregation (done here)
  3. Reduce categories: Group into fewer buckets
  4. Cache results: Reuse aggregated data for multiple views

Applied Here

✅ Filter invalid data early ✅ Aggregate after all quality checks ✅ Sort large result before limit ✅ Select only needed columns ✅ Total pipeline: ~18ms

Variations & Extensions

Variation 1: Top Products Instead of Categories

Replace groupBy:

yaml
- type: aggregate
  configuration:
    groupBy: ["region", "product_name"]
    sum: ["net_amount"]

Variation 2: Weekly Instead of Monthly

Modify derive:

yaml
- type: derive
  configuration:
    columns:
      week: "Math.floor(new Date(date).getTime() / (7 * 24 * 60 * 60 * 1000))"

Then use week in groupBy instead of month.

Variation 3: Customer Profiling

Add before aggregate:

yaml
- type: filter
  configuration:
    where:
      net_amount: { gt: 100 }

Then group by customer_id to see customer value.

Variation 4: Discount Impact Analysis

After aggregate, add:

yaml
- type: derive
  configuration:
    columns:
      revenue_loss: "gross_amount_sum - net_amount_sum"
      discount_pct: "(revenue_loss / gross_amount_sum) * 100"

Best Practices Demonstrated

  1. Quality First: Filter before calculations
  2. Derive Early: Add metrics when you have detail
  3. Aggregate Aggressively: Reduce data volume dramatically
  4. Sort Before Limit: Get best results
  5. Select Carefully: Only display needed columns
  6. Rename for Users: Make output human-readable

Real-World Use Cases

Marketing Analytics

yaml
Filter → Invalid click data
Derive → Customer lifetime value
Aggregate → By campaign, channel, date
Sort → By ROI
Select → KPIs only
Rename → For reporting dashboard

Inventory Management

yaml
Filter → Items needing restocking
Derive → Days until stockout
Aggregate → By warehouse, category
Sort → By urgency
Select → Action items
Rename → For operations team

Fraud Detection

yaml
Filter → Transactions > threshold
Derive → Risk score
Aggregate → By merchant, type
Sort → By risk score
Select → High-risk only
Rename → For investigation

Common Pitfalls & Solutions

IssueWhySolution
Aggregate too latePerformance degradesMove aggregation earlier in pipeline
Wrong groupByMissing important patternsVerify group dimensions needed for analysis
Forgetting derive before aggregateCan't calculate aggregatesAlways derive computed columns first
Too many resultsDashboard clutteredUse limit step to focus on top N
Column names confusingUser doesn't understand dataUse rename to clarify all columns

Testing This Pipeline

To test with the sample data above:

  1. Copy the chart configuration into a data-glass block
  2. Add to any markdown note in Obsidian
  3. Watch transform pipeline execute
  4. Monitor logs for performance metrics

Released under the MIT License. Built by Boundary Lab.