Skip to content

Data Transformations

Transform your data before visualization. Filter, aggregate, sort, and derive new columns.


Quick Reference

TypePurposeExample
filterRemove rowswhere: { status: { eq: "active" } }
aggregateGroup and summarizegroupBy: ["region"], sum: ["sales"]
sortOrder rowsby: [{ column: "date", direction: "desc" }]
limitTake N rowscount: 10
selectChoose columnscolumns: ["name", "value"]
renameRename columnsmapping: { firstName: "first_name" }
deriveAdd computed columnscolumns: { total: "price * quantity" }
pivotReshape dataindex: "date", columns: "category"
samplingReduce datasetmethod: "random", size: 100

Basic Structure

Add transformations to any chart:

yaml
data:
  file: data/sales.csv
transformations:
  - type: filter
    configuration:
      where: { status: { eq: "active" } }
  - type: sort
    configuration:
      by: [{ column: "date", direction: "desc" }]

Transformations execute in order. Output of one becomes input to next.


Filter

Remove rows based on conditions.

yaml
transformations:
  - type: filter
    configuration:
      where:
        status: { eq: "active" }

Operators

OperatorMeaningExample
eqEquals{ status: { eq: "active" } }
neNot equals{ status: { ne: "deleted" } }
gtGreater than{ price: { gt: 100 } }
gteGreater or equal{ age: { gte: 18 } }
ltLess than{ stock: { lt: 10 } }
lteLess or equal{ score: { lte: 50 } }
inIn list{ category: { in: ["A", "B"] } }
containsContains string{ name: { contains: "Corp" } }
betweenIn range{ price: { between: [10, 100] } }

Multiple Conditions

All conditions must match (AND):

yaml
where:
  status: { eq: "active" }
  price: { gt: 0 }
  category: { in: ["Electronics", "Books"] }

Aggregate

Group data and apply aggregation functions.

yaml
transformations:
  - type: aggregate
    configuration:
      groupBy: ["region"]
      sum: ["revenue"]
      avg: ["price"]
      count: ["id"]

Aggregation Functions

FunctionDescriptionOutput Column
sumSum of valuesfield_sum
avgAveragefield_avg
minMinimumfield_min
maxMaximumfield_max
countCount rowsfield_count
firstFirst valuefield_first
lastLast valuefield_last
concatJoin stringsfield_concat

Custom Output Names

yaml
transformations:
  - type: aggregate
    configuration:
      groupBy: ["region"]
      aggregations:
        - field: revenue
          operation: sum
          as: total_revenue
        - field: orders
          operation: count
          as: order_count

Sort

Order rows by one or more columns.

yaml
transformations:
  - type: sort
    configuration:
      by:
        - column: "date"
          direction: "desc"
        - column: "name"
          direction: "asc"

Shorthand

yaml
transformations:
  - type: sort
    configuration:
      by: "date"
      order: "desc"

Limit

Take first N rows, with optional offset.

yaml
transformations:
  - type: limit
    configuration:
      count: 10
      offset: 0

Pagination example:

yaml
# Page 2, 10 items per page
transformations:
  - type: limit
    configuration:
      count: 10
      offset: 10

Select

Choose which columns to keep.

yaml
transformations:
  - type: select
    configuration:
      columns: ["name", "value", "category"]

With Rename

yaml
transformations:
  - type: select
    configuration:
      columns: ["product_name", "sale_amount"]
      rename:
        product_name: "Product"
        sale_amount: "Sales"

Rename

Rename columns without selecting.

yaml
transformations:
  - type: rename
    configuration:
      mapping:
        firstName: "first_name"
        lastName: "last_name"
        emailAddress: "email"

Derive

Add computed columns using expressions.

yaml
transformations:
  - type: derive
    configuration:
      columns:
        total: "price * quantity"
        profit_margin: "(price - cost) / price * 100"
        full_name: "first_name + ' ' + last_name"

Supported Operations

  • Arithmetic: +, -, *, /
  • String concatenation: +
  • Parentheses for grouping
  • Column references by name

Pivot

Reshape data from long to wide format.

yaml
transformations:
  - type: pivot
    configuration:
      index: "date"
      columns: "category"
      values: "amount"

Before:

datecategoryamount
JanA100
JanB200
FebA150

After:

dateAB
Jan100200
Feb150null

Sampling

Reduce dataset size for performance.

yaml
transformations:
  - type: sampling
    configuration:
      method: "random"
      size: 1000

Methods

MethodDescription
randomRandom sample of N rows
firstFirst N rows
lastLast N rows
nthEvery Nth row

Chaining Transformations

Combine multiple transformations for complex pipelines:

markdown
```dg
type: bar
data:
  file: data/orders.csv
transformations:
  # Step 1: Add computed column
  - type: derive
    configuration:
      columns:
        total: "quantity * unit_price"

  # Step 2: Filter valid orders
  - type: filter
    configuration:
      where:
        total: { gt: 0 }
        status: { ne: "cancelled" }

  # Step 3: Group by category
  - type: aggregate
    configuration:
      groupBy: ["category"]
      sum: ["total"]

  # Step 4: Sort by highest total
  - type: sort
    configuration:
      by: [{ column: "total_sum", direction: "desc" }]

  # Step 5: Top 10 only
  - type: limit
    configuration:
      count: 10

x: category
y: total_sum
title: Top 10 Categories by Revenue
```

Performance Tips

  1. Filter Early: Remove unnecessary rows before aggregation
  2. Select Columns: Only keep what you need
  3. Aggregate Before Sort: Grouping reduces data size
  4. Limit Last: Use limit as the final step
  5. Use Sampling: For very large datasets, sample first

See Also

Released under the MIT License.