Data Transformations

Transform your data before visualization. Filter, aggregate, sort, and derive new columns.

Quick Reference

Type	Purpose	Example
filter	Remove rows	`where: { status: { eq: "active" } }`
aggregate	Group and summarize	`groupBy: ["region"], sum: ["sales"]`
sort	Order rows	`by: [{ column: "date", direction: "desc" }]`
limit	Take N rows	`count: 10`
select	Choose columns	`columns: ["name", "value"]`
rename	Rename columns	`mapping: { firstName: "first_name" }`
derive	Add computed columns	`columns: { total: "price * quantity" }`
pivot	Reshape data	`index: "date", columns: "category"`
sampling	Reduce dataset	`method: "random", size: 100`

Basic Structure

Add transformations to any chart:

yaml

data:
  file: data/sales.csv
transformations:
  - type: filter
    configuration:
      where: { status: { eq: "active" } }
  - type: sort
    configuration:
      by: [{ column: "date", direction: "desc" }]

Transformations execute in order. Output of one becomes input to next.

Filter

Remove rows based on conditions.

yaml

transformations:
  - type: filter
    configuration:
      where:
        status: { eq: "active" }

Operators

Operator	Meaning	Example
`eq`	Equals	`{ status: { eq: "active" } }`
`ne`	Not equals	`{ status: { ne: "deleted" } }`
`gt`	Greater than	`{ price: { gt: 100 } }`
`gte`	Greater or equal	`{ age: { gte: 18 } }`
`lt`	Less than	`{ stock: { lt: 10 } }`
`lte`	Less or equal	`{ score: { lte: 50 } }`
`in`	In list	`{ category: { in: ["A", "B"] } }`
`contains`	Contains string	`{ name: { contains: "Corp" } }`
`between`	In range	`{ price: { between: [10, 100] } }`

Multiple Conditions

All conditions must match (AND):

yaml

where:
  status: { eq: "active" }
  price: { gt: 0 }
  category: { in: ["Electronics", "Books"] }

Aggregate

Group data and apply aggregation functions.

yaml

transformations:
  - type: aggregate
    configuration:
      groupBy: ["region"]
      sum: ["revenue"]
      avg: ["price"]
      count: ["id"]

Aggregation Functions

Function	Description	Output Column
`sum`	Sum of values	`field_sum`
`avg`	Average	`field_avg`
`min`	Minimum	`field_min`
`max`	Maximum	`field_max`
`count`	Count rows	`field_count`
`first`	First value	`field_first`
`last`	Last value	`field_last`
`concat`	Join strings	`field_concat`

Custom Output Names

yaml

transformations:
  - type: aggregate
    configuration:
      groupBy: ["region"]
      aggregations:
        - field: revenue
          operation: sum
          as: total_revenue
        - field: orders
          operation: count
          as: order_count

Sort

Order rows by one or more columns.

yaml

transformations:
  - type: sort
    configuration:
      by:
        - column: "date"
          direction: "desc"
        - column: "name"
          direction: "asc"

Shorthand

yaml

transformations:
  - type: sort
    configuration:
      by: "date"
      order: "desc"

Limit

Take first N rows, with optional offset.

yaml

transformations:
  - type: limit
    configuration:
      count: 10
      offset: 0

Pagination example:

yaml

# Page 2, 10 items per page
transformations:
  - type: limit
    configuration:
      count: 10
      offset: 10

Select

Choose which columns to keep.

yaml

transformations:
  - type: select
    configuration:
      columns: ["name", "value", "category"]

With Rename

yaml

transformations:
  - type: select
    configuration:
      columns: ["product_name", "sale_amount"]
      rename:
        product_name: "Product"
        sale_amount: "Sales"

Rename

Rename columns without selecting.

yaml

transformations:
  - type: rename
    configuration:
      mapping:
        firstName: "first_name"
        lastName: "last_name"
        emailAddress: "email"

Derive

Add computed columns using expressions.

yaml

transformations:
  - type: derive
    configuration:
      columns:
        total: "price * quantity"
        profit_margin: "(price - cost) / price * 100"
        full_name: "first_name + ' ' + last_name"

Supported Operations

Arithmetic: +, -, *, /
String concatenation: +
Parentheses for grouping
Column references by name

Pivot

Reshape data from long to wide format.

yaml

transformations:
  - type: pivot
    configuration:
      index: "date"
      columns: "category"
      values: "amount"

Before:

date	category	amount
Jan	A	100
Jan	B	200
Feb	A	150

After:

date	A	B
Jan	100	200
Feb	150	null

Sampling

Reduce dataset size for performance.

yaml

transformations:
  - type: sampling
    configuration:
      method: "random"
      size: 1000

Methods

Method	Description
`random`	Random sample of N rows
`first`	First N rows
`last`	Last N rows
`nth`	Every Nth row

Chaining Transformations

Combine multiple transformations for complex pipelines:

markdown

```dg
type: bar
data:
  file: data/orders.csv
transformations:
  # Step 1: Add computed column
  - type: derive
    configuration:
      columns:
        total: "quantity * unit_price"

  # Step 2: Filter valid orders
  - type: filter
    configuration:
      where:
        total: { gt: 0 }
        status: { ne: "cancelled" }

  # Step 3: Group by category
  - type: aggregate
    configuration:
      groupBy: ["category"]
      sum: ["total"]

  # Step 4: Sort by highest total
  - type: sort
    configuration:
      by: [{ column: "total_sum", direction: "desc" }]

  # Step 5: Top 10 only
  - type: limit
    configuration:
      count: 10

x: category
y: total_sum
title: Top 10 Categories by Revenue
```

Performance Tips

Filter Early: Remove unnecessary rows before aggregation
Select Columns: Only keep what you need
Aggregate Before Sort: Grouping reduces data size
Limit Last: Use limit as the final step
Use Sampling: For very large datasets, sample first

Data Transformations ​

Quick Reference ​

Basic Structure ​

Filter ​

Operators ​

Multiple Conditions ​

Aggregate ​

Aggregation Functions ​

Custom Output Names ​

Sort ​

Shorthand ​

Limit ​

Select ​

With Rename ​

Rename ​

Derive ​

Supported Operations ​

Pivot ​

Sampling ​

Methods ​

Chaining Transformations ​

Performance Tips ​

See Also ​

Data Transformations

Quick Reference

Basic Structure

Filter

Operators

Multiple Conditions

Aggregate

Aggregation Functions

Custom Output Names

Sort

Shorthand

Limit

Select

With Rename

Rename

Derive

Supported Operations

Pivot

Sampling

Methods

Chaining Transformations

Performance Tips

See Also