Data Transformations
Transform your data before visualization. Filter, aggregate, sort, and derive new columns.
Quick Reference
| Type | Purpose | Example |
|---|---|---|
| filter | Remove rows | where: { status: { eq: "active" } } |
| aggregate | Group and summarize | groupBy: ["region"], sum: ["sales"] |
| sort | Order rows | by: [{ column: "date", direction: "desc" }] |
| limit | Take N rows | count: 10 |
| select | Choose columns | columns: ["name", "value"] |
| rename | Rename columns | mapping: { firstName: "first_name" } |
| derive | Add computed columns | columns: { total: "price * quantity" } |
| pivot | Reshape data | index: "date", columns: "category" |
| sampling | Reduce dataset | method: "random", size: 100 |
Basic Structure
Add transformations to any chart:
yaml
data:
file: data/sales.csv
transformations:
- type: filter
configuration:
where: { status: { eq: "active" } }
- type: sort
configuration:
by: [{ column: "date", direction: "desc" }]Transformations execute in order. Output of one becomes input to next.
Filter
Remove rows based on conditions.
yaml
transformations:
- type: filter
configuration:
where:
status: { eq: "active" }Operators
| Operator | Meaning | Example |
|---|---|---|
eq | Equals | { status: { eq: "active" } } |
ne | Not equals | { status: { ne: "deleted" } } |
gt | Greater than | { price: { gt: 100 } } |
gte | Greater or equal | { age: { gte: 18 } } |
lt | Less than | { stock: { lt: 10 } } |
lte | Less or equal | { score: { lte: 50 } } |
in | In list | { category: { in: ["A", "B"] } } |
contains | Contains string | { name: { contains: "Corp" } } |
between | In range | { price: { between: [10, 100] } } |
Multiple Conditions
All conditions must match (AND):
yaml
where:
status: { eq: "active" }
price: { gt: 0 }
category: { in: ["Electronics", "Books"] }Aggregate
Group data and apply aggregation functions.
yaml
transformations:
- type: aggregate
configuration:
groupBy: ["region"]
sum: ["revenue"]
avg: ["price"]
count: ["id"]Aggregation Functions
| Function | Description | Output Column |
|---|---|---|
sum | Sum of values | field_sum |
avg | Average | field_avg |
min | Minimum | field_min |
max | Maximum | field_max |
count | Count rows | field_count |
first | First value | field_first |
last | Last value | field_last |
concat | Join strings | field_concat |
Custom Output Names
yaml
transformations:
- type: aggregate
configuration:
groupBy: ["region"]
aggregations:
- field: revenue
operation: sum
as: total_revenue
- field: orders
operation: count
as: order_countSort
Order rows by one or more columns.
yaml
transformations:
- type: sort
configuration:
by:
- column: "date"
direction: "desc"
- column: "name"
direction: "asc"Shorthand
yaml
transformations:
- type: sort
configuration:
by: "date"
order: "desc"Limit
Take first N rows, with optional offset.
yaml
transformations:
- type: limit
configuration:
count: 10
offset: 0Pagination example:
yaml
# Page 2, 10 items per page
transformations:
- type: limit
configuration:
count: 10
offset: 10Select
Choose which columns to keep.
yaml
transformations:
- type: select
configuration:
columns: ["name", "value", "category"]With Rename
yaml
transformations:
- type: select
configuration:
columns: ["product_name", "sale_amount"]
rename:
product_name: "Product"
sale_amount: "Sales"Rename
Rename columns without selecting.
yaml
transformations:
- type: rename
configuration:
mapping:
firstName: "first_name"
lastName: "last_name"
emailAddress: "email"Derive
Add computed columns using expressions.
yaml
transformations:
- type: derive
configuration:
columns:
total: "price * quantity"
profit_margin: "(price - cost) / price * 100"
full_name: "first_name + ' ' + last_name"Supported Operations
- Arithmetic:
+,-,*,/ - String concatenation:
+ - Parentheses for grouping
- Column references by name
Pivot
Reshape data from long to wide format.
yaml
transformations:
- type: pivot
configuration:
index: "date"
columns: "category"
values: "amount"Before:
| date | category | amount |
|---|---|---|
| Jan | A | 100 |
| Jan | B | 200 |
| Feb | A | 150 |
After:
| date | A | B |
|---|---|---|
| Jan | 100 | 200 |
| Feb | 150 | null |
Sampling
Reduce dataset size for performance.
yaml
transformations:
- type: sampling
configuration:
method: "random"
size: 1000Methods
| Method | Description |
|---|---|
random | Random sample of N rows |
first | First N rows |
last | Last N rows |
nth | Every Nth row |
Chaining Transformations
Combine multiple transformations for complex pipelines:
markdown
```dg
type: bar
data:
file: data/orders.csv
transformations:
# Step 1: Add computed column
- type: derive
configuration:
columns:
total: "quantity * unit_price"
# Step 2: Filter valid orders
- type: filter
configuration:
where:
total: { gt: 0 }
status: { ne: "cancelled" }
# Step 3: Group by category
- type: aggregate
configuration:
groupBy: ["category"]
sum: ["total"]
# Step 4: Sort by highest total
- type: sort
configuration:
by: [{ column: "total_sum", direction: "desc" }]
# Step 5: Top 10 only
- type: limit
configuration:
count: 10
x: category
y: total_sum
title: Top 10 Categories by Revenue
```Performance Tips
- Filter Early: Remove unnecessary rows before aggregation
- Select Columns: Only keep what you need
- Aggregate Before Sort: Grouping reduces data size
- Limit Last: Use limit as the final step
- Use Sampling: For very large datasets, sample first
See Also
- Data Sources - Load data to transform
- Cheat Sheet - Quick reference
- Recipes - Real-world examples