Margin Library
API reference for the margin Python library: load datasets and interact with your workspace.
The margin library connects your notebooks to your Margin workspace. It's pre-installed in every kernel.
Loading Datasets
margin.load()
Load a dataset from your workspace into a pandas DataFrame.
import margin
# Load by name
df = margin.load("sales_data")
# Load with sampling (default behavior)
df = margin.load("large_dataset") # Samples automatically
# Load all rows explicitly
df = margin.load("sales_data", full=True)
# Load specific columns only
df = margin.load("sales_data", columns=["date", "amount"])
# Custom sample size
df = margin.load("large_dataset", sample=5000)
Parameters
| Parameter | Type | Description |
|---|---|---|
| name | str | Dataset name (filename, display name, or built-in) |
| sample | int \| None | Number of rows to sample. Uses environment default if not specified. |
| full | bool | If True, load all rows regardless of size. Default False. |
| columns | list[str] \| None | Only load specific columns. Default loads all. |
| filters | list[tuple] \| None | Row filters for Parquet files. Format: [("col", "op", "val")] |
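To make the filter tuple format concrete, here is a minimal sketch of how a list of `(column, operator, value)` tuples could be evaluated against rows. It assumes the tuples are combined with AND, as pyarrow does for a flat list of Parquet filters. The operator table and the `matches` helper are hypothetical and only illustrate the semantics; the operators margin actually accepts are not listed in this reference.

```python
import operator

# Hypothetical operator table; margin's supported operators are an assumption.
_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "in": lambda value, options: value in options,
}


def matches(row, filters):
    """Return True if the row (a dict) satisfies every (col, op, val) filter."""
    return all(_OPS[op](row[col], val) for col, op, val in filters)


# Illustrative rows; column names are made up for this example.
rows = [
    {"region": "West", "amount": 120},
    {"region": "East", "amount": 80},
    {"region": "West", "amount": 40},
]
filters = [("region", "==", "West"), ("amount", ">=", 100)]

# Keep only rows that pass all filters.
kept = [row for row in rows if matches(row, filters)]
print(kept)
```

With margin itself, the same condition would presumably be passed as `margin.load("sales_data", filters=[("region", "==", "West"), ("amount", ">=", 100)])`, so that non-matching rows are skipped when the Parquet file is read.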
Returns
| Format | Return Type |
|---|---|
| CSV | pandas.DataFrame |
| JSON | pandas.DataFrame (from records) |
| JSONL | pandas.DataFrame (one row per line) |
| Parquet | pandas.DataFrame |
Examples
import margin
# Load and explore
df = margin.load("customer_data")
print(df.shape)
print(df.columns.tolist())
df.head()
# Load specific columns for efficiency
df = margin.load("transactions", columns=["date", "amount", "category"])
# Summary statistics
df.describe()
# Group and aggregate
df.groupby('category')['amount'].sum()
# Load full dataset when you need all rows
df = margin.load("events", full=True)
# Filter after loading
recent = df[df['date'] >= '2024-01-01']
Built-in Datasets
Margin includes several built-in datasets for learning and testing:
| Name | Rows | Description |
|---|---|---|
| iris | 150 | Fisher's Iris flower classification |
| tips | 244 | Restaurant tipping behavior |
| penguins | 344 | Palmer Penguins measurements |
| sales_demo | 500 | Synthetic quarterly sales |
import margin
# Load a built-in dataset
df = margin.load("iris")
df = margin.load("tips")
Name Resolution
The library tries to match your input in this order:
- Built-in dataset – iris, tips, penguins, sales_demo
- Exact name match – Your uploaded datasets
- Display name match – Human-readable names you assigned
Use exact names for clarity. Display names are convenient but may change.
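The resolution order can be sketched as a simple lookup chain. This is an illustration of the documented order only, not margin's actual implementation; the `resolve` function and its arguments are hypothetical.

```python
BUILTINS = {"iris", "tips", "penguins", "sales_demo"}


def resolve(name, exact_names, display_names):
    """Return (kind, dataset_name) for the first match, or None.

    exact_names: set of uploaded dataset names
    display_names: dict mapping display name -> dataset name
    """
    if name in BUILTINS:          # 1. built-in datasets are checked first
        return ("builtin", name)
    if name in exact_names:       # 2. then exact names of uploads
        return ("exact", name)
    if name in display_names:     # 3. finally human-readable display names
        return ("display", display_names[name])
    return None


# Illustrative workspace contents.
uploaded = {"sales_data", "iris"}
labels = {"Q4 Sales": "sales_data"}

print(resolve("iris", uploaded, labels))
print(resolve("Q4 Sales", uploaded, labels))
print(resolve("missing", uploaded, labels))
```

One consequence of this order, if it holds as described, is that an upload named like a built-in (for example `iris`) would be shadowed by the built-in, which is another reason to choose distinct names for uploads.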
Error Handling
import margin
try:
    df = margin.load("nonexistent_file")
except margin.DatasetNotFoundError as e:
    print(f"Dataset not found: {e}")
except margin.DatasetTooLargeError as e:
    print(f"Dataset too large (use full=True if intentional): {e}")
except margin.InvalidDatasetError as e:
    print(f"Failed to parse dataset: {e}")
| Exception | Cause |
|---|---|
| DatasetNotFoundError | No matching dataset in workspace |
| DatasetTooLargeError | Dataset exceeds limits and full=False |
| InvalidDatasetError | File exists but couldn't be parsed |
| StorageConnectionError | Unable to connect to storage |
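Of the exceptions above, a storage connection failure is the one most likely to be transient, so it may be worth retrying. The sketch below shows one way to do that with exponential backoff. The `load_with_retry` helper is hypothetical and not part of margin; it takes the loader and the retryable exception types as arguments, and the demonstration uses a stand-in loader with Python's built-in `ConnectionError`.

```python
import time


def load_with_retry(loader, name, retry_on, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call loader(name), retrying only on the exception types in retry_on."""
    for attempt in range(1, attempts + 1):
        try:
            return loader(name)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # wait 0.5s, 1s, 2s, ...


# Stand-in loader that fails twice, then succeeds (for demonstration only).
calls = {"count": 0}


def flaky_loader(name):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("storage unavailable")
    return f"loaded {name}"


result = load_with_retry(
    flaky_loader,
    "sales_data",
    retry_on=(ConnectionError,),
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(result)
```

In a notebook, the call would presumably look like `load_with_retry(margin.load, "sales_data", retry_on=(margin.StorageConnectionError,))`. Errors such as `DatasetNotFoundError` are deliberately not retried, since repeating the call would not change the outcome.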
Inspecting Datasets
margin.inspect()
Get metadata about a specific dataset without loading it into memory.
import margin
info = margin.inspect("sales_data")
print(f"Name: {info.name}")
print(f"Size: {info.size_mb:.1f} MB")
print(f"Rows: {info.row_count:,}")
print(f"Columns: {info.columns}")
Useful for checking file size before loading large datasets.
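As a sketch of how the metadata could drive the loading decision, the helper below picks keyword arguments for `margin.load()` from a dataset's size and row count. The `load_options` function and its thresholds (50 MB, 5000 rows) are hypothetical values chosen for illustration, not limits defined by margin.

```python
def load_options(size_mb, row_count, max_full_mb=50.0, sample_rows=5000):
    """Choose load keyword arguments from dataset metadata.

    Small datasets are loaded in full; larger ones are sampled.
    """
    if size_mb <= max_full_mb:
        return {"full": True}
    # Never request more sample rows than the dataset contains.
    return {"sample": min(sample_rows, row_count)}


small = load_options(size_mb=1.2, row_count=5000)
large = load_options(size_mb=800.0, row_count=12_000_000)
print(small)
print(large)
```

Combined with the calls shown in this reference, usage might look like `info = margin.inspect("transactions")` followed by `df = margin.load("transactions", **load_options(info.size_mb, info.row_count))`.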
Listing Datasets
margin.list_datasets()
Get a list of all datasets in your workspace.
import margin
# List all datasets (including built-ins)
datasets = margin.list_datasets()
for ds in datasets:
    print(f"{ds.name}: {ds.row_count} rows")
Returns
List of DatasetInfo objects with metadata:
DatasetInfo(
    name='sales_data',
    display_name='Q4 Sales',
    file_format='csv',
    size_bytes=1024000,
    row_count=5000,
    columns=['date', 'amount', 'region'],
    is_builtin=False
)
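Because each entry carries its column list, the result of `margin.list_datasets()` can be searched without loading any data. The sketch below uses a stand-in dataclass that mirrors the fields shown above; the real `DatasetInfo` class may differ, and the `with_column` helper is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetInfo:
    """Stand-in mirroring the documented fields; not margin's actual class."""
    name: str
    display_name: str
    file_format: str
    size_bytes: int
    row_count: int
    columns: list = field(default_factory=list)
    is_builtin: bool = False


def with_column(datasets, column, include_builtin=False):
    """Return names of datasets that contain the given column."""
    return [
        ds.name
        for ds in datasets
        if column in ds.columns and (include_builtin or not ds.is_builtin)
    ]


# Illustrative metadata; in a notebook this list would come from margin.list_datasets().
datasets = [
    DatasetInfo("sales_data", "Q4 Sales", "csv", 1024000, 5000, ["date", "amount", "region"]),
    DatasetInfo("events", "Events", "parquet", 52428800, 900000, ["date", "user_id"]),
    DatasetInfo("tips", "tips", "csv", 8000, 244, ["total_bill", "tip"], is_builtin=True),
]

names = with_column(datasets, "date")
print(names)
```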
Checking Existence
margin.exists()
Check if a dataset exists before loading.
import margin
if margin.exists("sales_data"):
    df = margin.load("sales_data")
else:
    print("Dataset not found")
Complete Example
Here's a typical workflow using the margin library:
import margin
import pandas as pd
import matplotlib.pyplot as plt
# 1. See what's available
datasets = margin.list_datasets()
print(f"Found {len(datasets)} datasets")
# 2. Check size before loading
info = margin.inspect("transactions")
print(f"Dataset: {info.row_count:,} rows, {info.size_mb:.1f} MB")
# 3. Load the data
df = margin.load("transactions")
# 4. Analyze
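# Dates may load as strings (e.g. from CSV); convert so the .dt accessor works
df['date'] = pd.to_datetime(df['date'])
monthly = df.groupby(df['date'].dt.to_period('M'))['amount'].sum()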
# 5. Visualize
plt.figure(figsize=(12, 6))
monthly.plot(kind='bar')
plt.title('Monthly Transaction Volume')
plt.ylabel('Amount ($)')
plt.tight_layout()
plt.show()
Tips
- Check size first – Use margin.inspect() before loading large files
- Use sampling – Default sampling prevents memory issues with big datasets
- Use Parquet – Faster loading and smaller memory footprint than CSV
- Handle errors – Wrap loads in try/except for robust notebooks
- Cache results – Assign to a variable instead of calling margin.load() multiple times
Next Steps
- Upload datasets to your workspace
- Learn about the Python environment
- Create briefs from your analysis