Margin Library

API reference for the margin Python library—load datasets and interact with your workspace.

The margin library connects your notebooks to your Margin workspace. It's pre-installed in every kernel.

Loading Datasets

`margin.load()`

Load a dataset from your workspace into a pandas DataFrame.

import margin

# Load by name
df = margin.load("sales_data")

# Load with sampling (default behavior)
df = margin.load("large_dataset")  # Samples automatically

# Load all rows explicitly
df = margin.load("sales_data", full=True)

# Load specific columns only
df = margin.load("sales_data", columns=["date", "amount"])

# Custom sample size
df = margin.load("large_dataset", sample=5000)

Parameters

Parameter	Type	Description
`name`	`str`	Dataset name (filename, display name, or built-in)
`sample`	`int | None`	Number of rows to sample. Uses environment default if not specified.
`full`	`bool`	If `True`, load all rows regardless of size. Default `False`.
`columns`	`list[str] | None`	Only load specific columns. Default loads all.
`filters`	`list[tuple] | None`	Row filters for Parquet files. Format: `[("col", "op", "val")]`
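The `filters` format can be illustrated with a small stand-alone sketch. It assumes the tuples follow the common `(column, operator, value)` convention with comparison operators such as `==`, `>=`, and `in`, and that all tuples must hold for a row to be kept. The operators Margin actually accepts are not listed here, and `apply_filters` is a hypothetical helper, not part of the library.

```python
import operator

# Assumed operator set; the operators Margin supports are not documented above.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "in": lambda value, options: value in options,
}


def apply_filters(rows, filters):
    """Keep the rows (dicts) that satisfy every (column, op, value) tuple."""
    kept = []
    for row in rows:
        if all(OPS[op](row[col], val) for col, op, val in filters):
            kept.append(row)
    return kept


# Illustrative rows standing in for a dataset.
rows = [
    {"region": "EU", "amount": 120},
    {"region": "US", "amount": 80},
    {"region": "EU", "amount": 40},
]

# Same shape as the `filters` argument: a list of (col, op, val) tuples.
filters = [("region", "==", "EU"), ("amount", ">=", 100)]
matching = apply_filters(rows, filters)
print(matching)
```

With the library itself, the corresponding call would presumably look like `margin.load("events", filters=[("amount", ">=", 100)])`, assuming `events` is stored as Parquet.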

Returns

Format	Return Type
CSV	`pandas.DataFrame`
JSON	`pandas.DataFrame` (from records)
JSONL	`pandas.DataFrame` (one row per line)
Parquet	`pandas.DataFrame`
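The difference between the JSON and JSONL layouts can be shown with the standard library alone. This sketch assumes "from records" means a top-level JSON array of objects, one object per row. It does not call Margin or pandas, and only shows that both layouts carry the same rows.

```python
import json

# JSON "records" layout: a single array, one object per row (assumed layout).
json_text = '[{"date": "2024-01-05", "amount": 12.5}, {"date": "2024-01-06", "amount": 7.0}]'

# JSONL layout: one JSON object per line, no enclosing array.
jsonl_text = '{"date": "2024-01-05", "amount": 12.5}\n{"date": "2024-01-06", "amount": 7.0}\n'

rows_from_json = json.loads(json_text)
rows_from_jsonl = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

# Both layouts describe the same two rows with the same columns.
print(rows_from_json == rows_from_jsonl)
print(sorted(rows_from_json[0]))
```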

Examples

import margin

# Load and explore
df = margin.load("customer_data")
print(df.shape)
print(df.columns.tolist())
df.head()

# Load specific columns for efficiency
df = margin.load("transactions", columns=["date", "amount", "category"])

# Summary statistics
df.describe()

# Group and aggregate
df.groupby('category')['amount'].sum()

# Load full dataset when you need all rows
df = margin.load("events", full=True)

# Filter after loading
recent = df[df['date'] >= '2024-01-01']

Built-in Datasets

Margin includes several built-in datasets for learning and testing:

Name	Rows	Description
`iris`	150	Fisher's Iris flower classification
`tips`	244	Restaurant tipping behavior
`penguins`	344	Palmer Penguins measurements
`sales_demo`	500	Synthetic quarterly sales

import margin

# Load a built-in dataset
df = margin.load("iris")
df = margin.load("tips")

Name Resolution

The library tries to match your input in this order:

1. Built-in dataset – `iris`, `tips`, `penguins`, `sales_demo`
2. Exact name match – Your uploaded datasets
3. Display name match – Human-readable names you assigned

Use exact names for clarity. Display names are convenient but may change.
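The lookup order above can be sketched as a plain function. `resolve_name` and its arguments are hypothetical stand-ins for illustration; how Margin stores and matches names internally is not documented here.

```python
BUILTINS = {"iris", "tips", "penguins", "sales_demo"}


def resolve_name(name, uploaded, display_names):
    """Return (kind, resolved_name) following the documented lookup order.

    uploaded: set of exact dataset names in the workspace (hypothetical).
    display_names: dict mapping display name -> exact name (hypothetical).
    """
    if name in BUILTINS:              # 1. built-in dataset
        return ("builtin", name)
    if name in uploaded:              # 2. exact name match
        return ("exact", name)
    if name in display_names:         # 3. display name match
        return ("display", display_names[name])
    raise LookupError(f"No dataset matches {name!r}")


uploaded = {"sales_data", "tips"}
display_names = {"Q4 Sales": "sales_data"}

print(resolve_name("tips", uploaded, display_names))
print(resolve_name("sales_data", uploaded, display_names))
print(resolve_name("Q4 Sales", uploaded, display_names))
```

If the order holds as described, one consequence is that an uploaded dataset named `tips` would be shadowed by the built-in of the same name.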

Error Handling

import margin

try:
    df = margin.load("nonexistent_file")
except margin.DatasetNotFoundError as e:
    print(f"Dataset not found: {e}")
except margin.DatasetTooLargeError as e:
    print(f"Dataset too large (use full=True if intentional): {e}")
except margin.InvalidDatasetError as e:
    print(f"Failed to parse dataset: {e}")
except margin.StorageConnectionError as e:
    print(f"Could not connect to storage: {e}")

Exception	Cause
`DatasetNotFoundError`	No matching dataset in workspace
`DatasetTooLargeError`	Dataset exceeds limits and `full=False`
`InvalidDatasetError`	File exists but couldn't be parsed
`StorageConnectionError`	Unable to connect to storage
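Of these exceptions, `StorageConnectionError` may be transient, so a retry can make sense for it. The helper below is a generic sketch, not part of Margin: `load_with_retry` is a hypothetical name, and the loader and exception type are passed in so the snippet runs without a workspace. In a notebook you would presumably pass `margin.load` and `margin.StorageConnectionError`.

```python
import time


def load_with_retry(loader, name, retry_on, attempts=3, delay=0.0):
    """Call loader(name), retrying when an exception of type retry_on is raised."""
    for attempt in range(1, attempts + 1):
        try:
            return loader(name)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay)


# Stand-in loader that fails twice and then succeeds, to exercise the helper.
calls = {"count": 0}


def flaky_loader(name):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("storage unavailable")
    return f"loaded {name}"


result = load_with_retry(flaky_loader, "sales_data", retry_on=ConnectionError)
print(result, "after", calls["count"], "calls")
```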

Inspecting Datasets

`margin.inspect()`

Get metadata about a specific dataset without loading it into memory.

import margin

info = margin.inspect("sales_data")

print(f"Name: {info.name}")
print(f"Size: {info.size_mb:.1f} MB")
print(f"Rows: {info.row_count:,}")
print(f"Columns: {info.columns}")

Useful for checking file size before loading large datasets.
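One way to act on this metadata is to choose the `load()` arguments from it. The sketch below is only an illustration: `choose_load_kwargs` is a hypothetical helper, and the 100 MB threshold and 10,000-row sample are arbitrary values, not Margin defaults.

```python
def choose_load_kwargs(size_mb, row_count, max_full_mb=100.0, sample_rows=10_000):
    """Return keyword arguments for margin.load() based on dataset size.

    Thresholds are illustrative assumptions, not library defaults.
    """
    if size_mb <= max_full_mb:
        return {"full": True}
    # Never ask for more rows than the dataset has.
    return {"sample": min(sample_rows, row_count)}


print(choose_load_kwargs(size_mb=12.4, row_count=50_000))
print(choose_load_kwargs(size_mb=850.0, row_count=4_000_000))
```

In a notebook this could be combined with the metadata as `margin.load(name, **choose_load_kwargs(info.size_mb, info.row_count))`.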

Listing Datasets

`margin.list_datasets()`

Get a list of all datasets in your workspace.

import margin

# List all datasets (including built-ins)
datasets = margin.list_datasets()

for ds in datasets:
    print(f"{ds.name}: {ds.row_count} rows")

Returns

List of DatasetInfo objects with metadata:

DatasetInfo(
    name='sales_data',
    display_name='Q4 Sales',
    file_format='csv',
    size_bytes=1024000,
    row_count=5000,
    columns=['date', 'amount', 'region'],
    is_builtin=False
)
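The `inspect()` example reads `info.size_mb`, while the fields shown here include `size_bytes`. A plausible reading is that `size_mb` is derived from `size_bytes`. The dataclass below sketches that, assuming 1 MB = 1,000,000 bytes. `DatasetInfoSketch` is a stand-in for illustration, not the library's class.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetInfoSketch:
    """Stand-in mirroring the fields shown above (not the real class)."""
    name: str
    display_name: str
    file_format: str
    size_bytes: int
    row_count: int
    columns: list = field(default_factory=list)
    is_builtin: bool = False

    @property
    def size_mb(self):
        # Assumption: decimal megabytes; the library may use 1024 * 1024.
        return self.size_bytes / 1_000_000


# Illustrative values only; the second entry's size and format are made up.
datasets = [
    DatasetInfoSketch("sales_data", "Q4 Sales", "csv", 1_024_000, 5000,
                      ["date", "amount", "region"]),
    DatasetInfoSketch("iris", "iris", "csv", 4_500, 150, is_builtin=True),
]

# Keep only uploaded (non-built-in) datasets and report their size.
for ds in datasets:
    if not ds.is_builtin:
        print(f"{ds.name}: {ds.size_mb:.1f} MB, {ds.row_count:,} rows")
```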

Checking Existence

`margin.exists()`

Check if a dataset exists before loading.

import margin

if margin.exists("sales_data"):
    df = margin.load("sales_data")
else:
    print("Dataset not found")

Complete Example

Here's a typical workflow using the margin library:

import margin
import pandas as pd
import matplotlib.pyplot as plt

# 1. See what's available
datasets = margin.list_datasets()
print(f"Found {len(datasets)} datasets")

# 2. Check size before loading
info = margin.inspect("transactions")
print(f"Dataset: {info.row_count:,} rows, {info.size_mb:.1f} MB")

# 3. Load the data
df = margin.load("transactions")

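# 4. Analyze (parse dates first so the .dt accessor also works on string columns)
df['date'] = pd.to_datetime(df['date'])
monthly = df.groupby(df['date'].dt.to_period('M'))['amount'].sum()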

# 5. Visualize
plt.figure(figsize=(12, 6))
monthly.plot(kind='bar')
plt.title('Monthly Transaction Volume')
plt.ylabel('Amount ($)')
plt.tight_layout()
plt.show()

Tips

Check size first – Use `margin.inspect()` before loading large files
Use sampling – Default sampling prevents memory issues with big datasets
Use Parquet – Loads faster than CSV, stores data more compactly, and supports row filters at load time
Handle errors – Wrap loads in try/except for robust notebooks
Cache results – Assign to a variable instead of calling `margin.load()` multiple times

Next Steps

Python Environment

What's pre-installed in Margin kernels and how to add packages.

AI Agent

Work alongside an AI that can write code, run cells, and iterate on analysis in your notebooks.