Margin Library

API reference for the margin Python library: load datasets and interact with your workspace.

The margin library connects your notebooks to your Margin workspace. It's pre-installed in every kernel.

Loading Datasets

margin.load()

Load a dataset from your workspace into a pandas DataFrame.

import margin

# Load by name
df = margin.load("sales_data")

# Load with sampling (default behavior)
df = margin.load("large_dataset")  # Samples automatically

# Load all rows explicitly
df = margin.load("sales_data", full=True)

# Load specific columns only
df = margin.load("sales_data", columns=["date", "amount"])

# Custom sample size
df = margin.load("large_dataset", sample=5000)

Parameters

Parameter   Type                 Description
name        str                  Dataset name (filename, display name, or built-in)
sample      int | None           Number of rows to sample. Uses the environment default if not specified.
full        bool                 If True, load all rows regardless of size. Default False.
columns     list[str] | None     Only load specific columns. Default loads all.
filters     list[tuple] | None   Row filters for Parquet files. Format: [("col", "op", "val")]
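
The filters format is easier to read with a concrete interpretation. The sketch below is not margin's implementation. It is a plain-Python illustration that assumes each tuple means "row[col] <op> val" and that all tuples must hold together (AND), which is how pyarrow treats the same list-of-tuples shape. The operator set and the apply_filters helper are hypothetical.

```python
import operator

# Hypothetical operator table; the operators margin actually accepts
# are not listed in this reference.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

def apply_filters(rows, filters):
    """Keep rows (dicts) for which every (col, op, val) tuple holds."""
    return [
        row for row in rows
        if all(OPS[op](row[col], val) for col, op, val in filters)
    ]

rows = [
    {"region": "west", "amount": 120},
    {"region": "east", "amount": 80},
    {"region": "west", "amount": 40},
]
filters = [("region", "==", "west"), ("amount", ">", 100)]
kept = apply_filters(rows, filters)
print(kept)
```

In a notebook, the same list would be passed as the filters argument to margin.load() for a Parquet dataset.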

Returns

Format    Return Type
CSV       pandas.DataFrame
JSON      pandas.DataFrame (from records)
JSONL     pandas.DataFrame (one row per line)
Parquet   pandas.DataFrame

Examples

import margin

# Load and explore
df = margin.load("customer_data")
print(df.shape)
print(df.columns.tolist())
df.head()

# Load specific columns for efficiency
df = margin.load("transactions", columns=["date", "amount", "category"])

# Summary statistics
df.describe()

# Group and aggregate
df.groupby('category')['amount'].sum()

# Load full dataset when you need all rows
df = margin.load("events", full=True)

# Filter after loading
recent = df[df['date'] >= '2024-01-01']

Built-in Datasets

Margin includes several built-in datasets for learning and testing:

Name         Rows   Description
iris         150    Fisher's Iris flower classification
tips         244    Restaurant tipping behavior
penguins     344    Palmer Penguins measurements
sales_demo   500    Synthetic quarterly sales

import margin

# Load a built-in dataset
df = margin.load("iris")
df = margin.load("tips")

Name Resolution

The library tries to match your input in this order:

  1. Built-in dataset – iris, tips, penguins, sales_demo
  2. Exact name match – Your uploaded datasets
  3. Display name match – Human-readable names you assigned

Use exact names for clarity. Display names are convenient but may change.
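
The lookup order above can be sketched in a few lines. This is an illustration of the documented order only, not the library's code: the resolve function and its uploaded and display_names inputs are hypothetical stand-ins for workspace state.

```python
BUILTINS = {"iris", "tips", "penguins", "sales_demo"}

def resolve(name, uploaded, display_names):
    """Return (kind, resolved_name) following the documented lookup order.

    uploaded:      set of exact dataset names in the workspace (hypothetical)
    display_names: dict mapping display name -> exact name (hypothetical)
    """
    if name in BUILTINS:            # 1. built-in dataset
        return ("builtin", name)
    if name in uploaded:            # 2. exact name match
        return ("exact", name)
    if name in display_names:       # 3. display name match
        return ("display", display_names[name])
    raise LookupError(f"no dataset matches {name!r}")

uploaded = {"sales_data", "tips"}
display_names = {"Q4 Sales": "sales_data"}

print(resolve("sales_data", uploaded, display_names))
print(resolve("Q4 Sales", uploaded, display_names))
print(resolve("tips", uploaded, display_names))
```

If the order is applied literally, an uploaded file that shares a name with a built-in (tips in this sketch) would resolve to the built-in, which is one more reason to give uploads distinct names.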

Error Handling

import margin

try:
    df = margin.load("nonexistent_file")
except margin.DatasetNotFoundError as e:
    print(f"Dataset not found: {e}")
except margin.DatasetTooLargeError as e:
    print(f"Dataset too large (use full=True if intentional): {e}")
except margin.InvalidDatasetError as e:
    print(f"Failed to parse dataset: {e}")

Exception                Cause
DatasetNotFoundError     No matching dataset in workspace
DatasetTooLargeError     Dataset exceeds limits and full=False
InvalidDatasetError      File exists but couldn't be parsed
StorageConnectionError   Unable to connect to storage
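
StorageConnectionError is listed above but not handled in the example. If connection failures turn out to be transient in your environment, a retry with backoff may help. The sketch below is one possible approach, not part of margin: load_with_retry is a hypothetical helper, and the exception class is a local stand-in so the block runs without the library.

```python
import time

class StorageConnectionError(Exception):
    """Stand-in for margin.StorageConnectionError so the sketch runs on its own."""

def load_with_retry(loader, name, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call loader(name), retrying on StorageConnectionError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return loader(name)
        except StorageConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            sleep(base_delay * 2 ** attempt)

# Fake loader that fails twice, then succeeds, to exercise the retry path.
calls = {"n": 0}

def flaky_loader(name):
    calls["n"] += 1
    if calls["n"] < 3:
        raise StorageConnectionError("storage unavailable")
    return f"loaded:{name}"

delays = []  # record the waits instead of actually sleeping
result = load_with_retry(flaky_loader, "sales_data", sleep=delays.append)
print(result, delays)
```

In a notebook, you would pass margin.load as the loader and catch margin.StorageConnectionError in place of the stand-in class.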

Inspecting Datasets

margin.inspect()

Get metadata about a specific dataset without loading it into memory.

import margin

info = margin.inspect("sales_data")

print(f"Name: {info.name}")
print(f"Size: {info.size_mb:.1f} MB")
print(f"Rows: {info.row_count:,}")
print(f"Columns: {info.columns}")

Useful for checking file size before loading large datasets.
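
One way to act on that size check is to decide between a full load and the default sampling. The helper below is a sketch under assumptions: load_kwargs_for and its 100 MB threshold are hypothetical, and the right cutoff depends on your kernel's memory.

```python
def load_kwargs_for(size_mb, max_full_mb=100.0):
    """Pick keyword arguments for margin.load() from a dataset's size.

    max_full_mb is a hypothetical threshold; tune it to your kernel's memory.
    """
    if size_mb <= max_full_mb:
        return {"full": True}  # small enough to load every row
    return {}                  # fall back to default sampling

print(load_kwargs_for(12.5))
print(load_kwargs_for(2048.0))
```

With the info object from the example above, this could be used as margin.load("sales_data", **load_kwargs_for(info.size_mb)).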

Listing Datasets

margin.list_datasets()

Get a list of all datasets in your workspace.

import margin

# List all datasets (including built-ins)
datasets = margin.list_datasets()

for ds in datasets:
    print(f"{ds.name}: {ds.row_count} rows")

Returns

List of DatasetInfo objects with metadata:

DatasetInfo(
    name='sales_data',
    display_name='Q4 Sales',
    file_format='csv',
    size_bytes=1024000,
    row_count=5000,
    columns=['date', 'amount', 'region'],
    is_builtin=False
)
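
The fields shown above are enough to build a quick workspace summary. The sketch below uses a local dataclass that mirrors those fields so it runs without the library; the summarize helper and the iris sample values (display name, size) are made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetInfo:
    """Local mirror of the fields shown above, for illustration only."""
    name: str
    display_name: str
    file_format: str
    size_bytes: int
    row_count: int
    columns: list = field(default_factory=list)
    is_builtin: bool = False

def summarize(datasets):
    """Return one summary line per user-uploaded dataset, largest first."""
    own = [d for d in datasets if not d.is_builtin]
    own.sort(key=lambda d: d.size_bytes, reverse=True)
    return [
        f"{d.name} ({d.file_format}): {d.row_count:,} rows, "
        f"{d.size_bytes / (1024 * 1024):.2f} MiB"
        for d in own
    ]

datasets = [
    DatasetInfo("iris", "Iris", "csv", 4096, 150, ["species"], True),
    DatasetInfo("sales_data", "Q4 Sales", "csv", 1024000, 5000,
                ["date", "amount", "region"], False),
]
lines = summarize(datasets)
print("\n".join(lines))
```

In a notebook, the list returned by margin.list_datasets() could be passed to summarize directly, assuming its objects expose the same attributes.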

Checking Existence

margin.exists()

Check if a dataset exists before loading.

import margin

if margin.exists("sales_data"):
    df = margin.load("sales_data")
else:
    print("Dataset not found")

Complete Example

Here's a typical workflow using the margin library:

import margin
import pandas as pd
import matplotlib.pyplot as plt

# 1. See what's available
datasets = margin.list_datasets()
print(f"Found {len(datasets)} datasets")

# 2. Check size before loading
info = margin.inspect("transactions")
print(f"Dataset: {info.row_count:,} rows, {info.size_mb:.1f} MB")

# 3. Load the data
df = margin.load("transactions")

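# 4. Analyze (convert to datetime first; a CSV load may leave 'date' as strings)
df['date'] = pd.to_datetime(df['date'])
monthly = df.groupby(df['date'].dt.to_period('M'))['amount'].sum()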

# 5. Visualize
plt.figure(figsize=(12, 6))
monthly.plot(kind='bar')
plt.title('Monthly Transaction Volume')
plt.ylabel('Amount ($)')
plt.tight_layout()
plt.show()

Tips

  1. Check size first – Use margin.inspect() before loading large files
  2. Use sampling – Default sampling prevents memory issues with big datasets
  3. Use Parquet – Faster loading and smaller memory footprint than CSV
  4. Handle errors – Wrap loads in try/except for robust notebooks
  5. Cache results – Assign to a variable instead of calling margin.load() multiple times

Next Steps