The margin library connects your notebooks to your Margin workspace. It's pre-installed in every kernel.
margin.load()

Load a dataset from your workspace into a pandas DataFrame.
import margin
# Load by name
df = margin.load("sales_data")
# Load with sampling (default behavior)
df = margin.load("large_dataset") # Samples automatically
# Load all rows explicitly
df = margin.load("sales_data", full=True)
# Load specific columns only
df = margin.load("sales_data", columns=["date", "amount"])
# Custom sample size
df = margin.load("large_dataset", sample=5000)
| Parameter | Type | Description |
|---|---|---|
| name | str | Dataset name (filename, display name, or built-in) |
| sample | int \| None | Number of rows to sample. Uses environment default if not specified. |
| full | bool | If True, load all rows regardless of size. Default False. |
| columns | list[str] \| None | Only load specific columns. Default loads all. |
| filters | list[tuple] \| None | Row filters for Parquet files. Format: [("col", "op", "val")] |
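The `filters` parameter is the only one without a usage example. The sketch below shows one plausible reading of the `[("col", "op", "val")]` format, assuming it follows the same convention as pyarrow's Parquet filters, where every tuple in the list must match (logical AND). `apply_filters` and the `OPS` table are hypothetical helpers written for illustration; they are not part of margin and only mimic which rows such filters would select.

```python
import operator

# Comparison operators assumed to be supported; the real set may differ.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

def apply_filters(rows, filters):
    """Keep rows (dicts) that satisfy every (column, op, value) tuple."""
    return [
        row for row in rows
        if all(OPS[op](row[col], val) for col, op, val in filters)
    ]

# Made-up rows standing in for a small Parquet dataset.
rows = [
    {"region": "EU", "amount": 120},
    {"region": "EU", "amount": 40},
    {"region": "US", "amount": 300},
]

# Both conditions must hold: region is EU AND amount is at least 100.
filters = [("region", "==", "EU"), ("amount", ">=", 100)]
selected = apply_filters(rows, filters)
print(selected)
```

In a notebook, the same list would presumably be passed as `margin.load("sales_data", filters=filters)`. The table describes filters only for Parquet files, so behavior for other formats is not covered here.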
| Format | Return Type |
|---|---|
| CSV | pandas.DataFrame |
| JSON | pandas.DataFrame (from records) |
| JSONL | pandas.DataFrame (one row per line) |
| Parquet | pandas.DataFrame |
import margin
# Load and explore
df = margin.load("customer_data")
print(df.shape)
print(df.columns.tolist())
df.head()
# Load specific columns for efficiency
df = margin.load("transactions", columns=["date", "amount", "category"])
# Summary statistics
df.describe()
# Group and aggregate
df.groupby('category')['amount'].sum()
# Load full dataset when you need all rows
df = margin.load("events", full=True)
# Filter after loading
recent = df[df['date'] >= '2024-01-01']
Margin includes several built-in datasets for learning and testing:
| Name | Rows | Description |
|---|---|---|
| iris | 150 | Fisher's Iris flower classification |
| tips | 244 | Restaurant tipping behavior |
| penguins | 344 | Palmer Penguins measurements |
| sales_demo | 500 | Synthetic quarterly sales |
import margin
# Load a built-in dataset
df = margin.load("iris")
df = margin.load("tips")
The library tries to match your input in this order:
iris, tips, penguins, sales_demo

import margin
try:
    df = margin.load("nonexistent_file")
except margin.DatasetNotFoundError as e:
    print(f"Dataset not found: {e}")
except margin.DatasetTooLargeError as e:
    print(f"Dataset too large (use full=True if intentional): {e}")
except margin.InvalidDatasetError as e:
    print(f"Failed to parse dataset: {e}")
| Exception | Cause |
|---|---|
| DatasetNotFoundError | No matching dataset in workspace |
| DatasetTooLargeError | Dataset exceeds limits and full=False |
| InvalidDatasetError | File exists but couldn't be parsed |
| StorageConnectionError | Unable to connect to storage |
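`StorageConnectionError` is listed in the table but not handled in the try/except example. If such failures turn out to be transient (an assumption; the documentation does not say), a retry wrapper is one way to handle them. The sketch below is a minimal version with exponential backoff. `load_with_retry` and its parameters are hypothetical, and a local stand-in exception is defined so the snippet runs outside a Margin kernel.

```python
import time

class StorageConnectionError(Exception):
    """Local stand-in so the sketch runs outside a Margin kernel."""

def load_with_retry(loader, name, retry_on, attempts=3, base_delay=0.5,
                    sleep=time.sleep):
    """Call loader(name), retrying on `retry_on` with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return loader(name)
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: surface the original error.
            sleep(base_delay * 2 ** (attempt - 1))

# Fake loader that fails twice, then succeeds, to exercise the retry path.
calls = {"count": 0}

def flaky_loader(name):
    calls["count"] += 1
    if calls["count"] < 3:
        raise StorageConnectionError("storage unavailable")
    return f"loaded {name}"

result = load_with_retry(flaky_loader, "sales_data", StorageConnectionError,
                         sleep=lambda seconds: None)  # Skip real waiting here.
print(result, calls["count"])
```

In a notebook, the call would look like `load_with_retry(margin.load, "sales_data", margin.StorageConnectionError)`.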
margin.inspect()

Get metadata about a specific dataset without loading it into memory.
import margin
info = margin.inspect("sales_data")
print(f"Name: {info.name}")
print(f"Size: {info.size_mb:.1f} MB")
print(f"Rows: {info.row_count:,}")
print(f"Columns: {info.columns}")
Useful for checking file size before loading large datasets.
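One way to act on that size check is to pick the `margin.load()` arguments from the metadata. The helper below is a sketch under stated assumptions: `choose_load_kwargs`, the 100 MB threshold, and the 5000-row sample are all illustrative choices, not library defaults. It accepts any object with a `size_mb` attribute, so a `SimpleNamespace` stands in for the result of `margin.inspect()`.

```python
from types import SimpleNamespace

def choose_load_kwargs(info, max_full_mb=100.0, sample_rows=5000):
    """Return keyword arguments for margin.load() based on dataset size.

    `info` is any object with a `size_mb` attribute, such as the value
    returned by margin.inspect(). The thresholds are illustrative only.
    """
    if info.size_mb <= max_full_mb:
        return {"full": True}
    return {"sample": sample_rows}

# Made-up metadata standing in for margin.inspect() results.
small = SimpleNamespace(name="sales_data", size_mb=12.5, row_count=5_000)
large = SimpleNamespace(name="events", size_mb=2_048.0, row_count=40_000_000)

print(choose_load_kwargs(small))
print(choose_load_kwargs(large))
```

In a notebook this could be combined as `df = margin.load(name, **choose_load_kwargs(margin.inspect(name)))`.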
margin.list_datasets()

Get a list of all datasets in your workspace.
import margin
# List all datasets (including built-ins)
datasets = margin.list_datasets()
for ds in datasets:
    print(f"{ds.name}: {ds.row_count} rows")
Returns a list of DatasetInfo objects with metadata:
DatasetInfo(
    name='sales_data',
    display_name='Q4 Sales',
    file_format='csv',
    size_bytes=1024000,
    row_count=5000,
    columns=['date', 'amount', 'region'],
    is_builtin=False
)
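Because each entry carries these fields, the list can be filtered and sorted with plain Python. The sketch below uses a local dataclass that mirrors the fields shown above; it is a stand-in for illustration, not the real `DatasetInfo` class, and every value except the `sales_data` entry is made up.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetInfo:
    """Local stand-in mirroring the fields shown above; not the real class."""
    name: str
    display_name: str
    file_format: str
    size_bytes: int
    row_count: int
    columns: list = field(default_factory=list)
    is_builtin: bool = False

# Stand-in for the result of margin.list_datasets().
datasets = [
    DatasetInfo("sales_data", "Q4 Sales", "csv", 1_024_000, 5000,
                ["date", "amount", "region"]),
    DatasetInfo("iris", "iris", "csv", 4_000, 150, [], True),
    DatasetInfo("events", "Events", "parquet", 50_000_000, 2_000_000,
                ["date", "user_id"]),
]

# Keep only workspace datasets (no built-ins), largest first.
own = sorted(
    (ds for ds in datasets if not ds.is_builtin),
    key=lambda ds: ds.size_bytes,
    reverse=True,
)

# Find datasets that expose a given column.
with_region = [ds.name for ds in own if "region" in ds.columns]

print([ds.name for ds in own])
print(with_region)
```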
margin.exists()

Check if a dataset exists before loading.
import margin
if margin.exists("sales_data"):
    df = margin.load("sales_data")
else:
    print("Dataset not found")
Here's a typical workflow using the margin library:
import margin
import pandas as pd
import matplotlib.pyplot as plt
# 1. See what's available
datasets = margin.list_datasets()
print(f"Found {len(datasets)} datasets")
# 2. Check size before loading
info = margin.inspect("transactions")
print(f"Dataset: {info.row_count:,} rows, {info.size_mb:.1f} MB")
# 3. Load the data
df = margin.load("transactions")
# 4. Analyze
df['date'] = pd.to_datetime(df['date'])  # Ensure datetime dtype so .dt works
monthly = df.groupby(df['date'].dt.to_period('M'))['amount'].sum()
# 5. Visualize
plt.figure(figsize=(12, 6))
monthly.plot(kind='bar')
plt.title('Monthly Transaction Volume')
plt.ylabel('Amount ($)')
plt.tight_layout()
plt.show()
margin.inspect() before loading large files
margin.load() multiple times
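If the second point is read as advice against calling `margin.load()` repeatedly for the same dataset (an interpretation; the text does not spell it out), a small cache is one way to follow it. The sketch below is a minimal version. `load_once` is a hypothetical helper, and a fake loader stands in for `margin.load` so the snippet runs anywhere.

```python
_cache = {}

def load_once(loader, name, **kwargs):
    """Return a cached result for (name, kwargs), calling loader at most once."""
    # repr() keeps the key hashable even when kwargs contain lists.
    key = (name, repr(sorted(kwargs.items())))
    if key not in _cache:
        _cache[key] = loader(name, **kwargs)
    return _cache[key]

# Fake loader that records each call, standing in for margin.load.
load_calls = []

def fake_loader(name, **kwargs):
    load_calls.append(name)
    return {"name": name, "kwargs": kwargs}

first = load_once(fake_loader, "transactions", columns=["date", "amount"])
second = load_once(fake_loader, "transactions", columns=["date", "amount"])

# The second call is served from the cache, so the loader ran only once.
print(first is second, len(load_calls))
```

In a notebook the call would be `load_once(margin.load, "transactions")`. The cached object is shared between callers, so work on a copy (for example `df.copy()`) before modifying it in place.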