Python Environment

What's pre-installed in Margin kernels and how to add packages.

Margin kernels run Python 3.11 with a curated set of data science libraries pre-installed. You can also install additional packages within your session.
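
You can confirm the exact versions from any cell:

import sys
import pandas as pd

print(sys.version)       # interpreter version (3.11.x)
print(pd.__version__)    # pre-installed pandas version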

Pre-installed Libraries

Every kernel starts with these libraries ready to import:

Data Manipulation

Library   Version   Import
pandas    2.2+      import pandas as pd
numpy     1.26+     import numpy as np
polars    0.20+     import polars as pl

Visualization

Library      Version   Import
matplotlib   3.8+      import matplotlib.pyplot as plt
seaborn      0.13+     import seaborn as sns
plotly       5.18+     import plotly.express as px
altair       5.2+      import altair as alt

Machine Learning

Library        Version   Import
scikit-learn   1.4+      from sklearn import ...
statsmodels    0.14+     import statsmodels.api as sm
scipy          1.12+     import scipy

Utilities

Library          Version   Import
requests         2.31+     import requests
beautifulsoup4   4.12+     from bs4 import BeautifulSoup
openpyxl         3.1+      import openpyxl
pyarrow          15+       import pyarrow
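
Some of these are used indirectly: pandas relies on openpyxl for .xlsx files and pyarrow for Parquet, so both formats work out of the box (filenames below are placeholders):

import pandas as pd

# pandas delegates to openpyxl and pyarrow for these formats
df = pd.read_excel('report.xlsx', engine='openpyxl')
df.to_parquet('report.parquet', engine='pyarrow')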

Margin Tools

Library   Description
margin    Load datasets from your workspace
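
The margin API isn't documented on this page, so treat the load call below as a placeholder; you can always inspect the package from a cell:

import margin

# Discover what the package actually exposes
help(margin)

# Hypothetical usage; the real function name may differ
# df = margin.load_dataset('my-dataset')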

Installing Additional Packages

Need something else? Install packages within your session:

# Install a package
!pip install package-name

# Install a specific version
!pip install package-name==1.2.3

# Install multiple packages
!pip install package1 package2 package3
Installed packages persist for the duration of your kernel session. When you disconnect and reconnect, you'll need to reinstall them.
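
Because of this, a guard cell that installs only when a package is missing is safe to re-run after any reconnect. A sketch, using umap-learn purely as an example package:

import importlib.util

# Install only if the package isn't already importable
if importlib.util.find_spec('umap') is None:
    !pip install umap-learn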

Common Additions

# NLP
!pip install nltk spacy transformers

# Time series
!pip install prophet pmdarima

# Geographic
!pip install geopandas folium

# Financial
!pip install yfinance pandas-ta

Working with Large Datasets

For large files, use efficient formats and lazy loading:

# Parquet is faster than CSV for large files
df = pd.read_parquet('large_file.parquet')

# Or use Polars for even better performance
import polars as pl
df = pl.read_parquet('large_file.parquet')

# Chunked reading for huge CSVs
chunks = pd.read_csv('huge.csv', chunksize=10000)
for chunk in chunks:
    process(chunk)
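
The reads above are eager. For true lazy loading, Polars can scan a file and defer all work until you collect, reading only the columns and rows the query needs (column names below are placeholders):

import polars as pl

# Nothing is read from disk until .collect() runs
lazy = pl.scan_parquet('large_file.parquet')

result = (
    lazy
    .filter(pl.col('amount') > 0)       # placeholder column
    .select(['amount', 'category'])     # only these columns are read
    .collect()
)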

Memory Management

Kernels have memory limits. Keep your session healthy:

# Check a DataFrame's memory footprint
print(f"DataFrame size: {df.memory_usage(deep=True).sum() / 1e6:.1f} MB")

# Free memory by deleting large objects
del large_dataframe
import gc
gc.collect()

# Use efficient dtypes
df['category_col'] = df['category_col'].astype('category')
df['small_int'] = df['small_int'].astype('int8')
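
You can also declare compact dtypes at read time, so the larger defaults are never allocated (column names are placeholders; int8 assumes the column has no missing values):

import pandas as pd

# Set dtypes up front instead of converting after the fact
df = pd.read_csv(
    'data.csv',
    dtype={'category_col': 'category', 'small_int': 'int8'},
)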

Display Settings

Configure how outputs render:

# Pandas display options
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_rows', 100)
pd.set_option('display.float_format', '{:.2f}'.format)

# Matplotlib defaults
plt.rcParams['figure.figsize'] = [10, 6]
plt.rcParams['figure.dpi'] = 100

# Plotly defaults (for inline display)
import plotly.io as pio
pio.renderers.default = 'notebook'

Kernel Lifecycle

Understanding the kernel lifecycle helps avoid surprises:

State          What's Happening
Connecting     Establishing WebSocket connection to the kernel server
Warming up     Python environment initializing, loading packages. Cells you run will queue.
Connected      Ready to execute. All pre-installed packages available, plus your installed packages and variables.
Restarted      Fresh environment. Pre-installed packages only; reinstall custom packages and re-run setup cells.
Disconnected   Nothing running. Reconnect to get a fresh environment.
If you run cells while connecting or warming up, they queue and execute automatically once the kernel is ready.

Resource Limits

Kernels have built-in limits to keep sessions stable and resources shared fairly:

Limit               Value        Description
Memory              ~800 MB      Per-kernel RAM limit
Execution timeout   5 minutes    Maximum time a single cell may run
Idle timeout        10 minutes   Disconnects after inactivity (auto-reconnects when you run code)

If your code hits the execution timeout, the kernel terminates and you'll need to reconnect. For long-running computations, consider breaking them into smaller chunks or using more efficient algorithms.
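
For instance, a computation that would blow past the five-minute cap can often be split across cells, with each cell finishing well inside the limit (a sketch; expensive_transform is a hypothetical function):

import pandas as pd

# Cell 1: first half only, stays under the per-cell timeout
half = len(df) // 2
part1 = expensive_transform(df.iloc[:half])   # hypothetical function

# Cell 2: second half, then combine the partial results
part2 = expensive_transform(df.iloc[half:])
result = pd.concat([part1, part2])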

Put your imports and pip installs in the first cell. Run it after connecting or restarting.
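
A setup cell might look like this (package names are examples):

# Setup cell: run once after every connect or restart
!pip install geopandas folium    # example custom packages

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

pd.set_option('display.max_columns', 50)
plt.rcParams['figure.figsize'] = [10, 6]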

Tips for Smooth Sessions

  1. Group imports at the top – Easy to re-run after restart
  2. Use requirements cells – Put !pip install commands in their own cell
  3. Clear outputs before sharing – Reduces notebook file size
  4. Restart when things get weird – Memory leaks happen; fresh start helps

Next Steps