Python Environment

What's pre-installed in Margin kernels and how to add packages.

Margin kernels run Python 3.11 with a curated set of data science libraries pre-installed. You can also install additional packages within your session.
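
You can confirm the exact versions from any cell:

import sys
import pandas as pd

print(sys.version)       # interpreter version (3.11.x)
print(pd.__version__)    # pre-installed pandas version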

Pre-installed Libraries

Every kernel starts with these libraries ready to import:

Data Manipulation

Library   Version   Import
pandas    2.2+      import pandas as pd
numpy     1.26+     import numpy as np
polars    0.20+     import polars as pl

Visualization

Library      Version   Import
matplotlib   3.8+      import matplotlib.pyplot as plt
seaborn      0.13+     import seaborn as sns
plotly       5.18+     import plotly.express as px
altair       5.2+      import altair as alt

Machine Learning

Library        Version   Import
scikit-learn   1.4+      from sklearn import ...
statsmodels    0.14+     import statsmodels.api as sm
scipy          1.12+     import scipy

Utilities

Library          Version   Import
requests         2.31+     import requests
beautifulsoup4   4.12+     from bs4 import BeautifulSoup
openpyxl         3.1+      import openpyxl
pyarrow          15+       import pyarrow
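
Some of these are used indirectly: pandas relies on openpyxl for .xlsx files and pyarrow for Parquet, so both formats work out of the box (filenames below are placeholders):

import pandas as pd

# pandas delegates to openpyxl and pyarrow for these formats
df = pd.read_excel('report.xlsx', engine='openpyxl')
df.to_parquet('report.parquet', engine='pyarrow')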

Margin Tools

Library   Description
margin    Load datasets from your workspace
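
The margin API isn't documented on this page, so treat the load call below as a placeholder; you can always inspect the package from a cell:

import margin

# Discover what the package actually exposes
help(margin)

# Hypothetical usage; the real function name may differ
# df = margin.load_dataset('my-dataset')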

Installing Additional Packages

Need something else? Install packages within your session:

# Install a package
!pip install package-name

# Install a specific version
!pip install package-name==1.2.3

# Install multiple packages
!pip install package1 package2 package3
Installed packages persist for the duration of your kernel session. When you disconnect and reconnect, you'll need to reinstall them.
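
Because of this, a guard cell that installs only when a package is missing is safe to re-run after any reconnect. A sketch, using umap-learn purely as an example package:

import importlib.util

# Install only if the package isn't already importable
if importlib.util.find_spec('umap') is None:
    !pip install umap-learn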

Common Additions

# NLP
!pip install nltk spacy transformers

# Time series
!pip install prophet pmdarima

# Geographic
!pip install geopandas folium

# Financial
!pip install yfinance pandas-ta

Working with Large Datasets

For large files, use efficient formats and lazy loading:

# Parquet is faster than CSV for large files
df = pd.read_parquet('large_file.parquet')

# Or use Polars for even better performance
import polars as pl
df = pl.read_parquet('large_file.parquet')

# Chunked reading for huge CSVs
chunks = pd.read_csv('huge.csv', chunksize=10000)
for chunk in chunks:
    process(chunk)
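
The reads above are eager. For true lazy loading, Polars can scan a file and defer all work until you collect, reading only the columns and rows the query needs (column names below are placeholders):

import polars as pl

# Nothing is read from disk until .collect() runs
lazy = pl.scan_parquet('large_file.parquet')

result = (
    lazy
    .filter(pl.col('amount') > 0)       # placeholder column
    .select(['amount', 'category'])     # only these columns are read
    .collect()
)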

Memory Management

Kernels have memory limits. Keep your session healthy:

# Check a DataFrame's memory footprint
print(f"DataFrame size: {df.memory_usage(deep=True).sum() / 1e6:.1f} MB")

# Free memory by deleting large objects
del large_dataframe
import gc
gc.collect()

# Use efficient dtypes
df['category_col'] = df['category_col'].astype('category')
df['small_int'] = df['small_int'].astype('int8')
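
You can also declare compact dtypes at read time, so the larger defaults are never allocated (column names are placeholders; int8 assumes the column has no missing values):

import pandas as pd

# Set dtypes up front instead of converting after the fact
df = pd.read_csv(
    'data.csv',
    dtype={'category_col': 'category', 'small_int': 'int8'},
)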

Display Settings

Configure how outputs render:

# Pandas display options
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_rows', 100)
pd.set_option('display.float_format', '{:.2f}'.format)

# Matplotlib defaults
plt.rcParams['figure.figsize'] = [10, 6]
plt.rcParams['figure.dpi'] = 100

# Plotly defaults (for inline display)
import plotly.io as pio
pio.renderers.default = 'notebook'

Kernel Lifecycle

Understanding the kernel lifecycle helps avoid surprises:

State          What's Happening
Connecting     Establishing WebSocket connection to the kernel server
Warming up     Python environment initializing, loading packages. Cells you run will queue.
Connected      Ready to execute. All pre-installed packages available, plus your installed packages and variables.
Restarted      Fresh environment. Pre-installed packages only; reinstall custom packages and re-run setup cells.
Disconnected   Nothing running. Reconnect to get a fresh environment.
If you run cells while connecting or warming up, they queue and execute automatically once the kernel is ready.

Resource Limits

Kernels have built-in limits to keep sessions stable and resources shared fairly:

Limit               Value        Description
Memory              ~800 MB      Per-kernel RAM limit
Execution timeout   5 minutes    Maximum time a single cell may run
Idle timeout        10 minutes   Disconnects after inactivity (auto-reconnects when you run code)

If your code hits the execution timeout, the kernel terminates and you'll need to reconnect. For long-running computations, consider breaking them into smaller chunks or using more efficient algorithms.
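
For instance, a computation that would blow past the five-minute cap can often be split across cells, with each cell finishing well inside the limit (a sketch; expensive_transform is a hypothetical function):

import pandas as pd

# Cell 1: first half only, stays under the per-cell timeout
half = len(df) // 2
part1 = expensive_transform(df.iloc[:half])   # hypothetical function

# Cell 2: second half, then combine the partial results
part2 = expensive_transform(df.iloc[half:])
result = pd.concat([part1, part2])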

Put your imports and pip installs in the first cell. Run it after connecting or restarting.
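
A setup cell might look like this (package names are examples):

# Setup cell: run once after every connect or restart
!pip install geopandas folium    # example custom packages

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

pd.set_option('display.max_columns', 50)
plt.rcParams['figure.figsize'] = [10, 6]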

Tips for Smooth Sessions

  1. Group imports at the top – Easy to re-run after restart
  2. Use requirements cells – Put !pip install commands in their own cell
  3. Clear outputs before sharing – Reduces notebook file size
  4. Restart when things get weird – Memory leaks happen; fresh start helps

Next Steps