
A Practical Guide to Reproducible Analysis

How to make sure your analysis can be re-run and verified, without becoming a software engineer.

What Reproducibility Actually Means

Reproducibility has an intimidating reputation. Mention it and people think of academic papers with containerized environments, pinned dependencies, and automated test suites. That's one version of reproducibility, but it's not the only one, and it's not always the most useful one for working analysts.

In practice, reproducible analysis means something simpler: can someone else understand what you did and verify it? Can they read your code and follow your reasoning? Can they run your analysis on the same data and get similar results? Can they trace any output back to its source?

Most analytical work doesn't need lab-grade reproducibility. It needs enough transparency that a colleague could check your work if they needed to. That's a much more achievable bar.

Three Levels of Reproducibility

Think of reproducibility on a spectrum. At the first level, your analysis is reviewable—someone can read your code and understand the methodology. This is the bare minimum. If your code is so convoluted that nobody can follow it, you don't have reproducibility; you have a black box.

At the second level, your analysis is re-runnable. Someone can take your code and your data, execute everything, and get results that are similar to what you got. This requires that your code actually works end-to-end and that the data is available.

At the third level, your analysis is verifiable. Someone can trace any specific output—any chart, any number, any claim—back to the exact code and data that produced it. This is where data lineage becomes important. It's not enough to know that the analysis could be reproduced; you need to know that this specific output came from this specific run, using this specific dataset.

Most business analysis should aim for levels one and two. Level three is a bonus that becomes important when stakes are high or trust is low.

The Minimum Viable Setup

You don't need complex infrastructure to achieve reasonable reproducibility. Start with some basic hygiene that takes almost no extra effort.

Keep each analysis self-contained in a single notebook. Don't scatter code across multiple files unless you have a good reason. A single notebook with clear sections—setup, data loading, cleaning, analysis, conclusions—is easier for someone else to follow than a project with eight Python files and a README that's three months out of date.

Structure your notebook to flow from raw data to final output. The first cells should load data. The middle cells should transform and analyze it. The final cells should produce the outputs that matter. If someone runs all cells from top to bottom, they should get your results.

Document your assumptions in markdown cells, especially the ones that aren't obvious from the code. What date range are you using? What records are you excluding and why? What would change your conclusions? These notes cost almost nothing to add and save hours of confusion later.

Print your package versions somewhere near the top of the notebook. This isn't about being pedantic—it's about future debugging. When something breaks six months from now, knowing that you were using pandas 2.1 instead of pandas 2.2 can save an afternoon of head-scratching.

import pandas as pd
import numpy as np
# Record the library versions alongside the results for future debugging.
print(f"pandas: {pd.__version__}")
print(f"numpy: {np.__version__}")

Common Reproducibility Killers

Some coding patterns destroy reproducibility without being obviously wrong. Learn to recognize them.

Random seeds are the classic gotcha. If your analysis uses any randomness—train/test splits, sampling, Monte Carlo simulations—the results will be different every time you run it unless you set a seed. This makes your analysis unreproducible by definition. Always set seeds explicitly.

import numpy as np
# Fix the seed so sampling, splits, and simulations give the same results every run.
np.random.seed(42)

Live data queries are another trap. If you query a database and the data changes daily, your notebook will produce different results tomorrow than it did today. There's nothing wrong with live queries for monitoring dashboards, but for analytical work that you're going to share, consider using snapshots. Either save the data to a file or add a date filter to your query that pins to a specific point in time.
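As a rough sketch of the snapshot approach (the table name, cutoff date, and connection string below are placeholders, not anything from a real setup):

import pandas as pd
from pathlib import Path

snapshot = Path("data") / "orders_through_2023-12-31.csv"

if snapshot.exists():
    # Later runs reuse the saved file, so results don't drift as the table changes.
    df = pd.read_csv(snapshot)
else:
    # First run: query with an explicit cutoff date, then save the snapshot.
    df = pd.read_sql(
        "SELECT * FROM orders WHERE order_date <= '2023-12-31'",
        "postgresql://user:password@host/analytics",  # placeholder connection string
    )
    snapshot.parent.mkdir(exist_ok=True)
    df.to_csv(snapshot, index=False)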

Relative file paths break constantly. Your code works on your laptop where the working directory is set up just right, but it fails on anyone else's machine. Use absolute paths, or at least explicit path construction that doesn't depend on hidden assumptions about the current directory.

from pathlib import Path
# Build the path explicitly instead of relying on the notebook's working directory.
data_dir = Path.home() / "projects" / "customer-analysis" / "data"  # adjust to where your data lives
df = pd.read_csv(data_dir / "customers.csv")

Hidden state is the sneakiest problem. Notebooks let you run cells out of order, which is great for exploration but terrible for reproducibility. You modify a variable in cell 5, but you don't re-run cells 1-4. Now the notebook's in-memory state doesn't match what you'd get from running top-to-bottom. This leads to the frustrating experience where your notebook works for you but breaks for everyone else.

The "Run All" Test

Before you share any analysis with anyone, do this: restart the kernel, then run all cells from top to bottom. No manual intervention. No running cells out of order. Just Run All.

If it doesn't work, your analysis isn't reproducible. Period. Maybe you deleted a variable that later cells depend on. Maybe you have cells that only work after you've run them once before. Whatever the issue, if Run All fails, someone else definitely won't be able to reproduce your work.

This test takes thirty seconds and catches most reproducibility issues before they embarrass you.
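If you want the same check from the command line, say as part of a pre-commit routine, nbconvert can execute a notebook top to bottom (assuming it's named analysis.ipynb):

jupyter nbconvert --to notebook --execute --inplace analysis.ipynb

If any cell raises an error, the command fails, which also makes it easy to wire into a CI job.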

Levels of Investment

There's a spectrum of how much effort you can put into reproducibility. The right level depends on how important the analysis is and how likely someone is to need to verify it later.

At the low end, just restart and run all before sharing. This catches the obvious problems with almost no effort. Add markdown documentation for anything non-obvious. This takes minutes and is almost always worth it.

At the medium level, save data snapshots instead of relying on live queries. Pin your package versions explicitly, either in comments or in a requirements file. This takes a bit more discipline but makes your analysis robust to changes in the underlying data or environment.
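A pinned requirements file doesn't have to be elaborate; a few lines covering the packages you actually import is enough (the versions shown are just examples):

# requirements.txt
pandas==2.1.4
numpy==1.26.2

Running pip freeze > requirements.txt instead captures everything installed in the environment, which is noisier but more complete.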

At the high end, use containers to capture the full environment. Docker lets you specify not just the packages but the exact versions of Python, the operating system dependencies, everything. This is overkill for most business analysis but appropriate for anything that might end up in a regulatory filing or a published paper.
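As a minimal sketch, a Dockerfile for this kind of setup pins the Python version, installs the pinned requirements, and executes the notebook end to end; the file names are illustrative:

# Dockerfile
FROM python:3.11-slim
WORKDIR /analysis
# requirements.txt should include jupyter so nbconvert is available.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY analysis.ipynb .
COPY data/ ./data/
CMD ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", "analysis.ipynb"]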

Start at the low end. Move up only when you have a specific reason.

Reproducibility in Margin

Margin's design supports reproducibility without requiring heroic discipline. Datasets are stored in the platform, so the data you upload is the data you analyze—no wondering which version of the CSV you used. Notebooks save their outputs, so the state at save time is preserved even if you later make changes.

When you create a brief from a notebook, the connection is recorded. Every artifact in the brief traces back to a specific cell in a specific notebook—and from there to the datasets used. Someone can click through to see the code that produced any chart or table, and trace it all the way back to the source data. The lineage is built into the system rather than maintained through manual documentation.

This doesn't make reproducibility automatic—you still need to write clear code and document your assumptions—but it removes some of the friction that makes reproducibility hard to maintain in practice.

Good Enough Is Good Enough

Perfect reproducibility is an ideal that's rarely achievable and often not necessary. Live data changes. Business contexts evolve. The world doesn't stand still while you finalize your analysis.

What you're really after is transparency: someone can understand what you did, verify the key parts, and trust the conclusions. If you achieve that, your analysis is reproducible enough for most purposes. Don't let the perfect be the enemy of the good.


Margin makes it easy to share reproducible analysis. Upload your data, do the work, share a link. Try it free.