Sample Size and Significance

Scattered dots converging into a tighter band as uncertainty narrows — Original Delta-X illustration.

Target audience: Traders who see a great win rate over a handful of trades and want to know whether to believe it.

Learning objectives

✓Explain why small samples produce large swings in win rate and expectancy.
✓Use a rough rule of thumb for how many trades a claim needs.
✓Recognise that low-frequency systems take years to validate.
✓Treat a backtest result as a range, not a single number.

Definition

Sample size is the number of independent trades behind a backtest statistic; significance is whether that statistic is large enough, relative to its noise, to be unlikely to come from luck alone.

Why it matters

An edge measured on 12 trades is a story; the same edge measured on 400 trades is evidence. Many blown-up systems were validated on too few trades, where ordinary variance looked like skill. Knowing how much data a claim needs is what separates a tradeable edge from a coincidence.

Noise shrinks slower than you think

The uncertainty in an average (its standard error) shrinks in proportion to one over the square root of the number of trades, not linearly. To halve the noise around your measured expectancy you need four times as many trades. That is why 20 trades tell you almost nothing and 400 tell you a lot: the first 20 leave an error bar wide enough to contain both a great system and a losing one, and going to 400, twenty times the data, narrows it by a factor of about 4.5 (the square root of 20). A result without a sample size attached is not a result.
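A minimal Python sketch of the square-root rule, assuming trade results are recorded in R-multiples (profit divided by initial risk) and that a normal approximation is good enough for a rough error bar. The function name `expectancy_interval` and the 55%-win sample are hypothetical; repeating the same 20 trades to reach 80 and 400 holds the win rate fixed so only the sample size changes.

```python
import math
import statistics


def expectancy_interval(trade_results, z=1.96):
    """Return (mean, standard_error, low, high) for a list of per-trade results.

    Standard error = sample standard deviation / sqrt(n), so it shrinks with the
    square root of the trade count. z=1.96 gives a rough 95% range under a
    normal approximation.
    """
    n = len(trade_results)
    if n < 2:
        raise ValueError("need at least two trades to estimate the noise")
    mean = statistics.fmean(trade_results)
    standard_error = statistics.stdev(trade_results) / math.sqrt(n)
    return mean, standard_error, mean - z * standard_error, mean + z * standard_error


# Hypothetical results in R: 11 wins of +1R and 9 losses of -1R (55% win rate).
sample_20 = [1.0] * 11 + [-1.0] * 9
# The same 55% pattern repeated to 80 and 400 trades, to isolate the effect of n.
sample_80 = sample_20 * 4
sample_400 = sample_20 * 20

for label, sample in (("20", sample_20), ("80", sample_80), ("400", sample_400)):
    mean, se, low, high = expectancy_interval(sample)
    print(f"{label} trades: {mean:+.2f}R per trade, SE {se:.3f}, range {low:+.2f}R to {high:+.2f}R")
```

At 20 trades the range spans both a losing system and a strongly profitable one; four times the trades roughly halves the standard error, and twenty times the trades cuts it by about 4.5.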

A working rule of thumb

For a system with a modest edge, treat roughly 100 trades as the minimum to form an opinion and several hundred to trust expectancy and drawdown. The smaller the edge per trade, the larger the sample you need to see it through the noise. A high win rate over 15 trades is one of the most common traps: it is exactly what a coin-flip system also produces some of the time.
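The rule of thumb can be made a little more concrete with a back-of-envelope normal approximation, which is not part of the lesson itself: for the measured edge to sit about z standard errors above zero you need roughly n = (z · σ / edge)² trades. This sketch assumes results in R with a per-trade standard deviation of 1R and z = 2; `trades_needed` and `prob_at_least_wins` are hypothetical helper names. The second function checks the 15-trade trap directly with the exact binomial distribution for a 50/50 coin.

```python
import math


def trades_needed(edge_per_trade, stdev_per_trade, z=2.0):
    """Rough trade count for the average to sit z standard errors above zero.

    Solves edge = z * stdev / sqrt(n) for n. A normal-approximation rule of
    thumb, not a formal power calculation.
    """
    return math.ceil((z * stdev_per_trade / edge_per_trade) ** 2)


def prob_at_least_wins(n_trades, min_wins, p_win=0.5):
    """Exact binomial probability of at least min_wins wins in n_trades."""
    return sum(
        math.comb(n_trades, k) * p_win**k * (1 - p_win) ** (n_trades - k)
        for k in range(min_wins, n_trades + 1)
    )


# Hypothetical edges in R, each with an assumed 1R per-trade standard deviation.
for edge in (0.3, 0.2, 0.1, 0.05):
    print(f"edge {edge:+.2f}R: about {trades_needed(edge, 1.0)} trades")

# How often a pure coin flip wins 10 or more of 15 trades (a 67%+ win rate).
print(f"coin flip, 10+ wins of 15: {prob_at_least_wins(15, 10):.1%}")
```

Halving the edge quadruples the required sample, which is the square-root rule seen from the other side. The coin-flip line comes out at about 15%, so roughly one in seven zero-edge systems would show a 67% win rate over 15 trades.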

Frequency sets the calendar

Sample size interacts with how often the system trades. A setup that fires twice a week needs years to accumulate a few hundred trades, which means you cannot validate it quickly and must lean harder on logic and out-of-sample care. A higher-frequency system reaches significance in months. Neither is inherently better, but the low-frequency system demands more patience and more humility about what its short record can prove.
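The calendar arithmetic is simple enough to sketch. This assumes a steady trade rate and 52 trading weeks a year; `years_to_sample` is a hypothetical helper and the two rates are illustrative.

```python
def years_to_sample(target_trades, trades_per_week, weeks_per_year=52):
    """Calendar years needed to collect target_trades at a steady weekly rate."""
    if trades_per_week <= 0:
        raise ValueError("trades_per_week must be positive")
    return target_trades / (trades_per_week * weeks_per_year)


# Illustrative rates: a setup firing twice a week vs. about twice a trading day.
for label, per_week in (("2 per week", 2), ("10 per week", 10)):
    for target in (100, 300):
        years = years_to_sample(target, per_week)
        print(f"{label}: {target} trades in about {years:.1f} years ({years * 12:.0f} months)")
```

At two trades a week, 300 trades takes close to three years; at ten a week it takes about seven months.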

Visual models

Expectancy: win rate and payoff together; a high win rate with a tiny payoff can still lose
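The visual model's point can be checked with the standard expectancy formula, E = p × W − (1 − p) × L, where p is the win rate, W the average win and L the average loss entered as a positive size. The helper name and the two example systems are hypothetical; results are in R.

```python
def expectancy(win_rate, avg_win, avg_loss):
    """Expected result per trade.

    avg_loss is a positive size (1.0 means an average loss of 1R).
    """
    if not 0.0 <= win_rate <= 1.0:
        raise ValueError("win_rate must be between 0 and 1")
    return win_rate * avg_win - (1 - win_rate) * avg_loss


# Hypothetical systems, in R-multiples.
high_hit_rate = expectancy(0.90, 0.10, 1.00)  # 90% winners, tiny wins, full-size losses
low_hit_rate = expectancy(0.40, 2.00, 1.00)   # 40% winners, wins twice the size of losses
print(f"90% win rate: {high_hit_rate:+.3f}R per trade")
print(f"40% win rate: {low_hit_rate:+.3f}R per trade")
```

The 90% system loses about 0.01R per trade while the 40% system makes about 0.2R.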

Worked examples

Example 1: The same coin, different stories

A break-even system (true expectancy zero) tested over 20 trades will, by chance, show a positive result almost half the time, sometimes a strongly positive one. Over 400 trades that same zero-edge system clusters tightly around zero and the illusion disappears. The lesson: a positive small-sample backtest is consistent with no edge at all, so the burden is on the sample, not the headline number.
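The example can be checked exactly rather than simulated. This sketch assumes the simplest break-even system, a fair coin where each trade is +1R or −1R; that model and the helper names are assumptions made for illustration. It computes how often the average comes out positive, and how often it reaches at least +0.2R per trade, over 20 and over 400 trades.

```python
import math
from fractions import Fraction


def prob_positive(n_trades):
    """Exact P(average > 0) for n fair +1R/-1R trades: more wins than losses."""
    wins_needed = n_trades // 2 + 1
    hits = sum(math.comb(n_trades, w) for w in range(wins_needed, n_trades + 1))
    return hits / 2**n_trades


def prob_avg_at_least(n_trades, min_avg_r):
    """Exact P(average >= min_avg_r) for n fair +1R/-1R trades.

    With w wins the average is (2w - n) / n, so we need w >= n * (1 + min_avg_r) / 2.
    Converting via str turns a float like 0.2 into exactly 1/5, so binary
    rounding cannot shift the win threshold by one.
    """
    threshold = Fraction(str(min_avg_r))
    wins_needed = max(0, math.ceil(n_trades * (1 + threshold) / 2))
    hits = sum(math.comb(n_trades, w) for w in range(wins_needed, n_trades + 1))
    return hits / 2**n_trades


for n in (20, 400):
    print(f"{n} trades: positive {prob_positive(n):.1%}, "
          f"at least +0.2R per trade {prob_avg_at_least(n, 0.2):.3%}")
```

Over 20 trades roughly two in five zero-edge backtests finish positive and about one in four show +0.2R per trade or better. Over 400 trades the sign is still close to a coin toss, but the size collapses: a +0.2R average becomes vanishingly rare, which is what clustering tightly around zero means in practice.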

Common mistakes

Trusting a win rate or expectancy measured over a few dozen trades.

Reporting expectancy as a single number with no error bar or trade count.

Validating a low-frequency system on a short window and calling it proven.

Adding trades from different regimes and treating them as one homogeneous sample.

Stopping the test as soon as the result looks good.
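The last mistake, stopping as soon as the result looks good, is easy to underestimate, so here is a small simulation sketch. It assumes a zero-edge system of fair +1R/−1R trades and a hypothetical "looks good" bar of +0.2R per trade; the parameter values and the function name are illustrative. Each simulated path is judged two ways: once at the end of a fixed-length test, and once by a tester who checks after every trade from trade 20 onward and would stop at the first pass.

```python
import random


def compare_stopping_rules(n_paths=2000, max_trades=200, min_trades=20,
                           good_avg_r=0.2, seed=1):
    """Share of zero-edge backtests judged 'good' under two stopping rules.

    Every trade is a fair +1R/-1R coin flip, so the true expectancy is zero.
    Returns (fixed_rate, peeking_rate) computed on the same simulated paths.
    """
    rng = random.Random(seed)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(n_paths):
        total = 0.0
        cleared_at_some_check = False
        for t in range(1, max_trades + 1):
            total += rng.choice((1.0, -1.0))
            # A peeking tester would stop at the first check that clears the bar.
            if t >= min_trades and total / t >= good_avg_r:
                cleared_at_some_check = True
        if total / max_trades >= good_avg_r:
            fixed_hits += 1
        if cleared_at_some_check:
            peeking_hits += 1
    return fixed_hits / n_paths, peeking_hits / n_paths


fixed_rate, peeking_rate = compare_stopping_rules()
print(f"judged once at the end: {fixed_rate:.1%} look good")
print(f"stop at first good look: {peeking_rate:.1%} look good")
```

Because the final check is one of the peeking checks, the peeking rate can never be lower than the fixed rate; with these settings it comes out many times higher, even though no path has any edge.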

Myth vs reality

Myth

That a high win rate is meaningful regardless of how few trades produced it.

Reality

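A high win rate means little without the trade count behind it: over a small sample it is exactly what a coin-flip system also produces some of the time.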

Myth

That doubling the trades halves the uncertainty.

Reality

Uncertainty falls with the square root of the trade count, so halving it takes four times as many trades.

Myth

That a short, profitable record proves an edge.

Reality

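A short profitable record is consistent with no edge at all; it becomes evidence only once enough independent trades stand behind it.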

Risk considerations

Sizing up after a small-sample 'edge' is how variance becomes ruin.
Low-frequency edges can look broken for long stretches that are still within normal variance.
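To put a number on the second point, here is a normal-approximation sketch of how often a genuine edge still shows a net loss over a stretch of trades. The +0.1R edge, 1R per-trade standard deviation and two-trades-a-week rate are hypothetical; `prob_losing_window` is an illustrative name, and the calculation ignores fat tails and serial correlation.

```python
from statistics import NormalDist


def prob_losing_window(edge_r, stdev_r, n_trades):
    """Approximate chance that n_trades of a real edge still sum to a net loss.

    The window total has mean n * edge and standard deviation sqrt(n) * stdev,
    so P(total < 0) is the standard normal CDF at -edge * sqrt(n) / stdev.
    """
    return NormalDist().cdf(-edge_r * n_trades ** 0.5 / stdev_r)


trades_per_year = 2 * 52  # hypothetical low-frequency system, about 2 trades a week
for n in (25, 50, 100):
    p = prob_losing_window(0.1, 1.0, n)
    print(f"{n} trades (~{n / trades_per_year:.1f} years): {p:.0%} chance of a net loss")
```

Under these assumptions a full year of a real +0.1R edge still finishes in the red roughly one time in six.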

Practice exercises

1. Attach a sample size to every claim

For each system statistic you rely on, write the number of trades behind it and judge whether it is enough.

List your headline stats: win rate, expectancy, max drawdown.
Write the trade count behind each one.
Flag any claim resting on fewer than ~100 trades as unproven.
Estimate how long, at the system's frequency, several hundred trades would take.
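The exercise can be turned into a small checklist script. Everything here is a hypothetical sketch: the `Claim` record, the example statistics and trade counts, the 100-trade flag taken from the rule of thumb, and a 300-trade target standing in for "several hundred".

```python
from dataclasses import dataclass


@dataclass
class Claim:
    name: str
    value: str         # the headline figure, kept as text for display
    trade_count: int   # independent trades behind the figure


def audit(claims, trades_per_week, min_trades=100, target_trades=300):
    """Return (name, value, trade_count, status, weeks_to_target) per claim."""
    if trades_per_week <= 0:
        raise ValueError("trades_per_week must be positive")
    rows = []
    for claim in claims:
        status = "unproven" if claim.trade_count < min_trades else "ok"
        shortfall = max(0, target_trades - claim.trade_count)
        rows.append((claim.name, claim.value, claim.trade_count, status,
                     shortfall / trades_per_week))
    return rows


# Hypothetical headline stats for a system that trades about twice a week.
claims = [
    Claim("win rate (live)", "71%", 38),
    Claim("expectancy (backtest)", "+0.35R", 150),
    Claim("max drawdown (backtest)", "-6R", 150),
]
for name, value, count, status, weeks in audit(claims, trades_per_week=2):
    print(f"{name} {value}: {count} trades, {status}, ~{weeks:.0f} more weeks to 300")
```

The live win rate is flagged as unproven, and at two trades a week it would take about two and a half more years to reach 300 trades.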

Quiz

Q1. Why are small-sample backtests untrustworthy?

Q2. Roughly how many trades does a modest edge need to form an opinion?

Q3. How does trade frequency affect validation?

Next lesson

Overfitting and Curve-Fitting

This lesson is educational content only and is not financial advice. Trading involves substantial risk. A tested process improves decision quality and survivability; it does not predict the market or guarantee any outcome. Trade only with risk you can afford to lose.