How the Math Works
Everything you need to understand this research, explained like you've never taken a statistics class. Start here if the other pages feel dense.
1. What is a "regression" and why do we use it?
The problem we're solving
We want to know: do stock prices behave differently around paydays? But lots of things affect stock prices — the day of the week, the time of year, what happened yesterday. If we just look at "payday days vs. other days," we might accidentally pick up one of those other effects and think it's a payday effect.
Ice cream and drowning. Ice cream sales and drowning deaths both go up in summer. If you just compared the two, you'd conclude ice cream causes drowning. But both are caused by a third thing: hot weather. A regression is the tool that says "hold weather constant, and now check if ice cream still predicts drowning." (It doesn't.)
In our study: "Hold the day-of-week, month, and recent market moves constant, and now check if payday timing still predicts stock returns."
How it works, step by step
A regression takes each trading day and builds an equation:
today's return = baseline (the average day)
+ ?? × is it a payday window? ← THIS is what we want to measure
+ ?? × is it Monday? (Mondays tend to be worse)
+ ?? × is it Friday? (Fridays tend to be better)
+ ?? × is it January? (January has its own pattern)
+ ?? × is it the turn of the month? (last day + first 3 days)
+ ?? × what did the market do yesterday? (momentum)
+ random noise (stuff we can't predict)
The regression fills in every ?? with the number that best fits 16,680 trading days of data. The number in front of "is it a payday window?" is the beta coefficient (β) — the headline number in our study.
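In code, the fill-in-the-blanks step is just a least-squares solve. This is a toy sketch on simulated data, not the study's dataset or code; the planted effect sizes (+0.10 for payday, -0.05 for Monday) are made up for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000  # simulated trading days (illustrative, not the study's data)

# Yes/no questions become 0/1 columns ("dummy variables").
payday = (rng.random(n) < 0.10).astype(float)  # is it a payday window?
monday = (rng.random(n) < 0.20).astype(float)  # is it Monday?
noise = rng.normal(0, 0.5, n)                  # stuff we can't predict

# Build returns with a planted +0.10 payday effect and -0.05 Monday effect.
returns = 0.02 + 0.10 * payday - 0.05 * monday + noise

# Design matrix: a column of ones (the baseline) plus one column per question.
X = np.column_stack([np.ones(n), payday, monday])
beta, *_ = np.linalg.lstsq(X, returns, rcond=None)

print(beta)  # close to the planted values: baseline 0.02, payday +0.10, Monday -0.05
```

The regression recovers the payday coefficient even though every single day is dominated by noise, because it averages over thousands of days while holding the Monday column constant.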
2. What is a "p-value"?
The question it answers
Suppose you found β = +0.10%. Cool — but is that real, or did it just happen by chance? Maybe you got unlucky with which days fell in the "payday window" and which didn't. The p-value answers: "If there were truly no payday effect, how often would random chance produce a number this big or bigger?"
You suspect a coin is rigged. You flip it 100 times.
- 52 heads: Meh. That's close enough to 50 that you wouldn't even blink. (p ≈ 0.38)
- 57 heads: Hmm, a little suspicious, but you've seen that happen with a fair coin. (p ≈ 0.10)
- 60 heads: Now you're paying attention. This is uncommon for a fair coin. (p ≈ 0.03)
- 65 heads: Okay, this coin is almost certainly rigged. (p ≈ 0.002)
- 75 heads: There is essentially zero chance this coin is fair. (p < 0.000001)
The p-value is the probability of seeing a result this extreme (or more) if the coin were perfectly fair. The smaller the p-value, the harder it is to explain away as luck.
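The coin numbers above come straight from the binomial distribution, and you can check them with nothing but the standard library. These are one-sided p-values: the probability of seeing at least that many heads from a fair coin.

```python
from math import comb

def p_value_heads(k, n=100):
    """One-sided p-value: chance a fair coin gives k or more heads in n flips."""
    total = 2 ** n
    return sum(comb(n, i) for i in range(k, n + 1)) / total

for heads in (52, 57, 60, 65, 75):
    print(heads, p_value_heads(heads))  # ≈ 0.38, 0.10, 0.03, 0.002, < 1e-6
```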
The significance threshold
Scientists have agreed on a convention: if p < 0.05 (pure luck would produce a result this extreme less than 5% of the time), we call the result "statistically significant." This isn't a magic number — it's just a widely accepted standard. Here's what the different levels mean:

| Level | p-value | Interpretation |
|---|---|---|
| Not significant | p ≥ 0.10 | Probably noise |
| Suggestive (.) | p < 0.10 | Interesting, not conclusive |
| Significant (*) | p < 0.05 | Likely real |
| Highly significant (**) | p < 0.01 | Almost certainly real |
| Extremely significant (***) | p < 0.001 | Beyond reasonable doubt |
Throughout this site, we mark significance with stars: * = p < 0.05, ** = p < 0.01, *** = p < 0.001, and . = p < 0.10 (suggestive).
3. What are "HAC standard errors" and why should I care?
The problem with basic statistics on stock data
Most statistics assume that each data point is independent — like each flip of a coin has nothing to do with the last one. In other words, basic methods quietly assume:
- Each day is independent of the last
- Market volatility is the same every day
- A calm day and a crisis day carry equal weight

But stock prices don't work like coin flips:
- Bad days tend to follow bad days (momentum)
- Some weeks are wild, some are calm (volatility clustering)
- A 2008 crisis day is very different from a normal Tuesday
If you use basic statistics on stock data, your p-values will be too optimistic — results will look more significant than they really are. It's like wearing rose-colored glasses that make everything seem more important. HAC standard errors fix this. HAC stands for heteroskedasticity-and-autocorrelation-consistent (the most common version is the Newey-West estimator): instead of assuming independence, it measures how strongly nearby days move together and widens the error bars accordingly, so the p-values stay honest.
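To see the idea concretely, here is a minimal sketch of a Newey-West style HAC standard error for a simple average (the study presumably used a statistics library; this hand-rolled version just shows the mechanism). On a series where each day carries over 70% of the previous day, the honest error bar is much wider than the naive one.

```python
import numpy as np

def newey_west_se(x, max_lag=5):
    """HAC (Newey-West) standard error of the sample mean of series x.

    The naive SE assumes independent observations; this version adds back
    the covariance between nearby days, down-weighted by the Bartlett kernel."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    var = d @ d / n                       # lag-0 variance
    for lag in range(1, max_lag + 1):
        weight = 1 - lag / (max_lag + 1)  # Bartlett kernel weight
        cov = d[lag:] @ d[:-lag] / n      # autocovariance at this lag
        var += 2 * weight * cov
    return np.sqrt(var / n)

# Simulated autocorrelated series: each day is 70% yesterday plus fresh noise.
rng = np.random.default_rng(1)
eps = rng.normal(size=10000)
x = np.empty_like(eps)
x[0] = eps[0]
for t in range(1, len(x)):
    x[t] = 0.7 * x[t - 1] + eps[t]

naive = x.std(ddof=1) / np.sqrt(len(x))
hac = newey_west_se(x, max_lag=20)
print(naive, hac)  # the HAC standard error is noticeably larger
```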
4. What is "FDR correction" and the "multiple comparisons problem"?
You're at a party with 100 people. You test whether each person's birthday predicts their salary. At the p < 0.05 level, you'd expect about 5 people to show a "significant" correlation just by pure luck — even though birthdays obviously don't affect salaries.
That's the multiple comparisons problem: test enough things, and some will look significant by accident.
In our study, we tested:
- 4 different payroll schedules (weekly, biweekly, semi-monthly, monthly)
- 3 different time windows (full history, post-2020, last 12 months)
- 21 different clearing lags (0 through 20 days)
- 5 different metrics (return, overnight return, intraday return, volatility, volume)
That's hundreds of tests. Some will look significant by chance alone.
How FDR correction fixes this
FDR stands for "False Discovery Rate." It's a method (invented by Benjamini and Hochberg in 1995) that adjusts all the p-values to account for the number of tests. It produces a q-value for each result:
q = 0.05 means: "If you take all findings with q ≤ 0.05, at most 5% of them are expected to be false alarms."
A finding with p = 0.01 might have q = 0.15 after FDR correction — meaning that once you account for all the tests you ran, this result is no longer trustworthy on its own.
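The Benjamini-Hochberg step-up rule fits in a dozen lines of plain Python. This is a sketch of the standard 1995 procedure, not the study's own code.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg FDR correction: convert p-values to q-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by (number of tests /
    # its rank) and enforcing that q-values never increase as p shrinks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q

print(bh_qvalues([0.01, 0.04, 0.03, 0.50]))
```

Notice that with only four tests, p = 0.01 already climbs to q = 0.04; with hundreds of tests, the same p-value can easily end up well above the 0.05 line, which is exactly the effect described above.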
5. What is the "clearing lag"?
This is the single most important concept for understanding our findings. When your paycheck is deposited, the 401(k) contribution doesn't instantly buy stock. It goes through a pipeline:
1. Payday (e.g., Friday the 15th). Your paycheck arrives. $333 is deducted for your 401(k). But nothing has been invested yet — the money is just sitting in your employer's payroll account.
2. Employer processing (1-2 business days). Your employer batches all employees' contributions together and wires the total to the recordkeeper (Fidelity, Vanguard, Schwab, etc.). ERISA's safe-harbor rule gives small plans up to 7 business days, but most employers do it within 1-3.
3. Recordkeeper receives funds (Day 3-4). Fidelity/Vanguard receives the wire, matches it to your account, and queues a trade order based on your investment elections (e.g., "80% S&P 500 index fund, 20% bonds").
4. Trade order placed (Day 5). The recordkeeper places a buy order for your index fund shares. For mutual funds, this executes at the 4:00 PM closing price (NAV).
5. Shares purchased (Day 6-7) ← YOUR MONEY ENTERS THE MARKET. The trade executes. Your $333 has finally become shares of the S&P 500 index fund. This is the moment your money actually affects stock prices.
6. Settlement (Day 7-8). The trade officially settles (T+1 for mutual funds). Your shares appear in your account.
This is why the initial analysis (which looked at the paycheck date) found nothing — the action happens a week later. Once we shifted the analysis to account for the clearing lag, the pattern appeared.
6. What is a "Monte Carlo simulation"?
Imagine showing your results to a skeptic. They say: "Sure, you found a pattern around payday dates. But I bet you'd find a pattern around ANY set of dates if you looked hard enough."
The Monte Carlo test directly answers this challenge.
How it works
- Take the real payday dates (about 480 semi-monthly settlement dates over 20 years).
- Pick 480 random dates from the same calendar — completely ignoring paydays.
- Run the exact same analysis on these random dates. Compute β (the effect size).
- Write down the result and repeat steps 2-3 a total of 500 times.
- Compare: is the REAL payday β bigger than 95% of the random ones?
Two outcomes are possible:
- If the pattern is just noise: the real β lands somewhere in the middle of the random β's. It doesn't stand out. Verdict: the pattern is noise.
- If the pattern is real: the real β is larger than 95%+ of the random β's. Random dates almost never produce a pattern this strong. Verdict: the pattern is real.
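The whole recipe can be sketched end-to-end on simulated data. This is illustrative only: the returns are random and the "payday" bump is planted by hand (and exaggerated so the demo is unambiguous).

```python
import numpy as np

rng = np.random.default_rng(2)
n_days, n_events, n_sims = 5000, 480, 500

# Simulated daily returns, with a bump planted on the "payday" dates.
returns = rng.normal(0.0, 1.0, n_days)
payday_idx = rng.choice(n_days, size=n_events, replace=False)
returns[payday_idx] += 0.3  # hypothetical payday effect, chosen for the demo

# Step 1: the real effect size (mean on payday dates vs. all days).
real_beta = returns[payday_idx].mean() - returns.mean()

# Steps 2-4: the same statistic on 500 sets of completely random dates.
placebo = np.array([
    returns[rng.choice(n_days, size=n_events, replace=False)].mean() - returns.mean()
    for _ in range(n_sims)
])

# Step 5: what share of random-date betas match or beat the real one?
p_mc = (placebo >= real_beta).mean()
print(real_beta, p_mc)
```

Because the effect here really is tied to the payday dates, almost no random date set matches it, so the Monte Carlo p-value comes out tiny.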
7. What is "GARCH"?
GARCH is a type of statistical model built specifically for financial data. Remember how we said stock volatility clusters — calm days follow calm days, and stormy days follow stormy days?
If you're predicting tomorrow's temperature, knowing today's temperature helps a lot (if it's 90° today, it's probably not 30° tomorrow). Similarly, if the stock market was wild today, it's more likely to be wild tomorrow.
A regular regression ignores this. GARCH builds it into the model: it predicts not just the direction of the market, but also how volatile it will be, and uses that to make more precise estimates.
We use GARCH as a cross-check. If our payday effect shows up in both the regular regression AND the GARCH model, we can be more confident it's real — because GARCH handles volatility clustering better than HAC standard errors alone.
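The heart of GARCH(1,1) is a one-line recursion for tomorrow's expected variance. Real analyses estimate the three parameters from the data (for example with a GARCH package); in this sketch they are fixed by hand just to show the mechanism.

```python
import numpy as np

def garch_volatility(returns, omega=0.05, alpha=0.10, beta=0.85):
    """GARCH(1,1) conditional volatility with hand-picked (not fitted) parameters.

    Tomorrow's expected variance = a floor (omega)
      + alpha * (today's squared surprise)    <- reacts to shocks
      + beta  * (today's expected variance)   <- fades slowly: clustering."""
    r = np.asarray(returns, dtype=float)
    sigma2 = np.empty(len(r))
    sigma2[0] = r.var()
    for t in range(1, len(r)):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return np.sqrt(sigma2)

# A calm stretch, one big shock, then calm again.
r = np.concatenate([np.full(30, 0.2), [5.0], np.full(30, 0.2)])
vol = garch_volatility(r)
print(vol[30], vol[31], vol[45])  # predicted vol jumps after the shock, then decays
```

This is exactly the "stormy days follow stormy days" idea: one big surprise raises the model's volatility forecast for many days afterward, instead of being forgotten overnight.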
8. What is "block bootstrap"?
Bootstrap is a technique where you create "fake" versions of your dataset by resampling it, then check if your results hold up across all the fake versions.
Imagine your data is a deck of 5,000 cards (one per trading day). A regular bootstrap would shuffle the cards randomly and draw 5,000 with replacement. But this breaks the order — you might put a Monday after a Wednesday, which doesn't make sense for time-series data.
A block bootstrap instead picks up chunks of consecutive cards (say, 20 days at a time) and rearranges the chunks. Each chunk keeps its internal order intact, so the patterns within each stretch of days are preserved. Only the arrangement of chunks changes.
We do this 1,000 times and compute β each time. If 95% of the bootstrapped β's are positive (or all negative), the result is robust. The range of values gives us a confidence interval — a range where the true effect most likely lives.
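The card-deck analogy translates almost directly into code. A minimal moving-block bootstrap sketch (toy data, and the mean as a stand-in for the study's β):

```python
import numpy as np

def block_bootstrap_means(x, block_len=20, n_boot=1000, seed=3):
    """Moving-block bootstrap: resample whole chunks of consecutive days,
    glue them together, and compute the statistic on each rebuilt series."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // block_len
    n_starts = len(x) - block_len + 1   # every possible block starting position
    stats = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n_starts, size=n_blocks)
        sample = np.concatenate([x[s:s + block_len] for s in starts])
        stats[b] = sample.mean()
    return stats

rng = np.random.default_rng(4)
x = rng.normal(0.05, 1.0, 5000)            # toy daily returns, true mean 0.05
boot = block_bootstrap_means(x)
lo, hi = np.percentile(boot, [2.5, 97.5])  # 95% confidence interval
print(lo, hi)
```

Each 20-day chunk keeps its internal order, so momentum and volatility clustering inside the chunk survive the reshuffling; only the arrangement of chunks is randomized.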
9. How to read the charts on this site
Most interactive charts on this site follow the same format. Here's a complete guide:
The x-axis (horizontal)
Usually shows the clearing lag — how many trading days after the paycheck date. Lag 0 = the paycheck day itself. Lag +8 = eight trading days later (approximately when the 401(k) money actually buys stocks).
The y-axis (vertical)
Usually shows the β coefficient (the effect size) in percentage points per day. A bar reaching up to +0.10% means "the market returns an extra 0.10% per day during this window." A bar going down to -0.10% means the market performs 0.10% worse per day.
The zero line
The horizontal dashed line at y = 0 represents "no effect." Bars above this line mean positive excess returns; bars below mean negative. If a bar's error range crosses this line, the result is not statistically significant.
Error bars (thin lines above and below each bar)
These show the 95% confidence interval. Think of it as: "we're 95% sure the true value is somewhere within this range."
- Short error bars = we're quite sure about this number
- Long error bars = there's a lot of uncertainty
- Error bars that cross zero = we can't rule out that the effect is actually zero (not significant)
Colors and legend
Different colors represent different time periods or categories. Click a legend entry to show/hide that series. This lets you compare, for example, "full 1960-2026 sample" vs. "just the 2000-2019 era."
Significance markers
Stars next to values indicate statistical significance:
| Marker | Meaning | How confident? |
|---|---|---|
| (no marker) | p ≥ 0.10 | Not significant — could easily be chance |
| . | p < 0.10 | Suggestive — worth noting but not conclusive |
| * | p < 0.05 | Significant — less than 5% chance of being luck |
| ** | p < 0.01 | Highly significant — less than 1% chance |
| *** | p < 0.001 | Extremely significant — less than 0.1% chance |
Green/red shaded bands
Green bands highlight the "settlement window" (the days when 401(k) money is most likely buying). Red bands or markers highlight where investors are buying at elevated prices.
Interactive features
- Hover over any data point to see exact values
- Click a legend entry to hide/show that series
- Drag to zoom into a region
- Double-click to reset zoom
- On mobile: tap for hover info, pinch to zoom
10. Putting it all together
Here's how all these methods connect in our study:
- We start with a question: do stock prices behave differently around paydays?
- We use regression to isolate the payday effect from other known market patterns (day-of-week, month, etc.).
- We use HAC standard errors to make sure our p-values are honest (not inflated by the messiness of stock data).
- We test multiple lags (0 to 20 days after paycheck) to find when the effect actually happens.
- We apply FDR correction because testing 21 different lags (across several schedules and metrics) means some will look significant by chance.
- We run Monte Carlo to prove that real payday dates produce a stronger signal than random dates.
- We run GARCH as a second opinion from a model designed specifically for stock data.
- We run block bootstrap to check that results hold up when we reshuffle the data.
- Only findings that survive ALL of these filters make it into our conclusions.