How the Math Works
Everything you need to understand this research, explained like you've never taken a statistics class. Start here if the other pages feel dense.
1. What is a "regression" and why do we use it?
The problem we're solving
We want to know: do stock prices behave differently around paydays? But lots of things affect stock prices — the day of the week, the time of year, what happened yesterday. If we just look at "payday days vs. other days," we might accidentally pick up one of those other effects and think it's a payday effect.
Ice cream and drowning. Ice cream sales and drowning deaths both go up in summer. If you just compared the two, you'd conclude ice cream causes drowning. But both are caused by a third thing: hot weather. A regression is the tool that says "hold weather constant, and now check if ice cream still predicts drowning." (It doesn't.)
In our study: "Hold the day-of-week, month, and recent market moves constant, and now check if payday timing still predicts stock returns."
How it works, step by step
A regression takes each trading day and builds an equation:
today's return = baseline (the average day)
+ ?? × is it a payday window? ← THIS is what we want to measure
+ ?? × is it Monday? (Mondays tend to be worse)
+ ?? × is it Friday? (Fridays tend to be better)
+ ?? × is it January? (January has its own pattern)
+ ?? × is it the turn of the month? (last day + first 3 days)
+ ?? × what did the market do yesterday? (momentum)
+ random noise (stuff we can't predict)
The regression fills in every ?? with the number that best fits 16,680 trading days of data. The number in front of "is it a payday window?" is the beta coefficient (β) — the headline number in our study.
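In code, the fill-in-the-blanks step is just a least-squares solve. This is a toy sketch on simulated data, not the study's dataset or code; the planted effect sizes (+0.10 for payday, -0.05 for Monday) are made up for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000  # simulated trading days (illustrative, not the study's data)

# Yes/no questions become 0/1 columns ("dummy variables").
payday = (rng.random(n) < 0.10).astype(float)  # is it a payday window?
monday = (rng.random(n) < 0.20).astype(float)  # is it Monday?
noise = rng.normal(0, 0.5, n)                  # stuff we can't predict

# Build returns with a planted +0.10 payday effect and -0.05 Monday effect.
returns = 0.02 + 0.10 * payday - 0.05 * monday + noise

# Design matrix: a column of ones (the baseline) plus one column per question.
X = np.column_stack([np.ones(n), payday, monday])
beta, *_ = np.linalg.lstsq(X, returns, rcond=None)

print(beta)  # close to the planted values: baseline 0.02, payday +0.10, Monday -0.05
```

The regression recovers the payday coefficient even though every single day is dominated by noise, because it averages over thousands of days while holding the Monday column constant.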
2. What is a "p-value"?
The question it answers
Suppose you found β = +0.10%. Cool — but is that real, or did it just happen by chance? Maybe you got unlucky with which days fell in the "payday window" and which didn't. The p-value answers: "If there were truly no payday effect, how often would random chance produce a number this big or bigger?"
You suspect a coin is rigged. You flip it 100 times.
- 52 heads: Meh. That's close enough to 50 that you wouldn't even blink. (p ≈ 0.38)
- 57 heads: Hmm, a little suspicious, but you've seen that happen with a fair coin. (p ≈ 0.10)
- 60 heads: Now you're paying attention. This is uncommon for a fair coin. (p ≈ 0.03)
- 65 heads: Okay, this coin is almost certainly rigged. (p ≈ 0.002)
- 75 heads: There is essentially zero chance this coin is fair. (p < 0.000001)
The p-value is the probability of seeing a result this extreme (or more) if the coin were perfectly fair. The smaller the p-value, the harder it is to explain away as luck.
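The coin numbers above come straight from the binomial distribution, and you can check them with nothing but the standard library. These are one-sided p-values: the probability of seeing at least that many heads from a fair coin.

```python
from math import comb

def p_value_heads(k, n=100):
    """One-sided p-value: chance a fair coin gives k or more heads in n flips."""
    total = 2 ** n
    return sum(comb(n, i) for i in range(k, n + 1)) / total

for heads in (52, 57, 60, 65, 75):
    print(heads, p_value_heads(heads))  # ≈ 0.38, 0.10, 0.03, 0.002, < 1e-6
```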
The significance threshold
Scientists have agreed on a convention: if p < 0.05 (pure luck would produce a result this extreme less than 5% of the time), we call the result "statistically significant." This isn't a magic number — it's just a widely accepted standard. Here's what the different levels mean:

| Level | p-value | Interpretation |
|---|---|---|
| Not significant | p ≥ 0.10 | Probably noise |
| Suggestive (.) | p < 0.10 | Interesting, not conclusive |
| Significant (*) | p < 0.05 | Likely real |
| Highly significant (**) | p < 0.01 | Almost certainly real |
| Extremely significant (***) | p < 0.001 | Beyond reasonable doubt |
Throughout this site, we mark significance with stars: * = p < 0.05, ** = p < 0.01, *** = p < 0.001, and . = p < 0.10 (suggestive).
3. What are "HAC standard errors" and why should I care?
The problem with basic statistics on stock data
Most statistics assume that each data point is independent — like each flip of a coin has nothing to do with the last one. In other words, basic methods quietly assume:
- Each day is independent of the last
- Market volatility is the same every day
- A calm day and a crisis day carry equal weight

But stock prices don't work like coin flips:
- Bad days tend to follow bad days (momentum)
- Some weeks are wild, some are calm (volatility clustering)
- A 2008 crisis day is very different from a normal Tuesday
If you use basic statistics on stock data, your p-values will be too optimistic — results will look more significant than they really are. It's like wearing rose-colored glasses that make everything seem more important. HAC standard errors fix this. HAC stands for heteroskedasticity-and-autocorrelation-consistent (the most common version is the Newey-West estimator): instead of assuming independence, it measures how strongly nearby days move together and widens the error bars accordingly, so the p-values stay honest.
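To see the idea concretely, here is a minimal sketch of a Newey-West style HAC standard error for a simple average (the study presumably used a statistics library; this hand-rolled version just shows the mechanism). On a series where each day carries over 70% of the previous day, the honest error bar is much wider than the naive one.

```python
import numpy as np

def newey_west_se(x, max_lag=5):
    """HAC (Newey-West) standard error of the sample mean of series x.

    The naive SE assumes independent observations; this version adds back
    the covariance between nearby days, down-weighted by the Bartlett kernel."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    var = d @ d / n                       # lag-0 variance
    for lag in range(1, max_lag + 1):
        weight = 1 - lag / (max_lag + 1)  # Bartlett kernel weight
        cov = d[lag:] @ d[:-lag] / n      # autocovariance at this lag
        var += 2 * weight * cov
    return np.sqrt(var / n)

# Simulated autocorrelated series: each day is 70% yesterday plus fresh noise.
rng = np.random.default_rng(1)
eps = rng.normal(size=10000)
x = np.empty_like(eps)
x[0] = eps[0]
for t in range(1, len(x)):
    x[t] = 0.7 * x[t - 1] + eps[t]

naive = x.std(ddof=1) / np.sqrt(len(x))
hac = newey_west_se(x, max_lag=20)
print(naive, hac)  # the HAC standard error is noticeably larger
```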
4. What is "FDR correction" and the "multiple comparisons problem"?
You're at a party with 100 people. You test whether each person's birthday predicts their salary. At the p < 0.05 level, you'd expect about 5 people to show a "significant" correlation just by pure luck — even though birthdays obviously don't affect salaries.
That's the multiple comparisons problem: test enough things, and some will look significant by accident.
In our study, we tested:
- 4 different payroll schedules (weekly, biweekly, semi-monthly, monthly)
- 3 different time windows (full history, post-2020, last 12 months)
- 21 different clearing lags (0 through 20 days)
- 5 different metrics (return, overnight return, intraday return, volatility, volume)
That's hundreds of tests. Some will look significant by chance alone.
How FDR correction fixes this
FDR stands for "False Discovery Rate." It's a method (invented by Benjamini and Hochberg in 1995) that adjusts all the p-values to account for the number of tests. It produces a q-value for each result:
q = 0.05 means: "If you take all findings with q ≤ 0.05, at most 5% of them are expected to be false alarms."
A finding with p = 0.01 might have q = 0.15 after FDR correction — meaning that once you account for all the tests you ran, this result is no longer trustworthy on its own.
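The Benjamini-Hochberg step-up rule fits in a dozen lines of plain Python. This is a sketch of the standard 1995 procedure, not the study's own code.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg FDR correction: convert p-values to q-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by (number of tests /
    # its rank) and enforcing that q-values never increase as p shrinks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q

print(bh_qvalues([0.01, 0.04, 0.03, 0.50]))
```

Notice that with only four tests, p = 0.01 already climbs to q = 0.04; with hundreds of tests, the same p-value can easily end up well above the 0.05 line, which is exactly the effect described above.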
5. What is the "clearing lag"?
This is the single most important concept for understanding our findings. When your paycheck is deposited, the 401(k) contribution doesn't instantly buy stock. It goes through a pipeline:
1. Payday (e.g., Friday the 15th). Your paycheck arrives. $333 is deducted for your 401(k). But nothing has been invested yet — the money is just sitting in your employer's payroll account.
2. Employer processing (1-2 business days). Your employer batches all employees' contributions together and wires the total to the recordkeeper (Fidelity, Vanguard, Schwab, etc.). ERISA's safe-harbor rule gives small plans up to 7 business days, but most employers do it within 1-3.
3. Recordkeeper receives funds (Day 3-4). Fidelity/Vanguard receives the wire, matches it to your account, and queues a trade order based on your investment elections (e.g., "80% S&P 500 index fund, 20% bonds").
4. Trade order placed (Day 5). The recordkeeper places a buy order for your index fund shares. For mutual funds, this executes at the 4:00 PM closing price (NAV).
5. Shares purchased (Day 6-7) ← YOUR MONEY ENTERS THE MARKET. The trade executes. Your $333 has finally become shares of the S&P 500 index fund. This is the moment your money actually affects stock prices.
6. Settlement (Day 7-8). The trade officially settles (T+1 for mutual funds). Your shares appear in your account.
This is why the initial analysis (which looked at the paycheck date) found nothing — the action happens a week later. Once we shifted the analysis to account for the clearing lag, the pattern appeared.
6. What is a "Monte Carlo simulation"?
Imagine showing your results to a skeptic. They say: "Sure, you found a pattern around payday dates. But I bet you'd find a pattern around ANY set of dates if you looked hard enough."
The Monte Carlo test directly answers this challenge.
How it works
- Take the real payday dates (about 480 semi-monthly settlement dates over 20 years).
- Pick 480 random dates from the same calendar — completely ignoring paydays.
- Run the exact same analysis on these random dates. Compute β (the effect size).
- Write down the result and repeat steps 2-3 a total of 500 times.
- Compare: is the REAL payday β bigger than 95% of the random ones?
Two outcomes are possible:
- If the pattern is just noise: the real β lands somewhere in the middle of the random β's. It doesn't stand out. Verdict: the pattern is noise.
- If the pattern is real: the real β is larger than 95%+ of the random β's. Random dates almost never produce a pattern this strong. Verdict: the pattern is real.
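The whole recipe can be sketched end-to-end on simulated data. This is illustrative only: the returns are random and the "payday" bump is planted by hand (and exaggerated so the demo is unambiguous).

```python
import numpy as np

rng = np.random.default_rng(2)
n_days, n_events, n_sims = 5000, 480, 500

# Simulated daily returns, with a bump planted on the "payday" dates.
returns = rng.normal(0.0, 1.0, n_days)
payday_idx = rng.choice(n_days, size=n_events, replace=False)
returns[payday_idx] += 0.3  # hypothetical payday effect, chosen for the demo

# Step 1: the real effect size (mean on payday dates vs. all days).
real_beta = returns[payday_idx].mean() - returns.mean()

# Steps 2-4: the same statistic on 500 sets of completely random dates.
placebo = np.array([
    returns[rng.choice(n_days, size=n_events, replace=False)].mean() - returns.mean()
    for _ in range(n_sims)
])

# Step 5: what share of random-date betas match or beat the real one?
p_mc = (placebo >= real_beta).mean()
print(real_beta, p_mc)
```

Because the effect here really is tied to the payday dates, almost no random date set matches it, so the Monte Carlo p-value comes out tiny.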
7. What is "GARCH"?
GARCH is a type of statistical model built specifically for financial data. Remember how we said stock volatility clusters — calm days follow calm days, and stormy days follow stormy days?
If you're predicting tomorrow's temperature, knowing today's temperature helps a lot (if it's 90° today, it's probably not 30° tomorrow). Similarly, if the stock market was wild today, it's more likely to be wild tomorrow.
A regular regression ignores this. GARCH builds it into the model: it predicts not just the direction of the market, but also how volatile it will be, and uses that to make more precise estimates.
We use GARCH as a cross-check. If our payday effect shows up in both the regular regression AND the GARCH model, we can be more confident it's real — because GARCH handles volatility clustering better than HAC standard errors alone.
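The heart of GARCH(1,1) is a one-line recursion for tomorrow's expected variance. Real analyses estimate the three parameters from the data (for example with a GARCH package); in this sketch they are fixed by hand just to show the mechanism.

```python
import numpy as np

def garch_volatility(returns, omega=0.05, alpha=0.10, beta=0.85):
    """GARCH(1,1) conditional volatility with hand-picked (not fitted) parameters.

    Tomorrow's expected variance = a floor (omega)
      + alpha * (today's squared surprise)    <- reacts to shocks
      + beta  * (today's expected variance)   <- fades slowly: clustering."""
    r = np.asarray(returns, dtype=float)
    sigma2 = np.empty(len(r))
    sigma2[0] = r.var()
    for t in range(1, len(r)):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return np.sqrt(sigma2)

# A calm stretch, one big shock, then calm again.
r = np.concatenate([np.full(30, 0.2), [5.0], np.full(30, 0.2)])
vol = garch_volatility(r)
print(vol[30], vol[31], vol[45])  # predicted vol jumps after the shock, then decays
```

This is exactly the "stormy days follow stormy days" idea: one big surprise raises the model's volatility forecast for many days afterward, instead of being forgotten overnight.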
8. What is "block bootstrap"?
Bootstrap is a technique where you create "fake" versions of your dataset by resampling it, then check if your results hold up across all the fake versions.
Imagine your data is a deck of 5,000 cards (one per trading day). A regular bootstrap would shuffle the cards randomly and draw 5,000 with replacement. But this breaks the order — you might put a Monday after a Wednesday, which doesn't make sense for time-series data.
A block bootstrap instead picks up chunks of consecutive cards (say, 20 days at a time) and rearranges the chunks. Each chunk keeps its internal order intact, so the patterns within each stretch of days are preserved. Only the arrangement of chunks changes.
We do this 1,000 times and compute β each time. If 95% of the bootstrapped β's are positive (or all negative), the result is robust. The range of values gives us a confidence interval — a range where the true effect most likely lives.
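The card-deck analogy translates almost directly into code. A minimal moving-block bootstrap sketch (toy data, and the mean as a stand-in for the study's β):

```python
import numpy as np

def block_bootstrap_means(x, block_len=20, n_boot=1000, seed=3):
    """Moving-block bootstrap: resample whole chunks of consecutive days,
    glue them together, and compute the statistic on each rebuilt series."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // block_len
    n_starts = len(x) - block_len + 1   # every possible block starting position
    stats = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n_starts, size=n_blocks)
        sample = np.concatenate([x[s:s + block_len] for s in starts])
        stats[b] = sample.mean()
    return stats

rng = np.random.default_rng(4)
x = rng.normal(0.05, 1.0, 5000)            # toy daily returns, true mean 0.05
boot = block_bootstrap_means(x)
lo, hi = np.percentile(boot, [2.5, 97.5])  # 95% confidence interval
print(lo, hi)
```

Each 20-day chunk keeps its internal order, so momentum and volatility clustering inside the chunk survive the reshuffling; only the arrangement of chunks is randomized.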
9. How to read the charts on this site
Most interactive charts on this site follow the same format. Here's a complete guide:
The x-axis (horizontal)
Usually shows the clearing lag — how many trading days after the paycheck date. Lag 0 = the paycheck day itself. Lag +8 = eight trading days later (approximately when the 401(k) money actually buys stocks).
The y-axis (vertical)
Usually shows the β coefficient (the effect size) in percentage points per day. A bar reaching up to +0.10% means "the market returns an extra 0.10% per day during this window." A bar going down to -0.10% means the market performs 0.10% worse per day.
The zero line
The horizontal dashed line at y = 0 represents "no effect." Bars above this line mean positive excess returns; bars below mean negative. If a bar's error range crosses this line, the result is not statistically significant.
Error bars (thin lines above and below each bar)
These show the 95% confidence interval. Think of it as: "we're 95% sure the true value is somewhere within this range."
- Short error bars = we're quite sure about this number
- Long error bars = there's a lot of uncertainty
- Error bars that cross zero = we can't rule out that the effect is actually zero (not significant)
Colors and legend
Different colors represent different time periods or categories. Click a legend entry to show/hide that series. This lets you compare, for example, "full 1960-2026 sample" vs. "just the 2000-2019 era."
Significance markers
Stars next to values indicate statistical significance:
| Marker | Meaning | How confident? |
|---|---|---|
| (no marker) | p ≥ 0.10 | Not significant — could easily be chance |
| . | p < 0.10 | Suggestive — worth noting but not conclusive |
| * | p < 0.05 | Significant — less than 5% chance of being luck |
| ** | p < 0.01 | Highly significant — less than 1% chance |
| *** | p < 0.001 | Extremely significant — less than 0.1% chance |
Green/red shaded bands
Green bands highlight the "settlement window" (the days when 401(k) money is most likely buying). Red bands or markers highlight where investors are buying at elevated prices.
Interactive features
- Hover over any data point to see exact values
- Click a legend entry to hide/show that series
- Drag to zoom into a region
- Double-click to reset zoom
- On mobile: tap for hover info, pinch to zoom
10. Putting it all together
Here's how all these methods connect in our study:
- We start with a question: do stock prices behave differently around paydays?
- We use regression to isolate the payday effect from other known market patterns (day-of-week, month, etc.).
- We use HAC standard errors to make sure our p-values are honest (not inflated by the messiness of stock data).
- We test multiple lags (0 to 20 days after paycheck) to find when the effect actually happens.
- We apply FDR correction because testing 21 different lags (across several schedules and metrics) means some will look significant by chance.
- We run Monte Carlo to prove that real payday dates produce a stronger signal than random dates.
- We run GARCH as a second opinion from a model designed specifically for stock data.
- We run block bootstrap to check that results hold up when we reshuffle the data.
- Only findings that survive ALL of these filters make it into our conclusions.