Methodology — The Payday Effect

Research question

Do S&P 500 daily returns exhibit statistically significant excess performance in the trading days surrounding predictable payroll-driven inflows (semi-monthly, biweekly, and other pay schedules), after controlling for known calendar anomalies, and is any such effect consistent with the mechanism of 401(k) contribution flow rather than alternative explanations?

Data

Field	S&P 500	Bitcoin
Source	Yahoo Finance	Yahoo Finance
Ticker	^GSPC	BTC-USD
Sample (main)	1960–2026 (65 years)	2016–2026 (10 years)
Additional windows	2016–2026, 2020–2026	2020–2026
Rows	16,680 trading days	3,758 calendar days
Fields (both assets)	Date, Open, High, Low, Close, Volume

Bitcoin serves as a secondary test asset. Because BTC trades 24/7 and has no "overnight" session, it lets us check whether payday effects are specific to traditional equity market hours or appear in any risk asset.

Payday definitions

We constructed seven distinct payday calendars, each representing a different payroll schedule used in the US economy:

Schedule	Payday dates	Why we test it
Mid-month	15th of each month (rolled to prior trading day if weekend/holiday)	First leg of semi-monthly pay; common for salaried workers
End-of-month	Last trading day of each month	Second leg of semi-monthly pay; overlaps with turn-of-month
Semi-monthly	Both 15th and month-end	The combined schedule used by ~19% of US workers; primary test
Biweekly	Every other Friday (union of two offset calendars)	Most common US pay schedule (~37% of workers); lower flow concentration
Weekly	Every Friday	Negative control: spreads flow across all weeks, should dilute any signal
Military	1st and 15th of each month	Thrift Savings Plan (TSP) contributions from federal/military pay
Social Security	2nd, 3rd, 4th Wednesdays of each month	Negative control: recipients generally do not invest in equities via payroll deduction
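
As an illustration of the roll rule in the table, here is a minimal sketch of how the semi-monthly calendar could be built. The function names and the `holidays` argument are hypothetical, and weekends plus a caller-supplied holiday set stand in for a real exchange calendar; this is not the study's own code.

```python
import calendar
from datetime import date, timedelta


def roll_back(d, holidays=frozenset()):
    """Move d to the closest earlier weekday that is not a listed holiday."""
    while d.weekday() >= 5 or d in holidays:  # 5 = Saturday, 6 = Sunday
        d -= timedelta(days=1)
    return d


def semi_monthly_paydays(year, holidays=frozenset()):
    """Return the rolled 15th and the last trading day of every month in `year`."""
    paydays = []
    for month in range(1, 13):
        last_day = calendar.monthrange(year, month)[1]
        paydays.append(roll_back(date(year, month, 15), holidays))
        paydays.append(roll_back(date(year, month, last_day), holidays))
    return paydays


paydays_2024 = semi_monthly_paydays(2024)
```

The mid-month and end-of-month calendars are the two halves of this list; the other schedules would need their own generators.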

Bucket definitions

For each payday calendar, every trading day is assigned to exactly one bucket based on its distance from the nearest payday:

Bucket	Days relative to payday	What it captures
Run-up	T-5 through T-1	The 5 trading days before a payday. If someone is front-running the expected inflow, this is where we would see it — prices rising in anticipation of the wave of buying.
Payday	T (day zero)	The payday itself. If 401(k) money hits the market immediately, this day should show the impact.
Post-payday	T+1 through T+3	The 3 trading days after payday. Accounts for the fact that contributions may take a few days to settle and actually execute as stock purchases.
Other	Everything else	All days that do not fall within any payday window. This is the baseline — the "normal" days we compare against.
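
The bucket assignment can be sketched as follows, working on trading-day positions (0, 1, 2, ...) rather than dates. The table does not say how a day is labelled when two payday windows overlap, so the priority used here (payday over post-payday over run-up) is an assumption made only for this example, as are the function and label names.

```python
def assign_buckets(n_days, payday_positions, pre=5, post=3):
    """Label trading-day positions 0..n_days-1 by their offset from a payday."""
    buckets = ["other"] * n_days
    # Paint in order of increasing priority so later labels overwrite earlier
    # ones. ASSUMED priority: payday > post-payday > run-up.
    for p in payday_positions:
        for i in range(p - pre, p):            # T-5 .. T-1
            if 0 <= i < n_days:
                buckets[i] = "run_up"
    for p in payday_positions:
        for i in range(p + 1, p + post + 1):   # T+1 .. T+3
            if 0 <= i < n_days:
                buckets[i] = "post"
    for p in payday_positions:                 # T
        if 0 <= p < n_days:
            buckets[p] = "payday"
    return buckets


labels = assign_buckets(20, [10])
```

With a single payday at position 10, positions 5 to 9 are run-up, 10 is the payday, 11 to 13 are post-payday, and everything else is the baseline.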

Statistical methods

We apply four statistical procedures to the same data: three tests that rest on different assumptions, plus a multiple-testing correction applied to their results. A finding that appears under several methods is more likely to be real than an artifact of one particular test's assumptions.

1. HAC-OLS regression

The primary model is an ordinary least squares regression with Newey-West heteroskedasticity- and autocorrelation-consistent standard errors:

r_t = α + β_run · D_run + β_pay · D_pay + β_post · D_post + γ · Controls + ε_t

Where r_t is the daily return, D_run, D_pay, D_post are dummy variables for each bucket, and Controls include:

Monday dummy — captures the well-documented Monday effect (French 1980)
Friday dummy — captures end-of-week patterns (Cross 1973)
January dummy — captures the January effect (Rozeff & Kinney 1976)
Turn-of-month dummy — captures the last day + first 3 days of each month (Ariel 1987, Lakonishok & Smidt 1988)
Pre-holiday dummy — captures elevated returns before market holidays (Ariel 1990)
AR(1) term — lagged return to capture serial correlation
Volatility regime — realized volatility proxy to control for clustering

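The Newey-West bandwidth is chosen automatically: L = floor(4 × (T/100)^(2/9)), following Newey & West (1994). Here T is the number of daily observations in the sample, not the payday T used in the bucket definitions.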
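
As a quick worked check of the bandwidth rule, the formula can be evaluated for the two sample sizes in the data table. The function name is hypothetical.

```python
import math


def newey_west_lags(n_obs):
    """Rule-of-thumb lag length: floor(4 * (n_obs / 100) ** (2 / 9))."""
    return math.floor(4 * (n_obs / 100) ** (2 / 9))


sp500_lags = newey_west_lags(16_680)  # S&P 500 main sample
btc_lags = newey_west_lags(3_758)     # Bitcoin main sample
```

For the 16,680-day S&P 500 sample this gives 12 lags, and for the 3,758-day Bitcoin sample it gives 8.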

We draw a best-fit line through 65 years of daily stock returns, asking: "After accounting for Mondays, Fridays, January, month boundaries, holidays, and recent market choppiness, do the days around paydays still show higher-than-normal returns?" The beta coefficients tell us the size of the effect, and the p-values tell us whether it could plausibly be random noise.

2. Monte Carlo permutation test

We randomly shuffle the payday calendar dates at least 500 times. For each permutation, we recompute the mean return difference between "payday window" and "other" days. This builds a null distribution without relying on any assumptions about the shape of the return distribution.

The permutation p-value is the fraction of random shuffles that produce an effect as large as or larger than the real one. This test is distribution-free: it does not assume normality or any particular shape for the return distribution. It does assume that, under the null hypothesis, daily returns are exchangeable across days, so strong serial dependence or volatility clustering can still distort it.

Imagine writing each day's return on a card, shuffling the deck, and randomly assigning days to "payday" or "other." If the real payday calendar produces bigger returns than 95% of random shuffles, we can be fairly confident the pattern is not coincidence.
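
The card-shuffling idea can be sketched in a few lines. This version shuffles the window labels across days, which is a simplification of shuffling calendar dates (it ignores the fact that window days come in contiguous runs), and it computes a one-sided p-value. The function name, the default seed, and the one-sided choice are assumptions for illustration.

```python
import random
from statistics import fmean


def permutation_p_value(returns, in_window, n_perm=500, seed=0):
    """One-sided permutation test for mean(window days) - mean(other days)."""
    rng = random.Random(seed)

    def mean_diff(labels):
        inside = [r for r, w in zip(returns, labels) if w]
        outside = [r for r, w in zip(returns, labels) if not w]
        return fmean(inside) - fmean(outside)

    observed = mean_diff(in_window)
    labels = list(in_window)  # copy so the caller's labels are not mutated
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)   # keeps the number of window days fixed
        if mean_diff(labels) >= observed:
            hits += 1
    return observed, hits / n_perm


toy_returns = [1.0] * 10 + [0.0] * 90
toy_labels = [True] * 10 + [False] * 90
observed, p_value = permutation_p_value(toy_returns, toy_labels)
```

Both groups must be non-empty. Some practitioners report (hits + 1) / (n_perm + 1) instead, so that the p-value is never exactly zero; the plain fraction is used here to match the definition in the text.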

3. GARCH(1,1) model

A GARCH(1,1) model captures time-varying volatility — the well-documented tendency of stock market volatility to cluster (calm periods follow calm periods; turbulent periods follow turbulent periods). We include the payday dummy variables in the mean equation, and model conditional variance as:

σ^2_t = ω + α · ε^2_{t-1} + β · σ^2_{t-1}

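This ensures our significance tests account for the fact that volatility is not constant over time. Here ω, α, and β are the parameters of the variance equation; α and β are not the intercept α and the bucket coefficients β of the HAC-OLS regression.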

Regular regression pretends the market's choppiness is the same every day. GARCH admits that the market has calm weeks and stormy weeks, and adjusts the significance tests accordingly. If the payday effect is still significant after this adjustment, it is more robust.
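
To make the variance recursion concrete, here is a minimal sketch that applies it to a given series of residuals with fixed parameters. It does not estimate anything; in practice ω, α, and β are fitted by maximum likelihood. Starting the recursion at the unconditional variance ω / (1 − α − β) is an assumption of this example, and the function name is hypothetical.

```python
def garch_variance(residuals, omega, alpha, beta):
    """Conditional variances sigma2[t] for t = 0 .. len(residuals) - 1."""
    if not residuals:
        return []
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be below 1 for a finite long-run variance")
    # ASSUMPTION: initialise at the unconditional (long-run) variance.
    sigma2 = [omega / (1 - alpha - beta)]
    for eps in residuals[:-1]:
        # sigma2_t = omega + alpha * eps_{t-1}^2 + beta * sigma2_{t-1}
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2


path = garch_variance([0.0, 2.0, 0.0], omega=0.1, alpha=0.1, beta=0.8)
```

The large residual in the second position raises the variance in the third, which is the clustering behaviour described above.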

4. FDR correction (Benjamini-Hochberg)

We run dozens of statistical tests across different pay schedules, time windows, and asset types. If you test 100 hypotheses for which there is no real effect, about 5 will still look significant at the 5% level by pure chance. The Benjamini-Hochberg (1995) procedure controls the false discovery rate: the expected proportion of false positives among all findings declared significant.

We report both raw p-values and FDR-adjusted q-values for every test. A finding must survive FDR correction (q < 0.10) to be treated as reliable.

When you flip a coin 100 times, you will get runs of heads that look meaningful but are just luck. FDR correction is a way of adjusting for the fact that we ran many tests at once, so the findings that survive are less likely to be flukes.
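
The Benjamini-Hochberg adjustment itself is short enough to sketch directly. The function name is hypothetical and the p-values are made up for illustration; they are not results from this study.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


example_p = [0.01, 0.04, 0.03, 0.005]  # illustrative values only
example_q = bh_q_values(example_p)
survivors = [i for i, qv in enumerate(example_q) if qv < 0.10]
```

A test counts as reliable under the rule above when its q-value is below 0.10.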

Assumptions

Every statistical model rests on assumptions. We conducted a formal audit of every assumption underlying our analysis and documented whether each one holds, holds with caveats, or fails. The chart below summarizes the results:

Assumption audit results

Each assumption was tested against the actual data. PASS = assumption holds. CAVEAT = assumption partially holds or requires mitigation. FAIL = assumption is violated, but a mitigation strategy is in place.

Friday isolation test

As a robustness check, we removed all Fridays, which account for roughly 20% of all trading days, from the dataset and re-ran the entire analysis. If the payday effect reflects settlement timing in general (money arriving a fixed number of days after payday), then it should appear on any day of the week, not just Fridays.

We filtered out every trading day where weekday = Friday, rebuilt all payday sets from the remaining days, and re-ran the HAC-OLS regression at every lag from 0 through 20. The result: every statistically significant finding vanished. No lag produced a p-value below 0.10 in the Friday-excluded dataset.

This is a critical robustness test. It reveals that the semi-monthly clearing-lag signal is entirely concentrated on Friday trading days. Whatever mechanism drives the effect, it operates through — or only on — Fridays.

Like testing whether your lucky charm actually works by leaving it at home for a month. If you keep winning without it, the charm was not the cause; if the wins stop, it was doing the work. In our case, the "charm" was Fridays, and without them the pattern disappeared.
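
The filtering step of this check is simple to sketch. The example assumes the data are held as (date, return) pairs, which is a hypothetical layout, and it covers only the removal of Fridays, not the rebuilding of the payday sets or the regressions.

```python
from datetime import date


def drop_fridays(rows):
    """Keep only (date, return) pairs whose date is not a Friday."""
    return [(d, r) for d, r in rows if d.weekday() != 4]  # Monday = 0, Friday = 4


week = [(date(2024, 6, 10 + i), 0.001 * i) for i in range(5)]  # Mon..Fri, toy returns
kept = drop_fridays(week)
share_removed = 1 - len(kept) / len(week)
```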

Component decomposition

Instead of testing the semi-monthly schedule (15th + end-of-month combined) as a single unit, we tested each anchor date separately. This decomposition reveals whether the clearing-lag signal comes from one anchor, the other, or only the combination:

Mid-month (15th only) — isolates the first paycheck of the month
End-of-month (last trading day only) — isolates the second paycheck, which overlaps with the turn-of-month
Semi-monthly (15th + end-of-month) — the combined schedule as tested in our primary analysis
Military (1st + 15th) — a different anchor pair used by federal and military payroll (Thrift Savings Plan)

Component decomposition: β by lag for each payday anchor

Each line represents a different payday anchor tested independently. Dots mark statistically significant lags (p < 0.05). Data: S&P 500, 1960–2026.

We tested each payday anchor individually. Result: the 15th alone peaks at lag+16–18 (pointing to the NEXT month's turn-of-month). The end-of-month alone peaks at lag+0 (the classic turn-of-month effect). Neither shows significance at lag+7–8. Only the COMBINATION produces the lag+7–8 signal — which means it's a composite effect of having two measurement windows per month.

Day-of-month regression

Instead of measuring distance from the nearest payday (our primary lag-from-payday approach), we tested fixed calendar days directly: the 1st, 2nd, 3rd, and so on through the 31st. This is the approach used by Ma & Pratt (1992) and allows comparison with the prior literature.

We created a dummy variable for each calendar day of the month and regressed daily returns on the full set of 31 dummies with Newey-West HAC standard errors. This tests whether specific calendar dates are associated with excess returns, with no adjustment for dates that land on a weekend or holiday (those dates simply contribute no observation).

The lag-from-payday approach is more precise because actual paydays shift for weekends and holidays. A fixed day-of-month test treats the 15th as the 15th regardless of whether it's a trading day. The two approaches give different answers, which is itself informative about the role of actual settlement dates vs fixed calendar dates.
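
One way to see what the day-of-month regression estimates: if the 31 dummies are used without a separate intercept (an assumption here, since a full set of dummies plus an intercept would be perfectly collinear), each OLS coefficient is just the mean return on that calendar day. The sketch below computes those means from hypothetical (date, return) pairs; it does not produce the HAC standard errors.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean


def day_of_month_means(rows):
    """Mean return for each calendar day of the month that appears in rows."""
    groups = defaultdict(list)
    for d, r in rows:
        groups[d.day].append(r)
    return {day: fmean(values) for day, values in sorted(groups.items())}


toy_rows = [  # illustrative values only
    (date(2024, 1, 2), 0.01),
    (date(2024, 2, 2), 0.03),
    (date(2024, 1, 3), -0.02),
]
means = day_of_month_means(toy_rows)
```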

Single-day excess return analysis

Our regression model measures β_run over a 5-day window. But which individual day within that window has the highest excess? To answer this, we computed the mean excess return for each individual day offset (0 through 20) relative to the overall daily mean return.

The results are revealing:

Day+1 has one of the two largest single-day excesses: +6.83 bps (p = 0.012)
Day+12 has the other, and is marginally larger: +6.96 bps (p = 0.006)
Day+8 itself shows essentially zero excess: −0.83 bps (not significant)

The regression β peaks at lag+8, but the first single-day peak comes at day+1. These are measuring different things: β_run at lag+8 is the average excess return over the 5-day run-up window that precedes day+8 (i.e., days +3 through +7), not the return on day+8 itself. The single-day peak at day+1 is consistent with the fastest settlement processors creating buying pressure the day after payday.
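
The single-day calculation can be sketched as follows, again on trading-day positions. The function name and the toy data are hypothetical, and the sketch omits the significance tests.

```python
from statistics import fmean


def excess_by_offset(returns, payday_positions, max_offset=20):
    """Mean return at each offset after a payday, minus the overall mean, in bps."""
    overall = fmean(returns)
    excess = {}
    for k in range(max_offset + 1):
        sample = [returns[p + k] for p in payday_positions if p + k < len(returns)]
        if sample:  # skip offsets that run past the end of the data
            excess[k] = (fmean(sample) - overall) * 1e4  # 1 bp = 0.0001
    return excess


toy = [0.0] * 10
toy[3] = 0.001                       # a 10 bp return one day after the payday
result = excess_by_offset(toy, [2], max_offset=3)
```

Each offset is measured on a single day, whereas β_run averages over a five-day window, which is why the two can peak in different places.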

Limitations

No study is without limitations. The four most important ones for this research are:

No actual flow data: We do not have direct data on daily 401(k) contribution volumes. We are testing whether prices behave as if payroll money moves markets, but we cannot observe the flow itself. A more complete model would weight each pay schedule by its actual dollar payroll, which is not publicly available at daily frequency.
Overnight/intraday split is imprecise: Daily OHLC bars fold pre-market (4:00–9:30 ET) and post-market (16:00–20:00 ET) trading into the "overnight" return via the next day's open print. A proper pre/post-market analysis would require intraday tick data, which is beyond the scope of this study.
Payday roll convention is uncertain: When a payday falls on a weekend or holiday, we roll it back to the prior trading day. In practice, some employers instead deposit pay on the next business day (forward roll). This means our payday calendar may be offset by 1–2 days for roughly 30% of paydays, which could attenuate real signals.
Multiple testing: Despite applying FDR correction, the sheer number of tests (on the order of 7 pay schedules × 4 buckets × 3 time windows × 2 assets) means some spurious findings may survive. We emphasize patterns that replicate across multiple methods and time periods rather than isolated significant p-values.