Research question

Do S&P 500 daily returns exhibit statistically significant excess performance in the trading days surrounding predictable payroll-driven inflows (semi-monthly, biweekly, and other pay schedules), after controlling for known calendar anomalies, and is any such effect consistent with the mechanism of 401(k) contribution flow rather than alternative explanations?

Data

| Field | S&P 500 | Bitcoin |
|---|---|---|
| Source | Yahoo Finance | Yahoo Finance |
| Ticker | ^GSPC | BTC-USD |
| Sample (main) | 1960–2026 (65 years) | 2016–2026 (10 years) |
| Additional windows | 2016–2026, 2020–2026 | 2020–2026 |
| Rows | 16,680 trading days | 3,758 calendar days |
| Fields | Date, Open, High, Low, Close, Volume | Date, Open, High, Low, Close, Volume |

Bitcoin serves as a secondary test asset. Because BTC trades 24/7 and has no "overnight" session, it lets us check whether payday effects are specific to traditional equity market hours or appear in any risk asset.
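Because the overnight/intraday distinction matters both here and in the limitations below, it helps to see how a daily bar splits. This is a minimal sketch, assuming simple (not log) returns computed from consecutive Open and Close prices; `Bar` and `decompose_returns` are hypothetical names, not part of any data API. The overnight leg is open_t / close_{t-1} - 1 and the intraday leg is close_t / open_t - 1; together they compound exactly to the close-to-close return.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    open: float   # first price of the bar
    close: float  # last price of the bar


def decompose_returns(bars):
    """Split each close-to-close return into overnight and intraday legs.

    overnight_t = open_t / close_{t-1} - 1   (absorbs any pre/post-market move)
    intraday_t  = close_t / open_t - 1
    total_t     = close_t / close_{t-1} - 1 = (1 + overnight_t) * (1 + intraday_t) - 1
    """
    out = []
    for prev, cur in zip(bars, bars[1:]):
        overnight = cur.open / prev.close - 1.0
        intraday = cur.close / cur.open - 1.0
        total = cur.close / prev.close - 1.0
        out.append((overnight, intraday, total))
    return out
```

For a 24/7 asset such as BTC-USD there is no closed-market gap between bars, so the "overnight" leg is simply the return across the daily bar boundary.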

Payday definitions

We constructed seven distinct payday calendars, each representing a different pay or benefit-payment schedule in the US economy:

| Schedule | Payday dates | Why we test it |
|---|---|---|
| Mid-month | 15th of each month (rolled to prior trading day if weekend/holiday) | First leg of semi-monthly pay; common for salaried workers |
| End-of-month | Last trading day of each month | Second leg of semi-monthly pay; overlaps with turn-of-month |
| Semi-monthly | Both 15th and month-end | The combined schedule used by ~19% of US workers; primary test |
| Biweekly | Every other Friday (union of two offset calendars) | Most common US pay schedule (~37% of workers); lower flow concentration |
| Weekly | Every Friday | Negative control: spreads flow across all weeks, should dilute any signal |
| Military | 1st and 15th of each month | Thrift Savings Plan (TSP) contributions from federal/military pay |
| Social Security | 2nd, 3rd, 4th Wednesdays of each month | Negative control: recipients generally do not invest in equities via payroll deduction |
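The calendar rules in the table can be expressed as a few small functions. This is a minimal sketch, not the study's code. It assumes a trading-day list built from weekdays minus a caller-supplied holiday set (a real NYSE holiday calendar would have to be provided), applies the prior-trading-day roll to every schedule, and treats the biweekly anchor Friday as a free parameter; the second offset biweekly calendar would use an anchor one week later. All function names are hypothetical.

```python
import bisect
from datetime import date, timedelta

WEDNESDAY, FRIDAY = 2, 4  # datetime.weekday() numbering: Monday = 0


def trading_days(start, end, holidays=frozenset()):
    """Sorted weekdays in [start, end] not in `holidays` (hypothetical calendar)."""
    days, d = [], start
    while d <= end:
        if d.weekday() < 5 and d not in holidays:
            days.append(d)
        d += timedelta(days=1)
    return days


def roll_back(d, days):
    """Latest trading day on or before d; None if d lies outside the sample."""
    if d < days[0] or d > days[-1]:
        return None
    return days[bisect.bisect_right(days, d) - 1]


def months(days):
    return sorted({(d.year, d.month) for d in days})


def fixed_day_of_month(days, doms):
    """Paydays on fixed days of the month, rolled back to the prior trading day."""
    out = {roll_back(date(y, m, dom), days) for y, m in months(days) for dom in doms}
    return out - {None}


def mid_month(days):
    return fixed_day_of_month(days, (15,))


def military(days):
    return fixed_day_of_month(days, (1, 15))


def month_end(days):
    last = {}
    for d in days:  # days is sorted, so the final write per month is its last trading day
        last[(d.year, d.month)] = d
    return set(last.values())


def every_n_fridays(days, anchor, step_weeks):
    """Weekly (step_weeks=1) or biweekly (step_weeks=2) Fridays counted from `anchor`."""
    assert anchor.weekday() == FRIDAY
    out, d = set(), anchor
    while d <= days[-1]:
        out.add(roll_back(d, days))
        d += timedelta(weeks=step_weeks)
    return out - {None}


def social_security(days):
    """2nd, 3rd and 4th Wednesdays of each month, rolled back over holidays."""
    out = set()
    for y, m in months(days):
        first = date(y, m, 1)
        first_wed = first + timedelta(days=(WEDNESDAY - first.weekday()) % 7)
        out |= {roll_back(first_wed + timedelta(weeks=k), days) for k in (1, 2, 3)}
    return out - {None}
```

For example, `days = trading_days(date(2024, 1, 1), date(2024, 12, 31), {date(2024, 1, 1)})` followed by `mid_month(days) | month_end(days)` yields the 24 semi-monthly paydays of 2024.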

Bucket definitions

For each payday calendar, every trading day is assigned to exactly one bucket based on its distance from the nearest payday:

| Bucket | Days relative to payday | What it captures |
|---|---|---|
| Run-up | T-5 through T-1 | The 5 trading days before a payday. If someone is front-running the expected inflow, this is where we would see it: prices rising in anticipation of the wave of buying. |
| Payday | T (day zero) | The payday itself. If 401(k) money hits the market immediately, this day should show the impact. |
| Post-payday | T+1 through T+3 | The 3 trading days after payday. Accounts for the fact that contributions may take a few days to settle and actually execute as stock purchases. |
| Other | Everything else | All days that do not fall within any payday window. This is the baseline: the "normal" days we compare against. |
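The bucket rule can be sketched as follows, assuming offsets are counted in trading days (positions in a sorted trading-day list). The text above does not say how overlapping windows are resolved, so the sketch assumes a precedence of payday, then post-payday, then run-up, then other; `assign_buckets` is a hypothetical name.

```python
import bisect


def assign_buckets(days, paydays, run_up=5, post=3):
    """Label each trading day 'payday', 'post', 'run_up' or 'other'.

    `days` is a sorted list of trading days and `paydays` a set of them.
    Offsets are positions in `days`, i.e. trading days, not calendar days.
    Overlap precedence (an assumption): payday > post > run_up > other.
    """
    idx = sorted(i for i, d in enumerate(days) if d in paydays)
    labels = []
    for i in range(len(days)):
        j = bisect.bisect_right(idx, i)           # idx[j-1] <= i < idx[j]
        prev = idx[j - 1] if j > 0 else None      # most recent payday at or before i
        nxt = idx[j] if j < len(idx) else None    # next payday strictly after i
        if prev == i:
            labels.append("payday")
        elif prev is not None and i - prev <= post:
            labels.append("post")
        elif nxt is not None and nxt - i <= run_up:
            labels.append("run_up")
        else:
            labels.append("other")
    return labels
```

Under this rule a regular weekly (every-Friday) calendar leaves the "other" bucket essentially empty: Monday to Wednesday fall in the post-payday window and Thursday in the next run-up.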

Statistical methods

We apply four complementary statistical methods to the same data. A finding that appears under several methods is less likely to be an artifact of any one test's assumptions.

1. HAC-OLS regression

The primary model is an ordinary least squares regression with Newey-West heteroskedasticity- and autocorrelation-consistent standard errors:

r_t = α + β_run · D_run + β_pay · D_pay + β_post · D_post + γ · Controls + ε_t

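where r_t is the daily return; D_run, D_pay, and D_post are dummy variables marking the run-up, payday, and post-payday buckets (the "other" bucket is the omitted baseline, captured by α); and Controls include dummies for Mondays, Fridays, January, turn-of-month days, and holidays, plus a measure of recent market volatility.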

The Newey-West lag length L is set from the number of observations T by the rule L = floor(4 × (T/100)^(2/9)), following Newey & West (1994).
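The bandwidth rule is easy to evaluate for this study's samples: T = 16,680 S&P 500 trading days gives L = 12 lags, and T = 3,758 Bitcoin days gives L = 8. The sketch below also shows how a Newey-West (Bartlett-kernel) standard error is formed in the simplest case, a single payday dummy with an intercept and no controls; the full model with controls uses the matrix form of the same sandwich estimator. Function names are hypothetical, and no small-sample degrees-of-freedom correction is applied.

```python
import math


def nw_bandwidth(n_obs):
    """Rule-of-thumb lag length: floor(4 * (T / 100) ** (2 / 9))."""
    return math.floor(4 * (n_obs / 100) ** (2 / 9))


def hac_se_dummy(returns, dummy, max_lag=None):
    """OLS slope and Newey-West SE for r_t = a + b * D_t + e_t.

    Bartlett weights w_l = 1 - l / (L + 1) keep the variance estimate non-negative.
    """
    n = len(returns)
    L = nw_bandwidth(n) if max_lag is None else max_lag
    d_bar, r_bar = sum(dummy) / n, sum(returns) / n
    dc = [d - d_bar for d in dummy]                  # demeaned regressor
    sxx = sum(x * x for x in dc)
    b = sum(x * (r - r_bar) for x, r in zip(dc, returns)) / sxx
    a = r_bar - b * d_bar
    resid = [r - a - b * d for r, d in zip(returns, dummy)]
    u = [x * e for x, e in zip(dc, resid)]           # score contributions for b
    s = sum(v * v for v in u)                        # lag-0 term (White / HC0)
    for lag in range(1, L + 1):
        w = 1 - lag / (L + 1)
        s += 2 * w * sum(u[t] * u[t - lag] for t in range(lag, n))
    return b, math.sqrt(s) / sxx
```

With several regressors the same calculation is usually delegated to a regression library; statsmodels, for example, offers `OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": L})`.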

We draw a best-fit line through 65 years of daily stock returns, asking: "After accounting for Mondays, Fridays, January, month boundaries, holidays, and recent market choppiness, do the days around paydays still show higher-than-normal returns?" The beta coefficients measure the size of the effect, and the p-values tell us how surprising an effect that large would be if paydays made no difference at all.
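The controls are described only in plain language here, so any concrete encoding involves choices. The sketch below is one possible encoding, and every definition in it is an assumption: turn of month as the last trading day of a month plus the first three of the next (a common convention), "holidays" as the trading day before a market holiday, and recent volatility as the standard deviation of the previous 20 returns (no look-ahead). `control_rows` is a hypothetical name.

```python
import statistics
from datetime import date, timedelta


def control_rows(days, returns, holidays=frozenset(), vol_window=20):
    """Illustrative control variables for each trading day in the sorted list `days`.

    Assumed definitions:
      turn_of_month: last trading day of a month, or one of the first three of a month
      pre_holiday:   the next weekday is a market holiday
      trailing_vol:  stdev of the previous `vol_window` returns (None until available)
    """
    rows, pos_in_month = [], 0
    for i, d in enumerate(days):
        new_month = i == 0 or days[i - 1].month != d.month
        pos_in_month = 1 if new_month else pos_in_month + 1
        last_of_month = i + 1 < len(days) and days[i + 1].month != d.month
        next_weekday = d + timedelta(days=3 if d.weekday() == 4 else 1)
        rows.append({
            "monday": int(d.weekday() == 0),
            "friday": int(d.weekday() == 4),
            "january": int(d.month == 1),
            "turn_of_month": int(last_of_month or pos_in_month <= 3),
            "pre_holiday": int(next_weekday in holidays),
            "trailing_vol": (statistics.stdev(returns[i - vol_window:i])
                             if i >= vol_window else None),
        })
    return rows
```

Each row would sit alongside the three bucket dummies as one line of the regression's design matrix.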

2. Monte Carlo permutation test

We randomly shuffle the payday calendar dates 500+ times. For each permutation, we recompute the mean return difference between "payday window" and "other" days. This builds a null distribution without relying on any assumptions about the shape of the return distribution.

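The permutation p-value is the fraction of random shuffles that produce an effect at least as large as the real one. The test is distribution-free in the sense that it does not assume normality or any particular error distribution; its key requirement is that, if paydays do not matter, days are exchangeable, an assumption that volatility clustering and autocorrelation can strain.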

Imagine writing each day's return on a card, shuffling the deck, and randomly assigning days to "payday" or "other." If the real payday calendar produces bigger returns than 95% of random shuffles, we can be fairly confident the pattern is not coincidence.
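The shuffle procedure fits in a few lines. This is a minimal sketch, assuming a one-sided test (payday-window mean above the baseline) and independent shuffles of the day labels with the number of window days held fixed; `permutation_test` is a hypothetical name. Adding 1 to both the count and the number of shuffles, which treats the observed labelling as one of the draws, is a common convention rather than something stated above; it also makes clear that 500 shuffles cannot produce a p-value below 1/501 (about 0.002).

```python
import random


def permutation_test(returns, in_window, n_perm=500, seed=0):
    """One-sided permutation p-value for mean(window days) - mean(other days)."""

    def diff(labels):
        win = [r for r, w in zip(returns, labels) if w]
        rest = [r for r, w in zip(returns, labels) if not w]
        return sum(win) / len(win) - sum(rest) / len(rest)

    observed = diff(in_window)
    rng = random.Random(seed)
    labels = list(in_window)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)               # reassign "window" labels at random
        if diff(labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```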

3. GARCH(1,1) model

A GARCH(1,1) model captures time-varying volatility, the well-documented tendency of stock market volatility to cluster (calm periods follow calm periods; turbulent periods follow turbulent periods). We include the payday dummy variables in the mean equation and model the conditional variance as:

σ^2_t = ω + α · ε^2_{t-1} + β · σ^2_{t-1}

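Here ε_{t-1} is the previous day's residual from the mean equation, and ω, α, and β are variance parameters, distinct from the intercept α and bucket coefficients β of the regression above. Modeling the variance this way lets our significance tests account for the fact that volatility is not constant over time.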

Ordinary regression weights every day equally, as if the market's choppiness were the same every day. GARCH admits that the market has calm weeks and stormy weeks, and adjusts the significance tests accordingly. If the payday effect is still significant after this adjustment, it is more robust.
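The model above can be written down as a log-likelihood that an optimizer would maximize. This is a minimal sketch assuming Gaussian errors and a variance recursion started at the unconditional variance ω / (1 - α - β), which is only one of several common initializations; `garch_loglik` and its argument names are hypothetical. The arguments `alpha` and `beta` are the variance parameters, while `mu` and `betas` are the mean-equation intercept and bucket coefficients.

```python
import math


def garch_loglik(returns, dummies, mu, betas, omega, alpha, beta):
    """Gaussian log-likelihood of GARCH(1,1) with payday dummies in the mean.

    Mean:     r_t = mu + sum_k betas[k] * dummies[t][k] + eps_t
    Variance: sigma2_t = omega + alpha * eps_{t-1}^2 + beta * sigma2_{t-1}
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return -math.inf                      # outside the covariance-stationary region
    sigma2 = omega / (1 - alpha - beta)       # start at the unconditional variance
    ll, eps_prev = 0.0, None
    for r, d in zip(returns, dummies):
        if eps_prev is not None:
            sigma2 = omega + alpha * eps_prev ** 2 + beta * sigma2
        eps = r - mu - sum(b * x for b, x in zip(betas, d))
        ll += -0.5 * (math.log(2 * math.pi) + math.log(sigma2) + eps ** 2 / sigma2)
        eps_prev = eps
    return ll
```

In practice the parameters would be estimated by numerically maximizing this likelihood, for example with the Python `arch` package, rather than by hand.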

4. FDR correction (Benjamini-Hochberg)

We run dozens of statistical tests across different pay schedules, time windows, and asset types. If you test 100 hypotheses that are all truly null, about 5 will look significant at the 5% level by chance alone. The Benjamini-Hochberg (1995) procedure controls the false discovery rate: the expected proportion of false positives among all findings declared significant.

We report both raw p-values and FDR-adjusted q-values for every test. A finding must survive FDR correction (q < 0.10) to be treated as reliable.

When you flip a coin 100 times, you will get runs of heads that look meaningful but are just luck. FDR correction is a way of adjusting for the fact that we ran many tests at once, so the findings that survive are less likely to be flukes.
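The Benjamini-Hochberg adjustment itself is short. This is a minimal sketch of the standard step-up q-value calculation, q_(i) = min over j ≥ i of p_(j) · m / j, with `bh_qvalues` a hypothetical name; the q < 0.10 threshold is then applied to its output.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                          # walk from the largest p down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q
```

For example, raw p-values of 0.01, 0.04, 0.03, and 0.20 become q-values of 0.04, about 0.053, about 0.053, and 0.20, so the first three would pass a q < 0.10 threshold.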

Assumptions

Every statistical model rests on assumptions. We conducted a formal audit of every assumption underlying our analysis and documented whether each one holds, holds with caveats, or fails. The chart below summarizes the results:

Figure: Assumption audit results. Each assumption was tested against the actual data. PASS = assumption holds. CAVEAT = assumption partially holds or requires mitigation. FAIL = assumption is violated, but a mitigation strategy is in place.

Limitations

No study is without limitations. The four most important ones for this research are:

  1. No actual flow data: We do not have direct data on daily 401(k) contribution volumes. We are testing whether prices behave as if payroll money moves markets, but we cannot observe the flow itself. A fully correct model would weight each pay schedule by its actual dollar payroll, which is not publicly available at daily frequency.
  2. Overnight/intraday split is imprecise: Daily OHLC bars fold pre-market (4:00–9:30 ET) and post-market (16:00–20:00 ET) trading into the "overnight" return via the next day's open print. A proper pre/post-market analysis would require intraday tick data, which is beyond the scope of this study.
  3. Payday roll convention is uncertain: When a payday falls on a weekend or holiday, we roll it back to the prior trading day. In practice, conventions vary, and some employers instead deposit pay on the following business day (a forward roll). This means our payday calendar may be offset by 1–2 days for roughly 30% of paydays, which could attenuate real signals.
  4. Multiple testing: Despite applying FDR correction, the sheer number of tests (up to 7 pay schedules × 4 buckets × 3 time windows × 2 assets) means some spurious findings may survive; at q < 0.10, up to one in ten declared findings may be false in expectation. We emphasize patterns that replicate across multiple methods and time periods rather than isolated significant p-values.
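The scale of the roll-convention problem can be checked directly: for a fixed day-of-month payday, the backward roll used here and a forward roll disagree exactly when that date is not a trading day. This is a minimal sketch with hypothetical names, using a weekdays-only calendar as a stand-in for the real exchange calendar. With weekends alone the disagreement share for the 15th is close to 2/7 (about 29%), and adding market holidays can only raise it, consistent with the "roughly 30%" figure.

```python
import bisect
from datetime import date, timedelta


def weekdays(start, end):
    """All Monday-Friday dates in [start, end]; a stand-in for an exchange calendar."""
    out, d = [], start
    while d <= end:
        if d.weekday() < 5:
            out.append(d)
        d += timedelta(days=1)
    return out


def rolled(d, days, forward):
    """Trading day on/after d (forward) or on/before d (backward); None if outside."""
    if forward:
        i = bisect.bisect_left(days, d)
        return days[i] if i < len(days) else None
    i = bisect.bisect_right(days, d)
    return days[i - 1] if i > 0 else None


def roll_disagreement(days, dom=15):
    """Share of months whose day-`dom` payday differs between the two conventions."""
    months = sorted({(d.year, d.month) for d in days})
    differ = sum(
        rolled(date(y, m, dom), days, forward=False) != rolled(date(y, m, dom), days, forward=True)
        for y, m in months
    )
    return differ / len(months)
```

For instance, `roll_disagreement(weekdays(date(1960, 1, 1), date(2025, 12, 31)))` returns a share close to 2/7.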