Methodology
How we tested the hypothesis: data sources, statistical models, controls, and assumptions.
Research question
Do S&P 500 daily returns exhibit statistically significant excess performance in the trading days surrounding predictable payroll-driven inflows (semi-monthly, biweekly, and other pay schedules), after controlling for known calendar anomalies, and is any such effect consistent with the mechanism of 401(k) contribution flow rather than alternative explanations?
Data
| Field | S&P 500 | Bitcoin |
|---|---|---|
| Source | Yahoo Finance | Yahoo Finance |
| Ticker | ^GSPC | BTC-USD |
| Sample (main) | 1960–2026 (65 years) | 2016–2026 (10 years) |
| Additional windows | 2016–2026, 2020–2026 | 2020–2026 |
| Rows | 16,680 trading days | 3,758 calendar days |
| Fields | Date, Open, High, Low, Close, Volume | Date, Open, High, Low, Close, Volume |
Bitcoin serves as a secondary test asset. Because BTC trades 24/7 and has no "overnight" session, it lets us check whether payday effects are specific to traditional equity market hours or appear in any risk asset.
Payday definitions
We constructed seven distinct payday calendars, each representing a different payroll schedule used in the US economy:
| Schedule | Payday dates | Why we test it |
|---|---|---|
| Mid-month | 15th of each month (rolled to prior trading day if weekend/holiday) | First leg of semi-monthly pay; common for salaried workers |
| End-of-month | Last trading day of each month | Second leg of semi-monthly pay; overlaps with turn-of-month |
| Semi-monthly | Both 15th and month-end | The combined schedule used by ~19% of US workers; primary test |
| Biweekly | Every other Friday (union of two offset calendars) | Most common US pay schedule (~37% of workers); lower flow concentration |
| Weekly | Every Friday | Negative control: spreads flow across all weeks, should dilute any signal |
| Military | 1st and 15th of each month | Thrift Savings Plan (TSP) contributions from federal/military pay |
| Social Security | 2nd, 3rd, 4th Wednesdays of each month | Negative control: recipients generally do not invest in equities via payroll deduction |
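
As a rough illustration of how the mid-month, end-of-month, and semi-monthly calendars could be generated, the sketch below applies the roll-to-prior-trading-day rule from the table. The function names and the `holidays` argument are hypothetical, and the sketch treats any weekday that is not in `holidays` as a trading day, which is a simplification of the real exchange calendar.

```python
from datetime import date, timedelta

def roll_back(d, holidays=frozenset()):
    """Move d back to the latest weekday on or before d that is not a holiday."""
    while d.weekday() >= 5 or d in holidays:  # 5 = Saturday, 6 = Sunday
        d -= timedelta(days=1)
    return d

def mid_month_paydays(year, holidays=frozenset()):
    """15th of each month, rolled to the prior trading day if needed."""
    return [roll_back(date(year, m, 15), holidays) for m in range(1, 13)]

def month_end_paydays(year, holidays=frozenset()):
    """Last trading day of each month."""
    out = []
    for m in range(1, 13):
        first_of_next = date(year + 1, 1, 1) if m == 12 else date(year, m + 1, 1)
        out.append(roll_back(first_of_next - timedelta(days=1), holidays))
    return out

def semi_monthly_paydays(year, holidays=frozenset()):
    """Union of the mid-month and end-of-month calendars."""
    return sorted(set(mid_month_paydays(year, holidays))
                  | set(month_end_paydays(year, holidays)))
```

For example, 15 June 2024 falls on a Saturday, so `mid_month_paydays(2024)` places that payday on Friday 14 June.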
Bucket definitions
For each payday calendar, every trading day is assigned to exactly one bucket based on its distance from the nearest payday:
| Bucket | Days relative to payday | What it captures |
|---|---|---|
| Run-up | T-5 through T-1 | The 5 trading days before a payday. If someone is front-running the expected inflow, this is where we would see it — prices rising in anticipation of the wave of buying. |
| Payday | T (day zero) | The payday itself. If 401(k) money hits the market immediately, this day should show the impact. |
| Post-payday | T+1 through T+3 | The 3 trading days after payday. Accounts for the fact that contributions may take a few days to settle and actually execute as stock purchases. |
| Other | Everything else | All days that do not fall within any payday window. This is the baseline — the "normal" days we compare against. |
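
The bucket rule can be sketched as follows, assuming `trading_days` is a sorted list of trading dates and `paydays` is a collection of dates drawn from it. Both names, and the helper functions, are hypothetical. One choice here is an assumption rather than something the study specifies: when a day is equally far from two paydays, the sketch measures it against the earlier one.

```python
from bisect import bisect_left

def bucket_for_offset(offset):
    """Map a signed trading-day offset from the nearest payday to a bucket."""
    if offset == 0:
        return "payday"
    if -5 <= offset <= -1:
        return "run_up"        # T-5 through T-1
    if 1 <= offset <= 3:
        return "post_payday"   # T+1 through T+3
    return "other"

def assign_buckets(trading_days, paydays):
    """Label every trading day with exactly one bucket."""
    pay_set = set(paydays)
    pay_idx = [i for i, d in enumerate(trading_days) if d in pay_set]
    labels = []
    for i in range(len(trading_days)):
        pos = bisect_left(pay_idx, i)
        # Nearest payday is either the one just before or the one at/after day i.
        neighbours = pay_idx[max(pos - 1, 0):pos + 1]
        if not neighbours:
            labels.append("other")
            continue
        # Negative offset = before the payday; ties go to the earlier payday.
        offset = min((i - p for p in neighbours), key=abs)
        labels.append(bucket_for_offset(offset))
    return labels
```

Offsets are counted in trading days (positions in the list), not calendar days, so weekends and holidays do not widen the windows.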
Statistical methods
We apply four complementary statistical methods to the same data: three tests that rest on different assumptions, plus a multiple-testing correction. A finding that appears under several methods is more likely to be real than an artifact of one particular test's assumptions.
1. HAC-OLS regression
The primary model is an ordinary least squares regression with Newey-West heteroskedasticity- and autocorrelation-consistent standard errors:
r_t = α + β_run · D_run + β_pay · D_pay + β_post · D_post + γ · Controls + ε_t
Where r_t is the daily return, D_run, D_pay, D_post are dummy variables for each bucket, and Controls include:
- Monday dummy — captures the well-documented Monday effect (French 1980)
- Friday dummy — captures end-of-week patterns (Cross 1973)
- January dummy — captures the January effect (Rozeff & Kinney 1976)
- Turn-of-month dummy — captures the last day + first 3 days of each month (Ariel 1987, Lakonishok & Smidt 1988)
- Pre-holiday dummy — captures elevated returns before market holidays (Ariel 1990)
- AR(1) term — lagged return to capture serial correlation
- Volatility regime — realized volatility proxy to control for clustering
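The Newey-West bandwidth is chosen automatically from the sample size: L = floor(4 × (T/100)^(2/9)), following Newey & West (1994). Here T denotes the number of observations, not the payday index T used in the bucket definitions.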
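To make the bandwidth rule concrete, here is a minimal sketch that evaluates it and lists the Bartlett kernel weights that Newey-West standard errors apply to each autocovariance lag. The function names are hypothetical, and the regression itself is not reproduced.

```python
import math

def newey_west_lags(n_obs):
    """Bandwidth rule from the text: floor(4 * (T / 100) ** (2 / 9))."""
    return math.floor(4 * (n_obs / 100) ** (2 / 9))

def bartlett_weights(lags):
    """Bartlett kernel weights 1 - j / (L + 1) for lags j = 1..L."""
    return [1 - j / (lags + 1) for j in range(1, lags + 1)]
```

Applied to the row counts in the data table, the rule gives 12 lags for the 16,680 S&P 500 observations and 8 lags for the 3,758 Bitcoin observations. The effective sample may be slightly smaller once the lagged-return term drops the first row.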
2. Monte Carlo permutation test
We randomly shuffle the payday calendar dates 500+ times. For each permutation, we recompute the mean return difference between "payday window" and "other" days. This builds a null distribution without relying on any assumptions about the shape of the return distribution.
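The permutation p-value is the fraction of random shuffles that produce an effect as large as or larger than the real one. This test is distribution-free in the sense that it does not assume normality or any other parametric form for returns. It does treat days as exchangeable under the null, so the shuffle ignores serial dependence and volatility clustering; those features are addressed by the HAC and GARCH models.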
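A minimal sketch of such a test, under one simplifying assumption: shuffling the payday calendar is implemented as shuffling a per-day True/False "in payday window" flag across trading days. The function name, arguments, and seed are hypothetical.

```python
import random
from statistics import fmean

def permutation_p_value(returns, in_window, n_perm=500, seed=0):
    """One-sided permutation test for mean(window days) - mean(other days)."""
    rng = random.Random(seed)

    def mean_diff(flags):
        inside = [r for r, f in zip(returns, flags) if f]
        outside = [r for r, f in zip(returns, flags) if not f]
        return fmean(inside) - fmean(outside)

    observed = mean_diff(in_window)
    flags = list(in_window)          # copy so the caller's labels are untouched
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(flags)           # random reassignment of window labels
        if mean_diff(flags) >= observed:
            hits += 1
    # Fraction of shuffles with an effect at least as large as the real one.
    return observed, hits / n_perm
```

Shuffling keeps the number of window days fixed, so every permuted calendar has the same size as the real one. A common variant reports (hits + 1) / (n_perm + 1) so that the p-value can never be exactly zero.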
3. GARCH(1,1) model
A GARCH(1,1) model captures time-varying volatility — the well-documented tendency of stock market volatility to cluster (calm periods follow calm periods; turbulent periods follow turbulent periods). We include the payday dummy variables in the mean equation, and model conditional variance as:
σ^2_t = ω + α · ε^2_{t-1} + β · σ^2_{t-1}
This ensures our significance tests account for the fact that volatility is not constant over time.
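The variance recursion can be traced by hand with a short sketch. The parameter values used with it are illustrative only, not estimates from this study, and starting the recursion at the unconditional variance ω / (1 − α − β) is an assumption of the sketch. In practice the parameters are estimated jointly with the mean equation by maximum likelihood, which is not shown.

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variance path: s2[t] = omega + alpha * e[t-1]**2 + beta * s2[t-1]."""
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be below 1 for a finite long-run variance")
    # Assumption: initialise at the unconditional (long-run) variance.
    sigma2 = [omega / (1 - alpha - beta)]
    for eps in residuals[:-1]:
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2
```

With `omega=0.1, alpha=0.1, beta=0.8`, a large residual raises the next day's variance, which then decays back toward the long-run level at a rate set by α + β.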
4. FDR correction (Benjamini-Hochberg)
We run dozens of statistical tests across different pay schedules, time windows, and asset types. If you test 100 hypotheses that are all truly null, about 5 will look significant at the 5% level by pure chance. The Benjamini-Hochberg (1995) procedure controls the false discovery rate: the expected proportion of false positives among all findings declared significant.
We report both raw p-values and FDR-adjusted q-values for every test. A finding must survive FDR correction (q < 0.10) to be treated as reliable.
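For reference, the Benjamini-Hochberg adjustment can be sketched in a few lines. The function name is hypothetical and the p-values in the example are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

# Illustrative p-values only
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
reliable = [q < 0.10 for q in q_values]
```

In this made-up example the first three tests pass the q < 0.10 threshold and the fourth does not.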
Assumptions
Every statistical model rests on assumptions. We audited each assumption underlying our analysis and documented whether it holds, holds with caveats, or fails. The accompanying chart summarizes the results.
Limitations
No study is without limitations. The four most important ones for this research are:
- No actual flow data: We do not have direct data on daily 401(k) contribution volumes. We are testing whether prices behave as if payroll money moves markets, but we cannot observe the flow itself. A fully correct model would weight each pay schedule by its actual dollar payroll, which is not publicly available at daily frequency.
- Overnight/intraday split is imprecise: Daily OHLC bars fold pre-market (4:00–9:30 ET) and post-market (16:00–20:00 ET) trading into the "overnight" return via the next day's open print. A proper pre/post-market analysis would require intraday tick data, which is beyond the scope of this study.
- Payday roll convention is uncertain: When a payday falls on a weekend or holiday, we roll it back to the prior trading day. In practice, employers do not all follow this convention, and some deposit pay on the following business day instead (forward roll). This means our payday calendar may be offset by 1–2 days for roughly 30% of paydays, which could attenuate real signals.
- Multiple testing: Despite applying FDR correction, the sheer number of tests (up to 7 pay schedules × 4 buckets × 3 time windows × 2 assets) means some spurious findings may survive. We emphasize patterns that replicate across multiple methods and time periods rather than isolated significant p-values.