Generic fraud rules fail SMB merchants for a reason that is structural, not incidental: they are not built for the merchants they're being applied to. A rule set designed to catch high-frequency card-not-present fraud on commodity goods will misfire constantly when applied to a landscaping company in Atlanta, a specialty food distributor in Phoenix, or a local HVAC contractor in Cleveland. These businesses have transaction profiles that bear almost no resemblance to the merchant profiles generic rules were optimized for. The rule set doesn't know that. It applies the same logic regardless.

Merchant baseline calibration is the systematic approach to solving this problem. It's not a new concept in payments, but its implementation at the SMB processor level has historically been constrained by the data infrastructure requirements. Understanding both the problem and the solution in specific terms is useful for any processor evaluating their fraud operations.

Why Two Merchants Are Not the Same Problem

Consider two merchants on the same SMB processor's portfolio. Merchant A is a landscaping and outdoor services company in Atlanta. Their transactions cluster heavily in spring and fall, with near-zero volume in January and February. Average ticket is $380, with significant variance — some invoices are $150 for maintenance visits, others are $2,400 for full landscape projects. Customers are local: 90% of cardholders are within 35 miles. Most transactions happen between 7am and 6pm on weekdays. Repeat customer rate is high — the same 40 customers account for 60% of annual revenue.

Merchant B is a SaaS subscription company in New York City. Their transactions are uniformly distributed throughout the year, with slight upticks in January (new year signups) and September (back-to-school cohort). Average ticket is $79/month with essentially zero variance — it's a subscription at a fixed price. Customers are national and international: cardholders come from all 50 states and 30+ countries. Transactions happen continuously, 24/7, with automated billing at fixed intervals. Repeat customer rate is structurally 100% until cancellation.

A generic rule that flags transactions above $500 from first-time cardholders will fire frequently on Merchant A (new landscape project customers paying a substantial first invoice) and almost never on Merchant B (the subscription price never approaches $500 per transaction). A velocity rule that flags more than three transactions per card in 24 hours will rarely fire on either merchant under normal conditions, but it will produce a false positive on Merchant A whenever a customer processes a deposit, a change order, and a final payment on the same day — all legitimate activity.

Apply either rule set to both merchants and you get miscalibration in both directions: excessive false positives on legitimate patterns, and gaps where the rules simply don't apply to the actual fraud risks those businesses face.
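The mismatch can be made concrete with a minimal sketch. The transaction record and field names below are illustrative, not from any real processor API; the rule is the $500 first-time-cardholder rule described above.

```python
from dataclasses import dataclass

# Hypothetical transaction record; field names are illustrative,
# not drawn from any real processor's schema.
@dataclass
class Txn:
    merchant: str
    amount: float
    first_time_card: bool

def generic_rule(t: Txn) -> bool:
    """Generic rule: flag transactions over $500 from first-time cardholders."""
    return t.amount > 500 and t.first_time_card

# Synthetic transactions mirroring the two merchant profiles above.
txns = [
    Txn("landscaper", 2400.0, True),   # new customer, full landscape project
    Txn("landscaper", 150.0, False),   # repeat maintenance visit
    Txn("saas", 79.0, True),           # new subscriber at the fixed price
    Txn("saas", 79.0, False),          # recurring automated billing
]

flags = [t.merchant for t in txns if generic_rule(t)]
print(flags)  # → ['landscaper']: the only flag is a legitimate first invoice
```

The rule fires exactly once — on the landscaper's legitimate $2,400 project invoice — and would never fire on the subscription merchant at all, which is the miscalibration in both directions described above.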

What Generic Rules Optimize For

Generic rule sets for card-not-present fraud were primarily developed and refined against high-volume e-commerce fraud patterns: stolen cards used for quick, high-value purchases of resalable goods, shipped to addresses not associated with the card. The rules encode signals that are diagnostic for that fraud type: high-value first transactions, shipping-billing address mismatch, multiple cards at a single shipping address, rapid transaction velocity from a single IP.

These signals are genuinely informative for the fraud type they were built for. They become misleading when applied to merchants whose normal operations generate similar signals. A merchant who regularly ships to business addresses that differ from billing addresses will generate billing-shipping mismatches on every legitimate B2B order. A merchant with a one-day sale will generate legitimate velocity spikes. The rule can't distinguish.

The result is a systematic calibration mismatch: SMB merchants with irregular, high-variance, locally-concentrated transaction profiles are penalized by rules that were calibrated against high-frequency, geographically distributed, consistent-value transaction patterns. The mismatch isn't occasional — it's structural.

How Per-Merchant Baseline Calibration Works

Merchant baseline calibration replaces population-level benchmarks with merchant-specific behavioral models. The calibration process uses the merchant's own transaction history as the reference point for anomaly detection.

Data window: 30 to 60 days of transaction history is the standard calibration window. Less than 30 days produces baselines that are too sensitive to short-term variance. More than 90 days dilutes signals about recent behavioral changes (which matter for bust-out detection). 30-60 days balances stability against responsiveness.

Key metrics for calibration:

- Ticket size distribution: average and variance, not just the mean — Merchant A's $150-to-$2,400 spread and Merchant B's fixed $79 are both "normal" for those businesses.
- Seasonality: expected volume by month, so a spring surge or a January lull reads as routine rather than anomalous.
- Geographic concentration: where cardholders sit relative to the merchant — 90% within 35 miles versus nationally and internationally distributed.
- Timing patterns: time-of-day and day-of-week distribution, from weekday business hours to continuous 24/7 automated billing.
- Repeat customer rate: the share of volume coming from returning cardholders versus first-time cards.
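Under the assumptions above, computing a per-merchant baseline from a recent transaction window might look like the following sketch. The dictionary fields and thresholds are hypothetical, chosen to match the example metrics rather than any production schema.

```python
import statistics
from datetime import datetime

def compute_baseline(txns):
    """Derive illustrative baseline metrics from a merchant's own
    transaction history (e.g. a 30-60 day window). Field names
    ('amount', 'ts', 'cardholder_miles', 'card_id') are hypothetical."""
    amounts = [t["amount"] for t in txns]
    hours = [t["ts"].hour for t in txns]
    distances = [t["cardholder_miles"] for t in txns]
    cards = [t["card_id"] for t in txns]
    return {
        "avg_ticket": statistics.mean(amounts),
        "ticket_stdev": statistics.stdev(amounts) if len(amounts) > 1 else 0.0,
        "median_distance_miles": statistics.median(distances),
        # Share of transactions during 7am-6pm local business hours.
        "business_hours_share": sum(7 <= h <= 18 for h in hours) / len(hours),
        # Share of volume attributable to repeat cards.
        "repeat_card_share": 1 - len(set(cards)) / len(cards),
    }

# A tiny synthetic history in the spirit of Merchant A.
history = [
    {"amount": 380.0, "ts": datetime(2024, 4, 2, 9), "cardholder_miles": 12, "card_id": "c1"},
    {"amount": 150.0, "ts": datetime(2024, 4, 9, 14), "cardholder_miles": 8, "card_id": "c1"},
    {"amount": 2400.0, "ts": datetime(2024, 4, 20, 10), "cardholder_miles": 30, "card_id": "c2"},
]
baseline = compute_baseline(history)
print(round(baseline["avg_ticket"], 2))  # → 976.67
```

Anomaly scoring then compares each new transaction against this merchant-specific reference rather than a portfolio-wide benchmark, which is the substitution at the heart of the approach.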

Baseline Drift and Recalibration

Merchant baselines are not static. Legitimate merchants change: they run promotions, expand to new markets, change their pricing, or shift their customer mix. A baseline calibrated in January may not accurately represent a merchant's normal patterns in June if the business has grown or changed substantially.

Baseline drift refers to gradual, legitimate shifts in a merchant's transaction profile that should update the baseline rather than trigger fraud alerts. A merchant expanding from local to regional service delivery will show geographic spread increasing over 2-3 months — this is growth, not an attack. A pricing change will shift the ticket size distribution. Treating these legitimate shifts as anomalies creates false positives; failing to update the baseline leaves the model miscalibrated.

The recalibration trigger is a rate-of-change threshold on baseline metrics. Gradual drift that stays below the rate-of-change threshold updates the rolling baseline. Sudden discontinuities — ticket size jumping 50% in a week — trigger alerts for human review rather than silent baseline updates. This distinction is central to the bust-out detection use case: the behavioral drift pattern that precedes bust-out fraud looks like legitimate growth until the rate of change exceeds what organic growth produces. The recalibration logic has to encode that distinction. Baselines should be recalibrated when legitimate growth signals are confirmed, either through automated confidence thresholds or human review, but never through silent drift that normalizes anomalous escalation.
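The rate-of-change distinction can be sketched as a simple classifier over week-over-week change in a baseline metric. The thresholds below are illustrative placeholders, not calibrated values.

```python
# Sketch of a rate-of-change recalibration trigger. The thresholds
# are illustrative assumptions, not calibrated production values.
WEEKLY_DRIFT_LIMIT = 0.10   # <=10% week-over-week: absorb into the rolling baseline
ALERT_LIMIT = 0.50          # >=50% jump in a week: escalate for human review

def classify_shift(prev_avg_ticket: float, new_avg_ticket: float) -> str:
    """Classify a week-over-week change in average ticket size."""
    change = abs(new_avg_ticket - prev_avg_ticket) / prev_avg_ticket
    if change >= ALERT_LIMIT:
        return "alert"        # discontinuity: hold the baseline, review first
    if change <= WEEKLY_DRIFT_LIMIT:
        return "recalibrate"  # gradual drift: fold into the rolling baseline
    return "watch"            # in between: monitor without updating

print(classify_shift(380.0, 400.0))  # → recalibrate (~5% drift)
print(classify_shift(380.0, 580.0))  # → alert (~53% jump in a week)
```

The key property is that the "alert" path never silently updates the baseline: a sudden jump holds the old reference in place until a human (or a confidence-gated automated check) confirms the shift is legitimate growth, which is exactly the guard against normalizing a bust-out escalation.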