How Card Testing Attacks Work — and How to Stop Them

A card testing attack is one of the most mechanically predictable fraud patterns in payments, yet it keeps burning processors that haven't tuned their velocity rules for its specific fingerprint. The attack structure is nearly always the same: an actor acquires a batch of compromised card numbers (typically via a credential dump, a dark web market, or a BIN attack) and needs to verify which cards are still live before committing them to higher-value fraud. They run a sequence of small-dollar authorizations against a merchant or processor API, observe which succeed, and discard the rest. The damage to the processor shows up weeks later in chargeback volumes, interchange penalties, and, if the ratio climbs high enough, escalation from Visa's Fraud Monitoring Program (VFMP) or Mastercard's Excessive Chargeback Program (ECP).

Understanding the mechanics matters because the detection signals derive directly from the attack structure. If you know what the attacker is optimizing for — rapid card enumeration with minimum transaction footprint — you know exactly which velocity windows to instrument.

The anatomy of a BIN attack

A BIN (Bank Identification Number) attack is the most common card testing variant targeting payment infrastructure directly. The attacker doesn't need a list of known card numbers. Instead, they take a 6-digit BIN prefix and systematically enumerate possible account numbers against it, submitting transactions in rapid succession to test which combinations are valid. Enumerating even 4 to 6 digits of the account number yields between 10,000 and 1,000,000 candidate card numbers, and a 1–2% hit rate is operationally sufficient for the attacker if they can run at scale.

The transaction signature of a BIN attack is distinct:

High authorization volume against a narrow BIN prefix — often 50–500 transaction attempts within a 5–15 minute window
Amounts clustered at $0.00–$1.00 (some processors allow $0 auth), or at common test amounts like $0.01, $0.99, $1.00
Single merchant or single MID as the target — the attacker picks a low-friction endpoint with predictable response codes
Device fingerprint recycling: the same device ID or IP subnet appears across dozens of card numbers in sequence
Rapid decline-then-retry patterns: the attacker sees a decline code, immediately submits the next card in the enumerated sequence

The signals for BIN attacks are velocity signals operating on short windows — 5 minutes, 15 minutes, 1 hour. Standard daily or weekly velocity rules don't catch them because the attack completes and disperses before aggregation windows close.
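A short-window BIN counter can be sketched as follows. This is a minimal in-memory illustration, not a production design: the class name BinVelocityCounter and the card numbers are hypothetical, timestamps are assumed to arrive in order, and a real deployment would keep the window in a shared store rather than in process memory.

```python
from collections import defaultdict, deque


class BinVelocityCounter:
    """Counts authorization attempts per 6-digit BIN over a sliding window."""

    def __init__(self, window_seconds: int = 15 * 60):
        self.window_seconds = window_seconds
        # bin_6 -> timestamps of recent attempts, oldest first
        self._events = defaultdict(deque)

    def record(self, pan: str, ts: float) -> int:
        """Record one attempt and return the count inside the window."""
        q = self._events[pan[:6]]
        q.append(ts)
        # Evict timestamps that have aged out of the window.
        while q and q[0] <= ts - self.window_seconds:
            q.popleft()
        return len(q)


counter = BinVelocityCounter()
# 60 attempts against one BIN, 4 seconds apart (made-up card numbers).
counts = [counter.record(f"411111{i:010d}", ts=i * 4.0) for i in range(60)]
print(counts[-1])  # 60: every attempt is still inside the 15-minute window
```

The same attempts counted against a daily window would look unremarkable next to normal traffic, which is why the window length matters more than the threshold.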

Distributed card testing: the harder variant

The pattern that most mid-size processors get caught by is distributed card testing — where the attacker spreads testing attempts across multiple merchants, IP addresses, device fingerprints, and time windows to stay below per-entity velocity thresholds. A distributed testing operation might submit 4–6 attempts per device across 20 different device identifiers, targeting 15 different merchant IDs, spread across a 4-hour window. No single entity trips a velocity rule. But the cross-entity pattern — same card BIN prefix appearing across multiple devices, merchants, and IPs in a correlated time band — is the tell.

Detection at this level requires aggregating across the full graph: BIN → device, BIN → IP, BIN → merchant, device → account. The signal is not per-entity velocity; it is entity-graph co-occurrence within a time window. This is precisely where rule-based systems fail — a rule can encode velocity per entity, but encoding a graph query is not something a rules engine does naturally. It requires a feature store that can compute cross-entity counts in near real time.
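The cross-entity counts described above can be illustrated with a small batch sketch. It assumes a hypothetical Auth record with the fields shown and scans a list in memory; a real feature store would maintain these sets incrementally, and the 4-hour window is only the example value from the scenario above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Auth:
    ts: float          # seconds since epoch
    bin_6: str
    device_id: str
    ip: str
    merchant_id: str


def bin_cooccurrence(events, now, window_seconds=4 * 3600):
    """Distinct devices, IPs and merchants seen per BIN within the window."""
    seen = defaultdict(lambda: {"devices": set(), "ips": set(), "merchants": set()})
    for e in events:
        if now - window_seconds < e.ts <= now:
            entry = seen[e.bin_6]
            entry["devices"].add(e.device_id)
            entry["ips"].add(e.ip)
            entry["merchants"].add(e.merchant_id)
    return {b: {k: len(v) for k, v in entry.items()} for b, entry in seen.items()}


# 100 attempts on one BIN: 5 per device across 20 devices and 15 merchants.
events = [
    Auth(ts=i * 100.0, bin_6="411111", device_id=f"dev{i % 20}",
         ip=f"10.0.{i % 20}.1", merchant_id=f"mid{i % 15}")
    for i in range(100)
]
print(bin_cooccurrence(events, now=10_000.0))
# {'411111': {'devices': 20, 'ips': 20, 'merchants': 15}}
```

No device in this example makes more than 5 attempts, so a per-device threshold stays quiet, while the BIN-level fan-out across devices and merchants is obvious.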

What the velocity features look like in practice

In a production fraud scoring context, the features that carry predictive weight for card testing detection are computed across rolling windows at multiple granularities. A feature set for this attack type typically includes:

{
 "bin_6_auth_count_15m": 47,
 "bin_6_decline_rate_15m": 0.89,
 "device_id_card_count_1h": 23,
 "ip_subnet_card_count_30m": 31,
 "merchant_id_small_dollar_count_5m": 18,
 "amount_bucket": "0_to_1_usd",
 "card_sequence_gap_median_seconds": 4.2
}

The bin_6_auth_count_15m feature — how many authorization attempts have hit this BIN prefix in the last 15 minutes — is the single strongest univariate predictor for BIN attacks in our internal validation study using synthetic transaction streams. The card_sequence_gap_median_seconds feature captures the attacker's retry cadence: human card entry has a minimum of 15–30 seconds between attempts; automated testing loops run at 2–8 seconds per attempt depending on network latency and API throttling. A median gap below 10 seconds on a sequence of 5+ card attempts from the same device is a strong positive signal even if no single velocity threshold is breached.
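The retry-cadence check can be sketched in a few lines. The function looks_automated and its parameter names are illustrative; the defaults (5 attempts, 10 seconds) are the values quoted above, and the input is assumed to be the attempt timestamps for a single device.

```python
from statistics import median


def card_sequence_gap_median_seconds(timestamps):
    """Median gap between consecutive attempts; None if fewer than 2."""
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return median(gaps) if gaps else None


def looks_automated(timestamps, min_attempts=5, max_median_gap=10.0):
    """True when a device has enough attempts and a sub-threshold median gap."""
    if len(timestamps) < min_attempts:
        return False
    gap = card_sequence_gap_median_seconds(timestamps)
    return gap is not None and gap < max_median_gap


bot = [0, 4, 9, 12, 17, 21]      # gaps of 3 to 5 seconds
human = [0, 25, 60, 90, 130]     # gaps of 25 to 40 seconds
print(looks_automated(bot), looks_automated(human))  # True False
```

Using the median rather than the mean keeps one long pause (a throttled request, a network timeout) from masking an otherwise machine-paced sequence.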

We're not saying rule-based systems can't catch any card testing — simple BIN velocity rules do catch obvious attacks. What they can't catch is the long tail of distributed, throttled, and BIN-spread attacks that account for the majority of testing volume in practice.

A plausible production scenario

Consider a payment processor handling approximately 120,000 daily transactions for a mix of e-commerce merchants, an operating profile typical of a mid-market acquirer. In a synthetic transaction stream modeling this environment, a card testing campaign targeting a single high-throughput MID generates 340 authorization attempts over 22 minutes, using 8 distinct device fingerprints cycling through a BIN range. That works out to roughly one attempt every 4 seconds in aggregate (340 attempts in 1,320 seconds). The median transaction amount is $0.99. The decline rate across the batch is 91%.

A daily velocity rule set at 100 transactions per card per day doesn't fire, because no single card number breaches the threshold. A per-device rule set at 10 transactions per device per hour reacts only device by device: 340 attempts across 8 fingerprints average about 42 per device, so the rule trips for each fingerprint only after its first 10 attempts have gone through, and it never sees the campaign as a whole. A feature computing bin_6_auth_count_15m aggregates across all 8 devices and reaches 78 by minute 12. At that point, in our internal evaluation, a gradient-boosted model trained on this feature produces a fraud score above 85 for every subsequent transaction in the sequence. The scoring latency for these feature computations, including a Redis Cluster lookup for the BIN window count, runs at a p50 of 22ms and a p99 of 61ms, well within the sub-100ms authorization window.

The key operational outcome is that flagging happens before the batch completes, not after the chargebacks arrive.

Network program exposure: VFMP and ECP thresholds

The business reason card testing gets elevated attention is its downstream effect on chargeback ratios. When tested cards are confirmed live and then used for unauthorized purchases, the chargebacks land on the merchants where the fraudulent transactions clear — which are often the same merchants used for the testing phase, or other merchants on the same processor's portfolio. Visa's VFMP monitoring triggers at a chargeback ratio above 0.9% of transactions or 0.9% of volume in a given month. Mastercard's ECP triggers at 1.5% chargebacks in a month. Both programs carry monthly fines and eventual termination risk if the ratio persists.

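At a 0.9% threshold, every 120,000 transactions in the monthly count allow roughly 1,080 disputes before a breach. A processor handling 120K daily transactions (about 3.6 million per month) at an average ticket of $85 therefore reaches the Visa VFMP ratio at roughly 32,400 disputed transactions per month across the whole portfolio, or about 1,080 per day. If the ratio is evaluated for a single merchant with 120K monthly transactions, the breach point is 1,080 disputes per month, about 36 per day. A card testing campaign that validates 200 cards in a single session, each subsequently used for $200–$500 in fraudulent purchases, can supply a meaningful share of that merchant-level volume from a single incident. The math makes early detection at the testing phase, before the validated cards are deployed, far more cost-effective than reactive chargeback response.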
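The threshold arithmetic generalizes to any monthly transaction count. The helper below is a sketch with a hypothetical name; it takes the threshold in basis points (90 for 0.9%, 150 for 1.5%) so the calculation stays in integers, and it returns the dispute count at which the ratio reaches the threshold.

```python
def disputes_to_breach(monthly_transactions: int, threshold_bps: int) -> int:
    """Smallest dispute count whose ratio to transactions reaches the threshold.

    threshold_bps is the program threshold in basis points (0.9% -> 90).
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-monthly_transactions * threshold_bps // 10_000)


print(disputes_to_breach(120_000, 90))   # 1080
print(disputes_to_breach(120_000, 150))  # 1800
```

The input is the monthly count, so a daily transaction figure has to be multiplied by the number of days in the month before it is passed in.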

Where purely reactive approaches fall short

The standard reactive playbook (dispute chargebacks with compelling evidence, submit representment, accept some loss rate) works for first-party fraud and friendly fraud, where the transaction itself was legitimate. It doesn't work for card testing because the testing transaction and the actual fraud transaction may land on different merchants, different processors, or different acquirers entirely. The processor that absorbed the card testing transactions often isn't the one that absorbs the downstream fraud. There is no chargeback to contest through representment; the exposure is indirect, via card network monitoring ratios computed across your entire portfolio.

This is why the detection investment needs to live at the authorization layer — before the transaction completes — not in the dispute operations workflow. The detection problem and the chargeback problem operate on different timescales: authorization decisions happen in milliseconds; chargeback cycles run 30–90 days. By the time the dispute data arrives, the card testing campaign is long complete and the tested cards are in active use.

For processors evaluating their current card testing exposure, the first diagnostic to run is bin_6_auth_count_15m across your historical transaction logs. If you're seeing peaks above 30 authorization attempts per BIN prefix per 15-minute window at small dollar amounts, you have active testing traffic that your current rules may be missing. The detection signals are there — the question is whether your feature pipeline is computing them fast enough to act on them.
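A first pass at that diagnostic over historical logs might look like the sketch below. It assumes the logs can be read as (epoch_seconds, pan, amount_usd) tuples, which is a hypothetical layout, and it uses fixed 15-minute buckets for simplicity, so a burst that straddles a bucket boundary is undercounted compared with a true rolling window.

```python
from collections import Counter


def peak_bin_windows(rows, window_seconds=900, max_amount=1.00, threshold=30):
    """Return {(bin_6, window_start): count} for buckets above the threshold.

    rows: iterable of (epoch_seconds, pan, amount_usd) tuples.
    Only small-dollar attempts (amount <= max_amount) are counted.
    """
    counts = Counter()
    for ts, pan, amount in rows:
        if amount <= max_amount:
            window_start = int(ts // window_seconds) * window_seconds
            counts[(pan[:6], window_start)] += 1
    return {key: n for key, n in counts.items() if n > threshold}


# 45 small-dollar attempts on one BIN, 10 seconds apart, plus normal purchases.
rows = [(i * 10, f"411111{i:010d}", 0.99) for i in range(45)]
rows += [(i * 60, f"550000{i:010d}", 85.00) for i in range(5)]
print(peak_bin_windows(rows))  # {('411111', 0): 45}
```

Any bucket this returns is a candidate for closer review; an empty result on a fixed-bucket scan does not rule out testing traffic, for the boundary reason noted above.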

See also: Fraudhalo's Card Testing Detection page for the specific signals we monitor in production, and How It Works for the full feature computation architecture.