Anonymised framing. This page refers to the comparison vendor as
Provider P — a top-tier KYT vendor with a paid REST API. The provider’s
real name will appear here after written counsel sign-off.
What we measure
For every screened address we compare two verdicts on the same address, fetched fresh (cache-bypass) from both engines on the same UTC day:
- Aegis verdict: Tier 0 (SDN) + Tier 1–2 (multi-source consensus) + Tier 3 (1-hop inheritance) + Tier 4 (BFS / flow exposure). Same code path that powers our /check-address and /v2/check-address endpoints.
- Provider P verdict: the vendor’s address screen endpoint.
| Band | Aegis levels in band | Provider P levels in band |
|---|---|---|
| ACTIONABLE | sanctioned, critical, high | the vendor’s “block / review” tier |
| CLEAR | medium, low, none | the vendor’s “allow / low” tier |
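The band table can be read as a small lookup. Below is a minimal sketch of the Aegis side only, assuming levels arrive as lowercase-able strings. The names `Band`, `aegis_band` and `bands_agree` are illustrative and not taken from the Aegis codebase. Provider P's level names are not published here, so its band is passed in already mapped.

```python
from enum import Enum


class Band(Enum):
    ACTIONABLE = "ACTIONABLE"
    CLEAR = "CLEAR"


# Aegis levels per band, copied from the band table.
AEGIS_ACTIONABLE = {"sanctioned", "critical", "high"}
AEGIS_CLEAR = {"medium", "low", "none"}


def aegis_band(level: str) -> Band:
    """Collapse an Aegis risk level into one of the two comparison bands."""
    key = level.strip().lower()
    if key in AEGIS_ACTIONABLE:
        return Band.ACTIONABLE
    if key in AEGIS_CLEAR:
        return Band.CLEAR
    # Fail loudly: an unmapped level should not silently count as CLEAR.
    raise ValueError(f"unknown Aegis level: {level!r}")


def bands_agree(aegis_level: str, provider_band: Band) -> bool:
    """True when both engines land in the same band for one address.

    provider_band is assumed to be mapped by the caller, since the
    vendor's own level names are not part of this page.
    """
    return aegis_band(aegis_level) is provider_band
```

Raising on an unknown level is a deliberate choice in this sketch: a silently defaulted CLEAR would inflate the agreement figure.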
Benchmark: 100 addresses, two strata (2026-05-29)
After a 48-row pilot we ran a deliberately bias-correcting 100-row benchmark. The pilot sample was drawn entirely from addresses Aegis already knows about, so by construction it couldn’t surface anything Provider P knew that we didn’t. The benchmark fixes that.
Sample design
| Half | Net | n | Source | Aegis prior data? |
|---|---|---|---|---|
| Stratified known | ETH | 25 | our consensus view (12 ACTIONABLE + 13 CLEAR) | Yes |
| Stratified known | BSC | 25 | same | Yes |
| Fresh real-traffic | TRON | 50 | Tronscan: recipients of >$1000 USDT transfers, last 24h, filtered to addresses NOT in our labels DB | No |
| Total | | 100 | | |
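The fresh-traffic half is defined by three filters: transfer value above $1000, seen in the last 24 hours, and recipient absent from our labels DB. A minimal sketch of that selection follows, assuming transfers have already been fetched into memory. The `Transfer` record and `pick_fresh_recipients` are hypothetical names, and the record shape is not the Tronscan response schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Transfer:
    # Hypothetical record shape; not the Tronscan response schema.
    recipient: str
    usd_value: float
    timestamp: datetime  # timezone-aware, UTC


def pick_fresh_recipients(transfers, known_addresses, now, n=50,
                          min_usd=1000.0, window=timedelta(hours=24)):
    """Return up to n distinct recipients of large, recent transfers
    that are absent from the known-labels set, in first-seen order."""
    cutoff = now - window
    picked = []
    seen = set()
    for t in transfers:
        if t.usd_value <= min_usd:               # keep only transfers > $1000
            continue
        if not (cutoff <= t.timestamp <= now):   # last 24h only
            continue
        if t.recipient in known_addresses or t.recipient in seen:
            continue                             # already labelled or already picked
        seen.add(t.recipient)
        picked.append(t.recipient)
        if len(picked) == n:
            break
    return picked
```

Excluding every address in the labels DB is what removes the pilot's bias: none of these rows can be one Aegis already had an opinion about.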
Headline numbers
| Metric | Value |
|---|---|
| Total cases | 100 |
| In-denominator (provider errors removed) | 96 |
| Two-band agreement | 66 / 96 = 68.8 % |
| Aegis-only ACTIONABLE (Aegis stricter) | 29 |
| Provider-only ACTIONABLE (Aegis blind spots) | 1 |
| Both ACTIONABLE (agreeing risky verdict) | 10 |
| Both CLEAR (agreeing clean verdict) | 56 |
| Excluded (Provider P poll timeout) | 4 |
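The headline agreement figure follows directly from the four cells of the two-band comparison. A short worked check, using only the counts in the table above (the function name is illustrative):

```python
def two_band_agreement(both_actionable, both_clear, aegis_only, provider_only):
    """Return (agreeing cases, denominator, agreement rate)."""
    agreeing = both_actionable + both_clear
    denominator = agreeing + aegis_only + provider_only
    return agreeing, denominator, agreeing / denominator


# Counts copied from the headline table.
agreeing, denominator, rate = two_band_agreement(
    both_actionable=10, both_clear=56, aegis_only=29, provider_only=1
)

total_cases = 100
excluded = total_cases - denominator  # Provider P poll timeouts

print(agreeing, denominator, excluded)  # 66 96 4
print(f"{rate:.1%}")                    # 68.8%
```

The 4 excluded cases never enter the denominator, so they count neither for nor against agreement.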
Per-stratum breakdown
| Stratum | n | denom | agreement | Aegis stricter | Blind spots |
|---|---|---|---|---|---|
| Stratified ACTIONABLE (ETH+BSC) | 24 | 24 | 41.7 % | 14 | 0 |
| Stratified CLEAR (ETH+BSC) | 26 | 26 | 96.2 % | 1 | 0 |
| Fresh unknown TRON | 50 | 46 | 67.4 % | 14 | 1 |
- Stratified-CLEAR (96 %). When both engines see no signal, they almost always agree. The single mismatch in 26 is Aegis upgrading to sanctioned from OFAC-SDN where Provider P calls the address low: a calibration disagreement, not a coverage failure on either side.
- Stratified-ACTIONABLE (42 %). This number is low by design: the stratum is cherry-picked from addresses Aegis already flags as risky, then we ask whether Provider P agrees. When the engines disagree (14 / 24), every single time it’s because Aegis is stricter: Aegis surfaces Tether-blacklist enforcement, OFAC-SDN matches, attacker labels, sanctioned-exchange tags where Provider P returns none / low / medium. This is not a quality score for either side; it’s the asymmetry that matters.
- Fresh-unknown TRON (67 %). Real-world traffic baseline. 31 of 46 both CLEAR (typical traffic, neither engine flags). 14 are Aegis-stricter (BFS / Tier-4 caught flow exposure on a fresh address with no direct label, i.e. inheritance from a sanctioned cluster). And one blind spot (below).
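The per-stratum figures can be cross-checked against the headline table: agreeing cases in each stratum are the denominator minus both kinds of mismatch, and the strata should sum to the headline totals. A small sketch using only the numbers in the per-stratum table (function names are illustrative):

```python
# (denom, aegis_stricter, blind_spots) per stratum, copied from the table.
STRATA = {
    "Stratified ACTIONABLE (ETH+BSC)": (24, 14, 0),
    "Stratified CLEAR (ETH+BSC)": (26, 1, 0),
    "Fresh unknown TRON": (46, 14, 1),
}


def stratum_agreement(denom, aegis_stricter, blind_spots):
    """Agreeing cases are whatever is left after both kinds of mismatch."""
    agree = denom - aegis_stricter - blind_spots
    return agree, agree / denom


def rollup(strata):
    """Sum the per-stratum counts into headline totals."""
    denom = sum(d for d, _, _ in strata.values())
    stricter = sum(s for _, s, _ in strata.values())
    blind = sum(b for _, _, b in strata.values())
    return {"denom": denom, "agree": denom - stricter - blind,
            "aegis_stricter": stricter, "blind_spots": blind}


for name, counts in STRATA.items():
    agree, rate = stratum_agreement(*counts)
    print(f"{name}: {agree}/{counts[0]} = {rate:.1%}")

print(rollup(STRATA))
```

The rollup comes to 96 in-denominator cases, 66 agreeing, 29 Aegis-stricter and 1 blind spot, which matches the headline numbers.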
The one blind spot we found
risk_level=high / category=high_risk_exchange / sourced
from the provider. The very next Aegis check on the address returned:
How to read these numbers
Aegis is a near-superset of Provider P’s actionable set. Across
96 in-denominator cases there are 29 Aegis-stricter mismatches and 1
the other way — and that one was closed within the same run by the
feedback loop above.
Stratified-CLEAR agreement is 96 %. In the easy case — neither
engine seeing risk — the two engines almost always agree.
BFS / Tier-4 catches what labels alone can’t. 14 of the 50 fresh
TRON addresses (46 of which returned a usable Provider P verdict),
with no labels at all in either engine, were flagged as ACTIONABLE
by Aegis via flow-exposure inheritance from sanctioned clusters.
Provider P returned CLEAR on these 14, which suggests it relies on
direct identity tags more than flow propagation. This is a
meaningful coverage difference in our favour.
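To make "flow-exposure inheritance" concrete, here is a generic bounded breadth-first search over a counterparty graph. This is a textbook sketch under stated assumptions, not Aegis's Tier 4 implementation, which this page does not describe. The graph shape (address mapped to a set of counterparties), the `max_hops` budget and the function name are all hypothetical, and a real flow-exposure score would likely also weigh transfer amounts.

```python
from collections import deque


def hops_to_sanctioned(graph, start, sanctioned, max_hops=3):
    """Breadth-first search from start; return the hop count to the nearest
    sanctioned address within max_hops, or None if none is reachable."""
    if start in sanctioned:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop budget
        for neighbour in graph.get(node, ()):
            if neighbour in visited:
                continue
            if neighbour in sanctioned:
                return depth + 1  # BFS order makes this the shortest path
            visited.add(neighbour)
            queue.append((neighbour, depth + 1))
    return None


# Toy graph: A -> B -> C -> S, where S is sanctioned.
graph = {"A": {"B"}, "B": {"C"}, "C": {"S"}}
print(hops_to_sanctioned(graph, "A", {"S"}))              # 3
print(hops_to_sanctioned(graph, "A", {"S"}, max_hops=2))  # None
```

Address A carries no label of its own here, yet a search along its flows still reaches a sanctioned address. That is the kind of case a label-only lookup would return as clean.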
Cost & reproducibility
- Provider P spend for this benchmark: $96 USD (4 of 100 cases were excluded as provider poll timeouts and not charged).
- Aegis side: internal compute.
- The next benchmark (randomly sampled, hundreds of rows, same harness, same band design) will run when there’s an audience for a published precision number.

