ShieldProbe Assess

Scanners find CVEs. We find breaches.

ShieldProbe runs a real attack campaign, not a scan, on your web apps and APIs. It targets business-logic flaws, multi-step exploit chains, auth bypasses, and API authorization gaps. Audit-grade report in 48 hours. CREST-certified human sign-off on every report. Unlimited retests for 12 months.

CREST-Certified
Reasoning Engine
48 Hours: From scope to audit-grade report
+30%: More verified vulns vs. other AI pentest tools
12 Months: Unlimited retests, deterministic replays
Blind Benchmark · Financial Application

Competitors saw a JPEG. ShieldProbe saw an entry vector.

Same target, same scope, same clock. Here's what the reasoning difference produced.

Other AI pentest tools
  1. Scanned avatar.jpg — treated as a static asset, never parsed.
  2. Tested form endpoints — got HTTP 200, marked healthy.
  3. Ran WAF signature rules — no known CVE pattern matched.
  4. Business-logic layer — out of scope, no template to follow.
  5. Report delivered: 0 critical findings.

Signature-based testing can't reason. It matches known patterns or returns nothing.

ShieldProbe Assess
  1. Analyzed the manager's profile avatar: the AI treated the image as an attack surface, not a decoration.
  2. OCR'd a blurry sticky note visible in the background of the photo.
  3. Extracted credentials from the sticky note and tested them against the auth endpoint.
  4. Authenticated into the internal finance dashboard.
  5. Exploited a business-logic flaw to authorize a $50,000 fraudulent transfer.

Reproducible. Every step shipped with requests, responses, screenshots, and payloads. A CREST-certified consultant signed the report.

What Assess actually finds

Scanners match known CVEs against known software versions. Your app's logic is custom — which is exactly where real breaches happen. These are the classes Assess is built to catch.

Business-logic flaws

A banking withdraw form that treats a negative number as a deposit. A discount coupon that stacks on itself. An API that trusts client-sent prices. Scanners return HTTP 200 and call it healthy.
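To make the negative-withdrawal case concrete, here is a minimal Python sketch. The function names and the `InsufficientFunds` exception are hypothetical, chosen only for illustration; they do not describe any real banking code or ShieldProbe internals. The vulnerable version answers normally, which is why a status-code check alone sees nothing wrong.

```python
from decimal import Decimal


class InsufficientFunds(Exception):
    """Raised when the requested amount exceeds the balance."""


def withdraw_vulnerable(balance: Decimal, amount: Decimal) -> Decimal:
    # Only checks that the account can cover the amount. A negative
    # amount passes this check and ends up increasing the balance.
    if amount > balance:
        raise InsufficientFunds("amount exceeds balance")
    return balance - amount


def withdraw_checked(balance: Decimal, amount: Decimal) -> Decimal:
    # Rejects zero and negative amounts before touching the balance.
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > balance:
        raise InsufficientFunds("amount exceeds balance")
    return balance - amount
```

Calling `withdraw_vulnerable(Decimal("100"), Decimal("-50"))` returns a larger balance than it started with, while `withdraw_checked` rejects the same input.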

Multi-step exploit chains

A low-severity path disclosure, a weak session token, and an exposed admin endpoint chained into full takeover. Each looks benign alone. The AI plans across them.
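One simple way to picture chaining is forward search over preconditions: each finding requires some capabilities and grants others. The sketch below is an illustrative assumption, not a description of how the ShieldProbe reasoning engine plans; `Finding`, `plan_chain`, and the capability names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    name: str
    requires: frozenset  # capabilities needed before this finding is usable
    grants: frozenset    # capabilities gained by exploiting it


def plan_chain(findings, start, goal):
    """Greedy forward chaining: keep applying any usable finding until the
    goal capability is reached or no further progress is possible.
    Returns the ordered list of finding names, or None if the goal is
    unreachable. The chain may include steps that were not strictly needed."""
    have = set(start)
    remaining = list(findings)
    chain = []
    progress = True
    while goal not in have and progress:
        progress = False
        for finding in list(remaining):
            if finding.requires <= have:
                have |= finding.grants
                chain.append(finding.name)
                remaining.remove(finding)
                progress = True
                if goal in have:
                    break
    return chain if goal in have else None


FINDINGS = [
    Finding("exposed admin endpoint", frozenset({"admin_path", "session"}), frozenset({"takeover"})),
    Finding("path disclosure", frozenset(), frozenset({"admin_path"})),
    Finding("weak session token", frozenset(), frozenset({"session"})),
]
```

With all three findings present, `plan_chain(FINDINGS, set(), "takeover")` reaches the goal. Remove the weak session token and it returns `None`, which mirrors the point that each issue looks benign alone.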

Auth & session flaws

JWT alg confusion, broken MFA paths, session fixation, cookie flag gaps, OAuth redirect misconfigurations. The kind of thing pattern-matchers can't test because the test requires reasoning about state.
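As an illustration of JWT alg confusion, the sketch below uses only the Python standard library. It assumes a hypothetical service that expects RS256 tokens but lets the token header choose the algorithm; the RS256 branch is omitted, and `verify_trusting_header` and `require_alg` are made-up names. Because the public key is public, an attacker can use it as an HMAC secret and forge a token the vulnerable verifier accepts.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(header: dict, payload: dict, key: bytes) -> str:
    signing_input = (
        b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(payload).encode())
    )
    signature = hmac.new(key, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def verify_trusting_header(token: str, key: bytes) -> dict:
    # Vulnerable: the algorithm comes from the attacker-controlled header,
    # so the RSA public key can end up being used as an HMAC secret.
    head_b64, body_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(head_b64))
    if header.get("alg") == "HS256":
        expected = hmac.new(key, f"{head_b64}.{body_b64}".encode(), hashlib.sha256).digest()
        if hmac.compare_digest(expected, b64url_decode(sig_b64)):
            return json.loads(b64url_decode(body_b64))
    raise ValueError("invalid token")  # RS256 path omitted in this sketch


def require_alg(token: str, expected_alg: str) -> None:
    # Mitigation step: pin the algorithm server-side before any verification.
    header = json.loads(b64url_decode(token.split(".")[0]))
    if header.get("alg") != expected_alg:
        raise ValueError("unexpected alg: %r" % header.get("alg"))


PUBLIC_KEY = b"placeholder-public-key-bytes"  # stands in for a real PEM public key
FORGED = sign_hs256({"alg": "HS256", "typ": "JWT"}, {"sub": "admin"}, PUBLIC_KEY)
```

Here `verify_trusting_header(FORGED, PUBLIC_KEY)` returns the forged claims, while `require_alg(FORGED, "RS256")` rejects the token before any signature check runs.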

API authorization bypass

IDOR, BOLA (broken object-level authorization), mass assignment, rate-limit evasion, and GraphQL batching abuse, across SOAP, REST, and GraphQL APIs. The AI drives the API like an authenticated user probing for access it shouldn't have.
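A BOLA flaw usually comes down to one missing ownership check. The following Python sketch uses a hypothetical in-memory `INVOICES` store and made-up handler names; it is an illustration of the bug class, not code from any tested application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    owner_id: int
    total_cents: int


# Hypothetical data store: user 10 owns invoice 1, user 20 owns invoice 2.
INVOICES = {
    1: Invoice(invoice_id=1, owner_id=10, total_cents=5000),
    2: Invoice(invoice_id=2, owner_id=20, total_cents=9900),
}


def get_invoice_vulnerable(requester_id: int, invoice_id: int) -> Invoice:
    # Authenticated, but never checks who owns the object: any logged-in
    # user can read any invoice by changing the ID in the request.
    return INVOICES[invoice_id]


def get_invoice_checked(requester_id: int, invoice_id: int) -> Invoice:
    # Object-level check: same error for "missing" and "not yours", so the
    # response does not reveal which IDs exist.
    invoice = INVOICES.get(invoice_id)
    if invoice is None or invoice.owner_id != requester_id:
        raise PermissionError("invoice not found")
    return invoice
```

User 10 requesting invoice 2 succeeds against the vulnerable handler and fails against the checked one. Both handlers answer a legitimate request identically, so the difference only appears when the request is made as the wrong user.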

Cross-layer chains

A low-severity cloud misconfiguration becomes critical when it enables lateral movement into an app with business-logic flaws. Single-layer competitors miss these paths by construction.

What CVE scanners systematically miss

Anything that isn't a known vulnerable dependency on a known version. Your custom logic, your auth flows, your billing rules, your admin pathways. All invisible to signature-based tooling. All in scope for Assess.

How the engagement actually runs

No abstract four-box diagram. Here are the mechanics a security engineer would ask about.

Blackbox or greybox

You pick. Credentialed testing if you want depth against authenticated flows; uncredentialed if you want the attacker's view. Both on the same engagement if you want full coverage.

OWASP Top 10 or full WSTG

Fast scope against the Top 10 for quick compliance coverage, or the complete WSTG (Web Security Testing Guide) for depth. You choose before the run; we don't pad the bill by switching mid-engagement.

Internal testing via a 5 MB connector

No agent. No admin rights. No privileged install. A 5 MB connector gets us into your internal surface the same day — no procurement war, no security-review quarter.

Reproducible exploit evidence

Every finding ships with the full request/response pair, payload, screenshot, and a replay script. Not theoretical. Your devs — and your auditors — can reproduce each one exactly.

CREST-certified human validation

Every report is signed off by a CREST-certified consultant before it leaves the platform. Today human review covers ~35% of findings; our RLHF flywheel is driving that toward <20% without lowering the signal bar.

12+ compliance output formats

SOC 2, ISO 27001, PCI DSS, HIPAA, NIST 800-53, and more. Big 4 auditors accept ManticoreAI reports directly. PDF, JSON, and SARIF exports for your own tooling.
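For teams planning to consume the SARIF export, the sketch below shows roughly what a single finding looks like in a minimal SARIF 2.1.0 document. The input field names (`rule_id`, `title`, `severity`, `summary`, `endpoint`), the severity mapping, and the tool name are assumptions for illustration; the actual export schema may differ.

```python
import json

# Assumed mapping from a severity label to SARIF's result levels.
SEVERITY_TO_LEVEL = {
    "critical": "error",
    "high": "error",
    "medium": "warning",
    "low": "note",
}


def finding_to_sarif(finding: dict, tool_name: str = "example-pentest-tool") -> dict:
    """Wrap one finding in a minimal SARIF 2.1.0 log with a single run."""
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [
            {
                "tool": {
                    "driver": {
                        "name": tool_name,
                        "rules": [
                            {
                                "id": finding["rule_id"],
                                "shortDescription": {"text": finding["title"]},
                            }
                        ],
                    }
                },
                "results": [
                    {
                        "ruleId": finding["rule_id"],
                        "level": SEVERITY_TO_LEVEL.get(finding["severity"], "warning"),
                        "message": {"text": finding["summary"]},
                        "locations": [
                            {
                                "physicalLocation": {
                                    "artifactLocation": {"uri": finding["endpoint"]}
                                }
                            }
                        ],
                    }
                ],
            }
        ],
    }


EXAMPLE = {
    "rule_id": "BOLA-001",
    "title": "Broken object-level authorization",
    "severity": "critical",
    "summary": "Invoice endpoint returns other users' invoices.",
    "endpoint": "https://app.example.test/api/invoices/2",
}
SARIF_DOC = finding_to_sarif(EXAMPLE)
SARIF_JSON = json.dumps(SARIF_DOC, indent=2)
```

The resulting JSON can be fed to any tool that ingests SARIF, such as a code-scanning dashboard or a ticketing integration.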

What the 48 hours actually looks like

If speed is the only thing you've heard, you've heard the consequence, not the cause.

H+0

Scope & launch

You define targets in the platform. Methodology pre-selected. No back-and-forth on SoW for a week. Testing kicks off in hours, not weeks.

H+0–36

Live attack campaign

The reasoning engine runs a persistent agent — tracking targets, credentials, footholds, and hypotheses across the engagement. It chains exploits, pivots when it hits a dead end, and uses its kernel-level driver to drive real tooling (Burp, SQLMap, Metasploit) the way an operator would.

H+36–44

CREST-certified human validation

A certified consultant reviews each flagged finding against the replay evidence, removes noise, and signs off. This is where audit-grade comes from.

H+44–48

Report generation

Audit-grade report compiled into your chosen compliance format. Findings export to JSON/SARIF for your SIEM or ticketing. Defend can auto-apply virtual patches the moment findings land; Fix pushes candidate PRs into the developers' IDEs.

What Assess does not cover

Scope honesty is credibility. Here's what we decline.

Out of scope

Phishing & social engineering

Human-layer testing belongs with specialist consultancies.

Desktop applications

Thick-client native apps are not what we're built for.

Mobile applications

iOS and Android binary testing is out of scope.

Physical security

Badge cloning, lock-picking, facility access — not our surface.

Retests

Unlimited retests for 12 months — here's why we can give them away

Each retest is an atomic deterministic replay of the original exploit chain against your latest deploy. Not a fresh scan. Marginal cost to us is effectively zero, so it's free to you.

Traditional consultancies charge per retest because each run requires rebooking a human pentester. AI-only scanners charge because their runs aren't deterministic — every rerun re-discovers the same noise. Ours is a signed, replayable artifact.
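A deterministic replay can be pictured as a recorded list of steps, each with the response that proved the exploit worked. The sketch below is a hypothetical model, not the ShieldProbe artifact format: `Step`, `replay`, and the `send` callable (which would issue the real HTTP request) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Response = Tuple[int, str]  # (HTTP status, response body)


@dataclass(frozen=True)
class Step:
    method: str
    path: str
    body: str
    expected_status: int
    expected_marker: str  # substring in the response that proves the step worked


def replay(chain: List[Step], send: Callable[[Step], Response]) -> Optional[int]:
    """Re-run a recorded exploit chain against the current deploy.

    Returns None if every step still reproduces (the issue is still open),
    or the index of the first step that no longer behaves as recorded."""
    for index, step in enumerate(chain):
        status, body = send(step)
        if status != step.expected_status or step.expected_marker not in body:
            return index
    return None


CHAIN = [
    Step("POST", "/login", "user=manager", 200, "session="),
    Step("POST", "/transfers", "amount=50000", 200, "authorized"),
]
```

Because the steps and their expected evidence are fixed, two replays against the same deploy give the same answer, and a non-`None` result points at the step the fix broke.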

Unlimited: Retests, 12 months
~0: Marginal cost per replay
45×: Faster turn vs. consulting

Traditional consultancy, AI-only scanner, or Assess

Three paths to a pentest report. Here's where each one breaks.

| | Traditional consultancy | AI-only scanner | ShieldProbe Assess |
| --- | --- | --- | --- |
| Time to audit-grade report | 6–8 weeks | Hours, but not audit-grade | 48 hours, audit-grade |
| Business-logic depth | Varies by operator | None (pattern matching only) | Reasoning engine, kernel-driven tooling |
| Human validation | Yes, by assigned consultant | None or self-attested | CREST-certified sign-off on every report |
| Retest model | Extra cost, scheduling | Re-run noise on every trigger | Unlimited, deterministic replays, 12 months |
| Evidence format | Narrative + screenshots | Signatures, often no exploit | Replayable payloads, requests, responses |
| Auditor acceptance | Usually yes | Frequently rejected | Big 4 auditors accept directly |
| Price band | $50–200K per engagement | Cheap, low signal | $10–120K/year, 40–60% below consulting |

Run the benchmark on your own app

Scope an engagement today. Audit-grade report in 48 hours. Findings hand off directly to Defend for instant protection and Fix for developer-side remediation.