Developer experience
One POST /scan away.
The API is RESTful and so are the docs. First-party SDKs for Python and TypeScript, plain curl for everyone else. All open-source, Apache-2.0.
Install
Python (alpha)
pip install scrapesmith
TypeScript (alpha)
npm i @scrapesmith/sdk
curl (always)
curl …
Submit a URL
curl
curl -X POST https://api.scrapesmith.io/scan \
  -H "X-API-Key: $SCRAPESMITH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://suspicious.example/login"}'
# => {"id":"","status":"pending"}

Python
from scrapesmith_client import ScrapeSmith

client = ScrapeSmith(api_key="ss_...")
scan = client.scan_and_wait("https://suspicious.example/login")
print(scan.verdict, scan.score)
# malicious 78
for sig in scan.signals:
    print(" -", sig["id"], sig["message"])

TypeScript
import { ScrapeSmith } from "@scrapesmith/sdk";

const client = new ScrapeSmith({ apiKey: process.env.SCRAPESMITH_API_KEY });
const scan = await client.scanAndWait("https://suspicious.example/login");
console.log(scan.verdict, scan.score);
for (const sig of scan.signals) {
  console.log(" -", sig.id, sig.message);
}

Find similar kits
curl
curl https://api.scrapesmith.io/scan/$SCAN_ID/similar
# {"results":[{"id":"...","url":"...","distance":3,"verdict":"malicious","score":75}, ...]}

Python
similar = client.similar(scan.id, max_distance=8)
for s in similar:
    # Fall back to "-" when a scan has no verdict yet, mirroring the TypeScript example.
    print(f"distance={s.distance} {s.verdict or '-':>10} {s.url}")

TypeScript
const similar = await client.similar(scan.id, { maxDistance: 8 });
for (const s of similar) {
  console.log(`distance=${s.distance} ${s.verdict ?? "—"} ${s.url}`);
}

Watch a brand
Register a brand and we'll match its keywords against every new scan. Hits fire a brand.match webhook event and surface as brand_watch.host_match / dom_match signals on the scan record. Hosts in allowed_domains (and their subdomains) never fire, so the brand's real site doesn't trip its own watch.
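The allowlist rule is easiest to see in code. This is a minimal sketch, not the service's actual matcher: it assumes keywords are matched as case-insensitive substrings of the hostname and that allowed_domains entries match only on a dot boundary. The function names is_allowed and host_match are hypothetical.

```python
def is_allowed(host: str, allowed_domains: list[str]) -> bool:
    """True if host equals an allowed domain or is one of its subdomains."""
    host = host.lower().rstrip(".")
    for domain in allowed_domains:
        domain = domain.lower()
        # Require a dot boundary so "notacmebank.com" is not treated as allowed.
        if host == domain or host.endswith("." + domain):
            return True
    return False


def host_match(host: str, keywords: list[str], allowed_domains: list[str]) -> list[str]:
    """Return the keywords found in host, or [] if the host is allowlisted."""
    if is_allowed(host, allowed_domains):
        return []
    host = host.lower()
    return [kw for kw in keywords if kw.lower() in host]


keywords = ["acme", "acmebank", "acme-bank"]
allowed = ["acmebank.com", "acme.com"]
print(host_match("login.acmebank.com", keywords, allowed))         # []
print(host_match("acmebank.com.evil.example", keywords, allowed))  # ['acme', 'acmebank']
print(host_match("acme-bank-secure.example", keywords, allowed))   # ['acme', 'acme-bank']
```

The second call shows why suffix matching on a dot boundary matters: a host that merely starts with an allowed domain is still reported.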
curl
curl -X POST https://api.scrapesmith.io/admin/brands \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Acme Bank",
    "keywords": ["acme", "acmebank", "acme-bank"],
    "allowed_domains": ["acmebank.com", "acme.com"]
  }'

Python
import os

import httpx

# Assumes the admin token is exported as ADMIN_TOKEN, as in the curl example.
ADMIN_TOKEN = os.environ["ADMIN_TOKEN"]

httpx.post(
    "https://api.scrapesmith.io/admin/brands",
    headers={"Authorization": f"Bearer {ADMIN_TOKEN}"},
    json={
        "name": "Acme Bank",
        "keywords": ["acme", "acmebank", "acme-bank"],
        "allowed_domains": ["acmebank.com", "acme.com"],
    },
).raise_for_status()

TypeScript
// Assumes the admin token is exported as ADMIN_TOKEN, as in the curl example.
const ADMIN_TOKEN = process.env.ADMIN_TOKEN;

await fetch("https://api.scrapesmith.io/admin/brands", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${ADMIN_TOKEN}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    name: "Acme Bank",
    keywords: ["acme", "acmebank", "acme-bank"],
    allowed_domains: ["acmebank.com", "acme.com"],
  }),
});

Verify webhook signatures
Every webhook is HMAC-SHA256 signed over <timestamp>.<body>.
Verify it on your endpoint — the SDKs ship a one-liner.
Python (FastAPI)
from fastapi import FastAPI, Header, Request, HTTPException
from scrapesmith_client import verify_webhook

app = FastAPI()
SECRET = "..."  # from POST /admin/webhooks

@app.post("/scrapesmith-hook")
async def hook(req: Request,
               x_scrapesmith_signature: str = Header(...),
               x_scrapesmith_timestamp: int = Header(...)):
    # Verify against the raw bytes, before any JSON parsing.
    body = await req.body()
    try:
        verify_webhook(secret=SECRET, body=body,
                       signature=x_scrapesmith_signature,
                       timestamp=x_scrapesmith_timestamp)
    except ValueError as e:
        raise HTTPException(status_code=401, detail=str(e))
    # ... event is authentic; process body

TypeScript (Hono/Express)
import { verifyWebhook } from "@scrapesmith/sdk";

app.post("/scrapesmith-hook", async (req, res) => {
  // Use the raw, unparsed body: req.text() in fetch-style frameworks such as Hono,
  // or a raw-body middleware such as express.raw() in Express.
  const body = await req.text();
  try {
    await verifyWebhook({
      secret: process.env.WEBHOOK_SECRET!,
      body,
      signature: req.header("x-scrapesmith-signature"),
      timestamp: Number(req.header("x-scrapesmith-timestamp")),
    });
  } catch (e) {
    return res.status(401).send((e as Error).message);
  }
  // ... event is authentic
});
Register a webhook with one POST to /admin/webhooks (admin token required, see API docs). The secret is returned once; keep it in your secret manager. We retry deliveries 4 times with exponential backoff and persist every attempt for the audit trail.
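To get a feel for what four retries with exponential backoff could mean for your endpoint, here is a sketch of one possible schedule. The 1-second base delay and the doubling factor are assumptions chosen for illustration; the actual intervals are not stated here.

```python
def backoff_delays(retries: int = 4, base: float = 1.0, factor: float = 2.0) -> list[float]:
    """Delay in seconds before each retry: base, base*factor, base*factor**2, ..."""
    return [base * factor ** attempt for attempt in range(retries)]


delays = backoff_delays()
print(delays)       # [1.0, 2.0, 4.0, 8.0]
print(sum(delays))  # 15.0
```

Whatever the real schedule is, the same event can reach your endpoint more than once, so handlers should be safe to run repeatedly for a single event.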
Build something with this.
Free tier is unlimited for evaluation. Production limits start at $49/mo — see pricing.