AI SEO crawler check
Parameters
website – the website URL or domain you want to check.
X-API-KEY – your VebAPI API key.
Example Request
curl -X GET "https://vebapi.com/api/seo/aiseochecker?website=codeconia.com" \
-H "X-API-KEY: YOUR_API_KEY" \
-H "Content-Type: application/json"
Response
{
  "url": "codeconia.com",
  "robots_found": true,
  "ai_access": {
    "GPTBot": true,
    "ChatGPT-User": true,
    "Google-Extended": true,
    "AnthropicBot": true,
    "ClaudeBot": true,
    "PerplexityBot": true,
    "CCBot": true,
    "Amazonbot": true,
    "Bytespider": true,
    "facebookexternalhit": true,
    "cohere-ai": true,
    "YouBot": true,
    "NeevaBot": true,
    "ai-crawler": true,
    "Applebot": true,
    "Baiduspider": true,
    "Sogou": true,
    "YandexBot": true,
    "PhindBot": true,
    "DuckDuckBot": true,
    "Yeti": true,
    "360Spider": true,
    "ias-va": true
  },
  "ai_bots_allowed": true,
  "suggestions": [
    "Your site is currently open for AI bots. You're AI-friendly!",
    "You can still improve by providing structured data (schema.org) for better AI comprehension.",
    "Add clear content usage terms if you want to allow or limit AI training use."
  ]
}
What it does
Checks whether a website allows AI bots (e.g., GPTBot, Google-Extended, PerplexityBot) to crawl and use its content. It reads the site’s robots.txt, evaluates access rules for major AI/LLM crawlers, and returns an allow/deny matrix plus practical suggestions (e.g., what to add/change in robots.txt) if access is blocked.
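A minimal Python sketch of a single check (assuming the `requests` library; the endpoint, parameter, and header are as documented on this page, and `YOUR_API_KEY` is a placeholder):

```python
# Minimal sketch: query the AI SEO checker for one domain.
import requests

API_KEY = "YOUR_API_KEY"  # your VebAPI key

def check_ai_access(website: str) -> dict:
    resp = requests.get(
        "https://vebapi.com/api/seo/aiseochecker",
        params={"website": website},
        headers={"X-API-KEY": API_KEY, "Content-Type": "application/json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

result = check_ai_access("codeconia.com")
print(result["ai_bots_allowed"], result["robots_found"])
```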
Why it’s useful (benefits)
- Know your AI exposure: Quickly see if LLMs can crawl or train on your content.
- Compliance & policy control: Verify that your robots.txt reflects your intended AI policy.
- Easy fixes: Get specific suggestions to enable/limit AI crawling.
- Bulk auditing: Integrate into your CI/SEO pipelines to monitor many domains (see the sketch after this list).
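For bulk auditing, a sketch along these lines could run in a CI job; it reuses the `check_ai_access` helper from the example above (the helper and the domain list are assumptions, not part of the API):

```python
# Sketch: audit several domains in one pass, e.g. from a CI/SEO pipeline.
domains = ["codeconia.com", "example.com", "blog.example.org"]  # your own list

for domain in domains:
    report = check_ai_access(domain)  # helper defined in the earlier sketch
    status = "AI-friendly" if report["ai_bots_allowed"] else "blocks some AI bots"
    print(f"{domain}: {status}")
```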
Base URL
https://vebapi.com
Endpoint
GET /api/seo/aiseochecker
Authentication
Send your API key in the header: X-API-KEY: YOUR_API_KEY
Input Parameters
| Name | In | Type | Required | Description | Example |
|---|---|---|---|---|---|
| website | query | string | Yes | Domain or URL to check. Subdomains are treated as provided (robots.txt is fetched from that host). | codeconia.com or https://codeconia.com |
Notes
• The service fetches https://<host>/robots.txt (or http if https is unavailable).
• If website is a full URL, the host component is used for robots.txt.
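The host handling described above is roughly equivalent to the following sketch (an illustration of the documented behaviour, not the service's actual code):

```python
# Illustration: reduce a domain or full URL to the host whose robots.txt is fetched.
from urllib.parse import urlparse

def robots_url(website: str) -> str:
    parsed = urlparse(website if "://" in website else f"https://{website}")
    return f"https://{parsed.netloc}/robots.txt"

print(robots_url("https://codeconia.com/blog/post"))  # https://codeconia.com/robots.txt
print(robots_url("codeconia.com"))                    # https://codeconia.com/robots.txt
```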
Output (Response Body)
| Field | Type | Description | Example |
|---|---|---|---|
| url | string | The normalized host or site you asked to check. | "codeconia.com" |
| robots_found | boolean | Whether a robots.txt file was found and parsed successfully. | true |
| ai_access | object | Key/value map of major AI/LLM crawler user-agents to boolean access. true = allowed, false = blocked. | See the example response above |
| ai_bots_allowed | boolean | Overall flag: true if all tracked AI bots are allowed, false if one or more are blocked. | true |
| suggestions | string[] | Human-readable suggestions based on the analysis (how to enable or improve AI crawling/compliance). | See the example response above |
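A short sketch of reading these fields (again using the hypothetical `check_ai_access` helper from the first example):

```python
# Sketch: interpret the documented response fields.
report = check_ai_access("codeconia.com")

if not report["robots_found"]:
    print("No robots.txt was found for this host.")
elif report["ai_bots_allowed"]:
    print("All tracked AI bots are allowed.")
else:
    blocked = [bot for bot, allowed in report["ai_access"].items() if not allowed]
    print("Blocked AI bots:", ", ".join(blocked))

for tip in report["suggestions"]:
    print("Suggestion:", tip)
```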
ai_access keys (typical)
GPTBot, ChatGPT-User, Google-Extended, AnthropicBot, ClaudeBot, PerplexityBot, CCBot, Amazonbot, Bytespider, facebookexternalhit, cohere-ai, YouBot, NeevaBot, ai-crawler, Applebot, Baiduspider, Sogou, YandexBot, PhindBot, DuckDuckBot, Yeti, 360Spider, ias-va
Meaning: For each listed bot, true = allowed by the current robots.txt rules; false = blocked (via a Disallow rule for that user-agent or wildcard rules that match it).
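For example, a robots.txt like the following (a hypothetical file, shown only to illustrate the mapping) would be expected to yield "GPTBot": false while the other tracked bots remain true:

```
# Hypothetical robots.txt: block GPTBot specifically, allow everyone else.
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
```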
Example
See the Example Request and Response sections at the top of this page.
Status Codes (typical)
- 200 OK – Analysis succeeded.
- 400 Bad Request – Missing/invalid website parameter.
- 401 Unauthorized – Missing/invalid X-API-KEY.
- 404 Not Found – Host reachable but no robots.txt and no fallback (you’ll still get robots_found: false if the host exists).
- 500 Server Error – Unexpected error while fetching/parsing.
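A hedged error-handling sketch against these status codes (the error response bodies are not documented here, so only the codes themselves are checked; `check_ai_access` is the hypothetical helper from the first example):

```python
# Sketch: map the documented status codes to client-side handling.
import requests

def safe_check(website: str):
    try:
        return check_ai_access(website)  # raises requests.HTTPError on non-2xx
    except requests.HTTPError as err:
        status = err.response.status_code
        if status == 400:
            print("Missing/invalid 'website' parameter:", website)
        elif status == 401:
            print("Missing or invalid X-API-KEY.")
        elif status == 404:
            print("Host reachable but no robots.txt could be resolved.")
        else:
            print("Unexpected server error:", status)
        return None
```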
Implementation Notes & Best Practices
- To allow GPTBot explicitly, add for example:
    User-agent: GPTBot
    Allow: /
- To opt out of certain AI bots, specify:
    User-agent: GPTBot
    Disallow: /
- Prefer host-specific rules if you have multiple subdomains.
- Re-check after changes; CDN caches can delay robots.txt propagation.
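After changing robots.txt, a re-check could be automated along these lines (a sketch reusing the hypothetical `check_ai_access` helper; the retry count and delay are arbitrary choices, not API requirements):

```python
# Sketch: poll the checker after a robots.txt change until a given bot shows as allowed.
import time

def wait_until_allowed(website: str, bot: str = "GPTBot", attempts: int = 10, delay: int = 30) -> bool:
    for _ in range(attempts):
        report = check_ai_access(website)
        if report["ai_access"].get(bot, False):
            return True
        time.sleep(delay)  # give CDN caches time to refresh the cached robots.txt
    return False

print(wait_until_allowed("codeconia.com"))
```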