AI SEO crawler check
Parameters
website – the website URL you want to check
X-API-KEY (header) – your vebapi API key
Example Request
curl -X GET "https://vebapi.com/api/seo/aiseochecker?website=codeconia.com" \
-H "X-API-KEY: YOUR_API_KEY" \
-H "Content-Type: application/json"
Response
{
  "url": "codeconia.com",
  "robots_found": true,
  "ai_access": {
    "GPTBot": true,
    "ChatGPT-User": true,
    "Google-Extended": true,
    "AnthropicBot": true,
    "ClaudeBot": true,
    "PerplexityBot": true,
    "CCBot": true,
    "Amazonbot": true,
    "Bytespider": true,
    "facebookexternalhit": true,
    "cohere-ai": true,
    "YouBot": true,
    "NeevaBot": true,
    "ai-crawler": true,
    "Applebot": true,
    "Baiduspider": true,
    "Sogou": true,
    "YandexBot": true,
    "PhindBot": true,
    "DuckDuckBot": true,
    "Yeti": true,
    "360Spider": true,
    "ias-va": true
  },
  "ai_bots_allowed": true,
  "suggestions": [
    "Your site is currently open for AI bots. You're AI-friendly!",
    "You can still improve by providing structured data (schema.org) for better AI comprehension.",
    "Add clear content usage terms if you want to allow or limit AI training use."
  ]
}
What it does
Checks whether a website allows AI bots (e.g., GPTBot, Google-Extended, PerplexityBot) to crawl and use its content. It reads the site's robots.txt, evaluates the access rules for major AI/LLM crawlers, and returns an allow/deny matrix plus practical suggestions (e.g., what to add or change in robots.txt) if access is blocked.
Why it’s useful (benefits)
- Know your AI exposure: quickly see whether LLMs can crawl or train on your content.
- Compliance & policy control: verify that your robots.txt reflects your intended AI policy.
- Easy fixes: get specific suggestions to enable or limit AI crawling.
- Bulk auditing: integrate the endpoint into your CI/SEO pipelines to monitor many domains (see the sketch below).
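A minimal bulk-audit sketch, assuming a POSIX shell with curl and jq available; the domain list and YOUR_API_KEY are placeholders:

```bash
#!/usr/bin/env bash
# Query the checker for each domain and print the overall ai_bots_allowed flag.
API_KEY="YOUR_API_KEY"
DOMAINS=("codeconia.com" "example.com")   # replace with the domains you monitor

for domain in "${DOMAINS[@]}"; do
  allowed=$(curl -s "https://vebapi.com/api/seo/aiseochecker?website=${domain}" \
    -H "X-API-KEY: ${API_KEY}" | jq -r '.ai_bots_allowed')
  echo "${domain}: ai_bots_allowed=${allowed}"
done
```

In a CI job you could exit non-zero whenever a domain reports false so the pipeline flags the regression.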
Base URL
https://vebapi.com
Endpoint
GET /api/seo/aiseochecker
Authentication
Send your API key in the header: X-API-KEY: YOUR_API_KEY
Input Parameters
| Name | In | Type | Required | Description | Example |
|---|---|---|---|---|---|
| website | query | string | Yes | Domain or URL to check. Subdomains are treated as provided (robots.txt is fetched from that host). | codeconia.com or https://codeconia.com |
Notes
• The service fetches https://<host>/robots.txt (or http if https is unavailable).
• If website is a full URL, only the host component is used for robots.txt (e.g., website=https://codeconia.com/blog resolves to codeconia.com/robots.txt).
Output (Response Body)
Field | Type | Description | Example |
---|---|---|---|
url | string | The normalized host or site you asked to check. | "codeconia.com" |
robots_found | boolean | Whether a robots.txt file was found and parsed successfully. |
true |
ai_access | object | Key/value map of major AI/LLM crawler user-agents to boolean access. true = allowed, false = blocked. |
See example on your page |
ai_bots_allowed | boolean | Overall flag: true if all tracked AI bots are allowed, false if one or more are blocked. |
true |
suggestions | string[] | Human-readable suggestions based on the analysis (how to enable or improve AI crawling/compliance). | See example on your page |
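For instance, if the JSON response is saved to response.json, the blocked crawlers can be pulled out of ai_access with jq (a sketch, assuming jq is installed; the filename is illustrative):

```bash
# Print every crawler whose ai_access value is false (i.e., blocked by robots.txt),
# then the overall ai_bots_allowed flag.
jq -r '.ai_access | to_entries[] | select(.value == false) | .key' response.json
jq -r '.ai_bots_allowed' response.json
```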
ai_access keys (typical)
GPTBot, ChatGPT-User, Google-Extended, AnthropicBot, ClaudeBot, PerplexityBot, CCBot, Amazonbot, Bytespider, facebookexternalhit, cohere-ai, YouBot, NeevaBot, ai-crawler, Applebot, Baiduspider, Sogou, YandexBot, PhindBot, DuckDuckBot, Yeti, 360Spider, ias-va
Meaning: for each listed bot, true = allowed by the current robots.txt rules; false = blocked (via a Disallow for that user-agent or wildcard rules that match it).
Example
See the example request and response at the top of this page.
Status Codes (typical)
- 200 OK – Analysis succeeded.
- 400 Bad Request – Missing or invalid website parameter.
- 401 Unauthorized – Missing or invalid X-API-KEY.
- 404 Not Found – Host reachable but no robots.txt and no fallback (you'll still get robots_found: false if the host exists).
- 500 Server Error – Unexpected error while fetching or parsing.
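When scripting against the endpoint, it can help to capture the status code separately from the body so auth problems and server errors are easy to tell apart; a small sketch with curl (the response.json filename is illustrative):

```bash
# Write the body to response.json and capture only the HTTP status code.
status=$(curl -s -o response.json -w "%{http_code}" \
  "https://vebapi.com/api/seo/aiseochecker?website=codeconia.com" \
  -H "X-API-KEY: YOUR_API_KEY")

if [ "$status" -ne 200 ]; then
  echo "Request failed with HTTP $status" >&2
  exit 1
fi
```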
Implementation Notes & Best Practices
- To allow GPTBot explicitly, add for example:
  User-agent: GPTBot
  Allow: /
- To block a specific AI bot, specify:
  User-agent: GPTBot
  Disallow: /
- Prefer host-specific rules if you have multiple subdomains; each host serves its own robots.txt.
- Re-check after changes; CDN caches can delay robots.txt propagation.
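One quick way to confirm that an updated robots.txt has actually propagated is to fetch it directly and inspect the relevant group (assumes curl and grep; adjust the host and user-agent as needed):

```bash
# Show the GPTBot group (and the two lines after it) from the live robots.txt.
curl -s https://codeconia.com/robots.txt | grep -i -A 2 "GPTBot"
```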