Reference Library · 26 Documents · MIT + CC BY 4.0

Every signal a web crawler reads.

A living reference to the HTML meta tags, HTTP status codes, and HTTP header categories that determine whether a search engine, AI engine, or social platform understands your site. Each document is a canonical reference, not a tutorial or a marketing post.

Cite as: Anady, J. W. (2026). Crawler Signal Reference Library. Zenodo. https://doi.org/10.5281/zenodo.20405342
Companion to the open-source toolchain at /tools/ and the methodology paper at /research/.

High-demand crawler references

These reference pages already receive search impressions, so treat them as the library's priority crawler-signal resources.

HTML meta tags · 14 documents

HTML meta author tag
HTML meta charset
HTML meta color-scheme
HTML meta content-language
HTML meta copyright
HTML meta generator
HTML meta keywords
HTML meta referrer
HTML meta refresh
HTML meta robots
HTML meta theme-color
HTML meta viewport
HTML Open Graph (og:) tags
HTML Twitter Cards (twitter:)

HTTP status codes · 4 documents

HTTP 2xx success
HTTP 3xx redirection
HTTP 4xx client error
HTTP 5xx server error

HTTP headers · 8 categories

Caching headers
Content headers
CORS headers
Performance headers
Rate-control headers
Request headers
Security headers
SEO headers (X-Robots-Tag, Vary)

Related work

The reference library is the empirical foundation for two MIT-licensed tools and a published methodology paper.

aio-surfaces — generates llms.txt, llms-full.txt, aeo.json, entity.json, brand.json, ai.txt from a single typed site config
seo-sidecar — FastAPI + nginx SSI sidecar for live Schema.org JSON-LD injection
/research/ — methodology paper, datasets, citable DOIs
/insights/ — applied framework write-ups (Astro, Next.js, Nuxt, SvelteKit, 11ty, Jekyll, Hydrogen, React)

Author

Maintained by Joseph W. Anady, founder of ThatDeveloperGuy. Wikidata: Q139901957. ORCID: 0009-0008-8625-949X. Reference content is CC BY 4.0; companion software is MIT licensed.