Reference Library · 26 Documents · MIT + CC BY 4.0
Every signal a web crawler reads.
Living reference covering the HTML meta tags, HTTP status codes, and HTTP header categories that determine whether a search engine, AI engine, or social platform understands your site. Each document is a canonical reference — not a tutorial, not a marketing post.
Cite as: Anady, J. W. (2026). Crawler Signal Reference Library. Zenodo. https://doi.org/10.5281/zenodo.20405342
Companion to the open-source toolchain at /tools/ and the methodology paper at /research/.
HTML meta tags 14 documents
HTML meta author tag
HTML meta charset
HTML meta color-scheme
HTML meta content-language
HTML meta copyright
HTML meta generator
HTML meta keywords
HTML meta referrer
HTML meta refresh
HTML meta robots
HTML meta theme-color
HTML meta viewport
HTML Open Graph (og:) tags
HTML Twitter Cards (twitter:)
HTTP status codes 4 documents
HTTP headers 8 categories
Caching headers
Content headers
CORS headers
Performance headers
Rate-control headers
Request headers
Security headers
SEO headers (X-Robots-Tag, Vary)
Related work
The reference library is the empirical foundation for two MIT-licensed tools and a published methodology paper.
- aio-surfaces — generates llms.txt, llms-full.txt, aeo.json, entity.json, brand.json, ai.txt from a single typed site config
- seo-sidecar — FastAPI + nginx SSI sidecar for live Schema.org JSON-LD injection
- /research/ — methodology paper, datasets, citable DOIs
- /insights/ — applied framework write-ups (Astro, Next.js, Nuxt, SvelteKit, 11ty, Jekyll, Hydrogen, React)
Author
Maintained by Joseph W. Anady, founder of ThatDeveloperGuy. Wikidata: Q139901957. ORCID: 0009-0008-8625-949X. Reference content is CC BY 4.0; companion software is MIT licensed.