Crawled Posts

In the current admin version, Crawled Posts is more than a list of crawled records. It serves as the coordination board for the full crawl workspace: URLs, parser runs, the staging summary, and crawl source configuration.

Illustration captured from the local environment on March 14, 2026. The crawl workspace contains sample URLs, parser runs, and staging summaries to show how the real pipeline should be read.

What is this screen for?

  • monitoring the total number of URLs inside the crawl workspace
  • filtering URLs by source, status, or data type
  • reviewing staging quality before sending data further down the flow
  • configuring and coordinating crawl sources through Control Workspace

Main areas

  • Advanced filters: filter by source, status, type, or URL keyword
  • Stat cards: total URLs, parsed URLs, failed URLs, blocked URLs, parse runs, and staging items
  • URL inventory: the central URL list that tells you how far the pipeline has progressed
  • Recent parse runs: history of recent parser executions
  • Staging summary: summary of extracted entities and their average quality
  • Control Workspace: manage Sources, Rules, Review Queue, Bundles, and Publish Logs
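
To make the Advanced filters and URL inventory areas more concrete, here is a minimal sketch of how the four filters could combine. The admin's real data model is not documented here, so the UrlRecord fields, the status values, and the filter_urls helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UrlRecord:
    """Hypothetical shape of one row in the URL inventory."""
    url: str
    source: str
    status: str      # assumed values: "parsed", "pending", "failed", "blocked"
    data_type: str


def filter_urls(records, source=None, status=None, data_type=None, keyword=None):
    """Return the records that match every filter that was provided.

    A filter left as None is ignored, so calling with no filters
    returns the whole inventory.
    """
    matches = []
    for record in records:
        if source is not None and record.source != source:
            continue
        if status is not None and record.status != status:
            continue
        if data_type is not None and record.data_type != data_type:
            continue
        # Keyword search is assumed to be a case-insensitive substring match.
        if keyword is not None and keyword.lower() not in record.url.lower():
            continue
        matches.append(record)
    return matches


# Sample data invented for this example.
inventory = [
    UrlRecord("https://example.com/news/1", "example-news", "parsed", "article"),
    UrlRecord("https://example.com/news/2", "example-news", "failed", "article"),
    UrlRecord("https://example.org/jobs/7", "example-jobs", "blocked", "listing"),
]
failed_news = filter_urls(inventory, source="example-news", status="failed")
```

The filters are combined with AND, which matches the usual behavior of stacked filters in an admin list. Whether the real screen does the same is an assumption.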

Fast way to read one crawl workspace

  1. Start with the stat row to see whether the workspace is healthy or whether URL failures are increasing.
  2. Use filters to narrow the list by source or status.
  3. Open URL inventory to see which URLs are parsed, pending, or failing.
  4. Review Recent parse runs if you suspect the parser failed recently.
  5. Check Staging summary to see whether the extracted data is clean enough.
  6. If the source configuration needs changes, open Control Workspace.
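
The reading order above can be sketched as a small decision function. This is only an illustration of the reasoning: the dictionary keys, the 0 to 1 quality scale, and both thresholds are assumptions, not values taken from the admin.

```python
def next_area_to_open(stats, max_problem_share=0.10, min_avg_quality=0.70):
    """Suggest which area of Crawled Posts to open next.

    `stats` is a hypothetical dict that mirrors the stat cards:
    total_urls, failed_urls, blocked_urls, last_run_failed, avg_quality.
    The thresholds are illustrative defaults, not product settings.
    """
    total = stats["total_urls"]
    if total == 0:
        # Nothing has been crawled yet, so the source setup comes first.
        return "Control Workspace"

    # Share of URLs that are failing or blocked.
    problem_share = (stats["failed_urls"] + stats["blocked_urls"]) / total
    if problem_share > max_problem_share:
        return "URL inventory"

    if stats["last_run_failed"]:
        return "Recent parse runs"

    if stats["avg_quality"] < min_avg_quality:
        return "Staging summary"

    return "no action needed"


# Sample numbers invented for this example.
unhealthy = {
    "total_urls": 200,
    "failed_urls": 30,
    "blocked_urls": 10,
    "last_run_failed": False,
    "avg_quality": 0.9,
}
suggestion = next_area_to_open(unhealthy)
```

With these sample numbers, 40 of 200 URLs are failing or blocked, which is above the assumed 10% limit, so the function points to URL inventory.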

When should you go into Control Workspace?

  • when you need to add a new crawl source
  • when you need to adjust a seed URL
  • when you need to change review thresholds or logic
  • when you need to check whether a bundle is ready to publish

Operational notes

  • do not push a large batch of new URLs before reviewing the source and its seed URL
  • if Failed URLs or Blocked URLs starts growing quickly, review the source first
  • when Staging summary shows low quality, review the staged items carefully before syncing them further
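
One way to make "growing quickly" measurable is to compare two readings of the stat cards taken at different times. The sketch below assumes you note the Failed URLs and Blocked URLs counts by hand; the function names and the 50% growth limit are hypothetical.

```python
def problem_growth(previous, current):
    """Relative growth of failed plus blocked URLs between two readings."""
    before = previous["failed_urls"] + previous["blocked_urls"]
    after = current["failed_urls"] + current["blocked_urls"]
    if before == 0:
        # Any new problem URL counts as unbounded growth from zero.
        return float("inf") if after > 0 else 0.0
    return (after - before) / before


def should_review_source(previous, current, max_growth=0.5):
    """True when problem URLs grew by more than the assumed limit."""
    return problem_growth(previous, current) > max_growth


# Sample readings invented for this example.
yesterday = {"failed_urls": 10, "blocked_urls": 0}
today = {"failed_urls": 14, "blocked_urls": 4}
needs_review = should_review_source(yesterday, today)
```

Here the combined count rises from 10 to 18, a growth of 0.8, so the source would be flagged for review under the assumed limit.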

This screen is for technical coordination, not editorial writing

If your goal is to edit a single article, use Post Crawl or open the specific record detail. Crawled Posts is better suited for managing the pipeline and source batches.