
Processing Pipeline

When you upload a document, Shelv runs an automated pipeline that converts the source PDF into a structured Markdown filesystem.
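As a rough mental model, the stages described below can be pictured as a sequential runner. It stops at the first failing stage, records the error message and step name as the Error Handling section describes, and then sends one notification either way. This is a minimal Python sketch, not Shelv's implementation. The names ShelfState, run_pipeline and notify and the "processing" status are hypothetical. For simplicity the sketch also treats webhooks as a callback and ignores the review outcome.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ShelfState:
    status: str = "processing"  # hypothetical name for the in-progress status
    error_message: Optional[str] = None
    failed_step: Optional[str] = None


def run_pipeline(
    stages: list[tuple[str, Callable[[], None]]],
    notify: Callable[[str], None],
) -> ShelfState:
    """Run stages in order, stop at the first failure, then notify once."""
    state = ShelfState()
    for name, stage in stages:
        try:
            stage()
        except Exception as exc:
            # Record what failed and where, then stop running later stages.
            state.status = "failed"
            state.error_message = str(exc)
            state.failed_step = name
            break
    else:
        # Simplification: a real run may end in review instead of ready.
        state.status = "ready"
    # Notifications go out whether processing succeeded or failed.
    notify(f"shelf.{state.status}")
    return state
```

In this model, the stage list would be parsing, structuring, verification and storage, and notify stands in for webhook dispatch.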

Stages

1. Parsing

The uploaded PDF is converted into normalized Markdown so that later stages work with consistent text and headings. This stage also records the metadata needed for status reporting and retries.
Output: parsed page content, ready for structuring.
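To make "normalized Markdown" concrete, here is a minimal Python sketch of the kind of cleanup such a step might perform. The function name normalize_markdown and its rules are assumptions for illustration only; Shelv's actual normalization rules are not documented here. The rules are:

- unifying line endings
- trimming trailing whitespace
- collapsing runs of blank lines
- using a single space after heading markers

```python
import re


def normalize_markdown(text: str) -> str:
    """Hypothetical cleanup pass over Markdown extracted from a PDF."""
    # Unify line endings so later stages only ever see "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = []
    for line in text.split("\n"):
        line = line.rstrip()  # drop trailing spaces and tabs
        # "#   Title" or "##\tTitle" -> "# Title" / "## Title"
        line = re.sub(r"^(#{1,6})[ \t]+", r"\1 ", line)
        lines.append(line)
    text = "\n".join(lines)
    # Collapse three or more newlines (two or more blank lines) into one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip() + "\n"
```

Consistent heading syntax matters here because the structuring stage keys off heading hierarchy.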

2. Structuring

Shelv analyzes the parsed content and proposes a layout of files and directories based on:
  • Document headings and their hierarchy
  • Logical content boundaries (chapters, sections, clauses)
  • The selected template’s conventions (if any)
  • Practical file sizing for agent workflows
Output: a proposed file tree and file contents.
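The heading-driven part of this stage can be illustrated with a small Python sketch. It maps each level-1 heading to a directory and each level-2 heading to a file inside it, then adds a README.md index. The names propose_tree and slugify, the "front-matter" bucket, and the README format are all hypothetical. A real structurer would also weigh templates and file sizing, which this sketch ignores.

```python
import re


def slugify(title: str) -> str:
    """Lowercase a heading and join alphanumeric runs with hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "section"


def propose_tree(markdown: str) -> dict[str, str]:
    """Split Markdown into {path: contents} using '#' and '##' headings."""
    files: dict[str, list[str]] = {}
    chapter = "front-matter"  # hypothetical bucket for text before any heading
    current = f"{chapter}/index.md"
    for line in markdown.splitlines():
        m = re.match(r"^(#{1,2}) (.+)$", line)
        if m and len(m.group(1)) == 1:
            # A level-1 heading starts a new directory.
            chapter = slugify(m.group(2))
            current = f"{chapter}/index.md"
        elif m:
            # A level-2 heading starts a new file in the current directory.
            current = f"{chapter}/{slugify(m.group(2))}.md"
        files.setdefault(current, []).append(line)
    tree = {p: "\n".join(ls).strip() + "\n" for p, ls in files.items()}
    tree = {p: c for p, c in tree.items() if c.strip()}  # drop empty files
    # An index file gives agents a single entry point into the tree.
    tree["README.md"] = "\n".join(f"- {p}" for p in sorted(tree)) + "\n"
    return tree
```

Deeper headings (### and below) stay inside their parent file in this sketch, which keeps files from becoming too fragmented.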

3. Verification

Automated checks validate output quality before finalization:
  • Required artifacts: expected files (such as README.md) are present
  • Content sanity: the size of the output content is compared against the size of the source content
  • Naming/path safety: file paths follow expected conventions
  • File sizing sanity: suspiciously small or large files are flagged
If checks fail, Shelv retries automatically before marking the shelf as failed.
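Each of the four checks above can be pictured as a small function, wrapped in a retry loop. This is a minimal Python sketch under stated assumptions, not Shelv's code:

- Every threshold is hypothetical: MIN_FILE_CHARS, MAX_FILE_CHARS, MIN_COVERAGE and the SAFE_PATH pattern.
- The attempt count, max_attempts, is hypothetical.
- It is assumed that a retry re-runs structuring.

Shelv's real limits, and which stages it retries, are not documented here.

```python
import re
from typing import Callable

# Hypothetical thresholds; not Shelv's documented values.
MIN_FILE_CHARS = 50
MAX_FILE_CHARS = 50_000
MIN_COVERAGE = 0.8  # output should keep at least 80% of the source's characters
# Relative path of safe segments ending in .md (rejects "..", "/abs", spaces).
SAFE_PATH = re.compile(r"^(?:[A-Za-z0-9][A-Za-z0-9_-]*/)*[A-Za-z0-9][A-Za-z0-9_-]*\.md$")


def verify(tree: dict[str, str], source_text: str) -> list[str]:
    """Return a list of problems; an empty list means the tree passed."""
    problems = []
    # Required artifacts
    if "README.md" not in tree:
        problems.append("missing README.md")
    # Content sanity: compare total output size against source size
    out_chars = sum(len(c) for c in tree.values())
    if source_text and out_chars < MIN_COVERAGE * len(source_text):
        problems.append(f"output has {out_chars} chars vs {len(source_text)} in source")
    for path, content in tree.items():
        # Naming/path safety
        if not SAFE_PATH.match(path):
            problems.append(f"unsafe path: {path}")
        # File sizing sanity (the index file is exempt)
        if path != "README.md" and not MIN_FILE_CHARS <= len(content) <= MAX_FILE_CHARS:
            problems.append(f"suspicious size: {path} ({len(content)} chars)")
    return problems


def structure_with_retries(
    attempt: Callable[[], dict[str, str]], source_text: str, max_attempts: int = 3
) -> dict[str, str]:
    """Re-run a structuring attempt until verification passes or attempts run out."""
    problems: list[str] = []
    for _ in range(max_attempts):
        tree = attempt()
        problems = verify(tree, source_text)
        if not problems:
            return tree
    # The caller would mark the shelf as failed at this point.
    raise RuntimeError("verification failed: " + "; ".join(problems))
```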

4. Storage

Structured files are written to S3-compatible object storage, scoped to your account and shelf ID. What you can access depends on the shelf's status:
  • File tree and file reads: available in ready and review
  • Temporary S3 credentials: available in ready only
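Account and shelf scoping, together with the status-based availability rules above, can be modeled in a few lines of Python. The key layout "<account_id>/<shelf_id>/<path>" and the operation names are assumptions; the docs only say storage is scoped to your account and shelf ID. Statuses not listed (such as failed) are treated here as having no access.

```python
# Hypothetical key layout; the real prefix format is not documented here.
def object_key(account_id: str, shelf_id: str, path: str) -> str:
    """Build an object key that keeps each shelf's files under its own prefix."""
    if path.startswith("/") or ".." in path.split("/"):
        raise ValueError(f"unsafe path: {path!r}")
    return f"{account_id}/{shelf_id}/{path}"


# Operations allowed per shelf status, following the availability list.
AVAILABILITY: dict[str, set[str]] = {
    "ready": {"tree", "read_file", "s3_credentials"},
    "review": {"tree", "read_file"},
}


def can(status: str, operation: str) -> bool:
    """Unknown or unlisted statuses allow nothing in this sketch."""
    return operation in AVAILABILITY.get(status, set())
```

Because each shelf's keys share a common prefix, one shelf's files can be listed or cleaned up without touching another's.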

5. Webhooks

When processing completes or fails, Shelv sends a webhook notification for the corresponding event (shelf.ready, shelf.review, or shelf.failed) to every endpoint registered for that event.
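On the receiving side, an endpoint typically routes each event to its own handler. This Python sketch assumes a JSON body shaped like {"event": "...", "data": {"shelf_id": "..."}}. That payload shape, the handler names, and their actions are hypothetical; check Shelv's webhook reference for the real field names before relying on them.

```python
import json
from typing import Callable


def on_ready(data: dict) -> str:
    return f"fetch file tree for {data.get('shelf_id')}"


def on_review(data: dict) -> str:
    return f"queue {data.get('shelf_id')} for human review"


def on_failed(data: dict) -> str:
    return f"schedule retry for {data.get('shelf_id')}"


# One handler per documented event name.
HANDLERS: dict[str, Callable[[dict], str]] = {
    "shelf.ready": on_ready,
    "shelf.review": on_review,
    "shelf.failed": on_failed,
}


def handle_webhook(body: bytes) -> str:
    """Parse a webhook body (assumed JSON shape) and dispatch on its event name."""
    payload = json.loads(body)
    handler = HANDLERS.get(payload.get("event", ""))
    if handler is None:
        return "ignored"  # unknown event: acknowledge and skip
    return handler(payload.get("data", {}))
```

A dispatch table keeps unknown or future event names harmless: they are acknowledged and ignored rather than raising.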

Error Handling

If any stage fails, the shelf moves to the failed status and records a descriptive error message along with the name of the step that failed. You can retry processing with:
curl -X POST https://api.shelv.dev/api/shelves/{id}/retry \
  -H "Authorization: Bearer sk_your_api_key"
This cleans up any partial output and re-runs the pipeline from the beginning.
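The same retry call can be made from Python using only the standard library. This sketch mirrors the curl command's URL, method, and Authorization header. The helper names build_retry_request and retry_shelf are hypothetical. The response body format is not documented here, so the sketch returns only the HTTP status code.

```python
import urllib.request

API_BASE = "https://api.shelv.dev/api"


def build_retry_request(shelf_id: str, api_key: str) -> urllib.request.Request:
    """Equivalent of: curl -X POST .../shelves/{id}/retry -H "Authorization: Bearer ..." """
    return urllib.request.Request(
        f"{API_BASE}/shelves/{shelf_id}/retry",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def retry_shelf(shelf_id: str, api_key: str) -> int:
    """Send the retry request; urllib raises HTTPError on 4xx/5xx responses."""
    with urllib.request.urlopen(build_retry_request(shelf_id, api_key), timeout=30) as resp:
        return resp.status
```

Building the request separately from sending it makes the URL and headers easy to check without network access.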