Notion Block Search & Sync System

A Val Town application that searches Notion pages for keywords OR block types (like checkboxes), extracts structured data, stores it in blob storage, and syncs it back to a Notion database with intelligent filtering and validation.

Overview

This system enables automatic extraction and organization of action items from Notion pages:

Search: Scans recent Notion pages for configurable keywords or block types
Extract: Captures block content including mentions and dates
Validate: Filters out blocks that are too short (< 5 words by default)
Enrich: Auto-assigns missing owner (creator) and due date (today by default)
Store: Saves to Val Town blob storage with timestamp tracking and sync metadata
Optimize: Skips already-synced items, cutting Notion API calls by 90% or more when most blobs are unchanged
Sync: Creates/updates Notion database pages with Status "Not started" (create only)

Project Structure

├── backend/
│   ├── controllers/         # Business logic
│   │   ├── pageController.ts           # Page operations
│   │   ├── todoController.ts           # Keyword search logic
│   │   ├── todoSaveController.ts       # Blob → Notion sync
│   │   └── todoOrchestrationController.ts  # Batch workflow
│   ├── crons/              # Time-based triggers
│   │   ├── todoSearch.cron.ts  # Periodic keyword search
│   │   └── todoSync.cron.ts    # Periodic database sync
│   ├── routes/             # HTTP handlers
│   │   ├── api/            # API endpoints
│   │   │   └── pages.ts    # Recent pages API
│   │   └── tasks/          # Task automation endpoints
│   │       ├── todoSearch.ts # Single page search webhook
│   │       ├── todoSave.ts # Blob sync webhook
│   │       └── todos.ts    # Batch search & sync
│   ├── services/           # External API integrations
│   │   ├── notion/         # Notion API wrapper
│   │   │   ├── index.ts    # Client initialization
│   │   │   ├── pages.ts    # Page operations
│   │   │   ├── databases.ts # Database operations
│   │   │   ├── blocks.ts   # Block operations
│   │   │   └── search.ts   # Search operations
│   │   ├── aiService.ts    # OpenAI for fuzzy matching
│   │   └── blobService.ts  # Val Town blob storage
│   └── utils/              # Utility functions
│       ├── notionUtils.ts  # Block transformation
│       ├── blobUtils.ts    # Blob key parsing
│       └── emojiUtils.ts   # Emoji extraction
├── frontend/               # React frontend
├── shared/                 # Shared types and utilities
│   ├── types.ts            # TypeScript interfaces
│   └── utils.ts            # Shared utility functions
├── main.http.tsx           # Application entry point (Hono)
├── CLAUDE.md               # Development guidelines
└── AGENTS.md               # Val Town platform guidelines

MVC Architecture

This application follows a strict 3-layer MVC architecture with clear separation of concerns:

Request → Route → Controller → Service → External API
                      ↓
Response ← Format ← Standard Response ← Result

Layer 1: Routes (`backend/routes/`)

Responsibility: HTTP handling only

Extract request parameters (query, body, headers)
Call controller functions
Format responses with appropriate HTTP status codes
Never contain business logic

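// Example: backend/routes/tasks/todos.ts
app.post('/', async (c) => {
  // Parse the lookback window; fall back to 24 hours when ?hours is missing or invalid
  const hours = Number(c.req.query('hours')) || 24;
  const keyword = c.req.query('keyword') || undefined;
  const result = await todoOrchestrationController.processBatchTodos(hours, keyword);
  return c.json(result, 200);
});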

Layer 2: Controllers (`backend/controllers/`)

Responsibility: Business logic and orchestration

Validate input data
Orchestrate multiple service calls
Transform and filter data
Return standardized response format: {success, data, error, details?}
Never make direct HTTP calls to external APIs

// Example: backend/controllers/todoController.ts
export async function processTodoSearch(pageId: string, keyword: string = 'todo') {
  // Validation
  if (!pageId) return { success: false, data: null, error: "Invalid pageId" };

  // Call service layer
  const blocks = await notionService.getPageBlocksRecursive(pageId);

  // Business logic
  const matches = blocks.filter(block => searchBlockForKeyword(block, keyword));

  return { success: true, data: matches, error: null };
}
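
The standardized response shape {success, data, error, details?} can be captured in a small generic type. The following is a minimal sketch; `ControllerResponse`, `ok`, and `fail` are hypothetical names used for illustration and are not taken from the codebase.

```typescript
// Hypothetical type for the {success, data, error, details?} response shape.
interface ControllerResponse<T> {
  success: boolean;
  data: T | null;
  error: string | null;
  details?: string;
}

// Build a successful response carrying the payload.
function ok<T>(data: T): ControllerResponse<T> {
  return { success: true, data, error: null };
}

// Build a failed response; `details` is included only when provided.
function fail<T = never>(error: string, details?: string): ControllerResponse<T> {
  return details === undefined
    ? { success: false, data: null, error }
    : { success: false, data: null, error, details };
}

const goodResponse = ok(["block-1", "block-2"]);
const badResponse = fail("Invalid pageId", "pageId was empty");
console.log(goodResponse.success, badResponse.error);
```

Returning the same shape from every controller lets routes forward results without inspecting them.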

Layer 3: Services (`backend/services/`)

Responsibility: External API calls only

Make HTTP requests to external APIs
Handle API authentication
Parse and normalize API responses
Return structured results: {success, data, error}
Never contain business logic

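// Example: backend/services/notion/pages.ts
// Simplified for illustration: recursion into children and the {success, data, error} wrapper are omitted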
export async function getPageBlocksRecursive(blockId: string) {
  const response = await notion.blocks.children.list({ block_id: blockId });
  return response.results;
}

Golden Rule: Never skip layers! Routes call controllers, controllers call services. This ensures testability, maintainability, and clear separation of concerns.

Search Workflow

The system follows a three-stage pipeline: Notion → Blob Storage → Notion Database

This workflow supports two search modes (keyword or block type). The pipeline remains the same regardless of mode.

Stage 1: Search & Extract (Notion → Blobs)

Flow:

Get recent pages from Notion (configurable time window)
For each page, recursively fetch all blocks
Search blocks for keyword or block-type matches
Extract structured data from matching blocks
Validate word count: Skip blocks below MIN_BLOCK_WORDS (default: 5)
Auto-assign missing fields: Add default due date and/or creator as owner
Save enriched blocks to blob storage with timestamp

Validation & Auto-Assignment (happens here, not during sync):

❌ Word count < MIN_BLOCK_WORDS? → Skip (too short to be meaningful)
⚠️ Missing date_mention? → Auto-assign based on DEFAULT_DUE_DATE setting (default: today)
⚠️ Missing people_mention? → Auto-assign block creator as Owner
✅ Only skips if: (1) too short, or (2) creator info unavailable (rare)
Result: All blobs in storage are meaningful, complete with owner + due date

Endpoints:

POST /tasks/todo/search - Single page search (webhook-triggered)
POST /tasks/todos?hours=24 - Batch search across recent pages

Keywords Configuration:

Set via SEARCH_KEYWORDS environment variable (comma-separated)
Example: SEARCH_KEYWORDS=todo,zinger,bit,😀
Defaults to todo if not set
All keywords searched in single pass through blocks (efficient)

Keyword Matching Logic:

Text keywords (e.g., "todo", "bit", "steel"):
- Case-insensitive
- Word boundary matching (finds "todo" but not "todoist")
- Uses regex: /\btodo\b/i
Emojis (e.g., "😀", "🎉"):
- Exact match
- Case-sensitivity N/A
Multi-keyword: Block saved if it matches ANY keyword
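
The matching rules above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: `parseKeywords`, `matchesKeyword`, and `matchesAnyKeyword` are hypothetical names, and a keyword with no ASCII word characters is assumed to be an emoji.

```typescript
// Split a comma-separated SEARCH_KEYWORDS value; fall back to "todo" when empty.
function parseKeywords(raw: string | undefined): string[] {
  const keywords = (raw ?? "")
    .split(",")
    .map((k) => k.trim())
    .filter((k) => k.length > 0);
  return keywords.length > 0 ? keywords : ["todo"];
}

// Escape regex metacharacters so a keyword is matched literally.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Assumption: a keyword without ASCII word characters is an emoji or symbol.
function matchesKeyword(text: string, keyword: string): boolean {
  if (!/\w/.test(keyword)) {
    return text.includes(keyword); // exact match for emojis
  }
  // Case-insensitive, word-boundary match for text keywords
  return new RegExp(`\\b${escapeRegExp(keyword)}\\b`, "i").test(text);
}

// A block matches when ANY configured keyword matches.
function matchesAnyKeyword(text: string, keywords: string[]): boolean {
  return keywords.some((keyword) => matchesKeyword(text, keyword));
}

const sampleKeywords = parseKeywords("todo, zinger,😀");
console.log(matchesAnyKeyword("TODO: call the bank", sampleKeywords)); // true
console.log(matchesAnyKeyword("Open todoist later", sampleKeywords)); // false
```

Emojis are matched by substring because `\b` only recognizes boundaries next to word characters.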

Block Extraction:

When a keyword is found, the system extracts and transforms the block into a reduced format:

{
  todo_string: "Buy groceries for @John due October 30, 2025",
  block_id: "abc-123-def-456",
  block_url: "https://www.notion.so/abc123def456",
  last_edited_time: "2025-10-29T12:00:00.000Z",
  people_mentions: [{ id: "user-123", name: "John", email: "john@example.com" }],
  date_mentions: ["2025-10-30"],
  link_mentions: [{ text: "Project", url: "/page-id" }],
  sync_metadata: {
    synced: false,  // Needs sync to Notion database
    target_page_id: undefined  // Will be set after first sync (ID of page in todos database)
  }
}

Transformation Details:

Dates: Formatted to human-readable (e.g., "October 30, 2025 at 3:00 PM EDT")
Original dates preserved: ISO format kept in date_mentions array for Notion API
Block URL: Clickable link to original block location
Emojis: Extracted for use as page icons

Stage 2: Blob Storage

Storage Format:

Key pattern: {projectName}--{category}--{blockId}
Example: demo--todo--abc-123-def-456
Content: JSON of reduced block structure with sync metadata
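
A minimal sketch of building and parsing keys in this pattern follows. The helper names are hypothetical, and the sketch assumes that project names, categories, and block IDs never contain the `--` separator themselves.

```typescript
const KEY_SEPARATOR = "--";

interface BlobKeyParts {
  projectName: string;
  category: string;
  blockId: string;
}

// Join the three parts into {projectName}--{category}--{blockId}.
function buildBlobKey(parts: BlobKeyParts): string {
  return [parts.projectName, parts.category, parts.blockId].join(KEY_SEPARATOR);
}

// Returns null unless the key has exactly three non-empty segments.
function parseBlobKey(key: string): BlobKeyParts | null {
  const segments = key.split(KEY_SEPARATOR);
  if (segments.length !== 3 || segments.some((s) => s.length === 0)) {
    return null;
  }
  const [projectName, category, blockId] = segments;
  return { projectName, category, blockId };
}

const sampleKey = buildBlobKey({
  projectName: "demo",
  category: "todo",
  blockId: "abc-123-def-456",
});
console.log(sampleKey); // demo--todo--abc-123-def-456
```

Single hyphens inside the block ID are safe because only the double hyphen is treated as a separator.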

Blob Structure:

{
  todo_string: "...",
  block_id: "...",
  page_url: "...",             // Source page URL
  parent_id: "..." | null,     // Parent block ID (for project matching)
  // ... other properties
  sync_metadata: {
    synced: boolean,           // true = synced to Notion, false = needs sync
    target_page_id?: string    // Cached ID of page in todos database (optimization)
  }
}

Update Logic:

Compare last_edited_time of existing blob vs new block
If unchanged: Skip save (preserves synced: true status)
If changed: Save with synced: false (triggers re-sync)
Preserve cached target_page_id across updates
Prevents data loss from out-of-order processing
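
The update rules above can be sketched as a pure decision function. This is an assumption-laden illustration: `decideBlobSave` and the trimmed-down types are hypothetical, and "unchanged" is taken to mean an identical `last_edited_time` string.

```typescript
interface SyncMetadata {
  synced: boolean;
  target_page_id?: string;
}

interface StoredBlob {
  block_id: string;
  last_edited_time: string;
  sync_metadata: SyncMetadata;
}

interface IncomingBlock {
  block_id: string;
  last_edited_time: string;
}

type SaveDecision = { action: "skip" } | { action: "save"; blob: StoredBlob };

function decideBlobSave(existing: StoredBlob | null, incoming: IncomingBlock): SaveDecision {
  // Unchanged block: leave the stored blob (and its synced flag) untouched.
  if (existing && existing.last_edited_time === incoming.last_edited_time) {
    return { action: "skip" };
  }
  // New or changed block: mark for re-sync, keeping any cached page ID.
  const sync_metadata: SyncMetadata = { synced: false };
  const cachedPageId = existing?.sync_metadata.target_page_id;
  if (cachedPageId !== undefined) {
    sync_metadata.target_page_id = cachedPageId;
  }
  return { action: "save", blob: { ...incoming, sync_metadata } };
}

const storedBlob: StoredBlob = {
  block_id: "abc-123",
  last_edited_time: "2025-10-29T12:00:00.000Z",
  sync_metadata: { synced: true, target_page_id: "page-789" },
};
console.log(decideBlobSave(storedBlob, { block_id: "abc-123", last_edited_time: "2025-10-29T12:00:00.000Z" }).action);
```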

Stage 3: Sync to Notion Database (Blobs → Notion)

Flow:

List all blobs in "todo" category
For each blob, read reduced block data
Optimization: Skip if synced: true (0 API calls)
Optimization: Use cached target_page_id if available (1 API call - update only)
If no cached ID: Query database for existing page by Block ID
Create new page OR update existing page
- CREATE: Adds Status = "Not started" (only on creation)
- UPDATE: Does NOT modify Status (preserves user changes)
Mark blob as synced: true and cache page ID

Note: No validation happens during sync - all blobs are guaranteed valid because validation occurs during the search phase (Stage 1).

Endpoints:

POST /tasks/todo/save - Sync all blobs to database (webhook-triggered)
POST /tasks/todos - Batch workflow (search + sync in one call)

Property Mappings (Blob → Notion Database):

todo_string        → Name (title)
block_id           → Block ID (rich_text)
block_url          → Block URL (url)
page_url           → Page URL (url) - source page where todo was found
last_edited_time   → Todo last edited time (date)
people_mentions[0] → Owner (people)
people_mentions[1..] → Other people (people)
date_mentions[0]   → Due date (date)
link_mentions      → Links (rich_text, bullet list)
matched projects   → Projects db (relation) - see Project Matching
emoji (if found)   → Page icon
(on CREATE only)   → Status (select: "Not started")

Status Property Behavior:

CREATE: New pages are created with Status = "Not started" (select property)
UPDATE: Existing pages are updated WITHOUT modifying Status
Why: Preserves user changes to Status in the database (e.g., "In progress", "Done")
Result: Status is set once on creation, then managed by users in the database
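
The create-only Status rule can be illustrated with a small property builder. This is a sketch, not the project's code: `buildPageProperties` and `TodoBlob` are hypothetical, only a subset of the mappings is shown, and the property value shapes follow the Notion API's page property format as I understand it.

```typescript
interface TodoBlob {
  todo_string: string;
  block_id: string;
  block_url: string;
  date_mentions: string[];
  people_mentions: { id: string }[];
}

type NotionProperties = Record<string, unknown>;

function buildPageProperties(blob: TodoBlob, mode: "create" | "update"): NotionProperties {
  const [owner, ...others] = blob.people_mentions;
  const firstDate = blob.date_mentions[0];
  const properties: NotionProperties = {
    "Name": { title: [{ text: { content: blob.todo_string } }] },
    "Block ID": { rich_text: [{ text: { content: blob.block_id } }] },
    "Block URL": { url: blob.block_url },
    "Owner": { people: owner ? [{ id: owner.id }] : [] },
    "Other people": { people: others.map((p) => ({ id: p.id })) },
    "Due date": { date: firstDate ? { start: firstDate } : null },
  };
  // Status is written only on creation so later user edits are preserved.
  if (mode === "create") {
    properties["Status"] = { select: { name: "Not started" } };
  }
  return properties;
}

const sampleBlob: TodoBlob = {
  todo_string: "Buy groceries for @John due October 30, 2025",
  block_id: "abc-123-def-456",
  block_url: "https://www.notion.so/abc123def456",
  date_mentions: ["2025-10-30"],
  people_mentions: [{ id: "user-123" }, { id: "user-456" }],
};
console.log(Object.keys(buildPageProperties(sampleBlob, "update")));
```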

Sync Optimization:

The system uses sync metadata to dramatically reduce Notion API calls:

On first sync:

Blob has synced: false, no target_page_id
Query database → create or update → cache page ID
Mark synced: true
API calls: 1 query + 1 create/update = 2 calls

On subsequent syncs (no changes):

Blob has synced: true
Skip immediately
API calls: 0 calls (100% reduction)

On subsequent syncs (block changed):

Blob saved with synced: false (block edited in Notion)
Has cached target_page_id from previous sync
Update directly without query
Mark synced: true
API calls: 1 update (50% reduction)

Performance impact:

Before optimization: 100 blobs = 100 queries + 50 updates = 150 API calls
After optimization: 90 synced + 10 changed = 0 + 10 updates = 10 API calls (93% reduction)
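
The three sync cases and their API-call costs can be sketched as a small planner. `planSync` and `countApiCalls` are hypothetical names; the call counts are the ones listed above.

```typescript
interface SyncMetadata {
  synced: boolean;
  target_page_id?: string;
}

type SyncPlan =
  | { action: "skip"; apiCalls: 0 }
  | { action: "update_cached"; apiCalls: 1; pageId: string }
  | { action: "query_then_upsert"; apiCalls: 2 };

function planSync(meta: SyncMetadata): SyncPlan {
  // Already synced: nothing to do.
  if (meta.synced) return { action: "skip", apiCalls: 0 };
  // Changed, but the target page is cached: update directly.
  if (meta.target_page_id) {
    return { action: "update_cached", apiCalls: 1, pageId: meta.target_page_id };
  }
  // First sync: query the database, then create or update.
  return { action: "query_then_upsert", apiCalls: 2 };
}

function countApiCalls(blobs: SyncMetadata[]): number {
  return blobs.reduce((total, meta) => total + planSync(meta).apiCalls, 0);
}

// 90 already-synced blobs plus 10 edited blobs that have a cached page ID.
const sampleBlobs: SyncMetadata[] = [];
for (let i = 0; i < 90; i++) sampleBlobs.push({ synced: true, target_page_id: "page-x" });
for (let i = 0; i < 10; i++) sampleBlobs.push({ synced: false, target_page_id: "page-x" });
console.log(countApiCalls(sampleBlobs)); // 10
```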

Block Type Handling

The search uses recursive block fetching to traverse the entire page hierarchy, including nested content.

Recursive Fetching

How it works:

function getPageBlocksRecursive(blockId, containerFilter?) {
  1. Fetch immediate children of blockId
  2. For each child:
     - Add child to results
     - If child.has_children === true:
       - If containerFilter provided: only recurse if block type is in filter
       - Otherwise: recurse into all children
       - Add the blocks returned by the recursive call to results
  3. Return flattened array of all blocks
}
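
The pseudocode above can be turned into a runnable sketch. To keep it self-contained, the child lookup is an injected synchronous function over an in-memory tree; a real implementation would await the Notion API instead. `collectBlocksRecursive` and `ListChildren` are hypothetical names.

```typescript
interface Block {
  id: string;
  type: string;
  has_children: boolean;
}

// Synchronous stand-in for a "list block children" call.
type ListChildren = (blockId: string) => Block[];

function collectBlocksRecursive(
  blockId: string,
  listChildren: ListChildren,
  containerFilter?: Set<string>,
): Block[] {
  const results: Block[] = [];
  for (const child of listChildren(blockId)) {
    results.push(child);
    // With a filter, only descend into listed container types.
    const mayRecurse =
      child.has_children && (!containerFilter || containerFilter.has(child.type));
    if (mayRecurse) {
      results.push(...collectBlocksRecursive(child.id, listChildren, containerFilter));
    }
  }
  return results;
}

// In-memory page: a toggle holding a to_do, and a table holding a row.
const sampleTree: Record<string, Block[]> = {
  "page-1": [
    { id: "toggle-1", type: "toggle", has_children: true },
    { id: "table-1", type: "table", has_children: true },
  ],
  "toggle-1": [{ id: "todo-1", type: "to_do", has_children: false }],
  "table-1": [{ id: "row-1", type: "table_row", has_children: false }],
};
const listFromTree: ListChildren = (id) => sampleTree[id] ?? [];

const allIds = collectBlocksRecursive("page-1", listFromTree).map((b) => b.id);
const filteredIds = collectBlocksRecursive("page-1", listFromTree, new Set(["toggle", "to_do"])).map((b) => b.id);
console.log(allIds, filteredIds);
```

With the filter, the table itself is still returned but its rows are never fetched, which is where the saved API calls come from.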

What this means:

✅ Finds blocks nested inside toggles
✅ Finds blocks nested inside columns
✅ Finds blocks nested inside lists
✅ Finds blocks nested N levels deep

Block Type Mode Optimization

When using block type mode (SEARCH_BLOCK_TYPE=to_do), the system optimizes recursive fetching by only traversing into container blocks that can hold to_do children:

Container blocks (recursed into):

to_do - to_do blocks can nest inside other to_do blocks
toggle - common pattern for organizing todos
column_list / column - layout containers
synced_block - can contain any block type
callout - can contain nested content
quote - can contain nested blocks
bulleted_list_item / numbered_list_item - can have nested content
template - can contain any block type

Non-container blocks (skipped):

paragraph, heading_1/2/3, code, equation - cannot have to_do children
image, video, file, pdf, audio, embed, bookmark - media blocks

Performance impact: Significantly reduces API calls by skipping recursion into blocks that cannot contain to_do items. Keyword mode still traverses all blocks (no filter applied).

Included Block Types

These block types are searched for keywords:

Block Type	Has rich_text?	Notes
`paragraph`	✅	Standard text blocks
`heading_1`, `heading_2`, `heading_3`	✅	All heading levels
`bulleted_list_item`	✅	Bullet lists
`numbered_list_item`	✅	Numbered lists
`to_do`	✅	Checkbox items
`toggle`	✅	Collapsible toggles
`quote`	✅	Quote blocks
`callout`	✅	Callout/alert blocks
`code`	✅	Code blocks (captions only)
`column`	N/A	Container - children are searched
`column_list`	N/A	Container - children are searched

Column Behavior:

Column blocks themselves have no searchable text
But their children (paragraphs, lists, etc.) ARE searched
Example: A todo in a column will be found

Excluded Block Types

These block types are explicitly skipped:

Block Type	Reason
`unsupported`	Not supported by Notion API
`button`	Action buttons, not content
`table`	Container block, no text content
`table_row`	Cells aren't individual blocks; can't be saved to blob
`child_page`	Page title not in rich_text format
`child_database`	Database title not in rich_text format
`divider`	No text content
`table_of_contents`	No text content
`breadcrumb`	No text content
`image`, `file`, `video`, `pdf`	Media blocks (captions could be added later)
`bookmark`, `embed`	External content (could be added later)

Why tables are excluded:

Table content lives in table_row.cells[][] (array of arrays)
Cells contain rich_text but aren't individual blocks
Can't be saved to blob storage as standalone blocks
Can't create Notion pages from cell content

Validation & Auto-Assignment Rules

Matched blocks are validated for minimum length, then enriched with auto-assigned fields before being saved to blob storage. Validation ensures quality; auto-assignment ensures completeness.

Validation & Enrichment for Blob Storage

Matched blocks go through validation and enrichment before being saved:

1. Word Count Validation (REQUIRED):

✅ Block must have at least MIN_BLOCK_WORDS words (default: 5)
❌ Blocks with fewer words are skipped - too short to be meaningful todos
Counts all words including mentions and dates (simple whitespace split)
Example: "Buy groceries for @John tomorrow" = 5 words (passes)
Example: "todo" = 1 word (skipped)

2. Date Mention (AUTO-ASSIGNED):

✅ At least one date mention (date_mentions.length > 0)
First date becomes "Due date"
AUTO-ASSIGNED: If no date found, uses DEFAULT_DUE_DATE env var setting (defaults to "today")

3. Person Mention (AUTO-ASSIGNED):

✅ At least one person mention (people_mentions.length > 0)
First person becomes "Owner"
Additional people become "Other people"
AUTO-ASSIGNED: If no @mentions found, block creator is automatically assigned as owner
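
The word-count rule in step 1 can be sketched with a plain whitespace split. This is one possible implementation of a `countWords`-style helper; the project's own version may differ, and `passesMinWords` is a hypothetical name.

```typescript
// Count whitespace-separated tokens; mentions, dates, and emojis each count as words.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Apply the MIN_BLOCK_WORDS threshold (default 5).
function passesMinWords(text: string, minWords: number = 5): boolean {
  return countWords(text) >= minWords;
}

console.log(countWords("Buy groceries for @John tomorrow")); // 5
console.log(countWords("buy-now")); // 1
console.log(passesMinWords("Buy groceries")); // false
```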

Automatic Due Date Assignment:

If a block matches search criteria but has no date mentions, the system automatically assigns a due date
Configurable: Set via DEFAULT_DUE_DATE environment variable
Options: today (default), tomorrow, one_week, end_of_week, next_business_day
Rationale: Blocks without explicit dates still need deadlines; "today" is a sensible default
Date is stored in same ISO format as explicit dates (e.g., "2025-10-31")
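
A minimal sketch of the date calculation follows. It is not the project's implementation and makes two assumptions that the text does not settle: dates are computed in UTC, and `end_of_week` returns today when today is already Friday.

```typescript
type DueDateSetting = "today" | "tomorrow" | "one_week" | "end_of_week" | "next_business_day";

// Return a copy of `date` shifted by `days` (UTC).
function addDays(date: Date, days: number): Date {
  const copy = new Date(date.getTime());
  copy.setUTCDate(copy.getUTCDate() + days);
  return copy;
}

function calculateDueDate(setting: DueDateSetting, now: Date = new Date()): string {
  let due: Date;
  switch (setting) {
    case "tomorrow":
      due = addDays(now, 1);
      break;
    case "one_week":
      due = addDays(now, 7);
      break;
    case "end_of_week": {
      // Days until the upcoming Friday (0 if today is Friday).
      const daysUntilFriday = (5 - now.getUTCDay() + 7) % 7;
      due = addDays(now, daysUntilFriday);
      break;
    }
    case "next_business_day": {
      // Step forward at least one day, then skip Saturday and Sunday.
      due = addDays(now, 1);
      while (due.getUTCDay() === 0 || due.getUTCDay() === 6) {
        due = addDays(due, 1);
      }
      break;
    }
    default:
      due = now; // "today"
  }
  return due.toISOString().slice(0, 10); // YYYY-MM-DD
}

const sampleFriday = new Date("2025-10-31T12:00:00.000Z");
console.log(calculateDueDate("today", sampleFriday)); // 2025-10-31
console.log(calculateDueDate("next_business_day", sampleFriday)); // 2025-11-03
```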

Automatic Owner Assignment:

If a block matches search criteria but has no @person mentions, the system automatically assigns the block creator (from Notion's created_by field) as the owner
Rationale: When someone creates a todo without mentioning anyone, they're implicitly taking ownership
Only the creator's Notion user ID is used (no additional API calls needed)
The creator appears as "Owner" in the synced database page
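
The owner fallback can be sketched as below. The types are trimmed to the fields used here, `resolveOwners` is a hypothetical helper, and `getBlockCreator` is only one plausible shape for the function of that name used in the validation logic.

```typescript
interface PersonMention {
  id: string;
  name?: string;
  email?: string;
}

// Only the fields this sketch needs from a Notion block object.
interface BlockWithCreator {
  id: string;
  created_by?: { object: "user"; id: string } | null;
}

// Read the creator's user ID from created_by; no extra API call is made.
function getBlockCreator(block: BlockWithCreator): PersonMention | null {
  const creatorId = block.created_by?.id;
  return creatorId ? { id: creatorId } : null;
}

// Explicit mentions win; otherwise fall back to the creator.
// Returns null when neither is available, meaning the block is skipped.
function resolveOwners(block: BlockWithCreator, mentions: PersonMention[]): PersonMention[] | null {
  if (mentions.length > 0) return mentions;
  const creator = getBlockCreator(block);
  return creator ? [creator] : null;
}

const sampleBlock: BlockWithCreator = {
  id: "abc-123",
  created_by: { object: "user", id: "user-999" },
};
console.log(resolveOwners(sampleBlock, []));
```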

Result: No block is skipped merely because it lacks a date or person mention. Blocks are skipped only when they are too short or, rarely, when creator info is unavailable.

When Validation Happens

During Search (Stage 1) - todoController.ts:

After keyword match is found
After block is transformed to reduced format
Before saving to blob storage

Not During Sync (Stage 3) - All blobs are guaranteed valid, no checking needed

Validation Logic

// From todoController.ts (search phase)

// Step 1: Check minimum word count
const minWords = getMinBlockWords();
const wordCount = countWords(reducedBlock.todo_string);

if (wordCount < minWords) {
  console.log(`○ Skipped: block too short (${wordCount} words, minimum: ${minWords})`);
  continue;  // Don't save to blob
}

// Step 2: Auto-assign due date if no date mentioned
if (!reducedBlock.date_mentions || reducedBlock.date_mentions.length === 0) {
  const setting = getDefaultDueDateSetting();
  const calculatedDate = calculateDueDate(setting);

  reducedBlock.date_mentions = [calculatedDate];
  console.log(`✓ Auto-assigned due date: ${calculatedDate} (setting: ${setting})`);
}

// Step 3: Auto-assign creator as owner if no people mentioned
if (!reducedBlock.people_mentions || reducedBlock.people_mentions.length === 0) {
  const creator = getBlockCreator(block);

  if (creator) {
    reducedBlock.people_mentions = [creator];
    console.log(`✓ Auto-assigned creator as owner (ID: ${creator.id})`);
  } else {
    console.log(`○ Skipped: no people mentions and creator info not available`);
    continue;  // Don't save to blob
  }
}

console.log('✓ Validation passed: has people and date mentions');

// All validated and enriched blocks reach this point and get saved
await blobService.saveBlockToBlob('todo', block.id, blobData);

Examples

Valid - Explicit @mention + explicit date (8 words):

"Buy groceries for @John due October 30, 2025"
✅ Word count: 8 (passes minimum of 5)
✅ Has person mention (@John)
✅ Has date mention (October 30, 2025)
→ Saved to blob storage
→ Synced to Notion database with @John as Owner, due Oct 30

Valid - Auto-assigned creator + explicit date (6 words):

"Buy groceries due October 30, 2025"
✅ Word count: 6 (passes minimum of 5)
✅ Has date mention (October 30, 2025)
⚠️  No person mention → Creator auto-assigned
→ Saved to blob storage with creator as Owner
→ Synced to Notion database (creator appears as Owner, due Oct 30)

Valid - Explicit @mention + auto-assigned date (5 words):

"Buy groceries for @John tomorrow"
✅ Word count: 5 (passes minimum of 5)
✅ Has person mention (@John)
⚠️  No date mention (plain-text "tomorrow" is not a Notion date mention) → Auto-assigned (e.g., today = 2025-10-31)
→ Saved to blob storage
→ Synced to Notion database with @John as Owner, due Oct 31

Valid - Both auto-assigned (5 words):

"Buy groceries at the store"
✅ Word count: 5 (passes minimum of 5)
⚠️  No person mention → Creator auto-assigned
⚠️  No date mention → Auto-assigned (e.g., today = 2025-10-31)
→ Saved to blob storage with creator as Owner
→ Synced to Notion database (creator appears as Owner, due Oct 31)

Invalid - Too short (2 words):

"Buy groceries"
❌ Word count: 2 (below minimum of 5)
→ NOT saved to blob storage (skipped - too short)

Invalid - Too short (1 word):

"todo"
❌ Word count: 1 (below minimum of 5)
→ NOT saved to blob storage (skipped - too short)

Invalid - Too short with emojis (2 words):

"🍋 🎉"
❌ Word count: 2 (below minimum of 5)
→ NOT saved to blob storage (skipped - too short)

Invalid - Creator unavailable (rare):

"Buy groceries at the store tomorrow"
✅ Word count: 6 (passes minimum of 5)
⚠️  No person mention → Would auto-assign creator
❌ Creator info not available from Notion API
→ NOT saved to blob storage (skipped - creator unavailable)

Note: With word count validation and auto-assignment enabled, most meaningful blocks are saved. Blocks are skipped if:

Too short (below MIN_BLOCK_WORDS, default: 5) - most common
Creator unavailable (rare edge case)

Sync Summary

After syncing, the controller reports:

Total blobs: All blobs in storage (all guaranteed valid with auto-assigned fields)
Pages created: New pages added to database
Pages updated: Existing pages updated
Pages skipped: Blobs already synced (synced: true)
Pages failed: Errors during create/update

Note: All blobs meet criteria with auto-assignment - validation and enrichment happen during search, not sync.

Endpoints

API Endpoints

GET /api/pages/recent?hours=24

Get pages edited in last N hours
Filters out archived pages and pages in TODOS_DB_ID database
Returns simplified page objects with parent information

Response:

{
  "pages": [
    {
      "id": "page-id",
      "object": "page",
      "title": "My Page",
      "url": "https://notion.so/...",
      "last_edited_time": "2025-10-29T12:00:00.000Z",
      "parent": { "type": "page_id", "id": "parent-id" }
    }
  ],
  "count": 1,
  "timeRange": "24 hours"
}

Task Endpoints

POST /tasks/todo/search

Search single page for keywords (webhook-triggered)
Keywords from SEARCH_KEYWORDS env var (comma-separated)
Extracts and saves matching blocks to blobs
Body: { "page_id": "abc-123" }

POST /tasks/todo/save

Sync all blobs to Notion database
Creates/updates pages (no validation here; blobs were validated during search)
No request body needed

POST /tasks/todos?hours=24

Batch workflow: Search recent pages + sync to database
Keywords from SEARCH_KEYWORDS env var
Combines search and save in one call
Use for manual triggers or cron jobs

Response:

{
  "success": true,
  "pagesSearched": 5,
  "totalTodosFound": 12,
  "searchResults": [
    {
      "pageId": "abc-123",
      "pageTitle": "My Page",
      "success": true,
      "blocksFound": 3,
      "blockIds": ["block-1", "block-2", "block-3"]
    }
  ],
  "saveResult": {
    "totalBlobs": 12,
    "pagesCreated": 5,
    "pagesUpdated": 3,
    "pagesSkipped": 4,
    "pagesFailed": 0
  }
}

Cron Jobs

The system includes two separate cron jobs for automated workflow execution. Crons are time-based triggers that run independently of HTTP requests.

Architecture

Crons live in backend/crons/ and follow the same MVC pattern as HTTP routes:

Cron Trigger → Controller → Service → External API

Key differences from HTTP routes:

Triggered by time intervals (not HTTP requests)
No request/response cycle
Results logged to console only
Use .cron.tsx extension for Val Town

Cron 1: Todo Search (`todoSearch.cron.ts`)

Purpose: Search recent pages for keywords/block types and save matches to blob storage

Workflow:

Get recent pages from Notion (last 15 minutes)
Search each page for configured keywords or block types
Save matching blocks to Val Town blob storage
Does NOT sync to Notion database

Configuration:

Lookback window: 15 minutes (optimized for frequent runs)
Keywords/Block type: From SEARCH_KEYWORDS or SEARCH_BLOCK_TYPE env var
Recommended schedule: Every 1 minute
- 15 minute lookback provides buffer for missed runs
- Frequent runs catch changes quickly

Output:

=== Cron: Todo Search Started ===
Timestamp: 2025-10-29T12:00:00.000Z

Cron: Search complete - Found 12 todos in 5 pages
Pages with matches:
  - Project Planning: 3 match(es)
  - Meeting Notes: 5 match(es)
  - Weekly Review: 4 match(es)

=== Cron: Todo Search Complete ===

Cron 2: Todo Sync (`todoSync.cron.ts`)

Purpose: Sync validated todo blobs to Notion database

Workflow:

Read all todo blobs from Val Town blob storage
Skip blobs already marked synced: true (no validation here; blobs were validated during search)
Query Notion database for existing pages by Block ID
Create new pages or update existing pages (timestamp-based)

Configuration:

No parameters: Processes all blobs in storage
Recommended schedule: Every 8-12 hours
- Less frequent than search cron
- Allows time for blob accumulation
- Reduces Notion API calls

Output:

=== Cron: Todo Sync Started ===
Timestamp: 2025-10-29T14:00:00.000Z

Cron: Sync complete
Summary:
  Total blobs processed: 12
  Pages created: 5
  Pages updated: 3
  Pages skipped: 4
  Pages failed: 0

=== Cron: Todo Sync Complete ===

Why Two Separate Crons?

Operational flexibility:

Search cron runs frequently to capture changes quickly
Sync cron runs less frequently to batch database updates
Reduces Notion API rate limit concerns
Allows manual triggering of sync independently

Fault isolation:

Search failures don't block syncing existing blobs
Sync failures don't block new searches
Each cron can be debugged independently

Cost optimization:

Blob storage is cheap and fast
Notion API calls are rate-limited
Separate crons allow different schedules for different costs

Setting Up Crons in Val Town

Navigate to Val Town UI
Create new cron vals:
- todoSearch.cron.tsx - Copy content from backend/crons/todoSearch.cron.ts
- todoSync.cron.tsx - Copy content from backend/crons/todoSync.cron.ts
Set schedules:
- todoSearch.cron.tsx: Every 1 minute (* * * * *)
- todoSync.cron.tsx: Every 8 hours (0 */8 * * *)
Monitor logs: Check Val Town console for cron execution results

Note: Val Town cron jobs must be separate vals (not files in this project). The files in backend/crons/ serve as templates to copy into Val Town cron vals.

Environment Variables

Environment variables (set in Val Town); each entry is marked required or optional:

NOTION_API_KEY - Notion integration token (required)
- Get from: https://www.notion.so/my-integrations
- Required for all Notion API calls
TODOS_DB_ID - Notion database ID for todo sync (required)
- The database where keyword matches are synced
- Format: abc123def456... (32-character ID without hyphens)
PROJECTS_DB_ID - Notion database ID for project matching (optional)
- Links todos to projects automatically (see Project Matching)
- If not set, todos sync normally without project links
- Format: abc123def456... (32-character ID without hyphens)
SEARCH_KEYWORDS - Keywords to search for (optional, keyword mode)
- Comma-separated list of keywords/phrases
- Example: todo,zinger,bit or todo,😀,🎉
- Defaults to todo if not set
- All blocks matching ANY keyword will be saved to blob storage
- Efficient: Searches all keywords in a single pass through blocks
SEARCH_BLOCK_TYPE - Block type to search for (optional, block type mode)
- Alternative to keyword search - searches by Notion block type
- Example: to_do (searches all Notion checkbox blocks)
- Defaults to to_do if set with empty value
- Takes precedence over SEARCH_KEYWORDS if both are set
- Common values: to_do, paragraph, bulleted_list_item, numbered_list_item
- Matched blocks still go through word-count validation and owner/due-date auto-assignment
- Useful for: "Check a box, add @person and date = instant todo"
MIN_BLOCK_WORDS - Minimum word count for blocks to be saved (optional)
- Blocks with fewer words are skipped (too short to be meaningful todos)
- Defaults to 5 if not set
- Word counting:
  - Counts all words including mentions and dates (simple whitespace split)
  - Hyphenated words count as 1 (e.g., "buy-now" = 1 word)
  - Emojis count as words (e.g., "🍋 🎉" = 2 words)
- Default of 5 accounts for: ~1 word for mention + ~2-3 words for date + ~2-3 words for action
- Example: "Buy groceries for @John tomorrow" = 5 words (minimum)
DEFAULT_DUE_DATE - Default due date for blocks without date mentions (optional)
- Used when a block matches search criteria but has no date
- Supported values: today, tomorrow, one_week, end_of_week, next_business_day
- Defaults to today if not set
- Examples:
  - today - Due today (default)
  - tomorrow - Due tomorrow
  - one_week - Due 7 days from today
  - end_of_week - Due next Friday (end of work week)
  - next_business_day - Due next weekday (skips weekends)
BLOCK_STABILITY_MINUTES - Minimum age (in minutes) before blocks are saved to blob storage (optional)
- Only applies to cron-triggered searches (not webhook-triggered)
- Prevents syncing blocks that are actively being edited
- Defaults to 0 if not set (no delay - blocks saved immediately)
- Set to a positive number to add a stability delay (e.g., 2 for 2 minutes)
- Examples:
  - Not set or BLOCK_STABILITY_MINUTES=0 - All blocks saved immediately (default)
  - BLOCK_STABILITY_MINUTES=2 - Block edited 1 minute ago will be skipped, 3 minutes ago will be saved
  - BLOCK_STABILITY_MINUTES=5 - Block edited 4 minutes ago will be skipped, 6 minutes ago will be saved
- Webhook/button triggers always bypass this delay regardless of setting (immediate sync on user action)
- Use case: Set a delay if you frequently edit todos and want cron to wait for "final" versions
RECENT_PAGES_LOOKBACK_HOURS - Default lookback window for searching recent pages (optional)
- System-wide setting that applies to all triggers: cron jobs, frontend dashboard, manual API calls
- Defaults to 24 hours if not set
- Must be a positive integer
- Can be overridden per-request with ?hours=X query parameter
- Examples:
  - Not set or RECENT_PAGES_LOOKBACK_HOURS=24 - Search last 24 hours (default)
  - RECENT_PAGES_LOOKBACK_HOURS=48 - Search last 48 hours (2 days)
  - RECENT_PAGES_LOOKBACK_HOURS=168 - Search last 168 hours (1 week)
- Use case: Match to your cron schedule or desired dashboard timeframe
  - Cron every 4 hours → Set to 6-8 hours (buffer for overlap/delays)
  - Cron every 24 hours → Set to 24-48 hours
  - No cron → Set to desired dashboard timeframe
NOTION_WEBHOOK_SECRET - API key for protecting webhooks and API endpoints (recommended)
- Required for production use - protects all /tasks/* and /api/* endpoints (except /api/health)
- Prevents unauthorized access to your Notion data and webhook triggers
- Set to any secure random string (e.g., generated password or UUID)
- Notion webhook configuration: Add custom header X-API-KEY with this value
- API requests: Include header X-API-KEY: your-secret-value
- If not set, authentication is disabled (development mode only)
- Example: NOTION_WEBHOOK_SECRET=abc123xyz789...
Security note: Without this, anyone can:
- Trigger your webhooks (causing unnecessary processing)
- Access recent pages via /api/pages/recent (potential data leak)
- View page IDs from /api/health and use them to query other endpoints
Public endpoint exception: /api/health remains public (needed for frontend dashboard)
API_KEY - Legacy API key (deprecated, use NOTION_WEBHOOK_SECRET instead)
- Kept for backwards compatibility
- Use NOTION_WEBHOOK_SECRET for new deployments
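
As an illustration of the BLOCK_STABILITY_MINUTES rule described above, the following sketch checks whether a block is old enough to save. The function names are hypothetical, and the sketch assumes a block whose age equals the threshold exactly counts as stable.

```typescript
type Trigger = "cron" | "webhook";

// Parse the env var; anything missing, invalid, or non-positive means no delay.
function parseStabilityMinutes(raw: string | undefined): number {
  const minutes = Number(raw);
  return Number.isFinite(minutes) && minutes > 0 ? minutes : 0;
}

function isBlockStable(
  lastEditedTime: string,
  stabilityMinutes: number,
  trigger: Trigger,
  now: Date = new Date(),
): boolean {
  // Webhook/button triggers always bypass the delay.
  if (trigger === "webhook" || stabilityMinutes <= 0) return true;
  const ageMs = now.getTime() - new Date(lastEditedTime).getTime();
  return ageMs >= stabilityMinutes * 60 * 1000;
}

const sampleNow = new Date("2025-10-29T12:00:00.000Z");
// Edited 1 minute ago with a 2-minute delay: skipped by cron.
console.log(isBlockStable("2025-10-29T11:59:00.000Z", 2, "cron", sampleNow)); // false
// Edited 3 minutes ago: saved.
console.log(isBlockStable("2025-10-29T11:57:00.000Z", 2, "cron", sampleNow)); // true
```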

Search Modes

This system supports two mutually exclusive search modes. Choose the mode that best fits your workflow.

Mode 1: Keyword Search (default)

When to use: You want to search for specific text in blocks (e.g., "todo", "zinger", emojis).

Configuration:

SEARCH_KEYWORDS=todo,zinger,🍋

How it works:

Searches block text content for keywords
Matches text keywords with word boundaries (case-insensitive)
Matches emojis with exact match
Example: Block containing "Buy groceries todo @John October 30"

Use case: Flexible text-based search across any block type that contains your keywords.

Mode 2: Block Type Search

When to use: You want to use Notion's native block types (especially checkboxes) as task markers.

Configuration:

SEARCH_BLOCK_TYPE=to_do

How it works:

Searches by Notion block type (not text content)
Matches to_do blocks (Notion checkboxes)
Works with both checked and unchecked checkboxes
No keyword required in text
Example: Any checkbox block with @John and October 30

Use case:

Create a checkbox in Notion (makes it a to_do block)
Add @person mention
Add date
Done! Automatically syncs to database (no need to type "todo")

Other block types: You can also search for paragraph, bulleted_list_item, numbered_list_item, etc.

Mode Priority

If both env vars are set: SEARCH_BLOCK_TYPE takes precedence

System will use block type mode
SEARCH_KEYWORDS will be ignored
A warning will be logged

If neither is set: Defaults to keyword mode with keyword "todo"
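
The priority rules can be sketched as a single resolver over the environment. `resolveSearchMode` is a hypothetical name, and the environment is passed in as a plain object so the sketch does not depend on any particular runtime's env API.

```typescript
type SearchMode =
  | { mode: "block_type"; blockType: string }
  | { mode: "keyword"; keywords: string[] };

function resolveSearchMode(env: Record<string, string | undefined>): SearchMode {
  const blockType = env["SEARCH_BLOCK_TYPE"];
  const rawKeywords = env["SEARCH_KEYWORDS"];

  // SEARCH_BLOCK_TYPE wins whenever it is set, even with an empty value.
  if (blockType !== undefined) {
    if (rawKeywords !== undefined) {
      console.warn("Both SEARCH_BLOCK_TYPE and SEARCH_KEYWORDS are set; SEARCH_KEYWORDS is ignored");
    }
    return { mode: "block_type", blockType: blockType.trim() || "to_do" };
  }

  // Keyword mode: comma-separated list, defaulting to "todo".
  const keywords = (rawKeywords ?? "")
    .split(",")
    .map((k) => k.trim())
    .filter((k) => k.length > 0);
  return { mode: "keyword", keywords: keywords.length > 0 ? keywords : ["todo"] };
}

console.log(resolveSearchMode({ SEARCH_KEYWORDS: "todo,zinger" }));
console.log(resolveSearchMode({}));
```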

Auto-Assignment (applies to both modes)

Regardless of search mode, all matched blocks are enriched with:

⚠️ date_mention - If missing, auto-assigned based on DEFAULT_DUE_DATE (default: today)
⚠️ people_mention - If missing, auto-assigned to block creator

All matched blocks are saved to blob storage with complete owner + due date information.

Project Matching

When syncing todos to the database, the system automatically links them to related projects using a cascade of matching strategies. Each strategy is tried in order until a match is found.
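The cascade itself can be sketched as a loop over an ordered list of strategies that stops at the first match. The Strategy type and matchProject function are hypothetical names used for illustration; each strategy is assumed to resolve to a project ID or null.

```typescript
// Hypothetical cascade runner; names are not from the project source.
type Strategy<T> = {
  name: string;
  run: (todo: T) => Promise<string | null>; // project ID, or null for no match
};

async function matchProject<T>(
  todo: T,
  strategies: Strategy<T>[],
): Promise<{ projectId: string; strategy: string } | null> {
  for (const strategy of strategies) {
    const projectId = await strategy.run(todo);
    if (projectId !== null) {
      // First match wins; later (more expensive) strategies never run.
      return { projectId, strategy: strategy.name };
    }
  }
  return null;
}
```

Ordering the list from cheapest to most expensive means the AI-based strategy is only reached when the link and source-page checks find nothing.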

Strategy 1: Link Mentions

If the todo text contains a link to a page that exists in your Projects database, the todo is linked to that project.

Example: A todo containing [[Project Alpha]] (a page mention) will be linked to "Project Alpha" if it exists in the Projects database.

Strategy 2: Source Page

If the todo block appears on a page that is itself a project (exists in the Projects database), the todo is linked to that project.

Example: A todo on the "Project Beta" page will be linked to "Project Beta" automatically.

Strategy 3: AI Fuzzy Matching with Date Disambiguation

If strategies 1-2 don't find a match, the system uses OpenAI (via Val Town's @std/openai) to match the todo text against project names and client names.

Initial AI Match:

Sends the todo text and list of projects (with client names) to OpenAI (gpt-4o-mini)
AI returns one of:
- A specific project ID (if confident match to a project name)
- CLIENT:ClientName (if matches a client but not a specific project)
- NONE (no match)
Conservative matching - only links when confident
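One way to interpret the three possible AI responses is shown below. This is a sketch under assumptions: the parseAiMatch name and AiMatch type are hypothetical, and treating any unrecognized answer as "no match" is a choice made here to stay in line with the conservative matching described above.

```typescript
// Hypothetical parser for the model's reply; not taken from the project source.
type AiMatch =
  | { kind: "project"; projectId: string }
  | { kind: "client"; clientName: string }
  | { kind: "none" };

function parseAiMatch(raw: string, knownProjectIds: string[]): AiMatch {
  const answer = raw.trim();

  // "CLIENT:ClientName": the model matched a client but not a specific project.
  if (answer.indexOf("CLIENT:") === 0) {
    const clientName = answer.slice("CLIENT:".length).trim();
    return clientName ? { kind: "client", clientName } : { kind: "none" };
  }

  // Only accept IDs that were actually offered to the model.
  if (knownProjectIds.indexOf(answer) !== -1) {
    return { kind: "project", projectId: answer };
  }

  // "NONE" or anything unexpected: do not link.
  return { kind: "none" };
}
```

Validating the returned ID against the list that was sent guards against the model inventing an ID.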

Date-Based Disambiguation:

When the AI returns a client match (or picks a specific project that has sibling projects for the same client), the system disambiguates using the todo's due date:

Single date match: If the due date falls within exactly one project's date range → use that project
Multiple overlapping dates: If the due date falls within multiple projects' date ranges → second AI call to pick the best semantic fit
No date match: If no project contains the due date → pick project with closest start/end date boundary
No dates on projects: Fall back to most recently edited project

Example - AI picks client, date disambiguates:

Todo: "Review Acme contract for @John due Dec 15"
AI returns: CLIENT:Acme

Projects for Acme:
- "Acme Website Redesign" (Nov 1 - Nov 30) ❌ Dec 15 not in range
- "Acme Q4 Campaign" (Dec 1 - Dec 31) ✅ Dec 15 in range

Result: Linked to "Acme Q4 Campaign"

Example - Overlapping dates, second AI call:

Todo: "this should not go to mission/vision due Nov 28"
AI returns: Dealfront Mission/Vision/Purpose (specific project)

Projects for Dealfront:
- "Dealfront Mission/Vision/Purpose" (Nov 28 - Nov 28) ✅ Nov 28 in range
- "Dealfront Roadmap Strategies" (Nov 18 - Dec 6) ✅ Nov 28 in range

Both overlap! Second AI call with just these 2 candidates:
AI picks: "Dealfront Roadmap Strategies" (better semantic fit based on todo text)

Result: Linked to "Dealfront Roadmap Strategies"
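The four disambiguation rules can be expressed as a single function that either resolves to one project or reports that a second AI call is needed. The Project shape and function name are assumptions for this sketch, as is the handling of mixed cases: if at least one candidate has dates, undated candidates are ignored.

```typescript
// Hypothetical project shape; field names are assumptions for illustration.
interface Project {
  id: string;
  name: string;
  start?: string;     // ISO date, e.g. "2025-12-01"
  end?: string;       // ISO date; a missing end is treated as a single-day range
  lastEdited: string; // ISO timestamp
}

type Disambiguation =
  | { kind: "resolved"; project: Project }
  | { kind: "needs_ai"; candidates: Project[] };

function disambiguateByDate(projects: Project[], dueDate: string): Disambiguation {
  if (projects.length === 0) throw new Error("at least one candidate project is required");
  const due = Date.parse(dueDate);
  const dated = projects.filter((p) => p.start !== undefined);

  // No dates on projects: fall back to the most recently edited one.
  if (dated.length === 0) {
    const latest = projects.reduce((a, b) =>
      Date.parse(b.lastEdited) > Date.parse(a.lastEdited) ? b : a
    );
    return { kind: "resolved", project: latest };
  }

  const range = (p: Project) => {
    const start = Date.parse(p.start as string);
    const end = Date.parse(p.end || (p.start as string));
    return { start, end };
  };

  const inRange = dated.filter((p) => {
    const r = range(p);
    return r.start <= due && due <= r.end;
  });
  // Single date match: use it. Multiple overlaps: defer to a second AI call.
  if (inRange.length === 1) return { kind: "resolved", project: inRange[0] };
  if (inRange.length > 1) return { kind: "needs_ai", candidates: inRange };

  // No date match: pick the project whose start or end boundary is closest.
  const distance = (p: Project) => {
    const r = range(p);
    return Math.min(Math.abs(due - r.start), Math.abs(due - r.end));
  };
  const closest = dated.reduce((a, b) => (distance(b) < distance(a) ? b : a));
  return { kind: "resolved", project: closest };
}
```

Applied to the two examples above (with a year added to the dates), the Acme case resolves directly to "Acme Q4 Campaign", while the Dealfront case returns needs_ai with both overlapping projects as candidates.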

Strategy 4: Parent Block Traversal

If strategies 1-3 don't find a match and the todo is nested under a parent block, the system traverses up the block tree and applies strategies 1-3 to each ancestor.

How it works:

Fetches the parent block from Notion
Applies strategies 1-3 to the parent's content (using the todo's due date for disambiguation)
If no match, moves to grandparent, up to 5 levels
Stops when a match is found or reaches page level

Example: A todo nested under a toggle "Project Gamma Tasks" might match to "Project Gamma" via the parent toggle's text.
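The traversal can be sketched as a bounded loop over ancestors. To keep the example self-contained, the Notion lookup and the strategy 1-3 matcher are passed in as functions; BlockNode, FetchBlock, MatchText, and matchViaAncestors are hypothetical names, and a null parentBlockId is assumed to mean the parent is the page itself.

```typescript
// Hypothetical shapes; not taken from the project source.
interface BlockNode {
  id: string;
  text: string;
  parentBlockId: string | null; // null when the parent is the page
}

type FetchBlock = (blockId: string) => Promise<BlockNode | null>;
type MatchText = (text: string) => Promise<string | null>; // project ID or null

const MAX_ANCESTOR_LEVELS = 5;

async function matchViaAncestors(
  todo: BlockNode,
  fetchBlock: FetchBlock,
  matchText: MatchText,
): Promise<string | null> {
  let parentId = todo.parentBlockId;
  for (let level = 0; level < MAX_ANCESTOR_LEVELS && parentId !== null; level++) {
    const parent = await fetchBlock(parentId);
    if (parent === null) return null; // ancestor could not be fetched

    // Apply strategies 1-3 to the ancestor's content.
    const projectId = await matchText(parent.text);
    if (projectId !== null) return projectId;

    // No match: move one level up (stops at page level or after 5 levels).
    parentId = parent.parentBlockId;
  }
  return null;
}
```

In the real system, matchText would close over the todo's due date so that date disambiguation still uses the todo's date rather than anything on the ancestor.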

Configuration

To enable project matching, set the PROJECTS_DB_ID environment variable to your Projects database ID. Without this, todos sync normally but without project links.

Required database properties:

Your Todos database needs a Projects db relation property pointing to your Projects database
Your Projects database should have:
- Clients relation - Links to a Clients database (enhances AI matching)
- Dates property (date with start/end) - Enables date-based disambiguation

OpenAI Integration:

Uses Val Town's built-in OpenAI integration (@std/openai)
Model: gpt-4o-mini (fast, cost-effective)
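Up to two AI calls per matching attempt:
1. Initial match (whenever strategies 1-2 fail)
2. Disambiguation (only when multiple projects have overlapping date ranges)
Strategy 4 repeats the attempt for each ancestor block it checks, so a nested todo can trigger additional calls.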

Getting Started

Prerequisites

Create a Notion integration at https://www.notion.so/my-integrations
Create a Notion database with these properties:
- Name (title)
- Block ID (rich_text)
- Block URL (url)
- Page URL (url) - source page where todo was found
- Todo last edited time (date)
- Owner (people)
- Other people (people)
- Due date (date)
- Links (rich_text)
- Projects db (relation) - optional, links to Projects database
- Status (select) - New pages will be set to "Not started"
Share the database with your integration

Setup

Fork this val in Val Town
Set environment variables:
- NOTION_API_KEY = your integration token
- TODOS_DB_ID = your database ID
- Choose a search mode:
  - Keyword mode: SEARCH_KEYWORDS = todo (or your preferred keywords, comma-separated)
  - Block type mode: SEARCH_BLOCK_TYPE = to_do (or your preferred block type)
Test with: POST /tasks/todos?hours=1

Usage Examples

Keyword Mode Examples

Find and sync all keyword matches from the last 24 hours:

# Set: SEARCH_KEYWORDS=todo
curl -X POST https://your-val.express/tasks/todos

Custom time window:

# Set: SEARCH_KEYWORDS=todo
curl -X POST "https://your-val.express/tasks/todos?hours=48"

Search for multiple keywords:

# Set: SEARCH_KEYWORDS=todo,zinger,🍋
curl -X POST https://your-val.express/tasks/todos

Block Type Mode Examples

Find and sync all checkbox todos from the last 24 hours:

# Set: SEARCH_BLOCK_TYPE=to_do
curl -X POST https://your-val.express/tasks/todos

Find all bullet points with people + dates:

# Set: SEARCH_BLOCK_TYPE=bulleted_list_item
curl -X POST https://your-val.express/tasks/todos

Other Examples

Get recent pages (API):

curl "https://your-val.express/api/pages/recent?hours=12"

Development Guidelines

For project-specific architecture: See CLAUDE.md
For Val Town platform guidelines: See AGENTS.md

Tech Stack

Runtime: Deno on Val Town
Framework: Hono (lightweight web framework)
Frontend: React 18.2.0 with Pico CSS (classless CSS framework)
APIs:
- Notion API (@notionhq/client v2)
- Val Town blob storage
Language: TypeScript

Architecture Diagrams

Complete System Flow

┌─────────────────────────────────────────────────────────────┐
│                     Notion Workspace                        │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐                 │
│  │  Page A  │  │  Page B  │  │  Page C  │                 │
│  │  "todo"  │  │  "todo"  │  │          │                 │
│  └──────────┘  └──────────┘  └──────────┘                 │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
         ┌─────────────────────────────────────┐
         │  POST /tasks/todos?keyword=todo     │
         │  (Batch Search & Sync Endpoint)     │
         └─────────────────┬───────────────────┘
                           │
        ┌──────────────────┴──────────────────┐
        │                                     │
        ▼                                     ▼
┌───────────────────┐              ┌──────────────────────────┐
│  Step 1: Search   │              │  Step 3: Sync (Optimized)│
│                   │              │                          │
│ • Get recent pages│              │ • Read blobs             │
│ • Fetch all blocks│              │ • Skip if synced: true   │
│ • Search keywords │              │ • Use cached page ID     │
│ • Extract data    │              │ • Create/update pages    │
│ • Validate        │              │ • Mark synced: true      │
└─────────┬─────────┘              └──────────────────────────┘
          │                                   ▲
          ▼                                   │
┌─────────────────────┐                      │
│  Step 2: Store      │                      │
│                     │                      │
│ • Save to blobs     │──────────────────────┘
│ • Set synced: false │
│ • Compare timestamps│
│ • Preserve page ID  │
└─────────────────────┘
          │
          ▼
┌─────────────────────────────────────────────────────────────────┐
│                   Val Town Blob Storage                          │
│                                                                  │
│  demo--todo--block-1: { data, sync_metadata: {synced, page_id} }│
│  demo--todo--block-2: { data, sync_metadata: {synced, page_id} }│
│  demo--todo--block-3: { data, sync_metadata: {synced, page_id} }│
└─────────────────────────────────────────────────────────────────┘

MVC Layer Interaction

┌──────────────────────────────────────────────────────────────┐
│                       HTTP Request                           │
│  POST /tasks/todos?hours=24&keyword=todo                     │
└──────────────────────┬───────────────────────────────────────┘
                       │
                       ▼
         ┌─────────────────────────────┐
         │   ROUTE (todos.ts)          │
         │   • Extract query params    │
         │   • Call controller         │
         │   • Format HTTP response    │
         └─────────────┬───────────────┘
                       │
                       ▼
         ┌─────────────────────────────────────┐
         │   CONTROLLER (orchestration)        │
         │   • Validate inputs                 │
         │   • Orchestrate workflow:           │
         │     1. Get recent pages             │
         │     2. Search each page             │
         │     3. Sync to database             │
         │   • Return standardized result      │
         └──────────────┬──────────────────────┘
                        │
            ┌───────────┼───────────┐
            │           │           │
            ▼           ▼           ▼
    ┌───────────┐ ┌──────────┐ ┌────────────┐
    │ SERVICE:  │ │ SERVICE: │ │ SERVICE:   │
    │ pages.ts  │ │ blob.ts  │ │databases.ts│
    │           │ │          │ │            │
    │ • API call│ │ • Blob   │ │ • Query DB │
    │ • Parse   │ │   CRUD   │ │ • Create   │
    │ • Return  │ │ • Return │ │ • Update   │
    └─────┬─────┘ └────┬─────┘ └─────┬──────┘
          │            │             │
          ▼            ▼             ▼
    Notion API   Blob Storage   Notion API

License

MIT
