Notion Keyword Search & Sync System

A Val Town application that searches Notion pages for keywords (like "todo"), extracts structured data, stores it in blob storage, and syncs it back to a Notion database with intelligent filtering and validation.

Overview

This system enables automatic extraction and organization of action items from Notion pages:

Search: Scans recent Notion pages for configurable keywords
Extract: Captures block content including mentions and dates
Store: Saves to Val Town blob storage with timestamp tracking and sync metadata
Optimize: Skips already-synced items to reduce API calls by 90%+
Sync: Updates a Notion database with validated items
Filter: Only syncs items with required metadata (owner + due date)

Project Structure

├── backend/
│   ├── controllers/         # Business logic
│   │   ├── pageController.ts           # Page operations
│   │   ├── todoController.ts           # Keyword search logic
│   │   ├── todoSaveController.ts       # Blob → Notion sync
│   │   └── todoOrchestrationController.ts  # Batch workflow
│   ├── crons/              # Time-based triggers
│   │   ├── todoSearch.cron.ts  # Periodic keyword search
│   │   └── todoSync.cron.ts    # Periodic database sync
│   ├── routes/             # HTTP handlers
│   │   ├── api/            # API endpoints
│   │   │   └── pages.ts    # Recent pages API
│   │   └── tasks/          # Task automation endpoints
│   │       ├── todoSearch.ts # Single page search webhook
│   │       ├── todoSave.ts # Blob sync webhook
│   │       └── todos.ts    # Batch search & sync
│   ├── services/           # External API integrations
│   │   ├── notion/         # Notion API wrapper
│   │   │   ├── index.ts    # Client initialization
│   │   │   ├── pages.ts    # Page operations
│   │   │   ├── databases.ts # Database operations
│   │   │   └── search.ts   # Search operations
│   │   └── blobService.ts  # Val Town blob storage
│   └── utils/              # Utility functions
│       ├── notionUtils.ts  # Block transformation
│       ├── blobUtils.ts    # Blob key parsing
│       └── emojiUtils.ts   # Emoji extraction
├── frontend/               # React frontend
├── shared/                 # Shared types and utilities
│   ├── types.ts            # TypeScript interfaces
│   └── utils.ts            # Shared utility functions
├── main.http.tsx           # Application entry point (Hono)
├── CLAUDE.md               # Development guidelines
└── AGENTS.md               # Val Town platform guidelines

MVC Architecture

This application follows a strict 3-layer MVC architecture with clear separation of concerns:

Request → Route → Controller → Service → External API
                      ↓
Response ← Format ← Standard Response ← Result

Layer 1: Routes (`backend/routes/`)

Responsibility: HTTP handling only

Extract request parameters (query, body, headers)
Call controller functions
Format responses with appropriate HTTP status codes
Never contain business logic

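// Example: backend/routes/tasks/todos.ts
app.post('/', async (c) => {
  // Parse query parameters; hours falls back to 24 when omitted
  const hours = Number(c.req.query('hours') ?? 24);
  const keyword = c.req.query('keyword') || undefined;
  // Delegate all business logic to the controller
  const result = await todoOrchestrationController.processBatchTodos(hours, keyword);
  return c.json(result, 200);
});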

Layer 2: Controllers (`backend/controllers/`)

Responsibility: Business logic and orchestration

Validate input data
Orchestrate multiple service calls
Transform and filter data
Return standardized response format: {success, data, error, details?}
Never make direct HTTP calls to external APIs

// Example: backend/controllers/todoController.ts
export async function processTodoSearch(pageId: string, keyword: string = 'todo') {
  // Validation
  if (!pageId) return { success: false, data: null, error: "Invalid pageId" };

  // Call service layer
  const blocks = await notionService.getPageBlocksRecursive(pageId);

  // Business logic
  const matches = blocks.filter(block => searchBlockForKeyword(block, keyword));

  return { success: true, data: matches, error: null };
}

Layer 3: Services (`backend/services/`)

Responsibility: External API calls only

Make HTTP requests to external APIs
Handle API authentication
Parse and normalize API responses
Return structured results: {success, data, error}
Never contain business logic

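// Example: backend/services/notion/pages.ts
// (simplified: pagination, recursion into children, and the {success, data, error} wrapper are omitted)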
export async function getPageBlocksRecursive(blockId: string) {
  const response = await notion.blocks.children.list({ block_id: blockId });
  return response.results;
}

Golden Rule: Never skip layers! Routes call controllers, controllers call services. This ensures testability, maintainability, and clear separation of concerns.

Keyword Search Workflow

The system follows a three-stage pipeline: Notion → Blob Storage → Notion Database

Stage 1: Search & Extract (Notion → Blobs)

Flow:

Get recent pages from Notion (configurable time window)
For each page, recursively fetch all blocks
Search blocks for keyword matches
Extract structured data from matching blocks
Validate: Check for required fields (people_mentions AND date_mentions)
Save valid blocks to blob storage with timestamp

Validation (happens here, not during sync):

✅ Block must have at least one people_mention (for Owner)
✅ Block must have at least one date_mention (for Due date)
❌ Blocks without both criteria are skipped (not saved to blob storage)
Result: All blobs in storage are guaranteed valid and ready to sync

Endpoints:

POST /tasks/todo/search - Single page search (webhook-triggered)
POST /tasks/todos?hours=24 - Batch search across recent pages

Keywords Configuration:

Set via SEARCH_KEYWORDS environment variable (comma-separated)
Example: SEARCH_KEYWORDS=todo,zinger,bit,😀
Defaults to todo if not set
All keywords searched in single pass through blocks (efficient)
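
A minimal sketch of how the comma-separated value could be turned into a keyword list. The helper name `parseKeywords` is hypothetical and not taken from the codebase; trimming whitespace and dropping empty entries are assumptions about reasonable behavior.

```typescript
// Hypothetical helper: converts the raw SEARCH_KEYWORDS value into a keyword list.
function parseKeywords(raw: string | undefined): string[] {
  const keywords = (raw ?? "")
    .split(",")
    .map((k) => k.trim())          // tolerate "todo, bit" style spacing (assumption)
    .filter((k) => k.length > 0);  // ignore empty entries such as "todo,,bit"
  // Fall back to the documented default when nothing usable is configured.
  return keywords.length > 0 ? keywords : ["todo"];
}
```

On Val Town the raw value would typically come from `Deno.env.get("SEARCH_KEYWORDS")`.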

Keyword Matching Logic:

Text keywords (e.g., "todo", "bit", "steel"):
- Case-insensitive
- Word boundary matching (finds "todo" but not "todoist")
- Uses regex: /\btodo\b/i
Emojis (e.g., "😀", "🎉"):
- Exact match
- Case-sensitivity N/A
Multi-keyword: Block saved if it matches ANY keyword
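
The matching rules above can be sketched as follows. The function names are illustrative, and the way a keyword is classified as "text" (it contains an ASCII letter or digit) is an assumption; the actual code may distinguish emojis differently.

```typescript
// Escape regex metacharacters so keywords containing "." or "?" are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Assumption: a keyword with at least one ASCII letter or digit is a text keyword;
// anything else (for example an emoji) is matched as an exact substring.
function matchesKeyword(text: string, keyword: string): boolean {
  const isTextKeyword = /[A-Za-z0-9]/.test(keyword);
  if (!isTextKeyword) return text.includes(keyword);
  // Case-insensitive, word-boundary match: finds "todo" but not "todoist".
  return new RegExp(`\\b${escapeRegExp(keyword)}\\b`, "i").test(text);
}

// A block qualifies if it matches ANY configured keyword.
function matchesAnyKeyword(text: string, keywords: string[]): boolean {
  return keywords.some((k) => matchesKeyword(text, k));
}
```

For example, `matchesAnyKeyword("open todoist", ["todo"])` is false, while `matchesAnyKeyword("TODO: call Bob", ["todo"])` is true.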

Block Extraction:

When a keyword is found, the system extracts and transforms the block into a reduced format:

{
  full_sentence: "Buy groceries for @John due October 30, 2025",
  block_id: "abc-123-def-456",
  block_url: "https://www.notion.so/abc123def456",
  last_edited_time: "2025-10-29T12:00:00.000Z",
  people_mentions: [{ id: "user-123", name: "John", email: "john@example.com" }],
  date_mentions: ["2025-10-30"],
  link_mentions: [{ text: "Project", url: "/page-id" }],
  sync_metadata: {
    synced: false,  // Needs sync to Notion database
    notion_page_id: undefined  // Will be set after first sync
  }
}

Transformation Details:

Dates: Formatted to human-readable (e.g., "October 30, 2025 at 3:00 PM EDT")
Original dates preserved: ISO format kept in date_mentions array for Notion API
Block URL: Clickable link to original block location
Emojis: Extracted for use as page icons

Stage 2: Blob Storage

Storage Format:

Key pattern: {projectName}--{category}--{blockId}
Example: demo--todo--abc-123-def-456
Content: JSON of reduced block structure with sync metadata
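
A small sketch of building and parsing keys in this pattern. The names `buildBlobKey` and `parseBlobKey` are hypothetical (the real helpers live in `blobUtils.ts` and may look different). Because block IDs contain single hyphens, the parser only splits on the first two `--` separators.

```typescript
interface BlobKeyParts {
  projectName: string;
  category: string;
  blockId: string;
}

const BLOB_KEY_SEPARATOR = "--";

function buildBlobKey(parts: BlobKeyParts): string {
  return [parts.projectName, parts.category, parts.blockId].join(BLOB_KEY_SEPARATOR);
}

// Returns null when the key does not contain both separators.
function parseBlobKey(key: string): BlobKeyParts | null {
  const first = key.indexOf(BLOB_KEY_SEPARATOR);
  if (first < 0) return null;
  const second = key.indexOf(BLOB_KEY_SEPARATOR, first + BLOB_KEY_SEPARATOR.length);
  if (second < 0) return null;
  return {
    projectName: key.slice(0, first),
    category: key.slice(first + BLOB_KEY_SEPARATOR.length, second),
    // Everything after the second separator is the block ID, hyphens included.
    blockId: key.slice(second + BLOB_KEY_SEPARATOR.length),
  };
}
```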

Blob Structure:

{
  full_sentence: "...",
  block_id: "...",
  // ... other properties
  sync_metadata: {
    synced: boolean,           // true = synced to Notion, false = needs sync
    notion_page_id?: string    // Cached Notion page ID (optimization)
  }
}

Update Logic:

Compare last_edited_time of existing blob vs new block
If unchanged: Skip save (preserves synced: true status)
If changed: Save with synced: false (triggers re-sync)
Preserve cached notion_page_id across updates
Prevents data loss from out-of-order processing
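
The update rules can be sketched as a pure decision function. The name `planBlobSave` and the trimmed-down types are hypothetical. Treating an older incoming timestamp the same as an unchanged one is an assumption about how out-of-order processing is handled.

```typescript
interface SyncMetadata {
  synced: boolean;
  notion_page_id?: string;
}

interface StoredBlock {
  block_id: string;
  last_edited_time: string; // ISO 8601
  sync_metadata: SyncMetadata;
}

type IncomingBlock = Omit<StoredBlock, "sync_metadata">;

// Returns the blob to write, or null when the save should be skipped.
function planBlobSave(existing: StoredBlock | null, incoming: IncomingBlock): StoredBlock | null {
  if (existing) {
    const storedTime = Date.parse(existing.last_edited_time);
    const incomingTime = Date.parse(incoming.last_edited_time);
    // Unchanged (or older, assumed to mean out-of-order processing): keep the
    // stored blob untouched so an existing synced: true flag survives.
    if (incomingTime <= storedTime) return null;
  }
  // New or changed block: mark for re-sync, carrying over any cached page ID.
  const cachedPageId = existing?.sync_metadata.notion_page_id;
  return {
    ...incoming,
    sync_metadata: cachedPageId
      ? { synced: false, notion_page_id: cachedPageId }
      : { synced: false },
  };
}
```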

Stage 3: Sync to Notion Database (Blobs → Notion)

Flow:

List all blobs in "todo" category
For each blob, read reduced block data
Optimization: Skip if synced: true (0 API calls)
Optimization: Use cached notion_page_id if available (1 API call - update only)
If no cached ID: Query database for existing page by Block ID
Create new page OR update existing page
Mark blob as synced: true and cache page ID

Note: No validation happens during sync - all blobs are guaranteed valid because validation occurs during the search phase (Stage 1).
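
The per-blob decision in this flow can be sketched as a small pure function. `planSyncAction` and `markSynced` are illustrative names, not functions from the codebase; the actual Notion calls are left to the service layer.

```typescript
interface SyncMetadata {
  synced: boolean;
  notion_page_id?: string;
}

type SyncAction =
  | { kind: "skip" }                      // already synced: 0 API calls
  | { kind: "update"; pageId: string }    // cached page ID: 1 API call
  | { kind: "query" };                    // look up by Block ID, then create or update

function planSyncAction(meta: SyncMetadata): SyncAction {
  if (meta.synced) return { kind: "skip" };
  if (meta.notion_page_id) return { kind: "update", pageId: meta.notion_page_id };
  return { kind: "query" };
}

// Metadata to write back to the blob after a successful create or update.
function markSynced(pageId: string): SyncMetadata {
  return { synced: true, notion_page_id: pageId };
}
```

Keeping the decision separate from the API calls makes it straightforward to unit test without a Notion workspace.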

Endpoints:

POST /tasks/todo/save - Sync all blobs to database (webhook-triggered)
POST /tasks/todos - Batch workflow (search + sync in one call)

Property Mappings (Blob → Notion Database):

full_sentence      → Name (title)
block_id           → Block ID (rich_text)
block_url          → Block URL (url)
last_edited_time   → Todo last edited time (date)
people_mentions[0] → Owner (people)
people_mentions[1..] → Other people (people)
date_mentions[0]   → Due date (date)
link_mentions      → Links (rich_text, bullet list)
emoji (if found)   → Page icon
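
As a sketch, the mapping above could produce a Notion `properties` payload like the one below. The function name `toNotionProperties` is hypothetical, and rendering links as "• text: url" lines is an assumption about the bullet list format. The page icon is set outside `properties` and is not covered here.

```typescript
interface PersonMention { id: string; name?: string; email?: string }
interface LinkMention { text: string; url: string }

interface ReducedBlock {
  full_sentence: string;
  block_id: string;
  block_url: string;
  last_edited_time: string;
  people_mentions: PersonMention[];
  date_mentions: string[];
  link_mentions: LinkMention[];
}

function toNotionProperties(block: ReducedBlock): Record<string, unknown> {
  const [owner, ...others] = block.people_mentions;
  const dueDate = block.date_mentions[0];
  // Assumed rendering: one bullet line per link.
  const links = block.link_mentions.map((l) => `• ${l.text}: ${l.url}`).join("\n");
  return {
    "Name": { title: [{ text: { content: block.full_sentence } }] },
    "Block ID": { rich_text: [{ text: { content: block.block_id } }] },
    "Block URL": { url: block.block_url },
    "Todo last edited time": { date: { start: block.last_edited_time } },
    "Owner": { people: owner ? [{ id: owner.id }] : [] },
    "Other people": { people: others.map((p) => ({ id: p.id })) },
    "Due date": { date: dueDate ? { start: dueDate } : null },
    "Links": { rich_text: links ? [{ text: { content: links } }] : [] },
  };
}
```

The property names must match the database columns listed under Prerequisites exactly, including capitalization.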

Sync Optimization:

The system uses sync metadata to dramatically reduce Notion API calls:

On first sync:

Blob has synced: false, no notion_page_id
Query database → create or update → cache page ID
Mark synced: true
API calls: 1 query + 1 create/update = 2 calls

On subsequent syncs (no changes):

Blob has synced: true
Skip immediately
API calls: 0 calls (100% reduction)

On subsequent syncs (block changed):

Blob saved with synced: false (block edited in Notion)
Has cached notion_page_id from previous sync
Update directly without query
Mark synced: true
API calls: 1 update (50% reduction)

Performance impact:

Before optimization: 100 blobs = 100 queries + 50 updates (for the 50 blobs that need a write) = 150 API calls
After optimization: 90 synced + 10 changed = 0 + 10 updates = 10 API calls (93% reduction)
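
The percentages quoted in this section follow from simple arithmetic, worked through here with the figures from the text. `reductionPercent` is an illustrative helper, not part of the codebase.

```typescript
// Percentage reduction in API calls, rounded to the nearest whole percent.
function reductionPercent(before: number, after: number): number {
  if (before <= 0) throw new Error("before must be positive");
  return Math.round((1 - after / before) * 100);
}

// Figures from the text above.
const callsBefore = 100 /* queries */ + 50 /* updates */;                      // 150
const callsAfter = 90 * 0 /* synced, skipped */ + 10 * 1 /* cached updates */; // 10

console.log(reductionPercent(callsBefore, callsAfter)); // 93
console.log(reductionPercent(2, 1)); // 50  (changed block with cached page ID)
console.log(reductionPercent(2, 0)); // 100 (already synced)
```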

Block Type Handling

The search uses recursive block fetching to traverse the entire page hierarchy, including nested content.

Recursive Fetching

How it works:

function getPageBlocksRecursive(blockId) {
  1. Fetch immediate children of blockId
  2. For each child:
     - Add child to results
     - If child.has_children === true:
       - Recursively fetch child's children
       - Add to results
  3. Return flattened array of all blocks
}
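
A runnable sketch of the same traversal follows. It is an assumption-laden illustration rather than the project's implementation: the fetcher is injected as `fetchChildren` (a hypothetical parameter) so the logic can be exercised without a Notion client, and pagination via `has_more` / `next_cursor` is handled explicitly.

```typescript
interface BlockLike {
  id: string;
  has_children: boolean;
}

interface ChildrenPage<T> {
  results: T[];
  next_cursor: string | null;
  has_more: boolean;
}

type FetchChildren<T> = (blockId: string, cursor?: string) => Promise<ChildrenPage<T>>;

// Depth-first traversal: each block is followed immediately by its descendants.
async function collectBlocksRecursive<T extends BlockLike>(
  blockId: string,
  fetchChildren: FetchChildren<T>,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined;
  do {
    // 1. Fetch one page of immediate children.
    const page = await fetchChildren(blockId, cursor);
    for (const child of page.results) {
      all.push(child); // 2. Add the child itself
      if (child.has_children) {
        // Recurse into nested content (toggles, columns, lists, ...)
        all.push(...(await collectBlocksRecursive(child.id, fetchChildren)));
      }
    }
    cursor = page.has_more && page.next_cursor ? page.next_cursor : undefined;
  } while (cursor);
  return all; // 3. Flattened array of all blocks
}
```

With `@notionhq/client`, `fetchChildren` could wrap `notion.blocks.children.list({ block_id, start_cursor })`, whose response carries the same `results`, `next_cursor`, and `has_more` fields.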

What this means:

✅ Finds blocks nested inside toggles
✅ Finds blocks nested inside columns
✅ Finds blocks nested inside lists
✅ Finds blocks nested N levels deep

Included Block Types

These block types are searched for keywords:

Block Type	Has rich_text?	Notes
`paragraph`	✅	Standard text blocks
`heading_1`, `heading_2`, `heading_3`	✅	All heading levels
`bulleted_list_item`	✅	Bullet lists
`numbered_list_item`	✅	Numbered lists
`to_do`	✅	Checkbox items
`toggle`	✅	Collapsible toggles
`quote`	✅	Quote blocks
`callout`	✅	Callout/alert blocks
`code`	✅	Code blocks (captions only)
`column`	N/A	Container - children are searched
`column_list`	N/A	Container - children are searched

Column Behavior:

Column blocks themselves have no searchable text
But their children (paragraphs, lists, etc.) ARE searched
Example: A todo in a column will be found

Excluded Block Types

These block types are explicitly skipped:

Block Type	Reason
`unsupported`	Not supported by Notion API
`button`	Action buttons, not content
`table`	Container block, no text content
`table_row`	Cells aren't individual blocks; can't be saved to blob
`child_page`	Page title not in rich_text format
`child_database`	Database title not in rich_text format
`divider`	No text content
`table_of_contents`	No text content
`breadcrumb`	No text content
`image`, `file`, `video`, `pdf`	Media blocks (captions could be added later)
`bookmark`, `embed`	External content (could be added later)

Why tables are excluded:

Table content lives in table_row.cells[][] (array of arrays)
Cells contain rich_text but aren't individual blocks
Can't be saved to blob storage as standalone blocks
Can't create Notion pages from cell content
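
The include/exclude rules can be sketched as a lookup plus a text extractor. The names below are hypothetical. Following the "captions only" note in the table, code blocks are read from `caption` rather than `rich_text`; that reading of the note is an assumption.

```typescript
// Block types whose text is searched (from the "Included Block Types" table).
const SEARCHABLE_TYPES = new Set<string>([
  "paragraph", "heading_1", "heading_2", "heading_3",
  "bulleted_list_item", "numbered_list_item", "to_do",
  "toggle", "quote", "callout", "code",
]);

interface RichTextItem { plain_text: string }

interface NotionBlock {
  id: string;
  type: string;
  [key: string]: unknown; // type-specific payload lives under block[block.type]
}

// Returns the block's searchable text, or null for containers and excluded types.
function getSearchableText(block: NotionBlock): string | null {
  if (!SEARCHABLE_TYPES.has(block.type)) return null;
  const payload = block[block.type] as
    | { rich_text?: RichTextItem[]; caption?: RichTextItem[] }
    | undefined;
  const items = block.type === "code" ? payload?.caption : payload?.rich_text;
  return (items ?? []).map((item) => item.plain_text).join("");
}
```

Containers such as `column` return null here, but their children are still reached by the recursive fetch and checked individually.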

Validation Rules

Not all blocks with keywords are saved to blob storage. Validation ensures data quality and storage efficiency.

Required Fields for Blob Storage

A block will ONLY be saved to blob storage if it has:

✅ At least one person mention (people_mentions.length > 0)
- First person becomes "Owner"
- Additional people become "Other people"
✅ At least one date mention (date_mentions.length > 0)
- First date becomes "Due date"

Without both: Block is skipped entirely (not saved, not synced).

When Validation Happens

During Search (Stage 1) - todoController.ts:

After keyword match is found
After block is transformed to reduced format
Before saving to blob storage

Not During Sync (Stage 3) - All blobs are guaranteed valid, no checking needed

Validation Logic

// From todoController.ts (search phase)
if (!reducedBlock.people_mentions || reducedBlock.people_mentions.length === 0) {
  console.log(`○ Skipped: no people mentions (no owner)`);
  continue;  // Don't save to blob
}

if (!reducedBlock.date_mentions || reducedBlock.date_mentions.length === 0) {
  console.log(`○ Skipped: no date mentions (no due date)`);
  continue;  // Don't save to blob
}

// Only valid blocks reach this point and get saved to blob storage
await blobService.saveBlockToBlob('todo', block.id, blobData);

Examples

Valid - Will save and sync:

"Buy groceries for @John due October 30, 2025"
✅ Has person mention (@John)
✅ Has date mention (October 30, 2025)
→ Saved to blob storage
→ Synced to Notion database

Invalid - Will NOT save:

"Buy groceries due October 30, 2025"
✅ Has date mention
❌ Missing person mention
→ NOT saved to blob storage (skipped during search)

"Buy groceries for @John"
✅ Has person mention
❌ Missing date mention
→ NOT saved to blob storage (skipped during search)

Sync Summary

After syncing, the controller reports:

Total blobs: All blobs in storage (all guaranteed valid)
Pages created: New pages added to database
Pages updated: Existing pages updated
Pages skipped: Blobs already synced (synced: true)
Pages failed: Errors during create/update

Note: All blobs meet validation criteria - validation happens during search, not sync.

Endpoints

API Endpoints

GET /api/pages/recent?hours=24

Get pages edited in last N hours
Filters out archived pages and pages in TODOS_DB_ID database
Returns simplified page objects with parent information

Response:

{
  "pages": [
    {
      "id": "page-id",
      "object": "page",
      "title": "My Page",
      "url": "https://notion.so/...",
      "last_edited_time": "2025-10-29T12:00:00.000Z",
      "parent": { "type": "page_id", "id": "parent-id" }
    }
  ],
  "count": 1,
  "timeRange": "24 hours"
}
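
A sketch of the filtering this endpoint describes, using the simplified page shape from the response above. The function and type names are hypothetical. IDs are normalized by stripping hyphens because `TODOS_DB_ID` is stored without them; the `archived` flag and a `database_id` parent type are assumptions about the data available at this point.

```typescript
interface PageSummary {
  id: string;
  archived: boolean;
  last_edited_time: string; // ISO 8601
  parent: { type: string; id?: string };
}

// Compare IDs regardless of hyphenation or letter case.
function normalizeId(id: string): string {
  return id.replace(/-/g, "").toLowerCase();
}

function filterRecentPages(
  pages: PageSummary[],
  hours: number,
  todosDbId: string,
  now: Date = new Date(),
): PageSummary[] {
  const cutoff = now.getTime() - hours * 60 * 60 * 1000;
  return pages.filter((page) => {
    if (page.archived) return false;
    if (Date.parse(page.last_edited_time) < cutoff) return false;
    // Exclude rows of the todo database itself so synced items are not re-searched.
    const inTodosDb =
      page.parent.type === "database_id" &&
      page.parent.id !== undefined &&
      normalizeId(page.parent.id) === normalizeId(todosDbId);
    return !inTodosDb;
  });
}
```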

Task Endpoints

POST /tasks/todo/search

Search single page for keywords (webhook-triggered)
Keywords from SEARCH_KEYWORDS env var (comma-separated)
Extracts and saves matching blocks to blobs
Body: { "page_id": "abc-123" }

POST /tasks/todo/save

Sync all blobs to Notion database
Creates or updates pages (no validation here; blobs are validated during search)
No request body needed

POST /tasks/todos?hours=24

Batch workflow: Search recent pages + sync to database
Keywords from SEARCH_KEYWORDS env var
Combines search and save in one call
Use for manual triggers or cron jobs

Response:

{
  "success": true,
  "pagesSearched": 5,
  "totalTodosFound": 12,
  "searchResults": [
    {
      "pageId": "abc-123",
      "pageTitle": "My Page",
      "success": true,
      "blocksFound": 3,
      "blockIds": ["block-1", "block-2", "block-3"]
    }
  ],
  "saveResult": {
    "totalBlobs": 12,
    "pagesCreated": 5,
    "pagesUpdated": 3,
    "pagesSkipped": 4,
    "pagesFailed": 0
  }
}

Cron Jobs

The system includes two separate cron jobs for automated workflow execution. Crons are time-based triggers that run independently of HTTP requests.

Architecture

Crons live in backend/crons/ and follow the same MVC pattern as HTTP routes:

Cron Trigger → Controller → Service → External API

Key differences from HTTP routes:

Triggered by time intervals (not HTTP requests)
No request/response cycle
Results logged to console only
Deployed as separate Val Town cron vals with a .cron.tsx extension (the .cron.ts files in backend/crons/ are the templates)

Cron 1: Todo Search (`todoSearch.cron.ts`)

Purpose: Search recent pages for keywords and save matches to blob storage

Workflow:

Get recent pages from Notion (last 6 hours)
Search each page for "todo" keyword
Save matching blocks to Val Town blob storage
Does NOT sync to Notion database

Configuration:

Lookback window: 6 hours (hardcoded)
Keyword: "todo" (hardcoded default)
Recommended schedule: Every 4 hours
- 6 hour lookback provides 2 hour buffer for overlap
- Ensures no pages are missed

Output:

=== Cron: Todo Search Started ===
Timestamp: 2025-10-29T12:00:00.000Z

Cron: Search complete - Found 12 todos in 5 pages
Pages with matches:
  - Project Planning: 3 match(es)
  - Meeting Notes: 5 match(es)
  - Weekly Review: 4 match(es)

=== Cron: Todo Search Complete ===

Cron 2: Todo Sync (`todoSync.cron.ts`)

Purpose: Sync validated todo blobs to Notion database

Workflow:

Read all todo blobs from Val Town blob storage
Skip blobs already marked synced: true (no validation here; blobs are validated during search)
Query Notion database for existing pages by Block ID
Create new pages or update existing pages (timestamp-based)

Configuration:

No parameters: Processes all blobs in storage
Recommended schedule: Every 8-12 hours
- Less frequent than search cron
- Allows time for blob accumulation
- Reduces Notion API calls

Output:

=== Cron: Todo Sync Started ===
Timestamp: 2025-10-29T14:00:00.000Z

Cron: Sync complete
Summary:
  Total blobs processed: 12
  Pages created: 5
  Pages updated: 3
  Pages skipped: 4
  Pages failed: 0

=== Cron: Todo Sync Complete ===

Why Two Separate Crons?

Operational flexibility:

Search cron runs frequently to capture changes quickly
Sync cron runs less frequently to batch database updates
Reduces Notion API rate limit concerns
Allows manual triggering of sync independently

Fault isolation:

Search failures don't block syncing existing blobs
Sync failures don't block new searches
Each cron can be debugged independently

Cost optimization:

Blob storage is cheap and fast
Notion API calls are rate-limited
Separate crons allow different schedules for different costs

Setting Up Crons in Val Town

Navigate to Val Town UI
Create new cron vals:
- todoSearch.cron.tsx - Copy content from backend/crons/todoSearch.cron.ts
- todoSync.cron.tsx - Copy content from backend/crons/todoSync.cron.ts
Set schedules:
- todoSearch.cron.tsx: Every 4 hours (0 */4 * * *)
- todoSync.cron.tsx: Every 8 hours (0 */8 * * *)
Monitor logs: Check Val Town console for cron execution results

Note: Val Town cron jobs must be separate vals (not files in this project). The files in backend/crons/ serve as templates to copy into Val Town cron vals.

Environment Variables

Required environment variables (set in Val Town):

NOTION_API_KEY - Notion integration token (required)
- Get from: https://www.notion.so/my-integrations
- Required for all Notion API calls
TODOS_DB_ID - Notion database ID for todo sync (required)
- The database where keyword matches are synced
- Format: abc123def456... (32-character ID without hyphens)
SEARCH_KEYWORDS - Keywords to search for (optional)
- Comma-separated list of keywords/phrases
- Example: todo,zinger,bit or todo,😀,🎉
- Defaults to todo if not set
- All blocks matching ANY keyword will be saved to blob storage
- Efficient: Searches all keywords in a single pass through blocks
API_KEY - Optional API key for authentication
- Used by authCheck middleware for protected endpoints
NOTION_WEBHOOK_SECRET - Optional webhook signature verification
- Used to verify Notion webhook authenticity
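
A minimal sketch of loading and checking these variables at startup. `loadConfig` and `AppConfig` are hypothetical names; the environment is passed in as a plain object so the function stays testable outside Val Town.

```typescript
interface AppConfig {
  notionApiKey: string;
  todosDbId: string;
  searchKeywords: string;               // raw comma-separated value
  apiKey: string | undefined;           // optional
  notionWebhookSecret: string | undefined; // optional
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const notionApiKey = env["NOTION_API_KEY"];
  const todosDbId = env["TODOS_DB_ID"];

  // Fail fast with a message that names every missing required variable.
  const missing: string[] = [];
  if (!notionApiKey) missing.push("NOTION_API_KEY");
  if (!todosDbId) missing.push("TODOS_DB_ID");
  if (!notionApiKey || !todosDbId) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }

  return {
    notionApiKey,
    todosDbId,
    searchKeywords: env["SEARCH_KEYWORDS"] || "todo", // documented default
    apiKey: env["API_KEY"],
    notionWebhookSecret: env["NOTION_WEBHOOK_SECRET"],
  };
}
```

On Val Town (Deno), the environment object can be obtained with `Deno.env.toObject()`.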

Getting Started

Prerequisites

Create a Notion integration at https://www.notion.so/my-integrations
Create a Notion database with these properties:
- Name (title)
- Block ID (rich_text)
- Block URL (url)
- Todo last edited time (date)
- Owner (people)
- Other people (people)
- Due date (date)
- Links (rich_text)
Share the database with your integration

Setup

Fork this val in Val Town
Set environment variables:
- NOTION_API_KEY = your integration token
- TODOS_DB_ID = your database ID
- SEARCH_KEYWORDS = todo (or your preferred keywords, comma-separated)
Test with: POST /tasks/todos?hours=1

Usage Examples

Find and sync all matches from last 24 hours (uses SEARCH_KEYWORDS env var):

curl -X POST https://your-val.express/tasks/todos

Custom time window (still uses SEARCH_KEYWORDS env var):

curl -X POST "https://your-val.express/tasks/todos?hours=48"

Search for multiple keywords - Set SEARCH_KEYWORDS=todo,zinger,😀 then:

curl -X POST https://your-val.express/tasks/todos

Get recent pages (API):

curl "https://your-val.express/api/pages/recent?hours=12"

Development Guidelines

For project-specific architecture: See CLAUDE.md
For Val Town platform guidelines: See AGENTS.md

Tech Stack

Runtime: Deno on Val Town
Framework: Hono (lightweight web framework)
Frontend: React 18.2.0 with Pico CSS (classless CSS framework)
APIs:
- Notion API (@notionhq/client v2)
- Val Town blob storage
Language: TypeScript

Architecture Diagrams

Complete System Flow

┌─────────────────────────────────────────────────────────────┐
│                     Notion Workspace                        │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐                 │
│  │  Page A  │  │  Page B  │  │  Page C  │                 │
│  │  "todo"  │  │  "todo"  │  │          │                 │
│  └──────────┘  └──────────┘  └──────────┘                 │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
         ┌─────────────────────────────────────┐
         │  POST /tasks/todos?keyword=todo     │
         │  (Batch Search & Sync Endpoint)     │
         └─────────────────┬───────────────────┘
                           │
        ┌──────────────────┴──────────────────┐
        │                                     │
        ▼                                     ▼
┌───────────────────┐              ┌──────────────────────────┐
│  Step 1: Search   │              │  Step 3: Sync (Optimized)│
│                   │              │                          │
│ • Get recent pages│              │ • Read blobs             │
│ • Fetch all blocks│              │ • Skip if synced: true   │
│ • Search keywords │              │ • Use cached page ID     │
│ • Extract data    │              │ • Create/update pages    │
│ • Validate        │              │ • Mark synced: true      │
└─────────┬─────────┘              └──────────────────────────┘
          │                                   ▲
          ▼                                   │
┌─────────────────────┐                      │
│  Step 2: Store      │                      │
│                     │                      │
│ • Save to blobs     │──────────────────────┘
│ • Set synced: false │
│ • Compare timestamps│
│ • Preserve page ID  │
└─────────────────────┘
          │
          ▼
┌─────────────────────────────────────────────────────────────────┐
│                   Val Town Blob Storage                          │
│                                                                  │
│  demo--todo--block-1: { data, sync_metadata: {synced, page_id} }│
│  demo--todo--block-2: { data, sync_metadata: {synced, page_id} }│
│  demo--todo--block-3: { data, sync_metadata: {synced, page_id} }│
└─────────────────────────────────────────────────────────────────┘

MVC Layer Interaction

┌──────────────────────────────────────────────────────────────┐
│                       HTTP Request                           │
│  POST /tasks/todos?hours=24&keyword=todo                     │
└──────────────────────┬───────────────────────────────────────┘
                       │
                       ▼
         ┌─────────────────────────────┐
         │   ROUTE (todos.ts)          │
         │   • Extract query params    │
         │   • Call controller         │
         │   • Format HTTP response    │
         └─────────────┬───────────────┘
                       │
                       ▼
         ┌─────────────────────────────────────┐
         │   CONTROLLER (orchestration)        │
         │   • Validate inputs                 │
         │   • Orchestrate workflow:           │
         │     1. Get recent pages             │
         │     2. Search each page             │
         │     3. Sync to database             │
         │   • Return standardized result      │
         └──────────────┬──────────────────────┘
                        │
            ┌───────────┼───────────┐
            │           │           │
            ▼           ▼           ▼
    ┌───────────┐ ┌──────────┐ ┌────────────┐
    │ SERVICE:  │ │ SERVICE: │ │ SERVICE:   │
    │ pages.ts  │ │ blob.ts  │ │ database.ts│
    │           │ │          │ │            │
    │ • API call│ │ • Blob   │ │ • Query DB │
    │ • Parse   │ │   CRUD   │ │ • Create   │
    │ • Return  │ │ • Return │ │ • Update   │
    └─────┬─────┘ └────┬─────┘ └─────┬──────┘
          │            │             │
          ▼            ▼             ▼
    Notion API   Blob Storage   Notion API

License

MIT
