A Val Town application that searches Notion pages for keywords (like "todo"), extracts structured data, stores it in blob storage, and syncs it back to a Notion database with intelligent filtering and validation.
- Overview
- Project Structure
- MVC Architecture
- Keyword Search Workflow
- Block Type Handling
- Validation Rules
- Endpoints
- Cron Jobs
- Environment Variables
This system enables automatic extraction and organization of action items from Notion pages:
- Search: Scans recent Notion pages for configurable keywords
- Extract: Captures block content including mentions and dates
- Store: Saves to Val Town blob storage with timestamp tracking and sync metadata
- Optimize: Skips already-synced items to reduce API calls by 90%+
- Sync: Updates a Notion database with validated items
- Filter: Only syncs items with required metadata (owner + due date)
├── backend/
│   ├── controllers/                        # Business logic
│   │   ├── pageController.ts               # Page operations
│   │   ├── todoController.ts               # Keyword search logic
│   │   ├── todoSaveController.ts           # Blob → Notion sync
│   │   └── todoOrchestrationController.ts  # Batch workflow
│   ├── crons/                              # Time-based triggers
│   │   ├── todoSearch.cron.ts              # Periodic keyword search
│   │   └── todoSync.cron.ts                # Periodic database sync
│   ├── routes/                             # HTTP handlers
│   │   ├── api/                            # API endpoints
│   │   │   └── pages.ts                    # Recent pages API
│   │   └── tasks/                          # Task automation endpoints
│   │       ├── todoSearch.ts               # Single page search webhook
│   │       ├── todoSave.ts                 # Blob sync webhook
│   │       └── todos.ts                    # Batch search & sync
│   ├── services/                           # External API integrations
│   │   ├── notion/                         # Notion API wrapper
│   │   │   ├── index.ts                    # Client initialization
│   │   │   ├── pages.ts                    # Page operations
│   │   │   ├── databases.ts                # Database operations
│   │   │   └── search.ts                   # Search operations
│   │   └── blobService.ts                  # Val Town blob storage
│   └── utils/                              # Utility functions
│       ├── notionUtils.ts                  # Block transformation
│       ├── blobUtils.ts                    # Blob key parsing
│       └── emojiUtils.ts                   # Emoji extraction
├── frontend/                               # React frontend
├── shared/                                 # Shared types and utilities
│   ├── types.ts                            # TypeScript interfaces
│   └── utils.ts                            # Shared utility functions
├── main.http.tsx                           # Application entry point (Hono)
├── CLAUDE.md                               # Development guidelines
└── AGENTS.md                               # Val Town platform guidelines
This application follows a strict 3-layer MVC architecture with clear separation of concerns:
Request → Route → Controller → Service → External API
↓
Response ← Format ← Standard Response ← Result
Responsibility: HTTP handling only
- Extract request parameters (query, body, headers)
- Call controller functions
- Format responses with appropriate HTTP status codes
- Never contain business logic
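// Example: backend/routes/tasks/todos.ts
app.post('/', async (c) => {
  // Parse the lookback window from the query string; default to 24 hours
  const hours = Number(c.req.query('hours')) || 24;
  const keyword = c.req.query('keyword') || undefined;
  const result = await todoOrchestrationController.processBatchTodos(hours, keyword);
  return c.json(result, 200);
});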
Responsibility: Business logic and orchestration
- Validate input data
- Orchestrate multiple service calls
- Transform and filter data
- Return standardized response format: `{success, data, error, details?}`
- Never make direct HTTP calls to external APIs
// Example: backend/controllers/todoController.ts
export async function processTodoSearch(pageId: string, keyword: string = 'todo') {
  // Validation
  if (!pageId) return { success: false, data: null, error: "Invalid pageId" };
  // Call service layer
  const blocks = await notionService.getPageBlocksRecursive(pageId);
  // Business logic
  const matches = blocks.filter(block => searchBlockForKeyword(block, keyword));
  return { success: true, data: matches, error: null };
}
Responsibility: External API calls only
- Make HTTP requests to external APIs
- Handle API authentication
- Parse and normalize API responses
- Return structured results: `{success, data, error}`
- Never contain business logic
// Example: backend/services/notion/pages.ts
// (simplified: shows only the single API call for a block's immediate children)
export async function getPageBlocksRecursive(blockId: string) {
  const response = await notion.blocks.children.list({ block_id: blockId });
  return response.results;
}
Golden Rule: Never skip layers! Routes call controllers, controllers call services. This ensures testability, maintainability, and clear separation of concerns.
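As an illustration of the standardized `{success, data, error, details?}` shape that controllers return, here is a minimal sketch. The names `StandardResponse`, `ok`, `fail`, and `statusFor` are hypothetical helpers, not taken from the project, and the 400 status for failures is an assumption.

```typescript
// Hypothetical helpers; the README only specifies the response shape itself.
interface StandardResponse<T> {
  success: boolean;
  data: T | null;
  error: string | null;
  details?: string;
}

// Build a successful controller result.
function ok<T>(data: T): StandardResponse<T> {
  return { success: true, data, error: null };
}

// Build a failed controller result, attaching details only when provided.
function fail<T = never>(error: string, details?: string): StandardResponse<T> {
  return details === undefined
    ? { success: false, data: null, error }
    : { success: false, data: null, error, details };
}

// A route can pick an HTTP status from the result without knowing any
// business logic (400 for failures is an assumption for this sketch).
function statusFor(result: StandardResponse<unknown>): number {
  return result.success ? 200 : 400;
}
```

With helpers like these, a route handler reduces to calling the controller and returning `c.json(result, statusFor(result))`.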
The system follows a three-stage pipeline: Notion → Blob Storage → Notion Database
Flow:
- Get recent pages from Notion (configurable time window)
- For each page, recursively fetch all blocks
- Search blocks for keyword matches
- Extract structured data from matching blocks
- Validate: Check for required fields (people_mentions AND date_mentions)
- Save valid blocks to blob storage with timestamp
Validation (happens here, not during sync):
- ✅ Block must have at least one `people_mention` (for Owner)
- ✅ Block must have at least one `date_mention` (for Due date)
- ❌ Blocks without both criteria are skipped (not saved to blob storage)
- Result: All blobs in storage are guaranteed valid and ready to sync
Endpoints:
- `POST /tasks/todo/search` - Single page search (webhook-triggered)
- `POST /tasks/todos?hours=24` - Batch search across recent pages
Keywords Configuration:
- Set via `SEARCH_KEYWORDS` environment variable (comma-separated)
- Example: `SEARCH_KEYWORDS=todo,zinger,bit,😀`
- Defaults to `todo` if not set
- All keywords searched in single pass through blocks (efficient)
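A minimal sketch of how the comma-separated `SEARCH_KEYWORDS` value could be parsed, including the `todo` default. The function name `parseKeywords` is hypothetical, and trimming whitespace around entries is an assumption.

```typescript
// Hypothetical helper: turn the raw env value into a keyword list.
function parseKeywords(raw: string | undefined): string[] {
  const keywords = (raw ?? "")
    .split(",")
    .map((keyword) => keyword.trim())      // assumption: surrounding spaces are ignored
    .filter((keyword) => keyword.length > 0); // drop empty entries such as "a,,b"
  // Fall back to the documented default when nothing usable is set.
  return keywords.length > 0 ? keywords : ["todo"];
}
```

On Val Town the raw value would typically come from `Deno.env.get("SEARCH_KEYWORDS")`.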
Keyword Matching Logic:
- Text keywords (e.g., "todo", "bit", "steel"):
  - Case-insensitive
  - Word boundary matching (finds "todo" but not "todoist")
  - Uses regex: `/\btodo\b/i`
- Emojis (e.g., "😀", "🎉"):
  - Exact match
  - Case-sensitivity N/A
- Multi-keyword: Block saved if it matches ANY keyword
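The matching rules above can be sketched as follows. This is an illustration under stated assumptions, not the project's `searchBlockForKeyword`: the function names are hypothetical, and treating a keyword as an emoji when it contains an `Extended_Pictographic` character is an assumption about how emojis are detected.

```typescript
// Escape regex metacharacters so a keyword is always matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Assumption: a keyword counts as an emoji if it contains a pictographic character.
const EMOJI_PATTERN = new RegExp("\\p{Extended_Pictographic}", "u");

function matchesKeyword(text: string, keyword: string): boolean {
  if (EMOJI_PATTERN.test(keyword)) {
    // Emojis: exact substring match, case does not apply.
    return text.includes(keyword);
  }
  // Text keywords: case-insensitive with word boundaries, e.g. /\btodo\b/i.
  return new RegExp(`\\b${escapeRegExp(keyword)}\\b`, "i").test(text);
}

// A block qualifies if it matches ANY configured keyword.
function matchesAnyKeyword(text: string, keywords: string[]): boolean {
  return keywords.some((keyword) => matchesKeyword(text, keyword));
}
```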
Block Extraction:
When a keyword is found, the system extracts and transforms the block into a reduced format:
{
  full_sentence: "Buy groceries for @John due October 30, 2025",
  block_id: "abc-123-def-456",
  block_url: "https://www.notion.so/abc123def456",
  last_edited_time: "2025-10-29T12:00:00.000Z",
  people_mentions: [{ id: "user-123", name: "John", email: "john@example.com" }],
  date_mentions: ["2025-10-30"],
  link_mentions: [{ text: "Project", url: "/page-id" }],
  sync_metadata: {
    synced: false,            // Needs sync to Notion database
    notion_page_id: undefined // Will be set after first sync
  }
}
Transformation Details:
- Dates: Formatted to human-readable (e.g., "October 30, 2025 at 3:00 PM EDT")
- Original dates preserved: ISO format kept in `date_mentions` array for Notion API
- Block URL: Clickable link to original block location
- Emojis: Extracted for use as page icons
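To make the extraction step concrete, here is a minimal sketch that pulls people and date mentions out of a Notion `rich_text` array. The `RichTextItem` type is a trimmed-down approximation of Notion's rich text objects, `extractMentions` is a hypothetical name, and the sketch leaves dates in ISO form rather than applying the human-readable formatting described above.

```typescript
// Trimmed-down approximation of a Notion rich text item (assumption).
interface RichTextItem {
  type: "text" | "mention" | "equation";
  plain_text: string;
  mention?:
    | { type: "user"; user: { id: string; name?: string; person?: { email?: string } } }
    | { type: "date"; date: { start: string; end?: string | null } };
}

interface PersonMention {
  id: string;
  name?: string;
  email?: string;
}

// Hypothetical helper: collect the fields the reduced format needs.
function extractMentions(richText: RichTextItem[]) {
  const people_mentions: PersonMention[] = [];
  const date_mentions: string[] = [];
  for (const item of richText) {
    if (item.type !== "mention" || !item.mention) continue;
    if (item.mention.type === "user") {
      const user = item.mention.user;
      people_mentions.push({ id: user.id, name: user.name, email: user.person?.email });
    } else if (item.mention.type === "date") {
      date_mentions.push(item.mention.date.start); // keep the ISO value for the Notion API
    }
  }
  // Join the plain text of every segment to rebuild the sentence.
  const full_sentence = richText.map((item) => item.plain_text).join("");
  return { full_sentence, people_mentions, date_mentions };
}
```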
Storage Format:
- Key pattern: `{projectName}--{category}--{blockId}`
- Example: `demo--todo--abc-123-def-456`
- Content: JSON of reduced block structure with sync metadata
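A minimal sketch of building and parsing keys in the `{projectName}--{category}--{blockId}` pattern. The function names are hypothetical (the project's own helpers live in `blobUtils.ts`), and the sketch assumes that project names and categories never contain `--`, while block IDs may contain single hyphens.

```typescript
const KEY_SEPARATOR = "--";

interface BlobKeyParts {
  projectName: string;
  category: string;
  blockId: string;
}

function buildBlobKey(projectName: string, category: string, blockId: string): string {
  return [projectName, category, blockId].join(KEY_SEPARATOR);
}

// Split on the first two separators only, so hyphens inside the block ID survive.
function parseBlobKey(key: string): BlobKeyParts | null {
  const first = key.indexOf(KEY_SEPARATOR);
  if (first <= 0) return null;
  const second = key.indexOf(KEY_SEPARATOR, first + KEY_SEPARATOR.length);
  if (second === -1) return null;
  const projectName = key.slice(0, first);
  const category = key.slice(first + KEY_SEPARATOR.length, second);
  const blockId = key.slice(second + KEY_SEPARATOR.length);
  if (!category || !blockId) return null;
  return { projectName, category, blockId };
}
```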
Blob Structure:
{
  full_sentence: "...",
  block_id: "...",
  // ... other properties
  sync_metadata: {
    synced: boolean,        // true = synced to Notion, false = needs sync
    notion_page_id?: string // Cached Notion page ID (optimization)
  }
}
Update Logic:
- Compare `last_edited_time` of existing blob vs new block
- If unchanged: Skip save (preserves `synced: true` status)
- If changed: Save with `synced: false` (triggers re-sync)
- Preserve cached `notion_page_id` across updates
- Prevents data loss from out-of-order processing
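The update logic can be sketched as a pure function that decides what, if anything, to write. This is an illustration only: `resolveBlobWrite` and the trimmed types are hypothetical, and "changed" is interpreted here as "the timestamps differ", which is an assumption about the comparison the project performs.

```typescript
interface SyncMetadata {
  synced: boolean;
  notion_page_id?: string;
}

interface StoredBlock {
  block_id: string;
  last_edited_time: string;
  sync_metadata: SyncMetadata;
}

type IncomingBlock = Omit<StoredBlock, "sync_metadata">;

// Returns the blob to write, or null when the save should be skipped.
function resolveBlobWrite(existing: StoredBlock | null, incoming: IncomingBlock): StoredBlock | null {
  // Unchanged: skip the save so an existing `synced: true` flag is preserved.
  if (existing && existing.last_edited_time === incoming.last_edited_time) {
    return null;
  }
  // New or changed: mark for re-sync, but carry over any cached page ID.
  return {
    ...incoming,
    sync_metadata: {
      synced: false,
      notion_page_id: existing?.sync_metadata.notion_page_id,
    },
  };
}
```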
Flow:
- List all blobs in "todo" category
- For each blob, read reduced block data
- Optimization: Skip if `synced: true` (0 API calls)
- Optimization: Use cached `notion_page_id` if available (1 API call - update only)
- If no cached ID: Query database for existing page by Block ID
- Create new page OR update existing page
- Mark blob as `synced: true` and cache page ID
Note: No validation happens during sync - all blobs are guaranteed valid because validation occurs during the search phase (Stage 1).
Endpoints:
- `POST /tasks/todo/save` - Sync all blobs to database (webhook-triggered)
- `POST /tasks/todos` - Batch workflow (search + sync in one call)
Property Mappings (Blob → Notion Database):
full_sentence → Name (title)
block_id → Block ID (rich_text)
block_url → Block URL (url)
last_edited_time → Todo last edited time (date)
people_mentions[0] → Owner (people)
people_mentions[1..] → Other people (people)
date_mentions[0] → Due date (date)
link_mentions → Links (rich_text, bullet list)
emoji (if found) → Page icon
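As a sketch of the mapping above, the function below turns a reduced block into a Notion `properties` payload. The function name and the `ReducedBlock` type are hypothetical, the page icon is omitted, and rendering links as `• text (url)` lines is an assumption about the bullet-list format.

```typescript
interface ReducedBlock {
  full_sentence: string;
  block_id: string;
  block_url: string;
  last_edited_time: string;
  people_mentions: { id: string }[];
  date_mentions: string[];
  link_mentions: { text: string; url: string }[];
}

// Hypothetical helper: build the `properties` object for a page create/update call.
function toNotionProperties(block: ReducedBlock) {
  const [owner, ...others] = block.people_mentions;
  const dueDate = block.date_mentions[0];
  // Assumption: links are rendered as one bullet per line inside a rich_text property.
  const links = block.link_mentions.map((link) => `• ${link.text} (${link.url})`).join("\n");
  return {
    "Name": { title: [{ text: { content: block.full_sentence } }] },
    "Block ID": { rich_text: [{ text: { content: block.block_id } }] },
    "Block URL": { url: block.block_url },
    "Todo last edited time": { date: { start: block.last_edited_time } },
    "Owner": { people: owner ? [{ id: owner.id }] : [] },
    "Other people": { people: others.map((person) => ({ id: person.id })) },
    "Due date": { date: dueDate ? { start: dueDate } : null },
    "Links": { rich_text: links ? [{ text: { content: links } }] : [] },
  };
}
```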
Sync Optimization:
The system uses sync metadata to dramatically reduce Notion API calls:
On first sync:
- Blob has `synced: false`, no `notion_page_id`
- Query database → create or update → cache page ID
- Mark `synced: true`
- API calls: 1 query + 1 create/update = 2 calls
On subsequent syncs (no changes):
- Blob has `synced: true`
- Skip immediately
- API calls: 0 calls (100% reduction)
On subsequent syncs (block changed):
- Blob saved with `synced: false` (block edited in Notion)
- Has cached `notion_page_id` from previous sync
- Update directly without query
- Mark `synced: true`
- API calls: 1 update (50% reduction)
Performance impact:
- Before optimization: 100 blobs = 100 queries + 50 updates = 150 API calls
- After optimization: 90 synced + 10 changed = 0 + 10 updates = 10 API calls (93% reduction)
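The three sync cases and their API-call costs can be expressed as a small decision function. This is a sketch under stated assumptions: `planSync`, `estimateApiCalls`, and the action names are hypothetical, and the call counts simply restate the figures above.

```typescript
interface SyncMetadata {
  synced: boolean;
  notion_page_id?: string;
}

type SyncPlan =
  | { action: "skip"; apiCalls: 0 }
  | { action: "update_cached"; pageId: string; apiCalls: 1 }
  | { action: "query_then_upsert"; apiCalls: 2 };

// Decide what the sync step has to do for one blob, based only on its metadata.
function planSync(meta: SyncMetadata): SyncPlan {
  if (meta.synced) return { action: "skip", apiCalls: 0 };
  if (meta.notion_page_id) {
    return { action: "update_cached", pageId: meta.notion_page_id, apiCalls: 1 };
  }
  return { action: "query_then_upsert", apiCalls: 2 };
}

// Total Notion API calls a sync run would need for a set of blobs.
function estimateApiCalls(blobs: SyncMetadata[]): number {
  return blobs.reduce((total, meta) => total + planSync(meta).apiCalls, 0);
}
```

For the scenario above (90 blobs already synced, 10 changed blobs with a cached page ID), `estimateApiCalls` returns 10.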
The search uses recursive block fetching to traverse the entire page hierarchy, including nested content.
How it works:
function getPageBlocksRecursive(blockId) {
  1. Fetch immediate children of blockId
  2. For each child:
     - Add child to results
     - If child.has_children === true:
       - Recursively fetch child's children
       - Add to results
  3. Return flattened array of all blocks
}
What this means:
- ✅ Finds blocks nested inside toggles
- ✅ Finds blocks nested inside columns
- ✅ Finds blocks nested inside lists
- ✅ Finds blocks nested N levels deep
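A runnable sketch of the recursive traversal, with the Notion call injected so it can be exercised without network access. The `listChildren` parameter is hypothetical; in the real service it would wrap `notion.blocks.children.list` (and presumably handle pagination, which this sketch leaves out).

```typescript
interface Block {
  id: string;
  type: string;
  has_children: boolean;
}

// Hypothetical dependency: returns the immediate children of a block.
type ListChildren = (blockId: string) => Promise<Block[]>;

// Depth-first traversal that returns a flattened array of all nested blocks.
async function getPageBlocksRecursive(blockId: string, listChildren: ListChildren): Promise<Block[]> {
  const results: Block[] = [];
  for (const child of await listChildren(blockId)) {
    results.push(child);
    if (child.has_children) {
      results.push(...(await getPageBlocksRecursive(child.id, listChildren)));
    }
  }
  return results;
}
```

Each parent appears in the result immediately before its descendants, so the flattened order follows the reading order of the page.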
These block types are searched for keywords:
| Block Type | Has rich_text? | Notes |
|---|---|---|
| `paragraph` | ✅ | Standard text blocks |
| `heading_1`, `heading_2`, `heading_3` | ✅ | All heading levels |
| `bulleted_list_item` | ✅ | Bullet lists |
| `numbered_list_item` | ✅ | Numbered lists |
| `to_do` | ✅ | Checkbox items |
| `toggle` | ✅ | Collapsible toggles |
| `quote` | ✅ | Quote blocks |
| `callout` | ✅ | Callout/alert blocks |
| `code` | ✅ | Code blocks (captions only) |
| `column` | N/A | Container - children are searched |
| `column_list` | N/A | Container - children are searched |
Column Behavior:
- Column blocks themselves have no searchable text
- But their children (paragraphs, lists, etc.) ARE searched
- Example: A todo in a column will be found
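To illustrate how the table translates into code, here is a minimal sketch that returns the searchable text of a block, or `null` for containers and unsupported types. The names are hypothetical; the sketch relies on Notion storing a block's payload under a key equal to its `type`, and it reads only the `caption` of code blocks, following the "captions only" note in the table.

```typescript
// Block types from the table that carry searchable text.
const SEARCHABLE_TYPES = new Set([
  "paragraph", "heading_1", "heading_2", "heading_3",
  "bulleted_list_item", "numbered_list_item", "to_do",
  "toggle", "quote", "callout", "code",
]);

interface NotionBlock {
  id: string;
  type: string;
  [key: string]: unknown; // the payload lives under the key named by `type`
}

interface TextPayload {
  rich_text?: { plain_text: string }[];
  caption?: { plain_text: string }[];
}

// Returns the block's plain text, or null when there is nothing to search.
function getSearchableText(block: NotionBlock): string | null {
  if (!SEARCHABLE_TYPES.has(block.type)) return null; // e.g. column, table, divider
  const payload = block[block.type] as TextPayload | undefined;
  const segments = block.type === "code" ? payload?.caption : payload?.rich_text;
  if (!segments) return null;
  return segments.map((segment) => segment.plain_text).join("");
}
```

Containers such as `column` return `null` here, but their children still reach this function individually through the recursive fetch.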
These block types are explicitly skipped:
| Block Type | Reason |
|---|---|
| `unsupported` | Not supported by Notion API |
| `button` | Action buttons, not content |
| `table` | Container block, no text content |
| `table_row` | Cells aren't individual blocks; can't be saved to blob |
| `child_page` | Page title not in rich_text format |
| `child_database` | Database title not in rich_text format |
| `divider` | No text content |
| `table_of_contents` | No text content |
| `breadcrumb` | No text content |
| `image`, `file`, `video`, `pdf` | Media blocks (captions could be added later) |
| `bookmark`, `embed` | External content (could be added later) |
Why tables are excluded:
- Table content lives in `table_row.cells[][]` (array of arrays)
- Cells contain rich_text but aren't individual blocks
- Can't be saved to blob storage as standalone blocks
- Can't create Notion pages from cell content
Not all blocks with keywords are saved to blob storage. Validation ensures data quality and storage efficiency.
A block will ONLY be saved to blob storage if it has:
- ✅ At least one person mention (`people_mentions.length > 0`)
  - First person becomes "Owner"
  - Additional people become "Other people"
- ✅ At least one date mention (`date_mentions.length > 0`)
  - First date becomes "Due date"
Without both: Block is skipped entirely (not saved, not synced).
During Search (Stage 1) - todoController.ts:
- After keyword match is found
- After block is transformed to reduced format
- Before saving to blob storage
Not During Sync (Stage 3) - All blobs are guaranteed valid, no checking needed
// From todoController.ts (search phase)
if (!reducedBlock.people_mentions || reducedBlock.people_mentions.length === 0) {
  console.log(`○ Skipped: no people mentions (no owner)`);
  continue; // Don't save to blob
}
if (!reducedBlock.date_mentions || reducedBlock.date_mentions.length === 0) {
  console.log(`○ Skipped: no date mentions (no due date)`);
  continue; // Don't save to blob
}
// Only valid blocks reach this point and get saved to blob storage
await blobService.saveBlockToBlob('todo', block.id, blobData);
Valid - Will save and sync:
"Buy groceries for @John due October 30, 2025"
✅ Has person mention (@John)
✅ Has date mention (October 30, 2025)
→ Saved to blob storage
→ Synced to Notion database
Invalid - Will NOT save:
"Buy groceries due October 30, 2025"
✅ Has date mention
❌ Missing person mention
→ NOT saved to blob storage (skipped during search)
"Buy groceries for @John"
✅ Has person mention
❌ Missing date mention
→ NOT saved to blob storage (skipped during search)
After syncing, the controller reports:
- Total blobs: All blobs in storage (all guaranteed valid)
- Pages created: New pages added to database
- Pages updated: Existing pages updated
- Pages skipped: Blobs already synced (`synced: true`)
- Pages failed: Errors during create/update
Note: All blobs meet validation criteria - validation happens during search, not sync.
GET /api/pages/recent?hours=24
- Get pages edited in last N hours
- Filters out archived pages and pages in TODOS_DB_ID database
- Returns simplified page objects with parent information
Response:
{
  "pages": [
    {
      "id": "page-id",
      "object": "page",
      "title": "My Page",
      "url": "https://notion.so/...",
      "last_edited_time": "2025-10-29T12:00:00.000Z",
      "parent": { "type": "page_id", "id": "parent-id" }
    }
  ],
  "count": 1,
  "timeRange": "24 hours"
}
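The filtering this endpoint describes (recent, not archived, not inside the todos database) can be sketched as a pure function. The `PageSummary` type mirrors Notion's raw page objects rather than the simplified response above, and `filterRecentPages` and its parameters are hypothetical names.

```typescript
interface PageSummary {
  id: string;
  archived: boolean;
  last_edited_time: string;
  parent: { type: string; database_id?: string; page_id?: string };
}

// Compare database IDs regardless of hyphenation or letter case.
function normalizeId(id: string): string {
  return id.replace(/-/g, "").toLowerCase();
}

function filterRecentPages(
  pages: PageSummary[],
  hours: number,
  todosDbId: string,
  now: Date = new Date(),
): PageSummary[] {
  const cutoff = now.getTime() - hours * 60 * 60 * 1000;
  return pages.filter((page) => {
    if (page.archived) return false;
    if (Date.parse(page.last_edited_time) < cutoff) return false;
    // Exclude pages that live in the todos database itself, so synced
    // todo pages are not searched again.
    const inTodosDb =
      page.parent.type === "database_id" &&
      page.parent.database_id !== undefined &&
      normalizeId(page.parent.database_id) === normalizeId(todosDbId);
    return !inTodosDb;
  });
}
```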
POST /tasks/todo/search
- Search single page for keywords (webhook-triggered)
- Keywords from `SEARCH_KEYWORDS` env var (comma-separated)
- Extracts and saves matching blocks to blobs
- Body: `{ "page_id": "abc-123" }`
POST /tasks/todo/save
- Sync all blobs to Notion database
- Creates or updates pages (no validation here; blobs are validated during search)
- No request body needed
POST /tasks/todos?hours=24
- Batch workflow: Search recent pages + sync to database
- Keywords from `SEARCH_KEYWORDS` env var
- Combines search and save in one call
- Use for manual triggers or cron jobs
Response:
{
  "success": true,
  "pagesSearched": 5,
  "totalTodosFound": 12,
  "searchResults": [
    {
      "pageId": "abc-123",
      "pageTitle": "My Page",
      "success": true,
      "blocksFound": 3,
      "blockIds": ["block-1", "block-2", "block-3"]
    }
  ],
  "saveResult": {
    "totalBlobs": 12,
    "pagesCreated": 5,
    "pagesUpdated": 3,
    "pagesSkipped": 4,
    "pagesFailed": 0
  }
}
The system includes two separate cron jobs for automated workflow execution. Crons are time-based triggers that run independently of HTTP requests.
Crons live in backend/crons/ and follow the same MVC pattern as HTTP routes:
Cron Trigger → Controller → Service → External API
Key differences from HTTP routes:
- Triggered by time intervals (not HTTP requests)
- No request/response cycle
- Results logged to console only
- Use `.cron.tsx` extension for Val Town
Purpose: Search recent pages for keywords and save matches to blob storage
Workflow:
- Get recent pages from Notion (last 6 hours)
- Search each page for "todo" keyword
- Save matching blocks to Val Town blob storage
- Does NOT sync to Notion database
Configuration:
- Lookback window: 6 hours (hardcoded)
- Keyword: "todo" (hardcoded default)
- Recommended schedule: Every 4 hours
- 6 hour lookback provides 2 hour buffer for overlap
- Ensures no pages are missed
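The overlap arithmetic behind the recommended schedule is sketched below. Both helper names are hypothetical, and the sketch assumes runs fire exactly on schedule.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Earliest edit time a run will still consider.
function lookbackStart(runAt: Date, lookbackHours: number): Date {
  return new Date(runAt.getTime() - lookbackHours * HOUR_MS);
}

// Hours shared by the windows of two consecutive runs
// (a negative value would mean a gap in coverage).
function overlapHours(intervalHours: number, lookbackHours: number): number {
  return lookbackHours - intervalHours;
}
```

With a 4 hour schedule and a 6 hour lookback, `overlapHours(4, 6)` is 2, so each run re-checks the last 2 hours covered by the previous run.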
Output:
=== Cron: Todo Search Started ===
Timestamp: 2025-10-29T12:00:00.000Z
Cron: Search complete - Found 12 todos in 5 pages
Pages with matches:
- Project Planning: 3 match(es)
- Meeting Notes: 5 match(es)
- Weekly Review: 4 match(es)
=== Cron: Todo Search Complete ===
Purpose: Sync validated todo blobs to Notion database
Workflow:
- Read all todo blobs from Val Town blob storage
- Skip blobs already marked `synced: true` (no validation is needed; blobs are validated during search)
- Query Notion database for existing pages by Block ID
- Create new pages or update existing pages (timestamp-based)
Configuration:
- No parameters: Processes all blobs in storage
- Recommended schedule: Every 8-12 hours
- Less frequent than search cron
- Allows time for blob accumulation
- Reduces Notion API calls
Output:
=== Cron: Todo Sync Started ===
Timestamp: 2025-10-29T14:00:00.000Z
Cron: Sync complete
Summary:
Total blobs processed: 12
Pages created: 5
Pages updated: 3
Pages skipped: 4
Pages failed: 0
=== Cron: Todo Sync Complete ===
Operational flexibility:
- Search cron runs frequently to capture changes quickly
- Sync cron runs less frequently to batch database updates
- Reduces Notion API rate limit concerns
- Allows manual triggering of sync independently
Fault isolation:
- Search failures don't block syncing existing blobs
- Sync failures don't block new searches
- Each cron can be debugged independently
Cost optimization:
- Blob storage is cheap and fast
- Notion API calls are rate-limited
- Separate crons allow different schedules for different costs
- Navigate to Val Town UI
- Create new cron vals:
  - `todoSearch.cron.tsx` - Copy content from `backend/crons/todoSearch.cron.ts`
  - `todoSync.cron.tsx` - Copy content from `backend/crons/todoSync.cron.ts`
- Set schedules:
  - `todoSearch.cron.tsx`: Every 4 hours (`0 */4 * * *`)
  - `todoSync.cron.tsx`: Every 8 hours (`0 */8 * * *`)
- Monitor logs: Check Val Town console for cron execution results
Note: Val Town cron jobs must be separate vals (not files in this project). The files in backend/crons/ serve as templates to copy into Val Town cron vals.
Required environment variables (set in Val Town):
- `NOTION_API_KEY` - Notion integration token (required)
  - Get from: https://www.notion.so/my-integrations
  - Required for all Notion API calls
- `TODOS_DB_ID` - Notion database ID for todo sync (required)
  - The database where keyword matches are synced
  - Format: `abc123def456...` (32-character ID without hyphens)
- `SEARCH_KEYWORDS` - Keywords to search for (optional)
  - Comma-separated list of keywords/phrases
  - Example: `todo,zinger,bit` or `todo,😀,🎉`
  - Defaults to `todo` if not set
  - All blocks matching ANY keyword will be saved to blob storage
  - Efficient: Searches all keywords in a single pass through blocks
- `API_KEY` - Optional API key for authentication
  - Used by `authCheck` middleware for protected endpoints
- `NOTION_WEBHOOK_SECRET` - Optional webhook signature verification
  - Used to verify Notion webhook authenticity
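A minimal sketch of validating these variables at startup, so that a missing required value fails fast with a clear message. `loadConfig` and the `AppConfig` field names are hypothetical and not part of the project.

```typescript
interface AppConfig {
  notionApiKey: string;
  todosDbId: string;
  searchKeywords: string;   // raw comma-separated value
  apiKey?: string;
  notionWebhookSecret?: string;
}

const REQUIRED_VARS = ["NOTION_API_KEY", "TODOS_DB_ID"];

// Accepts a plain record so the function can be tested without real env access.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    notionApiKey: env.NOTION_API_KEY as string,
    todosDbId: env.TODOS_DB_ID as string,
    searchKeywords: env.SEARCH_KEYWORDS ?? "todo", // documented default
    apiKey: env.API_KEY,
    notionWebhookSecret: env.NOTION_WEBHOOK_SECRET,
  };
}
```

In a Deno runtime such as Val Town, this could be called as `loadConfig(Deno.env.toObject())`.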
- Create a Notion integration at https://www.notion.so/my-integrations
- Create a Notion database with these properties:
- Name (title)
- Block ID (rich_text)
- Block URL (url)
- Todo last edited time (date)
- Owner (people)
- Other people (people)
- Due date (date)
- Links (rich_text)
- Share the database with your integration
- Fork this val in Val Town
- Set environment variables:
  - `NOTION_API_KEY` = your integration token
  - `TODOS_DB_ID` = your database ID
  - `SEARCH_KEYWORDS` = `todo` (or your preferred keywords, comma-separated)
- Test with: `POST /tasks/todos?hours=1`
Find and sync all matches from last 24 hours (uses SEARCH_KEYWORDS env var):
curl -X POST https://your-val.express/tasks/todos
Custom time window (still uses SEARCH_KEYWORDS env var):
curl -X POST "https://your-val.express/tasks/todos?hours=48"
Search for multiple keywords - Set SEARCH_KEYWORDS=todo,zinger,😀 then:
curl -X POST https://your-val.express/tasks/todos
Get recent pages (API):
curl "https://your-val.express/api/pages/recent?hours=12"
- For project-specific architecture: See `CLAUDE.md`
- For Val Town platform guidelines: See `AGENTS.md`
- Runtime: Deno on Val Town
- Framework: Hono (lightweight web framework)
- Frontend: React 18.2.0 with Pico CSS (classless CSS framework)
- APIs:
- Notion API (@notionhq/client v2)
- Val Town blob storage
- Language: TypeScript
  ┌─────────────────────────────────────────────────────────────┐
  │                      Notion Workspace                       │
  │  ┌──────────┐  ┌──────────┐  ┌──────────┐                   │
  │  │ Page A   │  │ Page B   │  │ Page C   │                   │
  │  │ "todo"   │  │ "todo"   │  │          │                   │
  │  └──────────┘  └──────────┘  └──────────┘                   │
  └──────────────────────────┬──────────────────────────────────┘
                             │
                             ▼
           ┌─────────────────────────────────────┐
           │  POST /tasks/todos?keyword=todo     │
           │  (Batch Search & Sync Endpoint)     │
           └─────────────────┬───────────────────┘
                             │
          ┌──────────────────┴──────────────────┐
          │                                     │
          ▼                                     ▼
┌───────────────────┐              ┌──────────────────────────┐
│ Step 1: Search    │              │ Step 3: Sync (Optimized) │
│                   │              │                          │
│ • Get recent pages│              │ • Read blobs             │
│ • Fetch all blocks│              │ • Skip if synced: true   │
│ • Search keywords │              │ • Use cached page ID     │
│ • Extract data    │              │ • Create/update pages    │
│ • Validate        │              │ • Mark synced: true      │
└─────────┬─────────┘              └──────────────────────────┘
          │                                     ▲
          ▼                                     │
┌─────────────────────┐                         │
│ Step 2: Store       │                         │
│                     │                         │
│ • Save to blobs     │─────────────────────────┘
│ • Set synced: false │
│ • Compare timestamps│
│ • Preserve page ID  │
└─────────────────────┘
           │
           ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Val Town Blob Storage                      │
│                                                                 │
│  demo--todo--block-1: { data, sync_metadata: {synced, page_id} }│
│  demo--todo--block-2: { data, sync_metadata: {synced, page_id} }│
│  demo--todo--block-3: { data, sync_metadata: {synced, page_id} }│
└─────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│                         HTTP Request                         │
│           POST /tasks/todos?hours=24&keyword=todo            │
└──────────────────────┬───────────────────────────────────────┘
                       │
                       ▼
         ┌─────────────────────────────┐
         │      ROUTE (todos.ts)       │
         │ • Extract query params      │
         │ • Call controller           │
         │ • Format HTTP response      │
         └─────────────┬───────────────┘
                       │
                       ▼
        ┌─────────────────────────────────────┐
        │     CONTROLLER (orchestration)      │
        │ • Validate inputs                   │
        │ • Orchestrate workflow:             │
        │   1. Get recent pages               │
        │   2. Search each page               │
        │   3. Sync to database               │
        │ • Return standardized result        │
        └──────────────┬──────────────────────┘
                       │
          ┌────────────┼─────────────┐
          │            │             │
          ▼            ▼             ▼
    ┌───────────┐ ┌──────────┐ ┌────────────┐
    │ SERVICE:  │ │ SERVICE: │ │ SERVICE:   │
    │ pages.ts  │ │ blob.ts  │ │ database.ts│
    │           │ │          │ │            │
    │ • API call│ │ • Blob   │ │ • Query DB │
    │ • Parse   │ │   CRUD   │ │ • Create   │
    │ • Return  │ │ • Return │ │ • Update   │
    └─────┬─────┘ └────┬─────┘ └─────┬──────┘
          │            │             │
          ▼            ▼             ▼
     Notion API  Blob Storage   Notion API
MIT