Create a webhook endpoint that processes PDF text extraction and blob storage for Notion pages, following the established separation of concerns architecture.
Before starting, verify that these files exist and understand their current state:
- /backend/routes/tasks/_tasks.routes.ts (route definitions)
- /backend/controllers/tasks.controller.ts (business logic)
- /backend/services/notion.service.ts (Notion API calls)
- /backend/utils/pdfExtractor.ts (PDF processing utilities)
- /backend/types/ (type definitions directory)
Create or update /backend/types/tasks.types.ts with the following content:
/**
* Task processing type definitions
*/
export interface ProcessBlobRequest {
pageId: string;
}
export interface ProcessBlobResponse {
success: boolean;
pageId: string;
blobKey?: string;
textLength?: number;
statusUpdated?: boolean;
timestamp: string;
error?: string;
details?: string;
}
export interface WebhookPayload {
data: {
id: string;
[key: string]: any;
};
[key: string]: any;
}
export interface NotionPageProperties {
id: string;
properties: {
PDF?: {
type: 'files';
files: Array<{
type: 'external' | 'file';
external?: { url: string };
file?: { url: string };
}>;
};
[key: string]: any;
};
}
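The WebhookPayload interface is only a compile-time promise: c.req.json() hands back whatever JSON the caller sent. A minimal runtime guard could close that gap. This sketch assumes the page ID always arrives as a non-empty string at data.id; isWebhookPayload is a hypothetical helper, not part of the existing codebase, and the interface is redeclared locally (with unknown instead of any) so the block is self-contained.

```typescript
// Local copy of the payload shape so this sketch runs on its own.
interface WebhookPayload {
  data: {
    id: string;
    [key: string]: unknown;
  };
  [key: string]: unknown;
}

// Hypothetical runtime guard: narrows an untyped JSON body to WebhookPayload
// only when data.id is a non-empty string.
function isWebhookPayload(body: unknown): body is WebhookPayload {
  if (typeof body !== "object" || body === null) return false;
  const data = (body as { data?: unknown }).data;
  if (typeof data !== "object" || data === null) return false;
  const id = (data as { id?: unknown }).id;
  return typeof id === "string" && id.length > 0;
}

console.log(isWebhookPayload({ data: { id: "abc123" } })); // true
console.log(isWebhookPayload({ data: {} })); // false
```

In the route, a body that fails this check would map onto the existing 400 "Page ID is required" response.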
In /backend/routes/tasks/_tasks.routes.ts, ensure these imports are present at the top:
import { Hono } from "npm:hono@3.12.12";
import { processBlobExtraction, extractPageIdFromWebhook } from "../../controllers/tasks.controller.ts";
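In /backend/routes/tasks/_tasks.routes.ts, add this endpoint before the catch-all handler (the app.post("*", ...) route). Hono runs handlers in registration order, so a route registered after the catch-all would never be reached: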
// PDF text extraction and blob storage endpoint
app.post("/blob", async (c) => {
try {
// At this point, webhook auth has already passed
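// Malformed JSON is a client error, so answer 400 instead of falling through to the 500 handler
let body;
try {
body = await c.req.json();
} catch {
return c.json({ error: "Request body must be valid JSON" }, 400);
}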
console.log("📥 Blob processing webhook received:", body);
// Extract page ID from webhook payload
const pageId = extractPageIdFromWebhook(body);
if (!pageId) {
return c.json({ error: "Page ID is required" }, 400);
}
// Process the blob extraction using controller
const result = await processBlobExtraction({ pageId });
// Return appropriate HTTP status based on result
if (result.success) {
return c.json(result);
} else {
const statusCode = result.error?.includes("No PDF") || result.error?.includes("No text") ? 400 : 500;
return c.json(result, statusCode);
}
} catch (error) {
console.error("❌ Error in blob endpoint:", error);
return c.json({
error: "Internal server error",
details: error instanceof Error ? error.message : String(error),
timestamp: new Date().toISOString()
}, 500);
}
});
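The route chooses its status code by substring-matching the controller's error message. Pulling that rule into a pure function makes it unit-testable without starting Hono. statusForResult is a hypothetical name, and BlobResult is a local subset of ProcessBlobResponse; the sample error strings are illustrative.

```typescript
// Minimal subset of the controller result that the route inspects.
interface BlobResult {
  success: boolean;
  error?: string;
}

// Same rule as the route: success is 200, "No PDF" / "No text" failures are
// client errors (400), and every other failure is a server error (500).
function statusForResult(result: BlobResult): 200 | 400 | 500 {
  if (result.success) return 200;
  const message = result.error ?? "";
  return message.includes("No PDF") || message.includes("No text") ? 400 : 500;
}

console.log(statusForResult({ success: true })); // 200
console.log(statusForResult({ success: false, error: "No PDF files found in PDF property" })); // 400
console.log(statusForResult({ success: false, error: "Failed to get page properties: network error" })); // 500
```

String matching couples the route to the exact wording of the controller's messages: if that wording changes, a client error quietly becomes a 500. Returning an explicit error code field from the controller would remove that coupling, at the cost of extending ProcessBlobResponse.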
In the same file, find the catch-all handler and update the availableEndpoints array to include the new endpoint:
app.post("*", (c) => {
const path = c.req.path;
const method = c.req.method;
const returnObj = {
error: "Endpoint not found",
path: path,
method: method,
availableEndpoints: [
"GET /tasks/debug-webhook",
"POST /tasks/test",
"POST /tasks/notion-webhook",
"POST /tasks/blob", // <- Add this line
],
};
console.log(returnObj);
return c.json(returnObj, 404);
});
In /backend/controllers/tasks.controller.ts, ensure these imports are present at the top:
import { blob } from "https://esm.town/v/std/blob";
import { getPageProperties, updatePageStatus } from "../services/notion.service.ts";
import { extractTextFromPDFUrl } from "../utils/pdfExtractor.ts";
import type {
ProcessBlobRequest,
ProcessBlobResponse,
WebhookPayload,
NotionPageProperties
} from "../types/tasks.types.ts";
In /backend/controllers/tasks.controller.ts, add this function:
/**
* Process PDF extraction and blob storage for a Notion page
*/
export async function processBlobExtraction(request: ProcessBlobRequest): Promise<ProcessBlobResponse> {
const { pageId } = request;
const timestamp = new Date().toISOString();
try {
console.log(`🔍 Processing page: ${pageId}`);
// Get page properties from Notion
const pageResult = await getPageProperties(pageId);
if (!pageResult.success) {
console.error(`❌ Failed to get page properties: ${pageResult.error}`);
return {
success: false,
pageId,
timestamp,
error: `Failed to get page properties: ${pageResult.error}`
};
}
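// The Notion SDK response type is broader than the fields read here, so narrow it explicitly
const page = pageResult.data as unknown as NotionPageProperties;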
console.log(`📄 Retrieved page properties for: ${page.id}`);
// Extract PDF URL from page properties
const pdfUrl = extractPdfUrl(page);
if (!pdfUrl) {
console.error("❌ No valid PDF found on page");
return {
success: false,
pageId,
timestamp,
error: "No PDF files found in PDF property"
};
}
console.log(`📎 Found PDF URL: ${pdfUrl}`);
// Extract text from PDF
console.log("🔄 Starting PDF text extraction...");
const extractedText = await extractTextFromPDFUrl(pdfUrl);
if (!extractedText || extractedText.trim().length === 0) {
console.error("❌ No text extracted from PDF");
return {
success: false,
pageId,
timestamp,
error: "No text could be extracted from PDF"
};
}
// Save extracted text to blob storage
const blobKey = `findings--transcripts--${pageId}`;
console.log(`💾 Saving extracted text to blob with key: ${blobKey}`);
await blob.setJSON(blobKey, {
pageId: pageId,
extractedText: extractedText,
extractedAt: timestamp,
textLength: extractedText.length,
pdfUrl: pdfUrl
});
console.log(`✅ Text saved to blob storage successfully`);
// Update page status to "Done" and save blob key
console.log("🔄 Updating page status to 'Done' and saving blob key...");
const statusResult = await updatePageStatus(pageId, "Done", blobKey);
if (!statusResult.success) {
console.error(`⚠️ Failed to update page status: ${statusResult.error}`);
// Don't fail the entire operation if status update fails
} else {
console.log("✅ Page status updated to 'Done' and blob key saved");
}
return {
success: true,
pageId,
blobKey,
textLength: extractedText.length,
statusUpdated: statusResult.success,
timestamp
};
} catch (error) {
console.error("❌ Error processing blob extraction:", error);
return {
success: false,
pageId,
timestamp,
error: "Internal server error",
details: error instanceof Error ? error.message : String(error)
};
}
}
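The blob key and the stored record can be built by pure functions, which makes the storage format testable without touching Val Town blob storage. transcriptBlobKey and buildTranscriptRecord are hypothetical names; the record fields mirror the object passed to blob.setJSON above.

```typescript
// Shape of the JSON record the controller writes with blob.setJSON.
interface TranscriptRecord {
  pageId: string;
  extractedText: string;
  extractedAt: string;
  textLength: number;
  pdfUrl: string;
}

// Key format required by the checklist: findings--transcripts--{pageId}
function transcriptBlobKey(pageId: string): string {
  return `findings--transcripts--${pageId}`;
}

// Builds the stored record; textLength uses String.length (UTF-16 code units),
// matching extractedText.length in the controller.
function buildTranscriptRecord(
  pageId: string,
  extractedText: string,
  pdfUrl: string,
  extractedAt: string = new Date().toISOString(),
): TranscriptRecord {
  return { pageId, extractedText, extractedAt, textLength: extractedText.length, pdfUrl };
}

const record = buildTranscriptRecord("abc123", "Hello PDF", "https://example.com/a.pdf", "2024-01-01T12:00:00.000Z");
console.log(transcriptBlobKey(record.pageId)); // findings--transcripts--abc123
console.log(record.textLength); // 9
```

Keeping the key format in one function also lets a later reader fetch the transcript with blob.getJSON(transcriptBlobKey(pageId)) without re-typing the prefix.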
In /backend/controllers/tasks.controller.ts, add these helper functions:
/**
* Extract PDF URL from Notion page properties
*/
function extractPdfUrl(page: NotionPageProperties): string | null {
// Extract PDF property
const pdfProperty = page.properties?.PDF;
if (!pdfProperty || pdfProperty.type !== 'files') {
console.error("❌ No PDF property found or property is not of type 'files'");
return null;
}
const files = pdfProperty.files;
if (!files || files.length === 0) {
console.error("❌ No files found in PDF property");
return null;
}
// Get the first (and should be only) PDF file
const pdfFile = files[0];
if (!pdfFile) {
console.error("❌ PDF file is null or undefined");
return null;
}
// Get the file URL (handle both external and Notion-hosted files)
if (pdfFile.type === 'external') {
return pdfFile.external?.url || null;
} else if (pdfFile.type === 'file') {
return pdfFile.file?.url || null;
} else {
console.error("❌ Unknown PDF file type:", pdfFile.type);
return null;
}
}
/**
* Extract page ID from Notion webhook payload
*/
export function extractPageIdFromWebhook(webhookBody: WebhookPayload): string | null {
const pageId = webhookBody.data?.id;
if (!pageId) {
console.error("❌ No page ID found in webhook payload", JSON.stringify(webhookBody, null, 2));
return null;
}
return pageId;
}
Check that /backend/services/notion.service.ts contains these functions. If they don't exist, add them:
import { Client } from "npm:@notionhq/client";
const notion = new Client({ auth: Deno.env.get("NOTION_API_KEY") });
export async function getPageProperties(pageId: string) {
try {
const response = await notion.pages.retrieve({
page_id: pageId,
});
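// `as const` keeps `success` a literal type, so callers can narrow the result on it
return {
success: true as const,
data: response,
timestamp: new Date().toISOString(),
};
} catch (error) {
return {
success: false as const,
error: error instanceof Error ? error.message : String(error),
timestamp: new Date().toISOString(),
};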
}
}
export async function updatePageStatus(pageId: string, status: string, blobKey?: string) {
try {
const properties: any = {
Status: {
select: {
name: status
}
}
};
// Add blob key if provided
if (blobKey) {
properties["Blob Key"] = {
rich_text: [
{
text: {
content: blobKey
}
}
]
};
}
const response = await notion.pages.update({
page_id: pageId,
properties: properties
});
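// `as const` keeps `success` a literal type, so callers can narrow the result on it
return {
success: true as const,
data: response,
timestamp: new Date().toISOString(),
};
} catch (error) {
return {
success: false as const,
error: error instanceof Error ? error.message : String(error),
timestamp: new Date().toISOString(),
};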
}
}
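The `properties: any` object in updatePageStatus can be produced by a small typed builder instead. This sketch assumes, as the code above does, that the Status column is a Notion Select property and Blob Key is a rich text property. If the database uses Notion's built-in Status property type, the API expects `{ status: { name } }` rather than `{ select: { name } }`, which is what the hypothetical `kind` parameter switches. buildStatusProperties is not an existing function, and the property names must match the database column names exactly.

```typescript
// Subset of the Notion property-value shapes used by updatePageStatus.
type StatusProperty =
  | { select: { name: string } }
  | { status: { name: string } };

interface RichTextProperty {
  rich_text: Array<{ text: { content: string } }>;
}

// Hypothetical builder: "select" reproduces the code above; "status" targets
// a Notion Status-type column instead.
function buildStatusProperties(
  status: string,
  blobKey?: string,
  kind: "select" | "status" = "select",
): Record<string, StatusProperty | RichTextProperty> {
  const properties: Record<string, StatusProperty | RichTextProperty> = {
    Status: kind === "select" ? { select: { name: status } } : { status: { name: status } },
  };
  // Only write the Blob Key column when a key is supplied.
  if (blobKey) {
    properties["Blob Key"] = { rich_text: [{ text: { content: blobKey } }] };
  }
  return properties;
}

console.log(JSON.stringify(buildStatusProperties("Done", "findings--transcripts--abc123")));
// {"Status":{"select":{"name":"Done"}},"Blob Key":{"rich_text":[{"text":{"content":"findings--transcripts--abc123"}}]}}
```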
Check that /backend/utils/pdfExtractor.ts contains the extractTextFromPDFUrl function. If it doesn't exist, add it:
/**
* Extract text from a PDF file accessible via URL
*/
export async function extractTextFromPDFUrl(pdfUrl: string): Promise<string> {
try {
console.log(`📄 Downloading PDF from URL: ${pdfUrl}`);
// Download the PDF file
const response = await fetch(pdfUrl);
if (!response.ok) {
throw new Error(`Failed to download PDF: ${response.status} ${response.statusText}`);
}
const arrayBuffer = await response.arrayBuffer();
console.log(`📄 Downloaded PDF, size: ${arrayBuffer.byteLength} bytes`);
// Use pdfjs-dist to extract text
const pdfjsLib = await import("https://esm.sh/pdfjs-dist@4.0.379/legacy/build/pdf.mjs");
// Load the PDF document
const loadingTask = pdfjsLib.getDocument({ data: arrayBuffer });
const pdfDocument = await loadingTask.promise;
console.log(`📄 PDF loaded with ${pdfDocument.numPages} pages`);
// Extract text from all pages
let fullText = '';
for (let pageNum = 1; pageNum <= pdfDocument.numPages; pageNum++) {
const page = await pdfDocument.getPage(pageNum);
const textContent = await page.getTextContent();
// Combine text items into a single string
const pageText = textContent.items
.map((item: any) => item.str)
.join(' ');
fullText += pageText + '\n';
console.log(`📄 Extracted text from page ${pageNum}: ${pageText.length} characters`);
}
console.log(`✅ Total extracted text: ${fullText.length} characters`);
return fullText.trim();
} catch (error) {
console.error("❌ Error extracting text from PDF URL:", error);
throw new Error(`PDF text extraction failed: ${error instanceof Error ? error.message : String(error)}`);
}
}
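The extraction loop above joins every content item with a space, and marked-content entries (which carry no `str`) become empty strings in that join. A variant that skips those entries and honours the `hasEOL` flag that pdfjs sets on text items keeps line breaks closer to the PDF layout. This sketch works over minimal local types rather than the real pdfjs type definitions; joinPageText is a hypothetical helper that could replace the `.map(...).join(' ')` step.

```typescript
// Minimal local stand-ins for the items returned by page.getTextContent().
interface TextItemLike {
  str: string;
  hasEOL?: boolean;
}
interface MarkedContentLike {
  type: string;
  id?: string;
}
type ContentItem = TextItemLike | MarkedContentLike;

// Joins one page's text items: marked-content entries (no `str`) are skipped,
// and items flagged hasEOL end with a newline instead of a space.
function joinPageText(items: ContentItem[]): string {
  let text = "";
  for (const item of items) {
    if (!("str" in item)) continue;
    text += item.str;
    text += item.hasEOL ? "\n" : " ";
  }
  // Collapse repeated spaces/tabs and drop spaces that precede a line break.
  return text.replace(/[ \t]+/g, " ").replace(/ \n/g, "\n").trim();
}

const sample: ContentItem[] = [
  { str: "Interview", hasEOL: false },
  { type: "beginMarkedContent" },
  { str: "transcript", hasEOL: true },
  { str: "Page one", hasEOL: false },
];
console.log(JSON.stringify(joinPageText(sample))); // "Interview transcript\nPage one"
```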
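Use the fetch tool to test the endpoint. The webhook authentication middleware runs before the route, so include whatever credentials it expects. Because test-page-id-123 is not a real Notion page, this request exercises the Notion error path and should return 500 with a "Failed to get page properties" error: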
fetch("/tasks/blob", {
method: "POST",
body: JSON.stringify({
data: {
id: "test-page-id-123"
}
}),
headers: {
"Content-Type": "application/json"
}
})
Test with an invalid payload; this should return 400 with "Page ID is required":
fetch("/tasks/blob", {
method: "POST",
body: JSON.stringify({
data: {} // Missing id
}),
headers: {
"Content-Type": "application/json"
}
})
Use the requests tool to examine the execution logs:
requests("main.tsx")
After implementation, verify:
- [ ] Types file created: /backend/types/tasks.types.ts exists with all required interfaces
- [ ] Route handler added: the endpoint is in /backend/routes/tasks/_tasks.routes.ts before the catch-all handler
- [ ] Imports correct: all files have proper import statements with correct paths
- [ ] Controller function: processBlobExtraction exists in the tasks controller with proper typing
- [ ] Helper functions: extractPageIdFromWebhook and extractPdfUrl exist with proper typing
- [ ] Service functions: getPageProperties and updatePageStatus exist in the Notion service
- [ ] Utility function: extractTextFromPDFUrl exists in the PDF extractor utility
- [ ] Available endpoints updated: the new endpoint is listed in the catch-all handler
- [ ] Blob storage format: uses the key format findings--transcripts--{pageId}
- [ ] Error handling: proper HTTP status codes (400 for client errors, 500 for server errors)
- [ ] Logging consistency: all console.log statements use emoji prefixes
- [ ] Type safety: no any types used except where necessary for external APIs
Successful Request:
- Status: 200
- Response: { "success": true, "pageId": "abc123", "blobKey": "findings--transcripts--abc123", "textLength": 1234, "statusUpdated": true, "timestamp": "2024-01-01T12:00:00.000Z" }
Missing Page ID:
- Status: 400
- Response: { "error": "Page ID is required" }
No PDF Found:
- Status: 400
- Response: { "success": false, "pageId": "abc123", "error": "No PDF files found in PDF property", "timestamp": "2024-01-01T12:00:00.000Z" }
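Internal Error:
- Status: 500
- Response: { "error": "Internal server error", "details": "Specific error message", "timestamp": "2024-01-01T12:00:00.000Z" }
- Failures reported by the controller (for example, a Notion API error) also return 500, but in the controller's shape: success: false, pageId, error, timestamp, plus details when an exception was caught.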
This implementation follows the established patterns:
- Routes: Handle HTTP request/response only, delegate to controllers
- Controllers: Orchestrate business logic, call services and utilities
- Services: Make pure API calls to external systems (Notion)
- Utils: Provide reusable utility functions (PDF processing)
- Types: Centralized type definitions shared across modules
- Error Handling: Consistent success/error response patterns
- Logging: Emoji-prefixed console logs for easy debugging
The endpoint relies on the existing webhook authentication middleware, which runs before the route handler, and follows the same patterns as the other endpoints in the project.