tr3ntg

readback-api

API for readback.
7/2/2025
Viewing readonly version of main branch: v113
README.md

Speech API Wrapper with Document Text Extraction

This is a wrapper API for the LemonFox AI speech generation service that adds usage tracking, rate limiting, and document text extraction. It exposes only the essential parameters and handles authentication, usage tracking, and configuration internally.

Features

  • Authentication: RevenueCat subscription verification + admin bypass
  • Usage Tracking: Monthly character limits with SQLite storage
  • Rate Limiting: 4 million characters per calendar month per customer
  • Document Text Extraction: Extract clean text from .txt, .rtf, .docx, .md, and .pdf files for TTS
  • URL Content Extraction: Extract readable content from web pages using reader view algorithms
  • Modular Architecture: Separated middleware and database modules

Project Structure

├── backend/
│   ├── index.ts              # Main Hono app
│   ├── middleware/
│   │   ├── auth.ts          # Authentication middleware
│   │   └── usage.ts         # Usage tracking middleware
│   ├── database/
│   │   └── usage.ts         # Database operations
│   └── README.md
├── main.tsx                  # Frontend (if applicable)
└── README.md

Setup

Required Environment Variables

  1. LEMONFOX_API_KEY - Your LemonFox API key
  2. REVENUECAT_API_KEY - Your RevenueCat API key for subscription verification
  3. ADMIN_ACCESS_KEY - Admin bypass key for development/testing
  4. PDFVECTOR_API_KEY - Your PDFVector API key for PDF text extraction

Configuration

Update the REVENUECAT_PROJECT_ID constant in /backend/middleware/auth.ts with your actual RevenueCat project ID.

Authentication

All API endpoints (except /health) require authentication via the Authorization header:

Admin Access

Authorization: Bearer YOUR_ADMIN_ACCESS_KEY

Customer Access

Authorization: Bearer CUSTOMER_ID

or

Authorization: Customer CUSTOMER_ID

The API will verify the customer has active entitlements via RevenueCat before allowing access.
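
As a rough illustration of the two header forms above, the sketch below parses an Authorization header and decides whether the caller is an admin or a customer. The names `Caller` and `parseAuthorization` are hypothetical, and the actual middleware in /backend/middleware/auth.ts may work differently. The RevenueCat entitlement check is not shown.

```typescript
// Hypothetical types and helper for illustration only.
type Caller =
  | { kind: "admin" }
  | { kind: "customer"; customerId: string };

function parseAuthorization(
  header: string | null | undefined,
  adminKey: string,
): Caller | null {
  if (!header) return null;
  // Accept "Bearer <value>" or "Customer <value>" (scheme is case-insensitive here, an assumption).
  const match = header.trim().match(/^(Bearer|Customer)\s+(\S+)$/i);
  if (!match) return null;
  const scheme = match[1].toLowerCase();
  const credential = match[2];
  // Only the Bearer scheme can carry the admin key.
  if (scheme === "bearer" && adminKey !== "" && credential === adminKey) {
    return { kind: "admin" };
  }
  // Anything else is treated as a customer ID that still needs
  // to be verified against RevenueCat before access is granted.
  return { kind: "customer", customerId: credential };
}
```

A `null` result would correspond to a 401 response; a customer result would proceed to the subscription check.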

Usage Limits

  • Monthly Limit: 4,000,000 characters per customer
  • Reset Period: Calendar month (1st to last day of month)
  • Admin Users: Unlimited usage
  • Tracking: Automatic usage recording after successful requests
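
The limit check described above can be sketched as follows. This is a minimal illustration, not the project's code: `monthWindow` and `checkLimit` are hypothetical names, and the use of UTC for month boundaries is an assumption. The returned fields mirror the `details` object of the rate-limit error response.

```typescript
const MONTHLY_LIMIT = 4_000_000;

// Start (inclusive) and end (exclusive) of the calendar month containing `now`.
// UTC boundaries are an assumption; the README does not name a time zone.
function monthWindow(now: Date): { start: Date; end: Date } {
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth();
  return {
    start: new Date(Date.UTC(year, month, 1)),
    end: new Date(Date.UTC(year, month + 1, 1)), // month 12 rolls over to next January
  };
}

// Decide whether a request of `requested` characters fits in the remaining budget.
function checkLimit(currentUsage: number, requested: number, limit: number = MONTHLY_LIMIT) {
  const remaining = Math.max(0, limit - currentUsage);
  return {
    allowed: requested <= remaining,
    monthly_limit: limit,
    current_usage: currentUsage,
    requested_characters: requested,
    remaining_characters: remaining,
  };
}
```

With 3,950,000 characters already used, a 100,000-character request would be rejected because only 50,000 characters remain.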

Endpoints

POST /api/speech

Generates speech from text using the LemonFox AI service.

Request Body:

{ "voice": "sarah", "input": "Text to convert to speech" }

Response:

{
  "audio": "_base_64_encoded_audio_here",
  "word_timestamps": [
    { "word": "Hello!", "start": 0.275, "end": 0.7 }
  ]
}

Error Response (Rate Limited):

{
  "error": "Monthly character limit exceeded",
  "details": {
    "monthly_limit": 4000000,
    "current_usage": 3950000,
    "requested_characters": 100000,
    "remaining_characters": 50000
  }
}
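
For illustration, a client might assemble the request to this endpoint as sketched below. `buildSpeechRequest` is a hypothetical helper, the base URL is a placeholder, and sending the body as `application/json` is an assumption based on the request body shown above.

```typescript
// Hypothetical client-side helper: builds the URL and fetch options for POST /api/speech.
function buildSpeechRequest(baseUrl: string, token: string, voice: string, input: string) {
  return {
    url: new URL("/api/speech", baseUrl).toString(),
    init: {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      // Only voice and input are sent; the API sets the remaining options itself.
      body: JSON.stringify({ voice, input }),
    },
  };
}

// Usage (performs a network call, so it is left commented out):
// const { url, init } = buildSpeechRequest("https://example.invalid", "CUSTOMER_ID", "sarah", "Hello!");
// const res = await fetch(url, init);
// if (res.status === 429) console.warn((await res.json()).details);
```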

POST /api/extract-text

Extracts clean text content from uploaded documents (.txt, .rtf, .docx, .md, .pdf) or from web URLs for TTS processing.

File Upload Method

Request:

  • Method: POST
  • Content-Type: multipart/form-data
  • Body: Form data with 'file' field containing the document

Supported File Types:

  • .txt - Plain text files
  • .rtf - Rich Text Format files
  • .docx - Microsoft Word documents
  • .md - Markdown files
  • .pdf - PDF documents

File Size Limit: 10MB
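
A pre-upload check for the supported types and size limit could look like the sketch below. `validateUpload` and its error messages are hypothetical, and treating "10MB" as 10 × 1024 × 1024 bytes is an assumption.

```typescript
const SUPPORTED_EXTENSIONS: readonly string[] = ["txt", "rtf", "docx", "md", "pdf"];
const MAX_FILE_BYTES = 10 * 1024 * 1024; // assumes "10MB" means 10 MiB

type UploadCheck =
  | { ok: true; fileType: string }
  | { ok: false; error: string };

// Hypothetical helper: checks the file extension and size before uploading.
function validateUpload(filename: string, sizeBytes: number): UploadCheck {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  if (!SUPPORTED_EXTENSIONS.includes(ext)) {
    return { ok: false, error: `Unsupported file type: .${ext}` };
  }
  if (sizeBytes > MAX_FILE_BYTES) {
    return { ok: false, error: "File exceeds the 10MB limit" };
  }
  return { ok: true, fileType: ext };
}
```

A file that passes this check would then be appended to a `FormData` object under the `file` field and sent as `multipart/form-data`.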

Processing Details:

  • Markdown files: Processed locally using the remove-markdown package
    • Strips all Markdown formatting (headers, links, lists, etc.)
    • Preserves alt text from images
    • Supports GitHub-Flavored Markdown
  • PDF files: Processed via the PDFVector API
    • Uses AI enhancement when needed for complex layouts
    • Consumes 1-2 credits per page
    • Requires valid PDFVECTOR_API_KEY

URL Extraction Method

Request:

  • Method: POST
  • Content-Type: application/json
  • Body: JSON object with URL

{ "url": "https://example.com/article" }

URL Requirements:

  • Must be HTTP or HTTPS protocol
  • Must be publicly accessible
  • Should contain article-like content for best results
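
The protocol requirement above can be checked with the standard `URL` class, as in this minimal sketch. `validateExtractionUrl` is a hypothetical name, and the message for a wrong protocol is invented for illustration; only "Invalid URL format" appears in the documented error responses. Public reachability cannot be checked without making a request and is not covered here.

```typescript
type UrlCheck =
  | { ok: true; url: URL }
  | { ok: false; error: string };

// Hypothetical helper: accepts only well-formed http(s) URLs.
function validateExtractionUrl(raw: string): UrlCheck {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return { ok: false, error: "Invalid URL format" };
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    // Illustrative message, not taken from the API.
    return { ok: false, error: "URL must use HTTP or HTTPS" };
  }
  return { ok: true, url };
}
```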

Response Format (Both Methods)

Success Response:

{
  "text": "Extracted clean text content suitable for TTS processing...",
  "filename": "document.pdf",
  "fileType": "pdf",
  "wordCount": 1250,
  "characterCount": 6890
}

For URL extraction, fileType will be "url" and filename will be the page title or domain name.
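
The success response can be described with a small type, and the two counts could be derived as in the sketch below. This is only one plausible way to compute them: splitting on whitespace for `wordCount` and using the string length for `characterCount` are assumptions, and `buildExtractResult` is a hypothetical name.

```typescript
interface ExtractTextResult {
  text: string;
  filename: string;
  fileType: string; // "txt", "rtf", "docx", "md", "pdf", or "url"
  wordCount: number;
  characterCount: number;
}

// Hypothetical helper: wraps extracted text with the metadata fields shown above.
function buildExtractResult(text: string, filename: string, fileType: string): ExtractTextResult {
  const trimmed = text.trim();
  return {
    text,
    filename,
    fileType,
    // Words are counted as whitespace-separated tokens (an assumption).
    wordCount: trimmed === "" ? 0 : trimmed.split(/\s+/).length,
    // Characters are counted as UTF-16 code units via string length (an assumption).
    characterCount: text.length,
  };
}
```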

Error Responses:

{ "error": "PDFVector API key not configured. Please set PDFVECTOR_API_KEY environment variable." }
{ "error": "PDF processing error: Document processing timed out", "code": "timeout-error" }
{ "error": "Failed to fetch URL: 404 Not Found" }
{ "error": "Invalid URL format" }

Text Processing:

  • Removes excessive whitespace and formatting artifacts
  • Strips common header/footer patterns and page numbers
  • Optimizes text flow for natural TTS reading
  • Handles complex document structures (tables, lists, etc.)
  • For URLs: Extracts main article content using reader view algorithms
  • For Markdown: Converts to plain text while preserving readability
  • For PDFs: Uses AI when needed to handle complex layouts and maintain reading order
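
To make the first two points concrete, here is a minimal sketch of whitespace cleanup and page-number removal. It is an illustration under simple assumptions, not the service's actual pipeline: `cleanForTts` is a hypothetical name, and the page-number pattern only catches bare numbers and lines such as "Page 3 of 10".

```typescript
// Hypothetical cleanup pass for extracted text.
function cleanForTts(raw: string): string {
  return raw
    .replace(/\r\n?/g, "\n") // normalize line endings
    .split("\n")
    .map((line) => line.replace(/[ \t]+/g, " ").trim()) // collapse runs of spaces and tabs
    // Drop lines that look like bare page numbers, e.g. "12" or "Page 3 of 10".
    .filter((line) => !/^(page\s+)?\d+(\s+of\s+\d+)?$/i.test(line))
    .join("\n")
    .replace(/\n{3,}/g, "\n\n") // keep at most one blank line between paragraphs
    .trim();
}
```

A rule this simple would also delete a legitimate line that consists only of a number, so a real implementation would likely need more context than a single line.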

GET /api/usage

Returns usage statistics for the authenticated customer. The response varies by user tier.

Premium User Response:

{
  "user_tier": "premium",
  "monthly_limit": 600000,
  "current_usage": 150000,
  "remaining_characters": 450000,
  "usage_percentage": 25,
  "reset_date": "2024-02-01T00:00:00.000Z"
}

Free User Response:

{
  "user_tier": "free",
  "lifetime_limit": 25000,
  "current_usage": 5000,
  "remaining_characters": 20000,
  "usage_percentage": 20,
  "message": "Upgrade to premium for monthly limits and unlimited usage."
}

Admin User Response (no query parameter):

{ "message": "Admin users have unlimited usage", "is_admin": true }

Admin Query Parameter:

GET /api/usage?customer_id=specific_customer_id

Allows admins to check usage for any specific customer ID.
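
The derived fields in the premium response can be reproduced with a few lines of arithmetic. The sketch below is an assumption about how they relate, not the endpoint's actual code: `premiumUsageSummary` is a hypothetical name, the percentage is rounded to the nearest integer, and the reset date is taken as the first day of the next month in UTC.

```typescript
// Hypothetical helper: derives the premium usage fields from raw numbers.
function premiumUsageSummary(currentUsage: number, monthlyLimit: number, now: Date) {
  const remaining = Math.max(0, monthlyLimit - currentUsage);
  const percentage = monthlyLimit > 0
    ? Math.round((currentUsage / monthlyLimit) * 100)
    : 0;
  // First instant of the following calendar month (UTC is an assumption).
  const reset = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth() + 1, 1));
  return {
    user_tier: "premium",
    monthly_limit: monthlyLimit,
    current_usage: currentUsage,
    remaining_characters: remaining,
    usage_percentage: percentage,
    reset_date: reset.toISOString(),
  };
}
```

Called with 150,000 characters used, a limit of 600,000, and a date in January 2024, this yields the same values as the premium example above.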

GET /health

Health check endpoint that returns the service status.

Response:

{ "status": "ok", "timestamp": "2024-01-01T00:00:00.000Z" }

GET /

Root endpoint that provides basic API information.

Response:

{
  "message": "Text Extraction API",
  "version": "1.0.0",
  "endpoints": {
    "health": "/health",
    "speech": "/api/speech",
    "extractText": "/api/extract-text",
    "usage": "/api/usage"
  },
  "supportedFileTypes": ["txt", "rtf", "docx", "md", "pdf"],
  "timestamp": "2024-01-01T00:00:00.000Z"
}

Database Schema

The system uses SQLite to track usage with the following schema:

Table: customer_usage_v1

  • id - Auto-incrementing primary key
  • customer_id - Customer identifier from RevenueCat
  • character_count - Number of characters in the request
  • request_timestamp - ISO timestamp of the request
  • created_at - Database insertion timestamp
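
As a sketch, the table above might be declared and queried with SQL along these lines. The column types, the default on `created_at`, and the helper name `monthlyUsageQuery` are assumptions; only the table and column names come from this README. The statements are shown as plain strings so they can be passed to whichever SQLite client the project uses.

```typescript
// Assumed column types; the README lists only the column names and meanings.
const CREATE_USAGE_TABLE_SQL = `
CREATE TABLE IF NOT EXISTS customer_usage_v1 (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  customer_id TEXT NOT NULL,
  character_count INTEGER NOT NULL,
  request_timestamp TEXT NOT NULL,
  created_at TEXT DEFAULT CURRENT_TIMESTAMP
)`;

// Hypothetical helper: builds a parameterized query that sums one customer's
// usage for the calendar month containing `now` (UTC boundaries assumed).
function monthlyUsageQuery(customerId: string, now: Date) {
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth();
  const start = new Date(Date.UTC(year, month, 1)).toISOString();
  const end = new Date(Date.UTC(year, month + 1, 1)).toISOString();
  return {
    sql: "SELECT COALESCE(SUM(character_count), 0) AS total " +
      "FROM customer_usage_v1 " +
      "WHERE customer_id = ? AND request_timestamp >= ? AND request_timestamp < ?",
    args: [customerId, start, end],
  };
}
```

Comparing ISO timestamps as text works here because every value uses the same fixed-width UTC format.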

Configuration

The API automatically configures:

  • response_format: Always set to "mp3"
  • word_timestamps: Always set to true

Only voice and input parameters need to be provided by the client.
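
In other words, the wrapper merges the two client fields with fixed options before calling the upstream service. A minimal sketch, assuming a hypothetical helper named `buildUpstreamPayload`:

```typescript
interface ClientSpeechRequest {
  voice: string;
  input: string;
}

// Hypothetical helper: builds the upstream payload from the client request.
// Any other fields the client sends are ignored, so the fixed options cannot be overridden.
function buildUpstreamPayload(client: ClientSpeechRequest) {
  return {
    voice: client.voice,
    input: client.input,
    response_format: "mp3",
    word_timestamps: true,
  };
}
```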

External Services

PDFVector API

  • Used for PDF text extraction
  • Pricing: 1-2 credits per page depending on AI usage
  • AI Enhancement: Automatically enabled for complex layouts
  • Timeout: 3 minutes maximum per document
  • Supported formats: PDF and Word documents

Error Codes

  • 400 - Bad Request (missing required fields, invalid file type)
  • 401 - Unauthorized (missing/invalid auth header)
  • 403 - Forbidden (no active subscription)
  • 429 - Too Many Requests (monthly limit exceeded)
  • 500 - Internal Server Error
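
A client could turn these status codes into readable messages with a small lookup, as in the sketch below. `STATUS_MEANINGS` and `explainStatus` are hypothetical names; the descriptions are taken from the list above.

```typescript
const STATUS_MEANINGS: Record<number, string> = {
  400: "Bad Request (missing required fields, invalid file type)",
  401: "Unauthorized (missing/invalid auth header)",
  403: "Forbidden (no active subscription)",
  429: "Too Many Requests (monthly limit exceeded)",
  500: "Internal Server Error",
};

// Hypothetical helper: maps an HTTP status from the API to a human-readable description.
function explainStatus(status: number): string {
  return STATUS_MEANINGS[status] ?? `Unexpected status ${status}`;
}
```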

Dependencies

  • hono - Web framework
  • mammoth - DOCX text extraction
  • rtf-parser - RTF text extraction
  • remove-markdown - Markdown to plain text conversion
  • pdfvector - PDF text extraction via API
  • @extractus/article-extractor - Web page content extraction