✅ - develop main.tsx ✅ - find every div that contains a post. ✅ - the exampleDiv.html file consists of a reference of what one of those post divs should look like. it represents a set of author's reviews to a pop song. ✅ - identify the title and artist of the song. In the example div the title is "Family Matters" and the artist is Skye Newman ✅ - identify the href for the entry ✅ - identify the overall score ([5.73] in the example div) ✅ - for each paragraph element that represents a review, identify the author's name, their url, the text of the their review, and the score they gave the review.
✅ the resulting output should be an array reviews of elements where each element has the following structure:
artist, song_title, href, overall_score, reviewer, reviewer_url, review_text, review_score
✅ for now, just console log the output ✅ parse scores as numbers (not strings with brackets)
The scraper successfully:
overall_score: 3.23, review_score: 5Current status: Working on page 2, extracting 19 reviews from 10 posts.
Next steps:
✅ - add a song_index integer field that starts at 0 so that each song has its own unique id. ✅ - Uncomment the loop but use 10 for pagesTotal. ✅ - Create a SQLite table to cache the results (drop and replace if the table already exists) ✅ - Change main.tsx to a script val. ✅ - create a results.tsx endpoint val that serves the JSON data from the sqlite table.
The scraper now:
singles_jukebox_reviewsUsage:
main.tsx to scrape and cache data (script val)results.tsx to get the JSON API of all cached reviews (HTTP val)Database Schema:
CREATE TABLE singles_jukebox_reviews (
id INTEGER PRIMARY KEY AUTOINCREMENT,
song_index INTEGER NOT NULL,
artist TEXT NOT NULL,
song_title TEXT NOT NULL,
href TEXT NOT NULL,
overall_score REAL,
reviewer TEXT NOT NULL,
reviewer_url TEXT NOT NULL,
review_text TEXT NOT NULL,
review_score INTEGER NOT NULL
)