4 attractions: Empire State Building (ESB), Edge Hudson Yards (Edge), Summit One Vanderbilt (Summit), Top of the Rock (TOR).
1 scrape per day. Scraper enters each attraction's public booking widget and reads every tour time and price. Public data only, no private API endpoints.
Playwright (Microsoft browser automation) drives a real Chromium browser. Standard clicks, keystrokes, waits. Same behavior as a guest booking manually.
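A minimal sketch of the collection step, assuming Playwright's Python sync API. The selectors (.tour-time, .time, .price) and the function names are hypothetical placeholders, not the real widget markup or the production scraper.

```python
import re
from typing import Optional


def parse_price_cents(text: str) -> Optional[int]:
    """Convert a displayed price such as "$49.00" to integer cents; None if no price is shown."""
    match = re.search(r"\$\s*(\d+)(?:\.(\d{2}))?", text)
    if match is None:
        return None
    dollars, cents = match.group(1), match.group(2) or "00"
    return int(dollars) * 100 + int(cents)


def scrape_tour_times(url: str) -> list:
    """Open a booking widget in Chromium and read each tour time and price."""
    # Imported lazily so parse_price_cents works without Playwright installed.
    from playwright.sync_api import sync_playwright

    rows = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        page.wait_for_selector(".tour-time")  # hypothetical selector
        for slot in page.locator(".tour-time").all():
            rows.append({
                "tour_time": slot.locator(".time").inner_text(),  # hypothetical selector
                "price_cents": parse_price_cents(
                    slot.locator(".price").inner_text()  # hypothetical selector
                ),
            })
        browser.close()
    return rows
```

Text without a dollar amount (for example a "Sold Out" label) parses to None, which lines up with a null price_cents.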
TOR uses bot protection. Scraper presents a standard browser identity to clear it.
Forward windows vary by attraction. ESB sells 173 days out (~6 months). Summit sells 200 days (~6.5 months). Edge and TOR sell 265 days (~9 months).
Apr 12 2026 scrape: ESB 9,568 rows across 172 dates. Edge 20,868 rows across 267 dates. Summit 5,371 rows across 199 dates. TOR 22,929 rows across 264 dates.
price_cents: base price in cents, before booking fee. Null if unavailable or sold out.
status: available, sold_out, going_fast
scrape_date: date row was captured
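One way the three fields above could sit in a SQLite table. The table name and the attraction, travel_date, and tour_time columns are assumptions added for illustration; only price_cents, status, and scrape_date come from the field list. The inserted row is illustrative, not scraped data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS tour_prices (
    attraction   TEXT NOT NULL,  -- assumed column
    travel_date  TEXT NOT NULL,  -- assumed column, ISO date
    tour_time    TEXT NOT NULL,  -- assumed column, HH:MM
    price_cents  INTEGER,        -- base price before booking fee; NULL if unavailable or sold out
    status       TEXT NOT NULL
                 CHECK (status IN ('available', 'sold_out', 'going_fast')),
    scrape_date  TEXT NOT NULL,  -- date the row was captured
    PRIMARY KEY (attraction, travel_date, tour_time, scrape_date)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the table if needed and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


INSERT_ROW = "INSERT INTO tour_prices VALUES (?, ?, ?, ?, ?, ?)"

conn = open_db()
# Illustrative values only.
conn.execute(INSERT_ROW, ("ESB", "2026-05-01", "12:00", 4400, "available", "2026-04-12"))
```

The CHECK constraint rejects any status outside the three listed values at insert time.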
Making Prices Comparable
Only the base price is stored. Booking fees not in the JSON. All-in price computed at display time.
Standard: single adult general admission (GA), same travel date.
Tour time intervals vary: ESB every 15 min, Summit every 30 min, Edge and TOR every 10 min. Cross-attraction comparisons match to nearest available tour time within ±30 min.
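The ±30 min matching rule can be sketched as below. Function names are hypothetical, and resolving ties to the earlier tour time is an assumption; the source does not state a tie-break.

```python
from datetime import datetime
from typing import Optional


def to_minutes(hhmm: str) -> int:
    """Minutes after midnight for a 24-hour "HH:MM" string."""
    t = datetime.strptime(hhmm, "%H:%M")
    return t.hour * 60 + t.minute


def nearest_tour_time(target: str, candidates: list, tolerance_min: int = 30) -> Optional[str]:
    """Return the candidate closest to target within the tolerance, else None."""
    best, best_gap = None, None
    for candidate in sorted(candidates):
        gap = abs(to_minutes(candidate) - to_minutes(target))
        # Strict "<" keeps the earlier candidate when two are equally close (assumed tie-break).
        if gap <= tolerance_min and (best_gap is None or gap < best_gap):
            best, best_gap = candidate, gap
    return best
```

Example: an ESB 17:15 tour time against 10-minute competitor slots 17:00, 17:10, 17:20 matches 17:10. A lone 18:00 candidate is 45 minutes away and returns None.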
Booking fees: ESB $5 flat per order. Edge $2 per ticket. Summit $3 flat per order. TOR fee embedded in displayed price.
1-ticket all-in (Apr 2026): ESB $49, Edge $42, Summit $47, TOR $42. ESB is $2 to $7 higher at 1 ticket ($2 over Summit, $7 over Edge and TOR). Fee math shifts at larger party sizes.
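A sketch of the display-time fee math, using the fee structures listed above. The base prices in the usage note are back-derived from the stated 1-ticket all-in figures (all-in minus fee), not read from the data, and the names FEES and all_in_cents are hypothetical.

```python
# Fee structures in cents, from the booking-fee notes above.
FEES = {
    "ESB":    {"per_order": 500, "per_ticket": 0},
    "Edge":   {"per_order": 0,   "per_ticket": 200},
    "Summit": {"per_order": 300, "per_ticket": 0},
    "TOR":    {"per_order": 0,   "per_ticket": 0},  # fee embedded in displayed price
}


def all_in_cents(attraction: str, base_cents: int, tickets: int = 1) -> int:
    """Order total: base price times tickets, plus per-ticket and per-order fees."""
    fee = FEES[attraction]
    return base_cents * tickets + fee["per_ticket"] * tickets + fee["per_order"]


def per_ticket_all_in_cents(attraction: str, base_cents: int, tickets: int = 1) -> float:
    """Order total divided evenly across the party."""
    return all_in_cents(attraction, base_cents, tickets) / tickets
```

With back-derived bases of $44 (ESB), $40 (Edge), $44 (Summit), and $42 (TOR), one ticket reproduces $49, $42, $47, $42. At four tickets the flat per-order fees dilute: ESB falls to $45.25 per ticket and Summit to $44.75, while Edge and TOR stay at $42.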
Anomaly Handling
Sold-out tour times carry forward the last known price. Shown as "Sold Out" on tracker, excluded from averages, retained for same-date comparisons.
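A sketch of carry-forward for one tour time tracked across scrape dates. It assumes each observation is a (scrape_date, price_cents, status) tuple with price_cents set to None once sold out. The tuple layout and function names are hypothetical.

```python
def carry_forward(observations: list) -> list:
    """Fill sold-out rows with the last known price.

    observations must be sorted by scrape_date, oldest first.
    """
    last_known = None
    filled = []
    for scrape_date, price_cents, status in observations:
        if price_cents is not None:
            last_known = price_cents
        elif status == "sold_out":
            price_cents = last_known  # stays None if no price was ever seen
        filled.append((scrape_date, price_cents, status))
    return filled


def average_price_cents(rows: list):
    """Mean price over rows that are not sold out; None if there are none."""
    prices = [p for _, p, status in rows if status != "sold_out" and p is not None]
    return sum(prices) / len(prices) if prices else None
```

The carried price stays on the row for same-date comparisons, while the average skips sold-out rows.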
Zero-row scrapes block publish. Yesterday's data stays live until the next successful run.
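The validation gate might look like the sketch below. It assumes that any single attraction returning zero rows blocks the whole publish; the source does not say whether the gate is per attraction or per run. Function names are hypothetical.

```python
def should_publish(row_counts: dict) -> bool:
    """True only if every attraction returned at least one row."""
    return bool(row_counts) and all(count > 0 for count in row_counts.values())


def select_export(new_export, live_export, row_counts: dict):
    """Return the export to serve: the new one if the gate passes, else yesterday's."""
    return new_export if should_publish(row_counts) else live_export
```

With the Apr 12 2026 counts (9,568 / 20,868 / 5,371 / 22,929) the gate passes. If any count were 0, the live export would stay in place.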
Failure alerts fire via Gmail (OAuth) within seconds. Alert includes attraction name, row counts, and traceback.
Outliers flagged, not deleted. Threshold: price more than 3 standard deviations from 30-day rolling mean. Real price spikes (holidays, competitive moves) appear as outliers on day one but are retained.
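A sketch of the 3-standard-deviation flag. It assumes the 30-day window is passed in as a list of trailing prices, uses the sample standard deviation, and treats any change from a perfectly flat window as an outlier. Those three choices are assumptions, not confirmed details of the pipeline.

```python
from statistics import mean, stdev


def is_outlier(price_cents: int, trailing_prices: list, threshold: float = 3.0) -> bool:
    """Flag a price more than `threshold` standard deviations from the trailing mean."""
    if len(trailing_prices) < 2:
        return False  # not enough history to estimate spread
    mu = mean(trailing_prices)
    sigma = stdev(trailing_prices)
    if sigma == 0:
        # Flat history: any different price is flagged (assumed behavior).
        return price_cents != mu
    return abs(price_cents - mu) > threshold * sigma
```

The function only returns a flag. The row itself is kept either way, matching the flagged-not-deleted rule.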
Known closures (Summit private-event days) render as "Closed" in the tracker.
Sunset Detection
Sunset premium = peak evening price minus same-date noon price. Noon is the baseline because every attraction sells a noon tour time and noon prices are never sunset-inflated.
ESB: Tour times labeled "Sunset" or "Twilight" in the booking widget. Peak price in that window minus noon price.
TOR: Tour times labeled "Sunset" in the booking widget (e.g. "5:00 PM Sunset"). 9 price tiers at $3 intervals ($42 to $71). Dynamic by date.
Edge: Tour times labeled "Sunset" in the booking widget. Peak price in 4:00 PM to 9:00 PM window minus noon. 26 distinct prices observed, $1 steps.
Summit: No sunset label. Two price bands per date: floor and peak. $13 step on 199 of 200 dates (99.5%). Peak band = sunset window.
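The premium calculation above, sketched for one attraction on one date. It assumes prices arrive as a mapping from zero-padded 24-hour "HH:MM" to price_cents, and that the sunset tour times have already been picked out by label or by clock window. Function names and example prices are hypothetical.

```python
def times_in_window(prices: dict, start: str = "16:00", end: str = "21:00") -> list:
    """Tour times inside a clock window (the 4:00 PM to 9:00 PM window used for Edge)."""
    # Zero-padded HH:MM strings sort chronologically, so string comparison works.
    return sorted(t for t in prices if start <= t <= end)


def sunset_premium_cents(prices: dict, sunset_times, noon: str = "12:00"):
    """Peak price among the sunset tour times minus the same-date noon price."""
    noon_price = prices.get(noon)
    evening = [prices[t] for t in sunset_times if prices.get(t) is not None]
    if noon_price is None or not evening:
        return None
    return max(evening) - noon_price


def band_step_cents(prices: dict):
    """Summit-style step: peak band minus floor band for the date."""
    known = [p for p in prices.values() if p is not None]
    return max(known) - min(known) if known else None
```

Example with made-up prices: noon at $40 and an in-window peak of $52 gives a 1200-cent premium, and a later $60 tour time outside the window is ignored. For Summit, a $44 floor and $57 peak give the $13 step.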
AI Utilization
Claude Code (Anthropic): Full system architecture, scraper implementation, schema design, data pipeline, static site (5 pages), sunset detection algorithm, case study document.
48-hour build. Manual estimate: 5 to 6 weeks with a 2-person team.
Competitor API discovery: Some booking platforms expose API endpoints queryable at scale. Querying them likely violates terms of service. Production collection uses only public booking widget interaction, equivalent to manual guest behavior.
AI limitations: Cannot predict competitor pricing intent, cannot capture login-gated rates, cannot forecast site structure changes.
Limitations
Prices reflect scrape-time widget display. Fees layered in UI, not stored. Checkout totals can differ if fee structure changes between runs.
No pre-launch baseline. Tool created April 2026. History grows one day per run.
TOR uses bot protection. Scraper clears it with a standard browser identity. Site-side changes may require re-tuning.
IP throttling or site rebuilds can break a scraper without warning. Validation gate prevents empty runs from overwriting good data. Gaps possible until patched.
Reseller pricing (Viator, GetYourGuide) not captured. Direct ticket pages only. Adding resellers: 1-month build.
Forward windows differ. ESB 173 days (~6 months), Summit 200 days (~6.5 months), Edge and TOR 265 days (~9 months). Comparative analysis limited to the ESB window.
Assumptions
GA only: All comparisons use general-admission pricing. Premium tiers (Express, VIP, Sunrise) excluded.
Single adult: One ticket, one adult. No child/senior rates, no group discounts.
Base price: Price before booking fees. All-in computed at display time using known fee structures.
Same travel date: Cross-attraction comparisons always use the same calendar date.
Nearest tour time: When comparing sunset pricing, each ESB tour time matched to nearest competitor tour time within ±30 min.
Noon baseline: Sunset premium = peak evening price minus noon price. Noon chosen because all attractions sell it and none price it as sunset.
Sold-out carry-forward: When a tour time sells out, last known price retained for trend analysis.
No reseller pricing: Direct booking widget only. OTA prices (Viator, GetYourGuide) not captured.
Infrastructure & Deployment
Current build (pilot): Local machine, SQLite, JSON export, GitHub Pages, shared password. Adequate for 48-hour demo. Not production-grade.
Help chatbot (bottom-right corner): Plain-language Q&A on pricing data, sunset logic, and methodology.
Production build: Azure tenant. Container Apps Jobs (daily cron). PostgreSQL (audit, backups). Blob Storage + CDN (JSON). Static Web Apps + Entra ID SSO (IT-provisioned access). Key Vault (secrets). Monitor (alerting).
Cost: $58 to $99/month.
GitHub Actions: Workflow in repo, dormant. Enable in settings to activate daily scrape.