The Technology
The ML Infrastructure Behind the Data.
How Reelgood ingests 55M+ raw streaming entities, resolves them through a proprietary ML pipeline, and delivers 4.2M verified, deduplicated titles, each with a single canonical ID, in real time.
How It Works
From Raw Signal to Verified Record.
Every data point that enters the Reelgood pipeline passes through four stages before it becomes part of the verified catalog. No manual intervention at scale. No batch-and-wait. Just a continuous, automated process that keeps the data current.
01. Ingest
Raw entity collection
Data is collected continuously from 300+ streaming services across 25+ countries. Each service delivers its own identifiers, naming conventions, and catalog structure. The pipeline ingests all of it without requiring any pre-normalization.
55M+ entities
02. Normalize
Signal extraction and cleaning
Metadata signals are extracted and standardized across sources: cast, crew, runtime, synopsis, release date, original language, production studio, and regional title variants. Inconsistencies across providers are resolved at this stage.
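As a rough illustration of this stage, the sketch below maps two hypothetical provider payloads onto one common set of signals. The provider names, their field names (`name`, `duration`, `premiere`, and so on), and the cleaning rules are assumptions for illustration, not Reelgood's schema or pipeline.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical per-provider field names mapped onto one common vocabulary.
FIELD_MAPS = {
    "provider_a": {"title": "name", "runtime": "duration",
                   "release": "premiere", "cast": "actors"},
    "provider_b": {"title": "title", "runtime": "runtime_minutes",
                   "release": "year", "cast": "cast"},
}


@dataclass(frozen=True)
class Signals:
    title: str
    runtime_min: Optional[int]
    release_year: Optional[int]
    cast: frozenset


def normalize_text(raw: str) -> str:
    """Strip accents and punctuation, casefold, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.casefold())
    return " ".join(text.split())


def parse_runtime(raw: Union[int, str, None]) -> Optional[int]:
    """Accept 105, "105", "105 min", or "1h 45m"; return minutes."""
    if raw is None or isinstance(raw, int):
        return raw
    raw = raw.strip().lower()
    if raw.isdigit():
        return int(raw)
    hours = re.search(r"(\d+)\s*h", raw)
    minutes = re.search(r"(\d+)\s*m", raw)
    if not (hours or minutes):
        return None
    return (int(hours.group(1)) * 60 if hours else 0) + (int(minutes.group(1)) if minutes else 0)


def parse_year(raw: Union[int, str, None]) -> Optional[int]:
    """Pull a four-digit year out of an int, an ISO date, or a free-form date."""
    if raw is None or isinstance(raw, int):
        return raw
    match = re.search(r"\b(1[89]\d{2}|20\d{2})\b", raw)
    return int(match.group(1)) if match else None


def normalize(provider: str, raw: dict) -> Signals:
    fields = FIELD_MAPS[provider]
    return Signals(
        title=normalize_text(raw.get(fields["title"]) or ""),
        runtime_min=parse_runtime(raw.get(fields["runtime"])),
        release_year=parse_year(raw.get(fields["release"])),
        cast=frozenset(normalize_text(n) for n in (raw.get(fields["cast"]) or [])),
    )


a = normalize("provider_a", {"name": "Amélie", "duration": "2h 2m", "premiere": "2001-04-25",
                             "actors": ["Audrey Tautou", "Mathieu Kassovitz"]})
b = normalize("provider_b", {"title": "AMELIE", "runtime_minutes": 122, "year": 2001,
                             "cast": ["audrey tautou", "Mathieu Kassovitz"]})
print(a == b)  # True
```

Two differently shaped payloads end up as identical `Signals`, which is what lets the next stage compare them directly.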
03. Match
ML entity resolution
The matching model compares normalized signals across all known entities and applies confidence scoring to determine whether two records represent the same title. No shared ID required. Duplicates, variants, and regional releases are resolved to a single canonical record.
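Once pairwise decisions exist, collapsing duplicates into one record is a connected-components problem. The sketch below uses union-find (a disjoint-set structure), a common technique for this step; the record keys, scores, and 0.9 threshold are hypothetical, and this is not presented as Reelgood's actual clustering method.

```python
from collections import defaultdict


class DisjointSet:
    """Union-find over hashable keys, with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster(record_ids, scored_pairs, threshold=0.9):
    """Merge records whose pairwise match confidence meets `threshold`.

    Each returned cluster would become one canonical record.
    """
    ds = DisjointSet()
    for rid in record_ids:
        ds.find(rid)  # register singletons so unmatched records survive
    for a, b, score in scored_pairs:
        if score >= threshold:
            ds.union(a, b)
    groups = defaultdict(list)
    for rid in record_ids:
        groups[ds.find(rid)].append(rid)
    return sorted(sorted(g) for g in groups.values())


records = ["svc_a:101", "svc_b:7f", "svc_c:9x", "svc_d:42"]
pairs = [
    ("svc_a:101", "svc_b:7f", 0.97),
    ("svc_b:7f", "svc_c:9x", 0.93),
    ("svc_a:101", "svc_d:42", 0.41),
]
print(cluster(records, pairs))
# [['svc_a:101', 'svc_b:7f', 'svc_c:9x'], ['svc_d:42']]
```

Note that merging is transitive: `svc_a` and `svc_c` end up together without ever being compared directly. Production systems typically add safeguards against long chains of weak links; this sketch does not.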
04. Output
Canonical record delivery
Verified records are assigned a single canonical ID and made available via the partner API within 5 minutes of ingestion, then included in the next bulk S3 export. Each record maps to EIDR and streaming service-specific identifiers where they exist.
4.2M titles
ML Matching
No Shared ID Required.
Most metadata matching systems require a shared external identifier (an EIDR, an internal service ID, or a third-party key) to connect records across sources. When those IDs are missing, mismatched, or proprietary, the match fails.
Reelgood's ML matching model works differently. It uses metadata signal comparison and confidence scoring to determine whether two records describe the same title. If the cast, runtime, synopsis, and release year all align across two provider records, the model resolves them to a single canonical entry, with or without a shared external ID.
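To make signal comparison and confidence scoring concrete, here is a toy scorer over the four signals named above. The weights, the similarity functions, and the 0.85 threshold are illustrative assumptions. Reelgood's model is trained on feedback data rather than hand-weighted like this.

```python
from difflib import SequenceMatcher

# Illustrative weights for the four signals (they sum to 1.0).
WEIGHTS = {"cast": 0.35, "runtime": 0.20, "synopsis": 0.25, "year": 0.20}


def cast_overlap(a, b):
    """Jaccard similarity of two cast lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0


def signal_scores(r1, r2):
    return {
        "cast": cast_overlap(r1["cast"], r2["cast"]),
        # Full credit for identical runtimes, falling to zero at a 20-minute gap.
        "runtime": max(0.0, 1.0 - abs(r1["runtime"] - r2["runtime"]) / 20),
        "synopsis": SequenceMatcher(None, r1["synopsis"].lower(), r2["synopsis"].lower()).ratio(),
        # Same year scores 1.0; one year apart (e.g. a staggered regional release) 0.5.
        "year": {0: 1.0, 1: 0.5}.get(abs(r1["year"] - r2["year"]), 0.0),
    }


def confidence(r1, r2):
    """Weighted sum of per-signal similarities, in [0, 1]."""
    return sum(WEIGHTS[name] * value for name, value in signal_scores(r1, r2).items())


def same_title(r1, r2, threshold=0.85):
    return confidence(r1, r2) >= threshold


listing_a = {"cast": ["Audrey Tautou", "Mathieu Kassovitz"], "runtime": 122, "year": 2001,
             "synopsis": "A shy waitress decides to change the lives of those around her."}
listing_b = {"cast": ["Audrey Tautou", "Mathieu Kassovitz"], "runtime": 121, "year": 2001,
             "synopsis": "A shy waitress decides to change the lives of the people around her."}
# Hypothetical remake: same synopsis, different cast, runtime, and year.
remake = {"cast": ["Someone Else"], "runtime": 110, "year": 2024,
          "synopsis": "A shy waitress decides to change the lives of those around her."}

print(same_title(listing_a, listing_b), same_title(listing_a, remake))  # True False
```

No external ID appears anywhere in these records; the decision rests entirely on the signals. The remake example shows why several signals are combined: an identical synopsis alone is not enough to merge two works.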
The model is trained on feedback from more than 100 million consumer app users, giving it a uniquely large ground-truth dataset for resolving edge cases: remakes, regional variants, multi-part releases, and repackaged catalog content.
The same matching model handles catalog onboarding for new enterprise partners. Send your existing dataset in any format and the model maps it to our canonical ID framework without requiring your internal IDs to overlap with ours.
Input signals (per entity)
Identity
One Record Per Title. Forever.
Show › Season › Episode hierarchy
Every title in the Reelgood catalog has a single canonical ID. Multiple provider records, each with their own identifiers, naming conventions, and catalog structures, collapse into one authoritative entry.
The same canonical ID maps to EIDR and to hundreds of streaming service-specific identifiers. When a title moves between services, changes its packaging, or gets restructured across markets, the canonical ID stays fixed. The downstream systems that depend on it never break.
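One way to picture a durable canonical ID is as a crosswalk. Provider identifiers point at the canonical ID, and when a title moves between services, only the provider-side rows change. The class, its methods, the UUID format (assumed from the `rg_id` in the example record), and the placeholder IDs are all hypothetical.

```python
import uuid


class Crosswalk:
    """Hypothetical crosswalk: many external IDs -> one durable canonical ID."""

    def __init__(self):
        self._canonical_for = {}  # (namespace, external_id) -> canonical ID
        self._externals = {}      # canonical ID -> {(namespace, external_id), ...}

    def mint(self):
        """Create a canonical ID once, when a new title first appears."""
        cid = str(uuid.uuid4())
        self._externals[cid] = set()
        return cid

    def link(self, cid, namespace, external_id):
        key = (namespace, external_id)
        current = self._canonical_for.get(key)
        if current is not None and current != cid:
            raise ValueError(f"{key} is already linked to {current}")
        self._canonical_for[key] = cid
        self._externals[cid].add(key)

    def unlink(self, namespace, external_id):
        """Drop a provider ID (e.g. the title left that service); the canonical ID is untouched."""
        key = (namespace, external_id)
        cid = self._canonical_for.pop(key)
        self._externals[cid].discard(key)
        return cid

    def resolve(self, namespace, external_id):
        return self._canonical_for.get((namespace, external_id))

    def externals(self, cid):
        return sorted(self._externals[cid])


cw = Crosswalk()
cid = cw.mint()
cw.link(cid, "eidr", "10.5240/XXXX")  # placeholder EIDR
cw.link(cid, "service_a", "A-123")
# The title moves from service_a to service_b: only provider-side rows change.
cw.unlink("service_a", "A-123")
cw.link(cid, "service_b", "B-987")
print(cw.externals(cid))  # [('eidr', '10.5240/XXXX'), ('service_b', 'B-987')]
```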
The show-season-episode hierarchy is maintained consistently across all providers. Where providers disagree on a season cut or episode numbering, Reelgood resolves to one canonical structure while preserving every variant beneath it.
- 285K movies with canonical records
- 70K TV shows, 163K seasons, 3.9M episodes
- 1.2M talent records (cast and crew)
- Single canonical ID mapped to EIDR + service-specific IDs
- Durable across catalog changes, service restructuring, and regional variants
- 100% movie poster coverage across the catalog
Coverage
The Depth Behind the Numbers.
Coverage is only valuable if it's complete at every level. Reelgood maintains full hierarchical data for every title, from show-level metadata down to individual episode records and image assets in multiple formats.
Territories and Languages
- English metadata available across all records
- Localized metadata in Spanish, French, German, Italian, Portuguese, and Hindi
- 14+ territory markets: US, Canada, UK, Germany, Australia, Ireland, New Zealand, India, Spain, France, Italy, Mexico, Argentina, Brazil
- New countries added within 1 month on request
Image Assets
- 100% movie poster coverage
- Posters in 2:3, 1:1, and 4:3 aspect ratios
- Backdrops (16:9), scene stills, logos, celebrity headshots
- Localized artwork per market where variants exist
- Delivered via Cloudflare CDN (HTTPS, 99.9%+ uptime)
Season and episode image coverage varies by asset type and service.
Historical Availability
- 7+ years of streaming availability records
- Exact window open and close dates per service
- Full movement history: which services held a title and when
- SVOD, AVOD, TVOD, and TVEverywhere coverage
- New countries can be added within 2 months
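Because each window carries exact open and close dates, a question like "where was this title streaming on a given day?" reduces to an interval check. The sketch below works over records shaped like the `availability` entries in the example record later on this page. The history is made up, and treating close dates as inclusive is an assumption.

```python
from datetime import date

# Hypothetical history for one title, shaped like the `availability` entries.
WINDOWS = [
    {"service": "svc_a", "window_start": "2018-01-01", "window_end": "2020-06-30"},
    {"service": "svc_c", "window_start": "2019-03-01", "window_end": "2019-09-01"},
    {"service": "svc_b", "window_start": "2020-07-01", "window_end": None},
]


def services_on(windows, day):
    """Services whose window covers `day`; close dates are treated as inclusive."""
    held = []
    for w in windows:
        start = date.fromisoformat(w["window_start"])
        end = date.fromisoformat(w["window_end"]) if w["window_end"] else None
        if start <= day and (end is None or day <= end):
            held.append(w["service"])
    return sorted(held)


def movement_history(windows):
    """(service, start, end) tuples in chronological order of window opening."""
    ordered = sorted(windows, key=lambda w: w["window_start"])
    return [(w["service"], w["window_start"], w["window_end"]) for w in ordered]


print(services_on(WINDOWS, date(2019, 5, 1)))  # ['svc_a', 'svc_c']
```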
Delivery
Data Your Way.
Two delivery methods, both production-ready out of the box. The REST API suits real-time lookups and discovery features. Bulk S3 exports suit data pipelines, warehouses, and analytics platforms.
REST API
Real-time queries, streaming availability, and title lookup
- Format: JSON
- Transport: HTTPS / TLS 1.2+
- Rate limit (standard): 50 calls / second
- Authentication: API key (header)
- Uptime SLA: 99.9%
- Webhook updates: Available on request
- Schema mapping: Custom schemas supported
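A minimal sketch of an authenticated single-title lookup using only the Python standard library. The base URL, the `/titles/{id}` path, and the `X-API-Key` header name are placeholders, not documented Reelgood values. Use whatever your integration credentials specify.

```python
import json
import urllib.parse
import urllib.request

# Placeholders: the real base URL, path, and header name come with your credentials.
BASE_URL = "https://api.example.com/v1"
API_KEY_HEADER = "X-API-Key"


def build_title_request(rg_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET for a single title."""
    url = f"{BASE_URL}/titles/{urllib.parse.quote(rg_id, safe='')}"
    return urllib.request.Request(
        url,
        headers={API_KEY_HEADER: api_key, "Accept": "application/json"},
        method="GET",
    )


def fetch_title(rg_id: str, api_key: str, timeout: float = 5.0) -> dict:
    """Send the request over HTTPS and decode the JSON response body."""
    with urllib.request.urlopen(build_title_request(rg_id, api_key), timeout=timeout) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes it easy to check headers and URLs without a live key, and to wrap `fetch_title` with retries or client-side rate limiting later.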
Bulk S3 Export
Full catalog access for pipelines, warehouses, and analytics
- Formats: JSON, CSV, Parquet, ORC
- Export frequency: Up to twice daily
- Delivery: S3 bucket (your account)
- Update method: Full catalog rewrite (no partial or delta exports)
- Historical backfill: 7+ years available
- Image assets: Cloudflare CDN (no auth)
Example record:

```json
{
  "rg_id": "7f3a2b1c-9d4e-4a2f-8b3c-1e2d4f5a6b7c",
  "content_type": "show",
  "title": "The Bear",
  "release_year": 2022,
  "language": "en",
  "genres": ["Drama", "Comedy"],
  "eidr": "10.5240/7791-8BB4-87E1-5BED-0527-W",
  "imdb_id": "tt14452776",
  "poster_2x3": "https://cdn.reelgood.com/img/show/7f3a2b1c/poster-350.jpg",
  "backdrop_16x9": "https://cdn.reelgood.com/img/show/7f3a2b1c/backdrop-780.jpg",
  "seasons_count": 4,
  "episodes_count": 38,
  "availability": [{
    "service": "hulu",
    "monetization": "svod",
    "territory": "US",
    "window_start": "2022-06-23",
    "window_end": null,
    "stream_url": "https://www.hulu.com/series/the-bear-05e05382-..."
  }],
  "updated_at": "2026-05-14T08:42:11Z"
}
```
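Here is a small helper for reading a record like the one above. It lists the services where a title has an open window in a given territory, treating a `null` `window_end` as still open. The record is abbreviated from the sample, and the helper name is hypothetical.

```python
import json

SAMPLE = json.loads("""
{
  "rg_id": "7f3a2b1c-9d4e-4a2f-8b3c-1e2d4f5a6b7c",
  "title": "The Bear",
  "availability": [
    {"service": "hulu", "monetization": "svod", "territory": "US",
     "window_start": "2022-06-23", "window_end": null}
  ]
}
""")


def streaming_now(record, territory, monetization=None):
    """Services with an open window in `territory`, optionally filtered by model (svod, avod, ...)."""
    return sorted(
        a["service"]
        for a in record.get("availability", [])
        if a["territory"] == territory
        and a["window_end"] is None
        and (monetization is None or a["monetization"] == monetization)
    )


print(streaming_now(SAMPLE, "US"))  # ['hulu']
```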
Direct technical access
You work with the engineers and data scientists who built the system. No support tiers, no account management layer.
Custom schema delivery
Provide your target schema and data arrives pre-mapped. Reduces integration time for teams with existing data contracts.
ID cross-referencing
Map your existing internal IDs to Reelgood canonical IDs. Third-party IDs such as EIDR and streaming service-specific IDs are included in every record where they exist.
Migration
Switch Without Risk.
Switching metadata providers usually means a high-stakes cutover: mapping old IDs to new ones, running validation in isolation, and hoping nothing breaks in production. Reelgood's migration process is built to eliminate that risk.
We ingest your existing catalog
Send your existing dataset in any format. The ML matching model maps your catalog to Reelgood's canonical ID framework without requiring your internal IDs to overlap with ours. Metadata signals do the work.
Parallel run for validation
Reelgood runs alongside your current provider while your team verifies accuracy and flags edge cases. Both systems return data for the same queries. You compare, validate, and reconcile before any live traffic depends on Reelgood.
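During the parallel run, validation usually means sending the same query to both providers and diffing the answers. A sketch, assuming both responses have already been mapped to the same field names; the compared fields and the data are hypothetical.

```python
from collections import Counter

# Hypothetical fields to compare once both answers share one schema.
COMPARED_FIELDS = ("title", "release_year", "seasons_count", "episodes_count")


def diff_fields(old, new, fields=COMPARED_FIELDS):
    """Fields whose values differ between the two providers' answers."""
    return [f for f in fields if old.get(f) != new.get(f)]


def parallel_run_report(results):
    """results: iterable of (query_id, current_provider_record, reelgood_record)."""
    mismatches = {}
    by_field = Counter()
    total = 0
    for query_id, old, new in results:
        total += 1
        diffs = diff_fields(old, new)
        if diffs:
            mismatches[query_id] = diffs
            by_field.update(diffs)
    agreement = 1 - len(mismatches) / total if total else 1.0
    return {"agreement": agreement, "mismatches": mismatches, "by_field": dict(by_field)}


report = parallel_run_report([
    ("q1", {"title": "A", "release_year": 2020, "seasons_count": 2},
           {"title": "A", "release_year": 2020, "seasons_count": 2}),
    ("q2", {"title": "B", "release_year": 2019, "seasons_count": 3},
           {"title": "B", "release_year": 2019, "seasons_count": 2}),
])
print(report["agreement"], report["mismatches"])  # 0.5 {'q2': ['seasons_count']}
```

The per-field counts show where to look first. A cluster of `seasons_count` mismatches, for example, usually points to season-cut differences rather than wrong matches.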
Cutover when you're ready
You control the timeline. Cut over a single endpoint, a region, or your full stack, in the order that works for your team. Reelgood stays available for reconciliation queries after cutover for as long as you need.
Typical timeline
Most teams complete validation and cut over within a few weeks, not months. The parallel-run approach means there is no pressure to rush the timeline.
- No pre-existing ID alignment required
- No data loss during transition
- Full historical backfill available from day one
- Direct engineering support throughout
- Rollback available at any stage
Reliability
Built to Stay Up.
Production systems depend on this data. Reliability is not a feature. It is the baseline expectation. Reelgood's infrastructure and incident response are built around that.
99.9% uptime SLA across both the partner API and the image CDN. Service credits are available in enterprise agreements.
24-hour resolution target. 24/7 on-call rotation with severity-based incident response.
No support tier. You work with the engineers who built the system, from onboarding through production.
How does the model handle remakes, regional variants, and multi-part releases?
Each title resolves to a single canonical record that corresponds to one specific work. A remake is a distinct canonical record from the original. Regional variants (e.g., a title packaged differently for a specific market) are preserved as variants beneath the canonical record. Multi-part releases (e.g., a film split into parts for streaming) are modeled as separate episodes or separate titles, depending on how providers and industry standards classify them.
The model is trained specifically on edge cases like these, using ground-truth data from 100M+ consumer interactions to validate ambiguous matches before they enter the canonical catalog.
What are the API's response times and rate limits?
Median API response time is under 100ms for single-title lookups. Bulk queries (availability across multiple services for a set of titles) scale with the query size but remain well under 500ms for typical request patterns. The API is served from Cloudflare's global edge network, so latency varies by region; US and European partners see the lowest numbers.
The rate limit is 50 calls per second on standard plans. Higher limits are available for production use cases that require burst capacity.
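On the client side, a token bucket is a common way to stay under a per-second limit while still allowing short bursts. The sketch below is sized for the standard 50 calls per second. The burst capacity is an assumption, and nothing here describes how Reelgood's server-side limiter actually behaves.

```python
import time


class TokenBucket:
    """Client-side limiter: refills `rate` tokens per second, holding at most `capacity`."""

    def __init__(self, rate=50.0, capacity=50, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self):
        """Take one token if available; return False instead of blocking."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def acquire(self):
        """Block until a token is available."""
        while not self.try_acquire():
            time.sleep((1 - self.tokens) / self.rate)
```

Call `limiter.acquire()` before each API request. The injectable `clock` keeps the limiter testable without real sleeps. If your plan allows higher burst capacity, raise `capacity` while leaving `rate` at the sustained limit.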
How do bulk S3 exports work?
S3 exports are full catalog rewrites delivered up to twice daily. Every file in the bucket is replaced with the latest version of the complete dataset. No partial or delta exports. This means your pipeline always works from a clean, authoritative snapshot rather than trying to reconcile a stream of changes against a prior state.
The full US catalog export is approximately 1.8 GB uncompressed. Files are delivered in line-delimited JSON by default, with CSV, Parquet, and ORC available depending on your contract. For teams that prefer event-driven updates, the REST API with webhook notifications is the better fit.
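Because every export replaces the whole dataset, a consumer can parse each snapshot on its own and swap it in wholesale instead of applying changes. The sketch below handles the default line-delimited JSON format. Keying by `rg_id` (the field name from the example record), the local file path, and the class name are assumptions. If an export spans several files, a real consumer would merge all of them before swapping.

```python
import json


def parse_snapshot(lines):
    """Parse line-delimited JSON into {rg_id: record}, rejecting duplicate IDs."""
    catalog = {}
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        rg_id = record["rg_id"]
        if rg_id in catalog:
            raise ValueError(f"duplicate rg_id {rg_id!r} on line {line_no}")
        catalog[rg_id] = record
    return catalog


def load_snapshot(path):
    """Read one exported file from wherever you sync the bucket locally."""
    with open(path, encoding="utf-8") as fh:
        return parse_snapshot(fh)


class CatalogStore:
    """Keeps exactly one complete snapshot in memory."""

    def __init__(self):
        self.current = {}

    def refresh(self, path):
        snapshot = load_snapshot(path)  # parse everything first...
        self.current = snapshot         # ...then replace the whole catalog in one assignment
        return len(snapshot)
```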
How are conflicting season and episode structures handled?
Reelgood maintains one canonical show-season-episode hierarchy per title. When providers disagree on season structure (e.g., a show packaged as two seasons in one market and three in another), Reelgood resolves to one canonical structure while preserving all market-specific variants as additional records beneath it.
This means your downstream systems always receive a consistent, deduplicated representation of any series, regardless of how it's packaged across services or regions. The canonical ID for any episode remains stable even when provider packaging changes.
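A toy model of "one canonical structure, with variants beneath it": canonical episodes keep fixed IDs, and each market's packaging is stored as a separate mapping from its own season/episode numbers onto those IDs. All names, markets, and IDs here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Series:
    # Canonical layout: (season, episode) -> stable canonical episode ID.
    canonical: dict
    # Market-specific packagings, stored beneath the canonical structure.
    variants: dict = field(default_factory=dict)

    def add_variant(self, market, layout):
        """Register a market's own numbering; it may only point at known episodes."""
        unknown = set(layout.values()) - set(self.canonical.values())
        if unknown:
            raise ValueError(f"unknown canonical episodes: {sorted(unknown)}")
        self.variants[market] = dict(layout)

    def resolve(self, market, season, episode):
        """Translate a market's (season, episode) into the canonical episode ID."""
        layout = self.variants.get(market, self.canonical)
        return layout[(season, episode)]


show = Series(canonical={(1, 1): "ep-1", (1, 2): "ep-2", (2, 1): "ep-3", (2, 2): "ep-4"})
# Hypothetical market that packages all four episodes as a single season.
show.add_variant("XX", {(1, 1): "ep-1", (1, 2): "ep-2", (1, 3): "ep-3", (1, 4): "ep-4"})
print(show.resolve("XX", 1, 3), show.resolve("US", 2, 1))  # ep-3 ep-3
```

Both lookups land on the same episode ID, so downstream systems keyed on canonical IDs are unaffected by how a market numbers its seasons.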
Can we map our existing internal IDs to Reelgood canonical IDs?
Yes. Reelgood's onboarding process maps your existing catalog to our canonical ID framework using metadata signals. No shared external identifier required. The result is a lookup table that maps each of your internal IDs to the corresponding Reelgood canonical ID.
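The resulting lookup table can be pictured as a best-match search. For each of your records, take the highest-scoring canonical candidate, accept it only above a threshold, and route the rest to review. `toy_score` and the threshold are placeholders standing in for the matching model, which is not public. The partner IDs are made up.

```python
def toy_score(a, b):
    """Stand-in for the matching model: title match plus release year within one year."""
    same_title = a["title"].casefold() == b["title"].casefold()
    close_year = abs(a["year"] - b["year"]) <= 1
    return 1.0 if same_title and close_year else 0.0


def build_lookup(partner, canonical, score=toy_score, threshold=0.85):
    """Return ({partner_id: canonical_id}, [partner_ids needing manual review])."""
    lookup, needs_review = {}, []
    for pid, precord in partner.items():
        best_id, best = None, 0.0
        for cid, crecord in canonical.items():
            s = score(precord, crecord)
            if s > best:
                best_id, best = cid, s
        if best_id is not None and best >= threshold:
            lookup[pid] = best_id
        else:
            needs_review.append(pid)
    return lookup, sorted(needs_review)


partner = {
    "INT-001": {"title": "The Bear", "year": 2022},
    "INT-002": {"title": "Untitled Pilot", "year": 1999},
}
canonical = {"7f3a2b1c-9d4e-4a2f-8b3c-1e2d4f5a6b7c": {"title": "The Bear", "year": 2022}}
print(build_lookup(partner, canonical))
# ({'INT-001': '7f3a2b1c-9d4e-4a2f-8b3c-1e2d4f5a6b7c'}, ['INT-002'])
```

This sketch compares every pair of records, which is fine for a demo but not at catalog scale. A realistic version would first narrow candidates (for example by release year) before scoring.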
Going forward, every record returned by the API or included in S3 exports contains both the Reelgood canonical ID and any third-party IDs (EIDR, streaming service-specific IDs) that exist for that title. You can use whichever identifier your systems prefer.
What happens when a title leaves every streaming service?
The canonical record persists. The canonical ID never changes and the title remains in the metadata catalog indefinitely. The availability record is updated to reflect that the title is no longer streaming on any active service, with the window close date and the last known service recorded.
Historical availability data, including the full movement history across all previous windows, remains accessible. This is particularly useful for rights and licensing teams tracking windowing patterns over time.
How is API access authenticated and secured?
The partner API uses API key authentication passed as a request header. Keys are issued per integration and can be scoped to specific data endpoints or rate limit tiers. Key rotation is supported without downtime.
All API traffic is encrypted in transit via TLS 1.2 or higher. IP allowlisting is available for enterprise accounts that require it.
How current is the availability data, and how far back does it go?
Current streaming availability data is refreshed continuously and available via the API within 5 minutes of ingestion. S3 exports reflect the catalog as of each export, delivered up to twice daily.
Historical availability data covers 7+ years of records with exact window open and close dates. Historical records are immutable once written. A title that left a service in 2019 retains that exact record indefinitely. New historical data (records from newly onboarded services or markets) can be backfilled on request.
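"Immutable once written" can be enforced as an append-only rule: an open window may be closed exactly once, and a closed window is never edited afterwards. The sketch below shows that invariant. The class and method names are hypothetical, and storage is an in-memory list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Window:
    service: str
    territory: str
    start: str                 # ISO date the window opened
    end: Optional[str] = None  # None while the window is still open


class AvailabilityHistory:
    """Append-only log: a window is opened, closed at most once, then never edited."""

    def __init__(self):
        self._windows = []

    def _open_index(self, service, territory):
        for i, w in enumerate(self._windows):
            if w.service == service and w.territory == territory and w.end is None:
                return i
        return None

    def open_window(self, service, territory, start):
        if self._open_index(service, territory) is not None:
            raise ValueError(f"{service}/{territory} already has an open window")
        self._windows.append(Window(service, territory, start))

    def close_window(self, service, territory, end):
        i = self._open_index(service, territory)
        if i is None:
            raise ValueError(f"{service}/{territory} has no open window")
        # The only permitted change: an open entry gains its close date.
        self._windows[i] = Window(service, territory, self._windows[i].start, end)

    def records(self):
        return tuple(self._windows)  # read-only view of the full history
```

Because `Window` is frozen and `close_window` only ever targets open entries, a window that closed in 2019 keeps exactly the record it had when it closed, even if the title later returns to the same service.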
See the Data in Action.
Talk to our engineering team about your use case, data format, and integration requirements. Most evaluations start with a sample dataset tailored to your catalog.