The Technology

The ML Infrastructure Behind the Data.

How Reelgood ingests 55M+ raw streaming entities, resolves them through a proprietary ML pipeline, and delivers 4.2M verified, deduplicated titles with a single canonical ID. In real time.

  • 55M+ raw data entities ingested
  • 4.2M verified unique titles
  • 99%+ ML-verified accuracy
  • <5 min from ingestion to availability

From Raw Signal to Verified Record.

Every data point that enters the Reelgood pipeline passes through four stages before it becomes part of the verified catalog. No manual intervention at scale. No batch-and-wait. Just a continuous, automated process that keeps the data current.

01. Ingest

Raw entity collection

Data is collected continuously from 300+ streaming services across 25+ countries. Each service delivers its own identifiers, naming conventions, and catalog structure. The pipeline ingests all of it without requiring any pre-normalization.

55M+ entities

02. Normalize

Signal extraction and cleaning

Metadata signals are extracted and standardized across sources: cast, crew, runtime, synopsis, release date, original language, production studio, and regional title variants. Inconsistencies across providers are resolved at this stage.
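As a rough illustration of what this stage involves, the sketch below maps two differently shaped provider records onto one comparable set of signals. It is a minimal sketch, not Reelgood's implementation: the `Signals` schema and the input keys (`runtime_seconds`, `runtime_minutes`, `release_date`) are assumptions made for the example.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signals:
    """Normalized signals for one provider record (hypothetical schema)."""
    title: str
    release_year: Optional[int]
    runtime_min: Optional[int]
    cast: frozenset


def normalize_text(raw: str) -> str:
    # Strip accents, lowercase, and collapse punctuation and whitespace.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def normalize_record(raw: dict) -> Signals:
    # Assumed: providers report runtime in either seconds or minutes.
    if "runtime_seconds" in raw:
        runtime = round(raw["runtime_seconds"] / 60)
    else:
        runtime = raw.get("runtime_minutes")
    # Assumed: release date is either a full ISO date or a bare year.
    date = str(raw.get("release_date") or "")
    year = int(date[:4]) if date[:4].isdigit() else None
    cast = frozenset(normalize_text(name) for name in raw.get("cast", []))
    return Signals(normalize_text(raw.get("title", "")), year, runtime, cast)


provider_a = {"title": "Amélie!", "release_date": "2001-04-25",
              "runtime_seconds": 7320, "cast": ["Audrey Tautou"]}
provider_b = {"title": "AMELIE", "release_date": 2001,
              "runtime_minutes": 122, "cast": ["audrey  tautou"]}
same = normalize_record(provider_a) == normalize_record(provider_b)
```

Once both records reduce to the same signals, the matching stage can compare them without caring how each provider formatted its catalog.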

03. Match

ML entity resolution

The matching model compares normalized signals across all known entities and applies confidence scoring to determine whether two records represent the same title. No shared ID required. Duplicates, variants, and regional releases are resolved to a single canonical record.
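One common way to turn pairwise match decisions into a single record per title is to group them with a union-find structure. The sketch below shows that grouping step only; the 0.95 threshold, the record ID format, and the scores are illustrative assumptions, and the source does not say which clustering method Reelgood uses.

```python
class UnionFind:
    """Disjoint-set structure used to group records that match each other."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the smaller ID as the root so results are deterministic.
            self.parent[max(ra, rb)] = min(ra, rb)


def cluster_records(record_ids, scored_pairs, threshold=0.95):
    """Group record IDs whose pairwise confidence meets the threshold."""
    uf = UnionFind()
    for rid in record_ids:
        uf.find(rid)  # register every record, including unmatched ones
    for a, b, score in scored_pairs:
        if score >= threshold:
            uf.union(a, b)
    clusters = {}
    for rid in record_ids:
        clusters.setdefault(uf.find(rid), []).append(rid)
    return sorted(sorted(members) for members in clusters.values())


ids = ["hulu:1", "peacock:9", "prime:4", "prime:7"]
pairs = [("hulu:1", "peacock:9", 0.99),
         ("peacock:9", "prime:4", 0.97),
         ("prime:4", "prime:7", 0.40)]
groups = cluster_records(ids, pairs)
```

Each resulting group would receive one canonical ID. Transitive grouping can over-merge when a single borderline pair links two otherwise distinct titles, so a production system would likely add checks on top of this.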

04. Output

Canonical record delivery

Verified records are assigned a single canonical ID and made available via the partner API or bulk S3 export within 5 minutes of ingestion. Each record maps to EIDR and streaming service-specific identifiers where they exist.

4.2M titles

No Shared ID Required.

Most metadata matching systems require a shared external identifier (an EIDR, an internal service ID, or a third-party key) to connect records across sources. When those IDs are missing, mismatched, or proprietary, the match fails.

Reelgood's ML matching model works differently. It uses metadata signal comparison and confidence scoring to determine whether two records describe the same title. If the cast, runtime, synopsis, and release year all align across two provider records, the model resolves them to a single canonical entry, with or without a shared external ID.
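To make the idea of signal comparison concrete, here is a minimal hand-weighted scorer. It stands in for the trained model, whose features and weights are not described in the source; the weights, the two-minute runtime tolerance, and the record fields are all assumptions.

```python
from difflib import SequenceMatcher

# Illustrative weights only; they sum to 1.0 so the score stays in [0, 1].
WEIGHTS = {"title": 0.3, "release_year": 0.2, "runtime": 0.15, "cast": 0.35}


def jaccard(a, b):
    """Overlap between two collections, from 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def confidence(rec_a: dict, rec_b: dict) -> float:
    """Weighted agreement across metadata signals; no shared ID is consulted."""
    scores = {
        "title": SequenceMatcher(
            None, rec_a["title"].lower(), rec_b["title"].lower()
        ).ratio(),
        "release_year": 1.0 if rec_a["release_year"] == rec_b["release_year"] else 0.0,
        "runtime": 1.0 if abs(rec_a["runtime_min"] - rec_b["runtime_min"]) <= 2 else 0.0,
        "cast": jaccard(rec_a["cast"], rec_b["cast"]),
    }
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Illustrative records: two provider listings of one film, plus an older film
# that shares only its title with them.
listing_a = {"title": "Dune", "release_year": 2021, "runtime_min": 155,
             "cast": ["Timothée Chalamet", "Zendaya"]}
listing_b = {"title": "DUNE", "release_year": 2021, "runtime_min": 156,
             "cast": ["Zendaya", "Timothée Chalamet"]}
older_film = {"title": "Dune", "release_year": 1984, "runtime_min": 137,
              "cast": ["Kyle MacLachlan"]}
```

With these weights the two 2021 listings agree on every signal, while the 1984 film matches on title alone and scores far below any plausible match threshold. That is the behavior the remake case calls for.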

The model is trained on feedback from more than 100 million consumer app users, giving it a uniquely large ground-truth dataset for resolving edge cases: remakes, regional variants, multi-part releases, and repackaged catalog content.

The same matching model handles catalog onboarding for new enterprise partners. Send your existing dataset in any format and the model maps it to our canonical ID framework without requiring your internal IDs to overlap with ours.

Input signals (per entity)

Title · Release year · Cast · Crew · Runtime · Synopsis · Original language · Production studio · Regional title variants · Episode structure

Confidence score: 99.2%
Canonical match confirmed: single ID assigned

One Record Per Title. Forever.

Example: canonical record
The Bear (2022)
Drama · Hulu · 4 seasons · 38 episodes

Show › Season › Episode hierarchy

The Bear (rg_id: 7f3a2b1c…  |  EIDR: 10.5240/XXXX)
  Season 1, 8 episodes (rg_id: 8c4d3e2f…)
    S1E1: "System" (rg_id: 9d5e4f3a…)
    S1E2: "Hands" (rg_id: ae6f5a4b…)
    + 6 more episodes…

Reelgood ID · EIDR · Hulu ID · Peacock ID · Prime Video ID · Disney+ ID · + hundreds more

Every title in the Reelgood catalog has a single canonical ID. Multiple provider records, each with their own identifiers, naming conventions, and catalog structures, collapse into one authoritative entry.

The same canonical ID maps to EIDR and to hundreds of streaming service-specific identifiers. When a title moves between services, changes its packaging, or gets restructured across markets, the canonical ID stays fixed. The downstream systems that depend on it never break.

The show-season-episode hierarchy is maintained consistently across all providers. Where providers disagree on a season cut or episode numbering, Reelgood resolves to one canonical structure while preserving every variant beneath it.
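A consumer of this data might model the hierarchy roughly as follows. This is a sketch under assumptions: the class names, the `variants` mapping, and the short IDs are hypothetical and do not reflect Reelgood's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Episode:
    rg_id: str
    number: int
    title: str


@dataclass
class Season:
    rg_id: str
    number: int
    episodes: List[Episode] = field(default_factory=list)


@dataclass
class Show:
    rg_id: str
    title: str
    seasons: List[Season] = field(default_factory=list)
    # market -> [(provider season no, provider episode no, canonical episode rg_id)]
    variants: Dict[str, List[Tuple[int, int, str]]] = field(default_factory=dict)

    def resolve(self, market: str, season_no: int, episode_no: int) -> Optional[str]:
        """Map a market-specific season/episode number to the canonical episode ID."""
        for s, e, rg_id in self.variants.get(market, []):
            if (s, e) == (season_no, episode_no):
                return rg_id
        # No market override: fall back to the canonical numbering.
        for season in self.seasons:
            if season.number == season_no:
                for ep in season.episodes:
                    if ep.number == episode_no:
                        return ep.rg_id
        return None


show = Show(
    rg_id="show-1",
    title="Example Show",
    seasons=[Season("season-1", 1, [Episode("ep-1", 1, "Pilot"),
                                    Episode("ep-2", 2, "Second")])],
    # Hypothetical market where a provider lists canonical S1E2 as S2E1.
    variants={"DE": [(2, 1, "ep-2")]},
)
```

Here `show.resolve("DE", 2, 1)` and `show.resolve("US", 1, 2)` both return `"ep-2"`, so downstream systems see one episode ID regardless of how a market cuts its seasons.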

  • 285K movies with canonical records
  • 70K TV shows, 163K seasons, 3.9M episodes
  • 1.2M talent records (cast and crew)
  • Single canonical ID mapped to EIDR + service-specific IDs
  • Durable across catalog changes, service restructuring, and regional variants
  • 100% movie poster coverage across the catalog

The Depth Behind the Numbers.

Coverage is only valuable if it's complete at every level. Reelgood maintains full hierarchical data for every title, from show-level metadata down to individual episode records and image assets in multiple formats.

  • 285K movies
  • 70K TV shows
  • 163K seasons
  • 3.9M episodes
  • 1.2M talent records
🌎

Territories and Languages

  • English metadata available across all records
  • Localized metadata in Spanish, French, German, Italian, Portuguese, and Hindi
  • 14+ territory markets: US, Canada, UK, Germany, Australia, Ireland, New Zealand, India, Spain, France, Italy, Mexico, Argentina, Brazil
  • New countries added within 1 month on request
🖼

Image Assets

  • 100% movie poster coverage
  • Posters in 2:3, 1:1, and 4:3 aspect ratios
  • Backdrops (16:9), scene stills, logos, celebrity headshots
  • Localized artwork per market where variants exist
  • Delivered via Cloudflare CDN (HTTPS, 99.9%+ uptime)

Season and episode image coverage varies by asset type and service.

📈

Historical Availability

  • 7+ years of streaming availability records
  • Exact window open and close dates per service
  • Full movement history: which services held a title and when
  • SVOD, AVOD, TVOD, and TVEverywhere coverage
  • New countries can be added within 2 months
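With open and close dates per service, a point-in-time question such as "where was this title streaming on a given day?" reduces to an interval check. The sketch below assumes a `Window` shape modeled on the availability fields in the sample export, treats the close date as inclusive, and uses invented service names and dates.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Window:
    service: str
    monetization: str  # e.g. "svod", "avod", "tvod"
    territory: str
    start: date
    end: Optional[date]  # None means the window is still open


def services_on(windows, day, territory="US"):
    """Return the services holding the title in a territory on a given day."""
    return sorted(
        w.service
        for w in windows
        if w.territory == territory
        and w.start <= day
        and (w.end is None or day <= w.end)  # assumed: close date is inclusive
    )


# Invented movement history for one title.
history = [
    Window("service_a", "svod", "US", date(2019, 1, 1), date(2020, 6, 30)),
    Window("service_b", "avod", "US", date(2020, 7, 1), None),
]
```

Calling `services_on(history, date(2020, 1, 15))` returns `["service_a"]`, and any date after the move returns `["service_b"]`.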

Data Your Way.

Two delivery methods, both production-ready out of the box. The REST API suits real-time lookups and discovery features. Bulk S3 exports suit data pipelines, warehouses, and analytics platforms.

REST API

Real-time queries, streaming availability, and title lookup

  • Format: JSON
  • Transport: HTTPS / TLS 1.2+
  • Rate limit (standard): 50 calls / second
  • Authentication: API key (header)
  • Uptime SLA: 99.9%
  • Webhook updates: Available on request
  • Schema mapping: Custom schemas supported
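A client can stay under the standard rate limit by spacing its own calls. The sketch below pairs a simple throttle with a request builder that passes the key as a header. The `/titles/{rg_id}` path and the `X-API-Key` header name are placeholders, not the documented API; the real values come from the partner documentation.

```python
import time
import urllib.request


class Throttle:
    """Client-side guard that spaces calls at least 1/rate seconds apart."""

    def __init__(self, rate_per_sec=50, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate_per_sec
        self.clock, self.sleep = clock, sleep
        self.next_ok = 0.0

    def wait(self):
        now = self.clock()
        if now < self.next_ok:
            self.sleep(self.next_ok - now)
            now = self.next_ok
        self.next_ok = now + self.interval


def build_request(base_url, rg_id, api_key):
    # Hypothetical path and header name, used here for illustration only.
    return urllib.request.Request(
        f"{base_url}/titles/{rg_id}",
        headers={"X-API-Key": api_key, "Accept": "application/json"},
    )


throttle = Throttle(rate_per_sec=50)
throttle.wait()  # call before each request to stay at or below 50 calls/second
request = build_request("https://api.example.com", "7f3a2b1c", "YOUR_KEY")
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

The clock and sleep functions are injectable so the throttle can be tested without real delays.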

Bulk S3 Export

Full catalog access for pipelines, warehouses, and analytics

  • Formats: JSON, CSV, Parquet, ORC
  • Export frequency: Up to twice daily
  • Delivery: S3 bucket (your account)
  • Update method: Full catalog rewrite (no partial or delta files)
  • Historical backfill: 7+ years available
  • Image assets: Cloudflare CDN (no auth)
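Line-delimited JSON exports can be read one record at a time without loading the whole file. The sketch below parses a line shaped like the illustrative `shows.jsonl` sample in this section. It uses a subset of the sample's fields and assumes `availability` is an array of window objects; the rendered sample does not show that nesting explicitly.

```python
import json

# One line in the shape of the illustrative sample (subset of fields).
SAMPLE_LINE = (
    '{"rg_id": "7f3a2b1c-9d4e-4a2f-8b3c-1e2d4f5a6b7c", "content_type": "show", '
    '"title": "The Bear", "release_year": 2022, "genres": ["Drama", "Comedy"], '
    '"availability": [{"service": "hulu", "monetization": "svod", '
    '"territory": "US", "window_start": "2022-06-23", "window_end": null}]}'
)


def iter_records(lines):
    """Yield one parsed record per non-empty line of a .jsonl export."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def current_services(record, territory):
    """Services with an open window (no close date) in the given territory."""
    return [
        a["service"]
        for a in record.get("availability", [])
        if a["territory"] == territory and a["window_end"] is None
    ]


records = list(iter_records([SAMPLE_LINE, "\n"]))
```

In a real pipeline `iter_records` would be handed an open file object, which keeps memory use flat even for the full catalog.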
shows.jsonl  ·  line 1 of 70,284  ·  Illustrative Sample

{
  "rg_id": "7f3a2b1c-9d4e-4a2f-8b3c-1e2d4f5a6b7c",
  "content_type": "show",
  "title": "The Bear",
  "release_year": 2022,
  "language": "en",
  "genres": ["Drama", "Comedy"],
  "eidr": "10.5240/7791-8BB4-87E1-5BED-0527-W",
  "imdb_id": "tt14452776",
  "poster_2x3": "https://cdn.reelgood.com/img/show/7f3a2b1c/poster-350.jpg",
  "backdrop_16x9": "https://cdn.reelgood.com/img/show/7f3a2b1c/backdrop-780.jpg",
  "seasons_count": 4,
  "episodes_count": 38,
  "availability": [
    {
      "service": "hulu",
      "monetization": "svod",
      "territory": "US",
      "window_start": "2022-06-23",
      "window_end": null,
      "stream_url": "https://www.hulu.com/series/the-bear-05e05382-..."
    }
  ],
  "updated_at": "2026-05-14T08:42:11Z"
}
👥

Direct technical access

You work with the engineers and data scientists who built the system. No support tiers, no account management layer.

🔄

Custom schema delivery

Provide your target schema and data arrives pre-mapped. Reduces integration time for teams with existing data contracts.

🔎

ID cross-referencing

Map your existing internal IDs to Reelgood canonical IDs. Third-party IDs like EIDR and streaming service-specific IDs are included in every record.
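On the consuming side, cross-referencing amounts to an index from every known identifier back to the canonical ID. The sketch below builds one from the `eidr` and `imdb_id` fields that appear in the sample record, plus a partner-supplied mapping. The `"partner"` namespace and the `partner-123` ID are invented for illustration.

```python
ID_FIELDS = ("eidr", "imdb_id")  # top-level ID fields seen in the sample record


def build_crosswalk(records, partner_ids=None):
    """Index (namespace, external ID) pairs back to the canonical rg_id."""
    index = {}
    for rec in records:
        for id_field in ID_FIELDS:
            if rec.get(id_field):
                index[(id_field, rec[id_field])] = rec["rg_id"]
    # partner_ids: internal ID -> rg_id, as produced during onboarding.
    for internal_id, rg_id in (partner_ids or {}).items():
        index[("partner", internal_id)] = rg_id
    return index


records = [{
    "rg_id": "7f3a2b1c-9d4e-4a2f-8b3c-1e2d4f5a6b7c",
    "eidr": "10.5240/7791-8BB4-87E1-5BED-0527-W",
    "imdb_id": "tt14452776",
}]
crosswalk = build_crosswalk(
    records, partner_ids={"partner-123": "7f3a2b1c-9d4e-4a2f-8b3c-1e2d4f5a6b7c"}
)
canonical = crosswalk.get(("imdb_id", "tt14452776"))
```

Keying on the namespace as well as the value avoids collisions when two ID systems happen to share a string.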

Switch Without Risk.

Switching metadata providers usually means a high-stakes cutover: mapping old IDs to new ones, running validation in isolation, and hoping nothing breaks in production. Reelgood's migration process is built to eliminate that risk.

1. We ingest your existing catalog

Send your existing dataset in any format. The ML matching model maps your catalog to Reelgood's canonical ID framework without requiring your internal IDs to overlap with ours. Metadata signals do the work.

2. Parallel run for validation

Reelgood runs alongside your current provider while your team verifies accuracy and flags edge cases. Both systems return data for the same queries. You compare, validate, and reconcile before any live traffic depends on Reelgood.
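A parallel run is easier to audit when the comparison is mechanical. The sketch below diffs two result sets keyed by the partner's own title ID and reports missing and mismatched entries. The compared fields and the report layout are assumptions; a real validation would cover whichever fields your data contract requires.

```python
def compare_runs(current, candidate, fields=("title", "release_year")):
    """Compare two provider result sets keyed by the same internal title ID."""
    report = {"missing_in_candidate": [], "missing_in_current": [], "mismatched": {}}
    for key in sorted(current.keys() | candidate.keys()):
        if key not in candidate:
            report["missing_in_candidate"].append(key)
        elif key not in current:
            report["missing_in_current"].append(key)
        else:
            # Record (current value, candidate value) for each differing field.
            diffs = {
                f: (current[key].get(f), candidate[key].get(f))
                for f in fields
                if current[key].get(f) != candidate[key].get(f)
            }
            if diffs:
                report["mismatched"][key] = diffs
    return report


current_provider = {
    "t1": {"title": "The Bear", "release_year": 2022},
    "t2": {"title": "Example Film", "release_year": 1999},
}
candidate_provider = {
    "t1": {"title": "The Bear", "release_year": 2022},
    "t2": {"title": "Example Film", "release_year": 2000},
    "t3": {"title": "Another Title", "release_year": 2010},
}
report = compare_runs(current_provider, candidate_provider)
```

An empty report across a representative query set is one reasonable signal that a given endpoint or region is ready to cut over.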

3. Cutover when you're ready

You control the timeline. Cut over a single endpoint, a region, or your full stack, in the order that works for your team. Reelgood stays available for reconciliation queries after cutover for as long as you need.

Typical timeline

Weeks

Most teams complete validation and cut over within a few weeks, not months. The parallel-run approach means there is no pressure to rush the timeline.

  • No pre-existing ID alignment required
  • No data loss during transition
  • Full historical backfill available from day one
  • Direct engineering support throughout
  • Rollback available at any stage

Built to Stay Up.

Production systems depend on this data. Reliability is not a feature. It is the baseline expectation. Reelgood's infrastructure and incident response are built around that.

99.9%
Monthly API uptime SLA

Covers both the partner API and the image CDN. Service credits are available in enterprise agreements.

1 hr
Critical issue response

24-hour resolution target. 24/7 on-call rotation with severity-based incident response.

Direct
Technical team access

No support tier. You work with the engineers who built the system, from onboarding through production.

Monitoring and observability: Reelgood operates 24/7 infrastructure monitoring across the API, data pipeline, and CDN. Partners receive status page access and can configure webhook alerts for data staleness events or SLA deviations.
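Partners can also run their own staleness check against the `updated_at` timestamp carried on each record. The sketch below is one minimal way to do that. The 24-hour threshold is an arbitrary assumption, not a documented SLA value.

```python
from datetime import datetime, timedelta, timezone


def is_stale(updated_at: str, now: datetime, max_age=timedelta(hours=24)) -> bool:
    """True if a record's updated_at timestamp is older than max_age."""
    # Timestamps in the sample end in "Z"; convert to an explicit UTC offset
    # so older Python versions can parse them with fromisoformat.
    ts = datetime.fromisoformat(updated_at.replace("Z", "+00:00"))
    return now - ts > max_age


checked_at = datetime(2026, 5, 14, 9, 0, tzinfo=timezone.utc)
fresh = is_stale("2026-05-14T08:42:11Z", checked_at)
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the check deterministic in tests.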

Questions from engineering and data teams.

Common technical questions from developers, data engineers, and architects evaluating Reelgood for production use.

  • Each title resolves to a single canonical record that corresponds to one specific work. A remake is a distinct canonical record from the original. Regional variants (e.g., a title packaged differently for a specific market) are preserved as variants beneath the canonical record. Multi-part releases (e.g., a film split into parts for streaming) are modeled as separate episodes or separate titles, depending on how providers and industry standards classify them.

    The model is trained specifically on edge cases like these, using ground-truth feedback from 100M+ consumer app users to validate ambiguous matches before they enter the canonical catalog.

  • Median API response time is under 100ms for single-title lookups. Bulk queries (availability across multiple services for a set of titles) scale with the query size but remain well under 500ms for typical request patterns. The API is served from Cloudflare's global edge network, so latency varies by region; US and European partners see the lowest numbers.

    The rate limit is 50 calls per second on standard plans. Higher limits are available for production use cases that require burst capacity.

  • S3 exports are full catalog rewrites delivered up to twice daily. Every file in the bucket is replaced with the latest version of the complete dataset. No partial or delta exports. This means your pipeline always works from a clean, authoritative snapshot rather than trying to reconcile a stream of changes against a prior state.

    The full US subscription is approximately 1.8 GB uncompressed. Files are delivered in line-delimited JSON by default, with CSV, Parquet, and ORC available depending on your contract. For teams that prefer event-driven updates, the REST API with webhook notifications is the right fit.

  • Reelgood maintains one canonical show-season-episode hierarchy per title. When providers disagree on season structure (e.g., a show packaged as two seasons in one market and three in another), Reelgood resolves to one canonical structure while preserving all market-specific variants as additional records beneath it.

    This means your downstream systems always receive a consistent, deduplicated representation of any series, regardless of how it's packaged across services or regions. The canonical ID for any episode remains stable even when provider packaging changes.

  • Yes, your existing internal IDs can be mapped to Reelgood canonical IDs. The onboarding process maps your catalog to our canonical ID framework using metadata signals, with no shared external identifier required. The result is a lookup table that maps each of your internal IDs to the corresponding Reelgood canonical ID.

    Going forward, every record returned by the API or included in S3 exports contains both the Reelgood canonical ID and any third-party IDs (EIDR, streaming service-specific IDs) that exist for that title. You can use whichever identifier your systems prefer.

  • When a title leaves every service, the canonical record persists. The canonical ID never changes and the title remains in the metadata catalog indefinitely. The availability record is updated to reflect that the title is no longer streaming on any active service, with the window close date and the last known service recorded.

    Historical availability data, including the full movement history across all previous windows, remains accessible. This is particularly useful for rights and licensing teams tracking windowing patterns over time.

  • The partner API uses API key authentication passed as a request header. Keys are issued per integration and can be scoped to specific data endpoints or rate limit tiers. Key rotation is supported without downtime.

    All API traffic is encrypted in transit via TLS 1.2 or higher. IP allowlisting is available for enterprise accounts that require it.

  • Current streaming availability data is refreshed continuously and available within 5 minutes of ingestion. This applies to real-time availability queries via the API. S3 exports reflect the state of the catalog at the time of each export run, which happens up to twice daily.

    Historical availability data covers 7+ years of records with exact window open and close dates. Historical records are immutable once written. A title that left a service in 2019 retains that exact record indefinitely. New historical data (records from newly onboarded services or markets) can be backfilled on request.
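Because S3 exports arrive as full snapshots rather than deltas, teams that still want a change list can derive one by comparing two consecutive snapshots. The sketch below assumes each snapshot has already been loaded into a dict keyed by `rg_id`. It is an optional consumer-side step, not part of the export itself.

```python
def diff_snapshots(previous, latest):
    """Compare two full exports, each given as a dict of rg_id -> record."""
    added = sorted(latest.keys() - previous.keys())
    # Canonical records are described as persistent, so removals should be
    # rare; they are reported anyway so unexpected drops get noticed.
    removed = sorted(previous.keys() - latest.keys())
    changed = sorted(
        rg_id
        for rg_id in latest.keys() & previous.keys()
        if latest[rg_id] != previous[rg_id]
    )
    return {"added": added, "removed": removed, "changed": changed}


morning = {"a": {"title": "A", "updated_at": "2026-05-14T08:00:00Z"},
           "b": {"title": "B", "updated_at": "2026-05-14T08:00:00Z"}}
evening = {"a": {"title": "A", "updated_at": "2026-05-14T08:00:00Z"},
           "b": {"title": "B", "updated_at": "2026-05-14T20:00:00Z"},
           "c": {"title": "C", "updated_at": "2026-05-14T20:00:00Z"}}
changes = diff_snapshots(morning, evening)
```

For these two snapshots the result lists `c` as added and `b` as changed, which is enough to drive a targeted downstream refresh.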

See the Data in Action.

Talk to our engineering team about your use case, data format, and integration requirements. Most evaluations start with a sample dataset tailored to your catalog.