Building A Scalable Data Lake On AWS For An OTT Platform: Driving Speed & Reliability

About the client

Sony Corporation is a leading OTT content provider whose live and on-demand streaming viewership exceeds 50 million users.


One major challenge Sony faced was the lack of scalability of its bare-metal infrastructure, which could not support large amounts of data, heavy workloads, and many concurrent users. The website could not handle traffic surges, and planning for and adding capacity took too much time and added cost, resulting in lost opportunities and revenue for the company. In addition, Sony needed to extract data from 45+ disparate data sources to build reports and dashboards for various businesses, a cumbersome process made harder by the location, geographic, and compliance constraints involved in accessing data from the different sources. With data scattered across sources and no single source of truth, the team lacked visibility into user consumption behavior and struggled to make data-informed decisions for the business.


Cloud Kinetics helped Sony build a data lake on AWS, establishing a single source of truth that makes it easier for users across the company to access, process, and consume data. This involved migrating data from 6 disparate sources with AWS Glue, a fully managed ETL (extract, transform, and load) service that automates extracting data from various sources and transforming it into a format suitable for analysis and reporting. Additionally, custom scripts from third-party tools were converted to work with AWS services, enabling the team to derive actionable insights efficiently.

  • Build data lake on AWS as a single source of truth
  • Data migration from 6 data sources
  • Creation of ETL scripts in AWS Glue
  • Conversion of custom scripts from a third-party tool
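
To illustrate the transform step that ETL scripts of this kind perform, here is a minimal sketch in plain Python. It is not the client's actual code and does not use the AWS Glue API: the source names, field mappings, and the shared schema (user_id, content_id, watched_at) are all hypothetical, chosen only to show how records from differently shaped sources can be normalized into one format before landing in a data lake. In a real Glue job, equivalent logic would typically run as a PySpark script.

```python
from datetime import datetime, timezone

# Hypothetical field mappings: each source names the same concepts differently.
SOURCE_FIELD_MAPS = {
    "player_logs": {"uid": "user_id", "asset": "content_id", "ts": "watched_at"},
    "billing_export": {
        "customer": "user_id",
        "title_id": "content_id",
        "event_time": "watched_at",
    },
}


def normalize(source, record):
    """Rename source-specific fields to the shared schema and tag the origin."""
    mapping = SOURCE_FIELD_MAPS[source]
    row = {target: record[field] for field, target in mapping.items()}
    # Assumption: every source reports time as Unix epoch seconds.
    # Store it as an ISO 8601 UTC string so all rows compare consistently.
    epoch_seconds = int(row["watched_at"])
    row["watched_at"] = datetime.fromtimestamp(
        epoch_seconds, tz=timezone.utc
    ).isoformat()
    row["source"] = source
    return row


def run_etl(batches):
    """Take a dict of source name -> list of raw records; return unified rows."""
    rows = []
    for source, records in batches.items():
        rows.extend(normalize(source, record) for record in records)
    return rows


if __name__ == "__main__":
    sample = {
        "player_logs": [{"uid": "u1", "asset": "c9", "ts": 0}],
        "billing_export": [
            {"customer": "u2", "title_id": "c3", "event_time": "60"}
        ],
    }
    for row in run_etl(sample):
        print(row)
```

Keeping the per-source mapping in a single table means that onboarding another source is mostly a matter of adding one entry, which is one reason a shared schema helps when consolidating many sources into a single source of truth.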

Success Metrics

  • Time saved: Rapid (under 24 hrs) microsite deployment for timed events
  • Reliability: Sustained ability to handle website traffic surges
  • Scalability: Multicloud model supporting an elastic 400 to 1200 instances