BIG DATA on AWS

MIS reporting bottleneck resolved for India’s Largest Electricity Distributor

BUSINESS CHALLENGE

Thousands of feeders across the city generate SCADA data every 15 minutes, amounting to 500,000 records per day. Multi-dimensional reports are generated from this data to identify and fix transmission losses and power failures and to improve service quality. With the existing on-premises IT setup on Oracle SuperCluster, the MIS team struggled to deliver reports to decision makers on time.
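
As a rough sanity check on the volume quoted above, the short calculation below estimates how many feeders the daily record count implies. It assumes that each feeder emits exactly one record per 15-minute interval, which the source does not state, so the result is only indicative.

```python
# Assumption (not stated in the source): one record per feeder per 15-minute interval.
MINUTES_PER_DAY = 24 * 60
INTERVAL_MINUTES = 15
RECORDS_PER_DAY = 500_000  # figure quoted above

readings_per_feeder_per_day = MINUTES_PER_DAY // INTERVAL_MINUTES
implied_feeders = RECORDS_PER_DAY / readings_per_feeder_per_day

print(readings_per_feeder_per_day)  # 96
print(round(implied_feeders))       # 5208
```

Under that assumption the figure works out to roughly 5,200 feeders, which is consistent with "thousands of feeders".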

BUSINESS GOAL

Timely availability of scheduled and ad hoc reports to decision makers, enabling them to address quality issues.

SOLUTION OVERVIEW

  • Daily transfer of raw data to AWS cloud storage
  • AWS Architecture – Redshift cluster for Data orchestration & EC2 for Report Generation App
  • Batch processing for data cleansing, transformation and loading
  • Reports generated by directly querying Redshift datasets
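
The batch cleansing and transformation step listed above could look something like the minimal sketch below. The field names (feeder_id, timestamp, load_kw), the timestamp format and the cleansing rules are all hypothetical, since the source does not describe the SCADA record layout or the actual rules applied.

```python
from datetime import datetime

# Hypothetical column names; the source does not describe the record layout.
REQUIRED_FIELDS = ("feeder_id", "timestamp", "load_kw")


def cleanse(raw_rows):
    """Drop incomplete or unparseable rows, convert types, and de-duplicate."""
    seen = set()
    clean = []
    for row in raw_rows:
        # Cleansing: skip rows where a required field is missing or blank.
        if any(row.get(f) is None or not str(row[f]).strip()
               for f in REQUIRED_FIELDS):
            continue
        try:
            # Transformation: parse text into typed values.
            ts = datetime.strptime(str(row["timestamp"]).strip(), "%Y-%m-%d %H:%M")
            load_kw = float(row["load_kw"])
        except ValueError:
            continue  # unparseable timestamp or reading
        key = (str(row["feeder_id"]).strip(), ts)
        if key in seen:
            continue  # keep only the first reading per feeder and interval
        seen.add(key)
        clean.append({"feeder_id": key[0], "timestamp": ts, "load_kw": load_kw})
    return clean


sample = [
    {"feeder_id": "F001", "timestamp": "2020-01-01 00:00", "load_kw": "412.5"},
    {"feeder_id": "F001", "timestamp": "2020-01-01 00:00", "load_kw": "412.5"},  # duplicate
    {"feeder_id": "F002", "timestamp": "2020-01-01 00:15", "load_kw": ""},       # blank reading
    {"feeder_id": "F003", "timestamp": "not a time", "load_kw": "10"},           # bad timestamp
    {"feeder_id": " F004 ", "timestamp": "2020-01-01 00:15", "load_kw": "98"},
]
cleaned = cleanse(sample)
print(len(cleaned))  # 2
```

The loading step is not shown. With Redshift, cleaned files staged in cloud storage are typically bulk-loaded with the COPY command, but the source does not say which loading mechanism was used here.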

SOLUTION HIGHLIGHTS

  • A highly resilient, rapidly scalable AWS architecture conforming to best practices
  • Optimal operation of Redshift clusters
  • Significant reduction in query execution time

VALUE PROPOSITION

PERFORMANCE ENHANCEMENT

  • Daily reports that previously took 5+ hours to generate are now generated in less than 10 seconds
  • Yearly reports that previously took 12+ hours to generate are now generated in less than 1 minute
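
To put the figures above in perspective, the short calculation below derives the implied speed-up factors. Because the source gives only bounds ("5+ hours", "less than 10 seconds"), the results are lower bounds rather than measured values.

```python
# Lower bounds taken from the bullets above, converted to seconds.
daily_before_s, daily_after_s = 5 * 3600, 10
yearly_before_s, yearly_after_s = 12 * 3600, 60

daily_speedup = daily_before_s / daily_after_s
yearly_speedup = yearly_before_s / yearly_after_s

print(daily_speedup)   # 1800.0
print(yearly_speedup)  # 720.0
```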

COST OPTIMIZATION

  • An automated process that backs up each day's data, terminates the Redshift clusters and
    rebuilds them the next day from the existing data
  • Pay-as-you-go cloud model reduced data storage costs by 60%
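
The source does not say how the backup, terminate and rebuild cycle above was automated. One common way to do it, sketched below as an assumption, is to delete the cluster with a final snapshot each night and restore from that snapshot the next morning. The functions take a Redshift client object as a parameter; delete_cluster and restore_from_cluster_snapshot are the boto3 Redshift client calls of the same names, while the cluster identifier and snapshot naming scheme are hypothetical.

```python
from datetime import date


def snapshot_name(cluster_id, day):
    """Build a dated snapshot identifier, e.g. 'mis-reports-2020-01-31'."""
    return f"{cluster_id}-{day.isoformat()}"


def nightly_teardown(redshift, cluster_id, day):
    """Delete the cluster, asking Redshift to take a final snapshot first."""
    name = snapshot_name(cluster_id, day)
    redshift.delete_cluster(
        ClusterIdentifier=cluster_id,
        SkipFinalClusterSnapshot=False,       # keep the data as a snapshot
        FinalClusterSnapshotIdentifier=name,
    )
    return name


def morning_rebuild(redshift, cluster_id, snapshot_id):
    """Recreate the cluster from the previous night's snapshot."""
    redshift.restore_from_cluster_snapshot(
        ClusterIdentifier=cluster_id,
        SnapshotIdentifier=snapshot_id,
    )
```

In practice the client would be created with boto3.client("redshift") and the two functions would be triggered by a scheduler. A production version would also need to wait for each operation to complete and handle failures, which this sketch omits.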

HIGHLY SECURE BUILD

  • Connection from on-premises systems to AWS established over a VPN tunnel
  • Role-based access control for the AWS environment
  • Cloud storage implemented with AWS best practices