
Building an AdOps Reporting Dashboard for a Video AdTech Platform


A leading video AdTech company needed a proprietary AdOps reporting dashboard — one that could collect, aggregate, normalize, process, and visualize data flowing out of its ad mediator platform. This article walks through the business use case, the technical requirements, the main challenges, and the stack used to deliver the solution.

The Scenario

The company operated an ad mediation platform that collected data from two separate data harvesters. Publishers using the platform needed visibility into campaign performance metrics and the ability to manage and control their inventory — but no reporting layer existed to surface that data in a usable way.

The goal was to build an AdOps reporting dashboard that would serve as that layer: pulling data from multiple sources, processing it, and presenting actionable metrics in a clean UI.

Main Challenges

Three interconnected technical challenges shaped the architecture of the solution:

  1. Establishing a central aggregation point for the disparate data sources feeding the platform.
  2. Transforming raw header bidding transaction logs into computed metrics, and doing so quickly enough to meet a strict latency requirement.
  3. Presenting processed metrics in the UI within 10 minutes of the underlying data being generated — a low-latency constraint that ruled out batch-only approaches.

A secondary requirement was infrastructure that would be straightforward to deploy and update over time, without heavy manual intervention.

Approach and Technical Decisions

Computing and Displaying Metrics

The core data processing challenge stemmed from the nature of the source files. Metrics had to be computed from raw header bidding transaction logs dumped into Amazon S3 buckets. These files varied considerably in size and contained multiple log types, meaning they required transformation before any meaningful metrics could be extracted and displayed.

Solution: Apache Spark was selected for data processing, given its ability to handle large, heterogeneous data files efficiently. PostgreSQL was used to store the aggregated metrics once processed. To meet the 10-minute latency requirement and support fast filtering and display in the UI, Apache Spark ran on Amazon EMR and PostgreSQL ran on Amazon RDS — both managed services that provide the scalability and availability needed for a production reporting workload.
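The transformation step is easiest to see in miniature. The sketch below, in plain Python rather than Spark, illustrates the kind of per-record rollup the EMR job performs before writing to PostgreSQL; the field names ("partner", "event", "price_usd") are illustrative assumptions, not the platform's actual log schema.

```python
from collections import defaultdict

def aggregate_metrics(records):
    """Roll raw header bidding transaction records up into per-partner metrics.

    Each record is a dict with hypothetical fields: 'partner', 'event'
    ('bid_request', 'bid', or 'win'), and 'price_usd' (present on wins).
    """
    stats = defaultdict(lambda: {"requests": 0, "bids": 0, "wins": 0, "revenue": 0.0})
    for rec in records:
        s = stats[rec["partner"]]
        if rec["event"] == "bid_request":
            s["requests"] += 1
        elif rec["event"] == "bid":
            s["bids"] += 1
        elif rec["event"] == "win":
            s["wins"] += 1
            s["revenue"] += rec.get("price_usd", 0.0)
    # Derive the display metrics the dashboard filters on.
    return {
        partner: {
            "fill_rate": s["wins"] / s["requests"] if s["requests"] else 0.0,
            "ecpm_usd": 1000 * s["revenue"] / s["wins"] if s["wins"] else 0.0,
            **s,
        }
        for partner, s in stats.items()
    }
```

In the production pipeline the same grouping and derivation would be expressed as Spark transformations over the S3 files, which is what lets the job scale across files of widely varying size.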

Infrastructure Management

Manually managing infrastructure at this scale introduces risk and slows iteration. The requirement was for an infrastructure layer that could be created, modified, and versioned with minimal friction.

Solution: Terraform was adopted as the infrastructure-as-code tool, enabling reproducible environment creation and straightforward updates to existing configurations.
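As an illustration of what this looks like in practice, a Terraform definition for the RDS-hosted PostgreSQL instance might resemble the following; the resource name, sizing, and variables are hypothetical, not the project's actual configuration.

```hcl
# Hypothetical RDS instance for the aggregated-metrics store.
resource "aws_db_instance" "metrics" {
  identifier          = "adops-metrics"   # illustrative name
  engine              = "postgres"
  engine_version      = "15"
  instance_class      = "db.m6g.large"    # size to the reporting workload
  allocated_storage   = 100
  username            = var.db_username
  password            = var.db_password
  multi_az            = true              # availability for a production dashboard
  skip_final_snapshot = false
}
```

Because the definition is versioned alongside the application code, resizing the instance or promoting a configuration change becomes a reviewed commit rather than a manual console operation.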

Containerization

Portions of the application required containerization to support clean deployment and future extensibility.

Solution: Docker was used for containerization, with Amazon EKS (AWS's managed Kubernetes service) providing orchestration. Kubernetes on EKS simplifies ongoing container management and reduces the operational overhead of scaling or modifying workloads over time.
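For context, a minimal Kubernetes Deployment for one of the containerized services might be sketched like this; the service name, image path, and port are placeholders assumed for illustration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reporting-api            # hypothetical service name
spec:
  replicas: 2                    # EKS reschedules pods if a node fails
  selector:
    matchLabels:
      app: reporting-api
  template:
    metadata:
      labels:
        app: reporting-api
    spec:
      containers:
        - name: reporting-api
          image: <account>.dkr.ecr.us-east-1.amazonaws.com/reporting-api:latest
          ports:
            - containerPort: 8080
```

Scaling or updating a workload then reduces to changing the replica count or image tag and applying the manifest, which is the low-overhead operational model the team was after.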

Technology Stack

  • AWS (Amazon S3, Amazon EMR, Amazon RDS, Amazon EKS)
  • Apache Spark — raw transaction log processing
  • PostgreSQL — aggregated metrics storage
  • Terraform — infrastructure-as-code
  • Docker / Kubernetes (Amazon EKS) — containerization and orchestration
  • JavaScript — front-end reporting UI
  • Python — supporting data pipeline components

Outcomes and Considerations

The resulting platform gives publishers a practical window into their programmatic performance. Key capabilities include:

  • Filtering reports by demand partners, devices, and time intervals
  • Visibility into inventory performance across the ad mediation stack
  • Insight into demand partner performance to support optimization decisions

The 10-minute end-to-end latency target — from raw transaction log to visible dashboard metric — is achievable with this architecture, but it depends on right-sizing the EMR cluster relative to data volumes. Organizations considering a similar pattern should account for the cost implications of keeping EMR clusters warm versus spinning them up on demand, particularly if data ingestion is intermittent rather than continuous.

The use of Terraform and managed AWS services keeps the operational burden low, which is a meaningful advantage for AdTech teams that need to iterate quickly on reporting features without maintaining heavy infrastructure expertise in-house.