
Building a Location-Based Mobile DSP on AWS: Architecture, Challenges, and Tradeoffs


The Scenario

A publicly traded AdTech company specializing in location-based mobile advertising needed a robust, scalable cloud infrastructure to power its demand-side platform (DSP). The platform allows media buyers — brands, advertisers, and agencies — to purchase ad inventory from publishers (websites and mobile apps) in real time. The core technical challenges spanned years of incremental refinement: reducing manual infrastructure overhead, achieving the low-latency response times required by real-time bidding, implementing precise frequency capping, and eventually expanding to a second geographic region to access additional supply.

This write-up covers the architecture decisions made throughout that evolution, organized around the problems each decision was meant to solve.


Part 1: Core Infrastructure Challenges and Initial AWS Solutions

The Main Challenges

Manual infrastructure management. Early in the platform's life, the team maintained multiple instances relying on various third-party components. This translated into significant manual effort: managing multiple MySQL databases, performing regular backups, and patching instances individually.

Predictable instance replacement and code updates. Any production system running real-time bidding needs the ability to replace faulty compute instances quickly and deploy application code reliably. Without a structured approach, this process introduced unpredictability.

Frequency capping at low latency. Frequency capping — limiting the number of times a specific user sees the same ad within a given timeframe (e.g., a maximum of three exposures per 24-hour period) — is a core DSP feature. Implementing it correctly requires querying user-level data fast enough to keep pace with bid request throughput. Any storage layer that introduces meaningful latency breaks the feature.

Timely reporting. Tracked events (impressions, clicks, conversions) need to flow through to usable reports quickly. Delays undermine campaign management, budget pacing, and performance optimization.

Solutions Implemented

  • Amazon RDS replaced manually managed MySQL databases, eliminating routine backup tasks and reducing the operational overhead of database maintenance.
  • Auto Scaling with Amazon Machine Images (AMI) addressed the instance reliability problem. When an EC2 instance becomes unreachable or faulty, Auto Scaling can launch a pre-configured replacement automatically. The AMI serves as a known-good baseline, making outcomes predictable.
  • DynamoDB was selected for frequency capping and key bidder functions because of its ability to deliver low-latency access to user-level data at scale. The bidder queries DynamoDB to check and update per-user ad exposure counts in real time (a sketch of this check follows the list).
  • Elastic Load Balancing (ELB) and Lambda handle tracker log processing for event-based reporting. Lambda processes logs in parallel batches, often triggered by other AWS services.
  • ElastiCache for Redis supports real-time analytics queries (impressions, clicks, and derived metrics across multiple dimensions) and provides a fallback for correcting available budgets when tracker data is delayed.
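
The DynamoDB bullet above describes the frequency-capping check in prose; below is a minimal sketch of how such a check might look, assuming a hypothetical DynamoDB table named frequency_caps keyed by user_id and campaign_id. The attribute names, the three-exposures-per-24-hours cap, and the TTL handling are illustrative assumptions, not the platform's actual schema.

```python
# Minimal frequency-capping sketch (assumed table and attribute names).
import time

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("frequency_caps")  # hypothetical table name

CAP = 3                      # max exposures per user per campaign (from the 3-per-24h example)
WINDOW_SECONDS = 24 * 3600   # 24-hour window


def try_reserve_exposure(user_id: str, campaign_id: str) -> bool:
    """Atomically increment the exposure count unless the cap is already reached.

    Returns True if the bidder may bid for this campaign, False otherwise.
    """
    now = int(time.time())
    try:
        table.update_item(
            Key={"user_id": user_id, "campaign_id": campaign_id},
            UpdateExpression="ADD exposures :one SET expires_at = if_not_exists(expires_at, :ttl)",
            ConditionExpression="attribute_not_exists(exposures) OR exposures < :cap",
            ExpressionAttributeValues={
                ":one": 1,
                ":cap": CAP,
                ":ttl": now + WINDOW_SECONDS,  # expired items can be purged via DynamoDB TTL
            },
        )
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # this user has already seen the ad CAP times in the window
        raise
```

If the conditional update fails, the bidder simply skips that campaign for the request; the same atomic update both enforces the cap and records the new exposure, keeping the check to a single low-latency DynamoDB call per candidate campaign.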

Outcomes

  • Infrastructure management overhead dropped substantially. Time previously spent on manual database tasks was redirected to platform feature development.
  • Auto Scaling maintains high availability: new instances launch quickly whenever existing ones become unreachable.
  • DynamoDB enables granular, up-to-the-minute frequency capping while keeping bidder throughput high.
  • With ELB, Lambda, and Redshift working together, reports based on tracked events are processed and made available within 15 minutes.
  • The platform scales to handle up to 260,000 bid requests per second (QPS).

Part 2: West Coast Expansion and Budget Control

Challenge 1: West Coast Data Center Setup

Real-time bidding operates under strict latency constraints. When a user opens a mobile app containing an ad slot, the app sends a bid request to an SSP or ad exchange, which fans it out to multiple DSPs simultaneously. Each DSP must return its bid within a defined window — typically 120ms to 300ms. Miss that window and the DSP is timed out; the opportunity is lost.

Proximity matters enormously here. SSPs and ad exchanges route bid requests to nearby DSPs to minimize round-trip latency. A platform running only on the East Coast will be at a structural disadvantage competing for supply generated by West Coast users and publishers. Some ad requests are explicitly routed to the nearest data center, meaning a DSP without West Coast presence simply won't receive them.

The architectural challenge was replicating the existing application stack in a new region without diverging from the established configuration.

Solution: The bidder instances and Lambda monitoring functions were deployed in a West Coast data center by copying the AMI configuration from the East Coast region. This ensured both regions ran identical application versions. VPC peering was used to transfer data between the two data centers, keeping campaign state synchronized.
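
As a rough illustration of the region replication step, the sketch below copies a known-good bidder AMI from the East Coast region into the West Coast region using boto3. The region names and AMI ID are placeholders, and wiring the copied image into the West Coast Auto Scaling group is not shown.

```python
# Minimal sketch: replicate the bidder AMI into the second region (assumed names/IDs).
import boto3

SOURCE_REGION = "us-east-1"                      # assumed East Coast region
TARGET_REGION = "us-west-2"                      # assumed West Coast region
SOURCE_AMI_ID = "ami-0123456789abcdef0"          # hypothetical bidder AMI

ec2_west = boto3.client("ec2", region_name=TARGET_REGION)

# Copy the known-good image so both regions run identical application versions.
response = ec2_west.copy_image(
    Name="bidder-west-copy",
    SourceImageId=SOURCE_AMI_ID,
    SourceRegion=SOURCE_REGION,
)
print("New West Coast AMI:", response["ImageId"])
```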

Challenge 2: Budget Control (the "Banker" System)

The DSP receives 100,000+ ad requests per second from SSPs and ad exchanges. A core problem with real-time bidding is that there is an inherent delay between submitting a bid and receiving confirmation of the auction outcome. The DSP doesn't immediately know whether it won, so it can't immediately charge the campaign budget. If the bidder continues submitting bids during this delay without accounting for pending spend, advertisers overspend.

An initial approach used Redis to record budget spend, queried directly by the bidder. To further reduce overspending, a dedicated banker system was implemented.

How the banker works:

  1. The bidder requests budget resources from the banker before placing a bid.
  2. The banker returns available budget and adjusts the campaign's remaining balance to account for the pending bid (a reservation).
  3. ElastiCache for Redis stores budgets and reservations, supporting fast queries from the banker to retrieve per-bidder reservation data.
  4. The tracker subsequently reports actual spend back to the campaign reporting database, reconciling the reservation against the real outcome.

This reservation model means the platform can prevent overspending even when auction outcome data arrives late.
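
A minimal sketch of the reservation step (items 1 through 3 above) is shown below, assuming a hypothetical Redis key layout in which budget:{campaign_id} holds a campaign's remaining budget and reservation:{campaign_id}:{bid_id} records each pending bid. The key names, micro-dollar units, and reservation TTL are assumptions for illustration, not the platform's actual schema.

```python
# Minimal banker reservation sketch against ElastiCache for Redis (assumed key layout).
import redis

r = redis.Redis(host="localhost", port=6379)  # ElastiCache endpoint in production

# Check-and-decrement runs as one Lua script so the reservation is atomic.
RESERVE_LUA = """
local remaining = tonumber(redis.call('GET', KEYS[1]) or '0')
local amount = tonumber(ARGV[1])
if remaining < amount then
  return 0
end
redis.call('DECRBY', KEYS[1], amount)
redis.call('SET', KEYS[2], amount, 'EX', tonumber(ARGV[2]))
return 1
"""
reserve = r.register_script(RESERVE_LUA)


def reserve_budget(campaign_id: str, bid_id: str, amount_micros: int, ttl_s: int = 300) -> bool:
    """Reserve amount_micros for a pending bid; returns False if the budget is exhausted."""
    budget_key = f"budget:{campaign_id}"
    reservation_key = f"reservation:{campaign_id}:{bid_id}"
    return bool(reserve(keys=[budget_key, reservation_key], args=[amount_micros, ttl_s]))
```

Running the check-and-decrement as a single script keeps the reservation atomic, so concurrent bidder processes cannot jointly push a campaign past its remaining budget while auction outcomes are still pending.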


Platform Architecture: How the Components Fit Together

The diagram below illustrates the end-to-end flow of a bid request through the platform:

[Diagram: how a location-based mobile DSP processes bid requests end-to-end]

Step-by-step flow:

  1. A user opens a mobile app containing an ad slot. The app sends a bid request — including user data such as device type and location — to an SSP and/or ad exchange.
  2. The SSP/ad exchange forwards the bid request to the DSP's bidder.
  3. The bidder requests budget resources from the banker. The banker returns available budget and adjusts the campaign balance to prevent overspending.
  4. The bidder matches the bid request against active advertiser campaigns using predefined targeting criteria. For example, if the bid request indicates a user in New York on an iPhone, the bidder searches for campaigns targeting that profile. If a match is found, the bidder returns a bid response (ad markup) to the SSP/ad exchange.
  5. If the bid wins, the ad markup loads in the mobile app, which sends a request to the ad server to retrieve the creative.
  6. The ad server delivers the ad to the mobile app.
  7. The tracker collects impression, click, and conversion data.
  8. Budget data flows from the tracker to the campaign reporting database.
  9. Campaign and analytics data also flow from the tracker to the campaign reporting database.
  10. The campaign reporting database aggregates all data and surfaces it in the user interface, where advertisers can plan campaigns, manage budgets, and review performance metrics (impressions, clicks, conversions).

Many of these steps happen concurrently rather than in strict sequence.
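
To make step 4 concrete, here is a minimal sketch of matching a bid request against campaign targeting criteria. The field names, campaign structure, and flat-CPM pricing are assumptions for illustration; the real bidder also consults the banker and frequency caps before responding.

```python
# Minimal targeting-match sketch for step 4 (assumed fields and pricing model).
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Campaign:
    campaign_id: str
    bid_cpm: float                                       # price offered per thousand impressions
    geo_targets: set = field(default_factory=set)        # e.g. {"US-NY"}
    device_targets: set = field(default_factory=set)     # e.g. {"iphone"}

    def matches(self, bid_request: dict) -> bool:
        # An empty target set means "no restriction" on that dimension.
        geo_ok = not self.geo_targets or bid_request.get("geo") in self.geo_targets
        device_ok = not self.device_targets or bid_request.get("device") in self.device_targets
        return geo_ok and device_ok


def choose_bid(bid_request: dict, campaigns: list) -> Optional[dict]:
    """Return a bid response for the best-paying matching campaign, or None to no-bid."""
    eligible = [c for c in campaigns if c.matches(bid_request)]
    if not eligible:
        return None
    best = max(eligible, key=lambda c: c.bid_cpm)
    return {"campaign_id": best.campaign_id, "price_cpm": best.bid_cpm}


# Example: a New York iPhone user matches a campaign targeting that profile.
campaigns = [Campaign("c-123", bid_cpm=4.50, geo_targets={"US-NY"}, device_targets={"iphone"})]
print(choose_bid({"geo": "US-NY", "device": "iphone"}, campaigns))
```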

Component Summary

  • Bidder: evaluates bid requests against campaign targeting criteria; submits bids to SSPs/ad exchanges
  • Ad server: delivers creatives to mobile users after a won auction
  • Tracker: collects impression, click, and conversion data for analytics and budget reconciliation
  • Banker: tracks campaign budget in real time; prevents overspending via a reservation model
  • Campaign reporting database: aggregates data from the tracker; feeds the UI
  • User interface: allows advertisers to plan, manage, and measure campaigns

Infrastructure Design Details

Security

  • Raw event logs are stored on S3. If a database fails, it can be restored from a snapshot and the logs re-ingested to recover data integrity.
  • For in-memory databases (ElastiCache for Redis), automatic snapshots are enabled to provide a recovery point.
  • Network access controls and HTTPS secure the connections between clients and the Elastic Load Balancers.

Compute Selection

Instance types were selected through benchmark testing: A/B tests comparing candidate instance types on performance-sensitive production workloads, factoring in memory requirements alongside CPU.

  • The initial selection was M4 instances, chosen for their balanced resource profile.
  • As traffic grew, the platform migrated to C4 instances based on performance test results and the heavier compute workload.
  • The platform subsequently upgraded to C5 as the next generation of compute-optimized instances.

For container orchestration, ECS (on EC2) was selected over Fargate and EKS, primarily because EC2-backed ECS tasks provide more disk space than Fargate typically offers.

Lambda handles tasks that run in parallel across small batches, including ELB log processing that drives real-time tracker actions such as redirecting users to landing pages, setting conversion cookies, and reporting impression and spend data to the banker.
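
A minimal sketch of that log-processing path is shown below, assuming load balancer access logs are delivered to an S3 bucket that triggers the function through S3 event notifications. The log parsing, the impression-URL convention, and report_spend_to_banker() are illustrative placeholders rather than the platform's actual tracker logic.

```python
# Minimal Lambda sketch for ELB access-log processing (assumed trigger and log format).
import gzip

import boto3

s3 = boto3.client("s3")


def report_spend_to_banker(campaign_id: str, spend_micros: int) -> None:
    """Placeholder: forward reconciled spend to the banker / reporting pipeline."""
    print(f"campaign={campaign_id} spend_micros={spend_micros}")


def handler(event, context):
    # Each S3 event record points at one newly written load balancer log object.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        if key.endswith(".gz"):
            body = gzip.decompress(body)
        for line in body.decode("utf-8").splitlines():
            # Hypothetical convention: impression URLs carry campaign and price parameters.
            if "/impression?" in line:
                report_spend_to_banker(campaign_id="c-123", spend_micros=4500)
```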

Storage

Several storage options were evaluated over time, including S3 Glacier and Amazon Elastic File System (EFS). Amazon Elastic Block Store (EBS) was ultimately selected based on availability and reliability requirements relative to the existing architecture.

Database

The final database configuration:

  • Redshift — stores terabytes of event data for historical analytics and reporting
  • ElastiCache for Redis — provides real-time data access for the bidder (latency-sensitive queries)
  • PostgreSQL on RDS — handles all other storage requirements

Networking

  • Classic ELBs route traffic to EC2 instances (migration to Application Load Balancers is underway, enabling container-based routing and directing a single load balancer to different instances per domain; a host-based routing sketch follows this list)
  • ELB logs provide traffic data for analysis
  • CloudFront serves creative assets
  • Route 53 manages public and private domains and subdomains
  • VPC separates environments; VPC peering connects the East Coast and West Coast data centers
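
As referenced in the load balancer item above, a host-based routing rule on an Application Load Balancer is what lets a single load balancer direct different domains to different target groups. A minimal sketch using boto3 follows; the ARNs and hostname are placeholders.

```python
# Minimal ALB host-based routing sketch (placeholder ARNs and hostname).
import boto3

elbv2 = boto3.client("elbv2")

LISTENER_ARN = "arn:aws:elasticloadbalancing:...:listener/app/example/..."   # placeholder
BIDDER_TG_ARN = "arn:aws:elasticloadbalancing:...:targetgroup/bidder/..."    # placeholder

# Requests for bidder.example.com are forwarded to the bidder target group.
elbv2.create_rule(
    ListenerArn=LISTENER_ARN,
    Priority=10,
    Conditions=[{"Field": "host-header", "Values": ["bidder.example.com"]}],
    Actions=[{"Type": "forward", "TargetGroupArn": BIDDER_TG_ARN}],
)
```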

Performance Targets

The architecture is evaluated against these operational metrics:

  • Bid request responses within the defined latency window (e.g., 150ms)
  • Report generation completed within 15 minutes
  • Campaign budget data reliably delivered to the banker to prevent advertiser overspending

CloudWatch with custom application metrics, alarms, and dashboards provides ongoing performance monitoring.
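
A minimal sketch of that monitoring setup is shown below, assuming a hypothetical DSP/Bidder metric namespace and a BidResponseLatency custom metric. The alarm threshold mirrors the 150ms example above, but the names, values, and omitted alarm actions are placeholders.

```python
# Minimal CloudWatch custom-metric and alarm sketch (assumed namespace and metric name).
import boto3

cloudwatch = boto3.client("cloudwatch")

# Publish one bid-response latency sample from the bidder.
cloudwatch.put_metric_data(
    Namespace="DSP/Bidder",
    MetricData=[{
        "MetricName": "BidResponseLatency",
        "Value": 87.0,
        "Unit": "Milliseconds",
    }],
)

# Alarm if average latency approaches the bidding window for three consecutive minutes.
cloudwatch.put_metric_alarm(
    AlarmName="bidder-latency-over-150ms",
    Namespace="DSP/Bidder",
    MetricName="BidResponseLatency",
    Statistic="Average",
    Period=60,
    EvaluationPeriods=3,
    Threshold=150.0,
    ComparisonOperator="GreaterThanThreshold",
)
```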


Tradeoffs

The most notable architectural tradeoff in this setup is the deliberate choice to manage certain services manually rather than relying on full automation. Automated services reduce configuration time, but manual management allows more granular tuning of individual service parameters — which, for a latency-sensitive, high-throughput platform like a DSP, can yield meaningfully better overall performance. The cost is higher ongoing configuration effort; the benefit is finer control over the components that matter most.

For teams evaluating a similar architecture, this tradeoff is worth making explicit early: the services most worth automating are those where consistency matters more than customization (database backups, instance replacement, log processing). The services worth managing manually are those where performance headroom is being actively squeezed — typically the bidder compute layer and the real-time data access paths.