
Doing more with less: Moving from transactional to stateful batch processing

Amazon processes hundreds of millions of financial transactions each day, including accounts receivable, accounts payable, royalties, amortizations, and remittances, from over a hundred different business entities. All of this data is sent to the eCommerce Financial Integration (eCFI) systems, where it is recorded in the subledger.

Ensuring complete financial reconciliation at this scale is critical to day-to-day accounting operations. With transaction volumes exhibiting double-digit percentage growth each year, we found that our legacy transactional-based financial reconciliation architecture proved too expensive to scale and lacked the right level of visibility for our operational needs.

In this post, we show you how we migrated to a batch processing system, built on AWS, that consumes time-bounded batches of events. This not only reduced costs by almost 90%, but also improved visibility into our end-to-end processing flow. The code used for this post is available on GitHub.

Legacy architecture

Our legacy architecture primarily utilized Amazon Elastic Compute Cloud (Amazon EC2) to group related financial events into stateful artifacts. Here, a stateful artifact could refer to any persistent artifact, such as a database entry or an Amazon Simple Storage Service (Amazon S3) object.

We found this approach resulted in deficiencies in the following areas:

  • Cost – Individually storing hundreds of millions of financial events per day in Amazon S3 resulted in high I/O and Amazon EC2 compute resource costs.
  • Data completeness – Different events flowed through the system at different speeds. For instance, while a small stateful artifact for a single customer order could be recorded in a couple of seconds, the stateful artifact for a bulk shipment containing a million lines might require several hours to update fully. This made it difficult to know whether all the data had been processed for a given time range.
  • Complex retry mechanisms – Financial events were passed between legacy systems using individual network calls, wrapped in a backoff retry strategy. However, network timeouts, throttling, or traffic spikes could result in some events erroring out. This required us to build a separate service to sideline, manage, and retry problematic events at a later date.
  • Scalability – Bottlenecks occurred when different events competed to update the same stateful artifact. This resulted in excessive retries or redundant updates, making the system less cost-effective as it grew.
  • Operational support – Using dedicated EC2 instances meant that we needed to take valuable development time to manage OS patching, handle host failures, and schedule deployments.

The following diagram illustrates our legacy architecture.

Transactional-based legacy architecture

Evolution is key

Our new architecture needed to address the deficiencies while preserving the core goal of our service: update stateful artifacts based on incoming financial events. In our case, a stateful artifact refers to a group of related financial transactions used for reconciliation. We considered the following as part of the evolution of our stack:

  • Stateless and stateful separation
  • Minimized end-to-end latency
  • Scalability

Stateless and stateful separation

In our transactional system, each ingested event results in an update to a stateful artifact. This became a problem when thousands of events came in all at once for the same stateful artifact.

However, by ingesting batches of data, we had the opportunity to create separate stateless and stateful processing components. The stateless component performs an initial reduce operation on the input batch to group together related events. This meant that the rest of our system could operate on these smaller stateless artifacts and perform fewer write operations (fewer operations means lower costs).

The stateful component would then join these stateless artifacts with existing stateful artifacts to produce an updated stateful artifact.

For instance, imagine an online retailer suddenly received thousands of purchases for a popular item. Instead of updating an item database entry thousands of times, we can first produce a single stateless artifact that summarizes the latest purchases. The item entry can now be updated one time with the stateless artifact, reducing the update bottleneck. The following diagram illustrates this process.

Batch visualization
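In plain Python (the actual service performs this step with Apache Spark on Amazon EMR; the event fields below are illustrative), the initial reduce operation might look like:

```python
from collections import defaultdict

def reduce_batch(events):
    """Group raw purchase events by item ID into one stateless summary
    artifact per item, so each item entry is written only once."""
    summaries = defaultdict(lambda: {"quantity": 0, "revenue": 0.0})
    for event in events:
        summary = summaries[event["item_id"]]
        summary["quantity"] += event["quantity"]
        summary["revenue"] += event["quantity"] * event["unit_price"]
    return dict(summaries)

# Thousands of purchases of the same popular item collapse into one artifact.
events = [
    {"item_id": "B01", "quantity": 2, "unit_price": 10.0},
    {"item_id": "B01", "quantity": 1, "unit_price": 10.0},
    {"item_id": "B07", "quantity": 5, "unit_price": 3.0},
]
stateless_artifacts = reduce_batch(events)
print(stateless_artifacts["B01"])  # {'quantity': 3, 'revenue': 30.0}
```

The downstream item entry is then updated once per summary rather than once per event.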

Minimized end-to-end latency

Unlike traditional extract, transform, and load (ETL) jobs, we didn't want to perform daily or even hourly extracts. Our accountants need to be able to access the updated stateful artifacts within minutes of data arriving in our system. For instance, if they had manually sent a correction line, they wanted to be able to check within the same hour that their adjustment had the intended effect on the targeted stateful artifact, instead of waiting until the next day. As such, we focused on parallelizing the incoming batches of data as much as possible by breaking down the individual tasks of the stateful component into subcomponents. Each subcomponent could run independently of the others, which allowed us to process multiple batches in an assembly line format.


Scalability

Both the stateless and stateful components needed to respond to shifting traffic patterns and possible input batch backlogs. We also wanted to incorporate serverless compute to better respond to scale while reducing the overhead of maintaining an instance fleet.

This meant we couldn't simply have a one-to-one mapping between the input batch and the stateless artifact. Instead, we built flexibility into our service so the stateless component could automatically detect a backlog of input batches and group multiple input batches together in one job. Similar backlog management logic was applied to the stateful component. The following diagram illustrates this process.

Batch scalability
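The backlog grouping can be sketched as follows (plain Python; the cap of four batches per job is an illustrative parameter, not our production setting):

```python
def plan_jobs(pending_batches, max_batches_per_job=4):
    """Group a backlog of pending input batches into jobs: with no
    backlog each job holds a single batch, while under a backlog
    several batches are folded into one job."""
    return [
        pending_batches[i:i + max_batches_per_job]
        for i in range(0, len(pending_batches), max_batches_per_job)
    ]

backlog = [f"batch-{n}" for n in range(10)]
jobs = plan_jobs(backlog)
print(len(jobs))  # 3 jobs instead of 10
```

With an empty or small queue this degrades gracefully to one batch per job, so steady-state latency is unaffected.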

Current architecture

To meet our needs, we combined several AWS products:

  • AWS Step Functions – Orchestration of our stateless and stateful workflows
  • Amazon EMR – Apache Spark operations on our stateless and stateful artifacts
  • AWS Lambda – Stateful artifact indexing and orchestration backlog administration
  • Amazon ElastiCache – Optimizing Amazon S3 request latency
  • Amazon S3 – Scalable storage of our stateless and stateful artifacts
  • Amazon DynamoDB – Stateless and stateful artifact index

The following diagram illustrates our current architecture.

Current architecture

The following diagram shows our stateless and stateful workflow.


The AWS CloudFormation template to render this architecture and the corresponding Java code are available in the following GitHub repo.

Stateless workflow

We used an Apache Spark application on a long-running Amazon EMR cluster to concurrently ingest input batch data and perform reduce operations to produce the stateless artifacts and a corresponding index file for the stateful processing to use.

We chose Amazon EMR for its proven highly available data-processing capability in a production setting and also for its ability to horizontally scale when we see increased traffic loads. Most importantly, Amazon EMR had lower cost and better operational support when compared to a self-managed cluster.

Stateful workflows

Each stateful workflow performs operations to create or update millions of stateful artifacts using the stateless artifacts. Similar to the stateless workflows, all stateful artifacts are stored in Amazon S3 across a handful of Apache Spark part-files. This alone resulted in a huge cost reduction, because we significantly reduced the number of Amazon S3 writes (while using the same amount of overall storage). For instance, storing 10 million individual artifacts using the transactional legacy architecture would cost $50 in PUT requests alone, whereas 10 Apache Spark part-files would cost only $0.00005 in PUT requests (based on $0.005 per 1,000 requests).
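The cost comparison above is simple arithmetic, using the PUT price cited in the text:

```python
PUT_PRICE_PER_1000 = 0.005  # USD per 1,000 S3 PUT requests, as cited above

def put_cost(num_requests):
    """Cost in USD of issuing the given number of S3 PUT requests."""
    return num_requests / 1000 * PUT_PRICE_PER_1000

legacy = put_cost(10_000_000)  # one PUT per individual artifact -> $50
batched = put_cost(10)         # a handful of Spark part-files -> $0.00005
print(legacy, round(batched, 8))
```

A millionfold reduction in write requests translates directly into a millionfold reduction in PUT cost.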

However, we still needed a way to retrieve individual stateful artifacts, because any stateful artifact could be updated at any point in the future. To do this, we turned to DynamoDB. DynamoDB is a fully managed and scalable key-value and document database. It's ideal for our access pattern because we wanted to index the location of each stateful artifact within the stateful output file using its unique identifier as a primary key. For instance, if our artifact represented orders, we would use the order ID (which has high cardinality) as the partition key, and store the file location, byte offset, and byte length of each order as separate attributes. By passing the byte-range in Amazon S3 GET requests, we can now fetch individual stateful artifacts as if they were stored independently. We were less concerned about optimizing the number of Amazon S3 GET requests, because GET requests are over 10 times cheaper than PUT requests.
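The index-plus-byte-range idea can be sketched in plain Python, with a dict standing in for the DynamoDB table and an in-memory buffer standing in for one S3 part-file (names and payloads are illustrative):

```python
import io

index = {}                # order_id -> (file_name, byte_offset, byte_length)
part_file = io.BytesIO()  # stand-in for one S3 part-file

def write_artifacts(file_name, artifacts):
    """Concatenate serialized artifacts into the part-file and index
    each one's byte range under its high-cardinality order ID."""
    part_file.seek(0, io.SEEK_END)
    for order_id, payload in artifacts:
        offset = part_file.tell()
        part_file.write(payload)
        index[order_id] = (file_name, offset, len(payload))

def fetch(order_id):
    """Look up the byte range, then read only that slice -- the analog
    of an S3 GET with a Range: bytes=offset-(offset+length-1) header."""
    _, offset, length = index[order_id]
    part_file.seek(offset)
    return part_file.read(length)

write_artifacts("part-00000", [("order-1", b'{"total": 10}'),
                               ("order-2", b'{"total": 25}')])
print(fetch("order-2"))  # b'{"total": 25}'
```

Each artifact remains individually addressable even though millions of them share a handful of files.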

Overall, this stateful logic was split across three serial subcomponents, which meant that three separate stateful workflows could be running at any given time.


Pre-fetcher

The following diagram illustrates our pre-fetcher subcomponent.

Prefetcher architecture

The pre-fetcher subcomponent uses the stateless index file to retrieve pre-existing stateful artifacts that should be updated. These might be previous shipments for the same customer order, or past inventory movements for the same warehouse. For this, we turn once again to Amazon EMR to perform this high-throughput fetch operation.

Each fetch required a DynamoDB lookup and an Amazon S3 GET partial byte-range request. Due to the large number of external calls, fetches were highly parallelized using a thread pool contained within an Apache Spark flatMap operation. Pre-fetched stateful artifacts were consolidated into an output file that was later used as input to the stateful processing engine.
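A minimal sketch of the thread-pool pattern, with a stub standing in for the network calls (in the real service this runs per-partition inside a Spark flatMap, and the fetch performs a DynamoDB lookup plus an S3 byte-range GET):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_artifact(key):
    """Stand-in for one DynamoDB lookup plus one S3 byte-range GET;
    in production this is a network-bound call."""
    return {"key": key, "payload": f"artifact-{key}"}

def fetch_partition(keys, max_threads=32):
    """Fetch every artifact for one partition through a thread pool so
    the network-bound calls overlap instead of running serially."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(fetch_artifact, keys))

results = fetch_partition(["o-1", "o-2", "o-3"])
print([r["key"] for r in results])  # ['o-1', 'o-2', 'o-3']
```

Because the work is I/O-bound, a thread pool (rather than extra Spark tasks) keeps each executor core busy while requests are in flight.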

Stateful processing engine

The following diagram illustrates the stateful processing engine.

Stateful processor architecture

The stateful processing engine subcomponent joins the pre-fetched stateful artifacts with the stateless artifacts to produce updated stateful artifacts after applying custom business logic. The updated stateful artifacts are written out across multiple Apache Spark part-files.

Because stateful artifacts could have been indexed at the same time that they were pre-fetched (also called in-flight updates), the stateful processor also joins recently processed Apache Spark part-files.

We again used Amazon EMR here to take advantage of the Apache Spark operations that are required to join the stateless and stateful artifacts.

State indexer

The following diagram illustrates the state indexer.

State Indexer architecture

This Lambda-based subcomponent records the location of each stateful artifact within the stateful part-file in DynamoDB. The state indexer also caches the stateful artifacts in an Amazon ElastiCache for Redis cluster to provide a performance boost to the Amazon S3 GET requests performed by the pre-fetcher.

However, even with a thread pool, a single Lambda function isn't powerful enough to index millions of stateful artifacts within the 15-minute time limit. Instead, we employ a cluster of Lambda functions. The state indexer begins with a single coordinator Lambda function, which determines the number of worker functions that are needed. For instance, if 100 part-files are generated by the stateful processing engine, then the coordinator might assign five part-files to each of 20 Lambda worker functions. This method is highly scalable because we can dynamically assign more or fewer Lambda workers as required.

Each Lambda worker then performs the ElastiCache and DynamoDB writes for all the stateful artifacts within its assigned part-files in a multi-threaded manner. The coordinator function monitors the health of each Lambda worker and restarts workers as needed.

Distributed Lambda architecture
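The coordinator's partitioning step can be sketched as follows (plain Python; the even-split heuristic is illustrative, not our exact production logic):

```python
import math

def assign_part_files(part_files, num_workers):
    """Coordinator logic: split the part-files as evenly as possible
    across the Lambda worker invocations."""
    per_worker = math.ceil(len(part_files) / num_workers)
    return [part_files[i:i + per_worker]
            for i in range(0, len(part_files), per_worker)]

# 100 part-files across 20 workers -> 5 part-files each.
assignments = assign_part_files([f"part-{n:05d}" for n in range(100)], 20)
print(len(assignments), len(assignments[0]))  # 20 5
```

Each assignment becomes one worker invocation's payload, so scaling up is just a matter of choosing a larger `num_workers`.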


Orchestration

We used Step Functions to coordinate each of the stateless and stateful workflows, as shown in the following diagram.

Step Function Workflow

Every time a new workflow step ran, the step was recorded in a DynamoDB table via a Lambda function. This table not only maintained the order in which stateful batches should be run, but also formed the basis of the backlog management system, which directed the stateless ingestion engine to group more or fewer input batches together depending on the backlog.

We chose Step Functions for its native integration with many AWS services (including triggering by an Amazon CloudWatch scheduled event rule and adding Amazon EMR steps) and its built-in support for backoff retries and complex state machine logic. For instance, we defined different backoff retry rates based on the type of error.
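The per-error backoff idea can be sketched like this (the policy values are illustrative, not our production settings); Step Functions itself expresses the same thing declaratively through `Retry` entries with `ErrorEquals`, `IntervalSeconds`, `BackoffRate`, and `MaxAttempts` fields:

```python
# Illustrative retry policies keyed by error type, mirroring Step
# Functions Retry entries: throttling backs off more aggressively
# than transient server errors.
RETRY_POLICIES = {
    "ThrottlingException": {"interval": 5, "backoff_rate": 2.0, "max_attempts": 6},
    "ServerError":         {"interval": 1, "backoff_rate": 1.5, "max_attempts": 3},
}

def wait_seconds(error_type, attempt):
    """Delay before retry `attempt` (1-based), computed the same way
    Step Functions does: IntervalSeconds * BackoffRate ** (attempt - 1)."""
    policy = RETRY_POLICIES[error_type]
    if attempt > policy["max_attempts"]:
        raise RuntimeError("retries exhausted for " + error_type)
    return policy["interval"] * policy["backoff_rate"] ** (attempt - 1)

waits = [wait_seconds("ThrottlingException", a) for a in (1, 2, 3)]
print(waits)  # [5.0, 10.0, 20.0]
```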


Results

Our batch-based architecture helped us overcome the transactional processing limitations we originally set out to resolve:

  • Reduced cost – We have been able to scale to thousands of workflows and hundreds of millions of events per day using only three or four core nodes per EMR cluster. This reduced our Amazon EC2 usage by over 90% compared to a similar transactional system. Additionally, writing out batches instead of individual transactions reduced the number of Amazon S3 PUT requests by over 99.8%.
  • Data completeness guarantees – Because each input batch is associated with a time interval, when a batch has finished processing, we know that all events in that time interval have been completed.
  • Simplified retry mechanisms – Batch processing means that failures occur at the batch level and can be retried directly through the workflow. Because there are far fewer batches than transactions, batch retries are much more manageable. For instance, in our service, a typical batch contains about two million entries. During a service outage, only a single batch needs to be retried, as opposed to two million individual entries in the legacy architecture.
  • High scalability – We've been impressed with how easy it is to scale our EMR clusters on the fly if we detect an increase in traffic. Using Amazon EMR instance fleets also helps us automatically choose the most cost-effective instances across different Availability Zones. We also like the performance achieved by our Lambda-based state indexer. This subcomponent not only scales dynamically with no human intervention, but has also been surprisingly cost-efficient. A large portion of our usage has fallen within the free tier.
  • Operational excellence – Replacing traditional hosts with serverless components such as Lambda allowed us to spend less time on compliance tickets and more time delivering features for our customers.

We're particularly excited about the investments we have made moving from a transactional-based system to a batch processing system, especially our shift from using Amazon EC2 to using serverless Lambda and big data Amazon EMR services. This experience demonstrates that even services originally built on AWS can still achieve cost reductions and improve performance by rethinking how AWS services are used.

Inspired by our progress, our team is moving to replace many other legacy services with serverless components. Likewise, we hope that other engineering teams can learn from our experience, continue to innovate, and do more with less.

Find the code used for this post in the following GitHub repository.

Special thanks to the development team: Ryan Schwartz, Abhishek Sahay, Cecilia Cho, Godot Bian, Sam Lam, Jean-Christophe Libbrecht, and Nicholas Leong.

About the Authors

Tom Jin is a Senior Software Engineer for eCommerce Financial Integration (eCFI) at Amazon. His interests include building large-scale systems and applying machine learning to healthcare applications. He is based in Vancouver, Canada and is a fan of ocean conservation.

Karthik Odapally is a Senior Solutions Architect at AWS supporting our gaming customers. He loves presenting at external conferences like AWS re:Invent and helping customers learn about AWS. His passion outside of work is baking cookies and bread for family and friends here in the PNW. In his spare time, he plays Legend of Zelda (Link's Awakening) with his four-year-old daughter.


