How to manage state in Trident Storm topologies

Sumit Chawla's Blog

Code for this example@ https://github.com/sumitchawla/storm-examples

Trident API is in Storm Topologies is just another abstraction on how “Stream” of data is processed in Storm.

Basic Storm stream processing guarantees “at least once” message processing, whereas Trident API guarantees “exactly once” message processing.  In simple terms, that means,  basic stream processing makes sure that no message is ever lost. To achieve that, storm might replay the same message again and again, until it is certain that the message is processed successfully.   There is no direct way to figure out if the message has been played first time, or its being replayed due to an error or failure.   Trident API solves this problem partially by grouping this message into a batch.  If Trident API, needs to replay the same message again, it will come back with same Batch Id.  The application receiving this message will have to keep track of…

View original post 2,676 more words

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s