Adventures in Big Data: Storm

Sunday, April 27, 2014

Storm-YARN at Yahoo! - Realtime streaming data processing over HDFS at Hadoop Summit

In Hadoop Summit last year, Andrew Feng of Yahoo! delivered a presentation on Storm-on-YARN, using as a use case the personalization they use in their web pages.

Apache Storm is a streaming data processing aimed at achieving real-time processing over big data. It is fault-tolerant (if any node fails, the task will be respawned in other nodes). We have previously talked about Lambda Architectures recently (and talking about Lambdoop at the same time), and about its seminal article by Nathan Marz. Storm is actually led by nathanmarz, and it proposes to form a topology of tasks separated in Spouts(sources of data) and Bolts (processors with inputs and outputs) in a DAG, which reminds me that of Apache Tez.

I've found the Storm explanation at Hortonworks page quite concise, concrete and to the point - so I'll just leave it there if you want to know more of Storm. I also recommend to browse the concepts at Storm home page, now at Apache Incubator.

I actually find interesting in the presentation, the explanation on how does Storm work internally by way of showing how storm-yarn makes your life easier.

PS. They actually tell everyone that Yahoo! had been using (by July 2013) YARN in a 10.000 nodes cluster. For those that doubt about stability and production-readiness of the system that brings that extra flex to HDFS.

PS2. All due respect, I find Mr. Feng's accent a bit hard to follow, but gotta say that as the presentation advanced I began to like it :)

Thursday, April 10, 2014

Spider.io and Storm: An storming love story with a happy ending

At spider.io, they have made (what I consider) a lot of changes to their core architecture, to adapt to a rapidly-changing business environment. The gotchas on these slides are worth a look (and the humour is appreciated too).

Storm at spider.io - London Storm Meetup 2013-06-18 from Ashley Brown

The fact that they changed technology so often, instead of suggesting a lack of vision, can be seen as a great flexibility and ability to adapt to their demands.

The concepts layered are also very challenging and are clearly explained. I've found these slides very rewarding for the time invested.