Monday, September 5, 2016

Testing Alluxio

Lately I have been testing Alluxio (formerly Tachyon), a tool that I will be using to keep in memory a large set of objects that may receive a limited number of ad-hoc, asynchronous, daily modifications, for (mostly) Spark processing over HDFS.

Looking at feedback from other companies (the quite hyped Baidu and Barclays examples, among others), Alluxio seems like a good fit for this problem. The Barclays case is especially relevant, since their architecture sounds like a common pattern out there and is similar to one I have actually come across.

However, there are some other contenders too, and I've been impressed by Apache Ignite, a whole suite for all things grid. It has an astonishing set of features, but IgniteRDD (the Spark shared RDD) in particular has caught my attention. Several modes of operation... very interesting. This would call for another testing effort, especially because its architecture (and set of use cases) seems to differ considerably from Alluxio's.

Also, maybe more general-purpose tools like Redis would be good enough to do the job. Alluxio's integration with HDFS made it our first point of contact for attacking the problem, but that certainly does not mean it has to be the best option. A recent article on the benefits of using Spark with Redis for time series computation reported that Redis accelerated plain Spark by over 100 times, and Spark-with-Alluxio by over 45 times (which also means that Spark with Alluxio alone would be only about 100/45 ≈ 2.2 times as fast as plain Spark... which sounds like too small a number).
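To make that back-of-the-envelope calculation explicit, here is a minimal sketch in Python. The two factors are the ones quoted above; the variable and function names are my own, and the calculation assumes both figures were measured on the same workload, which the article may or may not guarantee.

```python
# Speedup factors as quoted above. Assumption: both were measured
# on the same workload, so they can be divided directly.
redis_vs_spark = 100.0    # Spark+Redis relative to plain Spark
redis_vs_alluxio = 45.0   # Spark+Redis relative to Spark+Alluxio


def implied_speedup(a_vs_base, a_vs_b):
    """Speedup of B over the baseline, given A's speedup over the baseline and over B."""
    return a_vs_base / a_vs_b


alluxio_vs_spark = implied_speedup(redis_vs_spark, redis_vs_alluxio)
print(f"Spark+Alluxio vs. plain Spark: about {alluxio_vs_spark:.2f}x")
```

Since the article says "over 100" and "over 45", the result is only a rough ratio of two lower bounds, not a measured figure.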

Let's see how it goes...

1 comment:

  1. The article is so appealing. When I read this blog, I learnt new things, and it truly has good material on developing technology. Thank you for sharing this blog. I need to learn about software testing companies; please share. It is very useful for anyone looking for a smart test automation platform.
