Big Data is all the rage these days. But many Big Data applications out there rely on large, clunky, long-running batch jobs. Looking to implement a real-time learning system to block potentially fraudulent transactions? Or perhaps you wish to recommend a product just as a user removes an item from their basket? Waiting for the weekly 20-hour batch job to run isn't going to help you there.
Fortunately, we have good, robust, open-source tools at hand that let us tap into streaming data. In this talk, we will look at a combination of technologies that can help in this regard. Spark is a general-purpose scale-out processing system with support for micro-batch-oriented stream processing. Kafka is a message broker built for handling real-time data feeds at high throughput and low latency. Cassandra is an excellent scale-out partitioned row store with masterless, self-healing capabilities that integrates well with Spark.