Apache Spark
Apache Spark is a framework that is faster than MapReduce. It is written in Scala and takes a more functional approach to programming. Spark generalizes the earlier MapReduce framework to a generic distributed dataflow, properly modeled as a directed acyclic graph (DAG).

Resilient Distributed Datasets' Lifecycle
Resilient distributed datasets (RDDs) are the unit data blocks of Apache Spark. These blocks are created, transformed, and written back to disk. "Resilient" means that they remain in memory or on disk on a "best effort" basis, and can be recomputed from their lineage if need be.
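To make the lifecycle concrete, here is a toy sketch (not Spark's actual API, and the class and method names are invented for illustration) of the core idea: transformations are lazy and only extend a lineage chain, while an action walks that lineage back to the source data and recomputes the result, which is also how a lost block can be rebuilt.

```python
# Toy illustration of lazy transformations and lineage-based recomputation.
# This is NOT Spark's implementation; ToyRDD is a hypothetical stand-in.
class ToyRDD:
    def __init__(self, source=None, parent=None, fn=None):
        self.source = source  # base data (only set on the root dataset)
        self.parent = parent  # lineage pointer to the parent dataset
        self.fn = fn          # transformation to apply to the parent's rows

    def map(self, f):
        # Transformation: lazy -- just adds a node to the lineage DAG.
        return ToyRDD(parent=self, fn=lambda rows: [f(r) for r in rows])

    def filter(self, pred):
        # Also lazy: nothing is computed until an action is called.
        return ToyRDD(parent=self, fn=lambda rows: [r for r in rows if pred(r)])

    def collect(self):
        # Action: walk the lineage back to the source and (re)compute.
        if self.parent is None:
            return list(self.source)
        return self.fn(self.parent.collect())

rdd = ToyRDD(range(5)).map(lambda x: x * 2).filter(lambda x: x > 2)
print(rdd.collect())  # [4, 6, 8]
```

Because nothing is materialized until `collect()`, the same lineage can be replayed to recompute a block that was evicted from memory or lost, which is the sense in which RDDs are "resilient".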