$ export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_20.jdk/Contents/Home
$ spark-shell
// Read the file as an RDD of lines
val input = sc.textFile("input.txt")
// Split each line on spaces and keep only lines with more than one word
val tokenized = input.map(line => line.split(" ")).filter(words => words.size > 1)
// Count how often each word appears as the first word of a line
val counts = tokenized.map(words => (words(0), 1)).reduceByKey { (a, b) => a + b }
// collect() is an action: it triggers the computation and returns the results to the driver
counts.collect()

<aside> 💡 At this stage, you can see that each Spark action corresponds to a job. Each job consists of one or more stages, which group tasks that can be executed on a single node without shuffling data.

Under the hood, Spark builds a physical execution plan, which you can think of as a “recipe” of the exact steps needed to transform the data into its final form.

</aside>
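To see the split into jobs and stages in practice, here is a small sketch (it assumes the same spark-shell session as above; the output path counts_out is hypothetical): transformations alone launch nothing, every action launches a job, and the shuffle introduced by reduceByKey splits that job into two stages.

val lengths = input.map(line => line.length)   // transformation only -- nothing runs yet
lengths.count()                                // action -- launches a job with a single stage (no shuffle)
counts.saveAsTextFile("counts_out")            // action -- launches a job with two stages, split at the reduceByKey shuffle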

// toDebugString prints an RDD's lineage: the chain of transformations Spark will run,
// with indentation marking shuffle (stage) boundaries
input.toDebugString
// ...
counts.toDebugString
// How many partitions the input RDD is split into
input.getNumPartitions
// glom() turns each partition into an array, so collect() shows how the data is distributed
input.glom().collect()
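If the full glom() output is too large to read, a rough variant (assuming the same session) is to collect only the number of elements in each partition:

input.glom().map(partition => partition.length).collect()   // returns one element count per partition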
// cache() marks the RDD to be kept in memory after it is first computed
counts.cache()
// The first action after cache() computes and caches the partitions; later actions reuse them
counts.collect()
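As a quick check (a sketch, assuming the same session), the RDD's storage level confirms that caching is in effect, and unpersist() releases the cached partitions once they are no longer needed:

counts.getStorageLevel   // reports the storage level set by cache() (memory, deserialized)
counts.unpersist()       // drop the cached partitions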

In summary