Useful Transformations and Actions in Apache Spark

Transformations

groupByKey([numPartitions])

When called on a dataset of (K, V) pairs, returns a dataset of (K, Iterable<V>) pairs.

reduceByKey(func, [numPartitions])

When called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs where the values for each key are aggregated using the given reduce functionĀ func, which must be of type (V,V) => V.

sortByKey([ascending], [numPartitions])

When called on a dataset of (K, V) pairs where K implements Ordered, returns a dataset of (K, V) pairs sorted by keys in ascending or descending order, as specified in the boolean ascending argument.

Actions

foreach(func)

Run a function func on each element of the dataset. This is usually done for side effects such as updating an Accumulator or interacting with external storage systems.

Source:Apache Spark