Statistics for June 2019 – MOHANMA.COM

Firstly, thanks to everyone who have been visiting MOHANMA.COM.

Unique visitors for MOHANMA.COM in June 2019 is 1405.

Number of visits between June 1 to June 30 is 6159.

16th June 2019 clocked the highest number of visits in a day – 419.

Average hits received per day for June 2019 is 1019.

Thank you all.

Useful Transformations and Actions in Apache Spark

Transformations

groupByKey([numPartitions])

When called on a dataset of (K, V) pairs, returns a dataset of (K, Iterable<V>) pairs.

reduceByKey(func, [numPartitions])

When called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs where the values for each key are aggregated using the given reduce function func, which must be of type (V,V) => V.

sortByKey([ascending], [numPartitions])

When called on a dataset of (K, V) pairs where K implements Ordered, returns a dataset of (K, V) pairs sorted by keys in ascending or descending order, as specified in the boolean ascending argument.

Actions

foreach(func)

Run a function func on each element of the dataset. This is usually done for side effects such as updating an Accumulator or interacting with external storage systems.

Source:Apache Spark

RDD : Resilient Distributed Dataset

Apache Spark has a key feature known as Resilient distributed datasets(RDD’s). The data structure available in Spark.

RDD’s are fault tolerant and can be operated in parallel.

RDD’s can be created by:-

  • Parallel execution of collection available in driver program.
  • By referencing dataset in an external storage system such as HDFS, HBase or data source supporting Hadoop input format.

RDD’s support two types of operations:

  1. Transformations
  2. Actions

Transformations generate a new dataset from already available dataset.

Actions return values to driver after working on dataset.

Unlike Map-Reduce, Spark does not carry out the complete life-cycle of data processing for task completion.

Spark is efficient and operates on datasets only when results are required by driver program.