Cricket World Cup 1999 (England): Lance Klusener – A force to reckon with for the Proteas

Until the 1999 World Cup in England, no player in cricketing history had taken his team to the penultimate stage of the tournament single-handedly.

And then there was Lance Klusener.

A sensation at the 1999 World Cup, here was a player who could bat, bowl and field, and win matches for his team.

He could wield a cricket bat like a baseball bat, and he bowled with a jumping action that would strike batsmen plumb in front of the wicket. His fielding was on par with his other skills.

It was a treat to watch his left-handed stance in front of the wicket. Bowling swinging deliveries at medium pace, he brought a gust of fresh energy to the cricket field.

Klusener can be regarded as one of the best all-rounders to play for the South African ODI cricket team.

His style of play brought something new to One Day International (ODI) cricket. He was a revelation for cricket pitches and world cricket alike.

I had the privilege of watching him live in action when India played South Africa at the M Chinnaswamy Stadium, Bengaluru, in 2000. He hit sixes with such ease that the then newly installed electronic scoreboard had to bear the brunt.

Thank you, Klusener and host nation England, for some wonderful memories of the 1999 Cricket World Cup.

Useful Transformations and Actions in Apache Spark

Transformations

groupByKey([numPartitions])

When called on a dataset of (K, V) pairs, returns a dataset of (K, Iterable<V>) pairs.
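To make the shape of the result concrete, here is a minimal plain-Python sketch of what groupByKey produces. It does not use Spark; the helper name group_by_key and the sample pairs are made up for illustration only.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Plain-Python stand-in for groupByKey: (K, V) pairs -> (K, list of V)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)  # collect every value seen for this key
    return list(groups.items())


pairs = [("a", 1), ("b", 2), ("a", 3)]
grouped = group_by_key(pairs)
print(grouped)
```

In Spark the grouping happens across partitions and involves a shuffle; this sketch only mirrors the input and output shapes.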

reduceByKey(func, [numPartitions])

When called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs where the values for each key are aggregated using the given reduce function func, which must be of type (V,V) => V.
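The role of func is easier to see with a small example. The following is a plain-Python sketch of the reduceByKey semantics, not Spark code; reduce_by_key and the sample data are hypothetical names chosen for illustration.

```python
def reduce_by_key(pairs, func):
    """Plain-Python stand-in for reduceByKey: fold values per key with func."""
    totals = {}
    for key, value in pairs:
        if key in totals:
            totals[key] = func(totals[key], value)  # func has type (V, V) => V
        else:
            totals[key] = value  # first value seen for this key
    return list(totals.items())


word_pairs = [("spark", 1), ("hbase", 1), ("spark", 1)]
word_counts = reduce_by_key(word_pairs, lambda x, y: x + y)
print(word_counts)
```

Because func takes two values and returns one of the same type, each key ends up with a single aggregated value rather than a collection.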

sortByKey([ascending], [numPartitions])

When called on a dataset of (K, V) pairs where K implements Ordered, returns a dataset of (K, V) pairs sorted by keys in ascending or descending order, as specified in the boolean ascending argument.
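As a rough illustration of the ascending flag, this plain-Python sketch sorts (K, V) pairs by key only. It is not Spark code, and sort_by_key is a hypothetical helper name.

```python
def sort_by_key(pairs, ascending=True):
    """Plain-Python stand-in for sortByKey: order (K, V) pairs by K."""
    return sorted(pairs, key=lambda pair: pair[0], reverse=not ascending)


scores = [(3, "c"), (1, "a"), (2, "b")]
ascending_scores = sort_by_key(scores)
descending_scores = sort_by_key(scores, ascending=False)
print(ascending_scores)
print(descending_scores)
```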

Actions

foreach(func)

Run a function func on each element of the dataset. This is usually done for side effects such as updating an Accumulator or interacting with external storage systems.
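The side-effect style can be sketched in plain Python as follows. This is not Spark code: the Counter class is a hypothetical, single-process stand-in for an Accumulator, and foreach here is an ordinary function written for illustration.

```python
class Counter:
    """Hypothetical stand-in for a Spark Accumulator (single process only)."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount


def foreach(dataset, func):
    """Call func on every element; nothing is returned, only side effects."""
    for element in dataset:
        func(element)


total = Counter()
outcome = foreach([1, 2, 3, 4], total.add)
print(total.value, outcome)
```

The call itself yields no dataset and no value; the only observable result is the updated counter.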

Source: Apache Spark

RDD: Resilient Distributed Dataset

A key feature of Apache Spark is the Resilient Distributed Dataset (RDD), the core data structure available in Spark.

RDDs are fault-tolerant and can be operated on in parallel.

RDDs can be created by:

  • Parallelizing an existing collection in the driver program.
  • Referencing a dataset in an external storage system such as HDFS, HBase or any data source supporting a Hadoop input format.

RDDs support two types of operations:

  1. Transformations
  2. Actions

Transformations generate a new dataset from an existing one.

Actions return a value to the driver program after running a computation on the dataset.

Unlike MapReduce, Spark does not run each processing step to completion as soon as it is defined. Transformations are lazy: Spark only records them.

This makes Spark efficient, because it operates on a dataset only when an action requires a result to be returned to the driver program.
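The idea of deferring work until a result is needed can be sketched with a toy class in plain Python. This is an illustration of the concept only, not how Spark is implemented; LazyDataset, its methods and the calls log are all hypothetical names.

```python
class LazyDataset:
    """Toy dataset: transformations are recorded, an action runs them."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, func):
        # Transformation: return a new dataset with one more recorded step.
        return LazyDataset(self._source, self._steps + (("map", func),))

    def filter(self, predicate):
        # Transformation: recorded, not executed.
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def collect(self):
        # Action: only now are the recorded steps applied to the data.
        items = iter(self._source)
        for kind, func in self._steps:
            items = map(func, items) if kind == "map" else filter(func, items)
        return list(items)


calls = []  # records every element that square() actually processes


def square(x):
    calls.append(x)
    return x * x


even_squares = LazyDataset([1, 2, 3, 4]).map(square).filter(lambda x: x % 2 == 0)
calls_before_collect = list(calls)  # still empty: no work has been done
result = even_squares.collect()
print(calls_before_collect, result)
```

Building the pipeline touches no data; square runs only once collect is called.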

CAP Theorem and Distributed Systems

CAP Theorem is also known as Brewer’s Theorem.

CAP refers to Consistency, Availability and Partition tolerance.

  • Consistency – All nodes in a cluster see the same data at any given time.
  • Availability – Every request received by a working node gets a response, rather than an error or a timeout.
  • Partition tolerance – The system continues to operate even when network failures drop or delay messages between nodes.

According to the CAP theorem, a distributed system can guarantee at most two of the above three properties at the same time.

Network failures cannot be ruled out in a distributed system, hence partition tolerance has to be provided.

That leaves a choice between the remaining two properties when a partition occurs:

Databases designed around ACID properties generally choose consistency over availability.

Note: Consistency in CAP is not to be confused with the "C" in ACID.

On the other hand, databases designed around BASE properties generally choose availability over consistency.
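The trade-off can be illustrated with a toy replica in plain Python. This is a deliberately simplified sketch, not a model of any real database; Replica, Unavailable and the "CP"/"AP" mode labels are hypothetical names used only here.

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer during a partition."""


class Replica:
    def __init__(self, mode, value):
        self.mode = mode          # "CP" favours consistency, "AP" availability
        self.value = value        # last value this replica has seen
        self.partitioned = False  # True when cut off from the other replicas

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Cannot confirm this is the latest value, so refuse to answer.
            raise Unavailable("partitioned: latest value cannot be confirmed")
        # An "AP" replica always answers, possibly with stale data.
        return self.value


cp_replica = Replica("CP", "v1")
ap_replica = Replica("AP", "v1")
cp_replica.partitioned = True
ap_replica.partitioned = True

ap_answer = ap_replica.read()
try:
    cp_answer = cp_replica.read()
except Unavailable:
    cp_answer = None  # the consistent replica gave up availability
print(ap_answer, cp_answer)
```

When there is no partition, both replicas answer normally; the two modes differ only while the partition lasts.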

HBase

HBase is a NoSQL database. In CAP terms, HBase provides consistency and partition tolerance but does not guarantee availability: during a failure, the affected data may be temporarily unreadable rather than served stale.