RDD : Resilient Distributed Dataset

A key feature of Apache Spark is the Resilient Distributed Dataset (RDD), Spark’s core data structure: an immutable collection of elements partitioned across the nodes of a cluster.

RDDs are fault tolerant and can be operated on in parallel.

RDDs can be created in two ways:

  • Parallelizing an existing collection in the driver program.
  • Referencing a dataset in an external storage system such as HDFS, HBase, or any data source that offers a Hadoop InputFormat.

RDDs support two types of operations:

  1. Transformations
  2. Actions

Transformations create a new dataset from an existing one (for example, map or filter).

Actions return a value to the driver program after running a computation on the dataset (for example, count or collect).

Unlike MapReduce, Spark does not run each processing step to completion as soon as it is defined. Transformations are lazy: Spark only records them.

This makes Spark efficient, because it computes a dataset only when an action requires a result to be returned to the driver program.
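The difference between transformations and actions can be sketched with a small toy model. This is plain Python, not the Spark API: the LazyDataset class and its methods are hypothetical names chosen only to mirror the idea that transformations are recorded and nothing runs until an action asks for a result.

```python
# Toy model of lazy evaluation. NOT the Spark API:
# LazyDataset and its methods are hypothetical, for illustration only.

class LazyDataset:
    def __init__(self, source, steps=()):
        self._source = source   # the original collection, never modified
        self._steps = steps     # recorded transformations, not yet run

    # Transformations: return a new dataset and do no work yet.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    # Actions: run the recorded steps and return a value to the caller.
    def collect(self):
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    def count(self):
        return len(self.collect())


calls = []  # records every time the mapped function actually runs

def square(x):
    calls.append(x)
    return x * x

numbers = LazyDataset([1, 2, 3, 4, 5])
evens = numbers.map(square).filter(lambda v: v % 2 == 0)

calls_before_action = len(calls)  # still 0: transformations ran nothing
result = evens.collect()          # the action triggers the computation
```

In this sketch, square is not called while the chain of transformations is being built; it runs only when collect is invoked. The toy leaves out partitioning, fault tolerance, and distribution across a cluster.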

CAP Theorem and Distributed System

CAP Theorem is also known as Brewer’s Theorem.

CAP refers to Consistency, Availability and Partition tolerance.

  • Consistency – All nodes in a cluster see the same data at any given time.
  • Availability – Every request received by a non-failing node gets a response, though not necessarily one that reflects the most recent write.
  • Partition tolerance – The system continues to operate even when network failures drop or delay messages between nodes.

According to the CAP Theorem, a distributed system can guarantee at most two of these three properties at the same time.

Network failures are unavoidable in practice, so a distributed system has to be partition tolerant.

That leaves a choice between the remaining two properties when a partition occurs:

Databases designed around ACID properties choose consistency over availability.

Note: Consistency in CAP is not to be confused with the C in ACID, which refers to transactions preserving the database’s integrity rules.

On the other hand, databases designed based on BASE properties choose availability over consistency.
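The trade-off can be illustrated with a toy two-replica store. This is a minimal sketch under simplifying assumptions, not a model of any real database: the Replica and TwoNodeStore classes, the "CP"/"AP" mode flag, and the partitioned switch are all hypothetical names invented for the example.

```python
# Toy two-replica store. Hypothetical names, not any real database's API.

class Replica:
    def __init__(self):
        self.data = {}


class TwoNodeStore:
    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False  # True when a and b cannot reach each other

    def write(self, replica, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Cannot replicate, so refuse: consistent but unavailable.
                raise RuntimeError("unavailable during partition")
            # AP: accept the write locally; the replicas now diverge.
            replica.data[key] = value
            return
        # Healthy network: update both replicas together.
        self.a.data[key] = value
        self.b.data[key] = value

    def read(self, replica, key):
        return replica.data.get(key)


cp, ap = TwoNodeStore("CP"), TwoNodeStore("AP")
for store in (cp, ap):
    store.write(store.a, "x", 1)  # replicated to both nodes
    store.partitioned = True      # then the network splits

try:
    cp.write(cp.a, "x", 2)
    cp_accepted = True
except RuntimeError:
    cp_accepted = False           # the CP store rejects the write

ap.write(ap.a, "x", 2)            # the AP store accepts it on one side
cp_views = (cp.read(cp.a, "x"), cp.read(cp.b, "x"))
ap_views = (ap.read(ap.a, "x"), ap.read(ap.b, "x"))
```

During the partition, the CP store refuses the write and both replicas keep agreeing on the old value, while the AP store accepts it and its replicas disagree until they can be reconciled. Real systems add reconciliation, quorums, and timeouts, which this sketch leaves out.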

HBase

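HBase is a NoSQL database. In CAP terms it provides consistency and partition tolerance but does not guarantee availability: after a failure, the affected data can be unreachable for a while instead of being served inconsistently.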

Law of Action : Economics and Startups

Newton’s third law states:

“For every action there is an equal and opposite reaction.”

A simple action such as walking illustrates this: when we push our feet backwards against the ground, our body moves forward.

Does this mean all actions are bound to this law?

Yes. If not all actions, then most of them are bound to it.

A ball pushed down below the surface of water bounces back up because of the buoyancy of the water.

A ball thrown up must come back because of gravity.

This law holds even in the world of economics and startups. Wonder how?

Trade markets, financial results, traction: all of these are driven by a law of action.

Cause and effect is a normalizing force in economics.

But is the world ready for a level playing field?

No, it isn’t, and it never will be.

Every product that exists is bound by this law of action.

Action and reaction constitute a single entity: pure energy being transferred between two states.

To succeed, a startup must have the necessary traction. If traction is the result, there needs to be an action that produces it.

A chain of individual components and the synergy generated by this chain is what makes a startup succeed.

Everything that matters must be active in order to succeed. Being active at the right moment results in stupendous success.

To get the timing right is one thing and to get the time going with a thing is another. The former is the action and the latter is its reaction.