Tender coconut is such a tasty and healthy drink.
Quote for the day – 20th March 2019
Get the physics right and the stats will fall in place on their own.
Quote for the day – 3rd March 2019
Time is limited, if you believe in something and want to put it into action, do it now.
Quote for the day – 2nd March 2019
The wise know power of circulation.
Quote for the day – 4th February 2019
Being zestful is growth personified.
RDD : Resilient Distributed Dataset
Apache Spark has a key feature known as Resilient distributed datasets(RDD’s). The data structure available in Spark.
RDD’s are fault tolerant and can be operated in parallel.
RDD’s can be created by:-
- Parallel execution of collection available in driver program.
- By referencing dataset in an external storage system such as HDFS, HBase or data source supporting Hadoop input format.
RDD’s support two types of operations:
- Transformations
- Actions
Transformations generate a new dataset from already available dataset.
Actions return values to driver after working on dataset.
Unlike Map-Reduce, Spark does not carry out the complete life-cycle of data processing for task completion.
Spark is efficient and operates on datasets only when results are required by driver program.
CAP Theorem and Distributed System
CAP Theorem is also known as Brewer’s Theorem.
CAP refers to Consistency, Availability and Partition tolerance.
- Consistency – All nodes in a cluster have same data at any particular time.
- Availability – It is the end result of request i.e. success or failure.
- Partition tolerance – As per this feature, the system continues to operate despite any hardware or network failures.
According to CAP Theorem any distributed system can exhibit only 2 out of the above 3 features.
Most distributed systems experience network failures, hence partition tolerance has to be met by distributed systems.
For the remaining one condition to be met:
Databases designed based on ACID properties choose consistency over availability.
Note: Consistency here not to be confused with the one in ACID.
On the other hand, databases designed based on BASE properties choose availability over consistency.
HBase
In case of HBase, consistency and partition tolerance are met. But HBase does not offer availability. HBase is a NoSQL database.
Quote for the day – 28th December 2018
Clarity of thought brings the best out of an individual.
Quote for the day – 20th November 2018
A few words can do that which a book cannot.
Quote for the day – 18th November 2018
A positive attitude usually results in a favorable outcome. You have nothing to loose by being positive.