(Why) do we need to call cache or persist on an RDD?
I think the question would be better phrased as:
When do we need to call cache or persist on an RDD?
Spark processes are lazy, that is, nothing happens until it is required. To answer the question quickly: when an RDD is defined from a file, nothing happens to the data yet; only an RDD is created, with the file as its source.
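A minimal sketch of this laziness, assuming spark-core (Spark 3.x) is on the classpath and the code runs in local mode. The input path is hypothetical and assumed not to exist: defining the RDD still succeeds, and the missing file is only noticed when an action such as `count()` forces Spark to read the input. The object and method names are illustrative, not part of Spark.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.util.Try

object LazySourceDemo {
  // Returns (RDD definition succeeded, action succeeded).
  def run(): (Boolean, Boolean) = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("lazy-source"))
    try {
      // Hypothetical path, assumed not to exist on the local file system.
      val missingPath = "/tmp/hypothetical-missing-input.txt"
      // Defining the RDD only records the source; nothing is read yet.
      val defined = Try(sc.textFile(missingPath))
      // count() is an action: Spark now lists the input and fails because it is missing.
      val counted = defined.flatMap(rdd => Try(rdd.count()))
      (defined.isSuccess, counted.isSuccess)
    } finally sc.stop()
  }
}
```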
Let's say we transform this data a bit, for example by splitting each line into words.
Nothing happens to the data here either. Now there is a new RDD that contains a reference to the previous RDD and a function to be applied when needed.
Only when an action, such as `count`, is called on an RDD is the chain of RDDs, called the lineage, executed. That is, the data, divided into partitions, is loaded by the Spark cluster's executors, the transformation function is applied, and the result is calculated.
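The transform-then-act sequence can be sketched as follows, again assuming spark-core on the classpath and local mode. The two-line input file and the `LineageDemo` name are hypothetical; a `LongAccumulator` counts how many lines the `flatMap` function has processed, which shows that the function runs only once the action is called.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object LineageDemo {
  // Returns (lines processed before the action, lines processed after it, word count).
  def run(): (Long, Long, Long) = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("lineage-demo"))
    try {
      // Hypothetical two-line input written to a temporary file.
      val input = Files.createTempFile("emp", ".txt")
      input.toFile.deleteOnExit()
      Files.write(input, "alice bob\ncarol dave erin\n".getBytes("UTF-8"))

      // Counts how many lines the flatMap function has processed.
      val calls = sc.longAccumulator("flatMap calls")
      val textFile = sc.textFile(input.toString)
      // Transformation: recorded in the lineage, not executed yet.
      val wordsRDD = textFile.flatMap { line =>
        calls.add(1)
        line.split("\\W+")
      }
      val callsBefore = calls.sum
      // Action: the lineage (read file, flatMap, count) runs now.
      val wordCount = wordsRDD.count()
      (callsBefore, calls.sum, wordCount)
    } finally sc.stop()
  }
}
```

Accumulator updates made inside transformations can be applied more than once if tasks are re-executed, so this counter is only a diagnostic for a small local run, not a reliable metric.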
With a linear lineage, as in this example, `cache()` is not required. The data is loaded into the executors, all transformations are applied, and finally the result of the action is calculated, all in memory, if the data fits in memory.
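`cache` is useful when the lineage of the RDD branches out. Suppose you want to split the words from the previous example into a count of positive words and a count of negative words. You could do this by filtering the words RDD twice, once for positive and once for negative words, and counting each result.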
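A sketch of this branching job without caching, assuming spark-core on the classpath and local mode. `isPositive` and `isNegative` are hypothetical predicates backed by small word sets, and the input file is made up. The accumulator shows that each of the two `count()` calls re-reads the file and re-runs `flatMap`, so every line is processed twice.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object BranchingWithoutCache {
  // Returns (positive count, negative count, lines processed by flatMap).
  def run(): (Long, Long, Long) = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("branch-no-cache"))
    try {
      // Hypothetical predicates; a real job would use its own word lists.
      val positiveWords = Set("good", "great")
      val negativeWords = Set("bad", "awful")
      val isPositive: String => Boolean = word => positiveWords.contains(word)
      val isNegative: String => Boolean = word => negativeWords.contains(word)

      // Hypothetical input written to a temporary file.
      val input = Files.createTempFile("words", ".txt")
      input.toFile.deleteOnExit()
      Files.write(input, "good bad\ngreat awful bad\n".getBytes("UTF-8"))

      val calls = sc.longAccumulator("flatMap calls")
      val wordsRDD = sc.textFile(input.toString).flatMap { line =>
        calls.add(1)
        line.split("\\W+")
      }

      // Two branches, two actions: each count() runs the whole lineage again.
      val positiveWordsCount = wordsRDD.filter(isPositive).count()
      val negativeWordsCount = wordsRDD.filter(isNegative).count()
      (positiveWordsCount, negativeWordsCount, calls.sum)
    } finally sc.stop()
  }
}
```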
Here each branch triggers a reload of the data. Adding an explicit `cache()` call on the shared words RDD ensures that the earlier processing is retained and reused: the file is read and split into words once, and both filters then run on the cached result.
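The branching job with an explicit `cache()` can be sketched like this, assuming spark-core on the classpath, local mode, and the same hypothetical input and predicates. The first `count()` computes the words RDD and keeps its partitions in memory; the second reads them from the cache, so the accumulator sees each line only once. This relies on the data fitting in memory, since `cache()` uses the `MEMORY_ONLY` storage level.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object BranchingWithCache {
  // Returns (positive count, negative count, lines processed by flatMap).
  def run(): (Long, Long, Long) = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("branch-with-cache"))
    try {
      // Hypothetical predicates backed by small word sets.
      val positiveWords = Set("good", "great")
      val negativeWords = Set("bad", "awful")
      val isPositive: String => Boolean = word => positiveWords.contains(word)
      val isNegative: String => Boolean = word => negativeWords.contains(word)

      // Hypothetical input written to a temporary file.
      val input = Files.createTempFile("words", ".txt")
      input.toFile.deleteOnExit()
      Files.write(input, "good bad\ngreat awful bad\n".getBytes("UTF-8"))

      val calls = sc.longAccumulator("flatMap calls")
      val wordsRDD = sc.textFile(input.toString).flatMap { line =>
        calls.add(1)
        line.split("\\W+")
      }
      // Mark the shared RDD for caching; it is materialized by the first action.
      wordsRDD.cache()

      val positiveWordsCount = wordsRDD.filter(isPositive).count()
      // Served from the cached partitions: the file is not read again.
      val negativeWordsCount = wordsRDD.filter(isNegative).count()
      val result = (positiveWordsCount, negativeWordsCount, calls.sum)
      wordsRDD.unpersist()
      result
    } finally sc.stop()
  }
}
```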
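This is why `cache` is said to 'break the lineage': it creates a checkpoint that can be reused for further processing. Strictly speaking, Spark still keeps the lineage of a cached RDD so that lost partitions can be recomputed; only `RDD.checkpoint()` actually truncates it.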
Rule of thumb: use `cache` when the lineage of your RDD branches out or when an RDD is used multiple times, as in a loop.
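For the loop case, a sketch assuming spark-core on the classpath and local mode, with a small hypothetical in-memory input. It calls `persist(StorageLevel.MEMORY_ONLY)`, which is what `cache()` does for RDDs; `persist` also accepts other levels such as `MEMORY_AND_DISK`. The RDD is reused by every loop iteration and released with `unpersist()` afterwards.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CacheInLoop {
  // Returns (word counts per minimum word length 1..3, lines processed by flatMap).
  def run(): (Seq[Long], Long) = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("cache-loop"))
    try {
      val calls = sc.longAccumulator("flatMap calls")
      // Hypothetical in-memory input instead of a file, one line per partition.
      val wordsRDD = sc.parallelize(Seq("a bb ccc", "dddd ee"), numSlices = 2).flatMap { line =>
        calls.add(1)
        line.split("\\W+")
      }
      // Equivalent to wordsRDD.cache() for an RDD.
      wordsRDD.persist(StorageLevel.MEMORY_ONLY)

      // Each iteration runs an action on the same persisted RDD.
      val counts = (1 to 3).map(minLen => wordsRDD.filter(_.length >= minLen).count())
      wordsRDD.unpersist()
      (counts, calls.sum)
    } finally sc.stop()
  }
}
```

Without the `persist` call, each of the three `count()` calls would recompute the `flatMap` from the source data.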