Top Spark GraphX Interview Questions

Answer:

  • Spark SQL (the successor to Shark)
  • Spark Streaming
  • GraphX
  • MLlib
  • SparkR

Answer:

“GraphX” is the Spark component for graph processing and graph-parallel computation. It helps to build and transform interactive graphs, and it enables programmers to reason about structured data at scale.

Answer:

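“PageRank” measures the importance of each vertex in a graph, on the assumption that an edge from vertex u to vertex v represents an endorsement of v's importance by u. A vertex that is linked to by many other vertices, or by highly ranked ones, therefore receives a high rank.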
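
To make the idea concrete, here is a minimal sketch of PageRank by power iteration in plain Python. It is not GraphX's implementation, and GraphX may normalize ranks differently; the function name pagerank, the damping factor of 0.85, the fixed iteration count, and the four-vertex example graph are all assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Return {vertex: rank} for a list of directed (src, dst) edges."""
    vertices = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(vertices)
    ranks = {v: 1.0 / n for v in vertices}  # start from a uniform distribution

    for _ in range(iterations):
        # Rank held by vertices without out-edges is spread over all vertices.
        dangling = sum(ranks[v] for v in vertices if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {v: base for v in vertices}
        # Each vertex passes its rank to its neighbours in equal shares.
        for src in vertices:
            for dst in out_links[src]:
                new_ranks[dst] += damping * ranks[src] / len(out_links[src])
        ranks = new_ranks
    return ranks


# Hypothetical example graph: a, b and d all link to c; c links back to a.
edges = [("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")]
ranks = pagerank(edges)
top = max(ranks, key=ranks.get)
print(top, ranks)
```

In this example c is endorsed by three vertices and ends up with the highest rank, while b and d, which nobody links to, keep only the baseline share.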

Answer:

Every RDD in Spark, apart from those created directly from a data source, depends on one or more other RDDs. The representation of these dependencies between RDDs is known as the lineage graph. Spark uses the lineage graph to compute each RDD on demand, so whenever part of a persisted RDD is lost, the lost data can be recomputed from its parent RDDs.
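
The recovery mechanism can be sketched with a toy model in plain Python. ToyRDD and all of its methods are hypothetical names invented for this illustration and are not part of Spark; real RDDs track lineage per partition, whereas this sketch treats each dataset as a single list.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD that remembers how it was derived."""

    def __init__(self, name, parent=None, transform=None, source=None):
        self.name = name
        self.parent = parent        # lineage edge to the dataset this one depends on
        self.transform = transform  # how to derive this dataset from its parent
        self.source = source        # only set for the root of the lineage
        self.persisted = False
        self.cached = None

    def map(self, fn, name):
        return ToyRDD(name, parent=self, transform=lambda rows: [fn(r) for r in rows])

    def filter(self, pred, name):
        return ToyRDD(name, parent=self, transform=lambda rows: [r for r in rows if pred(r)])

    def persist(self):
        self.persisted = True
        return self

    def lineage(self):
        """Walk the dependency chain from this dataset back to its source."""
        node, chain = self, []
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain

    def collect(self):
        if self.cached is not None:
            return self.cached
        if self.parent is None:
            rows = list(self.source)
        else:
            # Recompute on demand by replaying the transformation on the parent.
            rows = self.transform(self.parent.collect())
        if self.persisted:
            self.cached = rows
        return rows


numbers = ToyRDD("numbers", source=range(1, 7))
squares = numbers.map(lambda x: x * x, "squares")
even_squares = squares.filter(lambda x: x % 2 == 0, "even_squares").persist()

first = even_squares.collect()
even_squares.cached = None          # simulate losing the persisted data
recovered = even_squares.collect()  # rebuilt by replaying the lineage
print(even_squares.lineage(), recovered)
```

After the cached copy is discarded, the second collect() call rebuilds the same rows by replaying the recorded transformations. In Spark itself, the lineage of an RDD can be inspected with toDebugString.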

Answer:

Spark MLlib: Machine learning library in Spark for commonly used learning algorithms such as clustering, regression, and classification.

Spark Streaming: This library is used to process real-time streaming data.

Spark GraphX: Spark API for graph-parallel computations, with basic operators such as joinVertices, subgraph, and aggregateMessages.

Spark SQL: Helps execute SQL-like queries on Spark data, including from standard visualization or BI tools.

Answer:

Lineage graphs can always be used to recover RDDs after a failure, but recovery is time-consuming when the RDDs have long lineage chains. Spark therefore provides an API for checkpointing: the original design describes it as a REPLICATE flag to persist, and the current API also offers RDD.checkpoint(), which saves an RDD to the directory set with SparkContext.setCheckpointDir. The decision on which data to checkpoint is left to the user. Checkpoints are most useful when the lineage graphs are long and contain wide dependencies.
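
The effect of a checkpoint on a long lineage chain can be sketched with another toy model in plain Python. ToyCheckpointRDD and its methods are hypothetical and only mimic the idea; a real checkpoint writes the data to reliable storage rather than keeping it in memory.

```python
class ToyCheckpointRDD:
    """Hypothetical dataset whose lineage can be truncated by a checkpoint."""

    def __init__(self, parent=None, fn=None, data=None):
        self.parent = parent
        self.fn = fn
        self.data = data  # materialised rows: set for the source or after checkpoint()

    def map(self, fn):
        return ToyCheckpointRDD(parent=self, fn=fn)

    def lineage_length(self):
        """Number of transformations that must be replayed to rebuild this dataset."""
        length, node = 0, self
        while node.parent is not None:
            length += 1
            node = node.parent
        return length

    def collect(self):
        if self.data is not None:
            return list(self.data)
        return [self.fn(x) for x in self.parent.collect()]

    def checkpoint(self):
        # Save the computed rows, then cut the link to the parent datasets.
        self.data = self.collect()
        self.parent = None
        self.fn = None


rdd = ToyCheckpointRDD(data=[1, 2, 3])
for _ in range(100):                 # build a long lineage chain
    rdd = rdd.map(lambda x: x + 1)

length_before = rdd.lineage_length()
rdd.checkpoint()
length_after = rdd.lineage_length()

result = rdd.map(lambda x: x * 2).collect()
print(length_before, length_after, result)
```

Before the checkpoint, rebuilding the dataset means replaying all 100 transformations; afterwards, anything derived from it only has to replay the steps added since the checkpoint.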