This tutorial covers graph processing in Apache Spark from Python. GraphX itself has no Python API, so the examples use the GraphFrames package with PySpark.
Vertices: each vertex is represented by a unique ID and attributes.
Edges: each edge is represented by a source vertex ID, a destination vertex ID, and attributes.
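# GraphFrames is a separate package and must be available to Spark
# (for example through the --packages option of pyspark or spark-submit).
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.appName("GraphXExample").getOrCreate()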
vertices = spark.createDataFrame([
    ("1", "Alice", 34),
    ("2", "Bob", 36),
    ("3", "Charlie", 30)], ["id", "name", "age"])
edges = spark.createDataFrame([
    ("1", "2", "friend"),
    ("2", "3", "follow"),
    ("3", "1", "friend")], ["src", "dst", "relationship"])
graph = GraphFrame(vertices, edges)
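# PageRank with a reset probability of 0.15, run for a fixed 10 iterations.
# The result is a new GraphFrame: vertices gain a "pagerank" column
# and edges gain a "weight" column.
results = graph.pageRank(resetProbability=0.15, maxIter=10)
results.vertices.show()
results.edges.show()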
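To make the two parameters above concrete, here is a minimal plain-Python sketch of the PageRank iteration on the same three-vertex graph. It is an illustration only, not the GraphFrames implementation: the function name `pagerank` and its argument names are hypothetical, it uses the textbook normalized update (ranks sum to 1), and it assumes every vertex has at least one outgoing edge. GraphFrames may scale its ranks differently.

```python
# Plain-Python PageRank sketch (no Spark required).
# The edge list mirrors the tutorial's graph: 1 -> 2 -> 3 -> 1.
pagerank_edges = [("1", "2"), ("2", "3"), ("3", "1")]


def pagerank(edges, reset_probability=0.15, max_iter=10):
    """Run a fixed number of normalized PageRank updates.

    Assumes every vertex has at least one outgoing edge
    (dangling vertices are not handled in this sketch).
    """
    vertices = sorted({v for edge in edges for v in edge})
    out_degree = {v: 0 for v in vertices}
    for src, _ in edges:
        out_degree[src] += 1

    n = len(vertices)
    ranks = {v: 1.0 / n for v in vertices}
    for _ in range(max_iter):
        # Every vertex receives an equal share of the reset mass.
        new_ranks = {v: reset_probability / n for v in vertices}
        # Each vertex splits the rest of its rank across its out-edges.
        for src, dst in edges:
            share = ranks[src] / out_degree[src]
            new_ranks[dst] += (1 - reset_probability) * share
        ranks = new_ranks
    return ranks


ranks = pagerank(pagerank_edges)
for vertex, rank in ranks.items():
    print(vertex, round(rank, 4))
```

Because the example graph is a single directed cycle, every vertex ends up with the same rank; adding an extra edge into one vertex would break that symmetry.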
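# With its default algorithm, connectedComponents() expects a Spark checkpoint
# directory to be set first. The path below is only an example.
spark.sparkContext.setCheckpointDir("/tmp/graphframes-checkpoints")
result = graph.connectedComponents()
result.show()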
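As a rough picture of what a connected-components pass computes, the sketch below labels each vertex with a representative of its component using a small union-find structure in plain Python. This is an assumption-laden illustration, not how GraphFrames computes components: `connected_components` is a hypothetical helper, vertex "4" is an extra isolated vertex added only for the example, and GraphFrames assigns its own numeric component IDs rather than reusing vertex IDs.

```python
def connected_components(vertices, edges):
    """Map each vertex to the smallest vertex ID in its component.

    Edge direction is ignored, as is usual for connected components.
    """
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the root, shortening the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            # Keep the smaller ID as the representative of the merged set.
            parent[max(root_src, root_dst)] = min(root_src, root_dst)

    return {v: find(v) for v in vertices}


# "4" is a hypothetical isolated vertex, not part of the tutorial's graph.
cc_vertices = ["1", "2", "3", "4"]
cc_edges = [("1", "2"), ("2", "3"), ("3", "1")]
components = connected_components(cc_vertices, cc_edges)
print(components)
```

Vertices "1", "2" and "3" share one label, while the isolated vertex forms a component of its own.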
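# Count the triangles passing through each vertex; the returned DataFrame
# contains the vertex columns plus a "count" column.
result = graph.triangleCount()
result.show()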
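The per-vertex triangle count can be sketched in plain Python as follows. The helper name `triangle_count` is hypothetical, and the sketch assumes that edge direction is ignored, so it should be read as an illustration of the idea rather than the library's algorithm.

```python
from itertools import combinations


def triangle_count(edges):
    """Count, for each vertex, the triangles that pass through it.

    Edges are treated as undirected and self-loops are skipped.
    """
    neighbors = {}
    for src, dst in edges:
        if src == dst:
            continue
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    counts = {v: 0 for v in neighbors}
    for v, adjacent in neighbors.items():
        # A triangle through v is a pair of v's neighbors that are
        # themselves connected.
        for a, b in combinations(sorted(adjacent), 2):
            if b in neighbors[a]:
                counts[v] += 1
    return counts


tri_edges = [("1", "2"), ("2", "3"), ("3", "1")]
triangles = triangle_count(tri_edges)
print(triangles)
```

The tutorial's three vertices form exactly one triangle, so each vertex is counted once.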
GraphFrames is a library that provides a DataFrame-based API for graph processing.
# Motif query: find every two-hop path a -> b -> c,
# binding the two edges as e and e2.
motifs = graph.find("(a)-[e]->(b); (b)-[e2]->(c)")
motifs.show()
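The motif pattern above can be read as a join of the edge list with itself on the shared middle vertex. The plain-Python sketch below enumerates the same two-hop pattern on the tutorial's edges; `two_hop_paths` is a hypothetical helper, and the dictionary layout only loosely imitates the columns a motif query returns.

```python
motif_edges = [
    ("1", "2", "friend"),
    ("2", "3", "follow"),
    ("3", "1", "friend"),
]


def two_hop_paths(edges):
    """Find every pair of edges e, e2 where e ends at the vertex e2 starts from."""
    # Index edges by their source vertex so the second hop is a lookup.
    out_edges = {}
    for edge in edges:
        out_edges.setdefault(edge[0], []).append(edge)

    matches = []
    for e in edges:
        for e2 in out_edges.get(e[1], []):
            matches.append({"a": e[0], "e": e, "b": e[1], "e2": e2, "c": e2[1]})
    return matches


matches = two_hop_paths(motif_edges)
for match in matches:
    print(match["a"], "->", match["b"], "->", match["c"])
```

On a three-vertex cycle, each edge can be extended by exactly one following edge, so the pattern matches three times.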
# Shortest path lengths (in hops) from every vertex to the landmark vertices "1" and "2".
results = graph.shortestPaths(landmarks=["1", "2"])
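For intuition about the landmark distances, here is a minimal breadth-first-search sketch in plain Python. It assumes distances are hop counts measured along edge direction from each vertex to each landmark; the helper name `shortest_paths` and the nested-dictionary result are hypothetical and only approximate the shape of the library's output.

```python
from collections import deque


def shortest_paths(vertices, edges, landmarks):
    """Return {vertex: {landmark: hops}} following edge direction.

    Landmarks that cannot be reached from a vertex are simply absent
    from that vertex's dictionary.
    """
    # Reverse adjacency: for each vertex, who has an edge pointing at it.
    reverse = {v: [] for v in vertices}
    for src, dst in edges:
        reverse[dst].append(src)

    distances = {v: {} for v in vertices}
    for landmark in landmarks:
        # Walk backwards from the landmark, one hop at a time.
        distances[landmark][landmark] = 0
        queue = deque([landmark])
        while queue:
            current = queue.popleft()
            for previous in reverse[current]:
                if landmark not in distances[previous]:
                    distances[previous][landmark] = distances[current][landmark] + 1
                    queue.append(previous)
    return distances


sp_vertices = ["1", "2", "3"]
sp_edges = [("1", "2"), ("2", "3"), ("3", "1")]
hops = shortest_paths(sp_vertices, sp_edges, landmarks=["1", "2"])
for vertex in sp_vertices:
    print(vertex, hops[vertex])
```

Searching backwards from each landmark keeps the work to one traversal per landmark instead of one per vertex.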