GraphX的图存储模式

January 25, 2019 · 2 min read

Le Dai

Sr Soft Engineer

GraphX的图存储模式

Graphx使用的是Vertex-Cut( 点分割 ) 方式存储图，用三个RDD存储图数据信息：

1.VertexTable(id, data)：id为顶点id， data为顶点属性 (点 rdd table)

2.EdgeTable(pid, src, dst, data)：pid 为分区id ，src为源顶点id ，dst为目的顶点id，data为边属性 (边 rdd table)

3.RoutingTable(id, pid)：id 为顶点id ，pid 为分区id (路由 rdd table 记录点分区的位置)

graphx

点分割存储

GraphX在进行图分割时，有几种不同的分区(partition)策略，它通过PartitionStrategy专门定义这些策略。在PartitionStrategy中，总共定义了EdgePartition2D、EdgePartition1D、RandomVertexCut以及 CanonicalRandomVertexCut这四种不同的分区策略。

1 RandomVertexCut

case object RandomVertexCut extends PartitionStrategy {
    override def getPartition(src: VertexId, dst: VertexId, numParts: PartitionID): PartitionID = {
      math.abs((src, dst).hashCode()) % numParts
    }
}

通过取源顶点和目标顶点id的哈希值来将边分配到不同的分区。这个方法会产生一个随机的边分割，两个顶点之间相同方向的边会分配到同一个分区。

2 CanonicalRandomVertexCut

case object CanonicalRandomVertexCut extends PartitionStrategy {
    override def getPartition(src: VertexId, dst: VertexId, numParts: PartitionID): PartitionID = {
      if (src < dst) {
        math.abs((src, dst).hashCode()) % numParts
      } else {
        math.abs((dst, src).hashCode()) % numParts
      }
    }
}

这种分割方法和前一种方法没有本质的不同。不同的是，哈希值的产生带有确定的方向（即两个顶点中较小id的顶点在前）。两个顶点之间所有的边都会分配到同一个分区，而不管方向如何。

3 EdgePartition1D

case object EdgePartition1D extends PartitionStrategy {
    override def getPartition(src: VertexId, dst: VertexId, numParts: PartitionID): PartitionID = {
      val mixingPrime: VertexId = 1125899906842597L
      (math.abs(src * mixingPrime) % numParts).toInt
    }
}

这种方法仅仅根据源顶点id来将边分配到不同的分区。有相同源顶点的边会分配到同一分区。

GraphX的图存储模式

点分割存储

1 RandomVertexCut

2 CanonicalRandomVertexCut

3 EdgePartition1D

4 EdgePartition2D