大规模图处理
各种网络及其度
无/有向网络中节点的度
无向网络
Degree (or degree centrality) of a vertex: d(vi) :
d(vi)=∣vj∣ s.t. eij∈E∧eij=eji,# of edges connected to it
eg. 下图中 d(A)=4, d(H)=2

有向网络
In-degree of a vertex din(vi) : din(vi)=∣vj∣ s.t. eij∈E ,# of edges pointing to vi
Out-degree of a vertex dout(vi) : dout(vi)=∣vj∣ s.t. eji∈E ,# of edges from vi
eg. 下图中 din(A)=3, dout(A)=1; din(B)=2, dout(B)=2

各网络及其度的定义
Simple network: If a network has neither self-edges nor multi-edges
Multi-edge (multigraph): If more than one edge between the same pair of vertices
Self-loop: If an edge connects vertex to itself
Direct graph (digraph): If each edge has a direction
Weighted graph: If a weight wij (a real number ) is associated with each edge vij

网络图结构及性质
Subgraph: A subset of the nodes and edges in a graph/network
Clique (complete graph): Every node is connected to every other
Singleton vs. dyad (two nodes and their relationship) vs. triad:

Ego-centric network: A net work pull out by selecting a node and all of its connections

度的分布

Degree sequence
The list of degrees of the nodes sorted in non-increasing order. Eg: in G1 ,degree sequence:(4, 3, 2, 2, 1)
Degree frequency distribution
Let NK denote the # of vertices with degree k, (N0,N1,…,Nt), t is max degree for a node
Eg: in G1 ,degree frequency distribution: (0, 1, 2, 1, 1)
Degree distribution
Probability mass function f for random variable X:
(f(0),f(1),…,f(t)), where f(k)=P(X=k)=Nk/n
Eg: in G1 ,degree distribution: (0, 0.2, 0.4, 0.2, 0.2)
路径
Walk
In a graph G between nodes X and Y : ordered sequence of vertices, starting at X and ending at Y, s.t. there is an edge between every pair of consecutive vertices
Hop
The length of the walk
Path
A walk with distinct vertices
Shortest Path (geodesic path, d )
Geodesic paths are not necessarily unique: It is quite possible to have more than one path of equal length between a given pair of vertices
Diameter of a graph: the length of the longest geodesic path between any pair of vertices in the network for which a path actually exists
Average Path Length( <d> )
Average of the shortest paths between all pairs of nodes: <d>=N(N−1)1i,j=1,N(i=j)∑di,j
Eulerian Path
A path that traverses each edge in a network exactly once
Hamilton Path
A path that visits each vertex in a network exactly once
Distance
The length of the shortest path

Total # of path length 2 from j to i ,via any vertex in Nij(2) is:Nij(2)=∑k=1nAikAkj=[A2]ij
So, generalizing to path of arbitrary length, we have: Nij(r)=[Ar]ij
When starting and ending at the same vertex i , we have: Lr=∑i=1n[Ar]ii=TrAr
# of loops can be expressed in terms of adjacency matrix:
Matrix A written in the form of A=UKUT , U is the orthogonal matrix of eigenvectors and K is the diagonal matrix of eigenvalues (SVD):
Lr=Tr(UKUT)r=Tr(UKrUT)=Tr(UUTKr)=Tr(Kr)=∑ikir, where ki is the i-th eigenvalue of the adjacency matrix
Independent Paths
Edge-independent:
Two path connecting a pair of vertices are edge-independent if they share no edges
Vertex-independent:
Two path are vertex-independent if they share no vertices other than the starting and ending vertices
半径与直径

Eccentricity
The eccentricity of a node vi is the maximum distance from vi to any other nodes in graph
e(vi)=maxj{d(vi,vj)} Eg. e(A)=1,e(F)=e(B)=e(D)=e(H)=2
Radius
Radius of a connected graph G : the min eccentricity of any node in G
r(G)=mini{e(vi)}=mini{d(vi,vj)} Eg. r(G1)=1
Diameter
Diameter of a connected graph G : the max eccentricity of any node in G
d(G)=maxi{e(vi)}=maxi,j{d(vi,vj)} Eg. d(G1)=2
弱/强连接&出/入元素(有向图)
Weakly/Strongly connected
Weakly: If the vertices are connected by 1 or more paths when one can go either way along any edge

In/Out-component
Out-component: Those reachable from vertex A

簇类系数
因为真实的网络可能十分稀疏,所以需要这么一个衡量系数去衡量一下密集度
Clustering coefficient of a node vi: A measure of the density of edges in the neighborhood of vi
Let Gi=(Vi,Ei) be the subgraph induced by the neighbors of vertex vi , ∣Vi∣=ni( # of neighbors of vi ), and ∣Ei∣=mi ( # of edges among the neighbors of vi )
For undirected network: C(vi)=max #edges in Gi#edges in Gi=(2ni)mi=ni(ni−1)2×mi
For directed network: C(vi)=max #edges in Gi#edges in Gi=ni(ni−1)mi
For a graph: C(G)=n1i∑C(vi) averaging the local clustering coefficient of all the vertices
双边网络
Bipartite Network: two kinds of vertices, and edges linking only vertices of unlike types

Incidence matrix: Bij=1 if vertex j links to group i, 0 otherwise
The projection to one-mode can be written in terms of the incidence matrix B as follow:
Pij=∑k=1gBkiBkj=∑k=1gBikTBkj
The product of BkiBkj will be 1 if i and j both belong to the sample group k in the bi-partite network
协同引证与文献耦合
Co-citation
Co-citation of vertices i and j : AikAjk=1 , if i and j are both cited by k

# of vertices having outgoing edges pointing to both i and j :
Cij=∑k=1nAikAjk=∑k=1nAikAkjT
Co-citation matrix: its a symmetric matrix C=AAT
Diagonal matrix (Cii): total # papers citing i
Bibliographic coupling
Bibliographic coupling of vertices i and j : AikAjk=1 , if i and j both cite k

Bibliographic coupling of i and j :
Bij=∑k=1nAkiAkj=∑k=1nAikTAkj
Bibliographic coupling matrix: B=ATA
Diagonal matrix (Bii): total # papers cited by i
Co-citation&Bibliographic Coupling Comparison
Strong co-citation must have a lot of incoming edges
Strong bib-coupling if two papers have similar citations
图计算任务和算法
图计算的任务
1、Graph pattern matching: subgraph matching...
2、 Graph analytics: PageRank ...
图计算的算法
PageRank, Simrank, HITS, Label Propagation, Maximal-Independent-Set, Maximal-Node-Matching, Connected-Component, Shortest Distance ...
程序模型框架
不支持:N 支持:Y
Model
Data/Graph Integration
Both Graph Matching/Analytic
Explicit Loop Control
Systems
Vertex-Centric
N
N
N
Giraph, Pregel, WindCatch, GPS, Grace, Pregel+, TUX
Edge-Centric
N
N
N
PowerGraph, GraphChi, X-Stream, TurboGraph, FlashGraph, Powerlyra
Block-Centric
N
N
N
Giraph++, Blogel
Linear Algebra
N
N
N
PEGASUS, GraphTwist, LA3
Grape
N
Y
N
Grape
DSL
N
N
Y
Green-Marl
Datalog
N
Y
Y
Socialite, Distribute-Socialite, DeALS
Dataflow
Y
Y
Y
GraphX, GraphFrame, Pregelix
SQL-G
Y
Y
Y
SQL-G, EmptyHeaded
Last updated