大规模图处理
各种网络及其度
无/有向网络中节点的度
无向网络
Degree (or degree centrality) of a vertex: :
, of edges connected to it
eg. 下图中
有向网络
In-degree of a vertex : , of edges pointing to
Out-degree of a vertex : , of edges from
eg. 下图中
各网络及其度的定义
Simple network: If a network has neither self-edges nor multi-edges
Multi-edge (multigraph): If more than one edge between the same pair of vertices
Self-loop: If an edge connects vertex to itself
Direct graph (digraph): If each edge has a direction
Weighted graph: If a weight (a real number ) is associated with each edge
网络图结构及性质
Subgraph: A subset of the nodes and edges in a graph/network
Clique (complete graph): Every node is connected to every other
Singleton vs. dyad (two nodes and their relationship) vs. triad:
Ego-centric network: A net work pull out by selecting a node and all of its connections
度的分布
Degree sequence
The list of degrees of the nodes sorted in non-increasing order. Eg: in ,degree sequence:(4, 3, 2, 2, 1)
Degree frequency distribution
Let denote the of vertices with degree k, is max degree for a node
Eg: in ,degree frequency distribution: (0, 1, 2, 1, 1)
Degree distribution
Probability mass function for random variable :
Eg: in ,degree distribution: (0, 0.2, 0.4, 0.2, 0.2)
路径
Walk
In a graph between nodes and : ordered sequence of vertices, starting at and ending at , s.t. there is an edge between every pair of consecutive vertices
Hop
The length of the walk
Path
A walk with distinct vertices
Shortest Path (geodesic path, )
Geodesic paths are not necessarily unique: It is quite possible to have more than one path of equal length between a given pair of vertices
Diameter of a graph: the length of the longest geodesic path between any pair of vertices in the network for which a path actually exists
Average Path Length( )
Average of the shortest paths between all pairs of nodes:
Eulerian Path
A path that traverses each edge in a network exactly once
Hamilton Path
A path that visits each vertex in a network exactly once
Distance
The length of the shortest path
Total of path length 2 from to ,via any vertex in is:
So, generalizing to path of arbitrary length, we have:
When starting and ending at the same vertex , we have:
of loops can be expressed in terms of adjacency matrix:
Matrix written in the form of , is the orthogonal matrix of eigenvectors and is the diagonal matrix of eigenvalues (SVD):
, where is the -th eigenvalue of the adjacency matrix
Independent Paths
Edge-independent:
Two path connecting a pair of vertices are edge-independent if they share no edges
Vertex-independent:
Two path are vertex-independent if they share no vertices other than the starting and ending vertices
半径与直径
Eccentricity
The eccentricity of a node is the maximum distance from to any other nodes in graph
Eg.
Radius
Radius of a connected graph : the min eccentricity of any node in
Eg.
Diameter
Diameter of a connected graph : the max eccentricity of any node in
Eg.
弱/强连接&出/入元素(有向图)
Weakly/Strongly connected
Weakly: If the vertices are connected by 1 or more paths when one can go either way along any edge
In/Out-component
Out-component: Those reachable from vertex A
簇类系数
因为真实的网络可能十分稀疏,所以需要这么一个衡量系数去衡量一下密集度
Clustering coefficient of a node : A measure of the density of edges in the neighborhood of
Let be the subgraph induced by the neighbors of vertex , ( of neighbors of ), and ( of edges among the neighbors of )
For undirected network:
For directed network:
For a graph: averaging the local clustering coefficient of all the vertices
双边网络
Bipartite Network: two kinds of vertices, and edges linking only vertices of unlike types
Incidence matrix: if vertex links to group , otherwise
The projection to one-mode can be written in terms of the incidence matrix as follow:
The product of will be if and both belong to the sample group in the bi-partite network
协同引证与文献耦合
Co-citation
Co-citation of vertices and : , if and are both cited by
of vertices having outgoing edges pointing to both and :
Co-citation matrix: its a symmetric matrix
Diagonal matrix : total papers citing
Bibliographic coupling
Bibliographic coupling of vertices and : , if and both cite
Bibliographic coupling of and :
Bibliographic coupling matrix:
Diagonal matrix : total papers cited by
Co-citation&Bibliographic Coupling Comparison
Strong co-citation must have a lot of incoming edges
Strong bib-coupling if two papers have similar citations
图计算任务和算法
图计算的任务
1、Graph pattern matching: subgraph matching...
2、 Graph analytics: PageRank ...
图计算的算法
PageRank, Simrank, HITS, Label Propagation, Maximal-Independent-Set, Maximal-Node-Matching, Connected-Component, Shortest Distance ...
程序模型框架
不支持:N 支持:Y
Model | Data/Graph Integration | Both Graph Matching/Analytic | Explicit Loop Control | Systems |
Vertex-Centric | N | N | N | Giraph, Pregel, WindCatch, GPS, Grace, Pregel+, TUX |
Edge-Centric | N | N | N | PowerGraph, GraphChi, X-Stream, TurboGraph, FlashGraph, Powerlyra |
Block-Centric | N | N | N | Giraph++, Blogel |
Linear Algebra | N | N | N | PEGASUS, GraphTwist, LA3 |
Grape | N | Y | N | Grape |
DSL | N | N | Y | Green-Marl |
Datalog | N | Y | Y | Socialite, Distribute-Socialite, DeALS |
Dataflow | Y | Y | Y | GraphX, GraphFrame, Pregelix |
SQL-G | Y | Y | Y | SQL-G, EmptyHeaded |
Last updated