Graph Sampling-
Graph sampling covers methods for extracting small, representative graphs and patterns from large volumes of data. Data mining produces a great deal of data, and that data has to be presented in a form that meets the user's requirements. Many representations are used to show data visually, such as 2D and 3D plots, pie charts and flowcharts. In graph sampling, the data is represented using graphs, and mining results are mainly presented to the user in graph form. Graph mining is an active research area within data mining.
Frequent Sub Graph Mining-
Frequent subgraph mining finds the subgraphs that occur in at least a minimum number of graphs (the minimum support) of a large graph database, so its result is a much smaller set of graphs. Several data mining algorithms are used in this process to produce the final output for the user. Frequent subgraph mining algorithms fall into two main types-
1. Algorithms using a BFS (breadth-first search) strategy-
A. These algorithms are based on the Apriori approach.
B. Candidate subgraphs of size K+1 are generated level by level from the frequent subgraphs of size K.
C. The size of a graph is measured by its number of vertices (as in AGM) or its number of edges (as in FSG).
Two main algorithms follow this strategy-
•AGM Algorithm-
-It is based on the Apriori approach.
-It uses an adjacency matrix to represent each graph.
•FSG Algorithm-
-It is also based on the Apriori approach.
-Edges play the role that items play in Apriori, so frequent subgraphs correspond to frequent itemsets.
-At each level, candidates are formed by adding one more edge to the frequent subgraphs found so far.
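The level-wise idea behind these Apriori-based algorithms can be sketched in a few lines. This is a simplified, FSG-style sketch under stated assumptions, not the AGM or FSG implementation: graphs are tiny, edges are unlabeled, and each frequent k-edge pattern is grown by one edge (real FSG instead joins pairs of frequent k-edge patterns). Subgraph isomorphism and the canonical form are brute force, so it only suits toy data. The names `mine`, `contains`, `canonical`, `extensions` and the `min_sup` parameter are hypothetical.

```python
from itertools import permutations

# Hypothetical representation: a graph is (labels, edges), where labels maps
# vertex id -> vertex label and edges is a set of frozenset({u, v}).
# Edges are undirected and unlabeled (a simplifying assumption).

def canonical(labels, edges):
    """Brute-force canonical form: the smallest encoding over all vertex orders."""
    best = None
    for perm in permutations(labels):
        pos = {v: i for i, v in enumerate(perm)}
        code = (tuple(labels[v] for v in perm),
                tuple(sorted(tuple(sorted(pos[v] for v in e)) for e in edges)))
        if best is None or code < best:
            best = code
    return best

def contains(graph, pattern):
    """True if pattern maps injectively into graph, preserving labels and edges."""
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    order = list(p_labels)

    def extend(mapping):
        if len(mapping) == len(order):
            return True
        pv = order[len(mapping)]
        for gv in g_labels:
            if gv in mapping.values() or g_labels[gv] != p_labels[pv]:
                continue
            # every pattern edge to an already-mapped vertex must exist in graph
            if all(frozenset((gv, mapping[q])) in g_edges
                   for q in mapping if frozenset((pv, q)) in p_edges):
                mapping[pv] = gv
                if extend(mapping):
                    return True
                del mapping[pv]
        return False

    return extend({})

def support(db, pattern):
    return sum(contains(g, pattern) for g in db)

def extensions(pattern, vertex_labels):
    """Every pattern with exactly one more edge than the given one."""
    labels, edges = pattern
    verts = list(labels)
    new_id = max(verts) + 1
    for i, u in enumerate(verts):  # close an edge between existing vertices
        for v in verts[i + 1:]:
            if frozenset((u, v)) not in edges:
                yield labels, edges | {frozenset((u, v))}
    for u in verts:  # attach a new labeled vertex to an existing vertex
        for lab in vertex_labels:
            yield {**labels, new_id: lab}, edges | {frozenset((u, new_id))}

def mine(db, min_sup, max_edges=3):
    """Level-wise (Apriori-style) search from k-edge to (k+1)-edge patterns."""
    vertex_labels = sorted({lab for labels, _ in db for lab in labels.values()})
    level = {}
    for labels, edges in db:  # level 1: every distinct single-edge pattern
        for e in edges:
            u, v = tuple(e)
            p = ({0: labels[u], 1: labels[v]}, {frozenset((0, 1))})
            level[canonical(*p)] = p
    frequent = {}
    for k in range(1, max_edges + 1):
        # keep only candidates that reach the minimum support
        level = {c: p for c, p in level.items() if support(db, p) >= min_sup}
        frequent.update(level)
        if not level or k == max_edges:
            break
        candidates = {}  # grow each frequent k-edge pattern by one edge
        for p in level.values():
            for q in extensions(p, vertex_labels):
                candidates.setdefault(canonical(*q), q)
        level = candidates
    return frequent

def edge(u, v):
    return frozenset((u, v))

db = [
    ({0: "A", 1: "B", 2: "C"}, {edge(0, 1), edge(1, 2)}),              # path A-B-C
    ({0: "A", 1: "B", 2: "C"}, {edge(0, 1), edge(1, 2), edge(0, 2)}),  # triangle
    ({0: "A", 1: "B"}, {edge(0, 1)}),                                  # edge A-B
]
result = mine(db, min_sup=2)
for labels, edges in result.values():
    print(sorted(labels.values()), len(edges), "edge(s)")
```

With `min_sup=2`, this toy run keeps the edges A-B and B-C and the path A-B-C; the triangle occurs in only one graph and is discarded. Only frequent patterns are extended, which is the Apriori property: a pattern cannot be frequent if the smaller pattern it grows from is not.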
2. Algorithms using a DFS (depth-first search) strategy-
A. These algorithms follow the pattern-growth approach.
B. Because BFS-based candidate generation is costly, DFS is often preferred.
This strategy is mainly followed by one algorithm-
•gSpan Algorithm-
-It is based on the pattern-growth approach.
-It avoids the redundant candidate generation of Apriori-based methods.
-It works on labeled graphs, where vertices and edges carry labels.
-Each graph is given a unique canonical label (its minimum DFS code), built from the labels of its edges and vertices.
-It finds frequent subgraphs efficiently.
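gSpan's unique label for a graph is its minimum DFS code: each depth-first traversal writes the graph as a sequence of 5-tuples (i, j, label_i, edge_label, label_j), where i and j are discovery indices, and the smallest such sequence identifies the graph up to isomorphism. The sketch below is a simplified illustration, not gSpan itself. It enumerates every DFS traversal by brute force and compares codes with plain Python tuple ordering instead of gSpan's DFS lexicographic order. That still yields one label per isomorphism class, but it lacks gSpan's pruning guarantee, and there is no rightmost-path extension or support counting. The function names are hypothetical.

```python
# Hypothetical representation: labels maps vertex -> vertex label and edges
# maps frozenset({u, v}) -> edge label. The graph is assumed to be connected.

def dfs_codes(labels, edges):
    """Yield the DFS code of every possible DFS traversal of the graph."""
    adj = {v: [] for v in labels}
    for e in edges:
        u, v = tuple(e)
        adj[u].append(v)
        adj[v].append(u)

    def grow(stack, idx, code):
        # idx: vertex -> discovery index; code: list of 5-tuples
        while stack:
            top = stack[-1]
            fresh = [w for w in adj[top] if w not in idx]
            if not fresh:
                stack = stack[:-1]  # top is finished: backtrack
                continue
            for w in fresh:  # branch on which unvisited neighbour comes next
                new_idx = {**idx, w: len(idx)}
                new_code = code + [(idx[top], new_idx[w], labels[top],
                                    edges[frozenset((top, w))], labels[w])]
                # backward edges from w to already-discovered ancestors
                for x in sorted((x for x in adj[w] if x in idx and x != top),
                                key=idx.get):
                    new_code.append((new_idx[w], idx[x], labels[w],
                                     edges[frozenset((w, x))], labels[x]))
                yield from grow(stack + [w], new_idx, new_code)
            return
        yield tuple(code)

    for start in labels:
        yield from grow([start], {start: 0}, [])

def min_dfs_code(labels, edges):
    """Canonical label: the smallest DFS code (plain tuple order)."""
    return min(dfs_codes(labels, edges))

fs = frozenset
tri1 = ({0: "A", 1: "B", 2: "C"}, {fs((0, 1)): "x", fs((1, 2)): "x", fs((0, 2)): "x"})
tri2 = ({7: "C", 8: "A", 9: "B"}, {fs((7, 8)): "x", fs((8, 9)): "x", fs((7, 9)): "x"})
path = ({0: "A", 1: "B", 2: "C"}, {fs((0, 1)): "x", fs((1, 2)): "x"})

same = min_dfs_code(*tri1) == min_dfs_code(*tri2)      # isomorphic graphs
different = min_dfs_code(*tri1) != min_dfs_code(*path)  # non-isomorphic graphs
```

In gSpan, a newly grown pattern whose code is not the minimum one is a duplicate of a pattern reached along another path, so it is pruned instead of being counted again.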
Explanation:
Graph sampling is a vital technique in data mining used to analyze and process large-scale graph data efficiently. In many real-world applications such as social networks, biological networks, communication systems, and the World Wide Web, graphs can contain millions or even billions of nodes and edges. Analyzing such massive graphs directly is computationally expensive and time-consuming. Graph sampling provides a practical solution by selecting a smaller, representative subset of the original graph that preserves its essential structural properties.
The main goal of graph sampling is to create a smaller graph that maintains the statistical and topological characteristics of the original one, such as degree distribution, clustering coefficient, and community structure. This allows researchers to perform experiments and analyses on the sample while obtaining results that generalize well to the full dataset. Effective sampling ensures that key features of the network are not lost, enabling accurate approximations and predictions.
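A minimal sketch of how these properties can be checked, assuming graphs stored as dicts of adjacency sets (a hypothetical representation): it computes the degree distribution and the average clustering coefficient of an original graph and of a sample, and measures the gap between the two degree distributions with the total variation distance. That distance is one of several possible measures, chosen here only for simplicity; the helper names are illustrative.

```python
from collections import Counter
from itertools import combinations

def degree_distribution(adj):
    """Fraction of nodes having each degree; adj maps node -> set of neighbours."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {d: c / len(adj) for d, c in counts.items()}

def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

def total_variation(p, q):
    """Distance between two distributions: 0 = identical, 1 = disjoint."""
    return 0.5 * sum(abs(p.get(d, 0) - q.get(d, 0)) for d in set(p) | set(q))

def from_edges(edge_list):
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def induced_subgraph(adj, nodes):
    return {v: adj[v] & nodes for v in nodes}

# Two triangles joined by the edge 3-4; the sample keeps nodes 1-4 only.
original = from_edges([(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)])
sample = induced_subgraph(original, {1, 2, 3, 4})
drift = total_variation(degree_distribution(original), degree_distribution(sample))
print(f"degree TV distance: {drift:.2f}")
print(f"clustering: {average_clustering(original):.3f} -> {average_clustering(sample):.3f}")
```

Here the sample's degree distribution is 0.25 away from the original's, and its average clustering falls from 7/9 to 7/12 because node 4 loses its triangle, a small example of a sample failing to preserve structure.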
There are several common methods of graph sampling, including node sampling, edge sampling, random walk sampling, and snowball sampling.
-Node sampling randomly selects a subset of nodes and includes all or some of their connecting edges.
-Edge sampling chooses random edges and includes their associated nodes.
-Random walk sampling starts from a random node and moves through the graph by following connected edges, producing a more natural exploration of the structure.
-Snowball sampling expands from an initial set of nodes by iteratively including their neighbors, which is particularly useful in social network analysis.
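The four methods above can be sketched as follows, assuming an undirected graph stored as a dict of adjacency sets (a hypothetical representation). Each function returns the sample in the same form. Node, random walk and snowball sampling keep the induced subgraph on the chosen nodes, which is one common variant rather than the only definition. The function names and parameters (`k`, `m`, `seeds`, `waves`, `max_steps`) are illustrative.

```python
import random

def node_sample(adj, k, rng):
    """Node sampling: pick k random nodes and keep the edges among them."""
    nodes = set(rng.sample(sorted(adj), k))
    return {v: adj[v] & nodes for v in nodes}

def edge_sample(adj, m, rng):
    """Edge sampling: pick m random edges and keep their endpoints."""
    edges = sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})
    out = {}
    for u, v in rng.sample(edges, m):
        out.setdefault(u, set()).add(v)
        out.setdefault(v, set()).add(u)
    return out

def random_walk_sample(adj, k, rng, max_steps=10_000):
    """Random walk: step to a random neighbour until k distinct nodes are seen."""
    current = rng.choice(sorted(adj))
    visited = {current}
    for _ in range(max_steps):
        if len(visited) >= k or not adj[current]:
            break
        current = rng.choice(sorted(adj[current]))
        visited.add(current)
    return {v: adj[v] & visited for v in visited}

def snowball_sample(adj, seeds, waves):
    """Snowball: start from seed nodes and add all their neighbours, wave by wave."""
    sampled, frontier = set(seeds), set(seeds)
    for _ in range(waves):
        frontier = {w for v in frontier for w in adj[v]} - sampled
        sampled |= frontier
    return {v: adj[v] & sampled for v in sampled}

ring = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}  # 10-node cycle
rng = random.Random(42)
by_node = node_sample(ring, 4, rng)
by_edge = edge_sample(ring, 3, rng)
by_walk = random_walk_sample(ring, 5, rng)
by_snowball = snowball_sample(ring, seeds=[0], waves=2)
```

Passing a seeded `random.Random` makes the samples reproducible. On the ring, two snowball waves from node 0 collect nodes 8, 9, 0, 1 and 2.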
Graph sampling is widely used in various domains. In social networks, it helps analyze user communities or influence patterns without processing the entire network. In web mining, it assists in understanding hyperlink structures. In biology, it helps study protein–protein interaction networks efficiently. Moreover, it supports visualization tasks by simplifying large graphs for human interpretation.
In conclusion, graph sampling plays a crucial role in data warehousing and data mining by enabling scalable analysis of complex graph data. By generating smaller yet representative subgraphs, it enhances performance, reduces computational costs, and maintains analytical accuracy across diverse applications.