留学生计算机科学与技术专业硕士课程dissertation范文1 Introduction
The report summarizes the work I did in the last few months.I do some study for the background of this research which contains the theories, algorithm that detect and analyse the evolving communities, and so on. In order to do some research on the evolving communities, some related theories about communities, Clique percolation method ( CPM ) should be studied firstly. Then chapter three emphasize the frame of community detecting. The next chapter explain the method of analysing the evolving communities. In the last part of this report I show the previous work in the related area.
2 Related theories
This chapter gives a brief description of related theories about community detecting, which contain the introduction ,the formal expression, the types and the topological properties of network, the concept and properties of community, and the principle of CPM.
2.1 Network
2.11 Basic concept
In the real life, some systems that involve multi-individual and some of them interact with others can be abstracted as network where each individual is a node of network and edge of network represent the relationship or interaction of individuals. In the co-authorship network the nodes in are authors, which are linked if they co-author the paper. The network can be generated as graph G which can be illustrated as {V, E}, based on the the properties of V and E, network can be divided into several styles. If a random pair of nodes (i,j) and (j,i) has the same edge, then we call this network is undirected network, and the co-authorship network belongs to undirected network[1]. In the weighted network every edge can be marked by some property, for example, types and weight, and the the size of the weight present the degree of association. However, in co-authorship network the number of the weight stands for the times of collaboration.
2.1.2 Related property
Plenty of studies found that, related network such as co-authorship network is greatly difference of random network.
1 small-world effect
Even though the co-authorship network is very huge, the average path length of network which is defined as the average distance between any two nodes in network is real small.
presents the distance between the nodes i and j
The small-word is firstly proposed by Hungarian--F.Karinthy in 1929, he thought any two persons can be linked by average six lines that are composed by six persons[2]. After that, in 1967, Milgram who are the social psychologist of Harvard University proposed the six degrees of separation that confirm the the property of small-word[3]. Hereafter, people use other method to test the six degrees of separation hypothesis. In , professor Grossman in university of auckland proposed the Erdos number project tested the small-word in collaboration network of mathematician[5], and Elmacioglu do some research on savant collaboration network based on the DBLP data set, and test the six degrees of separation phenomenon[4].#p#分页标题#e#
These studies stated clearly that, in the huge network, the distance among individuals is real close.
2 Clustering features
In the co-authorship network, if a node i is directly linked by edge with other ki nodes, we call the node i is adjacent to these ki nodes, however, ki nodes may have the relationship of adjacency. Consider that in the friendship network, two friends of a person may be the friends each other[6].
One of the most important metric of network is clustering coefficient(cc). In co-authorship network(weighted network), the clustering coefficient of node i is defined as :
V(i) is the set of nodes which is adjacent to i.
Apparently, if every node in the network is isolated, the cc of this network is zero; and if any two nodes are adjacent nodes, the network is the complete graph, therefore the cc of this network is 1. In general, the cc of most network is between 0 and 1. For the co-authorship network, the cc is close to a nonzero constant coefficient as the size of network increasing.
3 Scale-free
The degree of node i in network is defined as the number of nodes which is adjacent to i. In the scale network, the degree of node is shown as normal distribution or the constant, while the degree of nodes that in scale-free network is scale-free distribution. As plenty of researches show that most of real network is scale-free network, such as co-authorship network, of which degree satisfies the power law distribution
2.2 Community
1 The community in sociology
Sociologist F.Tonnies was the first man who proposed the concept of "community", and give the detailed explanation in the book- Gemeinschaftund Gesellchaft that published in 1887. The community means any organism that have a collaborative relationship. In Tonnies opinion, the wide range of "Geneinschaft" meaning is not only contains the geographically-based communities, but also blood relationship, spiritual relationship and so on, and the prime of the community is the collective sense of cultural awareness. With the development of industry and city, the study of community raised the sociologists interest.
There are two general types of definitions of community in sociology, one is emphasize the functionality, they think the community is composed of persons who have the same goal and interested party; another is defined as geographically-based communities, they divide the society into many pieces that include people who work, study, live(etc.) together.
2 Community in network
Community is the common property in the network. It is the basic unit of the network. However, there have not be a explicit definition for community yet. An accepted opinion is that: a community is the network`s subgraph in which nodes tend to linked with other nodes in the same subgraph. Hence, the nodes in the community link stronger each other. Take the co-authorship network for example, authors in the same community has the similar research interests. Therefore, it is very important for learning and analysing the network to study the community.#p#分页标题#e#
In fact, the community can not be found easily in most real network; thus we need to use some community detection method to dig the hidden community in network.
The community detection means the process of assigning the nodes in the network to several subset by types.
2.3 Clique percolation method (CMP)
In the co-authorship network, there is almost no independent community, by contrast, it is composed by many overlap communities,e.g., authors isdivided by the study interest in co-authorship network, now the author i is interested in several study areas, therefor i belongs to several communities. In that case, it is real hard to extract the community, so we need to use community detection method.
The Clique percolation method is one of the detection method, which is proposed by Palla and his co-workers in 2005 [7], they think the network can be thought as the set of small complete networks in which each nodes is connected to every other nodes, and the small complete network is called clique, if there are k nodes in it then it called k-clique. Two k-clique are adjacent only if they share k-1 nodes.provided that some nodes are the nodes exist in several k-clique which is not adjacent(having no k-1 public nodes), then this nodes are called the overlap part of these k-clique; thus we can use CMP to find the overlapping communities in the network.[8].
The result of CPM is dependent on the network itself[9], k is the very important coefficient, with the increasing of k, the size of community will be becoming smaller but cohesiver. Another important coefficient is w* when using in the weighted network[10]. w* is the weight threshold, the links weaker than w* should be ignored, therefore if the w* is becoming bigger, the community will be looser and even die. Palla also pointed that the advantage of CPM is that the community that defined by CPM is not restrictive, and allow overlaps between the communities.
3 Frame of community detection
Community structure is one of the important properties network. In term of topology, nodes in the network tend to link with other nodes that are exist in the same community; thus the density in the community is higher than that among communities. In terms of property, nodes in the same community also have the same or similar properties, therefore, study on community construct is very helpful for people to understand network structure and analysing the network properties.
In general, most real life networks including co-authorship network are very huge and dynamic. And we can not know the numbers of community and member in community. Moreover, the network changes every often, the change of community structure have large influence upon the network structure, as new communities will emerge, old communities will disappear and some communities will merge with other communities.
The high frequency of change in dynamic network places a new burden on the underlying community detection methond.
3.1 Structure
3.1.1 Overall frame#p#分页标题#e#
Detecting community in dynamic network contains the following steps:
Detecting the community structure in every needed time
Finding the process of community evolution.
Detecting communities track communities
The figure above show the overall frame of community detection, and the general flow is described as following:
Getting the whole network data set in the research time interval.
Exacting the network segment in every needed time.
Analyse the topology of the whole network using the network segments.
Using the appropriate community detection method to detect the communities.
Tracking the communities and analysing the evolution about communities in order to find the dynamic property of network.
3.1.2 Specific work
1 Time division and network segment extracting
The reasonable time division plays a pivotal role in the research. In general, real life network is changing by time. We could get the network segment in every divided time interval by time divided, thereby archiving the study of community detection.
We can not find the emerging of community, if the time interval is too long, while, the number of unstable nodes and edges will increase sharply.
2 Choosing community extracting method
Community extracting method is the foundation of community detection. In fact, we have a lot of community detection method, thus as GN, CPM and so on. Whatever the method we use, the aim of us is dividing the network into several community. As the co-authorship is the interrelated and overlapping network we choose to use CPM , which is wildly used in the collaboration network, as the community detection method.
3.2 Formal description
3.2.1 Network segment
For the co-authorship is the dynamic network, we add the time property in the network description, therefore, the definition of co-authorship network with time property will be described as:
T is the time set which presents the time interval.
W is the weight set.
V presents the nodes in the network, , where N presents the number of the nodes, i.e, the size of G,|G|=N.
Dv means the duration of node v, .
E means the edge set, , where M presents the number of the edges in the network.
Every edge can be describe as: ( vi ,vj, t, w),, that means nodes i and j have the relationship in the time t, and the degree of the relationship is w.
Network segment is the status of network in the time t, we use GS present the network segment in the time interval S , GS is defined as
.
S is the time interval, .
W has the same definition with that I mentioned above.
VS is the node set in the time interval S, i.e, .
ES means the edge set in time S, every edge can be describe as: ( vi ,vj, S, w),, that means nodes i and j have the relationship in the time interval S, and the degree of the relationship is w.#p#分页标题#e#
3.2.2 The evolution graph of community
The evolution graph of community shows the change process of communities in the network. The evolution graph can be described as: E=(TS,C,ER), where TS is the ascending time series S1, S2, ... ,St. .
C is the network segment set in the corresponding time interval.
ER is the evolution set, . means ci and cj are the two status of the same community in the adjacent time interval, if the T(cj)=T(ci)+1, then we call ci is the predecessor of cj, and cj is the successor of ci. In the community evolution graph, there may be no predecessor(successor) of a community, or one or more predecessor(sucessor) of one community.
4 Discovery of community evolution
One of the most important stage of analysing the evolving community is the discovery of community evolution.
4.1 The types of evolving communities
According to the existence of predecessor or successor, the process of community evolution can be divided into several types: merging(birth), disappearance(death), growth, contraction, merging,splitting.
the change of community (picture from ref.[10])
Birth
Community c emerges in the time T(c)+1, only if c does not exist in the network in the time T(c), in other word, there is no predecessor of c in the community evolution graph EG=(TS,C,ER), .
Death
Community c dies in the time T(c)+1, only if c does not exist in the network in the time T(c)+1, in other word, there is no successor of c in the community evolution graph EG=(TS,C,ER), .
Other changes
Community c has successor in time T(c)+1, EG=(TS,C,ER), . Moreover, the successor can be divided into stable and changeable status according to the difference between c and it`s successor. If c keeps stable, then in the graph EG=(TS,C,ER), and c`=c, that is to say, no member leaves or enters the community. If the community changes in the next time, then we should analyse the changes in order to know the community tendency. The changes include:
Growth,contraction,merging and splitting. However, the change of community is the combination of several basic types in the most cases.
4.2 measurement of community evolution
The parameter of measure the community evolution include: stable,changeable, disappear,grow, decompose,merge, life cycle and so on.
Ci and cj is the adjacent community status of the same community and T(ci)=T(ci)+1, i.e., EG=(TS,C,ER), .
Stable
The stability of community ci can by defined as:
The higher stability of community, the more number of nodes that stay in the community, if the Rstability=1, then we call the whole network is the static network in that time.
Disappearace
The disappearance parameter of community means the members in the community disappear proportionately. In the range of T(ci)~T(cj),
Rdisappear is proportional to the number of memberswho disappear from community, while, Rdisappear is inversely proportional to the size of community itself.#p#分页标题#e#
计算机硕士dissertation范文Growth
The growth parameter of community is the proportion of new members . In T(ci)~T(cj)
Rgrow is proportional to the number of new members of community, while, Rgrow is inversely proportional to the size of community.
Changeable
The changes of member in the community shows the changes of community in the process of evolution, In T(ci)~T(cj)
Rchange is proportional to the number of new and disappeared members in community, while, Rchange is inversely proportional to the size of community.
Decompose
The decompose of the community means the proportion of members who leave to other community. In T(ci)~T(cj):
4.3 Community correlation and evolution
4.3.1 community correlation
The correlation of community shows the degree of correlation of two community,CR(ci,cj) presents the degree of correlation of ci and cj.
If ci andcj are the adjacent status of the same community,, then we call the CR(ci,cj) is self-correlation. Apparently, community correlation is in the rang of [0,1].
If CR(ci,cj)=1 then we know the member of communities is the same, while when CR(ci,cj)equals to zero that means ci and cj have no common member.
4.3.2 Community detection
The key step of detecting the community is finding the process of community change, in other word, finding community status of the same community in the different time interval. Then we can build the community evolution graph according to the process of community change in order to finding the type of community evolution.
Reference
[1] Borner,K., Maru,J.T., Goldstone,R.L. (2004). The simultaneous evolution of author and paper networks. PNAS. 101(Suppl 1). 5266-5273.
[2] Braun T. Hungarian priority in network theory [M]. Science, 2004:1754
[3] Milgram, S. (1967).The small world problem. Psychology Today 2, 60--70
[4] Elmacioglu, E.,Lee, D.(2005) On Six Degrees of Separation in DLP-DB and More. SIGMOD Record 34(2). 33-40
[5] Elmacioglu(1996) The Erdos Number Project[online].
Available from:http://www.oakland.edu/enp/
[Accessed on 17th April, 2010]
[6] Li,X.,Chen,G.(2003). A local-world evolving network model. Physica A:Statistical Mechanics and its Application 328(1-2).274-268.
[7] Palla, G., Derenyi, I., Farkas, I., et al.(2005) Uncovering the overlapping community structure of complex networks in nature and society[J].Nature.435. 814-818.
[8] Peng, M.J.(2009) Complex network study. Yang Zhou University.
[9] Huang, Z.X., Yan,Y., Qiu,Y.H.(2009). A community detection method based on CMP. Newspaper of Chongqing Normal University.26(2).1-4.
[10]Palla,G., Barabasi,A., Vicsek, T.(2007) Supplementary Information (in Quantifying social group evolution). [online]. Nature. Available from:
http://www.nature.com/nature/journal/v446/n7136/suppinfo/nature05670.html .
相关文章
UKthesis provides an online writing service for all types of academic writing. Check out some of them and don't hesitate to place your order.