Date of Award

Summer 2007

Document Type


Degree Name

Doctor of Philosophy (PhD)


Modeling Simul & Visual Engineering

Committee Director

Min Song

Committee Member

Mark Pullen

Committee Member

John Sokolowski

Committee Member

Bowen Loftin


Explosive growth in the availability of various kinds of data in distributed locations has resulted in an unprecedented opportunity to develop distributed knowledge discovery (DKD) techniques. DKD embraces the growing trend of merging computation with communication by performing distributed data analysis and modeling with minimal communication of data. Most current state-of-the-art DKD systems lack scalability, robustness, and adaptability because they depend on a centralized model for building the knowledge discovery model. Peer-to-peer networks offer a more scalable and fault-tolerant computing platform for building distributed knowledge discovery models than client-server platforms. Algorithms and communication protocols have been developed for file search and discovery services in peer-to-peer networks. The file search algorithms are concerned with identifying a peer and discovering a file on that peer, so most current peer-to-peer networks for file search act as directory services. The problem of distributed knowledge discovery differs from file search, however, and new issues and challenges have to be addressed. Algorithms and communication protocols for knowledge discovery must allow every peer in the network to discover the correct knowledge discovery model, as if it were given the combined database. Therefore, algorithms and communication protocols for DKD mainly deal with distributed computing. The distributed computations are entirely asynchronous, impose very little communication overhead, transparently tolerate network topology changes and peer failures, and quickly adjust to changes in the data as they occur. Another important aspect of the distributed computations in a peer-to-peer network is that most of the communication between peer nodes is local, i.e., the knowledge discovery model is learned at each peer using information gathered from a very small neighborhood, whose size is independent of the size of the peer-to-peer network.
The peer-to-peer constraints on data and/or computing are hard ones, so the challenge is to show that it is still possible to extract useful information from the distributed data effectively and dependably. Implementing a distributed algorithm in an asynchronous and decentralized environment is the hardest challenge. DKD in a peer-to-peer network raises issues related to the impracticality of global communication and global synchronization, on-the-fly data updates, lack of control, accuracy of computation, the need to share resources with other applications, and frequent failure and recovery of resources. We propose a methodology based on novel distributed algorithms and communication protocols to perform DKD in a peer-to-peer network. We investigate the performance of our algorithms and communication protocols by means of analysis and simulations.
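
The idea of local, asynchronous computation in which every peer arrives at the result it would obtain from the combined data can be illustrated with a small sketch. The abstract does not specify the dissertation's algorithms, so the code below is not the author's method: it uses a standard technique, pairwise gossip averaging, as a stand-in. The function names (`gossip_average`, `ring`), the ring topology, the sample values, and the number of rounds are all assumptions chosen for illustration.

```python
import random


def ring(n):
    """Hypothetical topology: each peer knows only its two ring neighbors."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def gossip_average(values, neighbors, rounds=2000, seed=0):
    """Pairwise gossip averaging (illustrative stand-in, not the thesis algorithm).

    At each step one randomly chosen peer contacts one randomly chosen
    neighbor, and both replace their estimates with the pair's mean.
    No global communication or global synchronization is used.
    """
    rng = random.Random(seed)
    estimates = list(values)
    peers = list(range(len(estimates)))
    for _ in range(rounds):
        i = rng.choice(peers)          # a peer wakes up on its own schedule
        j = rng.choice(neighbors[i])   # it talks to one local neighbor only
        mean = (estimates[i] + estimates[j]) / 2.0
        estimates[i] = mean
        estimates[j] = mean
    return estimates


# Assumed sample data: one local value per peer.
local_values = [3.0, 9.0, 4.0, 12.0, 7.0, 1.0]
estimates = gossip_average(local_values, ring(len(local_values)))

# The value a single site would compute if it held the combined data.
global_mean = sum(local_values) / len(local_values)
```

Each pairwise exchange preserves the sum of the estimates, so the network-wide mean never changes while the individual estimates move toward it. In this sketch every peer ends up holding an estimate close to `global_mean` even though no peer ever sees more than one neighbor's value at a time.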