An Approach for Fast Score Computation in Bayesian Network Structure Learning Over Large-Scale Distributed Data
Format
Thesis
Abstract
On a fundamental level, Bayes' theorem enables us to use prior knowledge to determine the probability of an event. Its suitability for probabilistic reasoning has consequently led to its use in probabilistic graphical modeling and to the inception of Bayesian networks. The field is saturated with techniques for learning the structure of a Bayesian network (also known as a Bayes network). Nevertheless, most of these techniques struggle when the number of variables (network nodes) and the volume of input data grow drastically. At that point, parallel distributed processing is the best alternative for alleviating the computational complexity of this problem. To this end, we propose DiSC, a gossip-based distributed score computation approach that computes the sufficient statistics of families of variables in order to accelerate the structure learning of Bayesian networks. We show that DiSC can significantly outperform MapReduce-style score computation executed by the distributed computation framework Apache Spark on a variety of synthetic and real datasets, with only a small trade-off in accuracy.
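Since the thesis itself is not reproduced on this page, the following is only an illustrative sketch of the kind of computation the abstract describes, not DiSC itself: counting the sufficient statistics of one variable family (a child and its parents) over local data partitions, then disseminating the per-partition counts with a simple push-gossip protocol. All names here (family_counts, gossip_round, the toy variables A, B, C) are invented for the example.

```python
import random
from collections import Counter

def family_counts(rows, child, parents):
    """Sufficient statistics for one family: how often each
    (parent configuration, child value) pair occurs in the data."""
    counts = Counter()
    for row in rows:
        counts[(tuple(row[p] for p in parents), row[child])] += 1
    return counts

def gossip_round(peers, rng):
    """One push-gossip round: every peer sends its map
    {origin_id: partial Counter} to one random other peer, which merges
    it by dict union. Keying by origin prevents double counting."""
    n = len(peers)
    for i in range(n):
        j = rng.randrange(n - 1)
        j += (j >= i)            # pick any peer except i itself
        peers[j].update(peers[i])

def aggregate(peer):
    """Sum all partial counts this peer has learned so far."""
    total = Counter()
    for partial in peer.values():
        total += partial
    return total

# Toy data: each row maps variable name -> discrete value, with C = (A + B) mod 2.
rows = [{"A": a, "B": b, "C": (a + b) % 2} for a in (0, 1) for b in (0, 1)] * 3
exact = family_counts(rows, "C", ["A", "B"])

# Split the rows over three simulated peers; each starts knowing only its own counts.
parts = [rows[0::3], rows[1::3], rows[2::3]]
peers = [{i: family_counts(part, "C", ["A", "B"])} for i, part in enumerate(parts)]

# Gossip until every peer has seen every partition's partial counts.
rng = random.Random(0)
while any(len(p) < len(peers) for p in peers):
    gossip_round(peers, rng)
```

After the loop, any single peer can reconstruct the exact global sufficient statistics (`aggregate(peers[0]) == exact`) without a central reducer, which is the property a gossip-based approach trades for the synchronization barriers of a MapReduce-style shuffle.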
Table of Contents
Introduction -- Related work -- Distributed score computation -- Evaluation -- Conclusion
Degree
Ph.D.
