An Approach for Fast Score Computation in Bayesian Network Structure Learning Over Large-Scale Distributed Data
On a fundamental level, Bayes' theorem enables us to use prior knowledge to determine the probability of an event. Consequently, its suitability for probabilistic reasoning has led to its employment in probabilistic graphical modeling and the inception of Bayesian networks. The field is saturated with techniques for learning the structure of a Bayesian network (also known as a Bayes network). Nevertheless, most of these techniques struggle when the number of variables (or network nodes) and the volume of input data grow drastically. At that point, parallel distributed processing is the best alternative for alleviating the computational complexity of the problem. To this end, we propose a gossip-based distributed score computation approach, called DiSC, that computes the sufficient statistics of families of variables in order to accelerate the structure learning process of Bayesian networks. We show that, on a variety of synthetic and real datasets, DiSC can significantly outperform map-reduce-style score computations executed by the distributed computation framework Apache Spark, with a low accuracy trade-off.
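The sufficient statistics mentioned in the abstract are, for discrete data, the counts of joint configurations of each variable together with its parent set (its "family"); decomposable scores are then derived from these counts. As a minimal illustrative sketch (not the DiSC protocol itself, and with hypothetical function and variable names), the following Python snippet collects family counts and computes a maximum-likelihood family score from them:

```python
from collections import Counter
from math import log

def family_counts(data, child, parents):
    """Collect the sufficient statistics for one family.

    data: list of dicts mapping variable name -> discrete value.
    Returns counts N(u, x) of each (parent configuration, child value)
    pair and marginal counts N(u) of each parent configuration.
    """
    joint = Counter()        # N(parents = u, child = x)
    parent_only = Counter()  # N(parents = u)
    for row in data:
        u = tuple(row[p] for p in parents)
        joint[(u, row[child])] += 1
        parent_only[u] += 1
    return joint, parent_only

def family_loglik(joint, parent_only):
    """Maximum-likelihood family score: sum over (u, x) of N(u,x) * log(N(u,x)/N(u))."""
    return sum(n * log(n / parent_only[u]) for (u, x), n in joint.items())

# Tiny example: score the family (child X, parent set {A}).
data = [{"A": 0, "X": 0}, {"A": 0, "X": 0},
        {"A": 1, "X": 1}, {"A": 1, "X": 0}]
joint, parent_only = family_counts(data, "X", ["A"])
score = family_loglik(joint, parent_only)
```

Because such counts are additive across data partitions, they are natural candidates for distributed aggregation, whether by map-reduce-style reduction or, as proposed here, by gossiping.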
Table of Contents
Introduction -- Related work -- Distributed score computation -- Evaluation -- Conclusion