An Approach for Fast Score Computation in Bayesian Network Structure Learning Over Large-Scale Distributed Data

Format

Thesis

Abstract

On a fundamental level, Bayes’ theorem enables us to use prior knowledge to determine the probability of an event. Consequently, its suitability for probabilistic reasoning has led to its use in probabilistic graphical modeling and the inception of Bayesian networks. The field is saturated with techniques for learning the structure of a Bayesian network (also known as a Bayes network). Nevertheless, most of these techniques struggle when the number of variables (network nodes) and the volume of input data grow drastically. At that point, parallel distributed processing is the best alternative for alleviating the computational complexity of this problem. To this end, we propose DiSC, a gossip-based distributed score computation approach that computes the sufficient statistics of families of variables in order to accelerate the structure learning of Bayesian networks. We show that DiSC can significantly outperform map-reduce-style score computation executed by the distributed computation framework Apache Spark on a variety of synthetic and real datasets, with a low accuracy trade-off.
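For discrete data, the sufficient statistics of a family (a node together with its parents) that decomposable scores such as BDeu or BIC consume are simply the counts of each (parent configuration, child value) pair. The sketch below illustrates this counting step only; the function name, data layout, and toy dataset are illustrative assumptions, not the thesis' DiSC implementation.

```python
from collections import Counter

def family_counts(data, child, parents):
    """Sufficient statistics for one family: counts of each
    (parent configuration, child value) pair in the data.
    Decomposable scores (e.g. BDeu, BIC) are computed from
    exactly these counts, one family at a time."""
    counts = Counter()
    for row in data:
        parent_config = tuple(row[p] for p in parents)
        counts[(parent_config, row[child])] += 1
    return counts

# Toy dataset: each row maps a variable name to a discrete value.
data = [
    {"A": 0, "B": 0, "C": 0},
    {"A": 0, "B": 1, "C": 1},
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 1, "C": 1},
]
stats = family_counts(data, child="C", parents=["A", "B"])
# e.g. stats[((1, 1), 1)] == 2
```

In a distributed setting, each node would compute such counts over its local shard; because counts are additive, partial results can then be combined — whether by map-reduce aggregation or, as the abstract describes for DiSC, by gossip-style exchange.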

Table of Contents

Introduction -- Related work -- Distributed score computation -- Evaluation -- Conclusion

Degree

Ph.D.
