A Bayesian network is a probabilistic graphical model in which nodes are random variables and the joint probability distribution is defined by the product:
\(P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i | \pi(X_i))\), where \(\pi(X_i)\) is the set of parents of \(X_i\).
The Bayesian network module in aGrUM can help you do the following operations:
- Model Bayesian networks, from graph to local distributions.
- Execute probabilistic inference from a wide range of algorithms.
- Load and save Bayesian networks in different file formats.
The Bayesian networks module lists all classes for using Bayesian networks with aGrUM.
We will use the classic Asia network to illustrate how the gum::BayesNet and gum::BayesNetFactory classes work.
Creating Bayesian networks
The following code illustrates how to create the Asia network using the gum::BayesNet class. To create an instance of a Bayesian network, you simply need to call the gum::BayesNet class constructor.
Use the gum::BayesNet::add( const gum::DiscreteVariable& ) method to add variables to the Bayesian network. The following variables are available in aGrUM:
auto var = gum::LabelizedVariable( "template", "A variable of the Asia Bayesian network", 0 );
var.addLabel( "True" );
var.addLabel( "False" );
var.setName( "Visit to Asia" );
auto visitToAsia = bn.add( var );
var.setName( "Smoker" );
auto smoker = bn.add( var );
var.setName( "Has Tuberculosis" );
auto hasTuberculosis = bn.add( var );
var.setName( "Has Lung Cancer" );
auto hasLungCancer = bn.add( var );
var.setName( "Has Bronchitis" );
auto hasBronchitis = bn.add( var );
var.setName( "Tuberculosis or Cancer" );
auto tubOrCancer = bn.add( var );
var.setName( "XRay Result" );
auto xray = bn.add( var );
var.setName( "Dyspnea" );
auto dyspnea = bn.add( var );
Use the gum::BayesNet::addArc( gum::NodeId, gum::NodeId ) method to add arcs between nodes in the Bayesian network.
bn.addArc( visitToAsia, hasTuberculosis );
bn.addArc( hasTuberculosis, tubOrCancer );
bn.addArc( smoker, hasLungCancer );
bn.addArc( smoker, hasBronchitis );
bn.addArc( hasLungCancer, tubOrCancer );
bn.addArc( tubOrCancer, xray );
bn.addArc( tubOrCancer, dyspnea );
bn.addArc( hasBronchitis, dyspnea );
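With the arcs above, the factorization given in the introduction specializes to the Asia network as follows. The single-letter symbols are abbreviations introduced here only for readability: \(A\) = Visit to Asia, \(S\) = Smoker, \(T\) = Has Tuberculosis, \(L\) = Has Lung Cancer, \(B\) = Has Bronchitis, \(E\) = Tuberculosis or Cancer, \(X\) = XRay Result, \(D\) = Dyspnea.

```latex
\begin{aligned}
P(A, S, T, L, B, E, X, D) ={}& P(A)\, P(S)\, P(T \mid A)\, P(L \mid S)\, P(B \mid S) \\
 & \times P(E \mid T, L)\, P(X \mid E)\, P(D \mid E, B)
\end{aligned}
```

Each factor corresponds to one conditional probability table of the network: root nodes contribute an unconditional factor, and every other node is conditioned on exactly the nodes that have an arc pointing to it.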
You can also use the gum::BayesNet::idFromName( const std::string& ) method to retrieve a variable's id from its name. Many methods have a version taking names instead of node ids. The next 3 lines are therefore equivalent:
bn.addArc( visitToAsia, hasTuberculosis );
bn.addArc( bn.idFromName("Visit to Asia"), bn.idFromName("Has Tuberculosis")); // retrieve the NodeId from the name
bn.addArc( "Visit to Asia","Has Tuberculosis");
Finally, use the gum::BayesNet::cpt( gum::NodeId ) method to access a variable's conditional probability table. See How to use the MultiDim hierarchy to learn how to fill a gum::Tensor. Here we use the gum::Tensor::fillWith( const std::vector<GUM_SCALAR>& ) method.
bn.cpt( visitToAsia ).fillWith( { 0.1f, 0.9f } );
bn.cpt( "Visit to Asia" ).fillWith( { 0.1f, 0.9f } );
bn.cpt( smoker ).fillWith( { 0.7f, 0.3f } );
bn.cpt( hasTuberculosis ).fillWith( {
// True | False  == Has Tuberculosis
0.05f, 0.95f,  // Visit to Asia == True
0.01f, 0.99f   // Visit to Asia == False
} );
bn.cpt( hasLungCancer ).fillWith( {
0.10f, 0.90f,
0.01f, 0.99f
} );
bn.cpt( tubOrCancer ).fillWith( {
1.00f, 0.00f,
1.00f, 0.00f,
1.00f, 0.00f,
0.00f, 1.00f
} );
bn.cpt( xray ).fillWith( {
0.98f, 0.02f,
0.05f, 0.95f
} );
bn.cpt( dyspnea ).fillWith( {
0.90f, 0.10f,
0.70f, 0.30f,
0.80f, 0.20f,
0.10f, 0.90f
} );
Filling conditional probability tables can be hard, and commenting each line with the parent values it corresponds to helps with large tables. It is important to remember that the std::vector is used to fill a multi-dimensional table where each line should sum to 1, i.e. each line stores \(P(X_i | \pi(X_i))\) for one configuration of the parents.
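The rule that each line sums to 1 can be checked before a vector is handed to a table. The following is a minimal sketch in plain C++ that does not use aGrUM; the function name rowsSumToOne and its tolerance parameter are hypothetical and chosen for this illustration only. It assumes that a line is a block of consecutive values whose length is the domain size of the variable.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of aGrUM): returns true when every
// consecutive block of `domainSize` values in a flat CPT sums to 1 within
// `tolerance`. Each block is one line of the table, i.e. one distribution
// P(X | parent configuration).
bool rowsSumToOne(const std::vector<float>& flatCpt,
                  std::size_t domainSize,
                  float tolerance = 1e-5f) {
  assert(domainSize > 0);
  // A table that cannot be cut into whole lines is rejected outright.
  if (flatCpt.empty() || flatCpt.size() % domainSize != 0) return false;
  for (std::size_t start = 0; start < flatCpt.size(); start += domainSize) {
    float sum = 0.0f;
    for (std::size_t k = 0; k < domainSize; ++k) sum += flatCpt[start + k];
    if (std::fabs(sum - 1.0f) > tolerance) return false;
  }
  return true;
}
```

For instance, the eight values used for dyspnea pass this check with a domain size of 2, whereas a vector such as { 0.05f, 0.01f, 0.95f, 0.99f } fails it because its first line sums to 0.06.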
Probabilistic Inference
All inference algorithms implement the gum::BayesNetInference class. The main methods for inference are:
ie.makeInference();
for (const auto& idn : bn.nodes()) {
const auto name = bn.variable(idn).name();
std::cout << name << " : " << ie.posterior(name) << std::endl;
}
ie.addEvidence("B", "middle");
ie.makeInference();
for (const auto& idn : bn.nodes()) {
const auto name = bn.variable(idn).name();
std::cout << name << " : " << ie.posterior(name) << std::endl;
}
auto updated_marginal = ie.posterior("A");
std::cout << updated_marginal << std::endl;
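To make concrete what a posterior after evidence represents, here is a minimal sketch in plain C++ for a single parent and child pair. It is not aGrUM code and says nothing about how aGrUM's inference engines work internally; the function name posteriorOfParent is hypothetical. It applies Bayes' rule: the posterior of the parent is proportional to its prior multiplied by the likelihood of the observed child label.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of aGrUM).
// prior[a]      = P(A = a)
// likelihood[a] = P(observed child label | A = a)
// Returns P(A = a | observed child label) for every a.
std::vector<double> posteriorOfParent(const std::vector<double>& prior,
                                      const std::vector<double>& likelihood) {
  assert(prior.size() == likelihood.size());
  std::vector<double> posterior(prior.size());
  double normalizer = 0.0;
  for (std::size_t a = 0; a < prior.size(); ++a) {
    posterior[a] = prior[a] * likelihood[a];  // unnormalized posterior
    normalizer += posterior[a];
  }
  assert(normalizer > 0.0);  // the evidence must have non-zero probability
  for (double& p : posterior) p /= normalizer;
  return posterior;
}
```

Assuming a prior of { 0.1, 0.9 } for "Visit to Asia" and a probability of tuberculosis of 0.05 after a visit and 0.01 otherwise, observing tuberculosis gives posteriorOfParent( { 0.1, 0.9 }, { 0.05, 0.01 } ), whose first entry is 0.005 / 0.014, about 0.357.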
More advanced methods can be used for special use cases:
Inference Algorithms
Here is a list of exact inference algorithms:
And this is the list of approximate inference algorithms:
Finally, a list of utility algorithms used by some inference algorithms:
Using the gum::BayesNetFactory class
The gum::BayesNetFactory class is useful when writing serializers and deserializers for the gum::BayesNet class. You can also use it to create a gum::BayesNet directly in C++; you may however find that using the gum::BayesNet class directly is simpler.
Instantiating the factory
The gum::BayesNetFactory expects a pointer toward a gum::BayesNet. The factory will not release this pointer, so you should be careful to release it yourself.
Most methods follow a start / end pattern. Until the end method is called, there is no guarantee that the element has been added, even partially, to the gum::BayesNet.
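The start / end pattern can be pictured with the following minimal sketch in plain C++. The class TinyVariableBuilder and all of its methods are hypothetical and are not part of aGrUM; the sketch only illustrates the general idea that a declaration is buffered between the start and end calls and committed by the end call.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical builder, NOT part of aGrUM: it only illustrates the
// start / end pattern, where nothing is committed before the end call.
class TinyVariableBuilder {
 public:
  void startVariable() {
    if (declaring_) throw std::logic_error("previous declaration not ended");
    declaring_ = true;
    pendingName_.clear();
  }

  void variableName(const std::string& name) {
    if (!declaring_) throw std::logic_error("no declaration started");
    pendingName_ = name;  // buffered only, not yet visible in committed_
  }

  // Commits the pending variable and returns its index.
  std::size_t endVariable() {
    if (!declaring_ || pendingName_.empty())
      throw std::logic_error("incomplete declaration");
    declaring_ = false;
    committed_.push_back(pendingName_);
    return committed_.size() - 1;
  }

  std::size_t size() const { return committed_.size(); }

 private:
  bool declaring_ = false;
  std::string pendingName_;
  std::vector<std::string> committed_;
};
```

In this sketch, size() still returns 0 after startVariable() and variableName( "Smoker" ), and only becomes 1 once endVariable() has been called.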
Adding nodes
To add a node, you must use the gum::BayesNetFactory::startVariableDeclaration() and gum::BayesNetFactory::endVariableDeclaration() methods. You must provide several pieces of information to correctly add a node to the gum::BayesNet, otherwise a gum::OperationNotAllowed will be raised.
When declaring a variable you must:
- Have finished any previous declaration using the respective end method.
- Give it a name using gum::BayesNetFactory::variableName(std::string).
- Add at least two modalities using gum::BayesNetFactory::addModality(std::string).
Here is a list of legal method calls while declaring a variable:
Here is a code sample where we declare the variables of the Asia Network example, starting with "Visit To Asia":
factory.startVariableDeclaration();
factory.variableName( "Visit To Asia" );
factory.variableDescription(
"True if patient visited Asia in the past months" );
factory.addModality( "True" );
factory.addModality( "False" );
factory.endVariableDeclaration();
factory.startVariableDeclaration();
factory.variableName( "Smoker" );
factory.addModality( "True" );
factory.addModality( "False" );
factory.endVariableDeclaration();
factory.startVariableDeclaration();
factory.variableName( "Has Tuberculosis" );
factory.addModality( "True" );
factory.addModality( "False" );
factory.endVariableDeclaration();
factory.startVariableDeclaration();
factory.variableName( "Has Lung Cancer" );
factory.addModality( "True" );
factory.addModality( "False" );
factory.endVariableDeclaration();
factory.startVariableDeclaration();
factory.variableName( "Tuberculosis or Cancer" );
factory.addModality( "True" );
factory.addModality( "False" );
factory.endVariableDeclaration();
factory.startVariableDeclaration();
factory.variableName( "Has Bronchitis" );
factory.addModality( "True" );
factory.addModality( "False" );
factory.endVariableDeclaration();
factory.startVariableDeclaration();
factory.variableName( "XRay Result" );
factory.addModality( "True" );
factory.addModality( "False" );
factory.endVariableDeclaration();
factory.startVariableDeclaration();
factory.variableName( "Dyspnea" );
factory.addModality( "True" );
factory.addModality( "False" );
factory.endVariableDeclaration();
The gum::BayesNetFactory::endVariableDeclaration() method returns the variable's gum::NodeId in the gum::BayesNet.
Adding arcs
To add an arc you must use the gum::BayesNetFactory::startParentsDeclaration( const std::string& ) and gum::BayesNetFactory::endParentsDeclaration() methods.
Here is a list of legal method calls while declaring parents:
Note that you do not have to add all parents in one shot, and that calling the start and end methods without adding any parent will not result in an error.
factory.startParentsDeclaration( "Has Tuberculosis" );
factory.addParent( "Visit To Asia" );
factory.endParentsDeclaration();
factory.startParentsDeclaration( "Has Lung Cancer" );
factory.addParent( "Smoker" );
factory.endParentsDeclaration();
factory.startParentsDeclaration( "Tuberculosis or Cancer" );
factory.addParent( "Has Tuberculosis" );
factory.addParent( "Has Lung Cancer" );
factory.endParentsDeclaration();
factory.startParentsDeclaration( "Has Bronchitis" );
factory.addParent( "Smoker" );
factory.endParentsDeclaration();
factory.startParentsDeclaration( "XRay Result" );
factory.addParent( "Tuberculosis or Cancer" );
factory.endParentsDeclaration();
factory.startParentsDeclaration( "Dyspnea" );
factory.addParent( "Tuberculosis or Cancer" );
factory.addParent( "Has Bronchitis" );
factory.endParentsDeclaration();
Defining Conditional Probability Tables
The gum::BayesNetFactory class offers three ways to define conditional probability tables (CPT): raw, factorized and delegated.
Raw CPT definition
From a user perspective, raw definitions are useful to define small CPTs, such as those of root nodes. However, they do not scale well when the CPT has many dimensions, and you should prefer the Factorized CPT definition if you need to define large CPTs. On the other hand, raw definitions are very useful when automatically filling CPTs from some source (file, database, another CPT, ...).
Two methods can be used to define raw CPT:
Defining the conditional probability table for the root node "Visit To Asia" in the Asia Network example can be achieved as follows:
factory.startRawProbabilityDeclaration("Visit To Asia");
auto variables = std::vector<std::string>{ "Visit To Asia" };
auto values = std::vector<float>{ 0.01f, 0.99f };
factory.rawConditionalTable(variables, values);
factory.endRawProbabilityDeclaration();
Defining the conditional probability table for a node with parents:
factory.startRawProbabilityDeclaration("Tuberculosis or Cancer");
variables = std::vector<std::string>{
"Tuberculosis or Cancer",
"Has Tuberculosis",
"Has Lung Cancer"
};
values = std::vector<float>
{ 0.00f, 0.00f, 0.00f, 1.00f,
1.00f, 1.00f, 1.00f, 0.00f };
factory.rawConditionalTable(variables, values);
factory.endRawProbabilityDeclaration();
Factorized CPT definition
Factorized definitions are useful when dealing with sparse CPTs. They can also be used when writing the raw CPT is error prone. The gum::BayesNetFactory::startFactorizedProbabilityDeclaration(const std::string&) method is used to start a definition and gum::BayesNetFactory::endFactorizedProbabilityDeclaration() to end it.
A factorized definition is made of consecutive factorized entries. Each entry sets the parents' modalities and defines a distribution given those modalities. If some parents are left undefined, then the distribution will be assigned to each possible outcome of those parents.
To start declaring a factorized entry call the gum::BayesNetFactory::startFactorizedEntry() and to end it call gum::BayesNetFactory::endFactorizedEntry().
In the following example, we define the CPT for the "Tuberculosis or Cancer" variable in the Asia Network:
factory.startFactorizedProbabilityDeclaration("Tuberculosis or Cancer");
factory.startFactorizedEntry();
values = std::vector<float>{ 1.00f, 0.00f };
factory.setVariableValues( values );
factory.endFactorizedEntry();
factory.startFactorizedEntry();
factory.setParentModality( "Has Lung Cancer", "False" );
factory.setParentModality( "Has Tuberculosis", "False" );
values = std::vector<float>{ 0.00f, 1.00f };
factory.setVariableValues( values );
factory.endFactorizedEntry();
factory.endFactorizedProbabilityDeclaration();
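The way factorized entries combine can be sketched in plain C++ as follows. This is not aGrUM code: the FactorizedEntry struct and the expand function are hypothetical, and both the order in which parent configurations are enumerated and the rule that later entries overwrite earlier ones are assumptions of this sketch. An entry that fixes no parent applies to every parent configuration, and an entry that fixes some parents applies only to the configurations that match.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical type (not part of aGrUM). `fixed` maps a parent index to
// the modality index it is set to; parents absent from `fixed` are left
// undefined and match every modality.
struct FactorizedEntry {
  std::map<std::size_t, std::size_t> fixed;
  std::vector<float> distribution;
};

// Expands entries, applied in order, into one distribution per parent
// configuration. Configurations are enumerated with the last parent
// varying fastest (an assumption of this sketch).
std::vector<std::vector<float>> expand(
    const std::vector<std::size_t>& parentSizes,
    const std::vector<FactorizedEntry>& entries) {
  std::size_t nbConfigs = 1;
  for (std::size_t s : parentSizes) {
    assert(s > 0);
    nbConfigs *= s;
  }
  std::vector<std::vector<float>> table(nbConfigs);
  for (const auto& entry : entries) {
    for (std::size_t c = 0; c < nbConfigs; ++c) {
      // Decode configuration c into one modality index per parent.
      std::size_t rest = c;
      bool matches = true;
      for (std::size_t p = parentSizes.size(); p-- > 0;) {
        const std::size_t modality = rest % parentSizes[p];
        rest /= parentSizes[p];
        const auto it = entry.fixed.find(p);
        if (it != entry.fixed.end() && it->second != modality) matches = false;
      }
      if (matches) table[c] = entry.distribution;  // later entries overwrite
    }
  }
  return table;
}
```

With two binary parents, a first entry that fixes nothing and carries { 1.00f, 0.00f }, and a second entry that fixes both parents to modality index 1 and carries { 0.00f, 1.00f }, the result holds { 1.00f, 0.00f } for the first three configurations and { 0.00f, 1.00f } for the last one.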
While adding values in a factorized definition, two methods are available:
The unchecked version will not check whether the vector matches the variable's domain size. The checked version will raise a gum::OperationNotAllowed if it does not.
Delegated CPT definition
Delegated definitions let users define the gum::DiscreteVariable and gum::MultiDimAdressable added to the gum::BayesNet themselves. You should only use this method if you are familiar with the multidim hierarchy and require specific multidimensional arrays, such as gum::MultiDimNoisyORCompound, gum::aggregator::Count, etc.
Serialization
Several file formats are currently supported for gum::BayesNet serialization and deserialization. They all implement either gum::BNReader for deserialization or gum::BNWriter for serialization.
The gum::BNReader class
The main methods for deserializing an instance of gum::BayesNet are:
try {
reader.proceed();
} catch ( gum::IOError& e ) {
}
The gum::BNWriter class
The main methods for serializing an instance of gum::BayesNet are:
- gum::BNWriter::write( std::ostream&, const IBayesNet<GUM_SCALAR>& ).
- gum::BNWriter::write( std::string, const IBayesNet<GUM_SCALAR>& ).
Both methods will raise a gum::IOError if a serialization error occurs.
try {
writer.write( std::cout, asia );
writer.write( "/tmp/asiaNetwork.bif", asia );
} catch ( gum::IOError& e ) {
}
Be aware that the file will be created if it does not exist. If it does exist, its content will be erased.
List of supported formats
The BIF format:
The BIF XML format:
The DSL format:
The CNF format (no reader in this format):
The NET format: