aGrUM 2.3.2
a C++ library for (probabilistic) graphical models
IBNLearner_tpl.h
/****************************************************************************
 * This file is part of the aGrUM/pyAgrum library. *
 * *
 * Copyright (c) 2005-2025 by *
 * - Pierre-Henri WUILLEMIN(_at_LIP6) *
 * - Christophe GONZALES(_at_AMU) *
 * *
 * The aGrUM/pyAgrum library is free software; you can redistribute it *
 * and/or modify it under the terms of either : *
 * *
 * - the GNU Lesser General Public License as published by *
 * the Free Software Foundation, either version 3 of the License, *
 * or (at your option) any later version, *
 * - the MIT license (MIT), *
 * - or both in dual license, as here. *
 * *
 * (see https://agrum.gitlab.io/articles/dual-licenses-lgplv3mit.html) *
 * *
 * This aGrUM/pyAgrum library is distributed in the hope that it will be *
 * useful, but WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, *
 * INCLUDING BUT NOT LIMITED TO THE WARRANTIES MERCHANTABILITY or FITNESS *
 * FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE *
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER *
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, *
 * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR *
 * OTHER DEALINGS IN THE SOFTWARE. *
 * *
 * See LICENCES for more details. *
 * *
 * SPDX-FileCopyrightText: Copyright 2005-2025 *
 * - Pierre-Henri WUILLEMIN(_at_LIP6) *
 * - Christophe GONZALES(_at_AMU) *
 * SPDX-License-Identifier: LGPL-3.0-or-later OR MIT *
 * *
 * Contact : info_at_agrum_dot_org *
 * homepage : http://agrum.gitlab.io *
 * gitlab : https://gitlab.com/agrumery/agrum *
 * *
 ****************************************************************************/
#pragma once


#include <algorithm>


namespace gum::learning {

  template < typename GUM_SCALAR >
  IBNLearner::Database::Database(const std::string&                filename,
                                 const BayesNet< GUM_SCALAR >&     bn,
                                 const std::vector< std::string >& missing_symbols) {
    // check the file extension, then assign to each column name in the
    // database its position
    IBNLearner::isCSVFileName_(filename);
    DBInitializerFromCSV initializer(filename);
    const auto&          xvar_names = initializer.variableNames();
    std::size_t          nb_vars    = xvar_names.size();
    HashTable< std::string, std::size_t > var_names(nb_vars);
    for (std::size_t i = std::size_t(0); i < nb_vars; ++i)
      var_names.insert(xvar_names[i], i);

    // we use the bn to insert the translators into the database table,
    // visiting the nodes in increasing NodeId order
    std::vector< NodeId > nodes;
    nodes.reserve(bn.dag().sizeNodes());
    for (const auto node: bn.dag())
      nodes.push_back(node);
    std::sort(nodes.begin(), nodes.end());
    std::size_t i = std::size_t(0);
    for (auto node: nodes) {
      const Variable& var = bn.variable(node);
      try {
        _database_.insertTranslator(var, var_names[var.name()], missing_symbols);
      } catch (NotFound const&) {
        // the bn contains a variable that has no column in the CSV file
        GUM_ERROR(MissingVariableInDatabase, "Variable '" << var.name() << "' is missing")
      }
      _nodeId2cols_.insert(NodeId(node), i++);
    }

    // fill the database
    initializer.fillDatabase(_database_);

    // get the domain sizes of the variables
    for (auto dom: _database_.domainSizes())
      _domain_sizes_.push_back(dom);

    // create the parser
    _parser_ = new DBRowGeneratorParser(_database_.handler(), DBRowGeneratorSet());
  }

  template < typename GUM_SCALAR >
  BayesNet< GUM_SCALAR > IBNLearner::Database::_BNVars_() const {
    // build a bn without arcs that contains one node per database variable
    BayesNet< GUM_SCALAR > bn;
    const std::size_t      nb_vars = _database_.nbVariables();
    for (std::size_t i = 0; i < nb_vars; ++i) {
      const DiscreteVariable& var = dynamic_cast< const DiscreteVariable& >(_database_.variable(i));
      bn.add(var);
    }
    return bn;
  }

  template < typename GUM_SCALAR >
  IBNLearner::IBNLearner(const std::string&                filename,
                         const BayesNet< GUM_SCALAR >&     bn,
                         const std::vector< std::string >& missing_symbols) :
      scoreDatabase_(filename, bn, missing_symbols) {
    filename_     = filename;
    noPrior_      = new NoPrior(scoreDatabase_.databaseTable());
    inducedTypes_ = false;
    GUM_CONSTRUCTOR(IBNLearner);
  }


}   // namespace gum::learning
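
The column bookkeeping done by the Database constructor above can be sketched without aGrUM. The block below is a minimal standalone illustration, not the library's code: NamedNode, ColumnLayout and buildColumnLayout are hypothetical names, std::unordered_map stands in for HashTable, and std::out_of_range stands in for MissingVariableInDatabase. It assumes, as the listing suggests, that nodes are visited in increasing id order and that each node receives the next free slot in the table.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a network variable: a node id plus a name.
struct NamedNode {
  std::size_t id;
  std::string name;
};

// Result of the bookkeeping (hypothetical type):
//  - node_to_slot[k] = (node id, slot), slots being 0, 1, 2, ... in
//    increasing node id order
//  - slot_to_csv_column[slot] = index of the CSV column feeding that slot
struct ColumnLayout {
  std::vector< std::pair< std::size_t, std::size_t > > node_to_slot;
  std::vector< std::size_t >                           slot_to_csv_column;
};

// Throws std::out_of_range if a node name does not appear in the header.
inline ColumnLayout buildColumnLayout(const std::vector< std::string >& header,
                                      std::vector< NamedNode >          nodes) {
  // step 1: column name -> position in the CSV header
  std::unordered_map< std::string, std::size_t > position;
  for (std::size_t i = 0; i < header.size(); ++i)
    position.emplace(header[i], i);

  // step 2: visit the nodes in increasing id order
  std::sort(nodes.begin(), nodes.end(), [](const NamedNode& a, const NamedNode& b) {
    return a.id < b.id;
  });

  // step 3: look up each node's column and give the node the next slot
  ColumnLayout layout;
  std::size_t  slot = 0;
  for (const auto& node: nodes) {
    const auto it = position.find(node.name);
    if (it == position.end())
      throw std::out_of_range("Variable '" + node.name + "' is missing");
    layout.slot_to_csv_column.push_back(it->second);
    layout.node_to_slot.emplace_back(node.id, slot++);
  }
  return layout;
}
```

With a header of rain, sprinkler, wet and nodes (2, rain), (0, wet), (1, sprinkler), the sketch visits wet, sprinkler, rain in that order, so the slots read CSV columns 2, 1 and 0. The order of the CSV columns therefore does not have to match the order of the node ids; only the names have to match.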