aGrUM 2.3.2
a C++ library for (probabilistic) graphical models
gum::learning::IBNLearner::Database Class Reference

a helper to easily read databases

#include <IBNLearner.h>

Public Member Functions

template<typename GUM_SCALAR>
 Database (const std::string &filename, const BayesNet< GUM_SCALAR > &bn, const std::vector< std::string > &missing_symbols)
Constructors / Destructors
 Database (const std::string &file, const std::vector< std::string > &missing_symbols, const bool induceTypes=false)
 constructor reading the database from a CSV file
 Database (const DatabaseTable &db)
 constructor from an already initialized database table
 Database (const std::string &filename, const Database &score_database, const std::vector< std::string > &missing_symbols)
 constructor for the priors
template<typename GUM_SCALAR>
 Database (const std::string &filename, const gum::BayesNet< GUM_SCALAR > &bn, const std::vector< std::string > &missing_symbols)
 constructor with a BN providing the variables of interest
 Database (const Database &from)
 copy constructor
 Database (Database &&from)
 move constructor
 ~Database ()
 destructor
Operators
Database & operator= (const Database &from)
 copy operator
Database & operator= (Database &&from)
 move operator
Accessors / Modifiers
DBRowGeneratorParser & parser ()
 returns the parser for the database
const std::vector< std::size_t > & domainSizes () const
 returns the domain sizes of the variables
const std::vector< std::string > & names () const
 returns the names of the variables in the database
NodeId idFromName (const std::string &var_name) const
 returns the node id corresponding to a variable name
const std::string & nameFromId (NodeId id) const
 returns the variable name corresponding to a given node id
const DatabaseTable & databaseTable () const
 returns the internal database table
void setDatabaseWeight (const double new_weight)
 assign a weight to all the rows of the database so that the sum of their weights is equal to new_weight
const Bijection< NodeId, std::size_t > & nodeId2Columns () const
 returns the mapping between node ids and their columns in the database
const std::vector< std::string > & missingSymbols () const
 returns the set of missing symbols taken into account
std::size_t nbRows () const
 returns the number of records in the database
std::size_t size () const
 returns the number of records in the database
void setWeight (const std::size_t i, const double weight)
 sets the weight of the ith record
double weight (const std::size_t i) const
 returns the weight of the ith record
double weight () const
 returns the weight of the whole database

Protected Attributes

DatabaseTable _database_
 the database itself
DBRowGeneratorParser * _parser_ {nullptr}
 the parser used for reading the database
std::vector< std::size_t > _domain_sizes_
 the domain sizes of the variables (useful to speed-up computations)
Bijection< NodeId, std::size_t > _nodeId2cols_
 a bijection assigning to each NodeId the index of its column in the database
Size _max_threads_number_ {gum::getNumberOfThreads()}
 the max number of threads authorized
Size _min_nb_rows_per_thread_ {100}
 the minimal number of rows to parse (on average) per thread

Private Member Functions

template<typename GUM_SCALAR>
BayesNet< GUM_SCALAR > _BNVars_ () const

Detailed Description

a helper to easily read databases

Definition at line 123 of file IBNLearner.h.

Constructor & Destructor Documentation

◆ Database() [1/7]

gum::learning::IBNLearner::Database::Database ( const std::string & file,
const std::vector< std::string > & missing_symbols,
const bool induceTypes = false )
explicit

constructor reading the database from a CSV file

Parameters
file: the name of the CSV file containing the data
missing_symbols: the set of symbols in the CSV file that correspond to missing data
induceTypes: by default, all the values in the dataset are interpreted as "labels", i.e., as categorical values. However, if some columns of the dataset contain only numerical values, it is usually better to tag them as integer, range or continuous variables. Setting induceTypes to true makes the BNLearner do precisely that.
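
To make the idea behind induceTypes concrete, here is a minimal standalone sketch of how one might decide that a column contains only integers. It does not reproduce aGrUM's actual inference (which the listing below delegates to betterTranslators()); the helpers isInteger, columnLooksInteger and usageExample are hypothetical names introduced for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true if `value` is an optional '-' followed by digits only.
bool isInteger(const std::string& value) {
  std::size_t i = (!value.empty() && value[0] == '-') ? 1 : 0;
  if (i == value.size()) return false;   // empty string or lone "-"
  for (; i < value.size(); ++i) {
    if (!std::isdigit(static_cast< unsigned char >(value[i]))) return false;
  }
  return true;
}

// Hypothetical check: a column qualifies as "integer" when all of its
// non-missing values are integers and at least one such value exists.
bool columnLooksInteger(const std::vector< std::string >& column,
                        const std::vector< std::string >& missing_symbols) {
  bool seen_value = false;
  for (const auto& value: column) {
    const bool missing
        = std::find(missing_symbols.begin(), missing_symbols.end(), value)
          != missing_symbols.end();
    if (missing) continue;                 // missing data never vetoes a type
    if (!isInteger(value)) return false;   // one label is enough to stay categorical
    seen_value = true;
  }
  return seen_value;
}

// Usage sketch: "?" plays the role of an entry of missing_symbols.
void usageExample() {
  assert(columnLooksInteger({"1", "-2", "?"}, {"?"}));
  assert(!columnLooksInteger({"1", "high"}, {"?"}));
}
```

A column that fails such a check would simply stay categorical, which matches the default behaviour described for induceTypes = false.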

Definition at line 84 of file IBNLearner.cpp.

    : Database(IBNLearner::readFile_(filename, missing_symbols)) {
      // if the user wants the best translators to be inferred, just do it
      if (induceTypes) {
        for (const auto& [first, second]: _database_.betterTranslators()) {
          // change the translator
          _database_.changeTranslator(*second, first);
          // recompute the domain size
          _domain_sizes_[first] = second->domainSize();
        }
      }
    }

References Database(), gum::learning::IBNLearner::IBNLearner(), _database_, _domain_sizes_, and gum::learning::IBNLearner::readFile_().

Referenced by Database(), Database(), Database(), Database(), Database(), operator=(), and operator=().

◆ Database() [2/7]

gum::learning::IBNLearner::Database::Database ( const DatabaseTable & db)
explicit

constructor from an already initialized database table

Parameters
db: an already initialized database table that is used to fill the Database

Definition at line 70 of file IBNLearner.cpp.

    : _database_(db) {
      // get the variables names
      const auto& var_names = _database_.variableNames();
      const std::size_t nb_vars = var_names.size();
      for (auto dom: _database_.domainSizes())
        _domain_sizes_.push_back(dom);
      for (std::size_t i = 0; i < nb_vars; ++i) {
        _nodeId2cols_.insert(NodeId(i), i);
      }

      // create the parser
      _parser_ = new DBRowGeneratorParser(_database_.handler(), DBRowGeneratorSet());
    }

References _database_, _domain_sizes_, _nodeId2cols_, and _parser_.

◆ Database() [3/7]

gum::learning::IBNLearner::Database::Database ( const std::string & filename,
const Database & score_database,
const std::vector< std::string > & missing_symbols )

constructor for the priors

We must ensure that the variables of the Database are identical to those of the score database (otherwise the counts used by the scores might be erroneous). However, the variables may be ordered differently in the two databases: variables with the same name in both databases are assumed to be the same.

Parameters
filename: the name of the CSV file containing the data
score_database: the main database used for the learning
missing_symbols: the set of symbols in the CSV file that correspond to missing data
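
The name-based matching described above can be sketched with standard containers only. This is an illustrative stand-in, not aGrUM code: mapScoreColumnsToPrior is a hypothetical function, std::unordered_map replaces gum::HashTable, and standard exceptions replace InvalidArgument and MissingVariableInDatabase.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// For each variable of the score database (in its own order), return the
// index of the column with the same name in the prior CSV file.
std::vector< std::size_t >
    mapScoreColumnsToPrior(const std::vector< std::string >& score_names,
                           const std::vector< std::string >& prior_names) {
  // the prior database may contain extra variables, but never fewer
  if (prior_names.size() < score_names.size()) {
    throw std::invalid_argument(
        "the prior database has fewer variables than the observed database");
  }

  // name -> column in the prior file (assumption: names are unique;
  // emplace silently keeps the first occurrence otherwise)
  std::unordered_map< std::string, std::size_t > prior_names2col;
  for (std::size_t i = 0; i < prior_names.size(); ++i)
    prior_names2col.emplace(prior_names[i], i);

  std::vector< std::size_t > mapping;
  mapping.reserve(score_names.size());
  for (const auto& name: score_names) {
    const auto it = prior_names2col.find(name);
    if (it == prior_names2col.end()) {
      throw std::out_of_range("Variable " + name
                              + " of the observed database does not belong to "
                                "the prior database");
    }
    mapping.push_back(it->second);
  }
  assert(mapping.size() == score_names.size());
  return mapping;
}
```

With score variables (A, B) and a prior file whose columns are (B, C, A), the sketch returns the columns 2 and 0: the two databases agree on the variables even though their column orders differ.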

Definition at line 99 of file IBNLearner.cpp.

    {
      // assign to each column name in the CSV file its column
      IBNLearner::isCSVFileName_(CSV_filename);
      DBInitializerFromCSV initializer(CSV_filename);
      const auto& prior_names = initializer.variableNames();
      std::size_t prior_nb_vars = prior_names.size();
      HashTable< std::string, std::size_t > prior_names2col(prior_nb_vars);
      for (auto i = std::size_t(0); i < prior_nb_vars; ++i)
        prior_names2col.insert(prior_names[i], i);

      // check that there are at least as many variables in the a priori
      // database as those in the score_database
      if (prior_nb_vars < score_database._database_.nbVariables()) {
        GUM_ERROR(InvalidArgument,
                  "the a prior database has fewer variables "
                  "than the observed database")
      }

      // get the mapping from the columns of score_database to those of
      // the CSV file
      const std::vector< std::string >& score_names = score_database.databaseTable().variableNames();
      const std::size_t score_nb_vars = score_names.size();
      HashTable< std::size_t, std::size_t > mapping(score_nb_vars);
      for (auto i = std::size_t(0); i < score_nb_vars; ++i) {
        try {
          mapping.insert(i, prior_names2col[score_names[i]]);
        } catch (Exception const&) {
          GUM_ERROR(MissingVariableInDatabase,
                    "Variable " << score_names[i]
                                << " of the observed database does not belong to the "
                                << "prior database")
        }
      }

      // create the translators for CSV database
      for (auto i = std::size_t(0); i < score_nb_vars; ++i) {
        const Variable& var = score_database.databaseTable().variable(i);
        _database_.insertTranslator(var, mapping[i], missing_symbols);
      }

      // fill the database
      initializer.fillDatabase(_database_);

      // get the domain sizes of the variables
      for (auto dom: _database_.domainSizes())
        _domain_sizes_.push_back(dom);

      // compute the mapping from node ids to column indices
      _nodeId2cols_ = score_database.nodeId2Columns();

      // create the parser
      _parser_ = new DBRowGeneratorParser(_database_.handler(), DBRowGeneratorSet());
    }

References Database(), _database_, _domain_sizes_, _nodeId2cols_, _parser_, databaseTable(), gum::learning::IDBInitializer::fillDatabase(), GUM_ERROR, gum::HashTable< Key, Val >::insert(), gum::learning::IBNLearner::isCSVFileName_(), gum::learning::IDatabaseTable< T_DATA >::nbVariables(), nodeId2Columns(), gum::learning::DatabaseTable::variable(), gum::learning::IDatabaseTable< T_DATA >::variableNames(), and gum::learning::IDBInitializer::variableNames().

◆ Database() [4/7]

template<typename GUM_SCALAR>
gum::learning::IBNLearner::Database::Database ( const std::string & filename,
const gum::BayesNet< GUM_SCALAR > & bn,
const std::vector< std::string > & missing_symbols )

constructor with a BN providing the variables of interest

Parameters
filename: the name of the CSV file containing the data
bn: a Bayesian network indicating which variables of the CSV file are used for learning
missing_symbols: the set of symbols in the CSV file that correspond to missing data

References Database() and weight().

◆ Database() [5/7]

gum::learning::IBNLearner::Database::Database ( const Database & from)

copy constructor

Definition at line 155 of file IBNLearner.cpp.

    : _database_(from._database_), _domain_sizes_(from._domain_sizes_),
      _nodeId2cols_(from._nodeId2cols_) {
      // create the parser
      _parser_ = new DBRowGeneratorParser(_database_.handler(), DBRowGeneratorSet());
    }

References Database(), _database_, _domain_sizes_, _nodeId2cols_, and _parser_.

◆ Database() [6/7]

gum::learning::IBNLearner::Database::Database ( Database && from)

move constructor

Definition at line 162 of file IBNLearner.cpp.

    : _database_(std::move(from._database_)), _domain_sizes_(std::move(from._domain_sizes_)),
      _nodeId2cols_(std::move(from._nodeId2cols_)) {
      // create the parser
      _parser_ = new DBRowGeneratorParser(_database_.handler(), DBRowGeneratorSet());
    }

References Database(), _database_, _domain_sizes_, _nodeId2cols_, and _parser_.

◆ ~Database()

gum::learning::IBNLearner::Database::~Database ( )

destructor

Definition at line 169 of file IBNLearner.cpp.

    { delete _parser_; }

References _parser_.

◆ Database() [7/7]

template<typename GUM_SCALAR>
gum::learning::IBNLearner::Database::Database ( const std::string & filename,
const BayesNet< GUM_SCALAR > & bn,
const std::vector< std::string > & missing_symbols )

Definition at line 50 of file IBNLearner_tpl.h.

    {
      // assign to each column name in the database its position
      DBInitializerFromCSV initializer(filename);
      const auto& xvar_names = initializer.variableNames();
      std::size_t nb_vars = xvar_names.size();
      HashTable< std::string, std::size_t > var_names(nb_vars);
      for (std::size_t i = std::size_t(0); i < nb_vars; ++i)
        var_names.insert(xvar_names[i], i);

      // we use the bn to insert the translators into the database table
      std::vector< NodeId > nodes;
      nodes.reserve(bn.dag().sizeNodes());
      for (const auto node: bn.dag())
        nodes.push_back(node);
      std::sort(nodes.begin(), nodes.end());
      std::size_t i = std::size_t(0);
      for (auto node: nodes) {
        const Variable& var = bn.variable(node);
        try {
          _database_.insertTranslator(var, var_names[var.name()], missing_symbols);
        } catch (NotFound const&) {
          GUM_ERROR(MissingVariableInDatabase, "Variable '" << var.name() << "' is missing")
        }
        _nodeId2cols_.insert(NodeId(node), i++);
      }

      // fill the database
      initializer.fillDatabase(_database_);

      // get the domain sizes of the variables
      for (auto dom: _database_.domainSizes())
        _domain_sizes_.push_back(dom);

      // create the parser
      _parser_ = new DBRowGeneratorParser(_database_.handler(), DBRowGeneratorSet());
    }

References _database_, _domain_sizes_, _nodeId2cols_, _parser_, gum::learning::IDBInitializer::fillDatabase(), GUM_ERROR, gum::HashTable< Key, Val >::insert(), gum::learning::IBNLearner::isCSVFileName_(), gum::Variable::name(), and gum::learning::IDBInitializer::variableNames().
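
In the listing above, the nodes of the BN are sorted by id and the i-th node in that order is bound to column i of the database table. The following standalone sketch isolates that step; assignColumns is a hypothetical name, std::size_t stands in for NodeId, and std::map stands in for the Bijection.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Bind each node id to a column index: the smallest id gets column 0,
// the next one column 1, and so on.
std::map< std::size_t, std::size_t > assignColumns(std::vector< std::size_t > nodes) {
  std::sort(nodes.begin(), nodes.end());

  std::map< std::size_t, std::size_t > node2col;
  std::size_t col = 0;
  for (const auto node: nodes)
    node2col.emplace(node, col++);

  // assumption of this sketch: node ids are unique, as in a graph
  assert(node2col.size() == nodes.size());
  return node2col;
}
```

For node ids visited in the order 9, 1, 4, the sketch yields 1 -> 0, 4 -> 1 and 9 -> 2, so the column order depends only on the ids and not on the iteration order of the graph.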

Member Function Documentation

◆ _BNVars_()

template<typename GUM_SCALAR>
BayesNet< GUM_SCALAR > gum::learning::IBNLearner::Database::_BNVars_ ( ) const
private

Definition at line 91 of file IBNLearner_tpl.h.

    {
      BayesNet< GUM_SCALAR > bn;
      const std::size_t nb_vars = _database_.nbVariables();
      for (std::size_t i = 0; i < nb_vars; ++i) {
        const DiscreteVariable& var = dynamic_cast< const DiscreteVariable& >(_database_.variable(i));
        bn.add(var);
      }
      return bn;
    }

References _database_.

◆ databaseTable()

INLINE const DatabaseTable & gum::learning::IBNLearner::Database::databaseTable ( ) const

returns the internal database table

Definition at line 101 of file IBNLearner_inl.h.

    { return _database_; }

References _database_.

Referenced by Database().

◆ domainSizes()

INLINE const std::vector< std::size_t > & gum::learning::IBNLearner::Database::domainSizes ( ) const

returns the domain sizes of the variables

Definition at line 63 of file IBNLearner_inl.h.

    {
      return _domain_sizes_;
    }

References _domain_sizes_.

◆ idFromName()

INLINE NodeId gum::learning::IBNLearner::Database::idFromName ( const std::string & var_name) const

returns the node id corresponding to a variable name

Definition at line 80 of file IBNLearner_inl.h.

    {
      try {
        const auto cols = _database_.columnsFromVariableName(var_name);
        return _nodeId2cols_.first(cols[0]);
      } catch (...) {
        GUM_ERROR(MissingVariableInDatabase,
                  "Variable " << var_name << " could not be found in the database")
      }
    }

References _database_, _nodeId2cols_, and GUM_ERROR.

◆ missingSymbols()

INLINE const std::vector< std::string > & gum::learning::IBNLearner::Database::missingSymbols ( ) const

returns the set of missing symbols taken into account

Definition at line 104 of file IBNLearner_inl.h.

    {
      return _database_.missingSymbols();
    }

References _database_.

◆ nameFromId()

INLINE const std::string & gum::learning::IBNLearner::Database::nameFromId ( NodeId id) const

returns the variable name corresponding to a given node id

Definition at line 91 of file IBNLearner_inl.h.

    {
      try {
        return _database_.variableName(_nodeId2cols_.second(id));
      } catch (...) {
        GUM_ERROR(MissingVariableInDatabase,
                  "Variable of Id " << id << " could not be found in the database")
      }
    }

References _database_, _nodeId2cols_, and GUM_ERROR.
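
idFromName() and nameFromId() both go through the two directions of the node id / column bijection: first(col) yields the node id of a column, second(id) yields the column of a node id. The sketch below mimics this two-way lookup with standard containers. IdColumnBijection is a hypothetical class written for illustration; it is not gum::Bijection and only borrows the two accessor names used in the listings.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <unordered_map>

// Hypothetical two-way map between node ids and column indices.
class IdColumnBijection {
  public:
  // Register the pair (id, col); each id and each column may appear only once.
  void insert(std::size_t id, std::size_t col) {
    if (id2col_.count(id) != 0 || col2id_.count(col) != 0)
      throw std::invalid_argument("id or column already mapped");
    id2col_.emplace(id, col);
    col2id_.emplace(col, id);
    assert(id2col_.size() == col2id_.size());
  }

  // column -> node id (throws std::out_of_range if the column is unknown)
  std::size_t first(std::size_t col) const { return col2id_.at(col); }

  // node id -> column (throws std::out_of_range if the id is unknown)
  std::size_t second(std::size_t id) const { return id2col_.at(id); }

  private:
  std::unordered_map< std::size_t, std::size_t > id2col_;
  std::unordered_map< std::size_t, std::size_t > col2id_;
};
```

Keeping both directions in sync at insertion time is what makes each lookup a single hash access, at the cost of storing every pair twice.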

◆ names()

INLINE const std::vector< std::string > & gum::learning::IBNLearner::Database::names ( ) const

returns the names of the variables in the database

Definition at line 68 of file IBNLearner_inl.h.

    {
      return _database_.variableNames();
    }

References _database_.

◆ nbRows()

INLINE std::size_t gum::learning::IBNLearner::Database::nbRows ( ) const

returns the number of records in the database

Definition at line 114 of file IBNLearner_inl.h.

    { return _database_.nbRows(); }

References _database_.

◆ nodeId2Columns()

INLINE const Bijection< NodeId, std::size_t > & gum::learning::IBNLearner::Database::nodeId2Columns ( ) const

returns the mapping between node ids and their columns in the database

Definition at line 109 of file IBNLearner_inl.h.

    {
      return _nodeId2cols_;
    }

References _nodeId2cols_.

Referenced by Database().

◆ operator=() [1/2]

IBNLearner::Database & gum::learning::IBNLearner::Database::operator= ( const Database & from)

copy operator

Definition at line 171 of file IBNLearner.cpp.

    {
      if (this != &from) {
        delete _parser_;
        _database_ = from._database_;
        _domain_sizes_ = from._domain_sizes_;
        _nodeId2cols_ = from._nodeId2cols_;

        // create the parser
        _parser_ = new DBRowGeneratorParser(_database_.handler(), DBRowGeneratorSet());
      }

      return *this;
    }

References Database(), _database_, _domain_sizes_, _nodeId2cols_, and _parser_.

◆ operator=() [2/2]

IBNLearner::Database & gum::learning::IBNLearner::Database::operator= ( Database && from)

move operator

Definition at line 185 of file IBNLearner.cpp.

    {
      if (this != &from) {
        delete _parser_;
        _database_ = std::move(from._database_);
        _domain_sizes_ = std::move(from._domain_sizes_);
        _nodeId2cols_ = std::move(from._nodeId2cols_);

        // create the parser
        _parser_ = new DBRowGeneratorParser(_database_.handler(), DBRowGeneratorSet());
      }

      return *this;
    }

References Database(), _database_, _domain_sizes_, _nodeId2cols_, and _parser_.
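
The copy and move members listed on this page never copy the parser: they copy or move the data, then create a new parser from the handler of their own _database_. A plausible reading is that a parser is bound to the table it reads, so it must be rebuilt for each object. The sketch below reproduces this ownership pattern with hypothetical stand-ins (Cursor for the parser, Table for Database). Resetting the pointer to nullptr before reassigning the data is a defensive addition of this sketch and does not appear in the listings.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the parser: it only remembers which rows it reads.
struct Cursor {
  explicit Cursor(const std::vector< int >& rows) : table(&rows) {}
  const std::vector< int >* table;
};

// Hypothetical stand-in for Database: owns its rows and a cursor onto them.
class Table {
  public:
  explicit Table(std::vector< int > rows) :
      rows_(std::move(rows)), cursor_(new Cursor(rows_)) {}

  // copy / move the data, then build a fresh cursor bound to *our* rows
  Table(const Table& from) : rows_(from.rows_), cursor_(new Cursor(rows_)) {}
  Table(Table&& from) : rows_(std::move(from.rows_)), cursor_(new Cursor(rows_)) {}

  ~Table() { delete cursor_; }

  Table& operator=(const Table& from) {
    if (this != &from) {
      delete cursor_;
      cursor_ = nullptr;   // keeps the destructor safe if the copy below throws
      rows_   = from.rows_;
      cursor_ = new Cursor(rows_);
    }
    return *this;
  }

  Table& operator=(Table&& from) {
    if (this != &from) {
      delete cursor_;
      cursor_ = nullptr;
      rows_   = std::move(from.rows_);
      cursor_ = new Cursor(rows_);
    }
    return *this;
  }

  std::size_t size() const { return rows_.size(); }

  // true when the cursor reads this object's rows, not another object's
  bool cursorIsBoundToOwnRows() const {
    assert(cursor_ != nullptr);
    return cursor_->table == &rows_;
  }

  private:
  std::vector< int > rows_;     // declared first, hence initialized before cursor_
  Cursor*            cursor_;
};
```

Because each object rebuilds its own cursor, a moved-from Table keeps a valid cursor onto its (now emptied) rows, and its destructor can delete it as usual.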

◆ parser()

INLINE DBRowGeneratorParser & gum::learning::IBNLearner::Database::parser ( )

returns the parser for the database

Definition at line 60 of file IBNLearner_inl.h.

    { return *_parser_; }

References _parser_.

◆ setDatabaseWeight()

INLINE void gum::learning::IBNLearner::Database::setDatabaseWeight ( const double new_weight)

assign a weight to all the rows of the database so that the sum of their weights is equal to new_weight

assign new weight to the rows of the learning database

Definition at line 73 of file IBNLearner_inl.h.

    {
      if (_database_.nbRows() == std::size_t(0)) return;
      const double weight = new_weight / double(_database_.nbRows());
      _database_.setAllRowsWeight(weight);
    }

References _database_ and weight().
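
As a worked illustration of the listing above, the sketch below applies the same rule to a plain vector of row weights: every row receives new_weight divided by the number of rows, and an empty database is left untouched. The free functions setDatabaseWeight and databaseWeight are hypothetical stand-ins; in particular, treating the weight of the whole database as the sum of the row weights is an assumption of this sketch, as is the non-negativity check.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Give every row the same weight so that the weights sum to new_weight.
void setDatabaseWeight(std::vector< double >& row_weights, double new_weight) {
  assert(new_weight >= 0.0);             // assumption of this sketch
  if (row_weights.empty()) return;       // nothing to weight, avoid dividing by 0
  const double weight = new_weight / double(row_weights.size());
  for (auto& w: row_weights)
    w = weight;
}

// Assumed definition of the weight of the whole database: the sum of the rows.
double databaseWeight(const std::vector< double >& row_weights) {
  return std::accumulate(row_weights.begin(), row_weights.end(), 0.0);
}
```

For instance, with 4 rows and new_weight = 10, each row gets a weight of 2.5 and the weights sum to 10 again.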

◆ setWeight()

INLINE void gum::learning::IBNLearner::Database::setWeight ( const std::size_t i,
const double weight )

sets the weight of the ith record

Exceptions
OutOfBounds: if i is outside the set of indices of the records or if the weight is negative

Definition at line 120 of file IBNLearner_inl.h.

    {
      _database_.setWeight(i, weight);
    }

References _database_ and weight().

◆ size()

INLINE std::size_t gum::learning::IBNLearner::Database::size ( ) const

returns the number of records in the database

Definition at line 117 of file IBNLearner_inl.h.

    { return _database_.size(); }

References _database_.

◆ weight() [1/2]

INLINE double gum::learning::IBNLearner::Database::weight ( ) const

returns the weight of the whole database

Definition at line 130 of file IBNLearner_inl.h.

    { return _database_.weight(); }

References _database_.

◆ weight() [2/2]

INLINE double gum::learning::IBNLearner::Database::weight ( const std::size_t i) const

returns the weight of the ith record

Exceptions
OutOfBounds: if i is outside the set of indices of the records

Definition at line 125 of file IBNLearner_inl.h.

    {
      return _database_.weight(i);
    }

References _database_.

Referenced by Database(), setDatabaseWeight(), and setWeight().

Member Data Documentation

◆ _database_

DatabaseTable gum::learning::IBNLearner::Database::_database_
protected

◆ _domain_sizes_

std::vector< std::size_t > gum::learning::IBNLearner::Database::_domain_sizes_
protected

the domain sizes of the variables (useful to speed-up computations)

Definition at line 265 of file IBNLearner.h.

Referenced by Database(), Database(), Database(), Database(), Database(), Database(), domainSizes(), operator=(), and operator=().

◆ _max_threads_number_

Size gum::learning::IBNLearner::Database::_max_threads_number_ {gum::getNumberOfThreads()}
protected

the max number of threads authorized

Definition at line 271 of file IBNLearner.h.

unsigned int getNumberOfThreads() returns the max number of threads used by default when entering the next parallel region.

◆ _min_nb_rows_per_thread_

Size gum::learning::IBNLearner::Database::_min_nb_rows_per_thread_ {100}
protected

the minimal number of rows to parse (on average) per thread

Definition at line 274 of file IBNLearner.h.

◆ _nodeId2cols_

Bijection< NodeId, std::size_t > gum::learning::IBNLearner::Database::_nodeId2cols_
protected

a bijection assigning to each NodeId the index of its column in the database

Definition at line 268 of file IBNLearner.h.

Referenced by Database(), Database(), Database(), Database(), Database(), idFromName(), nameFromId(), nodeId2Columns(), operator=(), and operator=().

◆ _parser_

DBRowGeneratorParser* gum::learning::IBNLearner::Database::_parser_ {nullptr}
protected

the parser used for reading the database

Definition at line 262 of file IBNLearner.h.

Referenced by Database(), Database(), Database(), Database(), Database(), ~Database(), operator=(), operator=(), and parser().


The documentation for this class was generated from the following files: