aGrUM 2.3.2
a C++ library for (probabilistic) graphical models
gum::learning::DBRowGeneratorSet Class Reference [final]

The class used to pack sets of generators.

#include <agrum/base/database/DBRowGeneratorSet.h>

Public Member Functions

Constructors / Destructors
 DBRowGeneratorSet ()
 default constructor
 DBRowGeneratorSet (const DBRowGeneratorSet &from)
 copy constructor
 DBRowGeneratorSet (DBRowGeneratorSet &&from)
 move constructor
virtual DBRowGeneratorSet * clone () const
 virtual copy constructor
virtual ~DBRowGeneratorSet ()
 destructor
Operators
DBRowGeneratorSet & operator= (const DBRowGeneratorSet &from)
 copy operator
DBRowGeneratorSet & operator= (DBRowGeneratorSet &&from)
 move operator
DBRowGenerator & operator[] (const std::size_t i)
 returns the ith generator
const DBRowGenerator & operator[] (const std::size_t i) const
 returns the ith generator
Accessors / Modifiers
template<class Generator>
void insertGenerator (const Generator &generator)
 inserts a new generator at the end of the set
template<class Generator>
void insertGenerator (const Generator &generator, const std::size_t i)
 inserts a new generator at the ith position of the set
std::size_t nbGenerators () const noexcept
 returns the number of generators
std::size_t size () const noexcept
 returns the number of generators (alias for nbGenerators)
bool hasRows ()
 returns true if there are still rows that can be output by the set of generators
bool setInputRow (const DBRow< DBTranslatedValue > &input_row)
 sets the input row from which the generators will create new rows
const DBRow< DBTranslatedValue > & generate ()
 generates a new output row from the input row
template<typename GUM_SCALAR>
void setBayesNet (const BayesNet< GUM_SCALAR > &new_bn)
 assigns a new Bayes net to all the generators that depend on a BN
void reset ()
 resets all the generators
void clear ()
 removes all the generators
void setColumnsOfInterest (const std::vector< std::size_t > &cols_of_interest)
 sets the columns of interest: the output DBRow need only contain correct values for these columns
void setColumnsOfInterest (std::vector< std::size_t > &&cols_of_interest)
 sets the columns of interest: the output DBRow need only contain correct values for these columns
const std::vector< std::size_t > & columnsOfInterest () const
 returns the current set of columns of interest

Detailed Description

The class used to pack sets of generators.

When learning Bayesian networks, the records of the training dataset are used to construct contingency tables that are exploited either by statistical conditional independence tests or by scores. To achieve this, the values of the DatabaseTable's records must all be observed, i.e., there should be no missing value. When this is not the case, we need to decide what to do with the records (actually the DBRows) that contain missing values. Should we discard them? Should we use an EM algorithm to substitute them by several fully-observed DBRows weighted by their probability of occurrence? Should we use a K-means algorithm to substitute them by only one DBRow of highest probability of occurrence? DBRowGenerator classes are used to perform these substitutions: from one input DBRow, they can produce zero, one or several output DBRows.

DBRowGenerator instances can be used in sequences, i.e., a first DBRowGenerator can, e.g., apply an EM algorithm to produce many output DBRows, then these DBRows can feed another DBRowGenerator that only keeps those whose weight is higher than a given threshold. The purpose of class DBRowGeneratorSet is to contain this sequence of DBRowGenerator instances. The key idea is that it makes it easier to parse the output DBRows generated by the whole sequence. For instance, if we want to use a sequence of 2 generators that output, respectively, 3 times and 4 times the DBRows they get as input, we could use the following code:

// Assumes that col_types (the types of the database's columns) and
// database (a DatabaseTable) have been defined beforehand.
gum::learning::DBRowGeneratorDuplicate generator3 ( col_types, 3 );
gum::learning::DBRowGeneratorDuplicate generator4 ( col_types, 4 );
for ( const auto& dbrow : database ) {
  generator3.setInputRow ( dbrow );
  while ( generator3.hasRows () ) {
    const auto& output3_dbrow = generator3.generate ();
    // each row output by generator3 becomes the input of generator4
    generator4.setInputRow ( output3_dbrow );
    while ( generator4.hasRows () ) {
      const auto& output4_dbrow = generator4.generate ();
      // do something with output4_dbrow
    }
  }
}

For each input DBRow of the DatabaseTable, these while loops output 3 x 4 = 12 identical DBRows. As can be seen, when several DBRowGenerator instances are used in sequence, the nested loops quickly become cumbersome to write. The DBRowGeneratorSet simplifies the code as follows:

// Assumes that col_types and database are defined as in the previous example.
gum::learning::DBRowGeneratorDuplicate generator3 ( col_types, 3 );
gum::learning::DBRowGeneratorDuplicate generator4 ( col_types, 4 );
gum::learning::DBRowGeneratorSet genset;
genset.insertGenerator ( generator3 );
genset.insertGenerator ( generator4 );
for ( const auto& dbrow : database ) {
  genset.setInputRow ( dbrow );
  while ( genset.hasRows () ) {
    const auto& output_dbrow = genset.generate ();
    // do something with output_dbrow
  }
}

As can be seen, whatever the number of DBRowGenerator instances packed into the DBRowGeneratorSet, only one while loop is needed to parse all the generated output DBRow instances.
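To make the mechanism behind this single loop more concrete, here is a minimal, self-contained sketch of a generator set that chains generators depth-first, backtracking to the previous generator whenever one is exhausted. It is not aGrUM's implementation: Row, ToyGenerator, ToyDuplicate and ToyGeneratorSet are hypothetical simplified stand-ins for DBRow<DBTranslatedValue>, DBRowGenerator, DBRowGeneratorDuplicate and DBRowGeneratorSet, and the backtracking strategy is only one plausible way of flattening the nested loops.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical simplified row type (aGrUM uses DBRow<DBTranslatedValue>).
using Row = std::vector<int>;

// Hypothetical minimal generator protocol: setInputRow / hasRows / generate.
class ToyGenerator {
 public:
  virtual ~ToyGenerator() = default;
  virtual bool setInputRow(const Row& row) = 0;  // true if >= 1 row can be output
  virtual bool hasRows() const = 0;
  virtual const Row& generate() = 0;             // precondition: hasRows()
};

// Outputs each input row nb_dup times (plays the role of a duplicating generator).
class ToyDuplicate : public ToyGenerator {
 public:
  explicit ToyDuplicate(std::size_t nb_dup) : nb_dup_(nb_dup) {}
  bool setInputRow(const Row& row) override {
    row_ = row;
    remaining_ = nb_dup_;
    return remaining_ > 0;
  }
  bool hasRows() const override { return remaining_ > 0; }
  const Row& generate() override {
    assert(remaining_ > 0);
    --remaining_;
    return row_;
  }

 private:
  std::size_t nb_dup_;
  std::size_t remaining_ = 0;
  Row row_;
};

// Chains generators: every row output by generator k is fed to generator k+1.
class ToyGeneratorSet {
 public:
  void insertGenerator(std::unique_ptr<ToyGenerator> g) { gens_.push_back(std::move(g)); }

  bool setInputRow(const Row& row) {
    if (gens_.empty()) {  // no generator: the input row is output once, unchanged
      output_ = row;
      ready_ = true;
    } else {
      gens_[0]->setInputRow(row);
      ready_ = descend(0);
    }
    return ready_;
  }

  bool hasRows() const { return ready_; }

  const Row& generate() {
    assert(ready_);
    if (gens_.empty()) {
      ready_ = false;
    } else {
      output_ = gens_.back()->generate();  // copy: descend() may overwrite it
      ready_ = descend(gens_.size() - 1);
    }
    return output_;
  }

 private:
  // Moves down the chain while generators have rows and backs up when one is
  // exhausted. Returns true iff the last generator has a row ready.
  bool descend(std::size_t i) {
    while (true) {
      if (gens_[i]->hasRows()) {
        if (i + 1 == gens_.size()) return true;
        gens_[i + 1]->setInputRow(gens_[i]->generate());
        ++i;
      } else {
        if (i == 0) return false;  // every branch of the chain is exhausted
        --i;
      }
    }
  }

  std::vector<std::unique_ptr<ToyGenerator>> gens_;
  Row output_;
  bool ready_ = false;
};
```

With toy generators duplicating 3 and 4 times, each input row yields 12 output rows through a single while loop, exactly as with the nested loops. In this sketch, a generator that produces no row for its input (e.g., a duplicate count of 0) makes setInputRow return false.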

Definition at line 129 of file DBRowGeneratorSet.h.

Constructor & Destructor Documentation

◆ DBRowGeneratorSet() [1/3]

gum::learning::DBRowGeneratorSet::DBRowGeneratorSet ( )

default constructor

Referenced by DBRowGeneratorSet(), DBRowGeneratorSet(), clone(), operator=(), and operator=().

◆ DBRowGeneratorSet() [2/3]

gum::learning::DBRowGeneratorSet::DBRowGeneratorSet ( const DBRowGeneratorSet & from)

copy constructor

References DBRowGeneratorSet().

◆ DBRowGeneratorSet() [3/3]

gum::learning::DBRowGeneratorSet::DBRowGeneratorSet ( DBRowGeneratorSet && from)

move constructor

References DBRowGeneratorSet().

◆ ~DBRowGeneratorSet()

virtual gum::learning::DBRowGeneratorSet::~DBRowGeneratorSet ( )

destructor

Member Function Documentation

◆ clear()

void gum::learning::DBRowGeneratorSet::clear ( )

removes all the generators

◆ clone()

virtual DBRowGeneratorSet * gum::learning::DBRowGeneratorSet::clone ( ) const

virtual copy constructor

References DBRowGeneratorSet().

◆ columnsOfInterest()

const std::vector< std::size_t > & gum::learning::DBRowGeneratorSet::columnsOfInterest ( ) const

returns the current set of columns of interest

◆ generate()

const DBRow< DBTranslatedValue > & gum::learning::DBRowGeneratorSet::generate ( )

generates a new output row from the input row

◆ hasRows()

bool gum::learning::DBRowGeneratorSet::hasRows ( )

returns true if there are still rows that can be output by the set of generators

◆ insertGenerator() [1/2]

template<class Generator>
void gum::learning::DBRowGeneratorSet::insertGenerator ( const Generator & generator)

inserts a new generator at the end of the set

Exceptions
OperationNotAllowed is raised if the generator set has already started generating output rows and is currently in a state where the generation is not completed yet (i.e., we still need to call the generate() method to complete it).

◆ insertGenerator() [2/2]

template<class Generator>
void gum::learning::DBRowGeneratorSet::insertGenerator ( const Generator & generator,
const std::size_t i )

inserts a new generator at the ith position of the set

Exceptions
OperationNotAllowed is raised if the generator set has already started generating output rows and is currently in a state where the generation is not completed yet (i.e., we still need to call the generate() method to complete it).
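The state check behind this exception can be sketched as follows. The sketch is hypothetical: strings stand in for generator objects, OperationNotAllowed is a local stand-in for aGrUM's exception class, "generation not completed" is modeled as a count of pending output rows, and clamping an out-of-range position to an append is an assumption, since the behavior for i greater than the number of generators is not documented here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for aGrUM's OperationNotAllowed (illustration only).
struct OperationNotAllowed : std::logic_error {
  using std::logic_error::logic_error;
};

// Minimal sketch of the guard: insertions are refused while output rows
// derived from the current input row have not all been generated.
class GuardedSet {
 public:
  // Inserts at the end of the set.
  void insertGenerator(const std::string& gen) { insertGenerator(gen, gens_.size()); }

  // Inserts at the ith position of the set.
  void insertGenerator(const std::string& gen, std::size_t i) {
    if (pending_ > 0)
      throw OperationNotAllowed("generation not completed: call generate() until hasRows() is false");
    if (i > gens_.size()) i = gens_.size();  // assumption: out-of-range means append
    gens_.insert(gens_.begin() + static_cast<std::ptrdiff_t>(i), gen);
  }

  // Simulates an input row from which nb_rows output rows will be generated.
  void setInputRow(std::size_t nb_rows) { pending_ = nb_rows; }
  bool hasRows() const { return pending_ > 0; }
  void generate() {
    assert(pending_ > 0);
    --pending_;
  }
  const std::vector<std::string>& generators() const { return gens_; }

 private:
  std::vector<std::string> gens_;
  std::size_t pending_ = 0;
};
```

Once all pending rows have been consumed, the set is back in a state where insertions are accepted again.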

◆ nbGenerators()

std::size_t gum::learning::DBRowGeneratorSet::nbGenerators ( ) const noexcept

returns the number of generators

◆ operator=() [1/2]

DBRowGeneratorSet & gum::learning::DBRowGeneratorSet::operator= ( const DBRowGeneratorSet & from)

copy operator

References DBRowGeneratorSet().

◆ operator=() [2/2]

DBRowGeneratorSet & gum::learning::DBRowGeneratorSet::operator= ( DBRowGeneratorSet && from)

move operator

References DBRowGeneratorSet().

◆ operator[]() [1/2]

DBRowGenerator & gum::learning::DBRowGeneratorSet::operator[] ( const std::size_t i)

returns the ith generator

Warning
this operator assumes that there are at least i+1 generators, so it does not check that the ith generator actually exists. If unsure, first check that i < nbGenerators().

◆ operator[]() [2/2]

const DBRowGenerator & gum::learning::DBRowGeneratorSet::operator[] ( const std::size_t i) const

returns the ith generator

Warning
this operator assumes that there are at least i+1 generators, so it does not check that the ith generator actually exists. If unsure, first check that i < nbGenerators().

◆ reset()

void gum::learning::DBRowGeneratorSet::reset ( )

resets all the generators

◆ setBayesNet()

template<typename GUM_SCALAR>
void gum::learning::DBRowGeneratorSet::setBayesNet ( const BayesNet< GUM_SCALAR > & new_bn)

assigns a new Bayes net to all the generators that depend on a BN

Typically, generators based on EM or K-means depend on a model to compute their outputs correctly. Method setBayesNet updates their BN model.
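The propagation step can be pictured with the following sketch, which is not aGrUM's code: ToyModel stands in for BayesNet<GUM_SCALAR>, ToyGenerator and ToyModelBasedGenerator are hypothetical, and identifying the model-dependent generators through dynamic_cast is an assumption about one possible design. Generators that do not need a model are simply left untouched.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for gum::BayesNet<GUM_SCALAR>.
struct ToyModel {
  int id = 0;
};

// Base class of all (hypothetical) generators.
struct ToyGenerator {
  virtual ~ToyGenerator() = default;
};

// A generator whose outputs depend on a model (EM- or K-means-like).
struct ToyModelBasedGenerator : ToyGenerator {
  const ToyModel* model = nullptr;
  void setModel(const ToyModel& m) { model = &m; }
};

// Assigns the model to every generator that needs one; the others are skipped.
// dynamic_cast is one possible way to identify them (assumption).
void setModelForAll(std::vector<std::unique_ptr<ToyGenerator>>& gens,
                    const ToyModel& m) {
  for (auto& g : gens) {
    assert(g != nullptr);
    if (auto* mg = dynamic_cast<ToyModelBasedGenerator*>(g.get())) mg->setModel(m);
  }
}
```

Storing a pointer to the model, rather than a copy, mirrors the idea that the generators must see the current model each time it is updated; whether aGrUM copies or references the BN is not stated here.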

◆ setColumnsOfInterest() [1/2]

void gum::learning::DBRowGeneratorSet::setColumnsOfInterest ( const std::vector< std::size_t > & cols_of_interest)

sets the columns of interest: the output DBRow need only contain correct values for these columns

This method is useful, e.g., for EM-like algorithms that need to know which unobserved variables/values need to be filled. In this case, the DBRowGenerator instances contained in the DBRowGeneratorSet still output DBRows with the same columns as the DatabaseTable, but only the columns of these DBRows corresponding to those passed as argument to method setColumnsOfInterest are meaningful. For instance, if a DatabaseTable contains 10 columns and method setColumnsOfInterest() is applied with vector { 0, 3, 4 }, then the DBRowGenerator instances contained in the DBRowGeneratorSet will output DBRows with 10 columns, in which only columns 0, 3 and 4 are guaranteed to have correct values (columns are indexed starting from 0).

Exceptions
OperationNotAllowed is raised if the generator set has already started generating output rows and is currently in a state where the generation is not completed yet (i.e., we still need to call the generate() method to complete it).
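As a rough illustration of what "only the columns of interest are guaranteed to be correct" means, the sketch below completes missing values in the columns of interest only and copies every other column unchanged. It is hypothetical: rows are vectors of ints with kMissing marking a missing value, and imputeColumnsOfInterest uses a fixed table of "most probable" values instead of the model-based inference an EM- or K-means-like generator would actually perform.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical encoding: a row is a vector of ints; kMissing marks a missing value.
constexpr int kMissing = -1;
using Row = std::vector<int>;

// Returns a row with as many columns as the input. Missing values are filled
// (from a hypothetical precomputed most-probable value per column) only in the
// columns of interest; all other columns are copied as-is and may still be
// missing, i.e., their content is not meaningful for the caller.
Row imputeColumnsOfInterest(const Row& input,
                            const std::vector<std::size_t>& cols_of_interest,
                            const std::vector<int>& most_probable) {
  Row output = input;
  for (std::size_t col : cols_of_interest) {
    assert(col < output.size() && col < most_probable.size());
    if (output[col] == kMissing) output[col] = most_probable[col];
  }
  return output;
}
```

With 10 columns and columns of interest { 0, 3, 4 }, the output still has 10 columns, but a missing value in, say, column 2 stays missing because nobody asked for it.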

◆ setColumnsOfInterest() [2/2]

void gum::learning::DBRowGeneratorSet::setColumnsOfInterest ( std::vector< std::size_t > && cols_of_interest)

sets the columns of interest: the output DBRow need only contain correct values for these columns

This method is useful, e.g., for EM-like algorithms that need to know which unobserved variables/values need to be filled. In this case, the DBRowGenerator instances contained in the DBRowGeneratorSet still output DBRows with the same columns as the DatabaseTable, but only the columns of these DBRows corresponding to those passed as argument to method setColumnsOfInterest are meaningful. For instance, if a DatabaseTable contains 10 columns and method setColumnsOfInterest() is applied with vector { 0, 3, 4 }, then the DBRowGenerator instances contained in the DBRowGeneratorSet will output DBRows with 10 columns, in which only columns 0, 3 and 4 are guaranteed to have correct values (columns are indexed starting from 0).

Exceptions
OperationNotAllowed is raised if the generator set has already started generating output rows and is currently in a state where the generation is not completed yet (i.e., we still need to call the generate() method to complete it).

◆ setInputRow()

bool gum::learning::DBRowGeneratorSet::setInputRow ( const DBRow< DBTranslatedValue > & input_row)

sets the input row from which the generators will create new rows

Returns
true if the set of generators is able to generate output rows from the input row passed as argument

◆ size()

std::size_t gum::learning::DBRowGeneratorSet::size ( ) const noexcept

returns the number of generators (alias for nbGenerators)