aGrUM 2.3.2
a C++ library for (probabilistic) graphical models
gum::learning::DBRowGenerator Class Reference (abstract)

The base class for all DBRow generators.

#include <agrum/base/database/DBRowGenerator.h>

Public Member Functions

Constructors / Destructors
 DBRowGenerator (const std::vector< DBTranslatedValueType > &column_types, const DBRowGeneratorGoal goal)
 default constructor
 DBRowGenerator (const DBRowGenerator &from)
 copy constructor
 DBRowGenerator (DBRowGenerator &&from)
 move constructor
virtual DBRowGenerator * clone () const =0
 virtual copy constructor
virtual ~DBRowGenerator ()
 destructor
Accessors / Modifiers
bool hasRows ()
 returns true if there are still rows that can be output by the DBRowGenerator
bool setInputRow (const DBRow< DBTranslatedValue > &row)
 sets the input row from which the generator will create its output rows
virtual const DBRow< DBTranslatedValue > & generate ()=0
 generate new rows from the input row
void decreaseRemainingRows ()
 decrease the number of remaining output rows
virtual void reset ()
 resets the generator: afterwards there are no more output rows to generate
virtual void setColumnsOfInterest (const std::vector< std::size_t > &cols_of_interest)
 sets the columns of interest: the output DBRow needs only contain correct values for these columns
virtual void setColumnsOfInterest (std::vector< std::size_t > &&cols_of_interest)
 sets the columns of interest: the output DBRow needs only contain correct values for these columns
const std::vector< std::size_t > & columnsOfInterest () const
 returns the current set of columns of interest
DBRowGeneratorGoal goal () const
 returns the goal of the DBRowGenerator

Protected Member Functions

DBRowGenerator & operator= (const DBRowGenerator &)
 copy operator
DBRowGenerator & operator= (DBRowGenerator &&)
 move operator
virtual std::size_t computeRows_ (const DBRow< DBTranslatedValue > &row)=0
 the method that computes the set of DBRow instances to output after method setInputRow has been called

Protected Attributes

std::size_t nb_remaining_output_rows_ {std::size_t(0)}
 the number of output rows still to retrieve through the generate method
std::vector< DBTranslatedValueType > column_types_
 the types of the columns in the DatabaseTable
std::vector< std::size_t > columns_of_interest_
 the set of columns of interest
DBRowGeneratorGoal goal_ {DBRowGeneratorGoal::OTHER_THINGS_THAN_REMOVE_MISSING_VALUES}
 the goal of the DBRowGenerator (just remove missing values or not)

Detailed Description

The base class for all DBRow generators.

A DBRowGenerator instance takes as input a DBRow containing DBTranslatedValue instances, either provided directly by a DatabaseTable or produced by another DBRowGenerator. From it, the generator produces zero, one or several DBRows of DBTranslatedValue. This is mainly useful for dealing with missing values: during learning, when a DBRow contains missing values, what should be done with it? Should it be discarded? Should an EM algorithm produce several DBRows weighted by their probability of occurrence? Should a K-means algorithm produce only the single DBRow of highest probability of occurrence? With the appropriate DBRowGenerator, any of these rules can be applied while the learning algorithm parses the DatabaseTable. You only need to indicate which DBRowGenerator to use; no line of code needs to change in the high-level learning algorithm.

As an example of how a DBRowGenerator works, an "Identity" DBRowGenerator takes a DBRow as input and returns it without any further processing, so it "produces" exactly one output DBRow. An EM DBRowGenerator takes as input a DBRow in which some cells may be missing. In this case, it produces all the possible combinations of values that the missing cells may take and assigns to each combination a weight proportional to its probability of occurrence according to a given model. As such, it most often produces several output DBRows.
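The EM-style expansion described above can be illustrated with a small standalone sketch. It does not use aGrUM: `Cell`, `WeightedRow`, `expandRow` and the `marginals` table are hypothetical names introduced here, and the "given model" is assumed to be a set of independent per-column marginals purely to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not aGrUM types: a cell is either observed or missing.
using Cell = std::optional<std::size_t>;

struct WeightedRow {
  std::vector<std::size_t> values;   // one value per column, nothing missing
  double                   weight;   // probability of this completion
};

// Expands one input row into every completion of its missing cells.
// marginals[c][v] is the ASSUMED probability that column c takes value v;
// columns are treated as independent, which a real model need not do.
std::vector<WeightedRow> expandRow(const std::vector<Cell>&                row,
                                   const std::vector<std::vector<double>>& marginals) {
  assert(row.size() == marginals.size());
  std::vector<WeightedRow> out;
  out.push_back(WeightedRow{{}, 1.0});   // start from one empty partial row
  for (std::size_t c = 0; c < row.size(); ++c) {
    std::vector<WeightedRow> next;
    for (const auto& partial : out) {
      if (row[c].has_value()) {
        // observed cell: copy it, the weight is unchanged
        WeightedRow r = partial;
        r.values.push_back(*row[c]);
        next.push_back(std::move(r));
      } else {
        // missing cell: branch on every possible value of the column
        for (std::size_t v = 0; v < marginals[c].size(); ++v) {
          WeightedRow r = partial;
          r.values.push_back(v);
          r.weight *= marginals[c][v];
          next.push_back(std::move(r));
        }
      }
    }
    out = std::move(next);
  }
  return out;
}
```

With a fully observed row, `expandRow` returns a single row of weight 1, which matches the behavior of the "Identity" case; each missing cell multiplies the number of output rows by the size of that column's domain.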

The standard usage of a DBRowGenerator is the following:

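// create a DatabaseTable and fill it
gum::learning::DBTranslatorSet set;
for ( std::size_t i = 0; i < 10; ++i )
  set.insertTranslator ( gum::learning::DBTranslator4LabelizedVariable (), i );
gum::learning::DatabaseTable database ( set );
// fill the database

// keep in a vector the types of the columns in the database
const std::vector<gum::learning::DBTranslatedValueType>
  column_types ( 10, gum::learning::DBTranslatedValueType::DISCRETE );

// create the generator
gum::learning::DBRowGeneratorIdentity generator ( column_types );

// parse the database and produce output rows
for ( const auto& dbrow : database ) {
  generator.setInputRow ( dbrow );
  while ( generator.hasRows () ) {
    const auto& output_dbrow = generator.generate ();
    // do something with the output dbrow
  }
}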

All DBRowGenerator classes should derive from this class, which takes care of the interaction with the RecordCounter / Score classes. A user who wishes to create a new DBRowGenerator, for instance one that outputs the input row k times, only has to define the following class. Not all the constructors/destructors are required, but they are provided for self-consistency; the important part starts at the "Accessors / Modifiers" section:

class DuplicateGenerator : public DBRowGenerator {
  public:
  // ######################################################################
  // Constructors / Destructors
  // ######################################################################
  // the base class constructor requires both the column types and a goal
  DuplicateGenerator(const std::vector< DBTranslatedValueType >& column_types,
                     const std::size_t nb_duplicates)
      : DBRowGenerator(column_types,
                       DBRowGeneratorGoal::OTHER_THINGS_THAN_REMOVE_MISSING_VALUES)
      , _nb_duplicates_(nb_duplicates) {}

  DuplicateGenerator(const DuplicateGenerator& from)
      : DBRowGenerator(from)
      , _input_row_(from._input_row_)
      , _nb_duplicates_(from._nb_duplicates_) {}

  // only the base part of "from" is moved; the two members below are
  // trivially copyable and remain readable afterwards
  DuplicateGenerator(DuplicateGenerator&& from)
      : DBRowGenerator(std::move(from))
      , _input_row_(from._input_row_)
      , _nb_duplicates_(from._nb_duplicates_) {}

  virtual DuplicateGenerator* clone() const override {
    return new DuplicateGenerator(*this);
  }

  ~DuplicateGenerator() {}

  // ######################################################################
  // Operators
  // ######################################################################
  DuplicateGenerator& operator=(const DuplicateGenerator& from) {
    DBRowGenerator::operator=(from);
    _input_row_     = from._input_row_;
    _nb_duplicates_ = from._nb_duplicates_;
    return *this;
  }

  DuplicateGenerator& operator=(DuplicateGenerator&& from) {
    DBRowGenerator::operator=(std::move(from));
    _input_row_     = from._input_row_;
    _nb_duplicates_ = from._nb_duplicates_;
    return *this;
  }

  // ######################################################################
  // Accessors / Modifiers
  // ######################################################################
  // returns the input row once more and counts it as consumed
  virtual const DBRow< DBTranslatedValue >& generate() override final {
    this->decreaseRemainingRows();
    return *_input_row_;
  }

  protected:
  // stores the input row and tells the base class how many rows will be output
  virtual std::size_t
     computeRows_(const DBRow< DBTranslatedValue >& row) override final {
    _input_row_ = &row;
    return _nb_duplicates_;
  }

  private:
  const DBRow< DBTranslatedValue >* _input_row_{nullptr};
  std::size_t                       _nb_duplicates_{std::size_t(1)};
};

Definition at line 223 of file DBRowGenerator.h.

Constructor & Destructor Documentation

◆ DBRowGenerator() [1/3]

gum::learning::DBRowGenerator::DBRowGenerator ( const std::vector< DBTranslatedValueType > & column_types,
const DBRowGeneratorGoal goal )

default constructor

Parameters
column_types: indicates for each column whether it is continuous or discrete

References goal().

Referenced by DBRowGenerator(), DBRowGenerator(), clone(), operator=(), and operator=().


◆ DBRowGenerator() [2/3]

gum::learning::DBRowGenerator::DBRowGenerator ( const DBRowGenerator & from)

copy constructor

References DBRowGenerator().


◆ DBRowGenerator() [3/3]

gum::learning::DBRowGenerator::DBRowGenerator ( DBRowGenerator && from)

move constructor

References DBRowGenerator().


◆ ~DBRowGenerator()

virtual gum::learning::DBRowGenerator::~DBRowGenerator ( )
virtual

destructor

Member Function Documentation

◆ clone()

virtual DBRowGenerator * gum::learning::DBRowGenerator::clone ( ) const
pure virtual

virtual copy constructor

Implemented in gum::learning::DBRowGenerator4CompleteRows, gum::learning::DBRowGeneratorEM< GUM_SCALAR >, and gum::learning::DBRowGeneratorIdentity.

References DBRowGenerator().


◆ columnsOfInterest()

const std::vector< std::size_t > & gum::learning::DBRowGenerator::columnsOfInterest ( ) const

returns the current set of columns of interest

◆ computeRows_()

virtual std::size_t gum::learning::DBRowGenerator::computeRows_ ( const DBRow< DBTranslatedValue > & row)
protected pure virtual

the method that computes the set of DBRow instances to output after method setInputRow has been called

Implemented in gum::learning::DBRowGenerator4CompleteRows, gum::learning::DBRowGeneratorEM< GUM_SCALAR >, and gum::learning::DBRowGeneratorIdentity.

◆ decreaseRemainingRows()

void gum::learning::DBRowGenerator::decreaseRemainingRows ( )

decrease the number of remaining output rows

When method setInputRow is performed, the DBRowGenerator knows how many output rows it will be able to generate. Each time method decreaseRemainingRows is called, we decrement this number. When the number becomes equal to 0, then there remains no new output row to generate.
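The counting protocol described above can be sketched with a toy class that is independent of aGrUM. `CountingGenerator`, `Row` and `countOutputs` are hypothetical names used only for illustration; the sketch assumes a generator that outputs each input row a fixed number of times.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a database row (not an aGrUM type).
using Row = std::vector<std::size_t>;

class CountingGenerator {
  public:
  explicit CountingGenerator(std::size_t copies) : copies_(copies) {}

  // Records the input row and how many output rows it will yield.
  bool setInputRow(const Row& row) {
    input_     = &row;
    remaining_ = copies_;
    return hasRows();
  }

  // True while at least one output row has not been retrieved yet.
  bool hasRows() const { return remaining_ != 0; }

  // Returns the next output row and counts it as consumed.
  const Row& generate() {
    assert(hasRows());
    decreaseRemainingRows();
    return *input_;
  }

  void decreaseRemainingRows() { --remaining_; }

  // After a reset, no output row remains to be generated.
  void reset() { remaining_ = 0; }

  private:
  const Row*  input_{nullptr};
  std::size_t copies_;
  std::size_t remaining_{0};
};

// Drives the generator over a whole table and counts the output rows.
std::size_t countOutputs(CountingGenerator& gen, const std::vector<Row>& table) {
  std::size_t n = 0;
  for (const auto& row : table) {
    gen.setInputRow(row);
    while (gen.hasRows()) {
      gen.generate();
      ++n;
    }
  }
  return n;
}
```

In this sketch the inner `while` loop terminates because every call to `generate()` decrements the counter set by `setInputRow()`.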

◆ generate()

virtual const DBRow< DBTranslatedValue > & gum::learning::DBRowGenerator::generate ( )
pure virtual

◆ goal()

DBRowGeneratorGoal gum::learning::DBRowGenerator::goal ( ) const

returns the goal of the DBRowGenerator

Referenced by DBRowGenerator(), and gum::learning::DBRowGeneratorWithBN< GUM_SCALAR >::DBRowGeneratorWithBN().


◆ hasRows()

bool gum::learning::DBRowGenerator::hasRows ( )

returns true if there are still rows that can be output by the DBRowGenerator

◆ operator=() [1/2]

DBRowGenerator & gum::learning::DBRowGenerator::operator= ( const DBRowGenerator & )
protected

copy operator

References DBRowGenerator().


◆ operator=() [2/2]

DBRowGenerator & gum::learning::DBRowGenerator::operator= ( DBRowGenerator && )
protected

move operator

References DBRowGenerator().


◆ reset()

virtual void gum::learning::DBRowGenerator::reset ( )
virtual

resets the generator: afterwards there are no more output rows to generate

◆ setColumnsOfInterest() [1/2]

virtual void gum::learning::DBRowGenerator::setColumnsOfInterest ( const std::vector< std::size_t > & cols_of_interest)
virtual

sets the columns of interest: the output DBRow needs only contain correct values for these columns

This method is useful, e.g., for EM-like algorithms that need to know which unobserved variables/values need be filled. In this case, the DBRowGenerator still outputs DBRows with the same columns as the DatabaseTable, but only the columns of these DBRows corresponding to those passed in argument to Method setColumnsOfInterest are meaningful. For instance, if a DatabaseTable contains 10 columns and Method setColumnsOfInterest() is applied with vector<> { 0, 3, 4 }, then the DBRowGenerator will output DBRows with 10 columns, in which only columns 0, 3 and 4 are guaranteed to have correct values (columns are always indexed, starting from 0).

◆ setColumnsOfInterest() [2/2]

virtual void gum::learning::DBRowGenerator::setColumnsOfInterest ( std::vector< std::size_t > && cols_of_interest)
virtual

sets the columns of interest: the output DBRow needs only contain correct values for these columns

This method is useful, e.g., for EM-like algorithms that need to know which unobserved variables/values need be filled. In this case, the DBRowGenerator still outputs DBRows with the same columns as the DatabaseTable, but only the columns of these DBRows corresponding to those passed in argument to Method setColumnsOfInterest are meaningful. For instance, if a DatabaseTable contains 10 columns and Method setColumnsOfInterest() is applied with vector<> { 0, 3, 4 }, then the DBRowGenerator will output DBRows with 10 columns, in which only columns 0, 3 and 4 are guaranteed to have correct values (columns are always indexed, starting from 0).
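The 10-column example above can be made concrete with a standalone sketch. It is not aGrUM code: `completeColumnsOfInterest`, `kUnspecified` and the single `fill_value` are hypothetical, and a real generator would derive the filled values from its model rather than from a constant.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical marker for a cell whose content is not guaranteed to be meaningful.
constexpr std::size_t kUnspecified = std::numeric_limits<std::size_t>::max();

// Builds an output row with as many columns as the input row. Observed cells
// are copied as-is; missing cells are filled ONLY in the columns of interest,
// the other missing cells keep the kUnspecified marker.
std::vector<std::size_t> completeColumnsOfInterest(
   const std::vector<std::optional<std::size_t>>& input,
   const std::vector<std::size_t>&                cols_of_interest,
   std::size_t                                    fill_value) {
  std::vector<std::size_t> output(input.size(), kUnspecified);
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (input[i].has_value()) output[i] = *input[i];
  }
  for (const std::size_t col : cols_of_interest) {
    assert(col < input.size());   // columns are indexed from 0
    if (!input[col].has_value()) output[col] = fill_value;
  }
  return output;
}
```

With columns of interest `{ 0, 3, 4 }`, the output still has 10 columns, but a caller should only rely on the values at indices 0, 3 and 4.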

◆ setInputRow()

bool gum::learning::DBRowGenerator::setInputRow ( const DBRow< DBTranslatedValue > & row)

sets the input row from which the generator will create its output rows

Returns
a Boolean indicating whether, from this input DBRow, the DBRowGenerator is capable of outputting at least one row or not
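As an illustration of how a caller might use this return value, here is a standalone sketch of a generator that discards rows containing missing values. `CompleteRowsOnly`, `Cell`, `Row` and `countUsableRows` are hypothetical names that do not come from aGrUM; only the shape of the `setInputRow` / `hasRows` / `generate` calls mirrors the interface documented here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-ins: a cell is either observed or missing.
using Cell = std::optional<std::size_t>;
using Row  = std::vector<Cell>;

class CompleteRowsOnly {
  public:
  // Returns true only if the row has no missing cell, i.e. if at least
  // one output row can be generated from it.
  bool setInputRow(const Row& row) {
    const bool complete = std::all_of(row.begin(), row.end(),
                                      [](const Cell& cell) { return cell.has_value(); });
    input_     = &row;
    remaining_ = complete ? 1 : 0;
    return complete;
  }

  bool hasRows() const { return remaining_ != 0; }

  const Row& generate() {
    assert(hasRows());
    --remaining_;
    return *input_;
  }

  private:
  const Row*  input_{nullptr};
  std::size_t remaining_{0};
};

// Counts the rows of the table that yield at least one output row.
std::size_t countUsableRows(const std::vector<Row>& table) {
  CompleteRowsOnly gen;
  std::size_t      usable = 0;
  for (const auto& row : table) {
    if (!gen.setInputRow(row)) continue;   // nothing to generate for this row
    while (gen.hasRows()) {
      gen.generate();
      ++usable;
    }
  }
  return usable;
}
```

Testing the return value lets the caller skip a row immediately instead of entering a `hasRows()` loop that would not execute.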

Member Data Documentation

◆ column_types_

std::vector< DBTranslatedValueType > gum::learning::DBRowGenerator::column_types_
protected

the types of the columns in the DatabaseTable

This is useful to determine whether we need to use the .discr_val field or the .cont_val field in DBTranslatedValue instances.

Definition at line 330 of file DBRowGenerator.h.
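The role of the column types can be illustrated with a standalone sketch. The `TranslatedValue` union and `ValueType` enum below are hypothetical mirrors written for this example; only the field names `discr_val` and `cont_val` come from the description above, and their exact types in aGrUM are an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mirrors of the types discussed above (not the aGrUM definitions).
enum class ValueType { DISCRETE, CONTINUOUS };

union TranslatedValue {
  std::size_t discr_val;   // meaningful only for discrete columns
  float       cont_val;    // meaningful only for continuous columns
};

// Reads a cell through the field that matches the type of its column.
double asDouble(const TranslatedValue& value, ValueType type) {
  return type == ValueType::DISCRETE ? static_cast<double>(value.discr_val)
                                     : static_cast<double>(value.cont_val);
}

// Sums a row, selecting the right field column by column.
double sumRow(const std::vector<TranslatedValue>& row,
              const std::vector<ValueType>&       column_types) {
  assert(row.size() == column_types.size());
  double total = 0.0;
  for (std::size_t i = 0; i < row.size(); ++i) {
    total += asDouble(row[i], column_types[i]);
  }
  return total;
}
```

Because a union does not record which member is active, the per-column type vector is the only way for such code to know which field to read.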

◆ columns_of_interest_

std::vector< std::size_t > gum::learning::DBRowGenerator::columns_of_interest_
protected

the set of columns of interest

Definition at line 333 of file DBRowGenerator.h.

◆ goal_

DBRowGeneratorGoal gum::learning::DBRowGenerator::goal_ {DBRowGeneratorGoal::OTHER_THINGS_THAN_REMOVE_MISSING_VALUES}
protected

the goal of the DBRowGenerator (just remove missing values or not)

Definition at line 336 of file DBRowGenerator.h.

◆ nb_remaining_output_rows_

std::size_t gum::learning::DBRowGenerator::nb_remaining_output_rows_ {std::size_t(0)}
protected

the number of output rows still to retrieve through the generate method

Definition at line 325 of file DBRowGenerator.h.


The documentation for this class was generated from the following file: