aGrUM 2.3.2
a C++ library for (probabilistic) graphical models
gum::learning::RecordCounter Class Reference

The class that computes counting of observations from the database.

#include <agrum/BN/learning/scores_and_tests/recordCounter.h>


Public Member Functions

Constructors / Destructors
 RecordCounter (const DBRowGeneratorParser &parser, const std::vector< std::pair< std::size_t, std::size_t > > &ranges, const Bijection< NodeId, std::size_t > &nodeId2columns=Bijection< NodeId, std::size_t >())
 default constructor
 RecordCounter (const DBRowGeneratorParser &parser, const Bijection< NodeId, std::size_t > &nodeId2columns=Bijection< NodeId, std::size_t >())
 default constructor
 RecordCounter (const RecordCounter &from)
 copy constructor
 RecordCounter (RecordCounter &&from)
 move constructor
virtual RecordCounter * clone () const
 virtual copy constructor
virtual ~RecordCounter ()
 destructor
Operators
RecordCounter & operator= (const RecordCounter &from)
 copy operator
RecordCounter & operator= (RecordCounter &&from)
 move operator
Accessors / Modifiers
void clear ()
 clears the last database-parsed counts from memory
virtual void setNumberOfThreads (Size nb)
 sets the maximum number of threads that can be used
void setMinNbRowsPerThread (const std::size_t nb) const
 changes the minimum number of rows a thread should process in a multithreading context
std::size_t minNbRowsPerThread () const
 returns the minimum number of rows that each thread should process
const std::vector< double > & counts (const IdCondSet &ids, const bool check_discrete_vars=false)
 returns the counts over all the variables in an IdCondSet
void setRanges (const std::vector< std::pair< std::size_t, std::size_t > > &new_ranges)
 sets new ranges to perform the counting
void clearRanges ()
 resets the ranges to the single range covering the whole database
const std::vector< std::pair< std::size_t, std::size_t > > & ranges () const
 returns the current ranges
template<typename GUM_SCALAR>
void setBayesNet (const BayesNet< GUM_SCALAR > &new_bn)
 assigns a new Bayes net to all of the counter's generators that depend on a BN
const Bijection< NodeId, std::size_t > & nodeId2Columns () const
 returns the mapping from ids to column positions in the database
const DatabaseTable & database () const
 returns the database on which we perform the counts
Accessors/Modifiers
virtual Size getNumberOfThreads () const
 returns the current max number of threads used by the class containing this ThreadNumberManager
bool isGumNumberOfThreadsOverriden () const
 indicates whether the class containing this ThreadNumberManager set its own number of threads

Private Attributes

Size _nb_threads_ {0}
 the max number of threads used by the class

Detailed Description

The class that computes counting of observations from the database.

Scores and independence tests call this class to obtain the observation counts they need from tabular datasets. When asked for the counts over a set X = {X_1,...,X_n} of variables, the RecordCounter first checks whether it already holds counts over a set Y of variables that contains X. If so, it extracts the counts over X from those over Y, which is usually much faster than parsing the database. Otherwise, it parses the database in parallel to compute the counts over X. Only the result of the last database parsing is kept for these subset extractions.

For example, if we create a RecordCounter and ask for the counts over {A,B,C}, it parses the database and returns them. If we then ask for the counts over B, it derives them from the table over {A,B,C}; the same holds for {A,C}. Asking for the counts over {B,C,D} requires another database parsing. Finally, asking for the counts over A triggers yet another parsing, because the RecordCounter now only holds the counts over {B,C,D}.
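
Extracting the counts over X from those over a superset Y amounts to marginalizing a contingency table. The sketch below illustrates that idea on plain vectors; it is not aGrUM's implementation. The helper name marginalize and its parameters are hypothetical, and the table layout is assumed to be the one documented for counts() (the first variable varies fastest).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of aGrUM): sums a joint contingency table
// over the variables that are not listed in `keep`.
//  - joint:   counts over all the variables, first variable varying fastest
//  - domains: domains[i] is the domain size of the i-th variable
//  - keep:    positions of the variables to retain, in output order
std::vector<double> marginalize(const std::vector<double>&      joint,
                                const std::vector<std::size_t>& domains,
                                const std::vector<std::size_t>& keep) {
  std::size_t joint_size = 1;
  for (std::size_t d : domains) joint_size *= d;
  assert(joint.size() == joint_size);

  std::size_t out_size = 1;
  for (std::size_t k : keep) out_size *= domains[k];
  std::vector<double> out(out_size, 0.0);

  std::vector<std::size_t> values(domains.size(), 0);   // current assignment
  for (std::size_t offset = 0; offset < joint.size(); ++offset) {
    // offset of the current assignment restricted to the kept variables
    std::size_t out_offset = 0;
    std::size_t stride     = 1;
    for (std::size_t k : keep) {
      out_offset += values[k] * stride;
      stride *= domains[k];
    }
    out[out_offset] += joint[offset];

    // move to the next assignment, first variable fastest
    for (std::size_t i = 0; i < values.size(); ++i) {
      if (++values[i] < domains[i]) break;
      values[i] = 0;
    }
  }
  return out;
}
```

With a table over {A,B} where A has 2 values and B has 3, marginalize(joint, {2, 3}, {1}) yields the 3 counts over B without touching the database again.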

Here is an example of how to use the RecordCounter class:
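// here, write the code to construct your database, e.g.:
// NOTE: the declarations of translator_set, translator, database, genset,
// parser and counter are reconstructed; check them against your aGrUM version
gum::learning::DBInitializerFromCSV initializer("file.csv");
const auto&       var_names = initializer.variableNames();
const std::size_t nb_vars   = var_names.size();
gum::learning::DBTranslatorSet translator_set;
// one labelized (discrete) translator per column: counts() needs discrete variables
gum::learning::DBTranslator4LabelizedVariable translator;
for (std::size_t i = 0; i < nb_vars; ++i) {
  translator_set.insertTranslator(translator, i);
}
gum::learning::DatabaseTable database(translator_set);
database.setVariableNames(initializer.variableNames());
initializer.fillDatabase(database);

// create the parser of the database
gum::learning::DBRowGeneratorSet    genset;
gum::learning::DBRowGeneratorParser parser(database.handler(), genset);

// create the record counter
gum::learning::RecordCounter counter(parser);

// get the counts:
gum::learning::IdCondSet   ids(0, std::vector< gum::NodeId >{2, 1});
const std::vector< double >& counts1 = counter.counts(ids);

// change the rows from which we compute the counts:
// they should now be made on rows [500,600) U [1050,1125) U [100,150)
std::vector< std::pair< std::size_t, std::size_t > > new_ranges{
    std::pair< std::size_t, std::size_t >(500, 600),
    std::pair< std::size_t, std::size_t >(1050, 1125),
    std::pair< std::size_t, std::size_t >(100, 150)};
counter.setRanges(new_ranges);
const std::vector< double >& counts2 = counter.counts(ids);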

Definition at line 127 of file recordCounter.h.

Constructor & Destructor Documentation

◆ RecordCounter() [1/4]

gum::learning::RecordCounter::RecordCounter ( const DBRowGeneratorParser & parser,
const std::vector< std::pair< std::size_t, std::size_t > > & ranges,
const Bijection< NodeId, std::size_t > & nodeId2columns = Bijection< NodeId, std::size_t >() )

default constructor

Parameters
parser: the parser used to parse the database
ranges: a set of pairs {(X1,Y1),...,(Xn,Yn)} of row indices of the database. Counting is then performed only on the union of the rows [Xi,Yi), i in {1,...,n}. This is useful, e.g., for cross-validation tasks, in which part of the database should be ignored. An empty set of ranges is equivalent to an interval [X,Y) ranging over the whole database.
nodeId2columns: a mapping from the ids of the nodes in the graphical model to the corresponding columns in the DatabaseTable parsed by the parser. This makes it possible to estimate, from a database in which variable A corresponds to the 2nd column, the parameters of a BN in which variable A has a NodeId of 5. An empty nodeId2columns bijection means that the mapping is the identity, i.e., the value of a NodeId is equal to the index of the column in the DatabaseTable.
Warning
If nodeId2columns is not empty, then only the counts over the ids belonging to this bijection can be computed: applying method counts() over other ids will raise exception NotFound.
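
The lookup rule described for nodeId2columns can be sketched with standard containers. This is an illustration only: columnOf is a hypothetical helper, std::unordered_map stands in for gum::Bijection, std::size_t for gum::NodeId, and std::out_of_range plays the role of aGrUM's NotFound exception.

```cpp
#include <cstddef>
#include <stdexcept>
#include <unordered_map>

using NodeId = std::size_t;   // stand-in for gum::NodeId

// Hypothetical helper (not part of aGrUM): column of the database that
// corresponds to a node id.
//  - an empty mapping is the identity: the column index equals the node id
//  - otherwise, ids missing from the mapping are reported as errors
std::size_t columnOf(const std::unordered_map< NodeId, std::size_t >& nodeId2columns,
                     NodeId                                          id) {
  if (nodeId2columns.empty()) return id;

  const auto it = nodeId2columns.find(id);
  if (it == nodeId2columns.end()) {
    throw std::out_of_range("no column is associated with this node id");
  }
  return it->second;
}
```

For the case mentioned above, a mapping {5 -> 1} sends NodeId 5 to the 2nd column (index 1, assuming 0-based column indices), while any other id is rejected.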

References ranges().

Referenced by RecordCounter(), RecordCounter(), clone(), operator=(), and operator=().


◆ RecordCounter() [2/4]

gum::learning::RecordCounter::RecordCounter ( const DBRowGeneratorParser & parser,
const Bijection< NodeId, std::size_t > & nodeId2columns = Bijection< NodeId, std::size_t >() )

default constructor

Parameters
parser: the parser used to parse the database
nodeId2columns: a mapping from the ids of the nodes in the graphical model to the corresponding columns in the DatabaseTable parsed by the parser. This makes it possible to estimate, from a database in which variable A corresponds to the 2nd column, the parameters of a BN in which variable A has a NodeId of 5. An empty nodeId2columns bijection means that the mapping is the identity, i.e., the value of a NodeId is equal to the index of the column in the DatabaseTable.
Warning
If nodeId2columns is not empty, then only the counts over the ids belonging to this bijection can be computed: applying method counts() over other ids will raise exception NotFound.

◆ RecordCounter() [3/4]

gum::learning::RecordCounter::RecordCounter ( const RecordCounter & from)

copy constructor

References RecordCounter().


◆ RecordCounter() [4/4]

gum::learning::RecordCounter::RecordCounter ( RecordCounter && from)

move constructor

References RecordCounter().


◆ ~RecordCounter()

virtual gum::learning::RecordCounter::~RecordCounter ( )
virtual

destructor

Member Function Documentation

◆ clear()

void gum::learning::RecordCounter::clear ( )

clears the last database-parsed counts from memory

◆ clearRanges()

void gum::learning::RecordCounter::clearRanges ( )

resets the ranges to the single range covering the whole database

◆ clone()

virtual RecordCounter * gum::learning::RecordCounter::clone ( ) const
virtual

virtual copy constructor

References RecordCounter().


◆ counts()

const std::vector< double > & gum::learning::RecordCounter::counts ( const IdCondSet & ids,
const bool check_discrete_vars = false )

returns the counts over all the variables in an IdCondSet

Parameters
ids: the idset of the variables over which we perform counting.
check_discrete_vars: The record counter can only produce correct results on sets of discrete variables. By default, the method does not check whether the variables corresponding to the IdCondSet are actually discrete. If check_discrete_vars is set to true, then this check is performed before computing the counting vector. In this case, if a variable is not discrete, a TypeError exception is raised.
Returns
a vector containing the multidimensional contingency table over all the variables corresponding to the ids passed in argument (both on the left-hand side and on the right-hand side of the conditioning bar of the IdCondSet). The first dimension is that of the first variable in the IdCondSet, i.e., when its value increases by 1, the offset in the output vector also increases by 1. The second dimension is that of the second variable in the IdCondSet, i.e., when its value increases by 1, the offset in the output vector increases by the domain size of the first variable. For the third variable, the offset increment is the product of the domain sizes of the first two variables, and so on.
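
The layout of the returned vector can be made concrete with a small offset computation. The helper below is a hypothetical sketch, not an aGrUM function: it assumes the caller knows the value index and the domain size of each variable, listed in the order of the IdCondSet.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of aGrUM): position, in the vector returned
// by counts(), of the cell corresponding to a given assignment.
//  - values[i]:       value index of the i-th variable of the IdCondSet
//  - domain_sizes[i]: domain size of that variable
std::size_t countsOffset(const std::vector< std::size_t >& values,
                         const std::vector< std::size_t >& domain_sizes) {
  assert(values.size() == domain_sizes.size());

  std::size_t offset = 0;
  std::size_t stride = 1;   // the first variable moves the offset by 1
  for (std::size_t i = 0; i < values.size(); ++i) {
    assert(values[i] < domain_sizes[i]);
    offset += values[i] * stride;
    stride *= domain_sizes[i];   // product of the domain sizes seen so far
  }
  return offset;
}
```

For three variables with domain sizes 2, 3 and 4, the assignment (1, 2, 3) is stored at offset 1 + 2*2 + 3*(2*3) = 23, the last cell of the 24-element vector.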
Warning
The vector returned by the function may differ from one call to another, so care must be taken. For instance, code like:
const std::vector< double >& counts = counter.counts(ids);
counts = counter.counts(other_ids);
is incorrect: a C++ reference cannot be rebound, and the two calls to counts() may return references to different vectors. The correct way to use counts() is to bind each result to a newly declared reference:
const std::vector< double >& counts       = counter.counts(ids);
const std::vector< double >& other_counts = counter.counts(other_ids);
Exceptions
TypeError: raised if check_discrete_vars is set to true (i.e., we check that all variables in the IdCondSet are discrete) and at least one variable is not discrete.

◆ database()

const DatabaseTable & gum::learning::RecordCounter::database ( ) const

returns the database on which we perform the counts

◆ getNumberOfThreads()

virtual Size gum::ThreadNumberManager::getNumberOfThreads ( ) const
virtual, inherited

◆ isGumNumberOfThreadsOverriden()

bool gum::ThreadNumberManager::isGumNumberOfThreadsOverriden ( ) const
virtual, inherited

indicates whether the class containing this ThreadNumberManager set its own number of threads

Implements gum::IThreadNumberManager.

Referenced by gum::learning::IBNLearner::createParamEstimator_(), and gum::learning::IBNLearner::createScore_().


◆ minNbRowsPerThread()

std::size_t gum::learning::RecordCounter::minNbRowsPerThread ( ) const

returns the minimum number of rows that each thread should process

◆ nodeId2Columns()

const Bijection< NodeId, std::size_t > & gum::learning::RecordCounter::nodeId2Columns ( ) const

returns the mapping from ids to column positions in the database

Warning
An empty nodeId2Columns bijection means that the mapping is an identity, i.e., the value of a NodeId is equal to the index of the column in the DatabaseTable.

◆ operator=() [1/2]

RecordCounter & gum::learning::RecordCounter::operator= ( const RecordCounter & from)

copy operator

References RecordCounter().


◆ operator=() [2/2]

RecordCounter & gum::learning::RecordCounter::operator= ( RecordCounter && from)

move operator

References RecordCounter().


◆ ranges()

const std::vector< std::pair< std::size_t, std::size_t > > & gum::learning::RecordCounter::ranges ( ) const

returns the current ranges

Referenced by RecordCounter().


◆ setBayesNet()

template<typename GUM_SCALAR>
void gum::learning::RecordCounter::setBayesNet ( const BayesNet< GUM_SCALAR > & new_bn)

assigns a new Bayes net to all of the counter's generators that depend on a BN

Typically, generators based on EM or K-means depend on a model to compute their outputs correctly. Method setBayesNet() updates the BN model they use.

◆ setMinNbRowsPerThread()

void gum::learning::RecordCounter::setMinNbRowsPerThread ( const std::size_t nb) const

changes the minimum number of rows a thread should process in a multithreading context

When method counts() runs several threads to count over the rows of the database, MinNbRowsPerThread indicates the minimum number of rows each thread should process. It is used to compute the number of threads actually run: the minimum of the maximum number of threads allowed and the number of records in the database divided by nb.
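
The rule above can be restated as a one-line computation. The function below is a hypothetical sketch, not aGrUM's code; in particular, the lower bound of one thread for databases smaller than nb rows is an assumption, since the description does not say what happens in that case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical restatement of the rule (not part of aGrUM).
//  - max_threads:         maximum number of threads allowed
//  - nb_rows:             number of records in the database
//  - min_rows_per_thread: the value passed to setMinNbRowsPerThread()
// ASSUMPTION: at least one thread is used, even when nb_rows is smaller
// than min_rows_per_thread.
std::size_t nbThreadsUsed(std::size_t max_threads,
                          std::size_t nb_rows,
                          std::size_t min_rows_per_thread) {
  assert(max_threads > 0 && min_rows_per_thread > 0);
  const std::size_t by_rows = nb_rows / min_rows_per_thread;
  return std::max< std::size_t >(1, std::min(max_threads, by_rows));
}
```

With 8 threads allowed and nb = 100, a database of 350 rows would be processed by 3 threads, whereas a database of 1000 rows would use all 8.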

◆ setNumberOfThreads()

virtual void gum::learning::RecordCounter::setNumberOfThreads ( Size nb)
virtual

sets the maximum number of threads that can be used

Parameters
nb: the maximum number of threads to be used. If this number is set to 0, it defaults to aGrUM's maximum number of threads

Reimplemented from gum::ThreadNumberManager.

◆ setRanges()

void gum::learning::RecordCounter::setRanges ( const std::vector< std::pair< std::size_t, std::size_t > > & new_ranges)

sets new ranges to perform the counting

Parameters
new_ranges: a set of pairs {(X1,Y1),...,(Xn,Yn)} of row indices of the database. Counting is then performed only on the union of the rows [Xi,Yi), i in {1,...,n}. This is useful, e.g., for cross-validation tasks, in which part of the database should be ignored. An empty set of ranges is equivalent to an interval [X,Y) ranging over the whole database.
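
For the cross-validation use case mentioned above, the ranges of the training rows can be built as follows. This is a sketch under stated assumptions: trainingRanges is a hypothetical helper, and the folds are assumed to be contiguous blocks of rows of nearly equal size.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Ranges = std::vector< std::pair< std::size_t, std::size_t > >;

// Hypothetical helper (not part of aGrUM): ranges [Xi,Yi) covering every row
// except those of the test fold `fold`, for `nb_folds` contiguous folds over
// `nb_rows` rows.
// The preconditions guarantee a non-empty result, which matters because an
// empty set of ranges means "the whole database".
Ranges trainingRanges(std::size_t nb_rows, std::size_t nb_folds, std::size_t fold) {
  assert(nb_folds >= 2 && nb_rows >= nb_folds && fold < nb_folds);

  // the test fold is [test_begin, test_end)
  const std::size_t test_begin = nb_rows * fold / nb_folds;
  const std::size_t test_end   = nb_rows * (fold + 1) / nb_folds;

  Ranges ranges;
  if (test_begin > 0) ranges.emplace_back(std::size_t(0), test_begin);
  if (test_end < nb_rows) ranges.emplace_back(test_end, nb_rows);
  return ranges;
}
```

The result has the type expected by setRanges(), so the counts for fold 2 of a 5-fold split over 1000 rows would be computed on [0,400) U [600,1000).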

Member Data Documentation

◆ _nb_threads_

Size gum::ThreadNumberManager::_nb_threads_ {0}
private, inherited

the max number of threads used by the class

Definition at line 126 of file threadNumberManager.h.


The documentation for this class was generated from the following file: