aGrUM 2.3.2
a C++ library for (probabilistic) graphical models
The class that computes counting of observations from the database.
#include <agrum/BN/learning/scores_and_tests/recordCounter.h>
Public Member Functions | |
Constructors / Destructors | |
| RecordCounter (const DBRowGeneratorParser &parser, const std::vector< std::pair< std::size_t, std::size_t > > &ranges, const Bijection< NodeId, std::size_t > &nodeId2columns=Bijection< NodeId, std::size_t >()) | |
| default constructor | |
| RecordCounter (const DBRowGeneratorParser &parser, const Bijection< NodeId, std::size_t > &nodeId2columns=Bijection< NodeId, std::size_t >()) | |
| default constructor | |
| RecordCounter (const RecordCounter &from) | |
| copy constructor | |
| RecordCounter (RecordCounter &&from) | |
| move constructor | |
| virtual RecordCounter * | clone () const |
| virtual copy constructor | |
| virtual | ~RecordCounter () |
| destructor | |
Operators | |
| RecordCounter & | operator= (const RecordCounter &from) |
| copy operator | |
| RecordCounter & | operator= (RecordCounter &&from) |
| move operator | |
Accessors / Modifiers | |
| void | clear () |
| clears all the last database-parsed counting from memory | |
| virtual void | setNumberOfThreads (Size nb) |
| sets the maximum number of threads that can be used | |
| void | setMinNbRowsPerThread (const std::size_t nb) const |
| changes the minimum number of rows a thread should process in a multithreading context | |
| std::size_t | minNbRowsPerThread () const |
| returns the minimum number of rows that each thread should process | |
| const std::vector< double > & | counts (const IdCondSet &ids, const bool check_discrete_vars=false) |
| returns the counts over all the variables in an IdCondSet | |
| void | setRanges (const std::vector< std::pair< std::size_t, std::size_t > > &new_ranges) |
| sets new ranges to perform the counting | |
| void | clearRanges () |
| reset the ranges to the one range corresponding to the whole database | |
| const std::vector< std::pair< std::size_t, std::size_t > > & | ranges () const |
| returns the current ranges | |
| template<typename GUM_SCALAR> | |
| void | setBayesNet (const BayesNet< GUM_SCALAR > &new_bn) |
| assign a new Bayes net to all the counter's generators depending on a BN | |
| const Bijection< NodeId, std::size_t > & | nodeId2Columns () const |
| returns the mapping from ids to column positions in the database | |
| const DatabaseTable & | database () const |
| returns the database on which we perform the counts | |
Accessors/Modifiers | |
| virtual Size | getNumberOfThreads () const |
| returns the current max number of threads used by the class containing this ThreadNumberManager | |
| bool | isGumNumberOfThreadsOverriden () const |
| indicates whether the class containing this ThreadNumberManager set its own number of threads | |
Private Attributes | |
| Size | _nb_threads_ {0} |
| the max number of threads used by the class | |
The class that computes counting of observations from the database.
This class is the one to be called by scores and independence tests to compute the counts of observations they need from tabular datasets. The counting is performed as follows: when asked for the counts over a set X = {X_1,...,X_n} of variables, the RecordCounter first checks whether it already contains counts over a set Y of variables containing X. If so, it extracts the counts over X from those over Y (this is usually much faster than determining the counts by parsing the database). Otherwise, it determines the counts over X by parsing the database in parallel. Only the result of the last database-parsed counting is available for this subset extraction.
As an example, if we create a RecordCounter and ask it for the counts over {A,B,C}, it will parse the database and provide the counts. Then, if we ask it for the counts over B, it will use the table over {A,B,C} to produce them. Asking for the counts over {A,C} is handled the same way. Now, asking for the counts over {B,C,D} requires another database parsing. Finally, if we ask for the counts over A, a new database parsing is performed because only the counts over {B,C,D} are now contained in the RecordCounter.
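The caching policy described above can be illustrated with a small stand-alone sketch. It is a toy model only: `ToyRecordCounter`, `query` and `nbParses` are hypothetical names introduced for this illustration, variables are represented by strings, and no actual counting or aGrUM code is involved. The sketch only tracks which variable set was parsed last and whether a new query is a subset of it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical toy model of the caching policy; NOT the aGrUM implementation.
class ToyRecordCounter {
public:
  using VarSet = std::set<std::string>;

  // Returns true if the query required a (simulated) database parse,
  // false if it could be answered from the last parsed table.
  bool query(const VarSet& vars) {
    // std::includes requires sorted ranges; std::set iterates in sorted order.
    const bool covered =
        has_last_ &&
        std::includes(last_parsed_.begin(), last_parsed_.end(),
                      vars.begin(), vars.end());
    if (covered) return false;

    // Only the result of the last database parse is kept.
    last_parsed_ = vars;
    has_last_    = true;
    ++nb_parses_;
    return true;
  }

  // Number of simulated database parses performed so far.
  std::size_t nbParses() const { return nb_parses_; }

private:
  VarSet      last_parsed_;
  bool        has_last_  = false;
  std::size_t nb_parses_ = 0;
};
```

Replaying the example sequence {A,B,C}, {B}, {A,C}, {B,C,D}, {A} on this toy model triggers a parse on the first, fourth and fifth queries only, which matches the behavior described in the text.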
Definition at line 127 of file recordCounter.h.
| gum::learning::RecordCounter::RecordCounter | ( | const DBRowGeneratorParser & | parser, |
| const std::vector< std::pair< std::size_t, std::size_t > > & | ranges, | ||
| const Bijection< NodeId, std::size_t > & | nodeId2columns = Bijection< NodeId, std::size_t >() ) |
default constructor
| parser | the parser used to parse the database |
| ranges | a set of pairs {(X1,Y1),...,(Xn,Yn)} of database row indices. The counting is then performed only on the union of the rows [Xi,Yi), i in {1,...,n}. This is useful, e.g., when performing cross validation tasks, in which part of the database should be ignored. An empty set of ranges is equivalent to an interval [X,Y) ranging over the whole database. |
| nodeId2columns | a mapping from the ids of the nodes in the graphical model to the corresponding column in the DatabaseTable parsed by the parser. This makes it possible to estimate the parameters of a BN in which variable A has a NodeId of 5 from a database in which variable A corresponds to the 2nd column. An empty nodeId2columns bijection means that the mapping is an identity, i.e., the value of a NodeId is equal to the index of the column in the DatabaseTable. |
References ranges().
Referenced by RecordCounter(), RecordCounter(), clone(), operator=(), and operator=().
| gum::learning::RecordCounter::RecordCounter | ( | const DBRowGeneratorParser & | parser, |
| const Bijection< NodeId, std::size_t > & | nodeId2columns = Bijection< NodeId, std::size_t >() ) |
default constructor
| parser | the parser used to parse the database |
| nodeId2columns | a mapping from the ids of the nodes in the graphical model to the corresponding column in the DatabaseTable parsed by the parser. This makes it possible to estimate the parameters of a BN in which variable A has a NodeId of 5 from a database in which variable A corresponds to the 2nd column. An empty nodeId2columns bijection means that the mapping is an identity, i.e., the value of a NodeId is equal to the index of the column in the DatabaseTable. |
| gum::learning::RecordCounter::RecordCounter | ( | const RecordCounter & | from | ) |
| gum::learning::RecordCounter::RecordCounter | ( | RecordCounter && | from | ) |
| gum::learning::RecordCounter::~RecordCounter | ( | ) | virtual |
destructor
| void gum::learning::RecordCounter::clear | ( | ) |
clears all the last database-parsed counting from memory
| void gum::learning::RecordCounter::clearRanges | ( | ) |
reset the ranges to the one range corresponding to the whole database
| RecordCounter * gum::learning::RecordCounter::clone | ( | ) | const | virtual |
virtual copy constructor
| const std::vector< double > & gum::learning::RecordCounter::counts | ( | const IdCondSet & | ids, |
| const bool | check_discrete_vars = false ) |
returns the counts over all the variables in an IdCondSet
| ids | the idset of the variables over which we perform counting. |
| check_discrete_vars | The record counter can only produce correct results on sets of discrete variables. By default, the method does not check whether the variables corresponding to the IdCondSet are actually discrete. If check_discrete_vars is set to true, then this check is performed before computing the counting vector. In this case, if a variable is not discrete, a TypeError exception is raised. |
| const DatabaseTable & gum::learning::RecordCounter::database | ( | ) | const |
returns the database on which we perform the counts
| Size gum::ThreadNumberManager::getNumberOfThreads | ( | ) | const | virtual, inherited |
returns the current max number of threads used by the class containing this ThreadNumberManager
Implements gum::IThreadNumberManager.
Referenced by gum::learning::IBNLearner::createParamEstimator_(), gum::learning::IBNLearner::createScore_(), gum::credal::InferenceEngine< GUM_SCALAR >::displatchMarginalsToThreads_(), gum::credal::MultipleInferenceEngine< GUM_SCALAR, BNInferenceEngine >::expFusion_(), gum::ScheduledInference::scheduler(), and gum::credal::MultipleInferenceEngine< GUM_SCALAR, BNInferenceEngine >::verticesFusion_().
| bool gum::ThreadNumberManager::isGumNumberOfThreadsOverriden | ( | ) | const | virtual, inherited |
indicates whether the class containing this ThreadNumberManager set its own number of threads
Implements gum::IThreadNumberManager.
Referenced by gum::learning::IBNLearner::createParamEstimator_(), and gum::learning::IBNLearner::createScore_().
| std::size_t gum::learning::RecordCounter::minNbRowsPerThread | ( | ) | const |
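returns the minimum number of rows that each thread should process
| const Bijection< NodeId, std::size_t > & gum::learning::RecordCounter::nodeId2Columns | ( | ) | const |
returns the mapping from ids to column positions in the database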
| RecordCounter & gum::learning::RecordCounter::operator= | ( | const RecordCounter & | from | ) |
| RecordCounter & gum::learning::RecordCounter::operator= | ( | RecordCounter && | from | ) |
| const std::vector< std::pair< std::size_t, std::size_t > > & gum::learning::RecordCounter::ranges | ( | ) | const |
returns the current ranges
Referenced by RecordCounter().
| void gum::learning::RecordCounter::setBayesNet | ( | const BayesNet< GUM_SCALAR > & | new_bn | ) |
assign a new Bayes net to all the counter's generators depending on a BN
Typically, generators based on EM or K-means depend on a model to compute their outputs correctly. The setBayesNet method updates the BN model they rely on.
| void gum::learning::RecordCounter::setMinNbRowsPerThread | ( | const std::size_t | nb | ) | const |
changes the minimum number of rows a thread should process in a multithreading context
When the counts method runs several threads to perform counting on the rows of the database, MinNbRowsPerThread indicates the minimum number of rows each thread should process. This is used to compute the number of threads actually run, which is equal to the minimum of the maximum number of threads allowed and the number of records in the database divided by nb.
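The rule above can be written as a short stand-alone function. This is a sketch, not the library's code: `nbThreadsToRun` and its parameter names are hypothetical, integer division is assumed, and the two guards (treating `min_rows_per_thread == 0` as 1, and never returning fewer than one thread) are assumptions added so that the function is well defined for all inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper illustrating the rule
//   nb_threads = min(max_threads, nb_records / min_rows_per_thread).
std::size_t nbThreadsToRun(std::size_t max_threads,
                           std::size_t nb_records,
                           std::size_t min_rows_per_thread) {
  // Assumption: a value of 0 is treated as 1 to avoid a division by zero.
  const std::size_t min_rows =
      std::max<std::size_t>(min_rows_per_thread, 1);

  // Number of threads that can each receive at least min_rows rows.
  const std::size_t by_rows = nb_records / min_rows;

  // Assumption: at least one thread always runs, even on a tiny database.
  return std::max<std::size_t>(1, std::min(max_threads, by_rows));
}
```

For instance, with at most 8 threads and a minimum of 100 rows per thread, a database of 250 records would be processed by 2 threads under this rule, whereas 1000 records would use all 8.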
| void gum::learning::RecordCounter::setNumberOfThreads | ( | Size | nb | ) | virtual |
sets the maximum number of threads that can be used
| nb | the maximum number of threads to be used. If this number is set to 0, then it defaults to aGrUM's max number of threads |
Reimplemented from gum::ThreadNumberManager.
| void gum::learning::RecordCounter::setRanges | ( | const std::vector< std::pair< std::size_t, std::size_t > > & | new_ranges | ) |
sets new ranges to perform the counting
| new_ranges | a set of pairs {(X1,Y1),...,(Xn,Yn)} of database row indices. The counting is then performed only on the union of the rows [Xi,Yi), i in {1,...,n}. This is useful, e.g., when performing cross validation tasks, in which part of the database should be ignored. An empty set of ranges is equivalent to an interval [X,Y) ranging over the whole database. |
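To make the half-open range semantics concrete, here is a small stand-alone sketch of how such ranges could be built for a cross validation fold and which rows they select. The helpers `selectedRows` and `trainingRanges` are hypothetical and not part of aGrUM; clamping a range end to the number of rows is an assumption of this sketch, and the library may handle invalid ranges differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Range = std::pair<std::size_t, std::size_t>;

// Hypothetical helper: marks the rows belonging to the union of the
// half-open ranges [Xi,Yi). An empty set of ranges selects every row.
std::vector<bool> selectedRows(std::size_t nb_rows,
                               const std::vector<Range>& ranges) {
  if (ranges.empty()) return std::vector<bool>(nb_rows, true);

  std::vector<bool> selected(nb_rows, false);
  for (const auto& r : ranges) {
    // Assumption: range ends beyond the database are clamped.
    const std::size_t end = std::min(r.second, nb_rows);
    for (std::size_t row = r.first; row < end; ++row) selected[row] = true;
  }
  return selected;
}

// Hypothetical helper: ranges covering every row except the test fold
// [test_begin, test_end). Note that if the fold covers the whole database,
// the result is empty, which the convention above reads as "all rows".
std::vector<Range> trainingRanges(std::size_t nb_rows,
                                  std::size_t test_begin,
                                  std::size_t test_end) {
  std::vector<Range> ranges;
  if (test_begin > 0) ranges.emplace_back(std::size_t{0}, test_begin);
  if (test_end < nb_rows) ranges.emplace_back(test_end, nb_rows);
  return ranges;
}
```

For a 10-row database with test fold [3,6), `trainingRanges` yields {(0,3),(6,10)}, and a vector of this shape is what the new_ranges parameter expects.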
| Size gum::ThreadNumberManager::_nb_threads_ {0} | private, inherited |
the max number of threads used by the class
Definition at line 126 of file threadNumberManager.h.