The class that computes counts of observations from the database.

#include <agrum/BN/learning/scores/recordCounter.h>

Public Member Functions
Constructors / Destructors
	RecordCounter (const DBRowGeneratorParser &parser, const std::vector< std::pair< std::size_t, std::size_t > > &ranges, const Bijection< NodeId, std::size_t > &nodeId2columns=Bijection< NodeId, std::size_t >())
	default constructor
	RecordCounter (const DBRowGeneratorParser &parser, const Bijection< NodeId, std::size_t > &nodeId2columns=Bijection< NodeId, std::size_t >())
	default constructor
	RecordCounter (const RecordCounter &from)
	copy constructor
	RecordCounter (RecordCounter &&from)
	move constructor
virtual RecordCounter *	clone () const
	virtual copy constructor
	~RecordCounter () override
	destructor
Operators
RecordCounter &	operator= (const RecordCounter &from)
	copy operator
RecordCounter &	operator= (RecordCounter &&from)
	move operator
Accessors / Modifiers
void	clear ()
	clears from memory all the counts produced by the last database parsing
void	setNumberOfThreads (Size nb) override
	sets the maximum number of threads that can be used
void	setMinNbRowsPerThread (const std::size_t nb) const
	changes the minimum number of rows a thread should process in a multithreaded context
std::size_t	minNbRowsPerThread () const
	returns the minimum number of rows that each thread should process
const std::vector< double > &	counts (const IdCondSet &ids, const bool check_discrete_vars=false)
	returns the counts over all the variables in an IdCondSet
void	setRanges (const std::vector< std::pair< std::size_t, std::size_t > > &new_ranges)
	sets new ranges to perform the counting
void	clearRanges ()
	resets the ranges to the single range covering the whole database
const std::vector< std::pair< std::size_t, std::size_t > > &	ranges () const
	returns the current ranges
template<GUM_Numeric GUM_SCALAR>
void	setBayesNet (const BayesNet< GUM_SCALAR > &new_bn)
	assigns a new Bayes net to all the counter's generators that depend on a BN
const Bijection< NodeId, std::size_t > &	nodeId2Columns () const
	returns the mapping from ids to column positions in the database
const DatabaseTable &	database () const
	returns the database on which we perform the counts
Accessors/Modifiers
Size	getNumberOfThreads () const override
	returns the current max number of threads used by the class containing this ThreadNumberManager
bool	isGumNumberOfThreadsOverriden () const override
	indicates whether the class containing this ThreadNumberManager set its own number of threads

Private Attributes
Size	_nb_threads_ {0}
	the max number of threads used by the class

Detailed Description

The class that computes counts of observations from the database.

This class is the one to be called by scores and independence tests to compute the counts of observations they need from tabular datasets. The counts are computed as follows: when asked for the counts over a set X = {X_1,...,X_n} of variables, the RecordCounter first checks whether it already holds counts over a set Y of variables containing X. If so, it extracts the counts over X from those over Y, which is usually much faster than parsing the database. Otherwise, it determines the counts over X by parsing the database in parallel. Only the result of the last database parsing is kept for such subset extractions.

As an example, if we create a RecordCounter and ask it for the counts over {A,B,C}, it parses the database and provides them. If we then ask for the counts over B, it uses the table over {A,B,C} to produce them, and a request for the counts over {A,C} is handled the same way. Asking for the counts over {B,C,D}, however, requires another database parsing. Finally, if we ask for the counts over A, a new database parsing is performed because the RecordCounter now holds only the counts over {B,C,D}.
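
Extracting the counts over a subset of variables from a larger table amounts to marginalizing a contingency table. The following is a minimal, self-contained sketch of that idea and not aGrUM's actual implementation: the helper name `marginalize`, its parameters, and the flat layout in which the first variable varies fastest (the layout documented for counts()) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of aGrUM): sums out of `joint` every variable
// whose position is not listed in `kept`.
// - `joint` is a flat contingency table in which the first variable varies fastest;
// - `domain_sizes[i]` is the number of values of the i-th variable;
// - `kept` lists, in increasing order, the positions of the variables to keep.
std::vector<double> marginalize(const std::vector<double>&      joint,
                                const std::vector<std::size_t>& domain_sizes,
                                const std::vector<std::size_t>& kept) {
  std::size_t joint_size = 1;
  for (std::size_t d : domain_sizes) joint_size *= d;
  assert(joint.size() == joint_size);

  std::size_t result_size = 1;
  for (std::size_t k : kept) result_size *= domain_sizes[k];
  std::vector<double> result(result_size, 0.0);

  for (std::size_t offset = 0; offset < joint.size(); ++offset) {
    // decode the value of each variable from the flat offset and rebuild
    // the offset of the corresponding cell in the smaller table
    std::size_t rest = offset, target = 0, stride = 1, next_kept = 0;
    for (std::size_t var = 0; var < domain_sizes.size(); ++var) {
      const std::size_t value = rest % domain_sizes[var];
      rest /= domain_sizes[var];
      if (next_kept < kept.size() && kept[next_kept] == var) {
        target += value * stride;
        stride *= domain_sizes[var];
        ++next_kept;
      }
    }
    result[target] += joint[offset];
  }
  return result;
}
```

With a table over {A,B,C}, passing `kept = {1}` yields the counts over B and `kept = {0, 2}` those over {A,C}, without reading any database row again.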

Here is an example of how to use the RecordCounter class:

    // here, write the code to construct your database, e.g.:
    gum::learning::DBInitializerFromCSV<> initializer( "file.csv" );
    const auto& var_names = initializer.variableNames();
    const std::size_t nb_vars = var_names.size();
    gum::learning::DBTranslatorSet<> translator_set;
    gum::learning::DBTranslator4ContinuousVariable<> translator;
    for (std::size_t i = 0; i < nb_vars; ++i) {
      translator_set.insertTranslator(translator, i);
    }
    gum::learning::DatabaseTable<> database(translator_set);

    // create the parser of the database
    gum::learning::DBRowGeneratorSet<> genset;
    gum::learning::DBRowGeneratorParser<> parser(database.handler(), genset);

    // create the record counter
    gum::learning::RecordCounter<> counter(parser);

    // get the counts:
    gum::learning::IdCondSet<> ids ( 0, std::vector<gum::NodeId> {2,1} );
    const std::vector< double >& counts1 = counter.counts ( ids );

    // change the rows from which we compute the counts:
    // they should now be made on rows [500,600) U [1050,1125) U [100,150)
    std::vector<std::pair<std::size_t,std::size_t>> new_ranges
      { std::pair<std::size_t,std::size_t>(500,600),
        std::pair<std::size_t,std::size_t>(1050,1125),
        std::pair<std::size_t,std::size_t>(100,150) };
    counter.setRanges ( new_ranges );
    const std::vector< double >& counts2 = counter.counts ( ids );

Definition at line 127 of file recordCounter.h.

Constructor & Destructor Documentation

◆ RecordCounter() [1/4]

gum::learning::RecordCounter::RecordCounter	(	const DBRowGeneratorParser &	parser,
		const std::vector< std::pair< std::size_t, std::size_t > > &	ranges,
		const Bijection< NodeId, std::size_t > &	nodeId2columns = Bijection< NodeId, std::size_t >() )

default constructor

Parameters

parser	the parser used to parse the database
ranges	a set of pairs {(X1,Y1),...,(Xn,Yn)} of row indices of the database. The counts are then computed only on the union of the rows [Xi,Yi), i in {1,...,n}. This is useful, e.g., when performing cross-validation tasks, in which part of the database should be ignored. An empty set of ranges is equivalent to an interval [X,Y) ranging over the whole database.
nodeId2columns	a mapping from the ids of the nodes in the graphical model to the corresponding columns in the DatabaseTable parsed by the parser. This makes it possible to estimate, from a database in which variable A corresponds to the 2nd column, the parameters of a BN in which variable A has a NodeId of 5. An empty nodeId2columns bijection means that the mapping is the identity, i.e., the value of a NodeId is equal to the index of the column in the DatabaseTable.

Warning: If nodeId2columns is not empty, then only the counts over the ids belonging to this bijection can be computed: applying method counts() over other ids will raise exception NotFound.
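
The mapping convention described for this constructor can be sketched as follows. This is an illustration under stated assumptions rather than aGrUM code: `columnOf` is a hypothetical helper, `std::map` stands in for gum::Bijection, `std::size_t` stands in for gum::NodeId, and `std::out_of_range` stands in for the NotFound exception.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>

using NodeId = std::size_t;  // stand-in for gum::NodeId (assumption)

// Hypothetical helper (not part of aGrUM) mirroring the documented convention:
// an empty mapping is the identity, otherwise ids absent from it are rejected.
std::size_t columnOf(const std::map<NodeId, std::size_t>& node_to_column, NodeId id) {
  if (node_to_column.empty()) return id;  // identity mapping
  const auto it = node_to_column.find(id);
  if (it == node_to_column.end()) {
    // stands in for aGrUM's NotFound exception
    throw std::out_of_range("node id does not belong to the mapping");
  }
  return it->second;
}
```

For the example given in the parameter description, a mapping containing only the pair (5, 1) sends NodeId 5 to the 2nd column (index 1) and rejects every other id.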

References ranges().

Referenced by RecordCounter(), RecordCounter(), clone(), operator=(), and operator=().

◆ RecordCounter() [2/4]

gum::learning::RecordCounter::RecordCounter	(	const DBRowGeneratorParser &	parser,
		const Bijection< NodeId, std::size_t > &	nodeId2columns = Bijection< NodeId, std::size_t >() )

default constructor

Parameters

parser the parser used to parse the database

nodeId2columns a mapping from the ids of the nodes in the graphical model to the corresponding columns in the DatabaseTable parsed by the parser. This makes it possible to estimate, from a database in which variable A corresponds to the 2nd column, the parameters of a BN in which variable A has a NodeId of 5. An empty nodeId2columns bijection means that the mapping is the identity, i.e., the value of a NodeId is equal to the index of the column in the DatabaseTable.

Warning: If nodeId2columns is not empty, then only the counts over the ids belonging to this bijection can be computed: applying method counts() over other ids will raise exception NotFound.

◆ RecordCounter() [3/4]

gum::learning::RecordCounter::RecordCounter ( const RecordCounter & from )

copy constructor

References RecordCounter().

◆ RecordCounter() [4/4]

gum::learning::RecordCounter::RecordCounter ( RecordCounter && from )

move constructor

References RecordCounter().

◆ ~RecordCounter()

gum::learning::RecordCounter::~RecordCounter ( )

override

destructor

Member Function Documentation

◆ clear()

void gum::learning::RecordCounter::clear ( )

clears from memory all the counts produced by the last database parsing

◆ clearRanges()

void gum::learning::RecordCounter::clearRanges ( )

resets the ranges to the single range covering the whole database

◆ clone()

virtual RecordCounter * gum::learning::RecordCounter::clone ( ) const

nodiscard virtual

virtual copy constructor

References RecordCounter().

◆ counts()

const std::vector< double > & gum::learning::RecordCounter::counts	(	const IdCondSet &	ids,
		const bool	check_discrete_vars = false )

returns the counts over all the variables in an IdCondSet

Parameters

ids	the idset of the variables over which we perform counting.
check_discrete_vars	The record counter can only produce correct results on sets of discrete variables. By default, the method does not check whether the variables corresponding to the IdCondSet are actually discrete. If check_discrete_vars is set to true, then this check is performed before computing the counting vector. In this case, if a variable is not discrete, a TypeError exception is raised.

Returns: a vector containing the multidimensional contingency table over all the variables corresponding to the ids passed in argument (both at the left hand side and right hand side of the conditioning bar of the IdCondSet). The first dimension is that of the first variable in the IdCondSet, i.e., when its value increases by 1, the offset in the output vector also increases by 1. The second dimension is that of the second variable in the IdCondSet, i.e., when its value increases by 1, the offset in the output vector increases by the domain size of the first variable. For the third variable, the offset corresponds to the product of the domain sizes of the first two variables, and so on.
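
The layout described above can be made concrete with a short sketch. The helper `flatOffset` and its parameters are hypothetical names introduced here for illustration, and the code only restates the documented rule: each variable's stride is the product of the domain sizes of the variables preceding it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of aGrUM): position, in the flat vector, of the
// cell for the given values. values[i] is the value index of the i-th variable
// of the IdCondSet and domain_sizes[i] is its domain size.
std::size_t flatOffset(const std::vector<std::size_t>& values,
                       const std::vector<std::size_t>& domain_sizes) {
  assert(values.size() == domain_sizes.size());
  std::size_t offset = 0;
  std::size_t stride = 1;  // the first variable moves the offset by 1
  for (std::size_t i = 0; i < values.size(); ++i) {
    assert(values[i] < domain_sizes[i]);
    offset += values[i] * stride;
    stride *= domain_sizes[i];  // product of the domain sizes seen so far
  }
  return offset;
}
```

With domain sizes 2, 3 and 4, the cell for values (1, 2, 3) is at offset 1 + 2*2 + 3*6 = 23, i.e., the last of the 24 cells.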

Warning: The vector returned by the function may differ from one call to another, so care must be taken. For example, code like:

    const std::vector< double >&
      counts = counter.counts(ids);
    counts = counter.counts(other_ids);

may be erroneous because the two calls to method counts() may return references to different vectors. The correct way of using method counts() is always to bind its result to a newly declared reference variable:

    const std::vector< double >& counts =
      counter.counts(ids);
    const std::vector< double >& other_counts =
      counter.counts(other_ids);

Exceptions

TypeError is raised if check_discrete_vars is set to true (i.e., we check that all variables in the IdCondSet are discrete) and if at least one variable is not of a discrete nature.

◆ database()

const DatabaseTable & gum::learning::RecordCounter::database ( ) const

returns the database on which we perform the counts

◆ getNumberOfThreads()

Size gum::ThreadNumberManager::getNumberOfThreads ( ) const

nodiscard override virtual inherited

returns the current max number of threads used by the class containing this ThreadNumberManager

Implements gum::IThreadNumberManager.

Referenced by gum::learning::IBNLearner::createParamEstimator_(), gum::learning::IBNLearner::createScore_(), gum::credal::InferenceEngine< GUM_SCALAR >::dispatchMarginalsToThreads_(), gum::credal::MultipleInferenceEngine< GUM_SCALAR, BNInferenceEngine >::expFusion_(), gum::ScheduledInference::scheduler(), and gum::credal::MultipleInferenceEngine< GUM_SCALAR, BNInferenceEngine >::verticesFusion_().

◆ isGumNumberOfThreadsOverriden()

bool gum::ThreadNumberManager::isGumNumberOfThreadsOverriden ( ) const

nodiscard override virtual inherited

indicates whether the class containing this ThreadNumberManager set its own number of threads

Implements gum::IThreadNumberManager.

Referenced by gum::learning::IBNLearner::createParamEstimator_(), and gum::learning::IBNLearner::createScore_().

◆ minNbRowsPerThread()

std::size_t gum::learning::RecordCounter::minNbRowsPerThread ( ) const

returns the minimum number of rows that each thread should process

◆ nodeId2Columns()

const Bijection< NodeId, std::size_t > & gum::learning::RecordCounter::nodeId2Columns ( ) const

returns the mapping from ids to column positions in the database

Warning: An empty nodeId2Columns bijection means that the mapping is an identity, i.e., the value of a NodeId is equal to the index of the column in the DatabaseTable.

◆ operator=() [1/2]

RecordCounter & gum::learning::RecordCounter::operator= ( const RecordCounter & from )

copy operator

References RecordCounter().

◆ operator=() [2/2]

RecordCounter & gum::learning::RecordCounter::operator= ( RecordCounter && from )

move operator

References RecordCounter().

◆ ranges()

const std::vector< std::pair< std::size_t, std::size_t > > & gum::learning::RecordCounter::ranges ( ) const

returns the current ranges

Referenced by RecordCounter().

◆ setBayesNet()

template<GUM_Numeric GUM_SCALAR>

void gum::learning::RecordCounter::setBayesNet ( const BayesNet< GUM_SCALAR > & new_bn )

assigns a new Bayes net to all the counter's generators that depend on a BN

Typically, generators based on EM or K-means depend on a model to compute their outputs correctly. Method setBayesNet makes it possible to update their BN model.

◆ setMinNbRowsPerThread()

void gum::learning::RecordCounter::setMinNbRowsPerThread ( const std::size_t nb ) const

changes the minimum number of rows a thread should process in a multithreaded context

When method counts() runs several threads to count over the rows of the database, MinNbRowsPerThread indicates the minimum number of rows each thread should process. It is used to compute the number of threads actually run: this number is the minimum of the maximum number of threads allowed and the number of records in the database divided by nb.
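
The rule above can be written down in a few lines. This is only a sketch: `nbThreadsUsed` is a hypothetical helper, and the use of integer (floor) division, the clamping to at least one thread, and the handling of nb = 0 are assumptions that the description does not confirm.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not part of aGrUM) following the stated rule:
// min(max number of threads allowed, number of records / nb).
std::size_t nbThreadsUsed(std::size_t max_nb_threads,
                          std::size_t nb_records,
                          std::size_t min_nb_rows_per_thread) {
  // assumption: treat 0 as 1 to avoid a division by zero
  if (min_nb_rows_per_thread == 0) min_nb_rows_per_thread = 1;
  // assumption: integer division, rounding down
  const std::size_t by_rows = nb_records / min_nb_rows_per_thread;
  // assumption: at least one thread always runs
  return std::max<std::size_t>(1, std::min(max_nb_threads, by_rows));
}
```

Under these assumptions, with at most 8 threads and nb = 100, a database of 250 records would be processed by 2 threads, while one of 1000 records would use all 8.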

◆ setNumberOfThreads()

void gum::learning::RecordCounter::setNumberOfThreads ( Size nb )

override virtual

sets the maximum number of threads that can be used

Parameters

nb	the maximum number of threads to be used. If this number is set to 0, it defaults to aGrUM's maximum number of threads.

Implements gum::IThreadNumberManager.

◆ setRanges()

void gum::learning::RecordCounter::setRanges ( const std::vector< std::pair< std::size_t, std::size_t > > & new_ranges )

sets new ranges to perform the counting

Parameters

new_ranges a set of pairs {(X1,Y1),...,(Xn,Yn)} of row indices of the database. The counts are then computed only on the union of the rows [Xi,Yi), i in {1,...,n}. This is useful, e.g., when performing cross-validation tasks, in which part of the database should be ignored. An empty set of ranges is equivalent to an interval [X,Y) ranging over the whole database.
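
To illustrate what counting over a union of half-open row ranges means, here is a minimal single-threaded sketch over one discrete column. It does not use aGrUM: `countInRanges`, the `Range` alias, and the plain vector standing in for a database column are all assumptions made for the example, as is the choice to truncate ranges that extend past the last row.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

using Range = std::pair<std::size_t, std::size_t>;  // half-open interval [first, second)

// Hypothetical helper (not part of aGrUM): counts the occurrences of each value
// of a single discrete column, looking only at the rows covered by `ranges`.
// An empty set of ranges means the whole column. Ranges are assumed not to
// overlap; a range extending past the last row is truncated (assumption).
std::vector<double> countInRanges(const std::vector<std::size_t>& column,
                                  std::size_t                     domain_size,
                                  std::vector<Range>              ranges) {
  if (ranges.empty()) ranges.emplace_back(0, column.size());
  std::vector<double> counts(domain_size, 0.0);
  for (const Range& r : ranges) {
    const std::size_t end = std::min(r.second, column.size());
    for (std::size_t row = r.first; row < end; ++row) {
      counts[column[row]] += 1.0;
    }
  }
  return counts;
}
```

In a cross-validation setting, the ranges passed for training would cover every row except those of the held-out fold, and the ranges need not be given in increasing order.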

Member Data Documentation

◆ _nb_threads_

Size gum::ThreadNumberManager::_nb_threads_ {0}

private inherited

the max number of threads used by the class

Definition at line 126 of file threadNumberManager.h.

The documentation for this class was generated from the following file:

agrum/base/stattests/recordCounter.h
