the class for computing the NML penalty used by MIIC

#include <kNML.h>

Public Member Functions
Constructors / Destructors
	KNML (const DBRowGeneratorParser &parser, const Prior &prior, const std::vector< std::pair< std::size_t, std::size_t > > &ranges, const Bijection< NodeId, std::size_t > &nodeId2columns=Bijection< NodeId, std::size_t >())
	default constructor
	KNML (const DBRowGeneratorParser &parser, const Prior &prior, const Bijection< NodeId, std::size_t > &nodeId2columns=Bijection< NodeId, std::size_t >())
	default constructor
	KNML (const KNML &from)
	copy constructor
	KNML (KNML &&from)
	move constructor
virtual KNML *	clone () const
	virtual copy constructor
virtual	~KNML ()
	destructor
Operators
KNML &	operator= (const KNML &from)
	copy operator
KNML &	operator= (KNML &&from)
	move operator
Accessors / Modifiers
virtual void	clear ()
	clears all the data structures from memory, including the C_n^r cache
virtual void	clearCache ()
	clears the current C_n^r cache
virtual void	useCache (const bool on_off)
	turn on/off the use of the C_n^r cache
virtual void	setNumberOfThreads (Size nb)
	changes the max number of threads used to parse the database
virtual Size	getNumberOfThreads () const
	returns the number of threads used to parse the database
virtual bool	isGumNumberOfThreadsOverriden () const
	indicates whether the user explicitly set the number of threads
virtual void	setMinNbRowsPerThread (const std::size_t nb) const
	changes the minimum number of rows a thread should process in a multithreading context
virtual std::size_t	minNbRowsPerThread () const
	returns the minimum number of rows that each thread should process
void	setRanges (const std::vector< std::pair< std::size_t, std::size_t > > &new_ranges)
	sets new ranges to perform the counts used by kNML
void	clearRanges ()
	reset the ranges to the one range corresponding to the whole database
const std::vector< std::pair< std::size_t, std::size_t > > &	ranges () const
	returns the current ranges
double	score (const NodeId var1, const NodeId var2)
	the scores
double	score (const NodeId var1, const NodeId var2, const std::vector< NodeId > &rhs_ids)
	the scores
const Bijection< NodeId, std::size_t > &	nodeId2Columns () const
	return the mapping between the columns of the database and the node ids
const DatabaseTable &	database () const
	return the database used by the score

Protected Member Functions
virtual double	score_ (const IdCondSet &idset) final
	returns the score for a given IdCondSet

Private Member Functions
std::vector< double >	marginalize_ (const std::size_t node_2_marginalize, const std::size_t X_size, const std::size_t Y_size, const std::size_t Z_size, const std::vector< double > &N_xyz) const
	returns a counting vector where variables are marginalized from N_xyz

Private Attributes
const double	one_log2_ {M_LOG2E}
	1 / log(2)
Prior *	prior_ {nullptr}
	the expert-knowledge prior that we add to the contingency tables
RecordCounter	counter_
	the record counter used for the counts over discrete variables
ScoringCache	cache_
	the scoring cache
bool	use_cache_ {true}
	a Boolean indicating whether we wish to use the cache
const std::vector< NodeId >	empty_ids_
	an empty vector

Detailed Description

the class for computing the NML penalty used by MIIC

Definition at line 67 of file kNML.h.

Constructor & Destructor Documentation

◆ KNML() [1/4]

gum::learning::KNML::KNML	(	const DBRowGeneratorParser &	parser,
		const Prior &	prior,
		const std::vector< std::pair< std::size_t, std::size_t > > &	ranges,
		const Bijection< NodeId, std::size_t > &	nodeId2columns = Bijection< NodeId, std::size_t >() )

default constructor

Parameters

parser	the parser used to parse the database
prior	A prior that we add to the computation of the score (this should come from expert knowledge): this consists in adding numbers to counts in the contingency tables
ranges	a set of pairs {(X1,Y1),...,(Xn,Yn)} of database row indices. The counts are then performed only on the union of the rows [Xi,Yi), i in {1,...,n}. This is useful, e.g., when performing cross-validation tasks, in which part of the database should be ignored. An empty set of ranges is equivalent to an interval [X,Y) ranging over the whole database.
nodeId2columns	a mapping from the ids of the nodes in the graphical model to the corresponding column in the DatabaseTable parsed by the parser. This enables estimating from a database in which variable A corresponds to the 2nd column the parameters of a BN in which variable A has a NodeId of 5. An empty nodeId2columns bijection means that the mapping is an identity, i.e., the value of a NodeId is equal to the index of the column in the DatabaseTable.

Warning: If nodeId2columns is not empty, then only the scores over the ids belonging to this bijection can be computed: applying method score() over other ids will raise exception NotFound.

References ranges().

Referenced by KNML(), KNML(), clone(), operator=(), and operator=().

◆ KNML() [2/4]

gum::learning::KNML::KNML	(	const DBRowGeneratorParser &	parser,
		const Prior &	prior,
		const Bijection< NodeId, std::size_t > &	nodeId2columns = Bijection< NodeId, std::size_t >() )

default constructor

Parameters

parser	the parser used to parse the database
prior	A prior that we add to the computation of the score (this should come from expert knowledge): this consists in adding numbers to counts in the contingency tables
nodeId2columns	a mapping from the ids of the nodes in the graphical model to the corresponding column in the DatabaseTable parsed by the parser. This enables estimating from a database in which variable A corresponds to the 2nd column the parameters of a BN in which variable A has a NodeId of 5. An empty nodeId2columns bijection means that the mapping is an identity, i.e., the value of a NodeId is equal to the index of the column in the DatabaseTable.

Warning: If nodeId2columns is not empty, then only the scores over the ids belonging to this bijection can be computed: applying method score() over other ids will raise exception NotFound.

◆ KNML() [3/4]

gum::learning::KNML::KNML ( const KNML & from )

copy constructor

References KNML().

◆ KNML() [4/4]

gum::learning::KNML::KNML ( KNML && from )

move constructor

References KNML().

◆ ~KNML()

virtual gum::learning::KNML::~KNML ( )

virtual

destructor

Member Function Documentation

◆ clear()

virtual void gum::learning::KNML::clear ( )

virtual

clears all the data structures from memory, including the C_n^r cache

Reimplemented from gum::learning::IndependenceTest.

◆ clearCache()

virtual void gum::learning::KNML::clearCache ( )

virtual

clears the current C_n^r cache

Reimplemented from gum::learning::IndependenceTest.

◆ clearRanges()

void gum::learning::IndependenceTest::clearRanges ( )

reset the ranges to the one range corresponding to the whole database

◆ clone()

virtual KNML * gum::learning::KNML::clone ( ) const

virtual

virtual copy constructor

Implements gum::learning::IndependenceTest.

References KNML().

◆ database()

const DatabaseTable & gum::learning::IndependenceTest::database ( ) const

return the database used by the score

◆ getNumberOfThreads()

virtual Size gum::learning::IndependenceTest::getNumberOfThreads ( ) const

virtual

returns the number of threads used to parse the database

Reimplemented from gum::learning::IndependenceTest.

◆ isGumNumberOfThreadsOverriden()

virtual bool gum::learning::IndependenceTest::isGumNumberOfThreadsOverriden ( ) const

virtual

indicates whether the user explicitly set the number of threads

Reimplemented from gum::learning::IndependenceTest.

◆ marginalize_()

std::vector< double > gum::learning::IndependenceTest::marginalize_	(	const std::size_t	node_2_marginalize,
		const std::size_t	X_size,
		const std::size_t	Y_size,
		const std::size_t	Z_size,
		const std::vector< double > &	N_xyz ) const

protected inherited

returns a counting vector where variables are marginalized from N_xyz

Parameters

node_2_marginalize	indicates which node(s) shall be marginalized: 0 means that X should be marginalized, 1 means that Y should be marginalized, and 2 means that Z should be marginalized
X_size	the domain size of variable X
Y_size	the domain size of variable Y
Z_size	the domain size of the set of conditioning variables Z
N_xyz	a counting vector of dimension X * Y * Z (in this order)
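
The marginalization described above can be sketched in isolation. This is an illustration only, not the aGrUM implementation: the name `marginalize_sketch` is hypothetical, and the sketch assumes that N_xyz is stored with X varying fastest, then Y, then Z, i.e. index(x, y, z) = x + X_size * (y + Y_size * z). The layout actually used by the library may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-alone sketch; not the aGrUM implementation.
// Assumed layout: index(x, y, z) = x + X_size * (y + Y_size * z).
std::vector<double> marginalize_sketch(std::size_t node_2_marginalize,
                                       std::size_t X_size,
                                       std::size_t Y_size,
                                       std::size_t Z_size,
                                       const std::vector<double>& N_xyz) {
  assert(node_2_marginalize <= 2);
  assert(N_xyz.size() == X_size * Y_size * Z_size);

  // The result spans the two variables that are NOT marginalized.
  std::size_t out_size = 0;
  switch (node_2_marginalize) {
    case 0: out_size = Y_size * Z_size; break;   // sum over X -> N_yz
    case 1: out_size = X_size * Z_size; break;   // sum over Y -> N_xz
    default: out_size = X_size * Y_size; break;  // sum over Z -> N_xy
  }
  std::vector<double> result(out_size, 0.0);

  for (std::size_t z = 0; z < Z_size; ++z) {
    for (std::size_t y = 0; y < Y_size; ++y) {
      for (std::size_t x = 0; x < X_size; ++x) {
        const double n = N_xyz[x + X_size * (y + Y_size * z)];
        switch (node_2_marginalize) {
          case 0: result[y + Y_size * z] += n; break;
          case 1: result[x + X_size * z] += n; break;
          default: result[x + X_size * y] += n; break;
        }
      }
    }
  }
  return result;
}
```

Under the assumed layout, marginalizing X from the counts {1,...,8} with all three domain sizes equal to 2 sums adjacent pairs of entries.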

◆ minNbRowsPerThread()

virtual std::size_t gum::learning::IndependenceTest::minNbRowsPerThread ( ) const

virtual

returns the minimum number of rows that each thread should process

Reimplemented from gum::learning::IndependenceTest.

◆ nodeId2Columns()

const Bijection< NodeId, std::size_t > & gum::learning::IndependenceTest::nodeId2Columns ( ) const

return the mapping between the columns of the database and the node ids

Warning: An empty nodeId2Columns bijection means that the mapping is an identity, i.e., the value of a NodeId is equal to the index of the column in the DatabaseTable.

◆ operator=() [1/2]

KNML & gum::learning::KNML::operator= ( const KNML & from )

copy operator

References KNML().

◆ operator=() [2/2]

KNML & gum::learning::KNML::operator= ( KNML && from )

move operator

References KNML(), gum::learning::IndependenceTest::clearRanges(), gum::learning::IndependenceTest::getNumberOfThreads(), gum::learning::IndependenceTest::isGumNumberOfThreadsOverriden(), gum::learning::IndependenceTest::minNbRowsPerThread(), gum::learning::IndependenceTest::ranges(), gum::learning::IndependenceTest::score(), gum::learning::IndependenceTest::setMinNbRowsPerThread(), gum::learning::IndependenceTest::setNumberOfThreads(), and gum::learning::IndependenceTest::setRanges().

◆ ranges()

const std::vector< std::pair< std::size_t, std::size_t > > & gum::learning::IndependenceTest::ranges ( ) const

returns the current ranges

Referenced by KNML().

◆ score() [1/2]

double gum::learning::IndependenceTest::score	(	const NodeId	var1,
		const NodeId	var2 )

the scores

◆ score() [2/2]

double gum::learning::IndependenceTest::score	(	const NodeId	var1,
		const NodeId	var2,
		const std::vector< NodeId > &	rhs_ids )

the scores

◆ score_()

virtual double gum::learning::KNML::score_ ( const IdCondSet & idset )

final protected virtual

returns the score for a given IdCondSet

Exceptions

OperationNotAllowed is raised if the score does not support calling method score on such an idset (due to too many/too few variables in the left hand side or the right hand side of the idset).

Implements gum::learning::IndependenceTest.

◆ setMinNbRowsPerThread()

virtual void gum::learning::IndependenceTest::setMinNbRowsPerThread ( const std::size_t nb ) const

virtual

changes the minimum number of rows a thread should process in a multithreading context

When computing scores, record counters use several threads to perform counts on the rows of the database. The setMinNbRowsPerThread method specifies the minimum number of rows each thread should process. This value is used to compute the number of threads actually run, which is the minimum of the maximum number of threads allowed and the number of records in the database divided by nb.
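
The thread-count rule stated above can be written out as a small function. This is a sketch under assumptions, not aGrUM code: the name `threads_actually_run` is hypothetical, the division is taken to be integer division, and the floor of one thread for very small databases is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper illustrating the documented rule; not part of aGrUM.
// Assumptions: integer division, and never fewer than one thread.
std::size_t threads_actually_run(std::size_t max_threads,
                                 std::size_t nb_records,
                                 std::size_t min_rows_per_thread) {
  assert(min_rows_per_thread > 0);
  // How many threads can each receive at least min_rows_per_thread rows.
  const std::size_t by_rows = nb_records / min_rows_per_thread;
  return std::max<std::size_t>(1, std::min(max_threads, by_rows));
}
```

For example, with at most 8 threads and a minimum of 100 rows per thread, a database of 250 records would be processed by 2 threads under these assumptions.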

Reimplemented from gum::learning::IndependenceTest.

◆ setNumberOfThreads()

virtual void gum::learning::IndependenceTest::setNumberOfThreads ( Size nb )

virtual

changes the max number of threads used to parse the database

Reimplemented from gum::learning::IndependenceTest.

◆ setRanges()

void gum::learning::IndependenceTest::setRanges ( const std::vector< std::pair< std::size_t, std::size_t > > & new_ranges )

sets new ranges to perform the counts used by kNML

Parameters

new_ranges	a set of pairs {(X1,Y1),...,(Xn,Yn)} of database row indices. The counts are then performed only on the union of the rows [Xi,Yi), i in {1,...,n}. This is useful, e.g., when performing cross-validation tasks, in which part of the database should be ignored. An empty set of ranges is equivalent to an interval [X,Y) ranging over the whole database.
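
For the cross-validation use case mentioned above, the ranges for one fold might be built as follows. This is a minimal sketch: `RowRanges` and `training_ranges` are hypothetical names, and the even split into contiguous folds is an assumption. Only the half-open [Xi,Yi) convention and the vector-of-pairs type come from the documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using RowRanges = std::vector<std::pair<std::size_t, std::size_t>>;

// Hypothetical helper: rows to count over when fold `fold` (0-based) of a
// k-fold split is held out. Each pair is a half-open interval [begin, end).
RowRanges training_ranges(std::size_t nb_rows, std::size_t k, std::size_t fold) {
  // k >= 2 matters: with a single fold the result would be empty, and an
  // empty set of ranges means "the whole database", not "no rows".
  assert(k >= 2 && fold < k);

  const std::size_t test_begin = nb_rows * fold / k;
  const std::size_t test_end   = nb_rows * (fold + 1) / k;

  RowRanges ranges;
  if (test_begin > 0) ranges.emplace_back(std::size_t{0}, test_begin);
  if (test_end < nb_rows) ranges.emplace_back(test_end, nb_rows);
  return ranges;
}
```

The returned vector has the type expected by setRanges, so a call such as `knml.setRanges(training_ranges(nb_rows, 5, 2))` would restrict the counts to the rows outside the third fold (here `knml` and `nb_rows` are placeholders).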

◆ useCache()

virtual void gum::learning::KNML::useCache ( const bool on_off )

virtual

turn on/off the use of the C_n^r cache

Reimplemented from gum::learning::IndependenceTest.

References gum::learning::IndependenceTest::database(), and gum::learning::IndependenceTest::nodeId2Columns().

Member Data Documentation

◆ cache_

ScoringCache gum::learning::IndependenceTest::cache_

protected inherited

the scoring cache

Definition at line 222 of file independenceTest.h.

◆ counter_

RecordCounter gum::learning::IndependenceTest::counter_

protected inherited

the record counter used for the counts over discrete variables

Definition at line 219 of file independenceTest.h.

◆ empty_ids_

const std::vector< NodeId > gum::learning::IndependenceTest::empty_ids_

protected inherited

an empty vector

Definition at line 228 of file independenceTest.h.

◆ one_log2_

const double gum::learning::IndependenceTest::one_log2_ {M_LOG2E}

protected inherited

1 / log(2)

Definition at line 213 of file independenceTest.h.
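
The constant M_LOG2E equals log2(e), which is 1 / ln(2); multiplying a natural logarithm by it yields a base-2 logarithm. The sketch below illustrates that conversion. The names `kOneLog2` and `log2_from_ln` are hypothetical, and the value is written out literally because M_LOG2E is not guaranteed by the C++ standard on every toolchain. How the class itself uses one_log2_ is not shown in this page.

```cpp
#include <cassert>
#include <cmath>

// log2(e) = 1 / ln(2), the value conventionally given to M_LOG2E.
constexpr double kOneLog2 = 1.4426950408889634;

// Hypothetical helper: base-2 logarithm obtained from the natural logarithm.
double log2_from_ln(double x) {
  assert(x > 0.0);
  return std::log(x) * kOneLog2;
}
```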

◆ prior_

Prior* gum::learning::IndependenceTest::prior_ {nullptr}

protected inherited

the expert-knowledge prior that we add to the contingency tables

Definition at line 216 of file independenceTest.h.

◆ use_cache_

bool gum::learning::IndependenceTest::use_cache_ {true}

protected inherited

a Boolean indicating whether we wish to use the cache

Definition at line 225 of file independenceTest.h.

The documentation for this class was generated from the following file:

agrum/base/stattests/kNML.h
