aGrUM 2.3.2
a C++ library for (probabilistic) graphical models
gum::learning::DBTranslatorSet Class Reference final

the class for packing together the translators used to preprocess the datasets

#include <agrum/base/database/DBTranslatorSet.h>

Public Member Functions

Constructors / Destructors
 DBTranslatorSet ()
 default constructor
 DBTranslatorSet (const DBTranslatorSet &from)
 copy constructor
 DBTranslatorSet (DBTranslatorSet &&from)
 move constructor
virtual DBTranslatorSet * clone () const
 virtual copy constructor
virtual ~DBTranslatorSet ()
 destructor
Operators
DBTranslatorSet & operator= (const DBTranslatorSet &from)
 copy operator
DBTranslatorSet & operator= (DBTranslatorSet &&from)
 move operator
DBTranslator & operator[] (const std::size_t k)
 returns the kth translator
const DBTranslator & operator[] (const std::size_t k) const
 returns the kth translator
Accessors / Modifiers
std::size_t insertTranslator (const DBTranslator &translator, const std::size_t column, const bool unique_column=true)
 inserts a new translator at the end of the translator set
std::size_t insertTranslator (const Variable &var, const std::size_t column, const std::vector< std::string > &missing_symbols, const bool unique_column=true)
 inserts a new translator for a given variable at the end of the translator set
std::size_t insertTranslator (const Variable &var, const std::size_t column, const bool unique_column=true)
 inserts a new translator for a given variable at the end of the translator set
template<class Translator>
void changeTranslator (const Translator &new_translator, const std::size_t pos)
 substitute a translator by another one
void eraseTranslator (const std::size_t k, const bool k_is_input_col=false)
 erases either the kth translator or those parsing the kth column of the input database
DBTranslator & translator (const std::size_t k)
 returns the kth translator
const DBTranslator & translator (const std::size_t k) const
 returns the kth translator
DBTranslator & translatorSafe (const std::size_t k)
 returns the kth translator
const DBTranslator & translatorSafe (const std::size_t k) const
 returns the kth translator
DBTranslatedValue translate (const std::vector< std::string > &row, const std::size_t k) const
 ask the kth translator to translate a string in a row of the database
DBTranslatedValue translateSafe (const std::vector< std::string > &row, const std::size_t k) const
 similar to method translate, except that it checks that the kth translator exists
std::string translateBack (const DBTranslatedValue translated_val, const std::size_t k) const
 returns the original string that was translated into translated_val
std::string translateBackSafe (const DBTranslatedValue translated_val, const std::size_t k) const
 similar to method translateBack, except that it checks that the kth translator exists
bool isMissingValue (const DBTranslatedValue translated_val, const std::size_t k) const
 indicates whether the kth translator considers a translated_val as a missing value
bool isMissingValueSafe (const DBTranslatedValue translated_val, const std::size_t k) const
 similar to method isMissingValue, except that it checks that the kth translator exists
std::size_t domainSize (const std::size_t k) const
 returns the domain size of the variable stored into the kth translator
std::size_t domainSizeSafe (const std::size_t k) const
 returns the domain size of the variable stored into the kth translator
const Variable & variable (const std::size_t k) const
 returns the variable stored into the kth translator
const Variable & variableSafe (const std::size_t k) const
 returns the variable stored into the kth translator
bool needsReordering (const std::size_t k) const
 indicates whether a reordering is needed to make the kth translator sorted
bool needsReorderingSafe (const std::size_t k) const
 same as method needsReordering but checks that the kth translator exists
HashTable< std::size_t, std::size_t > reorder (const std::size_t k)
 performs a reordering of the dictionary and returns a mapping from the old translated values to the new ones.
HashTable< std::size_t, std::size_t > reorderSafe (const std::size_t k)
 same as method reorder but checks that the kth translator exists
std::size_t inputColumn (const std::size_t k) const
 returns the column of the input database that will be read by the kth translator
std::size_t inputColumnSafe (const std::size_t k) const
 returns the column of the input database that will be read by the kth translator
std::size_t highestInputColumn () const
 returns the largest input database column index read by the translators
void clear ()
 remove all the translators
std::size_t nbTranslators () const
 returns the number of translators stored into the set
std::size_t size () const
 returns the number of translators stored into the set
const std::vector< DBTranslator * > & translators () const
 returns the set of translators

Detailed Description

the class for packing together the translators used to preprocess the datasets

When learning Bayesian networks, the records of the training dataset are used to construct contingency tables that are exploited either in statistical conditional independence tests or in scores. In both cases, the values observed in the records must be translated into indices in the finite domain of the corresponding random variables. The DBTranslator classes are used for this purpose. To make the parsing of all the columns of the dataset easier, all the DBTranslator instances used are gathered into a DBTranslatorSet.
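
To make the idea of "translating observed values into indices" concrete, here is a minimal, self-contained sketch. It does not use aGrUM: ToyLabelTranslator and its members are hypothetical names introduced only for illustration, and the real DBTranslator classes may work differently. The only behavior borrowed from this page is that a missing symbol is mapped to the largest std::size_t.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a translator of labelized variables.
// This is NOT an aGrUM class, only an illustration of the mapping.
class ToyLabelTranslator {
  public:
  explicit ToyLabelTranslator(std::vector< std::string > missing_symbols) :
      missing_(std::move(missing_symbols)) {}

  // Returns the index of the label in the variable's domain, adding the
  // label to the dictionary the first time it is seen. Missing symbols
  // are mapped to the largest std::size_t.
  std::size_t translate(const std::string& cell) {
    for (const auto& symbol: missing_)
      if (cell == symbol) return std::numeric_limits< std::size_t >::max();
    const auto it = index_of_.find(cell);
    if (it != index_of_.end()) return it->second;
    const std::size_t new_index = labels_.size();
    index_of_.emplace(cell, new_index);
    labels_.push_back(cell);
    return new_index;
  }

  // Inverse mapping: the label that was translated into index.
  const std::string& translateBack(std::size_t index) const {
    assert(index < labels_.size());   // debug-only sanity check
    return labels_[index];
  }

  // Number of distinct labels seen so far.
  std::size_t domainSize() const { return labels_.size(); }

  private:
  std::vector< std::string >                     missing_;
  std::vector< std::string >                     labels_;
  std::unordered_map< std::string, std::size_t > index_of_;
};
```

In this sketch, the indices returned by translate() are exactly the positions that would be used to address a cell of a contingency table.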

Here is an example of how to use this class:
// create an empty translator set
DBTranslatorSet set;
std::vector<std::string> missing { "?", "N/A", "???" };
// create the translators and add them to the translator set. First,
// create translator1 that will perform its translations on Column 1
// of the dataset (columns start from index 0)
DBTranslator4LabelizedVariable translator1 ( missing, 3 );
std::size_t pos1 = set.insertTranslator ( translator1, 1 );
// currently, pos1 is equal to 0, that is, translator1 is the first
// translator in the translator set
// create a translator handling Column 0 of the dataset
DBTranslator4ContinuousVariable translator0;
std::size_t pos0 = set.insertTranslator ( translator0, 0 );
// translator0 has been inserted into the translator set at position pos0.
// pos0 = 0 because translators are sorted by increasing column order in
// the translator set. So, now, in the set, the first translator is
// translator0 and the second one is translator1.
DBTranslator4LabelizedVariable translator2;
std::size_t pos2 = set.insertTranslator ( translator2, 2 );
// the set contains { translator0, translator1, translator2 }, in this order
// parsing the rows of the dataset
std::vector<std::string> row1 { ".33", "toto", "titi" };
float val0 = set.translate ( row1, 0 ).cont_val; // val0 = 0.33f
std::size_t val1 = set.translate ( row1, 1 ).discr_val; // val1 = 0 (toto)
std::size_t val2 = set.translate ( row1, 2 ).discr_val; // val2 = 0 (titi)
std::vector<std::string> row2 { "4.22x", "???", "??" };
val0 = set.translate ( row2, 0 ).cont_val; // raises gum::TypeError
val1 = set.translate ( row2, 1 ).discr_val;
// = std::numeric_limits<std::size_t>::max ()
val2 = set.translate ( row2, 2 ).discr_val; // = 1 (??)
// with method translateSafe, an exception is raised whenever we try to
// translate a column that is not taken into account by the translators
set.translateSafe ( row2, 3 ); // raises gum::UndefinedElement

Definition at line 130 of file DBTranslatorSet.h.

Constructor & Destructor Documentation

◆ DBTranslatorSet() [1/3]

gum::learning::DBTranslatorSet::DBTranslatorSet ( )

default constructor

Referenced by DBTranslatorSet(), DBTranslatorSet(), clone(), operator=(), and operator=().

◆ DBTranslatorSet() [2/3]

gum::learning::DBTranslatorSet::DBTranslatorSet ( const DBTranslatorSet & from)

copy constructor

References DBTranslatorSet().

◆ DBTranslatorSet() [3/3]

gum::learning::DBTranslatorSet::DBTranslatorSet ( DBTranslatorSet && from)

move constructor

References DBTranslatorSet().

◆ ~DBTranslatorSet()

virtual gum::learning::DBTranslatorSet::~DBTranslatorSet ( )

destructor

Member Function Documentation

◆ changeTranslator()

template<class Translator>
void gum::learning::DBTranslatorSet::changeTranslator ( const Translator & new_translator,
const std::size_t pos )

substitute a translator by another one

Parameters
new_translator: the new translator, copied at index pos of the DBTranslatorSet
pos: the position where the new translator should replace the old one.

◆ clear()

void gum::learning::DBTranslatorSet::clear ( )

remove all the translators

◆ clone()

virtual DBTranslatorSet * gum::learning::DBTranslatorSet::clone ( ) const

virtual copy constructor

References DBTranslatorSet().

◆ domainSize()

std::size_t gum::learning::DBTranslatorSet::domainSize ( const std::size_t k) const

returns the domain size of the variable stored into the kth translator

Warning
this method assumes that the kth translator exists, i.e., that k is smaller than the number of translators, and does not check it. If unsure, use method domainSizeSafe, which performs this check.

◆ domainSizeSafe()

std::size_t gum::learning::DBTranslatorSet::domainSizeSafe ( const std::size_t k) const

returns the domain size of the variable stored into the kth translator

Exceptions
UndefinedElement is raised if the kth translator does not exist, i.e., if k is greater than or equal to the number of translators in the translator set.
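
The unchecked/Safe pairing used throughout this class can be sketched as follows. This is a standalone illustration, not aGrUM code: ToyDomainSizes is a hypothetical container, and std::out_of_range merely stands in for gum::UndefinedElement.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical container storing one domain size per translator.
// This is NOT the aGrUM implementation.
class ToyDomainSizes {
  public:
  explicit ToyDomainSizes(std::vector< std::size_t > sizes) : sizes_(std::move(sizes)) {}

  // Unchecked access: the caller guarantees that k < sizes_.size().
  std::size_t domainSize(std::size_t k) const { return sizes_[k]; }

  // Checked access: throws when there is no kth entry.
  // std::out_of_range stands in for gum::UndefinedElement.
  std::size_t domainSizeSafe(std::size_t k) const {
    if (k >= sizes_.size()) throw std::out_of_range("no translator at this index");
    return sizes_[k];
  }

  private:
  std::vector< std::size_t > sizes_;
};
```

The unchecked variant suits inner loops where k is known to be valid; the Safe variant is the one to call when k comes from user input.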

◆ eraseTranslator()

void gum::learning::DBTranslatorSet::eraseTranslator ( const std::size_t k,
const bool k_is_input_col = false )

erases either the kth translator or those parsing the kth column of the input database

DBTranslatorSets do not necessarily read all the columns of their input database. For instance, a CSV file may contain 10 columns, but the DBTranslatorSet may only contain two translators reading columns 3 and 5 respectively. When k_is_input_col is set to false, k is the index of a translator within the DBTranslatorSet, i.e., either 0 or 1 in this example. When k_is_input_col is set to true, k is a column of the input database, and the translators erased are those that parse this column (when several translators parse column k, all of them are removed).

Warning
if the translator does not exist, nothing is done. In particular, no exception is raised.
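
The two meanings of k can be illustrated with a small standalone model. ToyEraseSet is a hypothetical type that only stores the input column read by each translator; it is a sketch of the documented behavior, not the aGrUM implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a translator set: columns[i] is the input column
// read by the translator stored at position i.
struct ToyEraseSet {
  std::vector< std::size_t > columns;

  void eraseTranslator(std::size_t k, bool k_is_input_col = false) {
    if (!k_is_input_col) {
      // k is a position within the set; invalid positions are ignored
      if (k < columns.size())
        columns.erase(columns.begin() + static_cast< std::ptrdiff_t >(k));
    } else {
      // k is a column of the input database: remove every translator
      // that reads this column
      columns.erase(std::remove(columns.begin(), columns.end(), k), columns.end());
    }
  }
};
```

With translators reading columns 3, 5 and 3, eraseTranslator(1) removes the translator reading column 5, whereas eraseTranslator(3, true) removes both translators reading column 3.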

◆ highestInputColumn()

std::size_t gum::learning::DBTranslatorSet::highestInputColumn ( ) const

returns the largest input database column index read by the translators

◆ inputColumn()

std::size_t gum::learning::DBTranslatorSet::inputColumn ( const std::size_t k) const

returns the column of the input database that will be read by the kth translator

Warning
this method assumes that the kth translator exists, i.e., that k is smaller than the number of translators, and does not check it. If unsure, use method inputColumnSafe, which performs this check.

◆ inputColumnSafe()

std::size_t gum::learning::DBTranslatorSet::inputColumnSafe ( const std::size_t k) const

returns the column of the input database that will be read by the kth translator

Exceptions
UndefinedElement is raised if the kth translator does not exist, i.e., if k is greater than or equal to the number of translators in the translator set.

◆ insertTranslator() [1/3]

std::size_t gum::learning::DBTranslatorSet::insertTranslator ( const DBTranslator & translator,
const std::size_t column,
const bool unique_column = true )

inserts a new translator at the end of the translator set

Parameters
translator: a translator that will be copied into the translator set
column: the index of the column that this new translator should read in the database.
unique_column: when set to true, the column may be read by only one translator; when set to false, several translators may read it.
Returns
the position of the translator within the translator set.
Exceptions
DuplicateElement is raised if there already exists a translator reading the column passed in argument and the unique_column argument is set to true.

References translator().

Referenced by gum::learning::readFile(), and gum::learning::IBNLearner::readFile_().

◆ insertTranslator() [2/3]

std::size_t gum::learning::DBTranslatorSet::insertTranslator ( const Variable & var,
const std::size_t column,
const bool unique_column = true )

inserts a new translator for a given variable at the end of the translator set

Parameters
var: the variable that will be contained into the translator
column: the index of the column that this new translator should read in the database.
unique_column: when set to true, the column may be read by only one translator; when set to false, several translators may read it.
Exceptions
DuplicateElement is raised if there already exists a translator reading the column passed in argument and the unique_column argument is set to true.

◆ insertTranslator() [3/3]

std::size_t gum::learning::DBTranslatorSet::insertTranslator ( const Variable & var,
const std::size_t column,
const std::vector< std::string > & missing_symbols,
const bool unique_column = true )

inserts a new translator for a given variable at the end of the translator set

Parameters
var: the variable that will be contained into the translator
column: the index of the column that this new translator should read in the database.
missing_symbols: the set of symbols in the database representing missing values
unique_column: when set to true, the column may be read by only one translator; when set to false, several translators may read it.
Exceptions
DuplicateElement is raised if there already exists a translator reading the column passed in argument and the unique_column argument is set to true.
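
The role of unique_column can be sketched with a standalone model. ToyInsertSet is hypothetical and is not aGrUM code; std::invalid_argument stands in for gum::DuplicateElement, and the sketch assumes, as the brief description states, that the new translator is appended at the end of the set.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical model: columns[i] is the input column read by translator i.
struct ToyInsertSet {
  std::vector< std::size_t > columns;

  // Appends a translator reading the given column and returns its position.
  // std::invalid_argument stands in for gum::DuplicateElement.
  std::size_t insertTranslator(std::size_t column, bool unique_column = true) {
    const bool already_read
        = std::find(columns.begin(), columns.end(), column) != columns.end();
    if (unique_column && already_read)
      throw std::invalid_argument("column already read by a translator");
    columns.push_back(column);
    return columns.size() - 1;
  }
};
```

Passing unique_column = false is what allows the same database column to be parsed by several translators.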

◆ isMissingValue()

bool gum::learning::DBTranslatorSet::isMissingValue ( const DBTranslatedValue translated_val,
const std::size_t k ) const

indicates whether the kth translator considers a translated_val as a missing value

Parameters
translated_val: the value that we compare to the translation of a missing value
k: the index of the translator that performed the translation
Warning
this method assumes that the kth translator exists, i.e., that k is smaller than the number of translators, and does not check it. If unsure, use method isMissingValueSafe, which performs this check.

◆ isMissingValueSafe()

bool gum::learning::DBTranslatorSet::isMissingValueSafe ( const DBTranslatedValue translated_val,
const std::size_t k ) const

similar to method isMissingValue, except that it checks that the kth translator exists

Parameters
translated_val: the value that we compare to the translation of a missing value
k: the index of the translator that performed the translation
Exceptions
UndefinedElement is raised if the kth translator does not exist, i.e., if k is greater than or equal to the number of translators in the translator set.

◆ nbTranslators()

std::size_t gum::learning::DBTranslatorSet::nbTranslators ( ) const

returns the number of translators stored into the set

◆ needsReordering()

bool gum::learning::DBTranslatorSet::needsReordering ( const std::size_t k) const

indicates whether a reordering is needed to make the kth translator sorted

For a given translator, if all the strings represented by the translations are numbers, the translations are considered to be sorted if and only if they are sorted by increasing number. Otherwise, the translations are considered to be sorted if and only if they are sorted lexicographically.

When constructing its dictionary dynamically, the translator may assign DBTranslatedValue values to strings in an undesired order. For instance, a translator sequentially reading integer strings 4, 1, 3 may map 4 into DBTranslatedValue{std::size_t(0)}, 1 into DBTranslatedValue{std::size_t(1)} and 3 into DBTranslatedValue{std::size_t(2)}, resulting in random variables having domain {4,1,3}. The user may prefer having domain {1,3,4}, i.e., a domain specified with increasing values. This requires a reordering. Method needsReordering() returns a Boolean indicating whether such a reordering should be performed or whether the current order is OK.

Warning
this method assumes that the kth translator exists, i.e., that k is smaller than the number of translators, and does not check it. If unsure, use method needsReorderingSafe, which performs this check.
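
The sortedness rule described above can be sketched as a standalone function. toyParseNumber and toyNeedsReordering are hypothetical helpers, not aGrUM functions, and the sketch treats as a number whatever std::strtod fully parses, which may differ from the library's own rule.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Returns true, and stores the parsed value, if the whole string is
// accepted by std::strtod (an assumption made for this sketch).
bool toyParseNumber(const std::string& s, double& value) {
  if (s.empty()) return false;
  char* end = nullptr;
  value     = std::strtod(s.c_str(), &end);
  return end == s.c_str() + s.size();
}

// labels[i] is the string translated into index i.
bool toyNeedsReordering(const std::vector< std::string >& labels) {
  std::vector< double > numbers;
  numbers.reserve(labels.size());
  bool all_numbers = true;
  for (const auto& label: labels) {
    double value = 0.0;
    if (!toyParseNumber(label, value)) {
      all_numbers = false;
      break;
    }
    numbers.push_back(value);
  }
  // only numbers: compare numerically; otherwise compare lexicographically
  if (all_numbers) return !std::is_sorted(numbers.begin(), numbers.end());
  return !std::is_sorted(labels.begin(), labels.end());
}
```

Note the difference between the two orders: the labels "2", "10" are sorted numerically, but adding a non-numeric label switches the test to lexicographic order, in which "10" comes before "2".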

◆ needsReorderingSafe()

bool gum::learning::DBTranslatorSet::needsReorderingSafe ( const std::size_t k) const

same as method needsReordering but checks that the kth translator exists

Exceptions
UndefinedElement is raised if the kth translator does not exist, i.e., if k is greater than or equal to the number of translators in the translator set.

◆ operator=() [1/2]

DBTranslatorSet & gum::learning::DBTranslatorSet::operator= ( const DBTranslatorSet & from)

copy operator

References DBTranslatorSet().

◆ operator=() [2/2]

DBTranslatorSet & gum::learning::DBTranslatorSet::operator= ( DBTranslatorSet && from)

move operator

References DBTranslatorSet().

◆ operator[]() [1/2]

DBTranslator & gum::learning::DBTranslatorSet::operator[] ( const std::size_t k)

returns the kth translator

Warning
this operator assumes that the kth translator exists, i.e., that k is smaller than the number of translators, and does not check it. If unsure, use method translatorSafe, which performs this check.

◆ operator[]() [2/2]

const DBTranslator & gum::learning::DBTranslatorSet::operator[] ( const std::size_t k) const

returns the kth translator

Warning
this operator assumes that the kth translator exists, i.e., that k is smaller than the number of translators, and does not check it. If unsure, use method translatorSafe, which performs this check.

◆ reorder()

HashTable< std::size_t, std::size_t > gum::learning::DBTranslatorSet::reorder ( const std::size_t k)

performs a reordering of the dictionary and returns a mapping from the old translated values to the new ones.

When a reordering is needed, i.e., string values must be translated differently, method reorder() computes how the translations should be changed. It updates the dictionary accordingly and returns the mapping that enables changing the old dictionary values into the new ones. Note that the hash table returned is expressed in terms of std::size_t because only the translations for discrete random variables need to be reordered; those for continuous random variables are identity mappings.

Warning
this method assumes that the kth translator exists, i.e., that k is smaller than the number of translators, and does not check it. If unsure, use method reorderSafe, which performs this check.
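
The kind of mapping returned by a reordering can be sketched as follows. toyReorder is a hypothetical standalone function, not the aGrUM implementation: it uses std::unordered_map in place of HashTable and, for brevity, sorts lexicographically only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// labels[i] is the string currently translated into index i. The function
// sorts the labels in place and returns the mapping old index -> new index.
std::unordered_map< std::size_t, std::size_t >
    toyReorder(std::vector< std::string >& labels) {
  // order[new_index] = old_index
  std::vector< std::size_t > order(labels.size());
  std::iota(order.begin(), order.end(), std::size_t(0));
  std::sort(order.begin(), order.end(), [&labels](std::size_t a, std::size_t b) {
    return labels[a] < labels[b];
  });

  std::unordered_map< std::size_t, std::size_t > old_to_new;
  std::vector< std::string >                     sorted(labels.size());
  for (std::size_t new_index = 0; new_index < order.size(); ++new_index) {
    old_to_new[order[new_index]] = new_index;
    sorted[new_index]            = labels[order[new_index]];
  }
  labels = std::move(sorted);
  return old_to_new;
}
```

For the labels 4, 1, 3 read in this order, the sketch yields the domain {1,3,4} and the mapping 0 -> 2, 1 -> 0, 2 -> 1, which is what must be applied to values that were translated before the reordering.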

◆ reorderSafe()

HashTable< std::size_t, std::size_t > gum::learning::DBTranslatorSet::reorderSafe ( const std::size_t k)

same as method reorder but checks that the kth translator exists

Exceptions
UndefinedElement is raised if the kth translator does not exist, i.e., if k is greater than or equal to the number of translators in the translator set.

◆ size()

std::size_t gum::learning::DBTranslatorSet::size ( ) const

returns the number of translators stored into the set

◆ translate()

DBTranslatedValue gum::learning::DBTranslatorSet::translate ( const std::vector< std::string > & row,
const std::size_t k ) const

ask the kth translator to translate a string in a row of the database

Parameters
row: a row of the original database
k: the index of the translator that will perform the translation
Warning
this method assumes that the kth translator exists, i.e., that k is smaller than the number of translators, and does not check it. If unsure, use method translateSafe, which performs this check.
as there is not necessarily an identity mapping between the set of columns of the database and the set of translators used, k does not necessarily correspond to the index of a column in the database: it is the index of a translator within the set
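
The distinction between a translator index and a column index can be sketched as follows. ToyColumnReader is a hypothetical standalone type, not aGrUM code; it only shows which cell of a row the kth translator would read.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: columns[k] is the input column read by translator k.
struct ToyColumnReader {
  std::vector< std::size_t > columns;

  // Returns the cell of the row that translator k would translate.
  // k indexes the translators, not the columns of the row.
  const std::string& cell(const std::vector< std::string >& row, std::size_t k) const {
    // debug-only sanity check, removed when NDEBUG is defined
    assert(k < columns.size() && columns[k] < row.size());
    return row[columns[k]];
  }
};
```

With two translators reading columns 3 and 5 of a 10-column row, cell(row, 0) returns the content of column 3 and cell(row, 1) the content of column 5.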

◆ translateBack()

std::string gum::learning::DBTranslatorSet::translateBack ( const DBTranslatedValue translated_val,
const std::size_t k ) const

returns the original string that was translated into translated_val

Parameters
translated_val: the value from which we look for the original string
k: the index of the translator that performed the translation
Warning
this method assumes that the kth translator exists, i.e., that k is smaller than the number of translators, and does not check it. If unsure, use method translateBackSafe, which performs this check.
as there is not necessarily an identity mapping between the set of columns of the database and the set of translators used, k does not necessarily correspond to the index of a column in the database: it is the index of a translator within the set

◆ translateBackSafe()

std::string gum::learning::DBTranslatorSet::translateBackSafe ( const DBTranslatedValue translated_val,
const std::size_t k ) const

similar to method translateBack, except that it checks that the kth translator exists

Parameters
translated_val: the value from which we look for the original string
k: the index of the translator that performed the translation
Exceptions
UndefinedElement is raised if the kth translator does not exist, i.e., if k is greater than or equal to the number of translators in the translator set.
Warning
as there is not necessarily an identity mapping between the set of columns of the database and the set of translators used, k does not necessarily correspond to the index of a column in the database: it is the index of a translator within the set

◆ translateSafe()

DBTranslatedValue gum::learning::DBTranslatorSet::translateSafe ( const std::vector< std::string > & row,
const std::size_t k ) const

similar to method translate, except that it checks that the kth translator exists

Parameters
row: a row of the original database
k: the index of the translator that will perform the translation
Exceptions
UndefinedElement is raised if the kth translator does not exist, i.e., if k is greater than or equal to the number of translators in the translator set.
Warning
as there is not necessarily an identity mapping between the set of columns of the database and the set of translators used, k does not necessarily correspond to the index of a column in the database: it is the index of a translator within the set

◆ translator() [1/2]

DBTranslator & gum::learning::DBTranslatorSet::translator ( const std::size_t k)

returns the kth translator

Warning
this method assumes that the kth translator exists, i.e., that k is smaller than the number of translators, and does not check it. If unsure, use method translatorSafe, which performs this check.

Referenced by insertTranslator().

◆ translator() [2/2]

const DBTranslator & gum::learning::DBTranslatorSet::translator ( const std::size_t k) const

returns the kth translator

Warning
this method assumes that the kth translator exists, i.e., that k is smaller than the number of translators, and does not check it. If unsure, use method translatorSafe, which performs this check.

◆ translators()

const std::vector< DBTranslator * > & gum::learning::DBTranslatorSet::translators ( ) const

returns the set of translators

◆ translatorSafe() [1/2]

DBTranslator & gum::learning::DBTranslatorSet::translatorSafe ( const std::size_t k)

returns the kth translator

Exceptions
UndefinedElement is raised if the kth translator does not exist, i.e., if k is greater than or equal to the number of translators in the translator set.

◆ translatorSafe() [2/2]

const DBTranslator & gum::learning::DBTranslatorSet::translatorSafe ( const std::size_t k) const

returns the kth translator

Exceptions
UndefinedElement is raised if the kth translator does not exist, i.e., if k is greater than or equal to the number of translators in the translator set.

◆ variable()

const Variable & gum::learning::DBTranslatorSet::variable ( const std::size_t k) const

returns the variable stored into the kth translator

Warning
this method assumes that the kth translator exists, i.e., that k is smaller than the number of translators, and does not check it. If unsure, use method variableSafe, which performs this check.

◆ variableSafe()

const Variable & gum::learning::DBTranslatorSet::variableSafe ( const std::size_t k) const

returns the variable stored into the kth translator

Exceptions
UndefinedElement is raised if the kth translator does not exist, i.e., if k is greater than or equal to the number of translators in the translator set.

The documentation for this class was generated from the following file: