aGrUM 2.3.2
a C++ library for (probabilistic) graphical models
The databases' cell translators for labelized variables.
#include <agrum/base/database/DBTranslator4LabelizedVariable.h>
Public Member Functions | |
Constructors / Destructors | |
| DBTranslator4LabelizedVariable (const std::vector< std::string > &missing_symbols, std::size_t max_dico_entries=std::numeric_limits< std::size_t >::max()) | |
| default constructor without any initial variable | |
| DBTranslator4LabelizedVariable (std::size_t max_dico_entries=std::numeric_limits< std::size_t >::max()) | |
| default constructor without any initial variable nor missing symbols | |
| DBTranslator4LabelizedVariable (const LabelizedVariable &var, const std::vector< std::string > &missing_symbols, const bool editable_dictionary=false, std::size_t max_dico_entries=std::numeric_limits< std::size_t >::max()) | |
| default constructor with a labelized variable as translator | |
| DBTranslator4LabelizedVariable (const LabelizedVariable &var, const bool editable_dictionary=false, std::size_t max_dico_entries=std::numeric_limits< std::size_t >::max()) | |
| default constructor with a labelized variable as translator but without missing symbols | |
| DBTranslator4LabelizedVariable (const DBTranslator4LabelizedVariable &from) | |
| copy constructor | |
| DBTranslator4LabelizedVariable (DBTranslator4LabelizedVariable &&from) | |
| move constructor | |
| virtual DBTranslator4LabelizedVariable * | clone () const |
| virtual copy constructor | |
| virtual | ~DBTranslator4LabelizedVariable () |
| destructor | |
Operators | |
| DBTranslator4LabelizedVariable & | operator= (const DBTranslator4LabelizedVariable &from) |
| copy operator | |
| DBTranslator4LabelizedVariable & | operator= (DBTranslator4LabelizedVariable &&from) |
| move operator | |
Accessors / Modifiers | |
| virtual DBTranslatedValue | translate (const std::string &str) final |
| returns the translation of a string | |
| virtual std::string | translateBack (const DBTranslatedValue translated_val) const final |
| returns the original value for a given translation | |
| virtual std::size_t | domainSize () const final |
| returns the domain size of a variable corresponding to the translations | |
| virtual bool | needsReordering () const final |
| indicates whether a reordering is needed to make the translations sorted | |
| virtual HashTable< std::size_t, std::size_t > | reorder () final |
| performs a reordering of the dictionary and returns a mapping from the old translated values to the new ones. | |
| virtual const LabelizedVariable * | variable () const final |
| returns the variable stored into the translator | |
| virtual DBTranslatedValue | missingValue () const final |
| returns the translation of a missing value | |
Operators | |
| DBTranslatedValue | operator<< (const std::string &str) |
| alias for method translate | |
| std::string | operator>> (const DBTranslatedValue translated_val) |
| alias for method translateBack | |
Accessors / Modifiers | |
| virtual bool | hasEditableDictionary () const |
| indicates whether the translator has an editable dictionary or not | |
| virtual void | setEditableDictionaryMode (bool new_mode) |
| sets/unsets the editable dictionary mode | |
| virtual const Bijection< std::size_t, std::string > & | getDictionary () const |
| returns the translation from database indices to input strings | |
| const Set< std::string > & | missingSymbols () const |
| returns the set of missing symbols taken into account by the translator | |
| bool | isMissingSymbol (const std::string &str) const |
| indicates whether a string corresponds to a missing symbol | |
| void | setVariableName (const std::string &str) const |
| sets the name of the variable stored into the translator | |
| void | setVariableDescription (const std::string &str) const |
| sets the description of the variable stored into the translator | |
| DBTranslatedValueType | getValType () const |
| returns the type of values handled by the translator | |
| bool | isLossless () const |
| returns a Boolean indicating whether the translation is lossless or not | |
| bool | isMissingValue (const DBTranslatedValue &val) const |
| indicates whether a translated value corresponds to a missing value | |
Protected Attributes | |
| bool | is_lossless_ |
| indicates whether the translation is lossless (e.g., ranges) or not | |
| bool | is_dictionary_dynamic_ |
| indicates whether the dictionary can be updated or not | |
| std::size_t | max_dico_entries_ |
| the maximum number of entries that the dictionary is allowed to contain | |
| Set< std::string > | missing_symbols_ |
| the set of missing symbols | |
| Bijection< std::size_t, std::string > | back_dico_ |
| the bijection relating back translated values and their original strings. | |
| DBTranslatedValueType | val_type_ |
| the type of the values translated by the translator | |
The databases' cell translators for labelized variables.
Translators are used by DatabaseTable instances to transform datasets' strings into DBTranslatedValue instances. Strings are not adequate for fast learning: they need to be preprocessed into a type that can be analyzed quickly (the so-called DBTranslatedValue type).
A DBTranslator4LabelizedVariable is a translator that contains and exploits a LabelizedVariable for translations. Each time a string needs to be translated, we ask the LabelizedVariable to provide the index of the label corresponding to the string. This index, when encoded into a DBTranslatedValue, is precisely the translation of the string.
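The idea of translating a string into a label index can be illustrated with a minimal, standalone sketch. It does not use aGrUM's classes: `ToyLabelizedVariable`, `toyTranslate` and `toyTranslateExample` are hypothetical names introduced only for this illustration, and a standard exception stands in for aGrUM's own exception types.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a labelized variable: an ordered list of labels.
// This is NOT aGrUM's LabelizedVariable, only an illustration of the idea.
struct ToyLabelizedVariable {
  std::vector<std::string> labels;

  // Returns the index of `label`; throws std::out_of_range if it is unknown.
  std::size_t indexOf(const std::string& label) const {
    for (std::size_t i = 0; i < labels.size(); ++i) {
      if (labels[i] == label) return i;
    }
    throw std::out_of_range("unknown label: " + label);
  }
};

// The translation of a database cell is the index of the matching label.
inline std::size_t toyTranslate(const ToyLabelizedVariable& var,
                                const std::string& cell) {
  return var.indexOf(cell);
}

// Small usage check: "medium" is the second label, hence index 1.
inline void toyTranslateExample() {
  const ToyLabelizedVariable size{{"low", "medium", "high"}};
  assert(toyTranslate(size, "low") == 0);
  assert(toyTranslate(size, "medium") == 1);
}
```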
Definition at line 155 of file DBTranslator4LabelizedVariable.h.
| gum::learning::DBTranslator4LabelizedVariable::DBTranslator4LabelizedVariable | ( | const std::vector< std::string > & | missing_symbols, |
| std::size_t | max_dico_entries = std::numeric_limits< std::size_t >::max() ) |
default constructor without any initial variable
When using this constructor, it is implicitly assumed that the dictionary contained in the translator is editable. So, when reading the database, if we observe a label that has not been encountered before, we add it to the dictionary of the translator (hence to the variable contained in the translator).
| missing_symbols | the set of symbols in the database representing missing values |
| max_dico_entries | the maximum number of entries that the dictionary can contain. Trying to add an entry beyond this limit is considered an error and raises a SizeError exception |
Referenced by DBTranslator4LabelizedVariable(), DBTranslator4LabelizedVariable(), clone(), operator=(), and operator=().
| gum::learning::DBTranslator4LabelizedVariable::DBTranslator4LabelizedVariable | ( | std::size_t | max_dico_entries = std::numeric_limits< std::size_t >::max() | ) |
default constructor without any initial variable nor missing symbols
When using this constructor, it is implicitly assumed that the dictionary contained in the translator is editable. So, when reading the database, if we observe a label that has not been encountered before, we add it to the dictionary of the translator (hence to the variable contained in the translator).
| max_dico_entries | the maximum number of entries that the dictionary can contain. Trying to add an entry beyond this limit is considered an error and raises a SizeError exception |
| gum::learning::DBTranslator4LabelizedVariable::DBTranslator4LabelizedVariable | ( | const LabelizedVariable & | var, |
| const std::vector< std::string > & | missing_symbols, | ||
| const bool | editable_dictionary = false, | ||
| std::size_t | max_dico_entries = std::numeric_limits< std::size_t >::max() ) |
default constructor with a labelized variable as translator
| var | a labelized variable whose labels will be used for translations. The translator keeps a copy of this variable |
| missing_symbols | the set of symbols in the database representing missing values |
| editable_dictionary | the mode in which the translator will perform translations: when false (the default), the translation of a string that does not correspond to a label of var will raise a NotFound exception; when true, the translator will try to add the string as a new label into var (and therefore into the dictionary) |
| max_dico_entries | the maximum number of entries that the dictionary can contain. Trying to add an entry beyond this limit is considered an error and raises a SizeError exception |
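The interplay between editable_dictionary and max_dico_entries can be sketched as follows. This is a standalone illustration, not aGrUM code: `ToyDictionary` and `toyDictionaryExample` are hypothetical, and `std::out_of_range` and `std::length_error` are used here as stand-ins for the exceptions raised by the library.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy dictionary; it is not aGrUM's class, and it uses standard
// exceptions in place of aGrUM's own exception types.
class ToyDictionary {
 public:
  ToyDictionary(std::vector<std::string> labels, bool editable,
                std::size_t max_entries =
                    std::numeric_limits<std::size_t>::max())
      : labels_(std::move(labels)),
        editable_(editable),
        max_entries_(max_entries) {}

  // Returns the index of `cell`, adding it as a new label when allowed.
  std::size_t translate(const std::string& cell) {
    for (std::size_t i = 0; i < labels_.size(); ++i) {
      if (labels_[i] == cell) return i;
    }
    // Unknown label: a non-editable dictionary refuses it.
    if (!editable_) throw std::out_of_range("unknown label: " + cell);
    // Editable dictionary: refuse to grow beyond the configured maximum.
    if (labels_.size() >= max_entries_) {
      throw std::length_error("dictionary is full");
    }
    labels_.push_back(cell);
    return labels_.size() - 1;
  }

  std::size_t size() const { return labels_.size(); }

 private:
  std::vector<std::string> labels_;
  bool editable_;
  std::size_t max_entries_;
};

// Usage check for both modes.
inline void toyDictionaryExample() {
  ToyDictionary closed({"yes", "no"}, /*editable=*/false);
  assert(closed.translate("no") == 1);

  ToyDictionary open({}, /*editable=*/true, /*max_entries=*/2);
  assert(open.translate("red") == 0);   // new label, appended
  assert(open.translate("blue") == 1);  // new label, appended
  assert(open.translate("red") == 0);   // already known, nothing added
}
```

In this sketch a third distinct label passed to `open` would throw, because the dictionary already holds its two allowed entries.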
| gum::learning::DBTranslator4LabelizedVariable::DBTranslator4LabelizedVariable | ( | const LabelizedVariable & | var, |
| const bool | editable_dictionary = false, | ||
| std::size_t | max_dico_entries = std::numeric_limits< std::size_t >::max() ) |
default constructor with a labelized variable as translator but without missing symbols
| var | a labelized variable whose labels will be used for translations. The translator keeps a copy of this variable |
| editable_dictionary | the mode in which the translator will perform translations: when false (the default), the translation of a string that does not correspond to a label of var will raise a NotFound exception; when true, the translator will try to add the string as a new label into var (and therefore into the dictionary) |
| max_dico_entries | the maximum number of entries that the dictionary can contain. Trying to add an entry beyond this limit is considered an error and raises a SizeError exception |
| gum::learning::DBTranslator4LabelizedVariable::DBTranslator4LabelizedVariable | ( | const DBTranslator4LabelizedVariable & | from | ) |
copy constructor
References DBTranslator4LabelizedVariable().
| gum::learning::DBTranslator4LabelizedVariable::DBTranslator4LabelizedVariable | ( | DBTranslator4LabelizedVariable && | from | ) |
move constructor
References DBTranslator4LabelizedVariable().
| virtual gum::learning::DBTranslator4LabelizedVariable::~DBTranslator4LabelizedVariable | ( | ) | virtual |
destructor
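| virtual DBTranslator4LabelizedVariable * gum::learning::DBTranslator4LabelizedVariable::clone | ( | ) | const | virtual |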
virtual copy constructor
Implements gum::learning::DBTranslator.
References DBTranslator4LabelizedVariable().
| virtual std::size_t gum::learning::DBTranslator4LabelizedVariable::domainSize | ( | ) | const | final virtual |
returns the domain size of a variable corresponding to the translations
Assume that the translator has been fed with the observed values of a random variable. Then it has produced a set of translated values. The latter define the domain of the variable. The domainSize is the size of this domain. In other words, this corresponds to the number of labels of the LabelizedVariable contained in the translator. Note that missing values are not taken into account in the domain sizes.
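As a rough illustration of this definition, the sketch below counts the distinct labels observed in a column while ignoring missing symbols. It is a standalone example: `toyDomainSize` and `toyDomainSizeExample` are hypothetical helpers, not part of aGrUM.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: number of distinct observed labels in `column`,
// not counting the cells that match one of the `missing_symbols`.
inline std::size_t toyDomainSize(const std::vector<std::string>& column,
                                 const std::set<std::string>& missing_symbols) {
  std::set<std::string> labels;
  for (const auto& cell : column) {
    if (missing_symbols.count(cell) == 0) labels.insert(cell);
  }
  return labels.size();
}

// Usage check: "?" and "N/A" are missing symbols, so only "yes" and "no"
// contribute to the domain.
inline void toyDomainSizeExample() {
  const std::vector<std::string> column{"yes", "no", "?", "yes", "N/A"};
  const std::set<std::string> missing{"?", "N/A"};
  assert(toyDomainSize(column, missing) == 2);
}
```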
Implements gum::learning::DBTranslator.
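| virtual const Bijection< std::size_t, std::string > & gum::learning::DBTranslator::getDictionary | ( | ) | const | virtual inherited |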
returns the translation from database indices to input strings
| DBTranslatedValueType gum::learning::DBTranslator::getValType | ( | ) | const | inherited |
returns the type of values handled by the translator
Referenced by gum::learning::BNDatabaseGenerator< GUM_SCALAR >::toDatabaseTable().
| virtual bool gum::learning::DBTranslator::hasEditableDictionary | ( | ) | const | virtual inherited |
indicates whether the translator has an editable dictionary or not
Reimplemented in gum::learning::DBTranslator4DiscretizedVariable, gum::learning::DBTranslator4IntegerVariable, and gum::learning::DBTranslator4NumericalDiscreteVariable.
| bool gum::learning::DBTranslator::isLossless | ( | ) | const | inherited |
returns a Boolean indicating whether the translation is lossless or not
Some translations can lose some information. For instance, a translator for a discretized variable will translate all the values of a discretization interval as the same value (the index of the interval). As such, it loses some information because, knowing this index, it is impossible to get back to the original value that was translated. Method isLossless() indicates whether the translation never loses any information or not.
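The loss of information described here can be seen in a small standalone sketch of a discretization. `toyIntervalIndex` and `toyLossyExample` are hypothetical names and do not reflect how aGrUM's discretized translators are implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical discretization: `cuts` holds sorted cut points c0 < c1 < ...
// The intervals are (-inf, c0), [c0, c1), ..., [c_last, +inf), and the
// function returns the index of the interval that contains `value`.
inline std::size_t toyIntervalIndex(const std::vector<double>& cuts,
                                    double value) {
  const auto it = std::upper_bound(cuts.begin(), cuts.end(), value);
  return static_cast<std::size_t>(it - cuts.begin());
}

// Usage check: 2.5 and 7.0 both fall into [0, 10), so they share an index
// and the original value cannot be recovered from that index alone.
inline void toyLossyExample() {
  const std::vector<double> cuts{0.0, 10.0};
  assert(toyIntervalIndex(cuts, 2.5) == toyIntervalIndex(cuts, 7.0));
}
```

A translator for labels, by contrast, maps each distinct string to its own index, which is why such a translation can be undone.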
| bool gum::learning::DBTranslator::isMissingSymbol | ( | const std::string & | str | ) | const | inherited |
indicates whether a string corresponds to a missing symbol
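| bool gum::learning::DBTranslator::isMissingValue | ( | const DBTranslatedValue & | val | ) | const | inherited |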
indicates whether a translated value corresponds to a missing value
| const Set< std::string > & gum::learning::DBTranslator::missingSymbols | ( | ) | const | inherited |
returns the set of missing symbols taken into account by the translator
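| virtual DBTranslatedValue gum::learning::DBTranslator4LabelizedVariable::missingValue | ( | ) | const | final virtual |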
returns the translation of a missing value
Implements gum::learning::DBTranslator.
References missingValue().
Referenced by missingValue().
| virtual bool gum::learning::DBTranslator4LabelizedVariable::needsReordering | ( | ) | const | final virtual |
indicates whether a reordering is needed to make the translations sorted
If the strings represented by the translations are only numbers, translations are considered to be sorted if and only if they are sorted by increasing number. If the strings do not only represent numbers, then translations are considered to be sorted if and only if they are sorted lexicographically.
When constructing its dictionary dynamically, the translator may assign DBTranslatedValue values to strings in an unwanted order. For instance, a translator reading sequentially integer strings 4, 1, 3, may map 4 into DBTranslatedValue{std::size_t(0)}, 1 into DBTranslatedValue{std::size_t(1)} and 3 into DBTranslatedValue{std::size_t(2)}, resulting in random variables having domain {4,1,3}. The user may prefer having domain {1,3,4}, i.e., a domain specified with increasing values. This requires a reordering. Method needsReordering() returns a Boolean indicating whether such a reordering should be performed or whether the current order is OK.
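The sorting rule stated above (numeric order when every label is a number, lexicographic order otherwise) can be sketched as follows. This is a standalone approximation: `toyIsNumber`, `toyNeedsReordering` and `toyNeedsReorderingExample` are hypothetical, and the sketch treats as a number whatever `std::strtod` parses completely, which may differ from the test performed by aGrUM.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical check: true if std::strtod consumes the whole string.
inline bool toyIsNumber(const std::string& s) {
  if (s.empty()) return false;
  char* end = nullptr;
  std::strtod(s.c_str(), &end);
  return end == s.c_str() + s.size();
}

// `labels[i]` is the string currently translated into index i.
// Returns true if the labels are not in the expected sorted order.
inline bool toyNeedsReordering(const std::vector<std::string>& labels) {
  const bool all_numbers =
      std::all_of(labels.begin(), labels.end(), toyIsNumber);
  if (all_numbers) {
    // Only numbers: compare by numeric value.
    return !std::is_sorted(
        labels.begin(), labels.end(),
        [](const std::string& a, const std::string& b) {
          return std::strtod(a.c_str(), nullptr) <
                 std::strtod(b.c_str(), nullptr);
        });
  }
  // Otherwise: compare lexicographically.
  return !std::is_sorted(labels.begin(), labels.end());
}

// Usage check with the example from the text.
inline void toyNeedsReorderingExample() {
  assert(toyNeedsReordering({"4", "1", "3"}));
  assert(!toyNeedsReordering({"1", "3", "4"}));
}
```

Note that in this sketch `{"2", "10"}` counts as sorted, because both labels are numbers and are therefore compared by value rather than as strings.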
Implements gum::learning::DBTranslator.
References needsReordering().
Referenced by needsReordering().
| DBTranslatedValue gum::learning::DBTranslator::operator<< | ( | const std::string & | str | ) | inherited |
alias for method translate
| DBTranslator4LabelizedVariable & gum::learning::DBTranslator4LabelizedVariable::operator= | ( | const DBTranslator4LabelizedVariable & | from | ) |
copy operator
References DBTranslator4LabelizedVariable().
| DBTranslator4LabelizedVariable & gum::learning::DBTranslator4LabelizedVariable::operator= | ( | DBTranslator4LabelizedVariable && | from | ) |
move operator
References DBTranslator4LabelizedVariable().
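| std::string gum::learning::DBTranslator::operator>> | ( | const DBTranslatedValue | translated_val | ) | inherited |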
alias for method translateBack
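| virtual HashTable< std::size_t, std::size_t > gum::learning::DBTranslator4LabelizedVariable::reorder | ( | ) | final virtual |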
performs a reordering of the dictionary and returns a mapping from the old translated values to the new ones.
When a reordering is needed, i.e., when string values must be translated differently, method reorder() computes how the translations should be changed. It updates the dictionary accordingly and returns the mapping that enables changing the old dictionary values into the new ones.
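A minimal sketch of such a reordering is given below. It is a standalone illustration with hypothetical names (`toyReorder`, `toyReorderExample`); it sorts labels lexicographically only, assumes that labels are distinct, and returns a `std::map` where the library returns a HashTable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// `labels[i]` is the string currently translated into index i.
// Sorts the labels in place (lexicographically, in this sketch) and returns
// the mapping from each old index to the corresponding new index.
inline std::map<std::size_t, std::size_t> toyReorder(
    std::vector<std::string>& labels) {
  // order[k] is the old index of the label that ends up at position k.
  std::vector<std::size_t> order(labels.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::sort(order.begin(), order.end(),
            [&labels](std::size_t a, std::size_t b) {
              return labels[a] < labels[b];
            });

  std::map<std::size_t, std::size_t> old_to_new;
  std::vector<std::string> sorted;
  sorted.reserve(labels.size());
  for (std::size_t new_idx = 0; new_idx < order.size(); ++new_idx) {
    old_to_new[order[new_idx]] = new_idx;
    sorted.push_back(labels[order[new_idx]]);
  }
  labels = std::move(sorted);
  return old_to_new;
}

// Usage check with the dictionary {4, 1, 3} from the needsReordering text:
// label "4" moves from index 0 to index 2, "1" from 1 to 0, "3" from 2 to 1.
inline void toyReorderExample() {
  std::vector<std::string> labels{"4", "1", "3"};
  const auto old_to_new = toyReorder(labels);
  assert(labels[0] == "1" && labels[1] == "3" && labels[2] == "4");
  assert(old_to_new.at(0) == 2);
}
```

The returned mapping is what a caller would apply to values already stored with the old indices so that they remain consistent with the updated dictionary.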
Implements gum::learning::DBTranslator.
References reorder().
Referenced by reorder().
| virtual void gum::learning::DBTranslator::setEditableDictionaryMode | ( | bool | new_mode | ) | virtual inherited |
sets/unsets the editable dictionary mode
Reimplemented in gum::learning::DBTranslator4DiscretizedVariable, gum::learning::DBTranslator4IntegerVariable, and gum::learning::DBTranslator4NumericalDiscreteVariable.
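| void gum::learning::DBTranslator::setVariableDescription | ( | const std::string & | str | ) | const | inherited |
sets the description of the variable stored into the translator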
| void gum::learning::DBTranslator::setVariableName | ( | const std::string & | str | ) | const | inherited |
sets the name of the variable stored into the translator
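| virtual DBTranslatedValue gum::learning::DBTranslator4LabelizedVariable::translate | ( | const std::string & | str | ) | final virtual |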
returns the translation of a string
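This method tries to translate a given string into the DBTranslatedValue that should be stored into a DatabaseTable. If the translator cannot find the translation in its current dictionary, then two situations can arise: if the translator is not in editable dictionary mode, an UnknownLabelInDatabase exception is raised; if it is in editable dictionary mode, it tries to add the string as a new label into its dictionary, which raises a SizeError if the dictionary has already reached its maximum number of entries.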
| str | the string that the translator will try to translate |
| UnknownLabelInDatabase | is raised if the translation cannot be found and the translator is not in an editable dictionary mode. |
| SizeError | is raised if the number of entries in the dictionary has already reached its maximum. |
Implements gum::learning::DBTranslator.
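| virtual std::string gum::learning::DBTranslator4LabelizedVariable::translateBack | ( | const DBTranslatedValue | translated_val | ) | const | final virtual |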
returns the original value for a given translation
| translated_val | a DBTranslatedValue which is supposed to contain the index of a label of the LabelizedVariable contained in the translator. The method should return this label |
| UnknownLabelInDatabase | is raised if this original value cannot be found |
Implements gum::learning::DBTranslator.
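The relationship between translate and translateBack amounts to a two-way lookup between labels and indices. The following standalone sketch shows one way to model it; `ToyBijection` and `toyRoundTripExample` are hypothetical and are not aGrUM's Bijection class, and `std::out_of_range` stands in for the library's exception.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical two-way dictionary between labels and indices.
class ToyBijection {
 public:
  // Adds `label` if it is new and returns its index in either case.
  std::size_t insert(const std::string& label) {
    const auto found = index_of_.find(label);
    if (found != index_of_.end()) return found->second;
    label_of_.push_back(label);
    index_of_.emplace(label, label_of_.size() - 1);
    return label_of_.size() - 1;
  }

  // Forward lookup; throws std::out_of_range for an unknown label.
  std::size_t indexOf(const std::string& label) const {
    return index_of_.at(label);
  }

  // Backward lookup; throws std::out_of_range for an unknown index.
  const std::string& labelOf(std::size_t index) const {
    return label_of_.at(index);
  }

 private:
  std::unordered_map<std::string, std::size_t> index_of_;
  std::vector<std::string> label_of_;
};

// Usage check: translating back an index yields the original label.
inline void toyRoundTripExample() {
  ToyBijection dico;
  dico.insert("low");
  dico.insert("high");
  assert(dico.labelOf(dico.indexOf("high")) == "high");
}
```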
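| virtual const LabelizedVariable * gum::learning::DBTranslator4LabelizedVariable::variable | ( | ) | const | final virtual |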
returns the variable stored into the translator
Implements gum::learning::DBTranslator.
References variable().
Referenced by variable().
| Bijection< std::size_t, std::string > gum::learning::DBTranslator::back_dico_ | mutable protected inherited |
the bijection relating back translated values and their original strings.
Note that the translated values considered here are of type std::size_t because only the values for discrete variables need to be stored; those for continuous variables are actually identity mappings.
Definition at line 399 of file DBTranslator.h.
| bool gum::learning::DBTranslator::is_dictionary_dynamic_ | protected inherited |
indicates whether the dictionary can be updated or not
Definition at line 385 of file DBTranslator.h.
| bool gum::learning::DBTranslator::is_lossless_ | protected inherited |
indicates whether the translation is lossless (e.g., ranges) or not
Definition at line 382 of file DBTranslator.h.
| std::size_t gum::learning::DBTranslator::max_dico_entries_ | protected inherited |
the maximum number of entries that the dictionary is allowed to contain
Definition at line 388 of file DBTranslator.h.
| Set< std::string > gum::learning::DBTranslator::missing_symbols_ | protected inherited |
the set of missing symbols
Definition at line 391 of file DBTranslator.h.
| DBTranslatedValueType gum::learning::DBTranslator::val_type_ | protected inherited |
the type of the values translated by the translator
Definition at line 402 of file DBTranslator.h.