aGrUM 2.3.2
a C++ library for (probabilistic) graphical models
gum::learning::DBInitializerFromCSV Class Reference

The class for initializing DatabaseTable and RawDatabaseTable instances from CSV files.

#include <agrum/base/database/DBInitializerFromCSV.h>


Public Types

enum class  InputType : char { STRING , DBCELL }
 the enumeration indicating the type of the data the IDBInitializer expects as input data

Public Member Functions

Constructors / Destructors
 DBInitializerFromCSV (const std::string filename, bool fileContainsNames=true, const std::string delimiter=",", const char commentmarker='#', const char quoteMarker='"')
 default constructor
 DBInitializerFromCSV (const DBInitializerFromCSV &from)
 copy constructor
 DBInitializerFromCSV (DBInitializerFromCSV &&from)
 move constructor
virtual DBInitializerFromCSV * clone () const
 virtual copy constructor
virtual ~DBInitializerFromCSV ()
 destructor
Operators
DBInitializerFromCSV & operator= (const DBInitializerFromCSV &from)
 copy operator
DBInitializerFromCSV & operator= (DBInitializerFromCSV &&from)
 move operator
Accessors / Modifiers
const std::vector< std::string > & variableNames ()
 returns the names of the variables in the input dataset
template<class DATABASE>
void fillDatabase (DATABASE &database, const bool retry_insertion=false)
 fills the rows of the database table
std::size_t throwingColumn () const
 This method indicates the column whose filling raised an exception, if any, during the execution of fillDatabase.

Protected Member Functions

virtual std::vector< std::string > variableNames_ () final
 returns the names of the variables
virtual const std::vector< std::string > & currentStringRow_ () final
 returns the content of the current row using strings
virtual bool nextRow_ () final
 indicates whether there is a next row to read (and points to it)
virtual const DBRow< DBCell > & currentDBCellRow_ ()
 asks the child class for the content of the current row using dbcells

Detailed Description

The class for initializing DatabaseTable and RawDatabaseTable instances from CSV files.

In aGrUM, the usual way to create DatabaseTable instances used by learning algorithms is to use the 4-step process below:

  1. Create an IDBInitializer instance (either a DBInitializerFromCSV or a DBInitializerFromSQL). This makes it possible to get the variables corresponding to the columns of the DatabaseTable.
  2. Knowing these variables, create a DBTranslatorSet for encoding the lines of the CSV file or those of the SQL result into the appropriate values for the learning algorithms.
  3. Create the DatabaseTable, passing it the DBTranslatorSet created in the preceding step. Use the IDBInitializer to provide the variables' names to the DatabaseTable.
  4. Use the IDBInitializer to add the lines of the CSV file or those of the SQL result into the DatabaseTable.
The following code shows the details of this process:
// 1/ use the initializer to parse all the columns/rows of a CSV file
gum::learning::DBInitializerFromCSV initializer ( "asia.csv" );
const auto& var_names = initializer.variableNames ();
const std::size_t nb_vars = var_names.size ();
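// we create as many translators as there are variables
gum::learning::DBTranslator4LabelizedVariable translator;
gum::learning::DBTranslatorSet translator_set;
for ( std::size_t i = 0; i < nb_vars; ++i )
  translator_set.insertTranslator ( translator, i );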
// create a DatabaseTable with these translators. For the moment, the
// DatabaseTable will be empty, i.e., it will contain no row
gum::learning::DatabaseTable database ( translator_set );
database.setVariableNames( initializer.variableNames () );
// use the DBInitializerFromCSV to fill the rows:
initializer.fillDatabase ( database );
// now, the database contains all the content of the CSV file
// 2/ use an IDBInitializer to initialize a DatabaseTable, but ignore
// some columns.
gum::learning::DBInitializerFromCSV initializer2 ( "asia.csv" );
gum::learning::DatabaseTable database2; // empty database
// indicate which columns of the CSV file should be read
database2.insertTranslator ( translator, 1 );
database2.insertTranslator ( translator, 3 );
database2.insertTranslator ( translator, 4 );
// sets the names of the columns correctly
database2.setVariableNames( initializer2.variableNames () );
// fill the rows:
initializer2.fillDatabase ( database2 );
// now all the rows of the CSV file have been transferred into database2,
// but only columns 1, 3 and 4 of the CSV file have been kept.
// 3/ another possibility to initialize a DatabaseTable, ignoring
// some columns:
gum::learning::DBInitializerFromCSV initializer3 ( "asia.csv" );
gum::learning::DatabaseTable database3 ( translator_set );
// here, database3 is an empty database but it contains already
// translators for all the columns of the CSV file. We shall now remove
// the columns/translators that are not wanted anymore
database3.ignoreColumn ( 0 );
database3.ignoreColumn ( 2 );
database3.ignoreColumn ( 5 );
database3.ignoreColumn ( 6 );
database3.ignoreColumn ( 7 );
// asia contains 8 columns. The above ignoreColumns keep only columns
// 1, 3 and 4.
// sets the names of the columns correctly
database3.setVariableNames( initializer3.variableNames () );
// fill the rows:
initializer3.fillDatabase ( database3 );
// now all the rows of the CSV file have been transferred into database3,
// but only columns 1, 3 and 4 of the CSV file have been kept.

Definition at line 151 of file DBInitializerFromCSV.h.

Member Enumeration Documentation

◆ InputType

enum class gum::learning::IDBInitializer::InputType : char
strong, inherited

the enumeration indicating the type of the data the IDBInitializer expects as input data

Enumerator
STRING 
DBCELL 

Definition at line 139 of file IDBInitializer.h.

enum class InputType : char { STRING, DBCELL };

Constructor & Destructor Documentation

◆ DBInitializerFromCSV() [1/3]

gum::learning::DBInitializerFromCSV::DBInitializerFromCSV ( const std::string filename,
bool fileContainsNames = true,
const std::string delimiter = ",",
const char commentmarker = '#',
const char quoteMarker = '"' )

default constructor

Parameters
filename           the name of the CSV file
fileContainsNames  a Boolean indicating whether the first line of the CSV file contains the names of the columns
delimiter          the character that acts as the column separator in the CSV file
commentmarker      the character that marks the beginning of a comment
quoteMarker        the character that is used to quote the sentences in the CSV file
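
To make the roles of delimiter, commentmarker and quoteMarker concrete, the following standalone sketch splits a single line of text. It is not aGrUM's parser: splitCsvLine is a hypothetical helper, and it assumes that a comment marker outside quotes ends the line, that quoted text may contain the delimiter, and that escaped quotes are not supported.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, NOT part of aGrUM: splits one CSV line into fields.
// Assumptions: a comment marker outside quotes ends the line, quoted text may
// contain the delimiter, and escaped quotes are not supported.
std::vector<std::string> splitCsvLine(const std::string& line,
                                      const std::string& delimiter = ",",
                                      char commentMarker = '#',
                                      char quoteMarker = '"') {
  assert(!delimiter.empty());
  std::vector<std::string> fields;
  std::string current;
  bool inQuotes = false;
  std::size_t i = 0;
  while (i < line.size()) {
    const char c = line[i];
    if (c == quoteMarker) {
      inQuotes = !inQuotes;  // toggle quoting and drop the quote itself
      ++i;
    } else if (!inQuotes && c == commentMarker) {
      break;                 // the rest of the line is a comment
    } else if (!inQuotes
               && line.compare(i, delimiter.size(), delimiter) == 0) {
      fields.push_back(current);  // the delimiter closes the current field
      current.clear();
      i += delimiter.size();
    } else {
      current += c;
      ++i;
    }
  }
  fields.push_back(current);  // the last field has no trailing delimiter
  return fields;
}
```

With the default arguments, a quoted field keeps its embedded comma and everything after the '#' is discarded. Passing ";" as the second argument splits on semicolons instead.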

Referenced by DBInitializerFromCSV(), DBInitializerFromCSV(), clone(), operator=(), and operator=().


◆ DBInitializerFromCSV() [2/3]

gum::learning::DBInitializerFromCSV::DBInitializerFromCSV ( const DBInitializerFromCSV & from)

copy constructor

the new initializer points to the same file as from, but it reparses it from scratch.
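
The copy semantics described above (same file, parsing restarted from the beginning) can be modelled with a small standalone class. This is only a sketch under assumptions: LineReader and nextLine are hypothetical names and do not reflect how aGrUM implements its CSV parser.

```cpp
#include <fstream>
#include <string>

// Hypothetical reader, NOT aGrUM's class: it only models the documented copy
// semantics (same file, parsing restarted from the first line).
class LineReader {
public:
  explicit LineReader(const std::string& filename)
      : filename_(filename), input_(filename) {}

  // The copy shares the file name but opens its own stream, so it starts
  // reading at the beginning regardless of how far `from` has advanced.
  LineReader(const LineReader& from)
      : filename_(from.filename_), input_(from.filename_) {}

  // Copy assignment reopens the file of `from` and restarts from line one.
  LineReader& operator=(const LineReader& from) {
    if (this != &from) {
      filename_ = from.filename_;
      input_.close();
      input_.clear();  // reset any end-of-file or failure state
      input_.open(filename_);
    }
    return *this;
  }

  // Reads the next line into `line`; returns false at end of file.
  bool nextLine(std::string& line) {
    return static_cast<bool>(std::getline(input_, line));
  }

private:
  std::string filename_;
  std::ifstream input_;
};
```

A copy taken after the original has consumed some lines still yields the first line of the file, while the original continues from where it stopped.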

References DBInitializerFromCSV().


◆ DBInitializerFromCSV() [3/3]

gum::learning::DBInitializerFromCSV::DBInitializerFromCSV ( DBInitializerFromCSV && from)

move constructor

References DBInitializerFromCSV().


◆ ~DBInitializerFromCSV()

virtual gum::learning::DBInitializerFromCSV::~DBInitializerFromCSV ( )
virtual

destructor

Member Function Documentation

◆ clone()

virtual DBInitializerFromCSV * gum::learning::DBInitializerFromCSV::clone ( ) const
virtual

virtual copy constructor

Implements gum::learning::IDBInitializer.

References DBInitializerFromCSV().
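
A "virtual copy constructor" lets code that only holds a base-class reference obtain a copy of the right dynamic type. The sketch below illustrates the idiom with hypothetical classes (Initializer, CsvInitializer, duplicate); it assumes, as is usual for this idiom, that the caller owns the returned pointer, which the text above does not state explicitly.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical classes, NOT aGrUM's: they only illustrate the clone idiom.
class Initializer {
public:
  virtual ~Initializer() = default;
  // Returns a heap-allocated copy whose dynamic type matches *this.
  virtual Initializer* clone() const = 0;
  virtual std::string source() const = 0;
};

class CsvInitializer : public Initializer {
public:
  explicit CsvInitializer(std::string filename)
      : filename_(std::move(filename)) {}

  // Covariant return type: the override returns the derived pointer type.
  CsvInitializer* clone() const override { return new CsvInitializer(*this); }
  std::string source() const override { return filename_; }

private:
  std::string filename_;
};

// Copies an initializer through a base-class reference without slicing and
// wraps the result so that it is released automatically.
std::unique_ptr<Initializer> duplicate(const Initializer& init) {
  return std::unique_ptr<Initializer>(init.clone());
}
```

Calling duplicate on a CsvInitializer seen through an Initializer reference yields a distinct object that is still a CsvInitializer.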


◆ currentDBCellRow_()

virtual const DBRow< DBCell > & gum::learning::IDBInitializer::currentDBCellRow_ ( )
protected, virtual, inherited

asks the child class for the content of the current row using dbcells

If the child class parses DBRows, this method should be overridden.

◆ currentStringRow_()

virtual const std::vector< std::string > & gum::learning::DBInitializerFromCSV::currentStringRow_ ( )
final, protected, virtual

returns the content of the current row using strings

Reimplemented from gum::learning::IDBInitializer.

References currentStringRow_().

Referenced by currentStringRow_().


◆ fillDatabase()

template<class DATABASE>
void gum::learning::IDBInitializer::fillDatabase ( DATABASE & database,
const bool retry_insertion = false )
inherited

fills the rows of the database table

This method may raise exceptions when trying to insert new rows into the database table. See Method insertRow() of the database table.
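
The sketch below shows how a caller might combine the exception raised while filling with throwingColumn(). It is a standalone model, not aGrUM code: ColumnError, BinaryTable and ToyInitializer are hypothetical, the way the failing column is recorded is an assumption, and retry_insertion is left out because its semantics are not described here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins, NOT aGrUM's classes.
struct ColumnError : std::runtime_error {
  std::size_t column;
  ColumnError(std::size_t col, const std::string& what)
      : std::runtime_error(what), column(col) {}
};

// A toy table that accepts only the labels "0" and "1" in every column.
struct BinaryTable {
  std::vector<std::vector<std::string>> rows;

  void insertRow(const std::vector<std::string>& row) {
    for (std::size_t col = 0; col < row.size(); ++col)
      if (row[col] != "0" && row[col] != "1")
        throw ColumnError(col, "unknown label: " + row[col]);
    rows.push_back(row);
  }
};

class ToyInitializer {
public:
  explicit ToyInitializer(std::vector<std::vector<std::string>> data)
      : data_(std::move(data)) {}

  // Inserts every row; on failure, remembers the offending column and lets
  // the exception propagate to the caller.
  template <class DATABASE>
  void fillDatabase(DATABASE& database) {
    for (const auto& row : data_) {
      try {
        database.insertRow(row);
      } catch (const ColumnError& e) {
        throwingColumn_ = e.column;
        throw;
      }
    }
  }

  // Column recorded by the last failed fillDatabase() call.
  std::size_t throwingColumn() const { return throwingColumn_; }

private:
  std::vector<std::vector<std::string>> data_;
  std::size_t throwingColumn_ = 0;
};
```

A caller would wrap fillDatabase in a try block and, in the handler, query throwingColumn() to find out which column needs a different translator. Rows inserted before the failure remain in the table in this model.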

Referenced by gum::learning::IBNLearner::Database::Database(), gum::learning::IBNLearner::Database::Database(), gum::learning::readFile(), and gum::learning::IBNLearner::readFile_().


◆ nextRow_()

virtual bool gum::learning::DBInitializerFromCSV::nextRow_ ( )
final, protected, virtual

indicates whether there is a next row to read (and points to it)

Implements gum::learning::IDBInitializer.
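
The protected hooks nextRow_() and currentStringRow_() suggest an iteration protocol in which a base class drives the loop and the derived class supplies the rows. The following standalone sketch models that protocol with hypothetical classes (RowSource, InMemorySource, readAll); it is an illustration under that assumption, not a description of IDBInitializer's internals.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical classes, NOT aGrUM's: a base class drives the iteration and
// derived classes supply the rows through protected hooks.
class RowSource {
public:
  virtual ~RowSource() = default;

  // Calls nextRow_() until it returns false and collects every row.
  std::vector<std::vector<std::string>> readAll() {
    std::vector<std::vector<std::string>> rows;
    while (nextRow_()) rows.push_back(currentStringRow_());
    return rows;
  }

protected:
  // Advances to the next row; returns false when no row is left.
  virtual bool nextRow_() = 0;
  // Returns the row selected by the last successful nextRow_() call.
  virtual const std::vector<std::string>& currentStringRow_() = 0;
};

class InMemorySource final : public RowSource {
public:
  explicit InMemorySource(std::vector<std::vector<std::string>> rows)
      : rows_(std::move(rows)) {}

protected:
  bool nextRow_() override {
    if (next_ >= rows_.size()) return false;
    current_ = next_++;
    return true;
  }

  const std::vector<std::string>& currentStringRow_() override {
    return rows_[current_];
  }

private:
  std::vector<std::vector<std::string>> rows_;
  std::size_t next_ = 0;     // index of the row nextRow_() will select
  std::size_t current_ = 0;  // index of the currently selected row
};
```

Because the hooks are protected, client code only sees readAll(); a second call returns nothing, since the source has already been consumed.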

References nextRow_().

Referenced by nextRow_().


◆ operator=() [1/2]

DBInitializerFromCSV & gum::learning::DBInitializerFromCSV::operator= ( const DBInitializerFromCSV & from)

copy operator

the initializer points to the same file as from, but it reparses it from scratch.

References DBInitializerFromCSV().


◆ operator=() [2/2]

DBInitializerFromCSV & gum::learning::DBInitializerFromCSV::operator= ( DBInitializerFromCSV && from)

move operator

the initializer points to the same file as from, but it reparses it from scratch.

References DBInitializerFromCSV().


◆ throwingColumn()

std::size_t gum::learning::IDBInitializer::throwingColumn ( ) const
inherited

This method indicates the column whose filling raised an exception, if any, during the execution of fillDatabase.

◆ variableNames()

const std::vector< std::string > & gum::learning::IDBInitializer::variableNames ( )
inherited

returns the names of the variables in the input dataset

Referenced by gum::learning::IBNLearner::Database::Database(), gum::learning::IBNLearner::Database::Database(), gum::learning::readFile(), and gum::learning::IBNLearner::readFile_().


◆ variableNames_()

virtual std::vector< std::string > gum::learning::DBInitializerFromCSV::variableNames_ ( )
final, protected, virtual

returns the names of the variables

Implements gum::learning::IDBInitializer.


The documentation for this class was generated from the following file: