the class for computing the log2 of the parametric complexity of an r-ary multinomial variable

#include <variableLog2ParamComplexity.h>

Public Member Functions
Constructors / Destructors
	VariableLog2ParamComplexity ()
	default constructor
	VariableLog2ParamComplexity (const VariableLog2ParamComplexity &from)
	copy constructor
	VariableLog2ParamComplexity (VariableLog2ParamComplexity &&from)
	move constructor
virtual VariableLog2ParamComplexity *	clone () const
	virtual copy constructor
virtual	~VariableLog2ParamComplexity ()
	destructor
Operators
VariableLog2ParamComplexity &	operator= (const VariableLog2ParamComplexity &from)
	copy operator
VariableLog2ParamComplexity &	operator= (VariableLog2ParamComplexity &&from)
	move operator
Accessors / Modifiers
double	log2Cnr (const std::size_t r, const double n)
	returns the value of the log in base 2 of Cnr
void	CnrToFile (const std::string &filename)
	the function used to write the cpp file with the values of log2(Cnr)
void	useCache (const bool on_off)
	indicates whether we wish to use a cache for the Cnr
void	clearCache ()
	clears the current cache

Private Attributes
const double	_cst1_ = -0.5 + std::log2(std::sqrt(M_PI))
	constant term of Szpankowski's approximation of log2(C_n^2), used by log2Cnr() for large n
const double	_cst2_ = std::sqrt(2.0 / M_PI) / 3.0
const double	_cst3_ = 3.0 / 36.0 - 4.0 / (9.0 * M_PI)
bool	_use_cache_ {true}
HashTable< std::pair< std::size_t, double >, double >	_cache_

Detailed Description

the class for computing the log2 of the parametric complexity of an r-ary multinomial variable

This class computes the base-2 logarithm of the parametric complexity of a single r-ary multinomial variable, i.e., the base-2 logarithm of the C_N^r term used by NML scores in Bayesian network structure learning algorithms (see, e.g., Silander, Roos, Kontkanen and Myllymaki (2007) "Factorized Normalized Maximum Likelihood Criterion for Learning Bayesian Network Structures").

Definition at line 87 of file variableLog2ParamComplexity.h.
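
As a rough illustration of the quantity this class returns, the sketch below computes log2(C_n^2) by direct summation, which is the same sum that CnrToFile() evaluates when it fills its table. It is a standalone sketch rather than aGrUM code: the function name exactLog2Cn2 is hypothetical, and std::lgamma stands in for aGrUM's GammaLog2.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of aGrUM): log2 of C_n^2, where
//   C_n^2 = sum_{h=0}^{n} binom(n,h) * (h/n)^h * ((n-h)/n)^(n-h)
double exactLog2Cn2(int n) {
  assert(n >= 0);
  if (n == 0) return 0.0;   // C_0^2 = 1
  const double N = n;
  double cn2 = 2.0;         // the terms h = 0 and h = n are both equal to 1
  for (int h = 1; h < n; ++h) {
    // natural log of binom(n,h), via log-gamma to avoid overflow
    const double logBinom
        = std::lgamma(N + 1.0) - std::lgamma(h + 1.0) - std::lgamma(N - h + 1.0);
    cn2 += std::exp(logBinom + h * std::log(h / N) + (N - h) * std::log((N - h) / N));
  }
  return std::log2(cn2);
}
```

For n = 2 the sum is 1 + 0.5 + 1 = 2.5, so the function returns log2(2.5).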

Constructor & Destructor Documentation

◆ VariableLog2ParamComplexity() [1/3]

gum::VariableLog2ParamComplexity::VariableLog2ParamComplexity ( )

default constructor

Referenced by VariableLog2ParamComplexity(), VariableLog2ParamComplexity(), clone(), operator=(), and operator=().

◆ VariableLog2ParamComplexity() [2/3]

gum::VariableLog2ParamComplexity::VariableLog2ParamComplexity ( const VariableLog2ParamComplexity & from )

copy constructor

References VariableLog2ParamComplexity().

◆ VariableLog2ParamComplexity() [3/3]

gum::VariableLog2ParamComplexity::VariableLog2ParamComplexity ( VariableLog2ParamComplexity && from )

move constructor

References VariableLog2ParamComplexity().

◆ ~VariableLog2ParamComplexity()

virtual gum::VariableLog2ParamComplexity::~VariableLog2ParamComplexity ( )

virtual

destructor

Member Function Documentation

◆ clearCache()

void gum::VariableLog2ParamComplexity::clearCache ( )

clears the current cache

◆ clone()

virtual VariableLog2ParamComplexity * gum::VariableLog2ParamComplexity::clone ( ) const

virtual

virtual copy constructor

References VariableLog2ParamComplexity().

◆ CnrToFile()

void gum::VariableLog2ParamComplexity::CnrToFile ( const std::string & filename )

the function used to write the cpp file with the values of log2(Cnr)

Definition at line 156 of file variableLog2ParamComplexity.cpp.

                                                                       {
    // save all the value of cn2
    std::vector< long double > cn2_table(VariableLog2ParamComplexityCTableNSize);
    cn2_table[0] = 1;
    cn2_table[1] = 2;
 
    // for every value of n less than Szpankowski_threshold, we compute the
    // value of C_n^2 and write it into cn2_table
    GammaLog2 gamma_log2;
    for (double n = 2; n < VariableLog2ParamComplexityCTableNSize; ++n) {
      // note that, in the Silander, Roos, Kontkanen and Myllymaki (2007)
      // paper "Factorized Normalized Maximum Likelihood Criterion for
      // Learning Bayesian network Structures", there is an uppercase N in
      // the formula, but this should be a lowercase n. In addition, we loop
      // only on h=1 to n-1 and start the sum at 2.0 to account for the
      // terms h=0 and h=n.
      long double cn2 = 2;
      for (double h = 1; h < n; ++h) {
        long double elt = (gamma_log2(n + 1) - gamma_log2(h + 1) - gamma_log2((n - h) + 1)) * M_LN2
                        + h * std::log(h / n) + (n - h) * std::log((n - h) / n);
        cn2 += std::exp(elt);
      }
 
      // const double logCn2 = (double) std::log2 ( cn2 );
 
      cn2_table[int(n)] = cn2;
    }
 
    // write the header of the output file
    std::ofstream outfile(filename);
    if (!outfile.is_open()) { GUM_ERROR(IOError, "It is impossible to open file " << filename) }
    outfile.precision(20);
    outfile << "namespace gum {\n\n";
    /*
      outfile << "  // the size in r of VariableLog2ParamComplexityCTable\n";
      outfile << "  const std::size_t VariableLog2ParamComplexityCTableRSize = "
      << "4;\n\n";
      outfile << "  // the size in n of VariableLog2ParamComplexityCTable\n";
      outfile << "  const std::size_t VariableLog2ParamComplexityCTableNSize = "
      << VariableLog2ParamComplexityCTableNSize << ";\n\n";
    */
    outfile << "  // the CTable cache for log2(Cnr), n < " << VariableLog2ParamComplexityCTableNSize
            << " and r in {2,3,4,5}\n";
    outfile << "  const double VariableLog2ParamComplexityCTable[4]["
            << VariableLog2ParamComplexityCTableNSize << "] = {\n";
 
    // write the values of Cn2:
    outfile << "      { ";
    bool first = true;
    for (const auto cn2: cn2_table) {
      if (first) first = false;
      else outfile << ",\n        ";
      const double logCn2 = (double)std::log2(cn2);
      outfile << logCn2;
    }
    outfile << " },\n";
 
    // write the values of cn3, which are equal to cn2 + n
    outfile << "      { ";
    for (std::size_t i = std::size_t(0); i < VariableLog2ParamComplexityCTableNSize; ++i) {
      if (i > std::size_t(0)) outfile << ",\n        ";
      const double logCn3 = (double)std::log2(cn2_table[i] + i);
      outfile << logCn3;
    }
    outfile << " },\n";
 
    // write the values of cn4, which are equal to cn2 * (1 + n/2) + n
    outfile << "      { ";
    for (std::size_t i = std::size_t(0); i < VariableLog2ParamComplexityCTableNSize; ++i) {
      if (i > std::size_t(0)) outfile << ",\n        ";
      const double logCn4 = (double)std::log2(cn2_table[i] * (1.0 + i / 2.0) + i);
      outfile << logCn4;
    }
    outfile << " },\n";
 
    // write the values of cn5, which are equal to cn2 * (1 + 5n/6) + n + n^2/3
    outfile << "      { ";
    for (std::size_t i = std::size_t(0); i < VariableLog2ParamComplexityCTableNSize; ++i) {
      if (i > std::size_t(0)) outfile << ",\n        ";
      const double logCn5
          = (double)std::log2(cn2_table[i] * (1.0 + 5.0 * i / 6.0) + i + i * i / 3.0);
      outfile << logCn5;
    }
    outfile << " }\n";
 
    // write the footer and close the file
    outfile << "  };\n\n";
    outfile << "} /* namespace gum */\n";
    outfile.close();
  }

References GUM_ERROR, M_LN2, and gum::VariableLog2ParamComplexityCTableNSize.
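
The closed forms quoted in the comments of the listing (cn3 = cn2 + n, and so on) follow from the recurrence C_n^r = C_n^{r-1} + (n / (r-2)) C_n^{r-2} with C_n^1 = 1. The standalone sketch below evaluates both routes so that they can be compared; all function names in it are hypothetical and not part of aGrUM.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: applies the recurrence
//   C_n^r = C_n^{r-1} + (n / (r-2)) * C_n^{r-2},  with C_n^1 = 1,
// starting from a known value of C_n^2.
double cnrByRecurrence(double cn2, double n, std::size_t r) {
  assert(r >= 1);
  if (r == 1) return 1.0;
  double prev2 = 1.0;   // C_n^1
  double prev1 = cn2;   // C_n^2
  for (std::size_t i = 3; i <= r; ++i) {
    const double cur = prev1 + n / double(i - 2) * prev2;   // C_n^i
    prev2 = prev1;
    prev1 = cur;
  }
  return prev1;
}

// Closed forms matching the comments for rows r = 3, 4, 5 of the table.
double cn3Closed(double cn2, double n) { return cn2 + n; }
double cn4Closed(double cn2, double n) { return cn2 * (1.0 + n / 2.0) + n; }
double cn5Closed(double cn2, double n) {
  return cn2 * (1.0 + 5.0 * n / 6.0) + n + n * n / 3.0;
}
```

With n = 2 and C_2^2 = 2.5, both routes give 4.5, 7 and 10 for r = 3, 4 and 5.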

◆ log2Cnr()

double gum::VariableLog2ParamComplexity::log2Cnr	(	const std::size_t	r,
		const double	n )

returns the value of the log in base 2 of Cnr

Definition at line 59 of file variableLog2ParamComplexity.cpp.

                                                                               {
    // we know that c_n^1 = 1 for all values of n
    // in addition, c_0^r = 1 for all values of r
    // finally, it is easy to see that c_1^r = r for all r
    if (r == std::size_t(1)) return 0.0;         // log2(1)
    if (n == 0.0) return 0.0;                    // log2(1)
    if (n == 1.0) return std::log2((double)r);   // log2(r)
 
    if (n < 0.0) {
      GUM_ERROR(OutOfBounds,
                "In the penalty of the fNML score, n must be greater "
                    << "than or equal to 0. But, here, n = " << n);
    }
 
    if (n < VariableLog2ParamComplexityCTableNSize) {
      // check if we can find the value we look for in the precomputed table
      // VariableLog2ParamComplexityCTable
      std::size_t xn = (std::size_t)n;
      if (r - 2 < VariableLog2ParamComplexityCTableRSize) {
        return VariableLog2ParamComplexityCTable[r - 2][xn];
      } else {
        // try to find the value in the cache
        if (_use_cache_) {
          try {
            return _cache_[std::pair< std::size_t, double >{r, n}];
          } catch (NotFound const&) {}
        }
 
        // use Equation (13) of the paper to compute the value of cnr:
        // C_n^r = C_n^{r-1} + (n / (r-2)) C_n^{r-2}
        // as we handle only log2's of C_n^r, we have the following:
        // let k_r be such that C_n^{r-2} = k_r * C_n^{r-1}
        // log2 ( C_n^r ) = log2 ( C_n^{r-1} + k_r * (n/(r-2)) * C_n^{r-1} )
        //                = log2 ( C_n^{r-1} ) + log2 ( 1 + k_r * (n/(r-2)) )
        // as  k_r = C_n^{r-2} / C_n^{r-1}, we have that
        // log2(k_r) = log2 ( C_n^{r-2} ) - log2 ( C_n^{r-1} )
        // so, k_r = exp ( (log2(C_n^{r-2}) - log2(C_n^{r-1})) * log(2) )
        // now, let q_r = 1 + k_r * (n/(r-2)), then
        // C_n^r = C_n^{r-1} * q_r, or, equivalently,
        // log2(C_n^r) = log2(C_n^{r-1}) + log2(q_r)
        // Now, we can use the same method to compute C_n^{r+1}:
        // k_{r+1}   = C_n^{r-1} / C_n^r = 1 / q_r
        // q_{r+1}   = 1 + k_{r+1} * (n/(r-1))
        // C_n^{r+1} = C_n^r * q_{r+1}
        double log2Cnr1 = VariableLog2ParamComplexityCTable[3][xn];   // log2(C_n^5)
        double log2Cnr2 = VariableLog2ParamComplexityCTable[2][xn];   // log2(C_n^4)
        double log2Cnr  = 0.0;
        double k_r      = std::exp((log2Cnr2 - log2Cnr1) * M_LN2);
        double q_r      = 1.0 + k_r * n / (6.0 - 2.0);                // we first compute C_n^6
        for (std::size_t i = std::size_t(6); i <= r; ++i) {
          log2Cnr  = log2Cnr1 + std::log2(q_r);
          log2Cnr1 = log2Cnr;
          k_r      = 1.0 / q_r;
          q_r      = 1.0 + k_r * (n / (i - 1.0));
        }
 
        // if we use a cache, update it
        if (_use_cache_) { _cache_.insert(std::pair< std::size_t, double >{r, n}, log2Cnr); }
 
        return log2Cnr;
      }
    } else {
      // try to find the value in the cache
      if (_use_cache_) {
        try {
          return _cache_[std::pair< std::size_t, double >{r, n}];
        } catch (NotFound const&) {}
      }
 
      // compute the corrected Szpankowski approximation of cn2 (see the
      // constants _cst1_, _cst2_, _cst3_ declared in the class header)
      double log2Cnr1 = 0.5 * std::log2(n) + _cst1_ + _cst2_ / std::sqrt(n) + _cst3_ / n;
      if (r == std::size_t(2)) return log2Cnr1;
 
      // the value of log2(cn1), which is always equal to 0
      double log2Cnr2 = 0.0;
 
      // use Equation (13) of the paper to compute the value of cnr
      // (see the detail of the formulas in the above if statement)
      double k_r     = std::exp((log2Cnr2 - log2Cnr1) * M_LN2);
      double q_r     = 1.0 + k_r * n / (3.0 - 2.0);   // we first compute C_n^3
      double log2Cnr = 0.0;
      for (std::size_t i = std::size_t(3); i <= r; ++i) {
        log2Cnr  = log2Cnr1 + std::log2(q_r);
        log2Cnr1 = log2Cnr;
        k_r      = 1.0 / q_r;
        q_r      = 1.0 + k_r * (n / (i - 1.0));
      }
 
      // if we use a cache, update it
      if (_use_cache_) { _cache_.insert(std::pair< std::size_t, double >{r, n}, log2Cnr); }
 
      return log2Cnr;
    }
  }

References _cache_, _cst1_, _cst2_, _cst3_, _use_cache_, GUM_ERROR, log2Cnr(), M_LN2, gum::VariableLog2ParamComplexityCTable, gum::VariableLog2ParamComplexityCTableNSize, and gum::VariableLog2ParamComplexityCTableRSize.

Referenced by log2Cnr().
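
The log-domain form of the recurrence, derived in the comments of the listing, can be exercised in isolation. The sketch below is a standalone rewrite of that loop under a hypothetical name (log2CnrFromSeeds); it takes the two seed values as arguments instead of reading them from the precomputed table, and std::exp2(x) replaces std::exp(x * M_LN2), which computes the same quantity.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not aGrUM code): given log2(C_n^{s-1}) and
// log2(C_n^{s-2}), returns log2(C_n^r) for r >= s >= 3. Only ratios of
// consecutive C values are ever formed, never the C values themselves.
double log2CnrFromSeeds(double      log2Prev1,   // log2(C_n^{s-1})
                        double      log2Prev2,   // log2(C_n^{s-2})
                        std::size_t s,
                        std::size_t r,
                        double      n) {
  assert(s >= 3 && r >= s);
  // k = C_n^{s-2} / C_n^{s-1}, recovered from the difference of the logs
  double k = std::exp2(log2Prev2 - log2Prev1);
  // q = C_n^s / C_n^{s-1} = 1 + k * n / (s-2)
  double q       = 1.0 + k * n / double(s - 2);
  double log2Cur = log2Prev1;
  for (std::size_t i = s; i <= r; ++i) {
    log2Cur += std::log2(q);             // now holds log2(C_n^i)
    k = 1.0 / q;                         // C_n^{i-1} / C_n^i
    q = 1.0 + k * n / double(i - 1);     // C_n^{i+1} / C_n^i
  }
  return log2Cur;
}
```

Seeding with log2(C_2^2) = log2(2.5) and log2(C_2^1) = 0 for n = 2, the call log2CnrFromSeeds(std::log2(2.5), 0.0, 3, 5, 2.0) walks through C = 4.5, 7, 10 and returns log2(10).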

◆ operator=() [1/2]

VariableLog2ParamComplexity & gum::VariableLog2ParamComplexity::operator= ( const VariableLog2ParamComplexity & from )

copy operator

References VariableLog2ParamComplexity().

◆ operator=() [2/2]

VariableLog2ParamComplexity & gum::VariableLog2ParamComplexity::operator= ( VariableLog2ParamComplexity && from )

move operator

References VariableLog2ParamComplexity().

◆ useCache()

void gum::VariableLog2ParamComplexity::useCache ( const bool on_off )

indicates whether we wish to use a cache for the Cnr
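
The look-up-then-insert pattern that log2Cnr() applies when the cache is enabled can be sketched as follows. This is only an illustration of the pattern, under several assumptions: the class CachedValue and its placeholder computation are hypothetical, std::map replaces gum::HashTable, and nothing here states what aGrUM's useCache() does to entries that are already stored.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical stand-in for the caching behaviour (not aGrUM code).
class CachedValue {
 public:
  void        useCache(bool on_off) { use_cache_ = on_off; }
  void        clearCache() { cache_.clear(); }
  std::size_t nbComputations() const { return nb_computations_; }

  double get(std::size_t r, double n) {
    const std::pair< std::size_t, double > key{r, n};
    if (use_cache_) {
      const auto it = cache_.find(key);
      if (it != cache_.end()) return it->second;   // cache hit
    }
    const double value = compute_(r, n);           // cache miss or cache off
    if (use_cache_) cache_.emplace(key, value);
    return value;
  }

 private:
  // placeholder for an expensive computation; counts how often it runs
  double compute_(std::size_t r, double n) {
    ++nb_computations_;
    return double(r) * n;
  }

  bool                                             use_cache_{true};
  std::size_t                                      nb_computations_{0};
  std::map< std::pair< std::size_t, double >, double > cache_;
};
```

Calling get(6, 10.0) twice on a fresh object runs the placeholder computation once; after clearCache(), or with useCache(false), every call recomputes.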

Member Data Documentation

◆ _cache_

HashTable< std::pair< std::size_t, double >, double > gum::VariableLog2ParamComplexity::_cache_

private

Definition at line 170 of file variableLog2ParamComplexity.h.

Referenced by log2Cnr().

◆ _cst1_

const double gum::VariableLog2ParamComplexity::_cst1_ = -0.5 + std::log2(std::sqrt(M_PI))

private

constant term of Szpankowski's approximation of log2(C_n^2), used by log2Cnr() for large n

Definition at line 162 of file variableLog2ParamComplexity.h.

Referenced by log2Cnr().

◆ _cst2_

const double gum::VariableLog2ParamComplexity::_cst2_ = std::sqrt(2.0 / M_PI) / 3.0

private

Definition at line 163 of file variableLog2ParamComplexity.h.

Referenced by log2Cnr().

◆ _cst3_

const double gum::VariableLog2ParamComplexity::_cst3_ = 3.0 / 36.0 - 4.0 / (9.0 * M_PI)

private

Definition at line 164 of file variableLog2ParamComplexity.h.

Referenced by log2Cnr().
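
For n at or above VariableLog2ParamComplexityCTableNSize, log2Cnr() seeds its recurrence with an approximation of log2(C_n^2) built from _cst1_, _cst2_ and _cst3_. The sketch below reproduces only how the three constants are combined in that expression; the function name approxLog2Cn2 is hypothetical, std::acos(-1.0) is used in place of the non-standard M_PI, and no claim is made here about how close the result is to the exact value.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of aGrUM): the large-n expression for
// log2(C_n^2), with the constants combined as in log2Cnr().
double approxLog2Cn2(double n) {
  assert(n > 0.0);
  const double pi   = std::acos(-1.0);
  const double cst1 = -0.5 + std::log2(std::sqrt(pi));   // constant term
  const double cst2 = std::sqrt(2.0 / pi) / 3.0;         // coefficient of 1/sqrt(n)
  const double cst3 = 3.0 / 36.0 - 4.0 / (9.0 * pi);     // coefficient of 1/n
  return 0.5 * std::log2(n) + cst1 + cst2 / std::sqrt(n) + cst3 / n;
}
```

The leading term 0.5 * log2(n) dominates for large n, so multiplying n by 4 raises the result by very nearly 1.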

◆ _use_cache_

bool gum::VariableLog2ParamComplexity::_use_cache_ {true}

private

Definition at line 167 of file variableLog2ParamComplexity.h.

Referenced by log2Cnr().

The documentation for this class was generated from the following files:

agrum/base/core/math/variableLog2ParamComplexity.h
agrum/base/core/math/variableLog2ParamComplexity.cpp
