aGrUM 2.3.2
a C++ library for (probabilistic) graphical models
gum::VariableLog2ParamComplexity Class Reference

the class for computing the log2 of the parametric complexity of an r-ary multinomial variable More...

#include <variableLog2ParamComplexity.h>

Collaboration diagram for gum::VariableLog2ParamComplexity:

Public Member Functions

Constructors / Destructors
 VariableLog2ParamComplexity ()
 default constructor
 VariableLog2ParamComplexity (const VariableLog2ParamComplexity &from)
 copy constructor
 VariableLog2ParamComplexity (VariableLog2ParamComplexity &&from)
 move constructor
virtual VariableLog2ParamComplexityclone () const
 virtual copy constructor
virtual ~VariableLog2ParamComplexity ()
 destructor
Operators
VariableLog2ParamComplexityoperator= (const VariableLog2ParamComplexity &from)
 copy operator
VariableLog2ParamComplexityoperator= (VariableLog2ParamComplexity &&from)
 move operator
Accessors / Modifiers
double log2Cnr (const std::size_t r, const double n)
 returns the value of the log in base 2 of Cnr
void CnrToFile (const std::string &filename)
 the function used to write the cpp file with the values of log2(Cnr)
void useCache (const bool on_off)
 indicates whether we wish to use a cache for the Cnr
void clearCache ()
 clears the current cache

Private Attributes

const double _cst1_ = -0.5 + std::log2(std::sqrt(M_PI))
 the value of N above which we should use Szpankowski's approximation
const double _cst2_ = std::sqrt(2.0 / M_PI) / 3.0
const double _cst3_ = 3.0 / 36.0 - 4.0 / (9.0 * M_PI)
bool _use_cache_ {true}
HashTable< std::pair< std::size_t, double >, double_cache_

Detailed Description

the class for computing the log2 of the parametric complexity of an r-ary multinomial variable

This class enables to compute the log in base 2 of the parametric complexity of a single r-ary multinomial variable, i.e., the log in base 2 of the C_N^r term used by NML scores in Bayesian network structure learning algorithm (see, e.g., Silander, Roos, Kontkanen and Myllymaki (2007) "Factorized Normalized Maximum " Likelihood Criterion for Learning Bayesian network Structures)"

Definition at line 87 of file variableLog2ParamComplexity.h.

Constructor & Destructor Documentation

◆ VariableLog2ParamComplexity() [1/3]

gum::VariableLog2ParamComplexity::VariableLog2ParamComplexity ( )

default constructor

Referenced by VariableLog2ParamComplexity(), VariableLog2ParamComplexity(), clone(), operator=(), and operator=().

Here is the caller graph for this function:

◆ VariableLog2ParamComplexity() [2/3]

gum::VariableLog2ParamComplexity::VariableLog2ParamComplexity ( const VariableLog2ParamComplexity & from)

copy constructor

References VariableLog2ParamComplexity().

Here is the call graph for this function:

◆ VariableLog2ParamComplexity() [3/3]

gum::VariableLog2ParamComplexity::VariableLog2ParamComplexity ( VariableLog2ParamComplexity && from)

move constructor

References VariableLog2ParamComplexity().

Here is the call graph for this function:

◆ ~VariableLog2ParamComplexity()

virtual gum::VariableLog2ParamComplexity::~VariableLog2ParamComplexity ( )
virtual

destructor

Member Function Documentation

◆ clearCache()

void gum::VariableLog2ParamComplexity::clearCache ( )

clears the current cache

◆ clone()

virtual VariableLog2ParamComplexity * gum::VariableLog2ParamComplexity::clone ( ) const
virtual

virtual copy constructor

References VariableLog2ParamComplexity().

Here is the call graph for this function:

◆ CnrToFile()

void gum::VariableLog2ParamComplexity::CnrToFile ( const std::string & filename)

the function used to write the cpp file with the values of log2(Cnr)

Definition at line 156 of file variableLog2ParamComplexity.cpp.

156 {
157 // save all the value of cn2
158 std::vector< long double > cn2_table(VariableLog2ParamComplexityCTableNSize);
159 cn2_table[0] = 1;
160 cn2_table[1] = 2;
161
162 // for every value of n less than Szpankowski_threshold, we compute the
163 // value of C_n^2 and write it into cn2_table
164 GammaLog2 gamma_log2;
165 for (double n = 2; n < VariableLog2ParamComplexityCTableNSize; ++n) {
166 // here, note that, in Silander, Roos, Kontkanen and Myllymaki (2007)
167 // "Factorized Normalized Maximum Likelihood Criterion for Learning
168 // Bayesian network Structures" paper, there is an uppercase N in the
169 // formula, but this should be a lowercase n. In addition, we will loop
170 // only on h=1 to n-1 and add to 2.0 the value computed to take into
171 // account of h=0 and h=n.
172 long double cn2 = 2;
173 for (double h = 1; h < n; ++h) {
174 long double elt = (gamma_log2(n + 1) - gamma_log2(h + 1) - gamma_log2((n - h) + 1)) * M_LN2
175 + h * std::log(h / n) + (n - h) * std::log((n - h) / n);
176 cn2 += std::exp(elt);
177 }
178
179 // const double logCn2 = (double) std::log2 ( cn2 );
180
181 cn2_table[int(n)] = cn2;
182 }
183
184 // write the header of the output file
185 std::ofstream outfile(filename);
186 if (!outfile.is_open()) { GUM_ERROR(IOError, "It is impossible to open file " << filename) }
187 outfile.precision(20);
188 outfile << "namespace gum {\n\n";
189 /*
190 outfile << " // the size in r of VariableLog2ParamComplexityCTable\n";
191 outfile << " const std::size_t VariableLog2ParamComplexityCTableRSize = "
192 << "4;\n\n";
193 outfile << " // the size in n of VariableLog2ParamComplexityCTable\n";
194 outfile << " const std::size_t VariableLog2ParamComplexityCTableNSize = "
195 << VariableLog2ParamComplexityCTableNSize << ";\n\n";
196 */
197 outfile << " // the CTable cache for log2(Cnr), n < " << VariableLog2ParamComplexityCTableNSize
198 << " and r in {2,3,4,5}\n";
199 outfile << " const double VariableLog2ParamComplexityCTable[4]["
201
202 // write the values of Cn2:
203 outfile << " { ";
204 bool first = true;
205 for (const auto cn2: cn2_table) {
206 if (first) first = false;
207 else outfile << ",\n ";
208 const double logCn2 = (double)std::log2(cn2);
209 outfile << logCn2;
210 }
211 outfile << " },\n";
212
213 // write the values of cn3, which are equal to cn2 + n
214 outfile << " { ";
215 for (std::size_t i = std::size_t(0); i < VariableLog2ParamComplexityCTableNSize; ++i) {
216 if (i > std::size_t(0)) outfile << ",\n ";
217 const double logCn3 = (double)std::log2(cn2_table[i] + i);
218 outfile << logCn3;
219 }
220 outfile << " },\n";
221
222 // write the values of cn4, which are equal to cn2 * (1 + n/2) + n
223 outfile << " { ";
224 for (std::size_t i = std::size_t(0); i < VariableLog2ParamComplexityCTableNSize; ++i) {
225 if (i > std::size_t(0)) outfile << ",\n ";
226 const double logCn4 = (double)std::log2(cn2_table[i] * (1.0 + i / 2.0) + i);
227 outfile << logCn4;
228 }
229 outfile << " },\n";
230
231 // write the values of cn5, which are equal to cn2 * (1 + 5n/6) + n + n^2/3
232 outfile << " { ";
233 for (std::size_t i = std::size_t(0); i < VariableLog2ParamComplexityCTableNSize; ++i) {
234 if (i > std::size_t(0)) outfile << ",\n ";
235 const double logCn5
236 = (double)std::log2(cn2_table[i] * (1.0 + 5.0 * i / 6.0) + i + i * i / 3.0);
237 outfile << logCn5;
238 }
239 outfile << " }\n";
240
241 // write the footer and close the file
242 outfile << " };\n\n";
243 outfile << "} /* namespace gum */\n";
244 outfile.close();
245 }
#define GUM_ERROR(type, msg)
Definition exceptions.h:72
#define M_LN2
Definition math_utils.h:63
constexpr std::size_t VariableLog2ParamComplexityCTableNSize

References GUM_ERROR, M_LN2, and gum::VariableLog2ParamComplexityCTableNSize.

◆ log2Cnr()

double gum::VariableLog2ParamComplexity::log2Cnr ( const std::size_t r,
const double n )

returns the value of the log in base 2 of Cnr

Definition at line 59 of file variableLog2ParamComplexity.cpp.

59 {
60 // we know that c_n^1 = 1 for all values of n
61 // in addition, c_0^r = 1 for all values of r
62 // finally, it is easy to see that c_1^r = r for all r
63 if (r == std::size_t(1)) return 0.0; // log2(1)
64 if (n == 0.0) return 0.0; // log2(1)
65 if (n == 1.0) return std::log2((double)r); // log2(r)
66
67 if (n < 0.0) {
68 GUM_ERROR(OutOfBounds,
69 "In the penalty of the fNML score, n must be greater "
70 << "than or equal to 0. But, here, n = " << n);
71 }
72
74 // check if we can find the value we look for in precomputed table
75 // ScorefNMLVariableLog2ParamComplexity
76 std::size_t xn = (std::size_t)n;
78 return VariableLog2ParamComplexityCTable[r - 2][xn];
79 } else {
80 // try to find the value in the cache
81 if (_use_cache_) {
82 try {
83 return _cache_[std::pair< std::size_t, double >{r, n}];
84 } catch (NotFound const&) {}
85 }
86
87 // use Equation (13) of the paper to compute the value of cnr:
88 // C_n^r = C_n^{r-1} + (n / (r-2)) C_n^{r-2}
89 // as we handle only log2's of C_n^r, we have the following:
90 // let k_r be such that C_n^{r-2} = k_r * C_n^{r-1}
91 // log2 ( C_n^r ) = log2 ( C_n^{r-1} + k_r * (n/(r-2)) * C_n^{r-1} )
92 // = log2 ( C_n^{r-1} ) + log2 ( 1 + k_r * (n/(r-2)) )
93 // as k_r = C_n^{r-2} / C_n^{r-1}, we have that
94 // log2(k_r) = log2 ( C_n^{r-2} ) - log2 ( C_n^{r-1} )
95 // so, k_r = exp ( (log2(cn_^{r-2}) - log2(C_n^{r-1})) * log(2) )
96 // now, let q_r = 1 + k_r * (n/(r-2)), then
97 // C_n^r = C_n^{r-1} * q_r, or, equivalently,
98 // log2(C_n^r) = log2(C_n^{r-1}) + log2(q_r)
99 // Now, we can use the same method to compute C_n^{r+1}:
100 // k_{r+1} = C_n^{r-1} / C_n^r = 1 / q_r
101 // q_{r+1} = 1 + k_{r+1} * (n/(r-1))
102 // C_n^{r+1} = C_n^r * q_{r+1}
103 double log2Cnr1 = VariableLog2ParamComplexityCTable[3][xn]; // log(C_n^5)
104 double log2Cnr2 = VariableLog2ParamComplexityCTable[2][xn]; // log(C_n^4)
105 double log2Cnr = 0.0;
106 double k_r = std::exp((log2Cnr2 - log2Cnr1) * M_LN2);
107 double q_r = 1.0 + k_r * n / (6.0 - 2.0); // we first compute C_n^6
108 for (std::size_t i = std::size_t(6); i <= r; ++i) {
109 log2Cnr = log2Cnr1 + std::log2(q_r);
110 log2Cnr1 = log2Cnr;
111 k_r = 1.0 / q_r;
112 q_r = 1.0 + k_r * (n / (i - 1.0));
113 }
114
115 // if we use a cache, update it
116 if (_use_cache_) { _cache_.insert(std::pair< std::size_t, double >{r, n}, log2Cnr); }
117
118 return log2Cnr;
119 }
120 } else {
121 // try to find the value in the cache
122 if (_use_cache_) {
123 try {
124 return _cache_[std::pair< std::size_t, double >{r, n}];
125 } catch (NotFound const&) {}
126 }
127
128 // compute the corrected Szpankowski approximation of cn2 (see the
129 // documentation of constants cst1, cst2, cst3 in the ScorefNML header)
130 double log2Cnr1 = 0.5 * std::log2(n) + _cst1_ + _cst2_ / std::sqrt(n) + _cst3_ / n;
131 if (r == std::size_t(2)) return log2Cnr1;
132
133 // the value of log2(cn1), which is always equal to 0
134 double log2Cnr2 = 0.0;
135
136 // use Equation (13) of the paper to compute the value of cnr
137 // (see the detail of the formulas in the above if statement)
138 double k_r = std::exp((log2Cnr2 - log2Cnr1) * M_LN2);
139 double q_r = 1.0 + k_r * n / (3.0 - 2.0); // we first compute C_n^3
140 double log2Cnr = 0.0;
141 for (std::size_t i = std::size_t(3); i <= r; ++i) {
142 log2Cnr = log2Cnr1 + std::log2(q_r);
143 log2Cnr1 = log2Cnr;
144 k_r = 1.0 / q_r;
145 q_r = 1.0 + k_r * (n / (i - 1.0));
146 }
147
148 // if we use a cache, update it
149 if (_use_cache_) { _cache_.insert(std::pair< std::size_t, double >{r, n}, log2Cnr); }
150
151 return log2Cnr;
152 }
153 }
const double _cst1_
the value of N above which we should use Szpankowski's approximation
double log2Cnr(const std::size_t r, const double n)
returns the value of the log in base 2 of Cnr
HashTable< std::pair< std::size_t, double >, double > _cache_
const double VariableLog2ParamComplexityCTable[4][1000]
constexpr std::size_t VariableLog2ParamComplexityCTableRSize

References _cache_, _cst1_, _cst2_, _cst3_, _use_cache_, GUM_ERROR, log2Cnr(), M_LN2, gum::VariableLog2ParamComplexityCTable, gum::VariableLog2ParamComplexityCTableNSize, and gum::VariableLog2ParamComplexityCTableRSize.

Referenced by log2Cnr().

Here is the call graph for this function:
Here is the caller graph for this function:

◆ operator=() [1/2]

VariableLog2ParamComplexity & gum::VariableLog2ParamComplexity::operator= ( const VariableLog2ParamComplexity & from)

copy operator

References VariableLog2ParamComplexity().

Here is the call graph for this function:

◆ operator=() [2/2]

VariableLog2ParamComplexity & gum::VariableLog2ParamComplexity::operator= ( VariableLog2ParamComplexity && from)

move operator

References VariableLog2ParamComplexity().

Here is the call graph for this function:

◆ useCache()

void gum::VariableLog2ParamComplexity::useCache ( const bool on_off)

indicates whether we wish to use a cache for the Cnr

Member Data Documentation

◆ _cache_

HashTable< std::pair< std::size_t, double >, double > gum::VariableLog2ParamComplexity::_cache_
private

Definition at line 170 of file variableLog2ParamComplexity.h.

Referenced by log2Cnr().

◆ _cst1_

const double gum::VariableLog2ParamComplexity::_cst1_ = -0.5 + std::log2(std::sqrt(M_PI))
private

the value of N above which we should use Szpankowski's approximation

Definition at line 162 of file variableLog2ParamComplexity.h.

Referenced by log2Cnr().

◆ _cst2_

const double gum::VariableLog2ParamComplexity::_cst2_ = std::sqrt(2.0 / M_PI) / 3.0
private

Definition at line 163 of file variableLog2ParamComplexity.h.

Referenced by log2Cnr().

◆ _cst3_

const double gum::VariableLog2ParamComplexity::_cst3_ = 3.0 / 36.0 - 4.0 / (9.0 * M_PI)
private

Definition at line 164 of file variableLog2ParamComplexity.h.

Referenced by log2Cnr().

◆ _use_cache_

bool gum::VariableLog2ParamComplexity::_use_cache_ {true}
private

Definition at line 167 of file variableLog2ParamComplexity.h.

167{true};

Referenced by log2Cnr().


The documentation for this class was generated from the following files: