aGrUM 2.3.2
a C++ library for (probabilistic) graphical models
gum::StructuredPlaner< GUM_SCALAR > Class Template Reference

<agrum/FMDP/planning/structuredPlaner.h>

#include <structuredPlaner.h>


Public Member Functions

Datastructure access methods
INLINE const FMDP< GUM_SCALAR > * fmdp ()
 Returns a const pointer to the Factored Markov Decision Process on which we're planning.
INLINE const MultiDimFunctionGraph< GUM_SCALAR > * vFunction ()
 Returns a const pointer to the value function computed so far.
virtual Size vFunctionSize ()
 Returns the current size of the value function computed so far.
INLINE MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * optimalPolicy ()
 Returns the best policy obtained so far.
virtual Size optimalPolicySize ()
 Returns the current size of the optimal policy computed so far.
std::string optimalPolicy2String ()
 Provides a nicer toDot output for the optimal policy, where the leaves show the action names instead of their ids.
Planning Methods
virtual void initialize (const FMDP< GUM_SCALAR > *fmdp)
 Initializes the data structures needed for planning.
virtual void makePlanning (Idx nbStep=1000000)
 Performs a value iteration.

Static Public Member Functions

static StructuredPlaner< GUM_SCALAR > * spumddInstance (GUM_SCALAR discountFactor=0.9, GUM_SCALAR epsilon=0.00001, bool verbose=true)
static StructuredPlaner< GUM_SCALAR > * sviInstance (GUM_SCALAR discountFactor=0.9, GUM_SCALAR epsilon=0.00001, bool verbose=true)

Protected Member Functions

Value Iteration Methods
virtual void initVFunction_ ()
 Initializes the value function.
virtual MultiDimFunctionGraph< GUM_SCALAR > * valueIteration_ ()
 Performs a single step of value iteration.
virtual MultiDimFunctionGraph< GUM_SCALAR > * evalQaction_ (const MultiDimFunctionGraph< GUM_SCALAR > *, Idx)
 Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.
virtual MultiDimFunctionGraph< GUM_SCALAR > * maximiseQactions_ (std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &)
 Performs max_a Q(s,a).
virtual MultiDimFunctionGraph< GUM_SCALAR > * minimiseFunctions_ (std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &)
 Performs min_i F_i.
virtual MultiDimFunctionGraph< GUM_SCALAR > * addReward_ (MultiDimFunctionGraph< GUM_SCALAR > *function, Idx actionId=0)
 Computes R(s) + gamma * function.

Protected Attributes

const FMDP< GUM_SCALAR > * fmdp_
 The Factored Markov Decision Process describing our planning situation (NB: it must have function graphs as transition and reward functions).
MultiDimFunctionGraph< GUM_SCALAR > * vFunction_
 The Value Function computed iteratively.
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * optimalPolicy_
 The associated optimal policy.
gum::VariableSet elVarSeq_
 A set used to eliminate primed variables.
GUM_SCALAR discountFactor_
 Discount Factor used for infinite horizon planning.
IOperatorStrategy< GUM_SCALAR > * operator_
bool verbose_
 Boolean used to indicate whether or not iteration information should be displayed on the terminal.

Private Attributes

GUM_SCALAR _threshold_
 The threshold value: whenever |V^{n} - V^{n+1}| < threshold, we consider that V ≈ V*.
bool _firstTime_

Constructor & destructor.

 StructuredPlaner (IOperatorStrategy< GUM_SCALAR > *opi, GUM_SCALAR discountFactor, GUM_SCALAR epsilon, bool verbose)
 Default constructor.
virtual ~StructuredPlaner ()
 Default destructor.

Optimal policy extraction methods

virtual void evalPolicy_ ()
 Performs the required tasks to extract an optimal policy.
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * makeArgMax_ (const MultiDimFunctionGraph< GUM_SCALAR > *Qaction, Idx actionId)
 Creates a copy of the given Qaction that can be exploited by an argmax.
virtual MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * argmaximiseQactions_ (std::vector< MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * > &)
 Performs argmax_a Q(s,a).
void extractOptimalPolicy_ (const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *optimalValueFunction)
 From V*(s) = max_a Q*(s,a), extracts π*(s), mainly by extracting, from each ArgMaxSet present at the leaves, the associated ActionSet.
NodeId _recurArgMaxCopy_ (NodeId, Idx, const MultiDimFunctionGraph< GUM_SCALAR > *, MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *, HashTable< NodeId, NodeId > &)
 Recursion part for makeArgMax_.
NodeId _recurExtractOptPol_ (NodeId, const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *, HashTable< NodeId, NodeId > &)
 Recursion part for extractOptimalPolicy_.
void _transferActionIds_ (const ArgMaxSet< GUM_SCALAR, Idx > &, ActionSet &)
 Extracts from an ArgMaxSet the associated ActionSet.

Detailed Description

template<typename GUM_SCALAR>
class gum::StructuredPlaner< GUM_SCALAR >

<agrum/FMDP/planning/structuredPlaner.h>

A class to find an optimal policy for a given FMDP.

Performs structured value iteration planning.

Pure virtual functions: regress_, maximize_, argmaximize_, add_ and subtract_ are a priori the ones to be respecified according to the datastructure used (MDDs, DTs, BNs, ...).
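The backup this planner performs on decision diagrams can be illustrated on a flat (tabular) MDP. The sketch below is a hypothetical standalone analogue, not the class's actual implementation (which operates on MultiDimFunctionGraph instances); the name `bellmanBackup` and the array-based representation are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical tabular illustration of the backup done with decision diagrams:
//   Q(s,a) = R(s) + gamma * sum_{s'} P(s'|s,a) * V(s')
//   V(s)   = max_a Q(s,a)
double bellmanBackup(std::size_t s,
                     const std::vector< double >& reward,  // R(s)
                     const std::vector< std::vector< std::vector< double > > >& P,  // P[a][s][s']
                     const std::vector< double >& V,
                     double gamma) {
  double best = -1e300;
  for (std::size_t a = 0; a < P.size(); ++a) {
    double expectation = 0.0;   // sum over successor states s'
    for (std::size_t sp = 0; sp < V.size(); ++sp)
      expectation += P[a][s][sp] * V[sp];
    best = std::max(best, reward[s] + gamma * expectation);
  }
  return best;
}
```

The structured variants (SPUDD, SVI) compute exactly this quantity, but factored over decision diagrams so that states sharing the same sub-diagram are backed up only once.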

Definition at line 82 of file structuredPlaner.h.

Constructor & Destructor Documentation

◆ StructuredPlaner()

template<typename GUM_SCALAR>
INLINE gum::StructuredPlaner< GUM_SCALAR >::StructuredPlaner ( IOperatorStrategy< GUM_SCALAR > * opi,
GUM_SCALAR discountFactor,
GUM_SCALAR epsilon,
bool verbose )
protected

Default constructor.

Definition at line 86 of file structuredPlaner_tpl.h.

    // excerpt from structuredPlaner_tpl.h (member initialisation list elided)
    vFunction_     = nullptr;
    optimalPolicy_ = nullptr;
  }

References StructuredPlaner(), _threshold_, discountFactor_, operator_, optimalPolicy_, verbose_, and vFunction_.

Referenced by StructuredPlaner(), ~StructuredPlaner(), spumddInstance(), and sviInstance().


◆ ~StructuredPlaner()

template<typename GUM_SCALAR>
INLINE gum::StructuredPlaner< GUM_SCALAR >::~StructuredPlaner ( )
virtual

Default destructor.

Definition at line 102 of file structuredPlaner_tpl.h.

  {
    if (vFunction_) { delete vFunction_; }

    if (optimalPolicy_) { delete optimalPolicy_; }

    delete operator_;
  }

References StructuredPlaner(), operator_, optimalPolicy_, and vFunction_.


Member Function Documentation

◆ _recurArgMaxCopy_()

template<typename GUM_SCALAR>
NodeId gum::StructuredPlaner< GUM_SCALAR >::_recurArgMaxCopy_ ( NodeId currentNodeId,
Idx actionId,
const MultiDimFunctionGraph< GUM_SCALAR > * src,
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * argMaxCpy,
HashTable< NodeId, NodeId > & visitedNodes )
private

Recursion part for makeArgMax_.

Definition at line 499 of file structuredPlaner_tpl.h.

  {
    NodeId nody;
    if (src->isTerminalNode(currentNodeId)) {
      /* ... (ArgMaxSet leaf built from the node value and actionId; elided) ... */
      nody = argMaxCpy->manager()->addTerminalNode(leaf);
    } else {
      NodeId* sonsMap = static_cast< NodeId* >(
          SOA_ALLOCATE(sizeof(NodeId) * currentNode->nodeVar()->domainSize()));
      for (Idx moda = 0; moda < currentNode->nodeVar()->domainSize(); ++moda)
        /* ... (recursive copy of each son into sonsMap; elided) ... */;
      nody = argMaxCpy->manager()->addInternalNode(currentNode->nodeVar(), sonsMap);
    }
    return nody;
  }

References _recurArgMaxCopy_(), gum::DiscreteVariable::domainSize(), gum::HashTable< Key, Val >::exists(), gum::HashTable< Key, Val >::insert(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::isTerminalNode(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::node(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::nodeValue(), gum::InternalNode::nodeVar(), SOA_ALLOCATE, and gum::InternalNode::son().

Referenced by _recurArgMaxCopy_(), and makeArgMax_().


◆ _recurExtractOptPol_()

template<typename GUM_SCALAR>
NodeId gum::StructuredPlaner< GUM_SCALAR >::_recurExtractOptPol_ ( NodeId currentNodeId,
const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * argMaxOptVFunc,
HashTable< NodeId, NodeId > & visitedNodes )
private

Recursion part for extractOptimalPolicy_.

Definition at line 576 of file structuredPlaner_tpl.h.

  {
    NodeId nody;
    if (argMaxOptVFunc->isTerminalNode(currentNodeId)) {
      /* ... (ActionSet leaf built via _transferActionIds_; elided) ... */
      nody = optimalPolicy_->manager()->addTerminalNode(leaf);
    } else {
      NodeId* sonsMap = static_cast< NodeId* >(
          SOA_ALLOCATE(sizeof(NodeId) * currentNode->nodeVar()->domainSize()));
      for (Idx moda = 0; moda < currentNode->nodeVar()->domainSize(); ++moda)
        /* ... (recursive extraction of each son into sonsMap; elided) ... */;
      nody = optimalPolicy_->manager()->addInternalNode(currentNode->nodeVar(), sonsMap);
    }
    return nody;
  }

References _recurExtractOptPol_(), _transferActionIds_(), gum::DiscreteVariable::domainSize(), gum::HashTable< Key, Val >::exists(), gum::HashTable< Key, Val >::insert(), gum::InternalNode::nodeVar(), optimalPolicy_, SOA_ALLOCATE, and gum::InternalNode::son().

Referenced by _recurExtractOptPol_(), and extractOptimalPolicy_().


◆ _transferActionIds_()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::_transferActionIds_ ( const ArgMaxSet< GUM_SCALAR, Idx > & src,
ActionSet & dest )
private

Extract from an ArgMaxSet the associated ActionSet.

Definition at line 604 of file structuredPlaner_tpl.h.

  {
    for (auto idi = src.beginSafe(); idi != src.endSafe(); ++idi)
      dest += *idi;
  }

References gum::ArgMaxSet< GUM_SCALAR_VAL, GUM_SCALAR_SEQ >::beginSafe(), and gum::ArgMaxSet< GUM_SCALAR_VAL, GUM_SCALAR_SEQ >::endSafe().

Referenced by _recurExtractOptPol_().


◆ addReward_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::addReward_ ( MultiDimFunctionGraph< GUM_SCALAR > * function,
Idx actionId = 0 )
protectedvirtual

Computes R(s) + gamma * function.

Warning
The input function is deleted; a new one is returned.

Definition at line 408 of file structuredPlaner_tpl.h.

  {
    /* ... (allocation of newVFunction elided) ... */
    // ... we multiply the result by the discount factor, ...
    newVFunction->copyAndMultiplyByScalar(*Vold, this->discountFactor_);
    delete Vold;

    // ... and finally add the reward
    /* ... (addition of the reward function elided) ... */

    return newVFunction;
  }
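A flat analogue makes the contract explicit: scale the incoming value function by the discount factor, add the reward, and hand back a fresh object (the diagram version deletes its input). The function name `addReward` and the vector representation below are hypothetical; the class itself does this with MultiDimFunctionGraph operations.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tabular analogue of addReward_:
//   V_new(s) = R(s) + gamma * V_old(s)
// Takes the old function by value, so the caller's copy is conceptually consumed.
std::vector< double > addReward(std::vector< double > function,
                                const std::vector< double >& reward,
                                double gamma) {
  for (std::size_t s = 0; s < function.size(); ++s)
    function[s] = reward[s] + gamma * function[s];
  return function;
}
```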

References gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::copyAndMultiplyByScalar(), discountFactor_, fmdp_, operator_, and RECAST.

Referenced by evalPolicy_().


◆ argmaximiseQactions_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * gum::StructuredPlaner< GUM_SCALAR >::argmaximiseQactions_ ( std::vector< MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * > & qActionsSet)
protectedvirtual

Performs argmax_a Q(s,a).

Warning
Also performs the deallocation of the QActions.

Definition at line 529 of file structuredPlaner_tpl.h.

References operator_.

Referenced by evalPolicy_().


◆ evalPolicy_()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::evalPolicy_ ( )
protectedvirtual

Perform the required tasks to extract an optimal policy.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 435 of file structuredPlaner_tpl.h.

  {
    // Loop reset
    /* ... (allocation of newVFunction elided) ... */
    newVFunction->copyAndReassign(*vFunction_, fmdp_->mapMainPrime());

    // For each action: evaluate the Qaction, add the reward, and build its argmax copy
    for (auto actionIter = fmdp_->beginActions(); actionIter != fmdp_->endActions(); ++actionIter) {
      /* ... (qAction evaluated via evalQaction_; elided) ... */
      qAction = this->addReward_(qAction);
      /* ... (makeArgMax_ conversion pushed onto the argmax set; elided) ... */
    }
    delete newVFunction;

    // To evaluate the main value function, we then maximise over all action
    // values and extract the optimal policy from the resulting argmax diagram.
    /* ... (argmaximiseQactions_ and extractOptimalPolicy_ calls elided) ... */
  }

References addReward_(), argmaximiseQactions_(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::copyAndReassign(), evalQaction_(), extractOptimalPolicy_(), fmdp_, makeArgMax_(), operator_, and vFunction_.


◆ evalQaction_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::evalQaction_ ( const MultiDimFunctionGraph< GUM_SCALAR > * Vold,
Idx actionId )
protectedvirtual

Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.

Definition at line 357 of file structuredPlaner_tpl.h.

  {
    // Initialisation:
    // Create a copy of the last value function to derive the new Qaction,
    // and find the first variable to eliminate (the one at the end)
    return operator_->regress(Vold, actionId, this->fmdp_, this->elVarSeq_);
  }

References elVarSeq_, fmdp_, and operator_.

Referenced by evalPolicy_().


◆ extractOptimalPolicy_()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::extractOptimalPolicy_ ( const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * optimalValueFunction)
protected

From V*(s) = max_a Q*(s,a), this function extracts π*(s). It mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet.

Warning
Deallocates the argmax optimal value function.

Definition at line 552 of file structuredPlaner_tpl.h.

  {
    optimalPolicy_->clear();

    // Insert the new variables
    for (auto varIter = argMaxOptimalValueFunction->variablesSequence().beginSafe();
         varIter != argMaxOptimalValueFunction->variablesSequence().endSafe();
         ++varIter)
      optimalPolicy_->add(**varIter);

    /* ... (HashTable src2dest elided) ... */
    optimalPolicy_->manager()->setRootNode(_recurExtractOptPol_(argMaxOptimalValueFunction->root(),
                                                                argMaxOptimalValueFunction,
                                                                src2dest));

    /* ... (deletion of argMaxOptimalValueFunction elided) ... */
  }
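The reason leaves carry an ArgMaxSet rather than a single action id is that several actions can be optimal in the same region of the state space. The hypothetical flat sketch below (names `argMaxSet` and the vector representation are assumptions) shows the tie-preserving extraction for one state:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical flat analogue of an ArgMaxSet leaf: for one state, collect the
// ids of *all* actions achieving the maximal Q-value, preserving ties.
std::vector< std::size_t > argMaxSet(const std::vector< double >& q) {
  double best = q[0];
  for (double v : q) best = std::max(best, v);
  std::vector< std::size_t > set;
  for (std::size_t a = 0; a < q.size(); ++a)
    if (std::fabs(q[a] - best) < 1e-12) set.push_back(a);   // keep every maximiser
  return set;
}
```

Extracting the policy then amounts to turning each such set into an ActionSet leaf, which is what _transferActionIds_ does in the diagram version.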

References _recurExtractOptPol_(), and optimalPolicy_.

Referenced by evalPolicy_().


◆ fmdp()

template<typename GUM_SCALAR>
INLINE const FMDP< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::fmdp ( )
inline

Returns a const pointer to the Factored Markov Decision Process on which we're planning.

Definition at line 148 of file structuredPlaner.h.

{ return fmdp_; }

References fmdp_.

Referenced by initialize().


◆ initialize()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::initialize ( const FMDP< GUM_SCALAR > * fmdp)
virtual

Initializes the data structures needed for planning.

Warning
Not calling this method before the first call to makePlanning() will result in a crash.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 243 of file structuredPlaner_tpl.h.

  {
    fmdp_ = fmdp;

    // Determination of the threshold value
    /* ... (_threshold_ computed from epsilon and discountFactor_; elided) ... */

    // Establishment of the variable elimination sequence
    for (auto varIter = fmdp_->beginVariables(); varIter != fmdp_->endVariables(); ++varIter)
      elVarSeq_ << fmdp_->main2prime(*varIter);

    // Initialisation of the value function
    vFunction_     = operator_->getFunctionInstance();
    optimalPolicy_ = operator_->getAggregatorInstance();
    _firstTime_    = true;
  }

References _threshold_, discountFactor_, fmdp(), and fmdp_.

Referenced by gum::AdaptiveRMaxPlaner::initialize().


◆ initVFunction_()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::initVFunction_ ( )
protectedvirtual

Initializes the value function.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 308 of file structuredPlaner_tpl.h.

  {
    vFunction_->copy(*(RECAST(fmdp_->reward())));
  }

References fmdp_, RECAST, and vFunction_.

Referenced by makePlanning().


◆ makeArgMax_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * gum::StructuredPlaner< GUM_SCALAR >::makeArgMax_ ( const MultiDimFunctionGraph< GUM_SCALAR > * Qaction,
Idx actionId )
protected

Creates a copy of the given Qaction that can be exploited by an argmax.

Hence, this step consists in replacing each leaf by an ArgMaxSet containing the value of the leaf and the actionId of the Qaction.

Parameters
Qaction: the function graph we want to transform
actionId: the action id associated to that graph
Warning
Deletes the original Qaction and returns its conversion.

Definition at line 474 of file structuredPlaner_tpl.h.

  {
    /* ... (amcpy declared with the argmax diagram type) ... */
        = operator_->getArgMaxFunctionInstance();

    // Insert the new variables
    for (auto varIter = qAction->variablesSequence().beginSafe();
         varIter != qAction->variablesSequence().endSafe();
         ++varIter)
      amcpy->add(**varIter);

    /* ... (HashTable src2dest elided) ... */
    amcpy->manager()->setRootNode(
        /* ... (_recurArgMaxCopy_ from qAction's root; elided) ... */);

    delete qAction;
    return amcpy;
  }

References _recurArgMaxCopy_(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::add(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::manager(), operator_, gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::root(), and gum::MultiDimImplementation< GUM_SCALAR >::variablesSequence().

Referenced by evalPolicy_().


◆ makePlanning()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::makePlanning ( Idx nbStep = 1000000)
virtual

Performs a value iteration.

Parameters
nbStep: lets you specify how many value-iteration steps to perform at most. makePlanning will stop either when the optimal value function is reached or when nbStep iterations have been performed.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 263 of file structuredPlaner_tpl.h.

  {
    if (_firstTime_) {
      this->initVFunction_();
      _firstTime_ = false;
    }

    // *****************************************************************************************
    // Main loop
    // *****************************************************************************************
    Idx nbIte = 0;
    /* ... (gap initialised above the threshold; elided) ... */
    while ((gap > _threshold_) && (nbIte < nbStep)) {
      ++nbIte;

      /* ... (one step of valueIteration_ producing newVFunction; elided) ... */

      // Then we compare the new value function with the old one
      /* ... (deltaV = difference between the two; elided) ... */
      gap = 0;

      for (deltaV->beginValues(); deltaV->hasValue(); deltaV->nextValue())
        if (gap < fabs(deltaV->value())) gap = fabs(deltaV->value());
      delete deltaV;

      if (verbose_)
        std::cout << " ------------------- End of iteration n° " << nbIte << std::endl
                  << " Gap : " << gap << " - " << _threshold_ << std::endl;

      // And eventually we update the pointers for the next loop
      delete vFunction_;
      /* ... (vFunction_ updated to the new function; elided) ... */
    }

    // *****************************************************************************************
    // Search for the policy matching the value function
    // *****************************************************************************************
    this->evalPolicy_();
  }
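The loop structure above can be sketched end to end on a flat MDP. The function below is a hypothetical tabular version (names `makePlanningFlat` and the array representation are assumptions, not part of aGrUM): initialise V with the reward, back up until the sup-norm gap between successive value functions drops below the threshold or the step budget runs out.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tabular version of the makePlanning() loop.
std::vector< double > makePlanningFlat(
    const std::vector< double >& reward,
    const std::vector< std::vector< std::vector< double > > >& P,  // P[a][s][s']
    double gamma, double threshold, std::size_t nbStep) {
  std::vector< double > V = reward;    // initVFunction_: V^0 = R
  double gap = threshold + 1.0;
  for (std::size_t ite = 0; gap > threshold && ite < nbStep; ++ite) {
    std::vector< double > newV(V.size());
    for (std::size_t s = 0; s < V.size(); ++s) {   // valueIteration_ step
      double best = -1e300;
      for (std::size_t a = 0; a < P.size(); ++a) {
        double e = 0.0;
        for (std::size_t sp = 0; sp < V.size(); ++sp) e += P[a][s][sp] * V[sp];
        best = std::max(best, reward[s] + gamma * e);
      }
      newV[s] = best;
    }
    gap = 0.0;                                     // sup-norm |V^{n+1} - V^n|
    for (std::size_t s = 0; s < V.size(); ++s)
      gap = std::max(gap, std::fabs(newV[s] - V[s]));
    V = newV;
  }
  return V;
}
```

With gamma < 1 the gap contracts geometrically, which is why the threshold-based stopping rule terminates well before the default nbStep budget on typical problems.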

References _firstTime_, and initVFunction_().

Referenced by gum::AdaptiveRMaxPlaner::makePlanning().


◆ maximiseQactions_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::maximiseQactions_ ( std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > & qActionsSet)
protectedvirtual

Performs max_a Q(s,a).

Warning
Also performs the deallocation of the QActions.

Definition at line 371 of file structuredPlaner_tpl.h.

  {
    /* ... (newVFunction taken from the back of qActionsSet; elided) ... */
    qActionsSet.pop_back();

    while (!qActionsSet.empty()) {
      /* ... (qActionsSet.back() merged into newVFunction via operator_; elided) ... */
      qActionsSet.pop_back();
    }

    return newVFunction;
  }
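The pop-and-merge structure is easiest to see on plain vectors. The sketch below is a hypothetical flat analogue (`maximiseQactions` on `std::vector` is an assumption for illustration): each Q-function is popped off the set and folded into a running pointwise maximum, so the input set is consumed, mirroring how the diagram version also deallocates each popped graph.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical flat analogue of maximiseQactions_: consume the set of
// Q-functions, returning their pointwise maximum V(s) = max_a Q(s,a).
std::vector< double > maximiseQactions(std::vector< std::vector< double > >& qActionsSet) {
  std::vector< double > newV = qActionsSet.back();
  qActionsSet.pop_back();
  while (!qActionsSet.empty()) {
    const std::vector< double >& q = qActionsSet.back();
    for (std::size_t s = 0; s < newV.size(); ++s)
      newV[s] = std::max(newV[s], q[s]);   // pointwise max over actions
    qActionsSet.pop_back();
  }
  return newV;
}
```

minimiseFunctions_ has the identical shape with min in place of max.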

References operator_.

◆ minimiseFunctions_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::minimiseFunctions_ ( std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > & qActionsSet)
protectedvirtual

Performs min_i F_i.

Warning
Also performs the deallocation of the F_i.

Definition at line 389 of file structuredPlaner_tpl.h.

  {
    /* ... (newVFunction taken from the back of qActionsSet; elided) ... */
    qActionsSet.pop_back();

    while (!qActionsSet.empty()) {
      /* ... (qActionsSet.back() merged into newVFunction via operator_; elided) ... */
      qActionsSet.pop_back();
    }

    return newVFunction;
  }

References operator_.

◆ optimalPolicy()

template<typename GUM_SCALAR>
INLINE MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * gum::StructuredPlaner< GUM_SCALAR >::optimalPolicy ( )
inlinevirtual

Returns the best policy obtained so far.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 163 of file structuredPlaner.h.

  {
    return optimalPolicy_;
  }

References optimalPolicy_.

◆ optimalPolicy2String()

template<typename GUM_SCALAR>
std::string gum::StructuredPlaner< GUM_SCALAR >::optimalPolicy2String ( )
virtual

Provides a nicer toDot output for the optimal policy, where the leaves show the action names instead of their ids.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 124 of file structuredPlaner_tpl.h.

  {
    // ************************************************************************
    // Discard the case where no π* has been computed yet
    if (!optimalPolicy_ || optimalPolicy_->root() == 0) return "NO OPTIMAL POLICY CALCULATED YET";

    // ************************************************************************
    // Initialisation

    // Declaration of the needed string streams
    /* ... (output, terminalStream, nonTerminalStream and arcstream declared; elided) ... */

    // First line of the toDot output
    output << std::endl << "digraph \" OPTIMAL POLICY \" {" << std::endl;

    // Form lines for the internal-node stream and the terminal-node stream
    terminalStream << "node [shape = box];" << std::endl;
    nonTerminalStream << "node [shape = ellipse];" << std::endl;

    // For some clarity in the final string
    std::string tab = "\t";

    // To know whether we already checked a node or not
    /* ... (visited-node set declared; elided) ... */

    // FIFO of nodes to visit
    /* ... (fifo declared; elided) ... */

    // Loading the FIFO
    fifo.push(optimalPolicy_->root());
    visited << optimalPolicy_->root();

    // ************************************************************************
    // Main loop
    while (!fifo.empty()) {
      // Node to visit
      NodeId currentNodeId = fifo.front();
      fifo.pop();

      // Checking whether it is terminal
      if (optimalPolicy_->isTerminalNode(currentNodeId)) {
        // Get back the associated ActionSet
        /* ... (ase fetched from the node value; elided) ... */

        // Creating a line for this node
        terminalStream << tab << currentNodeId << ";" << tab << currentNodeId << " [label=\""
                       << currentNodeId << " - ";

        // Enumerating and adding to the line the associated optimal actions
        for (SequenceIteratorSafe< Idx > valIter = ase.beginSafe(); valIter != ase.endSafe();
             ++valIter)
          terminalStream << fmdp_->actionName(*valIter) << " ";

        // Terminating the line
        terminalStream << "\"];" << std::endl;
        continue;
      }

      // Otherwise
      {
        // Getting back the associated internal node
        /* ... (currentNode fetched from the diagram; elided) ... */

        // Creating a line in the internal-node stream for this node
        nonTerminalStream << tab << currentNodeId << ";" << tab << currentNodeId << " [label=\""
                          << currentNodeId << " - " << currentNode->nodeVar()->name() << "\"];"
                          << std::endl;

        // Going through the sons and aggregating them according to the son ids
        /* ... (sonMap declared; elided) ... */
        for (Idx sonIter = 0; sonIter < currentNode->nbSons(); ++sonIter) {
          if (!visited.exists(currentNode->son(sonIter))) {
            fifo.push(currentNode->son(sonIter));
            visited << currentNode->son(sonIter);
          }
          if (!sonMap.exists(currentNode->son(sonIter)))
            sonMap.insert(currentNode->son(sonIter), new LinkedList< Idx >());
          sonMap[currentNode->son(sonIter)]->addLink(sonIter);
        }

        // Adding to the arc stream
        for (auto sonIter = sonMap.beginSafe(); sonIter != sonMap.endSafe(); ++sonIter) {
          arcstream << tab << currentNodeId << " -> " << sonIter.key() << " [label=\" ";
          Link< Idx >* modaIter = sonIter.val()->list();
          while (modaIter) {
            arcstream << currentNode->nodeVar()->label(modaIter->element());
            if (modaIter->nextLink()) arcstream << ", ";
            modaIter = modaIter->nextLink();
          }
          arcstream << "\",color=\"#00ff00\"];" << std::endl;
          delete sonIter.val();
        }
      }
    }

    // Terminating
    output << terminalStream.str() << std::endl
           << nonTerminalStream.str() << std::endl
           << arcstream.str() << std::endl
           << "}" << std::endl;

    return output.str();
  }
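The structure of that traversal can be reproduced in miniature. The sketch below is a hypothetical standalone version (the `PolicyNode` struct and `policy2Dot` are illustrative stand-ins for aGrUM's diagram types): a breadth-first walk that accumulates action leaves as boxes, variable tests as ellipses, and arcs labelled with the branching modality, then glues the three streams into one Graphviz digraph string.

```cpp
#include <queue>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical miniature of optimalPolicy2String().
struct PolicyNode { std::string label; std::vector< int > sons; };  // empty sons => leaf

std::string policy2Dot(const std::vector< PolicyNode >& nodes, int root) {
  std::ostringstream terminal, internal, arcs;
  terminal << "node [shape = box];\n";       // action leaves
  internal << "node [shape = ellipse];\n";   // variable tests
  std::queue< int > fifo;
  fifo.push(root);
  while (!fifo.empty()) {
    int id = fifo.front();
    fifo.pop();
    const PolicyNode& n = nodes[id];
    if (n.sons.empty()) {                    // terminal: print the action name
      terminal << "\t" << id << " [label=\"" << n.label << "\"];\n";
      continue;
    }
    internal << "\t" << id << " [label=\"" << n.label << "\"];\n";
    for (std::size_t m = 0; m < n.sons.size(); ++m) {
      arcs << "\t" << id << " -> " << n.sons[m] << " [label=\"" << m << "\"];\n";
      fifo.push(n.sons[m]);                  // visit each son (tree: no cycles)
    }
  }
  return "digraph \"OPTIMAL POLICY\" {\n" + terminal.str() + internal.str() + arcs.str() + "}\n";
}
```

The real method additionally memoises visited nodes (the diagram is a DAG, not a tree) and groups parallel arcs to the same son under one edge label.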

References optimalPolicy_.

◆ optimalPolicySize()

template<typename GUM_SCALAR>
virtual Size gum::StructuredPlaner< GUM_SCALAR >::optimalPolicySize ( )
inlinevirtual

Returns the current size of the optimal policy computed so far.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 170 of file structuredPlaner.h.

  {
    return optimalPolicy_ != nullptr ? optimalPolicy_->realSize() : 0;
  }

References optimalPolicy_.

◆ spumddInstance()

template<typename GUM_SCALAR>
StructuredPlaner< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::spumddInstance ( GUM_SCALAR discountFactor = 0.9,
GUM_SCALAR epsilon = 0.00001,
bool verbose = true )
inlinestatic

Definition at line 92 of file structuredPlaner.h.

References StructuredPlaner().

Referenced by gum::SDYNA::RandomMDDInstance(), and gum::SDYNA::spimddiInstance().


◆ sviInstance()

template<typename GUM_SCALAR>
StructuredPlaner< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::sviInstance ( GUM_SCALAR discountFactor = 0.9,
GUM_SCALAR epsilon = 0.00001,
bool verbose = true )
inlinestatic

Definition at line 104 of file structuredPlaner.h.

References StructuredPlaner().

Referenced by gum::SDYNA::RandomTreeInstance(), and gum::SDYNA::spitiInstance().


◆ valueIteration_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::valueIteration_ ( )
protectedvirtual

Performs a single step of value iteration.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 325 of file structuredPlaner_tpl.h.

  {
    // Loop reset
    /* ... (allocation of newVFunction elided) ... */
    newVFunction->copyAndReassign(*vFunction_, fmdp_->mapMainPrime());

    // For each action, evaluate the Qaction and collect it
    for (auto actionIter = fmdp_->beginActions(); actionIter != fmdp_->endActions(); ++actionIter) {
      /* ... (qAction evaluated via evalQaction_; elided) ... */
      qActionsSet.push_back(qAction);
    }
    delete newVFunction;

    // To evaluate the main value function, we maximise over all action values ...
    /* ... (maximisation over the Qactions; elided) ... */

    // ... and finally evaluate the new value function
    /* ... (reward added to the result; elided) ... */

    return newVFunction;
  }

References operator_.

◆ vFunction()

template<typename GUM_SCALAR>
INLINE const MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::vFunction ( )
inline

Returns a const ptr on the value function computed so far.

Definition at line 153 of file structuredPlaner.h.

{ return vFunction_; }

References vFunction_.

◆ vFunctionSize()

template<typename GUM_SCALAR>
virtual Size gum::StructuredPlaner< GUM_SCALAR >::vFunctionSize ( )
inlinevirtual

Returns the current size of the value function computed so far.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 158 of file structuredPlaner.h.

{ return vFunction_ != nullptr ? vFunction_->realSize() : 0; }

References vFunction_.

Member Data Documentation

◆ _firstTime_

template<typename GUM_SCALAR>
bool gum::StructuredPlaner< GUM_SCALAR >::_firstTime_
private

Definition at line 382 of file structuredPlaner.h.

Referenced by makePlanning().

◆ _threshold_

template<typename GUM_SCALAR>
GUM_SCALAR gum::StructuredPlaner< GUM_SCALAR >::_threshold_
private

The threshold value: whenever |V^{n} - V^{n+1}| < threshold, we consider that V ≈ V*.

Definition at line 381 of file structuredPlaner.h.

Referenced by StructuredPlaner(), and initialize().

◆ discountFactor_

template<typename GUM_SCALAR>
GUM_SCALAR gum::StructuredPlaner< GUM_SCALAR >::discountFactor_
protected

Discount Factor used for infinite horizon planning.

Definition at line 365 of file structuredPlaner.h.

Referenced by StructuredPlaner(), addReward_(), and initialize().

◆ elVarSeq_

template<typename GUM_SCALAR>
gum::VariableSet gum::StructuredPlaner< GUM_SCALAR >::elVarSeq_
protected

A set used to eliminate primed variables.

Definition at line 360 of file structuredPlaner.h.

Referenced by evalQaction_().

◆ fmdp_

template<typename GUM_SCALAR>
const FMDP< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::fmdp_
protected

The Factored Markov Decision Process describing our planning situation (NB: it must have function graphs as transition and reward functions).

Definition at line 340 of file structuredPlaner.h.

Referenced by addReward_(), evalPolicy_(), evalQaction_(), fmdp(), initialize(), and initVFunction_().

◆ operator_

template<typename GUM_SCALAR>
IOperatorStrategy< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::operator_
protected

◆ optimalPolicy_

template<typename GUM_SCALAR>
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::StructuredPlaner< GUM_SCALAR >::optimalPolicy_
protected

The associated optimal policy.

Warning
Leaves are ActionSets which contain the ids of the best actions. While this is sufficient for the policy to be exploited, to be understood by a human some translation from the fmdp_ is required. optimalPolicy2String does this job.

Definition at line 355 of file structuredPlaner.h.

Referenced by StructuredPlaner(), ~StructuredPlaner(), _recurExtractOptPol_(), extractOptimalPolicy_(), optimalPolicy(), optimalPolicy2String(), and optimalPolicySize().

◆ verbose_

template<typename GUM_SCALAR>
bool gum::StructuredPlaner< GUM_SCALAR >::verbose_
protected

Boolean used to indicate whether or not iteration information should be displayed on the terminal.

Definition at line 373 of file structuredPlaner.h.

Referenced by StructuredPlaner().

◆ vFunction_

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::vFunction_
protected

The Value Function computed iteratively.

Definition at line 345 of file structuredPlaner.h.

Referenced by StructuredPlaner(), ~StructuredPlaner(), evalPolicy_(), initVFunction_(), vFunction(), and vFunctionSize().


The documentation for this class was generated from the following files: structuredPlaner.h and structuredPlaner_tpl.h.