aGrUM 2.3.2
a C++ library for (probabilistic) graphical models
gum::StructuredPlaner< GUM_SCALAR > Class Template Reference

<agrum/FMDP/planning/structuredPlaner.h>

#include <structuredPlaner.h>


Public Member Functions

Datastructure access methods
INLINE const FMDP< GUM_SCALAR > * fmdp ()
 Returns a const pointer to the Factored Markov Decision Process on which we're planning.
INLINE const MultiDimFunctionGraph< GUM_SCALAR > * vFunction ()
 Returns a const pointer to the value function computed so far.
virtual Size vFunctionSize ()
 Returns the current size of the value function computed so far.
INLINE MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * optimalPolicy ()
 Returns the best policy obtained so far.
virtual Size optimalPolicySize ()
 Returns the current size of the optimal policy computed so far.
std::string optimalPolicy2String ()
 Provides a nicer toDot output for the optimal policy, where the leaves show the action names instead of their ids.
Planning Methods
virtual void initialize (const FMDP< GUM_SCALAR > *fmdp)
 Initializes the data structures needed for planning.
virtual void makePlanning (Idx nbStep=1000000)
 Performs a value iteration.

Static Public Member Functions

static StructuredPlaner< GUM_SCALAR > * spumddInstance (GUM_SCALAR discountFactor=0.9, GUM_SCALAR epsilon=0.00001, bool verbose=true)
static StructuredPlaner< GUM_SCALAR > * sviInstance (GUM_SCALAR discountFactor=0.9, GUM_SCALAR epsilon=0.00001, bool verbose=true)

Protected Member Functions

Value Iteration Methods
virtual void initVFunction_ ()
 Initializes the value function.
virtual MultiDimFunctionGraph< GUM_SCALAR > * valueIteration_ ()
 Performs a single step of value iteration.
virtual MultiDimFunctionGraph< GUM_SCALAR > * evalQaction_ (const MultiDimFunctionGraph< GUM_SCALAR > *, Idx)
 Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.
virtual MultiDimFunctionGraph< GUM_SCALAR > * maximiseQactions_ (std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &)
 Performs max_a Q(s,a).
virtual MultiDimFunctionGraph< GUM_SCALAR > * minimiseFunctions_ (std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &)
 Performs min_i F_i.
virtual MultiDimFunctionGraph< GUM_SCALAR > * addReward_ (MultiDimFunctionGraph< GUM_SCALAR > *function, Idx actionId=0)
 Computes R(s) + gamma * function.

Protected Attributes

const FMDP< GUM_SCALAR > * fmdp_
 The Factored Markov Decision Process describing our planning situation (NB: it must have function graphs as transition and reward functions).
MultiDimFunctionGraph< GUM_SCALAR > * vFunction_
 The Value Function computed iteratively.
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * optimalPolicy_
 The associated optimal policy.
gum::VariableSet elVarSeq_
 A set used to eliminate primed variables.
GUM_SCALAR discountFactor_
 Discount Factor used for infinite horizon planning.
IOperatorStrategy< GUM_SCALAR > * operator_
bool verbose_
 Boolean used to indicate whether or not iteration information should be displayed on the terminal.

Private Attributes

GUM_SCALAR _threshold_
 The threshold value: whenever |V^{n} - V^{n+1}| < threshold, we consider that V ≈ V*.
bool _firstTime_

Constructor & destructor.

 StructuredPlaner (IOperatorStrategy< GUM_SCALAR > *opi, GUM_SCALAR discountFactor, GUM_SCALAR epsilon, bool verbose)
 Default constructor.
virtual ~StructuredPlaner ()
 Default destructor.

Optimal policy extraction methods

virtual void evalPolicy_ ()
 Performs the required tasks to extract an optimal policy.
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * makeArgMax_ (const MultiDimFunctionGraph< GUM_SCALAR > *Qaction, Idx actionId)
 Creates a copy of the given Qaction that can be exploited by an argmax.
virtual MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * argmaximiseQactions_ (std::vector< MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * > &)
 Performs argmax_a Q(s,a).
void extractOptimalPolicy_ (const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *optimalValueFunction)
 From V*(s) = max_a Q*(s,a), extracts π*(s), mainly by extracting, from each ArgMaxSet present at the leaves, the associated ActionSet.
NodeId _recurArgMaxCopy_ (NodeId, Idx, const MultiDimFunctionGraph< GUM_SCALAR > *, MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *, HashTable< NodeId, NodeId > &)
 Recursion part for makeArgMax_.
NodeId _recurExtractOptPol_ (NodeId, const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *, HashTable< NodeId, NodeId > &)
 Recursion part for extractOptimalPolicy_.
void _transferActionIds_ (const ArgMaxSet< GUM_SCALAR, Idx > &, ActionSet &)
 Extracts from an ArgMaxSet the associated ActionSet.

Detailed Description

template<typename GUM_SCALAR>
class gum::StructuredPlaner< GUM_SCALAR >

<agrum/FMDP/planning/structuredPlaner.h>

A class to find an optimal policy for a given FMDP.

Performs structured value iteration planning.

Pure virtual functions: regress_, maximize_, argmaximize_, add_ and subtract_ are a priori the ones to be respecified according to the datastructure used (MDDs, DTs, BNs, ...).
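The backup this planner performs on decision diagrams can be illustrated on a flat (tabular) MDP. The sketch below is a hypothetical standalone analogue, not the class's actual implementation (which operates on MultiDimFunctionGraph instances); the name `bellmanBackup` and the array-based representation are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical tabular illustration of the backup done with decision diagrams:
//   Q(s,a) = R(s) + gamma * sum_{s'} P(s'|s,a) * V(s')
//   V(s)   = max_a Q(s,a)
double bellmanBackup(std::size_t s,
                     const std::vector< double >& reward,  // R(s)
                     const std::vector< std::vector< std::vector< double > > >& P,  // P[a][s][s']
                     const std::vector< double >& V,
                     double gamma) {
  double best = -1e300;
  for (std::size_t a = 0; a < P.size(); ++a) {
    double expectation = 0.0;   // sum over successor states s'
    for (std::size_t sp = 0; sp < V.size(); ++sp)
      expectation += P[a][s][sp] * V[sp];
    best = std::max(best, reward[s] + gamma * expectation);
  }
  return best;
}
```

The structured variants (SPUDD, SVI) compute exactly this quantity, but factored over decision diagrams so that states sharing the same sub-diagram are backed up only once.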

Definition at line 82 of file structuredPlaner.h.

Constructor & Destructor Documentation

◆ StructuredPlaner()

template<typename GUM_SCALAR>
INLINE gum::StructuredPlaner< GUM_SCALAR >::StructuredPlaner ( IOperatorStrategy< GUM_SCALAR > * opi,
GUM_SCALAR discountFactor,
GUM_SCALAR epsilon,
bool verbose )
protected

Default constructor.

Definition at line 86 of file structuredPlaner_tpl.h.

    // excerpt from structuredPlaner_tpl.h (member initialisation list elided)
    vFunction_     = nullptr;
    optimalPolicy_ = nullptr;
  }

References StructuredPlaner(), _threshold_, discountFactor_, operator_, optimalPolicy_, verbose_, and vFunction_.

Referenced by StructuredPlaner(), ~StructuredPlaner(), spumddInstance(), and sviInstance().


◆ ~StructuredPlaner()

template<typename GUM_SCALAR>
INLINE gum::StructuredPlaner< GUM_SCALAR >::~StructuredPlaner ( )
virtual

Default destructor.

Definition at line 102 of file structuredPlaner_tpl.h.

  {
    if (vFunction_) { delete vFunction_; }

    if (optimalPolicy_) { delete optimalPolicy_; }

    delete operator_;
  }

References StructuredPlaner(), operator_, optimalPolicy_, and vFunction_.


Member Function Documentation

◆ _recurArgMaxCopy_()

template<typename GUM_SCALAR>
NodeId gum::StructuredPlaner< GUM_SCALAR >::_recurArgMaxCopy_ ( NodeId currentNodeId,
Idx actionId,
const MultiDimFunctionGraph< GUM_SCALAR > * src,
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * argMaxCpy,
HashTable< NodeId, NodeId > & visitedNodes )
private

Recursion part for makeArgMax_.

Definition at line 499 of file structuredPlaner_tpl.h.

  {
    NodeId nody;
    if (src->isTerminalNode(currentNodeId)) {
      /* ... (ArgMaxSet leaf built from the node value and actionId; elided) ... */
      nody = argMaxCpy->manager()->addTerminalNode(leaf);
    } else {
      NodeId* sonsMap = static_cast< NodeId* >(
          SOA_ALLOCATE(sizeof(NodeId) * currentNode->nodeVar()->domainSize()));
      for (Idx moda = 0; moda < currentNode->nodeVar()->domainSize(); ++moda)
        /* ... (recursive copy of each son into sonsMap; elided) ... */;
      nody = argMaxCpy->manager()->addInternalNode(currentNode->nodeVar(), sonsMap);
    }
    return nody;
  }

References _recurArgMaxCopy_(), gum::DiscreteVariable::domainSize(), gum::HashTable< Key, Val >::exists(), gum::HashTable< Key, Val >::insert(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::isTerminalNode(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::node(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::nodeValue(), gum::InternalNode::nodeVar(), SOA_ALLOCATE, and gum::InternalNode::son().

Referenced by _recurArgMaxCopy_(), and makeArgMax_().


◆ _recurExtractOptPol_()

template<typename GUM_SCALAR>
NodeId gum::StructuredPlaner< GUM_SCALAR >::_recurExtractOptPol_ ( NodeId currentNodeId,
const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * argMaxOptVFunc,
HashTable< NodeId, NodeId > & visitedNodes )
private

Recursion part for extractOptimalPolicy_.

Definition at line 576 of file structuredPlaner_tpl.h.

  {
    NodeId nody;
    if (argMaxOptVFunc->isTerminalNode(currentNodeId)) {
      /* ... (ActionSet leaf built via _transferActionIds_; elided) ... */
      nody = optimalPolicy_->manager()->addTerminalNode(leaf);
    } else {
      NodeId* sonsMap = static_cast< NodeId* >(
          SOA_ALLOCATE(sizeof(NodeId) * currentNode->nodeVar()->domainSize()));
      for (Idx moda = 0; moda < currentNode->nodeVar()->domainSize(); ++moda)
        /* ... (recursive extraction of each son into sonsMap; elided) ... */;
      nody = optimalPolicy_->manager()->addInternalNode(currentNode->nodeVar(), sonsMap);
    }
    return nody;
  }

References _recurExtractOptPol_(), _transferActionIds_(), gum::DiscreteVariable::domainSize(), gum::HashTable< Key, Val >::exists(), gum::HashTable< Key, Val >::insert(), gum::InternalNode::nodeVar(), optimalPolicy_, SOA_ALLOCATE, and gum::InternalNode::son().

Referenced by _recurExtractOptPol_(), and extractOptimalPolicy_().


◆ _transferActionIds_()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::_transferActionIds_ ( const ArgMaxSet< GUM_SCALAR, Idx > & src,
ActionSet & dest )
private

Extract from an ArgMaxSet the associated ActionSet.

Definition at line 604 of file structuredPlaner_tpl.h.

  {
    for (auto idi = src.beginSafe(); idi != src.endSafe(); ++idi)
      dest += *idi;
  }

References gum::ArgMaxSet< GUM_SCALAR_VAL, GUM_SCALAR_SEQ >::beginSafe(), and gum::ArgMaxSet< GUM_SCALAR_VAL, GUM_SCALAR_SEQ >::endSafe().

Referenced by _recurExtractOptPol_().


◆ addReward_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::addReward_ ( MultiDimFunctionGraph< GUM_SCALAR > * function,
Idx actionId = 0 )
protectedvirtual

Computes R(s) + gamma * function.

Warning
The input function is deleted; a new one is returned.

Definition at line 408 of file structuredPlaner_tpl.h.

  {
    /* ... (allocation of newVFunction elided) ... */
    // ... we multiply the result by the discount factor, ...
    newVFunction->copyAndMultiplyByScalar(*Vold, this->discountFactor_);
    delete Vold;

    // ... and finally add the reward
    /* ... (addition of the reward function elided) ... */

    return newVFunction;
  }
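A flat analogue makes the contract explicit: scale the incoming value function by the discount factor, add the reward, and hand back a fresh object (the diagram version deletes its input). The function name `addReward` and the vector representation below are hypothetical; the class itself does this with MultiDimFunctionGraph operations.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tabular analogue of addReward_:
//   V_new(s) = R(s) + gamma * V_old(s)
// Takes the old function by value, so the caller's copy is conceptually consumed.
std::vector< double > addReward(std::vector< double > function,
                                const std::vector< double >& reward,
                                double gamma) {
  for (std::size_t s = 0; s < function.size(); ++s)
    function[s] = reward[s] + gamma * function[s];
  return function;
}
```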

References gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::copyAndMultiplyByScalar(), discountFactor_, fmdp_, operator_, and RECAST.

Referenced by evalPolicy_().


◆ argmaximiseQactions_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * gum::StructuredPlaner< GUM_SCALAR >::argmaximiseQactions_ ( std::vector< MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * > & qActionsSet)
protectedvirtual

Performs argmax_a Q(s,a).

Warning
Also performs the deallocation of the QActions.

Definition at line 529 of file structuredPlaner_tpl.h.

References operator_.

Referenced by evalPolicy_().


◆ evalPolicy_()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::evalPolicy_ ( )
protectedvirtual

Perform the required tasks to extract an optimal policy.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 435 of file structuredPlaner_tpl.h.

  {
    // Loop reset
    /* ... (allocation of newVFunction elided) ... */
    newVFunction->copyAndReassign(*vFunction_, fmdp_->mapMainPrime());

    // For each action: evaluate the Qaction, add the reward, and build its argmax copy
    for (auto actionIter = fmdp_->beginActions(); actionIter != fmdp_->endActions(); ++actionIter) {
      /* ... (qAction evaluated via evalQaction_; elided) ... */
      qAction = this->addReward_(qAction);
      /* ... (makeArgMax_ conversion pushed onto the argmax set; elided) ... */
    }
    delete newVFunction;

    // To evaluate the main value function, we then maximise over all action
    // values and extract the optimal policy from the resulting argmax diagram.
    /* ... (argmaximiseQactions_ and extractOptimalPolicy_ calls elided) ... */
  }

References addReward_(), argmaximiseQactions_(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::copyAndReassign(), evalQaction_(), extractOptimalPolicy_(), fmdp_, makeArgMax_(), operator_, and vFunction_.


◆ evalQaction_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::evalQaction_ ( const MultiDimFunctionGraph< GUM_SCALAR > * Vold,
Idx actionId )
protectedvirtual

Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.

Definition at line 357 of file structuredPlaner_tpl.h.

  {
    // Initialisation:
    // Create a copy of the last value function to derive the new Qaction,
    // and find the first variable to eliminate (the one at the end)
    return operator_->regress(Vold, actionId, this->fmdp_, this->elVarSeq_);
  }

References elVarSeq_, fmdp_, and operator_.

Referenced by evalPolicy_().


◆ extractOptimalPolicy_()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::extractOptimalPolicy_ ( const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * optimalValueFunction)
protected

From V*(s) = max_a Q*(s,a), this function extracts π*(s). It mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet.

Warning
Deallocates the argmax optimal value function.

Definition at line 552 of file structuredPlaner_tpl.h.

  {
    optimalPolicy_->clear();

    // Insert the new variables
    for (auto varIter = argMaxOptimalValueFunction->variablesSequence().beginSafe();
         varIter != argMaxOptimalValueFunction->variablesSequence().endSafe();
         ++varIter)
      optimalPolicy_->add(**varIter);

    /* ... (HashTable src2dest elided) ... */
    optimalPolicy_->manager()->setRootNode(_recurExtractOptPol_(argMaxOptimalValueFunction->root(),
                                                                argMaxOptimalValueFunction,
                                                                src2dest));

    /* ... (deletion of argMaxOptimalValueFunction elided) ... */
  }
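The reason leaves carry an ArgMaxSet rather than a single action id is that several actions can be optimal in the same region of the state space. The hypothetical flat sketch below (names `argMaxSet` and the vector representation are assumptions) shows the tie-preserving extraction for one state:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical flat analogue of an ArgMaxSet leaf: for one state, collect the
// ids of *all* actions achieving the maximal Q-value, preserving ties.
std::vector< std::size_t > argMaxSet(const std::vector< double >& q) {
  double best = q[0];
  for (double v : q) best = std::max(best, v);
  std::vector< std::size_t > set;
  for (std::size_t a = 0; a < q.size(); ++a)
    if (std::fabs(q[a] - best) < 1e-12) set.push_back(a);   // keep every maximiser
  return set;
}
```

Extracting the policy then amounts to turning each such set into an ActionSet leaf, which is what _transferActionIds_ does in the diagram version.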

References _recurExtractOptPol_(), and optimalPolicy_.

Referenced by evalPolicy_().


◆ fmdp()

template<typename GUM_SCALAR>
INLINE const FMDP< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::fmdp ( )
inline

Returns a const pointer to the Factored Markov Decision Process on which we're planning.

Definition at line 148 of file structuredPlaner.h.

{ return fmdp_; }

References fmdp_.

Referenced by initialize().


◆ initialize()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::initialize ( const FMDP< GUM_SCALAR > * fmdp)
virtual

Initializes the data structures needed for planning.

Warning
Not calling this method before the first call to makePlanning() will result in a crash.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 243 of file structuredPlaner_tpl.h.

  {
    fmdp_ = fmdp;

    // Determination of the threshold value
    /* ... (_threshold_ computed from epsilon and discountFactor_; elided) ... */

    // Establishment of the variable elimination sequence
    for (auto varIter = fmdp_->beginVariables(); varIter != fmdp_->endVariables(); ++varIter)
      elVarSeq_ << fmdp_->main2prime(*varIter);

    // Initialisation of the value function
    vFunction_     = operator_->getFunctionInstance();
    optimalPolicy_ = operator_->getAggregatorInstance();
    _firstTime_    = true;
  }

References _threshold_, discountFactor_, fmdp(), and fmdp_.

Referenced by gum::AdaptiveRMaxPlaner::initialize().


◆ initVFunction_()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::initVFunction_ ( )
protectedvirtual

Initializes the value function.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 308 of file structuredPlaner_tpl.h.

  {
    vFunction_->copy(*(RECAST(fmdp_->reward())));
  }

References fmdp_, RECAST, and vFunction_.

Referenced by makePlanning().


◆ makeArgMax_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * gum::StructuredPlaner< GUM_SCALAR >::makeArgMax_ ( const MultiDimFunctionGraph< GUM_SCALAR > * Qaction,
Idx actionId )
protected

Creates a copy of the given Qaction that can be exploited by an argmax.

Hence, this step consists in replacing each leaf by an ArgMaxSet containing the value of the leaf and the actionId of the Qaction.

Parameters
Qaction: the function graph we want to transform
actionId: the action id associated to that graph
Warning
Deletes the original Qaction and returns its conversion.

Definition at line 474 of file structuredPlaner_tpl.h.

  {
    /* ... (amcpy declared with the argmax diagram type) ... */
        = operator_->getArgMaxFunctionInstance();

    // Insert the new variables
    for (auto varIter = qAction->variablesSequence().beginSafe();
         varIter != qAction->variablesSequence().endSafe();
         ++varIter)
      amcpy->add(**varIter);

    /* ... (HashTable src2dest elided) ... */
    amcpy->manager()->setRootNode(
        /* ... (_recurArgMaxCopy_ from qAction's root; elided) ... */);

    delete qAction;
    return amcpy;
  }

References _recurArgMaxCopy_(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::add(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::manager(), operator_, gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::root(), and gum::MultiDimImplementation< GUM_SCALAR >::variablesSequence().

Referenced by evalPolicy_().


◆ makePlanning()

template<typename GUM_SCALAR>
void gum::StructuredPlaner< GUM_SCALAR >::makePlanning ( Idx nbStep = 1000000)
virtual

Performs a value iteration.

Parameters
nbStep: lets you specify how many value-iteration steps to perform at most. makePlanning will stop either when the optimal value function is reached or when nbStep iterations have been performed.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 263 of file structuredPlaner_tpl.h.

  {
    if (_firstTime_) {
      this->initVFunction_();
      _firstTime_ = false;
    }

    // *****************************************************************************************
    // Main loop
    // *****************************************************************************************
    Idx nbIte = 0;
    /* ... (gap initialised above the threshold; elided) ... */
    while ((gap > _threshold_) && (nbIte < nbStep)) {
      ++nbIte;

      /* ... (one step of valueIteration_ producing newVFunction; elided) ... */

      // Then we compare the new value function with the old one
      /* ... (deltaV = difference between the two; elided) ... */
      gap = 0;

      for (deltaV->beginValues(); deltaV->hasValue(); deltaV->nextValue())
        if (gap < fabs(deltaV->value())) gap = fabs(deltaV->value());
      delete deltaV;

      if (verbose_)
        std::cout << " ------------------- End of iteration n° " << nbIte << std::endl
                  << " Gap : " << gap << " - " << _threshold_ << std::endl;

      // And eventually we update the pointers for the next loop
      delete vFunction_;
      /* ... (vFunction_ updated to the new function; elided) ... */
    }

    // *****************************************************************************************
    // Search for the policy matching the value function
    // *****************************************************************************************
    this->evalPolicy_();
  }
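The loop structure above can be sketched end to end on a flat MDP. The function below is a hypothetical tabular version (names `makePlanningFlat` and the array representation are assumptions, not part of aGrUM): initialise V with the reward, back up until the sup-norm gap between successive value functions drops below the threshold or the step budget runs out.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tabular version of the makePlanning() loop.
std::vector< double > makePlanningFlat(
    const std::vector< double >& reward,
    const std::vector< std::vector< std::vector< double > > >& P,  // P[a][s][s']
    double gamma, double threshold, std::size_t nbStep) {
  std::vector< double > V = reward;    // initVFunction_: V^0 = R
  double gap = threshold + 1.0;
  for (std::size_t ite = 0; gap > threshold && ite < nbStep; ++ite) {
    std::vector< double > newV(V.size());
    for (std::size_t s = 0; s < V.size(); ++s) {   // valueIteration_ step
      double best = -1e300;
      for (std::size_t a = 0; a < P.size(); ++a) {
        double e = 0.0;
        for (std::size_t sp = 0; sp < V.size(); ++sp) e += P[a][s][sp] * V[sp];
        best = std::max(best, reward[s] + gamma * e);
      }
      newV[s] = best;
    }
    gap = 0.0;                                     // sup-norm |V^{n+1} - V^n|
    for (std::size_t s = 0; s < V.size(); ++s)
      gap = std::max(gap, std::fabs(newV[s] - V[s]));
    V = newV;
  }
  return V;
}
```

With gamma < 1 the gap contracts geometrically, which is why the threshold-based stopping rule terminates well before the default nbStep budget on typical problems.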

References _firstTime_, and initVFunction_().

Referenced by gum::AdaptiveRMaxPlaner::makePlanning().


◆ maximiseQactions_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::maximiseQactions_ ( std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > & qActionsSet)
protectedvirtual

Performs max_a Q(s,a).

Warning
Also performs the deallocation of the QActions.

Definition at line 371 of file structuredPlaner_tpl.h.

  {
    /* ... (newVFunction taken from the back of qActionsSet; elided) ... */
    qActionsSet.pop_back();

    while (!qActionsSet.empty()) {
      /* ... (qActionsSet.back() merged into newVFunction via operator_; elided) ... */
      qActionsSet.pop_back();
    }

    return newVFunction;
  }
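The pop-and-merge structure is easiest to see on plain vectors. The sketch below is a hypothetical flat analogue (`maximiseQactions` on `std::vector` is an assumption for illustration): each Q-function is popped off the set and folded into a running pointwise maximum, so the input set is consumed, mirroring how the diagram version also deallocates each popped graph.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical flat analogue of maximiseQactions_: consume the set of
// Q-functions, returning their pointwise maximum V(s) = max_a Q(s,a).
std::vector< double > maximiseQactions(std::vector< std::vector< double > >& qActionsSet) {
  std::vector< double > newV = qActionsSet.back();
  qActionsSet.pop_back();
  while (!qActionsSet.empty()) {
    const std::vector< double >& q = qActionsSet.back();
    for (std::size_t s = 0; s < newV.size(); ++s)
      newV[s] = std::max(newV[s], q[s]);   // pointwise max over actions
    qActionsSet.pop_back();
  }
  return newV;
}
```

minimiseFunctions_ has the identical shape with min in place of max.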

References operator_.

◆ minimiseFunctions_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::minimiseFunctions_ ( std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > & qActionsSet)
protectedvirtual

Performs min_i F_i.

Warning
Also performs the deallocation of the F_i.

Definition at line 389 of file structuredPlaner_tpl.h.

  {
    /* ... (newVFunction taken from the back of qActionsSet; elided) ... */
    qActionsSet.pop_back();

    while (!qActionsSet.empty()) {
      /* ... (qActionsSet.back() merged into newVFunction via operator_; elided) ... */
      qActionsSet.pop_back();
    }

    return newVFunction;
  }

References operator_.

◆ optimalPolicy()

template<typename GUM_SCALAR>
INLINE MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * gum::StructuredPlaner< GUM_SCALAR >::optimalPolicy ( )
inlinevirtual

Returns the best policy obtained so far.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 163 of file structuredPlaner.h.

  {
    return optimalPolicy_;
  }

References optimalPolicy_.

◆ optimalPolicy2String()

template<typename GUM_SCALAR>
std::string gum::StructuredPlaner< GUM_SCALAR >::optimalPolicy2String ( )
virtual

Provides a nicer toDot output for the optimal policy, where the leaves show the action names instead of their ids.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 124 of file structuredPlaner_tpl.h.

  {
    // ************************************************************************
    // Discard the case where no π* has been computed yet
    if (!optimalPolicy_ || optimalPolicy_->root() == 0) return "NO OPTIMAL POLICY CALCULATED YET";

    // ************************************************************************
    // Initialisation

    // Declaration of the needed string streams
    /* ... (output, terminalStream, nonTerminalStream and arcstream declared; elided) ... */

    // First line of the toDot output
    output << std::endl << "digraph \" OPTIMAL POLICY \" {" << std::endl;

    // Form lines for the internal-node stream and the terminal-node stream
    terminalStream << "node [shape = box];" << std::endl;
    nonTerminalStream << "node [shape = ellipse];" << std::endl;

    // For some clarity in the final string
    std::string tab = "\t";

    // To know whether we already checked a node or not
    /* ... (visited-node set declared; elided) ... */

    // FIFO of nodes to visit
    /* ... (fifo declared; elided) ... */

    // Loading the FIFO
    fifo.push(optimalPolicy_->root());
    visited << optimalPolicy_->root();

    // ************************************************************************
    // Main loop
    while (!fifo.empty()) {
      // Node to visit
      NodeId currentNodeId = fifo.front();
      fifo.pop();

      // Checking whether it is terminal
      if (optimalPolicy_->isTerminalNode(currentNodeId)) {
        // Get back the associated ActionSet
        /* ... (ase fetched from the node value; elided) ... */

        // Creating a line for this node
        terminalStream << tab << currentNodeId << ";" << tab << currentNodeId << " [label=\""
                       << currentNodeId << " - ";

        // Enumerating and adding to the line the associated optimal actions
        for (SequenceIteratorSafe< Idx > valIter = ase.beginSafe(); valIter != ase.endSafe();
             ++valIter)
          terminalStream << fmdp_->actionName(*valIter) << " ";

        // Terminating the line
        terminalStream << "\"];" << std::endl;
        continue;
      }

      // Otherwise
      {
        // Getting back the associated internal node
        /* ... (currentNode fetched from the diagram; elided) ... */

        // Creating a line in the internal-node stream for this node
        nonTerminalStream << tab << currentNodeId << ";" << tab << currentNodeId << " [label=\""
                          << currentNodeId << " - " << currentNode->nodeVar()->name() << "\"];"
                          << std::endl;

        // Going through the sons and aggregating them according to the son ids
        /* ... (sonMap declared; elided) ... */
        for (Idx sonIter = 0; sonIter < currentNode->nbSons(); ++sonIter) {
          if (!visited.exists(currentNode->son(sonIter))) {
            fifo.push(currentNode->son(sonIter));
            visited << currentNode->son(sonIter);
          }
          if (!sonMap.exists(currentNode->son(sonIter)))
            sonMap.insert(currentNode->son(sonIter), new LinkedList< Idx >());
          sonMap[currentNode->son(sonIter)]->addLink(sonIter);
        }

        // Adding to the arc stream
        for (auto sonIter = sonMap.beginSafe(); sonIter != sonMap.endSafe(); ++sonIter) {
          arcstream << tab << currentNodeId << " -> " << sonIter.key() << " [label=\" ";
          Link< Idx >* modaIter = sonIter.val()->list();
          while (modaIter) {
            arcstream << currentNode->nodeVar()->label(modaIter->element());
            if (modaIter->nextLink()) arcstream << ", ";
            modaIter = modaIter->nextLink();
          }
          arcstream << "\",color=\"#00ff00\"];" << std::endl;
          delete sonIter.val();
        }
      }
    }

    // Terminating
    output << terminalStream.str() << std::endl
           << nonTerminalStream.str() << std::endl
           << arcstream.str() << std::endl
           << "}" << std::endl;

    return output.str();
  }
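The structure of that traversal can be reproduced in miniature. The sketch below is a hypothetical standalone version (the `PolicyNode` struct and `policy2Dot` are illustrative stand-ins for aGrUM's diagram types): a breadth-first walk that accumulates action leaves as boxes, variable tests as ellipses, and arcs labelled with the branching modality, then glues the three streams into one Graphviz digraph string.

```cpp
#include <queue>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical miniature of optimalPolicy2String().
struct PolicyNode { std::string label; std::vector< int > sons; };  // empty sons => leaf

std::string policy2Dot(const std::vector< PolicyNode >& nodes, int root) {
  std::ostringstream terminal, internal, arcs;
  terminal << "node [shape = box];\n";       // action leaves
  internal << "node [shape = ellipse];\n";   // variable tests
  std::queue< int > fifo;
  fifo.push(root);
  while (!fifo.empty()) {
    int id = fifo.front();
    fifo.pop();
    const PolicyNode& n = nodes[id];
    if (n.sons.empty()) {                    // terminal: print the action name
      terminal << "\t" << id << " [label=\"" << n.label << "\"];\n";
      continue;
    }
    internal << "\t" << id << " [label=\"" << n.label << "\"];\n";
    for (std::size_t m = 0; m < n.sons.size(); ++m) {
      arcs << "\t" << id << " -> " << n.sons[m] << " [label=\"" << m << "\"];\n";
      fifo.push(n.sons[m]);                  // visit each son (tree: no cycles)
    }
  }
  return "digraph \"OPTIMAL POLICY\" {\n" + terminal.str() + internal.str() + arcs.str() + "}\n";
}
```

The real method additionally memoises visited nodes (the diagram is a DAG, not a tree) and groups parallel arcs to the same son under one edge label.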

References optimalPolicy_.

◆ optimalPolicySize()

template<typename GUM_SCALAR>
virtual Size gum::StructuredPlaner< GUM_SCALAR >::optimalPolicySize ( )
inlinevirtual

Returns the current size of the optimal policy computed so far.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 170 of file structuredPlaner.h.

  {
    return optimalPolicy_ != nullptr ? optimalPolicy_->realSize() : 0;
  }

References optimalPolicy_.

◆ spumddInstance()

template<typename GUM_SCALAR>
StructuredPlaner< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::spumddInstance ( GUM_SCALAR discountFactor = 0.9,
GUM_SCALAR epsilon = 0.00001,
bool verbose = true )
inlinestatic

Definition at line 92 of file structuredPlaner.h.

References StructuredPlaner().

Referenced by gum::SDYNA::RandomMDDInstance(), and gum::SDYNA::spimddiInstance().


◆ sviInstance()

template<typename GUM_SCALAR>
StructuredPlaner< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::sviInstance ( GUM_SCALAR discountFactor = 0.9,
GUM_SCALAR epsilon = 0.00001,
bool verbose = true )
inlinestatic

Definition at line 104 of file structuredPlaner.h.

References StructuredPlaner().

Referenced by gum::SDYNA::RandomTreeInstance(), and gum::SDYNA::spitiInstance().


◆ valueIteration_()

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::valueIteration_ ( )
protectedvirtual

Performs a single step of value iteration.

Reimplemented in gum::AdaptiveRMaxPlaner.

Definition at line 325 of file structuredPlaner_tpl.h.

  {
    // Loop reset
    /* ... (allocation of newVFunction elided) ... */
    newVFunction->copyAndReassign(*vFunction_, fmdp_->mapMainPrime());

    // For each action, evaluate the Qaction and collect it
    for (auto actionIter = fmdp_->beginActions(); actionIter != fmdp_->endActions(); ++actionIter) {
      /* ... (qAction evaluated via evalQaction_; elided) ... */
      qActionsSet.push_back(qAction);
    }
    delete newVFunction;

    // To evaluate the main value function, we maximise over all action values ...
    /* ... (maximisation over the Qactions; elided) ... */

    // ... and finally evaluate the new value function
    /* ... (reward added to the result; elided) ... */

    return newVFunction;
  }

References operator_.

◆ vFunction()

template<typename GUM_SCALAR>
INLINE const MultiDimFunctionGraph< GUM_SCALAR > * gum::StructuredPlaner< GUM_SCALAR >::vFunction ( )
inline

Returns a const ptr on the value function computed so far.

Definition at line 153 of file structuredPlaner.h.

{ return vFunction_; }

References vFunction_.

◆ vFunctionSize()

template<typename GUM_SCALAR>
virtual Size gum::StructuredPlaner< GUM_SCALAR >::vFunctionSize ( )
inlinevirtual

Returns the current size of the value function computed so far.

Implements gum::IPlanningStrategy< GUM_SCALAR >.

Definition at line 158 of file structuredPlaner.h.

{ return vFunction_ != nullptr ? vFunction_->realSize() : 0; }

References vFunction_.

Member Data Documentation

◆ _firstTime_

template<typename GUM_SCALAR>
bool gum::StructuredPlaner< GUM_SCALAR >::_firstTime_
private

Definition at line 382 of file structuredPlaner.h.

Referenced by makePlanning().

◆ _threshold_

template<typename GUM_SCALAR>
GUM_SCALAR gum::StructuredPlaner< GUM_SCALAR >::_threshold_
private

The threshold value: whenever |V^{n} - V^{n+1}| < threshold, we consider that V ≈ V*.

Definition at line 381 of file structuredPlaner.h.

Referenced by StructuredPlaner(), and initialize().

◆ discountFactor_

template<typename GUM_SCALAR>
GUM_SCALAR gum::StructuredPlaner< GUM_SCALAR >::discountFactor_
protected

Discount Factor used for infinite horizon planning.

Definition at line 365 of file structuredPlaner.h.

Referenced by StructuredPlaner(), addReward_(), and initialize().

◆ elVarSeq_

template<typename GUM_SCALAR>
gum::VariableSet gum::StructuredPlaner< GUM_SCALAR >::elVarSeq_
protected

A set used to eliminate primed variables.

Definition at line 360 of file structuredPlaner.h.

Referenced by evalQaction_().

◆ fmdp_

template<typename GUM_SCALAR>
const FMDP< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::fmdp_
protected

The Factored Markov Decision Process describing our planning situation (NB: it must have function graphs as transition and reward functions).

Definition at line 340 of file structuredPlaner.h.

Referenced by addReward_(), evalPolicy_(), evalQaction_(), fmdp(), initialize(), and initVFunction_().

◆ operator_

template<typename GUM_SCALAR>
IOperatorStrategy< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::operator_
protected

◆ optimalPolicy_

template<typename GUM_SCALAR>
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy >* gum::StructuredPlaner< GUM_SCALAR >::optimalPolicy_
protected

The associated optimal policy.

Warning
Leaves are ActionSets which contain the ids of the best actions. While this is sufficient for the policy to be exploited, to be understood by a human some translation from the fmdp_ is required. optimalPolicy2String does this job.

Definition at line 355 of file structuredPlaner.h.

Referenced by StructuredPlaner(), ~StructuredPlaner(), _recurExtractOptPol_(), extractOptimalPolicy_(), optimalPolicy(), optimalPolicy2String(), and optimalPolicySize().

◆ verbose_

template<typename GUM_SCALAR>
bool gum::StructuredPlaner< GUM_SCALAR >::verbose_
protected

Boolean used to indicate whether or not iteration information should be displayed on the terminal.

Definition at line 373 of file structuredPlaner.h.

Referenced by StructuredPlaner().

◆ vFunction_

template<typename GUM_SCALAR>
MultiDimFunctionGraph< GUM_SCALAR >* gum::StructuredPlaner< GUM_SCALAR >::vFunction_
protected

The Value Function computed iteratively.

Definition at line 345 of file structuredPlaner.h.

Referenced by StructuredPlaner(), ~StructuredPlaner(), evalPolicy_(), initVFunction_(), vFunction(), and vFunctionSize().


The documentation for this class was generated from the following files: structuredPlaner.h and structuredPlaner_tpl.h.