aGrUM 2.3.2
a C++ library for (probabilistic) graphical models
<agrum/FMDP/planning/adaptiveRMaxPlaner.h> More...
#include <adaptiveRMaxPlaner.h>
Public Member Functions | |
Planning Methods | |
| void | initialize (const FMDP< double > *fmdp) |
| Initializes the data structures needed for planning. | |
| void | makePlanning (Idx nbStep=1000000) |
| Performs a value iteration. | |
Datastructure access methods | |
| INLINE const FMDP< double > * | fmdp () |
| Returns a const pointer to the Factored Markov Decision Process on which we're planning. | |
| INLINE const MultiDimFunctionGraph< double > * | vFunction () |
| Returns a const pointer to the value function computed so far. | |
| virtual Size | vFunctionSize () |
| Returns the current size of the value function computed so far. | |
| INLINE MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * | optimalPolicy () |
| Returns the best policy obtained so far. | |
| virtual Size | optimalPolicySize () |
| Returns the current size of the optimal policy computed so far. | |
| std::string | optimalPolicy2String () |
| Provides a better toDot for the optimal policy, where the leaves show the action name instead of its id. | |
Planning Methods | |
| virtual void | initialize (const FMDP< double > *fmdp) |
| Initializes the data structures needed for planning. | |
Static Public Member Functions | |
| static AdaptiveRMaxPlaner * | ReducedAndOrderedInstance (const ILearningStrategy *learner, double discountFactor=0.9, double epsilon=0.00001, bool verbose=true) |
| static AdaptiveRMaxPlaner * | TreeInstance (const ILearningStrategy *learner, double discountFactor=0.9, double epsilon=0.00001, bool verbose=true) |
| static StructuredPlaner< double > * | spumddInstance (double discountFactor=0.9, double epsilon=0.00001, bool verbose=true) |
| static StructuredPlaner< double > * | sviInstance (double discountFactor=0.9, double epsilon=0.00001, bool verbose=true) |
Protected Member Functions | |
Value Iteration Methods | |
| virtual void | initVFunction_ () |
| Initializes the value function. | |
| virtual MultiDimFunctionGraph< double > * | valueIteration_ () |
| Performs a single step of value iteration. | |
Optimal policy extraction methods | |
| virtual void | evalPolicy_ () |
| Perform the required tasks to extract an optimal policy. | |
Value Iteration Methods | |
| virtual MultiDimFunctionGraph< double > * | evalQaction_ (const MultiDimFunctionGraph< double > *, Idx) |
| Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration. | |
| virtual MultiDimFunctionGraph< double > * | maximiseQactions_ (std::vector< MultiDimFunctionGraph< double > * > &) |
| Performs max_a Q(s,a). | |
| virtual MultiDimFunctionGraph< double > * | minimiseFunctions_ (std::vector< MultiDimFunctionGraph< double > * > &) |
| Performs min_i F_i. | |
| virtual MultiDimFunctionGraph< double > * | addReward_ (MultiDimFunctionGraph< double > *function, Idx actionId=0) |
| Computes R(s) + gamma * function. | |
Protected Attributes | |
| const FMDP< double > * | fmdp_ |
| The Factored Markov Decision Process describing our planning situation (NB: its transitions and reward functions must be given as function graphs). | |
| MultiDimFunctionGraph< double > * | vFunction_ |
| The Value Function computed iteratively. | |
| MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * | optimalPolicy_ |
| The associated optimal policy. | |
| gum::VariableSet | elVarSeq_ |
| A set used to eliminate primed variables. | |
| double | discountFactor_ |
| Discount Factor used for infinite horizon planning. | |
| IOperatorStrategy< double > * | operator_ |
| bool | verbose_ |
| Boolean indicating whether iteration information should be displayed on the terminal. | |
Private Member Functions | |
| void | _makeRMaxFunctionGraphs_ () |
| std::pair< NodeId, NodeId > | _visitLearner_ (const IVisitableGraphLearner *, NodeId currentNodeId, MultiDimFunctionGraph< double > *, MultiDimFunctionGraph< double > *) |
| void | _clearTables_ () |
Private Attributes | |
| HashTable< Idx, MultiDimFunctionGraph< double > * > | _actionsRMaxTable_ |
| HashTable< Idx, MultiDimFunctionGraph< double > * > | _actionsBoolTable_ |
| const ILearningStrategy * | _fmdpLearner_ |
| double | _rThreshold_ |
| double | _rmax_ |
| double | _threshold_ |
| The threshold value: whenever |V^{n} - V^{n+1}| < threshold, we consider that V ~ V*. | |
| bool | _firstTime_ |
Incremental methods | |
| HashTable< Idx, StatesCounter * > | _counterTable_ |
| HashTable< Idx, bool > | _initializedTable_ |
| bool | _initialized_ |
| void | checkState (const Instantiation &newState, Idx actionId) |
Constructor & destructor. | |
| AdaptiveRMaxPlaner (IOperatorStrategy< double > *opi, double discountFactor, double epsilon, const ILearningStrategy *learner, bool verbose) | |
| Default constructor. | |
| ~AdaptiveRMaxPlaner () | |
| Default destructor. | |
Optimal policy extraction methods | |
| NodeId | _recurArgMaxCopy_ (NodeId, Idx, const MultiDimFunctionGraph< double > *, MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > *, HashTable< NodeId, NodeId > &) |
| Recursion part for the createArgMaxCopy. | |
| NodeId | _recurExtractOptPol_ (NodeId, const MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > *, HashTable< NodeId, NodeId > &) |
| Recursion part for the optimal policy extraction. | |
| void | _transferActionIds_ (const ArgMaxSet< double, Idx > &, ActionSet &) |
| Extract from an ArgMaxSet the associated ActionSet. | |
| MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > * | makeArgMax_ (const MultiDimFunctionGraph< double > *Qaction, Idx actionId) |
| Creates a copy of the given Qaction that can be exploited by an argmax. | |
| virtual MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > * | argmaximiseQactions_ (std::vector< MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > * > &) |
| Performs argmax_a Q(s,a). | |
| void | extractOptimalPolicy_ (const MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > *optimalValueFunction) |
| From V*(s) = argmax_a Q*(s,a), this function extracts pi*(s), mainly by extracting from each ArgMaxSet present at the leaves the associated ActionSet. | |
Incremental methods | |
| void | setOptimalStrategy (MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > *optPol) |
| virtual ActionSet | stateOptimalPolicy (const Instantiation &curState) |
| const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * | optPol_ {nullptr} |
| ActionSet | allActions_ |
<agrum/FMDP/planning/adaptiveRMaxPlaner.h>
A class to find an optimal policy for a given FMDP.
Performs RMax planning on the factored Markov decision process given as parameter.
Definition at line 73 of file adaptiveRMaxPlaner.h.
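The RMax principle this planner applies can be sketched independently of aGrUM's decision-diagram machinery: a state-action pair observed fewer times than a threshold is treated as "unknown" and assigned the optimistic value rmax / (1 - gamma), which drives exploration. This is an illustrative sketch, not the class's actual code (which operates on function graphs via `_visitLearner_` using `_rmax_` and `_rThreshold_`).

```cpp
#include <cassert>

// Illustrative RMax optimism rule (a sketch, not aGrUM's implementation):
// an insufficiently observed state-action pair gets the optimistic bound
// rMax / (1 - gamma) on its discounted return, so planning favors visiting it.
double rmaxOptimisticReward(long nbObservations, long nbObsThreshold,
                            double estimatedReward, double rMax, double gamma) {
  if (nbObservations < nbObsThreshold)
    return rMax / (1.0 - gamma);  // "unknown": assume the best possible return
  return estimatedReward;         // "known": trust the learned reward estimate
}
```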
|
private |
Default constructor.
Definition at line 84 of file adaptiveRMaxPlaner.cpp.
References AdaptiveRMaxPlaner(), gum::StructuredPlaner< double >::StructuredPlaner(), _fmdpLearner_, and _initialized_.
Referenced by AdaptiveRMaxPlaner(), ~AdaptiveRMaxPlaner(), ReducedAndOrderedInstance(), and TreeInstance().
| gum::AdaptiveRMaxPlaner::~AdaptiveRMaxPlaner | ( | ) |
Default destructor.
Definition at line 97 of file adaptiveRMaxPlaner.cpp.
References AdaptiveRMaxPlaner(), and _counterTable_.
|
private |
Definition at line 342 of file adaptiveRMaxPlaner.cpp.
References _actionsBoolTable_, _actionsRMaxTable_, gum::FMDP< GUM_SCALAR >::endActions(), and gum::StructuredPlaner< double >::fmdp().
Referenced by makePlanning().
|
private |
Definition at line 243 of file adaptiveRMaxPlaner.cpp.
References _actionsBoolTable_, _actionsRMaxTable_, _counterTable_, _fmdpLearner_, _rmax_, _rThreshold_, _visitLearner_(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::clean(), gum::StructuredPlaner< double >::discountFactor_, gum::FMDP< GUM_SCALAR >::endActions(), gum::FMDP< GUM_SCALAR >::endVariables(), gum::StructuredPlaner< double >::fmdp(), gum::IVisitableGraphLearner::insertSetOfVars(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::manager(), gum::StructuredPlaner< double >::maximiseQactions_(), gum::StructuredPlaner< double >::minimiseFunctions_(), gum::StructuredPlaner< double >::operator_, gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::reduce(), gum::IVisitableGraphLearner::root(), and gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::setRootNode().
Referenced by makePlanning().
|
privateinherited |
Recursion part for the createArgMaxCopy.
Definition at line 291 of file structuredPlaner_tpl.h.
References vFunction_.
|
privateinherited |
Recursion part for the optimal policy extraction.
Definition at line 321 of file structuredPlaner_tpl.h.
|
privateinherited |
Extract from an ArgMaxSet the associated ActionSet.
Definition at line 329 of file structuredPlaner_tpl.h.
References evalQaction_(), fmdp_, and vFunction_.
|
private |
Definition at line 309 of file adaptiveRMaxPlaner.cpp.
References _rmax_, _rThreshold_, _visitLearner_(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::addInternalNode(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::addTerminalNode(), gum::DiscreteVariable::domainSize(), gum::IVisitableGraphLearner::isTerminal(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::manager(), gum::IVisitableGraphLearner::nodeNbObservation(), gum::IVisitableGraphLearner::nodeSon(), gum::IVisitableGraphLearner::nodeVar(), and SOA_ALLOCATE.
Referenced by _makeRMaxFunctionGraphs_(), and _visitLearner_().
|
protectedvirtualinherited |
Perform the R(s) + gamma . function.
Definition at line 256 of file structuredPlaner_tpl.h.
References _firstTime_.
Referenced by gum::AdaptiveRMaxPlaner::evalPolicy_(), and gum::AdaptiveRMaxPlaner::valueIteration_().
|
protectedvirtualinherited |
Performs argmax_a Q(s,a).
Definition at line 304 of file structuredPlaner_tpl.h.
Referenced by gum::AdaptiveRMaxPlaner::evalPolicy_().
|
inlinevirtual |
Implements gum::IDecisionStrategy.
Definition at line 222 of file adaptiveRMaxPlaner.h.
References _counterTable_, and _initializedTable_.
|
protectedvirtual |
Performs the required tasks to extract an optimal policy.
Reimplemented from gum::StructuredPlaner< double >.
Definition at line 204 of file adaptiveRMaxPlaner.cpp.
References _actionsBoolTable_, _actionsRMaxTable_, gum::StructuredPlaner< double >::addReward_(), gum::StructuredPlaner< double >::argmaximiseQactions_(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::copyAndReassign(), gum::StructuredPlaner< double >::evalQaction_(), gum::StructuredPlaner< double >::extractOptimalPolicy_(), gum::StructuredPlaner< double >::fmdp_, gum::StructuredPlaner< double >::makeArgMax_(), gum::StructuredPlaner< double >::operator_, and gum::StructuredPlaner< double >::vFunction_.
|
protectedvirtualinherited |
Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.
Definition at line 235 of file structuredPlaner_tpl.h.
Referenced by _transferActionIds_(), gum::AdaptiveRMaxPlaner::evalPolicy_(), and gum::AdaptiveRMaxPlaner::valueIteration_().
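On an explicit (non-factored) state space, the expectation step evalQaction_ performs amounts to a dot product between a transition distribution row and the previous value function. The sketch below assumes plain vectors for illustration; the planner itself carries out the same operation on MultiDimFunctionGraph representations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of Q(s,a) = sum_{s'} P(s'|s,a) * V^{t-1}(s') for one state s and
// action a, with the distribution and value function given as flat vectors.
double evalQActionAtState(const std::vector<double>& transitionProbs,  // P(.|s,a)
                          const std::vector<double>& vPrevious) {      // V^{t-1}
  double q = 0.0;
  for (std::size_t s2 = 0; s2 < transitionProbs.size(); ++s2)
    q += transitionProbs[s2] * vPrevious[s2];
  return q;
}
```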
|
protectedinherited |
From V*(s) = argmax_a Q*(s,a), this function extracts pi*(s), mainly by extracting from each ArgMaxSet present at the leaves the associated ActionSet.
Definition at line 313 of file structuredPlaner_tpl.h.
Referenced by gum::AdaptiveRMaxPlaner::evalPolicy_().
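The extraction step can be illustrated on a single state: collect every action whose Q-value ties with the maximum, which mirrors turning an ArgMaxSet leaf into an ActionSet. A hypothetical sketch over a plain vector of Q-values, not the function-graph traversal the planner performs:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of pi*(s) extraction: return all actions whose Q-value ties with
// max_a Q(s,a) (within a tolerance), i.e. the "ActionSet" for this state.
std::vector<std::size_t> argmaxActions(const std::vector<double>& qValues,
                                       double tieTolerance = 1e-9) {
  double best = qValues[0];
  for (double q : qValues)
    if (q > best) best = q;
  std::vector<std::size_t> actions;
  for (std::size_t a = 0; a < qValues.size(); ++a)
    if (std::fabs(qValues[a] - best) <= tieTolerance) actions.push_back(a);
  return actions;
}
```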
|
inlineinherited |
Returns a const pointer to the Factored Markov Decision Process on which we're planning.
Definition at line 148 of file structuredPlaner.h.
Referenced by gum::AdaptiveRMaxPlaner::_clearTables_(), gum::AdaptiveRMaxPlaner::_makeRMaxFunctionGraphs_(), and gum::AdaptiveRMaxPlaner::initialize().
Initializes the data structures needed for planning.
Implements gum::IPlanningStrategy< double >.
Definition at line 117 of file adaptiveRMaxPlaner.cpp.
References _counterTable_, _initialized_, _initializedTable_, gum::StructuredPlaner< double >::fmdp(), gum::IDecisionStrategy::initialize(), and gum::StructuredPlaner< GUM_SCALAR >::initialize().
|
virtualinherited |
Initializes the data structures needed for planning.
Definition at line 197 of file structuredPlaner_tpl.h.
References gum::HashTable< Key, Val >::exists(), gum::Set< Key >::exists(), gum::HashTable< Key, Val >::insert(), and gum::InternalNode::son().
|
protectedvirtual |
Initializes the value function.
Reimplemented from gum::StructuredPlaner< double >.
Definition at line 151 of file adaptiveRMaxPlaner.cpp.
References gum::StructuredPlaner< double >::fmdp_, gum::StructuredPlaner< double >::operator_, RECASTED, and gum::StructuredPlaner< double >::vFunction_.
|
protectedinherited |
Creates a copy of the given Qaction that can be exploited by an argmax.
Hence, this step consists in replacing each leaf by an ArgMaxSet containing the value of the leaf and the actionId of the Qaction.
| Qaction | : the function graph we want to transform |
| actionId | : the action Id associated to that graph |
Definition at line 285 of file structuredPlaner_tpl.h.
References _threshold_, and verbose_.
Referenced by gum::AdaptiveRMaxPlaner::evalPolicy_().
|
virtual |
Performs a value iteration.
| nbStep | : specifies how many value iterations to perform. makePlanning then stops either when the optimal value function is reached or when nbStep iterations have been performed |
Reimplemented from gum::StructuredPlaner< double >.
Definition at line 132 of file adaptiveRMaxPlaner.cpp.
References _clearTables_(), _makeRMaxFunctionGraphs_(), and gum::StructuredPlaner< GUM_SCALAR >::makePlanning().
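The stopping rule documented above (halt when either the value function has converged or nbStep iterations have run) can be sketched with classical tabular value iteration. This is a self-contained illustration under the assumption of explicit P[a][s][s'] and R[s] tables; the planner itself works on factored, decision-diagram representations.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Tabular value iteration sketch: V <- R + gamma * max_a sum_{s'} P V,
// stopping when |V^{n+1} - V^{n}| < epsilon or after nbStep iterations.
std::vector<double> valueIteration(
    const std::vector<std::vector<std::vector<double>>>& P,  // P[a][s][s']
    const std::vector<double>& R,                            // R[s]
    double gamma, double epsilon, std::size_t nbStep) {
  const std::size_t nS = R.size(), nA = P.size();
  std::vector<double> v(nS, 0.0);
  for (std::size_t step = 0; step < nbStep; ++step) {
    std::vector<double> vNew(nS);
    double delta = 0.0;
    for (std::size_t s = 0; s < nS; ++s) {
      double best = -1e300;
      for (std::size_t a = 0; a < nA; ++a) {   // max_a Q(s,a)
        double q = 0.0;
        for (std::size_t s2 = 0; s2 < nS; ++s2) q += P[a][s][s2] * v[s2];
        if (q > best) best = q;
      }
      vNew[s] = R[s] + gamma * best;           // R(s) + gamma * max_a ...
      delta = std::max(delta, std::fabs(vNew[s] - v[s]));
    }
    v = vNew;
    if (delta < epsilon) break;                // V ~ V*: converged, stop early
  }
  return v;
}
```

With a single absorbing state, reward 1 and gamma = 0.5, V* = 1 / (1 - 0.5) = 2, and the loop exits on the epsilon test long before the nbStep cap.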
|
protectedvirtualinherited |
Performs max_a Q(s,a).
Definition at line 242 of file structuredPlaner_tpl.h.
Referenced by gum::AdaptiveRMaxPlaner::_makeRMaxFunctionGraphs_(), and gum::AdaptiveRMaxPlaner::valueIteration_().
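The merge maximiseQactions_ performs on function graphs corresponds, on explicit tables, to a pointwise maximum over one Q-table per action. A minimal sketch, assuming one flat vector of Q-values per action:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of max_a Q(s,a): pointwise maximum over per-action Q-tables,
// yielding the next value function estimate for every state s.
std::vector<double> maximiseQActions(
    const std::vector<std::vector<double>>& qTables) {  // qTables[a][s]
  std::vector<double> vMax = qTables[0];
  for (const auto& q : qTables)
    for (std::size_t s = 0; s < vMax.size(); ++s)
      vMax[s] = std::max(vMax[s], q[s]);
  return vMax;
}
```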
|
protectedvirtualinherited |
Performs min_i F_i.
Definition at line 249 of file structuredPlaner_tpl.h.
References elVarSeq_, fmdp_, operator_, optimalPolicy_, and vFunction_.
Referenced by gum::AdaptiveRMaxPlaner::_makeRMaxFunctionGraphs_().
|
inlinevirtualinherited |
Returns the best policy obtained so far.
Implements gum::IPlanningStrategy< double >.
Definition at line 163 of file structuredPlaner.h.
|
virtualinherited |
Provides a better toDot for the optimal policy, where the leaves show the action name instead of its id.
Implements gum::IPlanningStrategy< double >.
Definition at line 179 of file structuredPlaner_tpl.h.
|
inlinevirtualinherited |
Returns the current size of the optimal policy computed so far.
Implements gum::IPlanningStrategy< double >.
Definition at line 170 of file structuredPlaner.h.
|
inlinestatic |
Definition at line 83 of file adaptiveRMaxPlaner.h.
References AdaptiveRMaxPlaner().
Referenced by gum::SDYNA::RMaxMDDInstance().
|
inlineinherited |
Definition at line 111 of file IDecisionStrategy.h.
References optPol_.
|
inlinestaticinherited |
Definition at line 92 of file structuredPlaner.h.
|
inlinevirtualinherited |
Reimplemented in gum::E_GreedyDecider, and gum::RandomDecider.
Definition at line 115 of file IDecisionStrategy.h.
References allActions_, and optPol_.
Referenced by gum::E_GreedyDecider::stateOptimalPolicy().
|
inlinestaticinherited |
Definition at line 104 of file structuredPlaner.h.
|
inlinestatic |
Definition at line 97 of file adaptiveRMaxPlaner.h.
References AdaptiveRMaxPlaner().
Referenced by gum::SDYNA::RMaxTreeInstance().
|
protectedvirtual |
Performs a single step of value iteration.
Reimplemented from gum::StructuredPlaner< double >.
Definition at line 160 of file adaptiveRMaxPlaner.cpp.
References _actionsBoolTable_, _actionsRMaxTable_, gum::StructuredPlaner< double >::addReward_(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::copyAndReassign(), gum::StructuredPlaner< double >::evalQaction_(), gum::StructuredPlaner< double >::fmdp_, gum::StructuredPlaner< double >::maximiseQactions_(), gum::StructuredPlaner< double >::operator_, and gum::StructuredPlaner< double >::vFunction_.
|
inlineinherited |
Returns a const pointer to the value function computed so far.
Definition at line 153 of file structuredPlaner.h.
|
inlinevirtualinherited |
Returns the current size of the value function computed so far.
Implements gum::IPlanningStrategy< double >.
Definition at line 158 of file structuredPlaner.h.
|
private |
Definition at line 209 of file adaptiveRMaxPlaner.h.
Referenced by _clearTables_(), _makeRMaxFunctionGraphs_(), evalPolicy_(), and valueIteration_().
|
private |
Definition at line 208 of file adaptiveRMaxPlaner.h.
Referenced by _clearTables_(), _makeRMaxFunctionGraphs_(), evalPolicy_(), and valueIteration_().
|
private |
Definition at line 230 of file adaptiveRMaxPlaner.h.
Referenced by ~AdaptiveRMaxPlaner(), _makeRMaxFunctionGraphs_(), checkState(), and initialize().
|
privateinherited |
Definition at line 382 of file structuredPlaner.h.
Referenced by addReward_().
|
private |
Definition at line 210 of file adaptiveRMaxPlaner.h.
Referenced by AdaptiveRMaxPlaner(), and _makeRMaxFunctionGraphs_().
|
private |
Definition at line 233 of file adaptiveRMaxPlaner.h.
Referenced by AdaptiveRMaxPlaner(), and initialize().
Definition at line 231 of file adaptiveRMaxPlaner.h.
Referenced by checkState(), and initialize().
|
private |
Definition at line 213 of file adaptiveRMaxPlaner.h.
Referenced by _makeRMaxFunctionGraphs_(), and _visitLearner_().
|
private |
Definition at line 212 of file adaptiveRMaxPlaner.h.
Referenced by _makeRMaxFunctionGraphs_(), and _visitLearner_().
|
privateinherited |
The threshold value: whenever |V^{n} - V^{n+1}| < threshold, we consider that V ~ V*.
Definition at line 381 of file structuredPlaner.h.
Referenced by evalPolicy_(), and makeArgMax_().
|
protectedinherited |
Definition at line 124 of file IDecisionStrategy.h.
Referenced by initialize(), gum::E_GreedyDecider::stateOptimalPolicy(), stateOptimalPolicy(), and gum::RandomDecider::stateOptimalPolicy().
|
protectedinherited |
Discount Factor used for infinite horizon planning.
Definition at line 365 of file structuredPlaner.h.
Referenced by gum::AdaptiveRMaxPlaner::_makeRMaxFunctionGraphs_().
|
protectedinherited |
A set used to eliminate primed variables.
Definition at line 360 of file structuredPlaner.h.
Referenced by minimiseFunctions_().
|
protectedinherited |
The Factored Markov Decision Process describing our planning situation (NB: its transitions and reward functions must be given as function graphs).
Definition at line 340 of file structuredPlaner.h.
Referenced by ~StructuredPlaner(), _transferActionIds_(), gum::AdaptiveRMaxPlaner::evalPolicy_(), gum::AdaptiveRMaxPlaner::initVFunction_(), minimiseFunctions_(), and gum::AdaptiveRMaxPlaner::valueIteration_().
|
protectedinherited |
Definition at line 367 of file structuredPlaner.h.
Referenced by gum::AdaptiveRMaxPlaner::_makeRMaxFunctionGraphs_(), gum::AdaptiveRMaxPlaner::evalPolicy_(), evalPolicy_(), gum::AdaptiveRMaxPlaner::initVFunction_(), minimiseFunctions_(), and gum::AdaptiveRMaxPlaner::valueIteration_().
|
protectedinherited |
The associated optimal policy.
Definition at line 355 of file structuredPlaner.h.
Referenced by ~StructuredPlaner(), and minimiseFunctions_().
|
protectedinherited |
Definition at line 121 of file IDecisionStrategy.h.
Referenced by initialize(), setOptimalStrategy(), and stateOptimalPolicy().
|
protectedinherited |
Boolean indicating whether iteration information should be displayed on the terminal.
Definition at line 373 of file structuredPlaner.h.
Referenced by makeArgMax_().
|
protectedinherited |
The Value Function computed iteratively.
Definition at line 345 of file structuredPlaner.h.
Referenced by _recurArgMaxCopy_(), _transferActionIds_(), gum::AdaptiveRMaxPlaner::evalPolicy_(), evalPolicy_(), gum::AdaptiveRMaxPlaner::initVFunction_(), minimiseFunctions_(), and gum::AdaptiveRMaxPlaner::valueIteration_().