aGrUM 2.3.2
a C++ library for (probabilistic) graphical models
<agrum/FMDP/planning/adaptiveRMaxPlaner.h> More...
#include <adaptiveRMaxPlaner.h>
Public Member Functions | |
Planning Methods | |
| void | initialize (const FMDP< double > *fmdp) |
| Initializes the data structures needed for planning. | |
| void | makePlanning (Idx nbStep=1000000) |
| Performs a value iteration. | |
Datastructure access methods | |
| INLINE const FMDP< double > * | fmdp () |
| Returns a const pointer to the Factored Markov Decision Process on which we're planning. | |
| INLINE const MultiDimFunctionGraph< double > * | vFunction () |
| Returns a const pointer to the value function computed so far. | |
| virtual Size | vFunctionSize () |
| Returns the current size of the value function computed so far. | |
| INLINE MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * | optimalPolicy () |
| Returns the best policy obtained so far. | |
| virtual Size | optimalPolicySize () |
| Returns the current size of the optimal policy computed so far. | |
| std::string | optimalPolicy2String () |
| Provides a better toDot for the optimal policy, where the leaves show the action name instead of its id. | |
Planning Methods | |
| virtual void | initialize (const FMDP< double > *fmdp) |
| Initializes the data structures needed for planning. | |
Static Public Member Functions | |
| static AdaptiveRMaxPlaner * | ReducedAndOrderedInstance (const ILearningStrategy *learner, double discountFactor=0.9, double epsilon=0.00001, bool verbose=true) |
| static AdaptiveRMaxPlaner * | TreeInstance (const ILearningStrategy *learner, double discountFactor=0.9, double epsilon=0.00001, bool verbose=true) |
| static StructuredPlaner< double > * | spumddInstance (double discountFactor=0.9, double epsilon=0.00001, bool verbose=true) |
| static StructuredPlaner< double > * | sviInstance (double discountFactor=0.9, double epsilon=0.00001, bool verbose=true) |
Protected Member Functions | |
Value Iteration Methods | |
| virtual void | initVFunction_ () |
| Initializes the value function. | |
| virtual MultiDimFunctionGraph< double > * | valueIteration_ () |
| Performs a single step of value iteration. | |
Optimal policy extraction methods | |
| virtual void | evalPolicy_ () |
| Perform the required tasks to extract an optimal policy. | |
Value Iteration Methods | |
| virtual MultiDimFunctionGraph< double > * | evalQaction_ (const MultiDimFunctionGraph< double > *, Idx) |
| Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration. | |
| virtual MultiDimFunctionGraph< double > * | maximiseQactions_ (std::vector< MultiDimFunctionGraph< double > * > &) |
| Performs max_a Q(s,a). | |
| virtual MultiDimFunctionGraph< double > * | minimiseFunctions_ (std::vector< MultiDimFunctionGraph< double > * > &) |
| Performs min_i F_i. | |
| virtual MultiDimFunctionGraph< double > * | addReward_ (MultiDimFunctionGraph< double > *function, Idx actionId=0) |
| Computes R(s) + gamma * function. | |
Protected Attributes | |
| const FMDP< double > * | fmdp_ |
| The Factored Markov Decision Process describing our planning situation (NB: its transitions and reward functions must be given as function graphs). | |
| MultiDimFunctionGraph< double > * | vFunction_ |
| The Value Function computed iteratively. | |
| MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * | optimalPolicy_ |
| The associated optimal policy. | |
| gum::VariableSet | elVarSeq_ |
| A set used to eliminate primed variables. | |
| double | discountFactor_ |
| Discount Factor used for infinite horizon planning. | |
| IOperatorStrategy< double > * | operator_ |
| bool | verbose_ |
| Boolean indicating whether iteration information should be displayed on the terminal. | |
Private Member Functions | |
| void | _makeRMaxFunctionGraphs_ () |
| std::pair< NodeId, NodeId > | _visitLearner_ (const IVisitableGraphLearner *, NodeId currentNodeId, MultiDimFunctionGraph< double > *, MultiDimFunctionGraph< double > *) |
| void | _clearTables_ () |
Private Attributes | |
| HashTable< Idx, MultiDimFunctionGraph< double > * > | _actionsRMaxTable_ |
| HashTable< Idx, MultiDimFunctionGraph< double > * > | _actionsBoolTable_ |
| const ILearningStrategy * | _fmdpLearner_ |
| double | _rThreshold_ |
| double | _rmax_ |
| double | _threshold_ |
| The threshold value: whenever |V^{n} - V^{n+1}| < threshold, we consider that V ~ V*. | |
| bool | _firstTime_ |
Incremental methods | |
| HashTable< Idx, StatesCounter * > | _counterTable_ |
| HashTable< Idx, bool > | _initializedTable_ |
| bool | _initialized_ |
| void | checkState (const Instantiation &newState, Idx actionId) |
Constructor & destructor. | |
| AdaptiveRMaxPlaner (IOperatorStrategy< double > *opi, double discountFactor, double epsilon, const ILearningStrategy *learner, bool verbose) | |
| Default constructor. | |
| ~AdaptiveRMaxPlaner () | |
| Default destructor. | |
Optimal policy extraction methods | |
| NodeId | _recurArgMaxCopy_ (NodeId, Idx, const MultiDimFunctionGraph< double > *, MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > *, HashTable< NodeId, NodeId > &) |
| Recursion part for the createArgMaxCopy. | |
| NodeId | _recurExtractOptPol_ (NodeId, const MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > *, HashTable< NodeId, NodeId > &) |
| Recursion part for the optimal policy extraction. | |
| void | _transferActionIds_ (const ArgMaxSet< double, Idx > &, ActionSet &) |
| Extract from an ArgMaxSet the associated ActionSet. | |
| MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > * | makeArgMax_ (const MultiDimFunctionGraph< double > *Qaction, Idx actionId) |
| Creates a copy of the given Qaction that can be exploited by an argmax. | |
| virtual MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > * | argmaximiseQactions_ (std::vector< MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > * > &) |
| Performs argmax_a Q(s,a). | |
| void | extractOptimalPolicy_ (const MultiDimFunctionGraph< ArgMaxSet< double, Idx >, SetTerminalNodePolicy > *optimalValueFunction) |
| From V*(s) = argmax_a Q*(s,a), this function extracts pi*(s), mainly by extracting from each ArgMaxSet present at the leaves the associated ActionSet. | |
Incremental methods | |
| void | setOptimalStrategy (MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > *optPol) |
| virtual ActionSet | stateOptimalPolicy (const Instantiation &curState) |
| const MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * | optPol_ {nullptr} |
| ActionSet | allActions_ |
<agrum/FMDP/planning/adaptiveRMaxPlaner.h>
A class to find an optimal policy for a given FMDP.
Performs RMax planning on the factored Markov decision process given as parameter.
Definition at line 73 of file adaptiveRMaxPlaner.h.
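The RMax principle this planner applies can be sketched independently of aGrUM's decision-diagram machinery: a state-action pair observed fewer times than a threshold is treated as "unknown" and assigned the optimistic value rmax / (1 - gamma), which drives exploration. This is an illustrative sketch, not the class's actual code (which operates on function graphs via `_visitLearner_` using `_rmax_` and `_rThreshold_`).

```cpp
#include <cassert>

// Illustrative RMax optimism rule (a sketch, not aGrUM's implementation):
// an insufficiently observed state-action pair gets the optimistic bound
// rMax / (1 - gamma) on its discounted return, so planning favors visiting it.
double rmaxOptimisticReward(long nbObservations, long nbObsThreshold,
                            double estimatedReward, double rMax, double gamma) {
  if (nbObservations < nbObsThreshold)
    return rMax / (1.0 - gamma);  // "unknown": assume the best possible return
  return estimatedReward;         // "known": trust the learned reward estimate
}
```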
|
private |
Default constructor.
Definition at line 84 of file adaptiveRMaxPlaner.cpp.
References AdaptiveRMaxPlaner(), gum::StructuredPlaner< double >::StructuredPlaner(), _fmdpLearner_, and _initialized_.
Referenced by AdaptiveRMaxPlaner(), ~AdaptiveRMaxPlaner(), ReducedAndOrderedInstance(), and TreeInstance().
| gum::AdaptiveRMaxPlaner::~AdaptiveRMaxPlaner | ( | ) |
Default destructor.
Definition at line 97 of file adaptiveRMaxPlaner.cpp.
References AdaptiveRMaxPlaner(), and _counterTable_.
|
private |
Definition at line 342 of file adaptiveRMaxPlaner.cpp.
References _actionsBoolTable_, _actionsRMaxTable_, gum::FMDP< GUM_SCALAR >::endActions(), and gum::StructuredPlaner< double >::fmdp().
Referenced by makePlanning().
|
private |
Definition at line 243 of file adaptiveRMaxPlaner.cpp.
References _actionsBoolTable_, _actionsRMaxTable_, _counterTable_, _fmdpLearner_, _rmax_, _rThreshold_, _visitLearner_(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::clean(), gum::StructuredPlaner< double >::discountFactor_, gum::FMDP< GUM_SCALAR >::endActions(), gum::FMDP< GUM_SCALAR >::endVariables(), gum::StructuredPlaner< double >::fmdp(), gum::IVisitableGraphLearner::insertSetOfVars(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::manager(), gum::StructuredPlaner< double >::maximiseQactions_(), gum::StructuredPlaner< double >::minimiseFunctions_(), gum::StructuredPlaner< double >::operator_, gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::reduce(), gum::IVisitableGraphLearner::root(), and gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::setRootNode().
Referenced by makePlanning().
|
privateinherited |
Recursion part for the createArgMaxCopy.
Definition at line 291 of file structuredPlaner_tpl.h.
References vFunction_.
|
privateinherited |
Recursion part for the optimal policy extraction.
Definition at line 321 of file structuredPlaner_tpl.h.
|
privateinherited |
Extract from an ArgMaxSet the associated ActionSet.
Definition at line 329 of file structuredPlaner_tpl.h.
References evalQaction_(), fmdp_, and vFunction_.
|
private |
Definition at line 309 of file adaptiveRMaxPlaner.cpp.
References _rmax_, _rThreshold_, _visitLearner_(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::addInternalNode(), gum::MultiDimFunctionGraphManager< GUM_SCALAR, TerminalNodePolicy >::addTerminalNode(), gum::DiscreteVariable::domainSize(), gum::IVisitableGraphLearner::isTerminal(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::manager(), gum::IVisitableGraphLearner::nodeNbObservation(), gum::IVisitableGraphLearner::nodeSon(), gum::IVisitableGraphLearner::nodeVar(), and SOA_ALLOCATE.
Referenced by _makeRMaxFunctionGraphs_(), and _visitLearner_().
|
protectedvirtualinherited |
Perform the R(s) + gamma . function.
Definition at line 256 of file structuredPlaner_tpl.h.
References _firstTime_.
Referenced by gum::AdaptiveRMaxPlaner::evalPolicy_(), and gum::AdaptiveRMaxPlaner::valueIteration_().
|
protectedvirtualinherited |
Performs argmax_a Q(s,a).
Definition at line 304 of file structuredPlaner_tpl.h.
Referenced by gum::AdaptiveRMaxPlaner::evalPolicy_().
|
inlinevirtual |
Implements gum::IDecisionStrategy.
Definition at line 222 of file adaptiveRMaxPlaner.h.
References _counterTable_, and _initializedTable_.
|
protectedvirtual |
Performs the required tasks to extract an optimal policy.
Reimplemented from gum::StructuredPlaner< double >.
Definition at line 204 of file adaptiveRMaxPlaner.cpp.
References _actionsBoolTable_, _actionsRMaxTable_, gum::StructuredPlaner< double >::addReward_(), gum::StructuredPlaner< double >::argmaximiseQactions_(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::copyAndReassign(), gum::StructuredPlaner< double >::evalQaction_(), gum::StructuredPlaner< double >::extractOptimalPolicy_(), gum::StructuredPlaner< double >::fmdp_, gum::StructuredPlaner< double >::makeArgMax_(), gum::StructuredPlaner< double >::operator_, and gum::StructuredPlaner< double >::vFunction_.
|
protectedvirtualinherited |
Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.
Definition at line 235 of file structuredPlaner_tpl.h.
Referenced by _transferActionIds_(), gum::AdaptiveRMaxPlaner::evalPolicy_(), and gum::AdaptiveRMaxPlaner::valueIteration_().
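On an explicit (non-factored) state space, the expectation step evalQaction_ performs amounts to a dot product between a transition distribution row and the previous value function. The sketch below assumes plain vectors for illustration; the planner itself carries out the same operation on MultiDimFunctionGraph representations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of Q(s,a) = sum_{s'} P(s'|s,a) * V^{t-1}(s') for one state s and
// action a, with the distribution and value function given as flat vectors.
double evalQActionAtState(const std::vector<double>& transitionProbs,  // P(.|s,a)
                          const std::vector<double>& vPrevious) {      // V^{t-1}
  double q = 0.0;
  for (std::size_t s2 = 0; s2 < transitionProbs.size(); ++s2)
    q += transitionProbs[s2] * vPrevious[s2];
  return q;
}
```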
|
protectedinherited |
From V*(s) = argmax_a Q*(s,a), this function extracts pi*(s), mainly by extracting from each ArgMaxSet present at the leaves the associated ActionSet.
Definition at line 313 of file structuredPlaner_tpl.h.
Referenced by gum::AdaptiveRMaxPlaner::evalPolicy_().
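The extraction step can be illustrated on a single state: collect every action whose Q-value ties with the maximum, which mirrors turning an ArgMaxSet leaf into an ActionSet. A hypothetical sketch over a plain vector of Q-values, not the function-graph traversal the planner performs:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of pi*(s) extraction: return all actions whose Q-value ties with
// max_a Q(s,a) (within a tolerance), i.e. the "ActionSet" for this state.
std::vector<std::size_t> argmaxActions(const std::vector<double>& qValues,
                                       double tieTolerance = 1e-9) {
  double best = qValues[0];
  for (double q : qValues)
    if (q > best) best = q;
  std::vector<std::size_t> actions;
  for (std::size_t a = 0; a < qValues.size(); ++a)
    if (std::fabs(qValues[a] - best) <= tieTolerance) actions.push_back(a);
  return actions;
}
```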
|
inlineinherited |
Returns a const pointer to the Factored Markov Decision Process on which we're planning.
Definition at line 148 of file structuredPlaner.h.
Referenced by gum::AdaptiveRMaxPlaner::_clearTables_(), gum::AdaptiveRMaxPlaner::_makeRMaxFunctionGraphs_(), and gum::AdaptiveRMaxPlaner::initialize().
Initializes the data structures needed for planning.
Implements gum::IPlanningStrategy< double >.
Definition at line 117 of file adaptiveRMaxPlaner.cpp.
References _counterTable_, _initialized_, _initializedTable_, gum::StructuredPlaner< double >::fmdp(), gum::IDecisionStrategy::initialize(), and gum::StructuredPlaner< GUM_SCALAR >::initialize().
|
virtualinherited |
Initializes the data structures needed for planning.
Definition at line 197 of file structuredPlaner_tpl.h.
References gum::HashTable< Key, Val >::exists(), gum::Set< Key >::exists(), gum::HashTable< Key, Val >::insert(), and gum::InternalNode::son().
|
protectedvirtual |
Initializes the value function.
Reimplemented from gum::StructuredPlaner< double >.
Definition at line 151 of file adaptiveRMaxPlaner.cpp.
References gum::StructuredPlaner< double >::fmdp_, gum::StructuredPlaner< double >::operator_, RECASTED, and gum::StructuredPlaner< double >::vFunction_.
|
protectedinherited |
Creates a copy of the given Qaction that can be exploited by an argmax.
Hence, this step consists in replacing each leaf by an ArgMaxSet containing the value of the leaf and the actionId of the Qaction.
| Qaction | : the function graph we want to transform |
| actionId | : the action Id associated to that graph |
Definition at line 285 of file structuredPlaner_tpl.h.
References _threshold_, and verbose_.
Referenced by gum::AdaptiveRMaxPlaner::evalPolicy_().
|
virtual |
Performs a value iteration.
| nbStep | : specifies how many value iterations to perform. makePlanning then stops either when the optimal value function is reached or when nbStep iterations have been performed |
Reimplemented from gum::StructuredPlaner< double >.
Definition at line 132 of file adaptiveRMaxPlaner.cpp.
References _clearTables_(), _makeRMaxFunctionGraphs_(), and gum::StructuredPlaner< GUM_SCALAR >::makePlanning().
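The stopping rule documented above (halt when either the value function has converged or nbStep iterations have run) can be sketched with classical tabular value iteration. This is a self-contained illustration under the assumption of explicit P[a][s][s'] and R[s] tables; the planner itself works on factored, decision-diagram representations.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Tabular value iteration sketch: V <- R + gamma * max_a sum_{s'} P V,
// stopping when |V^{n+1} - V^{n}| < epsilon or after nbStep iterations.
std::vector<double> valueIteration(
    const std::vector<std::vector<std::vector<double>>>& P,  // P[a][s][s']
    const std::vector<double>& R,                            // R[s]
    double gamma, double epsilon, std::size_t nbStep) {
  const std::size_t nS = R.size(), nA = P.size();
  std::vector<double> v(nS, 0.0);
  for (std::size_t step = 0; step < nbStep; ++step) {
    std::vector<double> vNew(nS);
    double delta = 0.0;
    for (std::size_t s = 0; s < nS; ++s) {
      double best = -1e300;
      for (std::size_t a = 0; a < nA; ++a) {   // max_a Q(s,a)
        double q = 0.0;
        for (std::size_t s2 = 0; s2 < nS; ++s2) q += P[a][s][s2] * v[s2];
        if (q > best) best = q;
      }
      vNew[s] = R[s] + gamma * best;           // R(s) + gamma * max_a ...
      delta = std::max(delta, std::fabs(vNew[s] - v[s]));
    }
    v = vNew;
    if (delta < epsilon) break;                // V ~ V*: converged, stop early
  }
  return v;
}
```

With a single absorbing state, reward 1 and gamma = 0.5, V* = 1 / (1 - 0.5) = 2, and the loop exits on the epsilon test long before the nbStep cap.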
|
protectedvirtualinherited |
Performs max_a Q(s,a).
Definition at line 242 of file structuredPlaner_tpl.h.
Referenced by gum::AdaptiveRMaxPlaner::_makeRMaxFunctionGraphs_(), and gum::AdaptiveRMaxPlaner::valueIteration_().
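The merge maximiseQactions_ performs on function graphs corresponds, on explicit tables, to a pointwise maximum over one Q-table per action. A minimal sketch, assuming one flat vector of Q-values per action:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of max_a Q(s,a): pointwise maximum over per-action Q-tables,
// yielding the next value function estimate for every state s.
std::vector<double> maximiseQActions(
    const std::vector<std::vector<double>>& qTables) {  // qTables[a][s]
  std::vector<double> vMax = qTables[0];
  for (const auto& q : qTables)
    for (std::size_t s = 0; s < vMax.size(); ++s)
      vMax[s] = std::max(vMax[s], q[s]);
  return vMax;
}
```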
|
protectedvirtualinherited |
Performs min_i F_i.
Definition at line 249 of file structuredPlaner_tpl.h.
References elVarSeq_, fmdp_, operator_, optimalPolicy_, and vFunction_.
Referenced by gum::AdaptiveRMaxPlaner::_makeRMaxFunctionGraphs_().
|
inlinevirtualinherited |
Returns the best policy obtained so far.
Implements gum::IPlanningStrategy< double >.
Definition at line 163 of file structuredPlaner.h.
|
virtualinherited |
Provides a better toDot for the optimal policy, where the leaves show the action name instead of its id.
Implements gum::IPlanningStrategy< double >.
Definition at line 179 of file structuredPlaner_tpl.h.
|
inlinevirtualinherited |
Returns the current size of the optimal policy computed so far.
Implements gum::IPlanningStrategy< double >.
Definition at line 170 of file structuredPlaner.h.
|
inlinestatic |
Definition at line 83 of file adaptiveRMaxPlaner.h.
References AdaptiveRMaxPlaner().
Referenced by gum::SDYNA::RMaxMDDInstance().
|
inlineinherited |
Definition at line 111 of file IDecisionStrategy.h.
References optPol_.
|
inlinestaticinherited |
Definition at line 92 of file structuredPlaner.h.
|
inlinevirtualinherited |
Reimplemented in gum::E_GreedyDecider, and gum::RandomDecider.
Definition at line 115 of file IDecisionStrategy.h.
References allActions_, and optPol_.
Referenced by gum::E_GreedyDecider::stateOptimalPolicy().
|
inlinestaticinherited |
Definition at line 104 of file structuredPlaner.h.
|
inlinestatic |
Definition at line 97 of file adaptiveRMaxPlaner.h.
References AdaptiveRMaxPlaner().
Referenced by gum::SDYNA::RMaxTreeInstance().
|
protectedvirtual |
Performs a single step of value iteration.
Reimplemented from gum::StructuredPlaner< double >.
Definition at line 160 of file adaptiveRMaxPlaner.cpp.
References _actionsBoolTable_, _actionsRMaxTable_, gum::StructuredPlaner< double >::addReward_(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::copyAndReassign(), gum::StructuredPlaner< double >::evalQaction_(), gum::StructuredPlaner< double >::fmdp_, gum::StructuredPlaner< double >::maximiseQactions_(), gum::StructuredPlaner< double >::operator_, and gum::StructuredPlaner< double >::vFunction_.
|
inlineinherited |
Returns a const pointer to the value function computed so far.
Definition at line 153 of file structuredPlaner.h.
|
inlinevirtualinherited |
Returns the current size of the value function computed so far.
Implements gum::IPlanningStrategy< double >.
Definition at line 158 of file structuredPlaner.h.
|
private |
Definition at line 209 of file adaptiveRMaxPlaner.h.
Referenced by _clearTables_(), _makeRMaxFunctionGraphs_(), evalPolicy_(), and valueIteration_().
|
private |
Definition at line 208 of file adaptiveRMaxPlaner.h.
Referenced by _clearTables_(), _makeRMaxFunctionGraphs_(), evalPolicy_(), and valueIteration_().
|
private |
Definition at line 230 of file adaptiveRMaxPlaner.h.
Referenced by ~AdaptiveRMaxPlaner(), _makeRMaxFunctionGraphs_(), checkState(), and initialize().
|
privateinherited |
Definition at line 382 of file structuredPlaner.h.
Referenced by addReward_().
|
private |
Definition at line 210 of file adaptiveRMaxPlaner.h.
Referenced by AdaptiveRMaxPlaner(), and _makeRMaxFunctionGraphs_().
|
private |
Definition at line 233 of file adaptiveRMaxPlaner.h.
Referenced by AdaptiveRMaxPlaner(), and initialize().
Definition at line 231 of file adaptiveRMaxPlaner.h.
Referenced by checkState(), and initialize().
|
private |
Definition at line 213 of file adaptiveRMaxPlaner.h.
Referenced by _makeRMaxFunctionGraphs_(), and _visitLearner_().
|
private |
Definition at line 212 of file adaptiveRMaxPlaner.h.
Referenced by _makeRMaxFunctionGraphs_(), and _visitLearner_().
|
privateinherited |
The threshold value: whenever |V^{n} - V^{n+1}| < threshold, we consider that V ~ V*.
Definition at line 381 of file structuredPlaner.h.
Referenced by evalPolicy_(), and makeArgMax_().
|
protectedinherited |
Definition at line 124 of file IDecisionStrategy.h.
Referenced by initialize(), gum::E_GreedyDecider::stateOptimalPolicy(), stateOptimalPolicy(), and gum::RandomDecider::stateOptimalPolicy().
|
protectedinherited |
Discount Factor used for infinite horizon planning.
Definition at line 365 of file structuredPlaner.h.
Referenced by gum::AdaptiveRMaxPlaner::_makeRMaxFunctionGraphs_().
|
protectedinherited |
A set used to eliminate primed variables.
Definition at line 360 of file structuredPlaner.h.
Referenced by minimiseFunctions_().
|
protectedinherited |
The Factored Markov Decision Process describing our planning situation (NB: its transitions and reward functions must be given as function graphs).
Definition at line 340 of file structuredPlaner.h.
Referenced by ~StructuredPlaner(), _transferActionIds_(), gum::AdaptiveRMaxPlaner::evalPolicy_(), gum::AdaptiveRMaxPlaner::initVFunction_(), minimiseFunctions_(), and gum::AdaptiveRMaxPlaner::valueIteration_().
|
protectedinherited |
Definition at line 367 of file structuredPlaner.h.
Referenced by gum::AdaptiveRMaxPlaner::_makeRMaxFunctionGraphs_(), gum::AdaptiveRMaxPlaner::evalPolicy_(), evalPolicy_(), gum::AdaptiveRMaxPlaner::initVFunction_(), minimiseFunctions_(), and gum::AdaptiveRMaxPlaner::valueIteration_().
|
protectedinherited |
The associated optimal policy.
Definition at line 355 of file structuredPlaner.h.
Referenced by ~StructuredPlaner(), and minimiseFunctions_().
|
protectedinherited |
Definition at line 121 of file IDecisionStrategy.h.
Referenced by initialize(), setOptimalStrategy(), and stateOptimalPolicy().
|
protectedinherited |
Boolean indicating whether iteration information should be displayed on the terminal.
Definition at line 373 of file structuredPlaner.h.
Referenced by makeArgMax_().
|
protectedinherited |
The Value Function computed iteratively.
Definition at line 345 of file structuredPlaner.h.
Referenced by _recurArgMaxCopy_(), _transferActionIds_(), gum::AdaptiveRMaxPlaner::evalPolicy_(), evalPolicy_(), gum::AdaptiveRMaxPlaner::initVFunction_(), minimiseFunctions_(), and gum::AdaptiveRMaxPlaner::valueIteration_().