aGrUM 2.3.2
a C++ library for (probabilistic) graphical models
<agrum/FMDP/planning/structuredPlaner.h>
#include <structuredPlaner.h>
Public Member Functions

Data structure access methods

INLINE const FMDP< GUM_SCALAR > * fmdp ()
    Returns a const pointer to the Factored Markov Decision Process on which we are planning.
INLINE const MultiDimFunctionGraph< GUM_SCALAR > * vFunction ()
    Returns a const pointer to the value function computed so far.
virtual Size vFunctionSize ()
    Returns the current size of the value function computed so far.
INLINE MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * optimalPolicy ()
    Returns the best policy obtained so far.
virtual Size optimalPolicySize ()
    Returns the current size of the optimal policy computed so far.
std::string optimalPolicy2String ()
    Provides a better toDot for the optimal policy, where the leaves show action names instead of action ids.

Planning Methods

virtual void initialize (const FMDP< GUM_SCALAR > *fmdp)
    Initializes the data structures needed for planning.
virtual void makePlanning (Idx nbStep=1000000)
    Performs a value iteration.

Static Public Member Functions

static StructuredPlaner< GUM_SCALAR > * spumddInstance (GUM_SCALAR discountFactor=0.9, GUM_SCALAR epsilon=0.00001, bool verbose=true)
static StructuredPlaner< GUM_SCALAR > * sviInstance (GUM_SCALAR discountFactor=0.9, GUM_SCALAR epsilon=0.00001, bool verbose=true)
Protected Member Functions

Value Iteration Methods

virtual void initVFunction_ ()
    Initializes the value function.
virtual MultiDimFunctionGraph< GUM_SCALAR > * valueIteration_ ()
    Performs a single step of value iteration.
virtual MultiDimFunctionGraph< GUM_SCALAR > * evalQaction_ (const MultiDimFunctionGraph< GUM_SCALAR > *, Idx)
    Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.
virtual MultiDimFunctionGraph< GUM_SCALAR > * maximiseQactions_ (std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &)
    Performs max_a Q(s,a).
virtual MultiDimFunctionGraph< GUM_SCALAR > * minimiseFunctions_ (std::vector< MultiDimFunctionGraph< GUM_SCALAR > * > &)
    Performs min_i F_i.
virtual MultiDimFunctionGraph< GUM_SCALAR > * addReward_ (MultiDimFunctionGraph< GUM_SCALAR > *function, Idx actionId=0)
    Performs R(s) + gamma . function.

Protected Attributes

const FMDP< GUM_SCALAR > * fmdp_
    The Factored Markov Decision Process describing our planning situation (NB: it must have function graphs as transition and reward functions).
MultiDimFunctionGraph< GUM_SCALAR > * vFunction_
    The value function computed iteratively.
MultiDimFunctionGraph< ActionSet, SetTerminalNodePolicy > * optimalPolicy_
    The associated optimal policy.
gum::VariableSet elVarSeq_
    A set used to eliminate primed variables.
GUM_SCALAR discountFactor_
    Discount factor used for infinite-horizon planning.
IOperatorStrategy< GUM_SCALAR > * operator_
bool verbose_
    Boolean indicating whether iteration information should be displayed on the terminal.
Private Attributes

GUM_SCALAR _threshold_
    The threshold value: whenever |V^{n} - V^{n+1}| < threshold, we consider that V ~ V*.
bool _firstTime_

Constructor & destructor

StructuredPlaner (IOperatorStrategy< GUM_SCALAR > *opi, GUM_SCALAR discountFactor, GUM_SCALAR epsilon, bool verbose)
    Default constructor.
virtual ~StructuredPlaner ()
    Default destructor.

Optimal policy extraction methods

virtual void evalPolicy_ ()
    Performs the tasks required to extract an optimal policy.
MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * makeArgMax_ (const MultiDimFunctionGraph< GUM_SCALAR > *Qaction, Idx actionId)
    Creates a copy of the given Qaction that can be exploited by an argmax.
virtual MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * argmaximiseQactions_ (std::vector< MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > * > &)
    Performs argmax_a Q(s,a).
void extractOptimalPolicy_ (const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *optimalValueFunction)
    Extracts pi*(s) = argmax_a Q*(s,a); this mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet.
NodeId _recurArgMaxCopy_ (NodeId, Idx, const MultiDimFunctionGraph< GUM_SCALAR > *, MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *, HashTable< NodeId, NodeId > &)
    Recursive part of makeArgMax_.
NodeId _recurExtractOptPol_ (NodeId, const MultiDimFunctionGraph< ArgMaxSet< GUM_SCALAR, Idx >, SetTerminalNodePolicy > *, HashTable< NodeId, NodeId > &)
    Recursive part of extractOptimalPolicy_.
void _transferActionIds_ (const ArgMaxSet< GUM_SCALAR, Idx > &, ActionSet &)
    Extracts from an ArgMaxSet the associated ActionSet.
<agrum/FMDP/planning/structuredPlaner.h>
A class to find an optimal policy for a given FMDP.

Performs structured value iteration planning.

Pure virtual functions regress_, maximize_, argmaximize_, add_ and subtract_ are a priori the ones to be re-specified according to the data structure used (MDDs, DTs, BNs, ...).
Definition at line 82 of file structuredPlaner.h.
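The class description suggests a typical workflow: build a planner through one of the static factory methods, bind it to an FMDP, run the planning, then query the results. Below is a minimal sketch assembled from the members documented on this page; it is illustrative only (`myFmdp` is an assumed, already-built FMDP pointer, and the snippet has not been compiled against a specific aGrUM release):

```cpp
#include <iostream>

#include <agrum/FMDP/fmdp.h>
#include <agrum/FMDP/planning/structuredPlaner.h>

// Sketch only: assumes an FMDP<double> built elsewhere, with function
// graphs as transition and reward functions (see the fmdp_ note below).
void planOn(const gum::FMDP<double>* myFmdp) {
  // SPUDD-style planner: discount factor 0.9, epsilon 1e-5, verbose.
  gum::StructuredPlaner<double>* planner =
      gum::StructuredPlaner<double>::spumddInstance(0.9, 0.00001, true);

  planner->initialize(myFmdp);   // bind the planner to the FMDP
  planner->makePlanning(10000);  // at most 10000 value-iteration steps

  // Query the results: readable policy and value-function size.
  std::cout << planner->optimalPolicy2String() << '\n'
            << "vFunction size: " << planner->vFunctionSize() << '\n';

  delete planner;
}
```

An SVI-style planner would be obtained the same way via sviInstance().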
StructuredPlaner() [protected]
Default constructor.
Definition at line 86 of file structuredPlaner_tpl.h.
References StructuredPlaner(), _threshold_, discountFactor_, operator_, optimalPolicy_, verbose_, and vFunction_.
Referenced by StructuredPlaner(), ~StructuredPlaner(), spumddInstance(), and sviInstance().
~StructuredPlaner() [virtual]
Default destructor.
Definition at line 102 of file structuredPlaner_tpl.h.
References StructuredPlaner(), operator_, optimalPolicy_, and vFunction_.
_recurArgMaxCopy_() [private]
Recursive part of makeArgMax_.
Definition at line 499 of file structuredPlaner_tpl.h.
References _recurArgMaxCopy_(), gum::DiscreteVariable::domainSize(), gum::HashTable< Key, Val >::exists(), gum::HashTable< Key, Val >::insert(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::isTerminalNode(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::node(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::nodeValue(), gum::InternalNode::nodeVar(), SOA_ALLOCATE, and gum::InternalNode::son().
Referenced by _recurArgMaxCopy_(), and makeArgMax_().
_recurExtractOptPol_() [private]
Recursive part of extractOptimalPolicy_.
Definition at line 576 of file structuredPlaner_tpl.h.
References _recurExtractOptPol_(), _transferActionIds_(), gum::DiscreteVariable::domainSize(), gum::HashTable< Key, Val >::exists(), gum::HashTable< Key, Val >::insert(), gum::InternalNode::nodeVar(), optimalPolicy_, SOA_ALLOCATE, and gum::InternalNode::son().
Referenced by _recurExtractOptPol_(), and extractOptimalPolicy_().
_transferActionIds_() [private]
Extract from an ArgMaxSet the associated ActionSet.
Definition at line 604 of file structuredPlaner_tpl.h.
References gum::ArgMaxSet< GUM_SCALAR_VAL, GUM_SCALAR_SEQ >::beginSafe(), and gum::ArgMaxSet< GUM_SCALAR_VAL, GUM_SCALAR_SEQ >::endSafe().
Referenced by _recurExtractOptPol_().
addReward_() [protected, virtual]
Performs R(s) + gamma . function.
Definition at line 408 of file structuredPlaner_tpl.h.
References gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::copyAndMultiplyByScalar(), discountFactor_, fmdp_, operator_, and RECAST.
Referenced by evalPolicy_().
argmaximiseQactions_() [protected, virtual]
Performs argmax_a Q(s,a).
Definition at line 529 of file structuredPlaner_tpl.h.
References operator_.
Referenced by evalPolicy_().
evalPolicy_() [protected, virtual]
Performs the tasks required to extract an optimal policy.
Reimplemented in gum::AdaptiveRMaxPlaner.
Definition at line 435 of file structuredPlaner_tpl.h.
References addReward_(), argmaximiseQactions_(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::copyAndReassign(), evalQaction_(), extractOptimalPolicy_(), fmdp_, makeArgMax_(), operator_, and vFunction_.
evalQaction_() [protected, virtual]
Performs the P(s'|s,a).V^{t-1}(s') part of the value iteration.
Definition at line 357 of file structuredPlaner_tpl.h.
References elVarSeq_, fmdp_, and operator_.
Referenced by evalPolicy_().
extractOptimalPolicy_() [protected]
Extracts pi*(s) = argmax_a Q*(s,a); this mainly consists in extracting, from each ArgMaxSet present at the leaves, the associated ActionSet.
Definition at line 552 of file structuredPlaner_tpl.h.
References _recurExtractOptPol_(), and optimalPolicy_.
Referenced by evalPolicy_().
fmdp() [inline]
Returns a const pointer to the Factored Markov Decision Process on which we are planning.
Definition at line 148 of file structuredPlaner.h.
References fmdp_.
Referenced by initialize().
initialize() [virtual]
Initializes the data structures needed for planning.
Implements gum::IPlanningStrategy< GUM_SCALAR >.
Definition at line 243 of file structuredPlaner_tpl.h.
References _threshold_, discountFactor_, fmdp(), and fmdp_.
Referenced by gum::AdaptiveRMaxPlaner::initialize().
initVFunction_() [protected, virtual]
Initializes the value function.
Reimplemented in gum::AdaptiveRMaxPlaner.
Definition at line 308 of file structuredPlaner_tpl.h.
References fmdp_, RECAST, and vFunction_.
Referenced by makePlanning().
makeArgMax_() [protected]
Creates a copy of the given Qaction that can be exploited by an argmax.
Hence, this step consists in replacing each leaf by an ArgMaxSet containing the value of the leaf and the actionId of the Qaction.
Qaction: the function graph we want to transform
actionId: the action id associated to that graph
Definition at line 474 of file structuredPlaner_tpl.h.
References _recurArgMaxCopy_(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::add(), gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::manager(), operator_, gum::MultiDimFunctionGraph< GUM_SCALAR, TerminalNodePolicy >::root(), and gum::MultiDimImplementation< GUM_SCALAR >::variablesSequence().
Referenced by evalPolicy_().
makePlanning() [virtual]
Performs a value iteration.
nbStep: specifies the maximum number of value iterations to perform; makePlanning stops either when the optimal value function is reached or when nbStep iterations have been performed.
Implements gum::IPlanningStrategy< GUM_SCALAR >.
Reimplemented in gum::AdaptiveRMaxPlaner.
Definition at line 263 of file structuredPlaner_tpl.h.
References _firstTime_, and initVFunction_().
Referenced by gum::AdaptiveRMaxPlaner::makePlanning().
maximiseQactions_() [protected, virtual]
Performs max_a Q(s,a).
Definition at line 371 of file structuredPlaner_tpl.h.
References operator_.
minimiseFunctions_() [protected, virtual]
Performs min_i F_i.
Definition at line 389 of file structuredPlaner_tpl.h.
References operator_.
optimalPolicy() [inline, virtual]
Returns the best policy obtained so far.
Implements gum::IPlanningStrategy< GUM_SCALAR >.
Definition at line 163 of file structuredPlaner.h.
References optimalPolicy_.
optimalPolicy2String() [virtual]
Provides a better toDot for the optimal policy, where the leaves show action names instead of action ids.
Implements gum::IPlanningStrategy< GUM_SCALAR >.
Definition at line 124 of file structuredPlaner_tpl.h.
References optimalPolicy_.
optimalPolicySize() [inline, virtual]
Returns the current size of the optimal policy computed so far.
Implements gum::IPlanningStrategy< GUM_SCALAR >.
Definition at line 170 of file structuredPlaner.h.
References optimalPolicy_.
spumddInstance() [inline, static]
Definition at line 92 of file structuredPlaner.h.
References StructuredPlaner().
Referenced by gum::SDYNA::RandomMDDInstance(), and gum::SDYNA::spimddiInstance().
sviInstance() [inline, static]
Definition at line 104 of file structuredPlaner.h.
References StructuredPlaner().
Referenced by gum::SDYNA::RandomTreeInstance(), and gum::SDYNA::spitiInstance().
valueIteration_() [protected, virtual]
Performs a single step of value iteration.
Reimplemented in gum::AdaptiveRMaxPlaner.
Definition at line 325 of file structuredPlaner_tpl.h.
References operator_.
vFunction() [inline]
Returns a const pointer to the value function computed so far.
Definition at line 153 of file structuredPlaner.h.
References vFunction_.
vFunctionSize() [inline, virtual]
Returns the current size of the value function computed so far.
Implements gum::IPlanningStrategy< GUM_SCALAR >.
Definition at line 158 of file structuredPlaner.h.
References vFunction_.
_firstTime_ [private]
Definition at line 382 of file structuredPlaner.h.
Referenced by makePlanning().
_threshold_ [private]
The threshold value: whenever |V^{n} - V^{n+1}| < threshold, we consider that V ~ V*.
Definition at line 381 of file structuredPlaner.h.
Referenced by StructuredPlaner(), and initialize().
discountFactor_ [protected]
Discount Factor used for infinite horizon planning.
Definition at line 365 of file structuredPlaner.h.
Referenced by StructuredPlaner(), addReward_(), and initialize().
elVarSeq_ [protected]
A set used to eliminate primed variables.
Definition at line 360 of file structuredPlaner.h.
Referenced by evalQaction_().
fmdp_ [protected]
The Factored Markov Decision Process describing our planning situation (NB: it must have function graphs as transition and reward functions).
Definition at line 340 of file structuredPlaner.h.
Referenced by addReward_(), evalPolicy_(), evalQaction_(), fmdp(), initialize(), and initVFunction_().
operator_ [protected]
Definition at line 367 of file structuredPlaner.h.
Referenced by StructuredPlaner(), ~StructuredPlaner(), addReward_(), argmaximiseQactions_(), evalPolicy_(), evalQaction_(), makeArgMax_(), maximiseQactions_(), minimiseFunctions_(), and valueIteration_().
optimalPolicy_ [protected]
The associated optimal policy.
Definition at line 355 of file structuredPlaner.h.
Referenced by StructuredPlaner(), ~StructuredPlaner(), _recurExtractOptPol_(), extractOptimalPolicy_(), optimalPolicy(), optimalPolicy2String(), and optimalPolicySize().
verbose_ [protected]
Boolean indicating whether iteration information should be displayed on the terminal.
Definition at line 373 of file structuredPlaner.h.
Referenced by StructuredPlaner().
vFunction_ [protected]
The Value Function computed iteratively.
Definition at line 345 of file structuredPlaner.h.
Referenced by StructuredPlaner(), ~StructuredPlaner(), evalPolicy_(), initVFunction_(), vFunction(), and vFunctionSize().