The general SDyna architecture abstract class.

#include <agrum/FMDP/SDyna/sdyna.h>

Public Member Functions
std::string	toString ()
	Returns a string describing the learned FMDP and the associated optimal policy, both in DOT language.
std::string	optimalPolicy2String ()
Problem specification methods
void	addAction (const Idx actionId, std::string_view actionName)
	Inserts a new action in the SDyna instance.
void	addVariable (const DiscreteVariable *var)
	Inserts a new variable in the SDyna instance.
Initialization
void	initialize ()
	Initializes the Sdyna instance.
void	initialize (const Instantiation &initialState)
	Initializes the Sdyna instance at given state.
Incremental methods
void	setCurrentState (const Instantiation &currentState)
	Sets last state visited to the given state.
Idx	takeAction (const Instantiation &curState)
Idx	takeAction ()
void	feedback (const Instantiation &originalState, const Instantiation &reachedState, Idx performedAction, double obtainedReward)
	Performs a feedback on the last transition.
void	feedback (const Instantiation &reachedState, double obtainedReward)
	Performs a feedback on the last transition.
void	makePlanning (Idx nbStep)
	Starts a new planning.
Size methods
These methods return the sizes of the different data structures, for performance evaluation purposes only.
Size	learnerSize ()
	Returns the size of the learner.
Size	modelSize ()
	Returns the size of the learned FMDP.
Size	valueFunctionSize ()
	Returns the size of the value function.
Size	optimalPolicySize ()
	Returns the size of the optimal policy.

Static Public Member Functions

static SDYNA *	spitiInstance (double attributeSelectionThreshold=0.99, double discountFactor=0.9, double epsilon=1, Idx observationPhaseLenght=100, Idx nbValueIterationStep=10)
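	Factory: FMDPLearner< CHI2TEST, CHI2TEST, ITILEARNER > learner, sviInstance planner, E_GreedyDecider.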
static SDYNA *	spimddiInstance (double attributeSelectionThreshold=0.99, double similarityThreshold=0.3, double discountFactor=0.9, double epsilon=1, Idx observationPhaseLenght=100, Idx nbValueIterationStep=10)
	Factory: FMDPLearner< GTEST, GTEST, IMDDILEARNER > learner, spumddInstance planner, E_GreedyDecider.
static SDYNA *	RMaxMDDInstance (double attributeSelectionThreshold=0.99, double similarityThreshold=0.3, double discountFactor=0.9, double epsilon=1, Idx observationPhaseLenght=100, Idx nbValueIterationStep=10)
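	Factory: FMDPLearner< GTEST, GTEST, IMDDILEARNER > learner, AdaptiveRMaxPlaner (ReducedAndOrderedInstance) as both planner and decider.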
static SDYNA *	RMaxTreeInstance (double attributeSelectionThreshold=0.99, double discountFactor=0.9, double epsilon=1, Idx observationPhaseLenght=100, Idx nbValueIterationStep=10)
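	Factory: FMDPLearner< GTEST, GTEST, ITILEARNER > learner, AdaptiveRMaxPlaner (TreeInstance) as both planner and decider.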
static SDYNA *	RandomMDDInstance (double attributeSelectionThreshold=0.99, double similarityThreshold=0.3, double discountFactor=0.9, double epsilon=1, Idx observationPhaseLenght=100, Idx nbValueIterationStep=10)
	Factory: FMDPLearner< GTEST, GTEST, IMDDILEARNER > learner, spumddInstance planner, RandomDecider.
static SDYNA *	RandomTreeInstance (double attributeSelectionThreshold=0.99, double discountFactor=0.9, double epsilon=1, Idx observationPhaseLenght=100, Idx nbValueIterationStep=10)
	Factory: FMDPLearner< CHI2TEST, CHI2TEST, ITILEARNER > learner, sviInstance planner, RandomDecider.

Protected Attributes
FMDP< double > *	fmdp_
	The learnt Markovian Decision Process.
Instantiation	lastState_
	The state in which the system is before we perform a new action.

Private Attributes
ILearningStrategy *	_learner_
	The learner used to learn the FMDP.
IPlanningStrategy< double > *	_planer_
	The planner used to compute an optimal strategy.
IDecisionStrategy *	_decider_
	The decider.
Idx	_observationPhaseLenght_
	The number of observations made before the planner is run again.
Idx	_nbObservation_
	The total number of observations made so far.
Idx	_nbValueIterationStep_
	The number of value iteration steps performed.
Idx	_lastAction_
	The last performed action.
Set< Observation * >	_bin_
	SDYNA creates these observations, so it must delete them on destruction.
bool	_actionReward_
bool	verbose_

Constructor & destructor.
	SDYNA (ILearningStrategy *learner, IPlanningStrategy< double > *planer, IDecisionStrategy *decider, Idx observationPhaseLenght, Idx nbValueIterationStep, bool actionReward, bool verbose=true)
	Constructor.
	~SDYNA ()
	Destructor.

Detailed Description

The general SDyna architecture abstract class.

Instances of the SDyna architecture should inherit from this class.

Definition at line 77 of file sdyna.h.

Constructor & Destructor Documentation

◆ SDYNA()

gum::SDYNA::SDYNA	(	ILearningStrategy *	learner,
		IPlanningStrategy< double > *	planer,
		IDecisionStrategy *	decider,
		Idx	observationPhaseLenght,
		Idx	nbValueIterationStep,
		bool	actionReward,
		bool	verbose = true )

private

Constructor.

Returns: an instance of SDyna architecture

Definition at line 81 of file sdyna.cpp.

                                                     :
      _learner_(learner), _planer_(planer), _decider_(decider),
      _observationPhaseLenght_(observationPhaseLenght),
      _nbValueIterationStep_(nbValueIterationStep), _actionReward_(actionReward),
      verbose_(verbose) {
    GUM_CONSTRUCTOR(SDYNA);
 
    fmdp_ = new FMDP< double >();
 
    _nbObservation_ = 1;
  }

References SDYNA(), _actionReward_, _decider_, _learner_, _nbObservation_, _nbValueIterationStep_, _observationPhaseLenght_, _planer_, fmdp_, and verbose_.

Referenced by SDYNA(), ~SDYNA(), RandomMDDInstance(), RandomTreeInstance(), RMaxMDDInstance(), RMaxTreeInstance(), spimddiInstance(), and spitiInstance().

◆ ~SDYNA()

gum::SDYNA::~SDYNA ( )

Destructor.

Definition at line 102 of file sdyna.cpp.

                {
    delete _decider_;
 
    delete _learner_;
 
    delete _planer_;
 
    for (auto obsIter = _bin_.beginSafe(); obsIter != _bin_.endSafe(); ++obsIter)
      delete *obsIter;
 
    delete fmdp_;
 
    GUM_DESTRUCTOR(SDYNA);
  }

References SDYNA(), _bin_, _decider_, _learner_, _planer_, and fmdp_.

Member Function Documentation

◆ addAction()

INLINE void gum::SDYNA::addAction	(	const Idx	actionId,
		std::string_view	actionName )

Inserts a new action in the SDyna instance.

Warning: Has no effect until initialize() is called.

Parameters

actionId	: an id to identify the action
actionName	: its human-readable name

Definition at line 144 of file sdyna_inl.h.

                                                                            {
    fmdp_->addAction(actionId, std::string(actionName));
  }

References fmdp_.

◆ addVariable()

INLINE void gum::SDYNA::addVariable ( const DiscreteVariable * var )

Inserts a new variable in the SDyna instance.

Warning: Has no effect until initialize() is called.

Parameters

var	: the variable to add. The variable need not have all its modalities specified; any missing modalities are discovered by the SDyna architecture during the process.

Definition at line 148 of file sdyna_inl.h.

{ fmdp_->addVariable(var); }

References fmdp_.

◆ feedback() [1/2]

void gum::SDYNA::feedback	(	const Instantiation &	originalState,
		const Instantiation &	reachedState,
		Idx	performedAction,
		double	obtainedReward )

Performs a feedback on the last transition.

In other words, learns from the transition.

Parameters

originalState	: the state we were in before the transition
reachedState	: the state we reached after
performedAction	: the action we performed
obtainedReward	: the reward we obtained

Definition at line 153 of file sdyna.cpp.

                                                    {
    _lastAction_ = lastAction;
    lastState_   = prevState;
    feedback(curState, reward);
  }

References _lastAction_, feedback(), and lastState_.

Referenced by feedback().
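
The relationship between the two feedback() overloads can be illustrated with a toy model. The sketch below is not aGrUM code: ToyAgent, Transition and the int-valued states and actions are hypothetical stand-ins (SDYNA uses Instantiation and Idx), and the fixed action returned by takeAction() is a placeholder policy. Assuming the documented parameter order, it only shows how the four-argument form overwrites the cached state and action before delegating to the two-argument form.

```cpp
#include <cassert>
#include <vector>

// Toy model of the caching behaviour. States and actions are plain ints
// here (an assumption for brevity); this is not the aGrUM implementation.
struct Transition {
  int    from;
  int    action;
  int    to;
  double reward;
};

class ToyAgent {
 public:
  void setCurrentState(int state) { lastState_ = state; }

  // Placeholder policy (always action 0); the point is that the choice is cached.
  int takeAction() {
    lastAction_ = 0;
    return lastAction_;
  }

  // Two-argument form: relies on the cached state and action.
  void feedback(int reachedState, double reward) {
    log_.push_back({lastState_, lastAction_, reachedState, reward});
    lastState_ = reachedState;  // the reached state becomes the current one
  }

  // Four-argument form: overwrites the cache, then delegates.
  void feedback(int originalState, int reachedState, int performedAction, double reward) {
    lastAction_ = performedAction;
    lastState_  = originalState;
    feedback(reachedState, reward);
  }

  const std::vector<Transition>& log() const { return log_; }

 private:
  int                     lastState_  = 0;
  int                     lastAction_ = 0;
  std::vector<Transition> log_;
};
```

After setCurrentState(1), takeAction() and feedback(2, 0.5), the log holds the transition from state 1 to state 2 under the cached action. A later feedback(7, 8, 3, 1.0) records a transition from 7 to 8 under action 3, regardless of what was cached, and leaves 8 and 3 in the cache for the next two-argument call.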

◆ feedback() [2/2]

void gum::SDYNA::feedback	(	const Instantiation &	reachedState,
		double	obtainedReward )

Performs a feedback on the last transition.

In other words, learns from the transition.

Parameters

reachedState	: the state reached after the transition
obtainedReward	: the reward obtained during the transition

Warning: Uses the originalState and performedAction stored in cache. To specify the original state and the performed action explicitly, use the four-argument overload of feedback().

Definition at line 173 of file sdyna.cpp.

                                                                   {
    Observation* obs = new Observation();
 
    for (auto varIter = lastState_.variablesSequence().beginSafe();
         varIter != lastState_.variablesSequence().endSafe();
         ++varIter)
      obs->setModality(*varIter, lastState_.val(**varIter));
 
    for (auto varIter = newState.variablesSequence().beginSafe();
         varIter != newState.variablesSequence().endSafe();
         ++varIter) {
      obs->setModality(fmdp_->main2prime(*varIter), newState.val(**varIter));
 
      if (this->_actionReward_) obs->setRModality(*varIter, lastState_.val(**varIter));
      else obs->setRModality(*varIter, newState.val(**varIter));
    }
 
    obs->setReward(reward);
 
    _learner_->addObservation(_lastAction_, obs);
    _bin_.insert(obs);
 
    setCurrentState(newState);
    _decider_->checkState(lastState_, _lastAction_);
 
    if (_nbObservation_ % _observationPhaseLenght_ == 0) makePlanning(_nbValueIterationStep_);
 
    _nbObservation_++;
  }

References _actionReward_, _bin_, _decider_, _lastAction_, _learner_, _nbObservation_, _nbValueIterationStep_, _observationPhaseLenght_, fmdp_, lastState_, makePlanning(), setCurrentState(), gum::Observation::setModality(), gum::Observation::setReward(), gum::Observation::setRModality(), gum::Instantiation::val(), and gum::Instantiation::variablesSequence().
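
The planning trigger at the end of this function depends only on the observation counter. The following sketch isolates that logic in a hypothetical PlanningSchedule struct; the struct, the planningCalls helper and the spelling observationPhaseLength are illustrative names, not part of aGrUM. It assumes, as in the constructor shown earlier, that the counter starts at 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the counter logic of the two-argument feedback();
// not aGrUM code.
struct PlanningSchedule {
  std::size_t observationPhaseLength;  // plays the role of _observationPhaseLenght_
  std::size_t nbObservation;           // plays the role of _nbObservation_

  explicit PlanningSchedule(std::size_t phaseLength)
      : observationPhaseLength(phaseLength), nbObservation(1) {}

  // Returns true when this feedback call would trigger a planning step.
  bool onFeedback() {
    assert(observationPhaseLength != 0);  // x % 0 is undefined behaviour in C++
    const bool plan = (nbObservation % observationPhaseLength == 0);
    ++nbObservation;
    return plan;
  }
};

// Lists the 1-based feedback calls that trigger planning among the first nbCalls.
std::vector<std::size_t> planningCalls(std::size_t phaseLength, std::size_t nbCalls) {
  PlanningSchedule         schedule(phaseLength);
  std::vector<std::size_t> calls;
  for (std::size_t call = 1; call <= nbCalls; ++call)
    if (schedule.onFeedback()) calls.push_back(call);
  return calls;
}
```

With a phase length of 100, planningCalls(100, 250) yields the calls 100 and 200, so the first planning step happens on the 100th feedback. The assertion reflects that a phase length of 0 would make the modulo undefined, which suggests that observationPhaseLenght should be nonzero.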

◆ initialize() [1/2]

void gum::SDYNA::initialize ( )

Initializes the Sdyna instance.

Definition at line 121 of file sdyna.cpp.

                         {
    _learner_->initialize(fmdp_);
    _planer_->initialize(fmdp_);
    _decider_->initialize(fmdp_);
  }

References _decider_, _learner_, _planer_, and fmdp_.

Referenced by initialize().

◆ initialize() [2/2]

void gum::SDYNA::initialize ( const Instantiation & initialState )

Initializes the Sdyna instance at given state.

Parameters

initialState : the state of the studied system from which the explore, learn and exploit process begins

Definition at line 134 of file sdyna.cpp.

                                                          {
    initialize();
    setCurrentState(initialState);
  }

References initialize(), and setCurrentState().

◆ learnerSize()

INLINE Size gum::SDYNA::learnerSize ( )

Returns: the size of the learner, as reported by _learner_->size()

Definition at line 156 of file sdyna_inl.h.

{ return _learner_->size(); }

References _learner_.

◆ makePlanning()

void gum::SDYNA::makePlanning ( Idx nbStep )

Starts a new planning.

Parameters

nbStep : the maximum number of value iteration steps performed in this planning

Definition at line 210 of file sdyna.cpp.

                                                   {
    if (verbose_) std::cout << "Updating decision trees ..." << std::endl;
    _learner_->updateFMDP();
    // std::cout << << "Done" << std::endl;
 
    if (verbose_) std::cout << "Planning ..." << std::endl;
    _planer_->makePlanning(nbValueIterationStep);
    // std::cout << << "Done" << std::endl;
 
    _decider_->setOptimalStrategy(_planer_->optimalPolicy());
  }

References _decider_, _learner_, _planer_, and verbose_.

Referenced by feedback().

◆ modelSize()

INLINE Size gum::SDYNA::modelSize ( )

Returns: the size of the learned FMDP, as reported by fmdp_->size()

Definition at line 158 of file sdyna_inl.h.

{ return fmdp_->size(); }

References fmdp_.

◆ optimalPolicy2String()

INLINE std::string gum::SDYNA::optimalPolicy2String ( )

Definition at line 154 of file sdyna_inl.h.

{ return _planer_->optimalPolicy2String(); }

References _planer_.

◆ optimalPolicySize()

INLINE Size gum::SDYNA::optimalPolicySize ( )

Returns: the size of the optimal policy, as reported by _planer_->optimalPolicySize()

Definition at line 162 of file sdyna_inl.h.

{ return _planer_->optimalPolicySize(); }

References _planer_.

◆ RandomMDDInstance()

INLINE SDYNA * gum::SDYNA::RandomMDDInstance	(	double	attributeSelectionThreshold = 0.99,
		double	similarityThreshold = 0.3,
		double	discountFactor = 0.9,
		double	epsilon = 1,
		Idx	observationPhaseLenght = 100,
		Idx	nbValueIterationStep = 10 )

static

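Builds an SDYNA instance that combines an FMDPLearner< GTEST, GTEST, IMDDILEARNER > learner, a StructuredPlaner< double >::spumddInstance planner and a RandomDecider.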

Definition at line 112 of file sdyna_inl.h.

                                                                      {
    bool               actionReward = true;
    ILearningStrategy* ls
        = new FMDPLearner< GTEST, GTEST, IMDDILEARNER >(attributeSelectionThreshold,
                                                        actionReward,
                                                        similarityThreshold);
    IPlanningStrategy< double >* ps
        = StructuredPlaner< double >::spumddInstance(discountFactor, epsilon);
    IDecisionStrategy* ds = new RandomDecider();
    return new SDYNA(ls, ps, ds, observationPhaseLenght, nbValueIterationStep, actionReward);
  }

References SDYNA(), and gum::StructuredPlaner< GUM_ELEMENT >::spumddInstance().

◆ RandomTreeInstance()

INLINE SDYNA * gum::SDYNA::RandomTreeInstance	(	double	attributeSelectionThreshold = 0.99,
		double	discountFactor = 0.9,
		double	epsilon = 1,
		Idx	observationPhaseLenght = 100,
		Idx	nbValueIterationStep = 10 )

static

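Builds an SDYNA instance that combines an FMDPLearner< CHI2TEST, CHI2TEST, ITILEARNER > learner, a StructuredPlaner< double >::sviInstance planner and a RandomDecider.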

Definition at line 129 of file sdyna_inl.h.

                                                                       {
    bool               actionReward = true;
    ILearningStrategy* ls
        = new FMDPLearner< CHI2TEST, CHI2TEST, ITILEARNER >(attributeSelectionThreshold,
                                                            actionReward);
    IPlanningStrategy< double >* ps
        = StructuredPlaner< double >::sviInstance(discountFactor, epsilon);
    IDecisionStrategy* ds = new RandomDecider();
    return new SDYNA(ls, ps, ds, observationPhaseLenght, nbValueIterationStep, actionReward);
  }

References SDYNA(), and gum::StructuredPlaner< GUM_ELEMENT >::sviInstance().

◆ RMaxMDDInstance()

INLINE SDYNA * gum::SDYNA::RMaxMDDInstance	(	double	attributeSelectionThreshold = 0.99,
		double	similarityThreshold = 0.3,
		double	discountFactor = 0.9,
		double	epsilon = 1,
		Idx	observationPhaseLenght = 100,
		Idx	nbValueIterationStep = 10 )

static

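Builds an SDYNA instance that combines an FMDPLearner< GTEST, GTEST, IMDDILEARNER > learner with an AdaptiveRMaxPlaner (ReducedAndOrderedInstance) that serves as both planner and decider.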

Definition at line 80 of file sdyna_inl.h.

                                                                    {
    bool               actionReward = true;
    ILearningStrategy* ls
        = new FMDPLearner< GTEST, GTEST, IMDDILEARNER >(attributeSelectionThreshold,
                                                        actionReward,
                                                        similarityThreshold);
    AdaptiveRMaxPlaner* rm
        = AdaptiveRMaxPlaner::ReducedAndOrderedInstance(ls, discountFactor, epsilon);
    IPlanningStrategy< double >* ps = rm;
    IDecisionStrategy*           ds = rm;
    return new SDYNA(ls, ps, ds, observationPhaseLenght, nbValueIterationStep, actionReward);
  }

References SDYNA(), and gum::AdaptiveRMaxPlaner::ReducedAndOrderedInstance().

◆ RMaxTreeInstance()

INLINE SDYNA * gum::SDYNA::RMaxTreeInstance	(	double	attributeSelectionThreshold = 0.99,
		double	discountFactor = 0.9,
		double	epsilon = 1,
		Idx	observationPhaseLenght = 100,
		Idx	nbValueIterationStep = 10 )

static

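Builds an SDYNA instance that combines an FMDPLearner< GTEST, GTEST, ITILEARNER > learner with an AdaptiveRMaxPlaner (TreeInstance) that serves as both planner and decider.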

Definition at line 98 of file sdyna_inl.h.

                                                                     {
    bool               actionReward = true;
    ILearningStrategy* ls
        = new FMDPLearner< GTEST, GTEST, ITILEARNER >(attributeSelectionThreshold, actionReward);
    AdaptiveRMaxPlaner*          rm = AdaptiveRMaxPlaner::TreeInstance(ls, discountFactor, epsilon);
    IPlanningStrategy< double >* ps = rm;
    IDecisionStrategy*           ds = rm;
    return new SDYNA(ls, ps, ds, observationPhaseLenght, nbValueIterationStep, actionReward);
  }

References SDYNA(), and gum::AdaptiveRMaxPlaner::TreeInstance().

◆ setCurrentState()

INLINE void gum::SDYNA::setCurrentState ( const Instantiation & currentState )

Sets last state visited to the given state.

During the learning process, we will consider that we were in this state before the transition.

Parameters

currentState : the state

Definition at line 150 of file sdyna_inl.h.

                                                                      {
    lastState_ = currentState;
  }

References lastState_.

Referenced by feedback(), and initialize().

◆ spimddiInstance()

INLINE SDYNA * gum::SDYNA::spimddiInstance	(	double	attributeSelectionThreshold = 0.99,
		double	similarityThreshold = 0.3,
		double	discountFactor = 0.9,
		double	epsilon = 1,
		Idx	observationPhaseLenght = 100,
		Idx	nbValueIterationStep = 10 )

static

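Builds an SDYNA instance that combines an FMDPLearner< GTEST, GTEST, IMDDILEARNER > learner, a StructuredPlaner< double >::spumddInstance planner and an E_GreedyDecider, with verbose output disabled.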

Definition at line 63 of file sdyna_inl.h.

                                                                    {
    bool               actionReward = false;
    ILearningStrategy* ls
        = new FMDPLearner< GTEST, GTEST, IMDDILEARNER >(attributeSelectionThreshold,
                                                        actionReward,
                                                        similarityThreshold);
    IPlanningStrategy< double >* ps
        = StructuredPlaner< double >::spumddInstance(discountFactor, epsilon, false);
    IDecisionStrategy* ds = new E_GreedyDecider();
    return new SDYNA(ls, ps, ds, observationPhaseLenght, nbValueIterationStep, actionReward, false);
  }

References SDYNA(), and gum::StructuredPlaner< GUM_ELEMENT >::spumddInstance().

◆ spitiInstance()

INLINE SDYNA * gum::SDYNA::spitiInstance	(	double	attributeSelectionThreshold = 0.99,
		double	discountFactor = 0.9,
		double	epsilon = 1,
		Idx	observationPhaseLenght = 100,
		Idx	nbValueIterationStep = 10 )

static

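Builds an SDYNA instance that combines an FMDPLearner< CHI2TEST, CHI2TEST, ITILEARNER > learner, a StructuredPlaner< double >::sviInstance planner and an E_GreedyDecider.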

Definition at line 48 of file sdyna_inl.h.

                                                                  {
    bool               actionReward = false;
    ILearningStrategy* ls
        = new FMDPLearner< CHI2TEST, CHI2TEST, ITILEARNER >(attributeSelectionThreshold,
                                                            actionReward);
    IPlanningStrategy< double >* ps
        = StructuredPlaner< double >::sviInstance(discountFactor, epsilon);
    IDecisionStrategy* ds = new E_GreedyDecider();
    return new SDYNA(ls, ps, ds, observationPhaseLenght, nbValueIterationStep, actionReward);
  }

References SDYNA(), and gum::StructuredPlaner< GUM_ELEMENT >::sviInstance().

◆ takeAction() [1/2]

Idx gum::SDYNA::takeAction ( )

Returns: the id of the action the SDyna instance wishes to perform

Definition at line 238 of file sdyna.cpp.

                        {
    ActionSet actionSet = _decider_->stateOptimalPolicy(lastState_);
    if (actionSet.size() == 1) {
      _lastAction_ = actionSet[0];
    } else {
      Idx randy    = randomValue(actionSet.size());
      _lastAction_ = actionSet[randy == actionSet.size() ? 0 : randy];
    }
    return _lastAction_;
  }

References _decider_, _lastAction_, lastState_, gum::randomValue(), and gum::ActionSet::size().

Referenced by takeAction().
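
The selection logic above amounts to a random draw among the actions the decider considers optimal for the cached state. A minimal sketch of that idea follows. pickAction is a hypothetical helper, action sets are modelled as std::vector, and std::uniform_int_distribution replaces gum::randomValue(), whose exact range is not stated here (the original code guards against a result equal to the set size).

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Picks one action id from a non-empty set of equally good candidates.
// Illustrative only: uses the standard library instead of gum::randomValue().
std::size_t pickAction(const std::vector<std::size_t>& actionSet, std::mt19937& rng) {
  assert(!actionSet.empty());
  if (actionSet.size() == 1) return actionSet[0];  // no draw needed

  // Indices 0 .. size-1, each chosen with equal probability.
  std::uniform_int_distribution<std::size_t> dist(0, actionSet.size() - 1);
  return actionSet[dist(rng)];
}
```

Because the distribution's upper bound is inclusive and set to size() - 1, no out-of-range guard is needed in this version.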

◆ takeAction() [2/2]

Idx gum::SDYNA::takeAction ( const Instantiation & curState )

Returns: the id of the action the SDyna instance wishes to perform

Parameters

curState : the state in which we currently are

Definition at line 228 of file sdyna.cpp.

                                                     {
    lastState_ = curState;
    return takeAction();
  }

References lastState_, and takeAction().

◆ toString()

std::string gum::SDYNA::toString ( )

Returns: a string describing the learned FMDP and the associated optimal policy, both in DOT language.

Definition at line 252 of file sdyna.cpp.

                            {
    return fmdp_->toString() + '\n' + _planer_->optimalPolicy2String() + '\n';
  }

References _planer_, and fmdp_.

◆ valueFunctionSize()

INLINE Size gum::SDYNA::valueFunctionSize ( )

Returns: the size of the value function, as reported by _planer_->vFunctionSize()

Definition at line 160 of file sdyna_inl.h.

{ return _planer_->vFunctionSize(); }

References _planer_.

Member Data Documentation

◆ _actionReward_

bool gum::SDYNA::_actionReward_

private

Definition at line 392 of file sdyna.h.

Referenced by SDYNA(), and feedback().

◆ _bin_

Set< Observation* > gum::SDYNA::_bin_

private

SDYNA creates these observations, so it must delete them on destruction.

Definition at line 390 of file sdyna.h.

Referenced by ~SDYNA(), and feedback().

◆ _decider_

IDecisionStrategy* gum::SDYNA::_decider_

private

The decider.

Definition at line 374 of file sdyna.h.

Referenced by SDYNA(), ~SDYNA(), feedback(), initialize(), makePlanning(), and takeAction().

◆ _lastAction_

Idx gum::SDYNA::_lastAction_

private

The last performed action.

Definition at line 387 of file sdyna.h.

Referenced by feedback(), feedback(), and takeAction().

◆ _learner_

ILearningStrategy* gum::SDYNA::_learner_

private

The learner used to learn the FMDP.

Definition at line 368 of file sdyna.h.

Referenced by SDYNA(), ~SDYNA(), feedback(), initialize(), learnerSize(), and makePlanning().

◆ _nbObservation_

Idx gum::SDYNA::_nbObservation_

private

The total number of observations made so far.

Definition at line 381 of file sdyna.h.

Referenced by SDYNA(), and feedback().

◆ _nbValueIterationStep_

Idx gum::SDYNA::_nbValueIterationStep_

private

The number of value iteration steps performed.

Definition at line 384 of file sdyna.h.

Referenced by SDYNA(), and feedback().

◆ _observationPhaseLenght_

Idx gum::SDYNA::_observationPhaseLenght_

private

The number of observations made before the planner is run again.

Definition at line 378 of file sdyna.h.

Referenced by SDYNA(), and feedback().

◆ _planer_

IPlanningStrategy< double >* gum::SDYNA::_planer_

private

The planner used to compute an optimal strategy.

Definition at line 371 of file sdyna.h.

Referenced by SDYNA(), ~SDYNA(), initialize(), makePlanning(), optimalPolicy2String(), optimalPolicySize(), toString(), and valueFunctionSize().

◆ fmdp_

FMDP< double >* gum::SDYNA::fmdp_

protected

The learnt Markovian Decision Process.

Definition at line 361 of file sdyna.h.

Referenced by SDYNA(), ~SDYNA(), addAction(), addVariable(), feedback(), initialize(), modelSize(), and toString().

◆ lastState_

Instantiation gum::SDYNA::lastState_

protected

The state in which the system is before we perform a new action.

Definition at line 364 of file sdyna.h.

Referenced by feedback(), feedback(), setCurrentState(), takeAction(), and takeAction().

◆ verbose_

bool gum::SDYNA::verbose_

private

Definition at line 394 of file sdyna.h.

Referenced by SDYNA(), and makePlanning().

The documentation for this class was generated from the following files:

agrum/FMDP/SDyna/sdyna.h
agrum/FMDP/SDyna/sdyna.cpp
agrum/FMDP/SDyna/sdyna_inl.h
