Course details: OROC-OP-DP

Markov decision processes: dynamic programming and applications

This page is mainly intended for instructors and gathers all the information needed to manage an ENSTA course.

Course identity

Code: OROC-OP-DP
French title: Markov decision processes: dynamic programming and applications
English title: Markov decision processes: dynamic programming and applications
Meta info: last modified: 30/08/2016   by: pcarpent   No. of visitors: 468   year: 3A      period: 1      Attachment/module: A      ECTS: 1.5      type: unknown
open: Yes     editing allowed: Yes     automatic email to course leader: Yes     to be evaluated: Yes     online: No
ParisTech domain: 1,1b

Teaching team

Coordinator (login):
Lead professor:
Participating professors: Marianne AKIAN, Jean-Philippe CHANCELIER
Assistant professors:

Content

Objectives:

Joint course between ENSTA and M2 Optimization (Paris-Saclay University)

The course will be given in English.

The aim of this course is to introduce different stochastic control models and to present dynamic programming as a tool for solving them. Illustrations will be drawn from stock management, portfolio selection, yield management, transportation, and Web PageRank optimisation.

We shall essentially consider stochastic dynamical systems with discrete time and finite state space, that is, finite Markov chains. This framework already contains the essential difficulties (for instance for long-term problems), and allows one to give, at the same time, insight into algorithms, mathematical techniques, and qualitative properties. We may, however, consider some examples with infinite state space or continuous time.

We shall present the different types of stochastic control problems: complete and incomplete observation problems, criteria with finite horizon, discounted infinite horizon, stopping time, ergodic criteria, risk-sensitive criteria, constrained problems, multi-armed bandit problems. For some of these criteria, we shall state the corresponding dynamic programming equations, study their qualitative properties and the algorithms for solving them (value iteration, policy iteration, linear programming), and deduce in some cases the structure of optimal strategies.
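As a minimal illustration of the value iteration scheme mentioned above (not part of the course materials), the sketch below iterates the dynamic programming operator of a discounted finite MDP to its fixed point; the two-state, two-action data are a hypothetical toy example:

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions.
# P[a][s, s'] = transition probability, r[a][s] = expected reward.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.5, 0.5], [0.1, 0.9]])]
r = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
gamma = 0.9  # discount factor, < 1 so the operator is a contraction

def value_iteration(P, r, gamma, tol=1e-10):
    """Iterate the Bellman operator T v = max_a (r_a + gamma P_a v)."""
    v = np.zeros(P[0].shape[0])
    while True:
        q = np.array([r[a] + gamma * P[a] @ v for a in range(len(P))])
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            # Greedy policy w.r.t. the (near-)fixed point.
            return v_new, q.argmax(axis=0)
        v = v_new

v, policy = value_iteration(P, r, gamma)
```

Convergence here follows from the sup-norm contraction of the Bellman operator, one of the properties studied in the lectures.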

Lectures 1-7 are common to the ENSTA course and the M2 course, and are necessary to validate both. Lectures 8-12 are necessary to validate the M2 course; ENSTA students may attend these lectures.

ENSTA students not enrolled in a double Master's program are strongly encouraged to follow the entire course (ENSTA part + M2 part).

Keywords: Markov decision processes, stochastic control, ergodic control, risk-sensitive control, dynamic programming, max-plus algebra, value iteration, policy iteration.
Course materials:
Bibliography:

D.P. Bertsekas, Dynamic Programming, Prentice Hall Inc., 1987.

E.B. Dynkin and A.A. Yushkevich, Controlled Markov Processes, Springer-Verlag, New York, 1979.

M.L. Puterman, Markov Decision Processes, John Wiley Inc., New York, 1994.
Assessment: final exam + practical work.

Special needs and remarks

Resources:
Comments:

Sessions

Tue. 20 Sept. 2016   - 09:00 to 12:15: Module block (half day) (MOD)
Program: Lecture 1. Dynamic programming equation of deterministic control problems. Markov chains: Fokker-Planck and Kolmogorov equations, graph and communication classes. Problems with additive or discounted criteria, with constant or non-constant discount factors, and with finite horizon.
Needs: Video
Instructors: Jean-Philippe CHANCELIER
Tue. 20 Sept. 2016   - 14:00 to 16:00: Module block (half day) (MOD)
Program: Lecture 2. Markov chains: stopping times, strong Markov property. Problems with additive or discounted criteria, and stopping time or infinite horizon. Equivalence between criteria with discount factors and criteria with stopping times. Homogeneous Markov chains: forward Kolmogorov equation, reduction to a sequence of i.i.d. random variables.
Needs: Video
Instructors: Jean-Philippe CHANCELIER
Tue. 27 Sept. 2016   - 09:00 to 12:15: Module block (half day) (MOD)
Program: Lecture 3. Introduction to Markov decision processes (MDPs) with complete state observation. Problems with additive or discounted criteria and finite horizon, with or without an exit time. Markovian strategies, feedback policies given by the dynamic programming equation. Properties of the dynamic programming operator (monotonicity, homogeneity, sup-norm nonexpansivity, convexity).
Needs: Video
Instructors: Jean-Philippe CHANCELIER
Tue. 27 Sept. 2016   - 14:00 to 16:00: Module block (half day) (MOD)
Program: Lecture 4. MDPs with infinite horizon. Contraction of the dynamic programming operator. Application to the dynamic programming equation and to the convergence of the value iteration and policy iteration algorithms.
Needs: Video
Instructors: Jean-Philippe CHANCELIER
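The policy iteration algorithm of Lecture 4 alternates exact policy evaluation with greedy improvement. A minimal sketch (not part of the course materials; the two-state MDP data are hypothetical):

```python
import numpy as np

# Hypothetical toy discounted MDP: 2 states, 2 actions.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.5, 0.5], [0.1, 0.9]])]
r = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
gamma = 0.9
n = 2  # number of states

def policy_iteration(P, r, gamma):
    """Alternate exact policy evaluation and greedy policy improvement."""
    policy = np.zeros(n, dtype=int)
    while True:
        # Evaluation: solve the linear system (I - gamma P_pi) v = r_pi.
        P_pi = np.array([P[policy[s]][s] for s in range(n)])
        r_pi = np.array([r[policy[s]][s] for s in range(n)])
        v = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
        # Improvement: act greedily with respect to v.
        q = np.array([r[a] + gamma * P[a] @ v for a in range(len(P))])
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return v, policy  # policy is greedy w.r.t. its own value
        policy = new_policy

v, policy = policy_iteration(P, r, gamma)
```

Since there are finitely many policies and each iteration does not decrease the value, the loop terminates at an optimal policy; the returned v then solves the Bellman optimality equation.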
Tue. 04 Oct. 2016   - 09:00 to 12:15: Module block (half day) (MOD)
Program: Lecture 5. Long-term behaviour of Markov chains: recurrence, invariant measure, ergodic theorem. Evaluation of mean-payoff/ergodic criteria or long-term risk-sensitive criteria.
Needs: Video
Instructors: Jean-Philippe CHANCELIER
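The invariant measure of Lecture 5 can be computed for a finite chain by solving pi P = pi with the normalization sum(pi) = 1. A sketch on a hypothetical irreducible three-state chain (not part of the course materials):

```python
import numpy as np

# Hypothetical irreducible 3-state birth-death chain.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])

def invariant_measure(P):
    """Solve pi P = pi, sum(pi) = 1 as an (overdetermined) linear system."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])  # stationarity + normalization
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

pi = invariant_measure(P)  # for this chain, pi = [0.25, 0.5, 0.25]
```

For an irreducible chain the system is consistent with a unique solution, so the least-squares solve recovers it exactly; the ergodic theorem then identifies pi-averages with long-run time averages.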
Tue. 04 Oct. 2016   - 14:00 to 16:00: Tutorial in computer room (TD)
Program: Lecture 6. Practical work on PageRank optimization. This practical work will informally use results that will be developed in later lectures.
Needs: Video
Instructors: Jean-Philippe CHANCELIER
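The PageRank vector underlying the practical work is the invariant measure of the "Google" chain, and is classically computed by power iteration. A sketch (not from the course materials; the three-page web graph is hypothetical):

```python
import numpy as np

# Hypothetical web graph: page -> list of outgoing links.
links = {0: [1, 2], 1: [2], 2: [0]}
n, d = 3, 0.85  # number of pages, damping factor

# Column-stochastic link matrix: A[dest, src] = 1 / outdegree(src).
A = np.zeros((n, n))
for page, outs in links.items():
    for dest in outs:
        A[dest, page] = 1.0 / len(outs)

def pagerank(A, d, tol=1e-12):
    """Power iteration on the Google matrix d*A + (1-d)/n * ones."""
    rank = np.full(n, 1.0 / n)
    while True:
        new = d * A @ rank + (1 - d) / n
        if np.max(np.abs(new - rank)) < tol:
            return new
        rank = new

rank = pagerank(A, d)
```

The damping factor d < 1 makes the iteration map a contraction, so the power iteration converges geometrically to the unique stationary distribution.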
Tue. 11 Oct. 2016   - 09:00 to 12:15: Module block (half day) (MOD)
Program: Lecture 7. Modelling and solution of several problems among stock management, machine maintenance, portfolio selection, yield management, transportation, and PageRank optimisation. The aim of this lecture is to illustrate the techniques of the previous lectures, and also to motivate the introduction of new ones, which will be partly studied in the following lectures. End of the ENSTA program.
Needs: Video
Instructors: Marianne AKIAN
Tue. 11 Oct. 2016   - 14:00 to 16:00: Module block (half day) (MOD)
Program: Lecture 8. MDPs with mean-payoff/ergodic criteria or long-term risk-sensitive criteria: ergodic dynamic programming equation, vanishing discount approach. The irreducible or unichain case: existence and uniqueness of a solution, value and policy iterations.
Needs: Video
Instructors: Marianne AKIAN
Tue. 18 Oct. 2016   - 09:00 to 12:15: Module block (half day) (MOD)
Program: Lecture 9. Constrained MDPs. Formulation in terms of occupation measures and linear programming. Relaxed strategies.
Needs: Video
Instructors: Marianne AKIAN
Tue. 18 Oct. 2016   - 14:00 to 16:00: Module block (half day) (MOD)
Program: Lecture 10. Introduction to multichain mean-payoff MDPs and/or to zero-sum games associated with risk-sensitive criteria, and/or max-plus algebra for deterministic control problems.
Needs: Video
Instructors: Marianne AKIAN
Tue. 08 Nov. 2016   - 09:00 to 12:15: Module block (half day) (MOD)
Program: Lecture 11. Problems with partial observation: reduction to an MDP with an infinite state space. The dynamic programming equation.
Needs: Video
Instructors: Marianne AKIAN
Tue. 08 Nov. 2016   - 14:00 to 16:00: Module block (half day) (MOD)
Program: Lecture 12. Gittins index for multi-armed bandits.
Needs: Video
Instructors: Marianne AKIAN
Tue. 15 Nov. 2016   - 13:30 to 16:30: Exam (CC)
Program: Exam
Needs: Video
Instructors: Marianne AKIAN, Jean-Philippe CHANCELIER