Stochastic Games with Asymmetric Information

12 July, 2013 - 6 August, 2013 | Warsaw



Co-financed by: Warsaw Center of Mathematics and Computer Science




1 Introduction
Shapley introduced stochastic games in [1]. A large number of papers on the topic followed this pioneering work; a good survey of zero-sum stochastic games can be found in Vrieze [2]. Stochastic games generalize Markov decision processes, in the sense that the latter may be treated as one-player stochastic games. Most of the available literature in this area concerns stochastic games with complete state information, that is, games in which the state is completely known to the players at each stage. Though there is considerable literature on Markov decision processes with partial information (see, e.g., Bertsekas and Shreve [3], Dynkin and Yushkevich [4], Hernandez-Lerma [5]), the corresponding results on stochastic games seem to be rather sparse. A two-person zero-sum stochastic game with partial observation on general (uncountable) state and action spaces with discounted payoff was studied in Ghosh et al. [6], but a general theory of stochastic games with partial observation does not seem to be available in the literature. In this work, we propose to study the existence of the value and of saddle-point equilibria of a two-person zero-sum stochastic game in which one of the players (say, the minimizer) observes only a subspace of the total space of observations while the other (say, the maximizer) has full observation. Such an informational asymmetry is relevant in various business problems where one of the agents in the market has an informational advantage, for example when one is playing against the market maker. To the best of our knowledge, games under such complete information asymmetry have received little attention in the literature.
2 Aim
The objective of the research group is to study the Shapley equations associated with such games under a nonlinear filtering setup and to provide value-iteration based techniques for approximating the corresponding value functions. We wish to study such systems using ideas from stochastic games and stochastic filtering theory. Our methodology subsumes the theory of partially observed Markov decision problems, which have wide applicability, especially in Operations Research and Management Science (see, e.g., Monahan [7], White [8]). Our analysis is intended to yield computable schemes for the value function and thereby an algorithmic technique for computing saddle-point equilibrium strategies for the players. To the best of our knowledge, such computation schemes remain open problems and have received little attention in the literature.
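As a point of orientation only, the following minimal sketch shows value iteration on the Shapley equation in the simplest setting: a fully observed, discounted, zero-sum game with finitely many states and exactly two actions per player. It is not the scheme proposed for the filtered problem. The function names, the data layout (reward[s][a][b], trans[s][a][b][t]) and the restriction to 2x2 stage games, which lets the stage-game value be computed by the classical closed form instead of a linear program, are all assumptions made for illustration.

```python
def matrix_game_value_2x2(m):
    """Value of a 2x2 zero-sum matrix game; the row player maximizes."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))  # what the row player can guarantee with a pure action
    upper = min(max(a, c), max(b, d))  # what the column player can hold the payoff to
    if lower == upper:
        # A pure saddle point exists, so the value is that matrix entry.
        return lower
    # No pure saddle point: both players mix, and the classical closed form applies.
    return (a * d - b * c) / (a + d - b - c)


def shapley_value_iteration(reward, trans, beta, tol=1e-10, max_iter=10_000):
    """Iterate v <- T v, where (T v)(s) is the value of the auxiliary stage game
    with entries reward[s][a][b] + beta * sum_t trans[s][a][b][t] * v[t].

    Hypothetical layout (an assumption of this sketch):
      reward[s][a][b]   stage payoff to the maximizer in state s
      trans[s][a][b][t] probability of moving from state s to state t
      beta              discount factor in (0, 1)
    """
    n = len(reward)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = []
        for s in range(n):
            stage = [
                [
                    reward[s][a][b]
                    + beta * sum(trans[s][a][b][t] * v[t] for t in range(n))
                    for b in range(2)
                ]
                for a in range(2)
            ]
            new_v.append(matrix_game_value_2x2(stage))
        gap = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if gap < tol:  # T is a beta-contraction in the sup norm, so the gap shrinks geometrically
            break
    return v
```

With more than two actions per player, the closed form would have to be replaced by a linear program for the stage game; in the partially observed setting the state s would be replaced by the filter, which is no longer finite-valued.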
3 Scope
On the theoretical side, the scope of this research group includes interdisciplinary ideas from stochastic control and games, operator theory and functional analysis, and reinforcement learning. On the application side, the problem is relevant to pricing theory, supply chain cost optimization, and optimal portfolio selection under market incompleteness (the situation encountered in practice, as opposed to the idealized assumptions of the Black-Scholes-Merton model), to name a few.
4 Methodology
We study the associated filtering problem, which allows the partially observed problem to be transformed into an equivalent complete observation problem in which the nonlinear filter becomes the new state (the so-called "separated" problem; see, e.g., Bensoussan [9] in the context of continuous time). It may be interesting to compare the equivalence between completely observed games and partially observed games with the corresponding results for partially observed Markov decision processes (POMDP). The treatment of a POMDP is based on estimating the unobserved state using the available information. The conditional distribution of the state given the available information is then used as a basis for controlling the system under partial observation. In other words, one introduces a completely observed MDP (COMDP) model in which the conditional distribution of the state in the POMDP model, given the available information, constitutes the state. One can then show that the conditional distributions of the states given the available information constitute a statistic sufficient for control, as does the available information itself (see Chapter 10 in Bertsekas and Shreve [3]). In view of this, the COMDP model is often referred to as the "separated" control problem, and the policies in the COMDP model are referred to as "separated" policies. The "separation" is between estimation and control: the observation is used to estimate the unobserved state, and the state estimate is then used for control. The observation process, though available, is not used directly for control. It is explained in Bertsekas and Shreve [3] that the standard method of establishing the sufficiency of the conditional distributions for control does not extend to partially observed stochastic games. Hence, although the partially observed game is a generalization of the corresponding decision process, the methods employed for its analysis shall be significantly different.
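To make the POMDP-to-COMDP reduction described above concrete, the following minimal sketch gives the one-step filter (belief) update for a finite-state, single-controller POMDP: a prediction step through the controlled transition kernel followed by a Bayes correction with the new observation. The function name and the array layout (trans[a][s][t], obs_lik[t][y]) are assumptions made for illustration; the asymmetric-information game studied here needs a more delicate construction, since the two players hold different information.

```python
def belief_update(belief, trans, obs_lik, action, observation):
    """One step of the discrete nonlinear filter for a finite POMDP.

    Hypothetical layout (an assumption of this sketch):
      belief[s]        current conditional probability of state s
      trans[a][s][t]   probability of moving from s to t under action a
      obs_lik[t][y]    probability of observing y when the new state is t
    Returns the conditional distribution of the new state, which serves as
    the state of the completely observed (COMDP) model.
    """
    n = len(belief)
    # Prediction: push the current belief through the controlled transition kernel.
    predicted = [
        sum(belief[s] * trans[action][s][t] for s in range(n)) for t in range(n)
    ]
    # Correction: reweight by the likelihood of the new observation (Bayes' rule).
    unnormalized = [obs_lik[t][observation] * predicted[t] for t in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return [w / total for w in unnormalized]
```

A "separated" policy in this sketch would be any map from the returned belief vector to an action; the raw observation history is no longer needed once the belief has been updated.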

[1] Shapley, L.; 1953, Stochastic games, Proc. Natl. Acad. Sci. USA 39:1095-1100.
[2] Vrieze, K.; 1989, Zero-sum stochastic games: A survey, CWI Quarterly 2:147-170.
[3] Bertsekas, D. P. and S. E. Shreve; 1978, Stochastic Optimal Control: The Discrete Time Case, Athena Scientific, Nashua, NH, USA.
[4] Dynkin, E. B. and A. A. Yushkevich; 1975, Controlled Markov Processes, Academic Press.
[5] Hernandez-Lerma, O.; 1989, Adaptive Controlled Markov Processes, Springer-Verlag, Berlin.
[6] Ghosh, M. K., D. McDonald, and S. Sinha; 2004, Zero-sum stochastic games with partial information, J. Optim. Theory Appl.
[7] Monahan, G. E.; 1982, A survey of partially observable Markov decision processes: theory, models and algorithms, Management Sci. 28:1-16.
[8] White, D. J.; 1993, Markov Decision Processes, Wiley, New York.
[9] Bensoussan, A.; 1992, Stochastic Control of Partially Observable Systems, Cambridge University Press.