Lecture time:
March 10, 2022, 08:00-10:00 (London time, Thursday)
March 10, 2022, 16:00-18:00 (Beijing time, Thursday)
Tencent Meeting ID: 874-650-891
Speaker 1: Chengchun Shi (史成春)
Talk title: Statistical inference in reinforcement learning
Speaker biography:
Chengchun Shi is an Assistant Professor in the Department of Statistics at the London School of Economics and Political Science. He has more than ten first-authored, peer-reviewed papers accepted by the top statistics journals AOS, JRSSB and JASA, and has also published at the top machine learning conferences ICML and NeurIPS. Since 2022 he has served as an Associate Editor of JRSSB and the Journal of Nonparametric Statistics. His current research focuses on developing statistical learning methods for reinforcement learning and complex data. He received the 2021 Royal Statistical Society Research Prize and the IMS Travel Award in two consecutive years.
Abstract:
Reinforcement learning (RL) is concerned with how intelligent agents take actions in a given environment to maximize the cumulative reward they receive. In healthcare, applying RL algorithms could assist patients in improving their health status. In ride-sharing platforms, applying RL algorithms could increase drivers' income and customer satisfaction. RL has arguably been one of the most vibrant research frontiers in machine learning over the last few years. Nevertheless, statistics as a field, as opposed to computer science, has only recently begun to engage with reinforcement learning both in depth and in breadth. In today's talk, I will discuss some of my recent work on developing statistical inferential tools for reinforcement learning, with applications to mobile health and ride-sharing companies. The talk will cover several papers published in highly ranked statistical journals (JASA & JRSSB) and a top machine learning conference (ICML).
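For readers unfamiliar with the setting, the Python sketch below illustrates one basic ingredient of statistical inference for RL: estimating the value of a fixed policy from simulated trajectories and attaching a bootstrap confidence interval. It is not the speaker's methodology; the toy two-state MDP, the "always take action 1" policy, and all parameters are hypothetical choices for illustration only.

```python
# Illustrative sketch only (not the speaker's method): policy value estimation
# with a bootstrap confidence interval in a hypothetical 2-state, 2-action MDP.
import numpy as np

rng = np.random.default_rng(0)

def simulate_trajectory(policy, horizon=20, gamma=0.9):
    """Roll out one trajectory in the toy MDP and return its discounted return."""
    state, total, discount = 0, 0.0, 1.0
    for _ in range(horizon):
        action = policy(state)
        # hypothetical dynamics: action 1 is more likely to reach state 1, which pays more
        state = rng.binomial(1, 0.7 if action == 1 else 0.3)
        reward = rng.normal(loc=1.0 if state == 1 else 0.2, scale=0.5)
        total += discount * reward
        discount *= gamma
    return total

policy = lambda s: 1  # evaluate the policy that always takes action 1

returns = np.array([simulate_trajectory(policy) for _ in range(500)])
point_estimate = returns.mean()

# nonparametric bootstrap confidence interval for the policy value
boot = np.array([rng.choice(returns, size=returns.size, replace=True).mean()
                 for _ in range(2000)])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"estimated policy value: {point_estimate:.3f}, 95% bootstrap CI: [{lo:.3f}, {hi:.3f}]")
```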
Speaker 2: Xiaodong Yan (严晓东)
Talk title: Statistical inference for reinforcement learning from the perspective of two-armed bandit process
Speaker biography:
Xiaodong Yan is a Future Scholar of Shandong University and an Associate Research Fellow at the Institute for Financial Studies, Shandong University. He received his PhD through a joint training program between Yunnan University and The Hong Kong Polytechnic University, and completed postdoctoral research at the University of Alberta, Canada. He is a council member of the High-Dimensional Data Statistics Branch of the Chinese Association for Applied Statistics (中国现场统计研究会), Executive Deputy Secretary-General of the Shandong Provincial Committee for Big Data Program Development, Deputy Secretary-General of the Shandong Applied Statistics Association, and one of the first group of provincial consulting experts on policy-based agricultural insurance appointed by the Shandong Provincial Department of Finance. He has published nearly 20 papers in leading international journals such as AOS, JASA and JOE, as well as in high-quality journals such as IJOF, Statistica Sinica and JMA, and received the Yunnan Province Outstanding Doctoral Dissertation Award in 2020. He is currently the principal investigator of a National Natural Science Foundation of China grant as well as provincial natural science and social science foundation grants.
Abstract:
Motivated by the study of the asymptotic behaviour of the two-armed bandit problem, we obtain several nonlinear limit theorems related to the central limit theorem, whose limits are identified explicitly and depend heavily on the structure of the events or the integrated functions. This demonstrates the key signature of the nonlinear structure. It also lays the theoretical foundation for statistical inference in determining the arm that offers a higher chance of reward. This presentation also proposes a strategic sampling procedure to construct a treatment-effect test statistic and employs nonlinear limit theory to study its asymptotic behaviour, referred to as the strategic central limit theorem (strategic CLT). We also provide a strategic-sampling-based bootstrap to recover the limiting distribution of the developed statistic, making it applicable to observational datasets and scalable to other hypothesis tests. The theoretical results yield an explicit density function for the limiting distribution, known as the spike distribution, which is more sharply peaked than the standard normal density. Simulation studies provide supporting evidence that the proposed spike statistic performs well in finite samples and is especially powerful with small sample sizes. A real data example is provided for illustration.
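As a rough illustration of the flavour of this abstract (not the speaker's strategic sampling procedure or spike distribution), the Python sketch below adaptively allocates pulls between two arms and uses a design-aware bootstrap to calibrate a naive treatment-effect statistic. The epsilon-greedy allocation rule, the difference-of-means statistic, and all parameters are hypothetical.

```python
# Illustrative sketch only: adaptive (bandit-style) sampling of two arms, plus a
# bootstrap under the null design to calibrate a treatment-effect statistic.
import numpy as np

rng = np.random.default_rng(1)

def adaptive_sample(mu=(0.0, 0.0), n=400, eps=0.1):
    """Epsilon-greedy allocation: mostly pull the arm with the higher running mean."""
    arms, rewards = [], []
    sums, counts = np.zeros(2), np.zeros(2)
    for t in range(n):
        if t < 2 or rng.random() < eps:
            a = t % 2 if t < 2 else int(rng.integers(2))  # force one pull per arm, then explore
        else:
            a = int(np.argmax(sums / counts))              # exploit the better-looking arm
        r = rng.normal(mu[a], 1.0)
        sums[a] += r
        counts[a] += 1
        arms.append(a)
        rewards.append(r)
    return np.array(arms), np.array(rewards)

def effect_stat(arms, rewards):
    """Naive treatment-effect statistic: difference of the two arms' sample means."""
    return rewards[arms == 1].mean() - rewards[arms == 0].mean()

# observed data generated with a true effect of 0.3 between the arms (hypothetical)
arms, rewards = adaptive_sample(mu=(0.0, 0.3))
observed = effect_stat(arms, rewards)

# bootstrap under the null: rerun the same adaptive design with equal arm means
null_stats = np.array([effect_stat(*adaptive_sample()) for _ in range(1000)])
p_value = np.mean(np.abs(null_stats) >= abs(observed))
print(f"observed difference: {observed:.3f}, bootstrap p-value: {p_value:.3f}")
```

The point of the bootstrap step is that, under adaptive sampling, the usual normal approximation for the difference of means need not hold, so the null distribution is recovered by re-simulating the sampling rule itself.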
All faculty and students are welcome to attend!