Tao of Programmer: A Survey of Three RoboCup-Related Papers

A survey of RoboCup-related papers

1 A Co-operative Framework for Strategic Planning

Distributed planning can be thought of as a specialisation of distributed problem
solving. The term “distributed” can refer to either the process or the product, or even both of them. As a result,
the meaning of distributed planning can be threefold; more specifically, it can mean any of the
following categories of planning:
Categories of distributed planning:
– Centralised Planning for Distributed Plans: where the plan is constructed by one agent and the execution is done by many agents.
– Distributed Planning for Centralised Plans: where the plan is constructed by many agents and the execution is done by one agent.
– Distributed Planning for Distributed Plans: where both the construction and the execution of the plan are done by many agents.
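Strategic planning is a mapping from "situations" to "plans": each situation corresponds to one plan.

A strategic plan is a sequence of actions, and each action can be described by three parts: a trigger, the action itself, and a termination condition.

The steps for designing a strategic plan are therefore:
identify the strategic plan --> identify the plan's actions --> identify each action's trigger --> identify each action's cancel condition --> assign agents to the plan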
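
The design steps above can be sketched as a small data structure. This is only an illustration under assumptions: a situation is modelled as a plain dict of facts, and every name here (`PlanAction`, `StrategicPlan`, `select_plan`, the `wing_attack` plan and its thresholds) is hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Assumption: a "situation" is a flat dict of numeric world-state facts.
Situation = Dict[str, float]


@dataclass
class PlanAction:
    name: str
    trigger: Callable[[Situation], bool]  # when the action may start
    cancel: Callable[[Situation], bool]   # when the action must be abandoned
    agent: str = ""                       # filled in by the last design step


@dataclass
class StrategicPlan:
    name: str
    applies: Callable[[Situation], bool]  # the situation this plan is mapped to
    actions: List[PlanAction] = field(default_factory=list)

    def runnable_actions(self, s: Situation) -> List[PlanAction]:
        """Actions whose trigger holds and whose cancel condition does not."""
        return [a for a in self.actions if a.trigger(s) and not a.cancel(s)]


def select_plan(plans: List[StrategicPlan], s: Situation) -> Optional[StrategicPlan]:
    """The situation-to-plan mapping: return the first plan that applies."""
    for plan in plans:
        if plan.applies(s):
            return plan
    return None


# Hypothetical plan with made-up thresholds, for illustration only.
wing_attack = StrategicPlan(
    name="wing_attack",
    applies=lambda s: s["ball_x"] > 0,
    actions=[
        PlanAction(
            "dribble_down_wing",
            trigger=lambda s: s["has_ball"] == 1,
            cancel=lambda s: s["opponent_dist"] < 2.0,
            agent="player_7",
        ),
        PlanAction(
            "cross_to_centre",
            trigger=lambda s: s["ball_x"] > 40,
            cancel=lambda s: s["has_ball"] == 0,
            agent="player_7",
        ),
    ],
)
```

Once a plan is written down like this, its triggers and cancel conditions are fixed, which is exactly the rigidity discussed in the summary of this paper.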

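Combining different strategies produces cooperation:

How should these strategies be combined into a better one? Train a neural network and use it to select the best strategy.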

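Summary:
This differs from UvA's approach and is less robust than role-based decision making.
Once a strategy is fixed, its actions, trigger conditions, and termination conditions are hard-wired, while far too many situations can arise during a match.
Dynamic role switching, by contrast, adapts better to changes on the field.
Training the team's overall behaviour with a neural network lets players choose the actions with the higher payoff and thus cooperate "spontaneously".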

2 A Layered Approach to Learning Client Behaviors in the RoboCup Soccer Server

Features of the Soccer Server

Enough complexity to be realistic; Easy accessibility to researchers worldwide;
Embodiment of most MAS issues, including [20];
– Ability to support reactive or deliberative agents
– Need for agents to model other agents
– Need for agents to affect each other
– Room for both cooperative and competitive agents
– Possibility for stable or evolving agents
– Need for resource management (stamina)
– Need for social conventions
– Opportunity for agents to fill different roles
– Support for communicating agents
– Opportunity to plan communicative acts
– Room for exploring commitment/decommitment strategies
Straightforward evaluation; Good multiagent ML opportunities.

I. Learning low-level skills
Ball interception
Parameters: BD = ball's distance, BA = ball's angle, TA = turn angle (the angle to turn after facing the ball)

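By collecting data from successful trials and training a neural network on it, a success rate above 90% can be reached.
BA carries the largest weight in the network. If the network's training data is used to build a lookup table from BA to turn angle, the result is almost the same (see the figure below).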
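
A minimal sketch of such a BA-to-TA lookup table, assuming the table is built by binning BA and averaging the TA of successful trials in each bin. The bin width, the nearest-bin fallback, the function names, and the sample values are all made up for illustration; the paper's actual construction may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def build_lookup(samples: Iterable[Tuple[float, float]],
                 bin_width: float = 10.0) -> Dict[int, float]:
    """Average the turn angle (TA) of successful trials per ball-angle (BA) bin."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ba, ta in samples:
        b = int(ba // bin_width)  # floor division keeps negative angles in their own bins
        sums[b] += ta
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sums}


def turn_angle(table: Dict[int, float], ba: float, bin_width: float = 10.0) -> float:
    """Look up TA for a ball angle, falling back to the nearest filled bin."""
    b = int(ba // bin_width)
    if b in table:
        return table[b]
    nearest = min(table, key=lambda k: abs(k - b))
    return table[nearest]


# Made-up (BA, TA) pairs in degrees, standing in for recorded successful interceptions.
samples = [(-25.0, -40.0), (-22.0, -36.0), (3.0, 4.0), (7.0, 10.0), (31.0, 50.0)]
table = build_lookup(samples)
```

The table ignores BD entirely, which is consistent with the observation that BA dominates the learned network.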
Analytical method

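II. Passing
Scenario setup

There are three possible outcomes: success, miss, and failure.
The decision tree (DT) takes 174 attributes as input.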

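They include:
Distance and angle to the receiver (2)
Distances and angles of the teammates, sorted by angle relative to the receiver (18)
Distances and angles of the opponents, sorted by angle relative to the receiver (22)
Counts of teammates, opponents, and all players around the receiver within given angles and distances (45)
Distances and angles from the receiver to the teammates, sorted by distance (20)
Distances and angles from the receiver to the opponents, sorted by distance (22)
Counts of teammates, opponents, and all players around the passer within given distances and angles, as seen from the receiver's perspective.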
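
To make the attribute groups above concrete, here is a rough sketch of how the sorted distance/angle features and the count features might be computed from player coordinates. It assumes angles are measured relative to the passer-to-receiver line; the function names, the thresholds, and the positions are hypothetical, and the paper may define the features differently.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polar(origin: Point, facing_deg: float, target: Point) -> Tuple[float, float]:
    """Distance and relative angle (degrees, in [-180, 180)) of target seen from origin."""
    dx, dy = target[0] - origin[0], target[1] - origin[1]
    dist = math.hypot(dx, dy)
    ang = math.degrees(math.atan2(dy, dx)) - facing_deg
    ang = (ang + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    return dist, ang


def sorted_by_angle(passer: Point, receiver: Point,
                    others: List[Point]) -> List[Tuple[float, float]]:
    """(distance, angle) of each player from the passer, ordered by |angle to the receiver|."""
    to_recv = math.degrees(math.atan2(receiver[1] - passer[1], receiver[0] - passer[0]))
    feats = [polar(passer, to_recv, p) for p in others]
    return sorted(feats, key=lambda f: abs(f[1]))


def count_within(passer: Point, receiver: Point, players: List[Point],
                 max_dist: float, max_angle: float) -> int:
    """Number of players within max_dist of the passer and max_angle of the pass line."""
    to_recv = math.degrees(math.atan2(receiver[1] - passer[1], receiver[0] - passer[0]))
    n = 0
    for p in players:
        dist, ang = polar(passer, to_recv, p)
        if dist <= max_dist and abs(ang) <= max_angle:
            n += 1
    return n


# Made-up positions, for illustration only.
passer, receiver = (0.0, 0.0), (10.0, 0.0)
opponents = [(5.0, 1.0), (5.0, -5.0), (-3.0, 0.0)]
features = sorted_by_angle(passer, receiver, opponents)
blockers = count_within(passer, receiver, opponents, max_dist=8.0, max_angle=30.0)
```

Repeating `count_within` over a grid of distance and angle thresholds for teammates, opponents, and all players would yield one block of count attributes.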

Notice that these are used much more frequently than the attributes from the receiver’s perspective. Thus the trained tree is comparably effective when the passer must decide without any input from the potential receivers.
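In other words, the experiments show that the vast majority of the attributes used by the decision tree are passer-side ones, so the trained tree remains comparably effective even without data from the receiver.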

3 An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems

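MDPs: a single agent learns a policy.
In RoboCup the agents cannot all be treated as sharing one and the same environment, so MDPs are extended to multi-agent MDPs (MAMDPs), in which different agents are given different reward functions.
There are also cooperative MAMDPs, in which all agents are given the same reward function, so they can be regarded as being in the same environment.
Cooperative and non-cooperative MAMDPs differ: in the non-cooperative case one can only fix the other agents' policies, improve a single agent's policy, and repeat this process, whereas in the cooperative case a globally optimal policy can be found.
Cooperative MAMDPs also differ from single-agent MDPs. A single-agent MDP considers only one agent, while a cooperative MAMDP is affected by independent agents acting simultaneously. Because the agents are not forced to cooperate, the task is to find an algorithm that produces cooperative behaviour.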
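Besides cooperation, another difficulty is incomplete information at the time of action selection. Two cases are distinguished:
a) the agent knows both its own action choice and those of the others (joint action learning)
b) the agent knows only its own action (independent learning)
Because transitions in a MAMDP depend on the current state and on the actions of all agents, hiding the other agents' choices in the independent-learning case makes the next state unpredictable from a single agent's point of view.
In both cases, we therefore assume that each agent knows the current state and the reward obtained from the state transition. (RoboCup does not satisfy this assumption.)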
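
The unpredictability faced by an independent learner can be shown with a toy example. The two-agent transition table below is entirely made up: the environment is deterministic in the joint action, yet looks non-deterministic to an agent that sees only its own action.

```python
from typing import Dict, Set, Tuple

JointAction = Tuple[str, str]

# Hypothetical deterministic MAMDP fragment: the next state depends on the joint action.
TRANSITIONS: Dict[Tuple[str, JointAction], str] = {
    ("s0", ("left", "left")): "s1",
    ("s0", ("left", "right")): "s2",
    ("s0", ("right", "left")): "s2",
    ("s0", ("right", "right")): "s1",
}


def next_state(state: str, joint_action: JointAction) -> str:
    """Joint-action view: the successor state is fully determined."""
    return TRANSITIONS[(state, joint_action)]


def outcomes_seen_by_agent0(state: str, own_action: str) -> Set[str]:
    """Independent view: every successor agent 0 may observe for its own action."""
    return {
        ns
        for (s, (a0, _a1)), ns in TRANSITIONS.items()
        if s == state and a0 == own_action
    }
```

Here `next_state("s0", ("left", "left"))` always gives `"s1"`, while `outcomes_seen_by_agent0("s0", "left")` contains both `"s1"` and `"s2"`.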

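For independent learners, the distributed algorithm proposed in the paper has to overcome two difficulties: a) an optimal policy must be learned; b) several optimal policies may exist, and all agents on the team must agree on the same one.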
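
One way to address both difficulties can be sketched as follows. Each agent keeps values over its own actions only, never lowers an estimate (an optimistic update), and changes its policy only when its best value strictly improves. This is an illustration under strong assumptions (a single state, deterministic shared rewards, purely random exploration), not a reproduction of the paper's algorithm; all names and the reward table are made up.

```python
import random
from typing import List

ACTIONS = [0, 1]


def reward(a0: int, a1: int) -> float:
    # Shared reward with two equally good joint actions: (0, 0) and (1, 1).
    return 10.0 if a0 == a1 else 0.0


class IndependentLearner:
    def __init__(self) -> None:
        self.q = {a: 0.0 for a in ACTIONS}  # values over the agent's own actions only
        self.policy = ACTIONS[0]

    def update(self, own_action: int, r: float) -> None:
        old_best = max(self.q.values())
        # Optimistic update: keep the best return ever seen for this action.
        # (Single state and one-step episodes, so there is no bootstrap term.)
        self.q[own_action] = max(self.q[own_action], r)
        # Switch policy only on a strict improvement of the best value, so that
        # all agents lock onto the same joint action at the same time step.
        if max(self.q.values()) > old_best:
            self.policy = own_action


def train(steps: int = 200, seed: int = 0) -> List[IndependentLearner]:
    rng = random.Random(seed)
    agents = [IndependentLearner(), IndependentLearner()]
    for _ in range(steps):
        a0, a1 = rng.choice(ACTIONS), rng.choice(ACTIONS)
        r = reward(a0, a1)  # both agents observe the same reward
        agents[0].update(a0, r)
        agents[1].update(a1, r)
    return agents


agents = train()
```

After enough exploration both actions can end up with the same value in each agent's table, so acting greedily on the values alone could miscoordinate. The separately stored policy, updated only on strict improvement, is what keeps the two agents on the same optimal joint action.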

An approach to noncommunicative multiagent coordination in continuous domains

Coordination graphs: for continuous and/or communication-free environments.

Saturday, December 27, 2008
