

Workshop on the representation, sharing and evaluation of multimodal agent interaction.

Welcome to the 1st Workshop on the representation, sharing and evaluation of multimodal agent interaction.

This workshop is part of the first International Conference on Hybrid Human-Artificial Intelligence, to be held at the Vrije Universiteit Amsterdam. The workshop will take place in the main building of the Vrije Universiteit Amsterdam, in room HG-02A24 on the second floor of the A-wing.

Human Agent Interaction

You can also follow the meeting online:

Zoom Meeting

Meeting ID: 942 8663 3051 Passcode: 804451

(!!) Note that the keynote talk by Dan Bohus will be given online but will NOT be streamed.

Program, Room HG-02A24

Time Title & Abstract
09:15 - 09:30 Welcome
09:30 - 11:00 oral papers
Na Li and Robert Ross Transferring Studies Across Embodiments: A Case Study in Confusion Detection
Abstract Human-robot studies are expensive to conduct and difficult to control, and as such researchers sometimes turn to human-avatar interaction in the hope of faster and cheaper data collection that can be transferred to the robot domain. In terms of our work, we are particularly interested in the challenge of detecting and modelling user confusion in interaction, and as part of this research programme, we conducted situated dialogue studies to investigate users’ reactions in confusing scenarios that we give in both physical and virtual environments. In this paper, we present a combined review of these studies and the results that we observed across these two embodiments. For the physical embodiment, we used a Pepper Robot, while for the virtual modality, we used a 3D avatar. Our study shows that despite attitudinal differences and technical control limitations, there were a number of similarities detected in user behaviour and self-reporting results across embodiment options. This work suggests that, while avatar interaction is no true substitute for robot interaction studies, sufficient care in study design may allow well executed human-avatar studies to supplement more challenging human-robot studies.
  Full paper
Jūra Miniotaitė and André Pereira Tabletop games as Multimodal Datasets for Social AI
Abstract Progressing the work to create believable, data-driven, social AI starts where most difficult AI challenges start, with the dataset. Tabletop games provide an excellent opportunity to get a dataset with complex social behavior in a structured context. In this work, the tabletop games; Pandemic Hot Zone - Europe, Hanabi, Poker, and a custom designed game are analyzed to assess their quality as tasks for data collection. We will analyze possible datasets collected in the abovementioned games in respect to: (1) Dialog overlapping and lexical diversity; (2) Cooperation, Competitiveness and Social Context; (3) Game components and non-verbal behavior; (4) Research Opportunities.
  Full paper
Thomas Baier, Selene Santamaria Baez and Piek Vossen A modular architecture for creating multimodal agents
Abstract The paper describes a flexible and modular platform to create multimodal interactive agents. The platform operates through an event-bus on which signals and interpretations are posted in a sequence in time. Different sensors and interpretation components can be integrated by defining their input and output as topics, which results in a logical workflow for further interpretations. We explain a broad range of components that have been developed so far and integrated into a range of interactive agents. We also explain how the actual interaction is recorded as multimodal data as well as in a so-called episodic Knowledge Graph. By analysing the recorded interaction, we can analyse and compare different agents and agent components.
  Full paper
11:00 - 11:15 break
11:15 - 12:15 Interaction session: three different agent setups are open to participants for interaction. Your experience and data will be used in the discussion later.
12:15 - 13:00 lunch
13:00 - 14:00 oral papers
Lucie Galland, Milene Guerreiro Goncalves and Catharine Oertel Towards a personalized collaborator for Human-Robot creative ideation; Adapting conversational strategies to increase creativity
Abstract Humans engage in creatively demanding tasks every day. However, thinking creatively while alone and without external input can be difficult. As other human collaborators are not always easily available, robots can serve as a scalable addition to the ideation process. In this work, we investigate how social robots can improve creativity in ideation tasks. Studies have shown that good mood and social robot partners can have a positive impact on creativity. We explore the possibility that a social robot, adapting its conversational strategies to user’s mood, could enhance creativity. To this end, we develop a reinforcement learning algorithm that can learn a custom transition function to adapt to each particular user. We apply this algorithm in the case of an ideation task. A simulated user is developed to test the policy. We find that the reinforcement learning algorithm performed significantly better than the random use of these strategies. This algorithm will later be tested with human participants using a Furhat robot.
  Full paper
André Meyer-Vitali and Wico Mulder Trustworthy Hybrid Team Decision-Support Systems
Abstract The aim to empower human users of artificially intelligent systems becomes paramount when considering coordination in hybrid teams of humans and autonomous agents. Hereby, we consider not only one-to-one interactions, but also many-to-many situations (multiple humans and multiple agents), where we strive to make use of their complementary capabilities. Therefore, mutual awareness of each others’ strengths and weaknesses is crucial for beneficial coordination. In order to address these goals, and in accordance with a hybrid theory of mind, we propose the use of trustworthy interaction patterns and epistemic orchestration with intentions and causal models. The interaction patterns we describe are based on previous work on modular design patterns for hybrid team actors. Epistemic orchestration, a concept for explicit representation of intentions and causal relationships with the goal of specifying team architectures and their interactions, is being explored. While the current ideas are only a preliminary formulation of what will be developed further, a realistic use case is described as an experimental playground.
  Full paper
14:00 - 15:00 Interaction analysis & discussion: the interaction data collected in the morning is analysed with tooling and discussed
15:00 - 15:15 Break
15:15 - 16:15 Panel on sharing multimodal interaction data:
  - The current research and multi-modal data collection practices are detrimental to progressing HAI systems.
  - We are going catastrophically wrong because….
  - Pre-trained models never work reliably in in-the-wild situations.
  - What are the (multi-modal) challenges that we need to overcome?
  - What could we do to fix the problem (how could data be shared)?
  - Can we share data without violating privacy and how should consent be given?
  - How to ensure reproducibility of experiments (data collection)
  - How to analyze data and the discussion of metrics that could be re-used to make studies comparable.
16:15 - 17:15 Keynote by Dan Bohus, Microsoft
Title Physically Situated Interaction: Opportunities and Challenges
Abstract Real-world deployments of physically situated interactive systems often bring to the fore important challenges that are not always, or not immediately visible in laboratory controlled settings. In the first part of this presentation, using vignettes from work done over the last decade at Microsoft Research, I will showcase a few such challenges and opportunities in the space of physically situated language interaction. Then, in the second part of the talk, I will pivot to some of the engineering challenges that often arise in constructing and deploying systems in the open-world, and briefly present an open-source framework that aims to accelerate development and research for multimodal interactive systems.
17:15 - Drinks


Interaction is a real-world event that takes place in time and in physical or virtual space. By definition, it only exists while it happens. This makes it difficult to observe and study interactions, to share interaction data, to replicate or reproduce them, and to evaluate agent behavior objectively. Interactions are also extremely complex, covering many variables whose values change from case to case: the physical circumstances differ, the participants differ, and past experiences affect the actual event. In addition, the presence of cameras and experimenters influences the interaction, and capturing such data requires considerable manpower. Finally, privacy issues make it difficult to simply record and publish interaction data freely.

It is therefore no surprise that interaction research progresses slowly. This workshop aims to bring together researchers with different backgrounds to explore how interaction research can become more standardised and scalable. The goal is to explore how researchers and developers can share experiments and data in which multimodal agent interaction plays a role, and how these interactions can be compared and evaluated. Especially within real-world physical contexts, modeling and representing situations and contexts for effective interactions is a challenge. We therefore invite researchers and developers to share with us how and why they record multimodal interactions, whether their data can be shared or combined with other data, how systems can be trained and tested, and how interaction can be replicated. Machine learning communities such as vision and NLP have made rapid progress by creating competitive leaderboards based on benchmark datasets. Although this works well for training unimodal perception models, such datasets are not sufficient for interaction research, where multiple modalities must be considered.

So what do we as interaction researchers need in order to achieve similar progress? What kinds of shared platforms and tools? What kinds of datasets are most useful? What about something along the lines of the “Alexa challenge” for dialogue systems, or “RoboCup@Home” for HRI, where groups of researchers start with the same platform and compete based on how well they can design interactions on specific tasks with real users? What would such a challenge look like? What kinds of tasks? Where do the users come from? How might simulation come into play, and how far can it get us?

In addition to this focus on the representation and evaluation of multimodal interaction, we will also run a panel discussion on privacy issues related to interaction data and the possibilities to mitigate privacy limitations for sharing.

Call for papers

We invite submissions of long and short papers focusing on advances in multimodal data collection for conversational AI. Papers can cover experimental and theoretical research, as well as tools, platforms and practical engineering challenges. We invite researchers and developers to share with us how and why they record multimodal interactions, whether their data can be shared or combined with other data, how systems can be trained and evaluated, and how results can be reproduced.

All papers must be original and not simultaneously submitted to another journal or conference. We invite work-in-progress submissions, blue-sky papers and demonstrations. The review will be blind (one-way anonymized review). Proceedings will be published through arXiv by each individual author and links to the papers will be hosted on the workshop website. Submitted papers should conform to the latest ACM LaTeX or Word publication format. Click here for LaTeX templates and examples (download the zip package entitled Primary Article Template - LaTeX).

The Call for Papers, including instructions for the submissions, can be found on the EasyChair page.

Important dates

List of Topics

Organizing committee


All questions about submissions should be emailed to