Workshop Programme

All times are in GMT+9 (Tokyo, Japan).

Time         Event
09:30–09:40  Workshop Introduction
09:40–10:40  Invited Talk 1 (45 min talk + 15 min Q&A)

Title: Remaining challenges in complex data-to-text generation
by Craig Thomson, ADAPT/DCU

In 2017, Wiseman et al. explored the ability of then state-of-the-art neural sequence-to-sequence (seq2seq) models to generate texts more complex than the short, descriptive texts seen in most datasets. Since then, a lot has changed: Large Language Models (LLMs) have upended the field of Natural Language Processing and alleviated some of the issues seen in seq2seq outputs. However, LLMs still make mistakes, and identifying the conditions under which they make them remains an open question. This talk will explore the problems that persist when building complex data-to-text systems with neural models. It will cover the SportSett dataset of basketball game data, which was designed to enable exploration of this problem in a realistic setting. LLMs are well positioned to take advantage of the large amount of data available in SportSett, making it possible to investigate problems that were previously obscured by an abundance of simple errors in seq2seq output. Methods for identifying factual accuracy mistakes in system outputs will also be covered. The talk will conclude with examples of mistakes that ChatGPT still makes when writing data-to-text sports summaries, and the interesting research avenues these leave open.
10:40–11:10  Coffee Break
11:10–12:10  Oral Session 1 (15 min talk + 5 min Q&A)

11:10–11:30  Enhancing Situation Awareness through Model-Based Explanation Generation
Konstantinos Gavriilidis, Ioannis Konstas, Helen Hastie and Wei Pang

11:30–11:50  Controllable Synthetic Clinical Note Generation with Privacy Guarantees
Tal Baumel, Andre Manoel, Shize Su, Daniel Jones, Huseyin Inan, Aaron Ari Bornstein and Robert Sim

11:50–12:10  Beyond the Hype: Identifying and Analyzing Math Word Problem-Solving Challenges for Large Language Models
Romina Soledad Albornoz-De Luise, David Arnau, Pablo Arnau-González and Miguel Arevalillo-Herráez

12:10–13:40  Lunch
13:40–14:40  Panel

Panelists:
- Claire Gardent, CNRS
- C. Maria Keet, University of Cape Town
- Michela Lorandi, ADAPT/DCU
- Craig Thomson, ADAPT/DCU

 
14:40–15:30  Hackathon (Part 1)
15:30–16:00  Coffee Break
16:00–17:20  Hackathon (Part 2)
17:20–18:20  Invited Talk 2 (45 min talk + 15 min Q&A)

Title: Inference over Clinical Trial Data: A Neuro-Symbolic Perspective
by Marco Valentino, Neuro-Symbolic AI Group / Idiap Research Institute

Clinical Trial Reports (CTRs) hold essential information for advancing personalised medicine. However, with the vast number of CTRs produced over the years, manually reviewing the reports to extract the best evidence for experimental treatments has become impractical. Large Language Models (LLMs) present a promising solution due to their capacity to interpret and perform inference on both textual and semi-structured data. But a critical question remains: Can LLMs be reliably deployed in this sensitive and high-stakes domain? In this talk, I will explore this question in two stages. First, I will examine the unique challenges of reasoning over clinical trial data, introducing the NLI4CT dataset as a key resource. I will delve into the results from two recent SemEval tasks designed to evaluate the accuracy, faithfulness, and consistency of state-of-the-art models, highlighting their strengths and limitations. Building on these insights, I will then present a Neuro-Symbolic perspective for safer clinical NLP applications, discussing the possibility of developing hybrid architectures that integrate Natural Language Inference (NLI) and Data-to-Text (D2T) generation with external critique models for automatic feedback and verification.
18:20–18:30  Closing
19:00        Dinner

Acknowledgments

Funded by the European Union (ERC, NG-NLG, 101039303)
