Practical D2T

Practical D2T at INLG 2024
Tokyo, Japan, 23 September 2024

Workshop Programme

All timings are in GMT+9 (Tokyo, Japan).

Time	Event
09:30 09:40	Workshop Introduction
09:40 10:40	Invited Talk 1 (45 min oral + 15 min QA)
	Title: Remaining challenges in complex data-to-text generation
	Speaker: Craig Thomson, ADAPT/DCU
	Abstract: In 2017, Wiseman et al. explored the ability of then state-of-the-art neural sequence-to-sequence (seq-2-seq) models to generate texts that were more complex than the short, descriptive texts seen in most datasets. Since then, a lot has changed, with Large Language Models (LLMs) upending the field of Natural Language Processing, and alleviating some of the issues seen in outputs of seq-2-seq models. However, LLMs still make mistakes, and identifying the conditions under which these are made is an open question. This talk will explore the problems that persist when trying to build complex data-to-text systems using neural systems. It will cover the SportSett dataset of basketball game data, which was designed to enable exploration of this problem in a realistic setting. LLMs are well positioned to take advantage of the large amount of data available in SportSett, investigating problems that were obscured by an abundance of simple errors in seq-2-seq model output. Also covered are methods for identifying factual accuracy mistakes in system outputs. The talk will conclude with some examples of mistakes that ChatGPT still makes when writing data-to-text sports summaries, and the interesting research avenues that this leaves.
10:40 11:10	Coffee Break
11:10 12:10	Oral Session 1 (15 min oral + 5 min QA per paper)
11:10 11:30	Enhancing Situation Awareness through Model-Based Explanation Generation
	Konstantinos Gavriilidis, Ioannis Konstas, Helen Hastie and Wei Pang
11:30 11:50	Controllable Synthetic Clinical Note Generation with Privacy Guarantees
	Tal Baumel, Andre Manoel, Shize Su, Daniel Jones, Huseyin Inan, Aaron Ari Bornstein and Robert Sim
11:50 12:10	Beyond the Hype: Identifying and Analyzing Math Word Problem-Solving Challenges for Large Language Models
	Romina Soledad Albornoz-De Luise, David Arnau, Pablo Arnau-González and Miguel Arevalillo-Herráez
12:10 13:40	Lunch
13:40 14:40	Panel
	Panelists:
	- Claire Gardent, CNRS
	- C. Maria Keet, University of Cape Town
	- Michela Lorandi, ADAPT/DCU
	- Craig Thomson, ADAPT/DCU
14:40 15:30	Hackathon (Part 1)
15:30 16:00	Coffee Break
16:00 17:20	Hackathon (Part 2)
17:20 18:20	Invited Talk 2 (45 min oral + 15 min QA)
	Title: Inference over Clinical Trial Data: A Neuro-Symbolic Perspective
	Speaker: Marco Valentino, Neuro-Symbolic AI Group / Idiap Research Institute
	Abstract: Clinical Trial Reports (CTRs) hold essential information for advancing personalised medicine. However, with the vast number of CTRs produced over the years, manually reviewing the reports to extract the best evidence for experimental treatments has become impractical. Large Language Models (LLMs) present a promising solution due to their capacity to interpret and perform inference on both textual and semi-structured data. But a critical question remains: Can LLMs be reliably deployed in this sensitive and high-stakes domain? In this talk, I will explore this question in two stages. First, I will examine the unique challenges of reasoning over clinical trial data, introducing the NLI4CT dataset as a key resource. I will delve into the results from two recent SemEval tasks designed to evaluate the accuracy, faithfulness, and consistency of state-of-the-art models, highlighting their strengths and limitations. Building on these insights, I will then present a Neuro-Symbolic perspective for safer clinical NLP applications, discussing the possibility of developing hybrid architectures that integrate Natural Language Inference (NLI) and Data-to-Text (D2T) generation with external critique models for automatic feedback and verification.
18:20 18:30	Closing
19:00	Dinner

Acknowledgments

Funded by the European Union (ERC, NG-NLG, 101039303)
