Hackathon
Practical D2T 2024 features a hackathon focused on improving the semantic accuracy of D2T systems. Join us and get hands-on experience with quick and easy LLM-as-judge evaluation and comparable human annotations! Let’s explore the NLG outputs together at the hackathon on Monday 💪
Join us on September 23rd, and explore LLMs for:
- Generating textual summaries from structured data as input
- Detecting different categories of errors in the obtained summaries
- Comparing error detection capabilities with those of human annotators
We will use factgenie, our web framework for annotating and visualizing word spans in textual model outputs. With factgenie, both humans and LLMs can be used to annotate various span-based errors including semantic inaccuracies or irrelevant text.
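To make this concrete, here is a rough sketch of what a single span-based annotation can look like as a data structure. The field names below are illustrative assumptions, not factgenie's exact schema:

```python
# A rough illustration of a span-based error annotation as a data structure.
# Field names here are assumptions for illustration, not factgenie's schema.
from dataclasses import dataclass

@dataclass
class SpanAnnotation:
    start: int      # character offset where the error span begins
    text: str       # the erroneous span as it appears in the model output
    category: str   # e.g. "semantic inaccuracy" or "irrelevant text"
    reason: str     # short justification from the annotator

example = SpanAnnotation(
    start=42,
    text="light rain all week",
    category="semantic inaccuracy",
    reason="The input data predict rain only on Tuesday.",
)
```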
We will work on various domains using recent structured data in JSON and CSV formats. You can take a first look at the data in the online factgenie-demo we prepared. You can also see some examples of LLM-driven D2T and error annotation: try selecting the dataset (top left) st24-openweather and the annotations (top right) st24-demo-openweather-dev-llama3. You will see what kind of weather summaries Mistral produces, and how Llama 3 tries to spot potential errors.
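To give you an idea of the input format, here is a simplified, hypothetical example of the kind of weather data you will find there (shaped loosely after OpenWeatherMap forecast responses; the actual demo data may differ in fields and granularity):

```python
# A simplified, hypothetical example of the structured weather input
# (shaped loosely after OpenWeatherMap forecast responses); the actual
# demo data may differ in fields and granularity.
forecast = {
    "city": {"name": "Prague", "country": "CZ"},
    "list": [
        {
            "dt_txt": "2024-09-23 12:00:00",
            "main": {"temp": 18.4, "humidity": 62},
            "weather": [{"description": "scattered clouds"}],
            "wind": {"speed": 3.1},
        },
        # ... further 3-hour forecast entries
    ],
}
```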
Communication channels
We encourage you to join the INLG Discord server, which will be the official communication channel for INLG 2024.
We will use the dedicated practical-d2t-workshop-2024 channel for communication during the hackathon.
We will use our existing Google Group d2t2024@googlegroups.com only as a backup solution.
Phase 1: Intro + generation (50 min)
During the first part of the hackathon, we will give you a quick tour of factgenie and its features. This phase will be focused on generation, so we will work together on prompting open LLMs to produce summaries from structured data. Feel free to vary your prompts and parameters, and play with different datasets to get the best output!
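If you want to experiment outside the factgenie interface, a generation call could look roughly like the sketch below. It assumes a local Ollama server with a mistral model pulled; adapt the endpoint, model name, and prompt to your own setup:

```python
# Minimal sketch: prompting a local open LLM to summarize structured data.
# Assumes an Ollama server at localhost:11434 with a "mistral" model pulled;
# adapt the endpoint, model name, and prompt to your own setup.
import json
import requests

def generate_summary(data: dict, model: str = "mistral") -> str:
    prompt = (
        "Write a short, factually accurate weather summary based strictly "
        "on the following JSON data:\n"
        f"{json.dumps(data, indent=2)}"
    )
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False,
              "options": {"temperature": 0.7}},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]
```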
Phase 2: Annotation and results (80 min)
Next, it’s time to see how good the obtained summaries are! We will explore the second feature of factgenie: error annotation. During this phase, you can play with different LLMs as error annotators. Just like before, you can vary the model, prompts, and parameters, but also introduce new error categories besides those we will provide.
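For the curious, an LLM-as-judge annotation call could be sketched as follows. The prompt wording, the expected output format, and the error categories are again our assumptions for illustration, not the exact setup we will use:

```python
# Minimal sketch of LLM-as-judge error annotation: the model is asked to
# return error spans as JSON. Prompt wording, output format, and the error
# categories below are illustrative assumptions.
import json
import requests

CATEGORIES = ["semantic inaccuracy", "irrelevant text"]  # extend with your own

def annotate_errors(data: dict, summary: str, model: str = "llama3") -> list:
    prompt = (
        "Given the input data and a generated summary, list all error spans.\n"
        f"Allowed categories: {', '.join(CATEGORIES)}.\n"
        'Reply with a JSON array of objects like '
        '{"text": "...", "category": "...", "reason": "..."}.\n\n'
        f"Data:\n{json.dumps(data)}\n\nSummary:\n{summary}"
    )
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False,
              "format": "json"},
        timeout=120,
    )
    response.raise_for_status()
    return json.loads(response.json()["response"])
```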
At the same time, we will gather some human error annotations to compare with those produced by the LLMs. It is up to you to decide whether you want to help us by quickly annotating a couple of summaries, focus on annotation through LLMs, or both!
Finally, we will correlate the human annotations with the LLM annotations and discuss the results.
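One simple way to quantify the agreement, assuming we aggregate annotations into per-summary error counts, is a linear or rank correlation over those counts, e.g.:

```python
# Sketch: correlate per-summary error counts from humans and an LLM judge.
# Assumes both annotation sets cover the same summaries in the same order;
# the counts below are made-up placeholders.
from scipy.stats import pearsonr, spearmanr

human_counts = [2, 0, 1, 3, 1]
llm_counts = [1, 0, 2, 3, 0]

r, r_p = pearsonr(human_counts, llm_counts)
rho, rho_p = spearmanr(human_counts, llm_counts)
print(f"Pearson r = {r:.2f} (p = {r_p:.3f})")
print(f"Spearman rho = {rho:.2f} (p = {rho_p:.3f})")
```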
Should you have any questions, feel free to contact the organisers.
Acknowledgments
Funded by the European Union (ERC, NG-NLG, 101039303)