POSTED: Mar 20, 2024

Computer Science
Multimodal Event Extraction from Incomplete Data
Fatemeh Shiri

PhD

Monash University

Melbourne, Victoria, Australia



Event extraction from multimodal documents is an important yet under-explored problem. One challenge faced by this task is the scarcity of paired image-text datasets, making it difficult to fully exploit the strong representation power of multimodal language models. We present Theia, an end-to-end multimodal event extraction framework that can be trained on incomplete data. 

Specifically, we couple a generation-based event extraction model with a customized image synthesizer that can generate images from text. Our model leverages the capabilities of pre-trained vision-language models and can be trained on incomplete (i.e., text-only) data. Experimental results on existing multimodal datasets demonstrate that our approach outperforms state-of-the-art methods at both synthesizing missing data and extracting events.
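The data flow described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: every function name (`synthesize_image`, `extract_events`, `run_pipeline`) and the toy suffix-matching "extractor" are assumptions standing in for the pre-trained vision-language components; the key point shown is that text-only examples get a synthesized image before extraction, so the extractor always receives both modalities.

```python
# Hypothetical sketch of a Theia-style pipeline for incomplete data.
# When a training example lacks an image, a text-to-image synthesizer
# fills the gap so the multimodal extractor always sees both views.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    text: str
    image: Optional[str] = None  # stand-in for pixel data


def synthesize_image(text: str) -> str:
    """Toy stand-in for a text-conditioned image synthesizer."""
    return f"<synthetic image for: {text[:30]}>"


def extract_events(text: str, image: str) -> List[str]:
    """Toy stand-in for a generation-based multimodal event extractor.

    Here we just flag past-tense-looking words as event triggers;
    the real model would condition on both text and image features.
    """
    return [f"event({w.rstrip('.,')})" for w in text.split() if w.endswith("ed")]


def run_pipeline(batch: List[Example]) -> List[List[str]]:
    results = []
    for ex in batch:
        # Complete the missing modality, then extract events.
        image = ex.image if ex.image is not None else synthesize_image(ex.text)
        results.append(extract_events(ex.text, image))
    return results


batch = [
    Example(text="Protesters marched downtown.", image="<real image>"),
    Example(text="The troops attacked the city."),  # text-only example
]
print(run_pipeline(batch))
```

The design choice illustrated is that the synthesizer sits in front of a single shared extractor, so text-only and paired examples follow one code path at training time.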
