POSTED: Mar 20, 2024

Computer Science
Multimodal Event Extraction from Incomplete Data
Fatemeh Shiri


Monash University

Melbourne, Victoria, Australia


Event extraction from multimodal documents is an important yet under-explored problem. A key challenge for this task is the scarcity of paired image-text datasets, which makes it difficult to fully exploit the strong representation power of multimodal language models. We present Theia, an end-to-end multimodal event extraction framework that can be trained on incomplete data.

Specifically, we couple a generation-based event extraction model with a customized image synthesizer that generates images from text. Our model leverages the capabilities of pre-trained vision-language models and can be trained on incomplete (i.e. text-only) data. Experimental results on existing multimodal datasets demonstrate that our approach outperforms state-of-the-art methods in both synthesizing missing data and extracting events.
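The abstract's core idea, synthesizing the missing image modality from text so that a multimodal extractor can still be trained on text-only examples, can be sketched as follows. This is a minimal illustrative outline, not the authors' implementation: every function and field name here is a hypothetical placeholder (the real system would use a pre-trained text-to-image model and a generation-based event extractor).

```python
# Illustrative sketch of a Theia-style pipeline for incomplete data.
# All components below are toy placeholders for the models described
# in the abstract, not the actual Theia code.

def synthesize_image_features(text):
    # Placeholder for the customized image synthesizer: in the real
    # framework this would be a text-to-image model producing visual
    # features for a text-only example.
    return [float(len(tok)) for tok in text.split()]

def extract_events(text, image_features):
    # Placeholder for the generation-based event extractor that
    # consumes both modalities; here it simply emits a dummy "event"
    # for each capitalized token.
    return [{"trigger": tok, "visual_dim": len(image_features)}
            for tok in text.split() if tok.istitle()]

def process_example(text, image_features=None):
    # Key idea from the abstract: when the image modality is missing,
    # synthesize it from the text so training can proceed end to end.
    if image_features is None:
        image_features = synthesize_image_features(text)
    return extract_events(text, image_features)
```

The design point is that the extractor always sees both modalities, so paired and text-only training examples flow through the same model.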

