Large Model Based Crossmodal Chinese Poetry Creation

Published in 2024 IEEE Smart World Congress (SWC), 2024

Generating Chinese poetry is a complex task for which large models show significant potential. However, most current systems accept only a single input modality, and their output lacks interpretability. This paper proposes a large-model-based system that accepts cross-modal input of text and images, provides interpretable annotations for the generated Chinese poems, and supports multiple rounds of iterative optimization. First, it analyzes images with CLIP and MiniGPT-4 and uses ERNIE-4.0 to generate descriptive text from the analysis. Then, it uses ERNIE-4.0 to generate classical Chinese poems from the input text and the descriptive text, with prompts we devised based on CRISPE. Finally, it evaluates and then optimizes the generated poems with few-shot prompts. Preliminary evaluations validate our poetry scoring criteria and show that the system performs best when text and images are combined as cross-modal input.
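
The three stages described in the abstract can be pictured as a simple control flow. The following is only a minimal sketch and not the authors' implementation: `create_poem`, `PoemResult`, the `analyzers` and `llm` callables, the `max_rounds` parameter, and every prompt string are hypothetical placeholders. In a real system the callables would wrap CLIP, MiniGPT-4, and ERNIE-4.0, and the prompts would follow the paper's CRISPE-based and few-shot designs, neither of which is reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical type aliases; none of these names come from the paper.
ImageAnalyzer = Callable[[bytes], str]  # stands in for a CLIP or MiniGPT-4 wrapper
TextModel = Callable[[str], str]        # stands in for an ERNIE-4.0 wrapper


@dataclass
class PoemResult:
    poem: str
    evaluation: str
    rounds: int


def create_poem(
    user_text: str,
    image: Optional[bytes],
    analyzers: List[ImageAnalyzer],
    llm: TextModel,
    max_rounds: int = 2,
) -> PoemResult:
    # Stage 1: analyze the image and turn the findings into descriptive text.
    description = ""
    if image is not None:
        findings = [analyze(image) for analyze in analyzers]
        description = llm("Describe the scene from this analysis:\n" + "\n".join(findings))

    # Stage 2: generate a poem from the user text plus the description.
    # (Placeholder prompt; the paper uses prompts based on CRISPE.)
    poem = llm(f"Write a classical Chinese poem.\nTheme: {user_text}\nScene: {description}")

    # Stage 3: evaluate, then optimize for a fixed number of rounds.
    # (Placeholder prompts; the paper uses few-shot prompts here.)
    evaluation = llm("Evaluate this poem:\n" + poem)
    rounds = 0
    for _ in range(max_rounds):
        poem = llm(f"Improve the poem.\nPoem: {poem}\nEvaluation: {evaluation}")
        evaluation = llm("Evaluate this poem:\n" + poem)
        rounds += 1
    return PoemResult(poem=poem, evaluation=evaluation, rounds=rounds)
```

Passing the models in as plain callables keeps the sketch independent of any particular SDK, so each stage can be exercised with stub functions before real model wrappers are attached.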

L. Yang*, Z. Zhang*, K. Niu, S. Pan, W. Zhu and C. Ma, “Large Model Based Crossmodal Chinese Poetry Creation,” in 2024 IEEE Smart World Congress (SWC), accepted