Image Caption Generation Through the Integration of CNN-Based Residual Network Architectures and LSTM

Santi D.

doi:10.1109/ICICoS62600.2024.10636926


Universitas Hasanuddin

Research output: Contribution to journal › Article › peer-review


Proceedings International Conference on Informatics and Computational Sciences

Published: 2024 | Citations: 10

Abstract

Image captioning improves the joint understanding of images and language, and it has an impact on image retrieval and the handling of visual information. Recent advances in this field use convolutional neural networks (CNNs) together with recurrent neural networks (RNNs), more specifically long short-term memory (LSTM) networks, to address the challenge of using deep learning to produce concise descriptions that capture the relationships between objects and concepts. This study focuses on CNNs with residual network architectures (ResNet-50, ResNet-101, and ResNet-152), tested on the Flickr 8k dataset. The aim is to understand how network depth affects the model's grasp of visual structure and context, and thereby the quality of the generated descriptions. Our methodology involves several stages: image feature extraction, text preprocessing, and model optimization and evaluation using metrics such as BLEU scores. Experimental results demonstrate the effectiveness of the approach, with the ResNet-101 model achieving the highest BLEU score among the tested models. This work contributes to ongoing efforts to bridge the gap between visual data understanding and natural language generation, offering promising prospects for more natural and accurate image captioning systems.
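The abstract reports BLEU scores but does not state the exact configuration used (corpus- vs. sentence-level scoring, n-gram weights, smoothing, or tokenization). As an illustration only, the sketch below implements standard corpus-level BLEU-4 in the style of Papineni et al. (2002), with uniform n-gram weights and no smoothing. The function names `ngrams` and `corpus_bleu` here are hypothetical, and this is not the authors' evaluation code; libraries such as NLTK (`nltk.translate.bleu_score.corpus_bleu`) provide comparable implementations.

```python
import math
from collections import Counter
from typing import List, Sequence

# Illustrative sketch of standard corpus-level BLEU; NOT the paper's evaluation code.
# Assumptions: uniform weights over 1..max_n grams, no smoothing, pre-tokenized input.


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references: List[List[List[str]]],
                hypotheses: List[List[str]],
                max_n: int = 4) -> float:
    """Corpus-level BLEU with uniform n-gram weights and no smoothing.

    references[i] holds every tokenized reference caption for image i;
    hypotheses[i] is the tokenized generated caption for the same image.
    """
    matched = [0] * max_n  # clipped n-gram matches, summed over the corpus
    total = [0] * max_n    # candidate n-gram counts, summed over the corpus
    hyp_len = 0
    ref_len = 0
    for refs, hyp in zip(references, hypotheses):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length to the
        # hypothesis, with ties broken toward the shorter reference.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            # Clip each hypothesis n-gram count by its maximum count in any one reference.
            max_ref: Counter = Counter()
            for r in refs:
                for gram, c in ngrams(r, n).items():
                    max_ref[gram] = max(max_ref[gram], c)
            matched[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            total[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matched) == 0:
        return 0.0  # a zero precision at any order makes the geometric mean zero
    # Geometric mean of the modified n-gram precisions (uniform weights).
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the effective reference length.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    ref = "a dog runs on the grass".split()
    print(corpus_bleu([[ref]], [ref]))  # prints 1.0
```

Each entry of `references` may hold several captions for the same image (Flickr 8k provides five per image); clipping against the per-reference maximum prevents a hypothesis from being rewarded for repeating a word more often than any single reference does. Because no smoothing is applied, a caption with no matching 4-gram contributes zero matches at that order, so reported scores depend heavily on whether and how smoothing is configured.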


Fingerprint

Computer science (Sciences)
Residual (Sciences)
Artificial intelligence (Sciences)
Residual neural network (Sciences)
Computer vision (Sciences)
Convolutional neural network (Sciences)
Computer graphics (images) (Sciences)
Algorithm (Sciences)
