Multimodal retrieval with contrastive pretraining

Alsan, H.F.; Yildiz, E.; Safdil, E.B.; Arslan, F.; Arsan, T.

dc.contributor.author	Alsan, H.F.
dc.contributor.author	Yildiz, E.
dc.contributor.author	Safdil, E.B.
dc.contributor.author	Arslan, F.
dc.contributor.author	Arsan, T.
dc.date.accessioned	2023-10-19T15:05:32Z
dc.date.available	2023-10-19T15:05:32Z
dc.date.issued	2021
dc.identifier.isbn	9781665436038
dc.identifier.uri	https://doi.org/10.1109/INISTA52262.2021.9548414
dc.identifier.uri	https://hdl.handle.net/20.500.12469/4941
dc.description	Kocaeli University;Kocaeli University Technopark	en_US
dc.description	2021 International Conference on INnovations in Intelligent SysTems and Applications, INISTA 2021 -- 25 August 2021 through 27 August 2021 -- 172175	en_US
dc.description.abstract	In this paper, we present multimodal data retrieval aided by contrastive pretraining. Our approach is to pretrain a contrastive network to assist in multimodal retrieval tasks. We work with multimodal data consisting of image and caption (text) pairs. We present a dual-encoder deep neural network whose image and text encoders map multimodal data (images and text) to representation vectors. These representation vectors are used for similarity-based retrieval. The image encoder is a 2D convolutional network, and the text encoder is a recurrent neural network (Long Short-Term Memory). The MS-COCO 2014 dataset, which contains both images and captions, is used for multimodal training with triplet loss. Before the dual-encoder training, we used a convolutional Siamese network to compute the similarities between images (contrastive pretraining). The advantage is that Siamese networks can aid retrieval, and we seek to show whether Siamese networks can be used in practice. Finally, we investigated the performance of Siamese-assisted retrieval using the BLEU score metric. We conclude that Siamese networks can help with image-to-text retrieval tasks. © 2021 IEEE.	en_US
dc.language.iso	eng	en_US
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	en_US
dc.relation.ispartof	2021 International Conference on INnovations in Intelligent SysTems and Applications, INISTA 2021 - Proceedings	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	Convolutional Networks	en_US
dc.subject	Deep Learning	en_US
dc.subject	Long-Short Term Memory (LSTM)	en_US
dc.subject	Multimodal Data	en_US
dc.subject	Pretraining	en_US
dc.subject	Siamese networks	en_US
dc.subject	Triplet loss	en_US
dc.subject	Brain	en_US
dc.subject	Computer vision	en_US
dc.subject	Convolution	en_US
dc.subject	Convolutional neural networks	en_US
dc.subject	Deep neural networks	en_US
dc.subject	Network coding	en_US
dc.subject	Convolutional networks	en_US
dc.subject	Data retrieval	en_US
dc.subject	Deep learning	en_US
dc.subject	Image texts	en_US
dc.subject	Long-short term memory	en_US
dc.subject	Multi-modal	en_US
dc.subject	Multi-modal data	en_US
dc.subject	Pre-training	en_US
dc.subject	Siamese network	en_US
dc.subject	Triplet loss	en_US
dc.subject	Long short-term memory	en_US
dc.title	Multimodal retrieval with contrastive pretraining	en_US
dc.type	conferenceObject	en_US
dc.department	N/A	en_US
dc.identifier.doi	10.1109/INISTA52262.2021.9548414	en_US
dc.identifier.scopus	2-s2.0-85116673208	en_US
dc.institutionauthor	N/A
dc.relation.publicationcategory	Conference Item - International - Institutional Faculty Member	en_US
dc.authorscopusid	55364564400
dc.authorscopusid	57289197300
dc.authorscopusid	57288694000
dc.authorscopusid	58353740700
dc.authorscopusid	6506505859
dc.khas	20231019-Scopus	en_US
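
The abstract above describes similarity-based retrieval over representation vectors trained with triplet loss. The following is a minimal sketch of those two ideas only, not the authors' implementation: the record does not state the distance function, the margin, or the vector dimensions, so the cosine distance, the margin of 0.2, the toy 2-D vectors, and all function names here are assumptions chosen for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returns a copy if the norm is zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on cosine distance (1 - cosine similarity).

    The margin value is an assumption; the record does not give one.
    """
    d_pos = 1.0 - cosine_similarity(anchor, positive)
    d_neg = 1.0 - cosine_similarity(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def retrieve(query, candidates, top_k=1):
    """Return indices of the top_k candidates most similar to the query."""
    ranked = sorted(
        range(len(candidates)),
        key=lambda i: cosine_similarity(query, candidates[i]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy stand-ins for encoder outputs (hypothetical values).
image_vec = [1.0, 0.0]
caption_vecs = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]

best = retrieve(image_vec, caption_vecs, top_k=1)
loss_good = triplet_loss(image_vec, caption_vecs[1], caption_vecs[2])
loss_bad = triplet_loss(image_vec, caption_vecs[2], caption_vecs[1])
```

In this toy setup the matching caption is ranked first, and the loss is zero when the positive is already closer to the anchor than the negative by at least the margin. In a dual-encoder system the vectors would come from the image and text encoders rather than being written by hand.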

Files in this item

Name: 4941.pdf
Size: 2.134Mb
Format: PDF
Description: Full Text

This item appears in the following Collection(s)

Scopus Indexed Publications Collection [2197]

Related items

PREDICTING PATH LOSS DISTRIBUTIONS OF A WIRELESS COMMUNICATION SYSTEM FOR MULTIPLE BASE STATION ALTITUDES FROM SATELLITE IMAGES

Real frequency design of Pi and T matching networks with complex terminations

Modified Q-Based Real Frequency Design of Narrowband Impedance Equalizer with Complex Terminations