Multimodal retrieval with contrastive pretraining
Abstract
In this paper, we present multimodal data retrieval aided by contrastive pretraining. Our approach is to pretrain a contrastive network that assists in multimodal retrieval tasks. We work with multimodal data consisting of image and caption (text) pairs. We present a dual-encoder deep neural network whose image encoder and text encoder map images and text into representation vectors, which are then used for similarity-based retrieval. The image encoder is a 2D convolutional network, and the text encoder is a recurrent neural network (Long Short-Term Memory). The MS-COCO 2014 dataset, which contains both images and captions, is used for multimodal training with triplet loss. Before the dual-encoder training, we use a convolutional Siamese network to compute the similarities between images (contrastive pretraining). The motivation is that Siamese networks can aid retrieval, and we seek to show whether they can be used in practice. Finally, we evaluate the performance of Siamese-assisted retrieval with the BLEU score metric. We conclude that Siamese networks can help with image-to-text retrieval tasks. © 2021 IEEE.
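To make the triplet-loss and similarity-based retrieval steps concrete, the following is a minimal sketch in plain Python. It is not the paper's implementation: the abstract does not state the similarity function, the margin, or the embedding size, so cosine similarity, a margin of 0.2, the two-dimensional toy vectors, and the function names `cosine_similarity`, `triplet_loss`, and `retrieve` are all assumptions chosen for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on cosine similarity (assumed form).

    The loss is zero once the anchor is more similar to the positive
    than to the negative by at least `margin`.
    """
    sim_pos = cosine_similarity(anchor, positive)
    sim_neg = cosine_similarity(anchor, negative)
    return max(0.0, margin - sim_pos + sim_neg)


def retrieve(query, candidates, k=1):
    """Return the indices of the k candidates most similar to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy embeddings standing in for encoder outputs (hypothetical values).
image_vec = [1.0, 0.0]
caption_vecs = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]

# Image-to-text retrieval: rank captions by similarity to the image vector.
best = retrieve(image_vec, caption_vecs, k=1)

# Loss for a well-separated triplet (matching caption vs. unrelated caption).
loss = triplet_loss(image_vec, caption_vecs[1], caption_vecs[0])
```

In a trained dual encoder, `image_vec` and `caption_vecs` would come from the image and text encoders rather than being written by hand, and the loss would be minimized over many (image, matching caption, non-matching caption) triplets.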