Universitas Hasanuddin
Research output: Contribution to journal › Article › peer-review

Image search optimization with web scraping, text processing and cosine similarity algorithms

Ridwang

2020 IEEE International Conference on Communication Networks and Satellite (Comnetsat 2020) Proceedings

Published: 2020
Citations: 9

Abstract

Searching for images on the web is limited to text keywords, in the form of image file names, entered into search engines such as Google, Yahoo, and Bing, so the results contain many variations of the image. Advances in information retrieval and text processing are expected to make image search more specific to the text keywords entered. Web scraping can extract more detailed information about an image from its source website, down to the image's metadata. The text collected by web scraping is then processed with text processing and cosine similarity algorithms to produce information relevant to the image being sought. The method reaches 90% accuracy for general image data with a search limit of up to 20 images, but only 25% accuracy for specific images at the same limit of 20 images. Two factors affect image search accuracy: a very large image limit, and a very specific query or image name, for which search engines return fewer relevant images. With this image search method, it is expected that truly relevant, high-quality images can be found and downloaded for use as training data in image classification.
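The abstract does not describe the implementation, so the following is only a minimal sketch of the scrape, text-processing, and cosine-similarity steps, under several stated assumptions: the scraped text for each image is its `alt` and `title` attributes plus its file name; terms are weighted by raw term frequency; the stopword list is a tiny hypothetical one; and the page is parsed from an HTML string rather than fetched live. The names `ImageMetadataParser`, `preprocess`, `cosine_similarity`, and `rank_images` are illustrative, not the author's.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "with", "to", "is"}


class ImageMetadataParser(HTMLParser):
    """Collect each <img> tag's src plus the text found in alt, title and file name."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        src = a.get("src") or ""
        file_name = src.rsplit("/", 1)[-1]
        # filter(None, ...) drops missing (None) or empty attribute values.
        text = " ".join(filter(None, [a.get("alt"), a.get("title"), file_name]))
        self.images.append({"src": src, "text": text})


def preprocess(text):
    """Lowercase, split on non-alphanumeric characters, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(html, query, limit=20):
    """Score every scraped image against the query and return the top `limit` (score, src) pairs."""
    parser = ImageMetadataParser()
    parser.feed(html)
    parser.close()
    query_vec = Counter(preprocess(query))
    scored = [
        (cosine_similarity(query_vec, Counter(preprocess(img["text"]))), img["src"])
        for img in parser.images
    ]
    # Sort by score only; ties keep their order of appearance on the page.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


if __name__ == "__main__":
    page = """
    <img src="/img/red_apple.jpg" alt="A red apple on a table">
    <img src="/img/car.png" alt="Blue sports car">
    <img src="/img/apple_tree.jpg" title="Apple tree in autumn">
    """
    for score, src in rank_images(page, "red apple"):
        print(f"{score:.3f} {src}")
```

For the query "red apple", the red-apple image ranks first (about 0.894), the apple tree second (about 0.447), and the car last with a score of 0. The default `limit=20` mirrors the 20-image search limit mentioned in the abstract. Raw term frequencies keep the sketch short; TF-IDF weighting is a common alternative, but the abstract does not say which weighting the author used.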

Fingerprint

Computer science
Information retrieval
Cosine similarity
Image retrieval
Image (mathematics)
Metadata
Process (computing)
Automatic image annotation
Image processing
Search engine
Image file formats
Similarity (geometry)
Artificial intelligence
Pattern recognition (psychology)
World Wide Web
Operating system