Image search optimization with web scraping, text processing and cosine similarity algorithms
Ridwang
2020 IEEE International Conference on Communication Networks and Satellite (Comnetsat 2020), Proceedings
Abstract
Image search on the web is currently limited to text keywords, in the form of image file names, entered in search engines such as Google, Yahoo and Bing, so the results contain many variations of the image. Advances in information retrieval and text processing are expected to make image search more specific to the keywords entered. Web scraping can extract more detailed information, down to the metadata, from an image source on a website. The text data produced by web scraping is then processed with text processing and cosine similarity algorithms to produce information relevant to the image being sought. The results reach 90% accuracy for general image data with a search limit of up to 20 images. For specific images, accuracy reaches only 25% at a limit of 20 images. Two factors affect the accuracy of image search: a very large image limit and a very specific query or image name, so that search engines return fewer relevant images. This image search method is expected to make it possible to find and download truly relevant, high-quality images for use as training data in image classification.
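The pipeline outlined in the abstract (scrape text attached to images, process it, score it against the query with cosine similarity) can be illustrated with a minimal sketch. This is not the author's implementation: the names `ImgTextParser`, `tokenize`, `cosine_similarity` and `rank_images`, the use of `alt` and `title` attributes as image metadata, the raw term-count weighting, and the sample HTML are all assumptions made for illustration. The abstract does not state which metadata fields, preprocessing steps, or term weights were used.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class ImgTextParser(HTMLParser):
    """Collect (src, descriptive text) pairs from <img> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            a = dict(attrs)
            # Assumption: alt and title text stand in for the scraped metadata.
            text = " ".join(filter(None, [a.get("alt"), a.get("title")]))
            self.images.append((a.get("src") or "", text))


def tokenize(text):
    # Lowercase and keep alphanumeric runs; a stand-in for fuller text processing.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    # Cosine of the angle between the two term-count vectors.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_images(html, query, limit=20):
    # Score every image's text against the query and keep the top `limit`.
    parser = ImgTextParser()
    parser.feed(html)
    scored = [(cosine_similarity(query, text), src) for src, text in parser.images]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Made-up page fragment, used only to exercise the functions above.
SAMPLE_HTML = """
<img src="cat1.jpg" alt="Persian cat sitting on a sofa">
<img src="car.jpg" alt="Red sports car" title="car">
<img src="cat2.jpg" alt="cat">
"""

results = rank_images(SAMPLE_HTML, "persian cat")
for score, src in results:
    print(f"{score:.3f}  {src}")
```

In this sketch the `limit` parameter plays the role of the image limit mentioned in the abstract. Raw term counts are the simplest possible weighting; a TF-IDF weighting could be substituted without changing the cosine step.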