Enhanced Vision Transformer and Image Inpainting for Cataract Stage Classification in Telemedicine
Bustan M.A.
International Conference on New Media Studies (Conmedia)
Abstract
Cataracts are one of the leading causes of blindness worldwide. In Indonesia, cataracts are highly prevalent and account for more than 80% of blindness cases. Early and accurate staging is crucial for treatment planning. However, telemedicine applications that rely on smartphone-based imaging face two challenges: low image quality and specular reflections caused by the reflective, curved surface of the cornea. These reflections can reduce diagnostic accuracy. Vision Transformer (ViT) models have recently shown promising results in medical image analysis, but their performance is often degraded by image noise. This study proposes (1) an image inpainting method that reduces specular reflections and (2) an enhanced ViT that uses local window attention (LWA) and region of interest (ROI) masking to strengthen feature representation and focus the analysis on relevant ocular areas. We evaluated this approach by classifying the disease stage of cataract-affected eye images captured with a smartphone camera. The proposed method achieves an accuracy of 0.98 and a macro F1 score of 0.98, significantly outperforming the baseline model. These results indicate that combining image inpainting with a ViT enhanced by LWA and ROI masking can improve the reliability of cataract stage classification in telemedicine.
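As a rough illustration of how local window attention and ROI masking can work together, the sketch below restricts self-attention to non-overlapping square windows of ViT patch tokens and ignores patches that fall outside an ocular ROI. It is not the authors' implementation. The function name `local_window_attention`, the square non-overlapping windows, the boolean ROI mask, and the use of the raw tokens as queries, keys, and values (single head, no learned projections) are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; -inf entries become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def local_window_attention(tokens, roi_mask, window=4):
    """Hypothetical sketch of window-restricted self-attention with an ROI mask.

    tokens:   (H, W, D) grid of patch embeddings.
    roi_mask: (H, W) boolean array, True for patches inside the ocular ROI.
    window:   side length of the square, non-overlapping windows
              (assumption: H and W are divisible by it).
    Returns an (H, W, D) array; patches outside the ROI are set to zero.
    """
    H, W, D = tokens.shape
    assert H % window == 0 and W % window == 0, "grid must tile into windows"
    out = np.zeros_like(tokens, dtype=float)
    scale = 1.0 / np.sqrt(D)

    for i in range(0, H, window):
        for j in range(0, W, window):
            # Flatten one window into a short sequence of tokens.
            x = tokens[i:i + window, j:j + window].reshape(-1, D)
            m = roi_mask[i:i + window, j:j + window].reshape(-1)
            if not m.any():
                continue  # window lies entirely outside the ROI

            # Attention is computed only among tokens of this window.
            scores = (x @ x.T) * scale
            scores[:, ~m] = -np.inf  # keys outside the ROI get zero weight
            attn = softmax(scores, axis=-1)

            y = attn @ x
            y[~m] = 0.0  # queries outside the ROI produce no output
            out[i:i + window, j:j + window] = y.reshape(window, window, D)

    return out


# Example: an 8x8 patch grid with a central 4x4 ROI.
rng = np.random.default_rng(0)
grid = rng.normal(size=(8, 8, 16))
roi = np.zeros((8, 8), dtype=bool)
roi[2:6, 2:6] = True
features = local_window_attention(grid, roi, window=4)
print(features.shape)  # (8, 8, 16)
```

Masking keys inside each window, rather than cropping the image to the ROI, keeps the token grid a fixed shape, which is convenient when the ROI size and position vary from one smartphone photo to the next.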