Abstract:
This research focuses on the development of a device to assist doctors in level 1 and 2 healthcare clinics in Mexico, which lack specialists. The device takes pictures of the patient's skin, and these pictures are used to identify diseases and provide a preliminary diagnosis. With this pre-diagnosis, the patient can be referred to the corresponding specialist. We built the device using a Vision Transformer (ViT) model and a Raspberry Pi 4. The system leverages a dataset augmented with synthetic images generated by a Generative Adversarial Network (GAN) using Stable Diffusion. The addition of synthetic data improved the performance metrics: accuracy increased from 90.76% to 92.77%, and the macro-average and weighted-average F1 scores increased from 0.9076 to 0.9281. Improvements were also observed in most disease categories. These results indicate that the synthetic data improves the model's ability to generalize, especially for underrepresented or challenging classes.
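
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy, macro-average F1, and weighted-average F1 can be computed from true and predicted labels. It is not the evaluation code used in this work; the function name `f1_summary` and the disease labels in the usage note are hypothetical and chosen only for illustration.

```python
from collections import Counter


def f1_summary(y_true, y_pred):
    """Return (accuracy, macro F1, weighted F1, per-class F1).

    Illustrative helper only; the name and signature are assumptions.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true samples per class

    per_class = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        per_class[label] = 2 * tp / denom if denom else 0.0

    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Macro average: every class counts equally, regardless of size.
    macro_f1 = sum(per_class.values()) / len(labels)
    # Weighted average: each class counts in proportion to its support.
    weighted_f1 = sum(per_class[l] * support[l] for l in labels) / len(pairs)
    return accuracy, macro_f1, weighted_f1, per_class
```

As a usage example with made-up labels, `f1_summary(["acne"] * 4 + ["eczema"] * 2, ["acne", "acne", "acne", "eczema", "eczema", "acne"])` gives a macro F1 of 0.625 and a weighted F1 of about 0.667, because the larger class has the higher per-class score. The two averages coincide when every class has the same number of test samples.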