Density-Based Clustering to Deal with Highly Imbalanced Data in Multi-Class Problems

Munguía Lira, Julio Cesar; Rendón Lara, Eréndira; Alejo Eleuterio, Roberto; Granda Gutiérrez, Everardo Efrén; Del Razo López, Federico

URI: http://hdl.handle.net/20.500.11799/139897

Date: 2023-09-21

Abstract:

In machine learning and data mining applications, an imbalanced distribution of classes in the training dataset can drastically affect the performance of learning models. The class imbalance problem is frequently observed in real-world classification tasks, when one class has far fewer available instances than the others. Machine learning algorithms that do not account for class imbalance can become strongly biased towards the majority class, while the minority class is usually neglected. Sampling techniques, mainly based on random undersampling and oversampling, have therefore been used extensively to overcome class imbalance. However, there is still no definitive solution, especially for multi-class problems. This work studies a strategy that combines density-based clustering algorithms with random undersampling and oversampling techniques. To analyze the performance of the studied method, an experimental validation was carried out on a collection of hyperspectral remote sensing images, with a deep learning neural network as the classifier. This data bank contains six datasets with different imbalance ratios, from slight to severe. In the experiments, the studied method outperformed other state-of-the-art methods in classification performance, measured by the geometric mean of the precision, mainly on highly imbalanced datasets.
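
The abstract does not spell out the exact procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a DBSCAN-style core-point test (at least `min_pts` neighbours within radius `eps`) to discard low-density points in each class, followed by random undersampling or oversampling of every class to the median class size. The function names, the median target, and the default values of `eps` and `min_pts` are all hypothetical choices made for illustration. A small helper for the geometric mean of per-class scores is included as well, assuming the metric is taken over one score per class.

```python
import numpy as np


def core_point_mask(X, eps, min_pts):
    """True where a point has at least min_pts neighbours (itself included) within eps."""
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return (dists <= eps).sum(axis=1) >= min_pts


def density_resample(X, y, eps=0.5, min_pts=3, seed=0):
    """Drop low-density points per class, then randomly resample each class to one size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = int(np.median(counts))  # assumed target size, not taken from the paper
    X_parts, y_parts = [], []
    for c in classes:
        Xc = X[y == c]
        dense = Xc[core_point_mask(Xc, eps, min_pts)]
        pool = dense if len(dense) > 0 else Xc  # keep the class if the filter empties it
        # Sampling without replacement undersamples; with replacement it oversamples.
        idx = rng.choice(len(pool), size=target, replace=len(pool) < target)
        X_parts.append(pool[idx])
        y_parts.append(np.full(target, c))
    return np.vstack(X_parts), np.concatenate(y_parts)


def geometric_mean(per_class_scores):
    """Geometric mean of one score per class; a single zero score gives zero."""
    scores = np.asarray(per_class_scores, dtype=float)
    return float(np.prod(scores) ** (1.0 / len(scores)))


if __name__ == "__main__":
    # Synthetic three-class data with a 200 : 20 : 5 imbalance.
    rng = np.random.default_rng(42)
    X = np.vstack([
        rng.normal(0.0, 0.3, (200, 2)),
        rng.normal(3.0, 0.3, (20, 2)),
        rng.normal(-3.0, 0.3, (5, 2)),
    ])
    y = np.array([0] * 200 + [1] * 20 + [2] * 5)
    Xb, yb = density_resample(X, y)
    print(np.unique(yb, return_counts=True))
```

Because the geometric mean multiplies the per-class scores, a classifier that ignores one minority class entirely scores zero, which is why this kind of measure is often preferred over plain accuracy on imbalanced data.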

Description:

Article on a method for handling class imbalance

Files in this item

Name: mathematics-11-04 ...

Size: 381.3Kb

Format: PDF

Description: Article on a ...

This item appears in the following collection(s)

Científica [35]

Document View

Title
Density-Based Clustering to Deal with Highly Imbalanced Data in Multi-Class Problems
Authors
Munguía Lira, Julio Cesar
Rendón Lara, Eréndira
Alejo Eleuterio, Roberto
Granda Gutiérrez, Everardo Efrén
Del Razo López, Federico
Publication date
2023-09-21
Publisher
Mathematics
Document type
Article
Keywords
density-based clustering
sampling methods
deep neural networks