Classification on imbalanced data sets, taking advantage of errors to improve performance

Cervantes Canales, Jair; García Lamont, Farid; LOPEZ CHAU, ASDRUBAL


dc.contributor.author	Cervantes Canales, Jair
dc.contributor.author	García Lamont, Farid
dc.contributor.author	LOPEZ CHAU, ASDRUBAL
dc.creator	Cervantes Canales, Jair; 101829
dc.creator	García Lamont, Farid; 216477
dc.creator	LOPEZ CHAU, ASDRUBAL; 100664
dc.date.accessioned	2016-05-11T16:14:29Z
dc.date.available	2016-05-11T16:14:29Z
dc.date.issued	2015
dc.identifier.isbn	978-3-319-22052-9
dc.identifier.issn	0302-9743
dc.identifier.uri	http://hdl.handle.net/20.500.11799/41187
dc.description.abstract	Classification methods usually perform poorly when applied to imbalanced data sets. To overcome this problem, several algorithms have been proposed in the last decade. Most of them generate synthetic instances to balance the data set, regardless of the classification algorithm. These methods work reasonably well in most cases; however, they tend to cause over-fitting. In this paper, we propose a method to address the imbalance problem. Our approach, which is very simple to implement, works in two phases. The first phase detects instances that are difficult for classification methods to predict correctly. These instances are then categorized as “noisy” or “secure”, where the former are instances most of whose nearest neighbors belong to the opposite class. The second phase generates a number of synthetic instances for each instance that is difficult to predict correctly. After applying our method to the data sets, the AUC of the classifiers improves dramatically. We compare our method with other state-of-the-art methods on more than 10 data sets.	es
dc.language.iso	eng	es
dc.publisher	Springer	es
dc.relation.ispartofseries	10.1007/978-3-319-22053-6_8;
dc.rights	openAccess
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0
dc.subject	Imbalanced	es
dc.subject	Classification	es
dc.subject	Synthetic instances	es
dc.subject.classification	Engineering and Technology
dc.title	Classification on imbalanced data sets, taking advantage of errors to improve performance	es
dc.type	Book chapter
dc.provenance	Scientific
dc.road	Green
dc.ambito	International	es
dc.audience	students
dc.audience	researchers
dc.type.conacyt	bookPart
dc.identificator	7
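
The two-phase procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which classifier detects the hard instances, how many neighbors are used, or how synthetic instances are built. The sketch assumes a leave-one-out nearest-centroid classifier for detection, k = 3 neighbors for the "noisy"/"secure" split, and SMOTE-style interpolation toward the nearest same-class instance for generation. All function names and parameters are hypothetical.

```python
import math
import random
from collections import Counter

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def centroid_predict(X, y, i):
    """Leave-one-out nearest-centroid prediction for instance i.
    Assumption: stand-in for whatever classifier the paper uses."""
    sums, counts = {}, Counter()
    for j, (xj, yj) in enumerate(zip(X, y)):
        if j == i:
            continue
        counts[yj] += 1
        acc = sums.setdefault(yj, [0.0] * len(xj))
        for d, v in enumerate(xj):
            acc[d] += v
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
    return min(centroids, key=lambda c: dist(X[i], centroids[c]))

def k_nearest(X, i, k):
    """Indices of the k instances closest to instance i (excluding i)."""
    others = [j for j in range(len(X)) if j != i]
    return sorted(others, key=lambda j: dist(X[i], X[j]))[:k]

def find_hard_instances(X, y, k=3):
    """Phase 1: keep misclassified instances and label each one 'noisy'
    (most of its k nearest neighbors are of another class) or 'secure'."""
    hard = {}
    for i in range(len(X)):
        if centroid_predict(X, y, i) != y[i]:
            opposite = sum(1 for j in k_nearest(X, i, k) if y[j] != y[i])
            hard[i] = "noisy" if opposite > k / 2 else "secure"
    return hard

def oversample(X, y, hard, per_instance=2, seed=0):
    """Phase 2: add per_instance synthetic points for every hard instance.
    Assumption: interpolate toward the nearest same-class instance."""
    rng = random.Random(seed)
    new_X, new_y = [], []
    for i in hard:
        same = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        if not same:
            continue  # no same-class partner to interpolate with
        mate = min(same, key=lambda j: dist(X[i], X[j]))
        for _ in range(per_instance):
            t = rng.random()
            new_X.append([a + t * (b - a) for a, b in zip(X[i], X[mate])])
            new_y.append(y[i])
    return X + new_X, y + new_y

# Toy data: six majority points (class 0), four minority points (class 1),
# one of which sits inside the majority cluster.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [2.0, 0.0],
     [5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [1.5, 0.5]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

hard = find_hard_instances(X, y, k=3)
X_new, y_new = oversample(X, y, hard, per_instance=2)
print(hard)                 # {9: 'noisy'}
print(len(X), len(X_new))   # 10 12
```

In this sketch the "noisy"/"secure" label is only recorded; the abstract does not state whether the two categories are treated differently during generation, so both receive the same number of synthetic instances here.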

Files in this item

Name: ICIC2015_2.pdf
Size: 287.3Kb
Format: PDF

Name: ICIC2015_2.docx
Size: 218.1Kb
Format: Microsoft Word 2007

Name: ICIC2015_2.epub
Size: 177.3Kb
Format: application/epub+zip

This item appears in the following collection(s)

Conacyt [10019]
Book Chapters [110]
