ERN: Emerging Researchers National Conference in STEM


Development of a Novel Semi-Supervised Based Kernel Density Clustering Algorithm

Undergraduate #109
Discipline: Technology and Engineering
Subcategory: Electrical Engineering

Saul B. Henderson - University of the District of Columbia
Co-Author(s): Keenan Leatham, Nian Zhang, Lara Thompson



Although semi-supervised learning has had great success in many machine learning and data mining applications, its importance for imbalanced data sets has received very limited attention in the community. Classifying such data is difficult because of its unbounded size and imbalanced nature. Minority samples occur rarely but are extremely important, and misclassifying them carries an overwhelming cost. It is therefore critical to develop a highly efficient algorithm that alleviates class overlap in the low-dimensional mapping and thereby improves classification accuracy. Inspired by the semi-supervised learning mode, the proposed clustering algorithm uses external (side) information from context classes (i.e., backgrounds or confounders), in addition to intrinsic information from the object class (i.e., targets to be recognized), to partition data into clusters. This intra-class clustering (ICC) approach partitions each class into sub-classes in order to minimize overlap across clusters from different classes. The new semi-supervised kernel density clustering algorithm consists of a principal component analysis (PCA) step, a difference-of-density estimation step, and a gradient ascent step. First, the data are projected into a lower-dimensional space using PCA, which ensures that the subsequent kernel density estimation can be computed efficiently for high-dimensional data. Then, the difference-of-density estimation step performs Gaussian kernel density estimation on both the object class and the context class and takes the difference between the two density estimates. Based on this analysis, a density clustering algorithm has been developed. It uses the density function as a map and assigns examples on the same "mountain" (local peak) to the same cluster.
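The PCA and difference-of-density steps described above can be illustrated with a minimal sketch. This is not the authors' Matlab implementation: it is a NumPy approximation that assumes an isotropic Gaussian kernel with a single hand-picked bandwidth, and the function names (`pca_project`, `gaussian_kde`, `difference_of_density`) are hypothetical.

```python
import numpy as np


def pca_project(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:n_components]
    return (X - mean) @ components.T, mean, components


def gaussian_kde(points, samples, bandwidth):
    """Evaluate an isotropic Gaussian kernel density estimate at `points`.

    Assumption: one shared bandwidth for all dimensions; the abstract does
    not say how the bandwidth is chosen.
    """
    d = samples.shape[1]
    sq_dist = ((points[:, None, :] - samples[None, :, :]) ** 2).sum(axis=2)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2)).mean(axis=1) / norm


def difference_of_density(points, object_samples, context_samples, bandwidth):
    """Object-class density minus context-class density at each point."""
    return (gaussian_kde(points, object_samples, bandwidth)
            - gaussian_kde(points, context_samples, bandwidth))
```

In this sketch, both classes would be projected with the same mean and components before calling `difference_of_density`, so that the two density estimates live in the same low-dimensional space. Positive values mark regions dominated by the object class, and negative values mark regions dominated by the context class.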
Once the density clustering step is complete, the data in each new cluster share one local maximum on the difference-of-density map. To find the local maximum for each sample in the object class, the gradient of the difference-of-density map is calculated by finite differences. A modified Levenberg-Marquardt algorithm is then applied to the gradient to iteratively locate the local maximum. Once a local maximum has been found for each example, all examples sharing the same local maximum are assigned the same subclass label. A kernel-based least-squares support vector machine (LS-SVM) is designed as the classifier. Its performance is compared with that of a traditional quadratic classifier on both real-world photo-thermal infrared (IR) imaging spectroscopy (PT-IRIS) data and an olfactory database. Experimental results show that the proposed algorithm not only separates an arbitrary data distribution into non-overlapping unimodal clusters but also utilizes intervening context data distributions to further separate the clusters. The findings demonstrate that the proposed approach performs efficiently in applications where class-conditional densities are significantly non-Gaussian or multi-modal. [This study was supported by a grant from the University of the District of Columbia (NSF/HBCU-UP/ HRD #1505509, HRD #1533479, and NSF/DUE #1654474), Washington, D.C. 20008]
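The peak-finding and subclass-labeling procedure can be sketched as follows. Note the substitution: the abstract uses a modified Levenberg-Marquardt iteration, whose details are not given, so this sketch swaps in a plain normalized-gradient ascent with step halving. The function names, the merge radius, and the step settings are all assumptions made for illustration.

```python
import numpy as np


def dod(x, obj, ctx, h):
    """Difference of isotropic Gaussian KDEs (object minus context) at point x."""
    def kde(samples):
        sq = ((samples - x) ** 2).sum(axis=1)
        d = samples.shape[1]
        return np.exp(-sq / (2 * h * h)).mean() / (2 * np.pi * h * h) ** (d / 2)
    return kde(obj) - kde(ctx)


def fd_gradient(f, x, eps=1e-4):
    """Central finite-difference gradient of the scalar function f at x."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad


def climb(f, x0, step=0.05, max_iter=500, tol=1e-6):
    """Climb f from x0 to a nearby local maximum.

    Stand-in for the modified Levenberg-Marquardt search: move a fixed
    distance along the gradient direction and halve the step on failure.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        g = fd_gradient(f, x)
        norm = np.linalg.norm(g)
        if norm < tol:
            break
        candidate = x + step * g / norm
        if f(candidate) > f(x):
            x = candidate
        else:
            step *= 0.5
            if step < 1e-5:
                break
    return x


def assign_subclasses(maxima, merge_radius=0.25):
    """Give the same label to examples whose maxima lie within merge_radius."""
    labels = -np.ones(len(maxima), dtype=int)
    peaks = []
    for i, m in enumerate(maxima):
        for k, p in enumerate(peaks):
            if np.linalg.norm(m - p) < merge_radius:
                labels[i] = k
                break
        else:
            # No existing peak is close enough, so start a new subclass.
            peaks.append(m)
            labels[i] = len(peaks) - 1
    return labels, np.array(peaks)
```

A typical use would run `climb` once per object-class sample on `lambda x: dod(x, obj, ctx, h)` and pass the stacked end points to `assign_subclasses`. The merge radius is needed because numerical ascent from different starting points ends near the same peak, not exactly on it.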

Funder Acknowledgement(s): National Science Foundation (NSF/HBCU-UP/ HRD #1505509, HRD #1533479, and NSF/DUE #1654474)

Faculty Advisor: Nian Zhang, nzhang@udc.edu

Role: I contributed to the Matlab implementation of the principal component analysis (PCA) step, the difference-of-density estimation step, and the gradient ascent step.

This material is based upon work supported by the National Science Foundation (NSF) under Grant No. DUE-1930047. Any opinions, findings, interpretations, conclusions or recommendations expressed in this material are those of its authors and do not represent the views of the AAAS Board of Directors, the Council of AAAS, AAAS’ membership or the National Science Foundation.

© 2023 American Association for the Advancement of Science