
Paper accepted at ICDM 2025 workshop LLM4Cluster

  • Writer: Jillian Aurisano
  • Oct 7, 2025
  • 1 min read


Our goal in this paper is to process collections of short texts that are labeled by topic and to identify two things: groups of items that may be better characterized by new labels, and individual items that may be mislabeled. When a labeled text collection is clustered on the syntactic and semantic content of its items, we expect the items within each cluster to share the same label. Our approach embeds the short texts in a lower-dimensional space and then, using spatial label entropy as a guide, finds spatial clusters that are contiguous but show wide diversity in their label assignments. We then use LLMs to process these label-impure clusters and discover new labels for them. We demonstrate promising results from this approach on two different collections of short texts.
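
To make the idea of label entropy over an embedding concrete, here is a minimal sketch. It is not the paper's implementation: it assumes that "spatial label entropy" can be approximated by the Shannon entropy of the labels among each point's k nearest neighbors, and the function name `knn_label_entropy`, the toy 2-D points, the labels, and the threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

def knn_label_entropy(points, labels, k):
    """Shannon entropy (in bits) of the labels among each point's k nearest
    neighbors. The point itself is included, since its distance is 0.
    Assumption: this is one plausible reading of 'spatial label entropy'."""
    entropies = []
    for p in points:
        # Rank every point by Euclidean distance to p and keep the k closest.
        order = sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))
        counts = Counter(labels[j] for j in order[:k])
        entropies.append(-sum((c / k) * math.log2(c / k) for c in counts.values()))
    return entropies

# Toy 2-D "embedding": one tight region with a single label and one tight
# region whose items carry four different labels.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (10.0, 10.0), (10.1, 10.0), (10.0, 10.1), (10.1, 10.1)]
labels = ["sports", "sports", "sports", "sports",
          "politics", "tech", "health", "travel"]

entropies = knn_label_entropy(points, labels, k=4)

# Hypothetical threshold: items in high-entropy neighborhoods are candidates
# for relabeling, for example by passing their texts to an LLM.
impure = [i for i, h in enumerate(entropies) if h > 1.0]
```

In this toy setup the single-label region has entropy 0 and the mixed region has entropy 2 bits, so only the four items in the mixed region are flagged. In practice the points would come from a text embedding followed by dimensionality reduction, and the neighborhood size and threshold would need tuning.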




University of Cincinnati
