Handling categorical features with many levels using a product partitioning model.

Project: Research Project

Project Details

Description

We represent the categorical predictor by a graph where the nodes are the categories and we establish a probability distribution over significant partitions of this graph.

Conditional on the observed data, we obtain a posterior distribution over the aggregation of levels, which lets us infer the most probable grouping of the categories. We also draw inferences about all the other parameters of the regression model.
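As a rough illustration of a probability distribution over graph partitions, the sketch below enumerates every partition of a small graph whose blocks are connected and assigns each a product-form prior, proportional to the product of a cohesion value per block. The project description does not state the graph structure, the cohesion function, or the computational algorithm, so the path graph, the constant cohesion of 0.5, and all function names here are assumptions made for illustration only. Brute-force enumeration is feasible only for a handful of categories.

```python
from math import prod


def set_partitions(items):
    """Yield every partition of a list as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Either `first` forms its own block ...
        yield [[first]] + partition
        # ... or it joins one of the existing blocks.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def is_connected(block, edges):
    """Return True if `block` induces a connected subgraph."""
    block = set(block)
    seen, stack = set(), [next(iter(block))]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for a, b in edges:
            if a == node and b in block:
                stack.append(b)
            elif b == node and a in block:
                stack.append(a)
    return seen == block


def partition_prior(nodes, edges, cohesion):
    """Product-form prior: p(partition) is proportional to the product of cohesions."""
    partitions = [
        p for p in set_partitions(list(nodes))
        if all(is_connected(block, edges) for block in p)
    ]
    weights = [prod(cohesion(block) for block in p) for p in partitions]
    total = sum(weights)
    return [(p, w / total) for p, w in zip(partitions, weights)]


# Hypothetical example: four categories arranged on a path graph A-B-C-D.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D")]

# Assumed cohesion: the same constant for every block, so partitions
# with fewer blocks (more aggregation) receive more prior mass.
prior = partition_prior(nodes, edges, cohesion=lambda block: 0.5)

for partition, probability in sorted(prior, key=lambda item: -item[1]):
    print(partition, round(probability, 4))
```

A posterior over groupings would reweight each partition by the marginal likelihood of the data within its blocks. That step depends on the regression model and is not shown here.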

We compare our method with state-of-the-art approaches and show that it achieves equally good predictive performance while producing more interpretable results.

Our approach balances accuracy against interpretability, a current major concern in statistics and machine learning.
Status: Finished
Effective start/end date: 3/1/22 → 7/1/22

UN Sustainable Development Goals

In 2015, UN member states agreed to 17 global Sustainable Development Goals (SDGs) to end poverty, protect the planet and ensure prosperity for all. This project contributes towards the following SDG(s):

  • SDG 3 - Good Health and Well-being

Main Funding Source

  • Installed Capacity (Academic Unit)

Location

  • Bogotá D.C.
