Variable selection using nonparametric estimators for mixed data

About Research in Groups

This project investigates the scalability of nonparametric regression methods when very large amounts of mixed data, consisting of continuous, categorical, and functional components, are observed, such as gene expressions changing during drug therapy. A key aspect of the project is an adaptive variable selection step, which chooses relevant features and suitable embeddings of the functional features into appropriate Hilbert or metric spaces. The benefits of this approach are threefold: the variable selection allows a better understanding of the problem at hand; the embedding of the functional features makes it possible to understand how they contribute to the regression task (e.g. by highlighting that the relevance of one such feature comes from the form of its derivative); and the adaptivity makes the method usable by non-experts. Additionally, theoretical results underpin the applicability of the method and provide global error control. Selk’s expertise in working with functional data nicely complements Sell’s in scalable nonparametric methodology.