Projects

Course Projects

tOSU

Slice Sampling: Theory and Methods, with Applications in R Report Slides Codes
Comparative Analysis of Dimensionality Reduction Techniques for Image Recognition (in R) Report Codes
Effect of economic advancement on deforestation Report Codes

IIT Kanpur

A Brief Review of Sparse Principal Components Analysis and its Generalization Report Slides Codes

Abstract
Principal Component Analysis (PCA) is a widely studied and useful technique for dimension reduction. In this report, we discuss Sparse Principal Component Analysis (SPCA), a modification of PCA. SPCA addresses the interpretability problem of PCA and, in addition, provides sparse loadings for the principal components. The main idea of SPCA comes from the relationship between the PCA problem and regression analysis. We also discuss GAS-PCA, a generalization of SPCA that performs better than SPCA, even in finite samples. Our report is mainly based on Zou et al. (2006) and its extension by Leng and Wang (2009).
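The regression view of PCA that SPCA builds on can be checked numerically in a few lines. This is a minimal sketch, assuming NumPy and simulated data. The data and the helper names `leading_pc`, `ridge_loading` and `elastic_net_loading` are hypothetical and are not taken from the report's code. The sketch verifies that ridge-regressing the first principal component score on X and normalizing the coefficients recovers the first loading vector, as shown in Zou et al. (2006). It then adds an L1 penalty, solved here by plain coordinate descent, to obtain a sparse loading. It covers only this single-component step, not the full alternating SPCA algorithm or GAS-PCA.

```python
import numpy as np

def leading_pc(X):
    """First PC score Z1 = X v1 and loading v1 of a column-centred X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, 0] * s[0], Vt[0]

def ridge_loading(X, z, lam):
    """Ridge coefficients of z on X (penalty lam > 0), normalised to unit length."""
    p = X.shape[1]
    beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ z)
    return beta / np.linalg.norm(beta)

def soft_threshold(a, t):
    return np.sign(a) * max(abs(a) - t, 0.0)

def elastic_net_loading(X, z, lam2, lam1, n_sweeps=500):
    """Minimise ||z - X b||^2 + lam2 ||b||^2 + lam1 ||b||_1 by coordinate descent,
    then normalise b (returns zeros if every coefficient is zero)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = z.astype(float).copy()              # current residual z - X b
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * b[j]             # remove coordinate j from the fit
            rho = X[:, j] @ r
            b[j] = soft_threshold(rho, lam1 / 2) / (col_sq[j] + lam2)
            r -= X[:, j] * b[j]             # put the updated coordinate back
    norm = np.linalg.norm(b)
    return b / norm if norm > 0 else b

# Hypothetical data: one latent factor drives the first three of six variables.
rng = np.random.default_rng(0)
n, p = 200, 6
f = rng.standard_normal(n)
X = rng.standard_normal((n, p))
X[:, :3] += f[:, None]
X -= X.mean(axis=0)

z1, v1 = leading_pc(X)
v_ridge = ridge_loading(X, z1, lam=10.0)
v_sparse = elastic_net_loading(X, z1, lam2=10.0, lam1=50.0)
print(np.round(v1, 3))
print(np.round(v_sparse, 3))
```

The ridge step returns v1 for any positive penalty. Only the L1 term introduces exact zeros, which is what makes the loadings easier to interpret.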
Nonparametric Kernel Density Estimation for the Metropolis-Hastings Algorithm Report Slides Codes

Abstract
In this report, we discuss how the rejection step of the Metropolis-Hastings algorithm affects kernel density estimation. We elaborate on the theory developed by Roberts et al. (2003) by providing detailed proofs, and we explore applications that illustrate the efficiency of these estimators in various problems.
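The following minimal sketch makes the setting concrete. It assumes NumPy and a standard normal target chosen purely for illustration. The function names `rw_metropolis` and `gaussian_kde`, the step size and the bandwidth rule are hypothetical choices, and this is not the estimator analysed by Roberts et al. (2003). A random-walk Metropolis-Hastings chain repeats its current state after every rejection, so the sample passed to an ordinary Gaussian kernel density estimator contains exact duplicates. The sketch counts those duplicates and evaluates the estimate at one point.

```python
import numpy as np

def rw_metropolis(log_target, x0, n_steps, step, rng):
    """Random-walk Metropolis-Hastings; returns the chain and the number of rejections."""
    chain = np.empty(n_steps)
    x, lp = x0, log_target(x0)
    rejections = 0
    for t in range(n_steps):
        y = x + step * rng.standard_normal()
        lp_y = log_target(y)
        if np.log(rng.uniform()) < lp_y - lp:
            x, lp = y, lp_y              # accept the proposal
        else:
            rejections += 1              # reject: the chain repeats its current value
        chain[t] = x
    return chain, rejections

def gaussian_kde(sample, points, h):
    """Plain Gaussian KDE at `points` with bandwidth h; duplicated states count as-is."""
    u = (points[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(sample) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(1)
chain, rejections = rw_metropolis(lambda x: -0.5 * x ** 2, 0.0, 50_000, 2.4, rng)
h = 1.06 * chain.std() * len(chain) ** (-1 / 5)     # Silverman-type rule of thumb
est = gaussian_kde(chain, np.array([0.0]), h)[0]
print("acceptance rate:", 1 - rejections / len(chain))
print("KDE at 0:", est, "true density:", 1 / np.sqrt(2 * np.pi))
```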
Spectral Clustering: Theory and Applications Report Codes

Abstract
In this report, we present a class of popular clustering algorithms called spectral clustering algorithms. We introduce the graph-theoretic notation needed for the rest of the report. We discuss similarity graphs and graph Laplacians, along with their important properties. We then present three popular spectral clustering algorithms. We also discuss how to choose the number of clusters, the similarity function, the similarity graph, and the graph Laplacian. We then look at spectral clustering from several different perspectives. Finally, we apply spectral clustering to simulated and real-life datasets. This report is primarily based on von Luxburg (2007).
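The unnormalized spectral clustering algorithm described in von Luxburg (2007) can be sketched as follows. The sketch assumes NumPy, a fully connected similarity graph with Gaussian similarity, and a small hand-written k-means. The bandwidth `sigma`, the farthest-point initialisation, the helper names and the toy data are illustrative choices and are not taken from the report's code. The steps are: build W, form L = D - W, take the eigenvectors of the k smallest eigenvalues, and cluster the rows of that eigenvector matrix.

```python
import numpy as np

def kmeans(Y, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    labels = np.zeros(len(Y), dtype=int)
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_clustering(X, k, sigma=1.0):
    """Unnormalised spectral clustering on a fully connected Gaussian similarity graph."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / (2 * sigma ** 2))       # Gaussian similarity
    np.fill_diagonal(W, 0.0)                 # no self-loops
    L = np.diag(W.sum(axis=1)) - W           # unnormalised graph Laplacian D - W
    _, vecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    U = vecs[:, :k]                          # eigenvectors of the k smallest eigenvalues
    return kmeans(U, k)                      # cluster the rows of U

# Hypothetical data: two well-separated Gaussian blobs of 50 points each.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])
labels = spectral_clustering(X, k=2, sigma=1.0)
print(labels)
```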
Efficient High-dimensional Robust Variable Selection via Rank-based LASSO Methods Report Slides Codes

Abstract
Penalized variable selection is a popular approach for describing the relationship between the response and the explanatory variables. LASSO-based methods have received special attention throughout the regression literature, but they impose stringent conditions on this relationship and on the error distribution. In this report, we present Rank-LASSO as a robust method that improves on the standard LASSO and can be used even when the number of predictors is much larger than the sample size. The main properties of Rank-LASSO are presented in a non-asymptotic fashion, which makes them useful in this high-dimensional setting. The report also shows the advantages of a thresholded, modified version of Rank-LASSO in more general scenarios. Apart from theoretical results, we present numerical experiments demonstrating that Rank-LASSO performs substantially better than the standard LAD-LASSO for robust model selection. The report is primarily based on Rejchel and Bogdan (2020).
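The core of Rank-LASSO, as described in Rejchel and Bogdan (2020), is to replace the responses by their ranks and then run the ordinary LASSO. The following minimal sketch rests on several assumptions: NumPy, standardized predictors, centred normalized ranks R_i/n - 1/2 as the new response, a plain coordinate-descent LASSO solver, and a simple hard threshold standing in for the thresholded variant. The function names, tuning values and simulated data are all hypothetical. Because only the ranks of y enter the fit, any strictly increasing transformation of the response leaves the estimate unchanged, which is one source of the method's robustness.

```python
import numpy as np

def centred_ranks(y):
    """Ranks R_i of y (assumes no ties), rescaled to R_i / n - 1/2."""
    n = len(y)
    ranks = np.empty(n)
    ranks[np.argsort(y)] = np.arange(1, n + 1)
    return ranks / n - 0.5

def lasso_cd(X, r, lam, n_sweeps=200):
    """Minimise (1/2n)||r - X b||^2 + lam ||b||_1 by coordinate descent.
    Assumes each column of X is centred with (1/n) * sum(x_ij^2) = 1."""
    n, p = X.shape
    b = np.zeros(p)
    resid = r.copy()
    for _ in range(n_sweeps):
        for j in range(p):
            rho = X[:, j] @ resid / n + b[j]          # partial-residual correlation
            new = np.sign(rho) * max(abs(rho) - lam, 0.0)
            resid -= X[:, j] * (new - b[j])
            b[j] = new
    return b

def rank_lasso(X, y, lam, threshold=0.0):
    """Hypothetical Rank-LASSO sketch: LASSO on centred ranks, then zero out
    coefficients with |b_j| <= threshold."""
    b = lasso_cd(X, centred_ranks(y), lam)
    b[np.abs(b) <= threshold] = 0.0
    return b

rng = np.random.default_rng(3)
n, p = 100, 200                                   # more predictors than observations
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)          # standardise the predictors
y = np.exp(X[:, 0] + X[:, 1]) + rng.standard_cauchy(n)   # monotone link, heavy tails
b = rank_lasso(X, y, lam=0.05, threshold=0.01)
print("selected predictors:", np.flatnonzero(b))
```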
Understanding Nonparametric Modal Regression via Kernel Density Estimation Report Slides Codes

Abstract
In this report, we review nonparametric modal regression using kernel density estimation. Instead of the conditional mean, modal regression uses the conditional mode to summarize the relationship between the response and the explanatory variables. We describe the idea of modal regression and briefly discuss the advantages of multimodal regression over the unimodal case. We review the consistency properties of the proposed estimator and the idea of confidence sets. The report also includes an application of prediction sets to bandwidth selection. Certain generalizations and extensions are also discussed. The report is primarily based on Chen et al. (2016).
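The following minimal sketch illustrates the idea of reading off conditional modes from a kernel density estimate. It assumes NumPy, Gaussian kernels, a single predictor, hand-picked bandwidths `hx` and `hy`, and hypothetical two-branch data. The function name `conditional_modes` is also hypothetical. For a fixed x, setting the y-derivative of the product-kernel estimate of the joint density to zero gives the update y ← Σ w_i Y_i / Σ w_i, with w_i = K((x - X_i)/hx) K((y - Y_i)/hy). Iterating this "partial mean shift" from several starting points converges to the local modes of the estimated conditional density of Y given X = x.

```python
import numpy as np

def conditional_modes(x0, X, Y, hx, hy, starts, tol=1e-8, max_iter=500):
    """Partial mean shift in y at fixed x0 with Gaussian kernels."""
    wx = np.exp(-0.5 * ((x0 - X) / hx) ** 2)        # kernel weights in x (fixed)
    modes = []
    for y in np.asarray(starts, dtype=float):
        for _ in range(max_iter):
            w = wx * np.exp(-0.5 * ((y - Y) / hy) ** 2)
            y_new = np.sum(w * Y) / np.sum(w)       # weighted mean of the responses
            converged = abs(y_new - y) < tol
            y = y_new
            if converged:
                break
        modes.append(y)
    # Merge starting points that converged to the same mode.
    merged = []
    for m in sorted(modes):
        if not merged or m - merged[-1] > 1e-3:
            merged.append(m)
    return merged

# Hypothetical data: two regression branches, Y = X + 2 or Y = X - 2, plus noise.
rng = np.random.default_rng(4)
n = 500
X = rng.uniform(-1, 1, n)
branch = rng.integers(0, 2, n)
Y = X + np.where(branch == 1, 2.0, -2.0) + 0.3 * rng.standard_normal(n)
modes = conditional_modes(0.0, X, Y, hx=0.3, hy=0.3, starts=[-3, -1, 1, 3])
print(modes)
```

At x = 0 the conditional mean is close to 0, where almost no data lie, while the two conditional modes track the two branches. This is the situation in which multimodal regression is more informative than a single summary curve.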
Understanding Confidence Intervals in Adaptive Markov Chain Monte Carlo Report Codes

Abstract
In this report, we attempt to understand the problems in asymptotic variance estimation for Adaptive Markov Chain Monte Carlo (AMCMC) and the role of confidence intervals in providing consistent estimation procedures for the asymptotic variance. The report is primarily based on Atchadé (2012).
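To show the quantity being estimated, the sketch below computes a lag-window (Bartlett kernel) estimate of the asymptotic variance of a sample mean, together with the resulting normal-theory 95% confidence interval. This is a minimal illustration built on several assumptions: NumPy, a non-adaptive AR(1) chain whose asymptotic variance 1/(1 - φ)^2 is known in closed form, and a bandwidth of about √n. The names `bartlett_asymptotic_variance` and `ar1_chain` are hypothetical. The Bartlett estimator is one standard choice and is not necessarily the procedure analysed in the report. The adaptive setting, in which the transition kernel changes over time, is not reproduced here.

```python
import numpy as np

def bartlett_asymptotic_variance(x, b):
    """Lag-window estimate sum_{|k|<b} (1 - |k|/b) * gamma_k of the asymptotic variance."""
    n = len(x)
    xc = x - x.mean()
    sigma2 = xc @ xc / n                        # lag-0 autocovariance
    for k in range(1, b):
        gamma_k = xc[:-k] @ xc[k:] / n          # lag-k autocovariance
        sigma2 += 2.0 * (1.0 - k / b) * gamma_k
    return sigma2

def ar1_chain(phi, n, rng):
    """AR(1) chain x_t = phi * x_{t-1} + e_t with standard normal innovations."""
    e = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = e[0] / np.sqrt(1 - phi ** 2)         # start in the stationary distribution
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

rng = np.random.default_rng(5)
phi, n = 0.5, 200_000
x = ar1_chain(phi, n, rng)
b = int(np.sqrt(n))
sigma2_hat = bartlett_asymptotic_variance(x, b)
half_width = 1.96 * np.sqrt(sigma2_hat / n)     # approximate 95% interval for the mean
print("estimate:", sigma2_hat, "true:", 1 / (1 - phi) ** 2)
print("interval:", (x.mean() - half_width, x.mean() + half_width))
```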
Ozone concentration and meteorology in the LA Basin, 1976 - A Regression Study Report Slides Codes

Details
Performed Exploratory Data Analysis on the Ozone (LA Basin, 1976) dataset to understand the role of meteorological variables in predicting ozone concentration.
Diagnosed multicollinearity, heteroscedasticity, non-normality, and autocorrelation with appropriate tests and took corrective measures for each, developing three parametric predictive models.
Implemented the Alternating Conditional Expectation (ACE) algorithm to build a nonparametric model that improved R² by 8% and reduced RMSE by 62% relative to the best of the three parametric models.
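The ACE idea can be sketched as follows. The algorithm alternates between two steps. First, it estimates transformations φ_j of the predictors by backfitting. Second, it estimates a standardized transformation θ of the response. Each conditional expectation is replaced by a scatterplot smoother. This is a minimal sketch built on several assumptions: NumPy, a crude running-mean smoother over a fixed number of neighbours in sorted order (the original ACE uses a more adaptive smoother), a fixed number of iterations, and simulated data. The names `running_mean_smoother` and `ace` are hypothetical, and nothing here comes from the project's code or the Ozone dataset.

```python
import numpy as np

def running_mean_smoother(x, v, k=10):
    """Estimate E[v | x] by averaging v over up to 2k+1 neighbours in x-order."""
    n = len(x)
    order = np.argsort(x, kind="stable")
    csum = np.concatenate([[0.0], np.cumsum(v[order])])
    idx = np.arange(n)
    lo = np.clip(idx - k, 0, n)
    hi = np.clip(idx + k + 1, 0, n)
    out = np.empty(n)
    out[order] = (csum[hi] - csum[lo]) / (hi - lo)
    return out

def ace(X, y, n_outer=50, k=10):
    """Return theta(y), the phi_j(x_j) columns, and R^2 = 1 - E(theta - sum phi)^2."""
    n, p = X.shape
    theta = (y - y.mean()) / y.std()
    phi = np.zeros((n, p))
    for _ in range(n_outer):
        for j in range(p):                           # one backfitting sweep
            partial = theta - phi.sum(axis=1) + phi[:, j]
            phi[:, j] = running_mean_smoother(X[:, j], partial, k)
            phi[:, j] -= phi[:, j].mean()
        theta = running_mean_smoother(y, phi.sum(axis=1), k)   # E[sum phi | y]
        theta = (theta - theta.mean()) / theta.std()           # keep Var(theta) = 1
    fitted = phi.sum(axis=1)
    r2 = 1.0 - np.sum((theta - fitted) ** 2) / np.sum(theta ** 2)
    return theta, phi, r2

# Hypothetical data where a straight-line fit is useless: y = x^2 + noise.
rng = np.random.default_rng(6)
n = 500
X = rng.uniform(-1, 1, (n, 1))
y = X[:, 0] ** 2 + 0.05 * rng.standard_normal(n)
theta, phi, r2_ace = ace(X, y)
slope, intercept = np.polyfit(X[:, 0], y, 1)
resid = y - (slope * X[:, 0] + intercept)
r2_linear = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
print("linear R^2:", r2_linear, "ACE R^2:", r2_ace)
```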

Arkajyoti Bhattacharjee
