GSoC'23 Weeks 1 & 2: Adding a demo and BISTs to ridge and implementing lasso regression
This blog post summarizes the work I did in weeks 1 and 2.
Understanding the structure and codebase of Octave
Octave is a high-level interpreted language for numerical analysis and statistical computing. Its core functionality can be extended with packages: more than 100 are available, and once loaded into the interpreter they can be used directly for numerical computation and statistical analysis. Octave's syntax is largely compatible with MATLAB, with minor differences.
For my project, I will be contributing to Octave's statistics package. Functions are an integral part of Octave, as they give users ready access to the tools they need. Every function in the package is a .m file inside the package folder, and any new function must be added as a separate file named after the function. A complete implementation of a function contains the following:
License: User agreement and other legal information.
Texinfo / Docstring: Help text describing how to use the function and what its input and output variables are. This serves as a guide to the function's usage and its parameter constraints.
Code: This block contains the actual code of the function.
Demo: This block contains examples that demonstrate how the function is used.
Output BISTs: Built-In Self Tests are run by GNU Octave's internal tester to check that the function returns the expected output for a given input. They are important because they indicate whether the function behaves as expected.
Input BISTs: These BISTs test the function's parameter checks. A well-written function returns an error for an invalid argument. These tests range from checking the number of input arguments to checking the size and value of variables.
Together, these parts make up a complete function. BISTs help in checking the integrity of the code.
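The parts listed above can be sketched as a single function file. This is only an illustration of the layout: the function name center_columns, its behaviour and its error message are hypothetical and not part of the statistics package, and the license header is left as a placeholder.

```octave
## Copyright (C) YEAR AUTHOR
## (placeholder: the full license text goes here)

## -*- texinfo -*-
## @deftypefn {statistics} {@var{z} =} center_columns (@var{x})
##
## Subtract the column means from the numeric matrix @var{x}.
##
## @end deftypefn

function z = center_columns (x)
  ## Parameter checks: these are what the input BISTs exercise.
  if (nargin < 1)
    print_usage ();
  endif
  if (! isnumeric (x))
    error ("center_columns: X must be numeric.");
  endif
  ## Actual code of the function.
  z = x - mean (x, 1);
endfunction

## Demo block
%!demo
%! x = [1 2; 3 4; 5 6];
%! z = center_columns (x)

## Output BIST: known input, expected output
%!test
%! z = center_columns ([1 2; 3 4; 5 6]);
%! assert (z, [-2 -2; 0 0; 2 2]);

## Input BIST: an invalid argument must raise the expected error
%!error <center_columns: X must be numeric.> center_columns ("abc")
```

Saved as center_columns.m, the BISTs would be run with `test center_columns` and the demo with `demo center_columns`.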
Adding Demos and BISTs to the ridge function
I implemented a function ridge that performs ridge regression: given the predictor matrix X and the response vector y, it returns the vector of coefficient estimates. Each column of the result corresponds to the respective ridge parameter in k. Previously, I added a demo using the acetylene dataset to observe how the ridge coefficient estimates change. But first I had to fix an issue in ridge, which was returning wrong values of b when unscaling. The error was caused by element-wise multiplication being used instead of matrix multiplication; replacing the operator fixed the wrong values.
if (unscale)
  b = b ./ repmat (stdx', 1, nk);
  ## b = [mean(y) - m.*b; b];   ## old line: element-wise product (removed)
  b = [mean(y) - m*b; b];       ## new line: matrix product
endif
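To see why the matrix product is the right operator in the fix above, here is a small sketch with made-up numbers. It assumes m is the 1-by-p row of predictor means and b is p-by-nk, as the repmat call suggests; the values themselves are purely illustrative.

```octave
## Hypothetical sizes: p = 3 predictors, nk = 2 ridge parameters.
m    = [1 2 3];            # column means of X, 1-by-p
b    = [1 4; 2 5; 3 6];    # unscaled slopes, p-by-nk
ybar = 10;                 # mean of the response

## m * b sums over the predictors, giving one intercept per ridge parameter.
intercept = ybar - m * b;  # 1-by-nk, here [-4 -22]
b_full = [intercept; b];   # (p+1)-by-nk: intercept row on top of the slopes
disp (size (b_full));
```

With the element-wise operator, m .* b does not perform that sum over predictors, so it cannot produce the 1-by-nk intercept row that the vertical concatenation needs.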
Adding a demo: I wrote another demo that uses the carbig dataset to predict MPG from the Acceleration, Weight, Displacement and Horsepower values.
load carbig ## loading the dataset
X = [Acceleration Weight Displacement Horsepower];
y = MPG;
Next, we split the data into training and test sets, estimate the ridge coefficients on the training set, and plot the predictions obtained from those coefficients and the test-set values of X.
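A minimal sketch of the split-and-predict step could look like the following. It is not the demo from the repository: the every-fifth-row split, the ridge parameter k = 5 and the RMSE summary are assumptions made for illustration, and it relies on ridge (y, X, k, 0) returning the intercept followed by the slopes on the original scale.

```octave
pkg load statistics
load carbig

X = [Acceleration Weight Displacement Horsepower];
y = MPG;

## Drop rows with missing values, in case the dataset contains any.
ok = ! any (isnan ([X y]), 2);
X = X(ok, :);
y = y(ok);

## Assumed deterministic split: every fifth row goes to the test set.
n = rows (X);
idx_test  = 5:5:n;
idx_train = setdiff (1:n, idx_test);

k = 5;                                            # assumed ridge parameter
b = ridge (y(idx_train), X(idx_train, :), k, 0);  # [intercept; slopes]

## Predict on the test set and summarize the error.
y_hat = [ones(numel (idx_test), 1), X(idx_test, :)] * b;
rmse  = sqrt (mean ((y(idx_test) - y_hat) .^ 2));
printf ("test RMSE: %g\n", rmse);
```

Plotting y(idx_test) against y_hat then gives a quick visual check of the fit.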
Code in the GitHub repo of GNU Octave
Implementing lasso regression in Octave
Lasso (Least Absolute Shrinkage and Selection Operator) regression is very similar to ridge regression; the only difference is the penalty term in the loss function.
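In ridge regression, the loss function is
$$L_{\text{ridge}}(\beta) = \sum_{i=1}^{N} \left(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$
In lasso regression, the squared penalty is replaced by an absolute-value penalty:
$$L_{\text{lasso}}(\beta) = \sum_{i=1}^{N} \left(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\right)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$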
Elastic Net combines the penalties of ridge regression and lasso to get the best of both worlds. It aims to minimize the following loss function:
$$L_{\text{elastic net}}(\beta) = \sum_{i=1}^{N} \left(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\right)^2 + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^2$$
Writing λ₁ = αλ and λ₂ = (1 − α)λ, α is the mixing parameter between ridge (α = 0) and lasso (α = 1).
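One common way to minimize the lasso loss is cyclic coordinate descent with soft-thresholding. The sketch below illustrates that technique on synthetic data; it is not necessarily the algorithm used in the statistics package, and the data, the value of lambda and the fixed iteration count are all assumptions.

```octave
## Synthetic data (assumed): only predictors 1 and 3 influence y.
randn ("state", 1);
N = 100;  p = 4;
X = randn (N, p);
y = 3 * X(:, 1) - 2 * X(:, 3) + 0.1 * randn (N, 1);

## Center the data so the intercept drops out of the coordinate updates.
Xc = X - mean (X, 1);
yc = y - mean (y);

lambda = 20;                  # assumed penalty weight
b = zeros (p, 1);
for iter = 1:200              # fixed number of sweeps, no convergence test
  for j = 1:p
    ## Partial residual with predictor j removed from the current fit.
    r   = yc - Xc * b + Xc(:, j) * b(j);
    rho = Xc(:, j)' * r;
    ## Soft-thresholding; lambda / 2 because the squared loss has no 1/2 factor.
    b(j) = sign (rho) * max (abs (rho) - lambda / 2, 0) / (Xc(:, j)' * Xc(:, j));
  endfor
endfor

b0 = mean (y) - mean (X, 1) * b;   # recover the intercept
disp ([b0; b]');
```

Unlike the ridge penalty, the absolute-value penalty can set coefficients exactly to zero, which is why the update uses a threshold rather than a pure rescaling.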