GSoC'23 Project: Enhancement of Statistics Package in GNU Octave
Addition of kNN and GAMs to the Statistics Package of GNU Octave
I am Azmat Khan from Jabalpur, India, currently majoring in Electronics at MITS Gwalior.
I am thrilled to share that I will be contributing to the statistics package of GNU Octave this summer.
The aim of this project is to enhance the statistics package of Octave, which is under heavy development. The package still lacks a lot of basic functionality required to perform various statistical analyses. Adding GAMs (Generalized Additive Models) will let users model complex relationships and make better predictions while keeping the results easy to interpret, whereas kNN is a fundamental algorithm for basic classification and regression tasks. These additions will enhance the utility of Octave and help make it usable as a primary tool for data science and statistical modelling.
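To make the kNN idea concrete, here is a minimal sketch of k-nearest-neighbour classification: find the k training points closest to a query point and take a majority vote over their labels. It is written in plain Python purely for illustration and is not the Octave implementation planned for this project; the function name `knn_predict`, the Euclidean metric, and the toy data are all assumptions made for this example.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query.

    Hypothetical helper for illustration only; it is not the package's API.
    """
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k nearest neighbours
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbours' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up): two well-separated clusters labelled "a" and "b"
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))
print(knn_predict(train_points, train_labels, (5.1, 5.0)))
```

This brute-force version compares the query against every training point, which is fine for small data sets. A kd-tree search, like the one listed in the timeline, is one common way to avoid scanning all points.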
I hope to inspire more people to take up open source development and to create a positive change in the community through this project.
My fork of the repository can be found here, where you can check out the progress of the project. Weekly updates and the project's overall progress will be documented on my blog.
The timeline from my proposal, with updates:
WEEK / TIME PERIOD | PROPOSED TASKS |
Week 0: May 4 - May 28 (Community Bonding Period) | Fixing bugs and adding missing functionality to the statistics package; getting familiar with the codebase of Octave and its packages; understanding the structure of Octave; identifying the missing and pre-existing functionality in the package. |
Week 1: May 29 - June 4 | a. Implementing the missing functionality. b. Adding BISTs (built-in self-tests) and a demo to the ridge function. |
Week 2: June 5 - June 12 | a. Implementing the lasso function for lasso regression. b. Adding BISTs and a demo for lasso. |
Week 3: June 13 - June 23 | a. Improving the knnsearch function with different distance metrics. b. Adding a kd-tree search method. c. Adding BISTs and demos to knnsearch. d. Rough implementation of knnpredict for predicting the labels of query points from input data. |
Week 4: June 24 - July 4 | a. Implementing the knnpredict function. b. Adding BISTs and a demo for knnpredict. c. Implementing a classdef for classificationKNN. d. Implementing fitcknn to fit values into the |
Week 5: July 5 - July 10 | a. Optimizing the implemented functions by timing individual parts of the code, vectorising, using Octave's built-in functions, and clearing large matrices. b. Using profiler-guided optimization to improve runtime performance. c. Buffer period for any pending tasks. |
Week 6: | a. Implementing the missing functionality. b. Fixing bugs and adding the functionality that the GAM implementation depends on. |
Weeks 7 & 8: | a. Rough implementation of GAM regression. b. Implementation of a GAM regression function from input data. |
Week 9: | a. Adding BISTs and a demo to the GAM regression function. b. Optimizing the implemented functions by timing individual parts of the code, vectorising, using Octave's built-in functions, and clearing large matrices. |
Week 10: | a. Additional tasks. b. Adding further functions essential for the statistics package. |
The timeline will be further updated.
Andreas Bertsatos and Nicholas R. Jankowski will be mentoring me throughout this project. I feel incredibly fortunate to be working with two such highly qualified people, who have a lot of experience to learn from.
In this series of blog posts I will be sharing and documenting my whole GSoC journey, so stay tuned if you are interested in the progress of this project.
Suggestions and feedback on the project are highly appreciated :)