GSoC'23 Project: Enhancement of Statistics Package in GNU Octave

Addition of kNN and GAMs to the Statistics Package of GNU Octave

I am Azmat Khan from Jabalpur, India, currently majoring in Electronics at MITS Gwalior.

I am exhilarated to share that I will be contributing to the Statistics package of GNU Octave this summer.

The aim of this project is to enhance the Statistics package of Octave, which is under heavy development. The package still lacks much of the basic functionality needed to perform various statistical analyses. Adding GAMs (Generalized Additive Models) will let users model complex relationships and make better predictions while keeping the results easy to interpret, whereas kNN (k-nearest neighbors) is a fundamental algorithm for basic classification and regression tasks. Together, these features will make Octave more useful and help it serve as a primary tool for data science and statistical modelling.
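To give a feel for the idea behind kNN classification, here is a minimal sketch in Python (not Octave, and not the API of the Statistics package). The function name `knn_predict`, its parameters, and the sample data are hypothetical and chosen only for illustration; the sketch assumes Euclidean distance and a simple majority vote among the k nearest training points.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query.

    Illustrative only: assumes Euclidean distance and an unweighted vote.
    """
    if len(train_points) != len(train_labels):
        raise ValueError("train_points and train_labels must have the same length")
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")

    # Pair every training label with its distance to the query point.
    scored = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort by distance only, then keep the labels of the k closest points.
    scored.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in scored[:k]]

    # The predicted class is the most frequent label among those neighbours.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters labelled "a" and "b".
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.0), (6.5, 8.0)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.2, 1.4), k=3))  # prints "a"
print(knn_predict(points, labels, (6.4, 7.0), k=3))  # prints "b"
```

A real implementation would also need to handle ties, other distance metrics, and faster neighbour search, which is where the knnsearch and kdtree items in the timeline come in.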

I hope to inspire more people to take up open-source development and to create a positive change in the community through this project.

My fork of the repository can be found here, where you can check out the progress of the project. Weekly updates and the project's overall progress will be documented in my blog.

The timeline from my proposal, updated:

| Week / time period | Proposed tasks |
| --- | --- |
| Week 0: May 4 - May 28 | Community Bonding Period: fixing bugs and adding missing functionalities to the statistics package; getting familiar with the codebase of Octave and its packages; understanding the structure of Octave; identifying the missing and pre-existing functionalities in the package. |
| Week 1: May 29 - June 4 | a. Implementing the missing functionalities. b. Adding BISTs and a demo to the ridge function. |
| Week 2: June 5 - June 12 | a. Implementing the lasso function for lasso regression. b. Adding BISTs and a demo for lasso. |
| Week 3: June 13 - June 23 | a. Improving the knnsearch function with different distance metrics. b. Adding the kdtree search method. c. Adding BISTs and demos to knnsearch. d. Rough implementation of knnpredict for predicting labels from input data for query points. |
| Week 4: June 24 - July 4 | a. Implementing the knnpredict function. b. Adding BISTs and a demo for knnpredict. c. Implementing a classdef for classificationKNN. d. Implementing fitcknn to fit values into the |
| Week 5: July 5 - July 10 | a. Optimizing the implemented functions by timing individual parts of the code, vectorising, using Octave's built-in functionality, and clearing large matrices. b. Using profiler-guided optimization to improve runtime performance. c. Buffer period for any pending tasks. |
| Week 6 | a. Implementing the missing functionalities. b. Fixing bugs and adding the functionality that the GAM implementation depends on. |
| Weeks 7 & 8 | a. Rough implementation of GAM regression. b. Implementing the GAM regression function from input data. |
| Week 9 | a. Adding BISTs and a demo to the GAM regression function. b. Optimizing the implemented functions by timing individual parts of the code, vectorising, using Octave's built-in functionality, and clearing large matrices. |
| Week 10 | a. Additional tasks. b. Adding further functions essential for the statistics package. |

The timeline will be further updated.

Andreas Bertsatos and Nicholas R. Jankowski will be mentoring me throughout this project. I feel incredibly fortunate to be working with two such highly qualified and experienced people, from whom I have a lot to learn.

In this series of blog posts I will be sharing and documenting my whole GSoC journey, so stay tuned if you are interested in the progress of this project.

Suggestions and Feedback on the project are highly appreciated : )


GSoC Proposal Github

GSoC Project Page Octave Packages
