Project: Machine Learning models for image classification, object detection and segmentation in Virtual Reality

Several medical procedures in surgery or interventional radiology are recorded as videos that are used for review, training, and quality monitoring. These videos have at least 3 interesting artifacts - (1.) anatomical structures such as organs, tumors, tissues, etc. (2.) medical equipment, and (3.) medical information overlayed that describe the patient. It will be immensely helpful for review and search purposes if these can be identified and automatically labeled in the videos.

The goal of this project is to have machine learning models that can classify a video frame of procedure being carried out, perform object detection and segmentation on video frame for any disease let’s say tumor. These all need to be done in a Virtual Reality environment where the screen will be curved and inference for object detection and segmentation should be according to that.


  • Train a model for classification, detection, and segmentation of frames from a video.
  • These trained models should be used for inference in VR environment in oculus quest 2.

Required Skills:

  • Python, ML Framework (Tensorflow is preferred, Pytorch), Deep Learning (Familiarity with SOTA networks for segmentation and detection)

Beginner Tasks:

  • Train an object detection model and inference on the curved screen.


Sample mockups:

Datasets: [1] Cholec80 ( 1602.03012.pdf ( | Datasets - Research Group CAMMA (

Starter Task:

  • Implement a classification model for the Cholec80 dataset in a VR environment.
  • Bonus: Search for medical imaging datasets for VR environment and prior ML work being done in the medical domain using VR.


  • More details about dataset and task will follow up.

Very cool project! I hope that you get some great students – I will be following your progress!

Hello, @pri2si17, my name is Zhiyao Xie. I am a second-year graduate student at Macau university of science and technology, and my research area is object detection. The framework I’ve been using recently is Pytorch and I’ve designed some modules for Detectron2, so I have some thoughts on instance segmentation tasks. I’m very interested in this project especially since it needs to be done in a Virtual Reality environment where the screen will be curved.

I wanted to get some feedback from the community on some detail information such as some idea datasets which can help to apply GSoC2022. I will be looking at some segmentation models used in the medical field and looking for some features that may help in model training.

It is the first time for me to apply for GSoC. If I can get some feedback from you, I’ll write down in greater detail the proposal.

Hope you can give me some guidance.

Thank you so much!

1 Like

Greetings @pri2si17 and @shbucher

Please call me Tran. I am currently 2nd year bachelor of Tokyo Institute of Technology, will hit 3rd year this April.

I primarily focus on research involving computer vision and computational photography, with latest paper (submitted on iLRN 2022, yet to know accepted or not till April 7) about application of saliency/object detection algorithms for simulated gaze tracking in VR-augmented immersive learning environments, based on a video feed captured from VR user’s experience. These experience tell me that this project strikingly fits my current scope, and I really want to join this.

I would like to start contributing to this project from now on, and if possible, until the very end of GSOC 2022. I have a fair background in linear algebra, mostly used Python and MATLAB recently and is familiar with detection algorithms. I also have an Oculus Quest 2 headset at my disposal (in fact it belongs to my lab, not mine, but the Professor gave me full access anyway). I would like to know more details about this project, and become a part of it from now.

Thank you so much for reading this. Hope to hear more from you soon.

1 Like

Respected @pri2si17 and @shbucher,

My name is Sankalp and I am a first year Master’s student in Computer Science at IIT Hyderabad, India. My institute Research project is based on Deep learning only. Currently, I am also doing an official Deep learning course which uses Pytorch for the assignments. I have worked on Keras framework also. I have worked on a project on semantic segmentation of crop categories as well using SOTA U-Net architecture.

I sincerely believe I’ll be able to contribute to this project. I am ready to start on it rightaway but I’ll need some initial pointers regarding what to do.

Thank you so much for reading this. Hope to hear more from you soon.

1 Like

Hello Tran – Welcome! I am actually not a mentor for this particular project – just keenly following its progress! @pri2si17 is the mentor. Best, Sherri

Hi @TranHuuNhatHuy @sankalp1911 @XZhiYao,

Thanks for your interest in this project. Can you guys please share your prior work with computer vision involving segmentation and object detection? Meanwhile, I will add more details for dataset and how to proceed.

Hi @pri2si17, I am Aryan Verma, currently a student of 3rd year at CS Department, NIT Hamirpur. The project proposed by you is a combination of my two prior projects in which I have worked on the same technology but in different domains. I worked on Laboratory Simulator with labeled machinery parts (Through ML) in Augmented Reality. I prepared an application using Unity 3D and detected the various parts of machinery using custom object detection and used labels to indicate them in AR.

Also recently I published my work in Elsevier’s ‘Computers in Biology and Medicine’ with professors from Emory University, USA, in which I detected and segmented the area of COVID-19 infection from the lung CT using AI. This work also prepared an android application. My two other works involve brain tumor segmentation from MRI and tongue contour delineation from Ultrasounds are also put to possible publication in SCI Journals. Hence, working on AR/VR and medical image analysis is my keen interest and I have not only practiced these tasks but theoretical computer vision concepts. In Image processing, I have proposed a novel modified LBP(Local Binary Pattern) feature which performs better than LBP and its paper is in progress.

Now, About the curved screen, the video transforms will be used for making it reach the neural network in a simplified manner and the positions of the detection markers in the video can be transformed according to that. Even I have worked on making web browsers in virtual reality through which I have gained concepts for making the side screens depicted in the demo image above. These screens will show real-time data. Not even this, but the user can also place these screens according to his/her convenience by dragging them through hand gestures, triggered with ML pipeline.

My skills and prior work express my keen interest and knowledge in this subject and would like to contribute to it this time also.(I was student at GSoC 2021)

Thanks for your time, I hope you would soon give some resources to prove my skills for the project!

Hi @pri2si17

My name is Shivaditya Shivganesh, currently in my 3rd year at VIT Chennai. I have worked on multiple SOTA Object detection and Segmentation algorithms like SoftTeacher and implemented them in pytorch/tensorflow. I have implemented object detection algorithms on endoscopy datasets like Kvasir Dataset.

I have developed multiple VR games using Unity XR toolkit. I have also developed a Tensorflow based object detection application using Unity with inference on board the VR HMD device. I have also designed an endoscopy Image Classifier with Unity VR.

I believe that my past experiences (Participation in Google Summer of Code 21) and knowledge would help me in contributing to this project.

Hello @pri2si17,

I have already given my intro in the message preceding this one. As for my past experience with CV involving segmentation and object detection, I’ve worked on a project involving semantic segmentation where we had to develop a SOTA segmentation model to classify satellite images into 256 crop categories (GitHub - sankalpmittal1911-BitSian/AgriculturalNet-AgNet-: State of the art model for classifying crop categories.). I used U-NET with skip connections to model such a system. Moreover, I have implemented Object Detection (YOLOv5) algorithm both by using keras and TF in deep learning specialization course. Currently, I am also pursuing official course on deep learning.

Eagerly waiting for more details on dataset and steps to proceed.

Thank you!

Hello @pri2si17 :wave:

My name is Muhammed Abdullah and I am an undergraduate from VNIT, Nagpur, India. I am currently studying Object Detection and Semantic Segmentation and doing an implementation of YOLO from scratch. I found this project quite intriguing and would like to be a part. As for my prior work in ML and CV I have worked on Face Recognition project

I am applying to GSoC for 1st time so not getting how to start with anything, any suggestion would be a great help. Is there anything that I need to prepare prior to proposals? I do have quite good experience with python, PyTorch, and open-source contributions. I am eager to learn new skills needed for the project.

Here is my resume to get an idea of my domains and achievements. Resume

hi @pri2si17 Aryan Gulati this side I am an Undergraduate and ML Enthusiast Currently in my 4th year of Bachelor of Technology (B.Tech.) at Guru Gobind Singh Indraprastha University, majoring in Computer Science & Engineering. I am a Learner, and also passionate about Data I’m good with numbers, Analysing data, and building up research strategies. Also, I enjoy reading about research going on new innovations. Also, I had done projects using Opencv of Object detection, face Recognition, etc I Just want to know those frames from a video of some specific diseases or it’s any wound for example bruises and What type of feedback we have to give ?

Dear @pri2si17 @sunbiz judywawira and munjal-vandana , I am interested in this project, and I was wondering if we could discuss about it. My name is Ambra Jin. I am currently studying at Artificial Intelligence at Queen Mary University London and in September I am pursuing my doctoral degree in the CDT AI for Health in UCL.

My first master’s group project required us to segment a catheter in MRI interventional surgeries.The goal of the study was to use different CNNs to construct a 3D model of the catheter in realtime. To segment a medical image, we compared the performance of various convolutional neural networks. I handled statistical data across platforms in this project, using Matlab and Python code to simulate and train data. I additionally preprocessed all of the training data to produce false artifacts, increasing the variability of the data, and then post-processed it to get a parameterised model. In addition, in my second master’s, I improved my skills by using machine learning in the context of image processing in the Machine Learning and Computer Vision modules. I believe my previous experience will help me ahead for this project, and I hope we can have a chat about it. I can send my CV Best wishes Ambra

Please post your questions here. We can also discuss your ideas to implement the project on the LibreHealth Chat -

In GSoC, we don’t look at the CV as much as your proposed approach to complete the project. The goal of the program is to make you long-term open-source contributors. Please come up with a draft proposal and we can help improve the proposal through discussion.

Hello @pri2si17 , my name is surya prakash ,i am a 3rd year graduate student at galgotias university,india. i have worked on projects where i used convulational neural network using Tensorflow . i have knowledge of using different optimizations algorithm like Adam , Rmsprop ,momentum gradient Descent and stochastic gradient descent where i have used activation function like Relu, tanh, sigmoid for object classification project. i always likes work on projects related to healthcare and will be more than happy to work with you all

Hi @all,

So, I have updated with dataset and starter task. Please work on it and provide the updates.

Happy development!!

Hi everyone, hope you are doing good and working on your proposal and starter tasks. Please let us know if you have any queries.

1 Like

Link it in your proposal, not here.

How are things going? You have until 2022-04-19T18:00:00Z to get this finalized. Finalized means the starter task is completed and ready for review. Failure to do the starter task is an immediate rejection. We do occasionally (though rarely) give extensions for the starter task, we have no control over the Google side of things.