
From a million photos to a few million lives- My journey into corporate life and research

Jul 14, 2022
5 min read

Updated: Aug 31, 2022

Foreword

As I walked into my senior year of high school, I found myself asking the question: what do I truly aspire to do as a technologist entrepreneur, and why? After quite a bit of pondering (and driving through the densely forested area near my home), I found my answer with a trip down memory lane.

Starting from Parva Capsula, a smart pill that uses ML to detect biomarkers of colorectal cancer and provide a diagnosis, to AstraCell, a Pete Conrad Scholar-winning biologically composed fuel cell, to ZoroMine, an eco-friendly quantum-computing (AWS Graviton 3.0) powered crypto mining platform, I have always prioritized environmental contribution in my entrepreneurism. I have taken this as a personal mantra, so it made sense to make it such a large piston in my entrepreneurial engine. With my research at HPCC Systems, I intend to home in on giving back to the healthcare and open-source communities.

With my initial headspace out of the way, please allow me to introduce my project. I will be conducting research comparing two popular Machine Learning models, a Neural Network using our GNN bundle and a Boosted Tree, to determine which of the two is the better system for colorectal cancer diagnosis. Alternatively, an ensemble of the two models may yield better results than either one alone. I believe that by completing my research, I will be able to give this vital information back to the healthcare community and push the envelope of AI integration in medicine even further. I will also give back to the HPCC community with insight on imagery analysis that fellow ML engineers can apply to many image use cases.

And thus began my inspiring journey with HPCC Systems and my wonderful mentor and manager, Mr. Bob Foreman and Ms. Lorraine Chapman, who, along with Mr. Roger Dev, made me feel welcome and a part of the HPCC family even in the weeks leading up to my internship start date.

Week 1- My Warm-Up and Warm Welcome

Work-End:

Established office credentials and completed the (industry-recognized) compliance training

Got knee-deep into a variety of tutorials curated (and narrated!) by my excellent mentor Bob Foreman

Verified my decision to use the Boosted Forest

Continued to add to my GitHub with my progress in learning the ECL language

Retrieved my first output by feeding my data into one of the BF models in the ECL repository

Personal-End:

I got into the groove of working as a corporate employee.

I received my first office credentials and completed my first compliance training (a surreal experience)

I attended Mr. Vijay Raghavan's lunch-and-learn session; it was awe-inspiring to hear such wise words and his insightful answers to my questions.

Week 2- Data, Data, Data!

Work-End:

Took a quick detour into Python to get an idea of my ideal output while the GNN bundle underwent surgery of its own

Chose my evaluation metrics of choice for the Boosted Forest model (F1, confusion matrix, precision, and recall)

Fed my DenseNet161 split0 into the DenseNet program (preparing for the GNN rendition as it was up and running)

Worked out the kinks of the ensembled model (utilizing an intermediate Keras model)
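For readers new to the four metrics named above, here is a minimal Python sketch of how they relate to one another. It is not the code from my project: the labels are made up for illustration, and the function names are my own choices rather than anything from the ECL-ML bundles.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Made-up labels for illustration only (not results from the study).
y_true = ["tumor", "tumor", "normal", "normal", "tumor", "normal"]
y_pred = ["tumor", "normal", "normal", "tumor", "tumor", "normal"]

matrix = confusion_matrix(y_true, y_pred, labels=["tumor", "normal"])
precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive="tumor")
```

With these toy labels, the matrix has two correct predictions per class and one mistake in each direction, so precision, recall, and F1 all come out to two thirds.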

Personal-End:

Began to understand how to space out and pace my time when working on a project

Got out of the mindset of procedural programming and began to think like an ML engineer

During Week 3, I attended HOSA's International Leadership Conference to compete in medical public-speaking and business competitions, so I was not in the office. However, I was able to attend Mr. Richard Chapman's talk to the HPCC interns in the weekly lunch-and-share meeting. Hearing his corporate story and having the opportunity to ask him some of my burning questions was an extraordinary moment in my internship.

Week 4- Mapping and Sailing

Work-End:

Final dataset validation

Put together the model run-plan (classification experiments -> data loading -> scripts)

Began the model in Python 3

Completed the DenseNet161 program

Completed the ResNet DenseNet average

First attempt at running the model on an ECL cluster

Research into the confusion matrix and annotations
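The ResNet DenseNet average in the list above can be pictured with a small Python sketch. It assumes each model outputs one probability per class for every image; the numbers are invented and the function names are mine, so please read it as an illustration of the idea and not as my actual scripts.

```python
def average_probabilities(*model_outputs):
    """Average per-class probabilities from several models, image by image."""
    n_models = len(model_outputs)
    averaged = []
    for rows in zip(*model_outputs):  # one row per model, all for the same image
        averaged.append([sum(values) / n_models for values in zip(*rows)])
    return averaged

def argmax(values):
    """Index of the largest value, used as the predicted class."""
    return max(range(len(values)), key=lambda i: values[i])

# Invented outputs for two images and three classes (illustration only).
densenet_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
resnet_probs = [[0.4, 0.5, 0.1], [0.1, 0.3, 0.6]]

ensemble_probs = average_probabilities(densenet_probs, resnet_probs)
ensemble_labels = [argmax(row) for row in ensemble_probs]
```

On the second toy image the two models disagree, and the average settles on the class that the more confident model preferred. That is the appeal of averaging: a confident model can outvote a hesitant one.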

Personal-End:

I began to push the limits of myself and my work ethic.

I was mentally preparing for the hectic next two weeks.

Week 5- Into the Neural Net

Work-End:

Schema for the final ensembled model

Further work on the Neural Network model:

developing the scripts

confusion matrix

Dry-running experiments (to test the order in a semi-modeled environment)

Personal-End:

Working under time pressure

Ensuring material progress daily, despite troubleshooting

Week 6- Into the Neural Net, Act II

Work-End:

Deep research into the relationship between the two models and the coupled intermediate Keras model

How to feed the feature-extracted data from the NN into the BF

(LightGBM, numerical data)

Output- confusion matrix, F1, annotations to convert into a diagnosis

Intermediate Keras Model-

The dataset is the experiments

Compile all data from the four models (DenseNet161, ResNet152, Averaged, Fine-Tuned) and average it into a single, simpler input

conduct repeated experiments and use the aforementioned outputs to transfer
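To make the hand-off from the Neural Network to the Boosted Forest more concrete, here is a rough Python sketch. It is a deliberately tiny, hand-rolled gradient-boosting loop over one-split trees (stumps), standing in for LightGBM. It does not use LightGBM's API, and the feature rows are made-up numbers standing in for what a network's feature extractor might produce. Every name in it is hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Pick the (feature, threshold) split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue  # a split must leave rows on both sides
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_mean, right_mean)
    return best[1:]  # assumes at least one feature has two distinct values

def stump_value(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value

def fit_boosted_stumps(X, y, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the residuals (label minus predicted probability)."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [label - sigmoid(s) for label, s in zip(y, scores)]
        j, threshold, left_mean, right_mean = fit_stump(X, residuals)
        # Store the values already scaled by the learning rate.
        stump = (j, threshold, learning_rate * left_mean, learning_rate * right_mean)
        stumps.append(stump)
        scores = [s + stump_value(stump, row) for s, row in zip(scores, X)]
    return stumps

def predict(stumps, row):
    score = sum(stump_value(stump, row) for stump in stumps)
    return 1 if sigmoid(score) >= 0.5 else 0

# Made-up "extracted features" (two numbers per image) and binary labels.
features = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
labels = [1, 1, 1, 0, 0, 0]

model = fit_boosted_stumps(features, labels)
train_predictions = [predict(model, row) for row in features]
```

The point of the sketch is the shape of the data: once the network has turned each image into a row of numbers, the boosted model only ever sees a plain numerical table, which is the "numerical data" noted above.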

Personal-End:

Reflections back on the first half of this incredible internship

Thinking back to my challenges and how I overcame them

How can I work better or differently in the second half?

And here we reach the halfway milestone in my astoundingly life-changing journey. While reflecting on my personal growth over the program's first half, I identified three main areas that most young engineers, myself included, can learn and grow from in their first workplace experience.

We all have a set of herculean tasks in our careers that we must overcome to step up to the next stage of knowledge and expertise. But it is the internal communication factor, also known as self-motivation, that makes these hurdles appear so tall. The most common hurdle in the Machine Learning field is the initial learning curve. As a high schooler warming into the field myself, I faced this challenge as I began to learn the inner workings and concepts of Machine Learning, Artificial Intelligence, and HPCC Systems. To work through this gargantuan hurdle, I defined three main strategies that beginners in Machine Learning can use to improve their self-motivation.

I. Fostering self-learning within you (research-driven learning)

Offering something back to the community (finding purpose)- As more AI-driven solutions move into the diagnosis space in healthcare, I am confident that my study will offer insight into which model performs better in a highly regulated, HIPAA-compliant industry. I hope this can spark interest in applying the idea to other forms of imagery analysis using HPCC.

I. Inspiring yourself in a space that requires foundational knowledge

Much self-inspiration comes from first establishing an end goal where you'd like to be and working backward to establish a plan of action. Once this is in place, the only requirement is the motivation to move the plan along.

II. Branching into a competitive ML space with no prior experience

The primary goal of entering the ML space is turning technology outcomes into a business outcome with a community purpose. Keeping up a "patience first" mindset and leveraging industry-standard tools such as the expansive ECL-ML library can expedite the learning process and help you develop an understanding of the platform and systems you are utilizing.

III. Building presentation skills in a high-stakes scenario

An effective presentation relies entirely on the clarity of the message that needs to be put across. After establishing this message through a story that lingers throughout your presentation, adding extra flair like a clean presentation deck or graphics can help further elevate your concept and yourself as a presenter.

As I moved into the second leg of my internship, I began to work on my Random Forest model, with a grasp of the development and research process from my weekly learnings in the first half.

Week 7- Trekking Through the Forest

Began development on the random forest model