Diving Deep into Business Analytics with R Programming

When a class is named after your graduation major, one of the most popular disciplines in the world today, you know it’s going to be pivotal in your learning path. Business Analytics with R proved to be just that. The brilliant Dr. Sourav Chatterjee made it clear right at the beginning that R programming would be used only as a tool (which it is) to understand and master the nuances of business analytics. Having said that, his course material left no stone unturned in taking us through every aspect of R programming needed for data science.

I had worked a bit with Java and PHP, but this was my first experience with the R programming language. I started with an introductory course on DataCamp to quickly pick up the very basics of R, like vectors, matrices and data frames. In class, Dr. Chatterjee proved to be a dedicated and patient professor: he started with basic manipulations and sample generation in R and then quickly moved on to the foundations of data analytics. We got familiar with libraries like tidyverse, forecast and gplots, and toyed with data visualization using ggplot on some interesting data sets. We created several plots, graphs, charts and heatmaps before scaling up to larger data sets.
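To give a sense of that kind of exploration, here is a minimal sketch. The class work was in R with ggplot, but this Python version (with the built-in iris data standing in for our class data sets) builds the same kind of correlation heatmap:

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

# Built-in iris data stands in for the data sets we used in class.
iris = load_iris(as_frame=True)
df = iris.frame.drop(columns="target")

# Correlation heatmap of the numeric columns, roughly what we built with ggplot.
sns.heatmap(df.corr(), annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation heatmap (iris)")
plt.tight_layout()
plt.show()
```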

This was followed by some of the most important things a business analyst or data scientist learns in their career. So far, everything had looked pretty straightforward to me, but now it was time to push boundaries and actually dive deep into analytics. I was introduced to dimension reduction, correlation matrices and the all-important analytics task of principal component analysis (PCA). I learnt how to evaluate the performance of models, create lift and decile charts, and perform classification with the help of a confusion matrix, all with just a few lines of code. As Dr. Chatterjee explained time and again, it was never about the code. It was about knowing when and how to use it and what to do with the result.
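For a flavour of those two ideas in code, here is a minimal sketch (the class itself worked in R; this Python version uses a built-in data set as a stand-in): PCA for dimension reduction, followed by a confusion matrix for a simple classifier built on the reduced features.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize, then keep enough principal components to explain 95% of the variance.
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=0.95).fit(scaler.transform(X_train))
Z_train = pca.transform(scaler.transform(X_train))
Z_test = pca.transform(scaler.transform(X_test))
print("Components kept:", pca.n_components_)

# Fit a simple classifier on the reduced data and inspect the confusion matrix.
clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
print(confusion_matrix(y_test, clf.predict(Z_test)))
```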

Dr. Sourav Chatterjee’s BA with R class

We then followed the natural analytics progression with linear and multiple regression, where I learned about partitioning data and generating predictions. This was followed by a thorough understanding of the KNN model and how and when to run it. By now, I was beginning to get the hang of problem statements and the approach to take to solve them, thanks to class assignments on real-world scenarios like employee performance and spam detection. Through the examples done in class, it was easy to grasp the concepts of the R-squared value and the p-value and the roles they play in model evaluation. It was in this class that I understood logistic regression, discriminant analysis and association rules for the first time, and I have been working with them ever since, in every data science course or project that I have taken up.
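The partition-then-predict workflow we practised boils down to a few lines. We did it in R; here is a hedged Python sketch of the same idea with KNN on a built-in data set:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Partition the data into training and holdout sets.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

# Scaling matters for KNN because it is a distance-based model.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)

# Generate predictions on the holdout partition and check accuracy.
print("Holdout accuracy:", round(knn.score(X_test, y_test), 3))
```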

All of this knowledge and Dr. Chatterjee’s guidance were put to use in the final project, where I worked with a group led by the talented Abhishek Pandey on London cabs data. After rigorous work on large data sets downloaded and extracted from various sources, we trained a model to predict arrival times for cabs, comparing RMSE across random forests, logistic regression and SVMs. It was a great way to put into practice everything we had learned over four months.
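Our project code was in R, but the model comparison boiled down to something like the sketch below, with synthetic data standing in for the cab data and generic regressors standing in for the exact models we tuned:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic regression data stands in for the real cab features and arrival times.
X, y = make_regression(n_samples=2000, n_features=8, noise=15.0, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=7)

models = {
    "Random forest": RandomForestRegressor(n_estimators=200, random_state=7),
    "SVM (RBF kernel)": SVR(),
    "Linear regression": LinearRegression(),
}
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    print(f"{name}: RMSE = {rmse:.2f}")
```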

And with that, I had laid a robust foundation in data analytics and was ready to build on it in the time to come. By January 2019, I was confident enough to dive into analytics projects and work on complex data sets to generate prediction models using the tools taught by Dr. Sourav Chatterjee.


This is the second post of my #10DaysToGraduate series where I share 10 key lessons from my Master’s degree in the form of a countdown to May 8, my graduation date.

Saying “Hello, old friend” to Statistics and Analytics

There’s a reason I chose Statistics to be no. 10 and the first one in this countdown. When you want to enter the world of data science, you realize very quickly that you can do nothing without the concepts of statistics being clear in your head. The University of Texas at Dallas obviously understood this and made Statistics and Analytics a core course. So, when I started my Master’s program in Fall 2018, I enrolled in this course with Dr. Avanti Sethi in my very first semester. Dr. Sethi proved to be an excellent teacher, and I am honored to have had the pleasure of knowing and working with him over the past two years.

Photo by Luke Chesser on Unsplash

Thanks to his well-designed lectures and assignments, I was able to build a strong statistical foundation with good practice of basic concepts like measures of central tendency (mean, median, mode) and measures of statistical dispersion (variance, standard deviation, IQR). The course then went on to cover concepts like population, sampling, estimation, z-score, t-score, Normal distribution, hypothesis testing, p-value, chi-square tests, ANOVA tests and regression. Dr. Sethi, who is an Excel ninja, also conducted a separate hands-on session for students interested in learning Advanced Excel and taught us how to build macros. The problem statements in his assignments covered real-life scenarios ranging from sports team performances and automobile dealerships to Halloween sales and manufacturing plant obstacles.
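As a quick refresher on a few of those concepts, here is a small sketch in Python rather than Excel; the numbers are made up, but it walks through central tendency, dispersion and a one-sample t-test with its p-value:

```python
import numpy as np
from scipy import stats

# Made-up sample: weekly sales figures for a hypothetical store.
sales = np.array([212, 199, 245, 230, 221, 208, 237, 226, 214, 241])

# Measures of central tendency and dispersion.
print("mean:", np.mean(sales), "median:", np.median(sales))
print("sample std dev:", round(np.std(sales, ddof=1), 2),
      "IQR:", np.percentile(sales, 75) - np.percentile(sales, 25))

# Hypothesis test. H0: the true mean weekly sales figure is 215.
# A small p-value (say, below 0.05) would argue against H0.
t_stat, p_value = stats.ttest_1samp(sales, popmean=215)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```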

Dr. Sethi’s class in Sep 2018

And just like that, right in the very first semester, Statistics and Analytics had set the ball rolling on my data science journey. I have been going back to Dr. Sethi’s assignments every few months to make sure I don’t forget the very foundations of everything I have learned in analytics so far. It was a memorable semester thanks to this wonderful class, and it left me with a lot of confidence to move forward.

This is the first post of my #10DaysToGraduate series where I share 10 key lessons from my Master’s degree in the form of a countdown to May 8, my graduation date.

AWS Certified Solutions Architect Associate – My First Technical Certification

A few months ago, when I Googled “top IT certifications”, almost every list mentioned AWS Solutions Architect Associate as one of the exams worth taking. It was only a few weeks ago that I decided to actually go ahead with it. To start preparing, I took the 4-Saturday workshop organized by the Computer Science department at The University of Texas at Dallas. At first, the idea was just to add the certification to my resume in order to make myself more employable. But as the course progressed, I was thoroughly impressed by the wide range of services available to users on the AWS cloud. It looked more and more like an area worth pursuing as a career. The hands-on approach of the workshop, conducted by UTD alum Shri Patnaik, also added to my confidence and helped me prepare better. For exam practice, I enrolled in the Udemy practice tests designed by Tutorials Dojo. A few weeks later, I was ready to take the exam.


Amidst the coronavirus pandemic, I was fortunate enough to book a slot at the Richardson PSI center for a date just two days before everything started closing down. The test itself was a bit challenging for me, as some questions ended up taking more time than I had expected. However, the practice tests came in handy, since I had some experience with the type of questions asked and with how to pace myself towards the end. The questions, as expected, covered everything from AWS cloud services to networking concepts to practical considerations and best practices for solutions architects. The fact that I had read the AWS whitepapers related to these topics also helped.

When I hit the submit button, my heart was in my mouth (despite numerous exams over the years, I still get very nervous during tests). Everything felt totally worth it when I saw the “Congratulations! You passed..” message on the screen. And that is the gist of how I got my first technical certification 🙂

I am now preparing for a couple of other certification exams. Up next is the AWS Data Analytics certification exam, which I plan to take on April 15. Wish me luck!


Facial Recognition with Python, OpenCV and Raspberry Pi

Everybody loves recognition! Technically, recognition is the identification of someone or something from previous encounters or knowledge. But how can it be used to solve real-world problems? This was the premise of a facial recognition project I built using Python and OpenCV on a Raspberry Pi. All the code for this project is available on my GitHub page.

The Problem

Crime tourism, which is very different from ‘crime against tourists’, refers to organized gangs that enter countries on tourist visas with the sole intention of committing crime or making a quick buck. Residing in their destination countries for just a few weeks, they seek to inflict maximum damage on locals before returning to their home countries. It’s something that has been picking up all over the world, especially in Canada, the US and Australia. Here’s an excerpt from a Canadian report:

“Over the weekend, we got a notification that there were at least three people arrested,” he said. “And there were two detained yesterday in a different city. It’s just a growing problem.” When police in Australia broke up a Chilean gang in December, they thanked Canadian police for tipping them off. Three suspects who’d fled Ontario and returned to Chile turned up in Sydney, Australia. The tip from Halton Regional Police led to eight arrests and the recovery of more than $1 million worth of stolen goods.

While the tip came in handy, it would be much more effective to have portable facial-recognition devices at airports and tourist spots to identify known criminals and stop them before they can commit crimes at a new destination.

The Solution

I used crime tourism as an example problem to demonstrate the use of facial recognition as a solution. It started with buying a Raspberry Pi 3 ($35) and a 5 MP 1080p mini Pi camera module ($9) and configuring them.

Then, using Adrian Rosebrock’s brilliant tutorial, I embarked on a 10-hour journey (full of mistakes made on my part) to compile OpenCV on my Raspberry Pi! Here are some important things to remember from this compilation expedition:

• You need to expand the file system to be able to use the entire 32 GB of the Pi’s SD card.
• You need to create a Python 3 virtual environment and always make sure that you are working inside that environment.
• Before you begin the compile process, increase the swap space from 100 MB to 2048 MB so that you can compile OpenCV with all four cores of the Raspberry Pi (and without the compile hanging due to memory exhaustion).
• After installing NumPy and completing the OpenCV compilation, set the swap space back to 100 MB.

Python Code for Facial Recognition

I then followed MjRobot’s tutorial to write three simple Python programs for the actual facial recognition using OpenCV. Object detection is performed using Haar feature-based cascade classifiers, an effective method proposed by Paul Viola and Michael Jones in their 2001 paper, “Rapid Object Detection using a Boosted Cascade of Simple Features”. It is a machine-learning-based approach in which a cascade function is trained on a large number of positive and negative images and then used to detect objects in new images. The pre-trained Haar cascade files are readily available on the OpenCV GitHub page.
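The detection step itself is only a few lines. Here is a rough sketch using the frontal-face cascade bundled with the opencv-python package; the image file name is just a placeholder, and on the Pi the actual scripts read frames from the camera instead:

```python
import cv2

# Load the pre-trained frontal-face Haar cascade that ships with opencv-python.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# Placeholder image; the real scripts grab frames from the Pi camera in a loop.
img = cv2.imread("sample.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces; scaleFactor and minNeighbors trade off speed against false positives.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5, minSize=(20, 20))
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)

cv2.imwrite("detected.jpg", img)
print(f"Detected {len(faces)} face(s)")
```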

Demonstration

I presented this project on my last day as the president of Travelytics, a UTD club. There, I conducted a live demonstration: capturing my face with the Pi camera using the first Python program, training the model with the second, and performing real-time facial recognition with the third. Here’s a glimpse:
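For the code-minded, the training and recognition steps in the second and third programs boil down to roughly the sketch below. It assumes the LBPH face recognizer from the opencv-contrib-python package (the recognizer that style of tutorial typically uses); the file paths and labels are placeholders:

```python
import cv2
import numpy as np

# The LBPH recognizer comes from the opencv-contrib-python package.
recognizer = cv2.face.LBPHFaceRecognizer_create()

# Training: grayscale face crops collected by the first program, with integer labels.
# Placeholder paths; in practice there is one label per person in the dataset folder.
faces = [cv2.imread(f"dataset/user1_{i}.jpg", cv2.IMREAD_GRAYSCALE) for i in range(1, 31)]
labels = np.array([1] * len(faces))  # label 1 = "user1"
recognizer.train(faces, labels)
recognizer.write("trainer.yml")

# Recognition: predict the label of a new face crop and check the confidence score.
test_face = cv2.imread("dataset/test_face.jpg", cv2.IMREAD_GRAYSCALE)
label, confidence = recognizer.predict(test_face)
print(f"Predicted label {label} with confidence {confidence:.1f} (lower is better for LBPH)")
```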

This project proved to be an excellent way for me to learn the basics of Python, OpenCV, computer vision and the Raspberry Pi, and to see how a low-budget, effective facial recognition solution can be applied to complex problems.

Grasping at Straws


As I get set to enter the final semester of my Master’s degree, I am feeling extremely anxious. While most people are concerned about finding a full-time job in a state or company of their preference, for me that thought is still miles away. My immediate concern is how much I actually know as a data engineer/analyst. 18 months ago, I made the switch from product manager/actor/writer to Business Analytics student. The goal was to become proficient in the concepts of data mining and analysis, since it is a promising sector and every industry seems to be moving towards a heavy reliance on data science. Now, as I get closer to my graduation date, I keep questioning the extent of my knowledge. And to my disappointment, I keep coming across questions I do not know the answer to.

I need to fix this situation, and quickly. I have 117 days to go until my graduation date (May 8, 2020), so I am taking a start-from-scratch approach for now. The idea is to revise everything I have learnt at UTD as part of my coursework, followed by a couple of online courses and certifications. This includes the basics of statistics (p-values, hypothesis testing), database foundations, SQL and NoSQL, mining concepts like principal component analysis, regression techniques, clustering and time series, big data tools such as Hadoop, Spark and Hive, language basics in Python and R, and data visualization techniques.

To devise a plan for this, I am contacting some students I look up to and asking for their advice on the best approach to ensure maximum retention. I am also hoping to audit some classes in this final semester. I have just one class left to fulfill my graduation requirements, but there is so much more I wish to learn. Natural Language Processing, Applied Machine Learning and Business Data Warehousing are my top picks. I have written to the professors asking for their permission to let me sit in on their lectures.


Finally, this will also be my last semester as the president of Travelytics – a club I conceived and founded with the help of some of my friends. After one final project presentation (Computer Vision with Python, OpenCV and Raspberry Pi), it will be time to hand over the reins of this organization to the next batch of students.

117 days to go. Time for a final sprint!

Dallas Diaries Video For My Folks

I have been studying Information Technology and Management at The University of Texas at Dallas since August 2018. After spending 18 months in Dallas, I returned to Mumbai this December for a short winter break. I wanted to do my best to give my parents and my grandmother a glimpse of my life at the university. This was difficult, as I do not take a lot of pictures. I had, however, captured some videos every now and then. So, I put them together in this video to give my folks a sneak peek into life in and around Richardson.

Fall Internship, Certifications and The Roadway to Graduation

After 6 months at my first job in the United States, I decided to move on and pursue other avenues. I have several exciting academic projects and certifications lined up over the next few months. My facial recognition robot using Raspberry Pi, Python and OpenCV is almost done. I am preparing to appear for the PMP and Cloudera Hadoop certification exams in January 2020, followed by the AWS Solutions Architect Associate exam in February 2020. As I get closer to my graduation date (May 2020), I am raring to join the workforce and get my hands dirty solving some real-world problems. My iCode internship has given me the push I needed to relaunch my technical career. I have summed up my Fall internship experience at iCode in this LinkedIn post.

The Way She Says Hello

It is probably one of the most commonly used words in the English language. But when this girl at the Starbucks at Custer & Renner says “Hello”, she instantly stands out! What is it that she does differently that makes you feel welcome, safe and at ease right away? It’s not so much her sweet voice as the sincerity behind it.

On one hand, her Hello sounds cute, innocent and charming (almost child-like) and on the other, it’s professional and courteous enough to not make you feel patronized. She makes you feel that no matter how bad your day was or how many challenges lie ahead of you, you can forget about it for now and just enjoy your coffee. Her Hello is the sound of reassurance, of kindness, of earnestness.

Yes, it’s part of her job to say Hello to everyone. But the impact she makes on hundreds of strangers with that one, simple word is testimony to the fact that she is doing something special. For me, this is a great lesson in customer service, in brand management and also in just being a positive energy for people around me. Well done hiring this kid, Starbucks!

UTD Mercury Goes Behind The Scenes of The Rocky Horror Show

For the past two months, I have been part of an incredible experience. I was fortunate enough to get cast in my first musical, The Rocky Horror Show. Everything from the audition process to the rehearsals to the final performances was surreal. For all of us involved, this journey has made its mark as one of the most memorable experiences of our lives. Would you believe me if I told you that we had 7 SOLD-OUT SHOWS?! There were long queues every night and many people had to return disappointed (sorry about that!). I wish to write about the whole experience in more detail soon, but for now, I’ll leave you with a “Behind the Scenes” video that UTD Mercury created for us. It also features me in a small interview where I express how I am in awe of my fellow Transylvanians!

Travelytics presents BIG DATA IN TRAVEL with Dr. Rick Seaney

When we kicked off our first Travelytics event in 2018, Prof. Kevin Short at UTD was kind enough to grace us with his presence and speak on the use of data in the airline industry. And now, thanks to him, we have a travel domain stalwart visiting UTD and conducting a special lecture for Travelytics. The topic is an exciting one: BIG DATA in the TRAVEL INDUSTRY. We look forward to an engaging session with Dr. Seaney and a bunch of enthusiastic data science students.

Dr. Rick Seaney - Big Data in Travel