To practice my analytics coding skills, I decided to put them to work on a topic that interests and affects many people across the globe. So, I used Python, NLP (natural language processing), matplotlib, seaborn, WordCloud and Tweepy to perform some basic analysis, followed by a round of sentiment analysis, on data extracted from recent tweets.
Through hands-on work with pandas, natural language processing and matplotlib, I learnt a lot during this project, including:
how to install and use WordCloud
how to create and use a Twitter developer account
how to install and use Tweepy
how to perform sentiment analysis on extracted data
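The sentiment step can be illustrated without any API keys. The sketch below is not the project's code: it assumes the tweets have already been downloaded (for example with Tweepy) into a list of strings, and it swaps in a tiny, hypothetical hand-written word lexicon where a real project would use a sentiment library such as TextBlob or VADER. The helper names are made up for illustration. The word counts it builds are the kind of word-to-frequency mapping that WordCloud's `generate_from_frequencies` method accepts.

```python
import re
from collections import Counter

# Hypothetical toy lexicon and stopword list; a real analysis would use a
# proper sentiment library and a fuller stopword list instead.
POSITIVE = {"good", "great", "love", "happy", "hope"}
NEGATIVE = {"bad", "sad", "hate", "fear", "angry"}
STOPWORDS = {"the", "a", "an", "is", "and", "to", "of", "i", "this", "it", "so"}


def clean(tweet: str) -> list[str]:
    """Lowercase a tweet, drop URLs and @mentions, strip '#' signs, tokenize."""
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    tweet = tweet.replace("#", " ")
    return re.findall(r"[a-z']+", tweet)


def polarity(tokens: list[str]) -> int:
    """Toy score: number of positive words minus number of negative words."""
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def label(score: int) -> str:
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def analyze(tweets: list[str]):
    """Return (sentiment label counts, word frequencies without stopwords)."""
    labels = Counter()
    words = Counter()
    for tweet in tweets:
        tokens = clean(tweet)
        labels[label(polarity(tokens))] += 1
        # These frequencies are what a word cloud would be drawn from.
        words.update(t for t in tokens if t not in STOPWORDS)
    return labels, words


if __name__ == "__main__":
    sample = [
        "I love this, great news! https://t.co/xyz",
        "So sad and angry about the #news today @someone",
        "The meeting is at noon",
    ]
    labels, words = analyze(sample)
    print(labels)
    print(words.most_common(3))
```

Counting labels and words in one pass keeps the example short; in a larger project the cleaned tokens would typically be stored in a pandas DataFrame so that plotting with matplotlib or seaborn can reuse them.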
While the project is not everything I wanted it to be, it provided good practice with essential data science tools and techniques. I wrote a detailed description of this project in an article on LinkedIn, and all the code for it is posted on my GitHub page.
Realizing how many gaps I still have in data science is sometimes disheartening. However, I am determined to keep moving forward. The day I can call myself an excellent data analyst cannot be too far away, right?
My on-and-off relationship with Python began a few months before I started my Master’s degree. When I knew that I was going to turn towards IT, it was a no-brainer that I had to raise my coding game. I had learned C programming during my engineering days, but that was almost a decade ago. So, to go back to my roots, I took a weekend course in object-oriented programming with Java. While it was a lot of fun, it became clear to me that Java, though brilliant, was more of a mobile app development tool (no offense, Java lovers!). Another language reigned over the data science kingdom, and for any chance of success as a data analyst, I had to woo her.
I started learning Python with the MIT OCW course (edX) on Introduction to computational programming with Python to understand the basic data structures and some beginner-level programs. I got through the basics but could not complete the course because, after a point, I found it a bit dry. And that was that. At UTD, I was already making good progress on my analytics learning trajectory thanks to my work with R, so there was no need to rush into another language. However, as things progressed with my club Travelytics and I came across online competitions, I couldn’t delay getting my hands dirty with Python any longer.
So, I dived right in with Kaggle Learn’s wonderful data science track, which started with 7 hours of Python covering everything from variables, lists, loops and functions to important libraries and elementary programs. This was followed by my internship at iCode, where I worked on Python projects and also trained over 50 students in the foundations of Python and machine learning. The hands-on exercises and projects at iCode, like building a movie recommender system, were a great help in laying the foundations of Python for data science in my mind.
Back at UTD, it helped that my friend Joseph Kim, who was the President of the data science club, conducted some amazing hands-on sessions for people to learn Python basics. Attending these sessions helped me, and many others, stay in the loop (pun totally intended). Then came my own Python research for my facial recognition project to tackle crime tourism, at the end of which I had adapted three simple Python programs that detect and recognize faces in real time. This was my most memorable time with Python programming, as I was able to see tangible results generated by code I had written.
In the last few months, I have been following the extraordinary free YouTube lessons of Krish Naik. His Machine Learning playlist is the most valuable resource I have found online for practising everything from data science libraries like NumPy, pandas and scikit-learn to data visualization exercises with matplotlib and Seaborn. He is also an excellent coach on analytics concepts like entropy and Gini impurity, and on machine learning algorithms like regression, k-means clustering, k-nearest neighbors, decision trees and ensemble methods.
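Entropy and Gini impurity are the two impurity measures a decision tree commonly uses to choose its splits. For class proportions p, entropy is H = -sum(p * log2(p)) and Gini impurity is G = 1 - sum(p * p); both are 0 for a pure node and largest for an even mix. The minimal plain-Python sketch below computes them, plus the information gain of a candidate split; the function names are my own, not from any particular library.

```python
from collections import Counter
from math import log2


def class_probabilities(labels):
    """Proportion of each distinct class in a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return [c / total for c in counts.values()]


def entropy(labels):
    """Shannon entropy in bits, written as sum(p * log2(1/p))."""
    return sum(p * log2(1 / p) for p in class_probabilities(labels))


def gini(labels):
    """Gini impurity: 1 - sum(p^2)."""
    return 1 - sum(p * p for p in class_probabilities(labels))


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


if __name__ == "__main__":
    labels = ["yes", "yes", "no", "no"]
    print(entropy(labels))  # 1.0
    print(gini(labels))     # 0.5
    # A split that separates the classes perfectly removes all uncertainty.
    print(information_gain(labels, [["yes", "yes"], ["no", "no"]]))  # 1.0
```

A tree-building algorithm evaluates many candidate splits this way and keeps the one with the highest gain (or the largest drop in Gini impurity).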
We are truly fortunate to live in a world and time where so many resources are available to anyone with an Internet connection and the will to learn. I am currently working my way through Kirill Eremenko’s acclaimed Udemy course on Python for data science. While all these wonderful online resources have their charm, nothing comes close to in-class training. This became evident in my object-oriented programming class with Dr. Nassim Sohaee. Her diligent classwork and challenging assignments, which I am still working on, have been excellent tools for understanding the nuts and bolts of object-oriented design and the anatomy of Python programming. I have worked on various projects dealing with loops, functions, classes, inheritance and exception handling. In addition to all the data science exercises, this class has given me more confidence in leveraging Python as a powerful programming language in the future.
This is the fifth post of my #10DaysToGraduate series where I share 10 key lessons from my Master’s degree in the form of a countdown to May 8, my graduation date.
Everybody Loves Recognition! Technically, recognition is defined as the identification of someone or something from previous encounters or knowledge. But how can it be used to solve real-world problems? This was the premise of a facial recognition project I built using Python and OpenCV on a Raspberry Pi. All the code for this project is available on my GitHub page.
The Problem
Crime tourism, which is very different from ‘crime against tourists’, refers to organized gangs that enter countries on tourist visas with the sole intention of committing crimes or making a quick buck. Residing in their destination countries for just a few weeks, they seek to inflict maximum damage on locals before returning to their home countries. It has been on the rise all over the world, but especially in Canada, the US and Australia. Here’s an excerpt from a Canadian report:
“Over the weekend, we got a notification that there were at least three people arrested,” he said. “And there were two detained yesterday in a different city. It’s just a growing problem.” When police in Australia broke up a Chilean gang in December, they thanked Canadian police for tipping them off. Three suspects who’d fled Ontario and returned to Chile turned up in Sydney, Australia. The tip from Halton Regional Police led to eight arrests and the recovery of more than $1 million worth of stolen goods.
While the tip came in handy, it would be much more effective to have portable facial-recognition devices at airports and tourist spots to identify criminals and stop them before they commit crimes in a new destination.
The Solution
I used crime tourism as an example problem to demonstrate facial recognition as a solution. It started with buying and configuring a Raspberry Pi v3 ($35) and a 5 MP 1080p mini Pi camera module ($9).
Then, using Adrian Rosebrock’s brilliant tutorial, I embarked on a 10-hour journey (full of mistakes on my part) to compile OpenCV on my Raspberry Pi! Here are some important things to remember from this compilation expedition:
•You need to expand your file system to be able to use the entire 32 GB of your microSD card.
•You need to create a Python 3 virtual environment and always make sure that you’re working inside that environment.
•Before you begin the compile process, increase the swap space from 100 MB to 2048 MB so that you can compile OpenCV with all four cores of the Raspberry Pi (without the compile hanging due to memory exhaustion).
•After installing NumPy and completing your OpenCV compilation, reset the swap space to 100 MB.
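Two of the steps above are easy to get wrong, so here is a small Python sketch for them. It assumes stock Raspbian, where the swap size is set by the CONF_SWAPSIZE line in /etc/dphys-swapfile, and the helper names are hypothetical. `in_virtualenv` reports whether the running interpreter comes from a virtual environment, and `set_swap_size` returns the config text with the swap size changed, which you would then write back with root privileges before compiling (2048) and again afterwards (100). The swap service has to be restarted for a new size to take effect.

```python
import re
import sys

SWAP_CONFIG = "/etc/dphys-swapfile"  # assumption: stock Raspbian swap settings file


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    # venv (and modern virtualenv) make sys.prefix differ from sys.base_prefix;
    # older virtualenv releases set sys.real_prefix instead.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix) or hasattr(sys, "real_prefix")


def set_swap_size(config_text: str, size_mb: int) -> str:
    """Return config_text with the active CONF_SWAPSIZE line set to size_mb.

    Only an uncommented CONF_SWAPSIZE line is replaced; if there is none,
    a new line is appended at the end.
    """
    line = f"CONF_SWAPSIZE={size_mb}"
    pattern = re.compile(r"^CONF_SWAPSIZE=.*$", re.MULTILINE)
    if pattern.search(config_text):
        return pattern.sub(line, config_text, count=1)
    return config_text.rstrip("\n") + "\n" + line + "\n"


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtualenv())
    # Made-up sample standing in for the real contents of SWAP_CONFIG.
    sample = "# where we want the swapfile to be\nCONF_SWAPSIZE=100\n"
    print(set_swap_size(sample, 2048), end="")
```

Changing the value through a function rather than by hand makes the "2048 before, 100 after" round trip explicit, which is exactly the step that is easy to forget once the long compile finishes.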
Python Code for Facial Recognition
I then followed MjRobot’s tutorial to write three simple Python programs for the actual facial recognition using OpenCV. Object detection is performed using Haar feature-based cascade classifiers, an effective object detection method proposed by Paul Viola and Michael Jones in their 2001 paper, “Rapid Object Detection using a Boosted Cascade of Simple Features”. It is a machine-learning-based approach in which a cascade function is trained on many positive and negative images and is then used to detect objects in other images. Pre-trained Haar cascade files are readily available in the OpenCV GitHub repository.
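The Viola-Jones approach is fast because of two ideas: Haar-like features, which are differences between sums of adjacent rectangles and can be computed in constant time from an integral image, and a cascade that rejects most image windows after only a few cheap tests. The pure-Python sketch below illustrates both on a toy grayscale array. It is not OpenCV's implementation: the single feature, the thresholds and the simple voting rule per stage are hypothetical stand-ins for the trained, weighted stages of a real cascade. In the actual programs, OpenCV loads a pre-trained XML file (such as haarcascade_frontalface_default.xml) into `cv2.CascadeClassifier` and calls its `detectMultiScale` method.

```python
def integral_image(img):
    """ii[y][x] = sum of img rows < y and columns < x (zero-padded first row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y), in four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_vertical_edge(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half (w should be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


def cascade_accepts(ii, x, y, stages):
    """Run the stages in order and reject the window as soon as one fails.

    Each stage is (weak_classifiers, min_votes), where each weak classifier is
    (feature_fn, threshold) and feature_fn(ii, x, y) returns a feature value.
    The vote count is a simplification of the weighted stage sums in Viola-Jones.
    """
    for weak_classifiers, min_votes in stages:
        votes = sum(1 for feature, thr in weak_classifiers if feature(ii, x, y) > thr)
        if votes < min_votes:
            return False  # early rejection: most non-face windows stop here
    return True


if __name__ == "__main__":
    # Toy 4x4 "image": bright left half, dark right half.
    img = [[9, 9, 1, 1] for _ in range(4)]
    ii = integral_image(img)
    print(two_rect_vertical_edge(ii, 0, 0, 4, 4))  # 64
```

Because every rectangle sum costs four lookups regardless of its size, a detector can scan thousands of windows and scales per frame, which is what makes real-time detection feasible on a device as small as a Raspberry Pi.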
Demonstration
I presented this project on my last day as the President of the UTD club Travelytics. There, I gave a live demonstration: the Pi cam captured my face when I ran the first Python program, the second program trained the model, and the third performed real-time facial recognition. Here’s a glimpse:
This project proved to be an excellent route for me to learn the basics of Python, OpenCV, computer vision, Raspberry Pi and how we can implement a low-budget, effective facial recognition solution to complex problems.
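For the training and recognition programs, one common technique is the Local Binary Patterns Histogram (LBPH) recognizer, which OpenCV provides in its contrib face module as `cv2.face.LBPHFaceRecognizer_create()`. The pure-Python sketch below shows the idea on tiny toy arrays and should be read as an illustration of LBPH, not as the code that ran in the demo. It is simplified on purpose: it builds one histogram over the whole image, whereas real LBPH concatenates histograms from a grid of cells, and the function names and the "alice"/"bob" data are hypothetical.

```python
from collections import Counter

# Neighbor offsets (dy, dx), clockwise from the top-left; the index is the bit.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img):
    """8-neighbor Local Binary Pattern code for every interior pixel."""
    codes = []
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            center = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(OFFSETS):
                if img[y + dy][x + dx] >= center:
                    code |= 1 << bit
            codes.append(code)
    return codes


def lbp_histogram(img):
    """Normalized 256-bin histogram of the image's LBP codes."""
    counts = Counter(lbp_codes(img))
    total = sum(counts.values())
    return [counts[c] / total for c in range(256)]


def chi_square(h1, h2):
    """Chi-square distance between two histograms (smaller means more similar)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def train(labelled_images):
    """'Training' stores one histogram per (label, image) pair."""
    return [(label, lbp_histogram(img)) for label, img in labelled_images]


def predict(model, img):
    """Return (label, distance) of the closest stored histogram."""
    h = lbp_histogram(img)
    return min(((label, chi_square(h, ref)) for label, ref in model), key=lambda t: t[1])


if __name__ == "__main__":
    alice = [[(x + y) % 2 * 200 for x in range(6)] for y in range(6)]
    bob = [[10 * x + y for x in range(6)] for y in range(6)]
    model = train([("alice", alice), ("bob", bob)])
    # Same "face" under brighter lighting: LBP codes only compare neighbors.
    brighter_alice = [[v + 20 for v in row] for row in alice]
    print(predict(model, brighter_alice))  # ('alice', 0.0)
```

The usage example shows why LBP-based recognition tolerates uniform lighting changes: adding a constant brightness leaves every neighbor comparison, and therefore every code, unchanged.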
As I get set to enter the final semester of my Master’s degree, I am feeling extremely anxious. While most people are concerned about finding a full-time job in a state or company of their preference, for me that thought is still miles away. My immediate concern is how much I know as a data engineer/analyst. Eighteen months ago, I made the switch from product manager/actor/writer to Business Analytics student. The goal was to become proficient in data mining and analysis, since it was a promising sector and the whole world seemed to be moving in a direction where every industry relies heavily on data science. Now, as I get closer to my graduation date, I keep questioning the extent of my knowledge, and to my disappointment, I keep coming across questions I do not know the answer to.
I need to fix this situation, and quickly. I have 117 days to go until my graduation date (May 8, 2020). So, I am taking a start-from-scratch approach for now. The idea is to revise everything I have learnt at UTD as part of my course, followed by a couple of online courses and certifications. This includes the basics of statistics (p-value, hypothesis testing), database foundations, SQL, NoSQL, mining concepts like principal component analysis, regression techniques, clustering, time series, big data (Hadoop, Spark, Hive), language basics in Python and R, and data visualization techniques.
To devise a plan for this, I am contacting some students I look up to and asking for their advice on the best approach to ensure maximum retention. I am also hoping to audit some classes this final semester. I have just one class left to fulfill my graduation requirements, but there is so much more I wish to learn. Natural Language Processing, Applied Machine Learning and Business Data Warehousing are my top picks. I have written to the professors asking for their permission to sit in on their lectures.
Finally, this will also be my last semester as the president of Travelytics – a club I conceived and founded with the help of some of my friends. After one final project presentation (Computer Vision with Python, OpenCV and Raspberry Pi), it will be time to hand over the reins of this organization to the next batch of students.
For over a decade now, I have chased the dream of becoming a Bollywood star. It has been an amazing ride full of ups and downs. There have been some minor breakthroughs, but nothing significant enough for me to make a living out of. So, while this journey as an aspiring Bollywood actor has taught me a lot and I have thoroughly enjoyed and loved every bit of it, I have come to realize that it is time to pull the plug.
It has taken a lot of effort for me to come to terms with the fact that my acting career is going nowhere. For over 15 years, this was all I wanted. No matter what I did, no matter where I went, I always felt that it would connect back to my dream. But now, I feel like I do not want to invest any more of my youth in this “struggle”. I need to accept that I have failed and that it is now time to move on.
It makes me very sad. I feel like something is dying inside me. After all, it’s a dream I have chased since I was 16. However, I have found some solace in the knowledge that acting is now a part of who I am and that I can always continue acting on the side. This is where acting becomes a hobby for me, like playing the guitar, dancing or traveling. Maybe I can get back to doing theatre and join the countless doctors, engineers and working professionals who use it as a way of expressing themselves! With that in mind, I have made my peace with my decision to give up my Bollywood aspirations.
Once I made this call, I started looking at other things that excite me – other areas where I thought I could make a difference. I have worked as a Senior Travel Writer, Editor and Manager over the last few years. Along the way, I have had the chance to travel, volunteer, teach, write, think and reconsider my career options. After a fair amount of self-discovery, I have concluded that the best combination of what I would like to do and what the world needs right now is data science in the solar energy sector.
The world of renewable energy, like every other field these days, generates huge amounts of data and there is a need for analysts and scientists who can make sense of this data. With skilled effort in the right direction, a lot can be done to bring down solar implementation costs. That to me is an exciting future to work towards. With my background in Electronics and Telecommunications engineering, and my interest in programming and statistics, it felt like the right thing to pursue next.
I started my data science journey last year with an introductory course on the R programming language on DataCamp. I followed it up with an MIT OCW course on Introduction to Computational Thinking Using Python. I have also applied to several universities for a Master’s in Business Analytics/Data Science/Information Systems. If all goes well, I hope to begin my graduate studies in Fall 2018.
This is a new beginning and as one would expect, I am nervous and anxious just like I was at the beginning of my Bollywood struggle. I am 32 now and it scares the shit out of me to restart my whole career. Nevertheless, I am driven by the fact that I now have a new purpose – one that can add some value to the world and also help me meet my true potential. I realize that this may look like a clichéd choice, a silly one even. But what matters to me is – it feels like something worth doing no matter how people perceive it. It is what my heart is pointing me towards.
I do plan to continue theatre and acting in some form or another. But now, it would be just for me and not with the motive of “chasing a dream”. My dream has now been replaced by an ambition – Become a skilled Data Scientist and make a revolutionary impact in the Renewable Energy sector.