It’s been over three months at my new job and I have already learnt a variety of new things. Working with a brilliant team of developers, I have been exploring the nitty-gritty of Spotfire and unraveling new functionalities every week. While there is a significant difference between the available functionalities for the client and web versions, Spotfire leads my list of best visualization tools for providing business insights. But what I have found most exciting recently is integrating it with Smartsheet.
Smartsheet, as the name suggests, is a smart sheet – essentially an Excel-meets-Trello-meets-Tableau platform. Tying it up with Spotfire gives you a pretty neat solution where on one end, you can create efficient workflows for your team to work directly on the data source and on the other, pull the data in to create great data visuals. To give you a simple example – you can set an automation in Smartsheet to send out email reminders to different members of your team to populate cells assigned to them and this data can then be pulled into the corresponding Spotfire dashboard. The only catch here is – you would need a Smartsheet business license to connect Smartsheet to Spotfire using the Live Data Connector.
As I continue to monitor things that are new in upcoming versions of Spotfire by following Neil Kanungo’s enlightening Dr. Spotfire sessions, I plan to keep an eye out for other such integrations to help my team build efficient processes and deliver fast, accurate business insights.
It’s all been a bit overwhelming. Yes, there is a global pandemic that has shaken things up for almost everybody on this planet. But that aside, the last few months have been an emotional roller coaster. I graduated with a Master’s degree in May, left behind some loved ones in Texas and moved to California in June, turned 35 (still can’t believe it!) in October… and throughout all this, kept looking for full-time employment to land on my feet. So, you can imagine my thrill when I tell you that, just as the year was nearing a disappointing end, I joined Amgen in Thousand Oaks, California as a Clinical App & Analytical Services Manager.
I really can’t describe how I feel at this point. Relieved? Happy? Anxious? Motivated? While these words begin to describe my emotions, they don’t completely portray what’s going on. It’s been that kind of a year. What I do know is that I have been incredibly lucky throughout this journey. I am touched by the kind gestures of a number of people who have tried to help and support me. From the professors at my University to my friends and colleagues to strangers who admired my Resume song to a kind old friend who offered me a lifeboat in the form of a contract job to my cousin who suddenly reappeared in my life to play big brother – I have had so much love and warmth that my heart is full of gratitude. I also know that whatever is coming my way holds great value and can never be taken for granted.
I now move forward, with the entirety of my skills, dedication, and experience, to join Amgen’s mission to serve patients. As part of the R&D team, I’ll work for Global Development Operations and strive to provide meaningful analytical insights and business intellect using vital data. I am already moved by the warm welcome I have received at the firm, and am pumped to work with a brilliant global team. It’ll be amazing to employ my Python, Spotfire, data analytics, and business training and experience to help develop life-saving drugs. I am inspired by the incredible things Amgen has achieved over the years, and feel honored that I can be part of its upcoming feats.
On the personal front, I will soon be moving into a little apartment – my bachelor pad if you will – in Los Angeles! Once the virus is no longer a threat and it’s safe to step out again, I would love to explore the city of Hollywood. I’ve heard and read great things about southern California and can’t wait to check this part of the world out. Like anyone beginning life in a new city, I’m feeling the butterflies. Let’s see what 2021 has in store!
This will probably be my last post of the year. So, here’s wishing everyone reading this a merry Christmas and a marvelous new year! Happy Holidays!
In an attempt to practice my analytics coding skills, I thought I would put them to work on a topic that interests and affects many people across the globe. So, I used Python, NLP (natural language processing), matplotlib, seaborn, WordCloud, and Tweepy to perform some basic analysis followed by a round of sentiment analysis on data extracted from recent tweets.
Through hands-on implementation of pandas, natural language processing, and #matplotlib, I learnt a bunch of stuff during this project including –
how to install and use wordcloud
how to create and use a Twitter #developer account
how to install and use #tweepy
how to perform #sentimentanalysis on extracted data
While the project is not everything I wanted it to be, it provided some good practice in essential data science tools and techniques. I wrote a detailed description of this project in this article on LinkedIn. All the code for this project is posted on my GitHub page.
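For anyone curious what the sentiment step looks like in its simplest form, here is a minimal sketch. It is not the code from the project: the tiny word lists, the sample tweets, and the function names clean_tweet and sentiment are all made up for illustration, and a real run would pull tweets through Tweepy and use a proper sentiment library instead.

```python
import re
from collections import Counter

# Tiny hand-made lexicon, purely illustrative (an assumption, not the project's model).
POSITIVE = {"good", "great", "love", "happy", "safe"}
NEGATIVE = {"bad", "sad", "hate", "angry", "scared"}


def clean_tweet(text):
    """Lowercase the tweet and strip URLs, @mentions, and the '#' of hashtags."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"@\w+", " ", text)
    return text.replace("#", " ")


def sentiment(text):
    """Label a tweet by comparing its counts of positive and negative words."""
    words = re.findall(r"[a-z']+", clean_tweet(text))
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up tweets standing in for data that would come from the Twitter API.
sample_tweets = [
    "I love how great the park looks today! https://example.com",
    "@friend this is bad, I hate waiting",
    "Just had lunch #monday",
]

# Tally how many tweets fall under each label.
labels = Counter(sentiment(t) for t in sample_tweets)
print(labels)
```

A word-count approach like this misses sarcasm and negation ("not good"), which is one reason trained sentiment models are usually preferred.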
The realization that I am lacking in so many aspects of data science is sometimes disheartening. However, I am determined to keep moving forward. The day I find myself to be an excellent data analyst cannot be too far, right?
Last summer I worked with a Game of Thrones dataset for a visualization project. I was planning to revisit that dataset to unravel some more mysteries, when it occurred to me that I should look for something similar with my current favorite – The Office.
I found this wonderful dataset of lines from the show. It has dimensions like Speaker and Seasons making it a tempting dataset for a Tableau exercise. The first thing that came to mind was to get into Michael’s business – That’s What She Said!
Nothing surprising here – Michael obviously stands out! I was also interested in looking at the lines from a sentiment analysis point of view. It turns out that not many people laugh in the show (at least that’s what the script says). An analysis of the lines revealed some unusual observations –
Angela talks more than Oscar, and Toby talks more than Stanley
Dwight laughs more than Pam, and Toby more than Oscar
Looking at both these dashboards together, you can see that –
Season 4 has the maximum number of “That’s what she said”s but the fewest lines with characters laughing.
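The dashboards were built in Tableau, but the same tallies can be sketched in a few lines of Python. The rows below are placeholders, not lines from the real dataset, and the field names season, speaker, and line_text are assumptions about how such a file might be laid out.

```python
from collections import Counter

PHRASE = "that's what she said"

# Placeholder rows; the field names are assumed, not taken from the real dataset.
lines = [
    {"season": 2, "speaker": "Michael", "line_text": "That\u2019s what she said!"},
    {"season": 2, "speaker": "Jim", "line_text": "Good morning, everyone."},
    {"season": 4, "speaker": "Michael", "line_text": "THAT'S WHAT SHE SAID."},
    {"season": 4, "speaker": "Dwight", "line_text": "That's what she said. [laughs]"},
]


def normalize(text):
    """Lowercase and turn curly apostrophes into straight ones before matching."""
    return text.lower().replace("\u2019", "'")


# Keep only the rows that contain the phrase, then count by each dimension.
hits = [row for row in lines if PHRASE in normalize(row["line_text"])]
by_speaker = Counter(row["speaker"] for row in hits)
by_season = Counter(row["season"] for row in hits)

print(by_speaker.most_common())
print(by_season.most_common())
```

Normalizing the apostrophe matters because scripts copied from the web often mix the curly and straight forms, which would otherwise split the counts.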
You can find the dashboard on my GitHub page. I wanted to explore this further but I came across this amazing Tableau Public workbook, and this brilliant article where the author goes into data mining with R and word frequencies and character correlations. These are great inspirations for me to explore some other datasets and come up with interesting insights and dashboards.
When a class is named after your graduation major, and one of the most popular disciplines in the present world, you know it’s going to be pivotal in your learning path. BA with R proved to be just that. The brilliant Dr. Sourav Chatterjee made it clear right at the beginning that R programming is going to be used just as a tool (which it is) to understand and master the nuances of business analytics. Having said that, his course material left no stone unturned in taking us through all aspects of R programming needed for data science.
I had worked a bit with Java and PHP, but this was my first experience with the R programming language. I started with an introductory course on DataCamp to quickly learn the very basics of R like vectors, matrices and data frames. Then, in class, Dr. Chatterjee proved to be a dedicated and patient professor as he started with basic manipulations and sample generation in R and then quickly moved to the foundations of data analytics. We got familiar with libraries like tidyverse, forecast, and gplots, and toyed with data visualization using ggplot on some interesting data sets. We created several plots, graphs, charts, and heatmaps, before scaling up to larger data sets.
This was followed by some of the most important things a business analyst/data scientist learns in their career. So far, everything had looked pretty straightforward to me, but now was the time to push boundaries and actually dive deep into analytics. I was introduced to dimension reduction, the correlation matrix and the all-important analytics task of principal component analysis (PCA). I learnt how to evaluate the performance of models, create lift and decile charts, and assess classification with the help of a confusion matrix – all with just a few lines of code. As Dr. Chatterjee explained time and again, it was never about the code. It was about knowing when and how to use it and what to do with the result.
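To make the confusion matrix idea concrete, here is a small sketch. The class worked in R; this version is plain Python, and the labels and predictions are invented numbers rather than results from any course assignment.

```python
from collections import Counter


def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = Counter(tp=0, fp=0, fn=0, tn=0)
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            counts["tp"] += 1  # correctly flagged
        elif a == positive:
            counts["fn"] += 1  # missed a real positive
        elif p == positive:
            counts["fp"] += 1  # false alarm
        else:
            counts["tn"] += 1  # correctly ignored
    return counts


# Invented labels: 1 = spam, 0 = not spam.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)
accuracy = (cm["tp"] + cm["tn"]) / len(actual)
precision = cm["tp"] / (cm["tp"] + cm["fp"])
recall = cm["tp"] / (cm["tp"] + cm["fn"])

print(dict(cm))
print(accuracy, precision, recall)
```

The four counts are the whole matrix; accuracy, precision, and recall are just different ratios of them, which is why knowing which ratio matters for the business question is more important than the code itself.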
We then followed the natural analytics progression with linear and multiple regression where I learned about partitioning of data and generating predictions. This was followed by a thorough understanding of the KNN model and how and when to run it. By now, I was beginning to get the hang of problem statements and the approach to take to solve them, thanks to class assignments on real-world scenarios like employee performance and spam detection. Through the examples done in class, it was easy to grasp the concepts of R-squared value and p-value and the roles they play in model evaluation. It was in this class that I understood logistic regression, discriminant analysis, and association rules for the first time, and I have been working on them ever since, in every data science course or project that I have taken up.
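Partitioning, prediction, and R-squared fit together as in the sketch below. It is written in plain Python rather than R, uses synthetic data generated on the spot, and the 70/30 split is just a common choice, not something prescribed by the course.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, preds):
    """Share of the variance in ys that the predictions explain."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic data: y is roughly 3x + 2 plus some noise.
rng = random.Random(42)
points = [(x, 3 * x + 2 + rng.gauss(0, 1)) for x in range(30)]

# Partition: shuffle, then hold out 30% of the rows for validation.
rng.shuffle(points)
cut = int(len(points) * 0.7)
train, valid = points[:cut], points[cut:]

# Fit on the training rows only, then score on the held-out rows.
slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
preds = [slope * x + intercept for x, _ in valid]
r2 = r_squared([y for _, y in valid], preds)

print(slope, intercept, r2)
```

Scoring on rows the model never saw is the point of partitioning: an R-squared computed on the training rows alone tends to flatter the model.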
All of this knowledge and Dr. Chatterjee’s guidelines were put to use in the final project where I worked with a group led by the talented Abhishek Pandey on London cabs data. After rigorous work on large data sets downloaded/extracted from various sources, we trained a model to predict arrival times for cabs by comparing RMSE across random forests, logistic regression, and SVMs. It was a great way to put into practice everything we had learned over four months.
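Comparing models by RMSE comes down to one small function applied to each model's predictions. The sketch below is in Python, and the arrival times and the model names model_a and model_b are invented placeholders, not figures from the London cabs project.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: the typical size of a prediction miss."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Invented arrival times in minutes, plus predictions from two hypothetical models.
actual = [12, 15, 9, 20]
predictions = {
    "model_a": [11, 16, 10, 18],
    "model_b": [14, 12, 9, 25],
}

# Score every model on the same held-out rows and keep the lowest error.
scores = {name: rmse(actual, preds) for name, preds in predictions.items()}
best = min(scores, key=scores.get)

print(scores)
print(best)
```

Because the errors are squared before averaging, RMSE punishes a few large misses more than many small ones, which suits arrival-time predictions where a badly late estimate is worse than several slightly off ones.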
And with that, I had laid a robust foundation in data analytics, and was ready to build on it further in the time to come. By January 2019, I was confident enough to dive into analytics projects and work on complex data sets to generate prediction models using the tools taught by Dr. Sourav Chatterjee.
This is the second post of my #10DaysToGraduate series where I share 10 key lessons from my Master’s degree in the form of a countdown to May 8, my graduation date.
There’s a reason I chose Statistics to be no. 10 and the first one in this countdown. When you want to enter the world of data science, you realize very quickly that you can do nothing without the concepts of statistics being clear in your head. The University of Texas at Dallas obviously understood this and made Statistics and Analytics a core course. So, when I started my Master’s program in Fall 2018, I enrolled in this course with Dr. Avanti Sethi in my very first semester. Dr. Sethi proved to be an excellent teacher, and I am honored to have had the pleasure of knowing and working with him during the past two years.
Thanks to his well-designed lectures and assignments, I was able to build a strong statistical foundation with good practice of basic concepts like measures of central tendency (mean, median, mode) and measures of statistical dispersion (variance, standard deviation, IQR). The course then went on to cover concepts like population, sampling, estimation, z-score, t-score, Normal distribution, hypothesis testing, p-value, chi-square tests, ANOVA tests and regression. Dr. Sethi, who is an Excel ninja, also conducted a separate hands-on session for students interested in learning Advanced Excel and taught us how to build macros. The problem statements in his assignments covered real-life scenarios ranging from sports team performances and automobile dealerships to Halloween sales and manufacturing plant obstacles.
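The basic measures mentioned above are easy to try out. The course used Excel; the sketch below uses Python's built-in statistics module instead, and the eight numbers are an invented sample, not data from any assignment.

```python
import statistics

# Invented sample, e.g. units sold per day.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency.
mean = statistics.mean(data)      # 5
median = statistics.median(data)  # 4.5
mode = statistics.mode(data)      # 4

# Measures of dispersion, treating the data as the whole population.
variance = statistics.pvariance(data)  # 4
std_dev = statistics.pstdev(data)      # 2.0

# Interquartile range: distance between the first and third quartiles.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1


def z_score(value):
    """How many standard deviations a value sits from the mean."""
    return (value - mean) / std_dev


print(mean, median, mode, variance, std_dev, iqr)
print(z_score(9))  # 2.0
```

Using pvariance and pstdev treats the list as a full population; for a sample drawn from a larger population, statistics.variance and statistics.stdev divide by n - 1 instead.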
And just like that, right in the very first semester, Statistics and Analytics had set the ball rolling on my data science journey. I have been going back to Dr. Sethi’s assignments every few months, to make sure I don’t forget the very foundations of everything that I have learned in analytics so far. It was a memorable semester thanks to this wonderful class, and left me with a lot of confidence to move forward.
This is the first post of my #10DaysToGraduate series where I share 10 key lessons from my Master’s degree in the form of a countdown to May 8, my graduation date.