I was recently introduced to the world of data science at Glitter Point.
When I first looked at the data, it seemed mysterious and incomprehensible. But as I started digging deeper, it slowly revealed the treasure inside it and things became much clearer. After working with the data for a while, I realised that it is a magic wand in our hands. We can only use the power of this wand, though, by digging deep into the data.
This magic wand can then tell us when a machine is going to fail, whether an applicant will default on a loan, whether a customer will buy a product, and so on.
Here is a data set I came across that contains details about a car transportation network in New York City. It covers the rides carried out by this car network over the past year.
Looking closely at this data made me really curious, because it shows which times are economical for travel, which places are crowded, when the car network company overcharges, which days are most profitable for the company, and so on. With the rise of car network companies like Uber, I was keen to dig into this data set.
Let us start exploring this data to find out some interesting answers. Follow me!
- Suppose I live in New York City and can work from home on any one day of the week. Which day should I choose?
The bar graph below shows the average fare charged per minute for each day of the week.
Clearly, the fare per minute is lower from Wednesday to Saturday than on the other days. Monday has the highest fare per minute, which makes it the most expensive day. So I will happily work from home on Mondays, which helps me beat the Monday blues as well!
Moreover, travelling from Wednesday to Saturday can save you roughly 0.10 to 0.12 USD per minute of travel compared with the other days of the week.
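The weekday comparison above boils down to a group-by and an average. Below is a minimal Python sketch of that calculation, assuming each trip record carries a pickup timestamp, a duration in minutes and a fare in USD. The record layout and the sample values are hypothetical, not taken from the real data set.

```python
from collections import defaultdict
from datetime import datetime

# Weekday names indexed by datetime.weekday() (Monday == 0); avoids locale issues.
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")

# Hypothetical trip records: (pickup time, duration in minutes, fare in USD).
# The real data set's column names and layout are not given in the post.
trips = [
    (datetime(2016, 1, 4, 8, 30), 20.0, 30.0),   # Monday
    (datetime(2016, 1, 4, 18, 0), 10.0, 14.0),   # Monday
    (datetime(2016, 1, 6, 9, 15), 25.0, 27.5),   # Wednesday
    (datetime(2016, 1, 9, 22, 45), 12.0, 14.4),  # Saturday
]


def fare_per_minute_by_weekday(trips):
    """Mean of the per-trip fare/minute ratios, grouped by weekday name."""
    rates = defaultdict(list)
    for pickup, minutes, fare in trips:
        if minutes > 0:  # skip zero-length trips to avoid dividing by zero
            rates[DAY_NAMES[pickup.weekday()]].append(fare / minutes)
    return {day: sum(r) / len(r) for day, r in rates.items()}


rates = fare_per_minute_by_weekday(trips)
priciest = max(rates, key=rates.get)
cheapest = min(rates, key=rates.get)
saving = rates[priciest] - rates[cheapest]  # USD saved per minute of travel
print(priciest, cheapest, round(saving, 2))
```

This sketch averages the per-trip ratios. Dividing each day's total fare by its total minutes is an equally reasonable choice that gives long trips more weight; the post does not say which variant the bar graph uses.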
- What is the best time of day to travel to work?
The best time is when a cab ride is relatively economical and hassle-free, so I want to book a cab when both demand and the fare per trip are low. The line graphs below show the information I need.
I intend to travel to the office between 8 and 10 AM. The graphs show that the number of bookings is lower at 10 AM, while the average fare per trip at 10 AM is a little higher than in the previous two hours (8 AM and 9 AM). Since there are a few thousand more bookings at 8 AM and 9 AM than at 10 AM, I am happy to accept a slightly higher fare for a less contested ride, which makes 10 AM the best time for me to travel.
NOTE: A hassle-free journey depends on many other factors, so this data only indicates the approximate situation; its main value is to spark the curiosity to explore further.
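The 8 to 10 AM comparison needs two numbers per hour: the booking count and the average fare. A minimal sketch, assuming hypothetical records of (pickup time, fare in USD), could compute both and then pick the least busy hour in the window. The sample values are made up for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical records: (pickup time, fare in USD); the real columns may differ.
trips = [
    (datetime(2016, 3, 1, 8, 5), 12.0),
    (datetime(2016, 3, 1, 8, 40), 11.0),
    (datetime(2016, 3, 1, 9, 10), 12.5),
    (datetime(2016, 3, 1, 9, 50), 11.5),
    (datetime(2016, 3, 1, 10, 20), 13.0),
]


def hourly_stats(trips):
    """Return {hour: (number of bookings, average fare)} keyed by pickup hour."""
    counts = Counter()
    totals = defaultdict(float)
    for pickup, fare in trips:
        counts[pickup.hour] += 1
        totals[pickup.hour] += fare
    return {h: (counts[h], totals[h] / counts[h]) for h in sorted(counts)}


def quietest_hour(stats, candidate_hours):
    """Among candidate hours that have data, pick the one with the fewest bookings."""
    observed = [h for h in candidate_hours if h in stats]
    return min(observed, key=lambda h: stats[h][0])


stats = hourly_stats(trips)
print(stats)  # {8: (2, 11.5), 9: (2, 12.0), 10: (1, 13.0)}
print(quietest_hour(stats, range(8, 11)))  # 10
```

Choosing by booking count alone mirrors the trade-off described above: a slightly higher fare at 10 AM in exchange for fewer competing bookings.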
- From the data set, I learned that all the cars in the network are owned by just 4 vendors. So let us explore when the demand for fuel is highest; knowing this helps the vendors be well prepared for long rides.
From the line graph I have drawn above, we can see that the cars travel for long durations from 8 AM to 11 PM and relatively less between 12 AM and 5 AM, a window the vendors can use to refuel and get ready for the day.
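One way to spot a refuelling window is to average trip duration by pickup hour and flag the hours that fall below the typical level. The sketch below assumes hypothetical records of (pickup time, duration in minutes); both the sample data and the rule "below the mean of the hourly means" are illustrative choices, not necessarily the method behind the graph.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records: (pickup time, trip duration in minutes).
trips = [
    (datetime(2016, 5, 2, 2, 0), 8.0),
    (datetime(2016, 5, 2, 3, 30), 6.0),
    (datetime(2016, 5, 2, 9, 0), 22.0),
    (datetime(2016, 5, 2, 18, 15), 25.0),
    (datetime(2016, 5, 2, 21, 45), 19.0),
]


def mean_duration_by_hour(trips):
    """Average trip duration (minutes) for each pickup hour present in the data."""
    sums, counts = defaultdict(float), defaultdict(int)
    for pickup, minutes in trips:
        sums[pickup.hour] += minutes
        counts[pickup.hour] += 1
    return {h: sums[h] / counts[h] for h in sorted(sums)}


def refuel_window(durations):
    """Hours whose mean duration is below the mean of all hourly means."""
    overall = sum(durations.values()) / len(durations)
    return [h for h, d in durations.items() if d < overall]


durations = mean_duration_by_hour(trips)
print(refuel_window(durations))  # [2, 3]
```

Summing total minutes driven per hour instead of averaging would capture fleet-wide fuel demand more directly; which measure fits best depends on how the vendors plan refuelling.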
- Surcharge pricing is really frustrating!
In principle, I would rather my fares carried as little surcharge as possible. Trips with a low surcharge are a pleasure, aren’t they?
The graph below shows the average surcharge charged by cabs at each hour of the day.
The graph shows that there is no surcharge between 6 AM and 3 PM. The surcharge is very high between 4 PM and 7 PM and high between 8 PM and 5 AM. Knowing this helps me plan my schedule without sudden surprises in the fare.
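Grouping surcharges by hour and labelling each hour reproduces the kind of bands described above (none, high, very high). The 0.75 USD cut-off for "very high" and the sample records are hypothetical; the post gives no numeric thresholds.

```python
from collections import defaultdict
from datetime import datetime

VERY_HIGH_USD = 0.75  # hypothetical cut-off; the post does not define one

# Hypothetical records: (pickup time, surcharge in USD).
trips = [
    (datetime(2016, 7, 1, 7, 30), 0.0),
    (datetime(2016, 7, 1, 17, 5), 1.5),
    (datetime(2016, 7, 1, 17, 40), 0.5),
    (datetime(2016, 7, 1, 22, 10), 0.5),
]


def mean_surcharge_by_hour(trips):
    """Average surcharge (USD) for each pickup hour present in the data."""
    sums, counts = defaultdict(float), defaultdict(int)
    for pickup, surcharge in trips:
        sums[pickup.hour] += surcharge
        counts[pickup.hour] += 1
    return {h: sums[h] / counts[h] for h in sorted(sums)}


def band(avg):
    """Label an hourly average surcharge as none, high or very high."""
    if avg <= 0:
        return "none"
    return "very high" if avg > VERY_HIGH_USD else "high"


bands = {h: band(a) for h, a in mean_surcharge_by_hour(trips).items()}
print(bands)  # {7: 'none', 17: 'very high', 22: 'high'}
```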
- Since I know a few places in New York, I would like to explore where and at what times bookings are high, so that I can understand the context personally.
Plotting the latitude and longitude of the pickup locations, coloured by time of pickup, produced the graph below.
As the plot suggests, the area inside the yellow circle sees more bookings from 6 PM to 6 AM (6 PM – 12 AM: black dots; 12 AM – 6 AM: green dots).
This area happens to be Greenwich Village in New York City. A quick Google search shows that it has many cafes, bars and restaurants.
I got really excited, as the data clearly points to a place and the time slot of the day during which it should be busy.
There are also a few other places where bookings are high at other times, for example 12 PM – 6 PM (blue dots) and 6 AM – 12 PM (red dots). This kind of information is valuable for infrastructure planning, choosing locations for consumer stores, avoiding traffic, and so on.
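The scatter plot can also be summarised numerically by snapping each pickup to a grid cell and counting pickups per (cell, time slot) pair, using the same four six-hour slots and colours as the plot. The 0.01-degree cell size and the sample coordinates (placed roughly around Greenwich Village and Midtown) are hypothetical.

```python
import math
from collections import Counter
from datetime import datetime

# Six-hour time slots and the dot colours used in the plot, as described in the post.
SLOT_LABELS = (
    "12AM-6AM (green)",
    "6AM-12PM (red)",
    "12PM-6PM (blue)",
    "6PM-12AM (black)",
)
CELL_DEG = 0.01  # hypothetical grid size in degrees (about 1.1 km of latitude)


def time_slot(hour):
    """Map an hour 0..23 to its six-hour slot label."""
    return SLOT_LABELS[hour // 6]


def grid_cell(lat, lon):
    """Snap a coordinate to the integer indices of its grid cell."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def hotspots(pickups, top=3):
    """Count pickups per (grid cell, time slot) and return the busiest pairs."""
    counts = Counter(
        (grid_cell(lat, lon), time_slot(t.hour)) for t, lat, lon in pickups
    )
    return counts.most_common(top)


# Hypothetical pickups: (time, latitude, longitude); coordinates are illustrative.
pickups = [
    (datetime(2016, 9, 3, 19, 0), 40.7336, -74.0027),
    (datetime(2016, 9, 3, 23, 30), 40.7338, -74.0025),
    (datetime(2016, 9, 4, 2, 15), 40.7335, -74.0029),
    (datetime(2016, 9, 4, 9, 0), 40.7549, -73.9840),
]

for (cell, slot), n in hotspots(pickups):
    print(cell, slot, n)
```

The busiest pairs point to the same kind of place-and-time hotspot that the yellow circle highlights, without having to eyeball the scatter plot.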
Simple insights like these reveal a lot. When I started looking at the data from different angles like this, it gave me valuable information that saves me time, effort and money. This information is also valuable to the business for optimising its operations.
I am currently working on a machine learning model that uses all of the above data to predict the fare before I even start my ride, and I look forward to publishing that work here.
Please do subscribe to our blog and follow us!