The demand for analysis of IoT data has been growing exponentially with the explosion of connected devices. Unfortunately, the cost and time associated with analyzing this data have also grown exponentially as data volumes keep getting larger and larger. An enterprise client, who has been collecting data from millions of devices, has […]
Category: ETL
Maximizing Sales with Market Basket Analysis
Sales data can provide a wealth of insights for any business, but it is rarely made available to the public. In 2018, however, a retail chain released Black Friday sales data on Kaggle as part of a Kaggle competition. Although the store and product lines are anonymized, the dataset presents a great learning opportunity […]
Exposing Potential Fraud in Amazon Reviews
Amazon continues to be one of the most popular marketplaces in the US and the world, due at least in part to its variety of product categories and its product reviews. But how accurate are these reviews? Do sellers or their competitors try to influence them in any way? Does the Verified Purchase tag […]
Visualizing Gender Disparities in Hi-Tech, Engineering, and Science
We’ve all heard about the many improvements and changes being made to help get women into Hi-Tech and the Sciences, but where are these improvements being seen? Is the entire nation reaching gender equality, or are there pockets of improvement and pockets of stagnation? These are the questions I set out to answer using the […]
Indoor vs Outdoor Activities? CDC Health Data Shows Which Is Better For You
Diving into CDC Behavioral Risk Factor data using Pivot Billions to learn which exercise behaviors are associated with improved health. Motivating yourself to go outside and get some exercise or play a sport can be hard, but it is worth it. I had trouble with this myself, but after looking into the CDC’s […]
Understanding 2 Billion Rows of Weblogs in Real-Time
Managing data just keeps getting tougher. The more we think we’ve gotten a handle on our data, the more it grows and becomes too large for our existing analyses. This issue became very clear to me when I set out to understand the effectiveness of ad campaigns using SiteCatalyst weblogs. Seeing as […]
Real Net Profit: 150% in just 4 Months
Developing a post-commission profitable currency trading model using Pivot Billions and R. Needle, meet haystack. Searching for the right combination of features to make a consistent trading model can be quite difficult and takes many, many iterations. By incorporating Pivot Billions and R into my research process, I was able to dramatically improve the […]
Taming 1.5 Billion Rows of “Big Apple” Data
The age of data has arrived. With it, more and more datasets are being created, and they just keep getting bigger. Whether dealing with private or open data, individuals and organizations across the world are realizing that there are enormous amounts of information and insights to be gained from massive data. The public NYC Taxi and […]
R New Yorkers Feeling the Holiday Spirit? Here’s Your Tip
The holiday season brings with it a degree of cheer and joy that many claim makes people act friendlier towards each other. I wanted to see how this effect translates into action, so I decided to look into tips for New York green taxis, both during the holiday season and the rest of the year. […]
Do the Holidays Mean Bigger Tips for NYC Taxi Drivers?
The holiday season brings with it a degree of cheer and joy that many claim makes people act friendlier towards each other. I wanted to see how this effect translates into action, so I decided to look into tips for New York green taxis, both during the holiday season and the rest of the year. […]