EDA – Pivot Billions
https://pivotbillions.com

Real-time analysis of 50 billion records of IoT data
https://pivotbillions.com/real-time-analysis-of-50-billion-records-of-iot-data/
Wed, 16 Sep 2020

The demand for analysis of IoT data has been growing exponentially with the explosion of connected devices. Unfortunately, the cost and time required to analyze this data have also grown exponentially as data volumes keep getting larger.

An enterprise client that collects data from millions of devices has been struggling to analyze it, because the volume of data exceeds the capacity of their conventional data processing systems.

Their pain stems not only from the size of the data but also from a lack of agility. The requirements for their analysis change rapidly, and conventional systems simply could not adapt to such changes in a timely or economical manner.

Their latest requirement was to process 50 billion records of GPS data against a variety of analytic specifications and to identify particular behavioral patterns of interest to their customers. That kind of dynamic requirement made conventional batch data processing very difficult and expensive.

They researched numerous products from different vendors and chose AuriQ’s PivotBillions, a massively parallel, in-memory data processing and analysis solution. PivotBillions enabled the client to analyze all 50 billion records in real time: analytic queries, including ad hoc queries, against the entire data set were processed in seconds or tens of seconds, which allowed data analysts to test their hypotheses very efficiently.


Fig. 1: An example visualization of the analyzed 50 billion GPS records, showing how devices move before and after visiting the airport.

Because PivotBillions is a software solution that runs on Amazon Web Services, it did not require any special hardware. Its Excel-like user interface allows analysts to work on data immediately, without any coding or training.

The total system cost to analyze their 50 billion records using PivotBillions on AWS was less than 1/10 of the cost of conventional systems. It took only a few weeks to complete the analysis and visualization tasks for the various requirements.

Facts

  • Records: 50 billion records from a few million devices
  • Size: 6 TB in 365 compressed files
  • Repository: AWS S3
  • Instances: AWS EC2 m5.large (up to 500 concurrent instances)
  • Time to preprocess and load: 30 minutes (from the original data in S3)
    • Conventional system: took a few days to load a partial sample of the data into a database
  • Response time for queries on all 50 billion records: a few seconds to a few tens of seconds
    • Conventional system: took a few hours to days to process a query against a partial sample of the data
    • Performance varies slightly depending on AWS conditions
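
The figures in the list above imply some rough averages. The sketch below is back-of-envelope arithmetic only. It assumes decimal units (1 TB = 10^12 bytes), data spread evenly across files, and all 500 instances busy for the full 30-minute load; none of these assumptions is stated in the original, and the function name is purely illustrative.

```python
# Back-of-envelope averages derived from the Facts list.
# Assumptions (not from the original): decimal units, even distribution
# across files, and all 500 instances active for the whole 30-minute load.

RECORDS = 50_000_000_000        # 50 billion records
COMPRESSED_BYTES = 6 * 10**12   # 6 TB of compressed input
FILES = 365                     # number of compressed files
LOAD_SECONDS = 30 * 60          # 30 minutes to preprocess and load
MAX_INSTANCES = 500             # upper bound on concurrent EC2 instances


def averages():
    """Return derived averages as a dict of name -> float."""
    bytes_per_second = COMPRESSED_BYTES / LOAD_SECONDS
    return {
        "compressed_bytes_per_record": COMPRESSED_BYTES / RECORDS,
        "gb_per_file": COMPRESSED_BYTES / FILES / 10**9,
        "records_per_file": RECORDS / FILES,
        "aggregate_load_gb_per_s": bytes_per_second / 10**9,
        "load_mb_per_s_per_instance": bytes_per_second / MAX_INSTANCES / 10**6,
        "records_loaded_per_s": RECORDS / LOAD_SECONDS,
    }


if __name__ == "__main__":
    for name, value in averages().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions, each record averages about 120 compressed bytes, each file is about 16.4 GB, and the load works out to roughly 3.3 GB/s in aggregate, or about 6.7 MB/s per instance.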

 

A trial version of the PivotBillions service is available for free. Request a demo or sign up for a free account at https://pivotbillions.com.

5 Minute Analysis of Data.gov Datasets with PivotBillions
https://pivotbillions.com/5-minute-analysis-of-data-gov-datasets-with-pivotbillions/
Fri, 19 Apr 2019

Overview

Data.gov is an open data portal provided by the US government that offers a wealth of information on topics including agriculture, education, finance, health, and science. Currently, it lists over 200,000 datasets available to explore.

In this 5 Minute Analysis, we picked one dataset from the consumer data category: the Financial Services Consumer Complaint Database. This dataset is provided by the Consumer Financial Protection Bureau (CFPB) and records complaints received by the CFPB about various financial products and services offered by a multitude of companies.

The data consists of over 1.2 million rows in CSV format and is about 700 MB in size. Loading it was simple: we used the drag-and-drop feature in PivotBillions to upload the data to our cloud-based demo portal.

 

 

EDA

A quick exploratory data analysis (EDA) reveals some general trends in the data.

Since the launch of the CFPB back in 2011, the number of recorded complaints has gone up year over year.

In that time, the financial products and services most often associated with complaints were the following:

By far, mortgage, debt collection, and credit reporting outpace other types of products or services in generating complaints.

The companies receiving the most complaints include the major credit reporting companies and banks.

The top five states for consumer complaints are California, Florida, Texas, New York and, surprisingly, Georgia. Georgia is a bit of an outlier: it is only the 8th largest US state by population, yet more complaints originated there than in Illinois, which has a larger population.

The overwhelmingly preferred method for reporting complaints is online, via the web.
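
The breakdowns above (by year, product, company, state, and submission channel) were produced in the PivotBillions interface. For readers who want to reproduce the same kind of counts outside it, here is a minimal sketch using only the Python standard library, which is a substitute technique and not how the post's figures were made. The column names are assumed to match the CFPB CSV export headers, the dates are assumed to be in YYYY-MM-DD form, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Tiny made-up sample; these rows are illustrative, not real complaints.
# Column names are an assumption about the CFPB CSV export.
SAMPLE_CSV = """Date received,Product,Company,State,Submitted via
2015-03-02,Mortgage,Bank A,CA,Web
2016-07-19,Debt collection,Collector B,FL,Web
2017-09-08,Credit reporting,Bureau C,GA,Web
2017-09-09,Credit reporting,Bureau C,TX,Phone
2017-10-01,Mortgage,Bank A,CA,Referral
"""


def top_values(rows, column, n=5):
    """Return the n most common values of `column` as (value, count) pairs."""
    return Counter(row[column] for row in rows).most_common(n)


def complaints_per_year(rows, date_column="Date received"):
    """Count rows per year, assuming ISO dates (YYYY-MM-DD)."""
    return sorted(Counter(row[date_column][:4] for row in rows).items())


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(complaints_per_year(rows))
print(top_values(rows, "Product", n=3))
print(top_values(rows, "Submitted via", n=1))
```

To run this against the real file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle and adjust the column names and date slicing to whatever the downloaded CSV actually contains.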

Among the top three financial products and services, mortgage-related complaints have declined, while debt collection and credit reporting complaints have grown significantly over the same period.
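
A per-product trend like this is a two-way count: complaints grouped by product and by year. The following is a minimal sketch of that pivot in plain Python, again as a stand-in for the PivotBillions interface. The `(year, product)` pairs are made up, and the helper names `pivot_counts` and `change` are hypothetical.

```python
from collections import defaultdict

# Illustrative (year, product) pairs; made up, not real complaint counts.
SAMPLE = [
    ("2012", "Mortgage"), ("2012", "Mortgage"), ("2012", "Debt collection"),
    ("2017", "Mortgage"), ("2017", "Credit reporting"),
    ("2017", "Credit reporting"), ("2017", "Debt collection"),
]


def pivot_counts(pairs):
    """Build {product: {year: count}} from (year, product) pairs."""
    table = defaultdict(lambda: defaultdict(int))
    for year, product in pairs:
        table[product][year] += 1
    return {product: dict(years) for product, years in table.items()}


def change(table, product, first_year, last_year):
    """Count in last_year minus count in first_year (missing years count as 0)."""
    years = table.get(product, {})
    return years.get(last_year, 0) - years.get(first_year, 0)


table = pivot_counts(SAMPLE)
for product in sorted(table):
    print(product, change(table, product, "2012", "2017"))
```

A negative change means complaints for that product fell between the two years, and a positive change means they rose.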

 

The Wrap-up

What insights did we gain through our EDA? The declining mortgage complaints seem to indicate a steady recovery from the subprime mortgage crisis of 2007-2008, although there is still friction with consumers trying to modify loans or avoid foreclosure. The slow rise in debt collection complaints might indicate that consumers are taking on more credit card or loan debt and that the rate of default is rising. The most drastic increase in complaints is for credit reporting and credit repair services. The huge spike in September 2017 is primarily a result of the Equifax data breach announcement, but even before then there appears to have been a steady rise in complaints about how the credit reporting agencies were handling information in consumer credit reports.

A lot more can be gleaned by delving deeper into the data, but that will have to wait for another post. Currently, this dataset is available in our public demo portal for anyone to explore. Try it for yourself and see what insights you might find.
