Tableau – Pivot Billions

Visualizing Gender Disparities in Hi-Tech, Engineering, and Science

pivotteam — Wed, 13 Mar 2019 18:38:07 +0000

We’ve all heard about the many improvements and changes being made to help get women into Hi-Tech and the Sciences, but where are these improvements being seen? Is the entire nation reaching gender equality or are there pockets of improvement and pockets of stagnation. These are the questions I set out to answer using the Safe Graph US Census Bureau Census Block Group American Community Survey Data from Kaggle.

This census data is an extremely comprehensive dataset at the census block group level (on the order of 1500 households each) containing detailed information on a multitude of demographic aspects of the people living in the US. However, the data is quite sizable and comes in the form of a very large zip file that makes the data hard to explore. Moreover, the zip file contains many internal files containing different demographic topics, each with its own schema. I used Pivot Billions to preprocess my data for Tableau due to its data ETL and joining features.

Using it, I quickly extracted the key columns I wanted from the various internal files in the census data while the data was still zipped and in S3.

Then I joined those key columns together by census block group and enhanced the data with a few custom columns such as the percentage of women and the degree of gender inequality.

This second custom column is used to describe how far away from equal (50% men, 50% women) the computer, engineering, and science workers are in each census block group. The values range from 0 to 5 with 5 meaning highly biased towards one gender. Now that the main data I was interested in was ready and loaded into Pivot Billions, I then downloaded the Pivot Billions preprocessed data as a csv file and then simply imported the data into Tableau as a text file.

Now all I had to do to explore the distribution of gender equality and inequality in the United States was to visualize this data. I quickly opened a new worksheet and told Tableau to treat the Census Block, latitude/longitude, and DegreeOfGenderInqeuality fields as Dimensions (categories). Then I added some of the interesting fields to the displayed tooltip in Tableau and colored by DegreeOfGenderInequality. Focusing on the extremes of Equality (Green) and Inequality (Orange), I was left with a very telling interactive visualization.

It was immediately clear that the coastal regions of the US and the Southeastern US region consistently maintained a high level of gender equality. On the other hand, the North/NorthEastern US regions leaned heavily towards just one of the genders (typically towards men).

Though many areas of the US seem to be making strides towards gender equality in Hi-Tech and the Sciences, many still remain heavily biased and stagnant. It may be worthwhile to look at the regulations and companies operating in the coastal and southeastern regions of the US to see what policies (governmental or private) may be having an effect. With time hopefully these policies will be implemented across the US to equalize the representation of both genders in these fields and truly create a more equal society.

To interact with this visualization yourself, feel free to go through my Gender Equality and Inequality workbook on Tableau Public.

San Francisco – A City Full of Psychopaths?

pivotteam — Thu, 31 Jan 2019 19:52:27 +0000

With a population of over 800,000 people within approximately 47 square miles, San Francisco is one of the most densely populated cities in the United States. Cities with this level of population density must contend with a high level of policing activity to ensure safety. I wanted to explore just how safe it actually is so I decided to analyze the SF Police Calls for Services and Incidents dataset from Kaggle which covers many years (2003-2017) of policing data and contains a variety of useful information about each record.

Starting with the police-department-incidents.csv file and loading it into Pivot Billions, I quickly renamed the x and y fields to what they truly were (longitude and latitude) and started to view the whole data. Getting a quick distribution of the Resolution column I was able to see the different ways each police incident was resolved and get an idea for how common each resolution was.

I was surprised to discover that a resolution of “Psychopathic Cases” occurred nearly 30,000 times across the data! What the heck?! Is San Francisco a city of psychopaths? Sure San Francisco has had its share of famous serial killers, but this hardly seemed likely.

To investigate this further, I simply applied a filter on the Resolution column in Pivot Billions to show only the data that ended up classified as a “Psychopathic Case” by the San Francisco PD. I then downloaded this filtered data to visualize in Tableau, choosing to also drop the Location column since Tableau has trouble interpreting it.

Now that my data was in a size and format manageable for Tableau, I loaded the data into Tableau as a Text file and graphed the data by the latitude and longitude columns. By adding the Address and Number of Rows data to the visualization and quickly integrating some interactive filters and highlighters, I ended up with a useful and informative visualization of my Policing data.

This interactive map shows where each “Psychopathic Case” police incident occured and how many incidents were recorded at that location. We can clearly see that there is a high concentration of these cases in Northeast San Francisco. By using the interactive filters in my visualization it was easy to determine that the Central, Northern, Southern, and Tenderloin districts are the main districts where such incidents were recorded the most.

So is San Francisco a city overrun by psychopaths? Hardly.

A little further digging reveals that the vast majority of these so called psychopathic resolutions were in fact “non-criminal - aided cases” where a mentally ill person was involved.

In fact if you look at the Psychopathic Case designation over time, you notice that it disappears almost entirely after 2014.

It looks like this is a case of poor classification of a very serious problem that police in all major cities have to deal with, and that is the rise of mental health related incident calls. There have been a slew of high profile incidents between law enforcement and the mentally ill in San Francisco in the past, so it is a promising sign that the SFPD is moving in the right direction and starting to limit the use of the “Psychopathic Case” designation.

If you want to explore more of this data please feel free to go through my workbook on Tableau Public.

Do the holidays mean bigger tips for NYC taxi drivers?

pivotteam — Mon, 31 Dec 2018 19:51:23 +0000

The holiday season brings with it a degree of cheer and joy that many claim makes people act friendlier towards each other. I wanted to see how this effect translates to action so I decided to look into tips for New York green taxis both during the holiday season and the rest of the year. To start, I streamed all of the green taxi data files from the public NYC Taxi and Limousine Commission Trip Record Data for 2017-07-01 to 2018-06-31 (the most recent year of green taxi data) into Pivot Billions and enhanced the data with two new columns: holidayseason and tip_percent.

There were many rows that weren’t relevant to this analysis since cash payments did not have records of tips, so I filtered out cash payments from the payment_type column in Pivot Billions bringing the total rows to ~5 Million.

To dive into the data I made use of Pivot Billions’ pivot feature to quickly reorganize all of this filtered data by where the passenger(s) were dropped off (DOLocationID) and whether the trip occurred during the holiday season. My over 9 Million original rows of data were now shrunk down to a much more manageable 513 row detailed summary. Downloading this new view of the data from Pivot Billions I switched my focus to visualizing and analysing the data in Tableau.

Now that the data was shrunk down to a size Tableau can handle, I loaded the Taxi Zone Shapefile and my newly downloaded DoLocationID_holiday_tips.csv file into Tableau. This was a simple process of loading the shapefile from our datasource and then joining it to our Pivot Billions - processed file by setting Location ID equal to DOLocationID.

After quickly defining a new metric from our data called “Holiday Effect” that tracks the percentage difference in average tips between the holiday season and the rest of the year and adding a few dynamic filters to the data to make it interactive and explorable, I was left with a very clear and powerful visualization of the green taxi data.

It is immediately clear that there are regions with a much greater occurrence of positive holiday effects (blue areas) than negative effects (orange areas) as well as the reverse. Utilizing Tableau’s dynamic filters it's easy to narrow down the data by location and explore which areas of New York experience the effect the most. It appears that Bronx and Brooklyn experience more negative effects whereas Queens is evenly spread between positive and negative. However, Manhattan and Newark Airport have a much higher proportion of positive effects due to the holiday season.

Though most of New York is being affected by the holidays for better or worse, people going to Manhattan and Newark Airport seem to be feeling the holiday spirit the most.

To view and interact with this visualization or download the workbook to Tableau, see my Holiday Effect on Tips by Drop-Off Location workbook on Tableau Public. You can also explore or download my other workbook to see the Holiday Effect on Tips by Pick-Up Location.

Simplifying Iowa Liquor Sales Data for Loading to Tableau

pivotteam — Fri, 19 Oct 2018 21:41:45 +0000

Overview

In this 5 minute analysis, we pre-process, map, and explore complicated public sales data for liquor stores in Iowa to extract relevant latitude and longitude from a problematic column in the data.

We want to filter the data for the city with the most sales and prepare it for easy loading into the popular BI (business intelligence) tool, Tableau. Finally we use the extracted location data to visualize sales within the selected city using the Tableau Map.

Steps

1. Perform ETL and Load

In this example, we need to first do a little ETL(extract, transform, load) on one of the data columns before we load the data. The data column we are targeting is the StoreLocation column whose contents include both the address of the store as well as latitude and longitude coordinates.

Example: 1414 48TH ST FORT MADISON 52627 (40.624226, -91.373211)

We want to extract only the latitude and longitude data and create two new columns with the extracted data. To do this we do the following:

Drag and drop the data file into Pivot Billions, select the Main checkbox option for that file and then click on Column Preview.
Click the Plus (+) icon on the bottom left hand side of the window to add a new column.
Click the edit mode icon to the left of the data type selector to switch to Advanced ETL mode.
Enter latitude into the Column Name box.
Enter -eval s:latitude '""' -eval - 'RxMap(StoreLocation,"^.*($.*$,$.*$).*$",latitude,"%%1%%")' into the Ess Syntax box.

What's this Ess Syntax? It is a simple scripting format that Pivot Billion uses that allows for data pre-processing. In this step, the "-eval" function that starts the script indicates that we are substituting our new variable latitude with the contents from somewhere else. In this case, the content is coming from what's inside the parentheses of StoreLocation.
Click the Plus (+) icon again to create a second new column.
Click the icon to the left of the new edit box to switch to Advanced ETL mode.
Enter longitude into the Column Name box.
Enter -eval s:longitude '""' -eval - 'RxMap(StoreLocation,"^.*($.*$,$.*$).*$",longitude,"%%2%%")' into the Ess Syntax box.
Confirm the output is what is expected by clicking Check.
Click Import at the top of the screen.

Note: This is a very large file, 3.22 GB with over 12 million rows, so we recommend that your system has at least 16 GB of memory and that your docker setting allocates at least 7.5 GB for docker containers.

2. View and Explore the Data
We now have access to all 12,591,077 rows of the Iowa Liquor Sales data. This contains the original data in an easy tabular format along with the two new latitude and longitude features we’ve created.

We viewed the distribution of sales by city for our data by clicking on the distribution charting icon in the City column. Not surprisingly, the capital of Iowa (Des Moines) with the highest population of any city in the state has the highest total number of liquor sales.

Now, for our analysis we’ll filter all of the data to only include data for the city of Des Moines.

Click on the filter icon in the City column.
Select Equals from the dropdown and then enter “Des Moines” and press enter. All of our data is immediately filtered to just Des Moines data.

The data is almost ready for Tableau or another business intelligence tool but the StoreLocation column is troublesome and prevents Tableau from reading the data. So we’ll go ahead and download the data without this column.

Click the icon on the top right.
Unselect the box next to StoreLocation to remove that column from your selected data.
Now click the icon on the top right.
Enter “DesMoinesSales” for the file name and then click Selected Columns.
When the file is ready for download click the at the bottom of the screen as seen below.

3. Import to Tableau and Graph the Data

We can now easily import this data into Tableau and use its powerful visualizations to dive into Des Moines liquor sales by location.

Unzip the “DesMoinesSales.csv.zip” file we just downloaded from PivotBillions.
Open Tableau and click Text under Connect.
Navigate to the folder where the “DesMoinesSales.csv” file is located and Open it.
Then click on Sheet1.
Right click Latitude under Measures and select Convert to Discrete and then Convert to Dimension.
Repeat this for Longitude.
Drag Longitude to Columns and Drag Latitude to Rows as seen below.
Drag Sale Dollars to Size and drag whatever features you want to Detail.

The Wrap-up

It is clear from this SymbolMap plot that the sales for Des Moines are highest near the city center. However, it is also clear that there are many high volume locations throughout the city. We can now visualize the data however we see fit in Tableau. We could also easily remove our city filter from PivotBillions and load the whole preprocessed data into Tableau, or change our filter(s) to select a different subset or arrangement of the data.

To view and interact with this visualization or download the workbook to Tableau, see my Iowa Liquor Sales Workbook on Tableau Public.