US Craft Beers¶

Setting up the database connection¶

In [ ]:

import pandas as pd
import sqlite3

conn = sqlite3.connect("Pathname")

The Craft Beers dataset

In this project, I utilized an intriguing dataset of craft beers, originally compiled by Jean-Nicholas Hould. This dataset, which I sourced from a detailed web scraping tutorial available on Hould's website, encompasses a diverse range of craft beers, providing detailed information about each, including their names, types, brewing details, and other relevant attributes. My objective in this project is to delve deep into this dataset, applying my data analysis skills to uncover trends, patterns, and insights that reveal the nuances and dynamics of the craft beer market. This project not only serves as a testament to my analytical capabilities but also highlights the potential of skillfully curated data in driving meaningful analysis.

Data source: here.

The project began with connecting to a craft beer SQLite database and identifying 'beers' and 'breweries' as the key data tables:

In [ ]:

query = """
SELECT name 
FROM sqlite_master 
WHERE type='table';
"""
df = pd.read_sql_query(query, conn)
df

Out[ ]:

	name
0	beers
1	breweries

The schemas of 'beers' and 'breweries' tables were examined to understand their composition, essential for informed data analysis:

In [ ]:

for table in ['beers','breweries']:
    
    query = f"""
    PRAGMA table_info({table});
    """
    df = pd.read_sql_query(query, conn)
    print(df[['name','type']])
    print('='*30)

         name     type
0         abv     REAL
1         ibu     REAL
2          id  INTEGER
3        name     TEXT
4       style     TEXT
5  brewery_id  INTEGER
6      ounces     REAL
==============================
         name     type
0  brewery_id  INTEGER
1        name     TEXT
2        city     TEXT
3       state     TEXT
==============================

Comprehensive metadata from the 'sqlite_master' table was retrieved to gain insight into the overall database structure, guiding the analytical approach:

In [ ]:

metadata_query = """
SELECT *
FROM sqlite_master;
"""
metadata_result = pd.read_sql_query(metadata_query, conn)
metadata_result

Out[ ]:

	type	name	tbl_name	rootpage	sql
0	table	beers	beers	2	CREATE TABLE "beers" (\n"abv" REAL,\n "ibu" R...
1	table	breweries	breweries	40	CREATE TABLE "breweries" (\n"brewery_id" INTEG...

The first five records from the 'beers' table were analyzed, focusing on fundamental attributes like alcohol content and beer style, setting a direction for deeper exploration:

In [ ]:

five_rows_beers_query = """
SELECT abv, ibu, name, style
FROM beers
LIMIT 5
"""
five_rows_beers_result = pd.read_sql_query(five_rows_beers_query, conn)
five_rows_beers_result

Out[ ]:

	abv	ibu	name	style
0	0.050	None	Pub Beer	American Pale Lager
1	0.066	None	Devil's Cup	American Pale Ale (APA)
2	0.071	None	Rise of the Phoenix	American IPA
3	0.090	None	Sinister	American Double / Imperial IPA
4	0.075	None	Sex and Candy	American IPA

The initial exploration of the 'breweries' table involved examining its first five entries, focusing on brewery identification and location, critical for contextual brewery analysis:

In [ ]:

five_rows_breweries_query = """
SELECT brewery_id, name, city, state
FROM breweries
LIMIT 5;
"""
five_rows_breweries_result = pd.read_sql_query(five_rows_breweries_query, conn)
five_rows_breweries_result

Out[ ]:

	brewery_id	name	city	state
0	0	NorthGate Brewing	Minneapolis	MN
1	1	Against the Grain Brewery	Louisville	KY
2	2	Jack's Abby Craft Lagers	Framingham	MA
3	3	Mike Hess Brewing Company	San Diego	CA
4	4	Fort Point Beer Company	San Francisco	CA

The total number of entries in the 'beers' table was calculated to determine the dataset's breadth, informing the strategy for subsequent analyses:

In [ ]:

beers_total_rows_query = """
SELECT COUNT(*) AS row_count
FROM beers;
"""
beers_total_rows_result = pd.read_sql_query(beers_total_rows_query, conn)
beers_total_rows_result

Out[ ]:

	row_count
0	2410

Unique beer styles in the 'beers' table were counted to assess the diversity of styles represented in the dataset, reflecting the craft beer market's variety:

In [ ]:

distinct_styles_query = """
SELECT COUNT (DISTINCT(style)) AS distinct_styles
FROM beers;
"""
distinct_styles_result = pd.read_sql_query(distinct_styles_query, conn)
distinct_styles_result

Out[ ]:

	distinct_styles
0	99

The range of alcoholic strength in the dataset was analyzed by calculating the minimum, average, and maximum ABV in the 'beers' table, identifying brewing trends and preferences:

In [ ]:

alcoholic_content_query = """
SELECT MIN(abv) AS min, AVG(abv) AS avg, MAX(abv) AS max
FROM beers;
"""
alcoholic_content_result = pd.read_sql_query(alcoholic_content_query, conn)
alcoholic_content_result

Out[ ]:

	min	avg	max
0	0.001	0.059773	0.128

Alcohol by Volume (ABV) data from the 'beers' table was extracted for a detailed analysis of alcohol content distribution, laying the groundwork for statistical examination and visualization:

In [ ]:

abv_distribution_query = """
SELECT abv
FROM beers;
"""
abv_distribution_result = pd.read_sql_query(abv_distribution_query, conn)
abv_distribution_result

Out[ ]:

	abv
0	0.050
1	0.066
2	0.071
3	0.090
4	0.075
...	...
2405	0.067
2406	0.052
2407	0.055
2408	0.055
2409	0.052

2410 rows × 1 columns

The distribution of ABV was visualized using a histogram with 30 bins, providing detailed insights into the variation and typical ranges of alcohol content:

In [ ]:

import matplotlib.pyplot as plt
ax = abv_distribution_result.hist(bins=30)[0][0]
ax.set_title("Alcohol by Volume")
plt.show()

Distinct IPA styles from the 'beers' table were isolated and listed to explore the diversity within this popular beer category, informing a targeted analysis of IPAs:

In [ ]:

ipa_styles_query = """
SELECT DISTINCT(style) AS ipa_styles
FROM beers
WHERE style LIKE "%IPA%"
ORDER BY style ASC;
"""
ipa_styles_result = pd.read_sql_query(ipa_styles_query, conn)
ipa_styles_result

Out[ ]:

	ipa_styles
0	American Double / Imperial IPA
1	American IPA
2	American White IPA
3	Belgian IPA
4	English India Pale Ale (IPA)

Average bitterness levels for lagers and ales, including IPAs and Pale Ales, were analyzed to compare flavor profiles and discern trends across beer types:

In [ ]:

ale_bitterness_query = """
SELECT AVG(ibu)
FROM beers
WHERE style LIKE '%lager%';
"""
ale_bitterness_result = pd.read_sql_query(ale_bitterness_query, conn)
print(ale_bitterness_result)

lager_bitterness_query = """
SELECT AVG(ibu)
FROM beers
WHERE style LIKE "%IPA%" OR "%APA%" OR "%Pale Ale%";
"""
lager_bitterness_result = pd.read_sql_query(lager_bitterness_query, conn)
print(lager_bitterness_result)

    AVG(ibu)
0  22.863636
   AVG(ibu)
0  71.94898

The most common beer styles were identified using a window function to rank styles based on occurrence frequency, highlighting the predominant styles in the craft beer market:

In [ ]:

style_count_query = """
SELECT DISTINCT(style),
COUNT(name) OVER(PARTITION BY style) AS number
FROM beers
ORDER BY number DESC
LIMIT 10;
"""
style_count_result = pd.read_sql_query(style_count_query, conn)
style_count_result

Out[ ]:

	style	number
0	American IPA	424
1	American Pale Ale (APA)	245
2	American Amber / Red Ale	133
3	American Blonde Ale	108
4	American Double / Imperial IPA	105
5	American Pale Wheat Ale	97
6	American Brown Ale	70
7	American Porter	68
8	Saison / Farmhouse Ale	52
9	Witbier	51

Beers and their corresponding breweries in San Francisco were listed, employing an inner join to provide a structured overview of the local craft beer offerings:

In [ ]:

beers_and_brewery_query = """
SELECT breweries.name AS brewery, beers.name AS beer
FROM beers
INNER JOIN breweries
ON beers.brewery_id = breweries.brewery_id
WHERE breweries.city = 'San Francisco'
ORDER BY brewery ASC, beer ASC;
"""
beers_and_brewery_result = pd.read_sql_query(beers_and_brewery_query, conn)
beers_and_brewery_result

Out[ ]:

	brewery	beer
0	21st Amendment Brewery	21st Amendment IPA (2006)
1	21st Amendment Brewery	21st Amendment Watermelon Wheat Beer (2006)
2	21st Amendment Brewery	Back in Black
3	21st Amendment Brewery	Bitter American
4	21st Amendment Brewery	Bitter American (2011)
5	21st Amendment Brewery	Brew Free! or Die IPA
6	21st Amendment Brewery	Brew Free! or Die IPA (2008)
7	21st Amendment Brewery	Brew Free! or Die IPA (2009)
8	21st Amendment Brewery	Fireside Chat
9	21st Amendment Brewery	Fireside Chat (2010)
10	21st Amendment Brewery	He Said Baltic-Style Porter
11	21st Amendment Brewery	He Said Belgian-Style Tripel
12	21st Amendment Brewery	Hell or High Watermelon Wheat
13	21st Amendment Brewery	Hell or High Watermelon Wheat (2009)
14	21st Amendment Brewery	Hell or High Watermelon Wheat (2009)
15	21st Amendment Brewery	Hop Crisis
16	21st Amendment Brewery	Lower De Boom
17	21st Amendment Brewery	Marooned On Hog Island
18	21st Amendment Brewery	Monk's Blood
19	21st Amendment Brewery	Special Edition: Allies Win The War!
20	Anchor Brewing Company	Brotherhood Steam
21	Anchor Brewing Company	California Lager
22	Anchor Brewing Company	IPA
23	Anchor Brewing Company	Liberty Ale
24	Anchor Brewing Company	Summer Wheat
25	Fort Point Beer Company	KSA
26	Fort Point Beer Company	Park
27	Fort Point Beer Company	Villager
28	Fort Point Beer Company	Westfalia
29	Pine Street Brewery	Atom Splitter Pale Ale
30	Pine Street Brewery	Black Bay Milk Stout
31	Speakasy Ales & Lagers	Baby Daddy Session IPA

The ten breweries with the most extensive range of beers were identified, showcasing the breweries with the most significant variety in their offerings:

In [ ]:

largest_breweries_query = """
SELECT breweries.name AS name, COUNT(*) AS "number of beers"
FROM beers
INNER JOIN breweries
ON beers.brewery_id = breweries.brewery_id
GROUP BY breweries.brewery_id, breweries.name 
ORDER BY "number of beers" DESC, name ASC
LIMIT 10;
"""
largest_breweries_result = pd.read_sql_query(largest_breweries_query, conn)
largest_breweries_result

Out[ ]:

	name	number of beers
0	Brewery Vivant	62
1	Sun King Brewing Company	38
2	Oskar Blues Brewery	33
3	Cigar City Brewing Company	25
4	Sixpoint Craft Ales	24
5	Hopworks Urban Brewery	23
6	Stevens Point Brewery	22
7	21st Amendment Brewery	20
8	Great Crescent Brewery	20
9	Bonfire Brewing Company	19

The top 10 states by the number of breweries and total beer production were analyzed, combining subqueries and aggregation to rank states based on craft beer industry activity:

In [ ]:

 top_10_states_query = """
     SELECT state, count(*) AS "breweries", sum(b.beers) "beers" FROM breweries
     JOIN 
         (SELECT brewery_id, count(*) AS beers
          FROM beers
          GROUP BY brewery_id) b 
     ON b.brewery_id = breweries.brewery_id
     GROUP BY state
     ORDER BY breweries DESC, state ASC
     LIMIT 10
 """

top_10_states_result = pd.read_sql_query(top_10_states_query, conn)
top_10_states_result

Out[ ]:

	state	breweries	beers
0	CO	47	265
1	CA	39	183
2	MI	32	162
3	OR	29	125
4	TX	28	130
5	PA	25	100
6	MA	23	82
7	WA	23	68
8	IN	22	139
9	WI	20	87

Analyzing the relationship between ABV and IBU. The correlation heatmap shows a moderate positive correlation of 0.67 between ABV (Alcohol by Volume) and IBU (International Bitterness Units), suggesting beers with higher ABV tend to be more bitter, but the relationship is not strong enough to predict bitterness solely based on alcohol content:

In [ ]:

import seaborn as sns

abv_ibu_query = """
SELECT abv AS ABV, ibu AS IBU
FROM beers;
"""
abv_ibu_df = pd.read_sql_query(abv_ibu_query, conn)

correlation_data = abv_ibu_df.dropna()
sns.heatmap(correlation_data.corr(), annot=True, cmap='coolwarm')
plt.title("Correlation between ABV and IBU")

plt.show()

The bar chart displays the average Alcohol by Volume (ABV) for craft beers across various states, revealing Nevada as having the highest average ABV and Utah as having the lowest. The data suggest a standardization in ABV among most states, with Nevada and Utah marking the upper and lower bounds of this range. The results could be indicative of state-specific regulations, consumer tastes, or brewing traditions that influence the alcohol content in craft beer production:

In [ ]:

import sqlite3
import matplotlib.pyplot as plt

avg_abv_by_state_query = """
SELECT state, AVG(abv) as average_abv
FROM beers
JOIN breweries ON beers.brewery_id = breweries.brewery_id
GROUP BY state
ORDER BY average_abv DESC;
"""
avg_abv_by_state = pd.read_sql_query(avg_abv_by_state_query, conn)

avg_abv_by_state.plot(kind='bar', x='state', y='average_abv', figsize=(10, 6))
plt.title("Average ABV by State")
plt.ylabel("Average ABV")
plt.xlabel('State')

plt.show()

This section of the analysis prepares the craft beer dataset for a geographical deep dive. We start by standardizing state names in the avg_abv_by_state dataset, ensuring uniformity for accurate mapping. A mapping dictionary is utilized to convert state abbreviations to their full names. This step is pivotal for aligning our beer data with the geographical data, setting the foundation for an insightful spatial analysis:

In [ ]:

import geopandas as gpd

avg_abv_by_state['state'] = avg_abv_by_state['state'].str.upper().str.strip()

state_abbrev_to_full = {
    'AL': 'Alabama', 'AK': 'Alaska', 'AZ': 'Arizona', 'AR': 'Arkansas', 'CA': 'California',
    'CO': 'Colorado', 'CT': 'Connecticut', 'DE': 'Delaware', 'FL': 'Florida', 'GA': 'Georgia',
    'HI': 'Hawaii', 'ID': 'Idaho', 'IL': 'Illinois', 'IN': 'Indiana', 'IA': 'Iowa',
    'KS': 'Kansas', 'KY': 'Kentucky', 'LA': 'Louisiana', 'ME': 'Maine', 'MD': 'Maryland',
    'MA': 'Massachusetts', 'MI': 'Michigan', 'MN': 'Minnesota', 'MS': 'Mississippi', 'MO': 'Missouri',
    'MT': 'Montana', 'NE': 'Nebraska', 'NV': 'Nevada', 'NH': 'New Hampshire', 'NJ': 'New Jersey',
    'NM': 'New Mexico', 'NY': 'New York', 'NC': 'North Carolina', 'ND': 'North Dakota', 'OH': 'Ohio',
    'OK': 'Oklahoma', 'OR': 'Oregon', 'PA': 'Pennsylvania', 'RI': 'Rhode Island', 'SC': 'South Carolina',
    'SD': 'South Dakota', 'TN': 'Tennessee', 'TX': 'Texas', 'UT': 'Utah', 'VT': 'Vermont',
    'VA': 'Virginia', 'WA': 'Washington', 'WV': 'West Virginia', 'WI': 'Wisconsin', 'WY': 'Wyoming',
    'DC': 'District of Columbia'
}

avg_abv_by_state['state'] = avg_abv_by_state['state'].map(state_abbrev_to_full)

gdf = gpd.read_file('Pathname')

gdf = gdf.merge(avg_abv_by_state, left_on='NAME', right_on='state')
gdf.set_index(gdf.state, inplace= True)
gdf.drop(['GEO_ID', 'STATE', 'LSAD', 'NAME'], inplace= True, axis= 1)

gdf.head()

Out[ ]:

	CENSUSAREA	geometry	state	average_abv
state
Maine	30842.923	MULTIPOLYGON (((-67.61976 44.51975, -67.61541 ...	Maine	0.057815
Massachusetts	7800.058	MULTIPOLYGON (((-70.83204 41.60650, -70.82373 ...	Massachusetts	0.055683
Michigan	56538.901	MULTIPOLYGON (((-88.68443 48.11579, -88.67563 ...	Michigan	0.063377
Montana	145545.801	POLYGON ((-104.05770 44.99743, -104.25015 44.9...	Montana	0.056487
Nevada	109781.180	POLYGON ((-114.05060 37.00040, -114.04999 36.9...	Nevada	0.066900

Here, we integrate the prepared craft beer data with geospatial information. Using GeoPandas, we load US state boundary data from a GeoJSON file and merge it with our craft beer dataset. By aligning the state names from both datasets and setting up the data frame for geographical plotting, we pave the way for a compelling visualization that will showcase ABV trends across the US:

In [ ]:

from bokeh.io import show, output_notebook
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar, HoverTool
from bokeh.palettes import brewer
from bokeh.plotting import figure

# Add a formatted ABV field for hover display
gdf['formatted_abv'] = gdf['average_abv'].apply(lambda x: f'{x:.3f}')

# Convert GeoDataFrame to GeoJSON format
geo_json = gdf.to_json()

# Create a GeoJSONDataSource
geo_source = GeoJSONDataSource(geojson=geo_json)

# Define a color mapper: palette can be adjusted
palette = brewer['YlGnBu'][8]
color_mapper = LinearColorMapper(palette=palette, low=gdf.average_abv.min(), high=gdf.average_abv.max())

# Create figure object with adjusted dimensions
p = figure(title='Average Alcohol by Volume (ABV) across States',
           height=600, width=950,
           toolbar_location='below',
           tools="pan, wheel_zoom, reset")

# Set the x and y range bounds
p.x_range.bounds = (-130, -65)
p.y_range.bounds = (24, 50)

# Remove axis labels and grid lines
p.axis.visible = False
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None

# Add patch renderer to figure
p.patches('xs', 'ys', source=geo_source,
          fill_color={'field': 'average_abv', 'transform': color_mapper},
          line_color='black', line_width=0.5, fill_alpha=1)

# Configure hover tool
hover = HoverTool()
hover.tooltips = [
    ("State", "@state"),
    ("Average ABV", "@formatted_abv")
]
p.add_tools(hover)

# Add color bar
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=12, width=500, height=20,
                     location=(0,0), orientation='horizontal')
p.add_layout(color_bar, 'below')

output_notebook()
show(p)

Loading BokehJS ...

The resulting map, depicting the average ABV by state, reveals intriguing regional trends in the craft beer market. States like Nevada and California might show higher average ABV, indicating a preference for stronger beers, whereas states like Utah could exhibit lower ABV averages, possibly due to local regulations or cultural preferences. This geographical representation not only highlights regional variations in beer strength but also provides valuable insights into the diverse landscape of American craft beer preferences.

The bar chart indicates that "ale" is the most frequently occurring word in beer names, suggesting it is a very popular type of beer. On the opposite end, "wheat" appears as the least common among the listed words, which could indicate it's a less common flavor profile or style in the selection of beers analyzed:

In [ ]:

from collections import Counter
import re

query_beers = """
SELECT *
FROM beers;
"""
beers = pd.read_sql_query(query_beers, conn)

words = Counter(re.findall(r'\w+', ' '.join(beers['name']).lower()))
most_common_words = words.most_common(10)

words_df = pd.DataFrame(most_common_words, columns=['Word', 'Frequency'])
words_df.plot(kind='bar', x='Word', y='Frequency')
plt.title("Most Common Words in Beer Names")

plt.show()

Ramzan Albakov

Craft Beer

A thorough investigation into the craft beer industry, employing sophisticated data analysis and visualization to uncover trends and patterns, demonstrating my ability to analyze niche markets with precision.

US Craft Beers¶

Setting up the database connection¶

	state	breweries	beers
0	CO	47	265
1	CA	39	183
2	MI	32	162
3	OR	29	125
4	TX	28	130
5	PA	25	100
6	MA	23	82
7	WA	23	68
8	IN	22	139
9	WI	20	87

	state	breweries	beers
0	CO	47	265
1	CA	39	183
2	MI	32	162
3	OR	29	125
4	TX	28	130
5	PA	25	100
6	MA	23	82
7	WA	23	68
8	IN	22	139
9	WI	20	87

	state	breweries	beers
0	CO	47	265
1	CA	39	183
2	MI	32	162
3	OR	29	125
4	TX	28	130
5	PA	25	100
6	MA	23	82
7	WA	23	68
8	IN	22	139
9	WI	20	87