Now that you’ve mastered NumPy for numerical analysis, you face a new challenge: Bean Counter’s data comes in spreadsheets, CSVs, and databases. You have:
Sales reports from 50+ locations
Product catalogs with thousands of items
Customer data with demographics and preferences
Supplier information across multiple regions
The Problem: This data has labels, categories, dates, and mixed types. NumPy arrays aren't enough; you need something more powerful!
Your Tool: Pandas. Think of it as Excel, but better and capable of handling millions of rows.
In this tutorial, you’ll learn to wrangle business data, turning spreadsheets into actionable insights.
An Excel worksheet is capped at 1,048,576 rows; Pandas has no such limit!
Note: How to Use This Tutorial
First, import pandas and numpy. Then work through the exercises marked "YOUR CODE BELOW".
Section 1 - DataFrames: Your CEO’s Digital Spreadsheet
A DataFrame is like an Excel spreadsheet inside Python, only more powerful. Let's create one for Bean Counter's store performance.
import pandas as pd
import numpy as np

# Create a DataFrame from a dictionary
store_data = {
    'store_id': [101, 102, 103, 104, 105],
    'location': ['Downtown', 'Airport', 'University', 'Beach', 'Mall'],
    'monthly_sales': [450000, 620000, 380000, 290000, 510000],
    'staff_count': [45, 62, 38, 28, 52],
    'customer_rating': [4.8, 4.5, 4.9, 4.7, 4.6]
}

df = pd.DataFrame(store_data)

print("Bean Counter Store Performance:")
print(df)
print(f"\nDataFrame shape: {df.shape} (rows, columns)")
Pandas DataFrames: Perfect for labeled, mixed-type data
Think of DataFrames as spreadsheets!
Exercise 1.1 - Create Product Catalog DataFrame
Build a DataFrame for Bean Counter’s top products.
import pandas as pd

# Create a dictionary with product data
product_data = {
    'product_name': ['Espresso', 'Latte', 'Cappuccino', 'Americano', 'Mocha'],
    'price': [2.50, 4.50, 4.00, 3.00, 5.00],
    'cost': [0.75, 1.50, 1.25, 0.90, 1.80],
    'units_sold_daily': [450, 320, 280, 210, 190]
}

# YOUR CODE BELOW
# Create DataFrame
products_df =
# Test your DataFrame
assert products_df.shape == (5, 4), "Should have 5 products and 4 columns"
assert list(products_df.columns) == ['product_name', 'price', 'cost', 'units_sold_daily']
assert products_df['price'].sum() == 19.00, "Total price should be 19.00"
print("Bean Counter Product Catalog:")
print(products_df)
print("Perfect! Your product catalog is ready for analysis!")
Section 2 - Exploring Your Business Data
As CEO, you need to quickly understand your data. Pandas provides powerful exploration tools.
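Before the exercise, here is a minimal sketch of the three exploration methods you will use: head(), shape, and describe() (plus info() for dtypes). The DataFrame below is made up purely for illustration and is not the exercise data.

import pandas as pd

# Small illustrative DataFrame (made-up numbers, not the exercise data)
demo_df = pd.DataFrame({
    'store': ['Harbor', 'Central'],
    'revenue': [120000, 95000],
    'rating': [4.6, 4.3]
})

print(demo_df.head(1))     # first n rows (default n=5)
print(demo_df.shape)       # (rows, columns) as a tuple
demo_df.info()             # column names, dtypes, non-null counts
print(demo_df.describe())  # count, mean, std, min, quartiles, max for numeric columns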
Use pandas exploration methods to understand store performance data.
import pandas as pd
import numpy as np

# Store performance data
performance_df = pd.DataFrame({
    'store': ['Plaza', 'Station', 'Park', 'Beach', 'Airport', 'Mall', 'Downtown', 'University'],
    'quarterly_revenue': [1250000, 980000, 870000, 650000, 1450000, 1100000, 1350000, 920000],
    'profit_margin': [32.5, 28.7, 30.1, 25.4, 35.2, 31.8, 33.9, 29.5],
    'customer_count': [45000, 38000, 34000, 28000, 52000, 41000, 48000, 36000],
    'satisfaction': [4.7, 4.5, 4.6, 4.8, 4.4, 4.6, 4.8, 4.9]
})

# YOUR CODE BELOW
# 1. Display the first 3 rows and store in a variable
first_rows =

# 2. Get the shape (rows, columns) and save it
data_shape =

# 3. Get summary statistics for numerical columns and store them
summary_stats =
# Test your exploration
assert data_shape == (8, 5), "Should have 8 stores and 5 columns"
assert len(first_rows) == 3, "first_rows should have 3 stores"
assert isinstance(summary_stats, pd.DataFrame), "summary_stats should be a DataFrame"
print("First 3 stores:")
print(first_rows)
print(f"\nDataFrame shape: {data_shape}")
print("Great exploration! You understand your data structure!")
Section 3 - Selecting and Filtering CEO Reports
As CEO, you need to slice and dice data to answer specific questions.
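As a quick illustration before the test below: single columns are selected with df['col'], multiple columns with df[['col1', 'col2']], and rows are filtered by putting a boolean condition inside the square brackets. The store names and numbers in this sketch are made up and are not the exercise data.

import pandas as pd

# Made-up data purely for illustration
stores = pd.DataFrame({
    'location': ['Harbor', 'Central', 'Campus'],
    'monthly_revenue': [150000, 90000, 120000],
    'growth_rate': [2.5, -1.0, 4.2]
})

# Single column (a Series) and multiple columns (a DataFrame)
revenue = stores['monthly_revenue']
subset = stores[['location', 'growth_rate']]

# Boolean filtering: keep only the rows where the condition is True
high_revenue = stores[stores['monthly_revenue'] > 100000]
growing = stores[stores['growth_rate'] > 0]
print(high_revenue)
print(growing)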
# Test your filtering
assert len(high_revenue_stores) == 4, "Should find 4 high revenue stores"
assert all(high_revenue_stores['monthly_revenue'] > 100000), "All selected stores should have revenue > 100000"
assert len(growing_stores) == 6, "Should find 6 stores with positive growth"
assert all(growing_stores['growth_rate'] > 0), "All selected stores should have positive growth"
print(f"High revenue stores: {list(high_revenue_stores['location'])}")
print(f"Growing stores: {list(growing_stores['location'])}")
print("Excellent! You've identified your star performers!")
Section 4 - Creating Strategic Insights
As CEO, you need to create new metrics and sort data for decision-making.
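The pattern you will need is assigning a new column from arithmetic on existing columns, then ranking with sort_values(). A minimal sketch with made-up numbers (not the exercise data):

import pandas as pd

# Made-up numbers purely for illustration
demo = pd.DataFrame({
    'store': ['Harbor', 'Central'],
    'revenue': [100000, 80000],
    'costs': [70000, 60000]
})

# New columns are created by assignment; the arithmetic is applied row by row
demo['profit'] = demo['revenue'] - demo['costs']
demo['profit_margin'] = demo['profit'] / demo['revenue'] * 100

# Rank stores by profit, highest first
ranked = demo.sort_values('profit', ascending=False)
print(ranked)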
Create a comprehensive performance analysis with calculated metrics.
import pandas as pd

# Store operational data
operations_df = pd.DataFrame({
    'store': ['Plaza', 'Airport', 'Beach', 'Mall', 'Downtown'],
    'revenue': [125000, 145000, 65000, 110000, 135000],
    'costs': [87500, 94250, 48750, 77000, 87750],
    'customers': [4500, 5200, 2800, 4100, 4800],
    'staff': [25, 32, 15, 23, 28]
})

# YOUR CODE BELOW
# 1. Calculate profit for each store
operations_df['profit'] =

# 2. Calculate profit margin percentage
operations_df['profit_margin'] =

# 3. Calculate revenue per customer
operations_df['revenue_per_customer'] =

# 4. Calculate customers per staff (efficiency)
operations_df['efficiency'] =

# 5. Sort by profit (highest to lowest)
top_performers =
# Test your dashboard
assert operations_df['profit'].sum() == 184750, "Total profit should be 184,750"
assert operations_df['profit_margin'].max() == 35.0, "Highest margin should be 35%"
assert top_performers.iloc[0]['store'] == 'Airport', "Airport should be most profitable"
print("CEO Dashboard - Top Performers by Profit:")
print(top_performers[['store', 'profit', 'profit_margin', 'efficiency']])
print("Great CEO dashboard! You have visibility of performance!")
Section 5 - Reading Real Business Data
As CEO, you’ll work with data from various sources. Let’s load real files!
import pandas as pd
import io

# Simulate reading a CSV file (in practice, you'd use pd.read_csv('filename.csv'))
csv_data = """store_id,location,jan_sales,feb_sales,mar_sales
101,Downtown,125000,132000,118000
102,Airport,145000,152000,148000
103,Beach,82000,79000,85000
104,Mall,98000,102000,105000
105,University,76000,81000,79000
"""

df = pd.read_csv(io.StringIO(csv_data))
print("Quarterly Sales Report (loaded from CSV):")
print(df)

# Calculate Q1 totals
df['q1_total'] = df['jan_sales'] + df['feb_sales'] + df['mar_sales']
print("\nWith Q1 Totals:")
print(df[['location', 'q1_total']])
Process a CSV file containing Bean Counter’s sales data.
import pandas as pd
import io

# Simulated CSV data (in practice, you'd read from a file)
csv_content = """product,category,price,units_sold,customer_rating
Espresso,Coffee,2.50,4500,4.7
Latte,Coffee,4.50,3200,4.8
Cappuccino,Coffee,4.00,2800,4.6
Croissant,Food,3.50,1200,4.5
Muffin,Food,2.75,1800,4.4
Sandwich,Food,6.50,900,4.7
Mocha,Coffee,5.00,1900,4.5
Americano,Coffee,3.00,2100,4.6
"""

# YOUR CODE BELOW
# 1. Read the CSV data
sales_df = pd.read_csv(io.StringIO(csv_content))

# 2. Calculate revenue for each product
sales_df['revenue'] =

# 3. Filter for only Coffee products
coffee_df =

# 4. Sort coffee products by units_sold (highest first)
coffee_sorted =

# 5. Calculate total coffee revenue
total_coffee_revenue =
# Test your data loading and analysis
assert len(coffee_df) == 5, "Should have 5 coffee products"
assert coffee_sorted.iloc[0]['product'] == 'Espresso', "Espresso should be top seller"
assert total_coffee_revenue == 52650.0, "Coffee revenue should be 52,650"
print("Top Coffee Products by Volume:")
print(coffee_sorted[['product', 'units_sold', 'revenue']])
print(f"\nTotal Coffee Revenue: ${total_coffee_revenue:,.2f}")
print("Perfect! You can now load and analyze real business data!")
Conclusion
Congratulations! You've taken your first steps with Pandas for business management!
You’ve learned:
DataFrames - Creating and working with structured business data
Exploration - Using head(), info(), describe() for quick insights
Selection - Accessing specific columns and rows of data
Filtering - Finding data that meets business criteria
Calculated Columns - Creating new metrics from existing data
Sorting - Ranking data by any metric
File I/O - Loading data from CSV and Excel files
Your Bean Counter CEO data toolkit now includes:
Ability to work with spreadsheet-like data in Python
Tools to explore and understand large datasets quickly
Skills to filter and find exactly the data you need
Power to create custom metrics and KPIs
Capability to process data from multiple file formats
Remember:
DataFrames are like Excel spreadsheets, but better
Use df['column'] for single columns, df[['col1','col2']] for multiple
Filter with boolean conditions: df[df['revenue'] > 100000]
Create new columns with calculations: df['profit'] = df['revenue'] - df['cost']
Sort with sort_values() to rank your data
Load real data with pd.read_csv() and pd.read_excel()
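To tie these reminders together, here is a minimal recap sketch. The file name and column names are made up for illustration, and the read_csv line assumes such a file exists:

import pandas as pd

# Load data (hypothetical file name; pd.read_excel('sales.xlsx') works analogously for Excel files)
df = pd.read_csv('sales.csv')

# Explore, filter, calculate, and sort (assumes 'revenue' and 'cost' columns exist)
print(df.head())
profitable = df[df['revenue'] > 100000]
df['profit'] = df['revenue'] - df['cost']
ranked = df.sort_values('profit', ascending=False)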
What’s Next: In the final session of the Python introduction, you’ll combine NumPy and Pandas with visualization to create compelling charts and graphs that will wow the board of directors and drive strategic decisions!
Solutions
You will likely find solutions to most exercises online. However, I strongly encourage you to work on these exercises independently without searching explicitly for the exact answers to the exercises. Understanding someone else’s solution is very different from developing your own. Use the lecture notes and try to solve the exercises on your own. This approach will significantly enhance your learning and problem-solving skills.
Remember, the goal is not just to complete the exercises, but to understand the concepts and improve your programming abilities. If you encounter difficulties, review the lecture materials, experiment with different approaches, and don’t hesitate to ask for clarification during class discussions.