After your outstanding performance as Operations Director, the board has unanimously appointed you as CEO of Bean Counter! You now lead a coffee empire with:
50+ locations across the country
1,000+ employees
Millions of transactions per month
Thousands of products and suppliers
The Challenge: As CEO, you’re drowning in data. Your Python lists take minutes to process sales reports. Board meetings are tomorrow, and you need answers NOW!
Your New Tool: NumPy, a Python library for fast numerical computing. Work that takes minutes with plain Python lists can often finish in seconds with NumPy.
Section 1 - Understanding and Installing Packages
Before we dive into NumPy, let’s understand what packages are and how to install them.
What are Packages?
As CEO, you wouldn’t build every tool from scratch - you’d use the best tools available. The same applies to Python!
Packages are collections of pre-written code that solve common problems
They’re created by experts and shared with the community
Think of them as apps you can add to Python to extend its capabilities
Tip: Why Use Packages?
Instead of writing thousands of lines of code yourself, you can install a package in seconds and get professional-grade functionality. It’s like the difference between building your own espresso machine vs. buying a professional one!
Installing NumPy with uv
To use NumPy (or any package), we first need to install it. We'll use uv, the same tool we used to install Python at the start. Run the following commands in a terminal, inside the project folder that contains your notebooks and the files uv created during initialization. Do not run them inside a notebook!
# Install NumPy for numerical computing
uv add numpy
# While we're at it, let's also install pandas for next session
uv add pandas
Note: How to Install
Open your terminal (or use the terminal in your IDE)
Type uv add numpy pandas and press Enter
Wait a few seconds for installation to complete
That’s it! You can now use NumPy in your code
Using Installed Packages
Once installed, you can import and use packages in your Python code:
import numpy as np # Import NumPy with alias 'np' (standard convention)
The as np part creates a shorthand alias. Instead of typing numpy.array() later, you can type np.array() to reach the same NumPy function.
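As a quick check that the installation and the import both worked, a minimal sketch like the following prints the installed version and builds a small array through the alias. The variable name sample_prices and its values are made up for illustration and are not part of the Bean Counter data.

```python
import numpy as np

# Confirm that NumPy is importable and show which version uv installed
print(f"NumPy version: {np.__version__}")

# Illustrative values only: three made-up drink prices
sample_prices = np.array([3.5, 4.0, 4.5])

# np.array(...) is numpy.array(...), reached through the 'np' alias
print(type(sample_prices))
print(sample_prices)
```

If the import fails with ModuleNotFoundError, the notebook is most likely not using the environment where uv add numpy was run.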
Now let’s see why NumPy is essential!
Warning: Prerequisites
Make sure you’ve installed NumPy using uv add numpy in your terminal before starting the next section!
Section 3 - Why NumPy? The Speed Advantage
Now let’s see why NumPy is so important for large-scale data processing.
import numpy as np
import time

# Comparing Python lists vs NumPy for 1 million sales transactions
size = 1000000

# Python list approach (slow)
python_sales = list(range(1, size + 1))  # Creates a list
start = time.time()  # Starts a timer
python_result = [x * 1.08 for x in python_sales]  # Add 8% tax
python_time = time.time() - start  # Computes the time difference

# NumPy approach (fast!)
numpy_sales = np.arange(1, size + 1)  # Creates a NumPy array
start = time.time()  # Starts a timer
numpy_result = numpy_sales * 1.08  # Add 8% tax to ALL at once!
numpy_time = time.time() - start  # Computes the time difference

print(f"Processing {size:,} transactions:")
print(f"Python list: {python_time:.4f} seconds")
print(f"NumPy array: {numpy_time:.4f} seconds")
print(f"NumPy is {python_time/numpy_time:.1f}x faster!")
Create a NumPy array of this week’s daily revenues and calculate the total.
import numpy as np

# Daily revenues for the week (in thousands)
daily_revenues = [125.5, 132.8, 118.9, 145.2, 155.7, 189.3, 176.4]

# YOUR CODE BELOW
# Convert to NumPy array
revenues_array =
# Calculate total weekly revenue
total_revenue =
# Test your revenue calculation
assert isinstance(revenues_array, np.ndarray), "Should be a NumPy array"
assert int(total_revenue) == int(1043.8), f"Total should be 1043.8, got {total_revenue}"
print(f"Weekly revenue: ${total_revenue:.1f}k")
print("Excellent! You've made your first CEO-level analysis with NumPy!")
Section 4 - Creating Arrays for Business Data
As CEO of Bean Counter, you need different ways to create and initialize data arrays.
import numpy as np

# Different ways to create arrays for business scenarios

# From a list - actual data
quarterly_sales = np.array([1250000, 1380000, 1420000, 1510000])

# Zeros - initialize budget tracking
budget_tracking = np.zeros(12)  # 12 months, all start at 0

# Ones - initialize satisfaction scores
initial_ratings = np.ones(10) * 4.5  # 10 stores, all start at 4.5

# Range - sequential IDs or time periods
store_ids = np.arange(1, 51)  # Store IDs from 1 to 50
months = np.arange(1, 13)  # Months 1-12

print(f"Q1 Sales: ${quarterly_sales[0]:,}")
print(f"Number of stores: {len(store_ids)}")
print(f"Initial ratings: {initial_ratings[:3]}...")  # First 3 stores
Q1 Sales: $1,250,000
Number of stores: 50
Initial ratings: [4.5 4.5 4.5]...
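To see what each creation function actually produced, it can help to inspect the shape and data type of the result. The sketch below rebuilds three of the arrays from the example above and prints those properties. The loop and the labels are illustrative additions, not part of the original material.

```python
import numpy as np

budget_tracking = np.zeros(12)        # np.zeros returns floats by default
store_ids = np.arange(1, 51)          # integers; the stop value 51 is excluded
initial_ratings = np.ones(10) * 4.5   # ones scaled to a constant value

# Print shape, data type, and the first and last element of each array
for name, arr in [
    ("budget_tracking", budget_tracking),
    ("store_ids", store_ids),
    ("initial_ratings", initial_ratings),
]:
    print(f"{name}: shape={arr.shape}, dtype={arr.dtype}, "
          f"first={arr[0]}, last={arr[-1]}")
```

Note that np.arange(1, 51) stops at 50, which is why the store IDs run from 1 to 50 even though the second argument is 51.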
Exercise 4.1 - Initialize Company Metrics
As CEO, set up arrays for tracking various company metrics.
import numpy as np

# YOUR CODE BELOW
# 1. Create an array of 50 zeros for tracking store profits
store_profits =
# 2. Create an array with store IDs from 101 to 150 (50 stores)
store_codes =
# 3. Create an array of 12 months, each starting with budget of 100000
monthly_budgets =
# Test your arrays
assert store_profits.shape == (50,), "Should have 50 stores"
assert store_codes[0] == 101 and store_codes[-1] == 150, "Store codes should be 101-150"
assert np.sum(monthly_budgets[:3]) == 300000, "Q1 budget should be 300000"
print(f"Profit tracking shape: {store_profits.shape}")
print(f"First 5 store codes: {store_codes[:5]}")
print(f"Q1 budget total: ${np.sum(monthly_budgets[:3]):,}")
print("Perfect! Your metric tracking system is initialized!")
Section 5 - Vectorized Operations for Mass Updates
As CEO, you need to update prices, apply taxes, and calculate profits across lots of products instantly.
import numpy as np

# Prices for 10 products
original_prices = np.random.uniform(2.50, 25.00, 10)
print(f"Sample original prices: {original_prices[:5].round(2)}")

# CEO Decision: 15% price increase across the board
new_prices = original_prices * 1.15

# Add 8% sales tax to all prices
prices_with_tax = new_prices * 1.08

# Calculate revenue if we sell 50 of each product
quantities = np.ones(10) * 50
revenues = prices_with_tax * quantities

print(f"\nAfter 15% increase + tax: {prices_with_tax[:5].round(2)}")
print(f"Total potential revenue: ${revenues.sum():,.2f}")
Sample original prices: [12.44 22.27 9.93 11.65 4.93]
After 15% increase + tax: [15.45 27.65 12.34 14.47 6.12]
Total potential revenue: $6,806.60
Tip: Random Number Generation
NumPy can, for example, generate uniformly distributed random numbers between two values. The syntax is np.random.uniform(a, b, c), where a is the lower bound, b is the upper bound (b itself is excluded), and c is the number of random values to generate.
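Because the example above sets no seed, its printed prices change on every run. The sketch below shows one way to make such draws repeatable with np.random.seed, the same function used later in this tutorial. The seed value 7 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(7)  # arbitrary seed; any fixed integer makes the draw repeatable
first_draw = np.random.uniform(2.50, 25.00, 5)

np.random.seed(7)  # resetting the same seed restarts the same sequence
second_draw = np.random.uniform(2.50, 25.00, 5)

print(first_draw.round(2))
print(f"Same numbers both times: {np.array_equal(first_draw, second_draw)}")

# Every value lies in the half-open range [2.50, 25.00)
print(f"Within bounds: {first_draw.min() >= 2.50 and first_draw.max() < 25.00}")
```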
Warning: No Loops Needed!
With NumPy, avoid writing loops for mathematical operations:
# Don't do this:
for i in range(len(prices)):
    prices[i] = prices[i] * 1.15

# Do this instead:
prices = prices * 1.15
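The two versions above are meant to produce the same numbers. A small runnable check, using made-up prices as an assumption, could look like this:

```python
import numpy as np

# Illustrative prices, not taken from the Bean Counter data
prices = np.array([2.50, 3.00, 4.00, 10.00])

# Loop version: update a copy one element at a time
looped = prices.copy()
for i in range(len(looped)):
    looped[i] = looped[i] * 1.15

# Vectorized version: a single expression covers the whole array
vectorized = prices * 1.15

print(f"Loop result:       {looped}")
print(f"Vectorized result: {vectorized}")
print(f"Results match: {np.allclose(looped, vectorized)}")
```

The vectorized form is shorter and leaves the original prices array unchanged, since it builds a new array instead of overwriting elements in place.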
Perform mass calculations across all Bean Counter stores.
import numpy as np

# Monthly data for 50 stores
revenues = np.array([
    125000, 98000, 145000, 87000, 156000, 134000, 92000, 167000,
    118000, 143000, 99000, 175000, 132000, 89000, 154000, 121000,
    138000, 95000, 162000, 108000, 147000, 131000, 88000, 159000,
    126000, 141000, 93000, 168000, 115000, 152000, 128000, 86000,
    144000, 119000, 137000, 96000, 171000, 113000, 149000, 124000,
    135000, 91000, 164000, 107000, 146000, 129000, 85000, 158000,
    122000, 140000
])

# Cost is 65% of revenue for each store
costs = revenues * 0.65

# YOUR CODE BELOW
# 1. Calculate profit for each store (revenue - costs)
profits =
# 2. Calculate profit margin for each store (profit / revenue * 100)
profit_margins =
# 3. Apply 25% corporate tax to get after-tax profit
profits_after_tax =

print(f"Total monthly profit (before tax): ${profits.sum():,.2f}")
print(f"Total monthly profit (after tax): ${profits_after_tax.sum():,.2f}")
# Test your calculations
assert np.isclose(profits.sum(), 2240700), "Total profit before tax should be 2,240,700"
assert np.isclose(profit_margins.mean(), 35.0), "Average margin should be 35%"
assert np.isclose(profits_after_tax.sum(), 1680525), "After-tax profit should be 1,680,525"
print(f"Total monthly profit (before tax): ${profits.sum():,.2f}")
print(f"Total monthly profit (after tax): ${profits_after_tax.sum():,.2f}")
print("Fantastic! You've learned company-wide financial calculations!")
Section 6 - Statistical Analysis for CEO Insights
CEOs need quick statistical insights to make strategic decisions.
import numpy as np

# Customer satisfaction scores from 1000 surveys
np.random.seed(42)  # For reproducible results
satisfaction_scores = np.random.normal(4.2, 0.5, 1000)  # Mean 4.2, std 0.5, normal distribution
satisfaction_scores = np.clip(satisfaction_scores, 1, 5)  # Keep between 1-5

print(f"Survey Analysis (n=1000):")
print(f"Average satisfaction: {satisfaction_scores.mean():.2f}")
print(f"Standard deviation: {satisfaction_scores.std():.2f}")
print(f"Lowest score: {satisfaction_scores.min():.2f}")
print(f"Highest score: {satisfaction_scores.max():.2f}")
print(f"Median score: {np.median(satisfaction_scores):.2f}")

# How many customers gave 4+ stars?
highly_satisfied = np.sum(satisfaction_scores >= 4.0)
print(f"\nHighly satisfied (4+): {highly_satisfied} ({highly_satisfied/10:.1f}%)")
Survey Analysis (n=1000):
Average satisfaction: 4.20
Standard deviation: 0.46
Lowest score: 2.58
Highest score: 5.00
Median score: 4.21
Highly satisfied (4+): 649 (64.9%)
Note how we can use the methods and functions NumPy provides to calculate statistics on the satisfaction scores array. They are efficient and concise, which makes our code more readable and maintainable.
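The survey example mixes two call styles: methods on the array (such as .mean()) and functions from the np namespace (such as np.median()). The sketch below uses a tiny made-up array so the results can be checked by hand. The values are assumptions chosen for easy arithmetic.

```python
import numpy as np

# Four illustrative scores; the mean and median are both 4.0
scores = np.array([3.0, 4.0, 4.0, 5.0])

# Many statistics exist both as an array method and as an np function
print(f"Mean (method):   {scores.mean()}")
print(f"Mean (function): {np.mean(scores)}")
print(f"Min / max:       {scores.min()} / {scores.max()}")

# The median is only available as a function: arrays have no .median() method
print(f"Median:          {np.median(scores)}")

# By default .std() is the population standard deviation (divides by n)
print(f"Std (population): {scores.std():.4f}")
print(f"Std (sample):     {scores.std(ddof=1):.4f}")
```

Passing ddof=1 switches to the sample standard deviation, which divides by n - 1 and is the variant many spreadsheet tools report by default.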
2D Arrays
So far, we’ve worked with 1D arrays (like a single row or column). As CEO, you often need 2D arrays. Think of them as tables with rows and columns!
import numpy as np

# Example: Sales data for 5 stores over 7 days
# Rows = stores, Columns = days
sales_table = np.array([
    [125, 132, 128, 145, 155, 189, 176],  # Store 1
    [98, 102, 95, 108, 115, 142, 138],    # Store 2
    [156, 162, 159, 171, 178, 198, 192],  # Store 3
    [87, 91, 88, 95, 102, 125, 118],      # Store 4
    [134, 139, 136, 148, 153, 178, 165]   # Store 5
])

print("Sales Table (5 stores × 7 days):")
print(sales_table)
print(f"\nShape: {sales_table.shape} (rows, columns)")

# Calculate statistics along different axes
total_per_store = np.sum(sales_table, axis=1)  # Sum across columns (days) for each store
total_per_day = np.sum(sales_table, axis=0)  # Sum across rows (stores) for each day

print(f"\nTotal sales per store: {total_per_store}")
print(f"Total sales per day: {total_per_day}")
axis=1 gives you one value per row (e.g., the total or average per store)
axis=0 gives you one value per column (e.g., the total or average per day)
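The axis rule is easiest to verify on a table small enough to add up by hand. The following sketch uses a made-up table with 2 stores and 3 days. The numbers are chosen only so that row and column sums are easy to tell apart.

```python
import numpy as np

# Illustrative table: 2 stores (rows) x 3 days (columns)
small_table = np.array([
    [10, 20, 30],  # Store A
    [1, 2, 3],     # Store B
])

per_store = small_table.sum(axis=1)       # one value per row: [60, 6]
per_day = small_table.sum(axis=0)         # one value per column: [11, 22, 33]
overall = small_table.sum()               # no axis: a single total, 66
avg_per_store = small_table.mean(axis=1)  # one average per row: [20., 2.]

print(f"Per store: {per_store}")
print(f"Per day:   {per_day}")
print(f"Overall:   {overall}")
print(f"Average per store: {avg_per_store}")
```

A useful way to remember it: the axis you pass is the one that disappears. With axis=1 the columns are collapsed, leaving one result per row.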
Boolean Filtering and Binary Vectors
An important concept: when you filter with a condition, NumPy creates a boolean (True/False) array, also called a binary vector!
import numpy as np

# Sample satisfaction scores
scores = np.array([4.8, 3.2, 4.5, 2.8, 4.9, 3.7, 4.2, 5.0])

# When we apply a condition, we get a boolean array (binary vector)
high_scores_mask = scores >= 4.0
print(f"Original scores: {scores}")
print(f"Boolean mask (>= 4.0): {high_scores_mask}")
print(f"Type: {type(high_scores_mask)}")

# We can use this binary vector in several ways:
# 1. Count True values (treating True=1, False=0)
count_high = np.sum(high_scores_mask)
print(f"\nNumber of high scores: {count_high}")

# 2. Filter to get only values that are True
filtered_scores = scores[high_scores_mask]
print(f"High scores only: {filtered_scores}")

# 3. Do it all in one line (common pattern)
count_directly = np.sum(scores >= 4.0)
print(f"Count directly: {count_directly}")
Original scores: [4.8 3.2 4.5 2.8 4.9 3.7 4.2 5. ]
Boolean mask (>= 4.0): [ True False True False True False True True]
Type: <class 'numpy.ndarray'>
Number of high scores: 5
High scores only: [4.8 4.5 4.9 4.2 5. ]
Count directly: 5
Important: Boolean Arrays (Binary Vectors)
When you write array >= value, NumPy creates a boolean array:
True (=1) where the condition is met
False (=0) where the condition is not met
This binary vector can be used to:
Count: np.sum(condition) adds up the 1s and 0s
Filter: array[condition] returns only the elements where the condition is True
Analyze: check what percentage of values meets the criteria
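The "Analyze" use can be sketched with the same scores as in the example above. Since True counts as 1 and False as 0, the mean of a boolean mask is the fraction of values that meet the condition. The variable names share_high and low_mask are illustrative.

```python
import numpy as np

scores = np.array([4.8, 3.2, 4.5, 2.8, 4.9, 3.7, 4.2, 5.0])
mask = scores >= 4.0

# Mean of a boolean mask = fraction of True values (5 of 8 here)
share_high = mask.mean() * 100
print(f"Share of high scores: {share_high:.1f}%")

# The ~ operator inverts a mask, selecting the scores below 4.0 instead
low_mask = ~mask
print(f"Scores below 4.0: {scores[low_mask]}")
```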
Analyze performance metrics across all Bean Counter locations using 2D arrays.
import numpy as np

# Daily customer counts for 50 stores over 30 days
# This creates a 2D array: rows = stores, columns = days
np.random.seed(100)
daily_customers = np.random.randint(150, 500, size=(50, 30))  # 50 stores, 30 days
print(f"Data shape: {daily_customers.shape}")
print(f"First store's first 5 days: {daily_customers[0, :5]}")

# YOUR CODE BELOW
# 1. Calculate total customers served across all stores in the month
total_customers =
# 2. Calculate average daily customers per store (across all stores and days)
avg_daily_per_store =
# 3. Find the busiest single day (max customers in one store on one day)
busiest_day =
# 4. Find stores that averaged over 350 customers per day
# Hint: Use np.mean(daily_customers, axis=1) to get average per store
# Then count how many stores have average > 350
store_averages =
high_traffic_stores =
# Test your analysis
assert total_customers == 492171, f"Total should be 492,171, got {total_customers}"
assert np.isclose(avg_daily_per_store, 328.1, atol=0.1), f"Average should be ~328.1"
assert busiest_day == 499, f"Busiest day should be 499, got {busiest_day}"
print(f"Total customers served: {total_customers:,}")
print(f"Average daily customers per store: {avg_daily_per_store:.1f}")
print(f"Busiest single day: {busiest_day} customers")
print(f"High-traffic stores (>350/day): {high_traffic_stores}")
print("Excellent CEO-level analysis! You understand your company's traffic patterns!")
Conclusion
Congratulations! You’ve learned NumPy for analytics!
You’ve learned:
Array Creation - Initialize data structures for company-wide metrics
Vectorized Operations - Update thousands of prices/costs instantly
Statistical Analysis - Get insights from massive datasets in milliseconds
Speed Advantage - Process millions of data points in seconds
Your Bean Counter CEO toolkit now includes:
Lightning-fast analysis of millions of transactions
Company-wide financial calculations in seconds
Statistical insights for board presentations
The ability to handle big data that would crash Excel
Remember:
NumPy arrays are specialized for numerical operations
Vectorized operations eliminate the need for loops
Use np.mean(), np.sum(), np.std() for quick statistics
Random functions help simulate business scenarios
Always consider using NumPy when dealing with large numerical datasets
What’s Next: In the next tutorial, you’ll learn Pandas, the ultimate tool for working with structured business data. You’ll import real sales data, filter it, group it, and uncover insights that will transform Bean Counter’s strategy!
Solutions
You will likely find solutions to most exercises online. However, I strongly encourage you to work on these exercises independently without searching explicitly for the exact answers to the exercises. Understanding someone else’s solution is very different from developing your own. Use the lecture notes and try to solve the exercises on your own. This approach will significantly enhance your learning and problem-solving skills.
Remember, the goal is not just to complete the exercises, but to understand the concepts and improve your programming abilities. If you encounter difficulties, review the lecture materials, experiment with different approaches, and don’t hesitate to ask for clarification during class discussions.