Programming with Python
Kühne Logistics University Hamburg - Fall 2024
Tip
NumPy arrays are stored in contiguous memory blocks, making operations very efficient.
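ndarray: NumPy's n-dimensional array, its core data structure
np.zeros() for arrays of zeros
np.random.rand() for random values between 0 and 1
np.arange() for evenly spaced values with a given step size
np.linspace() for a given number of evenly spaced values between two endpoints
+, -, *, / for element-wise arithmetic
reshape, flatten to change the shape of an array
Tip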
NumPy operations are vectorized, meaning they apply to entire arrays at once instead of requiring an explicit Python loop over the elements.
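A short sketch of the array functions listed above, followed by vectorized arithmetic and reshaping. The array sizes and values are arbitrary choices for illustration.

```python
import numpy as np

# Creating arrays
zeros = np.zeros((2, 3))        # 2x3 array filled with 0.0
rand = np.random.rand(3)        # 3 random floats in [0, 1)
steps = np.arange(0, 10, 2)     # start 0, stop 10 (exclusive), step 2
points = np.linspace(0, 1, 5)   # 5 evenly spaced points from 0 to 1 (inclusive)

# Vectorized, element-wise arithmetic: no Python loop needed
a = np.array([1, 2, 3])
b = np.array([10, 20, 30])
print(a + b)   # [11 22 33]
print(a * b)   # [10 40 90]
print(b / a)   # [10. 10. 10.]

# Changing the shape
matrix = np.arange(6).reshape(2, 3)  # two rows, three columns
flat = matrix.flatten()              # back to a 1D copy
print(steps, points, matrix.shape, flat)
```

Note the difference between np.arange() and np.linspace(): the first one takes a step size, the second one takes the number of points.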
Task: Complete the following task:
Note
Note that you can always use the help() function to get more information about a function, e.g. help(np.linspace). Be sure to import the package first, otherwise you will get a NameError. To quit the help page, press q.
Task: Complete the following task:
pip install pandas (or install it with Thonny)
import pandas as pd
Note
You can also use a different abbreviation, but pd is the most common one.
Name Kids City Salary
0 Tobias 2 Oststeinbek 3000
1 Robin 1 Oststeinbek 3200
2 Nils 0 Hamburg 4000
3 Nikolai 0 Lübeck 2500
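One way to create the DataFrame above is from a dictionary; this is a sketch, as the slide does not show how it was built. Each key becomes a column name and each list holds that column's values.

```python
import pandas as pd

# Each key becomes a column name, each list holds that column's values
data = {
    "Name": ["Tobias", "Robin", "Nils", "Nikolai"],
    "Kids": [2, 1, 0, 0],
    "City": ["Oststeinbek", "Oststeinbek", "Hamburg", "Lübeck"],
    "Salary": [3000, 3200, 4000, 2500],
}
df = pd.DataFrame(data)  # pandas adds the default index 0, 1, 2, 3
print(df)
```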
Name Age Department Position Salary
0 Alice 30 HR Manager 50000
1 Bob 25 IT Developer 60000
2 Charlie 28 Finance Analyst 55000
3 David 35 Marketing Executive 52000
4 Eve 32 Sales Representative 48000
5 Frank 29 IT Developer 61000
6 Grace 31 HR Assistant 45000
7 Hank 27 Finance Analyst 53000
8 Ivy 33 Marketing Manager 58000
9 Jack 26 Sales Representative 47000
10 Kara 34 IT Developer 62000
11 Leo 30 HR Manager 51000
12 Mona 28 Finance Analyst 54000
13 Nina 35 Marketing Executive 53000
14 Oscar 32 Sales Representative 49000
15 Paul 29 IT Developer 63000
16 Quinn 31 HR Assistant 46000
17 Rita 27 Finance Analyst 52000
18 Sam 33 Marketing Manager 59000
19 Tina 26 Sales Representative 48000
20 Uma 34 IT Developer 64000
21 Vince 30 HR Manager 52000
22 Walt 28 Finance Analyst 55000
23 Xena 35 Marketing Executive 54000
24 Yara 32 Sales Representative 50000
25 Zane 29 IT Developer 65000
26 Anna 31 HR Assistant 47000
27 Ben 27 Finance Analyst 53000
28 Cathy 33 Marketing Manager 60000
29 Dylan 26 Sales Representative 49000
30 Ella 34 IT Developer 66000
31 Finn 30 HR Manager 53000
32 Gina 28 Finance Analyst 56000
33 Hugo 35 Marketing Executive 55000
34 Iris 32 Sales Representative 51000
35 Jake 29 IT Developer 67000
36 Kyla 31 HR Assistant 48000
37 Liam 27 Finance Analyst 54000
38 Mia 33 Marketing Manager 61000
39 Noah 26 Sales Representative 50000
40 Olive 34 IT Developer 68000
41 Pete 30 HR Manager 54000
42 Quincy 28 Finance Analyst 57000
43 Rose 35 Marketing Executive 56000
44 Steve 32 Sales Representative 52000
45 Tara 29 IT Developer 69000
46 Umar 31 HR Assistant 49000
47 Vera 27 Finance Analyst 55000
48 Will 33 Marketing Manager 62000
49 Zara 26 Sales Representative 51000
df.head() method to display the first 5 rows
df.tail() method to display the last 5 rows
df.info() method to display information about a DataFrame
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Name 50 non-null object
1 Age 50 non-null int64
2 Department 50 non-null object
3 Position 50 non-null object
4 Salary 50 non-null int64
dtypes: int64(2), object(3)
memory usage: 2.1+ KB
None
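A sketch of the inspection methods on a small excerpt of the employee table (the first six rows, without the Position column). head() and tail() also accept the number of rows to show.

```python
import pandas as pd

# Excerpt of the employee table above
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve", "Frank"],
    "Age": [30, 25, 28, 35, 32, 29],
    "Department": ["HR", "IT", "Finance", "Marketing", "Sales", "IT"],
    "Salary": [50000, 60000, 55000, 52000, 48000, 61000],
})

print(df.head(3))   # first 3 rows (default is 5)
print(df.tail(2))   # last 2 rows (default is 5)
df.info()           # prints the summary itself and returns None
```

Because info() prints its report directly and returns None, wrapping it in print() shows an extra None at the end, as in the output above.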
df.describe() method to display summary statistics
df.index attribute to access the index
df['column_name'] to access a column
df[df['column'] > value] to filter rows with a condition
Name Age Department Position Salary
35 Jake 29 IT Developer 67000
40 Olive 34 IT Developer 68000
45 Tara 29 IT Developer 69000
Tara
Olive
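A sketch of column access and filtering on an excerpt of the employee table. The comparison creates a boolean mask (True/False per row), and indexing with it keeps only the True rows; the threshold of 66000 is chosen so that the result matches the three rows shown above.

```python
import pandas as pd

# Excerpt of the employee table (five IT developers and one sales employee)
df = pd.DataFrame({
    "Name": ["Bob", "Ella", "Jake", "Olive", "Tara", "Eve"],
    "Age": [25, 34, 29, 34, 29, 32],
    "Department": ["IT", "IT", "IT", "IT", "IT", "Sales"],
    "Salary": [60000, 66000, 67000, 68000, 69000, 48000],
})

print(df.describe())      # statistics of the numeric columns Age and Salary
print(df.index)           # the row labels, here 0 to 5

salaries = df["Salary"]   # a single column is a Series

# Boolean mask: True where the condition holds, used to select rows
high_earners = df[df["Salary"] > 66000]
print(high_earners)

# Combine conditions with & (and) or | (or); each condition needs parentheses
senior_it = df[(df["Department"] == "IT") & (df["Age"] > 30)]
print(senior_it["Name"].tolist())  # ['Ella', 'Olive']
```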
Task: Complete the following task:
Note
Note that we can use the mean() method on the Salary column, as it is a numeric column. In addition, we can use the min() method on the Salary column to find the lowest salary.
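A sketch of both methods on the first five rows of the employee table. The idxmin() call is an addition not mentioned in the note: it returns the index label of the smallest value, which can then be used to look up the matching name.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Salary": [50000, 60000, 55000, 52000, 48000],
})

average = df["Salary"].mean()   # 53000.0
lowest = df["Salary"].min()     # 48000
# idxmin() gives the index label of the minimum, used to look up the name
lowest_name = df.loc[df["Salary"].idxmin(), "Name"]  # 'Eve'
print(average, lowest, lowest_name)
```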
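df.groupby('column').method() to group the rows by the values of a column and apply a method to each group, e.g. df.groupby('Position').sum()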
Position | Name | Age | Department | Salary
---|---|---|---|---
Analyst | CharlieHankMonaRitaWaltBenGinaLiamQuincyVera | 275 | FinanceFinanceFinanceFinanceFinanceFinanceFina... | 544000
Assistant | GraceQuinnAnnaKylaUmar | 155 | HRHRHRHRHR | 235000
Developer | BobFrankKaraPaulUmaZaneEllaJakeOliveTara | 306 | ITITITITITITITITITIT | 645000
Executive | DavidNinaXenaHugoRose | 175 | MarketingMarketingMarketingMarketingMarketing | 270000
Manager | AliceIvyLeoSamVinceCathyFinnMiaPeteWill | 315 | HRMarketingHRMarketingHRMarketingHRMarketingHR... | 560000
Representative | EveJackOscarTinaYaraDylanIrisNoahSteveZara | 290 | SalesSalesSalesSalesSalesSalesSalesSalesSalesS... | 495000
df.drop(columns=["column"]) to remove columns, e.g. the text columns Name and Department before summing
Age Salary
Position
Analyst 275 544000
Assistant 155 235000
Developer 306 645000
Executive 175 270000
Manager 315 560000
Representative 290 495000
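A sketch of dropping the text columns before grouping, on five rows of the employee table. As the first grouped output shows, sum() on text columns concatenates the strings. Selecting the numeric columns after groupby() is an equivalent alternative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Grace", "Frank", "Leo"],
    "Age": [30, 25, 31, 29, 30],
    "Department": ["HR", "IT", "HR", "IT", "HR"],
    "Position": ["Manager", "Developer", "Assistant", "Developer", "Manager"],
    "Salary": [50000, 60000, 45000, 61000, 51000],
})

# Drop the text columns first, otherwise sum() concatenates the strings
totals = df.drop(columns=["Name", "Department"]).groupby("Position").sum()
print(totals)

# Alternative: select the numeric columns after grouping
same_totals = df.groupby("Position")[["Age", "Salary"]].sum()
```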
Pass a list such as ['column1', 'column2'] to groupby() to group by multiple columns
Position | Department | Age | Salary
---|---|---|---
Analyst | Finance | 28 | 57000
Assistant | HR | 31 | 49000
Developer | IT | 34 | 69000
Executive | Marketing | 35 | 56000
Manager | HR | 30 | 54000
Manager | Marketing | 33 | 62000
Representative | Sales | 32 | 52000
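The values in the table above correspond to max() per group. A sketch on five rows of the employee table: grouping by a list of columns produces a two-level index, and agg() (not mentioned on the slide) applies several of the aggregation methods at once.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Ivy", "Leo", "Sam", "Bob"],
    "Age": [30, 33, 30, 33, 25],
    "Department": ["HR", "Marketing", "HR", "Marketing", "IT"],
    "Position": ["Manager", "Manager", "Manager", "Manager", "Developer"],
    "Salary": [50000, 58000, 51000, 59000, 60000],
})

# Group by two columns: the result has a two-level index (Position, Department)
result = df.groupby(["Position", "Department"])[["Age", "Salary"]].max()
print(result)

# Several aggregations at once for one column
stats = df.groupby("Position")["Salary"].agg(["mean", "max", "count"])
print(stats)
```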
sum(): sum of the values
mean(): mean of the values
max(): maximum of the values
min(): minimum of the values
count(): number of non-missing values
Task: Complete the following task:
pd.concat() to concatenate DataFrames along shared columns
A B
0 1 4
1 2 5
2 3 6
0 7 10
1 8 11
2 9 12
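A sketch that reproduces the output above. By default pd.concat() keeps the original index, which is why 0, 1, 2 appears twice; ignore_index=True renumbers the rows, and axis=1 places the DataFrames side by side instead.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df2 = pd.DataFrame({"A": [7, 8, 9], "B": [10, 11, 12]})

stacked = pd.concat([df1, df2])                        # index: 0, 1, 2, 0, 1, 2
renumbered = pd.concat([df1, df2], ignore_index=True)  # index: 0 to 5
side_by_side = pd.concat([df1, df2], axis=1)           # 3 rows, 4 columns
print(stacked)
```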
df.join() to join DataFrames on their index, adding the columns of the other DataFrame
A B C D
x 1 4 NaN NaN
y 2 5 8.0 11.0
z 3 6 7.0 10.0
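The inputs for the output above are not shown, so this sketch assumes a df2 without a row for "x". join() is a DataFrame method that matches rows by index and performs a left join by default, so "x" is kept and filled with NaN.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])
# Assumed input: df2 has no row for "x"
df2 = pd.DataFrame({"C": [7, 8], "D": [10, 11]}, index=["z", "y"])

joined = df1.join(df2)              # left join on the index (default)
inner = df1.join(df2, how="inner")  # only index labels present in both
print(joined)
```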
pd.merge(df1, df2, on='column', how='type') to combine two DataFrames on a shared key column
how specifies the type of merge:
inner: only rows with matching keys in both DataFrames
outer: rows from both are kept, missing values are filled with NaN
left: all rows from the left DataFrame are kept, missing values are filled with NaN
right: all rows from the right DataFrame are kept, missing values are filled with NaN
A B C
0 1 4.0 NaN
1 2 5.0 7.0
2 3 6.0 8.0
3 4 NaN 9.0
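The output above corresponds to an outer merge on column A. This sketch assumes inputs that reproduce it and runs all four merge types for comparison.

```python
import pandas as pd

# Assumed inputs: keys 1, 2, 3 on the left and 2, 3, 4 on the right
left = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
right = pd.DataFrame({"A": [2, 3, 4], "C": [7, 8, 9]})

results = {}
for how in ["inner", "outer", "left", "right"]:
    results[how] = pd.merge(left, right, on="A", how=how)
    print(f"how='{how}'")
    print(results[how], end="\n\n")
```

Only the keys 2 and 3 appear in both DataFrames, so the inner merge keeps two rows, while the outer merge keeps all four keys.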
Task: Complete the following task:
import pandas as pd

df1 = pd.DataFrame({
"Name": ["John", "Alice", "Bob", "Carol"],
"Department": ["Sales", "IT", "HR", "Sales"],
"Salary": [50000, 60000, 55000, 52000]})
df2 = pd.DataFrame({
"Name": ["Alice", "Bob", "Dave", "Eve"],
"Position": ["Developer", "Manager", "Analyst", "Developer"],
"Years": [5, 8, 3, 4]})
# TODO: Merge the two DataFrames on the "Name" column
# Try different types of merges (inner, outer, left, right)
# Observe and describe the differences in the results
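pd.read_excel(file_path) function to read an Excel file into a DataFrame
df.to_excel(file_path) method to write a DataFrame to an Excel file
Note
Note that you likely need to install the openpyxl package to read and write Excel files (.xlsx), as pandas relies on it to handle the file format.
We can also specify the sheet name when reading and writing, using the sheet_name parameter.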
Name Age Department Position Salary
0 Alice 30 HR Manager 50000
1 Bob 25 IT Developer 60000
2 Charlie 28 Finance Analyst 55000
3 David 35 Marketing Executive 52000
4 Eve 32 Sales Representative 48000
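A round-trip sketch, assuming openpyxl is installed. It writes the first five employees to a sheet and reads that sheet back; the file name and sheet name are hypothetical, and a temporary folder is used so no file is left behind.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [30, 25, 28, 35, 32],
    "Department": ["HR", "IT", "Finance", "Marketing", "Sales"],
    "Position": ["Manager", "Developer", "Analyst", "Executive", "Representative"],
    "Salary": [50000, 60000, 55000, 52000, 48000],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "employees.xlsx"  # hypothetical file name
    # index=False: do not write the row numbers as an extra column
    df.to_excel(path, sheet_name="Employees", index=False)
    # sheet_name selects the sheet to read (default: the first one)
    df_read = pd.read_excel(path, sheet_name="Employees")

print(df_read)
```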
Task: Complete the following task:
Question: Anybody ever heard of the terms wide and long format?
For example, the following DataFrame is in wide format:
Date Hamburg Los_Angeles Tokyo
0 2024-03-01 12.0 18.2 14.8
1 2024-03-02 9.8 23.0 17.6
2 2024-03-03 7.6 20.3 16.0
3 2024-03-04 10.1 21.1 13.4
4 2024-03-05 11.2 18.5 15.1
.. ... ... ... ...
87 2024-05-27 12.4 24.5 24.9
88 2024-05-28 17.8 20.6 22.3
89 2024-05-29 16.2 20.4 20.2
90 2024-05-30 15.5 20.7 21.7
91 2024-05-31 12.6 22.0 22.9
[92 rows x 4 columns]
The melting process transforms it into the following long format:
Date City Temperature
0 2024-03-01 Hamburg 12.0
1 2024-03-02 Hamburg 9.8
2 2024-03-03 Hamburg 7.6
3 2024-03-04 Hamburg 10.1
4 2024-03-05 Hamburg 11.2
.. ... ... ...
271 2024-05-27 Tokyo 24.9
272 2024-05-28 Tokyo 22.3
273 2024-05-29 Tokyo 20.2
274 2024-05-30 Tokyo 21.7
275 2024-05-31 Tokyo 22.9
[276 rows x 3 columns]
pd.melt() to transform from wide to long format
id_vars: columns to keep as identifier columns
var_name: name of the new column that will contain the names of the original columns
value_name: name of the new column that will contain the values of the original columns
Position Variables Values
0 Manager Name Alice
1 Developer Name Bob
2 Analyst Name Charlie
3 Executive Name David
4 Representative Name Eve
.. ... ... ...
195 Developer Salary 69000
196 Assistant Salary 49000
197 Analyst Salary 55000
198 Manager Salary 62000
199 Representative Salary 51000
[200 rows x 3 columns]
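A sketch of the temperature example from above, using only its first three days. Date is kept as the identifier, the city column names move into City, and their values into Temperature.

```python
import pandas as pd

wide = pd.DataFrame({
    "Date": ["2024-03-01", "2024-03-02", "2024-03-03"],
    "Hamburg": [12.0, 9.8, 7.6],
    "Los_Angeles": [18.2, 23.0, 20.3],
    "Tokyo": [14.8, 17.6, 16.0],
})

long = pd.melt(wide, id_vars="Date", var_name="City", value_name="Temperature")
print(long)  # 9 rows: 3 dates x 3 cities
```

The row count multiplies in the same way in the outputs above: 92 dates times 3 cities give 276 rows, and 50 employees times 4 melted columns give 200 rows.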
Task: Complete the following task:
How do Large Language Models work?
Photo by Taylor Vick on Unsplash
Tip
Currently, Cursor is my favorite AI code editor. But this might change in the future, as there is a lot of competition in this space.
Create a new .py file
Press Ctrl + L to open the chat
Task: Paste the following prompt into the chat:
Can you please write me a small number guessing game in python? It should work for one player in the terminal. The player should guess a number between 1-10 and get hints about whether his guess was too large or too small. After 3 tries, end the game if he didn’t succeed with a nice message.
Copy the generated code and paste it into your file.
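For comparison, here is one possible version of what the assistant might generate; the actual code will differ from run to run. The ask parameter is an assumption added so that the game can be tested without typing, and invalid input such as letters is not handled in this sketch.

```python
import random


def play(secret=None, max_tries=3, ask=input):
    """Guess a number between 1 and 10 within max_tries attempts."""
    if secret is None:
        secret = random.randint(1, 10)  # both ends inclusive
    for attempt in range(1, max_tries + 1):
        guess = int(ask(f"Guess {attempt}/{max_tries} (1-10): "))
        if guess < secret:
            print("Too small!")
        elif guess > secret:
            print("Too large!")
        else:
            print(f"Correct, the number was {secret}. Well done!")
            return True
    print(f"Nice try! The number was {secret}. Better luck next time!")
    return False
```

Call play() to start a game in the terminal.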
Note
And that’s it for today’s lecture!
You now have the basic knowledge to start working with tabular data and AI!
To learn more about Python, take a look at the literature list of this course.
Lecture VII - Pandas and AI | Dr. Tobias Vlćek