1. Simple ways to hard-code a DataFrame
- Use a Python dict:

      import pandas as pd

      fruit_info = pd.DataFrame(
          data={'fruit': ['apple', 'orange', 'banana', 'raspberry'],
                'color': ['red', 'orange', 'yellow', 'pink']})
      fruit_info

- Use a list of row tuples with explicit column names:

      fruit_info2 = pd.DataFrame(
          [("red", "apple"), ("orange", "orange"), ("yellow", "banana"),
           ("pink", "raspberry")],
          columns=["color", "fruit"])
      fruit_info2

Both hold the same data; only the column order differs (fruit_info lists fruit first). The output of fruit_info2 is:

| | color | fruit |
|---|---|---|
| 0 | red | apple |
| 1 | orange | orange |
| 2 | yellow | banana |
| 3 | pink | raspberry |

- Call df.shape to get the dimensions as a (rows, columns) tuple.
- Convert the entire DataFrame into a two-dimensional NumPy array with df.values (df.to_numpy() is the form the pandas documentation recommends).
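
As a quick check of the two bullets above, here is a minimal sketch that reuses the fruit table from this section. It shows `to_numpy()` next to `.values`; on this small table both return the same array.

```python
import pandas as pd

fruit_info = pd.DataFrame({
    'fruit': ['apple', 'orange', 'banana', 'raspberry'],
    'color': ['red', 'orange', 'yellow', 'pink'],
})

# shape is a (rows, columns) tuple
n_rows, n_cols = fruit_info.shape
print(n_rows, n_cols)

# to_numpy() returns a 2-D NumPy array holding the cell values
arr = fruit_info.to_numpy()
print(arr.shape)
print(arr[0])  # first row of the table
```

The index and the column names are not part of the array, so keep the DataFrame around if the labels are needed later.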
2. Simple manipulations on a DataFrame
To add a column:

    fruit_info['rank1'] = [2, 4, 1, 3]
    fruit_info

Output:

| | fruit | color | rank1 |
|---|---|---|---|
| 0 | apple | red | 2 |
| 1 | orange | orange | 4 |
| 2 | banana | yellow | 1 |
| 3 | raspberry | pink | 3 |

- Or you can use df.loc[:, 'new column name'] = …. The "[]" takes two values: the first selects rows, the second selects columns.

      fruit_info.loc[:, 'rank2'] = [2, 4, 1, 3]
      fruit_info

  Output:

| | fruit | color | rank1 | rank2 |
|---|---|---|---|---|
| 0 | apple | red | 2 | 2 |
| 1 | orange | orange | 4 | 4 |
| 2 | banana | yellow | 1 | 1 |
| 3 | raspberry | pink | 3 | 3 |

To drop a row or column:

    # axis=1 drops columns; axis=0 drops rows
    fruit_info_original = fruit_info.drop(labels=['rank1', 'rank2'], axis=1)
    fruit_info_original

Output:

| | fruit | color |
|---|---|---|
| 0 | apple | red |
| 1 | orange | orange |
| 2 | banana | yellow |
| 3 | raspberry | pink |

To rename the columns (below, the first letter of each name is capitalized):
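
    # with axis=1, rename() applies the function to every column label
    fruit_info_original.rename(lambda x: x.capitalize(), axis=1, inplace=True)
    # fruit_info_caps is a second name for the same object, not a copy
    fruit_info_caps = fruit_info_original
    fruit_info_caps

Output:

| | Fruit | Color |
|---|---|---|
| 0 | apple | red |
| 1 | orange | orange |
| 2 | banana | yellow |
| 3 | raspberry | pink |
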
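
The add, drop and rename steps above all have variants that return a new DataFrame instead of modifying the existing one. The sketch below is one possible way to chain them; the variable names `ranked`, `unranked` and `caps` are made up for illustration.

```python
import pandas as pd

fruit_info = pd.DataFrame({
    'fruit': ['apple', 'orange', 'banana', 'raspberry'],
    'color': ['red', 'orange', 'yellow', 'pink'],
})

# assign() returns a new DataFrame with the extra column; fruit_info is untouched
ranked = fruit_info.assign(rank1=[2, 4, 1, 3])

# drop(columns=...) is equivalent to drop(labels=..., axis=1)
unranked = ranked.drop(columns=['rank1'])

# rename(columns=...) accepts a function that is applied to each column name
caps = unranked.rename(columns=str.capitalize)

print(list(fruit_info.columns))  # the original keeps its two columns
print(list(ranked.columns))
print(list(caps.columns))
```

Because nothing is changed in place, each intermediate table stays available, which can make it easier to re-run notebook cells in any order.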
3. Load and build a DataFrame from a zip file

    import zipfile

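About with and as
From Copilot: in this code, with and as are the keywords used for context management.
- The with statement:
  - A with statement runs a block of code under a context manager, which performs some actions on entering the block and on leaving it.
  - In this code, the with statement is used to open a ZIP file and process its contents.
- The as keyword:
  - In a with statement, the variable name after as refers to the object obtained from the context manager.
  - Here, fh is a file handle for one file inside the ZIP archive, opened by zf.open(f). The as keyword assigns that handle to the variable fh so that it can be used inside the with block.

In summary, the with statement helps us manage resources and makes sure they are released properly once we are done with them, while the as keyword gives us the object from the context manager so that we can work with it directly.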
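
To make the with/as pattern concrete, here is a minimal, self-contained sketch. It is not the code from the original notebook: the archive is built in memory, and the member name `fruit.csv` and its contents are assumptions chosen only so the example runs on its own. The names `zf`, `f` and `fh` follow the explanation above.

```python
import io
import zipfile

import pandas as pd

# Build a tiny ZIP archive in memory so the example is self-contained.
# The member name "fruit.csv" and its contents are made up for illustration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode='w') as zf:
    zf.writestr('fruit.csv', 'fruit,color\napple,red\nbanana,yellow\n')

buffer.seek(0)
frames = []
with zipfile.ZipFile(buffer) as zf:      # zf is closed when this block exits
    for f in zf.namelist():              # names of the files in the archive
        with zf.open(f) as fh:           # fh is a file handle for one member
            frames.append(pd.read_csv(fh))

# Stack the per-file tables into one DataFrame
df = pd.concat(frames, ignore_index=True)
print(df)
```

With a real archive on disk, the path to the zip file would replace `buffer` in the second `zipfile.ZipFile(...)` call.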
4. Basic inspection of data
- len(df): returns the number of rows in the DataFrame
- df.head(): returns the first five rows
- df.shape: returns the shape as (rows, columns)
- df.describe(): returns basic statistics, such as the mean, for columns that hold numeric data
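
The four calls above can be tried on the fruit table from section 2. This is a small sketch, with `rank1` as the only numeric column, so `describe()` summarizes just that column.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['apple', 'orange', 'banana', 'raspberry'],
    'color': ['red', 'orange', 'yellow', 'pink'],
    'rank1': [2, 4, 1, 3],
})

print(len(df))      # number of rows
print(df.shape)     # (rows, columns)
print(df.head(2))   # head() takes an optional row count; the default is 5

# By default only numeric columns are summarized
summary = df.describe()
print(summary)
```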
5. Two ways to slice a DataFrame
To access a specific piece of data, there are two accessors, loc and iloc:
- loc: selection using labels (index and column names)
- iloc: selection using integer positions
E.g.
1. baby_names.head()
output:

| | State | Sex | Year | Name | Count |
|---|---|---|---|---|---|
| 0 | AK | F | 1910 | Mary | 14 |
| 1 | AK | F | 1910 | Annie | 12 |
| 2 | AK | F | 1910 | Anna | 10 |
| 3 | AK | F | 1910 | Margaret | 8 |
| 4 | AK | F | 1910 | Helen | 7 |

2. baby_names.loc[2:5, ['Name']]
output (loc slices include the end label, so row 5 is returned):

| | Name |
|---|---|
| 2 | Anna |
| 3 | Margaret |
| 4 | Helen |
| 5 | Elsie |

3. baby_names.iloc[2:5, ['Name']]
output:
IndexError: .iloc requires numeric indexers, got ['Name']
4. baby_names.iloc[2:5, [3]]
output (iloc slices exclude the end position; column 3 is Name):

| | Name |
|---|---|
| 2 | Anna |
| 3 | Margaret |
| 4 | Helen |

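
The difference in how the two accessors treat the end of a slice is easy to miss. Here is a minimal sketch using only the five rows shown by baby_names.head(); the frame is rebuilt by hand, so it is a stand-in for the real dataset.

```python
import pandas as pd

# The five rows displayed by baby_names.head(), typed in by hand
baby_names = pd.DataFrame({
    'State': ['AK'] * 5,
    'Sex': ['F'] * 5,
    'Year': [1910] * 5,
    'Name': ['Mary', 'Annie', 'Anna', 'Margaret', 'Helen'],
    'Count': [14, 12, 10, 8, 7],
})

# loc works on labels: 1:3 means labels 1, 2 and 3 (end included)
by_label = baby_names.loc[1:3, ['Name']]

# iloc works on positions: 1:3 means positions 1 and 2 (end excluded)
by_position = baby_names.iloc[1:3, [3]]

print(by_label)
print(by_position)
```

With the default integer index, labels and positions happen to coincide, which is why the same slice text returns a different number of rows.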
5. df = baby_names[:5].set_index("Name")
6. df
output (the Name column is now the index):

| Name | State | Sex | Year | Count |
|---|---|---|---|---|
| Mary | AK | F | 1910 | 14 |
| Annie | AK | F | 1910 | 12 |
| Anna | AK | F | 1910 | 10 |
| Margaret | AK | F | 1910 | 8 |
| Helen | AK | F | 1910 | 7 |

7. df.loc[['Mary', 'Anna'], :]
output:

| Name | State | Sex | Year | Count |
|---|---|---|---|---|
| Mary | AK | F | 1910 | 14 |
| Anna | AK | F | 1910 | 10 |

However, if we still want to access rows by position, we need to use the integer-location (iloc) accessor:
8. df.iloc[1:4, 2:3]
output:

| Name | Year |
|---|---|
| Annie | 1910 |
| Anna | 1910 |
| Margaret | 1910 |

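
Once Name is the index, labels and positions no longer coincide, so it helps to see both accessors side by side. The sketch below rebuilds the five displayed rows by hand and uses `columns.get_loc`, which is one possible way to pick a column by name while still selecting rows by position.

```python
import pandas as pd

# The five rows displayed earlier, typed in by hand
baby_names = pd.DataFrame({
    'State': ['AK'] * 5,
    'Sex': ['F'] * 5,
    'Year': [1910] * 5,
    'Name': ['Mary', 'Annie', 'Anna', 'Margaret', 'Helen'],
    'Count': [14, 12, 10, 8, 7],
})
df = baby_names.set_index('Name')

# Labels: rows are now addressed by name
print(df.loc['Anna', 'Count'])

# Positions still work through iloc, whatever the index is
print(df.iloc[2, 3])

# get_loc translates a column label into its integer position,
# so iloc can take rows by position and a column chosen by name
year_pos = df.columns.get_loc('Year')
subset = df.iloc[1:4, [year_pos]]
print(subset)
```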
To be continued…