
How to remove none from pandas DataFrame

The pandas library is very useful for handling tabular data.

However, tabular data sometimes contains missing values such as None.

In that case, we may want to remove the None values, either from the whole table or only from specific columns.

So how can we remove None?

Today I will introduce "How to remove none from pandas DataFrame".

Author


Mid-career engineer (AI, systems). Good at Python and SQL. Through my experience as a system engineer, I also have broad knowledge of areas such as servers.

Benefit of reading this article

You will understand how to remove None from a pandas DataFrame.


How to remove none from pandas DataFrame

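To remove None data, use the dropna() method.

As its name suggests, dropna() drops missing values (NA). pandas treats None as missing, and in a numeric column it is stored as NaN, which is why the output below shows NaN instead of None.

We can use it like this.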

import pandas as pd

data_list1 = [
[1,2,None],
[2,None,4],
[None,4,5],
[4,5,6]
]
col_list1 = ["c1","c2","c3"]
df1 = pd.DataFrame(data=data_list1, columns=col_list1)
print(df1)

#     c1   c2   c3
# 0  1.0  2.0  NaN
# 1  2.0  NaN  4.0
# 2  NaN  4.0  5.0
# 3  4.0  5.0  6.0

# Drop every row that contains at least one None (NaN)
df2 = df1.dropna()

print(df2)

#     c1   c2   c3
# 3  4.0  5.0  6.0


By using dropna(), we extracted only the rows that contain no None values.

Then how can we handle more complex cases?
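A small sketch on the same data makes two behaviors of dropna() visible: it returns a new DataFrame instead of changing df1, and the surviving rows keep their original index labels. It also shows that None in a text (object) column counts as missing as well. The name df_text and its columns are only illustrative.

```python
import pandas as pd

# Same data as the example above
df1 = pd.DataFrame(
    data=[[1, 2, None], [2, None, 4], [None, 4, 5], [4, 5, 6]],
    columns=["c1", "c2", "c3"],
)

df2 = df1.dropna()

# dropna() returns a new DataFrame; df1 itself still has all 4 rows
print(len(df1))              # 4
# The remaining row keeps its original index label
print(df2.index.tolist())    # [3]

# Illustrative data: None in a text (object) column is also treated as missing
df_text = pd.DataFrame({"name": ["a", None, "c"], "score": [1, 2, 3]})
print(df_text.dropna()["name"].tolist())  # ['a', 'c']
```

To keep the result, assign it back (for example df1 = df1.dropna()); if you want a fresh 0, 1, 2, ... index, call reset_index(drop=True) on the result.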



How to remove rows that have None in specific columns

We have seen how to remove rows that contain None.

Then how can we check for None only in specific columns?

To restrict the check to certain columns, use the subset option.

Pass a list of column names to subset, like below.

# Check for None only in c1 and c2; None in c3 is ignored
df3 = df1.dropna(subset=["c1","c2"])
print(df3)

#     c1   c2   c3
# 0  1.0  2.0  NaN
# 3  4.0  5.0  6.0
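As a further sketch on the same data, subset can also hold just one column, and a name that is not a column of the DataFrame raises a KeyError instead of being silently ignored. The column name "c9" is a hypothetical, deliberately nonexistent name.

```python
import pandas as pd

df1 = pd.DataFrame(
    data=[[1, 2, None], [2, None, 4], [None, 4, 5], [4, 5, 6]],
    columns=["c1", "c2", "c3"],
)

# Check only column c3: row 0 (c3 is None) is dropped, rows 1-3 are kept
df_c3 = df1.dropna(subset=["c3"])
print(df_c3.index.tolist())  # [1, 2, 3]

# "c9" is a hypothetical column that does not exist, so pandas raises KeyError
raised = False
try:
    df1.dropna(subset=["c9"])
except KeyError:
    raised = True
print(raised)  # True
```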



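How to remove rows that have None in all columns

So how can we remove only the rows in which every column is None?

In this case, use how="all".

With how="all", a row is dropped only when all of its values are None; a row that still has at least one value is kept. The default, how="any", drops a row as soon as any value is None.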

data_list1 = [
[1,2,None],
[2,None,4],
[None,None,None],
[4,5,6]
]

col_list1 = ["c1","c2","c3"]
df1 = pd.DataFrame(data=data_list1, columns=col_list1)
print(df1)

#     c1   c2   c3
# 0  1.0  2.0  NaN
# 1  2.0  NaN  4.0
# 2  NaN  NaN  NaN
# 3  4.0  5.0  6.0

# Default how="any": drop rows with at least one None
df2 = df1.dropna()
print(df2)

#     c1   c2   c3
# 3  4.0  5.0  6.0

# how="all": drop only rows where every column is None
df4 = df1.dropna(how="all")
print(df4)

#     c1   c2   c3
# 0  1.0  2.0  NaN
# 1  2.0  NaN  4.0
# 3  4.0  5.0  6.0
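how="all" can also be combined with subset. In the sketch below (hypothetical data, where row 0 has a value only in c1), a row is dropped only when every column listed in subset is None; columns outside subset are ignored.

```python
import pandas as pd

# Hypothetical data: row 0 has a value only in c1, row 2 is entirely empty
df = pd.DataFrame(
    data=[[1, None, None], [2, None, 4], [None, None, None], [4, 5, 6]],
    columns=["c1", "c2", "c3"],
)

# how="all" over every column: only row 2 is dropped
print(df.dropna(how="all").index.tolist())  # [0, 1, 3]

# how="all" limited to c2 and c3: rows 0 and 2 are dropped,
# because both c2 and c3 are None there (c1 is ignored)
print(df.dropna(how="all", subset=["c2", "c3"]).index.tolist())  # [1, 3]
```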



How to remove columns that have None

So far, dropna() has removed rows that contain None.

Then how can we drop columns instead?

To remove columns that contain None, use the axis=1 option.

data_list1 = [
[1,2,None],
[2,None,4],
[3,4,5],
[4,5,6]
]
col_list1 = ["c1","c2","c3"]
df1 = pd.DataFrame(data=data_list1, columns=col_list1)
print(df1)

#    c1   c2   c3
# 0   1  2.0  NaN
# 1   2  NaN  4.0
# 2   3  4.0  5.0
# 3   4  5.0  6.0

# axis=1: drop columns (not rows) that contain None
df5 = df1.dropna(axis=1)
print(df5)

#    c1
# 0   1
# 1   2
# 2   3
# 3   4


By specifying axis=1, the columns that contain None (c2 and c3) were removed.
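how="all" also works together with axis=1, which is handy when you want to drop only columns that are completely empty. The DataFrame below is hypothetical: c2 has no values at all, and c3 is missing only one value. The sketch also uses axis="columns", which pandas accepts as a synonym for axis=1.

```python
import pandas as pd

# Hypothetical data: c2 is entirely empty, c3 is only partly empty
df = pd.DataFrame({
    "c1": [1, 2, 3],
    "c2": [None, None, None],
    "c3": [None, 5, 6],
})

# axis="columns" (same as axis=1): drop every column that has any None
print(df.dropna(axis="columns").columns.tolist())  # ['c1']

# Combined with how="all": drop only columns in which every value is None
print(df.dropna(axis=1, how="all").columns.tolist())  # ['c1', 'c3']
```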



Conclusion

Today I described "How to remove none from pandas DataFrame".

To remove None, we can use dropna().

It also accepts the following options.

dropna options

  • Check only specific columns: subset=["column name"]
  • Remove only rows where all columns are None: how="all"
  • Remove columns that contain None instead of rows: axis=1


dropna() is very useful, so it is worth remembering.




    © 2021 ITips