Python

How to get cross join with pandas.DataFrame

How can I get cross join data with pandas DataFrame ?


pandas.DataFrame is useful to handle table format data.

With using pandas, we can import CSV or Excel data and join them with common key column.

And sometimes you want to get all combination data without any join condition.

But how can we get such all combination data.

So today I will introduce about "How to get cross join with pandas.DataFrame".

Author


Mid-carieer engineer (AI, system). Good at Python and SQL.

Advantage to read

You can understand about "How to get cross join with pandas.DataFrame". Then you don't have to concern about programming with pandas.DataFrame.


pandas.DataFrame "merge"

In order to join two DataFrame, we can use merge.


Set two DataFrame and common key as parameters of pd.merge().

SAMPLE

import pandas as pd

data_list1 = [
    ["a",1],
    ["b",2],
    ["c",3],
    ["d",4]
]
col_list1 = ["column1-1","key"]
df1 = pd.DataFrame(data=data_list1, columns=col_list1)
print(df1)

#   column1-1  key
# 0         a    1
# 1         b    2
# 2         c    3
# 3         d    4

data_list2 = [
    ["e",1],
    ["f",2],
    ["g",3],
    ["h",4]
]
col_list2 = ["column2-1","key"]
df2 = pd.DataFrame(data=data_list2, columns=col_list2)
print(df2)

#   column2-1  key
# 0         e    1
# 1         f    2
# 2         g    3
# 3         h    4

df_crossjoin = pd.merge(
    df1,
    df2,
    on="key"
)
print(df_crossjoin)

#   column1-1  key column2-1
# 0         a    1         e
# 1         b    2         f
# 2         c    3         g
# 3         d    4         h


Then we can join two DataFrame by common key.



Cross join with merge

Next, in order to get all combination data, can we use merge without common key ?

SAMPLE

import pandas as pd

data_list1 = [
    ["a"],
    ["b"],
    ["c"],
    ["d"]
]
col_list1 = ["column1-1"]
df1 = pd.DataFrame(data=data_list1, columns=col_list1)
print(df1)

#   column1-1
# 0         a
# 1         b
# 2         c
# 3         d

data_list2 = [
    ["e"],
    ["f"],
    ["g"],
    ["h"]
]
col_list2 = ["column2-1"]
df2 = pd.DataFrame(data=data_list2, columns=col_list2)
print(df2)

#   column2-1
# 0         e
# 1         f
# 2         g
# 3         h

df_crossjoin = pd.merge(
    df1,
    df2
)
print(df_crossjoin)

# MergeError: No common columns to perform merge on. Merge options: left_on=None, right_on=None, left_index=False, right_index=False


If we don't set common column in merge, it shows error like above.

So what can we do ?



How to get cross join with pandas.DataFrame

In order to join DataFrame, we can use merge.

But merge needs common key.

So how can we do cross join ?


There is a method that we add dummy common column to both DataFrame.

SAMPLE

import pandas as pd

data_list1 = [
    ["a"],
    ["b"],
    ["c"],
    ["d"]
]
col_list1 = ["column1-1"]
df1 = pd.DataFrame(data=data_list1, columns=col_list1)
df1["dummy_key"] = 1
print(df1)

#   column1-1  dummy_key
# 0         a          1
# 1         b          1
# 2         c          1
# 3         d          1

data_list2 = [
    ["e"],
    ["f"],
    ["g"],
    ["h"]
]
col_list2 = ["column2-1"]
df2 = pd.DataFrame(data=data_list2, columns=col_list2)
df2["dummy_key"] = 1
print(df2)

#   column2-1  dummy_key
# 0         e          1
# 1         f          1
# 2         g          1
# 3         h          1

df_crossjoin = pd.merge(
    df1,
    df2,
    on="dummy_key"
)
print(df_crossjoin)

#    column1-1  dummy_key column2-1
# 0          a          1         e
# 1          a          1         f
# 2          a          1         g
# 3          a          1         h
# 4          b          1         e
# 5          b          1         f
# 6          b          1         g
# 7          b          1         h
# 8          c          1         e
# 9          c          1         f
# 10         c          1         g
# 11         c          1         h
# 12         d          1         e
# 13         d          1         f
# 14         d          1         g
# 15         d          1         h


Add dummy columns with same value.

Then with using dummy columns as common key, we can do cross join.


In addition, if you don't want to show dummy column, you can delete it by drop.

SAMPLE

df_crossjoin.drop(columns=["dummy_key"], inplace=True)
print(df_crossjoin)

#    column1-1 column2-1
# 0          a         e
# 1          a         f
# 2          a         g
# 3          a         h
# 4          b         e
# 5          b         f
# 6          b         g
# 7          b         h
# 8          c         e
# 9          c         f
# 10         c         g
# 11         c         h
# 12         d         e
# 13         d         f
# 14         d         g
# 15         d         h



Conclusion

Today I will described about "How to get cross join with pandas.DataFrame".

Important points are following.

Point

  • Use merge to join DataFrame
  • merge requires common key
  • In order to get cross join, add dummy column as common key


With using dummy key, we can get cross joined DataFrame. But I hope pandas to allow join without on parameter.


There are some other articles about pandas.Dataframe.

If you interested in them, please read them.

If you felt this article is useful, please share.

にほんブログ村 IT技術ブログへ

-Python
-

© 2022 ITips