How to get cross join with pandas.DataFrame

How to get cross join with pandas.DataFrame

How can I get cross join data with pandas DataFrame ?

pandas.DataFrame is useful to handle table format data.

With using pandas, we can import CSV or Excel data and join them with common key column.

And sometimes you want to get all combination data without any join condition.

But how can we get such all combination data.

So today I will introduce about "How to get cross join with pandas.DataFrame".

目次

pandas.DataFrame "merge"

In order to join two DataFrame, we can use merge.

So let's try to use merge .

Prepare 2 DataFrame with common key column.

import pandas as pd

data_list1 = [
    ["a",1],
    ["b",2],
    ["c",3],
    ["d",4]
]
col_list1 = ["column1-1","key"]
df1 = pd.DataFrame(data=data_list1, columns=col_list1)
print(df1)

#   column1-1  key
# 0         a    1
# 1         b    2
# 2         c    3
# 3         d    4

data_list2 = [
    ["e",1],
    ["f",2],
    ["g",3],
    ["h",4]
]
col_list2 = ["column2-1","key"]
df2 = pd.DataFrame(data=data_list2, columns=col_list2)
print(df2)

#   column2-1  key
# 0         e    1
# 1         f    2
# 2         g    3
# 3         h    4

Then with using pd.merge() ,  we can join two DataFrame by common key.

df_crossjoin = pd.merge(
    df1,
    df2,
    on="key"
)
print(df_crossjoin)

#   column1-1  key column2-1
# 0         a    1         e
# 1         b    2         f
# 2         c    3         g
# 3         d    4         h

Cross join with merge

Next, in cross join case, there are no common keys.
So can we use merge without common key ?

Let's try to use merge .

import pandas as pd

data_list1 = [
    ["a"],
    ["b"],
    ["c"],
    ["d"]
]
col_list1 = ["column1-1"]
df1 = pd.DataFrame(data=data_list1, columns=col_list1)
print(df1)

#   column1-1
# 0         a
# 1         b
# 2         c
# 3         d

data_list2 = [
    ["e"],
    ["f"],
    ["g"],
    ["h"]
]
col_list2 = ["column2-1"]
df2 = pd.DataFrame(data=data_list2, columns=col_list2)
print(df2)

#   column2-1
# 0         e
# 1         f
# 2         g
# 3         h

df_crossjoin = pd.merge(
    df1,
    df2
)
print(df_crossjoin)

# MergeError: No common columns to perform merge on. 
# Merge options: left_on=None, right_on=None, left_index=False, right_index=False

As this result, if we don't set common column in merge, it shows error like above.

So how can we get cross join data ?

How to get cross join with pandas.DataFrame

In order to join DataFrame, we can use merge.

But without common key, merge raises error.

So how can we do cross join ?

In order to get cross join with pandas.DataFrame, we can use parameter how = "cross".

df_crossjoin = pd.merge(
    df1,
    df2,
    how = "cross"
)
print(df_crossjoin)

#    column1-1 column2-1
# 0          a         e
# 1          a         f
# 2          a         g
# 3          a         h
# 4          b         e
# 5          b         f
# 6          b         g
# 7          b         h
# 8          c         e
# 9          c         f
# 10         c         g
# 11         c         h
# 12         d         e
# 13         d         f
# 14         d         g
# 15         d         h

With using how = "cross", we can get cross join DataFrame.

how{‘left’, ‘right’, ‘outer’, ‘inner’, ‘cross’}, default ‘inner’

  • left: use only keys from left frame, similar to a SQL left outer join; preserve key order.
  • right: use only keys from right frame, similar to a SQL right outer join; preserve key order.
  • outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.
  • inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys.
  • cross: creates the cartesian product from both frames, preserves the order of the left keys.
pandas.merge — pandas 2.2.1 documentation

Dummy common key

There is another way to get cross join DataFrame.

There is a method that we add dummy common column to both DataFrame.

Check following code.

import pandas as pd

data_list1 = [
    ["a"],
    ["b"],
    ["c"],
    ["d"]
]
col_list1 = ["column1-1"]
df1 = pd.DataFrame(data=data_list1, columns=col_list1)
df1["dummy_key"] = 1
print(df1)

#   column1-1  dummy_key
# 0         a          1
# 1         b          1
# 2         c          1
# 3         d          1

data_list2 = [
    ["e"],
    ["f"],
    ["g"],
    ["h"]
]
col_list2 = ["column2-1"]
df2 = pd.DataFrame(data=data_list2, columns=col_list2)
df2["dummy_key"] = 1
print(df2)

#   column2-1  dummy_key
# 0         e          1
# 1         f          1
# 2         g          1
# 3         h          1

df_crossjoin = pd.merge(
    df1,
    df2,
    on="dummy_key"
)
print(df_crossjoin)

#    column1-1  dummy_key column2-1
# 0          a          1         e
# 1          a          1         f
# 2          a          1         g
# 3          a          1         h
# 4          b          1         e
# 5          b          1         f
# 6          b          1         g
# 7          b          1         h
# 8          c          1         e
# 9          c          1         f
# 10         c          1         g
# 11         c          1         h
# 12         d          1         e
# 13         d          1         f
# 14         d          1         g
# 15         d          1         h

Add dummy columns with same value.

Then with using dummy columns as common key, we can do cross join.

python – pandas two dataframe cross join – Stack Overflow

In addition, if you don't want to show dummy column, you can delete it by drop.

df_crossjoin.drop(columns=["dummy_key"], inplace=True)
print(df_crossjoin)

#    column1-1 column2-1
# 0          a         e
# 1          a         f
# 2          a         g
# 3          a         h
# 4          b         e
# 5          b         f
# 6          b         g
# 7          b         h
# 8          c         e
# 9          c         f
# 10         c         g
# 11         c         h
# 12         d         e
# 13         d         f
# 14         d         g
# 15         d         h

Conclusion

Today I described about "How to get cross join with pandas.DataFrame".

Important points are following.

  • Use merge to join DataFrame
  • merge requires common key
  • In order to get cross join, use parameter how = "cross"
  • You can also add dummy column as common key

Without on parameter, you can use how parameter.

How to get cross join with pandas.DataFrame

この記事が気に入ったら
いいね または フォローしてね!

If you like this article, please share !
  • URLをコピーしました!
  • URLをコピーしました!

Author

karasanのアバター karasan System engineer

Mid-career engineer (AI, Data science, Salesforce, etc.).
Good at Python and SQL.

目次