{"id":5268,"date":"2021-12-19T14:36:12","date_gmt":"2021-12-19T05:36:12","guid":{"rendered":"https:\/\/itips.krsw.biz\/?p=5268"},"modified":"2021-12-19T17:52:43","modified_gmt":"2021-12-19T08:52:43","slug":"pandas-dataframe-how-to-find-and-drop-duplicated-row","status":"publish","type":"post","link":"https:\/\/itips.krsw.biz\/en\/pandas-dataframe-how-to-find-and-drop-duplicated-row\/","title":{"rendered":"How to handle duplicated rows in pandas.DataFrame"},"content":{"rendered":"<p><img decoding=\"async\" src=\"https:\/\/itips.krsw.biz\/wp-content\/uploads\/2020\/10\/h2-python-800x450.jpg\" alt=\"How to handle duplicated row in pandas.DataFrame\" \/><\/p>\n<div class=\"st-kaiwa-box kaiwaicon7 clearfix\"><div class=\"st-kaiwa-face\"><img decoding=\"async\" src=\"https:\/\/itips.krsw.biz\/wp-content\/uploads\/2022\/06\/junior_face_r_sulk_nobg_hair0_cloth10_200px.png\" width=\"60px\"><div class=\"st-kaiwa-face-name\"><\/div><\/div><div class=\"st-kaiwa-area\"><div class=\"st-kaiwa-hukidashi\">There may be duplicated rows in pandas DataFrame. 
I'd like to find and delete them.<\/div><\/div><\/div>\n<p><\/br><\/p>\n<p><code>pandas.DataFrame<\/code> is useful for handling tabular data.<\/p>\n<p>Sometimes a DataFrame contains duplicated data.<\/p>\n<p>In some cases, <span class=\"rmarker\">duplicated data is useless.<\/span> So we would like to find and delete it.<\/p>\n<p>But how can we find duplicated rows?<\/p>\n<p>So today I will introduce <strong>&quot;<span class=\"st-mymarker-s\">How to handle duplicated rows in pandas.DataFrame<\/span>&quot;<\/strong>.<\/p>\n<div class=\"st-mybox  has-title \" style=\"background:#ffffff;border-color:#BDBDBD;border-width:2px;border-radius:5px;margin: 25px 0;\"><p class=\"st-mybox-title\" style=\"color:#757575;font-weight:bold;background: #ffffff;\"><i class=\"fa fa-check-circle st-css-no\" aria-hidden=\"true\"><\/i>Author<\/p><div class=\"st-in-mybox\"><br \/>\n<div class=\"st-kaiwa-box kaiwaicon2 clearfix\"><div class=\"st-kaiwa-face\"><img decoding=\"async\" src=\"https:\/\/itips.krsw.biz\/wp-content\/uploads\/2022\/06\/karasan_smile_200px_b.gif\" width=\"60px\"><div class=\"st-kaiwa-face-name\"><\/div><\/div><div class=\"st-kaiwa-area\"><div class=\"st-kaiwa-hukidashi\">Mid-career engineer (AI, systems). Good at Python and SQL.<\/div><\/div><\/div><br \/>\n<\/div><\/div>\n<div class=\"st-minihukidashi-box \" ><p class=\"st-minihukidashi\" style=\"background:#3F51B5;color:#fff;margin: 0 0 0 -6px;font-size:80%;border-radius:30px;\"><span class=\"st-minihukidashi-arrow\" style=\"border-top-color: #3F51B5;\"><\/span><span class=\"st-minihukidashi-flexbox\">Benefits of reading<\/span><\/p><\/div>\n<div class=\"clip-memobox \" style=\"background:#E8EAF6;color:#000000;\"><div class=\"clip-fonticon\" style=\"font-size:200%;color:#3F51B5;\"><i class=\"fa fa-thumbs-o-up st-css-no\" aria-hidden=\"true\"><\/i><\/div><div class=\"clip-memotext\" style=\"border-color:#3F51B5;\"><p style=\"color:#000000;\">You will understand \"How to handle duplicated rows in pandas.DataFrame\". 
After that, <span class=\"st-mymarker-s\">you will be better at data handling.<\/span><\/p><\/div><\/div>\n<p><!--more--><\/p>\n<p><\/br><\/p>\n<h2>How to handle duplicated rows in pandas.DataFrame<\/h2>\n<p><img decoding=\"async\" src=\"https:\/\/itips.krsw.biz\/wp-content\/uploads\/2020\/10\/h2-how-800x450.jpg\" alt=\"\" \/><\/p>\n<p>So how can we handle duplicated rows in pandas.DataFrame?<\/p>\n<p>The following methods are available.<\/p>\n<div class=\"freebox has-title \" style=\"\"><p class=\"p-free\" style=\"font-weight:bold;\"><span class=\"p-entry-f\" style=\"font-weight:bold;\">Handle duplication<\/span><\/p><div class=\"free-inbox\">\n<ul>\n<li>Find duplication: <code>duplicated()<\/code><\/li>\n<li>Delete duplication: <code>drop_duplicates()<\/code><\/li>\n<\/ul>\n<\/div><\/div>\n<p>I will explain them using the following data.<\/p>\n<div class=\"st-minihukidashi-box \" ><p class=\"st-minihukidashi\" style=\"background:#3F51B5;color:#fff;margin: 0 0 0 -6px;font-size:80%;border-radius:30px;\"><span class=\"st-minihukidashi-arrow\" style=\"border-top-color: #3F51B5;\"><\/span><span class=\"st-minihukidashi-flexbox\">DATA<\/span><\/p><\/div>\n<pre class=\"prettyprint lang-python\">\nimport pandas as pd\n\n# Sample data: rows 0 and 1, and rows 3 and 4, are exact duplicates\ndata_list1 = &#091;\n    &#091;&#34;a&#34;, 12, 100&#093;,\n    &#091;&#34;a&#34;, 12, 100&#093;,\n    &#091;&#34;c&#34;, 12, 90&#093;,\n    &#091;&#34;d&#34;, 13, 85&#093;,\n    &#091;&#34;d&#34;, 13, 85&#093;,\n    &#091;&#34;e&#34;, 14, 95&#093;,\n&#093;\ncol_list1 = &#091;&#34;id&#34;, &#34;age&#34;, &#34;score&#34;&#093;\ndf1 = pd.DataFrame(data=data_list1, columns=col_list1)\nprint(df1)\n\n#   id  age  score\n# 0  a   12    100\n# 1  a   12    100\n# 2  c   12     90\n# 3  d   13     85\n# 4  d   13     85\n# 5  e   14     95\n<\/pre>\n<p><\/br><\/br><\/p>\n<h3>Find duplication: duplicated()<\/h3>\n<p><img decoding=\"async\" src=\"https:\/\/itips.krsw.biz\/wp-content\/uploads\/2021\/03\/book_1615898374-800x533.jpg\" alt=\"\" \/><\/p>\n<p>To find duplicated rows, we can use 
<code>duplicated()<\/code>.<\/p>\n<p><code>duplicated()<\/code> returns a boolean Series with one value per row.<\/p>\n<p>Among identical rows, it treats the <span class=\"st-mymarker-s\">first occurrence as the <code>original<\/code><\/span> (<code>False<\/code>) and marks the later ones as <code>duplicated<\/code> (<code>True<\/code>).<\/p>\n<div class=\"st-minihukidashi-box \" ><p class=\"st-minihukidashi\" style=\"background:#3F51B5;color:#fff;margin: 0 0 0 -6px;font-size:80%;border-radius:30px;\"><span class=\"st-minihukidashi-arrow\" style=\"border-top-color: #3F51B5;\"><\/span><span class=\"st-minihukidashi-flexbox\">SAMPLE<\/span><\/p><\/div>\n<pre class=\"prettyprint lang-python\">\nprint(df1.duplicated())\n\n# 0    False\n# 1     True\n# 2    False\n# 3    False\n# 4     True\n# 5    False\n# dtype: bool\n<\/pre>\n<p><\/br><\/p>\n<p>Using this boolean Series as a mask, we can extract the duplicated rows.<\/p>\n<pre class=\"prettyprint lang-python\">\nprint(df1&#091;df1.duplicated()&#093;)\n\n#   id  age  score\n# 1  a   12    100\n# 4  d   13     85\n<\/pre>\n<p><\/br><\/br><\/p>\n<h4>&quot;keep&quot; chooses which occurrence is treated as the original<\/h4>\n<p>If we call <code>duplicated()<\/code> without parameters, the first occurrence is marked <code>False<\/code> and every later occurrence is marked <code>True<\/code> (duplicated).<\/p>\n<p>If we set <code>keep=&quot;last&quot;<\/code>, the last occurrence is marked <code>False<\/code> (not duplicated) instead.<br \/>\n(The default is <code>keep=&quot;first&quot;<\/code>.)<\/p>\n<div class=\"st-minihukidashi-box \" ><p class=\"st-minihukidashi\" style=\"background:#3F51B5;color:#fff;margin: 0 0 0 -6px;font-size:80%;border-radius:30px;\"><span class=\"st-minihukidashi-arrow\" style=\"border-top-color: #3F51B5;\"><\/span><span class=\"st-minihukidashi-flexbox\">SAMPLE<\/span><\/p><\/div>\n<pre class=\"prettyprint lang-python\">\nprint(df1.duplicated(keep=&#34;last&#34;))\n\n# 0     True\n# 1    False\n# 2    False\n# 3     True\n# 4    False\n# 5    False\n# dtype: bool\n\nprint(df1&#091;df1.duplicated(keep=&#34;last&#34;)&#093;)\n\n#   id  age  
score\n# 0  a   12    100\n# 3  d   13     85\n<\/pre>\n<p><\/br><\/p>\n<p>In addition, <code>keep=False<\/code> marks every occurrence of a duplicated row as <code>True<\/code> (duplicated), including the first and the last.<\/p>\n<pre class=\"prettyprint lang-python\">\nprint(df1.duplicated(keep=False))\n\n# 0     True\n# 1     True\n# 2    False\n# 3     True\n# 4     True\n# 5    False\n# dtype: bool\n\nprint(df1&#091;df1.duplicated(keep=False)&#093;)\n\n#   id  age  score\n# 0  a   12    100\n# 1  a   12    100\n# 3  d   13     85\n# 4  d   13     85\n<\/pre>\n<p><\/br><\/br><\/p>\n<h4>&quot;subset&quot; limits the comparison to certain columns<\/h4>\n<p>By default, <code>duplicated()<\/code> treats rows as duplicates only if they have the same values in all columns.<\/p>\n<p>With the <code>subset=[&quot;column name&quot;]<\/code> parameter, only the listed columns are compared.<\/p>\n<div class=\"st-minihukidashi-box \" ><p class=\"st-minihukidashi\" style=\"background:#3F51B5;color:#fff;margin: 0 0 0 -6px;font-size:80%;border-radius:30px;\"><span class=\"st-minihukidashi-arrow\" style=\"border-top-color: #3F51B5;\"><\/span><span class=\"st-minihukidashi-flexbox\">SAMPLE<\/span><\/p><\/div>\n<pre class=\"prettyprint lang-python\">\nprint(df1.duplicated(subset=&#091;&#34;age&#34;&#093;))\n\n# 0    False\n# 1     True\n# 2     True\n# 3    False\n# 4     True\n# 5    False\n# dtype: bool\n\nprint(df1&#091;df1.duplicated(subset=&#091;&#34;age&#34;&#093;)&#093;)\n\n#   id  age  score\n# 1  a   12    100\n# 2  c   12     90\n# 4  d   13     85\n<\/pre>\n<p><\/br><\/br><\/p>\n<h3>Delete duplication: drop_duplicates()<\/h3>\n<p><img decoding=\"async\" src=\"https:\/\/itips.krsw.biz\/wp-content\/uploads\/2020\/01\/solution_1578916409-800x533.jpg\" alt=\"\" \/><\/p>\n<p>To delete duplicated rows, you can use <code>drop_duplicates()<\/code>.<\/p>\n<div class=\"st-minihukidashi-box \" ><p class=\"st-minihukidashi\" style=\"background:#3F51B5;color:#fff;margin: 0 0 0 -6px;font-size:80%;border-radius:30px;\"><span 
class=\"st-minihukidashi-arrow\" style=\"border-top-color: #3F51B5;\"><\/span><span class=\"st-minihukidashi-flexbox\">SAMPLE<\/span><\/p><\/div>\n<pre class=\"prettyprint lang-python\">\nprint(df1.drop_duplicates())\n\n#   id  age  score\n# 0  a   12    100\n# 2  c   12     90\n# 3  d   13     85\n# 5  e   14     95\n<\/pre>\n<p><\/br><\/p>\n<p>It gives the same result as <span class=\"rmarker\">masking with the inverted boolean Series from <code>duplicated()<\/code><\/span>.<\/p>\n<pre class=\"prettyprint lang-python\">\nprint(df1&#091;~df1.duplicated()&#093;)\n\n#   id  age  score\n# 0  a   12    100\n# 2  c   12     90\n# 3  d   13     85\n# 5  e   14     95\n<\/pre>\n<p><\/br><\/p>\n<p>In addition, <code>drop_duplicates()<\/code> also accepts the <code>keep<\/code> and <code>subset<\/code> parameters.<\/p>\n<p><\/br><\/br><\/p>\n<h4>&quot;inplace&quot; updates the original DataFrame<\/h4>\n<p>By default, <code>drop_duplicates()<\/code> returns a new DataFrame and leaves the original unchanged.<\/p>\n<p>If you want to update the original DataFrame itself, you can use <code>inplace=True<\/code>.<\/p>\n<div class=\"st-minihukidashi-box \" ><p class=\"st-minihukidashi\" style=\"background:#3F51B5;color:#fff;margin: 0 0 0 -6px;font-size:80%;border-radius:30px;\"><span class=\"st-minihukidashi-arrow\" style=\"border-top-color: #3F51B5;\"><\/span><span class=\"st-minihukidashi-flexbox\">SAMPLE<\/span><\/p><\/div>\n<pre class=\"prettyprint lang-python\">\n# Without inplace=True the result is discarded and df1 is unchanged\ndf1.drop_duplicates()\nprint(df1)\n\n#   id  age  score\n# 0  a   12    100\n# 1  a   12    100\n# 2  c   12     90\n# 3  d   13     85\n# 4  d   13     85\n# 5  e   14     95\n\n# With inplace=True, df1 itself is modified (the call returns None)\ndf1.drop_duplicates(inplace=True)\nprint(df1)\n\n#   id  age  score\n# 0  a   12    100\n# 2  c   12     90\n# 3  d   13     85\n# 5  e   14     95\n<\/pre>\n<p><\/br><\/br><\/p>\n<h2>Conclusion<\/h2>\n<p><img decoding=\"async\" src=\"https:\/\/itips.krsw.biz\/wp-content\/uploads\/2020\/11\/h2-conclusion-800x450.jpg\" alt=\"\" \/><\/p>\n<p>Today I explained <strong>&quot;<span class=\"st-mymarker-s\">How to handle duplicated 
rows in pandas.DataFrame<\/span>&quot;<\/strong>.<\/p>\n<p>To find and delete duplicated rows, the following methods are available.<\/p>\n<div class=\"freebox has-title \" style=\"\"><p class=\"p-free\" style=\"font-weight:bold;\"><span class=\"p-entry-f\" style=\"font-weight:bold;\">Handle duplication<\/span><\/p><div class=\"free-inbox\">\n<ul>\n<li>Find duplication: <code>duplicated()<\/code><\/li>\n<li>Delete duplication: <code>drop_duplicates()<\/code><\/li>\n<\/ul>\n<\/div><\/div>\n<p>Both accept the <code>keep=&quot;first&quot;<\/code> or <code>keep=&quot;last&quot;<\/code> parameter to choose which occurrence is treated as the original.<\/p>\n<p>With the <code>subset=[&quot;column name&quot;]<\/code> parameter, they compare only the listed columns to find duplication.<\/p>\n<p><\/br><\/p>\n<div class=\"st-kaiwa-box kaiwaicon2 clearfix\"><div class=\"st-kaiwa-face\"><img decoding=\"async\" src=\"https:\/\/itips.krsw.biz\/wp-content\/uploads\/2022\/06\/karasan_smile_200px_b.gif\" width=\"60px\"><div class=\"st-kaiwa-face-name\"><\/div><\/div><div class=\"st-kaiwa-area\"><div class=\"st-kaiwa-hukidashi\">They are very useful for data handling.<\/div><\/div><\/div>\n<p><\/br><\/p>\n<div class=\"st-mybox  has-title st-mybox-class\" style=\"background:#fafafa;border-width:1px;border-radius:5px;margin: 25px 0 25px 0;\"><p class=\"st-mybox-title\" style=\"color:#757575;font-weight:bold;text-shadow: #fff 3px 0px 0px, #fff 2.83487px 0.981584px 0px, #fff 2.35766px 1.85511px 0px, #fff 1.62091px 2.52441px 0px, #fff 0.705713px 2.91581px 0px, #fff -0.287171px 2.98622px 0px, #fff -1.24844px 2.72789px 0px, #fff -2.07227px 2.16926px 0px, #fff -2.66798px 1.37182px 0px, #fff -2.96998px 0.42336px 0px, #fff -2.94502px -0.571704px 0px, #fff -2.59586px -1.50383px 0px, #fff -1.96093px -2.27041px 0px, #fff -1.11013px -2.78704px 0px, #fff -0.137119px -2.99686px 0px, #fff 0.850987px -2.87677px 0px, #fff 1.74541px -2.43999px 0px, #fff 2.44769px -1.73459px 0px, #fff 2.88051px -0.838246px 0px;\"><i class=\"fa 
fa-file-text-o st-css-no\" aria-hidden=\"true\"><\/i>Reference<\/p><div class=\"st-in-mybox\">\n<ul>\n<li><a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.DataFrame.duplicated.html\">duplicated \u2014 pandas 1.3.5 documentation<\/a><\/li>\n<li><a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.DataFrame.drop_duplicates.html\">drop_duplicates \u2014 pandas 1.3.5 documentation<\/a><\/li>\n<\/ul>\n<\/div><\/div>\n","protected":false},"excerpt":{"rendered":"<p>There may be duplicated rows in pandas DataFrame. I'd like to find and delete them. 
pandas.DataFrame &#8230; <\/p>\n","protected":false},"author":1,"featured_media":2919,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_locale":"en_US","_original_post":"https:\/\/itips.krsw.biz\/?p=5253","footnotes":""},"categories":[6],"tags":[35],"class_list":["post-5268","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-python","tag-pandas","en-US"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>How to handle duplicated rows in pandas.DataFrame - ITips\u30b7\u30b9\u30c6\u30e0\u30bd\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u30ba<\/title>\n<meta name=\"description\" content=\"pandas.DataFrame is useful to handle table format data.Sometimes DataFrame has duplicated data.In some cases, duplicated data is useless. So we would like to find and delete them.But how can we find duplicated rows ?So today I will introduce &quot;How to handle duplicated rows in pandas.DataFrame&quot;.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/itips.krsw.biz\/en\/pandas-dataframe-how-to-find-and-drop-duplicated-row\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to handle duplicated rows in pandas.DataFrame - ITips\u30b7\u30b9\u30c6\u30e0\u30bd\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u30ba\" \/>\n<meta property=\"og:description\" content=\"pandas.DataFrame is useful to handle table format data.Sometimes DataFrame has duplicated data.In some cases, duplicated data is useless. 
So we would like to find and delete them.But how can we find duplicated rows ?So today I will introduce &quot;How to handle duplicated rows in pandas.DataFrame&quot;.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/itips.krsw.biz\/en\/pandas-dataframe-how-to-find-and-drop-duplicated-row\/\" \/>\n<meta property=\"og:site_name\" content=\"ITips\u30b7\u30b9\u30c6\u30e0\u30bd\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u30ba\" \/>\n<meta property=\"article:published_time\" content=\"2021-12-19T05:36:12+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-12-19T08:52:43+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/itips.krsw.biz\/wp-content\/uploads\/2020\/10\/h2-python.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"1080\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"ITips\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@https:\/\/twitter.com\/karasan_itips\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"ITips\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/itips.krsw.biz\\\/en\\\/pandas-dataframe-how-to-find-and-drop-duplicated-row\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/itips.krsw.biz\\\/en\\\/pandas-dataframe-how-to-find-and-drop-duplicated-row\\\/\"},\"author\":{\"name\":\"ITips\",\"@id\":\"https:\\\/\\\/itips.krsw.biz\\\/#\\\/schema\\\/person\\\/981ee81393a64c1b43f0b62d91998f0c\"},\"headline\":\"How to handle duplicated rows in pandas.DataFrame\",\"datePublished\":\"2021-12-19T05:36:12+00:00\",\"dateModified\":\"2021-12-19T08:52:43+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/itips.krsw.biz\\\/en\\\/pandas-dataframe-how-to-find-and-drop-duplicated-row\\\/\"},\"wordCount\":585,\"image\":{\"@id\":\"https:\\\/\\\/itips.krsw.biz\\\/en\\\/pandas-dataframe-how-to-find-and-drop-duplicated-row\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/itips.krsw.biz\\\/wp-content\\\/uploads\\\/2020\\\/10\\\/h2-python.jpg\",\"keywords\":[\"pandas\"],\"articleSection\":[\"Python\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/itips.krsw.biz\\\/en\\\/pandas-dataframe-how-to-find-and-drop-duplicated-row\\\/\",\"url\":\"https:\\\/\\\/itips.krsw.biz\\\/en\\\/pandas-dataframe-how-to-find-and-drop-duplicated-row\\\/\",\"name\":\"How to handle duplicated rows in pandas.DataFrame - 
ITips\u30b7\u30b9\u30c6\u30e0\u30bd\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u30ba\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/itips.krsw.biz\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/itips.krsw.biz\\\/en\\\/pandas-dataframe-how-to-find-and-drop-duplicated-row\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/itips.krsw.biz\\\/en\\\/pandas-dataframe-how-to-find-and-drop-duplicated-row\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/itips.krsw.biz\\\/wp-content\\\/uploads\\\/2020\\\/10\\\/h2-python.jpg\",\"datePublished\":\"2021-12-19T05:36:12+00:00\",\"dateModified\":\"2021-12-19T08:52:43+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/itips.krsw.biz\\\/#\\\/schema\\\/person\\\/981ee81393a64c1b43f0b62d91998f0c\"},\"description\":\"pandas.DataFrame is useful to handle table format data.Sometimes DataFrame has duplicated data.In some cases, duplicated data is useless. So we would like to find and delete them.But how can we find duplicated rows ?So today I will introduce \\\"How to handle duplicated rows in 
pandas.DataFrame\\\".\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/itips.krsw.biz\\\/en\\\/pandas-dataframe-how-to-find-and-drop-duplicated-row\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/itips.krsw.biz\\\/en\\\/pandas-dataframe-how-to-find-and-drop-duplicated-row\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/itips.krsw.biz\\\/en\\\/pandas-dataframe-how-to-find-and-drop-duplicated-row\\\/#primaryimage\",\"url\":\"https:\\\/\\\/itips.krsw.biz\\\/wp-content\\\/uploads\\\/2020\\\/10\\\/h2-python.jpg\",\"contentUrl\":\"https:\\\/\\\/itips.krsw.biz\\\/wp-content\\\/uploads\\\/2020\\\/10\\\/h2-python.jpg\",\"width\":1920,\"height\":1080},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/itips.krsw.biz\\\/en\\\/pandas-dataframe-how-to-find-and-drop-duplicated-row\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\u30db\u30fc\u30e0\",\"item\":\"https:\\\/\\\/itips.krsw.biz\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How to handle duplicated rows in 
pandas.DataFrame\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/itips.krsw.biz\\\/#website\",\"url\":\"https:\\\/\\\/itips.krsw.biz\\\/\",\"name\":\"ITips\u30b7\u30b9\u30c6\u30e0\u30bd\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u30ba\",\"description\":\"\u4e8b\u696d\u306b\u3068\u3063\u3066\u91cd\u8981\u306a\u60c5\u5831\u30b7\u30b9\u30c6\u30e0\u306e\u8ab2\u984c\u3092\u89e3\u6c7a\u3057\u307e\u3059\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/itips.krsw.biz\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/itips.krsw.biz\\\/#\\\/schema\\\/person\\\/981ee81393a64c1b43f0b62d91998f0c\",\"name\":\"ITips\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/a89ef68c98cf6b05d7754a22b3e650bab179284eafbaa216db990ab3650cd763?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/a89ef68c98cf6b05d7754a22b3e650bab179284eafbaa216db990ab3650cd763?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/a89ef68c98cf6b05d7754a22b3e650bab179284eafbaa216db990ab3650cd763?s=96&d=mm&r=g\",\"caption\":\"ITips\"},\"description\":\"\u30b7\u30b9\u30c6\u30e0\u30a8\u30f3\u30b8\u30cb\u30a2\u3001AI\u30a8\u30f3\u30b8\u30cb\u30a2\u3068\u3001IT\u696d\u754c\u306710\u5e74\u4ee5\u4e0a\u50cd\u3044\u3066\u3044\u308b\u4e2d\u5805\u3002Python\u3068SQL\u304c\u5f97\u610f\u3002 System engineer AI engineer, Data scientist. Mid-carrier IT person. Good at Python and SQL.\",\"sameAs\":[\"https:\\\/\\\/www.pinterest.jp\\\/it_karasan\\\/\",\"https:\\\/\\\/x.com\\\/https:\\\/\\\/twitter.com\\\/karasan_itips\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"How to handle duplicated rows in pandas.DataFrame - ITips\u30b7\u30b9\u30c6\u30e0\u30bd\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u30ba","description":"pandas.DataFrame is useful to handle table format data.Sometimes DataFrame has duplicated data.In some cases, duplicated data is useless. So we would like to find and delete them.But how can we find duplicated rows ?So today I will introduce \"How to handle duplicated rows in pandas.DataFrame\".","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/itips.krsw.biz\/en\/pandas-dataframe-how-to-find-and-drop-duplicated-row\/","og_locale":"en_US","og_type":"article","og_title":"How to handle duplicated rows in pandas.DataFrame - ITips\u30b7\u30b9\u30c6\u30e0\u30bd\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u30ba","og_description":"pandas.DataFrame is useful to handle table format data.Sometimes DataFrame has duplicated data.In some cases, duplicated data is useless. So we would like to find and delete them.But how can we find duplicated rows ?So today I will introduce \"How to handle duplicated rows in pandas.DataFrame\".","og_url":"https:\/\/itips.krsw.biz\/en\/pandas-dataframe-how-to-find-and-drop-duplicated-row\/","og_site_name":"ITips\u30b7\u30b9\u30c6\u30e0\u30bd\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u30ba","article_published_time":"2021-12-19T05:36:12+00:00","article_modified_time":"2021-12-19T08:52:43+00:00","og_image":[{"width":1920,"height":1080,"url":"https:\/\/itips.krsw.biz\/wp-content\/uploads\/2020\/10\/h2-python.jpg","type":"image\/jpeg"}],"author":"ITips","twitter_card":"summary_large_image","twitter_creator":"@https:\/\/twitter.com\/karasan_itips","twitter_misc":{"Written by":"ITips","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/itips.krsw.biz\/en\/pandas-dataframe-how-to-find-and-drop-duplicated-row\/#article","isPartOf":{"@id":"https:\/\/itips.krsw.biz\/en\/pandas-dataframe-how-to-find-and-drop-duplicated-row\/"},"author":{"name":"ITips","@id":"https:\/\/itips.krsw.biz\/#\/schema\/person\/981ee81393a64c1b43f0b62d91998f0c"},"headline":"How to handle duplicated rows in pandas.DataFrame","datePublished":"2021-12-19T05:36:12+00:00","dateModified":"2021-12-19T08:52:43+00:00","mainEntityOfPage":{"@id":"https:\/\/itips.krsw.biz\/en\/pandas-dataframe-how-to-find-and-drop-duplicated-row\/"},"wordCount":585,"image":{"@id":"https:\/\/itips.krsw.biz\/en\/pandas-dataframe-how-to-find-and-drop-duplicated-row\/#primaryimage"},"thumbnailUrl":"https:\/\/itips.krsw.biz\/wp-content\/uploads\/2020\/10\/h2-python.jpg","keywords":["pandas"],"articleSection":["Python"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/itips.krsw.biz\/en\/pandas-dataframe-how-to-find-and-drop-duplicated-row\/","url":"https:\/\/itips.krsw.biz\/en\/pandas-dataframe-how-to-find-and-drop-duplicated-row\/","name":"How to handle duplicated rows in pandas.DataFrame - ITips\u30b7\u30b9\u30c6\u30e0\u30bd\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u30ba","isPartOf":{"@id":"https:\/\/itips.krsw.biz\/#website"},"primaryImageOfPage":{"@id":"https:\/\/itips.krsw.biz\/en\/pandas-dataframe-how-to-find-and-drop-duplicated-row\/#primaryimage"},"image":{"@id":"https:\/\/itips.krsw.biz\/en\/pandas-dataframe-how-to-find-and-drop-duplicated-row\/#primaryimage"},"thumbnailUrl":"https:\/\/itips.krsw.biz\/wp-content\/uploads\/2020\/10\/h2-python.jpg","datePublished":"2021-12-19T05:36:12+00:00","dateModified":"2021-12-19T08:52:43+00:00","author":{"@id":"https:\/\/itips.krsw.biz\/#\/schema\/person\/981ee81393a64c1b43f0b62d91998f0c"},"description":"pandas.DataFrame is useful to handle table format data.Sometimes DataFrame has duplicated 
data.In some cases, duplicated data is useless. So we would like to find and delete them.But how can we find duplicated rows ?So today I will introduce \"How to handle duplicated rows in pandas.DataFrame\".","breadcrumb":{"@id":"https:\/\/itips.krsw.biz\/en\/pandas-dataframe-how-to-find-and-drop-duplicated-row\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/itips.krsw.biz\/en\/pandas-dataframe-how-to-find-and-drop-duplicated-row\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/itips.krsw.biz\/en\/pandas-dataframe-how-to-find-and-drop-duplicated-row\/#primaryimage","url":"https:\/\/itips.krsw.biz\/wp-content\/uploads\/2020\/10\/h2-python.jpg","contentUrl":"https:\/\/itips.krsw.biz\/wp-content\/uploads\/2020\/10\/h2-python.jpg","width":1920,"height":1080},{"@type":"BreadcrumbList","@id":"https:\/\/itips.krsw.biz\/en\/pandas-dataframe-how-to-find-and-drop-duplicated-row\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\u30db\u30fc\u30e0","item":"https:\/\/itips.krsw.biz\/"},{"@type":"ListItem","position":2,"name":"How to handle duplicated rows in 
pandas.DataFrame"}]},{"@type":"WebSite","@id":"https:\/\/itips.krsw.biz\/#website","url":"https:\/\/itips.krsw.biz\/","name":"ITips\u30b7\u30b9\u30c6\u30e0\u30bd\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u30ba","description":"\u4e8b\u696d\u306b\u3068\u3063\u3066\u91cd\u8981\u306a\u60c5\u5831\u30b7\u30b9\u30c6\u30e0\u306e\u8ab2\u984c\u3092\u89e3\u6c7a\u3057\u307e\u3059","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/itips.krsw.biz\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/itips.krsw.biz\/#\/schema\/person\/981ee81393a64c1b43f0b62d91998f0c","name":"ITips","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/a89ef68c98cf6b05d7754a22b3e650bab179284eafbaa216db990ab3650cd763?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/a89ef68c98cf6b05d7754a22b3e650bab179284eafbaa216db990ab3650cd763?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/a89ef68c98cf6b05d7754a22b3e650bab179284eafbaa216db990ab3650cd763?s=96&d=mm&r=g","caption":"ITips"},"description":"\u30b7\u30b9\u30c6\u30e0\u30a8\u30f3\u30b8\u30cb\u30a2\u3001AI\u30a8\u30f3\u30b8\u30cb\u30a2\u3068\u3001IT\u696d\u754c\u306710\u5e74\u4ee5\u4e0a\u50cd\u3044\u3066\u3044\u308b\u4e2d\u5805\u3002Python\u3068SQL\u304c\u5f97\u610f\u3002 System engineer AI engineer, Data scientist. Mid-carrier IT person. 
Good at Python and SQL.","sameAs":["https:\/\/www.pinterest.jp\/it_karasan\/","https:\/\/x.com\/https:\/\/twitter.com\/karasan_itips"]}]}},"_links":{"self":[{"href":"https:\/\/itips.krsw.biz\/wp-json\/wp\/v2\/posts\/5268","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/itips.krsw.biz\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/itips.krsw.biz\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/itips.krsw.biz\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/itips.krsw.biz\/wp-json\/wp\/v2\/comments?post=5268"}],"version-history":[{"count":8,"href":"https:\/\/itips.krsw.biz\/wp-json\/wp\/v2\/posts\/5268\/revisions"}],"predecessor-version":[{"id":5276,"href":"https:\/\/itips.krsw.biz\/wp-json\/wp\/v2\/posts\/5268\/revisions\/5276"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/itips.krsw.biz\/wp-json\/wp\/v2\/media\/2919"}],"wp:attachment":[{"href":"https:\/\/itips.krsw.biz\/wp-json\/wp\/v2\/media?parent=5268"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/itips.krsw.biz\/wp-json\/wp\/v2\/categories?post=5268"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/itips.krsw.biz\/wp-json\/wp\/v2\/tags?post=5268"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}