
MySQL bulk insert with pymysql

"MySQL inserts are slow in Python."
Inserting a large amount of data into a MySQL table row by row can take a long time.
With a bulk insert, the same job finishes much faster.
This article shows how to do a bulk insert with pymysql.


bulk insert

Use the executemany method of pymysql instead of the execute method.
It sends many rows in a single statement, so you can insert bulk data efficiently.

Reference:
pymysql bulk insert
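Why is executemany faster? For a plain INSERT ... VALUES statement, pymysql batches many rows into one multi-row statement, so the server receives a single large query instead of thousands of small ones. A rough illustration of what such a statement looks like (a hypothetical helper for explanation only, not pymysql's actual code):

```python
# Hypothetical helper: build a multi-row INSERT statement with one
# "(%s,...)" placeholder group per row, similar in spirit to what
# pymysql's executemany produces internally for INSERT ... VALUES.
def build_multirow_insert(table, columns, num_rows):
    group = "(" + ",".join(["%s"] * len(columns)) + ")"
    return "INSERT INTO {} ({}) VALUES {}".format(
        table, ",".join(columns), ",".join([group] * num_rows))

sql = build_multirow_insert("test.hoge", ["name", "value", "text"], 3)
# One statement carries all three rows:
# INSERT INTO test.hoge (name,value,text) VALUES (%s,%s,%s),(%s,%s,%s),(%s,%s,%s)
```

One round trip and one statement parse per batch is where the speedup comes from.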


Test

Prepare

Install the pymysql library with pip.

pip install pymysql

Create test table in MySQL DB.

CREATE TABLE `hoge` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(50) NOT NULL,
  `value` double NOT NULL,
  `text` text NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=98289 DEFAULT CHARSET=latin1


Insert one by one

Inserting 100,000 records one by one takes about 13 seconds.

import pymysql
import datetime

print("Connect to DB")
conn = pymysql.connect(user="root", password="", host="localhost", database="test")

# Insert one by one
def insert_data(values):
    insert_sql = "INSERT INTO test.hoge (name, value, text) VALUES (%s,%s,%s)"

    cur = conn.cursor()
    for value in values:
        cur.execute(insert_sql, value)
    conn.commit()  # pymysql disables autocommit by default, so commit explicitly
    cur.close()

def main():
    # Generate data
    values = []
    print("Generate data")
    for i in range(100000):
        name = "name_{}".format(i)
        value = i
        text = "text_{}".format(i)
        values.append([name,value,text])
    print("Length of data: {}".format(len(values)))
    print()

    # Insert one by one
    print("Insert data")
    start_time = datetime.datetime.now()
    print("Start:" + str(start_time))
    insert_data(values)
    end_time = datetime.datetime.now()
    print("End:" + str(end_time))
    diff_time = end_time - start_time
    print("Diff:" + str(diff_time))
    print()

if __name__ == "__main__":
    main()


Bulk Insert

The only difference is that the insert_data_bulk function uses executemany.
Inserting 100,000 records takes about 2 seconds, roughly 5 times faster than inserting one by one.

For 1,000,000 records, one-by-one insert takes 137 seconds while bulk insert takes 26 seconds.
So it is again about 5 times faster than one-by-one insert.

import pymysql
import datetime

print("Connect to DB")
conn = pymysql.connect(user="root", password="", host="localhost", database="test")

# Bulk Insert
def insert_data_bulk(values):
    print("Insert bulk data")
    insert_sql = "INSERT INTO test.hoge (name, value, text) VALUES (%s,%s,%s)"

    cur = conn.cursor()
    cur.executemany(insert_sql, values)
    conn.commit()  # pymysql disables autocommit by default, so commit explicitly
    cur.close()

def main():
    # Generate data
    values = []
    print("Generate data")
    for i in range(100000):
        name = "name_{}".format(i)
        value = i
        text = "text_{}".format(i)
        values.append([name,value,text])
    print("Length of data: {}".format(len(values)))
    print()

    # Bulk Insert
    print("Insert data")
    start_time = datetime.datetime.now()
    print("Start:" + str(start_time))
    insert_data_bulk(values)
    end_time = datetime.datetime.now()
    print("End:" + str(end_time))
    diff_time = end_time - start_time
    print("Diff:" + str(diff_time))
    print()

if __name__ == "__main__":
    main()


Finally

When you feel that MySQL inserts are slow in Python, try the executemany method of pymysql.

Sample code:
yKRSW/sample_mysql_bulk_insert: Sample of MySQL bulk insert using pymysql.
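One caveat worth keeping in mind: a single multi-row INSERT must fit within MySQL's max_allowed_packet limit, so for very large datasets it is safer to split the rows into chunks and call executemany once per chunk. A minimal sketch (not from the sample repository; the chunk size of 10,000 is an arbitrary choice):

```python
# Split the rows into fixed-size chunks so each executemany call stays
# well below MySQL's max_allowed_packet limit.
def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def insert_in_chunks(conn, rows, chunk_size=10000):
    insert_sql = "INSERT INTO test.hoge (name, value, text) VALUES (%s,%s,%s)"
    cur = conn.cursor()
    for chunk in chunked(rows, chunk_size):
        cur.executemany(insert_sql, chunk)
    conn.commit()  # one commit at the end keeps the whole load in one transaction
    cur.close()
```

With this pattern you keep most of the speedup of bulk insert while avoiding oversized packets.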

KRSW

Junior machine learning engineer. Not a specialist but a generalist with broad, shallow knowledge of machine learning, DB, and the web.


Copyright © ITips, 2020. All Rights Reserved.