"Mysql insert is slow in Python"
Inserting a large amount of data into a MySQL table row by row can take a long time. Using a bulk insert instead makes it much faster. This article shows how to do a bulk insert from Python.
Bulk insert
Use the `executemany` method of `pymysql` instead of the `execute` method. It inserts many rows in a single call, so you can insert data in bulk.
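`executemany` is part of the Python DB-API, so its behavior can be sketched with the standard-library `sqlite3` driver, used here only because it needs no server; the `pymysql` call has the same shape, except that `pymysql` uses `%s` placeholders instead of `?`:

```python
import sqlite3

# An in-memory SQLite database stands in for MySQL; the DB-API call is the same.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE hoge (id INTEGER PRIMARY KEY, name TEXT, value REAL, text TEXT)")

rows = [("name_{}".format(i), i, "text_{}".format(i)) for i in range(3)]

# One call, many rows: equivalent to looping cur.execute(...) over rows,
# but a driver can send the whole batch more efficiently.
cur.executemany("INSERT INTO hoge (name, value, text) VALUES (?, ?, ?)", rows)
conn.commit()

cur.execute("SELECT COUNT(*) FROM hoge")
print(cur.fetchone()[0])  # -> 3
```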
Reference:
pymysql bulk insert
Test
Prepare
Install the `pymysql` library with pip.

```shell
pip install pymysql
```
Create a test table in the MySQL database.

```sql
CREATE TABLE `hoge` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(50) NOT NULL,
  `value` double NOT NULL,
  `text` text NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=98289 DEFAULT CHARSET=latin1;
```
Insert one by one
Inserting 100,000 records one at a time takes about 13 seconds.
```python
import pymysql
import datetime

print("Connect to DB")
conn = pymysql.connect(user="root", password="", host="localhost", database="test")

# Insert one by one
def insert_data(values):
    insert_sql = "INSERT INTO test.hoge (name, value, text) values (%s,%s,%s)"
    cur = conn.cursor()
    for value in values:
        cur.execute(insert_sql, value)
    conn.commit()  # pymysql disables autocommit by default, so commit explicitly

def main():
    # Generate data
    values = []
    print("Generate data")
    for i in range(100000):
        name = "name_{}".format(i)
        value = i
        text = "text_{}".format(i)
        values.append([name, value, text])
    print("Length of data: {}".format(len(values)))
    print()

    # Insert one by one
    print("Insert data")
    start_time = datetime.datetime.now()
    print("Start:" + str(start_time))
    insert_data(values)
    end_time = datetime.datetime.now()
    print("End:" + str(end_time))
    diff_time = end_time - start_time
    print("Diff:" + str(diff_time))
    print()

if __name__ == "__main__":
    main()
```
Bulk Insert
The only difference is that the `insert_data_bulk` function uses `executemany`.
Inserting 100,000 records now takes about 2 seconds, roughly 6 times faster than inserting one by one.
For 1,000,000 records, one-by-one insert takes 137 seconds while bulk insert takes 26 seconds, again about 5 times faster.
```python
import pymysql
import datetime

print("Connect to DB")
conn = pymysql.connect(user="root", password="", host="localhost", database="test")

# Bulk Insert
def insert_data_bulk(values):
    print("Insert bulk data")
    insert_sql = "INSERT INTO test.hoge (name, value, text) values (%s,%s,%s)"
    cur = conn.cursor()
    cur.executemany(insert_sql, values)
    conn.commit()  # pymysql disables autocommit by default, so commit explicitly

def main():
    # Generate data
    values = []
    print("Generate data")
    for i in range(100000):
        name = "name_{}".format(i)
        value = i
        text = "text_{}".format(i)
        values.append([name, value, text])
    print("Length of data: {}".format(len(values)))
    print()

    # Bulk Insert
    print("Insert data")
    start_time = datetime.datetime.now()
    print("Start:" + str(start_time))
    insert_data_bulk(values)
    end_time = datetime.datetime.now()
    print("End:" + str(end_time))
    diff_time = end_time - start_time
    print("Diff:" + str(diff_time))
    print()

if __name__ == "__main__":
    main()
```
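For very large datasets, one common variation (not in the original article, so treat it as a sketch) is to split the rows into fixed-size batches and call `executemany` once per batch. This bounds the size of each generated SQL statement and the client-side memory per call. The batch size of 10,000 and the `insert_data_batched` name are arbitrary assumptions:

```python
def batches(values, size):
    """Yield successive fixed-size slices of a list of rows."""
    for start in range(0, len(values), size):
        yield values[start:start + size]

def insert_data_batched(conn, values, batch_size=10000):
    # Hypothetical variant of insert_data_bulk: one executemany call per batch.
    insert_sql = "INSERT INTO test.hoge (name, value, text) values (%s,%s,%s)"
    cur = conn.cursor()
    for batch in batches(values, batch_size):
        cur.executemany(insert_sql, batch)
    conn.commit()  # commit once at the end so the whole load is atomic
```

Whether batching actually helps depends on row size and server settings (e.g. `max_allowed_packet`), so it is worth measuring against a single `executemany` call for your data.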
Finally
When you feel that MySQL inserts from Python are slow, try the `executemany` method of `pymysql`.
Sample code:
yKRSW/sample_mysql_bulk_insert: Sample of MySQL bulk insert with using pymysql.