Poco::Data - extremely slow insert into database

Poco::Data - extremely slow insert into database

Postby mkaut » 09 Jan 2013, 15:13

I have a table with 3 columns and 5100 rows, modelled as
Code: Select all
std::vector<Poco::Tuple<unsigned, unsigned, double> > tabRows

which I am inserting into an open MySQL session using
Code: Select all
ses << "INSERT INTO TableName VALUES(?, ?, ?)", use(tabRows), now;


It works, but it takes almost 160 seconds :o on my PC (and this is only 5100 rows; my real data has 300 times that).
Note that this is wall-clock time; the user and sys times, as reported by clock() from <ctime> or by the "time" command on Linux, total only about 0.2 s, which suggests that almost all of the time is spent waiting.
Interestingly, I can hear the hard drive spinning like crazy for the whole 2.5 minutes.

Is there some way to make this faster? I tried adding "bulk" to the use() call, but it would not compile.
I also tried inserting one row at a time, but that did not help either.
The 'solution' I found is to write the data to a CSV file and then import the file into MySQL manually; this takes only a fraction of a second.

I am using Poco 1.5 on Linux (tested both the 32- and 64-bit versions), connected to a MySQL server on localhost.

Thanks.

Michal

Re: Poco::Data - extremely slow insert into database

Postby rakesh » 09 Jan 2013, 18:18

Just out of curiosity, does wrapping the bulk insert into a transaction make any difference?
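Something along these lines (an untested sketch; begin() and commit() are the Session transaction calls):
Code: Select all
ses.begin();
ses << "INSERT INTO TableName VALUES(?, ?, ?)", use(tabRows), now;
ses.commit();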

Re: Poco::Data - extremely slow insert into database

Postby alex » 09 Jan 2013, 19:42

rakesh wrote:Just out of curiosity, does wrapping the bulk insert into a transaction make any difference?

It will make a difference, to a varying degree. However, only the ODBC back-end truly supports it. With the ODBC back-end, std::vector, and native types (except bool), I've seen significant speedups (thousands of times faster in some cases; there are benchmarks in the ODBC testsuite). The gain depends on the driver implementation. Using complex types (e.g. std::string, tuple, etc.) or a non-vector container hurts performance because additional copying is unavoidable.
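For example, with one vector per column and bulk binding, it looks like this (a sketch against an ODBC session; the table and column layout are made up):
Code: Select all
std::vector<unsigned> col1;
std::vector<unsigned> col2;
std::vector<double>   col3;
// ... one entry per row in each vector ...
ses << "INSERT INTO TableName VALUES (?, ?, ?)",
    use(col1, bulk), use(col2, bulk), use(col3, bulk), now;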

As for the OP's question, the MySQL back-end was a hit-and-run contribution, so its performance is likely to be sub-optimal. I was seriously tempted to remove it from 1.5 because of the maintenance headaches, and there's already ODBC support. There's room for improvement, but we need someone to step up and take ownership of it. Any takers?

Re: Poco::Data - extremely slow insert into database

Postby mkaut » 10 Jan 2013, 11:34

alex wrote:However, only the ODBC back-end truly supports it. With the ODBC back-end, std::vector, and native types (except bool), I've seen significant speedups (thousands of times faster in some cases; there are benchmarks in the ODBC testsuite). The gain depends on the driver implementation. Using complex types (e.g. std::string, tuple, etc.) or a non-vector container hurts performance because additional copying is unavoidable.

As for the OP's question, the MySQL back-end was a hit-and-run contribution, so its performance is likely to be sub-optimal. I was seriously tempted to remove it from 1.5 because of the maintenance headaches, and there's already ODBC support. There's room for improvement, but we need someone to step up and take ownership of it. Any takers?

Alex, thanks for the pointer to the benchmarks - I had not realized I could use one vector of a simple type per column; that's why I was using tuples. With this change, the code also compiles with the bulk insert, but it fails at runtime with "Invalid access: Bulk not supported by this session.", so switching to the ODBC back-end is probably the only way to make this work.

However, setting up ODBC to talk to MySQL, plus finding out how to connect to it from Poco, sounds a bit scary, so I guess I will just dump the table to a file and then import it into the database from a bash script.

Anyway, thanks for the explanation.

Re: Poco::Data - extremely slow insert into database

Postby alex » 10 Jan 2013, 15:16

mkaut wrote:However, setting up ODBC to talk to MySQL, plus finding out how to connect to it from Poco, sounds a bit scary

If you are doing a one-time thing, it may not be worth the effort, but it's really not complicated:

Code: Select all
alex$ cat /etc/odbc.ini
[ODBC Data Sources]
PocoDataMySQLTest = MySQL ODBC 5.2 Driver

[PocoDataMySQLTest]
Driver   = /usr/local/lib/libmyodbc5a.so
SERVER   = localhost
UID      = poco
PWD      = poco
DATABASE = test
PORT     = 3306


Code: Select all
Session sess(Poco::Data::ODBC::Connector::KEY, "DSN=PocoDataMySQLTest;");

or
Code: Select all
Session sess(Poco::Data::ODBC::Connector::KEY, "DRIVER={MySQL ODBC 5.2 Driver};DATABASE=test;SERVER=localhost;UID=poco;PWD=poco;");
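(In both cases, Poco::Data::ODBC::Connector::registerConnector() has to be called once before creating the session.)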

Re: Poco::Data - extremely slow insert into database

Postby mkaut » 11 Jan 2013, 19:23

alex wrote:If you are doing a one-time thing, it may not be worth the effort, but it's really not complicated:

Thanks a lot for the instructions, it really was as easy as you made it sound :-).

On the downside, it did not help much: for the version with tuples for each row, the time went down by about a third, but it did not improve further when I switched to the one-vector-per-column structure, and adding the bulk option did not help either :-(.
So I guess I will go back to the previous solution: using the C++ code to write a CSV file and then importing it into MySQL from bash.
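For the record, the workaround amounts to this (a sketch; the file name 'tab.csv' and the LOCAL INFILE setting are assumptions about my setup):
Code: Select all
#include <cstddef>
#include <fstream>
#include <vector>

// dump the column vectors to a comma-separated file
void writeCsv(const std::vector<unsigned>& col1,
              const std::vector<unsigned>& col2,
              const std::vector<double>&   col3)
{
    std::ofstream csv("tab.csv");
    for (std::size_t i = 0; i != col1.size(); ++i)
        csv << col1[i] << ',' << col2[i] << ',' << col3[i] << '\n';
}

// then, from bash (requires LOCAL INFILE to be enabled on the server):
//   mysql test -e "LOAD DATA LOCAL INFILE 'tab.csv' INTO TABLE TableName FIELDS TERMINATED BY ','"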

Anyway, thanks for your excellent help; now I at least have the ODBC driver ready, should I ever need it.

Michal

Re: Poco::Data - extremely slow insert into database

Postby AlexZH » 13 Mar 2013, 13:38

I've just begun using Poco. First of all, I wanted to try writing to an SQL database directly from C++ code. (The current solution uses C++ and C# for writing to the DB.)
Fortunately, I found this thread right at the beginning.

I saw some examples: http://pocoproject.org/blog/?p=113

I got confused because I can't make it compile (as in one of the posts above)! At the line
Code: Select all
session << "SELECT * FROM Test", use(ints, bulk(100)), now;

I get "bulk is undeclared identifier".

I tried it with the ODBC driver, connecting to MS SQL Server 2008 R2 on Windows 7.

Could anybody give some instructions on how to make the 'bulk' keyword work?

I know how to solve it using CSV and a bulk-insert utility (as suggested above), but that doesn't seem like the right solution...

Thanks in advance!

Re: Poco::Data - extremely slow insert into database

Postby AlexZH » 13 Mar 2013, 16:11

Can anybody show a working example of the bulk keyword?

With Poco 1.4.6 I get: 'bulk' - undeclared identifier.

With Poco 1.5.1 I get: 'use' - invalid arguments (though bulk is declared there).

I used the following example:
Code: Select all
std::vector<int> ints(100, 1);
session << "INSERT INTO Test VALUES (?)", use(ints, bulk), now;


I'm running on Windows 7 with Visual Studio 2010, connecting to MS SQL Server 2008 R2 via the ODBC driver.
I've just started with the Poco libraries. My goal is to speed up the current solution (C++/C#) by making bulk inserts directly from C++ code and avoiding C# completely.
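For completeness, here is the full sketch I'm trying to compile (with a using-directive for Poco::Data::Keywords, where use/bulk/now live in the 1.5 line as far as I understand; the DSN is just a placeholder):
Code: Select all
#include "Poco/Data/Session.h"
#include "Poco/Data/ODBC/Connector.h"
#include <vector>

using namespace Poco::Data::Keywords; // use, bulk, now
using Poco::Data::Session;

int main()
{
    // register the ODBC connector, then open a session via a DSN
    Poco::Data::ODBC::Connector::registerConnector();
    Session session(Poco::Data::ODBC::Connector::KEY, "DSN=TestDSN;"); // placeholder DSN

    std::vector<int> ints(100, 1);
    session << "INSERT INTO Test VALUES (?)", use(ints, bulk), now;
    return 0;
}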

Thanks in advance!

