Overview
Features
Download
Documentation
Community
Add-Ons & Services
The POCO C++ Libraries Blog

The Incredible Bulk

Filed under: News by alex at 02:11

OK, forget about the step I mentioned before – it’s gone and we have true bulk instead. For the drivers/connectors that support it properly, that is. Namely, when one gets into implementation of more advanced features, some ODBC drivers simply do not live up to the expectations. I was not able to find PostgreSQL driver that will do multiple recordsets. When it comes to bulk operations (a.k.a. “array binding”), results are varying. Oracle is a champion when you bind integers (10,000 ints, ~500x speedup). Put some strings into mix, things will get slower, but still significantly better than normal mode. BLOBs? You may end up with no difference.

At any rate, I have decided to put bulk in and keep it because it allows us to scale. Achieving speedups at the order of magnitude of hundreds is nothing to sneeze at, even if it’s for limited scenarios. I’ve seen 2-6x speedups with MS SQL Server (with strings and BLOBs) and similar with PostgreSQL (although I was not able to do bulk with BLOBs for PostgreSQL). Based on the results, I think it will be useful in many cases. For the time being, it is only allowed for single direction (in for parameters, out for recordsets) and std::vector.

Here’s what the insert code looks like:

std::vector<int> ints(100, 1);
session << "INSERT INTO Test VALUES (?)", use(ints, bulk), now;

And for select:

std::vector<int> ints;
session << "SELECT * FROM Test", bulk(100), use(ints), now;
session << "SELECT * FROM Test", use(ints, bulk(100)), now;

As usual, manual is up to date.

I’d say, this makes Data feature complete for the time being. What remains is polishing for the next big release. Early adopters are strongly encouraged to check it out and play with it. Feedback is essential to make good things better.

For1.3.1. users, I have patched few bugs. Some are rather serious, so please update to the latest code. Results are in SVN (1.3.2. branch).

5 Comments »
  1. Is there a guide to using POCO data with MSSQL on a unix system? It would help a lot of people if someone could simply write a guide. Poco is one of the greatest frameworks out there, can’t wait to see what comes next.

    Comment by John Lemon on December 17, 2007, 03:30

  2. John,

    You can check documentation for unixODBC or iODBC and FreeTDS. Link to Poco::Data manual is in the post above (bear in mind that manual reflects features available in SVN).

    Alex

    Comment by alex on December 17, 2007, 04:07

  3. As can be seen from the corrections above, I managed to do blobs with PostgreSQL, albeit on windows only. MySQL refuses to do it, so for now bulk is not supported w/ MySQL ODBC (luckily, the native connector work is in progress).

    std::deque and std::list are now suppoerted for bulk operations alongside std::vector (when it comes to input binding, std::vector offers better performance for POD types, since there is no internal caching and copying).

    And then, internal extraction can now do bulk as well (for the drivers willing to cooperate).

    We need testers, especially on non-windows platforms! Please check out the trunk and give it a try.

    Comment by alex on December 21, 2007, 04:34

  4. If you need a workaround for non-implementation of bulk for insert, then I suggest comapring multiple insert statements with:

    insert foo select 1, ‘a’
    union select 2, ‘b’
    union select 3, ‘c’

    It works quite well where I’ve tried it – especially it seems on SQLServer and Sybase where triggers run per-statement, not per-row.

    James

    Comment by James Mansion on January 2, 2008, 16:24

  5. James,

    Thanks for the tip. There is a difference between bulk and multiple statements, however. Bulk insert operates with supplied container(s) as parameter array(s) passed to the driver. From there, it entirely depends what driver does with it – some turn it into multiple statements, some (e.g. Oracle) do one statement and pass the array(s) to the db.

    Since we currently treat SQL code as a “black box”, your suggestion is only applicable at the user level where user can choose whether to pass SQL string as multiple statements separated by ‘;’, or only one statement in the way you suggest.

    Alex

    Comment by alex on January 2, 2008, 17:34

RSS RSS feed for comments on this post. TrackBack URI

Leave a comment