The Incredible Bulk

OK, forget about the step I mentioned before – it’s gone and we have true bulk instead. For the drivers/connectors that support it properly, that is. Namely, once one gets into implementing the more advanced features, some ODBC drivers simply do not live up to expectations. I was not able to find a PostgreSQL driver that will do multiple recordsets. When it comes to bulk operations (a.k.a. “array binding”), results vary. Oracle is the champion when you bind integers (10,000 ints, ~500x speedup). Put some strings into the mix and things get slower, but still significantly better than normal mode. BLOBs? You may end up with no difference.

At any rate, I have decided to put bulk in and keep it, because it allows us to scale. Speedups on the order of hundreds are nothing to sneeze at, even if they only apply to limited scenarios. I’ve seen 2-6x speedups with MS SQL Server (with strings and BLOBs) and similar with PostgreSQL (although I was not able to do bulk with BLOBs for PostgreSQL). Based on these results, I think it will be useful in many cases. For the time being, bulk is only allowed in a single direction (in for parameters, out for recordsets) and only for std::vector.

Here’s what the insert code looks like:

std::vector<int> ints(100, 1);
session << "INSERT INTO Test VALUES (?)", use(ints, bulk), now;

And for select, with the bulk size given either at the statement level or at the extraction level:

std::vector<int> ints;
session << "SELECT * FROM Test", bulk(100), into(ints), now;
session << "SELECT * FROM Test", into(ints, bulk(100)), now;

As usual, the manual is up to date.

I’d say this makes Data feature-complete for the time being. What remains is polishing for the next big release. Early adopters are strongly encouraged to check it out and play with it. Feedback is essential to make good things better.

For 1.3.1 users, I have patched a few bugs. Some are rather serious, so please update to the latest code. The fixes are in SVN (1.3.2 branch).