Class::DBI is
a very nice database abstraction layer for Perl. It allows you to define
your tables and columns and it magically provides you with classes with
accessors/mutators for those columns. With something like
Class::DBI::Pg, you don't even need to tell it your columns; it
asks the database on startup. It's all very cool mojo and massively
decreases the development time of anything database-related in Perl.
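To make the later code concrete, a table class built on Class::DBI::Pg can be as small as this (a minimal sketch: the Foo package and foos table match the benchmark below, and the connection details are placeholders):

package Foo;
use strict;
use warnings;
use base 'Class::DBI::Pg';

# Placeholder connection details.
__PACKAGE__->set_db('Main', 'dbi:Pg:dbname=mydb;host=localhost', 'user', 'password');

# Ask PostgreSQL for the columns and primary key of the foos table.
__PACKAGE__->set_up_table('foos');

1;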
Unfortunately, as far as I can tell, it has a massive performance
problem in its design. One of the features of Class::DBI is lazy
population of data. It won’t fetch data from the database until you try
to use one of the accessors. This isn’t normally a problem, except with
retrieve_all(). Basically, this function returns a list of
objects, one for every row in your table. Unfortunately, due to the lazy
loading of data, retrieve_all() calls SELECT id FROM
table; and then every time you use an object it calls SELECT *
FROM table WHERE id = n;. For a small table, this isn’t too bad, but
for a large table, it’s a killer.
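In other words, a loop like this (a sketch using the table from the benchmark below) issues one query per row on top of the initial id fetch:

# One query up front: SELECT id FROM foos
for my $foo (Foo->retrieve_all()) {
    # One query per row: SELECT * FROM foos WHERE id = ?
    my $bar = $foo->bar;
}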
I did a little benchmark today to see just how much slower it is than
plain DBI. I wrote two functions which iterate over a table, assigning
one column's value to a variable (forcing Class::DBI to fetch the data). The
table in question contains 635 rows. The code I used was:
use strict;
use warnings;
use Benchmark qw(:all);
use Foo;
use DBI;

# Connection details (placeholders; not shown in the original post).
my ($db, $host, $user, $passwd) = ('mydb', 'localhost', 'user', 'password');

sub class_dbi {
    for my $foo (Foo->retrieve_all()) {
        my $bar = $foo->bar;
    }
}

sub dbi {
    my $dbh = DBI->connect("dbi:Pg:dbname=$db;host=$host", $user, $passwd);
    my $sth = $dbh->prepare("SELECT * FROM foos;");
    $sth->execute();
    while (my $row = $sth->fetchrow_hashref()) {
        my $bar = $row->{bar};
    }
}

cmpthese(100, {
    'Class::DBI' => 'class_dbi();',
    'DBI'        => 'dbi();',
});
The results:
brick david% perl benchmark.pl
            s/iter Class::DBI        DBI
Class::DBI    10.3         --       -97%
DBI          0.351      2845%         --
Class::DBI is more than 28 times slower than using DBI directly. I’m
hoping that someone will now tell me “Oh you just do blah”, otherwise
I’m going to have to rewrite some of my code. One thing to learn from
this is that a reduction in development time can often cost you in
other areas, and that cost is often runtime performance.
Update: It appears that the bug is that
Class::DBI::Pg doesn't set the Essential list of columns, so
Class::DBI falls back to fetching just the primary column. You can fix this by adding the
following to your database modules:
__PACKAGE__->columns(Essential => __PACKAGE__->columns);
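In context, the line goes after the columns have been discovered, e.g. (a sketch reusing the Foo class from above):

__PACKAGE__->set_up_table('foos');

# Copy the discovered columns into the Essential group, so retrieve_all()
# pulls them in one query instead of one extra query per row.
__PACKAGE__->columns(Essential => __PACKAGE__->columns);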
Remember you’ll need to do that for each of your modules; you won’t
be able to do it in your superclass, as you won’t have discovered your
columns yet. This has increased performance, but not massively. New
timings (with the addition of using Class::DBI through an iterator):
               s/iter Class::DBI it Class::DBI   DBI
Class::DBI it    6.35            --        -2%  -94%
Class::DBI       6.23            2%         --  -94%
DBI             0.350         1714%      1680%    --
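For reference, the iterator variant in these timings simply calls retrieve_all() in scalar context, which returns a Class::DBI::Iterator instead of building the whole list up front (a sketch):

sub class_dbi_it {
    my $it = Foo->retrieve_all();    # scalar context: an iterator
    while (my $foo = $it->next) {
        my $bar = $foo->bar;
    }
}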
Update 2: It appears that further speed gains can be
made by not using Class::DBI::Plugin::DateTime::Pg to convert the three
timestamp columns in my table into DateTime objects:
               s/iter Class::DBI it Class::DBI   DBI
Class::DBI it    1.26            --       -11%  -72%
Class::DBI       1.12           12%         --  -69%
DBI             0.350          260%       220%    --
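For anyone wondering where that cost comes from: such plugins hook into Class::DBI's has_a() inflation, so every timestamp column of every row gets parsed into a DateTime object as soon as it is read. A rough sketch of the underlying pattern (the created_at column name is made up; DateTime::Format::Pg does the parsing):

use DateTime::Format::Pg;

# Inflate the raw Pg timestamp into a DateTime object on read,
# and deflate it back to a string on write.
__PACKAGE__->has_a(
    created_at => 'DateTime',
    inflate    => sub { DateTime::Format::Pg->parse_datetime(shift) },
    deflate    => sub { DateTime::Format::Pg->format_datetime(shift) },
);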
One commenter said:
If you read the output using arrays (using list context on the left side) it should be much faster, since it then reads the whole objects at once. What you get now is an iterator. If you have more than 635 rows, memory could be an issue, though.
my @foo = Foo->retrieve_all();
Another commenter said:
Here’s a Perl ORM performance comparison across a broad range of tasks (e.g., inserts, updates, simple selects, joins, both with and without inflating column values into DateTime objects, and so on):
http://code.google.com/p/rose/wiki/RDBOBenchmarks
Here it is with DBI added to the mix:
http://code.google.com/p/rose/wiki/RDBOBenchmarksWithDBI