Monday, October 12, 2009

Use Digest::MD5 - it's easy

In my previous entry, I presented and purposefully ignored a rather non-portable way of getting an MD5 sum from a file:
$md5 = `md5sum $filename.new | awk '{ print $1 }'`;
$md5 =~ s/\n$//;
There are both stupider and better ways of doing this with a system call, but all of them ignore the real problem: the command is not always called md5sum (on the BSDs and Mac OS X, for instance, it is md5).

Digest::MD5 comes to the rescue!

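use autodie; # Hee-hee
use Digest::MD5;
my $digester = Digest::MD5->new;
open(FH, '<', "$filename.new"); # three-argument open: the filename cannot be misread as a mode
binmode(FH); # hash the raw bytes, with no newline translation
$digester->addfile(*FH);
my $md5 = $digester->hexdigest;
close(FH);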
Okay, that looks slightly over-complicated; one might argue that Digest::MD5 should hide the filehandle fiddling from the user. The value added comes when you already have a chunk of data in a variable: then you just say use Digest::MD5 qw(md5_hex); and call md5_hex($data).
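Here is a minimal sketch of that in-memory case. The contents of $data are made up for illustration, and the sketch assumes the data is a plain byte string (md5_hex croaks on wide characters, so encode such strings first). It also shows that the object interface gives the same answer when the data arrives in pieces.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Illustrative data; in real code this is whatever you already hold in memory.
my $data = "hello world";

# Functional interface: one call in, one hex string out.
my $md5 = md5_hex($data);

# Object interface: feed the same bytes in two pieces.
my $digester = Digest::MD5->new;
$digester->add("hello ");
$digester->add("world");
my $md5_oo = $digester->hexdigest;

print "$md5\n";
print "Both interfaces agree\n" if $md5 eq $md5_oo;
```

Feeding data in pieces is the reason the object interface exists at all: it lets you hash something larger than you want to hold in memory at once.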

Oh, and I snuck in something else again, didn't I?

3 comments:

Michael said...

If you are getting the MD5 hash for a scalar in memory then yes, Digest::MD5 is the right answer. But if it's a file on disk that you need to hash, then look at Digest::MD5::File which hides the file manipulation details.

Jakub Narebski said...

Why do you use a glob filehandle, i.e. open(FH,...), instead of a lexical filehandle, i.e. open(my $fh,...)?

bakkushan said...

Michael: From what I can see, Digest::MD5::File does not really hide the details very well; it merely handles them differently. However, it does seem to simplify matters slightly when fiddling with multiple files.

Jakub: I use glob filehandles because I'm used to them. They also make it easy to see that I'm actually dealing with a filehandle in subsequent code.

In more complex code, I might use a hash of filehandles, though.