How Can I Explain This?: Perl 5

Showing posts with label Perl 5. Show all posts

Thursday, March 4, 2010

Coding styles that make me twitch, part 8

One of the worst programming habits I know of, is assuming that everything will work out; you just perform the command/function call, and it has to work. In other words: do not bother checking for error conditions or propagating them, because they are not going to happen anyway. right?

Wrong.

But the code works. Mostly. Except when an error condition occurs. And it will, eventually.

In some instances, such code is a glorified shell script. I have ranted about that before. But catching error conditions properly can be tricker than with in-Perl functions and modules. Especially if you have no control over what the external program does, but the original programmer did.

Even with in-Perl functions and modules, you might run into, ehrm, interesting usage.

Usually, the root of the problem is that the original programmer did not foresee that some his code could grow into something else.

While it may have been a good idea to create an SQL query wrapper to hide parts of what DBI does for auto-committing statements:

sub do_sql {
    my $qry = shift;
    my $ignore = shift;
    if (!$dh) {
        &slogin();
        return if (!$dh);
    }
    my $sh = $dh->prepare($qry);
    if ($sh && !$sh->execute) {
        if ($sh->errstr =~ /(MySQL server has\
 gone away)|(Lost connection to MySQL server)/) {
            &log("Lost MySQL, reconnecting.");
            &slogin();
            return if (!$dh);
            $sh = $dh->prepare($sqlstmt);
            undef $sh if ($sh && !$sh->execute);
        } else {
            undef $sh;
        }
    }
    &log("SQL failed: '$qry'.") if (!$sh && !$ignore);
    return $sh;
}

that is no guarantee that the implementation will be future-proof, when the subroutine is used like this:

my $sh=&do_sql("SELECT * FROM tab1 WHERE\
 x<>'$cgi->{data}' AND y<>42");
if (my $res=$sh->fetchrow_hashref) {
    $cgi->{ref}=$res->{ref};
}
&do_sql("UPDATE tab1 SET x='$cgi->{newdata}',\
 y=23, z='$cgi->{ref}'");
&do_sql("UPDATE tab2 SET tab1_changed='yes'");
&log("Updated tab1, ready for externals");
# Process the changes made above:
system("/usr/local/bin/changestuff.pl");
&log("Finished processing");

Imagine now that changestuff.pl also performs changes in the database tables mentioned above, and that the code above is called in a cron job every minute or so.

Here is an attempt at listing the worst parts:

do_sql only pretends to do proper error checking, it mostly does not do anything useful about the error situations.
The query result seems of little consequence.
There is an obvious need for bound variables in the prepared statement, but do_sql does not support that. So the code pretends the problem does not exist.
When do_sql is used for updates, there is no check whether the returned statement handle ($sh) is empty or not, there is no way of knowing whether we are clobbering the database.
Why do we not care whether the external commands succeeds or not?
SQL transactions, anyone?
Yeah, other parts of the style sucks, too.

Now imagine the example above multiplied to thousands of lines of inter-dependant code.

I am happy to say that I do not see things like this too often.

However, cleaning up code like this is a PITA, and it is often easier just to close your eyes, add your own code, and leave well enough alone.

Thursday, February 25, 2010

Coding styles that make me twitch, part 7

Parentheses and precedence

This week, I will strive to be better at explaining the problem scope, while heroically and miraculously maintaining brevity.

I used to be annoyed at seeing extraneous parentheses with the ternary conditional operator pair ? :, but then I discovered that the PHP language designers decided to break precedence (see example #3). So I have stopped twitching as much as I used to when seeing parentheses galore in that context.

In Perl Best Practices, Damian Conway notes that even though the low-precedence logic operators and, not and or may seem nicer, they can confuse the reader regarding the intended meaning of the code. But he does not — as far as I can recall or find — address how these relates to parentheses and precedence (I suppose that should be "parentheses && precedence").

I start twitching when someone insists on using parentheses like this (imagine that the example was more convoluted):

if (expr1 ||
    (expr2 && expr3) ||
    expr4
    ...)

That does not help understanding. It confuses me, as the read, regarding your intentions. Is there a piece of code missing, perhaps a little bit of || expr2b or || expr3b?

Sometimes, I even see misguidedly mixed-in letter-literal operators:

if (expr1 or
    (expr2 && expr3) or
    (!expr4)
    ...)

I much prefer seeing

if (expr1 ||
    expr2 && expr3 ||
    expr4
    ...)

if (expr1
    || expr2 && expr3
    || expr4
    ...)

depending on what floats your boat in the most stylish way imaginable.

I would have thought that the logical and/or/not part of operator precedence would be easy to understand. It is essentially the same in most programming languages: Ruby, Python, Perl, Java, C, ... and not even PHP managed to mess this one up.

The chance that anyone is going to be confused by your code because of these parentheses not being there is far, far smaller than the chance that they are going to be confused by their presence.

I have heard the defense "but the code is so non-obvious that I had to add them!" Well, make your code obvious instead.

Thursday, February 18, 2010

Coding styles that make me twitch, part 6

Edit: There are, apparently, strong feelings that I should not post my personal preferences. Reader discretion advised: this post expresses my personal opinions regarding a limited use case for file handling and filehandles, and must not be read as general advise on how to deal with filehandles in Perl, and so on.

Today's theme is: unnecessary variable clutter.

my $fh = new Filehandle("/usr/bin/somecmd someargs |");
while (my $l=<$fh>) {
    if ($l=~m/foo bar zot/) {
        # lots of code that does not depend on
        # a variable file handle, nor creative
        # usage of $l where $_ could not be used
        # implicitly just as easily
    }
}

Sure, there are situations where it makes sense to assign filehandles to variables, or using $l instead of $_, but the above example is not one of them. I found no particular reason why the original programmer had used new Filehandle, either.

Perl 5 has, on purpose, made this easier for us than in the above example:

open my $FH, '-|', '/usr/bin/somecmd someargs'
    or croak "OMG\n";
while (<$FH>) {
    if (/foo bar zot/) {
        # lots of code
    }
}

Thursday, February 11, 2010

Missing feature

A few hours ago, I suddenly had a bright(?) idea, or desire if you will:

Proper (Unicode) exponents in Perl.

That is, I want to be able to write 2², 4¹³⁷, 3^-9, etc. and have Perl understand them.

For Perl 5, I suspect someone would use a source filter to implement it.

For Perl 6, PerlJam++ suggested introducing each of the exponents as postfix operators, using this example for squaring:

our &postfix:<²> := &infix:<**>.assuming(b => 2);

But then a negative exponent would complicate things a bit.

It's a thought, anyway, and not one that I'd want to distract more pressing implementation concerns.

And ifwhen someone decides that this is a good idea to have in the language core, I'll start nagging about Knuth's up-arrow notation. Not that I'd want anyone attempt calculating 4↑↑↑↑4.

Thursday, February 4, 2010

Spooky

This is just a small anecdote.

Tonight, I finished a small Perl 5 script that I've been wanting to complete for a while, but where I was a bit nervous that I'd fsck it up right and good.

It was a script designed to handle two tab-separated text sources; one a list of tournament IDs and tournament names, the other a list of player results, one line per result with the player ID, tournament name, position and score achieved.

I achieved this by creating a hash of hashes for each file, referencing the first while parsing the other, and bravely inserting the data into a single database table.

I tested my code piece for piece while building it, which is sensible in itself, but what spooked me was this:

There was not a single bug. The script did what it was supposed to do, all along.

That's not supposed to happen.

I need a drink.

Thursday, January 14, 2010

feather.perl6.nl - a Sysadminish Tale

<@Juerd> frettled: Blog about the mess you found when
         you first logged in on feather yesterday :)

Sure.

This will, incidentally, also explain why Trac is kindof unavailable now.

feather.perl6.nl is a Xen guest (a virtual machine, hereafter "VM") that's hosting several important services for the Perl 6 community. There's SVN web access, a Trac installation, and a bunch of other stuff I honestly don't know the half of.

Recently, the VM started running out of memory too often for comfort. What was going on? Juerd asked for help in tracking down the problem, as he didn't have the time to do so himself. And needing some distraction from work -- something to help me procrastinate -- I volunteered.

By now, you're probably banging your head on your keyboard in sympathy with me for saying something that may have been slightly less than brilliant. You know the feeling; Matt Trout reaches his right hand towards you, the world is suddenly in slow-motion, you see his hand closing in on you, his grin widening, and a voice saying "thhhhhaaaaannnnnkssss ffffooooorrr vooooluuunnnteeeerrriiiinnnnng", and you're basically up that creek with all the mud and dirt in it.

After handing over an SSH public key and getting sudo access (yeah, yeah, I know), I had a look anyway.

First, I went on a brief but wild goose chase, finding some error messages regarding ConsoleKit which appeared to be more frequent just before the server went out of memory, checked the Debian version (an unholy mix of Debian unstable and Debian experimental with lots of package updates pending someone's attention), and generally tried to get a feel of how the system was configured.

We already knew that Apache somehow might be responsible for gobbling up available memory, so my first action was to have a look at the last 100,000 lines of the Apache access log, using a simplistic log analysis script.

But which log? There were three Apache log directories to choose from. I (correctly) guessed that the one called simply /var/log/apache2 might be the interesting one, the others seemed to be legacy directories which should have been removed ages ago.

According to the script, there were 0 accesses in the last 100,000 lines.

Knowing the script, that was not so strange, because it makes a few assumptions regarding the log format, using a regexp belonging to the days before named captures and whatnot:

while (<>) {
  if (/^(\S+) (\S+) - - \[[^\]]*\] \"(GET|POST) \S*
       HTTP(|\/1\.[01])\" \d{3} (\d+) \"/) {

The regexp line has been split for the sake of the line width of this blog. There's nothing to be proud of here.

Anyway, I first had to remove the first capture; feather's logs weren't showing the virtualhost as the first column, and access types were most certainly not limited to only GET and POST:

frettled@feather:~$ sudo awk '{print $6}' /var/log/apache2/access.log|
sort -u
"CHECKOUT
"CONNECT
"DELETE
"GET
"HEAD
"MERGE
"MKACTIVITY
"OPTIONS
"POST
"PROPFIND
"PROPPATCH
"PUT
"REPORT

Right.

After straightening that up (and adding %v to the LogFormat specifications in the Apache config for future use), I got the following result:

Use of uninitialized value $size in addition (+)
at /usr/local/sbin/bandwidthips line 39, <> line 1002.
Use of uninitialized value $size in addition (+)
at /usr/local/sbin/bandwidthips line 40, <> line 1002.
Use of uninitialized value $size in addition (+)
at /usr/local/sbin/bandwidthips line 44, <> line 1002.

AAAARRGH! Idiot! Imbecile! Inept half-wit! Yep, I'd forgotten to renumber my captures. See, this is why Perl should be in version 5.10.1 or 6 when fiddling with those bloody annoying regexps.

frettled@feather:~$ sudo tail -100000 /var/log/apache2/access.log|
/usr/local/sbin/hitips|head
193.200.132.146: Bytes = 14329487 (3.44%), Hits = 51503 (51.82%)
66.249.71.2: Bytes = 132116084 (31.73%), Hits = 18111 (18.22%)
66.249.71.37: Bytes = 50846948 (12.21%), Hits = 6236 (6.27%)
93.158.149.31: Bytes = 54880221 (13.18%), Hits = 1894 (1.9%)
71.194.15.106: Bytes = 460200 (0.11%), Hits = 1894 (1.9%)
209.9.237.232: Bytes = 433388 (0.1%), Hits = 1686 (1.69%)
193.200.132.135: Bytes = 1726871 (0.41%), Hits = 1635 (1.64%)
193.200.132.142: Bytes = 429358 (0.1%), Hits = 1609 (1.61%)
208.115.111.246: Bytes = 8461415 (2.03%), Hits = 1238 (1.24%)
67.218.116.133: Bytes = 18945415 (4.55%), Hits = 1126 (1.13%)

So, uhm, around 52% of the hits come from feather3.perl6.nl, and nearly 25% from Google's indexer. Lovely.

Looking at the accesses from feather3, I quickly saw that they mostly had to do with svnweb.

Juerd had already stopped Apache, but someone -- I don't know who -- started it again at 12:00, probably anxious that SVN and such didn't work.

I then followed the running processes using the top command, updating each second (top d1), sorting by memory usage (typing M while top was running), hoping to catch some quickly growing processes.

Nopes. None, zilch, nada. Nothing that appeared horribly wrong. Sure, the apache2 processes used some memory (30-60 MB resident set, 50-100 virtual), but nothing appeared to be out of the ordinary. I changed the update frequency to each third second -- top sometimes uses an inordinate amount of CPU, depending on magic -- and waited. After a while, a couple of apache2 processes were using more CPU and memory than the others, around 60-90 MB resident. And they were growing. And according to lsof, they were active in the svnweb directory (and used a metric shitload of libraries). And after growing, they didn't release memory, they just kept on using it. But it wasn't enough to use up memory, there was still a bunch of free RAM.

So that was perhaps svnweb's fault, then?

Maybe.

But then my time ran out, and I had to drop the ball, leaving the top process running.

Five minutes later, the memory ran out again. It's just as if someone was waiting for me to go idle in order to produce the problem that I was looking for.

Sigh.

svnweb kindof remained the main suspect, until Juerd caught whatever was happening at the right time.

And catching what happens at the right time is bloody important.

Here's what he found, using Apache's server status:

Well, that's not svnweb. That's Trac. And the IP addresses belong to Google.

And that's spam, effectively creating a DoS or DDoS attack on our services as a side effect when search engines try to index the Trac webpages. It probably isn't intentional, but spammers just don't care.

So, what can we do to protect feather from suffering from such attacks in the future?

There's a lot that can be done. It takes effort. It takes time. It takes someone.

Here are a few suggestions on how to improve the robustness of the kind of services feather provides:

Add a captcha to the web form. The disadvantage is that this does not really save processing resources, but it probably should be done anyway.
Add an unnecessary and bogus input field to the web form, e.g. "Phone number". This input field should be hidden with CSS so that web browsers don't display it, and if someone submits anything with data with that field's name, then you can be nearly 100% certain it's spam from someone who's used a web scraper before automatically filling the form. Filter it out.
Change the webserver delegation architecture, so that each Apache process isn't loading tens of megabytes of libraries and keeping them in memory. Off-loading to shorter-living FastCGI daemons or similar solutions, or even sacrificing program startup speed by using CGI+suexec, etc., may be decent starting points.
Consider using a front-end proxy like Varnish to gloss over underlying nastiness.
Start with a new VM and migrate services to that one, gradually.
Document configuration choices and what each web service does/is there for, so that the next sysadmin coming along can make educated guesses quicker. :)

These tasks can rather easily be split into manageable one-person projects.

Does this sound interesting to you, or did I lose you at the third line of this blog entry?

Pop in on #perl6 on the freenode IRC network and say so.

Thursday, December 31, 2009

2009 In Perl

Repeat after me: I will not pretend to be an analyst or doomsayer, even though the end (of 2009) is nigh.

In 2009, Perl grew up a bit more, both as a language and as a community.

Language Development

Perl 5.10.1 came with a pony to those of us who fear the .0 releases.

The Perl 5.11 development tree got started, and it looks like it is rolling on rails. At this rate, we will see 5.12.0 quicker than you can say antidisestablishmentarianism.

... Perl 6 has made progress both on the specification side and in implementations -- yep, that is plural. It is sometimes confusing when naming changes under your feet, but it is acceptable while the spec is still settling.

Community

In 2009, I think I saw more openness regarding the internal conflicts in the Perl community as a whole; there were abundant admissions that we were not communicating nearly as well as we should, that there was at least a small amount of internal bickering over the present and future state of the onion -- onions, I must inject, tend to come in many shapes and flavours, and are not always the same inside -- really, which way we are going, are we having a conflict or not (yes we are -- no we are not -- huh, are we talking? -- pass the chips), and get off my lawn before I shoot or hug you.

In brief, it looks to me like 2009 was the year when the community showed renews signs of self-awareness.

But much more happened. We got a closer focus on Perl visibility, from my POV mainly owing to Matt S. Trout's lightning talk challenge from NPW 2009, plus a whole range of people working on other PR aspects for ourselves. And mst still keeps his hair colour. Wow.

Other Stuff

I made new friends, I learned a lot, I even got to help out a bit, and I hope that this will continue in 2010.

I hope you will too.

Happy new year!

Wednesday, December 9, 2009

GCD - A Small Language Enthuser

fun gcd (x:int,y:int) : int =
    case x of 0 => y
  | _ => if x < 0 then gcd(y,0-x) else
         if y < 0 then gcd(0-y,x) else
         if y > x then gcd(y-x,x) else gcd(x-y,y);

"But that's not Perl!"

Yeah, yeah, I hear you.

I'll rectify that minor detail in a bit.

But first, an anecdote.

Back in the late nineteennineties, I was studying computer science, and one of the classes was about program specification and verification.

Several of the students already had a background with several programming languages, some were functional, some were imperative, and other languages were a bit confused about what they really were.

When studying program specification and verification, you either become rather obsessed with program correctness -- and hopefully elegance -- or you fail spectactularly.

There are several ways to muster enthusiasm when dealing with such studies; they can be rather, ehrm, theoretical.

I therefore flitted about, flirting with various programming languages, comparing them with the eagerness that young idealists do.

For some reason, I found Euclid's GCD algorithm to be particularly fascinating, for reasons unknown to men to this day.

The Perl version I saw was rather awful, and technically incorrect:

sub gcd {
    if (!$_[0]) {
        return $_[1];
    }
    if ($_[1] > $_[0]) {
        return gcd ($_[1]-$_[0],$_[0]);
    }
    return gcd ($_[0]-$_[1],$_[1]);
}

Yikes. I mean, eep. And Perl does have a modulo operator.

sub gcd {
    my ($x, $y) = @_;
    $y ? gcd ($y, $x % $y) : abs $x;
}

I won't claim that the above code is the epitome of elegance, but it solves the problem in a general and easily read way (I admit a prejudice against $_[N]), while retaining correctness.

This is, BTW, one place where some golfers miss the boat; the GCD cannot be a negative integer. That's why the ML code at the top is so verbose.

Small challenges like these kept me going, and it can be an inspiring way to learn details in a new language. So, what would it look like in Perl 6?

sub gcd (Int $x, Int $y) {
    $y ?? gcd($y, $x % $y) !! $x.abs;
}

What's your favourite algorithm for playtesting languages?

Tuesday, December 1, 2009

Oslo.pm Past and Future

In case the title wasn't a give-away: this is a non-technical blog entry.

I became an Oslo.pm member by signing up for the mailing list shortly after the Nordic Perl Workshop 2009. That's cheap (well, free!), easy, and therefore newbie-friendly.

Last week, I dropped in at the general assembly and exercised my speaking and voting rights, and got an inside scoop on how this Perl organization works. The board members were, after all, the guys who did a terrific job arranging not only this spring's workshop, but also mostly the same people who held the workshop of 2006, which also went quite well.

From my point of view, Oslo.pm has come from being an anonymous group to a rather solid little volunteer organization. Before 2006, I'd have said "huh?" if someone asked me who might have anything to do with Perl in Norway, afterwards, I knew there was something called Oslo.pm, and so did a few people in Europe and the USA. After NPW 2009, I think it's safe to say that the organization is now known as a stable and capable group of Perl mongers. That's a decent achievement, especially in this age, when it seems like almost nobody (in Norway) is willing to do anything free of charge.

So what did they think about themselves, and what's going to happen in the near future?

True to the Norwegian spirit, they were modest and self-disparaging, but they were very happy that the attendees were apparently happy, even months later.

Salve J. Nilsen, the Great Leader of 2009, bowed and said farewell to the post of chairman, and now Marcus Ramberg is at the helm.

The new Oslo.pm board will attempt to increase local activity, and there will probably be some kind of technical talk on the first Wednesday of almost every month in 2010. They aim to increase cooperation with local Perl-using companies, as well as aiming for some cross-language and language agnostic sessions.

First out is tomorrow's Perl 5.10 session at Redpill Linpro, which I'm sure will be technically rewarding for those who show up. I plan to!

Wednesday, November 25, 2009

The morality of helping

Yesterday, a friend asked me, "are you a Perl expert?"

I answered in the only way possible: "eeeeh..."

It turned out that my friend did not ask for help for himself, but for someone else, who had posted a programming class question on a non-programming bulletin board.

The poor fellow was struggling with a question of parsing a two-column input to generate a certain output format, essentially also two-column, except with a slightly different layout.

On Usenet, there was - and maybe still is - a long-standing tradition of not solving people's homework for them. The reasoning behind this is that we do not learn quite as well when people solve our problems for us, as when we struggle with them ourselves.

In some cases, school questions would be met with derision, in other cases with genuinely unhelpful and false answers, and sometimes with helpful clues about how to solve the problem; where to look, tips for using stepping debuggers, which book chapter or manual page would clarify the problem, etc.

Okay, that is fair enough.

The guy had gotten only one answer, from another guy who regretted that he had not touched Perl in ages, and therefore could not help. And I thought that Perl was like learning a particularly catchy, but annoying song: you might think that you have forgotten, but then someone hums or whistles the tune, and WHAM - there it is, stuck in your head again.

So I had a look. Maybe I could provide a hint or two, you never know. I know my way around some of the less scary parts of Perl 5 City, anyway.

This guy had essentially nailed the problem semantically, but he was evidently struggling with his code, it just did not work.

I immediately saw a few major concerns:

Some parts were copy-pasted from bad textbook Perl.

Some parts must have come from a poor programming education.

The code was overly complex and verbose.

There was no error checking or debug print-outs.

And it would take me more time to helpfully point out these things than write something that might be a solution myself.

The moral dilemma then was:

Should I help the guy out by tearing his code apart and pointing out all the flaws that made it thoroughly lousy code, thereby provoking a true emo-American melodrama?

Or should I just write an alternate solution, with sound error-checking, simplicity, and debug print-outs?

In this case, I thought the latter was the way to go. I put the code up anonymously somewhere, gave the link to my friend, and perhaps the fellow with the problem now has a better understanding of how simple and beautiful Perl can be.

Yeah, right. :D

Sunday, November 15, 2009

What stops me from using Perl 6, today?

Since I got hooked on the Perl community, and got a taste of Perl 6, I've been wondering about:

what, exactly, is it that I could use Perl 6 for, right now?
why am I not actively using Perl 6 now?

Those are easy questions, but answering is hard, so this may be a long post.

Sure, the points listed below are not exactly Perl 6 specific; I could probably have picked some other programming language, but I somehow feel more comfortable in the way that Perl 6 still is Perl.

What I could use Perl 6 for right now

I think it's fair to say that using Perl 6 today mostly means using Rakudo, and that I wouldn't use it in what we popularly call a "production setting". But many of us programmers, sysadmins, geeks and nerds have perfectly suitable hobby projects, where we won't have clients wringing our necks if there is three minutes of downtime in a month, or if we don't deliver the Speedy Gonzalez of services; we have projects that are neither computing performance constrained or stability constrained.

So that's where I could have started using Perl 6 half a year ago, and of course still can.

I know I can use Perl 6 for e.g. a fairly complex web site using Web.pm and Squerl for a SQLite backend. It will probably work just fine, for a lot of projects.

I know I can use it for lots of one-liner scripts.

I know that in some regards, Perl 6 will outperform classic Perl 5 in terms of programmer time spent. An example is the given-when control structure, which (to me) is semantically superior to if-elsif-elsif-elsif. Programmer time is important to me, I hate coding too much for menial tasks. And I'm sorry to say that Perl 5.10 doesn't do it yet for me, as I cannot rely on its presence, even for hobby projects.

And I know I can use Perl 6 to refresh some of the knowledge about programming language specifics (terminology, technique, methodology, etc.) that I've allowed to rust since I left university in 2001.

Concrete projects, in no particular order

web page for registration of pool billiards tournament results; it's not performance critical, and the users could check and verify the dataset themselves after input

conversion of historical results data in CSV format to a database; one-time job, needs manual verification no matter what programming language I use to do it

contributing to the Temporal.pm specification and implementation in Perl 6

personal web gallery generation; I positively loathe most of the online galleries, because they sooner rather than later are discovered to have HUGE, GAPING security vulnerabilities

blogging tool; I'm not very comfortable with blog software running on servers, either, and whatever blogging I do, it's not actual work

That's quite a lot, isn't it? It ought to have been enough to get me going in a jiffy!

Why I'm not actively using Perl 6 now

This may be a surprise to some: it's not because of a lack of matureness in the tools, a lack of confidence in the language or tools, stability issues, etc. As I tangentially mentioned above, I believe there is no technical hindrance for me to start coding on a hobby project.

I have plenty of hobby projects to choose from. They are also quite manageable in terms of eventual lines of code.

However, there is something holding me back, and that's a certain degree of perfectionism mixed with procrastination fever.

mst mentioned during the NPW hackathon this spring that perfectionism was a barrier against getting started. If you're too obsessed with getting things right at first, at wanting to avoid failure, procrastinating is too easy. Getting slightly intoxicated (yup, drinking alcohol, which of course is only a recourse for adults) is a way of reducing your own perfection anxiety. This is almost exactly what Randall Munroe's xkcd calls the Ballmer Peak:

But I don't sleep too well after drinking alcohol, and I also tend to do hobby projects in my "running breaks" during work hours, in which case alcohol intake may be a very bad idea.

In addition, my time at work is a series of interruptions, which really isn't conductive to sitting down and learning something new and complex.

When I get home from work, I'm usually so fed up with computers that I don't want to have anything to do with them.

So my spare time, whatever is left of it, usually isn't spent on programming. Note that I don't even do these projects in a programming language I already know well; they are on hold regardless of that.

All in all, there's nothing much wrong with Perl 6.

Blaming the immaturity of Rakudo would just be a silly excuse. There's something wrong with my capacity for finding the time to get down and dirty with it, that's what; I'm apparently not currently capable of saying honestly:

This is my Perl 6 hour. This hour, I'm going to do Perl 6 stuff, and this time is sacred.

Tuesday, November 3, 2009

Checking and fixing Unix file modes

Among other tasks in my sysadmin role at a web hosting provider, I work with fallout from poorly designed PHP code - which is ubiquitous - and I use Perl 5 to perform a bunch of semi-automated tasks.

If you just want to see how I utilize the Fcntl module, look further down.

One of the many things that tend to go wrong is the assumption that PHP always runs as mod_php (blatantly disregarding php-cgi, suphp, and other per-user PHP frameworks), and therefore directories (folders) and files used by PHP "must be" prepared with chmod 777 chmod 666. The latter number is a BIG FRIGGIN' HINT, it's the number of the beast.

Whenever documentation tells you to use the number of the beast for your chmod command, that documentation is also telling you to lube up and bend over.

Unfortunately, customers and users don't necessarily see this gotcha; no matter how good the hosting provider's documentation is, they'll naturally trust the software monger's instructions.

That means that the hosting provider ought to have tools available for identifying and fixing such writable directories and files. There are many wrong ways of doing it, one is:chmod -R, since that touches ALL files, recursively, overwriting the ctime stamp. An okay way is to use find, which (in most versions) allow you to fix things up quite neatly (bash/sh compatible syntax, GNU find compatible options, $dir represents the directory to fix recursively, $user is the username whose files you want to change):

find $dir -xdev -user $uname \
\( \( -perm /og=w -exec chmod og-w {} \; \) -o \
   \( -perm /g=w -exec chmod g-w {} \; \) -o \
   \( -perm /o=w -exec chmod o-w {} \; \) \)

Phew, that was quite a mouthful, but it's rather nice in resource usage, and it doesn't cross filesystem boundaries (-xdev).

So why would I want to do this in Perl, you ask?

"Eeerrr. Good question, let me tell you why!"

There are many other problems to look for, which it is sensible to look for while you're at it, just to mention a few:

Backdoors
Hidden IFRAMEs
Malicious JavaScript
Malware URLs
Malware redirects (e.g. in .htaccess)
Outdated software versions
Root exploits
Spamming scripts
Viruses

These things belong in a program, not a teeny weeny Unix one-liner, or even a set of them.

While you're at it, you might want to create a log of what you found, and perhaps which line numbers are relevant for which files, both for pointing out where to consider fixing things, as well as having something to use for debugging your false positives.

The code

Here's how I identify the files with too liberal write permissions, utilizing the Fnctl module. The filename is stored in $_, the user's real UID in $r_uid, and I also store various file information and what kind of file we're looking at.

use Fcntl ':mode';

my %badperms;
my ($dev,$ino,$mode,$nlink,
    $uid,$gid,$rdev,$size) = lstat($_);
my $g_write = $mode & S_IWGRP;
my $o_write = $mode & S_IWOTH;
my ($isfile,$isdir,$islink) = (-f _, -d _, -l _);

my $filetype = $isfile ? "File" : \
   $isdir ? "Directory" : \
   $islink ? "Symlink" : "Other";
my $fn = $_;

# Writable for others                                                    
if (!$islink && ($g_write || $o_write)) {
    $badperms{$fn} =
        sprintf ("%s: [%04o] %s\n", $filetype,
                 S_IMODE($mode), $fn);
}

In a future post, I'll try to get back with how this fits in my bigger picture of vulnerability detection.

As always, suggestions for improvements and questions are very welcome.

Monday, October 19, 2009

Coding styles that make me twitch, part 5

Let us say that we have some code using DBI - old-fashioned, but it still works, kindof.

How would you like to see the following in a 6000 line CGI script you're supposed to debug?


    my $sth=&query("SELECT id FROM invoices WHERE invoiced='N'");
    my $invoiced=$sth->rows;

    my $sth2=&query("SELECT count(*) nusers FROM users \
LEFT JOIN people ... # long SQL statement
    my $row2 = $sth2->fetchrow_hashref;
    my $nusers = $row2->{nusers};

    my $sth3=&query("SELECT count(*) npeople FROM users \
LEFT JOIN people ... # long SQL statement
    my $row3 = $sth3->fetchrow_hashref;
    my $npeople = $row3->{npeople};

    my $sth4=...

And then, after carefully checking scope, you discover that the variables $sth2, $row2, $sth3, $row3, $sth4, $row4 etc. are not used anywhere else within the same scope.

Would you develop a tick?

I did, and the tick didn't lessen in strength when I discovered unsafe variable interpolation as well as a sub query that used $dbh->prepare($qtext) but didn't allow passing of usable parameters, such as, you know, bind variables.

I've started on "refactoring" the real life code this example is based on, but I get depressed from the sheer amount of work in fixing many problems at once. Maybe I should pick up drunken coding to become less perfectionist.

Monday, October 12, 2009

Use Digest::MD5 - it's easy

In my previous entry, I presented and purposefully ignored a rather non-portable way of getting an MD5 sum from a file:

$md5 = `md5sum $filename.new | awk '{ print $1 }'`;
$md5 =~ s/\n$//;

There are stupider and better ways of doing this in the system call, but it completely ignores the problem that the command is not always called md5sum.

Digest::MD5 comes to the rescue!


use autodie; # Hee-hee
use Digest::MD5;
my $digester = Digest::MD5->new;
open(FH,"<$filename.new");
$digester->addfile(*FH);
my $md5 = $digester->hexdigest;

Okay, that looks slightly over-complicated, one might argue that Digest::MD5 should hide the file handle fiddling from the user. The value added comes when you have a chunk of data already in a variable, then you just do use Digest::MD5 qw(md5_hex); and call md5_hex($data).

Oh, and I snuck in something else again, didn't I.

Tuesday, October 6, 2009

Coding styles that make me twitch, part 4

All too often, I see code like this:


$md5 = `md5sum $filename.new | awk '{ print $1 }'`;
$md5 =~ s/\n$//;
if ($md5 ne $origmd5) {
  system("mv $filename $filename.old");
  system("mv $filename.new $filename");
  …
}

Well, you get the idea.

Sure, Perl can pretend to be a glorified shell script, but there are perfectly well-functioning internal functions for these things.

Let's ignore the non-portability of the md5sum command for the time being, and also avoid shaving that awk call down with cut and an enclosing echo -n $(…) - we're here for the Perl, not the shell, right?

The above system() calls can easily be replaced with the following to save yourself a few unnecessary forks:


rename($filename,"$filename.old") and
rename("$filename.new",$filename);

Note the slight refinement of using and to avoid clobbering the file. But the documentation for rename() suggests that we use move() from File::Copy instead:


use File::Copy qw(mv);
mv($filename,"$filename.old") and
mv("$filename.new",$filename);

Similarly, there are nice built-in functions for many other frequent victims of system(), e.g.:


chmod
chgrp + chown
link
mkdir
rmdir
symlink
unlink

And of course there is a bunch of more or less helpful modules on CPAN (in addition to the already mentioned File::Copy), e.g. File::Path.

Monday, August 31, 2009

Some ways that Perl 6 is grand, part 2 of ?

Okay, this is really part 1b of ?, but…

In my earlier post, I used the zip operator to join two lists into a hash.

There was one obvious use of the operator that escaped me at the time, and that was how I sometimes need to create a new hash from the keys of two hashes, or keys and values. And now I think it's starting to look neat:

my %A = { a => 1, b => 2 };
my %B = { z => 9, y => 8 };

my %AB = %A.keys Z %B.keys;
# { "a" => "z", "b" => "y" }

%AB = %A.keys Z %B.values;
# { "a" => 9, "b" => 8 }

However, this is a bit unpredictable, since the hash key order is undefined. So if you expect sorted keys, do that at the same time:

%AB = %A.keys.sort Z %B.keys.sort;
# { "a" => y, "b" => z }

# Sort by B's values - two variants
%AB = %A.keys.sort Z map { %B{$_} }, %B.keys.sort;
%AB = %A.keys.sort Z %B.sort.map( { .value } );
# { "a" => 8, "b" => 9 }

The equivalent Perl 5.10 version would be:

use List::MoreUtils qw/zip/;
my @k = sort(keys(%A));
my @v = map { $B{$_} }, sort(keys(%B));
%AB = zip @k, @v;

I now have a nice-ish argument for upgrading to Perl 5.10.1 on $workplace's servers. :D

masak++ for helping a tired me with the map expression.
Chas. Owens++ for spotting the missing use statement for Perl 5.
Pm++ for another way of sorting by value, just what I was hoping for!
isec++ for spotting a missing sort() for Perl 5.

Sunday, August 23, 2009

Autovivification - a reminder

Most of you already know this by heart, but the odd reader may have forgotten.

Autovivification is what we call the process of automatically creating entries in built-in data structures (Perl 5: array/list and hash), usually at the time we check whether an inner element exists or not.

This can be a royal PITA, if you don't pay attention to the problem. That's why it keeps being mentioned.

Here's a simple example:


my %hash;
my $n;
while (!exists ($hash{x}) && $n < 5) {
    $n++;
    if (!exists ($hash{x}{y}) {
        print "hash{x}{y} does not exist: $n\n";
    }
}

Q: How many times does the above while loop run in Perl 5?
A: Once.

The simple matter of checking the existence of the inner hash resulted in an entry being created for $hash{x}.

That means that tests like these should be written more carefully:


…
    if (defined ($hash{x})) {
        if (!exists ($hash{x}{y}) {
            print "$n: hash{x}{y} does not exist.\n";
        }
    } else {
        print "$n: hash{x} is undefined.\n";
    }
…

This prints $n: hash{x} is undefined. (with an incrementing $n) five times.

Edit 2009-08-24 18:49 UTC: MST commented that there is an autovivification module on CPAN that lets us say no autovivification; - and it's even lexically scoped! That's just $notreallyanexpletive brilliant! Thanks, Matt, and thanks, Vincent!

Oh, BTW, Perl 6 has a useful specification for autovivification, which illuminates the problem further.

Wednesday, July 8, 2009

Coding styles that make me twitch, part 3

Today's twitchiness is sponsored by ... no, wait, I don't have sponsors. Ah, well.

I have an issue with people who insist on using <"> as a string delimiter when the (static) string itself contains that very character. It gets fugly all too soon:

my $html_output = "<a href=\"http://www.example.com/foobar/$pagename.html\" title=\"Oh, a link to $pagename\"> ...\n";

It's so easy to avoid having to quote the <"> while still allowing variable interpolation:

my $html_output = qq(<a href="http://www.example.com/foobar/$pagename.html" title="Oh, a link to $pagename"> ...\n);

Since the example is HTML (and could be e.g. SQL), and it might be multi-line, why not ...

my $html_output = <<EOL;
<a href="http://www.example.com/foobar/$pagename.html"
   title="Oh, a link to $pagename>
EOL

That wasn't so hard? Or fugly?

Wednesday, June 24, 2009

Virus scanning with File::Scan::ClamAV

This is almost ridiculously easy.

Problem: a bunch of user directories need virus scanning and per-user reports

Solution: A Perl script using File::Scan::ClamAV

Prerequisites:

A unixy OS
Perl 5.8 or 5.10
A functional and running clamd, preferably listening to a socket
The module File::Scan::ClamAV (and its dependencies)

The following code could (after some sensible adjustments) run in a loop through all usernames on your system.


use File::Scan::ClamAV;
# (...)
# $dir contains the full path of the user's directory
my $av = new File::Scan::ClamAV (find_all => 1,
                                 port => '/tmp/clamd.socket');
# find_all means that we wish to recurse directories.
# /tmp/clamd.socket is where my clamd has its socket.
# Other clamd configurations may differ.

unless ($av->ping) {
    plogdie "clamd isn't running, aborting virus scan";
} else {
    plog "Performing virus scan for $uname";

    # Save virus information per username ($uname).
    # Note! scan() returns a hash.
    $a_viruses{$uname} = $av->scan($dir);

    if ($a_viruses{$uname}) {
        my @vfiles = sort keys %{$a_viruses{$uname}};
        plog "$uname has ".@vfiles." viruses.";

        # Home assignment: print contents of $a_viruses{$uname}
        # to a file, using the sorted list @vfiles.
    }
}

Friday, May 22, 2009

Querying quotas with Quota.pm

I never claimed that this blog would be an exercise in expertise. ;)

Prerequisites: Unixy OS with quotas enabled, Perl 5.8.8, Quota.pm 1.6.3.

I always like to provide some sort of progress display for my programs. As a sysadmin, there are times when it might be prudent to check users' files without inspecting them by hand, e.g. when checking for parasitical exploits in websites. The number of files and/or amount of disk space used seem like reasonable measurements for keeping track of that progress.

So you use Quota; and get coding, right?

Except that you may not know beforehand whether you're scanning a local file system, or a remote file system, and Quota.pm requires that you have a magical device identifier before asking what the quota is.

The manual says that you should do something like this:


my $r_uid; # User's real UID
my $dev = Quota::getqcarg($directory);

my @quotadata = Quota::query($dev, $r_uid);

Now we've got some nice quota data, in the following order:

Current blocks used, block soft limit, block hard limit, soft block time limit, current inodes used, inode soft limit, inode hard limit, soft inode time limit.

But no, there's a catch! If you run this on the local file server, rather than via NFS/RPC, then Quota::query() will barf, because $dev is erroneous. How did that happen?

Well, Quota::query() doesn't work if the device is local!

So we have to do this after calling Quota::getqcarg():


if ($dev =~ m{^/dev/}) {
    $dev = "127.0.0.1:$directory";
}

The irony is then that there appears to be a need for an RPC listener on the loopback device, at least.

Anyway, I hope this is useful; it helped me to make tings Just Work.