Friday, October 17, 2008

Perl Collections

I read this comment on Yegge's blog from February 2008:

For example, if you see this in Perl:

%x = map { $_ => 1 } @words;
@words = keys %x;

without a comment, you should fire, or at the very least yell at, whoever wrote it. (There are efficiency reasons for not writing that as well, but mainly it's a clarity issue.)

I've been mulling this over for the last couple weeks, and I've decided The Comment deserves comment. I would actually respond to this in the comments on that blog, but they've been closed a long time, and so I'm putting my comments here.

I doubt anyone but me cares about this: I'm just venting here.

Let's begin with the most obvious counterpoint: if you have an employee you're paying to write Perl, and he or she can't understand

%x = map { $_ => 1 } @words;
@words = keys %x;

you're getting ripped off. No-one getting paid to read or write Perl can be excused for not immediately seeing what that code does. That is not advanced code, the variables are clearly named, and the idiom is well-known. If you can't grok that code, you need to bone up on your Perl. If Perl is just another tool in your toolkit as a professional, you might not immediately see what it does... but in that case, you ought to expect to be opening the Camel Book fairly frequently anyhow. In that case you don't expect to immediately understand what every line of Perl does as soon as you see it; but you do expect to be looking things up.

I'm not claiming that's the best way to get a unique list (List::MoreUtils::uniq is much better, for example); but it's a very short solution that will work consistently. It might not be the fastest solution, but it's reasonable, correct, and terse. And frankly, it's also fairly clear.
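
For anyone who wants to run the idiom, here is a minimal sketch. The word list is made up for illustration, and the `%seen` variant is a commonly used alternative that is swapped in here for comparison; it is not something the quoted comment or this post relies on. The practical difference is ordering: `keys` returns the words in no particular order, while the `%seen` version keeps the first occurrence of each word in place.

```perl
use strict;
use warnings;

# Made-up sample data for this sketch.
my @words = qw(apple pear apple fig pear apple);

# The idiom under discussion: hash keys are unique, so duplicates collapse.
# keys() returns them in no guaranteed order, so sort for stable output.
my %x = map { $_ => 1 } @words;
my @via_hash = sort keys %x;

# Alternative (an assumption of this sketch): keep first-seen order.
# $seen{$_}++ is false the first time a word appears and true afterwards.
my %seen;
my @in_order = grep { !$seen{$_}++ } @words;

print "@via_hash\n";   # apple fig pear
print "@in_order\n";   # apple pear fig
```

If a module is acceptable, List::MoreUtils::uniq does the same job as the `%seen` line and says what it means.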

Part of the problem goes back to what a former boss I highly respect told me once: people write bad Perl because they can. Perl is a simple language to learn, and it's very forgiving. One result of that is, there is a decided lack of "raising the bar" on Perl newbies. And frankly, one of the great things about Perl is the community that encourages newbies, rather than harassing them. So maybe ugly Perl is an acceptable trade-off. Maybe the benefits of a DWIM language and a welcoming community far outweigh any evils stemming from naive and/or verbose code.

But maybe there is a middle path, where newbies can be encouraged to write better Perl without being harangued and beaten. Maybe it's possible to gently and casually get the message across that baby talk is great for babies, but inappropriate for adolescents. Naivety and inexperience are nothing to be ashamed of, but they're not something to be proud of either. I think that's what my boss was trying to tell me.

And incidentally, it was that boss that really forced me to learn Perl properly. I knew Perl before, but my Perl really sucked. He forced me to use it, and raised the bar on me whenever I started to make progress. I'm no Perl expert, but whatever small advances I've made started when he put the pressure on me to write better Perl, instead of stuff that actually ran.

I think there is a related factor here too: Perl is a huge language. It was designed that way, and there are nooks and crannies that can go unexplored for years. I'm no Perl guru, but I know a lot more Perl now than I did two years ago, when I had already written a lot of Perl professionally. It seems I'm constantly saying "Well, I thought I knew Perl before, but now I realize I didn't". This is part of the famous TIMTOWTDI (There Is More Than One Way To Do It): having many ways to do any given task implies a fairly large and flexible vocabulary. It also implies a learning curve that is very long, even if it might well be very steep at times. Perl is a language that can surprise you even if you've been learning it for years.

I've found my Perl has gotten simpler and less like line noise over time. I've been known to write comments here and there, and my variable names have gotten more descriptive; but that's not what I mean. I find I almost never use && or || anymore, preferring and and or. I frequently use a die unless... idiom now: I find having a die at the start of a line emphasizes this is a potential exit point much more strongly. I also prefer to use do_something() if... instead of if(...){do_something()}, as it just seems to flow better. And I avoid if not, using unless.
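
A small sketch of what those habits can look like together. The `parse_port` helper and its error messages are hypothetical, invented only to have something to validate; the point is the shape of the statements, not the function.

```perl
use strict;
use warnings;

# Hypothetical helper, made up for this sketch.
sub parse_port {
    my ($text) = @_;

    # "die unless ..." puts the exit point at the start of the line.
    die "no port given\n" unless defined $text and length $text;
    die "port must be numeric: $text\n" unless $text =~ /^\d+$/;

    return $text + 0;
}

my $port = parse_port("8080");

# Statement modifier instead of if (...) { ... }
print "listening on $port\n" if $port > 1024;

# Low-precedence "or" reads like a sentence: try this, or else report.
eval { parse_port("http"); 1 } or print "rejected: $@";
```

One caveat worth knowing: `and` and `or` bind more loosely than `&&` and `||`, so they are not drop-in replacements inside assignments or argument lists.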

One place my Perl has decidedly changed is, I now use list operators much more frequently. Maybe that's because I've spent some time writing actual, working code in Lisp, maybe not. But one thing is certain, I use map, grep, split, and join a lot more now. I used to do things like

foreach $line (sort @lines) {
    print $line;
    print "\n";
}

Now I write that as:

print join ("\n", sort @lines), "\n";

I'm not sure that's really an improvement, except I prefer to use fewer lines to do the same amount of work. One line to accomplish a task in a reasonably clear manner seems like a better ROI for my Carpal Tunnel pains than four.

But I'm not trying to argue I'm brilliant, or anything like it. I'm merely using my own Perl as an example: over time, my Perl has changed its form decidedly---for the better, I think. When I started writing Perl, my programs looked like I had messed up my terminal settings: they were full of cryptic symbols and complicated key words. Now it reads more and more like English. I think that's an improvement.

And particularly when it comes to list operators, I've found replacing a lot of my for or foreach blocks with map blocks has had the net effect of making it easier to follow the code flow in my head. It's not always the most readable, but there's a sense where readable is in the eyes of the beholder. I honestly find grep inside map more readable than nested loops. It lets me keep track of things like counting variables and nested scopes a lot more easily.
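
To make the comparison concrete, here is a hedged side-by-side. The `%scores` data and the threshold of 85 are invented for the example; both halves compute the same thing, the number of high scores per person.

```perl
use strict;
use warnings;

# Made-up sample data for this sketch.
my %scores = (
    alice => [ 72, 91, 88 ],
    bob   => [ 95, 60 ],
);

# Nested loops: an explicit counter and two scopes to keep track of.
my %high_loop;
foreach my $name (keys %scores) {
    my $count = 0;
    foreach my $score (@{ $scores{$name} }) {
        $count++ if $score >= 85;
    }
    $high_loop{$name} = $count;
}

# grep inside map: grep in scalar context returns the number of matches,
# so the counter disappears.
my %high_map = map {
    my $name = $_;
    $name => scalar grep { $_ >= 85 } @{ $scores{$name} };
} keys %scores;

print "$_: $high_map{$_}\n" for sort keys %high_map;
# alice: 2
# bob: 1
```

Note the `my $name = $_;` line: the inner `grep` rebinds `$_`, so the outer value has to be saved first.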

So I'm looking at the "bad Perl" example in The Comment, and I'm wondering exactly what the problem with it is. It seems there is a fear of list operations, which seems to be a particular instance of a fear of not explicitly shuffling variables.

Before I read The Comment, a friend had asked me how to do a word count in Perl. I have to admit my solution looked a lot like "bad Perl":

my %count = map { my $word = $_;
$word => scalar grep { $_ eq $word } @words } @words;

It's not terribly efficient to call grep inside map, I understand that. But you'll end up with a nested loop no matter what you do: mine is only two lines long, and seems reasonably clear. Isn't that the point? I assume if we're working in Perl, then performance is not the main consideration: C or Java or Lisp can all knock out a word count a lot faster. We generally choose Perl because it's a great language for thinking: it's a great language for ripping apart a problem quickly and simply. That's frankly why I came back to Perl from Java: I wanted to get stuff done.
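
Here is the same two-liner wrapped in just enough scaffolding to run. The sample sentence is made up; everything else is the snippet as written above. The comment marks where the cost comes from: `grep` walks all of `@words` once for every element of `@words`.

```perl
use strict;
use warnings;

# Made-up sample data for this sketch.
my @words = qw(the cat saw the dog chase the cat);

my %count = map {
    my $word = $_;
    # Rescans the whole list for each word; repeated words are simply
    # recounted and overwrite the same hash key with the same value.
    $word => scalar grep { $_ eq $word } @words;
} @words;

print "$_: $count{$_}\n" for sort keys %count;
# cat: 2
# chase: 1
# dog: 1
# saw: 1
# the: 3
```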

So if we want a language to think in, why wouldn't we use as much of it as we can? Why wouldn't we pick up little idioms and bits of vocabulary that make thinking easier? Why wouldn't we sum up several lines of foreach into a single, short map? That's why we have things like synonyms, right? That's why we develop vocabularies in natural languages to encapsulate ideas that pop up. We use words in modern Physics now that Newton didn't use: that's because we've gotten new ideas, and they've required new words. Sure, Newton's English was Turing-complete (so to speak), but it's a lot easier to model modern Physics with our extensions to it, and our newer vocabulary is generally considered a good thing.

So I've decided that not only is The Comment inaccurate, it's actually a step back. Good Perl, like good English, is concise and accurate. Gratuitously using arcane vocabulary in Perl is bad, just like it is in English. But deliberately limiting your vocabulary at the cost of brevity and clarity in spoken languages is laughable (try it: carry on a conversation without using nouns introduced in the last century, and see how ridiculous you sound); and I've concluded it's the same in Perl. I've become convinced the Right Thing to do is to encourage wider vocabulary, both in my own code and in others'. And if that forces the poor sap who maintains my code to look a few things up in the Camel Book, well, he can thank me later, when his Perl is a little more fluent and his vocabulary a little broader.


Anonymous said...

$count{$_}++ for @words;

clumsy ox said...

Good call! Good single-line solution! Definitely better than my map + grep.

But I still argue that if a professional programmer doesn't grok the original "bad perl", the problem isn't the author of that code.

Matt Y. said...

If I was going to fire anyone for anything involving this code it would be for using a name like %x without a damned good reason.

This is one of the aspects of Perl culture that has always frustrated me and is one of several reasons I don't code in Perl much anymore. There is this idea that if you can't read it, then you're just not awesome enough to work with it.

You never know who is going to need to touch your code in the future, and they may not be as steeped in Perl as you are. And there is nothing wrong with that. On the other hand, there is nothing wrong with using Perl idioms and making the most of the language, as long as you give people something to grab on to.

Idioms are hard. And they are hard to look up. That one in particular I can't find in my (very old) Camel book. So give a hint.

At the very least, put the (unfortunately necessary in Perl) intermediate variable to semantic use to clarify your intent:

%unique_words = map { $_ => 1 } @words;
@words = keys %unique_words;

Or just use the library function (like you mentioned) to begin with. :)

clumsy ox said...

I can buy the variable name is a bad choice. And frankly, as much as I've enjoyed maintaining code with comments like:
# I can't believe this actually works
I have to admit they're not terribly helpful.

On the other hand, I refuse to buy into the school of thought that it's the author's fault if someone has trouble reading code. I've learned a lot by maintaining other people's code: and when someone else's code forces me to open the Camel Book, then they've done me a favour in pushing me to expand.

What I found frustrating about The Comment to start with was the assumption that writing slightly obtuse (but valid and perfectly clear) code is a firing offense.

Dave Hingsburger said...

Um, what?

(that's code by the way)

Anonymous said...

I will give you one credit that perl is very forgiving and writing bad code seems to be commonplace. But a firing offense? Welcome to real life, where you have to figure out what code is doing. Part of programming/coding is reading and understanding what code is doing. Most people use shitty variable names (which is your point obviously).

Anonymous said...

Why haven't I been using grep? arrghh.