Sunday, August 16, 2009

Hash key sort order - a Perl 6 community sunshine story

The order of hash keys is implementation dependent and arbitrary. Unless %hash is altered in any way, successive calls to .keys, .kv, .pairs, .values, or .iterator will iterate over the elements in the same order.

S09 - Hashes

This is new. It may not actually say much, but it does say what was implicit before, so that there is little room for doubt.

So here's my little sunshine story about how easy it is to clarify a part of the spec.

In my previous post, I used an imaginary case for showing off some features of Perl 6 - some of which also are available in Perl 5.10, as mentioned by Robert 'phaylon' Sedlacek in a comment.

This wasn't the only useful comment, I think there's a bit to be learned by reading those, so please do.

But I digress from the point of this post, which is a question that was raised in another comment to last Sunday's post:

By the way, do you know where the behavior of ~%h is spec'ed? I keep getting the keys back in the same order I put them in and don't know if that is an implementation quirk or a feature.

- Chas. Owens

The spec wasn't very clear about this; S32/Containers - Hash said that certain iterator methods iterate "… the elements of %hash in no apparent order, but the order will be the same between successive calls to these functions, as long as %hash doesn't change."

S09 - Hashes didn't say anything about it at all.

I said I would ask around. Thanks to the excellent community channel #perl6 on Freenode, I got an answer similar to this: no, this is unspecified/undefined behaviour, but feel free to come up with a better way of saying it, and update the synopses.

And how hard is it to update the synopses? Not at all! If we want to contribute, we get access. It's as easy and simple as that.

First, you need to check out the (part of) the svn repository that you want to contribute to:

svn co

Then you change whatever you want to change, preferably discuss it with some of the experienced souls on #perl6 or e.g. the Perl 6 language mailing list, show diffs on e.g. or, and if you think you're doing the right thing - commit the change.

"But I can't commit, I only have read access" you might say. Just ask in any of the mentioned fora for a "commit bit" and state your e-mail address and preferred username, and someone will help you out with that part.

As we say on #perl6: community++

I got started, will you join me?


mj41 said...

In some cases PHP behavior "same order as you put keys in" is also useful.

bakkushan said...

Yes, it may be useful in some cases.

However, as a programmer, I must say that seems more like a different kind of data structure than a hash; the behaviour you mention is similar to what you do if you just create a linked list.

This is probably why PHP doesn't call it a hash, but rather sticks to the name "array".

While the implementation of a hash may vary, they generally use a hashing function to generate a Hash table, because this is often more efficient so insert, update and delete from than a linked list or a search tree.

I suppose one could create a subclass of hash that keeps a separate list that's the current hash key order.

You will get into performance trouble if you need to delete or rearrange elements, since that's costly.

What the Perl 6 specification says, is essentially that you cannot depend on any specific behaviour for the order of hash keys. If you need a specific order, you may sort at e.g. extraction time.

If what you do is a one-time insert, it may be sensible to just maintain a separate list of keys as you insert them into the hash:

%hash{$key} = $val;

Abigail said...

The order of hash keys is implementation dependent and arbitrary. Unless %hash is altered in any way, successive calls to .keys, .kv, .pairs, .values, or .iterator will iterate over the elements in the same order.

That's not a very shocking "new in perl6" thingy. Perl5 makes the same promises. Perl4 did so as well. Heck, probably even perl1 made the same claim.

bakkushan said...

That's right, it's not new in Perl 6, but it's new in the specification.

(And having a specification is new for Perl 6.)