Eric Johnson (kablamo): Why reading code is good for you

Perl Gems: Using File::Copy to Deploy Files to a Windows UNC Path

Below is a script that illustrates the use of File::Copy to copy files to a UNC path on a Windows network. The example code downloads a copy of the hosts file made available by the Malware Domain List and copies it to the appropriate directory on each Windows machine, so that the machine can no longer successfully resolve those malicious sites.

#!/usr/bin/perl

use strict;
use warnings;
use LWP;
use File::Copy;

#URL of hosts file
my $URI = 'http://www.malwaredomainlist.com/hostslist/hosts.txt';

#downloads the hosts file
my $ua       = LWP::UserAgent->new();
my $request  = HTTP::Request->new(GET => $URI);
my $response = $ua->request($request);
die "Download failed: " . $response->status_line . "\n"
   unless $response->is_success;
my $content = $response->content();
#print $content;

#writes the downloaded hosts file to disk
open(my $hosts2, ">", "hosts2.txt")
   or die "cannot open > hosts2.txt: $!";
print $hosts2 $content;
close $hosts2;

#opens file that stores list of PC names
open(my $computers, "<", "computers.txt")
   or die "cannot open < computers.txt: $!";

#copies the file to the proper location on each computer
while (my $computer = <$computers>) {
   chomp $computer;   # strip the newline, or the UNC path would be invalid
   print "$computer\n";
   my $path1 = 'hosts2.txt';
   my $path2 = "\\\\$computer\\C\$\\WINDOWS\\system32\\drivers\\etc\\hosts";
   copy($path1, $path2) or die "Copy to $computer failed: $!";
}

close $computers;
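
For reference, computers.txt is expected to hold one machine name per line, along these lines (the names here are hypothetical):

workstation01
workstation02
labpc03

Note that writing to the administrative C$ share requires running the script under an account with administrator rights on each target machine.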

Perl News: Czech Perl Workshop Talk Schedule

The Czech Perl Workshop takes place in Prague on May 20th and 21st, 2014.

The (almost) complete talk schedule has been published.

This event is highly recommended for Perl professionals, but also for those who are deeply interested in this field and hungry for new information and ideas.

Register now (deadline is 6th of May 2014).

Perl Foundation News: Tony Cook's Grant Extended

I am pleased to announce that Tony Cook's Maintaining Perl 5 grant has been extended by another 400 hours.

Thank you to everyone who responded to the call for comments and who provided feedback on this grant extension. I would also like to thank all those who continue to provide financial support to the Perl 5 Core Maintenance Fund.

Tadeusz Sosnierz: New Perl6 game: RetroRacer

Whatever but Cool

(I’m really sorry for the name; I couldn’t think of anything better :))

This game, apart from (obviously) being a showcase for a new Steroids iteration, is all about switching lanes on a high traffic road in a fast car. Yay!

It’s really no rocket science compared to ThroughTheWindow from the last post – even the code looks similar. One obvious improvement (besides finally using proper PNGs instead of silly BMPs – timotimo++!) is built-in collision detection:

my $s = self.add_sprite('othercar', $_, 0);

# …

$s.when({ $_.collides_with($!player) }, {

    # …

});

No more cheating with collisions like I did with ThroughTheWindow. The existing solution uses the entire image sprite as a hitbox; I’m hoping to make it customizable one day (it’s a simple thing really, code-wise).
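
For the curious, a whole-image hitbox boils down to a standard bounding-box overlap test, roughly like this (a sketch only; the x/y/w/h attribute names are my assumption, not necessarily what Steroids uses):

# true when the two sprites' rectangles intersect
method collides_with($other) {
        self.x < $other.x + $other.w and self.x + self.w > $other.x
    and self.y < $other.y + $other.h and self.y + self.h > $other.y
}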

All in all, the game isn't all that much more sophisticated than the last one; I was really just looking for a good excuse to write a new game (and add some new stuff to Steroids), and I sort of came up with a nice theme to follow: ThroughTheWindow used just one key (spacebar), so the next step was to use two (thus RetroRacer uses the left and right arrow keys). What will the next game use? 3 keys? 4 keys? Is it an arithmetical or a geometrical series? Oh my, I can't wait to find out myself.

Now go and grab it at https://github.com/tadzik/RetroRacer, and don’t forget about the soundtrack!

 


Eric Johnson (kablamo): An experiment - Write code every day

If you missed John Resig’s recent post about writing code everyday I highly recommend it.

He is a busy guy with a full time job (at Khan Academy), a few open source side projects (he's the author of jQuery), a wife, and a few hobbies. How does he sustainably get stuff done on his open source side projects without his wife leaving him? He decided to start writing (non-work) code for 30 minutes every day. This by itself is not a revolutionary idea. What blew my mind out of my nose and on to the table are the benefits he encountered:

  • Minimum viable code – No time for more than that.
  • Small but continuous progress – No anxiety about not getting stuff done.
  • Free time on the weekends – Instead of working all weekend to catch up from doing nothing during the week.
  • Lowered cost of context switching – Compared to resuming work on a side project just on the weekends.
  • Brain solves side project issues in the background

Wow, I need to do this too. So this is another experiment and here are the rules.

  1. I will write code for a minimum of 30 minutes each day.
  2. I must push working code every day.
  3. I will write for a minimum of 10 minutes each day.
  4. I must publish a blog post at least once a week.

Eric Johnson (kablamo): Codecube.io now supports Perl

Codecube.io is a jsfiddle-type service which runs Perl code (and other languages) and shows the results in your browser.

The website is written in Go and runs your code inside a Docker container. It originally had support for C, Go, Python, and Ruby. I was looking for an excuse to play with Docker and Go so I submitted a pull request which added support for Perl.

PAL-Blog: Happy Easter

Happy Easter!

PAL-Blog: Passing arguments

Many functions, methods, or subs (whatever you call them) need some arguments. In Perl, TIMTOWTDI applies here too, but some ways are faster than others. I'll compare eight ways of getting arguments passed to a sub.
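
To give a flavor of what is being compared, here are two of the usual candidates (a sketch of my own; the eight benchmarked variants are in the full post):

#!/usr/bin/perl
use strict;
use warnings;

# unpack @_ with a single list assignment
sub area_list {
    my ($width, $height) = @_;
    return $width * $height;
}

# shift arguments off @_ one at a time
sub area_shift {
    my $width  = shift;
    my $height = shift;
    return $width * $height;
}

print area_list(3, 4), "\n";     # 12
print area_shift(3, 4), "\n";    # 12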

Ovid: Views in DBIx::Class

Did you know you can write a view in DBIx::Class? The DBIx::Class::ResultSource::View module makes this very easy and it's helped me solve a very thorny problem in Veure: how do I efficiently make sure that email sent from Alpha Centauri to Epsilon Eridani doesn't show up instantly in your inbox?

Here's the problem: in the game Veure, you can send email to other players (only one at a time), they can reply, there can be email "threads", and so on. If you notice something interesting at the cloning vats in The House of Comoros space station in the Alpha Centauri system and you dash off a quick email to your friend about it, you may not know or care where your friend is. If they're in the same star system, they get the email instantly (a concession to gameplay mechanics), but if they're at the Epsilon Eridani Jump Gate, that will take a while because the email is traveling via wormhole.

Epsilon Eridani is about 10.5 light years from Sol, but it's 12.64 light years from Alpha Centauri A. As luck would have it, there's a direct wormhole between the two, making your email faster (there is no direct wormhole between Sol and Epsilon Eridani, so you actually have to travel almost 14 light years to get there). Information sent via wormhole takes 30 seconds per light year to travel (that's over three times faster than the Corvette, currently the fastest ship in the game), so an email from Alpha Centauri reaches Epsilon Eridani roughly 12.64 × 30 ≈ 379 seconds, or a bit over six minutes, after it was sent. As I use PostgreSQL for the Veure database, I can take advantage of PostgreSQL's excellent time handling. So my basic rules look like this:

  • You can always see email you've sent
  • You can always see email originating in the system you're currently in
  • You cannot see email originating in a different star system unless it was sent at least 30 seconds × "wormhole route distance" ago.

Now when you want to fetch an entire email thread, that's where things start to get hairy because the SQL looks like this (this lets us select the entire thread, regardless of which email id is used):

Do you want to try to convert that to a dbic query? I didn't think so. I started on it but it wasn't clear to me that it was an improvement over the raw SQL.

Fortunately, dbic's views take arbitrary SQL and return a standard resultset, though the individual results are read-only (because they might not have a one-to-one correspondence with a given table). That's often fine in a Web app: you present a list of results, a user chooses one and acts on it. The "choosing one" happens on a separate request, where you can edit a standard result instead of the view result.

For the above SQL, I now have it wrapped up in the following:

package Veure::Schema::Result::View::EmailThread;

use Moose;
use MooseX::MarkAsMethods autoclean => 1;
extends 'Veure::Schema::Result::Email';

__PACKAGE__->table_class('DBIx::Class::ResultSource::View');
__PACKAGE__->table("email_thread");    # XXX virtual view name. Doesn't exist

# is_virtual allows us to use bind parameters
__PACKAGE__->result_source_instance->is_virtual(1);
__PACKAGE__->result_source_instance->view_definition($scary_sql_here);
__PACKAGE__->meta->make_immutable;

1;

And, of course, the ResultSet class:

package Veure::Schema::ResultSet::View::EmailThread;

use strict;
use warnings;
use parent 'DBIx::Class::ResultSet';

sub get_thread {
    my ( $self, $email, $character ) = @_;
    my $id = $character->character_id;
    return $self->search( {}, { bind => [ $email->id, $id, $id ] } );
}

1;
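
Calling it from application code then looks something like this (a hypothetical sketch; $schema, $email and $character are assumed to exist):

my $thread = $schema->resultset('View::EmailThread')
                    ->get_thread( $email, $character );

while ( my $message = $thread->next ) {
    # each $message is a read-only row from the view
}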

I have a lot more work to do on this, but even though the above is a bit clumsy, it lets me solve a hard problem with very little code. Sure, I could have used a regular email resultset and iterated through all of the emails, checking to see if you were the recipient, whether you were in a different star system, and whether enough time had elapsed to let you see the email.

Or I can let my database do it for me. Not only is the code faster, it's cleaner, too.

NEILB: Release often

Today was my 28th consecutive day releasing to CPAN, and I'm one day behind BARBIE who started all this. Having to release every day has pushed me in a number of ways, and I've certainly done more in the last 4 weeks than I would have otherwise.

Ovid: Perl-Operated Boy

And now for something completely different.

Shawn M Moore: SKShapeNode, you are dead to me

For the past three months I've spent damn near every night and weekend moment building my next iOS game. I now regularly shut down Diesel Cafe. The game is my most ambitious project yet and I'm having a blast making it. As of today it's sixteen thousand lines and growing strong. For the UI I'm using Sprite Kit which has been a real pleasure. But lurking inside it there is one source of pain that keeps recurring.

SKShapeNode is a subclass of SKNode that draws a CGPathRef. It can render Bézier curves, polygons, rings, Louisiana, whatever. You can set the stroke color of the shape, or its fill color, or both. You could probably implement a decent chunk of your game's HUD with it. Bézier curves are a great way to give visual feedback of a user's gesture as in, say, Flight Control. Describing shapes at runtime rather than at design time (as in SKSpriteNode) unlocks worlds of possibilities.

However, SKShapeNode is by far the least well-engineered API in Sprite Kit. In fact, I have trouble naming a single lousier API that I've used since I started programming professionally. Say what you will about the tenets of SOAP, at least it's an ethos.

I respect that iOS 7 was a rush order. It's unfair to expect that everything will come out perfectly during a platform reinvention. However I maintain that Sprite Kit would have been improved by simply holding SKShapeNode back until iOS 8. It was not ready to ship. But since people have it, they want to use it. And to those people, BEWARE!

SKShapeNode, how do I loathe thee? Let me count the ways.

  1. SKShapeNode is widely known to leak memory.

    Unfixable memory leaks are already enough reason to avoid using an API. But wait, there's more…

  2. From SKShapeNode's documentation, "A line width larger than 2.0 may cause rendering artifacts in the final rendered image."

    It's good that they are up front about this limitation. But that is still pretty weak.

  3. Sometimes setStrokeColor:[SKColor redColor] has no visual effect at all. So you have to trick the SKShapeNode into redrawing itself. Changing its alpha is one way to do it:

    #if BUSTED_SKSHAPENODE_SETSTROKECOLOR
        CGFloat oldAlpha = shape.alpha;
        shape.alpha = 0;
        shape.alpha = oldAlpha;
    #endif
        shape.strokeColor = [SKColor redColor];

    Note that it is not sufficient to simply say shape.alpha = shape.alpha. That does not trigger a display. For whatever reason, the internals demand you actually change the property value.

    You know, I wouldn't be surprised to learn that internally, Sprite Kit uses a setNeedsDisplay: system like CALayer. That is an optimization to eliminate useless redraws. If that's the case, then whoever was working on SKShapeNode apparently forgot to have setStrokeColor: invoke the setNeedsDisplay: of Sprite Kit.


    Digging deeper, it seems this problem manifests itself only when the SKShapeNode is a descendant of an SKEffectNode. To see it in action, start a new project using the Sprite Kit template and replace your scene class's implementation with this:

    -(id)initWithSize:(CGSize)size {
        if (self = [super initWithSize:size]) {
            SKEffectNode *container = [SKEffectNode node];
            [self addChild:container];
    
            SKShapeNode *shape = [SKShapeNode node];
            shape.path = [UIBezierPath bezierPathWithRoundedRect:CGRectMake(20, 20, 20, 20) cornerRadius:4].CGPath;
            shape.position = CGPointMake(CGRectGetMidX(self.frame), CGRectGetMidY(self.frame));
            shape.strokeColor = [SKColor redColor];
            [container addChild:shape];
    
            dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(2 * NSEC_PER_SEC)), dispatch_get_main_queue(), ^{
                NSLog(@"setting stroke color");
    
                if (0) { // <-- CHANGE ME
                    CGFloat oldAlpha = shape.alpha;
                    shape.alpha = 0;
                    shape.alpha = oldAlpha;
                }
                shape.strokeColor = [SKColor greenColor];
            });
        }
        return self;
    }

    For me, the round rect stays red indefinitely. If you change that if (0) to a true value, then the alpha change causes the subsequent setStrokeColor: to have the intended visible effect.

    I've reported this to Apple as rdar://16400219.

  4. SKShapeNode sometimes drops little rendering glitches throughout my scenes.

    Those red lines are from SKShapeNode instances that once rendered red rectangles. Many frames ago. For whatever reason SKShapeNode decided to try to resurrect them, but only did half the job.

  5. This one is the most baffling and upsetting. It seems that if you have too many SKShapeNode instances visible on screen, it completely screws up the scene rendering. The scene shrinks to about 60% of its height for a few moments. In the following screenshots you can see what happens when I tiptoe past the apparent SKShapeNode limit (thanks to all that detritus from the previous point). The game becomes completely unusable.

    This problem seems to yet again be the fault of SKShapeNode inside of an SKEffectNode. My guess is that SKEffectNode's unique rendering model is triggering this. SKEffectNode lets you apply Core Image filters (which are akin to Photoshop filters) to some of your nodes. It's amazingly powerful. Seriously next level shit. But to achieve that, SKEffectNode must render its subtree into a separate buffer to which it can apply its CI filter. This different codepath is probably the cause of all the problems. But if SKShapeNode freaks out when it's being rendered into an SKEffectNode, I seriously question how robust Sprite Kit is. (Incidentally, SKEffectNode also doesn't respect the zPositions of its children, but that's another post altogether. The solution for that one is to interject a plain SKNode into the node tree. See rdar://16534245)

    Anyway. I've luckily been able to replicate this crazy rendering glitch with a small amount of code. I've recorded a video showing the bug. As before, replace the Sprite Kit template's scene class's implementation with the following:

    -(id)initWithSize:(CGSize)size {    
        if (self = [super initWithSize:size]) {
            SKEffectNode *container = [SKEffectNode node];
            [self addChild:container];
        }
        return self;
    }
    
    -(void)touchesBegan:(NSSet *)touches withEvent:(UIEvent *)event {
        UITouch *touch = [touches anyObject];
        SKEffectNode *container = self.children[0];
    
        SKShapeNode *shape = [SKShapeNode node];
        shape.path = [UIBezierPath bezierPathWithRoundedRect:CGRectMake(-10, -10, 20, 20) cornerRadius:4].CGPath;
        shape.position = [touch locationInNode:self];
        shape.strokeColor = [SKColor colorWithHue:drand48() saturation:1 brightness:1 alpha:1];
        shape.blendMode = SKBlendModeAdd;
        [container addChild:shape];
    }

    Tap the screen a few times. All's well.

    Tap the screen a few more times… Hey, what the hell was that?

    What in the world did Apple do to cause this bug? Regardless, I've reported this one as rdar://16400203.

  6. According to other folks, SKShapeNode also has terrible performance and is missing key features from its CAShapeLayer counterpart.


Because of all these flaws, SKShapeNode is completely untrustworthy. I now refuse to use SKShapeNode for any new code I write. I have also been refactoring existing code that uses it to stop using it. Here are some ways I've been able to do that.

  1. Just remove the SKShapeNode. For some effects it's not worth all the trouble. You'll soon think of something better to replace it.

  2. For borders on opaque nodes, just use an SKSpriteNode instantiated with +[SKSpriteNode spriteNodeWithColor:size:]. This gets you a rectangular block of the provided SKColor. Beyond just borders, I've converted my HP bars this way too.

    Switching to a sprite even looks better. And you won't have to fear using a border width greater than 2.0. Cripes!

  3. Sprite Kit plays well enough with CALayer and friends. When you can get away with it, stick a CAShapeLayer into your SKView's layer. I use this in two places in my game: a drawing pad and a procedurally-generated lightning bolt.

    This works fine if your CAShapeLayer is going to be the topmost UI component. However if you need to display Sprite Kit content over the layer, things would get tricky. Maybe you can use two SKView instances, sandwiching the CAShapeLayer. That sounds like an awful lot of work though. Personally, I've chosen my battles carefully; there will be nothing in my game that renders above that drawing pad or lightning bolt.

    Be aware that using CALayer requires jumping through a few convertPoint: hurdles. The coordinate system of Sprite Kit is different from the coordinate system of Core Animation. Natch.

  4. Render a CGPathRef offscreen using a disconnected CAShapeLayer. Then snapshot that into an image. Then create an SKSpriteNode with that snapshot as a texture. While I haven't personally used this technique, I see no reason it wouldn't work.

    Now you can add that sprite to your scene, animate it all over town, put it over or under other nodes, etc. You now have an unchanging SKShapeNode without all of the insane, unfixable bugs.
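
    Something along these lines ought to do it (an untested sketch; the path, color, and line width here are placeholders of mine):

    CAShapeLayer *layer = [CAShapeLayer layer];
    layer.path        = path;
    layer.strokeColor = [UIColor redColor].CGColor;
    layer.fillColor   = NULL;
    layer.lineWidth   = 4;   // no more fear of widths above 2.0

    // render the layer into an offscreen image context
    CGRect bounds = CGRectIntegral(CGPathGetPathBoundingBox(path));
    UIGraphicsBeginImageContextWithOptions(bounds.size, NO, 0);
    CGContextRef context = UIGraphicsGetCurrentContext();
    CGContextTranslateCTM(context, -bounds.origin.x, -bounds.origin.y);
    [layer renderInContext:context];
    UIImage *image = UIGraphicsGetImageFromCurrentImageContext();
    UIGraphicsEndImageContext();

    // wrap the snapshot in a sprite and use it like any other node
    SKSpriteNode *sprite = [SKSpriteNode spriteNodeWithTexture:
                               [SKTexture textureWithImage:image]];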


You know, maybe that one is worth doing right. The first person to implement the complete SKShapeNode API using an SKSpriteNode backed by a CALayer wins … my undying respect!

Update: Reader Michael Redig pointed me to his SKUtilities project which implements exactly that: SKUShapeNode is a subclass of SKSpriteNode that renders using a CAShapeLayer. It's currently incomplete but certainly looks to me like a good start.


As far as I'm concerned, this is how SKShapeNode should be handled in your codebase:

#define SKShapeNode SHAPENODE_IS_BANNED

This results in an error if, in a moment of weakness, you try to use SKShapeNode.

Dave's Free Press: Journal: Devel::CheckLib can now check libraries' contents

Perlgeek.de : Rakudo's Abstract Syntax Tree

After or while a compiler parses a program, the compiler usually translates the source code into a tree format called an Abstract Syntax Tree, or AST for short.

The optimizer works on this program representation, and then the code generation stage turns it into a format that the platform underneath it can understand. Actually I wanted to write about the optimizer, but noticed that understanding the AST is crucial to understanding the optimizer, so let's talk about the AST first.

The Rakudo Perl 6 Compiler uses an AST format called QAST. QAST nodes derive from the common superclass QAST::Node, which sets up the basic structure of all QAST classes. Each QAST node has a list of child nodes, possibly a hash map for unstructured annotations, an attribute (confusingly) named node for storing the lower-level parse tree (which is used to extract line numbers and context), and a bit of extra infrastructure.

The most important node classes are the following:

QAST::Stmts
A list of statements. Each child of the node is considered a separate statement.
QAST::Op
A single operation that usually maps to a primitive operation of the underlying platform, like adding two integers, or calling a routine.
QAST::IVal, QAST::NVal, QAST::SVal
Those hold integer, float ("numeric") and string constants respectively.
QAST::WVal
Holds a reference to a more complex object (for example a class) which is serialized separately.
QAST::Block
A list of statements that introduces a separate lexical scope.
QAST::Var
A variable.
QAST::Want
A node that can evaluate to different child nodes, depending on the context it is compiled in.

To give you a bit of a feel for how those node types interact, I want to give a few examples of Perl 6 code, and the ASTs it could produce. (It turns out that Perl 6 is quite a complex language under the hood, and usually produces a more complicated AST than the obvious one; I'll ignore that for now, in order to introduce you to the basics.)

Ops and Constants

The expression 23 + 42 could, in the simplest case, produce this AST:

QAST::Op.new(
    :op('add'),
    QAST::IVal.new(:value(23)),
    QAST::IVal.new(:value(42)),
);

Here a QAST::Op encodes a primitive operation, an addition of two numbers. The :op argument specifies which operation to use. The child nodes are two constants, both of type QAST::IVal, which hold the operands of the low-level operation add.

Now the low-level add operation is not polymorphic: it always adds two floating-point values, and the result is a floating-point value again. Since the arguments are integers and not floating point values, they are automatically converted to float first. That's not the desired semantics for Perl 6; actually the operator + is implemented as a subroutine named &infix:<+>, so the real generated code is closer to

QAST::Op.new(
    :op('call'),
    :name('&infix:<+>'),    # name of the subroutine to call
    QAST::IVal.new(:value(23)),
    QAST::IVal.new(:value(42)),
);

Variables and Blocks

Using a variable is as simple as writing QAST::Var.new(:name('name-of-the-variable')), but it must be declared first. This is done with QAST::Var.new(:name('name-of-the-variable'), :decl('var'), :scope('lexical')).

But there is a slight caveat: in Perl 6 a variable is always scoped to a block. So while you can't ordinarily mention a variable prior to its declaration, there are indirect ways to achieve that (lookup by name, and eval(), to name just two).

So in Rakudo there is a convention to create QAST::Block nodes with two QAST::Stmts children. The first holds all the declarations, and the second all the actual code. That way all the declarations always come before the rest of the code.

So my $x = 42; say $x compiles to roughly this:

QAST::Block.new(
    QAST::Stmts.new(
        QAST::Var.new(:name('$x'), :decl('var'), :scope('lexical')),
    ),
    QAST::Stmts.new(
        QAST::Op.new(
            :op('p6store'),
            QAST::Var.new(:name('$x')),
            QAST::IVal.new(:value(42)),
        ),
        QAST::Op.new(
            :op('call'),
            :name('&say'),
            QAST::Var.new(:name('$x')),
        ),
    ),
);

Polymorphism and QAST::Want

Perl 6 distinguishes between native types and reference types. Native types are closer to the machine, and their type name is always lower case in Perl 6.

Integer literals are polymorphic in that they can be either a native int or a "boxed" reference type Int.

To model this in the AST, QAST::Want nodes can contain multiple child nodes. The compile-time context decides which of those is actually used.

So the integer literal 42 actually produces not just a simple QAST::IVal node but rather this:

QAST::Want.new(
    QAST::WVal(Int.new(42)),
    'Ii',
    QAST::IVal(42),
)

(Note that Int.new(42) is just a nice notation to indicate a boxed integer object; it doesn't quite work like this in the code that translates Perl 6 source code into ASTs).

The first child of a QAST::Want node is the one used by default, if no other alternative matches. Then comes a list where the elements at odd indexes are format specifications (here Ii for integers) and the elements at even indexes are the ASTs to use in those cases.

An interesting format specification is 'v' for void context, which is always chosen when the return value from the current expression isn't used at all. In Perl 6 this is used to eagerly evaluate lazy lists that are used in void context, and for several optimizations.
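
Following the pattern above, an integer literal that also carries a void-context alternative might look roughly like this (a sketch of mine; the exact nodes Rakudo emits differ):

QAST::Want.new(
    QAST::WVal(Int.new(42)),   # default: the boxed Int
    'Ii',
    QAST::IVal(42),            # when a native int is wanted
    'v',
    QAST::Stmts.new(),         # void context: no code at all
)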

Shawn M Moore: Compile-Time Error for Incorrectly Cased #import

OS X uses a case-insensitive filesystem by default. That means the following code that purports to load AFNetworking.h both compiles and runs, nary a peep:

#import "AfNeTwOrKiNg.H"

Gawsh, I can't put my finger on it, but that kinda rubs me the wrong way.

The frequency of my typing YWaction.h when I meant YWAction.h would frighten children. One of my favorite things about working in a language like Objective-C (relative to my mother tongue Perl, anyway) is that I am immediately notified of most typos. But clang does not warn about this one. It's just plain sloppy. That I cannot abide. Practically, it would also lead to lots of tiny problems should anyone try to build this app on a case-sensitive filesystem.

Carefully reviewing all the #import statements in my project makes my eyes glaze over. So, let's put this high-falutin' typin' teevee machine to work. Here is how you can get Xcode to text you more when you screw up your imports.

  1. Navigate to your app target. It's probably the first entry in your file navigator, then under Targets on the left pane.
  2. Select the Build Phases tab.
  3. Click the little + button at the top left.
  4. Select "New Run Script Build Phase".
  5. This adds a "Run Script" entry to the bottom of this list. Pop it open by clicking its disclosure triangle.
  6. Set the value of Shell to /usr/bin/perl
  7. You heard that right. Perl.
  8. In the text field below Shell, paste in the following Perl script:
    my @files = glob("*/*.[hm]");
    my %is_file = map { s{.*/}{}r => 1 } @files;
    my %lc_file = map { lc($_) => $_ } keys %is_file;
    
    my $errors = 0;
    
    for my $file (@files) {
        open my $handle, "<", $file or die "cannot open $file: $!";
        while (<$handle>) {
            next unless my ($import) = /#import\s*"(.*)"/;
            next if $is_file{$import};
    
            print qq{$file:$.: warning "$import"};
    
            if (my $fixed_case = $lc_file{lc $import}) {
                print qq{ (should be "$fixed_case")};
            }
    
            print qq{\n};
    
            ++$errors;
        }
    }
    
    exit 1 if $errors;
  9. Rename the build phase by clicking its name twice. I called mine Check #import Casing.
  10. Drag and drop to reorder your build phases however you like. Mine's near the top, because I think it's better to fail fast.

When it's all said and done, your build phase should resemble mine. Unless you've got a newer version of Xcode than me, in which case you know more about my future than I do. That's cool!

Now when you ⌘B, Xcode will tell you about all your miscased #import statements just like any other built-in error. One less thing to be vigilant about! Happily, Xcode even shows these errors right in context.

You might notice that this is actually an error. That's because in my projects, all warnings are errors. Ain't nobody got time for anything less.

Dave's Free Press: Journal: I Love Github

Dave's Free Press: Journal: Palm Treo call db module

Ocean of Awareness: Evolvable languages

Ideally, if a syntax is useful and clear, and a programmer can easily read it at a glance, you should be able to add it to an existing language. In this post, I will describe a modest incremental change to the Perl syntax.

It's one I like, but that's beside the point, for two reasons. First, it's simply intended as an example of language evolution. Second, regardless of its merits, it is unlikely to happen, because of the way that Perl 5 is parsed. In this post I will demonstrate a way of writing a parser, so that this change, or others, can be made in a straightforward way, and without designing your language into a corner.

When initializing a hash, Perl 5 allows you to use not just commas, but also the so-called "wide comma" (=>). The wide comma is suggestive visually, and it also has some smarts about what a hash key is: the key to its left is always converted into a string, so that the wide comma knows that in a key-value pair like this:

    key1 => 711,

that key1 is intended as a string.

But what about something like this?

  {
   company name => 'Kamamaya Technology',
   employee 1 => first name => 'Jane',
   employee 1 => last name => 'Doe',
   employee 1 => title => 'President',
   employee 2 => first name => 'John',
   employee 2 => last name => 'Smith',
   employee 3 => first name => 'Clarence',
   employee 3 => last name => 'Darrow',
  }

Here I think the intent is obvious -- to create an employee database in the form of a hash of hashes, allowing spaces in the keys. In Data::Dumper format, the result would look like:

{
              'employee 2' => {
                                'last name' => '\'Smith\'',
                                'first name' => '\'John\''
                              },
              'company name' => '\'Kamamaya Technology\'',
              'employee 3' => {
                                'last name' => '\'Darrow\'',
                                'first name' => '\'Clarence\''
                              },
              'employee 1' => {
                                'title' => '\'President\'',
                                'last name' => '\'Doe\'',
                                'first name' => '\'Jane\''
                              }
            }

And in fact, that is the output of the script in this Github gist, which parses the previous "extended Perl 5" snippet using a Marpa grammar before passing it on to Perl.

Perl 5 does not allow a syntax like this, and looking at its parsing code will tell you why -- it's already a maintenance nightmare. The extension I've described above could, in theory, be added to Perl 5, but doing so would aggravate an already desperate maintenance situation.

Now, depending on taste, you may be just as happy that you'll never see the extensions I have just outlined in Perl 5. But I don't think it is as easy to be happy about a parsing technology that quickly paints the languages which use it into a corner.

How it works

The code is in a Github gist. For the purposes of the example, I've implemented a toy subset of Perl. But this approach has been shown to scale. There are full Marpa-powered parsers of C, ECMAScript, XPath, and liberal HTML.

Marpa is a general BNF parser, which means that anything you can write in BNF, Marpa can parse. For practical parsing, what matters are those grammars that can be parsed in linear time, and with Marpa that class is vast, including all the classes of grammar currently in practical use. To describe the class of grammars that Marpa parses in linear time, assume that you have either a left or right parser, with infinite lookahead, that uses regular expressions. (A parser like this is called LR-regular.) Assume that this LR-regular parser parses your grammar. In that case, you can be sure that Marpa will parse that grammar in linear time, and without doing the lookahead. (Instead Marpa tracks possibilities in a highly-optimized table.) Marpa also parses many grammars that are not LR-regular in linear time, but just LR-regular is very likely to include any class of grammar that you will be interested in parsing. The LR-regular grammars easily include all those that can be parsed using yacc, recursive descent or regular expressions.
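
To give a flavor of what BNF-driven parsing with Marpa looks like, here is a minimal, self-contained sketch (a toy grammar of my own, not the gist's) that accepts a multi-word key to the left of a wide comma:

use strict;
use warnings;
use Marpa::R2;

my $dsl = <<'END_OF_DSL';
:default ::= action => ::array
lexeme default = latm => 1
:start ::= pair
pair ::= key ('=>') value
key  ::= word+
value  ~ [\d]+
word   ~ [\w]+
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
my $recce   = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
$recce->read( \'employee 1 => 711' );
my $value_ref = $recce->value;    # ref to [ [ 'employee', '1' ], '711' ]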

Marpa excels at those special hacks so necessary in recursive descent and other techniques. Marpa allows you to define events that will stop it at symbols or rules, both before and after. While stopped, you can hand processing over to your own custom code. Your custom code can feed your own tokens to the parse for as long as you like. In doing so, it can consult Marpa to determine exactly what symbols and rules have been recognized and which ones are expected. Once finished with custom processing, you can then ask Marpa to pick up again at any point you wish.

The craps game is over

The bottom line is that if you can describe your language extension in BNF, or in BNF plus some hacks, you can rely on Marpa parsing it in reasonable time. Language design has been like shooting craps in a casino that sets you up to win a lot of the first rolls before the laws of probability grind you down. Marpa changes the game.

To learn more

Marpa::R2 is available on CPAN. A list of my Marpa tutorials can be found here. There are new tutorials by Peter Stuifzand and amon. The Ocean of Awareness blog focuses on Marpa, and it has an annotated guide. Marpa has a web page that I maintain and Ron Savage maintains another. For questions, support and discussion, there is the "marpa parser" Google Group.

Comments

Comments on this post can be made in Marpa's Google group.

Shawn M Moore: My New Blog: Japanese Technique

I've decided to split off a separate blog for all my Japanese-related content. I'm calling the new site Japanese Technique.

I have many reasons for making this a new blog instead of continuing to post such articles to sartak.org. One major reason is that I want to reach a completely new audience.

Most of you reading this know me from my open-source development work. That is how I would like to maintain this site going forward. I am concerned that technical articles about git and Perl would be offputting to people who might otherwise find value in my Japanese content. The flip-side of that is I would not want to flood you developers with learning-Japanese material that is of no use to you.

I also want to post more frequently. The methods I use to study Japanese, and the tools I build to do it, certainly give me an interesting angle. As @lestrrat so eloquently put it, I'm learning Japanese like an engineer. And I have plenty to say on that topic. I only post several times a year about developer topics, and I would not want that content to be overwhelmed.

In short, by giving this outlet a fresh site and a name, Japanese Technique, I can let this facet of my life stand on its own. That appeals to me more than having Japanese piggyback on, dilute, or be diluted by, my previous endeavors.

Perlgeek.de : doc.perl6.org and p6doc

Background

Earlier this year I tried to assess the readiness of the Perl 6 language, compilers, modules, documentation and so on. While I never got around to publish my findings, one thing was painfully obvious: there is a huge gap in the area of documentation.

There are quite a few resources, but none of them comprehensive (the most comprehensive are the synopses, but they are not meant for the end user), and no single location we can point people to.

Announcement

So, in the spirit of xkcd, I present yet another incomplete documentation project: doc.perl6.org and p6doc.

The idea is to take the same approach as perldoc for Perl 5: create user-level documentation in Pod format (here the Perl 6 Pod), and make it available both on a website and via a command line tool. The source (documentation, command line tool, HTML generator) lives at https://github.com/perl6/doc/. The website is doc.perl6.org.

Oh, and the last Rakudo Star release (2012.06) already shipped p6doc.

Status and Plans

Documentation, website and command line tool are all in very early stages of development.

In the future, I want both p6doc SOMETHING and http://doc.perl6.org/SOMETHING to either document or link to documentation of SOMETHING, be it a built-in variable, an operator, a type name, routine name, phaser, constant or... all the other possible constructs that occur in Perl 6. URLs and command line arguments specific to each type of construct will also be available (/type/SOMETHING URLs already work).
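
For example, something like the following should work (the exact invocations may evolve as the tool matures):

$ p6doc Str          # documentation for the type Str
$ p6doc Str.split    # documentation for a single method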

Finally I want some way to get a "full" view of a type, i.e. providing all methods from superclasses and roles too.

Help Wanted

All of that is going to be a lot of work, though most of the work will be writing the documentation. You too can help! You can write new documentation, gather and incorporate already existing documentation with compatible licenses (for example the synopses, the Perl 6 advent calendar, examples from Rosetta Code), add more examples, proof-read the documentation, or improve the HTML generation or the command line tool.

If you have any questions about contributing, feel free to ask in #perl6. Of course you can also create pull requests right away :-).

Shawn M Moore: IIDelegate: Conforming to Protocols with Blocks

This article was published at

Ocean of Awareness: A Marpa-powered C parser

Jean-Damien Durand has just released MarpaX::Languages::C::AST, which parses C language into an abstract syntax tree (AST). MarpaX::Languages::C::AST has been tested against Perl's C source code, as well as Marpa's own C source.

Because it is Marpa-powered, MarpaX::Languages::C::AST works differently from other C parsers. In the past, C parsers have been syntax-driven -- parsing was based on a BNF description of the C grammar. More recently, C parsers have used hand-written recursive descent -- they have been procedurally-driven.

MarpaX::Languages::C::AST uses both approaches. Marpa has the advantage that it makes full knowledge of the state of the parse available to the programmer, so that procedural logic and syntax-driven parsing can reinforce each other. The result is a combined lexer/parser which is very compact and easy to understand. Among the potential applications:

  • Customized "lints". You can write programs to enforce C language standards and restrictions specific to an individual, a company or a project.
  • C interpreters. By taking the AST and adding your own back end, you can create a special-purpose C interpreter or a special-purpose compiler.
  • C variants. Because the code for the parser is compact and easy to modify, it lends itself to language extension and experimentation. For example, you could reasonably implement compilers to try out the proposals submitted to a standards committee.
  • C supersets. Would you like to see some of the syntax from a favorite language available in C? Here's your chance.

The implementation

A few of Jean-Damien's implementation choices are worth noting. A C parser can take one of two strategies: approximate or precise. A compiler has, of course, to be precise. Tools, such as cross-referencers, often decide to be approximate, or sloppy. Sloppiness is easier to implement and has other advantages: a sloppy tool can tolerate missing C flags, since what the C flags should be can be one of the things it guesses at.

Of the two strategies, Jean-Damien decided to go with "precise". MarpaX::Languages::C::AST follows the C11 standard, with either GCC or Microsoft extensions. This has the advantage that MarpaX::Languages::C::AST could be used as the front end of a compiler.

Because MarpaX::Languages::C::AST's purpose is to take things as far as an AST, and let the user take over, it does not implement those constraints usually implemented in post-processing. One example of a post-syntactic constraint is the one that bans "case" labels outside of switch statements. Perhaps a future version can include a default "first phase" post-processor to enforce the constraints from the standards. As currently implemented, the user can check for and enforce these constraints in any way he likes. This makes it easier for extensions and customizations, which I think of as the major purpose of MarpaX::Languages::C::AST.

The parsing strategy

Those familiar with C parsing and its special issues may be interested in Jean-Damien's approach to them. MarpaX::Languages::C::AST is, with a few exceptions, syntax-driven -- the parser works from Marpa's SLIF, an extended BNF variant. The SLIF-driven logic is sufficient to deal with the if-then-else issue. Marpa handles right recursion in linear time, so that the if-then-else issue could have been dealt with by rewriting the relevant rules. But Jean-Damien wanted to have his BNF follow closely the grammar in the standards, and he decided to use Marpa's rule ranking facility instead.

More complicated is the ambiguity in C between variable names and types, which actually takes C beyond BNF and context-free grammars into context-sensitive territory. Most C parsers deal with this using lexer or post-processing hacks. Marpa allows the parser to do this more elegantly. Marpa knows the parsing context at all times and can communicate this to a user's customized code. Marpa also has the ability to use the parsing context to decide when to switch control from the syntax-driven logic to a user's customized procedural logic, and for the syntax-driven logic to take control back when the procedural logic wants to give it back. This allows the variable-name-versus-type ambiguity to be handled by specifically targeted code which knows the full context of the decisions it needs to make. This code can be written more directly, simply and clearly than was possible with previous parsing methods.

Compilers?

Above I mentioned special-purpose compilers. What about production compilers? MarpaX::Languages::C::AST's upper layers are in Perl, so the speed, while acceptable for special-purpose tools, will probably not be adequate for production. Perhaps a future Marpa-powered C parser will rewrite those upper layers in C, and make the race more interesting.

To learn more

Marpa::R2 is available on CPAN. A list of my Marpa tutorials can be found here. There are new tutorials by Peter Stuifzand and amon. The Ocean of Awareness blog focuses on Marpa, and it has an annotated guide. Marpa also has a web page. For questions, support and discussion, there is the "marpa parser" Google Group. Comments on this post can be made there.

Ocean of Awareness: Parsing Ada Lovelace

The application

Abstract Syntax Forests (ASF's) are my most recent project. I am adding ASF's to my Marpa parser. Marpa has long supported ambiguous parsing, and allowed users to iterate through, and examine, all the parses of an ambiguous parse. This was enough for most applications.

Even applications which avoid ambiguity benefit from better ways to detect and locate it. And there are applications that require the ability to select among and manipulate very large sets of ambiguous parses. Prominent among these is Natural Language Processing (NLP). This post will introduce an experiment. Marpa in fact seems to have some potential for NLP.

Writing an efficient ASF is not a simple matter. The naive implementation is to generate the complete set of fully expanded abstract syntax trees (AST's). This approach consumes resources that can become exponential in the size of the input. Translation: the naive implementation quickly becomes unusably slow. Marpa optimizes by aggressively identifying identical subtrees of the AST's. Especially in highly ambiguous parses, many subtrees are identical, and this optimization is often a big win.

Ada Lovelace

My primary NLP example at this point is a quote from Ada Lovelace. It is a long sentence, possibly the longest, from her Notes -- 158 words. A disadvantage of this example is that it is not typical of normal NLP. By modern standards it is an unusually long and complex sentence. An advantage of it, and my reason for the choice, is that it stresses the parser.

The "Note A" from which this sentence is taken is one of Ada's notes on a translation of a paper on the work of her mentor and colleague, Charles Babbage. Ada's "Notes" are longer than the original paper, and far more important. In these "Notes" Ada makes the first distinction between a computer and a calculator, and between software and hardware. In their collaboration, Babbage did all of the hardware design, and he wrote most of the actual programs in her paper. But these two revolutionary ideas, and their elaboration, are Ada's.

Why would Babbage ignore obvious implications of his own invention? The answer is that, while these implications are obvious to us, they simply did not fit into the 1843 view of the world. In those days, algebra was leading-edge math. The ability to manipulate equations was considered an extremely advanced form of reason. For Babbage and his contemporaries, that sort of ability to reason certainly suggested the ability to distinguish between good and evil, and this in turn suggested possession of a soul. Ada's "Notes" were written 20 years after Mary Shelley, while visiting Ada's father in Switzerland, wrote the novel Frankenstein. For Ada's contemporaries, announcing that you planned to create a machine that composed music, or did advanced mathematical reasoning, was not very different from announcing that you planned to assemble a human being in your lab.

Ada was the daughter of the poet Byron. For her, pushing boundaries was a family tradition. Babbage was happy to leave these matters to Ada. As Babbage's son put it, his father

considered the Paper by Menabrea, translated with notes by Lady Lovelace, published in volume 3 of Taylor's 'Scientific Memoirs," as quite disposing of the mathematical aspect of the invention. My business now is not with that.

On reading Ada

Ada's notes are worth reading, but the modern reader has to be prepared to face several layers of difficulty:

  • They are in Victorian English. In modern English, a long complex sentence is usually considered an editing failure. In Ada's time, following Greek and Roman examples, a periodic sentence was considered especially appropriate when making an important point. And good literary style and good scientific style were considered one and the same.
  • They are mathematical, and the math is not of the kind currently studied by programmers.
  • Ada has literally no prior literature on software to build on, and has to invent her terminology. It is almost never the modern terminology, and it can be hard to guess how it relates to modern terminology. For example, does Ada foresee objects, methods and classes? Ada speaks of computing both symbolic results and numeric data, and attaching one to the other. She clearly understands that the symbolic results can represent operations. Ada also clearly understands that numeric data can represent not just the numbers themselves, but notes, positions in a loom, or computer operations. So we have arbitrary data, tagged with symbols that can be both names and operations. But are these objects?
  • Finally, she associates mathematics with philosophy. In her day, this was expected. Unfortunately, modern readers now often see that sort of discussion as irrelevant, or even as a sign of inability to come to the point.

Ada's quote

Those who view mathematical science, not merely as a vast body of abstract and immutable truths, whose intrinsic beauty, symmetry and logical completeness, when regarded in their connexion together as a whole, entitle them to a prominent place in the interest of all profound and logical minds, but as possessing a yet deeper interest for the human race, when it is remembered that this science constitutes the language through which alone we can adequately express the great facts of the natural world, and those unceasing changes of mutual relationship which, visibly or invisibly, consciously or unconsciously to our immediate physical perceptions, are interminably going on in the agencies of the creation we live amidst: those who thus think on mathematical truth as the instrument through which the weak mind of man can most effectually read his Creator's works, will regard with especial interest all that can tend to facilitate the translation of its principles into explicit practical forms.

Ada, the bullet point version

Ada's sentence may look like what happens when two pickups carrying out-of-date dictionaries to the landfill run into each other on the way. But there is, in fact, a good deal of structure and meaning in all those words. Let's take it as bullet points:

  • 1. Math is awesome just for being itself.
  • 2. Math describes and predicts the external world.
  • 3. Math is the best way to get at what it is that is really behind existence.
  • 4. If we can do more and better math, that has to be a good thing.

Ada is connecting her new science of software to the history of thought in the West, something which readers of the time would expect her to do. Bullet point 1 alludes to the Greek view of mathematics, especially Plato's. Bullet point 2 alludes to the scientific view, as pioneered by Galileo and Newton. Bullet point 3 alludes to the post-Classical world view, especially the Christian one. But while the language is Christian, Ada's idea is one that Einstein would have had no trouble with. And bullet 4 is the call to action.

When we come to discuss the parse in detail, we'll see that it follows this structure. As an aside, note Ada's mention of "logical completeness" as one of the virtues of math. Gödel came along nearly a century later and showed this vision, which went back to the Greeks, was an illusion. So Ada did not predict everything. On the other hand, Gödel's result was also a complete surprise to Johnny von Neumann, who was in the room that day.

The experiment so far

I've gotten Marpa to grind through this sentence, using the same framework as the Stanford NLP demo. That demo, in fact, refuses to even attempt any sentence longer than 70 words, so my Ada quote needs to be broken up. Even on the smaller pieces, the Stanford demo becomes quite slow. Marpa, by contrast, grinds through the whole thing quickly. The Stanford demo is based on a CYK parser, and presumably is O(n³) -- cubic. Marpa seems to be exhibiting linear behavior.

Promising as this seems for Marpa, its first results may not hold up as the experiment gets more realistic. So far, I've only given Marpa enough English grammar and vocabulary to parse this one sentence. That is enough to make the grammar very complex and ambiguous, but even so it must be far less complex and ambiguous than the one behind the Stanford demo. Marpa will never have time worse than O(n³), but it's quite possible that if Marpa's grammar were as ambiguous as the Stanford one, Marpa would be no faster. Marpa, in fact, could turn out to be slower by some linear factor.

There may never be a final decision based on speed. Marpa might turn out to represent one approach, good for certain purposes. And, especially when speed is indecisive, other abilities can prove more important.

To learn more

Marpa::R2 is available on CPAN. A list of my Marpa tutorials can be found here. There are new tutorials by Peter Stuifzand and amon. The Ocean of Awareness blog focuses on Marpa, and it has an annotated guide. Marpa also has a web page. For questions, support and discussion, there is the "marpa parser" Google Group. Comments on this post can be made there.

Dave's Free Press: Journal: Graphing tool

Dave's Free Press: Journal: XML::Tiny released

Perlgeek.de : Pattern Matching and Unpacking

When talking about pattern matching in the context of Perl 6, people usually think about regexes or grammars. Those are indeed very powerful tools for pattern matching, but not the only ones.

Another powerful tool for pattern matching and for unpacking data structures uses signatures.

Signatures are "just" argument lists:

sub repeat(Str $s, Int $count) {
    #     ^^^^^^^^^^^^^^^^^^^^  the signature
    # $s and $count are the parameters
    return $s x $count
}

Nearly all modern programming languages have signatures, so you might say: nothing special, move along. But there are two features that make them more useful than signatures in other languages.

The first is multi dispatch, which allows you to write several routines with the same name, but with different signatures. While extremely powerful and helpful, I don't want to dwell on it here. Look at Chapter 6 of the "Using Perl 6" book for more details.

The second feature is sub-signatures. It allows you to write a signature for a single parameter.

Which sounds pretty boring at first, but it allows you, for example, to do declarative validation of data structures. Perl 6 has no built-in type for an array where each slot must be of a specific but different type. But you can still check for that in a sub-signature:

sub f(@array [Int, Str]) {
    say @array.join: ', ';
}
f [42, 'str'];      # 42, str
f [42, 23];         # Nominal type check failed for parameter '';
                    # expected Str but got Int instead in sub-signature
                    # of parameter @array

Here we have a parameter called @array, and it is followed by square brackets, which introduce a sub-signature for an array. When calling the function, the array is checked against the signature (Int, Str), and if the array doesn't contain exactly one Int and one Str in that order, a type error is thrown.

The same mechanism can be used not only for validation, but also for unpacking, which means extracting some parts of the data structure. This simply works by using variables in the inner signature:

sub head(*@ [$head, *@]) {
    $head;
}
sub tail(*@ [$, *@tail]) {
    @tail;
}
say head <a b c>;      # a
say tail <a b c>;      # b c

Here the outer parameter is anonymous (the @), though it's entirely possible to use variables for both the inner and the outer parameter.

The anonymous parameter can even be omitted, and you can write sub tail( [$, *@tail] ) directly.

Sub-signatures are not limited to arrays. For working on arbitrary objects, you surround them with parentheses instead of brackets, and use named parameters inside:

multi key-type ($ (Numeric :$key, *%)) { "Number" }
multi key-type ($ (Str     :$key, *%)) { "String" }
for (42 => 'a', 'b' => 42) -> $pair {
    say key-type $pair;
}
# Output:
# Number
# String

This works because the => constructs a Pair, which has a key and a value attribute. The named parameter :$key in the sub-signature extracts the attribute key.

You can build quite impressive things with this feature, for example red-black tree balancing based on multi dispatch and signature unpacking. (More verbose explanation of the code.) Most use cases aren't this impressive, but still it is very useful to have occasionally. Like for this small evaluator.

Dave's Free Press: Journal: YAPC::Europe 2007 report: day 2

Perlgeek.de : YAPC Europe 2013 Day 3

The second day of YAPC Europe climaxed in the river boat cruise, Kiev's version of the traditional conference dinner. It was a largish boat traveling on the Dnipro river, with food, drinks and lots of Perl folks. Not having fixed tables, and having to get up to fetch food and drinks, led to a lot of circulation, and thus meeting many more people than at traditional dinners. I loved it.

Day 3 started with a video message from next year's YAPC Europe organizers, advertising for the upcoming conference and talking a bit about the opportunities that Sofia offers. Tempting :-).

Monitoring with Perl and Unix::Statgrab was more about the metrics that are available for monitoring, and less about doing stuff with Perl. I was a bit disappointed.

The "Future Perl Versioning" Discussion was a very civilized discussion, with solid arguments. Whether anybody changed their minds remain to be seen.

Carl Mäsak gave two great talks: one on reactive programming, and one on regular expressions. I learned quite a bit in the first one, and simply enjoyed the second one.

After the lunch (tasty again), I attended Jonathan Worthington's third talk, MoarVM: a metamodel-focused runtime for NQP and Rakudo. Again this was a great talk, based on great work done by Jonathan and others during the last 12 months or so. MoarVM is a virtual machine designed for Perl 6's needs, as we understand them now (as opposed to parrot, which was designed towards Perl 6 as it was understood around 2003 or so, which is considerably different).

How to speak manager was amusing and offered a nice perspective on interactions between managers and programmers. Some of its advice assumed a non-tech-savvy manager, and thus didn't quite apply to my current work situation, but it was still interesting.

I must confess I don't remember too much of the rest of the talks that evening. I blame five days of traveling, hackathon and conference taking their toll on me.

The third session of lightning talks was again a varied mix, containing interesting technical tidbits, the usual "we are hiring" slogans, some touching and thoughtful moments, and finally a song by Piers Cawley. He had written the lyrics in the previous 18 hours (including sleep), to (afaict) a traditional Irish song. Standing up in front of ~300 people and singing a song that you haven't really had time to practise takes a huge amount of courage, and I admire Piers both for his courage and his great performance. I hope it was recorded, and makes its way to the public soon.

Finally the organizers spoke some closing words, and received their well-deserved share of applause.

As you might have guessed from this and the previous blog posts, I enjoyed this year's YAPC Europe very much, and found it well worth attending, and well organized. I'd like to give my heartfelt thanks to everybody who helped to make it happen, and to my employer for sending me there.

This being only my second YAPC, I can't make any far-reaching comparisons, but compared to YAPC::EU 2010 in Pisa I had an easier time making acquaintances. I cannot tell what the big difference was, but the buffet-style dinners at the pre-conference meeting and the river boat cruise certainly helped to increase the circulation and thus the number of people I talked to.

Dave's Free Press: Journal: YAPC::Europe 2007 travel plans

Perlgeek.de : A small regex optimization for NQP and Rakudo

Recently I read the course material of the Rakudo and NQP Internals Workshop, and had an idea for a small optimization for the regex engine. Yesterday night I implemented it, and I'd like to walk you through the process.

As a bit of background, the regex engine that Rakudo uses is actually implemented in NQP, and used by NQP too. The code I am about to discuss all lives in the NQP repository, but Rakudo profits from it too.

In addition one should note that the regex engine is mostly used for parsing with grammars, a process which involves nearly no scanning. Scanning is the process where the regex engine first tries to match the regex at the start of the string, and if it fails there, moves to the second character in the string, tries again, etc., until it succeeds.

But regexes that users write often involve scanning, and so my idea was to speed up regexes that scan, and where the first thing in the regex is a literal. In this case it makes sense to find possible start positions with a fast string search algorithm, for example the Boyer-Moore algorithm. The virtual machine backends for NQP already implement that as the index opcode, which can be invoked as start = index haystack, needle, startpos, where the string haystack is searched for the substring needle, starting from position startpos.
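
To get a feel for what that buys, here is a small Perl 6 sketch (variable names invented for illustration): instead of attempting the regex at positions 0, 1, 2 and so on, the index op jumps straight to the first candidate position.

my $haystack = 'xxxxxxfoobar';
my $needle   = 'foo';               # literal prefix of the regex
my $start    = index $haystack, $needle, 0;
say $start;                         # 6 -- the only position worth
                                    # trying the full regex at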

From reading the course material I knew I had to search for a regex type called scan, so that's what I did:

$ git grep --word scan
3rdparty/libtommath/bn_error.c:   /* scan the lookup table for the given message
3rdparty/libtommath/bn_mp_cnt_lsb.c:   /* scan lower digits until non-zero */
3rdparty/libtommath/bn_mp_cnt_lsb.c:   /* now scan this digit until a 1 is found
3rdparty/libtommath/bn_mp_prime_next_prime.c:                   /* scan upwards 
3rdparty/libtommath/changes.txt:       -- Started the Depends framework, wrote d
src/QRegex/P5Regex/Actions.nqp:                     QAST::Regex.new( :rxtype<sca
src/QRegex/P6Regex/Actions.nqp:                     QAST::Regex.new( :rxtype<sca
src/vm/jvm/QAST/Compiler.nqp:    method scan($node) {
src/vm/moar/QAST/QASTRegexCompilerMAST.nqp:    method scan($node) {
Binary file src/vm/moar/stage0/NQPP6QRegexMoar.moarvm matches
Binary file src/vm/moar/stage0/QASTMoar.moarvm matches
src/vm/parrot/QAST/Compiler.nqp:    method scan($node) {
src/vm/parrot/stage0/P6QRegex-s0.pir:    $P5025 = $P5024."new"("scan" :named("rx
src/vm/parrot/stage0/QAST-s0.pir:.sub "scan" :subid("cuid_135_1381944260.6802") 
src/vm/parrot/stage0/QAST-s0.pir:    push $P5004, "scan"

The binary files and .pir files are generated code included just for bootstrapping, and not interesting for us. The files in 3rdparty/libtommath are there for bigint handling, thus not interesting for us either. The rest are good matches: src/QRegex/P6Regex/Actions.nqp is responsible for compiling Perl 6 regexes to an abstract syntax tree (AST), and src/vm/parrot/QAST/Compiler.nqp compiles that AST down to PIR, the assembly language that the Parrot Virtual Machine understands.

So, looking at src/QRegex/P6Regex/Actions.nqp the place that mentions scan looked like this:

    $block<orig_qast> := $qast;
    $qast := QAST::Regex.new( :rxtype<concat>,
                 QAST::Regex.new( :rxtype<scan> ),
                 $qast,
                 ($anon
                      ?? QAST::Regex.new( :rxtype<pass> )
                      !! (nqp::substr(%*RX<name>, 0, 12) ne '!!LATENAME!!'
                            ?? QAST::Regex.new( :rxtype<pass>, :name(%*RX<name>) )
                            !! QAST::Regex.new( :rxtype<pass>,
                                   QAST::Var.new(
                                       :name(nqp::substr(%*RX<name>, 12)),
                                       :scope('lexical')
                                   ) 
                               )
                          )));

So to make the regex scan, the AST (in $qast) is wrapped in QAST::Regex.new(:rxtype<concat>,QAST::Regex.new( :rxtype<scan> ), $qast, ...), plus some stuff I don't care about.

To make the optimization work, the scan node needs to know what to scan for, if the first thing in the regex is indeed a constant string, aka literal. If it is, $qast is either directly of rxtype literal, or a concat node where the first child is a literal. As a patch, it looks like this:

--- a/src/QRegex/P6Regex/Actions.nqp
+++ b/src/QRegex/P6Regex/Actions.nqp
@@ -667,9 +667,21 @@ class QRegex::P6Regex::Actions is HLL::Actions {
     self.store_regex_nfa($code_obj, $block, QRegex::NFA.new.addnode($qast))
     self.alt_nfas($code_obj, $block, $qast);
 
+    my $scan := QAST::Regex.new( :rxtype<scan> );
+    {
+        my $q := $qast;
+        if $q.rxtype eq 'concat' && $q[0] {
+            $q := $q[0]
+        }
+        if $q.rxtype eq 'literal' {
+            nqp::push($scan, $q[0]);
+            $scan.subtype($q.subtype);
+        }
+    }
+
     $block<orig_qast> := $qast;
     $qast := QAST::Regex.new( :rxtype<concat>,
-                 QAST::Regex.new( :rxtype<scan> ),
+                 $scan,
                  $qast,

Since scan nodes have always been empty so far, the code generators don't look at their child nodes, and adding one with nqp::push($scan, $q[0]); won't break anything on backends that don't support this optimization yet (which, after just this patch, were all of them). Running make test confirmed that.

My original patch did not contain the line $scan.subtype($q.subtype);, and later on some unit tests started to fail, because regex matches can be case insensitive, but the index op only works case-sensitively. For case insensitive matches, the $q.subtype of the literal regex node would be ignorecase, so that information needs to be carried on to the code generation backend.

Once that part was in place, and some debug nqp::say() statements confirmed that it indeed worked, it was time to look at the code generation. For the parrot backend, it looked like this:

    method scan($node) {
        my $ops := self.post_new('Ops', :result(%*REG<cur>));
        my $prefix := self.unique('rxscan');
        my $looplabel := self.post_new('Label', :name($prefix ~ '_loop'));
        my $scanlabel := self.post_new('Label', :name($prefix ~ '_scan'));
        my $donelabel := self.post_new('Label', :name($prefix ~ '_done'));
        $ops.push_pirop('repr_get_attr_int', '$I11', 'self', %*REG<curclass>, '"$!from"');
        $ops.push_pirop('ne', '$I11', -1, $donelabel);
        $ops.push_pirop('goto', $scanlabel);
        $ops.push($looplabel);
        $ops.push_pirop('inc', %*REG<pos>);
        $ops.push_pirop('gt', %*REG<pos>, %*REG<eos>, %*REG<fail>);
        $ops.push_pirop('repr_bind_attr_int', %*REG<cur>, %*REG<curclass>, '"$!from"', %*REG<pos>);
        $ops.push($scanlabel);
        self.regex_mark($ops, $looplabel, %*REG<pos>, 0);
        $ops.push($donelabel);
        $ops;
    }

While a bit intimidating at first, staring at it for a while quickly made clear what kind of code it emits. First, three labels are generated, to which the code can jump with goto $label: one as a jump target for the loop that increments the cursor position ($looplabel), one for doing the regex match at that position ($scanlabel), and $donelabel for jumping to when the whole thing has finished.

Inside the loop there is an increment (inc) of the register that holds the current position (%*REG<pos>), that position is compared to the end-of-string position (%*REG<eos>), and if it is larger, the cursor is marked as failed.

So the idea is to advance the position by one, and then instead of doing the regex match immediately, call the index op to find the next position where the regex might succeed:

--- a/src/vm/parrot/QAST/Compiler.nqp
+++ b/src/vm/parrot/QAST/Compiler.nqp
@@ -1564,7 +1564,13 @@ class QAST::Compiler is HLL::Compiler {
         $ops.push_pirop('goto', $scanlabel);
         $ops.push($looplabel);
         $ops.push_pirop('inc', %*REG<pos>);
-        $ops.push_pirop('gt', %*REG<pos>, %*REG<eos>, %*REG<fail>);
+        if nqp::elems($node.list) && $node.subtype ne 'ignorecase' {
+            $ops.push_pirop('index', %*REG<pos>, %*REG<tgt>, self.rxescape($node[0]), %*REG<pos>);
+            $ops.push_pirop('eq', %*REG<pos>, -1, %*REG<fail>);
+        }
+        else {
+            $ops.push_pirop('gt', %*REG<pos>, %*REG<eos>, %*REG<fail>);
+        }
         $ops.push_pirop('repr_bind_attr_int', %*REG<cur>, %*REG<curclass>, '"$!from"', %*REG<pos>);
         $ops.push($scanlabel);
         self.regex_mark($ops, $looplabel, %*REG<pos>, 0);

The index op returns -1 on failure, so the condition for a cursor failure is slightly different than before.

And as mentioned earlier, the optimization can only be safely done for matches that don't ignore case. Maybe with some additional effort that could be remedied, but it's not as simple as case-folding the target string, because some case folding operations can change the string length (for example ß becomes SS when uppercased).

After successfully testing the patch, I came up with a small, artificial benchmark designed to show a difference in performance for this particular case. And indeed, it sped things up from 647 ± 28 µs to 161 ± 18 µs, which is roughly a factor of four.

You can see the whole thing as two commits on github.

What remains to do is implementing the same optimization on the JVM and MoarVM backends, and of course other optimizations. For example the Perl 5 regex engine keeps track of minimal and maximal string lengths for each subregex, and can anchor a regex like /a?b?longliteral/ to 0..2 characters before a match of longliteral, and generally use that meta information to fail faster.

But for now I am mostly encouraged that doing a worthwhile optimization was possible in a single evening without any black magic, or too intimate knowledge of the code generation.

Update: the code generation for MoarVM now also uses the index op. The logic is the same as for the parrot backend, the only difference is that the literal needs to be loaded into a register (whose name fresh_s returns) before index_s can use it.

Perlgeek.de : Quo Vadis Perl?

The last two days we had a gathering in a town named Perl (yes, a place with that name exists). It's a lovely little town close to the borders with France and Luxembourg, and our meeting was titled "Perl Reunification Summit".

Sadly I only managed to arrive in Perl on Friday late in the night, so I missed the first day. Still it was totally worth it.

We tried to answer the question of how to make the Perl 5 and the Perl 6 community converge on a social level. While we haven't found the one true answer to that, we did find that discussing the future together, both on a technical and on a social level, already brought us closer together.

It was quite a touching moment when Merijn "Tux" Brand explained that he was skeptical of Perl 6 before the summit, and now sees it as the future.

We also concluded that copying API design is a good way to converge on a technical level. For example Perl 6's IO subsystem is in desperate need of a cohesive design. However, neither the Perl 6 specification authors nor the Rakudo development team have much experience in that area, and copying from successful Perl 5 modules is a viable approach here. Path::Class and IO::All (excluding the crazy parts) were mentioned as targets worth looking at.

There is now also an IRC channel to continue our discussions -- join #p6p5 on irc.perl.org if you are interested.

We also discussed ways to bring parallel programming to both perls. I missed most of the discussion, but did hear that one approach is to make it easier to send other processes some serialized objects, and thus distribute work among several cores.

Patrick Michaud gave a short ad-hoc presentation on implicit parallelism in Perl 6. There are several constructs where the language allows parallel execution, for example hyper operators, junctions and feeds (think of feeds as UNIX pipes, but ones that allow passing of objects and not just strings); each is sketched below. Rakudo doesn't implement any of them in parallel right now, because the Parrot Virtual Machine does not provide the necessary primitives yet.
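
A sketch of the constructs mentioned (illustrative only; feed support in Rakudo may be incomplete):

my @numbers = 1 .. 5;
my $x       = 2;
my @doubled = @numbers >>*>> 2;                  # hyper operator
say 'small' if $x == 1 | 2 | 3;                  # junction
@numbers ==> map { $_ ** 2 } ==> my @squares;    # feed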

Besides the "official" program, everybody used the time in meat space to discuss their favorite projects with everybody else. For example I took some time to discuss the future of doc.perl6.org with Patrick and Gabor Szabgab, and the relation to perl6maven with the latter. The Rakudo team (which was nearly completely present) also discussed several topics, and I was happy to talk about the relation between Rakudo and Parrot with Reini Urban.

Prior to the summit my expectations were quite vague. That's why it's hard for me to tell if we achieved what we and the organizers wanted. Time will tell, and we want to summarize the result in six to nine months. But I am certain that many participants have changed some of their views in positive ways, and left the summit with a warm, fuzzy feeling.

I am very grateful to have been invited to such a meeting, and enjoyed it greatly. Our host and organizers, Liz and Wendy, took care of all of our needs -- travel, food, drinks, space, wifi, accommodation, more food, entertainment, food for thought, you name it. Thank you very much!

Update: Follow the #p6p5 hash tag on twitter if you want to read more, I'm sure other participants will blog too.

Other blog posts on this topic: PRS2012 – Perl5-Perl6 Reunification Summit by mdk and post-yapc by theorbtwo

Dave's Free Press: Journal: Wikipedia handheld proxy

Dave's Free Press: Journal: Bryar security hole

Dave's Free Press: Journal: Thankyou, Anonymous Benefactor!

Dave's Free Press: Journal: Number::Phone release

Dave's Free Press: Journal: Ill

Dave's Free Press: Journal: CPANdeps upgrade

Perlgeek.de : iPod nano 5g on linux -- works!

For Christmas I got an iPod nano (5th generation). Since I use only Linux on my home computers, I searched the Internet for how well it is supported by Linux-based tools. The results looked bleak, but they were mostly from 2009.

Now (December 2012) on my Debian/Wheezy system, it just worked.

The iPod nano 5g presents itself as an ordinary USB storage device, which you can mount without problems. However simply copying files onto it won't make the iPod show those files in the playlists, because there is some meta data stored on the device that must be updated too.

There are several user-space programs that allow you to import and export music from and to the iPod, and update those meta data files as necessary. The first one I tried, gtkpod 2.1.2, worked fine.

Other user-space programs reputed to work with the iPod are rhythmbox and amarok (both of which not only organize but also play music).

Although I don't think anything really depends on some particular versions here (except that you need a new enough version of gtkpod), here is what I used:

  • Architecture: amd64
  • Linux: 3.2.0-4-amd64 #1 SMP Debian 3.2.35-2
  • Userland: Debian GNU/Linux "Wheezy" (currently "testing")
  • gtkpod: 2.1.2-1

Dave's Free Press: Journal: CPANdeps

Dave's Free Press: Journal: Module pre-requisites analyser

Dave's Free Press: Journal: Perl isn't dieing

Dave's Free Press: Journal: YAPC::Europe 2007 report: day 3

Perlgeek.de : The Fun of Running a Public Web Service, and Session Storage

One of my websites, Sudokugarden, recently surged in traffic, from about 30k visitors per month to more than 100k visitors per month. Here's the tale of what that meant for the server side.

As a bit of background, I built the website in 2007, when I knew a lot less about the web and programming. It runs on a host that I share with a few friends; I don't have root access on that machine, though when the admin is available, I can generally ask him to install stuff for me.

Most parts of the website are built as static HTML files, with Server Side Includes. Parts of those SSIs are Perl CGI scripts. The most popular part though, which allows you to solve Sudoku in the browser and keeps hiscores, is written as a collection of Perl scripts, backed by a mysql database.

When at peak times the site had more than 10k visitors a day, lots of visitors would get a nasty mysql: Cannot connect: Too many open connections error. The admin wasn't available for bumping the connection limit, so I looked for other solutions.

My first action was to check the logs for spammers and crawlers that might have hammered the page, and I found and banned some; but the bulk of the traffic looked completely legitimate, and the problem persisted.

Looking at the seven year old code, I realized that most pages didn't actually need a database connection, if only I could remove the session storage from the database. And, in fact, I could. I used CGI::Session, which has a pluggable backend. Switching to a file-based session backend was just a matter of changing the connection string and adding a directory for session storage. Luckily the code was clean enough that this only affected a single subroutine. Everything was fine.
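
In code, the switch amounted to roughly this (a sketch using CGI::Session's driver strings; the directory path is invented):

use CGI::Session;

# before: sessions stored in the mysql database
# my $session = CGI::Session->new('driver:mysql', $sid, { Handle => $dbh });

# after: file-based sessions in a dedicated directory
# (an undef session id makes CGI::Session create a new session)
my $session = CGI::Session->new('driver:file', undef,
                                { Directory => '/var/tmp/sessions' });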

For a while.

Then, about a month later, the host ran out of free disk space. Since it is used for other stuff too (like email, and web hosting for other users) it took me a while to make the connection to the file-based session storage. What had happened was 3 million session files on an ext3 file system with a block size of 4 kilobytes. A session is only about 400 bytes, but since a file uses up a multiple of the block size, the session storage amounted to 12 gigabytes of used-up disk space, which was all that was left on that machine.

Deleting those sessions turned out to be a problem; I could only log in as my own user, which doesn't have write access to the session files (which are owned by www-data, the Apache user). The solution was to upload a CGI script that deleted the session, but of course that wasn't possible at first, because the disk was full. In the end I had to delete several gigabyte of data from my home directory before I could upload anything again. (Processes running as root were still writing to reserved-to-root portions of the file system, which is why I had to delete so much data before I was able to write again).

Even when I was able to upload the deletion script, it took quite some time to actually delete the session files; mostly because the directory was too large, and deleting files on ext3 is slow. When the files were gone, the empty session directory still used up 200MB of disk space, because the directory index doesn't shrink on file deletion.

Clearly a better solution to session storage was needed. But first I investigated where all those sessions came from, and banned a few spamming IPs. I also changed the code to only create sessions when somebody logs in, not give every visitor a session from the start.

My next attempt was to write the sessions to an SQLite database. It uses about 400 bytes per session (plus a fixed overhead for the db file itself), so it uses only a tenth of the storage space that the file-based storage used. The SQLite database has no connection limit, though the old-ish version that was installed on the server doesn't seem to have very fine-grained locking either; within a few days I got errors saying that the session database was locked.

So I added another layer of workaround: creating a separate session database per leading IP octet. So now there are up to 255 separate session databases (plus a 256th for all IPv6 addresses; a decision that will have to be revised when IPv6 usage rises). After a few days of operation, it seems that this setup works well enough. But suspicious as I am, I'll continue monitoring both disk usage and errors from Apache.
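
The shard selection itself is simple; a sketch of the idea (helper name and path invented, not the site's actual code):

sub session_db_file {
    my ($ip) = @_;
    # IPv4 addresses shard by their leading octet;
    # everything else (IPv6) shares one extra database
    my ($octet) = $ip =~ /^(\d{1,3})\./;
    my $shard   = defined $octet ? $octet : 'ipv6';
    return "/var/tmp/sessions/sessions-$shard.sqlite";
}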

So, what happens if this solution fails to work out? I can see basically two approaches: move the site to a server that's fully under my control, and use redis or memcached for session storage; or implement sessions with signed cookies that are stored purely on the client side.

Perlgeek.de : YAPC Europe 2013 Day 2

The second day of YAPC Europe was enjoyable and informative.

I learned about ZeroMQ, which is a bit like sockets on steroids. Interesting stuff. Sadly, Design decisions on p2 didn't quite qualify as interesting.

Matt's PSGI archive is a project to rewrite Matt's infamous script archive in modern Perl. Very promising, and a bit entertaining too.

Lunch was very tasty, more so than the usual mass catering. Kudos to the organizers!

After lunch, jnthn talked about concurrency, parallelism and asynchrony in Perl 6. It was a great talk, backed by great work on the compiler and runtime. Jonathan's talks are always to be recommended.

I think I didn't screw up my own talk too badly, at least the timing worked fine. I just forgot to show the last slide. No real harm done.

I also enjoyed mst's State of the Velociraptor, which was a summary of what went on in the Perl world in the last year. (Much better than the YAPC::EU 2010 talk with the same title).

The Lightning talks were as enjoyable as those from the previous day. So all fine!

Next up is the river cruise, I hope to blog about that later on.

Perlgeek.de : Stop The Rewrites!

What follows is a rant. If you're not in the mood to read a rant right now, please stop and come back in an hour or two.

The Internet is full of people who know better than you how to manage your open source project, even if they only know some bits and pieces about it. News at 11.

But there is one particular instance of that advice that I hear often applied to Rakudo Perl 6: Stop the rewrites.

To be honest, I can fully understand the sentiment behind that advice. People see that it has taken us several years to get where we are now, and in their opinion, that's too long. And now we shouldn't waste our time with rewrites, but get the darn thing running already!

But software development simply doesn't work that way. Especially not if your target is moving, as is Perl 6. (Ok, Perl 6 isn't moving that much anymore, but there are still areas we don't understand very well, so our current understanding of Perl 6 is a moving target).

At some point or another, you realize that with your current design, you can only pile workaround on top of workaround, and hope that the whole thing never collapses.

[Picture of a Jenga tower; image courtesy of sermoa]

Those people who spread the good advice to never do any major rewrites again, they never address what you should do when you face such a situation. Build the tower of workarounds even higher, and pray to Cthulhu that you can build it robust enough to support a whole stack of third-party modules?

Curiously this piece of advice occasionally comes from people who otherwise know a thing or two about software development methodology.

I should also add that since the famous "nom" switchover, which admittedly caused lots of fallout, we had three major rewrites of subsystems (longest-token matching of alternations, bounded serialization and qbootstrap), all three of which caused no new test failures, and two of which caused no fallout from the module ecosystem at all. In return, we have much faster startup (factor 3 to 4 faster) and a much more correct regex engine.

Perlgeek.de : The REPL trick

A recent discussion on IRC prompted me to share a small but neat trick with you.

If there are things you want to do quite often in the Rakudo REPL (the interactive "Read-Evaluate-Print Loop"), it makes sense to create a shortcut for them. And creating shortcuts for often-used stuff is what programming languages excel at, so you do it right in a Perl module:

use v6;
module REPLHelper;

sub p(Mu \x) is export {
    x.^mro.map: *.^name;
}

I have placed mine in $HOME/.perl6/repl.

And then you make sure it's loaded automatically:

$ alias p6repl="perl6 -I$HOME/.perl6/repl/ -MREPLHelper"
$ p6repl
> p Int
Int Cool Any Mu
>

Now you have a neat one-letter function which tells you the parents of an object or a type, in method resolution order. And a way to add more shortcuts when you need them.
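
For example, a second shortcut that lists the methods defined directly in a type might look like this (a sketch; lm is an arbitrary name):

sub lm(Mu \x) is export {
    x.^methods(:local).map: *.name;
}

After restarting the REPL, lm Int would then list Int's own methods.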

Perlgeek.de : News in the Rakudo 2012.06 release

Rakudo development continues to progress nicely, and so there are a few changes in this month's release worth explaining.

Longest Token Matching, List Iteration

The largest chunk of development effort went into Longest-Token Matching for alternations in Regexes, about which Jonathan already blogged. Another significant piece was Patrick's refactor of list iteration. You probably won't notice much of that, except that for-loops are now a bit faster (maybe 10%), and laziness works more reliably in a couple of cases.

String to Number Conversion

String to number conversion is now stricter than before. Previously an expression like +"foo" would simply return 0. Now it fails, ie returns an unthrown exception. If you treat that unthrown exception like a normal value, it blows up with a helpful error message, saying that the conversion to a number has failed. If that's not what you want, you can still write +$str // 0.
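
A short sketch of the new behaviour:

my $x = +"foo";     # $x now holds an unthrown exception
say +"foo" // 0;    # 0  -- the explicit fallback still works
say +"42";          # 42 -- valid numeric strings convert as before
# say $x + 1;       # would blow up with the helpful error message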

require With Argument Lists

require now supports argument lists, and that needs a bit more explaining. In Perl 6 routines are by default only looked up in lexical scopes, and lexical scopes are immutable at run time. So, when loading a module at run time, how do you make functions available to the code that loads the module? Well, you determine at compile time which symbols you want to import, and then do the actual importing at run time:

use v6;
require Test <&plan &ok &is>;
#            ^^^^^^^^^^^^^^^ evaluated at compile time,
#                            declares symbols &plan, &ok and &is
#       ^^^                  loaded at run time

Module Load Debugging

Rakudo had some trouble when modules were precompiled, but their dependencies were not. This happens more often than it sounds, because Rakudo checks the timestamps of the involved files, and loads the source version if it is newer than the compiled file. Since many file operations (including simple copying) change the time stamp, that could happen very easily.

To make debugging of such errors easier, you can set the RAKUDO_MODULE_DEBUG environment variable to 1 (or any positive number; currently there is only one debugging level, in the future higher numbers might lead to more output).

$ RAKUDO_MODULE_DEBUG=1 ./perl6 -Ilib t/spec/S11-modules/require.t
MODULE_DEBUG: loading blib/Perl6/BOOTSTRAP.pbc
MODULE_DEBUG: done loading blib/Perl6/BOOTSTRAP.pbc
MODULE_DEBUG: loading lib/Test.pir
MODULE_DEBUG: done loading lib/Test.pir
1..5
MODULE_DEBUG: loading t/spec/packages/Fancy/Utilities.pm
MODULE_DEBUG: done loading t/spec/packages/Fancy/Utilities.pm
ok 1 - can load Fancy::Utilities at run time
ok 2 - can call our-sub from required module
MODULE_DEBUG: loading t/spec/packages/A.pm
MODULE_DEBUG: loading t/spec/packages/B.pm
MODULE_DEBUG: loading t/spec/packages/B/Grammar.pm
MODULE_DEBUG: done loading t/spec/packages/B/Grammar.pm
MODULE_DEBUG: done loading t/spec/packages/B.pm
MODULE_DEBUG: done loading t/spec/packages/A.pm
ok 3 - can require with variable name
ok 4 - can call subroutines in a module by name
ok 5 - require with import list

Module Loading Traces in Compile-Time Errors

If module myA loads module myB, and myB dies during compilation, you now get a backtrace which indicates through which path the erroneous module was loaded:

$ ./perl6 -Ilib -e 'use myA'
===SORRY!===
Placeholder variable $^x may not be used here because the surrounding block
takes no signature
at lib/myB.pm:1
  from module myA (lib/myA.pm:3)
  from -e:1

Improved autovivification

Perl allows you to treat not-yet-existing array and hash elements as arrays or hashes, and automatically creates those elements for you. This is called autovivification.

my %h;
%h<x>.push: 1, 2, 3; # worked in the previous release too
push %h<y>, 4, 5, 6; # newly works in the 2012.06

Dave's Free Press: Journal: Travelling in time: the CP2000AN

Perlgeek.de : Localization for Exception Messages

Ok, my previous blog post wasn't quite as final as I thought. My exceptions grant said that the design should make it easy to enable localization and internationalization hooks. I want to discuss some possible approaches and thereby demonstrate that the design is flexible enough as it is.

At this point I'd like to mention that much of the flexibility comes from either Perl 6 itself, or from the separation of stringifying an exception and generating the actual error message.

Mixins: the sledgehammer

One can always override a method in an object by mixing in a role which contains the method in question. When the user requests error messages in a different language, one can replace method Str or method message with one that generates the message in the requested language.

Where should that happen? The code that throws exceptions is fairly scattered over the code base, but there is a central piece of code in Rakudo that turns Parrot-level exceptions into Perl 6 level exceptions. That would be an obvious place to muck with exceptions, but it would mean that exceptions that are created but not thrown don't get the localization. I suspect that's a fairly small problem in the real world, but it still carries code smell. As does the whole idea of overriding methods.

Another sledgehammer: alternative setting

Perl 6 provides built-in types and routines in an outer lexical scope known as a "setting". The default setting is called CORE. Due to the lexical nature of almost all lookups in Perl 6, one can "override" almost anything by providing a symbol of the same name in a lexical scope.

One way to use that for localization is to add another setting between the user's code and CORE. For example a file DE.setting:

my class X::Signature::Placeholder does X::Comp {
    method message() {
        'Platzhaltervariablen können keine bestehenden Signaturen überschreiben';
    }
}

After compiling, we can load the setting:

$ ./perl6 --target=pir --output=DE.setting.pir DE.setting
$ ./install/bin/parrot -o DE.setting.pbc DE.setting.pir
$ ./perl6 --setting=DE -e 'sub f() { $^x }'
===SORRY!===
Platzhaltervariablen können keine bestehenden Signaturen überschreiben
at -e:1

That works beautifully for exceptions that the compiler throws, because they look up exception types in the scope where the error occurs. Exceptions from within the setting are a different beast; they'd need special lookup rules (though the setting throws far fewer exceptions than the compiler, so that's probably manageable).

But while this looks quite simple, it comes with a problem: if a module is precompiled without the custom setting, and it contains a reference to an exception type, and then the l10n setting redefines it, other programs will contain references to a different class with the same name. Which means that our precompiled module might only catch the English version of X::Signature::Placeholder, and lets our localized exception pass through. Oops.

Tailored solutions

A better approach is probably to simply hack up the string conversion in type Exception to consider a translator routine if present, and pass the invocant to that routine. The translator routine can look up the error message keyed by the type of the exception, and has access to all data carried in the exception. In untested Perl 6 code, this might look like this:

# required change in CORE
my class Exception {
    multi method Str(Exception:D:) {
        return self.message unless defined $*LANG;
        if %*TRANSLATIONS{$*LANG}{self.^name} -> $translator {
            return $translator(self);
        }
        return self.message; # fallback
    }
}

# that's what a translator could write:

%*TRANSLATIONS<de><X::TypeCheck::Assignment> = {
        "Typenfehler bei Zuweisung zu '$_.symbol()': "
        ~ "'{$_.expected.^name}' erwartet, aber '{$_.got.^name}' bekommen"
    };

And setting the dynamic variable $*LANG to 'de' would give a German error message for type check failures in assignment.

Another approach is to augment existing error classes and add methods that generate the error message in different languages, for example method message-fr for French, and check their existence in Exception.Str if a different language is requested.
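
In equally untested code, that could look roughly like this (the class is just one example, the wording of the French message is mine, and the pragma spelling follows Rakudo of that era):

use MONKEY_TYPING;
augment class X::Signature::Placeholder {
    method message-fr() {
        'Les variables de remplacement ne peuvent pas remplacer des signatures existantes';
    }
}
# Exception.Str would then call self."message-$*LANG"() if such a
# method exists, and fall back to self.message otherwise.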

Conclusion

In conclusion there are many bad and enough good approaches; we will decide which one to take when the need arises (ie when people actually start to translate error messages).

Dave's Free Press: Journal: YAPC::Europe 2007 report: day 1

Ocean of Awareness: Significant newlines? Or semicolons?

Should statements have explicit terminators, like the semicolon of Perl and the C language? Or should they avoid the clutter, and separate statements by giving whitespace syntactic significance and a real effect on the semantics, as is done in Python and Javascript?

Actually we don't have to go either way. As an example, let's look at some BNF-ish DSL. It defines a small calculator. At first glance, it looks as if this language has taken the significant-whitespace route -- there certainly are no explicit statement terminators.

:default ::= action => ::first
:start ::= Expression
Expression ::= Term
Term ::=
      Factor
    | Term '+' Term action => do_add
Factor ::=
      Number
    | Factor '*' Factor action => do_multiply
Number ~ digits
digits ~ [\d]+
:discard ~ whitespace
whitespace ~ [\s]+

The rule is that there isn't one

If we don't happen to like the layout of the above DSL, and rearrange it in various ways, we'll find that everything we try works. If we become curious about what exactly the rules for newlines are, and look at the documentation, we won't find any. That's because there aren't any.

We can see this by thoroughly messing up the line structure:

:default ::= action => ::first :start ::= Expression Expression ::= Term
Term ::= Factor | Term '+' Term action => do_add Factor ::= Number |
Factor '*' Factor action => do_multiply Number ~ digits digits ~
[\d]+ :discard ~ whitespace whitespace ~ [\s]+

The script will continue to run just fine.

How does it work?

How does it work? Actually, pose the question this way: Can a human reader tell where the statements end? If the reader is not used to reading BNF, he might have trouble with this particular example but, for a language that he knows, the answer is simple: Yes, of course he can. So really the question is, why do we expect the parser to be so stupid that it cannot?

The only trick is that this is done without trickery. Marpa's DSL is written in itself, and Marpa's self-grammar describes exactly what a statement is and what it is not. The Marpa parser is powerful enough to simply take this self-describing DSL and act on it, finding where statements begin and end, much as a human reader is able to.

To learn more

This example was produced with the Marpa parser. Marpa::R2 is available on CPAN. The code for this example is based on that in the synopsis for its top-level document, but it is isolated conveniently in a Github gist.
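
For the impatient, here is a minimal sketch of driving the calculator DSL through Marpa::R2's Scanless interface (the semantics package and action bodies are my invention; the gist remains the authoritative version):

use 5.010;
use Marpa::R2;

my $dsl = <<'END_OF_DSL';
:default ::= action => ::first
:start ::= Expression
Expression ::= Term
Term ::=
      Factor
    | Term '+' Term action => do_add
Factor ::=
      Number
    | Factor '*' Factor action => do_multiply
Number ~ digits
digits ~ [\d]+
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_DSL

# the two actions named in the DSL
sub My_Actions::do_add      { return $_[1] + $_[3] }
sub My_Actions::do_multiply { return $_[1] * $_[3] }

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
my $recce   = Marpa::R2::Scanless::R->new(
    { grammar => $grammar, semantics_package => 'My_Actions' } );
$recce->read( \'2 + 3 * 4' );
say ${ $recce->value() };    # 14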

A list of my Marpa tutorials can be found here. There are new tutorials by Peter Stuifzand and amon. The Ocean of Awareness blog focuses on Marpa, and it has an annotated guide. Marpa has a web page that I maintain and Ron Savage maintains another. For questions, support and discussion, there is the "marpa parser" Google Group. Comments on this post can be made there.

Dave's Free Press: Journal: Thanks, Yahoo!

Dave's Free Press: Journal: POD includes

Dave's Free Press: Journal: cgit syntax highlighting

Perlgeek.de : First day at YAPC::Europe 2013 in Kiev

Today was the first "real" day of YAPC Europe 2013 in Kiev. In the same sense that today was the first real day, yesterday was quite a nice "unreal" conference day, with a day-long Perl 6 hackathon, and in the evening a pre-conference meeting at a Soviet-style restaurant with tasty food and beverages.

The talks started with a few words of welcome, and then the announcement that the YAPC Europe next year will be in Sofia, Bulgaria, with the small side note that there were actually three cities competing for that honour. Congratulations to Sofia!

Larry's traditional keynote was quite emotional, and he had to fight tears a few times. Having had cancer and related surgeries in the past year, he still does his perceived duty to the Perl community, which I greatly appreciate.

Afterwards Dave Cross talked about 25 years of Perl in 25 minutes, which was a nice walk through some significant developments in the Perl world, though a bit hasty. Maybe picking fewer events and spending a bit more time on the selected few would give a smoother experience.

Another excellent talk that ran out of time was on Redis. Having experimented a wee bit with Redis in the past month, this was a real eye-opener on the wealth of features we might have used for a project at work, but in the end we didn't. Maybe we will eventually revise that decision.

Ribasushi talked about how hard benchmarking really is, and while I was (in principle) aware of that fact that it's hard to get right, there were still several significant factors that I overlooked (like the CPU's tendency to scale frequency in response to thermal and power-management considerations). I also learned that I should use Dumbbench instead of the Benchmark.pm core module. Sadly it didn't install for me (Capture::Tiny tests failing on Mac OS X).

The Perl 6 is dead, long live Perl 5 talk was much less inflammatory than the title would suggest (maybe due to Larry touching on the subject briefly during the keynote). It was mostly about how Perl 5 is used in the presenter's company, which was mildly interesting.

After a tasty free lunch I attended jnthn's talk on Rakudo on the JVM, which was (as is typical for jnthn's talks) both entertaining and taught me something, even though I had followed the project quite a bit.

Thomas Klausner's Bread::Board by example made me want to refactor the OTRS internals very badly, because it is full of the anti-patterns that Bread::Board can solve in a much better way. I think that the OTRS code base is big enough to warrant the usage of Bread::Board.

I enjoyed Denis' talk on Method::Signatures, and was delighted to see that most syntax is directly copied from Perl 6 signature syntax. Talk about Perl 6 sucking creativity out of Perl 5 development.

The conference ended with a session of lightning talks, something which I always enjoy. Many lightning talks had a slightly funny tone or undertone, while still talking about interesting stuff.

Finally there was the "kick-off party", beverages and snacks sponsored by booking.com. There (and really the whole day, and yesterday too) I not only had conversations with my "old" Perl 6 friends, but also talked with many interesting people I never met before, or only met online before.

So all in all it was a nice experience, both from the social side, and from quality and contents of the talks. Venue and food are good, and the wifi too, except when it stops working for a few minutes.

I'm looking forward to two more days of conference!

(Updated: Fixed Thomas' last name)

Ocean of Awareness: Marpa v. Parse::RecDescent: a rematch

The application

In a recent post, I looked at an unusual language which serializes arrays and strings, using a mixture of counts and parentheses. Here is an example:

A2(A2(S3(Hey)S13(Hello, World!))S5(Ciao!))

The language is of special interest for comparison against recursive descent because, while simple, it requires procedural parsing -- a purely declarative BNF approach will not work. So it's a chance to find out if Marpa can play the game that is recursive descent's specialty.

The previous post focused on how to use Marpa to mix procedural and declarative parsing together smoothly, from a coding point of view. It only hinted at another aspect: speed. Over the last year, Marpa has greatly improved its speed for this kind of application. The latest release of Marpa::R2 now clocks in almost 100 times faster than Parse::RecDescent for long inputs.

The benchmark

Length    Seconds
          Marpa::R2    Marpa::XS    Parse::RecDescent
 1000       1.569        2.938          13.616
 2000       2.746        7.067          62.083
 3000       3.935       13.953         132.549
10000      12.270      121.654        1373.171

Parse::RecDescent is pure Perl, while Marpa is based on a parse engine in a library written in hand-optimized C. You'd expect Marpa to win this race and it did.

And it is nice to see that the changes from Marpa::XS to Marpa::R2 have paid off. Included in the table are the Marpa numbers from my 2012 benchmark of Marpa::XS. Marpa::R2 has a new interface and an internal lexer, and now beats Marpa::XS by a factor of up to 10.

While the benchmarked language is ideally suited to show recursive descent to advantage, the input lengths were picked to emphasize Marpa's strengths. Marpa optimizes by doing a lot of precomputation, and is written with long inputs in mind. Though these days, a 500K source, longer than the longest tested, would not exactly set a new industry record.

To learn more

There are fuller descriptions of the language in Flavio's post and code, and my recent post on how to write a parser for this language. I talk more about the benchmark's methodology in my post on the 2012 benchmark.

Marpa::R2 is available on CPAN. A list of my Marpa tutorials can be found here. There is a new tutorial by Peter Stuifzand. The Ocean of Awareness blog focuses on Marpa, and it has an annotated guide. Marpa also has a web page. For questions, support and discussion, there is a Google Group: marpa-parser@googlegroups.com. Comments on this post can be made there.

Dave's Free Press: Journal: CPAN Testers' CPAN author FAQ

Perlgeek.de : Correctness in Computer Programs and Mathematical Proofs

While reading On Proof and Progress in Mathematics by Fields Medal winner Bill Thurston (recently deceased, I was sorry to hear), I came across this gem:

The standard of correctness and completeness necessary to get a computer program to work at all is a couple of orders of magnitude higher than the mathematical community’s standard of valid proofs. Nonetheless, large computer programs, even when they have been very carefully written and very carefully tested, always seem to have bugs.

I noticed that mathematicians are often sloppy about the scope of their symbols. Sometimes they use the same symbol for two different meanings, and you have to guess from context which one is meant.

This kind of sloppiness generally doesn't have an impact on the validity of the ideas that are communicated, as long as it's still understandable to the reader.

I guess one reason is that most mathematical publications still stick to one-letter symbol names, and there aren't that many letters in the alphabets that are generally accepted for usage (Latin, Greek, a few letters from Hebrew). And in the programming world we snort derisively at FORTRAN 77, which limited variable names to a length of 6 characters.

Shawn M Moore: Structured Exceptions in Moose Mentorship

For the past six months, I have been mentoring a student named Upasana for Moose. This mentorship was done through the GNOME Outreach Program for Women and was sponsored by The Perl Foundation. Our project was to convert Moose's hundreds of exceptions from strings to a hierarchy of exception classes. This increases robustness of the entire Moose ecosystem and at the same time allows Moose to be more aggressive in updating (perhaps eventually translating?) its error messages.

Yesterday, Upasana's structured exceptions branch landed in Moose. Her apprenticeship is now officially complete. I couldn't be happier with her work. Thank you so much for persisting through a long and challenging summer, Upasana!

Working with Upasana has been a blast. It was very rewarding to teach advanced concepts and to have those lessons not only learned, but also applied. This mentorship has also given me a real appreciation for how difficult it is to break into your first programming community. There are so many challenges that we old fogeys take for granted. There are of course the technical problems, like how to name methods, how to produce a clean and useful git history, how to change code in a popular project without breaking downstream consumers, how to write tests that are effective, and so on. But there are also many social problems to overcome, such as how to ask questions effectively, when to ignore the peanut gallery, how to work with/around other people's schedules, and even how to detect jokes, sarcasm, and mood generally in text-based chat. These are real barriers to entry for any new contributor. It's a small wonder that anyone learns to navigate these quagmires without help from a mentor. So, please please please go easy on your newbies. It is harder for them than you realize.

This apprenticeship would not have been possible without the support of the Outreach Program for Women and The Perl Foundation. Many thanks to them for running this as smoothly as possible. We have certainly succeeded in their goal of nurturing a newbie programmer into an expert open source contributor.

Thanks also to Jesse Luehrs for being an unofficial mentor. He helped both Upasana and myself with many problems along the way. I would also like to thank the #moose-dev team for helping at times when I was unavailable.

Backstory

At YAPC::NA 2012 in Madison, a vague blur named John Anderson gave a talk on the benefits of structured exceptions. We commiserated over Moose's stringy exceptions several times. Their existence was especially egregious because the rest of Moose is so heavily object-oriented, having been built on top of a meta-object protocol. During the Q&A for John's talk it was made clear that if anyone was willing to put in the hard work to convert Moose to structured exceptions, that branch would not only be merged, but cherished.

I volunteered to do that work. Maybe. You know, I really must review the video to judge how much of that volunteering was actually voluntary. In any case, I started a branch but never got far with it. That led to ribbing over the next six months in the Moose IRC channels that I had not kept my promise.

That false start did teach me one important lesson: the way forward was not to start by ripping out the existing exception system. It became clear that the right way was to build the new system, then convert the exceptions one-by-one over to the new system. That way you can run the tests at any point and they should still pass. Then, after all exceptions are moved over, get rid of the old system. (This is indeed how Upasana managed the complex cutover.)

So I hadn't gotten anywhere with structured exceptions. This made me a little bit sad that the project would never get off the ground and we would be stuck with those crappy string exceptions forever. But then out of the blue in March 2013, Karen Pauley asked me:

I noticed that you were listed as a mentor for previous GSoC projects and I wondered if you would mind being a possible mentor for [the Outreach Program for Women]?

I said absolutely, and why shucks, I have just the project for it!

OPW Proposals

There was a student, Upasana, interested in working on a Perl project for OPW. Originally she was talking to the Dancer team about an internship, but Sawyer generously suggested she also check out what Moose could offer her. I began talking to Upasana and it was clear she was very eager to learn Modern Perl. I recommended she read the Objects chapter in Modern Perl to see if Moose itself interested her. Turns out it did!

As part of the deal for OPW sponsorship, students were required to contribute a change to the project of their choice, to include in their grant application. For Upasana, we looked through Moose's bug tracker and found a good candidate for a first contribution. The ticket described how the Num type constraint was implemented using looks_like_number, which is too lax in its parsing. It allows whitespace, Inf, NaN, and other dubious values. This leniency is at odds with Moose's preference for strict validation, favoring correctness over all.
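
To see just how lax, a quick illustration (my own, not from the ticket):

use Scalar::Util qw(looks_like_number);

for my $candidate (" 42", "Inf", "NaN", "42abc") {
    printf "%-7s => %s\n", "'$candidate'",
        looks_like_number($candidate) ? 'accepted' : 'rejected';
}
# ' 42', 'Inf' and 'NaN' are all accepted -- far laxer than what a
# strict Num type constraint should allow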

This ticket turned out to be a perfect first project. It was a microcosm of what I hoped the structured exceptions work would be: make a substantial change in the documented and tested behavior of Moose without causing the world to end. And to work with the greater Moose community to iron out any problems. The fix itself necessitated understanding a bit of the metaobject protocol and dealing with many levels of abstraction running through the same body of code. Moose is a difficult project to hack on, but Upasana excelled. After this fix, I knew she would be able to handle anything else we could throw at her.

As part of that pre-proposal work, Upasana released her first module, MooseX::Types::LaxNum. This provides a LaxNum type constraint with the old behavior to help smooth the upgrade path to the stricter Num checking. This led her to get a CPAN account and to learn enough about Dist::Zilla to be dangerous. She also added some much-needed comments to the code implementing the Num type explaining why there was a seemingly-superfluous variable copy in the code.

After this fix, Upasana wrote and submitted her proposal to OPW. She had already landed a real bugfix into Moose, had a strong (and desperately needed) project outline, and had started getting involved in the community. In other words, Upasana and I conspired to make it very difficult for anyone to reject her proposal. ;) Indeed it passed through Perl's vetting with flying colors.

Summer Reading

With her proposal accepted, Upasana was eager to begin the real work. So what I did first was maliciously dump a ton of reading material on her. The first task was to finish Modern Perl, of course.

It was important to me that she learn from an example of exceptions done particularly well, so I pointed her at the Conditions and Restarts chapter of Practical Common Lisp. While we did not directly use any of the ideas of the condition system (because this is Perl and we really don't have any condition system… yet!), I hope that someday we can adopt more of these ideas—especially restarts—into Moose.

Because Moose is so heavily influenced by (one wouldn't even be wrong to call it a port of) the Common Lisp Object System, it made sense for her to work through The Art of the Metaobject Protocol. If you really want to hack on Moose, there is simply no way of getting around the metaobject protocol. Without an understanding of the MOP you can really only change the shallowest layers of Moose. Poorly, at that. After lots of reading of this book, and banging her head against walls, and asking questions, and experimenting, Upasana successfully grokked the MOP and has since been able to put Moose's abstractions to work fluently.

Upasana probably felt like she was thrown into the deep end with these two Lisp books, so of course I also recommended she read up on Traits (which is Smalltalk). Any useful exception class hierarchy will need to make use of roles to model cross-class concerns, so understanding role theory was important. But there was also the practical aspect of being able to convert the exceptions that Moose's role implementation throws. In the end, the roles that were designed mostly focused around the different classes of metaobject in Moose. Once structured exceptions are used in anger, we may discover additional roles for our exception hierarchy.

Moose Hacking

At some point she was finally done with all that boring summer reading homework and started coding. The first exception we converted was the first exception in lib/Moose.pm. Namely, extends with no arguments is an error. Reading through the channel logs, I apparently recommended Ender Upasana convert this exception just for practice, to get a sense of how the work would go. It turns out that this practice was just as good as real work. I had not planned it that way, but there was no reason to scrap and start over once we had designed the Moose::Exception superclass and converted some exceptions over.

This first exception happened to have no tests, which was a recurring problem all throughout Upasana's internship. Moose is very well tested … except for its error conditions! Luckily, Upasana added nearly a thousand hand-written tests to Moose to cover its many, many exceptions. That was not easy work, since some of the errors require odd contortions to trigger. In fact, I know she would agree that the most difficult part of this internship was actually just figuring out how to replicate some of those obscure errors. In a few cases we concluded that there was simply no way to trigger a particular error, so we left such exceptions as just confess with a string. If anyone notices, then yay! We can use their code as the test for the new structured exception.

The groove we settled into for converting these exceptions was that one of us would pick an exception, then she would try to replicate it. In the beginning I was picking most of the exceptions, since Moose is dauntingly large and she had no way of knowing what was easy or difficult. Quickly she took over that job and preferred going through each class converting all its exceptions from the top down. If she couldn't replicate the exception, I'd give it a shot and give her hints. One of the best techniques for replicating exceptions was to edit Moose's code to change the exception message, or to change the confess into a print. Run the tests and see if any start failing. This was far more reliable than simply grepping for the error message, since the tests are written using regular expressions which don't use exact string matching, and can include variable interpolation. Either of those defeat a simple grep.

After we figured out the code to replicate the error condition, next was converting it into a test. Initially, each test would start out by simply checking the exception string against a regular expression. Then Upasana wrote a new class for that exception. Then changed the Moose code to throw an object of that class instead of the original string. Lastly, the test would be finalized by confirming that the exception class was correct, and the attribute values were as expected. This process ensured that Moose continues to throw the exact same messages for each error. It was very important that the exception object automatically stringified into the legacy message that Moose used to throw. This gave us a fighting chance at landing our huge branch without disrupting the entire ecosystem, since downstream tests for Moose's error messages would still work without modification. We even put together a short list of exceptions whose messages should be improved later, once we are confident that most of downstream is using exception objects. Hopefully our restraint will help this cutover run more smoothly than Perl 5.10.0's addition of the variable name in the undef warning message, which hit some rough patches.
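
The backwards-compatible stringification boils down to operator overloading; here is a sketch of the idea (the class name is hypothetical, not Moose's actual exception superclass):

package My::Exception;
use Moose;
use overload
    '""'     => sub { $_[0]->message },   # legacy string-matching tests keep passing
    fallback => 1;

has message => ( is => 'ro', isa => 'Str', required => 1 );

1;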

Finally, that exception would be done, so she would post a gist for me to review. I was unfortunately not always available to review those gists (especially during some crunch time at work), so at times they would grow and grow. I think the longest gist was something like 2000 lines long. In my reviews, I did my best to justify every change to her code that I suggested, since I wanted to foster an appreciation of what I suppose I would call deliberate code. No extraneous cruft, proper variable names, easily understood and maintained methods, etc. See this example review session.

Repeat this 300 times and you can see how it might take a whole summer!

We communicated through IRC for everything, but we used GitHub issues for tracking what needed doing. We ended up with nearly 300 issues covering everything from "please review this gist" to "sign OPW contracts" to an issue for pretty much each individual exception. I am a huge believer in using bug trackers (I run a personal RT instance that has nearly 3000 tickets in it), and so I am glad we found a way to make GitHub issues work for us.

Since Upasana was combing over every single file in Moose to convert those exceptions, we unearthed some interesting little problems in the Moose codebase. There are some low-hanging fruit here if anyone wants to start contributing to Moose!

A running joke was that whatever Upasana was currently working on was the "hardest part of Moose". There were at least five parts of Moose that earned that label. In order they were: all that abstraction, the meta-object protocol, method inlining, the MOP bootstrap, and metaclass compatibility rebasing. The last of these, pretty much only Jesse really understands. In the end, Upasana was able to manage each of these most-difficult parts of Moose, so I feel 100% confident in conferring on Upasana the title of Mᴏᴏsᴇ Exᴘᴇʀᴛ. There are precious few of those. :)

As an aside, one of the secondary topics of conversation in our channel was food. I have a nasty habit of ordering the same dishes everywhere I go, but Upasana convinced me to branch out and try some new Indian dishes like dosa. Food photos were a constant distraction during the mentorship. She would make fun of me for eating way too much Japanese food, since every time I posted a photo it was invariably of ramen or curry or sushi. No regrets! But I would return the favor by joking about the American fast food chains she likes, so it was all in good fun.

The Future

From the very beginning, I have been impressed with Upasana's work ethic, her cheery demeanor (especially when mine wasn't!), and how downright intelligent she is. I could not have asked for a better student. I very much look forward to working on future projects with Upasana both as equals and as friends.

I learned a lot during this mentorship. Right this moment I feel like I was a terrible mentor at times. Always absent, sometimes moody, and never satisfied. Mentoring certainly took a lot more of my time and attention than I had anticipated. But reflecting on how much I have helped Upasana grow, and what we were able to achieve together, and how hard it is for a newbie to break into a community, I feel like it would be absolutely foolish for me to stop mentoring. I suppose we'll see what projects are available next OPW or Summer of Code. Howsabout conditions for Perl?

By the way. As far as I know, Upasana still plans to come to YAPC::NA 2014 in Orlando. It would be a real shame if she had to pay for any of her food. :)

Shawn M Moore: Reinstating Class::MOP's Commit History in Moose

Ever use git blame or git log in the Class::MOP parts of the Moose repository? If so, you've probably seen Dave Rolsky's mega commit 38bf2a25.

$ git blame lib/Class/MOP/Package.pm

38bf2a25 (Dave Rolsky  2010-12-27 08:48:08 -0600   1)
38bf2a25 (Dave Rolsky  2010-12-27 08:48:08 -0600   2) package Class::MOP::Package;
38bf2a25 (Dave Rolsky  2010-12-27 08:48:08 -0600   3)
38bf2a25 (Dave Rolsky  2010-12-27 08:48:08 -0600   4) use strict;
38bf2a25 (Dave Rolsky  2010-12-27 08:48:08 -0600   5) use warnings;
38bf2a25 (Dave Rolsky  2010-12-27 08:48:08 -0600   6)
0db1c8dc (Jesse Luehrs 2011-04-17 19:11:28 -0500   7) use Scalar::Util 'blessed', 'reftype', 'weaken';
38bf2a25 (Dave Rolsky  2010-12-27 08:48:08 -0600   8) use Carp         'confess';
0db1c8dc (Jesse Luehrs 2011-04-17 19:11:28 -0500   9) use Devel::GlobalDestruction 'in_global_destruction';
38bf2a25 (Dave Rolsky  2010-12-27 08:48:08 -0600  10) use Package::Stash;
38bf2a25 (Dave Rolsky  2010-12-27 08:48:08 -0600  11)
38bf2a25 (Dave Rolsky  2010-12-27 08:48:08 -0600  12) use base 'Class::MOP::Object';
38bf2a25 (Dave Rolsky  2010-12-27 08:48:08 -0600  13)
38bf2a25 (Dave Rolsky  2010-12-27 08:48:08 -0600  14) # creation ...
38bf2a25 (Dave Rolsky  2010-12-27 08:48:08 -0600  15)
38bf2a25 (Dave Rolsky  2010-12-27 08:48:08 -0600  16) sub initialize {
38bf2a25 (Dave Rolsky  2010-12-27 08:48:08 -0600  17)     my ( $class, @args ) = @_;

This commit is a straight-up copy of all the Class::MOP code into the Moose repository. Unfortunately it was committed as a single (gigantic) changeset.

commit 38bf2a2585e26a47c919fd4c286b7716acb51c00
Author: Dave Rolsky <autarch@urth.org>
Date:   Mon Dec 27 08:48:08 2010 -0600

    Merged CMOP into Moose

        ...

diffstat says this commit changed a solid fourteen thousand lines of code across 138 files. A most productive morning for Dave!

This is problematic because there is a rich and quite important four years' worth of history before December 27th 2010, when Class::MOP was merged into the Moose repository. All of that is effectively lost because this commit is a copy instead of a more delicate merge. I don't blame Dave one bit for this; I certainly did not object when this merger was going down, nor did I appreciate then how valuable a clean commit history is.

Why does this copy-merge really matter anyway? Well, for example, we needed to know exactly when Moose started forbidding bare references in attribute defaults (to prove someone wrong on the Internet, that noblest of goals). I had to resort to clumsily bisecting Class::MOP releases on MetaCPAN to find that the restriction was added in version 0.33 (August 19th 2006). I was sad that I was not able to use my usual git blame or git log tools, because all Class::MOP history leads to 38bf2a25.

In an ideal world, we would be able to tweak 38bf2a25, two years later, to add a second parent commit (namely, the last commit in Class::MOP, d004c8d5). This would turn it into a merge of two commits, which is how we could inject the entire Class::MOP commit history into Moose's history. From then on git would inspect both histories to produce blame reports, commit logs, and so on. Just like any other merge commit.

Alas, we cannot just go ahead and change the public Moose repository to include the Class::MOP history like that. Adding a second parent to 38bf2a25 would change that commit's SHA. That would cause a cascade of changes to every subsequent commit (and their SHAs) in the two years' worth of commits since. This would in turn break everyone's git clone of the Moose repository, as well as any external pointers to commits (such as RT tickets, email archives, etc), and the rest of the civilized world.

However! You, my friend, can fix it for your own checkout without screwing anybody up. git has a tool for rewiring parent commits. Two, in fact. The original mechanism was the grafts file (.git/info/grafts), but there is a slicker, more powerful replacement called, well, git replace. These tools allow you to tweak individual commits in your local repository without disrupting other commits and their SHAs. This means you can still freely and painlessly share your branches and commits with GitHub, other developers, and yo momma.
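For the curious, a grafts-file version of the fix below would look something like this sketch (parent order here keeps the original parent first; git replace, used below, is the better-supported route):

# Each line of .git/info/grafts is: <commit> <parent1> <parent2> ...
# using full SHAs. 38bf2a25^ resolves to the original parent.
echo "38bf2a2585e26a47c919fd4c286b7716acb51c00 $(git rev-parse 38bf2a25^) d004c8d565f9b314da7652e9368aeb4587ffaa3d" >> .git/info/grafts

Unlike replace objects, grafts cannot be shared with other clones, which is one more reason to prefer git replace.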

To do this, start by cloning up a new copy of the Moose repository for playing around. Not strictly necessary, but caution is definitely warranted here. You want to make sure this procedure works before I, through you, potentially damage your working copy.

git clone git://git.moose.perl.org/Moose.git
cd Moose

Next, fetch Class::MOP's master so its commits also exist within the Moose repository.

git remote add cmop git://git.moose.perl.org/Class-MOP.git
git fetch cmop master

Finally, fix the sledgehammer-merge commit 38bf2a25 to include both its original parent commit and the last Class::MOP commit. To do that we create an entirely new commit object that is exactly like 38bf2a25 except it has that second parent commit, d004c8d5. Then we use git replace to tell git to use the new SHA (hint: it's f18fded8) in place of 38bf2a25.

# Emit 38bf2a25's raw commit object, inject the last CMOP commit as an
# extra parent line, and write the result back as a new commit object.
NEW_MERGE=$(
    git cat-file commit 38bf2a25 |
    perl -ple '/^parent / && print "parent d004c8d565f9b314da7652e9368aeb4587ffaa3d"' |
    git hash-object -t commit -w --stdin
)
git replace 38bf2a25 $NEW_MERGE
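If you want a quick sanity check before moving on, this should now show two parent lines, since git transparently follows replace objects:

git cat-file -p 38bf2a25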

All done. Enjoy your new (old) history!

$ git log --grep associated_metaclass --format='format:%h %ad %an%n    %s' lib/Class/MOP

cc03c2b Sun Feb 19 12:51:48 2012 -0600 Dave Rolsky
    Weaken the associated_metaclass after cloning a method.
        -- post-merge commit
aa5bb36 Mon Apr 25 10:38:05 2011 -0500 Jesse Luehrs
    fix setting associated_metaclass and attribute on accessor objects
        -- post-merge commit
09ea7f8 Sun Aug 10 18:24:02 2008 +0000 Yuval Kogman
    package_name >= associated_metaclass->name
        -- pre-merge commit!
5e60726 Sun Aug 10 17:42:29 2008 +0000 Yuval Kogman
    add associated_metaclass to Method
        -- pre-merge commit!
$ git blame lib/Class/MOP/Package.pm

2243a22b (Stevan Little   2006-06-29 18:27:47 +0000   1)
2243a22b (Stevan Little   2006-06-29 18:27:47 +0000   2) package Class::MOP::Package;
2243a22b (Stevan Little   2006-06-29 18:27:47 +0000   3)
2243a22b (Stevan Little   2006-06-29 18:27:47 +0000   4) use strict;
2243a22b (Stevan Little   2006-06-29 18:27:47 +0000   5) use warnings;
2243a22b (Stevan Little   2006-06-29 18:27:47 +0000   6)
0db1c8dc (Jesse Luehrs    2011-04-17 19:11:28 -0500   7) use Scalar::Util 'blessed', 'reftype', 'weaken';
6d5355c3 (Stevan Little   2006-06-29 23:28:32 +0000   8) use Carp         'confess';
0db1c8dc (Jesse Luehrs    2011-04-17 19:11:28 -0500   9) use Devel::GlobalDestruction 'in_global_destruction';
407a4276 (Jesse Luehrs    2010-05-10 23:20:29 -0500  10) use Package::Stash;
2243a22b (Stevan Little   2006-06-29 18:27:47 +0000  11)
f197afa6 (Jesse Luehrs    2010-05-10 21:13:19 -0500  12) use base 'Class::MOP::Object';
6e57504d (Stevan Little   2006-08-12 06:13:02 +0000  13)
6d5355c3 (Stevan Little   2006-06-29 23:28:32 +0000  14) # creation ...
6d5355c3 (Stevan Little   2006-06-29 23:28:32 +0000  15)
6d5355c3 (Stevan Little   2006-06-29 23:28:32 +0000  16) sub initialize {
3be6bc1c (Yuval Kogman    2008-08-14 18:21:45 +0000  17)     my ( $class, @args ) = @_;

In closing, to answer my original question using my normal means...

$ git log -p --reverse -S 'References are not allowed as default'

commit 148b469742669e1a506538200f624dcdaeeb510a
Author: Stevan Little <stevan.little@iinteractive.com>
Date:   Wed Aug 16 21:32:05 2006 +0000

    no ref in the defaults

...

+    (Class::MOP::Attribute::is_default_a_coderef(\%options))
+        || confess("References are not allowed as default values, you must ".
+                   "wrap then in a CODE reference (ex: sub { [] } and not [])")
+            if exists $options{default} && ref $options{default};

Sweet, sweet history.

Dave's Free Press: Journal: YAPC::Europe 2006 report: day 3

Shawn M Moore: Learning to Build Abstractions in Quartz Composer

I decided today that I would learn a bit of Quartz Composer. I had never touched it before, beyond reading a couple of articles and watching a conference talk. The most useful introduction for me was "UI Prototyping with Quartz Composer and Origami" by Pasan Premaratne. It takes you from absolute zero to having built, using Facebook's Origami, a simpler version of Path's attractive spinout menu.

I recommend you not only read Pasan's post, but also actually follow along! At worst you'll have spent twenty minutes exploring a novel way of doing things. At best you'll have acquired a new tool for your kit and become even more handsome.

Near the end of his post, Pasan laments:

The only downside that I see right now to using Quartz Composer is that if you're prototyping something complex, your composition can get unwieldy and convoluted fairly quickly. In just creating a radial menu with three buttons we have over 20 patches in our composition.

I agree that it can become unwieldy. Here's a snapshot of my small composition:

Those three yellow blocks contain essentially the same code. Each block does not explicitly group its contained patches; they are merely arranged spatially in a particular region of the canvas. Nothing more. So a yellow block provides about as much structure as a source code comment.

These blocks contain the same patches duplicated with slightly different parameters. That is of course a bit offensive to me as a programmer. If possible I would like to clean up that repetition. But I'm not even sure that I can.

Here's the rub. If Quartz Composer lacks tools to abstract away chunks of your composition, then it is little more than a shiny toy. It would be like a programming language that does not support creating functions. But if QC does enable building bigger, reusable units of design, then it is worthy of my attention.

So! Let's learn how to build abstractions in Quartz Composer, together! This is my very first day with QC, so there are going to be some false starts. Bear with me. :)

Macros

Step one is to read and follow along with everything in Pasan's post.

Go do it. I'll be here.

As I went through Pasan's tutorial, I kept seeing mention of a "Macro" feature all over Quartz Composer. If it resembles any other system's macro functionality, that would be one way to reduce complexity in your project.

The next step, then, must be to select the group of patches responsible for one of the buttons and click Create Macro in the toolbar. This replaces all the patches with a single Macro Patch. The noodle from Interaction 2's Drag outlet is connected to this macro patch, which is a good sign. In fact if you go to the Viewer you should see that nothing has changed.

Double click the macro patch to jump into its definition. Take care to double click in its body, not its titlebar (which renames the patch). We can see that it is almost identical to what we had before. However there is a new patch called Number Splitter near the top. This must be how Quartz Composer connects the Drag interaction from outside the macro to the inputs of the animations inside the macro. Observe that Number Splitter's inlet is green, presumably to indicate that it is special in this way.

If we want to make this macro reusable, we would need to create our own parameters. The button image must be one such parameter otherwise every button will be labeled "A". The best place to start is to delete the image patch which is hardcoded to be ButtonA.png.

To parameterize the image, right click the Layer patch. Under Publish Inputs select the Image property. QC offers to let you name the parameter differently from the name that the Layer patch expects, but in this case we can stick with Image. This turns the inlet green, which matches the Number Splitter parameter. Here's to hoping we're right!

Clicking Edit Parent in the QC toolbar leaves the macro and returns to our overall composition. If you look at the macro patch you can see that it now has an Image inlet. Drag that ButtonA.png file back into the composition, delete its Layer, and hook up its outlet to the macro's Image input.

If all went according to plan, there should be no change in the Viewer. Indeed there are still three spinout buttons labeled A, B, and C.


Here begins one of my false starts. Read on before making changes to your composition.

Let's get rid of the other two buttons and replace them with macros. Copy and paste appears to work just fine on macro patches. Ensure that each of the three macros has its Input and Image inlets populated.

If you flip to the Viewer you'll see that there's only one button. The problem is that all three buttons are animating between the same positions at the same times. The other two buttons are hiding below the visible one.

To fix this we need more macro parameters. But to add them, we would need to edit the macro three times, once for each button. This is because we used copy and paste to duplicate the macros. Just like in programming, copy and paste is a worst practice.

Lesson learned. False start over.

User-Defined Patches

Copying and pasting macros didn't pan out. How else can we achieve the abstraction we want?

Another button in the QC toolbar is Add to Library. I presume that is for reusing the components you have created across different projects. Let's add the macro to our library as a new patch type. After playing around a little I've discovered that rather than adding the macro to your library directly, it's better to explode the macro first (available under its right-click menu). Call the new patch type Radial Button.

Then from the Patch Library drag in two more Radial Buttons and wire them up. You'll notice that the inlets are now called Enable, Input, and Input. That's downright ridiculous, so let's fix that problem first.

In the Patch Library, right click the Radial Button object and select Edit. Then without selecting a patch, open up the Patch Inspector. This inspects Radial Button itself. In the dropdown at the top select Published Inputs & Outputs. You'll see a table of Input mapped to input_proxy_1 and another Input mapped to input_proxy_2. Change one of the Input labels to Progress and the other to Image. I don't know if there's a way to immediately see which is which. However if you save and reopen Radial Button, the label on the image layer should say Image not Progress. If you guessed wrong be sure to flip them the other way.

We've renamed the properties, so let's go back to our composition to see our change.

Ah crud. Still two uselessly-named Input inlets. I bet that every time we edit Radial Button we must remove it from our composition and add the new version back in. What a pain! If you know a better solution, please get in touch. Otherwise, if you really must remove and re-add your custom patches after each change, it's probably best to finish the patch in isolation before adding it to your project.

Beware! Don't forget to adjust the layer ordering any time you add new layers. Hit Area should be the layer with the highest number, then the Add Button layer should be the next layer below that. If you miss this step, you will see rendering bugs. Or worse, the touch handler mysteriously won't fire, because it is obscured by other layers.

Next let's make more properties into parameters. x- and y-coordinate are as good a place to start as any. Recall that in Pasan's post we assigned different End Values to each button's Transition patches. However, notice that the Start Value and End Value of the Transition patch have ordinary inlet ports. That means we could publish those inputs from Radial Button itself. Start by right clicking Transition X, selecting Publish Inputs, then End Value. Call it End X. Similarly, publish Transition Y's End Value as End Y.

Go back to your composition, remove the Radial Button patches, then re-add them. You'll see that you have the new End X and End Y inputs. Hook up the Progress and Image inlets as before. Then, using the same method as in Pasan's post, assign constant values using the Patch Inspector for each of the End X and End Y inputs on each of the three buttons. For convenience they are:

  • A button: (-184.5, -408.5)
  • B button: (0, -298.5)
  • C button: (184.5, -408.5)

Notice as you're editing how the default values for End X and End Y are the values that had been assigned to each End Value of Button A's two Transition patches. You should change them to better defaults (like 0) by editing the Radial Button patch. Use the Patch Inspector with no patch selected, then change the values under Input Parameters.

In the Viewer, confirm that the animation is working again. The only thing left to fix is the friction and tension of each of the buttons coming out. I'm sure you can handle that.

How Bout Dem Abstractions

Thanks to our custom Radial Button patch, there is less duplication in our composition. The number of patches has decreased by half, which makes the design more comprehensible.

That is not good enough.

If we want to add or remove a button in this menu, it'd be a surprising amount of work. We would need to do some trigonometry for each button to produce a new set of magic numbers. No way! It needs to be as easy to add a button as it would be in, say, Interface Builder.

Before we attempt that, let's simplify the problem first. Let's move everything to the origin. Change the Radial Button's Start Value for both Transition X and Transition Y to 0. And then in your composition, change the y-coordinate of the Add Button and the Hit Test from -512 to 0.

Using the origin, instead of the magic y-coordinate -512, reduces the fiddliness of the task. Also, we'll lay the buttons across the complete circle around the add button, rather than just half or a quarter of it. Once we have that working, coming back and restoring these constraints would be straightforward.

To begin, we'll remove the End X and End Y published inputs from Radial Button. You remove them the same way you added them: just right click the patch and select the property under Published Inputs.

The new inputs we'll want are Radius, Count, and Index. The Radius input will tell us how far from the origin to move that button. Count and Index will be used to decide where on the circle that button will go.

To calculate the destination of each button, we'll need to use sin and cos. Quartz Composer provides a Mathematical Expression patch that evaluates expressions of arbitrarily many variables. We'll need one patch for the x-coordinate and another for the y-coordinate, so drag out two.

If you inspect a Mathematical Expression patch, under the Settings pane there is a text field for the formula. Any free variables in the formula become inputs to the patch (which is a wonderful bit of design).

For the x-coordinate patch, we'll want to use the formula sin(360 * index/count) * radius.

Note! sin uses degrees not radians. Knowing that will save you the twenty minutes of head-scratching and intense self-doubt that I suffered. :)

For the y-coordinate patch, we'll use the same formula but with cos instead, producing cos(360 * index/count) * radius.
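To convince yourself the formulas do what we want, work through the three-button case we'll use below (Count 3, Radius 200):

index 0: x = sin(0°)   * 200 =    0      y = cos(0°)   * 200 =  200
index 1: x = sin(120°) * 200 ≈  173.2    y = cos(120°) * 200 = -100
index 2: x = sin(240°) * 200 ≈ -173.2    y = cos(240°) * 200 = -100

Three points evenly spaced around the circle, starting from straight up.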

Hook the Result outlets of these two patches up to the End Values of the corresponding Transition patches.

We'll need to publish the Index, Count, and Radius properties. If we do that on one of the two Mathematical Expression patches, the other one wouldn't receive those inputs. If we do it on both patches, we might expect that Quartz Composer would send the same value to both patches, but that's not the case. The patch just publishes two sets of inputs with the same names. Useless!

What we want here is an Input Splitter. Right click one of the Mathematical Expression patches and select Insert Input Splitter for Radius. This gives you a tiny patch with an unnamed input and output. The key to using this is that you can send output from one patch as input to multiple other patches. So drag the noodle from the Radius Input Splitter to both Mathematical Expression patches' Radius input. Repeat for Index and Count.

Then finally we can publish the Radius, Count, and Index properties in the usual way from the Input Splitter patches.

With all those changes made, our Radial Button patch looks like this:

And then our project can use that new and improved version like this:

With each Radius set to 200, Count set to 3, and Index from 0 to 2, we get the following result:

Great! We can factor out the Friction parameter in the same way. (This is your cue!)

Put Your Abstraction to Work

That was certainly a lot of work to build out that Radial Button patch. Let's see how well it serves us by adding a fourth and fifth button to the menu.

First, add your new images to the composition. Delete their Layer patches. Drag in two more Radial Buttons. Hook up their Image and Progress inlets.

Then set the Radius of the two new patches to 200. Set the Count of all the Radial Button patches to 5. Then finally set the Index values of the new patches to 3 and 4.

Success!

If you wanted to, you could use an Input Splitter to avoid duplicating the Radius and Count properties and make it even easier to add buttons.

I wonder if there's an automatic way to specify the Count and Index properties. Is there a way to count or enumerate the number of connections from an outlet? Quartz Composer provides a lot of patches so I would not be surprised if it did.

To ace this project, you could move the menu to the bottom left of the screen. Make it cover only a quarter of the circle like Path does. I imagine you could implement the latter merely by choosing interesting values for Index and Count. Or, better yet, expose the 360 factor from the sin and cos formulae as a published input.

The Rub

Quartz Composer is certainly an interesting tool to have in your repertoire. Origami does a lot to make it more usable and more flexible. For future projects I no longer have a reason to prototype animations in code. Origami helps you design your interactions faster and gets you closer to that sweet, sweet instantaneous feedback.

Yes, it is too easy to create a mess of your composition with too many patches. Just like not using functions would make a mess of your code. Quartz Composer thankfully does provide designers robust tools for creating and reusing abstractions. You just have to think like a programmer. ;)
