Moose Blog: Moose's Num won't be using Scalar::Util::looks_like_number as of Moose 2.1000

Before Moose 2.1000, Moose's Num type used Scalar::Util::looks_like_number, which recognizes
" 1223  ", NaN, Inf, "0 but true", etc. as numbers. In future releases, Num will only accept "123", 123, "1e3", 1e3, ".0", "0.0" etc. You can still get the older behaviour by using the LaxNum type provided by the MooseX::Types::LaxNum module. Please see this RT ticket for more details.

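To make the difference concrete, here is a minimal sketch comparing Scalar::Util::looks_like_number with a stricter check. The pattern $STRICT_NUM and the helper is_strict_num are illustrative assumptions, not Moose's actual implementation; they only approximate the "digits, optional fraction, optional exponent" rule described above.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Illustrative strict pattern (an assumption, NOT copied from Moose):
# optional sign, digits with optional fraction (or a bare fraction),
# optional exponent, and nothing else, not even surrounding whitespace.
my $STRICT_NUM = qr/\A-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][-+]?[0-9]+)?\z/;

# Hypothetical helper: returns 1 if the value passes the strict pattern, else 0.
sub is_strict_num {
    my ($value) = @_;
    my $ok = defined($value) && $value =~ $STRICT_NUM;
    return $ok ? 1 : 0;
}

# Print the lax and strict verdicts side by side for the values from the post.
for my $candidate ( '123', '1e3', '.0', '0.0', ' 1223  ', 'NaN', 'Inf', '0 but true' ) {
    printf "%-14s lax=%d strict=%d\n",
        "'$candidate'",
        ( looks_like_number($candidate) ? 1 : 0 ),
        is_strict_num($candidate);
}
```

The strict column rejects the padded, NaN, Inf and "0 but true" values, which is the kind of input LaxNum is meant to keep accepting.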
Perl Gems: NY Perl Mongers Talk Slides


Information Retrieval and Extraction from cfrenz

Slides from a talk I recently gave at a NY Perl Mongers Meetup 

Gabor Szabo: Perl, Python, Ruby, PHP and HTML5 on Google trends

Once in a while someone looks up some numbers regarding Perl and other languages, sees a downward graph, rings the warning bells, and then others start saying why is that not important, and there are more modules on CPAN anyway... As I have a lot of other urgent things to do, I decided a good way to procrastinate would be to look at some data. Some people and companies think that the number of pages having the term programming perl, is a good indication of language popularity. That certainly has some value, but I think seeing how many people are actually searching for a term has better indication for the interest in that term.... So I looked at the Google Trends for the above 5 terms and tried to understand what I see there.

For the full article visit Perl, Python, Ruby, PHP and HTML5 on Google trends

dagolden: How I manage new perls with perlbrew

Perl v5.19.0 was released this morning and I already have it installed as my default perl. This post explains how I do it.

First, I manage my perls with perlbrew. I install that, then use it to install some tools I need globally available:

$ perlbrew install-patchperl
$ perlbrew install-cpanm

You must install cpanm with perlbrew -- if you don't, weird things can happen when you switch perls and try to install stuff.

I keep my perls installed read-only and add a local::lib based library called "@std". (I stole this technique from Ricardo Signes.) That way, I can always get back to a clean, stock perl if I need to test something that way.

(There are still some weird warnings that get thrown doing things this way when I switch perls, but everything seems to work.)

I also install perls with an alias, so "19.0" is short for "5.19.0".

Then I have a little program that builds new perls, sets things up the way I want, and installs my usual modules. All I have to do is type this:

$ newperl 19.0

And then I've got a brand new perl I can make into my default perl.

Here's that program. Feel free to adapt it to your own needs:

#!/usr/bin/env perl
use v5.10;
use strict;
use warnings;
use autodie qw/:all/;

my $as = shift
  or die "Usage: $0 <perl-version>";
my @args = @ARGV;

# trailing "t" means do threads
my @threads = ( $as =~ /t$/ ) ? (qw/-D usethreads/) : ();

$as =~ s/^5\.//;
my $perl = "5.$as";
$perl =~ s/t$//; # strip trailing "t" if any
my $lib = $as . '@std';

my @problem_modules = qw(
  JSON::XS
);

my @to_install = qw(
  Task::BeLike::DAGOLDEN
);

my @no_man = qw/-D man1dir=none -D man3dir=none/;

# install perl and lock it down
system( qw/perlbrew install -j 9 --as/, $as, $perl, @threads, @no_man, @args );
system( qw/chmod -R a-w/, "$ENV{HOME}/perl5/perlbrew/perls/$as" );

# give us a local::lib for installing things
system( qw/perlbrew lib create/, $lib );

# let's avoid any pod tests when we try to install stuff
system( qw/perlbrew exec --with/, $lib, qw/cpanm TAP::Harness::Restricted/ );
local $ENV{HARNESS_SUBCLASS} = "TAP::Harness::Restricted";

# some things need forcing
system( qw/perlbrew exec --with/, $lib, qw/cpanm -f/, @problem_modules );

# now install the rest
system( qw/perlbrew exec --with/, $lib, qw/cpanm/, @to_install );

# repeat to catch any circularity problems
system( qw/perlbrew exec --with/, $lib, qw/cpanm/, @to_install );

Yes, that takes a while. I kicked it off right before going to get lunch. When I got back, I was ready to switch:

$ perlbrew switch 19.0@std

I also have a couple bash aliases/functions that I use for easy, temporary toggling between perls:

alias wp="perlbrew list | grep \@"
up () {
  local perl=$1
  if [ -n "$perl" ]; then
    perlbrew use "${perl}@std"
  fi
  local current=$(perlbrew list | grep '\*' | sed -e 's/\* //' )
  echo "Current perl is $current"
}

I use them like this (notice that I don't need to type my @std library for this fast switching):

$ up
Current perl is 18.0@std

$ wp
  10.0@std
  10.0-32@std
  10.1@std
  12.5@std
  14.4@std
  16.3@std
  16.3@test
  16.3t@std
* 18.0@std
  19.0@std
  8.5@std
  8.9@std

$ up 19.0
Use of uninitialized value in split at /loader/0x7fa639030cd8/local/lib.pm line 8.
Use of uninitialized value in split at /loader/0x7fa639030cd8/local/lib.pm line 8.
Current perl is 19.0@std

(there's that warning I mentioned)

I hope this guide helps people keep multiple perls for development and testing. In particular, I'd love to see more people doing development work and testing using 5.19.X so it can get some real-world testing.

See you June 21 for v5.19.1...

Perlbuzz: Perlbuzz news roundup for 2013-05-20

These links are collected from the Perlbuzz Twitter feed. If you have suggestions for news bits, please mail me at andy@perlbuzz.com.

Perl 5 Porters Summaries: Perl 5 Porters Weekly: May 13-19, 2013

Welcome to Perl 5 Porters Weekly, a summary of the email traffic of the perl5-porters email list.

The topic of the week on P5P was the cleanup and release of perl 5.18.0, which was released on 5-18 for North Americans. RJBS said in a separate email that blead would be reopened for patches on Tuesday, as 5.19.0 is scheduled to be released on Monday (May 20).

Read the release announcement

Read perldelta

Download a tarball

Congratulations to the perl5 team for this new release!

Perl News: Perl 5.18.0 released

The release of Perl 5.18.0 has been announced

You can find a full list of changes in the file “perldelta.pod” located in the “pod” directory inside the release and on the web.

Perl v5.18.0 represents approximately 12 months of development since Perl v5.16.0 and contains approximately 400,000 lines of changes across 2,100 files from 113 authors.

Perl continues to flourish into its third decade thanks to a vibrant community of users and developers.

Strawberry Perl 5.18.0.1 is available at http://strawberryperl.com (all editions: MSI, ZIP, PortableZIP for both: 32/64bit MS Windows)

Perl Foundation News: Grant Application: Maintaining Perl 5

We have received the following grant application, under the Perl 5 Core Maintenance Fund, from Tony Cook.

Before we vote on this proposal we would like to get feedback and endorsements from the Perl community. Please leave feedback in the comments or send email with your comments to karen at perlfoundation.org.

Project Title: Maintaining Perl 5

Name: Tony Cook

Synopsis

Free up one of the Perl 5 core's contributors to work non-stop on making Perl 5 better.

Benefits to Perl 5 Core Maintenance

This grant provides the pumpking with a development resource to target as he or she wills, while still providing for more general bug fixes and other improvements to the perl core.

Deliverables

I propose to adopt the same model as Dave and Nicholas's successful ongoing grants.

Like their grants, there are intentionally no pre-defined deliverables for this project. I intend to devote around 260 hours (about 20 hours a week) over the next 3 months to work on improving the core, paid by the hour at the same below-commercial rate as Dave and Nicholas. Some weeks I may be able to work more than 20 hours; if acceptable, this will consume hours faster and end the grant earlier.

Like them, I would post a weekly summary on the p5p mailing list detailing activity for the week, allowing the grant managers and active core developers to verify that the claimed hours tally with actual activity, and thus allow early flagging of any concerns. Likewise, missing two weekly reports in a row without prior notice would be grounds for one of my grant managers to terminate this grant.

Exactly as Nicholas and Dave do, once per calendar month I would claim an amount equal to $50 x hours worked. I would issue a report similar to the weekly ones, but summarising the whole month. The report would need to be signed off by one of the grant managers before I get paid. Note that this means I am paid entirely in arrears.

At the time of my final claim, I would also produce a report summarising the activity across the whole project period.

Also (the "nuclear option"), the grant managers would be allowed, at any time, to inform the board that in their opinion the project is failing. The TPF board may then, after allowing me to present my side of things, decide whether to terminate the project at that point (i.e. to not pay me for any hours worked after I was first informed that a manager had "raised the alarm").

I view this grant as a proof of concept - if it goes well for everyone involved, I expect to apply to extend it.

Project Details

I think that the work that I would do to improve Perl 5 would mostly fall into one of four main classes: code reviews, bug fixing, helping other contributors, and adding features - with bug fixes the most prominent and adding features the least.

In one major sense what I'm offering is different to Nicholas's or Dave's grant: the current pumpking would be able to assign tasks directly.

Ideally this would be done with some consultation with myself, so a large complex task involving parts of the core I'm unfamiliar with isn't assigned (or is assigned with reasonable expectations on time). Of course, if too many tasks are negotiated into non-existence, the grant can be terminated.

Some possible tasks, based on discussions over the last several months:

  • readpipe(@list)
  • core exception objects
  • a git hook to prevent changes in the left of a merge

Success metric: completion of tasks assigned.

Otherwise I'd work on:

  • Reviews of patches submitted to perlbug, possibly committing them

This will improve my core knowledge, and provide more timely feedback to non-committers using their time to help perl.

Metric: number and complexity of patches applied or commented on.

  • Fixing bugs I select from the perl5 Request Tracker queue.

While I wouldn't necessarily be working on the harder bugs that Dave targets, this would help bring the total bug count down, and reduce the noise in the Request Tracker queue.

Metric: number and complexity of issues fixed.

  • Fixing systemic issues in perl, such as the mis-use of I32 and U32 in the perl core.

Metric: complexity of issue solved.

  • Contributing to discussion on the perl5-porters mailing list.

For the grant, I'm specifically not proposing to:

  1. Be a release manager. This doesn't prevent me volunteering to act as a release manager, but that wouldn't be counted towards this grant.
  2. Act as language designer - I don't feel that I'm good at this.

Project Schedule

I expect that I can deliver 260 hours of work in approximately 3 months.

I am available to start work on this project immediately.

Bio

I'm a freelance programmer living in Australia. I've been irregularly contributing to perl since 2008 and a committer since 2010. My contributions have varied from build system fixes, to UTF-8 handling, to portability fixes. I've been programming in C for 25 years and in perl for 20.

Endorsed By

Ricardo Signes, Nicholas Clark, H.Merijn Brand

Amount Requested: $13,000

Suggestions for Grant Manager

Ricardo Signes, Marcus Holland-Moritz

Ovid: Cleaning up the Test::Class::Moose base class

I'm quite enjoying Test::Class::Moose. It's very easy to use and it gives you such fine-grained control over your test suite and powerful reporting capabilities that it's turning out to be far more powerful than I had expected. It's actually easy enough to use for beginners, but power users will really appreciate it. There was, however, a major issue I had with it and it stems from a habit I picked up from Test::Class.

For those who are very familiar with using Test::Class (or if you've read my Test::Class tutorial), you may be used to seeing a base class that looks like this:

package My::Test::Class;
use parent 'Test::Class';

INIT { Test::Class->runtests }

sub startup  : Tests(startup)  {}
sub setup    : Tests(setup)    {}
sub teardown : Tests(teardown) {}
sub shutdown : Tests(shutdown) {}

1;

The empty test control methods are there so that a subclass knows it can always safely do this:

sub startup : Tests(startup) {
    my $test = shift;
    $test->next::method;
    # more code here
}

Those stub test methods are no longer needed with Test::Class::Moose, but until today, that INIT block was an annoying code smell that had some unfortunate side-effects. Let's make that go away.

The INIT Phase

First of all, what does that INIT block do?

INIT { Test::Class->runtests }

In Test::Class, that would simply run all loaded test classes at INIT time. However, that means you could do this:

prove -lv t/lib/path/to/my/test/class.pm

And voilà, your test class could be executed just like it's a *.t program. That's because it would be loaded, it would inherit from your base class, the INIT block would fire and Rube Goldberg would be laughing in his grave.

Sadly, this leads to nasty bugs like this one I blogged about almost 3 years ago. Basically, if you use $some::module, the code in that module is executed before INIT fires but the subroutines are not (aside from import()). However, if you load the code directly with, say perl t/lib/path/to/my/test/class.pm, the INIT block fires before the code in the module is executed. You can read perldoc perlmod for more information about INIT and friends.
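As a rough illustration of the phase ordering involved, the following self-contained sketch records when each block runs in a main program. The @order array is just a hypothetical log used for this demonstration.

```perl
use strict;
use warnings;

# Package variable so every phase block sees the same array.
our @order;

# Ordinary statement: executes at run time, after all phase blocks below,
# even though it appears first in the file.
push @order, 'runtime';

INIT  { push @order, 'INIT' }     # runs just before run time begins
CHECK { push @order, 'CHECK' }    # runs once the main compilation has finished
BEGIN { push @order, 'BEGIN' }    # runs as soon as this block is compiled

print join( ' -> ', @order ), "\n";
```

When the file is run directly as the main program, the ordinary statement executes last, which is the situation described above: INIT fires before the run-time code in the file has had a chance to execute.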

How does this impact Moose?

At first glance, you might not think issues with INIT and friends would impact Moose, but there are a couple of subtle repercussions. For example, we know that subroutine attributes only fire at compile time. That's why the Moose manual explains how to use inherited subroutine attributes with Moose. In short, you have to ensure that you're inheriting at compile-time, not at CHECK/INIT/UNITCHECK or runtime:

BEGIN { extends 'Some::Class' }

Since Moose isn't part of the Perl language and much of its behavior fires at runtime, you can have some strange issues, such as when I wrote the following code:

package TestsFor::Person::Employee;
use Test::Class::Moose extends => 'TestsFor::Person';

sub extra_constructor_args {
    return ( employee_number => 666 );
}

BEGIN {
    after 'test_constructor' => sub {
        my $test = shift;
        is $test->test_person->employee_number, 666,
          '... and we should get the correct employee number';
    };
}

1;

That's really ugly and, in fact, I received an email from someone who called me on that, arguing that Test::Routine and Test::Roo were simpler in this case. I think he was right and I thought for a while about the best way to resolve this. You shouldn't have to think about timing issues like this when you're writing your test classes.

A Cleaner Test::Class::Moose

Today I've released Test::Class::Moose version 0.11 and it makes this case easier to handle. Your base class now can look like this:

package My::Test::Class;
use Test::Class::Moose;
1;

In fact, you may not even need a base test class at all — just have your classes use Test::Class::Moose directly — but a base test class is handy enough that people put all sorts of interesting code in there.

So how do you do the INIT trick that lets you run an individual test class? I now recommend use of a driver *.t script for your test suite. Here's what one might look like:

use Test::Class::Moose::Load 't/lib';
my $test_suite = Test::Class::Moose->new(
     show_timing  => 1,
     randomize    => 0,
     statistics   => 1,
     test_classes => $ENV{TEST_CLASS},
)->runtests;

my $report = $test_suite->test_report;
# the reporting object, of course, can be ignored if you
# don't need reporting on the test suite

Unlike Test::Class, Test::Class::Moose has a constructor that lets you control the test suite behavior prior to running the suite. There is a new attribute, test_classes which takes the name of a test class or an array reference of test class names. Only those test classes will be run. Yes, you could have written a Test::Class::Moose subclass to control this, but I wanted to make this built-in because running individual classes is a common use case. Now, instead of this:

prove -lv t/lib/path/to/my/test/class.pm

You do this:

TEST_CLASS=Class::I::Want::To::Run prove -lv t/test_class_tests.t

And things work just like they are supposed to. As a bonus, that nasty BEGIN block in my TestsFor::Person::Employee class goes away:

package TestsFor::Person::Employee;
use Test::Class::Moose extends => 'TestsFor::Person';

sub extra_constructor_args {
    return ( employee_number => 666 );
}

after 'test_constructor' => sub {
    my $test = shift;
    is $test->test_person->employee_number, 666,
      '... and we should get the correct employee number';
};

1;

And your test classes now look like proper Moose code rather than worrying about BEGIN versus INIT phases.
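Since test_classes is described above as taking either a single class name or an array reference of names, a driver script could accept several classes in one environment variable. The sketch below is only an assumption about how one might do that: the helper test_classes_from_env and the comma-separated format are hypothetical and not part of Test::Class::Moose.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a comma-separated TEST_CLASS value into either
# undef (nothing requested), a single class name, or an array reference.
sub test_classes_from_env {
    my ($raw) = @_;
    # Explicit "return undef" so the helper still yields exactly one value
    # when called in list context, e.g. inside a constructor argument list.
    return undef unless defined($raw) && length($raw);

    $raw =~ s/^\s+|\s+$//g;                       # trim surrounding whitespace
    my @names = grep { length } split /\s*,\s*/, $raw;

    return undef unless @names;
    return @names == 1 ? $names[0] : \@names;
}

my $requested = test_classes_from_env( $ENV{TEST_CLASS} );
print defined $requested
    ? 'Requested: ' . ( ref $requested ? join( ', ', @$requested ) : $requested ) . "\n"
    : "No specific test classes requested\n";
```

The result could then be passed as the test_classes argument in the driver shown earlier, in place of $ENV{TEST_CLASS}.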

I'll be giving a talk about Test::Class::Moose at YAPC::NA and I've also proposed it for YAPC::EU. I hope to see you there!

Modern Perl Blog: Including People

The YAPC North America Perl Conferences of the past couple of years have held an event I like quite a bit. It's the first-time attendees' mixer.

As of a couple of years ago, YAPC::NA organizers and attendees realized that about half of the attendees of each conference had never attended a YAPC before. That's between one and two hundred people whose main face to face involvement with the larger Perl community may have been limited to a local Perl Mongers meeting. Yes, these attendees have almost certainly used the CPAN, very likely participated in a discussion on a Perl web site, mostly used a Perl mailing list (and not just the YAPC mailing list), and have probably been on a Perl IRC channel, but they probably aren't the people you think of when you think "Who are the best connected people in the Perl community?"

As far back as I can remember (which, admittedly, is one of the YAPC::NAs in Chicago), an early morning talk has served as an introduction to YAPC, specifically intended to help new attendees understand the conference and its quirks and norms. That talk invites these attendees to the novice welcome meeting.

The organizers also grab as many of the well connected people in the community—pumpkings, core developers, CPAN contributors, authors, project leaders, anyone whose name you might recognize—and ask them to show up and be willing to talk to people. That's it.

What I like about this system is that it welcomes people in two ways. First, it acknowledges that it's okay to be new to YAPC or the Perl community in person. If that's you, you're not alone. Half of everyone you're going to see at the conference is like you in that sense.

Not only that, but you have permission to participate. You're welcome to attend this little meetup that has an explicit place in the schedule—it's an official part of the conference—and you're encouraged to talk to people you might know only by reputation. They're there to meet and talk to you. They're not there to hang out in little groups by themselves. They're there to talk to you.

I've heard good things about this event. I've enjoyed it every time I've gone. (As an introvert myself, I like having permission to talk to people with a limit of a couple of hours.)

I feel the same way about a YAPC Code of Conduct. I don't see it as a warning that "straying from a straight and narrow path of arbitrariness will not be tolerated, so if you're not sure if you might accidentally say or do something someone else doesn't like, stay away!" I see it as giving people who aren't necessarily well versed in the norms and ideals and messy politics of dealing with the Perl community every day virtually and in person permission to believe that they should feel welcome and important in the community.

It's about empathy.

I understand people disagree about the wording and even need for a code of conduct, and I don't mean to suggest that such concerns come from robots who lack empathy. By no means.

Yet put yourself in the shoes of someone who feels like he might not quite fit in in a talk, because it's full of inside jokes and jargon and the kinds of comfortable banter you only get after you've idled on a handful of Perl IRC channels for months or years. (Imagine that person's an introvert, or at least not as stubborn as I am.) Now imagine the speaker or someone else says or does something that reminds that person that he doesn't belong there, that he doesn't fit in, that he's different.

That's not necessarily assault. That's not necessarily a criminal act. But it's probably unnecessary and hopefully unintentional.

(The best silly example I can come up with is a speaker saying "... and of course, if you're a Windows user, no one cares about you until you man up and get a real operating system." and half of the audience laughs.)

Sure, there are good legal reasons to have a code of conduct that suggests that criminal activities such as assault, battery, and sexual assault are intolerable. There's no gray area about groping or rape or physical violence.

I agree with a lot of Open source is not a war zone, and I agree fully that the Perl community is all the richer for contributions of people like Wendy and Liz and Karen and Su-Shee et cetera. I'm glad they participate, just like I'm glad people like Tim Bunce and Schwern and brian d foy and Dave Rolsky and Rik et cetera participate.

I just can't quite agree that a code of conduct has a chilling effect that will exclude people. I don't see it.

Maybe that's because I see the code of conduct as explicitly saying "If you feel like you don't quite fit in, that's okay. You're welcome here. We take you seriously. If you have one of these big problems—even if it's with a speaker or writer or developer with a famous name—we'll take that seriously. The rules apply to everyone."

Sure, there's more to helping newcomers feel welcome than enforcing a policy of civil conduct, but that's the minimum I want to see, and I'm glad that YAPCs have done other things (less controversial, I'm sure) to that end.

Joel Berger: My "Mojolicious Introduction" now updated for 4.0

On Feb 28, 2013 I gave a talk to Chicago.pm about Mojolicious. I called it an introduction, but I really wanted to show some of the features that set Mojolicious apart. Because of this, the talk moves very fast. It hits routing and responses quickly, hits testing often, and goes all the way to well-tested non-blocking websocket examples.

I promised to get my slides up afterwards but life (i.e. my doctoral thesis) got in the way. Now with the release of Mojolicious 4.0 I thought I would take the opportunity to right a wrong and get the slides up; so here they are: http://mojolicious-introduction.herokuapp.com/!

The talk is itself a Mojolicious app, the source of which is available on GitHub. Not only are all the code snippets shown in the talk included, not only do they all run, but they are actually what is rendered by the talk (DRY++), so what you see is what you get! Please leave any feedback and ask any questions. I may not see the responses here, so feel free to ping me elsewhere if needed.

dagolden: Anyone want vanillaperl.com?

I'm tired of paying the domain bill for vanillaperl.com (which currently just redirects to strawberryperl.com).

It will lapse at the beginning of July unless someone wants to take it.

If you're interested, leave a comment below and explain what you want to do with it. I'll award it to the best proposal received by June 1.

Schwern: Blog moved to blog.schwern.net

My blog has moved to blog.schwern.net.

Sebastian Riedel about Perl and the Web: Mojolicious 4.0 released: Perl real-time web framework

It fills me with great joy to announce our classiest release yet, Mojolicious 4.0 (Top Hat).

This is the first major release for the newest member of our core team, please welcome Dr. Joel Berger, he was responsible for many of the new features. It has only been 11 months since our 3.0 release and the new development process is working out very well for us so far. The community keeps growing fast, we’ve now been starred almost 900 times on GitHub and the IRC channel regularly reaches more than 150 visitors, thanks everyone!

While the main focus of this release has been on the removal of legacy APIs, there are also quite a few new features, here are the highlights:

  • Content generators: “json” and “form” generators are built right in. (example)
  • JSON WebSocket messages: Native serialization and deserialization support. (example)
  • JSON WebSocket tests: Just as easy to use as their HTTP equivalents. (example)
  • Event synchronization: Avoid callback spaghetti with delays. (example)
  • Scalability: The event loop got a lot better at managing more than 10k concurrent connections. (example)
  • Smooth restarting: The Morbo development web server does not have any noticeable downtime while restarting anymore.
  • Hooks: The framework got more extensible with new hooks. (example)
  • GZip: Compression is now transparently supported by the user agent.
  • HTML5 forms: Tag helpers have been added for many of the new form elements. (example)
  • Session expiration: Can now be controlled with a relative value that persists within the session. (example)
  • GET/POST parameters: Retrieve multiple values at once with the much more secure multi name form. (example)
  • JSON Pointers: Now fully RFC 6901 compliant. (example)
  • Monotonic clock support: All built-in web servers are now very resilient to time jumps.

And as usual there is a lot more to discover, see Changes on GitHub for the full list of improvements.

Have fun!

Perlgeek.de : Exceptions Grant Report -- Final update

In my previous blog post I mentioned that I'm nearly done with my exceptions Hague grant. I have since done all the things that I identified as still missing.

In particular I ack'ed through the setting for remaining uses of die, and the only things left are internal errors, error messages about not-yet-implemented things and the actual declaration of die. This means that everything that should be a typed exception now is one.

The error catalogue can be found in S32::Exception. Documentation for compiler writers is in a separate document, and the promised documentation for test authors is in the POD of Test::Util in the "roast" repository.

Now I wait for review of my work by the grant manager (thanks Will) and the grant committee.

I'd like to thank everybody who was involved with the grant.

Dave's Free Press: Journal: Module pre-requisites analyser

Dave's Free Press: Journal: CPANdeps

Dave's Free Press: Journal: Perl isn't dieing

Ocean of Awareness: Marpa's SLIF now allows procedural parsing

Marpa's SLIF (scanless interface) allows an application to parse directly from any BNF grammar. Marpa parses vast classes of grammars in linear time, including all those classes currently in practical use. With its latest release, Marpa::R2's SLIF also allows an application to intermix its own custom lexing and parsing logic with Marpa's, and to switch back and forth between them. This means, among other things, that Marpa's SLIF can now do procedural parsing.

What is procedural parsing? Procedural parsing is parsing using ad hoc code in a procedural language. The opposite of procedural parsing is declarative parsing -- parsing driven by some kind of formal description of the grammar. Procedural parsing may be described as what you do when you've given up on your parsing algorithm. Dissatisfaction with parsing theory has left modern programmers accustomed to procedural parsing. And in fact some problems are best tackled with procedural parsing.

An example

One such problem is parsing Perl-style here-documents. Peter Stuifzand has tackled this using the just-released version of Marpa::R2. For those unfamiliar, Perl allows documents to be incorporated into its source files in line-oriented fashion as "here-documents". Here-documents can be used in expressions. The syntax to do this is very handy, if a little strange. For example,

say <<ENDA, <<ENDB, <<ENDC; say <<ENDD;
a
ENDA
b
ENDB
c
ENDC
d
ENDD

starts with a single line declaring four here-documents spread out over two say statements. The expressions of the form

<<ENDX

are here-document expressions. << is the heredoc operator. The string which follows it (in this example, ENDA, ENDB, etc.) is the heredoc terminator string -- the string that will signal the end of the body of the here-document. The bodies of the here-documents follow, in order, over the next eight lines. More details of here-document syntax, with examples, can be found in the Perl documentation.

All of this poses quite a challenge to a parser-lexer combination, which is one reason I chose it as an example -- to illustrate that Marpa's SLIF support for procedural parsing can handle genuinely difficult cases. There are a few ways Marpa could approach this. The one Peter Stuifzand chose was to read the here-document's body as the value of the terminator in each <<ENDX expression.

The strategy works this way: Marpa allows the application to mark certain lexemes as "pause" lexemes. Whenever a "pause" lexeme is encountered, Marpa's internal scanning stops, and control is handed over to the application. In this case, the application is set up to pause after every newline, and before the terminator in every here-document expression.

While reading the line containing the four here-document expressions, Marpa's SLIF pauses and resumes five times -- once for each here-document expression, then once for the final newline. Details can be found in compact form in the heavily commented code in this Github gist.
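To give a feel for the procedural half of this strategy, here is a minimal sketch in plain Perl that does not use Marpa at all. It is a swapped-in simplification, not the code from the gist: the hypothetical helper read_heredocs scans the first line for <<TERM markers and then reads each body from the following lines, in the order the terminators were declared.

```perl
use strict;
use warnings;

# Hypothetical helper (not Marpa's API): return a list of [terminator, body]
# pairs for the here-documents declared on the first line of $source.
sub read_heredocs {
    my ($source) = @_;
    my @lines = split /^/m, $source;          # split into lines, keeping newlines
    my $first = shift @lines;
    $first = '' unless defined $first;

    # Each <<TERM on the declaring line names one here-document.
    my @terminators = $first =~ /<<(\w+)/g;

    my @docs;
    for my $term (@terminators) {
        my $body = '';
        while ( defined( my $line = shift @lines ) ) {
            ( my $bare = $line ) =~ s/\n\z//; # compare without the newline
            last if $bare eq $term;           # terminator line ends this body
            $body .= $line;
        }
        push @docs, [ $term, $body ];
    }
    return @docs;
}

# The example from the text: four here-documents declared on one line.
my $source = join "\n",
    'say <<ENDA, <<ENDB, <<ENDC; say <<ENDD;',
    qw(a ENDA b ENDB c ENDC d ENDD), '';

for my $doc ( read_heredocs($source) ) {
    my ( $term, $body ) = @$doc;
    chomp $body;
    print "$term => $body\n";
}
```

In the approach described above, Marpa's grammar handles the declaring line and only the body-reading step is handed to custom code like this.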

Marpa as a better procedural parser

So far I've talked in terms of Marpa "allowing" procedural parsing. In fact, there can be much more to it. Marpa can make procedural parsing easier and more accurate.

Marpa knows, at every point, which rules it is recognizing, and how far it is into them. Marpa also knows which new rules the grammar expects, and which terminals. The procedural parsing logic can consult this information to guide its decisions. Marpa can provide your procedural parsing logic with radar, as well as the option to use a very smart autopilot.

For more about Marpa

Marpa's latest version is Marpa::R2, which is available on CPAN. Marpa's SLIF is a new interface, which represents a major increase in Marpa's "whipitupitude". The SLIF has tutorials here and here. Marpa has a web page, and of course it is the focus of my "Ocean of Awareness" blog.

Comments on this post can be sent to Marpa's Google Group: marpa-parser@googlegroups.com

Dave's Free Press: Journal: YAPC::Europe 2007 report: day 3

Ocean of Awareness: Is Earley parsing fast enough?

"First we ask, what impact will our algorithm have on the parsing done in production compilers for existing programming languages? The answer is, practically none." -- Jay Earley's Ph.D thesis, p. 122.

In the above quote, the inventor of the Earley parsing algorithm poses a question. Is his algorithm fast enough for a production compiler? His answer is a stark "no".

This is the verdict on Earley's that you often hear repeated today, 45 years later. Earley's, it is said, has too high a "constant factor". Verdicts tend to be repeated more often than examined. This particular verdict originates with the inventor himself. So perhaps it is not astonishing that many treat the dismissal of Earley's on grounds of speed to be as valid today as it was in 1968.

But in the past 45 years, computer technology has changed beyond recognition and researchers have made several significant improvements to Earley's. It is time to reopen this case.

What is a "constant factor"

The term "constant factor" here has a special meaning, one worth looking at carefully. Programmers talk about time efficiency in two ways: time complexity and speed.

Speed is simple: It's how fast the algorithm is against the clock. To make comparison easy, the clock can be an abstraction. The clock ticks could be, for example, weighted instructions on some convenient and mutually-agreed architecture.

By the time Earley was writing, programmers had discovered that simply comparing speeds, even on well-chosen abstract clocks, was not enough. Computers were improving very quickly. A speed result that was clearly significant when the comparison was made could quickly become unimportant. Researchers needed to talk about time efficiency in a way that made what they said as true decades later as on the day they said it. To do this, researchers created the idea of time complexity.

Time complexity is measured using several notations, but the most common is big-O notation. Here's the idea: Assume we are comparing two algorithms, Algorithm A and Algorithm B. Assume that algorithm A uses 42 weighted instructions for each input symbol. Assume that algorithm B uses 1792 weighted instructions for each input symbol. Where the count of input symbols is N, A's speed is 42*N, and B's is 1792*N. But the time complexity of both is the same: O(N). The big-O notation throws away the two "constant factors", 42 and 1792. Both are said to be "linear in N". (Or more often, just "linear".)
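As a small worked illustration of the point above, here is a sketch in Perl using the two per-symbol costs from the example. The helper name total_cost is made up for this illustration. The ratio between the two algorithms stays the same however large N gets, which is exactly the "constant factor" that big-O notation throws away.

```perl
use strict;
use warnings;

# Weighted instructions per input symbol, taken from the example above.
my %cost_per_symbol = ( A => 42, B => 1792 );

# Total cost of an algorithm on an input of $n symbols (hypothetical helper).
sub total_cost {
    my ( $algorithm, $n ) = @_;
    return $cost_per_symbol{$algorithm} * $n;
}

# Both costs grow linearly with N, so their ratio never changes.
for my $n ( 10, 1_000, 1_000_000 ) {
    my $ratio = total_cost( 'B', $n ) / total_cost( 'A', $n );
    printf "N=%-8d A=%-11d B=%-11d B/A=%.2f\n",
        $n, total_cost( 'A', $n ), total_cost( 'B', $n ), $ratio;
}
```

For every N the last column is the same, about 42.67: B is slower than A by a fixed factor, but both are O(N).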

It often happens that algorithms we need to compare for time efficiency have different speeds, but the same time complexity. In practice, this usually means we can treat them as having essentially the same time efficiency. But not always. It sometimes happens that this difference is relevant. When this happens, the rap against the slower algorithm is that it has a "high constant factor".

OK, about that high constant factor

What is the "constant factor" between Earley and the current favorite parsing algorithm, as a number? (My interest is practical, not historic, so I will be talking about Earley's as modernized by Aycock, Horspool, Leo and myself. But much of what I say applies to Earley's algorithm in general.)

What the current favorite parsing algorithm is can be an interesting question. When Earley wrote, it was hand-written recursive descent. The next year (1969) LALR parsing was invented, and the year after (1970) a tool that used it was introduced -- yacc. At points over the next decades, yacc chased both Earley's and recursive descent almost completely out of the textbooks. But as I have detailed elsewhere, yacc had serious problems. In 2006 things went full circle -- the industry's standard C compiler, GCC, replaced LALR with recursive descent.

So back to 1970. That year, Jay Earley wrote up his algorithm for "Communications of the ACM", and put a rough number on his "constant factor". He said that his algorithm was an "order of magnitude" slower than the current favorites -- a factor of 10. Earley suggested ways to lower this 10-handicap, and modern implementations have followed up on them and found others. But for this post, let's concede the factor of ten and throw in another. Let's say Earley's is 100 times slower than the current favorite, whatever that happens to be.

Moore's Law and beyond

Let's look at the handicap of 100 in the light of Moore's Law. Since 1968, computers have gotten a billion times faster -- nine orders of magnitude. Nine factors of ten. This means that today Earley's runs seven factors of ten faster than the current favorite algorithm did in 1968. Earley's is 10 million times as fast as the algorithm that was then considered practical.

Of course, our standard of "fast enough to be practical" also evolves. But it evolves a lot more slowly. Let's exaggerate and say that "practical" meant "takes an hour" in 1968, but that today we would demand that the same program take only a second. Do the arithmetic and you find that Earley's is now more than 2,000 times faster than it needs to be to be practical.
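The arithmetic can be spelled out in a few lines of Perl. This is only a sketch of the reasoning above. All the figures come from the text, and the variable names are mine. It assumes, as the argument implicitly does, that the favorite algorithm of 1968 just met the one-hour standard.

```perl
use strict;
use warnings;

my $hardware_speedup = 10**9;      # "a billion times faster" since 1968
my $earley_handicap  = 100;        # conceded slowdown versus the favorite
my $seconds_1968     = 60 * 60;    # "practical" in 1968: an hour
my $seconds_today    = 1;          # "practical" today: a second

# Earley's on today's hardware versus the favorite on 1968 hardware.
my $net_speedup = $hardware_speedup / $earley_handicap;

# How much stricter the standard of "practical" has become.
my $stricter_standard = $seconds_1968 / $seconds_today;

# Margin by which Earley's beats today's standard.
my $margin = $net_speedup / $stricter_standard;

printf "net speedup: %d\n",   $net_speedup;    # prints 10000000
printf "margin:      %.0f\n", $margin;         # prints 2778
```

The margin of roughly 2,778 is where the "more than 2,000 times faster" figure comes from.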

Bringing in Moore's Law is just the beginning. The handicap Jay Earley gave his algorithm is based on a straight comparison of CPU speeds. But parsing, in practical cases, involves I/O. And the "current favorite" needs to do as much I/O as Earley's. I/O overheads, and the accompanying context switches, swamp considerations of CPU speed, and that is more true today than it was in 1968. When an application is I/O bound, CPU is in effect free. Parsing may not be I/O bound in this sense, but neither is it one of those applications where the comparison can be made in raw CPU terms.

Finally, pipelining has changed the nature of the CPU overhead itself radically. In 1968, the time to run a series of CPU instructions varied linearly with the number of instructions. Today, that is no longer true, and the change favors strategies like Earley's, which require a higher instruction count, but achieve efficiency in other ways.

Achievable speed

So far, I've spoken in terms of theoretical speeds, not achievable ones. That is, I've assumed that both Earley's and the current favorite are producing their best speed, unimpeded by implementation considerations.

Earley, writing in 1968 and thinking of hand-written recursive descent, assumed that production compilers could be, and in practice usually would be, written by programmers with plenty of time to do careful and well-thought-out hand-optimization. After forty-five years of real-life experience, we know better.

In those widely used practical compilers and interpreters that rely on lots of procedural logic -- and these days that is almost all of them -- it is usually all the maintainers can do to keep the procedural logic correct. In all but a few cases, optimization is opportunistic, not systematic. Programmers have been exposed to the realities of parsing with large amounts of complex procedural logic, and hand-written recursive descent has acquired a reputation for being slow.

In theory, LALR based compilers are less dependent on procedural parsing and therefore easier to keep optimal. In practice they are as bad or worse. LALR parsers usually still need a considerable amount of procedural logic, but procedural logic is harder to write for LALR than it is for recursive descent.

Modern Earley parsing has a much easier time actually delivering its theoretical best speed in practice. Earley's is powerful enough, and in its modern version well-enough aware of the state of the parse, that procedural logic can be kept to a minimum or eliminated. Most of the parsing is done by the mathematics at its core.

The math at Earley's core can be heavily optimized, and any optimization benefits all applications. Optimization of special-purpose procedural logic benefits only the application that uses that logic.

Other considerations

But you might say,

"A lot of interesting points, Jeffrey, but all things being equal, a factor of 10, or even what's left from a factor of ten once I/O, pipelining and implementation inefficiencies have all nibbled away at it, is still worth having. It may in a lot of instances not even be measurable, but why not grab it for the sake of the cases where it is?"

Which is a good point. The "implementation inefficiencies" can be nasty enough that Earley's is in fact faster in raw terms, but let's assume that some cost in speed is still being paid for the use of Earley's. Why incur that cost?

Error diagnosis

The parsing algorithms currently favored, in their quest for efficiency, do not maintain full information about the state of the parse. This is fine when the source is 100% correct, but in practice an important function of a parser is to find and diagnose errors. When the parse fails, the current favorites often have little idea of why. An Earley parser knows the full state of the parse. This added knowledge can save a lot of programmer time.

Readability

The more that a parser does from the grammar, and the less procedural logic it uses, the more readable the code will be. This has a determining effect on maintenance costs and the software's ability to evolve over time.

Accuracy

Procedural logic can produce inaccuracy -- inability to describe or control the actual language being parsed. Some parsers, particularly LALR and PEG, have a second major source of inaccuracy -- they use a precedence scheme for conflict resolution. In specific cases, this can work, but precedence-driven conflict resolution produces a language without a "clean" theoretical description.

The obvious problem with not knowing what language you are parsing is failure to parse correct source code. But another, more subtle, problem can be worse over the life cycle of a language ...

False positives

False positives are cases where the input is in error, and should be reported as such, but instead the result is what you wanted. This may sound like unexpected good news, but when a false positive does surface, it is quite possible that it cannot be fixed without breaking code that, while incorrect, does work. Over the life of a language, false positives are deadly. False positives produce buggy and poorly understood code which must be preserved and maintained forever.

Power

The modern Earley implementation can parse vast classes of grammar in linear time. These classes include all those currently in practical use.

Flexibility

Modern Earley implementations parse all context-free grammars in times that are, in practice, considered optimal. With other parsers, the class of grammars parsed is highly restricted, and there is usually a real danger that a new change will violate those restrictions. As mentioned, the favorite alternatives to Earley's make it hard to know exactly what language you are, in fact, parsing. A change can break one of these parsers without there being any indication. By comparison, syntax changes and extensions to Earley's grammars are carefree.

For more about Marpa

Above I've spoken of "modern Earley parsing", by which I've meant Earley parsing as amended and improved by the efforts of Aycock, Horspool, Leo and myself. At the moment, the only implementation that contains all of these modernizations is Marpa.

Marpa's latest version is Marpa::R2, which is available on CPAN. Marpa's SLIF is a new interface, which represents a major increase in Marpa's "whipitupitude". The SLIF has tutorials here and here. Marpa has a web page, and of course it is the focus of my "Ocean of Awareness" blog.

Comments on this post can be sent to the Marpa's Google Group: marpa-parser@googlegroups.com

Dave's Free Press: Journal: Devel::CheckLib can now check libraries' contents

Perlgeek.de : Rakudo's Abstract Syntax Tree

After or while a compiler parses a program, the compiler usually translates the source code into a tree format called Abstract Syntax Tree, or AST for short.

The optimizer works on this program representation, and then the code generation stage turns it into a format that the platform underneath it can understand. Actually I wanted to write about the optimizer, but noticed that understanding the AST is crucial to understanding the optimizer, so let's talk about the AST first.

The Rakudo Perl 6 Compiler uses an AST format called QAST. QAST nodes derive from the common superclass QAST::Node, which sets up the basic structure of all QAST classes. Each QAST node has a list of child nodes, possibly a hash map for unstructured annotations, an attribute (confusingly) named node for storing the lower-level parse tree (which is used to extract line numbers and context), and a bit of extra infrastructure.

The most important node classes are the following:

QAST::Stmts
A list of statements. Each child of the node is considered a separate statement.
QAST::Op
A single operation that usually maps to a primitive operation of the underlying platform, like adding two integers, or calling a routine.
QAST::IVal, QAST::NVal, QAST::SVal
Those hold integer, float ("numeric") and string constants respectively.
QAST::WVal
Holds a reference to a more complex object (for example a class) which is serialized separately.
QAST::Block
A list of statements that introduces a separate lexical scope.
QAST::Var
A variable
QAST::Want
A node that can evaluate to different child nodes, depending on the context it is compiled in.

To give you a bit of a feel for how those node types interact, I want to give a few examples of Perl 6 code, and what AST they could produce. (It turns out that Perl 6 is quite a complex language under the hood, and usually produces a more complicated AST than the obvious one; I'll ignore that for now, in order to introduce you to the basics.)

Ops and Constants

The expression 23 + 42 could, in the simplest case, produce this AST:

QAST::Op.new(
    :op('add'),
    QAST::IVal.new(:value(23)),
    QAST::IVal.new(:value(42)),
);

Here a QAST::Op encodes a primitive operation, an addition of two numbers. The :op argument specifies which operation to use. The child nodes are two constants, both of type QAST::IVal, which hold the operands of the low-level operation add.

Now the low-level add operation is not polymorphic; it always adds two floating-point values, and the result is a floating-point value again. Since the arguments are integers and not floating-point values, they are automatically converted to float first. That's not the desired semantics for Perl 6; actually the operator + is implemented as a subroutine of name &infix:<+>, so the real generated code is closer to

QAST::Op.new(
    :op('call'),
    :name('&infix:<+>'),    # name of the subroutine to call
    QAST::IVal.new(:value(23)),
    QAST::IVal.new(:value(42)),
);

Variables and Blocks

Using a variable is as simple as writing QAST::Var.new(:name('name-of-the-variable')), but it must be declared first. This is done with QAST::Var.new(:name('name-of-the-variable'), :decl('var'), :scope('lexical')).

But there is a slight caveat: in Perl 6 a variable is always scoped to a block. So while you can't ordinarily mention a variable prior to its declaration, there are indirect ways to achieve that (lookup by name, and eval(), to name just two).

So in Rakudo there is a convention to create QAST::Block nodes with two QAST::Stmts children. The first holds all the declarations, and the second all the actual code. That way all the declarations always come before the rest of the code.

So my $x = 42; say $x compiles to roughly this:

QAST::Block.new(
    QAST::Stmts.new(
        QAST::Var.new(:name('$x'), :decl('var'), :scope('lexical')),
    ),
    QAST::Stmts.new(
        QAST::Op.new(
            :op('p6store'),
            QAST::Var.new(:name('$x')),
            QAST::IVal.new(:value(42)),
        ),
        QAST::Op.new(
            :op('call'),
            :name('&say'),
            QAST::Var.new(:name('$x')),
        ),
    ),
);

Polymorphism and QAST::Want

Perl 6 distinguishes between native types and reference types. Native types are closer to the machine, and their type name is always lower case in Perl 6.

Integer literals are polymorphic in that they can be either a native int or a "boxed" reference type Int.

To model this in the AST, QAST::Want nodes can contain multiple child nodes. The compile-time context decides which of those is actually used.

So the integer literal 42 actually produces not just a simple QAST::IVal node but rather this:

QAST::Want.new(
    QAST::WVal.new(:value(Int.new(42))),
    'Ii',
    QAST::IVal.new(:value(42)),
)

(Note that Int.new(42) is just a nice notation to indicate a boxed integer object; it doesn't quite work like this in the code that translates Perl 6 source code into ASTs.)

The first child of a QAST::Want node is the one used by default, if no other alternative matches. Then comes a list where the elements with odd indexes are format specifications (here Ii for integers) and the elements with even indexes are the ASTs to use in those cases.

An interesting format specification is 'v' for void context, which is always chosen when the return value from the current expression isn't used at all. In Perl 6 this is used to eagerly evaluate lazy lists that are used in void context, and for several optimizations.
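To make the selection rule concrete, here is a toy model in Perl 5. It is not Rakudo's code. The function select_want_child is invented for this sketch, plain strings stand in for AST nodes, and it assumes that a format specification such as 'Ii' matches when it contains the letter of the requested context.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Rakudo: given a context letter and the
# children of a Want-style node, return the child that would be compiled.
sub select_want_child {
    my ( $context, @children ) = @_;
    my $default = $children[0];

    # After the default, children come in (format spec, alternative) pairs:
    # specs at odd indexes, alternatives at even indexes.
    for ( my $i = 1; $i + 1 < @children; $i += 2 ) {
        my ( $spec, $alternative ) = @children[ $i, $i + 1 ];

        # Assumption: a spec lists every context letter it applies to.
        return $alternative if index( $spec, $context ) >= 0;
    }
    return $default;
}

# Strings stand in for the QAST::WVal and QAST::IVal nodes shown above.
my @children = ( 'boxed Int 42', 'Ii', 'native int 42' );

print select_want_child( 'I', @children ), "\n";    # native int 42
print select_want_child( 'v', @children ), "\n";    # boxed Int 42
```

In the second call nothing matches the void context 'v', so the default (first) child is chosen.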

Dave's Free Press: Journal: I Love Github

Ocean of Awareness: What if languages were free?

In 1980, George Copeland wrote an article titled "What if Mass Storage were Free?". Costs of mass storage were showing signs that they might fall dramatically. Copeland, as a thought exercise, took this trend to its extreme. Among other things, he predicted that deletion would become unnecessary, and in fact, undesirable.

Copeland's thought experiment has proved prophetic. For many purposes, mass storage is treated as if it were free. For example, you probably retrieved this blog post from a server provided to me at no charge, in the hope that I might write and upload something interesting.

Until now languages were high-cost efforts. Worse, language projects ran a high risk of disappointment, up to and including total failure. I believe those days are coming to an end.

Small languages, shaped to the problem domain

What if whenever you needed a new language, poof, it was there? You would be encouraged to tackle each problem domain with a new language dedicated to dealing with that domain. Since each language is no larger than its problem domain, learning a language would be essentially the same as learning the problem domain. The incremental effort required to learn the language itself would head toward zero.

No more language bloat

Language bloat would end. Currently, the risk and cost of developing languages make it imperative to extend the ones we have. Free languages mean fewer reasons to add features to existing languages.

No more search for THE perfect language

No language is perfect for all tasks. But because the high cost of languages favors large, general-purpose languages, we are compelled to try for perfection anyway. Ironically, we are often making the language worse, and we know it.

A world full of perfect languages

An older sense of the word perfect is "having all the properties or qualities requisite to its nature and kind". The C language might be called perfect in this sense. C lacks a lot of features that are highly desirable in most contexts. But for programming that is portable and close to the hardware, the C language is perfect or close to it. If languages were free, this is the kind of perfection that we would seek -- languages precisely fitted to their domain, so that adding to them cannot make them better.

Moving toward free

My own effort to contribute to a fall in the cost of languages is the Marpa parser. Marpa produces a reasonable parser for every language you can write in BNF. If the BNF is for a grammar in any of the classes currently in practical use, the parser Marpa produces will have linear speed. In one case, using Marpa, a targeted language was written in less than an hour. More typically, Marpa reduces the time needed to create new languages to hours.

As one example of going from "impossible" to "easy", I have written a drop-in solution to an example in the Gang of Four book. The Gang of Four described a language and its interpretation, but they did not include a parser. Creating a parser to fit their example would have been impossibly hard when the Gang of Four wrote. Using Marpa, it is easy. The parser can be found in this earlier blog post.

Marpa's latest version is Marpa::R2, which is available on CPAN. Recently, it has gained immensely in "whipitupitude" with a new interface, which has tutorials here and here. Marpa has a web page, and of course it is the focus of my "Ocean of Awareness" blog.

Comments on this post can be sent to the Marpa's Google Group: marpa-parser@googlegroups.com

Dave's Free Press: Journal: Palm Treo call db module

Ocean of Awareness: BNF to AST

The latest version of Marpa takes parsing "whipitupitude" one step further. You can now go straight from a BNF description of your language, and an input string, to an abstract syntax tree (AST).

To illustrate, I'll use an example from the Gang of Four's (Go4's) chapter on the Interpreter pattern. (It's pages 243-255 of the Design Patterns book.) The Go4 knew of no easy general way to go from BNF to AST, so they dealt with that part of the interpreter problem by punting -- they did not even try to parse the input string. Instead they bypassed the BNF they'd just presented and constructed an AST directly in their code.

The reason the Go4 didn't know of an easy, generally-applicable way to parse their example was that there was none. Now there is. In this post, Marpa will take us quickly and easily from BNF to AST. (Full code for this post can be found in a Github gist.)

The Go4's example was a simple boolean expression language, whose primary input was

true and x or y and not x

Here, in full, is the BNF for a slight elaboration of the Go4 example. It is written in the DSL for Marpa's Scanless interface (SLIF DSL), and includes specifications for building the AST.

:default ::= action => ::array

:start ::= <boolean expression>
<boolean expression> ::=
       <variable> bless => variable
     | '1' bless => constant
     | '0' bless => constant
     | ('(') <boolean expression> (')') action => ::first bless => ::undef
    || ('not') <boolean expression> bless => not
    || <boolean expression> ('and') <boolean expression> bless => and
    || <boolean expression> ('or') <boolean expression> bless => or

<variable> ~ [[:alpha:]] <zero or more word characters>
<zero or more word characters> ~ [\w]*

:discard ~ whitespace
whitespace ~ [\s]+

This syntax should be fairly transparent. In previous posts I've given a tutorial, and a mini-tutorial. And of course, the interface is documented.

For those skimming, here are a few quick comments on less-obvious features. To guide Marpa in building the AST, the BNF statements have action and bless adverbs. The bless adverbs indicate a Perl class into which the node should be blessed. This is convenient for using an object-oriented approach with the AST. The action adverb tells Marpa how to build the nodes. "action => ::array" means the result of the rule should be an array containing its child nodes. "action => ::first" means the result of the rule should just be its first child. Many of the child symbols, especially literal strings of a structural nature, are in parentheses. This makes them invisible to the semantics.

A :default pseudo-rule specifies the defaults -- in this case the "action => ::array" adverb setting. The :start pseudo-rule specifies the start symbol. The :discard pseudo-rule indicates that whitespace is to be discarded.

The Go4 did not deal with precedence. In their example, the input string is fully parenthesized, even though its priorities are the standard ones. I've eliminated the parentheses, because the standard precedence is implemented in the SLIF grammar. The double vertical bar ("||") is a "loosen" operator -- an alternative after a "loosen" operator will be at a looser precedence than the one before. Alternatives separated by a single bar are at the same precedence.

Creating the AST

Creating the AST is simple. First, we use Marpa to turn the above DSL for boolean expressions into a parser. (We'd saved the SLIF DSL source in the string $rules.)

my $grammar = Marpa::R2::Scanless::G->new(
    {   bless_package => 'Boolean_Expression',
        source        => \$rules,
    }   
);  

Next we define a closure that uses $grammar to turn BNF into AST's.

sub bnf_to_ast {
    my ($bnf) = @_;
    my $recce = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
    $recce->read( \$bnf );
    my $value_ref = $recce->value();
    if ( not defined $value_ref ) {
        die "No parse for $bnf";
    }
    return ${$value_ref};
} ## end sub bnf_to_ast

Where $bnf is our input string, we run it as follows:

my $ast1 = bnf_to_ast($bnf);

The AST

If we use Data::Dumper to examine the AST,

say Data::Dumper::Dumper($ast1) if $verbose_flag;

we see this:

$VAR1 = bless( [
                 bless( [
                          bless( [
                                   'true'
                                 ], 'Boolean_Expression::variable' ),
                          bless( [
                                   'x'
                                 ], 'Boolean_Expression::variable' )
                        ], 'Boolean_Expression::and' ),
                 bless( [
                          bless( [
                                   'y'
                                 ], 'Boolean_Expression::variable' ),
                          bless( [
                                   bless( [
                                            'x'
                                          ], 'Boolean_Expression::variable' )
                                 ], 'Boolean_Expression::not' )
                        ], 'Boolean_Expression::and' )
               ], 'Boolean_Expression::or' );

Processing the AST

In their example, the Go4 processed their AST in several ways: straight evaluation, copying, and substitution of the occurrences of a variable in one boolean expression by another boolean expression. It is obvious that the AST above is the computational equivalent of the Go4's AST, but for the sake of completeness I carry out the same operations in the Github gist.
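For readers who want a feel for what "straight evaluation" over such an AST looks like, here is a minimal sketch in plain Perl. It is not the code from the gist. The node helper and the evaluate methods are my own illustration, and the AST is rebuilt by hand to match the dump above, so that the example runs without Marpa installed.

```perl
use strict;
use warnings;

# One evaluate method per blessed class. Each node is an array of children,
# and $context maps variable names to truth values.
sub Boolean_Expression::variable::evaluate {
    my ( $self, $context ) = @_;
    my $name = $self->[0];
    die "Unbound variable: $name\n" if not exists $context->{$name};
    return $context->{$name} ? 1 : 0;
}

sub Boolean_Expression::constant::evaluate {
    my ($self) = @_;
    return $self->[0] ? 1 : 0;    # the literals are '1' and '0'
}

sub Boolean_Expression::not::evaluate {
    my ( $self, $context ) = @_;
    return $self->[0]->evaluate($context) ? 0 : 1;
}

sub Boolean_Expression::and::evaluate {
    my ( $self, $context ) = @_;
    return ( $self->[0]->evaluate($context)
            && $self->[1]->evaluate($context) ) ? 1 : 0;
}

sub Boolean_Expression::or::evaluate {
    my ( $self, $context ) = @_;
    return ( $self->[0]->evaluate($context)
            || $self->[1]->evaluate($context) ) ? 1 : 0;
}

# Hypothetical helper that blesses a node the way the dump shows.
sub node {
    my ( $class, @children ) = @_;
    return bless [@children], "Boolean_Expression::$class";
}

# Hand-built copy of the AST for "true and x or y and not x".
my $ast = node(
    'or',
    node( 'and', node( 'variable', 'true' ), node( 'variable', 'x' ) ),
    node( 'and', node( 'variable', 'y' ),
        node( 'not', node( 'variable', 'x' ) ) ),
);

# "true" parses as an ordinary variable, so it needs a binding too.
print $ast->evaluate( { true => 1, x => 0, y => 1 } ), "\n";    # 1
print $ast->evaluate( { true => 1, x => 0, y => 0 } ), "\n";    # 0
```

Because every node is blessed into its own class, evaluation is ordinary method dispatch. Copying and substitution could be added the same way, as further methods on each class.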

AST creation via Marpa's SLIF is self-hosting -- the SLIF DSL is parsed into an AST, and a parser created by interpreting the AST. The Marpa SLIF DSL source file in this post, that describes boolean expressions, was itself turned into an AST on its way to becoming a parser that turns boolean expressions into AST's.

Comments

Comments on this post can be sent to the Marpa Google Group: marpa-parser@googlegroups.com

Perlgeek.de : Meet DBIish, a Perl 6 Database Interface

In the aftermath of the Oslo Perl 6 hackathon 2012, I have decided to fork and rename MiniDBI. MiniDBI is intended as a compatible port of Perl 5's excellent DBI module to Perl 6. While working on the MiniDBI backends, I noticed that I became more and more unhappy with that. Perl 6 is sufficiently different from Perl 5 to warrant different design decisions in the database interface layer.

Meet DBIish. It started with MiniDBI's code base, but has some substantial deviations from MiniDBI:

  • Connection information is passed by named arguments to the driver (instead of a single DSN string)
  • Different naming of several methods. There's not much point in having both fetchrow_array and fetchrow_arrayref in Rakudo. fetchrow simply returns an array or a list, and the caller decides what to do with it.
  • Backends only need to implement fetchrow and column_names, and get all the other fetching methods (like fetchrow-hash, fetchall-hash) for free.
  • Error handling from DB connection and statement handle are unified into a single role

The latter two changes brought quite a reduction in backend code size.
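The "implement two methods, get the rest for free" idea can be sketched in a few lines. The following is a Perl 5 illustration, not DBIish's actual Perl 6 code. The package names Sketch::StatementHandle and Sketch::InMemory are invented, and the method names use underscores where DBIish uses hyphens.

```perl
use strict;
use warnings;

# Hypothetical base class: derives hash-based fetching from the two
# methods a backend must supply, fetchrow and column_names.
{
    package Sketch::StatementHandle;

    sub fetchrow_hash {
        my ($self) = @_;
        my @row = $self->fetchrow or return;    # no more rows
        my %hash;
        @hash{ $self->column_names } = @row;
        return \%hash;
    }

    sub fetchall_hash {
        my ($self) = @_;
        my @rows;
        while ( my $row = $self->fetchrow_hash ) {
            push @rows, $row;
        }
        return \@rows;
    }
}

# Toy backend serving rows from memory. It implements only the two
# required methods, plus a constructor.
{
    package Sketch::InMemory;
    our @ISA = ('Sketch::StatementHandle');

    sub new {
        my ( $class, $columns, @rows ) = @_;
        return bless { columns => $columns, rows => [@rows] }, $class;
    }

    sub column_names { return @{ $_[0]{columns} } }

    sub fetchrow {
        my ($self) = @_;
        my $row = shift @{ $self->{rows} } or return;
        return @{$row};
    }
}

my $sth = Sketch::InMemory->new( [ 'id', 'name' ],
    [ 1, 'camel' ], [ 2, 'onion' ] );
my $all = $sth->fetchall_hash;
print "$_->{id}: $_->{name}\n" for @{$all};
```

The backend never mentions hashes; everything hash-related lives in the shared base class.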

My plans for the future include experimenting with different names and maybe totally different APIs. When a language has lazy lists, one can simply return all rows lazily, instead of encouraging the user to fetch the rows one by one.

Currently the PostgreSQL and MySQL backends support basic CRUD operations, PostgreSQL with proper prepared statements and placeholders. An SQLite backend is under way, but still needs better support from our native call interface.

Perlgeek.de : doc.perl6.org and p6doc

Background

Earlier this year I tried to assess the readiness of the Perl 6 language, compilers, modules, documentation and so on. While I never got around to publishing my findings, one thing was painfully obvious: there is a huge gap in the area of documentation.

There are quite a few resources, but none of them is comprehensive (the most comprehensive are the synopses, but they are not meant for the end user), and there is no single location we can point people to.

Announcement

So, in the spirit of xkcd, I present yet another incomplete documentation project: doc.perl6.org and p6doc.

The idea is to take the same approach as perldoc for Perl 5: create user-level documentation in Pod format (here the Perl 6 Pod), and make it available both on a website and via a command line tool. The source (documentation, command line tool, HTML generator) lives at https://github.com/perl6/doc/. The website is doc.perl6.org.

Oh, and the last Rakudo Star release (2012.06) already shipped p6doc.

Status and Plans

Documentation, website and command line tool are all in very early stages of development.

In the future, I want both p6doc SOMETHING and http://doc.perl6.org/SOMETHING to either document or link to documentation of SOMETHING, be it a built-in variable, an operator, a type name, routine name, phaser, constant or... all the other possible constructs that occur in Perl 6. URLs and command line arguments specific to each type of construct will also be available (/type/SOMETHING URLs already work).

Finally I want some way to get a "full" view of a type, i.e. providing all methods from superclasses and roles too.

Help Wanted

All of that is going to be a lot of work, though the most work will be to write the documentation. You too can help! You can write new documentation, gather and incorporate already existing documentation with compatible licenses (for example the synopses, the Perl 6 advent calendar, examples from rosettacode), add more examples, proof-read the documentation or improve the HTML generation or the command line tool.

If you have any questions about contributing, feel free to ask in #perl6. Of course you can also create pull requests right away :-).

Ocean of Awareness: The Interpreter Design Pattern

The influential Design Patterns book lays out 23 patterns for programming. One of them, the Interpreter Pattern, is rarely used. Steve Yegge puts it a bit more strikingly -- he says that the book contains 22 patterns and a practical joke.

That sounds (and in fact is) negative, but elsewhere Yegge says that "[t]ragically, the only [Go4] pattern that can help code get smaller (Interpreter) is utterly ignored by programmers". (The Design Patterns book has four authors, and is often called the Gang of Four book, or Go4.)

In fact, under various names and definitions, the Interpreter Pattern and its close relatives and/or identical twins are widely cited, much argued and highly praised[1]. As they should be. Languages are the most powerful and flexible design pattern of all. A language can include all, and only, the concepts relevant to your domain. A language can allow you to relate them in all, and only, the appropriate ways. A language can identify errors with pinpoint precision, hide implementation details, allow invisible "drop-in" enhancements, etc., etc., etc.

In fact languages are so powerful and flexible, that their use is pretty much universal. The choice is not whether or not to use a language to solve the problem, but whether to use a general-purpose language, or a domain-specific language. Put another way, if you decide not to use a language targeted to your domain, it almost always means that you are choosing to use another language that is not specifically fitted to your domain.

Why then, is the Interpreter Pattern so little used? Why does Yegge call it a practical joke?

There's a problem

The problem with the Interpreter Pattern is that you must turn your language into an AST -- that is, you must parse it somehow. Simplifying the language can help here. But if the point is to be simple at the expense of power and flexibility, you might as well stick with the other 22 design patterns.

On the other hand, creating a parser for anything but the simplest languages has been a time-consuming effort, and one of a kind known for disappointing results. In fact, language development efforts run a real risk of total failure.

How did the Go4 deal with this? They defined the problem away. They stated that the parsing issue was separate from the Interpreter Pattern, which was limited to what you did with the AST once you'd somehow come up with one.

But AST's don't (so to speak) grow on trees. You have to get one from somewhere. In their example, the Go4 simply built an AST in their code, node by node. In doing this, they bypassed the BNF and the problem of parsing. But they also bypassed their language and the whole point of the Interpreter Pattern.

Which is why Yegge characterized the chapter as a practical joke. And why other programming techniques and patterns are almost always preferred to the Interpreter Pattern.
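
To make the shortcut concrete, here is a minimal sketch in Python of an interpreter over a hand-built AST. It is not the Go4's own code; the boolean-expression language and the class names (Var, Not, And, Or) are assumptions chosen for illustration. The interpreter half is a handful of tiny classes, but the tree has to be assembled node by node because no parser is involved.

```python
from dataclasses import dataclass


@dataclass
class Var:
    name: str

    def evaluate(self, env):
        # Look the variable up in the environment supplied by the caller.
        return env[self.name]


@dataclass
class Not:
    operand: object

    def evaluate(self, env):
        return not self.operand.evaluate(env)


@dataclass
class And:
    left: object
    right: object

    def evaluate(self, env):
        return self.left.evaluate(env) and self.right.evaluate(env)


@dataclass
class Or:
    left: object
    right: object

    def evaluate(self, env):
        return self.left.evaluate(env) or self.right.evaluate(env)


# The expression "(x and not y) or z", written out by hand as a tree.
# Nothing here ever reads the text of the expression.
ast = Or(And(Var("x"), Not(Var("y"))), Var("z"))

print(ast.evaluate({"x": True, "y": False, "z": False}))  # True
print(ast.evaluate({"x": True, "y": True, "z": False}))   # False
```

Everything a user of the little language would actually type, such as the string "(x and not y) or z", never appears in the program. Turning that string into the tree is exactly the piece that is left out.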

Finding that one missing piece

So that's how the Go4 left things. A potentially great programming technique, made almost useless because of a missing piece. There was no easy, general, and practical way to generate AST's.

Few expected that to change. I was more optimistic than most. In 2007 I embarked on a full-time project: to create a parser based on Earley's algorithm. I was sure that it would fulfill two of the criteria -- it would be easy to use, and it would be general. As for practical -- well, a lot of parsing problems are small, and a lot of applications don't require a lot of speed, and for these I expected the result to be good enough.

What I didn't realize was that all of the problems preventing Earley's from seeing real, practical use had already been solved in the academic literature. I was not alone in not having put the picture together. The people who had solved the problems had focused on two disjoint sets of issues, and were unaware of each other's work. In 1991, in the Netherlands, the mathematician Joop Leo had arrived at an astounding result -- he showed how to make Earley's run in linear time for LR-regular grammars. LR-regular is a vast class of grammars. It easily includes, as a proper subset, every class of grammar now in practical use -- regular expressions, PEG, recursive descent, the LALR on which yacc and bison are based, you name it. (For those into the math, LR-regular includes LR(k) for all k, and therefore LL(k), also for all k.)

Leo's mathematical approach did not address some nagging practical issues, foremost among them the handling of nullable rules and symbols. But ten years later in Canada, Aycock and Horspool focused on exactly these issues, and solved them. Aycock-Horspool seem to have been unaware of Leo's earlier result. The time complexity of the Aycock-Horspool algorithm was essentially that of Earley's original algorithm.

Because of Leo's work, for any grammar in any class currently in practical use, an Earley's parser could be fast. If only it could be combined with the approach of Aycock and Horspool, I realized, Leo's speeds could be available in an everyday programming tool.

In changing the Earley parse engine, Aycock-Horspool and Leo had branched off in different directions. It was not obvious that their approaches could be combined, much less how. And in fact, the combination of the two is not a simple algorithm. But it is fast, and the new Marpa parse engine makes full information about the state of the parse (rules recognized, symbols expected, etc.) available as it proceeds. This is very convenient for, among other things, error reporting.
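
For readers who have not met Earley's algorithm, the following is a minimal sketch of the plain textbook recognizer in Python. It is not Marpa's engine and contains neither Leo's nor Aycock and Horspool's improvements. The function name, the grammar representation, and the sample grammar are assumptions made for this illustration, and the sketch assumes that no rule has an empty right-hand side, which sidesteps the nullable-rule issue mentioned above.

```python
def earley_recognize(grammar, start, tokens):
    """Return True if tokens can be derived from the start symbol.

    grammar maps each nonterminal to a list of right-hand sides (tuples of
    symbols). Any symbol that is not a key of grammar is a terminal.
    Assumption: no rule has an empty right-hand side.
    """
    # chart[i] holds Earley items (lhs, rhs, dot, origin) valid after i tokens.
    chart = [set() for _ in range(len(tokens) + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}

    for i in range(len(tokens) + 1):
        worklist = list(chart[i])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            if dot == len(rhs):
                # Completion: advance every item that was waiting for lhs.
                new_items = [(l, r, d + 1, o)
                             for (l, r, d, o) in chart[origin]
                             if d < len(r) and r[d] == lhs]
            elif rhs[dot] in grammar:
                # Prediction: start every rule of the expected nonterminal.
                new_items = [(rhs[dot], r, 0, i) for r in grammar[rhs[dot]]]
            else:
                # Scanning: a matching terminal moves the item to the next set.
                if i < len(tokens) and tokens[i] == rhs[dot]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
                new_items = []
            for item in new_items:
                if item not in chart[i]:
                    chart[i].add(item)
                    worklist.append(item)

    return any(lhs == start and dot == len(rhs) and origin == 0
               for (lhs, rhs, dot, origin) in chart[len(tokens)])


# A left-recursive expression grammar, written directly from its BNF.
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("n",)],
}

print(earley_recognize(GRAMMAR, "E", list("n+n*(n+n)")))  # True
print(earley_recognize(GRAMMAR, "E", list("n+*n")))       # False
```

Note that the grammar is left-recursive, which a naive recursive descent parser cannot handle, and that the chart already records which rules have been recognized and which symbols are expected at every position.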

Eureka and all that

The result is an algorithm which parses anything you can write in BNF and does it in times considered optimal in practice. Unlike recursive descent, you don't have to write out the parser -- Marpa generates a parser for you, from the BNF. It's the easy, "drop-in" solution that the Go4 needed and did not have. A reworking of the Go4 example, with the missing parser added, is in a previous blog post, and the code for the reworking is in a Github gist.

More about Marpa

Marpa's latest version is Marpa::R2, which is available on CPAN. Recently, it has gained immensely in "whipitupitude" with a new interface, which has tutorials here and here. Marpa has a web page, and of course it is the focus of my "Ocean of Awareness" blog.

Comments on this post can be sent to Marpa's Google Group: marpa-parser@googlegroups.com

Notes

Note 1: For example, the Wikipedia article on DSL's; Eric Raymond discussing mini-languages; "Notable Design Patterns for Domain-Specific Languages", Diomidis Spinellis; and the c2.com wiki.

Perlgeek.de : Exceptions Grant Report for May 2012

It seems quite a long time since I started working on my grant on exceptions, and until quite recently I felt that I still had quite a long way to go. And then I read the deliverables again, and found that I have actually achieved quite a bit of them already. I also noticed that some of them are quite ambiguously formulated.

Also when I wrote the grant application I had a clever system in the back of my mind that lets you categorize exceptions with different tags. After presenting that idea to the #perl6 channel, they uniformly told me that it was a (bad) reinvention of the existing type system. They were right, of course. So instead exceptions use the "real" type system now, which means that some aspects of the grant application do not make so much sense now.

Let's look at the deliverables in detail:

D1: Specification

S32::Exception contains my work in this area.

Since exceptions use the normal Perl 6 type system, the amount of work I had to do was less than I had expected. I consider it done, in the sense that everything is there that we need to throw typed exceptions and work with them in a meaningful and intuitive way.

There are certainly still open design questions in the general space of exceptions (like, how do we indicate that an exception should or should not print its backtrace by default? There are ways to achieve this right now, but it's not as easy as it should be for the end user). However those open questions are well outside the realm of this grant. I still plan to tackle them in due time.

D2: Error catalog, tests

The error catalog is compiled and in Rakudo's src/core/Exception.pm. It is not comprehensive (ie doesn't cover all possible errors that are thrown from current compilers), but the grant request only required an "initial" catalog. It is certainly enough to demonstrate the feasibility of the design, and to handle many very common cases. I will certainly summarize it in the S32::Exception document.

Tests are in the roast repository. At the time of writing there are 343 tests (Update 2012-06-04: 411 tests), of which Rakudo passes nearly all (the few failures are due to misparses, which cause wrong parse errors to be generated). They cover both the exceptions API and the individual exception types.

D3: Implementation, tests, documentation

The meat of the implementation is done. Not all exceptions thrown from the setting are typed yet, about 30 remain (plus a few for internal errors that don't make sense to improve much). (Update 2012-06-04: all of these 30 errors now throw typed exceptions too). The tests mentioned above already cover several RT tickets where people complained about wrong or less-than-awesome errors. Documentation is still missing, though I have given a walk through the process of adding a new typed exception to Rakudo on IRC, which might serve as a starting point for such documentation.

So in summary, still missing are

  • Finish changing text based exceptions to typed exceptions in CORE
  • Documenting the error catalog in S32::Exception
  • Documentation for compiler writers and test writers

A surprisingly short list :-)

I'd also like to mention that I did several things related to exceptions which were not covered by this grant report:

  • greatly improved backtrace printer
  • Many exceptions from within the compilation process (such as parse errors, redeclarations etc.) are now typed.
  • I enabled typed exceptions thrown from C code, and as a proof of concept I ported all user-visible exceptions in perl6.ops to their intended types.
  • Exceptions from within the meta model can now be caught in the "actions" part of the compiler, augmented with line numbers and file name and re-thrown

Perlgeek.de : Stop The Rewrites!

What follows is a rant. If you're not in the mood to read a rant right now, please stop and come back in an hour or two.

The Internet is full of people who know better than you how to manage your open source project, even if they only know some bits and pieces about it. News at 11.

But there is one particular instance of that advice that I hear often applied to Rakudo Perl 6: Stop the rewrites.

To be honest, I can fully understand the sentiment behind that advice. People see that it has taken us several years to get where we are now, and in their opinion, that's too long. And now we shouldn't waste our time with rewrites, but get the darn thing running already!

But software development simply doesn't work that way. Especially not if your target is moving, as is Perl 6. (Ok, Perl 6 isn't moving that much anymore, but there are still areas we don't understand very well, so our current understanding of Perl 6 is a moving target).

At some point or another, you realize that with your current design, you can only pile workaround on top of workaround, and hope that the whole thing never collapses.

[Image: a Jenga tower. Image courtesy of sermoa]

Those people who spread the good advice to never do any major rewrites again, they never address what you should do when you face such a situation. Build the tower of workarounds even higher, and pray to Cthulhu that you can build it robust enough to support a whole stack of third-party modules?

Curiously this piece of advice occasionally comes from people who otherwise know a thing or two about software development methodology.

I should also add that since the famous "nom" switchover, which admittedly caused lots of fallout, we had three major rewrites of subsystems (longest-token matching of alternatives, bounded serialization and qbootstrap), all three of which caused no new test failures, and two of which caused no fallout from the module ecosystem at all. In return, we have much faster startup (factor 3 to 4 faster) and a much more correct regex engine.

Perlgeek.de : The REPL trick

A recent discussion on IRC prompted me to share a small but neat trick with you.

If there are things you want to do quite often in the Rakudo REPL (the interactive "Read-Evaluate-Print Loop"), it makes sense to create a shortcut for them. And creating shortcuts for often-used stuff is what programming languages excel at, so you do it right in a Perl 6 module:

use v6;
module REPLHelper;

sub p(Mu \x) is export {
    x.^mro.map: *.^name;
}

I have placed mine in $HOME/.perl6/repl.

And then you make sure it's loaded automatically:

$ alias p6repl="perl6 -I$HOME/.perl6/repl/ -MREPLHelper"
$ p6repl
> p Int
Int Cool Any Mu
>

Now you have a neat one-letter function which tells you the parents of an object or a type, in method resolution order. And a way to add more shortcuts when you need them.

Perlgeek.de : News in the Rakudo 2012.06 release

Rakudo development continues to progress nicely, and so there are a few changes in this month's release worth explaining.

Longest Token Matching, List Iteration

The largest chunk of development effort went into Longest-Token Matching for alternations in Regexes, about which Jonathan already blogged. Another significant piece was Patrick's refactor of list iteration. You probably won't notice much of that, except that for-loops are now a bit faster (maybe 10%), and laziness works more reliably in a couple of cases.

String to Number Conversion

String to number conversion is now stricter than before. Previously an expression like +"foo" would simply return 0. Now it fails, ie returns an unthrown exception. If you treat that unthrown exception like a normal value, it blows up with a helpful error message, saying that the conversion to a number has failed. If that's not what you want, you can still write +$str // 0.
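
The idea of an unthrown exception can be sketched outside Perl 6 as well. The following Python analogue is only an illustration of the behaviour described above, not Rakudo's implementation; the names Failure, to_number and defined_or are hypothetical, and defined_or is a rough stand-in for the // operator.

```python
class Failure:
    """An unthrown exception: harmless until it is used like a number."""

    def __init__(self, exception):
        self.exception = exception

    def __add__(self, other):
        # Treating the failure like a normal value throws the stored error.
        raise self.exception

    __radd__ = __add__


def to_number(text):
    """Convert text to a float, returning a Failure instead of raising."""
    try:
        return float(text)
    except ValueError as err:
        return Failure(err)


def defined_or(value, default):
    """Return default if value is a Failure, else value itself."""
    return default if isinstance(value, Failure) else value


print(defined_or(to_number("42"), 0))   # 42.0
print(defined_or(to_number("foo"), 0))  # 0
try:
    to_number("foo") + 1
except ValueError as err:
    print("conversion failed:", err)
```

The point of the design is that the caller decides: either supply a default up front, or let the failure blow up at the place where the bad value is first used.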

require With Argument Lists

require now supports argument lists, and that needs a bit more explaining. In Perl 6 routines are by default only looked up in lexical scopes, and lexical scopes are immutable at run time. So, when loading a module at run time, how do you make functions available to the code that loads the module? Well, you determine at compile time which symbols you want to import, and then do the actual importing at run time:

use v6;
require Test <&plan &ok &is>;
#            ^^^^^^^^^^^^^^^ evaluated at compile time,
#                            declares symbols &plan, &ok and &is
#       ^^^                  loaded at run time

Module Load Debugging

Rakudo had some trouble when a module was precompiled, but its dependencies were not. This happens more often than it sounds, because Rakudo checks timestamps of the involved files, and loads the source version if it is newer than the compiled file. Since many file operations (including simple copying) change the time stamp, that could happen very easily.
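
The timestamp rule can be sketched in a few lines of Python. This is an illustration of the rule as described here, not Rakudo's actual loader; the function name pick_module_file and the file names are hypothetical.

```python
import os
import tempfile


def pick_module_file(source_path, compiled_path):
    """Return the file to load: the compiled one unless missing or stale."""
    if not os.path.exists(compiled_path):
        return source_path
    # A source file newer than the compiled file wins.
    if os.path.getmtime(source_path) > os.path.getmtime(compiled_path):
        return source_path
    return compiled_path


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "Test.pm")
    pir = os.path.join(tmp, "Test.pir")
    for path in (src, pir):
        open(path, "w").close()

    os.utime(src, (1000, 1000))  # source written first
    os.utime(pir, (2000, 2000))  # compiled afterwards
    fresh = pick_module_file(src, pir)

    os.utime(src, (3000, 3000))  # e.g. the source was copied later
    stale = pick_module_file(src, pir)

print(os.path.basename(fresh))  # Test.pir
print(os.path.basename(stale))  # Test.pm
```

The second call shows how an innocent copy, which only bumps the modification time, is enough to make the loader fall back to the source file.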

To make debugging of such errors easier, you can set the RAKUDO_MODULE_DEBUG environment variable to 1 (or any positive number; currently there is only one debugging level, in the future higher numbers might lead to more output).

$ RAKUDO_MODULE_DEBUG=1 ./perl6 -Ilib t/spec/S11-modules/require.t
MODULE_DEBUG: loading blib/Perl6/BOOTSTRAP.pbc
MODULE_DEBUG: done loading blib/Perl6/BOOTSTRAP.pbc
MODULE_DEBUG: loading lib/Test.pir
MODULE_DEBUG: done loading lib/Test.pir
1..5
MODULE_DEBUG: loading t/spec/packages/Fancy/Utilities.pm
MODULE_DEBUG: done loading t/spec/packages/Fancy/Utilities.pm
ok 1 - can load Fancy::Utilities at run time
ok 2 - can call our-sub from required module
MODULE_DEBUG: loading t/spec/packages/A.pm
MODULE_DEBUG: loading t/spec/packages/B.pm
MODULE_DEBUG: loading t/spec/packages/B/Grammar.pm
MODULE_DEBUG: done loading t/spec/packages/B/Grammar.pm
MODULE_DEBUG: done loading t/spec/packages/B.pm
MODULE_DEBUG: done loading t/spec/packages/A.pm
ok 3 - can require with variable name
ok 4 - can call subroutines in a module by name
ok 5 - require with import list

Module Loading Traces in Compile-Time Errors

If module myA loads module myB, and myB dies during compilation, you now get a backtrace which indicates through which path the erroneous module was loaded:

$ ./perl6 -Ilib -e 'use myA'
===SORRY!===
Placeholder variable $^x may not be used here because the surrounding block
takes no signature
at lib/myB.pm:1
  from module myA (lib/myA.pm:3)
  from -e:1

Improved autovivification

Perl allows you to treat not-yet-existing array and hash elements as arrays or hashes, and automatically creates those elements for you. This is called autovivification.

my %h;
%h<x>.push: 1, 2, 3; # worked in the previous release too
push %h<y>, 4, 5, 6; # newly works in the 2012.06 release

Perlgeek.de : Localization for Exception Messages

Ok, my previous blog post wasn't quite as final as I thought. My exceptions grant said that the design should make it easy to enable localization and internationalization hooks. I want to discuss some possible approaches and thereby demonstrate that the design is flexible enough as it is.

At this point I'd like to mention that much of the flexibility comes from either Perl 6 itself, or from the separation of stringifying an exception and generating the actual error message.

Mixins: the sledgehammer

One can always override a method in an object by mixing in a role which contains the method in question. When the user requests error messages in a different language, one can replace method Str or method message with one that generates the error message in a different language.

Where should that happen? The code that throws exceptions is fairly scattered over the code base, but there is a central piece of code in Rakudo that turns Parrot-level exceptions into Perl 6 level exceptions. That would be an obvious place to muck with exceptions, but it would mean that exceptions that are created but not thrown don't get the localization. I suspect that's a fairly small problem in the real world, but it still carries code smell. As does the whole idea of overriding methods.

Another sledgehammer: alternative setting

Perl 6 provides built-in types and routines in an outer lexical scope known as a "setting". The default setting is called CORE. Due to the lexical nature of almost all lookups in Perl 6, one can "override" almost anything by providing a symbol of the same name in a lexical scope.

One way to use that for localization is to add another setting between the user's code and CORE. For example a file DE.setting:

my class X::Signature::Placeholder does X::Comp {
    method message() {
        'Platzhaltervariablen können keine bestehenden Signaturen überschreiben';
    }
}

After compiling, we can load the setting:

$ ./perl6 --target=pir --output=DE.setting.pir DE.setting
$ ./install/bin/parrot -o DE.setting.pbc DE.setting.pir
$ ./perl6 --setting=DE -e 'sub f() { $^x }'
===SORRY!===
Platzhaltervariablen können keine bestehenden Signaturen überschreiben
at -e:1

That works beautifully for exceptions that the compiler throws, because they look up exception types in the scope where the error occurs. Exceptions from within the setting are a different beast, they'd need special lookup rules (though the setting throws far fewer exceptions than the compiler, so that's probably manageable).

But while this looks quite simple, it comes with a problem: if a module is precompiled without the custom setting, and it contains a reference to an exception type, and then the l10n setting redefines it, other programs will contain references to a different class with the same name. Which means that our precompiled module might only catch the English version of X::Signature::Placeholder, and lets our localized exception pass through. Oops.

Tailored solutions

A better approach is probably to simply hack up the string conversion in type Exception to consider a translator routine if present, and pass the invocant to that routine. The translator routine can look up the error message keyed by the type of the exception, and has access to all data carried in the exception. In untested Perl 6 code, this might look like this:

# required change in CORE
my class Exception {
    multi method Str(Exception:D:) {
        return self.message unless defined $*LANG;
        if %*TRANSLATIONS{$*LANG}{self.^name} -> $translator {
            return $translator(self);
        }
        return self.message; # fallback
    }
}

# that's what a translator could write:

%*TRANSLATIONS<de><X::TypeCheck::Assignment> = {
        "Typenfehler bei Zuweisung zu '$_.symbol()': "
        ~ "'{$_.expected.^name}' erwartet, aber '{$_.got.^name}' bekommen"
}

And setting the dynamic variable $*LANG to 'de' would give a German error message for type check failures in assignment.

Another approach is to augment existing error classes and add methods that generate the error message in different languages, for example method message-fr for French, and check their existence in Exception.Str if a different language is requested.

Conclusion

In conclusion there are many bad and enough good approaches; we will decide which one to take when the need arises (ie when people actually start to translate error messages).

Dave's Free Press: Journal: Travelling in time: the CP2000AN

Dave's Free Press: Journal: Graphing tool

Dave's Free Press: Journal: XML::Tiny released

Dave's Free Press: Journal: YAPC::Europe 2007 report: day 1

Perlgeek.de : Pattern Matching and Unpacking

When talking about pattern matching in the context of Perl 6, people usually think about regexes or grammars. Those are indeed very powerful tools for pattern matching, but not the only ones.

Another powerful tool for pattern matching and for unpacking data structures uses signatures.

Signatures are "just" argument lists:

sub repeat(Str $s, Int $count) {
    #     ^^^^^^^^^^^^^^^^^^^^  the signature
    # $s and $count are the parameters
    return $s x $count
}

Nearly all modern programming languages have signatures, so you might say: nothing special, move along. But there are two features that make them more useful than signatures in other languages.

The first is multi dispatch, which allows you to write several routines with the same name, but with different signatures. While it is extremely powerful and helpful, I don't want to dwell on it. Look at Chapter 6 of the "Using Perl 6" book for more details.

The second feature is sub-signatures. It allows you to write a signature for a single parameter.

Which sounds pretty boring at first, but for example it allows you to do declarative validation of data structures. Perl 6 has no built-in type for an array where each slot must be of a specific but different type. But you can still check for that in a sub-signature:

sub f(@array [Int, Str]) {
    say @array.join: ', ';
}
f [42, 'str'];      # 42, str
f [42, 23];         # Nominal type check failed for parameter '';
                    # expected Str but got Int instead in sub-signature
                    # of parameter @array

Here we have a parameter called @array, and it is followed by square brackets, which introduce a sub-signature for an array. When calling the function, the array is checked against the signature (Int, Str), and so if the array doesn't consist of exactly one Int and one Str in this order, a type error is thrown.

The same mechanism can be used not only for validation, but also for unpacking, which means extracting some parts of the data structure. This simply works by using variables in the inner signature:

sub head(*@ [$head, *@]) {
    $head;
}
sub tail(*@ [$, *@tail]) {
    @tail;
}
say head <a b c >;      # a
say tail <a b c >;      # b c

Here the outer parameter is anonymous (the @), though it's entirely possible to use variables for both the inner and the outer parameter.

The anonymous parameter can even be omitted, and you can write sub tail( [$, *@tail] ) directly.
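
For comparison, here is roughly what the validation and unpacking examples look like when written out by hand in a language without sub-signatures. This Python sketch is only an analogue; the function names mirror the Perl 6 examples, and the error message text is made up.

```python
def f(array):
    # Unpacking enforces "exactly two elements" (ValueError otherwise).
    first, second = array
    # The type checks that the sub-signature [Int, Str] declares.
    if not isinstance(first, int) or not isinstance(second, str):
        raise TypeError("expected an Int followed by a Str")
    return f"{first}, {second}"


def head(items):
    first, *_ = items  # keep the first element, ignore the rest
    return first


def tail(items):
    _, *rest = items   # ignore the first element, keep the rest
    return rest


print(f([42, "str"]))          # 42, str
print(head(["a", "b", "c"]))   # a
print(tail(["a", "b", "c"]))   # ['b', 'c']
try:
    f([42, 23])
except TypeError as err:
    print("rejected:", err)
```

The unpacking part translates fairly directly, but the validation has to be spelled out imperatively in the body, whereas the sub-signature states it declaratively in the parameter list.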

Sub-signatures are not limited to arrays. For working on arbitrary objects, you surround them with parentheses instead of brackets, and use named parameters inside:

multi key-type ($ (Numeric :$key, *%)) { "Number" }
multi key-type ($ (Str     :$key, *%)) { "String" }
for (42 => 'a', 'b' => 42) -> $pair {
    say key-type $pair;
}
# Output:
# Number
# String

This works because the => constructs a Pair, which has a key and a value attribute. The named parameter :$key in the sub-signature extracts the attribute key.

You can build quite impressive things with this feature, for example red-black tree balancing based on multi dispatch and signature unpacking. (More verbose explanation of the code.) Most use cases aren't this impressive, but still it is very useful to have occasionally. Like for this small evaluator.

Dave's Free Press: Journal: Thanks, Yahoo!

Dave's Free Press: Journal: YAPC::Europe 2007 report: day 2

Dave's Free Press: Journal: YAPC::Europe 2007 travel plans

Perlgeek.de : Quo Vadis Perl?

The last two days we had a gathering in a town named Perl (yes, a place with that name exists). It's a lovely little town next to the borders to France and Luxembourg, and our meeting was titled "Perl Reunification Summit".

Sadly I only managed to arrive in Perl on Friday late in the night, so I missed the first day. Still it was totally worth it.

We tried to answer the question of how to make the Perl 5 and the Perl 6 community converge on a social level. While we haven't found the one true answer to that, we did find that discussing the future together, both on a technical and on a social level, already brought us closer together.

It was quite a touching moment when Merijn "Tux" Brand explained that he was skeptical of Perl 6 before the summit, and now sees it as the future.

We also concluded that copying API design is a good way to converge on a technical level. For example Perl 6's IO subsystem is in desperate need of a cohesive design. However nobody on the Perl 6 specification team or the Rakudo development team has much experience in that area, and copying from successful Perl 5 modules is a viable approach here. Path::Class and IO::All (excluding the crazy parts) were mentioned as targets worth looking at.

There is now also an IRC channel to continue our discussions -- join #p6p5 on irc.perl.org if you are interested.

We also discussed ways to bring parallel programming to both perls. I missed most of the discussion, but did hear that one approach is to make it easier to send other processes some serialized objects, and thus distribute work among several cores.

Patrick Michaud gave a short ad-hoc presentation on implicit parallelism in Perl 6. There are several constructs where the language allows parallel execution, for example hyper operators, junctions and feeds (think of feeds as UNIX pipes, but ones that allow passing of objects and not just strings). Rakudo doesn't implement any of them in parallel right now, because the Parrot Virtual Machine does not provide the necessary primitives yet.

Besides the "official" program, everybody used the time in meat space to discuss their favorite projects with everybody else. For example I took some time to discuss the future of doc.perl6.org with Patrick and Gabor Szabo, and the relation to perl6maven with the latter. The Rakudo team (which was nearly completely present) also discussed several topics, and I was happy to talk about the relation between Rakudo and Parrot with Reini Urban.

Prior to the summit my expectations were quite vague. That's why it's hard for me to tell if we achieved what we and the organizers wanted. Time will tell, and we want to summarize the result in six to nine months. But I am certain that many participants have changed some of their views in positive ways, and left the summit with a warm, fuzzy feeling.

I am very grateful to have been invited to such a meeting, and enjoyed it greatly. Our host and organizers, Liz and Wendy, took care of all of our needs -- travel, food, drinks, space, wifi, accommodation, more food, entertainment, food for thought, you name it. Thank you very much!

Update: Follow the #p6p5 hash tag on twitter if you want to read more, I'm sure other participants will blog too.

Other blog posts on this topic: PRS2012 – Perl5-Perl6 Reunification Summit by mdk and post-yapc by theorbtwo

Dave's Free Press: Journal: Wikipedia handheld proxy

Perlgeek.de : SQLite support for DBIish

DBIish, the new database interface for Rakudo Perl 6, now has a working SQLite backend. It uses prepared statements and placeholders, and supports standard CRUD operations.

Previously the SQLite driver would randomly report "Malformed UTF-8 string" or segfault, but usually worked pretty well when run under valgrind. The problem turned out to be a mismatch between the caller's and the callee's ideas about memory management.

In particular, parrot's garbage collector would deallocate strings passed to sqlite3_bind_text after the call was done, but sqlite wants such values to stay around until the next call to sqlite3_step at the very least.

Fixing this mismatch was enabled by this patch, which lets you mark strings as explicitly managed. Such strings keep their marshalled C string equivalent around until they are garbage-collected themselves. So now the sqlite driver keeps a copy of the strings as long as necessary, and the SQLite tests pass reliably.

Currently it still needs the cstr branches in the nqp and zavolaj repositories, but they will be merged soon -- certainly before the May release of Rakudo.

Perlgeek.de : News in the Rakudo 2012.05 release

The Rakudo Star release 2012.05 comes with many improvements to the compiler. Some people have asked what they mean, so I want to explain some of them here.

The new -I and -M allow manipulation of the library search path and loading of modules, similar to Perl 5.

perl6 -Ilib t/yourtest.t  # finds your module under lib/

If you want to manipulate the search path from inside a script or module, you can now use the new lib module, again known from Perl 5.

# file t/yourtest.t;
use v6;
use lib 't/lib'; # now can load testing modules from t/lib/Yourmodule/Test.pm
use Yourmodule::Test;
...

If you look at how lib.pm is implemented, you'll notice another new feature: the ability to write a custom EXPORT subroutine -- necessary exactly for things like lib.pm.

But normal exporting and importing is now handled quite well by Rakudo. You can now mark routines as being exported to certain tag names:

module CGI {
    sub h1($text) is export(:HTML) { '<h1>' ~ $text ~ '</h1>' }
    sub param($key) is export { ... };
}

If you want to get only the HTML generating function(s), you can write

use CGI :HTML;

S11 has more details on the exporting and importing mechanism.

You can also import from within a single file by using import instead of use:

module Greeter {
    sub hello($who) is export {
        say "Hello $who";
    }
}

import Greeter; # make sub hello available in the current scope
hello('Perl 6 fans');

Dave's Free Press: Journal: Bryar security hole

Dave's Free Press: Journal: POD includes

Dave's Free Press: Journal: cgit syntax highlighting

Dave's Free Press: Journal: CPAN Testers' CPAN author FAQ

Perlgeek.de : Correctness in Computer Programs and Mathematical Proofs

While reading On Proof and Progress in Mathematics by Fields Medal winner Bill Thurston (recently deceased, I was sorry to hear), I came across this gem:

The standard of correctness and completeness necessary to get a computer program to work at all is a couple of orders of magnitude higher than the mathematical community’s standard of valid proofs. Nonetheless, large computer programs, even when they have been very carefully written and very carefully tested, always seem to have bugs.

I noticed that mathematicians are often sloppy about the scope of their symbols. Sometimes they use the same symbol for two different meanings, and you have to guess from context which one is meant.

This kind of sloppiness generally doesn't have an impact on the validity of the ideas that are communicated, as long as it's still understandable to the reader.

I guess one reason is that most mathematical publications still stick to one-letter symbol names, and there aren't that many letters in the alphabets that are generally accepted for usage (Latin, Greek, a few letters from Hebrew). And in the programming world we snort derisively at FORTRAN 77 that limited variable names to a length of 6 characters.

Dave's Free Press: Journal: Thankyou, Anonymous Benefactor!

Dave's Free Press: Journal: Number::Phone release

Dave's Free Press: Journal: Ill

Dave's Free Press: Journal: CPANdeps upgrade

Dave's Free Press: Journal: YAPC::Europe 2006 report: day 3

Perlgeek.de : iPod nano 5g on linux -- works!

For Christmas I got an iPod nano (5th generation). Since I use only Linux on my home computers, I searched the Internet for how well it is supported by Linux-based tools. The results looked bleak, but they were mostly from 2009.

Now (December 2012) on my Debian/Wheezy system, it just worked.

The iPod nano 5g presents itself as an ordinary USB storage device, which you can mount without problems. However simply copying files onto it won't make the iPod show those files in the play lists, because there is some meta data stored on the device that must be updated too.

There are several user-space programs that allow you to import and export music from and to the iPod, and update those meta data files as necessary. The first one I tried, gtkpod 2.1.2, worked fine.

Other user-space programs reputed to work with the iPod are rhythmbox and amarok (which both not only organize but also play music).

Although I don't think anything really depends on some particular versions here (except that you need a new enough version of gtkpod), here is what I used:

  • Architecture: amd64
  • Linux: 3.2.0-4-amd64 #1 SMP Debian 3.2.35-2
  • Userland: Debian GNU/Linux "Wheezy" (currently "testing")
  • gtkpod: 2.1.2-1