Perl News: YAPC::EU videos start appearing online

@yapceu has posted a link to the first of the videos from the conference.

YAPC::EU 2014 was the biggest conference in Europe dedicated to the Perl programming language.

http://www.yapc.eu/ lists other Perl conferences and workshops around Europe.

brian d foy: I learn something about tell(), then abuse it.

I learned a new thing today, or remembered a forgotten one. I can use tell to affect the file handle that $. uses.

It all started very simply. I was going too far in my answer to How do I add the elements of a file to a second one as columns using Perl?, a question I found by looking for the most down-voted open questions without an accepted answer. As usual, I thought the answer would be easy. And, for the most part, it was.

Then I wanted to make it even easier. I thought Perl might not be necessary at all when we have things like paste and head and tail and other command-line thingys. The problem was a header in one input file and no corresponding header in the other. How could I make paste ignore the header?

I bet there's something that I'm missing, but I started working with the Perl Power Tools version of paste. To fast-forward through a file to get to the right starting point, I wanted to look at $. to know when to stop, but that only works for the last-read filehandle. To use it on another filehandle, I need to do something to that handle without disturbing the data. tell was just the thing.


tell( $fh );  # makes $fh the last-read handle, so $. now refers to it
readline( $fh ) while $. < $starting_line - 1;

But now I think that's also stupid, because I didn't need the magic: I don't need to know the number of the currently read line:


readline( $fh ) foreach 1 .. $starting_line - 1;
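
For completeness, here's the fast-forward as a small self-contained script (the file name and starting line are made up for illustration):

#!/usr/bin/perl
use strict;
use warnings;

my $starting_line = 5;               # line we want to start reading from
open my $fh, '<', 'data.txt' or die "Could not open data.txt: $!";

# the $.-based version needs tell() first; this version does not
readline( $fh ) foreach 1 .. $starting_line - 1;

print scalar readline( $fh );        # prints line number $starting_line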

As Perl gives, so Perl takes away (brain cells).

Laufeyjarson writes... » Perl: PBP: 034 Single-Character Strings

The PBP suggests never using '' or "" for empty strings, and using q{} instead.  Because clearly that’s so much more readable.  The concern here is that '' (two single quotes) might look like " (a single double quote) in some fonts.  I really think that context and, in a modern editor, syntax highlighting, will help keep the difference clear.  I don’t like the quote-like operators, and don’t think it’s worth dragging Perl back into the land of lines full of line-noise with the extra braces to protect some poor sucker from their poor font choice.
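
For the record, here are the three spellings under discussion (my own illustration, not from the book):

my $empty_sq = '';     # two single quotes -- the PBP worries this reads as one double quote
my $empty_dq = "";     # a pair of double quotes -- hard to misread
my $empty_q  = q{};    # the PBP's recommendation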

Using a pair of double quotes remains unambiguous and is, in my opinion, much clearer.  I’m horrified to discover my empty string might interpolate something by accident!  I just can’t care, and find q{} too ugly to use.

This is one of the PerlCritic warnings I turn off right away.

logicLAB.org: Task::Date::Holidays

I have just revived a lot of my CPAN distributions after they were stranded in migration. One of these distributions is Date::Holidays, a wrapper/adapter for the modules in the Date::Holidays:: namespace and related modules. Since development started up again I have made several releases, and I am on a quest to get all of the RTs/issues out of the way.

Some release history:

0.19 2014.08.27 bug fix release, update not required (see below)

- This release addressed reports on failing tests for Perl 5.21.
  The use of UNIVERSAL in this distribution is now deprecated;
  see: Github issue [#3] and [RT:98337]

0.18 2014.08.24 feature release, update not required

- Added adapter class for Date::Holidays::BR [RT:63437]

0.17 2014.08.22 maintenance release, update not required

- Migrated from Module::Build to Dist::Zilla

- Fixed issue in some tests, which would break if Date::Holidays::DK
  was not installed

0.16 2014.08.18 maintenance release, update not required

- Fixed POD error

- Aligned all version numbers

- Added t/kwalitee.t Test::Kwalitee test

- Added t/changes.t Test::CPAN::Changes test

What struck me, while shifting back and forth between perl versions on my laptop and having to install some of the Date::Holidays:: modules over and over again, was:

  1. I have to refamiliarize myself with my own code
  2. I have to get an overview of what new distributions have been added to the namespace and require my attention - I feel like I have been on a looooong holiday
  3. I seriously need to get some work done and get some releases out

Enter Task::Date::Holidays! Using this distribution it will be easy for me to get all of the interesting distributions installed. When I have completed point 2, I can focus on point 3, and point 1 will solve itself.

Task::Date::Holidays 0.01 contains the following list of distributions:

- Date::Holidays::AT
- Date::Holidays::NO
- Date::Holidays::DK
- Date::Holidays::DE
- Date::Holidays::GB
- Date::Holidays::PT
- Date::Holidays::ES
- Date::Holidays::PL
- Date::Holidays::CZ
- Date::Holidays::KR
- Date::Holidays::SK
- Date::Holidays::FR
- Date::Holidays::BR
- Date::Holidays::CA_ES
- Date::Holidays::USFederal
- Date::Holidays::CA
- Date::Holidays::CN
- Date::Holidays::NZ
- Date::Holidays::AU

Many of these are completely new to me, so this will be very interesting – expect plenty of releases of Date::Holidays as I chew my way through the list…
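
For anyone new to the namespace, here is roughly how the Date::Holidays wrapper itself is used (a sketch; it assumes the DK adapter is installed):

use strict;
use warnings;
use Date::Holidays;

my $dh = Date::Holidays->new( countrycode => 'dk' );

print "Stay home!\n"
    if $dh->is_holiday( year => 2014, month => 12, day => 25 );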

jonasbn, Copenhagen

PAL-Blog: Self-Sufficiency: A Picture Story

Zoe wanted fried potatoes (Bratkartoffeln) for lunch today! One small problem: our potato supplies are currently waiting to be restocked on the next hunt - that is, the next trip to the supermarket. Luckily it's the summer holidays, so she had enough time to hunt down her lunch herself, kill it, butcher it, gut it, and finally eat it.

Ovid: Try rakudobrew and play with concurrency

rakudobrew is similar to perlbrew, but it's for Rakudo (a.k.a., Perl 6), the Perl-inspired language that we've all come to have a love/hate relationship with. I urge you to try it out, but first, some interesting new developments that you should probably know about.

By now you've probably heard of Perl 6, either the greatest dynamic programming language ever created or one of the longest-running shaggy dog stories in history (more so than even Duke Nukem Forever). That being said, with MoarVM and other experiments, it looks like Perl 6 is finally turning an interesting corner.

There's an old saying in programming of "make it good, then make it fast." They're finally working on making Perl 6 fast. In some examples, it's actually an order of magnitude faster than Perl 5. That's astonishing given that Perl 5 is already one of the fastest dynamic languages out there. In other examples, it's more than an order of magnitude slower, but with their new profiling tools, the Perl 6 devs are quickly finding performance bottlenecks and, more importantly, finally have an architecture that they're convinced is good enough to optimize.

So, um, yeah. You've heard all this before. You've gotten your hopes up. You've been let down. Christmas after Christmas, there were no presents under that red-black tree.

So let's look at this differently. Moore's law is coming to an end. It has to be. There are simply physical limitations (they're called subatomic particles) beyond which we can't go. Until someone comes up with a revolutionary method of computation (don't hold your breath on quantum computers), we are going to see a change in how software works. If you want it faster, you don't just wait 18 months for new hardware; you go concurrent (we'll skip the distinction between concurrency and parallelism for now).

There is not, to my knowledge, a single popular dynamic programming language which has a working concurrency model. They're all broken in fundamental ways. Perl 6 aims to change that by making concurrency, if not easy, at least easier. And it's working now. In fact, even if the language is slow, if it makes it easier to do concurrent programming there are those who will adopt it (I'm looking at you, Erlang). Perl 6 is relatively easy to write and its concurrency is easy to read. Let's take a look.

First, install rakudobrew.

git clone https://github.com/tadzik/rakudobrew ~/.rakudobrew

Then add that to your $PATH (I put mine in my .bashrc file):

export PATH=~/.rakudobrew/bin:$PATH

Then build it and (optionally) install the panda package manager (because we need more package managers, right?):

rakudobrew build moar && rakudobrew build-panda

Now test it:

$ perl6 -e 'say "Hello, World"'
Hello, World

So let's randomly sleep 100 times:

$ time perl6 -e 'for 1 .. 100 { rand.sleep }'

real    0m50.567s
user    0m0.288s
sys 0m0.040s

OK, that took almost a minute. Now let's run that in parallel:

$ time perl6 -e 'await do for 1 .. 100 { start { rand.sleep } }'

real    0m3.635s
user    0m0.367s
sys 0m0.100s

50 seconds down to 3 seconds. Not bad!

What does that code mean, though?

start means "start this code on another thread and return a promise". A promise is merely a piece of asynchronous work that the system will try to complete. In other words, start returns a promise to complete this code on another thread.

await takes a list of promises and waits for them to finish. It's a blocking function. Promises don't block, but await does:

$ time perl6 -e 'for 1 .. 100 { start { rand.sleep } }'

real    0m0.284s
user    0m0.237s
sys 0m0.076s

(But don't do that. Spawned threads will be taken down when the program exits, so make sure you wait for them to finish.)
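
As a bonus, await also hands back the results of the promises once they are all kept; a quick sketch (assuming 2014-era Rakudo syntax):

$ perl6 -e 'my @promises = (1..10).map: { start { $_ ** 2 } }; say await @promises'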

Jonathan Worthington has a brilliant talk about Perl 6 concurrency; the video of it is online.

Or you can read this blog post, if you prefer that.

This is important. Having somewhat easy-to-use, working parallel programming in a language will be a killer feature, regardless of whether or not it's a dynamic language. That being said, you may not care about this. You want to know if the language is fun and easy.

There's a Learn X in Y Minutes post for Perl 6 which explains many of the concepts. That should get you up to speed on the basics rather quickly.

Yes, Perl 6 still has bugs. I found two rather quickly, but with rakudobrew it will be easier to stay up to date and the concurrency work is very interesting.

If you need to dive into the language, you can also read the Perl 6 docs. There's a lot of new terminology, but the language is now stable enough that a tutorial is merited.

If I had the opportunity, I think I'd spend a lot of my time porting a Web framework to Perl 6. Sadly, I have to pay the bills and I can't, but I'm finding Perl 6 to be a lovely thing to play with. How many of you can say you still love someone/something after almost a decade and a half? (Ooh, I'm going to hell for that last comment.)

logicLAB.org: Continuous Integration with Travis CI and Github for Distzillians

This post started out as a comment on a blog post, but it was simply not possible for me to comment on the relevant post, so I have edited my comment into a blog post even though most of the contents are not mine :-/

So all the kudos, credits, beers and stuff for the contents of this post should really go to Alex Balhatchet; read his post:

        http://blogs.perl.org/users/alex_balhatchet/2013/04/travis-ci-perl.html

Okay – I have migrated all of my CPAN distributions to Github, and all new projects start out there. Over the years I have used Jenkins for continuous integration, so I was very happy when I found out that Github offered continuous integration using Travis CI.

I had everything going with Module::Build, which has been my preferred build system and one I have been using for a long time, with Travis CI on Github. But recently I have started porting all my distributions from Module::Build to Dist::Zilla, which meant that I had to revisit my whole toolchain.

Finding Alex’s article was just what I was looking for.

Then, after reading Dave Cross's blog post and presentation, I got coverage integrated with Coveralls (see my blog post on those spiffy badges).

In order to get coverage integrated with Alex’s example, I have made the following changes to his suggested setup.

install is extended with:

- cpanm --quiet --notest Devel::Cover::Report::Coveralls
- cpanm --quiet --notest Dist::Zilla::App::Command::cover

and this additional after_success section has to be added:

after_success:
  - dzil cover -outputdir cover_db -report coveralls

I did run into some issues with Alex's version: I had to explicitly install Test::Kwalitee, since the version on the Travis CI platform is apparently too old. So, also under install, I do:

- cpanm --quiet --notest Test::Kwalitee

The Test::Kwalitee issue should go away now that Karen Etheridge has patched her Dist::Zilla plugin.

And in addition I also had to install the dependencies of the plugins explicitly under install:

- dzil listdeps --author | cpanm --quiet --notest --skip-satisfied

This was due to an issue with the Pod::Coverage tests, where Pod::Coverage::TrustPod was missing.

You can have a look at one of my setups on Github:

language: perl
perl:
  - "5.18"
  - "5.16"
  - "5.14"
  - "5.12"
  - "5.10"

before_install:

  # Prevent "Please tell me who you are" errors for certain DZIL configs
  - git config --global user.name "TravisCI"

install:

  # Deal with all of the DZIL dependencies, quickly and quietly
  - cpanm --quiet --notest --skip-satisfied Dist::Zilla

  # Hack to get the latest Test::Kwalitee
  - cpanm --quiet --notest Test::Kwalitee

  # Getting coveralls report
  - cpanm --quiet --notest Devel::Cover::Report::Coveralls

  # Getting cover command for Dist::Zilla
  - cpanm --quiet --notest Dist::Zilla::App::Command::cover

  # Getting all the plugins used by Dist::Zilla in this particular setup
  - dzil authordeps | grep -vP '[^\w:]' | xargs -n 5 -P 10 cpanm --quiet --notest --skip-satisfied

  # Getting all the dependencies requested by author
  - dzil listdeps --author | cpanm --quiet --notest --skip-satisfied

  - export RELEASE_TESTING=1 AUTOMATED_TESTING=1 AUTHOR_TESTING=1 HARNESS_OPTIONS=j10:c HARNESS_TIMER=1

  # Getting all the dependencies requested by distribution
  - dzil listdeps | grep -vP '[^\w:]' | cpanm --quiet --notest --skip-satisfied

script:
  - dzil smoke --release --author

after_success:
  - dzil cover -outputdir cover_db -report coveralls

Thanks to Alex Balhatchet for the post – and thanks to Dave for his awesome post and to Karen for her most impressive responsiveness when I reported an issue.

Continued good day,

jonasbn, Copenhagen

Nestoria Dev Blog: Nestoria Devs at YAPC

YAPC::EU, the European edition of Yet Another Perl Conference, was this past weekend. As mentioned in a previous post, we sent along four of our developers: Alex, Sam, Ignacio and Tim. Here’s a brief (and photo-filled) summary of our time in София, България (that’s Sofia, Bulgaria, for those of you who don’t read Cyrillic).

I (Alex, Nestoria CTO) have been to a lot of YAPCs, but for my teammates Sam, Tim and Ignacio it was their first one. It was a great opportunity for them to dive deep into the Perl community and learn a huge amount in a short time.

Thursday

Unfortunately this year our flight was too late in the evening, and we ended up missing the traditional pre-conference drinks. I won’t be making this mistake again with any future YAPCs we attend - it sounded like we definitely missed out on a fun night.

On the plus side our flight from London to Sofia was pleasantly uneventful, and our hotel - 10 minutes from the airport, 1 minute from the conference venue - was very nice. I think we all slept well and were ready for the conference to begin on Friday morning.

Friday

Our 1 minute walk from the hotel to the conference venue was nice - no danger of getting lost, just follow the nerdy T-Shirts. Unsurprisingly a lot of other Perl Mongers were staying at our hotel, and the hotel breakfasts got more social as the week went on.

The venue was nice, especially the large room set aside for keynotes, lightning talks and the talks expected to be the most popular. Good chairs, good audio/visual equipment, and very helpful conference staff.

A huge thank you and shout out to Marian Marinov and his team!

And a smaller, but more personal, thank you to Marian for getting our banner printed in time for Friday despite me emailing him the PDF on Thursday morning :-) What do you think? I’m quite proud of it.

Speaking of things I’m proud of, on Friday I spoke about the Nestoria Geocoder and the new OpenCage Data API that allows people outside Nestoria to take advantage of it. I think the talk was quite well received, although everybody’s geocoding challenges are a bit different so some audience members who wanted exact house-number addressing were disappointed.

The scheduling committee had done a nice job this year of grouping together similar talks, which meant that my talk kicked off an afternoon of Geo-related presentations. I particularly enjoyed Hakim Cassimally’s talk on Civic Hacking. I hadn’t realised that MySociety’s projects were being used in Africa and Asia as well as within the UK - very cool!

As usual after the main tracks ended we had the lightning talks. I spoke again - this time about Test Kit 2.0, a slightly shorter version of a talk I gave at a recent London.pm Technical Meeting. Hopefully I convinced a few other developers to delete all the boilerplate from their .t files.

After the lightning talks, Curtis “Ovid” Poe gave a fantastic keynote about managerless companies. He started out comparing the extremely hierarchical companies of the 90s and 00s with feudal society centuries ago in Britain, and then went on to give some great real-world examples of companies being run differently and how they are succeeding. As well as the usual tech examples of Valve Software and Github he mentioned some non-tech companies, such as Semco in Brazil, which was certainly eye-opening for me. At Nestoria we are pretty good at hiring smart people and giving them the freedom to solve problems in whatever way they see fit; but going truly managerless is a big step up from that, and it led to some great discussions between me and my devs.

Friday ended with the traditional conference dinner, with the traditional challenges of getting a few hundred developers onto a few coaches and to a very very large restaurant. The food was very tasty, and very plentiful; we had fresh bread rolls, two starters, then some Bulgarian folk dance as entertainment, followed by a large main and a very tasty dessert. But the food was definitely topped by the view: the restaurant was on a lake in the Bulgarian countryside, and the sight was stunning.

Saturday

Saturday morning started out with a small Dev Ops track for me, while Sam and Ignacio went to some Web talks, and Tim saw some presentations about search and data.

For my part I really enjoyed Marian’s talk about creating Linux containers with Perl, and look forward to his libraries being finished and up on CPAN.

After lunch was pretty much an MST-fest, as Matt S Trout gave a 50 minute talk on Devops Logique and a 50 minute keynote on The State of the Velociraptor. Both were very interesting, and I had to smile when the topic of Prolog came up - back in university we studied Prolog and Haskell in our first year, quite an unusual introduction to programming I think.

Before the keynote came the second day of lightning talks, and the second day where I gave a talk. This time around I talked about this very blog - and announced live that this month’s Module of the Month winner was Tim Bunce for Devel::NYTProf. Unsurprisingly Tim got a thunderous round of applause, despite not being there this year.

Dan Muey of cPanel gave a great talk about Unicode and Perl which definitely resonated with me; by which I mean it exactly matched our Unicode style guide :-)

Sunday

Sam and Ignacio were very excited for Sunday, as that seemed to be where all the web related talks went. Sawyer X gave a particularly good introduction to Plack and PSGI, and then went on to share how Booking.com has managed to gradually shift over to PSGI running on uWSGI. I learned a huge amount, and I hope we can make a similar shift at Nestoria sometime soon.

Susanne Schmidt (Su-Shee) also gave a wonderful introduction to the wide and not-so-varied world of web frameworks. In preparation she had built the same application - a cat GIF browser, naturally - in about 10-20 different frameworks across 5-10 different languages! Unsurprisingly a lot of them are almost identical - Dancer, Sinatra, Django, Rails, Mojolicious all seem to have borrowed ideas from one another over the years. I had no idea though that R (yes, the statistics language) has a web framework! And it’s pretty nice too, you can produce some really great graphs and charts with it with very little code.

I’d also like to shout out Tatsuro Hisamori (aka まいんだー) for coming over from Japan to tell us how he sped up his test suite from 40 minutes to 3 minutes. We actually have a pretty similar set-up here at Nestoria - spreading different groups of tests over different VMs, with lots of parallelisation and a home-grown web interface to the results. Their project Ukigumo’s web interface looks scarily similar to ours.

To round out the day Sawyer reprised his The Joy In What We Do keynote from YAPC::NA. It’s a touching tale of how he learned programming, and Perl, and how we should all take time to reflect on how fun programming can be. The talk ends with some of the Perl language features and CPAN libraries we should be proud of, and should be talking about in the wider programming community. All in all I left feeling pretty happy to be a Perl dev.


So that was YAPC::EU 2014! It was an absolute blast, and we can’t wait to sponsor and send some devs over to Granada for YAPC::EU 2015 next September.

Of course we don’t have to wait that long for the next Perl event. We’re sponsoring and attending The London Perl Workshop 2014 in November - hope to see you there!

Laufeyjarson writes... » Perl: PBP: 033 String Delimiters

The PBP suggests using interpolating string delimiters only when they’re needed.  How much of my time do I get to waste changing single to double quotes because there’s a contraction in a message, or the other way because there isn’t?

This is one of the things I like least about turning on PerlCritic’s “brutal” level.  Every string can only use double quotes if it needs them.  Whether it needs them is a little harder to determine than you might think, and I wind up converting back and forth constantly, spending lots of my time on something that I feel has little value.

The PBP suggests that unexpected variable interpolation is a source of many errors.  I find this a strange claim; variable interpolation is one of the basic parts of the language.  If a very new programmer is having trouble with it, that’s learning curve.  Using the different quotes to make the behavior less obvious won’t help.  A programmer with any level of experience won’t have complex issues here.

Here is the first place the PBP suggests using one of the q{} functions, too.  I find them hideously ugly and unreadable.  They are, in my opinion, ugly side effects of other things, to be used only when needed.  Single-quoting every string with a contraction in it isn’t “needed” in this case.  Don’t.  Changing the delimiters on q{} to let things work without backslashing is also ugly and much harder to read, in my opinion.
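
To make the trade-off concrete, here are the alternatives being argued about (my own illustration, not from the book):

my $name     = 'World';
my $greeting = "Hello, $name";      # double quotes: interpolates $name
my $plain    = 'Press any key';     # single quotes: no interpolation
my $escaped  = 'It\'s done';        # the backslashing that q{} is meant to avoid
my $pbp_way  = q{It's done};        # q{}: single-quote semantics, custom delimiters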

At least the book isn’t suggesting you avoid interpolating quotes for speed – there’s no need for that, as the interpreter notices and picks the right ones.

Use quotes, and be happy.  Don’t sweat interpolating quotes when something isn’t going to interpolate.  Where I use PerlCritic, this gets turned off right away.

Perl Foundation News: Outreach Program for Women: Mentor's Summary

Randy Stauner writes:

Thanks to the sponsorship of the Perl Foundation, this summer I had the pleasure of mentoring Pattawan Kaewduangdee from Thailand as a MetaCPAN contributor through the GNOME Foundation's Free and Open Source Software Outreach Program for Women.

Pattawan was a great help to MetaCPAN and the Perl community. She's bright and ambitious and accomplished a lot during her internship. Her schedule adapted over the summer and she kept up with it well.

I was assisted by Thomas Sibley and Olaf Alders. Thomas shared his OpenID experience to help direct one of Pattawan's longer and more difficult pull requests and was available on IRC to answer questions. Olaf was busy mentoring a student with Google Summer of Code but made time to review and discuss issues and do video chat Hangouts with us. I'm grateful to them both for the support.

Overall Pattawan submitted 27 pull requests, several of which were quite long and involved. Noteworthy changes include implementing OpenID login, upgrading the website to Bootstrap 3, adding activity to news feeds and making them more visible, helping us improve the development virtual machine for future contributors, and fixing several bugs that had plagued our users for years :-). At the end of the program a few of the pull requests have not yet been merged, but they are nearly complete and will be merged soon.

Pattawan was very receptive and I saw her learn and grow over the summer. I'm pleased to see the ways that she engaged with the community: on IRC, by commenting on lots of Github issues (several that were not her own), and by arranging to go to YAPC::Asia. It seemed to me that Pattawan's involvement inspired more people to get involved, as the number of pull requests from others increased. One couldn't ask for more than that.

She also helped me to learn and grow as a contributor and a mentor and I'm grateful to her for that. I believe she will continue to be an open source contributor, both with MetaCPAN and other projects.

Many thanks to the GNOME Foundation, the Perl Foundation, and all involved with OPW.

Hacking Thy Fearful Symmetry: Testing Dancer Applications

So you wrote a Dancer or Dancer2 application and, good programmer that you are, you want to test it. It's kind of a no-brainer that Dancer::Test/Dancer2::Test is the module that you should reach for, right?

Well, maybe not.

The truth is, Dancer::Test was created as necessary collateral when Dancer came to be. But since then, a few PSGI-generic testing modules have appeared on CPAN. Covering more functionality, better maintained, arguably superior in pretty much every way imaginable, they are kinda making the Dancer-specific module obsolete.

Actually, scratch that "kinda". Typical of his usual soft-spoken magnanimity, Sawyer X declared that DANCER2::TEST MUST DIE, and as of the last release of Dancer2, using it will trigger a warning and recommend that you use Plack::Test instead.

So, if not Dancer::Test and Dancer2::Test, what then? As mentioned above, the Dancer crew recommends Plack::Test. But there is also Test::TCP and Test::WWW::Mechanize::PSGI.

How do they compare? What is the proper way to make them play nice with the Dancer app to test? Pretty good questions. To answer them, I created the default boilerplate application for Dancer and Dancer2 (via dancer -a Test1 and dancer2 -a Test2), and implemented very simple tests for each module. Let's see how it looks.

Dancer::Test / Dancer2::Test

The testing modules that come bundled with Dancer itself. Pros: no need to install any additional module. Cons: not as complete and sound as the other testing modules, and downright actively deprecated in the case of Dancer2.

Dancer test

use strict;
use warnings;

use Test::More tests => 3;

use Test1;
use Dancer::Test;

route_exists '/', 'a route handler is defined for /';

response_status_is '/', 200, 'response status is 200 for /';

response_content_like '/' => qr#<title>Test1</title>#, 'title is okay';

Dancer2 test

use strict;
use warnings;

use Test::More tests => 3;

use Test2;

use Dancer2::Test apps => [ 'Test2' ];

{ package Test2; set log => 'error'; }

# to silence the deprecation notice
$Dancer2::Test::NO_WARN = 1;

route_exists [ GET => '/' ], 'a route handler is defined for /';

response_status_is '/', 200, 'response status is 200 for /';

response_content_like '/' => qr#<title>Test2</title>#, 'title is okay';

Plack::Test

The testing module that comes with Plack itself. Pros: it's the standard for PSGI application testing. Cons: it's also fairly low-level.

Dancer test

use strict;
use warnings;

use Test::More tests => 3;

use Plack::Test;
use HTTP::Request::Common;

use Test1;
{ use Dancer ':tests'; set apphandler => 'PSGI'; set log => 'error'; }

test_psgi( Dancer::Handler->psgi_app, sub {
    my $app = shift;

    my $res = $app->( GET '/' );

    ok $res->is_success;

    is $res->code => 200, 'response status is 200 for /';

    like $res->content => qr#<title>Test1</title>#, 'title is okay';
} );

Dancer2 test

use strict;
use warnings;

use Test::More tests => 3;

use Plack::Test;
use HTTP::Request::Common;

use Test2;
{ package Test2; set apphandler => 'PSGI'; set log => 'error'; }

test_psgi( Test2::dance, sub {
    my $app = shift;

    my $res = $app->( GET '/' );

    ok $res->is_success;

    is $res->code => 200, 'response status is 200 for /';

    like $res->content => qr#<title>Test2</title>#, 'title is okay';
} );

Test::TCP

This one doesn't only test the application via its PSGI interface; it really runs the application on a local random port. Pros: you really test the real, end-to-end deal. Cons: slightly slower, and it can cause problems if your machine blocks some ports.

Dancer test

use strict;
use warnings;

use Test::More tests => 3;

use Test::TCP;
use Test::WWW::Mechanize;

Test::TCP::test_tcp(
    client => sub {
        my $port = shift;

        my $mech = Test::WWW::Mechanize->new;

        $mech->get_ok( "http://localhost:$port/", 'a route handler is defined for /' );

        is $mech->status => 200, 'response status is 200 for /';

        $mech->title_is( 'Test1', 'title is okay' );
    },
    server => sub {
        use Test1;

        use Dancer ':tests';

        set port => shift;

        set log => 'error';

        Dancer->dance;
    }
);

Dancer2 test

use strict;
use warnings;

use Test::More tests => 3;

use Test::TCP;
use Test::WWW::Mechanize;

Test::TCP::test_tcp(
    client => sub {
        my $port = shift;

        my $mech = Test::WWW::Mechanize->new;

        $mech->get_ok( "http://localhost:$port/", 'a route handler is defined for /' );

        is $mech->status => 200, 'response status is 200 for /';

        $mech->title_is( 'Test2', 'title is okay' );
    },
    server => sub {
        use Test2;

        package Test2;

        Dancer2->runner->{port} = shift;

        set log => 'error';

        dance;
    }
);

Test::WWW::Mechanize::PSGI

This one is my favorite. It's basically a wrapper that allows you to use Test::WWW::Mechanize, itself a wrapper with nifty test helper functions for WWW::Mechanize, on PSGI applications. Also very nice: it allows the tests to be trivially reused against a real server by having the $mech object be a Test::WWW::Mechanize instead of a Test::WWW::Mechanize::PSGI.
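
As a sketch of that reuse (the TEST_SERVER_URL environment variable and its handling are my own invention), only the construction of $mech changes:

use strict;
use warnings;
use Test::WWW::Mechanize;
use Test::WWW::Mechanize::PSGI;

use Test1;
{ use Dancer ':tests'; set apphandler => 'PSGI'; set log => 'error'; }

# hit a live server if TEST_SERVER_URL is set, stay in-process otherwise
my $base = $ENV{TEST_SERVER_URL} || '';
my $mech = $base
    ? Test::WWW::Mechanize->new
    : Test::WWW::Mechanize::PSGI->new( app => Dancer::Handler->psgi_app );

$mech->get_ok( "$base/", 'a route handler is defined for /' );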

Dancer test

use strict;
use warnings;

use Test::More tests => 3;

use Test::WWW::Mechanize::PSGI;

use Test1;
{ use Dancer ':tests'; set apphandler => 'PSGI'; set log => 'error'; }

my $mech = Test::WWW::Mechanize::PSGI->new(
    app => Dancer::Handler->psgi_app
);

$mech->get_ok( '/', 'a route handler is defined for /' );

is $mech->status => 200, 'response status is 200 for /';

$mech->title_is( 'Test1', 'title is okay' );

Dancer2 test

use strict;
use warnings;

use Test::More tests => 3;

use Test::WWW::Mechanize::PSGI;

use Test2;
{ package Test2; set apphandler => 'PSGI'; set log => 'error'; }

my $mech = Test::WWW::Mechanize::PSGI->new(
    app => Test2::dance
);

$mech->get_ok( '/', 'a route handler is defined for /' );

is $mech->status => 200, 'response status is 200 for /';

$mech->title_is( 'Test2', 'title is okay' );

Nestoria Dev Blog: Module of the month August 2014: Devel::NYTProf

Welcome to another Module of the Month blog post, a recurring post in which we highlight particular modules, projects or tools that we use here at Nestoria.

This month’s award goes to the amazing Devel::NYTProf, simply the best Perl code profiler there is and one of the most powerful tools you can reach for when working on a large and complex code base.

Let’s start out with some quotes from some rather intelligent blokes:

Prove where the bottleneck is

"Bottlenecks occur in surprising places, so don’t try to second guess and put in a speed hack until you have proven that’s where the bottleneck is." — Rob Pike

Don’t do it yet

"The First Rule of Program Optimization: Don’t do it. The Second Rule of Program Optimization (for experts only!): Don’t do it yet." — Michael A. Jackson

… but only after the code has been identified

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified" — Donald Knuth

All three of these quotes point towards a single truth: when your code is slow, and you suspect it could be faster, reach for your profiler!

And of course, the more powerful and feature-rich your profiler is, and the more code you can easily point it at, the more bottlenecks and potential optimization sites you will find.


At Nestoria we have wielded the great Devel::NYTProf against most areas of our code. It’s helped us get our internal geocoding down to 100ms per listing, which means we can re-geocode an entire country of listings in less than 24 hours. It’s helped us respond to over 90% of website requests in less than 200ms. And it’s helped us process all of our metrics logs before 8am every day, so that our commercial team can quickly act on those numbers and do their jobs well.

Most recently we have been using Devel::NYTProf::Apache (also from TIMB!) to profile our website in production. By using the addpid option we have each Apache child process write its own nytprof.out file; we then merge the files together with nytprofmerge. We have around 30 Apache children at any given time, so we end up with 5 hours of information from only 10 minutes of real time during which our site is slower for our users.

(Note: we do turn off statement-level profiling with stmts=0 and make sure to write the nytprof.out files to a ramdisk. Without those two hacks the site falls over.)
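
For reference, the merge step described above looks roughly like this (a sketch; file names are illustrative):

nytprofmerge -o nytprof-merged.out nytprof.out.*   # combine the per-child profiles
nytprofhtml --file nytprof-merged.out              # generate one HTML report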


So thank you Tim, for Devel::NYTProf and for everything else you’ve done, and for being one of the nicest people in the Perl community :-)

Enjoy your $1 per week Gittip donation from us!

PAL-Blog: Cake Massacre

What does a birthday need? Exactly: a cake. Baking a cake sits somewhere between child's play and quantum physics, so it's really quite simple. Since Zoe has a certain amount of experience in at least one of those two extremes, such a trifling cake should hardly be a problem. And when her best friend comes over as well - what could possibly go wrong?

Laufeyjarson writes... » Perl: PBP: 032 Utility Subroutines

Here’s where the PBP tells you to prefix “internal use only” subroutines with an underscore.  Why it calls them “utility subroutines” I don’t know.

This is a longstanding Perl practice that may even be in one of the many official docs somewhere, but is often unknown to new people. It’s not obvious what that means, and it allows a naive programmer to call them anyway and get themselves into trouble.  (This is why many people wish Perl had real private functions, to which I say feh.  Learn some care and don’t go where you aren’t wanted.)
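
A quick sketch of the convention in practice (the names are made up):

package My::Widget;
use strict;
use warnings;

# public, documented interface -- call this from outside
sub render {
    my ($self) = @_;
    return $self->_escape( $self->{label} );
}

# leading underscore: internal use only, by convention --
# it may change or disappear between releases
sub _escape {
    my ($self, $text) = @_;
    $text =~ s/</&lt;/g;
    return $text;
}

1;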

The PBP suggests that “… and reserved (by ancient C/Unix convention) for non-public components of a system.”  This is probably true, but suffers the problem that anyone who didn’t get to Perl via C or by programming through the 1970s may not have that convention at hand.  It’s good to write it down and actually say it.

Perl Foundation News: New Grant Manager

We are pleased to announce that Mark Jensen has joined the Grants Committee as a Grant Manager.

Mark has been a core developer on the BioPerl project since 2009, and is the author of the Neo4j graph database Perl driver, REST::Neo4p. He currently manages the Data Coordinating Center team of The Cancer Genome Atlas.

Please join me in extending a warm welcome to Mark.

logicLAB.org: CPANday

CPANday is over – I had HUGE plans; well, not for CPANday in particular, but for all of my CPAN contributions.

CPANday did however come in very handy and was a magnificent initiative, but as always I did not get as much done as I wanted to.

I made 7 maintenance releases, mostly getting up to speed with my distributions, which were previously stranded in migration.

- Business::DK::Phonenumber 0.07
- Business::DK::CPR 1.11
- Tie::Tools 1.07
- XML::Conf 0.05
- Business::Tax::VAT 1.04
- Business::DK::Postalcode 0.09
- Workflow 1.41

I made a single feature release of one of my newer distributions.

- Business::GL::Postalcode 0.03

I made two deletions.

- WWW::Nike::NikePlus::Public
- Business::OnlinePayment::Cashcow

The latter was very much spurred by the blog post by BooK: there is no need to keep distributions on CPAN when the service they rely on is no longer available. The first was a scraper for the Nike+ community site, for which a more well-defined API has since been developed. I do not think there is a Perl integration for it, but I do not really use Nike+ anymore after having shifted to Endomondo, so … Cashcow seems to have been discontinued. I created that distribution when working for a client which was using Cashcow, but that company does not exist anymore and neither, it seems, do the owners of Cashcow.

I would have loved to:

- Do a new release – I have several new releases planned (as part of a quest)
- Thank somebody – I need to buy NEILB a beer for his continued energy and contributions to the CPAN community
- Put my CPAN stuff on Github – Done
- Improve the POD (see also etc.) – I plan to go over POD on all of my modules, now that all of them are accessible on Github
- Close RTs – I have created a quest today to do exactly that
- Get coverage reports with Devel::Cover – I NEED coverage badges on all my distributions
- Address some test reports from CPAN testers – I have many of them as RTs, so they will be addressed
- Delete some distributions – Done and Done
- Blog about a CPAN module – there are SO many to choose from, but I have just started fooling around with Dist::Zilla, perhaps a blog post should be written :-)
- Give money to the Perl foundation – my employer does on a regular basis, but perhaps I should too

For me CPANday started a wave: it began slowly leading up to CPANday and now continues with almost daily releases, cleaning, improving, fixing and, most of all, having a lot of fun…

Thanks to all of the CPAN contributors, testers, users, Philippe Bruhat and especially Neil Bowers

Dave's Free Press: Journal: Module pre-requisites analyser

Dave's Free Press: Journal: CPANdeps

Dave's Free Press: Journal: Perl isn't dieing

Dave's Free Press: Journal: YAPC::Europe 2007 report: day 3

Perlgeek.de : The Fun of Running a Public Web Service, and Session Storage

One of my websites, Sudokugarden, recently surged in traffic, from about 30k visitors per month to more than 100k visitors per month. Here's the tale of what that meant for the server side.

As a bit of background, I built the website in 2007, when I knew a lot less about the web and programming. It runs on a host that I share with a few friends; I don't have root access on that machine, though when the admin is available, I can generally ask him to install stuff for me.

Most parts of the website are built as static HTML files, with Server Side Includes. Parts of those SSIs are Perl CGI scripts. The most popular part though, which allows you to solve Sudokus in the browser and keeps hiscores, is written as a collection of Perl scripts, backed by a MySQL database.

When at peak times the site had more than 10k visitors a day, lots of visitors would get a nasty mysql: Cannot connect: Too many open connections error. The admin wasn't available for bumping the connection limit, so I looked for other solutions.

My first action was to check the logs for spammers and crawlers that might have hammered the page, and I found and banned some; but the bulk of the traffic looked completely legitimate, and the problem persisted.

Looking at the seven-year-old code, I realized that most pages didn't actually need a database connection, if only I could remove the session storage from the database. And, in fact, I could. I used CGI::Session, which has pluggable backends. Switching to a file-based session backend was just a matter of changing the connection string and adding a directory for session storage. Luckily the code was clean enough that this only affected a single subroutine. Everything was fine.
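
The change really is that small; a sketch of the before and after (DSN and paths illustrative):

use strict;
use warnings;
use CGI::Session;

my $sid = undef;    # or the session id from the request cookie

# before: sessions in the mysql database
# my $session = CGI::Session->new( 'driver:mysql', $sid, { Handle => $dbh } );

# after: sessions on disk -- a different driver string plus a directory
my $session = CGI::Session->new( 'driver:file', $sid,
                                 { Directory => '/var/tmp/sessions' } );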

For a while.

Then, about a month later, the host ran out of free disk space. Since it is used for other stuff too (like email, and web hosting for other users) it took me a while to make the connection to the file-based session storage. What happened was 3 million session files on an ext3 file system with a block size of 4 kilobytes. A session is only about 400 bytes, but since a file uses up a multiple of the block size, the session storage amounted to 12 gigabytes of used-up disk space, which was all that was left on that machine.

Deleting those sessions turned out to be a problem; I could only log in as my own user, which doesn't have write access to the session files (which are owned by www-data, the Apache user). The solution was to upload a CGI script that deleted the session, but of course that wasn't possible at first, because the disk was full. In the end I had to delete several gigabyte of data from my home directory before I could upload anything again. (Processes running as root were still writing to reserved-to-root portions of the file system, which is why I had to delete so much data before I was able to write again).

Even when I was able to upload the deletion script, it took quite some time to actually delete the session files; mostly because the directory was too large, and deleting files on ext3 is slow. When the files were gone, the empty session directory still used up 200MB of disk space, because the directory index doesn't shrink on file deletion.

Clearly a better solution to session storage was needed. But first I investigated where all those sessions came from, and banned a few spamming IPs. I also changed the code to only create sessions when somebody logs in, not give every visitor a session from the start.

My next attempt was to write the sessions to an SQLite database. It uses about 400 bytes per session (plus a fixed overhead for the db file itself), so it uses only a tenth of the storage space that the file-based storage used. The SQLite database has no connection limit, though the old-ish version that was installed on the server doesn't seem to have very fine-grained locking either; within a few days I got errors saying that the session database was locked.

So I added another layer of workaround: creating a separate session database per leading IP octet. So now there are up to 255 separate session databases (plus a 256th for all IPv6 addresses; a decision that will have to be revised when IPv6 usage rises). After a few days of operation, it seems that this setup works well enough. But suspicious as I am, I'll continue monitoring both disk usage and errors from Apache.
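
A sketch of the sharding logic (file locations illustrative):

use strict;
use warnings;
use CGI::Session;

my ($octet) = ( $ENV{REMOTE_ADDR} || '' ) =~ /^(\d+)\./;
my $db_file = defined $octet
    ? "/var/sessions/sessions-$octet.sqlite"
    : "/var/sessions/sessions-ipv6.sqlite";    # the 256th bucket

my $session = CGI::Session->new( 'driver:sqlite', undef,
                                 { DataSource => $db_file } );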

So, what happens if this solution fails to work out? I can see basically two approaches: move the site to a server that's fully under my control, and use redis or memcached for session storage; or implement sessions with signed cookies that are stored purely on the client side.

Dave's Free Press: Journal: Devel::CheckLib can now check libraries' contents

Perlgeek.de : Rakudo's Abstract Syntax Tree

After or while a compiler parses a program, the compiler usually translates the source code into a tree format called Abstract Syntax Tree, or AST for short.

The optimizer works on this program representation, and then the code generation stage turns it into a format that the platform underneath it can understand. Actually I wanted to write about the optimizer, but noticed that understanding the AST is crucial to understanding the optimizer, so let's talk about the AST first.

The Rakudo Perl 6 Compiler uses an AST format called QAST. QAST nodes derive from the common superclass QAST::Node, which sets up the basic structure of all QAST classes. Each QAST node has a list of child nodes, possibly a hash map for unstructured annotations, an attribute (confusingly) named node for storing the lower-level parse tree (which is used to extract line numbers and context), and a bit of extra infrastructure.

The most important node classes are the following:

QAST::Stmts
A list of statements. Each child of the node is considered a separate statement.
QAST::Op
A single operation that usually maps to a primitive operation of the underlying platform, like adding two integers, or calling a routine.
QAST::IVal, QAST::NVal, QAST::SVal
Those hold integer, float ("numeric") and string constants respectively.
QAST::WVal
Holds a reference to a more complex object (for example a class) which is serialized separately.
QAST::Block
A list of statements that introduces a separate lexical scope.
QAST::Var
A variable
QAST::Want
A node that can evaluate to different child nodes, depending on the context it is compiled in.

To give you a bit of a feel for how those node types interact, I want to give a few examples of Perl 6 code, and the ASTs they could produce. (It turns out that Perl 6 is quite a complex language under the hood, and usually produces a more complicated AST than the obvious one; I'll ignore that for now, in order to introduce you to the basics.)

Ops and Constants

The expression 23 + 42 could, in the simplest case, produce this AST:

QAST::Op.new(
    :op('add'),
    QAST::IVal.new(:value(23)),
    QAST::IVal.new(:value(42)),
);

Here an QAST::Op encodes a primitive operation, an addition of two numbers. The :op argument specifies which operation to use. The child nodes are two constants, both of type QAST::IVal, which hold the operands of the low-level operation add.

Now the low-level add operation is not polymorphic; it always adds two floating-point values, and the result is a floating-point value again. Since the arguments are integers and not floating-point values, they are automatically converted to float first. That's not the desired semantics for Perl 6; actually the operator + is implemented as a subroutine of name &infix:<+>, so the real generated code is closer to

QAST::Op.new(
    :op('call'),
    :name('&infix:<+>'),    # name of the subroutine to call
    QAST::IVal.new(:value(23)),
    QAST::IVal.new(:value(42)),
);

Variables and Blocks

Using a variable is as simple as writing QAST::Var.new(:name('name-of-the-variable')), but it must be declared first. This is done with QAST::Var.new(:name('name-of-the-variable'), :decl('var'), :scope('lexical')).

But there is a slight caveat: in Perl 6 a variable is always scoped to a block. So while you can't ordinarily mention a variable prior to its declaration, there are indirect ways to achieve that (lookup by name, and eval(), to name just two).

So in Rakudo there is a convention to create QAST::Block nodes with two QAST::Stmts children. The first holds all the declarations, and the second all the actual code. That way all the declarations always come before the rest of the code.

So my $x = 42; say $x compiles to roughly this:

QAST::Block.new(
    QAST::Stmts.new(
        QAST::Var.new(:name('$x'), :decl('var'), :scope('lexical')),
    ),
    QAST::Stmts.new(
        QAST::Op.new(
            :op('p6store'),
            QAST::Var.new(:name('$x')),
            QAST::IVal.new(:value(42)),
        ),
        QAST::Op.new(
            :op('call'),
            :name('&say'),
            QAST::Var.new(:name('$x')),
        ),
    ),
);

Polymorphism and QAST::Want

Perl 6 distinguishes between native types and reference types. Native types are closer to the machine, and their type name is always lower case in Perl 6.

Integer literals are polymorphic in that they can be either a native int or a "boxed" reference type Int.

To model this in the AST, QAST::Want nodes can contain multiple child nodes. The compile-time context decides which of those is actually used.

So the integer literal 42 actually produces not just a simple QAST::IVal node but rather this:

QAST::Want.new(
    QAST::WVal(Int.new(42)),
    'Ii',
    QAST::IVal(42),
)

(Note that Int.new(42) is just a nice notation to indicate a boxed integer object; it doesn't quite work like this in the code that translates Perl 6 source code into ASTs).

The first child of a QAST::Want node is the one used by default, if no other alternative matches. Then comes a list in which the elements at odd indexes are format specifications (here Ii for integers) and the elements at even indexes are the ASTs to use in those cases.

An interesting format specification is 'v' for void context, which is always chosen when the return value from the current expression isn't used at all. In Perl 6 this is used to eagerly evaluate lazy lists that are used in void context, and for several optimizations.

Dave's Free Press: Journal: I Love Github

Dave's Free Press: Journal: Palm Treo call db module

Ocean of Awareness: Evolvable languages

Ideally, if a syntax is useful and clear, and a programmer can easily read it at a glance, you should be able to add it to an existing language. In this post, I will describe a modest incremental change to the Perl syntax.

It's one I like, but that's beside the point, for two reasons. First, it's simply intended as an example of language evolution. Second, regardless of its merits, it is unlikely to happen, because of the way that Perl 5 is parsed. In this post I will demonstrate a way of writing a parser so that this change, or others, can be made in a straightforward way, and without designing your language into a corner.

When initializing a hash, Perl 5 allows you to use not just commas, but also the so-called "wide comma" (=>). The wide comma is suggestive visually, and it also has some smarts about what a hash key is: the hash key is always converted into a string, so the wide comma knows that in a key-value pair like this:

    key1 => 711,

that key1 is intended as a string.

But what about something like this?

  {
   company name => 'Kamamaya Technology',
   employee 1 => first name => 'Jane',
   employee 1 => last name => 'Doe',
   employee 1 => title => 'President',
   employee 2 => first name => 'John',
   employee 2 => last name => 'Smith',
   employee 3 => first name => 'Clarence',
   employee 3 => last name => 'Darrow',
  }

Here I think the intent is obvious -- to create an employee database in the form of a hash of hashes, allowing spaces in the keys. In Data::Dumper format, the result would look like:

{
              'employee 2' => {
                                'last name' => '\'Smith\'',
                                'first name' => '\'John\''
                              },
              'company name' => '\'Kamamaya Technology\'',
              'employee 3' => {
                                'last name' => '\'Darrow\'',
                                'first name' => '\'Clarence\''
                              },
              'employee 1' => {
                                'title' => '\'President\'',
                                'last name' => '\'Doe\'',
                                'first name' => '\'Jane\''
                              }
            }

And in fact, that is the output of the script in this Github gist, which parses the previous "extended Perl 5" snippet using a Marpa grammar before passing it on to Perl.

Perl 5 does not allow a syntax like this, and looking at its parsing code will tell you why -- it's already a maintenance nightmare. The extension I've described above could, in theory, be added to Perl 5, but doing so would aggravate an already desperate maintenance situation.

Now, depending on taste, you may be just as happy that you'll never see the extensions I have just outlined in Perl 5. But I don't think it is as easy to be happy about a parsing technology that quickly paints the languages which use it into a corner.

How it works

The code is in a Github gist. For the purposes of the example, I've implemented a toy subset of Perl. But this approach has been shown to scale. There are full Marpa-powered parsers of C, ECMAScript, XPath, and liberal HTML.

Marpa is a general BNF parser, which means that anything you can write in BNF, Marpa can parse. For practical parsing, what matters are those grammars that can be parsed in linear time, and with Marpa that class is vast, including all the classes of grammar currently in practical use. To describe the class of grammars that Marpa parses in linear time, assume that you have either a left or right parser, with infinite lookahead, that uses regular expressions. (A parser like this is called LR-regular.) Assume that this LR-regular parser parses your grammar. In that case, you can be sure that Marpa will parse that grammar in linear time, and without doing the lookahead. (Instead Marpa tracks possibilities in a highly-optimized table.) Marpa also parses many grammars that are not LR-regular in linear time, but just LR-regular is very likely to include any class of grammar that you will be interested in parsing. The LR-regular grammars easily include all those that can be parsed using yacc, recursive descent or regular expressions.
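
To give a feel for what "write it in BNF and Marpa parses it" looks like, here is a tiny self-contained toy of my own (not from the gist), using the Marpa::R2 Scanless interface:

use strict;
use warnings;
use Marpa::R2;

# a toy additive-expression grammar, written directly in BNF
my $dsl = <<'END_OF_DSL';
:default ::= action => ::first
:start ::= Expression
Expression ::= Number
             | Expression '+' Number action => do_add
Number ~ [\d]+
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
my $recce   = Marpa::R2::Scanless::R->new(
    { grammar => $grammar, semantics_package => 'My_Actions' } );

my $input = '1 + 2 + 39';
$recce->read( \$input );
print ${ $recce->value }, "\n";    # prints 42

sub My_Actions::do_add {
    my ( undef, $left, undef, $right ) = @_;
    return $left + $right;
}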

Marpa excels at those special hacks so necessary in recursive descent and other techniques. Marpa allows you to define events that will stop it at symbols or rules, both before and after. While stopped, you can hand processing over to your own custom code. Your custom code can feed your own tokens to the parse for as long as you like. In doing so, it can consult Marpa to determine exactly what symbols and rules have been recognized and which ones are expected. Once finished with custom processing, you can then ask Marpa to pick up again at any point you wish.

The craps game is over

The bottom line is that if you can describe your language extension in BNF, or in BNF plus some hacks, you can rely on Marpa parsing it in reasonable time. Language design has been like shooting crap in a casino that sets you up to win a lot of the first rolls before the laws of probability grind you down. Marpa changes the game.

To learn more

Marpa::R2 is available on CPAN. A list of my Marpa tutorials can be found here. There are new tutorials by Peter Stuifzand and amon. The Ocean of Awareness blog focuses on Marpa, and it has an annotated guide. Marpa has a web page that I maintain and Ron Savage maintains another. For questions, support and discussion, there is the "marpa parser" Google Group.

Comments

Comments on this post can be made in Marpa's Google group.

Perlgeek.de : doc.perl6.org and p6doc

Background

Earlier this year I tried to assess the readiness of the Perl 6 language, compilers, modules, documentation and so on. While I never got around to publishing my findings, one thing was painfully obvious: there is a huge gap in the area of documentation.

There are quite a few resources, but none of them comprehensive (the most comprehensive are the synopses, but they are not meant for the end user), and no single location we can point people to.

Announcement

So, in the spirit of xkcd, I present yet another incomplete documentation project: doc.perl6.org and p6doc.

The idea is to take the same approach as perldoc for Perl 5: create user-level documentation in Pod format (here the Perl 6 Pod), and make it available both on a website and via a command line tool. The source (documentation, command line tool, HTML generator) lives at https://github.com/perl6/doc/. The website is doc.perl6.org.

Oh, and the last Rakudo Star release (2012.06) already shipped p6doc.

Status and Plans

Documentation, website and command line tool are all in very early stages of development.

In the future, I want both p6doc SOMETHING and http://doc.perl6.org/SOMETHING to either document or link to documentation of SOMETHING, be it a built-in variable, an operator, a type name, routine name, phaser, constant or... all the other possible constructs that occur in Perl 6. URLs and command line arguments specific to each type of construct will also be available (/type/SOMETHING URLs already work).

Finally I want some way to get a "full" view of a type, ie providing all methods from superclasses and roles too.

Help Wanted

All of that is going to be a lot of work, though most of the work will be writing the documentation itself. You too can help! You can write new documentation, gather and incorporate already existing documentation with compatible licenses (for example the synopses, the Perl 6 advent calendar, or examples from Rosetta Code), add more examples, proof-read the documentation, or improve the HTML generation or the command line tool.

If you have any questions about contributing, feel free to ask in #perl6. Of course you can also just create pull requests right away :-).

Perlgeek.de : YAPC Europe 2013 Day 2

The second day of YAPC Europe was enjoyable and informative.

I learned about ZeroMQ, which is a bit like sockets on steroids. Interesting stuff. Sadly Design decisions on p2 didn't quite qualify as interesting.

Matt's PSGI archive is a project to rewrite Matt's infamous script archive in modern Perl. Very promising, and a bit entertaining too.

Lunch was very tasty, more so than the usual mass catering. Kudos to the organizers!

After lunch, jnthn talked about concurrency, parallelism and asynchrony in Perl 6. It was a great talk, backed by great work on the compiler and runtime. Jonathan's talks are always to be recommended.

I think I didn't screw up my own talk too badly; at least the timing worked fine. I just forgot to show the last slide. No real harm done.

I also enjoyed mst's State of the Velociraptor, which was a summary of what went on in the Perl world in the last year. (Much better than the YAPC::EU 2010 talk with the same title).

The Lightning talks were as enjoyable as those from the previous day. So all fine!

Next up is the river cruise, I hope to blog about that later on.

Ocean of Awareness: Language design: Exploiting ambiguity

Currently, in designing languages, we don't allow ambiguities -- not even potential ones. We insist that it must not be even possible to write an ambiguous program. This is unnecessarily restrictive.

This post is written in English, which is full of ambiguities. Natural languages are always ambiguous, because human beings find that that's the best way for versatile, rapid, easy communication. Human beings arrange things so that every sentence is unambiguous in context. Mistakes happen, and ambiguous sentences occur, but in practice, the problem is manageable. In a conversation, for example, we would just ask for clarification.

If we allow our computer languages to take their most natural forms, they will often have the potential for ambiguity. This is even less of a problem on a computer than it is in conversation -- a computer can always spot an actual ambiguity immediately. When actual ambiguities occur, we can deal with them in exactly the same way that we deal with any other syntax problem: The computer catches it and reports it, and we fix it.

An example

To illustrate, I'll use a DSL-writing DSL. It'll be tiny -- just lexeme declarations and BNF rules. Newlines will not be significant. Statements can end with a semicolon, but that's optional. (The code for this post is in a Github gist.)

Here is a toy calculator written in our tiny DSL-writing language:

  Number matches '\d+'
  E ::= T '*' F
  E ::= T
  T ::= F '+' Number
  T ::= Number

Trying an improvement

With a grammar this small, just about anything is readable. But let's assume we want to improve it, and that we decide that the lexeme declaration of Number really belongs after the rules which use it. (If our grammar were longer, this could make a real difference.) So we move the lexeme declaration to the end:

  E ::= T '*' F
  E ::= T
  T ::= F '+' Number
  T ::= Number
  Number matches '\d+'

But there's an issue

It turns out the grammar for our toy DSL-writer is ambiguous. When a lexeme declaration follows a BNF rule, there's no way to tell whether it is actually a lexeme declaration or part of the BNF rule. Our parser catches that:

Parse of BNF/Scanless source is ambiguous
Length of symbol "Statement" at line 4, column 1 is ambiguous
  Choices start with: T ::= Number
  Choice 1, length=12, ends at line 4, column 12
  Choice 1: T ::= Number
  Choice 2, length=33, ends at line 5, column 20
  Choice 2: T ::= Number\nNumber matches '\\d

Here Marpa tells you why it thinks your script is ambiguous. Two different statements can start at line 4. Both of them are BNF rules, but one is longer than the other.

Just another syntax error

Instead of having to design a language where ambiguity was not even possible, we designed one where ambiguities can happen. This allows us to design a much more flexible language, like the ones we choose when we humans communicate with each other. The downside is that actual ambiguities will occur, but they can be reported, and fixed, just like any other syntax error.

In this case, we recall that we allowed semicolons to terminate a rule, and our fix is easy:

  E ::= T '*' F
  E ::= T
  T ::= F '+' Number
  T ::= Number ;
  Number matches '\d+'

To learn more

The code for this post is a gist on Github. It was written using Marpa::R2, which is available on CPAN. A list of my Marpa tutorials can be found here. There are new tutorials by Peter Stuifzand and amon. The Ocean of Awareness blog focuses on Marpa, and it has an annotated guide. Marpa has a web page that I maintain and Ron Savage maintains another. For questions, support and discussion, there is a "marpa parser" Google Group and an IRC channel: #marpa at irc.freenode.net.

Comments

Comments on this post can be made in Marpa's Google group.

Perlgeek.de : Stop The Rewrites!

What follows is a rant. If you're not in the mood to read a rant right now, please stop and come back in an hour or two.

The Internet is full of people who know better than you how to manage your open source project, even if they only know some bits and pieces about it. News at 11.

But there is one particular instance of that advice that I hear often applied to Rakudo Perl 6: Stop the rewrites.

To be honest, I can fully understand the sentiment behind that advice. People see that it has taken us several years to get where we are now, and in their opinion, that's too long. And now we shouldn't waste our time with rewrites, but get the darn thing running already!

But software development simply doesn't work that way. Especially not if your target is moving, as is Perl 6. (Ok, Perl 6 isn't moving that much anymore, but there are still areas we don't understand very well, so our current understanding of Perl 6 is a moving target).

At some point or another, you realize that with your current design, you can only pile workaround on top of workaround, and hope that the whole thing never collapses.

[Image: a Jenga tower. Courtesy of sermoa.]

Those people who spread the good advice to never do any major rewrites again, they never address what you should do when you face such a situation. Build the tower of workarounds even higher, and pray to Cthulhu that you can build it robust enough to support a whole stack of third-party modules?

Curiously this piece of advice occasionally comes from people who otherwise know a thing or two about software development methodology.

I should also add that since the famous "nom" switchover, which admittedly caused lots of fallout, we have had three major rewrites of subsystems (longest-token matching of alternations, bounded serialization and qbootstrap), all three of which caused no new test failures, and two of which caused no fallout in the module ecosystem at all. In return, we have much faster startup (a factor of 3 to 4) and a much more correct regex engine.

Ocean of Awareness: A Marpa-powered C parser

Jean-Damien Durand has just released MarpaX::Languages::C::AST, which parses the C language into an abstract syntax tree (AST). MarpaX::Languages::C::AST has been tested against Perl's C source code, as well as Marpa's own C source.

Because it is Marpa-powered, MarpaX::Languages::C::AST works differently from other C parsers. In the past, C parsers have been syntax-driven -- parsing was based on a BNF description of the C grammar. More recently, C parsers have used hand-written recursive descent -- they have been procedurally-driven.

MarpaX::Languages::C::AST uses both approaches. Marpa has the advantage that it makes full knowledge of the state of the parse available to the programmer, so that procedural logic and syntax-driven parsing can reinforce each other. The result is a combined lexer/parser which is very compact and easy to understand. Among the potential applications (a usage sketch follows the list):

  • Customized "lints". You can write programs to enforce C language standards and restrictions specific to an individual, a company or a project.
  • C interpreters. By taking the AST and adding your own back end, you can create a special-purpose C interpreter or a special-purpose compiler.
  • C variants. Because the code for the parser is compact and easy to modify, it lends itself to language extension and experimentation. For example, you could reasonably implement compilers to try out the proposals submitted to a standards committee.
  • C supersets. Would you like to see some of the syntax from a favorite language available in C? Here's your chance.
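
Getting as far as the AST is compact. A minimal sketch along the lines of the module's synopsis (check the method names against the CPAN documentation before relying on them):

use strict;
use warnings;
use MarpaX::Languages::C::AST;

my $c_source = <<'C_SOURCE';
int main(void) { return 0; }
C_SOURCE

# Construct a parser and hand it a reference to the source;
# parse() produces the abstract syntax tree.
my $c_ast_object = MarpaX::Languages::C::AST->new();
my $ast          = $c_ast_object->parse( \$c_source );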

The implementation

A few of Jean-Damien's implementation choices are worth noting. A C parser can take one of two strategies: approximate or precise. A compiler has, of course, to be precise. Tools, such as cross-referencers, often decide to be approximate, or sloppy. Sloppiness is easier to implement and has other advantages: a sloppy tool can tolerate missing C flags, since what the C flags should be can be one of the things it guesses at.

Of the two strategies, Jean-Damien decided to go with "precise". MarpaX::Languages::C::AST follows the C11 standard, with either GCC or Microsoft extensions. This has the advantage that MarpaX::Languages::C::AST could be used as the front end of a compiler.

Because MarpaX::Languages::C::AST's purpose is to take things as far as an AST, and let the user take over, it does not implement those constraints usually implemented in post-processing. One example of a post-syntactic constraint is the one that bans "case" labels outside of switch statements. Perhaps a future version can include a default "first phase" post-processor to enforce the constraints from the standards. As currently implemented, the user can check for and enforce these constraints in any way he likes. This makes extensions and customizations easier, which I think of as the major purpose of MarpaX::Languages::C::AST.

The parsing strategy

Those familiar with C parsing and its special issues may be interested in Jean-Damien's approach to them. MarpaX::Languages::C::AST is, with a few exceptions, syntax-driven -- the parser works from Marpa's SLIF, an extended BNF variant. The SLIF-driven logic is sufficient to deal with the if-then-else issue. Marpa handles right recursion in linear time, so the if-then-else issue could have been dealt with by rewriting the relevant rules. But Jean-Damien wanted his BNF to follow the grammar in the standards closely, and he decided to use Marpa's rule ranking facility instead.
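
For the classic dangling-else case, a ranked SLIF fragment might look roughly like this (a hypothetical sketch, not the module's actual grammar; ranks only take effect when the recognizer is created with ranking_method => 'high_rule_only'):

# Prefer binding an 'else' to the nearest 'if' by ranking
# the if-then-else alternative higher.
Statement ::= 'if' '(' Expression ')' Statement rank => -1
            | 'if' '(' Expression ')' Statement 'else' Statement rank => 1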

More complicated is the ambiguity in C between variable names and types, which actually takes C beyond BNF and context-free grammars into context-sensitive territory. Most C parsers deal with this using lexer or post-processing hacks. Marpa allows the parser to do this more elegantly. Marpa knows the parsing context at all times and can communicate this to a user's customized code. Marpa also has the ability to use the parsing context to decide when to switch control from the syntax-driven logic to a user's customized procedural logic, and for the syntax-driven logic to take control back when the procedural logic wants to give it back. This allows the variable-name-versus-type ambiguity to be handled by specifically targeted code which knows the full context of the decisions it needs to make. This code can be written more directly, simply and clearly than was possible with previous parsing methods.

Compilers?

Above I mentioned special-purpose compilers. What about production compilers? MarpaX::Languages::C::AST's upper layers are in Perl, so the speed, while acceptable for special-purpose tools, will probably not be adequate for production. Perhaps a future Marpa-powered C parser will rewrite those upper layers in C, and make the race more interesting.

To learn more

Marpa::R2 is available on CPAN. A list of my Marpa tutorials can be found here. There are new tutorials by Peter Stuifzand and amon. The Ocean of Awareness blog focuses on Marpa, and it has an annotated guide. Marpa also has a web page. For questions, support and discussion, there is the "marpa parser" Google Group. Comments on this post can be made there.

Perlgeek.de : The REPL trick

A recent discussion on IRC prompted me to share a small but neat trick with you.

If there are things you want to do quite often in the Rakudo REPL (the interactive "Read-Evaluate-Print Loop"), it makes sense to create a shortcut for them. And creating shortcuts for often-used stuff is what programming languages excel at, so you do it right in a Perl 6 module:

use v6;
module REPLHelper;

sub p(Mu \x) is export {
    x.^mro.map: *.^name;
}

I have placed mine in $HOME/.perl6/repl.

And then you make sure it's loaded automatically:

$ alias p6repl="perl6 -I$HOME/.perl6/repl/ -MREPLHelper"
$ p6repl
> p Int
Int Cool Any Mu
>

Now you have a neat one-letter function which tells you the parents of an object or a type, in method resolution order. And a way to add more shortcuts when you need them.
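
Adding another shortcut is just a matter of extending the module. For example, a second (hypothetical) helper that lists the methods a type defines itself:

# also in REPLHelper, next to p:
sub m(Mu \x) is export {
    # names of the methods defined directly in x's type
    x.^methods(:local).map: *.name;
}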

Perlgeek.de : News in the Rakudo 2012.06 release

Rakudo development continues to progress nicely, and so there are a few changes in this month's release worth explaining.

Longest Token Matching, List Iteration

The largest chunk of development effort went into Longest-Token Matching for alternations in Regexes, about which Jonathan already blogged. Another significant piece was Patrick's refactor of list iteration. You probably won't notice much of that, except that for-loops are now a bit faster (maybe 10%), and laziness works more reliably in a couple of cases.

String to Number Conversion

String to number conversion is now stricter than before. Previously an expression like +"foo" would simply return 0. Now it fails, ie returns an unthrown exception. If you treat that unthrown exception like a normal value, it blows up with a helpful error message, saying that the conversion to a number has failed. If that's not what you want, you can still write +$str // 0.
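
A small sketch of the new behavior:

my $x = +"foo";     # no exception is thrown here
say +"foo" // 0;    # 0 -- the explicit fallback
say $x + 2;         # blows up with the numeric conversion error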

require With Argument Lists

require now supports argument lists, and that needs a bit more explaining. In Perl 6 routines are by default only looked up in lexical scopes, and lexical scopes are immutable at run time. So, when loading a module at run time, how do you make functions available to the code that loads the module? Well, you determine at compile time which symbols you want to import, and then do the actual importing at run time:

use v6;
require Test <&plan &ok &is>;
#            ^^^^^^^^^^^^^^^ evaluated at compile time,
#                            declares symbols &plan, &ok and &is
#       ^^^                  loaded at run time

Module Load Debugging

Rakudo had some trouble when modules were precompiled, but their dependencies were not. This happens more often than it sounds, because Rakudo checks the timestamps of the involved files, and loads the source version if it is newer than the compiled file. Since many file operations (including simple copying) change the time stamp, that could happen very easily.

To make debugging of such errors easier, you can set the RAKUDO_MODULE_DEBUG environment variable to 1 (or any positive number; currently there is only one debugging level, in the future higher numbers might lead to more output).

$ RAKUDO_MODULE_DEBUG=1 ./perl6 -Ilib t/spec/S11-modules/require.t
MODULE_DEBUG: loading blib/Perl6/BOOTSTRAP.pbc
MODULE_DEBUG: done loading blib/Perl6/BOOTSTRAP.pbc
MODULE_DEBUG: loading lib/Test.pir
MODULE_DEBUG: done loading lib/Test.pir
1..5
MODULE_DEBUG: loading t/spec/packages/Fancy/Utilities.pm
MODULE_DEBUG: done loading t/spec/packages/Fancy/Utilities.pm
ok 1 - can load Fancy::Utilities at run time
ok 2 - can call our-sub from required module
MODULE_DEBUG: loading t/spec/packages/A.pm
MODULE_DEBUG: loading t/spec/packages/B.pm
MODULE_DEBUG: loading t/spec/packages/B/Grammar.pm
MODULE_DEBUG: done loading t/spec/packages/B/Grammar.pm
MODULE_DEBUG: done loading t/spec/packages/B.pm
MODULE_DEBUG: done loading t/spec/packages/A.pm
ok 3 - can require with variable name
ok 4 - can call subroutines in a module by name
ok 5 - require with import list

Module Loading Traces in Compile-Time Errors

If module myA loads module myB, and myB dies during compilation, you now get a backtrace which indicates through which path the erroneous module was loaded:

$ ./perl6 -Ilib -e 'use myA'
===SORRY!===
Placeholder variable $^x may not be used here because the surrounding block
takes no signature
at lib/myB.pm:1
  from module myA (lib/myA.pm:3)
  from -e:1

Improved autovivification

Perl allows you to treat not-yet-existing array and hash elements as arrays or hashes, and automatically creates those elements for you. This is called autovivification.

my %h;
%h<x>.push: 1, 2, 3; # worked in the previous release too
push %h<y>, 4, 5, 6; # newly works in 2012.06

Ocean of Awareness: Parsing Ada Lovelace

The application

Abstract Syntax Forests (ASF's) are my most recent project. I am adding ASF's to my Marpa parser. Marpa has long supported ambiguous parsing, and allowed users to iterate through, and examine, all the parses of an ambiguous parse. This was enough for most applications.

Even applications which avoid ambiguity benefit from better ways to detect and locate it. And there are applications that require the ability to select among and manipulate very large sets of ambiguous parses. Prominent among these is Natural Language Processing (NLP). This post will introduce an experiment. Marpa in fact seems to have some potential for NLP.

Writing an efficient ASF is not a simple matter. The naive implementation is to generate a complete set of fully expanded abstract syntax trees (AST's). This approach consumes resources that can become exponential in the size of the input. Translation: the naive implementation quickly becomes unusably slow. Marpa optimizes by aggressively identifying identical subtrees of the AST's. Especially in highly ambiguous parses, many subtrees are identical, and this optimization is often a big win.

Ada Lovelace

My primary NLP example at this point is a quote from Ada Lovelace. It is a long sentence, possibly the longest, from her Notes -- 158 words. A disadvantage of this example is that it is not typical of normal NLP. By modern standards it is an unusually long and complex sentence. An advantage of it, and my reason for the choice, is that it stresses the parser.

The "Note A" from which this sentence is taken is one of Ada's notes on a translation of a paper on the work of her mentor and colleague, Charles Babbage. Ada's "Notes" are longer than the original paper, and far more important. In these "Notes" Ada makes the first distinction between a computer and a calculator, and between software and hardware. In their collaboration, Babbage did all of the hardware design, and he wrote most of the actual programs in her paper. But these two revolutionary ideas, and their elaboration, are Ada's.

Why would Babbage ignore obvious implications of his own invention? The answer is that, while these implications are obvious to us, they simply did not fit into the 1843 view of the world. In those days, algebra was leading-edge math. The ability to manipulate equations was considered an extremely advanced form of reason. For Babbage and his contemporaries, that sort of ability to reason certainly suggested the ability to distinguish between good and evil, and this in turn suggested possession of a soul. Ada's "Notes" were written 20 years after Mary Shelley, while visiting Ada's father in Switzerland, wrote the novel Frankenstein. For Ada's contemporaries, announcing that you planned to create a machine that composed music, or did advanced mathematical reasoning, was not very different from announcing that you planned to assemble a human being in your lab.

Ada was the daughter of the poet Byron. For her, pushing boundaries was a family tradition. Babbage was happy to leave these matters to Ada. As Babbage's son put it, his father

considered the Paper by Menabrea, translated with notes by Lady Lovelace, published in volume 3 of Taylor's 'Scientific Memoirs', as quite disposing of the mathematical aspect of the invention. My business now is not with that.

On reading Ada

Ada's notes are worth reading, but the modern reader has to be prepared to face several layers of difficulty:

  • They are in Victorian English. In modern English, a long complex sentence is usually considered an editing failure. In Ada's time, following Greek and Roman examples, a periodic sentence was considered especially appropriate when making an important point. And good literary style and good scientific style were considered one and the same.
  • They are mathematical, and none of the math is of the kind currently studied by programmers.
  • Ada has literally no prior literature on software to build on, and has to invent her terminology. It is almost never the modern terminology, and it can be hard to guess how it relates to modern terminology. For example, does Ada foresee objects, methods and classes? Ada speaks of computing both symbolic results and numeric data, and attaching one to the other. She clearly understands that the symbolic results can represent operations. Ada also clearly understands that numeric data can represent not just the numbers themselves, but notes, positions in a loom, or computer operations. So we have arbitrary data, tagged with symbols that can be both names and operations. But are these objects?
  • Finally, she associates mathematics with philosophy. In her day, this was expected. Unfortunately, modern readers now often see that sort of discussion as irrelevant, or even as a sign of inability to come to the point.

Ada's quote

Those who view mathematical science, not merely as a vast body of abstract and immutable truths, whose intrinsic beauty, symmetry and logical completeness, when regarded in their connexion together as a whole, entitle them to a prominent place in the interest of all profound and logical minds, but as possessing a yet deeper interest for the human race, when it is remembered that this science constitutes the language through which alone we can adequately express the great facts of the natural world, and those unceasing changes of mutual relationship which, visibly or invisibly, consciously or unconsciously to our immediate physical perceptions, are interminably going on in the agencies of the creation we live amidst: those who thus think on mathematical truth as the instrument through which the weak mind of man can most effectually read his Creator's works, will regard with especial interest all that can tend to facilitate the translation of its principles into explicit practical forms.

Ada, the bullet point version

Ada's sentence may look like what happens when two pickups carrying out-of-date dictionaries to the landfill run into each other on the way. But there is, in fact, a good deal of structure and meaning in all those words. Let's take it as bullet points:

  • 1. Math is awesome just for being itself.
  • 2. Math describes and predicts the external world.
  • 3. Math is the best way to get at what it is that is really behind existence.
  • 4. If we can do more and better math, that has to be a good thing.

Ada is connecting her new science of software to the history of thought in the West, something which readers of the time would expect her to do. Bullet point 1 alludes to the Greek view of mathematics, especially Plato's. Bullet point 2 alludes to the scientific view, as pioneered by Galileo and Newton. Bullet point 3 alludes to the post-Classical world view, especially the Christian one. But while the language is Christian, Ada's idea is one that Einstein would have had no trouble with. And bullet 4 is the call to action.

When we come to discuss the parse in detail, we'll see that it follows this structure. As an aside, note Ada's mention of "logical completeness" as one of the virtues of math. Gödel came along nearly a century later and showed this vision, which went back to the Greeks, was an illusion. So Ada did not predict everything. On the other hand, Gödel's result was also a complete surprise to Johnny von Neumann, who was in the room that day.

The experiment so far

I've gotten Marpa to grind through this sentence, using the same framework as the Stanford NLP demo. That demo, in fact, refuses to even attempt any sentence longer than 70 words, so my Ada quote needs to be broken up. Even on the smaller pieces, the Stanford demo becomes quite slow. Marpa, by contrast, grinds through the whole thing quickly. The Stanford demo is based on a CYK parser, and presumably is O(n^3) -- cubic. Marpa seems to be exhibiting linear behavior.

Promising as this seems for Marpa, its first results may not hold up as the experiment gets more realistic. So far, I've only given Marpa enough English grammar and vocabulary to parse this one sentence. That is enough to make the grammar very complex and ambiguous, but even so it must be far less complex and ambiguous than the one behind the Stanford demo. Marpa will never have time worse than O(n^3), but it's quite possible that if Marpa's grammar were as ambiguous as the Stanford one, Marpa would be no faster. Marpa, in fact, could turn out to be slower by some linear factor.

There may never be a final decision based on speed. Marpa might turn out to represent one approach, good for certain purposes. And, especially when speed is indecisive, other abilities can prove more important.

To learn more

Marpa::R2 is available on CPAN. A list of my Marpa tutorials can be found here. There are new tutorials by Peter Stuifzand and amon. The Ocean of Awareness blog focuses on Marpa, and it has an annotated guide. Marpa also has a web page. For questions, support and discussion, there is the "marpa parser" Google Group. Comments on this post can be made there.

Perlgeek.de : Localization for Exception Messages

Ok, my previous blog post wasn't quite as final as I thought. My exceptions grant said that the design should make it easy to enable localization and internationalization hooks. I want to discuss some possible approaches and thereby demonstrate that the design is flexible enough as it is.

At this point I'd like to mention that much of the flexibility comes from either Perl 6 itself, or from the separation of stringifying an exception and generating the actual error message.

Mixins: the sledgehammer

One can always override a method in an object by mixing in a role which contains the method in question. When the user requests error messages in a different language, one can replace method Str or method message with one that generates the error message in a different language.
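
In untested code, the mixin approach might look like this:

my role InGerman {
    method message() { 'Eine deutsche Fehlermeldung' }
}

my $ex = X::AdHoc.new(payload => 'something went wrong');
$ex does InGerman;    # replaces message() on this one object
say $ex.message;      # Eine deutsche Fehlermeldung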

Where should that happen? The code that throws exceptions is fairly scattered over the code base, but there is a central piece of code in Rakudo that turns Parrot-level exceptions into Perl 6-level exceptions. That would be an obvious place to muck with exceptions, but it would mean that exceptions that are created but not thrown don't get the localization. I suspect that's a fairly small problem in the real world, but it still carries a code smell. As does the whole idea of overriding methods.

Another sledgehammer: alternative setting

Perl 6 provides built-in types and routines in an outer lexical scope known as a "setting". The default setting is called CORE. Due to the lexical nature of almost all lookups in Perl 6, one can "override" almost anything by providing a symbol of the same name in a lexical scope.

One way to use that for localization is to add another setting between the user's code and CORE. For example a file DE.setting:

my class X::Signature::Placeholder does X::Comp {
    method message() {
        'Platzhaltervariablen können keine bestehenden Signaturen überschreiben';
    }
}

After compiling, we can load the setting:

$ ./perl6 --target=pir --output=DE.setting.pir DE.setting
$ ./install/bin/parrot -o DE.setting.pbc DE.setting.pir
$ ./perl6 --setting=DE -e 'sub f() { $^x }'
===SORRY!===
Platzhaltervariablen können keine bestehenden Signaturen überschreiben
at -e:1

That works beautifully for exceptions that the compiler throws, because they look up exception types in the scope where the error occurs. Exceptions from within the setting are a different beast; they'd need special lookup rules (though the setting throws far fewer exceptions than the compiler, so that's probably manageable).

But while this looks quite simple, it comes with a problem: if a module is precompiled without the custom setting, and it contains a reference to an exception type, and then the l10n setting redefines it, other programs will contain references to a different class with the same name. Which means that our precompiled module might only catch the English version of X::Signature::Placeholder, and let our localized exception pass through. Oops.

Tailored solutions

A better approach is probably to simply hack up the string conversion in type Exception to consider a translator routine if present, and pass the invocant to that routine. The translator routine can look up the error message keyed by the type of the exception, and has access to all data carried in the exception. In untested Perl 6 code, this might look like this:

# required change in CORE
my class Exception {
    multi method Str(Exception:D:) {
        return self.message unless defined $*LANG;
        if %*TRANSLATIONS{$*LANG}{self.^name} -> $translator {
            return $translator(self);
        }
        return self.message; # fallback
    }
}

# that's what a translator could write:

%*TRANSLATIONS<de><X::TypeCheck::Assignment> = {
        "Typenfehler bei Zuweisung zu '$_.symbol()': "
        ~ "'{$_.expected.^name}' erwartet, aber '{$_.got.^name}' bekommen"
    }

And setting the dynamic language $*LANG to 'de' would give a German error message for type check failures in assignment.

Another approach is to augment existing error classes and add methods that generate the error message in different languages, for example method message-fr for French, and check their existence in Exception.Str if a different language is requested.
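
Sketched as (again untested) code -- note that augment needs the monkey-typing pragma, whose exact spelling has varied between Rakudo versions:

use MONKEY_TYPING;
augment class X::Signature::Placeholder {
    method message-fr() {
        'Les variables de substitution ne peuvent pas remplacer des signatures existantes';
    }
}
# Exception.Str could then call self."message-$*LANG"() when such a
# method exists, and fall back to self.message otherwise.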

Conclusion

In conclusion there are many bad and enough good approaches; we will decide which one to take when the need arises (ie when people actually start to translate error messages).

Dave's Free Press: Journal: Graphing tool

Dave's Free Press: Journal: Travelling in time: the CP2000AN

Dave's Free Press: Journal: XML::Tiny released

Dave's Free Press: Journal: YAPC::Europe 2007 report: day 1

Ocean of Awareness: Significant newlines? Or semicolons?

Should statements have explicit terminators, like the semicolon of Perl and the C language? Or should they avoid the clutter, and separate statements by giving whitespace syntactic significance and a real effect on the semantics, as is done in Python and Javascript?

Actually we don't have to go either way. As an example, let's look at some BNF-ish DSL. It defines a small calculator. At first glance, it looks as if this language has taken the significant-whitespace route -- there certainly are no explicit statement terminators.

:default ::= action => ::first
:start ::= Expression
Expression ::= Term
Term ::=
      Factor
    | Term '+' Term action => do_add
Factor ::=
      Number
    | Factor '*' Factor action => do_multiply
Number ~ digits
digits ~ [\d]+
:discard ~ whitespace
whitespace ~ [\s]+

The rule is that there isn't one

If we don't happen to like the layout of the above DSL, and rearrange it in various ways, we'll find that everything we try works. If we become curious about exactly what the rules for newlines are, and look at the documentation, we won't find any. That's because there aren't any.

We can see this by thoroughly messing up the line structure:

:default ::= action => ::first :start ::= Expression Expression ::= Term
Term ::= Factor | Term '+' Term action => do_add Factor ::= Number |
Factor '*' Factor action => do_multiply Number ~ digits digits ~
[\d]+ :discard ~ whitespace whitespace ~ [\s]+

The script will continue to run just fine.

How does it work?

How does it work? Actually, pose the question this way: Can a human reader tell where the statements end? If the reader is not used to reading BNF, he might have trouble with this particular example but, for a language that he knows, the answer is simple: Yes, of course he can. So really the question is, why do we expect the parser to be so stupid that it cannot?

The only trick is that this is done without trickery. Marpa's DSL is written in itself, and Marpa's self-grammar describes exactly what a statement is and what it is not. The Marpa parser is powerful enough to simply take this self-describing DSL and act on it, finding where statements begin and end, much as a human reader is able to.

To learn more

This example was produced with the Marpa parser. Marpa::R2 is available on CPAN. The code for this example is based on that in the synopsis for its top-level document, but it is isolated conveniently in a Github gist.

A list of my Marpa tutorials can be found here. There are new tutorials by Peter Stuifzand and amon. The Ocean of Awareness blog focuses on Marpa, and it has an annotated guide. Marpa has a web page that I maintain and Ron Savage maintains another. For questions, support and discussion, there is the "marpa parser" Google Group. Comments on this post can be made there.

Perlgeek.de : Pattern Matching and Unpacking

When talking about pattern matching in the context of Perl 6, people usually think about regexes or grammars. Those are indeed very powerful tools for pattern matching, but not the only ones.

Another powerful tool for pattern matching and for unpacking data structures uses signatures.

Signatures are "just" argument lists:

sub repeat(Str $s, Int $count) {
    #     ^^^^^^^^^^^^^^^^^^^^  the signature
    # $s and $count are the parameters
    return $s x $count
}

Nearly all modern programming languages have signatures, so you might say: nothing special, move along. But there are two features that make them more useful than signatures in other languages.

The first is multi dispatch, which allows you to write several routines with the same name, but with different signatures. While extremely powerful and helpful, I don't want to dwell on them. Look at Chapter 6 of the "Using Perl 6" book for more details.
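
Just to give the flavor:

multi sub describe(Int $x) { "an integer: $x" }
multi sub describe(Str $x) { "a string: $x" }
say describe(42);       # an integer: 42
say describe('foo');    # a string: foo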

The second feature is sub-signatures. It allows you to write a signature for a single parameter.

Which sounds pretty boring at first, but for example it allows you to do declarative validation of data structures. Perl 6 has no built-in type for an array where each slot must be of a specific but different type. But you can still check for that in a sub-signature:

sub f(@array [Int, Str]) {
    say @array.join: ', ';
}
f [42, 'str'];      # 42, str
f [42, 23];         # Nominal type check failed for parameter '';
                    # expected Str but got Int instead in sub-signature
                    # of parameter @array

Here we have a parameter called @array, and it is followed by square brackets, which introduce a sub-signature for an array. When calling the function, the array is checked against the signature (Int, Str), and so if the array doesn't contain exactly one Int and one Str in that order, a type error is thrown.

The same mechanism can be used not only for validation, but also for unpacking, which means extracting some parts of the data structure. This simply works by using variables in the inner signature:

sub head(*@ [$head, *@]) {
    $head;
}
sub tail(*@ [$, *@tail]) {
    @tail;
}
say head <a b c >;      # a
say tail <a b c >;      # b c

Here the outer parameter is anonymous (the @), though it's entirely possible to use variables for both the inner and the outer parameter.

The anonymous parameter can even be omitted, and you can write sub tail( [$, *@tail] ) directly.

Sub-signatures are not limited to arrays. For working on arbitrary objects, you surround them with parentheses instead of brackets, and use named parameters inside:

multi key-type ($ (Numeric :$key, *%)) { "Number" }
multi key-type ($ (Str     :$key, *%)) { "String" }
for (42 => 'a', 'b' => 42) -> $pair {
    say key-type $pair;
}
# Output:
# Number
# String

This works because the => constructs a Pair, which has a key and a value attribute. The named parameter :$key in the sub-signature extracts the attribute key.

You can build quite impressive things with this feature, for example red-black tree balancing based on multi dispatch and signature unpacking. (More verbose explanation of the code.) Most use cases aren't this impressive, but still it is very useful to have occasionally. Like for this small evaluator.

Dave's Free Press: Journal: Thanks, Yahoo!

Dave's Free Press: Journal: YAPC::Europe 2007 report: day 2

Perlgeek.de : YAPC Europe 2013 Day 3

The second day of YAPC Europe climaxed in the river boat cruise, Kiev's version of the traditional conference dinner. It was a largish boat traveling on the Dnipro river, with food, drinks and lots of Perl folks. Not having fixed tables, and having to get up to fetch food and drinks, led to a lot of circulation, and thus to meeting many more people than at traditional dinners. I loved it.

Day 3 started with a video message from next year's YAPC Europe organizers, advertising the upcoming conference and talking a bit about the opportunities that Sofia offers. Tempting :-).

Monitoring with Perl and Unix::Statgrab was more about the metrics that are available for monitoring, and less about doing stuff with Perl. I was a bit disappointed.

The "Future Perl Versioning" Discussion was a very civilized discussion, with solid arguments. Whether anybody changed their minds remain to be seen.

Carl Mäsak gave two great talks: one on reactive programming, and one on regular expressions. I learned quite a bit in the first one, and simply enjoyed the second one.

After the lunch (tasty again), I attended Jonathan Worthington's third talk, MoarVM: a metamodel-focused runtime for NQP and Rakudo. Again this was a great talk, based on great work done by Jonathan and others during the last 12 months or so. MoarVM is a virtual machine designed for Perl 6's needs, as we understand them now (as opposed to parrot, which was designed towards Perl 6 as it was understood around 2003 or so, which is considerably different).

How to speak manager was amusing and offered a nice perspective on interactions between managers and programmers. Some of this advice assumed a non-tech-savvy manager, and thus didn't quite apply to my current work situation, but was still interesting.

I must confess I don't remember too much of the rest of the talks that evening. I blame five days of traveling, hackathon and conference taking their toll on me.

The third session of lightning talks was again an interesting mix, containing interesting technical tidbits, the usual "we are hiring" slogans, some touching and thoughtful moments, and finally a song by Piers Cawley. He had written the lyrics in the previous 18 hours (including sleep), to (afaict) a traditional Irish song. Standing up in front of ~300 people and singing a song that you haven't really had time to practise takes a huge amount of courage, and I admire Piers both for his courage and his great performance. I hope it was recorded, and makes its way to the public soon.

Finally the organizers spoke some closing words, and received their well-deserved share of applause.

As you might have guessed from this and the previous blog posts, I enjoyed this year's YAPC Europe very much, and found it well worth attending, and well organized. I'd like to give my heart-felt thanks to everybody who helped to make it happen, and to my employer for sending me there.

This being only my second YAPC, I can't make any far-reaching comparisons, but compared to YAPC::EU 2010 in Pisa I had an easier time making acquaintances. I cannot tell what the big difference was, but the buffet-style dinners at the pre-conference meeting and the river boat cruise certainly helped to increase the circulation and thus the number of people I talked to.

Dave's Free Press: Journal: YAPC::Europe 2007 travel plans

Perlgeek.de : A small regex optimization for NQP and Rakudo

Recently I read the course material of the Rakudo and NQP Internals Workshop, and had an idea for a small optimization for the regex engine. Yesterday night I implemented it, and I'd like to walk you through the process.

As a bit of background, the regex engine that Rakudo uses is actually implemented in NQP, and used by NQP too. The code I am about to discuss all lives in the NQP repository, but Rakudo profits from it too.

In addition one should note that the regex engine is mostly used for parsing with grammars, a process which involves nearly no scanning. Scanning is the process where the regex engine first tries to match the regex at the start of the string and, if it fails there, moves to the second character in the string, tries again, etc. until it succeeds.

But regexes that users write often involve scanning, and so my idea was to speed up regexes that scan, and where the first thing in the regex is a literal. In this case it makes sense to find possible start positions with a fast string search algorithm, for example the Boyer-Moore algorithm. The virtual machine backends for NQP already implement that as the index opcode, which can be invoked as start = index haystack, needle, startpos, where the string haystack is searched for the substring needle, starting from position startpos.
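
In NQP code that opcode is exposed through the nqp ops namespace; a minimal sketch:

my $haystack := 'say "Hello, World"';
my $start    := nqp::index($haystack, 'World', 0);
# $start is now 12; a failed search would have returned -1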

From reading the course material I knew I had to search for a regex type called scan, so that's what I did:

$ git grep --word scan
3rdparty/libtommath/bn_error.c:   /* scan the lookup table for the given message
3rdparty/libtommath/bn_mp_cnt_lsb.c:   /* scan lower digits until non-zero */
3rdparty/libtommath/bn_mp_cnt_lsb.c:   /* now scan this digit until a 1 is found
3rdparty/libtommath/bn_mp_prime_next_prime.c:                   /* scan upwards 
3rdparty/libtommath/changes.txt:       -- Started the Depends framework, wrote d
src/QRegex/P5Regex/Actions.nqp:                     QAST::Regex.new( :rxtype<sca
src/QRegex/P6Regex/Actions.nqp:                     QAST::Regex.new( :rxtype<sca
src/vm/jvm/QAST/Compiler.nqp:    method scan($node) {
src/vm/moar/QAST/QASTRegexCompilerMAST.nqp:    method scan($node) {
Binary file src/vm/moar/stage0/NQPP6QRegexMoar.moarvm matches
Binary file src/vm/moar/stage0/QASTMoar.moarvm matches
src/vm/parrot/QAST/Compiler.nqp:    method scan($node) {
src/vm/parrot/stage0/P6QRegex-s0.pir:    $P5025 = $P5024."new"("scan" :named("rx
src/vm/parrot/stage0/QAST-s0.pir:.sub "scan" :subid("cuid_135_1381944260.6802") 
src/vm/parrot/stage0/QAST-s0.pir:    push $P5004, "scan"

The binary files and .pir files are generated code included just for bootstrapping, and not interesting for us. The files in 3rdparty/libtommath are there for bigint handling, thus not interesting for us either. The rest are good matches: src/QRegex/P6Regex/Actions.nqp is responsible for compiling Perl 6 regexes to an abstract syntax tree (AST), and src/vm/parrot/QAST/Compiler.nqp compiles that AST down to PIR, the assembly language that the Parrot Virtual Machine understands.

So, looking at src/QRegex/P6Regex/Actions.nqp the place that mentions scan looked like this:

    $block<orig_qast> := $qast;
    $qast := QAST::Regex.new( :rxtype<concat>,
                 QAST::Regex.new( :rxtype<scan> ),
                 $qast,
                 ($anon
                      ?? QAST::Regex.new( :rxtype<pass> )
                      !! (nqp::substr(%*RX<name>, 0, 12) ne '!!LATENAME!!'
                            ?? QAST::Regex.new( :rxtype<pass>, :name(%*RX<name>) )
                            !! QAST::Regex.new( :rxtype<pass>,
                                   QAST::Var.new(
                                       :name(nqp::substr(%*RX<name>, 12)),
                                       :scope('lexical')
                                   ) 
                               )
                          )));

So to make the regex scan, the AST (in $qast) is wrapped in QAST::Regex.new(:rxtype<concat>,QAST::Regex.new( :rxtype<scan> ), $qast, ...), plus some stuff I don't care about.

To make the optimization work, the scan node needs to know what to scan for, if the first thing in the regex is indeed a constant string, aka literal. If it is, $qast is either directly of rxtype literal, or a concat node where the first child is a literal. As a patch, it looks like this:

--- a/src/QRegex/P6Regex/Actions.nqp
+++ b/src/QRegex/P6Regex/Actions.nqp
@@ -667,9 +667,21 @@ class QRegex::P6Regex::Actions is HLL::Actions {
     self.store_regex_nfa($code_obj, $block, QRegex::NFA.new.addnode($qast))
     self.alt_nfas($code_obj, $block, $qast);
 
+    my $scan := QAST::Regex.new( :rxtype<scan> );
+    {
+        my $q := $qast;
+        if $q.rxtype eq 'concat' && $q[0] {
+            $q := $q[0]
+        }
+        if $q.rxtype eq 'literal' {
+            nqp::push($scan, $q[0]);
+            $scan.subtype($q.subtype);
+        }
+    }
+
     $block<orig_qast> := $qast;
     $qast := QAST::Regex.new( :rxtype<concat>,
-                 QAST::Regex.new( :rxtype<scan> ),
+                 $scan,
                  $qast,

Since scan nodes have always been empty so far, the code generators don't look at their child nodes, and adding one with nqp::push($scan, $q[0]); won't break anything on backends that don't support this optimization yet (which, after just this patch, was all of them). Running make test confirmed that.

My original patch did not contain the line $scan.subtype($q.subtype);, and later on some unit tests started to fail, because regex matches can be case insensitive, but the index op works only case sensitive. For case insensitive matches, the $q.subtype of the literal regex node would be ignorecase, so that information needs to be carried on to the code generation backend.

Once that part was in place, and some debug nqp::say() statements confirmed that it indeed worked, it was time to look at the code generation. For the parrot backend, it looked like this:

    method scan($node) {
        my $ops := self.post_new('Ops', :result(%*REG<cur>));
        my $prefix := self.unique('rxscan');
        my $looplabel := self.post_new('Label', :name($prefix ~ '_loop'));
        my $scanlabel := self.post_new('Label', :name($prefix ~ '_scan'));
        my $donelabel := self.post_new('Label', :name($prefix ~ '_done'));
        $ops.push_pirop('repr_get_attr_int', '$I11', 'self', %*REG<curclass>, '"$!from"');
        $ops.push_pirop('ne', '$I11', -1, $donelabel);
        $ops.push_pirop('goto', $scanlabel);
        $ops.push($looplabel);
        $ops.push_pirop('inc', %*REG<pos>);
        $ops.push_pirop('gt', %*REG<pos>, %*REG<eos>, %*REG<fail>);
        $ops.push_pirop('repr_bind_attr_int', %*REG<cur>, %*REG<curclass>, '"$!from"', %*REG<pos>);
        $ops.push($scanlabel);
        self.regex_mark($ops, $looplabel, %*REG<pos>, 0);
        $ops.push($donelabel);
        $ops;
    }

While a bit intimidating at first, staring at it for a while quickly made clear what kind of code it emits. First three labels are generated, to which the code can jump with goto $label: One as a jump target for the loop that increments the cursor position ($looplabel), one for doing the regex match at that position ($scanlabel), and $donelabel for jumping to when the whole thing has finished.

Inside the loop there is an increment (inc) of the register that holds the current position (%*REG<pos>); that position is compared to the end-of-string position (%*REG<eos>), and if it is larger, the cursor is marked as failed.

So the idea is to advance the position by one, and then instead of doing the regex match immediately, call the index op to find the next position where the regex might succeed:

--- a/src/vm/parrot/QAST/Compiler.nqp
+++ b/src/vm/parrot/QAST/Compiler.nqp
@@ -1564,7 +1564,13 @@ class QAST::Compiler is HLL::Compiler {
         $ops.push_pirop('goto', $scanlabel);
         $ops.push($looplabel);
         $ops.push_pirop('inc', %*REG<pos>);
-        $ops.push_pirop('gt', %*REG<pos>, %*REG<eos>, %*REG<fail>);
+        if nqp::elems($node.list) && $node.subtype ne 'ignorecase' {
+            $ops.push_pirop('index', %*REG<pos>, %*REG<tgt>, self.rxescape($node[0]), %*REG<pos>);
+            $ops.push_pirop('eq', %*REG<pos>, -1, %*REG<fail>);
+        }
+        else {
+            $ops.push_pirop('gt', %*REG<pos>, %*REG<eos>, %*REG<fail>);
+        }
         $ops.push_pirop('repr_bind_attr_int', %*REG<cur>, %*REG<curclass>, '"$!from"', %*REG<pos>);
         $ops.push($scanlabel);
         self.regex_mark($ops, $looplabel, %*REG<pos>, 0);

The index op returns -1 on failure, so the condition for a cursor fail are slightly different than before.

And as mentioned earlier, the optimization can only be safely done for matches that don't ignore case. Maybe with some additional effort that could be remedied, but it's not as simple as case-folding the target string, because some case folding operations can change the string length (for example ß becomes SS when uppercased).

After successfully testing the patch, I came up with a small, artificial benchmark designed to show a difference in performance for this particular case. And indeed, it sped it up from 647 ± 28 µs to 161 ± 18 µs, which is roughly a factor of four.

You can see the whole thing as two commits on github.

What remains to do is implementing the same optimization on the JVM and MoarVM backends, and of course other optimizations. For example the Perl 5 regex engine keeps track of minimal and maximal string lengths for each subregex, and can anchor a regex like /a?b?longliteral/ to 0..2 characters before a match of longliteral, and generally use that meta information to fail faster.

But for now I am mostly encouraged that doing a worthwhile optimization was possible in a single evening without any black magic, or too intimate knowledge of the code generation.

Update: the code generation for MoarVM now also uses the index op. The logic is the same as for the parrot backend, the only difference is that the literal needs to be loaded into a register (whose name fresh_s returns) before index_s can use it.

Perlgeek.de : Quo Vadis Perl?

Over the last two days we had a gathering in a town named Perl (yes, a place with that name exists). It's a lovely little town close to the borders with France and Luxembourg, and our meeting was titled "Perl Reunification Summit".

Sadly I only managed to arrive in Perl on Friday late in the night, so I missed the first day. Still it was totally worth it.

We tried to answer the question of how to make the Perl 5 and the Perl 6 community converge on a social level. While we haven't found the one true answer to that, we did find that discussing the future together, both on a technical and on a social level, already brought us closer together.

It was quite a touching moment when Merijn "Tux" Brand explained that he had been skeptical of Perl 6 before the summit, and now sees it as the future.

We also concluded that copying API design is a good way to converge on a technical level. For example Perl 6's IO subsystem is in desperate need of a cohesive design. However, neither the Perl 6 specification team nor the Rakudo development team has much experience in that area, and copying from successful Perl 5 modules is a viable approach here. Path::Class and IO::All (excluding the crazy parts) were mentioned as targets worth looking at.

There is now also an IRC channel to continue our discussions -- join #p6p5 on irc.perl.org if you are interested.

We also discussed ways to bring parallel programming to both perls. I missed most of the discussion, but did hear that one approach is to make it easier to send serialized objects to other processes, and thus distribute work among several cores.

Patrick Michaud gave a short ad-hoc presentation on implicit parallelism in Perl 6. There are several constructs where the language allows parallel execution, for example hyper operators, junctions and feeds (think of feeds as UNIX pipes, but ones that allow passing of objects and not just strings). Rakudo doesn't implement any of them in parallel right now, because the Parrot Virtual Machine does not provide the necessary primitives yet.

Besides the "official" program, everybody used the time in meat space to discuss their favorite projects with everybody else. For example I took some time to discuss the future of doc.perl6.org with Patrick and Gabor Szabgab, and the relation to perl6maven with the latter. The Rakudo team (which was nearly completely present) also discussed several topics, and I was happy to talk about the relation between Rakudo and Parrot with Reini Urban.

Prior to the summit my expectations were quite vague. That's why it's hard for me to tell if we achieved what we and the organizers wanted. Time will tell, and we want to summarize the result in six to nine months. But I am certain that many participants have changed some of their views in positive ways, and left the summit with a warm, fuzzy feeling.

I am very grateful to have been invited to such a meeting, and enjoyed it greatly. Our host and organizers, Liz and Wendy, took care of all of our needs -- travel, food, drinks, space, wifi, accommodation, more food, entertainment, food for thought, you name it. Thank you very much!

Update: Follow the #p6p5 hash tag on twitter if you want to read more, I'm sure other participants will blog too.

Other blogs posts on this topic: PRS2012 – Perl5-Perl6 Reunification Summit by mdk and post-yapc by theorbtwo

Dave's Free Press: Journal: Wikipedia handheld proxy

Dave's Free Press: Journal: Bryar security hole

Dave's Free Press: Journal: POD includes

Perlgeek.de : First day at YAPC::Europe 2013 in Kiev

Today was the first "real" day of YAPC Europe 2013 in Kiev. In the same sense that it was the first real day, we had quite a nice "unreal" conference day yesterday, with a day-long Perl 6 hackathon, and in the evening a pre-conference meeting at a Soviet-style restaurant with tasty food and beverages.

The talks started with a few words of welcome, and then the announcement that the YAPC Europe next year will be in Sofia, Bulgaria, with the small side note that there were actually three cities competing for that honour. Congratulations to Sofia!

Larry's traditional keynote was quite emotional, and he had to fight tears a few times. Having had cancer and related surgeries in the past year, he still does his perceived duty to the Perl community, which I greatly appreciate.

Afterwards Dave Cross talked about 25 years of Perl in 25 minutes, which was a nice walk through some significant developments in the Perl world, though a bit hasty. Maybe picking fewer events and spending a bit more time on the selected few would give a smoother experience.

Another excellent talk that ran out of time was on Redis. Having experimented a wee bit with Redis in the past month, this was a real eye-opener on the wealth of features we might have used for a project at work, but in the end we didn't. Maybe we will eventually revise that decision.

Ribasushi talked about how hard benchmarking really is, and while I was (in principle) aware of that fact that it's hard to get right, there were still several significant factors that I overlooked (like the CPU's tendency to scale frequency in response to thermal and power-management considerations). I also learned that I should use Dumbbench instead of the Benchmark.pm core module. Sadly it didn't install for me (Capture::Tiny tests failing on Mac OS X).

The Perl 6 is dead, long live Perl 5 talk was much less inflammatory than the title would suggest (maybe due to Larry touching on the subject briefly during the keynote). It was mostly about how Perl 5 is used in the presenter's company, which was mildly interesting.

After tasty free lunch I attended jnthn's talk on Rakudo on the JVM, which was (as is typical for jnthn's talk) both entertaining and taught me something, even though I had followed the project quite a bit.

Thomas Klausner's Bread::Board by example made me want to refactor the OTRS internals very badly, because they are full of the kind of anti-patterns that Bread::Board addresses in a much better way. I think the OTRS code base is big enough to warrant the use of Bread::Board.
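To illustrate what I mean (a minimal sketch in the style of Bread::Board's documented synopsis; the container, service names, and classes are invented for this example):

use Bread::Board;

# Declare how the application's parts fit together, in one place...
my $c = container 'MyApp' => as {
    service 'dsn' => 'dbi:SQLite:dbname=app.db';

    service 'database' => (
        class        => 'MyApp::Database',
        dependencies => { dsn => depends_on('dsn') },
    );
};

# ...and resolve fully wired objects from the container, instead of
# scattering construction logic (the anti-pattern) all over the code base.
my $db = $c->resolve( service => 'database' );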

I enjoyed Denis' talk on Method::Signatures, and was delighted to see that most of its syntax is directly copied from Perl 6's signature syntax. Talk about Perl 6 sucking creativity out of Perl 5 development.
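If you haven't seen it, the syntax looks like this (a minimal sketch; the class and methods are invented, but the signature features shown are from the module's documentation):

package Point;
use Method::Signatures;

# Named invocant before the colon, just like in Perl 6.
method new($class: $x, $y) {
    return bless { x => $x, y => $y }, $class;
}

# Default values in the signature; $self is provided implicitly.
method scale($factor = 2) {
    $self->{x} *= $factor;
    $self->{y} *= $factor;
    return $self;
}

The invocant-with-colon and default-value syntax map almost one-to-one onto their Perl 6 equivalents.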

The conference ended with a session of lightning talks, something I always enjoy. Many of them had a slightly funny tone or undertone, while still talking about interesting stuff.

Finally there was the "kick-off party", with beverages and snacks sponsored by booking.com. There (and really throughout the whole day, and yesterday too) I not only had conversations with my "old" Perl 6 friends, but also talked with many interesting people I had never met before, or had only met online.

So all in all it was a nice experience, both on the social side and in the quality and content of the talks. The venue and the food are good, and so is the wifi, except when it stops working for a few minutes.

I'm looking forward to two more days of conference!

(Updated: Fixed Thomas' last name)

Dave's Free Press: Journal: cgit syntax highlighting

Dave's Free Press: Journal: CPAN Testers' CPAN author FAQ

Perlgeek.de : Correctness in Computer Programs and Mathematical Proofs

While reading On Proof and Progress in Mathematics by Fields Medal winner Bill Thurston (recently deceased, I was sorry to hear), I came across this gem:

The standard of correctness and completeness necessary to get a computer program to work at all is a couple of orders of magnitude higher than the mathematical community’s standard of valid proofs. Nonetheless, large computer programs, even when they have been very carefully written and very carefully tested, always seem to have bugs.

I have noticed that mathematicians are often sloppy about the scope of their symbols. Sometimes they use the same symbol for two different meanings, and you have to guess from context which one is meant.

This kind of sloppiness generally doesn't have an impact on the validity of the ideas that are communicated, as long as it's still understandable to the reader.

I guess one reason is that most mathematical publications still stick to one-letter symbol names, and there aren't that many letters in the alphabets generally accepted for use (Latin, Greek, a few letters from Hebrew). Meanwhile, in the programming world we snort derisively at FORTRAN 77 for limiting variable names to six characters.

Dave's Free Press: Journal: Thankyou, Anonymous Benefactor!

Dave's Free Press: Journal: Number::Phone release

Dave's Free Press: Journal: Ill

Dave's Free Press: Journal: CPANdeps upgrade

Dave's Free Press: Journal: YAPC::Europe 2006 report: day 3

Perlgeek.de : iPod nano 5g on linux -- works!

For Christmas I got an iPod nano (5th generation). Since I use only Linux on my home computers, I searched the Internet for how well it is supported by Linux-based tools. The results looked bleak, but they were mostly from 2009.

Now (December 2012) on my Debian/Wheezy system, it just worked.

The iPod nano 5g presents itself as an ordinary USB storage device, which you can mount without problems. However, simply copying files onto it won't make the iPod show them in its playlists, because there is some metadata stored on the device that must be updated too.

There are several user-space programs that let you import and export music from and to the iPod, updating that metadata as necessary. The first one I tried, gtkpod 2.1.2, worked fine.

Other user-space programs reputed to work with the iPod are rhythmbox and amarok (both of which not only organize but also play music).

Although I don't think anything really depends on the particular versions here (except that you need a new enough version of gtkpod), here is what I used:

  • Architecture: amd64
  • Linux: 3.2.0-4-amd64 #1 SMP Debian 3.2.35-2
  • Userland: Debian GNU/Linux "Wheezy" (currently "testing")
  • gtkpod: 2.1.2-1
