Laufeyjarson writes... » Perl: PBP: 087 Extended Boilerplates

The Best Practices make some suggestions on additional things to include in your POD, and suggest those items get listed in your general boilerplates to fill out.  Some of the suggestions are interesting, but I’m not sure they belong in every POD ever.

I already think the templates given earlier are too verbose for many situations, and would rather see ways to be reminded of what sections exist instead of being forced to wade through them all and delete them every time I start a module.  Making modules should be easy, not painful.  It’s already hard enough to get engineers to organize their code.

That being said, some of the things listed in this section in the book are interesting, and to be reminded to include them is excellent.  While I don’t know if they belong in every module, they may belong somewhere in every project.

EXAMPLES: Isn’t the “Examples” section what the already-hard-to-manage Synopsis is supposed to be?

FAQ: I don’t think the FAQ belongs in every module.  Maybe in a POD for the project, and maybe on the web site or wiki.  Depends on how the project is being organized and the context it is for.

COMMON USAGE MISTAKES:  I like the note that this is “Frequently Unasked Questions” but I don’t want to put examples of how to do it wrong anywhere.  I think that’s what’ll show up in Internet searches, and it will cause more questions than it answers.

SEE ALSO: I loved the original Unix man(1) pages, because they had fantastic cross-references.  I learned Unix on a real Unix system where the manual pages were complete, cross-referenced, and included general background sections.  The modern Linux man pages are a pale, pathetic imitation of this.  (And GNU, with their drive to put things in the hard-to-use and opaque info tool, damaged this terribly.  I wish they’d stop.)  Whoops, rant over, sorry.  I just wish the linking and anchoring tools in POD were easier to work with and less cranky and verbose.

I just noticed a footnote in this section that raises my hackles a little too.  “By now you have no doubt detected the ulterior motive for providing more extensive user manuals and written advice. User documentation is all about not having to actually talk to users.”  This is a terrible reason to write documentation.  This continues the poor belief that engineers are different than other people and shouldn’t have to deal with them.  The reason to write documentation is to help those other people, so that they can get the most possible out of the program you’re working with.  You’re trying to make their lives better by making sure they have the knowledge to understand and use the system, not to make your life better by getting them to leave you alone.

perlancar's blog: pericmd 032: More on tab completion (4): Completing paths

There are several kinds of tree-like entities that can be addressed using a path. The filesystem is one; another is the hierarchy of Perl modules/packages (yet another is the Riap URL, which we haven’t really covered in depth, but suffice it to say that local Riap URLs also map to Perl packages in Perl-based applications). A module called Complete::Path is used as the backend to complete all of these kinds of paths. Each specific type of path is then completed using a higher-level function which uses Complete::Path, and they all support the same settings that Complete::Path respects.

Completing filesystem path

The complete_file function in Complete::Util can be used to complete filesystem paths (that is, files and directories). There is a filter option which can be a simple string like "d" to include only directories and not files, or "x" to include only files (and directories) that have their executable bit set; it can also be as complex as you want, since a coderef is accepted too.

Let’s try using the function directly. Suppose we have a directory containing these files:

% mkdir tmp
% cd tmp
% mkdir dir1 dir2 Dir3 dir2/dir4
% touch file1 file2-a file2_b File3 dir2/file4 dir2/dir4/file5

Then this code:

% perl -MComplete::Util=complete_file -MData::Dump -E'dd complete_file(word=>"d")'
["Dir3/", "dir1/", "dir2/"]

Note how directories are automatically appended with the path separator character (in this case, /). This is for convenience, so you can press Tab again directly to dig deeper into subdirectories without typing the path separator character manually.
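
To illustrate the filter option itself, here is a sketch using the same sample directory (the result contains only the directories, per the "d" filter described earlier; the ordering is assumed to match the previous example):

% perl -MComplete::Util=complete_file -MData::Dump -E'dd complete_file(word=>"", filter=>"d")'
["Dir3/", "dir1/", "dir2/"]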

The map_case option. complete_file() also accepts a map_case option (which is passed on to Complete::Path) which, if turned on (and by default it is), will regard underscore (_) and dash (-) as the same character. This is for convenience, so you can use a dash (which does not require pressing the Shift key on US keyboards) to complete words that use underscores as separators. Example:

% perl -MComplete::Util=complete_file -MData::Dump -E'dd complete_file(word=>"file2-")'
["file2-a", "file2_b"]

The exp_im_path option. exp_im_path is short for “expand intermediate paths” and is another convenience option which is turned on by default (it can be turned off globally by setting the environment variable COMPLETE_OPT_EXP_IM_PATH to 0). This option lets you type only one or a few characters of each intermediate path. For example:

% perl -MComplete::Util=complete_file -MData::Dump -E'dd complete_file(word=>"d/d/f")'
["dir2/dir4/file5"]

This is akin to a shell wildcard like d*/d*/f*.

Note that by default, expansion is only performed when each intermediate path is 1 or 2 characters long. As to why this is done, the documentation for the Complete module contains the gory details.

The dig_leaf option. This is another convenience option (again, it is turned on by default and can be turned off using COMPLETE_OPT_DIG_LEAF=0) which lets Complete::Path immediately dig several levels down if it finds only a single directory among the intermediate paths. For example:

% perl -MComplete::Util=complete_file -MData::Dump -E'dd complete_file(word=>"dir2/")'
["dir2/dir4/file5", "dir2/file4"]

Inside dir2 there is only a single file (file4) and a single subdirectory (dir4). Instead of settling for those, since there is only a single directory, Complete::Path will dig inside dir4 and add the files inside it to the completion answer. If dir4 in turn only contains a single subdirectory, the process is repeated. The effect is that if you have a deep directory structure, e.g. lib/TAP/Parser/Iterator/Stream.pm, and you happen to have only that single file and no other intermediate paths, you just have to type “lib” (or even “l/”, thanks to the exp_im_path setting) and voila, the whole path is completed with a single Tab press instead of you having to Tab-Tab-Tab your way into the deep directory.

Completing Perl module names

Perl module names can be completed using the complete_module function in Complete::Module module. Since Perl modules also form a hierarchical namespace, the function also calls Complete::Path::complete_path as its backend and shares the same support for options like exp_im_path and dig_leaf. Let’s see some examples:

% perl -MComplete::Module=complete_module -MData::Dump -E'dd complete_module(word=>"TA")'
{
  path_sep => "/",
  words => ["TAP/", "TableDef", "Taint/", "Task/Weaken", "tainting"],
}
% perl -MComplete::Module=complete_module -MData::Dump -E'dd complete_module(word=>"TAP::")'
{
  path_sep => "::",
  words => [
    "TAP::Base",
    "TAP::Formatter::",
    "TAP::Harness",
    "TAP::Harness::",
    "TAP::Object",
    "TAP::Parser",
    "TAP::Parser::",
  ],
}

Wait, why is the path separator still “/”; shouldn’t it be “::” (double colon)? Yes, this is for convenience when doing bash completion. The path separator only becomes “::” if the word already contains “::”; otherwise “/” is used. See the documentation of Complete::Module (or some of my old blog posts) for more details.

You can force using “::” by specifying path_sep argument:

% perl -MComplete::Module=complete_module -MData::Dump -E'dd complete_module(word=>"TA", path_sep=>"::")'
{
  path_sep => "::",
  words => ["TAP::", "TableDef", "Taint::", "Task::Weaken", "tainting"],
}

Also, why does the function return a hash structure instead of an array of words? This allows for setting metadata (like the path_sep key above) that is useful as a hint when formatting the completion. The hash completion answer structure will be discussed in the next blog post.

Another convenience that the function provides is some common shortcuts: “dzp” is automatically expanded to “Dist/Zilla/Plugin/”, “pws” to “Pod/Weaver/Section/”, and so on. The list of shortcuts can be customized, even from an environment variable.

Let’s see the complete_module() function in action in an actual CLI program. Install App::PMUtils from CPAN. It contains several CLI apps like pmversion or pmpath:

% pmversion t/ansit<tab>
% pmversion Text/ANSITable _
0.39

% pmpath dat<tab><tab>
Data/            DateTime         DateTimePP       
Date/            DateTime/        DateTimePPExtra  
% pmpath date/<tab>
% pmpath Date/<tab><tab>
Date/Format     Date/Language   Date/Language/  Date/Parse      
% pmpath Date/f<tab>
% pmpath Date/Format _
/home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Date/Format.pm

% pmversion dzb<tab>
% pmversion Dist/Zilla/PluginBundle/<tab>
% pmversion Dist/Zilla/PluginBundle/a/perla<tab>
% pmversion Dist/Zilla/PluginBundle/Author/PERLANCAR _
0.33

brian d foy: What could a reddit bot do with feedback?

David Farrell is conducting some Reddit experiments with his Perly::Bot. Through the _perly_bot user, he automagically injects things into the Perl subreddit, making it a bit more like a feed aggregator. But that's just a start.

I like reddit because it allows for casual feedback, either up or down (unlike Likes). Some of that information could feed back into the bot. If someone's entries on blogs.perl.org, for instance, are consistently disfavored, the bot could neglect to inject them. There's a lot of interesting math and algorithms around this sort of thing, plus counter-gaming and counter-counter-gaming. I'm sure some of you reading this know how to do that stuff and would have fun adding those features.

Perl Foundation News: Grant Extension Request: Maintaining Perl 5

Tony Cook has requested an extension of $20,000 for his Maintaining Perl 5 grant. This grant has been running successfully since July 2013. The requested extension would allow Tony to devote another 400 hours to the project. The funds for this extension would come from the Perl 5 Core Maintenance Fund.

As well as posting reports on the p5p mailing list, Tony provides detailed monthly reports, the most recent of which can be found in the following blog posts:

January 2015
December 2014
November 2014

Before we make a decision on this extension we would like to have a period of community consultation that will last for seven days. Please leave feedback in the comments or, if you prefer, email your comments to karen at perlfoundation.org.

Ovid: Perl 6 for Mere Mortals - FOSDEM Video

My FOSDEM talk, Perl 6 for Mere Mortals, is now online:

You can see the rest of the Perl dev room videos here. Sadly, there were some technical problems, so a couple of videos have audio issues.

All of the FOSDEM videos will eventually be here, but not all of them are ready yet.

PAL-Blog: Lieber ein Ende...

...better an end with terror than a terror without end. Proverbs are great things, but sometimes they just get annoying. It is days like this one that make you think - and that scare you.

perlancar's blog: pericmd 031: More on tab completion (3): case sensitivity, Complete

Continuing from the previous post’s example, you’ll notice that by default tab completion is case-insensitive:

% mycomp2 --baz h<tab>
% mycomp2 --baz H<tab><tab>
HISTCONTROL     HISTIGNORE      HISTTIMEFORMAT  
HISTFILESIZE    HISTSIZE        HOME            

This is because most completion routines, including complete_env() used above and complete_array_elem() often used in custom completion routines, offer a ci (case-insensitive) option which defaults to $Complete::OPT_CI, which in turn defaults to the environment variable COMPLETE_OPT_CI, or 1 if that is not set.
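
As a quick sketch of what this option does (the array values below are just hypothetical candidates; output not shown), compare:

% perl -MComplete::Util=complete_array_elem -MData::Dump -E'dd complete_array_elem(word=>"h", array=>["HOME","HISTSIZE","hostname"], ci=>1)'

% perl -MComplete::Util=complete_array_elem -MData::Dump -E'dd complete_array_elem(word=>"h", array=>["HOME","HISTSIZE","hostname"], ci=>0)'

The first call should offer all three names, while the second should offer only hostname.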

If you turn off case-insensitivity, e.g. by:

% export COMPLETE_OPT_CI=0

then the above completion will no longer work:

% mycomp2 --baz h<tab><tab>

Alternatively if you need to set case (in-)sensitivity specifically in a routine, you can turn it on or off explicitly. For example:

# explicitly turn on case-insensitivity, regardless of COMPLETE_OPT_CI or $Complete::OPT_CI setting
complete_array_elem(array=>\@array, word=>$args{word}, ci=>1);

There are several other settings in Complete that are observed by the other Complete::* modules. Most of these settings’ defaults are optimized for convenience. This will be covered in the next blog post.


Perl Foundation News: Maintaining Perl 5: Grant Report for January 2015

Tony Cook writes:

Approximately 61 tickets were reviewed or worked on, and 11 patches were applied.

There were no especially interesting tickets this month.

Hours  Activity
 1.60  cpan #101078 create/test bisect script and start bisect; cpan #101078 review bisect results, comment
 0.27  #120357 (security) research and comment
 0.52  #122432 review status; #122432 summarize status of cpan dists
 2.25  #122443 testing, polish
 1.48  #122730 bang head against dzil, try some simple fixes, create github issue
 0.20  #123065 apply to blead
 0.83  #123218 produce a patch
 0.15  #123315 comment
 2.13  #123341 testing, review code and comment
 2.78  #123394 review, cleanup, testing, push to blead; #123394 review discussion and comment
 2.52  #123413 review latest patch, look into sdbm history, comment; #123413 review, testing, fixes, push to blead, comment
 0.23  #123437 review and comment
 0.58  #123443 look for similar write() bug and work on fix
 0.18  #123512 review and comment
 1.17  #123528 review patch, discussion of win32 GetVersionEx() behaviour
 1.03  #123532 testing, review code, comment
 3.03  #123538 diagnose, produce a patch, comment; #123538 test, apply to blead, check 123622 and lots of win32 test failure code in between
 3.34  #123542 reproduce, debugging, try to reduce test case; #123542 reduce test case size, try to understand the parser
 0.67  #123549 review, research, testing, comment
 1.60  #123551 try to work out afl-gcc/blead build issues
 4.00  #123554 diagnose, debugging and comment; #123554 produce a better patch, checking code, comment; #123554 review, re-test and apply to blead
 0.57  #123555 review, research and comment; #123555 review discussion and mark as rejected
 0.33  #123562 try to understand code
 1.52  #123566 review, testing, apply to blead and comment
 0.45  #123575 review, testing
 0.50  #123580 review discussion
 0.22  #123585 review ticket and code
 0.97  #123591, 123538, test, add tests to 123538 patch
 1.84  #123599 research and comment; #123599 follow-up comment
 0.45  #123600 review smoke results, re-test and push to blead
 0.25  #123605 review, test and apply to blead
 0.83  #123606 review, test, apply to blead and comment
 0.53  #123620 review and briefly comment; #123620 review
 1.13  #123632 review, check history/usage of Opcode, comment; #123632 test and apply to blead
 0.52  #123634 review, test and apply to blead
 0.20  #123635 review discussion and patch and reject
 6.96  #123638 (security) review, discussion, attempt to fix; #123638 debugging; #123638 code review; #123638 testing
 0.40  #123658 review discussion and patch, comment
 1.79  #123672 review, find problem not in v5.14, start bisect; #123672 fix bisect
 0.40  #123675 review discussion, research, comment
 1.05  #123677 try to debug
 0.35  #123682 review and briefly comment
 1.30  #123683 try to visually track down commit, leont beats me to commenting, start bisect
 0.35  #123689 comment
 0.40  #123693 review patch and comment
 0.47  #36248 try to understand cause and boggle at encoding.pm
 0.97  64-bit gcc/win32 build issue
 3.18  check out recent gcc Win32 build issues
 1.30  cygwin op/repeat.t issue
 1.28  fix SDBM_File build on gcc/win32
 1.35  look into Win32 test failures, reproduce, start bisect
 0.83  More 5.20.2
 0.95  more gcc Win32
 0.52  Plack-App-PHPCGI setup
 2.68  Review 5.20.2 votes list
 0.55  review and comment on character/bin data thread
 1.40  review bisect result and fix bisect code
 0.40  review maint-5.20 votes
 0.45  win32 sdbm issues
 0.32  win32 test failure
 2.83  win32 unthreaded sdbm_file build issues, discussion, diagnosis, fix, testing

73.35  Hours Total

Sebastian Riedel about Perl and the Web: Mojoconf 2015

I’m excited to announce that this year’s Mojoconf will be held in New York City, from the 4th to 6th of June 2015. Right before YAPC::NA, so you can stop by on your way there. And just like last year, we will have one day of training, one day of talks, and a hackathon to wrap everything up. We hope to see you there!

perlancar's blog: pericmd 030: More on tab completion (2): Completing arguments, element_completion

Like the previous post, this blog post still focuses on tab completion, particularly on completing arguments.

Aside from option names and option values, Perinci::CmdLine can also complete command-line arguments. In Perinci::CmdLine, command-line arguments are also fed to the function as function arguments. The function argument to which the command-line argument(s) will be fed must be declared with the pos (positional) property, and optionally with the greedy property. Example:

use Perinci::CmdLine::Any;

our %SPEC;
$SPEC{mycomp2} = {
    v => 1.1,
    args => {
        foo => {
            schema => 'str*',
            pos => 0,
            req => 1,
        },
        bar => {
            schema => 'str*',
            pos => 1,
        },
        baz => {
            schema => 'str*',
        },
    },
};
sub mycomp2 {
    my %args = @_;
    [200, "OK", join(
        "",
        "foo=", $args{foo}//'', " ",
        "bar=", $args{bar}//'', " ",
        "baz=", $args{baz}//'',
    )];
}

Perinci::CmdLine::Any->new(
    url => '/main/mycomp2',
)->run;

In the above program, the argument foo will map to the first command-line argument (pos=0), bar to the second command-line argument (pos=1), while baz does not map to any command-line argument (it must be specified as a command-line option, e.g. --baz val). Of course, the positional arguments can also be specified as command-line options, although they cannot be given as both a command-line option and a command-line argument at the same time.

% mycomp2
ERROR 400: Missing required argument(s): foo

% mycomp2 1
foo=1 bar= baz=

% mycomp2 --foo 1
foo=1 bar= baz=

% mycomp2 1
ERROR 400: You specified option --foo but also argument #0

% mycomp2 1 2
foo=1 bar=2 baz=

% mycomp2 1 --bar 2
foo=1 bar=2 baz=

% mycomp2 1 --bar 2 2
ERROR 400: You specified option --bar but also argument #1

% mycomp2 1 2 --baz 3
foo=1 bar=2 baz=3

% mycomp2 1 2 3
ERROR 400: There are extra, unassigned elements in array: [3]

As you can see from the last example, Perinci::CmdLine will complain if there are extra arguments that are not assigned to any function argument. What if we want a function argument to slurp all the remaining command-line arguments? We can declare a function argument as an array, set the pos property, and set the greedy property to true to express that the argument is slurpy.

use Perinci::CmdLine::Any;

our %SPEC;
$SPEC{mycomp2} = {
    v => 1.1,
    args => {
        foo => {
            schema => 'str*',
            pos => 0,
            req => 1,
        },
        bar => {
            schema => ['array*', of=>'str*'],
            pos => 1,
            greedy => 1,
        },
        baz => {
            schema => 'str*',
        },
    },
};
sub mycomp2 {
    my %args = @_;
    [200, "OK", join(
        "",
        "foo=", $args{foo}//'', " ",
        "bar=", ($args{bar} ? "[".join(",",@{$args{bar}})."]" : ''), " ",
        "baz=", $args{baz}//'',
    )];
}

Perinci::CmdLine::Any->new(
    url => '/main/mycomp2',
)->run;

When run:

% mycomp2 1
foo=1 bar= baz=

% mycomp2 1 2
foo=1 bar=[2] baz=

% mycomp2 1 2 3 4
foo=1 bar=[2,3,4] baz=

Now, since command-line arguments map to function arguments, to specify completion for them we just need to add a completion property to the argument specification in the metadata, just like for any other argument.

#!/usr/bin/env perl

use Complete::Util qw(complete_array_elem complete_env);
use Perinci::CmdLine::Any;

our %SPEC;
$SPEC{mycomp2} = {
    v => 1.1,
    args => {
        foo => {
            schema => 'str*',
            pos => 0,
            req => 1,
            cmdline_aliases => {f=>{}},
            completion => sub {
                my %args = @_;
                complete_array_elem(
                    word  => $args{word},
                    array => [qw/apple banana blackberry blueberry/],
                ),
            },
        },
        bar => {
            schema => ['array*', of=>'str*'],
            pos => 1,
            greedy => 1,
            element_completion => sub {
                my %args = @_;
                complete_array_elem(
                    word    => $args{word},
                    array   => [qw/snakefruit durian jackfruit/],
                    exclude => $args{args}{bar},
                );
            },
        },
        baz => {
            schema => 'str*',
            completion => \&complete_env,
        },
    },
};
sub mycomp2 {
    my %args = @_;
    [200, "OK", join(
        "",
        "foo=", $args{foo}//'', " ",
        "bar=", ($args{bar} ? "[".join(",",@{$args{bar}})."]" : ''), " ",
        "baz=", $args{baz}//'',
    )];
}

Perinci::CmdLine::Any->new(
    url => '/main/mycomp2',
)->run;

Completion works whether a function argument is fed in as a command-line option (including via aliases) or as a command-line argument. Let’s test the completion for foo (note: from this point onwards, I assume you have activated bash completion for the script, as described in the previous post, pericmd 029):

% mycomp2 <tab><tab>
-\?               blackberry        --format          --no-config
apple             blueberry         -h                -v
banana            --config-path     --help            --version
--bar             --config-profile  --json            
--baz             --foo             --naked-res      

% mycomp2 b<tab>
banana      blackberry  blueberry   

% mycomp2 --foo <tab><tab>
apple       banana      blackberry  blueberry   

% mycomp2 -f <tab><tab>
apple       banana      blackberry  blueberry   

From the last program listing, you’ll see several new things. The first is that the bar argument uses the element_completion property (line 27) instead of completion. This is because bar itself is an argument of type array (of strings), and we are completing its elements, not the array itself:

% mycomp2 --bar <tab><tab>
durian      jackfruit   snakefruit  

% mycomp2 --bar durian --bar <tab><tab>
jackfruit   snakefruit  

% mycomp2 --bar durian --bar jackfruit --bar <tab>
% mycomp2 --bar durian --bar jackfruit --bar snakefruit _

You’ll also notice that once a bar value has been specified, that choice is removed from the offered completions for subsequent --bar option values. This is because we use the exclude option of complete_array_elem() (line 32). $args{args} contains the function arguments that have been formed at that point.

And lastly, in line 38, you see a new function, complete_env, which completes from environment variable names. Since complete_env(), like any completion routine, expects a hash of arguments and its only required argument is also word, we can pass the subroutine reference directly. Let’s see it in action:

% mycomp2 --baz H<tab><tab>
HISTCONTROL     HISTIGNORE      HISTTIMEFORMAT  
HISTFILESIZE    HISTSIZE        HOME            

Perl Foundation News: Call For Grant Proposals (March 2015 Round)

The Grants Committee is accepting grant proposals all the time. We evaluate them every two months and another evaluation period has come.

If you have an idea for doing some Perl work that will benefit the Perl community, consider sending a grant application. The application deadline for this round is 23:59 March 15th UTC. We will publish the received applications, get community feedback and conclude the acceptance by March 30th.

To apply, please read How to Write a Proposal. Rules of Operation will also help you understand how the grant process works. For those who are familiar with the process, the format will be the same as the previous rounds in 2014-2015.

We will confirm the receipt of application within 24 hours.

If you have further questions, please comment here. If your comment does not show up here within 24 hours, the chances are that the spam filter did something bad. Get in touch with me at tpf-grants-secretary at perl-foundation.org.

Laufeyjarson writes... » Perl: PBP: 086 Boilerplates

The Best Practices suggest creating boilerplates for POD documentation.  They helpfully provide some examples, and suggest differentiating between modules and applications.  I can’t argue with these ideas, particularly when trying to get a group to standardize on a set, but they are not as clear-cut wins in my mind as the book makes them out to be.

One thing the book suggests is that programmers don’t write documentation because of the “empty page” syndrome – they just don’t know what to write and need some structure to fill it in.  That doesn’t match my experience as well as many of the other observations in the book.  My experience is that engineers are outright hostile to writing documentation, and will leave the templates blank or just remove them.

My personal thought on this relates to the left brain/right brain theory.  One half of the brain is logical and organized, the other half is creative, to oversimplify hugely.  I think that many engineers, when writing code, are deep in the logical/symbolic space.  They’re working where they can manage the complex state of the program in their minds.  At that moment, their language skills are shut off so they can write code.  Turning them back on is effort, and breaks them out of the mental state needed to do coding.  (This also helps explain why some error messages are so horrible – they make perfect sense if you know the state of the code, and none at all from a user’s perspective.)

Engineers seem to fight this for three reasons that I’ve noticed.

1> Coding is the fun part.  You’re in the middle of having fun, and suddenly someone wants you to stop and do this boring thing you don’t want to do.  They do not stop the roller coaster in the middle and make you fill out a tax form to keep going.  Documentation is that same kind of interruption.

2> It interrupts the work.  Switching between two mind sets is effort, and they don’t want to do it.

3> They genuinely don’t understand why it’s needed, and think anyone who does need it is “weak” and should just suck it up and read the code, like they did.

My experience is that real professionals understand documentation is important, and will write it.  They will put in vague stubs or leave the boilerplate while they’re deep in symbolic thought and then come back as a second pass and write the needed words – both user documentation and technical documentation.  They often re-organize or reformat the code at the same time.

How does that long digression relate to boilerplate?

If you provide boilerplate, especially verbose boilerplate such as that found in the book, engineers will either never fill it out or will fill it out in the worst possible way.  Boilerplate needs to be a guideline and a reminder, “Hey, a license section is important, and here’s what ours looks like,” not something cast in stone that says EVERY PROGRAM MUST HAVE A COMPLETE SET OF EVERY OPTION EVER SEEN IN POD.  Guess which way most corporations lean?

Being forced to fill in complex sections of POD that mean nothing for the application you’re working on is not helpful, and in fact makes it harder to face writing the documentation in the first place.

Much better, in my opinion, would be smarter tools.  A POD checker that checks for minimum required sections – wait, you can use perlcritic to do this! – plus ways to easily preview your document as the rendered POD, as well as a list of all the common POD sections so you can realize, “Oh, yeah, this does have dependencies, I should list those,” instead of having to throw unused boilerplate away in every other document.
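
For what it’s worth, here is a minimal .perlcriticrc sketch of that perlcritic approach, assuming the stock Documentation::RequirePodSections policy and its lib_sections/script_sections parameters (the section lists are just an example of what a team might require):

[Documentation::RequirePodSections]
lib_sections    = NAME | SYNOPSIS | DESCRIPTION | AUTHOR
script_sections = NAME | USAGE | AUTHOR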

But, maybe I’m wrong.

And having said this, I now wonder if it’s possible to extend my favorite editor to do that.

perlancar's blog: pericmd 029: More on tab completion (1)

The next several blog posts will focus on tab completion.

Let’s get right to it with a simple example. Put the code below in a file named mycomp, chmod +x the file, and put it somewhere in your PATH (e.g. /usr/local/bin, or $HOME/bin if your PATH happens to have $HOME/bin as an entry):

#!/usr/bin/env perl

use 5.010;
use strict;
use warnings;

use Perinci::CmdLine::Any;

our %SPEC;
$SPEC{mycomp} = {
    v => 1.1,
    args => {
        int1 => {
            schema => [int => min=>1, max=>30],
        },
        str1 => {
            schema => [str => in=>[qw/foo bar baz qux quux/]],
        },
        str2 => {
            schema => ['str'],
        },
    },
};
sub mycomp {
    [200];
}

Perinci::CmdLine::Any->new(
    url => '/main/mycomp',
)->run;

Activate bash completion by executing this command in your shell:

% complete -C mycomp mycomp

If your script happens to live outside PATH, e.g. in /path/to/mycomp, you can instead use:

% complete -C /path/to/mycomp mycomp

but normally your CLI programs will reside in PATH, so the above command is for testing only.

Now to test completion:

% mycomp <tab><tab>
-\?               .gitignore        --json            perl-App-hello/
--config-path     -h                mycomp            --str1
--config-profile  hello             --naked-res       -v
--format          --help            --no-config       --version
.git/             --int1            pause/      

As you can see, by default Perinci::CmdLine gives you a list of known options as well as files and directories in the current directory.

% mycomp -<tab><tab>
-\?               -h                --naked-res       --version
--config-path     --help            --no-config       
--config-profile  --int1            --str1            
--format          --json            -v  

If the current word (the word being completed at the cursor) is “-“, Perinci::CmdLine assumes that you want to complete an option name so it doesn’t give a list of files/dirs. (What if, in the rare case, there is a file beginning with a dash and you want to complete it? You can use ./-.)

If the option name can be completed unambiguously:

% mycomp --i<tab><tab>

then it will be completed directly without showing a list of completion candidates (the underscore _ shows the location of the cursor):

% mycomp --int1 _

Perinci::CmdLine can also complete option values. Now let’s press tab again to complete:

% mycomp --int1 <tab><tab>
1   11  13  15  17  19  20  22  24  26  28  3   4   6   8   
10  12  14  16  18  2   21  23  25  27  29  30  5   7   9   

From the argument schema ([int => min=>1, max=>30]), Perinci::CmdLine can provide a list of numbers from 1 to 30 as completion candidates. Now let’s try another argument:

% mycomp --str1=<tab><tab>
bar   baz   foo   quux  qux   

The schema ([str => in=>[qw/foo bar baz qux quux/]]) also helps Perinci::CmdLine provide a completion list. Now another argument:

% mycomp --str2 <tab><tab>
.git/            hello            mycomp~          perl-App-hello/  
.gitignore       mycomp           pause/           

What happened? Since the schema (['str']) doesn’t provide any hints about possible values, Perinci::CmdLine falls back to completing using files/dirs in the current directory. Of course, you can also do something like:

% mycomp --str2 ../../foo<tab><tab>

to list other directories.

This is all nice and good, but the power of tab completion comes with custom completion: when we are able to provide our own completion for option values (and arguments). Let’s try that by adding a completion routine to our Rinci metadata:

use Complete::Util qw(complete_array_elem);

$SPEC{mycomp} = {
    v => 1.1,
    args => {
        int1 => {
            schema => [int => min=>1, max=>30],
            completion => sub {
                my %args = @_;
                my $word = $args{word};

                # let's provide a list of numbers from 1 to current day of month
                my $mday = (localtime)[3];
                complete_array_elem(word=>$word, array=>[1..$mday]);
            },
        },
        str1 => {
            schema => [str => in=>[qw/foo bar baz qux quux/]],
        },
        str2 => {
            schema => ['str'],
        },
    },
};

You see a couple of new things here. First is the completion routine, supplied in the completion property of the argument specification. A completion routine receives a hash of arguments (the most important one is word; there are other arguments and we will get to them later). A completion routine is expected to return an array of words or a hash (see Complete for the specification of the “completion answer”). Second is the use of the Complete::Util module and its complete_array_elem function, which returns the elements of an array filtered by $word as a prefix. The module contains some more utility functions which we will discuss later.

Now let’s test it (assuming today is Feb 27th, 2015):

% mycomp --int1 <tab><tab>
1   11  13  15  17  19  20  22  24  26  3   5   7   9   
10  12  14  16  18  2   21  23  25  27  4   6   8   

Debugging completion

When we write completion code, we might make mistakes. For example, suppose we forget the use Complete::Util qw(complete_array_elem); line; when we test it, we might get an unexpected result:

% mycomp --int1 <tab><tab>
.git/            hello            mycomp~          perl-App-hello/  
.gitignore       mycomp           pause/   

Why is Perinci::CmdLine showing files/dirs from current directory instead?

To help debug problems when doing custom completion, you can use the testcomp utility (install it via cpanm App::CompleteUtils). To use testcomp, specify the command and arguments and put ^ (caret) to signify where the cursor is supposed to be. So type:

% testcomp mycomp --int1 ^
[testcomp] COMP_LINE=<mycomp --int1 >, COMP_POINT=14
[testcomp] exec(): ["/mnt/home/s1/perl5/perlbrew/perls/perl-5.18.4/bin/perl","-MLog::Any::Adapter=ScreenColoredLevel","mycomp"]
[pericmd] -> run(), @ARGV=[]
[pericmd] Checking env MYCOMP_OPT: <undef>
[pericmd] Running hook_after_get_meta ...
[comp][periscomp] entering Perinci::Sub::Complete::complete_cli_arg(), words=["--int1",""], cword=1, word=<>
[comp][compgl] entering Complete::Getopt::Long::complete_cli_arg(), words=["--int1",""], cword=1, word=<>
[comp][compgl] invoking routine supplied from 'completion' argument to complete option value, option=<--int1>
[comp][periscomp] entering completion routine (that we supply to Complete::Getopt::Long)
[comp][periscomp] completing option value for a known function argument, arg=<int1>, ospec=<int1=i>
[comp][periscomp] invoking routine supplied from 'completion' argument
[comp][periscomp] result from 'completion' routine: <undef>
[comp][periscomp] entering complete_arg_val, arg=<int1>
[comp][periscomp] invoking routine specified in arg spec's 'completion' property
[comp][periscomp] completion died: Undefined subroutine &main::complete_array_elem called at mycomp line 22.
[comp][periscomp] no completion from metadata possible, declining
[comp][periscomp] leaving complete_arg_val, result=<undef>
[comp][periscomp] leaving completion routine (that we supply to Complete::Getopt::Long)
[comp][compgl] adding result from routine: <undef>
[comp][compgl] entering default completion routine
[comp][compgl] completing with file, file=<>
[comp][compgl] leaving default completion routine, result={path_sep => "/",words => [".git/",".gitignore","hello","mycomp","mycomp~","pause/","perl-App-hello/"]}
[comp][compgl] adding result from default completion routine
[comp][compgl] leaving Complete::Getopt::Long::complete_cli_arg(), result={path_sep => "/",words => [".git/",".gitignore","hello","mycomp","mycomp~","pause/","perl-App-hello/"]}
[comp][periscomp] leaving Perinci::Sub::Complete::complete_cli_arg(), result={path_sep => "/",words => [".git/",".gitignore","hello","mycomp","mycomp~","pause/","perl-App-hello/"]}
[pericmd] Running hook_display_result ...
.git/
.gitignore
hello
mycomp
mycomp~
pause/
perl-App-hello/
[pericmd] Running hook_after_run ...
[pericmd] exit(0)

From the debug output, you can see the error message and realize that the completion routine dies. You’ll also see that Perinci::CmdLine then falls back to using files/dirs.


dagolden: What to do if PAUSE tells you this distribution name can only be used by users with permission for X, which you do not have

Over the last year, a handful of CPAN authors have been bitten by PAUSE complaining that they don't have permissions for a distribution name they've uploaded.

What's going on? (short explanation)

PAUSE used to have a gaping security hole; it's now closed. As a result, when an author uploads a distribution with a name like Foo-Bar-Baz-1.23.tar.gz, the author must have primary or co-maintainer permissions on the package name matching the distribution (Foo::Bar::Baz, in this case) or else the distribution will not be indexed. It's still on CPAN, but won't be added to the index that allows people to easily install it.

How to fix it

If you are uploading Foo-Bar-Baz-1.23.tar.gz, make sure you have a "lib/Foo/Bar/Baz.pm" file containing a "package Foo::Bar::Baz" statement.

If you use any sort of clever syntax mangler like Moops that doesn't use "package" statements, be sure your generated META.json or META.yml file includes a "provides" field claiming the package name matching the distribution name. If you don't understand what that means or how to make it happen, you shouldn't be using Moops or anything like it until you do.
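
For reference, the relevant fragment of a generated META.json with such a field might look like the sketch below (the file path and version here are placeholders; adapt them to your distribution):

  "provides" : {
     "Foo::Bar::Baz" : {
        "file" : "lib/Foo/Bar/Baz.pm",
        "version" : "1.23"
     }
  }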

What's going on? (long explanation)

Many CPAN ecosystem tools (like rt.cpan.org) treat a distribution (i.e. tarball) name as a significant entity for permissions, etc. But historically, nothing required distribution names to have anything to do with the modules they contained. This led to an interesting security hole: if someone uploaded a distribution whose name matched an existing distribution on CPAN but which contained entirely new, unrelated modules, PAUSE would index the modules and associate them with that distribution. The author of said distribution would then be treated as a fully-authorized administrator over the shared distribution name.

Example: Let's say I wanted to hijack the Moose RT queue. I could have uploaded Moose-666.tar.gz containing lib/Not/Really/Moose.pm with "package Not::Really::Moose" and a $VERSION of 666. That would create an index entry like this:

Not::Really::Moose        666      DAGOLDEN/Moose-666.tar.gz

'lo and behold, because I had an indexed distribution "DAGOLDEN/Moose-666.tar.gz", I would become an administrator of the Moose RT queue. And MetaCPAN would think that "666" was the latest release of Moose.

To fix this, PAUSE now ties distribution names to the package namespace permissions system. While I can still upload Moose-666.tar.gz, because I don't have permissions over the "Moose" package name, my bogus distribution would not be indexed. Without being indexed, the ecosystem doesn't use it to give me any permissions.

A small handful of distributions were grandfathered (e.g. libwww-perl) and don't have to follow this rule, but all new distributions do.

Unfortunately, PAUSE's upload reporting has some bugs and other distribution problems can wind up incorrectly reported as a permissions problem. These are actually pretty rare. Still, I hope to work with Andreas at the QA hackathon to fix the upload reporting.

But if you get this error message, it's 90% or more likely that you've got one of these problems:

  1. You don't have a module "Foo::Bar::Baz" in a distribution called Foo-Bar-Baz-$VERSION.tar.gz; fix it by adding that module
  2. You think you have a "Foo::Bar::Baz" module, but PAUSE can't find it or understand your package declaration; fix your package declaration or use a 'provides' field in META.json to be explicit
  3. You have a "Foo::Bar::Baz" module, PAUSE can find it, but for some weird, historical reason someone *else* actually owns that namespace and you never noticed before

If you've ruled out #1 and #2 yourself, please feel free to contact modules@perl.org for help, but be patient, as it may take a while for an admin to see your email and investigate.

I hope this explanation helps anyone mystified by this error message.

perlancar's blog: pericmd 028: Environment support

Aside from config files, the environment can also be a convenient way to input things. It’s more “secure” than command-line options because a casual “ps ax” command won’t show the values of environment variables, unlike the command-line. By default, only the user running the program and the superuser can see the environment values of a process (exposed via /proc/PID/environ, which has 0700 mode, while /proc/PID/cmdline has 0444 mode).

Perinci::CmdLine gives you environment support. By default, it reads PROGNAME_OPT for you (the value will be prepended to the command-line options). It has a higher precedence than config files (i.e. it can override values from config files) but can be overridden by command-line options.

Let’s see an example with pause. If you run without any configuration file:

% pause ls
ERROR 400: Missing required argument(s): password, username

If we set PAUSE_OPT environment to:

% export PAUSE_OPT="--username PERLANCAR --password xxx"

then run the program again:

% pause ls
...

then username and password arguments have been set from the environment.

Turning off environment (and configuration)

If you don’t want your CLI program to read from the environment or from configuration files, you can turn these features off via the read_env and read_config attributes, respectively:

Perinci::CmdLine::Any->new(
    ...
    read_env => 0,
    read_config => 0,
)->run;

PAL-Blog: Blog-Battle #5: Fasten

The Islamization of the Occident has just brought hundreds of thousands onto the streets. A few because they would actually like to have it (such a clear-cut enemy image does have its advantages, as we all learned from 1939 to '45 - well, apparently only almost all of us), and most because they paid attention in history class. The Islamization of blogger-land, on the other hand, is tangible. It goes so far that even this week's Blog-Battle post of mine starts with it.

Hacking Thy Fearful Symmetry: got lib? Lieber Gott!

A new version of got hit CPAN a few days ago, and it has a brand new feature that is mind-bogglingly awesome. Mind you, the fact that I'm the one who sent its PR might paint me as slightly biased on the matter. But let's not dwell too much on the shameless self-promotion going on here, and instead let's turn our gazes to that promised successor to sliced bread.

Before that, though, a quick recap: got is a lovely little utility that helps you manage your git repositories. At its core, it keeps a list of managed local git repositories and, upon request, will let you know the status of each of them (dirty, all neatly committed locally, in sync with the remote origin) or update all the remote origins en masse.

That, by itself, is wonderful, but that core goodness comes with even more delicious sprinkles: got can open shells in specified projects, or open a whole slew of them in tmux windows. It can also fork stuff from GitHub. In short, it's growing to be a very nice repositories command center...

... and the growth just went one size bigger. You see, either at $work or on my yak shaving expeditions, I tend to end up dealing with sizable codebases spanning many repositories. Which means that a truckload of custom library paths are usually required to get any of their scripts to work.

At first, of course, I went for the direct approach:

          $ perl -Ilib -I/and/this/other/lib -I../and/yet/another/lib bloody/script.pl

        

Then got annoyed with super-long commands, so stuffed all those libs in PERL5LIB.

          # warning: this is fish, not bash
$ set -x PERL5LIB lib /and/this/other/lib ../and/yet/another/lib

$ perl bloody/script.pl

        

And then got fed up with remembering which bunch of libraries I need for this project or that project, so I looked into ylib and Devel::Local. Truth be told, they are fairly good solutions. But I thought that since got is already my repository shepherd, wouldn't it be nice if it could take away even more of the humdrum of the library dance?

Well, thanks to got tag and got lib, it can. Lemme demonstrate.

Let's say that I try to run Galuga from a Dancer-less stock perlbrew install.

          15:26 yanick@enkidu ~/work/perl-modules/Galuga
$ perl bin/app.pl
Can't locate Dancer2.pm in @INC [...]

        

I'm missing Dancer2. And probably a bunch of plugins. Very sad situation. Fortunately, I have all that I need in local repositories, which I had the good sense to tag as being dancer2-related:

          $  got list --tag dancer2
8) Dancer2                   git   git@github.com:PerlDancer/Dancer2.git
9) Dancer2-Template-Caribou  git   git@github.com:yanick/Dancer2-Template-Caribou.git
22) dancer2-plugin-feed       git   git@github.com:yanick/Dancer-Plugin-Feed.git

        

Cue in got lib, which will help me set up that PERL5LIB.

          # got lib can expand from a single repo
$ got lib Web-Query/lib
/home/yanick/work/perl-modules/Web-Query/lib

# or from a whole tagged set
$ got lib @dancer2/lib
/home/yanick/work/perl-modules/dancer/Dancer2/lib:/home/yanick/work/perl-modules/dancer/Dancer2-Template-Caribou/lib:/home/yanick/work/perl-modules/dancer/Dancer2-Plugin-Feed/lib

# or can pass a plain ol' directory through 
$ got lib ./lib
/home/yanick/work/perl-modules/Galuga/lib

# all together now
$ got lib Web-Query/lib @dancer2/lib ./lib
/home/yanick/work/perl-modules/Web-Query/lib:/home/yanick/work/perl-modules/dancer/Dancer2/lib:/home/yanick/work/perl-modules/dancer/Dancer2-Template-Caribou/lib:/home/yanick/work/perl-modules/dancer/Dancer2-Plugin-Feed/lib:/home/yanick/work/perl-modules/Galuga/lib

        

Once we're happy, the list of library paths can be put in a .gotlib file to be automatically picked up by got lib.

          $ cat .gotlib
./lib
@dancer2/lib
Web-Query/lib

$ got lib
/home/yanick/work/perl-modules/Galuga/lib:/home/yanick/work/perl-modules/dancer/Dancer2/lib:/home/yanick/work/perl-modules/dancer/Dancer2-Template-Caribou/lib:/home/yanick/work/perl-modules/dancer/Dancer2-Plugin-Feed/lib:/home/yanick/work/perl-modules/Web-Query/lib

        

And, finally, we can use that to populate PERL5LIB.

          # using the 'fish' shell
# for 'bash' you'll want 'export PERL5LIB=`got lib`'
$ set -x PERL5LIB (got lib)

# TADAH!
$ perl bin/app.pl
[ ... Dancer2 is found and everybody's happy ... ]

        

Being the lazy person I am, I can also let my shell do all of that work whenever a .gotlib file is present in the current directory:

          $ cat ~/.config/fish/functions/__got_lib.fish
function __got_lib --on-variable PWD --description 'set got lib'

    status --is-command-substitution; and return

    test -f '.gotlib'; or return

    set -l mylib (got lib)
    echo "setting PERL5LIB to $mylib"

    set -x PERL5LIB $mylib
end

$ cd Galuga/
setting PERL5LIB to /home/yanick/work/perl-modules/Galuga/lib:/home/yanick/work/perl-modules/dancer/Dancer2/lib:/home/yanick/work/perl-modules/dancer/Dancer2-Template-Caribou/lib:/home/yanick/work/perl-modules/dancer/Dancer2-Plugin-Feed/lib:/home/yanick/work/perl-modules/Web-Query/lib

        

Neat, isn't it?

Oh, and before I sign off this blog entry, one last note. You probably noticed that the '/lib' subdirectories are explicitly given. That was a conscious decision of mine, in part to allow for special cases (like adding test-specific libs via @dancer2/t/lib), but also to keep the door open for got lib to be used for non-Perl projects. I won't go into details here, but just to give you a teaser... say you are in a repo building .so files that you want to append to your LD_LIBRARY_PATH. Then this will do the trick:

          $ echo $LD_LIBRARY_PATH
/usr/kde/3.5/lib /usr/kde/3.5/lib

$ set -x LD_LIBRARY_PATH  ( got lib --libvar LD_LIBRARY_PATH ./build/ )

$ echo $LD_LIBRARY_PATH
/home/yanick/work/perl-modules/Galuga/build:/usr/kde/3.5/lib

        

Enjoy!

The Onion Stand: My February CPAN PR Challenge: Template::Plugin::Autoformat

For this month's CPAN Pull Request Challenge I was assigned Template::Plugin::Autoformat, a module that lets you easily format text and numbers in your Template Toolkit templates, using Damian Conway's excellent Text::Autoformat. If you ever needed to adjust text right/left/center justification, alignment, capitalization, bullets, or indenting without being able to resort to CSS - for example, if your templates are not for HTML or if said text is inside a <pre> tag - this module could make your life much easier!

A different challenge


It was an unusually busy month for me and I didn't get a chance to tackle it until yesterday - I even got Neil's "One week left!" email reminder, which was nice. Even so, I figured it would be ok, because the original PR Challenge email mentioned this module had several CPAN Testers FAIL reports, CPANTS issues, and hadn't seen an update in several years.

Except that now, when I finally checked it out, I saw it was last released a month ago, had zero failures on CPAN Testers, no open issues on either RT or GitHub, no CPANTS issues, pristine documentation, even complete META resource information! I thought I had been assigned a dist needing help, but instead what I was looking at was a stable and well-maintained one.

As it turns out, Template::Plugin::Autoformat hadn't seen a single update between 2008 and 2014, when it was adopted by Peter Karman. Peter made several developer releases until he was satisfied with the results, and released a stable version in January 2015 (which is probably why it was still on the CPAN PR Challenge's list, created prior to said release). His new version fixed all open issues, had great test coverage and felt like one of those modules that do one thing and do it very well.

And now I had 1 day to send a nice pull request to a module that looked like it needed no pull requests, or I'd lose the challenge :X

Starting small


Okay, instead of being caught in analysis paralysis, I cloned the repo and looked for low-hanging fruit. Turns out there was some!

  1. The copyright year was still 2014. This is the first place I look because it's also the first place I overlook in my own projects :) It's such a simple patch that it almost feels like cheating, but you have to start somewhere, right? Done.
  2.  The README was not in Markdown. I love markdown READMEs, because they make the Github project page look *so* much nicer without compromising reading it from the terminal. Even better, "README.md" is fully supported by PAUSE & CPAN \o/. As I was making the conversion, I noticed the README's contents were just a copy of the pod, so I tweaked it a bit to include installation instructions and just a teaser pointing to the full docs, online and via perldoc. This is a good thing for the developer too, as there's less duplicate content to worry about. Done.
  3. The distribution did not declare a minimum perl version. CPANTS Kwalitee is a terrific free service for the Perl community, letting users and authors know whether a given module passes or fails several quality assurance metrics. While, as I mentioned before, Template::Plugin::Autoformat passed all core CPANTS metrics, this extra metric was not being met. In fact, it was the only extra metric not being met. Thankfully, the excellent perlver tool makes it very easy to find the minimum perl version for your module or app. It reported 5.6.0 as being the minimum version so, after a very simple addition to the Makefile.PL (see the sketch after this list), I had my third pull request of the night.
  4. The Changes file was not fully compliant with the CPAN::Changes spec. This was also an easy one to fix, since the only standing issue was formatting the release dates to something CPAN::Changes would understand. Next!
  5. Test coverage was almost 100%, but not exactly 100%. This is another great way to help other projects: check the code coverage and see if you can improve it in any way. In this case, after running the great cover tool, I found out it had 100% statement coverage, but 50% pod coverage and 91.6% branch coverage. The pod coverage was actually a mistake - there was a private function being counted as public. Adding the missing branch test was also pretty straightforward. After the patch, Template::Plugin::Autoformat got 100% coverage in everything - which is pretty cool!
  6. The "NAME" key in Makefile.PL had the distribution name, not the main package's name. Now, the builder is clever enough to do the right thing, but nevertheless it was triggering a warning every time I ran "make" - which was quite a bit while I played with test coverage. Easy fix again, just s/Template-Plugin-Autoformat/Template::Plugin::Autoformat/ and I was done for the night.
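
The Makefile.PL changes behind items 3 and 6 amount to something like this sketch (illustrative only, not the dist's actual Makefile.PL; other arguments omitted):

use ExtUtils::MakeMaker;

WriteMakefile(
    NAME             => 'Template::Plugin::Autoformat',  # the package name, not the dist name
    VERSION_FROM     => 'lib/Template/Plugin/Autoformat.pm',
    MIN_PERL_VERSION => '5.006',                          # the minimum that perlver reported
);
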
So after a couple of hours having fun with Template::Plugin::Autoformat, I had 6 PRs to show for it on the PR Challenge. Woot! Best of all, just a few hours later Peter Karman merged all my PRs and made a new release \o/

perlancar's blog: pericmd 027: Configuration file support (2)

This post is still about config files: I want to mention a couple of features that might be useful.

Specifying arrays

In a traditional INI file, an array of strings is written as multiple lines of parameters with the same name, like so:

array=value1
array=value2

which will result in the array value ["value1", "value2"]. There is a problem with this approach though: you can’t specify an array with zero elements. (Actually, specifying an array with one element is also problematic in general, because you can’t tell whether a string/scalar or a one-element array is meant, but this is not a problem in Perinci::CmdLine because the argument schema helps point out which is intended.)

So the IOD format allows specifying parameter value as JSON:

array=!json ["value1","value2"]
array2=!j []

Or, if you are specifying an array, you can skip the “!json” or “!j” part and use the “[…]” notation directly. IOD recognizes “[” as the marker of a JSON array:

array=["value1","value2"]
array2=[]

Specifying hashes

In a traditional INI format you can’t specify a hash parameter value. Usually, when an INI file is read into a data structure by a reader module, a section is represented by a hash of parameters and their values.

IOD allows specifying values as JSON hashes (objects), or any valid JSON value for that matter. As in the case of arrays, you can omit the “!json” or “!j” part because “{” is regarded as the marker for a hash; both of the lines below set the same hash value:

hash=!json {"father":50, "mother":45}
hash={"father":50, "mother":45}

Let’s see this in action using fatten, a CLI program that uses Perinci::CmdLine. I have the following in my ~/.config/fatten.conf:

[profile=parse-id-phone]
trace_method=require
overwrite=1
include=Parse::PhoneNumber::ID
include=Perinci::CmdLine::Lite

For convenience, fatten has a feature where, instead of using subcommands as section names, it looks for the script name in the section name.

parse-id-phone, in turn, is another Perinci::CmdLine-based script:

use Perinci::CmdLine::Any -prefer_lite=>1;

Perinci::CmdLine::Any->new(
    url => "/Parse/PhoneNumber/ID/parse_id_phone",
)->run;

Since fatten detects modules used by trapping require() statements, modules like Perinci::CmdLine::Lite (the backend used by Perinci::CmdLine::Any) and Parse::PhoneNumber::ID (the backend module for the script itself) fail to be detected, and we need to tell fatten about them via the --include option (or the include parameter in the config file). Thus, when we run fatten to fatpack the parse-id-phone script:

% fatten --input-file `which parse-id-phone` --output-file /tmp/parse-id-phone --debug
fatten: Created tempdir /tmp/KYa79g6Qan
fatten: Will be targetting perl bless( {original => "v5.18.4",qv => 1,version => [5,18,4]}, 'version' )
fatten: Tracing dependencies ...
fatten:   Tracing with method 'require' ...
ERROR 400: Missing required argument(s): text
fatten: Building lib/ ...
fatten:   Adding module: Perinci::CmdLine::Any (traced)
fatten:   Adding module: Perinci::CmdLine::Lite (traced)
fatten:   Adding module: Log::Any (traced)
fatten:   Adding module: Log::Any::Manager (traced)
fatten:   Adding module: Log::Any::Adapter::Util (traced)
fatten:   Adding module: Log::Any::Adapter::Null (traced)
fatten:   Adding module: Log::Any::Adapter::Base (traced)
fatten:   Adding module: Log::Any::Proxy (traced)
fatten:   Adding module: Mo (traced)
fatten:   Adding module: Mo::build (traced)
fatten:   Adding module: Mo::default (traced)
fatten:   Adding module: experimental (traced)
fatten:   Adding module: Perinci::CmdLine::Base (traced)
fatten:   Adding module: Perinci::Access::Lite (traced)
fatten:   Adding module: Perinci::AccessUtil (traced)
fatten:   Adding module: Perinci::CmdLine::Util::Config (traced)
fatten:   Adding module: Parse::PhoneNumber::ID (traced)
fatten:   Adding module: Function::Fallback::CoreOrPP (traced)
fatten:   Adding module: Perinci::Sub::Util (traced)
fatten:   Adding module: Perinci::Sub::Normalize (traced)
fatten:   Adding module: Sah::Schema::Rinci (traced)
fatten:   Adding module: Data::Sah::Normalize (traced)
fatten:   Adding module: Perinci::Object (traced)
fatten:   Adding module: Perinci::Object::Function (traced)
fatten:   Adding module: Perinci::Object::Metadata (traced)
fatten:   Adding module: String::Trim::More (traced)
fatten:   Adding module: Config::IOD::Reader (traced)
fatten:   Adding module: Perinci::Sub::GetArgs::Argv (traced)
fatten:   Adding module: Getopt::Long::Util (traced)
fatten:   Adding module: Perinci::Sub::GetArgs::Array (traced)
fatten:   Adding module: Data::Sah::Util::Type (traced)
fatten:   Adding module: Parse::PhoneNumber::ID (included)
fatten:   Adding module: Perinci::CmdLine::Lite (included)
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Perinci/CmdLine/Any.pm --> /tmp/KYa79g6Qan/lib/Perinci/CmdLine/Any.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Perinci/CmdLine/Lite.pm --> /tmp/KYa79g6Qan/lib/Perinci/CmdLine/Lite.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Log/Any.pm --> /tmp/KYa79g6Qan/lib/Log/Any.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Log/Any/Manager.pm --> /tmp/KYa79g6Qan/lib/Log/Any/Manager.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Log/Any/Adapter/Util.pm --> /tmp/KYa79g6Qan/lib/Log/Any/Adapter/Util.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Log/Any/Adapter/Null.pm --> /tmp/KYa79g6Qan/lib/Log/Any/Adapter/Null.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Log/Any/Adapter/Base.pm --> /tmp/KYa79g6Qan/lib/Log/Any/Adapter/Base.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Log/Any/Proxy.pm --> /tmp/KYa79g6Qan/lib/Log/Any/Proxy.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Mo.pm --> /tmp/KYa79g6Qan/lib/Mo.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Mo/build.pm --> /tmp/KYa79g6Qan/lib/Mo/build.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Mo/default.pm --> /tmp/KYa79g6Qan/lib/Mo/default.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/experimental.pm --> /tmp/KYa79g6Qan/lib/experimental.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Perinci/CmdLine/Base.pm --> /tmp/KYa79g6Qan/lib/Perinci/CmdLine/Base.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Perinci/Access/Lite.pm --> /tmp/KYa79g6Qan/lib/Perinci/Access/Lite.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Perinci/AccessUtil.pm --> /tmp/KYa79g6Qan/lib/Perinci/AccessUtil.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Perinci/CmdLine/Util/Config.pm --> /tmp/KYa79g6Qan/lib/Perinci/CmdLine/Util/Config.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Parse/PhoneNumber/ID.pm --> /tmp/KYa79g6Qan/lib/Parse/PhoneNumber/ID.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Function/Fallback/CoreOrPP.pm --> /tmp/KYa79g6Qan/lib/Function/Fallback/CoreOrPP.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Perinci/Sub/Util.pm --> /tmp/KYa79g6Qan/lib/Perinci/Sub/Util.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Perinci/Sub/Normalize.pm --> /tmp/KYa79g6Qan/lib/Perinci/Sub/Normalize.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Sah/Schema/Rinci.pm --> /tmp/KYa79g6Qan/lib/Sah/Schema/Rinci.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Data/Sah/Normalize.pm --> /tmp/KYa79g6Qan/lib/Data/Sah/Normalize.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Perinci/Object.pm --> /tmp/KYa79g6Qan/lib/Perinci/Object.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Perinci/Object/Function.pm --> /tmp/KYa79g6Qan/lib/Perinci/Object/Function.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Perinci/Object/Metadata.pm --> /tmp/KYa79g6Qan/lib/Perinci/Object/Metadata.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/String/Trim/More.pm --> /tmp/KYa79g6Qan/lib/String/Trim/More.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Config/IOD/Reader.pm --> /tmp/KYa79g6Qan/lib/Config/IOD/Reader.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Perinci/Sub/GetArgs/Argv.pm --> /tmp/KYa79g6Qan/lib/Perinci/Sub/GetArgs/Argv.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Getopt/Long/Util.pm --> /tmp/KYa79g6Qan/lib/Getopt/Long/Util.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Perinci/Sub/GetArgs/Array.pm --> /tmp/KYa79g6Qan/lib/Perinci/Sub/GetArgs/Array.pm ...
fatten:   Copying /home/s1/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/Data/Sah/Util/Type.pm --> /tmp/KYa79g6Qan/lib/Data/Sah/Util/Type.pm ...
fatten:   Added 31 files (340.6 KB)
fatten: Packing ...
fatten:   Produced /tmp/parse-id-phone (372.7 KB)
fatten: Deleting tempdir /tmp/KYa79g6Qan ...

perlancar's blog: Things that are so last {decade,century} and annoying that I encountered recently (1)

perlancar's blog

1) When you download a torrent and inside you see 50 files named *.rar, *.r00, *.r01 and so on.

2) When you open a website and it has a self-glorifying video (like Flash) intro that autoplays, plus a link you have to click to go to the “main page”. Ironically, PayPal is now showing exactly this. Of course, in the 00’s era you also had to wait up to a minute until the video’s loading percentage reached 100%.

3) When you run the CPAN client for the first time and get asked which mirror host you would like to use.

Okay, okay, I know. This is just a cheap shot at how old Perl is :-) While we’re at it, let’s enumerate other things that are old and annoying in Perl:

Old and annoying

1) Context. Does anybody agree that this is more trouble than it’s worth? The fact that no other languages steal this feature seems to reinforce the feeling that context probably sucks. But this is such a core feature of Perl that we’ll just have to live with it until eternity.

2) No builtin OO. Okay, not my personal rant as I think Perl’s OO is fine as it is. But everybody is whining that “OO support in Perl 5 is {not builtin, bolted on, half-assed, abysmal}, wwweeh!” so I’ll just list it here and be done with it.

3) No builtin clone() function. Data::Clone is not core, Clone::PP is not core, Clone is not core, Storable is core but … JSON is too simplistic and not core. YAML is not core. Data::Dumper is, well… Everything is slowish to superslow to hyperslow. Sereal is not core. Perhaps at least make Sereal core?

4) JSON is not core? Does this send a message to the world that we don’t care about “the Web”? Funnily, JSON::PP is core. So perhaps JSON support is core. -ish. Frankly, I don’t understand this situation.

5) As already mentioned, the default CPAN client. Many Perl-related websites are old but still perfectly functional; they just don’t utilize niceties like, for example, AJAX for voting, so you have to press Submit and load a new page just to vote. And they ooze the old look. The default CPAN client is similar in this vein: it retains default settings that are no longer relevant, or that are annoying, and they always remind you of the old era. Things like prompting you with too many questions at the beginning of use, or being too verbose with messages by default, recall an era when platforms varied wildly (compare the plethora of incompatible Unices/weird architectures to today’s world of mostly Windows+Linux+OSX) or when download speeds were so slow that you wanted a prompt/update for every file being downloaded.

6) Too many special rules and exceptions. Consistency and simplicity are good, and more valued nowadays when we don’t have time for anything. This is the age of distraction and short attention spans. Who has time to learn all the quirks and exceptions when new languages, frameworks, and shiny new things come out every week? (XXX Examples.)

This list is far from complete and I’m sure to update it. But let’s also list things in Perl that are old and also rock!

Old but rock

1) Everything is a manpage. Man rocks, man! I think alternatives like Ruby’s ri or bropages are just inferior bastard children.

2) Sigils. Sigils rock, man. They aid readability. Unfortunately we have too many of them (and even more in Perl 6). I think “one sigil $ for variables” is enough. Long live the shell!

Like Perl even today, this is a post in progress

Also, like Perl (or some of its features, to be more exact), I’m so last {century, decade} too. Whether I’m annoying or not is left as an exercise for the people around me.

I long for the days when things like tweeting or smartphones are considered so last century and I look back on them with great nostalgia. Probably when I’m playing with my grandchild.


perlancar's blog: pericmd 026: Configuration file support

perlancar's blog

Perinci::CmdLine supports reading configuration files. I had planned for an abstract configuration system, something like Config::Any but IOD-based, but as that is not ready yet and I needed config file support immediately, I implemented the bare essentials.

Configuration files are basically another way to supply values for function arguments, just as command-line options are. They are useful for cases where using command-line options is cumbersome (e.g. the values are long or numerous) or insecure (e.g. supplying a password).

Configuration files are searched for either in the user’s home directory (~/.config or ~) or in the global directory /etc (I’m not sure about the Windows equivalent of /etc, any input? So far I’ve only used File::HomeDir->my_home on Windows). The configuration file name is the program name + .conf. The format of the configuration file is IOD, which is basically INI with some extra features (it’s more INI-compatible than other formats like TOML).

A configuration section maps to a subcommand (if the program does not have subcommands, just put the parameters outside any section). A configuration parameter maps to a function argument name (without any translation to command-line options, so the foo_bar function argument is specified as foo_bar, not foo-bar or --foo-bar).
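
For illustration, here is a minimal sketch of such a config file, assuming a hypothetical program with add and remove subcommands (the names are made up for this example):

verbose=1

[add]
force=1

[remove]
recursive=1

Here force only applies to the add subcommand and recursive only to remove, while the top-level verbose applies everywhere (more on that below).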

There is a concept of “profiles”, which lets you keep multiple sets of values in a single file by using sections. You just need to add “ profile=PROFILENAME” to a configuration section name to make that configuration belong to a certain profile.

Parameters outside any section will be applied to all subcommands and profiles.

Let’s see an example. Program myprog:

our %SPEC;
$SPEC{myfunc} = {
    v => 1.1,
    args => {
        user => {schema=>'str*', req=>1},
        pass => {schema=>'str*', req=>1},
    },
};
sub myfunc {
    my %args = @_;
    if ($args{user} eq 'ujang' && $args{pass} eq 'alu') {
        [200, "OK", "nasi goreng"];
    } elsif ($args{user} eq 'nyai' && $args{pass} eq 'nyiru') {
        [200, "OK", "sayur asem"];
    } else {
        [401, "Wrong password"];
    }
}

use Perinci::CmdLine::Any;
Perinci::CmdLine::Any->new(
    url => '/main/myfunc',
)->run;

If we run this program:

% ./myprog
ERROR 400: Missing required argument(s): pass, user

We can of course supply the user and pass arguments via command-line options (--user and --pass), but passing passwords over the command line is unsafe due to ps ax and all. So let’s put them in a configuration file. In ~/myprog.conf:

user=ujang
pass=alu

Then when we run the program again:

% ./myprog
nasi goreng

Let’s put in several profiles in the config file:

[profile=u1]
user=ujang
pass=alu

[profile=u2]
user=nyai
pass=nyiru

When we run the program:

% ./myprog --config-profile u1 ;# we pick ujang
nasi goreng
% ./myprog --config-profile u2 ;# we pick nyai
sayur asem

You can override the value of function arguments from command-line, since the command-line has higher precedence:

% ./myprog --config-profile u1 --pass foo
ERROR 401: Wrong password

You can also customize the location of the config file via --config-path, or disable config file searching via --no-config.
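
For instance, continuing the myprog example above (the path below is made up), the two options could be used like this:

% ./myprog --config-path /tmp/alternate.conf
% ./myprog --no-config --user ujang --pass alu
nasi goreng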


Sebastian Riedel about Perl and the Web: Mojolicious 6.0 released: Perl real-time web framework


It fills me with great joy to announce the release of Mojolicious 6.0 (Clinking Beer Mugs).

This has been the first major release for the newest member of our core team, so please welcome Jan Henning Thorsen. It would appear that 2015 will be remembered as the year of the 6.0 releases, but the year is still young and there’s a lot more for us to look forward to. The IETF has just approved HTTP/2, which may, for better or worse, completely change how we develop web applications, and we can’t wait to see where this will lead us. There will also be a Mojoconf this year, in New York; preparations have already begun and we should be able to share more details very soon.

As real-time web technologies are becoming more and more mainstream, the community has matured, but kept up a steady growth rate. We have been able to reinforce our position as the most starred Perl project on GitHub, and every day there are now hundreds of users browsing through the official documentation. The mailing-list is actually just about to reach 1000 subscribers, thanks everyone!

The main focus this year has been performance: pretty much everything got faster and/or scales better. But there are also quite a few new features; here’s a list of the highlights:

  • Nested helpers: Organize your helpers into namespaces, like the new built-in helper reply->asset. (example)
  • SOCKS5 support: Through IO::Socket::Socks. (example)
  • Non-blocking name resolution support: Through Net::DNS::Native.
  • New Mojo::DOM: Completely redesigned API and experimental support for case-insensitive attribute selectors like [foo="bar" i]. (See the sketch after this list.)
  • RFC 3339 support: Almost every new REST API uses it. (example)
  • Content negotiation: Now with If-None-Match and If-Modified-Since. (example)
  • IPv6 everywhere: IO::Socket::IP has become so reliable that we now use it for everything, all the time.
  • No more “wantarray()”: To prevent security vulnerabilities, it is gone from the entire code base.
  • Mojo::Pg and Minion: Had stable 1.0 releases and have since become official spin-off projects.
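
As a small taste of the new case-insensitive attribute selectors, here is a minimal sketch (not taken from the release notes; the markup is made up):

use Mojo::Base -strict;
use Mojo::DOM;

my $dom = Mojo::DOM->new('<p class="GREETING">Hello Mojo!</p>');
# The trailing "i" makes the attribute value match case-insensitively
say $dom->at('[class="greeting" i]')->text;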

And as usual there is a lot more to discover, see Changes on GitHub for the full list of improvements.

Have fun!

Laufeyjarson writes... » Perl: Cross Platform Perl Talk Abstract

I am submitting a talk to this year’s YAPC::NA.  It’s a talk I’ve given a couple of times, and has been well received.  The submission form wants a URL for the abstract, and this was the most straightforward way I could think of to put it up.  Feedback is welcome, even if the talk itself is not accepted.

The Proposed Talk

Perl has a long history, and began as a portable Unix program, as many were at the time. Perl scripts were often fairly Unix independent with trivial effort. Windows and Mac joined the family over time, and thanks to great efforts on the part of the language itself, Perl scripts remained fairly system independent. Perl provides many of the tools to make this easier as built in features, and others are hiding in CPAN.

This talk is a discussion of the history of Perl on multiple platforms, some of the common pitfalls of working on multiple platforms, and suggestions for tools and techniques that make it easier to write Perl scripts that run on more than one platform.

The talk can be tailored to be either a 50-minute or a 110-minute talk, as space allows.  The 110-minute version contains more history and examples, shown on several platforms.

Slides and example code from prior talks are available on Github.

The Presenter

Louis Erickson

Lou Erickson began his work with Perl on Windows in 1998, and has worked extensively with it on Windows and Unix-like systems, and some on the Mac. Today, he writes large-scale systems in Perl for NVIDIA, runs his mouth on his blog, and gives the occasional talk at his local Perl User’s Group.

Laufeyjarson writes... » Perl: PBP: 085 Types of Documentation

This Practice starts off a new section, on documentation.  I think there’s at least one point in this chapter where I disagree with Mr. Conway’s conclusions, but we’ll get there presently.  In general, I think he has valuable things to say, and the issues are worth thinking about and deciding on.

Documentation is important, and it’s something many of us are terrible at.

The first Practice he suggests is to be aware there are several kinds of documentation, and to separate the user documentation from the technical documentation.

This is actually just basic writing practice – know your audience.  You don’t use highly technical language when writing for six year olds, and you rarely discuss mice who ride motorcycles when writing for PhDs.

The Practice suggests keeping user documentation and technical documentation separate.  It suggests the public parts of POD for user documentation, and private parts of POD or comments for the technical documentation.  They’re different documents for different audiences.  The book goes on to describe in more detail what they mean, and what might go in what type of documentation, with reasonable examples.

This is fine, as far as it goes.  I think documentation is really important, and that not only are there multiple audiences, there are multiple types of documentation.

Most programs wind up with POD that contains reference documentation.  It has a long list of functions or methods, each of which explains what the parameters are and the return values.  They might even each have examples.

Nowhere in most POD do you get the explanation of why you need these things or what the module exists to do.  The abstract information needed to understand the details is not present.

Have you seen an application with a dialog box with a field labelled “First Name” and a text box, where the on-line help says, “Provide the user’s first name”?  That’s reference documentation of the worst kind, because it was obvious.  I probably guessed that a user’s first name went in the “First Name” text box.  I bet I even guessed that their surname went in the “Last Name” text box.  What I don’t know is why I’m providing a name, what will be done with it, and how it relates to the other parts of the system.  Will this be a user name?  A login ID?  An e-mail address?  Who knows!  It’s a First Name, and a Last Name.  I’m so relieved that the on-line help clarified that for me.

Now, maybe this kind of information doesn’t belong in the POD for a module.  But if it isn’t there, where else will it be?  In particular, where will it be that the programmer writing and maintaining the thing will see it and keep it updated?  It might be in a separate POD file.  It might be on a Wiki.  It might be published in a paper.  It might be in a blog post.  It might be written on the back of a cocktail napkin lost in a desk drawer.

Wherever it is, the reader needs to be able to find it, and the author needs to be reminded it exists to keep it updated.  My favorite way to do this is to keep it all together in the POD, so it doesn’t get lost.  If that’s objectionable, the POD should provide a link to it.

A good friend of mine says, “If the user can’t find it, it doesn’t exist.”  Does your documentation exist?

perlancar's blog: pericmd 025: Dynamic list of subcommands

perlancar's blog

In one of my applications using Perinci::CmdLine, we have hundreds of subcommands. The subcommands are taken from all the functions in all of the API modules. Instead of updating the CLI application script and adding each subcommand manually to the subcommands attribute hash in the Perinci::CmdLine constructor, I decided that the framework should support a dynamic list of subcommands.

So instead of a hash, Perinci::CmdLine also accepts the subcommands attribute as a coderef, which is expected to return a hash. An example:

use Perinci::CmdLine::Any;

our %SPEC;

$SPEC{':package'} = {
    v => 1.1,
    summary => 'Demonstrates dynamic subcommands',
};

$SPEC{noop} = {
    v => 1.1,
    result_naked => 1,
};
sub noop {}

Perinci::CmdLine::Any->new(
    url => '/main/',
    subcommands => sub {
        my %subcommands = map { ("subcmd$_" => {url=>"/main/noop"}) } "01".."50";
        return \%subcommands;
    },
)->run;

The above code will generate 50 subcommands programmatically (although admittedly, in the above contrived example one might as well generate the hash and assign it to the subcommands attribute directly).

Applications for this feature include a remote client which fetches the list of API functions (or modules + functions) dynamically at run time, so the client does not need to be updated whenever there is a new API function on the server.


Perl Foundation News: YAPC::NA::2015 Hackathons Announced

The Perl Foundation is excited to announce three hackathons that will be running sequentially with YAPC::NA 2015 in Salt Lake City this June. These events include a Perl 6 hackathon with Perl creator Larry Wall to be held on June 11th.

The cost to attend these three hackathons is included in all YAPC::NA 2015 passes, but we do encourage you to RSVP online so we know how many to expect. Putting on these events is not cheap! Along with being able to attend each hackathon, we will also be providing rooms, wi-fi, and snacks throughout the day. So we would appreciate your help! To help cover these costs, we have a suggested donation of $50 per day, which you can make during your online registration.

Each hackathon listed will run from 9:00 am until 6:00 pm. We will provide snacks and drinks, but you will be on your own for lunch.

This year's hackathons will be:

Sunday, June 7th 2015: Pull request hackathon

Come join your fellow Perl developers for a day of bug squashing and developing as you participate in the 2015 CPAN Pull Request Challenge. Or, bring your own pet project and just enjoy a day of hacking amongst your like-minded peers.

Thursday, June 11th 2015: Perl 6 hackathon with Larry Wall

Come spend an entire day working on the next generation of Perl. There's no better way to learn it than to do it! Perl creator Larry Wall will be on hand throughout the day.

Friday, June 12th 2015: Hardware hackathon

Get up to your armpits in Arduino! Bring your own toys or come check out some of ours. It's time to get creative and see what you can make.

We look forward to seeing you! Please take a moment to register on the YAPC site: http://www.yapcna.org/yn2015/purchase

Perl Foundation News: Ricardo Signes Grant Application Successful

I am pleased to announce that Ricardo Signes' recent grant application to cover the costs of his travel to the QA Hackathon has been successful. I would like to thank everyone who provided feedback on this grant.

The grant was awarded from our Perl 5 Core Maintenance Fund. If you wish to contribute to this fund please go to our donation system or contact karen [a] perlfoundation.org

The QA Hackathon is a free of charge coding workshop for people involved in Quality Assurance, testing, packaging, CPAN, and other quality assurance projects. It is taking place in Berlin, Germany, from the 16th to the 19th April.

perlancar's blog: pericmd 024: Getopt::Long::Subcommand

perlancar's blog

Let’s take a look at another module in this post. If you are familiar with Getopt::Long but want to support subcommands, and do not want (or do not have the time) to invest too much time in Perinci::CmdLine, you can try another one of my modules instead: Getopt::Long::Subcommand. That module was created precisely for that situation.

Like Getopt::Long, it also has the GetOptions() function. But the interface is rather different: instead of an options specification hash, you supply a program specification hash, containing keys like summary, description, and options (which holds the options specification hash).

The options specification is, like in Getopt::Long, also a hash with keys like ‘foo=s’, but the values are different too. Instead of just a reference to a variable or a handler coderef, for the value you supply another specification hash, containing keys like summary and handler. For handler you supply the coderef or the reference to a variable.

The program specification can also contain the key subcommands, which is where you put the subcommands. The value is a hash of subcommand names and specifications. A subcommand specification is like a program specification, and can itself contain another subcommands key for nested subcommands.

Taken from the module’s Synopsis:

use Getopt::Long::Subcommand; # exports GetOptions

my %opts;
my $res = GetOptions(

    summary => 'Summary about your program ...',

    # common options recognized by all subcommands
    options => {
        'help|h|?' => {
            summary => 'Display help message',
            handler => sub {
                my ($cb, $val, $res) = @_;
                if ($res->{subcommand}) {
                    say "Help message for $res->{subcommand} ...";
                } else {
                    say "General help message ...";
                }
                exit 0;
            },
        },
        'version|v' => {
            summary => 'Display program version',
            handler => sub {
                say "Program version $main::VERSION";
                exit 0;
            },
        },
        'verbose' => {
            handler => \$opts{verbose},
        },
    },

    # list your subcommands here
    subcommands => {
        subcmd1 => {
            summary => 'The first subcommand',
            # subcommand-specific options
            options => {
                'foo=i' => {
                    handler => \$opts{foo},
                },
            },
        },
        subcmd2 => {
            summary => 'The second subcommand',
            options => {
                'bar=s' => \$opts{bar},
                'baz'   => \$opts{baz},
            },
        },
    },

    # tell how to complete option value and arguments. see
    # Getopt::Long::Complete for more details, the arguments are the same
    # except there is an additional 'subcommand' that gives the subcommand
    # name.
    completion => sub {
        my %args = @_;
        ...
    },

);
die "GetOptions failed!\n" unless $res->{success};
say "Running subcommand $res->{subcommand} ...";

Like in Perinci::CmdLine (and also Getopt::Long::Descriptive), you put summary text for the program, each subcommand, and each option. This allows the module to generate a nice help message for you automatically (which, unfortunately, at the time of this writing is not yet implemented).

Also like Perinci::CmdLine and Getopt::Long::Complete, there is completion support.

Unlike with Perinci::CmdLine, you write your program “conventionally”, like you would with Getopt::Long. There is no concept of Rinci metadata or Riap URL.

Also, unfortunately, at the time of this writing there is no “real-world” application written using this module, because I write most of my CLI apps using Perinci::CmdLine. Aside from the example in the Synopsis, there is a demo script, demo-getopt-long-subcommand, which shows off the features as well as tab completion, but apart from that doesn’t do anything useful.


The Effective Perler: Perl v5.18 adds character class set operations

Perl v5.18 added experimental character class set operations, a requirement for full Unicode support according to Unicode Technical Standard #18, which specifies what a compliant language must support and divides those requirements into three levels.

The perlunicode documentation lists each requirement and its status in Perl. Besides some regular expression anchors handling all forms of line boundaries (which might break older programs), set subtraction and intersection in character classes were the last features Perl needed to be Level 1 compliant.

Perl calls this experimental feature “Extended Bracketed Character Classes” in perlrecharclass. Inside the (?[ ]), a regular expression does character class set operations. Inside the brackets, whitespace is insignificant (as if /x is on). Here’s a simple example to find the character z:

use v5.18;
no warnings qw(experimental::regex_sets);

my $regex = qr/(?[ [z] ])/;

while( <DATA> ) {
	chomp;
	say "[$_] ", /$regex/ ? 'Matched' : 'Missed';
	}

__DATA__
This is a line
This is the next line
And here's another line

None of the input lines have a letter z, so nothing matches:

[This is a line] Missed
[This is the next line] Missed
[And here's another line] Missed

To add more characters to the set, in old Perl (and still, even), you would add that character in the same set of brackets. If you want to find an x, you add that next to the z:

use v5.18;
no warnings qw(experimental::regex_sets);

my $regex = qr/(?[ [xz] ])/;

while( <DATA> ) {
	chomp;
	say "[$_] ", /$regex/ ? 'Matched' : 'Missed';
	}

__DATA__
This is a line
This is the next line
And here's another line

And now the middle input line matches:

[This is a line] Missed
[This is the next line] Matched
[And here's another line] Missed

But, you can do this with set math. Since you want either of those to match, you would take a union. Inside the (?[ ]), a + is the union operator (the | is also a union operator). Almost everything inside (?[ ]) is a metacharacter, which is why you had to have another set of brackets around the literal characters in the previous example:

use v5.18;
no warnings qw(experimental::regex_sets);

my $regex = qr/(?[ [x] + [z] ])/;

while( <DATA> ) {
	chomp;
	say "[$_] ", /$regex/ ? 'Matched' : 'Missed';
	}

__DATA__
This is a line
This is the next line
And here's another line

The output is the same as before because it’s the same character class:

[This is a line] Missed
[This is the next line] Matched
[And here's another line] Missed

You can also do intersections with the &. In this example, you have two separate character classes, each of which has a character that matches each input line, but the two classes have only one character in common:

use v5.18;
no warnings qw(experimental::regex_sets);

my $regex = qr/(?[ [sxy] & [exw] ])/;


while( <DATA> ) {
	chomp;
	say "[$_] ", /$regex/ ? 'Matched' : 'Missed';
	}

__DATA__
This is a line
This is the next line
And here's another line

Their intersection is only x, so only that character matches and you get the same output, again:

[This is a line] Missed
[This is the next line] Matched
[And here's another line] Missed

The - is the set subtraction operator. In this example, the first character class is the Perl word characters. You subtract from that the ASCII alphabetical characters, leaving only the digits and underscore:

use v5.18;
no warnings qw(experimental::regex_sets);

my $regex = qr/(?[ [\w] - [a-zA-Z] ])/;


while( <DATA> ) {
	chomp;
	say "[$_] ", /$regex/ ? 'Matched' : 'Missed';
	}

__DATA__
This is 1 line
This is the next line
And here's another line

Only the first line has a digit, so only it matches:

[This is 1 line] Matched
[This is the next line] Missed
[And here's another line] Missed

This gets more interesting with named properties, the only Level 2 feature Perl supports so far (see perluniprops). Some character classes may be easier to construct, read, and maintain without losing their literal characters. Suppose you want to get just the Eastern Arabic digits, perhaps because you’re in a country that uses Arabic as I am as I write this. You can take the intersection of the Arabic property and the Digit property. The Universal Character Set has this wonderful feature to assign many labels to its characters so we can identify subsets of a particular script:

use v5.18;
use utf8;
use open qw(:std :utf8);

no warnings qw(experimental::regex_sets);

my $regex = qr/(?[ \p{Arabic} & \p{Digit} ])/;

foreach my $ord ( 0 .. 0x10fffd ) {
	my $char = chr( $ord );
	say $char if $char =~ m/$regex/;
	}

Now you see just the digits from that script:

۰
۱
۲
۳
۴
۵
۶
۷
۸
۹

You can get more complicated. Suppose you wanted the Western Arabic digits too (what we normally call just “arabic numerals”). Although some of this problem is easy to do without set operations, that wouldn’t show them off. In this example, you have two separate intersections that are joined in a union:

use v5.18;
use utf8;
use open qw(:std :utf8);

no warnings qw(experimental::regex_sets);

my $regex = qr/(?[ 
	( \p{Arabic} & \p{Digit} ) 
		+ 
	( \p{ASCII}  & \p{Digit} ) 
	])/;

foreach my $ord ( 0 .. 0x10fffd ) {
	my $char = chr( $ord );
	say $char if $char =~ m/$regex/;
	}

Now you see two sets of numerals:

0
1
2
3
4
5
6
7
8
9
۰
۱
۲
۳
۴
۵
۶
۷
۸
۹

There is one more character class set operator, the ^, which acts like an exclusive-or (the xor bit operator uses the same character). This operator takes the union of the two character classes then subtracts their intersection. That is, the resulting set has all the characters in either class except for the ones they both have.

In this example, you have two intersections to extract the hex digits and digits from ASCII. That’s important since other scripts in the UCS have characters with these properties. From those intersections, you use the ^ to get the set that only contains the characters that show up in exactly one set.

use v5.18;
use utf8;
use open qw(:std :utf8);

no warnings qw(experimental::regex_sets);

my $regex = qr/(?[ 
	( \p{ASCII} & \p{HexDigit} )
		^ 
	( \p{ASCII} & \p{Digit} )
	])/;

foreach my $ord ( 0 .. 0x10fffd ) {
	my $char = chr( $ord );
	say $char if $char =~ m/$regex/;
	}

In this case, it’s the uppercase and lowercase letters:

A
B
C
D
E
F
a
b
c
d
e
f

Things to remember

  • Regular expression character class set operations satisfy UTS #18 Level 1 requirements.
  • You can compose character classes from other classes with unions, intersections, and subtractions.
  • Inside the (?[ ]), whitespace is insignificant.
  • regex_sets is an experimental feature.

:: Luca Ferrari ::: 2015 CPAN Pull Request: February pending

My February assignment was not a piece of cake: I got MyCPAN::Indexer, a module by the great brian d foy, yes, the author of so many Perl books, the founder of the Perl Mongers, and... you know, pretty much a lot of the Perl world. Ok, what chance could I have to comment on and improve brian's code? It doesn't matter; I did my homework to the best of my ability. The first step was to understand what

Sawyer X: Dancer2 0.159000 waiting for you on CPAN!

Hi everyone,

It's been a little while since we had a release. We took longer this time because this release provides a few major improvements that we wanted to let mature.

With 13 contributors and 23 tickets closed, I'd like to present Dancer2 0.159000.

There are three major changes in this release:

  • Asynchronous streaming support (also known as Delayed Responses).
  • Cleanup of the Manual and Cookbook
  • Remove dependency on MIME::Types

Dancer2 now supports full asynchronous and streaming responses while remaining event loop agnostic. You can use whichever event loop you want. An example for its usage can be found in our Manual.

We will provide more examples in the near future.
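
To give a rough idea, here is a minimal sketch of a delayed (asynchronous) response using the delayed, flush, content, and done keywords; see the Manual entry mentioned above for the authoritative examples:

use Dancer2;

get '/stream' => sub {
    # Build the response asynchronously; which event loop drives it is up to you
    delayed {
        flush;                    # send the headers now
        content "Hello, ";        # stream a first chunk of the body
        content "async world!\n"; # stream another chunk
        done;                     # close the response
    };
};

dance;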

The Manual and Cookbook have been revamped, thanks to the work of Snigdha Dagar, our OPW (Outreach Program for Women) contributor. This results in cleaner, clearer, and more accurate documentation.

The removal of MIME::Types allows us to maintain a smaller core at a low price. We're using the MIME support from Plack now, and if you have MIME::Types installed, we will use it as a fallback.

A special thanks to everyone involved in this release, especially the following people (in order of appearance in the Changelog):

Russell Jenkins, Lennart Hengstmengel, Nikita K, pants, Daniel Muey, Dávid Kovács, Graham Knop, Sawyer X, Alberto Simões, Snigdha Dagar, Omar M. Othman, Nuno Carvalho, and Vince W.

The full changelog for this release:

[ BUG FIXES ]
* GH #762: Delay app cleanup until errors are rendered. (Russell Jenkins)
* GH #835: Correct Logic error in Logger if no request exists.
           (Lennart Hengstmengel)
* GH #839: Correct "no_server_tokens" definition in production.yml.
           (Nikita K)
* GH #853, #852: Handle malformed (contentless) cookies. (pants)
* GH #840, #842: Ensure session data available to template engines.
                 (Russell Jenkins)
* GH #565, #847, #849: Fix HTTP Status template logic and documentation.
                       (Daniel Muey, Russell Jenkins, Dávid Kovács)
* GH #843: Add missing attributes to Moo class used in tests. (Graham Knop)

[ ENHANCEMENT ]
* GH #836: Support delayed (asynchronous) responses!
           ("Delayed responses" in Dancer2::Manual for more information.)
           (Sawyer X)
* GH #824: Use Plack::MIME by default, MIME::Types as failback if available.
           (Alberto Simões)
* GH #792, #848: Keywords can now use prototypes.
                 (Russell Jenkins, Sawyer X)

[ DOCUMENTATION ]
* GH #837, #838, #841: Major documentation restructure. (Snigdha Dagar)
  (Check eb9416e9 and a78e27d7 for more details.)
* GH #823: Cleanup Manual and Cookbook docs. (Omar M. Othman)
* GH #828: Provide README.mkdn. (Nuno Carvalho)
* GH #830: Fix typo in Session::YAML pod. (Vince W)
* GH #831,#832: Fix broken link in Session::YAML pod. (Vince W)

PAL-Blog: Familienbett - Nein Danke

I am not a fan of the family bed, in which parents and children sleep together until at least the children's (not the parents') tenth year. Last night confirmed my opinion. Gummibärchen (a.k.a. TaTü) has no choice but to sleep with us (and yesterday evening, as I was falling asleep, kicked me for the very first time, just barely perceptibly!), but last night we really had a family bed. Only Bea was missing, but her bed is in the residential group, and even when she is here, she is glad to have her own bed.

Perlgeek.de : All Perl 6 modules in a box

Sometimes when we change things in the Perl 6 language or the Rakudo Perl 6 compiler that implements it, we want to know if the planned changes will cause fallout in the library modules out there, and how much.

To get a quick estimate, we can now do a git grep in the experimental perl6-all-modules repository.

This is an attempt to get all the published modules into a single git repository. It is built using git subrepo, an unofficial git extension module that I've been wanting to try for some time, and that seems to have some advantages over submodules in some cases. The notable one in this case is that git grep ignores submodules, but descends into subrepos just fine.

Here is the use case that made me create this repository: Rakudo accesses low-level operations through the nqp:: pseudo namespace. For example nqp::concat_s('a', 'b') is a low-level way to concatenate two strings. User-level programs can also use nqp:: ops, though it is generally a bad idea, because it ties the program to the particular compiler used, and what's more, the nqp:: ops are not part of the public API, and thus neither documented in the same place as the rest of Perl 6, nor are there any promises for stability attached.

So we want to require module authors to use a pragma, use nqp;, in order to make their use of compiler internals explicit and deliberate. And of course, where possible, we want them to not use them at all :-)
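
In code, that deliberate opt-in would look something like this (a tiny sketch, reusing the nqp::concat_s op mentioned earlier):

use nqp;                      # explicit opt-in to compiler internals
say nqp::concat_s('a', 'b');  # prints "ab"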

To find out how many files in the ecosystem use nqp:: ops, a simple command, combined with the power of the standard UNIX tools, will help:

$ git grep -l 'nqp::'|wc -l
32

That's not too bad, considering we have... how many modules/distributions again?

Since they are added in author/repo structure, counting them with ls and wc isn't hard:

$ ls -1d */*/|wc -l
282

Ok, but number of files in relation to distributions isn't really useful. So let's ask: how many distributions directly use nqp:: ops?

$ git grep -l nqp:: | cut -d/ -f1,2 |sort -u|wc -l
23

23 out of 282 (or about 8%) distributions use the nqp:: syntax.

By the way, there is a tool (written in Perl 6, of course) to generate and update the repository. Not perfect yet, very much a work in progress. It's in the _tools folder, so you should probably filter out that directory in your queries (though in the examples above, it doesn't make a difference).

So, have fun with this new toy!

Perlgeek.de : A new Perl 6 community server - update

In my previous post I announced my plans for a new Perl 6 community server (successor to feather.perl6.nl), and now I'd like to share some updates.

Thanks to the generosity of the Perl 6 community, the server has been ordered and paid for. I am now in the process of contacting those donors who haven't paid yet, leaving them the choice to re-purpose their pledge to ongoing costs (traffic, public IPv4 addresses, domain(s), SSL certs if necessary) and maintenance, or to withdraw their pledges.

Some details of the hardware we'll get:

  • CPU: Intel® Xeon® Haswell-EP Series Processor E5-2620 v3, 2.40 GHz, 6-Core Socket 2011-3, 15MB Cache
  • RAM: 4x8GB DDR4 PC2133 Reg. ECC 2R
  • HD: 2x 2TB SATA3-HD

The vendor has told me that all parts have arrived, and will be assembled today or tomorrow.

Currently I lean towards using KVM to create three virtual hosts: one for websites (*.perl6.org, perlcabal.syn), one for general hacking and IRC activity, and one for high-risk stuff (evalbots, try.rakudo.org, ...).

I've secured the domain p6c.org (for "perl 6 community"), and the IPv4 range 213.95.82.52 - 213.95.82.62 and the IPv6 net 2001:780:101:ff00::/64.

So the infrastructure is in place, now I'm waiting for the delivery of the hardware.

Perlgeek.de : doc.perl6.org: some stats, future directions

In June 2012 I started the perl6/doc repository with the intent to collect/write API documentation for Perl 6 built-in types and routines. Not long afterwards, the website doc.perl6.org was born, generated from the aforementioned repository.

About 2.5 years later, the repository has seen more than one thousand commits from more than 40 contributors, 14 of which contributed ten patches or more. The documentation encompasses about 550 routines in 195 types, with 15 documents for other things than built-in types (for example an introduction to regexes, descriptions of how variables work).

In terms of subjective experience, I have observed an increase in the number of questions, on our IRC channel and elsewhere, that could be answered by pointing to the appropriate pages of doc.perl6.org, or by augmenting the answer with a statement like "for more info, see ..."

While it's far from perfect, I think both the numbers and the experience are very encouraging, and I'd like to thank everybody who helped make that happen, often by contributing skills I'm not good at: front-end design, good English, and gentle encouragement.

Plans for the Future

Being a community-driven project, I can't plan anybody else's time on it, so these are my own plans for the future of doc.perl6.org.

Infrastructural improvements

There are several unsolved problems with the web interface, with how we store our documents, and how information can be found. I plan to address them slowly but steadily.

  • The search is too much centered around types and routines; searching for variables, syntactic constructs and keywords isn't easily possible. I want it to find many more things than it does right now.
  • Currently we store the docs for each type in a separate file called Type.pod. That will break when we start to document native types, which begin with lower case letters. Having int.pod and Int.pod is completely unworkable on case-insensitive or case-preserving file systems. I want to come up with a solution for that, though I don't yet know what it will look like.
  • doc.perl6.org is served from static pages, which leads to some problems with file names conflicting with UNIX conventions. You can't name a file infix:</>.html, and files with two consecutive dots in their names are also weird. So in the long run, we'll have to switch to some kind of dynamic URL dispatching, or a name escaping scheme that is capable of handling all of Perl 6's syntax.
  • Things like the list of methods and what they coerce to in class Cool don't show up in derived types; either the tooling needs to be improved for that, or they need to be rewritten to use the usual one-heading-per-method approach.

Content

Of course my plan is to improve coverage of the built-in types and routines, and add more examples. In addition, I want to improve and expand on the language documentation (for example syntax, OO, regexes, MOP), ideally documenting every Perl 6 feature.

Once the language features are covered in sufficient breadth and depth (though I won't wait for 100% coverage), I want to add three tutorial tracks:

  • A track for beginners
  • A quick-start for programmers from other languages
  • A series of intermediate to advanced guides covering topics such as parsing, how to structure a bigger application, the responsible use of meta programming, or reactive programming.

Of course I won't be able to do that all on my own, so I hope to convince my fellow and future contributors that those are good ideas.

Time to stop rambling about the future, and off to writing some docs. This is yours truly, signing off.

Perlgeek.de : CPAN Pull Request Challenge: A call to the CPAN authors

The 2015 CPAN Pull Request Challenge is ramping up, and so far nearly two hundred volunteers have signed up, pledging to make one pull request for a CPAN distribution for each month of the year.

So here's a call to all the CPAN authors: please be supportive, and if you don't want your CPAN distributions to be part of the challenge, please send an email to neil at bowers dot com, stating your PAUSE ID and the fact that you want to be excluded.

How to be supportive? The first step is to act on pull requests. If you don't have time for a review, please say so; getting some response, even if it's "it'll be some time 'till I get around to reviewing this", is much better than none.

The volunteers have varied backgrounds; some are seasoned veterans, others are beginners who will make their first contribution to Open Source. So please be patient and encouraging.

If you have specific requirements for contributions, add a file called CONTRIBUTING or CONTRIBUTING.md to your github repositories where you state those requirements.

And of course, be civil. But that goes without saying, right? :-)

(And to those CPAN authors who haven't done it yet: put your distributions on github, so that you're not left out of the fun!)

Happy New Year everybody, and have a great deal of fun!

See also: Resources for the CPAN Pull Request Challenge.

Dave's Free Press: Journal: Devel::CheckLib can now check libraries' contents

Perlgeek.de : Rakudo's Abstract Syntax Tree

After or while a compiler parses a program, the compiler usually translates the source code into a tree format called Abstract Syntax Tree, or AST for short.

The optimizer works on this program representation, and then the code generation stage turns it into a format that the platform underneath it can understand. Actually I wanted to write about the optimizer, but noticed that understanding the AST is crucial to understanding the optimizer, so let's talk about the AST first.

The Rakudo Perl 6 Compiler uses an AST format called QAST. QAST nodes derive from the common superclass QAST::Node, which sets up the basic structure of all QAST classes. Each QAST node has a list of child nodes, possibly a hash map for unstructured annotations, an attribute (confusingly) named node for storing the lower-level parse tree (which is used to extract line numbers and context), and a bit of extra infrastructure.

The most important node classes are the following:

QAST::Stmts
A list of statements. Each child of the node is considered a separate statement.
QAST::Op
A single operation that usually maps to a primitive operation of the underlying platform, like adding two integers, or calling a routine.
QAST::IVal, QAST::NVal, QAST::SVal
Those hold integer, float ("numeric") and string constants respectively.
QAST::WVal
Holds a reference to a more complex object (for example a class) which is serialized separately.
QAST::Block
A list of statements that introduces a separate lexical scope.
QAST::Var
A variable
QAST::Want
A node that can evaluate to different child nodes, depending on the context it is compiled in.

To give you a bit of a feel for how those node types interact, I want to give a few examples of Perl 6 code, and what ASTs they could produce. (It turns out that Perl 6 is quite a complex language under the hood, and usually produces a more complicated AST than the obvious one; I'll ignore that for now, in order to introduce you to the basics.)

Ops and Constants

The expression 23 + 42 could, in the simplest case, produce this AST:

QAST::Op.new(
    :op('add'),
    QAST::IVal.new(:value(23)),
    QAST::IVal.new(:value(42)),
);

Here a QAST::Op node encodes a primitive operation, an addition of two numbers. The :op argument specifies which operation to use. The child nodes are two constants, both of type QAST::IVal, which hold the operands of the low-level operation add.

Now the low-level add operation is not polymorphic: it always adds two floating-point values, and the result is a floating-point value again. Since the arguments are integers and not floating point values, they are automatically converted to floats first. That's not the desired semantics for Perl 6; actually the operator + is implemented as a subroutine of name &infix:<+>, so the real generated code is closer to

QAST::Op.new(
    :op('call'),
    :name('&infix:<+>'),    # name of the subroutine to call
    QAST::IVal.new(:value(23)),
    QAST::IVal.new(:value(42)),
);

Variables and Blocks

Using a variable is as simple as writing QAST::Var.new(:name('name-of-the-variable')), but it must be declared first. This is done with QAST::Var.new(:name('name-of-the-variable'), :decl('var'), :scope('lexical')).

But there is a slight caveat: in Perl 6 a variable is always scoped to a block. So while you can't ordinarily mention a variable prior to its declaration, there are indirect ways to achieve that (lookup by name, and eval(), to name just two).

So in Rakudo there is a convention to create QAST::Block nodes with two QAST::Stmts children. The first holds all the declarations, and the second all the actual code. That way all the declarations always come before the rest of the code.

So my $x = 42; say $x compiles to roughly this:

QAST::Block.new(
    QAST::Stmts.new(
        QAST::Var.new(:name('$x'), :decl('var'), :scope('lexical')),
    ),
    QAST::Stmts.new(
        QAST::Op.new(
            :op('p6store'),
            QAST::Var.new(:name('$x')),
            QAST::IVal.new(:value(42)),
        ),
        QAST::Op.new(
            :op('call'),
            :name('&say'),
            QAST::Var.new(:name('$x')),
        ),
    ),
);

Polymorphism and QAST::Want

Perl 6 distinguishes between native types and reference types. Native types are closer to the machine, and their type name is always lower case in Perl 6.

Integer literals are polymorphic in that they can be either a native int or a "boxed" reference type Int.

To model this in the AST, QAST::Want nodes can contain multiple child nodes. The compile-time context decides which of those is actually used.

So the integer literal 42 actually produces not just a simple QAST::IVal node but rather this:

QAST::Want.new(
    QAST::WVal(Int.new(42)),
    'Ii',
    QAST::IVal(42),
)

(Note that Int.new(42) is just a nice notation to indicate a boxed integer object; it doesn't quite work like this in the code that translates Perl 6 source code into ASTs).

The first child of a QAST::Want node is the one used by default, if no other alternative matches. Then comes a list where the elements with odd indexes are format specifications (here Ii for integers) and the elements at even indexes are the ASTs to use in those cases.

An interesting format specification is 'v' for void context, which is always chosen when the return value from the current expression isn't used at all. In Perl 6 this is used to eagerly evaluate lazy lists that are used in void context, and for several optimizations.

Dave's Free Press: Journal: I Love Github

Dave's Free Press: Journal: Palm Treo call db module

Ocean of Awareness: Removing obsolete versions of Marpa from CPAN

Marpa::XS, Marpa::PP, and Marpa::HTML are obsolete versions of Marpa, which I have been keeping on CPAN for the convenience of legacy users. All new users should look only at Marpa::R2.

I plan to delete the obsolete releases from CPAN soon. For legacy users who need copies, they will still be available on backPAN.

I do this because their placement on CPAN makes them "attractive nuisances" -- they show up in searches and generally make it harder to find Marpa::R2, which is the version that new users should be interested in. There is also some danger that a new user could, by mistake, use the obsolete versions instead of Marpa::R2.

It's been some time since someone has reported a bug in their code, so they should be stable for legacy applications. I would usually promise to fix serious bugs that affect legacy users, but unfortunately, especially in the case of Marpa::XS, it is a promise I would have trouble keeping. Marpa::XS depends on Glib, and uses a complex build which I last performed on a machine I no longer use for development.

For this reason, a re-release to CPAN with deprecatory language is also not an option. I probably would not want to do so anyway -- the CPAN infrastructure by default pushes legacy users into upgrading, which always carries some risk. New deprecatory language would add no value for the legacy users, and they are the only audience these releases exist to serve.

Comments

Comments on this post can be made in Marpa's Google group, or on our IRC channel: #marpa at freenode.net. To learn more about Marpa, there's the official web site maintained by Ron Savage. I also have a Marpa web site.

Perlgeek.de : A new Perl 6 community server - call for funding

So far, many Perl 6 developers have used feather as a generic development server. Juerd, who has generously provided this server for us for free for many years, has announced that it will be shut down at the end of the year.

My daytime job is at a b2b IT outsourcing and hosting company called noris network, and they have agreed to sponsor the hosting/housing of a 1U 19" server in one of their state-of-the-art data centers in Nürnberg, Germany.

What's missing is the actual hardware. Some folks in the community have already agreed to participate in funding the hardware, though I have few concrete pledges.

So here is the call to action: If you want to help the Perl 6 community with a one-time donation towards a new community server, please send me an e-mail to moritz at faui2k3 dot org, specifying the amount you're willing to pledge, and whether you want to stay private as a donor. I accept money transfer by paypal and wire transfer (SWIFT). Direct hardware donations are also welcome. (Though actual money transfers will be deferred until the final decision on what hardware to buy, and thus on the total amount required).

How much do we need?

Decent, used 1U servers seem to start at about 250€, though 350€ would get us a lot more bang (mostly RAM and hard disk space). And in general, the more the merrier. (Cheaper offers exist, for example on ebay, but usually they are without hard disks, so the need for extra drives makes them more expensive in total).

With more money, even beefier hardware and/or spare parts and/or a maintenance contract and/or new hardware would be an option.

What do we need it for?

The main tasks for the server are:

  • Hosting websites like perl6.org and the synopses
  • Hosting infrastructure like the panda metadata server
  • Be available for smoke runs of the compilers, star distributions and module ecosystem.
  • Be available as a general development machine for people who don't have Linux available and/or don't have enough resources to comfortably build Perl 6 compilers on their own machines.
  • A place for IRC sessions for community members
  • A backup location for community services like the IRC logs, the camelia IRC eval bot etc. Those resources are currently hosted elsewhere, though having another option for hosting would be very valuable.
  • A webspace for people who want to host Perl 6-related material.
  • It is explicitly not meant as a general hosting platform, nor as a mail server.

Configuration

If the hardware we get is beefy enough, I'd like to virtualize the server into two to three components: one for hosting the perl6.org and related websites that should be rather stable, and one for the rest of the system. If resources allow it, and depending on the feedback I get, maybe a third virtual system for high-risk stuff like the evalbot.

As operating system I'll install Debian Jessie (the current testing), simply because I'll end up maintaining the system, and it's the system I'm most familiar with.

Dave's Free Press: Journal: Graphing tool

Dave's Free Press: Journal: XML::Tiny released

Perlgeek.de : Pattern Matching and Unpacking

When talking about pattern matching in the context of Perl 6, people usually think of regexes or grammars. Those are indeed very powerful tools for pattern matching, but not the only ones.

Another powerful tool for pattern matching and for unpacking data structures uses signatures.

Signatures are "just" argument lists:

sub repeat(Str $s, Int $count) {
    #     ^^^^^^^^^^^^^^^^^^^^  the signature
    # $s and $count are the parameters
    return $s x $count
}

Nearly all modern programming languages have signatures, so you might say: nothing special, move along. But there are two features that make them more useful than signatures in other languages.

The first is multi dispatch, which allows you to write several routines with the same name, but with different signatures. While extremely powerful and helpful, I don't want to dwell on it here. Look at Chapter 6 of the "Using Perl 6" book for more details.

The second feature is sub-signatures. It allows you to write a signature for a single parameter.

Which sounds pretty boring at first, but for example it allows you to do declarative validation of data structures. Perl 6 has no built-in type for an array where each slot must be of a specific but different type. But you can still check for that in a sub-signature:

sub f(@array [Int, Str]) {
    say @array.join: ', ';
}
f [42, 'str'];      # 42, str
f [42, 23];         # Nominal type check failed for parameter '';
                    # expected Str but got Int instead in sub-signature
                    # of parameter @array

Here we have a parameter called @array, and it is followed by square brackets, which introduce a sub-signature for an array. When calling the function, the array is checked against the signature (Int, Str), and so if the array doesn't consist of exactly one Int and one Str, in that order, a type error is thrown.

The same mechanism can be used not only for validation, but also for unpacking, which means extracting some parts of the data structure. This simply works by using variables in the inner signature:

sub head(*@ [$head, *@]) {
    $head;
}
sub tail(*@ [$, *@tail]) {
    @tail;
}
say head <a b c >;      # a
say tail <a b c >;      # b c

Here the outer parameter is anonymous (the @), though it's entirely possible to use variables for both the inner and the outer parameter.

The anonymous parameter can even be omitted, and you can write sub tail( [$, *@tail] ) directly.
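
Spelled out (a small sketch of my own, not from the original example set), that shorter form behaves exactly like the version above:

sub tail( [$, *@tail] ) {
    @tail;
}
say tail <a b c>;       # b c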

Sub-signatures are not limited to arrays. For working on arbitrary objects, you surround them with parenthesis instead of brackets, and use named parameters inside:

multi key-type ($ (Numeric :$key, *%)) { "Number" }
multi key-type ($ (Str     :$key, *%)) { "String" }
for (42 => 'a', 'b' => 42) -> $pair {
    say key-type $pair;
}
# Output:
# Number
# String

This works because the => constructs a Pair, which has a key and a value attribute. The named parameter :$key in the sub-signature extracts the attribute key.
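
The same unpacking can pull out both attributes at once; here is a small sketch of my own along the same lines:

sub kv($ (:$key, :$value, *%)) {
    say "$key => $value";
}
kv(42 => 'answer');     # 42 => answer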

You can build quite impressive things with this feature, for example red-black tree balancing based on multi dispatch and signature unpacking. (More verbose explanation of the code.) Most use cases aren't this impressive, but still it is very useful to have occasionally. Like for this small evaluator.

Dave's Free Press: Journal: YAPC::Europe 2007 report: day 2

Perlgeek.de : YAPC Europe 2013 Day 3

The second day of YAPC Europe climaxed in the river boat cruise, Kiev's version of the traditional conference dinner. It was a largish boat traveling on the Dnipro river, with food, drinks and lots of Perl folks. Not having fixed tables, and having to get up to fetch food and drinks, led to a lot of circulation, and thus to meeting many more people than at traditional dinners. I loved it.

Day 3 started with a video message from next year's YAPC Europe organizers, advertising for the upcoming conference and talking a bit about the opportunities that Sofia offers. Tempting :-).

Monitoring with Perl and Unix::Statgrab was more about the metrics that are available for monitoring, and less about doing stuff with Perl. I was a bit disappointed.

The "Future Perl Versioning" Discussion was a very civilized discussion, with solid arguments. Whether anybody changed their minds remain to be seen.

Carl Mäsak gave two great talks: one on reactive programming, and one on regular expressions. I learned quite a bit in the first one, and simply enjoyed the second one.

After the lunch (tasty again), I attended Jonathan Worthington's third talk, MoarVM: a metamodel-focused runtime for NQP and Rakudo. Again this was a great talk, based on great work done by Jonathan and others during the last 12 months or so. MoarVM is a virtual machine designed for Perl 6's needs, as we understand them now (as opposed to parrot, which was designed towards Perl 6 as it was understood around 2003 or so, which is considerably different).

How to speak manager was both amusing and offered a nice perspective on interactions between managers and programmers. Some of this advice assumed a non-tech-savvy manager, and thus didn't quite apply to my current work situation, but was still interesting.

I must confess I don't remember too much of the rest of the talks that evening. I blame five days of traveling, hackathon and conference taking their toll on me.

The third session of lightning talks was again an interesting mix, containing interesting technical tidbits, the usual "we are hiring" slogans, some touching and thoughtful moments, and finally a song by Piers Cawley. He had written the lyrics in the previous 18 hours (including sleep), to (afaict) a traditional Irish song. Standing up in front of ~300 people and singing a song that you haven't really had time to practise takes a huge amount of courage, and I admire Piers both for his courage and for his great performance. I hope it was recorded, and makes its way to the public soon.

Finally the organizers spoke some closing words, and received their well-deserved share of applause.

As you might have guessed from this and the previous blog posts, I enjoyed this year's YAPC Europe very much, and found it well worth attending, and well organized. I'd like to give my heartfelt thanks to everybody who helped make it happen, and to my employer for sending me there.

This being only my second YAPC, I can't make any far-reaching comparisons, but compared to YAPC::EU 2010 in Pisa I had an easier time making acquaintances. I cannot tell what the big difference was, but the buffet-style dinners at the pre-conference meeting and the river boat cruise certainly helped to increase the circulation and thus the number of people I talked to.

Dave's Free Press: Journal: YAPC::Europe 2007 travel plans

Perlgeek.de : A small regex optimization for NQP and Rakudo

Recently I read the course material of the Rakudo and NQP Internals Workshop, and had an idea for a small optimization for the regex engine. Yesterday night I implemented it, and I'd like to walk you through the process.

As a bit of background, the regex engine that Rakudo uses is actually implemented in NQP, and used by NQP too. The code I am about to discuss all lives in the NQP repository, but Rakudo profits from it too.

In addition one should note that the regex engine is mostly used for parsing with grammars, a process which involves nearly no scanning. Scanning is the process where the regex engine first tries to match the regex at the start of the string, and if it fails there, moves to the second character in the string, tries again, etc. until it succeeds.

But regexes that users write often involve scanning, and so my idea was to speed up regexes that scan, and where the first thing in the regex is a literal. In this case it makes sense to find possible start positions with a fast string search algorithm, for example the Boyer-Moore algorithm. The virtual machine backends for NQP already implement that as the index opcode, which can be invoked as start = index haystack, needle, startpos, where the string haystack is searched for the substring needle, starting from position startpos.
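
To illustrate the idea outside the regex engine, here is a rough Perl 6 sketch of my own (the try-match callback is a hypothetical stand-in for the real regex match; the actual change lives in the NQP code generators discussed below):

# jump directly to candidate positions where the leading literal occurs,
# instead of attempting the full match at every single position
sub scan-with-literal(Str $haystack, Str $literal, &try-match) {
    my $pos = 0;
    while defined(my $found = index($haystack, $literal, $pos)) {
        return $found if try-match($haystack, $found);  # full match attempt here
        $pos = $found + 1;                              # otherwise keep scanning
    }
    Nil;                                                # no candidate matched
}

# usage sketch: find "abc" only where it is followed by "d"
say scan-with-literal("xxabcy abcd", "abc",
    -> $s, $p { $s.substr($p, 4) eq "abcd" });          # 7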

From reading the course material I knew I had to search for a regex type called scan, so that's what I did:

$ git grep --word scan
3rdparty/libtommath/bn_error.c:   /* scan the lookup table for the given message
3rdparty/libtommath/bn_mp_cnt_lsb.c:   /* scan lower digits until non-zero */
3rdparty/libtommath/bn_mp_cnt_lsb.c:   /* now scan this digit until a 1 is found
3rdparty/libtommath/bn_mp_prime_next_prime.c:                   /* scan upwards 
3rdparty/libtommath/changes.txt:       -- Started the Depends framework, wrote d
src/QRegex/P5Regex/Actions.nqp:                     QAST::Regex.new( :rxtype<sca
src/QRegex/P6Regex/Actions.nqp:                     QAST::Regex.new( :rxtype<sca
src/vm/jvm/QAST/Compiler.nqp:    method scan($node) {
src/vm/moar/QAST/QASTRegexCompilerMAST.nqp:    method scan($node) {
Binary file src/vm/moar/stage0/NQPP6QRegexMoar.moarvm matches
Binary file src/vm/moar/stage0/QASTMoar.moarvm matches
src/vm/parrot/QAST/Compiler.nqp:    method scan($node) {
src/vm/parrot/stage0/P6QRegex-s0.pir:    $P5025 = $P5024."new"("scan" :named("rx
src/vm/parrot/stage0/QAST-s0.pir:.sub "scan" :subid("cuid_135_1381944260.6802") 
src/vm/parrot/stage0/QAST-s0.pir:    push $P5004, "scan"

The binary files and .pir files are generated code included just for bootstrapping, and not interesting for us. The files in 3rdparty/libtommath are there for bigint handling, thus not interesting for us either. The rest are good matches: src/QRegex/P6Regex/Actions.nqp is responsible for compiling Perl 6 regexes to an abstract syntax tree (AST), and src/vm/parrot/QAST/Compiler.nqp compiles that AST down to PIR, the assembly language that the Parrot Virtual Machine understands.

So, looking at src/QRegex/P6Regex/Actions.nqp the place that mentions scan looked like this:

    $block<orig_qast> := $qast;
    $qast := QAST::Regex.new( :rxtype<concat>,
                 QAST::Regex.new( :rxtype<scan> ),
                 $qast,
                 ($anon
                      ?? QAST::Regex.new( :rxtype<pass> )
                      !! (nqp::substr(%*RX<name>, 0, 12) ne '!!LATENAME!!'
                            ?? QAST::Regex.new( :rxtype<pass>, :name(%*RX<name>) )
                            !! QAST::Regex.new( :rxtype<pass>,
                                   QAST::Var.new(
                                       :name(nqp::substr(%*RX<name>, 12)),
                                       :scope('lexical')
                                   ) 
                               )
                          )));

So to make the regex scan, the AST (in $qast) is wrapped in QAST::Regex.new(:rxtype<concat>,QAST::Regex.new( :rxtype<scan> ), $qast, ...), plus some stuff I don't care about.

To make the optimization work, the scan node needs to know what to scan for, if the first thing in the regex is indeed a constant string, aka literal. If it is, $qast is either directly of rxtype literal, or a concat node where the first child is a literal. As a patch, it looks like this:

--- a/src/QRegex/P6Regex/Actions.nqp
+++ b/src/QRegex/P6Regex/Actions.nqp
@@ -667,9 +667,21 @@ class QRegex::P6Regex::Actions is HLL::Actions {
     self.store_regex_nfa($code_obj, $block, QRegex::NFA.new.addnode($qast))
     self.alt_nfas($code_obj, $block, $qast);
 
+    my $scan := QAST::Regex.new( :rxtype<scan> );
+    {
+        my $q := $qast;
+        if $q.rxtype eq 'concat' && $q[0] {
+            $q := $q[0]
+        }
+        if $q.rxtype eq 'literal' {
+            nqp::push($scan, $q[0]);
+            $scan.subtype($q.subtype);
+        }
+    }
+
     $block<orig_qast> := $qast;
     $qast := QAST::Regex.new( :rxtype<concat>,
-                 QAST::Regex.new( :rxtype<scan> ),
+                 $scan,
                  $qast,

Since scan nodes have always been empty so far, the code generators don't look at their child nodes, and adding one with nqp::push($scan, $q[0]); won't break anything on backends that don't support this optimization yet (which after just this patch were all of them). Running make test confirmed that.

My original patch did not contain the line $scan.subtype($q.subtype);, and later on some unit tests started to fail, because regex matches can be case insensitive, but the index op only works case-sensitively. For case insensitive matches, the $q.subtype of the literal regex node would be ignorecase, so that information needs to be carried on to the code generation backend.

Once that part was in place, and some debug nqp::say() statements confirmed that it indeed worked, it was time to look at the code generation. For the parrot backend, it looked like this:

    method scan($node) {
        my $ops := self.post_new('Ops', :result(%*REG<cur>));
        my $prefix := self.unique('rxscan');
        my $looplabel := self.post_new('Label', :name($prefix ~ '_loop'));
        my $scanlabel := self.post_new('Label', :name($prefix ~ '_scan'));
        my $donelabel := self.post_new('Label', :name($prefix ~ '_done'));
        $ops.push_pirop('repr_get_attr_int', '$I11', 'self', %*REG<curclass>, '"$!from"');
        $ops.push_pirop('ne', '$I11', -1, $donelabel);
        $ops.push_pirop('goto', $scanlabel);
        $ops.push($looplabel);
        $ops.push_pirop('inc', %*REG<pos>);
        $ops.push_pirop('gt', %*REG<pos>, %*REG<eos>, %*REG<fail>);
        $ops.push_pirop('repr_bind_attr_int', %*REG<cur>, %*REG<curclass>, '"$!from"', %*REG<pos>);
        $ops.push($scanlabel);
        self.regex_mark($ops, $looplabel, %*REG<pos>, 0);
        $ops.push($donelabel);
        $ops;
    }

While a bit intimidating at first, staring at it for a while quickly made clear what kind of code it emits. First, three labels are generated, to which the code can jump with goto $label: one as a jump target for the loop that increments the cursor position ($looplabel), one for doing the regex match at that position ($scanlabel), and $donelabel for jumping to when the whole thing has finished.

Inside the loop there is an increment (inc) of the register that holds the current position (%*REG<pos>); that position is compared to the end-of-string position (%*REG<eos>), and if it is larger, the cursor is marked as failed.

So the idea is to advance the position by one, and then instead of doing the regex match immediately, call the index op to find the next position where the regex might succeed:

--- a/src/vm/parrot/QAST/Compiler.nqp
+++ b/src/vm/parrot/QAST/Compiler.nqp
@@ -1564,7 +1564,13 @@ class QAST::Compiler is HLL::Compiler {
         $ops.push_pirop('goto', $scanlabel);
         $ops.push($looplabel);
         $ops.push_pirop('inc', %*REG<pos>);
-        $ops.push_pirop('gt', %*REG<pos>, %*REG<eos>, %*REG<fail>);
+        if nqp::elems($node.list) && $node.subtype ne 'ignorecase' {
+            $ops.push_pirop('index', %*REG<pos>, %*REG<tgt>, self.rxescape($node[0]), %*REG<pos>);
+            $ops.push_pirop('eq', %*REG<pos>, -1, %*REG<fail>);
+        }
+        else {
+            $ops.push_pirop('gt', %*REG<pos>, %*REG<eos>, %*REG<fail>);
+        }
         $ops.push_pirop('repr_bind_attr_int', %*REG<cur>, %*REG<curclass>, '"$!from"', %*REG<pos>);
         $ops.push($scanlabel);
         self.regex_mark($ops, $looplabel, %*REG<pos>, 0);

The index op returns -1 on failure, so the condition for a cursor fail is slightly different than before.

And as mentioned earlier, the optimization can only be safely done for matches that don't ignore case. Maybe with some additional effort that could be remedied, but it's not as simple as case-folding the target string, because some case folding operations can change the string length (for example ß becomes SS while uppercasing).
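
A quick illustration of that length change (my own example):

say 'ß'.uc;                             # SS
say 'ß'.chars, ' vs ', 'ß'.uc.chars;    # 1 vs 2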

After successfully testing the patch, I came up with a small, artificial benchmark designed to show a difference in performance for this particular case. And indeed, it sped things up from 647 ± 28 µs to 161 ± 18 µs, which is roughly a factor of four.

You can see the whole thing as two commits on github.

What remains to do is implementing the same optimization on the JVM and MoarVM backends, and of course other optimizations. For example the Perl 5 regex engine keeps track of minimal and maximal string lengths for each subregex, and can anchor a regex like /a?b?longliteral/ to 0..2 characters before a match of longliteral, and generally use that meta information to fail faster.

But for now I am mostly encouraged that doing a worthwhile optimization was possible in a single evening without any black magic, or too intimate knowledge of the code generation.

Update: the code generation for MoarVM now also uses the index op. The logic is the same as for the parrot backend, the only difference is that the literal needs to be loaded into a register (whose name fresh_s returns) before index_s can use it.

Dave's Free Press: Journal: Wikipedia handheld proxy

Dave's Free Press: Journal: Bryar security hole

Dave's Free Press: Journal: Thankyou, Anonymous Benefactor!

Dave's Free Press: Journal: Number::Phone release

Dave's Free Press: Journal: Ill

Dave's Free Press: Journal: CPANdeps upgrade

Dave's Free Press: Journal: CPANdeps

Dave's Free Press: Journal: Module pre-requisites analyser

Dave's Free Press: Journal: Perl isn't dieing

Dave's Free Press: Journal: YAPC::Europe 2007 report: day 3

Perlgeek.de : The Fun of Running a Public Web Service, and Session Storage

One of my websites, Sudokugarden, recently surged in traffic, from about 30k visitors per month to more than 100k visitors per month. Here's the tale of what that meant for the server side.

As a bit of background, I built the website in 2007, when I knew a lot less about the web and programming. It runs on a host that I share with a few friends; I don't have root access on that machine, though when the admin is available, I can generally ask him to install stuff for me.

Most parts of the website are built as static HTML files, with Server Side Includes. Parts of those SSIs are Perl CGI scripts. The most popular part though, which allows you to solve Sudoku in the browser and keeps high scores, is written as a collection of Perl scripts, backed by a MySQL database.

When at peak times the site had more than 10k visitors a day, lots of visitors would get a nasty mysql: Cannot connect: Too many open connections error. The admin wasn't available for bumping the connection limit, so I looked for other solutions.

My first action was to check the logs for spammers and crawlers that might have hammered the page, and I found and banned some; but the bulk of the traffic looked completely legitimate, and the problem persisted.

Looking at the seven year old code, I realized that most pages didn't actually need a database connection, if only I could remove the session storage from the database. And, in fact, I could. I used CGI::Session, which has pluggable backends. Switching to a file-based session backend was just a matter of changing the connection string and adding a directory for session storage. Luckily the code was clean enough that this only affected a single subroutine. Everything was fine.

For a while.

Then, about a month later, the host ran out of free disk space. Since it is used for other stuff too (like email, and web hosting for other users) it took me a while to make the connection to the file-based session storage. What had happened was 3 million session files on an ext3 file system with a block size of 4 kilobytes. A session is only about 400 bytes, but since a file uses up a multiple of the block size, the session storage amounted to 12 gigabytes of used-up disk space, which was all that was left on that machine.

Deleting those sessions turned out to be a problem; I could only log in as my own user, which doesn't have write access to the session files (which are owned by www-data, the Apache user). The solution was to upload a CGI script that deleted the session, but of course that wasn't possible at first, because the disk was full. In the end I had to delete several gigabyte of data from my home directory before I could upload anything again. (Processes running as root were still writing to reserved-to-root portions of the file system, which is why I had to delete so much data before I was able to write again).

Even when I was able to upload the deletion script, it took quite some time to actually delete the session files; mostly because the directory was too large, and deleting files on ext3 is slow. When the files were gone, the empty session directory still used up 200MB of disk space, because the directory index doesn't shrink on file deletion.

Clearly a better solution to session storage was needed. But first I investigated where all those sessions came from, and banned a few spamming IPs. I also changed the code to only create sessions when somebody logs in, not give every visitor a session from the start.

My next attempt was to write the sessions to an SQLite database. It uses about 400 bytes per session (plus a fixed overhead for the db file itself), so it uses only a tenth of the storage space that the file-based storage used. The SQLite database has no connection limit, though the old-ish version that was installed on the server doesn't seem to have very fine-grained locking either; within a few days I got errors saying that the session database was locked.

So I added another layer of workaround: creating a separate session database per leading IP octet. So now there are up to 255 separate session databases (plus a 256th for all IPv6 addresses; a decision that will have to be revised when IPv6 usage rises). After a few days of operation, it seems that this setup works well enough. But suspicious as I am, I'll continue monitoring both disk usage and errors from Apache.
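
The site itself is Perl 5 CGI, but the sharding scheme is simple enough to sketch in a few lines of Perl 6 (the directory layout and file naming here are made up for illustration):

# pick a session database file based on the leading octet of the client IP,
# so locking is spread over up to 256 separate SQLite files
sub session-db-path(Str $remote-addr) {
    my $shard = $remote-addr.contains(':')
        ?? 'ipv6'                           # all IPv6 clients share one shard
        !! $remote-addr.split('.')[0];      # leading IPv4 octet
    return "sessions/sessions-$shard.sqlite";
}

say session-db-path('203.0.113.7');     # sessions/sessions-203.sqlite
say session-db-path('2001:db8::1');     # sessions/sessions-ipv6.sqlite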

So, what happens if this solution fails to work out? I can see basically two approaches: move the site to a server that's fully under my control, and use redis or memcached for session storage; or implement sessions with signed cookies that are stored purely on the client side.

Ocean of Awareness: PEG: Ambiguity, precision and confusion

Precise?

PEG parsing is a new notation for a notoriously tricky algorithm that goes back to the earliest computers. In its PEG form, this algorithm acquired a seductive new interface, one that looks like the best of extended BNF combined with the best of regular expressions. Looking at a sample of it, you are tempted to imagine that writing a parser has suddenly become a very straightforward matter. Not so.

For those not yet in the know on this, I'll illustrate with a pair of examples from an excellent 2008 paper by Redziejowski. Let's start with these two PEG specifications.

    ("a"|"aa")"a"
    ("aa"|"a")"a"
    

One of these two PEG grammars accepts the string "aaa" but not the string "aa". The other does the opposite -- it accepts the string "aa" but not the string "aaa". Can you tell which one? (For the answer, see page 4 of Redziejowski 2008.)

Here is another example:

    A = "a"A"a"/"aa"
    

What language does this describe? All the strings in the language are obviously the letter "a", repeated some number of times. But which string lengths are in the language, and which are not? Again the answer is on page 4 of Redziejowski 2008 -- it's exactly those strings whose length is a power of 2.

With PEG, what you see in the extended BNF is not what you get. PEG parsing has been called "precise", apparently based on the idea that PEG parsing is in a certain sense unambiguous. In this case "precise" is taken as synonymous with "unique". That is, PEG parsing is precise in exactly the same sense that Jimmy Hoffa's body is at a precise location. There is (presumably) exactly one such place, but we are hard put to be any more specific about the matter.

Syntax-driven?

The advantage of using a syntax-driven parser generator is that the syntax you specify describes the language that will be parsed. For most practical grammars, PEG is not syntax-driven in this sense. Several important PEG researchers understand this issue, and have tried to deal with it. I will talk about their work below. There is much more at stake here than bragging rights over which algorithm is really syntax-driven and which is not.

When you do not know the language your parser is parsing, you of course have the problem that your parser might not parse all the strings in your language. That can be dealt with by fixing the parser to accept the correct input, as you encounter problems.

A second, more serious, problem is often forgotten. Your PEG parser might accept strings that are not in your language. At worst, this creates a security loophole. At best, it leaves you with a choice: break compatibility, or leave the problem unfixed.

It's important to be able to convince yourself that your code is correct by examining it and thinking about it. Beginning programmers often simply hack things, and call code complete once it passes the test suite. Test suites don't catch everything, but there is a worse problem with the beginner's approach.

Since the beginner has no clear idea of why his code works, even when it does, it is unlikely to be well-organized or readable. Programming techniques like PEG, where the code can be made to work, but where it is much harder, and in practice usually not possible, to be sure why the code works, become maintenance nightmares.

The maintenance implications are especially worrisome if the PEG parser is for a language with a life cycle that may involve bug fixes or other changes. The impact of even small changes to a PEG specification is hard to predict and hard to discover after the fact.

Is PEG unambiguous?

PEG is not unambiguous in any helpful sense of that word. BNF allows you to specify ambiguous grammars, and that feature is tied to its power and flexibility and often useful in itself. PEG will only deliver one of those parses. But without an easy way of knowing which parse, the underlying ambiguity is not addressed -- it is just ignored.

My Marpa parser is a general BNF parser based on Earley's. It also can simply throw away all but one of the parses in an ambiguous parse. But I would not feel justified in saying to a user who has an issue with ambiguity that Marpa has solved her problem by throwing away all but one arbitrarily chosen result.

Sticking with Marpa for a moment, we can see one example of a more helpful approach to ambiguity. Marpa allows a user to rank rules, so that all but the highest ranking rules are not used in a parse. Marpa's rule rankings are specified in its BNF, and they work together with the BNF in an intuitive way. In every case, Marpa delivers precisely the parses its BNF and its rule rankings specify. And it is "precision" in this sense that a parser writer is looking for.

Is there a sensible way to use PEG?

I'll return to Marpa at the end of this post. For now, let's assume that you are not interested in using Marpa -- you are committed to PEG, and you want to make the best of PEG. Several excellent programmers have focused on PEG, without blinding themselves to its limitations. I've already mentioned one important paper by Redziejowski. Many of Redziejowski's collected papers are about PEG, and Redziejowski, in his attempts to use PEG, does not sugarcoat its problems.

Roberto Ierusalimschy, author of Lua and one of the best programmers of our time, has written a PEG-based parser of his own. Roberto is fully aware of PEG's limits, but he makes a very good case for choosing PEG as the basis of LPEG, his parser generator. LPEG is intended for use with Lua, a ruthlessly minimal language. Roberto's minimalist implementation limits the power of his parser, but his aim is to extend regular expressions in a disciplined way, and a compact parser of limited power is quite acceptable for his purposes.

Matching the BNF to the PEG spec

As Redziejowski and Ierusalimschy and the other authors of Mascarenhas et al, 2013 recognize, not knowing what language you are parsing is more than an annoyance. We can call a language "well-behaved for PEG" if the PEG spec delivers exactly the language the BNF describes.

Which languages are well-behaved for PEG? According to Mascarenhas et al, 2013, the LL(1) languages are well-behaved. (The LL(1) languages are the languages a top-down parser can parse based on at most one character of input.) Syntax-driven parsers for LL(1) have been around for much longer than PEG -- one such parser is described in the first paper to describe recursive descent (Peter Lucas, 1961). But most practical languages are not LL(1). Redziejowski 2013 and Redziejowski 2014 seek to extend this result by defining the language class LL(1p) -- those top-down languages with one "parsing procedure" of lookahead. The LL(1p) languages are also well-behaved for PEG.

Mascarenhas et al, 2013 also look at a different approach -- instead of writing a PEG specification and trying to keep it well-behaved, they look at taking languages from larger top-down classes and translating them to PEG. I don't know of any followup, but it's possible this approach could produce well-behaved top-down parsers which are an improvement over direct-from-PEG parsing. But for those who are open to leaving top-down parsing behind, a parser which handles languages in all these classes and more is already available.

Marpa

In this post, I have adopted the point of view of programmers using PEG, or thinking of doing so. My own belief in this matter is that very few programmers should want to bother with the issues I've just described. My reason for this is the Marpa parser -- a general BNF Earley-driven parser that

  • has an implementation you can use today;
  • allows the application to combine syntax-driven parsing with custom procedural logic;
  • makes available full, left-eidetic knowledge of the parse to the procedural logic;
  • and parses a vast class of grammars in linear time, including all the LR-regular grammars.

The LR-regular grammars include the LR(k) and LL(k) grammars for all k. LR-regular includes all the languages which are well-behaved under PEG, and all of those that Mascarenhas et al, 2013 consider translating into PEG.

Comments

Comments on this post can be made in Marpa's Google group, or on our IRC channel: #marpa at freenode.net. To learn more about Marpa, there's the official web site maintained by Ron Savage. I also have a Marpa web site.

Perlgeek.de : Profiling Perl 6 code on IRC

On the #perl6 IRC channel, we have a bot called camelia that executes small snippets of Perl 6 code, and prints the output that it produces. This is a pretty central part of our culture, and we use it to explain or demonstrate features or even bugs in the compiler.

Here is an example:

10:35 < Kristien> Can a class contain classes?
10:35 < Kristien> m: class A { class B { } }; say A.new.B.new
10:35 <+camelia> rakudo-moar 114659: OUTPUT«No such method 'B' for invocant of 
                 type 'A'␤  in block <unit> at /tmp/g81K8fr9eY:1␤␤»
10:35 < Kristien> :(
10:36 < raydiak> m: class A { class B { } }; say A::B.new
10:36 <+camelia> rakudo-moar 114659: OUTPUT«B.new()␤»

Yesterday and today I spent some time teaching this IRC bot to not only run the code, but optionally also run it through a profiler, to make it possible to determine where the virtual machine spends its time running the code. An example:

12:21 < moritz> prof-m: Date.today for ^100; say "done"
12:21 <+camelia> prof-m 9fc66c: OUTPUT«done␤»
12:21 <+camelia> .. Prof: http://p.p6c.org/453bbe

The Rakudo Perl 6 compiler on the MoarVM backend has a profiler, which produces a fancy HTML + JavaScript page, and this is what is used. It is automatically uploaded to a webserver, producing this profile.

Under the hood, it started with a patch that makes it possible to specify the output filename for a profile run, and another one to clear up the fallout from the previous patch.

Then came the bigger part: setting up the Apache virtual host that serves the web files, including a restricted user that only allows up- and downloads via scp. Since the IRC bot can execute arbitrary code, it is very likely that an attacker can steal the private SSH keys used for authentication against the webserver. So it is essential that if those keys are stolen, the attacker can't do much more than uploading more files.

I used rssh for this. It is the login shell for the upload user, and configured to only allow scp. Since I didn't want the attacker to be able to modify the authorized_keys file, I configured rssh to use a chroot below the home directory (which sadly in turn requires a setuid-root wrapper around chroot, because ordinary users can't execute it. Well, nothing is perfect).

Some more patching and debugging later, the bot was ready.

The whole thing feels a bit bolted on; if usage warrants it, I'll see if I can make the code a bit prettier.

Perlgeek.de : YAPC Europe 2013 Day 2

The second day of YAPC Europe was enjoyable and informative.

I learned about ZeroMQ, which is a bit like sockets on steroids. Interesting stuff. Sadly Design decisions on p2 didn't quite qualify as interesting.

Matt's PSGI archive is a project to rewrite Matt's infamous script archive in modern Perl. Very promising, and a bit entertaining too.

Lunch was very tasty, more so than the usual mass catering. Kudos to the organizers!

After lunch, jnthn talked about concurrency, parallelism and asynchrony in Perl 6. It was a great talk, backed by great work on the compiler and runtime. Jonathan's talks are always to be recommended.

I think I didn't screw up my own talk too badly, at least the timing worked fine. I just forgot to show the last slide. No real harm done.

I also enjoyed mst's State of the Velociraptor, which was a summary of what went on in the Perl world in the last year. (Much better than the YAPC::EU 2010 talk with the same title).

The Lightning talks were as enjoyable as those from the previous day. So all fine!

Next up is the river cruise, I hope to blog about that later on.

Perlgeek.de : The REPL trick

A recent discussion on IRC prompted me to share a small but neat trick with you.

If there are things you want to do quite often in the Rakudo REPL (the interactive "Read-Evaluate-Print Loop"), it makes sense to create a shortcut for them. And creating shortcuts for often-used stuff is what programming languages excel at, so you do it right in a Perl module:

use v6;
module REPLHelper;

sub p(Mu \x) is export {
    x.^mro.map: *.^name;
}

I have placed mine in $HOME/.perl6/repl.

And then you make sure it's loaded automatically:

$ alias p6repl="perl6 -I$HOME/.perl6/repl/ -MREPLHelper"
$ p6repl
> p Int
Int Cool Any Mu
>

Now you have a neat one-letter function which tells you the parents of an object or a type, in method resolution order. And a way to add more shortcuts when you need them.
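
When you do need another shortcut, it is just another exported sub in the same module; for example (a hypothetical addition, not part of the module above):

sub m(Mu \x) is export {
    x.^methods(:local).map(*.name).sort;
}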

Dave's Free Press: Journal: Travelling in time: the CP2000AN

Perlgeek.de : New Perl 6 community server now live, accepting signups

The new Perl 6 community server is now alive and kicking.

As planned, I've set up KVM virtualization, and so far there are two guest systems. hack.p6c.org is meant for general Perl 6 development activity (which also includes irssi/weechat sessions), and is equipped with 20GB RAM to handle multiple concurrent rakudo-jvm compilations :-). It runs a pretty bare-bones Debian Jessie.

Update: there is now a website for the new server.

www.p6c.org is the web server where I plan to host perl6.org and related (sub-)domains. It's not as beefy as hack, but sufficiently large to compile and run Rakudo, in preparation for future Perl 6-based web hosting. Currently I'm running a copy of several perl6.org subdomains on it (with the domain name p6c instead of perl6 for test purposes); the plan is to switch the perl6.org DNS over once all of the websites have been copied/migrated.

If you have a Perl 6 related use for a shell account or for serving websites, please request an account by email (moritz.lenz@gmail.com) or IRC (moritz on freenode and magnet), including:

  1. Your desired username
  2. What you want to do on the machine(s) (not necessary for #perl6 regulars)
  3. Which of the machine(s) you need access to
  4. Optionally an openssh public key
  5. Whether you'd be willing to help a bit with sysadmin tasks (mostly apt-get update && apt-get dist-upgrade, restarting hung services, killing huge processes)
  6. Software you need installed (it's OK to not know this up-front)

Note that feather.perl6.nl will shut down soon (no fixed date yet, but "end of 2014" is expected), so if you rely on feather now, you should consider migrating to the new server.

The code of conduct is pretty simple:

  1. Be reasonable in your resource usage.
  2. Use technical means to limit your resource usage so that it doesn't accidentally explode (ulimit comes to mind).
  3. Limit yourself to legal and Perl 6-related use cases (no warez).
  4. Help your fellow hackers.

The standard disclaimer applies:

  • Expect no privacy. There will potentially be many root users, who could all read your files and memory.
  • There are no promises of continued service or even support. Your account can be terminated without notice.
  • Place of jurisdiction in Nürnberg, Germany. You have to comply with German law while using the server. (Note that this puts pretty high standards on privacy for any user data you collect, including from web applications). It's your duty to inform yourself about the applicable laws. Illegal activities will be reported to the authorities.

With all that said, happy hacking!

Dave's Free Press: Journal: YAPC::Europe 2007 report: day 1

Dave's Free Press: Journal: Thanks, Yahoo!

Ocean of Awareness: Parsing: Top-down versus bottom-up

Comparisons between top-down and bottom-up parsing are often either too high-level or too low-level. Overly high-level treatments reduce the two approaches to buzzwords, and the comparison to a recitation of received wisdom. Overly low-level treatments get immersed in the minutiae of implementation, and the resulting comparison is as revealing as placing two abstractly related code listings side by side. In this post I hope to find the middle level; to shed light on why advocates of bottom-up and top-down parsing approaches take the positions they do; and to speculate about the way forward.

Top-down parsing

The basic idea of top-down parsing is as brutally simple as anything in programming: Starting at the top, we add pieces. We do this by looking at the next token and deciding then and there where it fits into the parse tree. Once we've looked at every token, we have our parse tree.

In its purest form, this idea is too simple for practical parsing, so top-down parsing is almost always combined with lookahead. Lookahead of one token helps a lot. Longer lookaheads are very sparsely used. They just aren't that helpful, and since the number of possible lookaheads grows exponentially, they get very expensive very fast.

Top-down parsing has an issue with left recursion. It's straightforward to see why. Take an open-ended expression like

    a + b + c + d + e + f + [....]

Here the plus signs continue off to the right, and adding any of them to the parse tree requires a dedicated node which must be above the node for the first plus sign. We cannot put that first plus sign into a top-down parse tree without having first dealt with all those plus signs that follow it. For a top-down strategy, this is a big, big problem.

Even in the simplest expression, there is no way of counting the plus signs without looking to the right, quite possibly a very long way to the right. When we are not dealing with simple expressions, this rightward-looking needs to get sophisticated. There are ways of dealing with this difficulty, but all of them share one thing in common -- they are trying to make top-down parsing into something that it is not.

Advantages of top-down parsing

Top-down parsing does not look at the right context in any systematic way, and in the 1970's it was hard to believe that top-down was as good as we can do. (It's not all that easy to believe today.) But its extreme simplicity is also top-down parsing's great strength. Because a top-down parser is extremely simple, it is very easy to figure out what it is doing. And easy to figure out means easy to customize.

Take another of the many constructs incomprehensible to a top-down parser:

    2 * 3 * 4 + 5 * 6
    

How do top-down parsers typically handle this? Simple: as soon as they realize they are faced with an expression, they give up on top-down parsing and switch to a special-purpose algorithm.

These two properties -- easy to understand and easy to customize -- have catapulted top-down parsing to the top of the heap. Behind their different presentations, combinator parsing, PEG, and recursive descent are all top-down parsers.

Bottom-up parsing

Few theoreticians of the 1970's imagined that top-down parsing might be the end of the parsing story. Looking to the right in ad hoc ways clearly does help. It would be almost paradoxical if there was no systematic way to exploit the right context.

In 1965, Don Knuth found an algorithm to exploit right context. Knuth's LR algorithm was, like top-down parsing as I have described it, deterministic. Determinism was thought to be essential -- allowing more than one choice easily leads to a combinatorial explosion in the number of possibilities that have to be considered at once. When parsers are restricted to dealing with a single choice, it is much easier to guarantee that they will run in linear time.

Knuth's algorithm did not try to hang each token from a branch of a top-down parse tree as soon as it was encountered. Instead, Knuth suggested delaying that decision. Knuth's algorithm collected "subparses".

When I say "subparses" in this discussion, I mean pieces of the parse that contain all the decisions necessary to construct the part of the parse tree that is below them. But subparses do not contain any decisions about what is above them in the parse tree. Put another way, subparses know who they are, but not where they belong.

Subparses may not know where they belong, but knowing who they are is enough for them to be assembled into larger subparses. And, if we keep assembling the subparses, eventually we will have a "subparse" that is the full parse tree. And at that point we will know both who everyone is and where everyone belongs.

Knuth's algorithm stored subparses by shifting them onto a stack. The operation to do this was called a "shift". (Single tokens of the input are treated as subparses with a single node.) When there was enough context to build a larger subparse, the algorithm popped one or more subparses off the stack, assembled a larger subparse, and put the resulting subparse back on the stack. This operation was called a "reduce", based on the idea that its repeated application eventually "reduces" the parse tree to its root node.

In handling the stack, we will often be faced with choices. One kind of choice is between using what we already have on top of the stack to assemble a larger subparse; or pushing more subparses on top of the stack instead ("shift/reduce"). When we decide to reduce, we may encounter the other kind of choice -- we have to decide which rule to use ("reduce/reduce").

Like top-down parsing, bottom-up parsing is usually combined with lookahead. For the same lookahead, a bottom-up parser parses everything that a top-down parser can handle, and more.

Formally, Knuth's approach is now called shift/reduce parsing. I want to demonstrate why theoreticians, and for a long time almost everybody else as well, were so taken with this method. I'll describe how it works on some examples, including two very important ones that stump top-down parsers: arithmetic expressions and left-recursion. My purpose here is to bring to light the basic concepts, and not to guide an implementor. There are excellent implementation-oriented presentations in many other places. The Wikipedia article, for example, is excellent.

Bottom-up parsing solved the problem of left recursion. In the example from above,

    a + b + c + d + e + f + [....]

we simply build one subparse after another, as rapidly as we can. In the terminology of shift/reduce, whenever we can reduce, we do. Eventually we will have run out of tokens, and will have reduced until there is only one element on the stack. That one remaining element is the subparse that is also, in fact, our full parse tree.

The top-down parser had a problem with left recursion precisely because it needed to build top-down. To build top-down, it needed to know about all the plus signs to come, because these needed to be fitted into the parse tree above the current plus sign. But when building bottom-up, we don't need to know anything about the plus signs that will be above the current one in the parse tree. We can afford to wait until we encounter them.

But if working bottom-up solves the left recursion problem, doesn't it create a right recursion problem? In fact, for a bottom-up parser, right recursion is harder, but not much. That's because of the stack. For a right recursion like this:

    a = b = c = d = e = f = [....]

we use a strategy opposite to the one we used for the left recursion. For left recursion, we reduced whenever we could. For right recursion, when we have a choice, we always shift. This means we will immediately shift the entire input onto the stack. Once the entire input is on the stack, we have no choice but to start reducing. Eventually we will reduce the stack to a single element. At that point, we are done. Essentially, what we are doing is exactly what we did for left recursion, except that we use the stack to reverse the order.

Arithmetic expressions like

    2 * 3 * 4 + 5 * 6

require a mixed strategy. Whenever we have a shift/reduce choice, and one of the operators is on the stack, we check to see if the topmost operator is a multiply or an addition operator. If it is a multiply operator, we reduce. In all other cases, if there is a shift/reduce choice, we shift.
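
To make that decision rule concrete, here is a toy sketch in Perl 6 (my own illustration, nothing to do with Marpa or yacc internals; it evaluates the expression instead of building a tree, but the shift/reduce choices are exactly the ones just described):

    sub shift-reduce(*@input) {
        my @stack;
        loop {
            # a complete  x op y  sits on top of the stack?
            my $complete = @stack >= 3 && @stack[*-2] eq any('+', '*');
            # reduce when nothing to the right binds more tightly:
            # '*' always; '+' only if the next token is not '*' (or input is exhausted)
            if $complete && (@stack[*-2] eq '*' || !@input || @input[0] ne '*') {
                my $y  = @stack.pop;
                my $op = @stack.pop;
                my $x  = @stack.pop;
                @stack.push: $op eq '*' ?? $x * $y !! $x + $y;   # "reduce"
            }
            elsif @input {
                @stack.push: @input.shift;                       # "shift"
            }
            else {
                last;
            }
        }
        @stack[0];
    }

    say shift-reduce(2, '*', 3, '*', 4, '+', 5, '*', 6);   # 54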

In the discussion above, I have pulled the strategy for making stack decisions (shift/reduce and reduce/reduce) out of thin air. Clearly, if bottom-up parsing was going to be a practical parsing algorithm, the stack decisions would have to be made algorithmically. In fact, discovering a practical way to do this was a far from trivial task. The solution in Knuth's paper was considered (and apparently intended) to be mathematically provocative, rather than practical. But by 1979, it was thought a practical way to make stack decisions had been found and yacc, a parser generator based on bottom-up parsing, was released. (Readers today may be more familiar with yacc's successor, bison.)

The fate of bottom-up parsing

With yacc, it looked as if the limitations of top-down parsing were past us. We now had a parsing algorithm that could readily and directly parse left and right recursions, as well as arithmetic expressions. Theoreticians thought they'd found the Holy Grail.

But not all of the medieval romances had happy endings. And as I've described elsewhere, this story ended badly. Bottom-up parsing was driven by tables which made the algorithm fast for correct inputs, but unable to accurately diagnose faulty ones. The subset of grammars parsed was still not quite large enough, even for conservative language designers. And bottom-up parsing was very unfriendly to custom hacks, which made every shortcoming loom large. It is much harder to work around a problem in a bottom-up parser than it is to deal with a similar shortcoming in a top-down parser. After decades of experience with bottom-up parsing, top-down parsing has re-emerged as the algorithm of choice.

Non-determinism

For many, the return to top-down parsing answers the question that we posed earlier: "Is there any systematic way to exploit right context when parsing?" So far, the answer seems to be a rather startling "No". Can this really be the end of the story?

It would be very strange if the best basic parsing algorithm we know is top-down. Above, I described at some length some very important grammars that can be parsed bottom-up but not top-down, at least not directly. Progress like this seems like a lot to walk away from, and especially to walk back all the way to what is essentially a brute force algorithm. This perhaps explains why lectures and textbooks persist in teaching bottom-up parsing to students who are very unlikely to use it. Because the verdict from practitioners seems to be in, and likely to hold up on appeal.

Fans of deterministic top-down parsing, and proponents of deterministic bottom-up parsing share an assumption: For a practical algorithm to be linear, it has to be deterministic. But is this actually the case?

It's not, in fact. To keep bottom-up parsing deterministic, we restricted ourselves to a stack. But what if we track all possible subpieces of parses? For efficiency, we can link them and put them into tables, making the final decisions in a second pass, once the tables are complete. (The second pass replaces the stack-driven see-sawing back and forth of the deterministic bottom-up algorithm, so it's not an inefficiency.) Jay Earley in 1968 came up with an algorithm to do this, and in 1991 Joop Leo added a memoization to Earley's algorithm which made it linear for all deterministic grammars.

The "deterministic grammars" are exactly the bottom-up parseable grammars with lookahead -- the set of grammars parsed by Knuth's algorithm. So that means the Earley/Leo algorithm parses, in linear time, everything that a deterministic bottom-up parser can parse, and therefore every grammar that a deterministic top-down parser can parse. (In fact, the Earley/Leo algorithm is linear for a lot of ambiguous grammars as well.)

Top-down parsing had the advantage that it was easy to know where you are. The Earley/Leo algorithm has an equivalent advantage -- its tables know where it is, and it is easy to query them programmatically. In 2010, this blogger modified the Earley/Leo algorithm to have the other big advantage of top-down parsing: The Marpa algorithm rearranges the Earley/Leo parse engine so that we can stop it, perform our own logic, and restart where we left off. A quite useable parser based on the Marpa algorithm is available as open source.

Comments

Comments on this post can be made in Marpa's Google group, or on our IRC channel: #marpa at freenode.net. To learn more about Marpa, there's the official web site maintained by Ron Savage. I also have a Marpa web site.

Dave's Free Press: Journal: POD includes

Dave's Free Press: Journal: cgit syntax highlighting

Perlgeek.de : First day at YAPC::Europe 2013 in Kiev

Today was the first "real" day of YAPC Europe 2013 in Kiev. In the same sense that it was the first real day, we had quite a nice "unreal" conference day yesterday, with a day-long Perl 6 hackathon, and in the evening a pre-conference meeting at a Soviet-style restaurant with tasty food and beverages.

The talks started with a few words of welcome, and then the announcement that the YAPC Europe next year will be in Sofia, Bulgaria, with the small side note that there were actually three cities competing for that honour. Congratulations to Sofia!

Larry's traditional keynote was quite emotional, and he had to fight tears a few times. Having had cancer and related surgeries in the past year, he still does his perceived duty to the Perl community, which I greatly appreciate.

Afterwards Dave Cross talked about 25 years of Perl in 25 minutes, which was a nice walk through some significant developments in the Perl world, though a bit hasty. Maybe picking fewer events and spending a bit more time on the selected few would give a smoother experience.

Another excellent talk that ran out of time was on Redis. Having experimented a wee bit with Redis in the past month, this was a real eye-opener on the wealth of features we might have used for a project at work, but in the end we didn't. Maybe we will eventually revise that decision.

Ribasushi talked about how hard benchmarking really is, and while I was (in principle) aware of that fact that it's hard to get right, there were still several significant factors that I overlooked (like the CPU's tendency to scale frequency in response to thermal and power-management considerations). I also learned that I should use Dumbbench instead of the Benchmark.pm core module. Sadly it didn't install for me (Capture::Tiny tests failing on Mac OS X).

The Perl 6 is dead, long live Perl 5 talk was much less inflammatory than the title would suggest (maybe due to Larry touching on the subject briefly during the keynote). It was mostly about how Perl 5 is used in the presenter's company, which was mildly interesting.

After tasty free lunch I attended jnthn's talk on Rakudo on the JVM, which was (as is typical for jnthn's talk) both entertaining and taught me something, even though I had followed the project quite a bit.

Thomas Klausner's Bread::Board by example made me want to refactor the OTRS internals very badly, because it is full of the anti-patterns that Bread::Board can solve in a much better way. I think that the OTRS code base is big enough to warrant the usage of Bread::Board.

I enjoyed Denis' talk on Method::Signatures, and was delighted to see that most syntax is directly copied from Perl 6 signature syntax. Talk about Perl 6 sucking creativity out of Perl 5 development.

The conference ended with a session of lightning talks, something which I always enjoy. Many lightning talks had a slightly funny tone or undertone, while still talking about interesting stuff.

Finally there was the "kick-off party", beverages and snacks sponsored by booking.com. There (and really the whole day, and yesterday too) I not only had conversations with my "old" Perl 6 friends, but also talked with many interesting people I never met before, or only met online before.

So all in all it was a nice experience, both from the social side, and from quality and contents of the talks. Venue and food are good, and the wifi too, except when it stops working for a few minutes.

I'm looking forward to two more days of conference!

(Updated: Fixed Thomas' last name)

Ocean of Awareness: What makes a parsing algorithm successful?

What makes a parsing algorithm successful? Two factors, I think. First, does the algorithm parse a workably-defined set of grammars in linear time? Second, does it allow the application to intervene in the parse with custom code? When parsing algorithms are compared, typically neither of these gets much attention. But the successful algorithms do one or the other.

Does the algorithm parse a workably-defined set of grammars in linear time?

"Workably-defined" means more than well-defined in the mathematical sense -- the definition has to be workable. That is, the definition must be something that, with reasonable effort, a programmer can use in practice.

The algorithms in regular expression engines are workably-defined. A regular expression, in the pure sense, consists of a sequence of symbols, usually shown by concatenation:

a b c

or a choice among sequences, usually shown by a vertical bar:

a | b | c

or a repetition of any of the above, typically shown with a star:

a*

or any recursive combination of these. True, if this definition is new to you, it can take time to get used to. But vast numbers of working programmers are very much "used to it", can think in terms of regular expressions, and can determine if a particular problem will yield to treatment as a regular expression, or not.
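For readers who want those three building blocks in concrete form, here is a tiny Perl illustration of my own (not from the post):

use strict;
use warnings;

# Concatenation: 'a' then 'b' then 'c'.
print "concatenation\n" if 'abc'   =~ /\Aabc\z/;

# Alternation: 'a' or 'b' or 'c'.
print "alternation\n"   if 'b'     =~ /\A(?:a|b|c)\z/;

# Repetition: zero or more 'a's.
print "repetition\n"    if 'aaaa'  =~ /\Aa*\z/;

# A recursive combination of all three: zero or more of ("ab" or "c").
print "combination\n"   if 'ababc' =~ /\A(?:ab|c)*\z/;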

Parsers in the LALR family (yacc, bison, etc.) do not have a workably defined set of grammars. LALR is perfectly well-defined mathematically, but even experts in parsing theory are hard put to decide if a particular grammar is LALR.

Recursive descent also does not have a workably defined set of grammars. Recursive descent doesn't even have a precise mathematical description -- you can say that recursive descent is LL, but in practice LL tables are rarely used. Also in practice, the LL logic is extended with every other trick imaginable, up to and including switching to other parsing algorithms.

Does it allow the user to intervene in the parse?

It is not easy for users to intervene in the processing of a regular expression, though some implementations attempt to allow such efforts. LALR parsers are notoriously opaque. Those who maintain the LALR-driven Perl parser have tried to supplement its abilities with custom code, with results that will not encourage others making the same attempt.
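Perl's own regex engine is one of the implementations that attempts it: embedded code blocks let the application run code in mid-match, although they are awkward to use well. A small sketch of my own:

use strict;
use warnings;

# A (?{ ... }) block runs Perl code during the match -- one of the few
# ways to "intervene" in a regex engine's processing.  Backtracking can
# run such a block more often than you might expect.
my @seen;
'abc' =~ / (?: (\w) (?{ push @seen, $1 }) )+ /x;
print "saw: @seen\n";    # typically prints: saw: a b c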

Recursive descent, on the other hand, has no parse engine -- it is 100% custom code. You don't get much friendlier than that.

Conclusions

Regular expressions are a success, and will remain so, because the set of grammars they handle is very workably-defined. Applications using regular expressions have to take what the algorithm gives them, but what it gives them is very predictable.

For example, an application can write regular expressions on the fly, and the programmer can be confident they will run as long as they are well-formed. And it is easy to determine if the regular expression is well-formed. (Whether it actually does what you want is a separate issue.)
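For instance, generating an alternation at run time and verifying that it compiles might look like this (a sketch of mine, not from the post):

use strict;
use warnings;

# Build an alternation on the fly from run-time data; quotemeta makes
# sure metacharacters in the data are treated literally.
my @words   = ( 'foo', 'bar', 'baz+' );
my $pattern = join '|', map { quotemeta } @words;

# Check that the generated expression is well-formed before using it.
my $re = eval { qr/\A(?:$pattern)\z/ };
die "Generated regex is not well-formed: $@" if not defined $re;

print "matched\n" if 'baz+' =~ $re;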

Recursive descent does not handle a workably-defined set of grammars, and it also has to be hand-written. But it makes up for this by allowing the user to step into the parsing process anywhere, and "get his hands dirty". Recursive descent does nothing for you, but it does allow you complete control. This is enough to make recursive descent the current algorithm of choice for major parsing projects.

As I have chronicled elsewhere, LALR was once, on highly convincing theoretical grounds, seen as the solution to the parsing problem. But while mathematically well-defined, LALR was not workably defined. And it was very hostile to applications that tried to alter, or even examine, its syntax-driven workings. After decades of trying to make it work, the profession has abandoned LALR almost totally.

What about Marpa?

Marpa has both properties: its set of grammars is workably-defined. And, while Marpa is syntax-driven like LALR and regular expressions, it also allows the user to stop the parse engine, communicate with it about the state of the parse, do her own parsing for a while, and restart the parse engine at any point she wants.

Marpa's workable definition has a nuance that the one for regular expressions does not. For regular expressions, linearity is a given -- they parse in linear time or fail. Marpa parses a much larger class of grammars, the context-free grammars -- anything that can be written in BNF. BNF is used to describe languages in standards, and is therefore itself a kind of "gold standard" for a workable definition of a set of grammars. However, Marpa does not parse everything that can be written in BNF in linear time.

Marpa's linearly-parsed set of grammars is smaller than the set of context-free grammars, but it is still very large, and it is still workably-defined. Marpa will parse any unambiguous language in linear time, unless it contains unmarked middle recursions. An example of a "marked" middle recursion is the language described by

S ::= a S a | x

a string of which is "aaaxaaa", where the "x" marks the middle. An example of an "unmarked" middle recursion is the language described by

S ::= a S a | a

a string of which is "aaaaaaa", where nothing marks the middle, so that you don't know until the end where the middle of the recursion is. If a human can reliably find the middle by eyeball, the middle recursion is marked. If a human can't, then the middle recursion might be unmarked.
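For the curious, the marked grammar above can be written almost verbatim in Marpa::R2's SLIF DSL. This is a sketch of my own, not code from the post:

use strict;
use warnings;
use Marpa::R2;

# The "marked" middle recursion S ::= a S a | x.
my $dsl = <<'END_OF_DSL';
:start ::= S
:default ::= action => ::array
S ::= 'a' S 'a'
S ::= 'x'
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
my $recce   = Marpa::R2::Scanless::R->new( { grammar => $grammar } );

my $input = 'aaaxaaa';    # the 'x' marks the middle
$recce->read( \$input );
my $value_ref = $recce->value();
die "No parse" if not defined $value_ref;
print "Parsed: $input\n";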

Marpa also parses a large set of ambiguous grammars linearly, and this set of grammars is also workably-defined. Marpa parses an ambiguous grammar in linear time if

  • It has no unmarked middle recursions.
  • All right recursions are unambiguous.
  • There are no cycles. A cycle occurs, for example, if there is a rule A ::= A in the grammar.
  • Marpa's level of ambiguity at any location is bounded by a constant.

The term "level of ambiguity" requires a bit of explanation. At any given location, there can be as many rules "in play" as you like, without affecting the level of ambiguity. The key question: What is the maximum number of different origins that a rule might have? (The "origin" of a rule is the location where it begins.) That is, can a rule currently in play have at most 20 different origins? Or could it have its origin at every location so far? If the maximum number of origins is 20 or any other fixed constant, the level of ambiguity is "bounded". But if the maximum number of origins keeps growing as the length of the input grows, the level of ambiguity is unbounded.

For the unambiguous case, Marpa's workable definition encompasses a much larger class of grammars, but is no more complex than that for regular expressions. If you want to extend even further, and work with ambiguous grammars, the definition remains quite workable. Of the four restrictions needed to ensure linearity, the one requiring a bounded level of ambiguity is the only one that might force you to exercise real vigilance -- once you get into ambiguity, unboundedness is easy to slip into.

As for the other three, cycles never occur in practical grammars, and Marpa reports them, so that you simply fix them when they happen. Most recursions will be left recursions, which are unrestricted. My experience has been that, in practical grammars, unmarked middle recursions and ambiguous right recursions are not especially tempting features. If you note whenever you use a right recursion, checking that it is not ambiguous, and if you note whenever you use a middle recursion, checking that it is marked, then you will stay linear.

To learn more about Marpa, there's the official web site maintained by Ron Savage. I also have a Marpa web site.

Comments

Comments on this post can be made in Marpa's Google group, or on our IRC channel: #marpa at freenode.net.

Dave's Free Press: Journal: CPAN Testers' CPAN author FAQ

Ocean of Awareness: Reporting mismatched delimiters

In many contexts, programs need to identify non-overlapping pieces of a text. One very direct way to do this is to use a pair of delimiters. One delimiter of the pair marks the start and the other marks the end. Delimiters can take many forms: Quote marks, parentheses, curly braces, square brackets, XML tags, and HTML tags are all delimiters in this sense.

Mismatching delimiters is easy to do. Traditional parsers are often poor at reporting these errors: hopeless after the first mismatch, and for that matter none too precise about the first one. This post outlines a scaleable method for the accurate reporting of mismatched delimiters. I will illustrate the method with a simple but useable tool -- a utility which reports mismatched brackets.

The example script

The example script, bracket.pl, reports mismatched brackets in the set:

() {} []

They are expected to nest without overlaps. Other text is treated as filler. bracket.pl is not smart about things like strings or comments. This does have the advantage of making bracket.pl mostly language-agnostic.

Because it's intended primarily to be read as an illustration of the technique, bracket.pl's grammar is a basic one. The grammar that bracket.pl uses is so simple that an emulator of bracket.pl could be written using recursive descent. I hope the reader who goes on to look into the details will see that this technique scales to more complex situations, in a way that a solution based on a traditional parser will not.

Error reports

The description of how the method works will make more sense after we've looked at some examples of the diagnostics bracket.pl produces. To be truly useful, bracket.pl must report mismatches that span many lines, and it can do this. But single-line examples are easier to follow. All the examples in this post will be contained in a single line. Consider the string '((([))'. bracket.pl's diagnostics are:

* Line 1, column 1: Opening '(' never closed, problem detected at end of string
((([))
^
====================
* Line 1, column 4: Missing close ], problem detected at line 1, column 5
((([))
   ^^

In the next example bracket.pl realizes that it cannot accept the ')' at column 16, without first closing the set of curly braces started at column 5. It identifies the problem, along with both of the locations involved.

* Line 1, column 5: Missing close }, problem detected at line 1, column 16
[({({x[]x{}x()x)})]
    ^          ^

So far, so good. But an important advantage of bracket.pl has yet to be seen. Most compilers, once they report a first mismatched delimiter, produce error messages that are unreliable -- so unreliable that they are useless in practice. bracket.pl repairs a mismatched bracket before continuing, so that it can do a reasonable job of analyzing the text that follows. Consider the text '({]-[(}-[{)'. The output of bracket.pl is

* Line 1, column 1: Missing close ), problem detected at line 1, column 3
({]-[(}-[{)
^ ^
====================
* Line 1, column 2: Missing close }, problem detected at line 1, column 3
({]-[(}-[{)
 ^^
====================
* Line 1, column 3: Missing open [
({]-[(}-[{)
  ^
====================
* Line 1, column 5: Missing close ], problem detected at line 1, column 7
({]-[(}-[{)
    ^ ^
====================
* Line 1, column 6: Missing close ), problem detected at line 1, column 7
({]-[(}-[{)
     ^^
====================
* Line 1, column 7: Missing open {
({]-[(}-[{)
      ^
====================
* Line 1, column 9: Missing close ], problem detected at line 1, column 11
({]-[(}-[{)
        ^ ^
====================
* Line 1, column 10: Missing close }, problem detected at line 1, column 11
({]-[(}-[{)
         ^^
====================
* Line 1, column 11: Missing open (
({]-[(}-[{)
          ^

Each time, bracket.pl corrects itself, and accurately reports the next set of problems.

A difficult error report

To be 100% accurate, bracket.pl would have to guess the programmer's intent. This is, of course, not possible. Let's look at a text where bracket.pl's guesses are not so good: {{]}. Here we will assume the closing square bracket is a typo for a closing curly brace. Here's the result:

* Line 1, column 1: Missing close }, problem detected at line 1, column 3
{{]}
^ ^
====================
* Line 1, column 2: Missing close }, problem detected at line 1, column 3
{{]}
 ^^
====================
* Line 1, column 3: Missing open [
{{]}
  ^
====================
* Line 1, column 4: Missing open {
{{]}
   ^

Instead of one error, bracket.pl finds four.

But even in this case, the method is fairly good, especially when compared with current practice. The problem is at line 1, column 3, and the first three messages all identify this as one of their potential problem locations. It is reasonable to believe that a programmer, especially once he becomes used to this kind of mismatch reporting, will quickly find the first mismatch and fix it. For this difficult case, bracket.pl may not be much better than the state of the art, but it is certainly no worse.

How it works

For full details of the workings of bracket.pl there is the code, which is heavily commented. This section provides a conceptual overview.

bracket.pl uses two features of Marpa: left-eideticism and the Ruby Slippers. By left-eidetic, I mean that Marpa knows everything there is to know about the parse at, and to left of, the current position. As a consequence, Marpa also knows exactly which of its input symbols can lead to a successful parse, and is able to stop as soon as it knows that the parse cannot succeed.

In the Ruby Slippers technique, we arrange for parsing to stop whenever we encounter an input which would cause parsing to fail. The application then asks Marpa, "OK. What input would allow the parse to continue?" The application takes Marpa's answer to this question, and uses it to concoct an input that Marpa will accept.
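To make the "ask Marpa" step concrete, here is a toy sketch of my own, not code from bracket.pl: a grammar for parentheses only, an input whose first '(' is never closed, and a query, after the read, for the lexemes Marpa would accept in order to continue. Inventing the missing token and feeding it back in -- the Ruby Slippers move itself -- is what bracket.pl does next; see its gist for those details.

use strict;
use warnings;
use Marpa::R2;

# Toy grammar: balanced parentheses with arbitrary filler between them.
my $dsl = <<'END_OF_DSL';
:start ::= text
:default ::= action => ::array
text     ::= piece*
piece    ::= filler | balanced
balanced ::= (lparen) text (rparen)
lparen   ~ '('
rparen   ~ ')'
filler   ~ [^()]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
my $recce   = Marpa::R2::Scanless::R->new( { grammar => $grammar } );

my $input = '(((x))';    # the first '(' is never closed
$recce->read( \$input );
my $value_ref = $recce->value();

if ( not defined $value_ref ) {
    # No complete parse: ask Marpa which lexemes it would accept at the
    # point where it stopped -- here, the end of the string.  The answer
    # should include rparen, the missing close paren.
    my $expected = $recce->terminals_expected();
    print "Problem at end of input; Marpa would accept: @{$expected}\n";
}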

In this case, bracket.pl creates a virtual token which fixes the mismatch of brackets. Whatever the missing bracket may be, bracket.pl invents a bracket of that kind, and adds it to the virtual input. This done, parsing and error detection can proceed as if there was no problem. Of course, the error which made the Ruby Slippers token necessary is recorded, and those records are the source of the error reports we saw above.

To make its error messages as informative as possible in the case of missing closing brackets, bracket.pl needs to report the exact location of the opening bracket. Left-eideticism again comes in handy here. Once the virtual closing bracket is supplied to Marpa, bracket.pl asks, "That bracketed text that I just closed -- where did it begin?" The Marpa parser tracks the start location of all symbol and rule instances, so it is able to provide the application with the exact location of the starting bracket.

When bracket.pl encounters a problem at a point where there are unclosed opening brackets, it has two choices. It can be optimistic or it can be pessimistic. "Optimistic" means it can hope that something later in the input will close the opening bracket. "Pessimistic" means it can decide that "all bets are off" and use Ruby Slippers tokens to close all the currently active open brackets.

bracket.pl uses the pessimistic strategy. While the optimistic strategy sounds better, in practice the pessimistic one seems to provide better diagnostics. The pessimistic strategy does report some fixable problems as errors. But the optimistic one can introduce spurious fixes. These hide the real errors, and it is worse to miss errors than it is to overreport them. Even when the pessimistic strategy overreports, its first error message will always accurately identify the first problem location.

While bracket.pl is already useable, I think of it as a prototype. Beyond that, the problem of matching delimiters is in fact very general, and I believe these techniques may have very wide application.

For more

The example script of this post is a Github gist. For more about Marpa, there's the official web site maintained by Ron Savage. I also have a Marpa web site. Comments on this post can be made in Marpa's Google group.

Dave's Free Press: Journal: YAPC::Europe 2006 report: day 3
