Gabor Szabo: Switching Gears? Changing direction?

I have been struggling with what I am doing for quite some time. I keep seeing evidence that the user-base of Perl is shrinking. While you can't see it from these numbers, the Perl Maven site has not gained more readers since February 2015. But I still love to explain stuff about Perl.

For the full article visit Switching Gears? Changing direction?

NeilB: CPAN Weekly: one module per week, in your inbox

CPAN Weekly is a mailing list for Perl 5 programmers. Each week there will be one short message sent to the list, with a brief description of a CPAN module, and example usage.

The idea is not to provide a tutorial, but just to make you aware of the module, and show one basic use case. By planting seeds in your mental Perl toolbox, hopefully next time you have certain needs you will think "oh, I read about a module for that!", rather than "I'll just write a module for that".

You can sign up at cpan-weekly.org.

The idea for this came while reviewing the 2015 Pull Request Challenge. A number of participants commented that an unexpected side effect of taking part in the challenge was learning a bit more about some of the modules on CPAN, and realising how many there were.

The first module will be mailed to the list in the week starting Monday 15th February.

You can help with this project: email me (neil at bowers dot com) and let me know which modules you consider to be hidden gems of CPAN.

Thanks to listbox for providing the mailing list.

Perl Foundation News: Perl 6 Release Goals: Final Grant Report

Jonathan Worthington writes:

I applied for a third and final extension of my Perl 6 Release Goals grant, which was published for comments in December and subsequently approved. The final extension granted a further 110 hours of work, which I completed prior to the Christmas release of Perl 6. This report covers the work that was done under this extension, and concludes with some final comments on the grant as a whole.

I'd like to start with a small note on timing. In November, I worked almost exclusively on Perl 6. Around the middle of the month, I had exhausted all of the hours that had been assigned in the previous grant extension. The general understanding on Perl 6 Core Development Fund grants is that I may - at my own risk - go ahead and continue with work that needs doing, in hope that a grant extension application will be approved. I did this, concurrent with writing up a report on what was achieved and requesting the extension. Thus, I didn't actually endure a sleepless week or two in December completing the hours in the final grant extension - as was speculated in one comment! Rather, the extension covered all of my December work, as well as work in the later parts of November.

Numerous issues were resolved during the hours provided by this final grant extension:

  • Supplies, the Perl 6 API for asynchronous streams of data, got a design cleanup. The API was good overall, but several corners of it were suboptimal both from a language design and safety point of view, as well as from an optimizability perspective.
  • Some API design issues around async sockets and processes, as well as with Promise combinators, were resolved. The CLOSE phaser was added to supply blocks to facilitate resource management, and the whenever syntax came to support channels as well as promises and supplies. This meant that the earliest block syntax, which I've never been entirely happy with, could go away. Finally, a couple of other concurrency bugs were resolved.
  • A number of important I/O issues were dealt with, the most notable of which involved dealing with various complaints about Windows newline handling. The native file descriptor behind a handle was also exposed, for use in conjunction with native calling, and UDP support was added to IO::Socket::Async.
  • The semantics of multi methods stubbed in roles, as well as composition of multi methods in roles, were reviewed and modified to be more useful.
  • Sized native lexical variables got a good looking over, as well as unsigned native integers. Numerous issues around them were addressed.
  • A few control flow related semantic issues were ironed out, generally involving the interaction of phasers and control flow operations (such as next and last).
  • Nearly 20 other smaller semantic bugs were resolved in a range of areas: list flattening edge cases, role punning, .?/.+/.* behavior with multis, multi-dispatch with optional parameters, shadowing of built-in types, return constraints on blocks, and sigilless variables in list assignments.
  • A couple of nasty bugs were fixed (a GC hang, a pre-compilation bug, and a meta-object mixins problem).

I also contributed in various ways to preparing for the release itself. Of note, I added the experimental pragma and moved a number of things we were not happy to include in Perl 6 Christmas behind it. I also clarified version reporting to reflect the language/compiler version distinction more cleanly. Finally, I was there on Christmas day itself to lend a hand with the release.

With the Perl 6 Christmas release now made, this Perl 6 Release Goals grant has reached its natural conclusion. I would like to thank all those who have contributed funds to make the initial grant and its two extensions possible. For me, 2015 was a year with various happy distractions, but also in the latter parts of the year suboptimal health. Together, these notably reduced my usual levels of "free time" for participating in Perl 6. So, rather than simply enabling me to do a bit more, this grant was critical to my continued substantial involvement in the Perl 6 project during this important year. I would also like to thank TPF for administering this grant, my grant manager, and last - but certainly not least - the Perl 6 community, who I count among the best folks I've worked with on anything, ever.

dagolden: My Github dashboard of neglect

Bitrot.

The curse of being a prolific publisher is a long list of once-cherished, now-neglected modules.

Earlier this week, I got a depressing Github notification. The author of a pull request, who had politely pestered me for a while to review his PR, added this comment:

1 year has passed

Ouch!

Sadly, after taking time to review the PR, I actually decided it wasn't a great fit and politely (I hope) rejected it. And then I felt even WORSE, because I'd made someone wait around a year for me to say "no".

Much like my weight hitting a local maximum on the scale, goading me to rededicate myself to healthier eating [dear startups, enough with the constant junk food, already!], this PR felt like a low point in my open-source maintenance.

And, so, just like I now have an app to show me a dashboard of my food consumption, I decided I needed a bird's-eye view of what I'd been ignoring on Github.

Here, paraphrased, is my "conversation" with Github.

Me: Github, show me a dashboard!  

GH: Here's a feed of events on repos you watch

Me: No, I want a dashboard.

GH: Here's a list of issues created, assigned or mentioning you.

Me: No, I want a dashboard.  Maybe I need an organization view.  [my CPAN repos are in an organization]

GH: Here's a feed of events on repos in the organization.

Me: No, I want a dashboard of issues.

GH: Here's a list of issues for repos in the organization.

Me: Uh, can you summarize that?

GH: No.

Me: Github, you suck.  But you have an API.  Time to bust out some Perl.

So I wrote my own github-dashboard program, using Net::GitHub. (Really, I adapted it from other Net::GitHub programs I already use.) I keep my Github user id and API token in my .gitconfig, so the program pulls my credentials from there.
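
The general shape of such a program, as a rough sketch (assuming Net::GitHub's repos->list_org and issue->repos_issues methods, hypothetical github.user/github.token keys in .gitconfig, and ignoring pagination; this is not the actual github-dashboard script), might be:

#!/usr/bin/env perl
use strict;
use warnings;
use Net::GitHub;

# Hypothetical .gitconfig keys; adjust to wherever you keep credentials.
chomp( my $user  = qx{git config --get github.user} );
chomp( my $token = qx{git config --get github.token} );

my $gh  = Net::GitHub->new( access_token => $token );
my $org = 'dagolden';    # the organization (or user) holding the repos

my %count;    # repo name => [ PR count, issue count, wishlist count ]
for my $repo ( $gh->repos->list_org($org) ) {
    my @issues = $gh->issue->repos_issues( $org, $repo->{name}, { state => 'open' } );
    for my $i (@issues) {
        my $is_wishlist = $i->{title} =~ /wishlist/i
            || grep { $_->{name} =~ /wishlist/i } @{ $i->{labels} || [] };
        # GitHub's issues API returns PRs too, flagged with a pull_request key
        my $slot = $i->{pull_request} ? 0 : $is_wishlist ? 2 : 1;
        $count{ $repo->{name} }[$slot]++;
    }
}

# Print one dashboard line per repo, sorted by PRs, then issues.
for my $name ( sort { ( $count{$b}[0] // 0 ) <=> ( $count{$a}[0] // 0 )
                   || ( $count{$b}[1] // 0 ) <=> ( $count{$a}[1] // 0 ) } keys %count ) {
    printf "%45s %3d %3d %3d\n", $name, map { $_ // 0 } @{ $count{$name} }[ 0 .. 2 ];
}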

Below, you can see my Github dashboard of neglect (top 40 only!). The three columns of numbers are (respectively) PRs, non-wishlist issues and wishlist issues. (Wishlist items are identified either by label or by "wishlist" in the title.)

$ ./github-dashboard |  head -40
                               Capture-Tiny   3  18   0
                                    Meerkat   2   8   0
                               getopt-lucid   2   1   0
                                  Path-Tiny   1  21   0
                               HTTP-Tiny-UA   1   5   0
                         Path-Iterator-Rule   1   5   0
  Dist-Zilla-Plugin-BumpVersionAfterRelease   1   3   2
                              Metabase-Fact   1   3   0
                dist-zilla-plugin-osprereqs   1   2   0
       Dist-Zilla-Plugin-Test-ReportPrereqs   1   2   0
                                    ToolSet   1   2   0
        Dist-Zilla-Plugin-Meta-Contributors   1   1   0
     Dist-Zilla-Plugin-MakeMaker-Highlander   1   0   0
                         Task-CPAN-Reporter   1   0   0
                           IO-CaptureOutput   0   7   0
                                     pantry   0   7   2
                     TAP-Harness-Restricted   0   4   0
                            class-insideout   0   3   0
                               Hash-Ordered   0   3   0
                                    Log-Any   0   3   4
                                  perl-chef   0   3   0
                                 Term-Title   0   3   0
                               Test-DiagINC   0   3   0
                          Acme-require-case   0   2   0
                                 Class-Tiny   0   2   0
                                  Data-Fake   0   2   2
                  dist-zilla-plugin-twitter   0   2   0
                   Log-Any-Adapter-Log4perl   0   2   0
                             math-random-oo   0   2   0
                                 superclass   0   2   0
                                   Test-Roo   0   2   0
                              universal-new   0   2   0
                           zzz-rt-to-github   0   2   0
                      app-ylastic-costagent   0   1   0
                      Dancer-Session-Cookie   0   1   0
          Dist-Zilla-Plugin-CheckExtraTests   0   1   0
          Dist-Zilla-Plugin-InsertCopyright   0   1   0
Dist-Zilla-Plugin-ReleaseStatus-FromVersion   0   1   0
                                 File-chdir   0   1   0
                                 File-pushd   0   1   0

Now, when I set aside maintenance time, I know where to work.

dagolden: A parallel MongoDB client with Perl and fork

Concurrency is hard, and that's just as true in Perl as it is in most languages. While Perl has threads, they aren't lightweight, so they aren't an obvious answer to parallel processing the way they are elsewhere. In Perl, doing concurrent work generally means (a) a non-blocking/asynchronous framework or (b) forking sub-processes as workers.

There is no officially-supported async MongoDB driver for Perl (yet), so this article is about forking.

The problem with forking a MongoDB client object is that forks don't automatically close sockets. And having two (or more) processes trying to use the same socket is a recipe for corruption.

At one point in the design of the MongoDB Perl driver v1.0.0, I had it cache the PID on creation and then check if it had changed before every operation. If so, the socket to the MongoDB server would be closed and re-opened. It was auto-magic!
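
Conceptually, the check looked something like this (a simplified sketch with hypothetical internal method names, not the actual driver code):

# Sketch only: cache the PID at construction time, then before every
# operation compare it to $$, which changes in a forked child.
sub _maybe_reconnect {
    my ($self) = @_;
    if ( $self->{_pid} != $$ ) {
        $self->{_pid} = $$;
        $self->_close_sockets;    # hypothetical internal methods
        $self->_connect;
    }
    return;
}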

The problem with this approach is that it incurs overhead on every operation, regardless of whether forks are in use. Even if forks are used, they are rare compared to the frequency of database operations for any non-trivial program.

So I took out that mis-feature. Now, you must manually call the reconnect method on your client objects after you fork (or spawn a thread).

Here's a pattern I've found myself using from time to time to do parallel processing with Parallel::ForkManager, adapted to reconnect the MongoDB client object in each child:

use Parallel::ForkManager;

# Pass in a MongoDB::MongoClient object, the number of parallel jobs to
# run, and a code-reference to execute. The code reference is passed
# the client and the iteration number.
sub parallel_mongodb {
    my ( $client, $jobs, $fcn ) = @_;

    my $pm = Parallel::ForkManager->new( $jobs > 1 ? $jobs : 0 );

    local $SIG{INT} = sub {
        warn "Caught SIGINT; Waiting for child processes\n";
        $pm->wait_all_children;
        exit 1;
    };

    for my $i ( 0 .. $jobs - 1 ) {
        $pm->start and next;
        $SIG{INT} = sub { $pm->finish };
        $client->reconnect;
        $fcn->( $i );
        $pm->finish;
    }

    $pm->wait_all_children;
}

To use this subroutine, I partition the input data into the number of jobs to run. Then I call parallel_mongodb with a closure that can find the input data from the job number:

use MongoDB;

# Partition input data into N parts.  Assume each is a document to insert.
my @data = (
   [ { a => 1 },  {b => 2},  ... ],
   [ { m => 11 }, {n => 12}, ... ],
   ...
);
my $number_of_jobs = @data;

my $client = MongoDB->connect;
my $coll = $client->ns("test.dataset");

parallel_mongodb( $client, $number_of_jobs,
  sub {
    $coll->insert_many( $data[ shift ], { ordered => 0 } );
  }
);

Of course, you want to be careful that the job count (i.e. the partition count) is optimal. I find that having it roughly equal to the number of CPUs tends to work pretty well in practice.

What you don't want to do, however, is call $pm->start more times than the number of child tasks you want running in parallel. You don't want a new process for every data item, since each fork also has to reconnect to the database, which is slow. That's why you should figure out the partitioning first, and only spawn a process per partition.

This is best for "embarrassingly parallel" problems, where there's no need for communication back from the child processes. And while what I've shown does a manual partition into arrays, you could also do this with a single array, where child workers only process indices where the index modulo the number of jobs is equal to the job ID (see the sketch below). Or you could have child workers pulling from a common task queue over a network, etc.
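
For example, the modulo variant might look roughly like this (a sketch reusing the parallel_mongodb helper from above; the collection name and job count are arbitrary):

use MongoDB;

# One shared array of documents; worker $job_id takes every $jobs-th
# element, starting at its own offset, instead of a pre-split partition.
my @docs = map { { n => $_ } } 1 .. 100_000;
my $jobs = 4;

my $client = MongoDB->connect;
my $coll   = $client->ns("test.dataset");

parallel_mongodb( $client, $jobs,
    sub {
        my $job_id = shift;
        my @mine   = @docs[ grep { $_ % $jobs == $job_id } 0 .. $#docs ];
        $coll->insert_many( \@mine, { ordered => 0 } );
    }
);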

TIMTOWTDI, and now you can do it in parallel.

Perl Foundation News: Perl 5 Grant Application: QA Hackathon Travel

We have received the following grant application from Ricardo Signes. Before we vote on this proposal we would like to have a period of community consultation that will last seven days. Please leave feedback in the comments or if you prefer send email with your comments to karen at perlfoundation.org.

Name: Ricardo Signes

Project Title: Perl QA Hackathon 2016

Amount Requested: $1200

Synopsis:

This grant will be used to pay for travel for Ricardo Signes to and from the Perl QA Hackathon held in Rugby, UK in Q1 2016.

Benefits to Perl 5:

I have attended six of the seven Perl QA Hackathons (Oslo, Birmingham, Amsterdam, Paris, Lancaster, and Lyon) and have, at each of them, been able to contribute several solid days of very productive work to the infrastructure behind the CPAN and related tools. Specifically, I was one of the chief implementors of the new CPAN Testers platform (Metabase), built the Fake CPAN system for testing CPAN tools, and wrote several reusable software libraries that are used to power both Metabase and Fake CPAN. In 2012, I worked on refactoring PAUSE, adding tests and improving maintainability. PAUSE is the system which processes contributor uploads to the CPAN, manages CPAN contributor identity, and builds the CPAN indexes used by CPAN clients to locate libraries for installation.

In previous years, I also spent a significant amount of time working with other attendees on their contributions, and plan to do the same this year. This is one of the several reasons that attendance in person is incomparably superior to "virtual attendance."

Deliverable Elements:

The QA Hackathon does not have a set agenda, so promising specific work product from it up front seems unwise. I have detailed, above, the sort of work that I am almost certain to do, however. Further, I will provide a public, written report of my activities at the Hackathon.

I hope, in particular, to work on the web code of PAUSE and to discuss mechanisms for improving collaborative code review within the community of toolchain maintainers.

The hackathon takes place over the course of four days, with eight to ten hour workdays. I'll probably also be working during the travel and in the evenings.

Any software that I produce will be released under the Perl 5 standard license terms, or possibly even less restrictive terms.

Applicant Biography:

I have been building software in Perl professionally for about fifteen years. I am a frequent contributor of original software to the CPAN and a frequent contributor to, or maintainer of, other popular CPAN libraries. I am also a contributor to the core Perl 5 project, and its current project lead.

I have been the recipient of TPF grants five times before, all of which were successful.

Joel Berger: Get an in-browser remote desktop with Mojolicious and noVNC

The article itself is published at PerlTricks. This is the second article I’ve published there (the first was about How to send verification emails using Mojolicious). Hopefully more to come!

Sawyer X: Perl 5 Porters Mailing List Summary: January 25th - February 1st

Hey everyone,

Following is the p5p (Perl 5 Porters) mailing list summary for the past week. Enjoy!

January 25th - February 1st

Corrections

The previous summary accidentally included the wrong ticket number for a Storable bug and blamed JSON::XS and Cpanel::JSON::XS. Those had been fixed in the published blog post and in the repo. My apologies. Thanks, Ben Bullock, for the correction!

News and updates

Encode 2.80 released! You can read more here.

Dagfinn Ilmari Mannsåker merged his branch that exposes more siginfo_t fields to the sounds of appreciation from fellow developers.

Craig A. Berry has integrated podlators into core.

podlators 4.06 released!

The 12th grant report from Tony Cook's 6th grant in which approximately 9 tickets were reviewed or worked on, and 3 patches were applied in roughly 17 hours.

Tony also provides a summary of the month of December. Roughly 50 hours in which approximately 28 tickets were reviewed, and 5 patches were applied.

Bugs

Reported bugs

Resolved bugs

Rejected bugs

Proposed patches

Another proposed patch by Tony Cook in Perl #126410 which does not break on debugging/threaded builds.

Discussion

Following Chad Granum's release of Importer, Aristotle commented on the list not favoring this suggestion, while Kent Fredric commented on its benefits in comparison with the current exporting approach.

The discussion of the topic Karl Williamson raised, concerning two different implementations of Unicode sentence boundaries, continues. It is still unclear what should be supported and how.

Ben Bullock pinned the problem in Perl #127232 to Storable breaking the encapsulation of objects.

Bulk88 covered several ways of storing C resources in Perl. This is a worthy read.

Karl Williamson provided a review of a patch provided by Niko Tyni in Perl #127288.

Ricardo Signes is pinging the list on Perl #125833 and suggesting simply forbidding any leading colons in require or use statements.

Another ping from Ricardo on Perl #125569, regarding a memory saving patch by Bulk88.

And one more ping from Ricardo on Perl #116965, which garnered some interest and discussion.

Ricardo also commented on Perl #124368 with regards to handling literal // and /$null/.

Dennis Kaarsemaker and Tony Cook fixed the Win32 Jenkins build and Dennis took the time to share with the list how the build script was fixed.

James E. Keenan started testing blead on Darwin/PPC and found two failing tests on older Darwins for the new siginfo_t fields that Dagfinn Ilmari Mannsåker exposed. Ilmari and Lukas provided patches with a fix and James is running a smoke test with them.

Ed Avis opened Perl #127405 on removing the core function dump since it serves little to no value. Several comments added information on its purpose, problems, and lack of current usefulness.

Felipe Gasper opened Perl #127386 regarding setting the proper value for $!. This led to an interesting talk on the list regarding how Perl handles exit codes.

Jarkko Hietaniemi sent a Git hook he wrote that enforces a smoke test before a commit push, which he uses frequently, along with explanations on how it works.

Perl #127391 does not seem like a bug, but led to a discussion on associative subtleties.

Did you know that in the old days you could start a shell script with a colon? More explanations from Zefram here.

Perl Foundation News: Ian Hague Perl 6 Grant Application: JavaScript backend for Rakudo

We have received the following Perl 6 Ian Hague Grant Application. Before we vote on this proposal we would like to have a period of community consultation for 10 days. Please leave feedback in the comments or if you prefer send email with your comments to karen at perlfoundation.org.

Name: Paweł Murias

Project Title: JavaScript backend for Rakudo

Synopsis:

Improve the JavaScript backend from handling only NQP (Not Quite Perl) to handling full Perl 6.

Benefits to Perl 6 Development:

A JavaScript backend for Rakudo will allow the use of Perl 6 in many new niches. The main focus of the grant is to allow Perl 6 to be used for writing the frontend part of single page applications (for the backend part we can use MoarVM).

A side benefit of the grant is that I intend to create a web-based REPL that should allow users to play around with Perl 6 without installing it. The goal of the grant is to provide a JavaScript backend with enough features that the community can start experimenting with what running inside the browser will allow us to accomplish.

While working on the JavaScript backend I write tests for the things I'm implementing. Expanding the test suite will directly help future backend authors. It also helps anyone making non-trivial changes to the existing backends. I have found MoarVM and JVM bugs in the past doing that.

Deliverables:

  • Upload rakudo-js to npm and CPAN.
  • Have this rakudo-js be able to compile our chosen subset of the 6.c roast (the official Perl 6 test suite) to JavaScript and pass it in a modern browser.
  • Write a simple REPL in Perl 6 that will run in a modern browser.
  • Write a tutorial showing how to use the JavaScript backend.

Project Details:

Rakudo compiles Perl 6 and NQP (a subset of Perl 6 that Rakudo itself is written in) to an abstract syntax tree form called QAST. QAST is then passed to either the MoarVM, JVM or JavaScript backends. Currently the JavaScript backend can only handle AST that is produced from NQP. The goal of this project is to improve the JavaScript backend to handle the QAST produced from full Perl 6.

I started the original work on the JavaScript backend while Rakudo was transitioning from being a Parrot-targeting compiler to a multi-platform one. Part of the work on the backend was done as part of a GSoC project. After the GSoC project I undertook a rewrite of the backend. The rewrite allowed me to add source maps support and use more type information to generate better code. The JavaScript backend is now merged into the master branch of the NQP repo.

After reviewing the initial draft of this grant proposal Jonathan Worthington pointed out that implementing gather/take proved to be tricky on other backends. To reduce this risk I added basic continuations support to the backend. This was enough to run a basic form of gather/take: https://github.com/perl6/nqp/blob/master/t/js/continuations.t#L76 . I implemented this using a CPS transform with a trampoline (to work around the lack of tail call optimization).
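
The trampoline idea itself is independent of the backend; as a rough illustration in Perl 5 (not the backend's actual JavaScript), each CPS-style step returns a zero-argument closure instead of calling deeper, and a driver loop keeps invoking those closures so the call stack never grows:

use strict;
use warnings;

# Continuation-passing countdown: instead of recursing, each step returns
# a zero-argument closure (a thunk) describing the next step.
sub countdown_cps {
    my ( $n, $k ) = @_;
    return $n == 0
        ? sub { $k->() }
        : sub { countdown_cps( $n - 1, $k ) };
}

# The trampoline: keep calling thunks until one returns a non-code result,
# so a million "recursive" steps never deepen the stack.
sub trampoline {
    my ($thunk) = @_;
    $thunk = $thunk->() while ref $thunk eq 'CODE';
    return $thunk;
}

trampoline( countdown_cps( 1_000_000, sub { print "done\n"; 'done' } ) );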

Most of Perl 6 is built from smaller building blocks. This means that a large part of the effort will be needed before I get to the point where the test module compiles and the first test passes. On the other hand, once the needed building blocks are implemented correctly I will be able to reuse the quality work that went into Rakudo and the setting.

Inch-stones:

  • Cleanup the array handling in nqp-js.
  • Finish up serialization of closures in the nqp-js-running-on-js.
  • Go through the MoarVM opcode list and, where possible, write tests for untested opcodes and implement them in nqp-js.
  • Do the obvious speedups for the code generated by nqp-js.
  • Compile the meta-model and bootstrap support with nqp-js.
  • Implement a bunch of p6 specific ops.
  • Get rakudo to compile on nqp-js.
  • Get rakudo compiled to js to correctly compile nqp::say("Hello World").
  • Get the rakudo setting to compile.
  • Get rakudo compiled to js to correctly compile say("Hello World").
  • Get Test.pm to correctly compile.
  • Pass a first test.
  • Go through roast tests, fixing bugs and implementing missing things to make them pass.
  • Get continuations support fully functional. Get nqp-js to pass all tests in full CPS mode.
  • Pass the part of roast we focus on under node.js.
  • Be able to webpack the generated javascript code.
  • Be able to run tests in the browser.
  • Pass the part of roast we focus on in a browser.
  • Polish up source maps support.
  • Implement (and test) interoperability with JavaScript code.
  • Upload rakudo-js on npm.
  • Write a simple Perl 6 REPL that should run in a modern browser.
  • Put the REPL on try.perl6.org when the community views it as good enough.
  • Write a tutorial that describes how to use the backend.
  • Make sure our source map support works correctly and integrates well with browsers.
  • Fix issues that early adopters will encounter.
  • Fix the most obvious performance issues.

Completeness Criteria:

  • Rakudo-js released on npm
  • Rakudo-js passes our chosen subset of roast
  • A simple Perl 6 REPL running in a modern browser (for evaluation purposes a modern version of Google Chrome)

List of tests we want to pass:

We want to pass tests from the official Perl 6 test suite, "Roast". It can be found on github.com/perl6/roast.

For the purpose of the grant we focus on a subset of those (mainly excluding IO and OS interaction). Some tests might be broken for reasons outside of the JavaScript backend, so for grant completion purposes we are only concerned with the tests that pass on the Rakudo MoarVM backend.

We want to pass the tests in the following subdirectories of roast:

  • S02-lexical-conventions
  • S02-lists
  • S02-literals
  • S02-magicals
  • S02-names
  • S02-names-vars
  • S02-one-pass-parsing
  • S02-packages
  • S02-types
  • S03-binding
  • S03-feeds
  • S03-junctions
  • S03-metaops
  • S03-operators
  • S03-sequence
  • S03-smartmatch
  • S04-blocks-and-statements
  • S04-declarations
  • S04-exception-handlers
  • S04-exceptions
  • S04-phasers
  • S04-statement-modifiers
  • S04-statement-parsing
  • S04-statements
  • S05-capture
  • S05-grammar
  • S05-interpolation
  • S05-mass
  • S05-match
  • S05-metachars
  • S05-metasyntax
  • S05-modifier
  • S05-nonstrings
  • S05-substitution
  • S05-syntactic-categories
  • S05-transliteration
  • S06-advanced
  • S06-currying
  • S06-macros
  • S06-multi
  • S06-operator-overloading
  • S06-other
  • S06-routine-modifiers
  • S06-signature
  • S06-traits
  • S07-iterators
  • S09-autovivification
  • S09-hashes
  • S09-subscript
  • S09-typed-arrays
  • S10-packages
  • S11-modules
  • S12-attributes
  • S12-class
  • S12-construction
  • S12-enums
  • S12-introspection
  • S12-meta
  • S12-methods
  • S12-subset
  • S12-traits
  • S13-overloading
  • S13-syntax
  • S13-type-casting
  • S14-roles
  • S14-traits
  • S32-array
  • S32-basics
  • S32-container
  • S32-exceptions
  • S32-hash
  • S32-list
  • S32-num
  • S32-scalar
  • S32-str
  • S32-temporal
  • S32-trig

Project Schedule:

The project is expected to take 4 months of full time effort. I will begin work as soon as the grant gets accepted. Based on the progress of previous backends most of the effort will be needed to get to the point where all the basic building blocks are working and we start passing tests.

The plan for the months of the grant is:

  • Getting to the point where we can begin compiling the CORE setting.
  • Correctly compile and load the setting.
  • Fixing the inevitable bugs that cause failing tests and implementing missing bits of functionality. After this step we should be passing the roast test suite.
  • Working on making the backend easy to install and use. This will include writing a tutorial, fixing issues that early users find, tweaking source maps, improving obvious performance problems.

Report Schedule:

I will report on the progress of the grant on a blog at least every two weeks, preferably more often. I will also keep the #perl6 channel updated on my progress.

Public Repository:

The backend code will be hosted at github.com/perl6/nqp. Any required modifications to rakudo will be hosted at github.com/perl6/rakudo. (Work on rakudo itself will initially be done in either a branch or a github fork).

Grant Deliverables ownership/copyright and License Information:

All the work produced as a result of this grant will be licensed under the Artistic License Version 2.0. I will send in the CLA and if required transfer the copyright to The Perl Foundation.

Things not addressed by the scope of the grant:

Performance and size of the generated JavaScript code will likely be an important concern before the backend sees serious production use. While I'll keep it in mind and attempt to solve the most obvious problems, it's a very open-ended issue and not the main focus of the grant. As the feedback from the #perl6 channel seemed to imply that webapps are what the community is most interested in, I'm moving IO support while running on node.js outside of the scope of the grant.

Amount Requested: $10000.

Bio:

I worked on the mildew/smop/kp6 Perl 6 implementations. Once the Perl 6 implementations converged on Rakudo I started working on the JavaScript backend for it, starting first with one for NQP. I worked on the JavaScript backend for NQP both outside of and as part of a Google Summer of Code project. I have tweaked both the MoarVM and JVM backends, so if changes to the whole of NQP are required I will be capable of that.

Perl Foundation News: Grant Report: Test::Simple/Stream Stabilization

In the last month, Chad has been working with Ricardo Signes (rjbs) doing final tweaking of Test2. For that, some new versions of Test2 and related modules have been published for testing and review purposes.

For those who are lazy, some pointers here for Test2, Test2::Suite, Test2::Workflow and a dev release of Test::Builder.

Perlgeek.de : All Perl 6 modules in a box

Sometimes when we change things in the Perl 6 language or the Rakudo Perl 6 compiler that implements it, we want to know if the planned changes will cause fallout in the library modules out there, and how much.

To get a quick estimate, we can now do a git grep in the experimental perl6-all-modules repository.

This is an attempt to get all the published modules into a single git repository. It is built using git subrepo, an unofficial git extension module that I've been wanting to try for some time, and that seems to have some advantages over submodules in some cases. The notable one in this case being that git grep ignores submodules, but descends into subrepos just fine.

Here is the use case that made me create this repository: Rakudo accesses low-level operations through the nqp:: pseudo namespace. For example nqp::concat_s('a', 'b') is a low-level way to concatenate two strings. User-level programs can also use nqp:: ops, though it is generally a bad idea, because it ties the program to the particular compiler used; what's more, the nqp:: ops are not part of the public API, and thus are neither documented in the same place as the rest of Perl 6, nor do they come with any promise of stability.

So we want to require module authors to use a pragma, use nqp;, in order to make their use of compiler internals explicit and deliberate. And of course, where possible, we want them to not use them at all :-)

To find out how many files in the ecosystem use nqp:: ops, a simple command, combined with the power of the standard UNIX tools, will help:

$ git grep -l 'nqp::'|wc -l
32

That's not too bad, considering we have... how many modules/distributions again?

Since they are added in author/repo structure, counting them with ls and wc isn't hard:

$ ls -1d */*/|wc -l
282

Ok, but the number of files in relation to distributions isn't really useful. So let's ask: how many distributions directly use nqp:: ops?

$ git grep -l nqp:: | cut -d/ -f1,2 |sort -u|wc -l
23

23 out of 282 (or about 8%) distributions use the nqp:: syntax.

By the way, there is a tool (written in Perl 6, of course) to generate and update the repository. Not perfect yet, very much a work in progress. It's in the _tools folder, so you should probably filter out that directory in your queries (though in the examples above, it doesn't make a difference).

So, have fun with this new toy!

Dave's Free Press: Journal: Module pre-requisites analyser

Dave's Free Press: Journal: CPANdeps

Dave's Free Press: Journal: Perl isn't dieing

Perlgeek.de : doc.perl6.org: some stats, future directions

In June 2012 I started the perl6/doc repository with the intent to collect/write API documentation for Perl 6 built-in types and routines. Not long afterwards, the website doc.perl6.org was born, generated from the aforementioned repository.

About 2.5 years later, the repository has seen more than one thousand commits from more than 40 contributors, 14 of whom contributed ten patches or more. The documentation encompasses about 550 routines in 195 types, with 15 documents for things other than built-in types (for example an introduction to regexes, and descriptions of how variables work).

In terms of subjective experience, I have observed an increase in the number of questions on our IRC channel and elsewhere that could be answered by pointing to the appropriate pages of doc.perl6.org, or by augmenting the answer with a statement like "for more info, see ..."

While it's far from perfect, I think both the numbers and the experience are very encouraging, and I'd like to thank everybody who helped make that happen, often by contributing skills I'm not good at: front-end design, good English and gentle encouragement.

Plans for the Future

Being a community-driven project, I can't plan anybody else's time on it, so these are my own plans for the future of doc.perl6.org.

Infrastructural improvements

There are several unsolved problems with the web interface, with how we store our documents, and how information can be found. I plan to address them slowly but steadily.

  • The search is too much centered around types and routines; searching for variables, syntactic constructs and keywords isn't easily possible. I want it to find many more things than it does right now.
  • Currently we store the docs for each type in a separate file called Type.pod. This will break when we start to document native types, which begin with lower-case letters. Having int.pod and Int.pod is completely unworkable on case-insensitive or case-preserving file systems. I want to come up with a solution for that, though I don't yet know what it will look like.
  • doc.perl6.org is served from static pages, which leads to some problems with file names conflicting with UNIX conventions. You can't name a file infix:</>.html, and files with two consecutive dots in their names are also weird. So in the long run, we'll have to switch to some kind of dynamic URL dispatching, or a name escaping scheme that is capable of handling all of Perl 6's syntax.
  • Things like the list of methods and what they coerce to in class Cool don't show up in derived types; either the tooling needs to be improved for that, or they need to be rewritten to use the usual one-heading-per-method approach.

Content

Of course my plan is to improve coverage of the built-in types and routines, and add more examples. In addition, I want to improve and expand on the language documentation (for example syntax, OO, regexes, MOP), ideally documenting every Perl 6 feature.

Once the language features are covered in sufficient breadth and depth (though I won't wait for 100% coverage), I want to add three tutorial tracks:

  • A track for beginners
  • A quick-start for programmers from other languages
  • A series of intermediate to advanced guides covering topics such as parsing, how to structure a bigger application, the responsible use of meta programming, or reactive programming.

Of course I won't be able to do that all on my own, so I hope to convince my fellow and future contributors that those are good ideas.

Time to stop rambling about the future, and off to writing some docs, this is yours truly signing off.

Perlgeek.de : Introducing Go Continuous Delivery

Go Continuous Delivery (GoCD for short, or simply Go) is an open source tool that controls an automated build or deployment process.

It consists of a server component that holds the pipeline configuration, polls source code repositories for changes, schedules and distributes work, collects artifacts, presents a web interface to visualize and control it all, and offers a mechanism for manual approval of steps. One or more agents can connect to the server and carry out the actual jobs in the build pipeline.

Pipeline Organization

Every build, deployment or test job that GoCD executes must be part of a pipeline. A pipeline consists of one or more linearly arranged stages. Within a stage, jobs run potentially in parallel and are individually distributed to agents. Tasks are in turn executed linearly within a job. The most general task is the execution of an external program. Other tasks include the retrieval of artifacts, or specialized things such as running a Maven build.

Matching of Jobs to Agents

When an agent is idle, it polls the server for work. If the server has jobs to run, it uses two criteria to decide if the agent is fit for carrying out the job: environments and resources.

Each job is part of a pipeline, and a pipeline is part of an environment. On the other hand, each agent is configured to be part of one or more environments. An agent only accepts jobs from pipelines from one of its environments.

Resources are user-defined labels that describe what an agent has to offer, and inside a pipeline configuration, you can specify what resources a job needs. For example, you can define that a job requires the phantomjs resource to test a web application; then only agents to which you assign this resource will execute that job. It is also a good idea to add the operating system and version as resources. In the example above, the agent might have the phantomjs, debian and debian-jessie resources, offering the author of the job some choice of granularity for specifying the required operating system.

Installing the Go Server on Debian

To install the Go server on a Debian or Debian-based operating system, first you have to make sure you can download Debian packages via HTTPS:

$ apt-get install -y apt-transport-https

Then you need to configure the package sources:

$ echo 'deb http://dl.bintray.com/gocd/gocd-deb/ /' > /etc/apt/sources.list.d/gocd.list
$ curl https://bintray.com/user/downloadSubjectPublicKey?username=gocd | apt-key add -

And finally install it:

$ apt-get update && apt-get install -y go-server

When you now point your browser at port 8154 of the go server for HTTPS (ignore the SSL security warnings) or port 8153 for HTTP, you should see the go server's web interface:

To prevent unauthenticated access, create a password file (you need to have the apache2-utils package installed to have the htpasswd command available) on the command line:

$ htpasswd -c -s /etc/go-server-passwd go-admin
New password:
Re-type new password:
Adding password for user go-admin
$ chown go: /etc/go-server-passwd
$ chmod 600 /etc/go-server-passwd

In the go web interface, click on the Admin menu and then "Server Configuration". In the "User Management" section, enter the path /etc/go-server-passwd in the field "Password File Path" and click on "Save" at the bottom of the form.

Immediately afterwards, the go server asks you for a username and password.

You can also use LDAP or Active Directory for authentication.

Installing a Go Worker on Debian

On one or more servers where you want to execute the automated build and deployment steps, you need to install a go agent, which will connect to the server and poll it for work. On each of these servers, you need to do the same first three steps as when installing the server, to ensure that you can install packages from the go package repository. And then, of course, install the go agent:

$ apt-get install -y apt-transport-https
$ echo 'deb http://dl.bintray.com/gocd/gocd-deb/ /' > /etc/apt/sources.list.d/gocd.list
$ curl https://bintray.com/user/downloadSubjectPublicKey?username=gocd | apt-key add -
$ apt-get update && apt-get install -y go-agent

Then edit the file /etc/default/go-agent. The first line should read

GO_SERVER=127.0.0.1

Change the right-hand side to the hostname or IP address of your go server, and then start the agent:

$ service go-agent start

After a few seconds, the agent has contacted the server, and when you click on the "Agents" menu in the server's web frontend, you should see the agent:

("lara" is the host name of the agent here).

A Word on Environments

Go makes it possible to run agents in specific environments. You could, for example, run a go agent on each testing and each production machine, and use the matching of pipelines to agent environments to ensure that an installation step happens on the right machine in the right environment. If you go with this model, you can also use Go to copy the build artifacts to the machines where they are needed.

I chose not to do this, because I didn't want to have to install a go agent on each machine that I want to deploy to. Instead I use Ansible, executed on a Go worker, to control all machines in an environment. This requires managing the SSH keys that Ansible uses, and distributing packages through a Debian repository. But since Debian seems to require a repository anyway to be able to resolve dependencies, this is not much of an extra hurdle.

So don't be surprised when the example project here only uses a single environment in Go, which I call Control.

First Contact with Go's XML Configuration

There are two ways to configure your Go server: through the web interface, and through a configuration file in XML. You can also edit the XML config through the web interface.

While the web interface is a good way to explore go's capabilities, it quickly becomes annoying to use due to too much clicking. Using an editor with good XML support gets things done much faster, and it lends itself better to compact explanation, so that's the route I'm going here.

In the Admin menu, the "Config XML" item lets you see and edit the server config. This is what a pristine XML config looks like, with one agent already registered:

<?xml version="1.0" encoding="utf-8"?>
<cruise xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="cruise-config.xsd" schemaVersion="77">
<server artifactsdir="artifacts" commandRepositoryLocation="default" serverId="b2ce4653-b333-4b74-8ee6-8670be479df9">
    <security>
    <passwordFile path="/etc/go-server-passwd" />
    </security>
</server>
<agents>
    <agent hostname="lara" ipaddress="192.168.2.43" uuid="19e70088-927f-49cc-980f-2b1002048e09" />
</agents>
</cruise>

The serverId attribute and the agent data will differ in your installation, even if you followed the same steps.

To create an environment and put the agent in, add the following section somewhere within <cruise>...</cruise>:

<environments>
    <environment name="Control">
    <agents>
        <physical uuid="19e70088-927f-49cc-980f-2b1002048e09" />
    </agents>
    </environment>
</environments>

(The agent UUID must be that of your agent, not of mine).

To give the agent some resources, you can change the <agent .../> tag in the <agents> section to read:

<agent hostname="lara" ipaddress="192.168.2.43" uuid="19e70088-927f-49cc-980f-2b1002048e09">
  <resources>
    <resource>debian-jessie</resource>
    <resource>build</resource>
    <resource>debian-repository</resource>
  </resources>
</agent>

Creating an SSH key

It is convenient for Go to have an SSH key without password, to be able to clone git repositories via SSH, for example.

To create one, run the following commands on the server:

$ su - go
$ ssh-keygen -t rsa -b 2048 -N '' -f ~/.ssh/id_rsa

And either copy the resulting .ssh directory and the files therein onto each agent into the /var/go directory (and remember to set owner and permissions as they were created originally), or create a new key pair on each agent.

Ready to Go

Now that the server and an agent have some basic configuration, they are ready for their first pipeline configuration. Which we'll get to soon :-).


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


Dave's Free Press: Journal: YAPC::Europe 2007 report: day 3

Perlgeek.de : Automating Deployments: Distributing Debian Packages with Aptly

Once a Debian package is built, it must be distributed to the servers it is to be installed on.

Debian, as well as all other operating systems I know of, uses a pull model for that. That is, the packages and their meta data are stored on a server that the client can contact to request them.

The sum of the meta data and packages is called a repository. In order to distribute packages to the servers that need them, we must set up and maintain such a repository.

Signatures

In Debian land, packages are also signed cryptographically, to ensure packages aren't tampered with on the server or during transmission.

So the first step is to create a key pair that is used to sign this particular repository. (If you already have a PGP key for signing packages, you can skip this step).

The following assumes that you are working with a pristine system user that does not have a gnupg keyring yet, and which will be used to maintain the debian repository. It also assumes you have the gnupg package installed.

$ gpg --gen-key

This asks a bunch of questions, like your name and email address, key type and bit width, and finally a pass phrase. I left the pass phrase empty to make it easier to automate updating the repository, but that's not a requirement.

$ gpg --gen-key
gpg (GnuPG) 1.4.18; Copyright (C) 2014 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

gpg: directory `/home/aptly/.gnupg' created
gpg: new configuration file `/home/aptly/.gnupg/gpg.conf' created
gpg: WARNING: options in `/home/aptly/.gnupg/gpg.conf' are not yet active during this run
gpg: keyring `/home/aptly/.gnupg/secring.gpg' created
gpg: keyring `/home/aptly/.gnupg/pubring.gpg' created
Please select what kind of key you want:
   (1) RSA and RSA (default)
   (2) DSA and Elgamal
   (3) DSA (sign only)
   (4) RSA (sign only)
Your selection? 1
RSA keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048) 
Requested keysize is 2048 bits
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0) 
Key does not expire at all
Is this correct? (y/N) y
You need a user ID to identify your key; the software constructs the user ID
from the Real Name, Comment and Email Address in this form:
    "Heinrich Heine (Der Dichter) <heinrichh@duesseldorf.de>"

Real name: Aptly Signing Key
Email address: automatingdeployments@gmail.com
You selected this USER-ID:
    "Moritz Lenz <automatingdeployments@gmail.com>"

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O
You need a Passphrase to protect your secret key.

You don't want a passphrase - this is probably a *bad* idea!
I will do it anyway.  You can change your passphrase at any time,
using this program with the option "--edit-key".

We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
..........+++++
.......+++++

Not enough random bytes available.  Please do some other work to give
the OS a chance to collect more entropy! (Need 99 more bytes)
..+++++
gpg: /home/aptly/.gnupg/trustdb.gpg: trustdb created
gpg: key 071B4856 marked as ultimately trusted
public and secret key created and signed.

gpg: checking the trustdb
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
pub   2048R/071B4856 2016-01-10
      Key fingerprint = E80A D275 BAE1 DEDE C191  196D 078E 8ED8 071B 4856
uid                  Moritz Lenz <automatingdeployments@gmail.com>
sub   2048R/FFF787F6 2016-01-10

Near the bottom, the line starting with pub contains the key ID:

pub   2048R/071B4856 2016-01-10

We'll need the public key later, so it's best to export it:

$ gpg --export --armor 071B4856 > pubkey.asc

Preparing the Repository

There are several options for managing Debian repositories. My experience with debarchiver is mixed: once set up, it works, but it does not give immediate feedback on upload; rather, it communicates success or failure by email, which isn't very well-suited for automation.

Instead I use aptly, which works fine from the command line, and additionally supports keeping several versions of the same package in one repository.

To initialize a repo, we first have to come up with a name. Here I call it internal.

$ aptly repo create -distribution=jessie -architectures=amd64,i386,all -component=main internal

Local repo [internal] successfully added.
You can run 'aptly repo add internal ...' to add packages to repository.

$ aptly publish repo -architectures=amd64,i386,all internal
Warning: publishing from empty source, architectures list should be complete, it can't be changed after publishing (use -architectures flag)
Loading packages...
Generating metadata files and linking package files...
Finalizing metadata files...
Signing file 'Release' with gpg, please enter your passphrase when prompted:
Clearsigning file 'Release' with gpg, please enter your passphrase when prompted:

Local repo internal has been successfully published.
Please setup your webserver to serve directory '/home/aptly/.aptly/public' with autoindexing.
Now you can add following line to apt sources:
  deb http://your-server/ jessie main
Don't forget to add your GPG key to apt with apt-key.

You can also use `aptly serve` to publish your repositories over HTTP quickly.

As the message says, there needs to be a HTTP server that makes these files available. For example an Apache virtual host config for serving these files could look like this:

<VirtualHost *:80>
        ServerName apt.example.com
        ServerAdmin moritz@example.com

        DocumentRoot /home/aptly/.aptly/public/
        <Directory /home/aptly/.aptly/public/>
                Options +Indexes +FollowSymLinks

                Require all granted
        </Directory>

        # Possible values include: debug, info, notice, warn, error, crit,
        # alert, emerg.
        LogLevel notice
        CustomLog /var/log/apache2/apt/access.log combined
        ErrorLog /var/log/apache2/apt/error.log
        ServerSignature On
</VirtualHost>

After creating the logging directory (mkdir -p /var/log/apache2/apt/), enabling the virtual host (a2ensite apt.conf) and restarting Apache, the Debian repository is ready.

Adding Packages to the Repository

Now that the repository is set up, you can add a package by running

$ aptly repo add internal package-info_0.1-1_all.deb
$ aptly publish update internal

Configuring a Host to use the Repository

Copy the PGP public key with which the repository is signed (pubkey.asc) to the host which shall use the repository, and import it:

$ apt-key add pubkey.asc

Then add the actual package source:

$ echo "deb http://apt.example.com/ jessie main" > /etc/apt/source.list.d/internal

After an apt-get update, the contents of the repository are available, and an apt-cache policy package-info shows the repository as a possible source for this package:

$ apt-cache policy package-info
package-info:
  Installed: (none)
  Candidate: 0.1-1
  Version table:
 *** 0.1-1 0
        990 http://apt.example.com/ jessie/main amd64 Packages
        100 /var/lib/dpkg/status

This concludes the whirlwind tour through debian repository management and thus package distribution. Next up will be the actual package installation.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


Dave's Free Press: Journal: Devel::CheckLib can now check libraries' contents

Ocean of Awareness: Top-down parsing is guessing

Top-down parsing is guessing. Literally. Bottom-up parsing is looking.

The way you'll often hear that phrased is that top-down parsing is looking, starting at the top, and bottom-up parsing is looking, starting at the bottom. But that is misleading, because the input is at the bottom -- at the top there is nothing to look at. A usable top-down parser must have a bottom-up component, even if that component is just lookahead.

A more generous, but still accurate, way to describe the top-down component of parsers is "prediction". And prediction is, indeed, a very useful component of a parser, when used in combination with other techniques.

Of course, if a parser does nothing but predict, it can predict only one input. Top-down parsing must always be combined with a bottom-up component. This bottom-up component may be as modest as lookahead, but it must be there or else top-down parsing is really not parsing at all.

So why is top-down parsing used so much?

Top-down parsing may be unusable in its pure form, but from one point of view that is irrelevant. Top-down parsing's biggest advantage is that it is highly flexible -- there's no reason to stick to its "pure" form.

A top-down parser can be written as a series of subroutine calls -- a technique called recursive descent. Recursive descent allows you to hook in custom-written bottom-up logic at every top-down choice point, and it is a technique which is completely understandable to programmers with little or no training in parsing theory. When dealing with recursive descent parsers, it is more useful to be a seasoned, far-thinking programmer than it is to be a mathematician. This makes recursive descent very appealing to seasoned, far-thinking programmers, and they are the audience that counts.
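
As a toy illustration (a sketch, not any production parser), here is a recursive descent parser in Perl for nested, comma-separated lists of numbers: one subroutine per grammar rule, with a single token of lookahead serving as the bottom-up component:

use strict;
use warnings;
use Data::Dumper;

# Toy grammar:  list ::= '(' item ( ',' item )* ')'
#               item ::= NUMBER | list
my @tokens;

sub parse {
    my ($input) = @_;
    @tokens = $input =~ /( \( | \) | , | \d+ )/gx;   # trivial tokenizer
    my $tree = parse_list();
    die "trailing input: @tokens\n" if @tokens;
    return $tree;
}

sub parse_list {
    expect('(');
    my @items = ( parse_item() );
    while ( @tokens && $tokens[0] eq ',' ) {
        shift @tokens;                               # consume the comma
        push @items, parse_item();
    }
    expect(')');
    return \@items;
}

sub parse_item {
    die "unexpected end of input\n" unless @tokens;
    # One token of lookahead -- the bottom-up component -- picks the rule.
    return $tokens[0] eq '(' ? parse_list() : shift @tokens;
}

sub expect {
    my ($want) = @_;
    my $got = @tokens ? shift @tokens : '(end of input)';
    die "expected '$want', got '$got'\n" unless $got eq $want;
}

print Dumper( parse('(1, 2, (3, 4))') );    # [ '1', '2', [ '3', '4' ] ]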

Switching techniques

You can even use the flexibility of top-down to switch away from top-down parsing. For example, you could claim that a top-down parser could do anything my own parser (Marpa) could do, because a recursive descent parser can call a Marpa parser.

A less dramatic switchoff, and one that still leaves the parser with a good claim to be basically top-down, is very common. Arithmetic expressions are essential for a computer language. But they are also among the many things top-down parsing cannot handle, even with ordinary lookahead. Even so, most computer languages these days are parsed top-down -- by recursive descent. These recursive descent parsers deal with expressions by temporarily handing control over to a bottom-up operator precedence parser. Neither of these parsers is extremely smart about the hand-over and hand-back -- it is up to the programmer to make sure the two play together nicely. But used with caution, this approach works.
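
A stripped-down picture of that hand-over (a sketch, not any particular compiler's code): a recursive descent routine for an assignment statement calls a precedence-climbing expression parser, which returns control as soon as it sees a token it does not handle:

use strict;
use warnings;
use Data::Dumper;

my @tok;
my %prec = ( '+' => 1, '-' => 1, '*' => 2, '/' => 2 );

# Top-down part: statement ::= IDENT '=' expression ';'
sub parse_assignment {
    my $var = shift @tok;
    die "expected '='\n" unless shift(@tok) eq '=';
    my $expr = parse_expr(0);                # hand over to the bottom-up part
    die "expected ';'\n" unless shift(@tok) eq ';';
    return [ 'assign', $var, $expr ];
}

# Bottom-up part: precedence climbing over binary operators.
sub parse_expr {
    my ($min_prec) = @_;
    my $lhs = shift @tok;                    # assume simple operands
    while ( @tok && exists $prec{ $tok[0] } && $prec{ $tok[0] } >= $min_prec ) {
        my $op  = shift @tok;
        my $rhs = parse_expr( $prec{$op} + 1 );    # left-associative
        $lhs = [ $op, $lhs, $rhs ];
    }
    return $lhs;                             # unknown token: give control back
}

@tok = split ' ', 'x = 1 + 2 * 3 - 4 ;';
print Dumper( parse_assignment() );          # '*' binds tighter than '+'/'-'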

Top-down parsing and language-oriented programming

But what about taking top-down methods into the future of language-oriented programming, extensible languages, and grammars which write grammars? Here we are forced to confront the reality -- that the effectiveness of top-down parsing comes entirely from the foreign elements that are added to it. Starting from a basis of top-down parsing is literally starting with nothing. As I have shown in more detail elsewhere, top-down techniques simply do not have enough horsepower to deal with grammar-driven programming.

Perl 6 grammars are top-down -- PEG with lots of extensions. These extensions include backtracking, backtracking control, a new style of tie-breaking and lots of opportunity for the programmer to intervene and customize everything. But behind it all is a top-down parse engine.

One aspect of Perl 6 grammars might be seen as breaking out of the top-down trap. That trick of switching over to a bottom-up operator precedence parser for expressions, which I mentioned above, is built into Perl 6 and semi-automated. (I say semi-automated because making sure the two parsers "play nice" with each other is not automated -- that's still up to the programmer.)

As far as I know, this semi-automation of expression handling is new with Perl 6 grammars, and it may prove handy for duplicating what is done in recursive descent parsers. But it adds no new technique to those already in use. And features like

  • multiple types of expression, which can be told apart based on their context,
  • n-ary expressions for arbitrary n, and
  • the autogeneration of multiple rules, each allowing a different precedence scheme, for expressions of arbitrary arity and associativity,

all of which are available and in current use in Marpa, are impossible for the technology behind Perl 6 grammars.

I am a fan of the Perl 6 effort. Obviously, I have doubts about one specific set of hopes for Perl 6 grammars. But these hopes have not been central to the Perl 6 effort, and I will be an eager student of the Perl 6 team's work over the coming months.

Comments

To learn more about Marpa, there's the official web site maintained by Ron Savage. I also have a Marpa web site. Comments on this post can be made in Marpa's Google group, or on our IRC channel: #marpa at freenode.net.

Ocean of Awareness: Fast handy languages

Back around 1980, I had access to UNIX and a language I wanted to parse. I knew that UNIX had all the latest CS tools. So I expected to type in my BNF and "Presto, Language!".

Not so easy, I was told. Languages were difficult things created with complex tools written by experts who understood the issues. I recall thinking that, while English has a syntax that is as hard as they come, toddlers manage to parse it just fine. But experts are experts, and more so at second-hand.

I was steered to an LALR-based parser called yacc. (Readers may be more familiar with bison, a yacc successor.) LALR had extended the class of quickly parseable grammars a bit beyond recursive descent. But recursive descent was simple in principle, and its limits were easy to discover and work around. LALR, on the other hand, was OK when it worked, but figuring out why it failed when it failed was more like decryption than debugging, and this was the case both with parser development and run-time errors. I soon gave up on yacc and found another way to solve my problem.

Few people complained about yacc on the Internet. If you noise it about that you are unable to figure out how to use what everybody says is the state-of-the-art tool, the conclusions drawn may not be the ones you want. But my experience seems to have been more than common.

LALR's claim to fame was that it was the basis of the industry-standard C compiler. Over three decades, its maintainers suffered amid the general silence. But by 2006, they'd had enough. GCC (the new industry standard) ripped its LALR engine out. By then the trend back to recursive descent was well underway.

A surprise discovery

Back in the 1970's, there had been more powerful alternatives to LALR and recursive descent. But they were reputed to be slow.

For some applications slow is OK. In 2007 I decided that a parsing tool that parsed all context-free languages at state-of-the-art speeds, slow or fast as the case might be, would be a useful addition to programmer toolkits. And I ran into a surprise.

Hidden in the literature was an amazing discovery -- a 1991 article by Joop Leo that described how to modify Earley's algorithm to be fast for every language class in practical use. (When I say "fast" in this article, I will mean "linear".) Leo's article had been almost completely ignored -- my project (Marpa) would become its first practical implementation.

Second-order languages

The implications of Leo's discovery go well beyond speed. If you can rely on the BNF that you write always producing a practical parser, you can auto-generate your language. In fact, you can write languages which write languages.

Which languages are fast?

The Leo/Earley algorithm is not fast for every BNF-expressible language. BNF is powerful, and you can write exponentially ambiguous languages in it. But programmers these days mostly care about unambiguous languages -- we are accustomed to tools and techniques that parse only a subset of these.

As I've said, Marpa is fast for every language class in practical use today. Marpa is almost certainly fast for any language that a modern programmer has in mind. Unless you peek ahead at the hints I am about to give you, in fact, it is actually hard to write an unambiguous grammar that goes non-linear on Marpa. Simply mixing up lots of left, right and middle recursions will not be enough to make an unambiguous grammar go non-linear. You will also need to violate a rule in the set that I am about to give you.

To guarantee that Marpa is fast for your BNF language, follow three rules:

  • Rule 1: Your BNF must be unambiguous.
  • Rule 2: Your BNF must have no "unmarked" middle recursions.
  • Rule 3: All of the right-recursive symbols in your BNF must be dedicated to right recursion.

Rule 3 turns out to be very easy to obey. I discuss it in detail in the next section, which will be about how to break these rules and get away with it.

Before we do that, let's look at what an "unmarked" middle recursion is. Here's an example of a "marked" middle recursion:

       M ::= 'b'
       M ::= 'a' M 'a'
    

Here the "b" symbol is the marker. This marked middle recursion generates sequences like

       b
       a b a
       a a b a a
    

Now for an "unmarked" middle recursion:

       M ::= 'a' 'a'
       M ::= 'a' M 'a'
    

This unmarked middle recursion generates sequences like

       a a
       a a a a
       a a a a a a
    

In this middle recursion there is no marker. To know where the middle is, you have to scan all the way to the end, and then count back.

A rule of thumb is that if you can "eyeball" the middle of a long sequence, the recursion is marked. If you can't, it is unmarked. Unfortunately, we can't characterize exactly what a marker must look like -- a marker can encode the moves of a Turing machine, so marker detection is undecidable.
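
As a concrete illustration, here is a sketch of the marked middle recursion above written for Marpa::R2's SLIF interface. The surrounding details (default semantics, lexeme options) are kept to a minimum and may need adjusting for your Marpa::R2 version; treat it as a sketch rather than canonical usage.

#!/usr/bin/perl
use strict;
use warnings;
use Marpa::R2;

# The "marked" middle recursion from above, as a Marpa::R2 SLIF grammar.
my $dsl = <<'END_OF_DSL';
:start ::= M
M ::= 'b'
M ::= 'a' M 'a'
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );

for my $input ( 'b', 'aba', 'aabaa' ) {
    my $recce = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
    $recce->read( \$input );
    my $value_ref = $recce->value();
    print defined $value_ref ? "parsed: $input\n" : "no parse: $input\n";
}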

How to get away with breaking the rules

The rules about ambiguity and recursions are "soft". If you only use limited ambiguity and keep your rule-breaking recursions short, your parser will stay fast.

Above, I promised to explain rule 3, which insisted that a right recursive symbol be "dedicated". A right recursive symbol is "dedicated" if it appears only as the recursive symbol in a right recursion. If your grammar is unambiguous, but you've used an "undedicated" right-recursive symbol, that is easy to fix. Just rewrite the grammar, replacing the "undedicated" symbol with two different symbols. Dedicate one of the two to the right recursion, and use the other symbol everywhere else.

When NOT to use Marpa

The languages I have described as "fast" for Marpa include all those in practical use and many more. But do you really want to use Marpa for all of them? Here are four cases for which Marpa is probably not your best alternative.

The first case: a language that parses easily with a regular expression. The regular expression will be much faster. Don't walk away from a good thing.

The second case: a language that is easily parsed using a single loop and some state that fits into constant space. This parser might be very easy to write and maintain. If you are using a much slower higher-level language, Marpa's optimized C engine may be a win on CPU speed. But, as before, if you have a good thing, don't walk away from it.

The third case: a variation on the second. Here your single loop might be getting out of hand, making you yearn for the syntax-driven convenience of Marpa, but your state still fits into constant space. In its current implementation, Marpa keeps all of its parse tables forever, so Marpa does not parse in constant space. Keeping the tables allows Marpa to deal with the full structure of its input, in a way that SAX-ish approaches cannot. But if space requirements are an issue, and your application allows a simplified constant-space approach, Marpa's power and convenience may not be enough to make up for that.

The fourth case: a language that

  • is very small;
  • changes slowly or not at all, and does not grow in complexity;
  • merits careful hand-optimization, and has available the staff to do it;
  • merits and has available the kind of on-going support that will keep your code optimized under changing circumstances; and
  • is easily parseable via recursive descent.

It is rare that all of these are the case, but when that happens, recursive descent is often preferable to Marpa. Lua and JSON are two languages which meet the above criteria. In Lua's case, it targets platforms with very restricted memory, which is an additional reason to prefer recursive descent -- Marpa has a relatively large footprint.

It was not good luck that made both Lua and JSON good targets for recursive descent -- they were designed around its limits. JSON is a favorite test target of Marpa for just these reasons. There are carefully hand-optimized C language parsers for us to benchmark against.

We get closer and closer, but Marpa will never beat small hand-optimized JSON parsers in software. However, while recursive descent is a technique for hand-writing parsers, Marpa is a mathematical algorithm. Someday, instructions for manipulating Earley items could be implemented directly in silicon. When and if that day comes, Earley's algorithm will beat recursive descent even at parsing the grammars that were designed for it.

Comments

Comments on this post can be made in Marpa's Google group, or on our IRC channel: #marpa at freenode.net. To learn more about Marpa, there's the official web site maintained by Ron Savage. I also have a Marpa web site.

Perlgeek.de : Automating Deployments: Simplistic Deployment with Git and Bash

One motto of the Extreme Programming movement is to do the simplest thing that can possibly work, and only get more fancy when it is necessary.

In this spirit, the simplest deployment option for some projects is to change the working directory in a clone of the project's git repository, and run

git pull

If this works, it has a certain beauty of mirroring pretty much exactly what developers do in their development environment.

Reality kicks in

But it only works if all of these conditions are met:

  • There is already a checkout of the git repository, and it's configured correctly.
  • There are no local changes in the git repository.
  • There were no forced updates in the remote repository.
  • No additional build or test step is required.
  • The target machine has git installed, and both network connection to and credentials for the git repository server.
  • The presence of the .git directory poses no problem.
  • No server process needs to be restarted.
  • No additional dependencies need to be installed.

As an illustration of how to attack some of these problems, let's consider just the second point: local modifications in the git repository. They happen, for example, when people try things out or do emergency fixes. git pull does a fetch (which is fine), and a merge. Merging is an operation that can fail (for example if local uncommitted changes or local commits exist) and that requires manual intervention.

Manual changes are a rather bad thing to have in an environment where you want to deploy automatically. Their presence leaves you two options: discard them, or refuse to deploy. If you choose the latter approach, git pull --ff-only is a big improvement; this will only do the merge if it is a trivial fast-forward merge, that is, a merge where the local side didn't change at all. If that's not the case (that is, a local commit exists), the command exits with a non-zero return value, which the caller should interpret as a failure, and report the error somehow. If it's called as part of a cron job, the standard approach is to send an email containing the error message.

If you choose to discard the changes instead, you could do a git stash to get rid of uncommitted changes (and at the same time preserve them for a time in the depths of the .git directory for later inspection), and do a reset or checkout instead of the merge, so that the command sequence would read:

set -e
git fetch origin
git checkout --force origin/master

(This puts the local repository in a detached head state, which tends to make manual work with it unpleasant; but at this point we have reserved this copy of the git repository for deployment only; manual work should be done elsewhere.)
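
Putting the discard variant together with the error reporting discussed above, a cron-friendly wrapper might look like the following sketch. The repository path is a hypothetical placeholder, and in a real cron job you would probably mail the captured output instead of printing it.

#!/usr/bin/perl
use strict;
use warnings;

# Sketch of a cron-friendly deployment wrapper (discard-local-changes variant).
# /srv/app is a hypothetical placeholder for the deployment checkout.
my $checkout = '/srv/app';

sub run {
    my (@cmd) = @_;
    my $output = qx(@cmd 2>&1);
    if ( $? != 0 ) {
        # A real cron setup might send this by email instead.
        print STDERR "deployment failed while running '@cmd':\n$output";
        exit 1;
    }
    return $output;
}

chdir $checkout or die "cannot chdir to $checkout: $!\n";
run( 'git', 'fetch', 'origin' );
run( 'git', 'checkout', '--force', 'origin/master' );
print "deployed ", run( 'git', 'rev-parse', '--short', 'HEAD' );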

More Headaches

For very simple projects, using the git pull approach is fine. For more complex software, you have to tackle each of these problems, for example:

  • Clone the git repo first if no local copy exists
  • Discard local changes as discussed above (or remove the old copy, and always clone anew)
  • Have a separate checkout location (possibly on a different server), build and test there.
  • Copy the result over to the destination machine (but exclude the .git dir).
  • Provide a way to declare dependencies, and install them before doing the final copy step.
  • Provide a way to restart services after the copying

So you could build all these solutions -- or realize that they exist. Having a dedicated build server is an established pattern, and there are a lot of software solutions for dealing with that. As is building a distributable software package (like .deb or .rpm packages), for which distribution systems exist -- the operating system vendors use it all the time.

Once you build Debian packages, the package manager ensures that dependencies are installed for you, and the postinst scripts provide a convenient location for restarting services.

If you choose that road, you get lots of established tooling that wasn't explicitly mentioned above, but which often makes life much easier: querying the database of existing packages, listing installed versions, finding which package a file comes from, extra security through package signing and signature verification, the ability to create meta packages, linters that warn about common packaging mistakes, and so on.

I'm a big fan of reusing existing solutions where it makes sense, and I feel this is a space where reusing can save huge amounts of time. Many of these tools have hundreds of corner cases already ironed out, and if you tried to tackle them yourself, you'd be stuck in a nearly endless exercise of yak shaving.

Thus I want to talk about the key steps in more detail: building Debian packages, distributing them, and installing them, plus some notes on how to put them all together with existing tooling.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


Dave's Free Press: Journal: I Love Github

Perlgeek.de : Automating Deployments: Debian Packaging for an Example Project

After general notes on Debian packaging, I want to introduce an example project, and how it's packaged.

The Project

package-info is a minimalistic web project, written solely for demonstrating packaging and deployment. When called in the browser, it produces a text document containing the output of dpkg -l, which gives an overview of installed (and potentially previously installed) packages, their version, installation state and a one-line description.

It is written in Perl using the Mojolicious web framework.

The actual code resides in the file usr/lib/package-info/package-info and is delightfully short:

#!/usr/bin/perl
use Mojolicious::Lite;

plugin 'Config';

get '/' => sub {
    my $c = shift;

    $c->render(text => scalar qx/dpkg -l/, format => 'text');
};

app->start;

It loads the "Lite" version of the framework, registers a route for the URL /, which renders as plain text the output of the system command dpkg -l, and finally starts the application.

It also loads the Config-Plugin, which is used to specify the PID file for the server process.

The corresponding config file in etc/package-info.conf looks like this:

#!/usr/bin/perl
{
    hypnotoad => {
        pid_file => '/var/run/package-info/package-info.pid',
    },
}

which again is perl code, and specifies the location of the PID file when run under hypnotoad, the application server recommended for use with Mojolicious.

To test it, you can install the libmojolicious-perl package, and run MOJO_CONFIG=$PWD/etc/package-info.conf morbo usr/lib/package-info/package-info. This starts a development server on port 3000. Pointing your browser at http://127.0.0.1:3000/, you should see a list like this:

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                  Version                              Architecture Description
+++-=====================================-====================================-============-===============================================================================
ii  ack-grep                              2.14-4                               all          grep-like program specifically for large source trees
ii  acl                                   2.2.52-2                             amd64        Access control list utilities
rc  acroread-debian-files                 0.2.5                                amd64        Debian specific parts of Adobe Acrobat Reader
ii  adduser                               3.113+nmu3                           all          add and remove users and groups
ii  adwaita-icon-theme                    3.14.0-2                             all          default icon theme of GNOME

though much longer.

Initial Packaging

Installing dh-make and running dh_make --createorig -p package-info_0.1 gives us a debian directory along with several files.

I started by editing debian/control to look like this:

Source: package-info
Section: main
Priority: optional
Maintainer: Moritz Lenz 
Build-Depends: debhelper (>= 9)
Standards-Version: 3.9.5

Package: package-info
Architecture: all
Depends: ${misc:Depends}, libmojolicious-perl
Description: Web service for getting a list of installed packages

Debian packages support the notion of a source package, which a maintainer uploads to the Debian build servers, and from which one or more binary packages are built. The control file reflects this structure, with the first half being about the source package and its build dependencies, and the second half being about the binary package.

Next I deleted the file debian/source/format, which by default indicates the use of the quilt patch management system, which isn't typically used in git based workflows.

I left debian/rules, debian/compat and debian/changelog untouched, and created a file debian/install with two lines:

etc/package-info.conf
usr/lib/package-info/package-info

In lieu of a proper build system, this tells dh_install which files to copy into the debian package.

This is enough for building a Debian package. To trigger the build, this command suffices:

debuild -b -us -uc

The -b instructs debuild to only create a binary package, and the two -u* options skip the steps where debuild cryptographically signs the generated files.

This command creates three files in the directory above the source tree: package-info_0.1-1_all.deb, package-info_0.1-1_amd64.changes and package-info_0.1-1_amd64.build. The .deb file contains the actual program code and meta data, the .changes file meta data about the package as well as the last changelog entry, and the .build file a transcript of the build process.

A Little Daemonology

Installing the .deb file from the previous step would install working software, but you'd have to start it manually.

Instead, it is useful to provide means to automatically start the server process at system boot time. Traditionally, this has been done by shipping init scripts. Since Debian transitioned to systemd as its init system with the "Jessie" / 8 version, systemd service files are the new way to go, and luckily much shorter than a robust init script.

The service file goes into debian/package-info.service:

[Unit]
Description=Package installation information via http
Requires=network.target
After=network.target

[Service]
Type=simple
RemainAfterExit=yes
SyslogIdentifier=package-info
PIDFile=/var/run/package-info/package-info.pid
Environment=MOJO_CONFIG=/etc/package-info.conf
ExecStart=/usr/bin/hypnotoad /usr/lib/package-info/package-info -f
ExecStop=/usr/bin/hypnotoad -s /usr/lib/package-info/package-info
ExecReload=/usr/bin/hypnotoad /usr/lib/package-info/package-info

The [Unit] section contains the service description, as well as the specification when it starts. The [Service] section describes the service type, where simple means that systemd expects the start command to not terminate as long as the process is running. With Environment, environment variables can be set for all three of the ExecStart, ExecStop and ExecReload commands.

Another debhelper addon, dh-systemd, takes care of installing the service file, as well as making sure the service file is read and the service started or restarted after a package installation. To enable it, dh-systemd must be added to the Build-Depends line in the file debian/control, and the catch-all build rule in debian/rules changed to:

%:
        dh $@ --with systemd

To enable hypnotoad to write the PID file, the containing directory must exist. Writing /var/run/package-info/ into a new debian/dirs file ensures this directory is created at package installation.

To test the changes, again invoke debuild -b -us -uc and install the resulting .deb file with sudo dpkg -i ../package-info_0.1-1_all.deb.

The server process should now listen on port 8080, so you can test it with curl http://127.0.0.1:8080/ | head.

A Bit More Security

As it is now, the application server and the application run as the root user, which violates the principle of least privilege. Instead they should run as a separate user, package-info, that isn't allowed to do much else.

To make the installation as smooth as possible, the package should create the user itself if it doesn't exist. The debian/postinst script is run at package installation time, and is well suited for such tasks:

#!/bin/sh

set -e
test $DEBIAN_SCRIPT_DEBUG && set -v -x

export PATH=$PATH:/sbin:/usr/sbin:/bin:/usr/bin

USER="package-info"

case "$1" in
    configure)
        if ! getent passwd $USER >/dev/null ; then
            adduser --system $USER
        fi
        chown -R $USER /var/run/package-info/
    ;;
esac

#DEBHELPER#

exit 0

There are several actions that a postinst script can execute, and configure is the right one for creating users. At this time, the files are already installed.

Note that it also changes the permissions for the directory in which the PID file is created, so that when hypnotoad is invoked as the package-info user, it can still create the PID file.

Please note the presence of the #DEBHELPER# tag, which the build system replaces with extra actions. Some of these come from dh-systemd, and take care of restarting the service after installation, and enabling it for starting after a reboot on first installation.

To set the user under which the service runs, add the line User=package-info to the [Service] section of debian/package-info.service.

Linux offers more security features that can be enabled in a declarative way in the [Service] section of the systemd service file. Here are a few that protect the rest of the system from the server process, should it be exploited:

PrivateTmp=yes
InaccessibleDirectories=/home
ReadOnlyDirectories=/bin /sbin /usr /lib /etc

Additional precautions can be taken by limiting the number of processes that can be spawned and the available memory through the LimitNPROC and MemoryLimit options.

The importance of good packaging

If you tune your packages so that they do as much configuration and environment setup as possible themselves, you benefit two-fold. It makes it easy to use the package in any context, regardless of whether it is embedded in a deployment system. And even if it is part of a deployment system, putting the package-specific bits into the package itself helps you keep the deployment system generic, and thus easy to extend to other packages.

For example, configuration management systems such as Ansible, Chef and Puppet allow you to create users and to restart services when a new package version is available, but if you rely on that, you have to treat each package separately in the configuration management system.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


Perlgeek.de : Automating Deployments: Why bother?

At my employer, we developed a new software architecture. This involved developing and deploying several new components, many of them following the same pattern: a daemon process listening on a message bus (RabbitMQ, in case you're wondering) and also talking to existing applications: a database, an Active Directory service, a NetApp cluster or a vCenter, you name it.

Shortly after the development of these components began, it was decided that a different team than before should operate the software we developed. The new team, although dedicated and qualified, was also drowning in other work.

As we had them deploy the first few components, it became clear that each new deployment distracted them from doing what we wanted most: build the infrastructure that we and our software needed.

As programmers, automating things is much of our daily business, so why not automate some of these steps? We already had a Jenkins instance running for executing tests, so the next step was to automate the builds.

Since our systems run Debian GNU/Linux, and we build our applications as Debian packages, distributing the software meant uploading it to an internal Debian mirror. This proved to be trickier than expected, because we use debarchiver for managing the Debian repositories, which doesn't give immediate feedback on whether an upload was successful.

After that, a deployment involved only an apt-get update && apt-get install $package, which at first we left to the ops team, and later automated too - though in the production environment only after a manual trigger.

Many of the manual and automatic deployments failed, usually due to missing resources in the message bus, so we automated their generation as well.

Reduced feedback cycles

So at $work, automating deployments was first a means to save time, and a means to defend the architectural freedom to develop several smaller components instead of a few larger ones. Later it became a means to improve reliability.

But it quickly also became a tool to reduce the time it takes to get feedback on new features. We found it notoriously hard to get people to use the staging environment to try out new features, so we decided to simply roll them out to production, and wait for complaints (or praise, though we get that less often).

Being able to quickly roll out a fix when a critical bug has managed to slip into the production environment not only proved useful now and then, but also gave us a feeling of safety.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


Dave's Free Press: Journal: Palm Treo call db module

Perlgeek.de : Automating Deployments: Installing Packages

After the long build-up of building and distributing and authenticating packages, actually installing them is easy. On the target system, run

$ apt-get update
$ apt-get install package-info

(replace package-info with the package you want to install, if that deviates from the example used previously).

If the package is of high quality, it takes care of restarting services where necessary, so no additional actions are necessary afterwards.

Coordination with Ansible

If several hosts are needed to provide a service, it can be beneficial to coordinate the update, for example only updating one or two hosts at a time, or running a small integration test on each host before moving on to the next.

A nice tool for doing that is Ansible, an open source IT automation system.

Ansible's starting point is an inventory file, which lists the hosts that Ansible works with, optionally in groups, and how to access them.

It is best practice to have one inventory file for each environment (production, staging, development, load testing etc.) with the same group names, so that you can deploy to a different environment simply by using a different inventory file.

Here is an example for an inventory file with two web servers and a database server:

# production
[web]
www01.yourorg.com
www02.yourorg.com

[database]
db01.yourorg.com

[all:vars]
ansible_ssh_user=root

Maybe the staging environment needs only a single web server:

# staging
[web]
www01.staging.yourorg.com

[database]
db01.staging.yourorg.com

[all:vars]
ansible_ssh_user=root

Ansible is organized in modules for separate tasks. Managing Debian packages is done with the apt module:

$ ansible -i staging web -m apt -a 'name=package-info update_cache=yes state=latest'

The -i option specifies the path to the inventory file, here staging. The next argument is the group of hosts (or a single host, if desired), and -m apt tells Ansible to use the apt module.

What comes after the -a is a module-specific command. name specifies a Debian package, update_cache=yes forces Ansible to run apt-get update before installing the latest version, and state=latest says that the latest available version is what we want installed.

If instead of the latest version we want a specific version, -a 'name=package-info=0.1 update_cache=yes state=present force=yes' is the way to go. Without force=yes, apt wouldn't downgrade the package to actually get the desired version.

This uses the ad-hoc mode of Ansible. More sophisticated deployments use playbooks, of which I hope to write more later. Those also allow you to do configuration tasks such as adding repository URLs and GPG keys for package authentication.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


Perlgeek.de : Profiling Perl 6 code on IRC

On the #perl6 IRC channel, we have a bot called camelia that executes small snippets of Perl 6 code, and prints the output that it produces. This is a pretty central part of our culture, and we use it to explain or demonstrate features or even bugs in the compiler.

Here is an example:

10:35 < Kristien> Can a class contain classes?
10:35 < Kristien> m: class A { class B { } }; say A.new.B.new
10:35 <+camelia> rakudo-moar 114659: OUTPUT«No such method 'B' for invocant of 
                 type 'A'␤  in block <unit> at /tmp/g81K8fr9eY:1␤␤»
10:35 < Kristien> :(
10:36 < raydiak> m: class A { class B { } }; say A::B.new
10:36 <+camelia> rakudo-moar 114659: OUTPUT«B.new()␤»

Yesterday and today I spent some time teaching this IRC bot to not only run the code, but optionally also run it through a profiler, to make it possible to determine where the virtual machine spends its time running the code. An example:

12:21 < moritz> prof-m: Date.today for ^100; say "done"
12:21 <+camelia> prof-m 9fc66c: OUTPUT«done␤»
12:21 <+camelia> .. Prof: http://p.p6c.org/453bbe

The Rakudo Perl 6 compiler on the MoarVM backend has a profiler, which produces a fancy HTML + JavaScript page, and that is what the bot runs. The result is automatically uploaded to a webserver, producing this profile.

Under the hood, it started with a patch that makes it possible to specify the output filename for a profile run, and another one to clear up the fallout from the previous patch.

Then came the bigger part: setting up the Apache virtual host that serves the web files, including a restricted user that only allows up- and downloads via scp. Since the IRC bot can execute arbitrary code, it is very likely that an attacker can steal the private SSH keys used for authentication against the webserver. So it is essential that if those keys are stolen, the attacker can't do much more than uploading more files.

I used rssh for this. It is the login shell for the upload user, and configured to only allow scp. Since I didn't want the attacker to be able to modify the authorized_keys file, I configured rssh to use a chroot below the home directory (which sadly in turn requires a setuid-root wrapper around chroot, because ordinary users can't execute it. Well, nothing is perfect).

Some more patching and debugging later, the bot was ready.

The whole thing feels a bit bolted on; if usage warrants it, I'll see if I can make the code a bit prettier.

Ocean of Awareness: Linear? Yeah right.

Linear?

I have claimed that my new parser, Marpa, is linear for vast classes of grammars, going well beyond what the traditional parsers can do. But skepticism is justified. When it comes to parsing algorithms, there have been a lot of time complexity claims that are hand-wavy, misleading or just plain false. This post describes how someone, who is exercising the appropriate degree of skepticism, might conclude that believing Marpa's claims is a reasonable and prudent thing to do.

Marpa's linearity claims seem to be, in comparison with the other parsers in practical use today, bold. Marpa claims linearity, not just for every class of grammar for which yacc/bison, PEG and recursive descent currently claim linearity, but for considerably more. (The mathematical details of these claims are in a section at the end.) It seems too good to be true.

Why should I believe you?

The most important thing to realize, in assessing the believability of Marpa's time complexity claims, is that they are not new. They were already proved in a long-accepted paper in the refereed literature. They are the time complexity claims proved by Joop Leo for his algorithm in 1991, over two decades ago. Marpa is derived from Leo's algorithm, and its time complexity claims are those proved for Leo's algorithm.

Above I said that Marpa's time complexity claims "seem" bold. On any objective assessment, they are in fact a bit of a yawn. The claims seem surprising only because a lot of people are unaware of Leo's results. That is, they are surprising in the same sense that someone who had avoided hearing about radio waves would be surprised to learn that he can communicate instantly with someone on the other side of the world.

So, if there's so little to prove, why does the Marpa paper have proofs? In Marpa, I made many implementation decisions about, and some changes to, the Leo/Earley algorithm. None of my changes produced better time complexity results -- my only claim is that I did not change the Leo/Earley algorithm in a way that slowed it down. To convince myself of this claim, I reworked the original proofs of Leo and Earley, changing them to reflect my changes, and demonstrated that the results that Leo had obtained still held.

Proofs of this kind, which introduce no new mathematical techniques, but simply take a previous result and march from here to there by well-known means, are called "tedious". In journals, where there's a need to conserve space, they are usually omitted, especially if, as is the case with Marpa's time complexity proofs, the results are intuitively quite plausible.

Getting from plausible to near-certain

So let's say you are not going to work through every line of Marpa's admittedly tedious proofs. We've seen that the results are intuitively plausible, as long as you don't reject the previous literature. But can we do better than merely "plausible"?

As an aside, many people misunderstand the phrase "mathematically proven", especially as it applies to branches of math like parsing theory. The fact is that proofs in papers often contain errors. Usually these are minor, and don't affect the result. On the other hand, Jay Earley's paper, while one of the best Computer Science papers ever published, also contained a very serious error. And this error slipped past his Ph.D. committee and his referees. Mathematical arguments and proofs do not allow us to achieve absolute certainty. They can only improve the degree of certainty.

There's a second way to dramatically increase your degree of conviction in Marpa's linearity claims, and it is quite simple. Create examples of problematic grammars, run them and time them. This is not as satisfying as a mathematical proof, because no set of test grammars can be exhaustive. But if you can't find a counter-example to Marpa's linearity claims among the grammars of most interest to you, that should help lift your level of certainty to "certain for all practical purposes".

Much of this increase in certainty can be achieved without bothering to run your own tests. Marpa is in wide use at this point. If Marpa was going quadratic on grammars for which it claimed to be linear, and these were grammars of practical interest, that would very likely have been noticed by now.

I'm still not convinced

Let's suppose all this has not brought you to the level of certainty you need to use Marpa. That means the reasonable thing is to continue to struggle to work with the restrictions of the traditional algorithms, right? No, absolutely not.

OK, so you don't believe that Marpa preserves the advances in power and speed made by Leo. Does that mean that parsers have to stay underpowered? No, it simply means that there should be a more direct implementation of Leo's 1991 algorithm, bypassing Marpa.

But if you are looking for an implementation of Leo's 1991 algorithm, I think you may end up coming back to Marpa as the most reasonable choice. Marpa's additional features include the ability to use custom, procedural logic, as you can with recursive descent. And Marpa has worked out a lot of the implementation details for you.

Comments

Comments on this post can be made in Marpa's Google group, or on our IRC channel: #marpa at freenode.net. To learn more about Marpa, there's the official web site maintained by Ron Savage. I also have a Marpa web site.

Appendix: Some technical details

Above I talked about algorithms, classes of grammars and their linearity claims. I didn't give details because most folks aren't interested. For those who are, they are in this section.

yacc is linear for a grammar class called LALR, which is a subset of another grammar class called LR(1). If you are willing to hassle with GLR, bison claims linearity for all of LR(1). Recursive descent is a technique, not an algorithm, but it is top-down with look-ahead, and therefore can be seen as some form of LL(k), where k depends on how it is implemented. In practice, I suspect k is never much bigger than 3, and usually pretty close to 1. With packratting, PEG can be made linear for everything it parses but there is a catch -- only in limited cases do you know what language your PEG grammar actually parses. In current practice, that means your PEG grammar must be LL(1). Some of the PEG literature looks at techniques for extending this as far as LL-regular, but there are no implementations, and it remains to be seen if the algorithms described are practical.

The Marpa paper contains a proof, based on a proof of the same claim by Joop Leo, that Marpa is linear for LR-regular grammars. The LR-regular grammars include the LR(k) grammars for every k. So Marpa is linear for LR(1), LR(2), LR(8675309), etc. LR-regular also includes LL-regular. So every class of grammar under discussion in the PEG literature is already parsed in linear time by Marpa. From this, it is also safe to conclude that, if a grammar can be parsed by anything reasonably described as recursive descent, it can be parsed in linear time by Marpa.

Perlgeek.de : Architecture of a Deployment System

An automated build and deployment system is structured as a pipeline.

A new commit or branch in a version control system triggers the instantiation of the pipeline, and starts executing the first of a series of stages. When a stage succeeds, it triggers the next one. If it fails, the entire pipeline instance stops.

Then manual intervention is necessary, typically by adding a new commit that fixes code or tests, or by fixing things with the environment or the pipeline configuration. A new instance of the pipeline then has a chance to succeed.

Deviations from the strict pipeline model are possible: branches, potentially executed in parallel, allow for example running different tests in different environments, with the next stage waiting until both have completed successfully.

The typical stages are building, running the unit tests, deployment to a first test environment, running integration tests there, potentially deployment to and tests in various test environments, and finally deployment to production.

Sometimes, these stages blur a bit. For example, a typical build of Debian packages also runs the unit tests, which alleviates the need for a separate unit testing stage. Likewise if the deployment to an environment runs integration tests for each host it deploys to, there is no need for a separate integration test stage.

Typically there is a piece of software that controls the flow of the whole pipeline. It prepares the environment for a stage, runs the code associated with the stage, collects its output and artifacts (that is, files that the stage produces and that are worth keeping, like binaries or test output), determines whether the stage was successful, and then proceeds to the next.

From an architectural standpoint, it relieves the stages of having to know what stage comes next, and even how to reach the machine on which it runs. So it decouples the stages.
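
To make that control flow concrete, here is a deliberately minimal sketch of such a controller in Perl. The stage names and commands are hypothetical placeholders; a real controller (Jenkins, for example) also handles artifact collection, environment preparation and running stages on other machines.

#!/usr/bin/perl
use strict;
use warnings;

# Minimal pipeline controller sketch: run stages in order, stop on failure.
# Stage names and commands are hypothetical placeholders.
my @stages = (
    { name => 'build',       cmd => 'debuild -b -us -uc' },
    { name => 'unit-tests',  cmd => 'prove -lr t' },
    { name => 'deploy-test', cmd => 'ansible-playbook -i testing deploy.yml' },
);

for my $stage (@stages) {
    print "=== stage $stage->{name} ===\n";
    my $exit = system( $stage->{cmd} );
    if ( $exit != 0 ) {
        print STDERR "stage $stage->{name} failed; stopping pipeline\n";
        exit 1;
    }
    # A real controller would also collect logs and artifacts here.
}
print "pipeline succeeded\n";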

Anti-Pattern: Separate Builds per Environment

If you use a branch model like git flow for your source code, it is tempting to automatically deploy the develop branch to the testing environment, and then make releases, merge them into the master branch, and deploy that to the production environment.

It is tempting because it is a straight-forward extension of an existing, proven workflow.

Don't do it.

The big problem with this approach is that you don't actually test what's going to be deployed, and on the flip side, deploy something untested to production. Even if you have a staging environment before deploying to production, you are invalidating all the testing you did in the testing environment if you don't actually ship the binary or package that you tested there.

If you build "testing" and "release" packages from different sources (like different branches), the resulting binaries will differ. Even if you use the exact same source, building twice is still a bad idea, because many builds aren't reproducible. Non-deterministic compiler behavior, differences in environments and dependencies all can lead to packages that worked fine in one build, and failed in another.

It is best to avoid such potential differences and errors by deploying to production exactly the same build that you tested in the testing environment.

Differences in behavior between the environments, where they are desirable, should be implemented by configuration that is not part of the build. (It should be self-evident that the configuration should still be under version control, and also automatically deployed. There are tools that specialize in deploying configuration, like Puppet, Chef and Ansible.)


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


Dave's Free Press: Journal: Travelling in time: the CP2000AN

Dave's Free Press: Journal: Graphing tool

Dave's Free Press: Journal: XML::Tiny released

Dave's Free Press: Journal: YAPC::Europe 2007 report: day 1

Dave's Free Press: Journal: Thanks, Yahoo!

Dave's Free Press: Journal: YAPC::Europe 2007 report: day 2

Ocean of Awareness: What are the reasonable computer languages?

"You see things; and you say 'Why?' But I dream things that never were; and I say 'Why not?'" -- George Bernard Shaw

In the 1960's and 1970's computer languages were evolving rapidly. It was not clear which way they were headed. Would most programming be done with general-purpose languages? Or would programmers create a language for every task domain? Or even for every project? And, if lots of languages were going to be created, what kinds of languages would be needed?

It was in that context that Čulik and Cohen, in a 1973 paper, outlined what they thought programmers would want and should have. In keeping with the spirit of the time, it was quite a lot:

  • Programmers would want to extend their grammars with new syntax, including new kinds of expressions.
  • Programmers would also want to use tools that automatically generated new syntax.
  • Programmers would not want to, and especially in the case of auto-generated syntax would usually not be able to, massage the syntax into very restricted forms. Instead, programmers would create grammars and languages which required unlimited lookahead to disambiguate, and they would require parsers which could handle these grammars.
  • Finally, programmers would need to be able to rely on all of this parsing being done in linear time.

Today, we think we know that Čulik and Cohen's vision was naive, because we think we know that parsing technology cannot support it. We think we know that parsing is much harder than they thought.

The eyeball grammars

As a thought problem, consider the "eyeball" class of grammars. The "eyeball" class of grammars contains all the grammars that a human can parse at a glance. If a grammar is in the eyeball class, but a computer cannot parse it, it presents an interesting choice. Either,

  • your computer is not using the strongest practical algorithm; or
  • your mind is using some power which cannot be reduced to a machine computation.

There are some people out there (I am one of them) who don't believe that everything the mind can do reduces to a machine computation. But even those people will tend to go for the choice in this case: There must be some practical computer parsing algorithm which can do at least as well at parsing as a human can do by "eyeball". In other words, the class of "reasonable grammars" should contain the eyeball class.

Čulik and Cohen's candidate for the class of "reasonable grammars" were the grammars that a deterministic parse engine could parse if it had a lookahead that was infinite, but restricted to distinguishing between regular expressions. They called these the LR-regular, or LRR, grammars. And the LRR grammars do in fact seem to be a good first approximation to the eyeball class. They do not allow lookahead that contains things that you have to count, like palindromes. And, while I'd be hard put to eyeball every possible string for every possible regular expression, intuitively the concept of scanning for a regular expression does seem close to capturing the idea of glancing through a text looking for a telltale pattern.

So what happened?

Alas, the algorithm in the Čulik and Cohen paper turned out to be impractical. But in 1991, Joop Leo discovered a way to adapt Earley's algorithm to parse the LRR grammars in linear time, without doing the lookahead. And Leo's algorithm does have a practical implementation: Marpa.

References, comments, etc.

To learn more about Marpa, there's the official web site maintained by Ron Savage. I also have a Marpa web site. Comments on this post can be made in Marpa's Google group, or on our IRC channel: #marpa at freenode.net.

Dave's Free Press: Journal: YAPC::Europe 2007 travel plans

Dave's Free Press: Journal: Wikipedia handheld proxy

Dave's Free Press: Journal: Bryar security hole

Dave's Free Press: Journal: POD includes

Dave's Free Press: Journal: cgit syntax highlighting

Ocean of Awareness: Grammar reuse

Every year the Perl 6 community creates an "Advent" series of posts. I always follow these, but one in particular caught my attention this year. It presents a vision of a future where programming is language-driven. A vision that I share. The post went on to encourage its readers to follow up on this vision, and suggested an approach. But I do not think the particular approach suggested would be fruitful. In this post I'll explain why.

Reuse

The focus of the Advent post was language-driven programming, and that is the aspect that excites me most. But the points that I wish to make are more easily understood if I root them in a narrower, but more familiar issue -- grammar reuse.

Most programmers will be very familiar with grammar reuse from regular expressions. In the regular expression ("RE") world, programming by cutting and pasting is very practical and often practiced.

For this post I will consider grammar reusability to be the ability to join two grammars and create a third. This is also sometimes called grammar composition. For this purpose, I will widen the term "grammar" to include RE's and PEG parser specifications. Ideally, when you compose two grammars, what you get is

  • a language you can reasonably predict, and
  • if each of the two original grammars can be parsed in reasonable time, a language that can be parsed in reasonable time.

Not all language representations are reusable. RE's are, and BNF is. PEG looks like a combination of BNF and RE's, but PEG, in fact, is its own very special form of parser specification. And PEG parser specifications are one of the least reusable language representations ever invented.

Reuse and regular expressions

RE's are as well-behaved under reuse as a language representation can get. The combination of two RE's is always another RE, and you can reasonably determine what language the combined RE recognizes by examining it. Further, every RE is parseable in linear time.

The one downside, often mentioned by critics, is that RE's do not scale in terms of readability. Here, however, the problem is not really one of reusability. The problem is that RE's are quite limited in their capabilities, and programmers often exploit the excellent behavior of RE's under reuse to push them into applications for which RE's just do not have the power.
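
Perl makes this reusability directly visible: compiled regexes interpolate into larger regexes, and the result is again a regex. A trivial illustration (the patterns are my own examples):

#!/usr/bin/perl
use strict;
use warnings;

# Two small REs...
my $integer = qr/[0-9]+/;
my $sign    = qr/[+-]?/;

# ...composed into a third. The result is still an RE, still linear to match,
# and you can still predict what it accepts by reading it.
my $signed_integer = qr/$sign$integer/;

for my $candidate ( '42', '-7', 'x1' ) {
    print "$candidate: ",
        ( $candidate =~ /\A$signed_integer\z/ ? "match" : "no match" ), "\n";
}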

Reuse and PEG

When programmers first look at PEG syntax, they often think they've encountered paradise. They see both BNF and RE's, and imagine they'll have the best of each. But the convenient behavior of RE's depends on their unambiguity. You simply cannot write an ambiguous RE -- it's impossible.

More powerful and more flexible, BNF allows you to describe many more grammars -- including ambiguous ones. How does PEG resolve this? With a Gordian knot approach. Whenever it encounters an ambiguity, it throws all but one of the choices away. The author of the PEG specification gets some control over what is thrown away -- he specifies an order of preference for the choices. But degree of control is less than it seems, and in practice PEG is the nitroglycerin of parsing -- marvelous when it works, but tricky and dangerous.

Consider these 3 PEG specifications:

	("a"|"aa")"a"
	("aa"|"a")"a"
	A = "a"A"a"/"aa"

All three clearly accept only strings which are repetitions of the letter "a". But which strings? For the answers, suggestions for dealing with PEG if you are committed to it, and more, look at my previous post on PEG.

When getting an RE or a BNF grammar to work, you can go back to the grammar and ask yourself "Does my grammar look like my intended language?". With PEG, this is not really possible. With practice, you might get used to figuring out single line PEG specs like the first two above. (If you can get the last one, you're amazing.) But tracing these through multiple rule layers required by useful grammars is, in practice, not really possible.

In real life, PEG specifications are written by hacking them until the test suite works. And, once you get a PEG specification to pass the test suite for a practical-sized grammar, you are very happy to leave it alone. Trying to compose two PEG specifications is rolling the dice with the odds against you.

Reuse and the native Perl 6 parser

The native Perl 6 parser is an extended PEG parser. The extensions are very interesting from the PEG point of view. The PEG "tie breaking" has been changed, and backtracking can be used. These features mean the Perl 6 parser can be extended to languages well beyond what ordinary PEG parsers can handle. But, if you use the extra features, reuse will be even trickier than if you stuck with vanilla PEG.

Reuse and general BNF parsing

As mentioned, general BNF is reusable, and so general BNF parsers like Marpa are as reusable as regular expressions, with two caveats. First, if the two grammars are not doing their own lexing, their lexers will have to be compatible.

Second, with regular expressions you had the advantage that every regular expression parses in linear time, so that speed was guaranteed to be acceptable. Marpa users reuse grammars and pieces of grammars all the time. The result is always the language specified by the merged BNF, and I've never heard anyone complain that performance deteriorated.

But, while it may not happen often, it is possible to combine two Marpa grammars that run in linear time and end up with one that does not. You can guarantee your merged Marpa grammar will stay linear if you follow 2 rules:

  • keep the grammar unambiguous;
  • don't use an unmarked middle recursion.

Unmarked middle recursions are not things you're likely to need a lot: they are those palindromes where you have to count to find the middle: grammars like "A ::= a | a A a". If you use a middle recursion at all, it is almost certainly going to be marked, like "A ::= b | a A a", which generates strings like "aabaa". With Marpa, as with RE's, reuse is easy and practical. And, as I hope to show in a future post, unlike RE's, Marpa opens the road to language-driven programming.
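
To illustrate, composing two grammar fragments for Marpa::R2's SLIF can be as simple as concatenating their BNF. The statement and expression fragments below are toy examples of my own, and the SLIF details are a sketch that may need adjusting against the Marpa::R2 documentation:

#!/usr/bin/perl
use strict;
use warnings;
use Marpa::R2;

# Two toy BNF fragments (my own examples), composed by concatenation.
my $statement_rules = <<'END';
:start ::= Stmt
Stmt ::= 'print' Expr
END

my $expression_rules = <<'END';
Expr ::= Term | Expr '+' Term
Term ::= digits
digits ~ [\d]+
:discard ~ whitespace
whitespace ~ [\s]+
END

my $dsl     = $statement_rules . $expression_rules;
my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
my $recce   = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
$recce->read( \'print 1 + 2 + 3' );
my $value_ref = $recce->value();
print defined $value_ref ? "recognized\n" : "no parse\n";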

Perl 6

I'm a fan of the Perl 6 effort. I certainly should be a supporter, after the many favors they've done for me and the Marpa community over the years. The considerations of this post will disappoint some of the hopes for applications of the native Perl 6 parser. But these applications have not been central to the Perl 6 effort, of which I will be an eager student over the coming months.

Comments

To learn more about Marpa, there's the official web site maintained by Ron Savage. I also have a Marpa web site. Comments on this post can be made in Marpa's Google group, or on our IRC channel: #marpa at freenode.net.

Perlgeek.de : Automating Deployments: Building Debian Packages

I have argued before that it is a good idea to build packages from software you want to automatically deploy. The package manager gives you dependency management as well as the option to execute code at defined points in the installation process, which is very handy for restarting services after installation, creating necessary OS-level users and so on.

Which package format to use?

There are many possible package formats, and managers for them, out there. Many ecosystems and programming languages come with their own: Perl uses CPAN.pm or cpanminus to install Perl modules, the NodeJS community uses npm, Ruby has the gem installer, Python has pip and easy_install, and so on.

One of the disadvantages is that they only work well for one language. If you or your company develops software in multiple programming languages, and you choose to use the language-specific packaging format and tools for each of them, you burden yourself and the operators with having to know (and be aware of) all of these technologies.

Operations teams are usually familiar with the operating system's package manager, so using that seems like an obvious choice, especially if the same operating system family is used throughout the whole organization. In specialized environments, other solutions might be preferable.

What's in a Debian package, and how do I build one?

A .deb file is an ar archive with meta data about the archive format version, meta data for the package (name, version, installation scripts) and the files that are to be installed.

While it is possible to build such a package directly, the easier and much more common route is to use the tooling provided by the devscripts package. These tools expect the existence of a debian/ directory with various files in it.

debian/control contains information such as the package name, dependencies, maintainer and description. debian/rules is a makefile that controls the build process of the debian package. debian/changelog contains a human-readable summary of changes to the package. The top-most changelog entry determines the resulting version of the package.
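
To make this concrete, here is a sketch of a minimal debian/control for a hypothetical package (all names, versions and descriptions below are placeholders, not taken from a real project):

Source: myapp
Section: misc
Priority: optional
Maintainer: Jane Doe <jane@example.com>
Build-Depends: debhelper (>= 9)
Standards-Version: 3.9.6

Package: myapp
Architecture: all
Depends: ${misc:Depends}
Description: short one-line summary of the hypothetical myapp
 Longer description of the package, with each continuation
 line indented by a single space.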

You can use dh_make from the dh-make package to generate a skeleton of files for the debian/ directory, which you can then edit to your liking. It will ask you for the architecture of the package. You can use a specific one like amd64, or the word any for packages that can be built on any architecture. If the resulting package is architecture-independent (as is the case for many scripting languages), using all as the architecture is appropriate.

Build process of a Debian package

If you use dh_make to create a skeleton, debian/rules mostly consists of a catch-all rule that calls dh $@. This is a tool that tries to do the right thing for each build step automatically, and usually it succeeds. If there is a Makefile in your top-level directory, it will call the configure, build, check and install make targets for you. If your build system installs into the DESTDIR prefix (which is set to debian/your-package-name), it should pretty much work out of the box.
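
For reference, a sketch of such a minimal debian/rules (roughly what dh_make generates, minus its comments) looks like this; note that the indented recipe line must use a hard tab in the real file:

#!/usr/bin/make -f
# Catch-all target: let debhelper work out the configure/build/test/install steps.
%:
        dh $@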

If you want to copy additional files into the Debian package, list the file names, one per line, in debian/install, and they are installed automatically for you.
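
For example, a hypothetical debian/install (paths invented for illustration) could contain one file or glob per line, optionally followed by the target directory inside the package:

config/myapp.conf etc/myapp/
scripts/myapp-cleanup usr/share/myapp/bin/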

Shortcuts

If you have already packaged your code for distribution through language-specific tools, such as CPAN (Perl) or pip (Python), there are shortcuts to creating Debian Packages.

Perl

The tool dh-make-perl (installable via the package of the same name) can automatically create a debian directory based on the perl-specific packaging. Calling dh-make-perl . inside the root directory of your perl source tree is often enough to create a functional Debian package. It sticks to the naming convention that a Perl package Awesome::Module becomes libawesome-module-perl in Debian land.

Python

py2dsc from the python-stdeb package generates a debian/ directory from an existing python tarball.

Another approach is to use dh-virtualenv. This copies all of the python dependencies into a virtualenv, so the resulting package only depends on the system python and possibly C libraries that the python packages use; all python-level dependencies are baked in. This tends to produce bigger packages with fewer dependencies, and allows you to run several python programs on a single server, even if they depend on different versions of the same python library.

dh-virtualenv has an unfortunate choice of default installation prefix that clashes with some assumptions that Debian's python packages make. You can override that choice in debian/rules:

#!/usr/bin/make -f
export DH_VIRTUALENV_INSTALL_ROOT=/usr/share/yourcompany
%:
        dh $@ --with python-virtualenv --with systemd

It also assumes Python 2 by default. For a Python 3 based project, add these lines:

override_dh_virtualenv:
        dh_virtualenv --python=/usr/bin/python3

(As always with Makefiles, be sure to indent with hard tabulator characters, not with spaces).


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


Dave's Free Press: Journal: CPAN Testers' CPAN author FAQ

Perlgeek.de : Why is it hard to write a compiler for Perl 6?

A Russian translation of this post is available on softdroid.net: Почему так трудно написать компилятор для Perl 6?

Today's deceptively simple question on #perl6: is it harder to write a compiler for Perl 6 than for any other programming language?

The answer is simple: yes, it's harder (and more work) than for many other languages. The more involved question is: why?

So, let's take a look. The first point is organizational: Perl 6 isn't yet fully explored and formally specified; it's much more stable than it used to be, but less stable than, say, targeting C89.

But even if you disregard this point, and target the subset that, for example, the Rakudo Perl 6 compiler implements right now, or wait a year and target the first official Perl 6 language release, the point remains valid.

So let's look at some technical aspects.

Static vs. Dynamic

Perl 6 has both static and dynamic corners. For example, lexical lookups are static, in the sense that they can be resolved at compile time. And that's not optional: for a compiler to properly support native types, it must resolve them at compile time. We also expect the compiler to notify us of certain errors at compile time, so there must be a fair amount of static analysis.

On the other hand, type annotations are optional pretty much anywhere, and methods are late bound. So the compiler must also support features typically found in dynamic languages.
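
A small sketch of both corners next to each other (names invented for illustration):

my int $count = 0;      # native type: must be resolved statically, at compile time
sub greet($who) {       # no type annotation: $who accepts any object
    say $who.name;      # late bound: which .name gets called is decided at run time
}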

And even though method calls are late bound, composing roles into classes is a compile time operation, with mandatory compile time analysis.
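
For example, a role conflict is reported while the class is being compiled, not when the method is first called; a sketch:

role Walk { method move() { 'walking'  } }
role Swim { method move() { 'swimming' } }
class Duck does Walk does Swim { }
# compile-time error: method 'move' must be resolved by class Duck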

Mutable grammar

The Perl 6 grammar can change during a parse, for example by newly defined operators, but also through more invasive operations such as defining slangs or macros. Speaking of slangs: Perl 6 doesn't have a single grammar, it switches back and forth between the "main" language, regexes, character classes inside regexes, quotes, and all the other dialects you might think of.
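
A tiny example of the grammar changing mid-parse: declaring a new operator makes the rest of the compilation unit parse differently:

sub infix:<±>($a, $b) { ($a - $b, $a + $b) }
say 10 ± 3;   # (7 13) -- the parser picked up the new infix operator during the parse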

Since the grammar extensions are done with, well, Perl 6 grammars, it forces the parser to be interoperable with Perl 6 regexes and grammars. At which point you might just as well use them for parsing the whole thing, and you get some level of minimally required self-hosting.

Meta-Object Programming

In a language like C++, the behavior of the object system is hard-coded into the language, and so the compiler can work under this assumption, and optimize the heck out of it.

In Perl 6, the object system is defined by other objects and classes, the meta objects. So there is another layer of indirection that must be handled.
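
A short illustration: even ordinary introspection is a call on the meta object (output as seen on Rakudo):

class Dog { method bark() { 'Woof!' } }
say Dog.^methods;    # (bark) -- the .^ syntax forwards the call to Dog's meta object
say Dog.HOW.^name;   # Perl6::Metamodel::ClassHOW, the object defining how classes behave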

Mixing of compilation and run time

Declarations like classes, but also BEGIN blocks and the right-hand side of constant declarations are run as soon as they are parsed. Which means the compiler must be able to run Perl 6 code while compiling Perl 6 code. And also the other way round, through EVAL.
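
A minimal demonstration:

BEGIN say "printed while the file is still being compiled";
constant answer = 6 * 7;    # the right-hand side also runs at compile time
say "printed at run time, after compilation has finished";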

More importantly, it must be able to run Perl 6 code before it has finished compiling the whole compilation unit. That means it hasn't even fully constructed the lexical pads, and hasn't initialized all the variables. So it needs special "static lexpads" to which compile-time usages of variables can fall back. Also, the object system has to be able to work with types that haven't been fully declared yet.

So, lots of trickiness involved.

Serialization, Repossession

Types are objects defined through their meta objects. That means that when you precompile a module (or even just the setting, that is, the mass of built-ins), the compiler has to serialize the types and their meta objects. Including closures. Do you have any idea how hard it is to correctly serialize closures?

But, classes are mutable. So another module might load a precompiled module, and add another method to it, or otherwise mess with it. Now the compiler has to serialize the fact that, if the second module is loaded, the object from the first module is modified. We say that the serialization context from the second module repossesses the type.
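
In user-visible Perl 6, the first half of that scenario (a later module adding a method to a class from a precompiled module) might look something like this; the module and class names here are made up for illustration:

use MONKEY-TYPING;
use Some::Precompiled::Module;    # hypothetical precompiled module defining class Some::Thing

augment class Some::Thing {
    method extra() { 'added by a later module' }
}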

And there are so many ways in which this can go wrong.

General Featuritis

One of the many Perl 6 mottos is "torture the implementor on behalf of the user". So it demands not only both static and dynamic typing, but also functional features, continuations, exceptions, lazy lists, a powerful grammar engine, named arguments, variadic arguments, introspection of call frames, closures, lexical and dynamic variables, packed types (for direct interfacing with C libraries, for example), and phasers (code that is automatically run at different phases of the program).

All of these features aren't too hard to implement in isolation, but in combination they are a real killer. And you want it to be fast, right?

Dave's Free Press: Journal: Thankyou, Anonymous Benefactor!

Perlgeek.de : Automating Deployments: 3+ Environments

Software is written to run in a production environment. This is where the goal of the business is achieved: making money for the business, or reaching and educating people, or whatever the reason for writing the software is. For websites, this is typically the Internet-facing public servers.

But the production environment is not where you want to develop software. Developing is an iterative process, and comes with its own share of mistakes and corrections. You don't want your customers to see all those mistakes as you make them, so you develop in a different environment, maybe on your PC or laptop instead of a server, with a different database (though hopefully using the same database software as in the production environment), possibly using a different authentication mechanism, and far less data than the production environment has.

You'll likely want to prevent certain interactions in the development environment that are desirable in production: Sending notifications (email, SMS, voice, you name it), charging credit cards, provisioning virtual machines, opening rack doors in your data center and so on. How that is done very much depends on the interaction. You can configure a mail transfer agent to deliver all mails to a local file or mail box. Some APIs have dedicated testing modes or installations; in the worst case, you might have to write a mock implementation that answers similarly to the original API, but doesn't carry out the action that the original API does.

Deploying software straight to production if it has only been tested on the developer's machine is a rather bad practice. Often the environments are too different, and the developer unknowingly relied on a feature of his environment that isn't the same in the production environment. Thus it is quite common to have one or more environments in between where the software is deployed and tested, and only propagated to the next deployment environment when all the tests in the previous one were successful.

[Figure: after the software is modified in the development environment, it is deployed to the testing environment (with its own database), and if all tests were successful, it is propagated to the production environment.]

One of these stages is often called testing. This is where the software is shown to the stakeholders to gather feedback, and if manual QA steps are required, they are often carried out in this environment (unless there is a separate environment for that).

A reason to have another non-production environment is to test service dependencies. If several different software components are deployed to the testing environment, and you decide to deploy only one or two at a time to production, things might break in production. The component you deployed might have a dependency on a newer version of another component, and since the testing environment contained that newer version, nobody noticed. Or maybe a database upgrade in the testing environment failed and had to be repaired manually; you don't want the same to happen in a production setting, so you decide to test in another environment first.

[Figure: after the software is modified in the development environment, it is deployed to the testing environment (with its own database), and if all tests were successful, it is propagated to the staging environment. Only if this works is the deployment to production carried out.]

Thus many companies have another staging environment that mirrors the production environment as closely as possible. A planned production deployment is first carried out in the staging environment, and on success done in production too, or rolled back on error.

There are valid reasons to have even more environments. If automated performance testing is performed, it should be done in a separate environment where no manual usage is possible, to avoid distorting the results. Other tests such as automated acceptance or penetration testing are best done in their own environments.

[Figure: one can add more environments, for example for automated acceptance, penetration and performance testing; those typically come before the staging environment.]

In addition, dedicated environments for testing and evaluating exploratory features are possible.

It should be noted that while these environments all serve valid purposes, they also come at a cost. Machines, either virtual or physical, on which all those environments run must be available, and they consume resources. They must be set up initially and maintained. License costs must be considered (for example for proprietary databases). Also, the time for deploying code increases as the number of environments increases. With more environments, automating deployments, and maybe even management and configuration of the infrastructure, becomes mandatory.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


Dave's Free Press: Journal: Number::Phone release

Dave's Free Press: Journal: Ill

Dave's Free Press: Journal: CPANdeps upgrade

Perlgeek.de : Writing docs helps you take the user's perspective

This year, most of my contributions to Perl 6 have been to the documentation, or were directly inspired by writing the documentation.

Quite often when I write documentation, I start thinking things like "this is a bit awkward to explain, wouldn't it be more consistent if ...", or "what happens when I use a negative number here? The implementation disallows it, but does it actually need to?", or "if I tell people to just pass this particular value most of the time, why not make it the default?".

Like most people who aspire to be good programmers, I'm lazy. In particular, I hate doing pointless work. And documenting inconsistencies or missing default values or arbitrary restrictions definitely feels like doing work that shouldn't be necessary. So with a sigh I overcome my laziness, and try to fix stuff in the code, the tests, and sometimes the design docs, so I can be more lazy in documenting the features. And of course, to make the overall experience more pleasant for the end user.

I've been skeptical of README-driven development in the past, dismissing it as part of the outdated (or at least not suitable for software) waterfall model, or with "no plan survives contact with the enemy". But now that I'm writing more docs, I see the value of writing docs early (of course with the provision that if things turn out to be impractical as documented, the docs may still be revised). It's very easy as a developer to lose the user's perspective, and writing docs makes it easier (at least for me) to look at the project from that perspective again.

Examples

With the philosophy part done, I'd like to bring some examples.

The missing default value

In Perl 6 land, we distinguish meta classes, which control behavior of a type, and representations, which control memory layout of objects.

Most Perl 6 objects have the representation P6opaque, which provides opaque, efficient storage for objects with attributes, properties, slots, or whatever you call per-object storage in your favorite language. Special representations exist for interfacing with C libraries, concurrency control and so on.

The class Metamodel::Primitives provides primitives for writing meta classes, with this method:

method create_type(Mu $how, $repr) { ... }

$how is our standard name for Meta stuff (from "Higher Order Workings", or simply from controlling how stuff works), and $repr is the name of the representation.

Somebody new to meta object stuff doesn't need to know much about representations (except when they want to do very low-level stuff), so the docs for create_type could have said "if you don't know what representation to use, use P6opaque". Or I could just establish P6opaque as a default:

method create_type(Mu $how, $repr = 'P6opaque') { ... }

There, less to document, and somebody new to this stuff can ignore the whole representations business for a while longer.

Arbitrary restrictions

The method rotor on the List type was intended to create a list of sublists with a fixed number of elements from the original list, potentially with overlap. So the old API was:

method rotor($elems = 2, $overlap = 1) { ... }

And one would use it as:

.say for (1..7).rotor(3, 1);
# 1 2 3
# 3 4 5
# 5 6 7

Again I had an issue with default values: it wasn't clear to me why $elems defaulted to 2 (so I removed that default), or why $overlap defaulted to 1. Wouldn't 0 be a more intuitive default?

But my main issue was that the implementation disallowed negative overlaps, and the design docs were silent on the issue. If you visualize how rotor works (take $elems elements from the list, then step back $overlap elements, then rinse and repeat), it's clear what negative overlaps mean: they are steps forward instead of backwards, and create gaps (that is, some list elements aren't included in the sublists).

And once you allow negative steps backwards, why not work with steps forward in the first place, which are more intuitive to the user, and explicitly allow negative steps to create overlaps?

So that's what we did, though the end result is even more general.
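
To sketch what the generalised interface looks like in current Rakudo: the second value of a pair is now a gap between sublists, and a negative gap steps backwards, creating overlap:

.say for (1..7).rotor(3 => -1);   # negative gap: step back one element, i.e. overlap
# (1 2 3)
# (3 4 5)
# (5 6 7)
.say for (1..9).rotor(2 => 1);    # positive gap: skip one element between sublists
# (1 2)
# (4 5)
# (7 8)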

The crucial question here was "why disallow negative overlaps?", that is, recognizing that a restriction was arbitrary, and then lifting it.

Wording of error messages

Error messages are important to communicate why something went wrong.

We used to have the error message Could not find an appropriate parametric role variant for $role. A test for a good error message is: ask why?, and if the piece of code that threw the error can know the answer, the error message needs improving.

In this case: why can't the runtime environment find an appropriate variant? Because it didn't try hard enough? No. Because it's buggy? I hope not. It can't find the candidate because it's not there. So, include that answer in the error message: No appropriate parametric role variant available for $role.

(Uninformative/lazy error messages are one of my favorite topics for rants; consider the good old SIOCADDRT: No such process that route(8) sometimes emits, or python's Cannot import name X -- why not? ...)

So, write those docs. Write them at a time when you can still change semantics. Keep asking yourself what you could change so that the documentation becomes shorter, sweeter, and easier to understand.

Dave's Free Press: Journal: YAPC::Europe 2006 report: day 3

Perlgeek.de : Automating Deployments: A New Year and a Plan

I work as a software engineer and architect, and in the last year or so I also built automated deployment pipelines for our software. While I found it hard to get started, the end result and even the process of building them were immensely satisfying, and I learned a lot.

The memories of not knowing how to do things are fresh enough in my mind that I feel qualified to teach them to others. And I've been wanting to write a tech book for ages. So yes, here it comes.

For 2016 I am planning to write an ebook on automating deployments. It's going to be a practical guide, mostly using technologies I'm already familiar with, and also pointing out alternative technologies. And there will be enough theory to justify putting in the effort of learning about and implementing automated (and possibly continuous) deployments, and to justify the overall architecture.

I will be blogging about the topics that I want to be in the book, and later distill them into book chapters.

Here is a very rough outline of topics that I want to include, subject to future change:

  • Motivations for automating deployments
  • Requirements for automated/continuous deployments
  • Teaser: Using only git and bash as the simplest thing that could possibly work
  • Discussion of the previous example, and anatomy of a more complex deployment system
  • The build stage: Building Debian packages
  • Distributing Debian packages (with aptly)
  • Deployment to a staging environment with Ansible
  • Automated integration testing
  • Propagation to a production environment
  • Stitching it all together with Go CD

If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.

