perlbuzz.com: All about the new Test2 framework and how it will help your tests

The new Test2 framework has been released after a couple years of development. I wanted to find out about what this means for users of Test::Simple and Test::More, so I chatted with the project leader, Chad Granum (exodist).

Andy Lester: So Test2 has just been released after a couple of years of work, and a lot of discussion. For those of us who haven’t followed its development, what is Test2 and why is it a good thing?

Chad Granum: The big changes will be for people who write test modules. The old Test::Builder was tied to generating a specific form of TAP output. That’s been replaced with a flexible event system.

It all started when David Golden submitted a patch to change the indentation of a comment intended for humans who read the test. The change would help people, but meant nothing to the machine. I had to reject the patch because it broke a lot of downstream modules. Things broke because they tested that Test::Builder produced the message in its original form. I thought that was crazy, and wanted to make things easier to maintain, test, and improve.

Andy: Test::Builder’s internals were pretty fragile?

Chad: That is true, but that’s not the whole picture. The real problem was the tools people used to validate testing tools. Test::Builder::Tester was the standard, and it boiled down to giant string comparisons of TAP output, which mixes messages for the computer’s use, and messages for human use.

While most of the changes are under the hood, there are improvements for people who just want to write tests. Test2 has a built-in synchronization system for forking/threading. If you modify a test to load Test2::IPC before loading Test::More, then you can fork in your tests and it will work in sane/reasonable ways. Up until now doing this required external tools such as Test::SharedFork which had severe limitations.
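For example, a minimal forking test might look roughly like this (a sketch; the assertions are made up for illustration):

use strict;
use warnings;

use Test2::IPC;   # must be loaded before Test::More for fork support
use Test::More;

ok(1, 'assertion in the parent');

my $pid = fork;
die "fork failed: $!" unless defined $pid;

if ($pid) {
    waitpid($pid, 0);   # parent waits; the child's events are synced back
}
else {
    ok(1, 'assertion in the child');   # collected by the parent via Test2::IPC
    exit 0;
}

done_testing;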

Another thing I want to note is an improvement in how Test2 tracks file and line number for error reporting purposes. As you know, diagnostics are reported when a test fails, and they give you the filename and line number of the failure. Test::Builder used a global variable, $Test::Builder::Level, which people were required to localize and bump whenever they added a stack frame to their tool. This was confusing and easy to get wrong.
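For example, a wrapper written against Test::Builder had to do something like this (a sketch; the wrapper itself is made up):

use Test::More;

sub ok_positive {
    my ($num, $name) = @_;
    # Bump $Level so a failure is reported at our caller's line,
    # not at the ok() call inside this wrapper.
    local $Test::Builder::Level = $Test::Builder::Level + 1;
    return ok($num > 0, $name);
}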

Test2 now uses a Context object. This object solves the problem by locking in the “context” (file + line) when the tool is first called. All nested tools will then find that context. The context object also doubles as the primary interface to Test2 for tool writers, which means it will not be obscure like the $Level variable was.

Andy: I just counted 1045 instances of $Test::Builder::Level in my codebase at work. Are you saying that I can throw them all away when I start using Test2?

Chad: Yes, if you switch to using Test2 in those tools you can stop counting your stack frames. That said, the $Level variable will continue to work forever for backwards compatibility.
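With Test2, the same kind of wrapper looks roughly like this (a sketch using the Test2::API context() interface; the wrapper itself is made up):

use Test2::API qw(context);

sub ok_positive {
    my ($num, $name) = @_;
    my $pass = $num > 0;
    my $ctx  = context();    # locks in the caller's file and line
    $ctx->ok($pass, $name);  # failure diagnostics point at the caller
    $ctx->release;           # always release the context when done
    return $pass;
}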

Andy: Will the TAP output be the same? We’re still using an ancient install of Smolder as our CI tool and I believe it expects TAP to look a certain way.

Chad: Extreme care was taken to ensure that the TAP output did not change in any significant ways. The one exception is David Golden’s change that started all this:

ok 1 - random test
    # a subtest
    ok 1 - subtest result
    1..1
ok 2 - a subtest

This has changed to:

ok 1 - random test
# a subtest
    ok 1 - subtest result
    1..1
ok 2 - a subtest

That is the change that started all this, and had the potential to break CPAN.

Andy: So Test2 is all about possibilities for the future. It’s going to make it easier for people to create new Test:: modules. As the author of a couple of Test:: modules myself, I know that the testing of the tests is always a big pain. There’s lots of cut & paste from past modules that work and tweaking things until they finally pass the tests. What’s different between the old way of doing the module testing and now?

Chad: Test::Builder assumed TAP would be the final product, and did not give you any control over, or hooks into, anything between your tool and the TAP. As such, you had to test your final TAP output, which often included text you did not produce yourself. In Test2 we drop those assumptions: TAP is no longer assumed, and you have hooks into almost every step of the process between your tool and the final output.

Many of the actions Test::Builder would accomplish have been turned into Event objects. Test tools do their thing, and then fire events off to Test2 for handling. Eventually these events hit a formatter (TAP by default) and are rendered for a harness. Along with the hooks, there is a tool in Test2::API called intercept: it takes a code block, and all events generated inside that block are captured and returned; they are not rendered and do not affect the global test state. Once you capture your events you can test them as data structures, and ignore the ones that are not relevant to your tool.

The Test::Builder::Tester way may seem simpler at first, but that is deceptive. There is a huge loss of information. Also, if there are changes to how Test::Builder renders TAP, such as dropping the ‘-’, then everything breaks.

Using Test::Builder::Tester

test_out("ok 1 - a passing test");
ok(1, 'a passing test');
test_test("Got expected line of TAP output");

Using intercept and basic Test::More tools

my $events = intercept {
    ok(1, 'a passing test');
};

my $e = shift @$events;

ok($e->pass, "passing tests event");
is($e->name, "a passing test", "got event name");
is_deeply(
    $e->trace->frame,
    [__PACKAGE__, __FILE__, 42, 'Test2::Tools::Basic::ok'],
    "Got package, file, line and sub name"
);

Using Test2::Tools::Compare

like(
    intercept {
        ok(1, 'a passing test');
    },
    array {
        event Ok => sub {
            call pass => 1;
            call name => 'a passing test';

            prop file    => __FILE__;
            prop package => __PACKAGE__;
            prop line    => 42; 
            prop subname => 'Test2::Tools::Basic::ok';
        };
    },
    'A passing test'
);

Andy: What other features does Test2 include for users who aren’t creating Test:: modules?

Chad: Test2’s core, which is included in the Test-Simple distribution, does not have new features at the user level. However, Test2-Suite was released at the same time as Test2/Test-Simple; it contains new versions of all the Test::More tools and adds some things people have been requesting for years but that were not possible with the old Test::Builder.

The biggest example would be “die/bail on fail”, which lets you tell the test suite to stop after the first failure. The old stuff could not do this because there was no good hook point, and important diagnostics would be lost.

It’s as simple as using one of these two modules:

use Test2::Plugin::DieOnFail;
use Test2::Plugin::BailOnFail;

The difference is that DieOnFail calls die under the hood. BailOnFail sends a bail-out event that aborts the current file and, depending on the harness, might stop the entire test run.
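For illustration, a minimal .t file using BailOnFail might look like this (a sketch, assuming the Test2-based Test-Simple and Test2::Suite are installed; the assertions are made up):

use strict;
use warnings;

use Test2::Plugin::BailOnFail;
use Test::More;

ok(1, 'this passes');
ok(0, 'this fails, so a bail-out is issued after its diagnostics');
ok(1, 'this is never reached');

done_testing;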

Andy: So how do I start using Test2? At my day job, our code base has 1,200 *.t files totalling 282,000 lines of code. Can I expect to install the new version of Test::Simple (version 1.302019) that includes Test2 and everything will “just work”?

Chad: For the vast majority of cases the answer is “yes”. Backwards compatibility was one of the most significant concerns for the project. That said, some things did unfortunately break. A good guide to what breaks, and why, can be found in this document. Usually things break because they muck about with the Test::Builder internals in nasty ways, often because Test::Builder’s limitations left them no choice. When I found such occurrences I tried to add hooks or APIs to do those things in sane/reasonable ways.

Andy: Do I have to upgrade? Can I refuse to go up to Test-Simple 1.302019? What are the implications of that?

Chad: Well, nobody is going to come to you and force you to install the latest version. If you want to keep using your old version you can. You might run into trouble down the line if other Test:: tools you use decide to make use of Test2-specific features, at which point you would need to lock in old versions of those as well. You also would not be able to start using any new tools people build using Test2.

Andy: And the tools you’re talking about are Test:: modules, right? The command line tool prove and make test haven’t changed, because they’re part of Test::Harness?

Chad: Correct. Test::Harness has not been touched; it will work on any test files that produce TAP, and Test2 still produces TAP by default. That said, I do have a project in the works to create an alternative harness specifically for Test2 stuff, but it will never be a requirement to use it; things will always work on Test::Harness.

Andy: So if I’m understanding the Changes file correctly, Test-Simple 1.302012 was the last old-style version and 1.302014 is the new version with Test2?

Chad: No. Test-Simple-1.001014 is the last STABLE release of Test-Simple that did not have Test2, and Test-Simple-1.302015 was the first stable release to include Test2. There were a lot of development releases between the two, but no stable ones. The version numbers had to be carefully crafted to follow the old scheme, but we also had to keep them below 1.5xxxxx because the previous maintainers’ projects had used that number, as well as 2.0. Some downstream users had code that switched on the version number and expected an API that never came to be. Most of these downstream distributions have been fixed now, but we are using a “safe” version number just in case.

Andy: What has development for this been like? This has been in the works for, what, two years now? I remember talking to you briefly about it at OSCON 2014.

Chad: At the point we talked I had just been given Test-Simple, and did not have any plans to make significant changes. What we actually talked about was my project Fennec which was a separate Test::Builder based test framework. Some features from Fennec made their way into Test2, enough so that Fennec will be deprecated once I have a stable Test2::Workflow release.

Initially development started as a refactor of Test::Builder that was intended to be fairly small. The main idea was to introduce the events, and a way to capture them. From there it ballooned out as I fixed bugs, or made other changes necessary to support events.

At one point the changes were significant enough, and broke enough downstream modules that I made it a complete fork under the name Test-Stream. I figured it would be easier to make Test::Builder a compatibility wrapper.

In 2015, I attended the QA hackathon in Berlin, and my Test-Stream fork was a huge topic of conversation. The conversation resulted in a general agreement (not unanimous) that it would be nice to have these changes. There was also a list of requests (demands?) for the project before it could go stable. We called it the punch-list.

After the Berlin hackathon there was more interest in the project. Other toolchain people such as Graham Knop (Haarg), Daniel Dragan (bulk88), Ricardo Signes (rjbs), Matt Trout (mst), Karen Etheridge (ether), Leon Timmermans (leont), Joel Berger (jberger), Kent Fredric (kentnl), Peter Rabbitson (ribasushi), etc. started reviewing my code, making suggestions, and reporting bugs. This was one of the most valuable experiences. The project as it is now is much different from what it was in Berlin, and it is much better for the extra eyes and hands.

A month ago there was another QA hackathon, in Rugby, UK, and once again Test2 was a major topic. This time the general agreement was that it was ready now. The only new requirements on the table were to make the broken downstream modules very well known, and to get a week of extra cpan-testers results prior to release.

I must note that at both QA hackathons the decisions were not unanimous, but in both cases there was a very clear majority.

Andy: So what’s next? I see that you have a grant for more documentation. Tell me about that, and what can people do to help?

Chad: The Test2 core API is not small, and has more moving pieces than Test::Builder did. Right now there is plenty of technical/module documentation, but there is a lack of overview documentation. There is a need for a manual that helps people find solutions to their problems and ties the various parts together. The first part of the manual will be aimed at tool authors.

Test2::Suite is also not small, but it provides a large set of tools for people to use; some are improvements on old tools, some are completely new. The manual will have a second section on using these new tools. This second part of the manual will be geared towards people writing tests.

The best way for people to help would be to start using Test2::Suite in their tests, and Test2 in their test tools. People will undoubtedly find places where more documentation is needed, or where things are not clear. Reporting such documentation gaps would help me to write better documentation. (Test::More repo, Test2::Suite repo)

Apart from the documentation, I have 2 other Test2 related projects nearing completion: Test2-Workflow, which is an implementation of the tools from Fennec that are not a core part of Test2, and Test2-Harness which is an optional alternative to Test::Harness. Both are pretty much code-complete on GitHub, but neither has the test coverage I feel is necessary before putting them on CPAN.

Andy: Thanks for all the work that’s gone into this, both to you and the rest of those who’ve contributed. It sounds like we’ll soon see more tools to make testing easier and more robust.

Sawyer X: Perl 5 Porters Mailing List Summary: May 25th-29th

Hey everyone,

Following is the p5p (Perl 5 Porters) mailing list summary for the remainder of the past week. Enjoy!

May 25th-29th

News and updates

Additional grant reports by Tony Cook. Over 35 total hours, approximately 14 tickets were reviewed or worked on, and 4 patches were applied.

Tony also published his entire April grant report. Over 71 total hours, approximately 40 tickets were reviewed, and 3 patches were applied.

Dave Mitchell finished the work on Scope::Upper, making it pass all of its tests. Kent Fredric provided a tarball with all of Dave's patches, in order to test it.

Issues

New issues

Resolved issues

Discussion

The conversation around the possible deprecation of encoding.pm continues.

The conversation around the usage of strcpy in locale.c continued.

In the conversation about a compile-time indirect method call check, Vincent Pit notes that the current implementation of the indirect pragma is not suitable for core. Abigail is not in favor of having it in core at all. Zefram hints at Sub::StrictDecl.

Father Chrysostomos opened Perl #128242 to discuss the idea of providing aliasing on the right hand side of a my statement. There are many questions about this and there's even the possibility of introducing a new character for this new type of ability. I recommend reading comments by Zefram provided here and here.

In Perl #128241 Father Chrysostomos suggests handling the situation of a regex with a variable that ends up being empty: /$empty/. Because it is then equivalent to //, it will do something different from what is usually expected. The threads of conversation on the topic are here and here.

Aristotle Pagaltzis agrees an unused POSIX symbol does not need a dedicated deprecation cycle since it isn't used anywhere in CPAN.

perlbuzz.com: Perlbuzz news roundup for 2016-05-27

It’s been a while since Perlbuzz.com has been up and running. Time for me to empty the queue of the last two years of Twitter postings.

These links are collected from the Perlbuzz Twitter feed. If you have suggestions for news bits, please mail me at andy@perlbuzz.com.

Perl Foundation News: YAPC Newsletter #8

In this issue

  • Registration deadlines approaching

  • Keynote address: The Dark Art of Boatbuilding and Project Management -- A salty look at projects, people and customers

  • Sponsor Spotlight: cPanel

Registration deadlines approaching

  • Book your room now! Conference rates are only available until June 1st! After June 1st rates return to the standard hotel rate with no guarantee of availability. Rooms are limited! Reserve yours today! And if you have trouble reserving this room, this news post may have some helpful tips for you.

  • YAPC::NA::2016 Conference passes are still only $250, until June 4th. After June 4th a Late Enrollment fee will also be assessed. Don’t hesitate! Get your passes today!

Keynote Address: The Dark Art of Boatbuilding and Project Management -- A salty look at projects, people and customers.

We are pleased to announce keynote speaker Art Eaton. A Perl developer and Florida native with experience growing up in a shipyard, Art has an entertaining and useful message for us all. Taking the viewpoint that all efficient industrial processes are identical at their core, he examines the IT industry from a boat builder's perspective.

This talk is all about running a successful, professional, and happy boatyard where people find value and comfort in their jobs, the quality of work is high, and customers know what they are getting for their money.

...sort of. Art will be applying business experience gained from his marine engineering, medical, military and IT backgrounds to present you with a sea story about project management with which people in all industries can identify.

Sponsor Spotlight: cPanel

http://cpanel.com/

The cPanel & WHM software package is an easy-to-use control panel that gives web hosts, and the website owners they serve, the ability to quickly and easily manage their servers and websites. Developed in Perl and working in the SCRUM development methodology, cPanel seeks highly motivated development team members in our Houston location. Visit http://jobs.cpanel.com for open positions.

Want to sponsor YAPC?

The best way to sponsor YAPC is through a sponsorship to The Perl Foundation. By sponsoring TPF, not only do you get recognition for your support of YAPC::NA, but you are also recognized as a sponsor of our regional Perl Workshops, our Outreach Program for Women, beginner training initiatives, and our grants programs for an entire year. It really is the way to get the most value for your sponsorship money. Find out more by visiting https://donate.perlfoundation.org or by contacting treasurer@perlfoundation.org.

Perl Foundation News: Maintaining Perl 5: Grant Report for April 2016

Tony Cook writes:

Approximately 48 tickets were reviewed, and 7 patches were applied.

Hours  Activity
 1.57  investigate new ipsysv.t failures on darwin, create cpan #112827, fix in blead, #p5p unicode string behaviour discussion
 4.60  #122287 testing
       #122287 work on a patch to Configure/Makefile.SH
       #122287 more work, testing cross platform, comment with patch
       #122287 double check patch, expand comment on patch
 1.47  #124430 try to find why App::assh started working
       #124430 keep trying, start a bisect
 3.42  #125296 review, experiment and comment
       #125296 comment
 1.45  #125368 bisect and close
 3.30  #126162 apply patch manually, work on fixes
       #126162 more fixes, push to smoke-me and comment
 1.70  #126206 review ticket, try to work up a test case
       #126206 more try to work up a test case
 0.15  #126545 review discussion and tickets and comment
 0.32  #127080 review with aim to close, but unresolved issue, comment
 0.40  #127158 (sec) review discussion to see if anything needs to remain private (for now)
       #127158 make public
 7.15  #127231 ask List-MoreUtils maintainer for a fixed release
       #127231 partly track down to Params::Validate bug
       #127231 testing
       #127231 try to reproduce again
       #127231 open a PR against P::V and comment
       #127231 review autarch's suggested extra fix and comment on PR
 5.61  #127380 (sec) work on alternate patch
       #127380 (sec) more alternate patch work
       #127380 (sec) comment
       #127380 review discussion and comment
       #127380 comment, work on a patch
       #127380 reply to new comments, work on patch, comment with patch
       #127380 reply to new comments
 2.17  #127455 more debugging, work on hints to downgrade optimization
 4.03  #127494 testing, fix win32 jenkins failure
       #127494 testing, debugging, comment
 0.65  #127533 produce a simple patch and comment
       #127533 apply to blead
 1.50  #127543 review email from Alan Burlinson
       #127543 review, research and comment
       #127543 follow-up
 0.50  #127555 review, add to stack not refcounted ticket and briefly comment
 0.22  #127585 comment
 1.95  #127611 review and comment
       #127611 research and comment
       #127611 research and comment
 2.03  #127619 testing and comment
       #127619 review newest patches, testing, apply to blead
 1.21  #127635 review, produce an alternate patch
       #127635 re-test and apply alternate patch to blead
 0.43  #127636 review, test and apply to blead
 1.10  #127641 optimization, fix cmp_version.t failure
 1.32  #127657 build, try to figure out symbolizer
 3.48  #127663 review code, work on a patch
       #127663 more work on patch
 0.45  #127664 review patches, minor fixes, apply to blead
 0.53  #127687 research and comment
 6.75  #127708 research, discussion with khw on #p5p, comment
       #127708 testing, comments, extra patch, patch to add i_xlocale and d_duplocale probes
       #127708 try thread-safe locale code, seems to work, irc discussion
       #127708 review khw's new “locale and threads” message, research and reply
       #127708 note extra discussion in ticket, comment in extra discussion
       #127708 brief irc discussion
 0.18  #127712 comment
 0.43  #127713 testing, research and comment
 1.25  #127746 discussion with khw, testing, start bisect
 0.43  #127751 try a test build, seems to be strawberry perl specific, comment
 4.25  #127759 debugging
       #127759 debugging
       #127759 debugging
 7.48  #127760 work on debug code to debug via smoke
       #127760 more work on test patch, push to smoke-me
       #127760 review smoke log, add more debug code, local testing and debugging
       #127760 review smoke results, testing on Win7 and XP, look for Win2000 in old MSDN disks, ask George to run a simple test program
       #127760 comment
       #127760 testing, push a smoke-me with a skip for Win2k and earlier
 0.95  #127791 review, testing, apply to blead
 0.03  #127793 move to perl6 queue
 0.25  #127796 (sec) research and comment
 1.27  #127802 debugging and comment
 1.43  #127804 review, testing, start bisect
       #127804 fix bisect and finish, comment
 0.45  cygwin test failures
 1.30  fix bad code on non-clang in threads.xs
 0.05  khw's cygwin locale skip patch follow-up
 1.22  maint votes
 0.93  more alarm.t failures, check over history, try to reproduce without a VM (in a loop)
 1.90  more cygwin, get file.t passing, look into alarm.t failures
 1.97  more maint votes
 0.18  new Time::HiRes warning, try to search cpan for hrt_ualarm users
 2.27  track down jenkins Win32 issue, reproduce on Linux, track down to buffer overflow and fix, testing, notify list and Yves

87.68 hours total

Perl Foundation News: Maintaining the Perl 5 Core: Report for Month 31

Dave Mitchell writes:

I spent last month mainly working on various assorted RT tickets.

Summary

10:54 [perl #127746] charset.t and subst.t fail on Solaris under -Duse64bitall
0:40 [perl #127799] Bleadperl breaks TOKUHIROM/Module-Build-Pluggable-0.10.tar.gz on Windows
18:02 [perl #127834] @INC issues
14:57 [perl #127875] Blead breaks Scope::Upper
0:51 [perl #127915] $=x~0 segfaults Perl 5.24.0-RC1-2-gde1d2c7
1:00 [perl #127999] Slowdown in split + list assign
9:50 process p5p mailbox
1:00 review [perl #127810] Provide -Dfortify_inc
0:29 review and apply [perl #127819] Get -DPERL_MEM_LOG working again
1:00 sort out an unconfig.h issue

58:43 Total (HH:MM)

As of 2016/04/30: since the beginning of the grant:

132.9 weeks
1837.1 total hours
13.8 average hours per week

There are 163 hours left on the grant.

Sawyer X: Perl 5 Porters Mailing List Summary: May 19th-24th

Hey everyone,

Following is the p5p (Perl 5 Porters) mailing list summary for the past week and a bit. Enjoy!

May 19th-24th

News and updates

Perl 5.25.1 is now available! You can read the release announcement here. The date for 5.26 is for May 2017. Those are typos. :)

Lexical subroutines are no longer experimental!

Perl can now recognize version control conflict markers, thanks to a patch by Lukas Mai in Perl #127993.

Karl Williamson created a META ticket for 5.24.1 blockers in Perl #128222.

Dave Mitchell provides his grant report. In total over 21 hours, mostly spent on making Scope::Upper work on the latest version of Perl, 5.24.0.

Issues

New issues

Resolved issues

Proposed patches

Klaus Baldermann provided a patch for perlbug in Perl #128180 to add more verbosity to the output, following a PerlMonks thread.

Michael Haubenwallner provided a patch to avoid libperl.dll.dll.a in Cygwin.

Discussion

An update from H.Merijn Brand (Tux) that he had finished preparing builds for 5.24.0 for HP-UX ia64.

The discussion around detecting perl6 in the shebang line continues.

Ed Avis suggested revisiting a suggestion by Kent Fredric to bring indirect into core. People have shown support for removing indirect object notation from examples in the core documentation, but have yet to express a position on Kent's original suggestion. Sawyer X requested that performance be verified first, and that the discussion continue, before a decision is made.

Father Chrysostomos pings us about the sub :const feature, which allows making anonymous subroutines constant. Is anyone using it? Do you like it? Does it have any problems? Should it stay experimental?

Father Chrysostomos suggests perhaps only partially deprecating encoding.

There is an interesting conversation around Perl #127531 (permit \escape on right side of my). Father Chrysostomos made progress on the topic, and there are several interesting comments on it by Ricardo Signes and Aristotle Pagaltzis.

H.Merijn Brand updates that a machine which was used for smoking perl core on HP-UX 10.20 is gone for good. Rest in peace.

Father Chrysostomos suggests deprecating encoding::warnings.

Glenn Golden asks whether the usage of FileHandle.pm is applicable in an example of perlipc.

In Perl #128227, Eric Wong suggests moving Perl to vfork for spawning external processes. Leon Timmermans found that Perl had originally used it but abandoned it, while Ivan Pozdeev adds that Configure asks whether to use vfork and that the current POSIX-standard replacement for vfork is posix_spawn.

Tom Wyant noted in Perl #128213 that while the literal left curly bracket was deprecated in 5.22 and has produced a compile error since 5.25.1, there was no deprecation warning in 5.24. Karl Williamson provides extensive comments on the change and its intent. This continued with a lively discussion between Zefram and Yves Orton.

Perl Foundation News: Staging Server for New blogs.perl.org

Although we have no new progress reports for the blogs.perl.org migration since the third one, a development version of the site is available for testing at http://blogsperlorg.pearlbee.org/.

If you use the existing site, please check that the new beta site works for you.

Please leave a comment below if you have helpful suggestions for how to improve the new site. Make sure your suggestions fall within the scope of the grant's proposal.

Perl Hacks: Dancing in Cluj-Napoca

Over the last couple of weeks I’ve been running a poll to decide which training course to run at YAPC Europe in August. Thank you to the people who voted in the poll.

I’ve just closed the poll and the results are pretty clear. In Cluj-Napoca I’ll be running a course on Modern Web Development with Perl and Dancer. That was the most popular choice, with 31% of the vote. Moose and “Other” were the second most popular choices, with 19% each. Here are the full results.

Title                                             %
Modern Web Development with Perl and Dancer       31%
Object Oriented Programming with Perl and Moose   19%
Other                                             19%
Database Programming with Perl and DBIx::Class    16.7%
Testing Perl Programs                             14.3%

The “other” responses were interesting. A couple of people asked for Perl 6 training (and I think their wish might be granted – but I don’t want to pre-empt announcements by other people). Someone wanted “Advanced Testing”. Someone wanted “nodejs”. Someone wanted the web training, but with Mojolicious instead of Dancer (I’ve never used Mojolicious so I’m not the right person to be running that course). Oh, and we had one vote each for “all of the above” and “none of the above”. Perhaps some suggestions there if someone else wants to run a training course at the conference.

I also asked about cost. And those answers were interesting too. I guess it’s no surprise that people gravitated towards the lower numbers (“how much do you want to pay?”; “as little as possible, obviously!”) but it wasn’t the lowest price that was most popular. The most popular choice (with 42.5%) was 100 €. We haven’t worked out the details of the pricing yet (we need to see what the venue will charge us) but I hope to get it as close to 100 € as possible.

Speaking of the venue, we do know where the training will be. It will be at Cluj Hub, which is a really great-looking co-working and events space in Cluj. As I said above, we’re still working out the details (costs, catering, stuff like that) but there are some fabulous plans being discussed and I hope to be able to announce full details soon.

And what about the class itself? Well, I’m glad you asked. It’ll be a hands-on course, and over a day we’ll build a complete (and, hopefully, useful) little web application using a number of modern web technologies. The back-end will (of course) be Perl (specifically Dancer2) but we’ll also be using Bootstrap, jQuery, Mustache and more.

It’s a course I’ve been working on intermittently for some time and I’m really pleased with how it’s shaping up. I think you’ll enjoy it too.

So when you’re planning your trip to Cluj-Napoca, please consider travelling a day early and coming to the training. It’ll be a lot of fun.

Once the last details have been worked out, we’ll add it to the YAPC web site so you can book it.

In summary:

Modern Web Development with Perl and Dancer
One-day, hands-on course
Cluj Hub, Cluj-Napoca, Romania
Tue 23 August 2016
Cost: TBA (but as close to 100 € as possible)

The post Dancing in Cluj-Napoca appeared first on Perl Hacks.

Dave's Free Press: Journal: Module pre-requisites analyser

Dave's Free Press: Journal: CPANdeps

Dave's Free Press: Journal: Perl isn't dieing

Perlgeek.de : Introducing Go Continuous Delivery

Go Continuous Delivery (short GoCD or simply Go) is an open source tool that controls an automated build or deployment process.

It consists of a server component that holds the pipeline configuration, polls source code repositories for changes, schedules and distributes work, collects artifacts, and presents a web interface to visualize and control it all, and offers a mechanism for manual approval of steps. One or more agents can connect to the server, and carry out the actual jobs in the build pipeline.

Pipeline Organization

Every build, deployment, or test job that GoCD executes must be part of a pipeline. A pipeline consists of one or more linearly arranged stages. Within a stage, jobs run potentially in parallel, and are individually distributed to agents. Tasks are again executed linearly within a job. The most general task is the execution of an external program. Other tasks include the retrieval of artifacts, or specialized things such as running a Maven build.

Matching of Jobs to Agents

When an agent is idle, it polls the server for work. If the server has jobs to run, it uses two criteria to decide if the agent is fit for carrying out the job: environments and resources.

Each job is part of a pipeline, and a pipeline is part of an environment. On the other hand, each agent is configured to be part of one or more environments. An agent only accepts jobs from pipelines from one of its environments.

Resources are user-defined labels that describe what an agent has to offer, and inside a pipeline configuration, you can specify what resources a job needs. For example, you can define that a job requires the phantomjs resource to test a web application; then only agents to which you have assigned this resource will execute that job. It is also a good idea to add the operating system and version as resources. In the example above, the agent might have the phantomjs, debian and debian-jessie resources, offering the author of the job some choice of granularity for specifying the required operating system.

Installing the Go Server on Debian

To install the Go server on a Debian or Debian-based operating system, first you have to make sure you can download Debian packages via HTTPS:

$ apt-get install -y apt-transport-https

Then you need to configure the package sources:

$ echo 'deb http://dl.bintray.com/gocd/gocd-deb/ /' > /etc/apt/sources.list.d/gocd.list
$ curl https://bintray.com/user/downloadSubjectPublicKey?username=gocd | apt-key add -

And finally install it:

$ apt-get update && apt-get install -y go-server

When you now point your browser at port 8154 of the go server for HTTPS (ignore the SSL security warnings) or port 8153 for HTTP, you should see the go server's web interface:

To prevent unauthenticated access, create a password file (you need to have the apache2-utils package installed to have the htpasswd command available) on the command line:

$ htpasswd -c -s /etc/go-server-passwd go-admin
New password:
Re-type new password:
Adding password for user go-admin
$ chown go: /etc/go-server-passwd
$ chmod 600 /etc/go-server-passwd

In the go web interface, click on the Admin menu and then "Server Configuration". In the "User Management", enter the path /etc/go-server-passwd in the field "Password File Path" and click on "Save" at the bottom of the form.

Immediately afterwards, the go server asks you for username and password.

You can also use LDAP or Active Directory for authentication.

Installing a Go Worker on Debian

On one or more servers where you want to execute the automated build and deployment steps, you need to install a go agent, which will connect to the server and poll it for work. On each server, you need to do the same first three steps as when installing the server, to ensure that you can install packages from the go package repository. And then, of course, install the go agent:

$ apt-get install -y apt-transport-https
$ echo 'deb http://dl.bintray.com/gocd/gocd-deb/ /' > /etc/apt/sources.list.d/gocd.list
$ curl https://bintray.com/user/downloadSubjectPublicKey?username=gocd | apt-key add -
$ apt-get update && apt-get install -y go-agent

Then edit the file /etc/default/go-agent. The first line should read

GO_SERVER=127.0.0.1

Change the right-hand side to the hostname or IP address of your go server, and then start the agent:

$ service go-agent start

After a few seconds, the agent has contacted the server, and when you click on the "Agents" menu in the server's web frontend, you should see the agent:

("lara" is the host name of the agent here).

A Word on Environments

Go makes it possible to run agents in specific environments, and for example run a go agent on each testing and on each production machine, and use the matching of pipelines to agent environments to ensure that for example an installation step happens on the right machine in the right environment. If you go with this model, you can also use Go to copy the build artifacts to the machines where they are needed.

I chose not to do this, because I didn't want to have to install a go agent on each machine that I want to deploy to. Instead I use Ansible, executed on a Go worker, to control all machines in an environment. This requires managing the SSH keys that Ansible uses, and distributing packages through a Debian repository. But since Debian seems to require a repository anyway to be able to resolve dependencies, this is not much of an extra hurdle.

So don't be surprised when the example project here only uses a single environment in Go, which I call Control.

First Contact with Go's XML Configuration

There are two ways to configure your Go server: through the web interface, and through a configuration file in XML. You can also edit the XML config through the web interface.

While the web interface is a good way to explore go's capabilities, it quickly becomes annoying to use due to too much clicking. Using an editor with good XML support gets things done much faster, and it lends itself better to compact explanation, so that's the route I'm going here.

In the Admin menu, the "Config XML" item lets you see and edit the server config. This is what a pristine XML config looks like, with one agent already registered:

<?xml version="1.0" encoding="utf-8"?>
<cruise xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="cruise-config.xsd" schemaVersion="77">
<server artifactsdir="artifacts" commandRepositoryLocation="default" serverId="b2ce4653-b333-4b74-8ee6-8670be479df9">
    <security>
    <passwordFile path="/etc/go-server-passwd" />
    </security>
</server>
<agents>
    <agent hostname="lara" ipaddress="192.168.2.43" uuid="19e70088-927f-49cc-980f-2b1002048e09" />
</agents>
</cruise>

The ServerId and the data of the agent will differ in your installation, even if you followed the same steps.

To create an environment and put the agent in, add the following section somewhere within <cruise>...</cruise>:

<environments>
    <environment name="Control">
    <agents>
        <physical uuid="19e70088-927f-49cc-980f-2b1002048e09" />
    </agents>
    </environment>
</environments>

(The agent UUID must be that of your agent, not of mine).

To give the agent some resources, you can change the <agent .../> tag in the <agents> section to read:

<agent hostname="lara" ipaddress="192.168.2.43" uuid="19e70088-927f-49cc-980f-2b1002048e09">
  <resources>
    <resource>debian-jessie</resource>
    <resource>build</resource>
    <resource>debian-repository</resource>
  </resources>
</agent>

Creating an SSH key

It is convenient for Go to have an SSH key without password, to be able to clone git repositories via SSH, for example.

To create one, run the following commands on the server:

$ su - go
$ ssh-keygen -t rsa -b 2048 -N '' -f ~/.ssh/id_rsa

And either copy the resulting .ssh directory and the files therein onto each agent into the /var/go directory (and remember to set owner and permissions as they were created originally), or create a new key pair on each agent.

Ready to Go

Now that the server and an agent have some basic configuration, everything is ready for the first pipeline configuration. Which we'll get to soon :-).


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


Dave's Free Press: Journal: YAPC::Europe 2007 report: day 3

Perlgeek.de : Managing State in a Continuous Delivery Pipeline

Continuous Delivery is all nice and fluffy for a stateless application. Installing a new application is a simple task: install the new binaries (or sources, in the case of a language that's not compiled), stop the old instance, and start a new instance. Bonus points for reversing the order of the last two steps, to avoid downtime.

But as soon as there is persistent state to consider, things become more complicated.

Here I will consider traditional, relational databases with schemas. You can avoid some of the problems by using a schemaless "noSQL" database, but you don't always have that luxury, and it doesn't solve all of the problems anyway.

Along with the schema changes you have to consider data migrations, but they aren't generally harder to manage than schema changes, so I'm not going to consider them in detail.

Synchronization Between Code and Database Versions

State management is hard because code is usually tied to a version of the database schema. There are several cases where this can cause problems:

  • Database changes are often slower than application updates. If version 1 of your application can only deal with version 1 of the schema, and version 2 of the application can only deal with version 2 of the schema, you have to stop the application in version 1, do the database upgrade, and start up the application only after the database migration has finished.
  • Stepbacks become painful. Typically either a database change or its rollback can lose data, so you cannot easily do an automated release and stepback over these boundaries.

To elaborate on the last point, consider the case where a column is added to a table in the database. In this case the rollback of the change (deleting the column again) loses data. On the other hand, if the original change is to delete a column, that step usually cannot be reversed; you can recreate a column of the same type, but the data is lost. Even if you archive the deleted column data, new rows might have been added to the table, and there is no restore data for these new rows.

Do It In Multiple Steps

There is no tooling that can solve these problems for you. The only practical approach is to collaborate with the application developers, and break up the changes into multiple steps (where necessary).

Suppose your desired change is to drop a column that has a NOT NULL constraint. Simply dropping the column in one step comes with the problems outlined above. In a simple scenario, you might be able to do the following steps instead:

  • Deploy a database change that makes the column nullable (or give it a default value)
  • Wait until you're sure you don't want to roll back to a version where this column is NOT NULL
  • Deploy a new version of the application that doesn't use the column anymore
  • Wait until you're sure you don't want to roll back to a version of your application that uses this column
  • Deploy a database change that drops the column entirely.

In a more complicated scenario, you might first need to deploy a version of your application that can deal with reading NULL values from this column, even if no code writes NULL values yet.
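As a rough illustration of the very first step, the "make the column nullable" change could be a migration script as small as the following sketch (the table and column names, the PostgreSQL backend, and the connection details are all hypothetical):

use strict;
use warnings;
use DBI;

# Hypothetical connection details; in practice these come from the
# deployment configuration for the target environment.
my $dbh = DBI->connect(
    'dbi:Pg:dbname=myapp;host=db01.example.com',
    'deploy',
    $ENV{DB_PASSWORD},
    { RaiseError => 1, AutoCommit => 1 },
);

# Step one of the multi-step change: relax the constraint so that a later
# application release is free to stop writing this column. Dropping the
# column itself happens in a separate, later deployment.
$dbh->do('ALTER TABLE orders ALTER COLUMN legacy_note DROP NOT NULL');

$dbh->disconnect;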

Adding a column to a table works in a similar way:

  • Deploy a database change that adds the new column with a default value (or NULLable)
  • Deploy a version of the application that writes to the new column
  • optionally run some migrations that fill the column for old rows
  • optionally deploy a database change that adds constraints (like NOT NULL) that weren't possible at the start

... with the appropriate waits between the steps.

Prerequisites

If you deploy a single logical database change in several steps, you need to do maybe three or four separate deployments, instead of one big deployment that introduces both code and schema changes at once. That's only practical if the deployments are (at least mostly) automated, and if the organization offers enough continuity that you can actually finish the change process.

If the developers are constantly putting out fires, chances are they never get around to add that final, desired NOT NULL constraint, and some undiscovered bug will lead to missing information later down the road.

Tooling

Unfortunately, I know of no tooling that supports the inter-related database and application release cycle that I outlined above.

But there are tools that manage schema changes in general. For example sqitch is a rather general framework for managing database changes and rollbacks.

On the lower level, there are tools like pgdiff that compare the old and new schema, and use that to generate DDL statements that bring you from one version to the next. Such automatically generated DDLs can form the basis of the upgrade scripts that sqitch then manages.

Some ORMs also come with frameworks that promise to manage schema migrations for you. Carefully evaluate whether they allow rollbacks without losing data.

No Silver Bullet

There is no single solution that manages all your data migrations automatically for you during your deployments. You have to carefully engineer the application and database changes to decouple them a bit. This is typically more work on the application development side, but it buys you the ability to deploy and roll back without being blocked by database changes.

Tooling is available for some pieces, but typically not for the big picture. Somebody has to keep track of the application and schema versions, or automate that.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


Perlgeek.de : Automating Deployments: Distributing Debian Packages with Aptly

Once a Debian package is built, it must be distributed to the servers it is to be installed on.

Debian, as well as all other operating systems I know of, use a pull model for that. That is, the package and its meta data are stored on a server that the client can contact, and request the meta data and the package.

The sum of meta data and packages is called a repository. In order to distribute packages to the servers that need them, we must set up and maintain such a repository.

Signatures

In Debian land, packages are also signed cryptographically, to ensure packages aren't tampered with on the server or during transmission.

So the first step is to create a key pair that is used to sign this particular repository. (If you already have a PGP key for signing packages, you can skip this step).

The following assumes that you are working with a pristine system user that does not have a gnupg keyring yet, and which will be used to maintain the debian repository. It also assumes you have the gnupg package installed.

$ gpg --gen-key

This asks a bunch of questions, like your name and email address, key type and bit width, and finally a pass phrase. I left the pass phrase empty to make it easier to automate updating the repository, but that's not a requirement.

$ gpg --gen-key
gpg (GnuPG) 1.4.18; Copyright (C) 2014 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

gpg: directory `/home/aptly/.gnupg' created
gpg: new configuration file `/home/aptly/.gnupg/gpg.conf' created
gpg: WARNING: options in `/home/aptly/.gnupg/gpg.conf' are not yet active during this run
gpg: keyring `/home/aptly/.gnupg/secring.gpg' created
gpg: keyring `/home/aptly/.gnupg/pubring.gpg' created
Please select what kind of key you want:
   (1) RSA and RSA (default)
   (2) DSA and Elgamal
   (3) DSA (sign only)
   (4) RSA (sign only)
Your selection? 1
RSA keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048) 
Requested keysize is 2048 bits
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0) 
Key does not expire at all
Is this correct? (y/N) y
You need a user ID to identify your key; the software constructs the user ID
from the Real Name, Comment and Email Address in this form:
    "Heinrich Heine (Der Dichter) <heinrichh@duesseldorf.de>"

Real name: Aptly Signing Key
Email address: automatingdeployments@gmail.com
You selected this USER-ID:
    "Moritz Lenz <automatingdeployments@gmail.com>"

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O
You need a Passphrase to protect your secret key.

You don't want a passphrase - this is probably a *bad* idea!
I will do it anyway.  You can change your passphrase at any time,
using this program with the option "--edit-key".

We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
..........+++++
.......+++++

Not enough random bytes available.  Please do some other work to give
the OS a chance to collect more entropy! (Need 99 more bytes)
..+++++
gpg: /home/aptly/.gnupg/trustdb.gpg: trustdb created
gpg: key 071B4856 marked as ultimately trusted
public and secret key created and signed.

gpg: checking the trustdb
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
pub   2048R/071B4856 2016-01-10
      Key fingerprint = E80A D275 BAE1 DEDE C191  196D 078E 8ED8 071B 4856
uid                  Moritz Lenz <automatingdeployments@gmail.com>
sub   2048R/FFF787F6 2016-01-10

Near the bottom the line starting with pub contains the key ID:

pub   2048R/071B4856 2016-01-10

We'll need the public key later, so it's best to export it:

$ gpg --export --armor 071B4856 > pubkey.asc

Preparing the Repository

There are several options for managing Debian repositories. My experience with debarchiver is mixed: Once set up, it works, but it does not give immediate feedback on upload; rather it communicates the success or failure by email, which isn't very well-suited for automation.

Instead I use aptly, which works fine from the command line, and additionally supports several versions of the package in one repository.

To initialize a repo, we first have to come up with a name. Here I call it internal.

$ aptly repo create -distribution=jessie -architectures=amd64,i386,all -component=main internal

Local repo [internal] successfully added.
You can run 'aptly repo add internal ...' to add packages to repository.

$ aptly publish repo -architectures=amd64,i386,all internal
Warning: publishing from empty source, architectures list should be complete, it can't be changed after publishing (use -architectures flag)
Loading packages...
Generating metadata files and linking package files...
Finalizing metadata files...
Signing file 'Release' with gpg, please enter your passphrase when prompted:
Clearsigning file 'Release' with gpg, please enter your passphrase when prompted:

Local repo internal has been successfully published.
Please setup your webserver to serve directory '/home/aptly/.aptly/public' with autoindexing.
Now you can add following line to apt sources:
  deb http://your-server/ jessie main
Don't forget to add your GPG key to apt with apt-key.

You can also use `aptly serve` to publish your repositories over HTTP quickly.

As the message says, there needs to be a HTTP server that makes these files available. For example an Apache virtual host config for serving these files could look like this:

<VirtualHost *:80>
        ServerName apt.example.com
        ServerAdmin moritz@example.com

        DocumentRoot /home/aptly/.aptly/public/
        <Directory /home/aptly/.aptly/public/>
                Options +Indexes +FollowSymLinks

                Require all granted
        </Directory>

        # Possible values include: debug, info, notice, warn, error, crit,
        # alert, emerg.
        LogLevel notice
        CustomLog /var/log/apache2/apt/access.log combined
        ErrorLog /var/log/apache2/apt/error.log
        ServerSignature On
</VirtualHost>

After creating the logging directory (mkdir -p /var/log/apache2/apt/), enabling the virtual host (a2ensite apt.conf) and restarting Apache, the Debian repository is ready.

Adding Packages to the Repository

Now that the repository is set up, you can add a package by running

$ aptly repo add internal package-info_0.1-1_all.deb
$ aptly publish update internal

Configuring a Host to use the Repository

Copy the PGP public key with which the repository is signed (pubkey.asc) to the host which shall use the repository, and import it:

$ apt-key add pubkey.asc

Then add the actual package source:

$ echo "deb http://apt.example.com/ jessie main" > /etc/apt/source.list.d/internal

After an apt-get update, the contents of the repository are available, and an apt-cache policy package-info shows the repository as a possible source for this package:

$ apt-cache policy package-info
package-info:
  Installed: (none)
  Candidate: 0.1-1
  Version table:
 *** 0.1-1 0
        990 http://apt.example.com/ jessie/main amd64 Packages
        100 /var/lib/dpkg/status

This concludes the whirlwind tour through debian repository management and thus package distribution. Next up will be the actual package installation.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


Dave's Free Press: Journal: Devel::CheckLib can now check libraries' contents

Ocean of Awareness: Top-down parsing is guessing

Top-down parsing is guessing. Literally. Bottom-up parsing is looking.

The way you'll often hear that phrased is that top-down parsing is looking, starting at the top, and bottom-up parsing is looking, starting at the bottom. But that is misleading, because the input is at the bottom -- at the top there is nothing to look at. A usable top-down parser must have a bottom-up component, even if that component is just lookahead.

A more generous, but still accurate, way to describe the top-down component of parsers is "prediction". And prediction is, indeed, a very useful component of a parser, when used in combination with other techniques.

Of course, if a parser does nothing but predict, it can predict only one input. Top-down parsing must always be combined with a bottom-up component. This bottom-up component may be as modest as lookahead, but it must be there or else top-down parsing is really not parsing at all.

So why is top-down parsing used so much?

Top-down parsing may be unusable in its pure form, but from one point of view that is irrelevant. Top-down parsing's biggest advantage is that it is highly flexible -- there's no reason to stick to its "pure" form.

A top-down parser can be written as a series of subroutine calls -- a technique called recursive descent. Recursive descent allows you to hook in custom-written bottom-up logic at every top-down choice point, and it is a technique which is completely understandable to programmers with little or no training in parsing theory. When dealing with recursive descent parsers, it is more useful to be a seasoned, far-thinking programmer than it is to be a mathematician. This makes recursive descent very appealing to seasoned, far-thinking programmers, and they are the audience that counts.

Switching techniques

You can even use the flexibility of top-down to switch away from top-down parsing. For example, you could claim that a top-down parser could do anything my own parser (Marpa) could do, because a recursive descent parser can call a Marpa parser.

A less dramatic switchoff, and one that still leaves the parser with a good claim to be basically top-down, is very common. Arithmetic expressions are essential for a computer language. But they are also among the many things top-down parsing cannot handle, even with ordinary lookahead. Even so, most computer languages these days are parsed top-down -- by recursive descent. These recursive descent parsers deal with expressions by temporarily handing control over to a bottom-up operator precedence parser. Neither of these parsers is extremely smart about the hand-over and hand-back -- it is up to the programmer to make sure the two play together nicely. But used with caution, this approach works.

Top-down parsing and language-oriented programming

But what about taking top-down methods into the future of language-oriented programming, extensible languages, and grammars which write grammars? Here we are forced to confront the reality -- that the effectiveness of top-down parsing comes entirely from the foreign elements that are added to it. Starting from a basis of top-down parsing is literally starting with nothing. As I have shown in more detail elsewhere, top-down techniques simply do not have enough horsepower to deal with grammar-driven programming.

Perl 6 grammars are top-down -- PEG with lots of extensions. These extensions include backtracking, backtracking control, a new style of tie-breaking and lots of opportunity for the programmer to intervene and customize everything. But behind it all is a top-down parse engine.

One aspect of Perl 6 grammars might be seen as breaking out of the top-down trap. That trick of switching over to a bottom-up operator precedence parser for expressions, which I mentioned above, is built into Perl 6 and semi-automated. (I say semi-automated because making sure the two parsers "play nice" with each other is not automated -- that's still up to the programmer.)

As far as I know, this semi-automation of expression handling is new with Perl 6 grammars, and it may prove handy for duplicating what is done in recursive descent parsers. But it adds no new technique to those already in use. And features like

  • multiple types of expression, which can be told apart based on their context,
  • n-ary expressions for arbitrary n, and
  • the autogeneration of multiple rules, each allowing a different precedence scheme, for expressions of arbitrary arity and associativity,

all of which are available and in current use in Marpa, are impossible for the technology behind Perl 6 grammars.

I am a fan of the Perl 6 effort. Obviously, I have doubts about one specific set of hopes for Perl 6 grammars. But these hopes have not been central to the Perl 6 effort, and I will be an eager student of the Perl 6 team's work over the coming months.

Comments

To learn more about Marpa, there's the official web site maintained by Ron Savage. I also have a Marpa web site. Comments on this post can be made in Marpa's Google group, or on our IRC channel: #marpa at freenode.net.

Dave's Free Press: Journal: I Love Github

Dave's Free Press: Journal: Palm Treo call db module

Perlgeek.de : Automating Deployments: Installing Packages

After the long build-up of building and distributing and authenticating packages, actually installing them is easy. On the target system, run

$ apt-get update
$ apt-get install package-info

(replace package-info with the package you want to install, if that deviates from the example used previously).

If the package is of high quality, it takes care of restarting services where necessary, so no additional steps are needed afterwards.

Coordination with Ansible

If several hosts are needed to provide a service, it can be beneficial to coordinate the update, for example only updating one or two hosts at a time, or running a small integration test on each before moving on to the next.

A nice tool for doing that is Ansible, an open source IT automation system.

Ansible's starting point is an inventory file, which lists the hosts that Ansible works with, optionally in groups, and how to access them.

It is best practice to have one inventory file for each environment (production, staging, development, load testing etc.) with the same group names, so that you can deploy to a different environment simply by using a different inventory file.

Here is an example for an inventory file with two web servers and a database server:

# production
[web]
www01.yourorg.com
www02.yourorg.com

[database]
db01.yourorg.com

[all:vars]
ansible_ssh_user=root

Maybe the staging environment needs only a single web server:

# staging
[web]
www01.staging.yourorg.com

[database]
db01.staging.yourorg.com

[all:vars]
ansible_ssh_user=root

Ansible is organized in modules for separate tasks. Managing Debian packages is done with the apt module:

$ ansible -i staging web -m apt -a 'name=package-info update_cache=yes state=latest'

The -i option specifies the path to the inventory file, here staging. The next argument is the group of hosts (or a single host, if desired), and -m apt tells Ansible to use the apt module.

What comes after the -a is a module-specific command. name specifies a Debian package, update_cache=yes forces Ansible to run apt-get update before installing, and state=latest says that we want the newest available version to be installed.

If instead of the latest version we want a specific version, -a 'name=package-info=0.1 update_cache=yes state=present force=yes' is the way to go. Without force=yes, apt wouldn't downgrade the package to actually get the desired version.
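
For example, sticking with the staging inventory from above and a hypothetical version 0.1, the full ad-hoc invocation could look like this:

$ ansible -i staging web -m apt \
    -a 'name=package-info=0.1 update_cache=yes state=present force=yes'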

This uses the ad-hoc mode of Ansible. More sophisticated deployments use playbooks, of which I hope to write more later. Those also allow you to do configuration tasks such as adding repository URLs and GPG keys for package authentication.



Perlgeek.de : Automating Deployments: Building in the Pipeline

The first step of an automated deployment system is always the build. (For software that doesn't need a build step before being tested, the tests might come first, but stay with me nonetheless.)

At this point, I assume that there is already a build system in place that produces packages in the desired format, here .deb files. Here I will talk about integrating this build step into a pipeline that automatically polls a git repository for new versions, runs the build, and records the resulting .deb package as a build artifact.

A GoCD Build Pipeline

As mentioned earlier, my tool of choice for controlling the pipeline is Go Continuous Delivery. Once you have it installed and an agent configured, you can start to create a pipeline.

GoCD lets you build pipelines in its web interface, which is great for exploring the available options. But for a blog entry, it's easier to look at the resulting XML configuration, which you can also enter directly ("Admin" → "Config XML").

So without further ado, here's the first draft:


  <pipelines group="deployment">
    <pipeline name="package-info">
      <materials>
        <git url="https://github.com/moritz/package-info.git" dest="package-info" />
      </materials>
      <stage name="build" cleanWorkingDir="true">
        <jobs>
          <job name="build-deb" timeout="5">
            <tasks>
              <exec command="/bin/bash" workingdir="package-info">
                <arg>-c</arg>
                <arg>debuild -b -us -uc</arg>
              </exec>
            </tasks>
            <artifacts>
              <artifact src="package-info*_*" dest="package-info/" />
            </artifacts>
          </job>
        </jobs>
      </stage>
    </pipeline>
  </pipelines>

The outermost element is a pipeline group, which has a name. It can be used to make it easier to get an overview of available pipelines, and also to manage permissions. Not very interesting for now.

The second level is the <pipeline> with a name, and it contains a list of materials and one or more stages.

Materials

A material is anything that can trigger a pipeline, and/or provide files that commands in a pipeline can work with. Here the only material is a git repository, which GoCD happily polls for us. When it detects a new commit, it triggers the first stage in the pipeline.

Directory Layout

Each time a job within a stage is run, the go agent (think worker) which runs it prepares a directory in which it makes the materials available. On Linux, this directory defaults to /var/lib/go-agent/pipelines/$pipeline_name. Paths in the GoCD configuration are typically relative to this path.

For example the material definition above contains the attribute dest="package-info", so the absolute path to this git repository is /var/lib/go-agent/pipelines/package-info/package-info. Leaving out the dest="..." works, and gives you one less level of directory, but only works for a single material. It is a rather shaky assumption that you won't ever need a second material, so don't do that.
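
A sketch of the resulting layout, assuming the default agent location on Linux and the material definition from above:

/var/lib/go-agent/pipelines/package-info/    # root of this pipeline's working directory
└── package-info/                            # the git material, due to dest="package-info"
    ├── debian/                              # packaging files inside the checkout
    └── ...                                  # rest of the source tree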

See the config reference for a list of available material types and options. Plugins are available that add further material types.

Stages

All the stages in a pipeline run serially, and each one runs only if the previous stage succeeded. A stage has a name, which is used both in the front end, and for fetching artifacts.

In the example above, I gave the stage the attribute cleanWorkingDir="true", which makes GoCD delete files created during the previous build, and discard changes to files under version control. This tends to be a good option to use, otherwise you might unknowingly slide into a situation where a previous build affects the current build, which can be really painful to debug.

Jobs, Tasks and Artifacts

Jobs are potentially executed in parallel within a stage, and have names for the same reasons that stages do.

Inside a job there can be one or more tasks. Tasks are executed serially within a job. I tend to mostly use <exec> tasks (and <fetchartifact>, which I will cover in a later blog post), which invoke system commands. They follow the UNIX convention of treating an exit status of zero as success, and everything else as a failure.

For more complex commands, I create shell or Perl scripts inside a git repository, and add that repository as a material to the pipeline, which makes them available during the build process with no extra effort.

The <exec> task in our example invokes /bin/bash -c 'debuild -b -us -uc', which is a case of cargo cult programming, because invoking debuild directly works just as well. Ah well, I'll revise that later.

debuild -b -us -uc builds the Debian package, and is executed inside the git checkout of the source. It produces a .deb file, a .changes file and possibly a few other files with meta data. They are created one level above the git checkout, so in the root directory of the pipeline run.

These are the files that we want to work with later on, so we let GoCD store them in an internal database. That's what the <artifact> element instructs GoCD to do.

Since the name of the generated files depends on the version number of the built Debian package (which comes from the debian/changelog file in the git repo), it's not easy to reference them by name later on. That's where the dest="package-info/" comes into play: it makes GoCD store the artifacts in a directory with a fixed name. Later stages can then retrieve all artifact files from this directory by the fixed name.

The Pipeline in Action

If nothing goes wrong (and nothing ever does, right?), this is roughly what the web interface looks like after running the new pipeline:

So, whenever there is a new commit in the git repository, GoCD happily builds a Debian package and stores it for further use. Automated builds, yay!

But there is a slight snag: It recycles version numbers, which other Debian tools are very unhappy about. In the next blog post, I'll discuss a way to deal with that.



Perlgeek.de : Architecture of a Deployment System

An automated build and deployment system is structured as a pipeline.

A new commit or branch in a version control system triggers the instantiation of the pipeline, and starts executing the first of a series of stages. When a stage succeeds, it triggers the next one. If it fails, the entire pipeline instance stops.

Then manual intervention is necessary, typically by adding a new commit that fixes code or tests, or by fixing things with the environment or the pipeline configuration. A new instance of the pipeline then has a chance to succeed.

Deviations from the strict pipeline model are possible. Branches, potentially executed in parallel, allow for example running different tests in different environments, and waiting with the next step until both have completed successfully.

The typical stages are building, running the unit tests, deployment to a first test environment, running integration tests there, potentially deployment to and tests in various test environments, and finally deployment to production.

Sometimes, these stages blur a bit. For example, a typical build of Debian packages also runs the unit tests, which alleviates the need for a separate unit testing stage. Likewise if the deployment to an environment runs integration tests for each host it deploys to, there is no need for a separate integration test stage.

Typically there is a piece of software that controls the flow of the whole pipeline. It prepares the environment for a stage, runs the code associated with the stage, collects its output and artifacts (that is, files that the stage produces and that are worth keeping, like binaries or test output), determines whether the stage was successful, and then proceeds to the next.

From an architectural standpoint, it relieves the stages of having to know what stage comes next, and even how to reach the machine on which it runs. So it decouples the stages.

Anti-Pattern: Separate Builds per Environment

If you use a branch model like git flow for your source code, it is tempting to automatically deploy the develop branch to the testing environment, and then make releases, merge them into the master branch, and deploy that to the production environment.

It is tempting because it is a straight-forward extension of an existing, proven workflow.

Don't do it.

The big problem with this approach is that you don't actually test what's going to be deployed, and on the flip side, deploy something untested to production. Even if you have a staging environment before deploying to production, you are invalidating all the testing you did in the testing environment if you don't actually ship the binary or package that you tested there.

If you build "testing" and "release" packages from different sources (like different branches), the resulting binaries will differ. Even if you use the exact same source, building twice is still a bad idea, because many builds aren't reproducible. Non-deterministic compiler behavior, differences in environments and dependencies all can lead to packages that worked fine in one build, and failed in another.

It is best to avoid such potential differences and errors by deploying to production exactly the same build that you tested in the testing environment.

Differences in behavior between the environments, where they are desirable, should be implemented by configuration that is not part of the build. (It should be self-evident that the configuration should still be under version control, and also automatically deployed. There are tools that specialize in deploying configuration, like Puppet, Chef and Ansible.)



Dave's Free Press: Journal: Graphing tool

Dave's Free Press: Journal: Travelling in time: the CP2000AN

Dave's Free Press: Journal: XML::Tiny released

Dave's Free Press: Journal: YAPC::Europe 2007 report: day 1

Perlgeek.de : Automating Deployments: Pipeline Templates in GoCD

In the last few blog posts, you've seen the development of a GoCD pipeline for building a package, uploading it into a repository for a testing environment, installing it in that environment, and then repeating the upload and installation cycle for a production environment.

To recap, this is the XML config for GoCD so far:

<pipeline name="package-info">
  <materials>
    <git url="https://github.com/moritz/package-info.git" dest="package-info" materialName="package-info" />
    <git url="https://github.com/moritz/deployment-utils.git" dest="deployment-utils" materialName="deployment-utils" />
  </materials>
  <stage name="build" cleanWorkingDir="true">
    <jobs>
      <job name="build-deb" timeout="5">
        <tasks>
          <exec command="../deployment-utils/debian-autobuild" workingdir="#{package}" />
        </tasks>
        <artifacts>
          <artifact src="version" />
          <artifact src="package-info*_*" dest="package-info/" />
        </artifacts>
      </job>
    </jobs>
  </stage>
  <stage name="upload-testing">
    <jobs>
      <job name="upload-testing">
        <tasks>
          <fetchartifact pipeline="" stage="build" job="build-deb" srcdir="package-info">
            <runif status="passed" />
          </fetchartifact>
          <exec command="/bin/bash">
            <arg>-c</arg>
            <arg>deployment-utils/add-package testing jessie package-info_*.deb</arg>
          </exec>
        </tasks>
        <resources>
          <resource>aptly</resource>
        </resources>
      </job>
    </jobs>
  </stage>
  <stage name="deploy-testing">
    <jobs>
      <job name="deploy-testing">
        <tasks>
          <exec command="ansible" workingdir="deployment-utils/ansible/">
            <arg>--sudo</arg>
            <arg>--inventory-file=testing</arg>
            <arg>web</arg>
            <arg>-m</arg>
            <arg>apt</arg>
            <arg>-a</arg>
            <arg>name=package-info state=latest update_cache=yes</arg>
            <runif status="passed" />
          </exec>
        </tasks>
      </job>
    </jobs>
  </stage>
  <stage name="upload-production">
    <approval type="manual" />
    <jobs>
      <job name="upload-production">
        <tasks>
          <fetchartifact pipeline="" stage="build" job="build-deb" srcdir="package-info">
            <runif status="passed" />
          </fetchartifact>
          <exec command="/bin/bash">
            <arg>-c</arg>
            <arg>deployment-utils/add-package production jessie package-info_*.deb</arg>
          </exec>
        </tasks>
        <resources>
          <resource>aptly</resource>
        </resources>
      </job>
    </jobs>
  </stage>
  <stage name="deploy-production">
    <jobs>
      <job name="deploy-production">
        <tasks>
          <exec command="ansible" workingdir="deployment-utils/ansible/">
            <arg>--sudo</arg>
            <arg>--inventory-file=production</arg>
            <arg>web</arg>
            <arg>-m</arg>
            <arg>apt</arg>
            <arg>-a</arg>
            <arg>name=package-info state=latest update_cache=yes</arg>
            <runif status="passed" />
          </exec>
        </tasks>
      </job>
    </jobs>
  </stage>
</pipeline>

The interesting thing here is that the pipeline isn't very specific to this project. Apart from the package name, the Debian distribution and the group of hosts to which to deploy, everything in here can be reused for any software that's Debian packaged.

To make the pipeline more generic, we can define parameters, or params for short:

  <params>
    <param name="distribution">jessie</param>
    <param name="package">package-info</param>
    <param name="target">web</param>
  </params>

And then replace all the occurrences of package-info inside the stage definitions with #{package}, and so on:

  <stage name="build" cleanWorkingDir="true">
    <jobs>
      <job name="build-deb" timeout="5">
        <tasks>
          <exec command="../deployment-utils/debian-autobuild" workingdir="#{package}" />
        </tasks>
        <artifacts>
          <artifact src="version" />
          <artifact src="#{package}*_*" dest="#{package}/" />
        </artifacts>
      </job>
    </jobs>
  </stage>
  <stage name="upload-testing">
    <jobs>
      <job name="upload-testing">
        <tasks>
          <fetchartifact pipeline="" stage="build" job="build-deb" srcdir="#{package}">
            <runif status="passed" />
          </fetchartifact>
          <exec command="/bin/bash">
            <arg>-c</arg>
            <arg>deployment-utils/add-package testing #{distribution} #{package}_*.deb</arg>
          </exec>
        </tasks>
        <resources>
          <resource>aptly</resource>
        </resources>
      </job>
    </jobs>
  </stage>
  <stage name="deploy-testing">
    <jobs>
      <job name="deploy-testing">
        <tasks>
          <exec command="ansible" workingdir="deployment-utils/ansible/">
            <arg>--sudo</arg>
            <arg>--inventory-file=testing</arg>
            <arg>#{target}</arg>
            <arg>-m</arg>
            <arg>apt</arg>
            <arg>-a</arg>
            <arg>name=#{package} state=latest update_cache=yes</arg>
            <runif status="passed" />
          </exec>
        </tasks>
      </job>
    </jobs>
  </stage>
  <stage name="upload-production">
    <approval type="manual" />
    <jobs>
      <job name="upload-production">
        <tasks>
          <fetchartifact pipeline="" stage="build" job="build-deb" srcdir="#{package}">
            <runif status="passed" />
          </fetchartifact>
          <exec command="/bin/bash">
            <arg>-c</arg>
            <arg>deployment-utils/add-package production #{distribution} #{package}_*.deb</arg>
          </exec>
        </tasks>
        <resources>
          <resource>aptly</resource>
        </resources>
      </job>
    </jobs>
  </stage>
  <stage name="deploy-production">
    <jobs>
      <job name="deploy-production">
        <tasks>
          <exec command="ansible" workingdir="deployment-utils/ansible/">
            <arg>--sudo</arg>
            <arg>--inventory-file=production</arg>
            <arg>#{target}</arg>
            <arg>-m</arg>
            <arg>apt</arg>
            <arg>-a</arg>
            <arg>name=#{package} state=latest update_cache=yes</arg>
            <runif status="passed" />
          </exec>
        </tasks>
      </job>
    </jobs>
  </stage>

The next step towards generalization is to move the stages to a template. This can either be done again by editing the XML config, or in the web frontend with Admin → Pipelines and then clicking the Extract Template link next to the pipeline called package-info.

Either way, the result in the XML looks like this:

<pipelines group="deployment">
  <pipeline name="package-info" template="debian-base">
    <params>
      <param name="distribution">jessie</param>
      <param name="package">package-info</param>
      <param name="target">web</param>
    </params>
    <materials>
      <git url="https://github.com/moritz/package-info.git" dest="package-info" materialName="package-info" />
      <git url="https://github.com/moritz/deployment-utils.git" dest="deployment-utils" materialName="deployment-utils" />
    </materials>
  </pipeline>
</pipelines>
<templates>
  <pipeline name="debian-base">
      <!-- stages definitions go here -->
  </pipeline>
</templates>

Everything that's specific to this one piece of software is now in the pipeline definition, and the reusable parts are in the template, with the sole exception of the deployment-utils repo, which must be added to each pipeline for software that is being automatically deployed, since GoCD has no way to move a material to a template.

Adding a deployment pipeline for another piece of software is now just a matter of specifying the URL, package name, target (that is, name of a group in the Ansible inventory file) and distribution. So about a minute of work once you're used to the tooling.
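
As an illustration, a deployment pipeline for a second, hypothetical package called other-tool (both the name and the git URL are made up here) would reuse the template like this:

  <pipeline name="other-tool" template="debian-base">
    <params>
      <param name="distribution">jessie</param>
      <param name="package">other-tool</param>
      <param name="target">web</param>
    </params>
    <materials>
      <git url="https://github.com/example/other-tool.git" dest="other-tool" materialName="other-tool" />
      <git url="https://github.com/moritz/deployment-utils.git" dest="deployment-utils" materialName="deployment-utils" />
    </materials>
  </pipeline>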



Dave's Free Press: Journal: Thanks, Yahoo!

Dave's Free Press: Journal: YAPC::Europe 2007 report: day 2

Ocean of Awareness: What are the reasonable computer languages?

"You see things; and you say 'Why?' But I dream things that never were; and I say 'Why not?'" -- George Bernard Shaw

In the 1960's and 1970's computer languages were evolving rapidly. It was not clear which way they were headed. Would most programming be done with general-purpose languages? Or would programmers create a language for every task domain? Or even for every project? And, if lots of languages were going to be created, what kinds of languages would be needed?

It was in that context that Čulik and Cohen, in a 1973 paper, outlined what they thought programmers would want and should have. In keeping with the spirit of the time, it was quite a lot:

  • Programmers would want to extend their grammars with new syntax, including new kinds of expressions.
  • Programmers would also want to use tools that automatically generated new syntax.
  • Programmers would not want to, and especially in the case of auto-generated syntax would usually not be able to, massage the syntax into very restricted forms. Instead, programmers would create grammars and languages which required unlimited lookahead to disambiguate, and they would require parsers which could handle these grammars.
  • Finally, programmers would need to be able to rely on all of this parsing being done in linear time.

Today, we think we know that Čulik and Cohen's vision was naive, because we think we know that parsing technology cannot support it. We think we know that parsing is much harder than they thought.

The eyeball grammars

As a thought problem, consider the "eyeball" class of grammars. The "eyeball" class of grammars contains all the grammars that a human can parse at a glance. If a grammar is in the eyeball class, but a computer cannot parse it, it presents an interesting choice. Either,

  • your computer is not using the strongest practical algorithm; or
  • your mind is using some power which cannot be reduced to a machine computation.

There are some people out there (I am one of them) who don't believe that everything the mind can do reduces to a machine computation. But even those people will tend to go for the choice in this case: There must be some practical computer parsing algorithm which can do at least as well at parsing as a human can do by "eyeball". In other words, the class of "reasonable grammars" should contain the eyeball class.

Čulik and Cohen's candidate for the class of "reasonable grammars" were the grammars that a deterministic parse engine could parse if it had a lookahead that was infinite, but restricted to distinguishing between regular expressions. They called these the LR-regular, or LRR, grammars. And the LRR grammars do in fact seem to be a good first approximation to the eyeball class. They do not allow lookahead that contains things that you have to count, like palindromes. And, while I'd be hard put to eyeball every possible string for every possible regular expression, intuitively the concept of scanning for a regular expression does seem close to capturing the idea of glancing through a text looking for a telltale pattern.

So what happened?

Alas, the algorithm in the Čulik and Cohen paper turned out to be impractical. But in 1991, Joop Leo discovered a way to adapt Earley's algorithm to parse the LRR grammars in linear time, without doing the lookahead. And Leo's algorithm does have a practical implementation: Marpa.

References, comments, etc.

To learn more about Marpa, there's the official web site maintained by Ron Savage. I also have a Marpa web site. Comments on this post can be made in Marpa's Google group, or on our IRC channel: #marpa at freenode.net.

Perlgeek.de : Technology for automating deployments: the agony of choice

As an interlude I'd like to look at alternative technology stacks that you could use in your deployment project. I'm using a certain stack because it makes sense in the context of the bigger environment.

This is a mostly Debian-based infrastructure with its own operations team, and software written in various (mostly dynamic) programming languages.

If your organization writes only Java code (or code in programming languages that are based on the JVM), and your operations folks are used to that, it might be a good idea to ship .jar files instead. Then you need a tool that can deploy them, and a repository to store them.

Package format

I'm a big fan of operating system packages, for three reasons: The operators are familiar with them, they are language agnostic, and configuration management software typically supports them out of the box.

If you develop applications in several different programming languages, say perl, python and ruby, it doesn't make sense to build a deployment pipeline around three different software stacks and educate everybody involved about how to use and debug each of the language-specific package managers. It is much more economical to have the maintainer of each application build a system package, and then use one toolchain to deploy that.

That doesn't necessarily imply building a system package for each upstream package. Fat-packaging is a valid way to avoid an explosion of packaging tasks, and also to avoid clashes when dependencies on conflicting versions of the same package exist. dh-virtualenv works well for packaging Python software and all its Python dependencies into a single Debian package; only the Python interpreter itself needs to be installed on the target machine.

If you need to deploy to multiple operating system families and want to build only one package, nix is an interesting approach, with the additional benefit of allowing parallel installation of several versions of the same package. That can be useful for running two versions in parallel, and only switching over to the new one for good when you're convinced that there are no regressions.

Repository

The choice of package format dictates the repository format. Debian packages are stored in a different structure than Pypi packages, for example. For each repository format there is tooling available to help you create and update the repository.

For Debian, aptly is my go-to solution for repository management. Reprepro seems to be a decent alternative.

Pulp is a rather general and scalable repository management software that was originally written for RPM packages, but now also supports Debian packages, Python (pypi) packages and more. Compared to the other solutions mentioned so far (which are just command line programs that you run when you need something, with the file system as storage), it comes with some administrative overhead, because there's at least a MongoDB database and a RabbitMQ message broker required to run it. But when you need such a solution, it's worth it.

A smaller repository management tool for Python is pip2pi. In its simplest form you just copy a few .tar.gz files into a directory, run dir2pi . in that directory, and make it accessible through a web server.
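
That workflow is just a few commands (the directory and tarball names are only examples):

$ mkdir -p ~/pypi-repo
$ cp Some-Package-1.0.tar.gz ~/pypi-repo/
$ cd ~/pypi-repo && dir2pi .
$ # then serve ~/pypi-repo with any web server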

CPAN-Mini is a good and simple tool for providing a CPAN mirror, and CPAN-Mini-Inject lets you add your own Perl modules, in the simplest case through the mcpani script.

Installation

Installing a package and its dependencies often looks easy on the surface, something like apt-get update && apt-get install $package. But that is deceptive, because many installers are interactive by nature, require special flags to force installation of an older version, or have other potential pitfalls.
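
For illustration, a non-interactive installation of a specific (possibly older) version with plain apt might look like this; the package name and version are just examples, and the exact downgrade flag depends on the apt version:

$ apt-get update
$ DEBIAN_FRONTEND=noninteractive apt-get install -y --force-yes package-info=0.1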

Ansible provides modules for installing .deb packages, Python modules, Perl modules, RPMs through yum, Nix packages and many others. It also requires little up-front configuration on the destination system and is very friendly for beginners, but still offers enough power for more complex deployment tasks. It can also handle configuration management.

An alternative is Rex, with which I have no practical experience.

Not all configuration management systems are good fits for managing deployments. For example Puppet doesn't seem to have a good way to provide an order for package upgrades ("first update the backend on servers bck01 and bck02, and then frontend on www01, and the rest of the backend servers").



Perlgeek.de : Automating Deployments: New Website, Community

No IT project would be complete without its own website, so my project of writing a book on automated deployments now has one too. Please visit deploybook.com and tell me what you think about it. The plan is to gradually add some content, and maybe also a bit of color. When the book finally comes out (don't hold your breath here, it'll take some more months), it'll also be for sale there.

Quite a few readers have emailed me, sharing their own thoughts on the topic of building and deploying software, and often asking for feedback. There seems to be a need for a place to share these things, so I created one. The deployment community is a Discourse forum dedicated to discussing such things. I expect to see a low volume of posts, but valuable ones to those involved.

To conclude the news roundup for this week, I'd like to mention that next week I'll be giving a talk on continuous delivery at the German Perl Workshop 2016, where I'm also one of the organizers. After the workshop is over, slides will be available online, and I'll likely have more time again for blogging. So stay tuned!



Dave's Free Press: Journal: YAPC::Europe 2007 travel plans

Dave's Free Press: Journal: Wikipedia handheld proxy

Perlgeek.de : Automating Deployments: Stage 2: Uploading

Once you have the pipeline for building a package, it's time to distribute the freshly built package to the machines it's going to be installed on.

I've previously explained the nuts and bolts of getting a Debian package into a repository managed by aptly, so it's time to automate that.

Some Assumptions

We are going to need a separate repository for each environment we want to deploy to (or maybe group of environments; it might be OK and even desirable to share a repository between various testing environments that can be used in parallel, for example for security, performance and functional testing).

At some point in the future, when a new version of the operating system is released, we'll also need to build packages for another major version, so for example for Debian stretch instead of jessie. So it's best to plan for that case. Based on these assumptions, the path to each repository will be $HOME/aptly/$environment/$distribution.
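
With the environments and distribution used in this series, and assuming the repositories live under the go user's home directory /var/go (as the web server config later in this post suggests), that scheme yields paths like:

/var/go/aptly/testing/jessie
/var/go/aptly/production/jessie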

For the sake of simplicity, I'm going to assume a single host on which both testing and production repositories will be hosted, in separate directories. If you need those repos on separate servers, it's easy to reverse that decision (or make a different one in the first place).

To ease the transportation and management of the repository, a GoCD agent should be running on the repo server. It can copy the packages from the GoCD server's artifact repository with built-in commands.

Scripting the Repository Management

It would be possible to manually initialize each repository, and only automate the process of adding a package. But since it's not hard to do, taking the opposite route of creating them automatically on the fly is more reliable. The next time you need a new environment or need to support a new distribution you will benefit from this decision.

So here is a small Perl program that, given an environment, distribution and a package file name, creates the aptly repo if it doesn't exist yet, writes the config file for the repo, and adds the package.

#!/usr/bin/perl
use strict;
use warnings;
use 5.014;
use JSON qw(encode_json);
use File::Path qw(mkpath);
use autodie;

unless ( @ARGV == 3) {
    die "Usage: $0 <environment> <distribution> <.deb file>\n";
}
my ( $env, $distribution, $package ) = @ARGV;

my $base_path   = "$ENV{HOME}/aptly";
my $repo_path   = "$base_path/$env/$distribution";
my $config_file = "$base_path/$env-$distribution.conf";
my @aptly_cmd   = ("aptly", "-config=$config_file");

init_config();
init_repo();
add_package();


sub init_config {
    mkpath $base_path;
    open my $CONF, '>:encoding(UTF-8)', $config_file;
    say $CONF encode_json({
        rootDir       => $repo_path,
        architectures => [qw( i386 amd64 all )],
    });
    close $CONF;
}

sub init_repo {
    return if -d "$repo_path/db";
    mkpath $repo_path;
    system @aptly_cmd, "repo", "create", "-distribution=$distribution", "myrepo";
    system @aptly_cmd, "publish", "repo", "myrepo";
}

sub add_package {
    system @aptly_cmd,  "repo", "add", "myrepo", $package;
    system @aptly_cmd,  "publish", "update", $distribution;
}

As always, I've developed and tested this script interactively, and only started to plug it into the automated pipeline once I was confident that it did what I wanted.
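
For reference, a manual invocation of the script (here called add-package, as in the pipeline config below; the .deb filename is just an example) looks like this:

$ ./add-package testing jessie package-info_0.1-0.7.1_all.deb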

And like all software, it's meant to be under version control, so it's now part of the deployment-utils git repo.

More Preparations: GPG Key

Before GoCD can upload the Debian packages into a repository, the go agent needs to have a GPG key that's not protected by a password. You can either log into the go system user account and create it there with gpg --gen-key, or copy an existing .gnupg directory over to ~go (don't forget to adjust the ownership of the directory and the files in there).
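
In shell terms, the second variant boils down to something like this, assuming the agent runs as user go with home directory /var/go:

$ sudo cp -r ~/.gnupg /var/go/
$ sudo chown -R go:go /var/go/.gnupg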

Integrating the Upload into the Pipeline

The first stage of the pipeline builds the Debian package, and records the resulting file as an artifact. The upload step needs to retrieve this artifact with a fetchartifact task. This is the config for the second stage, to be inserted directly after the first one:

  <stage name="upload-testing">
    <jobs>
      <job name="upload-testing">
        <tasks>
          <fetchartifact pipeline="" stage="build" job="build-deb" srcdir="package-info">
            <runif status="passed" />
          </fetchartifact>
          <exec command="/bin/bash">
            <arg>-c</arg>
            <arg>deployment-utils/add-package testing jessie package-info_*.deb</arg>
          </exec>
        </tasks>
        <resources>
          <resource>aptly</resource>
        </resources>
      </job>
    </jobs>
  </stage>

Note that testing here refers to the name of the environment (which you can chose freely, as long as you are consistent), not the testing distribution of the Debian project.

There is an aptly resource, which you must assign to the agent running on the repo server. If you want separate servers for testing and production repositories, you'd come up with a more specific resource name here (for example aptly-testing) and a separate one for the production repository.

Make the Repository Available through HTTP

To make the repository reachable from other servers, it needs to be exposed to the network. The most convenient way is over HTTP. Since only static files need to be served (and a directory index), pretty much any web server will do.

An example config for lighttpd:

dir-listing.encoding = "utf-8"
server.dir-listing   = "enable"
alias.url = ( 
    "/debian/testing/jessie/"    => "/var/go/aptly/testing/jessie/public/",
    "/debian/production/jessie/" => "/var/go/aptly/production/jessie/public/",
    # more repos here
)

And for the Apache HTTP server, once you've configured a virtual host:

Options +Indexes
Alias /debian/testing/jessie/     /var/go/aptly/testing/jessie/public/
Alias /debian/production/jessie/  /var/go/aptly/production/jessie/public/
# more repos here

Achievement Unlocked: Automatic Build and Distribution

With these steps done, automatic building and uploading of packages is in place. Since client machines can pull from that repository at will, we can tick off the distribution of packages to the client machines.
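
On the client machines, the published repository then shows up as an ordinary apt source. A sketch of such an entry (the hostname is made up, and main is aptly's default component):

deb http://apt.example.com/debian/testing/jessie/ jessie main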



Dave's Free Press: Journal: Bryar security hole

Dave's Free Press: Journal: POD includes

Dave's Free Press: Journal: cgit syntax highlighting

Perlgeek.de : Automating Deployments: Installation in the Pipeline

As mentioned before (perlgeek.de/blog-en/automating-deployments/2016-007-installing-packages.html), my tool of choice for automating package installation is ansible (https://deploybook.com/resources).

The first step is to create an inventory file for ansible. In a real deployment setting, this would contain the hostnames to deploy to. For the sake of this project I just have a test setup consisting of virtual machines managed by vagrant, which leads to a somewhat unusual ansible configuration.

That's the ansible.cfg:

[defaults]
remote_user = vagrant
host_key_checking = False

And the inventory file called testing for the testing environment:

[web]
testserver ansible_ssh_host=127.0.0.1 ansible_ssh_port=2345 

(The host is localhost here, because I run a vagrant setup to test the pipeline; in a real setting, it would just be the hostname of your test machine.)

All code and configuration goes into version control; I created an ansible directory in the deployment-utils repo and dumped the files there.

Finally I copied the ssh private key (from vagrant ssh-config) to /var/go/.ssh/id_rsa, adjusted the owner to user go, and was ready to go.
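
In shell terms, that key setup looks roughly like this (the identity file path is whatever vagrant ssh-config reports):

$ vagrant ssh-config | grep IdentityFile
$ sudo cp /path/to/private_key /var/go/.ssh/id_rsa
$ sudo chown go:go /var/go/.ssh/id_rsa
$ sudo chmod 600 /var/go/.ssh/id_rsa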

Plugging it into GoCD

Automatically installing a newly built package through GoCD in the testing environment is just another stage away:

  <stage name="deploy-testing">
    <jobs>
      <job name="deploy-testing">
        <tasks>
          <exec command="ansible" workingdir="deployment-utils/ansible/">
            <arg>--sudo</arg>
            <arg>--inventory-file=testing</arg>
            <arg>web</arg>
            <arg>-m</arg>
            <arg>apt</arg>
            <arg>-a</arg>
            <arg>name=package-info state=latest update_cache=yes</arg>
            <runif status="passed" />
          </exec>
        </tasks>
      </job>
    </jobs>
  </stage>

The central part is an invocation of ansible in the newly created directory of the deployment-utils repository.

Results

To run the new stage, either trigger a complete run of the pipeline by hitting the "play" triangle in the pipeline overview in the web frontend, or manually trigger just that one stage in the pipeline history view.

You can log in on the target machine to check if the package was successfully installed:

vagrant@debian-jessie:~$ dpkg -l package-info
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-=================================
ii  package-info   0.1-0.7.1    all          Web service for getting a list of

and verify that the service is running:

vagrant@debian-jessie:~$ systemctl status package-info
● package-info.service - Package installation information via http
   Loaded: loaded (/lib/systemd/system/package-info.service; static)
   Active: active (running) since Sun 2016-03-27 13:15:41 GMT; 4h 6min ago
  Process: 4439 ExecStop=/usr/bin/hypnotoad -s /usr/lib/package-info/package-info (code=exited, status=0/SUCCESS)
 Main PID: 4442 (/usr/lib/packag)
   CGroup: /system.slice/package-info.service
           ├─4442 /usr/lib/package-info/package-info
           ├─4445 /usr/lib/package-info/package-info
           ├─4446 /usr/lib/package-info/package-info
           ├─4447 /usr/lib/package-info/package-info
           └─4448 /usr/lib/package-info/package-info

and check that it responds on port 8080, as it's supposed to:

    vagrant@debian-jessie:~$ curl http://127.0.0.1:8080/|head -n 7
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
      0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0Desired=Unknown/Install/Remove/Purge/Hold
    | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
    |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
    ||/ Name                           Version                     Architecture Description
    +++-==============================-===========================-============-===============================================================================
    ii  acl                            2.2.52-2                    amd64        Access control list utilities
    ii  acpi                           1.7-1                       amd64        displays information on ACPI devices
    curl: (23) Failed writing body (2877 != 16384)

The last line is simply curl complaining that it can't write the full output, due to the pipe to head exiting too early to receive all the contents. We can safely ignore that.

Going All the Way to Production

Uploading and deploying to production works the same as with the testing environment. So all that's needed is to duplicate the configuration of the last two stages, replace every occurrence of testing with production, and add a manual approval button, so that production deployment remains a conscious decision:

  <stage name="upload-production">
    <approval type="manual" />
    <jobs>
      <job name="upload-production">
        <tasks>
          <fetchartifact pipeline="" stage="build" job="build-deb" srcdir="package-info">
            <runif status="passed" />
          </fetchartifact>
          <exec command="/bin/bash">
            <arg>-c</arg>
            <arg>deployment-utils/add-package production jessie package-info_*.deb</arg>
          </exec>
        </tasks>
        <resources>
          <resource>aptly</resource>
        </resources>
      </job>
    </jobs>
  </stage>
  <stage name="deploy-production">
    <jobs>
      <job name="deploy-production">
        <tasks>
          <exec command="ansible" workingdir="deployment-utils/ansible/">
            <arg>--sudo</arg>
            <arg>--inventory-file=production</arg>
            <arg>web</arg>
            <arg>-m</arg>
            <arg>apt</arg>
            <arg>-a</arg>
            <arg>name=package-info state=latest update_cache=yes</arg>
            <runif status="passed" />
          </exec>
        </tasks>
      </job>
    </jobs>
  </stage>

The only real news here is the second line:

    <approval type="manual" />

which makes GoCD only proceed to this stage when somebody clicks the approval arrow in the web interface.

You also need to fill out the inventory file called production with the list of your server or servers.

Achievement Unlocked: Basic Continuous Delivery

Let's recap: the pipeline

  • is triggered automatically from commits in the source code
  • automatically builds a Debian package from each commit
  • uploads it to a repository for the testing environment
  • automatically installs it in the testing environment
  • upon manual approval, uploads it to a repository for the production environment
  • ... and automatically installs the new version in production.

So the basic framework for Continuous Delivery is in place.

Wow, that escalated quickly.

Missing Pieces

Of course, there's lots to be done before we can call this a fully-fledged Continuous Delivery pipeline:

  • Automatic testing
  • Generalization to other software
  • version pinning (always installing the correct version, not the newest one).
  • Rollbacks
  • Data migration

But even as is, the pipeline can provide quite some time savings and shortened feedback cycles. The manual approval before production deployment is a good hook for manual tasks, such as manual tests.



Ocean of Awareness: Grammar reuse

Every year the Perl 6 community creates an "Advent" series of posts. I always follow these, but one in particular caught my attention this year. It presents a vision of a future where programming is language-driven. A vision that I share. The post went on to encourage its readers to follow up on this vision, and suggested an approach. But I do not think the particular approach suggested would be fruitful. In this post I'll explain why.

Reuse

The focus of the Advent post was language-driven programming, and that is the aspect that excites me most. But the points that I wish to make are more easily understood if I root them in a narrower, but more familiar issue -- grammar reuse.

Most programmers will be very familiar with grammar reuse from regular expressions. In the regular expression ("RE") world, programming by cutting and pasting is very practical and often practiced.

For this post I will consider grammar reusability to be the ability to join two grammars and create a third. This is also sometimes called grammar composition. For this purpose, I will widen the term "grammar" to include RE's and PEG parser specifications. Ideally, when you compose two grammars, what you get is

  • a language you can reasonably predict, and
  • if each of the two original grammars can be parsed in reasonable time, a language that can be parsed in reasonable time.

Not all language representations are reusable. RE's are, and BNF is. PEG looks like a combination of BNF and RE's, but PEG, in fact, is its own very special form of parser specification. And PEG parser specifications are one of the least reusable language representations ever invented.

Reuse and regular expressions

RE's are as well-behaved under reuse as a language representation can get. The combination of two RE's is always another RE, and you can reasonably determine what language the combined RE recognizes by examining it. Further, every RE is parseable in linear time.

The one downside, often mentioned by critics, is that RE's do not scale in terms of readability. Here, however, the problem is not really one of reusability. The problem is that RE's are quite limited in their capabilities, and programmers often exploit the excellent behavior of RE's under reuse to push them into applications for which RE's just do not have the power.

Reuse and PEG

When programmers first look at PEG syntax, they often think they've encountered paradise. They see both BNF and RE's, and imagine they'll have the best of each. But the convenient behavior of RE's depends on their unambiguity. You simply cannot write an ambiguous RE -- it's impossible.

More powerful and more flexible, BNF allows you to describe many more grammars -- including ambiguous ones. How does PEG resolve this? With a Gordian knot approach. Whenever it encounters an ambiguity, it throws all but one of the choices away. The author of the PEG specification gets some control over what is thrown away -- he specifies an order of preference for the choices. But degree of control is less than it seems, and in practice PEG is the nitroglycerin of parsing -- marvelous when it works, but tricky and dangerous.

Consider these 3 PEG specifications:

	("a"|"aa")"a"
	("aa"|"a")"a"
	A = "a"A"a"/"aa"

All three clearly accept only strings which are repetitions of the letter "a". But which strings? For the answers, suggestions for dealing with PEG if you are committed to it, and more, look at my previous post on PEG.

When getting an RE or a BNF grammar to work, you can go back to the grammar and ask yourself "Does my grammar look like my intended language?". With PEG, this is not really possible. With practice, you might get used to figuring out single line PEG specs like the first two above. (If you can get the last one, you're amazing.) But tracing these through multiple rule layers required by useful grammars is, in practice, not really possible.

In real life, PEG specifications are written by hacking them until the test suite works. And, once you get a PEG specification to pass the test suite for a practical-sized grammar, you are very happy to leave it alone. Trying to compose two PEG specifications is rolling the dice with the odds against you.

Reuse and the native Perl 6 parser

The native Perl 6 parser is an extended PEG parser. The extensions are very interesting from the PEG point of view. The PEG "tie breaking" has been changed, and backtracking can be used. These features mean the Perl 6 parser can be extended to languages well beyond what ordinary PEG parsers can handle. But, if you use the extra features, reuse will be even trickier than if you stuck with vanilla PEG.

Reuse and general BNF parsing

As mentioned, general BNF is reusable, and so general BNF parsers like Marpa are as reusable as regular expressions, with two caveats. First, if the two grammars are not doing their own lexing, their lexers will have to be compatible.

Second, with regular expressions you had the advantage that every regular expression parses in linear time, so that speed was guaranteed to be acceptable. Marpa users reuse grammars and pieces of grammars all the time. The result is always the language specified by the merged BNF, and I've never heard anyone complain that performance deteriorated.

But, while it may not happen often, it is possible to combine two Marpa grammars that run in linear time and end up with one that does not. You can guarantee your merged Marpa grammar will stay linear if you follow 2 rules:

  • keep the grammar unambiguous;
  • don't use an unmarked middle recursion.

Unmarked middle recursions are not things you're likely to need a lot: they are those palindromes where you have to count to find the middle: grammars like "A ::= a | a A a". If you use a middle recursion at all, it is almost certainly going to be marked, like "A ::= b | a A a", which generates strings like "aabaa". With Marpa, as with RE's, reuse is easy and practical. And, as I hope to show in a future post, unlike RE's, Marpa opens the road to language-driven programming.

Perl 6

I'm a fan of the Perl 6 effort. I certainly should be a supporter, after the many favors they've done for me and the Marpa community over the years. The considerations of this post will disappoint some of the hopes for applications of the native Perl 6 parser. But these applications have not been central to the Perl 6 effort, of which I will be an eager student over the coming months.

Comments

To learn more about Marpa, there's the official web site maintained by Ron Savage. I also have a Marpa web site. Comments on this post can be made in Marpa's Google group, or on our IRC channel: #marpa at freenode.net.

Perlgeek.de : Automatically Deploying Specific Versions

Versions. Always talking about versions. So, more talk about versions.

The installation pipeline from a previous installment always installs the newest version available. In a normal, simple, linear development flow, this is fine, and even in other workflows, it's a good first step.

But we really want the pipeline to deploy the exact version that was built inside the same instance of the pipeline. The obvious benefit is that it allows you to rerun older versions of the pipeline to install older versions, effectively giving you a rollback.

Or you can build a second pipeline for hotfixes, based on the same git repository but a different branch, and when you do want a hotfix, you simply pause the regular pipeline, and trigger the hotfix pipeline. In this scenario, if you always installed the newest version, finding a proper version string for the hotfix is nearly impossible, because it needs to be higher than the currently installed one, but also lower than the next regular build. Oh, and all of that automatically please.

A less obvious benefit of installing a very specific version is that it detects errors in the package source configuration of the target machines. If the deployment script just installs the newest version that's available, and through an error the repository isn't configured on the target machine, the installation process becomes a silent no-op if an older version of the package is already installed.

Implementation

There are two things to do: figure out which version of the package to install, and then do it.

The latter step is fairly easy, because the ansible "apt" module that I use for installation supports specifying a version, and the documentation even has an example:

# Install the version '1.00' of package "foo"
- apt: name=foo=1.00 state=present

Experimenting with this feature shows that if this is a downgrade, you also need to add force=yes.
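
For illustration, here is what that could look like as an ad-hoc ansible invocation, in the same style as the deployment command used later in this post. The package name foo, the inventory file testing and the host group web are just placeholders:

# Sketch: pin package "foo" to version 1.00 on the "web" hosts;
# force=yes allows downgrades, update_cache=yes refreshes the package lists first
ansible --sudo --inventory-file=testing web -m apt \
    -a "name=foo=1.00 state=present update_cache=yes force=yes"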

Knowing which version number to install also has a simple, though maybe not obvious, solution: write the version number to a file, collect this file as an artifact in GoCD, and then, when it's time to install, fetch the artifact and read the version number from it.

When I last talked about the build step, I silently introduced configuration that collects the version file that the debian-autobuild script writes:

  <job name="build-deb" timeout="5">
    <tasks>
      <exec command="../deployment-utils/debian-autobuild" workingdir="#{package}" />
    </tasks>
    <artifacts>
      <artifact src="version" />
      <artifact src="package-info*_*" dest="package-info/" />
    </artifacts>
  </job>

So only the actual installation step needs adjusting. This is what the configuration looked like:

  <job name="deploy-testing">
    <tasks>
      <exec command="ansible" workingdir="deployment-utils/ansible/">
        <arg>--sudo</arg>
        <arg>--inventory-file=testing</arg>
        <arg>web</arg>
        <arg>-m</arg>
        <arg>apt</arg>
        <arg>-a</arg>
        <arg>name=package-info state=latest update_cache=yes</arg>
        <runif status="passed" />
      </exec>
    </tasks>
  </job>

So, first fetch the version file:

      <job name="deploy-testing">
        <tasks>
          <fetchartifact pipeline="" stage="build" job="build-deb" srcfile="version" />
          ...

Then, how to get the version from the file to ansible? One could either use ansible's lookup('file', path) function, or write a small script. I decided on the latter, since I was originally more familiar with bash's capabilities than with ansible's, and it's only a one-liner anyway:

          ...
          <exec command="/bin/bash" workingdir="deployment-utils/ansible/">
            <arg>-c</arg>
            <arg>ansible --sudo --inventory-file=testing #{target} -m apt -a "name=#{package}=$(&lt; ../../version) state=present update_cache=yes force=yes"</arg>
          </exec>
        </tasks>
      </job>

Bash's $(...) runs a sub-process (which again is a bash instance), and inserts the output from that sub-process into the command line. < ../../version is a short way of reading the file's contents. And, this being XML, the less-than sign needs to be escaped.
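
A quick sketch of the same idiom outside the pipeline, assuming a file named version in the current directory:

# Both lines read the file's contents into the variable;
# $(< version) does so without spawning an external "cat" process
version=$(cat version)
version=$(< version)
echo "installing package-info=$version"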

The production deployment configuration looks pretty much the same, just with --inventory-file=production.

Try it!

To test the version-specific package installation, you need to have at least two runs of the pipeline that captured the version artifact. If you don't have that yet, you can push commits to the source repository, and GoCD picks them up automatically.

You can query the installed version on the target machine with dpkg -l package-info. After the last run, the version built in that pipeline instance should be installed.
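
If you only want the version field, dpkg-query (which is part of dpkg) can print it directly; for example:

# Full listing, as mentioned above:
dpkg -l package-info
# Just the installed version string:
dpkg-query --show --showformat='${Version}\n' package-info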

Then you can rerun the deployment stage from a previous pipeline run, for example from the pipeline's history view: hover over the stage with the mouse, then click the circle with the arrow on it to trigger the rerun.

After the stage rerun has completed, checking the installed version again should yield the version built in the pipeline instance that you selected.

Conclusions

Once you know how, setting up your pipeline to deploy exactly the version that was built in the same pipeline instance is fairly easy to implement.

Once you've done that, you can easily deploy older versions of your software as a rollback, and use the same mechanism to automatically build and deploy hotfixes.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


Dave's Free Press: Journal: CPAN Testers' CPAN author FAQ

Perlgeek.de : Automating Deployments: Version Recycling Considered Harmful

In the previous installment we saw a GoCD configuration that automatically built a Debian package from a git repository whenever somebody pushes a new commit to the git repo.

The version of the generated Debian package comes from the debian/changelog file of the git repository. Which means that whenever somebody pushes code or doc changes without a new changelog entry, the resulting Debian package has the same version number as the previous one.

The problem with this version recycling is that most Debian tooling assumes that the tuple of package name, version and architecture uniquely identifies a revision of a package. So stuffing a new version of a package with an old version number into a repository is bound to cause trouble; most repository management software simply refuses to accept it. On the target machine, upgrading the package won't do anything if the version number stays the same.

So, it's a good idea to put a bit more thought into the version string of the automatically built Debian package.

Constructing Unique Version Numbers

There are several sources that you can tap to generate unique version numbers:

  • Randomness (for example in the form of UUIDs)
  • The current date and time
  • The git repository itself
  • GoCD exposes several environment variables that can be of use

The last of these is quite promising: GO_PIPELINE_COUNTER is a monotonic counter that increases each time GoCD runs the pipeline, making it a good source for a version number. GoCD allows manual re-running of stages, so it's best to combine it with GO_STAGE_COUNTER. In terms of shell scripting, using $GO_PIPELINE_COUNTER.$GO_STAGE_COUNTER as a version string sounds like a decent approach.

But, there's more. GoCD allows you to trigger a pipeline with a specific version of a material, so you can have a new pipeline run to build an old version of the software. If you do that, using GO_PIPELINE_COUNTER as the first part of the version string doesn't reflect the use of an old code base.

To construct a version string that primarily reflects the version of the git repository, and only secondarily the build iteration, the first part of the version string has to come from git. As a distributed version control system, git doesn't supply a single, numeric version counter. But if you limit yourself to a single repository and branch, you can simply count commits.

git describe is an established way to count commits. By default it prints the last tag in the repo, and if HEAD does not resolve to the same commit as the tag, it adds the number of commits since that tag, and the abbreviated sha1 hash prefixed by g, so for example 2016.04-32-g4232204 for the commit 4232204, which is 32 commits after the tag 2016.04. The option --long forces it to always print the number of commits and the hash, even when HEAD points to a tag.
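
For example, with the repository state just described (32 commits after the tag 2016.04, on commit 4232204), the output looks like this; the hash in the second, tag-only case is made up for illustration:

$ git describe
2016.04-32-g4232204
$ git describe --long    # still prints count and hash when HEAD is exactly on a tag
2016.04-0-g1f2e3d4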

We don't need the commit hash for the version number, so a shell script to construct a good version number looks like this:

#!/bin/bash

# Abort on errors, also for failures inside pipelines
set -e
set -o pipefail
# git describe --long prints "tag-count-ghash"; strip the "-ghash" suffix
version=$(git describe --long | sed 's/-g[A-Fa-f0-9]*$//')
# Append the GoCD pipeline and stage counters (0 when run outside GoCD)
version="$version.${GO_PIPELINE_COUNTER:-0}.${GO_STAGE_COUNTER:-0}"

Bash's ${VARIABLE:-default} syntax is a good way to make the script work outside a GoCD agent environment.
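
A quick demonstration of that fallback behaviour:

# ${VAR:-0} expands to 0 when VAR is unset or empty
unset GO_PIPELINE_COUNTER
echo "${GO_PIPELINE_COUNTER:-0}"    # prints: 0
GO_PIPELINE_COUNTER=17
echo "${GO_PIPELINE_COUNTER:-0}"    # prints: 17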

This script requires a tag to be set in the git repository. If there is none, it fails with this message from git describe:

fatal: No names found, cannot describe anything.
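
If your repository has no tag yet, creating one fixes that; note that git describe only considers annotated tags by default. The tag name 2016.04 below is just an example:

# Create an annotated tag and push it to the origin remote
git tag --annotate 2016.04 -m 'Start of automated builds'
git push origin 2016.04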

Other Bits and Pieces Around the Build

Now that we have a version string, we need to instruct the build system to use this version string. This works by writing a new entry in debian/changelog with the desired version number. The debchange tool automates this for us. A few options are necessary to make it work reliably:

# Name and email that debchange records as the changelog author
export DEBFULLNAME='Go Debian Build Agent'
export DEBEMAIL='go-noreply@example.com'
# -b accepts a version lower than the current one;
# --force-distribution accepts a distribution outside the known list
debchange --newversion=$version --force-distribution -b \
    --distribution="${DISTRIBUTION:-jessie}" 'New Version'

When we want to reference this version number in later stages of the pipeline (yes, there will be more), it's handy to have it available both in a file and in the build output, so two more lines go into the script:

echo $version                 # show the version in the build output
echo $version > ../version    # and store it for later pipeline stages

And of course, trigger the actual build:

# -b: binary-only build; -us -uc: don't sign the source package or the .changes file
debuild -b -us -uc

Plugging It Into GoCD

To make the script accessible to GoCD, and also have it under version control, I put it into a git repository under the name debian-autobuild and added the repo as a material to the pipeline:

<pipeline name="package-info">
  <materials>
    <git url="https://github.com/moritz/package-info.git" dest="package-info" />
    <git url="https://github.com/moritz/deployment-utils.git" dest="deployment-utils" materialName="deployment-utils" />
  </materials>
  <stage name="build" cleanWorkingDir="true">
    <jobs>
      <job name="build-deb" timeout="5">
        <tasks>
          <exec command="../deployment-utils/debian-autobuild" workingdir="#{package}" />
        </tasks>
        <artifacts>
          <artifact src="version" />
          <artifact src="package-info*_*" dest="package-info/" />
        </artifacts>
      </job>
    </jobs>
  </stage>
</pipeline>

Now GoCD automatically builds Debian packages on each commit to the git repository, and gives each a distinct version string.

The next step is to add it to a repository, so that it can be installed on a target machine with a simple apt-get command.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


Ocean of Awareness: What parser do birds use?

"Here we provide, to our knowledge, the first unambiguous experimental evidence for compositional syntax in a non-human vocal system." -- "Experimental evidence for compositional syntax in bird calls", Toshitaka N. Suzuki, David Wheatcroft & Michael Griesser Nature Communications 7, Article number: 10986

In this post I look at a subset of the language of the Japanese great tit, also known as Parus major. The above cited article presents evidence that bird brains can parse this language. What about standard modern computer parsing methods? Here is the subset -- probably a tiny one -- of the language actually used by Parus major.

      S ::= ABC
      S ::= D
      S ::= ABC D
      S ::= D ABC
    

Classifying the Parus major grammar

Grammophone is a very handy new tool for classifying grammars. Its own parser is somewhat limited, so that it requires a period to mark the end of a rule. The above grammar is in Marpa's SLIF format, which is smart enough to use the "::=" operator to spot the beginning and end of rules, just as the human eye does. Here's the same grammar converted into a form acceptable to Grammophone:

      S -> ABC .
      S -> D .
      S -> ABC D .
      S -> D ABC .
    

Grammophone tells us that the Parus major grammar is not LL(1), but that it is LALR(1).

What does this mean?

LL(1) is the class of grammar parseable by top-down methods: it's the best class for characterizing most parsers in current use, including recursive descent, PEG, and Perl 6 grammars. All of these parsers fall short of dealing with the Parus major language.

LALR(1) is probably most well-known from its implementations in bison and yacc. While able to handle this subset of Parus's language, LALR(1) has its own, very strict, limits. Whether LALR(1) could handle the full complexity of the Parus language is a serious question. But it's a question that in practice would probably not arise. LALR(1) has horrible error-handling properties.

When the input is correct and within its limits, an LALR-driven parser is fast and works well. But if the input is not perfectly correct, LALR parsers produce no useful analysis of what went wrong. If Parus hears "d abc d", a parser like Marpa, on the other hand, can produce something like this:

# * String before error: abc d\s
# * The error was at line 1, column 7, and at character 0x0064 'd', ...
# * here: d
    

Parus uses its language in predatory contexts, and one can assume that a Parus with a preference for parsers whose error handling is on an LALR(1) level will not be keeping its alleles in the gene pool for very long.

References, comments, etc.

Those readers content with sub-Parus parsing methods may stop reading here. Those with greater parsing ambitions, however, may wish to learn more about Marpa. A Marpa test script for parsing the Parus subset is in a Github gist. Marpa has a semi-official web site, maintained by Ron Savage. The official, but more limited, Marpa website is my personal one. Comments on this post can be made in Marpa's Google group, or on our IRC channel: #marpa at freenode.net.

Perlgeek.de : Automating Deployments: 3+ Environments

Software is written to run in a production environment. This is where the goal of the business is achieved: making money for the business, or reaching and educating people, or whatever the reason for writing the software is. For websites, this is typically the Internet-facing public servers.

But the production environment is not where you want to develop software. Developing is an iterative process, and comes with its own share of mistakes and corrections. You don't want your customers to see all those mistakes as you make them, so you develop in a different environment, maybe on your PC or laptop instead of a server, with a different database (though hopefully using the same database software as in the production environment), possibly using a different authentication mechanism, and far less data than the production environment has.

You'll likely want to prevent certain interactions in the development environment that are desirable in production: Sending notifications (email, SMS, voice, you name it), charging credit cards, provisioning virtual machines, opening rack doors in your data center and so on. How that is done very much depends on the interaction. You can configure a mail transfer agent to deliver all mails to a local file or mail box. Some APIs have dedicated testing modes or installations; in the worst case, you might have to write a mock implementation that answers similarly to the original API, but doesn't carry out the action that the original API does.

Deploying software straight to production if it has only been tested on the developer's machine is a rather bad practice. Often the environments are too different, and the developer unknowingly relied on a feature of his environment that isn't the same in the production environment. Thus it is quite common to have one or more environments in between where the software is deployed and tested, and only propagated to the next deployment environment when all the tests in the previous one were successful.

[Figure: after the software is modified in the development environment, it is deployed to the testing environment (with its own database); if all tests pass, it is propagated to the production environment.]

One of these stages is often called testing. This is where the software is shown to the stakeholders to gather feedback, and if manual QA steps are required, they are often carried out in this environment (unless there is a separate environment for that).

A reason to have another non-production environment is to test service dependencies. If several different software components are deployed to the testing environment, and you decide to deploy one or two at a time to production, things might break in production. The component you deployed might have a dependency on a newer version of another component, and since the testing environment contained that newer version, nobody noticed. Or maybe a database upgrade in the testing environment failed, and had to be repaired manually; you don't want the same to happen in a production setting, so you decide to test in another environment first.

[Figure: after the software is modified in the development environment, it is deployed to the testing environment (with its own database); if all tests pass, it is propagated to the staging environment, and only if that works is the deployment to production carried out.]

Thus many companies have another staging environment that mirrors the production environment as closely as possible. A planned production deployment is first carried out in the staging environment, and on success done in production too, or rolled back on error.

There are valid reasons to have even more environments. If automated performance testing is performed, it should be done in a separate environment where no manual usage is possible, to avoid distorting the results. Other tests such as automated acceptance or penetration testing are best done in their own environments.

[Figure: more environments can be added, for example for automated acceptance, penetration and performance testing; these typically come before the staging environment.]

In addition, dedicated environments for testing and evaluating exploratory features are possible.

It should be noted that while these environments all serve valid purposes, they also come at a cost. Machines, either virtual or physical, on which all those environments run must be available, and they consume resources. They must be set up initially and maintained. License costs must be considered (for example for proprietary databases). Also, the time for deploying code increases as the number of environments increases. With more environments, automating deployments, and maybe even the management and configuration of the infrastructure, becomes mandatory.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


Dave's Free Press: Journal: Thankyou, Anonymous Benefactor!

Perlgeek.de : Continuous Delivery for Libraries?

Last Thursday I gave a talk on Continuous Delivery (slides) at the German Perl Workshop 2016 (video recordings have been made, but aren't available yet). One of the questions from the audience was along the lines of: would I use Continuous Delivery for a software library?

My take on this is that you typically develop a library driven by the needs of one or more applications, not just for the sake of developing a library. So you have some kind of pilot application which makes use of the new library features.

You can integrate the library into the application's build pipeline. Automatically build and unit test the library, and once it succeeds, upload the library into a repository. The build pipeline for the application can then download the newest version of the library, and include it in its build result (fat-packaging). The application build step now has two triggers: commits from its own version control repository, and library uploads.

Then the rest of the delivery pipeline for the application serves as quality gating for the library as well. If the pipeline includes integration tests and functional tests for the whole software stack, it will catch errors in the library, and deploy the library along with the application.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


Dave's Free Press: Journal: Number::Phone release

Dave's Free Press: Journal: Ill

Dave's Free Press: Journal: CPANdeps upgrade

Ocean of Awareness: Introduction to Marpa Book in progress

What follows is a summary of the features of the Marpa algorithm, followed by a discussion of potential applications. It refers to itself as a "monograph", because it is a draft of part of the introduction to a technical monograph on the Marpa algorithm. I hope the entire monograph will appear in a few weeks.

The Marpa project

The Marpa project was intended to create a practical and highly available tool to generate and use general context-free parsers. Tools of this kind had long existed for LALR and regular expressions. But, despite an encouraging academic literature, no such tool had existed for context-free parsing. The first stable version of Marpa was uploaded to a public archive on Solstice Day 2011. This monograph describes the algorithm used in the most recent version of Marpa, Marpa::R2. It is a simplification of the algorithm presented in my earlier paper.

A proven algorithm

While the presentation in this monograph is theoretical, the approach is practical. The Marpa::R2 implementation has been widely available for some time, and has seen considerable use, including in production environments. Many of the ideas in the parsing literature satisfy theoretical criteria, but in practice turn out to face significant obstacles. An algorithm may be as fast as reported, but may turn out not to allow adequate error reporting. Or a modification may speed up the recognizer, but require additional processing at evaluation time, leaving no advantage to compensate for the additional complexity.

In this monograph, I describe the Marpa algorithm as it was implemented for Marpa::R2. In many cases, I believe there are better approaches than those I have described. But I treat these techniques, however solid their theory, as conjectures. Whenever I mention a technique that was not actually implemented in Marpa::R2, I will always explicitly state that that technique is not in Marpa as implemented.

Features

General context-free parsing

As implemented, Marpa parses all "proper" context-free grammars. The proper context-free grammars are those which are free of cycles, unproductive symbols, and inaccessible symbols. Worst case time bounds are never worse than those of Earley's algorithm, and therefore never worse than O(n**3).

Linear time for practical grammars

Currently, the grammars suitable for practical use are thought to be a subset of the deterministic context-free grammars. Using a technique discovered by Joop Leo, Marpa parses all of these in linear time. Leo's modification of Earley's algorithm is O(n) for LR-regular grammars. Leo's modification also parses many ambiguous grammars in linear time.

Left-eidetic

The original Earley algorithm kept full information about the parse --- including partial and fully recognized rule instances --- in its tables. At every parse location, before any symbols are scanned, Marpa's parse engine makes available its information about the state of the parse so far. This information is in useful form, and can be accessed efficiently.

Recoverable from read errors

When Marpa reads a token which it cannot accept, the error is fully recoverable. An application can try to read another token. The application can do this repeatedly as long as none of the tokens are accepted. Once the application provides a token that is accepted by the parser, parsing will continue as if the unsuccessful read attempts had never been made.

Ambiguous tokens

Marpa allows ambiguous tokens. These are often useful in natural language processing where, for example, the same word might be a verb or a noun. Use of ambiguous tokens can be combined with recovery from rejected tokens so that, for example, an application could react to the rejection of a token by reading two others.

Using the features

Error reporting

An obvious application of left-eideticism is error reporting. Marpa's abilities in this respect are ground-breaking. For example, users typically regard an ambiguity as an error in the grammar. Marpa, as currently implemented, can detect an ambiguity and report specifically where it occurred and what the alternatives were.

Event driven parsing

As implemented, Marpa::R2 allows the user to define "events". Events can be defined that trigger when a specified rule is complete, when a specified rule is predicted, when a specified symbol is nulled, when a user-specified lexeme has been scanned, or when a user-specified lexeme is about to be scanned. A mid-rule event can be defined by adding a nulling symbol at the desired point in the rule, and defining an event which triggers when the symbol is nulled.

Ruby slippers parsing

Left-eideticism, efficient error recovery, and the event mechanism can be combined to allow the application to change the input in response to feedback from the parser. In traditional parser practice, error detection is an act of desperation. In contrast, Marpa's error detection is so painless that it can be used as the foundation of new parsing techniques.

For example, if a token is rejected, the lexer is free to create a new token in the light of the parser's expectations. This approach can be seen as making the parser's "wishes" come true, and I have called it "Ruby Slippers Parsing".

One use of the Ruby Slippers technique is to parse with a clean but oversimplified grammar, programming the lexical analyzer to make up for the grammar's short-comings on the fly. As part of Marpa::R2, the author has implemented an HTML parser, based on a grammar that assumes that all start and end tags are present. Such an HTML grammar is too simple even to describe perfectly standard-conformant HTML, but the lexical analyzer is programmed to supply start and end tags as requested by the parser. The result is a simple and cleanly designed parser that parses very liberal HTML and accepts all input files, in the worst case treating them as highly defective HTML.

Ambiguity as a language design technique

In current practice, ambiguity is avoided in language design. This is very different from the practice in the languages humans choose when communicating with each other. Human languages exploit ambiguity in order to design highly flexible, powerfully expressive languages. For example, the language of this monograph, English, is notoriously ambiguous.

Ambiguity of course can present a problem. A sentence in an ambiguous language may have undesired meanings. But note that this is not a reason to ban potential ambiguity --- it is only a problem with actual ambiguity.

Syntax errors, for example, are undesired, but nobody tries to design languages to make syntax errors impossible. A language in which every input was well-formed and meaningful would be cumbersome and even dangerous: all typos in such a language would be meaningful, and the parser would never warn the user about errors, because there would be no such thing.

With Marpa, ambiguity can be dealt with in the same way that syntax errors are dealt with in current practice. The language can be designed to be ambiguous, but any actual ambiguity can be detected and reported at parse time. This exploits Marpa's ability to report exactly where and what the ambiguity is. Marpa::R2's own parser description language, the SLIF, uses ambiguity in this way.

Auto-generated languages

In 1973, Čulik and Cohen pointed out that the ability to efficiently parse LR-regular languages opens the way to auto-generated languages. In particular, Čulik and Cohen note that a parser which can parse any LR-regular language will be able to parse a language generated using syntax macros.

Second order languages

In the literature, the term "second order language" is usually used to describe languages with features which are useful for second-order programming. True second-order languages --- languages which are auto-generated from other languages --- have not been seen as practical, since there was no guarantee that the auto-generated language could be efficiently parsed.

With Marpa, this barrier is raised. As an example, Marpa::R2's own parser description language, the SLIF, allows "precedenced rules". Precedenced rules are specified in an extended BNF. The BNF extensions allow precedence and associativity to be specified for each RHS.

Marpa::R2's precedenced rules are implemented as a true second order language. The SLIF representation of the precedenced rule is parsed to create a BNF grammar which is equivalent, and which has the desired precedence. Essentially, the SLIF does a standard textbook transformation. The transformation starts with a set of rules, each of which has a precedence and an associativity specified. The result of the transformation is a set of rules in pure BNF. The SLIF's advantage is that it is powered by Marpa, and therefore the SLIF can be certain that the grammar that it auto-generates will parse in linear time.

Notationally, Marpa's precedenced rules are an improvement over similar features in LALR-based parser generators like yacc or bison. In the SLIF, there are two important differences. First, in the SLIF's precedenced rules, precedence is generalized, so that it does not depend on the operators: there is no need to identify operators, much less class them as binary, unary, etc. This more powerful and flexible precedence notation allows the definition of multiple ternary operators, and multiple operators with arity above three.

Second, and more important, a SLIF user is guaranteed to get exactly the language that the precedenced rule specifies. The user of the yacc equivalent must hope their syntax falls within the limits of LALR.

References, comments, etc.

Marpa has a semi-official web site, maintained by Ron Savage. The official, but more limited, Marpa website is my personal one. Comments on this post can be made in Marpa's Google group, or on our IRC channel: #marpa at freenode.net.

Dave's Free Press: Journal: YAPC::Europe 2006 report: day 3
