ldami: Very positive and refreshing articles on Perl 5 -- example for embedded systems

Michel Conrad recently wrote two very positive articles on Perl 5. They were relayed on the LinkedIn Perl group, but I haven't seen them on the usual Perl sites, so I'm sharing them here in the hope that they get propagated better:

 https://opensource.com/article/18/1/why-i-love-perl-5

https://opensource.com/article/18/1/my-delorean-runs-perl

The second article is very interesting in that it shows an unusual application area for Perl: realtime graphics for a car dashboard! This reminded me of a fascinating presentation many years ago at the French Perl Workshop 2005, where we saw an application driving all vehicles on the apron of Port Airport.

 


Gabor Szabo: The Popularity of Perl in 2018 February

More than two years ago I published an article called The Popularity of Perl in 2015. It contained a list of sites with their Alexa rankings and a few sites with information from their Google Analytics. This is an updated version of that report for February 2018.

For the full article visit The Popularity of Perl in 2018 February

End Point Blog: Sunsetting Piggybak, A Ruby on Rails Ecommerce Gem

Screenshot of website for Piggybak: Open Source Ruby on Rails Ecommerce

Hi there! I’m Steph, the original creator and not-very-good maintainer of Piggybak, a modular ecommerce platform written in Ruby on Rails. With the help of End Point, I created Piggybak after building a custom Ruby on Rails ecommerce application for one of our clients here at End Point.

My goal with Piggybak was to create a lightweight, modular ecommerce platform in the form of a Ruby on Rails gem that could be combined with other Ruby on Rails gems for quick setup. Over the years, End Point has had much success working with Interchange, an ecommerce framework written in Perl. The web stack has evolved greatly since then, as has the capacity for modularity and the ability to decouple front-end and back-end architecture.

Fast forward about 4 years after Piggybak was released, and we’ve decided to retire it. Not only did I leave the maintenance up to End Point after I left to work as an in-house software engineer for the last couple of years, but I was also in a position to evaluate using Piggybak as the base for a custom solution.

While I think there are some great Ruby on Rails gems to help support your ecommerce application (see below), one of the main things I realized was that the modularity in Piggybak often doesn’t suit the needs of the target audience of custom Ruby on Rails ecommerce platforms.

These days, here’s my oversimplified categorization of those looking for ecommerce solutions, divided into two audiences:

Audience #1: Boilerplate SaaS with Theme Customization
Sellers for whom boilerplate ecommerce solutions work, with simple product and variant modeling. Shopify, Magento, BigCommerce, and WooCommerce can be decent, popular options for this audience, but there are many other options out there.

Audience #2: Custom Ecommerce in Ruby on Rails
Companies going this route have needs custom enough that a small team can develop and maintain a custom solution efficiently in Ruby on Rails, without being tied to a pre-existing data model or business rules.

Not only did Piggybak suffer from a lack of community involvement, but it also didn’t suit the needs of the two audiences listed above. Because it was fairly complex and written in the form of a Rails gem (engine), it required a developer with some knowledge to install and customize it further. And because it defined assumptions around the product-variant data model and business rules for inventory and payment management, it wasn’t necessarily a good fit for custom ecommerce needs.

Other Ruby on Rails Gems

While I’m sad to sunset Piggybak, I still believe Rails offers great options for ecommerce needs, including the popular Ruby on Rails gems mentioned below.

If you are wondering if there are any Ruby on Rails ecommerce gems still out there, you can look at Solidus, Spree, and RoR-E. End Point has a long history with Spree, and experience with the other two platforms, but again, the audience of those businesses choosing a custom path may not want to be tied to the data models and business rules adopted by these existing platforms.

Perl Foundation News: Grant Proposal: Perl-Based CloudForFree.org Platform

The Grants Committee has received the following grant proposal for the January/February round. Before the Committee members vote, we would like to solicit feedback from the Perl community on the proposal.

Review the proposal below and please comment here by February 20th, 2018. The Committee members will start the voting process following that and the conclusion will be announced the last week of February.

Perl-Based CloudForFree.org Platform

  • Name:

    John Napiorkowski

  • Amount Requested:

    USD $1,500

Synopsis

Complete development of the partially-operational http://www.cloudforfree.org platform.

Benefits to the Perl Community

Few people understand what The Cloud is, and fewer still have access.

The Perl community will benefit by having widespread free access to a Perl-based Cloud computing platform, which will showcase Perl as both modern and competitive with other Cloud technologies.

Deliverables

The CloudForFree.org platform itself will be updated with new features which will automatically become available to all users.

Project Details

The CloudForFree.org platform is a Perl-based Cloud computing software system created from scratch.

Numerous technologies are utilized in the current version of CloudForFree.org, including but not limited to:

  • Perl
  • Catalyst MVC
  • ShinyCMS
  • RPerl
  • GCC C++
  • Javascript
  • CGI::Ajax
  • Ace Editor

I have reviewed the CloudForFree.org source code and started initial planning discussions with the founder, Will Braswell.

Inch-stones

Phase 1, LAMP Installer Script Section 51

  • Review & Comprehend LAMP_installer.sh Script
  • Upgrade LAMP_installer.sh Section 51 "PERL CLOUDFORFREE" To Function Properly
  • Demonstrate Section 51 Installing CloudForFree Software In Ubuntu 16.04
  • Upgrade CFF Server To RPerl v3.2

Phase 2, IDE Code Editor

  • Create Linux User Account For Existing & New Users
  • New Copies Of Learning RPerl Exercises
  • Save All Files In Linux User Home Directories
  • Button For "Save File" Function
  • Save & Quit Functions For Ace Editor vi & emacs Keyboard Commands
    • :w :wq ZZ etc.
    • Ctrl-x Ctrl-s Ctrl-x s Ctrl-x Ctrl-w etc.
  • Multiple Open Files Via Tab Widget

Phase 3, Open GitHub Issues

  • Review & Comprehend All Open GitHub Issues github.com/wbraswell/cloudforfree.org...
  • Collaborate With Client Personnel To Plan Solutions For Each Open Issue
  • Implement Solutions For Each Open Issue

Phase 4, Terminal Emulation

  • Full xterm Terminal Emulation For Command-Line Job Execution
  • Includes All VT100 & VT102 & VT220 Features
  • Resize Number Of Rows & Columns By Window Resize
  • Resize Font Size By User Configuration
  • Backspace, Tab, Other Special Command Characters
  • Arbitrary Placement Of Characters At Any Row & Column
  • Color Characters & Background
  • Curses & Other Menu Support
  • Full Window (F10) & Full Screen (F11)

Phase 5, Graphical Output & Mouse Input

  • Full X-Windows Graphics Output Using Xpra HTML5 Client
  • Generate Output Using SDL From Perl & C
  • Mouse Left-Click, Right-Click, Drag, Scroll
  • Resize Number Of Available X & Y Pixels By Window Resize
  • Full Window (F10) & Full Screen (F11)

Phase 6, GitHub Repositories Integration

  • Import & Setup User's GitHub Keys Via Linux User Account (From Phase 2)
  • List Of All GitHub Repos With User As Owner Or Collaborator
  • Admin Mode, Display All Repos
  • Allow User To Enter Any Other Readable GitHub Repo URL ex. github.com/wbraswell/physicsperl
  • Buttons For Basic User Git Commands
    • clone
    • add
    • commit
    • push
    • pull
    • checkout
    • status
  • Do Not Duplicate Any GitHub Web Functionality

Phase 7, Job Queue

  • Job Scheduler & Monitor Using OAR (1st Choice), Or HTCondor Or Slurm (2nd Choices)
  • Manage User's Jobs Via Linux User Account (From Phase 2)
  • List Of All Current Jobs With User As Owner
  • Admin Mode, Display All Jobs
  • FileManager For Selecting *.pl RPerl Programs To Run
  • Buttons For Basic Job Control Commands
    • start
    • restart
    • stop (SIGTERM)
    • force stop (SIGKILL)
    • pause (SIGSTOP)
    • continue (SIGCONT)
    • show/hide CLI (From Phase 4)
    • show/hide GUI (From Phase 5)
  • Display Resources: FLOPS, Cores, Memory, Storage; Total, Unused, Available To User
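
As a rough illustration only (not code from the proposal), the stop, force stop, pause and continue buttons listed above map onto standard POSIX signals, which plain Perl can send with kill; an actual implementation would drive the chosen scheduler (OAR, HTCondor or Slurm) rather than signalling processes directly:

    # Hypothetical sketch: $pid is the process ID of a running job.
    use strict;
    use warnings;

    sub stop_job       { my ($pid) = @_; kill 'TERM', $pid }   # polite stop
    sub force_stop_job { my ($pid) = @_; kill 'KILL', $pid }   # force stop
    sub pause_job      { my ($pid) = @_; kill 'STOP', $pid }   # pause
    sub continue_job   { my ($pid) = @_; kill 'CONT', $pid }   # continue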

Project Schedule

I can start in about a week or so.

This is a very complex job; I expect it could take 3 to 5 months to complete.

Completeness Criteria

This grant is done when a new version of CloudForFree.org is made available with all the new features listed herein.

Bio

I am the maintainer of Catalyst MVC and I have 77 distributions on CPAN:

https://metacpan.org/author/JJNAPIORK

I am a member of the Austin Perl Mongers with Will Braswell, the creator of both RPerl and CloudForFree.org.

We have worked together for several years on various Perl projects, both open source and commercial.

Perl Foundation News: Grant Proposals: January/February 2018

The Grants Committee has received the following grant proposals for the January/February round.

Before the Committee members vote on the proposals, we would like to solicit feedback from the Perl community.

Review the proposals at their individual links and please comment there by February 20th, 2018. The Committee members will start the voting process following that and the conclusion will be announced the last week of February.

Perl Foundation News: Grant Proposal: OS Installation Packages for RPerl Compiler

The Grants Committee has received the following grant proposal for the January/February round. Before the Committee members vote, we would like to solicit feedback from the Perl community on the proposal.

Review the proposal below and please comment here by February 20th, 2018. The Committee members will start the voting process following that and the conclusion will be announced the last week of February.

OS Installation Packages for RPerl Compiler

  • Name:

    Felix Tubiana

  • Amount Requested:

    USD $800

Synopsis

Creation of RPerl installation packages for 12 major operating systems.

Benefits to the Perl Community

Most of the Perl community is limited in their ability to benefit from the RPerl compiler due to installation complexity.

This project will significantly decrease the barrier to utilizing RPerl for normal Perl developers.

Deliverables

Installation packages will be made available for free public download from the http://rperl.org website.

Project Details

Currently, there are 3 methods of installing RPerl, none of which is particularly easy, stable, or fast.

First, RPerl can be installed manually from the latest code on GitHub, which requires all dependencies to also be installed, perhaps manually as well.

Second, RPerl can be installed via the interactive command-line installation script https://github.com/wbraswell/rperl/blob/master/script/rperl_installer.sh, which is automated but still error prone due to the many and varied configuration options, different operating systems, etc.

Third, RPerl can be installed from CPAN via the normal cpan RPerl or cpanm RPerl commands; this is the most automated option, but it takes forever, is error prone, and is only capable of installing Perl-based or Alien-based dependencies, so OS-specific dependencies are completely unsupported in this case.

The operating system installation packages generated by this project will allow most software developers to quickly and easily install RPerl without the issues introduced by missing or broken CPAN dependencies, confusing installer configuration settings, or manual code installation.

In other words, RPerl will finally be as easy to install as Perl itself.

I am already working with Will Braswell (RPerl creator) to implement CentOS support in the RPerl installation script:

https://github.com/wbraswell/rperl/commit/15ba50fa659342d155e10c307d9084d9df16e5c9

Inch-stones

  • Upgrade RPerl installation script to support:
    • Debian
    • Mint
    • Fedora
    • OpenSUSE
    • FreeBSD
    • NetBSD
    • Windows 10, Chocolatey
    • Windows 10, Cygwin
    • Windows 10, Native
    • macOS
  • Implement OS-specific installation packages, including package templates and accompanying how-to documentation, for:
    • Ubuntu
    • Debian
    • Mint
    • CentOS
    • Fedora
    • OpenSUSE
    • FreeBSD
    • NetBSD
    • Windows 10, Chocolatey
    • Windows 10, Cygwin
    • Windows 10, Native
    • macOS

Project Schedule

I will start immediately and hope to finish within about 2 months, although it could take longer especially with the additional difficulty of Windows platforms.

Completeness Criteria

This grant is done when a new version of RPerl is released to CPAN containing the updated installation script, and all 12 operating system installation packages are made available on the http://rperl.org website.

Bio

I am an official member of the RPerl development team.

This means I am a Perl programmer working with Will Braswell, and I am dedicated to the future of Perl.

Perl Foundation News: Grant Proposal: List Operators In RPerl Compiler

The Grants Committee has received the following grant proposal for the January/February round. Before the Committee members vote, we would like to solicit feedback from the Perl community on the proposal.

Review the proposal below and please comment here by February 20th, 2018. The Committee members will start the voting process following that and the conclusion will be announced the last week of February.

List Operators In RPerl Compiler

  • Name:

    Mathias Köhler

  • Amount Requested:

    USD $800

Synopsis

This grant proposal covers the implementation of multiple list operators in the RPerl compiler.

There are many operators (AKA builtin functions) in Perl; this proposal covers a subset of those related to lists and arrays.

Benefits to the Perl Community

Perl is an operator-rich language, which is one of the aspects that makes it powerful.

Most Perl programs make use of list operators, and will benefit greatly from the ability to be compiled using RPerl.

Deliverables

A new version of RPerl will be released with the new list operators.

Project Details

This project is solely focused on the RPerl compiler internals, which are written in Perl and C++.

A new Perl module is created for each list operator to be implemented, and then code generation routines are written to output both Perl and C++ code.

For detailed technical information about implementing RPerl operators, please see the following documentation:

https://github.com/wbraswell/rperl/blob/master/docs/devs_getting_started.txt

I have already implemented a number of scalar RPerl operators, including sin, cos, exp, log, sqrt, atan2, and length.

https://github.com/wbraswell/rperl/commits?author=93r

Since 2016 I have been an officially authorized contributor to RPerl, and I continue to work with the creator Mr. Braswell on numerous RPerl tasks.

Inch-stones

Create Perl tests, Perl code generation, C++ code generation, and C++ tests for the following list operators:

  • Quoted Word qw()
  • Push push
  • Pop pop
  • Shift shift
  • Unshift unshift
  • Range ..
  • Reverse reverse
  • Sort sort
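
For reference, the operators above are ordinary Perl builtins; the following minimal plain-Perl refresher (not RPerl-specific code) shows what each one does:

    use strict;
    use warnings;

    my @langs = qw(perl c cpp);       # quoted word list: ('perl', 'c', 'cpp')
    push @langs, 'fortran';           # append to the end
    my $last  = pop @langs;           # remove and return the last element
    my $first = shift @langs;         # remove and return the first element
    unshift @langs, 'asm';            # prepend to the front
    my @nums     = (1 .. 5);          # range operator: (1, 2, 3, 4, 5)
    my @backward = reverse @nums;     # (5, 4, 3, 2, 1)
    my @sorted   = sort { $a <=> $b } @backward;   # numeric sort: (1, 2, 3, 4, 5)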

Project Schedule

This grant project should be complete in approximately 3 or 4 months, and I am available to begin now.

Completeness Criteria

We will be done when RPerl is updated on CPAN with the new list operators.

Bio

I am a professional Perl programmer and a member of the RPerl development team.

I will continue working with the creator of RPerl, Mr. Braswell, throughout this grant project.

Perl Foundation News: Grant Proposal: MySQL Support In RPerl Compiler

The Grants Committee has received the following grant proposal for the January/February round. Before the Committee members vote, we would like to solicit feedback from the Perl community on the proposal.

Review the proposal below and please comment here by February 20th, 2018. The Committee members will start the voting process following that and the conclusion will be announced the last week of February.

MySQL Support In RPerl Compiler

  • Name:

    Richard D. Johnson

  • Amount Requested:

    USD $650

Synopsis

I want to implement basic MySQL support for Perl code compiled using the RPerl optimizing compiler.

This will allow Perl software which depends upon MySQL to be compiled into very fast C++ code.

Benefits to the Perl Community

MySQL is one of the most popular databases among Perl programmers.

Many different kinds of Perl software utilize MySQL, so this grant has a potentially wide applicability in the Perl community.

Often, software which depends upon a database is itself performance critical in some way, and this grant will directly benefit such cases where MySQL is the database of choice.

Deliverables

We will release a new version of the RPerl compiler, containing the new MySQL support.

Project Details

Implementing this project requires knowledge of DBI, MySQL, Perl, C++, and RPerl compiler internals.

I have already started writing out the manually-compiled example code in both Perl and C++, which is an important first step.

I am working directly with the RPerl project leader, Will Braswell, to ensure all code is merged into the main RPerl code base on GitHub and CPAN.

Inch-stones

  • Write manually-compiled source code examples in both Perl and C++.
  • Enable use DBI; found in RPerl input source code.
  • Parse and validate DBI operations found in RPerl input source code.
  • Enable MySQL C++ driver in RPerl compiler internals.
  • Map RPerl input source code to C++ output source code.
  • Support the following MySQL operations:
    • Open Database
    • Create Table
    • Insert Record
    • Select Record
    • Update Record
    • Delete Record
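
For context, these operations correspond to ordinary DBI usage in uncompiled Perl, roughly as sketched below (the database, table and column names are invented for illustration); the goal of the grant is to make code along these lines compilable by RPerl:

    use strict;
    use warnings;
    use DBI;

    # Open Database (hypothetical credentials)
    my $dbh = DBI->connect('DBI:mysql:database=shop;host=localhost',
                           'user', 'password', { RaiseError => 1 });

    # Create Table
    $dbh->do('CREATE TABLE IF NOT EXISTS items (id INT PRIMARY KEY, name VARCHAR(64))');

    # Insert Record
    $dbh->do('INSERT INTO items (id, name) VALUES (?, ?)', undef, 1, 'widget');

    # Select Record
    my ($name) = $dbh->selectrow_array('SELECT name FROM items WHERE id = ?', undef, 1);

    # Update Record
    $dbh->do('UPDATE items SET name = ? WHERE id = ?', undef, 'gadget', 1);

    # Delete Record
    $dbh->do('DELETE FROM items WHERE id = ?', undef, 1);

    $dbh->disconnect;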

Project Schedule

I expect this project to be completed within 60 to 90 days after we officially begin.

I can begin work immediately.

Completeness Criteria

This task will be considered complete when the new version of RPerl is released to CPAN, containing the specified features.

Bio

I am a Perl programmer who works directly with Will Braswell, the RPerl development team lead.

I have attended the last 2 YAPC/TPC conferences in North America, and I plan to attend again this year.

I live in Austin like Mr. Braswell, and we do much of our programming in a shared office in the same physical location.

I will be able to seek the daily guidance and support of Mr. Braswell while working on this project.

Perl Foundation News: Grant Proposal: Curating and improving Perl6 documentation

The Grants Committee has received the following grant proposal for the January/February round. Before the Committee members vote, we would like to solicit feedback from the Perl community on the proposal.

Review the proposal below and please comment here by February 20th, 2018. The Committee members will start the voting process following that and the conclusion will be announced the last week of February.

Curating and improving Perl6 documentation

  • Name:

    JJ Merelo

  • Amount Requested:

    USD 5,000

Synopsis

The documentation of Perl6 at https://github.com/perl6/doc is an impressive piece of work, but working on it has revealed some problems. I propose to work part time for two months on improving this documentation by

  • working on all outstanding issues and addressing all of the issues opened in 2015, 2016 and 2017,
  • creating a set of rules for existing and new contributions and enforcing those rules in the test suite,
  • assigning outstanding issues during the grant and creating a system for assigning work to volunteers,
  • creating an entry page with a tutorial for complete beginners with very little or no knowledge of programming, with the intent of providing a good landing page for the language,
  • contributing to Perl and general open source conferences with entry-level tutorials and learning material, and
  • writing articles in websites such as The Practical Dev (https://dev.to), HackerNoon or Medium with practical "30 minutes" tutorials to perform usual tasks in Perl6.

In general, the main intention of this work is to attract new users or retain them after they make their first attempts to work with the language. So far, unfortunately, Perl6 has failed to attract users: it stands in the lower 10% of the top 100 languages on GitHub by number of pull requests, and it gets around 2 questions a week on StackOverflow, which in general are advanced questions, with no space for first steps. Once again, documentation is not development (or not fully), but it is essential to attract a critical mass of developers who will go on to develop the next killer application that uses Perl6. It has to reach everyone from entry-level programmers to the more advanced ones.

So my main objective, and the yardstick by which it will be measured, is the attention gathered by the language and how the user base increases.

Benefits to the Perl6 community and Perl community at large

Curating documentation is an integral part of the development process. Besides work on small code snippets, which absolutely and positively have to work flawlessly, there are pieces of code in the perl6/doc repository used for testing the documentation; this proposal also covers work on that aspect.

On the other hand, documentation is the entry point for newcomers to a language, and it needs to be the very best in order to grow the community of Perl6 users and, obviously, developers. Good documentation will help developers find their way to the best perl6ish data structures and functions. Curation of the documentation base will help to make it sustainable in the long term, which will eventually benefit the communities at large.

Project details

Here are the rest of the project details, starting with the deliverables.

Deliverables

This is what I will deliver during, and by the end of, the grant.

  • Address 5 doc issues per day on average (minimum 2 issues per day), reducing the number of open issues by at least 120.
  • Write guidelines for contributions and write test code for enforcing them.
  • Analyze landing pages and site paths and contribute to work on preferred landing pages. Produce a report to improve usability of the site.
  • Contribute 1 entry-level tutorial per week, to be published preferably in non-Perl publications (Medium, dev.to, hackernoon).
  • Monitor StackOverflow to answer questions and/or identify needs to be addressed in the documentation. Contribute to work on those pages or create new ones.
  • Hang out at the IRC channel to help newcomers in general and identify possible problems in documentation; address them when needed. In general, monitor all possible channels where people might need help with Perl6 and, besides helping when needed, try to identify possible problems in the documentation.
  • Refactor documentation tests and write new ones to make new additions follow guidelines. This is already in an issue in the GitHub repo.

Project details and schedule

Perl6 documentation is an impressive piece of work. There are close to 8K commits, close to 700 PRs and around 800 closed issues. The way it is organized and served allows it to be searched very fast and it can be really useful.

However, from the point of view of a newcomer, he or she might find it difficult to get started with the language. Descriptions are sometimes quite technical, and examples sometimes do not correspond to real use cases or do not cover all cases.

Internally, it is mainly a volunteer effort, and lots of people are working hard reviewing it and keeping it relevant and, above all, coherent and correct. But since it is mainly self-organized and self-assigned, it is sometimes difficult to get a particular thing done. This means that there are issues open since 2015 (#93 has been open since June 13th, 2015), and almost 300 of them are still open. Even "low hanging fruit" issues have been hanging low for a really long time, with no systematic effort to pick them from the tree.

A bigger problem is that these issues are mostly self-generated by people who have already contributed, which is a challenge, since it seems that people coming to the documentation do not bother to actually find a particular error and create an issue. Encouraging this behavior would help to improve the documentation, and would also attract new documenters, even those who are not totally proficient in Perl6.

Another problem I have found is that there is some lack of coherency and consistency in the documentation, which is only to be expected, since every page has been written by a different set of developers. For instance, some examples included in the documentation can be directly copy/pasted and run. Others cannot, with several examples lumped together into a single code block. Output is written sometimes on different lines, sometimes on the same line, almost always as a comment. It would be convenient to have some rules, even more convenient to enforce those rules by adding a test to the test suite; it would be a plus to make all examples runnable so that they are tested along with the documentation and flagged when they become obsolete or erroneous.

In general, the objective is to improve the documentation. Getting a detailed log of the visits to the site will also make it possible to analyze the paths users take, see whether people are finding what they are looking for, and optimize landing pages. A bit of referrer-specific code might also help, but in general analyzing the effectiveness of the documentation and reacting to it would also improve the user experience of the site. This work would be done locally from Granada. I will also attend Glasgow's YAPC and any other event during the grant (for instance, the Dutch Perl Workshop) if it's considered necessary. I can organize a documentation hackathon locally by the end of the grant or do an online squashathon.

Project schedule

  • Day 7: identify core of people working on the documentation and create an open channel of communication, for instance, a Telegram channel with the GitHub bot connected. Use it for meta-discussions. Then, daily: work on that channel to add a small layer of organization, assign tasks, discuss issues.
  • Daily: monitor StackOverflow, Perl6 IRC channel, Twitter and other social networks for people needing help with Perl6. Identify main sources of problem.
  • As-it-happens: check out PRs for documentation, engage them, accept them if adequate. Check out new issues, engage and assign them to self.
  • Weekly: produce entry-level tutorials and publish them on non-Perl sites.
  • Daily: work on outstanding perl6/doc issues, starting with the first. Target: 5 a day, minimum 2 a day.
  • Day 15: produce a report on site use and usability. Day 30: implement improvements. Day 45: analyze improvements and produce report on site visits and interactions. Day 60: implement second round of improvements.
  • Day 15: RFC of guidelines for documentation and code included with it. Day 30: code implementation of those guidelines.
  • Day 45: organize online or offline Perl6 documentation hackathon in Granada (Spain) to address outstanding issues, improve code examples and in general add a layer of sustainability to the documentation effort.

Report schedule

The reports will be done according to grant rules, and, additionally...

  • The grant report will be integrated with the doc GitHub repo, as a ChangeLog or grant report folder. This ChangeLog will be updated as-it-happens.
  • 4 reports, 2 a month, on activity happening outside the log: tutorials published, StackOverflow activity. They can be published wherever the committee sees fit. I can create a GitHub repo for the grant, for instance, and publish it as github pages.

Requested amount for the Project Grant

Whatever amount is usual for two months of part-time work. The contract will be signed through my employer, the University of Granada.

Bio

Born in 1965, I have been using Perl since 1993 and Perl6 since its release at the beginning of 2016. Currently I am a professor at a university in Spain, where I have been teaching since 1988. I teach cloud computing and other related subjects.

I have authored "Learning to program with Perl6", where I actually used Perl6 to put the book into production. I have been contributing to the Perl devroom since 2013, and have attended several YAPCs, the last one in Cluj-Napoca. I also organized YAPC::Europe in Granada, and two workshops (the next one in one week) in the same city.

I am mainly a Perl programmer, but I use Perl6 extensively; I have been contributing to the Perl6 repositories since 2016.

Juan J. Merelo, JJ on GitHub, JMERELO on CPAN, jjmerelo on Medium, JJ on The Practical Dev.

Country of Residence and nationality

Spain and Spanish.

Suggestions for Grant Manager

Liz Mathijsen, Wendy van Dijk, Jeff Goff, Simon Proctor, Tom Browder.


Perl Foundation News: Grant Proposal: Perl Camp in Democratic Republic of Congo

The Grants Committee has received the following grant proposal for the January/February round. Before the Committee members vote, we would like to solicit feedback from the Perl community on the proposal.

Review the proposal below and please comment here by February 20th, 2018. The Committee members will start the voting process following that and the conclusion will be announced the last week of February.

Please note that this particular request was submitted during the last round of 2017 when we had already exhausted our funding; so unfortunately, the dates on the proposal have already passed.

Perl Camp in Democratic Republic of Congo

  • Name:

    Narcisse Mbunzama

  • Amount Requested:

    USD 4,000

Synopsis

Perl Camp in Democratic Republic of Congo

20-21 December 2017

PROJECT DESCRIPTION

We aim to organize the first nationwide Perl Camp in the Democratic Republic of Congo. During 2 days in Kinshasa, the capital city of the Democratic Republic of Congo, students, IT managers and developers will come together to learn, share ideas, and discuss issues related to Perl, in order to build their capacity in Perl and to cultivate an active Perl developer community in the Democratic Republic of Congo.

The first Perl Camp in the Democratic Republic of Congo will consist of a series of activities (courses on Perl programming, debates and discussions, a hackathon, leadership programs, etc.) to allow participants to develop their skills and knowledge of Perl, plus a leadership programme to enable participants to lead a Perl community in their own regions. Indeed, at the moment there is no active Perl community in the Democratic Republic of Congo; fewer than 10 people in the country know about Perl and choose it as their programming language, even though Perl has several great advantages.

Through this first Perl Camp in the Democratic Republic of Congo, we plan to bring together around 150 participants from across the country to build their capacity in Perl and to create an active Perl community in the Democratic Republic of Congo.

THE TEAM

The project is led by Narcisse Mbunzama, an experienced software developer with a strong background in Perl. Narcisse Mbunzama is a dual citizen of DRCongo and Sweden. He holds a master's degree in computer science from Umea University in Sweden. He is a serial award-winning tech innovator and the executive manager of the DRCongo developer association. Prior to that, he served as Central Africa coordinator for the Africa Environment Outlook for Youth at the United Nations Environment Programme (UNEP) and as a fellow of the International Telecommunication Union. He is fluent in English, French and Swedish.

As a software developer, he has organized a series of programming trainings for developers, students and IT managers in the Democratic Republic of Congo on a regular basis since 2014. Participants are developers, students and IT managers who come to build their Perl programming skills. Since 2014, he has trained more than 100 people in different countries. The training will be conducted by Narcisse Mbunzama with 2 other people, who will also participate as trainers and mentors during the Perl trainings.

The trainer team is:
    - Ms. Lucie Rushimande, Master in Computer Science from the University of Kinshasa
    - Mr. Glodie Mbanzulu, Master in Law from the University of Kinshasa, who will teach participants about issues related to privacy, law and freedom of expression.

THE PARTICIPANTS

The Perl Camp in the Democratic Republic of Congo is targeting developers, students, and IT managers from public and private organizations across the country. Female and youth participation will be encouraged.

NUMBERS OF PARTICIPANTS

150 participants from across the Democratic Republic of Congo, of which 120 participants living outside the capital city of Kinshasa will receive travel and accommodation support to participate in the event in Kinshasa, and 30 participants will be from Kinshasa and the surrounding area.

THE DATE

20-21 December 2017

VENUE

University of Kinshasa, Democratic Republic of Congo

OUTREACH

To promote the program to the local community and to reach a larger audience, we will organize a strong marketing campaign on Perl and related activities before, during and after the Perl Camp in the Democratic Republic of Congo, through:

    - Media presence on TV, Radio, newspapers
    - Social Media : Facebook page, Youtube, Twitter, blogs, etc.
    - Face to face campaigns, PR, etc
    - Presence in major IT events in the democratic republic of Congo

TIMELINE

December 9, 2017 : Open call for applications to participate in the Perl Camp in DRCongo

December 15, 2017 : Selection of participants

December 20, 2017 : Opening ceremony of Perl Camp in DRCongo 8:00 - 18:00 : (Training and courses on Perl, workshops, debates, etc)

December 21, 2017 8:00 - 18:00 : (Hackathon, leadership skills, networking, etc.) End of the program.

January 5, 2018 : Final report sent to the grant donors.

BUDGET

Location of event room: $250 x 2 = $500
Equipment and materials (prints, flyers, pencils, etc.): $500
Communication/internet: $100
Transport and accommodation of participants: $2,000
Food and drinks for participants during 2 days: $800
Administrative cost: $100

TOTAL BUDGET REQUEST: $4,000

SPONSORING

Africa Park Aventure, a DRCongo-based tourism company, will give us 150 computers to use during the Perl Camp in the Democratic Republic of Congo.

BANK ACCOUNT

If selected for funding, we will provide a bank account to wire the funds.

End Point Blog: Liquid Galaxy at the Pyeongchang 2018 Winter Olympics

Liquid Galaxy showing Pyeongchang Olympic venues

The Winter Olympics are in full swing in Pyeongchang, Korea! We’re proud to note that we have a full Liquid Galaxy running onsite there as well.

Our Seoul-based partner, AZero, has been working closely with KEPCO, the Korean power company, to bring Liquid Galaxies to several of their visitor centers scattered throughout South Korea. KEPCO utilizes the Liquid Galaxy to showcase their infrastructure of hydroelectric dams, large power stations, and substations, showing how they bring electricity to the 40M+ people living in South Korea.

Liquid Galaxy showing aerial view of mountains around Pyeongchang

As a lead corporate sponsor of the games, KEPCO wanted to bring that same story to the Olympic venues. AZero developed new content that highlights the Olympic venues and the Pyeongchang area and deployed the Liquid Galaxy at the Gangneung branch office (near Pyeongchang, and also host to several Olympic venues).

Now that the games are in full swing, KEPCO is bringing VIPs, government officials, and a global list of business contacts to their center, and is using the Liquid Galaxy as the central platform to present their accomplishments.

Liquid Galaxy showing aerial view of South Korea

If you’re in Pyeongchang, why not take some time to see this incredible immersive platform?

End Point is looking forward to further cooperation with AZero in Korea, as well as our partners in Japan for the 2020 Olympic games that will be held in Tokyo.

Perl Foundation News: Maintaining the Perl 5 Core (Dave Mitchell): Grant Report for Jan 2018

This is a monthly report by Dave Mitchell on his grant under the Perl 5 Core Maintenance Fund. We thank the TPF sponsors for making this grant possible.

The main thing I did last month was to fix a bunch of issues with tr///c.
Initially I was just working on a particular ticket, then noticed that
tr///c was almost completely untested in core, and had a bunch of issues,
and I ended up rewriting most of the implementation for
tr/nonutf8/nonutf8/c (with or without the /s and /d flags).
From the merge commit, v5.27.7-203-gb1f1599:

    This branch does the following:

    Fixes an issue with tr/non_utf8/long_non_utf8/c, where
    length(long_non_utf8) > 0x7fff.

    Fixes an issue with tr/non_utf8/non_utf8/cd: basically, the
    implicit \x{100}-\x{7fffffff} added to the searchlist by /c wasn't being
    added.

    Adds a lot of code comments to the various tr/// functions.

    Adds tr///c tests - basically /c was almost completely untested.

    Changes the layout of the op_pv transliteration table: it used to be roughly

          256 x short  - basic table
            1 x short  - length of extended table (n)
            n x short  - extended table

    where the 2nd and 3rd items were only present under /c. It's now

            1 x Size_t - length of table (256+n)
      (256+n) x short  - table - both basic and extended

    where n == 0 apart from under /c.

    The new table format also allowed the tr/non_utf8/non_utf8/ code branches
    to be considerably simplified.

    op_dump() now dumps the contents of the (non-utf8 variant) transliteration
    table.

    Removes I32's from the tr/non_utf8/non_utf8/ code paths, making it fully
    64-bit clean.

    Improves the pod for tr///.
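
For readers who rarely use it, the /c flag complements the searchlist, so the transliteration applies to every character not listed; a quick illustration in plain Perl, separate from the internals work described above:

    my $s = "a1b2c3";
    (my $digits = $s) =~ tr/0-9//cd;      # /c complements 0-9, /d deletes: "123"
    my $non_digits = ($s =~ tr/0-9//c);   # count of characters not in 0-9: 3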

The other thing of note I did was to move sub attributes back before the
signature, because attributes need to be able to affect the compilation of
code within the signature, e.g.

    sub f :lvalue ($a = do { $x = "abc"; return substr($x,0,1)}) {
        ....
    }

SUMMARY:
      0:20 OP_MULTICONCAT signed/unsigned issue
      6:30 RT #132141: lvalue return broken in signature
     35:58 RT #132608 heap-buffer-overflow in Perl_do_trans
      2:22 RT #132772: MULTICONCAT: Geo-StreetAddress-US-1.04 affected too
      0:56 SEGV in t/re/pat.t
      0:55 [perl #131648] Out-of-bounds read in S_regmatch
      2:26 process p5p mailbox
    ------
     49:27 TOTAL (HH::MM)


 224.4 weeks
2962.4 total hours
  13.2 average hours per week

There are 171 hours left on the grant.

Perl Foundation News: Maintaining Perl 5 (Tony Cook): January 2018 Report

This is a monthly report by Tony Cook on his grant under the Perl 5 Core Maintenance Fund. We thank the TPF sponsors for making this grant possible.

Approximately 29 tickets were reviewed, and 6 patches were applied.

[Hours]         [Activity]
 34.99          #127743 try to work up makefile rules for new stack limit
                handling
                #127743 more makefile rules, work on fetching config from
                var instead of a constant
                #127743 more stack size calculation re-work
                #127743 more makefile rules, experiment with stack size
                probing
                #127743 testing, debugging, more stack size probing,
                documentation
                #127743 debugging, non-core testing
                #127743 non-core debugging, fix calculation for stack
                frames on 5.8.9, debug 5.10.0
                #127743 for 5.10.0 debugging
                #127743 track down to dependency issue, work on fix,
                testing
                #127743 testing, minor fixes
                #127743 more non-core testing, clean up, start on Win32
                build updates
                #127743 check non-Storable perl builds, back to Win32
                builds, locale discussion with khw
                #127743 more Win32 build work, nmake, gmake, start dmake
                #127743 fix deps, testing
                #127743 finish win32 testing, commit clean up, more regexp
                tests, start smoke
                #127743 review smoke results, diagnose cygwin failures,
                debugging
                #127743 continue debugging, find part of problem (no depth
                checks on retrieving array/refs)
                #127743 adjust max depth for cygwin64, prep to test on
                fedora
                #127743 try to repro fedora failures, can’t, try to repro
                freebsd issues
                #127743 avoid testing huge levels of recursion
                #127743 review smoke results, be more conservative on all
                platforms, testing, push new smoke candidate
  0.46          #131844 (sec) refresh memory and comment
                #131844 (sec) review and comment
  1.45          #131878 re-review discussion, discussion on #p5p, minor
                change to patch, testing, apply to blead, make ticket
                public and close
  1.32          #131954 testing, apply to blead, make ticket public
  0.35          #132055 (sec) review and comment
  1.74          #132063 (sec) review, check possible related issues
                #132063 (sec) more check possible issues, comment
  2.33          #132147 (sec) review, research, work on patch
                #132147 (sec) testing, clean up, comment with patch
  0.30          #132189 (sec) reproduce and forward privately upstream
  1.03          #132227 (sec) debugging
  2.26          #132533 review patches, struggle applying them, comment
                #132533 review new patches, testing, fixes, apply to blead
  0.52          #132602 security assessment, make public, comments
  1.40          #132618 review, debugging, make public and link to meta
                ticket, comment
  0.37          #132625 (sec) testing, comment
  8.53          #132640 (sec) delve into sub-parsing, try not to delve too
                deeply
                #132640 try to understand the parser, research
                #132640 (sec) work on a fix, debugging
                #132640 (sec) get a working patch, try to improve it, make
                ticket public, merge into 125351, comment with patch
  1.74          #132648 testing, research
                #132648 mail cygwin mailing list
                #132648 testing
  0.18          #132654 review, make public and reject
  1.13          #132655 (sec) testing, comment
  0.33          #132679 review and comment
  0.95          #132704 testing, apply to blead
  2.43          #132761 testing, test against older perl versions, comment
                #132761 work up a patch and comment with it
                #132761 testing in blead, testing against older perls,
                apply to blead
  0.43          #132762 review, testing, apply to blead
  0.25          investigate cygwin enoent test failures (have problems
                fetching from git while I’m at it)
  0.33          list catch up
  0.45          more cygwin, ask Zefram abv
  0.72          more work on security list discussion notes
  0.88          review security queue
  0.17          security queue clean up – moved several tickets to the
                public queue
  0.37          ticket cleanup
  1.33          Work on composing security list discussion post, research
======
 68.74 hours total

Perl Foundation News: Grant Extension Approved: Jonathan Worthington's Perl 6 Performance and Reliability Engineering grant

Jonathan Worthington recently requested an extension of his Perl 6 Performance and Reliability Engineering grant.

I'm pleased to announce that the Board of Directors approved an extension of another $10,000. It'll allow him to dedicate another 400 hours to this work. An additional $210 of expense was approved to cover the wire transfer fees for the past payments and the next payment.

I would like to thank the community members who took time to comment on this grant extension request and our sponsors who made funding the grant possible through our Perl 6 Development Fund.

I also appreciate Jonathan Worthington's continued work.

Sawyer X: Perl 5 Porters Mailing List Summary: February 1st-11th

Hey everyone,

Following is the p5p (Perl 5 Porters) mailing list summary for the past week and a half.

Enjoy!

February 1st-11th

News

Encode 2.96 is released.

Grant Reports

Issues

New Issues

  • Perl #132788: Blead Breaks CPAN: LEMBARK/Object-Trampoline-1.42.tar.gz.
  • Perl #132795: t/porting/bench.t uses system perl's /lib during testing.
  • Perl #132799: Blead Breaks CPAN: PERLANCAR/Unix-Passwd-File-0.250.tar.gz.
  • Perl #132800: lib/unicore/mktables takes too long.
  • Perl #132810: Blead Breaks CPAN: KYZ/Test-MockCommand-0.03.tar.gz.
  • Perl #132817: Blead Breaks CPAN: SUNNAVY/Net-Google-Code-0.19.tar.gz.
  • Perl #132822: Blead Breaks CPAN: JETTERO/Games-RolePlay-MapGen-1.5008.tar.gz.
  • Perl #132828: Tricky code can bypass Carp overload protections and trigger exceptions.
  • Perl #132833: COW bug in :encoding layer.
  • Perl #132834: Blead Breaks CPAN: SMUELLER/ExtUtils-InferConfig-1.04.tar.gz.
  • Perl #132843: v5.27.8-242-ge0280e4921 fails tests.
  • Perl #132849: Perl build process leaves random files in system.
  • Perl #132854: Blead Breaks CPAN: MSTROUT/PerlX-AsyncAwait-0.001002.tar.gz.

Resolved Issues

  • Perl #132235: Bleadperl v5.27.4-48-g0cbfaef69b breaks ZEFRAM/Module-Runtime-0.015.tar.gz.
  • Perl #132373: Bleadperl v5.27.4-53-g04680144c4 breaks CSSON/Pod-Elemental-Transformer-Splint-0.1201.tar.gz.
  • Perl #132593: PERL-5.26.1 heap_buffer_overflow.
  • Perl #132630: Assertion failure in Perl_fbm_instr.
  • Perl #132732: use if - behaviour does not match documentation.
  • Perl #132799: Blead Breaks CPAN: PERLANCAR/Unix-Passwd-File-0.250.tar.gz.
  • Perl #132817: Blead Breaks CPAN: SUNNAVY/Net-Google-Code-0.19.tar.gz.
  • Perl #132843: v5.27.8-242-ge0280e4921 fails tests.

Perl Foundation News: Maintaining the Perl 5 Core (Zefram): Grant Report for Jan 2018

This is a monthly report by Zefram on his grant under the Perl 5 Core Maintenance Fund. We thank the TPF sponsors for making this grant possible.

The hours that I have worked in 2018-01 pursuant to my TPF core
maintenance grant are as follows.

  9h33m  review mail
  5h31m  review tickets
  2h40m  [perl #132671] Bleadperl v5.27.6-206-g16ada235c3 breaks
         JGAMBLE/Algorithm-QuineMcCluskey-0.16.tar.gz
  2h14m  [perl #8910] Subroutine doesn't create elements.
  1h57m  [perl #132648] Cwd: different return values between pure perl
         and XS variants
  1h47m  [perl #132695] BBC Catalyst tests broken by
         abda9fe0fe75ae824723761c1c98af958f17a41c
  1h32m  [perl #47602] floating-point binary-decimal conversion bugs
  1h01m  [perl #132727] BBC Class::MethodMaker broken by
         6661956a23de82b41adc406200054293d6d7aded
    53m  [perl #132425] Suggested warning on attempt to 'use' with
         arguments when no import() subroutine exists
    52m  create tickets
    49m  [perl #30123] foreach ("", "a".."zzzzzz") confuses range
         optimizer
    48m  [perl #8045] Hash keys are not always parsed as strings if
         the arrow is omitted
    29m  [perl #16113] open on localized *F pb
    29m  [perl #132788] Blead Breaks CPAN:
         LEMBARK/Object-Trampoline-1.42.tar.gz
    28m  perldebguts
    25m  [perl #132775] Blead Breaks CPAN:
         SMUELLER/Math-Clipper-1.23.tar.gz
    21m  Darwin mkostemp() misconfiguration
    21m  [perl #18581] memory leak: if($foo++){} and = overloading
    10m  [perl #54412] Mistake in perlipc doc, perl 5.10.0
     9m  [perl #132100] [PATCH] missing newline after "Unable to flush
         stdout: ..."
     9m  perldelta
     8m  [perl #132777] Blead Breaks CPAN:
         GRAY/POSIX-RT-Spawn-0.11.tar.gz
     7m  review commits
 32h53m  TOTAL

At the end of the month there was 71h19m of authorised working time
remaining in this grant.

End Point Blog: Regionation with PostGIS

Coral reefs map

Recently a co-worker handed me a KML file and said, in essence, “This file takes too long for the Liquid Galaxy to load and render. What can you do to make it faster?” It’s a common problem for large data sets, no matter the display platform. For the Liquid Galaxy the common first response is “Regionate!”

Though your dictionary may claim otherwise, for purposes of this post the word means to group a set of geographic data into regions of localized objects. This is sometimes also called “spatial clustering” or “geographic clustering”. After grouping objects into geographically similar clusters, we can then use the KML Region object to tell Google Earth to render the full detail of a region only when the current view shows enough of that region to justify spending the processing time. Although the “Pro” version of Google Earth offers an automated regionation feature, it has some limitations. I’d like to compare it to some alternatives available in PostgreSQL and PostGIS.

Data sets

For this experiment I’ve chosen a few different freely available datasets, with the aim to use different geographic data types, distributed in different ways. First, I found a database of 49,000 on-shore wind turbines across the United States. The data set represents each turbine with a single point, and annotates it with its type, manufacturer, capacity, and many other characteristics. In this sample image, notice how the data are distributed, with compact clusters of many points, scattered sparsely across the landscape.

Wind turbine dataset

I wanted data sets with different types of geometric objects, so the second data set I chose describes geologic fault lines within the United States as multiple strings of connected line segments, comprising what’s known as a multilinestring object (see the OpenGIS Consortium Simple Features for SQL specification for more details on multilinestring data and other GIS data types). Like the wind turbine data, this dataset also annotates each fault with other attributes, though we won’t use those annotations here. The faults are distributed in less obvious clusters than the wind turbines, though some clustering is still easily visible.

Faults dataset

The last data set uses a third geographic type, the multipolygon, to represent the perimeter of wildfires in a portion of the western United States. The data set combines wildfire data from a variety of freely available sources, and though it may not be comprehensive, it serves nicely to experiment with regionating polygon data. Note that across the area covered by the dataset, the polygons are quite thickly distributed.

Wildfire dataset

Regionating with Google Earth Pro

Having decided which data sets to use, my first step was to download each one, and import them into three different tables in a PostGIS database. The precise details are beyond the scope of this article, but for those interested in playing along at home, I recommend trying qgis. For data processing and storage, I used PostgreSQL 10 and PostGIS 2.4. Since my goal is to compare these data with Google Earth Pro’s built-in regionation tools, for the next step I created KML files from my PostGIS data with a simple Python script, and used Google Earth Pro to regionate each data set. Here I ran into a few surprises, first that Google Earth Pro doesn’t actually display anything but menus on my Ubuntu 16.04 system. The computer I’m using is a little unusual, so I’m willing to chalk this up to some peculiarity of the computer, and I won’t hold this one against Google Earth Pro. It certainly made testing a little difficult, as I had to use my Google Earth Pro installation for regionating, and either my Google Earth (note: not “Pro”) installation, or qgis, to view the results.

The second surprise was the sheer time it took to regionate these data. The wildfire and turbine data only took a couple of minutes, but my computer spent more than half an hour to regionate the faults data. My system isn’t especially high powered, but this still caught me off guard, and as we’ll see, the other regionation methods I tried took far less time than Google Earth Pro for this data set.

Finally, Google Earth Pro’s regionation system offers no customization at all; the user simply enters a KML file to regionate and a directory for the results. The PostGIS-based options, as we’ll see below, have parameters which affect the size and number of resulting clusters, but with Google Earth Pro, you’re stuck with whatever it gives you. For each of the three data sets, Google Earth Pro gave me thousands of KML files as output -- just over 2000 files for the wildfire data, and over 17,000 for the faults data. Each file contains all the placemarks for one of the regions the system found. In any regionation problem I’ve had, the end goal was always to create a single, final KML file in the end, so in practice I’d likely have to reprocess each of these files (or rather, I’d write code to do it), but this isn’t much different from the other methods we’ll explore, which create tens, hundreds, or even thousands of database entries and which likewise need to be processed in software.

Basic regionating with PostGIS

PostGIS offers several functions to help us cluster geographic data. The two I used for this test were ST_ClusterKMeans and ST_ClusterDBScan. Both are window functions which calculate clusters and return an integer cluster ID for each input row. So after making sure the table for each of my three data sets contained a primary key called “id”, a geometry field called “location”, and an empty integer field called “cluster_id”, I could cluster a dataset with one query, such as this one for the turbine data:

UPDATE turbines
    SET cluster_id = cid
    FROM (
        SELECT
            id,
            ST_ClusterDBScan(location, 0.5, 3)
                OVER () AS cid
        FROM turbines
    ) foo
    WHERE foo.id = turbines.id;

The two functions I mentioned above have advantages and disadvantages for different situations, and I knew I’d want to experiment with them and their arguments, so I wrote another Python script to digest the clustering results and turn them into KML, to make it easy to compare different values and see which I liked best. When zoomed out, each KML file displays an approximate outline of the group of objects in one region, and as the user zooms in, the objects themselves appear. For instance, in this screenshot from the wind turbine data set, you can see several regions outlined in white, and the individual turbines from one region appearing at the top of the image.

Regionated wind turbines

KML cognoscenti may note that regions defined in KML are always rectangular, but the shapes in the image above are not. In fact, the image shows “concave hulls” drawn around the objects in the region, the result of the PostGIS ST_ConcaveHull function. I outlined the regions with these hulls rather than with rectangles, to get a better idea of where the objects in each cluster actually lay within the rectangular region. This was also helpful for another data set I was working on at the time, not described in this article.

To create these regions and the corresponding hulls, I need to know each region’s bounding box in latitude and longitude coordinates, and the hull to draw. The script gets that information like this:

WITH clusters AS (
    SELECT
        cluster_id,
        ST_Collect(location) AS locations
    FROM turbines
    GROUP BY cluster_id
)
SELECT
    ST_XMin(locations) AS xmin,
    ST_XMax(locations) AS xmax,
    ST_YMin(locations) AS ymin,
    ST_YMax(locations) AS ymax,
    ST_AsKML(ST_ConcaveHull(locations, 0.99)) AS envelope
FROM clusters;
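
My script for this step was written in Python; purely as an illustration, a rough Perl equivalent of the same glue step using DBI might look like the following, with placeholder connection details and a heavily simplified KML skeleton (not a complete document):

use strict;
use warnings;
use DBI;

# Placeholder connection settings; adjust for your own PostGIS database.
my $dbh = DBI->connect('dbi:Pg:dbname=gis;host=localhost',
                       'gisuser', 'secret', { RaiseError => 1 });

my $sth = $dbh->prepare(q{
    WITH clusters AS (
        SELECT cluster_id, ST_Collect(location) AS locations
        FROM turbines
        GROUP BY cluster_id
    )
    SELECT ST_XMin(locations), ST_XMax(locations),
           ST_YMin(locations), ST_YMax(locations),
           ST_AsKML(ST_ConcaveHull(locations, 0.99))
    FROM clusters
});
$sth->execute;

while (my ($xmin, $xmax, $ymin, $ymax, $hull_kml) = $sth->fetchrow_array) {
    # One Region per cluster: the LatLonAltBox is the cluster bounding box,
    # and the Placemark carries the concave hull returned by PostGIS.
    print "<Region><LatLonAltBox>",
          "<north>$ymax</north><south>$ymin</south>",
          "<east>$xmax</east><west>$xmin</west>",
          "</LatLonAltBox></Region>\n",
          "<Placemark>$hull_kml</Placemark>\n";
}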

PostGIS regionation functions

The two functions I mentioned deserve some further exploration. First is ST_ClusterKMeans, which, as you’d expect from the name, groups objects using a k-means clustering algorithm. Its only parameters are the geometric data objects themselves, and the maximum number of clusters the user wants it to find. The algorithm will make the clusters as small as they need to be, in order to come up with the number the user requests, though of course it won’t generate more clusters than there are input data points.

The other function I used is called ST_ClusterDBScan, an implementation of the DBSCAN algorithm. This algorithm accepts two parameters, a distance called eps, and a number of points called minpoints. A discussion of their meaning is out of the scope of this article; suffice it to say that modifying these values will adjust the size and number of resulting clusters.

Which function works best for a given situation, and with which parameters, depends largely on the specific needs of the end user, as well as the type and distribution of the input data. For example, compare these two results, which regionate the faults data set using first DBSCAN, then K-means.

Faults data, DBSCAN clustering

Faults data, K-means clustering

Aside from the obvious advantage of customizability, the PostGIS regionation is much faster than Google Earth Pro’s regionator. The DBSCAN clustering shown above took a bit over two minutes, and the K-means clustering was done in just six seconds. Of course this timing varies with different parameters, and it wasn’t terribly difficult to find a DBSCAN invocation that would take over ten minutes on the faults data. But that’s still an improvement over Google Earth Pro’s half-hour run times.

Here’s another example, taken from the wind turbine data, again with the DBSCAN results first, followed by K-means.

Wind turbine data, DBSCAN clustering

Wind turbine data, K-means clustering

Applications

Obviously, beauty is in the eye of the beholder, and there are applications for which each of these clustering methods would be the better choice. I’ll finish with one more example from a data set I haven’t yet talked about. It comes from NOAA and shows coral reefs determined to be under threat from various risks, as well as the predicted development of that risk level through the year 2050. For these data, I envisioned regionating them so that when zoomed out, the user would see the general boundaries of a cluster, colored in a way that reflected the level of assessed risk. Zooming in, the user would see the large clusters fade into smaller ones, which in turn would fade into the full detail. This is where I began using the “concave hulls” mentioned earlier. KML Regions let us specify fades, so that one level of detail fades out as another fades in, and I used this along with two levels of clustering to create the data set shown in the screen captures here. First, from a high altitude, the user sees the largest, least-detailed clusters, with colors representing the assessed threat.

Coral reefs, widest zoom

Zooming in, the larger clusters fade into smaller ones. Here you can see some of the most detailed layers already beginning to emerge. In the top right, you can also see one of the original clusters fading out.

Coral reefs, mid-range zoom

Finally, at even closer zoom levels, we see mostly the original reef data.

Coral reefs, closest zoom

I find this presentation both visually beautiful and technically interesting. Please comment on the uses you can envision for this sort of processing.

Perl Foundation News: Maintaining Perl 5 (Tony Cook): December 2017 Report

This is a monthly report by Tony Cook on his grant under the Perl 5 Core Maintenance Fund. We thank the TPF sponsors for making this grant possible.

Approximately 4 tickets were reviewed or worked on.

[Hours]         [Activity]
 22.84          #127743 32-bit build warnings/build issues
                #127743 finish up 32-bit testing, commits, work on other
                unaddressed Storable issues, 64-bit frozen data
                #127743 handle 32-bit frozen data as unsigned, add support
                for 64-bit frozen data, testing
                #127743 more testing, check g(un)zip exist before we use
                them
                #127743 work through CPAN queue, fail to reproduce #59531,
                reproduce and fix #118551
                #127743 try to reproduce #131136, track down cause (long
                double padding)
                #127743 work up a fix for the padding issue, testing with
                various configs
                #127743 systematic look at Storable issues, reduce
                performance issue on debugging builds
                #127743 test code for regexp cloning
                #127743 more regexp testing, fix Storable.pm dependencies
                #127743 old perl testing, cleanup
                #127743 think about fix for stack size calculation issues
  0.80          #132252 try to understand change and implications,
                thinking
  0.28          #132631 research and comment
  5.55          #50608 work out why previous changes didn’t break malice.t
                (which the regexp changes did), fix, more regexp testing
                #50608 old perl portability
                #50608 debugging on 5.6.2, try to rebuild 5.6.2 on modern
                Debian
                #50608 more debugging, drop 5.6 support for regexp thawing
======
 29.47 hours total

Dave's Free Press: Journal: Module pre-requisites analyser

Dave's Free Press: Journal: CPANdeps

Dave's Free Press: Journal: Perl isn't dieing

Dave's Free Press: Journal: YAPC::Europe 2007 report: day 3

Dave's Free Press: Journal: Devel::CheckLib can now check libraries' contents

Ocean of Awareness: Top-down parsing is guessing

Top-down parsing is guessing. Literally. Bottom-up parsing is looking.

The way you'll often hear that phrased is that top-down parsing is looking, starting at the top, and bottom-up parsing is looking, starting at the bottom. But that is misleading, because the input is at the bottom -- at the top there is nothing to look at. A usable top-down parser must have a bottom-up component, even if that component is just lookahead.

A more generous, but still accurate, way to describe the top-down component of parsers is "prediction". And prediction is, indeed, a very useful component of a parser, when used in combination with other techniques.

Of course, if a parser does nothing but predict, it can predict only one input. Top-down parsing must always be combined with a bottom-up component. This bottom-up component may be as modest as lookahead, but it must be there or else top-down parsing is really not parsing at all.

So why is top-down parsing used so much?

Top-down parsing may be unusable in its pure form, but from one point of view that is irrelevant. Top-down parsing's biggest advantage is that it is highly flexible -- there's no reason to stick to its "pure" form.

A top-down parser can be written as a series of subroutine calls -- a technique called recursive descent. Recursive descent allows you to hook in custom-written bottom-up logic at every top-down choice point, and it is a technique which is completely understandable to programmers with little or no training in parsing theory. When dealing with recursive descent parsers, it is more useful to be a seasoned, far-thinking programmer than it is to be a mathematician. This makes recursive descent very appealing to seasoned, far-thinking programmers, and they are the audience that counts.

Switching techniques

You can even use the flexibility of top-down to switch away from top-down parsing. For example, you could claim that a top-down parser could do anything my own parser (Marpa) could do, because a recursive descent parser can call a Marpa parser.

A less dramatic switchoff, and one that still leaves the parser with a good claim to be basically top-down, is very common. Arithmetic expressions are essential for a computer language. But they are also among the many things top-down parsing cannot handle, even with ordinary lookahead. Even so, most computer languages these days are parsed top-down -- by recursive descent. These recursive descent parsers deal with expressions by temporarily handing control over to a bottom-up operator precedence parser. Neither of these parsers is extremely smart about the hand-over and hand-back -- it is up to the programmer to make sure the two play together nicely. But used with caution, this approach works.

Top-down parsing and language-oriented programming

But what about taking top-down methods into the future of language-oriented programming, extensible languages, and grammars which write grammars? Here we are forced to confront the reality -- that the effectiveness of top-down parsing comes entirely from the foreign elements that are added to it. Starting from a basis of top-down parsing is literally starting with nothing. As I have shown in more detail elsewhere, top-down techniques simply do not have enough horsepower to deal with grammar-driven programming.

Perl 6 grammars are top-down -- PEG with lots of extensions. These extensions include backtracking, backtracking control, a new style of tie-breaking and lots of opportunity for the programmer to intervene and customize everything. But behind it all is a top-down parse engine.

One aspect of Perl 6 grammars might be seen as breaking out of the top-down trap. That trick of switching over to a bottom-up operator precedence parser for expressions, which I mentioned above, is built into Perl 6 and semi-automated. (I say semi-automated because making sure the two parsers "play nice" with each other is not automated -- that's still up to the programmer.)

As far as I know, this semi-automation of expression handling is new with Perl 6 grammars, and it may prove handy for duplicating what is done in recursive descent parsers. But it adds no new technique to those already in use. And features like

  • multiple types of expression, which can be told apart based on their context,
  • n-ary expressions for arbitrary n, and
  • the autogeneration of multiple rules, each allowing a different precedence scheme, for expressions of arbitrary arity and associativity,

all of which are available and in current use in Marpa, are impossible for the technology behind Perl 6 grammars.

I am a fan of the Perl 6 effort. Obviously, I have doubts about one specific set of hopes for Perl 6 grammars. But these hopes have not been central to the Perl 6 effort, and I will be an eager student of the Perl 6 team's work over the coming months.

Comments

To learn more about Marpa, there's the official web site maintained by Ron Savage. I also have a Marpa web site. Comments on this post can be made in Marpa's Google group, or on our IRC channel: #marpa at freenode.net.

Dave's Free Press: Journal: I Love Github

Dave's Free Press: Journal: Palm Treo call db module

Dave's Free Press: Journal: Graphing tool

Dave's Free Press: Journal: Travelling in time: the CP2000AN

Dave's Free Press: Journal: XML::Tiny released

Dave's Free Press: Journal: YAPC::Europe 2007 report: day 1

Ocean of Awareness: Parsing: an expanded timeline

The fourth century BCE: In India, Panini creates a sophisticated description of the Sanskrit language, exact and complete, and including pronunciation. Sanskrit could be recreated using nothing but Panini's grammar. Panini's grammar is probably the first formal system of any kind, predating Euclid. Even today, nothing like it exists for any other natural language of comparable size or corpus. Panini is the object of serious study today. But in the 1940's and 1950's Panini is almost unknown in the West. His work has no direct effect on the other events in this timeline.

1943: Emil Post defines and studies a formal rewriting system using productions. With this, the process of reinventing Panini in the West begins.

1948: Claude Shannon publishes the foundation paper of information theory. Andrey Markov's finite state processes are used heavily.

1952: Grace Hopper writes a linker-loader and describes it as a "compiler". She seems to be the first person to use this term for a computer program. Hopper uses the term "compiler" in its original sense: "something or someone that brings other things together".

1954: At IBM, a team under John Backus begins working on the language which will be called FORTRAN. The term "compiler" is still being used in Hopper's looser sense, instead of its modern one. In particular, there is no implication that the output of a "compiler" is ready for execution by a computer. One 1954 "compiler", for example, produces output containing relative addresses, which need to be translated by hand before a machine can execute them.

1955: Noam Chomsky is awarded a Ph.D. in linguistics and accepts a teaching post at MIT. MIT does not have a linguistics department and Chomsky, in his linguistics course, is free to teach his own approach, highly original and very mathematical.

1956: Chomsky publishes the paper which is usually considered the foundation of Western formal language theory. The paper advocates a natural language approach that involves

  • a bottom layer, using Markov's finite state processes;
  • a middle, syntactic layer, using context-free grammars and context-sensitive grammars; and
  • a top layer, which involves mappings or "transformations" of the output of the syntactic layer.

These layers resemble, and will inspire, the lexical, syntactic and AST transformation phases of modern parsers. For finite state processes, Chomsky acknowledges Markov. The other layers seem to be Chomsky's own formulations -- Chomsky does not cite Post's work.

1957: Stephen Kleene discovers regular expressions, a very handy notation for Markov's processes. Regular expressions turn out to describe exactly the mathematical objects being studied as finite state automata, as well as some of the objects being studied as neural nets.

1957: Noam Chomsky publishes Syntactic Structures, one of the most influential books of all time. The orthodoxy in 1957 is structural linguistics which argues, with Sherlock Holmes, that "it is a capital mistake to theorize in advance of the facts". Structuralists start with the utterances in a language, and build upward.

But Chomsky claims that without a theory there are no facts: there is only noise. The Chomskyan approach is to start with a grammar, and use the corpus of the language to check its accuracy. Chomsky's approach will soon come to dominate linguistics.

1957: Backus's team makes the first FORTRAN compiler available to IBM customers. FORTRAN is the first high-level language that will find widespread implementation. As of this writing, it is the oldest language that survives in practical use. FORTRAN is a line-by-line language and its parsing is primitive.

1958: John McCarthy's LISP appears. LISP goes beyond the line-by-line syntax -- it is recursively structured. But the LISP interpreter does not find the recursive structure: the programmer must explicitly indicate the structure herself, using parentheses.

1959: Backus invents a new notation to describe the IAL language (aka ALGOL). Backus's notation is influenced by his study of Post -- he seems not to have read Chomsky until later.

1960: Peter Naur improves the Backus notation and uses it to describe ALGOL 60. The improved notation will become known as Backus-Naur Form (BNF).

1960: The ALGOL 60 report specifies, for the first time, a block structured language. ALGOL 60 is recursively structured but the structure is implicit -- newlines are not semantically significant, and parentheses indicate syntax only in a few specific cases. The ALGOL compiler will have to find the structure. It is a case of 1960's optimism at its best. As the ALGOL committee is well aware, a parsing algorithm capable of handling ALGOL 60 does not yet exist. But the risk they are taking will soon pay off.

1960: A.E. Glennie publishes his description of a compiler-compiler. Glennie's "universal compiler" is more of a methodology than an implementation -- the compilers must be written by hand. Glennie credits both Chomsky and Backus, and observes that the two notations are "related". He also mentions Post's productions. Glennie may have been the first to use BNF as a description of a procedure instead of as the description of a Chomsky grammar. Glennie points out that the distinction is "important".

Chomskyan BNF and procedural BNF: BNF, when used as a Chomsky grammar, describes a set of strings, and does not describe how to parse strings according to the grammar. BNF notation, if used to describe a procedure, is a set of instructions, to be tried in some order, and used to process a string. Procedural BNF describes a procedure first, and a language only indirectly.

Both procedural and Chomskyan BNF describe languages, but usually not the same language. That is,

  • Suppose D is some BNF description.
  • Let P(D) be D interpreted as a procedure,
  • Let L(P(D)) be the language which the procedure P(D) parses.
  • Let G(D) be D interpreted as a Chomsky grammar.
  • Let L(G(D)) be the language which the grammar G(D) describes.
  • Then, usually, L(P(D)) != L(G(D)).

The pre-Chomskyan approach, using procedural BNF, is far more natural to someone trained as a computer programmer. The parsing problem appears to the programmer in the form of strings to be parsed, exactly the starting point of procedural BNF and pre-Chomsky parsing.

Even when the Chomskyan approach is pointed out, it does not at first seem very attractive. With the pre-Chomskyan approach, the examples of the language more or less naturally lead to a parser. In the Chomskyan approach the programmer has to search for an algorithm to parse strings according to his grammar -- and the search for good algorithms to parse Chomskyan grammars has proved surprisingly long and difficult. Handling semantics is more natural with a Chomskyan approach. But, using captures, semantics can be added to a pre-Chomskyan parser and, with practice, this seems natural enough.

Despite the naturalness of the pre-Chomskyan approach to parsing, we will find that the first fully-described automated parsers are Chomskyan. This is a testimony to Chomsky's influence at the time. We will also see that Chomskyan parsers have been dominant ever since.

1961: In January, Ned Irons publishes a paper describing his ALGOL 60 parser. It is the first paper to fully describe any parser. The Irons algorithm is Chomskyan and top-down with a "left corner" element. The Irons algorithm is general, meaning that it can parse anything written in BNF. It is syntax-driven (aka declarative), meaning that the parser is actually created from the BNF -- the parser does not need to be hand-written.

1961: Peter Lucas publishes the first description of a purely top-down parser. This can be considered to be recursive descent, though in Lucas's paper the algorithm has a syntax-driven implementation, useable only for a restricted class of grammars. Today we think of recursive descent as a methodology for writing parsers by hand. Hand-coded approaches became more popular in the 1960's due to three factors:

  • Memory and CPU were both extremely limited. Hand-coding paid off, even when the gains were small.
  • Non-hand coded top-down parsing, of the kind Lucas's syntax-driven approach allowed, is a very weak parsing technique. It was (and still is) often necessary to go beyond its limits.
  • Top-down parsing is intuitive -- it essentially means calling subroutines. It therefore requires little or no knowledge of parsing theory. This makes it a good fit for hand-coding.

1963: L. Schmidt, Howard Metcalf, and Val Schorre present papers on syntax-directed compilers at a Denver conference.

1964: Schorre publishes a paper on the Meta II "compiler writing language", summarizing the papers of the 1963 conference. Schorre cites both Backus and Chomsky as sources for Meta II's notation. Schorre notes that his parser is "entirely different" from that of Irons 1961 -- in fact it is pre-Chomskyan. Meta II is a template, rather than something that readers can use, but in principle it can be turned into a fully automated compiler-compiler.

1965: Don Knuth invents LR parsing. The LR algorithm is deterministic, Chomskyan and bottom-up, but it is not thought to be practical. Knuth is primarily interested in the mathematics.

1968: Jay Earley invents the algorithm named after him. Like the Irons algorithm, Earley's algorithm is Chomskyan, syntax-driven and fully general. Unlike the Irons algorithm, it does not backtrack. Earley's algorithm is both top-down and bottom-up at once -- it uses dynamic programming and keeps track of the parse in tables. Earley's approach makes a lot of sense and looks very promising indeed, but there are three serious issues:

  • First, there is a bug in the handling of zero-length rules.
  • Second, it is quadratic for right recursions.
  • Third, the bookkeeping required to set up the tables is, by the standards of 1968 hardware, daunting.

1969: Frank DeRemer describes a new variant of Knuth's LR parsing. DeRemer's LALR algorithm requires only a stack and a state table of quite manageable size. LALR looks practical.

1969: Ken Thompson writes the "ed" editor as one of the first components of UNIX. At this point, regular expressions are an esoteric mathematical formalism. Through the "ed" editor and its descendants, regular expressions will become an everyday part of the working programmer's toolkit.

Recognizers: In comparing algorithms, it can be important to keep in mind whether they are recognizers or parsers. A recognizer is a program which takes a string and produces a "yes" or "no" according to whether the string is part of the language. Regular expressions are typically used as recognizers. A parser is a program which takes a string and produces a tree reflecting its structure according to a grammar. The algorithm for a compiler clearly must be a parser, not a recognizer. Recognizers can be, to some extent, used as parsers by introducing captures.

1972: Alfred Aho and Jeffrey Ullman publish a two volume textbook summarizing the theory of parsing. This book is still important. It is also distressingly up-to-date -- progress in parsing theory slowed dramatically after 1972. Aho and Ullman describe a straightforward fix to the zero-length rule bug in Earley's original algorithm. Unfortunately, this fix involves adding even more bookkeeping to Earley's.

1972: Under the names TDPL and GTDPL, Aho and Ullman investigate the non-Chomskyan parsers in the Schorre lineage. They note that "it can be quite difficult to determine what language is defined by a TDPL parser". That is, GTDPL parsers do whatever they do, and that whatever is something the programmer in general will not be able to describe. The best a programmer can usually do is to create a test suite and fiddle with the GTDPL description until it passes. Correctness cannot be established in any stronger sense. GTDPL is an extreme form of the old joke that "the code is the documentation" -- with GTDPL nothing documents the language of the parser, not even the code.

GTDPL's obscurity buys nothing in the way of additional parsing power. Like all non-Chomskyan parsers, GTDPL is basically an extremely powerful recognizer. Pressed into service as a parser, it is comparatively weak. As a parser, GTDPL is essentially equivalent to Lucas's 1961 syntax-driven algorithm, which was in turn a restricted form of recursive descent.

At or around this time, rumor has it that the main line of development for GTDPL parsers is classified secret by the US government. GTDPL parsers have the property that even small changes in GTDPL parsers can be very labor-intensive. For some government contractors, GTDPL parsing provides steady work for years to come. Public interest in GTDPL fades.

1975: Bell Labs converts its C compiler from hand-written recursive descent to DeRemer's LALR algorithm.

1977: The first "Dragon book" comes out. This soon-to-be classic textbook is nicknamed after the drawing on the front cover, in which a knight takes on a dragon. Emblazoned on the knight's lance are the letters "LALR". From here on out, to speak lightly of LALR will be to besmirch the escutcheon of parsing theory.

1979: Bell Laboratories releases Version 7 UNIX. V7 includes what is, by far, the most comprehensive, useable and easily available compiler writing toolkit yet developed.

1979: Part of the V7 toolkit is Yet Another Compiler Compiler (YACC). YACC is LALR-powered. Despite its name, YACC is the first compiler-compiler in the modern sense. For some useful languages, the process of going from Chomskyan specification to executable is fully automated. Most practical languages, including the C language and YACC's own input language, still require manual hackery. Nonetheless, after two decades of research, it seems that the parsing problem is solved.

1987: Larry Wall introduces Perl 1. Perl embraces complexity like no previous language. Larry uses YACC and LALR very aggressively -- to my knowledge more aggressively than anyone before or since.

1991: Joop Leo discovers a way of speeding up right recursions in Earley's algorithm. Leo's algorithm is linear for just about every unambiguous grammar of practical interest, and many ambiguous ones as well. In 1991, hardware is six orders of magnitude faster than 1968 hardware, so the issue of bookkeeping overhead has receded in importance. This is a major discovery. When it comes to speed, the game has changed in favor of the Earley algorithm.

But Earley parsing is almost forgotten. Twenty years will pass before anyone writes a practical implementation of Leo's algorithm.

1990's: Earley's is forgotten. So everyone in LALR-land is content, right? Wrong. Far from it, in fact. Users of LALR are making unpleasant discoveries. While LALR automatically generates their parsers, debugging them is so hard they could just as easily write the parser by hand. Once debugged, their LALR parsers are fast for correct inputs. But almost all they tell the users about incorrect inputs is that they are incorrect. In Larry's words, LALR is "fast but stupid".

2000: Larry Wall decides on a radical reimplementation of Perl -- Perl 6. Larry does not even consider using LALR again.

2002: John Aycock and R. Nigel Horspool publish their attempt at a fast, practical Earley's parser. Missing from it is Joop Leo's improvement -- they seem not to be aware of it. Their own speedup is limited in what it achieves and the complications it introduces can be counter-productive at evaluation time. But buried in their paper is a solution to the zero-length rule bug. And this time the solution requires no additional bookkeeping.

2004: Bryan Ford publishes his paper on PEG. Implementers by now are avoiding YACC, and it seems as if there might soon be no syntax-driven algorithms in practical use. Ford fills this gap by repackaging the nearly-forgotten GTDPL. Ford adds packratting, so that PEG is always linear, and provides PEG with an attractive new syntax. But nothing has been done to change the problematic behaviors of GTDPL.

2006: GNU announces that the GCC compiler's parser has been rewritten. For three decades, the industry's flagship C compilers have used LALR as their parser -- proof of the claim that LALR and serious parsing are equivalent. Now, GNU replaces LALR with the technology that it replaced a quarter century earlier: recursive descent.

Today: After five decades of parsing theory, the state of the art seems to be back where it started. We can imagine someone taking Ned Irons's original 1961 algorithm from the first paper ever published describing a parser, and republishing it today. True, he would have to translate its code from the mix of assembler and ALGOL into something more fashionable, say Haskell. But with that change, it might look like a breath of fresh air.

Marpa: an afterword

The recollections of my teachers cover most of this timeline. My own begin around 1970. Very early on, as a graduate student, I became unhappy with the way the field was developing. Earley's algorithm looked interesting, and it was something I returned to on and off.

The original vision of the 1960's was a parser that was

  • efficient,
  • practical,
  • general, and
  • syntax-driven.

By 2010 this vision seemed to have gone the same way as many other 1960's dreams. The rhetoric stayed upbeat, but parsing practice had become a series of increasingly desperate compromises.

But, while nobody was looking for them, the solutions to the problems encountered in the 1960's had appeared in the literature. Aycock and Horspool had solved the zero-length rule bug. Joop Leo had found the speedup for right recursion. And the issue of bookkeeping overhead had pretty much evaporated on its own. Machine operations are now a billion times faster than in 1968, and are probably no longer relevant in any case -- cache misses are now the bottleneck.

The programmers of the 1960's would have been prepared to trust a fully declarative Chomskyan parser. With the experience with LALR in their collective consciousness, modern programmers might be more guarded. As Mark Twain observed, "Once a cat's been burned, he won't even sit on a cold stove." But I found it straightforward to rearrange the Earley parse engine to allow efficient event-driven handovers between procedural and syntax-driven logic. And Earley tables provide the procedural logic with full knowledge of the state of the parse so far, so that Earley's algorithm is a better platform for hand-written procedural logic than recursive descent.

References, comments, etc.

My implementation of Earley's algorithm is called Marpa. For more about Marpa, there is the semi-official web site, maintained by Ron Savage. The official, but more limited, Marpa website is my personal one. Comments on this post can be made in Marpa's Google group, or on our IRC channel: #marpa at freenode.net.

Dave's Free Press: Journal: Thanks, Yahoo!

Dave's Free Press: Journal: YAPC::Europe 2007 report: day 2

Ocean of Awareness: What are the reasonable computer languages?

"You see things; and you say 'Why?' But I dream things that never were; and I say 'Why not?'" -- George Bernard Shaw

In the 1960's and 1970's computer languages were evolving rapidly. It was not clear which way they were headed. Would most programming be done with general-purpose languages? Or would programmers create a language for every task domain? Or even for every project? And, if lots of languages were going to be created, what kinds of languages would be needed?

It was in that context that Čulik and Cohen, in a 1973 paper, outlined what they thought programmers would want and should have. In keeping with the spirit of the time, it was quite a lot:

  • Programmers would want to extend their grammars with new syntax, including new kinds of expressions.
  • Programmers would also want to use tools that automatically generated new syntax.
  • Programmers would not want to, and especially in the case of auto-generated syntax would usually not be able to, massage the syntax into very restricted forms. Instead, programmers would create grammars and languages which required unlimited lookahead to disambiguate, and they would require parsers which could handle these grammars.
  • Finally, programmers would need to be able to rely on all of this parsing being done in linear time.

Today, we think we know that Čulik and Cohen's vision was naive, because we think we know that parsing technology cannot support it. We think we know that parsing is much harder than they thought.

The eyeball grammars

As a thought problem, consider the "eyeball" class of grammars. The "eyeball" class of grammars contains all the grammars that a human can parse at a glance. If a grammar is in the eyeball class, but a computer cannot parse it, it presents an interesting choice. Either,

  • your computer is not using the strongest practical algorithm; or
  • your mind is using some power which cannot be reduced to a machine computation.

There are some people out there (I am one of them) who don't believe that everything the mind can do reduces to a machine computation. But even those people will tend to go for the choice in this case: There must be some practical computer parsing algorithm which can do at least as well at parsing as a human can do by "eyeball". In other words, the class of "reasonable grammars" should contain the eyeball class.

Čulik and Cohen's candidate for the class of "reasonable grammars" was the class of grammars that a deterministic parse engine could parse if it had a lookahead that was infinite, but restricted to distinguishing between regular expressions. They called these the LR-regular, or LRR, grammars. And the LRR grammars do in fact seem to be a good first approximation to the eyeball class. They do not allow lookahead that contains things that you have to count, like palindromes. And, while I'd be hard put to eyeball every possible string for every possible regular expression, intuitively the concept of scanning for a regular expression does seem close to capturing the idea of glancing through a text looking for a telltale pattern.

So what happened?

Alas, the algorithm in the Čulik and Cohen paper turned out to be impractical. But in 1991, Joop Leo discovered a way to adapt Earley's algorithm to parse the LRR grammars in linear time, without doing the lookahead. And Leo's algorithm does have a practical implementation: Marpa.

References, comments, etc.

To learn more about Marpa, there's the official web site maintained by Ron Savage. I also have a Marpa web site. Comments on this post can be made in Marpa's Google group, or on our IRC channel: #marpa at freenode.net.

Dave's Free Press: Journal: YAPC::Europe 2007 travel plans

Dave's Free Press: Journal: Wikipedia handheld proxy

Dave's Free Press: Journal: Bryar security hole

Dave's Free Press: Journal: POD includes

Dave's Free Press: Journal: cgit syntax highlighting

Dave's Free Press: Journal: CPAN Testers' CPAN author FAQ

Ocean of Awareness: What parser do birds use?

"Here we provide, to our knowledge, the first unambiguous experimental evidence for compositional syntax in a non-human vocal system." -- "Experimental evidence for compositional syntax in bird calls", Toshitaka N. Suzuki, David Wheatcroft & Michael Griesser Nature Communications 7, Article number: 10986

In this post I look at a subset of the language of the Japanese great tit, also known as Parus major. The above cited article presents evidence that bird brains can parse this language. What about standard modern computer parsing methods? Here is the subset -- probably a tiny one -- of the language actually used by Parus major.

      S ::= ABC
      S ::= D
      S ::= ABC D
      S ::= D ABC
    

Classifying the Parus major grammar

Grammophone is a very handy new tool for classifying grammars. Its own parser is somewhat limited, so that it requires a period to mark the end of a rule. The above grammar is in Marpa's SLIF format, which is smart enough to use the "::=" operator to spot the beginning and end of rules, just as the human eye does. Here's the same grammar converted into a form acceptable to Grammophone:

      S -> ABC .
      S -> D .
      S -> ABC D .
      S -> D ABC .
    

Grammophone tells us that the Parus major grammar is not LL(1), but that it is LALR(1).

What does this mean?

LL(1) is the class of grammars parseable by top-down methods: it's the best class for characterizing most parsers in current use, including recursive descent, PEG, and Perl 6 grammars. All of these parsers fall short of dealing with the Parus major language.

LALR(1) is probably most well-known from its implementations in bison and yacc. While able to handle this subset of Parus's language, LALR(1) has its own, very strict, limits. Whether LALR(1) could handle the full complexity of the Parus language is a serious question. But it's a question that in practice would probably not arise. LALR(1) has horrible error handling properties.

When the input is correct and within its limits, an LALR-driven parser is fast and works well. But if the input is not perfectly correct, LALR parsers produce no useful analysis of what went wrong. If Parus hears "d abc d", a parser like Marpa, on the other hand, can produce something like this:

# * String before error: abc d\s
# * The error was at line 1, column 7, and at character 0x0064 'd', ...
# * here: d
    

Parus uses its language in predatory contexts, and one can assume that a Parus with a preference for parsers whose error handling is on an LALR(1) level will not be keeping its alleles in the gene pool for very long.
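
For readers who want to experiment, here is a minimal recognizer for the subset above, written against Marpa::R2's SLIF. It is only a sketch (the test script linked below is the fuller treatment); the two call types are represented here by the lowercase strings "abc" and "d".

#!/usr/bin/env perl
# Sketch: recognize strings in the Parus major subset with Marpa::R2.
use strict;
use warnings;
use Marpa::R2;

my $dsl = <<'END_OF_DSL';
lexeme default = latm => 1
S ::= ABC
S ::= D
S ::= ABC D
S ::= D ABC
ABC ~ [a] [b] [c]
D   ~ [d]
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );

for my $call ( 'abc', 'd', 'abc d', 'd abc', 'd abc d' ) {
    my $recce = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
    my $input = $call;
    # read() dies if a call is rejected mid-stream; value() is undef if the
    # input ends without a complete parse.
    if ( eval { $recce->read( \$input ); defined $recce->value } ) {
        print "accepted: $call\n";
    }
    else {
        print "rejected: $call\n";
    }
}

The final test string, "d abc d", is the one rejected, which is the behavior illustrated by the error report above.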

References, comments, etc.

Those readers content with sub-Parus parsing methods may stop reading here. Those with greater parsing ambitions, however, may wish to learn more about Marpa. A Marpa test script for parsing the Parus subset is in a Github gist. Marpa has a semi-official web site, maintained by Ron Savage. The official, but more limited, Marpa website is my personal one. Comments on this post can be made in Marpa's Google group, or on our IRC channel: #marpa at freenode.net.

Dave's Free Press: Journal: Thankyou, Anonymous Benefactor!

Dave's Free Press: Journal: Number::Phone release

Dave's Free Press: Journal: Ill

Dave's Free Press: Journal: CPANdeps upgrade

Dave's Free Press: Journal: YAPC::Europe 2006 report: day 3

Ocean of Awareness: Introduction to Marpa Book in progress

What follows is a summary of the features of the Marpa algorithm, followed by a discussion of potential applications. It refers to itself as a "monograph", because it is a draft of part of the introduction to a technical monograph on the Marpa algorithm. I hope the entire monograph will appear in a few weeks.

The Marpa project

The Marpa project was intended to create a practical and highly available tool to generate and use general context-free parsers. Tools of this kind had long existed for LALR and regular expressions. But, despite an encouraging academic literature, no such tool had existed for context-free parsing. The first stable version of Marpa was uploaded to a public archive on Solstice Day 2011. This monograph describes the algorithm used in the most recent version of Marpa, Marpa::R2. It is a simplification of the algorithm presented in my earlier paper.

A proven algorithm

While the presentation in this monograph is theoretical, the approach is practical. The Marpa::R2 implementation has been widely available for some time, and has seen considerable use, including in production environments. Many of the ideas in the parsing literature satisfy theoretical criteria, but in practice turn out to face significant obstacles. An algorithm may be as fast as reported, but may turn out not to allow adequate error reporting. Or a modification may speed up the recognizer, but require additional processing at evaluation time, leaving no advantage to compensate for the additional complexity.

In this monograph, I describe the Marpa algorithm as it was implemented for Marpa::R2. In many cases, I believe there are better approaches than those I have described. But I treat these techniques, however solid their theory, as conjectures. Whenever I mention a technique that was not actually implemented in Marpa::R2, I will always explicitly state that that technique is not in Marpa as implemented.

Features

General context-free parsing

As implemented, Marpa parses all "proper" context-free grammars. The proper context-free grammars are those which are free of cycles, unproductive symbols, and inaccessible symbols. Worst case time bounds are never worse than those of Earley's algorithm, and therefore never worse than O(n**3).

Linear time for practical grammars

Currently, the grammars suitable for practical use are thought to be a subset of the deterministic context-free grammars. Using a technique discovered by Joop Leo, Marpa parses all of these in linear time. Leo's modification of Earley's algorithm is O(n) for LR-regular grammars. Leo's modification also parses many ambiguous grammars in linear time.

Left-eidetic

The original Earley algorithm kept full information about the parse --- including partial and fully recognized rule instances --- in its tables. At every parse location, before any symbols are scanned, Marpa's parse engine makes available its information about the state of the parse so far. This information is in useful form, and can be accessed efficiently.

Recoverable from read errors

When Marpa reads a token which it cannot accept, the error is fully recoverable. An application can try to read another token. The application can do this repeatedly as long as none of the tokens are accepted. Once the application provides a token that is accepted by the parser, parsing will continue as if the unsuccessful read attempts had never been made.

Ambiguous tokens

Marpa allows ambiguous tokens. These are often useful in natural language processing where, for example, the same word might be a verb or a noun. Use of ambiguous tokens can be combined with recovery from rejected tokens so that, for example, an application could react to the rejection of a token by reading two others.

Using the features

Error reporting

An obvious application of left-eideticism is error reporting. Marpa's abilities in this respect are ground-breaking. For example, users typically regard an ambiguity as an error in the grammar. Marpa, as currently implemented, can detect an ambiguity and report specifically where it occurred and what the alternatives were.

Event driven parsing

As implemented, Marpa::R2 allows the user to define "events". Events can be defined that trigger when a specified rule is complete, when a specified rule is predicted, when a specified symbol is nulled, when a user-specified lexeme has been scanned, or when a user-specified lexeme is about to be scanned. A mid-rule event can be defined by adding a nulling symbol at the desired point in the rule, and defining an event which triggers when the symbol is nulled.

Ruby slippers parsing

Left-eideticism, efficient error recovery, and the event mechanism can be combined to allow the application to change the input in response to feedback from the parser. In traditional parser practice, error detection is an act of desperation. In contrast, Marpa's error detection is so painless that it can be used as the foundation of new parsing techniques.

For example, if a token is rejected, the lexer is free to create a new token in the light of the parser's expectations. This approach can be seen as making the parser's "wishes" come true, and I have called it "Ruby Slippers Parsing".

One use of the Ruby Slippers technique is to parse with a clean but oversimplified grammar, programming the lexical analyzer to make up for the grammar's short-comings on the fly. As part of Marpa::R2, the author has implemented an HTML parser, based on a grammar that assumes that all start and end tags are present. Such an HTML grammar is too simple even to describe perfectly standard-conformant HTML, but the lexical analyzer is programmed to supply start and end tags as requested by the parser. The result is a simple and cleanly designed parser that parses very liberal HTML and accepts all input files, in the worst case treating them as highly defective HTML.

Ambiguity as a language design technique

In current practice, ambiguity is avoided in language design. This is very different from the practice in the languages humans choose when communicating with each other. Human languages exploit ambiguity in order to design highly flexible, powerfully expressive languages. For example, the language of this monograph, English, is notoriously ambiguous.

Ambiguity of course can present a problem. A sentence in an ambiguous language may have undesired meanings. But note that this is not a reason to ban potential ambiguity --- it is only a problem with actual ambiguity.

Syntax errors, for example, are undesired, but nobody tries to design languages to make syntax errors impossible. A language in which every input was well-formed and meaningful would be cumbersome and even dangerous: all typos in such a language would be meaningful, and the parser would never warn the user about errors, because there would be no such thing.

With Marpa, ambiguity can be dealt with in the same way that syntax errors are dealt with in current practice. The language can be designed to be ambiguous, but any actual ambiguity can be detected and reported at parse time. This exploits Marpa's ability to report exactly where and what the ambiguity is. Marpa::R2's own parser description language, the SLIF, uses ambiguity in this way.

Auto-generated languages

In 1973, Čulik and Cohen pointed out that the ability to efficiently parse LR-regular languages opens the way to auto-generated languages. In particular, Čulik and Cohen note that a parser which can parse any LR-regular language will be able to parse a language generated using syntax macros.

Second order languages

In the literature, the term "second order language" is usually used to describe languages with features which are useful for second-order programming. True second-order languages --- languages which are auto-generated from other languages --- have not been seen as practical, since there was no guarantee that the auto-generated language could be efficiently parsed.

With Marpa, this barrier is removed. As an example, Marpa::R2's own parser description language, the SLIF, allows "precedenced rules". Precedenced rules are specified in an extended BNF. The BNF extensions allow precedence and associativity to be specified for each RHS.

Marpa::R2's precedenced rules are implemented as a true second order language. The SLIF representation of the precedenced rule is parsed to create a BNF grammar which is equivalent, and which has the desired precedence. Essentially, the SLIF does a standard textbook transformation. The transformation starts with a set of rules, each of which has a precedence and an associativity specified. The result of the transformation is a set of rules in pure BNF. The SLIF's advantage is that it is powered by Marpa, and therefore the SLIF can be certain that the grammar that it auto-generates will parse in linear time.

Notationally, Marpa's precedenced rules are an improvement over similar features in LALR-based parser generators like yacc or bison. In the SLIF, there are two important differences. First, in the SLIF's precedenced rules, precedence is generalized, so that it does not depend on the operators: there is no need to identify operators, much less class them as binary, unary, etc. This more powerful and flexible precedence notation allows the definition of multiple ternary operators, and multiple operators with arity above three.

Second, and more important, a SLIF user is guaranteed to get exactly the language that the precedenced rule specifies. The user of the yacc equivalent must hope their syntax falls within the limits of LALR.
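
To make the notation concrete, here is a small precedenced rule, loosely modeled on the calculator examples in the Marpa::R2 documentation -- a sketch, not production code. Alternatives separated by a single bar share a precedence level; a double bar starts a new, looser level; and the assoc adverbs control associativity without any operator declarations.

#!/usr/bin/env perl
# Sketch: a precedenced rule in Marpa::R2's SLIF.
use strict;
use warnings;
use Marpa::R2;

my $dsl = <<'END_OF_DSL';
:default ::= action => ::array
lexeme default = latm => 1
:start ::= Expression
Expression ::=
      Number
    | '(' Expression ')'          assoc => group
   || Expression '**' Expression  assoc => right
   || Expression '*' Expression
    | Expression '/' Expression
   || Expression '+' Expression
    | Expression '-' Expression
Number ~ [\d]+
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
my $recce   = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
my $input   = '3 + 4 * 2 ** 2';
$recce->read( \$input );
my $value_ref = $recce->value;
# $$value_ref is a nested array whose shape follows the precedence levels:
# exponentiation groups more tightly than multiplication, which groups
# more tightly than addition.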

References, comments, etc.

Marpa has a semi-official web site, maintained by Ron Savage. The official, but more limited, Marpa website is my personal one. Comments on this post can be made in Marpa's Google group, or on our IRC channel: #marpa at freenode.net.
