First Poke-Conf at Mont-Soleil - A report
[13-01-2020]

by Jose E. Marchesi

[Photo: pokists in Mont-Soleil. Caption: Poking at Mont-Soleil]

This last weekend we had the first gathering of poke developers, as part of the GNU Hackers Meeting at Mont-Soleil, in Switzerland. I can say we had a lot of fun, and it was quite a productive meeting too: many patches were written, and many technical aspects were designed and clarified.

Attendees: Bruno Haible, Egeyar Bagcioglu, John Darrington, Luca Saiu, Darshit Shah, Jose E. Marchesi.

First we gave a short introductory talk for the benefit of the GHM attendees who were not familiar with poke, followed by a quick review of recent developments. After that, we went on to discuss some serious business: handling of stream-like IO spaces, integral "atoms" in structs, the adoption of a bug tracking system for the project, how to best support Unicode and UTF-8 in poke, and many other things, some of which are summarized below.

Stream-like IO spaces, and stdin/stdout

How to handle stream-like IO devices (such as the standard input and standard output) in poke is not trivial. We considered two possibilities:

Option 1: to handle stdin and stdout in a special way at the Poke language level. This would basically mean altering the semantics of the mapping operator: mapping in a stream IO space "advances" the file position.

A little Poke program that processes TCP packets from stdin and writes them out to stdout, using Option 1, would look like:

#!/usr/bin/poke -L
!#

load tcp;

try do_stuff (Tcp_Packet @ stdin : 0#B);
until E_eof;

We quickly discovered that Option 1 is not really feasible: "advancing" the file position implicitly basically means adjusting offsets, and that totally breaks poke's mapping mechanism. Also, it is not a very "pokish" way of resolving the problem. So we had to think about an alternative.
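One way to picture the problem is the following rough model. It is written in Python rather than Poke, it was not part of the meeting, and the class `AdvancingStream` and its `map` method are hypothetical names. It only shows one reading of the issue: if mapping consumes data, the same offset no longer names the same bytes the second time it is used.

```python
import io


class AdvancingStream:
    """Hypothetical model of Option 1: every mapping consumes bytes,
    so offsets are relative to a position that keeps moving."""

    def __init__(self, raw):
        self.raw = raw

    def map(self, offset, size):
        self.raw.read(offset)        # skip `offset` bytes from the current position
        return self.raw.read(size)   # read the value, implicitly advancing past it


stream = AdvancingStream(io.BytesIO(b"AABBCC"))
first = stream.map(0, 2)   # b"AA"
again = stream.map(0, 2)   # b"BB": same offset, different data
```

A mapped value that records "offset 0" can therefore not be read again, or written back, at the place it came from unless every recorded offset is adjusted behind the scenes.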

Option 2: to handle stdin and stdout (and other stream-like devices) as regular IO spaces. This basically implies the need to "remember" the data already read from read-only streams (using either a memory buffer or a temporary file in the filesystem) and to provide a way for the user to "forget" that data when it is no longer needed.

This new operation has the following form:

forget IOS, OFFSET; -> "Forget" the data in the range 0#B..OFFSET of IOS.

If a value mapped in that range is referenced afterwards, the peek/poke PVM instructions should raise an E_eof exception (meaning in this case Early Of File, instead of End Of File :D). Forgetting an IO space that is backed by a regular file (and not a stream-like device) is a nop. Likewise, forgetting a write-only stream IO space is also a nop.
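The "remember, then forget" idea can be sketched as follows. This is a minimal model in Python, not poke's implementation: the class `RememberingStream`, its `peek` and `forget` methods, and the use of `EOFError` as a stand-in for E_eof are all assumptions made for illustration, and the memory-buffer variant is chosen over a temporary file only for brevity.

```python
import io


class RememberingStream:
    """Hypothetical model of a read-only stream exposed as a regular IO space."""

    def __init__(self, raw):
        self.raw = raw           # underlying sequential stream, e.g. sys.stdin.buffer
        self.buf = bytearray()   # bytes read so far and not yet forgotten
        self.base = 0            # absolute offset of buf[0]; anything below is forgotten

    def _fill(self, end):
        """Read sequentially until `end` bytes of the stream are known, if possible."""
        while self.base + len(self.buf) < end:
            chunk = self.raw.read(end - self.base - len(self.buf))
            if not chunk:
                break            # stream exhausted
            self.buf += chunk

    def peek(self, offset, size=1):
        """Return `size` bytes at absolute `offset`, or raise EOFError."""
        if offset < self.base:
            raise EOFError("early of file: data was forgotten")
        self._fill(offset + size)
        start = offset - self.base
        data = bytes(self.buf[start:start + size])
        if len(data) < size:
            raise EOFError("end of file")
        return data

    def forget(self, offset):
        """Drop the remembered bytes in the range [0, offset)."""
        self._fill(offset)
        drop = min(max(offset - self.base, 0), len(self.buf))
        del self.buf[:drop]
        self.base += drop
```

With `s = RememberingStream(io.BytesIO(b"abcdef"))`, `s.peek(2, 2)` returns `b"cd"` and `s.peek(0)` still works afterwards because the data is remembered. After `s.forget(3)`, `s.peek(3)` returns `b"d"` while `s.peek(1)` raises `EOFError`.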

Making streams look like regular IO spaces also requires a way to append to them, in the case of write-only streams. For that purpose we decided to add a new built-in function iosize that, given the ID of an IO space, returns the offset past the end of the space. This is the offset to be used in a left-hand-side mapping.

But what should iosize return if it is invoked on a read-only stream? Should it raise an exception? Bruno thinks we should raise an exception for stdin because otherwise it may reveal some details of the buffering strategy. John and Darshit agree.
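The append pattern can be modelled in a few lines. Again this is a Python sketch under assumptions, not poke code: `AppendOnlyStream`, `ReadOnlyStream`, and `poke` are hypothetical names, rejecting writes anywhere but at the end is an assumption of this model, and the exception raised by `iosize` on a read-only stream follows Bruno's suggestion rather than a settled decision.

```python
import io


class AppendOnlyStream:
    """Hypothetical model of a write-only stream IO space."""

    def __init__(self, raw):
        self.raw = raw    # underlying sequential stream, e.g. sys.stdout.buffer
        self.size = 0     # number of bytes written so far

    def iosize(self):
        """Offset just past the end of the space, where the next append goes."""
        return self.size

    def poke(self, offset, data):
        # Assumption of this sketch: only appending at the end is allowed.
        if offset != self.size:
            raise ValueError("a write-only stream can only be written at its end")
        self.raw.write(data)
        self.size += len(data)

    def forget(self, offset):
        pass              # forgetting a write-only stream is a nop


class ReadOnlyStream:
    """Hypothetical model of stdin-like spaces, as far as iosize is concerned."""

    def iosize(self):
        # Refuse to answer, so the buffering strategy is not revealed.
        raise io.UnsupportedOperation("iosize is not available on a read-only stream")
```

For example, with `out = io.BytesIO()` and `s = AppendOnlyStream(out)`, calling `s.poke(s.iosize(), b"ab")` and then `s.poke(s.iosize(), b"c")` leaves `b"abc"` in `out`, and `s.iosize()` returns 3.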

A filter like the one above, using Option 2 and simplified to operate on single bytes, would look like:

var offset = 0#B;

try
{
  forget stdin, offset;

  var b = byte @ stdin : offset;

  if (b < 80)
     byte @ stdout : iosize (stdout) = b;

  /* Advance past the byte just read.  */
  offset = offset + b'size;
}
until E_eof;

Since appending to an IO space will be a common operation when handling stream-like spaces, we considered adding some syntactic sugar, something like:

byte @ stdout :+ = b;

Unicode

How to best support Unicode in poke? We concluded that mimicking the C support (with its "wide" chars and strings) is not a good idea. We will be splitting the support into several pickles:

unicode.pk (also handling ucs encodings)
utf8.pk
utf16.pk

In addition to providing suitable Poke types (such as one for a UTF-8 character), we will want to implement additional functionality, using the GNU libunistring API as a base.

.source and source

We also discussed the need to have a way to "load" (or source) pickles at the Poke language level. This way, we will be able to write Poke programs like:

#!/usr/bin/poke -L
!#

load elf;

... operate on ELF stuff ...

This would have the same semantics as the currently available .load dot-command. At this point Bruno pointed out that if, once a given pickle has been loaded, loading it again is a nop, then calling the operation "load" is confusing. Darshit suggested using the name "source" instead. After some discussion we agreed to use "source" for both the dot-command and the language-level construct, with the same semantics as the current dot-command.

At some point (surely after the release) we will be adding a module system to Poke. But for the time being "sourcing" pickles shall be enough.

We got a bug tracker!

We finally decided to get ourselves a bug tracker, and concluded that Bugzilla would be a good option: it allows programmatic access, sends email notifications, and provides a good web interface for people who enjoy such things. So we went ahead and requested the addition of poke as a product in the sourceware Bugzilla (which also manages the bugs of other GNU programs such as glibc and binutils). The sourceware overseers created the product almost immediately (thanks Frank, you are the best!) so we can already use it at our pleasure.

Looking forward to meeting again

All in all, this was a great experience. I'm looking forward to meeting my fellow pokers again, which will happen at FOSDEM in a few weeks! I will definitely organize another Poke-Conf co-located with the GNU Hackers Meeting in Hamburg this summer. By then we will have already released poke 1.0, so we will have a great excuse to throw a party... or so I hope! :)

Happy poking!
