Applied Pokology

GNU poke development news

Jose E. Marchesi
November 15, 2020

The development of GNU poke is progressing well, and we hold hopes for a first release before the end of the year: we are determined for something good to happen in 2020! ;)

This article briefly reviews the latest news in the development of the program, like changes in certain syntax to make the language more compact, support for lambda expressions, support for stream-like IO spaces and how they can be used to write filters, support for using assignments to poke complex data structures, improvements in data integrity, annoying bugs fixed, and more.

Make the language a bit more compact
====================================

Since Poke is a domain-specific language for a tool, it is to be expected that it will often be written interactively. It follows that compactness (while maintaining good readability) is important, as it reduces the number of keys the user must press to achieve the desired effect.

In this spirit, we have recently changed two aspects of the syntax of the language.

First, we have renamed the keywords `defunit', `defvar', `defun' and `deftype' to `unit', `var', `fun' and `type' respectively. Therefore, where we would previously write:

,----
| defun rtrim = (string s, string cs = " \t") string:
| {
|   defvar cs_length = cs'length;
|   defvar result = "";
|   defvar i = s'length;
|   ...
| }
`----

Now we write:

,----
| fun rtrim = (string s, string cs = " \t") string:
| {
|   var cs_length = cs'length;
|   var result = "";
|   var i = s'length;
|   ...
| }
`----

Second, we now have support for "chaining" several declarations of the same kind, separated by commas.
So, where we would previously write:

,----
| defvar STB_LOCAL = 0;
| defvar STB_GLOBAL = 1;
| defvar STB_WEAK = 2;
| defvar STB_LOOS = 10;
| defvar STB_HIOS = 12;
| defvar STB_LOPROC = 13;
| defvar STB_HIPROC = 15;
`----

Now we write:

,----
| var STB_LOCAL = 0,
|     STB_GLOBAL = 1,
|     STB_WEAK = 2,
|     STB_LOOS = 10,
|     STB_HIOS = 12,
|     STB_LOPROC = 13,
|     STB_HIPROC = 15;
`----

Chaining declarations like that works for units, variables and types, but not for functions or methods. This is due both to a technicality (function specifiers are not terminated with a semicolon) and to the fact that it would be quite unusual and confusing.

Support for lambdas
===================

Yes, it was definitely about time... Being a proper lexically scoped language with closures, perfectly capable of handling funargs in both directions (passing them to functions and returning them from functions), it would be a real indecency not to support lambda expressions! So we just added them, and we are much happier now :)

For once the language syntax proved to be sane enough to be on our side, allowing us to use a nice and orthogonal construct:

,----
| lambda FUNCTION_SPECIFIER
`----

where a FUNCTION_SPECIFIER is the same notation that one would use when defining a function in a `fun' construction.

Examples:

,----
| (poke) lambda void: {}
| #<closure>
| (poke) lambda (int i) int: { return i + 2; }
| #<closure>
`----

Once created, lambdas can be invoked like any other function value:

,----
| (poke) lambda void: {} ()
| (poke) lambda (int i) int: { return i + 2; } (10)
| 12
`----

And of course they can be stored in variables, passed around in function calls, and the like:

,----
| (poke) var la = lambda (int i) int: { return i + 2; }
| (poke) la (10)
| 12
`----

This is the classic closure-oriented way of supporting generators of number sequences:

,----
| type Generator = ()int;
| fun new_generator = Generator:
| {
|   var i = 0;
|   return lambda int: { i = i + 1; return i; };
| }
`----

and then:

,----
| (poke) var g1 = new_generator
| (poke) var g2 = new_generator
| (poke) g1
| 1
| (poke) g1
| 2
| (poke) g2
| 1
| (poke) g1
| 3
`----

Support for stream-like IO spaces
=================================

Back in January, during the first Pokeconf celebrated in Switzerland, we had a long discussion about how we could support stream-like IO spaces in poke. If that could be achieved, it would allow us to access devices like IO ports and pipes, and poke them at pleasure. Also, it would make it possible to write filter-like programs in Poke, where the standard input is processed one entity at a time, and some output is generated on the standard output.

Wrapping sequential access devices in a random access abstraction like the poke IO spaces, without introducing special cases in the handling of the latter, wasn't easy, but we finally figured out a good design for it. We already published a little post in Applied Pokology describing it (<http://jemarch.net/pokology-20200113.html>).

Right, but this had to be implemented. Recently our resident IO expert Egeyar Bagcioglu did just that, adding support for a new kind of IO device (IOD) to poke: the stream. The result is awesome.

Now we can write filters like this implementation of the `strings' command:

,----
| #!/usr/local/bin/poke -L
| !#
|
| /* Printable ASCII characters: 0x20..0x7e */
|
| var stdin = open ("<stdin>");
| var stdout = open ("<stdout>");
|
| var offset = 0#B;
|
| try
| {
|   flush (stdin, offset);
|
|   var b = byte @ stdin : offset;
|   if (b >= 0x20 && b <= 0x7e)
|     byte @ stdout : iosize (stdout) = b;
|
|   offset = offset + 1#B;
| }
| until E_eof;
|
| close (stdin);
| close (stdout);
`----

Note how `open' recognizes the handlers "<stdin>" and "<stdout>" and uses the stream IOD for them, and how `flush' causes remembered parts of the stdin to be forgotten. Thanks Ege!

Maps of complex values in l-values
==================================

In Poke it is possible to specify maps in the left side of assignment statements, like this:

,----
| (poke) int @ 23#B = 666
`----

The above statement will poke the value 666 at offset 23 bytes from the beginning of the IO space. In principle, values of any type can be written to the IO space this way, even complex ones like arrays, structs and unions.

However, until now such constructions were limited to simple types, i.e. integral types, offsets and strings. Any attempt at poking complex values like structs was rejected with a compile-time error.

Well... not anymore. We recently (finally!) added support for poking complex values using maps in the l-value of assignments. This is how you would create an empty ELF file:

,----
| (poke) .mem tmp
| The current IOS is now `*tmp*'.
| (poke) dump
| 76543210  0011 2233 4455 6677 8899 aabb ccdd eeff  0123456789ABCDEF
| 00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
| 00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
| 00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
| 00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
| 00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
| 00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
| 00000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
| 00000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
| (poke) load elf
| (poke) Elf64_Ehdr @ 0#B = Elf64_Ehdr {}
| (poke) dump
| 76543210  0011 2233 4455 6677 8899 aabb ccdd eeff  0123456789ABCDEF
| 00000000: 7f45 4c46 0000 0000 0000 0000 0000 0000  .ELF............
| 00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
| 00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
| 00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
| 00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
| 00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
| 00000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
| 00000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
| (poke) save
| (poke) save :size iosize :file "foo.elf"
`----

It is important to note that when a value is assigned to an l-value map, its mapped/non-mapped properties, or any other properties, are not changed at all. For example:

,----
| (poke) var a = [1,2,3]
| (poke) a'mapped
| 0
| (poke) int[3] @ 10#B = a
| (poke) a'mapped
| 0
`----

The attributes of the array stored in the variable `a' don't change. In this example it is not mapped, but it could have been mapped at some other (or the same) offset in the current IO space, or even at another IO space, and its properties would still have been preserved.

Assignment to structs with data integrity
=========================================

Data integrity is a fundamental matter in poke. The several kinds of constraints specified by the user in type descriptions, like an array bounded by size, or a struct or union in which fields should have a certain form, shall be preserved at all times.

Now, there are three ways to generate values in Poke:

1) From a literal, like `[1,2,3]'.
2) From a constructor, like `Foo { a = 10, b = 20 }'.
3) From a mapping, like `int[3] @ 0#B' or `Foo @ 0#B'.

In all cases, the constraints are checked, and any violation reported. This means that the integrity of all the data in poke values, as defined by the types, is guaranteed by both compile-time and run-time checks.

However, until now there was a little caveat: it was possible to break that data integrity when assigning new values to struct fields:

,----
| (poke) type Foo = struct { int a = 0xff; int b; }
| (poke) Foo { }
| Foo {
|   a=0xff,
|   b=0x0
| }
| (poke) var f = Foo {}
| (poke) f.a
| 0xff
| (poke) f.a = 0
| (poke) f
| Foo {
|   a=0x0,
|   b=0x0
| }
`----

We just fixed that (support for this required support for other rather complex stuff) and now poke aborts the assignment if it would break the integrity of the data:

,----
| (poke) f.a = 20
| unhandled constraint violation exception
`----

Much better this way :)

New rules for union constructors
================================

Unlike with structs, union constructors with more than one field initializer really don't make much sense, as in:

,----
| (poke) type Foo = union { byte b : b > 0; int i; }
| (poke) Foo {b = 2, i = 12}
`----

What kind of Foo are we asking for? It is not clear. Until this patch, the same rules used for constructed structs applied, so the result would be:

,----
| Foo {
|   b=2UB
| }
`----

Now, if we do:

,----
| (poke) var f = Foo {b = 2}
| (poke) f.b = 0
`----

What would we expect? For the value in `f' to change nature and become a Foo of kind `i'? Or for this to be considered a constraint error? In the first case (the nature of the union value changes) we could expect to get something like:

,----
| (poke) f
| Foo {
|   i = 12
| }
`----

But should that really be 12, i.e. the value previously specified in the initializer, or 0? Wouldn't we expect this value to be impacted by the coupling in bits of the two fields, if we consider that both fields "start" at the beginning of the union?
This quickly degenerates into something very complicated and obscure, and almost impossible to implement properly and to understand. However, this is because we are thinking about unions as if they were regular structs: they are not. This was simply the wrong way of thinking.

A better approach, which we just implemented, is the following: a union constructor admits either one or zero field initializers. Specifying more than one field initializer is a compile-time error:

,----
| (poke) Foo {b = 2, i = 12}
| <stdin>:1:1: error: union constructors require exactly one field initializer
| Foo {b = 2, i = 12};
| ^~~~~~~~~~~~~~~~~~~
`----

When we specify a field initializer, we are also declaring the kind of Foo we want to construct, i.e. its "nature":

,----
| (poke) Foo {b = 2}
| Foo {
|   b=2UB
| }
`----

Therefore, if we provide an invalid initial value for `b', then we get a constraint-violation exception:

,----
| (poke) Foo {b = 0}
| unhandled constraint violation exception
`----

The alternative `i' is not considered in this case: we asked for a Foo of kind `b', and we provided the wrong value for it, so we want the exception.

Once constructed, union values do not change their "nature" due to assignments:

,----
| (poke) var f = Foo {b = 2}
| (poke) f.b = 0
| unhandled constraint violation exception
`----

Note that mapped unions are different in this sense. When we map a union on some IO space:

,----
| (poke) var m = Foo @ 0#B
`----

we are not specifying what nature of Foo we want: it all depends on the contents of the IO space. Therefore, we could get either a `b' or an `i', and if the underlying data in the IO space changes, the nature of the value in `m' may change as well.

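As a rough sketch of that last point, and not something taken from the poke sources, the following fragment pokes two different bytes underneath a mapped union. It assumes that some writable IO space is already the current one and that `Foo' is the union type from the examples above. The results are deliberately not shown, since what `m' evaluates to depends on the contents of the IO space at the time it is referenced:

,----
| /* Sketch only.  Assumes a writable IO space is current and that
|    `Foo' is the union type from the examples above.  */
|
| byte @ 0#B = 2;       /* First byte is non-zero: `b > 0' holds.  */
| var m = Foo @ 0#B;    /* The map is expected to yield a Foo of kind `b'.  */
|
| byte @ 0#B = 0;       /* Change the underlying data: `b > 0' now fails.  */
| /* From here on, `m' may turn out to be a Foo of kind `i' instead.  */
`----
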
Something similar happens when we don't specify a field initializer in a union constructor:

,----
| (poke) Foo {}
| Foo {
|   i=0
| }
`----

Since no initializer was provided, we didn't indicate what kind of Foo we wanted, and therefore each alternative is tried assuming all fields have default values, which in the case of integral types is zero.

Hope all this makes sense XD

The infamous big array bug is now fixed!
========================================

Ok, this is a bit of an embarrassing one.

When I first added support for arrays and struct values to the Poke virtual machine, I designed the corresponding `mka' and `mksct' PVM instructions in such a way that they would get their contents from the run-time stack. So, if we wanted to build a struct value with three fields `f1', `f2' and `f3' we would generate PVM code similar to this pseudo-code (the actual PVM assembly is more complicated):

,----
| push "f3"
| push 30
| push "f2"
| push 20
| push "f1"
| push 10
| mksct
`----

where `mksct' pops the field names and the field values from the stack and creates the value `struct {f1 = 10, f2 = 20, f3 = 30}'. Yes, the stuff should be pushed in reverse order to the stack, for obvious reasons :)

Of course, being stupid me, I made the mistake of applying the same strategy to arrays. So for example to build an array [10,20,30], the compiler would generate PVM code like:

,----
| push 30
| push 20
| push 10
| mka
`----

That works very well when compiling array literals like `[1,2,3]', but what happens when you map, say, all the bytes in a given tar file that is, say, 53Mb long?

,----
| (poke) .file some.tar.gz
| (poke) byte[] @ 0#B
| Segmentation fault
`----

Yeah... turns out that the Jitter stacks are not only limited, but also not that big. Which is ok, of course, since we were clearly misusing them. Also, this approach has the additional disadvantage (promptly pointed out by Luca Saiu) of requiring the elements to be copied around twice for no good reason.

The solution is obvious: the `mka' instruction should work differently. Instead of getting its elements from the stack, it should create an empty array. Then the elements should be added to the array in a sequence or in a loop, using an element insertion instruction. Compiling `[1,2,3]' then becomes:

,----
| mka
| push 1
| ains
| push 2
| ains
| push 3
| ains
`----

The array map, on the other hand, should use a loop instead, since the length of the resulting array is not known at compile-time.

Thing is, I have been aware of both the problem and its solution for a long time, but I kept putting it off, turning my attention to more difficult and important issues. This despite people's complaints.

However, while implementing the support for complex maps in l-values, I realized I needed the elements of array literals to be pushed in the right order on the stack, and therefore I had no choice but to bite the bullet and implement the new `mka'.

So now poke supports big arrays without segfaulting, and John Darrington is happy. Hurrah!

New built-in function gettime
=============================

We added a new built-in function `gettime', which returns an array of two signed 64-bit numbers denoting the number of seconds and nanoseconds since the Epoch (1-1-1970) respectively.

Additionally, we added a `Timespec' struct and an accompanying `gettimeofday' function to the `time' pickle:

,----
| type Timespec = struct
|   {
|     int<64> sec;
|     int<64> nsec;
|   };
|
| fun gettimeofday = Timespec:
| {
|   var time = gettime;
|   return Timespec {sec = time[0], nsec = time[1]};
| }
`----

Support for octal and hexadecimal codes in strings
==================================================

The Poke strings are slowly maturing into grown-up strings... this week Mohammad-Reza Nabipoor (whom we warmly welcome to the poke gang!) added support for specifying character codes in strings using octal and hexadecimal escapes:

,----
| (poke) print "foo\xa"
| foo
| (poke) print "fo\157\n"
| foo
`----

Thank you Mohammad!

Support for `continue' in loops
===============================

Not much to say about this one... we have added `continue' statements to the language, with the usual semantics: it starts a new iteration of the containing loop:

,----
| for (packet in packets)
| {
|   if (packet_is_not_valid (packet))
|     continue;
|   ...
| }
`----

poke.rec database
=================

Lastly... the boring stuff :)

GNU poke is getting big and complex, encompassing many components, and happily more people are joining the development. We really are starting to need better organization, or we will go nuts.

Therefore, in what proved to be a quite painful exercise, I gathered all my dispersed notes, TODO lists, ideas and bug reports, prioritized them, and documented and organized them in a recutils database (http://www.gnu.org/s/recutils). The database can be found in the file `etc/poke.rec' in the source tree.

This database currently contains record sets for tasks, releases and hackers. It provides a way to generate reports, and clear answers to questions like "what is pending before we can release 1.0?" or "I would like to work on the compiler, what can I do?".

The fact that `rec-mode.el', the Emacs interface to recutils, is now very actively maintained by Antoine Kalmbach really helps us. We have got to send him a Jamón de Bellota as proof of our esteem!

And that's all for now.

Happy poking! :)