Applied Pokology - GNU poke development news

 

     _____
 ---'   __\_______
            ______)         Understanding Poke methods
            __)             
           __)
 ---._______)

                                                          Jose E. Marchesi
                                                          May 4, 2020

Poke struct types can be a bit daunting at first sight.  You can find
all sorts of things inside them: from fields, variables and functions to
constraint expressions, initialization expressions, labels, other type
definitions, and methods.

Struct methods can be particularly confusing for the novice poker.  In
particular, it is important to understand the difference between methods
and regular functions defined inside struct types.  This article will
hopefully clear up the confusion, and will also give the reader a
better understanding of how poke works internally.


The Packet
==========

  First we need to define some structure to use as an example.  Let's
  say we are interested in poking Packets, as defined by the Packet
  Specification 1.2 published by the Packet Foundation (no less).

  In a nutshell, each Packet starts with a byte whose value is always
  0xab, followed by a byte that defines the size of the payload.  The
  bytes that make up the payload follow, themselves followed by another
  stream of the same number of bytes with "control" values.

  We could translate this description into the following Poke struct
  type definition:

  ,----
  | type Packet =
  |   struct
  |   {
  |     byte magic = 0xab;
  |     byte size;
  |     byte[size] payload;
  |     byte[size] control;
  |   };
  `----


  See the Poke manual for details on types, initialization values,
  constraint expressions etc.

  There are some details described in the Packet Specification 1.2 that
  are not covered in this simple definition, but we will attend to them
  later in this article.


The process of building structs
===============================

  Given the definition of a struct type like Packet, there are only two
  ways to build a struct value in Poke.

  One is to map it from some IO space.  This is achieved using the map
  operator:

  ,----
  | (poke) Packet @ 12#B
  | Packet {
  |   magic = 0xab,
  |   size = 2,
  |   payload = [0x12UB,0x30UB],
  |   control = [0x1UB,0x1UB]
  | }
  `----


  The expression above maps a Packet starting at offset 12 bytes, in the
  current IO space.  See the Poke manual for more details on using the
  map operator.

  The second way to build a struct value is to _construct_ one,
  specifying values for some, all or none of its fields.  It looks
  like this:

  ,----
  | (poke) Packet {size = 2, payload = [1UB,2UB]}
  | Packet {
  |   magic = 0xab,
  |   size = 2,
  |   payload = [0x1UB,0x2UB],
  |   control = [0x0UB,0x0UB]
  | }
  `----


  In either case, building a struct involves determining the value of
  all the fields of the struct, one by one.  The order in which the
  struct fields are built is determined by the order of appearance of
  the fields in the type description.

  In our example, the value of `magic' is determined first, then
  `size', `payload' and finally `control'.  This is the reason why we
  can refer to the values of previous fields when defining fields, such
  as in the size of the `payload' array above, but not the other way
  around: by the time `payload' is mapped or constructed, the value of
  `size' has already been mapped or constructed.

  What happens behind the curtains is that when poke finds the
  definition of a struct type, like Packet, it compiles two functions
  from it: a mapper function, and a constructor function.  The mapper
  function gets as arguments the IO space and the offset from which to
  map the struct value, whereas the constructor function gets the
  template specifying the initial values for some, or all of the fields;
  reasonable default values (like zeroes) are used for fields for which
  no initial values have been specified.

  These functions, mapper and constructor, are invoked to create fresh
  values when a map operator @ or a struct constructor is used in a Poke
  program, or at the poke prompt.
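
  As a small illustration, here is a sketch of both paths at the
  prompt, assuming the simple Packet type above is loaded and an IO
  space is open.  The variable names `mapped' and `built' are invented
  for this example:

  ,----
  | (poke) var mapped = Packet @ 12#B
  | (poke) var built = Packet {}
  `----


  The first line goes through the mapper, the second through the
  constructor with an empty template.  Since no initial values are
  given in the second case, we would expect `built.magic' to hold 0xab
  and `built.size' to hold the default zero.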


Variables in struct types
=========================

  Fields are not the only entity that can appear in the definition of a
  struct type.

  Suppose that after reading the Packet Specification 1.2 more
  carefully (it spans several thousand pages) we realize that the field
  `size' doesn't really store the number of bytes of the payload and
  control arrays, like we thought initially.  Or not exactly: the Packet
  Foundation says that if `size' has the special value 0xff, then the
  size is zero.

  We could of course do something like this:

  ,----
  | type Packet =
  |   struct
  |   {
  |     byte magic = 0xab;
  |     byte size;
  | 
  |     byte[size == 0xff ? 0 : size] payload;
  |     byte[size == 0xff ? 0 : size] control;
  |   };
  `----


  However, we can avoid replicating code by using a variable instead:

  ,----
  | type Packet =
  |   struct
  |   {
  |     byte magic = 0xab;
  |     byte size;
  | 
  |     var real_size = (size == 0xff ? 0 : size);
  | 
  |     byte[real_size] payload;
  |     byte[real_size] control;
  |   };
  `----


  Note how the variable can be used after it gets defined.  In the
  underlying process of mapping or constructing the struct, the variable
  is incorporated into the lexical environment.  Once defined, it can be
  used in constraint expressions, array sizes, etc.  We will see more
  about this later.

  Incidentally, it is of course possible to use global variables as
  well.  For example:

  ,----
  | var packet_special = 0xff;
  | type Packet =
  |   struct
  |   {
  |     byte magic = 0xab;
  |     byte size;
  | 
  |     var real_size = (size == packet_special ? 0 : size);
  | 
  |     byte[real_size] payload;
  |     byte[real_size] control;
  |   };
  `----


  In this case, the global `packet_special' gets captured in the lexical
  environment of the struct type (in reality in the lexical environment
  of the implicitly created mapper and constructor functions) in a way
  that if you later modify `packet_special' the new value will be used
  when mapping/constructing _new_ values of type Packet.  Which is
  really cool, but let's not get distracted from the main topic... :)
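
  For the curious, a quick sketch of what this means at the prompt,
  assuming the type definition above has been loaded.  The names `p1'
  and `p2' and the value 0xfe are arbitrary choices made for this
  example:

  ,----
  | (poke) var p1 = Packet {size = 0xfe}
  | (poke) packet_special = 0xfe
  | (poke) var p2 = Packet {size = 0xfe}
  `----


  Under the rules just described, `p1' should carry 0xfe bytes of
  payload, whereas `p2', built after the global was modified, should
  have an empty payload.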


Functions in struct types
=========================

  Further reading of the Packet Specification 1.2 reveals that each
  Packet has an additional `crc' field.  The content of this field is
  derived from both the payload bytes and the control bytes.

  But this is no vulgar CRC we are talking about.  On the contrary, it
  is a special function developed by the CRC Foundation in partnership
  with the Packet Foundation, called superCRC (patented, TM).

  Fortunately, the CRC Foundation distributes a pickle `supercrc.pk',
  that provides a `calculate_crc' function with the following spec:

  ,----
  | fun calculate_crc = (byte[] data, byte[] control) int:
  `----


  So let's use the function like this in our type, after loading the
  supercrc pickle:

  ,----
  | load supercrc;
  | 
  | type Packet =
  |   struct
  |   {
  |     byte magic = 0xab;
  |     byte size;
  | 
  |     var real_size = (size == 0xff ? 0 : size);
  |  
  |     byte[real_size] payload;
  |     byte[real_size] control;
  | 
  |     int crc = calculate_crc (payload, control);
  |   };
  `----


  However, there is a caveat: it happens that the calculation of the CRC
  may involve arithmetic and division, so the CRC Foundation warns us
  that the `calculate_crc' function may raise E_div_by_zero.  The Packet
  Specification 1.2 tells us that in these situations the `crc' field of
  the packet should contain zero.  If we used the type above, any
  exception raised by `calculate_crc' would be propagated by the
  mapper/constructor:

  ,----
  | (poke) Packet @ 12#B
  | unhandled division by zero exception
  `----


  A solution is to wrap `calculate_crc' in a function that takes care
  of the extra logic:

  ,----
  | load supercrc;
  | 
  | type Packet =
  |   struct
  |   {
  |     byte magic = 0xab;
  |     byte size;
  | 
  |     var real_size = (size == 0xff ? 0 : size);
  | 
  |     byte[real_size] payload;
  |     byte[real_size] control;
  | 
  |     fun corrected_crc = int:
  |     {
  |       try return calculate_crc (payload, control);
  |       catch if E_div_by_zero { return 0; }
  |     }
  |          
  |     int crc = corrected_crc;
  |   };
  `----


  Again, note how the function is accessible after its definition.  Note
  as well how fields, variables and other functions can be used in the
  function body.  Defining variables and functions in struct types is
  no different from defining them inside other functions or in the
  top-level environment: the same lexical rules apply.


Methods
=======

  At this point you may be thinking something along the lines of "hey,
  since variables and functions are also members of the struct, I should
  be able to access them the same way as fields, right?".

  So you will want to do:

  ,----
  | (poke) var p = Packet @ 12#B
  | (poke) p.real_size
  | (poke) p.corrected_crc
  `----


  But sorry, this won't work.

  To understand why, think about the struct building process we sketched
  above.  The mapper and constructor functions are derived/compiled from
  the struct type.  You can imagine them to have prototypes like:

  ,----
  | Packet_mapper (IOspace, offset) -> Packet value
  | Packet_constructor (template)   -> Packet value
  `----


  You can also picture the fields, variables and functions in the struct
  type specification as being defined inside the bodies of Packet_mapper
  and Packet_constructor, as their contents get mapped/constructed.  For
  example, let's see what the mapper does:

  ,----
  | Packet_mapper:
  | 
  |   . Map a byte, put it in a local `magic'.
  |   . Map a byte, put it in a local `size'.
  |   . Calculate the real size, put it in a local `real_size'.
  |   . Map an array of real_size bytes, put it in a local `payload'.
  |   . Map an array of real_size bytes, put it in a local `control'.
  |   . Compile a function, put it in a local `corrected_crc'.
  |   . Map an int, call the function in the local `corrected_crc',
  |     complain if the values are not the same, otherwise put the
  |     mapped int in a local `crc'.
  |   . Build a struct value with the values from the locals `magic',
  |     `size', `payload', `control' and `crc', and return it.
  `----


  The pseudo-code for the constructor would be almost identical.  Just
  replace "map" with "construct".

  So you see, both the values for the mapped fields and the values for
  the variables and functions defined inside the struct type end up as
  locals of the mapping process, but only the values of the fields are
  actually put in the struct value that is returned in the last step.
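
  The same thing happens with any regular Poke function.  The sketch
  below is only an analogy, and the names `make_pair', `hidden' and
  `helper' are invented for this example: the locals are used to
  compute the result, but they cannot be reached through the value
  that is returned.

  ,----
  | fun make_pair = int[]:
  | {
  |   /* Locals, like the variables and functions of a struct type.  */
  |   var hidden = 10;
  |   fun helper = int: { return hidden * 2; }
  | 
  |   /* Only the returned array reaches the caller; `hidden' and
  |      `helper' are not part of it.  */
  |   return [hidden, helper ()];
  | }
  `----
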

  This is where methods come into the picture.  A method looks very
  similar to a function, but it is not quite the same thing.  Let me
  show you an example:

  ,----
  | load supercrc;
  | 
  | type Packet =
  |   struct
  |   {
  |     byte magic = 0xab;
  |     byte size;
  | 
  |     var real_size = (size == 0xff ? 0 : size);
  | 
  |     byte[real_size] payload;
  |     byte[real_size] control;
  | 
  |     fun corrected_crc = int:
  |     {
  |       try return calculate_crc (payload, control);
  |       catch if E_div_by_zero { return 0; }
  |     }
  |          
  |     int crc = corrected_crc;
  | 
  |     method c_crc = int:
  |     {
  |       return corrected_crc;
  |     }
  |   };
  `----


  We have added a method `c_crc' to our Packet struct type, that just
  returns the corrected superCRC (patented, TM) of a packet.  This can
  be invoked using dot-notation, once a Packet value is
  mapped/constructed:

  ,----
  | (poke) var p = Packet @ 12#B
  | (poke) p.c_crc
  | 0xdeadbeef
  `----


  Now, the important bit here is that the method returns the corrected
  crc _of a Packet_.  That is, it actually operates on a Packet value.
  This Packet value gets implicitly passed as an argument whenever a
  method is invoked.

  We can visualize this with the following "pseudo Poke":

  ,----
  | method c_crc = (Packet SELF) int:
  | {
  |    return SELF.corrected_crc;
  | }
  `----


  Fortunately, poke takes care to recognize when you are referring to
  fields of this implicit struct value, and does The Right Thing(TM) for
  you.  This includes calling other methods:

  ,----
  | method foo = void: { ... }
  | method bar = void:
  | {
  |  [...]
  |  foo;
  | }
  `----


  The corresponding "pseudo Poke" being:

  ,----
  | method bar = (Packet SELF) void:
  | {
  |  [...]
  |  SELF.foo ();
  | }
  `----


  It is also possible to define methods that modify the contents of
  struct fields, no problem:

  ,----
  | var packet_special = 0xff;
  | 
  | type Packet =
  |   struct
  |   {
  |     byte magic = 0xab;
  |     byte size;
  |     [...]
  | 
  |     method set_size = (byte s) void:
  |     {
  |       if (s == 0)
  |         size = packet_special;
  |       else         
  |         size = s;
  |     }
  |   };
  `----


  This is what is commonly known as a "setter".  Note, incidentally, how
  a method can also use regular variables.  The Poke compiler knows when
  to generate an access to a normal variable such as `packet_special',
  and when to generate a set to a SELF field.
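
  A sketch of how the setter might be used at the prompt, assuming the
  Packet type above has been loaded.  The variable name `p' and the
  initial size are arbitrary:

  ,----
  | (poke) var p = Packet {size = 2}
  | (poke) p.set_size (0UB)
  | (poke) p.size
  `----


  After the call we would expect `p.size' to hold the special value
  0xff, since the method stored the value of `packet_special' in the
  field of the struct value itself.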


A few restrictions
==================

  Given the different nature of the variables, functions and methods,
  there are a couple of restrictions:

  - Functions can't set fields defined in the struct type.

    This will be rejected by the compiler:

    ,----
    | type Foo =
    |   struct
    |   {
    |      int field;
    |      fun wrong = void: { field = 10; }
    |   };
    `----


    Remember the construction/mapping process.  When a function
    assigns to a field of the struct type like in the example above, it
    is not doing one of those pseudo `SELF.field = 10'.  Instead, it
    is simply updating the value of the local created in this step in
    Foo_mapper:

    ,----
    | Foo_mapper:
    |        
    |  . Map an int, put it in a local `field'.
    |  . [...]
    `----


    Setting that local would impact the mapping of the subsequent fields
    if they refer to `field' (for example, in their constraint
    expression) but it wouldn't actually alter the value of the field
    `field' in the struct value that is created and returned from the
    mapper!

    This is very confusing, so we just disallow this with a compiler
    error "invalid assignment to struct field", for your own sanity 8-)

  - Methods can't be used in field constraint expressions, nor in
    variables or functions defined in a struct type.

    How could they be?  The field constraint expressions, the
    initialization expressions of variables, and the functions defined
    in struct types are all executed as part of the mapper/constructor
    and, at that time, there is no struct value yet to pass to the
    method.

    If you try to do this, the compiler will greet you with an "invalid
    reference to struct method" message.
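
    For instance, a type along these lines (the names `Bar',
    `get_field' and `wrong' are invented for the example) would be
    rejected, because `wrong' is initialized while the struct is still
    being built:

    ,----
    | type Bar =
    |   struct
    |   {
    |      int field;
    |      method get_field = int: { return field; }
    |      var wrong = get_field;
    |   };
    `----
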

  Happy poking! :)