Towards Data Structures for Tla 2.0
draft A
tla 1.x uses ./src/tla/libawk to provide a bunch of "awk-like" data structures. libawk is good, as far as it goes, but it isn't suitable for a thoroughly librified libarch.
problems with the current libawk:
bogus error handling -- libawk makes little attempt to propagate errors to callers in an orderly way: it assumes it is running in a one-shot (short-lived) process and is free to exit on error.
leaky abstraction barrier -- programs using the current libawk too often wind up referring to libawk strings as t_uchar * (which is incompatible with the Unicode plans) or, even worse, explicitly freeing, allocating, or modifying supposed-to-be-opaque fields of libawk data structures. The API isn't quite a clean abstraction.
missing functionality -- 2.0 needs Unicode support which would be hard to retrofit onto libawk. While working on 1.x, I sorely missed some minor generalizations of libawk such as number valued list and table entries.
awkward memory management (no pun intended) -- Programs must explicitly free all libawk data structures allocated as "stack locals". It is easy to flub this, at least along some execution paths -- the result would be a memory-leaking 2.0 library.
Here is what I plan to take the place of libawk in tla 2.0.
Namespaces
The central data structure used by libarch in tla-2.0 is called a namespace.
Roughly speaking, a namespace is a kind of dictionary: a dynamically modifiable collection of named variables. Programs create and delete variables within a namespace. Programs read and write the values of variables in a namespace.
However, namespaces have considerably more structure than your average dictionary:
Namespaces at a Glance
A namespace is a data structure which maps variable names to locations. What is a "variable name"? What is a "location"?
A location is mutable storage for a single scalar value. The set of scalar values includes strings, numbers, symbols, booleans, and the value nil. A location works similarly to a C variable or structure field of scalar type: programs can read the value stored there; programs can store a new value there.
Need a picture here. A "location" pictured as a box, containing a scalar.
A variable name is comprised of at least a scope number and identifier. The scope number is a small integer, the identifier is a string (constrained to conform to "identifier name syntax").
A namespace contains 1 or more dynamically allocated scopes. Each scope is a disjoint namespace: the same identifier may be bound to two distinct locations, in two different scopes.
Simple Variables
A simple variable name is comprised of only a scope number and identifier. Given just the identifier naming a simple variable, and its scope, programs can find its location and therefore both read and modify the value of the variable.
Need a picture here. A scope, containing only simple variables, pictured as a 2-col table with variable names in the left column, scalar values in the right. A namespace pictured (for now) as an array of such scopes.
Non-Simple Variables: Lists and Tables
Some identifiers, however, are bound to more complex variables.
A list variable is an identifier bound within a namespace scope to a dynamically resizable 1-d array of locations.
To find a location within a list variable, a program must supply a scope, an identifier name, and an integer list offset.
Similarly, a table variable is an identifier bound (within a given scope) to a dynamically resizable array of rows, each row being a dynamically resizable array of columns, each column within a row being a single location.
To find a location within a table variable, a program must supply a scope, an identifier name, an integer row offset, and an integer column offset.
Need a picture here. Picture scopes as a table:

  name:        binding:
  simple_var   [ 42 ]
  list_var     size=2    [ "hello" ] [ "world" ]
  table_var    n_rows=2
               row[0]=   [ "hello" ] [ "world" ]
               row[1]=   [ "hello" ] [ "sailor" ]

[14] A composite of several of those forming a namespace. Some variable names with arrows pointing to the addressed location. E.g. "simple_var" points to the box around 42; "table_var[1][1]" points to the box around "sailor".
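The three kinds of variable described above could be modelled along the following lines. This is only a sketch: every type and function name here (struct location, struct variable, table_location, demo_picture) is hypothetical and chosen for illustration, and the draft does not commit to any particular representation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical; the draft fixes no representation. */
enum scalar_kind { SCALAR_NIL, SCALAR_NUMBER, SCALAR_BOOLEAN, SCALAR_STRING, SCALAR_SYMBOL };

/* A location: mutable storage for exactly one scalar value. */
struct location
{
  enum scalar_kind kind;
  union
  {
    long number;
    int boolean;
    const char * text;          /* string or symbol text; not owned here */
  } u;
};

enum variable_kind { VAR_SIMPLE, VAR_LIST, VAR_TABLE };

struct row { size_t n_cols; struct location * cols; };

/* A variable is one location, a list of locations, or rows of locations. */
struct variable
{
  enum variable_kind kind;
  union
  {
    struct location simple;
    struct { size_t size; struct location * elts; } list;
    struct { size_t n_rows; struct row * rows; } table;
  } u;
};

/* Return the location at [row][col], or NULL if v is not a table
   or the offsets are out of bounds. */
struct location *
table_location (struct variable * v, size_t row, size_t col)
{
  if (v->kind != VAR_TABLE || row >= v->u.table.n_rows)
    return NULL;
  if (col >= v->u.table.rows[row].n_cols)
    return NULL;
  return &v->u.table.rows[row].cols[col];
}

/* Build table_var from the picture; return 1 if table_var[1][1] holds
   "sailor" and an out-of-bounds row is rejected. */
int
demo_picture (void)
{
  static struct location row0[2] = {
    { SCALAR_STRING, { .text = "hello" } },
    { SCALAR_STRING, { .text = "world" } }
  };
  static struct location row1[2] = {
    { SCALAR_STRING, { .text = "hello" } },
    { SCALAR_STRING, { .text = "sailor" } }
  };
  static struct row rows[2] = { { 2, row0 }, { 2, row1 } };
  struct variable table_var;
  struct location * loc;

  table_var.kind = VAR_TABLE;
  table_var.u.table.n_rows = 2;
  table_var.u.table.rows = rows;

  loc = table_location (&table_var, 1, 1);
  return loc != NULL
         && loc->kind == SCALAR_STRING
         && strcmp (loc->u.text, "sailor") == 0
         && table_location (&table_var, 2, 0) == NULL;
}
```

The point of the sketch is that the list or table structure lives in the variable, while each location still holds a single scalar.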
Fancy Tricks With Scopes
The namespace data structure provides somewhat efficient operations to:
Clear a scope -- Remove all bindings from an indicated scope. (All names in the scope are, in effect, simple variables whose current value is nil.)
Push a scope -- Clear the scope but, first, remember the old values on a stack.
Pop a scope -- The inverse of pushing a scope.
There are some "standard scopes", intended to be used with these operations, used to implement a simple (albeit heavyweight) "calling convention" based on namespaces. The standard scopes are:
environment_scope
global_scope
params_scope
locals_scope
returns_scope
Overview of Using Namespaces
The namespace data structure will be used in tla-2.0 to provide a uniform and completely "reflective" API to libarch.
Using tla-1.x, a "client program" of libarch has little choice but to run tla as a subprocess. To invoke a libarch entry point today, a program has to build an argv array of the parameters, fork and exec the command, wait for the command and collect any return parameters.
Using namespaces, a 2.0 client will do something very similar, but considerably easier in the details:
To run a command, a client can: (1) allocate a namespace; (2) initialize the namespace by setting variables to reflect parameters to the command (and the name of the command to run); (3) call arch_run; (4) read back exit status and results from the namespace; (5) free the namespace.
(There is also the possibility of namespaces persisting across multiple libarch invocations, of course.)
Function: {`alloc_namespace'}
Prototype:
ssize_t alloc_namespace (void);
Description:
Normally, return a small positive integer: the "namespace descriptor" for a newly allocated namespace.
Upon a recoverable allocation failure (a retry might succeed if the allocator permits it), return 0.
Upon catastrophic failure, return a value less than 0. Here and elsewhere, a catastrophic failure (usually indicated by a return value less than 0) indicates that most calls into libarch are no longer safe. Callers receiving a "catastrophic error" return value should, presumably, arrange to make an emergency exit from their process as quickly as possible.
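A caller might handle the three-way return convention roughly as follows. This is a sketch under assumptions: classify_descriptor, alloc_with_retry and stub_alloc are hypothetical helpers, and stub_alloc merely stands in for alloc_namespace so that the example is self-contained.

```c
#include <assert.h>
#include <sys/types.h>   /* ssize_t (POSIX) */

enum alloc_outcome { ALLOC_OK, ALLOC_RETRYABLE, ALLOC_CATASTROPHIC };

/* Map a descriptor-returning call's result onto the draft's convention:
   < 0 catastrophic, 0 recoverable failure, > 0 a valid descriptor. */
enum alloc_outcome
classify_descriptor (ssize_t ret)
{
  if (ret < 0)
    return ALLOC_CATASTROPHIC;
  if (ret == 0)
    return ALLOC_RETRYABLE;
  return ALLOC_OK;
}

/* Call alloc_fn up to max_tries times, retrying only recoverable
   failures.  Returns the first non-zero result, or 0 if every try
   failed recoverably. */
ssize_t
alloc_with_retry (ssize_t (*alloc_fn) (void), int max_tries)
{
  int i;

  for (i = 0; i < max_tries; ++i)
    {
      ssize_t d = alloc_fn ();
      if (classify_descriptor (d) != ALLOC_RETRYABLE)
        return d;
    }
  return 0;
}

/* Hypothetical stand-in for alloc_namespace: fails recoverably twice,
   then hands out descriptor 7. */
ssize_t
stub_alloc (void)
{
  static int calls = 0;

  calls += 1;
  return (calls <= 2) ? 0 : 7;
}
```

On a negative result the caller would not retry at all; per the description above it should head for an emergency exit.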
Function: {`free_namespace'}
Prototype:
int free_namespace (ssize_t nspace);
Description:
nspace should be a descriptor previously returned by alloc_namespace.
Free the indicated namespace and release all associated resources.
Normally, return 0.
Return a negative value upon a catastrophic error.
Should not, currently, return a positive value.
Scopes
Each namespace data structure can contain multiple, disjoint identifier name mappings at once. Each disjoint mapping is called a namespace scope.
In other words, a namespace contains N scopes. Every identifier name can be bound to a variable in each of those N scopes. Two scopes can contain separate variables for a single name.
Function: {`namespace_create_scope'}
Prototype:
ssize_t namespace_create_scope (ssize_t nspace);
Description:
Return a positive integer which serves as the name for a new disjoint identifier mapping within nspace, a namespace previously returned by alloc_namespace.
Return -1 for catastrophic failures and 0 for potentially transient failures (such as some kind of allocation failure).
There is no corresponding function to release a previously allocated scope. Programs are not expected to create large numbers of scopes.
Standard Scopes
This section defines {`namespace_environment'}, {`namespace_globals'}, {`namespace_locals'}, {`namespace_params'}, {`namespace_returns'}.
The namespace library provides some standard, built-in scopes. The integer identifiers for these scopes are the same in all namespace instances:
Prototypes:
ssize_t namespace_environment (void);
ssize_t namespace_globals (void);
ssize_t namespace_locals (void);
ssize_t namespace_params (void);
ssize_t namespace_returns (void);
Description:
Return the scope number for each of the 5 standard scopes.
Scope Lists
Every scope represents a mapping from identifiers to variables. In fact, scopes have additional structure beyond that.
Let's call a simple mapping from identifiers to variables a symbol table.
A scope then has two parts: a list of symbol tables and a current offset into that list.
In pseudo-code, we might declare a scope data structure this way:
struct scope
{
  int current_list_pos;
  list_of<struct symbol_table> symbtabs;
};
If a user asks for the variable named X in scope S, then X is looked up in the symbol table S.symbtabs[S.current_list_pos].
Using Scope Lists as Stacks
Function: {`namespace_push_scope'}
int namespace_push_scope (ssize_t nspace, ssize_t scope);
Allocate a new symbol table and append it to the symbtabs list of the indicated scope. Set the current_list_pos of that scope to point to this newly appended symbol table.
The new symbol table is initially empty (no identifiers bound to variables).
Function: {`namespace_pop_scope'}
int namespace_pop_scope (ssize_t nspace, ssize_t scope);
Discard the last element of the symbtabs list of the indicated scope. Set the current_list_pos pointer of the scope to point to the new last element of symbtabs.
If this operation would otherwise leave the symbtabs list empty, instead, the list is reinitialized to contain a single symbol table, initially containing no bindings.
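The push and pop behaviour just described can be sketched in plain C. The names below (scope_init, scope_push, scope_pop, demo_push_pop) and the one-field struct symbol_table are hypothetical stand-ins; the real functions take a namespace descriptor and a scope number rather than a pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a real symbol table: we only track how many bindings it has. */
struct symbol_table { int n_bindings; };

struct scope
{
  int current_list_pos;
  int n_symbtabs;
  struct symbol_table * symbtabs;
};

/* Start with a single, empty symbol table.  Returns 0 on success. */
int
scope_init (struct scope * s)
{
  s->symbtabs = calloc (1, sizeof *s->symbtabs);
  if (!s->symbtabs)
    return 1;
  s->n_symbtabs = 1;
  s->current_list_pos = 0;
  return 0;
}

/* Append an empty symbol table and make it current.  Returns 0 on success. */
int
scope_push (struct scope * s)
{
  struct symbol_table * grown;

  grown = realloc (s->symbtabs, (size_t) (s->n_symbtabs + 1) * sizeof *grown);
  if (!grown)
    return 1;
  s->symbtabs = grown;
  s->symbtabs[s->n_symbtabs].n_bindings = 0;
  s->current_list_pos = s->n_symbtabs;
  s->n_symbtabs += 1;
  return 0;
}

/* Discard the last symbol table.  If that would leave the list empty,
   keep one table but clear it instead. */
void
scope_pop (struct scope * s)
{
  if (s->n_symbtabs > 1)
    s->n_symbtabs -= 1;
  else
    s->symbtabs[0].n_bindings = 0;
  s->current_list_pos = s->n_symbtabs - 1;
}

/* Returns 1 if push/pop behave as described, 0 if not, -1 on allocation failure. */
int
demo_push_pop (void)
{
  struct scope s;
  int ok;

  if (scope_init (&s))
    return -1;
  s.symbtabs[0].n_bindings = 3;              /* pretend the caller bound 3 names */
  if (scope_push (&s))
    {
      free (s.symbtabs);
      return -1;
    }
  ok = (s.current_list_pos == 1 && s.symbtabs[1].n_bindings == 0);
  scope_pop (&s);                            /* caller's bindings are back */
  ok = ok && s.current_list_pos == 0 && s.symbtabs[0].n_bindings == 3;
  scope_pop (&s);                            /* popping the last table clears it */
  ok = ok && s.n_symbtabs == 1 && s.symbtabs[0].n_bindings == 0;
  free (s.symbtabs);
  return ok;
}
```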
Randomly Accessing Scope Lists
Function: {`namespace_n_scope_elements'}
ssize_t namespace_n_scope_elements (ssize_t nspace, ssize_t scope);
Return the number of elements in the symbtabs list of the indicated scope.
Function: {`namespace_set_symbtab'}
ssize_t namespace_set_symbtab (ssize_t nspace, ssize_t scope, ssize_t symbtab_list_pos);
Change the current_list_pos field of the indicated scope. I.e., change which symbol table in the symbtabs list is used, by default, to look up variable names.
Variables, Indexes, and Locations
So. Namespaces contain scopes. Each scope is a dynamic list of symbol tables plus a "current symbol table" index. Each symbol table maps identifiers to variables. Please make sure you have absorbed enough from the preceding sections to understand the description in this paragraph before continuing.
We're left with at least two questions: What are identifiers? and What are variables?.
Identifiers
Identifiers are represented as ASCII strings, beginning with an alphabetic character, containing only alphabetic, numeric, and underscore characters.
In namespace APIs, identifiers are usually passed as t_uchar * pointers to 0-terminated strings.
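The identifier syntax above is simple enough to check directly. A minimal sketch, assuming t_uchar is an unsigned char (the draft does not say so here); is_valid_identifier and its helpers are hypothetical names, not part of the proposed API.

```c
#include <assert.h>

/* ASCII-only tests, written out so the result does not depend on locale. */
int
is_ascii_alpha (unsigned char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

int
is_ascii_digit (unsigned char c)
{
  return c >= '0' && c <= '9';
}

/* Return 1 if the 0-terminated string s begins with an alphabetic
   character and contains only alphabetic, numeric, and underscore
   characters; 0 otherwise (including for NULL or the empty string). */
int
is_valid_identifier (const unsigned char * s)
{
  if (!s || !is_ascii_alpha (s[0]))
    return 0;
  for (s += 1; *s; ++s)
    if (!is_ascii_alpha (*s) && !is_ascii_digit (*s) && *s != '_')
      return 0;
  return 1;
}
```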
Variables and Locations: Singletons, Lists, and Tables
Namespace variables are containers for one or more mutable locations.
Each location holds a scalar value. A scalar value can be a number, (immutable) string, symbol, boolean, or the nil value.
Singleton variables consist of just a single location. They hold a single scalar value. To access the scalar value stored in a singleton variable, you need only the variable's name.
List variables consist of a dynamically sized ordered collection of locations. New locations can be prepended to, appended to, or inserted into the list. Locations can be deleted, too, from arbitrary positions within the list. To access a scalar value stored in a list variable, you need both the variable's name and an integer list element index.
Table variables consist of a "dynamically sized ordered collection of lists of locations" (whew!). In plainer English, a table variable is a resizable list of rows, and each row is a resizable list of columns. Each column of a row is a separate location, containing some scalar value. To access a scalar value stored in a table variable, you need the variable's name, an integer row index, and an integer column index.
Lists and Tables Not Values
Don't make the mistake of thinking that a list variable is a variable whose value is a list.
There is no such thing as a value which is a list: all values in namespaces are immutable scalars. Lists, by contrast, can be modified and are composite: they contain N locations, each containing a separate scalar.
Think instead that some variables happen to be list structured (or array structured or whatever) -- instead of consisting of a single location, they happen to consist of a modifiable list of locations. The list in this equation is part of the variable -- not part of the value stored in the variable.
Got it?
Note: Please pay special attention to the function `namespace_copy', documented below. Understanding its semantics is vital to understanding how to use namespaces effectively.
Function: {`namespace_rename'}
Prototype:
int namespace_rename (ssize_t nspace, t_uchar * old_name, ssize_t old_scope, t_uchar * new_name, ssize_t new_scope);
Description:
Change the name and scope of a variable. If the old and new names or scopes differ, the old name becomes (in effect) a singleton variable bound to nil and the new name is bound to the variable formerly bound to the old name.
Function: {`namespace_copy'}
Prototype:
int namespace_copy (t_uchar * to_name, ssize_t to_scope, ssize_t nspace, t_uchar * from_name, ssize_t from_scope);
Description:
If the from variable is a singleton variable, then make the to variable a singleton variable containing an equal scalar value.
If the from variable is a list or table variable, then the to variable is made to be a reference to that same list or table. By reference, I mean that modifications made to either variable are visible as modifications to both -- they refer to the same underlying list or table.
Although two variables can refer to the same list or table, nevertheless, each list or table specifically "belongs" to one variable in particular. If that variable is destroyed or converted to some other kind of variable, then the list or table is destroyed. When that happens, all other variables that refer to the same list or table are implicitly converted into singleton variables, containing the value nil.
In other words, if you namespace_copy variable A to variable B, and A was a list variable at the time, then:
1. modifications to the A list affect the B list and vice versa.
2. if A is destroyed or is converted to some other kind of variable, then B becomes a singleton variable, initialized to the value nil.
3. if B is destroyed or is converted to some other kind of variable, on the other hand, A is unaffected.
In effect, A has been copied to B by reference with the caveat that, using our namespace interfaces, the representation of references is "safe" (e.g., can't result in de-referencing invalid pointers).
Namespace "Addresses" (aka Indexes)
Locations within a namespace are analogous to byte locations within the memory of a general purpose computer: they can contain a simple "scalar" value and they have an address.
Namespace location addresses are the topic of this section.
To avoid confusion over the word "address", the actual name we use for namespace location addresses is namespace indexes.
* Type {`t_namespace_index'}
Prototype:
typedef <unspecified> t_namespace_index;
Description:
The type of address-like namespace indexes.
A namespace index functions similarly to an address: given a namespace and a namespace index, a unique (although possibly non-existent) location is referred to.
Given an index (and its namespace), a program can read and write the contents of the named location --- in that way, an index functions similarly to a pointer.
Unlike pointers, namespace indexes are reliably bounds checked. If your program has bugs, dereferencing or changing the location named by an index might return unexpected data or store data in an unintended part of the namespace --- but at least the namespace data structure will remain internally consistent. You won't wind up dereferencing an invalid C pointer, for example.
* Function {`namespace_index'}
Prototype:
int namespace_index (t_namespace_index * index_ret, ssize_t nspace, t_uchar * var_name, ssize_t scope);
Description:
Fill in *index_ret with an index that refers to the singleton location bound to var_name in the indicated scope.
Normally, return 0.
Upon catastrophic error, return a value less than 0.
* Function {`namespace_list_index'}
Prototype:
int namespace_list_index (t_namespace_index * index_ret, ssize_t nspace, t_uchar * var_name, ssize_t scope, ssize_t list_pos);
Description:
Fill in *index_ret with an index that refers to the list element location bound to var_name in the indicated scope, at list offset list_pos.
Normally, return 0.
Upon catastrophic error, return a value less than 0.
* Function {`namespace_array_index'}
Prototype:
int namespace_array_index (t_namespace_index * index_ret, ssize_t nspace, t_uchar * var_name, ssize_t scope, ssize_t row, ssize_t col);
Description:
Fill in *index_ret with an index that refers to the array element location bound to var_name in the indicated scope, at array position row, col.
Normally, return 0.
Upon catastrophic error, return a value less than 0.
Setting and Getting Scalars Stored in Locations
Namespace indexes give us a way to translate location names within a namespace into a form of "address" for the indicated location. The functions in this section let you read or write the scalar stored in a given location.
Scalar values may be numbers, strings, symbols, booleans, or the value nil.
The Value nil
* Function: {`namespace_is_nil'}
Prototype:
int namespace_is_nil (ssize_t nspace, t_namespace_index index);
Description:
Return 1 if the indicated location exists and contains nil, 0 otherwise.
Return a value less than 0 upon catastrophic error.
* Function: {`namespace_set_to_nil'}
Prototype:
int namespace_set_to_nil (ssize_t nspace, t_namespace_index index);
Description:
Store nil in the location indicated by index.
If the indicated location does not currently exist, return 1, otherwise return 0.
(Except) return a value less than 0 for catastrophic errors.
Number Values
* Function: {`namespace_is_number'}
Prototype:
int namespace_is_number (ssize_t nspace, t_namespace_index index);
Description:
Return 1 if the indicated location exists and contains a number, 0 otherwise.
Return a value less than 0 upon catastrophic error.
* Function: {`namespace_set_to_int32'}
Prototype:
int namespace_set_to_int32 (ssize_t nspace, t_namespace_index index, t_int32 new_value);
Description:
Store new_value in the location indicated by index.
If the indicated location does not currently exist, return 1, otherwise return 0.
(Except) return a value less than 0 for catastrophic errors.
* Function: {`namespace_get_int32'}
Prototype:
int namespace_get_int32 (t_int32 * n_ret, ssize_t nspace, t_namespace_index index);
Description:
Retrieve the value stored in the location addressed by index, presuming that that location exists and contains a number representable as a 32-bit integer. Return 0 in this case.
If the location does not exist or contains a non-number, return a value greater than 0.
Upon catastrophic error, return a value less than 0.
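The typed getters all follow the same return convention, which can be sketched over a bare tagged location. Everything here is an assumption made for illustration: struct location, location_get_int32 and the specific positive codes are hypothetical, int32_t from <stdint.h> stands in for t_int32, and a NULL argument is used merely as a stand-in for a catastrophic condition. The draft also does not say what happens when a number does not fit in 32 bits; this sketch treats that as a recoverable error.

```c
#include <assert.h>
#include <stdint.h>

enum scalar_kind { SCALAR_NIL, SCALAR_NUMBER, SCALAR_BOOLEAN };

/* Hypothetical stand-in for "the location addressed by an index". */
struct location
{
  int exists;
  enum scalar_kind kind;
  int64_t number;
};

/* Returns 0 and fills *n_ret if loc exists and holds a number that fits
   in 32 bits; > 0 for recoverable problems; < 0 for the (simulated)
   catastrophic case. */
int
location_get_int32 (int32_t * n_ret, const struct location * loc)
{
  if (!n_ret || !loc)
    return -1;                        /* stand-in for a catastrophic error */
  if (!loc->exists || loc->kind != SCALAR_NUMBER)
    return 1;                         /* missing location or non-number */
  if (loc->number < INT32_MIN || loc->number > INT32_MAX)
    return 2;                         /* assumption: out of range is recoverable */
  *n_ret = (int32_t) loc->number;
  return 0;
}

/* Returns 1 if every case behaves as described. */
int
demo_get_int32 (void)
{
  struct location num = { 1, SCALAR_NUMBER, 42 };
  struct location big = { 1, SCALAR_NUMBER, INT64_C (5000000000) };
  struct location nil = { 1, SCALAR_NIL, 0 };
  struct location missing = { 0, SCALAR_NIL, 0 };
  int32_t n = 0;

  return location_get_int32 (&n, &num) == 0 && n == 42
         && location_get_int32 (&n, &big) > 0
         && location_get_int32 (&n, &nil) > 0
         && location_get_int32 (&n, &missing) > 0
         && location_get_int32 (0, &num) < 0
         && n == 42;                  /* failed calls leave *n_ret untouched */
}
```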
Boolean Values
* Function: {`namespace_is_boolean'}
Prototype:
int namespace_is_boolean (ssize_t nspace, t_namespace_index index);
Description:
Return 1 if the indicated location exists and contains a boolean, 0 otherwise.
Return a value less than 0 upon catastrophic error.
* Function: {`namespace_set_to_boolean'}
Prototype:
int namespace_set_to_boolean (ssize_t nspace, t_namespace_index index, int new_value);
Description:
Store !!new_value in the location indicated by index.
If the indicated location does not currently exist, return 1, otherwise return 0.
(Except) return a value less than 0 for catastrophic errors.
* Function: {`namespace_get_boolean'}
Prototype:
int namespace_get_boolean (int * bool_ret, ssize_t nspace, t_namespace_index index);
Description:
Retrieve the 0-or-1 value stored in the location addressed by index, presuming that that location exists and contains a boolean. Return 0 in this case.
If the location does not exist or contains a non-boolean, return a value greater than 0.
Upon catastrophic error, return a value less than 0.
Symbol Values
* Function: {`namespace_is_symbol'}
Prototype:
int namespace_is_symbol (ssize_t nspace, t_namespace_index index);
Description:
Return 1 if the indicated location exists and contains a symbol, 0 otherwise.
Return a value less than 0 upon catastrophic error.
* Function: {`namespace_set_to_symbol'}
Prototype:
int namespace_set_to_symbol (ssize_t nspace, t_namespace_index index, t_uchar * symbol);
Description:
Store symbol in the location indicated by index.
symbol should be a string returned by identifier_intern (in libhackerlab). It is an undetected error if it is not. Therefore, most programs should stick to namespace_set_to_symbol_str.
If the indicated location does not currently exist, return 1, otherwise return 0.
(Except) return a value less than 0 for catastrophic errors.
* Function: {`namespace_set_to_symbol_str'}
Prototype:
int namespace_set_to_symbol_str (ssize_t nspace, t_namespace_index index, t_uchar * symbol_name);
Description:
Intern the symbol named by 0-terminated symbol_name and store the resulting symbol in the location indicated by index.
If the indicated location does not currently exist, return 1, otherwise return 0.
(Except) return a value less than 0 for catastrophic errors.
* Function: {`namespace_get_symbol'}
Prototype:
int namespace_get_symbol (t_uchar * identifier_ret, ssize_t nspace, t_namespace_index index);
Description:
Retrieve the symbol value stored in the location addressed by index, presuming that that location exists and contains a symbol. Return 0 in this case.
If the location does not exist or contains a non-symbol, return a value greater than 0.
Upon catastrophic error, return a value less than 0.
String Values
* Function: {`namespace_is_string'}
Prototype:
int namespace_is_string (ssize_t nspace, t_namespace_index index);
Description:
Return 1 if the indicated location exists and contains a string, 0 otherwise.
Return a value less than 0 upon catastrophic error.
* Function: {`namespace_set_to_string_str'}
Prototype:
int namespace_set_to_string_str (ssize_t nspace, t_namespace_index index, t_uchar * str);
Description:
Store a copy of the 0-terminated string str in the location indicated by index.
If the indicated location does not currently exist, return 1, otherwise return 0.
(Except) return a value less than 0 for catastrophic errors.
* Function: {`namespace_get_string_str_n'}
Prototype:
int namespace_get_string_str_n (t_uchar * str_ret, ssize_t * len_ret, ssize_t nspace, t_namespace_index index);
Description:
Retrieve the string value stored in the location addressed by index, presuming that that location exists and contains a string. Return 0 in this case.
If the location does not exist or contains a non-string, return a value greater than 0.
Upon catastrophic error, return a value less than 0.
Namespace Buffers
libhackerlab provides the module hackerlab/buffers -- a data structure for editable strings supporting "markers".
In particular, hackerlab/buffers/buffers.h provides for "buffer sessions" -- flat namespaces of explicitly allocated and freed buffers.
Every namespace has an associated buffer session:
Function: {`namespace_buffer_session'}
Prototype:
ssize_t namespace_buffer_session (ssize_t nspace);
Description:
Return the buffer session id associated with the indicated namespace or a value less than 0 upon error.
Return values less than 0 do not signal catastrophic errors. This function can not result in a catastrophic error.
Namespace Graphs
libhackerlab provides the module hackerlab/graphs/ -- a data structure for editable directed graphs.
The namespace data structure permits programs to allocate graphs which, if not otherwise freed, are guaranteed to be freed when the namespace itself is freed:
Function: {`namespace_alloc_digraph'}, {`namespace_free_digraph'}
Prototypes:
ssize_t namespace_alloc_digraph (ssize_t nspace);
int namespace_free_digraph (ssize_t nspace, ssize_t digraph);
Description:
Allocate (or free) a digraph associated with namespace nspace.
Such graphs are automatically freed, if they have not already been explicitly freed, when the namespace is freed.
Namespace Descriptors and Subprocesses
function prototypes not provided
Similarly, namespaces provide for certain file descriptors to be automatically closed and for certain subprocesses to be killed and reaped when a namespace is freed.
Virtual Threads
Recall that, within a namespace, a scope consists of symbtabs, a list of symbol tables, and current_list_pos, an index into the symbol table list.
Operations such as namespace_push_scope allow us to use scopes as a kind of "call frame stack". A function can save part of its caller's bindings, install its own, then later restore the caller's bindings (for example).
The gist is that within each scope, there can be multiple symbol tables, and which symbol table is current can change over time.
We can usefully repeat that abstraction at the next higher level. Instead of just saving and restoring individual symbol tables (aka, independent collections of bindings), we can instead save and restore entire sets of scopes.
A namespace thread is a data structure for holding a saved set of scopes. Programs can move the current values of any selected subset of a namespace's scopes to a thread object. In the namespace, each moved scope is replaced by an empty scope, containing no bindings. Programs can also restore the values of scopes from a thread: that discards the scopes replaced by those being restored and it leaves the thread object "empty".
Function: {`namespace_alloc_thread'}, {`namespace_free_thread'}
Prototypes:
ssize_t namespace_alloc_thread (ssize_t nspace);
int namespace_free_thread (ssize_t nspace, ssize_t thread);
Description:
Allocate (or free) a namespace thread within nspace.
Function: {`namespace_freeze'}, {`namespace_thaw'}
Prototypes:
int namespace_freeze (ssize_t nspace, ssize_t thread, int n_scopes, ssize_t * scope_v);
int namespace_thaw (ssize_t nspace, ssize_t thread);
Description:
Save (or restore) the indicated scopes in a namespace thread.
Error Codes
not written yet
notes:
rbcollins and I talked about
struct error_code
{
  ssize_t error_class;
  ssize_t error_index_in_class;
};
The APIs above assume single integer error codes, divided into negative and positive codes.
The APIs are returning error_index_in_class. If the caller knows what class of errors the callee can produce (and we expect callees, by convention, to produce only one class of error each) then the caller can form the complete struct error_code.
If the caller doesn't know the error class, then it is significant if the error code is non-0 and sometimes significant if a non-0 code is positive or negative.
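Forming the complete error code from a bare return value might look like this. A sketch only: the class numbers and the helpers make_error_code and error_is_catastrophic are hypothetical, and treating a 0 return as "no error class" is an assumption of this example rather than something the notes decide.

```c
#include <assert.h>
#include <sys/types.h>   /* ssize_t (POSIX) */

struct error_code
{
  ssize_t error_class;
  ssize_t error_index_in_class;
};

/* Hypothetical class numbers, for illustration only. */
enum { ERRCLASS_NONE = 0, ERRCLASS_NAMESPACE = 1 };

/* Combine the callee's (known, single) error class with the integer
   it returned.  A return of 0 means success, so no class applies. */
struct error_code
make_error_code (ssize_t callee_class, int ret)
{
  struct error_code e;

  e.error_class = (ret == 0) ? ERRCLASS_NONE : callee_class;
  e.error_index_in_class = ret;
  return e;
}

/* Even without knowing the class, the sign is significant:
   negative codes are catastrophic. */
int
error_is_catastrophic (struct error_code e)
{
  return e.error_index_in_class < 0;
}
```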
Basic Namespace Utils
The earlier sections have built up quite a bit of structure in namespaces.
This section describes a set of "namespace utility functions" that can be built on the above. It would be tedious to make a complete list of all desirable utility functions .. just a few samples to illustrate:
List Operations
Namespace variables can be list variables. Given that, it's convenient to have functions like:
int namespace_list_append (t_namespace_index append_to_list, ssize_t nspace, t_namespace_index append_from_list);
which, if both indexed variables are lists, appends a copy of the from_list to the to_list.
Relational Operations
Given two variables which are tables:
int namespace_join (t_namespace_index output_table,
                    ssize_t nspace,
                    ssize_t join_column,
                    t_scalar_comparison_fn (*cmp)(),
                    void * cmp_rock,
                    t_namespace_index table_a,
                    t_namespace_index table_b,
                    t_join_field_spec output_field,
                    ...);
and so forth.
E.g., basic string/list/table ops.
form of int fn (output_var_specs, nspace, input_var_specs + params);
Entry Points and Calling Conventions
A thoroughly "librified" libarch should include provisions which make its entry points easily accessible to the run-time environments of scripting languages (and the like).
Such access to entry points generally requires:
1. A facility for finding and invoking entry points by symbolic name. Many of the most convenient ways to make libarch entry points available as functions in a scripting language involve having the ability to look up the list of (symbolic names for) available entry points at run time, and to be able to invoke an entry point given only its name.
2. A facility for marshalling parameters and collecting return values from entry points, using a generic mechanism. If every entry point in libarch has its own C function type, then calling those entry points from a scripting language involves a lot of (programming) work. Each such entry point must be "wrapped", either by hand or using a tool such as Swig. It is simpler if there is a generic way to collect the arguments for or return values from a libarch entry point; in other words, if scripting language bindings to libarch can get by with a single, generic wrapper that works for all entry points rather than N+1 wrappers, one for each separate entry point.
3. Useful invariants and error handling. libarch entry points need to make reasonably strong and universal guarantees. For example, absent a catastrophic error, they should neither leak resources nor ever leave the internal namespace data structures in an inconsistent state.
How can we do that?
Just What is an Entry Point
For simplicity, I take the view that libarch in tla-2.0 should function as a kind of extra-fancy Turing machine. Recall that a Turing machine has two parts: a finite state machine defining the computational steps the machine can take; an "infinite tape" which serves as the "memory" for the computation run by the Turing machine.
In libarch's case, I regard a namespace data structure as taking the place of our "infinite tape". Namespaces are similar to Turing tapes in many ways: they have a simple topological structure and are divided up into locations, each of which contains a scalar value.
Namespaces are different from Turing tapes in some of their arbitrary details. For example, namespaces divide their storage into "scopes" and each scope has a list of symbol tables. That's much more complicated than Turing's 1-D tape but the added complexity also adds realism: symbol tables are easier to program than the 1-d tape, even if they are logically equivalent; scopes are cheap to implement and handy, even if on a 1-d tape they would be absurdly expensive to simulate. A namespace is a 1-d tape modified in response to a bunch of pragmatic considerations.
If a namespace takes the place of the infinite tape, then the entry points in libarch take the place of the finite state machine.
Indeed, although libarch may include some static data as a performance optimization, from the perspective of its API, libarch in 2.0 will be completely "stateless" --- all persistent state between libarch calls will be stored in a namespace, not in libarch itself.
A collection of stateless C entry points is, indeed, a form of finite state machine.
A Single Entry Point
libarch can get by with a single entry point (although doing so is not literally proposed).
* Function: {`arch_run'}
Prototype:
int arch_run (ssize_t nspace, int (*poll)(void *), void * poll_rock);
Description:
Perform a single libarch state transition. Usually this means invoking the command selected by the current state of the namespace nspace.
As a side effect, the indicated namespace is modified to reflect the results of the state transition.
Normally return 0.
Returns a value less than 0 upon catastrophic error.
Returns a value greater than 0 upon recoverable error (such as there being no currently defined transition).
The poll parameter may be 0 or a user-supplied "poll function". arch_run is free to (not required to) periodically call poll. If poll returns non-0, arch_run will attempt to return to its caller as quickly as possible, even if that entails returning a (recoverable) error.
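To illustrate the poll protocol, here is a sketch of a poll function with the int (*)(void *) shape, together with a toy loop that honours it. Note that sketch_run is a hypothetical stand-in and is not arch_run: it takes a step count instead of a namespace so that the example is self-contained. The budget-based poll function is likewise just one possible policy.

```c
#include <assert.h>

/* The "rock" passed through to the poll function: a simple call budget. */
struct poll_budget { int calls_left; };

/* A poll function of the shape arch_run accepts.  Returns non-0 once
   the budget is exhausted, asking the callee to return promptly. */
int
budget_poll (void * rock)
{
  struct poll_budget * b = rock;

  if (b->calls_left <= 0)
    return 1;
  b->calls_left -= 1;
  return 0;
}

/* Hypothetical stand-in for a long-running transition (NOT arch_run).
   Performs up to `steps` units of work, polling before each unit.
   Returns 0 on completion, or 1 (a recoverable error) if interrupted. */
int
sketch_run (int steps, int * steps_done, int (*poll) (void *), void * poll_rock)
{
  int i;

  *steps_done = 0;
  for (i = 0; i < steps; ++i)
    {
      if (poll && poll (poll_rock))
        return 1;
      *steps_done += 1;
    }
  return 0;
}

/* Returns 1 if a budget of 3 interrupts a 10-step run after 3 steps,
   and a 0 poll function lets the run complete. */
int
demo_poll (void)
{
  struct poll_budget b = { 3 };
  int done = 0;
  int interrupted = sketch_run (10, &done, budget_poll, &b);
  int done_unpolled = 0;
  int completed = sketch_run (10, &done_unpolled, 0, 0);

  return interrupted == 1 && done == 3
         && completed == 0 && done_unpolled == 10;
}
```

A real client would more likely have its poll function check a cancellation flag or a deadline than count calls.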
Provided that a client program suitably modifies the namespace
nspace
before each call to arch_run
, that can be a complete
interface. (I'm assuming, of course, that the client has the
separate namespace
interface available to set up parameters before
calling arch_run
and read back results after arch_run
returns.)
The arch_run
Calling Convention
Upon a call to arch_run
:
The Parameter Variable argv
The namespace variable named "argv"
in the standard
scope namespace_params()
must be initialized
much like an argv
you would pass to exec(2)
:
The namespace argv
variable must be a list variable.
argv[0]
must be the symbolic name of a libarch
"state
transition" entry point. Roughly, this should correspond to
a tla 1.X
subcommand name.
argv[1..n]
may contain a list of options and arguments
to the entry point named in argv[0]
.
The Standard Error Buffer
The namespace variable named "stderrbuf"
in the
standard scope namespace_globals()
may contain
a non-negative integer. If so, that integer is taken to
be the buffer id of the "standard error buffer", allocated
from the namespace's buffer set.
Within libarch
, code that wants to generate an error message
should prepend that message to the "stderrbuf"
. If the
buffer is not empty before prepending the new message, libarch
code should first prepend "\n\n---\n\n"
to the buffer.
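The prepend-with-separator rule can be sketched on an ordinary heap-allocated C string. This is only an illustration: the real "stderrbuf" lives in the namespace's buffer set, whose interface is not shown in this section, so errbuf_prepend and ERR_SEPARATOR are hypothetical names and the char * is a stand-in for a namespace buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ERR_SEPARATOR "\n\n---\n\n"

/* Prepend `msg' to the heap-allocated string `*buf', which stands in
 * for the buffer named by "stderrbuf".  `*buf' may be 0 (empty buffer).
 * If the buffer is not empty, the separator goes between the new
 * message and the older contents.
 *
 * Returns 0 on success, -1 on allocation failure (buffer unchanged).
 */
int
errbuf_prepend (char ** buf, const char * msg)
{
  const char * old;
  size_t old_len;
  size_t msg_len;
  size_t sep_len;
  char * fresh;

  assert (buf != 0 && msg != 0);

  old = (*buf ? *buf : "");
  old_len = strlen (old);
  msg_len = strlen (msg);
  sep_len = (old_len ? strlen (ERR_SEPARATOR) : 0);

  fresh = malloc (msg_len + sep_len + old_len + 1);
  if (!fresh)
    return -1;

  memcpy (fresh, msg, msg_len);
  if (sep_len)
    memcpy (fresh + msg_len, ERR_SEPARATOR, sep_len);
  memcpy (fresh + msg_len + sep_len, old, old_len + 1);   /* copies the final 0 byte too */

  free (*buf);
  *buf = fresh;
  return 0;
}
```

Because messages are prepended, the most recent error always appears first when the buffer is read from the start, with older messages below it.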
The Standard Output Buffer
Similarly, the namespace variable named "stdoutbuf"
in the
standard scope namespace_globals()
may contain
a non-negative integer. If so, that integer is taken to
be the buffer id of the "standard output buffer", allocated
from the namespace's buffer set.
libarch
code can generate "normal output" by appending to this
buffer.
The Standard Input Buffer
Similarly, the namespace variable named "stdinbuf"
in the
standard scope namespace_globals()
may contain
a non-negative integer. If so, that integer is taken to
be the buffer id of the "standard input buffer", allocated
from the namespace's buffer set.
libarch
code can read "default input" by consuming the contents
of this buffer.
The Return Variables retv
and status
The namespace variable "retv"
in the standard scope
namespace_returns()
is used symmetrically to "argv"
.
Upon return from arch_run
, "retv"
will be a list variable,
containing the 0 or more "returned values" from the entry
point (regarded as a function call).
Upon return, the variable "status"
, also in the
namespace_returns()
scope, will be set to an integer
value: the same integer returned from arch_run
.
Callee-Preserves Locals
Upon return, libarch
will not have changed any variable values in
the standard scope namespace_locals()
.
If libarch
wants to use the locals
scope internally, it will
generally do so by "pushing" (namespace_push_scope
) that scope
on entry to arch_run
and popping the scope, to return the caller's
bindings, before return.
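The push-on-entry, pop-before-return discipline can be sketched with a toy model in which the locals scope holds a single integer per scope level. All of the demo_ names are hypothetical, and the toy stack is an assumption made for illustration; it only imitates the effect of namespace_push_scope and namespace_pop_scope, whose real signatures are not given in this section.

```c
#include <assert.h>

#define DEMO_MAX_DEPTH 16

/* Toy stand-in for the locals scope: one integer variable per level. */
static int demo_locals[DEMO_MAX_DEPTH];
static int demo_depth = 0;              /* index of the current level */

/* Open a fresh level.  Returns 0 on success, -1 if the stack is full. */
int
demo_push_scope (void)
{
  if (demo_depth + 1 >= DEMO_MAX_DEPTH)
    return -1;
  demo_depth += 1;
  demo_locals[demo_depth] = 0;
  return 0;
}

/* Discard the current level, re-exposing the caller's binding.
 * Returns 0 on success, -1 if there is nothing to pop.
 */
int
demo_pop_scope (void)
{
  if (demo_depth == 0)
    return -1;
  demo_depth -= 1;
  return 0;
}

void
demo_set_local (int value)
{
  demo_locals[demo_depth] = value;
}

int
demo_get_local (void)
{
  assert (demo_depth >= 0 && demo_depth < DEMO_MAX_DEPTH);
  return demo_locals[demo_depth];
}

/* An entry point that uses the locals scope as scratch space, yet
 * leaves the caller's local untouched.
 */
int
demo_entry_point (int input, int * output)
{
  if (0 != demo_push_scope ())
    return -1;

  demo_set_local (input * 2);           /* scratch use of the local */
  *output = demo_get_local () + 1;

  if (0 != demo_pop_scope ())
    return -1;
  return 0;
}
```

The point of the sketch is the bracketing: whatever the entry point writes between the push and the pop is invisible to the caller afterwards.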
Callee-Preserves Parameters
The namespace_params()
scope is preserved similarly to the locals
scope.
Registering Commands
The 0
element of the namespace variable "argv"
in the
scope namespace_params()
contains the name of the command
to be invoked by arch_run
.
How is that name translated into an actual choice of which code to run?
* Function: {`arch_register_command'}
Prototype:
int arch_register_command (t_uchar * name, int (*fn) (ssize_t nspace, void * rock), void * fn_rock);
Description:
Remember that fn (provided the fn_rock argument) implements the libarch entry point of the indicated name.
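One plausible way to translate a command name into code is a small table searched by name. The sketch below is an assumption about how such a registry might work, not the planned implementation: the demo_ names, the fixed table size, and the use of const char * in place of t_uchar * are all choices made for this illustration. The callback signature and the "greater than 0 means recoverable error" rule for an unknown command follow the text.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>                  /* ssize_t (POSIX) */

#define DEMO_MAX_COMMANDS 64

typedef int (*demo_command_fn) (ssize_t nspace, void * rock);

struct demo_command
{
  const char * name;
  demo_command_fn fn;
  void * fn_rock;
};

static struct demo_command demo_commands[DEMO_MAX_COMMANDS];
static int demo_n_commands = 0;

/* Return the table slot holding `name', or -1 if it is not registered. */
static int
demo_find_command (const char * name)
{
  int i;

  for (i = 0; i < demo_n_commands; ++i)
    if (!strcmp (demo_commands[i].name, name))
      return i;
  return -1;
}

/* Register (or re-register) `name'.  The caller keeps `name' alive.
 * Returns 0 on success, -1 if the table is full.
 */
int
demo_register_command (const char * name, demo_command_fn fn, void * fn_rock)
{
  int slot;

  assert (name != 0 && fn != 0);
  slot = demo_find_command (name);
  if (slot < 0)
    {
      if (demo_n_commands == DEMO_MAX_COMMANDS)
        return -1;
      slot = demo_n_commands++;
    }
  demo_commands[slot].name = name;
  demo_commands[slot].fn = fn;
  demo_commands[slot].fn_rock = fn_rock;
  return 0;
}

/* Run the command called `name', handing it the rock it was registered
 * with.  Returns the command's result, or 1 (recoverable error) if no
 * such command is defined.
 */
int
demo_dispatch (const char * name, ssize_t nspace)
{
  int slot = demo_find_command (name);

  if (slot < 0)
    return 1;
  return demo_commands[slot].fn (nspace, demo_commands[slot].fn_rock);
}

/* Example command: records the namespace id in the int that `rock' points to. */
int
demo_echo_fn (ssize_t nspace, void * rock)
{
  *(int *)rock = (int)nspace;
  return 0;
}
```

In this picture, something like demo_dispatch would be called with the string found in element 0 of "argv". A linear search is enough for a few dozen commands; a hash table would be the obvious replacement if the command set grew large.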
* Listing Commands
Not Illustrated: functions for listing the available commands and perhaps conventions for linking them to help messages and into help categories.
arch_run
Illustrated
If your module defines a new libarch
entry point (say, "my-id"
)
then during initialization, you'll need something like:
arch_run
Initialization Illustrated
    if (0 != arch_register_command ("my-id", my_id_fn, (void *)0))
      ... uh-oh, catastrophic initialization error ...;
arch_run
Client Interface Illustrated
A libarch
client can call your new entry point in a style
reminiscent of using fork
and exec
:
    ssize_t namespace;
    t_namespace_index argv0;
    t_namespace_index retv0;
    t_uchar * my_id;
    ssize_t my_id_len;

    namespace = alloc_namespace ();
    if (namespace < 0)
      ... catastrophic error ...;

    if (0 != namespace_list_index (&argv0, namespace, "argv", namespace_params(), 0))
      ... catastrophic error ...;

    if (0 != namespace_set_to_string (argv0, namespace, "my-id"))
      ... catastrophic error ...;

    /* No poll function: pass 0 for `poll' and `poll_rock'. */
    if (0 != arch_run (namespace, 0, 0))
      ... some kind of error during the run of `my-id' ...;

    if (0 != namespace_list_index (&retv0, namespace, "retv", namespace_returns(), 0))
      ... catastrophic error ...;

    if (0 != namespace_get_string_str_n (&my_id, &my_id_len, namespace, retv0))
      ... catastrophic error ...;

    /* We just called `my-id' with no parameters and got back
     * the string value `my_id', of length `my_id_len'.
     *
     * That string pointer remains valid until the namespace
     * binding of `"retv"' changes, for any reason.
     */
arch_run
Internal Interfaces Illustrated
Finally, here is what your implementation of my-id
might look like:
    int
    my_id_fn (ssize_t nspace, void * rock)
    {
      t_uchar * id_string;
      int answer;

      if (1 != namespace_list_length (nspace, "argv", namespace_params()))
        {
          ... my id was called with bogus parameters ...;
          ... spew an error message to `stderrbuf' ...;
          ... then return with a non-0 exit code: ...;
          return arch_return_from_run (2);
        }

      id_string = low_level_call_to_compute_my_id ();
      if (!id_string)
        {
          ... spew an error message to `stderrbuf' ...;
          return arch_return_from_run (1);
        }

      answer = namespace_list_set_to_string (nspace, "retv", namespace_returns(), 0, id_string);
      free (id_string);
      return answer;
    }
my_id
turns out to be a particularly simple example.
A more complicated example might, for example, need to use
namespace local variables. That would involve calling
namespace_push_scope
to save the scope namespace_locals()
on entry, and calling namespace_pop_scope
to restore that
scope before returning.
Copyright
Copyright (C) 2004 Tom Lord
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
See the file COPYING
for further information about
the copyright and warranty status of this work.