Design/Examples
This page is for working out some sample programs that we'd like to run in the new version. Please illustrate how to handle the new features in the concrete syntax, and how to get at them from the C++ side. You may wish to borrow from Examples as well as from your own code in Dyna 0.3.
Comments on problems that these programs present to the compiler, and solutions, are also welcome.
At the bottom of this page are listed some topics that we should try to cover in these examples.
Contents 
Matrix multiplication
Here's our inference rule:
c(I,K) += a(I,J) * b(J,K).
Should also show how to explicitly request forward chaining, backward chaining, memoization, etc. This is probably done by declarations that apply to the consequent c
.
We also need at least one declaration, which would typically be written above the rule. The version below says that the axiom
type, which is an automatically declared union over [a/2,b/2]
, has values that are doubles and default to 0.
: double axiom := 0.
Possibly a better way of writing this default is to sayaxiom X := 0. % or more idiomatically, axiom _ := 0.which assigns 0 to all items
X
of typeaxiom
. This nonground assignment can be partially overridden by assignments such asb(3,4) := 5
.
The following declarations are optional, given the
declaration of axiom
above, but they may improve efficiency.
: double a(int X, int Y). : double b(int X, int Y). : double c(int X, int Y).
We should also show optional storage declarations that get various efficient versions of dense and sparse arrays.
Here are some alternative ways that one is allowed to write the double a(int X, int Y)
declaration:
: double a(int X, int Y). : _ a(_ X, _ Y). % Using _ for a type means to inherit it (double) or infer it (int). : double a(int _, int _). % Using _ for a var name means to generate a onetime var. : _ a(_ _, _ _). % We can do both. : _ a(int, Y). % _ can be omitted where syntactically valid. : _ a(_,_). % This is as much as we can syntactically omit ... : _ a/2. % ... unless we do this!
Neural networks
We begin with some optional declarations. Let's suppose we want to name nodes by arbitrary terms (not necessarily ints or strings). For readability and typesafety, however, we want to introduce an explicit node
type. So let's require the terms to be of the form node(Name)
where Name
is a term:
: void node(term Name) storedas Name.
The void
return type means that nodes don't themselves have values. The optional storedas
declaration ensures that there is no storage overhead for the node(...)
wrapper. Roughly speaking, any term may be stored as any other term.
Now let's optionally declare some item types with node
arguments.
(Remark: A node
term itself is not regarded as an item, since it is neither an axiom or a theorem.)
% axiom declarations : double input(node) := 0. % how much external input to this node? : double weight(From, To) := 0. % arguments are inferred to be nodes. : double target(node). % is there an external target for this node?
% theorem declarations : double in(node). : double out(node).
Note that we could have given storage declarations for these, too. For example,
double in(node X) storedas X. double weight(From, To) storedas (To,From).
Should probably spin out this example further. Maybe input nodes are a subtype of node, and only they reserve space to store input.
No default value was specified for the target
axiom, so it defaults to undef. Asking for the value of an undef item in C++ throws an exception, and such an item cannot be used to help derive other items.
Here are the inference rules. Recurrent networks (involving cycles in the weight graph) are okay without any special handling.
in(To) += input(To). in(To) += out(From) * weight(From,To). % the compiler would warn if you accidentally wrote weight(To,From) out(Node) = sigmoid(in(Node)). error += (out(Node)target(Node))**2.
While we had to specify default values for axioms (or allow them to be undef, which would mostly work fine in this example), we cannot specify them for theorems. Theorems' values are always defined by rule, including in the default case:

error
is defined to be a sum over expressions that default to undef (sincetarget(Node)
does). Thus, if there are no nondefault summands, the sum is 0 (the identity for+
). 
in(To)
is defined to be a sum over infinitely many products that default to 0. Thus, if there are no nondefault summands, the sum is again 0 (the sum of infinitely many zeroes). 
out(Node)
is defined to besigmoid(in(Node))
, so it issigmoid(0)
(i.e.,0.5
) ifin(Node)
is still 0. (Ifin(Node)
were undef for some reason, thenout(Node)
would be too, since its aggregator is=
.)
A training algorithm would try to minimize error
. This declaration says that it is the weight
nodes that are trainable, so that the compiler can generate code to help train them.
: trainable(weight).
This declaration adds weight/2
to the union type trainable
, which is a possibly empty subtype of axiom
.
It might be better if we could find a way to do without thetrainable
declaration, instead allowing any subtype ofaxiom
to be trained at runtime. Perhaps thetrainable
declaration is simply an optional pragma suggesting that the compiler should generate efficient support for trainingweight
.
We also need to define sigmoid
, which appeared above. Let's say that the value of a sigmoid/1
term is a double
computed by a foreign function. For example, the value of sigmoid(0.5*0.5)
is computed by looking up the value of the term 0.5*0.5
(namely 0.25
) and calling the foreign function on the result.
: foreign double sigmoid(double =X).
The type double =
is the type of expressions X
with double
value  in other words, =X
is a double
.
We still need a way to specify the C++ name of the foreign function, as well as any algebraic properties, such as the fact that it distributes overmax
ormax=
. The foreign function can't simply be calledsigmoid
, because (barring renaming by acname
declaration) the namesigmoid
in C++ will already be a constructor function used to construct a C++ term object, of typesigmoid_t
, such assigmoid(times(0.5,0.5))
.
It would be consistent for Dyna to pass the expression X
to the foreign function. However, as special handling for declared =
arguments to foreign functions, we instead pass the value =X
. Then the foreign function is only allowed to depend on =X
, not X
itself, and does not need access to the chart, which foreign functions probably should never have.
This special handling does make a problem for a foreign function likederivative
, which really does want to accept adouble =
expression and not its value:: foreign double =derivative(double =Formula).Here however is a possible workaround using a typedef so that the argument isn't syntactically a
double =
:: double_expr ~ double =. : foreign double_expr derivative(double_expr).Another, hardertouse workaround uses a wrapper, so that the
double =
is not a direct argument ofderivative
but only an argument to an argument:: void formula(double =). : foreign formula derivative(formula).
Another issue: Probably we want to regard0.5*0.5
and likewisesigmoid(0.5*0.5)
as expressions, not as foreign axioms from an infinite set. However, if they didn't evaluate all of their arguments, would we then regard them as foreign axioms and hence as items?
Permutation Neighborhood
The following program computes the best permutation in an exponentialsize neighborhood of an existing permutation using a finitestate acceptor and two other cost functions.
 Note that this requires automatic evaluation of subterms such as
I+1
. Also you're mixing aggregation operators. Anything else new? Jason 04:22, 28 Apr 2006 (EDT)
% Axioms: % initial (State), % final (State), % arc (Source, Target, Input), % b (Left, Right), % c (Left, Middle, Right). % permute (From, To), goal min= path (0, =length, S, T) + initial (S) + final (T). length max= From whenever permute (From, To). path (I, I + 1, S, T) min= arc (S, T, W) whenever permute (W, I + 1). path (I, K, S, T) min= path (I, J, S, M) + path (J, K, M, T) + combine (I, J, K, false). path (I, K, S, T) min= path (J, K, S, M) + path (I, J, M, T) + combine (I, J, K, true). combine (I, I, K, Swap) = 0. combine (I, K, K, Swap) = 0. combine (I, J, K, Swap) = combine (I, J, K  1, Swap) + combine (I + 1, J, K, Swap)  combine (I + 1, J, K  1, Swap) + before (I + 1, K, Swap)  middle (I + 1, K, Swap) + outside (I + 1, K, Swap). middle (I, K, Swap) += cyclic (I, M, K, Swap) if I < M & M < K. outside (I, K, Swap) += cyclic (I, K, O, Swap) if O < I  K < O. before (I, K, false) = before (I, K). before (I, K, true) = before (K, I). before (I, J) = b (L, R) if permute (L, I) & permute (R, J). cyclic (I, J, K, false) = cyclic (I, J, K). cyclic (I, J, K, true) = cyclic (J, I, K). cyclic (I, J, K) = c (L, M, R) if permute (L, I) & permute (M, J) & permute (R, K).
Ideally, we could also compute the new permutation implied by the best path. I'm not sure how to get that within Dyna.
 Well, hyperedges need to show up as items so you can follow the backpointers. Maybe proof trees (in this case paths) should show up as items too. How would you like to write it in Dyna? Jason 04:22, 28 Apr 2006 (EDT)
Also, b
and c
should be derived from features of the identity permutation, and the weights of those features should be trainable.
This version doesn't properly handle Failed to parse (Missing texvc executable; please see math/README to configure.): \epsilon transitions between states in the acceptor.
CKY with trainable latent annotations
Note the mixed =, + and * with backward for computing dot products. Two theorems are built for pruning half() constitents: one maximizes over the equivalence class, and another counts members of this class.
 I don't see any of this in the example; did something get lost? Jason 04:02, 28 Apr 2006 (EDT)
: type smallInt = 0..15. % can be stored more efficiently? double constit(string label, smallInt annotation, unsigned, unsigned). boolean allowed(...). : sum(rewrites(X, Alpha, ...)) = log(1.0). : sum(annotateStart(...)) = log(1.0). % since annotateStart has always only 1 arg it would be equivalent with : sum(annotateStart(_)) = log(1.0). : union(nonpersistent, [word, constit, length, allowed]). : retractable(nonpersistent). : trainable([rewrites, annotateStart]). constit(X, Alpha, from=I, to=J) log+= rewrites(X, Alpha, Word) whenever allowed(X, Word, I, J). constit(X, Alpha, I, K) log+= rewrites(X, Alpha, Y, Beta, Z, Gamma) + constit(Y, Beta, I, J) + constit(Z, Gamma, J, K) whenever allowed(X, Y, Z, from=I, via=J, to=K). goal log+= annotateStart(Alpha) + constit("START", Alpha, 0, length).
Summary:
 No declaration:
: item(item,double,inf)
 a (memoryefficent) type can be defined (smallInt),
 dots notation,
 nicer sum constraints for dynamite,
 keyvalue pairs in argument lists,
 different structure declarations,
 an item can have boolean (see "allowed") or unsigned (see "length") value,
 arguments are evaluated by default (see "length" in the last line)
The sum constraints might be nicer and more explicit aslhssum(X,Alpha) log+= rewrites(X, Alpha, ...). : constrain(lhssum(...) == log(1.0)).If you want to use an anonymous sum item instead of introducing
lhssum
, you could use the notation that I've suggested before, where the aggregator is used as a prefix operator:: constrain((log+= rewrites(X, Alpha, ...)) == log(1.0)).nJason 04:02, 28 Apr 2006 (EDT)
We're already anticipating keyvalue pairs in argument lists (see the section on keywords at Design/StorageSpec), but using the syntaxconstit(X, Alpha, From~I, To~J)following the convention that
~
and~~
are about assigning and comparing quoted terms whereas=
and==
assign and compare their values. How strongly do you want=
rather than~
in the case of keyword args? Jason 04:02, 28 Apr 2006 (EDT)
Your intended meaning for...
seems to be that there can be any number of remaining arguments and they can have any value (similar to_
). Questions/thoughts:Jason 04:02, 28 Apr 2006 (EDT)
 If some of the omitted arguments were declared with defaults, does
...
in the pattern mean that they are constrained to have their default values, or are unconstrained as with_
? If the functor is overloaded with multiple arities, presumably we get a pattern that can match terms of different arities.
 Presumably
...
should also be usable with keyword arguments. I note that David used
coverage(..., Sextent)
below, which seems symmetric enough  you can bind the first few and/or the last few items positionally. Ditto for mixed positional and keyword args. Your
annotateStart(...)
seems equivalent to the typerestricted free variableannotateStart _
, though perhaps your version is better style.
Syntactic MT decoding
In addition to backward chaining for computing feature/weight dot products, I assume you can assign to the priority of an item in an inference rule, as well as refer to "value" (perhaps a reserved word?).
% a quasisynchronous grammar translation decoder : item(item, double, inf). : pragma(keep_all_antecedents(item)). : union(nonpersistent,[ tword,dep,endS,endT,link ]). : retractable(nonpersistent). : union(lmstate, [ s, nil ] ). : union(linkState, [ rangeS, nil ] ). : structure(coverage(int twords, int skids, int halfKids, rangeS sextent)). : foreign(term merge_coverage(term, term)). : foreign(term merge_lm(int, term, term)). : structure(s(string first, term rest)). : foreign(nodep). : foreign(nodom). : foreign(neq). : foreign(getspan). : union(backs, [ score_word, score_hook, extend, complete, cap_tree ]). : pragma(backword_chain(backs)).
 The above syntax is all oldfashioned. I assume it's ok to update it. Jason 03:16, 28 Apr 2006 (EDT)
% Calculate transitive tree relations sibling(A, B) max= dep(A, H) + dep(B, H) + neq(A, B). desc(A, B) max= dep(A, M) + desc(M, B) + getspan(A,B). desc(A, B) max= dep(A, B) + getspan(A,B). grandchild(A, B) max= dep(A, M) + dep(M, B). ccommand(A, B) max= (dep(A, C) + desc(B, C) whenever alignChild("ccommandee")) + nodom(A, B). % end tree relations
 Shouldn't the above be booleans written with
=
and&
? Jason 03:16, 28 Apr 2006 (EDT)
goal max= root(Head,Lcov,Rcov,Llm,Rlm) + cap_tree(Head,Lcov,Rcov,Llm,Rlm). root(nonterm(Head,Link,Hword),Lcov,Rcov,Llm,Rlm) max= capped(LEFT, nonterm(Head,Link,Hword),Lcov,Llm) + rootchild(Head) + capped(RIGHT, nonterm(Head,Link,Hword),Rcov,Rlm). half(RIGHT,nonterm(Head,rangeS(U,V),W),coverage(1,0,0,rangeS(U,V)),"noadj",nil,s(W,nil)) max= (tword(LinkTag,LinkWord,rangeS(U,V)) whenever trans(LinkWord, W) whenever wchild(Head,W)) + score_word(Head,W,LinkTag,LinkWord). % compute the dot product of feature and weight vectors % note = instead of max= score_word(Head,W,LinkTag,LinkWord) = trans(LinkWord, W) * param("trans") + btrans(LinkWord, W) * param("btrans") + ctrans(LinkTag, Head) * param("ctrans") + bctrans(Head, LinkTag) * param("bctrans") + wchild(Head, W) * param("dmv") + wtag(W, Head) * param("wtag").
 I bet the dotproduct rule above can be made prettier somehow, maybe by breaking it into a bunch of
+=
rules that can then be collapsed down into fewer rules. Jason 03:16, 28 Apr 2006 (EDT)
capped(Dir,nonterm(Head,Link,Hword),Hcov,Hlm) max= half(Dir,nonterm(Head,Link,Hword),Hcov,Adj,Alink,Hlm) + stop(Dir,Head,Adj). half(LEFT,Head,coverage(0,0,0,rangeS(0,0)),"noadj",nil,nil) = 0 whenever rdone(Head). % only instantiate when right is done rdone(Head) max= capped(RIGHT,Head,Hcov,Hlm). hook(Dir,Head,Child,Hcov,Hlm) max= (half(Dir,Head,Hcov,Adj,Alink,Hlm) whenever half(RIGHT,Child,coverage(1,0,0,Cloc),"noadj",nil,Clm)) + score_hook(Dir,Head,Child,Adj,Alink,Hcov,Hlm) priority := inf if max_left_dep > Hcov.twords + 1. % We'd like to put in some dependency length bounds here.
 The idiom
priority := inf if
might deserve a shortcut name likeprune if
. Not sure, though. Jason 03:16, 28 Apr 2006 (EDT)
 The use of
Hcov.twords
here is trying to extract a keyword argument from a term of typecoverage
. I think this is a good accessor syntax, and follows naturally from our use of keyword constructors. We were already planning to allow keyword accessors of this kind on the C++ side; clearly they should be available in Dyna too.
 Field names are now supposed to be capitalized, since they are repeatable variables in the type declaration. Hence it should be
Hcov.Twords
.
 (Then I guess other C++ accessors like
.functor
,.argc
and.arg(3)
 though perhaps not.argc(I)
 should also be allowed in Dyna. Could be useful for analyzing subterms. In fact, maybe=X
and=foo
are just sugar forX.value
andfoo.value
.)
 (And I guess if
Hcov
were a primitive object (a.k.a. foreign datatype), we could use this syntax to call its methods: e.g.,Hcov.accessor(2)
instead ofaccessor(Hcov,2,ReturnedValue)
.)
 Presumably
Hcov.Twords
is only legal if theHcov
variable can be typed at compile time to something likecoverage
that is guaranteed to have aTwords
accessor, thanks to the declaration ofhalf
or an explicit type restrictor in the rule. The use of theTwords
accessor could be a clue to type inference, allowing us to infer a declaration forhalf
; we could even infer a type restriction on this specific rule but that seems dangerous. Jason 03:16, 28 Apr 2006 (EDT)
score_hook(Dir,nonterm(Htag,Hlink,Hword),nonterm(Ctag,Clink,Cword),Adj,Alink,Hcov,Hlm) = nonstop(Dir, Htag, Adj) * param("dmv") + dchild(Dir, Htah, Adj, Ctag) * param("dmv") + slm(Dir, Cword, Hword) * param("slm") + cprod(Dir, =tag(Hlink), =tag(Clink), Htag, Ctag) * param("cprod"). % In the line above, we want the source tag at a given source position. tag(Loc) = Tag if tword(Tag,Word,Loc). % see the pruning clauses tacked onto half(), which we should replicate here trap(RIGHT,Head,Child,merge_coverage(Hcov,Ccov),merge_lm(RIGHT,Hlm,Clm)) max= hook(RIGHT,Head,Child,Hcov,Hlm) + capped(LEFT,Child,Ccov,Clm) + extend(RIGHT,Head,Child,Hcov,Ccov,Hlm,Clm) priority := inf if foo.
 Surely rather than "replicating" the pruning clause foo, we should be able to make foo be a backwardchained item? It would presumably have to take arguments, but we might be able to get away with just a
This
argument as below.
trap(LEFT,Head,Child,merge_coverage(Hcov,Ccov),merge_lm(LEFT,Hlm,Clm)) max= hook(LEFT,Head,Child,Hcov,Hlm) + capped(RIGHT,Child,Ccov,Clm) + extend(LEFT,Head,Child,Hcov,Ccov,Hlm,Clm) priority := inf if foo. extend(Dir,Head,Child,Hcov,Ccov,Hlm,Clm) = foo. % old pruning rules % trap(Dir,Head,Child,Cov,Lm) max= ptrap(Dir,Head,Child,Cov,Lm) + prune_trap(Dir,Head,Child,Cov,Lm). % half(Dir,Head,Cov,Adj,Alink,Lm) max= phalf(Dir,Head,Cov,Adj,Alink,Lm) + prune_half(Dir,Head,Cov,Adj,Alink,Lm). % we don't want to destructure the whole coverage best_half(Dir, Sextent) max= half(Dir,Head,coverage(...,Sextent),Chead,Clink,Lm). count_half(Dir, Sextent) += 1 whenever half(Dir,Head,coverage(...,Sextent),Chead,Clink,Lm). half(Dir,Head,merge_coverage(Hcov,Ccov),Chead,Clink,merge_lm(Dir,Hlm,Clm)) max= trap(Dir,Head,nonterm(Chead,Clink,Cword),Hcov,Hlm) + capped(Dir,nonterm(Chead,Clink,Cword),Ccov,Clm) + complete(Dir,Head,nonterm(Chead,Clink,Cword),Hcov,Ccov,Hlm,Clm) priority := inf if (value + hparam("beam_width") < best_half(Dir, merge_coverage(Hcov, Ccov).sextent) or count_half(Dir, merge_coverage(Hcov, Ccov).sextent) > hparam("max_bin")).
 The expression
merge_coverage(Hcov, Ccov).Sextent
is not extracting theSextent
field from the constructed term of typemerge_coverage
itself (though that could also be useful), but rather from that term's (foreignly computed) value, which is another term. So I think this should actually be written as(=merge_coverage(Hcov, Ccov)).Sextent
, where the extra parens are probably optional by precedence rules. Jason 03:16, 28 Apr 2006 (EDT)
complete(Dir,Head,Child,Hcov,Ccov,Hlm,Clm) = foo. top_trans(S) max= trans(S,T). tag_trans(STag) max= ctrans(STag,TTag).
In the "subordinate clause"priority := inf if value + hparam("beam_width") < best_half(Dir, etc. etc.how about something like
with priority(This) := inf if This + beam_width < best_half(Dir, etc. etc.where
this
is a reserved word referring to the consequent of the main clause. Or perhapsvalue
was supposed to be the value of the antecedent expression, not the accumulated value of the consequent, in which case maybe the game iswith priority(Consequent) := inf if Antecedent + beam_width < best_half(Dir, etc. etc.But maybe you are not trying to change the priority of the consequent, only this particular update to the consequent. Hence perhaps
with priority(This) = inf if Antecedent + beam_width < best_half(Dir, etc. etc.In any case, your second rule is "subordinate" because it implicitly refers to the hyperedge item
This
created by the first rule.This
has keyword arguments, allowing us to get atThis.Dir
,This.Consequent
,This.Antecedent
, etc., where theThis.
part is omittable by convention, just as in C++. Maybe that hyperedge item should be more explicit in Dyna? We may want it to be anyway, for making the parse forest beautiful and efficient, and for definining queries over its hyperedges.The subordinate clauses let you economically write multiple inference rules (one defining
Consequent
, the other definingpriority(Consequent)
) that are triggered by the same set of antecedents. This is not quite a case of a multiheaded rule because the bodies are also different (the second body isinf if ...
). But there is some notion of keeping the local variables from a previous definition "in scope." Jason 03:16, 28 Apr 2006 (EDT)
YOUR EXAMPLE HERE
Topics
Here are some topics that we want to illuminate above using both real and artificial examples:
 declarations
 stick with : or something better?
 stick with storedas or something better?
 when to write "pragma" explicitly?
 inheritance of declarations
 typespecific declarations: chaining and memoization, different kinds of transience, transparency, retractability, incremental vs. replacement updates, determinism, parallelization ...
 rulespecific declarations, especially declaring transformations and prunability
 programspecific declarations: multicharts, declaring new operators ...
 types
 default values
 type inference (new version)
 meaning of type restrictors
 const (chartindependent) side conditions to use as guards in type defs
 axioms
 axiom accumulation operators like max= as well as := (maybe also max:= to control what assignment does but still allow retractability)
 inference rules
 "if" vs. "needed only if" ("especially if"?) (should be property of consequent rather than specific rule?)
 rulespecific declarations
 evaluated subterms
 syntax like (+= foo(X,Y)) for an anonymous sum inside a larger expression
 inference order
 prioritization
 pruning
 backward chaining; mixing forward and backward chaining
 funky agenda disciplines, including CKY order or blockprocessing of related items
 convergence declarations
 memoization and flush policy
 memoizing foreign functions; interaction with disk
 tape
 propagating different kinds of updates for general formulas
 hyperedges
 efficiently storing them
 specifying which ones to store (all? none? 1best?) and reconstructing them by backward chaining
 finding entire antecedent proof trees; efficiently constructing kbest trees on demand
 finding consequents as well as antecedents
 controlling what they look like (in case of program transformation)
 C++ accessors
 queries over chart
 nonground items in chart (default value is the simplest example)
 explicit inferencerules with "<" for accumulating queries? or just have queriable types?
 backwardchained and memoized queries (result materialized on demand)
 queries over foreign items
 C++ interface
 foreign interface
 declaration of foreign primitives
 operators and their properties
 defaults (under various accumulation operators)
 naming the foreign class and operator functions that get used (and what if not C++?)
 query iterator over all instances of the type, e.g., goal += 1/int X if X > 0
 declaration of foreign items
 autoevaluation of arguments
 explicitly updating/retracting their values
 queries
 declaration of foreign primitives