This page is for working out some sample programs that we'd like to run in the new version. Please illustrate how to handle the new features in the concrete syntax, and how to get at them from the C++ side. You may wish to borrow from Examples as well as from your own code in Dyna 0.3.
Comments on problems that these programs present to the compiler, and solutions, are also welcome.
At the bottom of this page are listed some topics that we should try to cover in these examples.
Here's our inference rule:
c(I,K) += a(I,J) * b(J,K).
Should also show how to explicitly request forward chaining, backward chaining, memoization, etc. This is probably done by declarations that apply to the consequent
We also need at least one declaration, which would typically be written above the rule. The version below says that the
axiom type, which is an automatically declared union over
[a/2,b/2], has values that are doubles and default to 0.
:- double axiom := 0.
Possibly a better way of writing this default is to sayaxiom X := 0. % or more idiomatically, axiom _ := 0.
which assigns 0 to all items
axiom. This non-ground assignment can be partially overridden by assignments such as
b(3,4) := 5.
The following declarations are optional, given the
axiom above, but they may improve efficiency.
:- double a(int X, int Y). :- double b(int X, int Y). :- double c(int X, int Y).
We should also show optional storage declarations that get various efficient versions of dense and sparse arrays.
Here are some alternative ways that one is allowed to write the
double a(int X, int Y) declaration:
:- double a(int X, int Y). :- _ a(_ X, _ Y). % Using _ for a type means to inherit it (double) or infer it (int). :- double a(int _, int _). % Using _ for a var name means to generate a one-time var. :- _ a(_ _, _ _). % We can do both. :- _ a(int, Y). % _ can be omitted where syntactically valid. :- _ a(_,_). % This is as much as we can syntactically omit ... :- _ a/2. % ... unless we do this!
We begin with some optional declarations. Let's suppose we want to name nodes by arbitrary terms (not necessarily ints or strings). For readability and type-safety, however, we want to introduce an explicit
node type. So let's require the terms to be of the form
Name is a term:
:- void node(term Name) storedas Name.
void return type means that nodes don't themselves have values. The optional
storedas declaration ensures that there is no storage overhead for the
node(...) wrapper. Roughly speaking, any term may be stored as any other term.
Now let's optionally declare some item types with
node term itself is not regarded as an item, since it is neither an axiom or a theorem.)
% axiom declarations :- double input(node) := 0. % how much external input to this node? :- double weight(From, To) := 0. % arguments are inferred to be nodes. :- double target(node). % is there an external target for this node?
% theorem declarations :- double in(node). :- double out(node).
Note that we could have given storage declarations for these, too. For example,
double in(node X) storedas X. double weight(From, To) storedas (To,From).
Should probably spin out this example further. Maybe input nodes are a subtype of node, and only they reserve space to store input.
No default value was specified for the
target axiom, so it defaults to undef. Asking for the value of an undef item in C++ throws an exception, and such an item cannot be used to help derive other items.
Here are the inference rules. Recurrent networks (involving cycles in the weight graph) are okay without any special handling.
in(To) += input(To). in(To) += out(From) * weight(From,To). % the compiler would warn if you accidentally wrote weight(To,From) out(Node) = sigmoid(in(Node)). error += (out(Node)-target(Node))**2.
While we had to specify default values for axioms (or allow them to be undef, which would mostly work fine in this example), we cannot specify them for theorems. Theorems' values are always defined by rule, including in the default case:
erroris defined to be a sum over expressions that default to undef (since
target(Node)does). Thus, if there are no non-default summands, the sum is 0 (the identity for
in(To)is defined to be a sum over infinitely many products that default to 0. Thus, if there are no non-default summands, the sum is again 0 (the sum of infinitely many zeroes).
out(Node)is defined to be
sigmoid(in(Node)), so it is
in(Node)is still 0. (If
in(Node)were undef for some reason, then
out(Node)would be too, since its aggregator is
A training algorithm would try to minimize
error. This declaration says that it is the
weight nodes that are trainable, so that the compiler can generate code to help train them.
This declaration adds
weight/2 to the union type
trainable, which is a possibly empty subtype of
It might be better if we could find a way to do without the
trainabledeclaration, instead allowing any subtype of
axiomto be trained at runtime. Perhaps the
trainabledeclaration is simply an optional pragma suggesting that the compiler should generate efficient support for training
We also need to define
sigmoid, which appeared above. Let's say that the value of a
sigmoid/1 term is a
double computed by a foreign function. For example, the value of
sigmoid(0.5*0.5) is computed by looking up the value of the term
0.25) and calling the foreign function on the result.
:- foreign double sigmoid(double =X).
double = is the type of expressions
double value -- in other words,
=X is a
We still need a way to specify the C++ name of the foreign function, as well as any algebraic properties, such as the fact that it distributes over
max=. The foreign function can't simply be called
sigmoid, because (barring renaming by a
cnamedeclaration) the name
sigmoidin C++ will already be a constructor function used to construct a C++ term object, of type
sigmoid_t, such as
It would be consistent for Dyna to pass the expression
X to the foreign function. However, as special handling for declared
= arguments to foreign functions, we instead pass the value
=X. Then the foreign function is only allowed to depend on
X itself, and does not need access to the chart, which foreign functions probably should never have.
This special handling does make a problem for a foreign function like
derivative, which really does want to accept a
double =expression and not its value::- foreign double =derivative(double =Formula).
Here however is a possible workaround using a typedef so that the argument isn't syntactically a
double =::- double_expr ~ double =. :- foreign double_expr derivative(double_expr).
Another, harder-to-use workaround uses a wrapper, so that the
double =is not a direct argument of
derivativebut only an argument to an argument::- void formula(double =). :- foreign formula derivative(formula).
Another issue: Probably we want to regard
sigmoid(0.5*0.5)as expressions, not as foreign axioms from an infinite set. However, if they didn't evaluate all of their arguments, would we then regard them as foreign axioms and hence as items?
The following program computes the best permutation in an exponential-size neighborhood of an existing permutation using a finite-state acceptor and two other cost functions.
- Note that this requires automatic evaluation of subterms such as
I+1. Also you're mixing aggregation operators. Anything else new? Jason 04:22, 28 Apr 2006 (EDT)
% Axioms: % initial (State), % final (State), % arc (Source, Target, Input), % b (Left, Right), % c (Left, Middle, Right). % permute (From, To), goal min= path (0, =length, S, T) + initial (S) + final (T). length max= From whenever permute (From, To). path (I, I + 1, S, T) min= arc (S, T, W) whenever permute (W, I + 1). path (I, K, S, T) min= path (I, J, S, M) + path (J, K, M, T) + combine (I, J, K, false). path (I, K, S, T) min= path (J, K, S, M) + path (I, J, M, T) + combine (I, J, K, true). combine (I, I, K, Swap) = 0. combine (I, K, K, Swap) = 0. combine (I, J, K, Swap) = combine (I, J, K - 1, Swap) + combine (I + 1, J, K, Swap) - combine (I + 1, J, K - 1, Swap) + before (I + 1, K, Swap) - middle (I + 1, K, Swap) + outside (I + 1, K, Swap). middle (I, K, Swap) += cyclic (I, M, K, Swap) if I < M & M < K. outside (I, K, Swap) += cyclic (I, K, O, Swap) if O < I | K < O. before (I, K, false) = before (I, K). before (I, K, true) = before (K, I). before (I, J) = b (L, R) if permute (L, I) & permute (R, J). cyclic (I, J, K, false) = cyclic (I, J, K). cyclic (I, J, K, true) = cyclic (J, I, K). cyclic (I, J, K) = c (L, M, R) if permute (L, I) & permute (M, J) & permute (R, K).
Ideally, we could also compute the new permutation implied by the best path. I'm not sure how to get that within Dyna.
- Well, hyperedges need to show up as items so you can follow the backpointers. Maybe proof trees (in this case paths) should show up as items too. How would you like to write it in Dyna? Jason 04:22, 28 Apr 2006 (EDT)
c should be derived from features of the identity permutation, and the weights of those features should be trainable.
This version doesn't properly handle Failed to parse (Missing texvc executable; please see math/README to configure.): \epsilon -transitions between states in the acceptor.
CKY with trainable latent annotations
Note the mixed =, + and * with backward for computing dot products. Two theorems are built for pruning half() constitents: one maximizes over the equivalence class, and another counts members of this class.
- I don't see any of this in the example; did something get lost? Jason 04:02, 28 Apr 2006 (EDT)
:- type smallInt = 0..15. % can be stored more efficiently? double constit(string label, smallInt annotation, unsigned, unsigned). boolean allowed(...). :- sum(rewrites(X, Alpha, ...)) = log(1.0). :- sum(annotateStart(...)) = log(1.0). % since annotateStart has always only 1 arg it would be equivalent with :- sum(annotateStart(_)) = log(1.0). :- union(nonpersistent, [word, constit, length, allowed]). :- retractable(nonpersistent). :- trainable([rewrites, annotateStart]). constit(X, Alpha, from=I, to=J) log+= rewrites(X, Alpha, Word) whenever allowed(X, Word, I, J). constit(X, Alpha, I, K) log+= rewrites(X, Alpha, Y, Beta, Z, Gamma) + constit(Y, Beta, I, J) + constit(Z, Gamma, J, K) whenever allowed(X, Y, Z, from=I, via=J, to=K). goal log+= annotateStart(Alpha) + constit("START", Alpha, 0, length).
- No declaration:
- a (memory-efficent) type can be defined (smallInt),
- dots notation,
- nicer sum constraints for dynamite,
- key-value pairs in argument lists,
- different structure declarations,
- an item can have boolean (see "allowed") or unsigned (see "length") value,
- arguments are evaluated by default (see "length" in the last line)
The sum constraints might be nicer and more explicit aslhssum(X,Alpha) log+= rewrites(X, Alpha, ...). :- constrain(lhssum(...) == log(1.0)).
If you want to use an anonymous sum item instead of introducing
lhssum, you could use the notation that I've suggested before, where the aggregator is used as a prefix operator::- constrain((log+= rewrites(X, Alpha, ...)) == log(1.0)).n
Jason 04:02, 28 Apr 2006 (EDT)
We're already anticipating key-value pairs in argument lists (see the section on keywords at Design/StorageSpec), but using the syntaxconstit(X, Alpha, From~I, To~J)
following the convention that
~~are about assigning and comparing quoted terms whereas
==assign and compare their values. How strongly do you want
~in the case of keyword args? Jason 04:02, 28 Apr 2006 (EDT)
Your intended meaning for
...seems to be that there can be any number of remaining arguments and they can have any value (similar to
Jason 04:02, 28 Apr 2006 (EDT)
- If some of the omitted arguments were declared with defaults, does
...in the pattern mean that they are constrained to have their default values, or are unconstrained as with
- If the functor is overloaded with multiple arities, presumably we get a pattern that can match terms of different arities.
...should also be usable with keyword arguments.
- I note that David used
coverage(..., Sextent)below, which seems symmetric enough -- you can bind the first few and/or the last few items positionally. Ditto for mixed positional and keyword args.
annotateStart(...)seems equivalent to the type-restricted free variable
annotateStart _, though perhaps your version is better style.
Syntactic MT decoding
In addition to backward chaining for computing feature/weight dot products, I assume you can assign to the priority of an item in an inference rule, as well as refer to "value" (perhaps a reserved word?).
% a quasi-synchronous grammar translation decoder :- item(item, double, -inf). :- pragma(keep_all_antecedents(item)). :- union(nonpersistent,[ tword,dep,endS,endT,link ]). :- retractable(nonpersistent). :- union(lmstate, [ s, nil ] ). :- union(linkState, [ rangeS, nil ] ). :- structure(coverage(int twords, int skids, int halfKids, rangeS sextent)). :- foreign(term merge_coverage(term, term)). :- foreign(term merge_lm(int, term, term)). :- structure(s(string first, term rest)). :- foreign(nodep). :- foreign(nodom). :- foreign(neq). :- foreign(getspan). :- union(backs, [ score_word, score_hook, extend, complete, cap_tree ]). :- pragma(backword_chain(backs)).
- The above syntax is all old-fashioned. I assume it's ok to update it. Jason 03:16, 28 Apr 2006 (EDT)
% Calculate transitive tree relations sibling(A, B) max= dep(A, H) + dep(B, H) + neq(A, B). desc(A, B) max= dep(A, M) + desc(M, B) + getspan(A,B). desc(A, B) max= dep(A, B) + getspan(A,B). grandchild(A, B) max= dep(A, M) + dep(M, B). ccommand(A, B) max= (dep(A, C) + desc(B, C) whenever alignChild("c-commandee")) + nodom(A, B). % end tree relations
- Shouldn't the above be booleans written with
&? Jason 03:16, 28 Apr 2006 (EDT)
goal max= root(Head,Lcov,Rcov,Llm,Rlm) + cap_tree(Head,Lcov,Rcov,Llm,Rlm). root(nonterm(Head,Link,Hword),Lcov,Rcov,Llm,Rlm) max= capped(LEFT, nonterm(Head,Link,Hword),Lcov,Llm) + rootchild(Head) + capped(RIGHT, nonterm(Head,Link,Hword),Rcov,Rlm). half(RIGHT,nonterm(Head,rangeS(U,V),W),coverage(1,0,0,rangeS(U,V)),"noadj",nil,s(W,nil)) max= (tword(LinkTag,LinkWord,rangeS(U,V)) whenever trans(LinkWord, W) whenever wchild(Head,W)) + score_word(Head,W,LinkTag,LinkWord). % compute the dot product of feature and weight vectors % note = instead of max= score_word(Head,W,LinkTag,LinkWord) = trans(LinkWord, W) * param("trans") + btrans(LinkWord, W) * param("btrans") + ctrans(LinkTag, Head) * param("ctrans") + bctrans(Head, LinkTag) * param("bctrans") + wchild(Head, W) * param("dmv") + wtag(W, Head) * param("wtag").
- I bet the dot-product rule above can be made prettier somehow, maybe by breaking it into a bunch of
+=rules that can then be collapsed down into fewer rules. Jason 03:16, 28 Apr 2006 (EDT)
capped(Dir,nonterm(Head,Link,Hword),Hcov,Hlm) max= half(Dir,nonterm(Head,Link,Hword),Hcov,Adj,Alink,Hlm) + stop(Dir,Head,Adj). half(LEFT,Head,coverage(0,0,0,rangeS(0,0)),"noadj",nil,nil) = 0 whenever rdone(Head). % only instantiate when right is done rdone(Head) max= capped(RIGHT,Head,Hcov,Hlm). hook(Dir,Head,Child,Hcov,Hlm) max= (half(Dir,Head,Hcov,Adj,Alink,Hlm) whenever half(RIGHT,Child,coverage(1,0,0,Cloc),"noadj",nil,Clm)) + score_hook(Dir,Head,Child,Adj,Alink,Hcov,Hlm) priority := -inf if max_left_dep > Hcov.twords + 1. % We'd like to put in some dependency length bounds here.
- The idiom
priority := -inf ifmight deserve a shortcut name like
prune if. Not sure, though. Jason 03:16, 28 Apr 2006 (EDT)
- The use of
Hcov.twordshere is trying to extract a keyword argument from a term of type
coverage. I think this is a good accessor syntax, and follows naturally from our use of keyword constructors. We were already planning to allow keyword accessors of this kind on the C++ side; clearly they should be available in Dyna too.
- Field names are now supposed to be capitalized, since they are repeatable variables in the type declaration. Hence it should be
- (Then I guess other C++ accessors like
.arg(3)-- though perhaps not
.argc(I)-- should also be allowed in Dyna. Could be useful for analyzing subterms. In fact, maybe
=fooare just sugar for
- (And I guess if
Hcovwere a primitive object (a.k.a. foreign datatype), we could use this syntax to call its methods: e.g.,
Hcov.Twordsis only legal if the
Hcovvariable can be typed at compile time to something like
coveragethat is guaranteed to have a
Twordsaccessor, thanks to the declaration of
halfor an explicit type restrictor in the rule. The use of the
Twordsaccessor could be a clue to type inference, allowing us to infer a declaration for
half; we could even infer a type restriction on this specific rule but that seems dangerous. Jason 03:16, 28 Apr 2006 (EDT)
score_hook(Dir,nonterm(Htag,Hlink,Hword),nonterm(Ctag,Clink,Cword),Adj,Alink,Hcov,Hlm) = nonstop(Dir, Htag, Adj) * param("dmv") + dchild(Dir, Htah, Adj, Ctag) * param("dmv") + slm(Dir, Cword, Hword) * param("slm") + cprod(Dir, =tag(Hlink), =tag(Clink), Htag, Ctag) * param("cprod"). % In the line above, we want the source tag at a given source position. tag(Loc) = Tag if tword(Tag,Word,Loc). % see the pruning clauses tacked onto half(), which we should replicate here trap(RIGHT,Head,Child,merge_coverage(Hcov,Ccov),merge_lm(RIGHT,Hlm,Clm)) max= hook(RIGHT,Head,Child,Hcov,Hlm) + capped(LEFT,Child,Ccov,Clm) + extend(RIGHT,Head,Child,Hcov,Ccov,Hlm,Clm) priority := -inf if foo.
- Surely rather than "replicating" the pruning clause foo, we should be able to make foo be a backward-chained item? It would presumably have to take arguments, but we might be able to get away with just a
Thisargument as below.
trap(LEFT,Head,Child,merge_coverage(Hcov,Ccov),merge_lm(LEFT,Hlm,Clm)) max= hook(LEFT,Head,Child,Hcov,Hlm) + capped(RIGHT,Child,Ccov,Clm) + extend(LEFT,Head,Child,Hcov,Ccov,Hlm,Clm) priority := -inf if foo. extend(Dir,Head,Child,Hcov,Ccov,Hlm,Clm) = foo. % old pruning rules % trap(Dir,Head,Child,Cov,Lm) max= ptrap(Dir,Head,Child,Cov,Lm) + prune_trap(Dir,Head,Child,Cov,Lm). % half(Dir,Head,Cov,Adj,Alink,Lm) max= phalf(Dir,Head,Cov,Adj,Alink,Lm) + prune_half(Dir,Head,Cov,Adj,Alink,Lm). % we don't want to destructure the whole coverage best_half(Dir, Sextent) max= half(Dir,Head,coverage(...,Sextent),Chead,Clink,Lm). count_half(Dir, Sextent) += 1 whenever half(Dir,Head,coverage(...,Sextent),Chead,Clink,Lm). half(Dir,Head,merge_coverage(Hcov,Ccov),Chead,Clink,merge_lm(Dir,Hlm,Clm)) max= trap(Dir,Head,nonterm(Chead,Clink,Cword),Hcov,Hlm) + capped(Dir,nonterm(Chead,Clink,Cword),Ccov,Clm) + complete(Dir,Head,nonterm(Chead,Clink,Cword),Hcov,Ccov,Hlm,Clm) priority := -inf if (value + hparam("beam_width") < best_half(Dir, merge_coverage(Hcov, Ccov).sextent) or count_half(Dir, merge_coverage(Hcov, Ccov).sextent) > hparam("max_bin")).
- The expression
merge_coverage(Hcov, Ccov).Sextentis not extracting the
Sextentfield from the constructed term of type
merge_coverageitself (though that could also be useful), but rather from that term's (foreignly computed) value, which is another term. So I think this should actually be written as
(=merge_coverage(Hcov, Ccov)).Sextent, where the extra parens are probably optional by precedence rules. Jason 03:16, 28 Apr 2006 (EDT)
complete(Dir,Head,Child,Hcov,Ccov,Hlm,Clm) = foo. top_trans(S) max= trans(S,T). tag_trans(STag) max= ctrans(STag,TTag).
In the "subordinate clause"priority := -inf if value + hparam("beam_width") < best_half(Dir, etc. etc.
how about something likewith priority(This) := -inf if This + beam_width < best_half(Dir, etc. etc.
thisis a reserved word referring to the consequent of the main clause. Or perhaps
valuewas supposed to be the value of the antecedent expression, not the accumulated value of the consequent, in which case maybe the game iswith priority(Consequent) := -inf if Antecedent + beam_width < best_half(Dir, etc. etc.
But maybe you are not trying to change the priority of the consequent, only this particular update to the consequent. Hence perhapswith priority(This) = -inf if Antecedent + beam_width < best_half(Dir, etc. etc.
In any case, your second rule is "subordinate" because it implicitly refers to the hyperedge item
Thiscreated by the first rule.
Thishas keyword arguments, allowing us to get at
This.Antecedent, etc., where the
This.part is omittable by convention, just as in C++. Maybe that hyperedge item should be more explicit in Dyna? We may want it to be anyway, for making the parse forest beautiful and efficient, and for definining queries over its hyperedges.
The subordinate clauses let you economically write multiple inference rules (one defining
Consequent, the other defining
priority(Consequent)) that are triggered by the same set of antecedents. This is not quite a case of a multi-headed rule because the bodies are also different (the second body is
-inf if ...). But there is some notion of keeping the local variables from a previous definition "in scope." Jason 03:16, 28 Apr 2006 (EDT)
YOUR EXAMPLE HERE
Here are some topics that we want to illuminate above using both real and artificial examples:
- stick with :- or something better?
- stick with storedas or something better?
- when to write "pragma" explicitly?
- inheritance of declarations
- type-specific declarations: chaining and memoization, different kinds of transience, transparency, retractability, incremental vs. replacement updates, determinism, parallelization ...
- rule-specific declarations, especially declaring transformations and prunability
- program-specific declarations: multi-charts, declaring new operators ...
- default values
- type inference (new version)
- meaning of type restrictors
- const (chart-independent) side conditions to use as guards in type defs
- axiom accumulation operators like max= as well as := (maybe also max:= to control what assignment does but still allow retractability)
- inference rules
- "if" vs. "needed only if" ("especially if"?) (should be property of consequent rather than specific rule?)
- rule-specific declarations
- evaluated subterms
- syntax like (+= foo(X,Y)) for an anonymous sum inside a larger expression
- inference order
- backward chaining; mixing forward and backward chaining
- funky agenda disciplines, including CKY order or block-processing of related items
- convergence declarations
- memoization and flush policy
- memoizing foreign functions; interaction with disk
- propagating different kinds of updates for general formulas
- efficiently storing them
- specifying which ones to store (all? none? 1-best?) and reconstructing them by backward chaining
- finding entire antecedent proof trees; efficiently constructing k-best trees on demand
- finding consequents as well as antecedents
- controlling what they look like (in case of program transformation)
- C++ accessors
- queries over chart
- non-ground items in chart (default value is the simplest example)
- explicit inference-rules with "<-" for accumulating queries? or just have queriable types?
- backward-chained and memoized queries (result materialized on demand)
- queries over foreign items
- C++ interface
- foreign interface
- declaration of foreign primitives
- operators and their properties
- defaults (under various accumulation operators)
- naming the foreign class and operator functions that get used (and what if not C++?)
- query iterator over all instances of the type, e.g., goal += 1/int X if X > 0
- declaration of foreign items
- auto-evaluation of arguments
- explicitly updating/retracting their values
- declaration of foreign primitives