Inference rule

From Dyna

An inference rule specifies how to compute the value of an item (specifically, a theorem) from the values of other items.


A simple case

A simple inference rule is

area = length*width.

The item on the left-hand side (area) is known as the consequent, while the expression on the right-hand side (length*width) is known as the antecedent.

Variables in inference rules

Typically, an inference rule contains variables. In this case it represents not just a single equation, but an equational schema that describes how to compute the values of many different items. For example,

area(R) = length(R)*width(R).

defines how to compute all items of the form area(R). It implies infinitely many statements such as

area(this_rectangle) = length(this_rectangle)*width(this_rectangle).
area(that_rectangle) = length(that_rectangle)*width(that_rectangle).
area(room("kitchen",myhouse)) = length(room("kitchen",myhouse))
                                * width(room("kitchen",myhouse)).

Thus, the single antecedent of area(this_rectangle) is the expression length(this_rectangle)*width(this_rectangle).
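As a hypothetical sketch (not the Dyna implementation), the schema can be pictured as instantiating the rule for every term R that appears in both antecedent charts, here modeled as Python dicts with made-up values:

```python
# Hypothetical sketch: the rule
#   area(R) = length(R)*width(R).
# acts as a schema over every term R for which both antecedents have values.
length = {"this_rectangle": 3.0, "that_rectangle": 5.0}
width  = {"this_rectangle": 2.0, "that_rectangle": 4.0}

# Instantiate the schema for every R present in both charts.
area = {R: length[R] * width[R] for R in length.keys() & width.keys()}

print(area["this_rectangle"])  # 6.0
```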

Accumulation in inference rules

Most inference rules involve accumulation of many values. For example, the += accumulation operator in

dot_product += vector1(I) * vector2(I).

defines dot_product to be a sum,

dot_product = \sum_I vector1(I) * vector2(I).

The above rule can be generalized to define the dot product of any two vectors:

dot_product(Vector1,Vector2) += element(Vector1,I) * element(Vector2,I).

implies infinitely many statements such as the following, which defines the dot product of vectors named vect(22) and vect(17), if any:

dot_product(vect(22),vect(17)) = \sum_I element(vect(22),I) * element(vect(17),I).

In both cases, the summation is over antecedents that vary in their vector index I. This is because I is not already specified by the consequent item (dot_product or dot_product(Vector1,Vector2)).

The latter inference rule can be read as follows: "For all terms Vector1, Vector2, and I, compute element(Vector1,I)*element(Vector2,I) and add it to dot_product(Vector1,Vector2) using +=."
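The reading above can be sketched in Python. This is a hypothetical model, not Dyna's implementation: element is a dict keyed by (vector-name, index) pairs with invented values, and the shared variable I becomes a requirement that the two indices match.

```python
from collections import defaultdict

# Hypothetical sketch of the += rule's semantics:
#   dot_product(Vector1,Vector2) += element(Vector1,I) * element(Vector2,I).
element = {
    ("vect(22)", 0): 1.0, ("vect(22)", 1): 2.0,
    ("vect(17)", 0): 3.0, ("vect(17)", 1): 4.0,
}

dot_product = defaultdict(float)  # += accumulates into a running total
for (v1, i), x in element.items():
    for (v2, j), y in element.items():
        if i == j:  # the shared variable I forces the indices to match
            dot_product[(v1, v2)] += x * y

print(dot_product[("vect(22)", "vect(17)")])  # 1*3 + 2*4 = 11.0
```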

Other accumulation operators

Other accumulation operators besides += can be used. For example, whereas += sums over numeric-valued antecedents, max= maximizes over them.

Suppose the value of edge_weight(U,V) is the weight of the edge from vertex U to vertex V in a weighted graph. The following use of max= finds the maximum weight of any edge in the graph,

 maxweight max= edge_weight(U,V).    % or edge_weight(_,_); see variable

whereas the following finds the maximum weights of any edge leaving vertex U or entering vertex V:

 maxout(U) max= edge_weight(U,V).
 maxin(V) max= edge_weight(U,V).

In all cases, the maximization is over variables that appear only in the antecedent expression, just as for summation.
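The maxweight and maxout rules can be sketched as follows (a hypothetical model with invented edge weights, not Dyna's implementation): variables absent from the consequent are maximized over, while variables bound by the consequent index the result.

```python
# Hypothetical sketch of max= accumulation:
#   maxweight max= edge_weight(U,V).
#   maxout(U) max= edge_weight(U,V).
edge_weight = {("a", "b"): 3.0, ("a", "c"): 7.0, ("b", "c"): 5.0}

maxweight = max(edge_weight.values())  # maximizes over both U and V

maxout = {}  # maximizes over V only; U is bound by the consequent
for (u, v), w in edge_weight.items():
    maxout[u] = max(w, maxout.get(u, float("-inf")))

print(maxweight, maxout["a"])  # 7.0 7.0
```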

NOTE - Because of certain issues with the implementation, there is currently no min= accumulation operator. To do a minimization over positive numbers, please negate all the numbers and maximize.

Similarly, |= and &= respectively take the disjunction and conjunction of boolean-valued antecedents. Suppose we are given an unweighted graph, where edge(U,V) is true if and only if there is an edge from vertex U to vertex V. The following derives the transitive closure of the graph:

 path(U,V) |= edge(U,V).
 path(U,V) |= path(U,W) & edge(W,V).

In this case, a particular theorem such as path(a,b) will have value true if either edge(a,b) has value true, or there is some W such that path(a,W) and edge(W,b) are both true. All the antecedents from both rules are combined disjunctively because the accumulation operator is |=.
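The two path rules can be read as a least-fixpoint computation. Here is a hypothetical Python sketch (with an invented toy graph, not Dyna's actual evaluation strategy): the first rule seeds the chart, and the second is iterated until nothing new can be derived.

```python
# Hypothetical sketch of the |= rules as a least fixpoint:
#   path(U,V) |= edge(U,V).
#   path(U,V) |= path(U,W) & edge(W,V).
edge = {("a", "b"), ("b", "c")}

path = set(edge)          # first rule seeds the chart
changed = True
while changed:            # iterate the second rule to a fixpoint
    changed = False
    for (u, w) in list(path):
        for (w2, v) in edge:
            if w == w2 and (u, v) not in path:
                path.add((u, v))
                changed = True

print(("a", "c") in path)  # True
```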

(If &= were used instead of |=, then path(a,b) would be true if edge(a,b) had value true and, for all W, path(a,W) and edge(W,b) were both true. So |= and &= respectively accomplish existential and universal quantification over W.)

Notice that Dyna uses | and & as the boolean operators. This avoids the connotations of || or &&, which in C guarantee an order-dependent short-circuit evaluation. In Dyna, the evaluation order is up to the compiler.

NOTE - There is currently no &= accumulation operator. There is also no | operator.

Formal discussion

Form of inference rules

The general form of an inference rule is

consequent accumulation_operator antecedent

where the consequent is an item (known as a "theorem" by virtue of appearing in this context) that may contain variables as subterms, and the antecedent is an expression that may contain variables as subterms. (See pattern.)

A restriction on variables

In the present version of Dyna, variables in the consequent must also appear in the antecedent. (The same restriction appears in Datalog, the logical query language for relational databases.)

For example, the following rules are not allowed:

foo(X,Y) += bar(X).
word(epsilon,I,I) = true.
rewrite(X,X/Y,Y) = true. 
append([X|List1],List2,[X|List]) |= append(List1, List2, List).

Why not? Because the present version only supports forward-chaining, and only on variable-free terms (ground terms). If it finds a value for bar(a), it is unable to update all items of the form foo(a,Y).

Fortunately, such rules can often be eliminated by applying an "unfolding" transformation to the program. But the transformation fails to halt if the rules can produce items with arbitrarily deep variables, as in Prolog's difference lists.

A restriction on accumulation operators

If path is an item type, then all rules with a consequent of the form path(...) must use the same accumulation operator. Hence, the choice of accumulation operator is really a property of the path type, even though it is written syntactically as if it is part of the inference rules. For purposes of discussion, let us call this operator op=.

The meaning of inference rules

To define the value of a particular item \gamma (such as path(a,b)), we use the op operation to aggregate the values of all expressions \alpha (such as path(a,c)*edge(c,b)) such that \gamma op= \alpha matches some inference rule.

In practice (because Dyna finds the least fixpoint of a set of equations), only a finite number of these expressions \alpha will have non-default values \alpha_1, \alpha_2, \ldots, \alpha_n (n \geq 0). The value of \gamma is defined to be d op \alpha_1 op \alpha_2 op \cdots op \alpha_n, where d is the default value for items of \gamma's type, as specified by an :- item(..., d) declaration.
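This fold over the non-default antecedent values can be sketched in Python (a hypothetical model of the definition, not Dyna's implementation):

```python
from functools import reduce
import operator

# Hypothetical sketch of the formal definition: the value of a consequent
# gamma is d op alpha_1 op ... op alpha_n, folding the accumulation
# operator over the non-default antecedent values, starting from the default d.
def item_value(op, default, antecedent_values):
    return reduce(op, antecedent_values, default)

# With op = + and default d = 0, this is ordinary += accumulation.
print(item_value(operator.add, 0, [3, 4, 5]))  # 12
# With n = 0 non-default antecedents, the item keeps its default value.
print(item_value(operator.add, 0, []))         # 0
```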

Default values are significant

NOTE - nothing in this section is implemented.

This "in practice" definition ignores the infinitely many \alpha expressions that have the default value. Thus, it works only if defaults are specified appropriately (roughly speaking, so that the inference rule equations are consistent when all items have default values).

For example, foo(X) max= bar(X,Y)+1 maximizes for each foo(X) over infinitely many bar items that have the default value (as well as 0 or more bar items for which a non-default value has been derived). For consistency, the default value of foo(X) should be the result of maximizing infinitely many bar(X,Y)+1 expressions with the default value: that is, it should be 1 plus the default value of bar. If max= is replaced by += here, then the default value of foo(X) should be the result of summing over those expressions; hence there is no consistent (finite) default value for foo unless the default value for bar is -1.

A less artificial example is given by the program for arc-consistency in constraint programming:

 :- item(in_domain, bool, false). % axiom - user may assert some true values
 :- item(consistent, bool, true). % axiom - user may assert some false values
 :- item(variable, bool, false).  % what are the variables of the system?
 :- item(possible, bool, false).  % what values are arc-consistent for each variable?
 :- item(support, bool, false).   % why are those values arc-consistent?
 variable(Var) |= in_domain(Var:Val).
 possible(Var:Val) &= in_domain(Var:Val).
 possible(Var:Val) &= support(Var:Val, Var2) whenever variable(Var2).
 support(Var:Val, Var2) |= consistent(Var:Val, Var2:Val2) & possible(Var2:Val2).

Note that all the equations in the above program are consistent when all items have default values. In this case, possible defaults to false even though it is accumulated with &=, whose identity element is true. This is because one of the conjuncts is in_domain, which is initially false. If the user asserts that conjunct to be true, then possible may become true as well (at least temporarily, until some consistent axioms are asserted false).

This example highlights an error in the previous section: it shows that defaults are not used as accumulator identities. If they were, the declaration that possible defaults to false would mean that the value of a possible item would be fixed at false, because it would be defined as an accumulation false & ... & .... No, it is clear that identity elements are associated with the operator &. What defaults are used for is to specify the values of unasserted axioms (this contributes to the semantics) and of underived theorems (the compiler should be able to figure this out for itself, but can't because it's uncomputable, so we ask the user to do it and merely check). Hardcoded axioms like foo=3 are treated as assertions that override the defaults, as are axiom-like (antecedentless) rules such as bar(a) += 4.
The compiler will eventually check that the defaults are consistent and issue a warning if not. This may get a little tricky to formulate once we allow lines like foo(0):=5 and constit(epsilon,I,I)=1.
The final line in the above program means that if possible(Var2:Val2) becomes true, then infinitely many derived items support(_:_,Var2) must be updated to true, much as if there were variables in the consequent that didn't appear in the antecedent. This is because it is possible to get a non-default antecedent expression when just one of its subexpressions has non-default value. Things are much easier if we rephrase the program so that consistent starts out false and must be asserted true, as in the standard presentation of arc consistency.

Special accumulation operators

NOTE - nothing in this section is implemented

In general, an accumulation operator should be an associative, commutative binary operator. In other words, it can be regarded as acting on a set of antecedent values.

It is possible to ignore this condition: then the order of accumulation will affect the result. At present Dyna may not formally guarantee any particular accumulation order, so you'd better know what you're doing if you use non-commutative accumulation operators.

The = operator

One unusual accumulation operator is =, which was used at the start of this page. It only works on sets of size 1. If asked to aggregate two or more values, it will throw a runtime exception.

The = operator is useful for rules where no accumulation is possible. For example,

area = length*width.
output(Node) = sigmoid_function(input(Node)).   % for a neural network

In these cases, all the antecedent variables also appear in the consequent, so a given consequent item will have at most one antecedent (provided that there are no other rules that contribute to defining area and output). There is no need to pick an accumulation operator, and it would be confusing to arbitrarily select one like +=. That is why Dyna provides =.

The = operator is also useful for rules where no accumulation is expected in practice. For example, in a deterministic finite-state automaton (DFA),

path(State,[]) = 1.
path(State,[Symbol|Rest]) = transition(State,Newstate,Symbol)*path(Newstate,Rest).

implies that path(q,[a,b,c]) is the weight of the path from state q that reads the string [a,b,c]. The above rules using = are appropriate because in a DFA, this path is unique if it exists. If the rules are applied to a non-deterministic automaton, they will trigger a runtime exception when they attempt to aggregate multiple paths. In that case, one would have to replace = with an ordinary accumulation operator such as += or max=.

There may only be one derived antecedent per consequent, at any time (explain). However, it is fine for this antecedent's value to change as often as it likes.

Where = is the accumulation operator, the default value of the consequent item should equal the default value of the antecedent expression.

This is clearly the right course when the consequent binds all variables in the antecedent. It's probably also best when there are free variables in the antecedent, although an alternative in this case is to allow the user to choose any default value, or to use a special "undefined" value as the default.

In some cases, the compiler can prove at compile time that no accumulation is possible, in which case it can avoid the runtime check. This is the case if all the antecedent variables appear in the consequent, or are known through declarations to be determined by the variables in the consequent. (In particular, Dyna may eventually allow Mercury-style determinism declarations, which can be used to state and enforce the DFA requirement: for each State, Symbol pair there is at most one Newstate such that transition(State,Newstate,Symbol) has non-default value.)

Warning: The = accumulation operator is distinct from the term-unification operator = that may appear in expressions.

The := operator

A useful non-associative accumulation operator is assignment, :=. Recall that foo(X) += bar(X,Y) says to add values of bar(X,Y) to a running total foo(X), and foo(X) max= bar(X,Y) says to maximize values of bar(X,Y) into a running maximum foo(X). In much the same way,

foo(X) := bar(X,Y).

says to use values of bar(X,Y) to replace a running value foo(X). In other words, foo(X) holds the value that was most recently accumulated into it. This corresponds to a binary operator op such that a op b is defined to be b.

The usual warnings about non-associative accumulation operators apply (see above). Nonetheless, we suspect that := can be useful.

A subtlety: if foo(a) accumulates values 3,8,5 (giving it value 5), and the antecedent with value 8 later changes its value to 9, does foo(a) become 9 or does it stay 5? We need to study use cases.
There should perhaps be a complementary operator that uses the value of the first rather than the last antecedent found. This is useful for algorithms like union-find and Prim's algorithm. However, it can be handled anyway with foo(X) = bar(X,Y) if !derived(foo(X)).
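The last-write-wins behavior of := can be sketched as an ordinary (non-associative) binary operator (a hypothetical model with invented update order, not Dyna's implementation):

```python
# Hypothetical sketch of := as a non-associative accumulation operator:
# a op b is defined to be b, so the consequent keeps the value most
# recently accumulated into it.
def assign_op(a, b):
    return b

foo = {}
updates = [("a", 3), ("a", 8), ("a", 5)]  # antecedent values arrive in order
for x, val in updates:
    foo[x] = assign_op(foo.get(x), val)

print(foo["a"])  # 5 -- the last value accumulated wins
```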

Importing accumulation operators from C++

Dyna will eventually have the ability to import new accumulation operators from C++ for a given type, using some kind of :- accumulator(...) declaration.

If foo is a binary function or operator that has already been imported, then we will perhaps create a default definition for foo= as well. But it may be more efficient to implement foo= directly.

The direct implementation has the following advantages. When carrying out a foo= b to update a, it is faster to update a in place than to compute a foo b in a temporary and then copy it into a. Dyna often also needs to know whether a has changed as a result of this update, and the foo= implementation can provide information about that. Finally, the implementor ought to provide some other information to the Dyna compiler: e.g., whether foo distributes over other operators, and the result (if any) of x + y + y + y + ... (given x and y), which is needed for a compile-time check on the consistency of default values declared by :- item(...).


It is often necessary to write conditional expressions in inference rules.

Ordinary conditional expressions

The simplest conditional rules use C-style conditional expressions, of the form boolean-expression ? then-expression : else-expression. The following rule sums up the absolute values of the weights of all edges from vertex U:

total_abs_out(U) += edge(U,V) > 0 ? edge(U,V) : -edge(U,V).

Semi-expressions using if

Often, however, there is no "else" clause. In that case you can write a "semi-expression" for the antecedent. The following rule sums up the weights of all positively-weighted edges from vertex U:

total_pos_out(U) += edge(U,V) if edge(U,V) > 0.

And this rule sums up the weights of all edges from vertex U to a blue vertex:

total_out_to_blue(U) += edge(U,V) if blue(V).

This defines total_out_to_blue(U) as the summation \sum_{V: blue(V)} edge(U,V). In other words, the if clause restricts the set of terms V to sum over. Note that the keyword if is always followed by a boolean-valued expression.

Another nice example:

factorial(N) *= I if between(1,I,N).  % where between(...) is a foreign axiom
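The total_out_to_blue rule can be sketched in Python using the axioms asserted later on this page (edge(a,b)=3, edge(a,c)=4, edge(b,c)=5, blue(b)=true, blue(c)=false); this is a hypothetical model, not Dyna's implementation:

```python
# Hypothetical sketch of the if semi-expression:
#   total_out_to_blue(U) += edge(U,V) if blue(V).
# An antecedent contributes only when its test is true.
edge = {("a", "b"): 3.0, ("a", "c"): 4.0, ("b", "c"): 5.0}
blue = {"b": True, "c": False}

total_out_to_blue = {}
for (u, v), w in edge.items():
    if blue.get(v, False):          # the if clause restricts the sum
        total_out_to_blue[u] = total_out_to_blue.get(u, 0.0) + w

print(total_out_to_blue.get("a"))  # 3.0
```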

Semi-expressions differ from conditional expressions

The above total_out_to_blue rule is very similar to

total_out_to_blue(U) += blue(V) ? edge(U,V) : 0.

since 0 is the identity element for +=. (The if version, however, would work even if += did not have an identity element.) There is also a subtle difference. Consider the following set of asserted axioms:

edge(a,b) = 3.
edge(a,c) = 4.
edge(b,c) = 5.
blue(b) = true.
blue(c) = false.

Under the ?: rule, total_out_to_blue(a) would have two non-default antecedents

blue(b) ? edge(a,b) : 0    (with value 3)
blue(c) ? edge(a,c) : 0    (with value 0)

whereas under the if rule, it would have the single non-default antecedent

edge(a,b) if blue(b)       (with value 3)

Thus, although total_out_to_blue(a) has value 3 in both cases, the C++ antecedents() method (see chart object) will return fewer antecedents if the if rule is used.

The difference also affects which items are considered to have been derived (give reference). In the if case, total_out_to_blue(b) is not regarded as derived, because it has no antecedents.

Semi-expressions don't always have values

Not only is edge(a,c) if blue(c) not an antecedent of total_out_to_blue(a), but it has no value. In general, an if-expression has no value at all if its test evaluates to false. Thus, it is not properly an expression at all, but rather a "semi-expression" that may or may not have a value.

For simplicity, however, the C++ interface still regards these semi-expressions as expressions. So the antecedents of the double item total_out_to_blue(a), such as edge(a,b) if blue(b), are C++ terms of type expr_double. It is possible to ask the chart for their double values in the usual way (see chart object). Antecedents are in fact guaranteed to have values. An arbitrary semi-expression such as edge(a,c) if blue(c) also has type expr_double, of course, but might have no value in the chart (not even a default value), in which case asking for its value will throw a runtime exception.

Quantification in semi-expressions (whenever, if ever)

The following case can lead to confusion:

foo(X) += bar(X) if edge(X,Y) != 0.
% some axioms
edge(a,b) = 3.
edge(a,c) = 4.
bar(a) = 5.

Given the axioms above, what is the value of foo(a)? The best answer is probably 10, since += always sums over variables that appear in the antecedent pattern but not the consequent pattern, such as Y. The antecedents of foo(a) are

bar(a) if edge(a,b) != 0   (with value 5)
bar(a) if edge(a,c) != 0   (also with value 5)

(If you're still not convinced, notice that foo(a) would certainly be 10 if the rule had been written as foo(X) += edge(X,Y) != 0 ? bar(X) : 0.)

However, people often seem to think that foo(a) should be 5 in this case. To avoid confusion, Dyna requires you to word the rule as follows:

foo(X) += bar(X) whenever edge(X,Y) != 0.

This can be read as "increment foo(X) by bar(X) whenever you discover an edge edge(X,Y) with nonzero value." Thus, foo(a) is 10.

The whenever keyword must be used instead of if whenever there are variables like Y that appear only in the conditional test. It emphasizes the universal quantification over these variables. Here's another example that puts universal quantification to good use:

out_degree(U) += 1 whenever edge(U,V) != 0.   % counts the number of nonzero edges leaving U
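The out_degree rule can be sketched as follows (a hypothetical model with invented edge weights, not Dyna's implementation): each binding of the universally quantified variable V contributes the increment once.

```python
# Hypothetical sketch of whenever's universal quantification over V:
#   out_degree(U) += 1 whenever edge(U,V) != 0.
edge = {("a", "b"): 3.0, ("a", "c"): 4.0, ("b", "c"): 5.0}

out_degree = {}
for (u, v), w in edge.items():
    if w != 0:
        out_degree[u] = out_degree.get(u, 0) + 1

print(out_degree["a"])  # 2 -- one increment per nonzero edge leaving a
```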

But what if you really want existential rather than universal quantification? For instance, what if you really wanted foo(a) to be 5 in the previous example? This case comes up often in practice. In future, Dyna may allow you to write it conveniently as

foo(X) += bar(X) if ever edge(X,Y) != 0.   % existential quantification over Y

However, this construction is not strictly necessary. It is just syntactic sugar for

:- item(temp, bool, false).
foo(X) += bar(X) if temp(X).
temp(X) |= edge(X,Y) != 0.
Querying the antecedents of foo(a) may expose this syntactic sugar, as it's not clear what the antecedents should be otherwise.
We considered writing for each and if some, instead of whenever and if ever. Please let us know if you think that would be nicer.
In addition to if ever (which implicitly introduces temp(X) |= ...), we could add if always (which implicitly introduces temp(X) &= ...). Like if ever, and unlike whenever, this only adds the consequent once.
While we're adding new operators, we could also have ones for convenience that negate the condition: if vs. unless, if always vs. if never (or equivalently unless ever), if ever (or equivalently unless never) vs. unless always.
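The desugaring of if ever into a boolean temp can be sketched in Python, using the axioms from the foo/bar example above (a hypothetical model, not Dyna's implementation):

```python
# Hypothetical sketch of the if ever desugaring:
#   foo(X) += bar(X) if ever edge(X,Y) != 0.
# becomes a boolean temp accumulated with |=, so bar(X) is added once.
edge = {("a", "b"): 3.0, ("a", "c"): 4.0}
bar = {"a": 5.0}

temp = {}                       # temp(X) |= edge(X,Y) != 0.
for (x, y), w in edge.items():
    temp[x] = temp.get(x, False) or (w != 0)

foo = {}                        # foo(X) += bar(X) if temp(X).
for x, b in bar.items():
    if temp.get(x, False):
        foo[x] = foo.get(x, 0.0) + b

print(foo["a"])  # 5.0 -- added once, not once per matching edge
```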

In cases like the above, it is also often useful to use the ? operator (although be aware that it can carry a little extra overhead):

out_degree(U) += 1 whenever ?edge(U,V).   % counts the number of edges leaving U, even of value 0