Term object

From Dyna


Term classes

As discussed earlier, every Dyna term type has a corresponding C++ class.

This includes union types and pattern types. It also includes structure and union types declared automatically by declaration inference. In particular, it includes automatically declared union types such as term, item, and trainable.

How to use term objects

Most often, you will construct terms in order to set or examine their values in a chart. You are also free to use them anywhere you want in your C++ program if you find them helpful.

You should treat term objects like C++ ints, or like objects in Java or Lisp. From the C++ point of view, assume they are small and cheap to copy. (Most of the time they are implemented as pointers to bigger objects stored on the heap, although this can be controlled by pragmas.) You don't need to take pointers to terms, and you don't need to use new to create them on the heap. Also, function parameters should be declared as terms, not term references: a function that declares it wants a foo& will only accept a foo&, whereas a function that declares it wants a foo will also accept arguments whose C++ type is a Dyna subtype of foo.
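To make the value-semantics advice concrete, here is a minimal sketch of how such a term class could look, assuming the "pointer to a heap object" representation mentioned above. Everything in it (node, foo, bar, handles_sharing_node, name_of, name_of_ref) is hypothetical and only stands in for dynac-generated classes; in particular, the Dyna subtype relationship is modeled here by a C++ conversion operator, which is one plausible mechanism, not a description of the generated code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not dynac output: a term is a small handle that
// points at an immutable node on the heap.
struct node {
    std::string functor;
    std::vector<int> args;
};

class foo {
public:
    foo(std::string f, std::vector<int> a)
        : p_(std::make_shared<node>(node{std::move(f), std::move(a)})) {}
    const std::string& functor() const { return p_->functor; }
    // How many handles currently share this node (copies share, never clone).
    long handles_sharing_node() const { return p_.use_count(); }
private:
    std::shared_ptr<const node> p_;  // const: the node is never modified
};

// Hypothetical Dyna subtype of foo, modeled as a class convertible to foo.
class bar {
public:
    explicit bar(int n) : n_(n) {}
    operator foo() const { return foo("bar", {n_}); }
private:
    int n_;
};

// By value: accepts a foo, and also a bar through the conversion.
std::string name_of(foo t) { return t.functor(); }

// By non-const reference: only an existing foo lvalue binds here, so
// name_of_ref(bar(1)) would not compile.
std::string name_of_ref(foo& t) { return t.functor(); }
```

Copying a foo copies one shared pointer, and because the node is const, sharing it is safe. The by-value function accepts a bar through the conversion, while the foo& version cannot bind to the temporary that such a conversion would create.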

Dyna's term objects are not initialized when created with a default constructor. Like the code { int x; cout << x << endl; }, the code { term x; cout << x << endl; } will produce undefined behavior (probably a crash).

Although copying a term is a cheap operation, it is not quite as cheap as copying a pointer, as the garbage collector monitors the copying of terms by your C++ driver program. You may want to avoid copying terms inside heavily used loops if possible. This caution is based on intuition, not profiling, however, so there may not be any significant slowdown from copying terms. Eerat 15:08, 2 Jul 2004 (EDT)
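If profiling does show term copies to matter, the usual C++ remedy is to iterate by const reference. The sketch below uses a hypothetical counted_term that counts its own copies, standing in for whatever bookkeeping the real garbage collector performs; it shows that a by-value loop copies every element while a const-reference loop copies none.

```cpp
#include <cassert>
#include <vector>

// Hypothetical term-like handle that counts its own copies, standing in for
// the per-copy bookkeeping described above.
struct counted_term {
    static int copies;
    int id;
    explicit counted_term(int i) : id(i) {}
    counted_term(const counted_term& o) : id(o.id) { ++copies; }
    counted_term& operator=(const counted_term& o) {
        id = o.id;
        ++copies;
        return *this;
    }
};
int counted_term::copies = 0;

int sum_by_value(const std::vector<counted_term>& ts) {
    int s = 0;
    for (counted_term t : ts) s += t.id;         // one copy per element
    return s;
}

int sum_by_reference(const std::vector<counted_term>& ts) {
    int s = 0;
    for (const counted_term& t : ts) s += t.id;  // no copies
    return s;
}
```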

Terms are immutable; there are no methods to modify a term. You can, however, modify a term's value in a chart.

Term constructors

Constructors for structures and patterns have the same form as in Dyna. For example, a structure constructor such as constit("np",3,5) is valid in both Dyna and C++.

Actually, this syntax is broken for 1-argument types and needs to be rethought: occasionally the constructor is mistaken for a copy constructor. eeraT 01:03, 20 Jul 2004 (EDT)

Note that 0-argument constructors like zero can be written without the parentheses. (The Dyna compiler here defined zero to be a global C++ constant. It is an instance of a C++ class that is also named zero. C++ prefers to interpret zero as referring to the constant; you can use class zero to refer to the class if necessary.)
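The naming arrangement in the parenthetical is ordinary C++: a variable may share its name with a class, after which the plain name denotes the variable and an elaborated type specifier such as class zero denotes the class. A minimal sketch, with a hypothetical functor() method standing in for the generated accessors:

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for the class generated for a 0-ary functor.
class zero {
public:
    std::string functor() const { return "zero"; }
};

// The global constant shares the class's name. From here on, plain `zero`
// names this object, and `class zero` names the type.
const class zero zero{};

// Needs the elaborated form, because plain `zero` is now the constant.
class zero make_zero() { return zero; }

static_assert(std::is_same<decltype(zero), const class zero>::value,
              "plain `zero` refers to the constant");
```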

Note that C++ does not get Dyna's syntactic sugar for terms. For example,

[1,2,3]                       % ok in Dyna only
cons(1,cons(2,cons(3,nil)))   % equivalent, ok in both C++ and Dyna

At present there are C++ constructors only for base types. Eventually we will also have C++ constructors for pattern types, using the same pattern-constructor syntax as in Dyna, and, under certain circumstances (e.g., for overloaded functors), constructors for union types.
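As an illustration of the desugared form, here is a minimal sketch of a list type with a global nil constant and a cons(...) function, plus an operator<< that prints with Dyna's bracket sugar. The classes are hypothetical stand-ins (integer elements only, nil represented by an empty pointer), not what dynac generates.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for a generated list type (the union of cons and
// nil); an empty pointer plays the role of nil.
struct cons_node;

class list {
public:
    list() = default;            // the nil list
    list(int head, list tail);   // a cons cell
    bool is_nil() const { return !p_; }
    int head() const;
    list tail() const;
private:
    std::shared_ptr<const cons_node> p_;
};

struct cons_node {
    int head;
    list tail;
};

list::list(int head, list tail)
    : p_(std::make_shared<cons_node>(cons_node{head, std::move(tail)})) {}
int list::head() const { return p_->head; }
list list::tail() const { return p_->tail; }

// Mirrors the generated constructors: a 0-ary constant and a 2-ary function.
const list nil{};
list cons(int head, list tail) { return list(head, std::move(tail)); }

// Prints cons(1,cons(2,cons(3,nil))) as [1,2,3].
std::ostream& operator<<(std::ostream& out, list l) {
    out << '[';
    for (bool first = true; !l.is_nil(); l = l.tail(), first = false) {
        if (!first) out << ',';
        out << l.head();
    }
    return out << ']';
}
```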

Primitive term objects

int vs. int_term
string vs. atom or dyna_string

Term I/O

To print a term t:

cout << t;

To read a term off a stream (using a parser generated by dynac so that it's specialized to your Dyna program):

cin >> t;
Might not be implemented publicly at the moment.

To turn a term into a string (let us know if you want a convenience method for this):

ostringstream s;
s << t;
return s.str();
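If you want that convenience method now, a small template along these lines would do. term_to_string is a hypothetical name; the template works for any type with an operator<<, which the term classes provide according to the cout << t example above. span_term is a stand-in type used only for testing.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical convenience wrapper: renders anything printable with <<.
// Takes its argument by value, as recommended for terms.
template <typename T>
std::string term_to_string(T t) {
    std::ostringstream s;
    s << t;
    return s.str();
}

// Hypothetical stand-in for a generated structure type.
struct span_term {
    int start, end;
};

std::ostream& operator<<(std::ostream& out, span_term t) {
    return out << "span(" << t.start << "," << t.end << ")";
}
```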

To turn a string into a term:

term t = parse_term("constit(\"np\",3,5)");   // in the program's namespace

Currently parse_term is misnamed parse_string.
Note that term I/O uses Dyna's syntactic sugar on input and output: list syntax and infix operators.
In particular, the specialized string-to-term parser handles list syntax on input as follows. To read a term of type nil (or a supertype such as list), it accepts the [] notation, provided that nil has been declared with arity 0. To read a cons (or a supertype such as list), it accepts the [a,b|c] notation, provided that cons has been declared with arity 2 and its second argument can also be a cons. It also accepts the [a,b,c] notation, provided that cons has been declared with arity 2, its second argument can be either another cons or a nil, and nil has been declared with arity 0.
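The list rules above can be sketched as a plain string rewrite from sugared to desugared notation. This is not dynac's parser: it skips the arity and type checks the real parser performs, accepts only identifiers, numbers, and nested lists as elements, and does not skip whitespace. The class and function names are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Rewrites [] -> nil, [a,b,c] -> cons(a,cons(b,cons(c,nil))),
// and [a,b|c] -> cons(a,cons(b,c)), recursively.
class list_desugarer {
public:
    explicit list_desugarer(std::string text) : s_(std::move(text)) {}
    std::string run() {
        std::string out = term();
        if (i_ != s_.size()) throw std::runtime_error("trailing input");
        return out;
    }
private:
    std::string s_;
    std::size_t i_ = 0;

    bool at(char c) const { return i_ < s_.size() && s_[i_] == c; }
    void expect(char c) {
        if (!at(c)) throw std::runtime_error(std::string("expected ") + c);
        ++i_;
    }
    // A term is a nested list or a run of letters, digits, and underscores.
    std::string term() {
        if (at('[')) return list();
        std::size_t start = i_;
        while (i_ < s_.size() &&
               (std::isalnum(static_cast<unsigned char>(s_[i_])) || s_[i_] == '_'))
            ++i_;
        if (start == i_) throw std::runtime_error("expected a term");
        return s_.substr(start, i_ - start);
    }
    std::string list() {
        expect('[');
        if (at(']')) { ++i_; return "nil"; }
        std::vector<std::string> elems{term()};
        while (at(',')) { ++i_; elems.push_back(term()); }
        std::string tail = "nil";            // [a,b,c] ends in nil
        if (at('|')) { ++i_; tail = term(); } // [a,b|c] ends in c
        expect(']');
        // Wrap right to left so the tail ends up innermost.
        std::string out = tail;
        for (auto it = elems.rbegin(); it != elems.rend(); ++it)
            out = "cons(" + *it + "," + out + ")";
        return out;
    }
};

std::string desugar(const std::string& text) {
    return list_desugarer(text).run();
}
```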

Term casts

Suppose you have a union type as follows:

:- union(pachyderm, [elephant,rhino]).

Upcasts from a subtype to a supertype are implicit:

elephant e;
pachyderm p = e;                  // implicit upcast
pachyderm q = e.to_pachyderm();   // explicit version, if necessary
Don't use the cast syntax (pachyderm)e. It will work correctly for upcasts, but might be misinterpreted as a constructor call for downcasts, so be safe and avoid it entirely.

Downcasts are explicit, and are checked at runtime:

pachyderm p = ...;
if (p.is_elephant()) {
  elephant e = p.to_elephant();
  return e.trunk();
}

or, letting the checked downcast throw:

pachyderm p = ...;
try {
  elephant e = p.to_elephant();
  return e.trunk();
} catch (std::exception &ex) {
  cout << "This is an exceptional rhino";
}
Sidecasts work like downcasts:

Sidecasts are broken right now eeraT 00:48, 20 Jul 2004 (EDT)

:- union(swimmer,[goldfish,whale,dolphin]).
:- union(mammal,[horse,whale,dolphin]).
mammal m = ...;
if (m.is_swimmer()) {
  return m.to_swimmer().left_fin();  // actually returns something in an unnamed
                                     // union type [whale,dolphin]
}
In particular, sidecasting from foo_antecedent=[a,b,c] to times=[c,d,e] should produce something of type c, not of type [c,d,e], so that further methods on it can take advantage of the type knowledge about the particular kind of times (multiplicative) expression that we are dealing with. This is especially useful in something like:
if (a.is_times()) {
   bar b = a.to_times().first();    // knows the type of first, so no cast needed
   cout << a.to_times().second();   // knows the type of second
}
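Since sidecasts are broken at present, only the intended typing rule is sketched here. If each union type is modeled as the set of its member types, the result type of a sidecast is the intersection of the two sets; with a single member left, as in the foo_antecedent/times example, the result is that member type itself. The union_type alias and sidecast_result function are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

// A union type modeled as the set of its member type names.
using union_type = std::set<std::string>;

// A sidecast can only succeed on members common to both unions,
// so its result type is the intersection of the two sets.
union_type sidecast_result(const union_type& from, const union_type& to) {
    union_type common;
    std::set_intersection(from.begin(), from.end(), to.begin(), to.end(),
                          std::inserter(common, common.begin()));
    return common;
}
```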

Methods to analyze terms

  • is it of type foo? (The is_foo() method; note when this is/isn't defined. Also note the to_foo() typecast, documented under Term casts above.)
  • get the base type (see Type object)
    • note issues with C names for operators like + and * (including the current yucky semiring hack where everything comes out as times)
  • number of arguments
  • argument accessors
    • named accessors.
    • numbered: arg0, arg1, etc.
    • function-style: argc, arg(0), arg(1), etc.
    • explanation of accessors for union types

The first argument is also called arg0, the second argument is also called arg1, etc. So if the type declaration is

  foo(int x, list y)

then the methods x and arg0 do the same thing. This is especially useful for unnamed fields:

  foo(int, list)

is equivalent to

  foo(int arg0, list arg1)

If foo is a union type, then it will have an x accessor if at least one subtype of foo has an x accessor. If foo is a union over

    foo1(list x)
    foo2(list x, string y)
    foo3(string y, list x)

then its x accessor will have return type list, whereas if it is a union over those and also

    foo4(string y, int x)
    foo5(string y)

then its x accessor's return type will be the union type [list,int,null]. If such a type does not already exist in the Dyna program, then one is added on the C++ side and is given an arbitrary name such as union72.
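The return-type rule can be stated as a small function. This sketch models each union member as a hypothetical map from field names to field type names. It returns an empty set when no member has the field (so no accessor is generated) and otherwise adds null when some member lacks the field.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// A member structure type, modeled as field name -> field type name.
using fields = std::map<std::string, std::string>;

// Return type of the union's accessor for field `name`, as a set of types.
std::set<std::string> accessor_type(const std::vector<fields>& members,
                                    const std::string& name) {
    std::set<std::string> result;
    bool missing = false;
    for (const fields& m : members) {
        auto it = m.find(name);
        if (it == m.end()) missing = true;
        else result.insert(it->second);
    }
    if (result.empty()) return result;  // no member has it: no accessor
    if (missing) result.insert("null");
    return result;
}
```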

Methods to analyze patterns

If you declare

:- structure(constit(string nonterm, int start, int end)).
:- pattern(mypattern, constit("np",I,I)).

then mypattern acts like a subclass of constit. A mypattern object has all the usual constit accessors, although it may be able to implement them more efficiently: for example, mypattern::nonterm() or mypattern::arg0() always returns "np".

In addition, mypattern has the pattern-match accessor I(), which returns the binding of the variable I in the pattern declaration:

constit("s",0,8).to_mypattern()         // throws exception: checked downcast fails
constit("np",5,5).to_mypattern().I()    // returns 5

A typical use would be as follows:

term t = constit("np",5,5);          // equivalently: constit t("np",5,5) (faster, no =)
if (t.is_mypattern()) {              // check for pattern match
   mypattern p = t.to_mypattern();   // checked downcast to pattern subtype
   cout << p.arg0() << p.arg1() << p.arg2();     // prints np55
   cout << p.nonterm() << p.start() << p.end();  // prints np55
   cout << p.I();                                // prints 5
}
  • The first output statement uses the generic argument accessors for term.
  • The second output statement uses the argument accessors for constit. These are available because mypattern is a subtype of constit.
  • The third output statement is special to pattern instances: it recovers the binding of I in the match. This means that variable names are significant when you declare patterns in Dyna.

The Dyna compiler gets to decide, on efficiency grounds, whether to store a mypattern instance in memory as a constit instance (from which the variable bindings can be extracted), as a set of variable bindings (from which the constit instance can be reconstructed), or as a combination or hybrid of these.
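The following sketch models mypattern under one of those storage choices, keeping only the binding of I and rebuilding the constit accessors from it. The constit and mypattern classes here are hypothetical stand-ins for generated code, with static matches and from functions in place of is_mypattern() and to_mypattern().

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for code generated from
//   :- structure(constit(string nonterm, int start, int end)).
//   :- pattern(mypattern, constit("np",I,I)).
struct constit {
    std::string nonterm;
    int start, end;
};

class mypattern {
public:
    // Pattern match on the term alone; no chart is involved.
    static bool matches(constit c) {
        return c.nonterm == "np" && c.start == c.end;
    }
    // Checked downcast: throws if the constit does not match.
    static mypattern from(constit c) {
        if (!matches(c)) throw std::runtime_error("constit does not match mypattern");
        return mypattern(c.start);
    }
    int I() const { return i_; }                   // binding of variable I
    std::string nonterm() const { return "np"; }   // constant in the pattern
    int start() const { return i_; }
    int end() const { return i_; }
    constit to_constit() const { return constit{"np", i_, i_}; }  // rebuilt
private:
    explicit mypattern(int i) : i_(i) {}
    int i_;  // stored as bindings only
};
```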

Note that what patterns match is terms. They don't care at all about the values of the terms, so a chart is not needed to tell whether a term matches a pattern. To find items or expressions in a chart that match a pattern, use a query.

Avoiding name conflicts in accessors

x is really just an alias for get_x. The get_* form is always guaranteed to exist, whereas the x form is not, because x might conflict with a C++ keyword or with an existing method of the class. For example,

   mytype(string class)

cannot have a C++ accessor called class(), but you can still use get_class.

Automatically generated C++ code should take care to use the get_* or arg# accessors because they are guaranteed to exist and work as expected.

What happens if a user does something smart-alecky like

  :- type(foo(int arg1, string arg0, foo x, bar get_x)).

The arg# accessors continue to work: so the four arguments can be referred to as arg0, arg1, arg2, arg3 in that order. The get_* accessors also continue to work: so the four arguments can be referred to as get_arg1, get_arg0, get_x, and get_get_x in that order. The foo class only provides direct named accessors when they don't conflict with existing methods of the class or C++ keywords; in this case, it provides an x accessor (a synonym for arg2) but the other names are already taken.

Union types work the same way. The union-type class still always has the arg# and get_* accessors. It only adds direct named accessors if they don't conflict with existing methods of the union-type class.
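The naming rules can be summarized as a function from field names to the set of direct accessors. This sketch is hypothetical: it considers only clashes with the generated arg# and get_* accessors and with a deliberately partial list of C++ keywords, whereas the real class may have further methods that also block a name.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Deliberately partial keyword list, for illustration only.
const std::set<std::string> cpp_keywords = {
    "class", "int", "new", "delete", "template", "this", "operator", "return"};

// Every field i gets arg<i> and get_<name>; a direct accessor <name> is
// added only if it clashes with neither of those nor a keyword.
std::set<std::string> direct_accessors(const std::vector<std::string>& field_names) {
    std::set<std::string> taken;
    for (std::size_t i = 0; i < field_names.size(); ++i) {
        taken.insert("arg" + std::to_string(i));
        taken.insert("get_" + field_names[i]);
    }
    std::set<std::string> direct;
    for (const std::string& name : field_names)
        if (!taken.count(name) && !cpp_keywords.count(name))
            direct.insert(name);
    return direct;
}
```

For the foo(int arg1, string arg0, foo x, bar get_x) example, only x survives, matching the description above.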

We may also want to worry about the case where local variables in the generated code conflict with symbols declared in the user's C++ program. (But this probably shouldn't result in renaming accessors as above, anyway -- maybe just pick obscure names for local vars? Is this really a problem? Cf. STL, but that's not separately compiled) Eerat 13:35, 2 Jul 2004 (EDT)
list is a bad name for the union of cons and nil: if the user has #include <list> and using namespace std; in their program, the STL linked list will conflict with the Dyna list. Eerat 14:07, 2 Jul 2004 (EDT)
Is it really worth renaming for? They can write cky::list in that case. We'd warn users about that. Jason 16:53, 15 Jul 2004 (EDT)

Copying terms

Just as for integers:

term a = ...;
term b = a;

Testing term equality

Just as for integers:

if (a == b) ...

At present this public interface is missing, but a.equals(b) may be available for some types.

Hashing terms

You can map from terms to ints (for example) by using a chart, but you can also do it with your own hash table:

hash_map<term,int> h;

A hash code for t can be obtained directly as hash<term>()(t).

At present this public interface is missing, but you can get a hash code with t.hashcode()
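With a standard library from C++11 on, std::unordered_map replaces the older hash_map. This sketch assumes a term type that exposes hashcode() and equals() as described above; word_span is a hypothetical stand-in, and specializing std::hash and defining operator== are what let the container use it as a key.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical term type with the hashcode()/equals() interface.
class word_span {
public:
    word_span(int start, int end) : start_(start), end_(end) {}
    std::size_t hashcode() const {
        return std::hash<int>()(start_) * 31 + std::hash<int>()(end_);
    }
    bool equals(word_span o) const { return start_ == o.start_ && end_ == o.end_; }
private:
    int start_, end_;
};

// Equality used by unordered_map to compare keys with equal hashes.
bool operator==(word_span a, word_span b) { return a.equals(b); }

// Lets std::unordered_map<word_span, V> use hashcode() directly.
namespace std {
template <>
struct hash<word_span> {
    std::size_t operator()(word_span t) const { return t.hashcode(); }
};
}
```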