Garbage collection

From Dyna
Jump to: navigation, search

Contents

Why garbage collection is needed

Often, a term object will be represented as a pointer to a record on the heap. (Multiple copies of the same term point to the same record.) At present, these records stick around until they are explicitly garbage collected.

Even if you retract all axioms from the chart, or destroy the chart object, these term records will still stick around in hopes that you might create the same terms again. It's rather like a cache, except it may eventually grow to fill up memory.

NOTE - for small programs or datasets, garbage collection is probably not necessary. You may want to wait until you start using up lots of memory before enabling it. For this reason, garbage collection is disabled by default. If you want to use garbage collection in your program, pass the option --xcompiler=-g to the dynac command line.

Explicit garbage collection

To tell Dyna to garbage-collect its unused term objects, your C++ program can call

gc()

Note that this function will be inside the Dyna program's namespace.

In future, Dyna will also call gc() automatically when the number of term records has increased significantly.

Portability

The garbage collector has to do some fancy footwork to keep track of terms that you yourself create with constructors in your C++ driver program. If your driver program maintains a a vector or hash table of hundreds of terms, we're careful not to delete them when you call gc(). We hope the solution is fairly portable, but it is possible that tweaking will be needed for some compilers or operating systems.

Efficiency

Dyna tracks your terms fairly efficiently for garbage collection. However, there is a bit of overhead involved in calling the tracking code every time your driver program constructs/destroys/copies/assigns/etc. a term.

Basic advice

If your driver program creates so many terms of its own that speed is an issue, we strongly recommend passing -O (or even -O3) to g++ when compiling the generated .cpp file. This allows the very short tracking functions to be inlined into the generated code.

You can also reduce the number of tracking operations by passing terms by reference instead of by value. This saves a call to the copy constructor, at the cost of an extra level of indirection. (Term objects are pointers (or sometimes small structs) that can be copied quickly, so one ordinarily treats them like ints and passes them by value. The only reason to pass them by reference is to avoid the overhead of tracking. Eventually, the profiler should be able to advise you on the tradeoff.)

Suppressing term tracking in user C++ code (advanced)

If you write a time-critical function or other block of code during which you construct or copy terms, you may want to disable the tracking of these terms to get the code to run faster.

To disable tracking within a scope, simply construct an object of type gc_disabler. Simply place the line:

gc_disabler x;

at the beginning of the scope. Once the gc_disabler object is destroyed, gc tracking is reenabled. The declaration remains in effect until the block ends and the declaration goes out of scope. In particular, it remains in effect during any functions called from the block (i.e., dynamic scoping). It is safe to nest these gc_disabler objects.

(Remark: Dyna disables gc tracking inside all internal functions (i.e., dequeue_build(), build(), enqueue(), backprop(), retract_*()) so that they run fast.)

However, the code must meet certain constraints in order to prevent leaks/segfaults: Put simply, if tracking is/is not disabled when the term is created, tracking must/must not be disabled when the term is destroyed:

  • For terms on the stack, if the term is created after the gc_disabler declaration, it must not be returned by the current function. If it were returned, that would mean that it was constructed inside the scope of the gc_disabler and destroyed somewhere outside of the current function (clearly outside the scope of the gc_disabler).
  • For terms on the heap (and remember, a term in a vector is a term on the heap!), if the term is constructed when tracking is off, it must also be destroyed when tracking is off (possibly elsewhere).

These two conditions should ensure a 1-1 matching of tracked construction/destruction of terms. Furthermore:

  • Any term that is created when gc tracking is off must be destroyed before gc() is called. Otherwise, the gc() function is not going to know that you have a copy of the term and will incorrectly delete it.

These conditions are slightly difficult to understand (and potentially inaccurate), and perhaps it is best that for now, gc_disabler not be used. eeraT 16:09, 14 Dec 2004 (EST)

Eric, will the runtime environment catch violations of some of these conditions? Which? Jason 13:44, 19 Dec 2004 (EST)

In future, we may allow nesting a gc_enabler declaration within gc_disabler. Then, for example, a time critical function could momentarily activate tracking while it constructed a term and placed it in a vector. Let us know if you need this.

Marking terms in user C++ code (advanced)

Dyna's basic garbage collection strategy is mark-and-sweep (although we may add support for reference counting of selected term types). Before sweeping, it marks all terms in the chart, all terms constructed by the driver program, etc.

Possibly the marking interface should be exposed to the user. Then the user could safely create terms when gc tracking is off, provided that he or she promises to mark these terms before calling gc.

(This is what Dyna does internally -- it does not waste time tracking terms that it adds to the chart, but rather marks terms in the chart before calling gc.)

Personal tools
Namespaces

Variants
Actions
Navigation
Tools