GC API: Difference between revisions

6,212 bytes added ,  23 April 2008
callbacks need closures, duh
(next mtg)
 
(13 intermediate revisions by 2 users not shown)
Line 24: Line 24:


'''Open issue:''' We should detect dumb implementation mismatches at link time and bomb out.  I don't know the trick, but I bet bsmedberg does.
'''Open issue:''' The JSAPI allows JSObjects to contain pointers to non-GC-allocated data which may contain pointers back to GC-allocated stuff (objects, strings, numbers).  See [[mdc:JSClass.mark]].  This existing API seems impossible to reimplement on top of the GC API as it stands.  First, the GC API doesn't offer a per-object custom marking hook.  Second, the GC API insists on a write barrier (the JSAPI doesn't).
'''Open issue:''' Need to document which operations require a request.


=API areas=
Line 37: Line 41:


==Allocation==
(BTW we haven't established naming conventions; this is just a sketch)


Line 44: Line 47:
(Where the layout information isn't a speed win, the implementation can of course just discard it.  A hacky implementation can just delegate <code>gc_alloc_with_layout</code> and <code>gc_alloc_array_with_layout</code> to <code>gc_alloc_conservative</code>.  Sloppy, but fine by me.)


XXXbsmedberg: I think this is incorrect. At least if layout information specifies that a word is not a GC pointer, we should reliably not trace that word.


All these functions return a pointer to a newly allocated region of memory that is subject to GC (that is, the GC may collect it when it becomes unreachable), or <code>NULL</code> on failure. XXXbsmedberg: the OOM API probably requires either that allocation functions never fail, or that there is a variant of these functions that never fails.


All allocations are <code>malloc</code>-aligned (that is, alignment is such that the pointer can be cast to any reasonable C/C++ type and used). They must be at least 8-byte aligned, so that three bits of tag are available.
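
As an illustration of why the 8-byte guarantee matters, here is a minimal sketch of packing a three-bit tag into the low bits of an aligned pointer. The tag names and helper functions are hypothetical and not part of the GC API, and plain <code>malloc</code> stands in for a GC allocation function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* With at least 8-byte alignment, the low three bits of a pointer are zero. */
#define TAG_MASK ((uintptr_t)7)

/* Hypothetical tag values, chosen only for this illustration. */
enum { TAG_OBJECT = 0, TAG_STRING = 1, TAG_DOUBLE = 2 };

/* Combine a pointer and a tag; the assertions check the alignment assumption. */
static uintptr_t tag_pointer(void *p, unsigned tag)
{
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0);
    assert(tag <= TAG_MASK);
    return bits | tag;
}

static unsigned get_tag(uintptr_t v)
{
    return (unsigned)(v & TAG_MASK);
}

static void *untag_pointer(uintptr_t v)
{
    return (void *)(v & ~TAG_MASK);
}

/* malloc is a stand-in for a GC allocation; returns 1 if the round trip works. */
static int tagging_demo(void)
{
    void *p = malloc(16);
    if (p == NULL)
        return 0;
    uintptr_t v = tag_pointer(p, TAG_STRING);
    int ok = get_tag(v) == TAG_STRING && untag_pointer(v) == p;
    free(p);
    return ok;
}
```

The mask-and-or approach only works because every allocation shares the same minimum alignment, which is presumably why the requirement is stated for all allocation functions rather than per call.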


  typedef enum GCAllocFlags {
Line 153: Line 157:


== GC ==
gc / maybe_gc / do_incremental_gc


'''Open issue:''' Hooks into the GC cycle.
void '''gc_collect'''();
 
Unconditionally collect garbage now. The current thread must be in a request.
 
void '''gc_maybe_collect'''(int msecs);
 
Suggest to the garbage collector that now might be a good time to collect garbage. The GC may decide to begin or continue incremental garbage collection during this call. <var>msecs</var> is an application hint indicating how many milliseconds incremental marking should be allowed to consume. There is no guarantee about the actual time consumed by the function.
 
typedef enum gc_GCStatus {
  GC_ROOTING,
  GC_LAST_ROOTING,
  GC_PRE_SWEEP,
  GC_POST_SWEEP,
  GC_FINISHED
} gc_GCStatus;
 
; GC_ROOTING
: The callback function may programmatically "root" objects by explicitly marking objects (via <tt>gc_mark_object</tt>). Note that application code may re-enter after this callback, if incremental GC is being performed.
; GC_LAST_ROOTING
: Like the GC_ROOTING callback, the callback function may programmatically "root" objects, but client code will not run before sweeping.
; GC_PRE_SWEEP
: At this point all marking has occurred. The callback function may synchronize external data structures by checking <tt>gc_get_markstate</tt>.
; GC_POST_SWEEP
: At this point all sweeping has occurred, and the program is about to be resumed. Threads other than the main thread have not yet been restarted.
; GC_FINISHED
: At this point garbage collection is finished and threads have been resumed. Garbage collection will not occur again until this callback is complete. ''See {{bug|430290}} for rationale.''
 
typedef void (*gc_callback)(
  gc_GCStatus state, void *closure);
 
void '''gc_add_callback'''(gc_callback callback, void *closure);
 
Register a callback function. If '''gc_set_thread_affinity''' has been called, the callback will occur on the specified thread.
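
A rough sketch of how a client might use the closure argument to carry its own state into a callback. The registry and <code>mock_fire</code> are stand-ins so the example compiles on its own; a real collector would drive the callbacks itself. The enum here includes <tt>GC_FINISHED</tt>, following the status descriptions above, and <code>CycleStats</code> and <code>count_cycles</code> are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef enum gc_GCStatus {
    GC_ROOTING, GC_LAST_ROOTING, GC_PRE_SWEEP, GC_POST_SWEEP, GC_FINISHED
} gc_GCStatus;

typedef void (*gc_callback)(gc_GCStatus state, void *closure);

/* Stand-in registry; a real GC implementation would own this. */
#define MAX_CALLBACKS 8
static struct { gc_callback fn; void *closure; } registry[MAX_CALLBACKS];
static size_t registry_len;

void gc_add_callback(gc_callback callback, void *closure)
{
    assert(registry_len < MAX_CALLBACKS);
    registry[registry_len].fn = callback;
    registry[registry_len].closure = closure;
    registry_len++;
}

/* Stand-in for the collector reaching one stage of a GC cycle. */
static void mock_fire(gc_GCStatus state)
{
    for (size_t i = 0; i < registry_len; i++)
        registry[i].fn(state, registry[i].closure);
}

/* Client-side state, passed through the closure pointer. */
typedef struct CycleStats {
    int rooting_calls;
    int cycles_finished;
} CycleStats;

static void count_cycles(gc_GCStatus state, void *closure)
{
    CycleStats *stats = closure;
    switch (state) {
    case GC_ROOTING:
    case GC_LAST_ROOTING:
        stats->rooting_calls++;   /* a real callback would mark roots here */
        break;
    case GC_FINISHED:
        stats->cycles_finished++;
        break;
    default:
        break;
    }
}

static CycleStats callback_demo(void)
{
    CycleStats stats = {0, 0};
    gc_add_callback(count_cycles, &stats);
    mock_fire(GC_ROOTING);
    mock_fire(GC_LAST_ROOTING);
    mock_fire(GC_PRE_SWEEP);
    mock_fire(GC_POST_SWEEP);
    mock_fire(GC_FINISHED);
    registry_len = 0;   /* drop the registration before stats goes out of scope */
    return stats;
}
```

Without the closure pointer, the callback would have to reach its state through globals, which is the problem the closure parameter avoids.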
 
'''Open issue:''' Need to document which callbacks may call which GC API functions.
 
The next two issues can only be resolved by taking a good hard look at SpiderMonkey internals.
 
'''Open issue:''' We may need to add callbacks for entering and leaving stop-the-world mode (what the MMGC_THREADSAFE comments call "exclusiveGC").  These are distinct from the pre-sweep and post-sweep callbacks, which only fire when a GC cycle ends; incremental marking stops the world too, but shouldn't fire those.
 
'''Open issue:''' We may need to expose the GC lock somehow.  SpiderMonkey currently uses it two ways: uses it as a general-purpose mutex (dubious); and creates condition variables protected by it (not quite as dubious, but still).


== Rooting ==
add root / remove root
The rooting API provides a simple way to treat a particular GC object as a root. More complex rooting scenarios can be accomplished with a precollect hook.
 
typedef struct GCRoot GCRoot; /* opaque */
GCRoot* '''gc_root_object'''(
  void *gcobject);
 
Treat gcobject as a root. <var>gcobject</var> must have been allocated with a GC allocation function.
 
void '''gc_remove_root'''(
  GCRoot *root);
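
A minimal sketch of pairing <code>gc_root_object</code> with <code>gc_remove_root</code>. The linked-list implementation, the <code>is_rooted</code> helper, and returning <code>NULL</code> on failure are all assumptions made so the example compiles on its own; the API sketch only says that <code>GCRoot</code> is opaque.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct GCRoot GCRoot;

/* Stand-in layout; the real GCRoot is opaque to clients. */
struct GCRoot {
    void *object;
    GCRoot *next;
};

static GCRoot *root_list;

/* Assumed behavior: returns a handle, or NULL if the handle cannot be allocated. */
GCRoot *gc_root_object(void *gcobject)
{
    GCRoot *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->object = gcobject;
    r->next = root_list;
    root_list = r;
    return r;
}

void gc_remove_root(GCRoot *root)
{
    GCRoot **link = &root_list;
    while (*link != NULL && *link != root)
        link = &(*link)->next;
    assert(*link == root);   /* the handle must come from gc_root_object */
    *link = root->next;
    free(root);
}

/* Hypothetical helper for the demo: is this object currently rooted? */
static int is_rooted(void *object)
{
    for (GCRoot *r = root_list; r != NULL; r = r->next)
        if (r->object == object)
            return 1;
    return 0;
}

static int rooting_demo(void)
{
    int cell = 0;                       /* stand-in for a GC-allocated object */
    GCRoot *root = gc_root_object(&cell);
    if (root == NULL)
        return 0;
    int during = is_rooted(&cell);
    gc_remove_root(root);
    return during && !is_rooted(&cell);
}
```

Returning a handle rather than keying removal on the object pointer lets the same object be rooted more than once, with each root removed independently.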
 
== Multithreading ==
Each thread must indicate when it enters/leaves a region of code that touches GC-managed memory (and therefore needs GC to happen only when it's at a safe point) and when it enters/leaves a region of code that doesn't touch GC-managed memory at all (basically one long safe point, where the thread doesn't care if GC happens or not).
 
For a single-threaded program with only one <code>GCHeap</code>, this just means calling <code>gc_begin_request(heap)</code> at startup and <code>gc_end_request(heap)</code> at shutdown.
 
Features:
 
void '''gc_begin_request'''(GCHeap heap);
 
Enter a request.
 
The calling thread must not be in any active requests on any heap.
 
void '''gc_end_request'''(GCHeap heap);
 
Leave the current request.
 
The calling thread must be in an active request on <code>heap</code>.
 
void '''gc_suspend_request'''(GCHeap heap);
 
Suspend the current request.
 
The calling thread must be in an active request on <code>heap</code>.  That request becomes inactive.
 
The calling thread must later call <code>gc_resume_request</code>.
 
Allocations pointed to by C/C++ local variables in the caller or any of its callers at the time of the call to <code>gc_suspend_request</code> will remain reachable until the matching <code>gc_resume_request</code> call.  (That is, they are temporarily rooted.)
 
void '''gc_resume_request'''(GCHeap heap);
 
Resume a suspended request.
 
The calling thread must not be in an active request on any <code>GCHeap</code>.
 
The most recently suspended inactive request that the calling thread is in on <code>heap</code> becomes active.
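
The preconditions on these four functions can be read as a small state machine. Below is a toy single-threaded model that checks them with assertions. It is only a sketch of the rules as stated here: <code>ToyHeap</code> and <code>active_heap</code> are hypothetical, and a real implementation would track this state per thread and coordinate with the collector.

```c
#include <assert.h>
#include <stddef.h>

/* Toy heap handle; only counts suspended requests for the one modeled thread. */
typedef struct ToyHeap {
    int suspended;
} ToyHeap;
typedef ToyHeap *GCHeap;

/* Heap of the modeled thread's active request, or NULL if none is active. */
static GCHeap active_heap;

void gc_begin_request(GCHeap heap)
{
    assert(heap != NULL);
    assert(active_heap == NULL);      /* no active request on any heap */
    active_heap = heap;
}

void gc_end_request(GCHeap heap)
{
    assert(heap != NULL && active_heap == heap);
    active_heap = NULL;
}

void gc_suspend_request(GCHeap heap)
{
    assert(heap != NULL && active_heap == heap);
    heap->suspended++;                /* the request becomes inactive */
    active_heap = NULL;
}

void gc_resume_request(GCHeap heap)
{
    assert(heap != NULL);
    assert(active_heap == NULL);      /* no active request on any heap */
    assert(heap->suspended > 0);      /* there must be something to resume */
    heap->suspended--;                /* most recently suspended one wakes up */
    active_heap = heap;
}

static int request_demo(void)
{
    ToyHeap h = {0};
    gc_begin_request(&h);
    gc_suspend_request(&h);           /* e.g. before a blocking call */
    gc_begin_request(&h);             /* allowed: no request is active now */
    gc_end_request(&h);
    gc_resume_request(&h);
    gc_end_request(&h);
    return active_heap == NULL && h.suspended == 0;
}
```

The nested <code>gc_begin_request</code> in the demo follows from the stated precondition, since a suspended request is not active; whether nesting is actually intended is not spelled out here.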
 
#define '''GC_FAST_SUSPEND_REQUEST'''(heap) ...
#define '''GC_FAST_RESUME_REQUEST'''(heap) ...
 
These are macros such that this code:
<pre style="border: none; padding: none; background-color: transparent">GC_FAST_SUSPEND_REQUEST(heap);
&lt;statements&gt;
GC_FAST_RESUME_REQUEST(heap);</pre>
 
expands to a C/C++ statement that behaves like this one:
<pre style="border: none; padding: none; background-color: transparent">{
    gc_suspend_request(heap);
    &lt;statements&gt;
    gc_resume_request(heap);
}</pre>
 
except that:
* in C++, <code>gc_resume_request</code> must be called whenever control exits the block, even if it exits via an exception, <code>return</code>, <code>break</code>, <code>continue</code>, or <code>goto</code>; and
* the behavior is undefined if the ''&lt;statements&gt;'' contain any identifier starting with <code>_gc_</code>.
 
If either macro is used any other way, the result is undefined.
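
One way the macros could be written in plain C is sketched below. This is an assumption about a possible implementation, not the actual one. It hides a local named <code>_gc_heap</code>, which is the kind of name the <code>_gc_</code> restriction reserves, and it does not handle early exits, which the rule above only requires in C++. The counting stubs stand in for the real request functions.

```c
#include <assert.h>

/* Toy heap that records calls, so the expansion can be checked. */
typedef struct ToyHeap {
    int suspends;
    int resumes;
    int inactive;
} ToyHeap;
typedef ToyHeap *GCHeap;

static void gc_suspend_request(GCHeap heap)
{
    heap->suspends++;
    heap->inactive = 1;
}

static void gc_resume_request(GCHeap heap)
{
    heap->resumes++;
    heap->inactive = 0;
}

/* The pair opens and closes a block; the heap expression is evaluated once. */
#define GC_FAST_SUSPEND_REQUEST(heap) \
    { GCHeap _gc_heap = (heap); gc_suspend_request(_gc_heap); {
#define GC_FAST_RESUME_REQUEST(heap) \
    } gc_resume_request(_gc_heap); }

static int macro_demo(void)
{
    ToyHeap h = {0, 0, 0};
    int seen_inactive = 0;

    GC_FAST_SUSPEND_REQUEST(&h);
    seen_inactive = h.inactive;       /* runs while the request is suspended */
    GC_FAST_RESUME_REQUEST(&h);

    return seen_inactive == 1 && h.suspends == 1 && h.resumes == 1
        && h.inactive == 0;
}
```

Because the two macros share an unbalanced brace, using either one alone fails to compile, which fits the rule that any other use is undefined.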
 
void '''gc_yield_request'''(GCHeap heap);

Equivalent to <code>{gc_suspend_request(heap); gc_resume_request(heap);}</code>.

void '''gc_set_thread_affinity'''();

Inform the GC that all finalizers and callback functions should be called on the current thread.

== Synchronization/concurrency ==
The request model, or something.

'''Open issue:''' This whole area.

('''jorendorff note:''' SpiderMonkey has some pretty awesome hacks in the gc synchronization code, requiring equally awesome hacking in ActionMonkey's branch of MMgc.  Maybe we could better divide the responsibilities.  Discuss.)


== Tracing ==