GC API: Difference between revisions

From MozillaWiki
Jump to navigation Jump to search
(steal bsmedberg's flags, gc_free())
Line 47: Line 47:
All allocations are <code>malloc</code>-aligned (that is, alignment is such that the pointer can be cast to any reasonable C/C++ type and used).
All allocations are <code>malloc</code>-aligned (that is, alignment is such that the pointer can be cast to any reasonable C/C++ type and used).


  void * '''gc_alloc_no_pointers'''(GCHeap heap, size_t size);
typedef enum GCAllocFlags {
    '''GC_HAS_FINALIZER'''
} '''GCAllocFlags''';
 
The ''flags'' parameter to the allocation functions is a bitwise OR of <code>GCAllocFlags</code> constants.  Currently we only have one: <code>GC_HAS_FINALIZER</code> indicates that the allocation starts with a C++ object of a type that has a virtual destructor.  The GC will call the destructor at some point after the object becomes unreachable and before its memory is reused or the heap is destroyed.
 
  void * '''gc_alloc_no_pointers'''(
    GCHeap heap, size_t size, int flags);


Allocate ''size'' bytes of memory, not necessarily zeroed.  The GC will assume the memory never contains pointers to other GC allocations.
Allocate ''size'' bytes of memory, not necessarily zeroed.  The GC will assume the memory never contains pointers to other GC allocations.


  void * '''gc_alloc_conservative'''(GCHeap heap, size_t size);
  void * '''gc_alloc_conservative'''(
    GCHeap heap, size_t size, int flags);


Allocate and zero out ''size'' bytes of memory.  The GC will conservatively scan the memory for pointers.
Allocate and zero out ''size'' bytes of memory.  The GC will conservatively scan the memory for pointers.


  void * '''gc_alloc_onion'''(GCHeap, size_t rawSize, size_t ptrsSize);
  void * '''gc_alloc_onion'''(
    GCHeap heap, size_t rawSize, size_t ptrsSize, int flags);


Allocate ''rawSize'' + ''ptrsSize'' bytes of memory.  Both sizes must be pointer-size-aligned.  The first ''rawSize'' bytes are not necessarily zeroed.  They GC will assume that they never contain pointers.  The next ''ptrsSize'' bytes are treated as an array of "possible pointers" (<code>jsval</code>, really; see below).
Allocate ''rawSize'' + ''ptrsSize'' bytes of memory.  Both sizes must be pointer-size-aligned.  The first ''rawSize'' bytes are not necessarily zeroed.  They GC will assume that they never contain pointers.  The next ''ptrsSize'' bytes are treated as an array of "possible pointers" (<code>jsval</code>, really; see below).
''flags'' must not contain <code>GC_HAS_FINALIZER</code>.


Returns a pointer to the first byte of the pointers region, so the non-pointer-containing words are at negative offsets and the pointer-containing words are at positive offsets.
Returns a pointer to the first byte of the pointers region, so the non-pointer-containing words are at negative offsets and the pointer-containing words are at positive offsets.


  void * '''gc_alloc_with_layout'''(GCHeap heap, GCLayout layout);
  void * '''gc_alloc_with_layout'''(
    GCHeap heap, GCLayout layout, int flags);


Allocate and zero out memory for an object.  The ''layout'' specifies the size of the allocation and which words of it may contain pointers. Layout is described in more detail below.
Allocate and zero out memory for an object.  The ''layout'' specifies the size of the allocation and which words of it may contain pointers. Layout is described in more detail below.
Line 69: Line 81:
     GCLayout headerLayout,
     GCLayout headerLayout,
     GCLayout elementLayout,
     GCLayout elementLayout,
     size_t count);
     size_t count,
    int flags);


Allocate and zero out memory for an array.  The layout of the array consists of an optional header described by ''headerLayout'', followed by ''count'' array elements, each described by ''elementLayout''.  Returns a pointer to the beginning of the allocation (the header).
Allocate and zero out memory for an array.  The layout of the array consists of an optional header described by ''headerLayout'', followed by ''count'' array elements, each described by ''elementLayout''.  Returns a pointer to the beginning of the allocation (the header).
Line 78: Line 91:


'''Open issue:''' Weak references.
'''Open issue:''' Weak references.
== Explicit deallocation ==
void '''gc_free'''(GCHeap heap, void *ptr);
Frees the allocation at ''ptr'', which must be a pointer returned by an allocation function.  The region of memory at ''ptr'' must be live, but the caller must also ensure that no live, exactly-scanned words point to it.  (Conservatively scanned words, including C/C++ temporaries and local variables, may point to it.)
This may be a no-op.


== Layout ==
== Layout ==
Line 137: Line 158:
This is a description of the private API that the allocator needs to provide to the GC engine which runs atop it.  (The allocator also provides all the Allocation and Layout APIs, which are user-visible.)
This is a description of the private API that the allocator needs to provide to the GC engine which runs atop it.  (The allocator also provides all the Allocation and Layout APIs, which are user-visible.)


typedef enum gcallocflags {
  GC_HAS_FINALIZER
};
/* allocate memory which has no pointers */
void * gc_alloc_no_pointers(gcheap *heap, size_t size,
                            int gcallocflags);
/* allocate memory in which every word is a pointer */
void * gc_alloc_all_pointers(gcheap *heap, size_t size,
                              int gcallocflags);
/* allocate memory in which every word must be conservatively scanned
    for pointers */
void * gc_alloc_conservative(gcheap *heap, size_t size,
                              int gcallocflags);
/* allocate memory using a layout object */
void * gc_alloc_with_layout(gcheap *heap, gclayout *layout,
                            int gcallocflags);
/* allocate an array of objects */
void * gc_alloc_array_with_layout(gcheap *heap, gclayout *header_layout,
                                  gclayout *entry_layout, size_t entrycount,
                                  gcallocflags);
void gc_free(gcheap *heap);
  /* APIs for marking */
  /* APIs for marking */
   
   

Revision as of 15:03, 1 April 2008

This page contains notes regarding a generic conservative-GC API that we want to insert between application code (SpiderMonkey/Tamarin/XPCOM/whatever) and GC implementations (MMgc/jegc/spam/whatever).

Very preliminary.

jorendorff 11:11, 28 March 2008 (PDT)

If you want to listen in on some design discussions, lurk in the #memory channel on irc.mozilla.org. Next meeting: Wednesday, April 2, 2008, 12:00 noon Pacific time.

Decisions

We will require the GC implementation to scan the C/C++ stack conservatively.

The API will support mostly-exact GC, meaning each object is exactly scanned to find other objects directly reachable from it, and only the stack and certain objects are scanned conservatively. (However, it seems to me like you could still implement this kind of API as a thin wrapper around a conservative-only GC like Boehm GC. It would just be more conservative than the API requires, that's all.)

We will support C++ objects somehow, as well as something like spam's "onion" layout.

We won't bend over backwards to make the API support MMgc in the fastest or most natural way. For example, the fastest possible MMgc write barrier requires the caller to know the beginning-of-allocation pointers for both the object containing the field being written and the pointer being stored there. This is a pain for the user, and a card-marking GC doesn't need those pointers anyway. So our API will not provide this information; the MMgc wrapper will have to calculate it each time the write barrier hits. This will make MMgc look slow, but we can live with that.

The purpose of this API is to support switching to a different GC implementation at compile time, not at run time. (Each implementation will have its own header file, with its own inline functions or function-like macros. Maybe we'll produce a GC API header file that has no inline anything, just showing the signatures the implementation must provide, for reference.)

Open issue: API style. (C or C++? Use templates? What naming conventions? I think it's too early to ask these questions. So please consider all the function signatures below to be pseudocode.)

Open issue: We should detect dumb implementation mismatches at link time and bomb out. I don't know the trick, but I bet bsmedberg does.

API areas

Heap setup/teardown

We support multiple heaps; each heap is a disjoint universe of objects and can be collected separately. Implementations are expected not to trace pointers across heaps.

typedef ... GCHeap;

GCHeap gc_heap_create();

void gc_destroy_heap(GCHeap heap);

Allocation

(BTW we haven't established naming conventions; this is just a sketch)

The idea here is to support allocations of different types: no_pointers allocations are leaf objects and never contain pointers at all; with_layout and array_with_layout allocations have a known layout that governs which fields are pointers; and conservative allocations might contain pointers at any pointer-aligned offset and must be conservatively scanned.

(Where the layout information isn't a speed win, the implementation can of course just discard it. A hacky implementation can just delegate gc_alloc_with_layout and gc_alloc_array_with_layout to gc_alloc_conservative. Sloppy, but fine by me.)


All these functions return a pointer to a newly allocated region of memory that is subject to GC (that is, the GC may collect it when it becomes unreachable), or NULL on failure.

All allocations are malloc-aligned (that is, alignment is such that the pointer can be cast to any reasonable C/C++ type and used).

typedef enum GCAllocFlags {
    GC_HAS_FINALIZER
} GCAllocFlags;

The flags parameter to the allocation functions is a bitwise OR of GCAllocFlags constants. Currently we only have one: GC_HAS_FINALIZER indicates that the allocation starts with a C++ object of a type that has a virtual destructor. The GC will call the destructor at some point after the object becomes unreachable and before its memory is reused or the heap is destroyed.

void * gc_alloc_no_pointers(
    GCHeap heap, size_t size, int flags);

Allocate size bytes of memory, not necessarily zeroed. The GC will assume the memory never contains pointers to other GC allocations.

void * gc_alloc_conservative(
    GCHeap heap, size_t size, int flags);

Allocate and zero out size bytes of memory. The GC will conservatively scan the memory for pointers.

void * gc_alloc_onion(
    GCHeap heap, size_t rawSize, size_t ptrsSize, int flags);

Allocate rawSize + ptrsSize bytes of memory. Both sizes must be pointer-size-aligned. The first rawSize bytes are not necessarily zeroed. They GC will assume that they never contain pointers. The next ptrsSize bytes are treated as an array of "possible pointers" (jsval, really; see below).

flags must not contain GC_HAS_FINALIZER.

Returns a pointer to the first byte of the pointers region, so the non-pointer-containing words are at negative offsets and the pointer-containing words are at positive offsets.

void * gc_alloc_with_layout(
    GCHeap heap, GCLayout layout, int flags);

Allocate and zero out memory for an object. The layout specifies the size of the allocation and which words of it may contain pointers. Layout is described in more detail below.

void * gc_alloc_array_with_layout(
    GCHeap heap,
    GCLayout headerLayout,
    GCLayout elementLayout,
    size_t count,
    int flags);

Allocate and zero out memory for an array. The layout of the array consists of an optional header described by headerLayout, followed by count array elements, each described by elementLayout. Returns a pointer to the beginning of the allocation (the header).

Open issue: Reference counting (or DRC) and/or other nasty tricks to keep memory usage down.

Open issue: Finalizers.

Open issue: Weak references.

Explicit deallocation

void gc_free(GCHeap heap, void *ptr);

Frees the allocation at ptr, which must be a pointer returned by an allocation function. The region of memory at ptr must be live, but the caller must also ensure that no live, exactly-scanned words point to it. (Conservatively scanned words, including C/C++ temporaries and local variables, may point to it.)

This may be a no-op.

Layout

typedef ... GCLayout;

GCLayout gc_create_layout(GCHeap heap, size_t size,
                          unsigned char *bitmap);

Create a layout object which represents the layout of pointers within an object type.

bitmap is an array of bytes. Each byte represents one word of the target object (there must be size/sizeof(void *) entries)

0 - the word is not a pointer
1 - the word is a pointer, perhaps an interior pointer, and perhaps a pointer to an rcobject
2 - the word is a jsval
3 - the word is the possibly-pointer-containing half of a ttbox

3 only makes sense on 32-bit platforms for now.

Open issue: GCLayout lifetime - gc'd? or lifetime of heap?


Write barrier

template <class T, class U>
void gc_update(T **pField, U *newPtr);

(It's not decided whether we want C++ inline function templates, C macros, or what. This is just a sketch.)

Haven't discussed possibly having a different flavor of write barrier for jsval fields, etc.  :-P

(I'm calling it gc_update after what Jones and Lins call it in their book. Two very minor reasons to prefer this over gc_write_barrier: gc_update is less obscure for non-GC-experts; and it makes it clear that it actually performs the assignment, not just GC bookkeeping.)

Open issue: The Tamarin JIT might want some APIs that can help it generate fast inline machine code for pointer assignments (by inlining parts of the write barrier). At a minimum the JIT probably requires a wb function whose address can be taken. Not worrying about this yet.


GC

gc / maybe_gc / do_incremental_gc

Open issue: Hooks into the GC cycle.

Rooting

add root / remove root

Synchronization/concurrency

The request model, or something.

Open issue: This whole area.

(jorendorff note: SpiderMonkey has some pretty awesome hacks in the gc synchronization code, requiring equally awesome hacking in ActionMonkey's branch of MMgc. Maybe we could better divide the responsibilities. Discuss.)

Tracing

Some SpiderMonkey 1.8 JSAPI features, yet to be documented, need this. See mdc:JSAPI Reference#Memory_management.

bsmedberg says: most of the spidermonkey tracing APIs were designed for the cycle collector. I would dearly love to flush them down the drain once XPCOMGC lands.

The Memory API

This is a description of the private API that the allocator needs to provide to the GC engine which runs atop it. (The allocator also provides all the Allocation and Layout APIs, which are user-visible.)

/* APIs for marking */

/* Begin the process of mark/sweep: after this call, gc_is_marked should
   return GC_UNMARKED for all allocations */
void gc_begin_mark(gcheap *heap);

/* @param object must be the outermost pointer to the allocation */
void gc_mark_object(gcheap *heap, void *object);

/* Get the "next" object which has been marked but not yet processed.
   The GC engine needs to know about the object layout, but I don't know
   the best way to expose this (or whether the allocator ought to do that?) */
void * gc_get_next_unprocessed(gcheap *heap, some_kind_of_layout_object);

typedef enum gcmarkstate {
  GC_NOT_GCOBJECT, /* this pointer is not a gc allocation */
  GC_UNMARKED,
  GC_MARKED,     /* gc_mark_object, but not yet gc_get_next_unprocessed */
  GC_PROCESSED   /* processed by gc_get_next_unprocessed */
} gcmarkstate;

/* @param object must be an exterior pointer to an object in this heap */
gcmarkstate gc_get_markstate(gcheap *heap, void *object);

/* @param object inout - in is a conservative pointer
                 if the function returns something other than GC_NOT_GCOBJECT,
                 it will be set to the "outer" pointer */
gcmarkstate gc_conservative_beginning(gcheap *heap, void **object);

void gc_mark_stack(gcheap *heap);

/* Inform the allocator that we're done marking. The allocator will sweep
   all unmarked objects, firing finalizers where appropriate. */
gc_end_mark(gcheap *heap);

Things to be specified:

  • Synchronization
  • RCObjects - other than write barriers and a flag bit, I don't think the allocator itself needs to know anything about RCObjects