GHC code of Multicore Haskell

While reading the paper Runtime Support for Multicore Haskell, I thought it would be fun to trace the work described in the paper through the current source code. This also helped me understand the paper better. I hope it will be useful to you as well.

I have taken quotations from the paper and then shared the corresponding source code. In this blog, I am focusing on the data structures :

1) Each Haskell thread runs on a finite-sized stack, which is allocated in the heap. The state of a thread, together with its stack, is kept in a heap-allocated thread state object (TSO). The size of a TSO is around 15 words plus the stack, and constitutes the whole state of a Haskell thread.

In ghc/includes/rts/storage/TSO.h :
 
typedef struct StgStack_ {
    StgHeader  header;
    StgWord32  stack_size;     // stack size in *words*
    StgWord32  dirty;          // non-zero => dirty
    StgPtr     sp;             // current stack pointer
    StgWord    stack[];
} StgStack;

Notice the flexible array member StgWord stack[]; :) struct StgStack_ is used in StgTSO :

typedef struct StgTSO_ {
     // .... more fields ...
     /*
     * The thread's stack
     */
    struct StgStack_       *stackobj;

     // ... more fields ...
} *StgTSOPtr;

2) a Haskell Execution Context (HEC) for each CPU. The HEC is a data structure that contains all the data that an OS worker thread requires in order to execute Haskell threads. In particular, a HEC contains ...

In ghc/rts/Capability.h :

struct Capability_ {
    // State required by the STG virtual machine when running Haskell
    // code.  During STG execution, the BaseReg register always points
    // to the StgRegTable of the current Capability (&cap->r).
    StgFunTable f;
    StgRegTable r;

    // ... more fields ...

#if defined(THREADED_RTS)
    // Worker Tasks waiting in the wings.  Singly-linked.
    Task *spare_workers;
    uint32_t n_spare_workers; // count of above

    // This lock protects:
    //    running_task
    //    returning_tasks_{hd,tl}
    //    wakeup_queue
    //    inbox
    //    putMVars
    Mutex lock;

    // Tasks waiting to return from a foreign call, or waiting to make
    // a new call-in using this Capability (NULL if empty).
    // NB. this field needs to be modified by tasks other than the
    // running_task, so it requires cap->lock to modify.  A task can
    // check whether it is NULL without taking the lock, however.
    Task *returning_tasks_hd; // Singly-linked, with head/tail
    Task *returning_tasks_tl;
    uint32_t n_returning_tasks;

    // Messages, or END_TSO_QUEUE.
    // Locks required: cap->lock
    Message *inbox;

    // putMVars are really messages, but they're allocated with malloc() so they
    // can't go on the inbox queue: the GC would get confused.
    struct PutMVar_ *putMVars;

    SparkPool *sparks;

    // Stats on spark creation/conversion
    SparkCounters spark_stats;
#if !defined(mingw32_HOST_OS)
    // IO manager for this cap
    int io_manager_control_wr_fd;
#endif
#endif

    // ... more fields ...
};

3) To make spark distribution cheaper and more asynchronous we re-implemented each HEC’s Spark Pool as a bounded work-stealing queue (Arora et al. 1998; Chase and Lev 2005). A work-stealing queue is a lock-free data structure with some attractive properties: the owner of the queue can push and pop from one end without synchronisation, meanwhile other threads can “steal” from the other end of the queue incurring only a single atomic instruction. When the queue is almost empty, popping also incurs an atomic instruction to avoid a race between the popping thread and a stealing thread.

In ghc/rts/WSDeque.h :

/* -----------------------------------------------------------------------------
 * Work-stealing Deque data structure
 * ---------------------------------------------------------------------------*/

#ifndef WSDEQUE_H
#define WSDEQUE_H

typedef struct WSDeque_ {
    // Size of elements array. Used for modulo calculation: we round up
    // to powers of 2 and use the dyadic log (modulo == bitwise &)
    StgWord size;
    StgWord moduloSize; /* bitmask for modulo */

    // top, index where multiple readers steal() (protected by a cas)
    volatile StgWord top;

    // bottom, index of next free place where one writer can push
    // elements. This happens unsynchronised.
    volatile StgWord bottom;

    // stored elements are pointers to closures
    void ** elements;

    // ... more fields ...
} WSDeque;

4) The shared heap is divided into fixed-size (4kbyte) blocks, each with a block descriptor that specifies which generation it belongs to, along with other per-block information. A HEC’s Allocation Area simply consists of a list of such blocks.

In ghc/includes/rts/storage/Block.h :

/* The size of a block (2^BLOCK_SHIFT bytes) */
#define BLOCK_SHIFT  12

/* The size of a megablock (2^MBLOCK_SHIFT bytes) */
#define MBLOCK_SHIFT   20

#ifdef CMINUSMINUS
#define BLOCK_SIZE   (1<<BLOCK_SHIFT)
#else
#define BLOCK_SIZE   (UNIT<<BLOCK_SHIFT)
// Note [integer overflow]
#endif

#ifdef CMINUSMINUS
#define MBLOCK_SIZE    (1<<MBLOCK_SHIFT)
#else
#define MBLOCK_SIZE    (UNIT<<MBLOCK_SHIFT)
#endif

Also, in ghc/rts/sm/Storage.c
 
/* -----------------------------------------------------------------------------
   StgPtr allocate (Capability *cap, W_ n)

   Allocates an area of memory n *words* large, from the nursery of
   the supplied Capability, or from the global block pool if the area
   requested is larger than LARGE_OBJECT_THRESHOLD.  Memory is not
   allocated from the current nursery block, so as not to interfere
   with Hp/HpLim.

   The address of the allocated memory is returned. allocate() never
   fails; if it returns, the returned value is a valid address.  If
   the nursery is already full, then another block is allocated from
   the global block pool.  If we need to get memory from the OS and
   that operation fails, then the whole process will be killed.
   -------------------------------------------------------------------------- */

StgPtr allocate (Capability *cap, W_ n); 

In subsequent blogs, I will talk more about the algorithms as well.
