While reading the paper "Runtime Support for Multicore Haskell", I thought it would be fun to trace the work described in the paper through the current GHC source code. Doing this also helped me understand the paper better. I hope it will be useful to you as well.
I have taken quotations from the paper and then shared the corresponding source code. In this post I am focusing on the data structures:
1) Each Haskell thread runs on a finite-sized stack, which is allocated in the heap. The state of a thread, together with its stack, is kept in a heap-allocated thread state object (TSO). The size of a TSO is around 15 words plus the stack, and constitutes the whole of a Haskell Thread.
In ghc/includes/rts/storage/TSO.h:
typedef struct StgStack_ {
    StgHeader  header;
    StgWord32  stack_size;     // stack size in *words*
    StgWord32  dirty;          // non-zero => dirty
    StgPtr     sp;             // current stack pointer
    StgWord    stack[];
} StgStack;
Notice the StgWord stack[]; flexible array member :) struct StgStack is then used inside StgTSO:
typedef struct StgTSO_ {
    // .... more fields ...

    /*
     * The thread's stack
     */
    struct StgStack_ *stackobj;
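To see how the two pieces fit together, here is a small sketch (not GHC code: the helper name is invented, and it assumes the GHC RTS headers are on the include path) that follows a TSO's stackobj pointer to its heap-allocated stack:

#include "Rts.h"   /* StgTSO, StgStack -- GHC RTS headers, assumed available */

/* Hypothetical helper: how many words of stack does this thread currently own?
 * Every Haskell thread is an StgTSO; its stack is a separate heap object
 * (StgStack) reached through tso->stackobj. */
static StgWord32 threadStackWords(StgTSO *tso)
{
    StgStack *stack = tso->stackobj;   /* the thread's current stack object */
    return stack->stack_size;          /* size in words, per the field comment above */
}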
2) Haskell Execution Context (HEC) for each CPU. The HEC is a data structure that contains all the data that an OS worker thread requires in order to execute Haskell threads. In particular, a HEC contains ...
In ghc/rts/Capability.h:
struct Capability_ {
    // State required by the STG virtual machine when running Haskell
    // code. During STG execution, the BaseReg register always points
    // to the StgRegTable of the current Capability (&cap->r).
    StgFunTable f;
    StgRegTable r;

    // ... more fields ...

#if defined(THREADED_RTS)
    // Worker Tasks waiting in the wings. Singly-linked.
    Task *spare_workers;
    uint32_t n_spare_workers; // count of above

    // This lock protects:
    //   running_task
    //   returning_tasks_{hd,tl}
    //   wakeup_queue
    //   inbox
    //   putMVars
    Mutex lock;

    // Tasks waiting to return from a foreign call, or waiting to make
    // a new call-in using this Capability (NULL if empty).
    // NB. this field needs to be modified by tasks other than the
    // running_task, so it requires cap->lock to modify. A task can
    // check whether it is NULL without taking the lock, however.
    Task *returning_tasks_hd; // Singly-linked, with head/tail
    Task *returning_tasks_tl;
    uint32_t n_returning_tasks;

    // Messages, or END_TSO_QUEUE.
    // Locks required: cap->lock
    Message *inbox;

    // putMVars are really messages, but they're allocated with malloc() so they
    // can't go on the inbox queue: the GC would get confused.
    struct PutMVar_ *putMVars;

    SparkPool *sparks;

    // Stats on spark creation/conversion
    SparkCounters spark_stats;

#if !defined(mingw32_HOST_OS)
    // IO manager for this cap
    int io_manager_control_wr_fd;
#endif
#endif
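As a quick sketch of how an OS worker thread sees its HEC (the helper names below are invented for illustration, not GHC functions; they assume the RTS headers and a THREADED_RTS build):

#include "Rts.h"
#include "Capability.h"   /* struct Capability_, SparkPool -- assumed include paths */

/* Hypothetical helper: the register table that BaseReg points at while this
 * HEC is executing Haskell code (see the comment on the f/r fields above). */
static StgRegTable *hecRegTable(Capability *cap)
{
    return &cap->r;
}

/* Hypothetical helper: the HEC-local spark pool, i.e. the work-stealing
 * queue discussed in the next quotation. */
static SparkPool *hecSparkPool(Capability *cap)
{
    return cap->sparks;   /* only present in THREADED_RTS builds */
}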
3) To make spark distribution cheaper and more asynchronous we re-implemented each HEC’s Spark Pool as a bounded work-stealing queue (Arora et al. 1998; Chase and Lev 2005). A work-stealing queue is a lock-free data structure with some attractive properties: the owner of the queue can push and pop from one end without synchronisation, meanwhile other threads can “steal” from the other end of the queue incurring only a single atomic instruction. When the queue is almost empty, popping also incurs an atomic instruction to avoid a race between the popping thread and a stealing thread.
In ghc/rts/WSDeque.h:
 * Work-stealing Deque data structure
 *
 * ---------------------------------------------------------------------------*/

#ifndef WSDEQUE_H
#define WSDEQUE_H

typedef struct WSDeque_ {
    // Size of elements array. Used for modulo calculation: we round up
    // to powers of 2 and use the dyadic log (modulo == bitwise &)
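To make the properties in the quotation concrete, here is a minimal sketch of a bounded work-stealing deque using C11 atomics. It is illustrative only; the names and layout are my assumptions, not GHC's actual WSDeque implementation:

#include <stdatomic.h>
#include <stddef.h>

#define DEQUE_SIZE 4096                 /* power of two, so index mod == bitwise AND */

typedef struct {
    atomic_size_t top;                  /* the end thieves steal from        */
    atomic_size_t bottom;               /* the end the owner pushes/pops at  */
    void *elements[DEQUE_SIZE];
} Deque;

/* Owner-only push at the bottom end: no atomic read-modify-write needed. */
static int pushBottom(Deque *q, void *spark)
{
    size_t b = atomic_load_explicit(&q->bottom, memory_order_relaxed);
    size_t t = atomic_load_explicit(&q->top, memory_order_acquire);
    if (b - t >= DEQUE_SIZE)
        return 0;                       /* bounded queue is full             */
    q->elements[b & (DEQUE_SIZE - 1)] = spark;
    atomic_store_explicit(&q->bottom, b + 1, memory_order_release);
    return 1;
}

/* Thieves steal from the top end; a single compare-and-swap on `top`
 * resolves races with other thieves (and with the owner when only a
 * few elements remain). */
static void *steal(Deque *q)
{
    size_t t = atomic_load_explicit(&q->top, memory_order_acquire);
    size_t b = atomic_load_explicit(&q->bottom, memory_order_acquire);
    if (t >= b)
        return NULL;                    /* empty                             */
    void *spark = q->elements[t & (DEQUE_SIZE - 1)];
    if (!atomic_compare_exchange_strong(&q->top, &t, t + 1))
        return NULL;                    /* lost the race; caller may retry   */
    return spark;
}

/* The owner's popBottom (omitted here) is the mirror image of pushBottom and
 * is also synchronisation-free, except when the queue is almost empty: then
 * it too must CAS on `top`, which is exactly the extra atomic instruction
 * the paper mentions. */

GHC's real version is more careful about memory barriers and about how sparks are discarded when the bounded queue fills up; the sketch only shows why the owner's push needs no synchronisation while a steal costs a single compare-and-swap.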
4) The shared heap is divided into fixed-size (4kbyte) blocks, each with a block descriptor that specifies which generation it belongs to, along with other per-block information. A HEC’s Allocation Area simply consists of a list of such blocks.
In ghc/includes/rts/storage/Block.h:
/* The size of a block (2^BLOCK_SHIFT bytes) */
#define BLOCK_SHIFT  12

/* The size of a megablock (2^MBLOCK_SHIFT bytes) */
#define MBLOCK_SHIFT 20

#ifdef CMINUSMINUS
#define BLOCK_SIZE   (1<<BLOCK_SHIFT)
#else
#define BLOCK_SIZE   (UNIT<<BLOCK_SHIFT) // Note [integer overflow]
#endif

#ifdef CMINUSMINUS
#define MBLOCK_SIZE  (1<<MBLOCK_SHIFT)
#else
#define MBLOCK_SIZE  (UNIT<<MBLOCK_SHIFT)
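The arithmetic here is small but worth spelling out: BLOCK_SHIFT 12 gives 2^12 = 4096-byte blocks, and MBLOCK_SHIFT 20 gives 1 MB megablocks, i.e. 256 blocks per megablock. A tiny standalone sketch (the helper name is made up, not a GHC function; UNIT is a C-- detail, so plain 1UL is used here):

#include <stddef.h>
#include <stdio.h>

#define BLOCK_SHIFT   12
#define BLOCK_SIZE    (1UL << BLOCK_SHIFT)              /* 4096 bytes          */
#define MBLOCK_SHIFT  20
#define MBLOCK_SIZE   (1UL << MBLOCK_SHIFT)             /* 1 MB                */
#define BLOCK_SIZE_W  (BLOCK_SIZE / sizeof(void *))     /* block size in words */

/* Hypothetical helper: how many whole 4k blocks does a request of n words
 * need?  An allocation area made of a list of such blocks has to round up
 * like this when it grows. */
static size_t blocksForWords(size_t n_words)
{
    return (n_words + BLOCK_SIZE_W - 1) / BLOCK_SIZE_W;
}

int main(void)
{
    printf("block = %lu bytes, megablock = %lu bytes (%lu blocks)\n",
           BLOCK_SIZE, MBLOCK_SIZE, MBLOCK_SIZE / BLOCK_SIZE);
    printf("10000 words -> %zu blocks\n", blocksForWords(10000));
    return 0;
}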
Also, in ghc/rts/sm/Storage.c:
/* -----------------------------------------------------------------------------
   StgPtr allocate (Capability *cap, W_ n)

   Allocates an area of memory n *words* large, from the nursery of
   the supplied Capability, or from the global block pool if the area
   requested is larger than LARGE_OBJECT_THRESHOLD.  Memory is not
   allocated from the current nursery block, so as not to interfere
   with Hp/HpLim.

   The address of the allocated memory is returned. allocate() never
   fails; if it returns, the returned value is a valid address.  If
   the nursery is already full, then another block is allocated from
   the global block pool.  If we need to get memory from the OS and
   that operation fails, then the whole process will be killed.
   -------------------------------------------------------------------------- */
StgPtr allocate (Capability *cap, W_ n);
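As a usage sketch (the wrapper below is hypothetical, but allocate() itself is the entry point declared above): RTS code that needs heap memory asks its own Capability, so the common path takes no lock:

#include "Rts.h"   /* Capability, StgPtr, W_, allocate() -- GHC RTS headers */

/* Hypothetical wrapper: allocate n_words words from the current HEC's
 * allocation area.  allocate() never returns NULL; if the OS itself runs out
 * of memory the whole process is killed, as the comment above says. */
static StgPtr allocWords(Capability *cap, W_ n_words)
{
    return allocate(cap, n_words);
}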
In subsequent posts, I will talk more about the algorithms as well.