<h3>8 <span class="Heading">Background jobs using fork</span></h3>
<p>This chapter describes a way to use multi-processor or multi-core machines from within <strong class="pkg">GAP</strong>. In its current version the <strong class="pkg">GAP</strong> system is a single threaded and single process system. However, modern operating systems allow, via the <code class="code">fork</code> system call, to replicate a complete process on the same machine relatively efficiently. That is, at first after a <code class="code">fork</code> the two processes actually use the same physical memory such that not much copying needs to be done. The child process is in exactly the same state as the parent process, sharing open files, network connections and the complete status of the workspace. However, whenever a page of memory is written, it is then automatically copied using new, additional physical memory, such that it behaves like a completely separate process. This method is called <q>copy-on-write</q>.</p>
<p>Thus this is a method to parallelise certain computations. Note however, that from the point of time when the <code class="code">fork</code> has occurred, all further communication between the two processes has to be realised via pipes or even files.</p>
<p>The operations and methods described in this chapter help to use <strong class="pkg">GAP</strong> in this way and implement certain <q>skeletons</q> of parallel programming to make these readily available in <strong class="pkg">GAP</strong>. Note that this implementation has its severe limitations and should probably eventually be replaced by a proper multi-threaded version of <strong class="pkg">GAP</strong>.</p>
<p>This operation creates a background job using <code class="func">IO_fork</code> (<a href="chap3.html#X86C819F37D07ECF7"><span class="RefLink">3.2-19</span></a>) which starts up as an identical copy of the currently running <strong class="pkg">GAP</strong> process. In this child process the function <var class="Arg">fun</var> is called with the argument list <var class="Arg">args</var>. The third argument <var class="Arg">opt</var> must be a record for options. The operation returns either an object representing the background job or <code class="keyw">fail</code> if the startup did not work.</p>
<p>This operation automatically sets up two pipes for communication with the child process. This is in particular used to report the result of the function call to <var class="Arg">fun</var> back to the parent. However, if called without the option <code class="code">TerminateImmediately</code> (see below) the child process stays alive even after the completion of <var class="Arg">fun</var> and one can submit further argument lists for subsequent calls to <var class="Arg">fun</var>. Of course, these additional argument lists will have to be sent over a pipe to the child process. A special case is if the argument <var class="Arg">args</var> is equal to <code class="keyw">fail</code>, in this case the child process is started but does not automatically call <var class="Arg">fun</var> but rather waits in an idle state until an argument list is submitted via the pipe using the <code class="func">Submit</code> (<a href="chap8.html#X864492F37E858197"><span class="RefLink">8.1-6</span></a>) operation described below.</p>
<p>There are two components defined which can be bound in the options record <var class="Arg">opt</var>. One is <code class="code">TerminateImmediately</code>, if this is bound to <code class="keyw">true</code> then the child process immediately terminates after the function <var class="Arg">fun</var> returns its result. In this case, no pipe for communication from parent to child is created since it would never be used. Note that in this case one can still get the result of the function <var class="Arg">fun</var> using the <code class="func">Pickup</code> (<a href="chap8.html#X7F4B8B9078D0E18E"><span class="RefLink">8.1-5</span></a>) operation described below, even when the child has already terminated, since the result is first transmitted back to the parent before termination.</p>
<p>The following operations are available to deal with background job objects:</p>
<p>This operation checks whether or not the background job represented by the object <var class="Arg">job</var> has already finished the function call to its worker function and is now idle. If so, <code class="keyw">true</code> is returned. If it is still running and working on the worker function, <code class="keyw">false</code> is returned. If the background job has already terminated altogether, this operation returns <code class="keyw">fail</code>. Note that if a child process terminates automatically after the first completion of its worker function and sending the result, then the first call to <code class="func">IsIdle</code> after completion will return <code class="keyw">true</code> to indicate successful completion and all subsequent calls will return <code class="keyw">fail</code>.</p>
<p>This operation checks whether or not the background job represented by the object <var class="Arg">job</var> has already terminated. If so, <code class="keyw">true</code> is returned, if not, <code class="keyw">false</code> is returned.</p>
<div class="func"><table class="func" width="100%"><tr><td class="tdleft"><code class="func">‣ WaitUntilIdle</code>( <var class="Arg">job</var> )</td><td class="tdright">( operation )</td></tr></table></div>
<p>Returns: the result of the worker function or <code class="keyw">fail</code></p>
<p>This operation waits until the worker function of the background job <var class="Arg">job</var> has finished and the job is idle. It then returns the result of the worker function, which has automatically been transmitted to the parent process. If the child process has died before completion <code class="keyw">fail</code> is returned.</p>
<div class="func"><table class="func" width="100%"><tr><td class="tdleft"><code class="func">‣ Pickup</code>( <var class="Arg">job</var> )</td><td class="tdright">( operation )</td></tr></table></div>
<p>Returns: the result of the worker function or <code class="keyw">fail</code></p>
<p>This operation does the same as <code class="func">WaitUntilIdle</code> (<a href="chap8.html#X7C139805804E6FE1"><span class="RefLink">8.1-4</span></a>).</p>
<p>This submits another argument list <var class="Arg">args</var> for another call to the worker function in the background job <var class="Arg">job</var>. It is an error if either the background job has already terminated or if it is still busy working on the previous argument list. That is, one must only submit another argument in a situation when <code class="func">IsIdle</code> (<a href="chap8.html#X7B7D934583257B9A"><span class="RefLink">8.1-2</span></a>) would return <code class="keyw">true</code>. This is for example the case directly after a successful call to <code class="func">Pickup</code> (<a href="chap8.html#X7F4B8B9078D0E18E"><span class="RefLink">8.1-5</span></a>) or i <code class="func">WaitUntilIdle</code> (<a href="chap8.html#X7C139805804E6FE1"><span class="RefLink">8.1-4</span></a>) which did not return <code class="keyw">fail</code>, unless the background job was created with the <code class="code">TerminateImmediately</code> option set to <code class="keyw">true</code>.</p>
<p>This operation returns immediately after submission, when the new argument list has been sent to the child process through a pipe. In particular, it does not await completion of the worker function for the new argument list.</p>
<p>This kills the background job represented by the object <var class="Arg">job</var> with immediate effect. No more results can be expected from it. Note that unless one has created the background job with the <code class="code">TerminateImmediately</code> option set to <code class="keyw">true</code> one always has to call <code class="func">Kill</code> on a background job eventually for cleanup purposes. Otherwise, the background job and the connecting pipes remain alive until the parent <strong class="pkg">GAP</strong> process terminates.</p>
<div class="func"><table class="func" width="100%"><tr><td class="tdleft"><code class="func">‣ ParTakeFirstResultByFork</code>( <var class="Arg">jobs</var>, <var class="Arg">args</var>[, <var class="Arg">opt</var>] )</td><td class="tdright">( operation )</td></tr></table></div>
<p>Returns: a list of results or <code class="keyw">fail</code></p>
<p>The argument <var class="Arg">jobs</var> must be a list of <strong class="pkg">GAP</strong> functions and the argument <var class="Arg">args</var> a list of the same length containing argument lists with which the job functions can be called. This operation starts up a background job using <codeclass="code">fork</code> for each of the functions in <var class="Arg">jobs</var>, calls it with the corresponding argument list in <var class="Arg">args</var>. As soon as any of the background jobs finishes with a result, <code class="func">ParTakeFirstResultByFork</code> terminates all other jobs and reports the results found so far. Note that it can happen that two jobs finish <q>at the same time</q> in the sense that both results are received before all other jobs could be terminated. Therefore the result of <code class="func">ParTakeFirstResultByFork</code> is a list, in which position <span class="SimpleMath">i</span> is bound if and only if job number <span class="SimpleMath">i</span> returned a result. So in the result at least one entry is bound but it is possible that more than one entry is bound.</p>
<p>You can specify an overall timeout to give up the whole computation if no job finishes by setting the <code class="code">TimeOut</code> component of the options record <var class="Arg">opt</var>. In this case you have to set it to a record with two components <code class="code">tv_sec</code> and <code class="code">tv_usec</code> which are seconds and microseconds respectively, exactly as returned by the <code class="func">IO_gettimeofday</code> (<a href="chap3.html#X7BC965198011083B"><span class="RefLink">3.2-29</span></a>) function. In the case of timeout an empty list is returned.</p>
<div class="func"><table class="func" width="100%"><tr><td class="tdleft"><code class="func">‣ ParDoByFork</code>( <var class="Arg">jobs</var>, <var class="Arg">args</var>[, <var class="Arg">opt</var>] )</td><td class="tdright">( operation )</td></tr></table></div>
<p>Returns: a list of results or <code class="keyw">fail</code></p>
<p>The argument <var class="Arg">jobs</var> must be a list of <strong class="pkg">GAP</strong> functions and the argument <var class="Arg">args</var> a list of the same length containing argument lists with which the job functions can be called. This operation starts up a background job using <codeclass="code">fork</code> for each of the functions in <var class="Arg">jobs</var>, calls it with the corresponding argument list in <var class="Arg">args</var>. As soon as all of the background jobs finish with a result, <code class="func">ParDoByFork</code> reports the results found. Therefore the result of <code class="func">ParDoByFork</code> is a list, in which position <span class="SimpleMath">i</span> is bound to the result that job number <span class="SimpleMath">i</span> returned.</p>
<p>You can specify an overall timeout to stop the whole computation if not all jobs finish in time by setting the <code class="code">TimeOut</code> component of the options record <var class="Arg">opt</var>. In this case you have to set it to a record with two components <code class="code">tv_sec</code> and <code class="code">tv_usec</code> which are seconds and microseconds respectively, exactly as returned by the <code class="func">IO_gettimeofday</code> (<a href="chap3.html#X7BC965198011083B"><span class="RefLink">3.2-29</span></a>) function. In the case of timeout a list is returned in which the positions corresponding to those jobs that have already finished are bound to the respective results and the other positions are unbound.</p>
<div class="func"><table class="func" width="100%"><tr><td class="tdleft"><code class="func">‣ ParListByFork</code>( <var class="Arg">l</var>, <var class="Arg">worker</var>[, <var class="Arg">opt</var>] )</td><td class="tdright">( operation )</td></tr></table></div>
<p>Returns: a list of results or <code class="keyw">fail</code></p>
<p>This is a parallel version of the <code class="func">List</code> (<a href="../../../doc/ref/chap21_mj.html#X7C3DC8BE78DEECDE"><span class="RefLink">Reference: list and non-list difference</span></a>) function. It applies the function <var class="Arg">worker</var> to all elements of the list <var class="Arg">l</var> and returns a list containing the results in corresponding positions. You have to specify the component <code class="code">NumberJobs</code> in the options record <var class="Arg">opt</var> which indicates how many background processes to start. You can optionally use the <code class="code">TimeOut</code> option exactly as for <code class="func">ParDoByFork</code> (<a href="chap8.html#X7A1F40D1841C36D2"><span class="RefLink">8.2-2</span></a>), however, if a timeout occurs, <code class="func">ParListByFork</code> returns <code class="keyw">fail</code>.</p>
<p>Note that the usefulness of this operation is relatively limited, since every individual result has to be sent back over a pipe from the child process to the parent process. Therefore this only makes sense if the computation time for the worker function dominates the communication time.</p>
<div class="func"><table class="func" width="100%"><tr><td class="tdleft"><code class="func">‣ ParMapReduceByFork</code>( <var class="Arg">l</var>, <var class="Arg">map</var>, <var class="Arg">reduce</var>[, <var class="Arg">opt</var>] )</td><td class="tdright">( operation )</td></tr></table></div>
<p>Returns: a value or <code class="keyw">fail</code></p>
<p>This is a parallel version implementation of the classical <code class="code">MapReduce</code> pattern. It applies the function <var class="Arg">map</var> to all elements of the list <var class="Arg">l</var> and then reduces the result using the <var class="Arg">reduce</var> function which accepts two return values of <var class="Arg">map</var> and returns one of them. Thus, the final result is one return value or <code class="keyw">fail</code> if the startup of the jobs fails. You have to specify the component <code class="code">NumberJobs</code> in the options record <var class="Arg">opt</var> which indicates how many background processes to start. You can optionally use the <code class="code">TimeOut</code> option exactly as for <code class="func">ParDoByFork</code> (<a href="chap8.html#X7A1F40D1841C36D2"><span class="RefLink">8.2-2</span></a>), however, if a timeout occurs, <code class="func">ParMapReduceByFork</code> returns <code class="keyw">fail</code>.</p>
<p>Note that this can be very useful because quite often the cumulated computation time for all the worker function calls dominates the communication time for a single result.</p>
<div class="func"><table class="func" width="100%"><tr><td class="tdleft"><code class="func">‣ IO_CallWithTimeout</code>( <var class="Arg">timeout</var>, <var class="Arg">func</var>, <var class="Arg">...</var> )</td><td class="tdright">( function )</td></tr></table></div>
<div class="func"><table class="func" width="100%"><tr><td class="tdleft"><code class="func">‣ IO_CallWithTimeoutList</code>( <var class="Arg">timeout</var>, <var class="Arg">func</var>, <var class="Arg">arglist</var> )</td><td class="tdright">( function )</td></tr></table></div>
<p><code class="code">IO_CallWithTimeout</code> and <code class="code">IO_CallWithTimeoutList</code> allow calling a function with a limit on length of time it will run. The function is run inside a copy of the current GAP session, so any changes it makes to global variables are thrown away when the function finishes or times out. The return value of <var class="Arg">func</var> is passed back to the current GAP session via <code class="code">IO_Pickle</code>. Note that <code class="code">IO_Pickle</code> may not be available for all objects.</p>
<p><code class="code">IO_CallWithTimeout</code> is variadic. Any arguments to it beyond the first two are passed as arguments to <var class="Arg">func</var>. <code class="code">IO_CallWithTimeoutList</code> in contrast takes exactly three arguments, of which the third is a list (possibly empty) of arguments to pass to <var class="Arg">func</var>.</p>
<p>If the call completes within the allotted time and returns a value <code class="code">res</code>, the result of <code class="code">IO_CallWithTimeout[List]</code> is a list of the form <code class="code">[ true, res ]</code>.</p>
<p>If the call completes within the allotted time and returns no value, the result of <code class="code">IO_CallWithTimeout[List]</code> is the list <code class="code">[ true ]</code>.</p>
<p>If the call does not complete within the timeout, the result of <code class="code">IO_CallWithTimeout[List]</code> is the list <code class="code">[ false ]</code>. If the call causes GAP to crash or exit, the result is the list <code class="code">[ fail ]</code>.</p>
<p>The timer is suspended during execution of a break loop and abandoned when you quit from a break loop.</p>
<p>The limit <var class="Arg">timeout</var> is specified as a record. At present the following components are recognised <code class="code">nanoseconds</code>, <code class="code">microseconds</code>, <code class="code">milliseconds</code>, <code class="code">seconds</code>, <code class="code">minutes</code>, <code class="code">hours</code>, <code class="code">days</code> and <code class="code">weeks</code>. Any of these components which is present should be bound to a positive integer, rational or float and the times represented are totalled to give the actual timeout. As a shorthand, a single positive integers may be supplied, and is taken as a number of microseconds. Further components are permitted and ignored, to allow for future functionality.</p>
<p>The precision of the timeouts is not guaranteed, and there is a system dependent upper limit on the timeout which is typically about 8 years on 32 bit systems and about 30 billion years on 64 bit systems. Timeouts longer than this will be reduced to this limit.</p>
<p>Note that the next parallel skeleton is a worker farm which is described in the following section.</p>
<p>The parallel skeleton of a worker farm is basically nothing but a bunch of background jobs all with the same worker function and all eagerly waiting for work. The only additional concepts needed are an input and an output queue. The input queue contains argument lists and the output queue pairs of argument lists and results.</p>
<p>One creates a worker farm with the following operation:</p>
<div class="func"><table class="func" width="100%"><tr><td class="tdleft"><code class="func">‣ ParWorkerFarmByFork</code>( <var class="Arg">fun</var>, <var class="Arg">opt</var> )</td><td class="tdright">( operation )</td></tr></table></div>
<p>Returns: an object representing the worker farm or <code class="keyw">fail</code></p>
<p>This operation creates a worker farm with the worker function <var class="Arg">fun</var> and sets up its input and output queue. An object representing the farm is returned unless not all jobs could be started up in which case <code class="keyw">fail</code> is returned. After startup all background jobs in the farm are idle. The only valid option in the options record <var class="Arg">opt</var> is <code class="code">NumberJobs</code> and it must be bound to the number of worker jobs in the farm, a positive integer.</p>
<p>The following operations are for worker farm objects:</p>
<p>This operation called on a worker farm object <var class="Arg">wf</var> administrates the inputand output queues of the worker farm. In particular it checks whether new results are available from the workers and if so it appends them to the output queue. If jobs are idle and the input queue is non-empty, argument lists from the input queue are sent to the idle jobs and removed from the input queue.</p>
<p>This operation must be called regularly to keep up the communication with the clients. It uses <code class="code">select</code> and so does not block if the boolean argument <var class="Arg">block</var> is set to <code class="keyw">false</code>. However, if larger chunks of data has to be sent or received this operation might need some time to return.</p>
<p>If the boolean argument <var class="Arg">block</var> is set to true then the <code class="func">DoQueues</code> blocks until at least one job has returned a result. This can be used to wait for the termination of all tasks without burning CPU cycles in the parent job. One would repeatedly call <code class="func">DoQueues</code> with <var class="Arg">block</var> set to <code class="keyw">true</code> and after each such call check with <code class="func">IsIdle</code> (<a href="chap8.html#X7C4E3D5A7FE617FE"><span class="RefLink">8.3-4</span></a>) whether all tasks are done. Note that one should no longer call <code class="func">DoQueues</code> with <var class="Arg">block</var> set to <code class="keyw">true</code> once this is the case since then it would block forever.</p>
<p>This operation is called automatically by most of the following operations.</p>
<p>This operation terminates all background jobs in the farm <var class="Arg">wf</var>, which cannot be used subsequently. One should always call this operation when the worker farm is no longer needed to free resources.</p>
<p>This operation returns <code class="keyw">true</code> if all background jobs in the worker farm <var class="Arg">wf</var> are idle. This means, that all tasks which have previously been submitted using <code class="func">Submit</code> (<a href="chap8.html#X81773CEC8246EDF3"><span class="RefLink">8.3-5</span></a>) have been completed and their result been appended to the output queue. The operation <code class="func">DoQueues</code> (<a href="chap8.html#X7ED2CE687EA7FC66"><span class="RefLink">8.3-2</span></a>) is automatically called before the execution of <code class="func">IsIdle</code>.</p>
<p>This operation submits a task in the form of an argument list for the worker function to the worker farm. It is appended at the end of the input queue. The operation <code class="func">DoQueues</code> (<a href="chap8.html#X7ED2CE687EA7FC66"><span class="RefLink">8.3-2</span></a>) is automatically called after the execution of <code class="func">Submit</code>, giving the farm a chance to actually send the work out to the worker background jobs.</p>
<p>This operation collects all results from the output queue of the worker farm. The output queue is empty after this function returns. The results are reported as a list of pairs, each pair has the input argument list as first component and the outputobject as second component.</p>
<p>The operation <code class="func">DoQueues</code> (<a href="chap8.html#X7ED2CE687EA7FC66"><span class="RefLink">8.3-2</span></a>) is automatically called before the execution of <code class="func">Pickup</code>, giving the farm a chance to actually receive some more results from the worker background jobs.</p>
Die Informationen auf dieser Webseite wurden
nach bestem Wissen sorgfältig zusammengestellt. Es wird jedoch weder Vollständigkeit, noch Richtigkeit,
noch Qualität der bereit gestellten Informationen zugesichert.
Bemerkung:
Die farbliche Syntaxdarstellung ist noch experimentell.