9.2.7. Task API

class radical.entk.Task(from_dict=None)[source]

A Task is an abstraction of a computational unit. In this case, a Task consists of its executable along with its required software environment, files to be staged as input and output.

At the user level a Task is to be populated by assigning attributes. Internally, an empty Task is created and population using the from_dict function. This is to avoid creating Tasks with new uid as tasks with new uid offset the uid count file in radical.utils and can potentially affect the profiling if not taken care.

_uids = []
_validate()[source]

Purpose: Validate that the state of the task is ‘DESCRIBED’ and that an executable has been specified for the task.

arguments

List of arguments to be supplied to the executable

Getter:returns the list of arguments of the current task
Setter:assigns a list of arguments to the current task
Arguments:list of strings
copy_input_data

Copies data (filenames in a list) from another task to a current task (or data staging area) before it starts.

The following is an example

t2.copy_input_data = ['$Pipeline_%s_Stage_%s_Task_%s/output.txt' %
                      (p.name, s1.name, t1.name)]
# output.txt is copied from a t1 task to a current task before it
# starts.
Getter:return the list of files
Setter:assign the list of files
Arguments:list of strings
copy_output_data

Copies data (filenames in a list) from a current task to another task (or data staging area) when a task is finished.

The following is an example

t.copy_output_data = [ 'results.txt > $SHARED/results.txt' ]
# results.txt is copied to a data staging area `$SHARED` when a task is
# finised.
Getter:return the list of files
Setter:assign the list of files
Arguments:list of strings
cpu_reqs

Purpose: The CPU requirements of the current Task.

The requirements are described in terms of the number of processes and threads to be run in this Task. The expected format is:

task.cpu_reqs = {'cpu_processes'    : X,
                 'cpu_process_type' : None/MPI,
                 'cpu_threads'      : Y,
                 'cpu_thread_type'  : None/OpenMP}

This description means that the Task is going to spawn X processes and Y threads per each of these processes to run on CPUs. Hence, the total number of cpus required by the Task is X*Y for all the processes and threads to execute concurrently. The same assumption is made in implementation and X*Y cpus are requested for this Task.

The default value is:

task.cpu_reqs = {'cpu_processes'    : 1,
                 'cpu_process_type' : None,
                 'cpu_threads'      : 1,
                 'cpu_thread_type'  : None}

This description requests 1 core and expected the executable to non-MPI and single threaded.

Getter:return the cpu requirement of the current Task
Setter:assign the cpu requirement of the current Task
Arguments:dict
download_output_data

Downloads data (filenames in a list) from a current task to a local client (e.g. laptop) when a task is finished.

The following is an example .. highlight:: python

 t.download_output_data = [ 'results.txt' ]
 # results.txt is transferred to a local client (e.g. laptop) when a
# current task finised.
Getter:return the list of files
Setter:assign the list of files
Arguments:list of strings
executable

A unix-based kernel to be executed

Getter:returns the executable of the current task
Setter:assigns the executable for the current task
Arguments:string
exit_code

Get the exit code for DONE tasks. 0 for successful, 1 for failed tasks.

Getter:return the exit code of the current task
from_dict(d)[source]

Create a Task from a dictionary. The change is in inplace.

Argument:python dictionary
Returns:None
gpu_reqs

Purpose: The GPU requirements of the current Task.

The requirements are described in terms of the number of processes and threads to be run in this Task. The expected format is:

task.gpu_reqs = {'gpu_processes'    : X,
                 'gpu_process_type' : None/MPI,
                 'gpu_threads'      : Y,
                 'gpu_thread_type'  : None/OpenMP/CUDA}

This description means that the Task is going to spawn X processes and Y threads per each of these processes to run on GPUs. Hence, the total number of gpus required by the Task is X*Y for all the processes and threads to execute concurrently. The same assumption is made in implementation and X*Y gpus are requested for this Task.

The default value is: .. highlight:: python .. code-block:: python

task.gpu_reqs = {‘gpu_processes’ : 0,
‘gpu_process_type’ : None, ‘gpu_threads’ : 0, ‘gpu_thread_type’ : None}

This description requests 0 gpus as not all machines have GPUs.

Getter:return the gpu requirement of the current Task
Setter:assign the gpu requirement of the current Task
Arguments:dict
lfs_per_process

Set the amount of local file-storage space required by the task

Symlinks data (filenames in a list) from another task to a current task (or data staging area) before it starts.

Getter:return the list of files
Setter:assign the list of files
Arguments:list of strings

Symlins data (filenames in a list) from a current task to another task (or data staging area) when a task is finished.

Getter:return the list of files
Setter:assign the list of files
Arguments:list of strings
luid

Unique ID of the current task (fully qualified).

example:
>>> task.luid
pipe.0001.stage.0004.task.0234
Getter:Returns the fully qualified uid of the current task
Type:String
move_input_data

Moves data (filenames in a list) from another task to a current task (or data staging area) before it starts.

Getter:return the list of files
Setter:assign the list of files
Arguments:list of strings
move_output_data

Moves data (filenames in a list) from a current task to another task (or data staging area) when a task is finished.

Getter:return the list of files
Setter:assign the list of files
Arguments:list of strings
name

Name of the task. Do not use a ‘,’ or ‘_’ in an object’s name.

Getter:Returns the name of the current task
Setter:Assigns the name of the current task
Type:String
parent_pipeline
Getter:Returns the pipeline this task belongs to
Setter:Assigns the pipeline uid this task belongs to
parent_stage
Getter:Returns the stage this task belongs to
Setter:Assigns the stage uid this task belongs to
path

Get the path of the task on the remote machine. Useful to reference files generated in the current task.

Getter:return the path of the current task
post_exec

List of commands to be executed post executable

Getter:return the list of commands
Setter:assign the list of commands
Arguments:list of strings
pre_exec

List of commands to be executed prior to the executable

Getter:return the list of commands
Setter:assign the list of commands
Arguments:list of strings
rts_uid

Unique RTS ID of the current task

Getter:Returns the RTS unique id of the current task
Type:String
sandbox

Sandbox the task is running in

Getter:returns the sandbox name
Setter:assigns the sandbox name
Arguments:string
stage_on_error

Allow to stage out data if task failed

Getter:Returns the value
Type:Boolean
state

Current state of the task

Getter:Returns the state of the current task
Type:String
state_history

Returns a list of the states obtained in temporal order

Returns:list
stderr

Name of the file to which stderr of task is to be written

Getter:return name of stderr file
Setter:assign name of stderr file
Arguments:str
stdout

Name of the file to which stdout of task is to be written

Getter:return name of stdout file
Setter:assign name of stdout file
Arguments:str
tag

WARNING: It will be deprecated.

tags

Set the tags for the task that can be used while scheduling by the RTS

Getter:return the tags of the current task
to_dict()[source]

Convert current Task into a dictionary

Returns:python dictionary
uid

Unique ID of the current task

Getter:Returns the unique id of the current task
Type:String
upload_input_data

Transfers data (filenames in a list) from a local client (e.g. laptop) to the location of the current task (or data staging area) before it starts.

Getter:return the list of files
Setter:assign the list of files
Arguments:list of strings