4.1. Getting Started¶
In this section we will run through the Ensemble Toolkit API. We will develop an example application consisting of a simple bag of Tasks.
Note
The reader is assumed to be familiar with the PST Model and to have read through the Introduction of Ensemble Toolkit.
Note
This chapter assumes that you have successfully installed Ensemble Toolkit; if not, see Installation.
You can download the complete code discussed in this section here, or find it in your virtualenv under share/radical.entk/user_guide/scripts.
4.1.1. Importing components from the Ensemble Toolkit Module¶
To create any application using Ensemble Toolkit, you need to import four components: Pipeline, Stage, Task, and AppManager. We have already discussed these components in the earlier sections.
from radical.entk import Pipeline, Stage, Task, AppManager
4.1.2. Creating the workflow¶
We first create Pipeline, Stage, and Task objects, then assign the ‘executable’ and ‘arguments’ for the Task. For this example, we create one Pipeline consisting of one Stage that contains one Task.
In the snippet below, we first create a Pipeline and then a Stage.
if __name__ == '__main__':

    # Create a Pipeline object
    p = Pipeline()

    # Create a Stage object
    s = Stage()
Next, we create a Task and assign its name, its executable, and the arguments of the executable.
    # Create a Task object
    t = Task()
    t.name = 'my-first-task'        # Assign a name to the task (optional, do not use ',' or '_')
    t.executable = '/bin/echo'      # Assign executable to the task
    t.arguments = ['Hello World']   # Assign arguments for the task executable
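To see what this Task describes, it can help to reproduce the equivalent command line directly with Python's subprocess module. This is only a local sketch of the command the Task will run; EnTK itself launches the executable through its own execution backend, not through subprocess.

```python
import subprocess

# Run the same command the Task describes: /bin/echo 'Hello World'
result = subprocess.run(['/bin/echo', 'Hello World'],
                        capture_output=True, text=True)

# This is the text that will end up in the task's stdout file
print(result.stdout.strip())   # Hello World
```

Each entry of t.arguments becomes one argument of the executable, so no shell quoting is needed: 'Hello World' is passed to /bin/echo as a single argument.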
Now we have a fully described Task, a Stage, and a Pipeline. We create our workflow by adding the Task to the Stage and adding the Stage to the Pipeline.
    # Add Task to the Stage
    s.add_tasks(t)

    # Add Stage to the Pipeline
    p.add_stages(s)
4.1.3. Creating the AppManager¶
Now that our workflow has been created, we need to specify where it is to be executed. For this example, we will simply execute the workflow locally. We create an AppManager object and describe a resource request for 1 core for 10 minutes on localhost, i.e., your local machine. We then assign the resource request description and the workflow to the AppManager and run our application.
    # Create Application Manager
    appman = AppManager(hostname=hostname, port=port, username=username,
                        password=password)

    # Create a dictionary to describe the three mandatory keys:
    # resource, walltime, and cpus
    # resource is 'local.localhost' to execute locally
    res_dict = {
        'resource': 'local.localhost',
        'walltime': 10,
        'cpus': 1
    }

    # Assign resource request description to the Application Manager
    appman.resource_desc = res_dict

    # Assign the workflow as a set or list of Pipelines to the Application Manager
    # Note: The list order is not guaranteed to be preserved
    appman.workflow = set([p])

    # Run the Application Manager
    appman.run()
Warning
If your system's default Python is Anaconda Python, please change the 'resource' value in the above code block to
'resource': 'local.localhost_anaconda',
To run the script, simply execute the following from the command line:
python get_started.py
Warning
The first run may fail for different reasons, most of them
related to setting up the execution environment or requesting the correct
resources. Upon failure, Python may incorrectly raise a
KeyboardInterrupt
exception. This may be confusing because it is reported even when
no keyboard interrupt has been issued. Currently, we have not found a way to
avoid raising that exception.
And that’s it! Those are all the steps in this example. You can generate more verbose output by setting the environment variables: `export RADICAL_LOG_TGT=radical.log; export RADICAL_LOG_LVL=DEBUG`.
After the execution of the example, you may want to check the output. Under your home folder, you will find a folder named radical.pilot.sandbox. In that folder, there will be a re.session.* folder and a ve.local.localhost folder. Inside re.session.*, there is a pilot.0000 folder, and in there a unit.000000 folder. In the unit folder, you will see several files, including unit.000000.out and unit.000000.err. unit.000000.out holds the messages written to standard output, and unit.000000.err holds the messages written to standard error. The unit.000000.out file should contain the Hello World message.
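The session, pilot, and unit directory names under the sandbox differ from run to run, so globbing for them is a convenient way to locate the output files. A small sketch, assuming the path layout described above:

```python
from pathlib import Path

# Look for task stdout files in the RADICAL-Pilot sandbox; the session,
# pilot, and unit directory names vary per run, so we glob for them.
sandbox = Path.home() / 'radical.pilot.sandbox'
out_files = sorted(sandbox.glob('re.session.*/pilot.*/unit.*/unit.*.out'))

if out_files:
    for f in out_files:
        print(f, '->', f.read_text().strip())
else:
    print('no session output found under', sandbox)
```

After a successful run of the example, one of the printed lines should end with the Hello World message.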
Let’s look at the complete code for this example:
#!/usr/bin/env python

from radical.entk import Pipeline, Stage, Task, AppManager
import os

# ------------------------------------------------------------------------------
# Set default verbosity
if os.environ.get('RADICAL_ENTK_VERBOSE') is None:
    os.environ['RADICAL_ENTK_REPORT'] = 'True'

# Description of how the RabbitMQ process is accessible
# No need to change/set any variables if you installed RabbitMQ as a system
# process. If you are running RabbitMQ under a docker container or another
# VM, set "RMQ_HOSTNAME" and "RMQ_PORT" in the session where you are running
# this script.
hostname = os.environ.get('RMQ_HOSTNAME', 'localhost')
port = int(os.environ.get('RMQ_PORT', 5672))
username = os.environ.get('RMQ_USERNAME')
password = os.environ.get('RMQ_PASSWORD')

if __name__ == '__main__':

    # Create a Pipeline object
    p = Pipeline()

    # Create a Stage object
    s = Stage()

    # Create a Task object
    t = Task()
    t.name = 'my-first-task'        # Assign a name to the task (optional, do not use ',' or '_')
    t.executable = '/bin/echo'      # Assign executable to the task
    t.arguments = ['Hello World']   # Assign arguments for the task executable

    # Add Task to the Stage
    s.add_tasks(t)

    # Add Stage to the Pipeline
    p.add_stages(s)

    # Create Application Manager
    appman = AppManager(hostname=hostname, port=port, username=username,
                        password=password)

    # Create a dictionary to describe the three mandatory keys:
    # resource, walltime, and cpus
    # resource is 'local.localhost' to execute locally
    res_dict = {
        'resource': 'local.localhost',
        'walltime': 10,
        'cpus': 1
    }

    # Assign resource request description to the Application Manager
    appman.resource_desc = res_dict

    # Assign the workflow as a set or list of Pipelines to the Application Manager
    # Note: The list order is not guaranteed to be preserved
    appman.workflow = set([p])

    # Run the Application Manager
    appman.run()