4.8. Profiling

EnTK can be configured to generate profiles by setting the environment variable RADICAL_ENTK_PROFILE=True. Profiles are generated per component and sub-component, and can be read and analyzed with RADICAL Analytics (RA).
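
As a minimal sketch, the variable can also be set from Python instead of the shell. Setting it before EnTK is imported is an assumption made here to be on the safe side; the text above only states that the variable must be set to True.

```python
import os

# Enable EnTK profiling for this process and any child processes.
# Assumption: the variable is set before radical.entk is imported.
os.environ['RADICAL_ENTK_PROFILE'] = 'True'

# import radical.entk as re   # import EnTK only after the variable is set

print(os.environ['RADICAL_ENTK_PROFILE'])
```

The shell equivalent is to export RADICAL_ENTK_PROFILE=True before launching the application.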

We describe profiling capabilities using RADICAL Analytics for EnTK via two examples that extract durations and timestamps.

The scripts and the data can be found in your virtualenv under share/radical.entk/analytics/scripts or can be downloaded via the following links:

  • Data: Link
  • Durations: Link
  • Timestamps: Link

Untar the data and run either of the scripts. We recommend following the inline comments and output messages to get an understanding of RADICAL Analytics’ usage for EnTK.
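
If you prefer to unpack the data from Python rather than with the tar command, a minimal sketch using the standard tarfile module could look as follows. The archive name entk_profiles.tar and the helper untar() are hypothetical; use the name of the file you actually downloaded.

```python
import os
import tarfile


def untar(archive, dest='.'):
    """Extract `archive` into `dest` and return the names of its members."""
    with tarfile.open(archive) as tar:
        names = tar.getnames()
        tar.extractall(path=dest)
    return names


if __name__ == '__main__':
    # Hypothetical file name: replace with the archive you downloaded.
    archive = 'entk_profiles.tar'
    if os.path.exists(archive):
        for name in untar(archive):
            print(name)
```

The extracted session directory is what the scripts below expect to find at the path stored in loc.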

More details on the capabilities of RADICAL Analytics can be found in its documentation.

Note

The current examples of RADICAL Analytics are configured for RADICAL Pilot but can be adapted to EnTK by:

  • Setting stype to 'radical.entk' when creating the RADICAL Analytics session
  • Following the state model, event model, and sequence diagram to determine the EnTK probes to use in profiling

4.8.1. Extracting durations

#!/usr/bin/env python
__copyright__ = 'Copyright 2013-2018, http://radical.rutgers.edu'
__license__ = 'MIT'


import os
import pprint
import radical.utils as ru
import radical.entk as re
import radical.analytics as ra

"""This example illustrates how to obtain durations for arbitrary (non-state)
profile events. Modified from examples under RADICAL Analytics"""

# ------------------------------------------------------------------------------
#
if __name__ == '__main__':

    loc = './re.session.two.vivek.017759.0012'
    src = os.path.dirname(loc)
    sid = os.path.basename(loc)
    session = ra.Session(src=src, sid=sid, stype='radical.entk')

    # A formatting helper before starting...
    def ppheader(message):
        separator = '\n' + 78 * '-' + '\n'
        print(separator + message + separator)

    # First we look at the *event* model of our session.  The event model is
    # usually less stringent than the state model: not all events will always be
    # available, events may have certain fields missing, they may be recorded
    # multiple times, their meaning may slightly differ, depending on the taken
    # code path.  But in general, these are the events available, and their
    # relative ordering.
    ppheader("event models")
    pprint.pprint(session.describe('event_model'))
    pprint.pprint(session.describe('statistics'))

    # Let's say that we want to see how long EnTK took to schedule, execute, and
    # process completed tasks.

    # We first filter our session to obtain only the task objects
    tasks = session.filter(etype='task', inplace=False)
    print('#tasks   : %d' % len(tasks.get()))

    # We use the 're.states.SCHEDULING' and 're.states.SUBMITTING' probes to find
    # the time taken by EnTK to create and submit all tasks for execution
    ppheader("Time spent to create and submit the tasks")
    duration = tasks.duration(event=[{ru.EVENT: 'state',
                                      ru.STATE: re.states.SCHEDULING},
                                     {ru.EVENT: 'state',
                                      ru.STATE: re.states.SUBMITTING}])
    print('duration : %.2f' % duration)

    # We use the 're.states.SUBMITTING' and 're.states.COMPLETED' probes to find
    # the time taken by EnTK to execute all tasks
    ppheader("Time spent to execute the tasks")
    duration = tasks.duration(event=[{ru.EVENT: 'state',
                                      ru.STATE: re.states.SUBMITTING},
                                     {ru.EVENT: 'state',
                                      ru.STATE: re.states.COMPLETED}])
    print('duration : %.2f' % duration)

    # We use the 're.states.COMPLETED' and 're.states.DONE' probes to find
    # the time taken by EnTK to process all executed tasks
    ppheader("Time spent to process executed tasks")
    duration = tasks.duration(event=[{ru.EVENT: 'state',
                                      ru.STATE: re.states.COMPLETED},
                                     {ru.EVENT: 'state',
                                      ru.STATE: re.states.DONE}])
    print('duration : %.2f' % duration)

    # Finally, we produce a list of the number of concurrent tasks between
    # states 're.states.SUBMITTING' and 're.states.COMPLETED' over the course
    # of the entire execution sampled every 10 seconds
    ppheader("concurrent tasks in between SUBMITTING and COMPLETED states")
    concurrency = tasks.concurrency(event=[{ru.EVENT: 'state',
                                            ru.STATE: re.states.SUBMITTING},
                                           {ru.EVENT: 'state',
                                            ru.STATE: re.states.COMPLETED}],
                                    sampling=10)
    pprint.pprint(concurrency)


# ------------------------------------------------------------------------------
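
To make explicit what the script above measures, the following self-contained sketch reproduces the two ideas without RADICAL Analytics. It assumes that a duration over many tasks is the total time covered by the union of the per-task intervals (overlapping intervals counted once), and that concurrency is the number of tasks inside their interval at each sampling point. This is our reading of the behaviour, not the library's implementation; the function names and timestamps are hypothetical.

```python
def total_duration(intervals):
    """Return the length of the union of (start, stop) intervals."""
    total = 0.0
    cur_start, cur_stop = None, None
    for start, stop in sorted(intervals):
        if cur_stop is None or start > cur_stop:
            # Disjoint from the current run: close it and start a new one.
            if cur_stop is not None:
                total += cur_stop - cur_start
            cur_start, cur_stop = start, stop
        else:
            # Overlaps the current run: extend it.
            cur_stop = max(cur_stop, stop)
    if cur_stop is not None:
        total += cur_stop - cur_start
    return total


def sampled_concurrency(intervals, sampling):
    """Count the intervals active at each sampling point (start <= t < stop)."""
    if not intervals:
        return []
    t_min = min(start for start, _ in intervals)
    t_max = max(stop for _, stop in intervals)
    steps = int((t_max - t_min) / sampling)
    samples = []
    for i in range(steps + 1):
        t = t_min + i * sampling
        active = sum(1 for start, stop in intervals if start <= t < stop)
        samples.append((t, active))
    return samples


if __name__ == '__main__':
    # Hypothetical (SUBMITTING, COMPLETED) timestamps for three tasks.
    intervals = [(0.0, 10.0), (5.0, 20.0), (30.0, 40.0)]
    print('duration    : %.2f' % total_duration(intervals))
    print('concurrency :', sampled_concurrency(intervals, sampling=10))
```

The gap between 20 and 30 seconds does not contribute to the duration, which is why this value can be smaller than the difference between the first and the last timestamp.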

4.8.2. Extracting timestamps

#!/usr/bin/env python

import os
import pprint
import radical.utils as ru
import radical.entk as re
import radical.analytics as ra

__copyright__ = 'Copyright 2013-2018, http://radical.rutgers.edu'
__license__ = 'MIT'

"""
This example illustrates the use of the method ra.Session.get().
Modified from examples under RADICAL Analytics
"""

# ------------------------------------------------------------------------------
#
if __name__ == '__main__':

    loc = './re.session.two.vivek.017759.0012'
    src = os.path.dirname(loc)
    sid = os.path.basename(loc)
    session = ra.Session(src=src, sid=sid, stype='radical.entk')

    # A formatting helper before starting...
    def ppheader(message):
        separator = '\n' + 78 * '-' + '\n'
        print(separator + message + separator)

    # and here we go. As seen in example 01, we use ra.Session.list() to get the
    # name of all the types of entity of the session.
    etypes = session.list('etype')
    pprint.pprint(etypes)

    # We limit ourselves to the type 'task'. We use the method
    # ra.Session.get() to get all the objects in our session with etype 'task':
    ppheader("properties of the entities with etype 'task'")
    tasks = session.get(etype='task')
    pprint.pprint(tasks)


    # Mmmm, still a bit too many entities. We limit our analysis to a single
    # task. We use ra.Session.get() to select all the objects in the
    # session with etype 'task' and uid 'task.0000' and return them into a
    # list:
    ppheader("properties of the entities with etype 'task' and uid 'task.0000'")
    task = session.get(etype='task', uid='task.0000')
    pprint.pprint(task)


    # We may want also to look into the states of this task:
    ppheader("states of the entities with uid 'task.0000'")
    states = task[0].states
    pprint.pprint(states)

    # and extract the state we need. For example, the state 'SCHEDULED', which
    # indicates that the task has been scheduled. To refer to the state
    # 'SCHEDULED', and to all the other states of EnTK, we use the
    # re.states.SCHEDULED property that guarantees type checking.
    ppheader("Properties of the state re.states.SCHEDULED of the entities with uid 'task.0000'")
    state = task[0].states[re.states.SCHEDULED]
    pprint.pprint(state)

    # Finally, we extract a property we need from this state. For example, the
    # timestamp of when the task was scheduled, i.e., the property 'time' of
    # the state SCHEDULED:
    ppheader("Property 'time' of the state re.states.SCHEDULED of the entities with uid 'task.0000'")
    timestamp = task[0].states[re.states.SCHEDULED][ru.TIME]
    pprint.pprint(timestamp)

    # ra.Session.get() can also be used to get all the entities in our
    # session that have a specific state. For example, the following gets all
    # the entities that have the state 'SCHEDULED':
    ppheader("Entities with state re.states.SCHEDULED")
    entities = session.get(state=re.states.SCHEDULED)
    pprint.pprint(entities)

    # We can then print the timestamp of the state 'SCHEDULED' for all the entities
    # having that state by using something like:
    ppheader("Timestamp of all the entities with state re.states.SCHEDULED")
    timestamps = [entity.states[re.states.SCHEDULED][ru.TIME] for entity in entities]
    pprint.pprint(timestamps)

    # We can also create tailored data structures for our analysis. For
    # example, using tuples to name entities, state, and timestamp:
    ppheader("Named entities with state re.states.SCHEDULED and its timestamp")
    named_timestamps = [(entity.uid,
                         entity.states[re.states.SCHEDULED][ru.STATE],
                         entity.states[re.states.SCHEDULED][ru.TIME])
                        for entity in entities]
    pprint.pprint(named_timestamps)
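
Once the timestamps are in a plain list of tuples, further processing needs only standard Python. As a small sketch, the snippet below sorts such tuples by time and expresses each timestamp relative to the earliest one. The input values and the helper relative_timestamps() are hypothetical and merely mimic the shape of named_timestamps above.

```python
def relative_timestamps(records):
    """Sort (uid, state, time) records by time, relative to the earliest."""
    if not records:
        return []
    ordered = sorted(records, key=lambda rec: rec[2])
    t0 = ordered[0][2]
    return [(uid, state, round(t - t0, 6)) for uid, state, t in ordered]


if __name__ == '__main__':
    # Hypothetical data shaped like `named_timestamps`.
    named_timestamps = [
        ('task.0002', 'SCHEDULED', 12.8),
        ('task.0000', 'SCHEDULED', 10.5),
        ('task.0001', 'SCHEDULED', 11.0),
    ]
    for record in relative_timestamps(named_timestamps):
        print(record)
```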