Profiling

EnTK can be configured to generate profiles by setting the environment variable RADICAL_ENTK_PROFILE=True. Profiles are generated per component and sub-component, and can be read and analyzed with RADICAL Analytics (RA).
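
A minimal sketch of turning profiling on from Python when launching a workflow. It assumes EnTK reads RADICAL_ENTK_PROFILE from its process environment, and the helper names (profiling_env, run_with_profiling) are hypothetical. Setting the variable in a copy of the environment leaves the calling process untouched:

```python
import os
import subprocess
import sys


# Hypothetical helpers; assumes EnTK reads RADICAL_ENTK_PROFILE from its environment.
def profiling_env(base=None):
    """Return a copy of `base` (default: os.environ) with EnTK profiling enabled."""
    env = dict(os.environ if base is None else base)
    env['RADICAL_ENTK_PROFILE'] = 'True'
    return env


def run_with_profiling(script):
    """Run `script` with the current interpreter and profiling switched on."""
    return subprocess.run([sys.executable, script], env=profiling_env(), check=True)
```

For example, run_with_profiling('my_workflow.py') would run a (hypothetical) EnTK application script with profiling enabled; exporting the variable in your shell before starting the script has the same effect.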

We describe how to profile EnTK with RADICAL Analytics through two examples: one extracts durations and the other extracts timestamps.

The scripts and data are available in your virtualenv under share/radical.entk/analytics/scripts, or can be downloaded from the following links:

  • Data: Link

  • Durations: Link

  • Timestamps: Link
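
If EnTK is installed in a virtualenv, the shipped scripts can be listed from Python. This sketch assumes the share/ directory sits under the environment prefix (sys.prefix inside an activated virtualenv) and that the scripts are plain .py files; the helper name is hypothetical:

```python
import glob
import os
import sys


# Hypothetical helper; assumes the scripts live under <prefix>/share/... as *.py files.
def entk_analytics_scripts(prefix=sys.prefix):
    """Return the example analytics scripts found under `prefix`, sorted by path."""
    script_dir = os.path.join(prefix, 'share', 'radical.entk', 'analytics', 'scripts')
    return sorted(glob.glob(os.path.join(script_dir, '*.py')))


# Prints nothing if the scripts are not installed in the current environment.
for path in entk_analytics_scripts():
    print(path)
```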

Untar the data and run either script. We recommend following the inline comments and output messages to understand how RADICAL Analytics is used with EnTK.
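
The untar step can also be done from Python. A minimal sketch, assuming the downloaded archive is a (possibly compressed) tarball whose top-level entries include EnTK session directories named re.session.*; the function name and the archive name data.tar.gz in the usage note are hypothetical:

```python
import os
import tarfile


# Hypothetical helper; assumes session directories are named 're.session.*'.
def extract_sessions(archive, dest='.'):
    """Unpack `archive` into `dest` and return the EnTK session directories in it."""
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(archive) as tar:           # compression is auto-detected
        if hasattr(tarfile, 'data_filter'):       # Python versions with extraction filters
            tar.extractall(dest, filter='data')   # reject members escaping `dest`
        else:
            tar.extractall(dest)
    return sorted(d for d in os.listdir(dest)
                  if d.startswith('re.session.')
                  and os.path.isdir(os.path.join(dest, d)))
```

For example, extract_sessions('data.tar.gz') would unpack the archive into the current directory and return the session names, which can then be used as the sid in the scripts.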

More details on the capabilities of RADICAL Analytics can be found in its documentation.

Note

The current examples of RADICAL Analytics are configured for RADICAL-Pilot, but they can be adapted to EnTK by:

  • Setting stype to ‘radical.entk’ when creating the RADICAL Analytics session.

  • Following the EnTK state model, event model, and sequence diagram to determine which EnTK probes to use for profiling.
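
The first point amounts to passing stype='radical.entk' when the session is created. A small sketch of a helper (hypothetical, not part of RADICAL Analytics) that turns a session directory path into the src, sid, and stype keyword arguments that both scripts in this section pass to ra.Session:

```python
import os


# Hypothetical helper, not part of RADICAL Analytics.
def session_kwargs(path, stype='radical.entk'):
    """Split a session directory path into keyword arguments for ra.Session.

    `stype` selects the profile type: 'radical.entk' for EnTK sessions,
    as opposed to the RADICAL-Pilot setting used by the RA examples.
    """
    path = os.path.abspath(path)          # normalises './x' and strips a trailing '/'
    return {'src': os.path.dirname(path),
            'sid': os.path.basename(path),
            'stype': stype}
```

With it, ra.Session(**session_kwargs('./re.session.two.vivek.017759.0012')) is equivalent to the lines in the scripts that compute src and sid and create the session.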

Extracting durations

#!/usr/bin/env python

"""This example illustrates how to obtain durations between profile events
(here, EnTK task state transitions). Modified from examples under RADICAL
Analytics."""

__copyright__ = 'Copyright 2013-2018, http://radical.rutgers.edu'
__license__ = 'MIT'

import os
import pprint

import radical.utils as ru
import radical.entk as re
import radical.analytics as ra

# ------------------------------------------------------------------------------
#
if __name__ == '__main__':

    # Path to the untarred EnTK session directory
    loc = './re.session.two.vivek.017759.0012'
    src = os.path.dirname(loc)
    sid = os.path.basename(loc)
    session = ra.Session(src=src, sid=sid, stype='radical.entk')

    # A formatting helper before starting...
    def ppheader(message):
        separator = '\n' + 78 * '-' + '\n'
        print(separator + message + separator)

    # First we look at the *event* model of our session.  The event model is
    # usually less stringent than the state model: not all events will always be
    # available, events may have certain fields missing, they may be recorded
    # multiple times, and their meaning may differ slightly depending on the
    # code path taken.  In general, though, these are the available events and
    # their relative ordering.
    ppheader("event models")
    pprint.pprint(session.describe('event_model'))
    pprint.pprint(session.describe('statistics'))

    # Let's say that we want to see how long EnTK took to schedule, execute, and
    # process completed tasks.

    # We first filter our session to obtain only the task objects
    tasks = session.filter(etype='task', inplace=False)
    print('#tasks   : %d' % len(tasks.get()))

    # We use the 're.states.SCHEDULING' and 're.states.SUBMITTING' probes to find
    # the time taken by EnTK to create and submit all tasks for execution
    ppheader("Time spent to create and submit the tasks")
    duration = tasks.duration(event=[{ru.EVENT: 'state',
                                      ru.STATE: re.states.SCHEDULING},
                                     {ru.EVENT: 'state',
                                      ru.STATE: re.states.SUBMITTING}])
    print('duration : %.2f' % duration)

    # We use the 're.states.SUBMITTING' and 're.states.COMPLETED' probes to find
    # the time taken by EnTK to execute all tasks
    ppheader("Time spent to execute the tasks")
    duration = tasks.duration(event=[{ru.EVENT: 'state',
                                      ru.STATE: re.states.SUBMITTING},
                                     {ru.EVENT: 'state',
                                      ru.STATE: re.states.COMPLETED}])
    print('duration : %.2f' % duration)

    # We use the 're.states.COMPLETED' and 're.states.DONE' probes to find
    # the time taken by EnTK to process all executed tasks
    ppheader("Time spent to process executed tasks")
    duration = tasks.duration(event=[{ru.EVENT: 'state',
                                      ru.STATE: re.states.COMPLETED},
                                     {ru.EVENT: 'state',
                                      ru.STATE: re.states.DONE}])
    print('duration : %.2f' % duration)

    # Finally, we produce a list of the number of concurrent tasks between
    # states 're.states.SUBMITTING' and 're.states.COMPLETED' over the course
    # of the entire execution, sampled every 10 seconds
    ppheader("concurrent tasks between the SUBMITTING and COMPLETED states")
    concurrency = tasks.concurrency(event=[{ru.EVENT: 'state',
                                            ru.STATE: re.states.SUBMITTING},
                                           {ru.EVENT: 'state',
                                            ru.STATE: re.states.COMPLETED}],
                                    sampling=10)
    pprint.pprint(concurrency)


# ------------------------------------------------------------------------------
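
The values printed by duration() are easier to interpret with a mental model of how per-task intervals combine. As far as we understand RADICAL Analytics, a duration over many entities is the time covered by the union of their per-entity ranges, so overlapping tasks are not counted twice. The pure-Python sketch below illustrates that idea on made-up timestamps; it is not RA's implementation, and collapsed_duration is a hypothetical name:

```python
# Hypothetical illustration of a duration over overlapping ranges (not RA code).
def collapsed_duration(ranges):
    """Total time covered by the union of (start, end) ranges."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(ranges):
        if cur_end is None or start > cur_end:   # disjoint: close the previous range
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:                                    # overlapping: extend the current range
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Invented SUBMITTING -> COMPLETED timestamps (in seconds) for three tasks
print(collapsed_duration([(0.0, 4.0), (1.0, 5.0), (10.0, 12.0)]))  # 7.0
```

Summing the three ranges independently would give 10.0 seconds; collapsing the overlap between the first two tasks yields 7.0.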

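Similarly, the concurrency() call in the durations script samples how many tasks sit between the two events at regular intervals. A simplified, self-contained sketch of that computation on invented timestamps; sampled_concurrency is a hypothetical name, and counting a task as active exactly at its end time is a choice made here, not necessarily RA's:

```python
# Hypothetical illustration of sampled concurrency (not RA code).
def sampled_concurrency(ranges, sampling):
    """Return (time, count) pairs: how many (start, end) ranges are active at
    each sampling point between the earliest start and the latest end."""
    if not ranges:
        return []
    t0 = min(start for start, _ in ranges)
    t1 = max(end for _, end in ranges)
    samples = []
    # Compute each sample time from its index to avoid accumulating
    # floating-point error over long runs.
    for i in range(int((t1 - t0) // sampling) + 1):
        t = t0 + i * sampling
        active = sum(1 for start, end in ranges if start <= t <= end)
        samples.append((t, active))
    return samples


# Invented SUBMITTING -> COMPLETED timestamps (in seconds) for three tasks
print(sampled_concurrency([(0.0, 4.0), (1.0, 5.0), (10.0, 12.0)], sampling=2))
# [(0.0, 1), (2.0, 2), (4.0, 2), (6.0, 0), (8.0, 0), (10.0, 1), (12.0, 1)]
```
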
Extracting timestamps

#!/usr/bin/env python

"""
This example illustrates the use of the method ra.Session.get().
Modified from examples under RADICAL Analytics.
"""

__copyright__ = 'Copyright 2013-2018, http://radical.rutgers.edu'
__license__ = 'MIT'

import os
import pprint

import radical.utils as ru
import radical.entk as re
import radical.analytics as ra

# ------------------------------------------------------------------------------
#
if __name__ == '__main__':

    # Path to the untarred EnTK session directory
    loc = './re.session.two.vivek.017759.0012'
    src = os.path.dirname(loc)
    sid = os.path.basename(loc)
    session = ra.Session(src=src, sid=sid, stype='radical.entk')

    # A formatting helper before starting...
    def ppheader(message):
        separator = '\n' + 78 * '-' + '\n'
        print(separator + message + separator)

    # Here we go. We use ra.Session.list() to get the names of all the entity
    # types in the session.
    etypes = session.list('etype')
    pprint.pprint(etypes)

    # We limit ourselves to the type 'task'. We use the method
    # ra.Session.get() to get all the objects in our session with etype 'task':
    ppheader("properties of the entities with etype 'task'")
    tasks = session.get(etype='task')
    pprint.pprint(tasks)


    # That is still a lot of entities, so we limit our analysis to a single
    # task. We use ra.Session.get() to select all the objects in the session
    # with etype 'task' and uid 'task.0000' and return them as a list:
    ppheader("properties of the entities with etype 'task' and uid 'task.0000'")
    task = session.get(etype='task', uid='task.0000')
    pprint.pprint(task)


    # We may also want to look into the states of this task:
    ppheader("states of the entities with uid 'task.0000'")
    states = task[0].states
    pprint.pprint(states)

    # and extract the state we need. For example, the state 'SCHEDULED'
    # indicates that the task has been scheduled. To refer to 'SCHEDULED', and
    # to all the other EnTK states, we use constants such as
    # re.states.SCHEDULED instead of bare strings, which avoids typos in
    # state names.
    ppheader("Properties of the state re.states.SCHEDULED of the entities with uid 'task.0000'")
    state = task[0].states[re.states.SCHEDULED]
    pprint.pprint(state)

    # Finally, we extract a property we need from this state. For example, the
    # timestamp of when the task was scheduled, i.e., the property 'time' of
    # the state SCHEDULED:
    ppheader("Property 'time' of the state re.states.SCHEDULED of the entities with uid 'task.0000'")
    timestamp = task[0].states[re.states.SCHEDULED][ru.TIME]
    pprint.pprint(timestamp)

    # ra.Session.get() can also be used to get all the entities in our
    # session that have a specific state. For example, the following gets all
    # the entities that have the state 'SCHEDULED':
    ppheader("Entities with state re.states.SCHEDULED")
    entities = session.get(state=re.states.SCHEDULED)
    pprint.pprint(entities)

    # We can then print the timestamp of the state 'SCHEDULED' for all the
    # entities having that state by using something like:
    ppheader("Timestamp of all the entities with state re.states.SCHEDULED")
    timestamps = [entity.states[re.states.SCHEDULED][ru.TIME]
                  for entity in entities]
    pprint.pprint(timestamps)

    # We can also create tailored data structures for our analysis. For
    # example, using tuples to name entities, state, and timestamp:
    ppheader("Named entities with state re.states.SCHEDULED and its timestamp")
    named_timestamps = [(entity.uid,
                         entity.states[re.states.SCHEDULED][ru.STATE],
                         entity.states[re.states.SCHEDULED][ru.TIME])
                        for entity in entities]
    pprint.pprint(named_timestamps)
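
The named_timestamps list built at the end of the timestamps script is often easier to compare once the times are made relative to the earliest one. A minimal sketch that works on (uid, state, time) tuples of that shape; relative_timestamps is a hypothetical name and the sample values are invented:

```python
# Hypothetical post-processing of (uid, state, time) tuples.
def relative_timestamps(named_timestamps):
    """Shift (uid, state, time) tuples so the earliest time becomes 0.0 and
    return them ordered by that relative time."""
    if not named_timestamps:
        return []
    t0 = min(t for _, _, t in named_timestamps)
    shifted = [(uid, state, t - t0) for uid, state, t in named_timestamps]
    return sorted(shifted, key=lambda item: item[2])


# Invented sample data shaped like named_timestamps from the script
sample = [('task.0001', 'SCHEDULED', 1500000012.5),
          ('task.0000', 'SCHEDULED', 1500000010.0),
          ('task.0002', 'SCHEDULED', 1500000011.0)]
for uid, state, offset in relative_timestamps(sample):
    print('%s %s %+.2f s' % (uid, state, offset))
```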