Profiling
EnTK can be configured to generate profiles by setting RADICAL_ENTK_PROFILE=True. Profiles are generated per component and sub-component. These profiles can be read and analyzed by using RADICAL Analytics (RA).
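As a minimal sketch, the variable can be set from inside the Python script that runs the EnTK application; exporting it in the shell before launching the script should have the same effect. Setting it before radical.entk is imported is a conservative assumption, since this page does not say at which point EnTK reads the variable.

```python
import os

# Enable EnTK profiling for this process. Doing this before importing
# radical.entk is an assumption about when the variable is read.
os.environ['RADICAL_ENTK_PROFILE'] = 'True'

# import radical.entk as re   # import EnTK only after the variable is set

print('RADICAL_ENTK_PROFILE = %s' % os.environ['RADICAL_ENTK_PROFILE'])
```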
We describe profiling capabilities using RADICAL Analytics for EnTK via two examples that extract durations and timestamps.
The scripts and the data can be found in your virtualenv under share/radical.entk/analytics/scripts
or can be downloaded via the following links:
Untar the data and run either of the scripts. We recommend following the inline comments and output messages to get an understanding of RADICAL Analytics’ usage for EnTK.
More details on the capabilities of RADICAL Analytics can be found in its documentation.
Note
The current examples of RADICAL Analytics are configured for RADICAL Pilot but can be changed to EnTK by:
* Setting stype to 'radical.entk' when creating the RADICAL Analytics session.
* Following the state model, event model, and sequence diagram to determine the EnTK probes to use in profiling.
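The first point of the note can be sketched as follows. Both scripts on this page derive the src and sid arguments of ra.Session from the path of the session directory; the helper name split_session_path is hypothetical and only wraps that step, and the ra.Session call is left as a comment so that the snippet runs without RADICAL Analytics installed.

```python
import os


def split_session_path(loc):
    """Split a session directory path into (src, sid).

    Hypothetical helper: src is the parent directory and sid is the name
    of the session directory, as in the scripts on this page.
    """
    loc = loc.rstrip('/')                  # tolerate a trailing slash
    src = os.path.dirname(loc) or '.'      # a bare directory name means cwd
    sid = os.path.basename(loc)
    return src, sid


src, sid = split_session_path('./re.session.two.vivek.017759.0012')
print(src, sid)

# With RADICAL Analytics installed, the session would then be created as in
# the scripts on this page:
#   session = ra.Session(src=src, sid=sid, stype='radical.entk')
```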
Extracting durations
#!/usr/bin/env python

"""This example illustrates how to obtain durations for arbitrary (non-state)
profile events. Modified from examples under RADICAL Analytics"""

__copyright__ = 'Copyright 2013-2018, http://radical.rutgers.edu'
__license__ = 'MIT'

import os
import pprint
import radical.utils as ru
import radical.entk as re
import radical.analytics as ra


# ------------------------------------------------------------------------------
#
if __name__ == '__main__':

    # Path to the untarred session directory: src is its parent directory,
    # sid is the session ID (the name of the directory itself).
    loc = './re.session.two.vivek.017759.0012'
    src = os.path.dirname(loc)
    sid = os.path.basename(loc)
    session = ra.Session(src=src, sid=sid, stype='radical.entk')

    # A formatting helper before starting...
    def ppheader(message):
        separator = '\n' + 78 * '-' + '\n'
        print(separator + message + separator)

    # First we look at the *event* model of our session. The event model is
    # usually less stringent than the state model: not all events will always
    # be available, events may have certain fields missing, they may be
    # recorded multiple times, and their meaning may differ slightly depending
    # on the code path taken. But in general, these are the events available,
    # and their relative ordering.
    ppheader("event models")
    pprint.pprint(session.describe('event_model'))
    pprint.pprint(session.describe('statistics'))

    # Let's say that we want to see how long EnTK took to schedule, execute,
    # and process completed tasks.

    # We first filter our session to obtain only the task objects
    tasks = session.filter(etype='task', inplace=False)
    print('#tasks   : %d' % len(tasks.get()))

    # We use the 're.states.SCHEDULING' and 're.states.SUBMITTING' probes to
    # find the time taken by EnTK to create and submit all tasks for execution
    ppheader("Time spent to create and submit the tasks")
    duration = tasks.duration(event=[{ru.EVENT: 'state',
                                      ru.STATE: re.states.SCHEDULING},
                                     {ru.EVENT: 'state',
                                      ru.STATE: re.states.SUBMITTING}])
    print('duration : %.2f' % duration)

    # We use the 're.states.SUBMITTING' and 're.states.COMPLETED' probes to
    # find the time taken by EnTK to execute all tasks
    ppheader("Time spent to execute the tasks")
    duration = tasks.duration(event=[{ru.EVENT: 'state',
                                      ru.STATE: re.states.SUBMITTING},
                                     {ru.EVENT: 'state',
                                      ru.STATE: re.states.COMPLETED}])
    print('duration : %.2f' % duration)

    # We use the 're.states.COMPLETED' and 're.states.DONE' probes to find
    # the time taken by EnTK to process all executed tasks
    ppheader("Time spent to process executed tasks")
    duration = tasks.duration(event=[{ru.EVENT: 'state',
                                      ru.STATE: re.states.COMPLETED},
                                     {ru.EVENT: 'state',
                                      ru.STATE: re.states.DONE}])
    print('duration : %.2f' % duration)

    # Finally, we produce a list of the number of concurrent tasks between
    # states 're.states.SUBMITTING' and 're.states.COMPLETED' over the course
    # of the entire execution, sampled every 10 seconds
    ppheader("concurrent tasks in between SUBMITTING and EXECUTED states")
    concurrency = tasks.concurrency(event=[{ru.EVENT: 'state',
                                            ru.STATE: re.states.SUBMITTING},
                                           {ru.EVENT: 'state',
                                            ru.STATE: re.states.COMPLETED}],
                                    sampling=10)
    pprint.pprint(concurrency)


# ------------------------------------------------------------------------------
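To make the two quantities in the script above more concrete, the following sketch computes them by hand in plain Python on made-up timestamps. It is not the RADICAL Analytics implementation: the function names merged_duration and sampled_concurrency are hypothetical, the intervals are invented, and the sketch assumes that an aggregate duration counts overlapping per-task intervals only once.

```python
def merged_duration(intervals):
    """Total time covered by at least one (start, end) interval."""
    total = 0.0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the running interval: close it, open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping: extend the running interval if needed.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def sampled_concurrency(intervals, sampling):
    """Number of intervals active at each sample time, as (time, count)."""
    if not intervals:
        return []
    t = min(start for start, _ in intervals)
    t_max = max(end for _, end in intervals)
    samples = []
    while t <= t_max:
        active = sum(1 for start, end in intervals if start <= t < end)
        samples.append((t, active))
        t += sampling
    return samples


# Hypothetical (SUBMITTING, COMPLETED) timestamps in seconds for three tasks.
task_intervals = [(0.0, 30.0), (10.0, 40.0), (60.0, 80.0)]

print('duration : %.2f' % merged_duration(task_intervals))
print(sampled_concurrency(task_intervals, sampling=20.0))
```

In this toy data the first two tasks overlap, so they contribute 40 seconds rather than 60, and the gap between 40 and 60 seconds is not counted at all.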
Extracting timestamps
#!/usr/bin/env python

"""
This example illustrates the use of the method ra.Session.get().
Modified from examples under RADICAL Analytics
"""

import os
import pprint
import radical.utils as ru
import radical.entk as re
import radical.analytics as ra

__copyright__ = 'Copyright 2013-2018, http://radical.rutgers.edu'
__license__ = 'MIT'


# ------------------------------------------------------------------------------
#
if __name__ == '__main__':

    # Path to the untarred session directory: src is its parent directory,
    # sid is the session ID (the name of the directory itself).
    loc = './re.session.two.vivek.017759.0012'
    src = os.path.dirname(loc)
    sid = os.path.basename(loc)
    session = ra.Session(src=src, sid=sid, stype='radical.entk')

    # A formatting helper before starting...
    def ppheader(message):
        separator = '\n' + 78 * '-' + '\n'
        print(separator + message + separator)

    # And here we go. As seen in example 01, we use ra.Session.list() to get
    # the names of all the entity types in the session.
    etypes = session.list('etype')
    pprint.pprint(etypes)

    # We limit ourselves to the type 'task'. We use the method
    # ra.Session.get() to get all the objects in our session with etype 'task':
    ppheader("properties of the entities with etype 'task'")
    tasks = session.get(etype='task')
    pprint.pprint(tasks)

    # Mmmm, still a bit too many entities. We limit our analysis to a single
    # task. We use ra.Session.get() to select all the objects in the
    # session with etype 'task' and uid 'task.0000' and return them in a
    # list:
    ppheader("properties of the entities with etype 'task' and uid 'task.0000'")
    task = session.get(etype='task', uid='task.0000')
    pprint.pprint(task)

    # We may also want to look into the states of this task:
    ppheader("states of the entities with uid 'task.0000'")
    states = task[0].states
    pprint.pprint(states)

    # And extract the state we need. For example, the state 'SCHEDULED', which
    # indicates that the task has been scheduled. To refer to the state
    # 'SCHEDULED', and to all the other states of EnTK, we use the properties
    # defined in re.states (here re.states.SCHEDULED) rather than hard-coded
    # strings.
    ppheader("Properties of the state re.states.SCHEDULED of the entities with uid 'task.0000'")
    state = task[0].states[re.states.SCHEDULED]
    pprint.pprint(state)

    # Finally, we extract a property we need from this state. For example, the
    # timestamp of when the task was scheduled, i.e., the property 'time' of
    # the state SCHEDULED:
    ppheader("Property 'time' of the state re.states.SCHEDULED of the entities with uid 'task.0000'")
    timestamp = task[0].states[re.states.SCHEDULED][ru.TIME]
    pprint.pprint(timestamp)

    # ra.Session.get() can also be used to get all the entities in our
    # session that have a specific state. For example, the following gets all
    # the entities that have the state 'SCHEDULED':
    ppheader("Entities with state re.states.SCHEDULED")
    entities = session.get(state=re.states.SCHEDULED)
    pprint.pprint(entities)

    # We can then print the timestamp of the state 'SCHEDULED' for all the
    # entities having that state by using something like:
    ppheader("Timestamp of all the entities with state re.states.SCHEDULED")
    timestamps = [entity.states[re.states.SCHEDULED][ru.TIME] for entity in entities]
    pprint.pprint(timestamps)

    # We can also create tailored data structures for our analysis. For
    # example, using tuples to name entities, state, and timestamp:
    ppheader("Named entities with state re.states.SCHEDULED and its timestamp")
    named_timestamps = [(entity.uid,
                         entity.states[re.states.SCHEDULED][ru.STATE],
                         entity.states[re.states.SCHEDULED][ru.TIME]) for entity in entities]
    pprint.pprint(named_timestamps)
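Once the timestamps are available as plain (uid, state, time) tuples, further analysis needs nothing beyond standard Python. The sketch below orders such tuples by time and expresses each timestamp as an offset from the earliest one. The records, their values, and the helper name relative_timeline are hypothetical; in practice the list would be the named_timestamps produced by the script above.

```python
# Hypothetical (uid, state, timestamp) tuples, shaped like named_timestamps.
named_timestamps = [
    ('task.0001', 'SCHEDULED', 1520.5),
    ('task.0000', 'SCHEDULED', 1518.0),
    ('task.0002', 'SCHEDULED', 1523.0),
]


def relative_timeline(records):
    """Sort records by time; report each time as an offset from the first."""
    ordered = sorted(records, key=lambda record: record[2])
    t_zero = ordered[0][2] if ordered else 0.0
    return [(uid, state, time - t_zero) for uid, state, time in ordered]


for uid, state, offset in relative_timeline(named_timestamps):
    print('%s  %s  +%.2fs' % (uid, state, offset))
```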