Monitor resources and collect results

System metrics

Please note: Metrics collection is only available on CHI@UC for the time being.

Our latest CentOS 7 appliances use the collectd system statistics collection daemon to send a selection of system metrics to the Gnocchi time series database. You can retrieve measurements of these metrics from the command line; visualizing them in the web interface is not yet supported.

To retrieve measurements, you need the openstack command line tool and its Gnocchi plugin. To install both on your own machine (laptop or workstation), run:

pip install python-openstackclient
pip install gnocchiclient

Then, set up your environment for OpenStack command line usage, as described in the provisioning documentation.

You can now use the openstack metric command line utility. To show the metrics collected for a specific instance, run:

openstack metric resource show <instance_id>

You will get a result like the following:

+-----------------------+-------------------------------------------------------------------+
| Field                 | Value                                                             |
+-----------------------+-------------------------------------------------------------------+
| created_by_project_id | 975c0a94b784483a885f4503f70af655                                  |
| created_by_user_id    | fee2bf85ecbe4f5fbcc5058de6938a8a                                  |
| creator               | fee2bf85ecbe4f5fbcc5058de6938a8a:975c0a94b784483a885f4503f70af655 |
| ended_at              | None                                                              |
| id                    | d17d5191-af60-4407-9ed2-e3d48e86ac6d                              |
| metrics               | interface-eno1@if_dropped: af391a6a-f323-4671-bebb-8673ce308f22   |
|                       | interface-eno1@if_errors: 204e4a0b-bdde-486c-a1a2-2e3b4c5d7b2e    |
|                       | interface-eno1@if_octets: e1a9ba2a-fae0-40f7-b9df-7ffa85c5da02    |
|                       | interface-eno1@if_packets: cb933fa4-0074-4e8b-978b-8aee8e172c94   |
|                       | interface-eno2@if_dropped: 8dde1dcb-6223-4aec-af3f-e507dc086795   |
|                       | interface-eno2@if_errors: d71f3a06-d865-404c-be29-a7f327e44494    |
|                       | interface-eno2@if_octets: 979edab6-99ee-4161-a068-b8e71e789800    |
|                       | interface-eno2@if_packets: fea00fe6-917b-4511-863f-bb472e6691b8   |
|                       | interface-eno3@if_dropped: 28f22053-2dc2-4a19-becb-5286645167eb   |
|                       | interface-eno3@if_errors: 7a248d94-ba00-4de1-b696-9c4cce017185    |
|                       | interface-eno3@if_octets: d77fa79d-d2b4-4276-bf0a-c21459dd5fce    |
|                       | interface-eno3@if_packets: ee2c8da1-3014-414f-ab3f-5dec06bcf2b5   |
|                       | interface-eno4@if_dropped: d28ee645-c5fd-4754-bce9-c8fa65bfcb14   |
|                       | interface-eno4@if_errors: 249d1f73-5a49-4be8-992b-4295eea32660    |
|                       | interface-eno4@if_octets: 3016e562-93ab-4c0c-803f-3cb72dc6d82d    |
|                       | interface-eno4@if_packets: 1c44cb28-9d6c-416b-96a6-2e1c1217a313   |
|                       | interface-lo@if_dropped: 5b486d93-fad6-4bc3-8294-4f10ffbd29ee     |
|                       | interface-lo@if_errors: 813d8c03-4655-4f05-b498-433424389bb3      |
|                       | interface-lo@if_octets: 636f6ecb-8492-4d5d-a1fc-7d3bb8062c22      |
|                       | interface-lo@if_packets: ba129a26-7d09-489c-8525-6edeaf06a671     |
|                       | load@load: 1c2bfba2-d4cb-4894-8dad-1f79bf087557                   |
|                       | memory@memory.buffered: 7c8621ae-cb22-4f48-b530-06909bffdacb      |
|                       | memory@memory.cached: f9720c07-c61a-4066-b50b-153f671d5e87        |
|                       | memory@memory.free: f46e7540-ed49-4013-ae62-1dd6dba497c1          |
|                       | memory@memory.slab_recl: 01544851-d910-49d4-ba41-fc9c51a0241b     |
|                       | memory@memory.slab_unrecl: 20aebd67-7134-4799-a073-21066111ffe4   |
|                       | memory@memory.used: 3b5b32a6-45c7-4cc2-aa63-c50bd00eeb1b          |
| original_resource_id  | d17d5191-af60-4407-9ed2-e3d48e86ac6d                              |
| project_id            | None                                                              |
| revision_end          | None                                                              |
| revision_start        | 2017-12-22T16:47:26.261617+00:00                                  |
| started_at            | 2017-12-22T16:47:26.261597+00:00                                  |
| type                  | generic                                                           |
| user_id               | None                                                              |
+-----------------------+-------------------------------------------------------------------+
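If you want to script over these metrics, it can help to group the metric names by collectd plugin (the part before the @). The Python sketch below parses the table output shown above. parse_metrics is a hypothetical helper, not part of gnocchiclient, and it assumes the exact two-column layout above, where the metrics field spans several rows of "name: id" entries.

```python
from __future__ import annotations

from collections import defaultdict


def parse_metrics(table: str) -> dict[str, dict[str, str]]:
    """Return {plugin: {metric_name: metric_id}} from the "metrics" field.

    Hypothetical helper: assumes the two-column table printed by
    `openstack metric resource show`, with one "name: id" entry per row.
    """
    groups: dict[str, dict[str, str]] = defaultdict(dict)
    in_metrics = False
    for line in table.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 2:
            continue  # border rows such as +-----+-----+
        field, value = cells
        if field:  # a non-empty first column starts a new field
            in_metrics = field == "metrics"
        if in_metrics and ": " in value:
            name, metric_id = value.split(": ", 1)
            plugin = name.split("@", 1)[0]  # "memory" for "memory@memory.used"
            groups[plugin][name] = metric_id
    return dict(groups)
```

For example, after saving the output with openstack metric resource show <instance_id> > resource.txt, calling sorted(parse_metrics(open("resource.txt").read())) on the result above would list the plugins interface-eno1 through interface-eno4, interface-lo, load and memory.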

To get all the measurements of a particular metric, run:

openstack metric measures show <metric_name> --resource-id <instance_id> --refresh

For example, to get measurements of used memory over time for instance d17d5191-af60-4407-9ed2-e3d48e86ac6d, run:

openstack metric measures show memory@memory.used --resource-id d17d5191-af60-4407-9ed2-e3d48e86ac6d --refresh

This shows the latest measurements at a granularity of 1.0 (one second), as well as values aggregated over one minute (granularity 60.0) and one hour (granularity 3600.0). By default, the mean is used as the aggregation method; you can select another one with the --aggregation option: std, count, min, max, or sum.

+---------------------------+-------------+---------------+
| timestamp                 | granularity |         value |
+---------------------------+-------------+---------------+
| 2017-12-22T18:00:00+01:00 |      3600.0 |  1222193280.0 |
| 2017-12-22T18:01:00+01:00 |        60.0 |  1222684672.0 |
| 2017-12-22T18:02:00+01:00 |        60.0 | 1222394538.67 |
| 2017-12-22T18:03:00+01:00 |        60.0 | 1222147413.33 |
| 2017-12-22T18:01:20+01:00 |         1.0 |  1222684672.0 |
| 2017-12-22T18:01:30+01:00 |         1.0 |  1222684672.0 |
| 2017-12-22T18:01:40+01:00 |         1.0 |  1222684672.0 |
| 2017-12-22T18:01:50+01:00 |         1.0 |  1222684672.0 |
| 2017-12-22T18:02:00+01:00 |         1.0 |  1222684672.0 |
| 2017-12-22T18:02:10+01:00 |         1.0 |  1222684672.0 |
| 2017-12-22T18:02:20+01:00 |         1.0 |  1222684672.0 |
| 2017-12-22T18:02:30+01:00 |         1.0 |  1221943296.0 |
| 2017-12-22T18:02:40+01:00 |         1.0 |  1222438912.0 |
| 2017-12-22T18:02:50+01:00 |         1.0 |  1221931008.0 |
| 2017-12-22T18:03:00+01:00 |         1.0 |  1221931008.0 |
| 2017-12-22T18:03:10+01:00 |         1.0 |  1221931008.0 |
| 2017-12-22T18:03:20+01:00 |         1.0 |  1221931008.0 |
| 2017-12-22T18:03:30+01:00 |         1.0 |  1222373376.0 |
| 2017-12-22T18:03:40+01:00 |         1.0 |  1222369280.0 |
| 2017-12-22T18:03:50+01:00 |         1.0 |  1222348800.0 |
+---------------------------+-------------+---------------+
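To see how the 60.0 rows relate to the 1.0 rows, the Python sketch below groups timestamped values into fixed windows labeled by their start time and reduces each window with the mean. It is an illustration of the idea under that assumption, not Gnocchi's implementation; aggregate and its parameters are hypothetical names. Applied to the six 18:02 measurements from the table, it reproduces the 60.0 value 1222394538.67.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def aggregate(points, granularity_s=60, method=mean):
    """Group (ISO-8601 timestamp, value) points into fixed windows and reduce each.

    Each window is labeled by its start time, like the 60.0 rows in the table.
    Illustration only: this is not Gnocchi's implementation.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        t = datetime.fromisoformat(ts)  # keeps the +01:00 offset
        epoch = int(t.timestamp())
        # Floor to the start of the window, then convert back to the same offset.
        start = datetime.fromtimestamp(epoch - epoch % granularity_s, tz=t.tzinfo)
        buckets[start.isoformat()].append(value)
    return {start: method(values) for start, values in sorted(buckets.items())}


# The six 1.0-granularity measurements of 18:02 from the table above.
minute_1802 = [
    ("2017-12-22T18:02:00+01:00", 1222684672.0),
    ("2017-12-22T18:02:10+01:00", 1222684672.0),
    ("2017-12-22T18:02:20+01:00", 1222684672.0),
    ("2017-12-22T18:02:30+01:00", 1221943296.0),
    ("2017-12-22T18:02:40+01:00", 1222438912.0),
    ("2017-12-22T18:02:50+01:00", 1221931008.0),
]
for start, value in aggregate(minute_1802).items():
    print(start, f"{value:.2f}")  # 2017-12-22T18:02:00+01:00 1222394538.67
```

Passing method=max instead of the default mean mirrors what --aggregation max asks the server for.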

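By default, metrics are stored with the "high" archive policy. In Gnocchi's default definition, this policy keeps one measurement per second for one hour, one aggregate per minute for one week, and one aggregate per hour for one year, which correspond to the 1.0, 60.0 and 3600.0 granularities in the output above. You can check the definition in use with:

openstack metric archive-policy show high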

Note, however, that collectd is configured to collect metrics only every 10 seconds, so the 1-second archive holds one measurement every 10 seconds rather than one per second, as the table above shows.
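The effect of the 10-second collection interval on storage can be worked out with a little arithmetic. The Python sketch below assumes Gnocchi's default definition of the "high" policy (1 second for 1 hour, 1 minute for 1 week, 1 hour for 1 year); the actual definition on the site may differ. archive_sizes is a hypothetical helper that counts, per archive, the total slots and the slots that actually receive a value when measurements arrive regularly.

```python
# Assumption: Gnocchi's default definition of the "high" archive policy.
# Check the actual definition with: openstack metric archive-policy show high
HIGH_POLICY = [  # (granularity in seconds, timespan in seconds)
    (1, 3600),                # 1 second for 1 hour
    (60, 7 * 24 * 3600),      # 1 minute for 1 week
    (3600, 365 * 24 * 3600),  # 1 hour for 1 year
]
COLLECT_INTERVAL_S = 10  # collectd sends one measurement every 10 seconds


def archive_sizes(policy, interval_s):
    """For each archive, return (granularity, slots, filled slots).

    Assumes measurements arrive regularly every interval_s seconds: a slot only
    receives a value if a measurement falls into it, so windows shorter than
    the collection interval are mostly empty.
    """
    sizes = []
    for granularity, timespan in policy:
        slots = timespan // granularity
        filled = slots if granularity >= interval_s else timespan // interval_s
        sizes.append((granularity, slots, filled))
    return sizes


for granularity, slots, filled in archive_sizes(HIGH_POLICY, COLLECT_INTERVAL_S):
    print(f"{granularity:>5} s: {filled:>6} of {slots:>6} slots filled")
```

Under these assumptions, only 360 of the 3600 one-second slots in the last hour hold a value, while the per-minute and per-hour archives are fully populated.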

Only a few collectd plugins are enabled by default, but you can enable any of the many other available plugins. To do so, edit /etc/collectd.conf, uncomment the LoadPlugin <plugin_name> line of each plugin you want to enable, and restart collectd with:

sudo systemctl restart collectd
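If you enable plugins on many nodes, you may prefer to script the edit of /etc/collectd.conf. The Python sketch below uncomments the LoadPlugin line of each requested plugin and leaves every other line untouched. enable_plugins is a hypothetical helper; it assumes plugins appear as single-line "#LoadPlugin <name>" entries and does not handle block-style <LoadPlugin ...> sections.

```python
from __future__ import annotations

import re

# A commented-out directive such as "#LoadPlugin cpu" or "# LoadPlugin df".
COMMENTED_LOADPLUGIN = re.compile(r"^(\s*)#\s*(LoadPlugin\s+(\S+))\s*$")


def enable_plugins(conf_text: str, plugins: set[str]) -> str:
    """Uncomment the LoadPlugin line of each plugin in `plugins`.

    Hypothetical helper: only single-line "#LoadPlugin <name>" entries are
    handled; every other line is returned unchanged.
    """
    out = []
    for line in conf_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        match = COMMENTED_LOADPLUGIN.match(body)
        if match and match.group(3) in plugins:
            # Keep the indentation and the original line ending.
            line = match.group(1) + match.group(2) + line[len(body):]
        out.append(line)
    return "".join(out)
```

For example, read /etc/collectd.conf, pass its contents to enable_plugins(text, {"cpu", "df"}), write the result back as root, and then restart collectd with the systemctl command above.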

The collectd daemon is configured to send measurements in batches to minimize network traffic. However, if you want to avoid any interference with your experiments, you can stop and disable collectd with:

sudo systemctl stop collectd && sudo systemctl disable collectd

Energy and power consumption

Our CentOS 7 and Ubuntu 16.04 appliances support reporting the energy and power consumption of each CPU socket and of the memory DIMMs. This is done with the etrace2 utility, which relies on the Intel RAPL (Running Average Power Limit) interface:

# spawn your program and print energy consumption 
$ etrace2 your_program

# also print power consumption every 0.5 sec
$ etrace2 -i 0.5 your_program

# just print power consumption every 1 sec for 10 sec
$ etrace2 -i 1.0 -t 10

For example, to report energy consumption during the generation of a large RSA private key:

$ etrace2 openssl genrsa -out private.pem 4096
# ETRACE2_VERSION=0.1
Generating RSA private key, 4096 bit long modulus
..............................................................................................................................................................................................................................................................................................................++
.............................................................................................................................................................++
e is 65537 (0x10001)
# ELAPSED=2.579472
# ENERGY=365.788208
# ENERGY_SOCKET0=99.037841
# ENERGY_DRAM0=78.577698
# ENERGY_SOCKET1=109.230103
# ENERGY_DRAM1=80.336548

The energy consumption is reported in joules.

etrace2 reports the power and energy consumption of the node's CPUs and memory over the entire execution of the program. This includes the consumption of any other programs running during this period, as well as the idle (baseline) consumption of the CPUs and memory.
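One rough way to estimate the share attributable to your program is to subtract an idle baseline: E_net = E_total - P_idle × t. The Python sketch below assumes you have measured the idle power P_idle yourself on the same node while it was otherwise idle (for example with etrace2 -i 1.0 -t 10); net_energy is a hypothetical helper and the 100 W baseline in the usage line is a made-up value for illustration. The estimate still includes any activity of other programs above idle.

```python
def net_energy(total_j: float, elapsed_s: float, idle_power_w: float) -> float:
    """Energy above the idle baseline: E_net = E_total - P_idle * t.

    idle_power_w is not reported by the run itself; it is a baseline you
    measure separately on the same node while it is otherwise idle.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return total_j - idle_power_w * elapsed_s


# Hypothetical 100 W idle baseline, applied to the openssl run shown above.
print(f"{net_energy(365.788208, 2.579472, 100.0):.2f} J above idle")  # 107.84 J above idle
```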

Note the following caveats:

This utility was contributed by Chameleon user Kazutomo Yoshii of Argonne National Laboratory.