Practical P_S: How old are your connections?

I’ve often wished that PROCESSLIST exposed when a connection was first established, and I find myself wishing for this information more now with MySQL 5.6.  Improvements to PERFORMANCE_SCHEMA make it trivial to see how much time is being spent in various operations for a given connection – but it would make some analysis (“what percentage of connection time is spent doing X?”) easier.

That said, it is possible to approximate connection age with PERFORMANCE_SCHEMA in MySQL 5.6.  I say “approximate” because results will vary based on what instrumentation exists, is enabled, and is collecting timing data.  That’s because we’re just doing a SUM() on the SUM_TIMER_WAIT column for all instrumented waits.  Here’s an example (FYI, I’m using the format_time() function from Mark Leith’s awesome ps_helper scripts to convert from picoseconds to something meaningful to me):

mysql> SELECT ps_helper.format_time(SUM(sum_timer_wait)) total_time
    -> FROM events_waits_summary_by_thread_by_event_name ews
    -> JOIN threads t
    -> ON (t.thread_id = ews.thread_id)
    -> WHERE t.processlist_id = CONNECTION_ID()\G
*************************** 1. row ***************************
total_time: 07:27:38.2321
1 row in set (0.03 sec)

For good measure, I pulled the timestamps for the above query and the initial connection from the MySQL Enterprise Audit Log plugin and compared them. They are pretty close, but not exact matches:

mysql> SELECT TIMEDIFF('2013-07-30 23:48:11','2013-07-30 16:20:30');
+-------------------------------------------------------+
| TIMEDIFF('2013-07-30 23:48:11','2013-07-30 16:20:30') |
+-------------------------------------------------------+
| 07:27:41                                              |
+-------------------------------------------------------+
1 row in set (0.00 sec)

So somewhere, the PERFORMANCE_SCHEMA instrumentation isn’t recording about 3s worth of time (over a span of 7 hours) into the events_waits_summary_by_thread_by_event_name table. That’s going to happen – there are some minor stages missing instrumentation, and not all instrumentation is enabled and timed by default. For example, the default configuration doesn’t capture information for the “stage/sql/User sleep” instrument. Once that’s enabled (and timed), that timing information gets recorded:

mysql> SELECT * FROM events_stages_summary_global_by_event_name
    -> WHERE event_name LIKE '%sleep%'\G
*************************** 1. row ***************************
    EVENT_NAME: stage/sql/User sleep
    COUNT_STAR: 0
SUM_TIMER_WAIT: 0
MIN_TIMER_WAIT: 0
AVG_TIMER_WAIT: 0
MAX_TIMER_WAIT: 0
1 row in set (0.00 sec)

mysql> SELECT SLEEP(2);
+----------+
| SLEEP(2) |
+----------+
|        0 |
+----------+
1 row in set (2.00 sec)

mysql> SELECT * FROM events_stages_summary_global_by_event_name
    -> WHERE event_name LIKE '%sleep%'\G
*************************** 1. row ***************************
    EVENT_NAME: stage/sql/User sleep
    COUNT_STAR: 1
SUM_TIMER_WAIT: 2000057006405
MIN_TIMER_WAIT: 2000057006405
AVG_TIMER_WAIT: 2000057006405
MAX_TIMER_WAIT: 2000057006405
1 row in set (0.00 sec)

So, a first attempt at a query to find the age of all current connections would look like this:

mysql> SELECT
    ->   t.processlist_id,
    ->   t.processlist_user,
    ->   t.processlist_host,
    ->   ps_helper.format_time(SUM(ews.sum_timer_wait)) total_time
    -> FROM events_waits_summary_by_thread_by_event_name ews
    ->   JOIN threads t
    ->     ON (t.thread_id = ews.thread_id)
    -> WHERE t.processlist_id IS NOT NULL
    -> GROUP BY
    ->   t.processlist_id,
    ->   t.processlist_user,
    ->   t.processlist_host\G
*************************** 1. row ***************************
  processlist_id: 20
processlist_user: root
processlist_host: localhost
      total_time: 00:43:30.1790
*************************** 2. row ***************************
  processlist_id: 21
processlist_user: nothing
processlist_host: localhost
      total_time: 47.13 s
2 rows in set (0.02 sec)

But there’s a bit more to it than that- watch what happens with nothing@localhost below:

mysql> SELECT
    ->   t.processlist_id,
    ->   t.processlist_user,
    ->   t.processlist_host,
    ->   ps_helper.format_time(SUM(ews.sum_timer_wait)) total_time
    -> FROM events_waits_summary_by_thread_by_event_name ews
    ->   JOIN threads t
    ->     ON (t.thread_id = ews.thread_id)
    -> WHERE t.processlist_id IS NOT NULL
    -> GROUP BY
    ->   t.processlist_id,
    ->   t.processlist_user,
    ->   t.processlist_host\G
*************************** 1. row ***************************
  processlist_id: 20
processlist_user: root
processlist_host: localhost
      total_time: 01:02:00.4435
*************************** 2. row ***************************
  processlist_id: 22
processlist_user: nothing
processlist_host: localhost
      total_time: 1.60 ms
2 rows in set (0.03 sec)

mysql> SELECT SLEEP(10);
+-----------+
| SLEEP(10) |
+-----------+
|         0 |
+-----------+
1 row in set (10.00 sec)

mysql> SELECT
    ->   t.processlist_id,
    ->   t.processlist_user,
    ->   t.processlist_host,
    ->   ps_helper.format_time(SUM(ews.sum_timer_wait)) total_time
    -> FROM events_waits_summary_by_thread_by_event_name ews
    ->   JOIN threads t
    ->     ON (t.thread_id = ews.thread_id)
    -> WHERE t.processlist_id IS NOT NULL
    -> GROUP BY
    ->   t.processlist_id,
    ->   t.processlist_user,
    ->   t.processlist_host\G
*************************** 1. row ***************************
  processlist_id: 20
processlist_user: root
processlist_host: localhost
      total_time: 01:03:17.0619
*************************** 2. row ***************************
  processlist_id: 22
processlist_user: nothing
processlist_host: localhost
      total_time: 1.60 ms
2 rows in set (0.01 sec)

Notice that the total_time for nothing@localhost hasn’t increased. The summary tables get timing information once the wait event is completed, and that connection is still sitting idle. So how do we account for that? Well, we could theoretically add in values obtained from the events_waits_current table, but there are two problems there:

  1. This table isn’t enabled by default (you need to update setup_consumers to enable this).
  2. Events still waiting show with a TIMER_WAIT value of NULL.

The first is easy enough to enable, but the second is a major hurdle.  There’s a TIMER_START column with non-NULL values, but it’s expressed in picoseconds since server startup, and there’s no mechanism to get the current timer value to do your own calculations.  Until that’s addressed, we can’t leverage the events_waits_current table.  If we assume idle connections are likely the big gaps, we can do something like this:

mysql> SELECT
    ->  processlist_id,
    ->  processlist_user,
    ->  processlist_host,
    ->  ps_helper.format_time(SUM(total_time)) total_time
    -> FROM
    ->  ( SELECT
    ->   t.processlist_id,
    ->   t.processlist_user,
    ->   t.processlist_host,
    ->   SUM(ews.sum_timer_wait) total_time
    -> FROM events_waits_summary_by_thread_by_event_name ews
    ->   JOIN threads t
    ->     ON (t.thread_id = ews.thread_id)
    -> WHERE t.processlist_id IS NOT NULL
    -> GROUP BY
    ->   t.processlist_id,
    ->   t.processlist_user,
    ->   t.processlist_host
    ->  UNION SELECT
    ->   processlist_id,
    ->   processlist_user,
    ->   processlist_host,
    ->   processlist_time * 1000000000000 total_time
    -> FROM threads
    -> WHERE processlist_id IS NOT NULL
    ->   AND processlist_command = 'Sleep'
    -> ) thr
    ->  GROUP BY
    ->   processlist_id,
    ->   processlist_user,
    ->   processlist_host\G
*************************** 1. row ***************************
  processlist_id: 20
processlist_user: root
processlist_host: localhost
      total_time: 01:45:23.4352
*************************** 2. row ***************************
  processlist_id: 22
processlist_user: nothing
processlist_host: localhost
      total_time: 00:43:39.0016
*************************** 3. row ***************************
  processlist_id: 23
processlist_user: root
processlist_host: localhost
      total_time: 00:36:53.3564
3 rows in set (0.00 sec)

It’s not ideal, but it gets me in the neighborhood of what I’m looking for – an approximate age for connections.
 

Leave a Reply

Your email address will not be published. Required fields are marked *

Time limit is exhausted. Please reload CAPTCHA.