Category Archives: Java

Java stuff.

Debugging Communication Link Failure exceptions in Connector/J

Have you seen error messages similar to the following:

Communications link failure – Last packet sent to the server was X ms ago.

Judging from the forums, many people have had problems with this. Here’s a brief overview of the causes and potential solutions.

Generally speaking, this error suggests that the network connection has been closed. There can be several root causes:

  • Firewalls or routers may clamp down on idle connections (the MySQL client/server protocol doesn’t ping).
  • The MySQL Server may be closing idle connections which exceed the wait_timeout or interactive_timeout threshold.

There are a couple of diagnostic details which can be useful.  For starters, when a recent (5.1.13) version of Connector/J is used, you should see additional details about both the last packet sent and the last packet received.  Older versions may simply indicate the last time a packet was sent to the server, which is frequently zero ms ago.  That’s not terribly useful; it may be that you just sent a packet, but haven’t received a packet from the server for 12 hours.  Knowing how long it’s been since Connector/J last received a packet from the server is useful information, so if you are not seeing this in your exception message, update your driver.

The second useful diagnostic detail shows up when Connector/J notices that the time a packet was last sent/received exceeds the wait_timeout or interactive_timeout threshold.  It will attempt to notify you of this in the exception message.

The following can be helpful in avoiding such problems, but ultimately network connections can be volatile:

  • Ensure connections are valid when checked out of the connection pool (use a validation query which starts with “/* ping */” *exactly* to execute a lightweight ping instead of a full query; see the sketch after this list)
  • Minimize the duration a Connection object is left idle while other application logic is executed
  • Explicitly validate the Connection before using it after it has been left idle for an extended period of time
  • Ensure wait_timeout and interactive_timeout are set sufficiently high
  • Ensure tcpKeepAlive is enabled
  • Ensure that any configurable firewall or router timeout setting accounts for maximum expected idle connection time.
  • Make sure that you are not setting socketTimeout, or that it is set to a sufficiently high value to avoid socket timeouts.
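For illustration, here’s a minimal sketch of explicitly validating a connection before reusing it after an idle period (the URL and credentials are placeholders; a pooled application would normally let the pool do this via its validation query):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class PingValidationExample {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "root", "");

        // ... imagine the connection sits idle here for a long time ...

        // Validate before reuse: the query must start with "/* ping */"
        // exactly for the driver to substitute a lightweight ping.
        try {
            Statement s = conn.createStatement();
            s.executeQuery("/* ping */ SELECT 1");
        } catch (SQLException e) {
            // The connection is dead; discard it and open a fresh one.
            conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/test", "root", "");
        }

        // ... safe to use conn for real work now ...
        conn.close();
    }
}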

I’ve seen exception messages which indicate Connections being used after sitting idle for hours – sometimes days.  If you do this, make sure that you are explicitly testing the connection before using it after lengthy idle periods.  Network connections fail, and applications need to be prepared to handle that.  But expecting connections to survive extended periods where left idle and work magically when used again hours later is just asking for trouble.

[Update 11 August 2015]  Combinations of specific versions of MySQL Server, Connector/Java and JREs may produce “communication link failure” messages when trying to negotiate SSL/TLS during the connection phase.  Further details are found here.

Connector/J ping (there will be a test on this)

MySQL Connector/J has a useful feature which executes a lightweight ping against a server (or, in the case of load-balanced connections, all active pooled internal connections that are retained) to validate the connection. As you might guess, this feature is useful for Java apps which use connection pools, so that the pool can validate the connection. Depending on your connection pool and configuration, this can be done at different times:

  • before the pool returns a connection to the application
  • when the application returns a connection to the pool
  • during periodic checks of idle connections

So, if you want to use this magic light-weight ping process, here’s how you do it:

Specifying a “validation query” in your connection pool that starts with “/* ping */” _exactly_ will cause the driver to instead send a ping to the server and return a fake result set (much lighter weight), and when using a ReplicationConnection or a LoadBalancedConnection, will send the ping across all active connections.

So says the relevant note in the changelogs.

Now for the test.  Which of the following Strings match the condition above?

  • sql = “/* PING */ SELECT 1”;
  • sql = “SELECT 1 /* ping*/”;
  • sql = “/*ping*/ SELECT 1”;
  • sql = “ /* ping */ SELECT 1”;
  • sql = “/*to ping or not to ping*/ SELECT 1”;

If you said “none of the above”, you pass the test.  The test is sensitive to whitespace, capitalization, and placement.  It’s this way for efficiency, as this test is done for every Statement that is executed:


protected static final String PING_MARKER = "/* ping */";
...
if (sql.charAt(0) == '/') {
    if (sql.startsWith(PING_MARKER)) {
        doPingInstead();
...

All of these statements will issue normal SELECT statements and not get transformed into the lightweight ping.  More problematically, this means that load-balanced connections execute such a statement against only one physical connection in the internal pool, rather than validating each underlying physical connection.  So the non-active physical connections sit stale, stale connections die, and then Connector/J re-balances onto that stale, dead connection, and suddenly you have an Exception pushed up to the application (loadBalanceValidateConnectionOnSwapServer can save you here).

If your Connector/J deployment uses a connection pool which allows you to specify a validation query, check it, and make sure that it starts with “/* ping */” – exactly.  This is especially true if you are using the load-balancing or replication-aware features of Connector/J – it should help keep connections alive which would otherwise go stale and die, causing problems later.
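For example, with Apache Commons DBCP the validation query might be configured like this (a sketch; other pools use different property names, but the same string works anywhere a validation query can be supplied):

import org.apache.commons.dbcp.BasicDataSource;

public class PoolConfigExample {
    public static void main(String[] args) {
        BasicDataSource ds = new BasicDataSource();
        ds.setDriverClassName("com.mysql.jdbc.Driver");
        ds.setUrl("jdbc:mysql://localhost:3306/test");
        ds.setUsername("root");
        ds.setPassword("");
        // Must start with "/* ping */" exactly; whitespace,
        // capitalization and placement all matter.
        ds.setValidationQuery("/* ping */ SELECT 1");
        ds.setTestOnBorrow(true); // validate connections as they leave the pool
    }
}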

Managing load-balanced Connector/J deployments

Connector/J has long provided an effective means to distribute read/write load across multiple MySQL server instances for Cluster or master-master replication deployments, but until version 5.1.13, managing such deployments frequently required a service outage to redeploy a new configuration.  Given that ease of scaling out by adding MySQL Cluster (server) instances is a key element of that product offering, which is also naturally targeted at deployments with very strict availability requirements, we had to add support for online changes of this nature.  It’s also critical for online upgrades – the alternative is to take a MySQL Cluster server instance down hard, which loses any in-process transactions and generates application exceptions for any application trying to use that particular server instance.

Mark Matthews and I first presented this at the MySQL Conference and Expo 2010, but the live demo we had planned didn’t work out (buggy video adapter, not product, really!).  This is my belated attempt to rectify that situation by walking through configuration, deployment and management of load-balancing configurations.

First, let’s talk about the configuration properties associated with this new functionality:

  • loadBalanceConnectionGroup – This is somewhat tricky to explain – imagine a scenario where you have an application which pulls JDBC connections from three different sources.  They may be dedicated resources for specific components – one for your JMS, one for your web app, etc.  This allows you to manage these JDBC sources within a single class-loader in any combination you choose.  If you have different servers for JMS than for your web app, you would define two different connection groups.  If they use the same configuration, and you want to manage them as a logical single group, give them the same name.   This is the key property for management, though – if you don’t define a name (string) for loadBalanceConnectionGroup, you cannot manage the connections.
  • loadBalanceEnableJMX – The ability to manage the connections is exposed when you define a loadBalanceConnectionGroup, but if you want to manage this externally, you will want to enable JMX by setting this property to true.  This enables a JMX implementation which exposes the management and monitoring operations of a connection group.  To expose this externally, you need to start your application with the -Dcom.sun.management.jmxremote JVM flag.  You can then connect and perform operations using a JMX client such as jconsole.
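For example, to start an application with remote JMX exposed on an arbitrary port (disabling authentication and SSL like this is only appropriate for local testing):

java -Dcom.sun.management.jmxremote \
     -Dcom.sun.management.jmxremote.port=9999 \
     -Dcom.sun.management.jmxremote.authenticate=false \
     -Dcom.sun.management.jmxremote.ssl=false \
     testsuite.ToddTest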

Now that you have the properties set, what operations can you perform?  First, there are a few monitoring details exposed:

  • Current active host count
  • Current active physical connection count
  • Current active logical connection count
  • Total logical connections created
  • Total transaction count

The following management operations are also exposed:

  • Add host
  • Remove host

The JMX interface (com.mysql.jdbc.jmx.LoadBalanceConnectionGroupManagerMBean) has the following methods:


int getActiveHostCount(String group);
int getTotalHostCount(String group);
long getTotalLogicalConnectionCount(String group);
long getActiveLogicalConnectionCount(String group);
long getActivePhysicalConnectionCount(String group);
long getTotalPhysicalConnectionCount(String group);
long getTotalTransactionCount(String group);
void removeHost(String group, String host) throws SQLException;
void stopNewConnectionsToHost(String group, String host) throws SQLException;
void addHost(String group, String host, boolean forExisting);
String getActiveHostsList(String group);
String getRegisteredConnectionGroups();

The getRegisteredConnectionGroups() method will return the names of all connection groups defined in that class-loader.
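If you’d rather script management operations than click through jconsole, the MBean can also be invoked programmatically.  Here’s a minimal sketch; the JMX port matches the startup flags shown earlier, and the ObjectName is an assumption (verify the exact name in jconsole’s MBean tree):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class AddHostViaJmx {
    public static void main(String[] args) throws Exception {
        // Assumes the application was started with remote JMX on port 9999
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // Assumed ObjectName -- check jconsole for the name your driver registers
            ObjectName name = new ObjectName(
                    "com.mysql.jdbc.jmx:type=LoadBalanceConnectionGroupManager");
            // Add localhost:3309 to the "first" group, including existing connections
            mbsc.invoke(name, "addHost",
                    new Object[] { "first", "localhost:3309", Boolean.TRUE },
                    new String[] { "java.lang.String", "java.lang.String", "boolean" });
        } finally {
            connector.close();
        }
    }
}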

So, what does this look like when running?  Let’s take a look at the long-delayed demo!

Here’s the code for my test application:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class ToddTest {

    private static String URL = "jdbc:mysql:loadbalance://" +
            "localhost:3306,localhost:3310/test?" +
            "loadBalanceConnectionGroup=first&loadBalanceEnableJMX=true";

    public static void main(String[] args) throws Exception {
        new Thread(new Repeater()).start();
        new Thread(new Repeater()).start();
        new Thread(new Repeater()).start();
    }

    static Connection getNewConnection() throws SQLException, ClassNotFoundException {
        Class.forName("com.mysql.jdbc.Driver");
        return DriverManager.getConnection(URL, "root", "");
    }

    static void executeSimpleTransaction(Connection c, int conn, int trans) {
        try {
            c.setAutoCommit(false);
            Statement s = c.createStatement();
            s.executeQuery("SELECT SLEEP(1) /* Connection: " + conn
                    + ", transaction: " + trans + " */");
            c.commit();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }

    public static class Repeater implements Runnable {

        public void run() {
            for (int i = 0; i < 100; i++) {
                try {
                    Connection c = getNewConnection();
                    for (int j = 0; j < 10; j++) {
                        executeSimpleTransaction(c, i, j);
                        Thread.sleep(Math.round(100 * Math.random()));
                    }
                    c.close();
                    Thread.sleep(100);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }
}

I start this (with the -Dcom.sun.management.jmxremote flag), then start jconsole:

There's the testsuite.ToddTest main class listed there. I select it and navigate to the com.mysql.jdbc.jmx.LoadBalanceConnectionGroupManager bean:

Great, let's look at the methods exposed:

Let's get the number of total physical connections established:

So, 27 physical connections have been established.  Now, let's see what the currently-configured hosts are:

So the connections are being balanced against instances running on ports 3306 and 3310.  Now, I've started up an instance on port 3309, and I want to tell Connector/J to start using it.  I do this with the addHost() method:

Adding port 3309 instance

That worked, and here's how I can tell: the queries start showing up in SHOW PROCESSLIST output on port 3309:

PROCESSLIST output on port 3309

Not quite as riveting as a live demo, I'm sure, but the instructions and code necessary to reproduce this quickly on your own local machine are above.  Hopefully this will prove useful to users who want to scale out, maintain and reconfigure Connector/J-driven applications without downtime.

Connector/J load-balancing for auto-commit-enabled deployments

In a previous blog post about Connector/J’s failover policies, I described three triggers which cause Connector/J to re-balance (potentially selecting a new physical connection to another host):

  1. At transaction boundaries (transactions are explicitly committed or rolled back)
  2. A communication exception (SQL State starting with “08”) is encountered
  3. When a SQLException matches conditions defined by the user, using the extension points provided by the loadBalanceSQLStateFailover, loadBalanceSQLExceptionSubclassFailover or loadBalanceExceptionChecker properties.

Those conditions fit most needs very well, but there are situations where people are running with auto-commit enabled (no explicit transaction commit or rollback), and the end result is that Connector/J never re-balances, and keeps chugging away using the same physical connection.  That’s a problem – particularly when load-balancing is being used under the hood to distribute read-only load across multiple slaves.  And as of today, there is new functionality which allows Connector/J to be configured to re-balance after a certain number of statements when auto-commit is enabled.  This functionality is dependent upon the following new properties:

  • loadBalanceAutoCommitStatementThreshold – defines the number of matching statements which will trigger the driver to (potentially) swap physical server connections.  The default value (0) retains the previously-established behavior that connections with auto-commit enabled are never balanced.
  • loadBalanceAutoCommitStatementRegex – the regular expression against which statements must match.  The default value (blank) matches all statements.

So, for example, using the following properties will cause Connector/J to re-balance after every third statement which contains the string “test”:

loadBalanceAutoCommitStatementThreshold=3

loadBalanceAutoCommitStatementRegex=.*test.*
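In JDBC URL form (host names are placeholders), that configuration might look like:

String url = "jdbc:mysql:loadbalance://host1:3306,host2:3306/test"
        + "?loadBalanceAutoCommitStatementThreshold=3"
        + "&loadBalanceAutoCommitStatementRegex=.*test.*";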

Why does loadBalanceAutoCommitStatementRegex exist?  Maybe your application uses temporary tables.  Maybe it uses server-side variables.  Maybe it uses some other connection state that wouldn’t be good to lose by letting the driver arbitrarily swap physical connections before you are done with it (this post explores some of these scenarios).  This property allows you to identify a trigger statement which is only executed at times when it is known to be safe to swap physical connections.  Generally, you likely won’t need it, but it may come in handy if you rely on server-side session state.

Load-balancing for MySQL Cluster

Shortly after I wrote my last post regarding some advanced Connector/J load-balancing properties, Anirudh published a post describing configuration of RHEL LVS for load-balancing and failover of MySQL Cluster SQL nodes.  It’s an interesting post, and I admit I know very little about RHEL LVS, but it reminded me of problems I experienced when trying to set up load-balanced ColdFusion(!) servers at my last job, years back.  We ended up with a nice hardware load-balancer sitting in front of multiple ColdFusion web servers.  The problem we found was that our application depended upon session state, which was stored (of course) on a single web server.  The load-balancer allowed us to define sticky sessions, which is what we did, but it cost us.

We couldn’t really balance load – we could balance session counts, sort of.  Every time a new session started, the balancer would pick which server would handle that session – for the full duration of the session.  Some sessions might be short and generate little load, while others might be very long and represent a huge amount of load.

We also had a limited HA solution.  We implemented a heartbeat function so that when a web server went offline, the load-balancer would re-route affected users to an available server.  But because the session data was stored on the original server, the user had to log in again and recreate session data.  If the user was in the middle of a complex transaction, too bad.

The above problem also made maintenance a pain.  We could reconfigure the load-balancer on the fly to stop using a specific server for new sessions, but we couldn’t take that web server offline until all of the user sessions on that machine terminated.  That might take 5 minutes, or it might take 5 hours.

As I said, I’m no LVS expert, but I would expect similar problems when using it as a load-balancer for MySQL Cluster.  I suspect that only new connection requests are balanced, making persistent connections (like common Java connection pools) “sticky” to whatever machine the connection was originally assigned.  You probably cannot balance load at anything less than “connection” level, while Connector/J will rebalance after transactions or communications errors.  And anytime you lack the ability to redistribute load except at new connections, taking servers offline for maintenance will be problematic (Connector/J 5.1.13 provides a new mechanism to facilitate interruption-free maintenance, which I intend to blog about later).

This means it probably works best with other connectors which don’t support load-balancing, or with applications that don’t use persistent connections.  I wouldn’t use it instead of Connector/J’s load-balancing, and I definitely would not use it together with Connector/J’s load-balancing: Connector/J won’t understand that multiple MySQL server instances live behind a single address, and won’t be able to coordinate its load-balancing with LVS.

Connector/J’s load-balancing failover policies

Connector/J provides a useful load-balancing implementation for Cluster or multi-master deployments.  As of Connector/J 5.1.12, this same implementation is used under the hood for balancing load between read-only slaves with ReplicationDriver.  When trying to balance workload between multiple servers, though, the driver has to decide when it’s safe to swap servers – doing so in the middle of a transaction would not make applications very happy.  Many of the same principles which apply to autoReconnect also apply here – you don’t want to lose important state information.

As a result, Connector/J will only try to pick a new server when one of the following happens:

  1. At transaction boundaries (transactions are explicitly committed or rolled back)
  2. A communication exception (SQL State starting with “08”) is encountered
  3. When a SQLException matches conditions defined by the user, using the extension points provided by the loadBalanceSQLStateFailover, loadBalanceSQLExceptionSubclassFailover or loadBalanceExceptionChecker properties.

The third condition is new, and revolves around three new properties introduced with Connector/J 5.1.13.  It allows you to control which SQLExceptions trigger failover. Let’s examine each of the new properties in detail.

loadBalanceExceptionChecker

The loadBalanceExceptionChecker property is really the key.  This takes a fully-qualified class name which implements the new com.mysql.jdbc.LoadBalanceExceptionChecker interface.  This interface is very simple, and you only need to implement the following method:

public boolean shouldExceptionTriggerFailover(SQLException ex)

In goes a SQLException, out comes a boolean.  True triggers a failover, false does not.  Easy!

You can use this to implement your own custom logic. An example where this might be useful is when dealing with transient errors with MySQL Cluster, where certain buffers may be overloaded. At the 2010 MySQL Conference, Mark Matthews and I presented a simple example during our tutorial which does this:

public class NdbLoadBalanceExceptionChecker
        extends StandardLoadBalanceExceptionChecker {

    public boolean shouldExceptionTriggerFailover(SQLException ex) {
        return super.shouldExceptionTriggerFailover(ex)
                || checkNdbException(ex);
    }

    private boolean checkNdbException(SQLException ex) {
        // Have to parse the message since most NDB errors
        // are mapped to the same DEMC, sadly.
        return (ex.getMessage().startsWith("Lock wait timeout exceeded")
                || (ex.getMessage().startsWith("Got temporary error")
                        && ex.getMessage().endsWith("from NDB")));
    }
}

The code above extends com.mysql.jdbc.StandardLoadBalanceExceptionChecker, which is the default implementation. There are a few convenient shortcuts built into this, for those who want some level of control using properties, without writing Java code. This default implementation uses the two remaining properties: loadBalanceSQLStateFailover and loadBalanceSQLExceptionSubclassFailover.
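To activate a custom checker like the one above, point the loadBalanceExceptionChecker property at its fully-qualified class name (a sketch; the host names and package are placeholders):

String url = "jdbc:mysql:loadbalance://host1:3306,host2:3306/test"
        + "?loadBalanceExceptionChecker=com.example.NdbLoadBalanceExceptionChecker";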

loadBalanceSQLStateFailover

The loadBalanceSQLStateFailover property allows you to define a comma-delimited list of SQLState code prefixes, against which a SQLException is compared. If the prefix matches, failover is triggered. So, for example, the following would trigger a failover if a given SQLException starts with “00”, or is “12345”:

loadBalanceSQLStateFailover=00,12345

loadBalanceSQLExceptionSubclassFailover

This property can be used in conjunction with loadBalanceSQLStateFailover or on its own. If you want certain subclasses of SQLException to trigger failover, simply provide a comma-delimited list of fully-qualified class or interface names to check against. For example, say you want all SQLTransientConnectionExceptions to trigger failover:

loadBalanceSQLExceptionSubclassFailover=java.sql.SQLTransientConnectionException

That’s all there is to it!

Do you really want autoReconnect to silently reconnect?

Chances are, if you write Java applications using MySQL’s Connector/J driver, you’ve run across the autoReconnect property.  I remember that when I first found it, it seemed I had found the grail itself.  “No more nasty connection closed error messages,” I thought.  Except … it doesn’t really work that way, does it?  I’ve seen this question asked many times in many different contexts:  “Why doesn’t Connector/J just reconnect to MySQL and re-issue my statement, instead of throwing this Exception?”

There are actually a number of reasons, starting with loss of transactional integrity.  The MySQL Manual states that “there is no safe method of reconnecting to the MySQL server without risking some corruption of the connection state or database state information.”   Imagine the following series of statements:

conn.setAutoCommit(false); // explicit transaction
conn.createStatement().execute(
  "UPDATE checking_account SET balance = balance - 1000.00 WHERE customer='Todd'");
conn.createStatement().execute(
  "UPDATE savings_account SET balance = balance + 1000.00 WHERE customer='Todd'");
conn.commit();

Now, what happens if the connection to the server dies after the UPDATE to checking_account? If no Exception is thrown, and the application never learns about the problem, it keeps going.  Except that the server never committed the transaction, so that gets rolled back. But then you start a new transaction by increasing the savings_account balance by $1000.00. Your application never got an Exception, so it kept plodding through, eventually committing. But the commit only applies to the changes made in the new connection, which means you’ve just increased the savings_account balance by $1000.00 without the corresponding reduction to checking_account.  Instead of transferring $1000.00, you just gave me $1000.00.  Thanks!

“So?” you say.  “I run with auto-commit enabled.  That won’t cause any problems like that.”

Actually, it can be worse.  When Connector/J encounters a communication problem, there’s no way to know whether the server processed the currently-executing statement or not.  The following theoretical states are equally possible:

  • The server never received the statement, and therefore nothing happened on the server.
  • The server received the statement, executed it in full, but the response never got to the client.

If you are running with auto-commit enabled, you simply cannot guarantee the state of data on the server when a communication exception is encountered.  The statement may have reached the server; it may have not.  All you know is that communication died at some point, before the client received confirmation (or data) from the server.  This doesn’t just affect auto-commit statements, though – imagine if the communication problem pops up during Connection.commit().  Did it commit on the server before communication died?  Or did the server never receive the COMMIT request?  Ugh.

There’s also transaction-scoped contextual data to be concerned about.  For example:

  • Temporary tables
  • User-defined variables
  • Server-side prepared statements

These things all die when connections die, and if your application uses any of them, any number of ugly things could happen, some silently, if a connection is re-established but the application plods on unknowingly.

The bottom line is that communication errors generate conditions which may well be unsafe for the driver to silently reconnect and retry, and the application should be notified.  As an application developer, how you handle that information is up to you, but you should be glad that Connector/J notifies you.
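If you do choose to retry, retry whole transactions, never individual statements. A minimal sketch, reusing the transfer example above and assuming a getNewConnection() helper (note the commit-ambiguity caveat in the comments):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class TransferWithRetry {

    static void transfer(int maxAttempts) throws Exception {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Connection conn = getNewConnection();
            try {
                conn.setAutoCommit(false);
                conn.createStatement().execute(
                    "UPDATE checking_account SET balance = balance - 1000.00 WHERE customer='Todd'");
                conn.createStatement().execute(
                    "UPDATE savings_account SET balance = balance + 1000.00 WHERE customer='Todd'");
                conn.commit();
                return; // success: both statements committed together
            } catch (SQLException e) {
                // SQLStates starting with "08" indicate communication failures
                boolean commFailure = e.getSQLState() != null
                        && e.getSQLState().startsWith("08");
                if (!commFailure || attempt == maxAttempts) {
                    throw e;
                }
                // Caution: if the failure hit during commit(), the server may
                // already have applied the transaction; a real application
                // should verify state before retrying.
            } finally {
                try { conn.close(); } catch (SQLException ignore) { }
            }
        }
    }

    static Connection getNewConnection() throws SQLException {
        // Placeholder connection source
        return DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "root", "");
    }
}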