Monthly Archives: July 2010

Load-balancing for MySQL Cluster

Shortly after I wrote my last post regarding some advanced Connector/J load-balancing properties, Anirudh published a post describing configuration of RHEL LVS for load-balancing and failover of MySQL Cluster SQL nodes.  It’s an interesting post, and I admit I know very little about RHEL LVS, but it reminded me of problems I experienced when trying to set up load-balanced ColdFusion(!) servers at my last job, years back.  We ended up with a nice hardware load-balancer sitting in front of multiple ColdFusion web servers.  The problem we found was that our application depended upon session state, which was stored (of course) on a single web server.  The load-balancer allowed us to define sticky sessions, which is what we did, but it cost us.

We couldn’t really balance load; we could balance session counts, sort of.  Every time a new session started, the balancer would pick which server would handle that session for its full duration.  Some sessions might be short and generate little load, while others might be very long and represent a huge amount of load.

We also had a limited HA solution.  We implemented a heartbeat function so that when a web server went offline, the load-balancer would re-route affected users to an available server.  But because the session data was stored on the original server, the user had to log in again and recreate session data.  If the user was in the middle of a complex transaction, too bad.

The above problem also made maintenance a pain.  We could reconfigure the load-balancer on the fly to stop using a specific server for new sessions, but we couldn’t take that web server offline until all of the user sessions on that machine terminated.  That might take 5 minutes, or it might take 5 hours.

As I said, I’m no LVS expert, but I would expect similar problems when using it as a load-balancer for MySQL Cluster.  I suspect that only new connection requests are balanced, making persistent connections (like common Java connection pools) “sticky” to whatever machine the connection was originally assigned.  You probably cannot balance load at anything less than “connection” level, while Connector/J will rebalance after transactions or communications errors.  And anytime you lack the ability to redistribute load except at new connections, taking servers offline for maintenance will be problematic (Connector/J 5.1.13 provides a new mechanism to facilitate interruption-free maintenance, which I intend to blog about later).

This means that LVS probably works best with other connectors which don’t support load-balancing, or with applications that don’t use persistent connections.  I wouldn’t use it instead of Connector/J’s load-balancing, and I definitely would not use it together with Connector/J’s load-balancing: Connector/J won’t understand that multiple MySQL server instances live behind a single address, and won’t be able to coordinate load-balancing with LVS.
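
For contrast, here is a minimal sketch of how the SQL nodes might be listed individually in a Connector/J load-balancing URL, so that the driver sees each server rather than one virtual address.  The helper class, host names and database name are made up for illustration; only the jdbc:mysql:loadbalance:// prefix comes from the driver.

```java
import java.util.List;

// Hypothetical helper (not part of Connector/J): builds a load-balancing
// JDBC URL that names every SQL node explicitly.
final class LoadBalanceUrl {

    private LoadBalanceUrl() {
    }

    // hostPorts holds entries such as "sql1:3306"; database is the schema name.
    static String build(List<String> hostPorts, String database) {
        if (hostPorts == null || hostPorts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        // The driver expects the hosts as a comma-delimited list.
        return "jdbc:mysql:loadbalance://" + String.join(",", hostPorts) + "/" + database;
    }
}
```

Passing the resulting string to DriverManager.getConnection() should hand balancing decisions to the driver, which is the opposite of hiding the nodes behind a single LVS address.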

Connector/J’s load-balancing failover policies

Connector/J provides a useful load-balancing implementation for Cluster or multi-master deployments.  As of Connector/J 5.1.12, this same implementation is used under the hood for balancing load between read-only slaves with ReplicationDriver.  When trying to balance workload between multiple servers, though, the driver has to decide when it’s safe to swap servers – doing so in the middle of a transaction would not make applications very happy.  Many of the same principles which apply to autoReconnect also apply here – you don’t want to lose important state information.

As a result, Connector/J will only try to pick a new server when one of the following happens:

  1. A transaction boundary is reached (a transaction is explicitly committed or rolled back)
  2. A communication exception (SQL State starting with “08”) is encountered
  3. A SQLException matches conditions defined by the user, using the extension points defined by the loadBalanceSQLStateFailover, loadBalanceSQLExceptionSubclassFailover or loadBalanceExceptionChecker properties.

The third condition is new, and revolves around three new properties introduced with Connector/J 5.1.13.  It allows you to control which SQLExceptions trigger failover. Let’s examine each of the new properties in detail.


The loadBalanceExceptionChecker property is really the key.  This takes a fully-qualified class name which implements the new com.mysql.jdbc.LoadBalanceExceptionChecker interface.  This interface is very simple, and you only need to implement the following method:

public boolean shouldExceptionTriggerFailover(SQLException ex)

In goes a SQLException, out comes a boolean.  True triggers a failover, false does not.  Easy!
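
To make that contract concrete, here is a small standalone sketch of the decision logic.  It deliberately does not implement the driver interface, so that it compiles without Connector/J on the classpath; in real use the class would implement com.mysql.jdbc.LoadBalanceExceptionChecker, which may require additional lifecycle methods not shown here.  The class name is hypothetical, and treating a lock wait timeout (MySQL server error 1205) as a failover trigger is just an example policy.

```java
import java.sql.SQLException;

// Standalone sketch of a checker; the class name is made up, and it is not
// wired into Connector/J.
final class LockTimeoutFailoverChecker {

    // MySQL server error 1205: "Lock wait timeout exceeded; try restarting transaction".
    private static final int ER_LOCK_WAIT_TIMEOUT = 1205;

    // In goes a SQLException, out comes a boolean.
    public boolean shouldExceptionTriggerFailover(SQLException ex) {
        // Walk the chain in case further exceptions were attached to the first one.
        for (SQLException e = ex; e != null; e = e.getNextException()) {
            if (e.getErrorCode() == ER_LOCK_WAIT_TIMEOUT) {
                return true;
            }
        }
        return false;
    }
}
```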

You can use this to implement your own custom logic. An example where this might be useful is when dealing with transient errors with MySQL Cluster, where certain buffers may be overloaded. At the 2010 MySQL Conference, Mark Matthews and I presented a simple example during our tutorial which does this:

import java.sql.SQLException;
import com.mysql.jdbc.StandardLoadBalanceExceptionChecker;

public class NdbLoadBalanceExceptionChecker
 extends StandardLoadBalanceExceptionChecker {

 // Fail over on the default conditions, or on a transient NDB error.
 public boolean shouldExceptionTriggerFailover(SQLException ex) {
  return super.shouldExceptionTriggerFailover(ex)
    ||  checkNdbException(ex);
 }

 private boolean checkNdbException(SQLException ex){
 // Have to parse the message since most NDB errors
 // are mapped to the same DEMC, sadly.
  String message = ex.getMessage();
  return message != null
   && (message.startsWith("Lock wait timeout exceeded")
   || (message.startsWith("Got temporary error")
   && message.endsWith("from NDB")));
 }
}

The code above extends com.mysql.jdbc.StandardLoadBalanceExceptionChecker, which is the default implementation. There are a few convenient shortcuts built into this, for those who want to have some level of control using properties, without writing Java code. This default implementation uses the two remaining properties: loadBalanceSQLStateFailover and loadBalanceSQLExceptionSubclassFailover.


The loadBalanceSQLStateFailover property allows you to define a comma-delimited list of SQLState code prefixes, against which the SQLState of a SQLException is compared. If a prefix matches, failover is triggered. So, for example, setting loadBalanceSQLStateFailover=00,12345 would trigger a failover if the SQLState of a given SQLException starts with “00”, or is “12345”.
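
The prefix comparison can be sketched roughly as follows, using a list such as 00,12345.  This is my own illustration of the matching rule described above, not the driver's code, and the class name is hypothetical.

```java
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: mimics comma-delimited SQLState prefix matching.
final class SqlStatePrefixMatcher {

    private final List<String> prefixes = new ArrayList<>();

    // commaDelimited is a value such as "00,12345".
    SqlStatePrefixMatcher(String commaDelimited) {
        for (String part : commaDelimited.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                prefixes.add(trimmed);
            }
        }
    }

    // True when the exception's SQLState starts with any configured prefix.
    boolean matches(SQLException ex) {
        String state = ex.getSQLState();
        if (state == null) {
            return false;
        }
        for (String prefix : prefixes) {
            if (state.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Because a five-character entry like 12345 is still treated as a prefix, it ends up matching only that exact SQLState.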



The loadBalanceSQLExceptionSubclassFailover property can be used in conjunction with loadBalanceSQLStateFailover or on its own. If you want certain subclasses of SQLException to trigger failover, simply provide a comma-delimited list of fully-qualified class or interface names to check against. For example, to have all SQLTransientConnectionExceptions trigger failover, set loadBalanceSQLExceptionSubclassFailover=java.sql.SQLTransientConnectionException.
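
A rough sketch of how such a class-name list could be checked against an exception, here with java.sql.SQLTransientConnectionException as the configured name.  Again, this is an illustration under my own assumptions rather than the driver's implementation, and the class name is hypothetical.

```java
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: mimics matching against fully-qualified class or
// interface names.
final class SqlExceptionSubclassMatcher {

    private final List<Class<?>> types = new ArrayList<>();

    // commaDelimited is a value such as "java.sql.SQLTransientConnectionException".
    SqlExceptionSubclassMatcher(String commaDelimited) {
        for (String part : commaDelimited.split(",")) {
            String name = part.trim();
            if (name.isEmpty()) {
                continue;
            }
            try {
                types.add(Class.forName(name));
            } catch (ClassNotFoundException e) {
                throw new IllegalArgumentException("Unknown class: " + name, e);
            }
        }
    }

    // True when the exception is an instance of any configured class or interface.
    boolean matches(SQLException ex) {
        for (Class<?> type : types) {
            if (type.isInstance(ex)) {
                return true;
            }
        }
        return false;
    }
}
```

Since isInstance() follows the class hierarchy, naming a parent class such as java.sql.SQLTransientException would also cover its subclasses.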


That’s all there is to it!

Do you really want autoReconnect to silently reconnect?

Chances are, if you write Java applications using MySQL’s Connector/J driver, you’ve run across the autoReconnect property.  I remember that when I first found it, it seemed I had found the grail itself.  “No more nasty connection closed error messages,” I thought.  Except … it doesn’t really work that way, does it?  I’ve seen this question asked many times in many different contexts:  “Why doesn’t Connector/J just reconnect to MySQL and re-issue my statement, instead of throwing this Exception?”

There are actually a number of reasons, starting with loss of transactional integrity.  The MySQL Manual states that “there is no safe method of reconnecting to the MySQL server without risking some corruption of the connection state or database state information.”   Imagine the following series of statements:

  conn.setAutoCommit(false);
  stmt.executeUpdate("UPDATE checking_account SET balance = balance - 1000.00 WHERE customer='Todd'");
  stmt.executeUpdate("UPDATE savings_account SET balance = balance + 1000.00 WHERE customer='Todd'");
  conn.commit();

Now, what happens if the connection to the server dies after the UPDATE to checking_account? If no Exception is thrown, and the application never learns about the problem, it keeps going.  Except that the server never committed the transaction, so that gets rolled back. But then you start a new transaction on the new connection by increasing the savings_account balance by 1000.00. Your application never got an Exception, so it kept plodding through, eventually committing. But the commit only applies to the changes made in the new connection, which means you’ve just increased the savings_account balance by 1000.00 without the corresponding reduction to checking_account.  Instead of transferring $1000.00, you just gave me $1000.00.  Thanks!

“So?” you say.  “I run with auto-commit enabled.  That won’t cause any problems like that.”

Actually, it can be worse.  When Connector/J encounters a communication problem, there’s no way to know whether the server processed the currently-executing statement or not.  The following theoretical states are equally possible:

  • The server never received the statement, and therefore nothing happened on the server.
  • The server received the statement, executed it in full, but the response never got to the client.

If you are running with auto-commit enabled, you simply cannot guarantee the state of data on the server when a communication exception is encountered.  The statement may have reached the server; it may not have.  All you know is that communication died at some point, before the client received confirmation (or data) from the server.  This doesn’t just affect auto-commit statements, though: imagine if the communication problem pops up during Connection.commit().  Did it commit on the server before communication died?  Or did the server never receive the COMMIT request?  Ugh.

There’s also session-scoped contextual data to be concerned about.  For example:

  • Temporary tables
  • User-defined variables
  • Server-side prepared statements

These things all die when connections die, and if your application uses any of them, any number of ugly things could happen (some silently) if a connection is re-established but the application plods on unknowingly.

The bottom line is that communication errors generate conditions which may well be unsafe for the driver to silently reconnect and retry, and the application should be notified.  As an application developer, how you handle that information is up to you, but you should be glad that Connector/J notifies you.
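
As one possible way to act on that notification, here is a hedged sketch in which the application, not the driver, decides what is safe to repeat.  All names are hypothetical.  It assumes the caller only passes work that can be re-run from scratch without harm (for example, a read-only query that obtains its own connection on each attempt); anything else should let the exception propagate.

```java
import java.sql.SQLException;

// Hypothetical application-level helper; nothing here is part of Connector/J.
final class CommFailureHandling {

    // A unit of work the caller asserts is safe to repeat from the start.
    interface RepeatableWork<T> {
        T run() throws SQLException;
    }

    private CommFailureHandling() {
    }

    // Communication errors carry a SQLState starting with "08".
    static boolean isCommunicationFailure(SQLException ex) {
        String state = ex.getSQLState();
        return state != null && state.startsWith("08");
    }

    // Re-runs the work after communication failures, up to maxAttempts times.
    // Any other SQLException is rethrown immediately.
    static <T> T runRepeatable(RepeatableWork<T> work, int maxAttempts) throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.run();
            } catch (SQLException ex) {
                if (!isCommunicationFailure(ex)) {
                    throw ex;
                }
                last = ex; // remember the failure and try again
            }
        }
        throw last; // every attempt failed with a communication error
    }
}
```

The money transfer from earlier is exactly the kind of work that should not go through a helper like this unless the application can first verify what the server actually committed.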

MySQL Support has a good home at Oracle – and we’re hiring!

There’s exciting news about Ulf Sandberg starting up a new MySQL services company.  I respect Ulf tremendously – he’s a great guy with good business sense.  That said, others have suggested that staff from the MySQL Support team need a new home – an assertion that simply is untrue.  MySQL Support has a great home at Oracle, and here’s why:

  • Nobody is closer to the development of MySQL than Oracle’s MySQL Support Team.  We work for the same company, share resources and knowledge, communicate directly.  We’ve had Support staff move to Development, and had Development staff move to Support.
  • Nobody is better qualified to support the broad needs of MySQL users on a daily basis.  We hire expert-level, experienced staff for Support, and we don’t do “tiered support”.   With a broad range of products to support and services to offer (MySQL Server to Cluster, ODBC to NDB API connectors, WorkBench to command-line clients, file system optimization to schema design, installation help to stack trace analysis), it takes all types to provide a cohesive, one-stop support shop.  We have that.
  • Never has the owner of MySQL been more committed to – and capable of – delivering on the needs of customers.
  • MySQL is now backed by the resources, stability and commitment of Oracle.

Yes, it is true we’ve lost staff over the past year.  The good news is that we’re developing and hiring more such staff, and you may have what it takes to be one!  It’s fun, it’s exciting, it’s stable, and, at the end of the day, it’s good for a career.  While it’s always hard to see senior staff move on, I’m extremely proud to see the skills they honed in MySQL Support put to use in service of companies like Facebook.

So, if you are an experienced MySQL DBA, who loves working with people and drawing on a wide range of technical skills to solve challenging problems, and wants the chance to take your career to the next level, consider applying today (note: link is to US-based position, but we are hiring elsewhere, too – email me at todd[dot]farmer[at]oracle[dot]com for details)!