tirsdag den 14. maj 2013

MySQL and update statements


I recently had to do an SQL query that performed a 2x2 matrix multiplation where each field was represented by a column (e.g. column; a, b, c and d).

I then found out, that there's a bit of atomicity lacking when using MySQL. Depending on the way you view it, it can seem intuitive, however it can cause some problems when/if you're assuming the opposite, and/or when other db's act differently (e.g. Postgres).

The easiest example to consider, is swapping the value of two columns.

INSERT INTO mytable (id, col1, col2) VALUES(1, 500, 0);
UPDATE mytable SET col1 = col2, col2 = col1 WHERE id = 1;

Try to guess: What is the value of col2 after this operation?

The answer: It depends on which database you use.

My instincts told me that col2 would contain 500 after the operation. That is true for Postgres, but not true for MySQL. In MySQL col2 would contain 0.

I could imagine, that MySQL does this for reasons of performance, i.e. direct in-place updating of the column or something, but that's just me guessing.

In any case, there are several ways to work around this. At first, I introduced a buffer column to contain the old column value, e.g.:

UPDATE mytable SET tmp = col1, col1 = col2, col2 = tmp WHERE id = 1;

However, seeing that this may only be MySQL specific, I'm thinking about using a @variable instead, as this may be more efficient.



Drupal and temporary tables

Every once in a while, you find yourself in need of a temporary table. Fortunately, Drupal provides a very easy-to-use function: db_query_temporary(). Unfortunately, it's not very flexible.

The are 2 main things wrong with it:

  1. When using MySQL, the engine is hardcoded to MEMORY. While this may suit most purposes, it's certainly not flexible enough, should you need a temporary table containing a data set larger than whatever the DBA decided to use as a maximum for memory tables.
  2. Indices. Adding an index to a temporary table can sometimes be useful when aggregating data.
I've of course had the need to create both large temporary tables and add an index to it, and found that this is not possible out-of-the-box in Drupal.

Regarding the indices, first thing that comes to mind, is just adding the index to the table via db_add_primary_key() after having created the temporary table using db_query_temporary(). This will work for a MySQL backend, but not for other backends. The reason is, that the db_index operations (e.g. db_add_primary_key(), db_add_index(), etc.) checks if the table exists before trying. That may sound like a good thing, but the way it checks it, is by examining the information_schema where temporary tables do not exist. The reason it works for MySQL, is that the table exists function is implemented as a select query to the table in question (for reasons of performance). If it doesn't fail, the table exists.

So, for MySQL, we are currently able to add indices to a temporary table, but the table is still limited in size by the MEMORY storage engine.

I, of course, want this to work for all backends. I think I'll post a feature request on drupal.org about this at some time, once I get around to getting my Drupal 8 installation up and running again.

The feature proposal is simple; add a db_create_temporary_table() function, which acts just like db_create_table() except it creates a temporary table.

Until then, here's a workaround (tested with MySQL, Postgres and SQLite):


/**
 * Creates a new temporary table from a Drupal table definition.
 *
 * @param $table
 *   A Schema API table definition array.
 *
 * @return
 *   The name of the temporary table.
 */
function db_create_temporary_table($table) {
  static $index = 0;

  $connection = Database::getConnection();
  $schema = $connection->schema();
  $method = new ReflectionMethod($schema, 'createTableSql');
  $method->setAccessible(TRUE);

  $tablename = "_db_temporary_" . $index++;
  $statements = $method->invoke($schema, $tablename, $table);
  $statements[0] = preg_replace('/^CREATE TABLE /', 'CREATE TEMPORARY TABLE ', $statements[0]);
  foreach ($statements as $statement) {
    $connection->query($statement);
  }
  return $tablename;
}