Thursday, September 8, 2016

Using multiple emails with Git

Travis CI, for example, uses the email address in the git commit for notifications.
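The address in question can be inspected straight from the commit log; a minimal sketch in a throwaway repository (assuming git is on PATH):

```shell
# Sketch: the email Travis CI notifies is the one recorded in the commit,
# which git takes from user.email at commit time.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Example"
git config user.email "myname@project1.com"
git commit -q --allow-empty -m "test"
commit_email=$(git log -1 --format='%ae')   # author email of the last commit
echo "$commit_email"
```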

~/.gitconfig
[organization "git@github.com:myproject1"]
  email = myname@project1.com

[organization "git@github.com:myproject2"]
  email = myname@project2.com

[organization "git@github.com:personal"]
  email = myname@personal.com
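git itself knows nothing about an [organization "..."] section, but arbitrary sections are still readable with git config; a quick sketch against a throwaway config file:

```shell
# Sketch: read a custom [organization "<url>"] section back out.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
[organization "git@github.com:myproject1"]
  email = myname@project1.com
EOF
org_email=$(git config --file "$cfg" organization."git@github.com:myproject1".email)
echo "$org_email"
rm -f "$cfg"
```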
.bashrc
alias git=~/bin/git-org.sh
~/bin/git-org.sh
#!/bin/bash

ORG=$(~/bin/git-org.php)

if [[ -n "$ORG" ]]; then
  ORG_EMAIL=$(/usr/local/bin/git config "organization.$ORG.email")
  if [[ -n "$ORG_EMAIL" ]]; then
    EMAIL=$(/usr/local/bin/git config user.email)
    if [[ "$EMAIL" != "$ORG_EMAIL" ]]; then
      echo "User mail is: $EMAIL"
      echo "Organization mail should be: $ORG_EMAIL"
      echo "Switching to email: $ORG_EMAIL"
      /usr/local/bin/git config user.email "$ORG_EMAIL"
    fi
  fi
fi

/usr/local/bin/git "$@"
~/bin/git-org.php
#!/usr/local/bin/php
<?php

$remote = `git remote -v 2> /dev/null`;

if (!$remote) {
    exit;
}

$url = null;
$lines = explode("\n", $remote);
foreach ($lines as $line) {
    if (preg_match('/origin\s+(.*?)\s+\(fetch\)$/', $line, $matches)) {
        $url = $matches[1];
        break;
    }
}

if (!$url) {
    exit;
}

$parsed = parse_url($url);

if (!isset($parsed['scheme'])) {
    $pos = strpos($url, ':');
    if ($pos !== false) {
        $url = substr_replace($url, '/', $pos, 1);
        $url = 'git://' . $url;
        $parsed = parse_url($url);
    }
    else {
        exit;
    }
}

$user = $parsed['user'] ?? '';
$host = $parsed['host'] ?? '';
list($org) = explode('/', ltrim($parsed['path'] ?? '', '/'), 2);
print "$user@$host:$org\n";
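For an scp-style remote such as git@github.com:myproject1/repo.git the script prints git@github.com:myproject1, which matches the section names in ~/.gitconfig above. The same extraction can be sketched in plain shell with parameter expansion (the URL is a made-up example):

```shell
# Sketch: derive the organization key from an scp-style remote URL.
url="git@github.com:myproject1/repo.git"
userhost=${url%%:*}    # everything before the first colon: git@github.com
path=${url#*:}         # everything after it: myproject1/repo.git
org=${path%%/*}        # first path segment: myproject1
org_key="$userhost:$org"
echo "$org_key"
```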

Friday, November 7, 2014

Creating field collections programmatically

I had the challenge of creating a new node containing multivalue field collections within multivalue field collections, etc.

This posed no problem as such, except perhaps for performance. Whenever a field collection is saved, its host entity is saved as well. In my case this caused the node to be saved 26 times, which is perhaps a bit of overkill for a new node.

The FieldCollectionItem entity has an internal parameter ($skip_host_save) on the save() method. However, this is not available via entity_save() or via the entity metadata wrapper, the latter of which I use heavily.

The class below will allow you to call an entity object's save() method with arguments via an entity metadata wrapper object.

/**
 * Class EntityDrupalWrapperSaveArguments
 *
 * Hax0r class for saving field collection without saving the host entity.
 *
 * When saving a new field collection, the host entity will be saved as well.
 * This can result in several unnecessary (1) saves of the host entity,
 * especially if creating a new node with many field collection fields.
 *
 * The FieldCollectionItem::save() supports an internal option for only saving
 * the field collection entity, but there's no way to send this option via
 * entity_save() or EntityDrupalWrapper::save(). Only through Entity::save() is
 * this possible. However, if we use Entity::save() directly, we lose all the
 * benefits of the metadata wrapper.
 *
 * Instead we implement a "pseudo" class, which has access to the entity
 * metadata wrapper object's protected variables.
 *
 * (1) They SEEM unnecessary.
 */
class EntityDrupalWrapperSaveArguments extends EntityDrupalWrapper {

  /**
   * Re-implementation of EntityDrupalWrapper->save().
   *
   * When saving a new entity, the wrapper object's id must be updated.
   * Since this is a protected variable, we implement this method in a class
   * "pretending" to be an EntityDrupalWrapper class. Thereby we gain access to
   * protected variables in other objects of the same type.
   *
   * @see EntityDrupalWrapper::save().
   */
  static public function saveArguments() {
    $args = func_get_args();
    $wrapper = array_shift($args);
    if ($wrapper->data) {
      if (!entity_type_supports($wrapper->type, 'save')) {
        throw new EntityMetadataWrapperException("There is no information about how to save entities of type " . check_plain($wrapper->type) . '.');
      }
      self::entity_save_arguments($wrapper->type, $wrapper->data, $args);
      // On insert, update the identifier afterwards.
      if (!$wrapper->id) {
        list($wrapper->id, , ) = entity_extract_ids($wrapper->type, $wrapper->data);
      }
    }
    // If the entity hasn't been loaded yet, don't bother saving it.
    return $wrapper;
  }

  /**
   * Re-implementation of entity_save().
   *
   * In order to send arguments to the entity's save() method, we need to
   * re-implement the logic from entity_save().
   *
   * This function takes an extra argument ($args) compared to entity_save().
   * $args contains an array of the arguments passed to $entity->save().
   *
   * Note: there's a difference between entity_save() and $entity->save().
   *
   * @see entity_save().
   * @see Entity::save().
   */
  static public function entity_save_arguments($entity_type, $entity, $args) {
    $info = entity_get_info($entity_type);
    if (method_exists($entity, 'save')) {
      return call_user_func_array(array($entity, 'save'), $args);
    }
    elseif (isset($info['save callback'])) {
      $info['save callback']($entity);
    }
    elseif (in_array('EntityAPIControllerInterface', class_implements($info['controller class']))) {
      return entity_get_controller($entity_type)->save($entity);
    }
    else {
      return FALSE;
    }
  }
}

// Create an Entity and populate it
$entity = entity_create('node', array('type' => 'article'));
$entity->uid = 1;
$entity->title = 'test';

$wrapper = entity_metadata_wrapper('node', $entity);

$fc_entity = entity_create('field_collection_item', array('field_name' => 'field_my_field_collection_field'));
$fc_entity->setHostEntity('node', $entity);

$fc_wrapper = entity_metadata_wrapper('field_collection_item', $fc_entity);
$fc_wrapper->field_my_field_inside_the_field_collection->set('some value');

// Old style save.
// $fc_wrapper->save();

// New style save.
// Equivalent to $fc_entity->save(TRUE), but retains the functionality of the metadata wrapper.
EntityDrupalWrapperSaveArguments::saveArguments($fc_wrapper, TRUE);




Disclaimer: I'm not responsible if you hurt yourself with this code. And be aware that using this code will also bypass some presave/update/insert handlers. Which could hurt you. Big time. And I'm not responsible.

Friday, September 26, 2014

Centralized cron and logging

Once too often, I come across modules that implement their own infrastructure, such as module-specific cron handling and logging.

While this may sometimes be practical, it almost inevitably leads to the same problems.

1.) Specific cron handling is inflexible and is implemented in a different way in each module, each with its own bugs and quirks.

2.) Debug logging is always missing when you need it the most. One reason it's usually missing is that you don't want to inadvertently spam the error log, and you may not have time to implement a debug-log toggle. Even if a debug-log toggle is implemented, each module will again do it in its own way, each with its own quirks and bugs.


I propose a centralised solution for this.

1.) Use Ultimate Cron for handling cron. This isolates the cron handling in Ultimate Cron, making it a bit easier to debug, since multiple parallel implementations of cron handling won't exist.

2.) Use Watchdog Filtering for handling debug-log toggling. This way, you can implement debug logging in your module where it makes sense, without having to worry about performance or spamming logs in the production system. The debug-logging can easily be switched on/off, even per module. So when you eventually DO need debug logging in the production system (because, hey, that WILL happen), you don't need to first implement debug logging AND deploy the code.

Friday, August 29, 2014

Renumbering hook_update_N

I've on more than one occasion come across hook_update_N implementations that weren't numbered correctly.

Normally this doesn't cause much of a problem, but wrong update hook numbering just hurts my soul.

The documentation states: "Never renumber update hooks". Well ... screw that :-). Let's try and do it anyway, in a sensible and hopefully elegant manner.


The old and wrong hook_update_N numbering could look like this:
/**
 * Do stuff ...
 */
function mymodule_update_7000(&$sandbox) {
  // Do stuff ...
}

/**
 * Do stuff ...
 */
function mymodule_update_7001(&$sandbox) {
  // Do stuff ...
}

/**
 * Do stuff ...
 */
function mymodule_update_7100(&$sandbox) {
  // Do stuff ...
}


7000 and 7001 are wrong in this case. Someone then skipped to 7100 in an attempt to fix the numbering, but obviously failed. What we really want is:

/**
 * Do stuff ...
 */
function mymodule_update_7101(&$sandbox) {
  // Do stuff ...
}

/**
 * Do stuff ...
 */
function mymodule_update_7102(&$sandbox) {
  // Do stuff ...
}

/**
 * Do stuff ...
 */
function mymodule_update_7103(&$sandbox) {
  // Do stuff ...
}


How do we get from the old to the new?

/**
 * Mapping of schema versions (old => new).
 *
 * @return array
 */
function _mymodule_get_mapping() {
  return array(
    7000 => 7101,
    7001 => 7102,
    7100 => 7103,
  );
}

/**
 * Implements hook_requirements().
 *
 * Check and fix schema version before updating.
 */
function mymodule_requirements($phase) {
  switch ($phase) {
    case 'update':
      _mymodule_renumber_schema_version();
      break;
  }
}

/**
 * Drush does not invoke hook_requirements('update') for modules other
 * than 'system'.
 *
 * Attempt to intercept the command updatedb (updb), and fix the schema
 * version if applicable.
 */
if (drupal_is_cli() && function_exists('drush_get_command')) {
  $command = drush_get_command();
  if (isset($command['command']) && $command['command'] == 'updatedb') {
    _mymodule_renumber_schema_version();
  }
}

/**
 * Renumber current schema version according to mapping.
 */
function _mymodule_renumber_schema_version() {
  $schema_version = _mymodule_get_current_schema_version();
  $mapping = _mymodule_get_mapping();

  if (!empty($mapping[$schema_version])) {
    $new_version = $mapping[$schema_version];
    drupal_set_installed_schema_version('mymodule', $new_version);
  }
}

/**
 * Helper function for getting the current schema version.
 */
function _mymodule_get_current_schema_version() {
  return db_select('system', 's')
    ->fields('s', array('schema_version'))
    ->condition('name', 'mymodule')
    ->execute()
    ->fetchField();
}

/**
 * Do stuff ...
 */
function mymodule_update_7101(&$sandbox) {
  // Do stuff ...
}

/**
 * Do stuff ...
 */
function mymodule_update_7102(&$sandbox) {
  // Do stuff ...
}

/**
 * Do stuff ...
 */
function mymodule_update_7103(&$sandbox) {
  // Do stuff ...
}



Caveat: There might very well be some slight problems regarding hook_update_dependencies(), if other modules implement it against your module.

Monday, May 5, 2014

Uncommon but not unimportant MySQL optimizations for Drupal

Auto-Increment locking mode


https://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increment-handling.html

In most cases (if not all, for Drupal), it is not necessary for auto-increment values to be consecutive; monotonically increasing is good enough. Using the "interleaved" lock mode (2) improves scalability, because fewer auto-increment locks are taken.

innodb_autoinc_lock_mode = 2

This mode only works with replication if the binlog format is set to either "row" or "mixed"; "statement"-based replication will not work with the "interleaved" lock mode.
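Put together, a minimal my.cnf sketch (values are illustrative, not a recommendation for every setup):

```ini
# Interleaved auto-increment locking plus a binlog format it is safe with.
[mysqld]
innodb_autoinc_lock_mode = 2
binlog_format = mixed
```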


Isolation level


http://dev.mysql.com/doc/refman/5.1/en/dynindex-isolevel.html

The default isolation level in MySQL is REPEATABLE-READ. The READ-COMMITTED isolation level is a bit more relaxed with regard to locking and thus scales better. Nothing in Drupal requires REPEATABLE-READ, and the default level for PostgreSQL is also READ-COMMITTED. The current trend for Drupal also seems to be moving towards READ-COMMITTED.

transaction-isolation = READ-COMMITTED
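If changing the server-wide default isn't an option, the isolation level can also be set per connection from Drupal 7's settings.php via the connection's init_commands option (a sketch; credentials are placeholders):

```php
// settings.php sketch: run a SET command when the connection is opened.
$databases['default']['default'] = array(
  'driver' => 'mysql',
  'database' => 'drupal',
  'username' => 'drupal',
  'password' => 'secret',
  'host' => 'localhost',
  'init_commands' => array(
    'isolation' => "SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED",
  ),
);
```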

Tuesday, May 14, 2013

Using non-transactional cache backends in Drupal

Using a backend other than the DrupalDatabaseCache backend can cause cache consistency issues (read: lead to corrupted cache) especially during high load.

For example, when using memcache as a cache backend, memcache does not know about the database transactions currently in progress. This leads to premature cache invalidation if cache_clear_all() is used inside a transaction. In fact, all cache backends other than the DrupalDatabaseCache backend suffer from this.

Consider the following:

  1. Start transaction
  2. Invoke hook_node_update()
  3. Modules do their stuff, e.g. field cache is invalidated, etc.
  4. Node is saved
  5. EntityCache is cleared
  6. Commit transaction

Whatever data is saved during the entire node_save() operation isn't committed to the database (and therefore isn't visible to concurrent requests) until step 6.

If concurrent requests are made to the node in question between #3 and #6, the path cache, field cache, and god knows what else, will be updated with old/wrong data. Only a cache clear or a new node save without any concurrent requests, will fix this.

Same with EntityCache. There may not be much of a time window between #5 and #6, but it's there. If concurrent requests manage to populate the entity cache in that time window, old/wrong data will be used. If you've ever experienced that you were unable to save a node because "it has already been altered", you may be the victim of this effect.

The Cache Consistent module contains a cache wrapper that addresses this issue, by buffering cache operations until the transaction is committed. Hooking in to the transaction requires a core patch, which is bundled with the module.

Drupal's variable storage and locking


When using InnoDB and isolation level REPEATABLE-READ, locking may become an issue. Especially the Rules module has a reputation of overusing/abusing the variable storage. Besides the fact that each change to a variable clears the entire monolithic cache entry for the variables, it can also cause deadlock issues under high concurrent load when deleting variables.

Drupal uses db_merge() to update a variable, which begins a transaction and then issues a SELECT ... FOR UPDATE.

The problem with a transactional SELECT ... FOR UPDATE is that if 0 rows are found, a following INSERT can block other INSERTs. In principle there's nothing wrong with this, but in the particular case of the variable_*() functions this kind of locking can be greatly mitigated.
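Sketched as raw SQL, a simplified version of what db_merge() ends up issuing against the variable table (the serialized value is illustrative):

```sql
START TRANSACTION;
SELECT 1 FROM variable WHERE name = 'my_var' FOR UPDATE;
-- If no row matches under REPEATABLE-READ, InnoDB takes a gap/next-key
-- lock on the index range around 'my_var'; concurrent transactions
-- inserting into that same gap block until this transaction ends.
INSERT INTO variable (name, value) VALUES ('my_var', 's:3:"foo";');
COMMIT;
```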

The problem arises with the use of variable_del(), which removes the row from the variable table. After that, the system is vulnerable to the 0-rows-found locking issue.

My proposal is that instead of deleting the row, we just set it to NULL. This prevents the gap lock (or is it a next-key lock?) at the expense of keeping more rows in the variable table. Since the entire variable table is cached anyway, the actual impact can be eliminated by not storing NULL values in the variable cache.


Quick core-hack for the variable_del() function:

function variable_del($name) {
  variable_set($name, NULL);
}


As a side note, changing the isolation level to READ-COMMITTED will also fix this problem.