Javeline / Ajax.org, MySQL, PHP

Versioning MySQL data: Multi-table records

In the article ‘Versioning MySQL data‘, I showed the basics of implementing a revisioning system using trigger. As Jens Schauder already pointed out, often the data of a record is spread across multiple tables, like an invoice with multiple invoice lines. Having each invoice line versioned individually isn’t really useful. Instead we want a new revision of the whole invoice on each change.

The perfect solution
Ideally a change of one or more parts of the invoice would be changed, a new revision would be created. There are several issues in actually creating this those. Detecting the change of multiple parts of the invoice at once, generating a single revision, would mean we need to know if the actions are done within the same transaction. Unfortunately there is a connection_id(), but no transaction_id() function in MySQL. Also, the query would fail when a query inserts or updates a record in the child table, using the parent table. We need to come up with something else.

In the implementation we currently have in production, we version the rows in the parent as well in the child tables. For each version of the parent row, we register which versions of the child rows ware set. This however has really complicated the trigger code and tends to need a lot of checking an querying slowing the write process down. Since nobody ever looks at the versions of the child rows, the application forces a new version of the parent row. The benefits of versioning both are therefor minimal.

Only versioning the parent
For this new (simplified) implementation, we will only have one revision number across all tables of the record. Changing data from the parent table, will trigger a new version. This will not only copy the parent row to the revisioning table, but also the rows of the children.

Writing to the child will not trigger a new version, instead it will update the data in the revisioning table. This means that when changing the record, you need to write to the parent table, before writing to the child tables. To force a new version without changing values use

UPDATE mytable SET _revision=NULL WHERE id=$id

Continue Reading »

Einstein

Follow me on twitter

I’m giving microblogging another try. Follow my post on twitter @adaniels_nl.

Javeline / Ajax.org, MySQL, PHP

Versioning MySQL data

As a developer you’re probably using a versioning control system, like subversion or git, to safeguard your data. Advantages of using a VCS are that you can walk to the individual changes for a document, see who made each change and revert back to specific revision if needed. These are features which would also be nice for data stored in a database. With the use of triggers we can implement versioning for data stored in a MySQL db.

The revisioning table
We will not store the different versions of the records in the original table. We want this solution to be in the database layer instead of putting all the logic in the application layer. Instead we’ll create a new table, which stores all the different versions and lives next to the original table, which only contains the current version of each record. This revisioning table is copy of the original table, with a couple of additional fields.

CREATE TABLE `_revision_mytable` LIKE `mytable`;
 
ALTER TABLE `_revision_mytable`
  CHANGE `id` `id` int(10) UNSIGNED,
  DROP PRIMARY KEY,
  ADD `_revision` bigint UNSIGNED AUTO_INCREMENT,
  ADD `_revision_previous` bigint UNSIGNED NULL,
  ADD `_revision_action` enum('INSERT','UPDATE') DEFAULT NULL,
  ADD `_revision_user_id` int(10) UNSIGNED NULL,
  ADD `_revision_timestamp` datetime NULL DEFAULT NULL,
  ADD `_revision_comment` text NULL,
  ADD PRIMARY KEY (`_revision`),
  ADD INDEX (`_revision_previous`),
  ADD INDEX `org_primary` (`id`);

The most important field is `_revision`. This field contains a unique identifier for a version of a record from the table. Since this is the unique identifier in the revisioning table, the original id field becomes a normal (indexed) field.
Continue Reading »

MySQL, PHP

EAV multi-value fields

In the article ‘An alternative way of EAV modelling’, I discussed how to do EAV modelling by casting all values (except text) to integers. I’ll continue on that and talk about more advanced topics like multi-value fields.

As binary set
Not all questions have only a single option. Some fields we want to represent by radio-buttons, allow the user to select any number of options. For this we can mimic the behaviour of the SET field type of MySQL. A SET is almost similar to an ENUM, except that each bit represents an option. The value can have multiple bits enabled to represent multiple options.

Example: field options for field ‘programming language’

+-------+-------------+
| value | description |
+-------+-------------+
| 1     | C/C++       |
| 2     | PHP         |
| 4     | Java        |
| 8     | Python      |
| 16    | Ruby        |
+-------+-------------+

Choosing ‘C/C++’, ‘PHP’ and ‘Python’ would result in value 11.

There are 2 major disadvantages to this approach. First, with an integer, the number of options is limited to 4*8 = 32. Second, retrieving the description of options would cause joining field on field_option to be done like:

SELECT value_display(`field_name`, value_display(`field_type`, `value`.`value`, GROUP_CONCAT(`field_option`.`description`), `value`.`text`, `precision`, `date_format`)) FROM `value` INNER JOIN `field` ON `value`.`fid` = `field`.`fid` LEFT JOIN `field_option` ON `value`.`fid` = `field_option`.`fid` AND (IF(`field`.`field_type`='SET, `value`.`value` & `field_option`.`value`, `value`.`value` = `field_option`.`value`)) WHERE `value`.`item_id`=? GROUP BY `value`.`fid`;

Because of this, the index on the value isn’t used. Instead, a function has to be performed on each option of the field. This should still give decent enough performance, because this scales linearly and not exponential.

Multiple values
Another way is to store multiple values for the same property. This would mean replacing the primary key from the value table by a normal index.

CREATE TABLE `value` (
  `item_id` int(10) UNSIGNED NOT NULL DEFAULT '0',
  `fid` int(10) UNSIGNED NOT NULL DEFAULT '0',
  `value` int(11) NOT NULL DEFAULT '0',
  `text` text,
  KEY `item_field`  (`item_id`,`fid`),
  KEY `value` (`fid`,`value`),
  KEY `text` (`text`(255))
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
 
SELECT value_display(`field_name`, value_display(`field_type`, `value`.`value`, GROUP_CONCAT(`field_option`.`description`), `value`.`text`, `precision`, `date_format`)) FROM `value` INNER JOIN `field` ON `value`.`fid` = `field`.`fid` LEFT JOIN `field_option` ON `value`.`fid` = `field_option`.`fid` AND `value`.`value` = `field_option`.`value` WHERE `value`.`item_id`=? GROUP BY `value`.`fid`;

The disadvantage of this, is that structural integrity is no longer enforced by the database. Only for a field with the type ‘SET’, multiple values should be allowed. However the database will allow multiple values for other field types (like numeric fields) as well. It is up to the application to replace existing values.

Ranges with a single field
A completely different type of multi-value is ranges. A range has a top and bottom value. We can solve this by saving 2 values in the database. Of the values, the the highest one is always the upper limit and the lowest is the bottom limit.

delimiter |
CREATE FUNCTION `value_display` (`type` enum('NUMBER', 'ENUM', 'DATE', 'TIME', 'TEXT'), `value` INT, <strong>`max_value` INT</strong>, `option` VARCHAR(255), `text` TEXT, `precision` INT, `date_format` VARCHAR(50)) RETURNS VARCHAR(255) CHARACTER SET latin1 NO SQL
BEGIN
  // ...
END
 
SELECT value_display(`field_name`, value_display(`field_type`, MIN(`value`.`value`), IF(COUNT(*)>1, MAX(`value`.`value`, NULL), GROUP_CONCAT(`field_option`.`description`), `value`.`text`, `precision`, `date_format`)) FROM `value` INNER JOIN `field` ON `value`.`fid` = `field`.`fid` LEFT JOIN `field_option` ON `value`.`fid` = `field_option`.`fid` AND `value`.`value` = `field_option`.`value` WHERE `value`.`item_id`=? GROUP BY `value`.`fid`;

Range by 2 fields
Another was of looking at ranges is that the are simply 2 individual fields, eg `min weight` and `max weight`. When filtering you might want to get all items that have 70 kg in their weight range. You would translate this to `min weight` <= 70 and `max weight` >= 70. (In the example query `min weight` has fid=2 and `max weight` fid=3.)

SELECT `item`.* FROM `item` INNER JOIN `value` ON `item`.`id`=`value`.`item_id` AND `value`.`fid` IN (2, 3) WHERE (`value`.`fid`=2 AND `value`.`value` <= 70) OR (`value`.`fid`=3 AND `value`.`value` >= 70) HAVING count(*) =2 GROUP BY `item`.`id`;

Conclusion
We see that using multi-value fields forces to group on fid and generally complicates the queries. If you only need ranges it might be a good idea to use 2 fields instead to keep the queries simpler.

The code in this article has not been tested.

Life as a hoster, PHP

How I PHP: multiple inheritance

Officially PHP doesn’t support multiple inheritance. There are several ways around this, without having to duplicate code.

Wrapper

The most commonly used method is to use a wrapper object.

<?php
 
abstract class FsNode
{
  public $path;
 
  public function __construct($path) {
    $this->path = $path;
  }
 
  public function rename($newname) {
    rename($this->path, $newname);
    $this->path = $newname;
  }
}
 
class File extends FsNode
{
  public function getContents() {
    return file_get_contents($this->path);
  }
}
 
class Dir extends FsNode
{
  public function scandir() {
    return scandir($this->path);
  }
}
 
class Symlink
{
  protected $node;
 
  public function __construct($node) {
    $this->node  = $node;
  }
 
  public function target($resolve=false) {
    return $resolve ? realpath($this->node->path) : readlink($this->node->path);
  }
 
  public function __call($method, $args) {
    return call_user_func_array(array($this->node, $method), $args);
  }
}
 
$dir = new Dir("/proc");
$linktodir = new Symlink(new Dir("/proc/self"));
 
var_dump($linktodir->scandir()); // Will be called through __call()
echo $linktodir->target(true), "\n";

A disadvantage is that is no longer possible to see if a node is a dir by using instanceof. Also, if most of the methods are defined in the wrapped class, this solution will hurt performance.

Mixin

A far more interesting approach is to use a mixins. When you call a non-static method, $this is always passed to that method. This is also the case if the calling object is not inherited from the called class. We can use that to our advantage to do the reverse of the wrapper.

abstract class FsNode
{
  public $mixin;
  public $path;
 
  public function __construct($path, $mixin=null) {
    $this->path = $path;
    $this->mixin = $mixin;
  }
 
  function rename($newname) {
    rename($this->path, $newname);
    $this->path = $newname;
  }
 
  public  function __call($method, $args) {
    if (isset($this->mixin) && ctype_alnum($method) && is_callable(array($this->mixin, $method))) {
      return eval("return {$this->mixin}::$method(" . (!empty($args) ? '$args[' . join('], $args[', array_keys($args)) . ']' : '') . ");");
    }
    trigger_error("Call to undefined method " . get_class($this) . "::$method()", E_USER_ERROR);
  }
}
 
class File extends FsNode
{
  public function getContents() {
    return file_get_contents($this->path);
  }
}
 
class Dir extends FsNode
{
  public function scandir() {
    return scandir($this->path);
  }
}
 
class Symlink extends FsNode
{
  public function target($resolve=false) {
    return $resolve ? realpath($this->path) : readlink($this->path);
  }
}
 
$dir = new Dir("/proc");
$linktodir = new Dir("/proc/self", 'Symlink');
 
var_dump($linktodir->scandir());
echo $linktodir->target(true), "\n"; // Will be called through __call()

caveat: The Symlink class is never instantiated. Properties defined in the Symlink class are ignored. Also, since Symlink doesn’t extends Dir, it’s not possible to access protected properties defined in Dir.

Compile time mixin

To see if a node is a symlink, you would need to do

if (isset($file->mixin) && ($file->mixin === 'Symlink' || is_subclass_of($file->mixin, 'Symlink'))) {
  //...
}

It would be nicer if you could simply use instance of. This is only possible by defining all combination. We can still use mixins though to prevent having to duplicate code.

abstract class FsNode
{
  protected $mixin;
  // Same as above
}
 
class File extends FsNode
{
  // Same as above
}
 
class Dir extends FsNode
{
  // Same as above
}
 
interface Symlink
{}
 
abstract class Symlink_Methods extends FsNode
{
  public function target($resolve=false) {
    return $resolve ? realpath($this->path) : readlink($this->path);
  }
}
 
class SymlinkFile extends File implements Symlink
{
  protected $mixin = 'Symlink_Methods';
}
 
class SymlinkDir extends Dir implements Symlink
{
  protected $mixin = 'Symlink_Methods';
}
 
$dir = new Dir("/proc");
$linktodir = new SymlinkDir("/proc/self");
 
var_dump($linktodir->scandir());
echo $linktodir->target(true), "\n"; // Will be called through __call()
 
if ($dir instanceof Dir)  ; // True
if ($dir instanceof Symlink)  ; // False
if ($linktodir instanceof Dir)  ; // True
if ($linktodir instanceof Symlink)  ; // True

Life as a hoster, PHP

Suhosin patch for PHP 5.3

The hardened-php project has yet to release a suhosin patch for PHP 5.3.0. We’re already using PHP 5.3, therefore I’ve modified the 0.9.7 patch for 5.2.10 to work with 5.3.0.

-> Download Suhosin patch for PHP 5.3 <-

My linux desktop

Hide Gnome Panel

Since a few months I’ve done away with using the Gnome main menu. Instead I use Gnome Do. I removed the bottom toolbar long ago, because always use alt-tab.

I’m not using the top toolbar much either. It was just taking up valuable screen space. I contains only the notification area and a logout button. I was looking at a way to remove it completely. The answer came in the Compiz widget layer. By placing it on the widget layer, fullsize windows actually fill the full screen, but the notification area is still available for applications who need it.

To move Gnome panel to the Widget layer, open ‘CompizConfig Settings Manager’ and enable ‘Widget Layer’. Go to tab ‘Behaviour’ and add the following text for the ‘Widget Windows’ field:

(class=Gnome-panel & type=Dock)

The desktop will now be completely clean:
Desktop clean

With we can display the widget layer, where the panel is found:
Desktop Widgets

PS. The widgets you see on the widget layer are screenlets. Ubuntu has the screenlets package in the universe repository.

Javeline / Ajax.org, Life as a hoster, PHP

How I PHP: How to take a website offline.

I’ve seen a lot of methods used to take a website temporarily off-line for maintenance. Most involve a using PHP to disable the site or renaming the index file. There is however a far better method of doing this, by placing the following in the vhost file or in an .htaccess file in the document root:

Header always set Retry-After "Thu, 18 Jun 2009 08:00:00 +0200"
Redirect 503 /

This way you are sure no part of the site is used. Also by returning a 503 http response, search-engine crawlers will not reindex your site right at the moment it is down. You can use ‘ErrorDocument’ to place a different text than the apache default.

MySQL, PHP, Replies

Bye bye MySQL?

Sun will be bought by Oracle. Will this be the beginning of the end of MySQL?

MySQL has a serious market share. For that oracle has already tried to buy MySQL back in 2006. In an interview responding to the offer in 2006, MySQL chief Marten Mickos told that the reason for declining was that they wanted to keep MySQL an independent product. From that I assume that oracle wanted to merge Oracle DB and MySQL technology. Even though MySQL will be part of a major merger for which the rules aren’t yet clear, you might think that Oracle hasn’t changed their ideas about what they want with MySQL in the last 3 years.

Won’t MySQL just lose most of its market share if it become something else. Other databases like PostgreSQL have been making mayor steps and are in many expects better than MySQL. MySQL has remained to be the only serious open-source RDBMS in respects of market share though. I believe this is mainly because MySQL is known, tried and tested. This might be a fragile thing though.

Based on Oracle’s decision, I might just take a more serious look at PostgreSQL. Changing is usually not so nice, but change often is.

Any thoughts? Leave a comment or trackback.

Javeline / Ajax.org, PHP

Simple Single Sign-On for PHP

Associated websites often share user information, so a visitor only has to register once and can use that username and password for all sites. A good example for this is Google. You can use you google account for GMail, Blogger, iGoogle, google code, etc. This is nice, but it would be even nicer if logging in for GMail would mean I’m also logged in for the other websites. For that you need to implement single sign-on (SSO).

There are many single sign-on applications and protocols. Most of these are fairly complex. Applications often come with full user management solutions. This makes them difficult to integrate. Most solutions also don’t work well with AJAX, because redirection is used to let the visitor log in at the SSO server.

I’ve written a simple single sign-on solution (400 lines of code), which works by linking sessions. This solutions works for normal websites as well as AJAX sites.
Continue Reading »

Next »