Archive for the 'MySQL' Category

Javeline / Ajax.org, MySQL, PHP

Versioning MySQL data: Multi-table records

In the article ‘Versioning MySQL data‘, I showed the basics of implementing a revisioning system using trigger. As Jens Schauder already pointed out, often the data of a record is spread across multiple tables, like an invoice with multiple invoice lines. Having each invoice line versioned individually isn’t really useful. Instead we want a new revision of the whole invoice on each change.

The perfect solution
Ideally a change of one or more parts of the invoice would be changed, a new revision would be created. There are several issues in actually creating this those. Detecting the change of multiple parts of the invoice at once, generating a single revision, would mean we need to know if the actions are done within the same transaction. Unfortunately there is a connection_id(), but no transaction_id() function in MySQL. Also, the query would fail when a query inserts or updates a record in the child table, using the parent table. We need to come up with something else.

In the implementation we currently have in production, we version the rows in the parent as well in the child tables. For each version of the parent row, we register which versions of the child rows ware set. This however has really complicated the trigger code and tends to need a lot of checking an querying slowing the write process down. Since nobody ever looks at the versions of the child rows, the application forces a new version of the parent row. The benefits of versioning both are therefor minimal.

Only versioning the parent
For this new (simplified) implementation, we will only have one revision number across all tables of the record. Changing data from the parent table, will trigger a new version. This will not only copy the parent row to the revisioning table, but also the rows of the children.

Writing to the child will not trigger a new version, instead it will update the data in the revisioning table. This means that when changing the record, you need to write to the parent table, before writing to the child tables. To force a new version without changing values use

UPDATE mytable SET _revision=NULL WHERE id=$id

Continue Reading »

Javeline / Ajax.org, MySQL, PHP

Versioning MySQL data

As a developer you’re probably using a versioning control system, like subversion or git, to safeguard your data. Advantages of using a VCS are that you can walk to the individual changes for a document, see who made each change and revert back to specific revision if needed. These are features which would also be nice for data stored in a database. With the use of triggers we can implement versioning for data stored in a MySQL db.

The revisioning table
We will not store the different versions of the records in the original table. We want this solution to be in the database layer instead of putting all the logic in the application layer. Instead we’ll create a new table, which stores all the different versions and lives next to the original table, which only contains the current version of each record. This revisioning table is copy of the original table, with a couple of additional fields.

CREATE TABLE `_revision_mytable` LIKE `mytable`;
 
ALTER TABLE `_revision_mytable`
  CHANGE `id` `id` int(10) UNSIGNED,
  DROP PRIMARY KEY,
  ADD `_revision` bigint UNSIGNED AUTO_INCREMENT,
  ADD `_revision_previous` bigint UNSIGNED NULL,
  ADD `_revision_action` enum('INSERT','UPDATE') DEFAULT NULL,
  ADD `_revision_user_id` int(10) UNSIGNED NULL,
  ADD `_revision_timestamp` datetime NULL DEFAULT NULL,
  ADD `_revision_comment` text NULL,
  ADD PRIMARY KEY (`_revision`),
  ADD INDEX (`_revision_previous`),
  ADD INDEX `org_primary` (`id`);

The most important field is `_revision`. This field contains a unique identifier for a version of a record from the table. Since this is the unique identifier in the revisioning table, the original id field becomes a normal (indexed) field.
Continue Reading »

MySQL, PHP

EAV multi-value fields

In the article ‘An alternative way of EAV modelling’, I discussed how to do EAV modelling by casting all values (except text) to integers. I’ll continue on that and talk about more advanced topics like multi-value fields.

As binary set
Not all questions have only a single option. Some fields we want to represent by radio-buttons, allow the user to select any number of options. For this we can mimic the behaviour of the SET field type of MySQL. A SET is almost similar to an ENUM, except that each bit represents an option. The value can have multiple bits enabled to represent multiple options.

Example: field options for field ‘programming language’

+-------+-------------+
| value | description |
+-------+-------------+
| 1     | C/C++       |
| 2     | PHP         |
| 4     | Java        |
| 8     | Python      |
| 16    | Ruby        |
+-------+-------------+

Choosing ‘C/C++’, ‘PHP’ and ‘Python’ would result in value 11.

There are 2 major disadvantages to this approach. First, with an integer, the number of options is limited to 4*8 = 32. Second, retrieving the description of options would cause joining field on field_option to be done like:

SELECT value_display(`field_name`, value_display(`field_type`, `value`.`value`, GROUP_CONCAT(`field_option`.`description`), `value`.`text`, `precision`, `date_format`)) FROM `value` INNER JOIN `field` ON `value`.`fid` = `field`.`fid` LEFT JOIN `field_option` ON `value`.`fid` = `field_option`.`fid` AND (IF(`field`.`field_type`='SET, `value`.`value` & `field_option`.`value`, `value`.`value` = `field_option`.`value`)) WHERE `value`.`item_id`=? GROUP BY `value`.`fid`;

Because of this, the index on the value isn’t used. Instead, a function has to be performed on each option of the field. This should still give decent enough performance, because this scales linearly and not exponential.

Multiple values
Another way is to store multiple values for the same property. This would mean replacing the primary key from the value table by a normal index.

CREATE TABLE `value` (
  `item_id` int(10) UNSIGNED NOT NULL DEFAULT '0',
  `fid` int(10) UNSIGNED NOT NULL DEFAULT '0',
  `value` int(11) NOT NULL DEFAULT '0',
  `text` text,
  KEY `item_field`  (`item_id`,`fid`),
  KEY `value` (`fid`,`value`),
  KEY `text` (`text`(255))
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
 
SELECT value_display(`field_name`, value_display(`field_type`, `value`.`value`, GROUP_CONCAT(`field_option`.`description`), `value`.`text`, `precision`, `date_format`)) FROM `value` INNER JOIN `field` ON `value`.`fid` = `field`.`fid` LEFT JOIN `field_option` ON `value`.`fid` = `field_option`.`fid` AND `value`.`value` = `field_option`.`value` WHERE `value`.`item_id`=? GROUP BY `value`.`fid`;

The disadvantage of this, is that structural integrity is no longer enforced by the database. Only for a field with the type ‘SET’, multiple values should be allowed. However the database will allow multiple values for other field types (like numeric fields) as well. It is up to the application to replace existing values.

Ranges with a single field
A completely different type of multi-value is ranges. A range has a top and bottom value. We can solve this by saving 2 values in the database. Of the values, the the highest one is always the upper limit and the lowest is the bottom limit.

delimiter |
CREATE FUNCTION `value_display` (`type` enum('NUMBER', 'ENUM', 'DATE', 'TIME', 'TEXT'), `value` INT, <strong>`max_value` INT</strong>, `option` VARCHAR(255), `text` TEXT, `precision` INT, `date_format` VARCHAR(50)) RETURNS VARCHAR(255) CHARACTER SET latin1 NO SQL
BEGIN
  // ...
END
 
SELECT value_display(`field_name`, value_display(`field_type`, MIN(`value`.`value`), IF(COUNT(*)>1, MAX(`value`.`value`, NULL), GROUP_CONCAT(`field_option`.`description`), `value`.`text`, `precision`, `date_format`)) FROM `value` INNER JOIN `field` ON `value`.`fid` = `field`.`fid` LEFT JOIN `field_option` ON `value`.`fid` = `field_option`.`fid` AND `value`.`value` = `field_option`.`value` WHERE `value`.`item_id`=? GROUP BY `value`.`fid`;

Range by 2 fields
Another was of looking at ranges is that the are simply 2 individual fields, eg `min weight` and `max weight`. When filtering you might want to get all items that have 70 kg in their weight range. You would translate this to `min weight` <= 70 and `max weight` >= 70. (In the example query `min weight` has fid=2 and `max weight` fid=3.)

SELECT `item`.* FROM `item` INNER JOIN `value` ON `item`.`id`=`value`.`item_id` AND `value`.`fid` IN (2, 3) WHERE (`value`.`fid`=2 AND `value`.`value` <= 70) OR (`value`.`fid`=3 AND `value`.`value` >= 70) HAVING count(*) =2 GROUP BY `item`.`id`;

Conclusion
We see that using multi-value fields forces to group on fid and generally complicates the queries. If you only need ranges it might be a good idea to use 2 fields instead to keep the queries simpler.

The code in this article has not been tested.

MySQL, PHP, Replies

Bye bye MySQL?

Sun will be bought by Oracle. Will this be the beginning of the end of MySQL?

MySQL has a serious market share. For that oracle has already tried to buy MySQL back in 2006. In an interview responding to the offer in 2006, MySQL chief Marten Mickos told that the reason for declining was that they wanted to keep MySQL an independent product. From that I assume that oracle wanted to merge Oracle DB and MySQL technology. Even though MySQL will be part of a major merger for which the rules aren’t yet clear, you might think that Oracle hasn’t changed their ideas about what they want with MySQL in the last 3 years.

Won’t MySQL just lose most of its market share if it become something else. Other databases like PostgreSQL have been making mayor steps and are in many expects better than MySQL. MySQL has remained to be the only serious open-source RDBMS in respects of market share though. I believe this is mainly because MySQL is known, tried and tested. This might be a fragile thing though.

Based on Oracle’s decision, I might just take a more serious look at PostgreSQL. Changing is usually not so nice, but change often is.

Any thoughts? Leave a comment or trackback.

MySQL, PHP

Creating a cross tab in MySQL

Data stored in a database is often also useful for statistical purposes. If you own a web-shop you want to be able to create a report about turnover. You can get statistical information by using GROUP BY, eg.

SELECT DATE_FORMAT(invoice.date, '%M') AS `month`, COUNT(*) AS `invoice_count`, SUM(`invoice`.`amount`) AS `turnover`
FROM `invoice`
WHERE `date` BETWEEN '2008-01-01' AND '2008-12-31'
GROUP BY MONTH(`invoice`.`date`)
month     invoice_count  turnover
January   84             9532.26
February  141            20857.61
March     91             10922.71
April     112            15044.48
May       101            9676.60
June      137            12860.88
July      281            34291.20
August    191            26377.66
September 103            16324.78
October   99             12873.23

If you are selling a wide variety of products, you might like to see the turnover for each product category. You could do this with a simple GROUP BY as:

SELECT DATE_FORMAT(`invoice`.`date`, '%M') AS `month`, `category`.`description` AS `category`, COUNT(*) AS `product_count`, SUM(`invoice_product`.`amount`) AS `turnover`
FROM `invoice` INNER JOIN `invoice_product` ON `invoice`.`id` = `invoice_product`.`invoice_id` LEFT JOIN `product` ON `invoice_product`.`product_id` = `product`.`id` LEFT JOIN `category` ON `product`.`category_id` = `category`.`id`
WHERE `date` BETWEEN '2008-01-01' AND '2008-12-31' 
GROUP BY MONTH(`invoice`.`date`), `category`.`id`
month     category   product_count  turnover
January   Hardware   62             4821.31
January   Software   51             4419.41
January   Cables     12             291.54
February  Hardware   71             8408.93
February  Software   101            11726.36
February  Cables     17             312.32
February  Other      2              410.00
March     Hardware   21             2371.58
March     Software   81             8238.81
March     Cables     13             312.32
...

This would give you each category in a different row, ordered by month. Though this contains all the information the format is far from nice. Instead you would like to have 1 row per month with each category as a column as the information about the invoices as well.

SELECT DATE_FORMAT(`invoice`.`date`, '%M') AS `month`, COUNT(DISTINCT `invoice`.`id`) AS `product_count`, COUNT(*) AS `invoice_count`, SUM(`invoice_product`.`amount`) AS `turnover`,
  SUM(`product`.`category_id`=1) AS `hardware_count`, SUM(IF(`product`.`category_id`=1, `invoice_product`.`amount`, 0)) AS `hardware_turnover`,
  SUM(`product`.`category_id`=2) AS `software_count`, SUM(IF(`product`.`category_id`=2, `invoice_product`.`amount`, 0)) AS `software_turnover`,
  SUM(`product`.`category_id`=3) AS `cables_count`, SUM(IF(`product`.`category_id`=3, `invoice_product`.`amount`, 0)) AS `cables_turnover`,
  SUM(`product`.`category_id`=4) AS `other_count`, SUM(IF(`product`.`category_id`=4, `invoice_product`.`amount`, 0)) AS `other_turnover`
FROM `invoice` INNER JOIN `invoice_product` ON `invoice`.`id` = `invoice_product`.`invoice_id` LEFT JOIN `product` ON `invoice_product`.`product_id` = `product`.`id` LEFT JOIN `category` ON `product`.`category_id` = `category`.`id`
WHERE `date` BETWEEN '2008-01-01' AND '2008-12-31'
GROUP BY MONTH(`invoice`.`date`), `category`.`id`
month     invoice_count  turnover    hardware_count  hardware_turnover  software_count  software_turnover  cables_count  cables_turnover  other_count  other_turnover
January   84             9532.26     62              4821.31            51              4419.41            12            291.54           0            0
February  141            20857.61    71              8408.93            101             11726.36           17            312.32           2            410.00
March     91             10922.71    21              2371.58            81              8238.81            13            312.32           0            0
...

The big downside of this method is that you need to modify the query if a category is added. This can be solved though by dynamically creating the query in a PHP script.

If you want to do advanced statistical you should have a look at OLAP cubes. Pentaho is an open-source reporting app which supports MySQL. http://mondrian.pentaho.org/

All data is fictional. The SQL queries are untested.

Erlang, Javeline / Ajax.org, Life as a hoster, MySQL, PHP, Replies

An alternative way of EAV modelling

I was reading this month’s php|architect. It has a nice article about EAV modeling. I had seen this db structure in other project, but didn’t know that it was called EAV. For those who don’t read php|architect, EAV describes a method of saving a large set of attributes, only some of which apply to an individual entity. Normally you would create a table, with a row for each entity and save each attribute in a column. With EAV you save each attribute as a row.

This makes selecting the data quite tedious. If you can life with some constraints, there is an easier way to do this.
Continue Reading »

Life as a hoster, My linux desktop, MySQL, PHP

Howto Install Xen+Lustre on Ubuntu Gutsy

Send in by Ruben Daniels

Lustre is one of the most popular upcoming open source cluster file systems out there. When you want to run Xen’s from a SAN using Lustre you need to support both in the Linux kernel. Both XEN and Lustre are near mature products. This means there is support for it. But it’s quite difficult to find the right source and to combine it with the right kernel source of each. It took me a week of trial and erroring until I found a combination that worked. Since Google wasn’t much help I wrote this article so it might help you. This installation is Ubuntu Gutsy specific. You can start out with a basic Gutsy installation. Hardy is getting Lustre support, but at the time of this writing the package doesnt match the default kernel of Hardy.

First we apt-get some utilities and Xen packages

apt-get install quilt libc6-xen libxen3.1 linux-image-xen linux-restricted-modules-xen linux-xen python-xen-3.1 xen-docs-3.1 xen-hypervisor-3.1 xen-image-2.6.19-4-server xen-ioemu-3.1 xen-tools xen-utils-3.1

Now we get the kernel from kernel.org

cd /usr/src
wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.22.tar.gz
tar -zxvf linux-2.6.22.tar.gz

The best matching (and stable) lustre source I’ve found is in the Hardy Lustre source package. So we get it and extra it manually:

wget http://ubuntu2.cica.es/ubuntu/ubuntu/pool/universe/l/lustre/lustre-source_1.6.4.2-1_all.deb
dpkg -x lustre-source_1.6.4.2-1_all.deb lustre-1.6.4.2
bunzip2 lustre-1.6.4.2/usr/src/lustre.tar.bz2
tar -xvf lustre-1.6.4.2/usr/src/lustre.tar
mv lustre-1.6.4.2/usr/src/modules/lustre/* lustre-1.6.4.2
rm -R lustre-1.6.4.2/usr -R

The official XEN kernel version is 2.6.18. This is way too old for newer machines. So we’ll get the Gutsy kernel source which contains XEN patches for our 2.6.22 kernel and apply them:

apt-get source linux-source-2.6.22
cd /usr/src/linux-source-2.6.22-2.6.22
cd debian/binary-custom.d/xen/patchset
patch -p1 /usr/src/linux-2.6.22 < 001*
patch -p1 /usr/src/linux-2.6.22 < 002*
patch -p1 /usr/src/linux-2.6.22 < 003*

Now we apply the lustre patches:

cd /usr/src/linux-2.6.22
ln -s /usr/src/lustre-1.6.4.2/lustre/kernel_patches/series/2.6.22-vanilla.series series
ln -s /usr/src/lustre-1.6.4.2/lustre/kernel_patches/patches patches
quilt -av push

Make the Kernel:

make distclean
cp debian/binary-custom.d/xen/patchset/config.i386 .
make menuconfig
make
make install modules_install

If you use kernel modules you need to run the following command to create the ram fs.

mkinitramfs -o /boot/initrd.img-2.6.22 2.6.22

Add the kernel to grub’s menu.lst. You could do this manually, but let’s just use the ubuntu script.

update-grub

Reboot the machine. After reboot build Lustre:

cd /usr/src/lustre-1.6.4.2
./configure --with-linux=/usr/src/linux-2.6.22
make
make install

Reboot the machine again. You should now have a working lustre/xen kernel.

MySQL, PHP

MySQL agrees: Getting XML from a DB is easiest with my lib ;)

Today I’m very pleased. My UDF lib ‘lib_mysqludf_xql’ has been mentioned in an article by Jon Stephens about ‘Using XML in MySQL 5.1 and 6.0′.

The lib_mysqludf_xql you can query relational data and return XML. It will give you the same functionality as SQL/XML in Oracle and MS SQL. Compares to other methods like using CONCAT or generating XML in a PHP script, this method is much easier and will perform much better.

You can find the library at mysqludf.org, including some documentation. You can also find a bunch of other nice libs there.

Do note that you need to have root privileges on the server to install MySQL UDFs.

MySQL, PHP, Replies

Submit a form with AJAX using TelePort

This is a re-post of an article I wrote in back in August. We’re getting ready to release a new release of Javeline PlatForm at the end of this month. This release will be XForms compliant and has a lot of other new features as well. Before I start writing about that, I would like to put some attention on the communication layer aka Javeline TelePort.


PHPBuilder posted an article about sending a form using AJAX. This article shows how to post a simple form. But looking closely at the example, you can also see what the problem is using the plain XMLHttpRequest object and writing an implementation yourself.

First of all you to write some rather difficult javascript code. Next you need to completely rewrite your page, looking nor working like a normal HTML form. Last and most important, the function fetches values using getElementById(). This is not really very flexible, because you will need not only the form, but the function as well when you add a field.

Using an AJAX library can really help you here. Javeline TelePort has got a brand new method, which automatically creates a HTTP post request of a form and sends it to the server.
Continue Reading »

MySQL, PHP

Getting the first 5 records of type X in MySQL

I just got a question from Dirk Bonenkamp. He has a query which returns a number of contacts, but he wants to get no more than 5 contacts of per customer. After thinking really hard, I came up with a solution using variables and a subquery.

To test this idea, I used the ‘world’ database available at MySQL dev zone. This query selects no more than 5 countries per continent.

SET @a=0; SET @b="";
SELECT Name, Continent FROM (SELECT @a:=IF(@b=Continent, @a+1, 0) AS a, Name, @b:=Continent AS Continent FROM Country ORDER BY Continent) AS t WHERE a < 5;

This isn’t one of the nicest looking solutions, but it works. Do you have a better solution? Please let me know :) .

Next »