EAV multi-value fields
In the article ‘An alternative way of EAV modelling’, I discussed how to do EAV modelling by casting all values (except text) to integers. I’ll continue on that and talk about more advanced topics like multi-value fields.
As binary set
Not all questions have only a single option. Some fields we want to represent by radio-buttons, allow the user to select any number of options. For this we can mimic the behaviour of the SET field type of MySQL. A SET is almost similar to an ENUM, except that each bit represents an option. The value can have multiple bits enabled to represent multiple options.
Example: field options for field ‘programming language’
+-------+-------------+ | value | description | +-------+-------------+ | 1 | C/C++ | | 2 | PHP | | 4 | Java | | 8 | Python | | 16 | Ruby | +-------+-------------+
Choosing ‘C/C++’, ‘PHP’ and ‘Python’ would result in value 11.
There are 2 major disadvantages to this approach. First, with an integer, the number of options is limited to 4*8 = 32. Second, retrieving the description of options would cause joining field on field_option to be done like:
SELECT value_display(`field_name`, value_display(`field_type`, `value`.`value`, GROUP_CONCAT(`field_option`.`description`), `value`.`text`, `precision`, `date_format`)) FROM `value` INNER JOIN `field` ON `value`.`fid` = `field`.`fid` LEFT JOIN `field_option` ON `value`.`fid` = `field_option`.`fid` AND (IF(`field`.`field_type`='SET, `value`.`value` & `field_option`.`value`, `value`.`value` = `field_option`.`value`)) WHERE `value`.`item_id`=? GROUP BY `value`.`fid`;
Because of this, the index on the value isn’t used. Instead, a function has to be performed on each option of the field. This should still give decent enough performance, because this scales linearly and not exponential.
Multiple values
Another way is to store multiple values for the same property. This would mean replacing the primary key from the value table by a normal index.
CREATE TABLE `value` ( `item_id` int(10) UNSIGNED NOT NULL DEFAULT '0', `fid` int(10) UNSIGNED NOT NULL DEFAULT '0', `value` int(11) NOT NULL DEFAULT '0', `text` text, KEY `item_field` (`item_id`,`fid`), KEY `value` (`fid`,`value`), KEY `text` (`text`(255)) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; SELECT value_display(`field_name`, value_display(`field_type`, `value`.`value`, GROUP_CONCAT(`field_option`.`description`), `value`.`text`, `precision`, `date_format`)) FROM `value` INNER JOIN `field` ON `value`.`fid` = `field`.`fid` LEFT JOIN `field_option` ON `value`.`fid` = `field_option`.`fid` AND `value`.`value` = `field_option`.`value` WHERE `value`.`item_id`=? GROUP BY `value`.`fid`;
The disadvantage of this, is that structural integrity is no longer enforced by the database. Only for a field with the type ‘SET’, multiple values should be allowed. However the database will allow multiple values for other field types (like numeric fields) as well. It is up to the application to replace existing values.
Ranges with a single field
A completely different type of multi-value is ranges. A range has a top and bottom value. We can solve this by saving 2 values in the database. Of the values, the the highest one is always the upper limit and the lowest is the bottom limit.
delimiter | CREATE FUNCTION `value_display` (`type` enum('NUMBER', 'ENUM', 'DATE', 'TIME', 'TEXT'), `value` INT, <strong>`max_value` INT</strong>, `option` VARCHAR(255), `text` TEXT, `precision` INT, `date_format` VARCHAR(50)) RETURNS VARCHAR(255) CHARACTER SET latin1 NO SQL BEGIN // ... END SELECT value_display(`field_name`, value_display(`field_type`, MIN(`value`.`value`), IF(COUNT(*)>1, MAX(`value`.`value`, NULL), GROUP_CONCAT(`field_option`.`description`), `value`.`text`, `precision`, `date_format`)) FROM `value` INNER JOIN `field` ON `value`.`fid` = `field`.`fid` LEFT JOIN `field_option` ON `value`.`fid` = `field_option`.`fid` AND `value`.`value` = `field_option`.`value` WHERE `value`.`item_id`=? GROUP BY `value`.`fid`;
Range by 2 fields
Another was of looking at ranges is that the are simply 2 individual fields, eg `min weight` and `max weight`. When filtering you might want to get all items that have 70 kg in their weight range. You would translate this to `min weight` <= 70 and `max weight` >= 70. (In the example query `min weight` has fid=2 and `max weight` fid=3.)
SELECT `item`.* FROM `item` INNER JOIN `value` ON `item`.`id`=`value`.`item_id` AND `value`.`fid` IN (2, 3) WHERE (`value`.`fid`=2 AND `value`.`value` <= 70) OR (`value`.`fid`=3 AND `value`.`value` >= 70) HAVING count(*) =2 GROUP BY `item`.`id`;
Conclusion
We see that using multi-value fields forces to group on fid and generally complicates the queries. If you only need ranges it might be a good idea to use 2 fields instead to keep the queries simpler.
The code in this article has not been tested.
28 Oct 2009 Arnold Daniels





EAV is a non-relational design and trying to make it fit in an SQL database is a fool’s errand.
If EAV really seems like what you need, you should use something like CouchDB, Apache Cassandra , Project Voldemort, or one of the other emerging key/value storage technology.
By using EAV, you’ve already sacrificed any benefit using an RDBMS might have given you, so why are you still using an RDBMS?
I don’t mean to be cranky, but this is really one of those square-peg-in-a-round-hole situations.
If you want to discuss the why and how of EAV, please add to the existing discussion on An alternative way of EAV modelling.
That said. The issue with document oriented databases is that it is very difficult to get results based on referential info. Lets say you’re building a social network application and want to get the unread messages of all contacts living in the UK and are working at a company that is member of your partner program. This needs to filter on a combination of 3 documents `message`, `contact` and `company`. This can’t be done with CouchDB (and probably not with any other key/value store), because it can’t combine indexes from different documents. This makes the cases where you can actually use these type of databases very limited.