Structuring dynamic columns in relational databases
by Hasan Hasanzade, Senior Full-Stack Developer
Relational databases are rigid in terms of structure, which is exactly what makes them a great tool for structured data: you define your schema, your columns are fixed, and everything is neat and predictable. But when designing a schema you will sometimes run into data models that need to stay flexible. What happens when that structure needs to be... well, less structured?
This is where the concept of dynamic columns comes into play.
Challenge 🧩
Consider any of the following scenarios:
- An e-commerce platform where different product categories have different attributes:
  - electronics - RAM, storage, screen size
  - clothes - size, color, material
  - books - ISBN, author, publication date
  In this scenario, more and more attributes can be added later on as new categories are created.
- A trading platform that supports predefined payout methods for users, with additional required information depending on the selected payout method type:
  - pix - key type, address
  - swift - IBAN, SWIFT code, bank address, account holder
  - crypto - network, currency, address
  In this scenario, these methods and their corresponding columns are predefined and unlikely to change; no new payout methods or columns are expected.
There are many other cases where you need to store attributes that vary between items and change at different rates. While relational databases weren't designed for this kind of flexibility, there are several common strategies for handling dynamic attributes effectively.
Let's explore some of the most popular approaches.
1. The Entity-Attribute-Value (EAV) Model
The EAV model is a classic approach to handling dynamic attributes. Instead of having predefined columns for every possible attribute, you use separate tables that store these attributes as rows.
Typically, an EAV setup involves three main tables: two parent tables and one pivot table with a value column:
- Entities table: holds your primary items (e.g., Users, Products).
- Attributes table: defines all possible attributes that can be associated with your entities (e.g., Color, Size, Weight, RAM).
- Entity-Attribute-Values table: the core of the EAV model. It is a polymorphic pivot table that stores all entity types with their attribute values. It links an entity to an attribute and stores the corresponding value, usually with columns like entity_type, entity_id, attribute_id, and value.
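As a minimal sketch in plain SQL (the table and column names here are only illustrative, not a prescribed layout):
-- Parent table holding one kind of entity (e.g. products)
CREATE TABLE products (
    id   INT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);
-- All possible attributes, defined as rows instead of columns
CREATE TABLE attributes (
    id   INT PRIMARY KEY,
    name VARCHAR(255) NOT NULL              -- e.g. 'RAM', 'Color', 'ISBN'
);
-- Polymorphic pivot table linking any entity to an attribute and its value
CREATE TABLE entity_attribute_values (
    entity_type  INT NOT NULL,              -- or VARCHAR, e.g. 'product', 'user'
    entity_id    INT NOT NULL,              -- id in the entity's own table
    attribute_id INT NOT NULL REFERENCES attributes (id),
    value        VARCHAR(255),              -- stored as text, cast when querying
    PRIMARY KEY (entity_type, entity_id, attribute_id)
);
Reading an entity back then means joining or aggregating rows from this pivot table, which is where most of the EAV complexity and performance cost comes from.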
2. JSON/JSONB Columns
Modern versions of DBMSs like PostgreSQL, MySQL, and SQL Server support storing JSON directly in columns. This is a great option for semi-structured data.
The table definition can be as simple as this:
CREATE TABLE products (
id INT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
attributes JSON
);
However, the attributes column itself has its own flexible, NoSQL-like structure.
Applying this approach to scenario 2, the resulting columns in the users table could look like this:
| ... | payout_method_type | payout_method_account | ... |
|-----|--------------------|-----------------------|-----|
| ... | pix | {"key_type":"CPF","address":"12345678901"} | ... |
| ... | swift | {"iban":"asdf iban","swift_code":"asdf swift","bank_address":"asdf bank address","account_holder":"John Wick"} | ... |
| ... | crypto | {"network":"TRON","currency":"USDT","address":"123"} | ... |
There are RDBMS-specific details about whether to use the JSON or JSONB data type for this purpose. In PostgreSQL it is even possible to index the data within a JSON column, and SQL syntax is available for querying this kind of column by keys in its nested structure.
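As a rough sketch, assuming PostgreSQL and a JSONB payout_method_account column on the users table from the example above, indexing and querying could look like this:
-- GIN index to speed up containment queries on the JSONB column
CREATE INDEX idx_users_payout_account
    ON users USING GIN (payout_method_account);
-- Find users whose payout account is on the TRON network
SELECT id, payout_method_type
FROM users
WHERE payout_method_account @> '{"network": "TRON"}';
-- Extract a single key as text
SELECT payout_method_account ->> 'iban' AS iban
FROM users
WHERE payout_method_type = 'swift';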
While this approach strikes a neat balance between structure and flexibility, all data validation logic has to be handled at the application level.
3. Vertical partitioning
This approach is well-suited for scenarios where categories and their associated attributes are predefined and unlikely to change (as in scenario 2), but where the structural expectations are stricter. Since the number of categories is limited and known in advance, they can be represented as enums rather than stored as rows in a separate table.
Each category has its own dedicated table to store attributes, typically following a naming convention like %category%_attributes.
For example:
pix_attributes
swift_attributes
crypto_attributes
Within these category-specific tables, the columns for attributes are fixed and explicitly defined. This provides the benefit of strong typing, indexing, and better performance, while maintaining clarity and separation of concerns across different categories.
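As a minimal sketch for scenario 2, assuming a users table and illustrative column types, two of these tables could look like this:
-- One dedicated, strongly typed table per payout method
CREATE TABLE swift_attributes (
    user_id        INT PRIMARY KEY REFERENCES users (id),
    iban           VARCHAR(34)  NOT NULL,
    swift_code     VARCHAR(11)  NOT NULL,
    bank_address   VARCHAR(255) NOT NULL,
    account_holder VARCHAR(255) NOT NULL
);
CREATE TABLE crypto_attributes (
    user_id  INT PRIMARY KEY REFERENCES users (id),
    network  VARCHAR(50)  NOT NULL,
    currency VARCHAR(10)  NOT NULL,
    address  VARCHAR(255) NOT NULL
);
Each table can be indexed and constrained independently, which is exactly where the strong typing and performance benefits come from.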
4. Sparse Columns
This approach boils down to predefining many nullable columns in advance. The column data types can be allocated in proportion to the data types you expect the dynamic attributes to have. It is essentially reserving empty seats for data that has not arrived yet.
CREATE TABLE products (
id INT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
  -- nullable reserved columns, grouped by data type
text_field_1 VARCHAR(255),
text_field_2 VARCHAR(255),
text_field_3 VARCHAR(255),
int_field_1 INT,
int_field_2 INT,
date_field_1 DATE,
-- ... many optional fields
);
Though it keeps queries simple and performs well enough, it wastes space, offers limited flexibility, and requires schema changes once the reserved columns run out. There is also a limit on the number of columns a table can have in an RDBMS. Most of the time you won't hit that limit, but the idea itself remains inherently constrained, IMHO.
Not all RDBMSs handle tables full of nullable columns equally well. Microsoft SQL Server stands out here, with a dedicated feature literally named sparse columns.
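As a hedged sketch of that feature, the SPARSE modifier tells SQL Server to store NULLs in these columns more compactly (the column names below are only illustrative):
CREATE TABLE products (
    id          INT PRIMARY KEY,
    name        NVARCHAR(255) NOT NULL,
    -- sparse columns consume no storage when the value is NULL
    ram_gb      INT           SPARSE NULL,
    screen_size DECIMAL(4, 1) SPARSE NULL,
    isbn        VARCHAR(13)   SPARSE NULL
);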
5. Hybrid approach
You can, of course, combine any two or more of the above approaches into a new one. Since different modules and services have different requirements, it is almost always beneficial to mix approaches across the system as a whole.
Except when strict structural consistency is your goal.
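As one possible illustration, a common hybrid keeps the well-known, frequently queried attributes as fixed typed columns and pushes the category-specific rest into a JSON column (the column names here are just for the sketch):
CREATE TABLE products (
    id          INT PRIMARY KEY,
    name        VARCHAR(255) NOT NULL,
    category    VARCHAR(50)  NOT NULL,    -- enum-like, known in advance
    price       DECIMAL(10, 2) NOT NULL,  -- fixed, frequently filtered and indexed
    extra_attrs JSON                      -- everything category-specific goes here
);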
Which is best to implement? 🤔
Correct answer: it depends!
There's no one-size-fits-all solution. The best approach depends on your specific needs:
- For maximum flexibility, when attributes are highly unpredictable: EAV can be considered, but be aware of its complexities and performance implications.
- For a good balance of flexibility, performance, and ease of use in modern databases: JSON columns are often the preferred method. They integrate well and offer powerful querying capabilities.
- If you're using SQL Server and have many optional attributes: Sparse Columns can be an efficient choice.
Before settling on a strategy, consider the frequency of changes to the attribute set, the complexity of queries you'll need to run, performance requirements, and the capabilities of your chosen database system.
Conclusion ✍️
Relational databases aren't naturally dynamic — but with the right design patterns, like EAV, JSON, or a hybrid approach, you can introduce flexibility while still benefiting from structured storage, indexing, and transactions.
Which approach works best depends on your use case, performance needs, and how dynamic your data truly is.
Remember: the goal isn't to build the most flexible system possible, but to build the right balance of flexibility and maintainability for your specific use case. Sometimes a little rigidity in your schema can save you a lot of complexity in your application code.