
When Should I Use Custom Entities? (part 1)

Back-end Development
Drupal

This article offers some reflections on my choices and experiences using custom entities on several Drupal sites, and on the evolution of my position on when they are appropriate or advisable -- the "why" of custom entities. It is not meant to be instructional on the "how" of coding them. Drupal's own documentation (for example, Creating a custom content entity and Creating a content entity type in Drupal 8) provides a solid base understanding, and it's easy to find blog entries that walk through the process. In fact, you may want to reference those pages if I touch on something for which you need a foundational understanding.

In the evolution of my position on custom entities, I at times made erroneous assumptions and reached conclusions I now believe to be incorrect. I hope that describing my journey will be helpful to anyone considering whether to start down the path of using custom entities on their site.

Background

At the point in my career when I started working on my first Drupal site, I was an experienced full-stack .NET developer, mostly on complex internal corporate applications. In my experience it was normal and expected for a software developer to be hands-on in database design, including normalized relational architecture and optimizing for performance with views, stored procedures, and indexes.

I started a new chapter of my career at an agency that built sites using the Drupal framework. Part of my process of learning Drupal was looking around an existing, fairly complex Drupal 7 site. One day I got my database management tool set up and popped the hood on the site to take a look. My first reaction was to gasp in horror!

Phase 1: Custom entities as a solution to database architecture

OK, I exaggerate, but at first glance through my DBA goggles the database architecture was pretty alarming compared to the ideal I was used to creating myself. I didn't feel anything had been done wrong per se, but I questioned whether having dynamic entity definitions (the ability to add fields via the UI) was worth the price of the performance consequences in the database. Keep in mind that I was pretty green and naive regarding Drupal at the time.

The problem (as I saw it)

The way Drupal allows a site builder to add fields to content types through the UI is to give each field its own table in the database. When you enable entity revisions (snapshots of the data each time edits are made), each field table gets a matching revision table, so even a fairly simple content type with 5 fields ends up with the data for a single piece of content (node) spread across a dozen database tables.

That means that each time a node is added or deleted, the corresponding database records in all those tables must be inserted or deleted. Each of those tables has at least one index, and often several, so the matching index entries must be written or removed as well. From a pure database perspective, adding and deleting index entries are relatively "expensive" operations.

Even simple retrieval of that node for viewing needs to gather the data spread across all those tables. I incorrectly thought this meant, at the very least, a half dozen (in my example) index lookups followed by reads of the actual data they reference. In my mind that just had to significantly impact the performance of any Drupal site.

What's more, for this site I had no need for revision histories. Custom entities would allow me to skip that unnecessary database complexity and performance overhead, as revision functionality is an option in the configuration that defines each entity.
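For illustration, here is a minimal sketch of what such an entity definition might look like (the "event" entity name and its keys are hypothetical, not from the original site). Because no revision-related annotation keys are declared, Drupal creates only the single base table and no revision tables:

```php
/**
 * A hypothetical, non-revisionable custom content entity.
 *
 * With no "revision_table" annotation key and no "revision" entity key,
 * Drupal creates only the base table -- no revision tables at all.
 *
 * @ContentEntityType(
 *   id = "event",
 *   label = @Translation("Event"),
 *   base_table = "event",
 *   entity_keys = {
 *     "id" = "id",
 *     "uuid" = "uuid",
 *     "label" = "title",
 *   },
 * )
 */
class Event extends \Drupal\Core\Entity\ContentEntityBase {
}
```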

My solution

A few months later, when I started work on a brand-new Drupal 8 site, I decided I would fix that problem. I had learned that all the "base fields" of an entity type are kept in a single table. Indeed, if you look at the node_field_data table in a Drupal database, you can see that the base fields of the node entity type are not spread apart like fields added through the UI. Since it was typically easy to identify a stable list of fields for a given type of content, it seemed like a no-brainer to make a custom entity so they could all be base fields and therefore all live in one database table.
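As a sketch of the approach (the field names here are hypothetical), base fields are declared in the entity class's baseFieldDefinitions() method, and each one becomes a column in the entity's single base table:

```php
use Drupal\Core\Entity\EntityTypeInterface;
use Drupal\Core\Field\BaseFieldDefinition;

// Inside the custom entity class. Every base field defined here is
// stored as a column in the entity's base table rather than getting
// its own table the way UI-added fields do.
public static function baseFieldDefinitions(EntityTypeInterface $entity_type) {
  $fields = parent::baseFieldDefinitions($entity_type);

  $fields['title'] = BaseFieldDefinition::create('string')
    ->setLabel(t('Title'))
    ->setRequired(TRUE);

  $fields['start_date'] = BaseFieldDefinition::create('datetime')
    ->setLabel(t('Start date'));

  return $fields;
}
```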

To accomplish my purpose, I wanted to take this to the extreme, so I decided my custom entities would leave the "fieldable" option disabled, meaning it would not be possible to later add more fields via the UI. It was all or nothing, and everything would be done in code. It was fairly straightforward to follow the examples and build my entities in the code of a custom module. I enabled the module, and there my entities were, available for viewing and creating via the admin menu, and all in one table in the database. Mission accomplished.

The reality (aka the problem of my solution)

I first realized there was a proverbial fly in the ointment when, as could have been predicted, my very stable list of fields wasn't so stable after all. Since Drupal handles all the database operations for setting up the entity when the module is installed or uninstalled, it's easy enough to change the entity structure by reinstalling. However, by then I had a considerable amount of test data that I didn't want to lose, as would happen using that method.

Drupal provides the Update API to allow us to make those structural changes while retaining existing data. Using it we can do more than just add fields: we can change indexes, update data, or perform any other kind of database operation we want. Writing the hook_update_N() functions is not really difficult, so this is a straightforward yet powerful feature of Drupal.

Through no fault of Drupal's, this turned out to become quite a burden. Every time I wanted to make a change to the entity I had to:

  1. Update the code that defines the custom entity -- namely its base fields, as I was using them. The test of this was to uninstall and reinstall the module, which meant losing existing data. So once this was confirmed, I had to restore the pre-update database and go to the next step.
  2. Write the hook_update_N() code to perform the change without destroying existing data. The test was to actually perform the update, so if I made a mistake it meant restoring the database to the pre-update point and trying again.
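As a sketch of step 2 above (assuming a hypothetical "event" entity defined by a module named "mymodule"), adding a new base field without reinstalling might look like this:

```php
use Drupal\Core\Field\BaseFieldDefinition;

/**
 * Adds a hypothetical 'location' base field to event entities,
 * preserving existing data instead of reinstalling the module.
 */
function mymodule_update_8001() {
  $field = BaseFieldDefinition::create('string')
    ->setLabel(t('Location'));

  \Drupal::entityDefinitionUpdateManager()
    ->installFieldStorageDefinition('location', 'event', 'mymodule', $field);
}
```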

As you've probably realized, there is a maintenance problem in keeping these two things "matching" -- that is, the final state of the database structure should be identical whether the module is freshly installed or the update is run against an existing installation. I eventually wrote some code to mitigate that by using the entity definition as the basis for dynamically determining the updates, but that only serves to demonstrate how much of an issue this became in the practice of building a new site.
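One way to sketch that mitigation (entity and field names again hypothetical) is to pull the storage definition for the update straight from the same baseFieldDefinitions() that a fresh install would use, so the two code paths cannot drift apart:

```php
/**
 * Adds the hypothetical 'location' base field, deriving its definition
 * from the entity class itself so install and update stay in sync.
 */
function mymodule_update_8002() {
  $update_manager = \Drupal::entityDefinitionUpdateManager();
  $entity_type = $update_manager->getEntityType('event');

  // Single source of truth: the same definitions a fresh install creates.
  $fields = \Drupal\mymodule\Entity\Event::baseFieldDefinitions($entity_type);

  $update_manager->installFieldStorageDefinition(
    'location', 'event', 'mymodule', $fields['location']);
}
```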

In conjunction with all that, my increasing knowledge of how Drupal worked made me realize the Drupal team had been way ahead of me on the performance concerns. The Drupal 8 cache was quite good. This is not an article about the Drupal cache, but as it turns out, Drupal could usually avoid reading and joining the data from all those separate tables as I had imagined in my early naivety. Once Drupal figured out which node to retrieve, reading only the minimal data necessary to do so, a single read from the cache gave it the fully hydrated node object ready for use.

My conclusion after that project was to never again build custom entities just to solve the perceived performance problem of a database table per field. The problem wasn't nearly as bad as I had imagined, and the maintenance cost of the code was not insignificant.

To be continued...

In the next article, I will continue explaining the evolution of my thoughts on custom entities, including describing a use case in which I believe they are still a very useful and powerful option.