Announcing automigrate: db auto-migrations for Clojure

abogoyavlensky · January 20, 2022, 7:47pm

Hi there! Today I would like to announce my new library called automigrate, the database auto-migration tool for Clojure.

If you would like to know about it a little bit more, please check the article with a short description and an example of usage. The readme of the project contains the current full documentation and more details. It is not yet ready for production but it would be great if you could try it out. For now, it supports creating, updating, and deleting tables, columns, and indexes in PostgreSQL. Any feedback is really appreciated. And I hope you will enjoy using automigrate!

kurtharriger · January 28, 2022, 5:27am

One thing I would encourage you to think about is that many modern web apps generally need to support zero downtime deploys and many of these database evolution tools while convenient for local development often assume that only one version of the app is running at a time and that migrations are effectively instant, and because neither of these assumptions are generally true in a modern production environment I generally avoid using these tools in practice.

Most of these tools will prevent your application from starting if the database version is older until the migration has run and perhaps worse many also fail the application if the database is newer than the expected version.

When you deploy a new version of an web app in production you may be deploying many new nodes at the same time and many if not all of these nodes need to start before the old nodes are terminated. This implies that both old and new versions of an application will be running at the same time and any changes that are not backward compatible may cause the currently deployed version to stop working properly. And if an old node stops working it may fail health checks and be restarted which may cause the old node to attempt to rollback to the previous database or refuse to start, neither of which is a good situation. It should be perfectly fine for previous and next version of an app to coexist peacefully for some duration of time during deployments.

Additionally, some database migrations such as adding indexes can take hours to run on a large database. Because deployments often involve starting many nodes at about the same time, deployment is perhaps the worst time to apply database migrations. Only one node can acquire the database migration lock and all other nodes are blocked at startup waiting for database lock. This prevents the service from auto scaling and new nodes that aren’t able to start can fail health checks and be terminated and restarted with the assumption that they are not healthy, and in some environments this might even trigger an automatic rollback to the previous release and if the migration was not backward compatible the previous release may no longer even work which is also a really bad place to be.

For zero downtime deployments, migrations must be backward compatible with the previous deployment observing that deployments and migrations require a non-zero amount of time.
You can certainly use these tool to perform migrations as long as they don’t fail your app during startup if version is +/- 1 revision and you are disciplined enough to enforce backward compatibility.

Instead of renaming a column you need to

add a new nullable column, unless you are using triggers to populate the new column the new column must be nullable or the old version of app will no longer be able to insert data into the table and the app can’t be updated until the column has been added, therefore it must be nullable.
once column has been added the code can now be updated to populate both the new and old column
data migration script. since there is a non-zero amount of time from when you added the new column and when you updated the code to populate the new column it is likely the prior version has inserted new rows containing only values in old column leaving new column null so an additional step may be needed to populate any nulls. This doesn’t need to be a database script it could be done in application code.
remove references to old column from code and deploy
now you may safely remove the old column.

That is quite a bit of work, so in practice you are unlikely to rename a column just because, however if you need to change the data type you might do something like that.

The other thing I recommend for zero down time deployments is don’t run migrations during deployments. Wait until the app has been fully deployed and healthy then start the database migration. This can still be automated, but in practice large database migrations can still be risky causing lock timeouts and or high latency that you will likely want to manually trigger and monitor during a production deploy. You might run job if you are using Kubernetes or perhaps even just hit an admin/ops http endpoint on your service to start the process when ready. If you don’t have a large db and you want the process to be fully automated then just run a scheduled job to run any migrations 10-15 minutes post deploy, keeping in mind only one node needs to run the migration script and in some cases such as when using application code to backfill a newly added column its most ideal if this node performing the migration is not also serving production traffic as it may experience higher latencies and memory usage as it scrolls through all the database records.

That said I do believe database migrations can and should be automated, included in source control, reviewed and approved by the team before they are run and although zero down time deploys may not be one of your requirements and for most apps a small amount of downtime is fine, I would love to see one of these migration tools actually enable safe zero downtime deploys rather than introduce more problems.

It would be great if your library had the option to require backward compatible changes or at least warn loudly if it may not work with previous version (eg dropping column after rename is unsafe if old app is still using it, but perfectly fine after it is no longer used) and allow previous and next versions to safely coexist during deployments without forcing an upgrade or rollback locking the database at the worst of all times.

abogoyavlensky · January 28, 2022, 9:57am

Hi @kurtharriger! Thank you for your extended comment! And you are right about many topics related to zero-downtime deploy and migrations. But I think that a migration tool could help to keep DB schema changes in order and do not conflict with zero-downtime deploy.

I think that the backward compatibility of a database schema and how the app uses it is the responsibility of a programmer but not the tool. For instance, the whole process you described in detail about renaming the column, you could implement using this tool (and any other migration tool) by creating migrations for every step separately and applying them between deploys of an app. I suppose that even the third point from that list could be implemented by the data migration clj-file (which is not ready yet but it is in the roadmap of automigrate). The tool itself does not encourage you to include applying migration to the deploy process or at the start of an app. You could have separated step in the CI pipeline, or the whole separated CI pipeline just for migration, or even apply it by hand before/after the whole app has been deployed successfully. It is the decision of a programmer.

Thank you for the advice about warning for backward compatibility. For example, a warning about renaming or adding a new column could be really helpful. I think it is a cool idea and I will add it to the roadmap.

system · July 29, 2022, 9:57pm

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.