Sunday, January 27, 2008

A Brief History of a Database

You start your project from a clean slate. The design is clean - it meets the customer's requirements elegantly. The DB schema is normalised* and contains exactly what is needed by the code - no more, no less. You create the DB schema. You have an experienced DB Admin in your team who understands that your app will need to be deployed many times. He advises you to save the schema creation scripts. You type out all the SQL scripts that are needed to create your beautiful schema - the tables in all their glory, with all their foreign keys, indices, triggers and not null constraints. You've been reading the right books and articles which advice you to keep everything that is needed to get your program up and running in source control, so you check in the schema creation script into subversion like the good developer that you are. And you start coding.

A week passes by and you figure out that the schema needs to be changed a little - nothing drastic, just a small change - because "the $%*& customer changed his mind" or "the design doesn't handle that *&%$#@! complicated scenario". And boy, are you glad you've checked in the schema creation scripts! You have an easy way of making sure the code and the schema stay in sync - just make sure you pull everything from subversion at the same revision. Secure in the thought that your schema and code will stay in sync forever, you make your edits confidently.

Weeks pass by. One fine morning one of the QA folks sitting near the pantry come by and tell you, "Hey, we need to test a bug you've fixed in the latest build".
"Sure, no problem. Just take the build from \\kilimanjaro\builds\latest".
"Yeah, we already did that - but it doesn't work with the DB dump we have".
"Db dump"?
"Yeah, we realised we were spending too much time setting up data for our testing, so we took a Db dump of our setup data. When I try to install on top of that dump, I get a sql exception which says something about a table with the same name already existing".
Of course! The installer is trying to create a table that already exists. It would've saved lots of work if you could just drop the whole schema and recreate it using the create scripts. The QA folks will scream bloody murder if you do that, since they'll lose their precious data.

You now need to understand what changes have been made to the schema since the build was last installed on the QA machine and apply those changes without disturbing existing data. You find the build number and map it back to the subversion check in - maybe that is easy or maybe you need to go look at the logs in CruiseControl. Then you diff the appropriate revisions of the schema creation script and you know what changed. And then you modify the QA DB by adding columns, removing columns, etc. while making sure you preserve data. Thankfully you just have a few weeks of schema changes to wade through. This only takes 2 days. And nights. And the Saturday. After all that, you have to figure out a way to run the installer after turning off the DB creation. All in all, not pleasant at all.

Most importantly, you learn something: Data typically outlives the program that operates on it. You can't afford to drop and recreate tables every install - what if this had been the customers upgrading from one release to another?

So you read up a bit. Google leads you to DbDeploy. You find Refactoring Databases fascinating. You decide you'll write alter scripts. When you need to move a column from one table to another, you'll no longer modify the two create table scripts; you will instead write
alter table tableName add column columnName;
update tableName set columnName = (select from otherTable where relationBetweenTables);
alter table otherTable drop column columnName;
Yes, "update table" - so that the customer doesn't lose her data. While you are at it, you also write an undo script to roll back any change you apply and number your scripts and use the DB to keep track of which scripts have been run.

And you live happily ever after.

Yeah, right! As you well know, developing enterprise applications is no fairy tale. Next week, on to the world of branches and how they affect your DB.

Meanwhile, do tell me of different strategies you've used to change your DB over time and how they have worked for you.



* BCNF, no less!

No comments: