Thursday, January 31, 2008

A Schema to Survive the Winds of Change


Today we rejoin our intrepid hero1 in his journey through the land of database schema changes where many dangers and adventures await the unsuspecting traveler. This is the post release phase of the journey - if you wish to refresh your memory of the journey pre-release, please go to my previous post.

Our hero is attending the post-release party. The party is quite post2. The customer button holes our hero and congratulates him: "You've done a great job. All the users are raving about your product. And we have great plans for the future". Even in his drunken stupor, our hero knows what that means - more work. Still, that's better than no work.

The next day, our hero finds out what the customer meant by "great plans". He learns about Sarbanes-Oxley and Basel II compliance; the customer's plan to roll the software out to the poor suckers in the international branches3; and some other functionality they just can't live without. "And bug fixes. There are a few production bugs we4 simply have to fix", says the customer.

Since the customer wants the bugs fixed yesterday, while the new requirements will take a lot longer, there is a mismatch in time lines. There's also the small matter of the customer not willing to accept a bug fix release that includes any partially done new features - he fears for the stability of the application in production and for the safety of his precious data. So our hero is forced to work with branches - one support branch where all the bug fixes go, and the trunk where the code for all the additional features goes. He bravely enters the forbidding jungle of Merge - a vast area marked in all the maps of software development with the legend "Keep Out! Here be Dragons!" - armed only with his trusty sword of Intellect and his favourite merge tool. He also asks his manager for one man day per week to merge changes from the bug-fix branch on to the trunk.

Our hero journeys uneventfully for a few weeks. Production bugs are fixed on the branch. In the fullness of time, these get merged into trunk. Testing effort is doubled for all the production bugs, since they have to be tested on the trunk as well. Thankfully, automated functional tests come to our hero's rescue and reduce the overall testing effort.

While doing the merge one week, our hero notices that there is a new alter script in the production support branch - and it has the same number as an alter script in trunk! Once it gets merged into trunk, there will be two different alter scripts with the same number. This is a problem. Since an alter script should not be run twice while upgrading the schema to a later version, they are written with a guard, like so:
if (select appliedUpto from AppliedAlters) < 1000
begin
alter table customer add initial varchar2(10) null
update AppliedAlters set appliedupto = 1000
end
Two alter scripts with the same number results in only one of them getting run. Definitely A Problem.

Since the problem originated with the branch and the trunk having alter scripts with the same number, our hero first considers reserving a set of numbers for each branch - scripts on the production support branch would have numbers between 1000 and 1100, while the scripts on trunk would have numbers greater than 1100. His manager comes by and asks him to consider all the upgrade scenarios. So our hero takes out his yellow legal pad and notes down these upgrade scenarios:
  1. From an earlier version to a later version on the production support branch
  2. From an earlier version to a later version on trunk
  3. From a version on production support to a version on trunk
He realises that scenario 3 will not work very well, even with the reserved alter numbers. Take, for example, the case of a trunk install, when the DB has had alter number 1300 applied on it, but a later merge from the production support branch introduces alter number 1050 into trunk. The upgrade will fail to apply the alter script with version 1050, since the DB only keeps track of the highest applied alter script number.

Our hero ponders a while and concludes that he has no option but to keep track of all the individual alter script numbers that have ever been applied to the DB. Which means that the alter scripts start looking like this:
if (select count(*) from 
AppliedAlters where appliedNumber = 1000) = 0
begin
alter table customer add initial varchar2(10) null
insert into AppliedAlters values (1000)
end

Does this take care of all the upgrade scenarios outlined earlier? Let's check:
  1. From an earlier version to a later version on the production support branch: Alters applied so far - 0 to 1049. New alter to be applied - 1050. This upgrade is easy. Even the earlier numbering scheme can handle this.

  2. From an earlier version to a later version on trunk: There are two cases here.
    1. Alters applied so far - 0 to 1049 and 1101 to 1250. New alter to be applied - 1251. No problem, even with the earlier numbering scheme.
    2. Alters applied so far - 0 to 1049 and 1101 to 1250. New alter to be applied - 1050. As the number indicates, this new alter came from a merge. This would've broken the old numbering scheme, but the new way of tracking the alter scripts realises that 1050 is not yet applied and so applies it - easy peasy.

  3. From a version on production support to a version on trunk: Alters applied so far - 0 to 1049. New alters to be applied - 1050 and 1101 to 1250. Depending on the order in which the alters were applied, even the old tracking scheme could've handled this. The new numbering scheme will definitely handle this.
So once again, the day is saved, thanks to the Powerpuff Girls our hero.

This article closely reflects what we're currently doing in our team. If you've faced a similar problem, do tell me how you've solved it. If you used this technique or something similar, please do let me know how successful it was for you.



1 That's you. And if you are a she, then please excuse my use of he and hero.
2The customer wanted to make sure that the programmers had really created what he wanted, before he shelled out good money feeding the creators.
3 Which our hero translated to mean localization into Spanish, Tamil and Swahili and multiple currency units based on his prior experience with other customers and his knowledge of the customer's operations.
4 we = you, in this case. I bet it totally irritates you when some one says "we" when they actually mean you, especially in the context of work that needs to be done.

Sunday, January 27, 2008

A Brief History of a Database

You start your project from a clean slate. The design is clean - it meets the customer's requirements elegantly. The DB schema is normalised* and contains exactly what is needed by the code - no more, no less. You create the DB schema. You have an experienced DB Admin in your team who understands that your app will need to be deployed many times. He advises you to save the schema creation scripts. You type out all the SQL scripts that are needed to create your beautiful schema - the tables in all their glory, with all their foreign keys, indices, triggers and not null constraints. You've been reading the right books and articles which advice you to keep everything that is needed to get your program up and running in source control, so you check in the schema creation script into subversion like the good developer that you are. And you start coding.

A week passes by and you figure out that the schema needs to be changed a little - nothing drastic, just a small change - because "the $%*& customer changed his mind" or "the design doesn't handle that *&%$#@! complicated scenario". And boy, are you glad you've checked in the schema creation scripts! You have an easy way of making sure the code and the schema stay in sync - just make sure you pull everything from subversion at the same revision. Secure in the thought that your schema and code will stay in sync forever, you make your edits confidently.

Weeks pass by. One fine morning one of the QA folks sitting near the pantry come by and tell you, "Hey, we need to test a bug you've fixed in the latest build".
"Sure, no problem. Just take the build from \\kilimanjaro\builds\latest".
"Yeah, we already did that - but it doesn't work with the DB dump we have".
"Db dump"?
"Yeah, we realised we were spending too much time setting up data for our testing, so we took a Db dump of our setup data. When I try to install on top of that dump, I get a sql exception which says something about a table with the same name already existing".
Of course! The installer is trying to create a table that already exists. It would've saved lots of work if you could just drop the whole schema and recreate it using the create scripts. The QA folks will scream bloody murder if you do that, since they'll lose their precious data.

You now need to understand what changes have been made to the schema since the build was last installed on the QA machine and apply those changes without disturbing existing data. You find the build number and map it back to the subversion check in - maybe that is easy or maybe you need to go look at the logs in CruiseControl. Then you diff the appropriate revisions of the schema creation script and you know what changed. And then you modify the QA DB by adding columns, removing columns, etc. while making sure you preserve data. Thankfully you just have a few weeks of schema changes to wade through. This only takes 2 days. And nights. And the Saturday. After all that, you have to figure out a way to run the installer after turning off the DB creation. All in all, not pleasant at all.

Most importantly, you learn something: Data typically outlives the program that operates on it. You can't afford to drop and recreate tables every install - what if this had been the customers upgrading from one release to another?

So you read up a bit. Google leads you to DbDeploy. You find Refactoring Databases fascinating. You decide you'll write alter scripts. When you need to move a column from one table to another, you'll no longer modify the two create table scripts; you will instead write
alter table tableName add column columnName;
update tableName set columnName = (select from otherTable where relationBetweenTables);
alter table otherTable drop column columnName;
Yes, "update table" - so that the customer doesn't lose her data. While you are at it, you also write an undo script to roll back any change you apply and number your scripts and use the DB to keep track of which scripts have been run.

And you live happily ever after.

Yeah, right! As you well know, developing enterprise applications is no fairy tale. Next week, on to the world of branches and how they affect your DB.

Meanwhile, do tell me of different strategies you've used to change your DB over time and how they have worked for you.



* BCNF, no less!

Friday, January 25, 2008

Shortcuts to Impressing Developers

If you have been wondering whether it is worth your while to learn the keyboard shortcuts on your favorite IDE*, then here's another reason apart from the obvious one of blazing fast productivity.

I often teach a course on Object Oriented Programming and Design Patterns called the Object Bootcamp. The participants are typically other developers. I'm yet to see two developers who will not disagree about at least one aspect of Object Oriented Programming. Given that the course advocates a particular style of OO which would surprise the average programmer, getting the audience to listen is crucial.

Thankfully, the course is very hands on - it involves getting the participants to write some code to solve a problem and then critiquing** their code. I also normally refactor the code that I'm critiquing. Navigating the code and refactoring, both afford excellent opportunities to show off my skills with the IDE. Being able to select a block of code, do a Ctl+Alt+M and have IntelliJ Idea or ReSharper extract that block into a well-named method, as well as highlight and replace other similar code in the file, usually impresses other developers no end. Even simpler features like hitting Ctrl+B to get to the definition of the class or method are often quite alien to most developers. It makes you, in their eyes, a person who knows where his/her towel is and hence they are more amenable to letting you have your say.

The opportunities to impress others with your keyboard shortcuts are legion. Maybe you are giving a presentation to an audience which involves showing some code. Maybe you are interviewing for a company like my employer ThoughtWorks which invites you to pair program with one of your future colleagues as part of the hiring process. Maybe you are collaborating with your boss on creating a Word doc. In each of these situations, knowing and using keyboard shortcuts can give you that aura of competence - at least for a while.

What has your experience been with keyboard shortcuts? Have you seen others react differently to people who use keyboard shortcuts?


* You do use an IDE, right?
** Critiquing = PC terminology for projecting the code for the whole class to see and tearing it to pieces, thus publicly humiliating the author. The theory is that this builds character.;-). Just kidding! Developers seem to learn very well from their mistakes.

Saturday, January 19, 2008

The Seductive Charm of The First Post!

Take a look at the first blog entry on a random sample of a few of your friends' blogs. Chances are you'll find what I found recently while trying to decide what my first post should be: many blogs start with "This is my first post", "Finally, I've started blogging" or some variation on this general theme. Some of the blog entries stop there, while others go on to explain what the author will blog about and/or why the author is blogging. Eerily similar to the horde of "Programming in LanguageX" books which all start with a program to print "Hello, World". I'm betting, in ancient Egypt, after papyrus was invented, the first hieroglyphic diary started with "Dear Diary, I've decided to record my daily thoughts here" or something of that sort. I've never been any good as a rebel, so here goes my first post:
I'm blogging! This is my first post! As the name of the blog suggests, I'm a programmer* and I love coding. I hope to write about coding, programming languages, software development processes, problems I've faced (and their solution when I know them), how to teach others to code and people who inhabit the world of software development. Above all, I hope to blog regularly**.
There. I've done it. The first post in all its splendor is out there.

Let's talk about why people have this fascination with first posts.
  • I think we all have a fascination with beginnings. Count the number of people who make New Year resolutions, if you want evidence.
  • We like some fanfare when we begin an activity. And this might give us that initial determination to press ahead, till the activity becomes a habit.
  • If the first post makes the author think about and express his/her goals, it will probably help with the determination part as well.
What do you think? Have you done a "first post" blog? What was your motivation in doing that?


* Who obviously thinks he is God! Don't they all?
** Regularly = About once a week.


PS: If you are wondering "what is with the asterisks and footnotes", they are an attempt to ape my all time favorite author Terry Pratchett's style