Saturday, August 16, 2008

Hibernate's hbm2ddl automagic and why you shouldn't use it on your agile project

Hibernate now has a new1 feature: hbm2ddl. That stands for Hibernate Mappings to Data Definition Language. Which means that hibernate will read the mapping files, figure out what kind of a database it is connecting to (SQL Server, Oracle, ...) and automagically generate and execute the appropriate DDL commands required to create the correct schema. So no more hassle of writing and maintaining dialect specific SQL scripts for each kind of database that you need your application to work with. What's not to like about that, you ask?

Well, here is what I don't like. hbm2ddl assumes that it is executing against a clean schema. If the database has an existing schema or data, it WILL blow those away. Not good at all. Especially not good in an agile project which updates schemas like so. Bottom line: I'll stick to using my update scripts for now, thank you very much.



PS: Don't get me wrong, I love hibernate most of the time. It's a great tool for mapping objects to the DB without too much hassle. It makes life a lot easier most of the time. It's just that this particular feature caught me totally by surprise. And I hate these kind of surprises!



1: For a given value of new.

Saturday, April 12, 2008

Eclipse Shortcuts: or To Eclipse and Back Again

Recently I shifted from coding in VisualStudio+ReSharper to coding in Eclipse. Again. And me being very much a keyboard shortcuts person, I needed to find Eclipse equivalents for all my favourite ReSharper shortcuts. Déjà bloody vu. So I decided I'll spend some time and record the Eclipse versions of my oft used shortcuts for the next time. Here they are:

Most Used
Ctl+Space : completion
Ctl+1 : Quick fix suggestions
F2 : Show tooltip (javadoc, help)
Navigate
F3  : Go to declaration
Ctl+Sh+T : Open type (class)
Ctl+Sh+R : Open resource (file)
Ctl+O : list class members
Ctl+. : Next error/warning/search result
Ctl+E : switch to an open file
Alt+Left : Previous cursor location
Alt+Right : Next cursor location
Ctl+Q : Last edit location
F12 : activate editor
Ctl+M : maximise or restore active view
Ctl+J : Incremental search
Ctl+Sh+J : Incremental search backwards
Ctl+Sh+G : Find references in workspace
Ctl+Alt+H : Open call hierarchy
Alt+Sh+Q,S : open search view
Ctl+F7 : cycle through views
Alt+Sh+Q,P : package explorer
Debug
Alt+Sh+D : Debug menu
Ctl+Sh+B : toggle breakpoint
Ctl+Sh+I : inspect
F5 : Step into
F6 : Step over
F7 : Step out
F8 : Resume
Launch
Alt+Sh+X, T : Run tests
Ctl+F11 : Run last launched
Refactor

Alt+Sh+T : Refactoring Menu
Alt+Sh+M : Extract Method
Alt+Sh+R : Rename
Alt+Sh+C : Change method signature
Alt+Sh+I : Inline variable/method
Alt+Sh+L : Extract local variable
Alt+Sh+V : Move
Code
Ctl+Sh+F : Format
Ctl+Sh+O : get rid of redundant import statements
Alt+Sh+Up : select enclosing element/widen selection
Alt+Sh+Down : narrow selection
Alt+Sh+S : Source menu (override, create constructor,...)
Alt+Sh+Z : Surround with menu (try/catch, for,...)
Alt+/ : word completion
Ctl+Alt+Up : duplicate line

One of these days I should make this list for ReSharper and IDEA as well.

In the meanwhile, if there are any of your favourites I've missed, do let me know.

Tuesday, February 12, 2008

The Magic Number: Another Reason to Learn Your Tools

We developers are often paid to get productive quickly on a large, unfamiliar code base. There are many ways of doing this. One way of NOT doing this is to start from public static void main1 and work your way through each function or class or module till you understand what you want - that way lies failure and madness.

I can think of 3 ways of getting to grips with new code:
  • Use external sources: Talk to others who understand the code base better; Read the documentation; Watch a tutorial screencast; ...
  • Use the debugger: Step through the code and see what it does.
  • Use the Source, Luke: Read through the code.
Of these, I've found the last the hardest. It is this skill - reading through code - that I want to talk about today.

When I say "read through the code", I use read in a sense that is very different from reading a novel or even a text book. It's more of jumping around the code base, reading a bit here, taking in a whole class at a glance, poring over a single method for over an hour, finding the usages of said method, ... If you are a programmer, you know exactly what I'm talking about. While doing this, you are attempting to build a mental picture of what is going on in the code. This is where the Magic Number comes in.

The idea is that human beings2 can only handle a limited number of pieces of information in their working memory. You can think of working memory as the set of registers on your computer's CPU - all of your thinking happens on pieces of information that is in the working memory, and there is a limited amount of it available. Even though you may not agree with the range 5 to 9 specified in the article, for the purpose of this post, it is enough if you accept that there is a limit.

Here's how I think the code reading process works: You start off with understanding a small piece of code - piece A, let's say. Then piece B. And D. And then how B and D are related via C. And Z. Five of the slots in your working memory are now filled. Suddenly you realise that all these pieces are part of the English alphabet - and boom! five used slots shrinks to one used slot! The larger chunk - English Alphabet - occupies just one slot worth of space. You are essentially building up more powerful abstractions with which you are able to explain the purpose of ever larger swathes of code. So your ability to comprehend code depends on how many free slots you have in your working memory.

"Ok, even if all that is true, where do keyboard shortcuts come in?", you ask. Well, which keyboard shortcut to use is a piece of information too. If you are not very familiar with your IDE, then thinking about which keyboard shortcut to use will take up one or two registers in your working memory. Your brain has less free registers and hence is a slower computer. On the other hand, if you are really familiar with your IDE, your fingers automatically press the right key-chord when your brain thinks "I wonder what that method does?" and take you to the definition of the function. This process doesn't use the 5-9 registers at all, but is muscle memory a la riding the bicycle or swimming. More registers to understand the code implies faster understanding.

There is another factor involved as well: frustration. When you are struggling with the tool, the resulting frustration will make you want to take more breaks from your task of understanding the code. When the tool is almost an extension of you, the feeling of being in control will allow you to work longer periods. This is one way the human brain is different from computers - the latter doesn't get tired.

I don't have proof to offer for all the claims I'm making; I do have anecdotal evidence though.
Exhibit 1: While reading Java code, I'm much faster while using IntelliJ IDEA than while using Eclipse. I believe that Eclipse and IntelliJ IDEA are pretty much equal feature-wise. I attribute my higher productivity on IntelliJ IDEA entirely to my much greater familiarity with its shortcuts.
Exhibit 2: I find it easy to switch between reading C# and Java. I think a lot of the credit should go to JetBRAINS' - their masterstroke of reusing the IntelliJ IDEA's shortcuts in ReSharper means minimal muscle memory retraining when switching between these languages.

All of the above are arguments for learning to use your IDE really well - they could apply equally to using the mouse as well. Here's why I prefer the keyboard: accuracy and speed. With the mouse, I'm a bit of klutz. I click the wrong piece of code. I select the wrong region. I'm really slow at reading the available menu options and very likely to pick the wrong one even after all the reading. And my hands ache when I use the mouse for a while. With the keyboard, especially my trusty old Kinesis Classic, I'm fast and accurate. This leads to less mistakes, less frustration and more time spent reading and processing code. Ergo, Happy Developer.

And that's another reason to learn touch typing!

Are you a predominantly mouse person or a keyboard person? Either way, what are your reasons for preferring one over the other?



1 Or its moral equivalent in your favourite language.
2 That typically includes us programmmers as well.

Thursday, January 31, 2008

A Schema to Survive the Winds of Change


Today we rejoin our intrepid hero1 in his journey through the land of database schema changes where many dangers and adventures await the unsuspecting traveler. This is the post release phase of the journey - if you wish to refresh your memory of the journey pre-release, please go to my previous post.

Our hero is attending the post-release party. The party is quite post2. The customer button holes our hero and congratulates him: "You've done a great job. All the users are raving about your product. And we have great plans for the future". Even in his drunken stupor, our hero knows what that means - more work. Still, that's better than no work.

The next day, our hero finds out what the customer meant by "great plans". He learns about Sarbanes-Oxley and Basel II compliance; the customer's plan to roll the software out to the poor suckers in the international branches3; and some other functionality they just can't live without. "And bug fixes. There are a few production bugs we4 simply have to fix", says the customer.

Since the customer wants the bugs fixed yesterday, while the new requirements will take a lot longer, there is a mismatch in time lines. There's also the small matter of the customer not willing to accept a bug fix release that includes any partially done new features - he fears for the stability of the application in production and for the safety of his precious data. So our hero is forced to work with branches - one support branch where all the bug fixes go, and the trunk where the code for all the additional features goes. He bravely enters the forbidding jungle of Merge - a vast area marked in all the maps of software development with the legend "Keep Out! Here be Dragons!" - armed only with his trusty sword of Intellect and his favourite merge tool. He also asks his manager for one man day per week to merge changes from the bug-fix branch on to the trunk.

Our hero journeys uneventfully for a few weeks. Production bugs are fixed on the branch. In the fullness of time, these get merged into trunk. Testing effort is doubled for all the production bugs, since they have to be tested on the trunk as well. Thankfully, automated functional tests come to our hero's rescue and reduce the overall testing effort.

While doing the merge one week, our hero notices that there is a new alter script in the production support branch - and it has the same number as an alter script in trunk! Once it gets merged into trunk, there will be two different alter scripts with the same number. This is a problem. Since an alter script should not be run twice while upgrading the schema to a later version, they are written with a guard, like so:
if (select appliedUpto from AppliedAlters) < 1000
begin
alter table customer add initial varchar2(10) null
update AppliedAlters set appliedupto = 1000
end
Two alter scripts with the same number results in only one of them getting run. Definitely A Problem.

Since the problem originated with the branch and the trunk having alter scripts with the same number, our hero first considers reserving a set of numbers for each branch - scripts on the production support branch would have numbers between 1000 and 1100, while the scripts on trunk would have numbers greater than 1100. His manager comes by and asks him to consider all the upgrade scenarios. So our hero takes out his yellow legal pad and notes down these upgrade scenarios:
  1. From an earlier version to a later version on the production support branch
  2. From an earlier version to a later version on trunk
  3. From a version on production support to a version on trunk
He realises that scenario 3 will not work very well, even with the reserved alter numbers. Take, for example, the case of a trunk install, when the DB has had alter number 1300 applied on it, but a later merge from the production support branch introduces alter number 1050 into trunk. The upgrade will fail to apply the alter script with version 1050, since the DB only keeps track of the highest applied alter script number.

Our hero ponders a while and concludes that he has no option but to keep track of all the individual alter script numbers that have ever been applied to the DB. Which means that the alter scripts start looking like this:
if (select count(*) from 
AppliedAlters where appliedNumber = 1000) = 0
begin
alter table customer add initial varchar2(10) null
insert into AppliedAlters values (1000)
end

Does this take care of all the upgrade scenarios outlined earlier? Let's check:
  1. From an earlier version to a later version on the production support branch: Alters applied so far - 0 to 1049. New alter to be applied - 1050. This upgrade is easy. Even the earlier numbering scheme can handle this.

  2. From an earlier version to a later version on trunk: There are two cases here.
    1. Alters applied so far - 0 to 1049 and 1101 to 1250. New alter to be applied - 1251. No problem, even with the earlier numbering scheme.
    2. Alters applied so far - 0 to 1049 and 1101 to 1250. New alter to be applied - 1050. As the number indicates, this new alter came from a merge. This would've broken the old numbering scheme, but the new way of tracking the alter scripts realises that 1050 is not yet applied and so applies it - easy peasy.

  3. From a version on production support to a version on trunk: Alters applied so far - 0 to 1049. New alters to be applied - 1050 and 1101 to 1250. Depending on the order in which the alters were applied, even the old tracking scheme could've handled this. The new numbering scheme will definitely handle this.
So once again, the day is saved, thanks to the Powerpuff Girls our hero.

This article closely reflects what we're currently doing in our team. If you've faced a similar problem, do tell me how you've solved it. If you used this technique or something similar, please do let me know how successful it was for you.



1 That's you. And if you are a she, then please excuse my use of he and hero.
2The customer wanted to make sure that the programmers had really created what he wanted, before he shelled out good money feeding the creators.
3 Which our hero translated to mean localization into Spanish, Tamil and Swahili and multiple currency units based on his prior experience with other customers and his knowledge of the customer's operations.
4 we = you, in this case. I bet it totally irritates you when some one says "we" when they actually mean you, especially in the context of work that needs to be done.

Sunday, January 27, 2008

A Brief History of a Database

You start your project from a clean slate. The design is clean - it meets the customer's requirements elegantly. The DB schema is normalised* and contains exactly what is needed by the code - no more, no less. You create the DB schema. You have an experienced DB Admin in your team who understands that your app will need to be deployed many times. He advises you to save the schema creation scripts. You type out all the SQL scripts that are needed to create your beautiful schema - the tables in all their glory, with all their foreign keys, indices, triggers and not null constraints. You've been reading the right books and articles which advice you to keep everything that is needed to get your program up and running in source control, so you check in the schema creation script into subversion like the good developer that you are. And you start coding.

A week passes by and you figure out that the schema needs to be changed a little - nothing drastic, just a small change - because "the $%*& customer changed his mind" or "the design doesn't handle that *&%$#@! complicated scenario". And boy, are you glad you've checked in the schema creation scripts! You have an easy way of making sure the code and the schema stay in sync - just make sure you pull everything from subversion at the same revision. Secure in the thought that your schema and code will stay in sync forever, you make your edits confidently.

Weeks pass by. One fine morning one of the QA folks sitting near the pantry come by and tell you, "Hey, we need to test a bug you've fixed in the latest build".
"Sure, no problem. Just take the build from \\kilimanjaro\builds\latest".
"Yeah, we already did that - but it doesn't work with the DB dump we have".
"Db dump"?
"Yeah, we realised we were spending too much time setting up data for our testing, so we took a Db dump of our setup data. When I try to install on top of that dump, I get a sql exception which says something about a table with the same name already existing".
Of course! The installer is trying to create a table that already exists. It would've saved lots of work if you could just drop the whole schema and recreate it using the create scripts. The QA folks will scream bloody murder if you do that, since they'll lose their precious data.

You now need to understand what changes have been made to the schema since the build was last installed on the QA machine and apply those changes without disturbing existing data. You find the build number and map it back to the subversion check in - maybe that is easy or maybe you need to go look at the logs in CruiseControl. Then you diff the appropriate revisions of the schema creation script and you know what changed. And then you modify the QA DB by adding columns, removing columns, etc. while making sure you preserve data. Thankfully you just have a few weeks of schema changes to wade through. This only takes 2 days. And nights. And the Saturday. After all that, you have to figure out a way to run the installer after turning off the DB creation. All in all, not pleasant at all.

Most importantly, you learn something: Data typically outlives the program that operates on it. You can't afford to drop and recreate tables every install - what if this had been the customers upgrading from one release to another?

So you read up a bit. Google leads you to DbDeploy. You find Refactoring Databases fascinating. You decide you'll write alter scripts. When you need to move a column from one table to another, you'll no longer modify the two create table scripts; you will instead write
alter table tableName add column columnName;
update tableName set columnName = (select from otherTable where relationBetweenTables);
alter table otherTable drop column columnName;
Yes, "update table" - so that the customer doesn't lose her data. While you are at it, you also write an undo script to roll back any change you apply and number your scripts and use the DB to keep track of which scripts have been run.

And you live happily ever after.

Yeah, right! As you well know, developing enterprise applications is no fairy tale. Next week, on to the world of branches and how they affect your DB.

Meanwhile, do tell me of different strategies you've used to change your DB over time and how they have worked for you.



* BCNF, no less!

Friday, January 25, 2008

Shortcuts to Impressing Developers

If you have been wondering whether it is worth your while to learn the keyboard shortcuts on your favorite IDE*, then here's another reason apart from the obvious one of blazing fast productivity.

I often teach a course on Object Oriented Programming and Design Patterns called the Object Bootcamp. The participants are typically other developers. I'm yet to see two developers who will not disagree about at least one aspect of Object Oriented Programming. Given that the course advocates a particular style of OO which would surprise the average programmer, getting the audience to listen is crucial.

Thankfully, the course is very hands on - it involves getting the participants to write some code to solve a problem and then critiquing** their code. I also normally refactor the code that I'm critiquing. Navigating the code and refactoring, both afford excellent opportunities to show off my skills with the IDE. Being able to select a block of code, do a Ctl+Alt+M and have IntelliJ Idea or ReSharper extract that block into a well-named method, as well as highlight and replace other similar code in the file, usually impresses other developers no end. Even simpler features like hitting Ctrl+B to get to the definition of the class or method are often quite alien to most developers. It makes you, in their eyes, a person who knows where his/her towel is and hence they are more amenable to letting you have your say.

The opportunities to impress others with your keyboard shortcuts are legion. Maybe you are giving a presentation to an audience which involves showing some code. Maybe you are interviewing for a company like my employer ThoughtWorks which invites you to pair program with one of your future colleagues as part of the hiring process. Maybe you are collaborating with your boss on creating a Word doc. In each of these situations, knowing and using keyboard shortcuts can give you that aura of competence - at least for a while.

What has your experience been with keyboard shortcuts? Have you seen others react differently to people who use keyboard shortcuts?


* You do use an IDE, right?
** Critiquing = PC terminology for projecting the code for the whole class to see and tearing it to pieces, thus publicly humiliating the author. The theory is that this builds character.;-). Just kidding! Developers seem to learn very well from their mistakes.

Saturday, January 19, 2008

The Seductive Charm of The First Post!

Take a look at the first blog entry on a random sample of a few of your friends' blogs. Chances are you'll find what I found recently while trying to decide what my first post should be: many blogs start with "This is my first post", "Finally, I've started blogging" or some variation on this general theme. Some of the blog entries stop there, while others go on to explain what the author will blog about and/or why the author is blogging. Eerily similar to the horde of "Programming in LanguageX" books which all start with a program to print "Hello, World". I'm betting, in ancient Egypt, after papyrus was invented, the first hieroglyphic diary started with "Dear Diary, I've decided to record my daily thoughts here" or something of that sort. I've never been any good as a rebel, so here goes my first post:
I'm blogging! This is my first post! As the name of the blog suggests, I'm a programmer* and I love coding. I hope to write about coding, programming languages, software development processes, problems I've faced (and their solution when I know them), how to teach others to code and people who inhabit the world of software development. Above all, I hope to blog regularly**.
There. I've done it. The first post in all its splendor is out there.

Let's talk about why people have this fascination with first posts.
  • I think we all have a fascination with beginnings. Count the number of people who make New Year resolutions, if you want evidence.
  • We like some fanfare when we begin an activity. And this might give us that initial determination to press ahead, till the activity becomes a habit.
  • If the first post makes the author think about and express his/her goals, it will probably help with the determination part as well.
What do you think? Have you done a "first post" blog? What was your motivation in doing that?


* Who obviously thinks he is God! Don't they all?
** Regularly = About once a week.


PS: If you are wondering "what is with the asterisks and footnotes", they are an attempt to ape my all time favorite author Terry Pratchett's style