Ruthlessly Helpful

Stephen Ritchie's offerings of ruthlessly helpful software engineering practices.

When To Use Database-First

Code-centric development using an object-relational mapping (ORM) tool has a workflow that many developers find comfortable. They feel productive using the ORM in this way, as opposed to starting with the database model. There are a number of good posts out there on the Entity Framework 4.1 code-first capabilities: MSDN, MSDN Magazine, Scott Guthrie, Channel 9, and Scott Hanselman’s Magic Unicorn Feature. It makes sense to the object-oriented developer and writing code-first comes very naturally.

This prompts the question: When would it be better to take a database-first approach?

For many legacy and Brownfield projects the answer is obvious. You have a database that’s already designed — you may even be stuck with it — therefore you choose database-first approach. This is the defining need for database-first because the database is a fixed point. And so, use database-first when the database design comes from an external requirement or is controlled outside the scope or influence of the project. Similarly, modelling the persistence layer using a model-first approach fits the bill because what you learn about the requirements is expressed in data-centric terms.

Let’s say the project is Greenfield and you have 100% control over the database design. Would a database-first approach ever make sense in that situation?

On-line Transaction Processing (OLTP) and On-line Analytical Processing (OLAP) systems are considered two ends of the the data persistence spectrum. With databases that support OLTP systems the objective is to effectively and properly support the CRUD+Q operations in support of business operations. In the databases that support OLAP systems the objective is to effectively and properly support business intelligence, such as data mining, high-speed data analytics, decision support systems, and other data warehousing goals. These are two extremely different database designs. Many systems’ databases live on a continuum between these two extremes.

I once worked on a student loan originations system. It was a start-with-a-clean-slate, object-oriented development project. Initially, the system was all about entering, reviewing and approving loan applications. We talked about borrowers, students and parents, and their multiple addresses. There was a lot about loan limits and interest rates, check disbursements, and a myriad of complicated and subtle rules and regulations related to creating a loan and making a check disbursement. The system was recording the key records and financial transactions and the database was the master repository of this data. In fulfilling these requirements, the system was a success. However, once the system was readied for parallel Beta-testing at the bank things started to go sideways.

Here is some of what we missed by taking a code-first approach:

  1. Every day the bank manager must run a reconciliation report, which joins in a lot of financial data from a lot of the day’s records, no one can go home until the report shows that everything is balanced. The bank manager screamed when the report took over two hours.
  2. At the end of every quarter, there is an even bigger report that looks at more tables and financial transactions and reconciles everything to the penny. More screaming when this report ran for days and never properly reconciled — the query could never quite duplicate the financial engine’s logic to apply transactions.
  3. And lastly, every loan disbursement that goes out requires a form letter, defined by the Dept. of Education, be sent to every student that has a loan originated by the bank. Imagine the tens of thousands of form letters going out on the day they send the money to UCLA. The project almost died when just one form letter to one student took 30 minutes!
  4. The data migration from the legacy system to the new system was taking nearly a week to completely finish. The bank wasn’t going to stop operations for a week.

What we failed to realize was that the really significant, make-or-break requirements of the system were all reporting or data conversion related. None of it had been seriously brought up or laid out during system development, however, not meeting those requirements took the project very close to the edge of extinction.

A major lesson learned, look very closely at the question of data persistence and retrieval. Work hard to uncover and define the reporting, conversion and other data requirements. Make any hidden or implicit database requirements explicit. Find out if the system is really all about the care and feeding of a relational database.

Adding it all up: if the database-specific requirements significantly overshadow the application-specific requirements then a database-first approach is probably your best bet.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: