A tale of two databases

I’ve been working with databases for almost a decade. The databases I’ve worked with varied in size and brand. For more than half of that time, I worked at a big in-house software provider with big Oracle deployments. In my spare time, I developed web applications on top of MySQL, which has turned into my database of choice. So, when I left the big company, I started consulting for small (and growing) startups on ways to improve their MySQL deployments.

Big companies and small companies are very different in the way they approach infrastructure, but the difference is structural, not technological. Let me explain:

BigCo

In BigCo, there are many employees. Titles and responsibilities are not only well defined, but are also well enforced.

For example, when our team needed to change the database schema, we had to ask a DBA. Not our DBA, mind you: the DBAs were not part of our team. They sat in a different part of the building, and used a sort of round-robin mechanism for task assignment that I still don’t fully understand. If we needed more memory, we had to go to a system administrator. If we needed a new database server, we had to go to both.

Developing a new version of our software was full of pain. Development and test environments were always last on the task queue for the infrastructure personnel. It took them anywhere from days to weeks to set up a new development environment with a subset of data from production. We never had enough storage, so we had to try and subset the parts of the database that we thought were relevant for this version – but we could never get a working copy on the first try, not with the many tables we had. It took us weeks to get a server working. And then more weeks to get another for QA.

All this, before the first line of code was even written. That’s when it really got sticky.

Bugs? Sure. It happens. A wrongly placed parenthesis, and all of the user data in the development database is deleted. Wait a week to get the database refreshed.

Automated testing? I’d love to, but unless there was an easy way to refresh the database, it was impossible to get repeatable results that meant much.
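To make that concrete, here is a minimal sketch of what "an easy way to refresh the database" buys you. It is not the Oracle setup I described; it uses Python's built-in sqlite3 module with an in-memory database, and the users table, the seed rows, and the helper names are all made up for illustration. The only point is that every test starts from the same known state.

```python
import sqlite3

# Hypothetical schema and seed data, invented for this illustration.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
"""
SEED = [(1, "alice"), (2, "bob")]


def fresh_database():
    """Return a new in-memory database in the same known state every time."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", SEED)
    conn.commit()
    return conn


def delete_user(conn, user_id):
    """Code under test: remove exactly one user by id."""
    conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
    conn.commit()


def count_users(conn):
    """Return the number of rows in the users table."""
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def test_delete_user():
    # Each run gets its own database, so earlier runs cannot affect this one.
    conn = fresh_database()
    delete_user(conn, 1)
    assert count_users(conn) == 1
    conn.close()
```

Because the refresh takes milliseconds, the test gives the same answer no matter how many times it runs or what a previous test deleted. With a refresh that takes a week, it never could.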

Unrepeatable issues? Plenty. Development’s database and code are up to date with the latest version. QA is running last week’s drop. Code and data are out of sync, so there’s no way to reproduce and debug QA’s issues in development. We just marked them as wontfixcantrepeat, and moved on. Then we got the same bug in production.

It was a mess, but I thought it was because I was working in a big company, with too many people involved. I was sure that once I moved to a smaller place, where I could manage my own database, I’d be fine. Right? Wrong.

SmallCo

Getting things done in a smaller company is easier, in theory. I could install a new server, a new database on my own. Now, I thought, I wouldn’t have to wait a week to get my database ready so I could start developing the next version. An entire copy of the operational database could fit, quite snugly, on my personal laptop. If something happened, I could just copy the database again, and start over. Easy!

Except it wasn’t.

It took me a while to figure it out. After a week of work, my development database and the one I deployed to QA last week didn’t look the same anymore. I still couldn’t reproduce the bugs they found in QA on my own machine. I found myself refreshing my development environment over and over, each time to a different version of the application, before I realized what I was missing.

Source Control is Not Enough

In the olden days, when most companies developed desktop applications, someone very smart realized that code had to be managed in a centralized repository. When the code on the developer’s desktop was different from the code in QA, source control tools were used to momentarily shelve the recent changes and reload an older version of the code. Then the developer could debug and reproduce the error, and deploy a fix.

This is not the world we live in today. Almost every new service on the internet, and many internal enterprise systems, are built around a client-server architecture. Somewhere, deep down below the layers of code, sits a centralized data store that holds the state of the application. Source control is no longer sufficient to manage the development life cycle of the software. We need data management tools as well.

This is the pain that we solve at OffScale. If you have encountered these problems, join our beta.

About Omer Gertel

CTO @ Offscale
  • http://www.algorithm.co.il/blogs Imri

    The situation is improving though – today with Django I have a test database that’s set up every time I run the tests, and I generate data inside it using django-any. The problem is that it’s harder to simulate large amounts of data; for that kind of test I’ll probably need your solution.