Do I really need that database?

23 Nov 2015

I’m pretty sure that every time I’ve built a web application of any sort in the last decade, I’ve reached for a database. Usually SQLite, at least to start, because it lets me get going fast without any infrastructure in place. It’s not even a question, really: if I’m building a web app, it needs a database of some kind behind it.

Actually, let me rewind just a smidge. I’ve really been digging Clojure lately, and my favorite thing about it is its immutable data structures: once you’ve created a map, a vector, or whatever, you can’t change it. Instead, you derive new values from it. Now that I’m comfortable working with data that’s immutable by default, working without it feels like building on shaky foundations.

As much as I like Clojure, the chances I can convince my team to start using it are effectively nil. As an experiment, I started sketching out what some of our data classes might look like if we at least treated them as immutable by default, and I came up with something like this (we’re a Java shop, so it’s in Java):

import static com.google.common.base.Preconditions.checkNotNull;

import java.util.UUID;

public class User {
  public enum Role {
    ADMIN, EDITOR, OPERATOR, KIOSK
  }

  public final UUID uuid;
  public final String username;
  public final String password;
  public final String fullName;
  public final String emailAddress;
  public final Role role;

  public User(UUID uuid, String username, String password, String fullName, String emailAddress,
              Role role) {
    this.uuid = uuid;
    this.username = checkNotNull(username);
    this.password = checkNotNull(password);
    this.fullName = fullName;
    this.emailAddress = emailAddress;
    this.role = checkNotNull(role);
  }

  public User withUUID(UUID uuid) {
    return new User(uuid, this.username, this.password, this.fullName, this.emailAddress, this.role);
  }

  public User withUsername(String username) {
    return new User(this.uuid, username, this.password, this.fullName, this.emailAddress, this.role);
  }

  // And so on for each field.
}

All fields on that class are final, so each one can be assigned only once. The constructor enforces whatever constraints we need, and there’s a with*(...) method for each field that creates and returns a new object with just that one field swapped out.

In practice, it’s used much the same way as a Clojure map, except that I have to write all those with* methods. They’re tedious, but no more so than writing setters. In exchange, I can keep all the data validation in one place (the constructor), I don’t have to write getters, and I don’t have to write copy methods anymore, because these objects can’t be changed by accident. (And we have had issues with things changing by accident.)
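To see the pattern in action without pulling in Guava, here is a trimmed-down sketch. `SimpleUser` is a hypothetical two-field version of the User class (not part of the original code), and it uses the JDK’s `java.util.Objects.requireNonNull` in place of Guava’s `checkNotNull`. It shows that a `with*` call hands back a new object while the original stays exactly as it was, and that the constructor is the single place where validation happens.

```java
import java.util.Objects;

// Hypothetical, trimmed-down version of User: two fields instead of six.
final class SimpleUser {
  public final String username;
  public final String emailAddress;

  SimpleUser(String username, String emailAddress) {
    // All validation lives here, so every instance ever created has passed it.
    this.username = Objects.requireNonNull(username, "username");
    this.emailAddress = emailAddress; // optional, may be null
  }

  // Each with* method builds a new object with exactly one field swapped out.
  SimpleUser withUsername(String username) {
    return new SimpleUser(username, this.emailAddress);
  }

  SimpleUser withEmailAddress(String emailAddress) {
    return new SimpleUser(this.username, emailAddress);
  }
}
```

In use, `new SimpleUser("jf", null).withEmailAddress("jf@example.com")` returns a second object, and the first one still has a null email address. Passing `null` to `withUsername` fails in the constructor, exactly as a direct `new` would.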

So they work, but how far can I take this? Using groov as an example, I modelled out our entire project structure:

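import static com.google.common.base.Preconditions.checkNotNull;

import com.google.common.collect.ImmutableSet;
import java.util.Collection;
import java.util.Set;

public class Project {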
  public final Set<User> users;
  public final Set<Page> pages;
  public final Set<Device> devices;
  public final Settings settings;

  public Project(Collection<User> users, Collection<Page> pages, Collection<Device> devices,
                 Settings settings) {
    this.users = ImmutableSet.copyOf(users);
    this.pages = ImmutableSet.copyOf(pages);
    this.devices = ImmutableSet.copyOf(devices);
    this.settings = checkNotNull(settings);
  }

  // A bare constructor to make getting started easy
  public Project() {
    this(ImmutableSet.of(), ImmutableSet.of(), ImmutableSet.of(), new Settings());
  }

  public Project withUsers(Collection<User> users) {
    return new Project(users, this.pages, this.devices, this.settings);
  }

  // ...
}

I lean on Guava to keep my collections immutable, and we’re good to go: I can model out the entire server state easily with immutable data.
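The `copyOf` calls do two jobs: they take a private snapshot of whatever collection the caller passed in, and they expose a set that can’t be modified. The sketch below shows the same idea with only the JDK, using `Collections.unmodifiableSet` over a fresh `LinkedHashSet` as a stand-in for Guava’s `ImmutableSet.copyOf`; the `Roster` class is hypothetical. One difference worth knowing: Guava’s `ImmutableSet` also rejects null elements, which this stand-in does not.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for Project, holding a single collection.
final class Roster {
  public final Set<String> usernames;

  Roster(Collection<String> usernames) {
    // Copy first, so later changes to the caller's collection can't leak in,
    // then wrap, so nobody can mutate the copy through this field.
    this.usernames = Collections.unmodifiableSet(new LinkedHashSet<>(usernames));
  }

  Roster withUsernames(Collection<String> usernames) {
    return new Roster(usernames);
  }
}
```

Skipping either step breaks the guarantee: without the copy, the caller can still change the contents through its own reference; without the wrapper, anyone holding the `Roster` can.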

That’s great, but things have to change at some point, or else it’s kind of useless. So in this prototype I created a little container that holds just the most recent instance of the project, along with a way to change it:

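import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

public class AppState {
  private final AtomicReference<Project> project = new AtomicReference<>(new Project());

  public Project getProject() {
    return project.get();
  }

  // updateFn may be re-run if another thread swaps first, so it must be free of side effects.
  public void transact(UnaryOperator<Project> updateFn) {
    Project original;
    Project updated;
    do {
      original = project.get();
      updated = updateFn.apply(original);
    } while (!project.compareAndSet(original, updated));

    // At this point I can diff the original and updated project to check for changes, fire
    // messages, save it to disk, etc.
  }
}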

I now have a container for all of the important server state that I can hand off to things that need it (servlets, background processing, etc.), and they can’t screw it up for anyone else. Once something grabs a reference to Project.users or whatever, no one can change those objects out from underneath them.

The only bit of synchronization I’ve needed to this point is that AtomicReference.
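Because the values inside the `AtomicReference` never change, a compare-and-set retry loop is all the coordination writers need, and readers never block. The sketch below generalizes `AppState` into a hypothetical `Atom<T>` (the name borrows from Clojure’s atoms; it is not a JDK or Guava class), with a change callback that receives the original and updated values. That callback is where diffing, firing messages, or saving to disk could hook in. As the `AtomicReference` documentation warns for `getAndUpdate`, the update function may run more than once under contention, so it should be free of side effects.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BiConsumer;
import java.util.function.UnaryOperator;

// Hypothetical generalization of AppState: an atomic box around an immutable value.
final class Atom<T> {
  private final AtomicReference<T> ref;
  private final BiConsumer<T, T> onChange; // called with (original, updated)

  Atom(T initial, BiConsumer<T, T> onChange) {
    this.ref = new AtomicReference<>(initial);
    this.onChange = onChange;
  }

  // Readers just grab the current snapshot; nothing can change it afterwards.
  T get() {
    return ref.get();
  }

  T transact(UnaryOperator<T> updateFn) {
    T original;
    T updated;
    do {
      original = ref.get();
      updated = updateFn.apply(original); // may run more than once; keep it pure
    } while (!ref.compareAndSet(original, updated));

    // Only the thread whose swap succeeded gets here, with the exact pair it swapped.
    if (original != updated) {
      onChange.accept(original, updated);
    }
    return updated;
  }
}
```

The callback runs after the swap and outside any lock, so two threads finishing at nearly the same moment can invoke it in either order. If the callback saves to disk, that ordering is worth thinking through before relying on it.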

Now, getting back to that database: my data set is going to fit in memory, easily. The objects are all immutable, and the entire server is built around swapping out that Project instance. So when the time came to persist this thing to disk, I didn’t reach for SQLite. I reached for Jackson and simply wrote the whole thing out as JSON whenever the project changed.

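import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectWriter;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Store {
  // Assumes SLF4J for logging.
  private static final Logger logger = LoggerFactory.getLogger(Store.class);

  private final File projectDirectory;
  private final ObjectMapper objectMapper;
  private final ObjectWriter objectWriter;

  public Store(File file) {
    this.projectDirectory = makeSureThisIsADirectoryAndICanWriteToIt(file);

    // Create and configure objectMapper and objectWriter as needed.
    this.objectMapper = new ObjectMapper();
    this.objectWriter = objectMapper.writer();
  }

  public void saveProject(Project original, Project updated) {
    saveSettings(original.settings, updated.settings);
    saveUsers(original.users, updated.users);
    savePages(original.pages, updated.pages);
    saveDevices(original.devices, updated.devices);
  }

  public void saveSettings(Settings original, Settings updated) {
    // The with* methods pass untouched fields through by reference, so an
    // identity check is enough to tell that settings didn't change.
    if (original == updated) {
      return;
    }

    logger.info("saveSettings - settings have changed, writing them to disk.");
    writeValueAsJson(updated, "settings");
  }

  // Write to a temp file in the same directory, then atomically move it over the
  // old file, so readers see either the old version or the new one, never a partial write.
  private void writeValueAsJson(Object value, String name) {
    try {
      File f = File.createTempFile(name, ".json", projectDirectory);
      objectWriter.writeValue(f, value);
      Files.move(f.toPath(), new File(projectDirectory, name + ".json").toPath(),
        StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    } catch (IOException e) {
      logger.warn("writeValueAsJson - Exception occurred while writing " + name, e);
    }
  }

  // And so on
}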

As of this writing, I have a fully functional storage system in a whopping 152 lines of code. It only writes the parts of the project that changed (I could have written the whole thing in one go, but I like splitting it up a bit), and if something goes wrong I can easily inspect the data in a text editor.
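Writing only the changed parts works because of how the with* methods are written: every field that isn’t being replaced is passed through by reference, so an untouched section of the new project is the very same object as in the old one, and `==` is enough to spot it (as `saveSettings` does). Here is a minimal sketch of that check, using a hypothetical two-section `Snapshot` class and a hypothetical `Diff.changedSections` helper; neither is from the original code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-section stand-in for Project; the fields are treated as immutable.
final class Snapshot {
  public final List<String> users;
  public final String settings;

  Snapshot(List<String> users, String settings) {
    this.users = users;
    this.settings = settings;
  }

  // Like Project.withUsers: the settings reference is carried over untouched.
  Snapshot withUsers(List<String> users) {
    return new Snapshot(users, this.settings);
  }
}

final class Diff {
  // Reports which sections need rewriting, using identity rather than equals().
  static List<String> changedSections(Snapshot original, Snapshot updated) {
    List<String> changed = new ArrayList<>();
    if (original.users != updated.users) {
      changed.add("users");
    }
    if (original.settings != updated.settings) {
      changed.add("settings");
    }
    return changed;
  }
}
```

Identity comparison can give a false positive (a section rebuilt with equal contents still gets written out), but never a false negative, as long as nothing is ever mutated in place. That is exactly the immutability the whole approach depends on.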

I seriously don’t need a database this time around. It’s awesome. But it only works as long as I keep everything immutable: the moment something within that Project can change, the whole house of cards falls apart. For now though, I’m pretty happy with this, and it’s probably how I’ll start anything new from now on.

Granted, this won’t scale forever: eventually I’m not going to be able to keep everything in memory, and it’s gonna have to spill to disk. When that happens, I think I’m going to give Datomic a strong look. As far as I can tell, this is what Datomic already does: it makes your database look like a single immutable instance in memory. You pass around a database instance the way I pass around that Project instance above, and the data read from that instance is completely immutable. Changing the database (i.e. running a transaction against it) gives you a new instance, so again, you can’t change the data accidentally while something else is using it. Datomic handles the details of reading, writing, and caching behind the scenes so you don’t have to worry about it.

Jonathan Fischer
