Navigating Data Structure Decisions as an Indie Dev

Last week I wrote about the benefits of decision docs for solo / indie devs. This week I wanted to walk through a real example that I encountered recently.

I’ve been roughly describing my unnamed project as a “universal data modeling tool.” You define the data models used in your application and it lets you export them to any format you may be using (e.g. TypeScript, SQL, GraphQL, your ORM, Protobuf, etc.). I ran into a tricky decision this week around how to structure the data models for this tool (very meta… I know).

The Problem

Data models are similar but not identical at different layers of the stack. Here are two examples of how data may be structured that highlight this fact.

1. Different properties

You have a users table in your database (SQL):

CREATE TABLE users (
    id int PRIMARY KEY,
    username varchar(25) NOT NULL,
    password varchar(30) NOT NULL
);

You have a user model on your client (TypeScript):

interface users {
    id: number;
    username: string;
}

We don’t want to expose the password to the client so the password property is present in the database version of the model but absent in the client version.

2. Different models

Instead of storing the password on the user model directly, you could store it in a related model (SQL):

CREATE TABLE users (
    id int PRIMARY KEY,
    username varchar(25) NOT NULL
);

CREATE TABLE credentials (
    id int PRIMARY KEY,
    type enum('password', 'facebook', 'api-token') NOT NULL,
    secret varchar(255) NOT NULL,
    user_id int references users(id)
);

In this case the credentials model should be completely absent from the client (TypeScript):

interface users {
    id: number;
    username: string;
}

To summarize the problem: I need a way of representing the different versions (e.g. database and client) of these models and properties such that the common pieces can be shared, but the uncommon pieces can be added / removed in the differing contexts.
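
To make the first example concrete, here is a minimal TypeScript sketch of sharing the common fields while dropping the uncommon one. The names DatabaseUser, ClientUser, and toClientUser are made up for illustration and are not part of the tool.

```typescript
// Hypothetical names, for illustration only.

// The database version of the model includes the password.
interface DatabaseUser {
    id: number;
    username: string;
    password: string;
}

// The client version shares every field except the password.
type ClientUser = Omit<DatabaseUser, "password">;

// Copy only the shared fields so the password never reaches the client.
function toClientUser(row: DatabaseUser): ClientUser {
    return { id: row.id, username: row.username };
}

const row: DatabaseUser = { id: 1, username: "ada", password: "secret" };
const clientUser = toClientUser(row);
console.log(clientUser);
```

This works fine for one hand-written model. The harder question is how to describe the same relationship as data, for every model and every export format.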

I’ve decided to call these different versions “contexts.” What do you think about the choice of name? #twohardthings

Additional Information

My goal for this tool is that it should not take away any flexibility compared to if you were to create these models directly on the client / database. Maybe this is impossible, but at least it’s what I’m shooting for.
I believe this is an edge case, but not a far edge case. I expect that most models will be identical, but also that most projects will run into this issue for at least one model. To me this means that the tool must accommodate these tasks, but it doesn’t need to be the most streamlined, perfect solution.
Contexts and models have a many-to-many relationship. Each context can have many models and each model can have many contexts.
This entire data structure will be saved in a JSON file and tracked in version control. The structure would ideally be readable by a developer who understands JSON.

Possible Solutions

1. Context inheritance

Contexts can inherit from one another. When a context inherits from another, models from the parent are included in the child.

{
    "contexts": [
        {
            "descriptor": "client",
            "models": [
                {
                    "descriptor": "user",
                    "properties": ["..."]
                }
            ]
        },
        {
            "descriptor": "database",
            "extends": "client",
            "models": [
                {
                    "descriptor": "credentials",
                    "properties": ["..."]
                }
            ]
        }
    ]
}

Pros:

Data is normalized, so it’s less likely for data to be corrupted. Since each model is only represented in one place, when that model is changed, it only needs to be changed in one place leaving less room to make an error or have data that’s out of whack.

Cons:

Only supports changing models, not properties, so the first example case (where the password is stored in the users table) cannot be supported.
Any time I need to see which models belong to a specific context it involves traversing the context “tree” and calculating the result. This could be error-prone.
Since there is no singular list of models, it’s a little bit harder to read or know where to go when looking for a specific model.
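
To give a feel for the “traverse and calculate” step, here is a rough TypeScript sketch. It assumes single inheritance through the "extends" key, as in the JSON above. The type names (ContextDef, ModelDef) and the resolveModels function are hypothetical, not the tool’s actual code.

```typescript
// Hypothetical types mirroring the JSON structure above.
interface ModelDef {
    descriptor: string;
    properties: unknown[]; // left elided ("...") as in the example
}

interface ContextDef {
    descriptor: string;
    extends?: string;
    models: ModelDef[];
}

// Walk the "extends" chain from the requested context up to its root,
// then return the models with parents listed before children.
function resolveModels(contexts: ContextDef[], contextName: string): ModelDef[] {
    const byName = new Map<string, ContextDef>();
    for (const ctx of contexts) {
        byName.set(ctx.descriptor, ctx);
    }

    const visited = new Set<string>();
    const chain: ContextDef[] = [];
    let current: string | undefined = contextName;
    while (current !== undefined) {
        // Guard against a context that (directly or indirectly) extends itself.
        if (visited.has(current)) {
            throw new Error(`Cycle in context inheritance at "${current}"`);
        }
        visited.add(current);
        const ctx: ContextDef | undefined = byName.get(current);
        if (ctx === undefined) {
            throw new Error(`Unknown context "${current}"`);
        }
        chain.unshift(ctx);
        current = ctx.extends;
    }

    const models: ModelDef[] = [];
    for (const ctx of chain) {
        models.push(...ctx.models);
    }
    return models;
}

const exampleContexts: ContextDef[] = [
    { descriptor: "client", models: [{ descriptor: "user", properties: ["..."] }] },
    {
        descriptor: "database",
        extends: "client",
        models: [{ descriptor: "credentials", properties: ["..."] }],
    },
];

const clientModels = resolveModels(exampleContexts, "client").map((m) => m.descriptor);
const databaseModels = resolveModels(exampleContexts, "database").map((m) => m.descriptor);
```

Even this small version needs cycle and missing-parent checks, which is the kind of overhead the con above is pointing at.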

2. Context tags

Each model is tagged with the contexts where it can be accessed.

{
    "models": [
        {
            "descriptor": "user",
            "properties": [
                {
                    "descriptor": "id",
                    "contexts": ["client", "database"]
                },
                "..."
            ],
            "contexts": ["client", "database"]
        },
        {
            "descriptor": "credentials",
            "properties": ["..."],
            "contexts": ["database"]
        }
    ]
}

Pros:

Can support including / excluding properties
Clean and easy to read

Cons:

Managing which models & properties are tagged with which contexts could be cumbersome. It seems redundant to tag every model / property when, in the 95% case, a model or property is included everywhere. The extra tags also hurt readability a little bit.
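
For comparison, here is a sketch of how a tagged structure could be read for one context. It assumes every model and every property carries an explicit "contexts" array. The type and function names are hypothetical, and the property list is filled in from the first SQL example purely for illustration.

```typescript
// Hypothetical types mirroring the tagged JSON structure above.
interface PropertyDef {
    descriptor: string;
    contexts: string[];
}

interface ModelDef {
    descriptor: string;
    properties: PropertyDef[];
    contexts: string[];
}

// Keep only the models tagged with the context, and within each of those
// keep only the properties tagged with the context.
function modelsForContext(models: ModelDef[], context: string): ModelDef[] {
    return models
        .filter((model) => model.contexts.includes(context))
        .map((model) => ({
            ...model,
            properties: model.properties.filter((prop) => prop.contexts.includes(context)),
        }));
}

const taggedModels: ModelDef[] = [
    {
        descriptor: "user",
        properties: [
            { descriptor: "id", contexts: ["client", "database"] },
            { descriptor: "username", contexts: ["client", "database"] },
            { descriptor: "password", contexts: ["database"] },
        ],
        contexts: ["client", "database"],
    },
    { descriptor: "credentials", properties: [], contexts: ["database"] },
];

const clientView = modelsForContext(taggedModels, "client");
const databaseView = modelsForContext(taggedModels, "database");
```

The lookup is a flat filter with no tree to walk, and it covers both example cases: the password property drops out of the client view, and so does the whole credentials model.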

3. Denormalized approach

Contexts can inherit from other contexts, but rather than inferring which models are included, models are explicitly copied.

{
    "contexts": [
        {
            "descriptor": "client",
            "models": [
                {
                    "descriptor": "user",
                    "properties": ["..."]
                }
            ]
        },
        {
            "descriptor": "database",
            "extends": "client",
            "models": [
                {
                    "descriptor": "user",
                    "properties": ["..."]
                },
                {
                    "descriptor": "credentials",
                    "properties": ["..."]
                }
            ]
        }
    ]
}

Pros:

Can support all functionality because models and properties can be added / removed on a context by context basis.
When looking at the data model from the perspective of a specific context, it’s very clear which models are included.
No calculation needed to see models for a specific context.

Cons:

Data is redundant. When making a change to a model, is the change made to all copies of the model, or does the user need to specify which context they are operating in? Either way, it’s hard to maintain.
The JSON structure will be large and harder to read due to duplication of information.
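
The maintenance question can be made concrete with a small sketch. Here an edit either fans out to every copy of a model or is limited to the contexts named in an optional list. The addProperty function, its onlyIn parameter, and the type names are all assumptions for illustration, not a decided design.

```typescript
// Hypothetical types mirroring the denormalized JSON structure above.
interface PropertyDef {
    descriptor: string;
}

interface ModelDef {
    descriptor: string;
    properties: PropertyDef[];
}

interface ContextDef {
    descriptor: string;
    extends?: string;
    models: ModelDef[];
}

// Returns a new contexts array with the property appended to every copy of
// the named model. When `onlyIn` is given, only those contexts are touched.
function addProperty(
    contexts: ContextDef[],
    modelName: string,
    property: PropertyDef,
    onlyIn?: string[],
): ContextDef[] {
    return contexts.map((ctx) => {
        if (onlyIn !== undefined && !onlyIn.includes(ctx.descriptor)) {
            return ctx;
        }
        return {
            ...ctx,
            models: ctx.models.map((model) =>
                model.descriptor === modelName
                    ? { ...model, properties: [...model.properties, property] }
                    : model,
            ),
        };
    });
}

const denormalized: ContextDef[] = [
    {
        descriptor: "client",
        models: [
            { descriptor: "user", properties: [{ descriptor: "id" }, { descriptor: "username" }] },
        ],
    },
    {
        descriptor: "database",
        extends: "client",
        models: [
            { descriptor: "user", properties: [{ descriptor: "id" }, { descriptor: "username" }] },
            { descriptor: "credentials", properties: [] },
        ],
    },
];

// Shared change: every copy of "user" gets the new property.
const everywhere = addProperty(denormalized, "user", { descriptor: "email" });
// Context-specific change: only the database copy gets the password.
const databaseOnly = addProperty(denormalized, "user", { descriptor: "password" }, ["database"]);
```

Every edit now needs that scoping decision, and nothing in the data itself records whether two copies are meant to stay identical.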

Option Matrix

Based on the pros and cons list, here is a high level matrix for each option.

Option	Functionality	Readability	Maintainability
Context inheritance	🟠	🟠	🟠
Context tags	🟢	🟡	🟡
Denormalized approach	🟢	🔴	🔴

Decision

At this point I can pick an option, brainstorm more options, or explore deeper. The “Context tags” option seems promising so I will start writing up a more detailed spec for next week which will dive deeper into this option and see if I can make any tweaks to improve the structure further.

One more thing…

Oh hey! I’m also releasing an alpha version of the modeling tool for OS X. This is super early and buggy but I’m trying to release early and often. You can check it out here.