Relational Data Design

Relational Data Design

Representing complex data in a computational framework

Data layout and structure is a constant trade-off between performance, readability, maintenance, future-proofing, extendability, and reuse. While more complex tables can achieve higher performance and readability, the cost becomes future proofing, maintainability, and often scalability.

Moving towards a document store type of data storage is incredibly beneficial for their simplicity and flexibility. They are more akin to file systems where documents are accessed by name and have few limitations on how they are structured. This is very good for horizontal scalability since it is easy to add more hardware when data does not need to be consistent across multiple tables/machines.

The relational model is a very good fit for developing non-sparse data layouts. However, for widely scaled systems the relational model can be inappropriate. Larger calculations and process now run via map-reduce. The most valuable high level data processing techniques are often a combination of hardware-focused data manipulation layers being used with functional type algorithms.

Data Normalization

Relational models require atomicity when working with data. For our purposes, this means normalizing data to the level of a noun, or a nameable piece. Consider the following setup code:

const mesh_room: Mesh = loadMesh("room_mesh");
const mesh_room_start: Mesh = loadMesh("room_start_mesh");
const mesh_room_trapped: Mesh = loadMesh("room_trapped_mesh");
const mesh_key: Mesh = loadMesh("key_mesh");
const mesh_potion: Mesh = loadMesh("potion_mesh");
const mesh_armor: Mesh = loadMesh("armor_mesh");

const texture_room: Texture = loadTexture("room_texture");
const texture_room_start: Texture = loadTexture("room_start_texture");
const texture_room_trapped: Texture = loadTexture("room_trapped_texture");
const texture_key: Texture = loadTexture("key_texture");
const texture_potion: Texture = loadTexture("potion_texture");
const texture_armor: Texture = loadTexture("armor_texture");

const animation_key_bob: Animation = loadAnimation("key_bob_animation");

const k1: PickupId = createPickup(
  type_key,
  mesh_key,
  texture_key,
  tint_color_copper,
);
const k2: PickupId = createPickup(
  type_key,
  mesh_key,
  texture_key,
  tint_color_silver,
);
const k3: PickupId = createPickup(
  type_key,
  mesh_key,
  texture_key,
  tint_color_gold,
);
const p1: PickupId = createPickup(
  type_potion,
  mesh_potion,
  texture_potion,
  tint_color_green,
);
const p2: PickupId = createPickup(
  type_potion,
  mesh_potion,
  texture_potion,
  tint_color_purple,
);
const a1: PickupId = createPickup(
  type_armor,
  mesh_armor,
  texture_armor,
  null,
);

const r1: Room = createRoom(
  worldPosition(0, 0),
  mesh_room_start,
  texture_room_start,
  null,
);
const r2: Room = createRoom(
  worldPosition(-20, 0),
  mesh_room_trapped,
  texture_room_trapped,
  hpDamage(10),
);
const r3: Room = createRoom(
  worldPosition(-10, 20),
  mesh_room,
  texture_room,
  null,
);
const r4: Room = createRoom(
  worldPosition(-30, 20),
  mesh_room,
  texture_room,
  null,
);
const r5: Room = createRoom(
  worldPosition(20, 10),
  mesh_room_trapped,
  texture_room_trapped,
  hpDamage(10),
);

addDoor(r1, r2, null);
addDoor(r1, r3, k1);
setRoomAsSpecial(r1, starting_room, worldPosition(1, 1));

addPickup(r2, k1, worldPosition(-18, 2));
addDoor(r2, r1, null);
addDoor(r2, r4, k2);

addPickup(r3, k2, worldPosition(-8, 12));
addPickup(r3, p1, worldPosition(-7, 13));
addPickup(r3, a1, worldPosition(-8, 14));
addDoor(r3, r1, null);
addDoor(r3, r2, null);
addDoor(r3, r5, k3);

addPickup(r4, k3, worldPosition(-28, 14));
addPickup(r4, p2, worldPosition(-27, 13));
addDoor(r4, r2);

setRoomAsSpecial(r5, exit_room, null);

This snippet is fairly complicated, involves multiple objects referencing each other, and needs to orchestrated so that all pieces are created before they can be referenced. This results in staggered phases of setup and linking. In order to make this design more database-like, the data needs to be normalized into tables.

Meshes - 0NF

mesh_idmesh_name
mesh_room“room_mesh”
mesh_room_start“room_start_mesh”
mesh_room_trapped“room_trapped_mesh”
mesh_key“key_mesh”
mesh_potion“potion_mesh”
mesh_armor“armor_mesh”

Textures - 0NF

texture_idtexture_name
texture_room“room_texture”
texture_room_start“room_start_texture”
texture_room_trapped“room_trapped_texture”
texture_key“key_texture”
texture_potion“potion_texture”
texture_armor“armor_texture”

Animations - 0NF

animation_idanimation_name
animation_key_bob“key_bob_animation”

Pickups - 0NF

pickup_idmesh_idtexture_idpickup_typecolor_tintanimation
k1mesh_keytexture_keykeycopperanimation_key_bob
k2mesh_keytexture_keykeysilveranimation_key_bob
k3mesh_keytexture_keykeygoldanimation_key_bob
p1mesh_potiontexture_potionpotiongreen
p2mesh_potiontexture_potionpotionpurple
a1mesh_armortexture_armorarmor

Rooms - 0NF

room_idmesh_idtexture_idworld_positionpickupsdamagedoors_tolockedis_startis_end
r1mesh_room_starttexture_room_start0, 0r2,r3r3 with k1true (1,1)false
r2mesh_room_trappedtexture_room_trapped-20, 10k110hpr1,r4r4 with k2falsefalse
r3mesh_roomtexture_room-10, 20k2,p1,a1r1,r2,r5r5 with k3falsefalse
r4mesh_roomtexture_room-30, 20k3,p2r2falsefalse
r5mesh_room_trappedtexture_room_trapped20, 1025hpfalsetrue

Normalization

There are many stages of normalization, including six normal forms. It is critical to know them and understand why they exist. Opting for a data first approach instead of objects/classes allows for data to be manipulated together. It also allows for changes to the design now require fewer changes to the data.

First Normal Form

Ever cell contains one and only one atomic value. There should exist no array of values, no null entries, and every row should be distinct. All tables should have a unique primary key to assist with sorting and optimizing data access. The key should also be as small as possible and because of the uniqueness rule, every row has an implicit key. This can include using the whole row as a key, but this is to be avoided if possible.

On the topic of treating a row in a database as a key, think of the table as a set and an insert is checking if a combination exists. This can be useful if the number of possible values is small enough to be represented with a bit set. Bit sets take up significantly less memory and can be accessing in O(1) time.

To normalize to first normal form, we need to first remove null values and treat them as new tables.

Pickups - 1NF

pickup_idmesh_idtexture_idpickup_type
k1mesh_keytexture_keykey
k2mesh_keytexture_keykey
k3mesh_keytexture_keykey
p1mesh_potiontexture_potionpotion
p2mesh_potiontexture_potionpotion
a1mesh_armortexture_armorarmor

PickupTints - 1NF

pickup_idcolor_tint
k1copper
k2silver
k3gold
p1green
p2purple

PickupAnimations - 1NF

pickup_idanimation
k1animation_key_bob
k2animation_key_bob
k3animation_key_bob

First normal form will create more tables with fewer columns in those tables and will only create rows for things that matter. This means more memory usage, but it also means that we no longer need to check for null values in our code, removing unnecessary state.

Rooms - 1NF

room_idmesh_idtexture_idworld_positionis_startis_end
r1mesh_room_starttexture_room_start0, 0true (1,1)false
r2mesh_room_trappedtexture_room_trapped-20, 10falsefalse
r3mesh_roomtexture_room-10, 20falsefalse
r4mesh_roomtexture_room-30, 20falsefalse
r5mesh_room_trappedtexture_room_trapped20, 10falsetrue

PickupInstances - 1NF

room_idpickups
r2k1
r3k2
r3p1
r3a1
r4p2
r4k3

Doors - 1NF

room_iddoors_to
r1r2
r1r3
r2r1
r2r4
r3r1
r3r2
r3r5
r4r2

LockedDoors - 1NF

room_idto_roomlocked_with
r1r3k1
r2r4k2
r3r5k3

Traps - 1NF

room_iddamage
r210hp
r525hp

Laying out data like this takes up less space in larger projects as the number of null entries or arrays would have only increased with the increased complexity. Now we can add new features without having to revisit the original objects.

Second Normal Form

Second normal form is about trying to pull out columns that don’t depend on only a part of the primary key. This can happen when the table requires a compound primary key. Consider the following alternative representation of pickups

Pickups - 0NF

mesh_idtexture_idpickup_typecolor_tint
mesh_keytexture_keykeycopper
mesh_keytexture_keykeysilver
mesh_keytexture_keykeygold
mesh_potiontexture_potionpotiongreen
mesh_potiontexture_potionpotionpurple
mesh_armortexture_armorarmor

Pickups - 1NF

mesh_idtexture_idpickup_type
mesh_keytexture_keykey
mesh_potiontexture_potionpotion
mesh_armortexture_armorarmor

TintedPickups - 1NF

pickup_typecolor_tint
keycopper
keysilver
keygold
potiongreen
potionpurple

Pickups - 2NF

pickup_idpickup_type
k1key
k2key
k3key
p1potion
p2potion
a1armor

PickupTints - 2NF

pickup_idcolor_tint
k1copper
k2silver
k3gold
p1green
p2purple

PickupAssets - 2NF

mesh_idtexture_idpickup_type
mesh_keytexture_keykey
mesh_potiontexture_potionpotion
mesh_armortexture_armorarmor

PickupAnimations - 2NF

pickup_typeanimation
keyanimation_key_bob

Third Normal Form

Third Normal Form removes transitive dependencies on the primary key via another column in the table. For example, any room that has a mesh will also have a matching texture.

Rooms - 1NF

room_idmesh_idtexture_idworld_positionis_startis_end
r1mesh_room_starttexture_room_start0, 0true (1,1)false
r2mesh_room_trappedtexture_room_trapped-20, 10falsefalse
r3mesh_roomtexture_room-10, 20falsefalse
r4mesh_roomtexture_room-30, 20falsefalse
r5mesh_room_trappedtexture_room_trapped20, 10falsetrue

Rooms - 3NF

room_idtexture_idworld_positionis_startis_end
r1texture_room_start0, 0true (1,1)false
r2texture_room_trapped-20, 10falsefalse
r3texture_room-10, 20falsefalse
r4texture_room-30, 20falsefalse
r5texture_room_trapped20, 10falsetrue

TexturesAndMeshes - 3NF

texture_idmesh_nametexture_name
texture_room_startroom_start_meshroom_start_texture
texture_room_trappedroom_trapped_meshroom_trapped_texture
texture_roomroom_meshroom_texture
texture_texture_keykey_meshkey_texture
texture_texture_potionpotion_meshpotion_texture
texture_texture_armorarmor_mesharmor_texture

Boyce-Codd Normal Form

Boyce-Codd Normal Form, BCNF, is normalizing the functional dependencies.

Rooms - BCNF

room_idworld_positionis_startis_end
r10, 0truefalse
r2-20, 10falsefalse
r3-10, 20falsefalse
r4-30, 20falsefalse
r520, 10falsetrue

Rooms - BCNF

texture_idis_startis_end
texture_room_starttruefalse
texture_roomfalsefalse
texture_room_trappedfalsetrue

Domain Key Normal Form

Domain key normal form considered to be the last normal forms, but for developing efficient data structures, it should be considered early and often. Domain knowledge is preferable when writing code as it makes immediate sense. Domain knowledge is the idea that data depends on other data, but only given information about the domain in which it resides. It can also be presenting a human interpretation of the data, but this is not always the case. Consider the following transformation:

TexturesAndMeshes - 3NF

texture_idmesh_nametexture_name
texture_room_startroom_start_meshroom_start_texture
texture_room_trappedroom_trapped_meshroom_trapped_texture
texture_roomroom_meshroom_texture
texture_texture_keykey_meshkey_texture
texture_texture_potionpotion_meshpotion_texture
texture_texture_armorarmor_mesharmor_texture

Assets - DKNF

asset_idstubbed_name
asset_room_startroom_start_*
asset_room_trappedroom_trapped_*
asset_roomroom_*
asset_texture_keykey_*
asset_texture_potionpotion_*
asset_texture_armorarmor_*

Operations

Manipulating data will always be handled with either an insert, delete, or update. It’s considered best practice to only modify data using actions that would be commonly found in a database to avoid unexpected state complexity. Even though the data is now laid out like a database, we are not required to use a query language to access it.

Stream Processing

All data can be implemented as streams. Tables can be thought of as sets of all possible permutations of the attributes. Processing a set can be thought of as traversing the set and producing an output set. Given that sets are unordered, this means we can trivially introduce parallel processing.

Stream processing, instead of random access processing, means that we can process data without the need write to variables outside of the process. This means no side-effects and easy parallelization because the order of operations is irrelevant. This makes it easier to think about the system, inspect, debug, extend, and interrupt it.

References