• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Tim Cooke
  • paul wheaton
  • Liutauras Vilda
  • Ron McLeod
Sheriffs:
  • Jeanne Boyarsky
  • Devaka Cooray
  • Paul Clapham
Saloon Keepers:
  • Scott Selikoff
  • Tim Holloway
  • Piet Souris
  • Mikalai Zaikin
  • Frits Walraven
Bartenders:
  • Stephan van Hulst
  • Carey Brown

When normalizing a table, how do you decide where to move fields that are not dependent to the PK?

 
Ranch Hand
Posts: 58
2
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hello.  
I have a table "Accounts" which holds records about accounts for a site I have been working on.  
I've recently come across the idea of normalization, and I've been stuck at the process of Second Normalization.  
My table contains fields that are not functionally dependent to the PK, cannot be Superkeys themselves, and cannot be Foreign keys.  
In this situation, how do you decide on where to move these fields?  
Below is the current version of the table in SQL:

name: varchar (PK)
is_guest : tinyint
is_online: tinyint
email: varchar
password: varchar (cyphered before added to DB)
queue_status: tinyint
is_ingame: tinyint
elo: smallint
queue_time: time  

queue_time and queue_status seems easy enough to move to another table. Since they are unique to players currently in a queue, they can be moved to another field.
name, email, password is obvious enough to keep in the same field.  
But what about all the tinyints(Which were supposed to be Booleans)?

I need these informations about the player, but none of the tinyint fields fulfil the constraints of Second normalization, where all the fields need to be functionally dependent to the PK.

So, where do I move these fields? Do I even move them? What kind of problems could I get for not correctly normalizing this table?

Any help is greatly appreciated.


 
 
Sheriff
Posts: 17734
302
Mac Android IntelliJ IDE Eclipse IDE Spring Debian Java Ubuntu Linux
  • Likes 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
My first thought is that they are flags for a temporal state related to the name (I assume 'name' is the unique user name, not the actual name). In that case, I'd try having a table for each one instead, like guest table and online table, where user names will be added to them, with additional information. For example online table can be used to track the date/time when the user was online, meaning you'd have name, session_start, session_end columns. Then the table would be more appropriately named user_sessions or something like that. I don't know what the rules for is_guest are in your system but I imagine it could be something similar.
 
Bartender
Posts: 15737
368
  • Likes 2
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Let's take a look at all candidate keys of your relation. If I'm not mistaken, those are just:

  • name
  • email

  • Since all your candidate keys consist of only one attribute, by definition your relation is in second normal form, because a relation can only violate second normal form if there are candidate keys that are composite.

    Why do you say that queue_time and queue_status are not functionally dependent on name? After all, if I give you a name and ask you to retrieve a queue_status for me, you will be able to retrieve a unique value.

    Now, your relation is likely not in third normal form, because queue_status is probably functionally dependent on queue_time.

    What does it mean for a user to be a guest? Is that a user that has not registered with an email address? In that case, email is not a candidate key, but since is_guest is still functionally dependent on it, that also violates third normal form.

    I also question the design choice to have an attribute is_online in a database, because an application doesn't need to consult a database to determine whether a user is offline, but that has nothing to do with normalization.
     
    F Lucas
    Ranch Hand
    Posts: 58
    2
    • Mark post as helpful
    • send pies
      Number of slices to send:
      Optional 'thank-you' note:
    • Quote
    • Report post to moderator
    Hello, sorry for the late reply. I had some school matters to resolve and did not have time to check back.
    I've read both replies, but I want to make sure if I got their meanings right.

    Junilu's reply:
    I've thought of that at first, but then for some reason I thought it would be weird to have multiple tables that might contain the same user.
    For example, if there is user Foo, which is currently online and in a game, I would have table Accounts, and 2 more tables Game_sessions and Online_Sessions that would have a foreign key to
    the user's ID?
    Makes sense to me, but these concepts still feel alien to me, and I'd like to know if this kind of approach is normal.

    Stephan's reply:
    Now I'm confused again.. I've been confused for a time about what "functionally dependent" means and after doing some research I thought it meant something like
    "If A determines B, B is functionally dependent to A".
    As in, no record with a different A can have the same B for it to be functionally dependent.
    Is that not the case? If it is not it makes things much simpler..
    And, the argument about online_status was a complete blind spot for me. I will consider a way to not use it.

    So, what I probably should be doing is:

    Table Account
    name(PK),email(Nullable),password,elo

    Table Online_sessions (If I do not find a way to not use this)
    name(FK)

    Table Game_sessions
    name(FK),(Not yet defined fields I will use in the actual game)

    Table Queue_sessions
    name(FK),queue_time

    Like, instead of using flags, I can just use the fact that the record exists in a specific table to get the information needed?

    Big thanks to both of you.
     
    Stephan van Hulst
    Bartender
    Posts: 15737
    368
    • Likes 1
    • Mark post as helpful
    • send pies
      Number of slices to send:
      Optional 'thank-you' note:
    • Quote
    • Report post to moderator

    F Lucas wrote:"If A determines B, B is functionally dependent to A".


    This is correct.

    As in, no record with a different A can have the same B for it to be functionally dependent.


    This conclusion is wrong.

    If A determines B, then I can ask you "Get me a tuple, ANY tuple, that has values x for attributes A, and tell me the value of its attribute B", and you will always be able to give me a single unambiguous answer. That is, there are no two tuples with identical values for attributes A, that have different values for attribute B. It does NOT mean that tuples with different values for A must also have a different value for B.

    Let's pretend for a second that every person on the planet has a unique full name, and all full names consist of a first name and a last name that are not necessarily unique:

    firstNamelastNamefavouriteColor
    DanBaileyRed
    DanEdmundsRed
    TracyLeeBlue
    IreneSpringerGreen
    DanSpringerRed
    TracyHarrisBlue

    In this relation, the primary key is (firstName, lastName).

    If I asked you to get the favourite color of the person named (Dan, Springer), you would be able to tell me that it's Red, even though there are multiple people with firstName Dan and multiple people with lastName Springer. So favouriteColor is functionally dependent on (firstName, lastName). All attributes are always trivially dependent on all keys, because keys necessarily produce unique tuples. So that's not really interesting. More interesting is to see if there are attributes that are functionally dependent on other non-keys.

    If I asked you to get the favourite color of the person named (Lee), you would NOT be able to tell me that it's Blue, because there's nothing stopping me from adding a tuple to the relation with the lastName Lee, but a different color than Blue. There is no functional dependency between lastName and favouriteColor.

    However, there is some strange cosmic rule that says that all people with the same first name must also have the same favourite color. If I asked you to tell me the favourite color of the person named (Dan), you would without a doubt be able to tell me Red. Even if we add another tuple to the database with firstName Dan, we can be sure that the favouriteColor will be Red, because that's what our cosmic constraint says.

    Now something interesting is happening: favouriteColor is functionally dependent on firstName, but firstName is NOT a key. In fact, (firstName) is a proper subset of our candidate key (firstName, lastName), so our relation violates second normal form. Instead, we should split this relation into two:

    firstNamelastName
    DanBailey
    DanEdmunds
    TracyLee
    IreneSpringer
    DanSpringer
    TracyHarris

    firstNamefavouriteColor
    DanRed
    TracyBlue
    IreneGreen

    So, what I probably should be doing is

    ...

    Like, instead of using flags, I can just use the fact that the record exists in a specific table to get the information needed?


    Depends. You can always replace a boolean attribute with a new relation that keeps track of which tuples have that boolean property. That has nothing to do with normalization though. It's just a design choice.

    If it's very rare for the boolean attribute to be 'true', you can save some space by making a separate relation for that attribute. That way, you will eliminate all 'false' values from your database.

    However, it's usually a better idea to take a look if the boolean attribute intrinsically belongs to the relation conceptually. Does the conceptual notion of an Account intrinsically have a queue time? Or are Queues just a separate application feature that happen to be related to Account? I would say the latter, so make Queue a separate relation. However, I would say that that quality of being a Guest or the quality of being Online are intrinsically part of an Account, so those would have been fine as members of the Account relation. However, as I already pointed out, those qualities are redundant and can be eliminated from your database completely: Whether a user is a guest can be derived from their email attribute, and whether a user is online is a non-persistent quality that should never make it to the database in the first place.

    If you want more advice regarding your attributes, you'll have to tell us more about what they mean. What is a guest? What are queues used for? What does it mean for a user to be 'in game'? What is 'elo'?
     
    F Lucas
    Ranch Hand
    Posts: 58
    2
    • Mark post as helpful
    • send pies
      Number of slices to send:
      Optional 'thank-you' note:
    • Quote
    • Report post to moderator
    Hello.

    The explanation about second normal form was a real eye opener. I can't find a way to thank you enough for it  

    So, about the attributes..

    As you said is_guest is redundant. It just means it is a temporary account that will be deleted on session end, and does not have an email.
    So I can remove it.

    is_online really is not supposed to be in the database, but I am making a page with django and I am not sure if I can
    check if there is an ongoing session with a specific user. I will remove it if I can though.

    Below is my current idea of the database structure, and the explanation for each of the fields.
    It is going to be a wall of text, sorry.


    Tables

    Table Accounts

    name: varchar (PK), Its the username of the account. Maybe I should be using a separate ID for the PK, but since I could enforce unique usernames this way, I figured I'd just make this the PK.

    email: varchar, The email. Nullable, and if Null, this account is a "guest" account and gets restricted access to the site.

    password: varchar (cyphered before added to DB), The password.

    elo: smallint: This is a number that goes up and down depending on the player's win/lose ratio, that represents this player's skill.
    I have no way of using this for now, but I will later implement a queue which only matches players with close enough elo.

    Because name is the PK, at no point I will get two records when querying by name.
    None of the other fields can determine another field.
    So this field is in third normal form.

    Table Queue_sessions
    Since I am making a multiplayer game, I need a way for players to wait in a queue until there are enough players to fill a match lobby with.
    This is where currently queued players get recorded.

    name(FK): The player that is in queue. Unique, not nullable.

    queue_status: tinyint, This is either 0(Not in queue),1(In queue waiting for match),or 2(In queue but not available for matching.)
    Reasons for the status may vary, like the user being prompted for confirmation to join a match. It is in queue, but not eligible for receiving another alert for a match.

    queue_time: time, The amount of time this player has been in the queue. Players with the bigger queue_time are prioritized to find a match quicker.

    Because name is not nullable and always unique, I can also query this table by name and never get 2 different records.
    I cannot determine any other fields using anything other than name, so this table must be in third normal form.

    Table Game_sessions

    My game is a moddable, multiplayer, turn-based game.
    Which means anybody can make modified versions of my game by changing the game logic,
    There are multiple players interacting with each other in a match,
    And I do not need to care about latency.

    Because it is moddable, I cannot make a table for each game type, and ensure I only have the fields that I need.

    So this is what I've came up with for now..

    name(FK): The name of the player in game. Unique, not nullable.

    game_info: a varchar that will hold a JSON, not too big in size, that holds all the variables needed for the game.
    The field name will likely change. Not nullable.

    match(FK): The ID of the match. It refers to the table Matches. Not nullable.

    name is unique and not nullable, game_info and match is not unique.
    This table is in third normal form.

    (And some more fields might be added.)

    Table Matches

    id(PK): Just an auto-increment int.

    ruleset_id(FK): The id of the used Ruleset(Object used to modify the game). Refers to the table Rulesets. Not nullable.

    ongoing_time: Time this match has been going for. I might have to terminate matches that go on for too long and delete records related to it. Nullable, because it is null for finished matches.

    result: A JSON that holds information about a finished match. Contents vary depending on the Ruleset, so again, I cannot make definite fields to hold the values. Nullable, because it is null for ongoing matches.

    I have a PK, no other value is unique, none of the values define anything other than the PK. This table is in third normal form.

    Table Rulesets

    id(PK): Just an auto-increment int.

    content: JSON to be read by the backend code. unique.

    No two rulesets will have the same id, or the same content. Always only one record when I query for either of the fields.
    This table is in third normal form.


    Is this correct?

    Thanks.
     
    Stephan van Hulst
    Bartender
    Posts: 15737
    368
    • Likes 1
    • Mark post as helpful
    • send pies
      Number of slices to send:
      Optional 'thank-you' note:
    • Quote
    • Report post to moderator

    F Lucas wrote:is_online really is not supposed to be in the database, but I am making a page with django and I am not sure if I can check if there is an ongoing session with a specific user.


    I'm going to assume that Django is storing user sessions in its own session table in the database, and it keys sessions by a value that is provided in a session cookie or request parameter. It will be fairly easy for you to update your is_online attribute to true when a user request comes in, but it will be hard to update the is_online attribute to false when the session expires.

    Instead, I suggest that you remove this attribute from your accounts table, and create a new join table that contains an account_id and a session_id, so you can link a user to a Django session. When you want to look up whether a user is online or offline, you just look up their Django session through this join table, and then you can look up the expiration time of the session and compare that to the current time.

    The reason I would use a join table instead of putting the session_id directly into the account table is because it seems that it is still relatively unknown how well this feature will work, so you don't want to put columns in your account table that you don't know you will need. And besides, I think this is a feature of the application that is not intrinsically part of a user account.

    name: varchar (PK), Its the username of the account. Maybe I should be using a separate ID for the PK, but since I could enforce unique usernames this way, I figured I'd just make this the PK.


    I am a big fan of using natural keys, because it makes it easier to read and migrate a database. The problem with natural keys is that they might take up more space in your database if the values take up more than a few bytes.

    Consider this: If somebody uses the account name "f.lucas", that will not only take up 7 bytes in your accounts table, but it will also take up 7 bytes in every foreign row that refers to the account, more if you use a longer account name or a larger character set. If you use an INT surrogate key instead, it will only take up 4 bytes per foreign key.

    Now, I'm not saying you should blindly add surrogate keys to all your tables. I'm just saying it's something to consider. Here's a fun experiment: Build up your application completely with as few surrogate keys as possible, and add a couple of millions of records to all your tables. Then transform your database so that your tables use surrogate keys as their foreign keys, and compare the size of the database. Maybe it matters, maybe not.

    password: varchar (cyphered before added to DB), The password.


    I hope you're using Django's authentication mechanism, and you're not building your own. Do whatever your application framework provides to you. Everything else is unsafe.

    queue_status: tinyint, This is either 0(Not in queue),1(In queue waiting for match),or 2(In queue but not available for matching.)


    Why have a 0 status at all? Just remove the entry when a player isn't queued up.

    Also, if your database supports it, use enums instead of integers for columns like this. It's much nicer to read something like 'queued' and 'ready', instead of 1 and 2.

    queue_time: time, The amount of time this player has been in the queue. Players with the bigger queue_time are prioritized to find a match quicker.


    Don't store an amount of time, you can't reliably update it. Store a timestamp. Players with a smaller timestamp are prioritized.

    name(FK): The name of the player in game. Unique, not nullable.

    game_info: a varchar that will hold a JSON, not too big in size, that holds all the variables needed for the game.


    I hope it only holds all variables related to the specific player. If you store other variables needed for the game, your database will contain a redundant copy of the information, per player.

    For larger text formats like JSON, use CLOB instead of VARCHAR. Or consider serializing your game state to a binary format, and saving it as a BLOB.

    ongoing_time: Time this match has been going for. I might have to terminate matches that go on for too long and delete records related to it. Nullable, because it is null for finished matches.


    Again, use a timestamp and not an amount of time. Don't use this attribute to determine whether a game is finished. Either remove finished games, or if you want to save data about a finished game, move it to a different table.

    result: A JSON that holds information about a finished match. Contents vary depending on the Ruleset, so again, I cannot make definite fields to hold the values. Nullable, because it is null for ongoing matches.


    Use a different table for finished matches. Use CLOB or BLOB instead of VARCHAR. Not nullable.

    content: JSON to be read by the backend code. unique.


    Use CLOB or BLOB for rulesets as well. And I don't think you really need to enforce the unique constraint on the content of the ruleset. It's probably highly unlikely that two identical rulesets will get added to the table, and even if they do, is it that big a deal?
     
    F Lucas
    Ranch Hand
    Posts: 58
    2
    • Mark post as helpful
    • send pies
      Number of slices to send:
      Optional 'thank-you' note:
    • Quote
    • Report post to moderator
    Wow.

    That post just made me realize how shallow was my planning for this.

    The problem with natural keys is that they might take up more space in your database if the values take up more than a few bytes.



    The space usage on the name will probably be irrelevant because my project will not be that big, but I will definitely keep it in mind.

    I hope you're using Django's authentication mechanism, and you're not building your own.



    I am. I've made the mistake of trying to cipher passwords on my own before...

    Why have a 0 status at all? Just remove the entry when a player isn't queued up.

    Also, if your database supports it, use enums instead of integers for columns like this. It's much nicer to read something like 'queued' and 'ready', instead of 1 and 2.



    Good point. I am going to remove 0.

    And my DB does not support enums, but django does and shows the field value by the enum name set in the code.

    Don't store an amount of time, you can't reliably update it.



    Completely true. I don't know what I was thinking about how to update a few dozens of players worth of time every second..

    I hope it only holds all variables related to the specific player.



    It does.

    For larger text formats like JSON, use CLOB instead of VARCHAR. Or consider serializing your game state to a binary format, and saving it as a BLOB.


    Ok. This one might be substantial in size enough to be worth to serialize, since I have no idea what the contents may be.

    It's probably highly unlikely that two identical rulesets will get added to the table, and even if they do, is it that big a deal?



    This is the only part I am unsure of.
    Since my site is just a personal project, I cannot expect many concurrent players.
    If somebody makes a mod, identical to another, and makes lobbies (Something close to a queue) with another name, I will be splitting the potential player pool that was
    supposed to go into the same match, into two lobbies.
    This might make custom modes impossible.

    Also I want to properly credit modders, and I'd not like people arguing about copy and pasting mods.
    Might be an unnecessary precaution, but I think this will do more good than harm.

    And I guess that is all for now.

    Thanks for all the help Stephan.

    I wish you a great day, you surely made mine better
     
    Stephan van Hulst
    Bartender
    Posts: 15737
    368
    • Mark post as helpful
    • send pies
      Number of slices to send:
      Optional 'thank-you' note:
    • Quote
    • Report post to moderator
    Good for sticking with it, and have fun!

    I'm curious, what kind of game are you designing anyway?
     
    F Lucas
    Ranch Hand
    Posts: 58
    2
    • Likes 1
    • Mark post as helpful
    • send pies
      Number of slices to send:
      Optional 'thank-you' note:
    • Quote
    • Report post to moderator
    It's basically going to be a card game battle royale (The original game mode)
    but moddable.
    I played some Slay the Spire and thought that the combat system was really creative, and wanted to make a multiplayer game out of it.

    ..If I finish it that is. I sure hope I manage to do it.
     
    I think he's gonna try to grab my monkey. Do we have a monkey outfit for this tiny ad?
    Smokeless wood heat with a rocket mass heater
    https://woodheat.net
    reply
      Bookmark Topic Watch Topic
    • New Topic