• Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Do I really need indexes?

 
P Marksson
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
So I am having a hard time deciding whether or not I should use indexes. At one point, I want to benefit from faster selects. On the other hand, I am afraid my frequent insert/update operations will just slow everything down.

My scheme is kinda simple, just two tables. Here is the code:


So, one Group table and one URL table. As the code indicate, group names must be unique and each Group can contain multiple URLs.

Groups are never updated(modifying existing entries), but insertions(adding new entries) shouldn't be rare.

For URLs, insertions should be common. Updates of counter column happens very often.

Select queries also happens very often. They looks like this "Select where group = :group and url = :url".

So, do I need indexes? Share your opinions

Using MySQL.
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Yes, you should have indexes. Probably, unless the system is mainly for inserts and updates, and rarely queried, you should have indexes. Always index foreign keys to prevent pinning the entire table when the parent is updated or deleted.

Anyway, both tables already have indexes because of the primary key, neither of which are needed, because the data in the table is already unique. I don't understand why people are obsessed with adding numbers to unique textual data. :/ Group is a keyword, so it should be avoided. Groups (i dislike plurals in table names) and group_fk just seem strange to me. URLs can be far greater than 80 characters. 256 might be a good starting point, maaaaaybe.

Also, if you are not going to name your constraints, you can just put them inline:
 
P Marksson
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Wow! Nice improvements!

One question though: Shouldn't URL.Category be a VARCHAR? How can an INT reference a VARCHAR?
 
Claude Moore
Ranch Hand
Posts: 829
7
IBM DB2 Java Netbeans IDE
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I think you should try some use cases. You are indexing VARCHAR columns;if you are going to run selects with LIKE clause on such columns, you may not found any benefit at all - query may result in a full table scan.
Does Mysql provide some performance analyzer? if so, you may start with no index at all and add indexes later, when needed and the overhead in inserting new rows is less important than faster select time.
Anyway if you need uniqueness on a char key, you need somehow to specify an unique constraint and some databases implement uniqueness via indexes..so you're isn't really a choice...
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
P Marksson wrote:Wow! Nice improvements!

One question though: Shouldn't URL.Category be a VARCHAR? How can an INT reference a VARCHAR?

Yeah. Good catch; i edited poorly.
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Claude Moore wrote:You are indexing VARCHAR columns;if you are going to run selects with LIKE clause on such columns, you may not found any benefit at all - query may result in a full table scan.

Why? As long as the characters before the wildcard are specified, the index should be used. If not, there's nothing you can do anyway. Regardless, the index cost is minimal.
 
Claude Moore
Ranch Hand
Posts: 829
7
IBM DB2 Java Netbeans IDE
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Brian Tkatch wrote:
Claude Moore wrote:You are indexing VARCHAR columns;if you are going to run selects with LIKE clause on such columns, you may not found any benefit at all - query may result in a full table scan.

Why? As long as the characters before the wildcard are specified, the index should be used. If not, there's nothing you can do anyway. Regardless, the index cost is minimal.

Right, I meant a scenario with "column like '%searched%'. "  constraints.
 
Stefan Evans
Bartender
Posts: 1807
10
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Just to put the opposing view, I am one of those who would use numbers to generate the link between these two tables.
Probably as a result of all of that third normal form database design that got drilled into my head along the way.

So I would give the category table an id field, and the foreign key reference from the URL table would reference that.

Reasons (not that they are necessarily good ones)
- Integer comparisions are easier/faster than string comparisions  (premature optimization)
- The category/group name is only stored in the database once, rather than repeated for every URL.  If you ever want to change it (not in your use cases I agree) there would be only one row to change in the database.

 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Stefan Evans wrote:Just to put the opposing view, I am one of those who would use numbers to generate the link between these two tables.
Probably as a result of all of that third normal form database design that got drilled into my head along the way.

So I would give the category table an id field, and the foreign key reference from the URL table would reference that.

Reasons (not that they are necessarily good ones)
- Integer comparisions are easier/faster than string comparisions  (premature optimization)
- The category/group name is only stored in the database once, rather than repeated for every URL.  If you ever want to change it (not in your use cases I agree) there would be only one row to change in the database.

What does 3NF have to do with IDs?
Why are integers comparisons any faster? They are both, ultimately, done in binary.
The category.group name is small, and there is no issue repeating it, indeed, it'll make it easier to read manually. I would strongly recommend leaving it as character based, as i see zero benefit, and a slight negative in adding a redundant id.
The url is larger, and if it actually takes up hundreds of characters and is repeated in many tables, while the space is negligible (probably used anyway by the db blocks, so no space saved anyway) it might look cleaner, or save memory in the code? No idea.

Adding ids is up there with rebuilding indexes. Some people think it ought to help, even though it rarely, if ever, does.
 
P Marksson
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Yeah I noticed that to. Having  URL.Category as a varchar means the string for a given group/category is defined/exists on more than one place. Thoughts on that?
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
P Marksson wrote:Yeah I noticed that to. Having  URL.Category as a varchar means the string for a given group/category is defined/exists on more than one place. Thoughts on that?

A lot of people thing using a number is faster or uses less space. Until they bring some sort of proof, i think it is a cumbersome idea that offers no perceptible benefit, if any at all.

I learnt to do it that way originally. I have have since switched after reading about people who questioned this dubious idea and seeing no benefit whatsoever is using the numbers. Since then, it has gotten easier to read data as i don't have to remember what id means what.

In databases, ideas die hard. One large company i worked for forced everyone to use 8.3 table names, where the first letter had to be M and the last four letter had to be _TBL. Although it was likely 20 to 30 years after this had any relevance, this company was still enforcing the rule. The M was Master (as opposed to C for Child) which was used in hierarchical databases, even though databases these days are relational. The 8.3 had to do with some database they used stored each table as a file on the OS and was restricted to an 8.3 filename. The _TBL was a redundant naming scheme with no benefit whatsoever, as tables cannot be anything other than tables and the files were kept is separate directories.

Another example is people who think (in Oracle) that rebuilding indexes makes the database faster. This is based on people not understanding how caching works and misapply an idea used by large tables (millions of records) that have been heavily modified (over 50% of their data deleted or updated).

The same is true for ids. Somebody had the notion that numbers would save space or be faster. While it may save space against (larger) varchar columns (that actually use the space), there is no shortage of space these days where that is going to make a difference. Some people think numbers are faster. I have not seen any proof of this. Who knows, maybe letters are faster.
 
Dave Tolls
Ranch Hand
Posts: 2112
16
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Brian Tkatch wrote:
Another example is people who think (in Oracle) that rebuilding indexes makes the database faster. This is based on people not understanding how caching works and misapply an idea used by large tables (millions of records) that have been heavily modified (over 50% of their data deleted or updated).


This was always one of Tom Kytes bugbears, IIRC.
I think he used to call it lobotomising the database.

Brian Tkatch wrote:
The same is true for ids. Somebody had the notion that numbers would save space or be faster. While it may save space against (larger) varchar columns (that actually use the space), there is no shortage of space these days where that is going to make a difference. Some people think numbers are faster. I have not seen any proof of this. Who knows, maybe letters are faster.


I suspect it's more (these days) down to people seeing generated IDs on most tables and so stick them on all.
Copy pasta!
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Dave Tolls wrote:This was always one of Tom Kytes bugbears, IIRC.
I think he used to call it lobotomising the database.

Heh. And Jonathan Lewis. Richard Foote (who blogs about Oracle indexes) went through this in excruciating detail.
 
Tim Holloway
Saloon Keeper
Posts: 18367
56
Android Eclipse IDE Linux
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Brian Tkatch wrote:Some people think numbers are faster. I have not seen any proof of this. Who knows, maybe letters are faster.


Back in the ancient days when databases were barely more than flat files with relationships, numbers were almost always faster because the number would have been stored in machine binary format and a single machine-language instruction could compare them versus (in most cases) multiple machine-language instructions for character strings. And technically, even a single CISC instruction such as IBM's CLC is at the fundamental level multiple microcode instructions.

Since then, however, databases have gotten much more complex and much more abstract. Like Java's JIT re-optimizing, a modern database might internally change the internal organization of both columns and indexes on the fly. It might make text fields faster to compare by adopting an approach where an integer hash key was the first thing to be compared before doing a character-by-character comparison just like Java itself does. Fracturing keys into structured components for faster searching is a strategy that was widespread even as far back as the early 1980's (speaking from experience).

Similarly, there are ways to compress text fields, including having the internal representation of commonly-repeated values replaced by an integral index which can be done transparently by a sufficiently-intelligent/well-tuned DBMS.

So just as in Java, I prefer to skip the micro-optimizations these days and try to code more straightforwardly, if for no other reason than that optimizers usually look for the common approach when determining what to optimize.

Still, there are certain constructs that I use grudgingly, when I do so at all. I don't like URLs as primary keys, because not only are they prone to mutation, but also often aliased and/or piggybacked. A UUID is less than ideal as a primary key if you want to shard based on some characteristic such as country ID, since you have to search all the shards. Stuff like that. Where possible, I prefer to keep my keys small and tidy. Because although a smart DBMS may be able to do great things with untidy data, I'd rather not discover the hard way that the DBMS I'm using doesn't.
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Tim Holloway wrote:I don't like URLs as primary keys, because not only are they prone to mutation, but also often aliased and/or piggybacked.

I guess that depends on usage. If it's a table to store links that will not change (like a log or listing), it's a non-issue. If it's a list of what services are located where, the url should obviously be an attribute, as the service is the key.
 
Tim Holloway
Saloon Keeper
Posts: 18367
56
Android Eclipse IDE Linux
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
"That will never change" are famous last words.

I twitch a little at the idea that a service - or a server - might not be renamed.

I twitch a LOT at the idea that a string (key or otherwise) might end up being internationalized. Or re-capitalized. Or re-punctuated.

Codes in place of "meaningful" text pre-date computers. They defend against ambiguity and often have self-checking attributes.
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Tim Holloway wrote:"That will never change" are famous last words.

Which is why i added "(like a log or listing)" which are cases where it will not change.

Tim Holloway wrote:I twitch a little at the idea that a service - or a server - might not be renamed.

That depends on your usage of the table. A service means something, server names change, perhaps, but so do any designation. A number is just a way of agreeing to an id that won't change. So, apply the same rule to the name, and *poof* is now won't change.

Tim Holloway wrote:I twitch a LOT at the idea that a string (key or otherwise) might end up being internationalized. Or re-capitalized. Or re-punctuated.

I don't know what you mean by I18N here, as a name is a name. As for capitalization, i usually force my name columns to be UPPER or LOWER so unique is what we humans consider unique. Presentation of a name should be a separate column.

Tim Holloway wrote:Codes in place of "meaningful" text pre-date computers. They defend against ambiguity and often have self-checking attributes.

I contend that in most cases that was mere convenience.
 
P Marksson
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Well, I read on stackoverflow that VARCHAR as primary key is significantly slower. From 2012 though.
Also read someone saying using VARCHARs as key wont have a duplicated entry and that the approach is optimized.
So confusing!

----------------------------------------

Seems like MySQL loses its intelligence once a VARCHAR is used as foreign key.

This is the tables I am playing with:



Forgot to mention earlier; all the URLs within a given Category must be unique.

Anyway, this for example, is legal:



This insertion will work fine even if there is no Category called HEY. If the foreign keys were integers, we wouldn't have this problem.
Kinda makes me afraid to use this. I don't want a system with sporadic inconsistencies.
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
P Marksson wrote: I read on stackoverflow that VARCHAR as primary key is significantly slower.

I'd need to see proof. Maybe it's a mysql thing?

P Marksson wrote:Seems like MySQL loses its intelligence once a VARCHAR is used as foreign key.

If that is true, i wouldn't use mysql. (But i'm biased. I wouldn't use it anyway. )

P Marksson wrote:

1) UNIQUE is a constraint, not an index. (There are also unique indexes, but that is not the point.)
2) Naming an index by use rather than what it is, is convenience today but confusion tomorrow.
3) The _fk is not a column.

P Marksson wrote:Forgot to mention earlier; all the URLs within a given Category must be unique.

That makes a huge difference. Please tell us how it is intended to be used. It would likely help the discussion.

P Marksson wrote:This insertion will work fine even if there is no Category called HEY.
If the foreign keys were integers, we wouldn't have this problem.

Huh?

Anyway, based on this new information (and a related idea), i would suggest a little change:


What's the counter for?
 
P Marksson
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Regarding performance of VARCHAR as fk, here is an interesting read:
http://stackoverflow.com/questions/2103322/varchar-as-foreign-key-primary-key-in-database-good-or-bad

What would you prefer over MySQL? I am not locked to it. I can change it whenever I want.
I read a Postgres vs MySQL articles but wasn't convinced. Maybe you can convince me?

I like the idea of having combination of two columns as primary key. Clean and simple.
Category_Adress.Category can still reference a non-existing Category though.

The counter is a secret ;) It's updated pretty pretty often though.
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I prefer Oracle. PostreSQL is similar, so it's probably my second choice. I just seriously do not like mysql and as a database bigot do not consider it worthy of being called a database. It's gotten better over the past number of years though.

Composite Primary Keys should be used often. It's generally a sign of a good data model.

Category_Adress.Category can still reference a non-existing Category though.
Please explain what you meant. That example is changed for Oracle. Note, my earlier code spelled "address" with only one "d". Oops.
 
P Marksson
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Strange. MySQL doesn't check for parent key if varchar. I checked if foreign keys are decelerate differently in MySQL but nope, still wont check.

There was this senior programmer at my previous work whom told me MySQL sucks. Basically, '3' = 3 will return true. It's kinda like C++: A old and crappy language that they constantly try to improve, but won't make anything better because the foundation is rotten.

Will simply switch to Oracle SQL Server(?).

Have to rewrite my Hibernate entities too :/

Thanks for the help though. I feel the improved scheme will significantly boost the application. I can also remove my synchronization from the backend now :-)
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Heh. Oracle and SQL Server are rival products. Both have free variants, but you would have to check if they apply in your case.

If you want to use mysql, even for convenience, you can just add that id. It isn't wrong when there's a reason for it. In any case, thank you for the report about mysql. I now have another thing to make fun of. :P
 
Dave Tolls
Ranch Hand
Posts: 2112
16
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Got to say I have never encountered this with MySQL.
I've had lookup tables with VARCHAR primary keys and the FK mappings work as expected.
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Dave Tolls wrote:Got to say I have never encountered this with MySQL.
I've had lookup tables with VARCHAR primary keys and the FK mappings work as expected.

I can't believe i'm doing this. I hate mysql. Setup was a pain. I'm going to take pleasure in purging it right after i post this message.

 
Dave Tolls
Ranch Hand
Posts: 2112
16
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I'm not in a MySQL shop, but does 'references' act the same as:
CONSTRAINT 'fk_some_fk' FOREIGN KEY ('some_col') REFERENCES 'tabel2' ('other_col')
or whatever the syntax is?
I always stick the constraints as a separate thing.  Habit.

In fact, some Googling (which also brought up this ancient bug, referenced by this SO thread) seems to imply that inlined use of 'references' doesn't work.

It isn't mentioned as valid in the docs anymore, that's for sure.  Which explains why I've never used it.
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Dave Tolls wrote:I'm not in a MySQL shop, but does 'references' act the same as:
CONSTRAINT 'fk_some_fk' FOREIGN KEY ('some_col') REFERENCES 'tabel2' ('other_col')
or whatever the syntax is?

Yes. The keyword CONSTRAINT is only required when naming a constraint. It is always optional.
FOREIGN KEY() is used to specify the child when not done inline.

Dave Tolls wrote:I always stick the constraints as a separate thing.  Habit.

So do i; it's prettier that way. But for tests, why bother?

Dave Tolls wrote:In fact, some Googling (which also brought up this ancient bug, referenced by this SO thread) seems to imply that inlined use of 'references' doesn't work.

Sorry, but mysql is such a piece of garbage. This is basic SQL syntax. And basic security. Why would anyone use this trash?

Dave Tolls wrote:It isn't mentioned as valid in the docs anymore, that's for sure.  Which explains why I've never used it.

Um, then why did i not receive an error from the ever so finicky command line? Does mysql pick and choose what to be fussy about?

Sorry, but this is a clear bug. If they do not support the syntax, they do not support SQL. Unless "mySQL" means "my own version of SQL".
 
Dave Tolls
Ranch Hand
Posts: 2112
16
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
See the ancient bug.
They weren't getting an error either.

Does it work properly with the separate CONSTRAINT line?
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Dave Tolls wrote:See the ancient bug.
They weren't getting an error either.

Does it work properly with the separate CONSTRAINT line?

No idea. And i'm not installing mysql again. What a pain.
 
P Marksson
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Dave Tolls wrote:See the ancient bug.
They weren't getting an error either.

Does it work properly with the separate CONSTRAINT line?


Actually, it does work. Shame on me for my bad tests yesterday. Guess I hadn't completely landed yet.

 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
P Marksson wrote:

FWIW, leaving off the optional CONSTRAINT keyword (and the name) should work as well.
 
Dave Tolls
Ranch Hand
Posts: 2112
16
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Brian Tkatch wrote:
FWIW, leaving off the optional CONSTRAINT keyword (and the name) should work as well.


I would have thought you'd learned not to rely on 'should' when it comes to MySQL!
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic