• Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Retrieve records irrespective of the upper and lower case in which it is stored in Oracle DB

 
Rithanya Laxmi
Ranch Hand
Posts: 185
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi,

we have the below query in Oracle DB:-

select ('tbt-' || customer name ) from CUSTOMER

which appends the customer name with the prefix "tbt-" which will be passed as an input to CONTACT table which is having an ID which contains "tbt-<Cust_name>", but the real problem is that ID also contains some records with upper "TBT-<CustName>" hence there is a mismatch and we are getting only the records for lower "tbt-<Cust_name>" and not for records "TBT-<Cust_name>". Please let me know how we can get the both "tbt-<Cust_name>" as well as "TBT-<Cust_name>" records from CONTACT table. This is the below query:-

select * from CONTACT where id in (select ('tbt-' || customer name ) from CUSTOMER);

Now how we can include OR condition to support the below as well, so that it supports both UPPER & LOWER "tbt-" prefix?

select * from CONTACT where id in (select ('TBT-' || customer name ) from CUSTOMER);

since we are unable to use the OR condition for the inner SELECT , I having issues supporting both the upper & lower prefix values?

Please clarify.

Thanks
 
Martin Vajsar
Sheriff
Posts: 3752
62
Chrome Netbeans IDE Oracle
  • Likes 3
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
There are several possibilities. You could do the comparison in upper case:
This would be problematic if you have customers whose names differ in case only. And it isn't index friendly, unless you create a function based index on UPPER(id) on table CONTACT.

Alternatively, you can put the OR into the outer query:
This one is better suited to use index on the CONTACT.ID column (if any), but scans the CUSTOMER table twice.

A better way would be to homogenize the records in the CONTACT table to contain only one of the variants, or (even better) to remove the 'tbt-' prefix from the ID column altogether. It doesn't really belong there.
 
Roel De Nijs
Sheriff
Posts: 10662
144
AngularJS Chrome Eclipse IDE Hibernate Java jQuery MySQL Database Spring Tomcat Server
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Rithanya Laxmi wrote:select * from CONTACT where id in (select ('TBT-' || customer name ) from CUSTOMER);

since we are unable to use the OR condition for the inner SELECT , I having issues supporting both the upper & lower prefix values?

I'm not an Oracle expert at all, but why not using the UPPER or LOWER function? So something like
 
chris webster
Bartender
Posts: 2407
33
Linux Oracle Postgres Database Python Scala
  • Likes 4
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Oh boy, this is a real mess. It's hard to be sure what is going on here, but it looks like you have problems with:

  • Inconsistent data values.
  • Inconsistent data naming.
  • Poor key definitions.
  • Poor SQL queries.
  • Matching mixed case values in a query.


  • Probably a lot of other stuff because chances are this is not the worst thing hiding in your database. It's a long time since I had to deal with a mess like this, so I may be missing some trick for fixing all the problems with one magic spell. But I'm going to work on the assumption that there really is no such thing as a free lunch, so let's see if we can clear some of these up for you the old-fashioned way.

    Inconsistent data values

  • The idea of a relational database is to act as a "single source of truth". Each value should be stored once only, so that you know it is the definitive value. The exceptions are when primary key values are duplicated to act as foreign keys, and when data is denormalised to improve query performance. But even here it is imperative to keep the values consistent.
  • If you have inconsistent data - where values that should be the same are in fact different - then you need to fix the bugs in your design/application that are allowing this to happen, and you then need to clean up the bad historic data.
  • This may sound like a lot of work, but it will save you or your colleagues more work in the long run.


  • Inconsistent data naming

  • Your query suggests that the "contact.id" column is apparently supposed to contain the same value as the "customer.name" column, but these columns have different contents and different names.
  • If they are supposed to contain the same values, then their names should indicate this.
  • If they contain different values, then your query should not be trying to match them.
  • The usual convention is that "contact.id" would be the primary key of the "contact" table.
  • But in your model contact.id actually contains the customer name plus an unreliable prefix. So is it really supposed to be a foreign key back to the "customer" table?
  • What do these column names mean, and why are they different, if they are supposed to contain the same values?
  • Perhaps the fact that nobody knows the "contact.id" and "customer.name" are the same thing is why they are putting inconsistent data in these columns?


  • Poor key definitions

  • What are the primary and foreign keys for your "contact" and "customer" tables?
  • For example, is customer.name meant to be a unique key on the "customer" table, and a foreign key called (confusingly) "id" from the contact table back to the customer table?
  • Do you know, and does anybody in your team know?


  • Defining keys properly

  • A database table may have several alternative unique keys i.e. certain columns (or combinations of columns) that have unique values for a given record.
  • The primary key is chosen from these "candidate keys", but it has to meet certain extra criteria i.e. it should be unique, mandatory and immutable.
  • The customer.name (or the corresponding contact.id) is not a good choice of primary key.
  • Customers can have the same name, so the name is not reliably unique.
  • Customers can change their names, so the name is not immutable.
  • The primary key of a parent table is used as the foreign key back to that parent from any child tables, so you do not want to change the primary key value because it will force you to change all the corresponding FK values (Oracle stops you doing this by default anyway).
  • As an aside, it is also a really bad idea to add arbitrary suffixes/prefixes to values that you need to rely on for queries, including keys, because that immediately introduces extra sources of errors and inconsistencies, which is what has happened here.
  • All this means you probably need to change your data model to use proper primary keys, and thus foreign keys, on your tables. This will then allow you to implement simple join queries like this in a reliable and efficient manner.
  • One common technique to avoid a lot of these problems is to use a surrogate key i.e. an artificial key that you can guarantee will be unique.
  • Start by renaming the badly named fake key columns you are currently using (like "contact.id") so that people know what these columns really contain.
  • Then create a new numeric surrogate key for each relevant table using a database sequence, which will act as the real primary key in future.
  • This new surrogate key will also be used as the foreign key in any related tables.
  • This sounds like a lot of work, and it probably is, but it is your organisation's fault for screwing up the database design in the first place.
  • This is technical debt and you will be paying for it one way or another, so you might as well start planning for it now.


  • EDIT (22/11/15):
  • One important thing you must do once you've identified your primary/foreign keys is create database constraints to enforce them.
  • A primary key constraint will ensure that the PK column values will be unique, mandatory and cannot be changed.
  • A foreign key constraint will ensure that when you put a value in there, it must correspond to an existing PK in the parent table.
  • This will prevent the kind of problems you are having where the fake keys in the contact and customer tables don't match.


  • Poor SQL queries

  • So let's ignore the fact that your query is relying on two columns containing the same data when they don't, and look at what you are apparently trying to do.


  • This is a nested query that seems to be trying to do a join, so you could just do a join.
  • If you had a reliable and consistent set of values (consistent case, no prefix) in the id/name, then it would look like this:


  • Easy, huh?
  • Of course, even if you don't want to do a join, there are still other problems here e.g. don't do "SELECT * FROM ..." a table because (a) the columns in the table might change in future but the code that handles the results of this query might not know about the changes, and (b) you are probably fetching data that you don't actually need.
  • Always specify the columns you want to read as it's safer and more efficient.


  • Matching mixed case values in a query

  • So now we finally get to your question, but perhaps you are starting to realise that this is not the only problem in this data model?
  • Anyway, here are some ways to handle queries where you want to ignore the case of the data.


  • 1. If your DB values are in a known case e.g. UPPER, but the search value may be either lower or UPPER:

  • Index the DB column(s) you will be searching on.
  • Convert the search value to the known case in the query e.g. "WHERE mytable.mycolumn = UPPER('somevalue')."
  • This will allow your query to do a single comparison against the search value, and more importantly it can use the index to do so.
  • Unfortunately, you are comparing stored values which are all in mixed case, so you can't do this.


  • 2. Dumb poor-performing query

  • Convert both sides of the query to use a known case e.g. UPPER:


  • This gives you a single comparison using the same case, but your query will no longer use the index on the search columns, because you have applied an UPPER() function to the indexed value.
  • This means your query may be slow as it may have to do a full table scan.


  • 3. RECOMMENDED SOLUTION: Fix the problem, not the symptoms

    As mentioned above:

  • Stop writing bad data to your database - fix the bugs in your application that create this data.
  • Run a one-off SQL UPDATE to fix the data e.g. make it all the same case, get rid of the stupid prefix.
  • Then you can stop worrying about matching case in your queries, and get back to worrying about how to fix your data model, which is what your organisation should have been doing in the first place.


  • 4. Hide the symptoms

  • You could add a function-based index to support your queries.
  • I haven't done this, so I can't say if it will work or not, and I would not recommend it, because you are only hiding the problem instead of fixing it.
  • But it might give you a temporary work-around until you can start fixing the real problems.
  • There are probably other variations on the "hide the symptoms" approach, but you should fix the problem instead.


  • Final recommendation: I'm guessing there are plenty more examples of this kind of crap in your system, so maybe somebody at your organisation should pick up a book on relational databases and start learning how to do this stuff properly.
     
    Roel De Nijs
    Sheriff
    Posts: 10662
    144
    AngularJS Chrome Eclipse IDE Hibernate Java jQuery MySQL Database Spring Tomcat Server
    • Likes 1
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Martin Vajsar wrote:A better way would be to homogenize the records in the CONTACT table to contain only one of the variants, or (even better) to remove the 'tbt-' prefix from the ID column altogether. It doesn't really belong there.

    I definitely agree! What you are trying to do here is some dodgy patch work, and that's never a good idea! Just handle (and get rid of) the real cause of this issue.
     
    • Post Reply
    • Bookmark Topic Watch Topic
    • New Topic