Bill Karwin

author
+ Follow
since Aug 02, 2010
Merit badge: grant badges
For More
Cows and Likes
Cows
Total received
0
In last 30 days
0
Total given
0
Likes
Total received
0
Received in last 30 days
0
Total given
0
Given in last 30 days
0
Forums and Threads
Scavenger Hunt
expand Ranch Hand Scavenger Hunt
expand Greenhorn Scavenger Hunt

Recent posts by Bill Karwin

Mike Peters wrote:I wonder, in what scenario, do you want a user to be able to alter a query component other than the parameters (except for a query tool)?



I was having a discussion with a security expert and he posed me the same question. He even asserted that I'd be "hard pressed" to come up with any realistic example.

I showed him a query like the following:

SELECT * FROM Products WHERE ...criteria... ORDER BY {$Column} {$AscOrDesc}

This is a pretty common pattern where you want the user's choice to determine the sort order and sort direction. For example on Amazon.com you can sort by average customer review, or by price low to high or by price high to low, etc.

The trick is that you can't use a query parameter for column names or SQL keywords (e.g. ASC vs. DESC). You can use a query parameter only in a context where you would otherwise use a single literal value, like a number or a quoted string literal or quoted date literal.
Thanks to Java Ranch for having me as a guest and to the community for asking such good questions!

Cheers!
I fully agree that MySQL should not parse and silently ignore constraints. It doesn't even generate a warning. I have sent feedback to the MySQL team in the past that this is a terrible bug, but it's too late at this point since changing the behavior would break millions of apps and web sites.

On the other hand, Oracle supports foreign key constraints -- except not cascading updates. Also, an empty string in VARCHAR2 is the same as a null, which makes them incompatible with ANSI/ISO SQL and every other vendor's implementation of SQL. This, they cannot fix? They've been the 2nd largest software company in the world for decades!

That's what I mean when I say every database has its idiosyncrasies.

Paul Sturrock wrote:I dislike MySQL for database purist reasons



MySQL has for ten years provided the InnoDB storage engine, which supports RI constraints and transactions. In MySQL 5.5, InnoDB is the default storage engine (finally), and over the last few years InnoDB has been improved so it's faster than MyISAM in most cases.

I agree MySQL still has a few WTF moments (whither check constraints? raise error in triggers? recursive CTEs?), but all RDBMS implementations have their own idiosyncrasies.
Hi Mark,

I wouldn't set a hard limit on the number of joins, or consider too many joins to be an antipattern. This would be like saying it's a Java antipattern that you have more than a certain number of classes. Sometimes you do have too many classes in one application, but the same number of classes in another application might be just what is needed.

In my experience, it's more common for developers to use fewer joins than they need, than for them to use too many joins. They read some blog that says "joins are slow" and so they try anything they can to avoid using joins -- even if by eliminating joins they have actually made the problem harder to solve or slower to run. See the example given by Razvan Popovici earlier this week: https://coderanch.com/t/508765/JDBC/java/SQL-Antipatterns

One common place I see people use too many joins is when you use the Entity-Attribute-Value design and you try to query a result set from it on one row per entity, as if it were a proper table. You end up doing an extra join to your EAV table for each attribute. The EAV design is of course an antipattern.
Certainly it depends on the type of relation used for the relation valued attribute. You can't just use arbitrary key/value pairs, with a potentially different set of keys in each row.

For it to be relational, you'd have to ensure that the same keys are present in every row. The easiest way to do this in an RVA is to make the keys column names, and the values would be data in a single row.

But if you're doing that, you might as well just store the properties in conventional columns instead of an RVA. The reason a product like Joomla stores the blob of key/value pairs is that they want to extend the set of keys without creating more columns. And that isn't supported by relational theory.

David Lee Lambert wrote:...newline-separated list of "key=value" pairs...



This is what Martin Fowler might classify as the Serialized LOB pattern: http://www.martinfowler.com/eaaCatalog/serializedLOB.html

It's a question of the lesser of the available evils. The alternative way some people store key/value pairs is the awful Entity-Attribute-Value design, which is an SQL antipattern. Collecting the key/value pairs into a blob and storing them all together is actually preferable to EAV.

You'll still find it inconvenient to use the key/value pairs as discrete attributes in SQL expressions, simply because SQL doesn't offer syntax to address the individual sub-elements. You just have to treat the params column as a kind of black box, at least from the context of SQL queries. You can fetch the whole list of key/value pairs and access them using application code once you've deserialized them into an application-side Map of some kind. You've described that you do this with Hibernate or with stored procedures, but that's kind of a lot of work, and you still can't reference one of these sub-elements simply in a WHERE clause.

Read "SQL and Relational Theory" by C. J. Date (probably the best living authority on relational theory). He points out that a single attribute can have structure. You could even store a relation in a single column. He calls these relation valued attributes, and that starts to resemble a set of key/value pairs. Most SQL vendors don't support this concept, but it's not a violation of relational theory, as long as the given column stores the same type of relation on all rows.
Hi David,

In his presentation, he didn't show the LINQ code that produced that blob of SQL. But it's an extreme example anyway. The point is that abstraction layers have to be general-purpose, so in any situation besides the utterly trivial SELECT * FROM Table, ORM's can't optimize for a given situation as well as a human can.

That's why I say that ORMs are generally for developer productivity, not code efficiency. It could still be a win if 80% of your code uses the SELECT * FROM Table type of query, but you should assume that you'll need to bypass the ORM for your remaining queries. If we were to accept that, the ORM could be smaller and simpler, and it wouldn't be as likely to make a mess out of the simple queries, as in the LINQ example above.
What brand of database is this? (MySQL, Oracle, Microsoft SQL Server, etc.)

In MySQL, you can do: SELECT SLEEP(180) to make a query that take three minutes.

If you are using another brand of database, perhaps it has a similar function.
Hi Jeanne,

Yes, I do recommend using parameters when the dynamic part of the query is a value.

I have an example in my book of adding a list of ? parameter placeholders to a query string and preparing that. Your batching idea takes that a step further, and gives benefit in some vendors of RDBMS, where the optimization of a given SQL query is cached. Other databases don't cache optimization plans, so there's no benefit in that way.

But as I said, you can't use parameter placeholders for things other than values. If your dynamic query involves dynamic table name, column name, expressions, or SQL keywords, you have to rely on string interpolation. And that's where it's important to prevent untrusted input from getting into your string.
My book tries to identify common blunders. I didn't write about the merits of using stored procedures in my book, because I consider neither using stored procedures nor using application code to be a blunder. As developers, we should use the language and environment where we are the most productive. So it depends on the tools you have available, and the skills and experience of the individuals in your team.

There are those who say every database interaction should be through a stored procedure interface. There are others who insist the database should be a dumb persistence store, and all constraints and domain logic should be implemented in application code. I think both of these positions are too inflexible, too unequivocal.

I'm a proponent of using referential integrity constraints (foreign keys) in the database, and of using these constraints to implement cascading delete and cascading update. But on the other hand, I seldom write triggers or stored procedures unless there's a significant advantage for the sake of logic or performance.
I admit I may have misunderstood Hibernate, it appears that it transparently creates a table to serve as a supertype. Mea culpa!
http://docs.jboss.org/hibernate/stable/core/reference/en/html/inheritance.html

The antipattern is when a child table references several parent tables using a single foreign key column, but there is no table for the supertype common to all the parents. It's more common for programmers who use loosely typed languages like PHP or Ruby to try to use the foreign key in the spirit of "duck typing."
Using prepared statements with parameters helps to defend against SQL injection in most common cases, when you would otherwise interpolate untrusted content into a string and then execute the string as an SQL statement.

But a query parameter takes the place of a single value. You can't use a query parameter as a substitute for a dynamic table name, column name, list of values (e.g. for an IN() predicate), expressions, or SQL keywords.

For those cases, you can use techniques like filtering or whitelisting so you don't interpolate untrusted content into your SQL strings.

Filtering is where you strip out any characters that would cause trouble. If you know your dynamic column name should only be alphanumeric characters, then apply a filter to your variable before using it in SQL. Or else just reject a variable if it doesn't match a regular expression like /^[A-Za-z0-9]*$/

Whitelisting is where you use a Map<String,String> so you can transform the untrusted input into known safe values. If a user inputs a column name that doesn't appear as a key in the Map, then use a default or else raise an error.
Hi Helana,

My book is mostly about SQL, but I wrote six chapters in a section on Application Development Antipatterns. SQL is designed to be used in cooperation with a host language which could be Java, C#, PHP, Python, Perl, Delphi, or anything else. The Application Development Antipatterns I cover are about how we use SQL in an application written in another language.

Sunil Tiwari wrote:1) How can SQL Antipatterns help in MVC architecture?


I do have a chapter titled Magic Beans in my book, about the MVC architecture. The MVC antipattern I see most commonly is for application development frameworks such as Ruby on Rails (and its imitators) to simplify MVC by making the Model an instance of the Active Record data access component. Martin Fowler describes Active Record (http://martinfowler.com/eaaCatalog/activeRecord.html), but he also describes several other data access patterns. Unfortunately, frameworks that try to simplify application development push the idea that all you need is Active Record. Of course most applications in the real world are more complex, and you have to encapsulate that complexity somewhere. So users of these frameworks end up putting a lot of their domain logic into their Controller classes, which leads to fat controllers, code duplication, and difficulty in unit testing.

Sunil Tiwari wrote:2) What are the common pitfalls in DB application development, using Java?


I have standard advice for Java developers:
  • 90% of the issues you will face are related to your CLASSPATH.
  • Of the remaining problems that are not due to CLASSPATH, 90% of those are related to your classloader.

  • This is kind of a glib answer, but I think it's not far from being true. In Java, the biggest hurdle that faces every developer (edit: every developer who uses JDBC, that is) is to try to understand how to load a JDBC driver and get a connection to the database. It's probably the first time a Java developer uses the reflection API (perhaps the only time).

    This isn't the most advanced or complex issue, but it's the most common one, because every developer has to learn to connect to the database first, before doing anything further. And let's face it, there's quite a steep learning curve to understanding the steps of connecting to a database.