
High-Performance Java - When do I need to look at my vendor's JDBC driver code?

 
Kent O. Johnson
Ranch Hand
Posts: 62
Hi Vlad, when optimizing and writing high-performance persistence code, how often will I need to delve into the source code of my vendor's JDBC driver?

I have had the experience of working with a JDBC driver that either didn't support the locking features I wanted or didn't make it obvious where they were.
Specifically, I was using the community version of Ingres 10, a product now supported by Actian Corporation. After trying each of the different locking strategies through Spring Data JPA, then JDBC, and then Hibernate, I finally found a solution with Hibernate.

Though I may have overlooked some documentation or missed code in the JDBC driver, I was unable to use locking mechanisms like those shown in most posts I found on Stack Overflow and elsewhere. I was also a bit surprised that I felt compelled to open the source code of Ingres's JDBC driver, given how exasperated I got at one point.

Do you find it common or uncommon to have to read your persistence vendor's JDBC driver source code?
 
Vlad Mihalcea
Author
Ranch Hand
Posts: 32
I always try to understand the JDBC driver as much as possible, including its known bugs.
After all, every statement flows through the driver, and if the driver has an issue, there isn't much you can do in an upper layer such as Hibernate.

The first part of my book dives into the peculiarities of the Oracle, SQL Server, PostgreSQL, and MySQL drivers in relation to batching, statement caching, and fetching, as well as concurrency control on the database side.
As back-end developers, we are supposed to know how the database works, how to index it, how to interpret an execution plan, and how to write the most effective queries.
This means we need to be very familiar with the database manual and also the JDBC driver documentation.
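To make the batching point above concrete, here is a minimal sketch of the JDBC batch API. The class name and the stubbed `PreparedStatement` (built with a JDK dynamic proxy so it runs without a real database) are mine; with a real driver you would prepare the statement from a `Connection`, and some drivers need extra configuration to actually benefit (for example, MySQL's `rewriteBatchedStatements=true`).

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {
    // Record every call made to the stub statement so batching is visible.
    static List<String> calls = new ArrayList<>();

    // Stand-in for connection.prepareStatement("insert into post (id) values (?)").
    static PreparedStatement stubStatement() {
        InvocationHandler handler = (proxy, method, args) -> {
            calls.add(method.getName());
            if (method.getName().equals("executeBatch")) {
                return new int[0]; // pretend no rows were affected
            }
            Class<?> r = method.getReturnType();
            if (r == boolean.class) return false;
            if (r == int.class) return 0;
            if (r == long.class) return 0L;
            return null;
        };
        return (PreparedStatement) Proxy.newProxyInstance(
            BatchSketch.class.getClassLoader(),
            new Class<?>[] { PreparedStatement.class },
            handler);
    }

    public static void main(String[] args) throws Exception {
        PreparedStatement ps = stubStatement();
        // With batching, three inserts travel to the server in one round trip.
        for (long i = 1; i <= 3; i++) {
            ps.setLong(1, i);
            ps.addBatch();
        }
        ps.executeBatch();
        System.out.println("addBatch calls: "
            + calls.stream().filter("addBatch"::equals).count()
            + ", executeBatch calls: "
            + calls.stream().filter("executeBatch"::equals).count());
    }
}
```

The shape is the whole lesson: many `addBatch` calls, one `executeBatch`, one network round trip instead of three.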
 
Tim Holloway
Saloon Keeper
Posts: 18362
There was a time when the Microsoft JDBC driver for SQL Server was severely lacking in transaction support, as I learned the hard way. Fortunately, the open-source jTDS driver on SourceForge was more robust, so that's what I ended up using. Microsoft has since fixed their own driver, but at the time, no such luck.

These days, unless you have a source license for the JDBC driver code or it is published publicly, you could have legal problems. The DMCA is rather intolerant of reverse-engineering, and Mickey Mouse® might come kick down your doors and drag you off to Gitmo.

If you do have legal access and want something to read yourself to sleep, there are worse things to do than read driver code. But the standard admonition against premature optimization should still be observed. I can relate from long experience that performance issues are almost never where you "know" they're going to be, and doubly so when the real bottlenecks are likely to be inside the database server itself. So avoid creating gnarly code to solve problems that might never arise, and spend some of that late-night reading time studying performance measurement tools instead.
 
Vlad Mihalcea
Author
Ranch Hand
Posts: 32
Unfortunately, jTDS hasn't moved forward; it's stuck at JDBC 3.0. The Microsoft driver has released new versions in the meantime, so it might be the only option these days.
If premature optimization is a bad thing, designing the data access layer without knowing the stack underneath is even worse.

An ORM tool can make it very easy to select far too much data or to execute more queries than necessary.
Many performance issues don't arise during the development phase due to lack of data; once you move into production and data accumulates, the problems become obvious.
That's, unfortunately, an all too common scenario.
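The "more queries than necessary" trap is the classic N+1 problem. Below is a toy model of it, with no database and entirely hypothetical names: the methods just count simulated statements, contrasting one lazy-loading round trip per parent row against a single joined query.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class NPlusOneSketch {
    static int queryCount = 0;

    // Simulates "SELECT id FROM post" -- one statement.
    static List<Integer> loadPostIds(int posts) {
        queryCount++;
        return IntStream.rangeClosed(1, posts).boxed().collect(Collectors.toList());
    }

    // Simulates lazily loading comments for one post -- one statement per post.
    static List<String> loadCommentsFor(int postId) {
        queryCount++;
        return List.of("comment for post " + postId);
    }

    // Simulates a single joined query fetching posts with their comments.
    static int loadPostsWithComments(int posts) {
        queryCount++;
        return posts;
    }

    public static void main(String[] args) {
        queryCount = 0;
        for (int id : loadPostIds(100)) {
            loadCommentsFor(id); // N+1: 1 query for posts + 100 for comments
        }
        int lazyTotal = queryCount;

        queryCount = 0;
        loadPostsWithComments(100); // join fetch: a single statement
        int fetchTotal = queryCount;

        System.out.println("lazy: " + lazyTotal + " queries, join fetch: " + fetchTotal);
    }
}
```

In real Hibernate code the fix is typically a JPQL `join fetch` (e.g. `select p from Post p join fetch p.comments`) or an entity graph, and the counts only become visible once you measure them.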

Monitoring and measuring is the most reliable way to find the cause of any given performance issue. Doing it incrementally, while the project is still under development, gives a better understanding of how the stack behaves when a traffic spike occurs or when the underlying data keeps growing.
 
Kent O. Johnson
Ranch Hand
Posts: 62
Vlad and Tim, thank you. I am very new to the domain of performance tuning in the persistence layer, and your comments are raising my awareness of the need to pay close attention to the behavior of each part of my persistence stack. In the past I would mostly use JPA, Hibernate, and JDBC without thinking much about it. Now that I am looking for ways not only to increase the performance of an application but also to design for high performance and easy scalability, I appreciate how your comments are drawing my attention to the persistence layer. I look forward to reading your book, Vlad. I hope it will show me how to measure and analyze efficiency and performance so I can make judgments about potential improvements.

The deepest I have gone so far is to use and look at database locking and very simple transaction management. One of my current shortcomings as a developer is the lack of deep knowledge of the internal workings of relational persistence technology, which I hope to remedy by working on a great project in conjunction with reading Vlad's book and other resources I can find. It would be nice if there were some sort of roadmap to becoming competent at optimizing a persistence stack as well as designing and architecting software for high performance. I am sure such a roadmap exists; I just need to become aware of it.
 
Tim Holloway
Saloon Keeper
Posts: 18362
I mentioned performance monitoring tools and EXPLAINs, but there's another option when you're dealing with Hibernate JPA, and that's to turn on logging.

Hibernate logs can tell you a great deal about how Hibernate is handling your JPA code, including what actual SQL statements your JPA queries are rendered as, how caching is being applied to statements and data, and much more. I spent WAY more time than I wanted learning about the mayhem that can result from working with a database that supports fixed-length, space-padded keys (DB2) and JDBC drivers that sometimes, but not always, trim trailing spaces. For the record, I don't recommend fixed-length keys in databases where the key could be space-padded, but this particular product was ported from an IBM mainframe, where fixed-length fields were once more efficient. That's one more reason why you shouldn't optimize for implementation details before you actually observe a problem.
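Turning that logging on is a matter of configuration. The property names below are Hibernate's own; the logger categories are the Hibernate 5 names (later versions renamed the parameter-binding category), and the logger lines go in your log4j/logback configuration rather than in the persistence unit:

```properties
# persistence.xml / Spring properties -- echoes SQL to stdout (debugging only;
# prefer the logger categories below in anything beyond local development)
hibernate.show_sql=true
hibernate.format_sql=true

# log4j/logback categories:
# org.hibernate.SQL=DEBUG                  logs every SQL statement issued
# org.hibernate.type.descriptor.sql=TRACE  logs the bound parameter values
```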

One thing you'll see from Hibernate JPA traces is that some decidedly non-intuitive things go on in there, and they have a terminology all their own. Some of it you'll likely only see in Hibernate and not, say, in OpenJPA, which, again, is why premature low-level optimization is something we caution against. I once had to migrate a project from OpenJPA to Hibernate JPA because the OpenJPA implementation of the day didn't support a critical feature for my app. Thanks to the miracle of Maven and the fact that JPA is a Java EE standard, it only took me 15 minutes. If I'd loaded my app with Hibernate-specific optimizations, it would have been much harder.

Higher-level optimization is much safer. As I said in the beginning, there's a cost for joins that's almost inevitable. In some cases it may be useful not to define your JPA entities with the joins explicitly built in, although this can frequently come back to bite you. Tuning lazy versus eager fetching can be very important, especially if you're like me and detach entities before turning them over to the business logic.

As a rule, the less data you transfer between your app and the DB server, the happier the app will be, and the less work you make the DB server do on its end, the happier the DBA will be. These goals are not mutually exclusive if your requests are well designed. You'll also make the network admin happier if you minimize the amount of data going from the DB server to the app. Making the DBA happy mostly means using tools like EXPLAIN to pare down complex requests so that non-applicable data gets removed as early as possible in the request process. There's even a place for stored procedures sometimes, though I've seen too much grief come from them to recommend them casually.
 
Vlad Mihalcea
Author
Ranch Hand
Posts: 32
I use DataSource-proxy for logging statements because it allows me to add custom listeners, for example to verify that the statement count is the one I expect it to be.
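DataSource-proxy's real listener API differs from this, but the underlying idea can be sketched with nothing but the JDK: wrap the `Connection` in a dynamic proxy and count the statements that pass through it. The class name is mine, and the `null` target is only there so the sketch runs without a real driver; in practice you would wrap an actual driver connection.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.concurrent.atomic.AtomicInteger;

public class StatementCountSketch {
    static final AtomicInteger statements = new AtomicInteger();

    // Wraps a Connection so every prepared statement is counted -- the same
    // interception point a statement-count listener hooks into.
    static Connection countingProxy(Connection target) {
        return (Connection) Proxy.newProxyInstance(
            StatementCountSketch.class.getClassLoader(),
            new Class<?>[] { Connection.class },
            (proxy, method, args) -> {
                if (method.getName().equals("prepareStatement")) {
                    statements.incrementAndGet();
                }
                // Delegate to the real connection when there is one.
                return target == null ? null : method.invoke(target, args);
            });
    }

    public static void main(String[] args) throws Exception {
        Connection conn = countingProxy(null);
        conn.prepareStatement("select 1");
        conn.prepareStatement("select 2");
        System.out.println("statements: " + statements.get());
    }
}
```

A test that asserts on this counter after exercising the data access layer catches an accidental N+1 regression the moment it is introduced, rather than in production.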
 