Win a copy of The Little Book of Impediments (e-book only) this week in the Agile and Other Processes forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

min/max of a group

 
Hubert Grzeskowiak
Greenhorn
Posts: 22
Java Linux Python
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi,

given an Entity Employee with fields name, age, salary and employment status (enum as string), how would you get the one top-earning employee with status=EMPLOYED grouped by age?

Example data (name, age, salary, status):
- Rose, 21, 30000€, EMPLOYED
- Clara, 21, 28000€, EMPLOYED
- River, 23, 31000€, EMPLOYED
- Rory, 23, 40000€, EMPLOYED
- Doctor, 1300, 42000€, INACTIVE
- Amy, 27, 27000€, EMPLOYED

Expected results:
- Rose, 21, 30000€, EMPLOYED
- Rory, 23, 40000€, EMPLOYED
- Amy, 27, 27000€, EMPLOYED


In some DBMS a select over all fields and a max(salary) would return values from the row selected by max(), but this is not the case in MySQL. It simply returns the one field from the max() row but the rest doesn't come from the same row. Example

The code above returns the max salary, but other fields are from another row.

I have found a way using a subselect:

which basically finds the highest salary and returns a row which salary equals that max value.
My problems with this are
- hard readability
- error-prone and hardly maintainable due to double code

Is there a better and DBMS agnostic way of doing this?
 
Roel De Nijs
Sheriff
Posts: 10662
144
AngularJS Chrome Eclipse IDE Hibernate Java jQuery MySQL Database Spring Tomcat Server
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hubert Grzeskowiak wrote:In some DBMS a select over all fields and a max(salary) would return values from the row selected by max(), but this is not the case in MySQL. It simply returns the one field from the max() row but the rest doesn't come from the same row.

I wonder which DBMS will return values from the row selected by MAX(), because that really makes no sense at all! Which results would you expect if the example data would have been:Multiple rows have the maximum value for salary, so how can the DBMS know which value to return

And your example query is illegal in standard SQL, because the nonaggregated name, age, and status columns in the select list do not appear in the GROUP BY clauseBut MySQL has a specific handling of GROUP BY and that's why that query is allowed. But it has to be used with caution, otherwise you'll end up with incorrect results.

Hubert Grzeskowiak wrote:My problems with this are
- hard readability
- error-prone and hardly maintainable due to double code

Your initial solution might be easier to read, but is illegal SQL and will be error-prone as well

Hubert Grzeskowiak wrote:Is there a better and DBMS agnostic way of doing this?

I don't think there is, but I am not a SQL expert at all. If you didn't have to take the employment status into account, you could do probably something like this

Hope it helps!
Kind regards,
Roel
 
Hubert Grzeskowiak
Greenhorn
Posts: 22
Java Linux Python
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi and thanks for the reply,

Multiple rows have the maximum value for salary, so how can the DBMS know which value to return.

- In my case any of the entries would be okay. I'd expect the database to simply return the first one it finds.


And your example query is illegal in standard SQL

This is really good to know! I didn't know this was MySQL specific.

If you didn't have to take the employment status into account

Unfortunately this is part of the core business logic. In my actual use case I'm dealing with system emergencies and an enum is used to represent the status (e.g. OPEN, FIXED), so when showing active emergencies I don't want to show outdated ones.

After some more searching I found that StackOverflow actually has a tag specially for this problem: http://stackoverflow.com/questions/tagged/greatest-n-per-group
Some of the best rated answers are very similar to yours :-)
 
Roel De Nijs
Sheriff
Posts: 10662
144
AngularJS Chrome Eclipse IDE Hibernate Java jQuery MySQL Database Spring Tomcat Server
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hubert Grzeskowiak wrote:- In my case any of the entries would be okay. I'd expect the database to simply return the first one it finds.

That sounds really like a very weird business requirement. How can River be considered the same as Rory or Roel

Hubert Grzeskowiak wrote:After some more searching I found that StackOverflow actually has a tag specially for this problem: http://stackoverflow.com/questions/tagged/greatest-n-per-group
Some of the best rated answers are very similar to yours :-)

Pfew! I feel a bit releaved to know what I've suggested makes some sense
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic