Win a copy of The Java Performance Companion this week in the Performance forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

need 2 rows rather than 5 from this query

 
Marilyn de Queiroz
Sheriff
Posts: 9065
12
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
When I run this query,

I get 5 rows -- 2 rows for 7/21/2007 reportingPeriod and 3 rows for 7/28/2007 reportingPeriod for the same machineName.

What I want is 2 rows,
1) one row for 7/21/2007 (the row where lastBackupStatus = Completed)
2) one row for 7/28/2007 (the row where lastBackupStatus = Completed)

I'm thinking I need some sort of subquery, but I'm having trouble coming up with the algorithm that will get me what I want.

Can you help?
[ July 25, 2007: Message edited by: Marilyn de Queiroz ]
 
Abhishek Asthana
Ranch Hand
Posts: 146
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
On what criteria you want those two rows?
Status being 'COMPLETED'? If yes then add this condition in your query's where clause.
Otherwise, please mention the requirement in more detail.
 
Marilyn de Queiroz
Sheriff
Posts: 9065
12
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
In this case each of the reportingPeriods for this machineName has at least one case where status is "Completed", but that will not always be the case. I don't want to exclude those which have only other statuses, but I only want one instance of machine for each reportingPeriod.

If more than one is status Completed, I want the most recent of those.

A status of Completed within a reportingPeriod will override any possible failures later in the same reportingPeriod.

If none of the rows for that machineName for that reportingPeriod has a status of Completed, I still want (only) one instance for that reportingPeriod, maybe the most recent backup time.
[ July 26, 2007: Message edited by: Marilyn de Queiroz ]
 
subodh gupta
Ranch Hand
Posts: 203
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
try distinct function on date.
 
Michael Matola
whippersnapper
Ranch Hand
Posts: 1820
4
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
but I only want one instance of machine for each reportingPeriod.

Aha! That tells me essentially that you want to write a query that operates on some "groups" that you create -- groups consisting of a machineName/reportingPeriod combination. Generally, there are (at least) two ways of doing this: GROUP BY or correlated subqueries. I'll take the correlated subquery approach. With correlated subqueries you essentially create your groups by carefully choosing which fields in the subquery you tie back to the main query. In this case it will be machineName and reportingPeriod.

If more than one is status Completed, I want the most recent of those.
A status of Completed within a reportingPeriod will override any possible failures later in the same reportingPeriod.


"Most recent" is generally easiest to do with some max date logic and a correlated subquery. "Most recent completed" tells me that the qualification "lastBackupStatus = Completed" must be inside the subquery. (In this case you can have it both inside and outside, but I don't think it changes the results.)

First shot:

SELECT v.*
FROM veritas AS v
WHERE v.lastBackupTime = ( select max( v2.lastBackupTime)
from veritas v2
where v2.machineName = v.machineName
and v2.reportingPeriod = v.reportingPeriod
and v2.lastBackupStatus = 'Completed')

But wait! You said: In this case each of the reportingPeriods for this machineName has at least one case where status is "Completed", but that will not always be the case. <...> I don't want to exclude those which have only other statuses <...>If none of the rows for that machineName for that reportingPeriod has a status of Completed, I still want (only) one instance for that reportingPeriod, maybe the most recent backup time.

Since you didn't provide data for this scenario, I made up my own. There are some problems with that, though. I don't know when in your data lastBackupTime is populated or not. Your "Canceled by admin" that has a lastBackupTime suggest that the field gets populated if a backup is at least attempted. But "Never" backed up suggests it can also happily be null. So I see three flavors where machineName/reportingPeriod combos do not have at least one status of completed:

(1) (my 7/15 data) -- lastBackupTime is nonnull for all rows
(2) (my 7/12 data) -- lastBackupTime is null for all rows
(3) (my 7/10 data) -- lastBackupTime is null for some, nonnull for others



The first case (7/15 data) is easy. We can use your requirement "maybe the most recent backup time".

Second shot:

SELECT v.*
FROM veritas AS v
WHERE v.lastBackupTime = ( select max( v2.lastBackupTime)
from veritas v2
where v2.machineName = v.machineName
and v2.reportingPeriod = v.reportingPeriod
and v2.lastBackupStatus = 'Completed')
UNION SELECT v3.*
FROM veritas AS v3
WHERE v3.lastBackupTime = ( select max (v4.lastBackupTime)
from veritas v4
where v4.machineName = v3.machineName
and v4.reportingPeriod = v3.reportingPeriod)
and not exists ( select 'x' from veritas v5 where v5.machineName = v3.machineName
and v5.reportingPeriod = v3.reportingPeriod
and v5.lastBackupStatus = 'Completed')
ORDER BY v.machineName, v.reportingPeriod;

So I unioned in a second select that handles the (7/15) scenario. It selects the most recent row for those machineName/reporting combinations that do not happen to have any completed backups.

(If it bothers you that the first select and second select *may* not be mutually exclusive, you can add a condition v.lastBackupStatus = 'Completed' to the outer select in the first half of the union.)

But this query doesn't select in any of the 7/12 data. (How could we ever know the most recent if all dates are null?) And for the 7/10 data it selects in the 7/10 lastBackupTime because the max() function considers any date greater than null.

I don't even know if my 7/10 and 7/12 mockup data is realistic, so I won't try to handle it.

This, at least, gives you a starting point, I think.

Something about the data you provide though leaves me kind of squirmy. What does it mean to have a populated "lastCompletedBackupTime" and a lastBackupStatus of "Client computer not responding"? Why do you even need both a "lastBackupTime" and "lastCompletedBackupTime" if the status tells you whether it's completed or not? How many "Never backed up" rows get stacked up? Etc.
 
Marilyn de Queiroz
Sheriff
Posts: 9065
12
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
try distinct function on date.
That won't work in this case because I can't distinct on only one item without using distinct on the group. Since all rows are different, I'd still get 5 rows.
 
Marc Peabody
pie sneak
Sheriff
Posts: 4727
Mac Ruby VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
My solution looks like Michael's "second shot".
 
Marilyn de Queiroz
Sheriff
Posts: 9065
12
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
[MM] Aha! That tells me essentially that you want to write a query that operates on some "groups" that you create -- groups consisting of a machineName/reportingPeriod combination. Generally, there are (at least) two ways of doing this: GROUP BY or correlated subqueries. I'll take the correlated subquery approach. With correlated subqueries you essentially create your groups by carefully choosing which fields in the subquery you tie back to the main query. In this case it will be machineName and reportingPeriod.

This is the line of reasoning I was trying to follow.

Since you didn't provide data for this scenario,



(1) (my 7/15 data) -- lastBackupTime is nonnull for all rows

This is probably true for most machineNames.

(2) (my 7/12 data) -- lastBackupTime is null for all rows

Hmmm. I guess that might be possible. Maybe they just got the machine the day the reporting period ends, so they only have one row and lastBackupTime is null.

More research seems to indicate that every time lastBackupTime is null, the status is "Never backed up"

(3) (my 7/10 data) -- lastBackupTime is null for some, nonnull for others

This is quite probable.

But this query doesn't select in any of the 7/12 data. (How could we ever know the most recent if all dates are null?)

True, but we still need to list those machines (once each) in the resultset.

And for the 7/10 data it selects in the 7/10 lastBackupTime because the max() function considers any date greater than null.

This, at least, gives you a starting point, I think.


Yep.

What does it mean to have a populated "lastCompletedBackupTime" and a lastBackupStatus of "Client computer not responding"?

I think it just copies the "lastCompletedBackupTime" from the previous time the backup completed successfully even tho this one didn't.

Why do you even need both a "lastBackupTime" and "lastCompletedBackupTime" if the status tells you whether it's completed or not?

Good question! I probably really don't care about the "lastBackupTime" at all. What I really care about from this resultset is the lastCompletedBackupTime, the status, and the reportPeriod, but if there is no lastCompletedBackupTime, I still need to list the machine (once). It doesn't really matter which instance of the machine within the reportPeriod I choose when the status is not Complete.

How many "Never backed up" rows get stacked up? Etc.

I'm not 100% on this, but there appear to be only one per reportingPeriod per machineName.
 
Mark Spritzler
ranger
Sheriff
Posts: 17278
6
IntelliJ IDE Mac Spring
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Michael definitely is up to the issue and solutions.

I do propose that there seems to be "rules" on which one record you want for the single record on that date. Sometimes, and just sometimes, it is actually easier to create a Stored Procedure to do the work, so you can split it up into multiple queries and piece it together through code that run the rules.

Mark
 
Michael Matola
whippersnapper
Ranch Hand
Posts: 1820
4
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Mark Spritzler:

Sometimes, and just sometimes, it is actually easier to create a Stored Procedure to do the work, so you can split it up into multiple queries and piece it together through code that run the rules.


Definitely. If not a stored procedure, then just select more data than you actually need and do the further processing or weeding out in whatever language you're working in.

Marilyn's posted some queries here that I believe have had the requirement "do it all in one SQL statement" (hope I'm not misrepresenting Marilyn) and that have had embedded in them one special case after another. (If the data's like this, present it this way, if it's like that, do this instead, and oh, if it's like this, transform it it to be like this, etc.) In my response, I was just running with that approach. But I'm also in agreement with Mark; I'd be the person asking "do I *really* have to do this in one single query?"
[ July 27, 2007: Message edited by: Michael Matola ]
 
Marilyn de Queiroz
Sheriff
Posts: 9065
12
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks, everyone, for your input. You are correct, Mike. I don't have access to stored procedures in the present circumstances, so I need to handle stuff in a complex sql statement. I think I have this resolved. Thank you again to everyone.
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic