• Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

MySQL - Creating bins based on and joins

 
Anders Alexandersson
Greenhorn
Posts: 3
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I am trying to figure out how to accomplish this in MySQL

I have two tables:

1) Service Table:

Serv_ID Prov_ID Serv_Date

11812 CT18T005 2016-01-21
11819 CT18T005 2016-02-23
11823 CT18T006 2016-01-25
11832 CT18T010 2016-01-26
11838 CT18T010 2016-02-23
11842 CT18T007 2016-01-25
11848 CT18T007 2016-02-25
12886 CT18T006 2016-02-26
12906 CT18T005 2016-04-21
12907 CT18T005 2016-04-28
12916 CT18T010 2016-03-22
12918 CT18T010 2016-03-29
12934 CT18T007 2016-04-29
13475 CT18T010 2016-06-07

2) Provider Table:

Prov_ID Prov_Start Prov_End

CT18T001 2016-01-01 2016-06-30
CT18T005 2016-01-01 2016-06-30
CT18T006 2016-01-01 2016-05-13
CT18T007 2016-01-01 2016-06-30
CT18T010 2016-01-15 2016-06-30


I need to compute the Average Service Units (ASU) provided per provider per month during the reporting period (01/01/2016 - 06/30/2016), taking into account the number of days the provider(s) were active during the month.  So, there were 4 providers active for the entire month of February.  Provider 006 was active between 01/01/2016 and 05/13/2016 and thus should be counted for the entire month of Jan - April and 13 days in May.

Month ASU/Provider

January 4/(4+15/30)  assuming 30 days in a month
February 4/5
...
...
June

The final output will be a number (Average of the ASU/Provider per month) : Sum(ASU/Provider) / 6

I'm trying to generate a table for active providers by month from the Provider table and that is the part where I don't know how to proceed.

Any help will be appreciated.  This is my first time here.  If I've posted in the wrong section, please let me know.

Thanks.

 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
For anyone who wants to help. (I may look at this later.) In SQL Fiddle, in the first box paste the following and click the "Build Schema" button:
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I do not understand the scenario.

there were 4 providers active for the entire month of February.

I see 5 in the data above, as all 5 started before February and ended after it.

Provider 006 was active between 01/01/2016 and 05/13/2016 and thus should be counted for the entire month of Jan - April and 13 days in May.

First, that is not shown in the data above. Second, there is no id "006" in the data.

January 4/(4+15/30)

Not sure what that means.
assuming 30 days in a month

Why assume that which can be calculated?

I just don't understand how you are understanding the data or what you are trying to do with it. Please explain it again.
 
Anders Alexandersson
Greenhorn
Posts: 3
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Brian Tkatch wrote:I do not understand the scenario.

there were 4 providers active for the entire month of February.

I see 5 in the data above, as all 5 started before February and ended after it.


You are correct - there are 5 providers in February

Brian Tkatch wrote:
Provider 006 was active between 01/01/2016 and 05/13/2016 and thus should be counted for the entire month of Jan - April and 13 days in May.

First, that is not shown in the data above. Second, there is no id "006" in the data.


CT18T006 is 006  - as per the Provider table that Provider is active 01/01 - 05-13

Brian Tkatch wrote:
January 4/(4+15/30)

Not sure what that means.


I was just thinking out aloud and laid out my approach - For January the Average Service Units factoring in the active Providers for the month of January would be 4 / 4.5 .  But if you have a better way of coming up with the desired result, I more than welcome it.

Brian Tkatch wrote:
assuming 30 days in a month


Why assume that which can be calculated?


I was assuming 30 days for the sake of simplicity.  If you think you could use actually factor in the actual number of days (31 in Jan, 28 in Feb etc), that would be terrific.
Brian Tkatch wrote:
I just don't understand how you are understanding the data or what you are trying to do with it. Please explain it again.


Essentially its a capacity utilization report and we need to calculate the units of service rendered per month per Provider for the entire reporting period (01/01 - 06/30) - the final output will be one number.  The first tables actually tells us that 1 unit of service (11812 is just an ID) was provided on 01/21 by CT18T005.  The second tables tells us periods for which the Providers were available.  This is important because there maybe providers (like CT18T001) who were active (and eligible) to provide service for all 6 months, but didn't do so.   There were 4 units overall in the month of January and 4.5 Providers were eligible (from the second table).
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
This ought to be easy to do, once we understand exactly what needs to be done. Maybe it's my mood, but it just hasn't clicked yet. I'm still confused by the month thing though. The first post says, "per provider per month" (which i assume is calendar month) and then, "for the entire month of Jan - April and 13 days in May" (which is a lot more than a calendar month).

Anyway, let's decide all the steps that need to be done in a simple list, and we'll work out the answer to each one. We can join all the steps together at the end. I really just want a list of steps. For example:

1) Find all providers active for each calendar month.
2) For each, calculate average x in y
3) ...

As for the date functions, each RDBMS does it differently, but basically, there's always a way to figure out the last day of the month (and hence how many days are in it) either directly or indirectly (indirectly being first day of the following month - 1 day) and then subtracting the two dates directly (as dates) or indirectly (by extracting the days, or day of year first). A google search ought to find plenty of examples when we're up to that step.
 
Anders Alexandersson
Greenhorn
Posts: 3
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Where I am stuck is the query to come up with the month-wise breakdown of number of active providers per month based on the Provider table (your step 1 in the message below).  Basically if the report is run from 01/01 - 06/30, then I need a query to return the number of active providers for every month between January and June (this maybe a fraction if the provider is deactivated in the middle of the month).

That output will look like

Month                     Active Providers
January                            4.48  (4 providers were active the whole month and one provider for  .48 (15/31) of the month)
February                           5
March                                5
April                                  5
May                                    4.42
Jun                                     4

Thanks.



Brian Tkatch wrote:This ought to be easy to do, once we understand exactly what needs to be done. Maybe it's my mood, but it just hasn't clicked yet. I'm still confused by the month thing though. The first post says, "per provider per month" (which i assume is calendar month) and then, "for the entire month of Jan - April and 13 days in May" (which is a lot more than a calendar month).

Anyway, let's decide all the steps that need to be done in a simple list, and we'll work out the answer to each one. We can join all the steps together at the end. I really just want a list of steps. For example:

1) Find all providers active for each calendar month.
2) For each, calculate average x in y
3) ...

As for the date functions, each RDBMS does it differently, but basically, there's always a way to figure out the last day of the month (and hence how many days are in it) either directly or indirectly (indirectly being first day of the following month - 1 day) and then subtracting the two dates directly (as dates) or indirectly (by extracting the days, or day of year first). A google search ought to find plenty of examples when we're up to that step.
 
Brian Tkatch
Bartender
Posts: 567
25
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I'm trying to do this step by step. Reading this last post, the first step, perhaps, is to have a calendar, that is, a set of months. (I'm no expert with MySQL, and the lack of CTEs is really restricting.) So for starters, i tried getting the first month and last month of providers:
To get the amount of days in a month, you can EXTRACT(Day FROM) it's LAST_DAY():
The problem is, we probably need a calendar table to list all the months between the two dates. Because, if a provided month is included between the dates but never explicitly listed (such as it itself being a start or end date) there's no way to pull it up in a query. A recursive CTE would shine right here, but, it looks like it isn't supported in mySQL. (If it is, please link to the documentation.) So, the next best this is a calendar table. That is, a table that lists all dates. (For convenience, maybe even have a separate table with columns for year, month, and days in month?) With that available, calculating amount of months or days would be much easier. It's just some fancy joins with GROUP BY/HAVING. Googling calendar table mysql brings up plenty of examples.

It is possible the query would be easier to understand by building a set of (temporary) table in between. Subqueries are not MySQL's highpoint.

Once the calendar table is built, we'll start to GROUP BY the query on month (to get the results you are showing in the previous post) using COUNT() to get active providers. That will require some math to decide on the partials, but it's just based on the start month and end month (separately, because only those two months can be partial), where we EXTRACT(Day FROM) them, and match them up against the 1st of the month, and the LAST_DAY() of the month, respectively. (If a provider can start after the first day of a month and end before the last day of that very month, it may complicate it a little.)

If this makes sense to you, please start by creating a calendar table and showing the DDL (so i can run it in SQL Fiddle) and start writing a GROUP BY query, if you can, to list the individual months. I'll be happy to jump in and help. BTW, to show a month's name (as opposed to its number) use the MONTHNAME() function.
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic