• Post Reply Bookmark Topic Watch Topic
  • New Topic

Group by Categories  RSS feed

 
Ricardo Coto
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I need to create a difficult query, haven't find the way to do it...

Problem:
There's a huge table I need to query in, but this table is too large, can't afford traverse through it several times, so I have to extract two different things from it based on a category, one way and not efficient way to do it is creating two queries over the same table, extracting what i need in each of them, then process what i need, but it's not the way...

If we have this table below in img1.png you'll see a table with id, category, and a value, I need to group by two categories, let's say one category is grouped by items with type 'a' and 'b' and the other category is with the rest of types, i need a result like you see in img2.png

Any help/suggestion would be appreciated...
 
Ricardo Coto
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Don't know why i couldn't attach the images with the tables, input table would be as follows

IdTypeVal
1a2
2b5
3b3
1b4
4x5
6b5
2c3
2d2
1x2
4b3
6c2
4d3


Categories would be
Group by Id adding Vals with type a, b (alpha)
Group by Id addinv Vals with types not in (a, b) i mean the rest of types (beta)

Result would be:

IdSumCategory
16alpha
12beta
25alpha
25beta
33alpha
43alpha
48beta
65alpha
62beta


Please help, thanks...


 
Paul Clapham
Sheriff
Posts: 22841
43
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You seem to have already decided not to do two queries -- why? It's a bad idea to decide some process is "not efficient" when you don't have any other process to compare it to. Personally I would do the two queries and combine them with "UNION" to produce the result you're looking for.
 
Ricardo Coto
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
It's currently made it with two queries, one query is performed and joined it to a table, then second query is made it and joined it to the same table, then it validates (filter) based on the two queries to discard some records.

it's like:

Table A
Table B

Query 1 as Q1 on table B join table A
Query 2 as Q2 on table B join table A
where Q1 filtering
and Q2 filtering

It's traversing the same table twice, and the query 1 is grouping by a category and query 2 is grouping by another category, i came to the idea of not doing the same thing twice as the cost of traversing the table twice is high and is just grouping, if i can group the two categories in the same table i would traverse the table only once...
 
Paul Clapham
Sheriff
Posts: 22841
43
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I think it would be more straightforward if you had the two queries do the necessary filtering, rather than doing that as a separate step. Also when you say

one query is performed and joined it to a table


does that mean that the result of the query is appended to a database table? That sounds like a lot of extra work for the database to do. Why not something like

Select xxx from B, A where some-conditions group by category
union
Select xxx from B, A where some-other-conditions group by other-category

?
 
Ricardo Coto
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Yes I meant exactly that, when query 1 ran, it's attached to table A, then the other query ran, and it's attached to table A, and after that, the filtering is done it...

So I'll try that, I'll do the two queries, use a Union, and after that, attach it to table A, although i'll had two scans, do the filtering before join to table A I could find some optimization there, specially because table A is huge, Table B which is used for query 1 and query 2 is big too, but instead of making two joins over two huge tables would improve making the table B filtered before joining to table A, I'll tell you how it goes...
 
Ricardo Coto
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
a union isn't the path to take, it scan the table twice, plus the spend the engine takes to merge the results, and it adds the same amount of records to the main table, not winning anything with union...
 
Brian Tkatch
Bartender
Posts: 598
26
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Paul Clapham wrote:I would do the two queries and combine them with "UNION" to produce the result you're looking for.

Repeat after me: UNION ALL, UNION ALL, UNION ALL. Never use UNION unless you really need to (and even then...) This is a case where SQL is backwards. UNION should really be UNION DISTINCT, and UNION ALL, just plain UNION. But the guy who "invented" UNION happened to need a DISTINCT, and, rather unfortunately, it stuck. Here's is where SQL must hang it's head in shame for being stupid. One keyword doing two things (append, distinct) is just plain bad programming. Shame on you SQL. So, remember the mantra and repeat after me: UNION ALL, UNION ALL, UNION ALL. :)

Ricardo Coto wrote:union isn't the path to take, it scan the table twice

That is not necessarily true. And, even if it did read it twice, the second time it would likely be cached in memory, and therefore add (relatively) very little processing time.

It is true that it is usually not the path to take. But not because of it maybe scanning twice. Instead, because it is usually not telling the RDBMS what it is you are trying to do, and thus the RBDMS is forced into (rewriting) an inefficient query.

Anyway, let's take a look at that data:
So, you want take Type and modify it before GROUP BY gets ahold of it. No problem, that is what CASE is for. :)
There are prettier versions, perhaps, but this is the base query.

I am not sure if it works, but i typed it in on SQL Fiddle.
 
Paul Clapham
Sheriff
Posts: 22841
43
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Ricardo Coto wrote:a union isn't the path to take


Yes, I realized that last night and now I'm back. UNION isn't the way to go but not for the reasons you suggest -- UNION doesn't add any records to anything so your problem with records being added to the main table must be a different problem.

Brian's suggestion of "CASE" is a much better idea and that's the way to go. (And I apologize for being sloppy about UNION -- I haven't used it much so I was being vague about it.)
 
Ricardo Coto
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Nice, it works, time isn't the best, how can i discard records not in the categories i defined, what about if i need to define two categories, and the items not present in those categories shouldn't be selected... this is the query, take a look...

SELECT   acc.id,
         Sum(COALESCE( t.credit_amount, 0 )) -
         Sum(COALESCE( t.debit_amount, 0 )) as payedAmount,
         CASE WHEN t.type IN ('REGULAR','RETURNED','REFUND') THEN 'alpha' 
         WHEN t.type NOT IN ('MINIMUM', 'ACH', 'CARD') THEN 'beta'
         THEN NULL
         END Category
FROM     account_transact t
JOIN     account acc
ON       t.account_id = acc.id
WHERE    acc.status IN ( 'A','CO' )
AND      t.voided_ts IS NULL
AND      t.recovery = false
AND      t.surrender = false
AND      t.effective_dt <= CURRENT_DATE
GROUP BY acc.id,
CASE
WHEN t.type IN ('REGULAR','RETURNED','REFUND') THEN 'alpha'
WHEN t.type NOT IN ('MINIMUM', 'ACH', 'CARD') THEN 'beta'
THEN NULL
END
 
Brian Tkatch
Bartender
Posts: 598
26
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Ah, SQL Server. It helps to know the RDBMS.

Why COALESCE? SUM ignores NULLs. If you are afraid that all the values might be NULL, resulting in a NULL even after the summation, put it as COALESCE(SUM()), so COALESCE is only run once.

Let's use a CTE to make it more readable. By throwing a WITH on top, the CASE only need be written once, and there is usually no added cost with a CTE like this.

And now format and rewrite it into something i can understand... The CASE statement should be fine. If you want to time it, just summarize the CTE. That is, something like this, timing it both with and without the CASE statement. (Don't time the first run, because it might not be cached yet, giving the second run an unfair advantage.)If this is still slow, we can look a bit deeper, but we'll need to have a bit more information too, such as the plan being used, amount of records, and so on.

I have not tested the above queries, due to lack of data and the need to go to sleep. *yawn* It's past my bed time.
 
Ricardo Coto
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I'm checking the query, but it's not sql server, is postgres
 
Ricardo Coto
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
It's taking a lot of time, if i run two queries, they take 38 secs one of them, and 48 secs the other one...

Query you put before, get me empty categories and takes a lot of time

Account has
436.087 records

Account_Transact_Category has
21.163.838 records

 
Ricardo Coto
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
A piece of the results:

369927851 |     -749.44 | beta
369927851 |       86.49 |
369927915 |     -896.34 | beta
369927915 |       86.20 |
369928298 |    -2805.12 | beta
369928298 |      215.78 |
369928503 |      100.00 | alpha
369928503 |     -654.82 | beta
369928503 |      184.59 |
369928570 |    -1789.83 | beta
369928570 |      149.16 |
369928632 |    -1395.96 | beta
369928632 |      161.10 |

(614203 rows)

Time: 51006.365 ms

Actually the time it took is pretty good, it took a lot of time presenting me de results because i'm using the shell of psql, but if you notice it said the time to return was 51 seconds, now the thing that i don't understand is the empty category...
 
Brian Tkatch
Bartender
Posts: 598
26
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Ricardo Coto wrote:but it's not sql server, is postgres

I guess tired me got mixed up with COALESCE. Thank you for the correction.

How many records does the query return? I ask this because printing out many screens-full of results to the terminal takes time. Which means, there is the execution time of the query, and then the shell has to fetch records, print them to the screen, and when it needs more, fetch some more. This can be tested by shrinking the window so you don't watch the results being printed. This goes much faster (when this is what is slowing it down.

Oh, you already said that yourself...
Ricardo Coto wrote:Actually the time it took is pretty good, it took a lot of time presenting me de results because i'm using the shell of psql

I'm still curious what the actual timing is.

Anyway, a quick search found this informative reply referring to EXPLAIN ANALYZE and \timing. Of course, there is a really cheap way to do the same thing, that is, by wrapping the entire query in another query with only one record returned:
Ricardo Coto wrote:Query you put before, get me empty categories and takes a lot of time

The empty categories are groups where Type is 'MINIMUM', 'ACH', 'CARD', or is NULL.  Is that not accurate? (BTW, side point, ELSE NULL is redundant, and can be removed. I should have noticed that last night.)
 
Ricardo Coto
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Query works, i've been studying deeply the query i'm trying to optimize, and the issue is beyond that query, your help would be appreciated as i'm running out of ideas, i'm a developer, so my expertise isn't querying optimizing, so this is the issue from the big picture of it...

I have a table in which there is some joins, and after that left joins, but his left joins have inner queries, this left joins have alias for that inner query, after that in the root evaluation of the query there are references to the alias of that inner joins accesing some internal attribute, that's my main issue because progress creates nested loops, so to optimize this query i really really need a way to remove this nested loops...

Select id
from table a
inner join table b
inner join table c
left join(
            select id,amount
            from table p
            where bla,bla,bla
) aliasB
on b.id = aliasB.id
left join(
            select id,amount
            from table c
            where bla bla bla
) aliasC
on c.id = aliasC.id

where aliasB.amount > aliasC.amount
 
Brian Tkatch
Bartender
Posts: 598
26
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I really can't stand the ANSI join syntax. I find the classic syntax so much easier to understand. Anyway, that is an incomplete query, so i can't understand what you are trying to do.

Fwiw, one quick way to change an outer query is to do it the old fashioned way. That is, use a correlated sub-query in the SELECT clause. It's not always an option, but many times it is faster, so it might be worth a shot.
 
Ricardo Coto
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I wrote the query as i did, because the main issue is the left joins with inner queries, aliasing and then using them externally in the where accessing inside attribures of it...

please notice the where clause, how it's accesing aliasB.amount and comparing it to aliasC.amount, it unleashed a nasty nested loop that i need to remove..

that's how the query is made, i'm not the one who wrote it, i'm optimizing it...
 
Brian Tkatch
Bartender
Posts: 598
26
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I understand why you did it, and i appreciate your attempt to help. However, it does not have to be just the where clause causing the nested loops. (And, nested loops are not always bad either.)

If you want to help, you can whittle down the query to it's smallest working form, but still exhibits the issue. Otherwise, it's probably best if you post the whole query. It's much easier to see what is going on that way.
 
Ricardo Coto
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
oh yes sorry, i'm telling this, because i explain analyze the query, and i'm 100% sure, that the most inefficient part of the query, is because that nested loop...

I've been making tests, to see when a nested loop is generated (using postgres off course) and the only way i've been able to generate one, is how i specified in the query i wrote above, and as it is is how the query i'm trying to optimize is performing it, so, i need to remove how it left joins those tables (is not the left join creating the nested loop) aliasing them and after that, comparing their inside properties outside...
 
Brian Tkatch
Bartender
Posts: 598
26
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The optimizer is what chooses what to do, and that is based on a lot of things, including, table size, indexes, and columns retrieved. It is virtually impossible to do anything other than guess, when shown a simple version of the query.

Then again, that may just be my own limitation. Perhaps one of the other people here can help you.
 
Knute Snortum
Sheriff
Posts: 4288
127
Chrome Eclipse IDE Java Postgres Database VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
If you have pgAdmin III, you can put the query in the SQL Editor and press Explain Query.  That can give you clues to what is inefficient.
 
Ricardo Coto
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Yes i actually did already, that's why i'm trying to remove the nested loops, thanks knute
 
Dave Tolls
Ranch Foreman
Posts: 3068
37
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
This might seem like an obvious question, but do you have a DBA where you are to sit with and go through this?
As Brian says, there's a whole stack of things that can effect a query plan, much of which is not something that'll be easy to deal with in a forum like this for more complex queries like yours.
Just look at some Ask Tom threads (that's an Oracle Q/A thing) and you'll see how much information is needed to analyse these things, and one of the things he'll often ask for is the actual query that shows the problem, including dummy tables with data.
 
Ricardo Coto
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You're totally right Dave, i think exactly like you, but, it is what it is, we have dba and i request one, but the request was denied, and it's expected for me to do the job, i'm a developer, and i know how to query and i have optimized before, but this issue is beyond what i can achieved in the time they want me to, so i have to do it someway...

I found an optimization of 40% that i'm about to apply, as I told you before there are several inner queries, and as you know this demands the engine to run this several times, so i thought to create this inner queries in materialized views, some of them delay more the query which is weird, i think it's because for the engine is better to handle it in memory, but there are one of them that is actually improving the response time of the query...

Below is the query, I'm creating a materialized view over 'z'

WITH actions AS
(
         SELECT   Rank() OVER(partition BY customer_account_id ORDER BY action_timestamp ASC),
                  ai.id,
                  ai.customer_account_id,
                  ai.action,
                  Cast (ai.action_timestamp AS DATE)
         FROM     action_item ai
         JOIN     customer_account cac
         ON       ai.customer_account_id = cac.id
         WHERE    cac.status IN ('ACTIVE',
                                 'CHARGE_OFF') ), account_attributes AS
(
         SELECT   customer_account_id,
                  Count(*)
         FROM     customer_account_attribute
         WHERE    NAME IN ('BANKRUPCY',
                           'DISPUTE',
                           'CREDIT_CARD',
                           'BANKRUPCY_PEND',
                           'DO_NOT_CONTACT',
                           'FRAUD',
                           'NO_CALLS',
                           'NO_EMAIL',
                           'SURRENDER',
                           'SETTLEMENT_PENDING',
                           'LEGAL',
                           'ATTORNEY_REP')
         GROUP BY customer_account_id), sent_history AS
(
       SELECT ccnh.*
       FROM   command_center_configuration ccc
       JOIN   document_template dt
       ON     ccc.document_template_id = dt.id
       AND    ccc.id= 231489734
       JOIN   command_center_notification_history ccnh
       ON     ccnh.document_template_id = dt.id)
SELECT    ap.id
FROM      customer_account a
JOIN      customer_agreement ag
ON        a.customer_agreement_id =ag.id
JOIN      customer_application ap
ON        ap.id = ag.customer_application_id
JOIN      customer_application_customer cac
ON        ap.id=cac.customer_application_id
JOIN      customer c
ON        c.id=cac.customer_id
JOIN      merchant m
ON        m.id=ap.merchant_id
LEFT JOIN sent_history ccnh
ON        ccnh.application_id = ap.id
LEFT JOIN actions ac
ON        ac.customer_account_id = a.id
AND       ac.rank = 1
LEFT JOIN account_attributes att
ON        a.id = att.customer_account_id
LEFT JOIN
          (
                     SELECT     cp_f.id ,
                                Sum(
                                CASE
                                           WHEN fr_f.type = 'FUNDING' THEN amount
                                           ELSE amount*(-1)
                                END) AS funded_amt
                     FROM       funding_record fr_f
                     INNER JOIN customer_agreement cg_f
                     ON         fr_f.customer_agreement_id = cg_f.id
                     INNER JOIN customer_application cp_f
                     ON         cp_f.id = cg_f.customer_application_id
                     AND        cp_f.status NOT IN ('WITHDRAWN',
                                                    'ERROR')
                     INNER JOIN customer_account cap_f
                     ON         cg_f.id = cap_f.customer_agreement_id
                     WHERE      funding_record_batch_id IS NOT NULL
                     AND        cap_f.status IN ('ACTIVE',
                                                 'CHARGE_OFF')
                     GROUP BY   cp_f.id
                     ORDER BY   cp_f.id ) fundings
ON        ap.id = fundings.id
LEFT JOIN
          (
                   SELECT   acc.id,
                            Sum( COALESCE( t.credit_amount, 0 ) ) -Sum( COALESCE( t.debit_amount, 0 ) ) AS amount_paid,
                            (
                                   SELECT debit_amount
                                   FROM   customer_account_transaction z
                                   WHERE  z.account_id = acc.id
                                   AND    z.type = 'OPENING_BALANCE'
                                   AND    z.is_recovery = false) AS opening_balance,
                            Max(effective_dt)                    AS last_payment
                   FROM     customer_account_transaction t
                   JOIN     customer_account acc
                   ON       t.account_id = acc.id
                   WHERE    acc.status IN ( 'ACTIVE',
                                           'CHARGE_OFF' )
                   AND      t.type NOT IN ('MINIMUM_PAYMENT',
                                           'ACH_PAYMENT',
                                           'CARD_PAYMENT')
                   AND      t.voided_ts IS NULL
                   AND      t.is_recovery = false
                   AND      t.is_surrender = false
                   AND      t.effective_dt <= CURRENT_DATE
                   GROUP BY acc.id ) paid
ON        a.id = paid.id
LEFT JOIN
          (
                   SELECT   acc.id,
                            Sum(COALESCE(t.credit_amount, 0)) - Sum(COALESCE(t.debit_amount, 0)) AS amount_paid
                   FROM     customer_account_transaction t
                   JOIN     customer_account acc
                   ON       t.account_id = acc.id
                   WHERE    acc.status IN ( 'ACTIVE',
                                           'CHARGE_OFF' )
                   AND      t.type IN ('REGULAR_PAYMENT',
                                       'RETURNED_PAYMENT',
                                       'PAYMENT_REFUND')
                   AND      t.voided_ts IS NULL
                   AND      t.is_recovery = false
                   AND      t.is_surrender = false
                   AND      t.effective_dt <= CURRENT_DATE
                   GROUP BY acc.id) paid2
ON        a.id = paid2.id
LEFT JOIN
          (
                     SELECT     account_id,
                                Min(effective_dt) AS mindate
                     FROM       customer_account_transaction
                     INNER JOIN customer_account cac
                     ON         cac.id = account_id
                     WHERE      type IN ( 'REGULAR_PAYMENT',
                                         'ACH_PAYMENT',
                                         'CARD_PAYMENT')
                     AND        cac.status IN ('ACTIVE',
                                               'CHARGE_OFF')
                     AND        ((
                                                      account_payment_batch_id IS NOT NULL)
                                OR         (
                                                      payment_method IN ('CREDIT_CARD',
                                                                         'CASH',
                                                                         'CHECK',
                                                                         'MONEY_GRAM',
                                                                         'MONEY_ORDER',
                                                                         'PAYPAL',
                                                                         'WESTERN_UNION')
                                           AND        account_payment_batch_id IS NULL))
                     GROUP BY   account_id
                     HAVING     Min(effective_dt) BETWEEN ( CURRENT_DATE - 15 ) AND        (
                                           CURRENT_DATE )) z
ON        a.id = z.account_id
WHERE     ccnh.return_reason IS NULL
AND       a.collections_category IN ('QUEUE_RETURNS',
                                     'QUEUE_RETURNS_FOLLOW_UP',
                                     'QUEUE_30_DAY',
                                     'QUEUE_60_DAY',
                                     'QUEUE_90_DAY')
AND       a.status IN ('ACTIVE')
AND       ccnh.sent_dt IS NULL
AND       (
                    action_timestamp < (CURRENT_DATE)
          OR        ( (
                                        action_timestamp >= (CURRENT_DATE) )
                    AND       ac.action NOT IN ('BANKRUPTCY',
                                                'CREDIT_CARD_PAYMENT',
                                                'DISPUTE_FOLLOW_UP',
                                                'DIALER',
                                                'FOLLOW_UP',
                                                'PENDING_RETRIEVAL',
                                                'PROMISE_TO_PAY',
                                                'PROMISE_TO_PAY_MAILING',
                                                'SETTLEMENT',
                                                'VERIFY_PAYMENT'))
          OR        ac.id IS NULL )
AND       att.customer_account_id IS NULL
AND       fundings.funded_amt > 0
AND
          CASE
                    WHEN paid2.amount_paid IS NULL THEN true
                    ELSE paid2.amount_paid / fundings.funded_amt < 1.90
          END
AND       (
                    paid.amount_paid * -1) > 100
AND       a.days_past_due >= 14
AND       z.mindate IS NULL
GROUP BY  ap.id





 
Brian Tkatch
Bartender
Posts: 598
26
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Wow, that is one rockin' query.

I want to take a look at it later, unless Dave beats me to it. However, so far, i have looked at the CTEs and have some comments. First, here they are formatted to my preferences. First thing to note is Sent_History uses SELECT *. I wouldn't use * even if all the columns were required. Instead, list what you need. Specifically, that's Application_Id, Return_Reason, and Sent_Dt. I'll have to look again, but those IS NULLs in the main query might be better off in the CTE.

Second, Account_Attributes does a GROUP BY and COUNT(*) but is never used. Instead, GROUP BY is just being used in lieu of DISTINCT. That's okay, but i think DISTINCT is much clearer. In any case, the COUNT(*) is unnecessary.

Third, Actions is only being used for Customer_Account_Id, making the RANK() and CAST() useless, and the other columns being retrieved, just plain confusing. If the optimizer doesn't remove that analytic function, that's some wasted processing time right there.

That's it for now. Go to run to dinner in a few.
 
Ricardo Coto
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I'll take a look thank you very much Brian...
 
Brian Tkatch
Bartender
Posts: 598
26
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
A few more comments.

1) The main query does a GROUP BY, which is in lieu of DISTINCT. That is ultimately stylistic, but i think DISTINCT is much more clear.

2) Merchant is included and joined to Customer_Application, yet nothing else is done with it. Assuming Customer_Application.Merchant_Id is FKed to Merchant and NOT NULL, that join is redundant and should be removed.

3) The left joins are confusing because they are sub-queries that are then joined. To make them easier to understand, i suggest they all be made CTEs. Though, i want to see them a little more closely to see if they are actually required at all.

Anyway, i'm off to bed. I may just trudge through the rest tomorrow.

FWIW, this is my work-in-progress so far:
 
Ricardo Coto
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I had to update the query, and now is taking more time, as there's one field that i needed :s

WITH actions AS
(
         SELECT   Rank() OVER(partition BY customer_account_id ORDER BY action_timestamp ASC),
                  ai.id,
                  ai.customer_account_id,
                  ai.action,
                  Cast (ai.action_timestamp AS DATE)
         FROM     action_item ai
         JOIN     customer_account cac
         ON       ai.customer_account_id = cac.id
         WHERE    cac.status IN ('ACTIVE',
                                 'CHARGE_OFF') ), account_attributes AS
(
         SELECT   customer_account_id,
                  Count(*)
         FROM     customer_account_attribute
         WHERE    NAME IN ('BANKRUPCY',
                           'DISPUTE',
                           'CREDIT_CARD',
                           'BANKRUPCY_PEND',
                           'DO_NOT_CONTACT',
                           'FRAUD',
                           'NO_CALLS',
                           'NO_EMAIL',
                           'SURRENDER',
                           'SETTLEMENT_PENDING',
                           'LEGAL',
                           'ATTORNEY_REP')
         GROUP BY customer_account_id), sent_history AS
(
       SELECT ccnh.*
       FROM   command_center_configuration ccc
       JOIN   document_template dt
       ON     ccc.document_template_id = dt.id
       AND    ccc.id= 231489734
       JOIN   command_center_notification_history ccnh
       ON     ccnh.document_template_id = dt.id)
SELECT    ap.id,
          a.collections_category
FROM      customer_account a
JOIN      customer_agreement ag
ON        a.customer_agreement_id =ag.id
JOIN      customer_application ap
ON        ap.id = ag.customer_application_id
JOIN      customer_application_customer cac
ON        ap.id=cac.customer_application_id
JOIN      customer c
ON        c.id=cac.customer_id
JOIN      merchant m
ON        m.id=ap.merchant_id
LEFT JOIN sent_history ccnh
ON        ccnh.application_id = ap.id
LEFT JOIN actions ac
ON        ac.customer_account_id = a.id
AND       ac.rank = 1
LEFT JOIN account_attributes att
ON        a.id = att.customer_account_id
LEFT JOIN
          (
                     SELECT     cp_f.id ,
                                Sum(
                                CASE
                                           WHEN fr_f.type = 'FUNDING' THEN amount
                                           ELSE amount*(-1)
                                END) AS funded_amt
                     FROM       funding_record fr_f
                     INNER JOIN customer_agreement cg_f
                     ON         fr_f.customer_agreement_id = cg_f.id
                     INNER JOIN customer_application cp_f
                     ON         cp_f.id = cg_f.customer_application_id
                     AND        cp_f.status NOT IN ('WITHDRAWN',
                                                    'ERROR')
                     INNER JOIN customer_account cap_f
                     ON         cg_f.id = cap_f.customer_agreement_id
                     WHERE      funding_record_batch_id IS NOT NULL
                     AND        cap_f.status IN ('ACTIVE',
                                                 'CHARGE_OFF')
                     GROUP BY   cp_f.id
                     ORDER BY   cp_f.id ) fundings
ON        ap.id = fundings.id
LEFT JOIN
          (
                   SELECT   acc.id,
                            Sum( COALESCE( t.credit_amount, 0 ) ) -Sum( COALESCE( t.debit_amount, 0 ) ) AS amount_paid,
                            Max(effective_dt)                                                           AS last_payment
                   FROM     snapdata.customer_account_transaction t
                   JOIN     snapdata.customer_account acc
                   ON       t.account_id = acc.id
                   WHERE    acc.status IN ( 'ACTIVE',
                                           'CHARGE_OFF' )
                   AND      t.type NOT IN ('MINIMUM_PAYMENT',
                                           'ACH_PAYMENT',
                                           'CARD_PAYMENT')
                   AND      t.voided_ts IS NULL
                   AND      t.is_recovery = false
                   AND      t.is_surrender = false
                   AND      t.is_surrender = false
                   AND      t.effective_dt <= CURRENT_DATE
                   GROUP BY acc.id ) paid
ON        a.id = paid.id
LEFT JOIN
          (
                   SELECT   acc.id,
                            Sum(COALESCE(t.credit_amount, 0)) - Sum(COALESCE(t.debit_amount, 0)) AS amount_paid
                   FROM     snapdata.customer_account_transaction t
                   JOIN     snapdata.customer_account acc
                   ON       t.account_id = acc.id
                   WHERE    acc.status IN ( 'ACTIVE',
                                           'CHARGE_OFF' )
                   AND      t.type IN ('REGULAR_PAYMENT',
                                       'RETURNED_PAYMENT',
                                       'PAYMENT_REFUND')
                   AND      t.voided_ts IS NULL
                   AND      t.is_recovery = false
                   AND      t.is_surrender = false
                   AND      t.is_surrender = false
                   AND      t.effective_dt <= CURRENT_DATE
                   GROUP BY acc.id) paid2
ON        a.id = paid2.id
LEFT JOIN
          (
                     SELECT     account_id,
                                Min(effective_dt) AS mindate
                     FROM       customer_account_transaction
                     INNER JOIN customer_account cac
                     ON         cac.id = account_id
                     WHERE      type IN ( 'REGULAR_PAYMENT',
                                         'ACH_PAYMENT',
                                         'CARD_PAYMENT')
                     AND        cac.status IN ('ACTIVE',
                                               'CHARGE_OFF')
                     AND        ((
                                                      account_payment_batch_id IS NOT NULL)
                                OR         (
                                                      payment_method IN ('CREDIT_CARD',
                                                                         'CASH',
                                                                         'CHECK',
                                                                         'MONEY_GRAM',
                                                                         'MONEY_ORDER',
                                                                         'PAYPAL',
                                                                         'WESTERN_UNION')
                                           AND        account_payment_batch_id IS NULL))
                     GROUP BY   account_id
                     HAVING     Min(effective_dt) BETWEEN ( CURRENT_DATE - 15 ) AND        (
                                           CURRENT_DATE )) z
ON        a.id = z.account_id
WHERE     ccnh.return_reason IS NULL
AND       a.collections_category IN ('QUEUE_RETURNS',
                                     'QUEUE_RETURNS_FOLLOW_UP',
                                     'QUEUE_30_DAY',
                                     'QUEUE_60_DAY',
                                     'QUEUE_90_DAY')
AND       a.status IN ('ACTIVE')
AND       ccnh.sent_dt IS NULL
AND       (
                    action_timestamp < (CURRENT_DATE)
          OR        ( (
                                        action_timestamp >= (CURRENT_DATE) )
                    AND       ac.action NOT IN ('BANKRUPTCY',
                                                'CREDIT_CARD_PAYMENT',
                                                'DISPUTE_FOLLOW_UP',
                                                'DIALER',
                                                'FOLLOW_UP',
                                                'PENDING_RETRIEVAL',
                                                'PROMISE_TO_PAY',
                                                'PROMISE_TO_PAY_MAILING',
                                                'SETTLEMENT',
                                                'VERIFY_PAYMENT'))
          OR        ac.id IS NULL )
AND       att.customer_account_id IS NULL
AND       fundings.funded_amt > 0
AND
          CASE
                    WHEN paid2.amount_paid IS NULL THEN true
                    ELSE paid2.amount_paid / fundings.funded_amt < 1.90
          END
AND       (
                    paid.amount_paid * -1) > 100
AND       a.days_past_due >= 14
AND       z.mindate IS NULL
GROUP BY  ap.id,
          a.collections_category
 
Ricardo Coto
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I just don't get it, in the first query i put, i created a materialized view for 'z' and it improves time, now that i need to grop by ap.id nad a.collections_category, the query running with the materialized view takes more time than without it, I'm loosing my patience...
 
Brian Tkatch
Bartender
Posts: 598
26
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I have not looked at the left joins in detail (yet). I do have a question though. If none of the columns in the SELECT clause are from the left joined tables, what is the purpose of joining them?

Maybe i am missing something. Here is my thought on the matter: A join is done to bring a table into scope or to act as an EXISTS(). An outer join negates the second option, leaving only the idea of bringing a table into scope. However, none of those tables are used for their columns. And, the two columns that are used for there columns, are normal joins, and are joined to each other directly (without an outer join between them). If so, what is the purpose of the outer joins?
 
Ricardo Coto
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
if you notice, below there are some filters in the where clause with the column extracted from the inner queries in the left joins...
 
Brian Tkatch
Bartender
Posts: 598
26
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Ricardo Coto wrote:if you notice, below there are some filters in the where clause with the column extracted from the inner queries in the left joins...


Oh, now i see them. Might be better as EXISTS() then. Seems kind of wasteful to outer join them, and then just check a column or two.

I hope to look at the query a bit further tonight. I've gotten a bit busy and haven't had the mindset and time it takes to go through a query like this.
 
Brian Tkatch
Bartender
Posts: 598
26
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Some notes as i am going through it.

Fundings
- ORDER BY is redundant. First, because results are not expected from it. Second, because GROUP BY implicitly does an ordering anyway (At least in Oracle it does.)
- One of the columns was not qualified with its table name. Not really an issue, but it should be used for clarity.
- The CASE statement is using the complex version, where the simple version should be used. (Documentation)
- This might just be stylistic, but the CASE statement includes Amount in the CASE. However, since Amount is always returned, it should be outside the CASE statement.
- Customer_Application.Id is SELECTed but never used.

Paid and Paid2
- This might just be stylistic, but  the boolean columns are being checked for = 'false'. Instead, just say NOT column-name
- SUM(COALESCE(column, 0)) is redundant, as SUM ignores NULLs. If you are afraid that all the values will be null, use COALESCE outside of SUM(), so it is only done once.
- Account.Id, Opening_Balance, and Last_Payment, are SELECTed but never used.
- Being Paid and Paid2 are only different based on Transaction.Type, you should combine the two and just pull two columns instead, using a CASE statement to implement the condition in the WHERE clause.
  It'd probably be easier to make the entire query a CTE, and then SELECT it with a CASE statement. I have tried doing so in what i have below. Creating a new CTE Pre_Paid to handle the logic.

That's it for now. Here is what i have so far:



 
Brian Tkatch
Bartender
Posts: 598
26
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Continuining.

Z
- Alias is not very descriptive.
- Many columns are not fully qualified, even though at least two tables are in scope. I assume they are all from Customer_Account_Transaction.
- Inside the OR, AND Account_Payment_Batch_Id IS NULL is probably redundant, since this is an OR from Account_Payment_Batch_Id IS NOT NULL.
   Unless the other clauses are exclusive to when Account_Payment_Batch_Id IS NULL should fail if it IS NOT NULL.
- Customer_Account is joined, but not used. Assuming Transaction.Account_Id is FKed and NOT NULL, Customer_Account is redundant. A benefit of removing it is there will be only one version of the table in scope.
- Z is only used to check that MinDate IS NULL. That should probably be an EXISTS(), not a table in the FROM clause.
- On second thought, the query is using MIN() to see in there is a NOT NULL result, because MIN() (like many other aggregate functions) will only return NULL if there are no NOT NULL values in the group.
   This is then checked in the WHERE clause to make sure it is NULL. This is not only a backwards approach, it pretty much negates the use of an index as all values must be checked. And, MIN() is not intuitive.
   Instead, use a direct approach, that is, NOT EXISTS() checking for a NOT NULL value. This will allow the database to stop looking when the first NOT NULL value is found.
   The question is, does the query still need the GROUP BY for the HAVING on Effective_Dt. If not, the GROUP BY can be removed.
- On third thought , the HAVING clause uses BETWEEN on Effective_Dt, which negates the possibility of a NULL. That very column is then checked for NULL in the WHERE clause. This is illogical, and requires an outer join.
   That is, the only way this can be, is if the query found no results, and the outer join therefore supplied NULLs for all the expected columns.
   Meaning, the query is using an outer join for the sole purpose of making sure no records exist that satisfy the WHERE clause inside the sub-query. This should most certainly be a NOT EXISTS.
   I keep wondering if the GROUP BY is redundant. Technically it is not, MIN() restricts the earliest date for the entire set of transaction to be within the last 2 weeks.

Anyway, assuming i kept it all straight, Z could be changed into:

 
Brian Tkatch
Bartender
Posts: 598
26
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
(A bunch of comments just got wiped out by a mis-click. )

Anyway, here is what i ended up with,  but i think the remaining outer joins should be changed to EXISTS() or NOT EXISTS(). I'm assuming i mad some mistakes and it'll need some help to work, but it should be much clearer as to what is being done, especially with the removal of redundant predicates, and the changing from outer join to NOT EXISTS().

Is this slow too?

 
Ricardo Coto
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
that last query has a bunch of errors, i'm trying to correct them in order to see the numbers... i'm analyzing all comments though it will take some time, right now, the only optimization i made, from some miliseconds, was to materialized z...
 
Ricardo Coto
Ranch Hand
Posts: 31
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Brian:

Adjustment in Funding removing the order by gave one ms of optimization
Adjustment in paid removing last_payment gave me a little bit of time (less tan a ms)
Can't remove the ids you mentioned because they are use to link the joins...

Thanks, i'll go with the next tips...
 
Brian Tkatch
Bartender
Posts: 598
26
Linux Notepad Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Ricardo Coto wrote:Brian:

Adjustment in Funding removing the order by gave one ms of optimization
Adjustment in paid removing last_payment gave me a little bit of time (less tan a ms)
Can't remove the ids you mentioned because they are use to link the joins...

Thanks, i'll go with the next tips...


I would change the outer joins to NOT EXISTS, wherever possible. That's about all i can help without knowing the data model and exactly what is being done.
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!