Brian Tkatch

Bartender
+ Follow
since Apr 06, 2015
Merit badge: grant badges
For More
Cows and Likes
Cows
Total received
In last 30 days
0
Forums and Threads

Recent posts by Brian Tkatch

Ricardo Coto wrote:Brian:

Adjustment in Funding removing the order by gave one ms of optimization
Adjustment in paid removing last_payment gave me a little bit of time (less tan a ms)
Can't remove the ids you mentioned because they are use to link the joins...

Thanks, i'll go with the next tips...



I would change the outer joins to NOT EXISTS, wherever possible. That's about all i can help without knowing the data model and exactly what is being done.
(A bunch of comments just got wiped out by a mis-click. )

Anyway, here is what i ended up with,  but i think the remaining outer joins should be changed to EXISTS() or NOT EXISTS(). I'm assuming i mad some mistakes and it'll need some help to work, but it should be much clearer as to what is being done, especially with the removal of redundant predicates, and the changing from outer join to NOT EXISTS().

Is this slow too?

Continuining.

Z
- Alias is not very descriptive.
- Many columns are not fully qualified, even though at least two tables are in scope. I assume they are all from Customer_Account_Transaction.
- Inside the OR, AND Account_Payment_Batch_Id IS NULL is probably redundant, since this is an OR from Account_Payment_Batch_Id IS NOT NULL.
  Unless the other clauses are exclusive to when Account_Payment_Batch_Id IS NULL should fail if it IS NOT NULL.
- Customer_Account is joined, but not used. Assuming Transaction.Account_Id is FKed and NOT NULL, Customer_Account is redundant. A benefit of removing it is there will be only one version of the table in scope.
- Z is only used to check that MinDate IS NULL. That should probably be an EXISTS(), not a table in the FROM clause.
- On second thought, the query is using MIN() to see in there is a NOT NULL result, because MIN() (like many other aggregate functions) will only return NULL if there are no NOT NULL values in the group.
  This is then checked in the WHERE clause to make sure it is NULL. This is not only a backwards approach, it pretty much negates the use of an index as all values must be checked. And, MIN() is not intuitive.
  Instead, use a direct approach, that is, NOT EXISTS() checking for a NOT NULL value. This will allow the database to stop looking when the first NOT NULL value is found.
  The question is, does the query still need the GROUP BY for the HAVING on Effective_Dt. If not, the GROUP BY can be removed.
- On third thought , the HAVING clause uses BETWEEN on Effective_Dt, which negates the possibility of a NULL. That very column is then checked for NULL in the WHERE clause. This is illogical, and requires an outer join.
  That is, the only way this can be, is if the query found no results, and the outer join therefore supplied NULLs for all the expected columns.
  Meaning, the query is using an outer join for the sole purpose of making sure no records exist that satisfy the WHERE clause inside the sub-query. This should most certainly be a NOT EXISTS.
  I keep wondering if the GROUP BY is redundant. Technically it is not, MIN() restricts the earliest date for the entire set of transaction to be within the last 2 weeks.

Anyway, assuming i kept it all straight, Z could be changed into:

Some notes as i am going through it.

Fundings
- ORDER BY is redundant. First, because results are not expected from it. Second, because GROUP BY implicitly does an ordering anyway (At least in Oracle it does.)
- One of the columns was not qualified with its table name. Not really an issue, but it should be used for clarity.
- The CASE statement is using the complex version, where the simple version should be used. (Documentation)
- This might just be stylistic, but the CASE statement includes Amount in the CASE. However, since Amount is always returned, it should be outside the CASE statement.
- Customer_Application.Id is SELECTed but never used.

Paid and Paid2
- This might just be stylistic, but  the boolean columns are being checked for = 'false'. Instead, just say NOT column-name
- SUM(COALESCE(column, 0)) is redundant, as SUM ignores NULLs. If you are afraid that all the values will be null, use COALESCE outside of SUM(), so it is only done once.
- Account.Id, Opening_Balance, and Last_Payment, are SELECTed but never used.
- Being Paid and Paid2 are only different based on Transaction.Type, you should combine the two and just pull two columns instead, using a CASE statement to implement the condition in the WHERE clause.
 It'd probably be easier to make the entire query a CTE, and then SELECT it with a CASE statement. I have tried doing so in what i have below. Creating a new CTE Pre_Paid to handle the logic.

That's it for now. Here is what i have so far:



Ricardo Coto wrote:if you notice, below there are some filters in the where clause with the column extracted from the inner queries in the left joins...



Oh, now i see them. Might be better as EXISTS() then. Seems kind of wasteful to outer join them, and then just check a column or two.

I hope to look at the query a bit further tonight. I've gotten a bit busy and haven't had the mindset and time it takes to go through a query like this.

Dave Tolls wrote:This isn't the query.
The error is coming from the prepareStatement call.
(Had a whole reply based on it being from the execution!)



Good catch, Dave! I need to read those traces more carefully.
And vacuum didn't reduce the size? That sounds odd.

Perhaps you can ask on the SQLite support email list.
7 years ago
Have you looked at SQLite's VACUUM command?
7 years ago
What are the query values, and have you tried running the query directly? With a type mismatch, both the input and output should be looked at.
I have not looked at the left joins in detail (yet). I do have a question though. If none of the columns in the SELECT clause are from the left joined tables, what is the purpose of joining them?

Maybe i am missing something. Here is my thought on the matter: A join is done to bring a table into scope or to act as an EXISTS(). An outer join negates the second option, leaving only the idea of bringing a table into scope. However, none of those tables are used for their columns. And, the two columns that are used for there columns, are normal joins, and are joined to each other directly (without an outer join between them). If so, what is the purpose of the outer joins?
A few more comments.

1) The main query does a GROUP BY, which is in lieu of DISTINCT. That is ultimately stylistic, but i think DISTINCT is much more clear.

2) Merchant is included and joined to Customer_Application, yet nothing else is done with it. Assuming Customer_Application.Merchant_Id is FKed to Merchant and NOT NULL, that join is redundant and should be removed.

3) The left joins are confusing because they are sub-queries that are then joined. To make them easier to understand, i suggest they all be made CTEs. Though, i want to see them a little more closely to see if they are actually required at all.

Anyway, i'm off to bed. I may just trudge through the rest tomorrow.

FWIW, this is my work-in-progress so far:
Why didn't i think of that? I feel kind of silly now. I had forgotten to remove my filter, thus i never saw when it said: java.lang.SecurityException: Permission denied (missing INTERNET permission?)

I did have the following line in AndroidManifest.xml: <uses-permission android:name="INTERNET"/>
But, a simple search found that should have been: <uses-permission android:name="android.permission.INTERNET"/>

It also says i need i also needed: <uses-permission android:name="android.permission.ACCESS_NETWORK_STATE"/>, but we'll wait and see on that one. Right now i have a different bug to work out.

Thank you!

7 years ago
Wow, that is one rockin' query.

I want to take a look at it later, unless Dave beats me to it. However, so far, i have looked at the CTEs and have some comments. First, here they are formatted to my preferences. First thing to note is Sent_History uses SELECT *. I wouldn't use * even if all the columns were required. Instead, list what you need. Specifically, that's Application_Id, Return_Reason, and Sent_Dt. I'll have to look again, but those IS NULLs in the main query might be better off in the CTE.

Second, Account_Attributes does a GROUP BY and COUNT(*) but is never used. Instead, GROUP BY is just being used in lieu of DISTINCT. That's okay, but i think DISTINCT is much clearer. In any case, the COUNT(*) is unnecessary.

Third, Actions is only being used for Customer_Account_Id, making the RANK() and CAST() useless, and the other columns being retrieved, just plain confusing. If the optimizer doesn't remove that analytic function, that's some wasted processing time right there.

That's it for now. Go to run to dinner in a few.
I am still fumbling through Android, today, in trying to create an AsyncTask to connect to a website, post some data, and get a response. My understanding is that i must use an AsyncTask here, as the main thread is not allowed to do this sort of work. And, iiuc, AsyncTask threads are not available for debugging with the Studio debugger.

The log shows "1" then "exception," as it fails on connection.getOutputStream(). I have verified it is that specific method by splitting up the statement, and setting connection.getOutputStream() to an interim variable in it's own line of code.

How to determine what the error is?
7 years ago
The optimizer is what chooses what to do, and that is based on a lot of things, including, table size, indexes, and columns retrieved. It is virtually impossible to do anything other than guess, when shown a simple version of the query.

Then again, that may just be my own limitation. Perhaps one of the other people here can help you.