Monica Shiralkar

Ranch Hand
since Jul 07, 2012

Recent posts by Monica Shiralkar

Once I finish some work, before getting it reviewed by the team lead, I take a look to check that it is correct and does not have mistakes. This takes me a long time. Also, I sometimes miss things in this self-review and thus depend on the team lead to find the mistakes I could not see myself.

In what ways can I get better at quickly self-reviewing my work (after doing it) before it is reviewed by the team lead?

8 hours ago
Yes, that was wrong. format is a function.
The reason I had thought it would give an error is that its purpose is to format, but I was instead using it for conversion, so I expected it to give an error.
4 days ago
Number of output files in pyspark output varies.

I have sometimes seen 1 output file, sometimes 2, sometimes 10, and so on.

I know it depends on the number of workers, but how does it keep varying?
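For context: Spark writes one part-file per partition, so the file count tracks the DataFrame's partition count at write time, which varies with input size, shuffles, and configuration defaults. A plain-Python sketch of that one-file-per-partition idea (this is an analogy, not Spark itself; the partition counts are hypothetical):

```python
import os
import tempfile

def write_like_spark(records, num_partitions, out_dir):
    # Mimic Spark's behavior of writing one part-file per partition.
    for i in range(num_partitions):
        chunk = records[i::num_partitions]  # round-robin split into "partitions"
        with open(os.path.join(out_dir, f"part-{i:05d}"), "w") as f:
            f.write("\n".join(chunk))

records = [f"row{n}" for n in range(10)]
for n_parts in (1, 2, 5):  # same data, different partition counts
    with tempfile.TemporaryDirectory() as d:
        write_like_spark(records, n_parts, d)
        print(n_parts, len(os.listdir(d)))  # file count equals partition count
```

In actual PySpark, calling repartition(n) or coalesce(n) on the DataFrame before writing is the usual way to control the number of output files.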
5 days ago
Since we have to use format like the one below, which does not use the format keyword:

But then there is the format keyword, which can be used for data formatting.

If I do

There is no data formatting that I am doing, so I thought it would give an error.
5 days ago

Yes. What I did not understand is why my second statement (which, as you rightly said, I am using incorrectly) is doing a string conversion.
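Since the original statements did not survive the copy, here is a minimal sketch of what is likely happening, assuming the second statement called Python's built-in format function with no format spec:

```python
# Built-in format(value, spec) formats value according to spec.
# When spec is omitted it defaults to '', and for most types
# format(value, '') simply returns str(value) -- a string conversion.
print(format(42))              # '42'   -- behaves like str(42)
print(format(3.14159, '.2f'))  # '3.14' -- actual formatting with a spec
print(type(format(42)))        # <class 'str'>
```

So using format as a converter does not raise an error, because format(x) with no spec falls back to x.__format__(''), which for built-in types is equivalent to str(x).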
5 days ago
What is the difference between using the statements below:



6 days ago
The application reads input data from Source1 and Source2 and, in the result (JSON format), populates a field success as True when both files are present, and success as False when at least one file is not present.

It has methods such as readFromSource1, readFromSource2, processData, and writeOutput. When writing unit tests, I can implement methods like test_readFromSource1, test_readFromSource2, test_processData, and test_writeOutput.

These can be used to test the individual methods that make up the functionality, but how do I test the overall flow, as in the scenarios below:

Scenario 1: Put sample files in both sources.
Expected output: the success field in the JSON output should be True.

Scenario 2: Put a file only in source1 and not in source2.
Expected output: the success field should be False.

I can test individual methods using test methods, but is there a way to test the entire flow of a particular scenario using a test case?
Or do I have to test this manually by running the program, checking the value of the success field, and then recording the results for each scenario in, say, a Word document?
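The two scenarios above can be automated as end-to-end tests: point the application at temporary source directories, run the whole flow, and assert on the success field in the output JSON. A minimal sketch, assuming a hypothetical run_pipeline function standing in for the real readFromSource1/readFromSource2/processData/writeOutput flow:

```python
import json
import os
import tempfile
import unittest

def run_pipeline(source1_dir, source2_dir, output_path):
    # Hypothetical stand-in for the real application's overall flow:
    # success is True only when both source directories contain a file.
    present = all(os.listdir(d) for d in (source1_dir, source2_dir))
    with open(output_path, "w") as f:
        json.dump({"success": bool(present)}, f)

class EndToEndScenarioTest(unittest.TestCase):
    def _run_scenario(self, put_in_source2):
        # Create isolated temp dirs, stage input files, run the full flow,
        # and return the success field from the JSON output.
        with tempfile.TemporaryDirectory() as s1, \
             tempfile.TemporaryDirectory() as s2, \
             tempfile.TemporaryDirectory() as out:
            with open(os.path.join(s1, "input1.txt"), "w") as f:
                f.write("data")
            if put_in_source2:
                with open(os.path.join(s2, "input2.txt"), "w") as f:
                    f.write("data")
            result_path = os.path.join(out, "result.json")
            run_pipeline(s1, s2, result_path)
            with open(result_path) as f:
                return json.load(f)["success"]

    def test_scenario1_both_files_present(self):
        self.assertTrue(self._run_scenario(put_in_source2=True))

    def test_scenario2_file_missing_in_source2(self):
        self.assertFalse(self._run_scenario(put_in_source2=False))
```

With tests like these, the per-scenario results come from the test runner's pass/fail report, so there is no need to record them manually in a document.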


1 week ago
I understand that the examples I came up with were not good.

The question I intended to ask was whether mixing camel case and snake case is an acceptable practice (in cases where one thinks it is easier to understand).

As said above, one shouldn't mix cases and should stick to snake case for Python functions.

2 weeks ago
input_data and processed_data are relatively more meaningful than data.

I found currentDay_inputData (a mix of snake case and camel case) easier to understand than current_day_input_data, but it is the latter, not the former, that follows the Python naming standards.
2 weeks ago

Tim Holloway wrote:but avoid mixed cases in c_like_names or underscores in javaLikeNames.


What I intended to convey by naming it merge_currentDayData_with_pastDayData is that currentDayData and pastDayData each belong together more closely than merge, with, and the rest do, but there may be better ways to show this.

For example, in the function name write_processed_data, processed and data belong together more closely than write and processed do.

Is there a better way of showing this instead of using mixed case like write_processedData?
2 weeks ago

Junilu Lacar wrote: If it's always today's and yesterday's data that are being merged, it might be better to leave that information out of the name and put in the documentation.

Yes, I agree.

Junilu Lacar wrote:
I don't know if the word "data" is really needed. What does the data represent? That would probably be the better word/phrase to use in the name instead of just the generic "data" idea.

But shouldn't that depend on the context of what kind of data it is, e.g. whether it is weather data or stock data?

Junilu Lacar wrote: For example, add_current_day_total_with_past_day_total might just be calculate_weekly_running_total.

But weekly would mean data for the entire week rather than just data for today and yesterday.

Junilu Lacar wrote: Think at a higher level of abstraction and less at the implementation level when trying to find good names.

I was thinking that the higher-level part would be clear from the context, so if I specify it in the name it may look unnecessary and long.

I understand that the name I gave, merge_currentDayData_with_pastDayData, can be improved, but what I want to know is whether mixing snake case and camel case is also an acceptable format.

2 weeks ago
In Python, function names should be in snake case.

If my function merges the current day's data with the past day's data, I can name it as

merge_current_day_data_with_past_day_data

Or should I do it like

merge_currentDayData_with_pastDayData

The first one follows snake case, but the second one is more readable.

Which one is the better way to name the functions ?

2 weeks ago
Suppose we have to populate the value of a new column myCol based on a value computed by some function, say myFunc.

Instead of calling myFunc as in the code below, which is not allowed:

We will have to create a UDF for myFunc and then call it. That will work.

What is the reason Spark doesn't allow us to call the function like this, but allows only UDFs in this place?

2 weeks ago