
Can someone please review this test case code, where the tests pass and fully cover the code?

 
Monica Shiralkar
Ranch Hand
Posts: 2954
Below is the code for the test case class; on running it, the 2 test cases pass:
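(A minimal sketch of what such a test class might look like; only read_from_s3, unittest.mock and the spark.read.csv call are mentioned in this thread, while the S3Reader class name, the constructor-injected SparkSession and the bucket/key arguments are invented for illustration.)

import unittest
from unittest.mock import MagicMock

from spark.s3.reader import S3Reader  # hypothetical class in the module under test


class TestS3Reader(unittest.TestCase):

    def setUp(self):
        # A MagicMock stands in for the SparkSession, so the tests never
        # touch a real cluster or S3.
        self.spark = MagicMock()
        self.reader = S3Reader(self.spark)

    def test_read_from_s3_reads_csv_from_the_bucket(self):
        self.reader.read_from_s3("my-bucket", "data.csv")
        self.spark.read.csv.assert_called_once_with(
            "s3://my-bucket/data.csv", header=True)

    def test_read_from_s3_returns_the_dataframe(self):
        result = self.reader.read_from_s3("my-bucket", "data.csv")
        self.assertIs(result, self.spark.read.csv.return_value)


if __name__ == "__main__":
    unittest.main()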




Below is the actual code being tested, in the file spark.s3.reader.py:
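(Likewise a sketch of what the module under test might contain; only the read_from_s3 name and the fact that it reads a CSV from S3 via spark.read.csv come from this thread, the rest is assumed.)

from pyspark.sql import SparkSession


class S3Reader:

    def __init__(self, spark: SparkSession):
        # The session is injected, which is what makes the class easy
        # to test with a mock.
        self.spark = spark

    def read_from_s3(self, bucket: str, key: str):
        # Read a CSV object from the given S3 location into a DataFrame.
        path = f"s3://{bucket}/{key}"
        return self.spark.read.csv(path, header=True)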




Below is the output from running the test cases:


Ran 2 tests in 5.772s

OK

Also, I checked that it covers all lines of the method read_from_s3.

Can someone please review whether my unit test code is correct?

Thanks
 
Liutauras Vilda
Marshal
Posts: 8988
For me personally, such tests are a maintenance burden.

To explain it in different words: what do they test? Pretty much nothing, or nothing very useful - only "whether the code executed top to bottom".

You may want to mock a data frame as the result of reading from the s3 bucket, then test some transforms on the data, for instance, and compare the expected outcome against the actual one - that would be valuable.
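(A sketch of that idea, with every name invented for illustration; the point is that the input DataFrame is built locally, so no S3 access is involved, and the assert is about the transformed data rather than about which lines executed.)

from pyspark.sql import SparkSession
import pyspark.sql.functions as F


def apply_discount(df):
    # Stand-in for whatever transform the application really applies.
    return df.withColumn("discounted", F.col("price") - 10.0)


def test_apply_discount():
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    # Build the "result of reading from s3" locally as a plain DataFrame.
    df = spark.createDataFrame([("widget", 100.0)], ["item", "price"])
    result = apply_discount(df).collect()
    assert result[0]["discounted"] == 90.0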

Now, when it comes down to testing whether your application/function can read from an s3 bucket, that is an integration test, which wouldn't be part of your usual unit test suite.
 
Liutauras Vilda
Marshal
Posts: 8988
...and don't get too hooked on the lines of code executed during the test run. 100% coverage may mean nothing. Test what is valuable to test - some sort of behaviour that gives you confidence about the function you are testing - not the reassurance that 100% of the code has been executed.

You can, however, have some asserts in addition to what you are mainly testing, e.g. whether there was an expected interaction with a mock.

e.g., pulled from thin air, ignore the exact syntax:
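# Every name below is invented for illustration.
from unittest.mock import MagicMock


def load_sales_report(spark):
    return spark.read.csv("s3://sales/report.csv", header=True)


def test_load_sales_report_interacts_with_spark_as_expected():
    spark = MagicMock()
    load_sales_report(spark)
    # In addition to whatever the test mainly asserts about the result,
    # verify the expected interaction with the mock happened exactly once:
    spark.read.csv.assert_called_once_with("s3://sales/report.csv", header=True)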


 
Monica Shiralkar
Ranch Hand
Posts: 2954
Thanks

Liutauras Vilda wrote:

You may want to mock a data frame as the result of reading from the s3 bucket, then test some transforms on the data, for instance, and compare the expected outcome against the actual one - that would be valuable.



Yes, I had initially wanted to do that, but mocking S3 didn't work out for me. I got the error 'No FileSystem for scheme "s3"' while using unittest.mock to mock the spark.read.csv call to S3.
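(For what it's worth, one way to get that error even with a mock in place, sketched with assumed names: if the patch doesn't target the reference the code under test actually uses, a real SparkSession ends up resolving the s3:// path, and a local Hadoop setup has no filesystem registered for the "s3" scheme. Assuming read_from_s3 is a module-level function that obtains its session at call time inside spark.s3.reader, the patch has to replace the name where that module looks it up:)

from unittest.mock import patch

# Assumes reader.py does `from pyspark.sql import SparkSession` and calls
# SparkSession.builder.getOrCreate() inside read_from_s3 at call time.
with patch("spark.s3.reader.SparkSession") as fake_cls:
    from spark.s3 import reader                     # hypothetical package
    reader.read_from_s3("my-bucket", "data.csv")
    fake_spark = fake_cls.builder.getOrCreate.return_value
    fake_spark.read.csv.assert_called_once_with(
        "s3://my-bucket/data.csv", header=True)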

Liutauras Vilda wrote:

Now, when it comes down to testing whether your application/function can read from an s3 bucket, that is an integration test, which wouldn't be part of your usual unit test suite.



If I correlate this with the statement above, does it mean that a unit test should not be written for this, and rather an integration test should be written?
 
Liutauras Vilda
Marshal
Posts: 8988

Monica Shiralkar wrote:Yes, I had initially wanted to do that, but mocking S3 didn't work out for me. I got the error 'No FileSystem for scheme "s3"' while using unittest.mock to mock the spark.read.csv call to S3.


Well, you were not really testing anything related to s3. All you verified (assuming that test ran) is that the spark mock had been called, but that's not the full story. I don't know all the subtleties there, but what you potentially would have wanted, at least, is to test whether spark.read.csv has been called, assuming you want to ensure that the reader is fixed to the csv file type.

But again, do you see? This type of discussion is not only confusing but pretty much useless for the application you are building - hence I said that such a test is just extra maintenance without much value.

Monica Shiralkar wrote:If I correlate this with the statement above, does it mean that a unit test should not be written for this, and rather an integration test should be written?


Well, correct. Unit tests shouldn't communicate with external blob storage. That's why I said that, for what you need to test, you can mock the read blob - which would be a DataFrame, wouldn't it - and test your application's business logic, assuming a blob was successfully read from the s3 bucket and loaded into a data frame.

An integration test, meanwhile, would be one that reads an actual file from the s3 bucket (and not just that!) and tests some more elaborate behaviour, which by proxy would also test whether reading from the s3 bucket succeeds, i.e. that the application has access to it, etc. I'm assuming access to the s3 bucket wouldn't be needed from where the application gets built, but rather from where it runs; that's another reason why this is supposed to be part of a bigger integration test, perhaps running once a day or so, which loads the spark job to a cluster (e.g. AWS EMR, GCP Dataproc), depending on where you'd have it in practice.

I would perhaps leave integration tests for later, until your application matures a bit more (along with your CI/CD pipeline).
 
Monica Shiralkar
Ranch Hand
Posts: 2954

Liutauras Vilda wrote:you can mock the read blob - which would be a DataFrame, wouldn't it - and test your application's business logic, assuming a blob was successfully read from the s3 bucket and loaded into a data frame.



Yes, I had initially tried to do that as below:
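(The attempt below is a plausible reconstruction with assumed names. One PySpark subtlety that produces exactly this failure: spark.read is a property that returns a new DataFrameReader on every access, so patching csv on one reader instance does not affect the next spark.read lookup, and the real reader then asks Hadoop for an "s3" filesystem it doesn't have.)

import unittest
from unittest.mock import MagicMock, patch

from pyspark.sql import SparkSession


class TestReadFromS3(unittest.TestCase):

    def test_read_from_s3(self):
        spark = SparkSession.builder.master("local[1]").getOrCreate()
        # This patches the csv method of ONE DataFrameReader instance...
        with patch.object(spark.read, "csv", return_value=MagicMock()):
            # ...but spark.read here returns a DIFFERENT, unpatched
            # instance, so the real csv() runs and raises
            # 'No FileSystem for scheme "s3"'.
            df = spark.read.csv("s3://my-bucket/data.csv")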


     


However, it had given the error 'No FileSystem for scheme "s3"'.