Granny's Programming Pearls
"inside of every large program is a small program struggling to get out"
JavaRanch.com/granny.jsp
Win a 3 month subscription to Marco Behler Videos this week in the Spring forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic

Read number of occurrence of word from file  RSS feed

 
Atul More
Greenhorn
Posts: 29
1
Java jQuery Spring
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi,

I have written a code which read the file from location, count the word based on the separator and count the occurrence of each word.
Just want to analyze the code and is it well written? Is there any improvement required. I am trying to write it on Java8.

 
Stephan van Hulst
Saloon Keeper
Posts: 7507
135
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
  • Only make classes public when they're needed by classes outside of their package. Only make classes non-final when you plan to override them.
  • Don't use hard-coded file paths.
  • Constants should be static.
  • occuranceData should be occurrenceData, or simply occurrences. There are also typos in calculateTopPhresesFromFile and calcualteOccurance. Be diligent.
  • Don't allow nulls to be passed to your methods, unless that implies something specific. Never return null from a method.
  • Don't use arrays in method signatures or return types. Use collections.
  • Don't use String to represent anything other than names. For file paths, use Path.
  • Prefer enhanced for loops if you don't expressly need an index.
  • Don't create objects over and over again if you don't have to. fileFilter can be a static final field.
  • Don't use forEach if you can use a collector. You can use flatMap() and Arrays.stream() on your lines.
  • Don't catch exceptions if you don't have a good way of handling them. Propagate them from your method if you want the caller to deal with them.
  • Don't pass Comparator.reverseOrder() to a Comparator. Just use reversed() on the Comparator.
  • You don't need to pass a merge function to toMap() if you can guarantee that the key selector never returns duplicates.
  • You should perform parameter checking in all your methods and constructors.
  • Make your fields final wherever possible.
  •  
    Campbell Ritchie
    Marshal
    Posts: 54909
    155
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    The field limit_count shou‍ld read limitCount, or as a constant LIMIT_COUNT.

    Documentation comments don't need two *s at their end. Line 42 shou‍ld read */ not * */
    No space before [] in arrays.
    Line 70 is too long; divide it with a new line before each dot.I am sure there is a better way to do that. What you are trying to do is get all the words from your file into a List. I am getting suspicious because I could see a forEach method call.
    How do you know you will have enough information to make it worthwhile parallelising the Stream? Why not parallelise it earlier?
    Now, let's try something different. You get a Stream<String> from the file reading with the lines method.
    Then let's parallelise it straight away.
    Now, let's see if we can't get the Stream<String> into another Stream<String> with its flatMap() method, which takes some Function or other as its argument. Let's see if we can get away with a straight λ call to String#split. I tried it and it didn't work.
    You have to use map to get a Stream<String[]> and then pass Arrays::stream as a method reference to flatMap, or possibly a λ. Look at my lines 3‑4. [Addition: That is the same as Stephan said.]
    As a terminal operation, let's see if we can't turn that straight into a List. You can use the Collectors class' methods to get a ready‑made Collector, and this method returns a Collector which creates a List. Let's see whether this will work:-Your methods look confused. You have multiple return statements, and a lot of methods seem simply to call other methods with similar names. It is difficult to read the code or follow the flow of execution. I agree with Stephan: don't let nulls in and don't let nulls out. There is nothing wrong with a 0‑length List.

    Please explain more about line 82‑88. That look interesting, but is too tightly packed to read properly.
     
    Scott Shipp
    Ranch Hand
    Posts: 223
    12
    Eclipse IDE IntelliJ IDE Java Scala Ubuntu
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Stephan van Hulst wrote:
  • Don't use arrays in method signatures or return types. Use collections.



  • What is the rationale for this advice? And does this apply to methods with any access modifier or would you differentiate between public, private, etc? Would you add any caveats for performance considerations?
     
    Stephan van Hulst
    Saloon Keeper
    Posts: 7507
    135
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Arrays are not good for APIs. They're basic building blocks of more advanced data structures. Whether a method is private or not doesn't matter. You're still a consumer of your own private methods, and collections are much more versatile and lead to code that's easier to read. Performance will rarely play a part, because collections that use an array as their backing store (ArrayList, ArrayDeque, ArrayBlockingQueue) shouldn't perform that much poorer.

    There might be some caveats, but those will be known by software architects who have a very specific and well thought out reason to use an array instead of a collection, and only after profiling. If you're not sure you need one, then you don't need one.
     
    Piet Souris
    Rancher
    Posts: 1916
    66
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    An alternative to
          
    is

    This is not limited to limit_count, but in your code you might miss some Strings, if the Set of Strings with equal frequency is larger than limit_count.

    Edit: and oh yeah: is it not wise to convert all strings to lowercase?
     
    Scott Shipp
    Ranch Hand
    Posts: 223
    12
    Eclipse IDE IntelliJ IDE Java Scala Ubuntu
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Stephan van Hulst wrote:Arrays are not good for APIs. They're basic building blocks of more advanced data structures. Whether a method is private or not doesn't matter. You're still a consumer of your own private methods, and collections are much more versatile and lead to code that's easier to read. Performance will rarely play a part, because collections that use an array as their backing store (ArrayList, ArrayDeque, ArrayBlockingQueue) shouldn't perform that much poorer.

    There might be some caveats, but those will be known by software architects who have a very specific and well thought out reason to use an array instead of a collection, and only after profiling. If you're not sure you need one, then you don't need one.


    What about all of these methods that take arrays in the standard Java library? What do you say about those? Or these?
     
    Campbell Ritchie
    Marshal
    Posts: 54909
    155
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    If people are going to use arrays, they will require methods to handle them. The existence of the Arrays class means that some people use arrays, not necessarily that they are a good thing to use in production code. The Java® APIs are designed to provide methods for anybody who might want them, not simply methods frequently used.
     
    Stephan van Hulst
    Saloon Keeper
    Posts: 7507
    135
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    It would be silly for a utility class for array operations, not to work on arrays.

    String is an extremely low level data structure. It's not a good example of a high level application API.
     
    Atul More
    Greenhorn
    Posts: 29
    1
    Java jQuery Spring
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Hi,

    Thanks Stephan for your inputs, will keep in mind those.
    @Campbell: Thanks for the explanation and alternative. The code from line 82 on wards till the end to read the list of words and get the count of each word.
    Let me write it again.

    Thanks,
    Atul
     
    Campbell Ritchie
    Marshal
    Posts: 54909
    155
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Atul More wrote:. . . The code from line 82 on wards till the end to read the list of words and get the count of each word. . . .
    I suggest you start by writing an explanation of each method call and what it returns; seeing the structure of the code will probably help you enhance it.
     
    Campbell Ritchie
    Marshal
    Posts: 54909
    155
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    . . . Also please explain how you are working out whether to use a sequential Stream or a parallel Stream.
     
    Scott Shipp
    Ranch Hand
    Posts: 223
    12
    Eclipse IDE IntelliJ IDE Java Scala Ubuntu
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Stephan van Hulst wrote:It would be silly for a utility class for array operations, not to work on arrays.

    String is an extremely low level data structure. It's not a good example of a high level application API.


    So, my apologies, but I'm getting more confused. It sounds like the advice in question is actually centered on "public," "high-level" API's? Because at first you seemed to indicate it didn't matter if they were public or private, just a blanket "Don't use arrays in method signatures or return types. Use collections." I immediately thought of the examples I gave and wondered if you thought they were bad.

    I'll throw in another fascinating topic. A lot of experienced developers actually say don't use collections, use specific / custom objects. Some of us are old enough to have lived through the introduction of List instead of Vector and HashMap instead of HashTable. We lived through the introduction of generics in Java 5. All of our APIs became screwed up, specifically because we used Java collections in them. Actually, now that I think about it, had we used arrays, we wouldn't have had the same problem. But nevertheless, advice I would give is to hide the fact of whether you are using Arrays, Vectors or Lists, and expose only the behavior you want the client to use through an object.

    Another fascinating digression on this topic would be to consider the discussion in Item 25 of Effective Java centered on the differences in covariance and invariance between objects and arrays.

    Personally, I feel that the space is too complex and there are too many nuances to issue any blanket generalities about using arrays or objects in method signatures. It takes a careful software craftsman to consider the context and make a (hopefully) wise decision. In the given example from the OP, I personally don't object to his use of an array there.

     
    Stephan van Hulst
    Saloon Keeper
    Posts: 7507
    135
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Scott Shipp wrote:It sounds like the advice in question is actually centered on "public," "high-level" API's?

    No, all methods, public or private, except for the lowest of the low. String and Arrays are the lowest of the low, and it's exactly because these classes exist that you don't have to write your own. To clarify: accepting or returning arrays isn't necessarily bad, it's just that there's almost no reason to do it. If you do it, you probably didn't use an existing class or method that already does what you're trying to do. Using an array should ring a bell: "Hey, I probably designed my class wrong, or didn't use the correct tools".

    Personally, I feel that the space is too complex and there are too many nuances to issue any blanket generalities about using arrays or objects in method signatures. It takes a careful software craftsman to consider the context and make a (hopefully) wise decision. In the given example from the OP, I personally don't object to his use of an array there.

    I will agree to that. There are some very specific caveats to my statement, but I used the blanket statement because for a beginning programmer it's much easier to remember. When somebody *really* needs to pass an array around instead of some other existing type, they need a *very* good reason for it. Maybe when they're working with computer graphics or something and using classes like ByteBuffer or something similar isn't performant enough, after extensive benchmarking.

    A lot of experienced developers actually say don't use collections, use specific / custom objects.

    Using an object that accurately describes what you need is almost always preferable to a less descriptive type. However, sometimes you just need more than one of something, and then you should prefer collections over arrays.

    All of our APIs became screwed up, specifically because we used Java collections in them. Actually, now that I think about it, had we used arrays, we wouldn't have had the same problem.

    How did you APIs become screwed up? Classes that accepted HashTable didn't become worse when HashMap came out. You just had a better alternative in new code. Similarly, classes that used raw collection types pre-generics didn't become worse when generics came out.

    [edit]

    Interesting discussion by the way, please have a cow!
     
    Consider Paul's rocket mass heater.
    • Post Reply Bookmark Topic Watch Topic
    • New Topic
    Boost this thread!