Separating Unique names from a file

 
Greenhorn
Posts: 28
I have a file whose lines follow the pattern itemID UniqueAuthorName<>paper title<>Venue.
How do I separate out the unique author names and the unique venues, and also count the number of publications per venue? Please help!
 
Saloon Keeper
Posts: 27752
196
Android Eclipse IDE Tomcat Server Redhat Java Linux
Welcome to the JavaRanch, Souvik!

Have you looked at the String split() method?
 
Souvik Bhattacharyya
Greenhorn
Posts: 28
Yes sir, I did, but it is not helping. I have to fetch the UniqueAuthorNames without the itemID, and String split only removes the <> and puts everything into a list.
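A minimal sketch of how split() can also drop the itemID, assuming the ID never contains whitespace and every line has exactly three "<>"-separated fields. The class name SplitOneLine and the method parse are hypothetical, chosen only for illustration. The idea is to split on "<>" first and then split the first field once more on the first run of whitespace.

```java
import java.util.Arrays;

class SplitOneLine {
    /** Splits one record into {itemId, authors, title, venue}. Assumes a well-formed line. */
    static String[] parse(String line) {
        // First level: the literal "<>" separates authors, title and venue.
        String[] parts = line.split("<>", 3);
        // Second level: the item ID is everything before the first run of whitespace.
        String[] idAndAuthors = parts[0].trim().split("\\s+", 2);
        return new String[] {
            idAndAuthors[0],
            idAndAuthors[1].trim(),
            parts[1].trim(),
            parts[2].trim()
        };
    }

    public static void main(String[] args) {
        String line = "14_1 J Martin; Peter H Welch<>A CSP Model for Java Multithreading <>PDSE";
        // Prints [14_1, J Martin; Peter H Welch, A CSP Model for Java Multithreading, PDSE]
        System.out.println(Arrays.toString(parse(line)));
    }
}
```

The limit argument of 2 in the second split keeps the spaces inside the author names intact.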
 
Marshal
Posts: 79151
377
Welcome to the Ranch again

Since you posted on the Java8 forum, I presume you know about the Streams API. I think I would start with a presentation class which can take those details from your file as constructor parameters.
There are ways of splitting Strings on regexes which accept both space and arrowheads. Also please explain your pattern a bit more. Are you allowed spaces in name or venue, for a start, please?
 
Souvik Bhattacharyya
Greenhorn
Posts: 28
No sir, I'm not quite familiar with the Streams API; could you guide me a bit more, please?
As for the pattern, the file has the pattern itemID UniqueAuthorName<>PaperTitle<>Venue, and no, there are no spaces in name or venue.
Please look into it.
 
Bartender
Posts: 2911
150
Google Web Toolkit Eclipse IDE Java
Hi Souvik, your question involves several tasks:
  • Opening a file for reading (and handling errors while doing so)
  • Reading each line in the file into a String (assuming each line is data you want)
  • Parsing each String into the format you want
  • Learning streams to make this job easier


  • I suggest that you simply start your code like this:

    Are you able to fill in the blanks above? (Don't worry if you are wrong; we can help you.)
     
    Souvik Bhattacharyya
    Greenhorn
    Posts: 28




    This is what I came up with, yet, it shows an error message during compilation. Please help.
     
    Saloon Keeper
    Posts: 15484
    363
    It's helpful if you post the stack trace.
     
    Campbell Ritchie
    Marshal
    Posts: 79151
    377
    Don't use StringTokenizer; it is a legacy class. I don't think that delimiter will work, but I might be mistaken.
    As well as using code tags, please check your indentation. Look at lines 40‑53, where it is impossible to tell how many {s and how many }s there are.
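A small sketch of why the delimiter matters, assuming the delimiter passed to StringTokenizer was "<>" (the posted code is not shown here, so that is a guess). StringTokenizer treats every character of its delimiter string as a separate delimiter, whereas String.split treats "<>" as one two-character separator. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerVersusSplit {
    /** Tokenizes with StringTokenizer: '<' and '>' each act as a delimiter on their own. */
    static List<String> withTokenizer(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, "<>");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    /** Splits with String.split: only the two-character sequence "<>" separates fields. */
    static List<String> withSplit(String text) {
        return Arrays.asList(text.split("<>"));
    }

    public static void main(String[] args) {
        String text = "A<B>C<>D";
        System.out.println(withTokenizer(text)); // prints [A, B, C, D]
        System.out.println(withSplit(text));     // prints [A<B>C, D]
    }
}
```

A title that happens to contain a lone < or > would therefore be cut apart by the tokenizer but left whole by split.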
     
    salvin francis
    Bartender
    Posts: 2911
    150
    Google Web Toolkit Eclipse IDE Java

    Souvik Bhattacharyya wrote: ... This is what I came up with, yet, it shows an error message during compilation. Please help.


    Hi Souvik, this is not quite what I meant. Let's leave out the file reading and error handling for now and look at parsing a single String as per my suggested code. Are you able to do that?
     
    Souvik Bhattacharyya
    Greenhorn
    Posts: 28
    10_1 J Martin; James L Crowley<>An Appearance-Based Approach to Gesture-Recognition <>ICIAP
    10_2 J Martin; Augustin Lux ; Christophe Le Gal ; James L Crowley<>Smart Office: Design of an Intelligent Environment <>IEEE Intelligent Systems
    10_3 J Martin; Daniela Hall ; James L Crowley<>Statistical Gesture Recognition Through Modelling of Parameter Trajectories <>Gesture Workshop
    10_4 J Martin; James L Crowley ; Olivier Chomat<>A Probabilistic Sensor for the Perception and Recognition of Activities <>ECCV European Conference Computer Vision
    11_1 J Martin; Ramón Puigjaner<>Extending UML Deployment Diagrams form a Performance Engineering Perspective <>Software Engineering Research and Practice
    11_2 J Martin; Carlos Juiz ; Nunzio Nicoló Savino Vázquez<>Unified system builder through interacting blocks USBIB for soft real-time systems <>Workshop Software and Performance
    11_3 J Martin; Didier Boudigue ; Georges Gardarin ; José Antonio Corbacho ; Nunzio Nicoló Savino Vázquez ; Ramón Puigjaner ; Sophie Dumas<>Predicting the behaviour of three-tiered applications: dealing with distributed-object technology and databases <>Perform Eval
    11_4 J Martin; José Antonio Corbacho ; Nunzio Nicoló Savino Vázquez ; Ramón Puigjaner<>Extending SMART to Predict the Behaviour of PL/SQL Applications <>Computer Performance Evaluation Tools
    12_1 J Martin; Chris Lankester<>Ask Me Tomorrow: The NRC and University of Ottawa Question Answering System <>TREC Text REtrieval Conference
    12_2 J Martin; Alain Désilets ; Berry de Bruijn<>Extracting Keyphrases from Spoken Audio Documents <>SIGIR Research and Development Information Retrieval Workshop Information Retrieval Techniques for Speech Applications
    12_3 J Martin; <>Focusing Attention for Observational Learning: The Importance of Context <>IJCAI International Joint Conference Artificial Intelligence
    13_1 J Martin; Dominique Béroule ; R Veldman<>Developing Multimodal Interfaces: A Theoretical Framework and Guided Propagation Networks <>Multimodal Human Computer Communication
    13_2 J Martin; Dominique Béroule<>Temporal Codes within a Typology of Cooperation Between Modalities <>Artif Intell Rev
    13_3 J Martin; Adam Cheyer ; Luc Julia<>A Unified Framework for Constructing Multimodal Experiments and Applications <>Cooperative Multimodal Communication
    14_1 J Martin; Peter H Welch<>A CSP Model for Java Multithreading <>PDSE
    14_2 J Martin; <>A Tool for Checking the CSP sat Property <>Comput J
    15_1 J Martin; Daniel M Dias ; David C Sadler ; Jamshed H Mirza ; Marc Snir ; Tilak Agerwala<>SP System Architecture <>IBM Systems Journal
    15_2 J Martin; Daniel M Dias ; David C Sadler ; Jamshed H Mirza ; Marc Snir ; Tilak Agerwala<>SP System Architecture <>IBM Systems Journal
    16_1 J Martin; Ferenc A Jolesz ; Guido Gerig ; Martha Elizabeth Shenton ; Olaf Kübler ; Ron Kikinis<>Automating Segmentation of Dual-Echo MR Head Data <>IPMI Information Processing Medical Imaging
    16_2 J Martin; Alex Pentland ; Ron Kikinis ; Stan Sclaroff<>Characterization of Neuropathological Shape Deformations <>IEEE Transactions Pattern Analysis and Machine Intelligence
    1_16 J Martin; Court Sansom ; Debbie Durant ; Henry Moeller ; Jim Osborn ; Virginia Redmond<>Slowing Down the Revolving Door: Motivating and Retaining Student Employees <>SIGUCCS Special Interest Group University and Computing Services
    1_9 J Martin; <>Help desk training: art, science, or prayer <>SIGUCCS Special Interest Group University and Computing Services
    1_1 J Martin; <>What does faculty really want from information technology?<>SIGUCCS Special Interest Group University and Computing Services
    1_2 J Martin; <>Managing the network <>SIGUCCS Special Interest Group University and Computing Services
    1_10 J Martin; <>Seamless Integration of Client Server Applications - Conclusion or How Many SIGUCCS Papers Can You Get From One Project?<>SIGUCCS Special Interest Group University and Computing Services
    1_3 J Martin; <>There's gold in them thar networks! or searching for treasure in all wrong places <>SIGUCCS Special Interest Group University and Computing Services



    This is a part of the sample text.
     
    Souvik Bhattacharyya
    Greenhorn
    Posts: 28


    And this is what I have done so far.
     
    Rancher
    Posts: 517
    15
    Notepad Java
    @ Souvik Bhattacharyya

    I think you may want to think of a Data class with attributes: itemId, authorName, paperTitle and venue.
    Read the file and get the tokenized data into a collection (a List, perhaps) of type Data class.
    Finally, query the collection to get the required results.

    My suggestion is to get to the solution one step at a time.
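A rough sketch of the "collect, then query" idea, assuming the lines have already been parsed into objects. The class VenueCounts, its nested Data class, and the method publicationsPerVenue are hypothetical names; only the four attributes come from the post above. It uses the Streams API to count publications per venue.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class VenueCounts {
    /** One parsed line of the file. */
    static final class Data {
        final String itemId;
        final String authorName;
        final String paperTitle;
        final String venue;

        Data(String itemId, String authorName, String paperTitle, String venue) {
            this.itemId = itemId;
            this.authorName = authorName;
            this.paperTitle = paperTitle;
            this.venue = venue;
        }
    }

    /** Groups the records by venue and counts each group; TreeMap keeps venues sorted. */
    static Map<String, Long> publicationsPerVenue(List<Data> records) {
        return records.stream()
                .collect(Collectors.groupingBy(d -> d.venue, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Data> records = Arrays.asList(
                new Data("14_1", "J Martin; Peter H Welch", "A CSP Model for Java Multithreading", "PDSE"),
                new Data("15_1", "J Martin; Daniel M Dias", "SP System Architecture", "IBM Systems Journal"),
                new Data("15_2", "J Martin; Daniel M Dias", "SP System Architecture", "IBM Systems Journal"));
        // Prints {IBM Systems Journal=2, PDSE=1}
        System.out.println(publicationsPerVenue(records));
    }
}
```

The key set of the resulting map doubles as the list of unique venues.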
     
    Souvik Bhattacharyya
    Greenhorn
    Posts: 28
    Can you help me with the code format?
     
    Prasad Saya
    Rancher
    Posts: 517
    15
    Notepad Java
    I think you may want to try the code suggested by salvin francis, first. That is the approach to start with. Just take that one string (or one line of data) and break into the required tokens and print. Try to do that, first. No files to start with.
     
    Souvik Bhattacharyya
    Greenhorn
    Posts: 28
    Yes absolutely, I had done that but I was facing trouble while splitting. That is why I came up with this code form.
     
    Prasad Saya
    Rancher
    Posts: 517
    15
    Notepad Java
    What is the problem? Please explain. Can you post the code here?
     
    Souvik Bhattacharyya
    Greenhorn
    Posts: 28




    This is the overall code. This reads the file and nothing else.
     
    Saloon Keeper
    Posts: 10687
    85
    Eclipse IDE Firefox Browser MySQL Database VI Editor Java Windows ChatGPT
    As suggested by others, you need some sort of Data class that you can parse your line into. Your file has three levels of delimiters: "<>" separates ID/names, title, and venue; within ID/names, the ID follows a pattern and the names are delimited by ";". This becomes a bit tricky, as you can see. The file should be opened using "try with resources" (google it). I've provided the Data class with parsing here, as well as the file-reading portion. Not provided is the data analysis for uniqueness and counting that you mentioned.
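The code from this post is not reproduced here, so what follows is not that code but an independent sketch of the same outline: a Data class that parses the three delimiter levels, plus file reading with try-with-resources. All names (PublicationFile, Data, read, uniqueAuthors) are hypothetical, and it assumes UTF-8 input in which every non-blank line has three "<>"-separated fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class PublicationFile {
    /** One parsed line; a malformed line (fewer than three fields) will throw. */
    static final class Data {
        final String itemId;
        final List<String> authors = new ArrayList<>();
        final String title;
        final String venue;

        Data(String line) {
            String[] fields = line.split("<>", 3);                   // level 1: "<>" between fields
            String[] idAndNames = fields[0].trim().split("\\s+", 2); // level 2: whitespace after the ID
            itemId = idAndNames[0];
            if (idAndNames.length > 1) {
                for (String name : idAndNames[1].split(";")) {       // level 3: ";" between names
                    if (!name.trim().isEmpty()) {
                        authors.add(name.trim());
                    }
                }
            }
            title = fields[1].trim();
            venue = fields[2].trim();
        }
    }

    /** Reads every non-blank line; try-with-resources closes the reader even on error. */
    static List<Data> read(Path file) throws IOException {
        List<Data> records = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    records.add(new Data(line));
                }
            }
        }
        return records;
    }

    /** Collects every author name once, in sorted order. */
    static Set<String> uniqueAuthors(List<Data> records) {
        Set<String> names = new TreeSet<>();
        for (Data d : records) {
            names.addAll(d.authors);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the real data file.
        Path file = Files.createTempFile("papers", ".txt");
        Files.write(file, Arrays.asList(
                "14_1 J Martin; Peter H Welch<>A CSP Model for Java Multithreading <>PDSE",
                "14_2 J Martin; <>A Tool for Checking the CSP sat Property <>Comput J"),
                StandardCharsets.UTF_8);
        System.out.println(uniqueAuthors(read(file))); // prints [J Martin, Peter H Welch]
        Files.delete(file);
    }
}
```

The empty-name check matters for lines such as 14_2, where a trailing ";" is followed by no second author.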

     
    Prasad Saya
    Rancher
    Posts: 517
    15
    Notepad Java

    Prasad Saya wrote:I think you may want to try the code suggested by salvin francis, first. That is the approach to start with. Just take that one string (or one line of data) and break into the required tokens and print. Try to do that, first. No files to start with.



    Souvik, I think tokenizing/splitting one line of data first will help. The same procedure can be applied to all the remaining contents in the file. Get the solution for one line and the rest will take care of itself.

    Once you are able to do this correctly, you can read the file, one line at a time and apply the same logic to each line.
     
    Souvik Bhattacharyya
    Greenhorn
    Posts: 28

    Carey Brown wrote:



    Sir, could you explain this code please? This shows a compilation error.
     
    Campbell Ritchie
    Marshal
    Posts: 79151
    377

    Souvik Bhattacharyya wrote:. . . This shows a compilation error.

    Not when I tried it on JShell. Remind yourself about regexes in the Java™ Tutorials; maybe the part about predefined character classes will be the most useful. The JShell output may make it easier to follow the pattern. As you know, \ and . (full stop) are metacharacters. The first part means any number (>0) of digits, followed by _ once, followed by a similar any number of digits. That is followed by any number (>0) of whitespace characters, followed by any number (≥0) of anything else (probably not including line end characters).
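The pattern under discussion is not shown in this post, so the regex below is a reconstruction from the description above rather than the original. The two capture groups and the names ItemIdPattern and idAndRest are additions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ItemIdPattern {
    // digits, "_", digits (group 1), then whitespace, then the rest of the line (group 2)
    static final Pattern LINE = Pattern.compile("(\\d+_\\d+)\\s+(.*)");

    /** Returns {itemId, rest}, or null when the line does not start with an item ID. */
    static String[] idAndRest(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = idAndRest("1_16 J Martin; Court Sansom<>Slowing Down the Revolving Door <>SIGUCCS");
        System.out.println(parts[0]); // prints 1_16
        System.out.println(parts[1]); // prints J Martin; Court Sansom<>Slowing Down the Revolving Door <>SIGUCCS
    }
}
```

In Java source the backslashes are doubled because \ is also the escape character inside string literals.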
     
    Souvik Bhattacharyya
    Greenhorn
    Posts: 28
    .



    Sir, this is what I got when I did it.
     
    Souvik Bhattacharyya
    Greenhorn
    Posts: 28
    The rest of the program was amazingly fine. I had to make some changes but it made my job 10 times easier. Thanks a lot.
     
    Campbell Ritchie
    Marshal
    Posts: 79151
    377
    What you posted the first time didn't have the dot between Pattern and pattern.
     
    Souvik Bhattacharyya
    Greenhorn
    Posts: 28
    Okay got it. Sorry my bad. Thanks a lot though.
     
    Campbell Ritchie
    Marshal
    Posts: 79151
    377
    Apology accepted. That is why we always recommend copying and pasting code that might contain an error.
    Beware: one of the links from that link is broken.
     