
Assigning a large number of strings (data processing)

 
Michael Voase
Greenhorn
Posts: 11
I currently have a processor that runs through multiple zip files and orders the contents (XML files) by date/time.

These are then assigned to strings, for example:



I have taken out the actual file names for confidentiality purposes.

These are then used in the code below to process the files and assert against an API endpoint.



Then the assertion code is below:


Again, I have just put example values in the required fields.

My question: I have 25,000 of these XML files. Can someone suggest a better way of processing them than assigning each one to a string?

I don't want to type out a string for each STEP.

I need to run them one by one: take one XML file, check that the endpoint and response file match, then go on to the next STEP.

Any suggestions with a few code examples would be great.
 
Pete Letkeman
Ranch Foreman
Posts: 834
If you do post a copy of this question to some other site, please let us know by providing a link to that post.
I found this question here https://stackoverflow.com/questions/47218659/assigning-large-number-of-strings-data-processing-java

That being said, I do have a few follow up questions/thoughts:
  • Is this an XML file or a JSON file?
  • Have you looked into any processing libraries like Jackson for JSON parsing or SAX for XML? Note that there are many processing libraries out there; I think Google even created one (Gson).
  • StringBuilder is much more efficient than repeated String concatenation, so you may want to use that.
  • What exactly is getRefinedProcessor()? I Googled for it and I could not find an answer.

  • If this is a one-time event, then perhaps you are better off with a native script, e.g. bash on *nix or PowerShell on Windows.
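To make the StringBuilder point concrete: it pays off when many pieces are concatenated into one string, for example when writing the ordered file names into a single report, rather than for holding each name on its own. A minimal sketch; `NameReport`, `buildReport` and the file names are hypothetical:

```java
import java.util.List;

public class NameReport {
    // Joins file names into one newline-separated report.
    // StringBuilder appends into a single growable buffer, whereas
    // `report += name` in a loop creates a new String on every iteration.
    public static String buildReport(List<String> fileNames) {
        StringBuilder sb = new StringBuilder();
        for (String name : fileNames) {
            sb.append(name).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildReport(List.of("a.xml", "b.xml")));
    }
}
```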
     
    Michael Voase
    Greenhorn
    Posts: 11


    Thanks for replying, Pete.
    First of all, apologies for not mentioning that I had posted elsewhere.

    All refinedProcessor does is skip past certain XML files within the ZIP folder, i.e. ignore them, for example:


    This would ignore any files in the zip folder with MICHAELVOASE as part of the file name.
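The skipping rule described above can be expressed as a plain Java predicate. This is a sketch only: the real refinedProcessor code was not shown in the thread, and `SkipFilter`, `notContaining` and `filter` are hypothetical names.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class SkipFilter {
    // Predicate that keeps a file name only if it does NOT contain the marker.
    public static Predicate<String> notContaining(String marker) {
        return name -> !name.contains(marker);
    }

    // Applies the predicate to every name, preserving the original order.
    public static List<String> filter(List<String> names, String marker) {
        return names.stream()
                .filter(notContaining(marker))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("A_MICHAELVOASE_1.xml", "B_2.xml", "C_3.xml");
        System.out.println(filter(names, "MICHAELVOASE")); // prints [B_2.xml, C_3.xml]
    }
}
```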

    Here are my responses to your other questions:
  • Is this an XML file or a JSON file?

  • It processes the XML files from multiple ZIP folders and orders them by date/time. I assign a String to the ones I need to use for the STEP and assertion.
    We then assert this XML file against an API endpoint, a status code and a JSON file.


  • Have you looked into any processing libraries like Jackson for JSON parsing or SAX for XML? Note that there are many processing libraries out there; I think that Google even created one.

  • Yes, I have looked at some processing libraries; I'm just not sure which way to go with such a large number of XML files being asserted. There is a unique endpoint and JSON file for each XML file.
  • StringBuilder is much more efficient than repeated String concatenation, so you may want to use that.

  • Again, I'm not sure of the best way to build this out for so many strings.

    Appreciate you taking the time to respond.
     
    Pete Letkeman
    Ranch Foreman
    Posts: 834
    I'm unsure if I can help much more, as I do not know too much about XML and Java.
    But I can point you to the CodeRanch XML FAQ which is found here
    https://coderanch.com/wiki/659751/Xml-Faq

    Possibly one of the links in the FAQ can help you out.
     
    Paul Clapham
    Sheriff
    Posts: 22708
    Michael Voase wrote:I currently have a processor that runs through multiple zip file and orders the contents (XML files) by date/time.

    These are then assigned to a string for example:



    I was already confused by this point. So... you have some ZIP files. And each of them contains... an XML file? Several XML files?

    And you want to order those XML files by date and time. Is this all of the XML files collectively, or all of the XML files in a ZIP file?

    And then you're assigning an XML file to a String... is that the contents of the XML file, or the name of the file, or what?

    Later you said

    My question is I have 25,000 of these XML files, can someone suggest a better way of processing these rather than setting them in a string.

    I don't want to type out a string for each STEP.


    I missed the explanation of why you have to type out strings. And what a STEP is. And what it means to set an XML file in a string, which might be different than typing the string. If you're going to ask people for the best way to do something, a clear description of what that something is would be a great help. So could you try to clarify what you're trying to do?
     
    Michael Voase
    Greenhorn
    Posts: 11

    Okay, let me try to answer all your questions so you have a bit more clarity around this.

    I was already confused by this point. So... you have some ZIP files. And each of them contains... an XML file? Several XML files?
    Yes, I have several ZIP files that each contain several XML files.


    And you want to order those XML files by date and time. Is this all of the XML files collectively, or all of the XML files in a ZIP file?
    This process already happens: it writes the names of the XML files, ordered by date/time, to a console log .txt file.

    And then you're assigning an XML file to a String... is that the contents of the XML file, or the name of the file, or what?
    It is just the file name, so assertAPI knows which XML file it is looking at.


    I missed the explanation of why you have to type out strings. And what a STEP is. And what it means to set an XML file in a string, which might be different than typing the string. If you're going to ask people for the best way to do something, a clear description of what that something is would be a great help. So could you try to clarify what you're trying to do?
    Okay, so we set the XML file name as STEP_1, then pass that to assertAPI("XMLFILENAME", statuscode, endpoint).

    Because there are 25,000 of them, it seems silly to manually create 25,000 strings and write out 25,000 assertAPI requests.



     
    Paul Clapham
    Sheriff
    Posts: 22708
    Michael Voase wrote:Because there is 25,000 it seems silly manually creating 25,000 strings and writing out 25,000 assertAPI requests.


    I agree. So could you explain why you have to create those strings manually? I should remind you that I don't know anything about where those strings are coming from, although you have a long writeup which seems like it's supposed to explain that.
     
    Michael Voase
    Greenhorn
    Posts: 11
    Paul Clapham wrote:
    Michael Voase wrote:Because there is 25,000 it seems silly manually creating 25,000 strings and writing out 25,000 assertAPI requests.


    I agree. So could you explain why you have to create those strings manually? I should remind you that I don't know anything about where those strings are coming from, although you have a long writeup which seems like it's supposed to explain that.


    I don't have to at all; my biggest question is how to process them in a much quicker way,
    bearing in mind that they are then used to assert different endpoints depending on the string.

    The strings are literally the file names of the XML files.
    They get ordered and placed into a console log text file; then I just pick them out and attach them to a string, i.e. FILE.xml = STEP_1.
     
    Paul Clapham
    Sheriff
    Posts: 22708
    Okay. So you get a collection of XML files, and you write their names out to a file.

    So, what's the problem now? You still seem to be concerned about some string which you are having to build, or something. I can't really tell why it's a problem or why you are doing that, whatever it is.
     
    Michael Voase
    Greenhorn
    Posts: 11
    Paul Clapham wrote:Okay. So you get a collection of XML files, and you write their names out to a file.

    So, what's the problem now? You still seem to be concerned about some string which you are having to build, or something. I can't really tell why it's a problem or why you are doing that, whatever it is.


    Because there are 25,000 of them in the file.

    I don't want to go..
    private String STEP_1 = "filename.xml";
    private String STEP_2 = "filename2.xml";
    // etc.

    I want a way to say: okay, take the first filename out of the file,

    attach it to a string, and call assertAPI with that string, the expected response code and the endpoint needed.
    Then loop back round, pick the next filename and do the same.
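That loop can be sketched roughly as follows, assuming the console log holds one file name per line. `assertApi` is a hypothetical `Consumer<String>` standing in for the real assertAPI call, whose signature was not shown; `LogFileLoop` and `processAll` are also hypothetical names. Note that a single local variable, `fileName`, is reused for every line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Consumer;

public class LogFileLoop {
    // Reads the log one line at a time and hands each non-blank, trimmed
    // file name to assertApi. Returns how many names were processed.
    public static int processAll(Path logFile, Consumer<String> assertApi) throws IOException {
        int count = 0;
        try (BufferedReader reader = Files.newBufferedReader(logFile, StandardCharsets.UTF_8)) {
            String fileName;
            while ((fileName = reader.readLine()) != null) {
                fileName = fileName.trim();
                if (fileName.isEmpty()) {
                    continue; // skip blank lines
                }
                assertApi.accept(fileName);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("console-log", ".txt");
        Files.write(log, List.of("first.xml", "", "second.xml"));
        int n = processAll(log, name -> System.out.println("asserting " + name));
        System.out.println(n + " files processed");
    }
}
```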

     
    Paul Clapham
    Sheriff
    Posts: 22708
    Michael Voase wrote:I want a way to go okay so we're taking the first filename out of the file.

    Attach it to a string, call assertAPI with that string, expected response code and endpoint needed.
    Loop back round pick the next filename and do the same.


    Yes, that part sounds perfectly reasonable. (Except for the part "Attach it to a string"... I don't understand what it means to "attach" a string to a string.) I'm just failing to understand why you think you need 25000 String variables. As far as I can see you only need one String variable, to hold the name of the current file being processed.
     
    Michael Voase
    Greenhorn
    Posts: 11
    Paul Clapham wrote:
    Michael Voase wrote:I want a way to go okay so we're taking the first filename out of the file.

    Attach it to a string, call assertAPI with that string, expected response code and endpoint needed.
    Loop back round pick the next filename and do the same.


    Yes, that part sounds perfectly reasonable. (Except for the part "Attach it to a string"... I don't understand what it means to "attach" a string to a string.) I'm just failing to understand why you think you need 25000 String variables. As far as I can see you only need one String variable, to hold the name of the current file being processed.


    Correct, I do, but then I need to process that request again for the next one.
    I just don't know the best way to do that.
     
    Paul Clapham
    Sheriff
    Posts: 22708
    You produce a list of file names:
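One way such a list could be produced straight from the ZIP files, as a sketch: it assumes the date/time to sort on is the modification timestamp stored in each ZIP entry, and `ZipListing` and `sortedXmlNames` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipListing {
    // Collects the names of all .xml entries across several ZIP files and
    // sorts them by each entry's stored modification time (oldest first).
    public static List<String> sortedXmlNames(List<Path> zips) throws IOException {
        List<ZipEntry> entries = new ArrayList<>();
        for (Path zip : zips) {
            try (ZipFile zf = new ZipFile(zip.toFile())) {
                for (ZipEntry e : Collections.list(zf.entries())) {
                    if (!e.isDirectory() && e.getName().endsWith(".xml")) {
                        entries.add(e);
                    }
                }
            }
        }
        // List.sort is stable, so entries with equal times keep ZIP order.
        entries.sort(Comparator.comparingLong(ZipEntry::getTime));
        List<String> names = new ArrayList<>();
        for (ZipEntry e : entries) {
            names.add(e.getName());
        }
        return names;
    }
}
```

If the date/time is encoded in the file name rather than in the ZIP metadata, the comparator would compare names instead.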



    Then you process them:



    What am I missing?
     
    Michael Voase
    Greenhorn
    Posts: 11
    Paul Clapham wrote:You produce a list of file names:



    Then you process them:



    What am I missing?


    Okay, so I will try that to create a list of the filenames.
    Once this is done we will have our fileName, which is great.

    The next bit that needs handling is the assertAPI request:

    So you have the file name as a String, but each file has a different endpoint and response (needed for the assertAPI request).
    These aren't really stored anywhere at the moment; it's just domain knowledge. Any ideas on how best to store them so they link up to each fileName string?
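One way to hold that domain knowledge in code, as a sketch: a map keyed by XML file name whose value records the expected status code, endpoint and JSON file. `ExpectationTable` and `Expected` are hypothetical names, and the entries would still have to be loaded from somewhere, such as a CSV or properties file.

```java
import java.util.HashMap;
import java.util.Map;

public class ExpectationTable {
    // Hypothetical record of what the API should return for one XML file.
    public record Expected(int statusCode, String endpoint, String expectedJsonFile) {}

    private final Map<String, Expected> byFileName = new HashMap<>();

    // Registers the expectation for a given XML file name.
    public void put(String xmlFileName, Expected expected) {
        byFileName.put(xmlFileName, expected);
    }

    // Looks up the expectation for a file; fails loudly if none was
    // recorded, so a missing entry is not silently skipped.
    public Expected lookup(String xmlFileName) {
        Expected e = byFileName.get(xmlFileName);
        if (e == null) {
            throw new IllegalArgumentException("No expectation recorded for " + xmlFileName);
        }
        return e;
    }
}
```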
     
    Dave Tolls
    Ranch Foreman
    Posts: 3008
    I think that's the bit that's missing.
    What is the "assertAPI" request, and what relationship does that have to the file names?
     
    Michael Voase
    Greenhorn
    Posts: 11
    Dave Tolls wrote:I think that's the bit that's missing.
    What is the "assertAPI" request, and what relationship does that have to the file names?


    Okay, so I've completely moved away from the current workflow now.
    First, I have put the XML filename, response code, endpoint and expected file path in a CSV file and pulled that in like so:




    This brings in all the data from the CSV file.

    Can anyone point me to how to bring it in row by row, then loop round and get the next row?
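Reading row by row can be sketched with a `BufferedReader`, assuming a simple CSV (a header row, no quoted fields or embedded commas) with the four columns described above in that order. `CsvRows` and `forEachRow` are hypothetical names; for CSV with quoting, a dedicated CSV library would be the safer choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

public class CsvRows {
    // Skips the header line, then reads one line at a time: each non-blank
    // line is split on commas and handed to handleRow before the next line
    // is read. Returns the number of rows handled.
    public static int forEachRow(Reader source, Consumer<String[]> handleRow) throws IOException {
        int rows = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            reader.readLine(); // skip header, e.g. xmlFile,statusCode,endpoint,expectedFile
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isBlank()) {
                    continue;
                }
                handleRow.accept(line.split(",", -1));
                rows++;
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String csv = "xmlFile,statusCode,endpoint,expectedFile\n"
                + "a.xml,200,/orders,a.json\n"
                + "b.xml,404,/orders/42,b.json\n";
        // For a real file, pass java.nio.file.Files.newBufferedReader(path) instead.
        forEachRow(new StringReader(csv), fields ->
                System.out.println(fields[0] + " -> " + fields[2] + ", expecting " + fields[1]));
    }
}
```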
     
    Junilu Lacar
    Sheriff
    Posts: 11435
    Michael Voase wrote:
    I don't want to go..
    Private string STEP_1 = filename.xml
    Private string STEP_2 = filename2.xml

    Wouldn't you just do something like below?
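The original snippet for this reply is not preserved. As a sketch of the same idea, assuming the file names are already in a list, the STEP_n label can be derived from each name's position instead of being declared by hand; `StepLabels` and `label` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class StepLabels {
    // Builds "STEP_<n> = <file>" labels from list positions (1-based),
    // so no STEP_1, STEP_2, ... variables have to be typed out.
    public static List<String> label(List<String> fileNames) {
        List<String> labelled = new ArrayList<>();
        for (int i = 0; i < fileNames.size(); i++) {
            labelled.add("STEP_" + (i + 1) + " = " + fileNames.get(i));
        }
        return labelled;
    }

    public static void main(String[] args) {
        label(List.of("filename.xml", "filename2.xml")).forEach(System.out::println);
        // STEP_1 = filename.xml
        // STEP_2 = filename2.xml
    }
}
```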

    I can't figure out the rest of what you want to do so maybe I'm missing something here...
     
    Junilu Lacar
    Sheriff
    Posts: 11435
    Michael Voase wrote:


    This is confusing to read because:
    1. Method names normally start with a lowercase letter, so steps() instead of Steps().
    2. The method is declared with a void return type and it creates a Steps object that appears to be just thrown away since it's assigned to a local variable.
    3. The class name Steps is plural but it seems like an instance really only represents ONE "step", whatever that is.

    If anything, I would have expected something like this:
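The code that originally followed is not preserved. A sketch along the lines of the three points above: a singular `Step` type for one test case, created by a lowercase factory method that returns the object instead of discarding it. The field names and `fromCsvFields` are hypothetical, based on the CSV columns mentioned earlier.

```java
// A single test step: one XML request file plus what the API should return.
public record Step(String xmlFile, int statusCode, String endpoint, String expectedFile) {

    // Lowercase factory method that RETURNS the new Step rather than
    // creating it inside a void method and throwing it away.
    public static Step fromCsvFields(String[] fields) {
        if (fields.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields but got " + fields.length);
        }
        return new Step(fields[0].trim(), Integer.parseInt(fields[1].trim()),
                fields[2].trim(), fields[3].trim());
    }
}
```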

    Then to go through all of the "steps" you've gathered and do something like what I showed before:
           
     
    Michael Voase
    Greenhorn
    Posts: 11
    Thank you for the replies.
    I've managed to query my CSV file and return the values that I need,
    i.e. XML filename, response code, endpoint and expected response.

    This allows me to call assertAPI.

    It then goes back to the CSV, gets the next row, and keeps looping the assertion round with different values.
     
    Dave Tolls
    Ranch Foreman
    Posts: 3008
    So, this looks like a test framework?
    You have a whole load of test XML files for firing at some web services, and you want a way of creating the test steps.
    At least that's what I'm getting from this.

    Your test CSV therefore looks something like:
    Given an XML request file
    And an endpoint
    When the request is sent
    Then we get this response code
    And this response

    Something like that?
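Read that way, each CSV row is one Given/When/Then case. A minimal sketch of a runner under that reading; `ApiClient`, `Response` and `TestCase` are hypothetical types standing in for the real HTTP call and the CSV columns. If JUnit 5 is already in use, its parameterized tests with `@CsvFileSource` are an established alternative for feeding CSV rows into a test method.

```java
import java.util.ArrayList;
import java.util.List;

public class StepRunner {
    // Hypothetical abstraction over the real HTTP call.
    public interface ApiClient {
        Response send(String xmlFile, String endpoint);
    }

    public record Response(int statusCode, String body) {}

    public record TestCase(String xmlFile, String endpoint, int expectedStatus, String expectedBody) {}

    // For every case: send the request, compare status and body, and collect
    // failures instead of stopping at the first one, so a run over thousands
    // of files reports everything that went wrong.
    public static List<String> runAll(List<TestCase> cases, ApiClient client) {
        List<String> failures = new ArrayList<>();
        for (TestCase tc : cases) {
            Response r = client.send(tc.xmlFile(), tc.endpoint());       // When
            if (r.statusCode() != tc.expectedStatus()) {                   // Then
                failures.add(tc.xmlFile() + ": status " + r.statusCode() + " != " + tc.expectedStatus());
            } else if (!r.body().equals(tc.expectedBody())) {              // And
                failures.add(tc.xmlFile() + ": unexpected response body");
            }
        }
        return failures;
    }
}
```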
     
    Don't get me started about those stupid light bulbs.