I've been learning more and more about parsing, but I still come up empty as to what it is actually doing. I've heard parsing described as chunking things up, like a sushi chef or something, and then working with those individual pieces of data. Like if you have a String array such as:
we can parse the array of strings, remove the commas, and work with each piece separately. But then I've also heard parsing described as conversion. So which one is it? Is it chopping data up into pieces, or is it converting from one form to another, like int to String or String to double, if that even works?
The things I've seen so far that I don't understand are:
What do any of these things mean? How are they working? What exactly is happening here?
Parsing converts data from an external format to an internal format. So for example you might convert an XML document into a tree structure of Java objects; that would be parsing. Or you might convert a line of text from a file into an array of strings by splitting it based on the commas, as in your example. That would be parsing too.
You wouldn't describe converting a string to an integer as parsing, though. Normally parsing produces a collection of objects which are related by a structure.
Actually, parsing is done from a string or text. Any parse() method will take a string as its argument and produce whatever that String/text actually represents, whether it's an int, double, Boolean, Date, or even structured types like an XML document or a JSON object. It's always from a textual representation to the actual type that the information represents.
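To make that concrete, here is a small sketch using parse methods from the standard Java library. The class name ParseExamples and the sample values are only for illustration, and LocalDate is my own pick for the date case; it expects ISO-8601 text such as 2015-03-01 by default.

```java
import java.time.LocalDate;

class ParseExamples {
    public static void main(String[] args) {
        // Each call takes text and returns the value that the text represents.
        int count = Integer.parseInt("42");
        double price = Double.parseDouble("3.5");
        boolean flag = Boolean.parseBoolean("true");
        LocalDate date = LocalDate.parse("2015-03-01"); // ISO-8601: yyyy-MM-dd

        // Once parsed, the values behave as their real types, not as text.
        System.out.println(count + 1);      // 43
        System.out.println(price * 2);      // 7.0
        System.out.println(!flag);          // false
        System.out.println(date.getYear()); // 2015
    }
}
```

If the text does not follow the expected format (for example Integer.parseInt("4x2")), the numeric methods throw a NumberFormatException, which is the "rules" part of parsing showing itself.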
You would not usually parse a comma‑separated list of names; you would however parse a sentence:-
"Bob, John, Victor, Sally, Janet and Fred read Coderanch."
You would divide that into subject verb object. You can parse code similarly.
Campbell Ritchie wrote:You would not usually parse a comma‑separated list of names; you would however parse a sentence...
So, do we actually have an answer?
Personally, I've always thought of parsing as "conversion involving rules", and also - usually - conversion that doesn't involve a simple 1:1 mapping; so on that basis, converting a comma‑separated list of names would be parsing - albeit dirt-simple parsing.
And dealing with CSVs generically - i.e., handling "values" that can include the delimiter - is definitely parsing, IMO.
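As a rough illustration of why values that contain the delimiter need real parsing rather than a plain split, here is a minimal sketch. It assumes the common CSV convention that a field may be wrapped in double quotes and that a doubled quote inside a quoted field stands for one literal quote; the class and method names (QuotedCsv, parseLine) are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedCsv {
    // Splits one CSV line into fields, treating commas inside double quotes as data.
    // Assumed convention: "" inside a quoted field means a literal quote character.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote: keep one, skip the other
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas in here are ordinary characters
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // delimiter ends the current field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        for (String field : parseLine("Bob,\"Ritchie, Campbell\",Sally")) {
            System.out.println(field);
        }
    }
}
```

A plain line.split(",") would cut the quoted name in two; the small state flag (inQuotes) is what turns splitting into parsing with rules.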
"Leadership is nature's way of removing morons from the productive flow" - Dogbert
I see nothing wrong with the name "parse" to represent the behavior that it has in Integer and other classes that have a parse method. Sure, parsing is usually associated with compilation/interpretation at a much larger scale, i.e., turning textual code in a programming language into an executable form, but the mechanics of turning a String into an integer or Date are basically the same. You have text as input, you have some rules for how the text should be broken down, and you have some rules for how to treat each small bit of text that has been broken down.
In the case of an integer, the rules are pretty simple: each character must be a digit. The conversion rule is that for each character parsed from the input text, you multiply the result so far by 10 and then add the value of the digit. There are other rules of course, but these are the main ones.
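Here is a minimal sketch of just those two rules. It is not how Integer.parseInt is actually implemented; it deliberately ignores the "other rules" such as a leading sign and overflow, and the names DigitParser and parseDigits are invented for the example.

```java
class DigitParser {
    // Simplified integer parsing: digits only, no sign handling, no overflow check.
    static int parseDigits(String text) {
        if (text == null || text.isEmpty()) {
            throw new NumberFormatException("empty input");
        }
        int result = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Rule 1: every character must be a digit.
            if (c < '0' || c > '9') {
                throw new NumberFormatException("not a digit: " + c);
            }
            // Rule 2: shift the result one decimal place, then add the digit's value.
            result = result * 10 + (c - '0');
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseDigits("123")); // 123
    }
}
```

For "123" the result goes 0 -> 1 -> 12 -> 123, one character at a time.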
As for CSV, the main rules are simply that each line in the input represents a row and that commas separate the values belonging to different columns/fields. Full CSV parsing also involves further parsing of each field into an appropriate data type.
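A small sketch of those two levels, rows and fields first and then each field into its own type, might look like this. The Person type, the name,age column layout, and the class name CsvRows are all assumptions made up for the example, and it only handles simple fields with no quoting.

```java
import java.util.ArrayList;
import java.util.List;

class CsvRows {
    // Hypothetical row type: one name column and one age column.
    static final class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Parses text where each line is a row of the form: name,age
    static List<Person> parse(String csv) {
        List<Person> people = new ArrayList<>();
        for (String line : csv.split("\n")) {          // rule: one line per row
            if (line.trim().isEmpty()) {
                continue;                              // skip blank lines
            }
            String[] fields = line.split(",");         // rule: commas separate fields
            String name = fields[0].trim();            // field stays text
            int age = Integer.parseInt(fields[1].trim()); // field parsed further into an int
            people.add(new Person(name, age));
        }
        return people;
    }

    public static void main(String[] args) {
        for (Person p : parse("Bob,34\nSally,29\n")) {
            System.out.println(p.name + " is " + p.age);
        }
    }
}
```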
I think it comes down to context and intent when naming a method "parse" or something else like "split" because technically, split is still parsing the string it is operating on.
For me, parsing is "assigning meaning to symbols". A symbol doesn't necessarily have to be a string of characters; you could also parse audio (which is what we do in our heads when we listen to somebody: we assign a meaning to waves of air pressure).
After parsing, you're left with something that can have meaning in one context, but not in another. You can parse it again to assign meaning to it using the extra context.
For instance, you can parse the string "1", so it means 'the number 1'. But 1 doesn't necessarily have a meaning by itself. If it's in the context of f(x) = 2x + 1, you can parse it again so it means 'the initial value in a linear function being 1'.
Junilu Lacar wrote:I see nothing wrong with the name "parse" to represent the behavior that it has in Integer and other classes that have a parse method.
Yes... there's no reason for me to insist that parsing must produce more than one data item. After all, one is a perfectly good number like any other so that rule would be nugatory. So I agree with you.