Win a copy of Mastering Non-Functional Requirements this week in the Design forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic

How to convert End Of line character of CSV file using java.  RSS feed

 
Puspender Tanwar
Ranch Hand
Posts: 451
2
Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Liutauras Vilda wrote:Who told you that? This is only how the raw file looks like, after insertion, it gets inserted in the DB and treated as 1 record, regardless if you have it on 3 different lines or you'd have on 1 line
Load file with lines 2, 3 and 4 as one liner ending with CRLF and you'll see it goes through and in DB looking exactly the same.
They don't look similar in DB. correctFile.csv multiline record first record seperated in 3 lines in DB also and that's the expected behaviour I guess.

Liutauras Vilda wrote:So, keep original file untouched after it gets downloaded and make copy of it, but fixed one. That fixing part needs to be happening upon file writing (as you don't want to fix them manually)
How can I inter change LF and CRLF?

Dave Tolls wrote:For the one that fails it has picked up LF as the terminator, so all lines are being treated as new records.

Yes I also understood this one. The file which loaded contains CRLF at complete ending of record. Now of I give INFILE ____  "str '\r\n' "  than SQL loader will look for CRLF at ending of each record and if it doesn't find it then continue on the next record.
This same happens if I don't provide any "str" parameter, as SQL loader uses the default EOL for windows i.e CRLF. Now if I try to apply same concept on the incorrect file whose complete records are ending with LF and incomplete with CRLF, and provide "str '\n' " then it fails because LF is in ending of every line (CRLF also contains LF in it) and then it consider every line as complete record. After that it looks for the CONTINUEIF LAST ' " ' parameter and that that result into uneven number of double quotes and hence throw the error second enclosure string not present.

Dave Tolls wrote:If so, how about trying CR ('\r') as the CONTINUEIF LAST instaed of the double quotes?
Tried this also, but I guess \n is not allowed in this parameter. I got the following error:
SQL*Loader-458: Comparison text ('
') of CONTINUEIF LAST must be non-whitespace





 
Dave Tolls
Ranch Hand
Posts: 2652
29
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
OK, but (if you have to work with this type of file, see below) then you'll need to play around with values for the continueif.
How about '"\r'?

However...you do need to tell us exactly what format the file comes out of the Salesforce API.
Since you have a version with the correct CRLF terminator, I assume that came from the API somehow.
If that's the case then that is what needs to be looked at...why (how) is the incorrect file being generated?
 
Campbell Ritchie
Sheriff
Posts: 54475
150
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Puspender Tanwar wrote:. . . I guess \n is not allowed in this parameter. I got the following error:
SQL*Loader-458: Comparison text ('
') of CONTINUEIF LAST must be non-whitespace
Don't guess. Find the details for the program and read them. Such an error message would probably exclude any line end characters from a CONTINUEIF marker since they all count as whitespace. You may have an insoluble problem that the SQL loader doesn't permit multi‑line Strings. I don't know how you are going to get those line end sequences in. You can try changing a solitary LF to \\n, but there is no warranty and it may not work.
 
Puspender Tanwar
Ranch Hand
Posts: 451
2
Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Dave Tolls wrote:How about '"\r'?

error : SQL*Loader-457: Comparison text of [b]CONTINUEIF LAST must have length 1 not 2[/b]

I think the best way to handle this is using INFILE ____ "str ''" parameter
Since the enclosing character of record is double quote (FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"' in ctl file), means this is very sure that the last character in every record will be double quote(other than EOL character).
So if their is any way we can provide "str '<doublequote>\n' " (combination of double quote and LF), the SQL loader will not get confused and load the data.
I tried below combination  :
str '"\n'  - invalid
str '"''\n' - (" and \n enclosed by single quote separately) SQL loader stopped working

Dave Tolls wrote:Since you have a version with the correct CRLF terminator, I assume that came from the API somehow.

Not actually, the incorrect.csv file I got from Salesforce api(I used their bulk api and in return got the data stream, then I write that data stream to csv file using the code which I pasted first in this thread). And the correct.csv I got from Data Loader tool. Data Loader is a tool provided by salesforce for getting/uploading records from salesforce.
 
Dave Tolls
Ranch Hand
Posts: 2652
29
  • Likes 2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
OK, then at this point I would go and ask Salesforce how this is supposed to work with sqlldr.

I can't imagine they haven't had to deal with this before.

Can you access the API other than through Java?
Say via Postman (assuming this is a web service)?
Just to get the raw data that is returned.
 
Liutauras Vilda
Marshal
Posts: 4111
227
BSD
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Puspender Tanwar wrote:multiline record first record seperated in 3 lines in DB also and that's the expected behaviour I guess.

Really?!?

The file which loaded contains CRLF at complete ending of record.

By this you contradict your first statement (upper quote).

Does the first record always split on three lines? Are there any other records like that? In my belief you just need to fix broken raw file. In my terminology it is broken from the point of view how it needs to go into database.

You get this file from whenever you get that (SalesForce API). Big question is if the file really contains some records which fall into multiple lines or you break it yourself during your code (very first post) execution.
What I'd do, I'd either fix or not break (in case you broke it) file during pre-process (once again, this is what you sort of doing with your very first post's code).

Your problem is 1st record, not the line endings or something. We have a proof of that. Of course you haven't tried to load first record as one liner to prove to yourself, but I'm almost sure this is how it supposed to be.

I used to work in a company where we used to receive thousands of IT vendors sales data from all around the Europe on daily basis and we had to process them and load on real time, so the vendors could see what they sell in Spain or Germany or anywhere else in the past hour or so. Many files were fine (followed guidelines/specification, nicely delimited, single liners), many were broken (as yours), hence they had to be fixed. Some of those could be fixed automatically with regular expressions as many of those followed some patterns, some of those were able to be fixed with written scripts (on the fly too), some of those had to be fixed manually, because didn't follow any logical patterns.

When you say you cannot modify record, well, you cannot mess around with original data file if it can be downloaded once and needs to be stored as it is (because it belongs to customer), but you supposed to be able to make a copy (you do that already by executing Java code) of it in the way you want to satisfy insertion as long as you don't modify data file content.
 
Dave Tolls
Ranch Hand
Posts: 2652
29
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Liutauras Vilda wrote:
Puspender Tanwar wrote:multiline record first record seperated in 3 lines in DB also and that's the expected behaviour I guess.

Really?!?



Seems reasonable to me.
Whoever entered the original data, which will be somewhere in Salesforce, put new lines in one of the fields.  That ought to be reflected in the transfer.

I'm not a fan of massaging data if it can be avoided.
There's no good reason that data in the column in the local Oracle table shouldn't have new line characters in it.
 
Liutauras Vilda
Marshal
Posts: 4111
227
BSD
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Dave Tolls wrote:Whoever entered the original data, which will be somewhere in Salesforce, put new lines in one of the fields.  That ought to be reflected in the transfer.

Not sure I understand your wording correctly, but yeah, I don't disagree with that. But into the table still needs to be inserted following its structure. This record, which lays down on lines 2, 3 and 4 is a singular record, which is meant to be inserted into the 4 table's fields. But not 3 records into random fields (this is what I thought user is claiming).
 
Liutauras Vilda
Marshal
Posts: 4111
227
BSD
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
IdIsDeletedCaseNumberDescription
5002800000MMzJUAA1false00001000Per customer-"The client is now thinking of buying vs remodeling - I'll keep you posted!"
 
Puspender Tanwar
Ranch Hand
Posts: 451
2
Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Dave Tolls wrote:Can you access the API other than through Java?Say via Postman (assuming this is a web service)?
Just to get the raw data that is returned.

I used the SOAP UI and the data returned is in multiline:


 
Liutauras Vilda
Marshal
Posts: 4111
227
BSD
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Dave, I think I misread your post slightly. I know what you mean. Agree with you.
 
Liutauras Vilda
Marshal
Posts: 4111
227
BSD
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Puspender Tanwar wrote:I used the SOAP UI and the data returned is in multiline:

Yeah. All trouble for you comes because of the quoted sentence, so it messes up loader.
 
Puspender Tanwar
Ranch Hand
Posts: 451
2
Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Liutauras Vilda wrote:Yeah. All trouble for you comes because of the quoted sentence, so it messes up loader.
Actually problem is not the double quote only, the problem is the combination of double quote and newline.

don't you want to make that data dummy looking?
Sorry Liutauras, but what does that mean ?
 
Dave Tolls
Ranch Hand
Posts: 2652
29
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
So if that's what comes from the API call, then there must be something in your code that translates XML to a CSV.

And that part probably ought to do things differently.
 
Puspender Tanwar
Ranch Hand
Posts: 451
2
Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Dave Tolls wrote:So if that's what comes from the API call, then there must be something in your code that translates XML to a CSV.

No, I get XML response when I sent request in XML using SOAP UI.
But actually I am using salesforce api to get the code. Salesforce has some methods that return us the data stream. Below is the class used to get the record stream.


Salesforce return the records as InputStream :

And then writeToCSVFile() is used to write that stream to CSV file. No XML in picture. Nothing I modified in data I got from salesforce
 
Liutauras Vilda
Marshal
Posts: 4111
227
BSD
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Liutauras Vilda wrote:don't you want to make that data dummy looking?

Puspender Tanwar wrote:Sorry Liutauras, but what does that mean ?

For the future, be careful exposing actual data. That may be sensitive and somebody can sue you, because of exposing it publicly.



 
Puspender Tanwar
Ranch Hand
Posts: 451
2
Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Liutauras Vilda wrote:For the future, be careful exposing actual data. That may be sensitive and somebody can sue you, because of exposing it publicly.

Actually that data is dummy only, I entered that in my salesforce account just for test purpose. Nothing sensitive is in that.
Thanks
 
Liutauras Vilda
Marshal
Posts: 4111
227
BSD
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Puspender Tanwar wrote:Actually that data is dummy only, I entered that in my salesforce account just for test purpose.

Here we go, new info, did you actually enter description as:


?

Trying to understand how that API works and what it gives you as streamed data.
 
Puspender Tanwar
Ranch Hand
Posts: 451
2
Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Liutauras Vilda wrote:Here we go, new info, did you actually enter description as:

Trying to understand how that API works and what it gives you as streamed data.

Yes, I entered single double quote in record. But salesforce enclose them using double quote while returning it. But in a weird way actually :
Per customer-" is enclosed by the salesforce api and returned as "Per customer-""
The client is now thinking of buying vs remodeling - I'll keep you posted!  is not enclosed by "
" is again enclosed by ".

Why only first and third rows are enclosed with " , why not the second line too. This is something weird.
multilineDescription.JPG
[Thumbnail for multilineDescription.JPG]
snapshot of salesforce
 
Liutauras Vilda
Marshal
Posts: 4111
227
BSD
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Puspender Tanwar wrote:Why only first and third rows are enclosed with " , why not the second line too. This is something weird.

That is because your file separator is comma, which distinct the columns. Commas as such, need to be identified too as not regular, but file/table column separators, hence they are enclosed in quotes (or fields enclosed in quotes, depends how you look at it, more correct would be say fields data enclosed in quotes).
Now, since you added your actual text in quotes, these quotes needs to be known as text quotes and not the field enclosing quotes, hence need to be escaped too.

And all that combination is too advance for your loader.

And might that wouldn't be so advance, but you putted record on 3 lines by having quotes on separate lines. Rather than having record as (by letting text field to do text formatting):


Of course user may enter by one character per row, and user is free to do so, but then on your side you need to have more sophisticated system to handle that.

Either to clean record (probably would need to consult with your line manager) if you are allowed... or something.
 
Liutauras Vilda
Marshal
Posts: 4111
227
BSD
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Actually probably we are still around the same place. I'd think you need to work on the part how csv file is written and I'm not sure about your loader (what it is and if you are allowed to do with him, i.e. improve).

Let's see what other contributors think.
 
Dave Tolls
Ranch Hand
Posts: 2652
29
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
SQL Loader is a bulk loading tool for Oracle.
The loader, therefore, is a pretty fixed entity.

For me?
Talk to Salesforce.
As I said earlier, they must have encountered something like this before.
 
Liutauras Vilda
Marshal
Posts: 4111
227
BSD
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Just for you better understand about what I said about those quotes.

In a raw file you have:

1. Very first quote on first line is field enclosing quote.
2. Pair of quotes at the end of the first line is the regular quote, which is escaped by another quote so the system would know it is a regular quote and need to be interpreted as regular text.
3. On line 3, first and second quotes are same as as at the end of line 1, regular quote which is escaped with quote to be treated as regular text.
4. On line 3, the last quote is the field enclosing quote.

5. And second line it is just text. Goes into the same database table field as other text in this text box, so nothing here.
 
Liutauras Vilda
Marshal
Posts: 4111
227
BSD
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Dave Tolls wrote:For me?
Talk to Salesforce.
As I said earlier, they must have encountered something like this before.

Maybe Dave, maybe.

But with a current knowledge about the problem, I'd say all records supposed to occupy at most one line. Meaning on the conversion part that needs to be ensured. That is what I can tell so far from my own experience how I saw multiple data.

And then if there is a need how that looks taken out from DB, some BI tool could handle that again according to some rules.
 
Dave Tolls
Ranch Hand
Posts: 2652
29
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Well, SQL Loader is set up to handle multi-row records.
That's what the continueif parameter is for.

The immediate issue here is that the way the data has been exported by the Salesforce Bulk API is not handling multi-row data in a way that SQL Loader can be told to handle it.
That's the mismatch, and I would say Salesforce is best placed to at least provide pointers to a solution.  Anything we can do here is little more than massaging the data that (IMO) ought to be OK in the first place.

Maybe they'll say that the Bulk API shouldn't be used for this sort of thing.
They seem to have half a dozen ways to get at the data...
 
Liutauras Vilda
Marshal
Posts: 4111
227
BSD
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You can also read their api docs, you may find something useful.
 
Puspender Tanwar
Ranch Hand
Posts: 451
2
Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Somehow I managed to load the incorrect.csv into oracle using the below .ctl file.


Explaining what changes I made and why.
1. str X'220A'    -> After looking at the csv file I noticed that every logical record is ending with "LF . So I told SQL loader that look for "LF at ending of each record and if it not their consider it as physical record and  search in the next physical record until it found "LF. The position where it find "LF, create a logical record using the previous physical record.
2. removed CONTINUEIF LAST !="    -> I don't even I have idea why it was there. Why  my team decided to place this. As if we look at the csv file send by salesforce, whenever a record is split, salesforce enclose that also
                                                         and that make us sure that " can be in physical record as well. So no use of this.
3. removed OPTIONALLY ENCLOSED BY '"'   -> since i was telling Loader that my logical record ending would be "LF , this statement was confusing Loader whether the last " is enclosing character or the EOL character ("LF).

Now after these changes the data loader, BUT this time all the double quotes got entered to the database because we have removed  OPTIONALLY ENCLOSED BY '"' which was telling Loader to skip loading the enclosing character(") and only load the valid record. So, even I managed to load the data, I failed again in inserting the valid data.

Now what I think the only resolution is : to read the Inputstream as bytes, and whenever CRLF is encountered, replace it with LF. And whenever LF  is encountered, replace it with CRLF.
Eagerly waiting for your expert suggestions. Thanks
 
Dave Tolls
Ranch Hand
Posts: 2652
29
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Or...talk to someone at Salesforce.

This really has the feel of you shoehorning a fix into something that may not actually need it.
I see that sort of thing far too often in this industry.  An unwillingness to talk to the people who may have the answer and to plow on with a code hack.
 
Liutauras Vilda
Marshal
Posts: 4111
227
BSD
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Puspender,

Let us know after you talk with SalesForce, so we know too, after all we got involved quite a bit, it is interesting to know what's on offer from them.
 
Puspender Tanwar
Ranch Hand
Posts: 451
2
Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Sure Liutauras,
I have posted the question on developer.salesforce community, hope I will get some response there.
Thanks Dave, Liutauras and everyone who contributed in this thread. I learned a lot about EOL characters and Data Loader.
 
Puspender Tanwar
Ranch Hand
Posts: 451
2
Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Sure Liutauras,
I have posted the question on developer.salesforce community, hope I will get some response there.
Thanks Dave, Liutauras and everyone who contributed in this thread. I learned a lot about EOL characters and Data Loader.
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!