
How to convert the end-of-line character of a CSV file using Java

I am using an API and getting a data stream, then saving that stream to a CSV file using Java.

Now the problem is that the end-of-line character in this CSV file is LF, but I want it to be CRLF (carriage return + line feed).
Is there any way I can do this?

Thanks
I know I can end each record with CRLF using:
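For instance, a minimal sketch along these lines (the class, method and file names are placeholders; the exact writer used here is an assumption):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.List;

public class CrlfWriter {
    // 'records' stands in for the lines received from the API.
    public static void writeCrlf(List<String> records, String path) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(path))) {
            for (String record : records) {
                writer.write(record);
                writer.write("\r\n"); // end every record with CRLF, regardless of platform
            }
        }
    }
}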


This will end every line with CRLF (for a Windows environment).
But the original stream has some lines ending with LF and some ending with CRLF; all I want is to swap them with each other.
Why?
I presume somebody has told you they want CRLF, which is the only justification for using a particular line end. In which case forget about buffered writers. Use a Formatter object.
myFileFormatter.format("%s\r\n", textOfLine);
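A minimal, self-contained sketch of that idea (the output file name and the sample record are just made-up examples):

import java.io.IOException;
import java.util.Formatter;

public class FormatterCrlf {
    public static void main(String[] args) throws IOException {
        try (Formatter myFileFormatter = new Formatter("out.csv")) {
            // Every record is written with an explicit CRLF, whatever the platform default is.
            myFileFormatter.format("%s\r\n", "\"5002800000MMzJVAA1\",\"false\",\"00001001\",\"\"");
        }
    }
}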

You would probably be better off using a special CSV writing program.
 

Stephan van Hulst wrote:Why?


Actually I am importing the CSV file into an Oracle DB using Oracle SQL*Loader, and there is a multi-line record. I have two CSV files containing exactly the same records, but they use different EOL characters.
The one that inserts successfully uses CRLF as the EOL character, and within the multi-line record the lines are separated by LF. Attached is a snapshot (correctFile.jpg).
The other file contains the same records but uses LF as the EOL character, with CRLF separating the lines of the multi-line record. For this file SQL*Loader gives me the error:
Record 1: Rejected - Error on table ODI_DEV_TARGET.CASE, column DESCRIPTION.
second enclosure string not present


Below is the .ctl file used for SQL*Loader:


Since the file that inserts successfully uses CRLF as the EOL character, I modified my .ctl file by changing the INFILE clause,
and with that change the file also loaded successfully.

So I thought that if passing "str '\r\n'" works for CRLF, then passing "str '\n'" for LF should work, but that's not the case.
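Roughly, the control file looks like this (a sketch reconstructed from the SQL*Loader log output; the exact OPTIONS values, data file name and clause layout are approximations), with the "str '\r\n'" string on INFILE being the change that made the CRLF file load:

OPTIONS (SKIP=1, ERRORS=1)
LOAD DATA
CHARACTERSET UTF8
INFILE 'csvfile.csv' "str '\r\n'"
CONTINUEIF LAST != '"'
INTO TABLE ODI_DEV_TARGET.CASE
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
  ID          CHAR "REPLACE(:ID,'<br>',chr(10))",
  ISDELETED   CHAR "CASE WHEN :ISDELETED='true' then 'T' ELSE 'F' END",
  CASENUMBER  CHAR "REPLACE(:CASENUMBER,'<br>',chr(10))",
  DESCRIPTION CHAR(30000)
)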

Attachment Case_correct.JPG: the file which inserted successfully
Attachment case_incorrect.JPG: the file which did not insert successfully
 

Puspender Tanwar wrote:Actually I am importing the CSV file to oracle DB using Oracle SQL Loader.  . . .

Please always tell us the full details; if you give incomplete details you will get the answer to the wrong question, which is what has happened here. Nobody is asking you for a file with CRLF. Since methods like readLine() remove the line end sequence anyway, you cannot tell which line end sequence a line uses. Also, a look at the documentation for readLine() should tell you that the line end is a red herring.
How do you know you are reading the input file correctly in the first place? What happens if you simply read it and count the lines?

I think the error has to do with how you are writing to the database. What happens if you insert the contents of a line into the database via the command line?

I shall add this discussion to our databases forum.
By the way: why has one file got four columns in it and the other three?
I am sorry about the details. Please delete this thread; I am creating another one in the Databases forum with full details.
Thanks
 

Puspender Tanwar wrote:. . . Now I have two CSV having exactly same records

Are you sure about that?

. . . second enclosure string not present . . .

Doesn't that error message give you any hints?

So I thought that for CRLF if passing "str '\r\n'" works, than passing the "str '\n'" for LF should work but that's not the case.

I thought earlier that the line ends were a red herring.
 

Campbell Ritchie wrote:Are you sure about that?


Yes. I am attaching new snapshots of the CSV files; sorry for the earlier snaps.

Campbell Ritchie wrote:Doesn't that error message give you any hints?


Yes, I understand that error: a double quote was opened but not closed before the EOL. But why doesn't that error come up when I load the same file (the first file, which uses a different EOL character)?

Attachment correctFile.jpg: the file that got inserted
Attachment incorrectFile.jpg: the file that did not get inserted
As Campbell says, the line endings are a red herring.

From your control file:
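Presumably the continuation clause, which judging by the SQL*Loader log reads roughly:

CONTINUEIF LAST != '"'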


That says to only continue reading (i.e. treat the following line as a continuation of the record) if the line doesn't end in a double quote.
But your example data does.

Now, since you have a file that does work, and that appears to have double quotes, could you copy and paste the record in question here from both files (not an image)?
Don't put it in code tags or anything.

I'm asking that because, and this could be something to do with image quality, the two images really don't look like they have the same quote glyphs.
Does the header have any significance in your application? They aren't the same.
The records at lines 2, 3 and 4 are also strange.
At line 2 there is one extra quote, which can escape the last quote; that would mean the last column isn't closed, hence missing.
Line 3 I don't get at all.
Line 4's problems look similar to line 2's, just with far fewer columns.

What are these?

The files now look different from what you posted first. As DT says, please post the actual text.
Sorry for the late response. Here are the files.
The file which got loaded:

"ID","ISDELETED","CASENUMBER","DESCRIPTION"
"5002800000MMzJUAA1","false","00001000","Per customer-""
The client is now thinking of buying vs remodeling - I'll keep you posted!
"""
"5002800000MMzJVAA1","false","00001001",""
"5002800000MMzJWAA1","false","00001002",""
"5002800000MMzJXAA1","false","00001003",""
"5002800000MMzJYAA1","false","00001004",""
"5002800000MMzJZAA1","false","00001005",""
"5002800000MMzJaAAL","false","00001006",""
"5002800000MMzJbAAL","false","00001007",""
"5002800000MMzJcAAL","false","00001008",""
"5002800000MMzJdAAL","false","00001009",""
"5002800000MMzJeAAL","false","00001010",""
"5002800000MMzJfAAL","false","00001011",""
"5002800000MMzJgAAL","false","00001012",""
"5002800000MMzJhAAL","false","00001013",""
"5002800000MMzJiAAL","false","00001014",""
"5002800000MMzJjAAL","false","00001015",""
"5002800000MMzJkAAL","false","00001016",""
"5002800000MMzJlAAL","false","00001017",""
"5002800000MMzJmAAL","false","00001018",""
"5002800000MMzJnAAL","false","00001019",""
"5002800000MMzJoAAL","false","00001020",""
"5002800000MMzJpAAL","false","00001021",""
"5002800000MMzJqAAL","false","00001022",""
"5002800000MMzJrAAL","false","00001023",""
"5002800000MMzJsAAL","false","00001024",""
"5002800000MMzJtAAL","false","00001025",""

The file which failed to load:

"Id","IsDeleted","CaseNumber","Description"
"5002800000MMzJUAA1","false","00001000","Per customer-""
The client is now thinking of buying vs remodeling - I'll keep you posted!
"""
"5002800000MMzJVAA1","false","00001001",""
"5002800000MMzJWAA1","false","00001002",""
"5002800000MMzJXAA1","false","00001003",""
"5002800000MMzJYAA1","false","00001004",""
"5002800000MMzJZAA1","false","00001005",""
"5002800000MMzJaAAL","false","00001006",""
"5002800000MMzJbAAL","false","00001007",""
"5002800000MMzJcAAL","false","00001008",""
"5002800000MMzJdAAL","false","00001009",""
"5002800000MMzJeAAL","false","00001010",""
"5002800000MMzJfAAL","false","00001011",""
"5002800000MMzJgAAL","false","00001012",""
"5002800000MMzJhAAL","false","00001013",""
"5002800000MMzJiAAL","false","00001014",""
"5002800000MMzJjAAL","false","00001015",""
"5002800000MMzJkAAL","false","00001016",""
"5002800000MMzJlAAL","false","00001017",""
"5002800000MMzJmAAL","false","00001018",""
"5002800000MMzJnAAL","false","00001019",""
"5002800000MMzJoAAL","false","00001020",""
"5002800000MMzJpAAL","false","00001021",""
"5002800000MMzJqAAL","false","00001022",""
"5002800000MMzJrAAL","false","00001023",""
"5002800000MMzJsAAL","false","00001024",""
"5002800000MMzJtAAL","false","00001025",""


Liutauras Vilda wrote:Does header have any significance in your application? They aren't same.


No, columns are not case sensitive.
I counted the double quotes as well, and they are the same in both files.
Why couldn't it be an EOL character issue?
Thank you for posting the text; I did a diff and the only difference is that the headings are in UPPER CASE in one file and lower case in the other. You have already confirmed that doesn't matter, since SQL is case‑insensitive. In view of the triple quote mark in line 4, I am surprised that either file can be correctly parsed into SQL.
I told you to look at the documentation for readLine() and that it removes the line ends, so LF versus CRLF is still a red herring.
 

Campbell Ritchie wrote:Thank you for posting the text; I did a diff and the only difference is that the headings are in UPPER CASE in one file and lower case in the other. You have already confirmed that doesn't matter, since SQL is case‑insensitive. In view of the triple quote mark in line 4, I am surprised that either file can be correctly parsed into SQL.
I told you to look at the documentation for readLine() and that it removes the line ends, so LF versus CRLF is still a red herring.


Hello Campbell,
readLine() is not in the picture now. I have these CSV files, and I am loading them into Oracle using the SQL*Loader command
sqlldr odi_dev_target/odi_dev_target@odidb52 control=ctlfile.ctl log=logfile.log
Java code does not come into the picture here; it only came into the picture when I created the CSV file. The question is why SQL*Loader behaves differently for different EOL characters.
Also, the CSV which is not loading (EOL LF) is created using Java code, but not with readLine(); it is created by the first code I posted in this thread, so readLine() is not in the picture.
The second CSV, which loads successfully, I downloaded from another application.
It is actually salesforce.com data: the first file I downloaded using their API and the second using the Data Loader tool.
Open that second file (which doesn't load) in Notepad++ or another editor, change those line endings manually, and please verify for us that the problem is indeed that.
It is taking too long to debug all this otherwise.
 

Liutauras Vilda wrote:Open that second file (which doesn't load) in notepad++ or other editor and change manually those line endings and please verify to us, that problem indeed is that.
It is taking too long to debug all that stuff.


In Notepad++ there is only an option to change the EOL characters according to platform (Windows, Unix, Mac), but that converts all of the EOL characters. I want to swap them: swap LF with CRLF and vice versa.
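If doing it from Java rather than an editor, one way to swap the two endings is to park the existing CRLFs on a placeholder first (a sketch only; the file names are placeholders):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SwapLineEndings {
    public static void main(String[] args) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get("incorrect.csv")), StandardCharsets.UTF_8);

        String swapped = text.replace("\r\n", "\0")  // park existing CRLFs on a NUL placeholder
                             .replace("\n", "\r\n")  // bare LF becomes CRLF
                             .replace("\0", "\n");   // parked CRLFs become LF

        Files.write(Paths.get("swapped.csv"), swapped.getBytes(StandardCharsets.UTF_8));
    }
}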
So, what is the problem? Do a replace over all records:
find: \n
replace with: \r\n

Then you end up with the 2nd and 3rd lines of that record, which need to be in sync with the first file, meaning they should end with LF; you can change those with a regular expression too.

Load it and see if it goes through.
 

Liutauras Vilda wrote:And then you end up with 2 and 3rd line records which need to be in sync with first file, meaning have LF, which you can using regular expression change too.


How do I change CRLF to LF using a regex?
After replacing LF with CRLF using Notepad++, I now have the file below (attached). But in order to sync it with the first file (correct.csv), I need to convert the EOL characters of the 2nd and 3rd lines to LF. How can I do that?

Attachment incorrect_change.JPG: CRLF at every line's ending
Take it.
Attachment: for_puspender.csv (1 KB)
After that, try to load your version with all CRLF's and tell us what happens.
 

Liutauras Vilda wrote:After that, try to load your version with all CRLF's and tell us what happens.


The CSV you attached loaded successfully; attached is the logFile.log. Could you please tell me how you changed CRLF to LF?


SQL*Loader: Release 11.2.0.1.0 - Production on Mon Jul 3 19:11:12 2017

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

Control File:   ctlfile.ctl
Character Set UTF8 specified for all input.

Data File:      C:\Users\puspender.tanwar\Desktop\try_SQLLDR\for_puspender.csv
 Bad File:     C:\Users\puspender.tanwar\Desktop\try_SQLLDR\badfile.bad
 Discard File: C:\Users\puspender.tanwar\Desktop\try_SQLLDR\DSCfile.dsc
(Allow all discards)

Number to load: ALL
Number to skip: 1
Errors allowed: 1
Bind array:     64 rows, maximum of 256000 bytes
Continuation:   Last non-white character != 0X22(character '"')
               Preserving continuation characters as data
Path used:      Conventional

Table ODI_DEV_TARGET.CASE, loaded from every logical record.
Insert option in effect for this table: INSERT
TRAILING NULLCOLS option in effect

  Column Name                  Position   Len  Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
ID                                  FIRST     *   ,  O(") CHARACTER            
   SQL string for column : "REPLACE(:ID,'<br>',chr(10))"
ISDELETED                            NEXT     *   ,  O(") CHARACTER            
   SQL string for column : "CASE WHEN :ISDELETED='true' then 'T' ELSE 'F' END"
CASENUMBER                           NEXT     *   ,  O(") CHARACTER            
   SQL string for column : "REPLACE(:CASENUMBER,'<br>',chr(10))"
DESCRIPTION                          NEXT 30000   ,  O(") CHARACTER            

value used for ROWS parameter changed from 64 to 8

Table ODI_DEV_TARGET.CASE:
 26 Rows successfully loaded.
 0 Rows not loaded due to data errors.
 0 Rows not loaded because all WHEN clauses were failed.
 0 Rows not loaded because all fields were null.


Space allocated for bind array:                 246208 bytes(8 rows)
Read   buffer bytes: 1048576

Total logical records skipped:          1
Total logical records read:            26
Total logical records rejected:         0
Total logical records discarded:        0

Run began on Mon Jul 03 19:11:12 2017
Run ended on Mon Jul 03 19:11:13 2017

Elapsed time was:     00:00:00.50
CPU time was:         00:00:00.03

 

Liutauras Vilda wrote:After that, try to load your version with all CRLF's and tell us what happens.


Sorry, in my last reply I quoted the wrong post; that reply was about the CSV you attached for me.
Below is the result for the file in which all records end with CRLF:
The first record got skipped and the others got inserted. Below is the logFile.log for the file containing CRLF for every record:


SQL*Loader: Release 11.2.0.1.0 - Production on Mon Jul 3 19:19:36 2017

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

Control File:   ctlfile.ctl
Character Set UTF8 specified for all input.

Data File:      C:\Users\puspender.tanwar\Desktop\try_SQLLDR\csvfile3.csv
 Bad File:     C:\Users\puspender.tanwar\Desktop\try_SQLLDR\badfile.bad
 Discard File: C:\Users\puspender.tanwar\Desktop\try_SQLLDR\DSCfile.dsc
(Allow all discards)

Number to load: ALL
Number to skip: 1
Errors allowed: 1
Bind array:     64 rows, maximum of 256000 bytes
Continuation:   Last non-white character != 0X22(character '"')
               Preserving continuation characters as data
Path used:      Conventional

Table ODI_DEV_TARGET.CASE, loaded from every logical record.
Insert option in effect for this table: INSERT
TRAILING NULLCOLS option in effect

  Column Name                  Position   Len  Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
ID                                  FIRST     *   ,  O(") CHARACTER            
   SQL string for column : "REPLACE(:ID,'<br>',chr(10))"
ISDELETED                            NEXT     *   ,  O(") CHARACTER            
   SQL string for column : "CASE WHEN :ISDELETED='true' then 'T' ELSE 'F' END"
CASENUMBER                           NEXT     *   ,  O(") CHARACTER            
   SQL string for column : "REPLACE(:CASENUMBER,'<br>',chr(10))"
DESCRIPTION                          NEXT 30000   ,  O(") CHARACTER            

value used for ROWS parameter changed from 64 to 8
Record 1: Rejected - Error on table ODI_DEV_TARGET.CASE, column DESCRIPTION.
second enclosure string not present
Record 2: Rejected - Error on table ODI_DEV_TARGET.CASE, column ID.
ORA-12899: value too large for column "ODI_DEV_TARGET"."CASE"."ID" (actual: 77, maximum: 50)


MAXIMUM ERROR COUNT EXCEEDED - Above statistics reflect partial run.

Table ODI_DEV_TARGET.CASE:
 0 Rows successfully loaded.
 2 Rows not loaded due to data errors.
 0 Rows not loaded because all WHEN clauses were failed.
 0 Rows not loaded because all fields were null.


Space allocated for bind array:                 246208 bytes(8 rows)
Read   buffer bytes: 1048576

Total logical records skipped:          1
Total logical records read:             9
Total logical records rejected:         2
Total logical records discarded:        0

Run began on Mon Jul 03 19:19:36 2017
Run ended on Mon Jul 03 19:19:37 2017

Elapsed time was:     00:00:00.25
CPU time was:         00:00:00.07


"5002800000MMzJUAA1","false","00001000","Per customer-""
The client is now thinking of buying vs remodeling - I'll keep you posted!
"""

got skipped and went to the badFile.bad


Alright, that bit is clear now. So lines 2, 3 and 4 are actually one record, and those extra quotes are escaping quotes so that the sentence is enclosed in quotes (this is what I said at the beginning); so after all it has four columns too, and this record ends with CRLF (line 4) like every other record.

Why are you writing that record on 3 separate lines?

 

Puspender Tanwar wrote:"5002800000MMzJUAA1","false","00001000","Per customer-""
The client is now thinking of buying vs remodeling - I'll keep you posted!
"""
got skipped and gone to the badFile.bad


Yes, because these were treated as 3 records, where:
the 1st has 4 columns + 1 loose quote (a broken record);
the 2nd has just text not enclosed in quotes, and even if it were, another 3 columns would be missing, so it would need to be "sentence","","","" (again, a broken record);
the 3rd has 1 column and 1 loose quote (a broken record).

In my opinion you need to work on the file-writing phase, either to get lines 2, 3 and 4 onto one line, or to get LF line endings for those lines, so that during the load the program knows that lines 2, 3 and 4 are the same record.

Let us see what the others here think and can suggest to you.

 

Liutauras Vilda wrote:Alright, clear what is that bit. So the lines 2 and 3 and 4 are actually 1 record, and those excessive quotes are escaping some quotes in order to have sentence enclosed in quotes (this is what I said at the beginning), so after all, it has 4 columns too and this record ends with CRLF (line 4) as every other record.

Why are you writing that record on 3 separate lines?


Yes, lines 2, 3 and 4 are a single record. I am not writing these records myself; I am using the Salesforce.com API and I get the data only in this format (the incorrect file). I am not supposed to change these multi-line records to a single line; I need to import this CSV format to Oracle as it is.

A comma (,) is used as the separator and a double quote (") is used as the enclosing character for the data.
 

Puspender Tanwar wrote:I am not writing these records, I am using the Salesforce.com Api and I get the data in this format only(the incorrect file).


A dodgy API then, because it gives you a file which is already broken. Actually, I need to take that back in light of Dave's recent findings.

Puspender Tanwar wrote:I am not supposed to change these multiline records to single line


Who told you that? That is only how the raw file looks; after the load it is stored in the DB and treated as one record, regardless of whether you have it on 3 different lines or on 1 line.

Load the file with lines 2, 3 and 4 as a one-liner ending with CRLF and you'll see it goes through and looks exactly the same in the DB.
Your end goal, as far as I understand, is to load the data into the DB after it is downloaded from the Salesforce CRM system.

So keep the original file untouched after it is downloaded and make a fixed copy of it. That fixing needs to happen when the file is written (as you don't want to fix the files manually).
Right, just gone through the docs, and this is the key part (https://docs.oracle.com/cd/B28359_01/server.111/b28319/ldr_concepts.htm#i1004685, the bit under Stream Record Format):

On Windows NT, if no terminator_string is specified, then SQL*Loader uses either \n or \r\n as the record terminator, depending on which one it finds first in the datafile. This means that if you know that one or more records in your datafile has \n embedded in a field, but you want \r\n to be used as the record terminator, you must specify it.



So, for the file that loads, sqlldr has selected CR/LF as the terminator.
Consequently the "multiline" record is actually no such thing: it is treated as a single record and the CONTINUEIF is never used... an LF is simply not recognised as the end of a record.

For the one that fails it has picked up LF as the terminator, so all lines are being treated as new records.
This implies that your CONTINUEIF is not working.
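In control-file terms the difference is just the terminator string on INFILE (the file name here is taken from the log; treat this as a sketch):

-- no terminator string: SQL*Loader picks \n or \r\n, whichever it finds first in the datafile
INFILE 'csvfile3.csv'

-- force CRLF as the record terminator, so embedded LFs stay inside the record
INFILE 'csvfile3.csv' "str '\r\n'"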
So, it's the continueif.
For the version with LF as the record terminator, does the in-record new line always come out as CRLF?

If so, how about trying CR ('\r') as the CONTINUEIF LAST character instead of the double quote?
Actually, am I reading the first few posts correctly?

Is the data coming from the Salesforce.com API you call correct?
That is, does it leave Salesforce with the "correct" mix of CRLF and LF?
 

Dave Tolls wrote:So, it's the continueif.
For the version with LF as the record terminator, does the in-record new line always come out as CRLF?

If so, how about trying CR ('\r') as the CONTINUEIF LAST instaed of the double quotes?


Yeah, he got a very unfortunate first record, because the sentence is enclosed in actual quotes, hence the CONTINUEIF misbehaving.
It's not really misbehaving.  It's doing what's asked of it.


If I read the earlier parts of the thread correctly, the Salesforce API is producing the correct output (CRLF terminators, with LF used inside multiline values), but the Java code is misinterpreting it?
That would make sense on their part, as it "solves" the import side...at least for Windows machines.  Wonder what they do for *nixes?
Anyway, I'll have to wait for the OP to come back to clarify that part.