Hi,
I have a Unix command as follows:
Command : nawk -F, '{x=$4;gsub(" ", "", x);print >x".dat"}' $proj_file
The file passed to it is a .txt file with over 20 columns and over a million rows.
What the command does is described below.
It does the job, but it takes a long time,
so I am considering doing this in Java.
How do I achieve the same thing?
Description of the unix command:
I'm reading a file $proj_file (a variable set elsewhere in my script).
-F, sets "," as the field delimiter
x=$4 assigns the value of the fourth comma-delimited field to the variable "x"
gsub performs a global substitution against the string assigned to x. Basically it eliminates all spaces by changing them to the empty string.
print without further specification writes out the complete input record
> redirects to an output file
x".dat" is the output file, whose name is constructed from the variable "x" (remember: the fourth field of your input record, but with spaces eliminated) and a suffix ".dat".
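The field extraction and file name construction described above could be sketched in Java roughly like this. It is only a minimal sketch: the class name OutputName, the method outputName and the sample record in main are made up for illustration, and it assumes that a record with fewer than four fields should yield an empty "x" (which is what awk gives for a missing $4).

```java
class OutputName {

    // Mirrors: x=$4; gsub(" ", "", x); ... x".dat"
    static String outputName(String line) {
        // Split on every comma; the -1 limit keeps trailing empty fields.
        String[] fields = line.split(",", -1);
        // awk fields are 1-based, so $4 is index 3; a missing field is treated as empty.
        String x = fields.length >= 4 ? fields[3] : "";
        // Remove every space, then append the suffix.
        return x.replace(" ", "") + ".dat";
    }

    public static void main(String[] args) {
        // Hypothetical record, only for illustration.
        System.out.println(outputName("a,b,c, New York ,e"));
    }
}
```

Note that String.split takes a regular expression; a plain "," has no special meaning there, so it behaves like the single-character delimiter set by -F, in the awk command.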
Let's say I have an input record like this:
123, 456, some more, this is it, rest of, record
My statement will write the complete input line to a file named "thisisit.dat".
Each line whose fourth field is "this is it" will go to the same output file (awk truncates the file the first time it opens it, then appends to it for the rest of the run),
and lines with other data in the fourth field will go to other output files according to the name construction rules above.
In other words, the command distributes the content of an input file to one or several output files, depending on the content of a particular column in the input data.
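One possible way to do the same distribution in Java is to read the input once and keep one buffered writer open per distinct fourth-field value. The following is a minimal sketch under some assumptions, not a tuned implementation: the class name SplitByFourthField, the method split and the outDir parameter are made up for illustration, the input is assumed to be UTF-8 text, and the number of distinct fourth-field values is assumed to stay below the operating system's limit on open files.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

class SplitByFourthField {

    // Writes every line of 'input' to <fourth field without spaces>.dat inside 'outDir'.
    static void split(Path input, Path outDir) throws IOException {
        // One open writer per output file, keyed by the space-stripped fourth field.
        Map<String, BufferedWriter> writers = new HashMap<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = line.split(",", -1);
                String key = (fields.length >= 4 ? fields[3] : "").replace(" ", "");
                BufferedWriter out = writers.get(key);
                if (out == null) {
                    // By default this creates the file or truncates an existing one,
                    // like awk's ">" does the first time it opens a file.
                    out = Files.newBufferedWriter(outDir.resolve(key + ".dat"), StandardCharsets.UTF_8);
                    writers.put(key, out);
                }
                out.write(line);   // the complete input record, unchanged
                out.newLine();
            }
        } finally {
            // Closing flushes the buffered data. If one close() throws, the remaining
            // writers are not closed; acceptable for a sketch only.
            for (BufferedWriter w : writers.values()) {
                w.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SplitByFourthField <input file>");
            return;
        }
        // Output files go to the current directory, as with the awk command.
        split(Paths.get(args[0]), Paths.get("."));
    }
}
```

Whether this is actually faster than nawk depends on the data and the machine, so it is worth timing both on the real file. If the fourth column can take thousands of distinct values, the one-writer-per-file approach would need a cap on open writers (closing and later reopening files in append mode), which this sketch does not do.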