Zero-Padding a CSV with AWK
This was purely out of sheer need, and this was the fastest way I could’ve gotten it done (I ended up learning a LOT about CLI and the awk
command from this, so I’m really grateful for that)
The problem: I have a column in a utf-8
CSV file of type Integer, which should actually be type string and zero-padded up to (let’s say length N).
~/projects/awk_pad ❯ cat out.csv
a,Y,1
b,N,10
c,Y,12223253
What I want, is the following (output from the cat
tool):
~/projects/awk_pad ❯ cat out_clean.csv
a,Y,00000000001
b,N,00000000010
c,Y,00012223253
In order to accomplish the above, the following was used:
~/p/awk_pad ❯ awk -F ',' '{OFS=","};{print $1, $2, sprintf("%011d", $3)}' out.csv > out_clean.csv
Let’s break this down, the awk
command is a typical *NIX tool, which according to the manual page is short for gawk
, a “pattern scanning and processing language”. I’m not going into the interals, since it’s far too detailed; instead, I’d cover the above command alone.
The general syntax is as follows:
awk options 'selection _criteria {action }' input-file > output-file
Firstly, -F ','
is an option used to tell awk
that the input file is comma-separated. This allows accesses of line-elements by the $
(i.e., $2
would access the 2nd element of every line), since awk
works on a line-by-line basis.
The first part of the of the action is {OFS=","}
, which tells awk
that all arguments must be separated by the comma in output.
The second part of the action is {print $1, $2, sprintf("%011d", $3)}
. This tells awk to output the first and second arguments (think first and second columns of the CSV file), followed by a zero-padded version of the third argument (column).
The %011d
in this case says “print with precision of 11”, which ensures that the outpu is ALWAYS length eleven, anf if not is instead zero-padded. If the zero were replaced by a blank space " “, the resulting would have been space-padded strings.