Friday, November 27, 2015

Unix - AWK Command by examples. Part 1

AWK command is one of the most powerful tools in Unix used for processing the rows and columns in a file. AWK has built in string functions and associative arrays and supports all functionalities of C language. We can also convert AWK scripts into Perl scripts using a2p utility, an added advantage of AWK.

awk 'BEGIN {start_action} {action} END {stop_action}' file

The actions in the begin block are performed before processing the file.
The actions in the end block are performed after processing the file.
The rest of the actions are performed while processing the file.

Create a file test_file with the following data. This file can be easily created using the output of ls -l.
-rw-rw-rw- 1 unixrun unixrun   0 Nov  26 21:39 AAAA
-rw-rw-rw- 1 unixrun unixrun 175 Nov  26 21:15 BBBB
-rw-rw-rw- 1 unixrun unixrun 430 Nov  26 21:39 EEEE
-rw-rw-rw- 1 unixrun unixrun 428 Nov  26 21:39 GGGG

Example 1. awk '{print $1}' test_file

$1, $2, $3... represents the first, second, third column,.. in a row respectively.
Here, awk command will print the first column in each row as shown below.

The variable "$0" refers to the entire line that AWK reads in. That is, if you had nine fields in a line,
print $0
is similar to
print $1, $2, $3, $4, $5, $6, $7, $8, $9

-rw-rw-rw-
-rw-rw-rw-
-rw-rw-rw-
-rw-rw-rw-

To print the 2th and 5th columns in a file use awk '{print $2,$5}' test_file

Example 2. awk 'BEGIN {cnt=0} {cnt=cnt+5+$3} END {print cnt}' test_file

This will prints the sum of the value as  (value of 3th column + 5 ).
In the Begin block the cnt variable is assigned with value 0.
In the next block the v (value of 3rd column + 5) is added to cnt variable.
In addition of this, same scenario is repeated for no of rows in a file.
When all the rows are processed the cnt variable will hold the sum of the values as per our formula (value of 3rd column + 5).
Finally, the output printed in the End block.

We can perform the above steps in another way also - by creating command file.

vi summation.awk
#!/usr/bin/awk -f
BEGIN {cnt=0}
{cnt=cnt+$3+5}
END {print cnt}
:wq

This will create a file named as summation.awk, it will be executed by using awk command as below:

awk -f summation.awk test_file.

This will run the script in summation.awk file and displays the sum as per our formula (sum of the 3rd column + 5 ) in the test_file.

Example 3. awk '{ if($5 == "time") print $0;}' test_file

Above command is used for to find a particular string in a file in a specific column.
Here, it searches the string "time" in the 5th column  of the file and prints entire line in output.

-rw-r--r-- 1 unixrun unixrun 43 Nov  26 21:39 time

Example 4. awk 'BEGIN { for(i=1;i<=4;i++) print "Cube of", i, "is",i*i*i; }'

This will print the Cube of numbers from 1 to 4, The output of the command is

Cube of 1 is 1
Cube of 2 is 8
Cube of 3 is 27
Cube of 4 is 64