Formatting Data Using `awk`

1. Getting to Know awk

awk is a powerful text-processing utility. If you think of a text file as a table, where the lines are rows (makes sense, right?) and the columns are values separated by spaces or tabs, then awk becomes your best buddy. It’ll help you quickly select columns, filter rows, format data, and even perform arithmetic operations.

It’s named after its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan (yep, the same Kernighan who co-authored the classic book The C Programming Language).

The main idea behind awk is processing data using patterns and actions, which makes it kind of like a mini programming language.

Syntax of awk

The basic syntax of the command looks like this:

awk 'pattern {action}' file
  • pattern: the condition checked for each line in the file.
  • action: the operations performed on lines matching the pattern.
  • If there’s no pattern, the action is performed on every line. If there’s no action, each matching line is printed unchanged.

Example:

awk '{print $1}' data.txt

This command will print the first column ($1) for every line in the data.txt file.
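
To make the pattern/action split concrete, here is a small sketch. It assumes data.txt holds the three sample rows (John, Jane, Mike) used throughout this lesson and recreates that file first; the shell variable names are just for illustration.

```shell
# Recreate the sample file used in this lesson.
printf 'John 25 Engineer\nJane 30 Designer\nMike 28 Developer\n' > data.txt

# Pattern only: matching lines are printed whole (the default action is print).
pattern_only=$(awk '$2 > 27' data.txt)

# Action only: the action runs for every line.
action_only=$(awk '{print $1}' data.txt)

# Pattern and action together: the action runs only for matching lines.
both=$(awk '$2 > 27 {print $1}' data.txt)

printf '%s\n---\n%s\n---\n%s\n' "$pattern_only" "$action_only" "$both"
```

The first command prints the full Jane and Mike rows, the second prints all three names, and the third prints only Jane and Mike.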


2. Basic Features of awk

1. Selecting Columns

The simplest way to use awk is to select one or more columns from a file. A field is represented as $n, where n is the column number, starting from 1.

Example:

Print the first and third columns:

awk '{print $1, $3}' data.txt

Suppose our file data.txt looks like this:

John 25 Engineer
Jane 30 Designer
Mike 28 Developer

Result:

John Engineer
Jane Designer
Mike Developer
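
A few related variations on column selection, as a sketch on the same sample file. NF (the number of fields in the current line) and $NF (the last field) are built-in to awk; the shell variable names are only illustrative.

```shell
# Recreate the sample file used in this lesson.
printf 'John 25 Engineer\nJane 30 Designer\nMike 28 Developer\n' > data.txt

# Columns can be printed in any order.
swapped=$(awk '{print $3, $1}' data.txt)

# Without the comma, the values are concatenated with nothing in between.
glued=$(awk '{print $1 $3}' data.txt)

# NF is the field count of the current line; $NF is its last field.
last=$(awk '{print NF, $NF}' data.txt)

printf '%s\n---\n%s\n---\n%s\n' "$swapped" "$glued" "$last"
```

The comma in print matters: it inserts the output separator (a space by default), which is why the second command produces JohnEngineer rather than John Engineer.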

2. Conditional Row Processing

Conditions allow you to process only those rows that meet specific criteria.

Example:

Print rows where the value in the second column is greater than 27:

awk '$2 > 27 {print $1, $2}' data.txt

Result:

Jane 30
Mike 28
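
Conditions can also be combined. The sketch below, again on the sample data.txt, uses && (AND) for a numeric range and || (OR) with string comparison; the thresholds are arbitrary values picked for the example.

```shell
# Recreate the sample file used in this lesson.
printf 'John 25 Engineer\nJane 30 Designer\nMike 28 Developer\n' > data.txt

# Numeric range: age between 26 and 29 inclusive.
in_range=$(awk '$2 >= 26 && $2 <= 29 {print $1, $2}' data.txt)

# String equality on the third column, either of two values.
by_job=$(awk '$3 == "Engineer" || $3 == "Designer" {print $1}' data.txt)

printf '%s\n---\n%s\n' "$in_range" "$by_job"
```

Here the first command selects only Mike 28, and the second selects John and Jane.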

3. Arithmetic Operations

awk can perform arithmetic operations. This is helpful when you need to calculate something on the fly.

Example:

Add 10 to the value in the second column:

awk '{print $1, $2+10}' data.txt

Result:

John 35
Jane 40
Mike 38
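
Other operators work the same way. As a small sketch on the sample file, the command below prints each age in months and the decade it falls into; int() is a built-in awk function that truncates toward zero, and the two derived columns are just illustrative calculations.

```shell
# Recreate the sample file used in this lesson.
printf 'John 25 Engineer\nJane 30 Designer\nMike 28 Developer\n' > data.txt

# Column 2 times 12, and column 2 rounded down to a multiple of 10.
derived=$(awk '{print $1, $2 * 12, int($2 / 10) * 10}' data.txt)

printf '%s\n' "$derived"
```

For John this gives 300 and 20, for Jane 360 and 30, and for Mike 336 and 20.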

4. Counting Rows

awk keeps count of the rows (records) it has read so far in the built-in variable NR (Number of Records). Inside an END block, which runs after the last line, NR holds the total.

Example:

Count the number of rows in a file:

awk 'END {print NR}' data.txt

Result:

3
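
Because NR is updated for every line, it is also usable outside END. A minimal sketch on the sample file: numbering each line, and picking one line by its number.

```shell
# Recreate the sample file used in this lesson.
printf 'John 25 Engineer\nJane 30 Designer\nMike 28 Developer\n' > data.txt

# Prefix every line with its line number ($0 is the whole line).
numbered=$(awk '{print NR ": " $0}' data.txt)

# Use NR as a pattern to print only the second line.
second=$(awk 'NR == 2' data.txt)

printf '%s\n---\n%s\n' "$numbered" "$second"
```

The same idea (for instance NR > 1) is a common way to skip a header row, assuming the file has one.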

3. Advanced Features

1. Output Formatting

awk supports powerful formatted output using the printf function. This is similar to the printf function in C.

Example:

Display data with alignment:

awk '{printf "%-10s %-5s %-10s\n", $1, $2, $3}' data.txt

Output:

John       25    Engineer  
Jane       30    Designer  
Mike       28    Developer 
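
As a sketch of a slightly fuller report, the command below adds a header row in a BEGIN block (which runs before any input is read) and right-aligns the numeric column with %5d. The header labels and column widths are choices made for this example, not something the sample data prescribes.

```shell
# Recreate the sample file used in this lesson.
printf 'John 25 Engineer\nJane 30 Designer\nMike 28 Developer\n' > data.txt

# %-10s left-aligns text in 10 characters; %5s and %5d right-align in 5.
report=$(awk 'BEGIN {printf "%-10s %5s  %s\n", "Name", "Age", "Job"}
              {printf "%-10s %5d  %s\n", $1, $2, $3}' data.txt)

printf '%s\n' "$report"
```

Right-aligning numbers keeps their digits lined up under the header, which tends to read better than left-aligned numbers once values differ in length.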

2. Variables

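You can use variables to store data and simplify operations. Variables don’t need to be declared: an unset variable acts as 0 in arithmetic and as an empty string in text, so a running total can start with no setup.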

Example:

Calculate the sum of the second column:

awk '{sum += $2} END {print "Total Age:", sum}' data.txt

Output:

Total Age: 83
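
Variables can hold more than one running value. The sketch below computes the average age and remembers the oldest person on the sample file; the variable names sum, max, and oldest are arbitrary, and the NR > 0 check is there only to avoid dividing by zero on an empty file.

```shell
# Recreate the sample file used in this lesson.
printf 'John 25 Engineer\nJane 30 Designer\nMike 28 Developer\n' > data.txt

# Accumulate the sum and track the largest value seen so far.
stats=$(awk '{sum += $2; if ($2 > max) {max = $2; oldest = $1}}
             END {if (NR > 0) printf "Average: %.2f\nOldest: %s (%d)\n", sum / NR, oldest, max}' data.txt)

printf '%s\n' "$stats"
```

With the sample data this prints an average of 27.67 and reports Jane (30) as the oldest.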

3. Regular Expressions

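awk supports regular expressions for selecting lines. A bare /regex/ pattern is tested against the whole line; to test a single column, use the ~ operator.

Example:

Display lines where the first column contains the letter J:

awk '$1 ~ /J/ {print $0}' data.txt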

Output:

John 25 Engineer
Jane 30 Designer
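
A few more regular-expression forms, sketched on the same sample file: ~ tests one field, !~ negates the test, ^ anchors the match to the start of the field, and a bare /regex/ is checked against the whole line.

```shell
# Recreate the sample file used in this lesson.
printf 'John 25 Engineer\nJane 30 Designer\nMike 28 Developer\n' > data.txt

# First column starts with J.
starts_with_j=$(awk '$1 ~ /^J/ {print $1}' data.txt)

# Third column does NOT start with "De".
not_de=$(awk '$3 !~ /^De/ {print $1, $3}' data.txt)

# Whole line contains a 2 followed by any digit.
twenties=$(awk '/2[0-9]/' data.txt)

printf '%s\n---\n%s\n---\n%s\n' "$starts_with_j" "$not_de" "$twenties"
```

These select John and Jane, then John Engineer, then the John and Mike rows.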

4. Practical Example

1. Analyzing a System Log

Let's say we have a system log file /var/log/syslog, and we want to find out which processes are mentioned most often.

Command:

cat /var/log/syslog | awk '{print $5}' | sort | uniq -c | sort -nr | head -10

What it does:

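  1. cat /var/log/syslog: reads the contents of the file.
  2. awk '{print $5}': extracts the fifth column, which holds the process name (often with a [PID] suffix) in the traditional syslog line format.
  3. sort: sorts the lines so that identical entries end up next to each other.
  4. uniq -c: collapses each run of identical lines into one line prefixed with its count.
  5. sort -nr: sorts the counted lines in descending numeric order.
  6. head -10: displays the first 10 lines, i.e. the 10 most frequent entries.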
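
One thing to watch for with the pipeline above: if the fifth field carries a [PID] suffix, entries such as cron[101]: and cron[103]: count as different values. The sketch below does the counting inside awk with an associative array and strips that suffix first. It runs on a small made-up file, sample.log, written in the traditional "Mon DD HH:MM:SS host process[pid]:" layout; real logs on your system may use a different layout, in which case the field number would need adjusting.

```shell
# Made-up sample lines in the traditional syslog layout (not real log output).
cat > sample.log <<'EOF'
Jan 10 10:00:01 host1 cron[101]: job started
Jan 10 10:00:02 host1 sshd[202]: session opened
Jan 10 10:00:03 host1 cron[103]: job started
Jan 10 10:00:04 host1 kernel: device ready
Jan 10 10:00:05 host1 cron[105]: job finished
Jan 10 10:00:06 host1 sshd[206]: session closed
EOF

# Strip "[pid]..." and a trailing ":" from field 5, then tally per name.
# The order of "for (name in count)" is unspecified, so sort afterwards.
top=$(awk '{name = $5; sub(/\[.*/, "", name); sub(/:$/, "", name); count[name]++}
           END {for (name in count) print count[name], name}' sample.log |
      sort -k1,1nr -k2 | head -10)

printf '%s\n' "$top"
```

On the sample file this reports 3 for cron, 2 for sshd, and 1 for kernel.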

2. Generating a Salary Report

We have a file salaries.txt:

John 25 4000
Jane 30 5000
Mike 28 4500

Task:

Increase salaries by 10% and display the final report.

Solution:

awk '{new_salary = $3 * 1.1; printf "%-10s %-5s %-10.2f\n", $1, $2, new_salary}' salaries.txt

Result:

John       25    4400.00
Jane       30    5500.00
Mike       28    4950.00
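
The report can be extended with a header and a total row by combining BEGIN, a per-line action, and END. This is a sketch on the same salaries.txt; the column labels, widths, and the total variable are choices made for the example.

```shell
# Recreate the salary file from this lesson.
printf 'John 25 4000\nJane 30 5000\nMike 28 4500\n' > salaries.txt

# BEGIN prints the header, the main action prints and accumulates, END prints the total.
report=$(awk 'BEGIN {printf "%-10s %10s\n", "Name", "New salary"}
              {new_salary = $3 * 1.1; total += new_salary
               printf "%-10s %10.2f\n", $1, new_salary}
              END {printf "%-10s %10.2f\n", "Total", total}' salaries.txt)

printf '%s\n' "$report"
```

The total row comes out as 14850.00, the sum of the three raised salaries.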

5. Common Mistakes When Working with awk

Issues with Delimiters

By default, awk splits each line on runs of spaces or tabs. If your data is separated by something else (like commas or colons), you need to specify this using the -F option.

Example:

File data.csv:

John,25,Engineer
Jane,30,Designer
Mike,28,Developer

Command for working with CSV:

awk -F',' '{print $1, $3}' data.csv

Result:

John Engineer
Jane Designer
Mike Developer
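
Note that -F only affects how input is split; print still joins its arguments with a space. The built-in OFS variable controls the output separator, as sketched below on the same data.csv. This simple splitting assumes no field contains a quoted comma, which holds for the sample file but not for CSV in general.

```shell
# Recreate the CSV file from this lesson.
printf 'John,25,Engineer\nJane,30,Designer\nMike,28,Developer\n' > data.csv

# -v OFS=';' makes print join its comma-separated arguments with ';'.
semicolons=$(awk -F',' -v OFS=';' '{print $1, $3}' data.csv)

# Assigning to a field rebuilds the line using OFS, so the output stays comma-separated.
bumped=$(awk -F',' -v OFS=',' '{$2 = $2 + 1; print}' data.csv)

printf '%s\n---\n%s\n' "$semicolons" "$bumped"
```

The first command prints John;Engineer and so on; the second prints each row with the age increased by one, still as CSV.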

Skipping Fields Due to Bad Formatting

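Sometimes lines have unexpected extra spaces or missing columns. awk won’t report an error in that case: with the default delimiter, extra spaces are simply absorbed, while a missing column comes back as an empty string (or 0 in arithmetic), which can quietly skew your results. It’s always a good idea to check your data, for example by testing the built-in NF variable (the number of fields in the current line), before starting work.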
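
One way to do such a check is sketched below. The file messy.txt is a hypothetical example created just for this purpose: its second line has extra spaces, its third line is missing a column, and its fourth line is empty. The expected field count of 3 is an assumption about what well-formed rows look like here.

```shell
# Hypothetical messy input: extra spaces, a missing column, and an empty line.
printf 'John 25 Engineer\nJane   30   Designer\nMike 28\n\n' > messy.txt

# Report every line whose field count is not the expected 3.
problems=$(awk 'NF != 3 {print "line " NR ": expected 3 fields, got " NF}' messy.txt)

# Process only the well-formed lines.
clean=$(awk 'NF == 3 {print $1, $3}' messy.txt)

printf '%s\n---\n%s\n' "$problems" "$clean"
```

The line with extra spaces still counts as three fields, so only lines 3 and 4 are flagged.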

Practical Application

You just learned how to use awk to analyze system logs, work with salary data, and create reports. These skills will help you work with large data sets, CSV files, and logs on real-world projects. If you’re in DevOps, analyzing system logs with awk will be your superpower. And if you’re a developer, it’s a great way to quickly manipulate data right from the terminal.

For a deeper dive into awk, I recommend checking out the official GNU Awk documentation. Now you definitely know how to make your data more obedient!
