
Sorting and Filtering Files: Commands `sort`, `uniq`, `cut`

1. Command sort

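The sort command sorts the lines of a text file and prints the result to standard output; the file itself is not changed. It works with any line-oriented text, from plain lists to CSV files. By default, lines are compared as text, so the order is alphabetical (how uppercase, lowercase, and punctuation are ordered depends on the locale). Options change how lines are compared: in reverse, numerically, or by a specific column.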

Basic Syntax

sort [options] file

Simple Example

Let's say we have a file names.txt with the following content:

Charlie
Alice
Bob
David

We can sort it alphabetically like this:

sort names.txt

Result:

Alice
Bob
Charlie
David

Key Options for sort

Sorting in Reverse Order

If you want to sort the lines in descending order, use the -r option:

sort -r names.txt

Result:

David
Charlie
Bob
Alice

Numeric Sorting

For numbers, alphabetical sorting gives the wrong order, because lines are compared character by character rather than by value. For example, here’s the content of the file numbers.txt:

10
2
5
1

Alphabetical sorting will give:

1
10
2
5

But with the -n (numeric sorting) option:

sort -n numbers.txt

The result will be correct:

1
2
5
10
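To try the difference yourself, here is a minimal sketch that recreates numbers.txt in a scratch directory and runs both comparisons. The variable names and the temporary directory are assumptions made for this illustration; LC_ALL=C is set only to make the text comparison predictable.

```shell
# Recreate numbers.txt from the lesson in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 10 2 5 1 > "$workdir/numbers.txt"

# Default comparison is character by character, so "10" lands before "2".
lexical=$(LC_ALL=C sort "$workdir/numbers.txt")

# -n compares each line by its numeric value.
numeric=$(sort -n "$workdir/numbers.txt")

# -n and -r can be combined to put the largest value first.
descending=$(sort -rn "$workdir/numbers.txt")

printf 'text order:\n%s\n\n' "$lexical"
printf 'numeric order:\n%s\n\n' "$numeric"
printf 'numeric, descending:\n%s\n' "$descending"

rm -r "$workdir"
```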

Ignoring Leading Spaces

Sometimes lines have spaces at the beginning. To keep those spaces from affecting the order, use the -b option, which ignores leading blanks when comparing lines (trailing spaces are not affected):

sort -b dirty_file.txt
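The effect of -b is easiest to see side by side. The sketch below assumes GNU sort and uses made-up contents for dirty_file.txt, since the lesson does not show the file. Note that -b only changes the comparison: the spaces are still present in the output.

```shell
workdir=$(mktemp -d)

# Illustrative contents: two of the three lines start with stray spaces.
printf '%s\n' '  banana' 'apple' ' cherry' > "$workdir/dirty_file.txt"

# Without -b the leading spaces take part in the comparison.
# In the C locale a space sorts before any letter.
plain=$(LC_ALL=C sort "$workdir/dirty_file.txt")

# With -b the comparison starts at the first non-blank character.
ignoring_blanks=$(LC_ALL=C sort -b "$workdir/dirty_file.txt")

printf 'without -b:\n%s\n\n' "$plain"
printf 'with -b:\n%s\n' "$ignoring_blanks"

rm -r "$workdir"
```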

Example

Suppose we have a website visit log file visits.log:

user2 15
user1 5
user3 30
user4 20

We want to sort users by the number of visits (the second column). This is how it's done:

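sort -k2,2n visits.log

Where -k2,2 means "use only the second column as the sort key" (a bare -k2 would use everything from the second column to the end of the line), and the n suffix compares that key numerically. Result: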

user1 5
user2 15
user4 20
user3 30
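Column sorting also works when the columns are not separated by spaces. As a sketch, the -t option of sort sets the field separator, so the same -k syntax can be applied to a comma-separated file. The CSV rows below are borrowed from the cut section of this lesson; the variable names are illustrative.

```shell
workdir=$(mktemp -d)

# Space-separated log from the example above.
printf '%s\n' 'user2 15' 'user1 5' 'user3 30' 'user4 20' > "$workdir/visits.log"

# Key is field 2 only, compared numerically.
by_visits=$(sort -k2,2n "$workdir/visits.log")

# Adding r to the key reverses it: most visits first.
by_visits_desc=$(sort -k2,2nr "$workdir/visits.log")

# Comma-separated data: -t',' tells sort where fields begin and end.
printf '%s\n' 'Alice,25,Developer' 'Bob,30,Designer' 'Charlie,22,Manager' > "$workdir/data.csv"
by_age=$(sort -t',' -k2,2n "$workdir/data.csv")

printf '%s\n\n' "$by_visits"
printf '%s\n\n' "$by_visits_desc"
printf '%s\n' "$by_age"

rm -r "$workdir"
```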

2. Command uniq

The uniq command filters out duplicate lines and prints the result; the original file is not modified. But it's important to remember: it only collapses consecutive duplicates. So, if identical lines appear in different parts of the file, you need to sort the file first.

Basic Syntax

uniq [options] file

Simple Example

Let's say we have a file colors.txt:

red
green
green
blue
blue
blue
red

If we simply use uniq:

uniq colors.txt

The result will be:

red
green
blue
red

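Removing All Duplicates

In the output above, red still appears twice because its two occurrences are not adjacent. To remove every duplicate, sort the file first so that identical lines end up next to each other: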

sort colors.txt | uniq

Result:

blue
green
red
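When only the deduplicated list is needed, sort can do both steps on its own: its -u option prints each distinct line once. The sketch below checks that the two forms agree on colors.txt; the variable names are illustrative.

```shell
workdir=$(mktemp -d)
printf '%s\n' red green green blue blue blue red > "$workdir/colors.txt"

# Two-step form from the lesson: sort, then collapse adjacent duplicates.
piped=$(LC_ALL=C sort "$workdir/colors.txt" | uniq)

# One-step form: sort -u keeps a single copy of each line.
single=$(LC_ALL=C sort -u "$workdir/colors.txt")

printf '%s\n' "$single"

rm -r "$workdir"
```

The pipeline with uniq is still the one to use when you need uniq's own options, such as counting.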

Key Options for uniq

Counting Repetitions

If you want to know how many times each line occurred, use the -c option:

sort colors.txt | uniq -c

Result:

   3 blue
   2 green
   2 red
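The sort step matters for counting too. As a small sketch, running uniq -c on the unsorted file counts each run of identical lines separately, so red is reported twice with a count of 1. The awk call is only there to strip the padding that uniq -c puts in front of each count; variable names are illustrative.

```shell
workdir=$(mktemp -d)
printf '%s\n' red green green blue blue blue red > "$workdir/colors.txt"

# Unsorted input: counts are per run of adjacent identical lines.
per_run=$(uniq -c "$workdir/colors.txt" | awk '{print $1, $2}')

# Sorted input: counts are totals for each distinct line.
totals=$(LC_ALL=C sort "$workdir/colors.txt" | uniq -c | awk '{print $1, $2}')

printf 'per run:\n%s\n\n' "$per_run"
printf 'totals:\n%s\n' "$totals"

rm -r "$workdir"
```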

Example

In the file access.log we have a list of IP addresses:

192.168.0.1
192.168.0.2
192.168.0.1
192.168.0.3
192.168.0.1

We want to find out which IP occurred most frequently:

sort access.log | uniq -c | sort -rn

Result:

   3 192.168.0.1
   1 192.168.0.3
   1 192.168.0.2
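If only the most frequent address is of interest, the ranked list can be cut down to its first line. This is a sketch using head and awk, which are not covered in this lesson; the variable names are illustrative.

```shell
workdir=$(mktemp -d)
printf '%s\n' 192.168.0.1 192.168.0.2 192.168.0.1 192.168.0.3 192.168.0.1 \
  > "$workdir/access.log"

# Rank addresses by count, then keep only the top line.
top_line=$(sort "$workdir/access.log" | uniq -c | sort -rn | head -n 1)

# Split the line into its two columns: the count and the address.
top_count=$(printf '%s\n' "$top_line" | awk '{print $1}')
top_ip=$(printf '%s\n' "$top_line" | awk '{print $2}')

printf '%s appeared %s times\n' "$top_ip" "$top_count"

rm -r "$workdir"
```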

3. Command cut

The cut command extracts specific parts of each line, such as individual columns in a CSV file or ranges of characters.

Basic Syntax

cut [options] file

Simple Example

File data.csv:

Alice,25,Developer
Bob,30,Designer
Charlie,22,Manager

Let’s extract only the names (the first column):

cut -d',' -f1 data.csv

Result:

Alice
Bob
Charlie

Where:

  • -d',' — delimiter (comma).
  • -f1 — field (column) to extract.

Key Options for cut

Selecting a Range of Characters

Now suppose data.csv uses fixed-width columns instead of commas:

Alice      25 Developer
Bob        30 Designer
Charlie    22 Manager

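We can extract only the age, which occupies characters 12 and 13 of each line (character 14 is the space before the job title, so including it would leave a trailing space):

cut -c12-13 data.csv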

Result:

25
30
22
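Character ranges can also be open-ended. The sketch below applies three ranges to the fixed-width data: a closed range for the age, a range from the start of the line for the name, and a range to the end of the line for the job title. The file name fixed.txt and the variable names are assumptions made for this illustration.

```shell
workdir=$(mktemp -d)

# Name in columns 1-11 (space padded), age in 12-13, job title from 15.
printf '%s\n' \
  'Alice      25 Developer' \
  'Bob        30 Designer' \
  'Charlie    22 Manager' > "$workdir/fixed.txt"

# Closed range: exactly the two age characters.
ages=$(cut -c12-13 "$workdir/fixed.txt")

# Open-ended range: from character 15 to the end of the line.
roles=$(cut -c15- "$workdir/fixed.txt")

# Range from the start: -c-7 means characters 1 through 7.
# Shorter names keep their padding spaces.
names=$(cut -c-7 "$workdir/fixed.txt")

printf '%s\n\n' "$ages"
printf '%s\n\n' "$roles"
printf '%s\n' "$names"

rm -r "$workdir"
```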

Selecting Multiple Fields

If we have a file log.csv:

2023-01-01,INFO,Server started
2023-01-02,ERROR,Connection failed
2023-01-03,INFO,Server stopped

Let’s select only the date and log level (fields 1 and 2):

cut -d',' -f1,2 log.csv

Result:

2023-01-01,INFO
2023-01-02,ERROR
2023-01-03,INFO
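Field lists accept ranges as well as comma-separated numbers. As a brief sketch on the same log.csv, -f1-2 selects the same fields as -f1,2, and -f2- selects everything from the second field onward. Variable names are illustrative.

```shell
workdir=$(mktemp -d)
printf '%s\n' \
  '2023-01-01,INFO,Server started' \
  '2023-01-02,ERROR,Connection failed' \
  '2023-01-03,INFO,Server stopped' > "$workdir/log.csv"

# A range of fields: 1 through 2.
date_and_level=$(cut -d',' -f1-2 "$workdir/log.csv")

# An open-ended range: field 2 to the last field.
level_and_message=$(cut -d',' -f2- "$workdir/log.csv")

# A single field: the message only.
messages=$(cut -d',' -f3 "$workdir/log.csv")

printf '%s\n\n' "$date_and_level"
printf '%s\n\n' "$level_and_message"
printf '%s\n' "$messages"

rm -r "$workdir"
```

Keep in mind that cut splits on every comma, so it is only safe for simple CSV files whose values never contain the delimiter.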

4. Practical Example: Combining sort, uniq, and cut

Let's take a log file log.txt with the following data (user name, then IP address):

user1,192.168.0.1
user2,192.168.0.2
user1,192.168.0.1
user3,192.168.0.3
user2,192.168.0.2

  1. Extract the IP addresses:

cut -d',' -f2 log.txt

  2. Remove duplicates and count them:

cut -d',' -f2 log.txt | sort | uniq -c

  3. Sort by the number of occurrences:

cut -d',' -f2 log.txt | sort | uniq -c | sort -rn

Result:

2 192.168.0.2
2 192.168.0.1
1 192.168.0.3
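The three steps can be wrapped in a small shell function so the pipeline is reusable on any file in the same format. This is only a sketch: top_ips is a hypothetical helper name, and it assumes every line has the form user,ip. The awk call normalizes the padding that uniq -c adds before each count.

```shell
# top_ips FILE: rank the IP addresses in FILE by how often they occur.
# (Hypothetical helper; expects "user,ip" on every line.)
top_ips() {
  cut -d',' -f2 "$1" | sort | uniq -c | sort -rn
}

workdir=$(mktemp -d)
printf '%s\n' \
  'user1,192.168.0.1' \
  'user2,192.168.0.2' \
  'user1,192.168.0.1' \
  'user3,192.168.0.3' \
  'user2,192.168.0.2' > "$workdir/log.txt"

# Run the pipeline and trim the padding in front of each count.
report=$(top_ips "$workdir/log.txt" | awk '{print $1, $2}')
printf '%s\n' "$report"

rm -r "$workdir"
```

Two addresses are tied at 2 occurrences here. The order of tied lines depends on how sort breaks ties, so do not rely on it when reading the report.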