1. Command sort
The sort command sorts the lines of a text file. It works with any line-oriented text, from plain notes to CSV files. By default, it sorts lines alphabetically, but a handful of options let you do some pretty cool stuff.
Basic Syntax
sort [options] file
Simple Example
Let's say we have a file names.txt with the following content:
Charlie
Alice
Bob
David
We can sort it alphabetically like this:
sort names.txt
Result:
Alice
Bob
Charlie
David
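sort also reads from standard input when no file is given, so you can pipe data straight into it. A quick sketch reproducing the names.txt example:

```shell
# No file argument: sort reads the piped lines from stdin
printf 'Charlie\nAlice\nBob\nDavid\n' | sort
# Alice
# Bob
# Charlie
# David
```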
Key Options for sort
Sorting in Reverse Order
If you want to sort the lines in descending order, use the -r option:
sort -r names.txt
Result:
David
Charlie
Bob
Alice
Numeric Sorting
For numbers, alphabetical sorting may not work correctly. For example, here’s the content of the file numbers.txt:
10
2
5
1
Alphabetical sorting will give:
1
10
2
5
But with the -n (numeric sorting) option:
sort -n numbers.txt
The result will be correct:
1
2
5
10
Ignoring Leading Blanks
Sometimes lines start with spaces or tabs. To keep sort from comparing those blanks, use the -b option, which ignores leading blanks in each line (it does not affect trailing spaces):
sort -b dirty_file.txt
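Here is a minimal sketch of the difference -b makes. The contents of dirty_file.txt are made up for the demonstration, and LC_ALL=C is set so the comparison is plain byte order (in some locales, blanks are already ignored during collation):

```shell
# One line has leading spaces (sample data, not from a real file)
printf 'banana\n  cherry\napple\n' > dirty_file.txt

# Without -b, the blanks compare before letters, so the
# indented line jumps to the top:
LC_ALL=C sort dirty_file.txt
#   cherry
# apple
# banana

# With -b, leading blanks are skipped during comparison
# (the output lines themselves keep their spaces):
LC_ALL=C sort -b dirty_file.txt
# apple
# banana
#   cherry
```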
Example
Suppose we have a website visit log file visits.log:
user2 15
user1 5
user3 30
user4 20
We want to sort users by the number of visits (the second column). This is how it's done:
sort -k2 -n visits.log
Here -k2 tells sort to start the sort key at the second field, and -n makes the comparison numeric. Result:
user1 5
user2 15
user4 20
user3 30
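The same sort can be written with a bounded key, which is slightly safer when lines have more than two fields. A sketch recreating visits.log from the example above:

```shell
printf 'user2 15\nuser1 5\nuser3 30\nuser4 20\n' > visits.log

# -k2,2n limits the sort key to field 2 only and compares it
# numerically; a bare -k2 extends the key from field 2 to the
# end of the line, which can matter if more fields follow.
sort -k2,2n visits.log
# user1 5
# user2 15
# user4 20
# user3 30
```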
2. Command uniq
The uniq command removes duplicate lines in a file. But it's important to remember: it only works with consecutive duplicates. So, if identical lines appear in different parts of the file, you need to sort them first.
Basic Syntax
uniq [options] file
Simple Example
Let's say we have a file colors.txt:
red
green
green
blue
blue
blue
red
If we simply use uniq:
uniq colors.txt
The result will be:
red
green
blue
red
Removing Non-Adjacent Duplicates
Since uniq only collapses consecutive duplicates, sort the file first and pipe the result to uniq:
sort colors.txt | uniq
Result:
blue
green
red
Key Options for uniq
Counting Repetitions
If you want to know how many times each line occurred, use the -c option:
sort colors.txt | uniq -c
Result:
3 blue
2 green
2 red
Example
In the file access.log we have a list of IP addresses:
192.168.0.1
192.168.0.2
192.168.0.1
192.168.0.3
192.168.0.1
We want to find out which IP occurred most frequently:
sort access.log | uniq -c | sort -rn
Result:
3 192.168.0.1
1 192.168.0.2
1 192.168.0.3
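To pull out only the most frequent address, you can extend the pipeline above. A sketch recreating access.log; note that awk is not one of the commands covered here and is used only to strip the leading count from uniq -c's output:

```shell
printf '192.168.0.1\n192.168.0.2\n192.168.0.1\n192.168.0.3\n192.168.0.1\n' > access.log

# head -n 1 keeps the top (most frequent) line;
# awk '{print $2}' prints just the IP, dropping the count.
sort access.log | uniq -c | sort -rn | head -n 1 | awk '{print $2}'
# 192.168.0.1
```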
3. Command cut
The cut command extracts specific parts of each line, such as individual columns of a CSV file or ranges of characters.
Basic Syntax
cut [options] file
Simple Example
File data.csv:
Alice,25,Developer
Bob,30,Designer
Charlie,22,Manager
Let’s extract only the names (the first column):
cut -d',' -f1 data.csv
Result:
Alice
Bob
Charlie
Where:
-d',' — the delimiter (a comma).
-f1 — the field (column) to extract.
Key Options for cut
Selecting a Range of Characters
If our file data.txt has fixed-width columns, with names padded to 8 characters:
Alice   25 Developer
Bob     30 Designer
Charlie 22 Manager
We can extract only the age (characters 9 to 10):
cut -c9-10 data.txt
Result:
25
30
22
Selecting Multiple Fields
If we have a file log.csv:
2023-01-01,INFO,Server started
2023-01-02,ERROR,Connection failed
2023-01-03,INFO,Server stopped
Let’s select only the date and log level (fields 1 and 2):
cut -d',' -f1,2 log.csv
Result:
2023-01-01,INFO
2023-01-02,ERROR
2023-01-03,INFO
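Field lists also accept open-ended ranges. A sketch recreating log.csv and taking everything from the second field onward:

```shell
printf '2023-01-01,INFO,Server started\n2023-01-02,ERROR,Connection failed\n2023-01-03,INFO,Server stopped\n' > log.csv

# -f2- means "field 2 through the end of the line"
cut -d',' -f2- log.csv
# INFO,Server started
# ERROR,Connection failed
# INFO,Server stopped
```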
4. Practical Example: Combining sort, uniq, and cut
Let's take a log file with the following data:
user1,192.168.0.1
user2,192.168.0.2
user1,192.168.0.1
user3,192.168.0.3
user2,192.168.0.2
- Extract the IP addresses:
cut -d',' -f2 log.txt
- Remove duplicates and count them:
cut -d',' -f2 log.txt | sort | uniq -c
- Sort by the number of occurrences:
cut -d',' -f2 log.txt | sort | uniq -c | sort -rn
Result:
2 192.168.0.2
2 192.168.0.1
1 192.168.0.3
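The full pipeline above can be wrapped in a small function so it's easy to reuse. count_ips is a hypothetical name, not a standard command:

```shell
# Recreate log.txt from the example above
printf 'user1,192.168.0.1\nuser2,192.168.0.2\nuser1,192.168.0.1\nuser3,192.168.0.3\nuser2,192.168.0.2\n' > log.txt

# count_ips: extract the IP column, count occurrences,
# and list them from most to least frequent.
count_ips() {
    cut -d',' -f2 "$1" | sort | uniq -c | sort -rn
}

count_ips log.txt
```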