
Text Data Processing and System Update

Docker SELF
Level 8 , Lesson 6


1. Text Data Processing

Today, we're gonna pull all this knowledge together and use it in practice to process text data and perform a system update. We'll dive into real-world scenarios: from system log analysis to automating package installation and configuration.

Task 1: Finding Errors in Logs

Scenario: Imagine you're a system admin, and you need to quickly figure out what's happening in the system. For this, we'll analyze system logs (/var/log/syslog).

Step 1: Filtering by Keywords

Let's start by looking for lines related to errors using grep:

grep "error" /var/log/syslog

Oh, we found something! But let's improve this. Say you wanna ignore case (like ERROR or error):

grep -i "error" /var/log/syslog

Now we see more matches. But sometimes you wanna find everything that's not related to errors:

grep -v "error" /var/log/syslog
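If you want to try these flags without a real syslog, here's a tiny self-contained demo (demo.log and its contents are made up for illustration):

```shell
# Create a throwaway demo file (hypothetical contents)
printf 'ERROR: disk full\ninfo: all good\nerror: timeout\n' > demo.log

grep -i "error" demo.log    # matches both ERROR and error
grep -vi "error" demo.log   # everything except error lines
grep -ci "error" demo.log   # -c counts matches instead of printing them: 2
```

The -c flag is handy when all you need is a count and you don't feel like piping through wc -l.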

Step 2: Simplifying Output with cut

Suppose you're only interested in the timestamp and the message. Let's extract the relevant columns:

grep "error" /var/log/syslog | cut -d' ' -f1,2,3,5-

Here we're using cut to split lines by spaces (-d' ') and pick fields 1-3 (the timestamp) plus everything from field 5 on (the message). Field 4 (the hostname) gets dropped.
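To see exactly which fields survive, you can feed cut a single syslog-style line (the line below is invented for the demo):

```shell
# Field 4 (the hostname) is dropped; everything else passes through
echo 'Jan 15 10:00:01 myhost sshd: error: auth failure' \
  | cut -d' ' -f1,2,3,5-
# → Jan 15 10:00:01 sshd: error: auth failure
```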

Task 2: Counting Event Frequency

Now we wanna figure out how often errors occur. We combine grep, cut, and sort with uniq:

grep "error" /var/log/syslog | cut -d' ' -f5 | sort | uniq -c

This command:

  1. Searches for lines with errors using grep.
  2. Extracts only the error source info with cut.
  3. Sorts the data with sort so that uniq -c can count how many times each line occurs.

The result will look like this:

  10 systemd
   7 kernel
   5 cron

Errors from systemd occur most often. Time to draw some conclusions!
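To get the noisiest sources first, append a numeric reverse sort to the pipeline. Here's a self-contained sketch with made-up sample data (on a real system you'd point it at /var/log/syslog instead):

```shell
# Sample log standing in for /var/log/syslog (hypothetical lines)
printf '%s\n' \
  'Jan 15 10:00:01 host systemd: error one' \
  'Jan 15 10:00:02 host kernel: error two' \
  'Jan 15 10:00:03 host systemd: error three' > sample.log

# Count errors per source, then rank by count, highest first
grep "error" sample.log | cut -d' ' -f5 | sort | uniq -c | sort -rn
# systemd: tops the list with 2 hits
```

sort -rn sorts numerically (-n) in reverse (-r), so the biggest counts float to the top; add | head -5 if you only care about the worst offenders.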


2. Loading and Processing External Data

Scenario: Data Analysis from an External Source

Let’s say we need to load a text file with data (for example, a log file) from a web server and analyze it. Let's do it step by step.

Step 1: File Loading

First, let’s download the file using wget:

wget -O data.log http://example.com/logs/data.log

The file is downloaded and saved as data.log. If the download gets interrupted, the -c flag tells wget to resume from where it left off:

wget -c -O data.log http://example.com/logs/data.log

If you prefer curl (the -C - flag resumes an interrupted download):

curl -o data.log http://example.com/logs/data.log

Step 2: Searching for Information

Now let’s search for lines that contain a specific pattern, for example, WARNING:

grep "WARNING" data.log

Step 3: Formatting Data with awk

Let’s say the log has three columns: date, time, and message. We want to display only the date and message:

awk '{print $1, $3}' data.log

And if you need to filter lines where the message contains the word error, you can add a condition:

awk '/error/ {print $1, $3}' data.log
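awk can also aggregate while it filters. Here's a short sketch that counts error lines per date using an awk associative array (the data.log contents below are invented for the demo):

```shell
# Sample three-column log: date, time, message (made-up data)
printf '%s\n' \
  '2024-01-01 10:00 error: disk full' \
  '2024-01-01 10:05 service started' \
  '2024-01-02 09:00 error: timeout' > data.log

# count[$1]++ tallies matching lines per date; END prints the totals
awk '/error/ {count[$1]++} END {for (d in count) print d, count[d]}' data.log | sort
```

The for (d in count) loop iterates in arbitrary order, which is why the output is piped through sort.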

3. Updating the System with apt-get and yum

Now let's move on to more "system-level" tasks. Updating packages is a super important process for keeping your system secure and stable. Let's show how it's done.

Scenario: System Update

Step 1: Update the package list

For Debian-based distributions:

sudo apt-get update

For RedHat-based distributions:

sudo yum check-update

Step 2: Install updates

Debian-based:

sudo apt-get upgrade

RedHat-based:

sudo yum update

Step 3: Install a new package

For example, to install the vim text editor:

sudo apt-get install vim
sudo yum install vim

Helpful Tip

If you know roughly what you need but aren't sure of the exact package name, use apt search or yum search:

apt search package_name
yum search package_name
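In scripts that may run on either family, you can branch on whichever package manager is present. A minimal sketch (it only detects; in a real script you'd swap the echo for the actual update commands):

```shell
# Detect the available package manager; "unknown" if neither is found
if command -v apt-get >/dev/null 2>&1; then
    pm="apt-get"
elif command -v yum >/dev/null 2>&1; then
    pm="yum"
else
    pm="unknown"
fi
echo "Detected package manager: $pm"
```

command -v exits successfully only when the named program is on the PATH, which makes it a portable way to check for a tool before calling it.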

4. Final Assignment

Task: Automate Updating and Data Processing

  1. Create a bash script that:
    • Updates the system;
    • Downloads a text file (e.g., a log);
    • Analyzes this file for errors;
    • Saves the analysis results to a new file.

Here's an example of such a script:

#!/bin/bash
set -e  # stop the script if any step fails

# Step 1: System update
echo "Updating the system..."
sudo apt-get update && sudo apt-get -y upgrade

# Step 2: File download
echo "Downloading the log file..."
wget -O data.log http://example.com/logs/data.log

# Step 3: File analysis
echo "Analyzing the log file for errors..."
grep "ERROR" data.log | cut -d' ' -f1,5- | sort | uniq -c > analysis.log

echo "Analysis complete. Results saved in analysis.log"

Save this script, for example, as update_and_analyze.sh, and set execution permissions:

chmod +x update_and_analyze.sh

And run it:

./update_and_analyze.sh

Attention: Common Mistakes

  • If you see a "Permission denied" message, make sure to run the script as a user with sudo privileges.

  • If wget or curl is not installed, add its installation at the beginning of the script:

    sudo apt-get install -y wget    # Debian-based
    sudo yum install -y wget        # RedHat-based
    

What's the practical benefit?

These skills will come in handy not just at work, but also during interviews. Knowing how to find errors in logs, filter data, and run system updates is highly valued among admins and engineers. Scripts let you automate tasks, saving time and reducing human error.

P.S. Remember, in the real world, you'll face tasks where you’ll need to combine commands, modify outputs, and automatically set up systems. Today's example is just the tip of the iceberg for your future practice.
