Asynchronous processing of text files

1. Reading a file in chunks: ByteBuffer and encoding

Text files in real systems are often large: server logs, reports, CSV exports, or multi-gigabyte data dumps. So it matters not just that you read a file, but that you do it efficiently and without the application “freezing”.

An asynchronous approach helps with exactly this: it doesn’t block the main thread—be it the UI or server logic—allows you to read and write large volumes of data in parallel, and makes the application scalable when you need to work with several files at once.

The key thing to understand: asynchronous I/O doesn’t make the disk itself faster—there are no miracles. It simply lets your program avoid idling while the disk performs an operation and do other work in the meantime.

How does asynchronous reading work?

An asynchronous channel (AsynchronousFileChannel) reads not strings but blocks of bytes into a ByteBuffer object. It’s like carrying boxes of letters rather than individual words. After reading, you need to turn those bytes into strings—with the right charset!

Example: asynchronous file reading in blocks

Let’s write the simplest example of asynchronous reading of a file in 4096-byte blocks and printing the contents to the console.

import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;
import java.nio.charset.StandardCharsets;

public class AsyncTextReadExample {
    public static void main(String[] args) throws Exception {
        Path path = Path.of("bigfile.txt");
        try (AsynchronousFileChannel channel = AsynchronousFileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(4096);
            long position = 0; // file offsets are long: an int overflows past 2 GB
            Future<Integer> future = channel.read(buffer, position);
            int bytesRead;

            // get() blocks until the read completes and returns -1 at end of file
            while ((bytesRead = future.get()) > 0) {
                buffer.flip();
                // Convert bytes to a string (UTF-8)
                String chunk = StandardCharsets.UTF_8.decode(buffer).toString();
                System.out.print(chunk);
                buffer.clear();
                // Advance by the bytes actually read, not by the re-encoded text length
                position += bytesRead;
                future = channel.read(buffer, position);
            }
        }
    }
}

Important points:

  • We read the file in parts (by buffer), not all at once.
  • After reading, bytes are decoded into a string using Charset.
  • Don’t forget buffer.clear(): after decoding, the buffer has no free space left, so without it the next read returns 0 bytes and the loop stops early.

Why isn’t simply decoding bytes enough?

The trouble is that a character can be “split” between two blocks when the file uses a multibyte charset (for example, "UTF-8"). If the buffer ends in the middle of a character, the next block starts with the rest of its bytes. Charset.decode() treats each block as complete input, so both halves come out as the replacement character U+FFFD and the text turns into gibberish. A decoder configured to report errors throws a CharacterCodingException instead.
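To make the problem concrete, here is a minimal sketch that decodes the same bytes once as a single block and once in 1-byte blocks. The class name SplitCharacterDemo and the helper decodeBlocksNaively are made up for this illustration; only Charset.decode() and ByteBuffer.wrap() come from the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SplitCharacterDemo {
    // Hypothetical helper: decodes each block on its own, like the simple example above.
    static String decodeBlocksNaively(byte[] data, int blockSize) {
        StringBuilder out = new StringBuilder();
        for (int start = 0; start < data.length; start += blockSize) {
            int len = Math.min(blockSize, data.length - start);
            ByteBuffer block = ByteBuffer.wrap(data, start, len);
            // Charset.decode() replaces anything it cannot decode with U+FFFD
            out.append(StandardCharsets.UTF_8.decode(block));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "\u00e9" (e with acute accent) takes two bytes in UTF-8, so the text is three bytes
        String text = "\u00e9!";
        byte[] data = text.getBytes(StandardCharsets.UTF_8);

        String whole = decodeBlocksNaively(data, 3); // one block, character intact
        String split = decodeBlocksNaively(data, 1); // block boundary inside the character

        System.out.println(whole.equals(text)); // true
        System.out.println(split.equals(text)); // false
    }
}
```

With a 4096-byte buffer the same thing happens whenever a multibyte character straddles a multiple of 4096, which is why the damage looks random in real files.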

2. Converting bytes to strings: handling splits

The problem of split lines

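Suppose the file contains "Hello\nWorld\n", and the first block ends at "Hel" while "lo\nWorld\n" arrives in the next block. If you process each block on its own, "Hel" and "lo" look like two separate pieces of text, so you must keep the unfinished tail of each block and prepend it to the next one. The same thing happens one level down with bytes: a multibyte character cut at a block boundary can’t be decoded from either half alone.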

Solution: use CharsetDecoder

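Java provides the CharsetDecoder class for the byte-level case. Called as decode(in, out, false), it decodes every complete character and stops before an incomplete trailing sequence, leaving those bytes in the input buffer. It does not store them for you: call buffer.compact() instead of buffer.clear() before the next read so they are kept, and the character is reconstructed correctly once the rest of its bytes arrive.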

Example of using CharsetDecoder

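import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;
import java.nio.CharBuffer;
import java.nio.ByteBuffer;

CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
ByteBuffer buffer = ... // your bytes, flipped for reading
CharBuffer charBuffer = CharBuffer.allocate(buffer.capacity());
// false = more input will follow, so an incomplete trailing character is not an error
decoder.decode(buffer, charBuffer, false);
charBuffer.flip();  // charBuffer now holds every complete character
buffer.compact();   // keeps the undecoded trailing bytes for the next read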

In a real task you will keep a “leftover” between reads and decode with this leftover taken into account.
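As a rough sketch of that idea, the class below feeds a byte array to a single CharsetDecoder in small blocks and carries undecoded bytes over with compact(). The names ChunkedDecoderDemo and decodeInBlocks are invented for this example, and the byte array stands in for successive reads from a channel. It assumes well-formed UTF-8 input; the CoderResult returned by decode() is ignored for brevity.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

public class ChunkedDecoderDemo {
    // Hypothetical helper: decodes data in blocks of blockSize bytes with one decoder.
    static String decodeInBlocks(byte[] data, int blockSize) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        // An incomplete UTF-8 character is at most 3 bytes, so 4 spare bytes are enough
        ByteBuffer in = ByteBuffer.allocate(blockSize + 4);
        CharBuffer out = CharBuffer.allocate(blockSize + 4);
        StringBuilder text = new StringBuilder();

        for (int start = 0; start < data.length; start += blockSize) {
            int len = Math.min(blockSize, data.length - start);
            in.put(data, start, len);        // simulates one read into the buffer
            in.flip();
            boolean last = start + len >= data.length;
            decoder.decode(in, out, last);   // stops before an incomplete character
            out.flip();
            text.append(out);
            out.clear();
            in.compact();                    // keep undecoded bytes for the next block
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String original = "h\u00e9llo, \u043c\u0438\u0440\n"; // Latin, accented and Cyrillic letters
        byte[] data = original.getBytes(StandardCharsets.UTF_8);
        // Even 1-byte blocks, which split every multibyte character, decode correctly
        System.out.println(decodeInBlocks(data, 1).equals(original)); // true
    }
}
```

The important line is in.compact(): replacing it with in.clear() would throw away the first half of every split character.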

3. Asynchronous writing of text files

Reading is only half the story. Writing is also performed in blocks of bytes, which you must first obtain from strings (encode).

Example: asynchronous writing of a string to a file

import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;
import java.nio.charset.StandardCharsets;

public class AsyncTextWriteExample {
    public static void main(String[] args) throws Exception {
        Path path = Path.of("output.txt");
        String text = "Hello, world!\n";
        ByteBuffer buffer = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
        // TRUNCATE_EXISTING clears old content, so a shorter text leaves no stale tail
        try (AsynchronousFileChannel channel = AsynchronousFileChannel.open(path,
                StandardOpenOption.WRITE, StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            Future<Integer> future = channel.write(buffer, 0);
            // For demonstration, wait for completion (you usually shouldn't!)
            future.get();
            System.out.println("Data written asynchronously.");
        }
    }
}

Comment: In real asynchronous scenarios, you shouldn’t call future.get() on the main thread—it turns asynchronous code into synchronous code. It’s better to use CompletionHandler (see the previous lecture).

4. Practice: asynchronously reading a large text file and counting lines

Let’s implement a practical task: asynchronously read a large text file, count its lines by counting "\n" characters, and print the total to the console.

Example using CompletionHandler

import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.CharBuffer;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

public class AsyncLineCounter {
    public static void main(String[] args) throws IOException {
        Path path = Path.of("bigfile.txt");
        AsynchronousFileChannel channel = AsynchronousFileChannel.open(path, StandardOpenOption.READ);

        ByteBuffer buffer = ByteBuffer.allocate(4096);
        AtomicLong position = new AtomicLong(0);
        // Replace invalid bytes with U+FFFD so the decoder never stalls on them
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        StringBuilder leftover = new StringBuilder();
        AtomicLong lines = new AtomicLong(0);

        channel.read(buffer, position.get(), null, new CompletionHandler<Integer, Object>() {
            @Override
            public void completed(Integer result, Object attachment) {
                if (result == -1) {
                    // End of file: an unterminated last line (or a truncated
                    // trailing character still in the buffer) counts as a line
                    if (leftover.length() > 0 || buffer.position() > 0) lines.incrementAndGet();
                    System.out.println("Lines in file: " + lines.get());
                    try { channel.close(); } catch (IOException e) { e.printStackTrace(); }
                    return;
                }
                buffer.flip();
                CharBuffer charBuffer = CharBuffer.allocate(buffer.remaining());
                // false = more input follows; an incomplete character stays in buffer
                decoder.decode(buffer, charBuffer, false);
                charBuffer.flip();
                String chunk = leftover.toString() + charBuffer.toString();
                leftover.setLength(0);

                // Count lines
                int last = 0;
                int idx;
                while ((idx = chunk.indexOf('\n', last)) != -1) {
                    lines.incrementAndGet();
                    last = idx + 1;
                }
                // Remainder (part of the line after the last \n)
                if (last < chunk.length()) {
                    leftover.append(chunk.substring(last));
                }
                // compact(), not clear(): keeps the bytes of a character split across blocks
                buffer.compact();
                position.addAndGet(result);
                channel.read(buffer, position.get(), null, this);
            }

            @Override
            public void failed(Throwable exc, Object attachment) {
                System.err.println("Read error: " + exc.getMessage());
                try { channel.close(); } catch (IOException e) { e.printStackTrace(); }
            }
        });

        // So the program doesn't exit too early (for demo only!)
        try { Thread.sleep(2000); } catch (InterruptedException e) {}
    }
}

Important points:

  • We use CompletionHandler for truly async code.
  • After each read the buffer is decoded using CharsetDecoder.
  • The remainder of a line that didn’t end with "\n" is carried over to the next block.
  • After reaching the end of the file, if something remains in leftover, that also counts as a line.
  • For simplicity, the example “sleeps” for 2000 ms so the asynchronous operation can complete (real applications don’t need this, since they usually have a main loop or UI).
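If a small program does need to wait for the callback, a fixed sleep is fragile: too short for a big file, wastefully long for a small one. One common alternative is a CountDownLatch that the handler releases when it finishes. The sketch below is only an illustration of that pattern; the class LatchInsteadOfSleep, the helper readFirstBlock and the temporary file are assumptions made for the demo, and it reads just one block to stay short.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class LatchInsteadOfSleep {
    // Hypothetical helper: reads the first block asynchronously and waits for the callback.
    static int readFirstBlock(Path path) throws IOException, InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        AtomicInteger bytesRead = new AtomicInteger(-1);
        try (AsynchronousFileChannel channel =
                     AsynchronousFileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(4096);
            channel.read(buffer, 0, null, new CompletionHandler<Integer, Object>() {
                @Override
                public void completed(Integer result, Object attachment) {
                    bytesRead.set(result);
                    done.countDown();   // signal: the operation has finished
                }

                @Override
                public void failed(Throwable exc, Object attachment) {
                    System.err.println("Read error: " + exc.getMessage());
                    done.countDown();   // signal on failure too, or await() never returns
                }
            });
            done.await();               // waits exactly as long as the read takes
        }
        return bytesRead.get();
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("latch-demo", ".txt");
        Files.writeString(tmp, "one\ntwo\n");
        System.out.println("Bytes read: " + readFirstBlock(tmp));
        Files.delete(tmp);
    }
}
```

In a multi-block reader such as the line counter, the countDown() calls would go where the channel is closed: at end of file and in failed().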

5. Asynchronous writing of results to a file

Suppose we want to write the result (for example, the number of lines) to a new file—asynchronously as well.

import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.nio.charset.StandardCharsets;
import java.io.IOException;

public class AsyncWriteResult {
    public static void main(String[] args) throws IOException {
        String result = "Lines in file: 12345\n";
        ByteBuffer buffer = ByteBuffer.wrap(result.getBytes(StandardCharsets.UTF_8));
        Path path = Path.of("result.txt");

        AsynchronousFileChannel channel = AsynchronousFileChannel.open(
            path, StandardOpenOption.WRITE, StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);

        channel.write(buffer, 0, null, new CompletionHandler<Integer, Object>() {
            @Override
            public void completed(Integer written, Object attachment) {
                System.out.println("Result written asynchronously!");
                try { channel.close(); } catch (IOException e) { e.printStackTrace(); }
            }

            @Override
            public void failed(Throwable exc, Object attachment) {
                System.err.println("Write error: " + exc.getMessage());
                try { channel.close(); } catch (IOException e) { e.printStackTrace(); }
            }
        });

        try { Thread.sleep(500); } catch (InterruptedException e) {}
    }
}

6. Tips for handling partial data and charsets

Partial lines between blocks

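If a character is split between two blocks, do not try to “glue” bytes together manually! Use CharsetDecoder together with buffer.compact(): the decoder stops before the incomplete character, and compact() carries its bytes over to the next read, so not a single character is lost. A split line is a separate problem at the text level: keep everything after the last "\n" as a leftover and prepend it to the next chunk.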

Working with different charsets

"UTF-8" is the standard for modern applications, but if the file uses a different charset (for example, "Windows-1251" or "UTF-16"), use the corresponding Charset:

import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

Charset charset = Charset.forName("Windows-1251");
CharsetDecoder decoder = charset.newDecoder();
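The effect of picking the wrong charset is easy to reproduce. The following sketch encodes a string with one charset and decodes it with another; CharsetMismatchDemo and roundTrip are names invented for this example, and it uses "UTF-16" rather than "Windows-1251" only because UTF-16 is guaranteed to be available on every JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // Hypothetical helper: simulates writing text with one charset and reading it with another.
    static String roundTrip(String text, Charset writtenWith, Charset readWith) {
        byte[] bytes = text.getBytes(writtenWith);
        return new String(bytes, readWith);
    }

    public static void main(String[] args) {
        String text = "\u041f\u0440\u0438\u0432\u0435\u0442"; // a Cyrillic word, 6 letters

        // Same charset on both sides: the text survives
        String same = roundTrip(text, StandardCharsets.UTF_16, StandardCharsets.UTF_16);
        // Written as UTF-16 but read as UTF-8: the bytes are misinterpreted
        String mixed = roundTrip(text, StandardCharsets.UTF_16, StandardCharsets.UTF_8);

        System.out.println(same.equals(text));  // true
        System.out.println(mixed.equals(text)); // false
    }
}
```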

Using CharsetDecoder and CharsetEncoder

When you read or write data in parts, it’s important to handle the charset correctly. A character may be “split” between two blocks, and without extra handling you’ll get a mess of bytes.

To avoid this, use CharsetDecoder and CharsetEncoder.

When reading, call decode(ByteBuffer, CharBuffer, endOfInput), and when writing—encode(CharBuffer, ByteBuffer, endOfInput).

Combined with a buffer that keeps unprocessed input between calls (compact()), they ensure that even if a character ends up split between two blocks, it will still be assembled and handled correctly.
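The decoding direction is covered in section 4; for the encoding direction, a minimal sketch might look like this. The class ChunkedEncoderDemo and the helper encodeThroughSmallBuffer are hypothetical, a ByteArrayOutputStream stands in for the file, and the input is assumed to be well-formed text (no unpaired surrogates). It shows that encode() fills each fixed-size block with whole characters only and reports OVERFLOW when the next character doesn't fit.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class ChunkedEncoderDemo {
    // Hypothetical helper: encodes text to UTF-8 through a small fixed-size byte buffer.
    static byte[] encodeThroughSmallBuffer(String text, int bufferSize) {
        if (bufferSize < 4) {
            // A UTF-8 character needs up to 4 bytes; a smaller buffer could never make progress
            throw new IllegalArgumentException("bufferSize must be at least 4");
        }
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        CharBuffer in = CharBuffer.wrap(text);
        ByteBuffer out = ByteBuffer.allocate(bufferSize);
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the file

        CoderResult result;
        do {
            result = encoder.encode(in, out, true);  // true: "in" holds all remaining input
            out.flip();
            sink.write(out.array(), 0, out.limit()); // a real program would write this block
            out.clear();
        } while (result.isOverflow());               // overflow: block was full, continue
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Mixed text: ASCII, Cyrillic (2 bytes each) and an emoji (4 bytes, a surrogate pair)
        String text = "a\u043c\u0438\u0440 \uD83D\uDCCA!";
        byte[] blocks = encodeThroughSmallBuffer(text, 4);
        byte[] direct = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(java.util.Arrays.equals(blocks, direct)); // true
    }
}
```

When the text itself arrives in pieces, the mirror image of the decoding rule applies: pass false as endOfInput and compact() the CharBuffer so that half of a surrogate pair is kept for the next call.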

7. Common mistakes in asynchronous processing of text files

Mistake No. 1: Ignoring leftover line fragments. If you don’t keep the “tail” of a line between blocks, some lines may be lost or decoded incorrectly.

Mistake No. 2: Incorrect buffer handling. If you forget buffer.flip() before decoding, or buffer.clear() (or buffer.compact()) before the next read, the next read returns nothing or the data is incorrect.

Mistake No. 3: Using the wrong charset. If bytes are decoded with a different Charset than was used when writing the file, you may get gibberish or even errors.

Mistake No. 4: Blocking the main thread. If you call future.get() or Thread.sleep() on the UI thread, you lose the point of asynchrony. Use CompletionHandler and reactive approaches.

Mistake No. 5: Not closing the channel after completion. Always close the channel (channel.close()) after all operations finish, even if an error occurred.
