1. Why reading large files in a single thread is like carrying bricks one by one
When you work with large files — tens or hundreds of megabytes, or even gigabytes — single-threaded reading or writing quickly becomes a bottleneck. One thread simply cannot keep up with the load: the disk can deliver data faster than the program can process it.
Even with a fast SSD, the bottleneck is often not the disk itself but overhead: context switches, buffer handling, and in-memory data transformation. As a result, throughput drops while most of the CPU's cores sit idle.
Suppose you decide to count the number of words in a huge log. If you do this sequentially, one thread will monotonously chew through the file while you just wait. But if you split the file into chunks and assign processing to multiple threads, things go much faster: each thread processes its own part, and you almost fully utilise the disk’s potential.
In practice it looks like this: on an SSD with a throughput of 2 GB/s, single-threaded reading yields only about 300–500 MB/s. If you read in parallel, you can squeeze out everything the drive is capable of.
2. Chunking — how to make the file work for you
When a file becomes too large to process as a whole, the most reasonable approach is to split it into parts. This technique is called chunking (from the word chunk — “piece”). The idea is simple: you divide a large file into several logical segments and assign each thread its own region.
Each thread knows the offset at which it should start and where it should stop. It reads only its own piece, processes the data, and the per-thread results are then merged into the overall total.
This approach allows you to utilise all CPU cores simultaneously and significantly speed up processing, especially if you have a modern SSD or NVMe drive. For tasks like counting lines, text search, or statistics aggregation, chunking works like a turbocharger — it simply adds speed with little effort.
How to choose the chunk size
Chunk size is almost like portion size: too small — you’ll waste time cutting; too large — it’s hard to digest. It all depends on the task and your machine’s capabilities.
On average, a range of 8–64 MB per thread gives good results. For most tasks, something around 10–20 MB is enough, but there is no perfect number — you tune it experimentally. The main thing is that a chunk should be large enough not to waste time on excessive thread switching, and not so large that it thrashes the CPU cache or occupies all the memory.
If you work with text — for example, counting words or searching for matches — it’s important that chunks don’t split lines or words in the middle. Typically this is solved simply: make a small overlap between chunks or shift boundaries to the nearest newline character. This way the processing remains accurate and the result is clean and predictable.
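Shifting a boundary to the nearest newline can be sketched with a positioned read. A minimal example, assuming the file uses '\n' line endings (the class and method names here are illustrative, not a standard API):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ChunkBoundaries {
    // Extends a tentative chunk end forward to just past the next '\n'
    // (or to end-of-file), so no line is split between two chunks.
    static long alignToNewline(FileChannel channel, long tentativeEnd) throws IOException {
        long fileSize = channel.size();
        if (tentativeEnd >= fileSize) return fileSize;
        ByteBuffer probe = ByteBuffer.allocate(256); // small look-ahead window
        long pos = tentativeEnd;
        while (pos < fileSize) {
            probe.clear();
            int read = channel.read(probe, pos); // positioned read; channel position unchanged
            if (read <= 0) break;
            for (int i = 0; i < read; i++) {
                if (probe.get(i) == '\n') {
                    return pos + i + 1; // the chunk ends right after the newline
                }
            }
            pos += read;
        }
        return fileSize; // no newline found: take the rest of the file
    }
}
```

Each thread then starts at the (aligned) end of the previous chunk, so every line belongs to exactly one thread.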
3. Tools for positioned access: FileChannel and MappedByteBuffer
FileChannel: Positioned IO
FileChannel is a class from the java.nio.channels package that lets you work with files at a low level, including reading and writing data to/from an arbitrary file position.
Key methods:
- position(long newPosition) — set the position (offset) for reading/writing.
- read(ByteBuffer dst, long position) — read data from a file into a buffer starting at the given position (does not change the channel’s current position!).
- write(ByteBuffer src, long position) — write data to a file starting at the given position.
Example: reading a file chunk
try (FileChannel channel = FileChannel.open(Path.of("bigfile.txt"), StandardOpenOption.READ)) {
long chunkSize = 16 * 1024 * 1024; // 16 MB
long offset = 0;
ByteBuffer buffer = ByteBuffer.allocate((int) chunkSize);
int bytesRead = channel.read(buffer, offset); // positioned read: the channel's position is unchanged
buffer.flip(); // switch the buffer from writing to reading before consuming it
// buffer now holds the first 16 MB of the file (or fewer bytes, if the file is shorter)
}
Advantages:
- You can read/write from any position.
- Convenient for parallel processing: each thread works with its own piece.
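Positioned writes work the same way: because write(ByteBuffer, position) does not touch the channel's current position, several threads can write to disjoint regions of one file without coordinating. A small sketch (the helper name is illustrative):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class PositionedWrite {
    // Writes the whole string at the given file position,
    // looping because a single write() may be partial.
    static void writeAt(FileChannel channel, String text, long position) throws IOException {
        ByteBuffer src = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
        while (src.hasRemaining()) {
            position += channel.write(src, position); // positioned write; channel position unchanged
        }
    }
}
```

With this, a merge step could have each worker write its result block at a precomputed offset of the output file.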
MappedByteBuffer: Memory-mapped files
MappedByteBuffer is a special buffer that lets you “map” a region of a file into memory. The operating system itself takes care of loading data from disk into memory and back.
How does it work?
- You map a piece of a file into memory.
- You read and write to the buffer — the OS loads the required pages automatically.
- No explicit calls to read/write — everything goes through memory.
Example:
try (FileChannel channel = FileChannel.open(Path.of("bigfile.txt"), StandardOpenOption.READ)) {
long chunkSize = 16 * 1024 * 1024; // 16 MB
long offset = 0;
MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, offset, chunkSize);
// Now buffer behaves like a byte array, but data are read from disk on demand
}
Pros:
- Very high speed (especially on SSDs).
- Simplicity: read/write like an array.
Cons:
- Consumes virtual address space: mapping a very large file can exhaust it.
- Eviction is hard to control: there is no standard way to release a mapping explicitly, so the buffer can stay mapped longer than needed.
- A single mapping cannot exceed Integer.MAX_VALUE bytes (about 2 GB), so very large files must be split across several mappings; on 32-bit systems the usable address space is even smaller.
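Because one mapping is capped at Integer.MAX_VALUE bytes, a file of any size is usually processed as a sequence of mapped windows. A minimal sketch, summing all byte values (the window size and method name are illustrative, and in practice the window would be tens of megabytes):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedWindows {
    // Walks a file of arbitrary size through a series of read-only mappings.
    static long sumBytes(Path path, long windowSize) throws IOException {
        long sum = 0;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long fileSize = channel.size();
            for (long offset = 0; offset < fileSize; offset += windowSize) {
                long size = Math.min(windowSize, fileSize - offset);
                MappedByteBuffer window = channel.map(FileChannel.MapMode.READ_ONLY, offset, size);
                while (window.hasRemaining()) {
                    sum += window.get() & 0xFF; // treat each byte as unsigned
                }
            }
        }
        return sum;
    }
}
```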
4. Example: parallel reading and word counting
Consider the task: count the number of words in a large text file (for example, a 10 GB log) using parallel processing.
Step 1. Split the file into chunks
- Get the file size: long fileSize = Files.size(path);
- Choose a chunk size, for example, 16 MB.
- For each chunk, compute the offset: offset = chunkIndex * chunkSize;
- The last chunk may be smaller.
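The arithmetic above can be sketched as follows (the Chunk record and plan method are illustrative names, not part of any library):

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkPlanner {
    // One planned region of the file: where it starts and how long it is.
    record Chunk(long offset, long size) {}

    static List<Chunk> plan(long fileSize, long chunkSize) {
        List<Chunk> chunks = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += chunkSize) {
            // The last chunk may be smaller than chunkSize.
            chunks.add(new Chunk(offset, Math.min(chunkSize, fileSize - offset)));
        }
        return chunks;
    }
}
```

For example, a 35-byte file with a 10-byte chunk size yields four chunks, the last one 5 bytes long.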
Step 2. Create tasks for threads
- For each chunk, create a Callable<Integer> (or Runnable) that:
- Opens its piece of the file via FileChannel.read(ByteBuffer, offset) or MappedByteBuffer.
- Counts the number of words in its piece.
- Returns the result (the word count).
Step 3. Submit tasks via ExecutorService
- Create a thread pool: ExecutorService pool = Executors.newFixedThreadPool(N);
- Submit tasks to the pool: List<Future<Integer>> results = pool.invokeAll(tasks);
- Aggregate the results: sum the values from all futures.
Code example (simplified):
import java.nio.*;
import java.nio.channels.*;
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;
public class ParallelWordCount {
public static void main(String[] args) throws Exception {
Path path = Path.of("bigfile.txt");
long fileSize = Files.size(path);
int chunkSize = 16 * 1024 * 1024; // 16 MB
int chunks = (int) ((fileSize + chunkSize - 1) / chunkSize);
ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
List<Future<Integer>> results = new ArrayList<>();
for (int i = 0; i < chunks; i++) {
long offset = (long) i * chunkSize;
long size = Math.min(chunkSize, fileSize - offset);
results.add(pool.submit(() -> {
try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, offset, size);
byte[] bytes = new byte[(int) size];
buffer.get(bytes);
String text = new String(bytes, java.nio.charset.StandardCharsets.UTF_8); // be explicit: new String(bytes) would use the platform default charset
// Important: handle chunk boundaries so you don’t split a word!
return countWords(text);
}
}));
}
int totalWords = 0;
for (Future<Integer> f : results) {
totalWords += f.get();
}
pool.shutdown();
System.out.println("Total words: " + totalWords);
}
private static int countWords(String text) {
// The simplest way: split by whitespace and filter out empty strings
String[] words = text.split("\\s+");
int count = 0;
for (String w : words) {
if (!w.isBlank()) count++;
}
return count;
}
}
Attention: in real tasks you must carefully handle chunk boundaries so as not to split a word or a line between two threads. Typically, you make a small overlap (for example, +100 bytes) and adjust the chunk’s start/end.
5. Takeaways and best practices
- For large files, use splitting into chunks and parallel processing.
- Use FileChannel for positioned access, and MappedByteBuffer for memory-mapped files.
- Select the chunk size experimentally, balancing CPU cache behaviour against the disk's throughput.
- Carefully handle chunk boundaries (especially for text).
- For parallel processing, use ExecutorService and a thread pool.
- Don’t over-provision threads: on an SSD, 2–4 reader threads are usually enough.
- Watch memory consumption: MappedByteBuffer can occupy a lot of virtual memory.
6. Common mistakes when working with large files and chunking
Mistake #1: Reading the entire file into memory. With large files this can end in an OutOfMemoryError. Read the data in parts (chunks) instead.
Mistake #2: Incorrect handling of chunk boundaries. If you split a file without accounting for line or word boundaries, you can “tear” the data, and the result will be incorrect.
Mistake #3: Suboptimal chunk size. Chunks that are too small add unnecessary threading overhead, while chunks that are too large use memory inefficiently.
Mistake #4: An unclosed FileChannel. This leaks resources. Use try-with-resources to guarantee the channel is closed.
Mistake #5: An excessive number of threads. If there are too many, the disk cannot service the requests fast enough, and performance drops instead of increasing.