1. Introduction
It's time to cover those annoying situations when working with files turns into a meetup with (sometimes rather mysterious) errors. If you've ever run into errors like System.Text.DecoderFallbackException, you're already familiar with this topic firsthand!
In this lecture we'll cover:
- What kinds of encoding-related errors happen in .NET;
- How corrupted or invalid files manifest;
- Practical examples of catching and handling such errors;
- What to watch out for when working with someone else's files (or with “good old” files found on an ancient disk).
So: where ASCII was too simple and Unicode is sometimes too clever, you occasionally run into files nobody can read. That's when exceptions show up.
Why does this happen?
When you open a file with StreamReader, specifying an encoding (or using the default), .NET assumes all bytes in that file can be correctly converted to characters. But if the file contains bytes that don't map to any character in that encoding, a decoding error happens.
2. Exceptions when reading files with the wrong encoding
The most common exception: DecoderFallbackException
This exception is thrown by .NET when it can't map a byte sequence to a character in the expected encoding.
A simple example to make it clear:
// Windows-1251 isn't available out of the box on .NET Core/.NET 5+:
// register the provider first (requires the System.Text.Encoding.CodePages package)
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Suppose an old file is in Windows-1251 (Cyrillic)
string win1251File = "win1251_test.txt";
File.WriteAllText(win1251File, "Привет, мир!", Encoding.GetEncoding("windows-1251"));
try
{
    // Let's try to read that file as UTF-8
    using var reader = new StreamReader(win1251File, Encoding.UTF8);
    string content = reader.ReadToEnd();
    Console.WriteLine(content); // ...and it will print gibberish (or throw)
}
catch (DecoderFallbackException ex)
{
    Console.WriteLine("Decoding error: " + ex.Message);
}
In most cases, reading a Windows-1251 file as UTF-8 will produce a bunch of "mojibake". By default StreamReader doesn't throw here: it substitutes the replacement character "�" for byte sequences it can't decode. A DecoderFallbackException is thrown only if you explicitly configure the encoding with a strict DecoderExceptionFallback, which tells the decoder to fail instead of substituting.
DecoderFallbackException in detail
- When it happens: when trying to read a sequence of bytes that can't be converted to characters in the current encoding.
- What to do: read the file with the correct encoding! If you don't know the file's encoding, try to guess it (the BOM, the file's origin, or its extension can help) or ask whoever created the file.
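To see the strict behavior for yourself, here is a small sketch of opting into the exception with DecoderFallback.ExceptionFallback ("strict_test.txt" is just a throwaway file name for this demo):

```csharp
using System;
using System.IO;
using System.Text;

// "Hi" followed by 0xFF, a byte that can never appear in valid UTF-8
File.WriteAllBytes("strict_test.txt", new byte[] { 0x48, 0x69, 0xFF });

// A UTF-8 encoding configured to throw instead of substituting "�"
var strictUtf8 = Encoding.GetEncoding(
    "utf-8",
    EncoderFallback.ExceptionFallback,
    DecoderFallback.ExceptionFallback);

try
{
    using var reader = new StreamReader("strict_test.txt", strictUtf8);
    Console.WriteLine(reader.ReadToEnd());
}
catch (DecoderFallbackException ex)
{
    Console.WriteLine("Invalid bytes detected: " + ex.Message);
}
```

With the default replacement fallback the same file would read as "Hi�" without any exception; the strict fallback is what turns bad bytes into a hard error.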
3. Example with an explicitly corrupted file
Now let's complicate things. Imagine the file is corrupted: somewhere in the byte stream a multi-byte character has been left incomplete. This happens when a write was interrupted, after network errors or bad conversions, or when a file is truncated by some utility, in the literal sense: the byte stream is simply cut off somewhere.
Let's create a “broken” file
// Write a valid string in UTF-8 (the final letter "р" takes two bytes)
byte[] valid = Encoding.UTF8.GetBytes("Привет, мир");
// Now create an invalid byte array (cut a multi-byte character in half)
byte[] corrupted = new byte[valid.Length - 1];
Array.Copy(valid, corrupted, valid.Length - 1); // Copied everything except the last byte
// Save the file
File.WriteAllBytes("corrupted.txt", corrupted);
try
{
    using var reader = new StreamReader("corrupted.txt", Encoding.UTF8);
    string s = reader.ReadToEnd();
    Console.WriteLine("Read text: " + s);
}
catch (DecoderFallbackException ex)
{
    Console.WriteLine("File is corrupted! " + ex.Message);
}
Outcome: .NET won't be able to properly assemble the final character. By default it will replace it with the special character "�" or, if the encoding is configured with DecoderExceptionFallback, throw a DecoderFallbackException.
4. Fallback strategies: can we avoid the exception?
Sometimes when a character is “unknown” you'd rather not crash but replace it with “?” or something else. .NET provides so-called fallback strategies for that.
Example: replace unknown chars instead of throwing
// An array with an invalid sequence for UTF-8
byte[] data = { 0xD0, 0x9F, 0xD1, 0x80, 0xD0, 0xB8, 0xD0, 0xB2, 0xD0, 0xB5, 0xD1, 0x82, 0xD1 }; // "Привет" plus a dangling lead byte: the last character was cut off
File.WriteAllBytes("broken_utf8.txt", data);
// Fallback strategy: replace problematic chars with question mark
var encodingWithFallback = Encoding.GetEncoding(
    "UTF-8",
    new EncoderReplacementFallback("?"),
    new DecoderReplacementFallback("?")
);
using var reader = new StreamReader("broken_utf8.txt", encodingWithFallback);
string s = reader.ReadToEnd();
Console.WriteLine("Text (with replacements): " + s);
Result: the file will be read with unknown chars replaced by "?". This avoids crashing the app, but the text won't be fully “native”.
5. BOM issues and incompatibility
Reminder: BOM is the Byte Order Mark, a special byte sequence at the start of a file that says “hi, I'm this encoding!”.
When BOM can cause headaches
- If a file has a BOM and the app doesn't handle it, the first character may look weird (e.g. "ï»¿" or an invisible character).
- Sometimes absence of BOM leads to wrong encoding detection.
Exceptions related to BOM
.NET's StreamReader detects and strips the BOM by default, but if you specify the wrong encoding or strip the BOM manually, you risk:
- An unexpected character at the start (e.g. "�");
- An exception if the encoding is configured to throw and the BOM is treated as an invalid byte sequence.
Practical tip: always explicitly specify the encoding when reading/writing if the encoding type matters to you.
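Here's a small sketch of both behaviors: StreamReader silently eating the BOM versus raw decoding keeping it ("bom_test.txt" is a throwaway file name):

```csharp
using System;
using System.IO;
using System.Text;

// Write "Hello" as UTF-8 with a BOM (EF BB BF is prepended to the file)
File.WriteAllText("bom_test.txt", "Hello", new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));

byte[] raw = File.ReadAllBytes("bom_test.txt");
Console.WriteLine(raw.Length); // 8: three BOM bytes plus five for "Hello"

// StreamReader recognizes and strips the BOM by default...
using (var reader = new StreamReader("bom_test.txt", Encoding.UTF8))
{
    Console.WriteLine(reader.ReadToEnd().Length); // 5: no stray character
}

// ...but decoding the raw bytes yourself keeps it
string withBom = new UTF8Encoding(false).GetString(raw);
Console.WriteLine(withBom.Length); // 6: the first char is the invisible U+FEFF
```

That invisible leading U+FEFF is exactly what turns into "ï»¿" when the same bytes are later misread as a single-byte encoding.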
6. Other interesting exceptions and scenarios
Wrong encoding on write
This happens when you try to write a string containing characters the chosen encoding can't represent. For example, try to save the emoji “😊” to a file using Encoding.ASCII:
try
{
    using var writer = new StreamWriter("ascii.txt", false, Encoding.ASCII);
    writer.WriteLine("This is a test 😊");
}
catch (EncoderFallbackException ex)
{
    Console.WriteLine("Encoding error: " + ex.Message);
}
Result: by default Encoding.ASCII silently replaces unsupported characters with "?"; you'll only get an EncoderFallbackException if the encoding is configured with a strict EncoderExceptionFallback.
Data loss when converting between encodings
When converting a file you can accidentally lose data if the target encoding can't represent all characters from the source (for example, converting UTF-8 to Windows-1251 when the file contains Japanese text).
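A minimal sketch of that kind of silent loss, round-tripping a mixed string through Windows-1251 (this assumes the System.Text.Encoding.CodePages package is referenced, since legacy code pages aren't built into .NET Core/.NET 5+):

```csharp
using System;
using System.Text;

// Windows-1251 needs the code-pages provider on .NET Core/.NET 5+
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

string original = "Привет, 日本"; // Cyrillic plus Japanese
var win1251 = Encoding.GetEncoding("windows-1251");

// Characters Windows-1251 can't represent are silently replaced with "?"
string roundTripped = win1251.GetString(win1251.GetBytes(original));

Console.WriteLine(roundTripped);             // the Cyrillic survives, "日本" becomes "??"
Console.WriteLine(original == roundTripped); // False: data was lost
```

Note that nothing throws here: the comparison with the original string is the only way to notice the loss, which is what makes this scenario so treacherous.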
File corrupted by disk, network, or “manual editing”
If random or corrupted bytes got into a file (e.g., after a disk crash or editing a binary file with a text editor), attempts to read it often throw decoding exceptions.
7. How to catch and handle errors in practice?
Since errors can occur at many stages of file handling, recommended practice:
- Use try-catch blocks to catch exceptions — primarily DecoderFallbackException and EncoderFallbackException.
- Don't hesitate to inform the user: if the file is corrupted or the encoding is wrong — it's better to say so than to show strange text.
- Automate encoding detection if possible (e.g., by BOM or using libraries like Ude), but always let the user pick an encoding if auto-detect fails.
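The BOM-based detection mentioned above can be sketched as a small helper (DetectFromBom and "mystery.txt" are illustrative names; it only recognizes the three most common BOMs and omits UTF-32):

```csharp
using System;
using System.IO;
using System.Text;

// Guess an encoding from the file's BOM; use the fallback when there is none
static Encoding DetectFromBom(string path, Encoding fallback)
{
    byte[] bom = new byte[3];
    using var fs = File.OpenRead(path);
    int read = fs.Read(bom, 0, 3);

    if (read >= 3 && bom[0] == 0xEF && bom[1] == 0xBB && bom[2] == 0xBF)
        return Encoding.UTF8;
    if (read >= 2 && bom[0] == 0xFF && bom[1] == 0xFE)
        return Encoding.Unicode;          // UTF-16 little-endian
    if (read >= 2 && bom[0] == 0xFE && bom[1] == 0xFF)
        return Encoding.BigEndianUnicode; // UTF-16 big-endian

    return fallback; // no BOM: we can only guess
}

// Usage: "A" in UTF-16 LE with a BOM is correctly recognized
File.WriteAllBytes("mystery.txt", new byte[] { 0xFF, 0xFE, 0x41, 0x00 });
var encoding = DetectFromBom("mystery.txt", Encoding.UTF8);
Console.WriteLine($"Guessed encoding: {encoding.WebName}"); // utf-16
```

A missing BOM proves nothing (UTF-8 files are often written without one), so the fallback parameter, or asking the user, remains essential.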
Typical code structure:
try
{
    using var reader = new StreamReader("file.txt", Encoding.GetEncoding("windows-1251"));
    string s = reader.ReadToEnd();
    Console.WriteLine(s);
}
catch (DecoderFallbackException ex)
{
    Console.WriteLine($"Could not read the file: {ex.Message}");
    // You can offer the user to try a different encoding
}
catch (IOException ex)
{
    Console.WriteLine($"I/O error: {ex.Message}");
}
8. Common pitfalls
- Trying to read a UTF-8 file as Windows-1251: at best you'll see mojibake, at worst you'll get an exception (if the encoding is configured to throw).
- Writing Russian text to a file in ASCII: anything outside the Latin alphabet will be replaced with "?" or cause an EncoderFallbackException.
- Reading a file without a BOM as UTF-8 when it's actually UTF-16: you'll get gibberish or may fail to read the file at all.
- Files without an explicit encoding from untrusted sources: always be cautious: even if a file opens "without errors", that doesn't guarantee the result is correct.