Java serialization formats

Published in the Serialization in Java group

Hi! Let's talk about serialization. You probably remember that we've already had lessons on serialization. And so we did :) Here's the first. And here's the second. If you don't remember well how serialization works, why it is needed, and what tools Java has for it, you can run through these lessons.

Today's lesson will be about theory. We're going to take a closer look at serialization formats. First, let's recall what serialization is. Serialization is the process of storing the state of an object in a sequence of bytes. Deserialization is the process of restoring an object from these bytes. A Java object can be serialized and sent over a network (for example, to another computer).

The sequence of bytes can be represented in different formats. You're familiar with this concept from ordinary computer use. For example, an electronic book (or a simple text document) can be written in a bunch of different formats:

docx (Microsoft Word format);
pdf (Adobe format);
mobi (commonly used on Amazon Kindle devices);
and many more (ePub, djvu, fb2, etc.).

In each case, the objective seems to be the same: present the text in a human-readable form. Still, people have invented a lot of different formats. Without going into the details of how they work, we can assume that they had good reasons. Each format has its own advantages and disadvantages compared to the rest. Maybe the various serialization formats were created following these same principles? Excellent guess, student! :) That's exactly right.

The reality is that sending data over a wire (or wirelessly) is tricky business, and it involves many factors. Who is sending the data? Where to? What volume? Will the recipient be a human or a computer (i.e. should the data be human-readable)? What device will read the data? Obviously, these situations are different. It's one thing to send a 500 KB image from one smartphone to another. And it's a completely different thing if we're talking about 500 terabytes of business data that must be optimally compressed and sent as quickly as possible.

Let's get acquainted with the main serialization formats and consider the advantages and disadvantages of each of them!

JSON

JavaScript Object Notation. You already know a little about this format! We talked about it in this lesson, and we covered serialization into JSON right here. It got its name for a reason: Java objects converted to JSON look almost exactly like object literals in JavaScript. You don't need to know JavaScript to understand our object:


{
   "title": "War and Peace",
   "author": "Lev Tolstoy",
   "year": 1869
}

We're not limited to sending a single object. The JSON format can also represent an array of objects:


[
 {
   "title": "War and Peace",
   "author": "Lev Tolstoy",
   "year": 1869
 },

 {
   "title": "Demons",
   "author": "Fyodor Dostoyevsky",
   "year": 1872
 },

 {
   "title": "The Seagull",
   "author": "Anton Chekhov",
   "year": 1896
 }
]

Because JSON represents JavaScript objects, it supports the following JavaScript data types:

strings;
numbers;
objects;
arrays;
booleans (true and false);
null.
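
To make the mapping between a Java object and these data types concrete, here is a minimal sketch that builds the JSON shown above by hand, using only the standard library. The Book class and its quote, toJson and toJsonArray methods are hypothetical names made up for this illustration, and the escaping covers only quotes and backslashes. In real code you would let a library such as Jackson do this work.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical class for this illustration; not part of any library.
class Book {
    final String title;
    final String author;
    final int year;

    Book(String title, String author, int year) {
        this.title = title;
        this.author = author;
        this.year = year;
    }

    // Wraps a string in double quotes, escaping backslashes and quotes only.
    // A real JSON library also escapes control characters.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // One book becomes a JSON object: strings are quoted, the number is not.
    String toJson() {
        return "{\"title\": " + quote(title)
                + ", \"author\": " + quote(author)
                + ", \"year\": " + year + "}";
    }

    // A list of books becomes a JSON array of objects.
    static String toJsonArray(List<Book> books) {
        return books.stream()
                .map(Book::toJson)
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        Book first = new Book("War and Peace", "Lev Tolstoy", 1869);
        Book second = new Book("Demons", "Fyodor Dostoyevsky", 1872);
        System.out.println(first.toJson());
        System.out.println(toJsonArray(List.of(first, second)));
    }
}
```

Notice that the year is written without quotes: it is a JSON number, while the title and author are JSON strings.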

What are the benefits of JSON?

Human-readable format. This is an obvious advantage if your end user is human. For example, suppose your server has a database with a schedule of flights. A human customer, sitting at his computer at home, requests data from this database using a web application. Because you need to provide data in a format that he can understand, JSON is a great solution.
Simplicity. It's super simple :) Above, we gave two examples of JSON. And even if you haven't heard about JavaScript (let alone JavaScript objects), you can easily understand the sort of objects described there. The whole of the JSON documentation fits on a single webpage with a couple of pictures.
Widespread use. JavaScript is the dominant front-end language, and JSON is its native data format, so using JSON on the web is practically a must. As a result, a huge number of web services use JSON as the data exchange format. Every modern IDE supports the JSON format (including IntelliJ IDEA). A bunch of libraries have been written for all sorts of programming languages to enable working with JSON.

For example, you've already worked with the Jackson library in a lesson where we learned to serialize Java objects into JSON. But besides Jackson, we have, for example, Gson, which is a very convenient library from Google.

YAML

Initially, YAML stood for "Yet Another Markup Language". When it began, it was positioned as a competitor to XML. Now, with the passage of time, YAML has come to mean "YAML Ain't Markup Language". What is it exactly? Let's imagine that we need to create 3 classes to represent characters in a computer game: Warrior, Mage, and Thief. They will have the following characteristics: power, agility, stamina, and a set of weapons. Here's what a YAML file describing our classes would look like:


classes:
 class-1:
   title: Warrior
   power: 8
   agility: 4
   stamina: 7
   weapons:
     - sword
     - spear
    
 class-2:
   title: Mage
   power: 5
   agility: 7
   stamina: 5
   weapons:
     - magic staff

 class-3:
   title: Thief
   power: 6
   agility: 6
   stamina: 5
   weapons:
     - dagger
     - poison

A YAML file has a tree structure: some elements are nested in others. Nesting is controlled by indentation: each level is indented with a consistent number of spaces (tabs are not allowed for indentation). What are the advantages of the YAML format?

Human-readable. Again, even if you see a YAML file without a description, you can easily understand the objects that it describes. YAML is so human-readable that the website yaml.org is an ordinary YAML file :)
Compactness. The file structure is created using spaces: in most cases there's no need to use brackets or quotation marks.
Support for native data structures for programming languages. A big advantage of YAML over JSON and many other formats is that it supports a variety of data structures. They include:
- !!map
  An unordered set of key-value pairs that cannot have duplicate keys;
- !!omap
  An ordered sequence of key-value pairs that cannot have duplicate keys;
- !!pairs
  An ordered sequence of key-value pairs that can have duplicate keys;
- !!set
  An unordered set of values with no duplicates;
- !!seq
  A sequence of arbitrary values.
You will recognize some of these structures from Java! :) This means that various data structures from programming languages can be serialized into YAML.
Ability to use anchors and aliases.

These markers allow you to identify some element in a YAML file, and then refer to it in the rest of the file if it occurs repeatedly. An anchor is created using the symbol &, and an alias is created using *.

Suppose we have a file describing books by Leo Tolstoy. In order to avoid writing out the author's name for each book, we simply create the leo anchor and refer to it using an alias when we need it:
```
books:
 book-1:
   title: War and Peace
   author: &leo Leo Tolstoy
   year: 1869

 book-2:
   title: Anna Karenina
   author: *leo
   year: 1873

 book-3:
   title: Family Happiness
   author: *leo
   year: 1859
```
When this file is parsed, the value "Leo Tolstoy" is substituted in the right places where we have our aliases.

YAML can embed data in other formats. For example, a JSON array can be used as a value exactly as written:


books: [
        {
          "title": "War and Peace",
          "author": "Leo Tolstoy",
          "year": 1869
        },

        {
          "title": "Anna Karenina",
          "author": "Leo Tolstoy",
          "year": 1873
        },

        {
          "title": "Family Happiness",
          "author": "Leo Tolstoy",
          "year": 1859
        }
      ]
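
The indentation rule described earlier in this section is simple enough to sketch in Java. The following is a minimal illustration, not a real YAML library: the YamlSketch class and its methods are hypothetical names, and the sketch assumes that values are maps, lists of plain scalars, or plain scalars that need no quoting. A LinkedHashMap is used so that the keys come out in insertion order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for this illustration; real projects would use a YAML library.
class YamlSketch {
    static String toYaml(Object value) {
        StringBuilder out = new StringBuilder();
        write(value, 0, out);
        return out.toString();
    }

    // Writes maps and lists in block style; each nesting level adds two spaces.
    // Assumption: list items and map values are scalars that need no quoting,
    // apart from nested maps and lists used as map values.
    static void write(Object value, int indent, StringBuilder out) {
        String pad = " ".repeat(indent);
        if (value instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                Object v = e.getValue();
                if (v instanceof Map || v instanceof List) {
                    out.append(pad).append(e.getKey()).append(":\n");
                    write(v, indent + 2, out);
                } else {
                    out.append(pad).append(e.getKey()).append(": ").append(v).append('\n');
                }
            }
        } else if (value instanceof List) {
            for (Object item : (List<?>) value) {
                out.append(pad).append("- ").append(item).append('\n');
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> warrior = new LinkedHashMap<>();
        warrior.put("title", "Warrior");
        warrior.put("power", 8);
        warrior.put("weapons", List.of("sword", "spear"));

        Map<String, Object> classes = new LinkedHashMap<>();
        classes.put("class-1", warrior);

        Map<String, Object> root = new LinkedHashMap<>();
        root.put("classes", classes);

        System.out.print(toYaml(root));
    }
}
```

Running main prints a fragment shaped like the game-character file shown earlier: every nested map or list is simply pushed two spaces further to the right.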

Other serialization formats

XML

This format is based on a tag tree.


<book>
   <title>Harry Potter and the Philosopher’s Stone</title>
   <author>J. K. Rowling</author>
   <year>1997</year>
</book>

Each element consists of an opening tag and a closing tag (for example, <title> and </title>), and elements can be nested inside other elements. XML is a common format, and in real projects it holds its own alongside JSON and YAML. We have a separate lesson about XML.
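
As a quick illustration of the tag tree, here is a minimal sketch that reads the book above with the DOM parser that ships with the JDK. The XmlBookReader class and its parse and childText methods are hypothetical names chosen for this example, and the XML is passed in as a string only to keep the sketch self-contained.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper for this illustration.
class XmlBookReader {
    // Parses an XML string and returns its root element.
    static Element parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the text of the first element with the given tag name inside parent.
    static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<book>"
                + "<title>Harry Potter and the Philosopher's Stone</title>"
                + "<author>J. K. Rowling</author>"
                + "<year>1997</year>"
                + "</book>";
        Element book = parse(xml);
        System.out.println(childText(book, "title"));
        System.out.println(childText(book, "author"));
        // XML text is always a string, so numbers have to be converted explicitly.
        System.out.println(Integer.parseInt(childText(book, "year")));
    }
}
```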

BSON (binary JSON)

As its name implies, BSON is very similar to JSON, but it is a binary format and is not human-readable. Because it can hold raw bytes directly, it is well suited to storing and transferring images and other attachments. In addition, BSON supports some data types not available in JSON. For example, a BSON document can include a date (stored as a number of milliseconds) or even a piece of JavaScript code. The popular MongoDB NoSQL database stores information in BSON format.
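
To get a feel for what "binary" means here, the following sketch encodes the tiny document {"year": 1869} by hand. It assumes the layout described in the public BSON specification (a little-endian length prefix, a type byte, a zero-terminated field name, the value, and a closing zero byte). The BsonSketch class and its methods are hypothetical names for this illustration; a real application would rely on a BSON library such as the one bundled with the MongoDB driver.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for this illustration; handles one 32-bit integer field only.
class BsonSketch {
    static byte[] encodeInt32Field(String name, int value) {
        byte[] key = name.getBytes(StandardCharsets.UTF_8);
        // length prefix + type byte + name + name terminator + value + document terminator
        int size = 4 + 1 + key.length + 1 + 4 + 1;
        ByteBuffer buf = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(size);          // total document length in bytes
        buf.put((byte) 0x10);      // element type: 32-bit integer
        buf.put(key);              // field name...
        buf.put((byte) 0x00);      // ...terminated by a zero byte
        buf.putInt(value);         // the value itself, little-endian
        buf.put((byte) 0x00);      // end of document
        return buf.array();
    }

    // Renders bytes as two-digit hex values separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(encodeInt32Field("year", 1869)));
    }
}
```

The number is stored as four raw bytes rather than as the characters 1, 8, 6 and 9, which is why a text editor cannot display such a file sensibly.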

Position-based protocol

In some situations, we need to drastically reduce the amount of data sent (for example, if we have a lot of data and need to reduce the load). In this situation, we can use a position-based protocol, that is, send parameter values without the names of the parameters themselves. The receiver tells the fields apart by their position alone.


"Leo Tolstoy" | "Anna Karenina" | 1873

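The idea can be sketched in a few lines of Java. This is a simplified illustration under stated assumptions: the PositionalCodec class is a hypothetical name, the field order (author, title, year) is something both sides have agreed on in advance, and the values are assumed never to contain the | separator. Unlike the line above, the sketch writes no quotes or padding spaces.

```java
// Hypothetical helper for this illustration.
class PositionalCodec {
    // Agreed field order: 0 = author, 1 = title, 2 = year.
    static String encode(String author, String title, int year) {
        return author + "|" + title + "|" + year;
    }

    // Splits a line back into its fields; the caller interprets them by position.
    static String[] decode(String line) {
        return line.split("\\|", -1);
    }

    public static void main(String[] args) {
        String line = encode("Leo Tolstoy", "Anna Karenina", 1873);
        String json = "{\"author\": \"Leo Tolstoy\", \"title\": \"Anna Karenina\", \"year\": 1873}";

        System.out.println(line);
        System.out.println("positional length: " + line.length());
        System.out.println("JSON length: " + json.length());

        String[] fields = decode(line);
        System.out.println("title = " + fields[1]);
    }
}
```

The saving comes entirely from dropping the field names and punctuation. The price is that neither side can read the data without knowing the agreed order.
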
Data in this format can take several times less space than a full JSON file, but the sender and the receiver must agree on the order of the fields in advance.

Of course, there are other serialization formats, but you don't need to know all of them right now :) It's good if you are familiar with the current industry standard formats when developing applications, and remember their advantages and how they differ from one another. And with this, our lesson comes to an end :) Don't forget to solve a couple of tasks today! Until next time! :)

Andrey Gorkovenko

Frontend Engineer at NFON AG

In the past, Andrey ran his web studio in Kyiv and worked as a front-end developer at CodeGym. Now he codes for a German product c ...

Comments (1)


Anonymous #11363932 Level 41, Germany, Germany

12 September 2023

Me personally, I'd expect at least a mention of CSV. I know it's bound to have errors, but in my opinion so is YAML, because one bad tab could break the entire data in a file. But CSV is very easy to parse compared to many other formats, yet still has serialization libraries in basically any language that's still commonly used.