1. Getting started with XML
Although JSON is the de facto standard for data exchange today, the Java world has plenty of legacy code, so you will still encounter XML, and knowing the basics is useful. XML (eXtensible Markup Language) is a text format for storing and exchanging structured data. It’s often used for configuration files, inter-program communication, data storage, and even object serialization.
Core elements of an XML document structure:
Tags (elements):
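A tag is the markup between < and >. An element consists of an opening tag (<tag>), its content, and a closing tag (</tag>); an element without content can be written as a single self-closing tag (<tag/>).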
Attributes:
Additional parameters inside a tag.
Example: <user name="Vasya" age="25"/>
Text nodes:
Text between tags.
Example: <greeting>Hello, world!</greeting>
Simple XML document example
<?xml version="1.0" encoding="UTF-8"?>
<users>
  <user name="Vasya" age="25">Hello!</user>
  <user name="Katya" age="30"/>
</users>
Here:
- <users> — the root element.
- <user> — an element with the name and age attributes; the first user has text content.
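To connect this to Java, here is a minimal sketch of reading the same document with the JDK's built-in DOM parser. The class name UsersReader and the method describeUsers are hypothetical names chosen for this illustration; only the XML itself comes from the example above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, used only for this illustration.
class UsersReader {
    // The sample document from the text, inlined so the sketch is self-contained.
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<users>\n"
            + "  <user name=\"Vasya\" age=\"25\">Hello!</user>\n"
            + "  <user name=\"Katya\" age=\"30\"/>\n"
            + "</users>";

    // Parses the XML and returns one "name (age): text" line per <user> element.
    static List<String> describeUsers(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList users = doc.getElementsByTagName("user");
            List<String> lines = new ArrayList<>();
            for (int i = 0; i < users.getLength(); i++) {
                Element user = (Element) users.item(i);
                // Attributes are read by name; the text node is read via getTextContent().
                lines.add(user.getAttribute("name") + " (" + user.getAttribute("age") + "): "
                        + user.getTextContent());
            }
            return lines;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        describeUsers(XML).forEach(System.out::println);
    }
}
```

The self-closing <user> element has no text node, so its text content is an empty string.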
2. Key concepts
Prolog:
The part of the document before the root element. It usually starts with the XML declaration, a line at the very beginning of the file that specifies the XML version and encoding.
Example: <?xml version="1.0" encoding="UTF-8"?>
Elements:
The basic building blocks of XML.
Example: <book>...</book>
Attributes:
Parameters inside an opening tag.
Example: <book title="Java" author="Ivanov"/>
Comments:
As in HTML, they are written between <!-- ... -->.
Example: <!-- This is a comment -->
CDATA sections:
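Let you insert text that the parser treats as plain character data rather than markup (for example, text containing < or &). The only sequence a CDATA section cannot contain is ]]>, because it ends the section.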
Example:
<script><![CDATA[
if (a < b && b > 0) { ... }
]]></script>
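As a small sketch of what the parser actually hands back, the code below reads the CDATA example and an equivalent version written with entity escapes, then compares the resulting text. The class name CdataDemo, its constants, and the escaped variant are assumptions added for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, used only for this illustration.
class CdataDemo {
    // The CDATA example from the text.
    static final String WITH_CDATA =
            "<script><![CDATA[\n"
            + "if (a < b && b > 0) { ... }\n"
            + "]]></script>";

    // The same content without CDATA: every < and & has to be escaped by hand.
    static final String WITH_ESCAPES =
            "<script>\n"
            + "if (a &lt; b &amp;&amp; b &gt; 0) { ... }\n"
            + "</script>";

    // Returns the text content of the root element.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOf(WITH_CDATA).trim());
        System.out.println(textOf(WITH_CDATA).equals(textOf(WITH_ESCAPES)));
    }
}
```

Both variants yield the same text, so CDATA is a convenience for the author of the document rather than a different kind of data for the program reading it.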
3. Why XML namespaces are needed
Problem: name conflicts
In large XML documents, you often encounter identical tag names with different meanings. For example, you might have <table> as part of HTML and <table> as part of your business logic (e.g., a database). How can you tell which tag means what?
Without a namespace:
<root>
  <table>
    <row>...</row>
  </table>
  <table>
    <column>...</column>
  </table>
</root>
Here it’s unclear which <table> refers to what.
Solution: namespaces
A namespace is a way to “label” elements and attributes to avoid name conflicts and clearly separate the meaning of different tags.
- Each namespace is identified by a unique URI (it usually looks like a link, but it’s just an identifier; the parser never downloads anything from it).
- Elements and attributes in different namespaces are considered different, even if the local name is the same.
Benefits:
- No name conflicts between different standards and schemas.
- You can combine data from different sources in one document.
- It’s clear which “vocabulary” each tag belongs to.
4. Using namespaces: declaration and usage
How to declare a namespace
In the opening tag of an element, use the special xmlns attribute (XML Namespace):
- Without a prefix: xmlns="URI" declares the default namespace for the element itself and for all nested elements that have no prefix. It does not apply to attributes.
- With a prefix: xmlns:prefix="URI" binds a namespace to a short name (a prefix), which you then put in front of tag names.
Example of declaring and using a namespace
<root xmlns:h="http://www.w3.org/TR/html4/"
      xmlns:f="http://www.w3schools.com/furniture">
  <h:table>
    <h:tr>
      <h:td>Cell 1</h:td>
      <h:td>Cell 2</h:td>
    </h:tr>
  </h:table>
  <f:table>
    <f:name>Table</f:name>
    <f:width>80</f:width>
    <f:length>120</f:length>
  </f:table>
</root>
Here:
- xmlns:h="..." — declares the h prefix for HTML table elements.
- xmlns:f="..." — declares the f prefix for furniture.
- <h:table> and <f:table> — these are now different elements, even though the local name is the same.
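From the Java side, the difference between the two table elements becomes visible once the parser is namespace-aware. The sketch below assumes the JDK's built-in DOM parser; the class name NamespaceLookup and its methods are hypothetical. Note that it looks elements up by namespace URI and local name, not by prefix.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, used only for this illustration.
class NamespaceLookup {
    static final String HTML_NS = "http://www.w3.org/TR/html4/";
    static final String FURNITURE_NS = "http://www.w3schools.com/furniture";

    // The example document from the text, written on fewer lines.
    static final String XML =
            "<root xmlns:h=\"" + HTML_NS + "\" xmlns:f=\"" + FURNITURE_NS + "\">"
            + "<h:table><h:tr><h:td>Cell 1</h:td><h:td>Cell 2</h:td></h:tr></h:table>"
            + "<f:table><f:name>Table</f:name><f:width>80</f:width>"
            + "<f:length>120</f:length></f:table>"
            + "</root>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace processing is off by default and has to be enabled explicitly.
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Counts <table> elements that belong to the given namespace ("*" matches any).
    static int countTables(String xml, String namespaceUri) {
        return parse(xml).getElementsByTagNameNS(namespaceUri, "table").getLength();
    }

    // Reads the <name> child of the furniture table, ignoring the HTML table.
    static String furnitureName(String xml) {
        Element table = (Element) parse(xml)
                .getElementsByTagNameNS(FURNITURE_NS, "table").item(0);
        return table.getElementsByTagNameNS(FURNITURE_NS, "name").item(0).getTextContent();
    }

    public static void main(String[] args) {
        System.out.println("HTML tables: " + countTables(XML, HTML_NS));
        System.out.println("Furniture tables: " + countTables(XML, FURNITURE_NS));
        System.out.println("Furniture name: " + furnitureName(XML));
    }
}
```

Because the lookup uses the URI, the code keeps working if a document author picks different prefixes for the same namespaces.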
How to use a namespace
- Declaration: In the root or any other element: xmlns:prefix="URI"
- Application: Put the prefix and a colon before the element or attribute name: <prefix:element>...</prefix:element>
Default namespace
If you declare just xmlns="URI", the element that carries the declaration and all nested elements without a prefix belong to that namespace. Attributes without a prefix are not affected.
<books xmlns="http://example.com/books">
  <book>...</book>
</books>
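The scope of a default namespace can be checked with a short sketch. It assumes the JDK's DOM parser in namespace-aware mode; the class name DefaultNamespaceDemo and the id attribute on <book> are additions made for this illustration and are not part of the example above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, used only for this illustration.
class DefaultNamespaceDemo {
    // The books example from the text; the id attribute is an illustrative addition.
    static final String XML =
            "<books xmlns=\"http://example.com/books\">"
            + "<book id=\"1\">...</book>"
            + "</books>";

    static Element firstBook(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // "*" matches any namespace, so this finds <book> wherever it belongs.
            return (Element) doc.getElementsByTagNameNS("*", "book").item(0);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Namespace of the unprefixed <book> element (inherited from <books>).
    static String elementNamespace(String xml) {
        return firstBook(xml).getNamespaceURI();
    }

    // Namespace of the unprefixed id attribute (null: no namespace).
    static String attributeNamespace(String xml) {
        return firstBook(xml).getAttributeNode("id").getNamespaceURI();
    }

    public static void main(String[] args) {
        System.out.println("book element:  " + elementNamespace(XML));
        System.out.println("id attribute:  " + attributeNamespace(XML));
    }
}
```

The element picks up the default namespace from its parent, while the unprefixed attribute stays outside any namespace.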
5. Practice: creating a simple XML document with multiple namespaces
Let’s create an example XML document that uses two namespaces: one for HTML and one for furniture.
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:h="http://www.w3.org/TR/html4/"
      xmlns:f="http://www.w3schools.com/furniture">
  <h:table>
    <h:tr>
      <h:td>Cell 1</h:td>
      <h:td>Cell 2</h:td>
    </h:tr>
  </h:table>
  <f:table>
    <f:name>Table</f:name>
    <f:width>80</f:width>
    <f:length>120</f:length>
  </f:table>
</root>
What’s happening here:
- The root element declares two namespaces: h and f.
- All elements with the h: prefix belong to the HTML table namespace.
- All elements with the f: prefix belong to the furniture namespace.
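The same document can also be produced from Java code instead of being typed by hand. The following is one possible sketch using the JDK's DOM and Transformer APIs; the class name NamespacedDocumentBuilder and the helper append are hypothetical, and the output is not indented, so it will not match the listing above character for character.

```java
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, used only for this illustration.
class NamespacedDocumentBuilder {
    static final String HTML_NS = "http://www.w3.org/TR/html4/";
    static final String FURNITURE_NS = "http://www.w3schools.com/furniture";

    // Creates a namespaced child element, optionally with text, and attaches it.
    static Element append(Element parent, String namespaceUri, String qualifiedName, String text) {
        Element child = parent.getOwnerDocument().createElementNS(namespaceUri, qualifiedName);
        if (text != null) {
            child.setTextContent(text);
        }
        parent.appendChild(child);
        return child;
    }

    // Builds the two-namespace document and returns it serialized as a string.
    static String build() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            // <root> itself is in no namespace; it only carries the two declarations.
            Element root = doc.createElementNS(null, "root");
            root.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:h", HTML_NS);
            root.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:f", FURNITURE_NS);
            doc.appendChild(root);

            Element htmlTable = append(root, HTML_NS, "h:table", null);
            Element row = append(htmlTable, HTML_NS, "h:tr", null);
            append(row, HTML_NS, "h:td", "Cell 1");
            append(row, HTML_NS, "h:td", "Cell 2");

            Element furniture = append(root, FURNITURE_NS, "f:table", null);
            append(furniture, FURNITURE_NS, "f:name", "Table");
            append(furniture, FURNITURE_NS, "f:width", "80");
            append(furniture, FURNITURE_NS, "f:length", "120");

            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

Each element is created with its namespace URI and a qualified name, which keeps the prefix and the namespace it stands for together at the point of creation.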
6. Common mistakes
Mistake 1: forgot to declare a prefix
<root>
  <h:table>...</h:table>
</root>
Result: A namespace-aware XML parser rejects the document with an error such as: The prefix 'h' for element 'h:table' is not bound.
Mistake 2: identical names without a namespace
<root>
  <table>...</table>
  <table>...</table>
</root>
Result: The document is well-formed, so the parser reports no error, but it’s impossible to distinguish which <table> refers to what.
Mistake 3: incorrect use of a prefix
<root xmlns:h="http://www.w3.org/TR/html4/">
  <h:table>...</h:table>
  <f:table>...</f:table> <!-- Error: prefix f is not declared -->
</root>
Result: A namespace-aware parser reports an error such as: The prefix 'f' for element 'f:table' is not bound.
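The unbound-prefix errors from mistakes 1 and 3 can be reproduced with a short sketch. It assumes the JDK's built-in DOM parser with namespace awareness switched on; the class name UnboundPrefixCheck, the method namespaceError, and the FIXED variant are hypothetical names introduced here. The exact wording of the error message depends on the parser, so the code only checks whether an error occurred.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, used only for this illustration.
class UnboundPrefixCheck {
    // Mistake 1: the h prefix is used but never declared.
    static final String MISTAKE_1 = "<root><h:table>...</h:table></root>";

    // Mistake 3: h is declared, but f is used without a declaration.
    static final String MISTAKE_3 =
            "<root xmlns:h=\"http://www.w3.org/TR/html4/\">"
            + "<h:table>...</h:table><f:table>...</f:table></root>";

    // Both prefixes declared: this version should parse cleanly.
    static final String FIXED =
            "<root xmlns:h=\"http://www.w3.org/TR/html4/\""
            + " xmlns:f=\"http://www.w3schools.com/furniture\">"
            + "<h:table>...</h:table><f:table>...</f:table></root>";

    // Returns the parser's error message, or null if the document parsed without errors.
    static String namespaceError(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Without this flag the parser does not check prefix bindings at all.
            factory.setNamespaceAware(true);
            factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Mistake 1: " + namespaceError(MISTAKE_1));
        System.out.println("Mistake 3: " + namespaceError(MISTAKE_3));
        System.out.println("Fixed:     " + namespaceError(FIXED));
    }
}
```

The fix in both cases is the same: declare every prefix on the element that uses it or on one of its ancestors.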