A lecture snippet with a mentor as part of the CodeGym University course.


"Greetings, Amigo. Once upon a time, you learned that in order to write a string of characters in code, you need to wrap them in double quotes."

"Yes, and that gives us a string literal. It wasn't very long ago that I found out about this."

"In our profession, that was a long time ago. But that's not the point right now. Instead, please tell me what to do if we need quotation marks inside a string literal?"

"Hmm... A string containing quotes — what could be easier. I'm sure there's some way..."

"Yes. Let's say we want to display the text "Friends" was nominated for an "Oscar". How would we do it?"

"To be honest, I have no idea. I can't think of anything."

"You won't be able to get to a solution through logic. Let me just show you what to do.

Code Notes
String s = ""Friends" was nominated for an "Oscar"";
This option will not work!"

"This option will not work, because the compiler interprets this as entirely different code:

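Code Notes
String s = "" Friends " was nominated for an " Oscar "";
The compiler sees three string literals ("", " was nominated for an " and "") with the stray words Friends and Oscar between them, so this will not compile!"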

"After the compiler encounters double quotes in the code, it treats what follows as the beginning of a string literal. The next double quotation mark indicates the end of the string literal."

"So how do you write double quotes inside of a literal?"

"There is a way. It is called escaping characters. You just write the quotation marks within the string of text. And before the quotes, you add the \ (backslash) symbol.

"This is what the string literal looks like when written properly:

Code Notes
String s = "\"Friends\" was nominated for an \"Oscar\"";
This will work!

"The compiler will interpret everything correctly and will not consider the quotation mark after the backslash as a normal quotation mark.

"What's more, if you output this string to the screen, the quotes with backslashes will be processed correctly, and the text will be displayed without any backslashes: "Friends" was nominated for an "Oscar"

"Well, I'm not going to say that this is super convenient..."

"But what can you do, those are the rules. Another important point. A quotation mark preceded by a backslash represents a single character: we're simply using slick notation that doesn't interfere with the compiler's ability to recognize string literals in our code. You can assign quotes to a char variable:

Code Notes
char c = '\"';
\" is one character, not two
char c = '"';
This is also possible: a double quotation mark inside single quotes
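To check the claim that an escaped quote is a single character, here is a minimal sketch. The class name QuoteEscapeDemo and the helper title() are made up for this illustration and are not part of the lesson.

```java
class QuoteEscapeDemo {
    // Hypothetical helper: builds the lesson's example string with escaped quotes.
    static String title() {
        return "\"Friends\" was nominated for an \"Oscar\"";
    }

    public static void main(String[] args) {
        String s = title();
        System.out.println(s);              // "Friends" was nominated for an "Oscar"
        System.out.println(s.charAt(0));    // " (the first character is the quote itself)
        System.out.println("\"".length());  // 1 (the backslash is not stored in the string)
        System.out.println('\"' == '"');    // true (both char literals denote the same character)
    }
}
```

The backslash exists only in the source code. Once the program is compiled, the string simply contains the quotation mark.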

Common situations that occur when escaping characters

"In addition to double quotes, there are many other characters that the compiler handles in a special way. For example, a line break.

"How do we add a line break to a literal? There is also a special combination for this:

\n
Line break character

"If you need to add a line break to a string literal, you just add a couple of characters" \n.

Example:

Code Console output
System.out.println("Best regards, \n Anonymous");
Best regards,
Anonymous
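A small sketch of what \n does to the string itself, assuming a variant of the example without the spaces around \n (in the lesson's version, the space after \n becomes the first character of the second line). The names LineBreakDemo and signature() are illustrative only.

```java
class LineBreakDemo {
    // Hypothetical helper: the lesson's text with the line break and no extra spaces.
    static String signature() {
        return "Best regards,\nAnonymous";
    }

    public static void main(String[] args) {
        // Splitting on the newline character shows that the literal holds two lines.
        String[] lines = signature().split("\n");
        System.out.println(lines.length);  // 2
        System.out.println(lines[0]);      // Best regards,
        System.out.println(lines[1]);      // Anonymous
    }
}
```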

"There are a total of 8 special combinations like this, which are also called escape sequences. Here they are:

Code Description
\t Insert a tab character
\b Insert a backspace character
\n Insert a newline character
\r Insert a carriage return character
\f Insert a form feed (page feed) character
\' Insert a single quotation mark
\" Insert a double quotation mark
\\ Insert a backslash
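Each sequence in the table stands for exactly one character with its own numeric code. The sketch below prints those codes; EscapeCodesDemo and codeOf() are hypothetical names chosen for this example.

```java
class EscapeCodesDemo {
    // Hypothetical helper: returns the numeric code of a character.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('\t'));  // 9  (tab)
        System.out.println(codeOf('\b'));  // 8  (backspace)
        System.out.println(codeOf('\n'));  // 10 (newline)
        System.out.println(codeOf('\r'));  // 13 (carriage return)
        System.out.println(codeOf('\f'));  // 12 (form feed)
        System.out.println(codeOf('\''));  // 39 (single quotation mark)
        System.out.println(codeOf('\"'));  // 34 (double quotation mark)
        System.out.println(codeOf('\\'));  // 92 (backslash)
    }
}
```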

"You already showed me two of them. What do the other 6 mean?"

"I'll explain it all right now.

\t is a tab character

When this sequence appears in text, it is equivalent to pressing the Tab key while typing. It shifts the text that follows it to the next tab stop, which makes it possible to align text in columns.

Example:

Code Console output
System.out.println("0\t1\t2\t3");
System.out.println("0\t10\t20\t30");
System.out.println("0\t100\t200\t300");
0       1       2       3
0       10      20      30
0       100     200     300

\b means 'go back one character'

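This sequence in a string is equivalent to pressing the Backspace key on the keyboard. On a console that supports it, the cursor moves back one position, so the text that follows overwrites the preceding characters. The characters are not removed from the string itself, and some consoles (for example, in an IDE) may display the result differently: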

Code Console output
System.out.println("Hello\b\b World");
Hel World

\r is the carriage return character

This character moves the cursor to the beginning of the current line without changing the text (the exact behavior depends on the console). Whatever is displayed next will overwrite the existing text.

Example:

Code Console output
System.out.println("Greetings\rWorld!");
World!ngs
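Both \b and \r only affect how a console draws the text. The string itself still holds every character, including the control characters. Here is a minimal sketch with strings modeled on the two examples above; the class and method names are hypothetical.

```java
class ControlCharsDemo {
    // Hypothetical helpers returning strings modeled on the lesson's examples.
    static String withBackspaces() {
        return "Hello\b\b World";
    }

    static String withCarriageReturn() {
        return "Greetings\rWorld!";
    }

    public static void main(String[] args) {
        // 5 letters + 2 backspace characters + 6 characters of " World"
        System.out.println(withBackspaces().length());           // 13
        // Nothing was deleted: "Hello" is still there in full.
        System.out.println(withBackspaces().startsWith("Hello")); // true
        // 9 letters + 1 carriage return + 6 characters of "World!"
        System.out.println(withCarriageReturn().length());        // 16
        // The carriage return sits right after "Greetings".
        System.out.println(withCarriageReturn().indexOf('\r'));   // 9
    }
}
```

The overwriting effect shown in the console output tables happens in the terminal, not in the String object.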

\f is a form feed (page feed) character

This symbol comes down to us from the days of the first dot matrix printers. Outputting this sequence to a printer would cause the printer to simply feed out the current sheet, without printing any text, until a new page begins.

Now we would call it a page break or new page.

\\ is a backslash

Everything is straightforward here. If we use a backslash to escape characters in our text, then how do we write a backslash character itself in the string?

It's simple: to put one backslash in the text, you have to write two in a row.

Example:

Code Notes
System.out.println("c:\projects\my\first");
The compiler will report an error: \p and \m are not valid escape sequences.
System.out.println("c:\\projects\\my\\first");
That's how it's done right!
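As a quick check that each doubled backslash in the source becomes a single backslash in the string, consider this sketch. BackslashDemo and path() are illustrative names, not part of the lesson.

```java
class BackslashDemo {
    // Hypothetical helper: the lesson's path written with escaped backslashes.
    static String path() {
        return "c:\\projects\\my\\first";
    }

    public static void main(String[] args) {
        String p = path();
        System.out.println(p);           // c:\projects\my\first
        System.out.println(p.length());  // 20
        // Count the backslash characters actually stored in the string.
        long backslashes = p.chars().filter(ch -> ch == '\\').count();
        System.out.println(backslashes); // 3
    }
}
```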

"Using double slashes makes sense. But I haven't immediately been able to memorize everything else. I'll have to rely on your hints."

"Gradually, you'll remember what you need. Don't worry. And for everything else, there's Google.

Unicode encoding

"You already know that each character displayed on the screen corresponds to a specific numerical code. A standardized set of these codes is called an encoding.

"Once upon a time, when computers were newly invented, seven bits (less than one byte) were enough to encode every character. The first encoding contained only 128 characters. This encoding was called ASCII."

"That's a strange name."

"There's nothing strange about it. It's an abbreviation. ASCII stands for American Standard Code for Information Interchange — a standard American code table for printable characters and some special codes."

"It consists of 33 non-printable control characters (which affect how text and spaces are processed) and 95 printable characters, including numbers, uppercase and lowercase Latin letters, and several punctuation marks.

"As computers grew in popularity, each country began to release its own encoding. Usually, they took ASCII as a starting point and replaced rarely used ASCII characters with symbols from their respective alphabets.

"Over time, an idea emerged: create a single encoding that contains all the characters of every encoding in the world.

"Thus, in 1993, the Unicode encoding was created, and the Java language became the first programming language that used this encoding as the standard for storing text. Now Unicode is the standard for the entire IT industry.

"Although Unicode itself is the standard, it has several representations or Unicode transformation formats (UTF): UTF-8, UTF-16 and UTF-32, etc.

"Java uses an advanced version of Unicode encoding — UTF-16: each character is encoded in 16 bits (2 bytes). It can accommodate up to 65,536 characters! You can find almost every character of every alphabets in the world in this encoding."

"I hope I don't need to know it by heart?"

"If you want to, go for it!"

"Okay, fine. I'll use this rule: you can't know everything, but you can Google everything."

"Adopting a rational approach is everything. So, to write a Unicode character in your program using its code, you need to write \u + the code in hexadecimal. For example, \u00A9

Code Console output
System.out.println("\u00A9 CodeGym");
© CodeGym
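The following sketch shows that a Unicode escape is just another way of writing one char, and confirms the 2-byte size of char mentioned earlier. UnicodeEscapeDemo and copyright() are made-up names, and whether the console draws the copyright sign correctly depends on its encoding settings.

```java
class UnicodeEscapeDemo {
    // Hypothetical helper: the lesson's string with a Unicode escape for the copyright sign.
    static String copyright() {
        return "\u00A9 CodeGym";
    }

    public static void main(String[] args) {
        String s = copyright();
        System.out.println(s.length());                   // 9 (the escape is one character)
        System.out.println((int) s.charAt(0));            // 169, which is A9 in hexadecimal
        System.out.println('\u0041' == 'A');              // true (hex 41 is the code of 'A')
        System.out.println(Character.BYTES);              // 2
        System.out.println((int) Character.MAX_VALUE);    // 65535
    }
}
```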

Unicode: code point

"640 kilobytes ought to be enough for everyone! Or not". Bill Gates once said that. Or not. At least this quote is attributed to him."

"Haha. 640 kilobytes isn't even enough to load a cleaning robot's brain."

"Life is rough, and over time, the UTF-16 encoding began to be inadequate. It turns out that there are a lot of Asian languages, and they have a lot of glyphs. And all these glyphs simply cannot be crammed into 2 bytes."

"So what do we do?"

"Use more bytes! But the char type is only 2 bytes and changing it to 4 is not so easy: billions of lines of Java code have been written all over the world, which would break if the char type suddenly becomes 4 bytes a Java machine. So we can't change the char type!

"There is another approach. Remember how we escape characters by putting a backslash in front of them. Basically, we encoded a single character using multiple characters. Java's creators decided to use the same approach.

"Some characters that visually appear as a single character are encoded as two chars in a string:

Code Console output
System.out.println("\uD83D\uDD0A");
🔊

"Now your Java program can even output emojis to the console 😎"

"I'll definitely use that to have some fun!"
