
Regular expressions (with examples)

Java Multithreading
Level 2, Lesson 11

"And now I'll tell you about regular expressions. This topic is both complex and simple at the same time. To thoroughly understand regular expressions, you may need to read two or three hefty books, but I can teach you how to use them right now."

"As experienced programmers like to joke, if you have a problem and think you're going to solve it with regular expressions, now you have two problems."

"Hmm."

"I hope I didn't scare you too much, my friend. No?"

"Okay, good. So, our new topic is regular expressions."

"If we oversimplify them, regular expressions are patterns for strings."

"You can check whether a string matches a given pattern. You can also split a string into parts using a delimiter or a pattern."

"But let's start with something simple: what is a pattern?"

"In SQL (but not in Java), you can check whether a string matches a particular pattern. This is how it looks:"

name like 'Alex%'

Here name is a column (or variable), like is the operator that checks the pattern, and 'Alex%' is the pattern.

In this case, % stands for any sequence of characters, including an empty one.

Pattern    Strings matching the pattern
'Alex%'    Alex, Alexandr, Alexander, Alexandra, …
'%x%'      Max, Maxim, Alexandr
'%a'       Olga, Helena, Ira

"In SQL, if you need to specify that there should only be one other character, then you would use the underscore character: "_"."

Pattern     Strings matching the pattern
'Alex%_'    Alexandr, Alexander, Alexandra, … (but not Alex: the underscore requires at least one more character)
'_x'        Ax, Bx, Cx
'___'       Aaa, Aab, Bbb

"That makes sense."

"Okay, then let's move on to regular expressions."

"Regular expressions typically include restriction not only on the number of characters, but also their 'content'. "Any mask usually consists of two (sometimes more) parts: the first describes character 'preferences', and the second describes the number of characters."

"Here are some content examples:"

Pattern    Description                                                  Example
.          Any one character                                            1
\d         Any digit                                                    7
\D         Any non-digit                                                C
\s         A space, line break, or tab character                        ' '
\S         Anything except spaces, tabs, and line breaks                f
[a-z]      Any letter from a to z                                       z
[0-9]      Any digit from 0 to 9                                        8
\w         Any word character (a letter, a digit, or an underscore)     c
\W         Any non-word character                                       !
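
These classes can be tried out one character at a time. A small sketch follows, with an illustrative class name and helper. Each backslash is doubled because the pattern is written inside a Java string literal, a point the lesson returns to later:

```java
import java.util.regex.Pattern;

public class CharClassDemo {

    // True if the whole input matches the regular expression.
    static boolean isMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(isMatch(".", "1"));      // true
        System.out.println(isMatch("\\d", "7"));    // true
        System.out.println(isMatch("\\D", "C"));    // true
        System.out.println(isMatch("\\s", "\t"));   // true
        System.out.println(isMatch("\\S", "f"));    // true
        System.out.println(isMatch("[a-z]", "Z"));  // false: the range is lowercase only
        System.out.println(isMatch("\\w", "_"));    // true: the underscore is a word character
        System.out.println(isMatch("\\W", "!"));    // true
    }
}
```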

"I won't remember those right off, but it doesn't look too hard."

"Excellent, then here are examples of the number of characters in a mask:"

Pattern    Description                                        Examples
A?         The character 'A' occurs once or not at all        A
B+         The character 'B' occurs one or more times         BBBB
C*         The character 'C' occurs zero or more times        CCC
D{n}       The character 'D' occurs n times                   The pattern D{4} matches DDDD
E{n,}      The character 'E' occurs n or more times           The pattern E{2,} matches EEEEEEE
F{n,m}     The character 'F' occurs between n and m times     The pattern F{2,4} matches FFFF
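
The counts can be checked with a few whole-string matches. This is a sketch with an illustrative class name and helper. The last lines hint at why "zero or more" is useful: on its own C* also matches an empty string, but inside a larger pattern it makes a part optional and repeatable:

```java
import java.util.regex.Pattern;

public class QuantifierDemo {

    // True if the whole input matches the regular expression.
    static boolean isMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(isMatch("A?", ""));           // true: zero occurrences are allowed
        System.out.println(isMatch("A?", "AA"));         // false: at most one 'A'
        System.out.println(isMatch("B+", ""));           // false: at least one 'B' is required
        System.out.println(isMatch("C*", ""));           // true
        System.out.println(isMatch("D{4}", "DDD"));      // false: exactly four are required
        System.out.println(isMatch("E{2,}", "EEEEEEE")); // true
        System.out.println(isMatch("F{2,4}", "FFFFF"));  // false: five is more than four
        System.out.println(isMatch("AC*B", "AB"));       // true: no 'C' between 'A' and 'B'
        System.out.println(isMatch("AC*B", "ACCCB"));    // true
        System.out.println(isMatch("AC*B", "AXB"));      // false
    }
}
```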

"That all seems pretty straightforward."

"You're catching on to everything so quickly. Now let's see how it looks all together:"

Pattern       Description                                                    Examples
[a-d]?        A character between 'a' and 'd' occurs once or not at all      a, b, c, d
[b-dz]+       The characters 'b', 'c', 'd', or 'z' occur one or more times   b, bcdcdbdbdbdbzzzzbbzbzb, zbz
[17-9]*       The digits 1, 7, 8, or 9 occur zero or more times              1, 7, 9, 9777, 111199
1{5}          The digit 1 occurs 5 times                                     11111
[12ab]{2}     The symbols 1, 2, 'a', or 'b' occur twice                      11, 12, 1a, ab, 2b, bb, 22
[a0]{2,3}     The symbols 'a' or 0 occur 2 or 3 times                        aa, a0, 00, 0a, aaa, aa0, a0a, a00, 0aa, 0a0, 00a, 000

Note: the characters inside square brackets are listed without separators. A comma inside the brackets would itself become one of the allowed characters.
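
One detail is worth checking in code: inside square brackets every character counts, so a comma placed between the alternatives is itself treated as an allowed character. A short sketch, with an illustrative class name and helper:

```java
import java.util.regex.Pattern;

public class CombinedPatternDemo {

    // True if the whole input matches the regular expression.
    static boolean isMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(isMatch("[b-dz]+", "zbz"));    // true
        System.out.println(isMatch("[17-9]*", "9777"));   // true
        System.out.println(isMatch("[12ab]{2}", "1a"));   // true
        System.out.println(isMatch("[a0]{2,3}", "00a"));  // true
        System.out.println(isMatch("[a0]{2,3}", "a"));    // false: too short

        // With a comma inside the brackets, the comma is accepted too.
        System.out.println(isMatch("[b-d,z]+", "b,z"));   // true
        System.out.println(isMatch("[b-dz]+", "b,z"));    // false
    }
}
```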

"Still all clear."

"Really? Hmm. Either I explained everything really well or you're too quick on the uptake. Well, either way, that's good for us."

"Here are a couple of new insights for you."

"Since regular expressions are often used to find substrings, we can add two more characters (^ and $) to our patterns."

"^ means that the substring must include the beginning of the string."

"$ means that the substring must include the end of the string."

"Here are some examples:"

Pattern    Matches in the string "aaa a aaa a aaa" (shown in brackets)
a{3}       [aaa] a [aaa] a [aaa]
a{3}$      aaa a aaa a [aaa]
^a{3}      [aaa] a aaa a aaa
^a{3}$     aaa a aaa a aaa (no match: the whole string would have to be exactly "aaa")
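
To look for substrings rather than test the whole string, Java offers Matcher.find(). The sketch below collects the start index of every match in the sample string. The class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorDemo {

    // Returns the start index of every substring of text that matches regex.
    static List<Integer> matchStarts(String regex, String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String text = "aaa a aaa a aaa";
        System.out.println(matchStarts("a{3}", text));    // [0, 6, 12]
        System.out.println(matchStarts("a{3}$", text));   // [12]
        System.out.println(matchStarts("^a{3}", text));   // [0]
        System.out.println(matchStarts("^a{3}$", text));  // []
    }
}
```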

"And one more important point."

"In regular expressions, the following characters have special meaning: [ ] \ / ^ $ . | ? * + ( ) { }. They're called control characters. So, you can't simply use them in strings."

"As in Java code, they must be escaped. "And again as in Java code, the '\' character is used for this."

"If we want to describe a string consisting of three '?' characters, we can't write '?{3}', because '?' is a control character. We need to do it like this: \?{3}. If we want to use a '\' character, then we need to write '\\'."

"OK, got it."

"And now here's another interesting tidbit. In files with Java code, the '\' character must also be escaped in strings, since it's a control character."

"Of course."

"So, if you're trying to define a Java regular expression in a string, then you need to escape the '\' character twice."

"Here's an example:"

I want a mask that matches 'c:\anything'
In theory, the regular expression should look like this:
one 'c' character,
colon,
backslash,
period and asterisk (to denote any number of any characters). I added spaces to improve readability:
c : \ .*
But the '\' character needs to be escaped. The period must not be escaped, because here we want its special meaning ("any one character"). So the regular expression will look like this:
c : \\ .*
Or, without spaces:
c:\\.*
"We have two backslashes in our regular expression. Each of them must be escaped once more in a Java string, so in a Java file the regular expression will look like this:"
String regexp = "c:\\\\.*";
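
The effect of each backslash can be verified directly. In the sketch below (class, constant, and method names are illustrative), four backslashes in the Java literal produce the regular expression c:\\.* which matches 'c:\anything'. Six backslashes produce c:\\\.* where the dot is escaped as well, so that pattern accepts only 'c:\' followed by literal dots:

```java
import java.util.regex.Pattern;

public class PathRegexDemo {

    // Java literal with 4 backslashes -> regex c:\\.*  (backslash, then any characters)
    static final String ANYTHING_ON_C = "c:\\\\.*";

    // Java literal with 6 backslashes -> regex c:\\\.*  (backslash, then zero or more dots)
    static final String DOTS_ON_C = "c:\\\\\\.*";

    // True if the whole input matches the regular expression.
    static boolean isMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String path = "c:\\anything";  // the actual text is c:\anything

        System.out.println(isMatch(ANYTHING_ON_C, path));   // true
        System.out.println(isMatch(DOTS_ON_C, path));       // false
        System.out.println(isMatch(DOTS_ON_C, "c:\\..."));  // true
    }
}
```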

"Wow! Whoa. Now I know."

"And if you decide to dig deeper into this, here are a couple of good links:"

Lesson on Wikipedia

Comments (10)
Evgeniia Shabaeva Level 40, Budapest, Hungary
7 January 2025
...So, I guess I'm the next person to ask what the phrase "zero or more times" means in this context...
manny9876 Level 36, Israel
2 November 2023
\\\\\\\This lesson;
元. Level 26, Taipei, Taiwan, Province of China
14 May 2023
Finally going to teach this😂
Hubert Matlak Level 34, Poland, Poland
6 April 2023
"C* -- The character 'C' occurs zero or more times". What's that used for? It seems like it's always true, any character can occur zero or more time :)
Justin Smith Level 41, Greenfield, USA, United States
15 December 2021
"The digits 1, 7, 8, or 9 occur zero or more times" Wouldn't this pattern be useless? It would match everything. Every character occurs "0 or more times" in every string.
Mateusz Level 29, Poland
25 September 2020
There is a mistake at the end in: String regexp = "c:\\\\\\.*";. There should be only 4 backslashes, not 6.
Alex Vypirailenko Level 41, USA
26 September 2020
You do need 6 backslashes in the regex, if you want 3 backslashes in the output.
Mateusz Level 29, Poland
26 September 2020
You're right but the thing is we don't need 3 backslashes in the output. We need just two. The article claims that the dot needs to be escaped but it does not. If we escaped the dot, we would be looking for a dot in our string, but there are no dots in it. We want the standard functionality of a dot, which is "any one character", so we can't escape it. Uff, I hope I explained it well :)
fzw Level 41, West University Place, United States
18 April 2020
For [a,0]{2,3}, there should be more, like 00a
Juan Ma Level 41, Arauca, Colombia
22 March 2020
I have always liked regular expressions :)