About the author
Matthias Kowalewski, the author of this document, is, despite popular
belief, not a real horse. He loves regex as much as being random.
When he is not actively hosting regex challenges in the Splunk> community,
he works as a Splunk> Consultant with a focus on IT security. He also
enjoys playing strategy games on his computer or hacking into
vulnerable machines that were set up in his private security lab.
Find Matthias on LinkedIn: www.linkedin.com/in/matthiaskowalewski
Licensing
This document was published under the Creative Commons license BY-NC-SA 4.0
Learn more about Creative Commons here:
https://creativecommons.org/
Learn more about BY-NC-SA 4.0:
https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode
In the words of the author
Use this guide to challenge your friends or colleagues. Share it with everyone.
Be aware that it is prohibited to use the presented material in any commercial way. You are not
allowed to make money with the content of this document. Do not use the content of this
document elsewhere without appropriately crediting the author and origin.
Jump to Glossary Licensed using creative commons BY-NC-SA 4.0
Glossary
Licensing
Glossary
Introduction
About this document
How to use this document?
What regex flavor should I use?
What regex editor should I use?
Where can I learn more about regular expressions?
About the challenges
How difficult are the challenges?
Did horsefez come up with all the challenges by himself?
Feedback
Introduction to the Challenges
Structure of challenges
Additional info
The Challenges
The famous writer
The end of your sentence
The 29th of February
Numbers
Asterisk the Gaul
Markups, Markdowns, Markarounds
Roman Numerals
Vali-Dates
Credit Card Numbers
The Picky Painter
Revelation
Acknowledgements
About the author
Contact me for feedback!
Introduction
About this document
This document serves as a collection of regular expression (regex) challenges held by horsefez
during the ‘Regex Tuesday’ events in the Splunk> community user group on Slack. This
document includes ten challenges, plus one bonus challenge that has never been revealed
before.
How to use this document?
This document should encourage you to overcome your struggles with regex by challenging you
in fun and creative ways. Challenge yourself, your friends or coworkers to a regex-off and find
out who the real regex master is.
What regex flavor should I use?
You can use any regular expression flavor you like or feel comfortable with. Just note that the
challenges were originally run using PCRE (PHP<7.3). Scores were calculated with the PCRE
engine implemented on regex101.com.
What regex editor should I use?
There are multiple regular expression editors and engines on the web. The author prefers to use
regex101.com as it is easy to use and comes with a debugger option, quick-reference,
explanations and a mode to use substitution. It additionally allows you to save, fork and share
your solutions with friends and colleagues.
Where can I learn more about regular expressions?
As with regex editors, there are many learning resources for regular
expressions. Sites like rexegg.com and regular-expressions.info are good starting points. These
are the sites where the author learned and improved his skills. There is also a fun site called
regexcrossword.com where you fill out crossword puzzles with text that has to
match regular expressions.
There are also so-called ‘regex-golf’ sites, but the author discourages beginners from trying them,
as a lot of those so-called ‘challenges’ require brute force to get to the
solution.
About the challenges
The challenges have all been tested in the field: many people have already tried them and
come up with solutions, so they are doable. They might look difficult or
impossible to some of you at first glance, but you can trust the author that they are indeed
possible.
How difficult are the challenges?
The author tried to sort the challenges so that they start fairly easy and ramp up in
difficulty towards the end. This document uses a horse-based difficulty ranking system (HbDRS)
which shows the difficulty using horse figures. Example of HbDRS in action:
Difficulty Rating “Easy” =
Difficulty Rating “Medium” =
Difficulty Rating “Hard” =
Difficulty Rating “Insane” =
Did horsefez come up with all the challenges by himself?
No, not at all. Many of the challenges present in this collection were found on the web. Big
thanks go out to Callum Macrae who not only gave me the idea for the ‘Regex Tuesday’ events
but also provided a lot of the challenges. Fun fact, they aren’t his either as he also just found
them on the web and collected them on his website here callumacrae.github.io/regex-tuesday/.
The challenges on Callum Macrae’s site were designed to be done by using the JavaScript
implementation of regex. The author went ahead and made them PCRE compatible, added
additional descriptions, wrote an extended ruleset and adjusted the challenge data.
Feedback
I would love to hear your feedback on all things regex.
Check out the Contact me for feedback! section for information on how to reach me.
Introduction to the Challenges
Structure of challenges
Title
The title of the challenge.
Difficulty (HbDRS)
The difficulty as estimated by the author.
Preamble
An optional short introduction story.
Description
A description on what to actually achieve in the challenge.
Rules
The ruleset for the current challenge. Mostly the same on all challenges.
Winning Categories
The categories that were judged in the original installment of the challenge.
Mostly ‘fewest steps’ and ‘fewest chars’. Be aware that most of the time there is one solution
that has an optimal step count, but not an optimal char count and vice versa. Solutions that
have good scores in both categories are rare.
Score-Self-Check
Shows score ranges, so you can better evaluate your own solutions.
Hints
Optional hints the author might have.
Challenge Data
The data you need to match. Put it into the ‘test string’ section on regex101.
Expected Output
What the end results should look like.
Helpful Links
Links to techniques that may help you in the challenge.
Additional info
Default Rules
Do not try skipping over unwanted words/lines by using tricks like this or similar:
(?<match1>\w{15})\s*\d{7}\n(?<match2>\w{36})
Instead write logic that works with the content of every line independently.
Do not use anchors like \A, \Z, \z or \G when writing your regular expression.
Instead use anchors like \b, \B, ^ or $.
Do not make your regex fail on purpose when it encounters the part of data that should
not match using trickery with regex control-verbs.
Instead write a regex that evaluates every line of the data individually. Regex
control-verbs are allowed to be used. But don’t try to get them to work as a beginner.
Do not use flags ‘A’ and ‘J’.
Use flags ‘g’ (global) and ‘m’ (multiline) as default. Other flags are also allowed.
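To see what the default flags do outside of regex101, here is a minimal sketch. It assumes Python's re module rather than PCRE: re.MULTILINE plays the role of the ‘m’ flag and re.finditer behaves roughly like the ‘g’ flag. The sample data and the pattern are made up for illustration and are not part of any challenge.

```python
import re

# Made-up sample data: one candidate per line.
data = "horse\nhorse42\npony\n"

# '^' and '$' anchor at every line because of re.MULTILINE (the 'm' flag).
pattern = re.compile(r"^[a-z]+$", re.MULTILINE)

# finditer walks through all matches, much like the 'g' flag does.
matches = [m.group(0) for m in pattern.finditer(data)]
print(matches)  # ['horse', 'pony']
```

Each line is judged on its own content: the middle line fails because of its digits, not because of where it sits in the data.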
Debugger
On regex101.com there is a neat debugger built into the site. You can find it on the left side of
the screen and should use it to optimize your step count after you’ve come up with a working
solution or when you need to understand how the regex engine actually works.
Yes, I encourage you to make use of this feature extensively.
Stuck somewhere?
Oftentimes it is useful to start from scratch. If you struggle to solve a problem in life it is often
good to look at it from another angle. Regex is no different. Save what you have and start anew.
Substitution function
Some of the upcoming challenges require you to use the built-in substitution function of
regex101.com. It might frighten you a bit at first, but let me assure you that it is rather easy to
use.
You can find the substitution function here. Just click on it and it shows up.
Let us make a short example to show you how it works and what it actually does.
With the power of regex, we changed their opinion about regular expressions. Neat.
You can find another good example and explanation over here:
https://sodocumentation.net/regex/topic/9852/substitutions-with-regular-expressions
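If you prefer to see a substitution in code, the following is a small sketch using Python's re.sub. It is not the example from the screenshots; the test string, the names and the pattern are hypothetical and only show the principle of capturing a part of the match and reusing it in the replacement.

```python
import re

# Hypothetical test string, made up for this sketch.
text = "Alice hates regex. Bob hates regex."

# Group 1 captures the name; the replacement keeps it and swaps the verb.
# Python writes the reference to group 1 as \1 in the replacement string.
result = re.sub(r"(\w+) hates", r"\1 loves", text)
print(result)  # Alice loves regex. Bob loves regex.
```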
The Challenges
The famous writer
HbDRS
Preamble
In a land far far away...
A famous novel writer once told me the biggest challenge he was facing whenever he had
finished a new book was correcting mistakes he made while writing it.
One of the problems was that whenever he drank alcoholic beverages before the writing session
he would occasionally repeat words twice ‘twice’.
It previously was very difficult for him to iron out those mistakes whenever he did some
proofreading afterwards. Luckily for him, he employed you to solve his problems.
Description
Find the words that are occurring twice twice right after one another. Remember that those words
could also be case CASE sensitive.
It can happen that some words contain other words in them and therefore will also match if the
regex is written poorly. Only words that are coming right after each other should be matched.
To help the novel author to better find the words he should delete we are going to encase the
second word in html tags, formatting it to bold text using <b>bold text</b>.
You need to use the built-in ‘substitution’ function of regex101.
Check the ‘Expected Output’ section for further clarification.
Rules
Default rules.
Winning Categories
Fewest steps.
Count of steps displayed on regex101.com
Fewest chars.
Count of characters of the regex + Count of characters of the substitution
Score-Self-Check
Fewest steps:
Good ~ 1000 steps; Great ~ 600 steps; Amazing ~ 450 steps
Fewest chars:
Good ~ 50 chars; Great ~ 40 chars; Amazing ~ 30 chars
Hints
(\b) Word boundaries.
(\1) Backreference.
(?i) Flags.
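The three hints can be tried out in isolation before combining them. The sketch below uses Python's re module and made-up strings that have nothing to do with the challenge data; it only shows what each construct does on its own.

```python
import re

# \b: 'cat' only as a whole word, not inside 'concatenate'.
whole_words = re.findall(r"\bcat\b", "cat concatenate cat.")

# \1: refers back to whatever group 1 captured, here a doubled letter.
doubled = [m.group(0) for m in re.finditer(r"([a-z])\1", "book keeper")]

# (?i): inline flag that makes the whole pattern case-insensitive.
any_case = re.findall(r"(?i)horse", "Horse HORSE horse")

print(whole_words)  # ['cat', 'cat']
print(doubled)      # ['oo', 'ee']
print(any_case)     # ['Horse', 'HORSE', 'horse']
```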
Challenge Data
This is a text
This is is a text
This text text is is
This text is a text
This test text is a test
This this text is a text
cat dog dog cat dog
This test is a test tester
hello world hello world
This nottest test is something
This is IS a test
<Westy> I'll I'll be be back back soon soon.
Expected Output (10 matches)
This is a text
This is <b>is</b> a text
This text <b>text</b> is <b>is</b>
This text is a text
This test text is a test
This <b>this</b> text is a text
cat dog <b>dog</b> cat dog
This test is a test tester
hello world hello world
This nottest test is something
This is <b>IS</b> a test
<Westy> I'll <b>I'll</b> be <b>be</b> back <b>back</b> soon <b>soon</b>.
Helpful Links
https://www.regular-expressions.info/wordboundaries.html
https://www.regular-expressions.info/backref.html
https://www.regular-expressions.info/modifiers.html
The end of your sentence
HbDRS
Preamble
After you have successfully helped out our novel writer from the first challenge he contacts you
a while later with a new request. He wants you to match sentences in pairs of two out of his
latest script. The reason behind this request is unknown, but he pays you good money so you
won’t ask further questions.
Description
You need to separate sentences at the natural sentence-break-points (made up word) by
splitting them into two separate matches using named capture groups:
● first_sentence
● second_sentence
example: Joseph Hornsby works for Splunk. I am his biggest fan and
regex: (?<first_sentence>some regex logic) (?<second_sentence>some other logic)
first_sentence: Joseph Hornsby works for Splunk.
second_sentence: I am his biggest fan and
Do not capture the space between the sentences. In most cases it is just one whitespace.
Do use named capture groups to capture both sentences individually.
Check the ‘Expected Output’ section for further clarification.
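As a reminder of how named capture groups behave, here is a minimal sketch on a made-up key=value string. It assumes Python's re module, which spells a named group (?P<name>...) where PCRE also accepts (?<name>...); the group names key and value are hypothetical.

```python
import re

# Made-up example: split 'key=value' into two named groups.
# PCRE accepts (?<name>...); Python's re module needs (?P<name>...).
pattern = re.compile(r"(?P<key>\w+)=(?P<value>\w+)")

m = pattern.search("flavor=pcre")
print(m.group("key"))    # flavor
print(m.group("value"))  # pcre
```

Note that the ‘=’ sits between the two groups and therefore ends up in neither of them, which is the same idea as leaving out the space between two sentences.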
Rules
Default rules.
Winning Categories
Fewest steps.
Count of steps displayed on regex101.com
Fewest chars.
Count of characters of the regex
Score-Self-Check
Fewest steps:
Good ~ 1000 steps; Great ~ 300 steps; Amazing ~ 105 steps
Fewest chars:
Good ~ 90 chars; Great ~ 75 chars; Amazing ~ 65 chars
Hints
([...] or [^...]) Character classes.
Challenge Data
assumes word senses. Within the confines of the book
does the clustering. In the event of a cluster-failure
would finish it, but when? It was hard to tell
soon afterwards ‘The Tick’ arrived." After she had told him
what a mess! Ryan Adler did not accept it and
it wasn't hers!' She replied to the police officer
always thought so.) Then he went to the airport
I didn't think about this. Meanwhile, the penguins attacked
in the U.S.A., people often assume degus to be squirrels
Kail?", he often thought, but Greg denied it
the goose weighed 13.5 kilograms, which was a lot
well... they'd better not install that software
J.H. has long been a very talented Operations Analyst at Splunk
like that", James M. thought, but was pleasantly surprised when
but W. G. Grace never had much hope in Thomas Turner
Expected Output (8 matches)
The first eight (8) sentences from the top downwards should match using your regex logic. The
rest of the sentences should not match using your logic.
Helpful Links
https://www.regular-expressions.info/charclass.html
https://www.regular-expressions.info/named.html
The 29th of February
HbDRS
Preamble
Let’s just assume that the novel author is an immortal being and wants to publish his books only
on the 29th of February of every leap-year. He asks you to match only the 29th of February’s that
are valid in a list of dates he provides, so that he knows when he has to publish new literature.
Also match dates that are already in the past.
And so ends the tale of the famous writer who was able to solve all his issues with the power of
regex and then lived happily ever after. Or did he? We’ll see.
Description
Regular expressions are not built to model arithmetic logic such as deciding whether a given
29th of February actually exists. But it is certainly possible to match only the
correct dates using a brute-force approach.
Check the ‘Expected Output’ section for further clarification.
Rules
Do not use anchors like \A, \Z, \z or \G. Do not use flags ‘A’ and ‘J’. Everything else is allowed.
Winning Categories
Fewest steps.
Count of steps displayed on regex101.com
Fewest chars.
Count of characters of the regex
Score-Self-Check
Fewest steps:
Good ~ 2000 steps; Great ~ 1300 steps; Amazing ~ 700 steps
Fewest chars:
Good ~ 50 chars; Great ~ 43 chars; Amazing ~ 35 chars
Hints
Character classes. Negated Character classes. Repetition.
Challenge Data
29th of February 1998
29th of February 1999
29th of February 2000
29th of February 2001
29th of February 2002
29th of February 2003
29th of February 2004
29th of February 2005
29th of February 2006
29th of February 2007
29th of February 2008
29th of February 2009
29th of February 2010
29th of February 2011
29th of February 2012
29th of February 2013
29th of February 2014
29th of February 2015
29th of February 2016
29th of February 2017
29th of February 2018
29th of February 2019
29th of February 2020
29th of February 2021
29th of February 2022
29th of February 2023
29th of February 2024
29th of February 2025
29th of February 2026
29th of February 2027
29th of February 2028
29th of February 2029
29th of February 2030
29th of February 2031
29th of February 2032
29th of February 2033
29th of February 2034
29th of February 2035
29th of February 2036
29th of February 2037
29th of February 2038
29th of February 2039
29th of February 2040
29th of February 2041
29th of February 2042
29th of February 2043
29th of February 2044
29th of February 2045
29th of February 2046
29th of February 2047
29th of February 2048
29th of February 2049
29th of February 2050
29th of February 2051
29th of February 2052
29th of February 2053
29th of February 2054
29th of February 2055
29th of February 2056
29th of February 2057
29th of February 2058
29th of February 2059
29th of February 2060
29th of February 2061
29th of February 2062
29th of February 2063
29th of February 2064
29th of February 2065
29th of February 2066
29th of February 2067
29th of February 2068
29th of February 2069
29th of February 2070
29th of February 2071
29th of February 2072
29th of February 2073
29th of February 2074
29th of February 2075
29th of February 2076
29th of February 2077
29th of February 2078
29th of February 2079
29th of February 2080
29th of February 2081
29th of February 2082
29th of February 2083
29th of February 2084
29th of February 2085
29th of February 2086
29th of February 2087
29th of February 2088
29th of February 2089
29th of February 2090
29th of February 2091
29th of February 2092
29th of February 2093
29th of February 2094
29th of February 2095
29th of February 2096
29th of February 2097
29th of February 2098
29th of February 2099
29th of February 2100
29th of February 3066
29th of February 4040
29th of February 7072
29th of February 8022
29th of February 9996
Expected Output (28 matches)
There are 28 valid dates. 28 lines should match your logic. The rest of the lines should not
match your logic.
It is a brute-force approach. It might get ugly, but it doesn’t have to. If you get stuck anywhere
remember that starting from scratch can oftentimes help.
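The number of valid dates can be cross-checked without any regex. The sketch below assumes Python and its standard calendar module; it rebuilds the list of years from the challenge data and counts the leap years among them. It is a sanity check for your result, not a solution to the challenge.

```python
import calendar

# Years from the challenge data: 1998-2100 plus five far-future years.
years = list(range(1998, 2101)) + [3066, 4040, 7072, 8022, 9996]

# calendar.isleap applies the Gregorian rule:
# divisible by 4, except century years that are not divisible by 400.
leap_years = [y for y in years if calendar.isleap(y)]
print(len(leap_years))  # 28
```

Keep the century rule in mind: 2000 is a leap year, 2100 is not.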
Helpful Links
https://www.regular-expressions.info/charclass.html
https://www.regular-expressions.info/charclasssubtract.html
https://www.regular-expressions.info/repeat.html
https://www.timeanddate.com/date/leapyear.html
Numbers
HbDRS
Preamble
This challenge marks the start of the ‘medium’ difficulty challenges. By now you should have a
basic understanding of regular expressions.
Description
In this challenge you need to validate certain number formats. Write an expression that would
also work with different numbers in the same formats. Do not brute-force.
Your matching numbers need to be captured in a named capturing group called “match”.
Check the ‘Expected Output’ section for further clarification.
Rules
Default rules.
Winning Categories
Fewest steps.
Count of steps displayed on regex101.com
Fewest chars.
Count of characters of the regex
Score-Self-Check
Fewest steps:
Good ~ 1500 steps; Great ~ 1100 steps; Amazing ~ 900 steps
Fewest chars:
Good ~ 100 chars; Great ~ 75 chars; Amazing ~ 60 chars
Hints
Possessive Quantifiers. Optional Items. Alternation.
Challenge Data (match, no match)
47 .7
124
10,000,000,45
1,024
10 000 000.45
2,000,204
123,456,789,
3,000.6
10 102.3523
8,205,500.4672
10,214 241
0.5
36,000.57
100,000
5
42
10,5
10.5
10.44444444
1 024
9 999 352
10,19836
30 000,7302
0,5
47 372
10,000,000.45
10 000 000,45
123,456,789
123 456 789
1,05335
1.53252
.5
1025
1,1337,000
,046
100.
2.2.2
10,
10,.5
34 34
3692 38
36 047.
Expected Output (25 matches)
The first 25 lines should match your regex logic. All the other lines should not match.
Match each valid line using a named capture group called ‘match’.
(?<match>your regex logic)
Helpful Links
https://www.regular-expressions.info/possessive.html
https://www.regular-expressions.info/optional.html
https://www.regular-expressions.info/alternation.html
Asterisk the Gaul
HbDRS
Preamble
Looks like the famous writer friend needs our help once again. This time around he wants you to
reformat sections from his new comic book that he previously formatted using markdown style.
Being the nice person you are, you aren’t going to deny his request.
Description
To help the writer correct his formatting mistakes we are going to reformat words or strings of
words that have one asterisk ‘*’ on one side that corresponds to one asterisk ‘*’ on the other
side. Use the html tags <i>italic</i> to mark those text sections correctly as italic text.
Be aware that there are also sections with two asterisks ‘**’, which should not be messed with.
Additionally, there are single asterisks that have no corresponding partner asterisk. These
should not be matched by your logic either.
Observe the below example carefully to understand the rules.
example:
Example with one * lonely asterisk, one *italic section* and one **section that is bold**
expected result:
Example with one * lonely asterisk, one <i>italic section</i> and one **section that is bold**
You need to use the built-in ‘substitution’ function of regex101.
Check the ‘Expected Output’ section for further clarification.
Rules
Default rules.
Winning Categories
Fewest steps.
Count of steps displayed on regex101.com
Fewest chars.
Count of characters of the regex + Count of characters of the substitution
Score-Self-Check
Fewest steps:
Good ~ 900 steps; Great ~ 500 steps; Amazing ~ 220 steps
Fewest chars:
Good ~ 90 chars; Great ~ 60 chars; Amazing ~ 45 chars
Hints
Lookahead and Lookbehind.
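If lookarounds are new to you, the following sketch shows them on a made-up string that is unrelated to the challenge data. It assumes Python's re module; the variable names are hypothetical. The point to take away is that a lookaround checks what is next to the match without becoming part of it.

```python
import re

text = "price: $40, discount: 15%, stock: 7"

# Lookbehind (?<=\$): digits preceded by '$'; the '$' is not part of the match.
after_dollar = re.findall(r"(?<=\$)\d+", text)

# Lookahead (?=%): digits followed by '%'; the '%' is not part of the match.
before_percent = re.findall(r"\d+(?=%)", text)

# Negative lookarounds: digits neither preceded by '$' nor followed by '%'.
plain = re.findall(r"(?<![$\d])\d+(?![%\d])", text)

print(after_dollar)    # ['40']
print(before_percent)  # ['15']
print(plain)           # ['7']
```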
Challenge Data
This text is not italic.
*This text is italic.*
This text is *partially* italic
This text has *two* *italic* bits
**bold text (not italic)**
**bold text with *italic* **
**part bold,** *part italic*
*italic text **with bold** *
*italic* **bold** *italic* **bold**
*invalid markdown (do not parse)**
random * asterisk
Expected Output (9 matches)
This text is not italic.
<i>This text is italic.</i>
This text is <i>partially</i> italic
This text has <i>two</i> <i>italic</i> bits
**bold text (not italic)**
**bold text with <i>italic</i> **
**part bold,** <i>part italic</i>
<i>italic text **with bold** </i>
<i>italic</i> **bold** <i>italic</i> **bold**
*invalid markdown (do not parse)**
random * asterisk
Helpful Links
https://www.regular-expressions.info/lookaround.html
Markups, Markdowns, Markarounds
HbDRS
Preamble
In the previous challenge we reformatted text that was in markdown-format into html-style. This
time around we are going to validate markdown statements. Huge thanks go out to Damien
Chillet (@d3.iso) who originally helped me out on making this challenge possible.
Description
I am going to cut the description short and actually move all the explanations to the ‘Expected
Output’ section.
You need to use the built-in ‘substitution’ function of regex101.
Check the ‘Expected Output’ section for further clarification.
Rules
Default rules.
Winning Categories
Fewest steps.
Count of steps displayed on regex101.com
Fewest chars.
Count of characters of the regex + Count of characters of the substitution
Score-Self-Check
Fewest steps:
Good ~ 1500 steps; Great ~ 500 steps; Amazing ~ 300 steps
Fewest chars:
Good ~ 120 chars; Great ~ 85 chars; Amazing ~ 68 chars
Hints
Lookahead and Lookbehind. Character classes. Negated Character classes.
Challenge Data (match, no match)
[Basic link](http://example.com)
[Another](http://example.com/)
Link: [macr.ae](https://macr.ae/)
[Text](https://test.this-test.com/)
[Test!](https://this.com) hello
l [l](https://TESTdomain.com) l
[number](http://0test.com/)
[Invalid](http\\0test.com/)
[Invalid](invalid://example.com)
[Invalid](mailto:nobody@example.com)
[Invalid](javascript:alert())
[Invalid](http://test_ing.com)
[Invalid](http://inval.id,com)
![Image](http://example.com/cats.jpg)
![Other image](cats.jpg)
l[radioactive](http://dolphin.com)
[Invalid MarkDown](http://example.com)l
[[cat-penguin](http://example.com)
[[Invalid MarkDown](http://example.com))
Expected Output
<a href="http://example.com">Basic link</a>
<a href="http://example.com/">Another</a>
Link: <a href="https://macr.ae/">macr.ae</a>
<a href="https://test.this-test.com/">Text</a>
<a href="https://this.com">Test!</a> hello
l <a href="https://TESTdomain.com">l</a> l
<a href="http://0test.com/">number</a>
[Invalid](http\\0test.com/)
[Invalid](invalid://example.com)
[Invalid](mailto:nobody@example.com)
[Invalid](javascript:alert())
[Invalid](http://test_ing.com)
[Invalid](http://inval.id,com)
![Image](http://example.com/cats.jpg)
![Other image](cats.jpg)
l[radioactive](http://dolphin.com)
[Invalid MarkDown](http://example.com)l
[[cat-penguin](http://example.com)
[[Invalid MarkDown](http://example.com))
Additional Explanations
Do not make your regex fail when encountering the word ‘invalid’.
Why should the bottom lines not be matched?
[Invalid](http\\0test.com/)
→ because it isn't well-known URL syntax (\\)
[Invalid](invalid://example.com)
→ because it isn't well-known URL syntax (invalid:)
[Invalid](mailto:nobody@example.com)
→ because it isn't well-known URL syntax (mailto)
[Invalid](javascript:alert())
→ because it isn't well-known URL syntax (javascript:alert())
[Invalid](http://test_ing.com)
→ because it isn't well-known URL syntax (underscore)
[Invalid](http://inval.id,com)
→ because it isn't well-known URL syntax (, instead of .)
![Image](http://example.com/cats.jpg)
→ because the ‘!’ makes it an image-reference, therefore it shouldn't be converted into a
hyperlink
![Other image](cats.jpg)
→ because the ‘!’ makes it an image-reference, therefore it shouldn't be converted into a
hyperlink
l[radioactive](http://dolphin.com)
→ because there is the letter ‘l’ without a space afterwards, which causes an invalid syntax
[Invalid MarkDown](http://example.com)l
→ because there is the letter ‘l’ without a space before it, which causes an invalid syntax
[[cat-penguin](http://example.com)
→ because there is a second ‘[’
[[Invalid MarkDown](http://example.com))
→ because the parentheses mismatch ‘[...)’, additionally this causes a syntax error
Helpful Links
https://www.regular-expressions.info/lookaround.html
https://www.regular-expressions.info/charclass.html
https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet
Roman Numerals
HbDRS
Preamble
This challenge marks the end of the ‘medium’ difficulty challenges. What better way to celebrate
this than to make one that has absolutely no real-world application. Let’s make a challenge
about matching Roman numerals, a superior numeral system used by Romans and
Horses.
Description
In this challenge you need to come up with regex logic for matching Roman numerals. Luckily,
the logic behind Roman numerals is known to mankind and well documented.
Check the ‘Expected Output’ section for further clarification.
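To check individual numerals while you work on your regex, a small non-regex helper can serve as a reference. The sketch below assumes Python and the standard subtractive notation for the range 1 to 3999; that range is an assumption based on the challenge data, and the names to_roman and valid_numerals are hypothetical.

```python
def to_roman(number: int) -> str:
    """Convert 1..3999 to a Roman numeral in standard subtractive notation."""
    symbols = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    result = []
    for value, symbol in symbols:
        # Take each symbol as often as it fits, then continue with the rest.
        count, number = divmod(number, value)
        result.append(symbol * count)
    return "".join(result)

# Every numeral this helper can produce, used as a reference set.
valid_numerals = {to_roman(n) for n in range(1, 4000)}

print(to_roman(1998))                 # MCMXCVIII
print("MMMCMXCIX" in valid_numerals)  # True
print("IIII" in valid_numerals)       # False
```

Anything that is not in valid_numerals should not be matched by your regex either.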
Rules
Default rules.
Winning Categories
Fewest steps.
Count of steps displayed on regex101.com
Fewest chars.
Count of characters of the regex
Score-Self-Check
Fewest steps:
Good ~ 10000 steps; Great ~ 5500 steps; Amazing ~ 3700 steps
Fewest chars:
Good ~ 200 chars; Great ~ 100 chars; Amazing ~ 70 chars
Hints
Possessive Quantifiers. Optional Items. Alternation.
Challenge Data (match, no match)
I II III IV V VI VII VIII IX
X XIII XIV XV XVIII XIX
XXV XXVI XXVII XXVII XXIX
XXX XXXIII XXXVIII XXXIX
XL XLII XLV XLVIII XLIX
L LI LII LIII LIV LV LVI LVII LVIII LIX
LX LXVI LXIX LXXIX LXXXV LXXXVIII
XC XCV XCVI XCVII XCIX
C CI CV CIX
CXI CXX CXXX CXXXVIII CXXXIX CXLII CL CLV CLVIII CLX CLXX CLXXX CXC CXCIII CXCVII
CXCIX CC CCIII CCVIII CCIX CCXXV CCL CCLXXV CCCLXXV CD CDXXV CDL CDLXXV CDXC
D DIX DCLXVI DCLXXV DCCCXXVIII
CM CMLXXV
M ML MCV MCCCL MD MDCCXXV MDCCCLXXV MCML MCMXCVIII MCMXCIX
MM MMCCCXXV MMCDLXXV MMDL
MMM MMMCCXXVIII MMMCCCXXVIII MMMD
MMMCMXCV MMMCMXCVIII MMMCMXCIX
VV VVX XVV IIII IIIII IVIVIV XIIX XXXXII VC IIXV IC CXIIL
LVIXXX LIXXX XXXVIIII XXXVV MCVV MLTK IIV IIX IIL IIM IIC CCM
CMD CMDB L0L LOL GG RAF WTF XIIIXCVIIVMC MMIXIII CCCXXXIIX
MCMM MDCCLXXVIIII CCCD CCCXXXIIII MMMMXXXX
Expected Output (111 matches)
Using the logic you came up with, you need to match every single numeral in the first 17 lines
of the challenge data (the green ones) while not matching any entry in the last four lines
(the red ones).
Helpful Links
https://www.regular-expressions.info/possessive.html
https://www.regular-expressions.info/optional.html
https://www.regular-expressions.info/alternation.html
https://www.factmonster.com/math-science/mathematics/roman-numerals
Vali-Dates
HbDRS
Preamble
Now we finally arrive at the hard challenges. Congratulations to you if you have come this far.
Remember matching the correct dates for the 29th of February a couple of challenges back?
This time I want you to validate dates that follow a certain scheme.
Description
The challenge data includes a list of dates with different formats.
I want you to match dates with the following two formats:
yyyy/mm/dd HH:MM
yyyy/mm/dd HH:MM:SS
All the other formats and especially invalid dates should not be matched.
Do not validate if a month has 28 or 29 days (leap years) or 30 or 31 days.
Do however validate if there are obvious errors like 27:44:13 or 2021/17/25.
Check the ‘Expected Output’ section for further clarification.
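The validation rules above can be written down as a plain, non-regex check. The sketch below assumes Python; the function name is_valid_timestamp is hypothetical. It is meant as a reference to compare your regex against, and it deliberately accepts any day from 01 to 31 regardless of the month, as the description asks.

```python
def is_valid_timestamp(line: str) -> bool:
    """Check 'yyyy/mm/dd HH:MM' or 'yyyy/mm/dd HH:MM:SS' without regex."""
    parts = line.split(" ")
    if len(parts) != 2:
        return False
    date_fields = parts[0].split("/")
    time_fields = parts[1].split(":")
    fields = date_fields + time_fields
    # Three date fields, two or three time fields, digits only.
    if len(date_fields) != 3 or len(time_fields) not in (2, 3):
        return False
    if not all(f.isascii() and f.isdigit() for f in fields):
        return False
    # Fixed widths: four-digit year, everything else two digits.
    if [len(f) for f in fields] != [4] + [2] * (len(fields) - 1):
        return False
    _, month, day, hour, *rest = (int(f) for f in fields)
    if not (1 <= month <= 12 and 1 <= day <= 31 and hour <= 23):
        return False
    # Minutes and (optional) seconds both range from 00 to 59.
    return all(value <= 59 for value in rest)
```

For example, is_valid_timestamp("2012/09/18 12:10") returns True, while is_valid_timestamp("2009/11/11 24:00:00") returns False.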
Rules
Default rules.
Winning Categories
Fewest steps.
Count of steps displayed on regex101.com
Fewest chars.
Count of characters of the regex
Score-Self-Check
Fewest steps:
Good ~ 1200 steps; Great ~ 700 steps; Amazing ~ 400 steps
Fewest chars:
Good ~ 140 chars; Great ~ 100 chars; Amazing ~ 85 chars
Hints
Alternation. Character classes.
Challenge Data (match, no match)
2012/09/18 12:10
2001/09/30 23:59:11
1995/12/01 12:12:12
1001/01/07 14:27
2021/10/20 10:10
2000/01/01 01:01:01
2007/07/22 22:34:59
2021/05/05 00:00:00
2021/9/18 23:40
2013/XY/09 09:09
2021/00/01 01:49:59
2012/13/25 22:17:00
1994/11/00 12:12
2012/12/4 12:12
2009/11/11 24:00:00
2021/06/24 13:60
2002/10/10 14:59:60
a2021/11/11 11:11:11
2005/05/05 05:05:05d
2000 01 01 01:01:01
2007-07-22 22:34:59
2020/05/05 00/00/00
Expected Output (8 matches)
The first 8 lines should match your logic, while the rest of the lines should not be matched.
Helpful Links
https://www.regular-expressions.info/alternation.html
https://www.regular-expressions.info/charclass.html
Credit Card Numbers
HbDRS
Preamble
As quickly as we arrived at the hard challenges we are going to leave them behind soon, so we
can head towards the insane ones. This challenge serves as some sort of gate-keeper. If you are
able to master this one you are ready for the insanity that comes next.
Description
I want you to match credit card numbers (CCN) following these six simple rules.
Rule #1: The CCN must start with a 4, 5 or 6.
Rule #2: The CCN must only contain numbers 0 to 9 and optionally hyphens.
Rule #3: The CCN must contain exactly 16 digits... no less, no more.
Rule #4: The CCN may come in groups of 4 (four) digits separated by a hyphen. There must be
either three hyphens in total or none at all. Nothing in between.
Rule #5: The CCN must not use any other separator that is different from a hyphen.
Rule #6: The CCN must not have 4 (four) or more consecutive repeated digits.
While rules #1 to #5 shouldn’t pose much of a challenge to you, rule #6 is more difficult to
implement correctly.
Check the ‘Expected Output’ section for further clarification.
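The six rules can also be spelled out as a plain, non-regex check, which is handy as a reference while you build your regex. The sketch below assumes Python; the function name is_valid_ccn is hypothetical. It also assumes that hyphens are ignored when looking for repeated digits, which is what the expected output suggests.

```python
def is_valid_ccn(candidate: str) -> bool:
    """Apply rules #1 to #6 to a single candidate string."""
    if "-" in candidate:
        # Rule #4: hyphens only between four groups of four characters.
        groups = candidate.split("-")
        if len(groups) != 4 or any(len(g) != 4 for g in groups):
            return False
        digits = "".join(groups)
    else:
        digits = candidate
    # Rules #2, #3 and #5: exactly 16 characters, all of them digits 0-9.
    if len(digits) != 16 or not (digits.isascii() and digits.isdigit()):
        return False
    # Rule #1: the first digit is 4, 5 or 6.
    if digits[0] not in "456":
        return False
    # Rule #6: no digit repeated four or more times in a row.
    # Assumption: hyphens are ignored when looking for repeats.
    return not any(digits[i] * 4 == digits[i:i + 4] for i in range(13))
```

For example, is_valid_ccn("5123-4567-8912-3456") returns True, while is_valid_ccn("5133-3367-8912-3456") returns False because of the four consecutive threes.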
Rules
Default rules.
Winning Categories
Fewest steps.
Count of steps displayed on regex101.com
Fewest chars.
Count of characters of the regex
Score-Self-Check
Fewest steps:
Good ~ 5000 steps; Great ~ 3500 steps; Amazing ~ 1100 steps
Fewest chars:
Good ~ 130 chars; Great ~ 90 chars; Amazing ~ 60 chars
Hints
Alternation. Character classes. Back References. Optional Items.
Challenge Data (match, no match)
4123456789123456
5123-4567-8912-3456
6000700080009000
6653625879615786
4424424424442444
6543-6543-6543-6543
2223334445556660
61234-567-8912-3456
5133-3367-8912-3456
5123 - 3567 - 8912 - 3456
4512-1234 - 1244 -3256
5553-323519230091
55533235-19230091
555332351923-0091
abcd-eFgh-Hjkl-mnop
42536258796157867
4424444424442444
5122-2368-7954 - 3214
44244x4424442444
0525362587961578
4332-2223-5532-2010
5522,5522,5522,5522
5522_5522-5522,5522
6543-6543 6543-6543
5533 5555 5522 2255
5553-5553-5553-5555
8231-9200-2724-2219
6333-3444-2221-1133
5554-4433-3222-111O
Expected Output (6 matches)
The first 6 lines should match your logic, while the rest of the lines should not match.
Valid CCN Examples:
4253625879615786
4424424424442444
5122-2368-7954-3214
Invalid CCN Examples:
42536258796157867        17 digits in card number
4427777724442444         Consecutive digits are repeating 4 or more times
5122-2368-7954 - 3214    Spaces between the hyphen separator
44244x4424442444         Contains non-digit character
0525362587961578         Doesn't start with 4, 5 or 6
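To see how the six rules interact, here is one possible way to encode them in Python. It is a sketch for checking your understanding of the rules, not a competitive solution: the names CCN_RE and is_valid_ccn are invented for this example, and the pattern has not been tuned for steps or characters. It assumes, as the challenge data suggests, that hyphens are ignored when looking for repeated digits.

```python
import re

CCN_RE = re.compile(
    r"(?!.*(\d)(?:-?\1){3})"     # rule 6: no digit four times in a row, hyphens ignored
    r"[456]\d{3}"                # rule 1: first group of four starts with 4, 5 or 6
    r"(-?)\d{4}\2\d{4}\2\d{4}"   # rules 2-5: whatever separates the first two groups
                                 # (a hyphen or nothing) must separate the others too
)

def is_valid_ccn(line: str) -> bool:
    """Return True if the whole line is a CCN under the six rules."""
    return CCN_RE.fullmatch(line) is not None
```

The back reference \2 is what enforces "three hyphens or none at all": once the first separator is captured, the remaining two positions have to repeat it exactly.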
Helpful Links
https://www.regular-expressions.info/alternation.html
https://www.regular-expressions.info/charclass.html
https://www.regular-expressions.info/backref.html
https://www.regular-expressions.info/backref2.html
https://www.regular-expressions.info/optional.html
The Picky Painter
Preamble
With all the weird stuff going on in the world of modern art, there is an artist who has a big art
collection consisting of paintings only in grayscale. People love it. You are her loyal apprentice
who has no clue about modern art, but helps her to find the right colors for her next
masterpiece. She likes grayish colors. Dark gray, light gray, gray gray, all shades of gray. Fifty.
However, she has already decided to use a subset of them for her next painting and wants you to
select them from a long list of colors. Unfortunately, this list is rather ugly and looks like a
copy-and-paste job gone wrong. There are some other colors and issues with incorrect formatting.
Description
The objective is fairly simple. Just match the gray colors.
How are gray colors defined?
A gray color has the same percentage of red, green and blue or, alternatively, of cyan, magenta
and yellow. But not 100% (white) or 0% (black). Additionally, watch out for mistakes in
color notation and for incorrect formatting. Your matching colors need to be captured in a named
capturing group called “match”.
Check the ‘Expected Output’ section for further clarification.
Rules
Default rules.
Winning Categories
Fewest steps.
Count of steps displayed on regex101.com
Fewest chars.
Count of characters of the regex
Score-Self-Check
Fewest steps:
Good ~ 3500 steps; Great ~ 2000 steps; Amazing ~ 1100 steps
Fewest chars:
Good ~ 370 chars; Great ~ 280 chars; Amazing ~ 210 chars
Hints
Oftentimes it is useful to start from scratch. Save what you have and start anew.
Challenge Data (match, no match)
#111
#aaa
#eEe
#111111
#6F6F6F
#efEfEF
rgb(2, 2, 2)
rgb(15,15,15)
rgb(2.5, 2.5,2.5)
rgb(1, 01, 000001)
rgb(20%, 20%,20%)
rgba(4,4,4,0.8)
rgba(4,4, 4,1 )
rgba(3,3,3,0.12536)
rgba(10%,10%,10%,5%)
hsl(20,0%, 50%)
hsl(0, 10%, 100%)
hsl(0.5, 10.5%, 0%)
hsl(5, 5%, 0%)
hsla(20, 0%, 50%, 0.88)
hsla(0, 0%, 0%, 0.25)
#ef4
#eEf
#11111e
#123456
rgb(2, 4, 7)
rgb(10, 10,100)
rgb(1.5%, 1.5%, 1.8%)
rgba(1, 01, 0010, 0.5)
hsl(20, 20%, 20%)
hls(0 1% 01%)
hsla(0, 10%, 50%, 0.5)
#11111
#000000000
rbb(1, 1, 1)
rgb(10, 10, 10, 10)
rgb(257, 257, 257)
rgb(10%, 10, 10)
hsl (20,0%, 500)
argb(1.1.1)
Expected Output (21 matches)
The first 21 lines should match your logic, while the rest of the lines should not match.
Match each valid line using a named capture group called ‘match’.
(?<match>your regex logic)
Remember that your regex only has to work for the presented data. The challenge is not about
validating all possible color formats or all the grayish colors out there.
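The Python sketch below shows one way the named group and back references could fit together for this data. Everything in it is an assumption made for illustration: the helper names (NUM, SEP, HEX, CHANNEL, RGB, HSL, GRAY_RE, gray_match) are invented, the rules are inferred purely from which lines are expected to match (for hsl() that means a saturation of 0% or a lightness of 0% or 100%), and it is deliberately long-winded rather than optimised.

```python
import re

NUM = r"\d+(?:\.\d+)?"   # integer or decimal number
SEP = r"\s*,\s*"         # comma with optional surrounding whitespace

# Hex: the same one- or two-digit group three times (#aaa, #6F6F6F).
HEX = r"#(?P<h>[0-9a-f]{1,2})(?P=h)(?P=h)"

# rgb()/rgba(): one channel value (0-255, optional decimals or a % sign)
# repeated three times. Leading zeros are skipped so 1, 01 and 000001 are equal.
CHANNEL = r"(?:25[0-5]|2[0-4]\d|1?\d{1,2})(?:\.\d+)?%?"
RGB = (
    r"rgb(?P<ra>a)?\(\s*0*(?P<c>" + CHANNEL + r")"
    + SEP + r"0*(?P=c)" + SEP + r"0*(?P=c)"
    + r"(?(ra)" + SEP + NUM + r"%?)\s*\)"   # alpha value only, and always, for rgba
)

# hsl()/hsla(): treated as gray when saturation is 0%,
# or when lightness is 0% or 100% (inferred from the data).
HSL = (
    r"hsl(?P<ha>a)?\(\s*" + NUM + SEP
    + r"(?:0%" + SEP + NUM + r"%|" + NUM + r"%" + SEP + r"(?:0|100)%)"
    + r"(?(ha)" + SEP + NUM + r"%?)\s*\)"
)

GRAY_RE = re.compile(
    r"(?P<match>" + HEX + "|" + RGB + "|" + HSL + r")", re.IGNORECASE
)

def gray_match(line):
    """Return the text captured by the 'match' group, or None."""
    m = GRAY_RE.fullmatch(line)
    return m.group("match") if m else None
```

The conditional groups (?(ra)...) and (?(ha)...) require an alpha value exactly when the function name ends in "a", which is why rgb(10, 10, 10, 10) is rejected while rgba(4,4,4,0.8) is accepted.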
Helpful Links
https://en.wikipedia.org/wiki/Color_theory
Revelation
Preamble
This challenge is the last one of this collection. The author has never shown this challenge
to anyone before. This challenge demands that you use everything you have learned so far. It is
diabolically difficult. At least that is what the author hopes for.
The original idea for this challenge came from https://www.reddit.com/user/jordanreiter.
Description
Ever had issues with reformatting weird Excel spreadsheets to get them into a working CSV
format? No? Yes? Maybe?
Whatever your answer to this question might be, you just have to achieve one simple goal.
Get the presented challenge data into the form of the expected output.
Get the presented challenge data into the form of the expected output.
Get the presented challenge data into the form of the expected output.
Get the presented challenge data into the form of the expected output.
You need to use the built-in ‘substitution’ function of regex101.
Check the ‘Expected Output’ section for further clarification.
Rules
Default rules.
Winning Categories
Fewest steps.
Count of steps displayed on regex101.com
Fewest chars.
Count of characters of the regex + Count of characters of the substitution
Score-Self-Check
The author has achieved a solution with 2104 steps and 94 chars. Can you beat him?
Hints
Think you have solved it? Better check it rigorously before you call your solution done. There are
lots of small traps laid out in it to throw you off and ruin your day.
Challenge Data
This is a test
This is another test
This "big test" is a test
This "big test" is a "big test",yeah!
Almost "this entire" thing "is just a" quote
Matthew Banana-Horsey's friend Joseph
Matthew Banana-Horsey is a test
Matthew Banana-Horsey is--a--test
This----is--test
This-----is-test field
Don't say anything,ok?
I can't think
This " is a " test
don't tell Matthew Banana-Horsey that I broke Brandon's toy horse
I can't see Matthew Banana-Horsey anywhere; can you?
Too long; didn't read
Matthew Banana-Horsey's car was stolen
Damien Chillet is a regex prodigy
Expected Output
This,is,a,test
This,is,another,test
This,big test,is,a,test
This,big test,is,a,big test,yeah,!
Almost,this entire,thing,is just a,quote
Matthew,Banana-Horsey's,friend,Joseph
Matthew,Banana-Horsey,is,a,test
Matthew,Banana-Horsey,is,a,test
This,is,test
This,is-test,field
Don't,say,anything,ok,?
I,can't,think
This, is a ,test
don't,tell,Matthew,Banana-Horsey,that,I,broke,Brandon's,toy,horse
I,can't,see,Matthew,Banana-Horsey,anywhere,can,you,?
Too,long,didn't,read
Matthew,Banana-Horsey's,car,was,stolen
Damien,Chillet,is,a,regex,prodigy
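If you want to pin down what the transformation does before attempting it as a single substitution, a tokenizer is a convenient reference. The Python sketch below is not a regex101 substitution and therefore not a solution in the sense of this challenge. It only spells out the rules as they can be read off the data above: quoted text is one field, a single hyphen between word characters is kept, "!" and "?" become fields of their own, and everything else acts as a separator. TOKEN_RE and to_csv are names invented for this example.

```python
import re

TOKEN_RE = re.compile(
    r'"([^"]*)"'                          # quoted text: one field, quotes dropped
    r'|[!?]'                              # ! and ? become fields of their own
    r'|(?:[^\s",;!?-]|(?<=\w)-(?=\w))+'   # a word; a single inner hyphen stays
)

def to_csv(line: str) -> str:
    """Split a line into fields under the assumed rules and join them with commas."""
    fields = []
    for m in TOKEN_RE.finditer(line):
        quoted = m.group(1)
        # Use the text inside the quotes if this token was a quoted string.
        fields.append(m.group(0) if quoted is None else quoted)
    return ",".join(fields)
```

For example, to_csv('This "big test" is a "big test",yeah!') returns 'This,big test,is,a,big test,yeah,!'. Comparing your substitution result against such a reference line by line is an easy way to spot the traps.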
Additional Explanations
Get the presented challenge data into the form of the expected output. Get the presented
challenge data into the form of the expected output. Get the presented challenge data into the
form of the expected output. Get the presented challenge data into the form of the expected
output. Get the presented challenge data into the form of the expected output. Get the presented
challenge data into the form of the expected output. Get the presented challenge data into the
form of the expected output. Get the presented challenge data into the form of the expected
output. Get the presented challenge data into the form of the expected output. Get the presented
challenge data into the form of the expected output. Get the presented challenge data into the
form of the expected output. Get the presented challenge data into the form of the expected
output. Get the presented challenge data into the form of the expected output. Get the presented
challenge data into the form of the expected output. Get the presented challenge data into the
form of the expected output. Get the presented challenge data into the form of the expected
output. Get the presented challenge data into the form of the expected output. Get the presented
challenge data into the form of the expected output. Get the presented challenge data into the
form of the expected output. Get the presented challenge data into the form of the expected
output. Get the presented challenge data into the form of the expected output. Get the presented
challenge data into the form of the expected output. Get the presented challenge data into the
form of the expected output. Get the presented challenge data into the form of the expected
output. Get the presented challenge data into the form of the expected output. Get the presented
challenge data into the form of the expected output. Get the presented challenge data into the
form of the expected output. Get the presented challenge data into the form of the expected
output. Get the presented challenge data into the form of the expected output. Get the presented
challenge data into the form of the expected output. Get the presented challenge data into the
form of the expected output. Get the presented challenge data into the form of the expected
output. Get the presented challenge data into the form of the expected output. Get the presented
challenge data into the form of the expected output. Get the presented challenge data into the
form of the expected output. Get the presented challenge data into the form of the expected
output. Get the presented challenge data into the form of the expected output. Get the presented
challenge data into the form of the expected output. Get the presented challenge data into the
form of the expected output.
Helpful Links
None.
Acknowledgements
Cary Petterborg
Cary Petterborg inspired me to learn regex when I joined the Slack
community in 2017. He not only showed me how to improve my skills,
but also encouraged me to try out new approaches and look at
problems from different angles. Without him I would never have
mastered regex.
Find Cary on LinkedIn: https://www.linkedin.com/in/carypetterborg/
Dal Jeanis
Dal Jeanis always believed in me and picked me up from the ground
every time I tripped up. He saw the potential in me and was able to
convince me to keep fighting against all odds. His vast knowledge and
professional attitude were beneficial in my endeavour of becoming
who I am today.
Find Dal on LinkedIn: https://www.linkedin.com/in/daljeanis/
Callum Macrae
Callum Macrae’s website callumacrae.github.io/regex-tuesday/ gave me the idea to host
periodically occurring regex challenges in the Splunk> community. The challenges on his
website are the baseline for most challenges in this collection.
Check out Callum on: https://macr.ae/
The amazing regex web resources
Thanks go out to the amazing regex resources on the web. I linked to many of them in this
guide. Special thanks go out to Firas Dib, the creator of regex101.com, a site which made
hosting regex challenges manageable. Besides that, regex101.com is my go-to site whenever I
feel like doing regex.
The amazing folks from the unlimited_randomness channel
Thanks to all the amazing people from the ‘unlimited_randomness’ channel on the Splunk user
group Slack. Thank you for amazing conversations, funny stories and great entertainment
altogether. Without you all I wouldn’t have made it this far and would probably have gone insane by now.
All the challenge participants
I want to thank each and every one of you who participated in my regex challenges in the past.
Without your contribution it wouldn’t be possible to now publish this collection. Big thanks to
everyone for spending their valuable time. I hope you all had fun doing so.
The people proof-reading this document
Many thanks go out to the awesome folks that helped me finalize the contents of this
document. They spent their free time hunting through this document for spelling errors, issues
with grammar or factual mistakes.
Family and friends
Last but not least I want to thank my family and friends who supported me on countless
endeavours throughout my life and hopefully will continue doing so in the future.
About the author
Matthias Kowalewski, the author of this document, is, despite popular
belief, not a real horse. He loves regex as much as being random.
If he is not actively hosting regex challenges in the Splunk> community
he works as a Splunk> Consultant with focus on IT-Security. He also
enjoys playing strategy games on his computer or hacking into
vulnerable machines that were set up in his private security lab.
Find Matthias on LinkedIn: www.linkedin.com/in/matthiaskowalewski
About creating this document
It was a lot of fun, and hard work, for me to put this collection of regex challenges together. This
is my first publication in the form of an e-book and I did learn a lot throughout the creation
process.
I am proud of how it all turned out in the end. I can only hope that you’ll like it as much as I do
and that the newly gained knowledge about regular expressions will help you in your career.
Then again, I already know that it will certainly be beneficial.
Contact me for feedback!
Wanna share your score?
Has this guide helped you to get better at regex?
Curious about how other people did in the challenges?
Wanna see my cool solution I use to validate IPv4 addresses?
Do you want to tell me about how you tortured your coworkers with the challenges?
Anything else regex related?
If yes, then please contact me at: horsefez@pm.me
Want me to solve your regex problems at work?
I am not going to. Sorry.
I am not paid to do your job.
GO AND LEARN REGEX. LOL.