2023-05-03 11:06:20 +02:00
== Why is this an issue?
2021-04-28 16:49:39 +02:00
Characters like ``++'é'++`` can be expressed either as a single code point or as a cluster of the letter ``++'e'++`` and a combining accent mark. Without the ``++CANON_EQ++`` flag, a regex will only match a string in which the characters are expressed in the same way.
2021-04-28 18:08:03 +02:00
2023-05-03 11:06:20 +02:00
=== Noncompliant code example
2021-04-28 16:49:39 +02:00
2022-02-04 17:28:24 +01:00
[source,java]
2021-04-28 16:49:39 +02:00
----
String s = "e\u0300";
Pattern p = Pattern.compile("é|ë|è"); // Noncompliant
2021-07-09 17:34:46 +02:00
System.out.println(p.matcher(s).replaceAll("e")); // print 'è'
2021-04-28 16:49:39 +02:00
----
2021-04-28 18:08:03 +02:00
2023-05-03 11:06:20 +02:00
=== Compliant solution
2021-04-28 16:49:39 +02:00
2022-02-04 17:28:24 +01:00
[source,java]
2021-04-28 16:49:39 +02:00
----
String s = "e\u0300";
Pattern p = Pattern.compile("é|ë|è", Pattern.CANON_EQ);
System.out.println(p.matcher(s).replaceAll("e")); // print 'e'
----
2021-04-28 18:08:03 +02:00
2021-09-20 15:38:42 +02:00
ifdef::env-github,rspecator-view[]
'''
== Implementation Specification
(visible only on this page)
2023-05-25 14:18:12 +02:00
=== Message
Use the CANON_EQ flag with this pattern
=== Highlighting
the string of the pattern
2021-09-20 15:38:42 +02:00
endif::env-github,rspecator-view[]