
Inline adoc files when they are included exactly once. Also fix language tags because this inlining gives us better information on what language the code is written in.
44 lines
1.5 KiB
Plaintext
44 lines
1.5 KiB
Plaintext
== Why is this an issue?
|
|
|
|
When using POSIX classes like ``++\p{Alpha}++`` without the ``++UNICODE_CHARACTER_CLASS++`` flag or when using hard-coded character classes like ``++"[a-zA-Z]"++``, letters outside of the ASCII range, such as umlauts, accented letters or letter from non-Latin languages, won't be matched. This may cause code to incorrectly handle input containing such letters.
|
|
|
|
|
|
To correctly handle non-ASCII input, it is recommended to use Unicode classes like ``++\p{IsAlphabetic}++``. When using POSIX classes, Unicode support should be enabled by either passing ``++Pattern.UNICODE_CHARACTER_CLASS++`` as a flag to ``++Pattern.compile++`` or by using ``++(?U)++`` inside the regex.
|
|
|
|
|
|
=== Noncompliant code example
|
|
|
|
[source,java]
|
|
----
|
|
Pattern.compile("[a-zA-Z]");
|
|
Pattern.compile("\\p{Alpha}");
|
|
----
|
|
|
|
|
|
=== Compliant solution
|
|
|
|
[source,java]
|
|
----
|
|
Pattern.compile("\\p{IsAlphabetic}"); // matches all letters from all languages
|
|
Pattern.compile("\\p{IsLatin}"); // matches latin letters, including umlauts and other non-ASCII variations
|
|
Pattern.compile("\\p{Alpha}", Pattern.UNICODE_CHARACTER_CLASS);
|
|
Pattern.compile("(?U)\\p{Alpha}");
|
|
----
|
|
|
|
|
|
ifdef::env-github,rspecator-view[]
|
|
|
|
'''
|
|
== Implementation Specification
|
|
(visible only on this page)
|
|
|
|
=== Message
|
|
|
|
* when using plain character classes: Replace this character range with a Unicode-aware character class.
|
|
* when using POSIX classes: Enable the "UNICODE_CHARACTER_CLASS" flag or use a Unicode-aware alternative.
|
|
|
|
|
|
include::../highlighting.adoc[]
|
|
|
|
endif::env-github,rspecator-view[]
|