rspec/rules/S5867/java/rule.adoc

44 lines
1.5 KiB
Plaintext
Raw Normal View History

== Why is this an issue?
2021-04-28 16:49:39 +02:00
When using POSIX classes like ``++\p{Alpha}++`` without the ``++UNICODE_CHARACTER_CLASS++`` flag or when using hard-coded character classes like ``++"[a-zA-Z]"++``, letters outside of the ASCII range, such as umlauts, accented letters or letter from non-Latin languages, won't be matched. This may cause code to incorrectly handle input containing such letters.
To correctly handle non-ASCII input, it is recommended to use Unicode classes like ``++\p{IsAlphabetic}++``. When using POSIX classes, Unicode support should be enabled by either passing ``++Pattern.UNICODE_CHARACTER_CLASS++`` as a flag to ``++Pattern.compile++`` or by using ``++(?U)++`` inside the regex.
=== Noncompliant code example
2021-04-28 16:49:39 +02:00
2022-02-04 17:28:24 +01:00
[source,java]
2021-04-28 16:49:39 +02:00
----
Pattern.compile("[a-zA-Z]");
Pattern.compile("\\p{Alpha}");
----
=== Compliant solution
2021-04-28 16:49:39 +02:00
2022-02-04 17:28:24 +01:00
[source,java]
2021-04-28 16:49:39 +02:00
----
Pattern.compile("\\p{IsAlphabetic}"); // matches all letters from all languages
Pattern.compile("\\p{IsLatin}"); // matches latin letters, including umlauts and other non-ASCII variations
Pattern.compile("\\p{Alpha}", Pattern.UNICODE_CHARACTER_CLASS);
Pattern.compile("(?U)\\p{Alpha}");
----
ifdef::env-github,rspecator-view[]
'''
== Implementation Specification
(visible only on this page)
=== Message
* when using plain character classes: Replace this character range with a Unicode-aware character class.
* when using POSIX classes: Enable the "UNICODE_CHARACTER_CLASS" flag or use a Unicode-aware alternative.
include::../highlighting.adoc[]
endif::env-github,rspecator-view[]