
* Create rule S5867[kotlin]: Unicode-aware versions of character classes should be preferred * Fix typo Co-authored-by: margarita-nedzelska-sonarsource <70522623+margarita-nedzelska-sonarsource@users.noreply.github.com>

When POSIX classes like ``++\p{Alpha}++`` are used without the ``++UNICODE_CHARACTER_CLASS++`` flag, or when hard-coded character classes like ``++"[a-zA-Z]"++`` are used, letters outside the ASCII range, such as umlauts, accented letters, or letters from non-Latin scripts, won't be matched. This may cause code to incorrectly handle input containing such letters.

To correctly handle non-ASCII input, it is recommended to use Unicode classes like ``++\p{IsAlphabetic}++``. When POSIX classes are used, Unicode support should be enabled either by passing ``++Pattern.UNICODE_CHARACTER_CLASS++`` as a flag to ``++Pattern.compile++`` or by using the embedded flag ``++(?U)++`` inside the regex.

== Noncompliant Code Example
----
Pattern.compile("[a-zA-Z]");
Pattern.compile("\\p{Alpha}");
----
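The noncompliant patterns above can be exercised with a minimal Java sketch. The class ``++AsciiOnlyClassesDemo++``, the helper ``++matchesWhole++`` and the sample words are hypothetical and only serve as an illustration; each pattern is given a one-or-more quantifier and applied with ``++Matcher.matches()++``, so the whole word must consist of matching characters. Both patterns accept the ASCII word ``++Zurich++`` but reject ``++Zürich++``, ``++café++`` and the Cyrillic ``++Москва++``, because ``++ü++``, ``++é++`` and Cyrillic letters fall outside the ASCII range.

```java
import java.util.regex.Pattern;

// Illustrative demo class (hypothetical name, not part of the rule): shows that
// ASCII-only character classes reject words containing non-ASCII letters.
class AsciiOnlyClassesDemo {
    // Noncompliant: hard-coded ASCII letter range.
    static final Pattern ASCII_RANGE = Pattern.compile("[a-zA-Z]+");
    // Noncompliant: POSIX class compiled without UNICODE_CHARACTER_CLASS (ASCII only).
    static final Pattern POSIX_ALPHA = Pattern.compile("\\p{Alpha}+");

    // Hypothetical helper: true if the entire input is matched by the pattern.
    static boolean matchesWhole(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Sample words written as ASCII escapes so the source file stays ASCII:
        // "Zurich", "Zurich" with u-umlaut, "cafe" with e-acute, and Cyrillic "Moskva".
        String[] words = {
            "Zurich",
            "Z\u00fcrich",
            "caf\u00e9",
            "\u041c\u043e\u0441\u043a\u0432\u0430"
        };
        for (String word : words) {
            System.out.println(word
                    + "  [a-zA-Z]+ -> " + matchesWhole(ASCII_RANGE, word)
                    + "  \\p{Alpha}+ -> " + matchesWhole(POSIX_ALPHA, word));
        }
    }
}
```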
== Compliant Solution
----
Pattern.compile("\\p{IsAlphabetic}"); // matches alphabetic characters from all scripts
Pattern.compile("\\p{IsLatin}"); // matches Latin-script letters, including umlauts and other non-ASCII variants
Pattern.compile("\\p{Alpha}", Pattern.UNICODE_CHARACTER_CLASS); // makes the POSIX class Unicode-aware
Pattern.compile("(?U)\\p{Alpha}"); // same effect, via the embedded flag
----
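The compliant patterns above can be checked with a minimal Java sketch. The class ``++UnicodeAwareClassesDemo++``, the helper ``++matchesWhole++`` and the sample words are hypothetical; each pattern is given a one-or-more quantifier so that ``++Matcher.matches()++`` tests the whole word. All four patterns accept ``++Zürich++`` and ``++café++``. ``++\p{IsLatin}++`` rejects the Cyrillic ``++Москва++`` because it is restricted to the Latin script, whereas ``++\p{IsAlphabetic}++`` and the two Unicode-aware ``++\p{Alpha}++`` variants accept it.

```java
import java.util.regex.Pattern;

// Illustrative demo class (hypothetical name, not part of the rule): checks that the
// compliant patterns accept letters outside the ASCII range.
class UnicodeAwareClassesDemo {
    // Unicode binary property: alphabetic characters from all scripts.
    static final Pattern IS_ALPHABETIC = Pattern.compile("\\p{IsAlphabetic}+");
    // Unicode script property: Latin-script letters only, accented ones included.
    static final Pattern IS_LATIN = Pattern.compile("\\p{IsLatin}+");
    // POSIX class made Unicode-aware through the compile flag.
    static final Pattern ALPHA_WITH_FLAG =
            Pattern.compile("\\p{Alpha}+", Pattern.UNICODE_CHARACTER_CLASS);
    // POSIX class made Unicode-aware through the embedded (?U) flag.
    static final Pattern ALPHA_WITH_EMBEDDED_FLAG = Pattern.compile("(?U)\\p{Alpha}+");

    // Hypothetical helper: true if the entire input is matched by the pattern.
    static boolean matchesWhole(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Sample words written as ASCII escapes: "Zurich" with u-umlaut,
        // "cafe" with e-acute, and Cyrillic "Moskva".
        String[] words = {"Z\u00fcrich", "caf\u00e9", "\u041c\u043e\u0441\u043a\u0432\u0430"};
        for (String word : words) {
            System.out.println(word
                    + "  IsAlphabetic -> " + matchesWhole(IS_ALPHABETIC, word)
                    + "  IsLatin -> " + matchesWhole(IS_LATIN, word)
                    + "  Alpha+flag -> " + matchesWhole(ALPHA_WITH_FLAG, word)
                    + "  (?U)Alpha -> " + matchesWhole(ALPHA_WITH_EMBEDDED_FLAG, word));
        }
    }
}
```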
ifdef::env-github,rspecator-view[]
'''
== Implementation Specification
(visible only on this page)

include::message.adoc[]

include::../highlighting.adoc[]
endif::env-github,rspecator-view[]