
Inline adoc files when they are included exactly once. Also fix language tags because this inlining gives us better information on what language the code is written in.
45 lines
1.3 KiB
Plaintext
45 lines
1.3 KiB
Plaintext
== Why is this an issue?
|
|
|
|
When using POSIX classes like `\p{Alpha}` without the `(?U)` to include Unicode characters or when using hard-coded character classes like `"[a-zA-Z]"`, letters outside of the ASCII range, such as umlauts, accented letters or letter from non-Latin languages, won't be matched. This may cause code to incorrectly handle input containing such letters.
|
|
|
|
|
|
To correctly handle non-ASCII input, it is recommended to use Unicode classes like `\p{IsAlphabetic}`. When using POSIX classes, Unicode support should be enabled by using `(?U)` inside the regex.
|
|
|
|
|
|
=== Noncompliant code example
|
|
|
|
[source,kotlin]
|
|
----
|
|
Regex("[a-zA-Z]")
|
|
Regex("\\p{Alpha}")
|
|
Regex("""\p{Alpha}""")
|
|
----
|
|
|
|
|
|
=== Compliant solution
|
|
|
|
[source,kotlin]
|
|
----
|
|
Regex("""\p{IsAlphabetic}""") // matches all letters from all languages
|
|
Regex("""\p{IsLatin}""") // matches latin letters, including umlauts and other non-ASCII variations
|
|
Regex("""(?U)\p{Alpha}""")
|
|
Regex("(?U)\\p{Alpha}")
|
|
----
|
|
|
|
|
|
ifdef::env-github,rspecator-view[]
|
|
|
|
'''
|
|
== Implementation Specification
|
|
(visible only on this page)
|
|
|
|
=== Message
|
|
|
|
* when using plain character classes: Replace this character range with a Unicode-aware character class.
|
|
* when using POSIX classes: Use a Unicode-aware alternative or "(?U)".
|
|
|
|
|
|
include::../highlighting.adoc[]
|
|
|
|
endif::env-github,rspecator-view[]
|