rspec/rules/S5868/description.adoc

9 lines
891 B
Plaintext

When placing Unicode https://unicode.org/glossary/#grapheme_cluster[Grapheme Clusters] (characters which require to be encoded in multiple https://unicode.org/glossary/#code_point[Code Points]) inside a character class of a regular expression, this will likely lead to unintended behavior.
For instance, the grapheme cluster ``++c̈++`` requires two code points: one for ``++'c'++``, followed by one for the _umlaut_ modifier ``++'\u{0308}'++``. If placed within a character class, such as ``++[c̈]++``, the regex will consider the character class being the enumeration ``++[c\u{0308}]++`` instead. It will, therefore, match every ``++'c'++`` and every _umlaut_ that isn't expressed as a single codepoint, which is extremely unlikely to be the intended behavior.
This rule raises an issue every time Unicode Grapheme Clusters are used within a character class of a regular expression.