Create rule S6537: Octal escape sequences should not be used in regular expressions (#1659)

github-actions[bot] 2023-03-27 18:18:03 +02:00 committed by GitHub
parent f0841f2661
commit b5ec694d70
4 changed files with 66 additions and 0 deletions

@ -0,0 +1,16 @@
ifdef::env-github,rspecator-view[]
'''
== Implementation Specification
(visible only on this page)
== Message
Consider replacing this octal escape sequence with a Unicode or hexadecimal sequence instead.
== Highlighting
The octal escape sequence
'''
endif::env-github,rspecator-view[]

@ -0,0 +1,2 @@
{
}

@ -0,0 +1,17 @@
{
"title": "Octal escape sequences should not be used in regular expressions",
"type": "CODE_SMELL",
"status": "ready",
"remediation": {
"func": "Constant\/Issue",
"constantCost": "5min"
},
"tags": [
],
"defaultSeverity": "Major",
"ruleSpecification": "RSPEC-6537",
"sqKey": "S6537",
"scope": "All",
"defaultQualityProfiles": ["Sonar way"],
"quickfix": "unknown"
}

@ -0,0 +1,31 @@
Octal escape sequences, when used in regular expressions, can easily be mistaken for backreferences. When such a sequence is intentional, it is generally better to replace it with a Unicode or hexadecimal escape sequence to avoid any ambiguity.
== Why is this an issue?
Using octal escapes in regular expressions can create confusion with backreferences.
An octal escape is a backslash followed by a short run of octal digits. It stands for the character with that numeric code and is sometimes used to write special characters in a regular expression.
However, it is easily mistaken for a backreference, which is also written as a backslash followed by digits but refers to a previously captured group. This confusion can lead to unexpected matches or to errors in the regular expression.
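The ambiguity can be illustrated with a short sketch. It assumes Python's `re` module, which reads a backslash followed by digits as an octal escape when the first digit is 0 or when there are three octal digits, and as a group reference otherwise. Other regex engines may resolve the same text differently.

[source,python]
----
import re

# One digit after the backslash: a backreference to group 1.
backref = re.compile(r"(a)\1")
assert backref.fullmatch("aa") is not None

# Three octal digits: the character with octal code 101 ("A"),
# not a reference to group 1 followed by the literal text "01".
octal = re.compile(r"(a)\101")
assert octal.fullmatch("aA") is not None
assert octal.fullmatch("aa01") is None

# With no capturing group in the pattern, the one-digit form is rejected.
try:
    re.compile(r"\1")
    one_digit_is_valid = True
except re.error:
    one_digit_is_valid = False
----

The two patterns differ only in the number of digits, yet one repeats a captured group and the other matches a fixed character.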
== How to fix it
Instead of using an octal escape, represent the special character another way, for example with a Unicode escape sequence, a hexadecimal escape sequence or a character class. These alternatives cannot be confused with backreferences and make the regular expression easier to read.
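As a minimal sketch of these alternatives, the three patterns below all match the single letter "A". It assumes Python 3's `re` module with `str` patterns, since the Unicode escape is not accepted in `bytes` patterns. The variable names are illustrative only.

[source,python]
----
import re

# Three spellings of the same one-character pattern: "A" (code point 65).
hex_escape = re.compile(r"\x41")        # hexadecimal escape
unicode_escape = re.compile(r"\u0041")  # Unicode escape
char_class = re.compile(r"[A]")         # character class

# Collect whether each alternative matches the whole string "A".
results = [
    pattern.fullmatch("A") is not None
    for pattern in (hex_escape, unicode_escape, char_class)
]
----

None of these forms consists of a backslash followed directly by decimal digits, so none can be read as a group reference.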
=== Code examples
==== Noncompliant code example
[source,python]
----
import re
match = re.match(r"\101", "A")
----
==== Compliant solution
[source,python]
----
import re
match = re.match(r"\x41", "A")
----
include::../implementation.adoc[]