rspec/rules/S5998/java/rule.adoc
Fred Tingaud 16f6c0aecf
Inline adoc when include has no additional value (#1940)
Inline adoc files when they are included exactly once.

Also fix language tags because this inlining gives us better information
on what language the code is written in.
2023-05-25 14:18:12 +02:00

64 lines
2.7 KiB
Plaintext

== Why is this an issue?
The Java regex engine uses recursive method calls to implement backtracking. Therefore when a repetition inside a regular expression contains multiple paths (i.e. the body of the repetition contains an alternation (``++|++``), an optional element or another repetition), trying to match the regular expression can cause a stack overflow on large inputs. This does not happen when using a possessive quantifier (such as ``++*+++`` instead of ``++*++``) or when using a character class inside a repetition (e.g. ``++[ab]*++`` instead of ``++(a|b)*++``).
The size of the input required to overflow the stack depends on various factors, including of course the stack size of the JVM. One thing that significantly increases the size of the input that can be processed is if each iteration of the repetition goes through a chain of multiple constant characters because such consecutive characters will be matched by the regex engine without invoking any recursion.
For example, on a JVM with a stack size of 1MB, the regex ``++(?:a|b)*++`` will overflow the stack after matching around 6000 characters (actual numbers may differ between JVM versions and even across multiple runs on the same JVM) whereas ``++(?:abc|def)*++`` can handle around 15000 characters.
Since often times stack growth can't easily be avoided, this rule will only report issues on regular expressions if they can cause a stack overflow on realistically sized inputs. You can adjust the ``++maxStackConsumptionFactor++`` parameter to adjust this.
=== Noncompliant code example
[source,java]
----
Pattern.compile("(a|b)*"); // Noncompliant
Pattern.compile("(.|\n)*"); // Noncompliant
Pattern.compile("(ab?)*"); // Noncompliant
----
=== Compliant solution
[source,java]
----
Pattern.compile("[ab]*"); // Character classes don't cause recursion the way that '|' does
Pattern.compile("(?s).*"); // Enabling the (?s) flag makes '.' match line breaks, so '|\n' isn't necessary
Pattern.compile("(ab?)*+"); // Possessive quantifiers don't cause recursion because they disable backtracking
----
ifdef::env-github,rspecator-view[]
'''
== Implementation Specification
(visible only on this page)
=== Message
Refactor this repetition that can lead to a stack overflow for large inputs.
=== Parameters
.maxStackConsumptionFactor
****
----
TBD
----
An indicator approximately proportional to how quickly the stack grows relative to the input size. An issue will be reported if the value for a regex exceeds the maximum set here. Setting this to 0 will cause an issue to be reported for all regular expressions with non-constant stack consumption.
****
=== Highlighting
The offending repetition
endif::env-github,rspecator-view[]