[APPSEC-271] Modify rule S2091: Change text to the education framework format [Python] (#1396)

* Add rule information * Add lxml * Add Python stdlib * Change the XPath queries such that they're correct * Remove nonexistant highlighting reference * Add lxml as allowed framework * Split up parameterized queries and validation * Fix typo * Make changes in Java docs * Fix .NET text * Update rules/S2091/python/how-to-fix-it/python.adoc Co-authored-by: Pierre-Loup <49131563+pierre-loup-tristant-sonarsource@users.noreply.github.com> * Update common texts * Update code samples * Fix typo * Use correct syntax for lxml * Apply code review fixes Co-authored-by: Pierre-Loup <49131563+pierre-loup-tristant-sonarsource@users.noreply.github.com>
2022-11-21 11:47:20 +01:00 · 2022-11-21 11:47:20 +01:00 · f22ae4c3e2
commit f22ae4c3e2
parent 5873b5bf33
9 changed files with 189 additions and 39 deletions
--- a/docs/allowed_framework_names.adoc
+++ b/docs/allowed_framework_names.adoc
@ -38,4 +38,5 @@
 * PyYAML
 * SQLAlchemy
 * Amazon DynamoDB
-* python-ldap
+* python-ldap
+* lxml
--- a/rules/S2091/common/fix/parameterized-queries.adoc
+++ b/rules/S2091/common/fix/parameterized-queries.adoc
@ -0,0 +1,12 @@
+==== Parameterized Queries
+For XPath injections, the cleanest way to do so is to use parameterized queries.
+
+XPath allows for the usage of variables inside expressions in the form of `$variable`. XPath variables can be used to construct an XPath query without needing to concatenate user arguments to the query at runtime. Here is an example of an XPath query with variables:
+
+----
+
+/users/user[@user=$user and @pass=$pass]
+
+----
+
+When the XPath query is executed, the user input is passed alongside it. During execution, when the values of the variables need to be known, a resolver will return the correct user input for each variable. The contents of the variables are not considered application logic by the XPath executor, and thus injection is not possible.
--- a/rules/S2091/common/fix/validation.adoc
+++ b/rules/S2091/common/fix/validation.adoc
@ -1,40 +1,13 @@
 ==== Validation

-As a rule of thumb, the best approach to protect against injections is to
-systematically ensure that untrusted data cannot break out of the initially
-intended logic.
+In case XPath parameterized queries are not available, the most secure way to protect against injections is to validate the input before using it in an XPath query.

-For XPath injections, the cleanest way to do so is to use parameterized queries
-if it is available for your use case. Else, the most secure way to do so is to
-validate the input before using it in an XPath query.
+**Important**: The application must do this validation server-side. Validating this client-side is insecure.

-Create a list of authorized and secure strings that you want the application to
-be able to use in a query. +
-If a user input does not match an entry in this list, it should be rejected
-because it is considered unsafe.
+Input can be validated in multiple ways:

-The list can be either a regex string, an array, or validators on specific
-ranges of characters. If you use regexes, choose simple regexes to avoid ReDOS
-attacks.
-
-*Important note*: The application must do validation on the server side. Not on
-client-side front-ends.
-
-As a last resort, the most important security mechanism is the restriction of
-XPath metacharacters, such as:
-
-* `"`
-* `'`
-* `/`
-* `@`
-* `=`
-* `*`
-* `[`
-* `]`
-* `(`
-* `)`
-
-For Java, OWASP's function
-https://www.javadoc.io/doc/org.owasp.esapi/esapi/latest/org/owasp/esapi/Encoder.html#encodeForXPath-java.lang.String-[`encodeForXPath`]
-allows sanitizing these characters automatically.
+*  By checking against a list of authorized and secure strings that the application is allowed to use in a query.
+*  By ensuring user input is restricted to a specific range of characters (e.g., the regex `/^[a-zA-Z0-9]*$/` only allows alphanumeric characters.)
+*  By ensuring user input does not include any XPath metacharacters, such as `"`, `'`, `/`, `@`, `=`, `*`, `[`, `]`, `(` and `)`.

+If user input is not considered valid, it should be rejected as it is unsafe.
--- a/rules/S2091/csharp/how-to-fix-it/dotnet.adoc
+++ b/rules/S2091/csharp/how-to-fix-it/dotnet.adoc
@ -44,6 +44,10 @@ public class ExampleController : Controller

 === How does this work?

+As a rule of thumb, the best approach to protect against injections is to
+systematically ensure that untrusted data cannot break out of the initially
+intended logic.
+
 include::../../common/fix/validation.adoc[]

 In the example, a validation mechanism is applied to untrusted input to ensure
--- a/rules/S2091/java/how-to-fix-it/java-se.adoc
+++ b/rules/S2091/java/how-to-fix-it/java-se.adoc
@ -45,9 +45,14 @@ public boolean authenticate(HttpServletRequest req, XPath xpath, Document doc) t

 === How does this work?

+As a rule of thumb, the best approach to protect against injections is to
+systematically ensure that untrusted data cannot break out of the initially
+intended logic.
+
+include::../../common/fix/parameterized-queries.adoc[]
+
+In the example, a parameterized XPath query is created, and an `XPathVariableResolver` is used to securely insert untrusted data into the query, similar to parameterized SQL queries.
+
 include::../../common/fix/validation.adoc[]

-In the example, a parameterized XPath query is created as an XPath Factory
-resolver to insert untrusted data to the request in a safe way, similarly to
-parametrized SQL queries.
-
+For Java, OWASP's Enterprise Security API offers https://www.javadoc.io/doc/org.owasp.esapi/esapi/latest/org/owasp/esapi/Encoder.html#encodeForXPath-java.lang.String-[`encodeForXPath`] which sanitizes metacharacters automatically.
--- a/rules/S2091/python/how-to-fix-it/lxml.adoc
+++ b/rules/S2091/python/how-to-fix-it/lxml.adoc
@ -0,0 +1,54 @@
+=== How to fix it in lxml
+
+The following noncompliant code is vulnerable to XPath injections because untrusted data is
+concatenated to an XPath query without prior validation.
+
+==== Noncompliant code example
+
+[source,python,diff-id=1,diff-type=noncompliant]
+----
+from flask import request
+from lxml import etree
+
+@app.route('/authenticate')
+def authenticate():
+    username = request.args['username']
+    password = request.args['password']
+    expression = "./users/user[@name='" + username + "' and @pass='" + password + "']"
+    tree = etree.parse('resources/users.xml')
+
+    if tree.find(expression) is None:
+        return "Invalid credentials", 401
+    else:
+        return "Success", 200
+----
+
+==== Compliant solution
+
+[source,python,diff-id=1,diff-type=compliant]
+----
+from flask import request
+from lxml import etree
+
+@app.route('/authenticate')
+def authenticate():
+    username = request.args['username']
+    password = request.args['password']
+    expression = "./users/user[@name=$username and @pass=$password]"
+    tree = etree.parse('resources/users.xml')
+
+    if tree.xpath(expression, username=username, password=password) is None:
+        return "Invalid credentials", 401
+    else:
+        return "Success", 200
+----
+
+=== How does this work?
+
+As a rule of thumb, the best approach to protect against injections is to
+systematically ensure that untrusted data cannot break out of the initially
+intended logic.
+
+include::../../common/fix/parameterized-queries.adoc[]
+
+In the example, the username and password are passed as https://lxml.de/xpathxslt.html#:~:text=The%20xpath()%20method%20has%20support%20for%20XPath%20variables%3A[XPath variables] rather than concatenated to the XPath query. By using a parameterized query, injection is successfully prevented.
--- a/rules/S2091/python/how-to-fix-it/python.adoc
+++ b/rules/S2091/python/how-to-fix-it/python.adoc
@ -0,0 +1,73 @@
+=== How to fix it in the Python Standard Library
+
+The following noncompliant code is vulnerable to XPath injections because untrusted data is
+concatenated to an XPath query without prior validation. Although `xml.etree.ElementTree` only
+supports a subset of XPath syntax, exploitation can still be possible.
+
+==== Noncompliant code example
+
+[source,python,diff-id=1,diff-type=noncompliant]
+----
+from xml.etree import ElementTree
+from flask import request
+
+@app.route('/authenticate')
+def authenticate():
+    username = request.args['username']
+    password = request.args['password']
+    expression = "./users/user[@name='" + username + "'][@pass='" + password + "']"
+    tree = ElementTree.parse('resources/users.xml')
+
+    if tree.find(expression) is None:
+        return "Invalid credentials", 401
+    else:
+        return "Success", 200
+----
+
+////
+The above example would be exploitable by using e.g. `username=”admin][.!=”` and `password=”][.!='XXXXX”`
+This creates the following query: `./users/user[@name='admin][.!='][@pass='][.!='XXXXX']`,
+where `[.!='][@pass=']` and `[.!='XXXXX']` should match on everything, resulting in authentication as admin.
+////
+
+==== Compliant solution
+
+[source,python,diff-id=1,diff-type=compliant]
+----
+import re
+from xml.etree import ElementTree
+from flask import request
+
+tree = ElementTree.parse('resources/users.xml')
+
+@app.route('/authenticate')
+def authenticate():
+    username = request.args['username']
+    password = request.args['password']
+
+    if re.match("^[a-zA-Z0-9]*$", username) is None or re.match("^[a-zA-Z0-9]*$", password) is None:
+        return "Username or password contains invalid characters", 400
+
+    expression = "./users/user[@name='" + username + "'][@pass='" + password + "']"
+    tree = ElementTree.parse('resources/users.xml')
+
+    if tree.find(expression) is None:
+        return "Invalid credentials", 401
+    else:
+        return "Success", 200
+----
+
+=== How does this work?
+
+As a rule of thumb, the best approach to protect against injections is to
+systematically ensure that untrusted data cannot break out of the initially
+intended logic.
+
+include::../../common/fix/parameterized-queries.adoc[]
+
+It is not possible to construct parameterized queries using only the Python Standard Library. +
+Please use https://pypi.org/project/lxml/[lxml] instead, which allows for parameterized queries using the `ElementTree.xpath()` method.
+
+include::../../common/fix/validation.adoc[]
+
+In the example, we ensure that the username and password only contain alphanumeric characters by doing a regex match before executing the XPath query.
--- a/rules/S2091/python/metadata.json
+++ b/rules/S2091/python/metadata.json
@ -0,0 +1 @@
+{}
--- a/rules/S2091/python/rule.adoc
+++ b/rules/S2091/python/rule.adoc
@ -0,0 +1,27 @@
+== Why is this an issue?
+
+include::../rationale.adoc[]
+
+include::../impact.adoc[]
+
+== How to fix it?
+
+include::how-to-fix-it/lxml.adoc[]
+
+include::how-to-fix-it/python.adoc[]
+
+== Resources
+
+include::../common/resources/standards.adoc[]
+
+ifdef::env-github,rspecator-view[]
+
+'''
+== Implementation Specification
+(visible only on this page)
+
+include::../message.adoc[]
+
+'''
+'''
+endif::env-github,rspecator-view[]