[APPSEC-271] Modify rule S2091: Change text to the education framework format [Python] (#1396)

* Add rule information

* Add lxml

* Add Python stdlib

* Change the XPath queries such that they're correct

* Remove nonexistant highlighting reference

* Add lxml as allowed framework

* Split up parameterized queries and validation

* Fix typo

* Make changes in Java docs

* Fix .NET text

* Update rules/S2091/python/how-to-fix-it/python.adoc

Co-authored-by: Pierre-Loup <49131563+pierre-loup-tristant-sonarsource@users.noreply.github.com>

* Update common texts

* Update code samples

* Fix typo

* Use correct syntax for lxml

* Apply code review fixes

Co-authored-by: Pierre-Loup <49131563+pierre-loup-tristant-sonarsource@users.noreply.github.com>
This commit is contained in:
Egon Okerman 2022-11-21 11:47:20 +01:00 committed by Christophe Zürn
parent 5873b5bf33
commit f22ae4c3e2
9 changed files with 189 additions and 39 deletions

View File

@ -38,4 +38,5 @@
* PyYAML
* SQLAlchemy
* Amazon DynamoDB
* python-ldap
* python-ldap
* lxml

View File

@ -0,0 +1,12 @@
==== Parameterized Queries
For XPath injections, the cleanest way to do so is to use parameterized queries.
XPath allows for the usage of variables inside expressions in the form of `$variable`. XPath variables can be used to construct an XPath query without needing to concatenate user arguments to the query at runtime. Here is an example of an XPath query with variables:
----
/users/user[@user=$user and @pass=$pass]
----
When the XPath query is executed, the user input is passed alongside it. During execution, when the values of the variables need to be known, a resolver will return the correct user input for each variable. The contents of the variables are not considered application logic by the XPath executor, and thus injection is not possible.

View File

@ -1,40 +1,13 @@
==== Validation
As a rule of thumb, the best approach to protect against injections is to
systematically ensure that untrusted data cannot break out of the initially
intended logic.
In case XPath parameterized queries are not available, the most secure way to protect against injections is to validate the input before using it in an XPath query.
For XPath injections, the cleanest way to do so is to use parameterized queries
if it is available for your use case. Else, the most secure way to do so is to
validate the input before using it in an XPath query.
**Important**: The application must do this validation server-side. Validating this client-side is insecure.
Create a list of authorized and secure strings that you want the application to
be able to use in a query. +
If a user input does not match an entry in this list, it should be rejected
because it is considered unsafe.
Input can be validated in multiple ways:
The list can be either a regex string, an array, or validators on specific
ranges of characters. If you use regexes, choose simple regexes to avoid ReDOS
attacks.
*Important note*: The application must do validation on the server side. Not on
client-side front-ends.
As a last resort, the most important security mechanism is the restriction of
XPath metacharacters, such as:
* `"`
* `'`
* `/`
* `@`
* `=`
* `*`
* `[`
* `]`
* `(`
* `)`
For Java, OWASP's function
https://www.javadoc.io/doc/org.owasp.esapi/esapi/latest/org/owasp/esapi/Encoder.html#encodeForXPath-java.lang.String-[`encodeForXPath`]
allows sanitizing these characters automatically.
* By checking against a list of authorized and secure strings that the application is allowed to use in a query.
* By ensuring user input is restricted to a specific range of characters (e.g., the regex `/^[a-zA-Z0-9]*$/` only allows alphanumeric characters.)
* By ensuring user input does not include any XPath metacharacters, such as `"`, `'`, `/`, `@`, `=`, `*`, `[`, `]`, `(` and `)`.
If user input is not considered valid, it should be rejected as it is unsafe.

View File

@ -44,6 +44,10 @@ public class ExampleController : Controller
=== How does this work?
As a rule of thumb, the best approach to protect against injections is to
systematically ensure that untrusted data cannot break out of the initially
intended logic.
include::../../common/fix/validation.adoc[]
In the example, a validation mechanism is applied to untrusted input to ensure

View File

@ -45,9 +45,14 @@ public boolean authenticate(HttpServletRequest req, XPath xpath, Document doc) t
=== How does this work?
As a rule of thumb, the best approach to protect against injections is to
systematically ensure that untrusted data cannot break out of the initially
intended logic.
include::../../common/fix/parameterized-queries.adoc[]
In the example, a parameterized XPath query is created, and an `XPathVariableResolver` is used to securely insert untrusted data into the query, similar to parameterized SQL queries.
include::../../common/fix/validation.adoc[]
In the example, a parameterized XPath query is created as an XPath Factory
resolver to insert untrusted data to the request in a safe way, similarly to
parametrized SQL queries.
For Java, OWASP's Enterprise Security API offers https://www.javadoc.io/doc/org.owasp.esapi/esapi/latest/org/owasp/esapi/Encoder.html#encodeForXPath-java.lang.String-[`encodeForXPath`] which sanitizes metacharacters automatically.

View File

@ -0,0 +1,54 @@
=== How to fix it in lxml
The following noncompliant code is vulnerable to XPath injections because untrusted data is
concatenated to an XPath query without prior validation.
==== Noncompliant code example
[source,python,diff-id=1,diff-type=noncompliant]
----
from flask import request
from lxml import etree
@app.route('/authenticate')
def authenticate():
username = request.args['username']
password = request.args['password']
expression = "./users/user[@name='" + username + "' and @pass='" + password + "']"
tree = etree.parse('resources/users.xml')
if tree.find(expression) is None:
return "Invalid credentials", 401
else:
return "Success", 200
----
==== Compliant solution
[source,python,diff-id=1,diff-type=compliant]
----
from flask import request
from lxml import etree
@app.route('/authenticate')
def authenticate():
username = request.args['username']
password = request.args['password']
expression = "./users/user[@name=$username and @pass=$password]"
tree = etree.parse('resources/users.xml')
if tree.xpath(expression, username=username, password=password) is None:
return "Invalid credentials", 401
else:
return "Success", 200
----
=== How does this work?
As a rule of thumb, the best approach to protect against injections is to
systematically ensure that untrusted data cannot break out of the initially
intended logic.
include::../../common/fix/parameterized-queries.adoc[]
In the example, the username and password are passed as https://lxml.de/xpathxslt.html#:~:text=The%20xpath()%20method%20has%20support%20for%20XPath%20variables%3A[XPath variables] rather than concatenated to the XPath query. By using a parameterized query, injection is successfully prevented.

View File

@ -0,0 +1,73 @@
=== How to fix it in the Python Standard Library
The following noncompliant code is vulnerable to XPath injections because untrusted data is
concatenated to an XPath query without prior validation. Although `xml.etree.ElementTree` only
supports a subset of XPath syntax, exploitation can still be possible.
==== Noncompliant code example
[source,python,diff-id=1,diff-type=noncompliant]
----
from xml.etree import ElementTree
from flask import request
@app.route('/authenticate')
def authenticate():
username = request.args['username']
password = request.args['password']
expression = "./users/user[@name='" + username + "'][@pass='" + password + "']"
tree = ElementTree.parse('resources/users.xml')
if tree.find(expression) is None:
return "Invalid credentials", 401
else:
return "Success", 200
----
////
The above example would be exploitable by using e.g. `username=”admin][.!=”` and `password=”][.!='XXXXX”`
This creates the following query: `./users/user[@name='admin][.!='][@pass='][.!='XXXXX']`,
where `[.!='][@pass=']` and `[.!='XXXXX']` should match on everything, resulting in authentication as admin.
////
==== Compliant solution
[source,python,diff-id=1,diff-type=compliant]
----
import re
from xml.etree import ElementTree
from flask import request
tree = ElementTree.parse('resources/users.xml')
@app.route('/authenticate')
def authenticate():
username = request.args['username']
password = request.args['password']
if re.match("^[a-zA-Z0-9]*$", username) is None or re.match("^[a-zA-Z0-9]*$", password) is None:
return "Username or password contains invalid characters", 400
expression = "./users/user[@name='" + username + "'][@pass='" + password + "']"
tree = ElementTree.parse('resources/users.xml')
if tree.find(expression) is None:
return "Invalid credentials", 401
else:
return "Success", 200
----
=== How does this work?
As a rule of thumb, the best approach to protect against injections is to
systematically ensure that untrusted data cannot break out of the initially
intended logic.
include::../../common/fix/parameterized-queries.adoc[]
It is not possible to construct parameterized queries using only the Python Standard Library. +
Please use https://pypi.org/project/lxml/[lxml] instead, which allows for parameterized queries using the `ElementTree.xpath()` method.
include::../../common/fix/validation.adoc[]
In the example, we ensure that the username and password only contain alphanumeric characters by doing a regex match before executing the XPath query.

View File

@ -0,0 +1 @@
{}

View File

@ -0,0 +1,27 @@
== Why is this an issue?
include::../rationale.adoc[]
include::../impact.adoc[]
== How to fix it?
include::how-to-fix-it/lxml.adoc[]
include::how-to-fix-it/python.adoc[]
== Resources
include::../common/resources/standards.adoc[]
ifdef::env-github,rspecator-view[]
'''
== Implementation Specification
(visible only on this page)
include::../message.adoc[]
'''
'''
endif::env-github,rspecator-view[]