Compare commits
2 Commits
master ... rule/add-R

Author | SHA1 | Date
---|---|---
 | 36a7291544 |
 | 0660c9d2a2 |

2 rules/S6981/metadata.json Normal file
@@ -0,0 +1,2 @@
{
}

23 rules/S6981/python/metadata.json Normal file
@@ -0,0 +1,23 @@
{
  "title": "Gradients should be scaled when using mixed precision",
  "type": "BUG",
  "status": "ready",
  "remediation": {
    "func": "Constant\/Issue",
    "constantCost": "5min"
  },
  "tags": [
  ],
  "defaultSeverity": "Major",
  "ruleSpecification": "RSPEC-6981",
  "sqKey": "S6981",
  "scope": "All",
  "defaultQualityProfiles": ["Sonar way"],
  "quickfix": "infeasible",
  "code": {
    "impacts": {
      "RELIABILITY": "HIGH"
    },
    "attribute": "COMPLETE"
  }
}

112 rules/S6981/python/rule.adoc Normal file
@@ -0,0 +1,112 @@
This rule raises an issue when an unscaled loss is used for the backward pass and when the forward pass happened in a mixed-precision context.

== Why is this an issue?

When using mixed precision training, tensors can be cast to lower precision variants to save memory and computing power.
The gradients accumulated during the forward pass might also be cast to a lower precision variant. If the resulting gradients have a small enough magnitude, they might underflow.
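
As a rough illustration (the value and variable names here are illustrative only, not part of the rule), a number that is representable in `float32` but smaller than the smallest subnormal `float16` value (about 6e-8) flushes to zero when cast:

[source,python]
----
import torch

# A gradient-sized value that fits comfortably in float32...
grad_fp32 = torch.tensor(1e-8, dtype=torch.float32)

# ...underflows to zero once cast to float16.
grad_fp16 = grad_fp32.to(torch.float16)

print(grad_fp32.item())  # ~1e-08
print(grad_fp16.item())  # 0.0: the gradient information is lost
----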

=== What is the potential impact?

If the gradients underflow, the model might not learn properly and the training might be unstable.

== How to fix it

To fix this issue, you can use the relevant implementation of `GradScaler`, depending on the `autocast` context and device you are using.
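
For instance, the scaler can be created for the same device that the `autocast` context targets. This is a minimal sketch and assumes a PyTorch version recent enough to expose the device-agnostic `torch.amp.GradScaler` entry point:

[source,python]
----
import torch

# Pick the device that the autocast context will target.
device = "cuda" if torch.cuda.is_available() else "cpu"

# torch.amp.GradScaler(device) is the device-agnostic spelling;
# on CUDA it behaves like torch.cuda.amp.GradScaler().
scaler = torch.amp.GradScaler(device)
----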

=== Code examples

==== Noncompliant code example

[source,python,diff-id=1,diff-type=noncompliant]
----
import torch

model = torch.nn.Linear(28*28, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

x = torch.randn(1, 1*28*28)
y = torch.rand(1, 10)

optimizer.zero_grad()
with torch.autocast(device_type="cuda"):
    output = model(x)
    loss = torch.nn.functional.cross_entropy(output, y)
loss.backward() # Noncompliant: The loss is used without being scaled
optimizer.step()
----

==== Compliant solution

[source,python,diff-id=1,diff-type=compliant]
----
import torch

model = torch.nn.Linear(28*28, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(1, 1*28*28)
y = torch.rand(1, 10)

optimizer.zero_grad()
with torch.autocast(device_type="cuda"):
    output = model(x)
    loss = torch.nn.functional.cross_entropy(output, y)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
----

=== How does this work?

The `GradScaler` class is used to scale the loss before calling the `backward` method. This ensures that the gradients do not underflow.
The calls to `backward()` and `step()` are replaced by `scaler.scale(loss).backward()` and `scaler.step(optimizer)` respectively.
We also need to add a call to `scaler.update()` to correctly update the scaler.

== Resources

=== Documentation

* PyTorch documentation - https://pytorch.org/docs/stable/amp.html#gradient-scaling[Gradient Scaling]

ifdef::env-github,rspecator-view[]

(visible only on this page)

== Implementation specification

This is a tough implementation, with lots of false negatives in sight.

There are multiple ways to have an autocast context: with the context manager, or with a decorator on the `forward` method of the model (see the sketch below).
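
A minimal sketch of the decorator form (the model and shapes here are hypothetical; only the autocast usage matters):

[source,python]
----
import torch

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(28*28, 10)

    # autocast applied as a decorator instead of a `with` block:
    # the forward pass runs in mixed precision, so the detection
    # logic also has to recognize this pattern.
    @torch.autocast(device_type="cuda")
    def forward(self, x):
        return self.linear(x)
----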

I think the implementation should not try too hard to find the issue.

Find one function that has the following properties:

- Has the autocast context manager, which contains a call to a subclass of `nn.Module`
OR
- Contains a call to a subclass of `nn.Module`, with the `@autocast` decorator on the `forward` method

- Has a call to the `backward` method of a tensor

- Has a call to the `step` method (possibly filtered to an object in the optimizer module?)

=== Message

Primary: Use a GradScaler to avoid underflows

Secondary: Autocast context started here
Secondary: The optimizer step should be proxied by a GradScaler


=== Issue location

Primary: on the entire `.backward()` call

Secondary: The autocast context or decorator

Secondary: The `optimizer.step()` call


=== Quickfix

No

endif::env-github,rspecator-view[]