From: bugzilla-daemon@kernel.org
To: linux-scsi@vger.kernel.org
Subject: [Bug 220504] New: [BUG] aacraid: DMA mapping leak in aac_send_raw_srb() causes eventual -ENOMEM failure
Date: Wed, 27 Aug 2025 05:34:34 +0000
Message-ID: <bug-220504-11613@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=220504
Bug ID: 220504
Summary: [BUG] aacraid: DMA mapping leak in aac_send_raw_srb()
causes eventual -ENOMEM failure
Product: SCSI Drivers
Version: 2.5
Hardware: All
OS: Linux
Status: NEW
Severity: high
Priority: P3
Component: AACRAID
Assignee: scsi_drivers-aacraid@kernel-bugs.osdl.org
Reporter: shobu@ume2001.com
Regression: No
Created attachment 308556
--> https://bugzilla.kernel.org/attachment.cgi?id=308556&action=edit
Please refer to the details in the "Attachments" section of the main text
# Summary
On systems using the aacraid driver, repeated FSACTL_SEND_RAW_SRB ioctls (for
example, those issued by smartctl -d aacraid,...) trigger a DMA mapping leak in
the aac_send_raw_srb() function. Over time, this exhausts DMA resources,
leading to allocation failures (-ENOMEM, reported as -12) and eventual system
instability or a crash.
## Description
The aac_send_raw_srb() function in drivers/scsi/aacraid/commctrl.c is
responsible for handling raw SRB commands from userspace. When processing a
command with a scatter/gather (SG) list, the function iterates through the
list, allocates kernel memory for each entry (kmalloc), and maps it for DMA
using dma_map_single.
However, the corresponding dma_unmap_single call is missing from all code
paths, including both the success and error-handling (cleanup) paths.
Consequently, every SG entry processed through this ioctl leaks its DMA
mapping.
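The pairing rule described above can be illustrated with a small userspace model. This is only a sketch, not the driver code: `mock_map()`, `mock_unmap()`, `sg_entry`, and `live_mappings` are hypothetical stand-ins for the kernel DMA API and the driver's per-entry state, and `malloc()` stands in for `kmalloc()`. It contrasts a handler that frees buffers without releasing their mappings with one that releases each mapping before freeing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace model only; every name below is hypothetical. */
static long live_mappings;      /* mappings created but not yet released */

typedef struct {
    void  *buf;     /* stand-in for the kernel buffer of one SG entry */
    size_t len;
    int    mapped;  /* nonzero once mock_map() has succeeded */
} sg_entry;

/* Allocate a buffer and record one live mapping for it. */
static int mock_map(sg_entry *e, size_t len)
{
    assert(e != NULL);
    e->buf = malloc(len);
    if (!e->buf)
        return -1;
    e->len = len;
    e->mapped = 1;
    live_mappings++;
    return 0;
}

/* Release the mapping if one exists; safe to call on unmapped entries. */
static void mock_unmap(sg_entry *e)
{
    if (e->mapped) {
        e->mapped = 0;
        live_mappings--;
    }
}

/* Pattern described in the report: buffers are freed, mappings are not. */
static void handle_request_leaky(size_t nents, size_t len)
{
    sg_entry *sg = calloc(nents, sizeof(*sg));
    if (!sg)
        return;
    for (size_t i = 0; i < nents; i++)
        if (mock_map(&sg[i], len))
            break;
    for (size_t i = 0; i < nents; i++)
        free(sg[i].buf);        /* mapping is never released */
    free(sg);
}

/* Paired pattern: one cleanup loop, used on success and on error. */
static void handle_request_paired(size_t nents, size_t len)
{
    sg_entry *sg = calloc(nents, sizeof(*sg));
    if (!sg)
        return;
    for (size_t i = 0; i < nents; i++)
        if (mock_map(&sg[i], len))
            break;              /* fall through to the shared cleanup */
    for (size_t i = 0; i < nents; i++) {
        mock_unmap(&sg[i]);     /* release the mapping first */
        free(sg[i].buf);        /* then free the buffer */
    }
    free(sg);
}
```

In this model, each call to `handle_request_leaky()` leaves `live_mappings` higher by the number of entries it mapped, while `handle_request_paired()` returns it to its starting value.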
This continuous resource leak eventually prevents new DMA mappings from being
created, causing I/O operations to fail with -ENOMEM. The failure appears in
the kernel log as:
```
aac_read: aac_fib_send failed with status: -12
```
Eventually, the system becomes unstable or crashes.
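For readers decoding the log line, the status value is the negated errno constant. The following is a minimal userspace sketch, assuming a Linux build where `ENOMEM` is 12; the helper names `status_is_enomem()` and `describe_status()` are hypothetical and are not part of the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 when a negative kernel-style status corresponds to ENOMEM. */
static int status_is_enomem(int status)
{
    return status == -ENOMEM;
}

/* Writes "<status> (<errno text>)" into out, e.g. for annotating a log. */
static void describe_status(int status, char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);
    snprintf(out, outlen, "%d (%s)", status, strerror(-status));
}
```

On such a system, `status_is_enomem(-12)` returns 1, matching the "status: -12" in the log excerpt.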
## Affected Versions
Observed on AlmaLinux 9 kernels (5.14.x) through kernel-ml 6.14.9 from ELRepo.
Based on code inspection, the issue likely persists in the latest upstream
aacraid driver.
**Note:** AlmaLinux 9.x and `kernel-ml` packages use the upstream `aacraid`
code without distribution-specific modifications.
## Steps to Reproduce
1. Use a system with an Adaptec RAID controller (e.g., ASR81605ZQ or ASR71605).
2. Run the following accelerated test repeatedly for 1–2 days:
```
while true; do
    smartctl -d aacraid,N1,N2,N3 /dev/sdX
done
```
(Replace N1,N2,N3 and device names as appropriate.)
3. Monitor with the attached bpftrace script (aac_leak_monitor_pre1.bt) to
observe allocation/free imbalance during aac_send_raw_srb() execution.
## Observed Behavior
- During the smartctl stress test, the difference between allocations and
frees (alloc - free) grows monotonically.
- Eventually, kernel logs show:
```
aac_read: aac_fib_send failed with status: -12
```
followed by a system hang or reboot.
- Other tools (e.g., arcconf / MaxView) may cause temporary diffs, but they
return to zero.
## Sample Logs
```
kernel: aacraid: Host adapter abort request.
kernel: aacraid: Host bus reset request. SCSI hang?
kernel: aacraid 0000:12:00.0: outstanding cmd: midlevel-0
kernel: aacraid 0000:12:00.0: outstanding cmd: firmware-1
kernel: aacraid 0000:12:00.0: Issuing IOP reset
kernel: aacraid 0000:12:00.0: IOP reset succeeded
kernel: aac_read: aac_fib_send failed with status: -12.
(repeated hundreds of times before crash)
```
## bpftrace Output (excerpt)
```
Time Alloc: 1024 Free: 1010 Diff: 14
Bytes: 65536 Live: 14
...
Time Alloc: 2048 Free: 2010 Diff: 38
Bytes: 131072 Live: 38
```
The Diff value increases steadily during smartctl stress and never returns to
zero, confirming a leak in the aac_send_raw_srb() scope.
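The arithmetic behind this reading can be sketched as follows. The sketch assumes that the Alloc and Free columns are cumulative counters and that Diff is their difference; the struct and function names are hypothetical and are not taken from the attached script.

```c
#include <assert.h>
#include <stddef.h>

/* One counter snapshot, as assumed from the excerpt above. */
struct sample {
    long alloc;   /* cumulative allocations seen */
    long freed;   /* cumulative frees seen */
};

/* Outstanding objects at one snapshot (the Diff column). */
static long outstanding(const struct sample *s)
{
    return s->alloc - s->freed;
}

/*
 * Returns 1 if the outstanding count never decreases across the samples
 * and ends higher than it started; returns 0 otherwise.
 */
static int looks_like_leak(const struct sample *s, size_t n)
{
    assert(s != NULL);
    if (n < 2)
        return 0;
    for (size_t i = 1; i < n; i++)
        if (outstanding(&s[i]) < outstanding(&s[i - 1]))
            return 0;
    return outstanding(&s[n - 1]) > outstanding(&s[0]);
}

/* Average growth in outstanding objects per allocation between snapshots. */
static double leak_rate(const struct sample *a, const struct sample *b)
{
    long d_alloc = b->alloc - a->alloc;
    if (d_alloc <= 0)
        return 0.0;
    return (double)(outstanding(b) - outstanding(a)) / (double)d_alloc;
}
```

Applied to the two excerpted rows (1024/1010 and 2048/2010), the outstanding counts are 14 and 38, so 24 additional objects remained outstanding over 1024 further allocations, roughly 2.3% of them. A tool whose diffs return to zero, as described for arcconf, would not satisfy `looks_like_leak()`.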
## Workaround
Enable the expose_physicals module parameter of aacraid so that smartctl can
be run without the -d aacraid option. In this mode, the leak does not occur.
## Additional Notes
Attached is a bpftrace script used to confirm the leak.
Also attached is an experimental patch (AI-assisted, not for direct use) that
attempts to fix the leak by ensuring all DMA mappings are unmapped in cleanup
paths. This patch is provided for reference only to highlight the suspected
problematic areas in aac_send_raw_srb().
This patch was generated with the assistance of AI tools and adapted by a
non-expert user. Applying AI-generated patches without a deep understanding of
the subsystem and established development practices is not recommended. The
final resolution should be determined by the maintainers.
_Some logs also contained the message `"Host bus reset request. SCSI hang?"`,
which initially led me to suspect a relation to the issue reported on
linux-scsi (see: https://marc.info/?l=linux-scsi&m=168781894020549&w=2). I also
tested with the recent patch associated with that report, but it appears
unrelated to this problem. For the background and progression of my
investigation until reaching this conclusion, please refer to the following
report: "0001514: kmod-aacraid Issue Resurfaces with ASR71605 on AlmaLinux 9.5"
(https://elrepo.org/bugs/view.php?id=1514)._
## Attachments
- aacraid-smartctl-test – script for accelerated test
- aac_leak_monitor_pre1.bt – bpftrace script for leak detection
- aacraid-fix-aac_send_raw_srb-memleak.patch – experimental patch (reference
only)
## Test Environments
### **Test Environment A**
```
System:
Host: serverA Kernel: 6.14.9-1.el9.elrepo.x86_64 arch: x86_64 bits: 64
Console: pty pts/0 Distro: AlmaLinux 9.6 (Sage Margay)
Machine:
Type: Unknown Mobo: ASRockRack model: X470D4U serial: M80-D8000100878
UEFI: American Megatrends LLC. v: L4.29A date: 03/11/2024
Memory:
System RAM: total: 32 GiB available: 30.21 GiB used: 18.28 GiB (60.5%)
Array-1: capacity: 128 GiB slots: 4 modules: 2 EC: Multi-bit ECC
Device-1: Channel-A DIMM 0 type: no module installed
Device-2: Channel-A DIMM 1 type: DDR4 size: 16 GiB speed: 3200 MT/s
Device-3: Channel-B DIMM 0 type: no module installed
Device-4: Channel-B DIMM 1 type: DDR4 size: 16 GiB speed: 3200 MT/s
CPU:
Info: 6-core model: AMD Ryzen 5 PRO 4650G with Radeon Graphics
```
```
# arcconf getconfig 1 AD | grep 'Model'
Controller Model : Adaptec ASR81605ZQ
```
```
# arcconf getversion 1
Controllers found: 1
Controller #1
==============
Firmware : 7.18-0 (33556)
Staged Firmware : 7.18-0 (33556)
BIOS : 7.18-0 (33556)
Driver : 1.2-1 (50983)
Boot Flash : 7.18-0 (33556)
CPLD (Load version/ Flash version) : 12/ 12
SEEPROM (Load version/ Flash version) : 1/ 1
FCT Custom Init String Version : 0x0
```
---
### **Test Environment B**
```
System:
Host: serverB Kernel: 6.14.9-1.el9.elrepo.x86_64 arch: x86_64 bits: 64
Console: pty pts/4 Distro: AlmaLinux 9.6 (Sage Margay)
Machine:
Type: Unknown Mobo: ASRockRack model: X470D4U serial: M80-D6013900227
UEFI-[Legacy]: American Megatrends v: P3.50 date: 11/02/2020
Memory:
System RAM: total: 48 GiB available: 46.73 GiB used: 16.34 GiB (35.0%)
Array-1: capacity: 128 GiB slots: 4 modules: 4 EC: Multi-bit ECC
Device-1: Channel-A DIMM 0 type: DDR4 size: 16 GiB speed: spec: 3200 MT/s
actual: 2933 MT/s
Device-2: Channel-A DIMM 1 type: DDR4 size: 8 GiB speed: spec: 3200 MT/s
actual: 2933 MT/s
Device-3: Channel-B DIMM 0 type: DDR4 size: 16 GiB speed: spec: 3200 MT/s
actual: 2933 MT/s
Device-4: Channel-B DIMM 1 type: DDR4 size: 8 GiB speed: spec: 3200 MT/s
actual: 2933 MT/s
CPU:
Info: 6-core model: AMD Ryzen 5 2600
```
```
# arcconf getconfig 1 AD | grep 'Model'
Controller Model : Adaptec ASR71605
```
```
# arcconf getversion 1
Controllers found: 1
Controller #1
==============
Firmware : 7.5-0 (32118)
Staged Firmware : 7.5-0 (32118)
BIOS : 7.5-0 (32118)
Driver : 1.2-1 (50983)
Boot Flash : 7.5-0 (32118)
CPLD (Load version/ Flash version) : 8/ 10
SEEPROM (Load version/ Flash version) : 1/ 1
```
---
### **Test Environment C**
```
System:
Host: serverC Kernel: 6.14.9-1.el9.elrepo.x86_64 arch: x86_64 bits: 64
Console: pty pts/0 Distro: AlmaLinux 9.6 (Sage Margay)
Machine:
Type: Desktop Mobo: ASUSTeK model: ROG STRIX B350-F GAMING v: Rev X.0x
serial: 171012711504156
UEFI: American Megatrends v: 6232 date: 09/29/2024
Memory:
System RAM: total: 16 GiB available: 15.28 GiB used: 1.28 GiB (8.3%)
Array-1: capacity: 128 GiB slots: 4 modules: 2 EC: None
Device-1: DIMM_A1 type: no module installed
Device-2: DIMM_A2 type: DDR4 size: 8 GiB speed: 2666 MT/s
Device-3: DIMM_B1 type: no module installed
Device-4: DIMM_B2 type: DDR4 size: 8 GiB speed: 2666 MT/s
CPU:
Info: 6-core model: AMD Ryzen 5 1600
```
```
# arcconf getconfig 1 AD | grep 'Model'
Controller Model : Adaptec ASR71605
```
```
# arcconf getversion 1
Controllers found: 1
Controller #1
==============
Firmware : 7.5-0 (32118)
Staged Firmware : 7.5-0 (32118)
BIOS : 7.5-0 (32118)
Driver : 1.2-1 (50983)
Boot Flash : 7.5-0 (32118)
CPLD (Load version/ Flash version) : 7/ 10
SEEPROM (Load version/ Flash version) : 0/ 1
```