* [PATCH AUTOSEL 7.0] fs: aio: set VMA_DONTCOPY_BIT in mmap to fix NULL-pointer-dereference error
From: Sasha Levin @ 2026-04-28 10:40 UTC (permalink / raw)
To: patches, stable
Cc: Zizhi Wo, Zizhi Wo, Jan Kara, Christian Brauner, Sasha Levin,
bcrl, viro, linux-aio, linux-fsdevel, linux-kernel
From: Zizhi Wo <wozizhi@huawei.com>
[ Upstream commit c03ce4173c7bffe1e7477f905a09b015d4000d3c ]
[BUG]
Recently, our internal syzkaller testing uncovered a null pointer
dereference issue:
BUG: kernel NULL pointer dereference, address: 0000000000000000
...
[ 51.111664] filemap_read_folio+0x25/0xe0
[ 51.112410] filemap_fault+0xad7/0x1250
[ 51.113112] __do_fault+0x4b/0x460
[ 51.113699] do_pte_missing+0x5bc/0x1db0
[ 51.114250] ? __pte_offset_map+0x23/0x170
[ 51.114822] __handle_mm_fault+0x9f8/0x1680
[ 51.115408] handle_mm_fault+0x24c/0x570
[ 51.115958] do_user_addr_fault+0x226/0xa50
...
Crash analysis showed the file involved was an AIO ring file.
[CAUSE]
  PARENT process                          CHILD process
t=0 io_setup(1, &ctx)
    [access ctx addr]
    fork()
    io_destroy
      vm_munmap // not affect child vma
      percpu_ref_put
        ...
          put_aio_ring_file
t=1                                       [access ctx addr] // pagefault
                                            ...
                                              __do_fault
                                                filemap_fault
                                                  max_idx = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE)
t=2         truncate_setsize
              truncate_pagecache
t=3                                               filemap_get_folio // no folio, create folio
                                                  __filemap_get_folio(..., FGP_CREAT, ...) // page_not_uptodate
                                                  filemap_read_folio(file, mapping->a_ops->read_folio, folio) // oops!
At t=0, the parent process calls io_setup and then fork. The child process
gets its own VMA but without any PTEs. The parent then calls io_destroy.
Before i_size is truncated to 0, at t=1 the child process accesses this AIO
ctx address and triggers a pagefault. After the max_idx check passes, at
t=2 the parent calls truncate_setsize and truncate_pagecache. At t=3 the
child fails to obtain the folio, falls into the "page_not_uptodate" path,
and hits this problem because AIO does not implement "read_folio".
[Fix]
Fix this by marking the AIO ring buffer VMA with VM_DONTCOPY so
that fork()'s dup_mmap() skips it entirely. This is the correct
semantic because:
1) The child's ioctx_table is already reset to NULL by mm_init_aio() during
fork(), so the child has no AIO context and no way to perform any AIO
operations on this mapping.
2) The AIO ring VMA is only meaningful in conjunction with its associated
kioctx, which is never inherited across fork(). So child process with no
AIO context has no legitimate reason to access the ring buffer. Delivering
SIGSEGV on such an erroneous access is preferable to a kernel crash.
Signed-off-by: Zizhi Wo <wozizhi@huaweicloud.com>
Link: https://patch.msgid.link/20260413010814.548568-1-wozizhi@huawei.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
## Complete Phase-by-Phase Analysis
### PHASE 1: COMMIT MESSAGE FORENSICS
**Step 1.1 - Subject Line:**
- Record: `[fs: aio]` `[set VMA_DONTCOPY_BIT]` - indicates fix; title
explicitly says "to fix NULL-pointer-dereference error"
**Step 1.2 - Tags:**
- Record:
- Signed-off-by: Zizhi Wo (author from Huawei)
- Link:
https://patch.msgid.link/20260413010814.548568-1-wozizhi@huawei.com
- Reviewed-by: Jan Kara (SUSE, MM/filesystems reviewer)
- Signed-off-by: Christian Brauner (VFS maintainer)
- No explicit Cc: stable, no Fixes: tag (expected for this evaluation
pipeline)
**Step 1.3 - Commit Body:**
- Record: The body describes an internal syzkaller-discovered NULL
pointer deref reproducible by a fork()+io_destroy race. A detailed
timing diagram shows 4 time steps (t=0..t=3) explaining the race
between parent's io_destroy() teardown and child's page fault on the
inherited AIO ring VMA. The kernel crash stack trace shows:
`do_user_addr_fault -> handle_mm_fault -> __handle_mm_fault ->
do_pte_missing -> __do_fault -> filemap_fault -> filemap_read_folio` -
oops at `a_ops->read_folio` (NULL).
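The interleaving from the commit's timing diagram can be replayed in a small userspace model. This is only a sketch under stated assumptions: `model_inode`, `model_fault_check`, `model_truncate`, `model_fault_finish`, and `model_replay` are hypothetical names, not kernel APIs, and the explicit NULL check stands in for the oops so the program can report the outcome instead of crashing.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL
#define MODEL_NR_PAGES  8

/* Hypothetical stand-in for the ring file's inode and mapping. */
struct model_inode {
	unsigned long i_size;		/* ring file size in bytes */
	int cached[MODEL_NR_PAGES];	/* 1 if a folio sits in the page cache */
	int (*read_folio)(void);	/* NULL, as for the AIO ring mapping */
};

enum fault_result { FAULT_OK, FAULT_SIGBUS, FAULT_NULL_DEREF };

/* t=1: the child's fault passes the i_size bound check. */
static int model_fault_check(const struct model_inode *ino, unsigned long idx)
{
	unsigned long max_idx =
		(ino->i_size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;

	return idx < max_idx;
}

/* t=2: the parent's truncate_setsize() + truncate_pagecache(). */
static void model_truncate(struct model_inode *ino)
{
	ino->i_size = 0;
	for (size_t i = 0; i < MODEL_NR_PAGES; i++)
		ino->cached[i] = 0;
}

/* t=3: the child looks up the folio and, if missing, tries to read it. */
static enum fault_result model_fault_finish(const struct model_inode *ino,
					    unsigned long idx)
{
	if (ino->cached[idx])
		return FAULT_OK;
	if (!ino->read_folio)
		return FAULT_NULL_DEREF;	/* the kernel would oops here */
	return ino->read_folio() == 0 ? FAULT_OK : FAULT_SIGBUS;
}

/* Replays t=1..t=3; truncate_in_window selects whether the parent wins. */
static enum fault_result model_replay(int truncate_in_window)
{
	struct model_inode ino = {
		.i_size = 2 * MODEL_PAGE_SIZE,
		.cached = { 1, 1 },
		.read_folio = NULL,
	};

	if (!model_fault_check(&ino, 0))
		return FAULT_SIGBUS;
	if (truncate_in_window)
		model_truncate(&ino);
	return model_fault_finish(&ino, 0);
}
```

In this model, `model_replay(0)` returns `FAULT_OK` and `model_replay(1)` returns `FAULT_NULL_DEREF`: the crash needs the truncate to land between the bound check and the folio lookup.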
**Step 1.4 - Hidden bug fixes:**
- Record: Not hidden - the subject explicitly says "to fix NULL-pointer-
dereference error". This is a clear bug fix.
### PHASE 2: DIFF ANALYSIS
**Step 2.1 - Inventory:**
- Record: One file modified (`fs/aio.c`), 1 line changed (+1/-1), single
function `aio_ring_mmap_prepare()`. Surgical, minimal scope.
**Step 2.2 - Code flow:**
- Record: Before: VMA created with `VMA_DONTEXPAND_BIT` only. After: VMA
created with both `VMA_DONTEXPAND_BIT` and `VMA_DONTCOPY_BIT`. Affects
fork()'s `dup_mmap()` behavior: child will not inherit this VMA.
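A simplified userspace model of that skip is sketched below. The names `model_vma`, `model_dup_mmap`, and `model_child_vmas` are hypothetical, the addresses are arbitrary, and the flag values are assumed from the traditional `include/linux/mm.h` definitions (bit 17 for `VM_DONTCOPY`, bit 18 for `VM_DONTEXPAND`); the real `dup_mmap()` does far more than this loop.

```c
#include <stddef.h>

/* Assumed mask values; VM_DONTCOPY corresponds to bit 17. */
#define VM_DONTCOPY   0x00020000UL
#define VM_DONTEXPAND 0x00040000UL

struct model_vma {
	unsigned long start, end, flags;
};

/*
 * Copy every parent VMA into the child except those marked VM_DONTCOPY.
 * Returns the number of VMAs the child ends up with.
 */
static size_t model_dup_mmap(const struct model_vma *parent, size_t n,
			     struct model_vma *child)
{
	size_t copied = 0;

	for (size_t i = 0; i < n; i++) {
		if (parent[i].flags & VM_DONTCOPY)
			continue;	/* the child never sees this mapping */
		child[copied++] = parent[i];
	}
	return copied;
}

/* Number of VMAs a child inherits when the ring VMA carries ring_flags. */
static size_t model_child_vmas(unsigned long ring_flags)
{
	const struct model_vma parent[] = {
		{ 0x400000UL, 0x401000UL, 0 },		/* ordinary mapping */
		{ 0x500000UL, 0x502000UL, ring_flags },	/* AIO ring */
	};
	struct model_vma child[2];

	return model_dup_mmap(parent, 2, child);
}
```

With only `VM_DONTEXPAND` on the ring the model's child inherits both VMAs; with `VM_DONTEXPAND | VM_DONTCOPY` it inherits one, so a later access to the ring address in the child has no VMA to fault through.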
**Step 2.3 - Bug mechanism:**
- Record: Category (h) Hardware-semantic fix / (d) Memory safety.
Mechanism: Preventing fork()-time VMA duplication of the AIO ring
buffer, eliminating the race window where child holds a VMA to a ring
file while parent tears it down.
**Step 2.4 - Fix quality:**
- Record: Obviously correct, minimal, surgical. Risk of regression
extremely low - the only behavioral change is that child processes
cannot access the parent's AIO ring (which was never semantically
valid - see `mm_init_aio()` which already zeros `ioctx_table` in
child).
### PHASE 3: GIT HISTORY INVESTIGATION
**Step 3.1 - Blame the buggy code:**
- Record: The AIO ring mmap hook is ancient (pre-2.6.12). The `.fault =
filemap_fault` vm_op was added in mid-2010s. The fundamental bug (fork
copies VMA but child has no AIO context) has existed essentially since
AIO ring was made mappable. Verified via `git log --follow fs/aio.c`
showing AIO predates the current git history (from Linux-2.6.12-rc2).
**Step 3.2 - Follow Fixes: tag:**
- Record: No Fixes: tag. The bug is essentially inherent to the AIO ring
design from the start.
**Step 3.3 - Related changes:**
- Record: Previously, commit `81e9d6f864765` ("aio: fix mremap after
fork null-deref", 2023, in v6.3) fixed an adjacent fork+AIO NULL-
deref. That commit was `Cc: stable` tagged and backported. A follow-up
commit `3adf7ae18bf42` ("fs: aio: reject partial mremap...") by the
same author fixes yet another NULL-deref in the same family (also
reviewed by Jan Kara). These demonstrate a pattern of fork+AIO race
bugs.
**Step 3.4 - Author:**
- Record: Zizhi Wo is a regular Huawei kernel contributor, working on
filesystem issues. Also authored the related `3adf7ae18bf42` mremap
fix.
**Step 3.5 - Dependencies:**
- Record: None. The fix is self-contained. The `VM_DONTCOPY` flag has
been part of `dup_mmap()` logic for many years (mm/mmap.c), checked
via `mpnt->vm_flags & VM_DONTCOPY`.
### PHASE 4: MAILING LIST RESEARCH
**Step 4.1 - Original discussion:**
- Record: `b4 dig -c c03ce4173c7bf` found the original submission at
https://lore.kernel.org/all/20260413010814.548568-1-wozizhi@huawei.com/ -
v1 only (no later revisions needed). Jan Kara's review comment
(retrieved via b4 dig -m): "*I agree it would have to be a rather
contrived setup to rely on AIO ringbuffer being inherited by
fork(2)... AIO ringbuffer is mostly a legacy thing these days... So
I'm OK with trying this simple fix and seeing whether somebody
complains.*" - No NAKs, no stable nomination but no objection to the
approach.
**Step 4.2 - Reviewers:**
- Record: CC'd: viro (VFS), jack (Jan Kara - MM/FS), brauner (VFS
maintainer), bcrl (AIO original maintainer), linux-fsdevel, linux-aio,
yangerkun, chengzhihao1. Plus Jan Kara added Jens Axboe for awareness.
Appropriate review coverage.
**Step 4.3 - Bug report:**
- Record: Found by Huawei internal syzkaller (fuzzer). Reproducible
kernel NULL pointer dereference - not theoretical.
**Step 4.4 - Related patches:**
- Record: Follow-up `3adf7ae18bf42` ("fs: aio: reject partial
mremap...") addresses a related but different NULL-deref in the same
subsystem. Independent fix.
**Step 4.5 - Stable list history:**
- Record: No explicit stable mailing list discussion found. However, the
precedent (81e9d6f864765) of fork-related AIO fix being backported
supports that this is stable material.
### PHASE 5: CODE SEMANTIC ANALYSIS
**Step 5.1 - Key functions:**
- Record: `aio_ring_mmap_prepare()` is the only function modified.
**Step 5.2 - Callers:**
- Record: Called by VFS mmap logic via `f_op->mmap_prepare` during
`mmap()` on the AIO ring file. Reachable from `io_setup(2)` syscall
via `aio_setup_ring() -> do_mmap(aio_ring_file, ...)`. Reachable by
any unprivileged process that can do io_setup().
**Step 5.3 - Callees:**
- Record: `vma_desc_set_flags()` - setting VMA flags during mmap
preparation. No side effects other than flag setting.
**Step 5.4 - Call chain:**
- Record: Bug path reachable from userspace:
1. User calls `io_setup(2)` -> mmap of AIO ring VMA
2. User calls `fork(2)` -> child inherits VMA (before this fix)
3. User (child) touches the VMA address -> triggers fault
4. User (parent) calls `io_destroy(2)` concurrently -> race triggers
NULL deref
All reachable by unprivileged userspace.
**Step 5.5 - Similar patterns:**
- Record: Verified via Grep that `VM_DONTCOPY` is used in several kernel
subsystems (android/binder.c, KFD, xen, infiniband, etc.) for VMAs
that shouldn't be inherited by fork. The AIO ring is semantically the
same class - it's associated with parent-specific kernel state.
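Userspace can observe the same semantics on an ordinary mapping, because `madvise(MADV_DONTFORK)` sets `VM_DONTCOPY` on the affected VMA. The sketch below is an illustration on anonymous memory, not on an AIO ring; `child_signal_on_access` is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Maps one anonymous page, optionally marks it MADV_DONTFORK, forks, and lets
 * the child read the page. Returns the signal that killed the child, 0 if the
 * child exited cleanly, or -1 for any other outcome.
 */
static int child_signal_on_access(int dontfork)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	void *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	volatile char *p = map;
	int status = 0;
	pid_t pid;

	assert(map != MAP_FAILED);
	p[0] = 42;

	if (dontfork) {
		int rc = madvise(map, len, MADV_DONTFORK);

		assert(rc == 0);
		(void)rc;
	}

	pid = fork();
	assert(pid >= 0);
	if (pid == 0)
		_exit(p[0] == 42 ? 0 : 1);	/* faults if the VMA was not inherited */

	pid = waitpid(pid, &status, 0);
	assert(pid > 0);
	munmap(map, len);

	if (WIFSIGNALED(status))
		return WTERMSIG(status);
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

On Linux, `child_signal_on_access(0)` returns 0 and `child_signal_on_access(1)` returns `SIGSEGV`, which is the behavior the commit message prefers over a kernel crash for a child that touches the ring address.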
### PHASE 6: CROSS-REFERENCING AND STABLE TREE ANALYSIS
**Step 6.1 - Buggy code in stable trees:**
- Record: Verified by examining `fs/aio.c` in each stable tree:
- `stable/linux-5.10.y`: Uses `vma->vm_flags |= VM_DONTEXPAND;` (no
VM_DONTCOPY)
- `stable/linux-5.15.y`: Uses `vma->vm_flags |= VM_DONTEXPAND;`
- `stable/linux-6.1.y`: Uses `vma->vm_flags |= VM_DONTEXPAND;`
- `stable/linux-6.6.y`: Uses `vm_flags_set(vma, VM_DONTEXPAND);`
- `stable/linux-6.12.y`: Uses `vm_flags_set(vma, VM_DONTEXPAND);`
- `stable/linux-6.17.y`, `6.18.y`, `6.19.y`: Uses `desc->vm_flags |=
VM_DONTEXPAND;`
All stable trees are missing VM_DONTCOPY and vulnerable to the bug.
**Step 6.2 - Backport complications:**
- Record: The upstream patch uses `vma_desc_set_flags(desc,
VMA_DONTEXPAND_BIT, VMA_DONTCOPY_BIT)` which was introduced in 7.0
(master). For each stable tree, the fix needs adaptation:
- 5.10-6.1: `vma->vm_flags |= VM_DONTEXPAND | VM_DONTCOPY;`
- 6.6-6.12: `vm_flags_set(vma, VM_DONTEXPAND | VM_DONTCOPY);`
- 6.17-6.19: `desc->vm_flags |= VM_DONTEXPAND | VM_DONTCOPY;`
Minor textual adjustment needed but semantically identical.
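The equivalence between the bit-number form and the mask forms can be checked with a small model. This is a sketch, not the kernel's `mk_vma_flags()`: `model_mk_flags` is a hypothetical helper that takes an explicit count, `VMA_DONTCOPY_BIT` = 17 is the value cited in the verification notes, and `VMA_DONTEXPAND_BIT` = 18 is an assumption based on the traditional `VM_DONTEXPAND` mask.

```c
#include <stdarg.h>

#define VMA_DONTCOPY_BIT   17
#define VMA_DONTEXPAND_BIT 18	/* assumed */

#define VM_DONTCOPY   0x00020000UL
#define VM_DONTEXPAND 0x00040000UL

/* OR together `count` bit numbers into a single flags mask. */
static unsigned long model_mk_flags(int count, ...)
{
	unsigned long mask = 0;
	va_list ap;

	va_start(ap, count);
	for (int i = 0; i < count; i++)
		mask |= 1UL << va_arg(ap, int);
	va_end(ap);
	return mask;
}
```

Under those assumptions, `model_mk_flags(2, VMA_DONTEXPAND_BIT, VMA_DONTCOPY_BIT)` equals `VM_DONTEXPAND | VM_DONTCOPY`, which is the mask the older trees would OR into `vm_flags` directly.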
**Step 6.3 - Related fixes in stable:**
- Record: Commit `81e9d6f864765` ("aio: fix mremap after fork null-
deref") was backported to stable (verified present in
stable/linux-5.10.y as `c261f798f7baa` and in stable/linux-6.6.y as
`81e9d6f864765`). That confirms the AIO+fork class of bugs has been
considered stable-worthy before.
### PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT
**Step 7.1 - Subsystem:**
- Record: `fs/aio.c` - AIO filesystem interface. IMPORTANT criticality -
not in the hot path for most users (io_uring is newer), but AIO is
widely used by legacy applications, databases (Oracle, MySQL), and
libaio consumers. Still heavily supported.
**Step 7.2 - Activity:**
- Record: AIO is mature/stable subsystem. Low activity (mostly
maintenance) - the bug has likely been present for years without being
hit due to the unusual trigger (fork after io_setup is uncommon).
### PHASE 8: IMPACT AND RISK ASSESSMENT
**Step 8.1 - Affected users:**
- Record: Any system using AIO where a process that called io_setup()
then forks (e.g., databases, async I/O applications with forking). The
fork+AIO combination is unusual but legitimate.
**Step 8.2 - Trigger conditions:**
- Record: Race between parent's io_destroy() and child's page fault on
inherited ring. Triggerable by unprivileged userspace. Timing-
dependent but demonstrated via syzkaller (reproducible). No special
privileges needed.
**Step 8.3 - Failure mode severity:**
- Record: Kernel NULL pointer dereference - CRITICAL. System crash
(oops), potential panic depending on config. Triggerable by
unprivileged userspace - this borders on security concern (DoS).
**Step 8.4 - Risk-benefit:**
- Record:
- BENEFIT: Prevents kernel NULL deref crash triggerable by
unprivileged userspace. HIGH benefit.
- RISK: 1-line flag change. VM_DONTCOPY means child doesn't inherit
parent's AIO ring - semantically correct since `mm_init_aio()`
already clears the child's `ioctx_table`. Verified in kernel/fork.c:
`mm_init_aio() { mm->ioctx_table = NULL; }`. Very low risk.
### PHASE 9: FINAL SYNTHESIS
**Step 9.1 - Evidence:**
- FOR: Fixes real kernel NULL deref; found by syzkaller (reproducible);
reviewed by Jan Kara; 1-line surgical change; semantically correct
(child has no AIO context); precedent of similar fork+AIO fix being
backported (`81e9d6f864765`); triggerable by unprivileged userspace
(crash = DoS).
- AGAINST: Requires backport adjustment for older stable trees (API
evolved from `vma->vm_flags |=` to `vm_flags_set()` to
`desc->vm_flags` to `vma_desc_set_flags()`). However this is a trivial
textual adjustment.
**Step 9.2 - Stable rules checklist:**
1. Obviously correct and tested? YES - reviewed by Jan Kara, logic
verified
2. Fixes a real bug? YES - syzkaller-reproducible NULL deref
3. Important issue? YES - CRITICAL: NULL pointer deref = kernel crash
4. Small and contained? YES - 1 line
5. No new features? YES - just adds a flag
6. Can apply to stable? YES with minor textual adjustment per kernel
version
**Step 9.3 - Exception categories:**
- Not applicable - this is a direct bug fix for a kernel NULL deref.
**Step 9.4 - Decision:** The fix clearly meets all stable criteria:
obviously correct, fixes a real (syzkaller-reproducible) kernel crash,
very small (1 line), no new APIs. The bug is triggerable by unprivileged
userspace via fork+io_setup/io_destroy race, making it a serious issue
worthy of backport. The only complication is the 4 slightly different
forms needed for different stable trees, but each is a trivial
adaptation.
## Verification
- [Phase 1] Parsed tags: Reviewed-by Jan Kara, Signed-off-by Christian
Brauner (VFS maintainer), Link to lore patch submission
- [Phase 2] Diff analysis: verified via `git show c03ce4173c7bf` -
single 1-line change in `aio_ring_mmap_prepare()` adding
VMA_DONTCOPY_BIT
- [Phase 3] Checked file history `git log --oneline -- fs/aio.c` - aio.c
is pre-2.6.12 (ancient), the ring mmap code has existed for over a
decade
- [Phase 3] Found related earlier fix `81e9d6f864765` "aio: fix mremap
after fork null-deref" with explicit `Cc: <stable@vger.kernel.org>` -
established precedent
- [Phase 4] `b4 dig -c c03ce4173c7bf -a`: single version (v1) - applied
as-is, no revisions needed
- [Phase 4] `b4 dig -c c03ce4173c7bf -w`: verified maintainers CC'd
(viro, jack, brauner, bcrl, linux-fsdevel, linux-aio)
- [Phase 4] `b4 dig -m /tmp/aio_patch.mbox`: Jan Kara's review approved
the approach, called AIO ring "mostly a legacy thing", no NAKs
- [Phase 5] Verified `VMA_DONTCOPY_BIT` = 17 via
`DECLARE_VMA_BIT(DONTCOPY, 17)` in include/linux/mm.h;
`vma_desc_set_flags` expands to `vma_desc_set_flags_mask(desc,
mk_vma_flags(__VA_ARGS__))`
- [Phase 5] Verified `mm_init_aio` in kernel/fork.c: `mm->ioctx_table =
NULL;` - child has no AIO context, confirming semantic correctness
- [Phase 5] Verified `VM_DONTCOPY` handling in mm/mmap.c dup_mmap: `if
(mpnt->vm_flags & VM_DONTCOPY) { ... continue; }` - VMA is skipped
during fork
- [Phase 6] Read code from each stable tree's `fs/aio.c`:
- 5.10.y line 369-373: `vma->vm_flags |= VM_DONTEXPAND`
- 5.15.y line 368-373: `vma->vm_flags |= VM_DONTEXPAND`
- 6.1.y line 395-400: `vma->vm_flags |= VM_DONTEXPAND`
- 6.6.y line 395-400: `vm_flags_set(vma, VM_DONTEXPAND)`
- 6.12.y line 395-400: `vm_flags_set(vma, VM_DONTEXPAND)`
- 6.17.y/6.18.y/6.19.y line 395-400: `desc->vm_flags |= VM_DONTEXPAND`
None have VM_DONTCOPY - all are vulnerable.
- [Phase 6] Verified related `81e9d6f864765` is in stable via `git log
stable/linux-5.10.y` (as `c261f798f7baa`) and stable/linux-6.6.y
- [Phase 7] Subsystem identified as fs/aio (filesystem, async I/O) -
IMPORTANT criticality level
- [Phase 8] Failure mode: NULL deref at `filemap_read_folio` when
accessing `a_ops->read_folio` (not implemented by AIO) - confirmed
from stack trace in commit message
- UNVERIFIED: Whether the original syzkaller reproducer is public
(Huawei internal testing, report not public)
- UNVERIFIED: Exact date when the bug first became exploitable (depends
on when filemap_fault path was used for this VMA, which has been
present since aio ring was mappable - approximately since 2013)
## Conclusion
This is a small, surgical bug fix for a kernel NULL pointer dereference
that can be triggered by unprivileged userspace via a fork+AIO race. The
fix is semantically correct (child has no AIO context, so the VMA
shouldn't be inherited), was reviewed by Jan Kara, and has an
established precedent of related fork+AIO fixes being backported to
stable. The only caveat is that each stable tree needs a minor textual
adaptation due to API evolution (from `vma->vm_flags |=` to
`vm_flags_set()` to `desc->vm_flags |=` to `vma_desc_set_flags()`), but
the one-line semantic change is the same in every case.
**YES**
fs/aio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/aio.c b/fs/aio.c
index a07bdd1aaaa60..6d436f8b3f349 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -394,7 +394,7 @@ static const struct vm_operations_struct aio_ring_vm_ops = {
 static int aio_ring_mmap_prepare(struct vm_area_desc *desc)
 {
-	vma_desc_set_flags(desc, VMA_DONTEXPAND_BIT);
+	vma_desc_set_flags(desc, VMA_DONTEXPAND_BIT, VMA_DONTCOPY_BIT);
 	desc->vm_ops = &aio_ring_vm_ops;
 	return 0;
 }
--
2.53.0
* [PATCH AUTOSEL 7.0-5.10] fs: aio: reject partial mremap to avoid Null-pointer-dereference error
From: Sasha Levin @ 2026-04-28 10:40 UTC (permalink / raw)
To: patches, stable
Cc: Zizhi Wo, Zizhi Wo, Jan Kara, Christian Brauner, Sasha Levin,
viro, bcrl, linux-fsdevel, linux-aio, linux-kernel
From: Zizhi Wo <wozizhi@huawei.com>
[ Upstream commit 3adf7ae18bf42601246031002287c103a27df307 ]
[BUG]
Recently, our internal syzkaller testing uncovered a null pointer
dereference issue:
BUG: kernel NULL pointer dereference, address: 0000000000000000
...
[ 51.111664] filemap_read_folio+0x25/0xe0
[ 51.112410] filemap_fault+0xad7/0x1250
[ 51.113112] __do_fault+0x4b/0x460
[ 51.113699] do_pte_missing+0x5bc/0x1db0
[ 51.114250] ? __pte_offset_map+0x23/0x170
[ 51.114822] __handle_mm_fault+0x9f8/0x1680
...
Crash analysis showed the file involved was an AIO ring file. The
phenomenon triggered is the same as the issue described in [1].
[CAUSE]
Consider the following scenario: userspace sets up an AIO context via
io_setup(), which creates a VMA covering the entire ring buffer. Then
userspace calls mremap() with the AIO ring address as the source, a smaller
old_len (less than the full ring size), MREMAP_MAYMOVE set, and without
MREMAP_DONTUNMAP. The kernel will relocate the requested portion to a new
destination address.
During this move, __split_vma() splits the original AIO ring VMA. The
requested portion is unmapped from the source and re-established at the
destination, while the remainder stays at the original source address as
an orphan VMA. The aio_ring_mremap() callback fires on the new destination
VMA, updating ctx->mmap_base to the destination address. But the callback
is unaware that only a partial region was moved and that an orphan VMA
still exists at the source:
source(AIO):
+-------------------+---------------------+
|   moved to dest   |  orphan VMA (AIO)   |
+-------------------+---------------------+
A                   A+partial_len         A+ctx->mmap_size
dest:
+-------------------+
|  moved VMA (AIO)  |
+-------------------+
B                   B+partial_len
Later, io_destroy() calls vm_munmap(ctx->mmap_base, ctx->mmap_size), which
unmaps the destination. This not only fails to unmap the orphan VMA at the
source, but also overshoots the destination VMA and may unmap unrelated
mappings adjacent to it! After put_aio_ring_file() calls truncate_setsize()
to remove all pages from the pagecache, any subsequent access to the orphan
VMA triggers filemap_fault(), which calls a_ops->read_folio(). Since aio
does not implement read_folio, this results in a NULL pointer dereference.
[FIX]
Note that expanding mremap (new_len > old_len) is already rejected because
AIO ring VMAs are created with VM_DONTEXPAND. The only problematic case is
a partial move where "old_len == new_len" but both are smaller than the
full ring size.
Fix this by checking in aio_ring_mremap() that the new VMA covers the
entire ring. This ensures the AIO ring is always moved as a whole,
preventing orphan VMAs and the subsequent crash.
[1]: https://lore.kernel.org/all/20260413010814.548568-1-wozizhi@huawei.com/
Signed-off-by: Zizhi Wo <wozizhi@huaweicloud.com>
Link: https://patch.msgid.link/20260418060634.3713620-1-wozizhi@huaweicloud.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
## PHASE 1: COMMIT MESSAGE FORENSICS
**Step 1.1 - Subject line parse**
- Record: Subsystem `fs: aio:`, verb "reject" (defensive fix), summary:
reject partial mremap on the AIO ring VMA to avoid a NULL-pointer
dereference.
**Step 1.2 - Tags**
- Record:
- `Signed-off-by: Zizhi Wo <wozizhi@huaweicloud.com>` (author)
- `Link: https://patch.msgid.link/20260418060634.3713620-1-wozizhi@huaweicloud.com`
- `Reviewed-by: Jan Kara <jack@suse.cz>` (well-known VFS/FS reviewer)
- `Signed-off-by: Christian Brauner <brauner@kernel.org>` (VFS
maintainer, applied to vfs.fixes)
- No `Fixes:` tag, no `Cc: stable`, no syzbot `Reported-by`. Commit
message mentions "our internal syzkaller testing" so it is a fuzzer-
found, reproducible bug even though it is not on the public syzbot
instance.
- Mentions related issue
`[1]: https://lore.kernel.org/all/20260413010814.548568-1-wozizhi@huawei.com/`,
the earlier NULL-deref fix in this series (commit `c03ce4173c7bf` using
`VMA_DONTCOPY_BIT` for the fork-after-io_setup() variant).
**Step 1.3 - Body analysis**
- Record: Bug is a NULL pointer dereference caused by `filemap_fault()`
calling `a_ops->read_folio` (NULL for AIO ring mapping). The root
cause is that `mremap()` can partially move an AIO ring VMA (when
`old_len == new_len` but smaller than the full ring), splitting it
into a moved destination VMA + an orphan source VMA.
`aio_ring_mremap()` blindly updates `ctx->mmap_base` to the
destination, leaving the orphan untracked. Later `io_destroy()` calls
`vm_munmap(ctx->mmap_base, ctx->mmap_size)` which (a) fails to unmap
the orphan, and (b) overshoots the destination VMA, possibly unmapping
adjacent user mappings. The orphan survives
`put_aio_ring_file()`/`truncate_setsize()`, then any access faults
into `filemap_fault` → `read_folio` (NULL) → kernel oops. Failure mode
is a kernel NULL-deref oops, plus potential silent unmap of unrelated
user mappings.
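The address arithmetic behind outcomes (a) and (b) can be worked through with concrete numbers. The sketch below only models the ranges; `range`, `partial_move`, and `model_partial_move` are hypothetical names, and the addresses in the usage note are arbitrary.

```c
/* Half-open address range [start, end). */
struct range {
	unsigned long start, end;
};

struct partial_move {
	struct range orphan;		/* left mapped at the source */
	struct range unmapped;		/* covered by vm_munmap(mmap_base, mmap_size) */
	unsigned long overshoot;	/* bytes unmapped past the moved VMA */
};

/*
 * src: original ring address (A), dst: mremap destination (B),
 * mmap_size: ctx->mmap_size, partial_len: old_len == new_len passed to mremap.
 */
static struct partial_move model_partial_move(unsigned long src,
					      unsigned long dst,
					      unsigned long mmap_size,
					      unsigned long partial_len)
{
	struct partial_move r;

	/* The remainder of the ring stays behind at the source address. */
	r.orphan.start = src + partial_len;
	r.orphan.end = src + mmap_size;

	/* ctx->mmap_base now equals dst, but ctx->mmap_size is unchanged. */
	r.unmapped.start = dst;
	r.unmapped.end = dst + mmap_size;

	/* The moved VMA only spans [dst, dst + partial_len). */
	r.overshoot = mmap_size - partial_len;
	return r;
}
```

For a three-page ring at `0x10000` with one page moved to `0x80000`, the orphan is `[0x11000, 0x13000)`, while `io_destroy()` would unmap `[0x80000, 0x83000)`, two pages beyond the moved VMA.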
**Step 1.4 - Hidden fix detection**
- Record: Not disguised — the commit is explicitly framed as a fix for a
NULL pointer dereference crash. The "reject" verb and BUG/CAUSE/FIX
structure make it a clear bug fix.
## PHASE 2: DIFF ANALYSIS
**Step 2.1 - Inventory**
- Record: Single file `fs/aio.c`, +2/-1, a single hunk inside
`aio_ring_mremap()`. Scope classification: minimal single-file
surgical fix.
**Step 2.2 - Code flow change**
- Record: Before:
```c
/* fs/aio.c, before the patch */
static int aio_ring_mremap(struct vm_area_struct *vma)
{
	...
	for (i = 0; i < table->nr; i++) {
		struct kioctx *ctx;

		ctx = rcu_dereference(table->table[i]);
		if (ctx && ctx->aio_ring_file == file) {
			if (!atomic_read(&ctx->dead)) {
				ctx->user_id = ctx->mmap_base = vma->vm_start;
				res = 0;
			}
			break;
		}
	}
	...
}
```
After, the inner `if` now also requires `ctx->mmap_size ==
(vma->vm_end - vma->vm_start)`. When that condition fails, `res` stays
`-EINVAL` which is returned to the mremap path. `move_vma()`
(mm/mremap.c) then reverts the page-table move and returns an error to
userspace.
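The added condition can be modeled in userspace as follows. The upstream hunk itself is not quoted in this analysis, so this is a sketch of the check as described above rather than the literal patch; `model_kioctx`, `ring_vma`, and `model_aio_ring_mremap` are hypothetical stand-ins for the kernel structures and callback.

```c
#include <errno.h>

struct model_kioctx {
	unsigned long mmap_base;
	unsigned long mmap_size;
	int dead;
};

struct ring_vma {
	unsigned long vm_start, vm_end;
};

/*
 * Accept the remap only if the context is alive and the new VMA spans the
 * whole ring; otherwise leave the context untouched and return -EINVAL.
 */
static int model_aio_ring_mremap(struct model_kioctx *ctx,
				 const struct ring_vma *vma)
{
	int res = -EINVAL;

	if (!ctx->dead &&
	    ctx->mmap_size == vma->vm_end - vma->vm_start) {
		ctx->mmap_base = vma->vm_start;
		res = 0;
	}
	return res;
}
```

A destination VMA of one page for a three-page ring is rejected with `-EINVAL` and `mmap_base` keeps its old value; a destination spanning all three pages is accepted and `mmap_base` follows it.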
**Step 2.3 - Bug mechanism**
- Record: Category (g) correctness / missing validation in an mmap
callback. Mechanism: `aio_ring_mremap()` accepted a post-split
destination VMA smaller than `ctx->mmap_size` and silently updated
`ctx->mmap_base`, desynchronizing the AIO bookkeeping from VMA
reality. The fix adds a size check so the AIO ring can only be
remapped as a whole.
**Step 2.4 - Fix quality**
- Record: The fix is obviously correct. It preserves the existing error-
path semantics (`-EINVAL`), and `move_vma()` already has the revert
path that relies on ->mremap returning an error (verified in
`mm/mremap.c:1215-1232`). Because `move_vma()` undoes the page-table
move on error and completes the unmap of the new VMA, the user sees a
normal mremap failure. No deadlock or new locking is introduced. Zero
regression risk for any user who is not currently intentionally
partially-remapping an AIO ring (and any such caller was already
setting themselves up for a crash).
## PHASE 3: GIT HISTORY INVESTIGATION
**Step 3.1 - Blame**
- Record: `git blame` on the changed lines shows the `if
(!atomic_read(&ctx->dead))` block was added by `b2edffdd912b4` (Al
Viro, Apr 2015, "fix mremap() vs. ioctx_kill() race"), and
`aio_ring_mremap()` itself was introduced by `e4a0d3e720e7e` (Pavel
Emelyanov, Sep 2014, "aio: Make it possible to remap aio ring", first
released in v3.19). The buggy omission (no ring-size check) has
existed since the callback was introduced — more than 10 years.
Present in every currently-supported stable tree.
**Step 3.2 - Fixes: tag**
- Record: No `Fixes:` tag is present. Logically the original bug source
is `e4a0d3e720e7e` (the callback introduction). That commit is in all
stable trees (v3.19+).
**Step 3.3 - File history**
- Record: The parent commits `c03ce4173c7bf` ("fs: aio: set
VMA_DONTCOPY_BIT…") and `3833d335d7be8` ("aio: Stop using
i_private_data…") are newer aio changes. The fork-variant fix
`c03ce4173c7bf` (April 13) and this mremap-variant fix (April 18) form
a closely related 2-piece series addressing AIO-ring NULL deref
scenarios. This patch is standalone and does NOT depend on
`c03ce4173c7bf` — each fix targets a distinct scenario (fork vs.
mremap). The prior analogous precedent is `81e9d6f864765` ("aio: fix
mremap after fork null-deref", Jan 2023), which was explicitly `Cc:
stable` and backported. It was itself a NULL-deref fix in the same
`aio_ring_mremap()` function.
**Step 3.4 - Author**
- Record: Zizhi Wo (Huawei) is a frequent, experienced fs-subsystem
contributor (cachefiles NULL-deref fixes, ext4, xfs, netfs/fscache).
The reviewer, Jan Kara, is a well-known VFS/filesystem reviewer, and
Christian Brauner (VFS maintainer) applied the patch to `vfs.fixes`. The
chain of trust is strong.
**Step 3.5 - Dependencies**
- Record: Standalone fix. The only fields it depends on
(`ctx->mmap_size`, `vma->vm_start`, `vma->vm_end`, `ctx->dead`) exist
unchanged in every stable branch checked
(5.10/5.15/6.1/6.6/6.12/6.17/6.18/6.19). No prerequisite commit
needed.
## PHASE 4: MAILING LIST AND EXTERNAL RESEARCH
**Step 4.1 - Original submission**
- Record: `b4 dig -c 3adf7ae18bf42` →
https://patch.msgid.link/20260418060634.3713620-1-wozizhi@huaweicloud.com ;
`b4 dig -a` shows only v1, applied as-is with no rework or NAK.
**Step 4.2 - Reviewers**
- Record: `b4 dig -w` shows the patch was addressed to Al Viro, Jan
Kara, Christian Brauner, Benjamin LaHaise (aio maintainer), Jens
Axboe, linux-fsdevel, linux-aio, linux-kernel — all appropriate
maintainers and lists. Jan Kara replied with `Reviewed-by`. Christian
Brauner applied it to `vfs.fixes`.
**Step 4.3 - Bug report**
- Record: Internal Huawei syzkaller testing uncovered the issue. Stack
trace provided (`filemap_read_folio → filemap_fault → __do_fault →
do_pte_missing → __handle_mm_fault`). Same symptom family as the
earlier `[1]` thread. No external public bugzilla or syzbot URL.
**Step 4.4 - Series context**
- Record: There is a logical 2-piece "AIO ring NULL-deref" pair: (i)
fork-related `c03ce4173c7bf` VMA_DONTCOPY fix, (ii) this mremap-
related fix. They are independent; either may be applied without the
other. Both were reviewed by Jan Kara and applied by Christian
Brauner.
**Step 4.5 - Stable mailing list**
- Record: Could not fetch lore.kernel.org directly (Anubis anti-bot
challenge). No `Cc: stable` was placed on the original posting;
reviewer did not explicitly request stable. However, the substantially
similar earlier fix `81e9d6f864765` had `Cc: stable@vger.kernel.org`
and was backported.
## PHASE 5: CODE SEMANTIC ANALYSIS
**Step 5.1 - Functions**
- Record: The only function touched is `aio_ring_mremap()` (a
`vm_operations_struct.mremap` callback).
**Step 5.2 - Callers**
- Record: Called from `move_vma()` in `mm/mremap.c` (line 1216: `err =
vma->vm_ops->mremap(new_vma);`). That is invoked from the `mremap(2)`
syscall path. Directly reachable from an unprivileged user's
`mremap()` syscall on any AIO ring they have mapped — i.e., high
reachability.
**Step 5.3 - Callees**
- Record: The function only reads `ctx->dead`, `ctx->aio_ring_file`, and
now `ctx->mmap_size`, plus writes `ctx->user_id` and `ctx->mmap_base`.
No new allocations, no locks, no RCU changes introduced. The new check
is pure arithmetic.
**Step 5.4 - Call chain reachability**
- Record: The bug is reachable from userspace via an ordinary
`io_setup()` + `mremap(addr, old_len, new_len=old_len, MREMAP_MAYMOVE,
new_addr)` with `old_len < ctx->mmap_size`. No privileges required.
This is clearly user-triggerable DoS / potential corruption of
adjacent mappings.
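The VMA-splitting behavior of a partial move can be observed harmlessly on an anonymous mapping. The sketch below deliberately does not touch an AIO ring. It assumes `MREMAP_FIXED` is combined with `MREMAP_MAYMOVE` so that a same-size remap actually relocates the pages, and `partial_move_leaves_remainder` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Maps 4 anonymous pages, moves only the first 2 with mremap(), and reports
 * whether the trailing 2 pages are still mapped at the original address.
 * Returns 1 if the remainder stayed behind with its contents, 0 otherwise.
 */
static int partial_move_leaves_remainder(void)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	char *src, *dst, *moved;
	int ok;

	src = mmap(NULL, 4 * page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	assert(src != MAP_FAILED);
	memset(src, 'a', 2 * page);
	memset(src + 2 * page, 'b', 2 * page);

	/* Reserve a destination; MREMAP_FIXED replaces whatever is mapped there. */
	dst = mmap(NULL, 2 * page, PROT_NONE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	assert(dst != MAP_FAILED);

	/* old_len == new_len, both smaller than the mapping: a partial move. */
	moved = mremap(src, 2 * page, 2 * page,
		       MREMAP_MAYMOVE | MREMAP_FIXED, dst);
	assert(moved != MAP_FAILED);

	ok = moved == dst && moved[0] == 'a' && src[2 * page] == 'b';

	munmap(moved, 2 * page);
	munmap(src + 2 * page, 2 * page);
	return ok;
}
```

On Linux this returns 1: the first two pages appear at the destination and the last two remain at the source, which is the same split that leaves an orphan AIO ring VMA behind.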
**Step 5.5 - Similar patterns**
- Record: The earlier `81e9d6f864765` fix and `c03ce4173c7bf` DONTCOPY
fix address sibling NULL-deref scenarios in the same AIO-ring file-
backed mapping. The pattern of the AIO ring being fragile when VMA
bookkeeping diverges from kioctx bookkeeping is well-established; each
leak has been plugged over the years.
## PHASE 6: CROSS-REFERENCING AND STABLE TREE ANALYSIS
**Step 6.1 - Code in stable?**
- Record: Verified across
`stable-push/linux-{5.10,5.15,6.1,6.6,6.12,6.17,6.18,6.19}.y`. In every
branch, `aio_ring_mremap()` contains the identical pre-patch block:
```text
if (ctx && ctx->aio_ring_file == file) {
	if (!atomic_read(&ctx->dead)) {
		ctx->user_id = ctx->mmap_base = vma->vm_start;
```
The `ctx->mmap_size` field also exists unchanged in all these
branches.
**Step 6.2 - Backport complications**
- Record: Patch should apply cleanly or with trivial offset-only fuzzing
on every active stable tree (5.10.y, 5.15.y, 6.1.y, 6.6.y, 6.12.y,
6.17.y, 6.18.y, 6.19.y). The two-line addition uses only pre-existing
struct fields and a pre-existing `vma` argument. No adjustment needed.
**Step 6.3 - Related fixes already in stable?**
- Record: Prior `81e9d6f864765` (mremap after fork null-deref) is
already in stable; this is a complementary fix for a different mremap
scenario.
## PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT
**Step 7.1 - Subsystem criticality**
- Record: `fs/aio.c` is the kernel AIO implementation — used by libaio,
databases (MySQL/MariaDB via libaio), storage benchmarks,
and many userspace libraries. Criticality: IMPORTANT (widely used core
fs/IO code, affects many servers and containers).
**Step 7.2 - Subsystem activity**
- Record: Active — several recent commits (credential guards,
`i_private_data` removal, alloc conversions). The aio_ring_mremap area
itself sees occasional fix traffic (roughly one fix every few years)
whenever a new VMA-manipulation edge case is discovered.
## PHASE 8: IMPACT AND RISK ASSESSMENT
**Step 8.1 - Affected users**
- Record: Any user running a kernel where a local unprivileged user can
perform `io_setup()` + `mremap()`. That is essentially every Linux
system. AIO is enabled by default in every distro kernel.
**Step 8.2 - Trigger conditions**
- Record: Unprivileged user calls `io_setup()`; then calls `mremap(addr,
old_len, new_len, MREMAP_MAYMOVE, new_addr)` where `old_len ==
new_len` and `old_len < ctx->mmap_size`. No hardware or race needed —
deterministic. Internal syzkaller reproduced it.
**Step 8.3 - Failure mode severity**
- Record: CRITICAL. Two distinct bad outcomes:
1. Kernel NULL-pointer dereference oops (system crash / availability
loss).
2. `vm_munmap(ctx->mmap_base, ctx->mmap_size)` overshoot can unmap
*unrelated user mappings* adjacent to the destination VMA — i.e.,
memory corruption of an unprivileged user's other mappings,
reachable without privileges. This is a local DoS / potentially
security-relevant issue.
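The overshoot in outcome 2 is plain address arithmetic and can be sketched as follows. This is an illustrative userspace model under the assumptions stated in this analysis (pre-patch, `mmap_base` is overwritten with the start of the partially moved VMA while `mmap_size` keeps the full ring size); `model_range`, `model_teardown_range()`, and `model_overshoot()` are hypothetical names, not kernel functions.

```c
/* Half-open address range [start, end). */
struct model_range {
	unsigned long start;
	unsigned long end;
};

/* Range a later vm_munmap(mmap_base, mmap_size) would cover once
 * mmap_base has been set to the start of the moved VMA. */
static struct model_range model_teardown_range(unsigned long new_vma_start,
					       unsigned long mmap_size)
{
	struct model_range r = { new_vma_start, new_vma_start + mmap_size };
	return r;
}

/* Number of bytes of the teardown range that lie past the end of the
 * moved VMA, i.e. bytes that may belong to an unrelated mapping. */
static unsigned long model_overshoot(struct model_range teardown,
				     struct model_range moved_vma)
{
	if (teardown.end > moved_vma.end)
		return teardown.end - moved_vma.end;
	return 0;
}
```

For example, with a two-page ring (`mmap_size` of 0x2000) and a one-page partial move to 0x50000, the modelled teardown range ends at 0x52000, one page past the moved VMA.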
**Step 8.4 - Risk-benefit**
- Record:
- Benefit: prevents kernel NULL-deref oops and prevents unrelated mmap
regions from being silently torn down, both triggerable by
unprivileged userspace. Very high benefit.
- Risk: two lines, pure size check, pre-existing `-EINVAL` error path
already exercised in normal failure cases, no new locks, no ABI
change. `move_vma()` already handles ->mremap returning an error by
reverting. Very low risk.
- Ratio: strongly favorable.
## PHASE 9: FINAL SYNTHESIS
**Step 9.1 - Evidence**
- For backport:
- Fixes a deterministic, unprivileged-triggerable kernel NULL-pointer
dereference (KASAN/BUG).
- Also fixes a potential silent tear-down of unrelated user mappings
by `vm_munmap()` overshoot.
- Bug exists since v3.19 (2014) — present in every active stable tree.
- Fix is 2 lines in a single function, obviously correct, with an
existing error-revert path already in place.
- Found via (internal) syzkaller — concrete, reproducible.
- Reviewed by a top VFS maintainer (Jan Kara) and applied by VFS
maintainer (Christian Brauner).
- Author has a solid track record of filesystem bug fixes.
- Prior analogous NULL-deref fix in the same function
(`81e9d6f864765`) was `Cc: stable` and backported.
- Applies cleanly to every active stable branch (verified by reading
each branch's `aio_ring_mremap`).
- Against backport:
- No `Cc: stable` tag (but the question brief explicitly says this is
not a negative signal).
- No `Fixes:` tag (same caveat).
- No public syzbot ID; only "internal syzkaller".
- None of these are technical objections.
- Unresolved:
- UNVERIFIED: Could not fetch lore.kernel.org discussion pages
directly (Anubis) to confirm no follow-up concerns; the saved mbox
thread shows only Jan Kara's `Reviewed-by` and Christian Brauner's
"Applied to vfs.fixes" — no objections captured there.
**Step 9.2 - Stable rules**
1. Obviously correct and tested? YES — reviewed and applied by
maintainers; found by fuzzing; logic trivial.
2. Real bug, users affected? YES — unprivileged userspace can crash the
kernel.
3. Important issue (crash/security/corruption/deadlock)? YES — NULL-
deref oops + potential unmap of unrelated mappings.
4. Small and contained? YES — 2 lines, one function.
5. No new features/APIs? YES — pure validation.
6. Applies to stable? YES — verified structurally for
5.10/5.15/6.1/6.6/6.12/6.17/6.18/6.19.
**Step 9.3 - Exceptions**
- Not a device-ID/quirk/DT/build/doc fix, but it squarely meets the core
stable criteria on its own merits.
**Step 9.4 - Decision**
- All criteria for stable inclusion are met. The fix is minimal,
surgical, addresses an unprivileged-triggerable kernel crash path that
has existed since 2014, and has been vetted by appropriate
maintainers. Backport.
## Verification
- [Phase 1] `git show 3adf7ae18bf42` — extracted all tags: Signed-off-
by, Link, Reviewed-by (Jan Kara), Signed-off-by (Christian Brauner).
No Fixes:, no Cc: stable, no syzbot Reported-by.
- [Phase 2] Read `fs/aio.c` lines 354–393: confirmed single hunk, 2
insertions / 1 deletion; the new condition is
`ctx->mmap_size == (vma->vm_end - vma->vm_start)`.
- [Phase 2] Read `mm/mremap.c` lines 1215–1232 — confirmed that when
`vm_ops->mremap` returns an error, `move_vma()` reverts the page-table
move, so returning `-EINVAL` is a safe abort.
- [Phase 2] Read `mm/mremap.c` lines 1700–1741 — confirmed
MREMAP_DONTUNMAP is blocked by VM_DONTEXPAND and expansion is blocked
by VM_DONTEXPAND, so only the "partial move with old_len == new_len"
case reaches aio_ring_mremap, matching the commit message.
- [Phase 3] `git blame -L 365,380 fs/aio.c` — confirmed introduction
lineage: e4a0d3e720e7e (2014, v3.19) for the callback, b2edffdd912b4
(2015) for the `dead` check.
- [Phase 3] `git describe --contains e4a0d3e720e7e5` →
`v3.19-rc1~83^2~1` — bug exists since v3.19.
- [Phase 3] `git show 81e9d6f8647650` — confirmed prior similar NULL-
deref fix in same function was `Cc: stable@vger.kernel.org`.
- [Phase 3] `git log --oneline 3adf7ae18bf42~5..3adf7ae18bf42` —
confirmed the related commit c03ce4173c7bf is the sibling fix from the
same author, independent of this one.
- [Phase 4] `b4 dig -c 3adf7ae18bf42` →
https://patch.msgid.link/20260418060634.3713620-1-wozizhi@huaweicloud.com ;
`b4 dig -a` shows v1 only.
- [Phase 4] `b4 dig -c 3adf7ae18bf42 -w` — confirmed To: Viro, Jan Kara,
Christian Brauner, Benjamin LaHaise (aio maintainer), Jens Axboe; Cc:
linux-fsdevel, linux-aio, linux-kernel.
- [Phase 4] Saved thread mbox and read it — Jan Kara's Reviewed-by;
Christian Brauner's "Applied to vfs.fixes". No objections or follow-
ups.
- [Phase 5] Grep for callers of `aio_ring_mremap` — reached via
`vm_ops->mremap(new_vma)` in `mm/mremap.c:1216`, i.e., the `mremap(2)`
syscall. Unprivileged userspace reach confirmed.
- [Phase 6] `git show
stable-push/linux-{5.10,5.15,6.1,6.6,6.12,6.17,6.18,6.19}.y:fs/aio.c`:
confirmed the identical pre-patch `aio_ring_mremap()` block in every
active stable tree; `ctx->mmap_size` field exists in each.
- [Phase 6] Verified VM_DONTEXPAND is still applied to the AIO ring VMA
in every stable branch, so the commit's premise (only the partial-move
case matters) also holds in stable.
- [Phase 7] File path `fs/aio.c` → IMPORTANT subsystem (AIO, widely used
by userspace libaio).
- [Phase 8] Trigger analysis via commit message + mremap.c read →
unprivileged deterministic trigger.
- UNVERIFIED: Could not fetch live lore.kernel.org HTML (Anubis anti-
bot); relied on the mbox that b4 already retrieved, which did not show
any objections.
- UNVERIFIED: Did not independently execute the syzkaller reproducer;
relied on author's description plus maintainer review.
The fix is small, surgical, reviewed by a VFS maintainer, and addresses
an unprivileged-triggerable NULL-pointer dereference that also risks
silent unmap of unrelated user mappings. The buggy code is present,
identically, in every active stable tree going back to 5.10, and the
patch applies trivially. All stable-kernel-rules criteria are met.
**YES**
fs/aio.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/aio.c b/fs/aio.c
index 6d436f8b3f349..b8a163d90bfaf 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -369,7 +369,8 @@ static int aio_ring_mremap(struct vm_area_struct *vma)
 		ctx = rcu_dereference(table->table[i]);
 		if (ctx && ctx->aio_ring_file == file) {
-			if (!atomic_read(&ctx->dead)) {
+			if (!atomic_read(&ctx->dead) &&
+			    (ctx->mmap_size == (vma->vm_end - vma->vm_start))) {
 				ctx->user_id = ctx->mmap_base = vma->vm_start;
 				res = 0;
 			}
--
2.53.0