Linux NFS development
 help / color / mirror / Atom feed
From: yangerkun <yangerkun@huawei.com>
To: Chuck Lever <cel@kernel.org>,
	Misbah Anjum N <misanjum@linux.ibm.com>,
	Jeff Layton <jlayton@kernel.org>, NeilBrown <neil@brown.name>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
	Trond Myklebust <trondmy@kernel.org>,
	Anna Schumaker <anna@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>, <yi.zhang@huawei.com>,
	Zhihao Cheng <chengzhihao1@huawei.com>,
	Li Lingfeng <lilingfeng3@huawei.com>
Cc: <linux-nfs@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<netdev@vger.kernel.org>, Chuck Lever <chuck.lever@oracle.com>
Subject: Re: [PATCH 0/6] SUNRPC: Address remaining cache_check_rcu() UAF in cache content files
Date: Sat, 9 May 2026 17:41:21 +0800	[thread overview]
Message-ID: <4ee398d0-d2ec-45b2-8214-6e35520fca2d@huawei.com> (raw)
In-Reply-To: <f4caa4fa-f15f-4c95-8318-d4ec216e6090@app.fastmail.com>

Hi Chuck!

在 2026/5/9 4:47, Chuck Lever 写道:
> Hi Erkun -
> 
> On Fri, May 8, 2026, at 9:00 AM, yangerkun wrote:
>> 在 2026/5/8 16:16, yangerkun 写道:
>>>
>>>
>>> 在 2026/5/8 11:08, yangerkun 写道:
>>> After reviewing these two commits:
>>>
>>> e7fcf179b82d NFSD: Hold net reference for the lifetime of /proc/fs/nfs/
>>> exports fd
>>> 48db892356d6 NFSD: Defer sub-object cleanup in export put callbacks
>>>
>>> I believe that the issue described in commit e7fcf179b82d might be the
>>> root cause of the null pointer dereferences mentioned in [1].
> 
> That's where I landed too. e7fcf179b82d closed the specific
> oops Misbah hit on /proc/fs/nfs/exports. The matching patch

Yeah!

> in this series is 5/6 ("SUNRPC: Hold cd->net for the lifetime
> of cache files"), which extends the same get_net()/put_net()
> guard to the sunrpc cache files at
> 
>   /proc/net/rpc/<cache>/{content,channel,flush} .
> 
> Those open helpers had the same hole; sosreport just hit the
> nfsd-specific file first because it reads /proc/fs/nfsd/exports.


Hmm... /proc/net is always a symlink to /proc/self/net. After opening 
/proc/net/rpc/<cache>/content and attempting to read it, the 
proc_reg_read function calls use_pde before pde_read. This sequence can 
prevent a race condition because nfsd_export_shutdown leads to 
cache_unregister_net, which calls remove_cache_proc_entries, then 
proc_remove, and eventually proc_entry_rundown. The proc_entry_rundown 
function waits until unuse_pde is called in proc_reg_read. Therefore, 
I'm not sure if forgetting to call get_net when opening 
/proc/net/rpc/<cache>/content is the root cause of the null pointer in 
c_show. I've tried to find any other possible root causes but have been 
unsuccessful. Sorry....


> 
> Patch 5/6's changelog pins down the deref site you asked
> about: cache_check_rcu() faults reading h->flags off a
> garbage cache_head returned by __cache_seq_start() walking a
> cd->hash_table that cache_destroy_net() already freed. Not a
> dentry deref. The dentry-teardown path is a separate failure
> mode that 48db892356d6 closed for the export and expkey caches.
> 
> 
>>> To prevent the
>>> issue described in commit 69d803c40ede, should we consider reverting
>>> commit 48db892356d6 first?
> 
> Not for this series. Patches 3/6 and 4/6 don't add any new
> path_put deferral; their commit messages call them out as
> consistency changes, not bug fixes. ip_map holds only an
> auth_domain reference and unix_gid holds only a group_info,
> so neither cache reaches mntput from the deferred release.
> The exportfs-r-then-umount sequence isn't touched by this
> series.
> 
> The svc_export and svc_expkey path_put deferral lives in
> 48db892356d6, which is already in v7.0. If the umount window
> from 69d803c40ede is still reachable through that commit,
> that's a regression in 48db892356d6 and worth a separate
> thread.

Yeah! Totally agree!

> 
> 
>> Locally, I wrote a stable regression test case. I also reverted to
>> commit 9189d23b835cec646ba5010db35d1557a77c5857 (which is before commits
>> 2862eee078a4 "SUNRPC: make sure cache entry active before cache_show"
>> and be8f982c369c "nfsd: make sure exp active before svc_export_show").
>> Even then, a panic can still be triggered without any actual export path...
> 
> That fits 5/6's failure mode. Without an export no svc_export
> or svc_expkey entry is populated, but rpc.mountd reads
> auth.unix.ip/content and auth.unix.gid/content directly,
> and on a pre-5/6 tree the open helpers in cache.c hold no
> reference on cd->net. cache_destroy_net() at namespace exit
> then races a reader still inside cache_seq_start_rcu(), and
> the reader walks a freed cd->hash_table.
> 
> Could you share the reproducer and the panic stack trace?
> If the fault is in cache_check_rcu() through one of the
> sunrpc cache files, that confirms 5/6 is the right fix, and
> I'll happily carry your Tested-by on it.

The shell(Created will AI assist):

#!/bin/bash
#
# Test for e7fcf179b82d ("NFSD: Hold net reference for ...")
#
# Reproduces the scenario described in the commit:
#   1. Process opens /proc/fs/nfsd/exports in netns A
#   2. Process leaves A (joins B), emptying A
#   3. ip netns del A triggers nfsd_export_shutdown → cache_detail freed
#   4. Process reads from still-open fd → UAF on UNFIXED kernel
#
# On current kernel (with e7fcf179b82d applied):
#   get_net in exports_net_open prevents netns A from being destroyed
#   → read succeeds safely (test output: "SUCCESS")
#
# On kernel WITHOUT e7fcf179b82d:
#   No get_net → A destroyed → read triggers UAF:
#     - KASAN: use-after-free, or
#     - NULL deref, or
#     - slab corruption (ASCII strings like "cap_type", "libz.so.")
#
# Usage: sudo ./test_nfsd_exports_uaf.sh

set -e -u

NS_A="nfsd_test_A_$$"
NS_B="nfsd_test_B_$$"
SYNC="/tmp/nfsd_uaf_sync_$$"
GO="/tmp/nfsd_uaf_go_$$"
REPRO="/tmp/uaf_repro_$$"

cleanup() {
     set +e
     kill $REPRO_PID 2>/dev/null || true
     wait $REPRO_PID 2>/dev/null || true
     ip netns del "$NS_B" 2>/dev/null || true
     ip netns del "$NS_A" 2>/dev/null || true
     rm -f "$REPRO" "$SYNC" "$GO"
}
trap cleanup EXIT

echo "=== Reproduce e7fcf179b82d scenario ==="

# --- Setup ---
echo "[setup] creating netns A and B..."
ip netns add "$NS_A"
ip netns add "$NS_B"

echo "[setup] loading nfsd..."
modprobe nfsd || true

echo "[setup] compiling repro..."
gcc -o "$REPRO" /tmp/uaf_repro.c 2>/dev/null || \
     gcc -o "$REPRO" -x c - <<'SRCEOF' 2>/dev/null
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
static volatile sig_atomic_t go_flag = 0;
static void handler(int sig) { go_flag = 1; }
int main(int argc, char *argv[]) {
     const char *netns_b = argv[1], *sync_f = argv[2], *go_f = argv[3];
     int fd, nsfd; ssize_t n; char buf[4096];
     fd = open("/proc/fs/nfs/exports", O_RDONLY);
     if (fd < 0) { perror("open exports debug"); return 1; }
     fprintf(stderr, "[repro] opened exports (fd=%d) in netns A\n", fd);
     nsfd = open(netns_b, O_RDONLY);
     if (nsfd < 0) { perror("open B"); return 1; }
     if (setns(nsfd, CLONE_NEWNET) < 0) { perror("setns"); return 1; }
     close(nsfd);
     fprintf(stderr, "[repro] moved to B; A has no processes\n");
     close(open(sync_f, O_CREAT | O_WRONLY, 0666));
     signal(SIGCONT, handler);
     while (!go_flag) { struct stat st; if (stat(go_f, &st) == 0) break; 
pause(); }
     fprintf(stderr, "[repro] reading exports fd...\n");
     lseek(fd, 0, SEEK_SET);
     sleep(1);
     n = read(fd, buf, sizeof(buf)-1);
     if (n < 0) { perror("read"); close(fd); return 1; }
     buf[n] = '\0';
     fprintf(stderr, "[repro] SUCCESS: read %zd bytes (no UAF)\n", n);
     close(fd);
     return 0;
}
SRCEOF

# --- Run repro inside A ---
rm -f "$SYNC" "$GO"
echo "[test] starting repro inside netns A..."
ip netns exec "$NS_A" "$REPRO" /var/run/netns/"$NS_B" "$SYNC" "$GO" &
REPRO_PID=$!

# --- Wait for repro to move to B ---
echo "[test] waiting for repro to signal that A is empty..."
for i in $(seq 1 30); do
     if [ -f "$SYNC" ]; then break; fi
     if ! kill -0 $REPRO_PID 2>/dev/null; then
         echo "[FAIL] repro exited prematurely"
         wait $REPRO_PID || true
         exit 1
     fi
     sleep 0.2
done

if [ ! -f "$SYNC" ]; then
     echo "[FAIL] timeout waiting for repro"
     exit 1
fi
echo "[test] repro moved to B"

# --- Destroy netns A ---
echo "[test] destroying netns A (ip netns del $NS_A)..."
set +e
ip netns del "$NS_A" 2>&1
RC=$?
set -e

if [ $RC -eq 0 ]; then
     echo "[test] 'ip netns del $NS_A' returned success"
else
     echo "[test] 'ip netns del $NS_A' returned $RC"
fi

# --- Signal repro to read from the exports fd ---
echo "[test] signaling repro to read from exports fd..."
touch "$GO"
kill -CONT $REPRO_PID 2>/dev/null || true

# --- Wait for repro and check result ---
set +e
wait $REPRO_PID
RC=$?
set -e

if [ $RC -eq 0 ]; then
     echo ""
     echo "=== TEST PASSED: no UAF detected (kernel has e7fcf179b82d 
fix) ==="
     echo "   get_net() holds netns A alive while the fd is open."
else
     echo ""
     echo "=== TEST FAILED with exit code $RC ==="
     echo "   Possible UAF or other error."
     echo "   If running on a kernel WITHOUT e7fcf179b82d, this crash is 
EXPECTED."
fi
exit $RC


Panic show as follow with commit:

commit 9189d23b835cec646ba5010db35d1557a77c5857 (HEAD -> master)
Author: Chuck Lever <chuck.lever@oracle.com>
Date:   Thu Oct 17 09:36:31 2024 -0400

     lockd: Remove unneeded initialization of file_lock::c.flc_flags


localhost login: [   39.462598][  T579] 
================================================================== 
  
  
     [202/363]
[   39.463541][  T579] BUG: KASAN: slab-use-after-free in 
cache_seq_next_rcu+0xa4/0x180 [sunrpc]
[   39.464551][  T579] Read of size 4 at addr ffff00000fbe8408 by task 
uaf_repro_563/579
[   39.465291][  T579]
[   39.465513][  T579] CPU: 1 UID: 0 PID: 579 Comm: uaf_repro_563 Not 
tainted 6.12.0-rc7+ #17
[   39.466349][  T579] Hardware name: linux,dummy-virt (DT)
[   39.466897][  T579] Call trace:
[   39.467224][  T579]  dump_backtrace+0xa4/0x140
[   39.467742][  T579]  show_stack+0x20/0x38
[   39.468156][  T579]  dump_stack_lvl+0x80/0xf8
[   39.468694][  T579]  print_report+0xfc/0x5c8
[   39.469237][  T579]  kasan_report+0x78/0xc8
[   39.469676][  T579]  __asan_load4+0x9c/0xc0
[   39.470115][  T579]  cache_seq_next_rcu+0xa4/0x180 [sunrpc]
[   39.470842][  T579]  seq_read_iter+0x4a0/0x6c0
[   39.471355][  T579]  seq_read+0x194/0x218
[   39.471770][  T579]  proc_reg_read+0x110/0x198
[   39.472235][  T579]  vfs_read+0x150/0x490
[   39.472656][  T579]  ksys_read+0xd4/0x198
[   39.473070][  T579]  __arm64_sys_read+0x4c/0x68
[   39.473537][  T579]  invoke_syscall+0x64/0x188
[   39.473992][  T579]  el0_svc_common.constprop.1+0xd8/0x158
[   39.474558][  T579]  do_el0_svc+0x38/0x50
[   39.474981][  T579]  el0_svc+0x34/0xc0
[   39.475422][  T579]  el0t_64_sync_handler+0xa0/0xc8
[   39.475933][  T579]  el0t_64_sync+0x188/0x190
[   39.476385][  T579]
[   39.476618][  T579] Allocated by task 566:
[   39.477087][  T579]  kasan_save_stack+0x2c/0x58
[   39.477561][  T579]  kasan_save_track+0x20/0x40
[   39.478030][  T579]  kasan_save_alloc_info+0x40/0x58
[   39.478539][  T579]  __kasan_kmalloc+0xa0/0xb8
[   39.478997][  T579]  __kmalloc_node_track_caller_noprof+0x194/0x370
[   39.479646][  T579]  kmemdup_noprof+0x34/0x68
[   39.480094][  T579]  cache_create_net+0x30/0x108 [sunrpc]
[   39.480800][  T579]  nfsd_export_init+0x78/0x188 [nfsd]
[   39.481505][  T579]  nfsd_net_init+0x50/0x1e8 [nfsd]
[   39.482136][  T579]  ops_init+0xcc/0x210
[   39.482615][  T579]  register_pernet_operations+0x218/0x348
[   39.483180][  T579]  register_pernet_subsys+0x38/0x60
[   39.483698][  T579]  0xffffb6c9bf5b90c0
[   39.484096][  T579]  do_one_initcall+0xa8/0x3c8
[   39.484563][  T579]  do_init_module+0x100/0x378
[   39.485070][  T579]  load_module+0x2d78/0x2e80
[   39.485532][  T579]  init_module_from_file+0xec/0x148
[   39.486044][  T579]  __arm64_sys_finit_module+0x394/0x618
[   39.486604][  T579]  invoke_syscall+0x64/0x188
[   39.487065][  T579]  el0_svc_common.constprop.1+0xd8/0x158
[   39.487629][  T579]  do_el0_svc+0x38/0x50
[   39.488044][  T579]  el0_svc+0x34/0xc0
[   39.488437][  T579]  el0t_64_sync_handler+0xa0/0xc8
[   39.488939][  T579]  el0t_64_sync+0x188/0x190
[   39.489398][  T579]
[   39.489635][  T579] Freed by task 53:
[   39.490013][  T579]  kasan_save_stack+0x2c/0x58
[   39.490479][  T579]  kasan_save_track+0x20/0x40
[   39.490948][  T579]  kasan_save_free_info+0x4c/0x78
[   39.491449][  T579]  __kasan_slab_free+0x50/0x70
[   39.491924][  T579]  kfree+0x160/0x310
[   39.492312][  T579]  cache_destroy_net+0x34/0x50 [sunrpc]
[   39.493015][  T579]  nfsd_export_shutdown+0xc0/0x150 [nfsd]
[   39.493711][  T579]  nfsd_net_exit+0x68/0x88 [nfsd]
[   39.494338][  T579]  ops_exit_list.isra.13+0x64/0xc0
[   39.494856][  T579]  cleanup_net+0x508/0x788
[   39.495300][  T579]  process_scheduled_works+0x3d8/0x7e8
[   39.495895][  T579]  worker_thread+0x29c/0x630
[   39.496364][  T579]  kthread+0x170/0x188
[   39.496773][  T579]  ret_from_fork+0x10/0x20
[   39.497217][  T579]
[   39.497453][  T579] The buggy address belongs to the object at 
ffff00000fbe8400

I have try to replace
fd = open("/proc/fs/nfs/exports", O_RDONLY);
with
fd = open("/proc/fs/nfs/exports", O_RDONLY);

No c_show UAF trigger...


> 
> 


  reply	other threads:[~2026-05-09  9:41 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-01 14:51 [PATCH 0/6] SUNRPC: Address remaining cache_check_rcu() UAF in cache content files Chuck Lever
2026-05-01 14:51 ` [PATCH 1/6] SUNRPC: Move cache_initialize() declaration to sunrpc-private header Chuck Lever
2026-05-01 14:51 ` [PATCH 2/6] SUNRPC: Provide a shared workqueue for cache release callbacks Chuck Lever
2026-05-01 14:51 ` [PATCH 3/6] SUNRPC: Defer ip_map sub-object cleanup past RCU grace period Chuck Lever
2026-05-01 14:51 ` [PATCH 4/6] SUNRPC: Use shared release pattern for the unix_gid cache Chuck Lever
2026-05-01 14:51 ` [PATCH 5/6] SUNRPC: Hold cd->net for the lifetime of cache files Chuck Lever
2026-05-01 14:51 ` [PATCH 6/6] NFSD: Convert nfsd_export_shutdown() to sunrpc_cache_destroy_net() Chuck Lever
2026-05-05  5:32 ` [PATCH 0/6] SUNRPC: Address remaining cache_check_rcu() UAF in cache content files Jeff Layton
2026-05-05 10:49 ` Calum Mackay
2026-05-05 10:53   ` Chuck Lever
2026-05-07  9:09 ` yangerkun
2026-05-07 16:12   ` Chuck Lever
2026-05-08  2:45     ` yangerkun
2026-05-08  3:08       ` yangerkun
2026-05-08  8:16         ` yangerkun
2026-05-08 13:00           ` yangerkun
2026-05-08 20:47             ` Chuck Lever
2026-05-09  9:41               ` yangerkun [this message]
2026-05-10 16:18                 ` Chuck Lever

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4ee398d0-d2ec-45b2-8214-6e35520fca2d@huawei.com \
    --to=yangerkun@huawei.com \
    --cc=Dai.Ngo@oracle.com \
    --cc=anna@kernel.org \
    --cc=cel@kernel.org \
    --cc=chengzhihao1@huawei.com \
    --cc=chuck.lever@oracle.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=jlayton@kernel.org \
    --cc=kuba@kernel.org \
    --cc=lilingfeng3@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=misanjum@linux.ibm.com \
    --cc=neil@brown.name \
    --cc=netdev@vger.kernel.org \
    --cc=okorniev@redhat.com \
    --cc=pabeni@redhat.com \
    --cc=tom@talpey.com \
    --cc=trondmy@kernel.org \
    --cc=yi.zhang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox