From: yangerkun <yangerkun@huawei.com>
To: Chuck Lever <cel@kernel.org>,
	Misbah Anjum N <misanjum@linux.ibm.com>,
	Jeff Layton <jlayton@kernel.org>, NeilBrown <neil@brown.name>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
	Trond Myklebust <trondmy@kernel.org>,
	Anna Schumaker <anna@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>, <yi.zhang@huawei.com>,
	Zhihao Cheng <chengzhihao1@huawei.com>,
	Li Lingfeng <lilingfeng3@huawei.com>
Cc: <linux-nfs@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<netdev@vger.kernel.org>, Chuck Lever <chuck.lever@oracle.com>
Subject: Re: [PATCH 0/6] SUNRPC: Address remaining cache_check_rcu() UAF in cache content files
Date: Fri, 8 May 2026 21:00:12 +0800
Message-ID: <39819ad4-3105-4802-b5e2-79e131b25984@huawei.com>
In-Reply-To: <10019b42-4589-4f9f-8d5b-d8197db1ce3c@huawei.com>



On 2026/5/8 16:16, yangerkun wrote:
> 
> 
> On 2026/5/8 11:08, yangerkun wrote:
>>
>>
>> On 2026/5/8 10:45, yangerkun wrote:
>>> Hello Chuck,
>>>
>>> On 2026/5/8 0:12, Chuck Lever wrote:
>>>> Hello Erkun -
>>>>
>>>> On Thu, May 7, 2026, at 11:09 AM, yangerkun wrote:
>>>>> Hi,
>>>>>
>>>>> On 2026/5/1 22:51, Chuck Lever wrote:
>>>>>> Misbah Anjum reported a use-after-free in cache_check_rcu()
>>>>>> reached through e_show() while sosreport was reading
>>>>>> /proc/fs/nfsd/exports on ppc64le.  Two fixes for that report
>>>>>> landed in v7.0:
>>>>>>
>>>>>>     48db892356d6 ("NFSD: Defer sub-object cleanup in export put callbacks")
>>>>>>     e7fcf179b82d ("NFSD: Hold net reference for the lifetime of /proc/fs/nfs/exports fd")
>>>>>
>>>>> Back to the problem fixed by these patches: I'm a little confused about
>>>>> how this UAF can be triggered.
>>>>>
>>>>> Before these patches, svc_export_put looked as follows:
>>>>>
>>>>>    static void svc_export_put(struct kref *ref)
>>>>>    {
>>>>>            struct svc_export *exp = container_of(ref, struct svc_export, h.ref);
>>>>>
>>>>>            path_put(&exp->ex_path);
>>>>>            auth_domain_put(exp->ex_client);
>>>>>            call_rcu(&exp->ex_rcu, svc_export_release);
>>>>>    }
>>>>>
>>>>> The auth_domain_put function releases ->name using call_rcu, and
>>>>> path_put may release the dentry also via call_rcu. All of this seems
>>>>> to prevent e_show from causing a UAF. Could you point out which line
>>>>> in d_path triggers the issue?
>>>>
>>>> The dentry, the mount, and the auth_domain ->name buffer all
>>>> end up RCU-freed (dentry_free() and delayed_free_vfsmnt in
>>>> fs/, svcauth_unix_domain_release_rcu() in svcauth_unix.c).
>>>> The eventual kfree isn't the problem.
>>>>
>>>> The problem is the synchronous teardown inside path_put(),
>>>> which runs before svc_export_put() ever reaches its own
>>>> call_rcu():
>>>>
>>>>    path_put(&exp->ex_path)
>>>>      -> dput(dentry)
>>>>         -> __dentry_kill()              [if last ref]
>>>>            -> __d_drop()                /* unhashes */
>>>>            -> dentry_unlink_inode()     /* d_inode = NULL */
>>>>            -> d_op->d_release() if set
>>>>            -> drops parent d_lockref    /* may cascade up */
>>>>            -> dentry_free()             /* call_rcu deferred */
>>>>      -> mntput(mnt)                     /* deferred via task_work */
>>>>
>>>> The dentry pointer itself is RCU-safe, so prepend_path()'s walk
>>>> of d_parent and d_name doesn't read freed memory.  But by the
>>>> time the reader gets there, __d_clear_type_and_inode() has
>>>> already stored NULL into d_inode, __d_drop() has broken the
>>>> hash linkage, and the parent's d_lockref has been decremented
>>>> -- which can in turn fire __dentry_kill() on the parent, and
>>>> on up the tree.  An e_show() that's still inside its cache RCU
>>>> read section walks into that half-dismantled state through
>>>> seq_path(), and that's the NULL deref Misbah reported.
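To make the window described above concrete, here is a minimal userspace sketch in plain C. It is not kernel code: every model_* name is hypothetical, and the "grace period" is just an explicit function call. It only illustrates the general point that deferring the free does not defer the synchronous teardown; it does not show which line of d_path(), if any, actually faults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct model_inode {
	int ino;
};

struct model_dentry {
	struct model_inode *inode; /* cleared synchronously at teardown */
	bool hashed;               /* cleared synchronously at teardown */
	bool free_pending;         /* memory release waits for the grace period */
};

static struct model_dentry *model_dentry_new(struct model_inode *inode)
{
	struct model_dentry *d = malloc(sizeof(*d));

	assert(d != NULL);
	d->inode = inode;
	d->hashed = true;
	d->free_pending = false;
	return d;
}

/* Synchronous part of teardown: takes effect as soon as the last ref drops. */
static void model_dentry_kill(struct model_dentry *d)
{
	d->hashed = false;
	d->inode = NULL;
	d->free_pending = true; /* the memory itself is still valid */
}

/* Deferred part: stands in for the callback run after the grace period. */
static void model_grace_period_end(struct model_dentry **dp)
{
	assert((*dp)->free_pending);
	free(*dp);
	*dp = NULL;
}

/*
 * A reader still inside its read-side section. Returns -1 when it observes
 * the dismantled state; a reader without this check would dereference NULL.
 */
static int model_reader_ino(const struct model_dentry *d)
{
	if (d->inode == NULL)
		return -1;
	return d->inode->ino;
}
```

Calling model_dentry_kill() and then model_reader_ino() before model_grace_period_end() is the interesting ordering: the pointer is still safe to follow, but the fields behind it have already been reset.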
>>>
>>> Thank you for your detailed explanation! Yes, e_show might be called 
>>> when the state is partially dismantled, but after carefully reviewing 
>>> the code with dput up to __dentry_kill, I still cannot find anything 
>>> that could cause this issue. Additionally, the comments for 
>>> prepend_path indicate that they have already taken into account that 
>>> the dentry can be removed concurrently. I have also run some tests on 
>>> my arm64 QEMU, but I couldn't reproduce the problem either. Could you 
>>> please help me identify the specific line or pointer in the dentry 
>>> that triggers this use-after-free or null pointer issue?
>>>
>>> Maybe I am not very familiar with the code, which is why I failed
>>> to identify the real root cause. I'm sorry about that.
>>>
>>>
>>> char *d_path(const struct path *path, char *buf, int buflen)
>>> {
>>>         DECLARE_BUFFER(b, buf, buflen);
>>>         struct path root;
>>>
>>>         /*
>>>          * We have various synthetic filesystems that never get mounted.  On
>>>          * these filesystems dentries are never used for lookup purposes, and
>>>          * thus don't need to be hashed.  They also don't need a name until a
>>>          * user wants to identify the object in /proc/pid/fd/.  The little hack
>>>          * below allows us to generate a name for these objects on demand:
>>>          *
>>>          * Some pseudo inodes are mountable.  When they are mounted
>>>          * path->dentry == path->mnt->mnt_root.  In that case don't call d_dname
>>>          * and instead have d_path return the mounted path.
>>>          */
>>>         if (path->dentry->d_op && path->dentry->d_op->d_dname &&
>>>             (!IS_ROOT(path->dentry) || path->dentry != path->mnt->mnt_root))
>>>                 return path->dentry->d_op->d_dname(path->dentry, buf, buflen);
>>>
>>>         rcu_read_lock();
>>>         get_fs_root_rcu(current->fs, &root);
>>>         if (unlikely(d_unlinked(path->dentry)))
>>>                 prepend(&b, " (deleted)", 11);
>>>         else
>>>                 prepend_char(&b, 0);
>>>         prepend_path(path, &root, &b);
>>>         rcu_read_unlock();
>>>
>>>         return extract_string(&b);
>>> }
>>>
>>>
>>>>
>>>> The earlier fix (2530766492ec, "nfsd: fix UAF when access
>>>> ex_uuid or ex_stats") moved the kfree of ex_uuid and ex_stats
>>>> into svc_export_release() so those are RCU-safe now.
>>>> path_put() and auth_domain_put() couldn't go in there because
>>>> both may sleep, and call_rcu callbacks run in softirq context.
>>>> This series uses queue_rcu_work() instead: it defers past the
>>>> grace period AND runs the callback in process context, so the
>>>> sleeping puts move into the deferred path and the window
>>>> closes.
>>>
>>> Yeah, I can get this! Thanks again for your detail explanation!
>>
>> Also, could the scenario described in this commit be triggered again?
>>
>> commit 69d803c40edeaf94089fbc8751c9b746cdc35044
>> Author: Yang Erkun <yangerkun@huawei.com>
>> Date:   Mon Dec 16 22:21:52 2024 +0800
>>
>>      nfsd: Revert "nfsd: release svc_expkey/svc_export with rcu_work"
>>
>>      This reverts commit f8c989a0c89a75d30f899a7cabdc14d72522bb8d.
>>
>>      Before this commit, svc_export_put or expkey_put will call path_put with
>>      sync mode. After this commit, path_put will be called with async mode.
>>      And this can lead the unexpected results show as follow.
>>
>>      mkfs.xfs -f /dev/sda
>>      echo "/ *(rw,no_root_squash,fsid=0)" > /etc/exports
>>      echo "/mnt *(rw,no_root_squash,fsid=1)" >> /etc/exports
>>      exportfs -ra
>>      service nfs-server start
>>      mount -t nfs -o vers=4.0 127.0.0.1:/mnt /mnt1
>>      mount /dev/sda /mnt/sda
>>      touch /mnt1/sda/file
>>      exportfs -r
>>      umount /mnt/sda # failed unexcepted
>>
>>      The touch will finally call nfsd_cross_mnt, add refcount to mount, and
>>      then add cache_head. Before this commit, exportfs -r will call
>>      cache_flush to cleanup all cache_head, and path_put in
>>      svc_export_put/expkey_put will be finished with sync mode. So, the
>>      latter umount will always success. However, after this commit, path_put
>>      will be called with async mode, the latter umount may failed, and if
>>      we add some delay, umount will success too. Personally I think this bug
>>      and should be fixed. We first revert before bugfix patch, and then fix
>>      the original bug with a different way.
>>
>>      Fixes: f8c989a0c89a ("nfsd: release svc_expkey/svc_export with rcu_work")
>>      Signed-off-by: Yang Erkun <yangerkun@huawei.com>
>>      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
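The umount behaviour described in that commit message can be sketched as a small userspace model. This is only an illustration under stated assumptions: the model_* names are hypothetical, the mount is reduced to a bare reference count, and "deferred work" is an explicit function call rather than a real workqueue.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: refs == 1 means only the mount table holds the mount. */
struct model_mount {
	int refs;
};

struct model_export {
	struct model_mount *mnt;
	bool put_pending; /* true while the path reference awaits deferred release */
};

/* Stands in for the cache entry pinning the mount (as after nfsd_cross_mnt). */
static void model_export_get_path(struct model_export *e, struct model_mount *m)
{
	e->mnt = m;
	e->put_pending = false;
	m->refs++;
}

/* Sync mode: the reference is dropped before the flush returns. */
static void model_export_put_sync(struct model_export *e)
{
	assert(e->mnt != NULL);
	e->mnt->refs--;
	e->mnt = NULL;
}

/* Async mode: the flush only queues the release. */
static void model_export_put_deferred(struct model_export *e)
{
	assert(e->mnt != NULL);
	e->put_pending = true;
}

/* Stands in for the deferred work item finally running. */
static void model_run_deferred_work(struct model_export *e)
{
	if (!e->put_pending)
		return;
	e->mnt->refs--;
	e->mnt = NULL;
	e->put_pending = false;
}

/* umount is refused while anything besides the mount table holds a ref. */
static int model_umount(const struct model_mount *m)
{
	return m->refs > 1 ? -EBUSY : 0;
}
```

In this model, model_umount() returns -EBUSY between model_export_put_deferred() and model_run_deferred_work(), which corresponds to the "umount fails unless we add some delay" observation.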
>>
>>
> 
> After reviewing these two commits:
> 
> e7fcf179b82d NFSD: Hold net reference for the lifetime of /proc/fs/nfs/exports fd
> 48db892356d6 NFSD: Defer sub-object cleanup in export put callbacks
> 
> I believe that the issue described in commit e7fcf179b82d might be the
> root cause of the null pointer dereferences mentioned in [1]. This is
> because we do not call get_net when opening /proc/fs/nfs/exports. As a
> result, when the network namespace exits, nfsd_net_exit is triggered.
> If, at the same time, the contents of /proc/fs/nfs/exports are being
> read, a use-after-free (UAF) can occur on the struct cache_detail. I
> think all three bugs referenced in [1] stem from this issue. Therefore,
> commit e7fcf179b82d has already addressed the problem. To prevent the
> issue described in commit 69d803c40ede, should we consider reverting
> commit 48db892356d6 first? Please let me know if I have misunderstood
> any aspect of this problem.
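The lifetime argument above can be sketched as a userspace model. It is a simplification under stated assumptions: the model_* names are hypothetical, the namespace is reduced to a reference count plus a "destroyed" flag, and the per-net cache state is assumed to go away exactly when that count reaches zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a namespace that owns the per-net cache state. */
struct model_net {
	int refs;       /* starts at 1: the namespace's own reference */
	bool destroyed; /* true once the per-net cache state is torn down */
};

/* Hypothetical model of an open exports file. */
struct model_file {
	struct model_net *net;
	bool holds_ref; /* true if open took a reference (the get_net case) */
};

static void model_put_net(struct model_net *net)
{
	assert(net->refs > 0);
	if (--net->refs == 0)
		net->destroyed = true; /* stands in for the net exit path */
}

/* take_ref == false models an open that does not pin the namespace. */
static void model_open(struct model_file *f, struct model_net *net, bool take_ref)
{
	f->net = net;
	f->holds_ref = take_ref;
	if (take_ref)
		net->refs++;
}

static void model_release(struct model_file *f)
{
	if (f->holds_ref)
		model_put_net(f->net);
	f->net = NULL;
}

/* A read is only safe while the per-net state still exists. */
static bool model_read_is_safe(const struct model_file *f)
{
	return !f->net->destroyed;
}
```

With take_ref set to false, a model_put_net() from the namespace exit makes a later read unsafe while the file is still open; with take_ref set to true, teardown is postponed until model_release().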

Locally, I wrote a test case that reproduces the problem reliably. I also
went back to commit 9189d23b835cec646ba5010db35d1557a77c5857 (which
predates commits 2862eee078a4 "SUNRPC: make sure cache entry active before
cache_show" and be8f982c369c "nfsd: make sure exp active before
svc_export_show"). Even then, a panic can still be triggered without any
actual export path...

> 
>>>
>>> Thanks,
>>> Erkun.
>>>
>>>>
>>>>
>>>
>>
>>
>>
> 
> 
> 

