The Linux Kernel Mailing List
* Re: [PATCH 0/6] SUNRPC: Address remaining cache_check_rcu() UAF in cache content files
       [not found] <20260501-cache-uaf-fix-v1-0-a49928bf4817@oracle.com>
@ 2026-05-05  5:32 ` Jeff Layton
  2026-05-05 10:49 ` Calum Mackay
  2026-05-07  9:09 ` yangerkun
  2 siblings, 0 replies; 12+ messages in thread
From: Jeff Layton @ 2026-05-05  5:32 UTC
  To: Chuck Lever, Misbah Anjum N, NeilBrown, Olga Kornievskaia,
	Dai Ngo, Tom Talpey, Trond Myklebust, Anna Schumaker,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Yang Erkun
  Cc: linux-nfs, linux-kernel, netdev, Chuck Lever

On Fri, 2026-05-01 at 10:51 -0400, Chuck Lever wrote:
> Misbah Anjum reported a use-after-free in cache_check_rcu()
> reached through e_show() while sosreport was reading
> /proc/fs/nfsd/exports on ppc64le.  Two fixes for that report
> landed in v7.0:
> 
>   48db892356d6 ("NFSD: Defer sub-object cleanup in export put callbacks")
>   e7fcf179b82d ("NFSD: Hold net reference for the lifetime of /proc/fs/nfs/exports fd")
> 
> The original e_show() repro is now fixed.  However, the same
> sosreport workload still reproduces a closely related fault on
> post-v7.0 mainline (Misbah, ppc64le) and on master.20260424
> (internal report, aarch64).  In both cases the fault is in
> cache_check_rcu() reached through c_show() rather than e_show(),
> and the cache_head pointer is plain garbage:
> 
>   pc : cache_check_rcu+0x40 [sunrpc]
>   lr : c_show+0x60 [sunrpc]
>   ...faulting on h->flags off h = 0x0000000200000000
> 
> c_show() is the generic show callback used by
> /proc/net/rpc/<cd>/content for every per-net cache_detail
> (auth.unix.ip, auth.unix.gid, nfsd.fh, nfsd.export).  Two
> bugs combine in that path:
> 
> 1. cache_unregister_net() / cache_destroy_net() free cd and
>    cd->hash_table synchronously when the namespace exits.  The
>    /proc/net/rpc/.../content open path takes only a module
>    reference, so a fd kept open across a netns exit walks a
>    freed hash_table and returns garbage cache_head pointers.
>    This is the same hazard that e7fcf179b82d closed for the
>    /proc/fs/nfs/exports file alone.
> 
> 2. ip_map_put() drops auth_domain_put() before kfree_rcu(), so
>    sub-objects can be freed before the RCU grace period -- the
>    same hazard that 48db892356d6 fixed for svc_export_put() and
>    expkey_put().  unix_gid_put() does not have this bug
>    structurally (its put_group_info() runs inside the call_rcu()
>    callback) but it uses a separate idiom from the other three
>    caches.
> 
> This series replaces the v1 narrow fixes with shared
> infrastructure that covers all four cache_detail .put paths
> and all three per-cache file types:
> 
> Patch 1 hoists nfsd_export_wq up to the sunrpc layer as
> sunrpc_cache_wq, exposed through sunrpc_cache_queue_release()
> and sunrpc_cache_drain() so all four put callbacks share one
> workqueue and one drain primitive.
> 
> Patch 2 converts ip_map_put() to the queue_rcu_work() pattern,
> moving auth_domain_put() into a deferred ip_map_release() that
> runs after the RCU grace period.
> 
> Patch 3 unifies unix_gid_put() onto the same pattern for
> consistency (not a bug fix on its own).
> 
> Patch 4 takes a get_net(cd->net) in content_open(), cache_open(),
> and open_flush() and drops it in the matching release helpers,
> so cache_destroy_net() cannot run while a sunrpc cache fd is
> open.
> 
> Series has been compile-tested only.
> 
> ---
> Chuck Lever (6):
>       SUNRPC: Move cache_initialize() declaration to sunrpc-private header
>       SUNRPC: Provide a shared workqueue for cache release callbacks
>       SUNRPC: Defer ip_map sub-object cleanup past RCU grace period
>       SUNRPC: Use shared release pattern for the unix_gid cache
>       SUNRPC: Hold cd->net for the lifetime of cache files
>       NFSD: Convert nfsd_export_shutdown() to sunrpc_cache_destroy_net()
> 
>  fs/nfsd/export.c             | 45 ++--------------------
>  fs/nfsd/export.h             |  2 -
>  fs/nfsd/nfsctl.c             |  8 +---
>  include/linux/sunrpc/cache.h |  3 +-
>  net/sunrpc/cache.c           | 90 ++++++++++++++++++++++++++++++++++++++++++--
>  net/sunrpc/sunrpc.h          |  2 +
>  net/sunrpc/sunrpc_syms.c     | 23 ++++++-----
>  net/sunrpc/svcauth_unix.c    | 46 ++++++++++++----------
>  8 files changed, 135 insertions(+), 84 deletions(-)
> ---
> base-commit: f3a313ecd1fdab1f5da119db355363b13af6fcac
> change-id: 20260430-cache-uaf-fix-a13000f67c37
> 
> Best regards,
> --  
> Chuck Lever

The series looks sane.

Reviewed-by: Jeff Layton <jlayton@kernel.org>


* Re: [PATCH 0/6] SUNRPC: Address remaining cache_check_rcu() UAF in cache content files
       [not found] <20260501-cache-uaf-fix-v1-0-a49928bf4817@oracle.com>
  2026-05-05  5:32 ` [PATCH 0/6] SUNRPC: Address remaining cache_check_rcu() UAF in cache content files Jeff Layton
@ 2026-05-05 10:49 ` Calum Mackay
  2026-05-05 10:53   ` Chuck Lever
  2026-05-07  9:09 ` yangerkun
  2 siblings, 1 reply; 12+ messages in thread
From: Calum Mackay @ 2026-05-05 10:49 UTC
  To: Chuck Lever, Misbah Anjum N, Jeff Layton, NeilBrown,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
	Anna Schumaker, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Yang Erkun
  Cc: Calum Mackay, linux-nfs, linux-kernel, netdev, Chuck Lever,
	alexandr.alexandrov

On 01/05/2026 3:51 pm, Chuck Lever wrote:
> Misbah Anjum reported a use-after-free in cache_check_rcu()
> reached through e_show() while sosreport was reading
> /proc/fs/nfsd/exports on ppc64le.  Two fixes for that report
> landed in v7.0:
> 
>    48db892356d6 ("NFSD: Defer sub-object cleanup in export put callbacks")
>    e7fcf179b82d ("NFSD: Hold net reference for the lifetime of /proc/fs/nfs/exports fd")
> 
> The original e_show() repro is now fixed.  However, the same
> sosreport workload still reproduces a closely related fault on
> post-v7.0 mainline (Misbah, ppc64le) and on master.20260424
> (internal report, aarch64).  In both cases the fault is in
> cache_check_rcu() reached through c_show() rather than e_show(),
> and the cache_head pointer is plain garbage:
> 
>    pc : cache_check_rcu+0x40 [sunrpc]
>    lr : c_show+0x60 [sunrpc]
>    ...faulting on h->flags off h = 0x0000000200000000
> 
> c_show() is the generic show callback used by
> /proc/net/rpc/<cd>/content for every per-net cache_detail
> (auth.unix.ip, auth.unix.gid, nfsd.fh, nfsd.export).  Two
> bugs combine in that path:
> 
> 1. cache_unregister_net() / cache_destroy_net() free cd and
>     cd->hash_table synchronously when the namespace exits.  The
>     /proc/net/rpc/.../content open path takes only a module
>     reference, so a fd kept open across a netns exit walks a
>     freed hash_table and returns garbage cache_head pointers.
>     This is the same hazard that e7fcf179b82d closed for the
>     /proc/fs/nfs/exports file alone.
> 
> 2. ip_map_put() drops auth_domain_put() before kfree_rcu(), so
>     sub-objects can be freed before the RCU grace period -- the
>     same hazard that 48db892356d6 fixed for svc_export_put() and
>     expkey_put().  unix_gid_put() does not have this bug
>     structurally (its put_group_info() runs inside the call_rcu()
>     callback) but it uses a separate idiom from the other three
>     caches.
> 
> This series replaces the v1 narrow fixes with shared
> infrastructure that covers all four cache_detail .put paths
> and all three per-cache file types:
> 
> Patch 1 hoists nfsd_export_wq up to the sunrpc layer as
> sunrpc_cache_wq, exposed through sunrpc_cache_queue_release()
> and sunrpc_cache_drain() so all four put callbacks share one
> workqueue and one drain primitive.
> 
> Patch 2 converts ip_map_put() to the queue_rcu_work() pattern,
> moving auth_domain_put() into a deferred ip_map_release() that
> runs after the RCU grace period.
> 
> Patch 3 unifies unix_gid_put() onto the same pattern for
> consistency (not a bug fix on its own).
> 
> Patch 4 takes a get_net(cd->net) in content_open(), cache_open(),
> and open_flush() and drops it in the matching release helpers,
> so cache_destroy_net() cannot run while a sunrpc cache fd is
> open.
> 
> Series has been compile-tested only.
> 
> ---
> Chuck Lever (6):
>        SUNRPC: Move cache_initialize() declaration to sunrpc-private header
>        SUNRPC: Provide a shared workqueue for cache release callbacks
>        SUNRPC: Defer ip_map sub-object cleanup past RCU grace period
>        SUNRPC: Use shared release pattern for the unix_gid cache
>        SUNRPC: Hold cd->net for the lifetime of cache files
>        NFSD: Convert nfsd_export_shutdown() to sunrpc_cache_destroy_net()
> 
>   fs/nfsd/export.c             | 45 ++--------------------
>   fs/nfsd/export.h             |  2 -
>   fs/nfsd/nfsctl.c             |  8 +---
>   include/linux/sunrpc/cache.h |  3 +-
>   net/sunrpc/cache.c           | 90 ++++++++++++++++++++++++++++++++++++++++++--
>   net/sunrpc/sunrpc.h          |  2 +
>   net/sunrpc/sunrpc_syms.c     | 23 ++++++-----
>   net/sunrpc/svcauth_unix.c    | 46 ++++++++++++----------
>   8 files changed, 135 insertions(+), 84 deletions(-)
> ---
> base-commit: f3a313ecd1fdab1f5da119db355363b13af6fcac
> change-id: 20260430-cache-uaf-fix-a13000f67c37
> 
> Best regards,
> --
> Chuck Lever
> 
> 

Looks good Chuck, thanks very much.

With these patches, testing shows no crashes, sosreport no longer hangs, 
no seq_file errors.

Tested-by: Alexandr Alexandrov <alexandr.alexandrov@oracle.com>

cheers,
c.



* Re: [PATCH 0/6] SUNRPC: Address remaining cache_check_rcu() UAF in cache content files
  2026-05-05 10:49 ` Calum Mackay
@ 2026-05-05 10:53   ` Chuck Lever
  0 siblings, 0 replies; 12+ messages in thread
From: Chuck Lever @ 2026-05-05 10:53 UTC
  To: Calum Mackay, Misbah Anjum N, Jeff Layton, NeilBrown,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
	Anna Schumaker, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Yang Erkun
  Cc: linux-nfs, linux-kernel, netdev, Chuck Lever, alexandr.alexandrov

On 5/5/26 12:49 PM, Calum Mackay wrote:
> On 01/05/2026 3:51 pm, Chuck Lever wrote:
>> Misbah Anjum reported a use-after-free in cache_check_rcu()
>> reached through e_show() while sosreport was reading
>> /proc/fs/nfsd/exports on ppc64le.  Two fixes for that report
>> landed in v7.0:
>>
>>    48db892356d6 ("NFSD: Defer sub-object cleanup in export put callbacks")
>>    e7fcf179b82d ("NFSD: Hold net reference for the lifetime of /proc/fs/nfs/exports fd")
>>
>> The original e_show() repro is now fixed.  However, the same
>> sosreport workload still reproduces a closely related fault on
>> post-v7.0 mainline (Misbah, ppc64le) and on master.20260424
>> (internal report, aarch64).  In both cases the fault is in
>> cache_check_rcu() reached through c_show() rather than e_show(),
>> and the cache_head pointer is plain garbage:
>>
>>    pc : cache_check_rcu+0x40 [sunrpc]
>>    lr : c_show+0x60 [sunrpc]
>>    ...faulting on h->flags off h = 0x0000000200000000
>>
>> c_show() is the generic show callback used by
>> /proc/net/rpc/<cd>/content for every per-net cache_detail
>> (auth.unix.ip, auth.unix.gid, nfsd.fh, nfsd.export).  Two
>> bugs combine in that path:
>>
>> 1. cache_unregister_net() / cache_destroy_net() free cd and
>>     cd->hash_table synchronously when the namespace exits.  The
>>     /proc/net/rpc/.../content open path takes only a module
>>     reference, so a fd kept open across a netns exit walks a
>>     freed hash_table and returns garbage cache_head pointers.
>>     This is the same hazard that e7fcf179b82d closed for the
>>     /proc/fs/nfs/exports file alone.
>>
>> 2. ip_map_put() drops auth_domain_put() before kfree_rcu(), so
>>     sub-objects can be freed before the RCU grace period -- the
>>     same hazard that 48db892356d6 fixed for svc_export_put() and
>>     expkey_put().  unix_gid_put() does not have this bug
>>     structurally (its put_group_info() runs inside the call_rcu()
>>     callback) but it uses a separate idiom from the other three
>>     caches.
>>
>> This series replaces the v1 narrow fixes with shared
>> infrastructure that covers all four cache_detail .put paths
>> and all three per-cache file types:
>>
>> Patch 1 hoists nfsd_export_wq up to the sunrpc layer as
>> sunrpc_cache_wq, exposed through sunrpc_cache_queue_release()
>> and sunrpc_cache_drain() so all four put callbacks share one
>> workqueue and one drain primitive.
>>
>> Patch 2 converts ip_map_put() to the queue_rcu_work() pattern,
>> moving auth_domain_put() into a deferred ip_map_release() that
>> runs after the RCU grace period.
>>
>> Patch 3 unifies unix_gid_put() onto the same pattern for
>> consistency (not a bug fix on its own).
>>
>> Patch 4 takes a get_net(cd->net) in content_open(), cache_open(),
>> and open_flush() and drops it in the matching release helpers,
>> so cache_destroy_net() cannot run while a sunrpc cache fd is
>> open.
>>
>> Series has been compile-tested only.
>>
>> ---
>> Chuck Lever (6):
>>        SUNRPC: Move cache_initialize() declaration to sunrpc-private header
>>        SUNRPC: Provide a shared workqueue for cache release callbacks
>>        SUNRPC: Defer ip_map sub-object cleanup past RCU grace period
>>        SUNRPC: Use shared release pattern for the unix_gid cache
>>        SUNRPC: Hold cd->net for the lifetime of cache files
>>        NFSD: Convert nfsd_export_shutdown() to sunrpc_cache_destroy_net()
>>
>>   fs/nfsd/export.c             | 45 ++--------------------
>>   fs/nfsd/export.h             |  2 -
>>   fs/nfsd/nfsctl.c             |  8 +---
>>   include/linux/sunrpc/cache.h |  3 +-
>>   net/sunrpc/cache.c           | 90 ++++++++++++++++++++++++++++++++++++++++++--
>>   net/sunrpc/sunrpc.h          |  2 +
>>   net/sunrpc/sunrpc_syms.c     | 23 ++++++-----
>>   net/sunrpc/svcauth_unix.c    | 46 ++++++++++++----------
>>   8 files changed, 135 insertions(+), 84 deletions(-)
>> ---
>> base-commit: f3a313ecd1fdab1f5da119db355363b13af6fcac
>> change-id: 20260430-cache-uaf-fix-a13000f67c37
>>
>> Best regards,
>> -- 
>> Chuck Lever
>>
>>
> 
> Looks good Chuck, thanks very much.
> 
> With these patches, testing shows no crashes, sosreport no longer hangs,
> no seq_file errors.
> 
> Tested-by: Alexandr Alexandrov <alexandr.alexandrov@oracle.com>
> 
> cheers,
> c.
> 

Excellent; pushed with Jeff's R-b and Alexandr's T-b.


-- 
Chuck Lever


* Re: [PATCH 0/6] SUNRPC: Address remaining cache_check_rcu() UAF in cache content files
       [not found] <20260501-cache-uaf-fix-v1-0-a49928bf4817@oracle.com>
  2026-05-05  5:32 ` [PATCH 0/6] SUNRPC: Address remaining cache_check_rcu() UAF in cache content files Jeff Layton
  2026-05-05 10:49 ` Calum Mackay
@ 2026-05-07  9:09 ` yangerkun
  2026-05-07 16:12   ` Chuck Lever
  2 siblings, 1 reply; 12+ messages in thread
From: yangerkun @ 2026-05-07  9:09 UTC
  To: Chuck Lever, Misbah Anjum N, Jeff Layton, NeilBrown,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
	Anna Schumaker, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: linux-nfs, linux-kernel, netdev, Chuck Lever, yangerkun

Hi,

On 2026/5/1 22:51, Chuck Lever wrote:
> Misbah Anjum reported a use-after-free in cache_check_rcu()
> reached through e_show() while sosreport was reading
> /proc/fs/nfsd/exports on ppc64le.  Two fixes for that report
> landed in v7.0:
> 
>    48db892356d6 ("NFSD: Defer sub-object cleanup in export put callbacks")
>    e7fcf179b82d ("NFSD: Hold net reference for the lifetime of /proc/fs/nfs/exports fd")

Back to the problem fixed by these patches: I'm a little confused about
how this UAF can be triggered.

Before these patches, svc_export_put() looked as follows:

  static void svc_export_put(struct kref *ref)
  {
          struct svc_export *exp = container_of(ref, struct svc_export, h.ref);

          path_put(&exp->ex_path);
          auth_domain_put(exp->ex_client);
          call_rcu(&exp->ex_rcu, svc_export_release);
  }

auth_domain_put() releases ->name using call_rcu(), and path_put() may
also release the dentry via call_rcu(). All of this seems to prevent
e_show() from causing a UAF. Could you point out which line in d_path()
triggers the issue?

Thanks,
Erkun.


> 
> The original e_show() repro is now fixed.  However, the same
> sosreport workload still reproduces a closely related fault on
> post-v7.0 mainline (Misbah, ppc64le) and on master.20260424
> (internal report, aarch64).  In both cases the fault is in
> cache_check_rcu() reached through c_show() rather than e_show(),
> and the cache_head pointer is plain garbage:
> 
>    pc : cache_check_rcu+0x40 [sunrpc]
>    lr : c_show+0x60 [sunrpc]
>    ...faulting on h->flags off h = 0x0000000200000000
> 
> c_show() is the generic show callback used by
> /proc/net/rpc/<cd>/content for every per-net cache_detail
> (auth.unix.ip, auth.unix.gid, nfsd.fh, nfsd.export).  Two
> bugs combine in that path:
> 
> 1. cache_unregister_net() / cache_destroy_net() free cd and
>     cd->hash_table synchronously when the namespace exits.  The
>     /proc/net/rpc/.../content open path takes only a module
>     reference, so a fd kept open across a netns exit walks a
>     freed hash_table and returns garbage cache_head pointers.
>     This is the same hazard that e7fcf179b82d closed for the
>     /proc/fs/nfs/exports file alone.
> 
> 2. ip_map_put() drops auth_domain_put() before kfree_rcu(), so
>     sub-objects can be freed before the RCU grace period -- the
>     same hazard that 48db892356d6 fixed for svc_export_put() and
>     expkey_put().  unix_gid_put() does not have this bug
>     structurally (its put_group_info() runs inside the call_rcu()
>     callback) but it uses a separate idiom from the other three
>     caches.
> 
> This series replaces the v1 narrow fixes with shared
> infrastructure that covers all four cache_detail .put paths
> and all three per-cache file types:
> 
> Patch 1 hoists nfsd_export_wq up to the sunrpc layer as
> sunrpc_cache_wq, exposed through sunrpc_cache_queue_release()
> and sunrpc_cache_drain() so all four put callbacks share one
> workqueue and one drain primitive.
> 
> Patch 2 converts ip_map_put() to the queue_rcu_work() pattern,
> moving auth_domain_put() into a deferred ip_map_release() that
> runs after the RCU grace period.
> 
> Patch 3 unifies unix_gid_put() onto the same pattern for
> consistency (not a bug fix on its own).
> 
> Patch 4 takes a get_net(cd->net) in content_open(), cache_open(),
> and open_flush() and drops it in the matching release helpers,
> so cache_destroy_net() cannot run while a sunrpc cache fd is
> open.
> 
> Series has been compile-tested only.
> 
> ---
> Chuck Lever (6):
>        SUNRPC: Move cache_initialize() declaration to sunrpc-private header
>        SUNRPC: Provide a shared workqueue for cache release callbacks
>        SUNRPC: Defer ip_map sub-object cleanup past RCU grace period
>        SUNRPC: Use shared release pattern for the unix_gid cache
>        SUNRPC: Hold cd->net for the lifetime of cache files
>        NFSD: Convert nfsd_export_shutdown() to sunrpc_cache_destroy_net()
> 
>   fs/nfsd/export.c             | 45 ++--------------------
>   fs/nfsd/export.h             |  2 -
>   fs/nfsd/nfsctl.c             |  8 +---
>   include/linux/sunrpc/cache.h |  3 +-
>   net/sunrpc/cache.c           | 90 ++++++++++++++++++++++++++++++++++++++++++--
>   net/sunrpc/sunrpc.h          |  2 +
>   net/sunrpc/sunrpc_syms.c     | 23 ++++++-----
>   net/sunrpc/svcauth_unix.c    | 46 ++++++++++++----------
>   8 files changed, 135 insertions(+), 84 deletions(-)
> ---
> base-commit: f3a313ecd1fdab1f5da119db355363b13af6fcac
> change-id: 20260430-cache-uaf-fix-a13000f67c37
> 
> Best regards,
> --
> Chuck Lever
> 
> 
> 



* Re: [PATCH 0/6] SUNRPC: Address remaining cache_check_rcu() UAF in cache content files
  2026-05-07  9:09 ` yangerkun
@ 2026-05-07 16:12   ` Chuck Lever
  2026-05-08  2:45     ` yangerkun
  0 siblings, 1 reply; 12+ messages in thread
From: Chuck Lever @ 2026-05-07 16:12 UTC
  To: yangerkun, Misbah Anjum N, Jeff Layton, NeilBrown,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
	Anna Schumaker, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: linux-nfs, linux-kernel, netdev, Chuck Lever

Hello Erkun -

On Thu, May 7, 2026, at 11:09 AM, yangerkun wrote:
> Hi,
>
> On 2026/5/1 22:51, Chuck Lever wrote:
>> Misbah Anjum reported a use-after-free in cache_check_rcu()
>> reached through e_show() while sosreport was reading
>> /proc/fs/nfsd/exports on ppc64le.  Two fixes for that report
>> landed in v7.0:
>> 
>>    48db892356d6 ("NFSD: Defer sub-object cleanup in export put callbacks")
>>    e7fcf179b82d ("NFSD: Hold net reference for the lifetime of /proc/fs/nfs/exports fd")
>
> Back to the problem fixed by these patches: I'm a little confused about
> how this UAF can be triggered.
>
> Before these patches, svc_export_put() looked as follows:
>
>   static void svc_export_put(struct kref *ref)
>   {
>           struct svc_export *exp = container_of(ref, struct svc_export, h.ref);
>
>           path_put(&exp->ex_path);
>           auth_domain_put(exp->ex_client);
>           call_rcu(&exp->ex_rcu, svc_export_release);
>   }
>
> auth_domain_put() releases ->name using call_rcu(), and path_put() may
> also release the dentry via call_rcu(). All of this seems to prevent
> e_show() from causing a UAF. Could you point out which line in d_path()
> triggers the issue?

The dentry, the mount, and the auth_domain ->name buffer all
end up RCU-freed (dentry_free() and delayed_free_vfsmnt in
fs/, svcauth_unix_domain_release_rcu() in svcauth_unix.c).
The eventual kfree isn't the problem.

The problem is the synchronous teardown inside path_put(),
which runs before svc_export_put() ever reaches its own
call_rcu():

  path_put(&exp->ex_path)
    -> dput(dentry)
       -> __dentry_kill()              [if last ref]
          -> __d_drop()                /* unhashes */
          -> dentry_unlink_inode()     /* d_inode = NULL */
          -> d_op->d_release() if set
          -> drops parent d_lockref    /* may cascade up */
          -> dentry_free()             /* call_rcu deferred */
    -> mntput(mnt)                     /* deferred via task_work */

The dentry pointer itself is RCU-safe, so prepend_path()'s walk
of d_parent and d_name doesn't read freed memory.  But by the
time the reader gets there, __d_clear_type_and_inode() has
already stored NULL into d_inode, __d_drop() has broken the
hash linkage, and the parent's d_lockref has been decremented
-- which can in turn fire __dentry_kill() on the parent, and
on up the tree.  An e_show() that's still inside its cache RCU
read section walks into that half-dismantled state through
seq_path(), and that's the NULL deref Misbah reported.
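
A toy userspace model of the state described above, not kernel code:
the memory behind the pointer stays valid until the (simulated) grace
period ends, but the fields inside it are dismantled immediately.  All
names (toy_dentry, toy_dentry_kill(), toy_grace_period_end(),
toy_reader_ino()) are hypothetical stand-ins, and the sketch only
illustrates the difference between pointer lifetime and field contents;
it does not show which kernel line faults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are kernel types. */
struct toy_inode {
	int ino;
};

struct toy_dentry {
	struct toy_inode *d_inode;	/* cleared synchronously on kill */
	bool hashed;			/* cleared synchronously on kill */
	bool free_pending;		/* the free itself is deferred */
};

/*
 * Models the synchronous half of teardown: the fields are dismantled
 * right away, while the memory is only marked for a later free.
 */
void toy_dentry_kill(struct toy_dentry *d)
{
	assert(d != NULL);
	d->hashed = false;
	d->d_inode = NULL;
	d->free_pending = true;
}

/* Models the end of the grace period: only now is the memory released. */
void toy_grace_period_end(struct toy_dentry *d)
{
	assert(d != NULL);
	if (d->free_pending)
		free(d);
}

/*
 * A reader that still holds the pointer.  Reading d->d_inode is safe
 * because the memory is still allocated, but the value may be NULL.
 * Returns -1 when the dentry has already been dismantled.
 */
int toy_reader_ino(const struct toy_dentry *d)
{
	const struct toy_inode *inode = d->d_inode;

	return inode ? inode->ino : -1;
}
```

A reader that skipped the NULL check in toy_reader_ino() would fault
after toy_dentry_kill() even though the toy_dentry itself has not been
freed yet.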

The earlier fix (2530766492ec, "nfsd: fix UAF when access
ex_uuid or ex_stats") moved the kfree of ex_uuid and ex_stats
into svc_export_release() so those are RCU-safe now.
path_put() and auth_domain_put() couldn't go in there because
both may sleep, and call_rcu callbacks run in softirq context.
This series uses queue_rcu_work() instead: it defers past the
grace period AND runs the callback in process context, so the
sleeping puts move into the deferred path and the window
closes.
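
The shape of that deferral can be sketched in plain userspace C.  This
is a minimal model under stated assumptions, not the code in the
series: toy_export, toy_release_queue, toy_export_put(),
toy_export_release() and toy_queue_drain() are hypothetical names, the
two counters stand in for the references dropped by path_put() and
auth_domain_put(), and calling toy_queue_drain() stands in for "the
grace period has elapsed and the worker is running".

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PENDING 8

/* Hypothetical stand-in for a cache entry holding two sub-object refs. */
struct toy_export {
	int path_refs;		/* models the ref dropped by path_put() */
	int domain_refs;	/* models the ref dropped by auth_domain_put() */
};

/* Pending releases; models work queued to run after a grace period. */
struct toy_release_queue {
	struct toy_export *pending[TOY_MAX_PENDING];
	size_t count;
};

/* Drops the sub-object references.  Runs only from the drain step. */
void toy_export_release(struct toy_export *exp)
{
	exp->path_refs--;
	exp->domain_refs--;
}

/*
 * The put callback touches no sub-objects; it only queues the entry.
 * Returns 0 on success, -1 if the toy queue is full.
 */
int toy_export_put(struct toy_release_queue *q, struct toy_export *exp)
{
	assert(q != NULL && exp != NULL);
	if (q->count == TOY_MAX_PENDING)
		return -1;
	q->pending[q->count++] = exp;
	return 0;
}

/*
 * Called once every reader is done.  Releases all queued entries and
 * returns how many were released.
 */
size_t toy_queue_drain(struct toy_release_queue *q)
{
	size_t released = q->count;

	for (size_t i = 0; i < q->count; i++)
		toy_export_release(q->pending[i]);
	q->count = 0;
	return released;
}
```

Between toy_export_put() and toy_queue_drain() both counters are
unchanged, which is the property a reader still inside its read
section relies on.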


-- 
Chuck Lever


* Re: [PATCH 0/6] SUNRPC: Address remaining cache_check_rcu() UAF in cache content files
  2026-05-07 16:12   ` Chuck Lever
@ 2026-05-08  2:45     ` yangerkun
  2026-05-08  3:08       ` yangerkun
  0 siblings, 1 reply; 12+ messages in thread
From: yangerkun @ 2026-05-08  2:45 UTC
  To: Chuck Lever, Misbah Anjum N, Jeff Layton, NeilBrown,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
	Anna Schumaker, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, yi.zhang
  Cc: linux-nfs, linux-kernel, netdev, Chuck Lever

Hello Chuck,

On 2026/5/8 0:12, Chuck Lever wrote:
> Hello Erkun -
> 
> On Thu, May 7, 2026, at 11:09 AM, yangerkun wrote:
>> Hi,
>>
>> On 2026/5/1 22:51, Chuck Lever wrote:
>>> Misbah Anjum reported a use-after-free in cache_check_rcu()
>>> reached through e_show() while sosreport was reading
>>> /proc/fs/nfsd/exports on ppc64le.  Two fixes for that report
>>> landed in v7.0:
>>>
>>>     48db892356d6 ("NFSD: Defer sub-object cleanup in export put callbacks")
>>>     e7fcf179b82d ("NFSD: Hold net reference for the lifetime of /proc/fs/nfs/exports fd")
>>
>> Back to the problem fixed by these patches: I'm a little confused about
>> how this UAF can be triggered.
>>
>> Before these patches, svc_export_put() looked as follows:
>>
>>   static void svc_export_put(struct kref *ref)
>>   {
>>           struct svc_export *exp = container_of(ref, struct svc_export, h.ref);
>>
>>           path_put(&exp->ex_path);
>>           auth_domain_put(exp->ex_client);
>>           call_rcu(&exp->ex_rcu, svc_export_release);
>>   }
>>
>> auth_domain_put() releases ->name using call_rcu(), and path_put() may
>> also release the dentry via call_rcu(). All of this seems to prevent
>> e_show() from causing a UAF. Could you point out which line in d_path()
>> triggers the issue?
> 
> The dentry, the mount, and the auth_domain ->name buffer all
> end up RCU-freed (dentry_free() and delayed_free_vfsmnt in
> fs/, svcauth_unix_domain_release_rcu() in svcauth_unix.c).
> The eventual kfree isn't the problem.
> 
> The problem is the synchronous teardown inside path_put(),
> which runs before svc_export_put() ever reaches its own
> call_rcu():
> 
>    path_put(&exp->ex_path)
>      -> dput(dentry)
>         -> __dentry_kill()              [if last ref]
>            -> __d_drop()                /* unhashes */
>            -> dentry_unlink_inode()     /* d_inode = NULL */
>            -> d_op->d_release() if set
>            -> drops parent d_lockref    /* may cascade up */
>            -> dentry_free()             /* call_rcu deferred */
>      -> mntput(mnt)                     /* deferred via task_work */
> 
> The dentry pointer itself is RCU-safe, so prepend_path()'s walk
> of d_parent and d_name doesn't read freed memory.  But by the
> time the reader gets there, __d_clear_type_and_inode() has
> already stored NULL into d_inode, __d_drop() has broken the
> hash linkage, and the parent's d_lockref has been decremented
> -- which can in turn fire __dentry_kill() on the parent, and
> on up the tree.  An e_show() that's still inside its cache RCU
> read section walks into that half-dismantled state through
> seq_path(), and that's the NULL deref Misbah reported.

Thank you for your detailed explanation! Yes, e_show() might be called
while the state is partially dismantled, but after carefully reviewing
the code from dput() up to __dentry_kill(), I still cannot find anything
that could cause this issue. Additionally, the comments for prepend_path()
indicate that it already takes into account that the dentry can be
removed concurrently. I have also run some tests on my arm64 QEMU, but I
couldn't reproduce the problem either. Could you please help me identify
the specific line or pointer in the dentry that triggers this
use-after-free or NULL pointer dereference?

Maybe I am not very familiar with the code, which is why I failed to
identify the real root cause. I'm sorry for that.


char *d_path(const struct path *path, char *buf, int buflen)
{
        DECLARE_BUFFER(b, buf, buflen);
        struct path root;

        /*
         * We have various synthetic filesystems that never get mounted.  On
         * these filesystems dentries are never used for lookup purposes, and
         * thus don't need to be hashed.  They also don't need a name until a
         * user wants to identify the object in /proc/pid/fd/.  The little hack
         * below allows us to generate a name for these objects on demand:
         *
         * Some pseudo inodes are mountable.  When they are mounted
         * path->dentry == path->mnt->mnt_root.  In that case don't call d_dname
         * and instead have d_path return the mounted path.
         */
        if (path->dentry->d_op && path->dentry->d_op->d_dname &&
            (!IS_ROOT(path->dentry) || path->dentry != path->mnt->mnt_root))
                return path->dentry->d_op->d_dname(path->dentry, buf, buflen);

        rcu_read_lock();
        get_fs_root_rcu(current->fs, &root);
        if (unlikely(d_unlinked(path->dentry)))
                prepend(&b, " (deleted)", 11);
        else
                prepend_char(&b, 0);
        prepend_path(path, &root, &b);
        rcu_read_unlock();

        return extract_string(&b);
}


> 
> The earlier fix (2530766492ec, "nfsd: fix UAF when access
> ex_uuid or ex_stats") moved the kfree of ex_uuid and ex_stats
> into svc_export_release() so those are RCU-safe now.
> path_put() and auth_domain_put() couldn't go in there because
> both may sleep, and call_rcu callbacks run in softirq context.
> This series uses queue_rcu_work() instead: it defers past the
> grace period AND runs the callback in process context, so the
> sleeping puts move into the deferred path and the window
> closes.

Yeah, I get this part! Thanks again for your detailed explanation!

Thanks,
Erkun.

> 
> 



* Re: [PATCH 0/6] SUNRPC: Address remaining cache_check_rcu() UAF in cache content files
  2026-05-08  2:45     ` yangerkun
@ 2026-05-08  3:08       ` yangerkun
  2026-05-08  8:16         ` yangerkun
  0 siblings, 1 reply; 12+ messages in thread
From: yangerkun @ 2026-05-08  3:08 UTC
  To: Chuck Lever, Misbah Anjum N, Jeff Layton, NeilBrown,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
	Anna Schumaker, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, yi.zhang
  Cc: linux-nfs, linux-kernel, netdev, Chuck Lever



On 2026/5/8 10:45, yangerkun wrote:
> Hello Chuck,
> 
> On 2026/5/8 0:12, Chuck Lever wrote:
>> Hello Erkun -
>>
>> On Thu, May 7, 2026, at 11:09 AM, yangerkun wrote:
>>> Hi,
>>>
>>> On 2026/5/1 22:51, Chuck Lever wrote:
>>>> Misbah Anjum reported a use-after-free in cache_check_rcu()
>>>> reached through e_show() while sosreport was reading
>>>> /proc/fs/nfsd/exports on ppc64le.  Two fixes for that report
>>>> landed in v7.0:
>>>>
>>>>     48db892356d6 ("NFSD: Defer sub-object cleanup in export put callbacks")
>>>>     e7fcf179b82d ("NFSD: Hold net reference for the lifetime of /proc/fs/nfs/exports fd")
>>>
>>> Back to the problem fixed by these patches: I'm a little confused about
>>> how this UAF can be triggered.
>>>
>>> Before these patches, svc_export_put() looked as follows:
>>>
>>>   static void svc_export_put(struct kref *ref)
>>>   {
>>>           struct svc_export *exp = container_of(ref, struct svc_export, h.ref);
>>>
>>>           path_put(&exp->ex_path);
>>>           auth_domain_put(exp->ex_client);
>>>           call_rcu(&exp->ex_rcu, svc_export_release);
>>>   }
>>>
>>> auth_domain_put() releases ->name using call_rcu(), and path_put() may
>>> also release the dentry via call_rcu(). All of this seems to prevent
>>> e_show() from causing a UAF. Could you point out which line in d_path()
>>> triggers the issue?
>>
>> The dentry, the mount, and the auth_domain ->name buffer all
>> end up RCU-freed (dentry_free() and delayed_free_vfsmnt in
>> fs/, svcauth_unix_domain_release_rcu() in svcauth_unix.c).
>> The eventual kfree isn't the problem.
>>
>> The problem is the synchronous teardown inside path_put(),
>> which runs before svc_export_put() ever reaches its own
>> call_rcu():
>>
>>    path_put(&exp->ex_path)
>>      -> dput(dentry)
>>         -> __dentry_kill()              [if last ref]
>>            -> __d_drop()                /* unhashes */
>>            -> dentry_unlink_inode()     /* d_inode = NULL */
>>            -> d_op->d_release() if set
>>            -> drops parent d_lockref    /* may cascade up */
>>            -> dentry_free()             /* call_rcu deferred */
>>      -> mntput(mnt)                     /* deferred via task_work */
>>
>> The dentry pointer itself is RCU-safe, so prepend_path()'s walk
>> of d_parent and d_name doesn't read freed memory.  But by the
>> time the reader gets there, __d_clear_type_and_inode() has
>> already stored NULL into d_inode, __d_drop() has broken the
>> hash linkage, and the parent's d_lockref has been decremented
>> -- which can in turn fire __dentry_kill() on the parent, and
>> on up the tree.  An e_show() that's still inside its cache RCU
>> read section walks into that half-dismantled state through
>> seq_path(), and that's the NULL deref Misbah reported.
> 
> Thank you for your detailed explanation! Yes, e_show might be called 
> when the state is partially dismantled, but after carefully reviewing 
> the code with dput up to __dentry_kill, I still cannot find anything 
> that could cause this issue. Additionally, the comments for prepend_path 
> indicate that they have already taken into account that the dentry can 
> be removed concurrently. I have also run some tests on my arm64 QEMU, 
> but I couldn't reproduce the problem either. Could you please help me 
> identify the specific line or pointer in the dentry that triggers this 
> use-after-free or null pointer issue?
> 
> Maybe I am not be very familiar with the code, which caused me to fail 
> to identify the real root cause. I'm so sorry for that.
> 
> 
> 265 char *d_path(const struct path *path, char *buf, int buflen)
> 266 {
> 267         DECLARE_BUFFER(b, buf, buflen);
> 268         struct path root;
> 269
> 270         /*
> 271          * We have various synthetic filesystems that never get 
> mounted.  On
> 272          * these filesystems dentries are never used for lookup 
> purposes, and
> 273          * thus don't need to be hashed.  They also don't need a 
> name until a
> 274          * user wants to identify the object in /proc/pid/fd/.  The 
> little hack
> 275          * below allows us to generate a name for these objects on 
> demand:
> 276          *
> 277          * Some pseudo inodes are mountable.  When they are mounted
> 278          * path->dentry == path->mnt->mnt_root.  In that case don't 
> call d_dname
> 279          * and instead have d_path return the mounted path.
> 280          */
> 281         if (path->dentry->d_op && path->dentry->d_op->d_dname &&
> 282             (!IS_ROOT(path->dentry) || path->dentry != path->mnt- 
>  >mnt_root))
> 283                 return path->dentry->d_op->d_dname(path->dentry, 
> buf, buflen);
> 284
> 285         rcu_read_lock();
> 286         get_fs_root_rcu(current->fs, &root);
> 287         if (unlikely(d_unlinked(path->dentry)))
> 288                 prepend(&b, " (deleted)", 11);
> 289         else
> 290                 prepend_char(&b, 0);
> 291         prepend_path(path, &root, &b);
> 292         rcu_read_unlock();
> 293
> 294         return extract_string(&b);
> 295 }
> 
> 
>>
>> The earlier fix (2530766492ec, "nfsd: fix UAF when access
>> ex_uuid or ex_stats") moved the kfree of ex_uuid and ex_stats
>> into svc_export_release() so those are RCU-safe now.
>> path_put() and auth_domain_put() couldn't go in there because
>> both may sleep, and call_rcu callbacks run in softirq context.
>> This series uses queue_rcu_work() instead: it defers past the
>> grace period AND runs the callback in process context, so the
>> sleeping puts move into the deferred path and the window
>> closes.
> 
> Yeah, I can get this! Thanks again for your detail explanation!

Also, could the scenario described in this commit be triggered again?

commit 69d803c40edeaf94089fbc8751c9b746cdc35044
Author: Yang Erkun <yangerkun@huawei.com>
Date:   Mon Dec 16 22:21:52 2024 +0800

     nfsd: Revert "nfsd: release svc_expkey/svc_export with rcu_work"

     This reverts commit f8c989a0c89a75d30f899a7cabdc14d72522bb8d.

     Before this commit, svc_export_put or expkey_put will call path_put with
     sync mode. After this commit, path_put will be called with async mode.
     And this can lead the unexpected results show as follow.

     mkfs.xfs -f /dev/sda
     echo "/ *(rw,no_root_squash,fsid=0)" > /etc/exports
     echo "/mnt *(rw,no_root_squash,fsid=1)" >> /etc/exports
     exportfs -ra
     service nfs-server start
     mount -t nfs -o vers=4.0 127.0.0.1:/mnt /mnt1
     mount /dev/sda /mnt/sda
     touch /mnt1/sda/file
     exportfs -r
     umount /mnt/sda # failed unexcepted

     The touch will finally call nfsd_cross_mnt, add refcount to mount, and
     then add cache_head. Before this commit, exportfs -r will call
     cache_flush to cleanup all cache_head, and path_put in
     svc_export_put/expkey_put will be finished with sync mode. So, the
     latter umount will always success. However, after this commit, path_put
     will be called with async mode, the latter umount may failed, and if
     we add some delay, umount will success too. Personally I think this bug
     and should be fixed. We first revert before bugfix patch, and then fix
     the original bug with a different way.

     Fixes: f8c989a0c89a ("nfsd: release svc_expkey/svc_export with rcu_work")
     Signed-off-by: Yang Erkun <yangerkun@huawei.com>
     Signed-off-by: Chuck Lever <chuck.lever@oracle.com>


> 
> Thanks,
> Erkun.
> 
>>
>>
> 



* Re: [PATCH 0/6] SUNRPC: Address remaining cache_check_rcu() UAF in cache content files
  2026-05-08  3:08       ` yangerkun
@ 2026-05-08  8:16         ` yangerkun
  2026-05-08 13:00           ` yangerkun
  0 siblings, 1 reply; 12+ messages in thread
From: yangerkun @ 2026-05-08  8:16 UTC (permalink / raw)
  To: Chuck Lever, Misbah Anjum N, Jeff Layton, NeilBrown,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
	Anna Schumaker, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, yi.zhang, Zhihao Cheng, Li Lingfeng
  Cc: linux-nfs, linux-kernel, netdev, Chuck Lever



On 2026/5/8 11:08, yangerkun wrote:
> 
> 
> On 2026/5/8 10:45, yangerkun wrote:
>> Hello  Chuck,
>>
>> On 2026/5/8 0:12, Chuck Lever wrote:
>>> Hello Erkun -
>>>
>>> On Thu, May 7, 2026, at 11:09 AM, yangerkun wrote:
>>>> Hi,
>>>>
>>>> On 2026/5/1 22:51, Chuck Lever wrote:
>>>>> Misbah Anjum reported a use-after-free in cache_check_rcu()
>>>>> reached through e_show() while sosreport was reading
>>>>> /proc/fs/nfsd/exports on ppc64le.  Two fixes for that report
>>>>> landed in v7.0:
>>>>>
>>>>>     48db892356d6 ("NFSD: Defer sub-object cleanup in export put 
>>>>> callbacks")
>>>>>     e7fcf179b82d ("NFSD: Hold net reference for the lifetime of / 
>>>>> proc/fs/nfs/exports fd")
>>>>
>>>> Back to the problem fixed by this patches, I'm a little confused why
>>>> this UAF can be trigged.
>>>>
>>>> Before this patches, svc_export_put show as follow:
>>>>
>>>>    368 static void svc_export_put(struct kref *ref)
>>>>    369 {
>>>>    370         struct svc_export *exp = container_of(ref, struct
>>>> svc_export, h.ref);
>>>>    371
>>>>    372         path_put(&exp->ex_path);
>>>>    373         auth_domain_put(exp->ex_client);
>>>>    374         call_rcu(&exp->ex_rcu, svc_export_release);
>>>>    375 }
>>>>
>>>> The auth_domain_put function releases ->name using call_rcu, and
>>>> path_put may release the dentry also via call_rcu. All of this seems to
>>>> prevent e_show from causing a UAF. Could you point out which line in
>>>> d_path triggers the issue?
>>>
>>> The dentry, the mount, and the auth_domain ->name buffer all
>>> end up RCU-freed (dentry_free() and delayed_free_vfsmnt in
>>> fs/, svcauth_unix_domain_release_rcu() in svcauth_unix.c).
>>> The eventual kfree isn't the problem.
>>>
>>> The problem is the synchronous teardown inside path_put(),
>>> which runs before svc_export_put() ever reaches its own
>>> call_rcu():
>>>
>>>    path_put(&exp->ex_path)
>>>      -> dput(dentry)
>>>         -> __dentry_kill()              [if last ref]
>>>            -> __d_drop()                /* unhashes */
>>>            -> dentry_unlink_inode()     /* d_inode = NULL */
>>>            -> d_op->d_release() if set
>>>            -> drops parent d_lockref    /* may cascade up */
>>>            -> dentry_free()             /* call_rcu deferred */
>>>      -> mntput(mnt)                     /* deferred via task_work */
>>>
>>> The dentry pointer itself is RCU-safe, so prepend_path()'s walk
>>> of d_parent and d_name doesn't read freed memory.  But by the
>>> time the reader gets there, __d_clear_type_and_inode() has
>>> already stored NULL into d_inode, __d_drop() has broken the
>>> hash linkage, and the parent's d_lockref has been decremented
>>> -- which can in turn fire __dentry_kill() on the parent, and
>>> on up the tree.  An e_show() that's still inside its cache RCU
>>> read section walks into that half-dismantled state through
>>> seq_path(), and that's the NULL deref Misbah reported.
>>
>> Thank you for your detailed explanation! Yes, e_show might be called 
>> when the state is partially dismantled, but after carefully reviewing 
>> the code with dput up to __dentry_kill, I still cannot find anything 
>> that could cause this issue. Additionally, the comments for 
>> prepend_path indicate that they have already taken into account that 
>> the dentry can be removed concurrently. I have also run some tests on 
>> my arm64 QEMU, but I couldn't reproduce the problem either. Could you 
>> please help me identify the specific line or pointer in the dentry 
>> that triggers this use-after-free or null pointer issue?
>>
>> Maybe I am not be very familiar with the code, which caused me to fail 
>> to identify the real root cause. I'm so sorry for that.
>>
>>
>> 265 char *d_path(const struct path *path, char *buf, int buflen)
>> 266 {
>> 267         DECLARE_BUFFER(b, buf, buflen);
>> 268         struct path root;
>> 269
>> 270         /*
>> 271          * We have various synthetic filesystems that never get 
>> mounted.  On
>> 272          * these filesystems dentries are never used for lookup 
>> purposes, and
>> 273          * thus don't need to be hashed.  They also don't need a 
>> name until a
>> 274          * user wants to identify the object in /proc/pid/fd/.  
>> The little hack
>> 275          * below allows us to generate a name for these objects on 
>> demand:
>> 276          *
>> 277          * Some pseudo inodes are mountable.  When they are mounted
>> 278          * path->dentry == path->mnt->mnt_root.  In that case 
>> don't call d_dname
>> 279          * and instead have d_path return the mounted path.
>> 280          */
>> 281         if (path->dentry->d_op && path->dentry->d_op->d_dname &&
>> 282             (!IS_ROOT(path->dentry) || path->dentry != path->mnt- 
>>  >mnt_root))
>> 283                 return path->dentry->d_op->d_dname(path->dentry, 
>> buf, buflen);
>> 284
>> 285         rcu_read_lock();
>> 286         get_fs_root_rcu(current->fs, &root);
>> 287         if (unlikely(d_unlinked(path->dentry)))
>> 288                 prepend(&b, " (deleted)", 11);
>> 289         else
>> 290                 prepend_char(&b, 0);
>> 291         prepend_path(path, &root, &b);
>> 292         rcu_read_unlock();
>> 293
>> 294         return extract_string(&b);
>> 295 }
>>
>>
>>>
>>> The earlier fix (2530766492ec, "nfsd: fix UAF when access
>>> ex_uuid or ex_stats") moved the kfree of ex_uuid and ex_stats
>>> into svc_export_release() so those are RCU-safe now.
>>> path_put() and auth_domain_put() couldn't go in there because
>>> both may sleep, and call_rcu callbacks run in softirq context.
>>> This series uses queue_rcu_work() instead: it defers past the
>>> grace period AND runs the callback in process context, so the
>>> sleeping puts move into the deferred path and the window
>>> closes.
>>
>> Yeah, I can get this! Thanks again for your detail explanation!
> 
> Also, could the scenario described in this commit be triggered again?
> 
> commit 69d803c40edeaf94089fbc8751c9b746cdc35044
> Author: Yang Erkun <yangerkun@huawei.com>
> Date:   Mon Dec 16 22:21:52 2024 +0800
> 
>      nfsd: Revert "nfsd: release svc_expkey/svc_export with rcu_work"
> 
>      This reverts commit f8c989a0c89a75d30f899a7cabdc14d72522bb8d.
> 
>      Before this commit, svc_export_put or expkey_put will call path_put 
> with
>      sync mode. After this commit, path_put will be called with async mode.
>      And this can lead the unexpected results show as follow.
> 
>      mkfs.xfs -f /dev/sda
>      echo "/ *(rw,no_root_squash,fsid=0)" > /etc/exports
>      echo "/mnt *(rw,no_root_squash,fsid=1)" >> /etc/exports
>      exportfs -ra
>      service nfs-server start
>      mount -t nfs -o vers=4.0 127.0.0.1:/mnt /mnt1
>      mount /dev/sda /mnt/sda
>      touch /mnt1/sda/file
>      exportfs -r
>      umount /mnt/sda # failed unexcepted
> 
>      The touch will finally call nfsd_cross_mnt, add refcount to mount, and
>      then add cache_head. Before this commit, exportfs -r will call
>      cache_flush to cleanup all cache_head, and path_put in
>      svc_export_put/expkey_put will be finished with sync mode. So, the
>      latter umount will always success. However, after this commit, 
> path_put
>      will be called with async mode, the latter umount may failed, and if
>      we add some delay, umount will success too. Personally I think this 
> bug
>      and should be fixed. We first revert before bugfix patch, and then fix
>      the original bug with a different way.
> 
>      Fixes: f8c989a0c89a ("nfsd: release svc_expkey/svc_export with 
> rcu_work")
>      Signed-off-by: Yang Erkun <yangerkun@huawei.com>
>      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> 
> 

After reviewing these two commits:

e7fcf179b82d NFSD: Hold net reference for the lifetime of /proc/fs/nfs/exports fd
48db892356d6 NFSD: Defer sub-object cleanup in export put callbacks

I believe that the issue described in commit e7fcf179b82d might be the
root cause of the null pointer dereferences mentioned in [1]. This is
because we do not call get_net when opening /proc/fs/nfs/exports. As a
result, when the network namespace exits, nfsd_net_exit is triggered.
If, at the same time, the contents of /proc/fs/nfs/exports are being
read, a use-after-free (UAF) can occur on the struct cache_detail. I
think all three bugs referenced in [1] stem from this issue. Therefore,
commit e7fcf179b82d has already addressed the problem. To prevent the
issue described in commit 69d803c40ede, should we consider reverting
commit 48db892356d6 first? Please let me know if I have misunderstood
any aspect of this problem.

>>
>> Thanks,
>> Erkun.
>>
>>>
>>>
>>
> 
> 
> 



* Re: [PATCH 0/6] SUNRPC: Address remaining cache_check_rcu() UAF in cache content files
  2026-05-08  8:16         ` yangerkun
@ 2026-05-08 13:00           ` yangerkun
  2026-05-08 20:47             ` Chuck Lever
  0 siblings, 1 reply; 12+ messages in thread
From: yangerkun @ 2026-05-08 13:00 UTC (permalink / raw)
  To: Chuck Lever, Misbah Anjum N, Jeff Layton, NeilBrown,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
	Anna Schumaker, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, yi.zhang, Zhihao Cheng, Li Lingfeng
  Cc: linux-nfs, linux-kernel, netdev, Chuck Lever



On 2026/5/8 16:16, yangerkun wrote:
> 
> 
> On 2026/5/8 11:08, yangerkun wrote:
>>
>>
>> On 2026/5/8 10:45, yangerkun wrote:
>>> Hello  Chuck,
>>>
>>> On 2026/5/8 0:12, Chuck Lever wrote:
>>>> Hello Erkun -
>>>>
>>>> On Thu, May 7, 2026, at 11:09 AM, yangerkun wrote:
>>>>> Hi,
>>>>>
>>>>> On 2026/5/1 22:51, Chuck Lever wrote:
>>>>>> Misbah Anjum reported a use-after-free in cache_check_rcu()
>>>>>> reached through e_show() while sosreport was reading
>>>>>> /proc/fs/nfsd/exports on ppc64le.  Two fixes for that report
>>>>>> landed in v7.0:
>>>>>>
>>>>>>     48db892356d6 ("NFSD: Defer sub-object cleanup in export put 
>>>>>> callbacks")
>>>>>>     e7fcf179b82d ("NFSD: Hold net reference for the lifetime of / 
>>>>>> proc/fs/nfs/exports fd")
>>>>>
>>>>> Back to the problem fixed by this patches, I'm a little confused why
>>>>> this UAF can be trigged.
>>>>>
>>>>> Before this patches, svc_export_put show as follow:
>>>>>
>>>>>    368 static void svc_export_put(struct kref *ref)
>>>>>    369 {
>>>>>    370         struct svc_export *exp = container_of(ref, struct
>>>>> svc_export, h.ref);
>>>>>    371
>>>>>    372         path_put(&exp->ex_path);
>>>>>    373         auth_domain_put(exp->ex_client);
>>>>>    374         call_rcu(&exp->ex_rcu, svc_export_release);
>>>>>    375 }
>>>>>
>>>>> The auth_domain_put function releases ->name using call_rcu, and
>>>>> path_put may release the dentry also via call_rcu. All of this 
>>>>> seems to
>>>>> prevent e_show from causing a UAF. Could you point out which line in
>>>>> d_path triggers the issue?
>>>>
>>>> The dentry, the mount, and the auth_domain ->name buffer all
>>>> end up RCU-freed (dentry_free() and delayed_free_vfsmnt in
>>>> fs/, svcauth_unix_domain_release_rcu() in svcauth_unix.c).
>>>> The eventual kfree isn't the problem.
>>>>
>>>> The problem is the synchronous teardown inside path_put(),
>>>> which runs before svc_export_put() ever reaches its own
>>>> call_rcu():
>>>>
>>>>    path_put(&exp->ex_path)
>>>>      -> dput(dentry)
>>>>         -> __dentry_kill()              [if last ref]
>>>>            -> __d_drop()                /* unhashes */
>>>>            -> dentry_unlink_inode()     /* d_inode = NULL */
>>>>            -> d_op->d_release() if set
>>>>            -> drops parent d_lockref    /* may cascade up */
>>>>            -> dentry_free()             /* call_rcu deferred */
>>>>      -> mntput(mnt)                     /* deferred via task_work */
>>>>
>>>> The dentry pointer itself is RCU-safe, so prepend_path()'s walk
>>>> of d_parent and d_name doesn't read freed memory.  But by the
>>>> time the reader gets there, __d_clear_type_and_inode() has
>>>> already stored NULL into d_inode, __d_drop() has broken the
>>>> hash linkage, and the parent's d_lockref has been decremented
>>>> -- which can in turn fire __dentry_kill() on the parent, and
>>>> on up the tree.  An e_show() that's still inside its cache RCU
>>>> read section walks into that half-dismantled state through
>>>> seq_path(), and that's the NULL deref Misbah reported.
>>>
>>> Thank you for your detailed explanation! Yes, e_show might be called 
>>> when the state is partially dismantled, but after carefully reviewing 
>>> the code with dput up to __dentry_kill, I still cannot find anything 
>>> that could cause this issue. Additionally, the comments for 
>>> prepend_path indicate that they have already taken into account that 
>>> the dentry can be removed concurrently. I have also run some tests on 
>>> my arm64 QEMU, but I couldn't reproduce the problem either. Could you 
>>> please help me identify the specific line or pointer in the dentry 
>>> that triggers this use-after-free or null pointer issue?
>>>
>>> Maybe I am not be very familiar with the code, which caused me to 
>>> fail to identify the real root cause. I'm so sorry for that.
>>>
>>>
>>> 265 char *d_path(const struct path *path, char *buf, int buflen)
>>> 266 {
>>> 267         DECLARE_BUFFER(b, buf, buflen);
>>> 268         struct path root;
>>> 269
>>> 270         /*
>>> 271          * We have various synthetic filesystems that never get 
>>> mounted.  On
>>> 272          * these filesystems dentries are never used for lookup 
>>> purposes, and
>>> 273          * thus don't need to be hashed.  They also don't need a 
>>> name until a
>>> 274          * user wants to identify the object in /proc/pid/fd/. 
>>> The little hack
>>> 275          * below allows us to generate a name for these objects 
>>> on demand:
>>> 276          *
>>> 277          * Some pseudo inodes are mountable.  When they are mounted
>>> 278          * path->dentry == path->mnt->mnt_root.  In that case 
>>> don't call d_dname
>>> 279          * and instead have d_path return the mounted path.
>>> 280          */
>>> 281         if (path->dentry->d_op && path->dentry->d_op->d_dname &&
>>> 282             (!IS_ROOT(path->dentry) || path->dentry != path->mnt- 
>>>  >mnt_root))
>>> 283                 return path->dentry->d_op->d_dname(path->dentry, 
>>> buf, buflen);
>>> 284
>>> 285         rcu_read_lock();
>>> 286         get_fs_root_rcu(current->fs, &root);
>>> 287         if (unlikely(d_unlinked(path->dentry)))
>>> 288                 prepend(&b, " (deleted)", 11);
>>> 289         else
>>> 290                 prepend_char(&b, 0);
>>> 291         prepend_path(path, &root, &b);
>>> 292         rcu_read_unlock();
>>> 293
>>> 294         return extract_string(&b);
>>> 295 }
>>>
>>>
>>>>
>>>> The earlier fix (2530766492ec, "nfsd: fix UAF when access
>>>> ex_uuid or ex_stats") moved the kfree of ex_uuid and ex_stats
>>>> into svc_export_release() so those are RCU-safe now.
>>>> path_put() and auth_domain_put() couldn't go in there because
>>>> both may sleep, and call_rcu callbacks run in softirq context.
>>>> This series uses queue_rcu_work() instead: it defers past the
>>>> grace period AND runs the callback in process context, so the
>>>> sleeping puts move into the deferred path and the window
>>>> closes.
>>>
>>> Yeah, I can get this! Thanks again for your detail explanation!
>>
>> Also, could the scenario described in this commit be triggered again?
>>
>> commit 69d803c40edeaf94089fbc8751c9b746cdc35044
>> Author: Yang Erkun <yangerkun@huawei.com>
>> Date:   Mon Dec 16 22:21:52 2024 +0800
>>
>>      nfsd: Revert "nfsd: release svc_expkey/svc_export with rcu_work"
>>
>>      This reverts commit f8c989a0c89a75d30f899a7cabdc14d72522bb8d.
>>
>>      Before this commit, svc_export_put or expkey_put will call 
>> path_put with
>>      sync mode. After this commit, path_put will be called with async 
>> mode.
>>      And this can lead the unexpected results show as follow.
>>
>>      mkfs.xfs -f /dev/sda
>>      echo "/ *(rw,no_root_squash,fsid=0)" > /etc/exports
>>      echo "/mnt *(rw,no_root_squash,fsid=1)" >> /etc/exports
>>      exportfs -ra
>>      service nfs-server start
>>      mount -t nfs -o vers=4.0 127.0.0.1:/mnt /mnt1
>>      mount /dev/sda /mnt/sda
>>      touch /mnt1/sda/file
>>      exportfs -r
>>      umount /mnt/sda # failed unexcepted
>>
>>      The touch will finally call nfsd_cross_mnt, add refcount to 
>> mount, and
>>      then add cache_head. Before this commit, exportfs -r will call
>>      cache_flush to cleanup all cache_head, and path_put in
>>      svc_export_put/expkey_put will be finished with sync mode. So, the
>>      latter umount will always success. However, after this commit, 
>> path_put
>>      will be called with async mode, the latter umount may failed, and if
>>      we add some delay, umount will success too. Personally I think 
>> this bug
>>      and should be fixed. We first revert before bugfix patch, and 
>> then fix
>>      the original bug with a different way.
>>
>>      Fixes: f8c989a0c89a ("nfsd: release svc_expkey/svc_export with 
>> rcu_work")
>>      Signed-off-by: Yang Erkun <yangerkun@huawei.com>
>>      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>
>>
> 
> After reviewing these two commits:
> 
> e7fcf179b82d NFSD: Hold net reference for the lifetime of /proc/fs/nfs/ 
> exports fd
> 48db892356d6 NFSD: Defer sub-object cleanup in export put callbacks
> 
> I believe that the issue described in commit e7fcf179b82d might be the
> root cause of the null pointer dereferences mentioned in [1]. This is
> because we do not call get_net when opening /proc/fs/nfs/exports. As a
> result, when the network namespace exits, nfsd_net_exit is triggered.
> If, at the same time, the contents of /proc/fs/nfs/exports are being
> read, a use-after-free (UAF) can occur on the struct cache_detail. I
> think all three bugs referenced in [1] stem from this issue. Therefore,
> commit e7fcf179b82d has already addressed the problem. To prevent the
> issue described in commit 69d803c40ede, should we consider reverting
> commit 48db892356d6 first? Please let me know if I have misunderstood
> any aspect of this problem.

Locally, I wrote a test case that reproduces the problem reliably. I also
went back to commit 9189d23b835cec646ba5010db35d1557a77c5857 (which is
before commits 2862eee078a4 "SUNRPC: make sure cache entry active before
cache_show" and be8f982c369c "nfsd: make sure exp active before
svc_export_show"). Even there, a panic can still be triggered without any
actual export path...

> 
>>>
>>> Thanks,
>>> Erkun.
>>>
>>>>
>>>>
>>>
>>
>>
>>
> 
> 
> 



* Re: [PATCH 0/6] SUNRPC: Address remaining cache_check_rcu() UAF in cache content files
  2026-05-08 13:00           ` yangerkun
@ 2026-05-08 20:47             ` Chuck Lever
  2026-05-09  9:41               ` yangerkun
  0 siblings, 1 reply; 12+ messages in thread
From: Chuck Lever @ 2026-05-08 20:47 UTC (permalink / raw)
  To: yangerkun, Misbah Anjum N, Jeff Layton, NeilBrown,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
	Anna Schumaker, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, yi.zhang, Zhihao Cheng, Li Lingfeng
  Cc: linux-nfs, linux-kernel, netdev, Chuck Lever

Hi Erkun -

On Fri, May 8, 2026, at 9:00 AM, yangerkun wrote:
> On 2026/5/8 16:16, yangerkun wrote:
>> 
>> 
>> On 2026/5/8 11:08, yangerkun wrote:
>> After reviewing these two commits:
>> 
>> e7fcf179b82d NFSD: Hold net reference for the lifetime of /proc/fs/nfs/ 
>> exports fd
>> 48db892356d6 NFSD: Defer sub-object cleanup in export put callbacks
>> 
>> I believe that the issue described in commit e7fcf179b82d might be the
>> root cause of the null pointer dereferences mentioned in [1].

That's where I landed too. e7fcf179b82d closed the specific
oops Misbah hit on /proc/fs/nfs/exports. The matching patch
in this series is 5/6 ("SUNRPC: Hold cd->net for the lifetime
of cache files"), which extends the same get_net()/put_net()
guard to the sunrpc cache files at

 /proc/net/rpc/<cache>/{content,channel,flush} .

Those open helpers had the same hole; sosreport just hit the
nfsd-specific file first because it reads /proc/fs/nfsd/exports.
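
To make the guard concrete, here is a minimal userspace sketch of the
pattern. It is not the code from 5/6. All model_* names are hypothetical
stand-ins: the real helpers are get_net() and put_net(), and the real
teardown is the per-net exit path. The sketch assumes a single thread
and only shows that a reference taken at open keeps the per-net state
alive until release.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net: a refcount and a liveness flag. */
struct model_net {
	int refs;        /* references held: the namespace itself + open files */
	bool torn_down;  /* set once the per-net state would have been freed */
};

/* Hypothetical stand-in for an open cache file. */
struct model_file {
	struct model_net *net;
};

static void model_net_init(struct model_net *net)
{
	net->refs = 1;   /* the namespace's own reference */
	net->torn_down = false;
}

/* Analogue of get_net(): pin the namespace. */
static void model_get_net(struct model_net *net)
{
	assert(net->refs > 0);
	net->refs++;
}

/* Analogue of put_net(): teardown runs only when the last ref drops. */
static void model_put_net(struct model_net *net)
{
	assert(net->refs > 0);
	if (--net->refs == 0)
		net->torn_down = true;  /* per-net caches would be freed here */
}

/* Open pins the namespace for the lifetime of the file. */
static void model_open(struct model_file *f, struct model_net *net)
{
	model_get_net(net);
	f->net = net;
}

/* A read is safe only while the per-net state still exists. */
static bool model_read_is_safe(const struct model_file *f)
{
	return !f->net->torn_down;
}

/* Release drops the reference taken at open. */
static void model_release(struct model_file *f)
{
	model_put_net(f->net);
	f->net = NULL;
}
```

In this model, dropping the namespace's own reference while a file is
still open leaves torn_down false, so a read through the open file stays
safe; teardown happens only at model_release().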

Patch 5/6's changelog pins down the deref site you asked
about: cache_check_rcu() faults reading h->flags off a
garbage cache_head returned by __cache_seq_start() walking a
cd->hash_table that cache_destroy_net() already freed. Not a
dentry deref. The dentry-teardown path is a separate failure
mode that 48db892356d6 closed for the export and expkey caches.


>> To prevent the
>> issue described in commit 69d803c40ede, should we consider reverting
>> commit 48db892356d6 first?

Not for this series. Patches 3/6 and 4/6 don't add any new
path_put deferral; their commit messages call them out as
consistency changes, not bug fixes. ip_map holds only an
auth_domain reference and unix_gid holds only a group_info,
so neither cache reaches mntput from the deferred release.
The exportfs-r-then-umount sequence isn't touched by this
series.

The svc_export and svc_expkey path_put deferral lives in
48db892356d6, which is already in v7.0. If the umount window
from 69d803c40ede is still reachable through that commit,
that's a regression in 48db892356d6 and worth a separate
thread.


> Locally, I wrote a stable regression test case. I also reverted to 
> commit 9189d23b835cec646ba5010db35d1557a77c5857 (which is before commits 
> 2862eee078a4 "SUNRPC: make sure cache entry active before cache_show" 
> and be8f982c369c "nfsd: make sure exp active before svc_export_show"). 
> Even then, a panic can still be triggered without any actual export path...

That fits 5/6's failure mode. Without an export no svc_export
or svc_expkey entry is populated, but rpc.mountd reads
auth.unix.ip/content and auth.unix.gid/content directly,
and on a pre-5/6 tree the open helpers in cache.c hold no
reference on cd->net. cache_destroy_net() at namespace exit
then races a reader still inside cache_seq_start_rcu(), and
the reader walks a freed cd->hash_table.

Could you share the reproducer and the panic stack trace?
If the fault is in cache_check_rcu() through one of the
sunrpc cache files, that confirms 5/6 is the right fix, and
I'll happily carry your Tested-by on it.


-- 
Chuck Lever


* Re: [PATCH 0/6] SUNRPC: Address remaining cache_check_rcu() UAF in cache content files
  2026-05-08 20:47             ` Chuck Lever
@ 2026-05-09  9:41               ` yangerkun
  2026-05-10 16:18                 ` Chuck Lever
  0 siblings, 1 reply; 12+ messages in thread
From: yangerkun @ 2026-05-09  9:41 UTC (permalink / raw)
  To: Chuck Lever, Misbah Anjum N, Jeff Layton, NeilBrown,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
	Anna Schumaker, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, yi.zhang, Zhihao Cheng, Li Lingfeng
  Cc: linux-nfs, linux-kernel, netdev, Chuck Lever

Hi Chuck!

On 2026/5/9 4:47, Chuck Lever wrote:
> Hi Erkun -
> 
> On Fri, May 8, 2026, at 9:00 AM, yangerkun wrote:
>> On 2026/5/8 16:16, yangerkun wrote:
>>>
>>>
>>> On 2026/5/8 11:08, yangerkun wrote:
>>> After reviewing these two commits:
>>>
>>> e7fcf179b82d NFSD: Hold net reference for the lifetime of /proc/fs/nfs/
>>> exports fd
>>> 48db892356d6 NFSD: Defer sub-object cleanup in export put callbacks
>>>
>>> I believe that the issue described in commit e7fcf179b82d might be the
>>> root cause of the null pointer dereferences mentioned in [1].
> 
> That's where I landed too. e7fcf179b82d closed the specific
> oops Misbah hit on /proc/fs/nfs/exports. The matching patch

Yeah!

> in this series is 5/6 ("SUNRPC: Hold cd->net for the lifetime
> of cache files"), which extends the same get_net()/put_net()
> guard to the sunrpc cache files at
> 
>   /proc/net/rpc/<cache>/{content,channel,flush} .
> 
> Those open helpers had the same hole; sosreport just hit the
> nfsd-specific file first because it reads /proc/fs/nfsd/exports.


Hmm... /proc/net is always a symlink to /proc/self/net. After opening 
/proc/net/rpc/<cache>/content and attempting to read it, the 
proc_reg_read function calls use_pde before pde_read. This sequence can 
prevent a race condition because nfsd_export_shutdown leads to 
cache_unregister_net, which calls remove_cache_proc_entries, then 
proc_remove, and eventually proc_entry_rundown. The proc_entry_rundown 
function waits until unuse_pde is called in proc_reg_read. Therefore, 
I'm not sure if forgetting to call get_net when opening 
/proc/net/rpc/<cache>/content is the root cause of the null pointer in 
c_show. I've tried to find any other possible root causes but have been 
unsuccessful. Sorry....
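
The sequence above can be modeled in userspace roughly as follows. This
is a simplified single-threaded sketch, assuming the in-flight counter
with a large negative bias that fs/proc keeps per entry. The model_*
names and MODEL_BIAS are hypothetical; the real code uses atomics and a
completion to wait, which this model leaves out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Large negative bias: once added, the counter reads as "going away". */
#define MODEL_BIAS INT_MIN

/* Hypothetical stand-in for the in_use counter of a proc entry. */
struct model_pde {
	int in_use;  /* >= 0: in-flight calls; < 0: removal has started */
};

/* Analogue of use_pde(): succeed only if removal has not started. */
static bool model_use_pde(struct model_pde *pde)
{
	if (pde->in_use < 0)
		return false;
	pde->in_use++;
	return true;
}

/* Analogue of unuse_pde(): one in-flight call has finished. */
static void model_unuse_pde(struct model_pde *pde)
{
	pde->in_use--;
}

/*
 * Analogue of the first step of proc_entry_rundown(): block new callers.
 * Returns how many in-flight calls the remover still has to wait for.
 */
static int model_begin_rundown(struct model_pde *pde)
{
	assert(pde->in_use >= 0);  /* rundown starts only once */
	pde->in_use += MODEL_BIAS;
	return pde->in_use - MODEL_BIAS;
}

/* The remover may proceed once every in-flight call has dropped out. */
static bool model_rundown_done(const struct model_pde *pde)
{
	return pde->in_use == MODEL_BIAS;
}
```

In this model a read that entered before the rundown keeps the remover
waiting until it calls model_unuse_pde(), and any read that arrives
after the rundown started fails in model_use_pde().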


> 
> Patch 5/6's changelog pins down the deref site you asked
> about: cache_check_rcu() faults reading h->flags off a
> garbage cache_head returned by __cache_seq_start() walking a
> cd->hash_table that cache_destroy_net() already freed. Not a
> dentry deref. The dentry-teardown path is a separate failure
> mode that 48db892356d6 closed for the export and expkey caches.
> 
> 
>>> To prevent the
>>> issue described in commit 69d803c40ede, should we consider reverting
>>> commit 48db892356d6 first?
> 
> Not for this series. Patches 3/6 and 4/6 don't add any new
> path_put deferral; their commit messages call them out as
> consistency changes, not bug fixes. ip_map holds only an
> auth_domain reference and unix_gid holds only a group_info,
> so neither cache reaches mntput from the deferred release.
> The exportfs-r-then-umount sequence isn't touched by this
> series.
> 
> The svc_export and svc_expkey path_put deferral lives in
> 48db892356d6, which is already in v7.0. If the umount window
> from 69d803c40ede is still reachable through that commit,
> that's a regression in 48db892356d6 and worth a separate
> thread.

Yeah! Totally agree!

> 
> 
>> Locally, I wrote a stable regression test case. I also reverted to
>> commit 9189d23b835cec646ba5010db35d1557a77c5857 (which is before commits
>> 2862eee078a4 "SUNRPC: make sure cache entry active before cache_show"
>> and be8f982c369c "nfsd: make sure exp active before svc_export_show").
>> Even then, a panic can still be triggered without any actual export path...
> 
> That fits 5/6's failure mode. Without an export no svc_export
> or svc_expkey entry is populated, but rpc.mountd reads
> auth.unix.ip/content and auth.unix.gid/content directly,
> and on a pre-5/6 tree the open helpers in cache.c hold no
> reference on cd->net. cache_destroy_net() at namespace exit
> then races a reader still inside cache_seq_start_rcu(), and
> the reader walks a freed cd->hash_table.
> 
> Could you share the reproducer and the panic stack trace?
> If the fault is in cache_check_rcu() through one of the
> sunrpc cache files, that confirms 5/6 is the right fix, and
> I'll happily carry your Tested-by on it.

The shell script (created with AI assistance):

#!/bin/bash
#
# Test for e7fcf179b82d ("NFSD: Hold net reference for ...")
#
# Reproduces the scenario described in the commit:
#   1. Process opens /proc/fs/nfsd/exports in netns A
#   2. Process leaves A (joins B), emptying A
#   3. ip netns del A triggers nfsd_export_shutdown → cache_detail freed
#   4. Process reads from still-open fd → UAF on UNFIXED kernel
#
# On current kernel (with e7fcf179b82d applied):
#   get_net in exports_net_open prevents netns A from being destroyed
#   → read succeeds safely (test output: "SUCCESS")
#
# On kernel WITHOUT e7fcf179b82d:
#   No get_net → A destroyed → read triggers UAF:
#     - KASAN: use-after-free, or
#     - NULL deref, or
#     - slab corruption (ASCII strings like "cap_type", "libz.so.")
#
# Usage: sudo ./test_nfsd_exports_uaf.sh

set -e -u

NS_A="nfsd_test_A_$$"
NS_B="nfsd_test_B_$$"
SYNC="/tmp/nfsd_uaf_sync_$$"
GO="/tmp/nfsd_uaf_go_$$"
REPRO="/tmp/uaf_repro_$$"

cleanup() {
     set +e
     # REPRO_PID may be unset if setup fails early; avoid tripping 'set -u'.
     kill "${REPRO_PID:-}" 2>/dev/null || true
     wait "${REPRO_PID:-}" 2>/dev/null || true
     ip netns del "$NS_B" 2>/dev/null || true
     ip netns del "$NS_A" 2>/dev/null || true
     rm -f "$REPRO" "$SYNC" "$GO"
}
trap cleanup EXIT

echo "=== Reproduce e7fcf179b82d scenario ==="

# --- Setup ---
echo "[setup] creating netns A and B..."
ip netns add "$NS_A"
ip netns add "$NS_B"

echo "[setup] loading nfsd..."
modprobe nfsd || true

echo "[setup] compiling repro..."
gcc -o "$REPRO" /tmp/uaf_repro.c 2>/dev/null || \
     gcc -o "$REPRO" -x c - <<'SRCEOF' 2>/dev/null
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
static volatile sig_atomic_t go_flag = 0;
static void handler(int sig) { go_flag = 1; }
int main(int argc, char *argv[]) {
     const char *netns_b = argv[1], *sync_f = argv[2], *go_f = argv[3];
     int fd, nsfd; ssize_t n; char buf[4096];
     fd = open("/proc/fs/nfs/exports", O_RDONLY);
     if (fd < 0) { perror("open exports debug"); return 1; }
     fprintf(stderr, "[repro] opened exports (fd=%d) in netns A\n", fd);
     nsfd = open(netns_b, O_RDONLY);
     if (nsfd < 0) { perror("open B"); return 1; }
     if (setns(nsfd, CLONE_NEWNET) < 0) { perror("setns"); return 1; }
     close(nsfd);
     fprintf(stderr, "[repro] moved to B; A has no processes\n");
     close(open(sync_f, O_CREAT | O_WRONLY, 0666));
     signal(SIGCONT, handler);
     while (!go_flag) {
         struct stat st;
         if (stat(go_f, &st) == 0) break;
         pause();
     }
     fprintf(stderr, "[repro] reading exports fd...\n");
     lseek(fd, 0, SEEK_SET);
     sleep(1);
     n = read(fd, buf, sizeof(buf)-1);
     if (n < 0) { perror("read"); close(fd); return 1; }
     buf[n] = '\0';
     fprintf(stderr, "[repro] SUCCESS: read %zd bytes (no UAF)\n", n);
     close(fd);
     return 0;
}
SRCEOF

# --- Run repro inside A ---
rm -f "$SYNC" "$GO"
echo "[test] starting repro inside netns A..."
ip netns exec "$NS_A" "$REPRO" /var/run/netns/"$NS_B" "$SYNC" "$GO" &
REPRO_PID=$!

# --- Wait for repro to move to B ---
echo "[test] waiting for repro to signal that A is empty..."
for i in $(seq 1 30); do
     if [ -f "$SYNC" ]; then break; fi
     if ! kill -0 $REPRO_PID 2>/dev/null; then
         echo "[FAIL] repro exited prematurely"
         wait $REPRO_PID || true
         exit 1
     fi
     sleep 0.2
done

if [ ! -f "$SYNC" ]; then
     echo "[FAIL] timeout waiting for repro"
     exit 1
fi
echo "[test] repro moved to B"

# --- Destroy netns A ---
echo "[test] destroying netns A (ip netns del $NS_A)..."
set +e
ip netns del "$NS_A" 2>&1
RC=$?
set -e

if [ $RC -eq 0 ]; then
     echo "[test] 'ip netns del $NS_A' returned success"
else
     echo "[test] 'ip netns del $NS_A' returned $RC"
fi

# --- Signal repro to read from the exports fd ---
echo "[test] signaling repro to read from exports fd..."
touch "$GO"
kill -CONT $REPRO_PID 2>/dev/null || true

# --- Wait for repro and check result ---
set +e
wait $REPRO_PID
RC=$?
set -e

if [ $RC -eq 0 ]; then
     echo ""
     echo "=== TEST PASSED: no UAF detected (kernel has e7fcf179b82d fix) ==="
     echo "   get_net() holds netns A alive while the fd is open."
else
     echo ""
     echo "=== TEST FAILED with exit code $RC ==="
     echo "   Possible UAF or other error."
     echo "   If running on a kernel WITHOUT e7fcf179b82d, this crash is EXPECTED."
fi
exit $RC


The panic is shown below, on a tree at this commit:

commit 9189d23b835cec646ba5010db35d1557a77c5857 (HEAD -> master)
Author: Chuck Lever <chuck.lever@oracle.com>
Date:   Thu Oct 17 09:36:31 2024 -0400

     lockd: Remove unneeded initialization of file_lock::c.flc_flags


[   39.462598][  T579] ==================================================================
[   39.463541][  T579] BUG: KASAN: slab-use-after-free in cache_seq_next_rcu+0xa4/0x180 [sunrpc]
[   39.464551][  T579] Read of size 4 at addr ffff00000fbe8408 by task uaf_repro_563/579
[   39.465291][  T579]
[   39.465513][  T579] CPU: 1 UID: 0 PID: 579 Comm: uaf_repro_563 Not tainted 6.12.0-rc7+ #17
[   39.466349][  T579] Hardware name: linux,dummy-virt (DT)
[   39.466897][  T579] Call trace:
[   39.467224][  T579]  dump_backtrace+0xa4/0x140
[   39.467742][  T579]  show_stack+0x20/0x38
[   39.468156][  T579]  dump_stack_lvl+0x80/0xf8
[   39.468694][  T579]  print_report+0xfc/0x5c8
[   39.469237][  T579]  kasan_report+0x78/0xc8
[   39.469676][  T579]  __asan_load4+0x9c/0xc0
[   39.470115][  T579]  cache_seq_next_rcu+0xa4/0x180 [sunrpc]
[   39.470842][  T579]  seq_read_iter+0x4a0/0x6c0
[   39.471355][  T579]  seq_read+0x194/0x218
[   39.471770][  T579]  proc_reg_read+0x110/0x198
[   39.472235][  T579]  vfs_read+0x150/0x490
[   39.472656][  T579]  ksys_read+0xd4/0x198
[   39.473070][  T579]  __arm64_sys_read+0x4c/0x68
[   39.473537][  T579]  invoke_syscall+0x64/0x188
[   39.473992][  T579]  el0_svc_common.constprop.1+0xd8/0x158
[   39.474558][  T579]  do_el0_svc+0x38/0x50
[   39.474981][  T579]  el0_svc+0x34/0xc0
[   39.475422][  T579]  el0t_64_sync_handler+0xa0/0xc8
[   39.475933][  T579]  el0t_64_sync+0x188/0x190
[   39.476385][  T579]
[   39.476618][  T579] Allocated by task 566:
[   39.477087][  T579]  kasan_save_stack+0x2c/0x58
[   39.477561][  T579]  kasan_save_track+0x20/0x40
[   39.478030][  T579]  kasan_save_alloc_info+0x40/0x58
[   39.478539][  T579]  __kasan_kmalloc+0xa0/0xb8
[   39.478997][  T579]  __kmalloc_node_track_caller_noprof+0x194/0x370
[   39.479646][  T579]  kmemdup_noprof+0x34/0x68
[   39.480094][  T579]  cache_create_net+0x30/0x108 [sunrpc]
[   39.480800][  T579]  nfsd_export_init+0x78/0x188 [nfsd]
[   39.481505][  T579]  nfsd_net_init+0x50/0x1e8 [nfsd]
[   39.482136][  T579]  ops_init+0xcc/0x210
[   39.482615][  T579]  register_pernet_operations+0x218/0x348
[   39.483180][  T579]  register_pernet_subsys+0x38/0x60
[   39.483698][  T579]  0xffffb6c9bf5b90c0
[   39.484096][  T579]  do_one_initcall+0xa8/0x3c8
[   39.484563][  T579]  do_init_module+0x100/0x378
[   39.485070][  T579]  load_module+0x2d78/0x2e80
[   39.485532][  T579]  init_module_from_file+0xec/0x148
[   39.486044][  T579]  __arm64_sys_finit_module+0x394/0x618
[   39.486604][  T579]  invoke_syscall+0x64/0x188
[   39.487065][  T579]  el0_svc_common.constprop.1+0xd8/0x158
[   39.487629][  T579]  do_el0_svc+0x38/0x50
[   39.488044][  T579]  el0_svc+0x34/0xc0
[   39.488437][  T579]  el0t_64_sync_handler+0xa0/0xc8
[   39.488939][  T579]  el0t_64_sync+0x188/0x190
[   39.489398][  T579]
[   39.489635][  T579] Freed by task 53:
[   39.490013][  T579]  kasan_save_stack+0x2c/0x58
[   39.490479][  T579]  kasan_save_track+0x20/0x40
[   39.490948][  T579]  kasan_save_free_info+0x4c/0x78
[   39.491449][  T579]  __kasan_slab_free+0x50/0x70
[   39.491924][  T579]  kfree+0x160/0x310
[   39.492312][  T579]  cache_destroy_net+0x34/0x50 [sunrpc]
[   39.493015][  T579]  nfsd_export_shutdown+0xc0/0x150 [nfsd]
[   39.493711][  T579]  nfsd_net_exit+0x68/0x88 [nfsd]
[   39.494338][  T579]  ops_exit_list.isra.13+0x64/0xc0
[   39.494856][  T579]  cleanup_net+0x508/0x788
[   39.495300][  T579]  process_scheduled_works+0x3d8/0x7e8
[   39.495895][  T579]  worker_thread+0x29c/0x630
[   39.496364][  T579]  kthread+0x170/0x188
[   39.496773][  T579]  ret_from_fork+0x10/0x20
[   39.497217][  T579]
[   39.497453][  T579] The buggy address belongs to the object at ffff00000fbe8400

I have tried replacing
fd = open("/proc/fs/nfs/exports", O_RDONLY);
with
fd = open("/proc/fs/nfs/exports", O_RDONLY);

but no c_show UAF was triggered...


> 
> 



* Re: [PATCH 0/6] SUNRPC: Address remaining cache_check_rcu() UAF in cache content files
  2026-05-09  9:41               ` yangerkun
@ 2026-05-10 16:18                 ` Chuck Lever
  0 siblings, 0 replies; 12+ messages in thread
From: Chuck Lever @ 2026-05-10 16:18 UTC (permalink / raw)
  To: yangerkun, Misbah Anjum N, Jeff Layton, NeilBrown,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, Trond Myklebust,
	Anna Schumaker, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, yi.zhang, Zhihao Cheng, Li Lingfeng
  Cc: linux-nfs, linux-kernel, netdev, Chuck Lever

Hi Erkun -

On Sat, May 9, 2026, at 5:41 AM, yangerkun wrote:
> Hmm... /proc/net is always a symlink to /proc/self/net. After opening
> /proc/net/rpc/<cache>/content and attempting to read it, the
> proc_reg_read function calls use_pde before pde_read. This sequence can
> prevent a race condition because nfsd_export_shutdown leads to
> cache_unregister_net, which calls remove_cache_proc_entries, then
> proc_remove, and eventually proc_entry_rundown. The proc_entry_rundown
> function waits until unuse_pde is called in proc_reg_read. Therefore,
> I'm not sure if forgetting to call get_net when opening
> /proc/net/rpc/<cache>/content is the root cause of the null pointer in
> c_show.

Walked the synchronization. You're right.

cache_unregister_net() calls remove_cache_proc_entries(),
which runs proc_remove(); remove_proc_subtree() then invokes
proc_entry_rundown() on each per-cache file. Rundown does
atomic_add_return(BIAS, &de->in_use), where BIAS = -1U << 31.
No active readers means the post-add value equals BIAS and
rundown returns at once. Readers present means the value
exceeds BIAS, and wait_for_completion() blocks until the last
unuse_pde() decrements the counter to exactly BIAS and signals
the completion. atomic_inc_unless_negative() in use_pde() then
fails, so any later read() on a still-open userspace fd
returns -EIO without touching cd. close_pdeo() forces release
on the remaining openers while cd is still valid.
cache_destroy_net() runs only after that whole sequence has
finished, so cd->hash_table is freed once no reader can be
inside cache_seq_*_rcu() and no fd can dereference cd through
a release callback.
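
The counter protocol described above can be sketched in user space with
C11 atomics. This is a toy model only, not the kernel code: entry_model,
model_use(), model_unuse(), model_rundown_begin() and model_demo() are
hypothetical names, the completion is reduced to a boolean return value,
and BIAS is written as INT_MIN on the assumption that -1U << 31 is held
in a signed 32-bit counter.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: -1U << 31 stored in a signed 32-bit counter is INT_MIN. */
#define BIAS INT_MIN

/* Hypothetical stand-in for the in_use counter of a proc entry. */
struct entry_model {
    atomic_int in_use;  /* >= 0: live, counts readers; < 0: rundown started */
};

/* Like use_pde(): take a reader slot unless the count is negative. */
static bool model_use(struct entry_model *e)
{
    int v = atomic_load(&e->in_use);

    while (v >= 0) {
        /* On failure, v is reloaded with the current value and re-checked. */
        if (atomic_compare_exchange_weak(&e->in_use, &v, v + 1))
            return true;
    }
    return false;
}

/* Like unuse_pde(): true when this exit leaves the counter at exactly BIAS. */
static bool model_unuse(struct entry_model *e)
{
    return atomic_fetch_sub(&e->in_use, 1) == BIAS + 1;
}

/* First step of rundown: add BIAS; true means readers are still inside. */
static bool model_rundown_begin(struct entry_model *e)
{
    return atomic_fetch_add(&e->in_use, BIAS) != 0;
}

/* Walk the sequence from the text; returns 0 if each step holds. */
static int model_demo(void)
{
    struct entry_model e;

    atomic_init(&e.in_use, 0);
    assert(model_use(&e));            /* reader enters: 0 -> 1 */
    assert(model_rundown_begin(&e));  /* 1 -> BIAS + 1, rundown must wait */
    assert(!model_use(&e));           /* late reader is refused */
    assert(model_unuse(&e));          /* BIAS + 1 -> BIAS, wait is over */
    return 0;
}
```

Calling model_demo() from a main() exercises the ordering: once the
counter is negative no new reader can enter, and the teardown side learns
from the last exit that it is safe to continue.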

The 5/6 changelog overstates the window. Your reproducer
opens /proc/fs/nfs/exports through exports_nfsd_open(), which
bypasses use_pde() and is the path e7fcf179b82d closed. The
sunrpc cache files reach c_show through proc_reg_read(), which
goes through use_pde()/unuse_pde() and is covered by rundown.
5/6 doesn't close the hazard its changelog describes.

Patch 3/6 is what matches Misbah's reproducer. Pre-series
ip_map_put() drops auth_domain_put() synchronously, with only
the ip_map free deferred:

    auth_domain_put(&im->m_client->h);   /* synchronous */
    kfree_rcu(im, m_rcu);

A reader walking auth.unix.ip/content under rcu_read_lock()
can dereference im->m_client after the auth_domain has been
freed. Same shape as 48db892356d6's svc_export fix, applied
to ip_map. 3/6 moves auth_domain_put() into a deferred
ip_map_release() scheduled via queue_rcu_work(), so the
sub-object free waits for the grace period.
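
The difference between the two put shapes can be illustrated with a small
user-space model. It is a sketch under loud assumptions, not the sunrpc
code: domain_model, ipmap_model, put_sync_subobject(), put_deferred() and
grace_period_end() are hypothetical names, "freed" is a flag rather than a
real kfree(), and the RCU grace period is reduced to a one-slot pending
queue that the caller flushes by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for auth_domain and ip_map. */
struct domain_model { int refs; bool freed; };
struct ipmap_model  { struct domain_model *client; bool freed; };

/* Drop one reference; mark the domain freed when the count hits zero. */
static void domain_put(struct domain_model *d)
{
    if (--d->refs == 0)
        d->freed = true;
}

/* One-slot queue standing in for work deferred past a grace period. */
static struct ipmap_model *pending;
static bool pending_drops_client;

/* Pre-series shape: sub-object dropped now, only the container deferred. */
static void put_sync_subobject(struct ipmap_model *im)
{
    domain_put(im->client);
    pending = im;
    pending_drops_client = false;
}

/* 3/6 shape: the sub-object put is deferred together with the container. */
static void put_deferred(struct ipmap_model *im)
{
    pending = im;
    pending_drops_client = true;
}

/* Stand-in for the end of the grace period: run the deferred release. */
static void grace_period_end(void)
{
    if (!pending)
        return;
    if (pending_drops_client)
        domain_put(pending->client);
    pending->freed = true;
    pending = NULL;
}

/* What a reader still inside its read-side section would find. */
static bool reader_sees_freed_client(const struct ipmap_model *im)
{
    return im->client->freed;
}
```

In this model a reader that checks between the put and grace_period_end()
finds the client already freed with put_sync_subobject(), and still valid
with put_deferred().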

For v2: re-test Misbah's reproducer with patches 1-4 and 6
only and see whether 3/6 alone closes the crash. If it does,
drop 5/6; if it doesn't, reframe 5/6 as a consistency change
without the UAF claim (and without the behavioral change that
pins a netns alive while a debug fd is open). Either way, the
cover letter needs a rewrite to match.

Thanks for your analysis and review.

-- 
Chuck Lever


end of thread, other threads:[~2026-05-10 16:18 UTC | newest]

Thread overview: 12+ messages
     [not found] <20260501-cache-uaf-fix-v1-0-a49928bf4817@oracle.com>
2026-05-05  5:32 ` [PATCH 0/6] SUNRPC: Address remaining cache_check_rcu() UAF in cache content files Jeff Layton
2026-05-05 10:49 ` Calum Mackay
2026-05-05 10:53   ` Chuck Lever
2026-05-07  9:09 ` yangerkun
2026-05-07 16:12   ` Chuck Lever
2026-05-08  2:45     ` yangerkun
2026-05-08  3:08       ` yangerkun
2026-05-08  8:16         ` yangerkun
2026-05-08 13:00           ` yangerkun
2026-05-08 20:47             ` Chuck Lever
2026-05-09  9:41               ` yangerkun
2026-05-10 16:18                 ` Chuck Lever
