[PATCH] nfsd: free nfsd_file by gc after adding it to lru list

public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] nfsd: free nfsd_file by gc after adding it to lru list
@ 2025-01-13  2:59 Li Lingfeng
  2025-01-13 14:07 ` Chuck Lever
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Li Lingfeng @ 2025-01-13  2:59 UTC (permalink / raw)
  To: chuck.lever, jlayton, neilb, okorniev, Dai.Ngo, tom
  Cc: linux-nfs, linux-kernel, yukuai1, houtao1, yi.zhang, yangerkun,
	lilingfeng, lilingfeng3

In nfsd_file_put, after inserting the nfsd_file into the nfsd_file_lru
list, gc may be triggered in another thread and immediately release this
nfsd_file, which will lead to a UAF when accessing this nfsd_file again.

All the places where unhash is done will also perform lru_remove, so there
is no need to do lru_remove separately here. After inserting the nfsd_file
into the nfsd_file_lru list, it can be released by relying on gc.

Fixes: 4a0e73e635e3 ("NFSD: Leave open files out of the filecache LRU")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
---
 fs/nfsd/filecache.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index a1cdba42c4fa..37b65cb1579a 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -372,18 +372,10 @@ nfsd_file_put(struct nfsd_file *nf)
 		/* Try to add it to the LRU.  If that fails, decrement. */
 		if (nfsd_file_lru_add(nf)) {
 			/* If it's still hashed, we're done */
-			if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
+			if (list_lru_count(&nfsd_file_lru))
 				nfsd_file_schedule_laundrette();
-				return;
-			}
 
-			/*
-			 * We're racing with unhashing, so try to remove it from
-			 * the LRU. If removal fails, then someone else already
-			 * has our reference.
-			 */
-			if (!nfsd_file_lru_remove(nf))
-				return;
+			return;
 		}
 	}
 	if (refcount_dec_and_test(&nf->nf_ref))
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] nfsd: free nfsd_file by gc after adding it to lru list
  2025-01-13  2:59 [PATCH] nfsd: free nfsd_file by gc after adding it to lru list Li Lingfeng
@ 2025-01-13 14:07 ` Chuck Lever
  2025-01-14  1:54   ` Li Lingfeng
  2025-01-14 19:27 ` Jeff Layton
  2025-01-14 19:40 ` cel
  2 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2025-01-13 14:07 UTC (permalink / raw)
  To: Li Lingfeng, jlayton, neilb, okorniev, Dai.Ngo, tom
  Cc: linux-nfs, linux-kernel, yukuai1, houtao1, yi.zhang, yangerkun,
	lilingfeng

Hello!

On 1/12/25 9:59 PM, Li Lingfeng wrote:
> In nfsd_file_put, after inserting the nfsd_file into the nfsd_file_lru
> list, gc may be triggered in another thread and immediately release this
> nfsd_file, which will lead to a UAF when accessing this nfsd_file again.

Do you happen to have a reproducer that can trigger this issue?


> All the places where unhash is done will also perform lru_remove, so there
> is no need to do lru_remove separately here. After inserting the nfsd_file
> into the nfsd_file_lru list, it can be released by relying on gc.
> 
> Fixes: 4a0e73e635e3 ("NFSD: Leave open files out of the filecache LRU")

The code that is being replaced below was introduced in ac3a2585f018
("nfsd: rework refcounting in filecache"). This fix won't apply to
kernels that have only 4a0e73e635e3 but not ac3a2585f018, for instance.

At the very least we need to add "Cc: stable@vger.kernel.org # v6.2" but
I'm not convinced "Fixes: 4a0e73e635e3" is correct.


> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
> ---
>   fs/nfsd/filecache.c | 12 ++----------
>   1 file changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index a1cdba42c4fa..37b65cb1579a 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -372,18 +372,10 @@ nfsd_file_put(struct nfsd_file *nf)
>   		/* Try to add it to the LRU.  If that fails, decrement. */
>   		if (nfsd_file_lru_add(nf)) {
>   			/* If it's still hashed, we're done */
> -			if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
> +			if (list_lru_count(&nfsd_file_lru))
>   				nfsd_file_schedule_laundrette();
> -				return;
> -			}

The above change does not seem to be related to the fix described
in the patch description. Can you help me understand why this is
needed?


> -			/*
> -			 * We're racing with unhashing, so try to remove it from
> -			 * the LRU. If removal fails, then someone else already
> -			 * has our reference.
> -			 */
> -			if (!nfsd_file_lru_remove(nf))
> -				return;
> +			return;
>   		}
>   	}
>   	if (refcount_dec_and_test(&nf->nf_ref))


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nfsd: free nfsd_file by gc after adding it to lru list
  2025-01-13 14:07 ` Chuck Lever
@ 2025-01-14  1:54   ` Li Lingfeng
  2025-01-14 19:17     ` Chuck Lever
  0 siblings, 1 reply; 17+ messages in thread
From: Li Lingfeng @ 2025-01-14  1:54 UTC (permalink / raw)
  To: Chuck Lever, jlayton, neilb, okorniev, Dai.Ngo, tom
  Cc: linux-nfs, linux-kernel, yukuai1, houtao1, yi.zhang, yangerkun,
	lilingfeng


在 2025/1/13 22:07, Chuck Lever 写道:
> Hello!
>
> On 1/12/25 9:59 PM, Li Lingfeng wrote:
>> In nfsd_file_put, after inserting the nfsd_file into the nfsd_file_lru
>> list, gc may be triggered in another thread and immediately release this
>> nfsd_file, which will lead to a UAF when accessing this nfsd_file again.
>
> Do you happen to have a reproducer that can trigger this issue?
>
Hi, thanks for your reply.

Here is the reproducer:

Patch:
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c

index 585163b4e11c..8a8245b0ce32 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -356,6 +356,7 @@ nfsd_file_get(struct nfsd_file *nf)
  void
  nfsd_file_put(struct nfsd_file *nf)
  {
+ static int count = 0;
         might_sleep();
         trace_nfsd_file_put(nf);

@@ -370,6 +371,11 @@ nfsd_file_put(struct nfsd_file *nf)

                 /* Try to add it to the LRU. If that fails, decrement. */
                 if (nfsd_file_lru_add(nf)) {
+ if (count++ % 3 == 0) {
+ printk("%s %d sleep before test...\n", __func__, __LINE__);
+ msleep(5000);
+ printk("%s %d sleep done\n", __func__, __LINE__);
+ }
                         /* If it's still hashed, we're done */
                         if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
                                 nfsd_file_schedule_laundrette();

Steps:
mkfs.ext4 -F /dev/sdb
mount /dev/sdb /mnt/sdb
echo "/mnt *(rw,no_root_squash,fsid=0)" > /etc/exports
echo "/mnt/sdb *(rw,no_root_squash,fsid=1)" >> /etc/exports
systemctl restart nfs-server

for i in {1..10}; do filename="file$i.txt"; echo "123" > 
/mnt/sdb/"$filename"; done
mount -t nfs -o rw,relatime,vers=3 127.0.0.1:/mnt/sdb /mnt/sdbb
sh test.sh

test.sh:
for i in {1..10}; do
     filename="file$i.txt"
     cat /mnt/sdbb/"$filename" &
done

[  205.114996][ T2409] 
==================================================================

[  205.115006][ T2409] BUG: KASAN: slab-use-after-free in 
nfsd_file_put+0x190/0x3b0
[  205.115055][ T2409] Read of size 8 at addr ffff88810798f3e8 by task 
nfsd/2409
[  205.115062][ T2409]
[  205.115068][ T2409] CPU: 1 UID: 0 PID: 2409 Comm: nfsd Not tainted 
6.13.0-rc6-00038-g09a0fa92e5b4-dirty #83
[  205.115078][ T2409] Hardware name: QEMU Standard PC (i440FX + PIIX, 
1996), BIOS 
?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
123[  205.115085][ T2409] Call Trace:
[  205.115095][ T2409]  <TASK>

[  205.115101][ T2409]  dump_stack_lvl+0x68/0xa0
[  205.115117][ T2409] print_address_description.constprop.0+0x2c/0x3d0
[  205.115132][ T2409]  ? nfsd_file_put+0x190/0x3b0
[  205.115140][ T2409]  print_report+0xb3/0x270
123[  205.115147][ T2409]  ? kasan_addr_to_slab+0xd/0xa0
[  205.115160][ T2409]  kasan_report+0x93/0xc0

[  205.115168][ T2409]  ? nfsd_file_put+0x190/0x3b0
[  205.115196][ T2409]  kasan_check_range+0xf6/0x1b0
[  205.115207][ T2409]  nfsd_file_put+0x190/0x3b0
[  205.115217][ T2409]  nfsd_read+0x151/0x3b0
[  205.115229][ T2409]  ? __pfx_nfsd_read+0x10/0x10
[  205.115238][ T2409]  ? lock_acquire+0x15c/0x3f0
[  205.115249][ T2409]  ? nfsd_init_request+0x6b/0x300
[  205.115259][ T2409]  ? rcu_is_watching+0x20/0x50
[  205.115272][ T2409]  nfsd3_proc_read+0x1c9/0x280
[  205.115285][ T2409]  nfsd_dispatch+0x1b7/0x4c0
[  205.115294][ T2409]  ? __pfx_nfsd_dispatch+0x10/0x10
[  205.115305][ T2409]  ? __asan_memset+0x24/0x50
[  205.115315][ T2409]  ? rcu_is_watching+0x20/0x50
[  205.115325][ T2409]  svc_process_common+0x615/0xd20
[  205.115346][ T2409]  ? __pfx_svc_process_common+0x10/0x10
[  205.115359][ T2409]  ? __pfx_nfsd_dispatch+0x10/0x10
[  205.115374][ T2409]  ? mark_held_locks+0x24/0x90
[  205.115388][ T2409]  ? xdr_init_decode+0x11d/0x190
[  205.115410][ T2409]  svc_process+0x2a8/0x430
[  205.115435][ T2409]  svc_handle_xprt+0x71c/0xb40
[  205.115458][ T2409]  svc_recv+0x5f1/0x9b0
[  205.115477][ T2409]  ? svc_recv+0xab/0x9b0
[  205.115501][ T2409]  nfsd+0x1e7/0x390
[  205.115522][ T2409]  ? __pfx_nfsd+0x10/0x10
[  205.115536][ T2409]  kthread+0x1a7/0x1f0
[  205.115551][ T2409]  ? kthread+0xfb/0x1f0
[  205.139288][ T2409]  ? __pfx_kthread+0x10/0x10
[  205.139769][ T2409]  ret_from_fork+0x34/0x60
[  205.140238][ T2409]  ? __pfx_kthread+0x10/0x10
[  205.140701][ T2409]  ret_from_fork_asm+0x1a/0x30
[  205.141226][ T2409]  </TASK>
[  205.141548][ T2409]
[  205.141809][ T2409] Allocated by task 2409:
[  205.142254][ T2409]  kasan_save_stack+0x24/0x50
[  205.142726][ T2409]  kasan_save_track+0x14/0x30
[  205.143214][ T2409]  __kasan_slab_alloc+0x87/0x90
[  205.143694][ T2409]  kmem_cache_alloc_noprof+0x162/0x4a0
[  205.144268][ T2409]  nfsd_file_do_acquire+0x3a2/0x1420
[  205.144812][ T2409]  nfsd_file_acquire_gc+0x5a/0x80
[  205.145338][ T2409]  nfsd_read+0xc6/0x3b0
[  205.145766][ T2409]  nfsd3_proc_read+0x1c9/0x280
[  205.146226][ T2409]  nfsd_dispatch+0x1b7/0x4c0
[  205.146682][ T2409]  svc_process_common+0x615/0xd20
[  205.147212][ T2409]  svc_process+0x2a8/0x430
[  205.147652][ T2409]  svc_handle_xprt+0x71c/0xb40
[  205.148140][ T2409]  svc_recv+0x5f1/0x9b0
[  205.148569][ T2409]  nfsd+0x1e7/0x390
[  205.148966][ T2409]  kthread+0x1a7/0x1f0
[  205.149388][ T2409]  ret_from_fork+0x34/0x60
[  205.149838][ T2409]  ret_from_fork_asm+0x1a/0x30
[  205.150330][ T2409]
[  205.150646][ T2409] Freed by task 0:
[  205.151120][ T2409]  kasan_save_stack+0x24/0x50
[  205.151778][ T2409]  kasan_save_track+0x14/0x30
[  205.152409][ T2409]  kasan_save_free_info+0x3a/0x60
[  205.153073][ T2409]  __kasan_slab_free+0x54/0x70
[  205.153677][ T2409]  kmem_cache_free+0x159/0x5f0
[  205.154169][ T2409]  rcu_do_batch+0x311/0x900
[  205.154643][ T2409]  rcu_core+0x58c/0x6f0
[  205.155065][ T2409]  handle_softirqs+0xf8/0x570
[  205.155526][ T2409]  irq_exit_rcu+0x141/0x160
[  205.156004][ T2409]  sysvec_apic_timer_interrupt+0x76/0x90
[  205.156575][ T2409]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  205.157196][ T2409]
[  205.157416][ T2409] Last potentially related work creation:
[  205.157986][ T2409]  kasan_save_stack+0x24/0x50
[  205.158404][ T2409]  __kasan_record_aux_stack+0xa6/0xc0
[  205.158893][ T2409]  __call_rcu_common.constprop.0+0xa5/0x8f0
[  205.159515][ T2409]  nfsd_file_dispose_list+0x93/0xc0
[  205.160067][ T2409]  nfsd_file_net_dispose+0x1ad/0x1f0
[  205.160613][ T2409]  nfsd+0x1ef/0x390
[  205.161021][ T2409]  kthread+0x1a7/0x1f0
[  205.161451][ T2409]  ret_from_fork+0x34/0x60
[  205.161908][ T2409]  ret_from_fork_asm+0x1a/0x30
[  205.162407][ T2409]
[  205.162663][ T2409] The buggy address belongs to the object at 
ffff88810798f3b8
[  205.162663][ T2409]  which belongs to the cache nfsd_file of size 128
[  205.164061][ T2409] The buggy address is located 48 bytes inside of
[  205.164061][ T2409]  freed 128-byte region [ffff88810798f3b8, 
ffff88810798f438)
[  205.165446][ T2409]
[  205.165678][ T2409] The buggy address belongs to the physical page:
[  205.166340][ T2409] page: refcount:1 mapcount:0 
mapping:0000000000000000 index:0xffff88810798f4a8 pfn:0x10798e
[  205.167356][ T2409] head: order:1 mapcount:0 entire_mapcount:0 
nr_pages_mapped:0 pincount:0
[  205.168204][ T2409] flags: 
0x17ffffc0000240(workingset|head|node=0|zone=2|lastcpupid=0x1fffff)
[  205.169071][ T2409] page_type: f5(slab)
[  205.169483][ T2409] raw: 0017ffffc0000240 ffff888443a06500 
ffff8881214b4bc8 ffff8881214b4bc8
[  205.170431][ T2409] raw: ffff88810798f4a8 0000000000220005 
00000001f5000000 0000000000000000
[  205.171554][ T2409] head: 0017ffffc0000240 ffff888443a06500 
ffff8881214b4bc8 ffff8881214b4bc8
[  205.172630][ T2409] head: ffff88810798f4a8 0000000000220005 
00000001f5000000 0000000000000000
[  205.173509][ T2409] head: 0017ffffc0000001 ffffea00041e6381 
ffffffffffffffff 0000000000000000
[  205.174587][ T2409] head: 0000000000000002 0000000000000000 
00000000ffffffff 0000000000000000
[  205.175538][ T2409] page dumped because: kasan: bad access detected
[  205.176193][ T2409]
[  205.176427][ T2409] Memory state around the buggy address:
[  205.176997][ T2409]  ffff88810798f280: fc fc fc fc fc fc fc fc fc fc 
fc fc fc fc fc fc
[  205.177810][ T2409]  ffff88810798f300: fc fc fc fc fc fc fc fc fc fc 
fc fc fc fc fc fc
[  205.178601][ T2409] >ffff88810798f380: fc fc fc fc fc fc fc fa fb fb 
fb fb fb fb fb fb
[  205.179420][ 
T2409]                                                           ^
[  205.180190][ T2409]  ffff88810798f400: fb fb fb fb fb fb fb fc fc fc 
fc fc fc fc fc fc
[  205.181000][ T2409]  ffff88810798f480: fc fc fc fc fc fc fc fc fc fc 
fc fc fc fc fc fc
[  205.181844][ T2409] 
==================================================================
[  205.182677][ T2409] Disabling lock debugging due to kernel taint

>
>> All the places where unhash is done will also perform lru_remove, so 
>> there
>> is no need to do lru_remove separately here. After inserting the 
>> nfsd_file
>> into the nfsd_file_lru list, it can be released by relying on gc.
>>
>> Fixes: 4a0e73e635e3 ("NFSD: Leave open files out of the filecache LRU")
>
> The code that is being replaced below was introduced in ac3a2585f018
> ("nfsd: rework refcounting in filecache"). This fix won't apply to
> kernels that have only 4a0e73e635e3 but not ac3a2585f018, for instance.
>
> At the very least we need to add "Cc: stable@vger.kernel.org # v6.2" but
> I'm not convinced "Fixes: 4a0e73e635e3" is correct.
The replaced code is indeed introduced by ac3a2585f018 ("nfsd: rework
refcounting in filecache").
However, commit 4a0e73e635e3 ("NFSD: Leave open files out of the filecache
LRU") added the nfsd_file_lru_add operation in nfsd_file_put, which
enables concurrent removal of the current nfsd_file by gc, potentially
triggering UAF when accessing nfsd_file below.
Therefore, I use 4a0e73e635e3 as the fix tag.
>
>
>> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
>> ---
>>   fs/nfsd/filecache.c | 12 ++----------
>>   1 file changed, 2 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>> index a1cdba42c4fa..37b65cb1579a 100644
>> --- a/fs/nfsd/filecache.c
>> +++ b/fs/nfsd/filecache.c
>> @@ -372,18 +372,10 @@ nfsd_file_put(struct nfsd_file *nf)
>>           /* Try to add it to the LRU.  If that fails, decrement. */
>>           if (nfsd_file_lru_add(nf)) {
>>               /* If it's still hashed, we're done */
>> -            if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
>> +            if (list_lru_count(&nfsd_file_lru))
>>                   nfsd_file_schedule_laundrette();
>> -                return;
>> -            }
>
> The above change does not seem to be related to the fix described
> in the patch description. Can you help me understand why this is
> needed?
Firstly, I removed the access operations on nfsd_file, including checking
nf->nf_flags and nfsd_file_lru_remove, to prevent triggering UAF after
concurrent release of nfsd_file by gc.
Secondly, I kept nfsd_file_schedule_laundrette and used list_lru_count to
determine whether to trigger gc, with the aim of initiating gc and
reclaiming nfsd_file as soon as possible even if no other process triggers
gc.
>
>
>> -            /*
>> -             * We're racing with unhashing, so try to remove it from
>> -             * the LRU. If removal fails, then someone else already
>> -             * has our reference.
>> -             */
>> -            if (!nfsd_file_lru_remove(nf))
>> -                return;
>> +            return;
>>           }
>>       }
>>       if (refcount_dec_and_test(&nf->nf_ref))
>
>

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] nfsd: free nfsd_file by gc after adding it to lru list
  2025-01-14  1:54   ` Li Lingfeng
@ 2025-01-14 19:17     ` Chuck Lever
  0 siblings, 0 replies; 17+ messages in thread
From: Chuck Lever @ 2025-01-14 19:17 UTC (permalink / raw)
  To: Li Lingfeng, jlayton
  Cc: linux-nfs, linux-kernel, yukuai1, houtao1, yi.zhang, yangerkun,
	lilingfeng, neilb, okorniev, Dai.Ngo, tom

On 1/13/25 8:54 PM, Li Lingfeng wrote:
> 
> 在 2025/1/13 22:07, Chuck Lever 写道:
>> Hello!
>>
>> On 1/12/25 9:59 PM, Li Lingfeng wrote:
>>> In nfsd_file_put, after inserting the nfsd_file into the nfsd_file_lru
>>> list, gc may be triggered in another thread and immediately release this
>>> nfsd_file, which will lead to a UAF when accessing this nfsd_file again.
>>
>> Do you happen to have a reproducer that can trigger this issue?
>>
> Hi, thanks for your reply.
> 
> Here is the reproducer:
> 
> Patch:
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> 
> index 585163b4e11c..8a8245b0ce32 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -356,6 +356,7 @@ nfsd_file_get(struct nfsd_file *nf)
>   void
>   nfsd_file_put(struct nfsd_file *nf)
>   {
> + static int count = 0;
>          might_sleep();
>          trace_nfsd_file_put(nf);
> 
> @@ -370,6 +371,11 @@ nfsd_file_put(struct nfsd_file *nf)
> 
>                  /* Try to add it to the LRU. If that fails, decrement. */
>                  if (nfsd_file_lru_add(nf)) {
> + if (count++ % 3 == 0) {
> + printk("%s %d sleep before test...\n", __func__, __LINE__);
> + msleep(5000);

This will certainly misbehave because garbage collection evicts files
after 2 seconds, and here the reproducer waits /5/ seconds.

But I can see how it is unsafe to continue dereferencing @nf after the
reference count is dropped. Jeff, any comments about this failure mode
or the proposed fix?

More below.


> + printk("%s %d sleep done\n", __func__, __LINE__);
> + }
>                          /* If it's still hashed, we're done */
>                          if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
>                                  nfsd_file_schedule_laundrette();
> 
> Steps:
> mkfs.ext4 -F /dev/sdb
> mount /dev/sdb /mnt/sdb
> echo "/mnt *(rw,no_root_squash,fsid=0)" > /etc/exports
> echo "/mnt/sdb *(rw,no_root_squash,fsid=1)" >> /etc/exports
> systemctl restart nfs-server
> 
> for i in {1..10}; do filename="file$i.txt"; echo "123" > /mnt/ 
> sdb/"$filename"; done
> mount -t nfs -o rw,relatime,vers=3 127.0.0.1:/mnt/sdb /mnt/sdbb
> sh test.sh
> 
> test.sh:
> for i in {1..10}; do
>      filename="file$i.txt"
>      cat /mnt/sdbb/"$filename" &
> done
> 
> [  205.114996][ T2409] 
> ==================================================================
> 
> [  205.115006][ T2409] BUG: KASAN: slab-use-after-free in 
> nfsd_file_put+0x190/0x3b0
> [  205.115055][ T2409] Read of size 8 at addr ffff88810798f3e8 by task 
> nfsd/2409
> [  205.115062][ T2409]
> [  205.115068][ T2409] CPU: 1 UID: 0 PID: 2409 Comm: nfsd Not tainted 
> 6.13.0-rc6-00038-g09a0fa92e5b4-dirty #83
> [  205.115078][ T2409] Hardware name: QEMU Standard PC (i440FX + PIIX, 
> 1996), BIOS ?-20190727_073836-buildvm- 
> ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
> 123[  205.115085][ T2409] Call Trace:
> [  205.115095][ T2409]  <TASK>
> 
> [  205.115101][ T2409]  dump_stack_lvl+0x68/0xa0
> [  205.115117][ T2409] print_address_description.constprop.0+0x2c/0x3d0
> [  205.115132][ T2409]  ? nfsd_file_put+0x190/0x3b0
> [  205.115140][ T2409]  print_report+0xb3/0x270
> 123[  205.115147][ T2409]  ? kasan_addr_to_slab+0xd/0xa0
> [  205.115160][ T2409]  kasan_report+0x93/0xc0
> 
> [  205.115168][ T2409]  ? nfsd_file_put+0x190/0x3b0
> [  205.115196][ T2409]  kasan_check_range+0xf6/0x1b0
> [  205.115207][ T2409]  nfsd_file_put+0x190/0x3b0
> [  205.115217][ T2409]  nfsd_read+0x151/0x3b0
> [  205.115229][ T2409]  ? __pfx_nfsd_read+0x10/0x10
> [  205.115238][ T2409]  ? lock_acquire+0x15c/0x3f0
> [  205.115249][ T2409]  ? nfsd_init_request+0x6b/0x300
> [  205.115259][ T2409]  ? rcu_is_watching+0x20/0x50
> [  205.115272][ T2409]  nfsd3_proc_read+0x1c9/0x280
> [  205.115285][ T2409]  nfsd_dispatch+0x1b7/0x4c0
> [  205.115294][ T2409]  ? __pfx_nfsd_dispatch+0x10/0x10
> [  205.115305][ T2409]  ? __asan_memset+0x24/0x50
> [  205.115315][ T2409]  ? rcu_is_watching+0x20/0x50
> [  205.115325][ T2409]  svc_process_common+0x615/0xd20
> [  205.115346][ T2409]  ? __pfx_svc_process_common+0x10/0x10
> [  205.115359][ T2409]  ? __pfx_nfsd_dispatch+0x10/0x10
> [  205.115374][ T2409]  ? mark_held_locks+0x24/0x90
> [  205.115388][ T2409]  ? xdr_init_decode+0x11d/0x190
> [  205.115410][ T2409]  svc_process+0x2a8/0x430
> [  205.115435][ T2409]  svc_handle_xprt+0x71c/0xb40
> [  205.115458][ T2409]  svc_recv+0x5f1/0x9b0
> [  205.115477][ T2409]  ? svc_recv+0xab/0x9b0
> [  205.115501][ T2409]  nfsd+0x1e7/0x390
> [  205.115522][ T2409]  ? __pfx_nfsd+0x10/0x10
> [  205.115536][ T2409]  kthread+0x1a7/0x1f0
> [  205.115551][ T2409]  ? kthread+0xfb/0x1f0
> [  205.139288][ T2409]  ? __pfx_kthread+0x10/0x10
> [  205.139769][ T2409]  ret_from_fork+0x34/0x60
> [  205.140238][ T2409]  ? __pfx_kthread+0x10/0x10
> [  205.140701][ T2409]  ret_from_fork_asm+0x1a/0x30
> [  205.141226][ T2409]  </TASK>
> [  205.141548][ T2409]
> [  205.141809][ T2409] Allocated by task 2409:
> [  205.142254][ T2409]  kasan_save_stack+0x24/0x50
> [  205.142726][ T2409]  kasan_save_track+0x14/0x30
> [  205.143214][ T2409]  __kasan_slab_alloc+0x87/0x90
> [  205.143694][ T2409]  kmem_cache_alloc_noprof+0x162/0x4a0
> [  205.144268][ T2409]  nfsd_file_do_acquire+0x3a2/0x1420
> [  205.144812][ T2409]  nfsd_file_acquire_gc+0x5a/0x80
> [  205.145338][ T2409]  nfsd_read+0xc6/0x3b0
> [  205.145766][ T2409]  nfsd3_proc_read+0x1c9/0x280
> [  205.146226][ T2409]  nfsd_dispatch+0x1b7/0x4c0
> [  205.146682][ T2409]  svc_process_common+0x615/0xd20
> [  205.147212][ T2409]  svc_process+0x2a8/0x430
> [  205.147652][ T2409]  svc_handle_xprt+0x71c/0xb40
> [  205.148140][ T2409]  svc_recv+0x5f1/0x9b0
> [  205.148569][ T2409]  nfsd+0x1e7/0x390
> [  205.148966][ T2409]  kthread+0x1a7/0x1f0
> [  205.149388][ T2409]  ret_from_fork+0x34/0x60
> [  205.149838][ T2409]  ret_from_fork_asm+0x1a/0x30
> [  205.150330][ T2409]
> [  205.150646][ T2409] Freed by task 0:
> [  205.151120][ T2409]  kasan_save_stack+0x24/0x50
> [  205.151778][ T2409]  kasan_save_track+0x14/0x30
> [  205.152409][ T2409]  kasan_save_free_info+0x3a/0x60
> [  205.153073][ T2409]  __kasan_slab_free+0x54/0x70
> [  205.153677][ T2409]  kmem_cache_free+0x159/0x5f0
> [  205.154169][ T2409]  rcu_do_batch+0x311/0x900
> [  205.154643][ T2409]  rcu_core+0x58c/0x6f0
> [  205.155065][ T2409]  handle_softirqs+0xf8/0x570
> [  205.155526][ T2409]  irq_exit_rcu+0x141/0x160
> [  205.156004][ T2409]  sysvec_apic_timer_interrupt+0x76/0x90
> [  205.156575][ T2409]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
> [  205.157196][ T2409]
> [  205.157416][ T2409] Last potentially related work creation:
> [  205.157986][ T2409]  kasan_save_stack+0x24/0x50
> [  205.158404][ T2409]  __kasan_record_aux_stack+0xa6/0xc0
> [  205.158893][ T2409]  __call_rcu_common.constprop.0+0xa5/0x8f0
> [  205.159515][ T2409]  nfsd_file_dispose_list+0x93/0xc0
> [  205.160067][ T2409]  nfsd_file_net_dispose+0x1ad/0x1f0
> [  205.160613][ T2409]  nfsd+0x1ef/0x390
> [  205.161021][ T2409]  kthread+0x1a7/0x1f0
> [  205.161451][ T2409]  ret_from_fork+0x34/0x60
> [  205.161908][ T2409]  ret_from_fork_asm+0x1a/0x30
> [  205.162407][ T2409]
> [  205.162663][ T2409] The buggy address belongs to the object at 
> ffff88810798f3b8
> [  205.162663][ T2409]  which belongs to the cache nfsd_file of size 128
> [  205.164061][ T2409] The buggy address is located 48 bytes inside of
> [  205.164061][ T2409]  freed 128-byte region [ffff88810798f3b8, 
> ffff88810798f438)
> [  205.165446][ T2409]
> [  205.165678][ T2409] The buggy address belongs to the physical page:
> [  205.166340][ T2409] page: refcount:1 mapcount:0 
> mapping:0000000000000000 index:0xffff88810798f4a8 pfn:0x10798e
> [  205.167356][ T2409] head: order:1 mapcount:0 entire_mapcount:0 
> nr_pages_mapped:0 pincount:0
> [  205.168204][ T2409] flags: 0x17ffffc0000240(workingset|head|node=0| 
> zone=2|lastcpupid=0x1fffff)
> [  205.169071][ T2409] page_type: f5(slab)
> [  205.169483][ T2409] raw: 0017ffffc0000240 ffff888443a06500 
> ffff8881214b4bc8 ffff8881214b4bc8
> [  205.170431][ T2409] raw: ffff88810798f4a8 0000000000220005 
> 00000001f5000000 0000000000000000
> [  205.171554][ T2409] head: 0017ffffc0000240 ffff888443a06500 
> ffff8881214b4bc8 ffff8881214b4bc8
> [  205.172630][ T2409] head: ffff88810798f4a8 0000000000220005 
> 00000001f5000000 0000000000000000
> [  205.173509][ T2409] head: 0017ffffc0000001 ffffea00041e6381 
> ffffffffffffffff 0000000000000000
> [  205.174587][ T2409] head: 0000000000000002 0000000000000000 
> 00000000ffffffff 0000000000000000
> [  205.175538][ T2409] page dumped because: kasan: bad access detected
> [  205.176193][ T2409]
> [  205.176427][ T2409] Memory state around the buggy address:
> [  205.176997][ T2409]  ffff88810798f280: fc fc fc fc fc fc fc fc fc fc 
> fc fc fc fc fc fc
> [  205.177810][ T2409]  ffff88810798f300: fc fc fc fc fc fc fc fc fc fc 
> fc fc fc fc fc fc
> [  205.178601][ T2409] >ffff88810798f380: fc fc fc fc fc fc fc fa fb fb 
> fb fb fb fb fb fb
> [  205.179420] 
> [ T2409]                                                           ^
> [  205.180190][ T2409]  ffff88810798f400: fb fb fb fb fb fb fb fc fc fc 
> fc fc fc fc fc fc
> [  205.181000][ T2409]  ffff88810798f480: fc fc fc fc fc fc fc fc fc fc 
> fc fc fc fc fc fc
> [  205.181844][ T2409] 
> ==================================================================
> [  205.182677][ T2409] Disabling lock debugging due to kernel taint
> 
>>
>>> All the places where unhash is done will also perform lru_remove, so 
>>> there
>>> is no need to do lru_remove separately here. After inserting the 
>>> nfsd_file
>>> into the nfsd_file_lru list, it can be released by relying on gc.
>>>
>>> Fixes: 4a0e73e635e3 ("NFSD: Leave open files out of the filecache LRU")
>>
>> The code that is being replaced below was introduced in ac3a2585f018
>> ("nfsd: rework refcounting in filecache"). This fix won't apply to
>> kernels that have only 4a0e73e635e3 but not ac3a2585f018, for instance.
>>
>> At the very least we need to add "Cc: stable@vger.kernel.org # v6.2" but
>> I'm not convinced "Fixes: 4a0e73e635e3" is correct.
> The replaced code is indeed introduced by ac3a2585f018 ("nfsd: rework
> refcounting in filecache").
> However, commit 4a0e73e635e3 ("NFSD: Leave open files out of the filecache
> LRU") added the nfsd_file_lru_add operation in nfsd_file_put, which
> enables concurrent removal of the current nfsd_file by gc, potentially
> triggering UAF when accessing nfsd_file below.
> Therefore, I use 4a0e73e635e3 as the fix tag.
>>
>>
>>> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
>>> ---
>>>   fs/nfsd/filecache.c | 12 ++----------
>>>   1 file changed, 2 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>>> index a1cdba42c4fa..37b65cb1579a 100644
>>> --- a/fs/nfsd/filecache.c
>>> +++ b/fs/nfsd/filecache.c
>>> @@ -372,18 +372,10 @@ nfsd_file_put(struct nfsd_file *nf)
>>>           /* Try to add it to the LRU.  If that fails, decrement. */
>>>           if (nfsd_file_lru_add(nf)) {
>>>               /* If it's still hashed, we're done */
>>> -            if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
>>> +            if (list_lru_count(&nfsd_file_lru))
>>>                   nfsd_file_schedule_laundrette();
>>> -                return;
>>> -            }
>>
>> The above change does not seem to be related to the fix described
>> in the patch description. Can you help me understand why this is
>> needed?
> Firstly, I removed the access operations on nfsd_file, including checking
> nf->nf_flags and nfsd_file_lru_remove, to prevent triggering UAF after
> concurrent release of nfsd_file by gc.

Understood and agreed.


> Secondly, I kept nfsd_file_schedule_laundrette and used list_lru_count to
> determine whether to trigger gc, with the aim of initiating gc and
> reclaiming nfsd_file as soon as possible even if no other process triggers
> gc.

The list_lru_count() remains non-zero for extended periods which might
cause the proposed code to trigger the laundrette more often than is
optimal.


>>> -            /*
>>> -             * We're racing with unhashing, so try to remove it from
>>> -             * the LRU. If removal fails, then someone else already
>>> -             * has our reference.
>>> -             */
>>> -            if (!nfsd_file_lru_remove(nf))
>>> -                return;
>>> +            return;
>>>           }
>>>       }
>>>       if (refcount_dec_and_test(&nf->nf_ref))
>>
>>


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nfsd: free nfsd_file by gc after adding it to lru list
  2025-01-13  2:59 [PATCH] nfsd: free nfsd_file by gc after adding it to lru list Li Lingfeng
  2025-01-13 14:07 ` Chuck Lever
@ 2025-01-14 19:27 ` Jeff Layton
  2025-01-14 19:39   ` Jeff Layton
  2025-01-14 19:40 ` cel
  2 siblings, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2025-01-14 19:27 UTC (permalink / raw)
  To: Li Lingfeng, chuck.lever, neilb, okorniev, Dai.Ngo, tom
  Cc: linux-nfs, linux-kernel, yukuai1, houtao1, yi.zhang, yangerkun,
	lilingfeng

On Mon, 2025-01-13 at 10:59 +0800, Li Lingfeng wrote:
> In nfsd_file_put, after inserting the nfsd_file into the nfsd_file_lru
> list, gc may be triggered in another thread and immediately release this
> nfsd_file, which will lead to a UAF when accessing this nfsd_file again.
> 
> All the places where unhash is done will also perform lru_remove, so there
> is no need to do lru_remove separately here. After inserting the nfsd_file
> into the nfsd_file_lru list, it can be released by relying on gc.
> 
> Fixes: 4a0e73e635e3 ("NFSD: Leave open files out of the filecache LRU")
> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
> ---
>  fs/nfsd/filecache.c | 12 ++----------
>  1 file changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index a1cdba42c4fa..37b65cb1579a 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -372,18 +372,10 @@ nfsd_file_put(struct nfsd_file *nf)
>  		/* Try to add it to the LRU.  If that fails, decrement. */
>  		if (nfsd_file_lru_add(nf)) {
>  			/* If it's still hashed, we're done */
> -			if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
> +			if (list_lru_count(&nfsd_file_lru))
>  				nfsd_file_schedule_laundrette();
> -				return;
> -			}
>  
> -			/*
> -			 * We're racing with unhashing, so try to remove it from
> -			 * the LRU. If removal fails, then someone else already
> -			 * has our reference.
> -			 */
> -			if (!nfsd_file_lru_remove(nf))
> -				return;
> +			return;
>  		}
>  	}
>  	if (refcount_dec_and_test(&nf->nf_ref))

I think this looks OK. Filecache bugs are particularly nasty though, so
let's run this through a nice long testing cycle.

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nfsd: free nfsd_file by gc after adding it to lru list
  2025-01-14 19:27 ` Jeff Layton
@ 2025-01-14 19:39   ` Jeff Layton
  2025-01-15 15:03     ` Chuck Lever
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2025-01-14 19:39 UTC (permalink / raw)
  To: Li Lingfeng, chuck.lever, neilb, okorniev, Dai.Ngo, tom
  Cc: linux-nfs, linux-kernel, yukuai1, houtao1, yi.zhang, yangerkun,
	lilingfeng

On Tue, 2025-01-14 at 14:27 -0500, Jeff Layton wrote:
> On Mon, 2025-01-13 at 10:59 +0800, Li Lingfeng wrote:
> > In nfsd_file_put, after inserting the nfsd_file into the nfsd_file_lru
> > list, gc may be triggered in another thread and immediately release this
> > nfsd_file, which will lead to a UAF when accessing this nfsd_file again.
> > 
> > All the places where unhash is done will also perform lru_remove, so there
> > is no need to do lru_remove separately here. After inserting the nfsd_file
> > into the nfsd_file_lru list, it can be released by relying on gc.
> > 
> > Fixes: 4a0e73e635e3 ("NFSD: Leave open files out of the filecache LRU")
> > Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
> > ---
> >  fs/nfsd/filecache.c | 12 ++----------
> >  1 file changed, 2 insertions(+), 10 deletions(-)
> > 
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index a1cdba42c4fa..37b65cb1579a 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -372,18 +372,10 @@ nfsd_file_put(struct nfsd_file *nf)
> >  		/* Try to add it to the LRU.  If that fails, decrement. */
> >  		if (nfsd_file_lru_add(nf)) {
> >  			/* If it's still hashed, we're done */
> > -			if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
> > +			if (list_lru_count(&nfsd_file_lru))
> >  				nfsd_file_schedule_laundrette();
> > -				return;
> > -			}
> >  
> > -			/*
> > -			 * We're racing with unhashing, so try to remove it from
> > -			 * the LRU. If removal fails, then someone else already
> > -			 * has our reference.
> > -			 */
> > -			if (!nfsd_file_lru_remove(nf))
> > -				return;
> > +			return;
> >  		}
> >  	}
> >  	if (refcount_dec_and_test(&nf->nf_ref))
> 
> I think this looks OK. Filecache bugs are particularly nasty though, so
> let's run this through a nice long testing cycle.
> 
> Reviewed-by: Jeff Layton <jlayton@kernel.org>

Actually, I take it back. This is problematic in another way.

In this case, we're racing with another task that is unhashing the
object, but we've put it on the LRU ourselves. What guarantee do we
have that the unhashing and removal from the LRU didn't occur before
this task called nfsd_file_lru_add()? That's why we attempt to remove
it here -- we can't rely on the task that unhashed it to do so at that
point.

What might be best is to take and hold the rcu_read_lock() before doing
the nfsd_file_lru_add, and just release it after we do these racy
checks. That should make it safe to access the object.

Thoughts?
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nfsd: free nfsd_file by gc after adding it to lru list
  2025-01-13  2:59 [PATCH] nfsd: free nfsd_file by gc after adding it to lru list Li Lingfeng
  2025-01-13 14:07 ` Chuck Lever
  2025-01-14 19:27 ` Jeff Layton
@ 2025-01-14 19:40 ` cel
  2 siblings, 0 replies; 17+ messages in thread
From: cel @ 2025-01-14 19:40 UTC (permalink / raw)
  To: jlayton, neilb, okorniev, Dai.Ngo, tom, Li Lingfeng
  Cc: Chuck Lever, linux-nfs, linux-kernel, yukuai1, houtao1, yi.zhang,
	yangerkun, lilingfeng

From: Chuck Lever <chuck.lever@oracle.com>

On Mon, 13 Jan 2025 10:59:57 +0800, Li Lingfeng wrote:
> In nfsd_file_put, after inserting the nfsd_file into the nfsd_file_lru
> list, gc may be triggered in another thread and immediately release this
> nfsd_file, which will lead to a UAF when accessing this nfsd_file again.
> 
> All the places where unhash is done will also perform lru_remove, so there
> is no need to do lru_remove separately here. After inserting the nfsd_file
> into the nfsd_file_lru list, it can be released by relying on gc.
> 
> [...]

Applied to nfsd-testing, thanks!

Cc-stable tag added.

[1/1] nfsd: free nfsd_file by gc after adding it to lru list
      commit: e57420be100ab3ff6d42992a37ce34f3e03d8d91

--
Chuck Lever


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nfsd: free nfsd_file by gc after adding it to lru list
  2025-01-14 19:39   ` Jeff Layton
@ 2025-01-15 15:03     ` Chuck Lever
  2025-01-15 15:27       ` Jeff Layton
  2025-01-21 20:50       ` Jeff Layton
  0 siblings, 2 replies; 17+ messages in thread
From: Chuck Lever @ 2025-01-15 15:03 UTC (permalink / raw)
  To: Jeff Layton, Li Lingfeng, neilb, okorniev, Dai.Ngo, tom
  Cc: linux-nfs, linux-kernel, yukuai1, houtao1, yi.zhang, yangerkun,
	lilingfeng

On 1/14/25 2:39 PM, Jeff Layton wrote:
> On Tue, 2025-01-14 at 14:27 -0500, Jeff Layton wrote:
>> On Mon, 2025-01-13 at 10:59 +0800, Li Lingfeng wrote:
>>> In nfsd_file_put, after inserting the nfsd_file into the nfsd_file_lru
>>> list, gc may be triggered in another thread and immediately release this
>>> nfsd_file, which will lead to a UAF when accessing this nfsd_file again.
>>>
>>> All the places where unhash is done will also perform lru_remove, so there
>>> is no need to do lru_remove separately here. After inserting the nfsd_file
>>> into the nfsd_file_lru list, it can be released by relying on gc.
>>>
>>> Fixes: 4a0e73e635e3 ("NFSD: Leave open files out of the filecache LRU")
>>> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
>>> ---
>>>   fs/nfsd/filecache.c | 12 ++----------
>>>   1 file changed, 2 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>>> index a1cdba42c4fa..37b65cb1579a 100644
>>> --- a/fs/nfsd/filecache.c
>>> +++ b/fs/nfsd/filecache.c
>>> @@ -372,18 +372,10 @@ nfsd_file_put(struct nfsd_file *nf)
>>>   		/* Try to add it to the LRU.  If that fails, decrement. */
>>>   		if (nfsd_file_lru_add(nf)) {
>>>   			/* If it's still hashed, we're done */
>>> -			if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
>>> +			if (list_lru_count(&nfsd_file_lru))
>>>   				nfsd_file_schedule_laundrette();
>>> -				return;
>>> -			}
>>>   
>>> -			/*
>>> -			 * We're racing with unhashing, so try to remove it from
>>> -			 * the LRU. If removal fails, then someone else already
>>> -			 * has our reference.
>>> -			 */
>>> -			if (!nfsd_file_lru_remove(nf))
>>> -				return;
>>> +			return;
>>>   		}
>>>   	}
>>>   	if (refcount_dec_and_test(&nf->nf_ref))
>>
>> I think this looks OK. Filecache bugs are particularly nasty though, so
>> let's run this through a nice long testing cycle.
>>
>> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> 
> Actually, I take it back. This is problematic in another way.
> 
> In this case, we're racing with another task that is unhashing the
> object, but we've put it on the LRU ourselves. What guarantee do we
> have that the unhashing and removal from the LRU didn't occur before
> this task called nfsd_file_lru_add()? That's why we attempt to remove
> it here -- we can't rely on the task that unhashed it to do so at that
> point.
> 
> What might be best is to take and hold the rcu_read_lock() before doing
> the nfsd_file_lru_add, and just release it after we do these racy
> checks. That should make it safe to access the object.
> 
> Thoughts?

Holding the RCU read lock will keep the dereferences safe since
nfsd_file objects are freed only after an RCU grace period. But will the
logic in nfsd_file_put() work properly on totally dead nfsd_file
objects? I don't see a specific failure mode there, but I'm not very
imaginative.

Overall, I think RCU would help.


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nfsd: free nfsd_file by gc after adding it to lru list
  2025-01-15 15:03     ` Chuck Lever
@ 2025-01-15 15:27       ` Jeff Layton
  2025-01-22  1:33         ` Li Lingfeng
  2025-01-21 20:50       ` Jeff Layton
  1 sibling, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2025-01-15 15:27 UTC (permalink / raw)
  To: Chuck Lever, Li Lingfeng, neilb, okorniev, Dai.Ngo, tom
  Cc: linux-nfs, linux-kernel, yukuai1, houtao1, yi.zhang, yangerkun,
	lilingfeng

On Wed, 2025-01-15 at 10:03 -0500, Chuck Lever wrote:
> On 1/14/25 2:39 PM, Jeff Layton wrote:
> > On Tue, 2025-01-14 at 14:27 -0500, Jeff Layton wrote:
> > > On Mon, 2025-01-13 at 10:59 +0800, Li Lingfeng wrote:
> > > > In nfsd_file_put, after inserting the nfsd_file into the nfsd_file_lru
> > > > list, gc may be triggered in another thread and immediately release this
> > > > nfsd_file, which will lead to a UAF when accessing this nfsd_file again.
> > > > 
> > > > All the places where unhash is done will also perform lru_remove, so there
> > > > is no need to do lru_remove separately here. After inserting the nfsd_file
> > > > into the nfsd_file_lru list, it can be released by relying on gc.
> > > > 
> > > > Fixes: 4a0e73e635e3 ("NFSD: Leave open files out of the filecache LRU")
> > > > Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
> > > > ---
> > > >   fs/nfsd/filecache.c | 12 ++----------
> > > >   1 file changed, 2 insertions(+), 10 deletions(-)
> > > > 
> > > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > > index a1cdba42c4fa..37b65cb1579a 100644
> > > > --- a/fs/nfsd/filecache.c
> > > > +++ b/fs/nfsd/filecache.c
> > > > @@ -372,18 +372,10 @@ nfsd_file_put(struct nfsd_file *nf)
> > > >   		/* Try to add it to the LRU.  If that fails, decrement. */
> > > >   		if (nfsd_file_lru_add(nf)) {
> > > >   			/* If it's still hashed, we're done */
> > > > -			if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
> > > > +			if (list_lru_count(&nfsd_file_lru))
> > > >   				nfsd_file_schedule_laundrette();
> > > > -				return;
> > > > -			}
> > > >   
> > > > -			/*
> > > > -			 * We're racing with unhashing, so try to remove it from
> > > > -			 * the LRU. If removal fails, then someone else already
> > > > -			 * has our reference.
> > > > -			 */
> > > > -			if (!nfsd_file_lru_remove(nf))
> > > > -				return;
> > > > +			return;
> > > >   		}
> > > >   	}
> > > >   	if (refcount_dec_and_test(&nf->nf_ref))
> > > 
> > > I think this looks OK. Filecache bugs are particularly nasty though, so
> > > let's run this through a nice long testing cycle.
> > > 
> > > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > 
> > Actually, I take it back. This is problematic in another way.
> > 
> > In this case, we're racing with another task that is unhashing the
> > object, but we've put it on the LRU ourselves. What guarantee do we
> > have that the unhashing and removal from the LRU didn't occur before
> > this task called nfsd_file_lru_add()? That's why we attempt to remove
> > it here -- we can't rely on the task that unhashed it to do so at that
> > point.
> > 
> > What might be best is to take and hold the rcu_read_lock() before doing
> > the nfsd_file_lru_add, and just release it after we do these racy
> > checks. That should make it safe to access the object.
> > 
> > Thoughts?
> 
> Holding the RCU read lock will keep the dereferences safe since
> nfsd_file objects are freed only after an RCU grace period. But will the
> logic in nfsd_file_put() work properly on totally dead nfsd_file
> objects? I don't see a specific failure mode there, but I'm not very
> imaginative.
> 
> Overall, I think RCU would help.
> 

It should be safe to call nfsd_file_lru_add() with the rcu_read_lock()
held. After that we're just looking at the nf_flags() and the nf_lru
list head. On a dead file, HASHED will be clear and the
nfsd_file_lru_remove() call will be a no-op (the list_head will be
empty).

Li Lingfeng, do you want to propose a patch for this? Unfortunately,
your reproducer won't work after that, since you can't sleep with the
rcu_read_lock held.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nfsd: free nfsd_file by gc after adding it to lru list
  2025-01-15 15:03     ` Chuck Lever
  2025-01-15 15:27       ` Jeff Layton
@ 2025-01-21 20:50       ` Jeff Layton
  2025-01-22  1:15         ` NeilBrown
  1 sibling, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2025-01-21 20:50 UTC (permalink / raw)
  To: Chuck Lever, Li Lingfeng, neilb, okorniev, Dai.Ngo, tom
  Cc: linux-nfs, linux-kernel, yukuai1, houtao1, yi.zhang, yangerkun,
	lilingfeng

On Wed, 2025-01-15 at 10:03 -0500, Chuck Lever wrote:
> On 1/14/25 2:39 PM, Jeff Layton wrote:
> > On Tue, 2025-01-14 at 14:27 -0500, Jeff Layton wrote:
> > > On Mon, 2025-01-13 at 10:59 +0800, Li Lingfeng wrote:
> > > > In nfsd_file_put, after inserting the nfsd_file into the nfsd_file_lru
> > > > list, gc may be triggered in another thread and immediately release this
> > > > nfsd_file, which will lead to a UAF when accessing this nfsd_file again.
> > > > 
> > > > All the places where unhash is done will also perform lru_remove, so there
> > > > is no need to do lru_remove separately here. After inserting the nfsd_file
> > > > into the nfsd_file_lru list, it can be released by relying on gc.
> > > > 
> > > > Fixes: 4a0e73e635e3 ("NFSD: Leave open files out of the filecache LRU")
> > > > Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
> > > > ---
> > > >   fs/nfsd/filecache.c | 12 ++----------
> > > >   1 file changed, 2 insertions(+), 10 deletions(-)
> > > > 
> > > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > > index a1cdba42c4fa..37b65cb1579a 100644
> > > > --- a/fs/nfsd/filecache.c
> > > > +++ b/fs/nfsd/filecache.c
> > > > @@ -372,18 +372,10 @@ nfsd_file_put(struct nfsd_file *nf)
> > > >   		/* Try to add it to the LRU.  If that fails, decrement. */
> > > >   		if (nfsd_file_lru_add(nf)) {
> > > >   			/* If it's still hashed, we're done */
> > > > -			if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
> > > > +			if (list_lru_count(&nfsd_file_lru))
> > > >   				nfsd_file_schedule_laundrette();
> > > > -				return;
> > > > -			}
> > > >   
> > > > -			/*
> > > > -			 * We're racing with unhashing, so try to remove it from
> > > > -			 * the LRU. If removal fails, then someone else already
> > > > -			 * has our reference.
> > > > -			 */
> > > > -			if (!nfsd_file_lru_remove(nf))
> > > > -				return;
> > > > +			return;
> > > >   		}
> > > >   	}
> > > >   	if (refcount_dec_and_test(&nf->nf_ref))
> > > 
> > > I think this looks OK. Filecache bugs are particularly nasty though, so
> > > let's run this through a nice long testing cycle.
> > > 
> > > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > 
> > Actually, I take it back. This is problematic in another way.
> > 
> > In this case, we're racing with another task that is unhashing the
> > object, but we've put it on the LRU ourselves. What guarantee do we
> > have that the unhashing and removal from the LRU didn't occur before
> > this task called nfsd_file_lru_add()? That's why we attempt to remove
> > it here -- we can't rely on the task that unhashed it to do so at that
> > point.
> > 
> > What might be best is to take and hold the rcu_read_lock() before doing
> > the nfsd_file_lru_add, and just release it after we do these racy
> > checks. That should make it safe to access the object.
> > 
> > Thoughts?
> 
> Holding the RCU read lock will keep the dereferences safe since
> nfsd_file objects are freed only after an RCU grace period. But will the
> logic in nfsd_file_put() work properly on totally dead nfsd_file
> objects? I don't see a specific failure mode there, but I'm not very
> imaginative.
> 
> Overall, I think RCU would help.
> 
> 

To be clear, I think we need to drop e57420be100ab from your nfsd-
testing branch. The race I identified above is quite likely to occur
and could lead to leaks.

If Li Lingfeng doesn't propose a patch, I'll spin one up tomorrow. I
think the RCU approach is safe.

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nfsd: free nfsd_file by gc after adding it to lru list
  2025-01-21 20:50       ` Jeff Layton
@ 2025-01-22  1:15         ` NeilBrown
  2025-01-22  1:43           ` Li Lingfeng
  2025-01-22  3:48           ` NeilBrown
  0 siblings, 2 replies; 17+ messages in thread
From: NeilBrown @ 2025-01-22  1:15 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Chuck Lever, Li Lingfeng, okorniev, Dai.Ngo, tom, linux-nfs,
	linux-kernel, yukuai1, houtao1, yi.zhang, yangerkun, lilingfeng

On Wed, 22 Jan 2025, Jeff Layton wrote:
> On Wed, 2025-01-15 at 10:03 -0500, Chuck Lever wrote:
> > On 1/14/25 2:39 PM, Jeff Layton wrote:
> > > On Tue, 2025-01-14 at 14:27 -0500, Jeff Layton wrote:
> > > > On Mon, 2025-01-13 at 10:59 +0800, Li Lingfeng wrote:
> > > > > In nfsd_file_put, after inserting the nfsd_file into the nfsd_file_lru
> > > > > list, gc may be triggered in another thread and immediately release this
> > > > > nfsd_file, which will lead to a UAF when accessing this nfsd_file again.
> > > > > 
> > > > > All the places where unhash is done will also perform lru_remove, so there
> > > > > is no need to do lru_remove separately here. After inserting the nfsd_file
> > > > > into the nfsd_file_lru list, it can be released by relying on gc.
> > > > > 
> > > > > Fixes: 4a0e73e635e3 ("NFSD: Leave open files out of the filecache LRU")
> > > > > Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
> > > > > ---
> > > > >   fs/nfsd/filecache.c | 12 ++----------
> > > > >   1 file changed, 2 insertions(+), 10 deletions(-)
> > > > > 
> > > > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > > > index a1cdba42c4fa..37b65cb1579a 100644
> > > > > --- a/fs/nfsd/filecache.c
> > > > > +++ b/fs/nfsd/filecache.c
> > > > > @@ -372,18 +372,10 @@ nfsd_file_put(struct nfsd_file *nf)
> > > > >   		/* Try to add it to the LRU.  If that fails, decrement. */
> > > > >   		if (nfsd_file_lru_add(nf)) {
> > > > >   			/* If it's still hashed, we're done */
> > > > > -			if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
> > > > > +			if (list_lru_count(&nfsd_file_lru))
> > > > >   				nfsd_file_schedule_laundrette();
> > > > > -				return;
> > > > > -			}
> > > > >   
> > > > > -			/*
> > > > > -			 * We're racing with unhashing, so try to remove it from
> > > > > -			 * the LRU. If removal fails, then someone else already
> > > > > -			 * has our reference.
> > > > > -			 */
> > > > > -			if (!nfsd_file_lru_remove(nf))
> > > > > -				return;
> > > > > +			return;
> > > > >   		}
> > > > >   	}
> > > > >   	if (refcount_dec_and_test(&nf->nf_ref))
> > > > 
> > > > I think this looks OK. Filecache bugs are particularly nasty though, so
> > > > let's run this through a nice long testing cycle.
> > > > 
> > > > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > > 
> > > Actually, I take it back. This is problematic in another way.
> > > 
> > > In this case, we're racing with another task that is unhashing the
> > > object, but we've put it on the LRU ourselves. What guarantee do we
> > > have that the unhashing and removal from the LRU didn't occur before
> > > this task called nfsd_file_lru_add()? That's why we attempt to remove
> > > it here -- we can't rely on the task that unhashed it to do so at that
> > > point.
> > > 
> > > What might be best is to take and hold the rcu_read_lock() before doing
> > > the nfsd_file_lru_add, and just release it after we do these racy
> > > checks. That should make it safe to access the object.
> > > 
> > > Thoughts?
> > 
> > Holding the RCU read lock will keep the dereferences safe since
> > nfsd_file objects are freed only after an RCU grace period. But will the
> > logic in nfsd_file_put() work properly on totally dead nfsd_file
> > objects? I don't see a specific failure mode there, but I'm not very
> > imaginative.
> > 
> > Overall, I think RCU would help.
> > 
> > 
> 
> To be clear, I think we need to drop e57420be100ab from your nfsd-
> testing branch. The race I identified above is quite likely to occur
> and could lead to leaks.
> 
> If Li Lingfeng doesn't propose a patch, I'll spin one up tomorrow. I
> think the RCU approach is safe.

I'm not convinced this is the right approach.
I cannot see how nfsd_file_put() can race with unhashing.  If it cannot
then we can simply unconditionally call nfsd_file_schedule_laundrette().

Can describe how the race can happen - if indeed it can.

Note that we also need rcu protection in nfsd_file_lru_add() so that the
nf doesn't get freed after it is added the the lru and before the trace
point.  If we don't end up needing rcu round the call, we will need it
in the call.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nfsd: free nfsd_file by gc after adding it to lru list
  2025-01-15 15:27       ` Jeff Layton
@ 2025-01-22  1:33         ` Li Lingfeng
  0 siblings, 0 replies; 17+ messages in thread
From: Li Lingfeng @ 2025-01-22  1:33 UTC (permalink / raw)
  To: Jeff Layton, Chuck Lever, neilb, okorniev, Dai.Ngo, tom
  Cc: linux-nfs, linux-kernel, yukuai1, houtao1, yi.zhang, yangerkun,
	lilingfeng


在 2025/1/15 23:27, Jeff Layton 写道:
> On Wed, 2025-01-15 at 10:03 -0500, Chuck Lever wrote:
>> On 1/14/25 2:39 PM, Jeff Layton wrote:
>>> On Tue, 2025-01-14 at 14:27 -0500, Jeff Layton wrote:
>>>> On Mon, 2025-01-13 at 10:59 +0800, Li Lingfeng wrote:
>>>>> In nfsd_file_put, after inserting the nfsd_file into the nfsd_file_lru
>>>>> list, gc may be triggered in another thread and immediately release this
>>>>> nfsd_file, which will lead to a UAF when accessing this nfsd_file again.
>>>>>
>>>>> All the places where unhash is done will also perform lru_remove, so there
>>>>> is no need to do lru_remove separately here. After inserting the nfsd_file
>>>>> into the nfsd_file_lru list, it can be released by relying on gc.
>>>>>
>>>>> Fixes: 4a0e73e635e3 ("NFSD: Leave open files out of the filecache LRU")
>>>>> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
>>>>> ---
>>>>>    fs/nfsd/filecache.c | 12 ++----------
>>>>>    1 file changed, 2 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>>>>> index a1cdba42c4fa..37b65cb1579a 100644
>>>>> --- a/fs/nfsd/filecache.c
>>>>> +++ b/fs/nfsd/filecache.c
>>>>> @@ -372,18 +372,10 @@ nfsd_file_put(struct nfsd_file *nf)
>>>>>    		/* Try to add it to the LRU.  If that fails, decrement. */
>>>>>    		if (nfsd_file_lru_add(nf)) {
>>>>>    			/* If it's still hashed, we're done */
>>>>> -			if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
>>>>> +			if (list_lru_count(&nfsd_file_lru))
>>>>>    				nfsd_file_schedule_laundrette();
>>>>> -				return;
>>>>> -			}
>>>>>    
>>>>> -			/*
>>>>> -			 * We're racing with unhashing, so try to remove it from
>>>>> -			 * the LRU. If removal fails, then someone else already
>>>>> -			 * has our reference.
>>>>> -			 */
>>>>> -			if (!nfsd_file_lru_remove(nf))
>>>>> -				return;
>>>>> +			return;
>>>>>    		}
>>>>>    	}
>>>>>    	if (refcount_dec_and_test(&nf->nf_ref))
>>>> I think this looks OK. Filecache bugs are particularly nasty though, so
>>>> let's run this through a nice long testing cycle.
>>>>
>>>> Reviewed-by: Jeff Layton <jlayton@kernel.org>
>>> Actually, I take it back. This is problematic in another way.
>>>
>>> In this case, we're racing with another task that is unhashing the
>>> object, but we've put it on the LRU ourselves. What guarantee do we
>>> have that the unhashing and removal from the LRU didn't occur before
>>> this task called nfsd_file_lru_add()? That's why we attempt to remove
>>> it here -- we can't rely on the task that unhashed it to do so at that
>>> point.
>>>
>>> What might be best is to take and hold the rcu_read_lock() before doing
>>> the nfsd_file_lru_add, and just release it after we do these racy
>>> checks. That should make it safe to access the object.
>>>
>>> Thoughts?
>> Holding the RCU read lock will keep the dereferences safe since
>> nfsd_file objects are freed only after an RCU grace period. But will the
>> logic in nfsd_file_put() work properly on totally dead nfsd_file
>> objects? I don't see a specific failure mode there, but I'm not very
>> imaginative.
>>
>> Overall, I think RCU would help.
>>
> It should be safe to call nfsd_file_lru_add() with the rcu_read_lock()
> held. After that we're just looking at the nf_flags() and the nf_lru
> list head. On a dead file, HASHED will be clear and the
> nfsd_file_lru_remove() call will be a no-op (the list_head will be
> empty).
>
> Li Lingfeng, do you want to propose a patch for this? Unfortunately,
> your reproducer won't work after that, since you can't sleep with the
> rcu_read_lock held.
Sorry for the delay.
Of course I'm willing to do it.
Maybe I can reproduce the problem by changing msleep to mdelay? I will try.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nfsd: free nfsd_file by gc after adding it to lru list
  2025-01-22  1:15         ` NeilBrown
@ 2025-01-22  1:43           ` Li Lingfeng
  2025-01-22  2:21             ` Li Lingfeng
  2025-01-22  3:48           ` NeilBrown
  1 sibling, 1 reply; 17+ messages in thread
From: Li Lingfeng @ 2025-01-22  1:43 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton
  Cc: Chuck Lever, okorniev, Dai.Ngo, tom, linux-nfs, linux-kernel,
	yukuai1, houtao1, yi.zhang, yangerkun, lilingfeng


在 2025/1/22 9:15, NeilBrown 写道:
> On Wed, 22 Jan 2025, Jeff Layton wrote:
>> On Wed, 2025-01-15 at 10:03 -0500, Chuck Lever wrote:
>>> On 1/14/25 2:39 PM, Jeff Layton wrote:
>>>> On Tue, 2025-01-14 at 14:27 -0500, Jeff Layton wrote:
>>>>> On Mon, 2025-01-13 at 10:59 +0800, Li Lingfeng wrote:
>>>>>> In nfsd_file_put, after inserting the nfsd_file into the nfsd_file_lru
>>>>>> list, gc may be triggered in another thread and immediately release this
>>>>>> nfsd_file, which will lead to a UAF when accessing this nfsd_file again.
>>>>>>
>>>>>> All the places where unhash is done will also perform lru_remove, so there
>>>>>> is no need to do lru_remove separately here. After inserting the nfsd_file
>>>>>> into the nfsd_file_lru list, it can be released by relying on gc.
>>>>>>
>>>>>> Fixes: 4a0e73e635e3 ("NFSD: Leave open files out of the filecache LRU")
>>>>>> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
>>>>>> ---
>>>>>>    fs/nfsd/filecache.c | 12 ++----------
>>>>>>    1 file changed, 2 insertions(+), 10 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>>>>>> index a1cdba42c4fa..37b65cb1579a 100644
>>>>>> --- a/fs/nfsd/filecache.c
>>>>>> +++ b/fs/nfsd/filecache.c
>>>>>> @@ -372,18 +372,10 @@ nfsd_file_put(struct nfsd_file *nf)
>>>>>>    		/* Try to add it to the LRU.  If that fails, decrement. */
>>>>>>    		if (nfsd_file_lru_add(nf)) {
>>>>>>    			/* If it's still hashed, we're done */
>>>>>> -			if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
>>>>>> +			if (list_lru_count(&nfsd_file_lru))
>>>>>>    				nfsd_file_schedule_laundrette();
>>>>>> -				return;
>>>>>> -			}
>>>>>>    
>>>>>> -			/*
>>>>>> -			 * We're racing with unhashing, so try to remove it from
>>>>>> -			 * the LRU. If removal fails, then someone else already
>>>>>> -			 * has our reference.
>>>>>> -			 */
>>>>>> -			if (!nfsd_file_lru_remove(nf))
>>>>>> -				return;
>>>>>> +			return;
>>>>>>    		}
>>>>>>    	}
>>>>>>    	if (refcount_dec_and_test(&nf->nf_ref))
>>>>> I think this looks OK. Filecache bugs are particularly nasty though, so
>>>>> let's run this through a nice long testing cycle.
>>>>>
>>>>> Reviewed-by: Jeff Layton <jlayton@kernel.org>
>>>> Actually, I take it back. This is problematic in another way.
>>>>
>>>> In this case, we're racing with another task that is unhashing the
>>>> object, but we've put it on the LRU ourselves. What guarantee do we
>>>> have that the unhashing and removal from the LRU didn't occur before
>>>> this task called nfsd_file_lru_add()? That's why we attempt to remove
>>>> it here -- we can't rely on the task that unhashed it to do so at that
>>>> point.
>>>>
>>>> What might be best is to take and hold the rcu_read_lock() before doing
>>>> the nfsd_file_lru_add, and just release it after we do these racy
>>>> checks. That should make it safe to access the object.
>>>>
>>>> Thoughts?
>>> Holding the RCU read lock will keep the dereferences safe since
>>> nfsd_file objects are freed only after an RCU grace period. But will the
>>> logic in nfsd_file_put() work properly on totally dead nfsd_file
>>> objects? I don't see a specific failure mode there, but I'm not very
>>> imaginative.
>>>
>>> Overall, I think RCU would help.
>>>
>>>
>> To be clear, I think we need to drop e57420be100ab from your nfsd-
>> testing branch. The race I identified above is quite likely to occur
>> and could lead to leaks.
>>
>> If Li Lingfeng doesn't propose a patch, I'll spin one up tomorrow. I
>> think the RCU approach is safe.
> I'm not convinced this is the right approach.
> I cannot see how nfsd_file_put() can race with unhashing.  If it cannot
> then we can simply unconditionally call nfsd_file_schedule_laundrette().
>
> Can describe how the race can happen - if indeed it can.
At the beginning, our testing team discovered this issue on 5.10 using
their own test cases (I'm not clear on how their test cases were
constructed).
Then, I found that there might be a problem here, so I reproduced the
phenomenon in the above way.
I will ask if their test cases can be shared externally or if they can
describe what their test cases did.
> Note that we also need rcu protection in nfsd_file_lru_add() so that the
> nf doesn't get freed after it is added the the lru and before the trace
> point.  If we don't end up needing rcu round the call, we will need it
> in the call.
>
> Thanks,
> NeilBrown

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nfsd: free nfsd_file by gc after adding it to lru list
  2025-01-22  1:43           ` Li Lingfeng
@ 2025-01-22  2:21             ` Li Lingfeng
  0 siblings, 0 replies; 17+ messages in thread
From: Li Lingfeng @ 2025-01-22  2:21 UTC (permalink / raw)
  To: Li Lingfeng, NeilBrown, Jeff Layton
  Cc: Chuck Lever, okorniev, Dai.Ngo, tom, linux-nfs, linux-kernel,
	yukuai1, houtao1, yi.zhang, yangerkun


在 2025/1/22 9:43, Li Lingfeng 写道:
>
> 在 2025/1/22 9:15, NeilBrown 写道:
>> On Wed, 22 Jan 2025, Jeff Layton wrote:
>>> On Wed, 2025-01-15 at 10:03 -0500, Chuck Lever wrote:
>>>> On 1/14/25 2:39 PM, Jeff Layton wrote:
>>>>> On Tue, 2025-01-14 at 14:27 -0500, Jeff Layton wrote:
>>>>>> On Mon, 2025-01-13 at 10:59 +0800, Li Lingfeng wrote:
>>>>>>> In nfsd_file_put, after inserting the nfsd_file into the 
>>>>>>> nfsd_file_lru
>>>>>>> list, gc may be triggered in another thread and immediately 
>>>>>>> release this
>>>>>>> nfsd_file, which will lead to a UAF when accessing this 
>>>>>>> nfsd_file again.
>>>>>>>
>>>>>>> All the places where unhash is done will also perform 
>>>>>>> lru_remove, so there
>>>>>>> is no need to do lru_remove separately here. After inserting the 
>>>>>>> nfsd_file
>>>>>>> into the nfsd_file_lru list, it can be released by relying on gc.
>>>>>>>
>>>>>>> Fixes: 4a0e73e635e3 ("NFSD: Leave open files out of the 
>>>>>>> filecache LRU")
>>>>>>> Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
>>>>>>> ---
>>>>>>>    fs/nfsd/filecache.c | 12 ++----------
>>>>>>>    1 file changed, 2 insertions(+), 10 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>>>>>>> index a1cdba42c4fa..37b65cb1579a 100644
>>>>>>> --- a/fs/nfsd/filecache.c
>>>>>>> +++ b/fs/nfsd/filecache.c
>>>>>>> @@ -372,18 +372,10 @@ nfsd_file_put(struct nfsd_file *nf)
>>>>>>>            /* Try to add it to the LRU.  If that fails, 
>>>>>>> decrement. */
>>>>>>>            if (nfsd_file_lru_add(nf)) {
>>>>>>>                /* If it's still hashed, we're done */
>>>>>>> -            if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
>>>>>>> +            if (list_lru_count(&nfsd_file_lru))
>>>>>>>                    nfsd_file_schedule_laundrette();
>>>>>>> -                return;
>>>>>>> -            }
>>>>>>>    -            /*
>>>>>>> -             * We're racing with unhashing, so try to remove it 
>>>>>>> from
>>>>>>> -             * the LRU. If removal fails, then someone else 
>>>>>>> already
>>>>>>> -             * has our reference.
>>>>>>> -             */
>>>>>>> -            if (!nfsd_file_lru_remove(nf))
>>>>>>> -                return;
>>>>>>> +            return;
>>>>>>>            }
>>>>>>>        }
>>>>>>>        if (refcount_dec_and_test(&nf->nf_ref))
>>>>>> I think this looks OK. Filecache bugs are particularly nasty 
>>>>>> though, so
>>>>>> let's run this through a nice long testing cycle.
>>>>>>
>>>>>> Reviewed-by: Jeff Layton <jlayton@kernel.org>
>>>>> Actually, I take it back. This is problematic in another way.
>>>>>
>>>>> In this case, we're racing with another task that is unhashing the
>>>>> object, but we've put it on the LRU ourselves. What guarantee do we
>>>>> have that the unhashing and removal from the LRU didn't occur before
>>>>> this task called nfsd_file_lru_add()? That's why we attempt to remove
>>>>> it here -- we can't rely on the task that unhashed it to do so at 
>>>>> that
>>>>> point.
>>>>>
>>>>> What might be best is to take and hold the rcu_read_lock() before 
>>>>> doing
>>>>> the nfsd_file_lru_add, and just release it after we do these racy
>>>>> checks. That should make it safe to access the object.
>>>>>
>>>>> Thoughts?
>>>> Holding the RCU read lock will keep the dereferences safe since
>>>> nfsd_file objects are freed only after an RCU grace period. But 
>>>> will the
>>>> logic in nfsd_file_put() work properly on totally dead nfsd_file
>>>> objects? I don't see a specific failure mode there, but I'm not very
>>>> imaginative.
>>>>
>>>> Overall, I think RCU would help.
>>>>
>>>>
>>> To be clear, I think we need to drop e57420be100ab from your nfsd-
>>> testing branch. The race I identified above is quite likely to occur
>>> and could lead to leaks.
>>>
>>> If Li Lingfeng doesn't propose a patch, I'll spin one up tomorrow. I
>>> think the RCU approach is safe.
>> I'm not convinced this is the right approach.
>> I cannot see how nfsd_file_put() can race with unhashing.  If it cannot
>> then we can simply unconditionally call nfsd_file_schedule_laundrette().
>>
>> Can describe how the race can happen - if indeed it can.
> At the beginning, our testing team discovered this issue on 5.10 using
> their own test cases (I'm not clear on how their test cases were
> constructed).
> Then, I found that there might be a problem here, so I reproduced the
> phenomenon in the above way.
> I will ask if their test cases can be shared externally or if they can
> describe what their test cases did.
They used three machines for the test: one server and two clients.
ClientA continuously creates and deletes a file, while ClientB uses fio to
perform random writes on this file.

I found the original stack trace (since I only have a screenshot, I can
only extract some important information from it):
list_del corruption. prev-next should be ffff000052be6828, but was 
ffff000052221e48
WARNING:CPU: 1 PID: 12224at lib/list_debug.c:95 
list_del_entry_valid+Oxe0/0x10
...
Call trace:
  list_del_entry_valid+Oxe0/0x10c
  list_lru_del+0xb4/0x19c
  nfsd_file_put+0x134/0xla0 [nfsd]
  nfsd_read+0xc0/0xldo [nfsd
  nfsd3_proc_read+0xl44/0xla0 [nfsd]
  nfsd_dispatch+0x190/0x254 [nfsd]
  svc_process_common+Ox52c/0x780 [sunrpc]
  svc_process+Ox88/0xe4 [sunrpc]
  nfsd+0xb4/0x160 [nfsd]
  kthread+0x108/0x134
  ret_from_fork+ох10/0xl8
---[ end trace f87713f92e49e854 ]---

refcount_t: underflow;use-after-free.
WARNING:CPU: 1 PID: 12220 at lib/refcount.c:28 
refcount_warn_saturate+0xf4/0xl44
...
Call trace:
  refcount_warn_saturate+0xf4/0x144
  nfsd_file_put+0x160/0xla0 [nfsd]
  nfsd_read+oxco/0xld0 [nfsd]
  nfsd3_proc_read+0xl44/0xla0 [nfsd]
  nfsd_dispatch+0x190/0x254 [nfsd]
  svc_process_common+Ox52c/0x780[sunrpc]
  svc_process+0x88/0xe4_ [sunrpc]
  nfsd+0xb4/0x160 [nfsd]
  kthread+0x108/0x134
  ret_from_fork+0x10/0x18
  ---[ end tracef87713f92e49e853 ]---

After applying this modification, this test case no longer encountered any
issues.
>> Note that we also need rcu protection in nfsd_file_lru_add() so that the
>> nf doesn't get freed after it is added the the lru and before the trace
>> point.  If we don't end up needing rcu round the call, we will need it
>> in the call.
>>
>> Thanks,
>> NeilBrown


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nfsd: free nfsd_file by gc after adding it to lru list
  2025-01-22  1:15         ` NeilBrown
  2025-01-22  1:43           ` Li Lingfeng
@ 2025-01-22  3:48           ` NeilBrown
  2025-01-22  7:31             ` Li Lingfeng
  2025-01-22 12:31             ` Jeff Layton
  1 sibling, 2 replies; 17+ messages in thread
From: NeilBrown @ 2025-01-22  3:48 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Chuck Lever, Li Lingfeng, okorniev, Dai.Ngo, tom, linux-nfs,
	linux-kernel, yukuai1, houtao1, yi.zhang, yangerkun, lilingfeng

On Wed, 22 Jan 2025, NeilBrown wrote:
> On Wed, 22 Jan 2025, Jeff Layton wrote:
> > To be clear, I think we need to drop e57420be100ab from your nfsd-
> > testing branch. The race I identified above is quite likely to occur
> > and could lead to leaks.
> > 
> > If Li Lingfeng doesn't propose a patch, I'll spin one up tomorrow. I
> > think the RCU approach is safe.
> 
> I'm not convinced this is the right approach.
> I cannot see how nfsd_file_put() can race with unhashing.  If it cannot
> then we can simply unconditionally call nfsd_file_schedule_laundrette().
> 
> Can describe how the race can happen - if indeed it can.

I thought I should explore this more and explain what I think actually
happens ...

Certainly nfsd_file_unhash() might race with nfsd_file_put().  At this
point in nfsd_file_put() we have the only reference but a hash lookup
could gain another reference and the immediately unhash it.
nfsd_file_queue_for_close() can do this.  There might be other paths.

But why does this mean we need to remove it from the lru and free it
immediately?  If we leave it on the lru it will be freed in a couple of
seconds.

The reason might be nfsd_file_close_inode_sync().  This needs to close
files before returning.
But if nfsd_file_close_inode_sync() is called while some other thread
holds a reference to the file and might want to call nfsd_file_put(),
then it isn't going to succeed anyway so any race here doesn't make any
difference.

So I think the following might be the best fix

???

Thanks,
NeilBrown

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index fcd751cb7c76..773788a50e56 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -322,10 +322,13 @@ nfsd_file_check_writeback(struct nfsd_file *nf)
 static bool nfsd_file_lru_add(struct nfsd_file *nf)
 {
 	set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
+	rcu_read_lock();
 	if (list_lru_add_obj(&nfsd_file_lru, &nf->nf_lru)) {
 		trace_nfsd_file_lru_add(nf);
+		rcu_read_unlock();
 		return true;
 	}
+	rcu_read_unlock();
 	return false;
 }
 
@@ -371,19 +374,8 @@ nfsd_file_put(struct nfsd_file *nf)
 
 		/* Try to add it to the LRU.  If that fails, decrement. */
 		if (nfsd_file_lru_add(nf)) {
-			/* If it's still hashed, we're done */
-			if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
-				nfsd_file_schedule_laundrette();
-				return;
-			}
-
-			/*
-			 * We're racing with unhashing, so try to remove it from
-			 * the LRU. If removal fails, then someone else already
-			 * has our reference.
-			 */
-			if (!nfsd_file_lru_remove(nf))
-				return;
+			nfsd_file_schedule_laundrette();
+			return;
 		}
 	}
 	if (refcount_dec_and_test(&nf->nf_ref))


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] nfsd: free nfsd_file by gc after adding it to lru list
  2025-01-22  3:48           ` NeilBrown
@ 2025-01-22  7:31             ` Li Lingfeng
  2025-01-22 12:31             ` Jeff Layton
  1 sibling, 0 replies; 17+ messages in thread
From: Li Lingfeng @ 2025-01-22  7:31 UTC (permalink / raw)
  To: NeilBrown, Jeff Layton
  Cc: Chuck Lever, okorniev, Dai.Ngo, tom, linux-nfs, linux-kernel,
	yukuai1, houtao1, yi.zhang, yangerkun, lilingfeng


在 2025/1/22 11:48, NeilBrown 写道:
> On Wed, 22 Jan 2025, NeilBrown wrote:
>> On Wed, 22 Jan 2025, Jeff Layton wrote:
>>> To be clear, I think we need to drop e57420be100ab from your nfsd-
>>> testing branch. The race I identified above is quite likely to occur
>>> and could lead to leaks.
>>>
>>> If Li Lingfeng doesn't propose a patch, I'll spin one up tomorrow. I
>>> think the RCU approach is safe.
>> I'm not convinced this is the right approach.
>> I cannot see how nfsd_file_put() can race with unhashing.  If it cannot
>> then we can simply unconditionally call nfsd_file_schedule_laundrette().
>>
>> Can describe how the race can happen - if indeed it can.
> I thought I should explore this more and explain what I think actually
> happens ...
>
> Certainly nfsd_file_unhash() might race with nfsd_file_put().  At this
> point in nfsd_file_put() we have the only reference but a hash lookup
> could gain another reference and the immediately unhash it.
> nfsd_file_queue_for_close() can do this.  There might be other paths.
>
> But why does this mean we need to remove it from the lru and free it
> immediately?  If we leave it on the lru it will be freed in a couple of
> seconds.
>
> The reason might be nfsd_file_close_inode_sync().  This needs to close
> files before returning.
> But if nfsd_file_close_inode_sync() is called while some other thread
> holds a reference to the file and might want to call nfsd_file_put(),
> then it isn't going to succeed anyway so any race here doesn't make any
> difference.
>
> So I think the following might be the best fix
Agree.
I ignored the trace point in nfsd_file_lru_add(), this one looks better.
And I have tested it.

Thanks.

>
> ???
>
> Thanks,
> NeilBrown
>
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index fcd751cb7c76..773788a50e56 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -322,10 +322,13 @@ nfsd_file_check_writeback(struct nfsd_file *nf)
>   static bool nfsd_file_lru_add(struct nfsd_file *nf)
>   {
>   	set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
> +	rcu_read_lock();
>   	if (list_lru_add_obj(&nfsd_file_lru, &nf->nf_lru)) {
>   		trace_nfsd_file_lru_add(nf);
> +		rcu_read_unlock();
>   		return true;
>   	}
> +	rcu_read_unlock();
>   	return false;
>   }
>   
> @@ -371,19 +374,8 @@ nfsd_file_put(struct nfsd_file *nf)
>   
>   		/* Try to add it to the LRU.  If that fails, decrement. */
>   		if (nfsd_file_lru_add(nf)) {
> -			/* If it's still hashed, we're done */
> -			if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
> -				nfsd_file_schedule_laundrette();
> -				return;
> -			}
> -
> -			/*
> -			 * We're racing with unhashing, so try to remove it from
> -			 * the LRU. If removal fails, then someone else already
> -			 * has our reference.
> -			 */
> -			if (!nfsd_file_lru_remove(nf))
> -				return;
> +			nfsd_file_schedule_laundrette();
> +			return;
>   		}
>   	}
>   	if (refcount_dec_and_test(&nf->nf_ref))
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] nfsd: free nfsd_file by gc after adding it to lru list
  2025-01-22  3:48           ` NeilBrown
  2025-01-22  7:31             ` Li Lingfeng
@ 2025-01-22 12:31             ` Jeff Layton
  1 sibling, 0 replies; 17+ messages in thread
From: Jeff Layton @ 2025-01-22 12:31 UTC (permalink / raw)
  To: NeilBrown
  Cc: Chuck Lever, Li Lingfeng, okorniev, Dai.Ngo, tom, linux-nfs,
	linux-kernel, yukuai1, houtao1, yi.zhang, yangerkun, lilingfeng

On Wed, 2025-01-22 at 14:48 +1100, NeilBrown wrote:
> On Wed, 22 Jan 2025, NeilBrown wrote:
> > On Wed, 22 Jan 2025, Jeff Layton wrote:
> > > To be clear, I think we need to drop e57420be100ab from your nfsd-
> > > testing branch. The race I identified above is quite likely to occur
> > > and could lead to leaks.
> > > 
> > > If Li Lingfeng doesn't propose a patch, I'll spin one up tomorrow. I
> > > think the RCU approach is safe.
> > 
> > I'm not convinced this is the right approach.
> > I cannot see how nfsd_file_put() can race with unhashing.  If it cannot
> > then we can simply unconditionally call nfsd_file_schedule_laundrette().
> > 
> > Can describe how the race can happen - if indeed it can.
> 
> I thought I should explore this more and explain what I think actually
> happens ...
> 
> Certainly nfsd_file_unhash() might race with nfsd_file_put().  At this
> point in nfsd_file_put() we have the only reference but a hash lookup
> could gain another reference and the immediately unhash it.
> nfsd_file_queue_for_close() can do this.  There might be other paths.
> 
> But why does this mean we need to remove it from the lru and free it
> immediately?  If we leave it on the lru it will be freed in a couple of
> seconds.
> 
> The reason might be nfsd_file_close_inode_sync().  This needs to close
> files before returning.
> But if nfsd_file_close_inode_sync() is called while some other thread
> holds a reference to the file and might want to call nfsd_file_put(),
> then it isn't going to succeed anyway so any race here doesn't make any
> difference.
> 
> So I think the following might be the best fix
> 
> ???
> 
> Thanks,
> NeilBrown
> 
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index fcd751cb7c76..773788a50e56 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -322,10 +322,13 @@ nfsd_file_check_writeback(struct nfsd_file *nf)
>  static bool nfsd_file_lru_add(struct nfsd_file *nf)
>  {
>  	set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
> +	rcu_read_lock();
>  	if (list_lru_add_obj(&nfsd_file_lru, &nf->nf_lru)) {
>  		trace_nfsd_file_lru_add(nf);
> +		rcu_read_unlock();
>  		return true;
>  	}
> +	rcu_read_unlock();
>  	return false;
>  }
>  

I think that I'm now convinced that it's OK to remove the code in the
if block below. But if we do that, then I don't think you need the
rcu_read_lock() in nfsd_file_lru_add(). It's just handing off the
reference to the LRU at that point and once that's done, it doesn't
need to look at it again. That makes the rcu_read_lock() unnecessary.

Given that, Li Lingfeng's original patch is OK after all.

Am I missing something?


> @@ -371,19 +374,8 @@ nfsd_file_put(struct nfsd_file *nf)
>  
>  		/* Try to add it to the LRU.  If that fails, decrement. */
>  		if (nfsd_file_lru_add(nf)) {
> -			/* If it's still hashed, we're done */
> -			if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
> -				nfsd_file_schedule_laundrette();
> -				return;
> -			}
> -
> -			/*
> -			 * We're racing with unhashing, so try to remove it from
> -			 * the LRU. If removal fails, then someone else already
> -			 * has our reference.
> -			 */
> -			if (!nfsd_file_lru_remove(nf))
> -				return;
> +			nfsd_file_schedule_laundrette();
> +			return;
>  		}
>  	}
>  	if (refcount_dec_and_test(&nf->nf_ref))
> 

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2025-01-22 12:31 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-13  2:59 [PATCH] nfsd: free nfsd_file by gc after adding it to lru list Li Lingfeng
2025-01-13 14:07 ` Chuck Lever
2025-01-14  1:54   ` Li Lingfeng
2025-01-14 19:17     ` Chuck Lever
2025-01-14 19:27 ` Jeff Layton
2025-01-14 19:39   ` Jeff Layton
2025-01-15 15:03     ` Chuck Lever
2025-01-15 15:27       ` Jeff Layton
2025-01-22  1:33         ` Li Lingfeng
2025-01-21 20:50       ` Jeff Layton
2025-01-22  1:15         ` NeilBrown
2025-01-22  1:43           ` Li Lingfeng
2025-01-22  2:21             ` Li Lingfeng
2025-01-22  3:48           ` NeilBrown
2025-01-22  7:31             ` Li Lingfeng
2025-01-22 12:31             ` Jeff Layton
2025-01-14 19:40 ` cel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox