* [PATCH] nfsd: fix race condition in nfsd_file_acquire
@ 2023-08-18 6:55 Haodong Wong
2023-08-21 0:27 ` NeilBrown
0 siblings, 1 reply; 3+ messages in thread
From: Haodong Wong @ 2023-08-18 6:55 UTC (permalink / raw)
To: chuck.lever, jlayton
Cc: haodong.wong, J. Bruce Fields, linux-nfs, linux-kernel
Before Kernel 6.1, we observed the following OOPS in the stress test
caused by reorder on set bit NFSD_FILE_HASHED and NFSD_FILE_PENDING,
and smp_mb__after_atomic() should be a paire.
Task A: Task B:
nfsd_file_acquire:
new = nfsd_file_alloc()
open_file:
refcount_inc(&nf->nf_ref);
nf = nfsd_file_find_locked();
wait_for_construction:
since nf_flags is zero it will not wait
wait_on_bit(&nf->nf_flags,
NFSD_FILE_PENDING);
if (status == nfs_ok) {
*pnf = nf; //OOPS happen!
Unable to handle kernel NULL pointer at virtual address 0000000000000028
Mem abort info:
ESR = 0x96000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=0000000152546000
[0000000000000028] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT_RT SMP
CPU: 7 PID: 1767 Comm: nfsd Not tainted 5.10.104 #1
pstate: 40c00005 (nZcv daif +PAN +UAO -TCO BTYPE=--)
pc : nfsd_read+0x78/0x280 [nfsd]
lr : nfsd_read+0x68/0x280 [nfsd]
sp : ffff80001c0b3c70
x29: ffff80001c0b3c70 x28: 0000000000000000
x27: 0000000000000002 x26: ffff0000c8a3ca70
x25: ffff0000c8a45180 x24: 0000000000002000
x23: ffff0000c8a45178 x22: ffff0000c8a45008
x21: ffff0000c31aac40 x20: ffff0000c8a3c000
x19: 0000000000000000 x18: 0000000000000001
x17: 0000000000000007 x16: 00000000b35db681
x15: 0000000000000156 x14: ffff0000c3f91300
x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000
x9 : 0000000000000000 x8 : ffff000118014a80
x7 : 0000000000000002 x6 : ffff0002559142dc
x5 : ffff0000c31aac40 x4 : 0000000000000004
x3 : 0000000000000001 x2 : 0000000000000000
x1 : 0000000000000001 x0 : ffff000255914280
Call trace:
nfsd_read+0x78/0x280 [nfsd]
nfsd3_proc_read+0x98/0xc0 [nfsd]
nfsd_dispatch+0xc8/0x160 [nfsd]
svc_process_common+0x440/0x790
svc_process+0xb0/0xd0
nfsd+0xfc/0x160 [nfsd]
kthread+0x17c/0x1a0
ret_from_fork+0x10/0x18
Signed-off-by: Haodong Wong <haydenw.kernel@gmail.com>
---
fs/nfsd/filecache.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index e30e1ddc1ace..ba980369e6b4 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -974,8 +974,12 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
nfsd_file_slab_free(&new->nf_rcu);
wait_for_construction:
+ /* In case of set bit NFSD_FILE_PENDING and NFSD_FILE_HASHED reorder */
+ smp_rmb();
wait_on_bit(&nf->nf_flags, NFSD_FILE_PENDING, TASK_UNINTERRUPTIBLE);
+ /* Be a paire of smp_mb after clear bit NFSD_FILE_PENDING */
+ smp_mb__after_atomic();
/* Did construction of this file fail? */
if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
if (!retry) {
@@ -1018,8 +1022,11 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
nf = new;
/* Take reference for the hashtable */
refcount_inc(&nf->nf_ref);
- __set_bit(NFSD_FILE_HASHED, &nf->nf_flags);
__set_bit(NFSD_FILE_PENDING, &nf->nf_flags);
+ /* Ensure set bit order set NFSD_FILE_HASHED after set NFSD_FILE_PENDING */
+ smp_wmb();
+ __set_bit(NFSD_FILE_HASHED, &nf->nf_flags);
+
list_lru_add(&nfsd_file_lru, &nf->nf_lru);
hlist_add_head_rcu(&nf->nf_node, &nfsd_file_hashtbl[hashval].nfb_head);
++nfsd_file_hashtbl[hashval].nfb_count;
--
2.25.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] nfsd: fix race condition in nfsd_file_acquire
2023-08-18 6:55 [PATCH] nfsd: fix race condition in nfsd_file_acquire Haodong Wong
@ 2023-08-21 0:27 ` NeilBrown
2023-08-22 9:31 ` Hayden Wong
0 siblings, 1 reply; 3+ messages in thread
From: NeilBrown @ 2023-08-21 0:27 UTC (permalink / raw)
To: Haodong Wong
Cc: chuck.lever, jlayton, haodong.wong, J. Bruce Fields, linux-nfs,
linux-kernel
On Fri, 18 Aug 2023, Haodong Wong wrote:
> Before Kernel 6.1, we observed the following OOPS in the stress test
> caused by reorder on set bit NFSD_FILE_HASHED and NFSD_FILE_PENDING,
> and smp_mb__after_atomic() should be a paire.
If you saw this "before kernel 6.1" does that mean you don't see it
after kernel 6.1? So has it already been fixed?
What kernel are you targeting with your patch? It doesn't apply to
mainline, but looks like it might to 6.0. The oops message is from
5.10.104. Maybe that is where you want a fix?
I assume you want this fix to go to a -stable kernel? It would be good
to identify which upstream patch fixed the problem, then either backport
that, or explain why something simpler is needed.
>
> Task A: Task B:
>
> nfsd_file_acquire:
>
> new = nfsd_file_alloc()
> open_file:
> refcount_inc(&nf->nf_ref);
The key step in Task A is
hlist_add_head_rcu(&nf->nf_node,
&nfsd_file_hashtbl[hashval].nfb_head);
Until that completes, nfsd_file_find_locked() cannot find the new file.
hlist_add_head_rcu() uses "rcu_assign_pointer()" which should include
all the barriers needed.
> nf = nfsd_file_find_locked();
> wait_for_construction:
>
> since nf_flags is zero it will not wait
>
> wait_on_bit(&nf->nf_flags,
> NFSD_FILE_PENDING);
>
> if (status == nfs_ok) {
> *pnf = nf; //OOPS happen!
The oops message suggests that after nfsd_read() calls
nfsd_file_acquire() there is no error, but nf is NULL.
So the nf->nf_file access causes the oops. nf_file is at offset
0x28, which is the virtual address mentioned in the oops.
So do you think 'nf' is NULL at this point where you write "OOPS
happen!" ??
I cannot see how that might happen even if wait_on_bit() didn't
actually wait.
Maybe if you could explain if a bit more detail what you think is
happening. What exactly is NULL which causes the OOPS, and how exactly
does it end up being NULL.
I cannot see what might be the cause, but the oops makes it clear that
it did happen.
Also is this a pure 5.10.104 kernel, or are there other patches on it?
Thanks,
NeilBrown
>
> Unable to handle kernel NULL pointer at virtual address 0000000000000028
> Mem abort info:
> ESR = 0x96000004
> EC = 0x25: DABT (current EL), IL = 32 bits
> SET = 0, FnV = 0
> EA = 0, S1PTW = 0
> Data abort info:
> ISV = 0, ISS = 0x00000004
> CM = 0, WnR = 0
> user pgtable: 4k pages, 48-bit VAs, pgdp=0000000152546000
> [0000000000000028] pgd=0000000000000000, p4d=0000000000000000
> Internal error: Oops: 96000004 [#1] PREEMPT_RT SMP
> CPU: 7 PID: 1767 Comm: nfsd Not tainted 5.10.104 #1
> pstate: 40c00005 (nZcv daif +PAN +UAO -TCO BTYPE=--)
> pc : nfsd_read+0x78/0x280 [nfsd]
> lr : nfsd_read+0x68/0x280 [nfsd]
> sp : ffff80001c0b3c70
> x29: ffff80001c0b3c70 x28: 0000000000000000
> x27: 0000000000000002 x26: ffff0000c8a3ca70
> x25: ffff0000c8a45180 x24: 0000000000002000
> x23: ffff0000c8a45178 x22: ffff0000c8a45008
> x21: ffff0000c31aac40 x20: ffff0000c8a3c000
> x19: 0000000000000000 x18: 0000000000000001
> x17: 0000000000000007 x16: 00000000b35db681
> x15: 0000000000000156 x14: ffff0000c3f91300
> x13: 0000000000000000 x12: 0000000000000000
> x11: 0000000000000000 x10: 0000000000000000
> x9 : 0000000000000000 x8 : ffff000118014a80
> x7 : 0000000000000002 x6 : ffff0002559142dc
> x5 : ffff0000c31aac40 x4 : 0000000000000004
> x3 : 0000000000000001 x2 : 0000000000000000
> x1 : 0000000000000001 x0 : ffff000255914280
> Call trace:
> nfsd_read+0x78/0x280 [nfsd]
> nfsd3_proc_read+0x98/0xc0 [nfsd]
> nfsd_dispatch+0xc8/0x160 [nfsd]
> svc_process_common+0x440/0x790
> svc_process+0xb0/0xd0
> nfsd+0xfc/0x160 [nfsd]
> kthread+0x17c/0x1a0
> ret_from_fork+0x10/0x18
>
> Signed-off-by: Haodong Wong <haydenw.kernel@gmail.com>
> ---
> fs/nfsd/filecache.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index e30e1ddc1ace..ba980369e6b4 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -974,8 +974,12 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> nfsd_file_slab_free(&new->nf_rcu);
>
> wait_for_construction:
> + /* In case of set bit NFSD_FILE_PENDING and NFSD_FILE_HASHED reorder */
> + smp_rmb();
> wait_on_bit(&nf->nf_flags, NFSD_FILE_PENDING, TASK_UNINTERRUPTIBLE);
>
> + /* Be a paire of smp_mb after clear bit NFSD_FILE_PENDING */
> + smp_mb__after_atomic();
> /* Did construction of this file fail? */
> if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
> if (!retry) {
> @@ -1018,8 +1022,11 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> nf = new;
> /* Take reference for the hashtable */
> refcount_inc(&nf->nf_ref);
> - __set_bit(NFSD_FILE_HASHED, &nf->nf_flags);
> __set_bit(NFSD_FILE_PENDING, &nf->nf_flags);
> + /* Ensure set bit order set NFSD_FILE_HASHED after set NFSD_FILE_PENDING */
> + smp_wmb();
> + __set_bit(NFSD_FILE_HASHED, &nf->nf_flags);
> +
> list_lru_add(&nfsd_file_lru, &nf->nf_lru);
> hlist_add_head_rcu(&nf->nf_node, &nfsd_file_hashtbl[hashval].nfb_head);
> ++nfsd_file_hashtbl[hashval].nfb_count;
> --
> 2.25.1
>
>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] nfsd: fix race condition in nfsd_file_acquire
2023-08-21 0:27 ` NeilBrown
@ 2023-08-22 9:31 ` Hayden Wong
0 siblings, 0 replies; 3+ messages in thread
From: Hayden Wong @ 2023-08-22 9:31 UTC (permalink / raw)
To: NeilBrown
Cc: chuck.lever, jlayton, haodong.wong, J. Bruce Fields, linux-nfs,
linux-kernel
On Mon, Aug 21, 2023 at 8:27 AM NeilBrown <neilb@suse.de> wrote:
>
> On Fri, 18 Aug 2023, Haodong Wong wrote:
> > Before Kernel 6.1, we observed the following OOPS in the stress test
> > caused by reorder on set bit NFSD_FILE_HASHED and NFSD_FILE_PENDING,
> > and smp_mb__after_atomic() should be a paire.
>
> If you saw this "before kernel 6.1" does that mean you don't see it
> after kernel 6.1? So has it already been fixed?
>
I just found after the patched c4c649ab413ba "NFSD: Convert filecache
to rhltable" , this issue should not have .
It was fixed as below:
-nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may)
+nfsd_file_alloc(struct net *net, struct inode *inode, unsigned char need,
+ bool want_gc)
{
struct nfsd_file *nf;
nf = kmem_cache_alloc(nfsd_file_slab, GFP_KERNEL);
- if (nf) {
- INIT_LIST_HEAD(&nf->nf_lru);
- nf->nf_birthtime = ktime_get();
- nf->nf_file = NULL;
- nf->nf_cred = get_current_cred();
- nf->nf_net = key->net;
- nf->nf_flags = 0;
- __set_bit(NFSD_FILE_HASHED, &nf->nf_flags);
- __set_bit(NFSD_FILE_PENDING, &nf->nf_flags);
- if (key->gc)
- __set_bit(NFSD_FILE_GC, &nf->nf_flags);
- nf->nf_inode = key->inode;
- refcount_set(&nf->nf_ref, 1);
- nf->nf_may = key->need;
- nf->nf_mark = NULL;
- }
+ if (unlikely(!nf))
+ return NULL;
+
+ INIT_LIST_HEAD(&nf->nf_lru);
+ nf->nf_birthtime = ktime_get();
+ nf->nf_file = NULL;
+ nf->nf_cred = get_current_cred();
+ nf->nf_net = net;
+ nf->nf_flags = want_gc ?
+ BIT(NFSD_FILE_HASHED) | BIT(NFSD_FILE_PENDING) |
BIT(NFSD_FILE_GC) :
+ BIT(NFSD_FILE_HASHED) | BIT(NFSD_FILE_PENDING);
+ nf->nf_inode = inode;
+ refcount_set(&nf->nf_ref, 1);
+ nf->nf_may = need;
+ nf->nf_mark = NULL;
return nf;
}
Before above patch, this OOPS happen as below
Task A
Task B
nfs_read
nfs_read
nfsd_file_acquire
nfsd_file_acquire
__set_bit(NFSD_FILE_HASHED, &nf->nf_flags)
nf = nfsd_file_find_locked();
wait_on_bit(&nf->nf_flags, NFSD_FILE_PENDING);
since rcu_assign_pointer has barrier in
hlist_add_head_rcu, but before smp_store_release in which
two __set_bit can cross update rcu hlist head pointer
(first), so Task B wait_on_bit didn't wait, just go below:
*pnf = nf;
file = nf->nf_file
file->f_op->splice_read *OOPS
here in nfs_read!
I also found similar issue at here https://lkml.org/lkml/2023/1/4/880
Since this patch can fix this issue, I discard my commit
Very glad to receive your reply.
Thanks a lot!
Regards,
Haodong Wong
> What kernel are you targeting with your patch? It doesn't apply to
> mainline, but looks like it might to 6.0. The oops message is from
> 5.10.104. Maybe that is where you want a fix?
>
The target is before kernel 6.1
> I assume you want this fix to go to a -stable kernel? It would be good
> to identify which upstream patch fixed the problem, then either backport
> that, or explain why something simpler is needed.
>
> >
> > Task A: Task B:
> >
> > nfsd_file_acquire:
> >
> > new = nfsd_file_alloc()
> > open_file:
> > refcount_inc(&nf->nf_ref);
>
> The key step in Task A is
> hlist_add_head_rcu(&nf->nf_node,
> &nfsd_file_hashtbl[hashval].nfb_head);
>
> Until that completes, nfsd_file_find_locked() cannot find the new file.
> hlist_add_head_rcu() uses "rcu_assign_pointer()" which should include
> all the barriers needed.
>
> > nf = nfsd_file_find_locked();
> > wait_for_construction:
> >
> > since nf_flags is zero it will not wait
> >
> > wait_on_bit(&nf->nf_flags,
> > NFSD_FILE_PENDING);
> >
> > if (status == nfs_ok) {
> > *pnf = nf; //OOPS happen!
>
> The oops message suggests that after nfsd_read() calls
> nfsd_file_acquire() there is no error, but nf is NULL.
> So the nf->nf_file access causes the oops. nf_file is at offset
> 0x28, which is the virtual address mentioned in the oops.
>
> So do you think 'nf' is NULL at this point where you write "OOPS
> happen!" ??
Sorry, something missing, it oops happened in nfs_read after
nfsd_file_acquire *pnf = nf, because it copy nf_file is still NULL as
above
> I cannot see how that might happen even if wait_on_bit() didn't
> actually wait.
>
> Maybe if you could explain if a bit more detail what you think is
> happening. What exactly is NULL which causes the OOPS, and how exactly
> does it end up being NULL.
> I cannot see what might be the cause, but the oops makes it clear that
> it did happen.
>
> Also is this a pure 5.10.104 kernel, or are there other patches on it?
>
It is applied with PREEMPT_RT Patch
> Thanks,
> NeilBrown
>
>
>
> >
> > Unable to handle kernel NULL pointer at virtual address 0000000000000028
> > Mem abort info:
> > ESR = 0x96000004
> > EC = 0x25: DABT (current EL), IL = 32 bits
> > SET = 0, FnV = 0
> > EA = 0, S1PTW = 0
> > Data abort info:
> > ISV = 0, ISS = 0x00000004
> > CM = 0, WnR = 0
> > user pgtable: 4k pages, 48-bit VAs, pgdp=0000000152546000
> > [0000000000000028] pgd=0000000000000000, p4d=0000000000000000
> > Internal error: Oops: 96000004 [#1] PREEMPT_RT SMP
> > CPU: 7 PID: 1767 Comm: nfsd Not tainted 5.10.104 #1
> > pstate: 40c00005 (nZcv daif +PAN +UAO -TCO BTYPE=--)
> > pc : nfsd_read+0x78/0x280 [nfsd]
> > lr : nfsd_read+0x68/0x280 [nfsd]
> > sp : ffff80001c0b3c70
> > x29: ffff80001c0b3c70 x28: 0000000000000000
> > x27: 0000000000000002 x26: ffff0000c8a3ca70
> > x25: ffff0000c8a45180 x24: 0000000000002000
> > x23: ffff0000c8a45178 x22: ffff0000c8a45008
> > x21: ffff0000c31aac40 x20: ffff0000c8a3c000
> > x19: 0000000000000000 x18: 0000000000000001
> > x17: 0000000000000007 x16: 00000000b35db681
> > x15: 0000000000000156 x14: ffff0000c3f91300
> > x13: 0000000000000000 x12: 0000000000000000
> > x11: 0000000000000000 x10: 0000000000000000
> > x9 : 0000000000000000 x8 : ffff000118014a80
> > x7 : 0000000000000002 x6 : ffff0002559142dc
> > x5 : ffff0000c31aac40 x4 : 0000000000000004
> > x3 : 0000000000000001 x2 : 0000000000000000
> > x1 : 0000000000000001 x0 : ffff000255914280
> > Call trace:
> > nfsd_read+0x78/0x280 [nfsd]
> > nfsd3_proc_read+0x98/0xc0 [nfsd]
> > nfsd_dispatch+0xc8/0x160 [nfsd]
> > svc_process_common+0x440/0x790
> > svc_process+0xb0/0xd0
> > nfsd+0xfc/0x160 [nfsd]
> > kthread+0x17c/0x1a0
> > ret_from_fork+0x10/0x18
> >
> > Signed-off-by: Haodong Wong <haydenw.kernel@gmail.com>
> > ---
> > fs/nfsd/filecache.c | 9 ++++++++-
> > 1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index e30e1ddc1ace..ba980369e6b4 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -974,8 +974,12 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > nfsd_file_slab_free(&new->nf_rcu);
> >
> > wait_for_construction:
> > + /* In case of set bit NFSD_FILE_PENDING and NFSD_FILE_HASHED reorder */
> > + smp_rmb();
> > wait_on_bit(&nf->nf_flags, NFSD_FILE_PENDING, TASK_UNINTERRUPTIBLE);
> >
> > + /* Be a paire of smp_mb after clear bit NFSD_FILE_PENDING */
> > + smp_mb__after_atomic();
> > /* Did construction of this file fail? */
> > if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
> > if (!retry) {
> > @@ -1018,8 +1022,11 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > nf = new;
> > /* Take reference for the hashtable */
> > refcount_inc(&nf->nf_ref);
> > - __set_bit(NFSD_FILE_HASHED, &nf->nf_flags);
> > __set_bit(NFSD_FILE_PENDING, &nf->nf_flags);
> > + /* Ensure set bit order set NFSD_FILE_HASHED after set NFSD_FILE_PENDING */
> > + smp_wmb();
> > + __set_bit(NFSD_FILE_HASHED, &nf->nf_flags);
> > +
> > list_lru_add(&nfsd_file_lru, &nf->nf_lru);
> > hlist_add_head_rcu(&nf->nf_node, &nfsd_file_hashtbl[hashval].nfb_head);
> > ++nfsd_file_hashtbl[hashval].nfb_count;
> > --
> > 2.25.1
> >
> >
>
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2023-08-22 9:31 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-18 6:55 [PATCH] nfsd: fix race condition in nfsd_file_acquire Haodong Wong
2023-08-21 0:27 ` NeilBrown
2023-08-22 9:31 ` Hayden Wong
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox