* [PATCH] NFSv4.1: Don't allow a lock_state to be freed while checking
From: Benjamin Coddington @ 2016-11-14 16:50 UTC
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

While walking the list of lock_states, keep the current nfs4_lock_state
referenced so that it isn't freed while checking.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
---
 fs/nfs/nfs4proc.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 7897826d7c51..9a1cb9e8c4fc 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2564,12 +2564,15 @@ static void nfs41_check_delegation_stateid(struct nfs4_state *state)
 static int nfs41_check_expired_locks(struct nfs4_state *state)
 {
         int status, ret = NFS_OK;
-        struct nfs4_lock_state *lsp;
+        struct nfs4_lock_state *lsp, *tmp;
         struct nfs_server *server = NFS_SERVER(state->inode);
 
         if (!test_bit(LK_STATE_IN_USE, &state->flags))
                 goto out;
-        list_for_each_entry(lsp, &state->lock_states, ls_locks) {
+        spin_lock(&state->state_lock);
+        list_for_each_entry_safe(lsp, tmp, &state->lock_states, ls_locks) {
+                atomic_inc(&lsp->ls_count);
+                spin_unlock(&state->state_lock);
                 if (test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags)) {
                         struct rpc_cred *cred = lsp->ls_state->owner->so_cred;
 
@@ -2588,7 +2591,10 @@ static int nfs41_check_expired_locks(struct nfs4_state *state)
                                 break;
                         }
                 }
-        };
+                nfs4_put_lock_state(lsp);
+                spin_lock(&state->state_lock);
+        }
+        spin_unlock(&state->state_lock);
 out:
         return ret;
 }
-- 
2.5.5
* Re: [PATCH] NFSv4.1: Don't allow a lock_state to be freed while checking
From: Benjamin Coddington @ 2016-11-14 20:18 UTC
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

On 14 Nov 2016, at 11:50, Benjamin Coddington wrote:

> While walking the list of lock_states, keep the current nfs4_lock_state
> referenced so that it isn't freed while checking.
>
> Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
> ---
>  fs/nfs/nfs4proc.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 7897826d7c51..9a1cb9e8c4fc 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -2564,12 +2564,15 @@ static void nfs41_check_delegation_stateid(struct nfs4_state *state)
>  static int nfs41_check_expired_locks(struct nfs4_state *state)
>  {
>          int status, ret = NFS_OK;
> -        struct nfs4_lock_state *lsp;
> +        struct nfs4_lock_state *lsp, *tmp;
>          struct nfs_server *server = NFS_SERVER(state->inode);
>
>          if (!test_bit(LK_STATE_IN_USE, &state->flags))
>                  goto out;
> -        list_for_each_entry(lsp, &state->lock_states, ls_locks) {
> +        spin_lock(&state->state_lock);
> +        list_for_each_entry_safe(lsp, tmp, &state->lock_states, ls_locks) {
> +                atomic_inc(&lsp->ls_count);
> +                spin_unlock(&state->state_lock);
>          if (test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags)) {
>                  struct rpc_cred *cred = lsp->ls_state->owner->so_cred;
>
> @@ -2588,7 +2591,10 @@ static int nfs41_check_expired_locks(struct nfs4_state *state)
>                                  break;
>                          }
>                  }
> -        };
> +                nfs4_put_lock_state(lsp);
> +                spin_lock(&state->state_lock);
> +        }
> +        spin_unlock(&state->state_lock);
>  out:
>          return ret;
>  }

Actually, something else may be wrong here...
After several more hours of testing with this and Trond's two fixes for
CLOSE races, this popped up:

[ 8102.015071] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[ 8102.015985] IP: [<ffffffff8182717c>] _raw_spin_lock+0xc/0x30
[ 8102.016622] PGD 0
[ 8102.017014] Oops: 0002 [#1] SMP
[ 8102.017357] Modules linked in: nfsv4 dns_resolver nfs ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables nfsd auth_rpcgss nfs_acl lockd grace sunrpc virtio_balloon virtio_console virtio_net virtio_blk crct10dif_pclmul ppdev crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_pci serio_raw i2c_piix4 virtio_ring virtio ata_generic parport_pc pata_acpi parport acpi_cpufreq tpm_tis tpm_tis_core tpm
[ 8102.025180] CPU: 3 PID: 4475 Comm: ::1-manager Not tainted 4.9.0-rc5-00268-gbb3ec452d27c #45
[ 8102.026093] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
[ 8102.027122] task: ffff880132b93a80 task.stack: ffffc90003200000
[ 8102.027769] RIP: 0010:[<ffffffff8182717c>]  [<ffffffff8182717c>] _raw_spin_lock+0xc/0x30
[ 8102.028657] RSP: 0018:ffffc90003203d40  EFLAGS: 00010246
[ 8102.029227] RAX: 0000000000000000 RBX: ffff880132d9c31c RCX: 0000000000000001
[ 8102.029996] RDX: 0000000000000001 RSI: 0000000000000048 RDI: 0000000000000048
[ 8102.030771] RBP: ffffc90003203d58 R08: 0000000032b45601 R09: 000000018010000e
[ 8102.031532] R10: 000000008010000e R11: 0000000000100000 R12: 0000000000000048
[ 8102.032291] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880139fc2050
[ 8102.033064] FS:  0000000000000000(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
[ 8102.033930] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8102.034546] CR2: 0000000000000048 CR3: 0000000139678000 CR4: 00000000000406e0
[ 8102.035312] Stack:
[ 8102.035542]  ffffffff81489818 ffff880132d9c200 0000000000000000 ffffc90003203d78
[ 8102.036400]  ffffffffa02f9289 ffff8801314f29c0 0000000000000000 ffffc90003203de8
[ 8102.037259]  ffffffffa02e3dd2 ffff88013393f000 ffff8801314f29e0 ffff8801314f2a08
[ 8102.038120] Call Trace:
[ 8102.038392]  [<ffffffff81489818>] ? _atomic_dec_and_lock+0x48/0x5b
[ 8102.039073]  [<ffffffffa02f9289>] nfs4_put_lock_state+0x29/0x90 [nfsv4]
[ 8102.039797]  [<ffffffffa02e3dd2>] nfs41_open_expired+0x192/0x410 [nfsv4]
[ 8102.040523]  [<ffffffffa02f859f>] nfs4_do_reclaim+0x1af/0x630 [nfsv4]
[ 8102.041227]  [<ffffffffa02f8f30>] nfs4_run_state_manager+0x510/0x7d0 [nfsv4]
[ 8102.041995]  [<ffffffffa02f8a20>] ? nfs4_do_reclaim+0x630/0x630 [nfsv4]
[ 8102.042719]  [<ffffffffa02f8a20>] ? nfs4_do_reclaim+0x630/0x630 [nfsv4]
[ 8102.043434]  [<ffffffff810c96a9>] kthread+0xd9/0xf0
[ 8102.043968]  [<ffffffff810c95d0>] ? kthread_park+0x60/0x60
[ 8102.044559]  [<ffffffff81827615>] ret_from_fork+0x25/0x30
[ 8102.045150] Code: c0 ba 01 00 00 00 f0 0f b1 17 85 c0 75 02 5d c3 89 c6 e8 18 db 8c ff 5d c3 66 0f 1f 44 00 00 66 66 66 66 90 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 01 c3 55 89 c6 48 89 e5 e8 f0 da 8c ff 5d
[ 8102.048136] RIP  [<ffffffff8182717c>] _raw_spin_lock+0xc/0x30
[ 8102.048775]  RSP <ffffc90003203d40>
[ 8102.049153] CR2: 0000000000000048
[ 8102.049893] ---[ end trace 8dc47952e274960a ]---
[ 8102.050391] Kernel panic - not syncing: Fatal exception
[ 8102.051090] Kernel Offset: disabled
[ 8102.051475] ---[ end Kernel panic - not syncing: Fatal exception

OK, it now looks like the nfs4_state itself has been freed while walking the
list of lock_states.  I'll have to look at this a bit more tomorrow; I have
to ferry the kids around now.
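The faulting address fits that theory, too: nfs4_put_lock_state() drops its
reference with atomic_dec_and_lock() on the parent state's state_lock (the
_atomic_dec_and_lock frame in the trace), so a stale or NULL ls_state pointer
turns the final put into a spin on garbage, and 0x48 would plausibly be the
offset of state_lock inside struct nfs4_state.  A rough sketch of the put
path (paraphrased from fs/nfs/nfs4state.c of this vintage, not the verbatim
source):

void nfs4_put_lock_state(struct nfs4_lock_state *lsp)
{
        struct nfs4_state *state;

        if (lsp == NULL)
                return;
        state = lsp->ls_state;
        /* If 'state' has already been freed (or ls_state is NULL),
         * this dereferences garbage at &state->state_lock, which is
         * exactly the shape of the oops above. */
        if (!atomic_dec_and_lock(&lsp->ls_count, &state->state_lock))
                return;
        list_del(&lsp->ls_locks);
        if (list_empty(&state->lock_states))
                clear_bit(LK_STATE_IN_USE, &state->flags);
        spin_unlock(&state->state_lock);
        /* ...then release the server-side lock owner and free lsp... */
}

Ben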
* Re: [PATCH] NFSv4.1: Don't allow a lock_state to be freed while checking
From: Benjamin Coddington @ 2016-11-15 16:12 UTC
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

On 14 Nov 2016, at 15:18, Benjamin Coddington wrote:

> On 14 Nov 2016, at 11:50, Benjamin Coddington wrote:
>
>> While walking the list of lock_states, keep the current nfs4_lock_state
>> referenced so that it isn't freed while checking.
>>
>> Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
>> ---
>> [patch snipped]
>
> Actually, something else may be wrong here... After several more hours of
> testing with this and Trond's two fixes for CLOSE races, this popped up:
>
> [oops snipped]

I've been an idiot.  This patch is wrong in two ways:

1. list_for_each_entry_safe() won't keep the *next* entry from being freed
   once the lock is dropped (sketched below).

2. There shouldn't be any reason (I think) it can't hold the state lock over
   nfs41_test_and_free_expired_stateid(), so the much simpler fix is to hold
   it the entire time.  (I expect the problem I saw with that approach the
   first time was the CLOSE/OPEN race that was recently fixed.)
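To make point 1 concrete, here is the posted loop annotated with one bad
interleaving (the interleaving comments are illustrative, not a captured
trace):

        spin_lock(&state->state_lock);
        list_for_each_entry_safe(lsp, tmp, &state->lock_states, ls_locks) {
                /* the iterator has already cached tmp = next entry */
                atomic_inc(&lsp->ls_count);     /* pins lsp, and only lsp */
                spin_unlock(&state->state_lock);

                /* <-- here another thread can drop the last reference on
                 *     the entry tmp points to, unlink it, and free it */

                /* ... TEST_STATEID / FREE_STATEID RPCs ... */

                nfs4_put_lock_state(lsp);
                spin_lock(&state->state_lock);
                /* advancing to the cached tmp is now a use-after-free */
        }
        spin_unlock(&state->state_lock);

Pinning lsp says nothing about the element the iterator will step to next;
only holding state_lock across the advance protects that.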
*ben goes off and tests what should be a simple fix again*

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 7897826d7c51..34b36f148b65 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2569,6 +2569,7 @@ static int nfs41_check_expired_locks(struct nfs4_state *state)
 
         if (!test_bit(LK_STATE_IN_USE, &state->flags))
                 goto out;
+        spin_lock(&state->state_lock);
         list_for_each_entry(lsp, &state->lock_states, ls_locks) {
                 if (test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags)) {
                         struct rpc_cred *cred = lsp->ls_state->owner->so_cred;
@@ -2588,7 +2589,8 @@ static int nfs41_check_expired_locks(struct nfs4_state *state)
                                 break;
                         }
                 }
-        };
+        }
+        spin_unlock(&state->state_lock);
 out:
         return ret;
 }

No, it /does/ soft lockup this time:

crash> bt 2514
PID: 2514   TASK: ffff880131560000  CPU: 2   COMMAND: "::1-manager"
 #0 [ffffc90002a1b9d0] __schedule at ffffffff8182212e
 #1 [ffffc90002a1ba50] schedule at ffffffff818225d6
 #2 [ffffc90002a1ba68] rpc_wait_bit_killable at ffffffffa00d38f4 [sunrpc]
 #3 [ffffc90002a1ba88] __wait_on_bit at ffffffff818229e1
 #4 [ffffc90002a1bac8] out_of_line_wait_on_bit at ffffffff81822b62
 #5 [ffffc90002a1bb40] __rpc_execute at ffffffffa00d477a [sunrpc]
 #6 [ffffc90002a1bba8] rpc_execute at ffffffffa00d6d41 [sunrpc]
 #7 [ffffc90002a1bbd8] rpc_run_task at ffffffffa00c7d99 [sunrpc]
 #8 [ffffc90002a1bc00] nfs4_call_sync_sequence at ffffffffa02d11e3 [nfsv4]
 #9 [ffffc90002a1bc78] _nfs41_test_stateid at ffffffffa02d9941 [nfsv4]
#10 [ffffc90002a1bd30] nfs41_test_and_free_expired_stateid at ffffffffa02db9b9 [nfsv4]
#11 [ffffc90002a1bd90] nfs41_open_expired at ffffffffa02dbdd7 [nfsv4]
#12 [ffffc90002a1bdf0] nfs4_do_reclaim at ffffffffa02f057f [nfsv4]
#13 [ffffc90002a1be88] nfs4_run_state_manager at ffffffffa02f0f10 [nfsv4]
#14 [ffffc90002a1bec8] kthread at ffffffff810c96a9
#15 [ffffc90002a1bf50] ret_from_fork at ffffffff81827615

crash> l *(0xffffffffa00d477a)
0xffffffffa00d477a is in __rpc_execute (net/sunrpc/sched.c:807).
802                     /* sync task: sleep here */
803                     dprintk("RPC: %5u sync task going to sleep\n", task->tk_pid);
804                     status = out_of_line_wait_on_bit(&task->tk_runstate,
805                                     RPC_TASK_QUEUED, rpc_wait_bit_killable,
806                                     TASK_KILLABLE);
807                     if (status == -ERESTARTSYS) {
808                             /*
809                              * When a sync task receives a signal, it exits with
810                              * -ERESTARTSYS. In order to catch any callbacks that
811                              * clean up after sleeping on some queue, we don't

Why is it queued here?  Did the server return DELAY, maybe?  Investigations
are ongoing, advice welcomed.  I am going to resume email silence until I
have something worthwhile to contribute, rather than continue to reply to
myself.

Ben
* Re: [PATCH] NFSv4.1: Don't allow a lock_state to be freed while checking
From: Benjamin Coddington @ 2016-11-17 21:24 UTC
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

On 15 Nov 2016, at 11:12, Benjamin Coddington wrote:

> On 14 Nov 2016, at 15:18, Benjamin Coddington wrote:
>
>> On 14 Nov 2016, at 11:50, Benjamin Coddington wrote:
>>
>>> While walking the list of lock_states, keep the current nfs4_lock_state
>>> referenced so that it isn't freed while checking.
>>>
>>> [patch snipped]
>>
>> Actually, something else may be wrong here... After several more hours of
>> testing with this and Trond's two fixes for CLOSE races, this popped up:
>>
>> [oops snipped]
>
> I've been an idiot.  This patch is wrong in two ways:
>
> 1. list_for_each_entry_safe() won't keep the *next* entry from being freed
>    once the lock is dropped.
>
> 2. There shouldn't be any reason (I think) it can't hold the state lock
>    over nfs41_test_and_free_expired_stateid(), so the much simpler fix is
>    to hold it the entire time.
>
> *ben goes off and tests what should be a simple fix again*
>
> [second patch and soft-lockup backtrace snipped]
>
> Why is it queued here?  Did the server return DELAY, maybe?  Investigations
> are ongoing, advice welcomed.

I feel I've wasted a lot of time learning something that should have been
obvious: don't hold a spinlock and then send an RPC, let alone several of
them.

The soft lockup happens when rpciod spins on the state_lock in locku_done
while xprtiod, scheduled on the same CPU, is trying to receive data.  That
data would be the replies to TEST_STATEID, whose completion would release
the lock: a circular dependency.

So, please drop this patch.  I'll send another that takes a reference and
sets a flag on each lock state for testing.  Then the spinlock can be
dropped while TEST_STATEID and a possible FREE_STATEID are sent.
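The shape I have in mind is roughly the following sketch (untested, not the
patch I'll post: NFS_LOCK_TESTING is a made-up flag name, a real version
would need a free bit in ls_flags and would clear it when done, and the
status handling stays whatever the current code does in the elided hunk).
The invariant is that every RPC goes out with ls_count held and state_lock
dropped, and the walk restarts after relocking because the list may have
changed:

static int nfs41_check_expired_locks(struct nfs4_state *state)
{
        struct nfs_server *server = NFS_SERVER(state->inode);
        struct nfs4_lock_state *lsp;
        int status, ret = NFS_OK;

        if (!test_bit(LK_STATE_IN_USE, &state->flags))
                goto out;

        spin_lock(&state->state_lock);
restart:
        list_for_each_entry(lsp, &state->lock_states, ls_locks) {
                struct rpc_cred *cred;

                /* mark-and-pin under the lock; the flag keeps the
                 * restarted walk from testing the same state twice */
                if (!test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags) ||
                    test_and_set_bit(NFS_LOCK_TESTING, &lsp->ls_flags))
                        continue;
                atomic_inc(&lsp->ls_count);
                spin_unlock(&state->state_lock);

                /* no spinlock held across the sync RPC: rpciod and
                 * locku_done can take state_lock while we wait */
                cred = lsp->ls_state->owner->so_cred;
                status = nfs41_test_and_free_expired_stateid(server,
                                                &lsp->ls_stateid, cred);
                if (status != NFS_OK) {
                        /* same handling as today: clear
                         * NFS_LOCK_INITIALIZED on an expired or
                         * revoked stateid, bail out otherwise */
                        ret = status;
                }

                nfs4_put_lock_state(lsp);
                spin_lock(&state->state_lock);
                goto restart;   /* list may have changed while unlocked */
        }
        spin_unlock(&state->state_lock);
out:
        return ret;
}

Ben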
Thread overview: 4+ messages (newest: 2016-11-17 21:25 UTC)
2016-11-14 16:50 [PATCH] NFSv4.1: Don't allow a lock_state to be freed while checking Benjamin Coddington
2016-11-14 20:18 ` Benjamin Coddington
2016-11-15 16:12   ` Benjamin Coddington
2016-11-17 21:24     ` Benjamin Coddington