* [PATCH 1/1] SUNRPC: don't retry a gss destroy rpc if task was killed @ 2019-05-24 19:22 Olga Kornievskaia 2019-05-25 15:40 ` Trond Myklebust 0 siblings, 1 reply; 6+ messages in thread From: Olga Kornievskaia @ 2019-05-24 19:22 UTC (permalink / raw) To: trond.myklebust, anna.schumaker; +Cc: linux-nfs From: Olga Kornievskaia <kolga@netapp.com> It's possible that on an umount we send a NULL call to destroy a gss context and a failure occurs on the reply. Since it's possible that in that case the rpc client and auth structure are going away don't retry. Otherwise, the kernel hits the following oops. [37247.291617] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 [37247.296200] #PF error: [normal kernel read fault] [37247.298110] PGD 0 P4D 0 [37247.299264] Oops: 0000 [#1] SMP PTI [37247.300729] CPU: 1 PID: 23870 Comm: kworker/u256:1 Not tainted 5.1.0+ #172 [37247.303547] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018 [37247.311770] Workqueue: rpciod rpc_async_schedule [sunrpc] [37247.313958] RIP: 0010:xprt_adjust_timeout+0x9/0x110 [sunrpc] [37247.316220] Code: c7 c7 20 0d 50 c0 31 c0 e8 68 00 e2 fc 41 c7 45 04 f4 ff ff ff eb c9 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 53 <48> 8b 87 98 00 00 00 48 89 fb 4c 8b 27 48 8b 80 a8 00 00 00 48 8b [37247.323625] RSP: 0018:ffffb0ab84f5fd68 EFLAGS: 00010207 [37247.325676] RAX: 00000000fffffff5 RBX: ffff9e0ff1042800 RCX: 0000000000000003 [37247.328433] RDX: ffff9e0ff11baac0 RSI: 00000000fffffe01 RDI: 0000000000000000 [37247.331206] RBP: ffff9e0fe20cb200 R08: ffff9e0ff11baac0 R09: ffff9e0ff11baac0 [37247.334038] R10: ffff9e0ff11baab8 R11: 0000000000000003 R12: ffff9e1039b55050 [37247.337098] R13: ffff9e0ff1042830 R14: 0000000000000000 R15: 0000000000000001 [37247.339966] FS: 0000000000000000(0000) GS:ffff9e103bc40000(0000) knlGS:0000000000000000 [37247.343261] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [37247.345669] CR2: 0000000000000098 CR3: 000000007603a002 CR4: 00000000001606e0 [37247.348564] Call Trace: [37247.351034] rpc_check_timeout+0x1d/0x140 [sunrpc] [37247.353005] call_decode+0x13e/0x1f0 [sunrpc] [37247.354893] ? rpc_check_timeout+0x140/0x140 [sunrpc] [37247.357143] __rpc_execute+0x7e/0x3d0 [sunrpc] [37247.359104] rpc_async_schedule+0x29/0x40 [sunrpc] [37247.362565] process_one_work+0x16b/0x370 [37247.365598] worker_thread+0x49/0x3f0 [37247.367164] kthread+0xf5/0x130 [37247.368453] ? max_active_store+0x80/0x80 [37247.370087] ? kthread_bind+0x10/0x10 [37247.372505] ret_from_fork+0x1f/0x30 [37247.374695] Modules linked in: nfsv3 cts rpcsec_gss_krb5 nfsv4 dns_resolver nfs rfcomm fuse ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_security ip6table_raw iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bnep snd_seq_midi snd_seq_midi_event crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper vmw_balloon snd_ens1371 snd_ac97_codec uvcvideo ac97_bus snd_seq pcspkr btusb btrtl btbcm videobuf2_vmalloc snd_pcm videobuf2_memops btintel videobuf2_v4l2 videodev bluetooth snd_timer snd_rawmidi vmw_vmci snd_seq_device rfkill videobuf2_common snd ecdh_generic i2c_piix4 soundcore nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops [37247.389774] ttm crc32c_intel drm serio_raw ahci ata_piix libahci libata mptspi scsi_transport_spi e1000 mptscsih mptbase i2c_core dm_mirror dm_region_hash dm_log dm_mod [37247.437859] CR2: 0000000000000098 [37247.462263] ---[ end trace 0d9a85f0df2cef9e ]--- Signed-off-by: Olga Kornievskaia <kolga@netapp.com> --- net/sunrpc/clnt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index 8ff11dc..8928f93 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -2487,7 +2487,7 @@ void rpc_force_rebind(struct rpc_clnt *clnt) out_garbage: clnt->cl_stats->rpcgarbage++; - if (task->tk_garb_retry) { + if (task->tk_garb_retry && !(task->tk_flags & RPC_TASK_KILLED)) { task->tk_garb_retry--; task->tk_action = call_encode; return -EAGAIN; @@ -2541,7 +2541,7 @@ void rpc_force_rebind(struct rpc_clnt *clnt) case rpc_autherr_badcred: case rpc_autherr_badverf: /* possibly garbled cred/verf? */ - if (!task->tk_garb_retry) + if (!task->tk_garb_retry || task->tk_flags & RPC_TASK_KILLED) break; task->tk_garb_retry--; trace_rpc__bad_creds(task); -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 1/1] SUNRPC: don't retry a gss destroy rpc if task was killed 2019-05-24 19:22 [PATCH 1/1] SUNRPC: don't retry a gss destroy rpc if task was killed Olga Kornievskaia @ 2019-05-25 15:40 ` Trond Myklebust 2019-05-28 14:32 ` Olga Kornievskaia 2019-05-28 14:36 ` Olga Kornievskaia 0 siblings, 2 replies; 6+ messages in thread From: Trond Myklebust @ 2019-05-25 15:40 UTC (permalink / raw) To: anna.schumaker@netapp.com, olga.kornievskaia@gmail.com Cc: linux-nfs@vger.kernel.org On Fri, 2019-05-24 at 15:22 -0400, Olga Kornievskaia wrote: > From: Olga Kornievskaia <kolga@netapp.com> > > It's possible that on an umount we send a NULL call to destroy > a gss context and a failure occurs on the reply. Since it's possible > that in that case the rpc client and auth structure are going away > don't retry. Otherwise, the kernel hits the following oops. > > [37247.291617] BUG: unable to handle kernel NULL pointer dereference > at 0000000000000098 > [37247.296200] #PF error: [normal kernel read fault] > [37247.298110] PGD 0 P4D 0 > [37247.299264] Oops: 0000 [#1] SMP PTI > [37247.300729] CPU: 1 PID: 23870 Comm: kworker/u256:1 Not tainted > 5.1.0+ #172 > [37247.303547] Hardware name: VMware, Inc. VMware Virtual > Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018 > [37247.311770] Workqueue: rpciod rpc_async_schedule [sunrpc] > [37247.313958] RIP: 0010:xprt_adjust_timeout+0x9/0x110 [sunrpc] > [37247.316220] Code: c7 c7 20 0d 50 c0 31 c0 e8 68 00 e2 fc 41 c7 45 > 04 f4 ff ff ff eb c9 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 > 54 55 53 <48> 8b 87 98 00 00 00 48 89 fb 4c 8b 27 48 8b 80 a8 00 00 > 00 > 48 8b > [37247.323625] RSP: 0018:ffffb0ab84f5fd68 EFLAGS: 00010207 > [37247.325676] RAX: 00000000fffffff5 RBX: ffff9e0ff1042800 RCX: > 0000000000000003 > [37247.328433] RDX: ffff9e0ff11baac0 RSI: 00000000fffffe01 RDI: > 0000000000000000 > [37247.331206] RBP: ffff9e0fe20cb200 R08: ffff9e0ff11baac0 R09: > ffff9e0ff11baac0 > [37247.334038] R10: ffff9e0ff11baab8 R11: 0000000000000003 R12: > ffff9e1039b55050 > [37247.337098] R13: ffff9e0ff1042830 R14: 0000000000000000 R15: > 0000000000000001 > [37247.339966] FS: 0000000000000000(0000) GS:ffff9e103bc40000(0000) > knlGS:0000000000000000 > [37247.343261] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [37247.345669] CR2: 0000000000000098 CR3: 000000007603a002 CR4: > 00000000001606e0 > [37247.348564] Call Trace: > [37247.351034] rpc_check_timeout+0x1d/0x140 [sunrpc] > [37247.353005] call_decode+0x13e/0x1f0 [sunrpc] > [37247.354893] ? rpc_check_timeout+0x140/0x140 [sunrpc] > [37247.357143] __rpc_execute+0x7e/0x3d0 [sunrpc] > [37247.359104] rpc_async_schedule+0x29/0x40 [sunrpc] > [37247.362565] process_one_work+0x16b/0x370 > [37247.365598] worker_thread+0x49/0x3f0 > [37247.367164] kthread+0xf5/0x130 > [37247.368453] ? max_active_store+0x80/0x80 > [37247.370087] ? kthread_bind+0x10/0x10 > [37247.372505] ret_from_fork+0x1f/0x30 > [37247.374695] Modules linked in: nfsv3 cts rpcsec_gss_krb5 nfsv4 > dns_resolver nfs rfcomm fuse ip6t_rpfilter ipt_REJECT nf_reject_ipv4 > ip6t_REJECT nf_reject_ipv6 xt_conntrack nf_conntrack nf_defrag_ipv6 > nf_defrag_ipv4 ebtable_nat ebtable_broute bridge stp llc > ip6table_mangle ip6table_security ip6table_raw iptable_mangle > iptable_security iptable_raw ebtable_filter ebtables ip6table_filter > ip6_tables iptable_filter bnep snd_seq_midi snd_seq_midi_event > crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel > crypto_simd cryptd glue_helper vmw_balloon snd_ens1371 snd_ac97_codec > uvcvideo ac97_bus snd_seq pcspkr btusb btrtl btbcm videobuf2_vmalloc > snd_pcm videobuf2_memops btintel videobuf2_v4l2 videodev bluetooth > snd_timer snd_rawmidi vmw_vmci snd_seq_device rfkill videobuf2_common > snd ecdh_generic i2c_piix4 soundcore nfsd nfs_acl lockd auth_rpcgss > grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic > pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt > fb_sys_fops > [37247.389774] ttm crc32c_intel drm serio_raw ahci ata_piix libahci > libata mptspi scsi_transport_spi e1000 mptscsih mptbase i2c_core > dm_mirror dm_region_hash dm_log dm_mod > [37247.437859] CR2: 0000000000000098 > [37247.462263] ---[ end trace 0d9a85f0df2cef9e ]--- > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com> > --- > net/sunrpc/clnt.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c > index 8ff11dc..8928f93 100644 > --- a/net/sunrpc/clnt.c > +++ b/net/sunrpc/clnt.c > @@ -2487,7 +2487,7 @@ void rpc_force_rebind(struct rpc_clnt *clnt) > > out_garbage: > clnt->cl_stats->rpcgarbage++; > - if (task->tk_garb_retry) { > + if (task->tk_garb_retry && !(task->tk_flags & RPC_TASK_KILLED)) > { > task->tk_garb_retry--; > task->tk_action = call_encode; > return -EAGAIN; > @@ -2541,7 +2541,7 @@ void rpc_force_rebind(struct rpc_clnt *clnt) > case rpc_autherr_badcred: > case rpc_autherr_badverf: > /* possibly garbled cred/verf? */ > - if (!task->tk_garb_retry) > + if (!task->tk_garb_retry || task->tk_flags & > RPC_TASK_KILLED) > break; > task->tk_garb_retry--; > trace_rpc__bad_creds(task); Hmm... The RPC_TASK_KILLED flag was changed in 5.2-rc1 in order to try to fix a few atomicity issues with it, so the above patch will not apply to the current codebase. The new behaviour will actually check the new flag in __rpc_execute() itself (see the check of RPC_SIGNALLED()). I'd therefore be very interested to see if the bug is still reproducible with the newest 5.2- rcX. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/1] SUNRPC: don't retry a gss destroy rpc if task was killed 2019-05-25 15:40 ` Trond Myklebust @ 2019-05-28 14:32 ` Olga Kornievskaia 2019-05-28 14:36 ` Olga Kornievskaia 1 sibling, 0 replies; 6+ messages in thread From: Olga Kornievskaia @ 2019-05-28 14:32 UTC (permalink / raw) To: Trond Myklebust; +Cc: anna.schumaker@netapp.com, linux-nfs@vger.kernel.org On Sat, May 25, 2019 at 11:40 AM Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Fri, 2019-05-24 at 15:22 -0400, Olga Kornievskaia wrote: > > From: Olga Kornievskaia <kolga@netapp.com> > > > > It's possible that on an umount we send a NULL call to destroy > > a gss context and a failure occurs on the reply. Since it's possible > > that in that case the rpc client and auth structure are going away > > don't retry. Otherwise, the kernel hits the following oops. > > > > [37247.291617] BUG: unable to handle kernel NULL pointer dereference > > at 0000000000000098 > > [37247.296200] #PF error: [normal kernel read fault] > > [37247.298110] PGD 0 P4D 0 > > [37247.299264] Oops: 0000 [#1] SMP PTI > > [37247.300729] CPU: 1 PID: 23870 Comm: kworker/u256:1 Not tainted > > 5.1.0+ #172 > > [37247.303547] Hardware name: VMware, Inc. VMware Virtual > > Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018 > > [37247.311770] Workqueue: rpciod rpc_async_schedule [sunrpc] > > [37247.313958] RIP: 0010:xprt_adjust_timeout+0x9/0x110 [sunrpc] > > [37247.316220] Code: c7 c7 20 0d 50 c0 31 c0 e8 68 00 e2 fc 41 c7 45 > > 04 f4 ff ff ff eb c9 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 > > 54 55 53 <48> 8b 87 98 00 00 00 48 89 fb 4c 8b 27 48 8b 80 a8 00 00 > > 00 > > 48 8b > > [37247.323625] RSP: 0018:ffffb0ab84f5fd68 EFLAGS: 00010207 > > [37247.325676] RAX: 00000000fffffff5 RBX: ffff9e0ff1042800 RCX: > > 0000000000000003 > > [37247.328433] RDX: ffff9e0ff11baac0 RSI: 00000000fffffe01 RDI: > > 0000000000000000 > > [37247.331206] RBP: ffff9e0fe20cb200 R08: ffff9e0ff11baac0 R09: > > ffff9e0ff11baac0 > > [37247.334038] R10: ffff9e0ff11baab8 R11: 0000000000000003 R12: > > ffff9e1039b55050 > > [37247.337098] R13: ffff9e0ff1042830 R14: 0000000000000000 R15: > > 0000000000000001 > > [37247.339966] FS: 0000000000000000(0000) GS:ffff9e103bc40000(0000) > > knlGS:0000000000000000 > > [37247.343261] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [37247.345669] CR2: 0000000000000098 CR3: 000000007603a002 CR4: > > 00000000001606e0 > > [37247.348564] Call Trace: > > [37247.351034] rpc_check_timeout+0x1d/0x140 [sunrpc] > > [37247.353005] call_decode+0x13e/0x1f0 [sunrpc] > > [37247.354893] ? rpc_check_timeout+0x140/0x140 [sunrpc] > > [37247.357143] __rpc_execute+0x7e/0x3d0 [sunrpc] > > [37247.359104] rpc_async_schedule+0x29/0x40 [sunrpc] > > [37247.362565] process_one_work+0x16b/0x370 > > [37247.365598] worker_thread+0x49/0x3f0 > > [37247.367164] kthread+0xf5/0x130 > > [37247.368453] ? max_active_store+0x80/0x80 > > [37247.370087] ? kthread_bind+0x10/0x10 > > [37247.372505] ret_from_fork+0x1f/0x30 > > [37247.374695] Modules linked in: nfsv3 cts rpcsec_gss_krb5 nfsv4 > > dns_resolver nfs rfcomm fuse ip6t_rpfilter ipt_REJECT nf_reject_ipv4 > > ip6t_REJECT nf_reject_ipv6 xt_conntrack nf_conntrack nf_defrag_ipv6 > > nf_defrag_ipv4 ebtable_nat ebtable_broute bridge stp llc > > ip6table_mangle ip6table_security ip6table_raw iptable_mangle > > iptable_security iptable_raw ebtable_filter ebtables ip6table_filter > > ip6_tables iptable_filter bnep snd_seq_midi snd_seq_midi_event > > crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel > > crypto_simd cryptd glue_helper vmw_balloon snd_ens1371 snd_ac97_codec > > uvcvideo ac97_bus snd_seq pcspkr btusb btrtl btbcm videobuf2_vmalloc > > snd_pcm videobuf2_memops btintel videobuf2_v4l2 videodev bluetooth > > snd_timer snd_rawmidi vmw_vmci snd_seq_device rfkill videobuf2_common > > snd ecdh_generic i2c_piix4 soundcore nfsd nfs_acl lockd auth_rpcgss > > grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic > > pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt > > fb_sys_fops > > [37247.389774] ttm crc32c_intel drm serio_raw ahci ata_piix libahci > > libata mptspi scsi_transport_spi e1000 mptscsih mptbase i2c_core > > dm_mirror dm_region_hash dm_log dm_mod > > [37247.437859] CR2: 0000000000000098 > > [37247.462263] ---[ end trace 0d9a85f0df2cef9e ]--- > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com> > > --- > > net/sunrpc/clnt.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c > > index 8ff11dc..8928f93 100644 > > --- a/net/sunrpc/clnt.c > > +++ b/net/sunrpc/clnt.c > > @@ -2487,7 +2487,7 @@ void rpc_force_rebind(struct rpc_clnt *clnt) > > > > out_garbage: > > clnt->cl_stats->rpcgarbage++; > > - if (task->tk_garb_retry) { > > + if (task->tk_garb_retry && !(task->tk_flags & RPC_TASK_KILLED)) > > { > > task->tk_garb_retry--; > > task->tk_action = call_encode; > > return -EAGAIN; > > @@ -2541,7 +2541,7 @@ void rpc_force_rebind(struct rpc_clnt *clnt) > > case rpc_autherr_badcred: > > case rpc_autherr_badverf: > > /* possibly garbled cred/verf? */ > > - if (!task->tk_garb_retry) > > + if (!task->tk_garb_retry || task->tk_flags & > > RPC_TASK_KILLED) > > break; > > task->tk_garb_retry--; > > trace_rpc__bad_creds(task); > > > Hmm... The RPC_TASK_KILLED flag was changed in 5.2-rc1 in order to try > to fix a few atomicity issues with it, so the above patch will not > apply to the current codebase. > > The new behaviour will actually check the new flag in __rpc_execute() > itself (see the check of RPC_SIGNALLED()). I'd therefore be very > interested to see if the bug is still reproducible with the newest 5.2- > rcX. > > -- > Trond Myklebust > Linux NFS client maintainer, Hammerspace > trond.myklebust@hammerspace.com > > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/1] SUNRPC: don't retry a gss destroy rpc if task was killed 2019-05-25 15:40 ` Trond Myklebust 2019-05-28 14:32 ` Olga Kornievskaia @ 2019-05-28 14:36 ` Olga Kornievskaia 2019-05-28 15:03 ` Olga Kornievskaia 1 sibling, 1 reply; 6+ messages in thread From: Olga Kornievskaia @ 2019-05-28 14:36 UTC (permalink / raw) To: Trond Myklebust; +Cc: anna.schumaker@netapp.com, linux-nfs@vger.kernel.org On Sat, May 25, 2019 at 11:40 AM Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Fri, 2019-05-24 at 15:22 -0400, Olga Kornievskaia wrote: > > From: Olga Kornievskaia <kolga@netapp.com> > > > > It's possible that on an umount we send a NULL call to destroy > > a gss context and a failure occurs on the reply. Since it's possible > > that in that case the rpc client and auth structure are going away > > don't retry. Otherwise, the kernel hits the following oops. > > > > [37247.291617] BUG: unable to handle kernel NULL pointer dereference > > at 0000000000000098 > > [37247.296200] #PF error: [normal kernel read fault] > > [37247.298110] PGD 0 P4D 0 > > [37247.299264] Oops: 0000 [#1] SMP PTI > > [37247.300729] CPU: 1 PID: 23870 Comm: kworker/u256:1 Not tainted > > 5.1.0+ #172 > > [37247.303547] Hardware name: VMware, Inc. VMware Virtual > > Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018 > > [37247.311770] Workqueue: rpciod rpc_async_schedule [sunrpc] > > [37247.313958] RIP: 0010:xprt_adjust_timeout+0x9/0x110 [sunrpc] > > [37247.316220] Code: c7 c7 20 0d 50 c0 31 c0 e8 68 00 e2 fc 41 c7 45 > > 04 f4 ff ff ff eb c9 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 > > 54 55 53 <48> 8b 87 98 00 00 00 48 89 fb 4c 8b 27 48 8b 80 a8 00 00 > > 00 > > 48 8b > > [37247.323625] RSP: 0018:ffffb0ab84f5fd68 EFLAGS: 00010207 > > [37247.325676] RAX: 00000000fffffff5 RBX: ffff9e0ff1042800 RCX: > > 0000000000000003 > > [37247.328433] RDX: ffff9e0ff11baac0 RSI: 00000000fffffe01 RDI: > > 0000000000000000 > > [37247.331206] RBP: ffff9e0fe20cb200 R08: ffff9e0ff11baac0 R09: > > ffff9e0ff11baac0 > > [37247.334038] R10: ffff9e0ff11baab8 R11: 0000000000000003 R12: > > ffff9e1039b55050 > > [37247.337098] R13: ffff9e0ff1042830 R14: 0000000000000000 R15: > > 0000000000000001 > > [37247.339966] FS: 0000000000000000(0000) GS:ffff9e103bc40000(0000) > > knlGS:0000000000000000 > > [37247.343261] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [37247.345669] CR2: 0000000000000098 CR3: 000000007603a002 CR4: > > 00000000001606e0 > > [37247.348564] Call Trace: > > [37247.351034] rpc_check_timeout+0x1d/0x140 [sunrpc] > > [37247.353005] call_decode+0x13e/0x1f0 [sunrpc] > > [37247.354893] ? rpc_check_timeout+0x140/0x140 [sunrpc] > > [37247.357143] __rpc_execute+0x7e/0x3d0 [sunrpc] > > [37247.359104] rpc_async_schedule+0x29/0x40 [sunrpc] > > [37247.362565] process_one_work+0x16b/0x370 > > [37247.365598] worker_thread+0x49/0x3f0 > > [37247.367164] kthread+0xf5/0x130 > > [37247.368453] ? max_active_store+0x80/0x80 > > [37247.370087] ? kthread_bind+0x10/0x10 > > [37247.372505] ret_from_fork+0x1f/0x30 > > [37247.374695] Modules linked in: nfsv3 cts rpcsec_gss_krb5 nfsv4 > > dns_resolver nfs rfcomm fuse ip6t_rpfilter ipt_REJECT nf_reject_ipv4 > > ip6t_REJECT nf_reject_ipv6 xt_conntrack nf_conntrack nf_defrag_ipv6 > > nf_defrag_ipv4 ebtable_nat ebtable_broute bridge stp llc > > ip6table_mangle ip6table_security ip6table_raw iptable_mangle > > iptable_security iptable_raw ebtable_filter ebtables ip6table_filter > > ip6_tables iptable_filter bnep snd_seq_midi snd_seq_midi_event > > crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel > > crypto_simd cryptd glue_helper vmw_balloon snd_ens1371 snd_ac97_codec > > uvcvideo ac97_bus snd_seq pcspkr btusb btrtl btbcm videobuf2_vmalloc > > snd_pcm videobuf2_memops btintel videobuf2_v4l2 videodev bluetooth > > snd_timer snd_rawmidi vmw_vmci snd_seq_device rfkill videobuf2_common > > snd ecdh_generic i2c_piix4 soundcore nfsd nfs_acl lockd auth_rpcgss > > grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic > > pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt > > fb_sys_fops > > [37247.389774] ttm crc32c_intel drm serio_raw ahci ata_piix libahci > > libata mptspi scsi_transport_spi e1000 mptscsih mptbase i2c_core > > dm_mirror dm_region_hash dm_log dm_mod > > [37247.437859] CR2: 0000000000000098 > > [37247.462263] ---[ end trace 0d9a85f0df2cef9e ]--- > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com> > > --- > > net/sunrpc/clnt.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c > > index 8ff11dc..8928f93 100644 > > --- a/net/sunrpc/clnt.c > > +++ b/net/sunrpc/clnt.c > > @@ -2487,7 +2487,7 @@ void rpc_force_rebind(struct rpc_clnt *clnt) > > > > out_garbage: > > clnt->cl_stats->rpcgarbage++; > > - if (task->tk_garb_retry) { > > + if (task->tk_garb_retry && !(task->tk_flags & RPC_TASK_KILLED)) > > { > > task->tk_garb_retry--; > > task->tk_action = call_encode; > > return -EAGAIN; > > @@ -2541,7 +2541,7 @@ void rpc_force_rebind(struct rpc_clnt *clnt) > > case rpc_autherr_badcred: > > case rpc_autherr_badverf: > > /* possibly garbled cred/verf? */ > > - if (!task->tk_garb_retry) > > + if (!task->tk_garb_retry || task->tk_flags & > > RPC_TASK_KILLED) > > break; > > task->tk_garb_retry--; > > trace_rpc__bad_creds(task); > > > Hmm... The RPC_TASK_KILLED flag was changed in 5.2-rc1 in order to try > to fix a few atomicity issues with it, so the above patch will not > apply to the current codebase. > > The new behaviour will actually check the new flag in __rpc_execute() > itself (see the check of RPC_SIGNALLED()). I'd therefore be very > interested to see if the bug is still reproducible with the newest 5.2- > rcX. With 5.2-rcX, the umount hangs (but on the plus side it doesn't oops). > > -- > Trond Myklebust > Linux NFS client maintainer, Hammerspace > trond.myklebust@hammerspace.com > > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/1] SUNRPC: don't retry a gss destroy rpc if task was killed 2019-05-28 14:36 ` Olga Kornievskaia @ 2019-05-28 15:03 ` Olga Kornievskaia 2019-05-28 20:09 ` Olga Kornievskaia 0 siblings, 1 reply; 6+ messages in thread From: Olga Kornievskaia @ 2019-05-28 15:03 UTC (permalink / raw) To: Trond Myklebust; +Cc: anna.schumaker@netapp.com, linux-nfs@vger.kernel.org On Tue, May 28, 2019 at 10:36 AM Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote: > > On Sat, May 25, 2019 at 11:40 AM Trond Myklebust > <trondmy@hammerspace.com> wrote: > > > > On Fri, 2019-05-24 at 15:22 -0400, Olga Kornievskaia wrote: > > > From: Olga Kornievskaia <kolga@netapp.com> > > > > > > It's possible that on an umount we send a NULL call to destroy > > > a gss context and a failure occurs on the reply. Since it's possible > > > that in that case the rpc client and auth structure are going away > > > don't retry. Otherwise, the kernel hits the following oops. > > > > > > [37247.291617] BUG: unable to handle kernel NULL pointer dereference > > > at 0000000000000098 > > > [37247.296200] #PF error: [normal kernel read fault] > > > [37247.298110] PGD 0 P4D 0 > > > [37247.299264] Oops: 0000 [#1] SMP PTI > > > [37247.300729] CPU: 1 PID: 23870 Comm: kworker/u256:1 Not tainted > > > 5.1.0+ #172 > > > [37247.303547] Hardware name: VMware, Inc. VMware Virtual > > > Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018 > > > [37247.311770] Workqueue: rpciod rpc_async_schedule [sunrpc] > > > [37247.313958] RIP: 0010:xprt_adjust_timeout+0x9/0x110 [sunrpc] > > > [37247.316220] Code: c7 c7 20 0d 50 c0 31 c0 e8 68 00 e2 fc 41 c7 45 > > > 04 f4 ff ff ff eb c9 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 > > > 54 55 53 <48> 8b 87 98 00 00 00 48 89 fb 4c 8b 27 48 8b 80 a8 00 00 > > > 00 > > > 48 8b > > > [37247.323625] RSP: 0018:ffffb0ab84f5fd68 EFLAGS: 00010207 > > > [37247.325676] RAX: 00000000fffffff5 RBX: ffff9e0ff1042800 RCX: > > > 0000000000000003 > > > [37247.328433] RDX: ffff9e0ff11baac0 RSI: 00000000fffffe01 RDI: > > > 0000000000000000 > > > [37247.331206] RBP: ffff9e0fe20cb200 R08: ffff9e0ff11baac0 R09: > > > ffff9e0ff11baac0 > > > [37247.334038] R10: ffff9e0ff11baab8 R11: 0000000000000003 R12: > > > ffff9e1039b55050 > > > [37247.337098] R13: ffff9e0ff1042830 R14: 0000000000000000 R15: > > > 0000000000000001 > > > [37247.339966] FS: 0000000000000000(0000) GS:ffff9e103bc40000(0000) > > > knlGS:0000000000000000 > > > [37247.343261] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > [37247.345669] CR2: 0000000000000098 CR3: 000000007603a002 CR4: > > > 00000000001606e0 > > > [37247.348564] Call Trace: > > > [37247.351034] rpc_check_timeout+0x1d/0x140 [sunrpc] > > > [37247.353005] call_decode+0x13e/0x1f0 [sunrpc] > > > [37247.354893] ? rpc_check_timeout+0x140/0x140 [sunrpc] > > > [37247.357143] __rpc_execute+0x7e/0x3d0 [sunrpc] > > > [37247.359104] rpc_async_schedule+0x29/0x40 [sunrpc] > > > [37247.362565] process_one_work+0x16b/0x370 > > > [37247.365598] worker_thread+0x49/0x3f0 > > > [37247.367164] kthread+0xf5/0x130 > > > [37247.368453] ? max_active_store+0x80/0x80 > > > [37247.370087] ? kthread_bind+0x10/0x10 > > > [37247.372505] ret_from_fork+0x1f/0x30 > > > [37247.374695] Modules linked in: nfsv3 cts rpcsec_gss_krb5 nfsv4 > > > dns_resolver nfs rfcomm fuse ip6t_rpfilter ipt_REJECT nf_reject_ipv4 > > > ip6t_REJECT nf_reject_ipv6 xt_conntrack nf_conntrack nf_defrag_ipv6 > > > nf_defrag_ipv4 ebtable_nat ebtable_broute bridge stp llc > > > ip6table_mangle ip6table_security ip6table_raw iptable_mangle > > > iptable_security iptable_raw ebtable_filter ebtables ip6table_filter > > > ip6_tables iptable_filter bnep snd_seq_midi snd_seq_midi_event > > > crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel > > > crypto_simd cryptd glue_helper vmw_balloon snd_ens1371 snd_ac97_codec > > > uvcvideo ac97_bus snd_seq pcspkr btusb btrtl btbcm videobuf2_vmalloc > > > snd_pcm videobuf2_memops btintel videobuf2_v4l2 videodev bluetooth > > > snd_timer snd_rawmidi vmw_vmci snd_seq_device rfkill videobuf2_common > > > snd ecdh_generic i2c_piix4 soundcore nfsd nfs_acl lockd auth_rpcgss > > > grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic > > > pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt > > > fb_sys_fops > > > [37247.389774] ttm crc32c_intel drm serio_raw ahci ata_piix libahci > > > libata mptspi scsi_transport_spi e1000 mptscsih mptbase i2c_core > > > dm_mirror dm_region_hash dm_log dm_mod > > > [37247.437859] CR2: 0000000000000098 > > > [37247.462263] ---[ end trace 0d9a85f0df2cef9e ]--- > > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com> > > > --- > > > net/sunrpc/clnt.c | 4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c > > > index 8ff11dc..8928f93 100644 > > > --- a/net/sunrpc/clnt.c > > > +++ b/net/sunrpc/clnt.c > > > @@ -2487,7 +2487,7 @@ void rpc_force_rebind(struct rpc_clnt *clnt) > > > > > > out_garbage: > > > clnt->cl_stats->rpcgarbage++; > > > - if (task->tk_garb_retry) { > > > + if (task->tk_garb_retry && !(task->tk_flags & RPC_TASK_KILLED)) > > > { > > > task->tk_garb_retry--; > > > task->tk_action = call_encode; > > > return -EAGAIN; > > > @@ -2541,7 +2541,7 @@ void rpc_force_rebind(struct rpc_clnt *clnt) > > > case rpc_autherr_badcred: > > > case rpc_autherr_badverf: > > > /* possibly garbled cred/verf? */ > > > - if (!task->tk_garb_retry) > > > + if (!task->tk_garb_retry || task->tk_flags & > > > RPC_TASK_KILLED) > > > break; > > > task->tk_garb_retry--; > > > trace_rpc__bad_creds(task); > > > > > > Hmm... The RPC_TASK_KILLED flag was changed in 5.2-rc1 in order to try > > to fix a few atomicity issues with it, so the above patch will not > > apply to the current codebase. > > > > The new behaviour will actually check the new flag in __rpc_execute() > > itself (see the check of RPC_SIGNALLED()). I'd therefore be very > > interested to see if the bug is still reproducible with the newest 5.2- > > rcX. > > With 5.2-rcX, the umount hangs (but on the plus side it doesn't oops). Also to clarify, this is generic (sec) umount that hangs (not even using the bad verifier trigger). > > > > > -- > > Trond Myklebust > > Linux NFS client maintainer, Hammerspace > > trond.myklebust@hammerspace.com > > > > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/1] SUNRPC: don't retry a gss destroy rpc if task was killed 2019-05-28 15:03 ` Olga Kornievskaia @ 2019-05-28 20:09 ` Olga Kornievskaia 0 siblings, 0 replies; 6+ messages in thread From: Olga Kornievskaia @ 2019-05-28 20:09 UTC (permalink / raw) To: Trond Myklebust; +Cc: anna.schumaker@netapp.com, linux-nfs@vger.kernel.org On Tue, May 28, 2019 at 11:03 AM Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote: > > On Tue, May 28, 2019 at 10:36 AM Olga Kornievskaia > <olga.kornievskaia@gmail.com> wrote: > > > > On Sat, May 25, 2019 at 11:40 AM Trond Myklebust > > <trondmy@hammerspace.com> wrote: > > > > > > On Fri, 2019-05-24 at 15:22 -0400, Olga Kornievskaia wrote: > > > > From: Olga Kornievskaia <kolga@netapp.com> > > > > > > > > It's possible that on an umount we send a NULL call to destroy > > > > a gss context and a failure occurs on the reply. Since it's possible > > > > that in that case the rpc client and auth structure are going away > > > > don't retry. Otherwise, the kernel hits the following oops. > > > > > > > > [37247.291617] BUG: unable to handle kernel NULL pointer dereference > > > > at 0000000000000098 > > > > [37247.296200] #PF error: [normal kernel read fault] > > > > [37247.298110] PGD 0 P4D 0 > > > > [37247.299264] Oops: 0000 [#1] SMP PTI > > > > [37247.300729] CPU: 1 PID: 23870 Comm: kworker/u256:1 Not tainted > > > > 5.1.0+ #172 > > > > [37247.303547] Hardware name: VMware, Inc. VMware Virtual > > > > Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018 > > > > [37247.311770] Workqueue: rpciod rpc_async_schedule [sunrpc] > > > > [37247.313958] RIP: 0010:xprt_adjust_timeout+0x9/0x110 [sunrpc] > > > > [37247.316220] Code: c7 c7 20 0d 50 c0 31 c0 e8 68 00 e2 fc 41 c7 45 > > > > 04 f4 ff ff ff eb c9 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 > > > > 54 55 53 <48> 8b 87 98 00 00 00 48 89 fb 4c 8b 27 48 8b 80 a8 00 00 > > > > 00 > > > > 48 8b > > > > [37247.323625] RSP: 0018:ffffb0ab84f5fd68 EFLAGS: 00010207 > > > > [37247.325676] RAX: 00000000fffffff5 RBX: ffff9e0ff1042800 RCX: > > > > 0000000000000003 > > > > [37247.328433] RDX: ffff9e0ff11baac0 RSI: 00000000fffffe01 RDI: > > > > 0000000000000000 > > > > [37247.331206] RBP: ffff9e0fe20cb200 R08: ffff9e0ff11baac0 R09: > > > > ffff9e0ff11baac0 > > > > [37247.334038] R10: ffff9e0ff11baab8 R11: 0000000000000003 R12: > > > > ffff9e1039b55050 > > > > [37247.337098] R13: ffff9e0ff1042830 R14: 0000000000000000 R15: > > > > 0000000000000001 > > > > [37247.339966] FS: 0000000000000000(0000) GS:ffff9e103bc40000(0000) > > > > knlGS:0000000000000000 > > > > [37247.343261] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > [37247.345669] CR2: 0000000000000098 CR3: 000000007603a002 CR4: > > > > 00000000001606e0 > > > > [37247.348564] Call Trace: > > > > [37247.351034] rpc_check_timeout+0x1d/0x140 [sunrpc] > > > > [37247.353005] call_decode+0x13e/0x1f0 [sunrpc] > > > > [37247.354893] ? rpc_check_timeout+0x140/0x140 [sunrpc] > > > > [37247.357143] __rpc_execute+0x7e/0x3d0 [sunrpc] > > > > [37247.359104] rpc_async_schedule+0x29/0x40 [sunrpc] > > > > [37247.362565] process_one_work+0x16b/0x370 > > > > [37247.365598] worker_thread+0x49/0x3f0 > > > > [37247.367164] kthread+0xf5/0x130 > > > > [37247.368453] ? max_active_store+0x80/0x80 > > > > [37247.370087] ? kthread_bind+0x10/0x10 > > > > [37247.372505] ret_from_fork+0x1f/0x30 > > > > [37247.374695] Modules linked in: nfsv3 cts rpcsec_gss_krb5 nfsv4 > > > > dns_resolver nfs rfcomm fuse ip6t_rpfilter ipt_REJECT nf_reject_ipv4 > > > > ip6t_REJECT nf_reject_ipv6 xt_conntrack nf_conntrack nf_defrag_ipv6 > > > > nf_defrag_ipv4 ebtable_nat ebtable_broute bridge stp llc > > > > ip6table_mangle ip6table_security ip6table_raw iptable_mangle > > > > iptable_security iptable_raw ebtable_filter ebtables ip6table_filter > > > > ip6_tables iptable_filter bnep snd_seq_midi snd_seq_midi_event > > > > crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel > > > > crypto_simd cryptd glue_helper vmw_balloon snd_ens1371 snd_ac97_codec > > > > uvcvideo ac97_bus snd_seq pcspkr btusb btrtl btbcm videobuf2_vmalloc > > > > snd_pcm videobuf2_memops btintel videobuf2_v4l2 videodev bluetooth > > > > snd_timer snd_rawmidi vmw_vmci snd_seq_device rfkill videobuf2_common > > > > snd ecdh_generic i2c_piix4 soundcore nfsd nfs_acl lockd auth_rpcgss > > > > grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic > > > > pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt > > > > fb_sys_fops > > > > [37247.389774] ttm crc32c_intel drm serio_raw ahci ata_piix libahci > > > > libata mptspi scsi_transport_spi e1000 mptscsih mptbase i2c_core > > > > dm_mirror dm_region_hash dm_log dm_mod > > > > [37247.437859] CR2: 0000000000000098 > > > > [37247.462263] ---[ end trace 0d9a85f0df2cef9e ]--- > > > > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com> > > > > --- > > > > net/sunrpc/clnt.c | 4 ++-- > > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c > > > > index 8ff11dc..8928f93 100644 > > > > --- a/net/sunrpc/clnt.c > > > > +++ b/net/sunrpc/clnt.c > > > > @@ -2487,7 +2487,7 @@ void rpc_force_rebind(struct rpc_clnt *clnt) > > > > > > > > out_garbage: > > > > clnt->cl_stats->rpcgarbage++; > > > > - if (task->tk_garb_retry) { > > > > + if (task->tk_garb_retry && !(task->tk_flags & RPC_TASK_KILLED)) > > > > { > > > > task->tk_garb_retry--; > > > > task->tk_action = call_encode; > > > > return -EAGAIN; > > > > @@ -2541,7 +2541,7 @@ void rpc_force_rebind(struct rpc_clnt *clnt) > > > > case rpc_autherr_badcred: > > > > case rpc_autherr_badverf: > > > > /* possibly garbled cred/verf? */ > > > > - if (!task->tk_garb_retry) > > > > + if (!task->tk_garb_retry || task->tk_flags & > > > > RPC_TASK_KILLED) > > > > break; > > > > task->tk_garb_retry--; > > > > trace_rpc__bad_creds(task); > > > > > > > > > Hmm... The RPC_TASK_KILLED flag was changed in 5.2-rc1 in order to try > > > to fix a few atomicity issues with it, so the above patch will not > > > apply to the current codebase. > > > > > > The new behaviour will actually check the new flag in __rpc_execute() > > > itself (see the check of RPC_SIGNALLED()). I'd therefore be very > > > interested to see if the bug is still reproducible with the newest 5.2- > > > rcX. > > > > With 5.2-rcX, the umount hangs (but on the plus side it doesn't oops). > > Also to clarify, this is generic (sec) umount that hangs (not even > using the bad verifier trigger). > Hi Trond, There is an infinite loop going on during (sec=krb5,vers=3) unmount, the task is stuck as it gets ENOTCONN and keeps on just calling call_encode and then call_status in a loop. This fixes the problem. Which makes me question the patch that moved the ENOTCONN (commit c8485e4d63: SUNRPC: Handle ECONNREFUSED correctly in xprt_transmit()). Any comments on that? diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index d6e57da..94a653b 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -2288,13 +2288,13 @@ void rpc_force_rebind(struct rpc_clnt *clnt) case -ECONNREFUSED: case -ECONNRESET: case -ECONNABORTED: + case -ENOTCONN: rpc_force_rebind(clnt); /* fall through */ case -EADDRINUSE: rpc_delay(task, 3*HZ); /* fall through */ case -EPIPE: - case -ENOTCONN: case -EAGAIN: break; case -EIO: I have applied this fix and then been trying to trigger a context destruction with bad verifier but haven't been able to trigger it because as I explained before the rpc client gets killed right away with the gss destroy task never executing it. So I can't say if with the current code the problem exists or not. > > > > > > > > -- > > > Trond Myklebust > > > Linux NFS client maintainer, Hammerspace > > > trond.myklebust@hammerspace.com > > > > > > ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2019-05-28 20:09 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2019-05-24 19:22 [PATCH 1/1] SUNRPC: don't retry a gss destroy rpc if task was killed Olga Kornievskaia 2019-05-25 15:40 ` Trond Myklebust 2019-05-28 14:32 ` Olga Kornievskaia 2019-05-28 14:36 ` Olga Kornievskaia 2019-05-28 15:03 ` Olga Kornievskaia 2019-05-28 20:09 ` Olga Kornievskaia
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox