From: bfields@fieldses.org (J. Bruce Fields)
To: Nick Alcock <nix@esperi.org.uk>
Cc: NFS list <linux-nfs@vger.kernel.org>
Subject: Re: steam-associated reproducible hard NFSv4.2 client hang (5.9, 5.10)
Date: Tue, 23 Feb 2021 17:57:01 -0500 [thread overview]
Message-ID: <20210223225701.GD8042@fieldses.org> (raw)
In-Reply-To: <877dourt7c.fsf@esperi.org.uk>
On Sun, Jan 03, 2021 at 02:27:51PM +0000, Nick Alcock wrote:
> So I bought a game or two on Steam a fortnight ago to while away a
> Christmas spent stuck indoors with no family, decorations, or presents
> (stuck at my sister's and she got a perfectly-timed case of covid-19).
> It turned out this was a very bad idea!
>
> I keep my Steam directory on an ext4 fs on the local fileserver and
> access it via NFSv4.2: over 10GbE: it's actually faster than local
> spinning-rust storage would be (five-spindle RAID-6 and bcache and
> 128GiB RAM on the server for caching probably have something to do with
> that!) though likely slower than the NVMe the local machine actually
> has: but it doesn't have enough local storage, so I keep all the Steam
> stuff remotely.
>
> (I keep my $HOME on the same fileserver, though that's on xfs: I'd use
> xfs for Steam, too, but there are still a lot of games out there that
> fail on filesystems with 64-bit inode numbers.)
>
> Unfortunately, what I observe now when kicking up Steam is a spasm of
> downloading I/O for the game I just bought and then a complete
> client-side hang when it starts installing: just about all I can do is
> hard-reboot. This of course stops me using Steam for anything which
> didn't make me very happy :P
>
> The server hums onwards as though nothing went wrong, but the client is
> in a bad way indeed: not only has NFS hung but there are clearly some
> important locks taken out since even an ls on a non-NFS dir (with no NFS
> dirs in the PATH) hangs, as does a while :; ps -e; done loop. I can
> rebuild with lockdep and do a lock dump if that would be useful. (The
> kernel is not quite stock: it has DTrace built in, but the hangs happen
> even if the kernel module has never been loaded, and I'm pretty sure the
> hangs can't be associated with the not-very-substantial built-in DTrace
> changes. I can try rebuilding without that as well, but I'll have to rip
> chunks out of my monitoring machinery to do that so I've avoided doing
> it so far.)
>
> I finally got round to doing a bit of debugging today. NFS debugging
> dumps covering the last fifteen seconds or so before the hang and a
> couple of minutes afterwards (taken on the client, logged without
> incident to the server's syslog, as proof that not *everything* is
> stalled) are a few megabytes long so are at
> <http://www.esperi.org.uk/~nix/temporary/nfs-steam-hang.log>. Everything
> seems to go wrong after an especially large bunch of NFS commits.
>
> Both client (silk, 192.168.16.21) and server (loom, 192.168.16.8) are
> running 5.10.4 with the same NFS config, though the client has no NFS
> server exports at the time and the server isn't acting as an NFS client
> at all:
>
> CONFIG_NFS_FS=y
> # CONFIG_NFS_V2 is not set
> CONFIG_NFS_V3=y
> CONFIG_NFS_V3_ACL=y
> CONFIG_NFS_V4=y
> # CONFIG_NFS_SWAP is not set
> CONFIG_NFS_V4_1=y
> CONFIG_NFS_V4_2=y
> CONFIG_PNFS_FILE_LAYOUT=y
> CONFIG_PNFS_BLOCK=y
> CONFIG_PNFS_FLEXFILE_LAYOUT=m
> CONFIG_NFS_V4_1_IMPLEMENTATION_ID_DOMAIN="kernel.org"
> # CONFIG_NFS_V4_1_MIGRATION is not set
> # CONFIG_ROOT_NFS is not set
> # CONFIG_NFS_USE_LEGACY_DNS is not set
> CONFIG_NFS_USE_KERNEL_DNS=y
> CONFIG_NFS_DEBUG=y
> CONFIG_NFS_DISABLE_UDP_SUPPORT=y
> # CONFIG_NFS_V4_2_READ_PLUS is not set
> CONFIG_NFSD=y
> CONFIG_NFSD_V2_ACL=y
> CONFIG_NFSD_V3=y
> CONFIG_NFSD_V3_ACL=y
> CONFIG_NFSD_V4=y
> # CONFIG_NFSD_BLOCKLAYOUT is not set
> # CONFIG_NFSD_SCSILAYOUT is not set
> # CONFIG_NFSD_FLEXFILELAYOUT is not set
> CONFIG_NFSD_V4_2_INTER_SSC=y
> CONFIG_GRACE_PERIOD=y
> CONFIG_LOCKD=y
> CONFIG_LOCKD_V4=y
> CONFIG_NFS_ACL_SUPPORT=y
> CONFIG_NFS_COMMON=y
> CONFIG_SUNRPC=y
> CONFIG_SUNRPC_GSS=y
> CONFIG_SUNRPC_BACKCHANNEL=y
> CONFIG_SUNRPC_DEBUG=y
>
> The server (loom) has a large number of exports: many of the filesystems
> being exported are moved around via bind mounts and the like. The steam
> datadir is in /pkg/non-free/steam, which is bind-mounted onto
> /home/nix/.steam on the client: /home/nix itself is bind-mounted from
> /home/.loom.srvr.nix/steam, and is NFS-exported from the server under
> that pathname.
>
> Relevant exports, from /proc/fs/nfs/exports:
>
> / 192.168.16.0/24,silk.wkstn.nix(ro,insecure,no_root_squash,sync,no_wdelay,no_subtree_check,v4root,fsid=0,uuid=0a4a4563:00764033:8c827c0e:989cf534,sec=390003:390004:390005:1)
> /home/.loom.srvr.nix *.srvr.nix,fold.srvr.nix(rw,root_squash,sync,wdelay,no_subtree_check,uuid=0a4a4563:00764033:8c827c0e:989cf534,sec=1)
> /pkg/non-free/steam 192.168.16.0/24,silk.wkstn.nix(rw,root_squash,sync,wdelay,uuid=11b94016:cb6e4b64:beb6c134:9ea3d6b3,sec=1)
>
> Blocked tasks after the hang suggest that the client is hung waiting
> indefinitely for the server to reply, and I'd guess the server doesn't
> think it needs to. (I can do some server-side debugging dumps too if you
> think the problem might be there.)
I seem to remember there have been some subtle changes to how dropped
replies are handled. If I remember right: a v4 server is never supposed
to drop a reply without also closing the tcp connection, but it may have
done so in the past. (I *think* all those cases are fixed?)
And the client may have been changed to be more unforgiving about that
requirement in recent years.
But I don't know if that's where the issue is.
I'm not sure how best to go about debugging. I suppose you could try to
capture a network trace and work out if there's an RPC that isn't
getting a reply.
Maybe turning on server debugging and looking for "svc: svc_process
dropit" messages timed with the start of the hang. I think what you
want is "rpdebug -m rpc -s svcdsp".
Does rpc.mountd look healthy?
--b.
>
> [ 322.843962] sysrq: Show Blocked State
> [ 322.844058] task:tee state:D stack: 0 pid: 1176 ppid: 1173 flags:0x00000000
> [ 322.844064] Call Trace:
> [ 322.844074] __schedule+0x3db/0x7ee
> [ 322.844079] ? io_schedule+0x70/0x6d
> [ 322.844082] schedule+0x68/0xdc
> [ 322.844085] io_schedule+0x46/0x6d
> [ 322.844089] bit_wait_io+0x11/0x52
> [ 322.844092] __wait_on_bit+0x31/0x90
> [ 322.844096] out_of_line_wait_on_bit+0x7e/0x80
> [ 322.844100] ? var_wake_function+0x30/0x27
> [ 322.844104] nfs_wait_on_request.part.0+0x31/0x33
> [ 322.844107] nfs_page_group_lock_head+0x31/0x78
> [ 322.844110] nfs_lock_and_join_requests+0x9d/0x26e
> [ 322.844114] nfs_updatepage+0x1be/0x985
> [ 322.844118] nfs_write_end+0x128/0x52a
> [ 322.844122] ? iov_iter_copy_from_user_atomic+0xe8/0x3ce
> [ 322.844127] generic_perform_write+0xed/0x17c
> [ 322.844131] nfs_file_write+0x14a/0x29f
> [ 322.844135] new_sync_write+0xfb/0x16b
> [ 322.844139] vfs_write+0x1ee/0x29a
> [ 322.844142] ? vfs_read+0x17b/0x198
> [ 322.844145] ksys_write+0x61/0xd0
> [ 322.844148] __x64_sys_write+0x1a/0x1c
> [ 322.844151] do_syscall_64+0x32/0x45
> [ 322.844154] entry_SYSCALL_64_after_hwframe+0x44/0x0
> [ 322.844158] RIP: 0033:0x7f4781f323a3
> [ 322.844160] RSP: 002b:00007ffcf8220408 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [ 322.844164] RAX: ffffffffffffffda RBX: 0000000000000092 RCX: 00007f4781f323a3
> [ 322.844166] RDX: 0000000000000092 RSI: 00007ffcf8220530 RDI: 0000000000000003
> [ 322.844168] RBP: 00007ffcf8220530 R08: 0000000000000000 R09: 0000000000000001
> [ 322.844170] R10: 0000000000400ad2 R11: 0000000000000246 R12: 0000000000000092
> [ 322.844172] R13: 0000000002361580 R14: 0000000000000092 R15: 00007f4782201720
> [ 322.844202] task:CIPCServer::Thr state:D stack: 0 pid: 1801 ppid: 1540 flags:0x20020000
> [ 322.844207] Call Trace:
> [ 322.844212] __schedule+0x3db/0x7ee
> [ 322.844219] ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e
> [ 322.844222] schedule+0x68/0xdc
> [ 322.844227] rpc_wait_bit_killable+0x11/0x52
> [ 322.844230] __wait_on_bit+0x31/0x90
> [ 322.844235] ? __rpc_atrun+0x20/0x1a
> [ 322.844238] ? __rpc_atrun+0x20/0x1a
> [ 322.844242] out_of_line_wait_on_bit+0x7e/0x80
> [ 322.844246] ? var_wake_function+0x30/0x27
> [ 322.844250] __rpc_execute+0xe5/0x485
> [ 322.844254] ? sugov_get_util+0xf0/0xeb
> [ 322.844258] rpc_execute+0xa5/0xaa
> [ 322.844263] rpc_run_task+0x148/0x187
> [ 322.844267] nfs4_do_call_sync+0x61/0x81
> [ 322.844272] _nfs4_proc_getattr+0x10a/0x120
> [ 322.844278] nfs4_proc_getattr+0x67/0xee
> [ 322.844283] __nfs_revalidate_inode+0xb7/0x2df
> [ 322.844287] nfs_access_get_cached+0x14d/0x1df
> [ 322.844291] nfs_do_access+0x48/0x28e
> [ 322.844296] ? step_into+0x2bc/0x6a1
> [ 322.844301] nfs_permission+0xa2/0x1b8
> [ 322.844305] inode_permission+0x96/0xca
> [ 322.844308] ? unlazy_walk+0x4c/0x80
> [ 322.844312] link_path_walk.part.0+0x2a5/0x320
> [ 322.844315] ? path_init+0x2c2/0x3e5
> [ 322.844318] path_lookupat+0x3f/0x1b1
> [ 322.844322] filename_lookup+0x97/0x17f
> [ 322.844326] ? kmem_cache_alloc+0x32/0x200
> [ 322.844331] user_path_at_empty+0x59/0x8a
> [ 322.844333] do_faccessat+0x70/0x227
> [ 322.844337] __ia32_sys_access+0x1c/0x1e
> [ 322.844340] __do_fast_syscall_32+0x5f/0x8e
> [ 322.844342] do_fast_syscall_32+0x3d/0x80
> [ 322.844346] entry_SYSCALL_compat_after_hwframe+0x45/0x0
> [ 322.844348] RIP: 0023:0xf7ee6549
> [ 322.844350] RSP: 002b:00000000eb763b28 EFLAGS: 00200292 ORIG_RAX: 0000000000000021
> [ 322.844353] RAX: ffffffffffffffda RBX: 000000005754d9e0 RCX: 0000000000000004
> [ 322.844355] RDX: 00000000f7ab3e1c RSI: 000000005754d9e0 RDI: 00000000eb763bc8
> [ 322.844357] RBP: 00000000f78a91c0 R08: 0000000000000000 R09: 0000000000000000
> [ 322.844359] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 322.844360] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 322.844364] task:CFileWriterThre state:D stack: 0 pid: 1810 ppid: 1540 flags:0x20024000
> [ 322.844367] Call Trace:
> [ 322.844371] __schedule+0x3db/0x7ee
> [ 322.844375] ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e
> [ 322.844379] schedule+0x68/0xdc
> [ 322.844382] rpc_wait_bit_killable+0x11/0x52
> [ 322.844385] __wait_on_bit+0x31/0x90
> [ 322.844389] ? call_reserveresult+0xa0/0x9f
> [ 322.844391] ? call_transmit_status+0x160/0x159
> [ 322.844395] out_of_line_wait_on_bit+0x7e/0x80
> [ 322.844398] ? var_wake_function+0x30/0x27
> [ 322.844401] __rpc_execute+0xe5/0x485
> [ 322.844404] ? nfs_check_verifier+0x5b/0xa4
> [ 322.844407] rpc_execute+0xa5/0xaa
> [ 322.844410] rpc_run_task+0x148/0x187
> [ 322.844414] nfs4_do_call_sync+0x61/0x81
> [ 322.844417] _nfs4_proc_getattr+0x10a/0x120
> [ 322.844422] nfs4_proc_getattr+0x67/0xee
> [ 322.844426] __nfs_revalidate_inode+0xb7/0x2df
> [ 322.844430] nfs_getattr+0x159/0x374
> [ 322.844433] vfs_getattr_nosec+0x96/0xa4
> [ 322.844435] vfs_statx+0x7a/0xe5
> [ 322.844438] vfs_fstatat+0x1f/0x21
> [ 322.844443] __do_compat_sys_ia32_lstat64+0x2b/0x4e
> [ 322.844448] __ia32_compat_sys_ia32_lstat64+0x14/0x16
> [ 322.844450] __do_fast_syscall_32+0x5f/0x8e
> [ 322.844453] do_fast_syscall_32+0x3d/0x80
> [ 322.844457] entry_SYSCALL_compat_after_hwframe+0x45/0x0
> [ 322.844459] RIP: 0023:0xf7ee6549
> [ 322.844460] RSP: 002b:00000000eb561be8 EFLAGS: 00000292 ORIG_RAX: 00000000000000c4
> [ 322.844463] RAX: ffffffffffffffda RBX: 00000000c521bf60 RCX: 00000000eb561e70
> [ 322.844465] RDX: 00000000f7ab3e1c RSI: 00000000eb561e70 RDI: 00000000c521bf60
> [ 322.844467] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [ 322.844468] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 322.844470] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 322.844477] task:Steam - DirWatc state:D stack: 0 pid: 1924 ppid: 1540 flags:0x20024000
> [ 322.844480] Call Trace:
> [ 322.844484] __schedule+0x3db/0x7ee
> [ 322.844488] ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e
> [ 322.844492] schedule+0x68/0xdc
> [ 322.844495] rpc_wait_bit_killable+0x11/0x52
> [ 322.844498] __wait_on_bit+0x31/0x90
> [ 322.844501] ? rpc_sleep_on_timeout+0x64/0x9c
> [ 322.844504] ? rpc_check_timeout+0x1e0/0x1e0
> [ 322.844507] ? call_connect_status+0x180/0x17a
> [ 322.844511] out_of_line_wait_on_bit+0x7e/0x80
> [ 322.844513] ? var_wake_function+0x30/0x27
> [ 322.844516] __rpc_execute+0xe5/0x485
> [ 322.844520] rpc_execute+0xa5/0xaa
> [ 322.844523] rpc_run_task+0x148/0x187
> [ 322.844526] nfs4_do_call_sync+0x61/0x81
> [ 322.844530] _nfs4_proc_getattr+0x10a/0x120
> [ 322.844534] nfs4_proc_getattr+0x67/0xee
> [ 322.844538] __nfs_revalidate_inode+0xb7/0x2df
> [ 322.844542] nfs_lookup_verify_inode+0x67/0x82
> [ 322.844545] nfs_do_lookup_revalidate+0x19b/0x29c
> [ 322.844548] nfs4_do_lookup_revalidate+0x5d/0xc5
> [ 322.844552] nfs4_lookup_revalidate+0x6a/0x8d
> [ 322.844555] lookup_fast+0xb8/0x122
> [ 322.844559] path_openat+0xfd/0xfd8
> [ 322.844562] ? unlazy_walk+0x4c/0x80
> [ 322.844565] ? terminate_walk+0x62/0xed
> [ 322.844569] ? putname+0x4b/0x53
> [ 322.844572] do_filp_open+0x86/0x110
> [ 322.844576] ? kmem_cache_alloc+0x32/0x200
> [ 322.844579] ? __alloc_fd+0xb2/0x143
> [ 322.844583] do_sys_openat2+0x8d/0x13a
> [ 322.844585] ? do_faccessat+0x108/0x227
> [ 322.844588] __ia32_compat_sys_openat+0x48/0x63
> [ 322.844591] __do_fast_syscall_32+0x5f/0x8e
> [ 322.844594] do_fast_syscall_32+0x3d/0x80
> [ 322.844597] entry_SYSCALL_compat_after_hwframe+0x45/0x0
> [ 322.844600] RIP: 0023:0xf7ee6549
> [ 322.844601] RSP: 002b:000000008c8fbcb0 EFLAGS: 00200287 ORIG_RAX: 0000000000000127
> [ 322.844604] RAX: ffffffffffffffda RBX: 00000000ffffff9c RCX: 000000008c8fbffc
> [ 322.844606] RDX: 0000000000098800 RSI: 0000000000000000 RDI: 00000000f7ab3e1c
> [ 322.844608] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [ 322.844609] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 322.844611] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 322.844614] task:CGenericAsyncFi state:D stack: 0 pid: 1927 ppid: 1540 flags:0x20020000
> [ 322.844617] Call Trace:
> [ 322.844621] __schedule+0x3db/0x7ee
> [ 322.844625] schedule+0x68/0xdc
> [ 322.844628] io_schedule+0x46/0x6d
> [ 322.844631] wait_on_page_bit_common+0xf8/0x325
> [ 322.844634] ? trace_raw_output_file_check_and_advance_wb_err+0x80/0x7d
> [ 322.844638] wait_on_page_bit+0x3f/0x41
> [ 322.844642] wait_on_page_writeback.part.0+0x35/0x80
> [ 322.844645] wait_on_page_writeback+0x27/0x29
> [ 322.844647] __filemap_fdatawait_range+0x88/0xd9
> [ 322.844651] filemap_write_and_wait_range+0x45/0x80
> [ 322.844655] nfs_wb_all+0x30/0x13d
> [ 322.844658] nfs4_file_flush+0x85/0xc0
> [ 322.844661] filp_close+0x3e/0x78
> [ 322.844664] __close_fd+0x23/0x30
> [ 322.844667] __ia32_sys_close+0x22/0x50
> [ 322.844670] __do_fast_syscall_32+0x5f/0x8e
> [ 322.844673] do_fast_syscall_32+0x3d/0x80
> [ 322.844676] entry_SYSCALL_compat_after_hwframe+0x45/0x0
> [ 322.844678] RIP: 0023:0xf7ee6549
> [ 322.844680] RSP: 002b:000000008c5fddb0 EFLAGS: 00200286 ORIG_RAX: 0000000000000006
> [ 322.844682] RAX: ffffffffffffffda RBX: 000000000000004d RCX: 0000000000000002
> [ 322.844684] RDX: 0000000000000000 RSI: 00000000f7d02e70 RDI: 000000000000004d
> [ 322.844686] RBP: 000000008c5fe110 R08: 0000000000000000 R09: 0000000000000000
> [ 322.844688] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 322.844689] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 322.844699] task:ThreadPoolForeg state:D stack: 0 pid: 1768 ppid: 1735 flags:0x00004000
> [ 322.844702] Call Trace:
> [ 322.844705] __schedule+0x3db/0x7ee
> [ 322.844709] ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e
> [ 322.844712] schedule+0x68/0xdc
> [ 322.844716] rpc_wait_bit_killable+0x11/0x52
> [ 322.844719] __wait_on_bit+0x31/0x90
> [ 322.844722] ? rpc_sleep_on_timeout+0x64/0x9c
> [ 322.844725] ? rpc_check_timeout+0x1e0/0x1e0
> [ 322.844728] ? call_connect_status+0x180/0x17a
> [ 322.844731] out_of_line_wait_on_bit+0x7e/0x80
> [ 322.844734] ? var_wake_function+0x30/0x27
> [ 322.844737] __rpc_execute+0xe5/0x485
> [ 322.844740] ? nfs_check_verifier+0x5b/0xa4
> [ 322.844743] rpc_execute+0xa5/0xaa
> [ 322.844746] rpc_run_task+0x148/0x187
> [ 322.844750] nfs4_do_call_sync+0x61/0x81
> [ 322.844753] _nfs4_proc_getattr+0x10a/0x120
> [ 322.844758] nfs4_proc_getattr+0x67/0xee
> [ 322.844762] __nfs_revalidate_inode+0xb7/0x2df
> [ 322.844766] nfs_getattr+0x159/0x374
> [ 322.844769] vfs_getattr_nosec+0x96/0xa4
> [ 322.844771] vfs_statx+0x7a/0xe5
> [ 322.844774] __do_sys_newstat+0x31/0x4c
> [ 322.844778] __x64_sys_newstat+0x16/0x18
> [ 322.844781] do_syscall_64+0x32/0x45
> [ 322.844783] entry_SYSCALL_64_after_hwframe+0x44/0x0
> [ 322.844786] RIP: 0033:0x7fe70ebf19e6
> [ 322.844787] RSP: 002b:00007fe6c15b2068 EFLAGS: 00000246 ORIG_RAX: 0000000000000004
> [ 322.844790] RAX: ffffffffffffffda RBX: 00000bd4193e1910 RCX: 00007fe70ebf19e6
> [ 322.844792] RDX: 00007fe6c15b2078 RSI: 00007fe6c15b2078 RDI: 00000bd4193e1910
> [ 322.844794] RBP: 00007fe6c15b2160 R08: 0000000000000001 R09: 00007ffd971ed090
> [ 322.844796] R10: 0000000000056764 R11: 0000000000000246 R12: 00007fe6c15b2078
> [ 322.844797] R13: 00000bd4192f6f60 R14: 00007fe6c15b2108 R15: 00007fe6c15b2120
> [ 322.844801] task:ThreadPoolForeg state:D stack: 0 pid: 1769 ppid: 1735 flags:0x00000000
> [ 322.844804] Call Trace:
> [ 322.844807] __schedule+0x3db/0x7ee
> [ 322.844811] ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e
> [ 322.844814] schedule+0x68/0xdc
> [ 322.844817] rpc_wait_bit_killable+0x11/0x52
> [ 322.844820] __wait_on_bit+0x31/0x90
> [ 322.844823] ? __rpc_atrun+0x20/0x1a
> [ 322.844826] ? __rpc_atrun+0x20/0x1a
> [ 322.844829] out_of_line_wait_on_bit+0x7e/0x80
> [ 322.844831] ? var_wake_function+0x30/0x27
> [ 322.844834] __rpc_execute+0xe5/0x485
> [ 322.844837] ? update_cfs_group+0x9a/0x9e
> [ 322.844840] rpc_execute+0xa5/0xaa
> [ 322.844842] rpc_run_task+0x148/0x187
> [ 322.844845] nfs4_do_call_sync+0x61/0x81
> [ 322.844849] _nfs4_proc_getattr+0x10a/0x120
> [ 322.844853] nfs4_proc_getattr+0x67/0xee
> [ 322.844856] __nfs_revalidate_inode+0xb7/0x2df
> [ 322.844859] nfs_file_write+0xfc/0x29f
> [ 322.844862] new_sync_write+0xfb/0x16b
> [ 322.844866] vfs_write+0x1ee/0x29a
> [ 322.844868] ? __fget_light+0x31/0x72
> [ 322.844871] ksys_write+0x61/0xd0
> [ 322.844874] __x64_sys_write+0x1a/0x1c
> [ 322.844877] do_syscall_64+0x32/0x45
> [ 322.844880] entry_SYSCALL_64_after_hwframe+0x44/0x0
> [ 322.844881] RIP: 0033:0x7fe71859ab7f
> [ 322.844883] RSP: 002b:00007fe6bd5b0d30 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
> [ 322.844885] RAX: ffffffffffffffda RBX: 00007fe6bd5b0d68 RCX: 00007fe71859ab7f
> [ 322.844887] RDX: 0000000000000007 RSI: 00007fe6bd5b1090 RDI: 000000000000005b
> [ 322.844888] RBP: 00007fe6bd5b0de0 R08: 0000000000000000 R09: 00007ffd971ed090
> [ 322.844890] R10: 0000000000065fd2 R11: 0000000000000293 R12: 00000bd418d95020
> [ 322.844891] R13: 00007fe6bd5b1090 R14: 0000000000000007 R15: 0000000000000000
> [ 322.844900] task:ThreadPoolForeg state:D stack: 0 pid: 1807 ppid: 1773 flags:0x00000000
> [ 322.844902] Call Trace:
> [ 322.844904] __schedule+0x3db/0x7ee
> [ 322.844907] ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e
> [ 322.844908] schedule+0x68/0xdc
> [ 322.844911] rpc_wait_bit_killable+0x11/0x52
> [ 322.844913] __wait_on_bit+0x31/0x90
> [ 322.844915] ? __rpc_atrun+0x20/0x1a
> [ 322.844916] ? __rpc_atrun+0x20/0x1a
> [ 322.844918] out_of_line_wait_on_bit+0x7e/0x80
> [ 322.844920] ? var_wake_function+0x30/0x27
> [ 322.844922] __rpc_execute+0xe5/0x485
> [ 322.844924] rpc_execute+0xa5/0xaa
> [ 322.844926] rpc_run_task+0x148/0x187
> [ 322.844928] nfs4_do_call_sync+0x61/0x81
> [ 322.844930] _nfs4_proc_getattr+0x10a/0x120
> [ 322.844933] nfs4_proc_getattr+0x67/0xee
> [ 322.844936] __nfs_revalidate_inode+0xb7/0x2df
> [ 322.844938] nfs_access_get_cached+0x14d/0x1df
> [ 322.844940] nfs_do_access+0x48/0x28e
> [ 322.844942] ? nfs4_lookup_revalidate+0x38/0x8d
> [ 322.844944] nfs_permission+0xa2/0x1b8
> [ 322.844946] inode_permission+0x96/0xca
> [ 322.844948] ? unlazy_walk+0x4c/0x80
> [ 322.844950] link_path_walk.part.0+0x2a5/0x320
> [ 322.844952] ? path_init+0x2c2/0x3e5
> [ 322.844954] path_lookupat+0x3f/0x1b1
> [ 322.844956] filename_lookup+0x97/0x17f
> [ 322.844959] ? do_unlk+0x9a/0xca
> [ 322.844960] ? kmem_cache_alloc+0x32/0x200
> [ 322.844963] user_path_at_empty+0x59/0x8a
> [ 322.844965] vfs_statx+0x64/0xe5
> [ 322.844967] __do_sys_newstat+0x31/0x4c
> [ 322.844968] ? do_fcntl+0x113/0x582
> [ 322.844971] ? fput+0x13/0x15
> [ 322.844972] ? __x64_sys_fcntl+0x81/0xa8
> [ 322.844974] __x64_sys_newstat+0x16/0x18
> [ 322.844976] do_syscall_64+0x32/0x45
> [ 322.844977] entry_SYSCALL_64_after_hwframe+0x44/0x0
> [ 322.844979] RIP: 0033:0x7f0d1fd0c9e6
> [ 322.844980] RSP: 002b:00007f0d08e12e18 EFLAGS: 00000246 ORIG_RAX: 0000000000000004
> [ 322.844981] RAX: ffffffffffffffda RBX: 00007f0d08e12edc RCX: 00007f0d1fd0c9e6
> [ 322.844983] RDX: 00007f0d08e12e28 RSI: 00007f0d08e12e28 RDI: 00001605c2ff1169
> [ 322.844984] RBP: 00007f0d08e12ec0 R08: 0000000000000001 R09: 00007fffaddd9090
> [ 322.844985] R10: 00001605c3133480 R11: 0000000000000246 R12: 0000000000000000
> [ 322.844986] R13: 0000000000000000 R14: 00001605c2ff0f00 R15: 0000000000000000
> [ 322.844988] task:ThreadPoolForeg state:D stack: 0 pid: 2031 ppid: 1773 flags:0x00004000
> [ 322.844990] Call Trace:
> [ 322.844992] __schedule+0x3db/0x7ee
> [ 322.844995] ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e
> [ 322.844997] schedule+0x68/0xdc
> [ 322.844999] rpc_wait_bit_killable+0x11/0x52
> [ 322.845001] __wait_on_bit+0x31/0x90
> [ 322.845003] ? rpc_sleep_on_timeout+0x64/0x9c
> [ 322.845005] ? rpc_check_timeout+0x1e0/0x1e0
> [ 322.845006] ? call_connect_status+0x180/0x17a
> [ 322.845009] out_of_line_wait_on_bit+0x7e/0x80
> [ 322.845010] ? var_wake_function+0x30/0x27
> [ 322.845012] __rpc_execute+0xe5/0x485
> [ 322.845014] rpc_execute+0xa5/0xaa
> [ 322.845016] rpc_run_task+0x148/0x187
> [ 322.845018] nfs4_do_call_sync+0x61/0x81
> [ 322.845020] _nfs4_proc_getattr+0x10a/0x120
> [ 322.845023] nfs4_proc_getattr+0x67/0xee
> [ 322.845026] __nfs_revalidate_inode+0xb7/0x2df
> [ 322.845028] nfs_access_get_cached+0x14d/0x1df
> [ 322.845030] nfs_do_access+0x48/0x28e
> [ 322.845032] ? nfs4_lookup_revalidate+0x38/0x8d
> [ 322.845034] nfs_permission+0xa2/0x1b8
> [ 322.845036] inode_permission+0x96/0xca
> [ 322.845038] ? unlazy_walk+0x4c/0x80
> [ 322.845040] link_path_walk.part.0+0x2a5/0x320
> [ 322.845042] ? path_init+0x2c2/0x3e5
> [ 322.845044] path_lookupat+0x3f/0x1b1
> [ 322.845046] filename_lookup+0x97/0x17f
> [ 322.845049] ? kmem_cache_alloc+0x32/0x200
> [ 322.845051] user_path_at_empty+0x59/0x8a
> [ 322.845054] ? mntput_no_expire+0x4f/0x264
> [ 322.845055] vfs_statx+0x64/0xe5
> [ 322.845057] ? putname+0x4b/0x53
> [ 322.845059] __do_sys_newstat+0x31/0x4c
> [ 322.845062] ? __x64_sys_futex+0x13a/0x159
> [ 322.845064] ? switch_fpu_return+0x58/0xf2
> [ 322.845066] __x64_sys_newstat+0x16/0x18
> [ 322.845068] do_syscall_64+0x32/0x45
> [ 322.845069] entry_SYSCALL_64_after_hwframe+0x44/0x0
> [ 322.845071] RIP: 0033:0x7f0d1fd0c9e6
> [ 322.845072] RSP: 002b:00007f0d00331108 EFLAGS: 00000246 ORIG_RAX: 0000000000000004
> [ 322.845073] RAX: ffffffffffffffda RBX: 00001605c2fdc0a0 RCX: 00007f0d1fd0c9e6
> [ 322.845074] RDX: 00007f0d00331118 RSI: 00007f0d00331118 RDI: 00001605c2fdc0a0
> [ 322.845076] RBP: 00007f0d00331200 R08: 0000000000000001 R09: 00007fffaddd9090
> [ 322.845077] R10: 000000000005771c R11: 0000000000000246 R12: 00007f0d00331118
> [ 322.845078] R13: 00001605c3309c40 R14: 00007f0d003311a8 R15: 00007f0d003311c0
> [ 322.845087] task:192.168.16.8-ma state:D stack: 0 pid: 2044 ppid: 2 flags:0x00004000
> [ 322.845089] Call Trace:
> [ 322.845091] __schedule+0x3db/0x7ee
> [ 322.845094] ? trace_event_raw_event_rpc_xdr_overflow+0x280/0x27e
> [ 322.845095] schedule+0x68/0xdc
> [ 322.845098] rpc_wait_bit_killable+0x11/0x52
> [ 322.845100] __wait_on_bit+0x31/0x90
> [ 322.845102] ? __rpc_atrun+0x20/0x1a
> [ 322.845103] ? __rpc_atrun+0x20/0x1a
> [ 322.845105] out_of_line_wait_on_bit+0x7e/0x80
> [ 322.845107] ? var_wake_function+0x30/0x27
> [ 322.845109] __rpc_execute+0xe5/0x485
> [ 322.845111] rpc_execute+0xa5/0xaa
> [ 322.845113] rpc_run_task+0x148/0x187
> [ 322.845115] nfs4_do_call_sync+0x61/0x81
> [ 322.845117] _nfs4_proc_statfs+0xc1/0xcf
> [ 322.845120] nfs4_proc_statfs+0x55/0x74
> [ 322.845122] nfs_statfs+0x65/0x152
> [ 322.845124] ? _nfs4_proc_delegreturn+0x21e/0x2e5
> [ 322.845127] statfs_by_dentry+0x55/0x74
> [ 322.845129] vfs_statfs+0x24/0x43
> [ 322.845132] check_free_space+0x3c/0xe5
> [ 322.845135] do_acct_process+0x5d/0x5e1
> [ 322.845136] ? nfs_end_delegation_return+0x1cb/0x3aa
> [ 322.845138] ? complete+0x49/0x57
> [ 322.845141] acct_process+0xe5/0x105
> [ 322.845143] do_exit+0x781/0xa1b
> [ 322.845145] ? refcount_dec_and_lock+0x22/0x8c
> [ 322.845148] __module_put_and_exit+0x1b/0x1b
> [ 322.845150] nfs4_run_state_manager+0x293/0x933
> [ 322.845152] ? nfs4_do_reclaim+0x820/0x81c
> [ 322.845155] kthread+0x142/0x15f
> [ 322.845157] ? __kthread_create_worker+0x130/0x122
> [ 322.845159] ret_from_fork+0x1f/0x2a
> [ 369.859385] INFO: task tee:1176 blocked for more than 122 seconds.
> [ 369.859391] Not tainted 5.10.4-00021-gde017294a539-dirty #3
> [ 369.859393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 369.859395] task:tee state:D stack: 0 pid: 1176 ppid: 1173 flags:0x00000000
> [ 369.859399] Call Trace:
> [ 369.859407] __schedule+0x3db/0x7ee
> [ 369.859410] ? io_schedule+0x70/0x6d
> [ 369.859412] schedule+0x68/0xdc
> [ 369.859415] io_schedule+0x46/0x6d
> [ 369.859417] bit_wait_io+0x11/0x52
> [ 369.859420] __wait_on_bit+0x31/0x90
> [ 369.859422] out_of_line_wait_on_bit+0x7e/0x80
> [ 369.859426] ? var_wake_function+0x30/0x27
> [ 369.859429] nfs_wait_on_request.part.0+0x31/0x33
> [ 369.859431] nfs_page_group_lock_head+0x31/0x78
> [ 369.859433] nfs_lock_and_join_requests+0x9d/0x26e
> [ 369.859436] nfs_updatepage+0x1be/0x985
> [ 369.859439] nfs_write_end+0x128/0x52a
> [ 369.859443] ? iov_iter_copy_from_user_atomic+0xe8/0x3ce
> [ 369.859446] generic_perform_write+0xed/0x17c
> [ 369.859449] nfs_file_write+0x14a/0x29f
> [ 369.859452] new_sync_write+0xfb/0x16b
> [ 369.859455] vfs_write+0x1ee/0x29a
> [ 369.859457] ? vfs_read+0x17b/0x198
> [ 369.859459] ksys_write+0x61/0xd0
> [ 369.859462] __x64_sys_write+0x1a/0x1c
> [ 369.859464] do_syscall_64+0x32/0x45
> [ 369.859467] entry_SYSCALL_64_after_hwframe+0x44/0x0
> [ 369.859469] RIP: 0033:0x7f4781f323a3
> [ 369.859471] RSP: 002b:00007ffcf8220408 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [ 369.859474] RAX: ffffffffffffffda RBX: 0000000000000092 RCX: 00007f4781f323a3
> [ 369.859475] RDX: 0000000000000092 RSI: 00007ffcf8220530 RDI: 0000000000000003
> [ 369.859477] RBP: 00007ffcf8220530 R08: 0000000000000000 R09: 0000000000000001
> [ 369.859478] R10: 0000000000400ad2 R11: 0000000000000246 R12: 0000000000000092
> [ 369.859480] R13: 0000000002361580 R14: 0000000000000092 R15: 00007f4782201720
> [ 369.859495] INFO: task CGenericAsyncFi:1927 blocked for more than 122 seconds.
> [ 369.859497] Not tainted 5.10.4-00021-gde017294a539-dirty #3
> [ 369.859498] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 369.859499] task:CGenericAsyncFi state:D stack: 0 pid: 1927 ppid: 1540 flags:0x20020000
> [ 369.859501] Call Trace:
> [ 369.859504] __schedule+0x3db/0x7ee
> [ 369.859506] schedule+0x68/0xdc
> [ 369.859508] io_schedule+0x46/0x6d
> [ 369.859510] wait_on_page_bit_common+0xf8/0x325
> [ 369.859512] ? trace_raw_output_file_check_and_advance_wb_err+0x80/0x7d
> [ 369.859515] wait_on_page_bit+0x3f/0x41
> [ 369.859517] wait_on_page_writeback.part.0+0x35/0x80
> [ 369.859519] wait_on_page_writeback+0x27/0x29
> [ 369.859521] __filemap_fdatawait_range+0x88/0xd9
> [ 369.859523] filemap_write_and_wait_range+0x45/0x80
> [ 369.859525] nfs_wb_all+0x30/0x13d
> [ 369.859528] nfs4_file_flush+0x85/0xc0
> [ 369.859530] filp_close+0x3e/0x78
> [ 369.859532] __close_fd+0x23/0x30
> [ 369.859534] __ia32_sys_close+0x22/0x50
> [ 369.859536] __do_fast_syscall_32+0x5f/0x8e
> [ 369.859538] do_fast_syscall_32+0x3d/0x80
> [ 369.859540] entry_SYSCALL_compat_after_hwframe+0x45/0x0
> [ 369.859541] RIP: 0023:0xf7ee6549
> [ 369.859543] RSP: 002b:000000008c5fddb0 EFLAGS: 00200286 ORIG_RAX: 0000000000000006
> [ 369.859545] RAX: ffffffffffffffda RBX: 000000000000004d RCX: 0000000000000002
> [ 369.859546] RDX: 0000000000000000 RSI: 00000000f7d02e70 RDI: 000000000000004d
> [ 369.859547] RBP: 000000008c5fe110 R08: 0000000000000000 R09: 0000000000000000
> [ 369.859548] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ 369.859549] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 394.423093] nfs: server loom not responding, still trying
>
> --
> NULL && (void)
next prev parent reply other threads:[~2021-02-23 23:02 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-03 14:27 steam-associated reproducible hard NFSv4.2 client hang (5.9, 5.10) Nick Alcock
2021-02-23 22:57 ` J. Bruce Fields [this message]
2021-02-23 23:58 ` Trond Myklebust
2021-02-24 2:01 ` bfields
2021-04-01 13:33 ` Nix
2021-04-01 13:44 ` bfields
2021-04-01 21:52 ` Nix
2021-04-02 19:20 ` bfields
2021-04-03 22:41 ` Nix
2021-04-05 11:48 ` Nix
2021-04-05 16:52 ` Nix
-- strict thread matches above, loose matches on Subject: below --
2021-02-13 15:21 Nick Alcock
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210223225701.GD8042@fieldses.org \
--to=bfields@fieldses.org \
--cc=linux-nfs@vger.kernel.org \
--cc=nix@esperi.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.