From: Dai Ngo <dai.ngo@oracle.com>
To: Rik Theys <Rik.Theys@esat.kuleuven.be>,
Jeff Layton <jlayton@kernel.org>,
Linux Nfs <linux-nfs@vger.kernel.org>
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning
Date: Tue, 19 Mar 2024 10:09:30 -0700 [thread overview]
Message-ID: <50dd9475-b485-4b9b-bcbd-5f7dfabfbac5@oracle.com> (raw)
In-Reply-To: <c7dbc796-7e7d-4041-ac71-eea02a701b12@esat.kuleuven.be>
On 3/19/24 12:58 AM, Rik Theys wrote:
> Hi,
>
> On 3/18/24 22:54, Jeff Layton wrote:
>> On Mon, 2024-03-18 at 22:15 +0100, Rik Theys wrote:
>>> Hi,
>>>
>>> On 3/18/24 21:21, Rik Theys wrote:
>>>> Hi Jeff,
>>>>
>>>> On 3/12/24 13:47, Jeff Layton wrote:
>>>>> On Tue, 2024-03-12 at 13:24 +0100, Rik Theys wrote:
>>>>>> Hi Jeff,
>>>>>>
>>>>>> On 3/12/24 12:22, Jeff Layton wrote:
>>>>>>> On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
>>>>>>>> Since a few weeks our Rocky Linux 9 NFS server has periodically
>>>>>>>> logged hung nfsd tasks. The initial effect was that some clients
>>>>>>>> could no longer access the NFS server. This got worse and worse
>>>>>>>> (probably as more nfsd threads got blocked) and we had to restart
>>>>>>>> the server. Restarting the server also failed as the NFS server
>>>>>>>> service could no longer be stopped.
>>>>>>>>
>>>>>>>> The initial kernel we noticed this behavior on was
>>>>>>>> kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed
>>>>>>>> kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue
>>>>>>>> happened again on this newer kernel version:
>>>>> 419 is fairly up to date with nfsd changes. There are some known bugs
around callbacks, and there is a draft MR in flight to fix them.
>>>>>
>>>>> What kernel were you on prior to 5.14.0-362.18.1.el9_3.x86_64 ? If we
>>>>> can bracket the changes around a particular version, then that might
>>>>> help identify the problem.
>>>>>
>>>>>>>> [Mon Mar 11 14:10:08 2024] Not tainted
>>>>>>>> 5.14.0-419.el9.x86_64 #1
>>>>>>>> [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>> [Mon Mar 11 14:10:08 2024]task:nfsd state:D stack:0
>>>>>>>> pid:8865 ppid:2 flags:0x00004000
>>>>>>>> [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>>>> [Mon Mar 11 14:10:08 2024] <TASK>
>>>>>>>> [Mon Mar 11 14:10:08 2024] __schedule+0x21b/0x550
>>>>>>>> [Mon Mar 11 14:10:08 2024] schedule+0x2d/0x70
>>>>>>>> [Mon Mar 11 14:10:08 2024] schedule_timeout+0x11f/0x160
>>>>>>>> [Mon Mar 11 14:10:08 2024] ? select_idle_sibling+0x28/0x430
>>>>>>>> [Mon Mar 11 14:10:08 2024] ? wake_affine+0x62/0x1f0
>>>>>>>> [Mon Mar 11 14:10:08 2024] __wait_for_common+0x90/0x1d0
>>>>>>>> [Mon Mar 11 14:10:08 2024] ? __pfx_schedule_timeout+0x10/0x10
>>>>>>>> [Mon Mar 11 14:10:08 2024] __flush_workqueue+0x13a/0x3f0
>>>>>>>> [Mon Mar 11 14:10:08 2024] nfsd4_shutdown_callback+0x49/0x120
>>>>>>>> [nfsd]
>>>>>>>> [Mon Mar 11 14:10:08 2024] ? nfsd4_cld_remove+0x54/0x1d0
>>>>>>>> [nfsd]
>>>>>>>> [Mon Mar 11 14:10:08 2024] ?
>>>>>>>> nfsd4_return_all_client_layouts+0xc4/0xf0 [nfsd]
>>>>>>>> [Mon Mar 11 14:10:08 2024] ? nfsd4_shutdown_copy+0x68/0xc0
>>>>>>>> [nfsd]
>>>>>>>> [Mon Mar 11 14:10:08 2024] __destroy_client+0x1f3/0x290
>>>>>>>> [nfsd]
>>>>>>>> [Mon Mar 11 14:10:08 2024] nfsd4_exchange_id+0x75f/0x770
>>>>>>>> [nfsd]
>>>>>>>> [Mon Mar 11 14:10:08 2024] ? nfsd4_decode_opaque+0x3a/0x90
>>>>>>>> [nfsd]
>>>>>>>> [Mon Mar 11 14:10:08 2024] nfsd4_proc_compound+0x44b/0x700
>>>>>>>> [nfsd]
>>>>>>>> [Mon Mar 11 14:10:08 2024] nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>> [Mon Mar 11 14:10:08 2024] svc_process_common+0x2ec/0x660
>>>>>>>> [sunrpc]
>>>>>>>> [Mon Mar 11 14:10:08 2024] ? __pfx_nfsd_dispatch+0x10/0x10
>>>>>>>> [nfsd]
>>>>>>>> [Mon Mar 11 14:10:08 2024] ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>> [Mon Mar 11 14:10:08 2024] svc_process+0x12d/0x170 [sunrpc]
>>>>>>>> [Mon Mar 11 14:10:08 2024] nfsd+0x84/0xb0 [nfsd]
>>>>>>>> [Mon Mar 11 14:10:08 2024] kthread+0xdd/0x100
>>>>>>>> [Mon Mar 11 14:10:08 2024] ? __pfx_kthread+0x10/0x10
>>>>>>>> [Mon Mar 11 14:10:08 2024] ret_from_fork+0x29/0x50
>>>>>>>> [Mon Mar 11 14:10:08 2024] </TASK>
>>>>>>>> [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for
>>>>>>>> more than 122 seconds.
>>>>>>>> [Mon Mar 11 14:10:08 2024] Not tainted
>>>>>>>> 5.14.0-419.el9.x86_64 #1
>>>>>>>> [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>> [Mon Mar 11 14:10:08 2024]task:nfsd state:D stack:0
>>>>>>>> pid:8866 ppid:2 flags:0x00004000
>>>>>>>> [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>>>> [Mon Mar 11 14:10:08 2024] <TASK>
>>>>>>>> [Mon Mar 11 14:10:08 2024] __schedule+0x21b/0x550
>>>>>>>> [Mon Mar 11 14:10:08 2024] schedule+0x2d/0x70
>>>>>>>> [Mon Mar 11 14:10:08 2024] schedule_timeout+0x11f/0x160
>>>>>>>> [Mon Mar 11 14:10:08 2024] ? select_idle_sibling+0x28/0x430
>>>>>>>> [Mon Mar 11 14:10:08 2024] ? tcp_recvmsg+0x196/0x210
>>>>>>>> [Mon Mar 11 14:10:08 2024] ? wake_affine+0x62/0x1f0
>>>>>>>> [Mon Mar 11 14:10:08 2024] __wait_for_common+0x90/0x1d0
>>>>>>>> [Mon Mar 11 14:10:08 2024] ? __pfx_schedule_timeout+0x10/0x10
>>>>>>>> [Mon Mar 11 14:10:08 2024] __flush_workqueue+0x13a/0x3f0
>>>>>>>> [Mon Mar 11 14:10:08 2024] nfsd4_destroy_session+0x1a4/0x240
>>>>>>>> [nfsd]
>>>>>>>> [Mon Mar 11 14:10:08 2024] nfsd4_proc_compound+0x44b/0x700
>>>>>>>> [nfsd]
>>>>>>>> [Mon Mar 11 14:10:08 2024] nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>> [Mon Mar 11 14:10:08 2024] svc_process_common+0x2ec/0x660
>>>>>>>> [sunrpc]
>>>>>>>> [Mon Mar 11 14:10:08 2024] ? __pfx_nfsd_dispatch+0x10/0x10
>>>>>>>> [nfsd]
>>>>>>>> [Mon Mar 11 14:10:08 2024] ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>> [Mon Mar 11 14:10:08 2024] svc_process+0x12d/0x170 [sunrpc]
>>>>>>>> [Mon Mar 11 14:10:08 2024] nfsd+0x84/0xb0 [nfsd]
>>>>>>>> [Mon Mar 11 14:10:08 2024] kthread+0xdd/0x100
>>>>>>>> [Mon Mar 11 14:10:08 2024] ? __pfx_kthread+0x10/0x10
>>>>>>>> [Mon Mar 11 14:10:08 2024] ret_from_fork+0x29/0x50
>>>>>>>> [Mon Mar 11 14:10:08 2024] </TASK>
>>>>>>>>
>>>>>>> The above threads are trying to flush the workqueue, so that
>>>>>>> probably
>>>>>>> means that they are stuck waiting on a workqueue job to finish.
>>>>>>>> The above is repeated a few times, and then this warning is
>>>>>>>> also logged:
>>>>>>>> [Mon Mar 11 14:12:04 2024] ------------[ cut here
>>>>>>>> ]------------
>>>>>>>> [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at
>>>>>>>> fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>>>> [Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4
>>>>>>>> dns_resolver nfs fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm
>>>>>>>> iw_cm ib_cm ib_core binfmt_misc bonding tls rfkill nft_counter
>>>>>>>> nft_ct
>>>>>>>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet
>>>>>>>> nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink vfat
>>>>>>>> fat dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio
>>>>>>>> libcrc32c dm_service_time dm_multipath intel_rapl_msr
>>>>>>>> intel_rapl_common intel_uncore_frequency
>>>>>>>> intel_uncore_frequency_common isst_if_common skx_edac nfit
>>>>>>>> libnvdimm ipmi_ssif x86_pkg_temp_thermal
>>>>>>>> intel_powerclamp coretemp kvm_intel kvm irqbypass
>>>>>>>> dcdbas rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper
>>>>>>>> dell_smbios drm_kms_helper dell_wmi_descriptor wmi_bmof
>>>>>>>> intel_uncore syscopyarea pcspkr sysfillrect mei_me sysimgblt
>>>>>>>> acpi_ipmi
>>>>>>>> mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich
>>>>>>>> ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter
>>>>>>>> nfsd auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4
>>>>>>>> mbcache jbd2 sd_mod sg lpfc
>>>>>>>> [Mon Mar 11 14:12:05 2024] nvmet_fc nvmet nvme_fc
>>>>>>>> nvme_fabrics
>>>>>>>> crct10dif_pclmul ahci libahci crc32_pclmul nvme_core crc32c_intel
>>>>>>>> ixgbe megaraid_sas libata nvme_common ghash_clmulni_intel
>>>>>>>> t10_pi wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror
>>>>>>>> dm_region_hash dm_log dm_mod
>>>>>>>> [Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not
>>>>>>>> tainted 5.14.0-419.el9.x86_64 #1
>>>>>>>> [Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc. PowerEdge
>>>>>>>> R740/00WGD1, BIOS 2.20.1 09/13/2023
>>>>>>>> [Mon Mar 11 14:12:05 2024] RIP:
>>>>>>>> 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>>>> [Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff ff 48
>>>>>>>> 89 df be 01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8 c8 f9
>>>>>>>> 00 00 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
>>>>>>>> 02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
>>>>>>>> [Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80 EFLAGS:
>>>>>>>> 00010246
>>>>>>>> [Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX:
>>>>>>>> ffff8ada51930900 RCX: 0000000000000024
>>>>>>>> [Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI:
>>>>>>>> ffff8ad582933c00 RDI: 0000000000002000
>>>>>>>> [Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08:
>>>>>>>> ffff9929e0bb7b48 R09: 0000000000000000
>>>>>>>> [Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11:
>>>>>>>> 0000000000000000 R12: ffff8ad6f497c360
>>>>>>>> [Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14:
>>>>>>>> ffff8ae5942e0b10 R15: ffff8ad6f497c360
>>>>>>>> [Mon Mar 11 14:12:05 2024] FS: 0000000000000000(0000)
>>>>>>>> GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
>>>>>>>> [Mon Mar 11 14:12:05 2024] CS: 0010 DS: 0000 ES: 0000 CR0:
>>>>>>>> 0000000080050033
>>>>>>>> [Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3:
>>>>>>>> 00000018e58de006 CR4: 00000000007706e0
>>>>>>>> [Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1:
>>>>>>>> 0000000000000000 DR2: 0000000000000000
>>>>>>>> [Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6:
>>>>>>>> 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>> [Mon Mar 11 14:12:05 2024] PKRU: 55555554
>>>>>>>> [Mon Mar 11 14:12:05 2024] Call Trace:
>>>>>>>> [Mon Mar 11 14:12:05 2024] <TASK>
>>>>>>>> [Mon Mar 11 14:12:05 2024] ? show_trace_log_lvl+0x1c4/0x2df
>>>>>>>> [Mon Mar 11 14:12:05 2024] ? show_trace_log_lvl+0x1c4/0x2df
>>>>>>>> [Mon Mar 11 14:12:05 2024] ? __break_lease+0x16f/0x5f0
>>>>>>>> [Mon Mar 11 14:12:05 2024] ? nfsd_break_deleg_cb+0x170/0x190
>>>>>>>> [nfsd]
>>>>>>>> [Mon Mar 11 14:12:05 2024] ? __warn+0x81/0x110
>>>>>>>> [Mon Mar 11 14:12:05 2024] ? nfsd_break_deleg_cb+0x170/0x190
>>>>>>>> [nfsd]
>>>>>>>> [Mon Mar 11 14:12:05 2024] ? report_bug+0x10a/0x140
>>>>>>>> [Mon Mar 11 14:12:05 2024] ? handle_bug+0x3c/0x70
>>>>>>>> [Mon Mar 11 14:12:05 2024] ? exc_invalid_op+0x14/0x70
>>>>>>>> [Mon Mar 11 14:12:05 2024] ? asm_exc_invalid_op+0x16/0x20
>>>>>>>> [Mon Mar 11 14:12:05 2024] ? nfsd_break_deleg_cb+0x170/0x190
>>>>>>>> [nfsd]
>>>>>>>> [Mon Mar 11 14:12:05 2024] __break_lease+0x16f/0x5f0
>>>>>>>> [Mon Mar 11 14:12:05 2024] ?
>>>>>>>> nfsd_file_lookup_locked+0x117/0x160 [nfsd]
>>>>>>>> [Mon Mar 11 14:12:05 2024] ? list_lru_del+0x101/0x150
>>>>>>>> [Mon Mar 11 14:12:05 2024] nfsd_file_do_acquire+0x790/0x830
>>>>>>>> [nfsd]
>>>>>>>> [Mon Mar 11 14:12:05 2024] nfs4_get_vfs_file+0x315/0x3a0
>>>>>>>> [nfsd]
>>>>>>>> [Mon Mar 11 14:12:05 2024] nfsd4_process_open2+0x430/0xa30
>>>>>>>> [nfsd]
>>>>>>>> [Mon Mar 11 14:12:05 2024] ? fh_verify+0x297/0x2f0 [nfsd]
>>>>>>>> [Mon Mar 11 14:12:05 2024] nfsd4_open+0x3ce/0x4b0 [nfsd]
>>>>>>>> [Mon Mar 11 14:12:05 2024] nfsd4_proc_compound+0x44b/0x700
>>>>>>>> [nfsd]
>>>>>>>> [Mon Mar 11 14:12:05 2024] nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>> [Mon Mar 11 14:12:05 2024] svc_process_common+0x2ec/0x660
>>>>>>>> [sunrpc]
>>>>>>>> [Mon Mar 11 14:12:05 2024] ? __pfx_nfsd_dispatch+0x10/0x10
>>>>>>>> [nfsd]
>>>>>>>> [Mon Mar 11 14:12:05 2024] ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>> [Mon Mar 11 14:12:05 2024] svc_process+0x12d/0x170 [sunrpc]
>>>>>>>> [Mon Mar 11 14:12:05 2024] nfsd+0x84/0xb0 [nfsd]
>>>>>>>> [Mon Mar 11 14:12:05 2024] kthread+0xdd/0x100
>>>>>>>> [Mon Mar 11 14:12:05 2024] ? __pfx_kthread+0x10/0x10
>>>>>>>> [Mon Mar 11 14:12:05 2024] ret_from_fork+0x29/0x50
>>>>>>>> [Mon Mar 11 14:12:05 2024] </TASK>
>>>>>>>> [Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651
>>>>>>>> ]---
>>>>>>> This is probably this WARN in nfsd_break_one_deleg:
>>>>>>>
>>>>>>> WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall));
>>>>>>>
>>>>>>> It means that a delegation break callback to the client couldn't be
>>>>>>> queued to the workqueue, and so it didn't run.
>>>>>>>
>>>>>>>> Could this be the same issue as described here:
>>>>>>>> https://lore.kernel.org/linux-nfs/af0ec881-5ebf-4feb-98ae-3ed2a77f86f1@oracle.com/ ?
>>>>>>> Yes, most likely the same problem.
>>>>>> If I read that thread correctly, this issue was introduced between
>>>>>> 6.1.63 and 6.6.3? Is it possible the EL9 5.14.0-362.18.1.el9_3
>>>>>> backported these changes, or were we hitting some other bug with
>>>>>> that
>>>>>> version? It seems the 6.1.x kernel is not affected? If so, that
>>>>>> would be
>>>>>> the recommended kernel to run?
>>>>> Anything is possible. We have to identify the problem first.
>>>>>>>> As described in that thread, I've tried to obtain the requested
>>>>>>>> information.
>>>>>>>>
>>>>>>>> Is it possible this is the issue that was fixed by the patches
>>>>>>>> described here?
>>>>>>>> https://lore.kernel.org/linux-nfs/2024022054-cause-suffering-eae8@gregkh/
>>>>>>>>
>>>>>>> Doubtful. Those are targeted toward a different set of issues.
>>>>>>>
>>>>>>> If you're willing, I do have some patches queued up for CentOS here
>>>>>>> that
>>>>>>> fix some backchannel problems that could be related. I'm mainly
>>>>>>> waiting
>>>>>>> on Chuck to send these to Linus and then we'll likely merge them
>>>>>>> into
>>>>>>> CentOS soon afterward:
>>>>>>>
>>>>>>> https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3689
>>>>>>>
>>>>>>>
>>>>>> If you can send me a patch file, I can rebuild the C9S kernel
>>>>>> with that
>>>>>> patch and run it. It can take a while for the bug to trigger, as it
>>>>>> seems to be very workload dependent (we were running very
>>>>>> stable for months and now hit this bug every other week).
>>>>>>
>>>>>>
>>>>> It's probably simpler to just pull down the build artifacts for
>>>>> that MR.
>>>>> You have to drill down through the CI for it, but they are here:
>>>>>
>>>>> https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/index.html?prefix=trusted-artifacts/1194300175/publish_x86_64/6278921877/artifacts/
>>>>>
>>>>>
>>>>> There's even a repo file you can install on the box to pull them
>>>>> down.
>>>> We installed this kernel on the server 3 days ago. Today, a user
>>>> informed us that their screen was black after logging in. Similar to
>>>> other occurrences of this issue, the mount command on the client was
>>>> hung. But in contrast to the other times, there were no messages in
>>>> the kernel logs on the server. Even restarting the client does
>>>> not resolve the issue.
>>
>> Ok, so you rebooted the client and it's still unable to mount? That
>> sounds like a server problem if so.
>>
>> Are both client and server running the same kernel?
> No, the server runs 5.14.0-427.3689_1194299994.el9 and the client
> 5.14.0-362.18.1.el9_3.
>>
>>>> Something still seems to be wrong on the server though. When I look at
>>>> the directories under /proc/fs/nfsd/clients, there's still a directory
>>>> for the specific client, even though it's no longer running:
>>>>
>>>> # cat 155/info
>>>> clientid: 0xc8edb7f65f4a9ad
>>>> address: "10.87.31.152:819"
>>>> status: confirmed
>>>> seconds from last renew: 33163
>>>> name: "Linux NFSv4.2 bersalis.esat.kuleuven.be"
>>>> minor version: 2
>>>> Implementation domain: "kernel.org"
>>>> Implementation name: "Linux 5.14.0-362.18.1.el9_3.0.1.x86_64 #1 SMP
>>>> PREEMPT_DYNAMIC Sun Feb 11 13:49:23 UTC 2024 x86_64"
>>>> Implementation time: [0, 0]
>>>> callback state: DOWN
>>>> callback address: 10.87.31.152:0
>>>>
>> If you just shut down the client, the server won't immediately purge its
>> record. In fact, assuming you're running the same kernel on the server,
>> it won't purge the client record until there is a conflicting request
>> for its state.
> Is there a way to force such a conflicting request (to get the client
> record to purge)?
Try:
# echo "expire" > /proc/fs/nfsd/clients/155/ctl
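The one-off command above can be generalized. As a sketch (parameterized on a base directory so the logic can be dry-run safely; the real path is /proc/fs/nfsd/clients, and the one-hour threshold is an arbitrary example), this expires every client that has not renewed its lease recently:

```shell
#!/bin/sh
# Expire NFSv4 clients that have not renewed their lease recently.
# Pass a scratch directory as $1 to dry-run the logic; with no
# arguments it targets the real nfsd interface. Threshold in seconds.
expire_stale_clients() {
    base="${1:-/proc/fs/nfsd/clients}"
    threshold="${2:-3600}"
    for info in "$base"/*/info; do
        [ -f "$info" ] || continue
        # Parse the "seconds from last renew" field out of the info file.
        age=$(awk -F': ' '/seconds from last renew/ {print $2}' "$info")
        if [ -n "$age" ] && [ "$age" -gt "$threshold" ]; then
            echo "expire" > "$(dirname "$info")/ctl"
        fi
    done
}
```

In your case, with "seconds from last renew: 33163", client 155 would be well past any reasonable threshold.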
-Dai
>>
>>
>>> The nfsdclnts command for this client shows the following delegations:
>>>
>>> # nfsdclnts -f 155/states -t all
>>> Inode number | Type  | Access | Deny | ip address       | Filename
>>> 169346743    | open  | r-     | --   | 10.87.31.152:819 | disconnected dentry
>>> 169346743    | deleg | r      |      | 10.87.31.152:819 | disconnected dentry
>>> 169346746    | open  | r-     | --   | 10.87.31.152:819 | disconnected dentry
>>> 169346746    | deleg | r      |      | 10.87.31.152:819 | disconnected dentry
>>>
>>> I see a lot of recent patches regarding directory delegations. Could
>>> this be related to this?
>>>
>>> Will a 5.14.0-362.18.1.el9_3.0.1 kernel try to use a directory
>>> delegation?
>>>
>>>
>> No. Directory delegations are a new feature that's still under
>> development. They use some of the same machinery as file delegations,
>> but they wouldn't be a factor here.
>>
>>>> The system seems to have identified that the client is no longer
>>>> reachable, but the client entry does not go away. When a mount was
>>>> hanging on the client, there would be two directories in clients for
>>>> the same client. Killing the mount command clears up the second entry.
>>>>
>>>> Even after running conntrack -D on the server to remove the tcp
>>>> connection from the conntrack table, the entry doesn't go away and the
>>>> client still cannot mount anything from the server.
>>>>
>>>> A tcpdump on the client while a mount was running logged the following
>>>> messages over and over again:
>>>>
>>>> request:
>>>>
>>>> Frame 1: 378 bytes on wire (3024 bits), 378 bytes captured (3024 bits)
>>>> Ethernet II, Src: HP_19:7d:4b (e0:73:e7:19:7d:4b), Dst:
>>>> ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00)
>>>> Internet Protocol Version 4, Src: 10.87.31.152, Dst: 10.86.18.14
>>>> Transmission Control Protocol, Src Port: 932, Dst Port: 2049, Seq: 1,
>>>> Ack: 1, Len: 312
>>>> Remote Procedure Call, Type:Call XID:0x1d3220c4
>>>> Network File System
>>>> [Program Version: 4]
>>>> [V4 Procedure: COMPOUND (1)]
>>>> GSS Data, Ops(1): CREATE_SESSION
>>>> Length: 152
>>>> GSS Sequence Number: 76
>>>> Tag: <EMPTY>
>>>> minorversion: 2
>>>> Operations (count: 1): CREATE_SESSION
>>>> [Main Opcode: CREATE_SESSION (43)]
>>>> GSS Checksum:
>>>> 00000028040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb2…
>>>>
>>>> GSS Token Length: 40
>>>> GSS-API Generic Security Service Application Program
>>>> Interface
>>>> krb5_blob:
>>>> 040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb23fa080b0…
>>>>
>>>>
>>>> response:
>>>>
>>>> Frame 2: 206 bytes on wire (1648 bits), 206 bytes captured (1648 bits)
>>>> Ethernet II, Src: ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00), Dst:
>>>> HP_19:7d:4b (e0:73:e7:19:7d:4b)
>>>> Internet Protocol Version 4, Src: 10.86.18.14, Dst: 10.87.31.152
>>>> Transmission Control Protocol, Src Port: 2049, Dst Port: 932, Seq: 1,
>>>> Ack: 313, Len: 140
>>>> Remote Procedure Call, Type:Reply XID:0x1d3220c4
>>>> Network File System
>>>> [Program Version: 4]
>>>> [V4 Procedure: COMPOUND (1)]
>>>> GSS Data, Ops(1): CREATE_SESSION(NFS4ERR_DELAY)
>>>> Length: 24
>>>> GSS Sequence Number: 76
>>>> Status: NFS4ERR_DELAY (10008)
>>>> Tag: <EMPTY>
>>>> Operations (count: 1)
>>>> [Main Opcode: CREATE_SESSION (43)]
>>>> GSS Checksum:
>>>> 00000028040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f6274222…
>>>>
>>>> GSS Token Length: 40
>>>> GSS-API Generic Security Service Application Program
>>>> Interface
>>>> krb5_blob:
>>>> 040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f627422226d74923…
>>>>
>>>>
>>>> I was hoping that giving the client a different IP address would
>>>> resolve the issue for this client, but it didn't. Even though the
>>>> client had a new IP address (hostname was kept the same), it failed to
>>>> mount anything from the server.
>>>>
>> Changing the IP address won't help. The client is probably using the
>> same long-form client id as before, so the server still identifies the
>> client even with the address change.
> How is the client id determined? Will changing the hostname of the
> client trigger a change of the client id?
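To the best of my knowledge (worth double-checking against your kernel version), the Linux client derives its long-form client id from its hostname plus an optional admin-set uniquifier, which is why changing only the IP address leaves it unchanged. The uniquifier can be inspected on the client:

```shell
# On the client: inspect the uniquifier mixed into the NFSv4 client id
# (empty by default, so the id is effectively hostname-derived).
cat /sys/fs/nfs/net/nfs_client/identifier      # newer kernels
cat /sys/module/nfs/parameters/nfs4_unique_id  # module parameter form
```

Treat this as a pointer rather than a definitive answer; whether a hostname change alone produces a new id depends on the client kernel.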
>>
>> Unfortunately, the cause of an NFS4ERR_DELAY error is tough to guess.
>> The client is expected to back off and retry, so if the server keeps
>> returning that repeatedly, then a hung mount command is expected.
>>
>> The question is why the server would keep returning DELAY. A lot of
>> different problems ranging from memory allocation issues to protocol
>> problems can result in that error. You may want to check the NFS server
>> and see if anything was logged there.
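The expected client reaction to NFS4ERR_DELAY can be sketched as a backoff-and-retry loop (the constants and function names below are illustrative, not the actual Linux client implementation):

```python
import time

NFS4ERR_DELAY = 10008  # status code from the NFSv4.1 protocol


def retry_on_delay(op, max_attempts=8, base_delay=0.1, cap=15.0):
    """Call op() until it stops returning NFS4ERR_DELAY, backing off
    exponentially between attempts, and return the final status.
    If the server returns DELAY on every attempt, the caller just
    keeps waiting, which is what a hung mount looks like from
    userspace."""
    delay = base_delay
    for _ in range(max_attempts):
        status = op()
        if status != NFS4ERR_DELAY:
            return status
        time.sleep(delay)
        delay = min(delay * 2, cap)
    return NFS4ERR_DELAY
```

A real client retries indefinitely rather than giving up after a fixed number of attempts, which is why the mount command hangs instead of failing when the server returns DELAY on every CREATE_SESSION.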
> There are no messages in the system logs that indicate any sort of
> memory issue. We also increased the vm.min_free_kbytes sysctl to 2G on
> the server before we restarted it with the newer kernel.
>>
>> This is on a CREATE_SESSION call, so I wonder if the record held by the
>> (courteous) server is somehow blocking the attempt to reestablish the
>> session?
>>
>> Do you have a way to reproduce this? Since this is a centos kernel, you
>> could follow the page here to open a bug:
>
> Unfortunately we haven't found a reliable way to reproduce it. But we
> do seem to trigger it more and more lately.
>
> Regards,
>
> Rik
>
>>
>> https://wiki.centos.org/ReportBugs.html
>>
>>
>>>> I created another dump of the workqueues and worker pools on the
>>>> server:
>>>>
>>>> [Mon Mar 18 14:59:33 2024] Showing busy workqueues and worker pools:
>>>> [Mon Mar 18 14:59:33 2024] workqueue events: flags=0x0
>>>> [Mon Mar 18 14:59:33 2024] pwq 54: cpus=27 node=1 flags=0x0 nice=0
>>>> active=1/256 refcnt=2
>>>> [Mon Mar 18 14:59:33 2024] pending: drm_fb_helper_damage_work
>>>> [drm_kms_helper]
>>>> [Mon Mar 18 14:59:33 2024] workqueue events_power_efficient:
>>>> flags=0x80
>>>> [Mon Mar 18 14:59:33 2024] pwq 54: cpus=27 node=1 flags=0x0 nice=0
>>>> active=1/256 refcnt=2
>>>> [Mon Mar 18 14:59:33 2024] pending: fb_flashcursor
>>>> [Mon Mar 18 14:59:33 2024] workqueue mm_percpu_wq: flags=0x8
>>>> [Mon Mar 18 14:59:33 2024] pwq 54: cpus=27 node=1 flags=0x0 nice=0
>>>> active=1/256 refcnt=3
>>>> [Mon Mar 18 14:59:33 2024] pending: lru_add_drain_per_cpu BAR(362)
>>>> [Mon Mar 18 14:59:33 2024] workqueue kblockd: flags=0x18
>>>> [Mon Mar 18 14:59:33 2024] pwq 55: cpus=27 node=1 flags=0x0 nice=-20
>>>> active=1/256 refcnt=2
>>>> [Mon Mar 18 14:59:33 2024] pending: blk_mq_timeout_work
>>>>
>>>>
>>>> In contrast to last time, it doesn't show anything regarding nfs this
>>>> time.
>>>>
>>>> I also tried the suggestion from Dai Ngo (echo 3 >
>>>> /proc/sys/vm/drop_caches), but that didn't seem to make any
>>>> difference.
>>>>
>>>> We haven't restarted the server yet, as the impact seems to
>>>> affect fewer clients than before. Is there anything we can run on the
>>>> server to further debug this?
>>>>
>>>> In the past, the issue seemed to deteriorate rapidly and resulted in
>>>> issues for almost all clients after about 20 minutes. This time the
>>>> impact seems to be less, but it's not gone.
>>>>
>>>> How can we force the NFS server to forget about a specific client? I
>>>> haven't tried to restart the nfs service yet as I'm afraid it will
>>>> fail to stop as before.
>>>>
>> Not with that kernel. There are some new administrative interfaces that
>> might allow that in the future, but they were just merged upstream and
>> aren't in that kernel.
>>
>> --
>> Jeff Layton <jlayton@kernel.org>
>
Thread overview: 26+ messages
2024-03-12 12:24 nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning Rik Theys
2024-03-12 12:47 ` Jeff Layton
2024-03-18 20:21 ` Rik Theys
2024-03-18 21:15 ` Rik Theys
2024-03-18 21:54 ` Jeff Layton
2024-03-19 7:58 ` Rik Theys
2024-03-19 10:39 ` Jeff Layton
2024-03-19 10:58 ` Rik Theys
2024-03-19 11:36 ` Jeff Layton
2024-03-20 13:03 ` Charles Hedrick
2024-03-19 17:09 ` Dai Ngo [this message]
2024-03-19 19:41 ` Rik Theys
2024-03-19 20:55 ` Jeff Layton
2024-03-19 21:42 ` Dai Ngo
2024-03-20 19:41 ` Rik Theys
2024-03-21 20:48 ` Jeff Layton
2024-03-21 21:13 ` Rik Theys
-- strict thread matches above, loose matches on Subject: below --
2024-03-12 13:30 Rik Theys
2024-03-11 18:43 Rik Theys
2024-03-12 11:22 ` Jeff Layton
2024-03-12 11:37 ` Jeff Layton
2024-03-12 16:43 ` Dai Ngo
2024-03-12 18:23 ` Rik Theys
2024-03-13 18:44 ` Dai Ngo
2024-03-13 19:50 ` Rik Theys
2024-03-13 22:45 ` Dai Ngo