Linux NFS development
From: dai.ngo@oracle.com
To: Chuck Lever <chuck.lever@oracle.com>,
	Wolfgang Walter <linux-nfs@stwm.de>
Cc: Linux Nfs <linux-nfs@vger.kernel.org>
Subject: Re: kernel v6.6.3: nfsd hangs in nfsd_break_deleg_cb
Date: Mon, 4 Dec 2023 11:12:52 -0800
Message-ID: <537b96d3-1d8a-4eaa-b271-e103f73e980d@oracle.com>
In-Reply-To: <ZW37M7DOavddVpFd@tissot.1015granger.net>


On 12/4/23 8:15 AM, Chuck Lever wrote:
> On Mon, Dec 04, 2023 at 04:34:00PM +0100, Wolfgang Walter wrote:
>> Hello,
>>
>> after upgrading from stable 6.1.63 to stable 6.6.3 our nfs-server logged a
>> WARNING and then more and more clients hung:
>>
>>
>> Dec 04 14:59:25 engel kernel: ------------[ cut here ]------------
>> Dec 04 14:59:25 engel kernel: WARNING: CPU: 17 PID: 8431 at fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x174/0x190 [nfsd]
>> Dec 04 14:59:25 engel kernel: Modules linked in: cts rpcsec_gss_krb5 msr 8021q garp stp mrp llc binfmt_misc intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kv>
>> Dec 04 14:59:25 engel kernel:  enclosure sd_mod usbhid t10_pi hid crc64_rocksoft crc64 crc_t10dif crct10dif_generic ixgbe ahci xfrm_algo xhci_pci libahci dca mdio_devres mpt3sas ehci_pci crct10dif_p>
>> Dec 04 14:59:25 engel kernel: CPU: 17 PID: 8431 Comm: nfsd Not tainted 6.6.3-debian64.all+1.2 #1
>> Dec 04 14:59:25 engel kernel: Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 1.1a 10/16/2015
>> Dec 04 14:59:25 engel kernel: RIP: 0010:nfsd_break_deleg_cb+0x174/0x190 [nfsd]
>> Dec 04 14:59:25 engel kernel: Code: 02 8c a4 c2 e9 ff fe ff ff 48 89 df be 01 00 00 00 e8 70 7c ed c2 48 8d bb 98 00 00 00 e8 b4 0e 01 00 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be 02 00 00 0>
>> Dec 04 14:59:25 engel kernel: RSP: 0018:ffffbd57227c7a98 EFLAGS: 00010246
>> Dec 04 14:59:25 engel kernel: RAX: 0000000000000000 RBX: ffff94a77356e200 RCX: 0000000000000000
>> Dec 04 14:59:25 engel kernel: RDX: ffff94a77356e2c8 RSI: ffff94b78cf58000 RDI: 0000000000002000
>> Dec 04 14:59:25 engel kernel: RBP: ffff94a0392b3a34 R08: ffffbd57227c7a80 R09: 0000000000000000
>> Dec 04 14:59:25 engel kernel: R10: ffff94a05f4a9440 R11: 0000000000000000 R12: ffff94b8e3995b00
>> Dec 04 14:59:25 engel kernel: R13: ffff94a0392b3a20 R14: ffff94b8e3995b00 R15: 000000010eb733cd
>> Dec 04 14:59:25 engel kernel: FS:  0000000000000000(0000) GS:ffff94b71fcc0000(0000) knlGS:0000000000000000
>> Dec 04 14:59:25 engel kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Dec 04 14:59:25 engel kernel: CR2: 00007f9ef8554000 CR3: 000000295e020003 CR4: 00000000001706e0
>> Dec 04 14:59:25 engel kernel: Call Trace:
>> Dec 04 14:59:25 engel kernel:  <TASK>
>> Dec 04 14:59:25 engel kernel:  ? nfsd_break_deleg_cb+0x174/0x190 [nfsd]
>> Dec 04 14:59:25 engel kernel:  ? __warn+0x81/0x130
>> Dec 04 14:59:25 engel kernel:  ? nfsd_break_deleg_cb+0x174/0x190 [nfsd]
>> Dec 04 14:59:25 engel kernel:  ? report_bug+0x171/0x1a0
>> Dec 04 14:59:25 engel kernel:  ? handle_bug+0x3c/0x80
>> Dec 04 14:59:25 engel kernel:  ? exc_invalid_op+0x17/0x70
>> Dec 04 14:59:25 engel kernel:  ? asm_exc_invalid_op+0x1a/0x20
>> Dec 04 14:59:25 engel kernel:  ? nfsd_break_deleg_cb+0x174/0x190 [nfsd]
>> Dec 04 14:59:25 engel kernel:  ? nfsd_break_deleg_cb+0x9a/0x190 [nfsd]
>> Dec 04 14:59:25 engel kernel:  __break_lease+0x25c/0x720
>> Dec 04 14:59:25 engel kernel:  __nfsd_open.isra.0+0xa9/0x1a0 [nfsd]
>> Dec 04 14:59:25 engel kernel:  nfsd_file_do_acquire+0x4ca/0xc50 [nfsd]
>> Dec 04 14:59:25 engel kernel:  nfs4_get_vfs_file+0x34c/0x3b0 [nfsd]
>> Dec 04 14:59:25 engel kernel:  nfsd4_process_open2+0x42c/0x15d0 [nfsd]
>> Dec 04 14:59:25 engel kernel:  ? nfsd_permission+0x63/0x100 [nfsd]
>> Dec 04 14:59:25 engel kernel:  ? fh_verify+0x42e/0x720 [nfsd]
>> Dec 04 14:59:25 engel kernel:  nfsd4_open+0x64a/0xc40 [nfsd]
>> Dec 04 14:59:25 engel kernel:  ? nfsd4_encode_operation+0xa7/0x2b0 [nfsd]
>> Dec 04 14:59:25 engel kernel:  nfsd4_proc_compound+0x351/0x670 [nfsd]
>> Dec 04 14:59:25 engel kernel:  ? __pfx_nfsd+0x10/0x10 [nfsd]
>> Dec 04 14:59:25 engel kernel:  nfsd_dispatch+0x7c/0x1b0 [nfsd]
>> Dec 04 14:59:25 engel kernel:  svc_process_common+0x431/0x6e0 [sunrpc]
>> Dec 04 14:59:25 engel kernel:  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>> Dec 04 14:59:25 engel kernel:  ? __pfx_nfsd+0x10/0x10 [nfsd]
>> Dec 04 14:59:25 engel kernel:  svc_process+0x131/0x180 [sunrpc]
>> Dec 04 14:59:25 engel kernel:  nfsd+0x84/0xd0 [nfsd]
>> Dec 04 14:59:25 engel kernel:  kthread+0xe5/0x120
>> Dec 04 14:59:25 engel kernel:  ? __pfx_kthread+0x10/0x10
>> Dec 04 14:59:25 engel kernel:  ret_from_fork+0x31/0x50
>> Dec 04 14:59:25 engel kernel:  ? __pfx_kthread+0x10/0x10
>> Dec 04 14:59:25 engel kernel:  ret_from_fork_asm+0x1b/0x30
>> Dec 04 14:59:25 engel kernel:  </TASK>
>> Dec 04 14:59:25 engel kernel: ---[ end trace 0000000000000000 ]---
>>
>>
>> 6.1 did not show such a problem.
>>
>> Both are vanilla stable kernels (self-built).
> Thank you for your report.
>
> If you are able to bisect your server between v6.1 and v6.6, that
> would help us narrow down the cause.
>
> Dai, can you have a look at this?

The warning message indicates that the callback work was not queued because
it was already queued. In this case we will have taken an extra reference
on the stid that will never be put (see b95239ca4954a0). We should fix
this, but I don't think this extra reference is what causes the clients to hang.

It's hard to say what the root cause is without a core dump, a network
trace, or a way to reproduce the problem. As Chuck mentioned, bisecting the
server is the best way to help us narrow down the cause.

Wolfgang, could you provide some additional information, such as how often
this problem happens, whether it happens under load, whether it is
reproducible, the number of clients involved, and the type of NFS activity?

Thanks,
-Dai

Thread overview: 5+ messages
2023-12-04 15:34 kernel v6.6.3: nfsd hangs in nfsd_break_deleg_cb Wolfgang Walter
2023-12-04 16:15 ` Chuck Lever
2023-12-04 19:12   ` dai.ngo [this message]
2023-12-04 21:10     ` Wolfgang Walter
2023-12-05  1:05       ` dai.ngo
