linux-nfs.vger.kernel.org archive mirror
From: "J. Bruce Fields" <bfields@fieldses.org>
To: Salvatore Bonaccorso <carnil@debian.org>
Cc: linux-nfs@vger.kernel.org, Chuck Lever <chuck.lever@oracle.com>
Subject: Re: Kernel panic / list_add corruption when in nfsd4_run_cb_work
Date: Mon, 12 Oct 2020 10:26:02 -0400
Message-ID: <20201012142602.GD26571@fieldses.org>
In-Reply-To: <20201011075913.GA8065@eldamar.lan>

On Sun, Oct 11, 2020 at 09:59:13AM +0200, Salvatore Bonaccorso wrote:
> Hi
> 
> On a system running 4.19.146-1 in Debian buster we hit the issue below
> while the server was under slight load. It does not seem easily
> reproducible, so I am asking what additional information we could
> provide to track or narrow this down. The following was caught on the
> console:

Worth checking git logs of fs/nfsd/nfs4state.c and
fs/nfsd/nfs4callback.c.  It might be
2bbfed98a4d82ac4e7abfcd4eba40bddfc670b1d "nfsd: Fix races between
nfsd4_cb_release() and nfsd4_shutdown_callback()" ?

--b.
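
The "list_add corruption" line in the quoted trace below is printed by the
kernel's list debugging check (lib/list_debug.c), which verifies that the
two neighbours still point at each other before a new entry is linked
between them. In the report, the value found in prev->next is the same
address as prev itself, so the tail entry was pointing at itself. The
following is a minimal userspace sketch of that check, not the kernel
source: the function names (init_list_head, list_add_valid,
list_add_tail_checked, demo_stale_tail) are illustrative, and re-initialising
the tail entry is only an assumed way to produce the same message shape. It
is not a diagnosis of the nfsd bug.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list (or an unlinked entry) points at itself. */
static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Sketch of the sanity check: prev and next must still be neighbours. */
static bool list_add_valid(struct list_head *prev, struct list_head *next)
{
    if (next->prev != prev) {
        fprintf(stderr,
                "list_add corruption. next->prev should be prev (%p), "
                "but was %p. (next=%p).\n",
                (void *)prev, (void *)next->prev, (void *)next);
        return false;
    }
    if (prev->next != next) {
        fprintf(stderr,
                "list_add corruption. prev->next should be next (%p), "
                "but was %p. (prev=%p).\n",
                (void *)next, (void *)prev->next, (void *)prev);
        return false;
    }
    return true;
}

/* Append entry before head, refusing to touch an inconsistent list. */
static bool list_add_tail_checked(struct list_head *entry,
                                  struct list_head *head)
{
    struct list_head *prev = head->prev;

    if (!list_add_valid(prev, head))
        return false;
    head->prev = entry;
    entry->next = head;
    entry->prev = prev;
    prev->next = entry;
    return true;
}

/*
 * Hypothetical scenario: entry "a" is reset while head.prev still
 * references it, so a.next == &a and the next append is rejected.
 */
static bool demo_stale_tail(void)
{
    struct list_head head, a, b;
    bool ok;

    init_list_head(&head);
    ok = list_add_tail_checked(&a, &head);
    assert(ok);
    (void)ok;
    init_list_head(&a);   /* assumed bug: "a" reset while still linked */
    return list_add_tail_checked(&b, &head);
}
```

Calling demo_stale_tail() from a main() returns false and prints a
"prev->next should be next" message in which the "was" and "prev" addresses
are equal, the same pattern as in the report.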

> 
> [1002011.364398] list_add corruption. prev->next should be next (ffff9b9e875e6ea8), but was ffff9b9ec0a14140. (prev=ffff9b9ec0a14140).
> [1002011.376317] ------------[ cut here ]------------
> [1002011.381131] kernel BUG at lib/list_debug.c:28!
> [1002011.385781] invalid opcode: 0000 [#1] SMP PTI
> [1002011.390325] CPU: 13 PID: 25775 Comm: kworker/u40:0 Not tainted 4.19.0-11-amd64 #1 Debian 4.19.146-1
> [1002011.399562] Hardware name: DALCO AG S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
> [1002011.409427] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
> [1002011.415539] RIP: 0010:__list_add_valid.cold.0+0x26/0x28
> [1002011.420949] Code: 00 00 00 c3 48 89 d1 48 c7 c7 d0 a3 69 a3 48 89 c2 e8 50 6b d0 ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 28 a4 69 a3 e8 3c 6b d0 ff <0f> 0b 48 89 fe 48 89 c2 48 c7 c7 b8 a4 69 a3 e8 28 6b d0 ff 0f 0b
> [1002011.439914] RSP: 0018:ffffbf14c6993e28 EFLAGS: 00010246
> [1002011.445329] RAX: 0000000000000075 RBX: ffff9b9ec0a14140 RCX: 0000000000000000
> [1002011.452656] RDX: 0000000000000000 RSI: ffff9baa8f7966b8 RDI: ffff9baa8f7966b8
> [1002011.459979] RBP: ffffbf14c6993e70 R08: 0000000000000559 R09: 0000000000aaaaaa
> [1002011.467301] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9b9e875e6e00
> [1002011.474627] R13: ffff9ba0341872a8 R14: ffff9ba034187278 R15: ffff9b9e875e6ea8
> [1002011.481948] FS:  0000000000000000(0000) GS:ffff9baa8f780000(0000) knlGS:0000000000000000
> [1002011.490228] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [1002011.496160] CR2: 00007fda6b5330a0 CR3: 000000079d60a002 CR4: 00000000007606e0
> [1002011.503487] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [1002011.510806] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [1002011.518132] PKRU: 55555554
> [1002011.521022] Call Trace:
> [1002011.523672]  nfsd4_cb_recall_prepare+0x2b8/0x310 [nfsd]
> [1002011.529089]  nfsd4_run_cb_work+0x7d/0xf0 [nfsd]
> [1002011.533818]  process_one_work+0x1a7/0x3a0
> [1002011.538015]  worker_thread+0x30/0x390
> [1002011.541861]  ? create_worker+0x1a0/0x1a0
> [1002011.545968]  kthread+0x112/0x130
> [1002011.549381]  ? kthread_bind+0x30/0x30
> [1002011.553232]  ret_from_fork+0x35/0x40
> [1002011.556992] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache quota_v2 quota_tree bonding ipmi_ssif intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ast ttm ghash_clmulni_intel drm_kms_helper intel_cstate mei_me intel_uncore drm iTCO_wdt ioatdma intel_rapl_perf pcspkr evdev joydev pcc_cpufreq i2c_algo_bit sg iTCO_vendor_support mei ipmi_si dca ipmi_devintf wmi ipmi_msghandler acpi_pad acpi_power_meter button nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1
> [1002011.627690]  raid0 multipath linear md_mod hid_generic usbhid hid sd_mod crc32c_intel xhci_pci ahci xhci_hcd aesni_intel libahci aes_x86_64 crypto_simd libata arcmsr cryptd usbcore i40e scsi_mod glue_helper lpc_ich i2c_i801 mfd_core usb_common
> [1002011.651077] ---[ end trace 465dc56412b98978 ]---
> [1002011.765289] RIP: 0010:__list_add_valid.cold.0+0x26/0x28
> [1002011.771502] Code: 00 00 00 c3 48 89 d1 48 c7 c7 d0 a3 69 a3 48 89 c2 e8 50 6b d0 ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 28 a4 69 a3 e8 3c 6b d0 ff <0f> 0b 48 89 fe 48 89 c2 48 c7 c7 b8 a4 69 a3 e8 28 6b d0 ff 0f 0b
> [1002011.792067] RSP: 0018:ffffbf14c6993e28 EFLAGS: 00010246
> [1002011.798311] RAX: 0000000000000075 RBX: ffff9b9ec0a14140 RCX: 0000000000000000
> [1002011.806453] RDX: 0000000000000000 RSI: ffff9baa8f7966b8 RDI: ffff9baa8f7966b8
> [1002011.814572] RBP: ffffbf14c6993e70 R08: 0000000000000559 R09: 0000000000aaaaaa
> [1002011.822680] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9b9e875e6e00
> [1002011.830789] R13: ffff9ba0341872a8 R14: ffff9ba034187278 R15: ffff9b9e875e6ea8
> [1002011.838895] FS:  0000000000000000(0000) GS:ffff9baa8f780000(0000) knlGS:0000000000000000
> [1002011.847972] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [1002011.854694] CR2: 00007fda6b5330a0 CR3: 000000079d60a002 CR4: 00000000007606e0
> [1002011.862806] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [1002011.870904] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [1002011.878989] PKRU: 55555554
> [1002011.882636] Kernel panic - not syncing: Fatal exception
> [1002011.888846] Kernel Offset: 0x21800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> 
> Kernel version is 4.19.146-1 as in the current Debian buster.
> 
> The exports are as:
> 
> /srv/exports/data-01
>                 node1(rw,wdelay,no_root_squash,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)
> /srv/exports/data-02
>                 node1(rw,wdelay,no_root_squash,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)
> /srv/exports/data-01
>                 @netgroup(rw,wdelay,root_squash,no_subtree_check,sec=sys,rw,secure,root_squash,no_all_squash)
> /srv/exports/data-02
>                 @netgroup(rw,wdelay,root_squash,no_subtree_check,sec=sys,rw,secure,root_squash,no_all_squash)
> 
> Clients mounting those are mixed systems running Debian buster on 4.19.146-1
> and Debian stretch systems running 4.9.228-1.
> 
> Versions of listed packages (maybe relevant) on the server:
> 
> acl: 2.2.53-4
> libgssapi-krb5-2: 1.17-3
> libevent-2.1-6: 2.1.8-stable-4
> nfs-utils: 1:1.3.4-2.5+deb10u1
> util-linux: 2.33.1-0.1
> 
> (note those are all the respective versions in Debian buster).
> 
> Is there anything one can try to provide to possibly track this down,
> or are we simply out of luck here?
> 
> Regards,
> Salvatore

Thread overview: 16+ messages
2020-10-11  7:59 Kernel panic / list_add corruption when in nfsd4_run_cb_work Salvatore Bonaccorso
2020-10-12 14:26 ` J. Bruce Fields [this message]
2020-10-12 15:41   ` Salvatore Bonaccorso
2020-10-12 16:33     ` J. Bruce Fields
2020-10-18  9:39       ` Salvatore Bonaccorso
2021-10-06 18:46         ` Salvatore Bonaccorso
2021-11-22  9:15           ` Olivier Monaco
2021-11-22 15:17             ` Chuck Lever III
2021-11-24 15:29               ` Bruce Fields
2021-11-24 15:59                 ` Trond Myklebust
2021-11-24 16:10                   ` Trond Myklebust
2021-11-24 16:10                   ` bfields
2021-11-24 17:14                     ` Trond Myklebust
2021-11-24 22:06                       ` bfields
2021-11-24 22:17                         ` Trond Myklebust
2021-12-01 22:33                           ` bfields
