From: bugzilla-daemon@kernel.org
To: linux-xfs@vger.kernel.org
Subject: [Bug 220669] New: Drive issues cause system and coredump and reboot
Date: Thu, 16 Oct 2025 00:10:43 +0000

https://bugzilla.kernel.org/show_bug.cgi?id=220669

    Bug ID: 220669
   Summary: Drive issues cause system and coredump and reboot
   Product: File System
   Version: 2.5
  Hardware: All
        OS: Linux
    Status: NEW
  Severity: normal
  Priority: P3
 Component: XFS
  Assignee: filesystem_xfs@kernel-bugs.kernel.org
  Reporter: bshephar@bne-home.net
Regression: No

We have a number of drives across our clusters that don't present as failing,
but seem to have read errors that cause the system to coredump and reboot.
Failing drives are obviously not the fault of XFS, but I would expect a read
error on one drive not to hang the whole system and force a reboot.

Kernel Version:
$ uname -r
4.18.0-553.69.1.el8_10.x86_64

Ideally, I think XFS should be able to handle this situation more gracefully.
We can see the drive at /dev/sdv reporting read errors right before the Call
Trace and coredump:

[ 998.356008] sd 0:0:21:0: [sdv] tag#415 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=3s
[ 998.365528] sd 0:0:21:0: [sdv] tag#415 Sense Key : Medium Error [current] [descriptor]
[ 998.373544] sd 0:0:21:0: [sdv] tag#415 Add. Sense: Unrecovered read error
[ 998.380347] sd 0:0:21:0: [sdv] tag#415 CDB: Read(16) 88 00 00 00 00 05 01 1f 6d 90 00 00 00 08 00 00
[ 998.389524] blk_update_request: critical medium error, dev sdv, sector 21493673360 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 0
[ 998.404141] XFS (sdv): metadata I/O error in "xfs_da_read_buf+0xd3/0x120 [xfs]" at daddr 0x5011f6d90 len 8 error 61
[ 998.414706] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 998.422554] PGD 1218fc067 P4D 1218fc067 PUD 1d0139067 PMD 0
[ 998.428247] Oops: 0002 [#1] SMP NOPTI
[ 998.431931] CPU: 72 PID: 4802 Comm: swift-object-au Kdump: loaded Not tainted 4.18.0-553.69.1.el8_10.x86_64 #1
[ 998.441958] Hardware name: Dell Inc. PowerEdge R740xd2/0VNGN1, BIOS 2.14.2 03/23/2022
[ 998.449818] RIP: 0010:_raw_spin_lock_irqsave+0x1e/0x40
[ 998.454975] Code: cf 80 0b 08 eb 88 90 90 90 90 90 90 0f 1f 44 00 00 53 9c 58 0f 1f 44 00 00 48 89 c3 fa 66 0f 1f 44 00 00 31 c0 ba 01 00 00 00 0f b1 17 75 09 48 89 d8 5b c3 cc cc cc cc 89 c6 e8 1c 7b 73 ff
[ 998.473799] RSP: 0018:ffff99455ccafbe0 EFLAGS: 00010046
[ 998.479344] RAX: 0000000000000000 RBX: 0000000000000286 RCX: 0000000000000000
[ 998.486695] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000020
[ 998.494044] RBP: 0000000000000000 R08: 0000000080000000 R09: ffff8e3abffd5000
[ 998.501360] R10: 0000000000000001 R11: ffff99455ccaf848 R12: 0000000000000000
[ 998.508690] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 998.516004] FS: 00007f74aa03e740(0000) GS:ffff8e3941100000(0000) knlGS:0000000000000000
[ 998.524333] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 998.530270] CR2: 0000000000000020 CR3: 00000001cfbb6003 CR4: 00000000007706e0
[ 998.537585] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 998.544901] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 998.552224] PKRU: 55555554
[ 998.555136] Call Trace:
[ 998.557765] ? __die_body+0x1a/0x60
[ 998.561435] ? no_context+0x1ba/0x3f0
[ 998.565276] ? __bad_area_nosemaphore+0x157/0x180
[ 998.570202] ? xfs_trans_read_buf_map+0x128/0x360 [xfs]
[ 998.575693] ? do_page_fault+0x37/0x12d
[ 998.579713] ? page_fault+0x1e/0x30
[ 998.583390] ? _raw_spin_lock_irqsave+0x1e/0x40
[ 998.588113] up+0x12/0x50
[ 998.590913] xfs_buf_unlock+0x15/0x70 [xfs]
[ 998.595350] xfs_trans_brelse+0xc6/0xe0 [xfs]
[ 998.599960] xfs_attr_leaf_get+0xb0/0xc0 [xfs]
[ 998.604639] xfs_attr_get+0x9e/0xc0 [xfs]
[ 998.608888] xfs_xattr_get+0x75/0xb0 [xfs]
[ 998.613238] __vfs_getxattr+0x54/0x70
[ 998.617104] vfs_getxattr+0x118/0x140
[ 998.620959] getxattr+0x187/0x1b0
[ 998.624470] ? cp_new_stat+0x150/0x190
[ 998.628411] ? do_vfs_ioctl+0xa4/0x690
[ 998.632356] ? __do_sys_newfstat+0x5e/0x70
[ 998.636644] ? syscall_trace_enter+0x1ff/0x2d0
[ 998.641283] __x64_sys_fgetxattr+0x5f/0xb0
[ 998.645573] do_syscall_64+0x5b/0x1a0
[ 998.649429] entry_SYSCALL_64_after_hwframe+0x66/0xcb
[ 998.654672] RIP: 0033:0x7f74a88dc5fe
[ 998.658434] Code: 48 8b 0d 8d 78 39 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 c1 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5a 78 39 00 f7 d8 64 89 01 48
[ 998.677570] RSP: 002b:00007ffe2289a9f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000c1
[ 998.685351] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f74a88dc5fe
[ 998.692683] RDX: 0000000000000000 RSI: 00007f749538e7d0 RDI: 0000000000000003
[ 998.700024] RBP: 00007ffe2289aa80 R08: 0000000000000000 R09: 00007f74a9f400d0
[ 998.707357] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 998.714680] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
[ 998.722013] Modules linked in: dm_mod tun xfs ipt_MASQUERADE nf_conntrack_netlink xt_addrtype br_netfilter bridge stp llc tcp_diag inet_diag overlay xt_multiport ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6_tables ip_tables nft_compat ip_set nf_tables libcrc32c nfnetlink dell_rbu vfat fat ipmi_ssif intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel irdma kvm i40e irqbypass crc32_pclmul ib_uverbs rapl dell_smbios iTCO_wdt intel_cstate iTCO_vendor_support acpi_ipmi dell_wmi_descriptor wmi_bmof mei_me dcdbas ib_core intel_uncore pcspkr mei i2c_i801 ipmi_si lpc_ich wmi ipmi_devintf ipmi_msghandler ext4 mbcache jbd2 sd_mod sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt i2c_algo_bit crct10dif_pclmul drm_shmem_helper ahci ice nvme
[ 998.722079] crc32c_intel libahci drm nvme_core ghash_clmulni_intel libata megaraid_sas tg3 t10_pi fuse

crash> sys | grep -e CPUS -e LOAD -e RELEASE -e PANIC
        CPUS: 80
LOAD AVERAGE: 79.36, 197.55, 179.00
     RELEASE: 4.18.0-553.69.1.el8_10.x86_64
       PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000020"

crash> set -p
    PID: 4802
COMMAND: "swift-object-au"
   TASK: ffff8e3ad2990000 [THREAD_INFO: ffff8e3ad2990000]
    CPU: 72
  STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 4802 TASK: ffff8e3ad2990000 CPU: 72 COMMAND: "swift-object-au"
 #0 [ffff99455ccaf910] machine_kexec at ffffffff9b86de53
 #1 [ffff99455ccaf968] __crash_kexec at ffffffff9b9b9e1a
 #2 [ffff99455ccafa28] crash_kexec at ffffffff9b9bad51
 #3 [ffff99455ccafa40] oops_end at ffffffff9b82c131
 #4 [ffff99455ccafa60] no_context at ffffffff9b880da3
 #5 [ffff99455ccafab8] __bad_area_nosemaphore at ffffffff9b881107
 #6 [ffff99455ccafb00] do_page_fault at ffffffff9b881dc7
 #7 [ffff99455ccafb30] page_fault at ffffffff9c4011fe
    [exception RIP: _raw_spin_lock_irqsave+30]
    RIP: ffffffff9c2270be RSP: ffff99455ccafbe0 RFLAGS: 00010046
    RAX: 0000000000000000 RBX: 0000000000000286 RCX: 0000000000000000
    RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000020
    RBP: 0000000000000000 R8: 0000000080000000 R9: ffff8e3abffd5000
    R10: 0000000000000001 R11: ffff99455ccaf848 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
 #8 [ffff99455ccafbe8] up at ffffffff9c2230e2
 #9 [ffff99455ccafc00] xfs_buf_unlock at ffffffffc0d95455 [xfs]
#10 [ffff99455ccafc20] xfs_trans_brelse at ffffffffc0dcf3b6 [xfs]
#11 [ffff99455ccafc40] xfs_attr_leaf_get at ffffffffc0d49e80 [xfs]
#12 [ffff99455ccafc70] xfs_attr_get at ffffffffc0d4afae [xfs]
#13 [ffff99455ccafc90] xfs_xattr_get at ffffffffc0dbaa05 [xfs]
#14 [ffff99455ccafd40] __vfs_getxattr at ffffffff9bb9a064
#15 [ffff99455ccafd70] vfs_getxattr at ffffffff9bb9a5a8
#16 [ffff99455ccafdb8] getxattr at ffffffff9bb9aa07
#17 [ffff99455ccaff00] __x64_sys_fgetxattr at ffffffff9bb9b17f
#18 [ffff99455ccaff38] do_syscall_64 at ffffffff9b803cab
#19 [ffff99455ccaff50] entry_SYSCALL_64_after_hwframe at ffffffff9c40012e
    RIP: 00007f74a88dc5fe RSP: 00007ffe2289a9f8 RFLAGS: 00000246
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f74a88dc5fe
    RDX: 0000000000000000 RSI: 00007f749538e7d0 RDI: 0000000000000003
    RBP: 00007ffe2289aa80 R8: 0000000000000000 R9: 00007f74a9f400d0
    R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
    R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
    ORIG_RAX: 00000000000000c1 CS: 0033 SS: 002b

crash> dis _raw_spin_lock_irqsave+30
0xffffffff9c2270be <_raw_spin_lock_irqsave+30>: lock cmpxchg %edx,(%rdi)

RDI from the above:

    [exception RIP: _raw_spin_lock_irqsave+30]
    RIP: ffffffff9c2270be RSP: ffff99455ccafbe0 RFLAGS: 00010046
    RAX: 0000000000000000 RBX: 0000000000000286 RCX: 0000000000000000
    RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000020
    RBP: 0000000000000000 R8: 0000000080000000 R9: ffff8e3abffd5000
    R10: 0000000000000001 R11: ffff99455ccaf848 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

The address in RDI (0x20) does not look like a valid kernel address, going by
the x86_64 memory map:
https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt

The `vmcore` is quite large, but I can probably extract any information you
need to investigate further.
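
On the RDI question: a fault address as small as 0x20 usually means a NULL structure pointer plus a field offset, rather than a corrupted kernel pointer. In the trace, xfs_buf_unlock() calls up(), and up() takes the semaphore's spinlock, so RDI would be the address of the semaphore embedded in the buffer. The sketch below only does that arithmetic. The 0x20 offset of the semaphore inside struct xfs_buf is an assumption inferred from the register value, not something read from this kernel's debuginfo, and the helper names are hypothetical.

```python
PAGE_SIZE = 4096  # base page size on x86-64

# Values copied from the oops in this report.
RDI = 0x0000000000000020
CR2 = 0x0000000000000020

# ASSUMPTION: offset of the embedded semaphore within struct xfs_buf on this
# kernel build. Inferred from RDI, not confirmed against debuginfo.
ASSUMED_SEMA_OFFSET = 0x20


def looks_like_null_plus_offset(addr: int) -> bool:
    """Return True if addr lies in the first page, i.e. a small offset from NULL."""
    return 0 <= addr < PAGE_SIZE


def implied_base_pointer(member_addr: int, member_offset: int) -> int:
    """Recover the structure pointer from a member address and its offset."""
    return member_addr - member_offset


# The faulting address reported in CR2 is the same value the lock
# instruction dereferenced through RDI.
print("RDI equals CR2:", RDI == CR2)
print("small offset from NULL:", looks_like_null_plus_offset(RDI))
print("implied buffer pointer:", hex(implied_base_pointer(RDI, ASSUMED_SEMA_OFFSET)))
```

If the assumption holds, the buffer pointer handed to xfs_buf_unlock() was NULL, which would fit a failed metadata read leaving no buffer behind. Running `struct -o xfs_buf` in crash against the vmcore should show the real offset of the semaphore member and confirm or refute this.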