From: <maartenvanmalland@gmail.com>
To: <linux-raid@vger.kernel.org>
Subject: Raid 5 journal bug (crash)
Date: Tue, 9 May 2023 13:02:27 +0200 [thread overview]
Message-ID: <04df01d98265$be3a9b80$3aafd280$@gmail.com> (raw)
Hi,
I'm pretty sure I hit a bug in the raid5 code somewhere. (I also have a rather obscure storage config, so please bear with me.)
The (relevant) storage config is as follows:
2 x nvme (mdadm raid1) -> lvm2 -> volume for journal of ext4, volume for bcache caching and also a volume for raid5 journal.
6 x sata (mdadm raid5) -> bcache -> lvm2 -> ext4 volume with external journal
Surprisingly, this boots just fine 😉. However, since I enabled the raid5 journal on the nvme drives, the system hangs randomly with the following kernel output:
kernel: [14785.293972] BUG: kernel NULL pointer dereference, address: 0000000000000050
kernel: [14785.293984] #PF: supervisor read access in kernel mode
kernel: [14785.293992] #PF: error_code(0x0000) - not-present page
kernel: [14785.293997] PGD 0 P4D 0
kernel: [14785.294004] Oops: 0000 [#1] PREEMPT SMP PTI
kernel: [14785.294010] CPU: 4 PID: 543 Comm: md3_raid5 Tainted: P O 6.1.0-7-amd64 #1 Debian 6.1.20-2
kernel: [14785.294018] Hardware name: System manufacturer System Product Name/WS X299 PRO_SE, BIOS 3701 05/24/2022
kernel: [14785.294022] RIP: 0010:blk_cgroup_bio_start+0x46/0xb0
kernel: [14785.294033] Code: 00 00 0f 45 c2 89 c5 e8 b8 5c be ff 48 c7 c7 ad 6d 1a 8a e8 dc 14 53 00 48 8b 43 48 0f b7 4b 14 65 8b 35 f5 a5 d2 76 48 63 d6 <48> 8b 40 50 48 03 04 d5 a0 7a 1c 8a 48 63 d5 f6 c5 01 75 0e 80 cd
kernel: [14785.294039] RSP: 0000:ffffaf9500db7cb8 EFLAGS: 00010282
kernel: [14785.294045] RAX: 0000000000000000 RBX: ffffa03544df28b8 RCX: 0000000000000000
kernel: [14785.294050] RDX: 0000000000000004 RSI: 0000000000000004 RDI: ffffffff8a15820e
kernel: [14785.294054] RBP: 0000000000000001 R08: 0000000000040001 R09: ffffaf9500db7d38
kernel: [14785.294058] R10: 0000000000000007 R11: ffffffffc08ff6a0 R12: 0000000000000000
kernel: [14785.294062] R13: 8000000000000000 R14: ffffa03582724b1c R15: 0000000000000008
kernel: [14785.294066] FS: 0000000000000000(0000) GS:ffffa053dfb00000(0000) knlGS:0000000000000000
kernel: [14785.294072] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: [14785.294076] CR2: 0000000000000050 CR3: 0000000dae210005 CR4: 00000000003726e0
kernel: [14785.294081] Call Trace:
kernel: [14785.294086] <TASK>
kernel: [14785.294092] submit_bio_noacct_nocheck+0x38/0x370
kernel: [14785.294099] ? bio_associate_blkg+0x28/0x60
kernel: [14785.294106] ? bio_init+0x6d/0xc0
kernel: [14785.294117] handle_active_stripes.constprop.0+0x349/0x560 [raid456]
kernel: [14785.294152] raid5d+0x49c/0x760 [raid456]
kernel: [14785.294173] ? __schedule+0x359/0xa20
kernel: [14785.294183] ? _raw_spin_lock_irqsave+0x23/0x50
kernel: [14785.294191] ? preempt_count_add+0x6a/0xa0
kernel: [14785.294197] ? _raw_spin_lock_irqsave+0x23/0x50
kernel: [14785.294206] ? unregister_md_personality+0x70/0x70 [md_mod]
kernel: [14785.294230] md_thread+0xa7/0x180 [md_mod]
kernel: [14785.294253] ? dequeue_task_stop+0x70/0x70
kernel: [14785.294262] kthread+0xe6/0x110
kernel: [14785.294270] ? kthread_complete_and_exit+0x20/0x20
kernel: [14785.294278] ret_from_fork+0x1f/0x30
kernel: [14785.294292] </TASK>
kernel: [14785.294295] Modules linked in: brd veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype br_netfilter snd_seq_dummy snd_hrtimer vhost_net vhost vhost_iotlb tap tun cpufreq_conservative cpufreq_userspace cpufreq_powersave cpufreq_ondemand nvidia_uvm(PO) xfrm_user xfrm_algo rdma_ucm ib_uverbs rdma_cm iw_cm scsi_transport_iscsi ib_cm ib_core nvme_fabrics overlay nvidia_modeset(PO) qrtr bridge stp llc bonding tls nft_log xt_NFLOG nfnetlink_log xt_geoip(O) nft_chain_nat xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp nft_compat nf_tables nfnetlink binfmt_misc nls_ascii nls_cp437 vfat fat xfs intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common nfit libnvdimm squashfs x86_pkg_temp_thermal nvidia(PO) intel_powerclamp coretemp kvm_intel zfs(PO) kvm ftdi_sio usbserial snd_hda_codec_realtek ghash_clmulni_intel snd_hda_codec_generic snd_hda_codec_hdmi zunicode(PO) snd_hda_intel zzstd(O) snd_intel_dspcfg aesni_intel
kernel: [14785.294429] eeepc_wmi snd_intel_sdw_acpi zlua(O) crypto_simd asus_wmi snd_hda_codec zavl(PO) cryptd ipmi_ssif platform_profile snd_hda_core icp(PO) rapl battery snd_hwdep sparse_keymap intel_cstate zcommon(PO) snd_pcm_oss ledtrig_audio znvpair(PO) intel_uncore snd_mixer_oss rfkill wmi_bmof intel_wmi_thunderbolt mei_me joydev spl(O) sha512_ssse3 snd_pcm mei ioatdma sha512_generic acpi_ipmi ipmi_si acpi_tad evdev tcp_bbr sch_fq vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio irqbypass vmwgfx snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd soundcore sg ipmi_watchdog ipmi_poweroff ipmi_devintf ipmi_msghandler msr nfsd drbd auth_rpcgss lru_cache nfs_acl parport_pc lockd ppdev grace lp parport fuse sunrpc loop efi_pstore configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic zstd_compress efivarfs raid10 multipath linear z3fold lz4 lz4_compress dm_snapshot dm_bufio i915 drm_buddy video drm_display_helper cec rc_core sr_mod
kernel: [14785.294590] cdrom hid_generic usbhid uas hid usb_storage raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 bcache dm_mod sd_mod raid1 md_mod nvme nvme_core t10_pi ast crc64_rocksoft drm_vram_helper ahci crc64 atlantic drm_ttm_helper igb libahci crc_t10dif xhci_pci ttm macsec dca drm_kms_helper libata crct10dif_generic xhci_hcd crc32_pclmul ptp i2c_i801 usbcore crct10dif_pclmul drm mxm_wmi scsi_mod crc32c_intel pps_core i2c_smbus i2c_algo_bit crct10dif_common usb_common scsi_common wmi button [last unloaded: brd]
kernel: [14785.294688] CR2: 0000000000000050
kernel: [14785.294693] ---[ end trace 0000000000000000 ]---
kernel: [14787.059673] RIP: 0010:blk_cgroup_bio_start+0x46/0xb0
kernel: [14787.059682] Code: 00 00 0f 45 c2 89 c5 e8 b8 5c be ff 48 c7 c7 ad 6d 1a 8a e8 dc 14 53 00 48 8b 43 48 0f b7 4b 14 65 8b 35 f5 a5 d2 76 48 63 d6 <48> 8b 40 50 48 03 04 d5 a0 7a 1c 8a 48 63 d5 f6 c5 01 75 0e 80 cd
kernel: [14787.059684] RSP: 0000:ffffaf9500db7cb8 EFLAGS: 00010282
kernel: [14787.059687] RAX: 0000000000000000 RBX: ffffa03544df28b8 RCX: 0000000000000000
kernel: [14787.059688] RDX: 0000000000000004 RSI: 0000000000000004 RDI: ffffffff8a15820e
kernel: [14787.059689] RBP: 0000000000000001 R08: 0000000000040001 R09: ffffaf9500db7d38
kernel: [14787.059691] R10: 0000000000000007 R11: ffffffffc08ff6a0 R12: 0000000000000000
kernel: [14787.059692] R13: 8000000000000000 R14: ffffa03582724b1c R15: 0000000000000008
kernel: [14787.059693] FS: 0000000000000000(0000) GS:ffffa053dfb00000(0000) knlGS:0000000000000000
kernel: [14787.059695] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: [14787.059697] CR2: 0000000000000050 CR3: 000000048199e006 CR4: 00000000003726e0
kernel: [14787.059698] note: md3_raid5[543] exited with irqs disabled
kernel: [14787.059788] note: md3_raid5[543] exited with preempt_count 1
kernel: [14787.059794] ------------[ cut here ]------------
kernel: [14787.059796] WARNING: CPU: 4 PID: 543 at kernel/exit.c:814 do_exit+0x8ff/0xb10
kernel: [14787.059803] Modules linked in: brd veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype br_netfilter snd_seq_dummy snd_hrtimer vhost_net vhost vhost_iotlb tap tun cpufreq_conservative cpufreq_userspace cpufreq_powersave cpufreq_ondemand nvidia_uvm(PO) xfrm_user xfrm_algo rdma_ucm ib_uverbs rdma_cm iw_cm scsi_transport_iscsi ib_cm ib_core nvme_fabrics overlay nvidia_modeset(PO) qrtr bridge stp llc bonding tls nft_log xt_NFLOG nfnetlink_log xt_geoip(O) nft_chain_nat xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp nft_compat nf_tables nfnetlink binfmt_misc nls_ascii nls_cp437 vfat fat xfs intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common nfit libnvdimm squashfs x86_pkg_temp_thermal nvidia(PO) intel_powerclamp coretemp kvm_intel zfs(PO) kvm ftdi_sio usbserial snd_hda_codec_realtek ghash_clmulni_intel snd_hda_codec_generic snd_hda_codec_hdmi zunicode(PO) snd_hda_intel zzstd(O) snd_intel_dspcfg aesni_intel
kernel: [14787.059873] eeepc_wmi snd_intel_sdw_acpi zlua(O) crypto_simd asus_wmi snd_hda_codec zavl(PO) cryptd ipmi_ssif platform_profile snd_hda_core icp(PO) rapl battery snd_hwdep sparse_keymap intel_cstate zcommon(PO) snd_pcm_oss ledtrig_audio znvpair(PO) intel_uncore snd_mixer_oss rfkill wmi_bmof intel_wmi_thunderbolt mei_me joydev spl(O) sha512_ssse3 snd_pcm mei ioatdma sha512_generic acpi_ipmi ipmi_si acpi_tad evdev tcp_bbr sch_fq vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio irqbypass vmwgfx snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd soundcore sg ipmi_watchdog ipmi_poweroff ipmi_devintf ipmi_msghandler msr nfsd drbd auth_rpcgss lru_cache nfs_acl parport_pc lockd ppdev grace lp parport fuse sunrpc loop efi_pstore configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic zstd_compress efivarfs raid10 multipath linear z3fold lz4 lz4_compress dm_snapshot dm_bufio i915 drm_buddy video drm_display_helper cec rc_core sr_mod
kernel: [14787.059934] cdrom hid_generic usbhid uas hid usb_storage raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 bcache dm_mod sd_mod raid1 md_mod nvme nvme_core t10_pi ast crc64_rocksoft drm_vram_helper ahci crc64 atlantic drm_ttm_helper igb libahci crc_t10dif xhci_pci ttm macsec dca drm_kms_helper libata crct10dif_generic xhci_hcd crc32_pclmul ptp i2c_i801 usbcore crct10dif_pclmul drm mxm_wmi scsi_mod crc32c_intel pps_core i2c_smbus i2c_algo_bit crct10dif_common usb_common scsi_common wmi button [last unloaded: brd]
kernel: [14787.059971] CPU: 4 PID: 543 Comm: md3_raid5 Tainted: P D O 6.1.0-7-amd64 #1 Debian 6.1.20-2
kernel: [14787.059973] Hardware name: System manufacturer System Product Name/WS X299 PRO_SE, BIOS 3701 05/24/2022
kernel: [14787.059975] RIP: 0010:do_exit+0x8ff/0xb10
kernel: [14787.059977] Code: 06 ff ff ff 48 89 df e8 3f 47 0f 00 e9 4e f9 ff ff 0f 0b e9 51 f7 ff ff 4c 89 e6 bf 05 06 00 00 e8 06 eb 00 00 e9 2a f8 ff ff <0f> 0b e9 74 f7 ff ff 48 8b bb e0 0b 00 00 e8 2e db ff ff 48 85 c0
kernel: [14787.059978] RSP: 0000:ffffaf9500db7ed8 EFLAGS: 00010286
kernel: [14787.059980] RAX: 0000000000000000 RBX: ffffa034e42c8000 RCX: 0000000000000000
kernel: [14787.059981] RDX: 0000000000000001 RSI: 0000000000002710 RDI: 00000000ffffffff
kernel: [14787.059983] RBP: ffffa034e42e0000 R08: 0000000000000000 R09: ffffaf9500db7dd0
kernel: [14787.059984] R10: 0000000000000003 R11: ffffa0545ff6eb28 R12: 0000000000000009
kernel: [14787.059985] R13: ffffa035827c5ac0 R14: 0000000000000000 R15: 0000000000000000
kernel: [14787.059986] FS: 0000000000000000(0000) GS:ffffa053dfb00000(0000) knlGS:0000000000000000
kernel: [14787.059988] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: [14787.059989] CR2: 0000000000000050 CR3: 000000048199e006 CR4: 00000000003726e0
kernel: [14787.059991] Call Trace:
kernel: [14787.059993] <TASK>
kernel: [14787.059996] make_task_dead+0x8d/0x90
kernel: [14787.059999] rewind_stack_and_make_dead+0x17/0x20
kernel: [14787.060003] RIP: 0000:0x0
kernel: [14787.060007] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
kernel: [14787.060008] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
kernel: [14787.060010] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
kernel: [14787.059981] RDX: 0000000000000001 RSI: 0000000000002710 RDI: 00000000ffffffff
kernel: [14787.059983] RBP: ffffa034e42e0000 R08: 0000000000000000 R09: ffffaf9500db7dd0
kernel: [14787.059984] R10: 0000000000000003 R11: ffffa0545ff6eb28 R12: 0000000000000009
kernel: [14787.059985] R13: ffffa035827c5ac0 R14: 0000000000000000 R15: 0000000000000000
kernel: [14787.059986] FS: 0000000000000000(0000) GS:ffffa053dfb00000(0000) knlGS:0000000000000000
kernel: [14787.059988] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: [14787.059989] CR2: 0000000000000050 CR3: 000000048199e006 CR4: 00000000003726e0
kernel: [14787.059991] Call Trace:
kernel: [14787.059993] <TASK>
kernel: [14787.059996] make_task_dead+0x8d/0x90
kernel: [14787.059999] rewind_stack_and_make_dead+0x17/0x20
kernel: [14787.060003] RIP: 0000:0x0
kernel: [14787.060007] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
kernel: [14787.060008] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
kernel: [14787.060010] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
kernel: [14787.060011] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
kernel: [14787.060012] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
kernel: [14787.060013] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
kernel: [14787.060014] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
kernel: [14787.060017] </TASK>
kernel: [14787.060018] ---[ end trace 0000000000000000 ]---
For now I've reverted to an internal bitmap for the raid5 and all is stable again. If you need more information, please let me know!
Kind regards,
Maarten
next reply other threads:[~2023-05-09 11:03 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-09 11:02 maartenvanmalland [this message]
2023-05-09 11:39 ` Raid 5 journal bug (crash) Yu Kuai
2023-05-11 14:55 ` maartenvanmalland
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='04df01d98265$be3a9b80$3aafd280$@gmail.com' \
--to=maartenvanmalland@gmail.com \
--cc=linux-raid@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.