From: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
To: Pavel <pavel2000@ngs.ru>
Cc: linux-raid@vger.kernel.org
Subject: Re: Misbehavior of md-raid RAID on failed NVMe.
Date: Wed, 8 Jun 2022 10:32:09 +0200 [thread overview]
Message-ID: <20220608103209.00001d6a@linux.intel.com> (raw)
In-Reply-To: <984f2ca5-2565-025d-62a2-2425b518a01f@ngs.ru>
On Wed, 8 Jun 2022 10:48:09 +0700
Pavel <pavel2000@ngs.ru> wrote:
> Hi, linux-raid community.
>
> Today we found strange and even scary behavior of an md-raid RAID built
> on NVMe devices.
>
> We ordered a new server and started the data transfer (using dd; the
> filesystems were unmounted on the source, etc. - no errors here).
>
> While the data was being transferred, the kernel started reporting IO
> errors on one of the NVMe devices (dmesg output below).
> But md-raid did not react to them in any way. The RAID array never went
> into any failed state, and a "clean" state was reported the whole time.
>
> Based on earlier experience, we trusted md-raid and assumed things were
> going OK. After the data transfer finished, the server was turned off
> and the cables were replaced on suspicion.
>
> After the OS started on this new server, we found MySQL crashing.
> A thorough checksum check showed us mismatches in file contents.
> (Of course, we checksummed untouched files, not the MySQL database
> files.)
>
> So, data loss is possible when an NVMe device misbehaves.
> We think md-raid has to remove the failed device from the RAID in such
> a case. That this did not happen is wrong behaviour, so we want to
> inform the community about this finding.
>
> Hope this will help make the kernel even better.
> Thanks for your work.
>
Hi Pavel,
IMO it is not a RAID problem. In this case some of the requests hang
inside the nvme driver, and raid1d hangs too. It is an nvme problem
rather than a raid one. RAID should handle this well as long as IO
errors are actually reported back to it.
Thanks,
Mariusz
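When md does kick a failing member, the failure is visible in /proc/mdstat as an (F) flag on the device and an underscore in the [UU] status map; the array Pavel describes never showed either. A minimal check might look like the following sketch (it greps a captured sample rather than the live file, since the real check needs a running array; the device names are hypothetical):

```shell
# Sample /proc/mdstat content for a degraded RAID1 (hypothetical array;
# a real check would read /proc/mdstat directly).
mdstat='md0 : active raid1 nvme1n1p2[1](F) nvme0n1p2[0]
      976630464 blocks super 1.2 [2/1] [U_]'

# A member that md has failed carries an (F) suffix, and the status map
# shows an underscore in place of its U.
if printf '%s\n' "$mdstat" | grep -q '(F)'; then
    echo "degraded: a member has been failed by md"
fi
```

In the reported case neither marker ever appeared, which is consistent with the requests hanging in the nvme driver instead of completing with errors that md could act on.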
> ---
> [Tue Jun 7 09:58:45 2022] Call Trace:
> [Tue Jun 7 09:58:45 2022] <IRQ>
> [Tue Jun 7 09:58:45 2022] nvme_pci_complete_rq+0x5b/0x67 [nvme]
> [Tue Jun 7 09:58:45 2022] nvme_poll_cq+0x1e4/0x265 [nvme]
> [Tue Jun 7 09:58:45 2022] nvme_irq+0x36/0x6e [nvme]
> [Tue Jun 7 09:58:45 2022] __handle_irq_event_percpu+0x73/0x13e
> [Tue Jun 7 09:58:45 2022] handle_irq_event_percpu+0x31/0x77
> [Tue Jun 7 09:58:45 2022] handle_irq_event+0x2e/0x51
> [Tue Jun 7 09:58:45 2022] handle_edge_irq+0xc9/0xee
> [Tue Jun 7 09:58:45 2022] __common_interrupt+0x41/0x9e
> [Tue Jun 7 09:58:45 2022] common_interrupt+0x6e/0x8b
> [Tue Jun 7 09:58:45 2022] </IRQ>
> [Tue Jun 7 09:58:45 2022] <TASK>
> [Tue Jun 7 09:58:45 2022] asm_common_interrupt+0x1e/0x40
> [Tue Jun 7 09:58:45 2022] RIP: 0010:__blk_mq_try_issue_directly+0x12/0x136
> [Tue Jun 7 09:58:45 2022] Code: fe ff ff 48 8b 97 78 01 00 00 48 8b 92
> 80 00 00 00 48 89 34 c2 b0 01 c3 0f 1f 44 00 00 41 57 41 56 41 55 41 54
> 55 48 89 f5 53 <89> d3 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 44
> 24 10 31 c0
> [Tue Jun 7 09:58:45 2022] RSP: 0018:ffff88810bbf7ad8 EFLAGS: 00000246
> [Tue Jun 7 09:58:45 2022] RAX: 0000000000000000 RBX: 0000000000000001
> RCX: 0000000000000001
> [Tue Jun 7 09:58:45 2022] RDX: 0000000000000001 RSI: ffff8881137e6e40
> RDI: ffff88810dfdf400
> [Tue Jun 7 09:58:45 2022] RBP: ffff8881137e6e40 R08: 0000000000000001
> R09: 0000000000000a20
> [Tue Jun 7 09:58:45 2022] R10: 0000000000000000 R11: 0000000000000000
> R12: 0000000000000000
> [Tue Jun 7 09:58:45 2022] R13: ffff88810dfdf400 R14: ffff8881137e6e40
> R15: ffff88810bbf7df0
> [Tue Jun 7 09:58:45 2022] blk_mq_request_issue_directly+0x46/0x78
> [Tue Jun 7 09:58:45 2022] blk_mq_try_issue_list_directly+0x41/0xba
> [Tue Jun 7 09:58:45 2022] blk_mq_sched_insert_requests+0x86/0xd0
> [Tue Jun 7 09:58:45 2022] blk_mq_flush_plug_list+0x1b5/0x214
> [Tue Jun 7 09:58:45 2022] ? __blk_mq_alloc_requests+0x1c7/0x21d
> [Tue Jun 7 09:58:45 2022] blk_mq_submit_bio+0x437/0x518
> [Tue Jun 7 09:58:45 2022] submit_bio_noacct+0x93/0x1e6
> [Tue Jun 7 09:58:45 2022] ? bio_associate_blkg_from_css+0x137/0x15c
> [Tue Jun 7 09:58:45 2022] flush_bio_list+0x96/0xa5
> [Tue Jun 7 09:58:45 2022] flush_pending_writes+0x7a/0xbf
> [Tue Jun 7 09:58:45 2022] ? md_check_recovery+0x8a/0x4bd
> [Tue Jun 7 09:58:45 2022] raid1d+0x194/0x10e8
> [Tue Jun 7 09:58:45 2022] ? common_interrupt+0xf/0x8b
> [Tue Jun 7 09:58:45 2022] md_thread+0x12c/0x155
> [Tue Jun 7 09:58:45 2022] ? init_wait_entry+0x29/0x29
> [Tue Jun 7 09:58:45 2022] ? signal_pending+0x19/0x19
> [Tue Jun 7 09:58:45 2022] kthread+0x104/0x10c
> [Tue Jun 7 09:58:45 2022] ? set_kthread_struct+0x32/0x32
> [Tue Jun 7 09:58:45 2022] ret_from_fork+0x22/0x30
> [Tue Jun 7 09:58:45 2022] </TASK>
> [Tue Jun 7 09:58:45 2022] ---[ end trace 15dc74ae2e04f737 ]---
> [Tue Jun 7 09:58:45 2022] ------------[ cut here ]------------
> [Tue Jun 7 09:58:45 2022] refcount_t: underflow; use-after-free.
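The post-copy verification described in the quoted report can be sketched as follows. This is only an illustration of the checksum-manifest approach, not the exact commands used; the paths and file are placeholders created on the fly:

```shell
# Stand-ins for the source and destination filesystems (placeholders,
# not the paths from the report).
src=$(mktemp -d); dst=$(mktemp -d)
echo "payload" > "$src/data.bin"
cp "$src/data.bin" "$dst/data.bin"

# Record a checksum manifest of every file on the source...
( cd "$src" && find . -type f -print0 | xargs -0 sha256sum ) > /tmp/manifest.sha256

# ...then verify the copied data against it. Any line other than
# "<file>: OK" indicates silent corruption like the mismatches above.
( cd "$dst" && sha256sum -c /tmp/manifest.sha256 )
```

Running the same `sha256sum -c` step on data copied through the hung array is what exposed the corruption here, since md never reported the array as anything but clean.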
Thread overview: 6+ messages
2022-06-08 3:48 Misbehavior of md-raid RAID on failed NVMe Pavel
2022-06-08 8:32 ` Mariusz Tkaczyk [this message]
[not found] ` <8b0c4bf1-a165-95ca-9746-8ef7be46092e@areainter.net>
2022-06-08 9:11 ` Mariusz Tkaczyk
2022-06-08 16:52 ` Wol
2022-06-08 18:16 ` Pavel
[not found] ` <CAAMCDef5jamJa+um=DSM08CPdzoTvEQuFOdrGo7jiNivrNVbpg@mail.gmail.com>
2022-06-09 6:43 ` Pavel