From: Takahiro Yasui <tyasui@redhat.com>
To: dm-devel@redhat.com
Subject: Re: [PATCH 0/7] patches: fix dm-raid1 race, bug 502927
Date: Mon, 30 Nov 2009 11:46:04 -0500
Message-ID: <4B13F6CC.5050305@redhat.com>
In-Reply-To: <Pine.LNX.4.64.0911180703410.21358@hs20-bc2-1.build.redhat.com>
On 11/18/09 07:09, Mikulas Patocka wrote:
> Hi
>
> Here is the serie of 7 patches to hold write bios on dm-raid1 until
> dmeventd does its job. It fixes bug
> https://bugzilla.redhat.com/show_bug.cgi?id=502927 . The first 6 patches
> are preparatory, they just move the code around, the last patch does the
> fix.
>
> I tested the thing, I managed to reproduce the bug (by manually stopping
> dmeventd with STOP signal, failing primary mirror leg and writing to the
> device) and I also verified that the patches fix the bug.
>
> For non-dmeventd operation, the current behavior is wrong and I just keep
> it as wrong as it was. There is no easy fix. It is just assume that if the
> user doesn't use dmeventd, he can't activate failed disks again.
>
> Mikulas
I reviewed and tested your patch set, and it looks good on the kernel side.

Reviewed-by: Takahiro Yasui <tyasui@redhat.com>
Tested-by: Takahiro Yasui <tyasui@redhat.com>

However, I found two issues related to patch #8 that require improvements
to dmeventd and the lvm commands. This patch set is based on the idea that
dmeventd and the lvm commands (lvconvert and vgreduce) repair device
failures and release the blocked write I/Os. In some cases, however, the
blocked write I/Os are never released.
* Case 1: Medium error

When a medium error is detected for a write I/O and reported to dmeventd,
dmeventd kicks off lvconvert, but lvconvert does nothing. Therefore, the
write I/Os stay blocked forever. The lvm commands produce the following
output:

  # dmsetup status vg00-lv00
  0 24576 mirror 2 253:1 253:2 23/24 1 DA 3 disk 253:0 A

  # lvconvert --config devices{ignore_suspended_devices=1} --repair vg00/lv00
  The mirror is consistent, nothing to repair.

  # vgreduce --removemissing vg00
  Volume group "vg00" is already consistent

  # /usr/sbin/lvm version
  LVM version: 2.02.57(1)-cvs (2009-11-24)
  Library version: 1.02.41-cvs (2009-11-24)
  Driver version: 4.15.0

  # uname -mr
  2.6.31.6 i686
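
For readers who want to check the status line above by script, here is a
minimal sketch that splits it into fields. It is not part of lvm or
dmeventd. The field layout (leg count, leg devices, in-sync/total regions,
a literal "1", one health character per leg, then the log arguments) is my
reading of the mirror target status output and should be treated as an
assumption; the names MirrorStatus and parse_mirror_status are made up for
this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MirrorStatus:
    legs: List[str]            # mirror leg devices, e.g. "253:1"
    health: str                # one character per leg, e.g. "DA"
    in_sync: int               # regions in sync
    total: int                 # total regions
    log_type: str              # e.g. "disk" or "core"
    log_dev: Optional[str]     # log device, if the log reports one
    log_health: Optional[str]  # log health character, if reported


def parse_mirror_status(line: str) -> MirrorStatus:
    """Parse one 'dmsetup status' line of a mirror target (assumed layout)."""
    fields = line.split()
    if len(fields) < 4 or fields[2] != "mirror":
        raise ValueError("not a mirror target status line")

    nr_legs = int(fields[3])
    legs = fields[4:4 + nr_legs]
    rest = fields[4 + nr_legs:]
    if len(rest) < 5:
        raise ValueError("truncated mirror status line")

    in_sync, total = (int(x) for x in rest[0].split("/"))
    health = rest[2]  # rest[1] is the literal "1" preceding the health string
    if len(health) != nr_legs:
        raise ValueError("health string does not match the number of legs")

    # The log section is "<argc> <type> [<device> <health>]".
    log_argc = int(rest[3])
    log_args = rest[4:4 + log_argc]
    if not log_args:
        raise ValueError("missing log type")
    has_dev = len(log_args) >= 3
    return MirrorStatus(
        legs=legs,
        health=health,
        in_sync=in_sync,
        total=total,
        log_type=log_args[0],
        log_dev=log_args[1] if has_dev else None,
        log_health=log_args[2] if has_dev else None,
    )


status = parse_mirror_status(
    "0 24576 mirror 2 253:1 253:2 23/24 1 DA 3 disk 253:0 A"
)
# Pair each leg with its health character.
leg_health = dict(zip(status.legs, status.health))
```

Applied to the line above, the first leg (253:1) carries 'D' and the second
leg (253:2) carries 'A', yet lvconvert still reports that there is nothing
to repair.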
* Case 2: Sync error

dmeventd doesn't handle a sync error (shown as 'S' in the status output)
that happens during recovery. Write I/Os issued to an out-of-sync region
are blocked, but dmeventd never handles the sync error, so the blocked
I/Os are never released.

An error on the primary leg during recovery cannot be helped; we have to
accept that the system stops because no valid leg remains. An error on the
secondary leg can be handled like a regular write error. We can fix this
issue by changing the error flag from DM_RAID1_SYNC_ERROR to
DM_RAID1_WRITE_ERROR so that dmeventd handles the error.

Another idea is to change dmeventd so that it handles a sync error ('S')
on the secondary leg. That would also be an easy, small patch.
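
To make the second idea concrete, here is a rough model of the decision in
Python. It is only a sketch of the policy, not dmeventd code (dmeventd is
written in C), and it assumes that 'D' marks a leg with a write error and
'S' a leg with a sync error in the health string. The function name
legs_to_repair, the "primary" index, and the handle_secondary_sync switch
are hypothetical; in particular, the status line alone does not say which
leg is the primary, so the caller has to supply it.

```python
from typing import List


def legs_to_repair(health: str, primary: int = 0,
                   handle_secondary_sync: bool = True) -> List[int]:
    """Return the indices of the legs a repair handler would act on.

    health  -- one status character per leg, e.g. "AS" (assumed format)
    primary -- index of the primary leg (assumption supplied by the caller)
    """
    failed = []
    for index, state in enumerate(health):
        if state == "D":
            # Write error: always treated as a failure.
            failed.append(index)
        elif state == "S" and handle_secondary_sync and index != primary:
            # Proposed change: treat a sync error on a secondary leg
            # like a regular write error.
            failed.append(index)
        # A sync error on the primary leg is deliberately not handled:
        # no valid leg is left to recover from.
    return failed


# With the proposed handling, "AS" marks leg 1 for repair; with
# handle_secondary_sync=False nothing is marked and the blocked
# write I/Os would stay blocked.
with_change = legs_to_repair("AS")
without_change = legs_to_repair("AS", handle_secondary_sync=False)
```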
Thanks,
Taka