From: Takahiro Yasui <tyasui@redhat.com>
To: dm-devel@redhat.com
Subject: Re: [PATCH 7/7] Hold all write bios when errors are handled
Date: Wed, 25 Nov 2009 17:47:36 -0500
Message-ID: <4B0DB408.6020906@redhat.com>
In-Reply-To: <20091125202346.GB1673@us.ibm.com>
On 11/25/09 15:23, malahal@us.ibm.com wrote:
> Mikulas Patocka [mpatocka@redhat.com] wrote:
>>
>>>> Imagine this scenario:
>>>> * secondary leg fails
>>>> * write fails on the secondary leg and succeeds on the primary leg
>>>> and is successfully complete
>>>> * the computer crashes
>>>> * after a reboot, the primary leg is inaccessible and the secondary leg is
>>>> back online --- now raid1 would be returning stale data.
>>>
>>> The software can detect this case. We can fail this completely or use
>>> the data from the secondary that could be "stale" with help from admin.
>>> Let us call this method 1.
>>
>> You can't detect it because the computer crashed *before* you write the
>> information that the secondary leg failed to the metadata.
>>
>> So, after a reboot, you can't tell if any mirror leg failed some requests
>> before the crash.
>
> My definition of 'primary' is the first leg. Now on, I will use "first
> leg" to avoid confusion. On a reboot, LVM can find if its first leg is
> missing. If it is missing, it can ask the admin whether to use the
> 'second' leg or not. When I said, "software" can detect, I really meant
> that LVM can detect that the "first leg" is missing.
I thought again about the scenario Mikulas pointed out. It looks like a double
failure (failures on both legs), so human intervention would be acceptable.
However, how do we know whether the second leg contains valid data?
There are two possible cases.
1) The system crashed during write operations without any disk failure, and
   the first leg fails at the next boot.
   We can use the secondary leg because its data is valid.
2) The system crashed after the secondary leg failed, and at the next boot
   the first leg fails and the secondary leg comes back online.
   We can't use the secondary leg because its data might be stale.
I haven't checked the contents of the log disk, but I guess we can't
distinguish these cases from the log disk. Another possibility I thought of was
error messages. If any error messages for the secondary leg were recorded, we
could conclude that the secondary leg contains stale data, but I suspect this
is not a reliable method because syslog might not have been written to disk
before the system crashed.
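To illustrate why I think the two cases look the same at boot, here is a
minimal Python sketch. It is only a toy model, not dm-raid1 code: ToyMirror,
recorded_failures, case1 and case2 are hypothetical names, and it assumes that
the only state recovery can trust after a crash is a failure record that was
already persisted.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMirror:
    # Hypothetical two-leg mirror model; not the dm-raid1 data structures.
    data: list = field(default_factory=lambda: ["old", "old"])
    alive: list = field(default_factory=lambda: [True, True])  # in memory only
    recorded_failures: set = field(default_factory=set)        # survives a crash

    def fail_leg(self, leg, recorded):
        self.alive[leg] = False
        if recorded:  # assumption: the failure reached persistent metadata
            self.recorded_failures.add(leg)

    def write(self, value):
        # A write only lands on legs that are still alive.
        for leg in (0, 1):
            if self.alive[leg]:
                self.data[leg] = value

    def metadata_after_crash(self):
        # What recovery can read at the next boot: persisted records only.
        return frozenset(self.recorded_failures)


def case1():
    m = ToyMirror()
    m.write("new")            # both legs healthy, then the system crashes
    return m


def case2(recorded):
    m = ToyMirror()
    m.fail_leg(1, recorded)   # secondary leg fails first
    m.write("new")            # write completes on the first leg only
    return m


a, b, c = case1(), case2(recorded=False), case2(recorded=True)
print(a.metadata_after_crash() == b.metadata_after_crash())  # True
print(a.data[1], b.data[1])                                   # new old
print(a.metadata_after_crash() == c.metadata_after_crash())  # False
```

In this model the persisted metadata is identical in case 1 and in case 2 when
the failure was not recorded, although the secondary leg holds current data in
one and stale data in the other. The cases become distinguishable only if the
failure record reaches stable storage before the crash.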
I would like to improve system availability by keeping the system running when
the secondary leg fails, but first we need to confirm how this case would be
handled.
I appreciate your comments.
Thanks,
Taka