dmeventd doesn't handle failures during mirror resync.

All of lore.kernel.org
 help / color / mirror / Atom feed

* dmeventd doesn't handle failures during mirror resync.
@ 2010-04-29  7:01 Neil Brown
  2010-05-05  8:08 ` Petr Rockai
  0 siblings, 1 reply; 6+ messages in thread
From: Neil Brown @ 2010-04-29  7:01 UTC (permalink / raw)
  To: lvm-devel


Hi,
 I've been exploring the behaviour of dm-raid1 (particular in a clustered
environment) in response to various error conditions.

I was surprised to discover that while a normal write error is
handled properly - dmeventd runs 'lvconvert' to fix the array up,
this does not happen in response to a write error while syncing
the array.

If I arrange for the new device to die, then 
          lvconvert --repair --use-policies

will fix it up as I would expect, but dmeventd never asks it to do
this.

This seems to be a deliberate decision:  in _process_status_code 
in dmeventd_mirror.c, a status of 'F' will cause lvconvert to be
run while 'S' and 'R' (sync and read errors) will not.

Is there a reason for this?

I realise that the data is not at risk in the case of a sync error as the
dirty log records that the section of the array is out of sync so no
IO will be directed there.  However it seems to go against expectation.
Also, if you then stop the array (vgchange -an) without lvconvert being
run, and restart it (vgchange -ay) the restart will fail unless you use
--partial.
This would mean that a shutdown/reboot would probably leave the volume
inactive (as it seems init scripts don't tend to inculde --partial) which
is not what I would expect.

Can we change dmeventd to response to sync (and read) errors in the same
way that it responds to write errors?

Thanks,
NeilBrown


diff --git a/daemons/dmeventd/plugins/mirror/dmeventd_mirror.c b/daemons/dmeventd/plugins/mirror/dmeventd_mirror.c
index 4d2eee5..7c6ceb9 100644
--- a/daemons/dmeventd/plugins/mirror/dmeventd_mirror.c
+++ b/daemons/dmeventd/plugins/mirror/dmeventd_mirror.c
@@ -41,21 +41,20 @@ static int _process_status_code(const char status_code, const char *dev_name,
 	 *    R => Read - A read failure occurred, mirror data unaffected
 	 *    U => Unclassified failure (bug)
 	 */ 
-	if (status_code == 'F') {
+	if (status_code == 'F')
 		syslog(LOG_ERR, "%s device %s flush failed.",
 		       dev_type, dev_name);
-		r = ME_FAILURE;
-	} else if (status_code == 'S')
+	else if (status_code == 'S')
 		syslog(LOG_ERR, "%s device %s sync failed.",
 		       dev_type, dev_name);
 	else if (status_code == 'R')
 		syslog(LOG_ERR, "%s device %s read failed.",
 		       dev_type, dev_name);
-	else if (status_code != 'A') {
+	else if (status_code != 'A')
 		syslog(LOG_ERR, "%s device %s has failed (%c).",
 		       dev_type, dev_name, status_code);
+	if (status_code != 'A')
 		r = ME_FAILURE;
-	}
 
 	return r;
 }



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* dmeventd doesn't handle failures during mirror resync.
  2010-04-29  7:01 dmeventd doesn't handle failures during mirror resync Neil Brown
@ 2010-05-05  8:08 ` Petr Rockai
  2010-05-05 13:22   ` Jonathan Brassow
  2010-05-12 23:19   ` Neil Brown
  0 siblings, 2 replies; 6+ messages in thread
From: Petr Rockai @ 2010-05-05  8:08 UTC (permalink / raw)
  To: lvm-devel

Neil Brown <neilb@suse.de> writes:

> I was surprised to discover that while a normal write error is
> handled properly - dmeventd runs 'lvconvert' to fix the array up,
> this does not happen in response to a write error while syncing
> the array.
>
> If I arrange for the new device to die, then 
>           lvconvert --repair --use-policies
>
> will fix it up as I would expect, but dmeventd never asks it to do
> this.
>
> This seems to be a deliberate decision:  in _process_status_code 
> in dmeventd_mirror.c, a status of 'F' will cause lvconvert to be
> run while 'S' and 'R' (sync and read errors) will not.
>
> Is there a reason for this?
I think the rationale is that:

For read errors, we should *not* strip the mirror leg, since we want to
keep as much redundancy as possible in this scenario. The failure should
be logged, but I think that's it.

For sync, I am not sure. It may be that the reason for this is that sync
is usually related to manual action and dmeventd intervention may be
unexpected and unwanted in this case. But that case could be argued.

> Can we change dmeventd to response to sync (and read) errors in the same
> way that it responds to write errors?
I think it's a bad idea for read errors, unless maybe we could have a
new feature for that -- one that'd upconvert the mirror first (if
there's a hotspare) and only if that finishes OK, kill the bad leg. Just
log the error if there are no hotspares.

For sync errors, I am ambivalent. Any further opinions?

Yours,
   Petr.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* dmeventd doesn't handle failures during mirror resync.
  2010-05-05  8:08 ` Petr Rockai
@ 2010-05-05 13:22   ` Jonathan Brassow
  2010-05-05 15:26     ` Takahiro Yasui
  2010-05-12 23:19   ` Neil Brown
  1 sibling, 1 reply; 6+ messages in thread
From: Jonathan Brassow @ 2010-05-05 13:22 UTC (permalink / raw)
  To: lvm-devel


On May 5, 2010, at 3:08 AM, Petr Rockai wrote:

> Neil Brown <neilb@suse.de> writes:
>
>> I was surprised to discover that while a normal write error is
>> handled properly - dmeventd runs 'lvconvert' to fix the array up,
>> this does not happen in response to a write error while syncing
>> the array.
>>
>> If I arrange for the new device to die, then
>>          lvconvert --repair --use-policies
>>
>> will fix it up as I would expect, but dmeventd never asks it to do
>> this.
>>
>> This seems to be a deliberate decision:  in _process_status_code
>> in dmeventd_mirror.c, a status of 'F' will cause lvconvert to be
>> run while 'S' and 'R' (sync and read errors) will not.
>>
>> Is there a reason for this?
> I think the rationale is that:
>
> For read errors, we should *not* strip the mirror leg, since we want  
> to
> keep as much redundancy as possible in this scenario. The failure  
> should
> be logged, but I think that's it.
>
> For sync, I am not sure. It may be that the reason for this is that  
> sync
> is usually related to manual action and dmeventd intervention may be
> unexpected and unwanted in this case. But that case could be argued.
>
>> Can we change dmeventd to response to sync (and read) errors in the  
>> same
>> way that it responds to write errors?
> I think it's a bad idea for read errors, unless maybe we could have a
> new feature for that -- one that'd upconvert the mirror first (if
> there's a hotspare) and only if that finishes OK, kill the bad leg.  
> Just
> log the error if there are no hotspares.
>
> For sync errors, I am ambivalent. Any further opinions?

I think for sync errors, we should restart the sync.  This can be done  
by a suspend/resume of the mirror device.  Effectively, we are  
assuming a transient failure.  Perhaps if we have tried to clear the  
fault a couple times, then we could remove the failed device.

Read errors I would definitely leave alone.  Drives can often relocate  
bad sectors, but that is done on writes.  If the relocation fails, we  
will know about it when the write fails.

  brassow



^ permalink raw reply	[flat|nested] 6+ messages in thread

* dmeventd doesn't handle failures during mirror resync.
  2010-05-05 13:22   ` Jonathan Brassow
@ 2010-05-05 15:26     ` Takahiro Yasui
  0 siblings, 0 replies; 6+ messages in thread
From: Takahiro Yasui @ 2010-05-05 15:26 UTC (permalink / raw)
  To: lvm-devel

On 05/05/10 09:22, Jonathan Brassow wrote:
> On May 5, 2010, at 3:08 AM, Petr Rockai wrote:
>> Neil Brown <neilb@suse.de> writes:
>>
>>> I was surprised to discover that while a normal write error is
>>> handled properly - dmeventd runs 'lvconvert' to fix the array up,
>>> this does not happen in response to a write error while syncing
>>> the array.
>>>
>>> If I arrange for the new device to die, then
>>>          lvconvert --repair --use-policies
>>>
>>> will fix it up as I would expect, but dmeventd never asks it to do
>>> this.
>>>
>>> This seems to be a deliberate decision:  in _process_status_code
>>> in dmeventd_mirror.c, a status of 'F' will cause lvconvert to be
>>> run while 'S' and 'R' (sync and read errors) will not.
>>>
>>> Is there a reason for this?
>> I think the rationale is that:
>>
>> For read errors, we should *not* strip the mirror leg, since we want  
>> to
>> keep as much redundancy as possible in this scenario. The failure  
>> should
>> be logged, but I think that's it.
>>
>> For sync, I am not sure. It may be that the reason for this is that  
>> sync
>> is usually related to manual action and dmeventd intervention may be
>> unexpected and unwanted in this case. But that case could be argued.
>>
>>> Can we change dmeventd to response to sync (and read) errors in the  
>>> same
>>> way that it responds to write errors?
>> I think it's a bad idea for read errors, unless maybe we could have a
>> new feature for that -- one that'd upconvert the mirror first (if
>> there's a hotspare) and only if that finishes OK, kill the bad leg.  
>> Just
>> log the error if there are no hotspares.
>>
>> For sync errors, I am ambivalent. Any further opinions?
> 
> I think for sync errors, we should restart the sync.  This can be done  
> by a suspend/resume of the mirror device.  Effectively, we are  
> assuming a transient failure.  Perhaps if we have tried to clear the  
> fault a couple times, then we could remove the failed device.
> 
> Read errors I would definitely leave alone.  Drives can often relocate  
> bad sectors, but that is done on writes.  If the relocation fails, we  
> will know about it when the write fails.

Read errors can't be handled with the current device-mapper in kernel.
Sync error is reported by both read and write errors. Read error
occurs by reading a default mirror device and write error occurs
by writing data in non-default mirror devices. If sync error was
the one reported by read error, we will lose a default mirror, which
contains a valid data. Think a sync error when new disk is added.
If it is a transient error, we could save data by retrying sync,
for example as Jon suggested.

Also, I proposed the following patch a half year ago. The patch
handles only "write sync error." This approach could improve
an error handling, although this patch should be improved to
consider the case that a default mirror changed.

lvm2: sync error handling on secondary mirror in dmeventd
https://www.redhat.com/archives/lvm-devel/2009-November/msg00179.html

Thanks,
Taka



^ permalink raw reply	[flat|nested] 6+ messages in thread

* dmeventd doesn't handle failures during mirror resync.
  2010-05-05  8:08 ` Petr Rockai
  2010-05-05 13:22   ` Jonathan Brassow
@ 2010-05-12 23:19   ` Neil Brown
  2010-05-13  9:27     ` Nikanth Karthikesan
  1 sibling, 1 reply; 6+ messages in thread
From: Neil Brown @ 2010-05-12 23:19 UTC (permalink / raw)
  To: lvm-devel

Thanks for the reply,

On Wed, 05 May 2010 10:08:09 +0200
Petr Rockai <prockai@redhat.com> wrote:

> Neil Brown <neilb@suse.de> writes:
> 
> > I was surprised to discover that while a normal write error is
> > handled properly - dmeventd runs 'lvconvert' to fix the array up,
> > this does not happen in response to a write error while syncing
> > the array.
> >
> > If I arrange for the new device to die, then 
> >           lvconvert --repair --use-policies
> >
> > will fix it up as I would expect, but dmeventd never asks it to do
> > this.
> >
> > This seems to be a deliberate decision:  in _process_status_code 
> > in dmeventd_mirror.c, a status of 'F' will cause lvconvert to be
> > run while 'S' and 'R' (sync and read errors) will not.
> >
> > Is there a reason for this?
> I think the rationale is that:
> 
> For read errors, we should *not* strip the mirror leg, since we want to
> keep as much redundancy as possible in this scenario. The failure should
> be logged, but I think that's it.

In general I would agree, though there are cases where you would want to
strip a leg based only on read errors.  If a drive is failing in a way that
all read attempts cause multiple retries and timeouts, then having that
device remain in the array can kill performance.

It seems therefore that this should be a policy based decision, possibly
involving extra testing, and the place for this would seem to be in lvconvert.

Hence the thrust of my argument, which is that dmeventd should always run
lvconvert on any event that is at all suspect, and lvconvert should make the
correct decision, which may be to leave the array alone.

Maybe lvconvert doesn't currently always make the 'correct' decision so
changing dmeventd might be premature, but I would like to know if you agree
that this is the best long-term strategy.

> 
> For sync, I am not sure. It may be that the reason for this is that sync
> is usually related to manual action and dmeventd intervention may be
> unexpected and unwanted in this case. But that case could be argued.
> 

If you are using dmeventd, then a write failure could easily cause lvconvert
to kick off a sync on a new device, so it could easily not be a manual action.

The main problem I see with the current arrangement is that if a device goes
missing during a sync and we don't 'lvconvert repair', then the next restart
will not succeed with "--partial" and that would cause problems rebooting.

Thanks,
NeilBrown

(please Cc: me, I'm not subscribed).

> > Can we change dmeventd to response to sync (and read) errors in the same
> > way that it responds to write errors?
> I think it's a bad idea for read errors, unless maybe we could have a
> new feature for that -- one that'd upconvert the mirror first (if
> there's a hotspare) and only if that finishes OK, kill the bad leg. Just
> log the error if there are no hotspares.
> 
> For sync errors, I am ambivalent. Any further opinions?
> 
> Yours,
>    Petr.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* dmeventd doesn't handle failures during mirror resync.
  2010-05-12 23:19   ` Neil Brown
@ 2010-05-13  9:27     ` Nikanth Karthikesan
  0 siblings, 0 replies; 6+ messages in thread
From: Nikanth Karthikesan @ 2010-05-13  9:27 UTC (permalink / raw)
  To: lvm-devel

On Thursday 13 May 2010 04:49:20 Neil Brown wrote:
> Thanks for the reply,
> 
> On Wed, 05 May 2010 10:08:09 +0200
> 
> Petr Rockai <prockai@redhat.com> wrote:
> > Neil Brown <neilb@suse.de> writes:
> > > I was surprised to discover that while a normal write error is
> > > handled properly - dmeventd runs 'lvconvert' to fix the array up,
> > > this does not happen in response to a write error while syncing
> > > the array.
> > >
> > > If I arrange for the new device to die, then
> > >           lvconvert --repair --use-policies
> > >
> > > will fix it up as I would expect, but dmeventd never asks it to do
> > > this.
> > >
> > > This seems to be a deliberate decision:  in _process_status_code
> > > in dmeventd_mirror.c, a status of 'F' will cause lvconvert to be
> > > run while 'S' and 'R' (sync and read errors) will not.
> > >
> > > Is there a reason for this?
> >
> > I think the rationale is that:
> >
> > For read errors, we should *not* strip the mirror leg, since we want to
> > keep as much redundancy as possible in this scenario. The failure should
> > be logged, but I think that's it.
> 
> In general I would agree, though there are cases where you would want to
> strip a leg based only on read errors.  If a drive is failing in a way that
> all read attempts cause multiple retries and timeouts, then having that
> device remain in the array can kill performance.
> 

On most workloads, on such cases, some write failure would occur sooner, which 
should fix this?

> It seems therefore that this should be a policy based decision, possibly
> involving extra testing, and the place for this would seem to be in
>  lvconvert.
> 
> Hence the thrust of my argument, which is that dmeventd should always run
> lvconvert on any event that is at all suspect, and lvconvert should make
>  the correct decision, which may be to leave the array alone.
> 
> Maybe lvconvert doesn't currently always make the 'correct' decision so
> changing dmeventd might be premature, but I would like to know if you agree
> that this is the best long-term strategy.
> 

Letting lvconvert decide based on policy, looks like a good idea to me.

> > For sync, I am not sure. It may be that the reason for this is that sync
> > is usually related to manual action and dmeventd intervention may be
> > unexpected and unwanted in this case. But that case could be argued.
> 
> If you are using dmeventd, then a write failure could easily cause
>  lvconvert to kick off a sync on a new device, so it could easily not be a
>  manual action.
> 
> 
> The main problem I see with the current arrangement is that if a device
>  goes missing during a sync and we don't 'lvconvert repair', then the next
>  restart will not succeed with "--partial" and that would cause problems
>  rebooting.
> 

I guess, s/with/without.

May be --partial should be used by default, till this is solved? Or repair 
should be done before shutdown.

Thanks
Nikanth



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2010-05-13  9:27 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-29  7:01 dmeventd doesn't handle failures during mirror resync Neil Brown
2010-05-05  8:08 ` Petr Rockai
2010-05-05 13:22   ` Jonathan Brassow
2010-05-05 15:26     ` Takahiro Yasui
2010-05-12 23:19   ` Neil Brown
2010-05-13  9:27     ` Nikanth Karthikesan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.