linux-raid.vger.kernel.org archive mirror
* harmful parallel AoE check/resync
@ 2009-04-06 14:31 Ferenc Wagner
  2009-04-06 16:39 ` Chris Webb
  0 siblings, 1 reply; 2+ messages in thread
From: Ferenc Wagner @ 2009-04-06 14:31 UTC (permalink / raw)
  To: linux-raid

Hi,

md_do_sync() in md.c takes care not to resync/check MD devices built
on different parts of the same physical device in parallel.  This does
not account for AoE devices, which, however, most of the time share
and are limited by network bandwidth.  Thus a monthly data check
congests all AoE RAIDs at the same time.  Don't you think this case
should be handled specially, and such devices should be serialized by
default (possibly overridden by parallel_resync, of course)?  This
could be implemented in match_mddev_units() by returning true for any
two AoE devices, for example.  Alternatively, match_mddev_units() could
be left alone and parallel_resync extended with a new option that forces
serialization, which might be more elegant.  What do you think?
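
To make the second option concrete, here is a minimal user-space sketch
of the decision it implies.  It is not kernel code: struct md_model,
enum resync_policy, units_match() and must_wait_for() are hypothetical
names, and a single phys_unit integer stands in for whatever
match_mddev_units() actually compares.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tri-state that would replace the boolean parallel_resync. */
enum resync_policy {
	RESYNC_AUTO,		/* today's default: wait only if units match */
	RESYNC_PARALLEL,	/* today's parallel_resync set: never wait */
	RESYNC_SERIALIZE,	/* proposed: wait for any running resync */
};

/* Simplified stand-in for the few mddev fields that matter here. */
struct md_model {
	enum resync_policy policy;
	bool resync_running;	/* models mddev->curr_resync != 0 */
	int phys_unit;		/* assumption: one id per physical device */
};

/* Models match_mddev_units(): do the two arrays share a physical unit? */
static bool units_match(const struct md_model *a, const struct md_model *b)
{
	return a->phys_unit == b->phys_unit;
}

/* Should `self` wait for `other` before starting its resync/check? */
static bool must_wait_for(const struct md_model *self,
			  const struct md_model *other)
{
	assert(self != NULL && other != NULL);

	if (self == other || !other->resync_running)
		return false;

	switch (self->policy) {
	case RESYNC_PARALLEL:
		return false;
	case RESYNC_SERIALIZE:
		return true;
	case RESYNC_AUTO:
		return units_match(self, other);
	}
	return false;
}
```

With RESYNC_SERIALIZE set on the AoE-backed arrays, each would wait for
any other array that is already resyncing, while arrays left at
RESYNC_AUTO would keep the present behaviour.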

(Please Cc me, I'm not subscribed.)
-- 
Thanks,
Feri.


* Re: harmful parallel AoE check/resync
  2009-04-06 14:31 harmful parallel AoE check/resync Ferenc Wagner
@ 2009-04-06 16:39 ` Chris Webb
  0 siblings, 0 replies; 2+ messages in thread
From: Chris Webb @ 2009-04-06 16:39 UTC (permalink / raw)
  To: Ferenc Wagner; +Cc: linux-raid

Ferenc Wagner <wferi@niif.hu> writes:

> md_do_sync() in md.c takes care not to resync/check MD devices built
> on different parts of the same physical device in parallel.  This does
> not account for AoE devices, which, however, most of the time share
> and are limited by network bandwidth.

...and may also share underlying backing devices at the far end.

I run a cluster with a lot of cross-access of storage via AoE, combined
using md. In fact, every RAID device on every host shares physical devices
behind AoE, so I crack this particular nut locally with the following
sledge-hammer:

diff --git a/drivers/md/md.c b/drivers/md/md.c
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5744,8 +5744,7 @@
 			if (mddev2 == mddev)
 				continue;
 			if (!mddev->parallel_resync
-			&&  mddev2->curr_resync
-			&&  match_mddev_units(mddev, mddev2)) {
+			&&  mddev2->curr_resync) {
 				DEFINE_WAIT(wq);
 				if (mddev < mddev2 && mddev->curr_resync == 2) {
 					/* arbitrarily yield */

This clearly isn't the right solution more generally, though, and it'd be
great to have a more elegant way of defining which backing devices conflict
with one another, and which are independent.
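
One possible shape for such a definition, purely as a user-space sketch:
the administrator tags each array with a bitmask of the shared resources
it depends on (network links, remote AoE shelves), and two arrays
conflict when their masks intersect.  The RES_* bits, struct
md_resources and arrays_conflict() are all hypothetical; nothing like
this exists in md.c as far as this thread shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical resource bits an administrator might assign. */
#define RES_NIC_ETH0	(UINT32_C(1) << 0)
#define RES_NIC_ETH1	(UINT32_C(1) << 1)
#define RES_SHELF_A	(UINT32_C(1) << 2)
#define RES_SHELF_B	(UINT32_C(1) << 3)

/* Per-array set of shared resources; 0 means "shares nothing". */
struct md_resources {
	uint32_t shared_mask;
};

/* Two arrays conflict if they depend on at least one common resource. */
static bool arrays_conflict(const struct md_resources *a,
			    const struct md_resources *b)
{
	assert(a != NULL && b != NULL);
	return (a->shared_mask & b->shared_mask) != 0;
}
```

Two arrays that reach different shelves over the same link would then
be serialized, while arrays on separate links and separate shelves
could still resync in parallel.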

Cheers,

Chris.

