All of lore.kernel.org
 help / color / mirror / Atom feed
From: Neil Brown <neilb@suse.de>
To: heinzm@redhat.com
Cc: device-mapper development <dm-devel@redhat.com>
Subject: Re: [PATCH 00/24] dm-raid456 support using md/raid5.c, now with dirty-log
Date: Fri, 18 Jun 2010 13:52:31 +1000	[thread overview]
Message-ID: <20100618135231.124065ef@notabene.brown> (raw)
In-Reply-To: <1276771620.10854.18.camel@o>

On Thu, 17 Jun 2010 12:47:00 +0200
Heinz Mauelshagen <heinzm@redhat.com> wrote:

> On Thu, 2010-06-17 at 15:41 +1000, Neil Brown wrote:
> > On Wed, 16 Jun 2010 13:26:27 +0200
> > Heinz Mauelshagen <heinzm@redhat.com> wrote:
> > 
> > > On Wed, 2010-06-16 at 09:45 +1000, Neil Brown wrote:
> > > > Hi Heinz,
> > > > 
> > > > On Tue, 15 Jun 2010 15:23:26 +0200
> > > > Heinz Mauelshagen <heinzm@redhat.com> wrote:
> > > > 
> > > > > 
> > > > > Neil,
> > > > > 
> > > > > for missing devices we (re)load the table with error mapped device(s) in
> > > > > their place rather than identifying them with a special char/string.
> > > > > Did you go for the later in order to avoid some MD hassle with error
> > > > > targets being accessed by the MD personalities? If not so, we can drop
> > > > > the special '-' char support to identify missing devices.
> > > > 
> > > > When raid5 determines that a device has failed and so it will not write to it
> > > > again, it must be sure that the failure is record in the metadata before it
> > > > proceeds with that decision not to even try writing to the failed device.
> > > 
> > > Yes, that's the mandatory order to avoid data loss.
> > > 
> > > > Otherwise a crash/restart before the metadata was safely updated would result
> > > > in corruption.
> > > > 
> > > > This means that it must be possible for user-space to explicitly tell the
> > > > raid5 that a device is known to be faulty.
> > > > Doing that through the constructor seems the natural way to do it.
> > > 
> > > If /dev/mapper/error with an error target mapping would be used instead,
> > > would that cause consistency troubles to the MD personality if accessing
> > > those rather than the NULL device solution with the '-' argument you
> > > have now?
> > > If not so, we could drop the '-' support, which I'm aiming at.
> > > 
> > 
> > I would need to differentiate between /dev/mapper/error (where errors are
> > expected) and any other device (where errors are not expected).  How do you
> > propose I do that?
> 
> Is that differentiation mandatory for the MD personality at all?

I thought we had already agreed that it was.

If the metadata records that a device is working, then raid5 will not expect
an error and if it gets one it must wait for the metadata to be updated.
If the metadata records that a device is not working, then raid5 will never
bother even trying to write to that device.

So yes, and important distinction.

I imagine that the dmevent handler would handle a failure by updating the
metadata, then dismantling the array and recreating it with the failed device
missing.  Given that in some cases it would want to do this anyway to
introduce a spare, and as I think this is how dm-raid1 errors are handled, it
seems the logical approach.
I note that your dm-raid45 code uses a message to "allow_writes".
Using that could be racy unless you suspend the device before updating the
metadata, and if you do that you may as well use the suspend/resume sequence
as part of re-enabling writes. (???)

This is probably just a question of interface design, and we both naturally
prefer the one we came up with first :-)

I guess it just feels clumsy for the raid5 driver to have to try to read and
fail rather than simply being told up front not to bother trying.
With md/raid5, a read failure will be followed by an attempt to write out the
good data in case it was a simple media error.  Having it do that every time
you start a degraded array seems extra pointless....


> 
> You'll get immediate and durable errors from the error target and hence
> you'd instantly drop the leg.
> 
> > 
> > > > 
> > > > > 
> > > > > I'm thinking about adding a ctr wrapper to allow us to keep the "raid45"
> > > > > ctr interfaces semantics (ie. the arguments) and the new interface to
> > > > > "raid456" if you don't have objections. That would be implemented by 2
> > > > > targets being registered implemented in one module.
> > > > 
> > > > I have no strong objections, though having two targets that do almost the
> > > > same things seems rather ugly.
> > > 
> > > Keep in mind it'll only be another target_type structs, another
> > > raid_ctr() function plus another (de)registration of the respective
> > > target. The ctr functions for raid45 and raid456 can share moset of the
> > > code anyway.
> > 
> > Sure.
> > 
> > > 
> > > > 
> > > > Is there existing published code that uses the extra arguments to raid45?
> > > 
> > > Yes, dmraid with a list of RAID5 supporting metadata format handlers.
> > 
> > I just had a look at the latest dmraid from CVS and the only place that I can
> > find where a raid45 target is created is in lib/activate/activate.c in
> > _dm_raid45_bol.
> > This only uses arguments that my code understands.
> > What am I missing?
> 
> I was arguing against breaking API compatibility.
> dm-raid45.c is being shipped by distros.
> 

Shipped by distros which all have an 'upstream first' policy, and are fully
aware of the possible consequences of not following it :-)

I'm not really that fussed about the API.  I don't like seeing APIs that I
think are badly designed go into Linux, but plenty of them do and one more
isn't going to make much difference.

Approving an API is really a job for the dm maintainer.  Is that Alasdair?
Having stated my case I'm happy to accept whatever he prefers.

NeilBrown

> Thanks,
> Heinz
> 
> > Thanks,
> > NeilBrown
> 

  reply	other threads:[~2010-06-18  3:52 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-01  9:56 [PATCH 00/24] dm-raid456 support using md/raid5.c, now with dirty-log NeilBrown
2010-06-01  9:56 ` [PATCH 03/24] md/raid5: ensure we create a unique name for kmem_cache when mddev has no gendisk NeilBrown
2010-06-01  9:56 ` [PATCH 01/24] md: reduce dependence on sysfs NeilBrown
2010-06-01  9:56 ` [PATCH 02/24] md/raid5: factor out code for changing size of stripe cache NeilBrown
2010-06-01  9:56 ` [PATCH 04/24] md: be more careful setting MD_CHANGE_CLEAN NeilBrown
2010-06-01  9:56 ` [PATCH 10/24] dm-raid456: add congestion checking NeilBrown
2010-06-01  9:56 ` [PATCH 13/24] dm-raid456: support unplug NeilBrown
2010-06-01  9:56 ` [PATCH 05/24] md: split out md_rdev_init NeilBrown
2010-06-01  9:56 ` [PATCH 09/24] raid5: Don't set read-ahead when there is no queue NeilBrown
2010-06-01  9:56 ` [PATCH 07/24] md/dm: create dm-raid456 module using md/raid5 NeilBrown
2010-06-01  9:56 ` [PATCH 14/24] dm-raid456: add support for setting IO hints NeilBrown
2010-06-01  9:56 ` [PATCH 06/24] md: export various start/stop interfaces NeilBrown
2010-06-01  9:56 ` [PATCH 11/24] md/raid5: add simple plugging infrastructure NeilBrown
2010-06-01  9:56 ` [PATCH 08/24] dm-raid456: add support for raising events to userspace NeilBrown
2010-06-01  9:56 ` [PATCH 12/24] md/plug: optionally use plugger to unplug an array during resync/recovery NeilBrown
2010-06-01  9:56 ` [PATCH 20/24] md/bitmap: optimise scanning of empty bitmaps NeilBrown
2010-06-01  9:56 ` [PATCH 24/24] dm-raid456: switch to use dm_dirty_log for tracking dirty regions NeilBrown
2010-06-01  9:56 ` [PATCH 19/24] md/bitmap: clean up plugging calls NeilBrown
2010-06-01  9:56 ` [PATCH 16/24] dm-raid456: add message handler NeilBrown
2010-06-01  9:56 ` [PATCH 22/24] md/bitmap: prepare for storing write-intent-bitmap via dm-dirty-log NeilBrown
2010-06-01  9:56 ` [PATCH 17/24] md/bitmap: white space clean up and similar NeilBrown
2010-06-01  9:56 ` [PATCH 15/24] dm-raid456: add suspend/resume method NeilBrown
2010-06-01  9:56 ` [PATCH 23/24] md/bitmap: separate out loading a bitmap from initialising the structures NeilBrown
2010-06-01  9:56 ` [PATCH 21/24] dm-dirty-log: allow log size to be different from target size NeilBrown
2010-06-02 14:57   ` Heinz Mauelshagen
2010-06-03  0:10     ` [dm-devel] " Neil Brown
2010-06-03  0:53       ` Heinz Mauelshagen
2010-06-01  9:56 ` [PATCH 18/24] md/bitmap: reduce dependence on sysfs NeilBrown
2010-06-15 13:23 ` [PATCH 00/24] dm-raid456 support using md/raid5.c, now with dirty-log Heinz Mauelshagen
2010-06-15 23:45   ` Neil Brown
2010-06-16 11:26     ` Heinz Mauelshagen
2010-06-17  5:41       ` Neil Brown
2010-06-17 10:47         ` Heinz Mauelshagen
2010-06-18  3:52           ` Neil Brown [this message]
2010-06-18 10:42             ` Heinz Mauelshagen
2010-06-21 23:09               ` Neil Brown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100618135231.124065ef@notabene.brown \
    --to=neilb@suse.de \
    --cc=dm-devel@redhat.com \
    --cc=heinzm@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.