From: snitzer@redhat.com (Mike Snitzer)
Subject: [PATCH v3 07/12] md: suspend i/o during runtime blk_integrity_unregister
Date: Wed, 13 Jan 2016 14:51:37 -0500
Message-ID: <20160113195137.GA3896@redhat.com>
In-Reply-To: <CAPcyv4ieU0EjB=i1sxs8QGisNyygYoqcx5rvkE_f7Q9Kp-j9Vg@mail.gmail.com>

On Wed, Jan 13 2016 at 12:11am -0500,
Dan Williams <dan.j.williams@intel.com> wrote:

> > On Tue, Jan 12, 2016 at 7:10 PM, NeilBrown <neilb@suse.com> wrote:
> > On Wed, Jan 13 2016, Dan Williams wrote:
> >
> >> On Tue, Jan 12, 2016 at 6:14 PM, NeilBrown <neilb@suse.com> wrote:
> >>> On Thu, Oct 22 2015, Dan Williams wrote:
> >>>
> >>>> Synchronize pending i/o against a change in the integrity profile to
> >>>> avoid the possibility of spurious integrity errors.  Given linear_add()
> >>>> suspends the mddev before manipulating it, do the same for the other
> >>>> personalities.
> >>>>
> >>>> Acked-by: NeilBrown <neilb@suse.com>
> >>>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> >>>> ---
> >>>>  drivers/md/md.c        |    1 +
> >>>>  drivers/md/multipath.c |    2 ++
> >>>>  drivers/md/raid1.c     |    2 ++
> >>>>  drivers/md/raid10.c    |    2 ++
> >>>>  4 files changed, 7 insertions(+)
> >>>>
> >>>
> >>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> >>>> index 7c99a4037715..6f0ec107996a 100644
> >>>> --- a/drivers/md/raid10.c
> >>>> +++ b/drivers/md/raid10.c
> >>>> @@ -1736,7 +1736,9 @@ static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev)
> >>>>               rcu_assign_pointer(p->rdev, rdev);
> >>>>               break;
> >>>>       }
> >>>> +     mddev_suspend(mddev);
> >>>>       md_integrity_add_rdev(rdev, mddev);
> >>>> +     mddev_resume(mddev);
> >>>>       if (mddev->queue && blk_queue_discard(bdev_get_queue(rdev->bdev)))
> >>>>               queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
> >>>>
> >>>
> >>> Hi Dan,
> >>>
> >>> Somewhat belatedly I am testing this code, and it deadlocks.  Which is
> >>> obvious once I think about it.
> >>>
> >>> raid10_add_disk() is called from
> >>>   raid10d -> md_check_recovery -> remove_and_add_spares
> >>>
> >>> so this call to mddev_suspend() will block raid10d until all pending IO
> >>> completes.  But raid10d might be needed to complete some IO -
> >>> particularly in the case of errors.  I don't know exactly what is
> >>> deadlocking in my test, but it doesn't surprise me that something does.
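
To spell out the cycle Neil describes -- a rough sketch of the
dependency, reconstructed from his description and md's suspend logic,
not from a captured trace:

	raid10d
	  md_check_recovery()
	    remove_and_add_spares()
	      raid10_add_disk()
	        mddev_suspend()
	          wait_event(mddev->sb_wait,
	                     atomic_read(&mddev->active_io) == 0)

active_io can never reach zero if completing one of the pending bios
(an error retry, say) itself needs raid10d -- which is parked in that
wait_event().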
> >>
> >> Ugh.
> >>
> >>> Can you explain in more detail what needs to be synchronised here?  If
> >>> the synchronisation doesn't happen, what can go wrong?
> >>> Because I cannot imagine what this might actually be protecting against.
> >>
> >> My thinking is that if the integrity profile changes while i/o is in
> >> flight we can get spurious errors.  For example, when enabling
> >> integrity, a checksum for in-flight commands will be missing, so wait
> >> for those to complete.
> >
> > Who/what adds that checksum?  The filesystem?
> > mddev_suspend() doesn't stop the filesystem from submitting requests.
> > It just stops them from getting into md.  So when mddev_resume()
> > completes, old requests can arrive.
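
For reference, the gate lives at md's front door -- a simplified
paraphrase of md_make_request() from that era's drivers/md/md.c, not
verbatim:

	if (mddev->suspended)
		wait_event(mddev->sb_wait, !mddev->suspended);
	atomic_inc(&mddev->active_io);
	/* ... hand the bio to the personality's make_request ... */

So anything the layers above did to a bio (including generating
protection information) has already happened by the time it parks
here; resume just lets those already-prepared bios through.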
> 
> The block layer generates the checksum when the bio is queued.
> 
> You're right, the suspend does nothing to protect against the integrity
> metadata generated for the array.

MD and DM are _not_ using the Linux block layer's ability to generate
protection information metadata (via bio_integrity_generate).  As such
they only provide pass-through of upper-layer, application-generated
protection information (DIX).

A bit more explanation:
bio-based MD and DM's cloning of protection information causes their
cloned bios to have bio->bi_integrity -- as such the underlying
request-based driver skips bio_integrity_generate() via
bio_integrity_prep() (because blk_queue_bio()'s call to
bio_integrity_enabled() returns false for MD and DM's cloned bios).
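
The check in question, trimmed down from fs/bio-integrity.c of that
vintage (a paraphrase, not verbatim):

	bool bio_integrity_enabled(struct bio *bio)
	{
		/* already carries a payload?  true for MD/DM clones
		 * that copied the upper bio's integrity over */
		if (bio_integrity(bio))
			return false;

		/* no profile registered for the underlying device */
		if (bdev_get_integrity(bio->bi_bdev) == NULL)
			return false;

		return true;
	}

So blk_queue_bio() only calls bio_integrity_prep() -- and thereby
generates fresh protection information -- for bios that arrive without
an integrity payload of their own.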

> > If stability of the integrity profile is important, then we presumably
> > need to make sure it never changes (while the array is active - or at
> > least while it is mounted).  So don't accept drives which don't support
> > the current profile.  Also don't upgrade the profile just because all
> > current devices support some new feature.
> >
> > Maybe the array should have a persistent knowledge of what profile it
> > has and only accept devices which match?
> > Or maybe integrity should be completely disabled on arrays which can
> > hot-add devices (so support it on RAID0 and LINEAR, but not on
> > RAID1/4/5/6/10).
> 
> For -stable I'll develop an immediate fix that disallows live changes
> of the integrity profile, similar to DM.
> 
> However, given no filesystem is sending down app tags, I'm wondering
> why md has integrity enabled at the array level in the first place?
> Is this just future preparation for eventual filesystem enabling?
> Otherwise, the integrity handling for each component device is
> sufficient.  I'm probably missing something.

AFAIK, Oracle DB is the only application known to be using DIX.
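
The -stable fix Dan mentions above (disallow live integrity profile
changes, as DM does) would take roughly this shape -- a sketch of the
idea, not a verbatim patch; the actual posting, "md/raid: only permit
hot-add of compatible integrity profiles", appears in the thread
overview below:

	int md_integrity_add_rdev(struct md_rdev *rdev, struct mddev *mddev)
	{
		char name[BDEVNAME_SIZE];

		if (!mddev->gendisk || !blk_get_integrity(mddev->gendisk))
			return 0;	/* no array profile to preserve */

		if (blk_integrity_compare(mddev->gendisk,
					  rdev->bdev->bd_disk) != 0) {
			printk(KERN_NOTICE
			       "%s: incompatible integrity profile for %s\n",
			       mdname(mddev), bdevname(rdev->bdev, name));
			return -ENXIO;
		}
		return 0;
	}

i.e. rather than suspending i/o around a profile change, refuse to
hot-add any device whose profile doesn't match the array's, so the
profile never changes while the array is running.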

Thread overview: 28+ messages
2015-10-21 17:19 [PATCH v3 00/12] Simplify block integrity registration v3 Dan Williams
2015-10-21 17:19 ` [PATCH v3 01/12] block: Move integrity kobject to struct gendisk Dan Williams
2015-10-21 17:19 ` [PATCH v3 02/12] block: Consolidate static integrity profile properties Dan Williams
2015-10-21 17:19 ` [PATCH v3 03/12] block: Reduce the size of struct blk_integrity Dan Williams
2015-10-21 17:19 ` [PATCH v3 04/12] block: Export integrity data interval size in sysfs Dan Williams
2015-10-21 17:19 ` [PATCH v3 05/12] block: Inline blk_integrity in struct gendisk Dan Williams
2015-10-21 17:19 ` [PATCH v3 06/12] md, dm, scsi, nvme, libnvdimm: drop blk_integrity_unregister() at shutdown Dan Williams
2016-01-14 10:32   ` Sagi Grimberg
2015-10-21 17:20 ` [PATCH v3 07/12] md: suspend i/o during runtime blk_integrity_unregister Dan Williams
2016-01-13  2:14   ` NeilBrown
2016-01-13  2:24     ` Dan Williams
2016-01-13  2:59       ` Mike Snitzer
2016-01-13  3:10       ` NeilBrown
2016-01-13  5:11         ` Dan Williams
2016-01-13 19:51           ` Mike Snitzer [this message]
2016-01-14  0:00     ` [PATCH] md/raid: only permit hot-add of compatible integrity profiles Dan Williams
2016-01-14  0:56       ` NeilBrown
2015-10-21 17:20 ` [PATCH v3 08/12] nvme: suspend i/o during runtime blk_integrity_unregister Dan Williams
2016-01-14 10:32   ` Sagi Grimberg
2015-10-21 17:20 ` [PATCH v3 09/12] block: generic request_queue reference counting Dan Williams
2016-01-14 10:35   ` Sagi Grimberg
2015-10-21 17:20 ` [PATCH v3 10/12] block: move blk_integrity to request_queue Dan Williams
2016-01-14 10:36   ` Sagi Grimberg
2015-10-21 17:20 ` [PATCH v3 11/12] block: blk_flush_integrity() for bio-based drivers Dan Williams
2016-01-14 10:36   ` Sagi Grimberg
2015-10-21 17:20 ` [PATCH v3 12/12] block, libnvdimm, nvme: provide a built-in blk_integrity nop profile Dan Williams
2016-01-14 10:37   ` Sagi Grimberg
2015-10-22 15:26 ` [PATCH v3 00/12] Simplify block integrity registration v3 Christoph Hellwig
