From: Laurence Oberman <loberman@redhat.com>
To: Hannes Reinecke <hare@suse.de>, Mike Snitzer <snitzer@redhat.com>
Cc: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
dm-devel@redhat.com
Subject: Re: data corruption with 'splt' workload to XFS on DM cache with its 3 underlying devices being on same NVMe device
Date: Tue, 24 Jul 2018 09:57:03 -0400 [thread overview]
Message-ID: <1532440623.9819.4.camel@redhat.com> (raw)
In-Reply-To: <27cadfb3-2442-3931-6d58-58aa6adb2e2b@suse.de>
On Tue, 2018-07-24 at 15:51 +0200, Hannes Reinecke wrote:
> On 07/24/2018 03:07 PM, Mike Snitzer wrote:
> > On Tue, Jul 24 2018 at 2:00am -0400,
> > Hannes Reinecke <hare@suse.de> wrote:
> >
> > > On 07/23/2018 06:33 PM, Mike Snitzer wrote:
> > > > Hi,
> > > >
> > > > I've opened the following public BZ:
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=1607527
> > > >
> > > > Feel free to add comments to that BZ if you have a redhat
> > > > bugzilla
> > > > account.
> > > >
> > > > But otherwise, happy to get as much feedback and discussion
> > > > going purely
> > > > on the relevant lists. I've taken ~1.5 weeks to categorize and
> > > > isolate
> > > > this issue. But I've reached a point where I'm getting
> > > > diminishing
> > > > returns and could _really_ use the collective eyeballs and
> > > > expertise of
> > > > the community. This is by far one of the most nasty cases of
> > > > corruption
> > > > I've seen in a while. Not sure where the ultimate cause of
> > > > corruption
> > > > lies (that's the money question) but it _feels_ rooted in NVMe
> > > > and is
> > > > unique to this particular workload I've stumbled onto via
> > > > customer
> > > > escalation and then trying to replicate an rbd device using a
> > > > more
> > > > approachable one (request-based DM multipath in this case).
> > > >
> > >
> > > I might be stating the obvious, but so far we only have
> > > considered
> > > request-based multipath as being active for the _entire_ device.
> > > To my knowledge we've never tested that when running on a
> > > partition.
> >
> > True. We only ever support mapping the partitions on top of
> > request-based multipath (via dm-linear volumes created by kpartx).
> >
> > > So, have you tested that request-based multipathing works on a
> > > partition _at all_? I'm not sure if partition mapping is done
> > > correctly here; we never remap the start of the request (nor bio,
> > > come to speak of it), so it looks as if we would be doing the
> > > wrong
> > > things here.
> > >
> > > Have you checked that partition remapping is done correctly?
> >
> > It clearly doesn't work. Not quite following why but...
> >
> > After running the test the partition table at the start of the
> > whole
> > NVMe device is overwritten by XFS. So likely the IO destined to
> > the
> > dm-cache's "slow" (dm-mpath device on NVMe partition) was issued to
> > the
> > whole NVMe device:
> >
> > # pvcreate /dev/nvme1n1
> > WARNING: xfs signature detected on /dev/nvme1n1 at offset 0. Wipe
> > it? [y/n]
> >
> > # vgcreate test /dev/nvme1n1
> > # lvcreate -n slow -L 512G test
> > WARNING: xfs signature detected on /dev/test/slow at offset 0. Wipe
> > it?
> > [y/n]: y
> > Wiping xfs signature on /dev/test/slow.
> > Logical volume "slow" created.
> >
> > Isn't this a failing of block core's partitioning? Why should a
> > target
> > that is given the entire partition of a device need to be concerned
> > with
> > remapping IO? Shouldn't block core handle that mapping?
> >
>
> Only if the device is marked as 'partitionable', which device-mapper
> devices are not.
> But I thought you knew that ...
>
> > Anyway, yesterday I went so far as to hack together request-based
> > support for DM linear (because request-based DM cannot stack on
> > bio-based DM). With request-based linear devices in place of
> > conventional partitioning, I no longer see the XFS corruption when
> > running the test:
> >
>
> _Actually_, I would've done it the other way around; after all,
> where's
> the point in running dm-multipath on a partition?
> Anything running on the other partitions would suffer from the
> issues
> dm-multipath is designed to handle (temporary path loss etc), so I'm
> not
> quite sure what you are trying to achieve with your testcase.
> Can you enlighten me?
>
> Cheers,
>
> Hannes
This came about because a customer is using an NVMe device for dm-cache
and created multiple partitions on it so the same NVMe device could
cache several different "slower" devices. The corruption was noticed in
XFS, and I engaged Mike to help figure out what was going on.
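
To make the suspected failure mode concrete, here is a minimal sketch of
the sector arithmetic (hypothetical values, not taken from the thread):
bio-based IO gets the partition's start sector added before it reaches
the whole device, whereas the broken request-based path apparently
skipped that remap, so writes meant for the partition land at the start
of the whole NVMe device and clobber the partition table.

```python
# Sketch of the suspected misdirected-IO arithmetic (hypothetical values).
# Block core / bio-based DM adds the partition's start offset before IO
# reaches the underlying whole device; the broken request-based path
# effectively issued the unremapped sector to the whole device.

PART_START = 2048  # assumed start of the NVMe partition, in 512-byte sectors
io_sector = 0      # sector 0 *within* the partition (e.g. the XFS superblock)

# What correct partition remapping produces: IO lands inside the partition.
correctly_remapped = PART_START + io_sector

# What the broken path effectively did: IO lands at sector 0 of the whole
# device, overwriting the partition table (matching the wiped-label symptom).
misdirected = io_sector

print(correctly_remapped, misdirected)
```

This is only an illustration of the offset bookkeeping, not the actual
kernel code paths involved.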