[PATCH] dm-io: reject unsupported DISCARD/WRITE SAME requests with EOPNOTSUPP

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] dm-io: reject unsupported DISCARD/WRITE SAME requests with EOPNOTSUPP
@ 2015-02-13  9:24 Darrick J. Wong
  2015-02-13 15:55 ` Mike Snitzer
  0 siblings, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2015-02-13  9:24 UTC (permalink / raw)
  To: snitzer, agk, darrick.wong; +Cc: dm-devel, Martin K. Petersen, Srinivas Eeda

I created a dm-raid1 device backed by a device that supports DISCARD
and another device that does NOT support DISCARD with the following
dm configuration:

# echo '0 2048 mirror core 1 512 2 /dev/sda 0 /dev/sdb 0' | dmsetup create moo
# lsblk -D
NAME         DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda                 0        4K       1G         0
`-moo (dm-0)        0        4K       1G         0
sdb                 0        0B       0B         0
`-moo (dm-0)        0        4K       1G         0

Notice that the mirror device /dev/mapper/moo advertises DISCARD
support even though one of the mirror halves doesn't.

If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl
BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite
loop in do_region() when it tries to issue a DISCARD request to sdb.
The problem is that when we call do_region() against sdb, num_sectors
is set to zero because q->limits.max_discard_sectors is zero.
Therefore, "remaining" never decreases and the loop never terminates.

Before entering the loop, check for the combination of REQ_DISCARD and
no discard and return -EOPNOTSUPP to avoid hanging up the mirror
device.  Fix the same problem with WRITE_DISCARD while we're at it.

This bug was found by the unfortunate coincidence of pvmove and a
discard operation in the RHEL 6.5 kernel; 3.19 is also affected.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Srinivas Eeda <srinivas.eeda@oracle.com>
---
 drivers/md/dm-io.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index c09359d..32d330a 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -290,6 +290,13 @@ static void do_region(int rw, unsigned region, struct dm_io_region *where,
 	unsigned short logical_block_size = queue_logical_block_size(q);
 	sector_t num_sectors;

+	/* Reject unsupported discard/write same requests */
+	if (((rw & REQ_DISCARD) && !blk_queue_discard(q)) ||
+	    ((rw & REQ_WRITE_SAME) && !bdev_write_same(where->bdev))) {
+		dec_count(io, region, -EOPNOTSUPP);
+		return;
+	}
+
 	/*
 	 * where->count may be zero if rw holds a flush and we need to
 	 * send a zero-sized flush.

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: dm-io: reject unsupported DISCARD/WRITE SAME requests with EOPNOTSUPP
  2015-02-13  9:24 [PATCH] dm-io: reject unsupported DISCARD/WRITE SAME requests with EOPNOTSUPP Darrick J. Wong
@ 2015-02-13 15:55 ` Mike Snitzer
  2015-02-13 17:01   ` Darrick J. Wong
                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Mike Snitzer @ 2015-02-13 15:55 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: dm-devel, Martin K. Petersen, agk, Srinivas Eeda

On Fri, Feb 13 2015 at  4:24am -0500,
Darrick J. Wong <darrick.wong@oracle.com> wrote:

> I created a dm-raid1 device backed by a device that supports DISCARD
> and another device that does NOT support DISCARD with the following
> dm configuration:
> 
> # echo '0 2048 mirror core 1 512 2 /dev/sda 0 /dev/sdb 0' | dmsetup create moo
> # lsblk -D
> NAME         DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
> sda                 0        4K       1G         0
> `-moo (dm-0)        0        4K       1G         0
> sdb                 0        0B       0B         0
> `-moo (dm-0)        0        4K       1G         0
> 
> Notice that the mirror device /dev/mapper/moo advertises DISCARD
> support even though one of the mirror halves doesn't.
> 
> If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl
> BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite
> loop in do_region() when it tries to issue a DISCARD request to sdb.
> The problem is that when we call do_region() against sdb, num_sectors
> is set to zero because q->limits.max_discard_sectors is zero.
> Therefore, "remaining" never decreases and the loop never terminates.
> 
> Before entering the loop, check for the combination of REQ_DISCARD and
> no discard and return -EOPNOTSUPP to avoid hanging up the mirror
> device.  Fix the same problem with WRITE_DISCARD while we're at it.
> 
> This bug was found by the unfortunate coincidence of pvmove and a
> discard operation in the RHEL 6.5 kernel; 3.19 is also affected.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> Cc: Srinivas Eeda <srinivas.eeda@oracle.com>

Your patch looks fine but it is laser focused on dm-io.  Again, that is
fine (fixes a real problem).  But I'm wondering how other targets will
respond in the face of partial discard support across the logical
address space of the DM device.

When I implemented dm_table_supports_discards() I consciously allowed a
DM table to contain a mix of discard support.  I'm now wondering where
it is we benefit from that?  Seems like more of a liability than
anything -- so a bigger hammer approach to fixing this would be to
require all targets and all devices in a DM table support discard.
Which amounts to changing dm_table_supports_discards() to be like
dm_table_supports_write_same().

BTW, given dm_table_supports_write_same(), your patch shouldn't need to
worry about WRITE SAME.  Did you experience issues with WRITE SAME too
or were you just being proactive?

Mike

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: dm-io: reject unsupported DISCARD/WRITE SAME requests with EOPNOTSUPP
  2015-02-13 15:55 ` Mike Snitzer
@ 2015-02-13 17:01   ` Darrick J. Wong
  2015-02-13 19:21   ` Martin K. Petersen
  2015-02-26 19:56   ` Mikulas Patocka
  2 siblings, 0 replies; 10+ messages in thread
From: Darrick J. Wong @ 2015-02-13 17:01 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel, Martin K. Petersen, agk, Srinivas Eeda

On Fri, Feb 13, 2015 at 10:55:50AM -0500, Mike Snitzer wrote:
> On Fri, Feb 13 2015 at  4:24am -0500,
> Darrick J. Wong <darrick.wong@oracle.com> wrote:
> 
> > I created a dm-raid1 device backed by a device that supports DISCARD
> > and another device that does NOT support DISCARD with the following
> > dm configuration:
> > 
> > # echo '0 2048 mirror core 1 512 2 /dev/sda 0 /dev/sdb 0' | dmsetup create moo
> > # lsblk -D
> > NAME         DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
> > sda                 0        4K       1G         0
> > `-moo (dm-0)        0        4K       1G         0
> > sdb                 0        0B       0B         0
> > `-moo (dm-0)        0        4K       1G         0
> > 
> > Notice that the mirror device /dev/mapper/moo advertises DISCARD
> > support even though one of the mirror halves doesn't.
> > 
> > If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl
> > BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite
> > loop in do_region() when it tries to issue a DISCARD request to sdb.
> > The problem is that when we call do_region() against sdb, num_sectors
> > is set to zero because q->limits.max_discard_sectors is zero.
> > Therefore, "remaining" never decreases and the loop never terminates.
> > 
> > Before entering the loop, check for the combination of REQ_DISCARD and
> > no discard and return -EOPNOTSUPP to avoid hanging up the mirror
> > device.  Fix the same problem with WRITE_DISCARD while we're at it.
> > 
> > This bug was found by the unfortunate coincidence of pvmove and a
> > discard operation in the RHEL 6.5 kernel; 3.19 is also affected.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> > Cc: Srinivas Eeda <srinivas.eeda@oracle.com>
> 
> Your patch looks fine but it is laser focused on dm-io.  Again, that is
> fine (fixes a real problem).  But I'm wondering how other targets will
> respond in the face of partial discard support across the logical
> address space of the DM device.

I'm not sure.  As far as I could tell, the simple dm targets seem happy enough
to pass along the -EOPNOTSUPP code to whomever gave it the bio.  I didn't look
at complicated things like dm-thin*, dm-cache, or dm-raid456.  It might be
worth having a regression test that can be applied to all the dm* devices in
rapid succession so we can find out.

Of the callers that I audited, ext4, xfs, btrfs, and f2fs silently ignore the
EOPNOTSUPP code.  jfs and swapfiles printk about the error but take no other
action.  BLKDISCARD (the ioctl) simply returns the error code to userspace.
gfs2 and nilfs2 will disable discards; and fat doesn't check error codes at
all.  None of them panic or otherwise go bonkers.

It seems that most FSes expect transient discard failures and/or partial
support.

> When I implemented dm_table_supports_discards() I consciously allowed a
> DM table to contain a mix of discard support.  I'm now wondering where
> it is we benefit from that?  Seems like more of a liability than

<shrug> Given the result above, I think we can leave it this way if we perform
a code audit, though certainly this cautious approach would mask any other bugs
that could be lurking.

> anything -- so a bigger hammer approach to fixing this would be to
> require all targets and all devices in a DM table support discard.
> Which amounts to changing dm_table_supports_discards() to be like
> dm_table_supports_write_same().
> 
> BTW, given dm_table_supports_write_same(), your patch shouldn't need to
> worry about WRITE SAME.  Did you experience issues with WRITE SAME too
> or were you just being proactive?

I was being (perhaps overly) proactive, at 1am. :)

That part of the if statement can go away.  I'll send a v2.

--D

> 
> Mike

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: dm-io: reject unsupported DISCARD/WRITE SAME requests with EOPNOTSUPP
  2015-02-13 15:55 ` Mike Snitzer
  2015-02-13 17:01   ` Darrick J. Wong
@ 2015-02-13 19:21   ` Martin K. Petersen
  2015-02-13 20:07     ` Darrick J. Wong
  2015-02-26 19:56   ` Mikulas Patocka
  2 siblings, 1 reply; 10+ messages in thread
From: Martin K. Petersen @ 2015-02-13 19:21 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: dm-devel, Srinivas Eeda, Martin K. Petersen, agk, Darrick J. Wong

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

Mike> When I implemented dm_table_supports_discards() I consciously
Mike> allowed a DM table to contain a mix of discard support.  I'm now
Mike> wondering where it is we benefit from that?  Seems like more of a
Mike> liability than anything -- so a bigger hammer approach to fixing
Mike> this would be to require all targets and all devices in a DM table
Mike> support discard.

I think our original rationale was that since discard is only a hint it
would be fine to mix and match. And at the time there seemed to be value
in supporting a heterogeneous setups with say a disk drive and an SSD.

Back then the SSD vendors were all busy telling us how crucial discard
would be going forward. However, that turned out not to be the case and
discard often causes more problems than it solves. So I'm perfectly OK
with requiring all devices in a table to have the same capabilities. In
many ways I think that's a cleaner approach.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: dm-io: reject unsupported DISCARD/WRITE SAME requests with EOPNOTSUPP
  2015-02-13 19:21   ` Martin K. Petersen
@ 2015-02-13 20:07     ` Darrick J. Wong
  2015-02-13 20:58       ` Mike Snitzer
  0 siblings, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2015-02-13 20:07 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: dm-devel, agk, Mike Snitzer, Srinivas Eeda

On Fri, Feb 13, 2015 at 02:21:01PM -0500, Martin K. Petersen wrote:
> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> 
> Mike> When I implemented dm_table_supports_discards() I consciously
> Mike> allowed a DM table to contain a mix of discard support.  I'm now
> Mike> wondering where it is we benefit from that?  Seems like more of a
> Mike> liability than anything -- so a bigger hammer approach to fixing
> Mike> this would be to require all targets and all devices in a DM table
> Mike> support discard.
> 
> I think our original rationale was that since discard is only a hint it
> would be fine to mix and match. And at the time there seemed to be value
> in supporting a heterogeneous setups with say a disk drive and an SSD.
> 
> Back then the SSD vendors were all busy telling us how crucial discard
> would be going forward. However, that turned out not to be the case and
> discard often causes more problems than it solves. So I'm perfectly OK
> with requiring all devices in a table to have the same capabilities. In
> many ways I think that's a cleaner approach.

I agree that imposing that extra requirement would be cleaner from a software
engineering POV.

That said, given that discard is advisory and most of the callers seem to
anticipate that it might not work, why not allow heterogeneous mixes?  It seems
unfortunate to remove that ability just because there are kernel bugs.  If
you're implementing thin provisioning and are unmapping storage when discard
requests come through, wouldn't it be an antifeature that this suddenly stops
just because something got in the way?  fstrim ought to be able to talk to
the parts of the compound device that can be trimmed.

Second question: Can a dm device detect that q->limits.max_discard_sectors has
changed in one of the devices it's sitting on?  Say I have this:

X -> Y -> SSD1
 \-> Z -> SSD2

SSD*, Z, Y, and X all advertise discard.

Then I change Y to point to a spinny disk:

X -> Y -> HDD
 \-> Z -> SSD2

Even if Y notices that it no longer supports discard, will X figure that out
too?

--D

> 
> -- 
> Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: dm-io: reject unsupported DISCARD/WRITE SAME requests with EOPNOTSUPP
  2015-02-13 20:07     ` Darrick J. Wong
@ 2015-02-13 20:58       ` Mike Snitzer
  2015-02-13 21:12         ` Martin K. Petersen
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Snitzer @ 2015-02-13 20:58 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: dm-devel, agk, Martin K. Petersen, Srinivas Eeda

On Fri, Feb 13 2015 at  3:07P -0500,
Darrick J. Wong <darrick.wong@oracle.com> wrote:

> On Fri, Feb 13, 2015 at 02:21:01PM -0500, Martin K. Petersen wrote:
> > >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> > 
> > Mike> When I implemented dm_table_supports_discards() I consciously
> > Mike> allowed a DM table to contain a mix of discard support.  I'm now
> > Mike> wondering where it is we benefit from that?  Seems like more of a
> > Mike> liability than anything -- so a bigger hammer approach to fixing
> > Mike> this would be to require all targets and all devices in a DM table
> > Mike> support discard.
> > 
> > I think our original rationale was that since discard is only a hint it
> > would be fine to mix and match. And at the time there seemed to be value
> > in supporting a heterogeneous setups with say a disk drive and an SSD.
> > 
> > Back then the SSD vendors were all busy telling us how crucial discard
> > would be going forward. However, that turned out not to be the case and
> > discard often causes more problems than it solves. So I'm perfectly OK
> > with requiring all devices in a table to have the same capabilities. In
> > many ways I think that's a cleaner approach.
> 
> I agree that imposing that extra requirement would be cleaner from a software
> engineering POV.
> 
> That said, given that discard is advisory and most of the callers seem to
> anticipate that it might not work, why not allow heterogeneous mixes?  It seems
> unfortunate to remove that ability just because there are kernel bugs.  If
> you're implementing thin provisioning and are unmapping storage when discard
> requests come through, wouldn't it be an antifeature that this suddenly stops
> just because something got in the way?  fstrim ought to be able to talk to
> the parts of the compound device that can be trimmed.

Yeah, I'm OK with leaving it as is for now (allowing a mix).  So I'll be
picking your dm-io patch up for 3.20-rc.  But if other similar bugs
manifest then we can revisit disallowing a mix.

As for your question on thin provisioning.  The DM thinp target
advertises support for discards even if the underlying data device
doesn't (unless the user explicitly asked for discards to be disabled on
the thin-pool).  This is accomplished with the target's
'discards_supported' override.  So even if we did make the change to
disallowing a mix: the 'discards_supported' override would still enable
discards.

> Second question: Can a dm device detect that q->limits.max_discard_sectors has
> changed in one of the devices it's sitting on?  Say I have this:
> 
> X -> Y -> SSD1
>  \-> Z -> SSD2
> 
> SSD*, Z, Y, and X all advertise discard.
> 
> Then I change Y to point to a spinny disk:
> 
> X -> Y -> HDD
>  \-> Z -> SSD2
> 
> Even if Y notices that it no longer supports discard, will X figure that out
> too?

No, the DM table reload for Y would alter the discard capability of Y
but it will _not_ bubble up to X.

(would be nice if such limit stacking with refresh but...)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: dm-io: reject unsupported DISCARD/WRITE SAME requests with EOPNOTSUPP
  2015-02-13 20:58       ` Mike Snitzer
@ 2015-02-13 21:12         ` Martin K. Petersen
  0 siblings, 0 replies; 10+ messages in thread
From: Martin K. Petersen @ 2015-02-13 21:12 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: dm-devel, Srinivas Eeda, agk, Martin K. Petersen, Darrick J. Wong

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

Mike> Yeah, I'm OK with leaving it as is for now (allowing a mix).

That's fine with me as long as you're comfortable that we're handling
the corner cases correctly.

Mike> So I'll be picking your dm-io patch up for 3.20-rc.

Should also go to stable.

Mike> No, the DM table reload for Y would alter the discard capability
Mike> of Y but it will _not_ bubble up to X.

Mike> (would be nice if such limit stacking with refresh but...)

Yeah. We keep talking about this. Maybe we can have a quick brain storm
at LSF/MM?

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: dm-io: reject unsupported DISCARD/WRITE SAME requests with EOPNOTSUPP
  2015-02-13 15:55 ` Mike Snitzer
  2015-02-13 17:01   ` Darrick J. Wong
  2015-02-13 19:21   ` Martin K. Petersen
@ 2015-02-26 19:56   ` Mikulas Patocka
  2015-02-26 20:10     ` Mikulas Patocka
  2015-02-27 18:42     ` Darrick J. Wong
  2 siblings, 2 replies; 10+ messages in thread
From: Mikulas Patocka @ 2015-02-26 19:56 UTC (permalink / raw)
  To: device-mapper development
  Cc: Srinivas Eeda, agk, Martin K. Petersen, Darrick J. Wong



On Fri, 13 Feb 2015, Mike Snitzer wrote:

> On Fri, Feb 13 2015 at  4:24am -0500,
> Darrick J. Wong <darrick.wong@oracle.com> wrote:
> 
> > I created a dm-raid1 device backed by a device that supports DISCARD
> > and another device that does NOT support DISCARD with the following
> > dm configuration:
> > 
> > # echo '0 2048 mirror core 1 512 2 /dev/sda 0 /dev/sdb 0' | dmsetup create moo
> > # lsblk -D
> > NAME         DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
> > sda                 0        4K       1G         0
> > `-moo (dm-0)        0        4K       1G         0
> > sdb                 0        0B       0B         0
> > `-moo (dm-0)        0        4K       1G         0
> > 
> > Notice that the mirror device /dev/mapper/moo advertises DISCARD
> > support even though one of the mirror halves doesn't.
> > 
> > If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl
> > BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite
> > loop in do_region() when it tries to issue a DISCARD request to sdb.
> > The problem is that when we call do_region() against sdb, num_sectors
> > is set to zero because q->limits.max_discard_sectors is zero.
> > Therefore, "remaining" never decreases and the loop never terminates.
> > 
> > Before entering the loop, check for the combination of REQ_DISCARD and
> > no discard and return -EOPNOTSUPP to avoid hanging up the mirror
> > device.  Fix the same problem with WRITE_DISCARD while we're at it.
> > 
> > This bug was found by the unfortunate coincidence of pvmove and a
> > discard operation in the RHEL 6.5 kernel; 3.19 is also affected.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> > Cc: Srinivas Eeda <srinivas.eeda@oracle.com>
> 
> Your patch looks fine but it is laser focused on dm-io.  Again, that is
> fine (fixes a real problem).  But I'm wondering how other targets will
> respond in the face of partial discard support across the logical
> address space of the DM device.
> 
> When I implemented dm_table_supports_discards() I consciously allowed a
> DM table to contain a mix of discard support.  I'm now wondering where
> it is we benefit from that?  Seems like more of a liability than
> anything -- so a bigger hammer approach to fixing this would be to
> require all targets and all devices in a DM table support discard.
> Which amounts to changing dm_table_supports_discards() to be like
> dm_table_supports_write_same().
> 
> BTW, given dm_table_supports_write_same(), your patch shouldn't need to
> worry about WRITE SAME.  Did you experience issues with WRITE SAME too
> or were you just being proactive?
> 
> Mike

I think that Darrick's patch is needed even for WRITE SAME.

Note that queue limits and flags can't be reliably prevent bios from 
coming in.

For example:

1. Some piece of code tests queue flags and sees that 
max_write_same_sectors is non-zero, it constructs WRITE_SAME bio and sends 
it with submit_bio.

2. Meanwhile, the device is reconfigured so that it doesn't support 
WRITE_SAME. q->limits.max_write_same_sectors is set to zero.

3. The bio submitted at step 1 can't be reverted, so it arrives at the 
device mapper even if it advertises that it doesn't support write same - 
now, it causes the lockup that Darrick observed.


Another problem is that queue flags are not propagated up when you reload 
a single device - someone could reload a mirror leg with a different dm 
table that doesn't support write_same, and even after the reload, the 
mirror keeps advertising that it does support WRITE_SAME.


Mikulas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: dm-io: reject unsupported DISCARD/WRITE SAME requests with EOPNOTSUPP
  2015-02-26 19:56   ` Mikulas Patocka
@ 2015-02-26 20:10     ` Mikulas Patocka
  2015-02-27 18:42     ` Darrick J. Wong
  1 sibling, 0 replies; 10+ messages in thread
From: Mikulas Patocka @ 2015-02-26 20:10 UTC (permalink / raw)
  To: device-mapper development, Mike Snitzer
  Cc: Srinivas Eeda, agk, Martin K. Petersen, Darrick J. Wong



On Thu, 26 Feb 2015, Mikulas Patocka wrote:

> 
> 
> On Fri, 13 Feb 2015, Mike Snitzer wrote:
> 
> > On Fri, Feb 13 2015 at  4:24am -0500,
> > Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > 
> > > I created a dm-raid1 device backed by a device that supports DISCARD
> > > and another device that does NOT support DISCARD with the following
> > > dm configuration:
> > > 
> > > # echo '0 2048 mirror core 1 512 2 /dev/sda 0 /dev/sdb 0' | dmsetup create moo
> > > # lsblk -D
> > > NAME         DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
> > > sda                 0        4K       1G         0
> > > `-moo (dm-0)        0        4K       1G         0
> > > sdb                 0        0B       0B         0
> > > `-moo (dm-0)        0        4K       1G         0
> > > 
> > > Notice that the mirror device /dev/mapper/moo advertises DISCARD
> > > support even though one of the mirror halves doesn't.
> > > 
> > > If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl
> > > BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite
> > > loop in do_region() when it tries to issue a DISCARD request to sdb.
> > > The problem is that when we call do_region() against sdb, num_sectors
> > > is set to zero because q->limits.max_discard_sectors is zero.
> > > Therefore, "remaining" never decreases and the loop never terminates.
> > > 
> > > Before entering the loop, check for the combination of REQ_DISCARD and
> > > no discard and return -EOPNOTSUPP to avoid hanging up the mirror
> > > device.  Fix the same problem with WRITE_DISCARD while we're at it.
> > > 
> > > This bug was found by the unfortunate coincidence of pvmove and a
> > > discard operation in the RHEL 6.5 kernel; 3.19 is also affected.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> > > Cc: Srinivas Eeda <srinivas.eeda@oracle.com>
> > 
> > Your patch looks fine but it is laser focused on dm-io.  Again, that is
> > fine (fixes a real problem).  But I'm wondering how other targets will
> > respond in the face of partial discard support across the logical
> > address space of the DM device.
> > 
> > When I implemented dm_table_supports_discards() I consciously allowed a
> > DM table to contain a mix of discard support.  I'm now wondering where
> > it is we benefit from that?  Seems like more of a liability than
> > anything -- so a bigger hammer approach to fixing this would be to
> > require all targets and all devices in a DM table support discard.
> > Which amounts to changing dm_table_supports_discards() to be like
> > dm_table_supports_write_same().
> > 
> > BTW, given dm_table_supports_write_same(), your patch shouldn't need to
> > worry about WRITE SAME.  Did you experience issues with WRITE SAME too
> > or were you just being proactive?
> > 
> > Mike
> 
> I think that Darrick's patch is needed even for WRITE SAME.
> 
> Note that queue limits and flags can't be reliably prevent bios from 
> coming in.
> 
> For example:
> 
> 1. Some piece of code tests queue flags and sees that 
> max_write_same_sectors is non-zero, it constructs WRITE_SAME bio and sends 
> it with submit_bio.
> 
> 2. Meanwhile, the device is reconfigured so that it doesn't support 
> WRITE_SAME. q->limits.max_write_same_sectors is set to zero.
> 
> 3. The bio submitted at step 1 can't be reverted, so it arrives at the 
> device mapper even if it advertises that it doesn't support write same - 
> now, it causes the lockup that Darrick observed.
> 
> 
> Another problem is that queue flags are not propagated up when you reload 
> a single device - someone could reload a mirror leg with a different dm 
> table that doesn't support write_same, and even after the reload, the 
> mirror keeps advertising that it does support WRITE_SAME.

It comes to another idea - if the limits change while the do-while loop is 
in progress, even the original Darrick's patch is wrong and fails to 
prevent the lockup. So - we need to read the limits in advance, test them 
and never re-read them.

Mikulas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: dm-io: reject unsupported DISCARD/WRITE SAME requests with EOPNOTSUPP
  2015-02-26 19:56   ` Mikulas Patocka
  2015-02-26 20:10     ` Mikulas Patocka
@ 2015-02-27 18:42     ` Darrick J. Wong
  1 sibling, 0 replies; 10+ messages in thread
From: Darrick J. Wong @ 2015-02-27 18:42 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin K. Petersen, agk, Srinivas Eeda

On Thu, Feb 26, 2015 at 02:56:20PM -0500, Mikulas Patocka wrote:
> 
> 
> On Fri, 13 Feb 2015, Mike Snitzer wrote:
> 
> > On Fri, Feb 13 2015 at  4:24am -0500,
> > Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > 
> > > I created a dm-raid1 device backed by a device that supports DISCARD
> > > and another device that does NOT support DISCARD with the following
> > > dm configuration:
> > > 
> > > # echo '0 2048 mirror core 1 512 2 /dev/sda 0 /dev/sdb 0' | dmsetup create moo
> > > # lsblk -D
> > > NAME         DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
> > > sda                 0        4K       1G         0
> > > `-moo (dm-0)        0        4K       1G         0
> > > sdb                 0        0B       0B         0
> > > `-moo (dm-0)        0        4K       1G         0
> > > 
> > > Notice that the mirror device /dev/mapper/moo advertises DISCARD
> > > support even though one of the mirror halves doesn't.
> > > 
> > > If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl
> > > BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite
> > > loop in do_region() when it tries to issue a DISCARD request to sdb.
> > > The problem is that when we call do_region() against sdb, num_sectors
> > > is set to zero because q->limits.max_discard_sectors is zero.
> > > Therefore, "remaining" never decreases and the loop never terminates.
> > > 
> > > Before entering the loop, check for the combination of REQ_DISCARD and
> > > no discard and return -EOPNOTSUPP to avoid hanging up the mirror
> > > device.  Fix the same problem with WRITE_DISCARD while we're at it.
> > > 
> > > This bug was found by the unfortunate coincidence of pvmove and a
> > > discard operation in the RHEL 6.5 kernel; 3.19 is also affected.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> > > Cc: Srinivas Eeda <srinivas.eeda@oracle.com>
> > 
> > Your patch looks fine but it is laser focused on dm-io.  Again, that is
> > fine (fixes a real problem).  But I'm wondering how other targets will
> > respond in the face of partial discard support across the logical
> > address space of the DM device.
> > 
> > When I implemented dm_table_supports_discards() I consciously allowed a
> > DM table to contain a mix of discard support.  I'm now wondering where
> > it is we benefit from that?  Seems like more of a liability than
> > anything -- so a bigger hammer approach to fixing this would be to
> > require all targets and all devices in a DM table support discard.
> > Which amounts to changing dm_table_supports_discards() to be like
> > dm_table_supports_write_same().
> > 
> > BTW, given dm_table_supports_write_same(), your patch shouldn't need to
> > worry about WRITE SAME.  Did you experience issues with WRITE SAME too
> > or were you just being proactive?
> > 
> > Mike
> 
> I think that Darrick's patch is needed even for WRITE SAME.
> 
> Note that queue limits and flags can't be reliably prevent bios from 
> coming in.
> 
> For example:
> 
> 1. Some piece of code tests queue flags and sees that 
> max_write_same_sectors is non-zero, it constructs WRITE_SAME bio and sends 
> it with submit_bio.
> 
> 2. Meanwhile, the device is reconfigured so that it doesn't support 
> WRITE_SAME. q->limits.max_write_same_sectors is set to zero.
> 
> 3. The bio submitted at step 1 can't be reverted, so it arrives at the 
> device mapper even if it advertises that it doesn't support write same - 
> now, it causes the lockup that Darrick observed.

I'd pondered patching the WRITE SAME case too.

> Another problem is that queue flags are not propagated up when you reload 
> a single device - someone could reload a mirror leg with a different dm 
> table that doesn't support write_same, and even after the reload, the 
> mirror keeps advertising that it does support WRITE_SAME.

Not sure how to deal with that -- I suppose we could save the 'last recorded q
limits' and watch for changes, but I suppose that depends on how resilient
callers are against DISCARD and WRITE SAME returning -EOPNOTSUPP.  So far as I
can tell the in-kernel caller handles it gracefully enough, and I'd hope that
any sane userland program would know to follow up with a regular write (if
applicable), but who knows...

> It comes to another idea - if the limits change while the do-while loop is 
> in progress, even the original Darrick's patch is wrong and fails to 
> prevent the lockup. So - we need to read the limits in advance, test them 
> and never re-read them.

Agreed.  I /think/ issuing discard/write same to a device that no longer
supports it will return an error... or at least in my crummy 5 minute test it
seemed to work.

--D

> 
> 
> Mikulas
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-02-27 18:42 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-02-13  9:24 [PATCH] dm-io: reject unsupported DISCARD/WRITE SAME requests with EOPNOTSUPP Darrick J. Wong
2015-02-13 15:55 ` Mike Snitzer
2015-02-13 17:01   ` Darrick J. Wong
2015-02-13 19:21   ` Martin K. Petersen
2015-02-13 20:07     ` Darrick J. Wong
2015-02-13 20:58       ` Mike Snitzer
2015-02-13 21:12         ` Martin K. Petersen
2015-02-26 19:56   ` Mikulas Patocka
2015-02-26 20:10     ` Mikulas Patocka
2015-02-27 18:42     ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.