* [linux-lvm] Testing TRIM with LVM
@ 2011-04-12 14:59 DarkNovaNick
2011-04-12 23:47 ` Mike Snitzer
0 siblings, 1 reply; 54+ messages in thread
From: DarkNovaNick @ 2011-04-12 14:59 UTC (permalink / raw)
To: linux-lvm
[-- Attachment #1: Type: text/plain, Size: 1443 bytes --]
I recently purchased a Crucial C300 SSD and set it up as my primary drive.
I wanted to use LVM, and my reading indicates that LVM added support to
pass-through TRIM commands with kernel 2.6.37. I'm running Ubuntu 10.10,
but I installed the latest 11.04 kernel, so I'm running kernel 2.6.38-8. I
added "discard" to fstab so my mountpoint looks like:
/dev/mapper/vg0-vol0 on / type ext4
(rw,noatime,nodiratime,errors=remount-ro,discard,commit=0)
I found directions on various sites like:
http://duopetalflower.blogspot.com/2010/11/enterprise-kernel-6-has-ssd-trim.html
on how to confirm if TRIM is working. I ran (as root):
dd if=/dev/urandom of=tempfile count=100 bs=512k oflag=direct
hdparm --fibmap tempfile
(then took the first begin_LBA)
hdparm --read-sector 191710208 /dev/sda
and it printed:
/dev/sda:
reading sector 191710208: succeeded
3254 3a32 3834 313a 2b36 3030 303a 2030
4544 5542 2047 3728 3a29 5320 7661 6e69
2067 6e69 6564 2078 6f66 2072 7270 646f
............
Then I ran:
rm tempfile
sync
hdparm --read-sector 191710208 /dev/sda
and it still returned:
/dev/sda:
reading sector 191710208: succeeded
3254 3a32 3834 313a 2b36 3030 303a 2030
4544 5542 2047 3728 3a29 5320 7661 6e69
2067 6e69 6564 2078 6f66 2072 7270 646f
.............
If TRIM is working, the sector is supposed to contain all zeros.
Am I doing something wrong or do I need to do something more to get LVM to
pass down the TRIM command?
Nick
[-- Attachment #2: Type: text/html, Size: 1643 bytes --]
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] Testing TRIM with LVM
2011-04-12 14:59 [linux-lvm] Testing TRIM with LVM DarkNovaNick
@ 2011-04-12 23:47 ` Mike Snitzer
2011-04-13 1:47 ` DarkNovaNick
0 siblings, 1 reply; 54+ messages in thread
From: Mike Snitzer @ 2011-04-12 23:47 UTC (permalink / raw)
To: DarkNovaNick; +Cc: LVM general discussion and development
On Tue, Apr 12 2011 at 10:59am -0400,
DarkNovaNick@gmail.com <DarkNovaNick@gmail.com> wrote:
> I recently purchased a Crucial C300 SSD and set it up as my primary
> drive. I wanted to use LVM, and my reading indicates that LVM added
> support to pass-through TRIM commands with kernel 2.6.37. I'm
> running Ubuntu 10.10, but I installed the latest 11.04 kernel, so
> I'm running kernel 2.6.38-8. I added "discard" to fstab so my
> mountpoint looks like:
>
> /dev/mapper/vg0-vol0 on / type ext4
> (rw,noatime,nodiratime,errors=remount-ro,discard,commit=0)
If the discards aren't working then ext4 will print a warning and stop
issuing them, see: http://git.kernel.org/linus/a30eec2a8650a77f7
So if you don't see "discard not supported, disabling" in your dmesg (or
/var/log/messages) then TRIM is likely working as expected.
> I found directions on various sites like:
> http://duopetalflower.blogspot.com/2010/11/enterprise-kernel-6-has-ssd-trim.html
> on how to confirm if TRIM is working. I ran (as root):
>
> dd if=/dev/urandom of=tempfile count=100 bs=512k oflag=direct
> hdparm --fibmap tempfile
> (then took the first begin_LBA)
> hdparm --read-sector 191710208 /dev/sda
>
> and it printed:
>
> /dev/sda:
> reading sector 191710208: succeeded
> 3254 3a32 3834 313a 2b36 3030 303a 2030
> 4544 5542 2047 3728 3a29 5320 7661 6e69
> 2067 6e69 6564 2078 6f66 2072 7270 646f
> ............
>
> Then I ran:
> rm tempfile
> sync
> hdparm --read-sector 191710208 /dev/sda
>
> and it still returned:
> /dev/sda:
> reading sector 191710208: succeeded
> 3254 3a32 3834 313a 2b36 3030 303a 2030
> 4544 5542 2047 3728 3a29 5320 7661 6e69
> 2067 6e69 6564 2078 6f66 2072 7270 646f
> .............
>
> If TRIM is working, the sector is supposed to contain all zeros.
That is only if the SSD implements TRIM so that it zeroes the discarded
blocks. You can check if the device at least knows enough to report
discard_zeroes_data:
cat /sys/block/sda/queue/discard_zeroes_data
Also, even if the SSD does zero, it may take some time -- well after the
discard has completed -- for the SSD to actually zero the blocks (this
is true of some lesser quality SSDs).
> Am I doing something wrong or do I need to do something more to get
> LVM to pass down the TRIM command?
LVM (device mapper specifically) passes discards perfectly well. But
the snapshot and dm-crypt targets don't have discard support. All other
targets do have discard support. What is your output for: dmsetup table
Mike
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] Testing TRIM with LVM
2011-04-12 23:47 ` Mike Snitzer
@ 2011-04-13 1:47 ` DarkNovaNick
2011-04-13 8:41 ` Zdenek Kabelac
2011-04-13 22:40 ` [linux-lvm] [PATCH] dm snapshot: add discard support to the snapshot-origin target [was: Re: Testing TRIM with LVM] Mike Snitzer
0 siblings, 2 replies; 54+ messages in thread
From: DarkNovaNick @ 2011-04-13 1:47 UTC (permalink / raw)
To: Mike Snitzer; +Cc: LVM general discussion and development
[-- Attachment #1: Type: text/plain, Size: 4058 bytes --]
On Apr 12, 2011 6:47pm, Mike Snitzer <snitzer@redhat.com> wrote:
> On Tue, Apr 12 2011 at 10:59am -0400,
> DarkNovaNick@gmail.com DarkNovaNick@gmail.com> wrote:
> > I recently purchased a Crucial C300 SSD and set it up as my primary
> > drive. I wanted to use LVM, and my reading indicates that LVM added
> > support to pass-through TRIM commands with kernel 2.6.37. I'm
> > running Ubuntu 10.10, but I installed the latest 11.04 kernel, so
> > I'm running kernel 2.6.38-8. I added "discard" to fstab so my
> > mountpoint looks like:
> >
> > /dev/mapper/vg0-vol0 on / type ext4
> > (rw,noatime,nodiratime,errors=remount-ro,discard,commit=0)
> If the discards aren't working then ext4 will print a warning and stop
> issuing them, see: http://git.kernel.org/linus/a30eec2a8650a77f7
> So if you don't see "discard not supported, disabling" in your dmesg (or
> /var/log/messages) then TRIM is likely working as expected.
> > I found directions on various sites like:
> >
> http://duopetalflower.blogspot.com/2010/11/enterprise-kernel-6-has-ssd-trim.html
> > on how to confirm if TRIM is working. I ran (as root):
> >
> > dd if=/dev/urandom of=tempfile count=100 bs=512k oflag=direct
> > hdparm --fibmap tempfile
> > (then took the first begin_LBA)
> > hdparm --read-sector 191710208 /dev/sda
> >
> > and it printed:
> >
> > /dev/sda:
> > reading sector 191710208: succeeded
> > 3254 3a32 3834 313a 2b36 3030 303a 2030
> > 4544 5542 2047 3728 3a29 5320 7661 6e69
> > 2067 6e69 6564 2078 6f66 2072 7270 646f
> > ............
> >
> > Then I ran:
> > rm tempfile
> > sync
> > hdparm --read-sector 191710208 /dev/sda
> >
> > and it still returned:
> > /dev/sda:
> > reading sector 191710208: succeeded
> > 3254 3a32 3834 313a 2b36 3030 303a 2030
> > 4544 5542 2047 3728 3a29 5320 7661 6e69
> > 2067 6e69 6564 2078 6f66 2072 7270 646f
> > .............
> >
> > If TRIM is working, the sector is supposed to contain all zeros.
> That is only if the SSD implements TRIM so that it zeroes the discarded
> blocks. You can check if the device at least knows enough to report
> discard_zeroes_data:
> cat /sys/block/sda/queue/discard_zeroes_data
> Also, even if the SSD does zero, it may take some time -- well after the
> discard has completed -- for the SSD to actually zero the blocks (this
> is true of some lesser quality SSDs).
> > Am I doing something wrong or do I need to do something more to get
> > LVM to pass down the TRIM command?
> LVM (device mapper specifically) passes discards perfectly well. But
> the snapshot and dm-crypt targets don't have discard support. All other
> targets do have discard support. What is your output for: dmsetup table
> Mike
I get 0 back when I do "cat /sys/block/sda/queue/discard_zeroes_data" so
does this mean that my drive doesn't zero out the TRIMed blocks? I don't
really care if it does or does not, I just want to make sure the drive is
getting the TRIM commands as the benchmarks I've seen show performance
degradation if they aren't used.
Regarding snapshot targets, the only instance of "discard not supported"
came as I did a snapshot today:
Apr 12 15:27:58 darknovanick kernel: [65147.211211] EXT4-fs warning (device
dm-0): release_blocks_on_commit:2672: discard not supported, disabling
Apr 12 15:28:16 darknovanick kernel: [65166.103061] EXT4-fs (dm-1): 72
orphan inodes deleted
Apr 12 15:28:16 darknovanick kernel: [65166.103062] EXT4-fs (dm-1):
recovery complete
Apr 12 15:28:17 darknovanick kernel: [65166.232257] EXT4-fs (dm-1): mounted
filesystem with ordered data mode. Opts: (null)
This snapshot was of a LV on the SSD, but the snapshot itself was being
stored on a mechanical hard drive. Does this message mean that discard was
just disabled for the snapshot itself (which was deleted a half hour
later), or is it now disabled for the entire volume now that I've done a
snapshot, even though its been removed?
My dmsetup table output:
vg0-vol0: 0 249438208 linear 8:2 512
Thanks,
Nick
[-- Attachment #2: Type: text/html, Size: 5644 bytes --]
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] Testing TRIM with LVM
2011-04-13 1:47 ` DarkNovaNick
@ 2011-04-13 8:41 ` Zdenek Kabelac
2011-04-13 15:38 ` DarkNovaNick
2011-04-13 22:40 ` [linux-lvm] [PATCH] dm snapshot: add discard support to the snapshot-origin target [was: Re: Testing TRIM with LVM] Mike Snitzer
1 sibling, 1 reply; 54+ messages in thread
From: Zdenek Kabelac @ 2011-04-13 8:41 UTC (permalink / raw)
To: LVM general discussion and development; +Cc: Mike Snitzer, DarkNovaNick
Dne 13.4.2011 03:47, DarkNovaNick@gmail.com napsal(a):
> On Apr 12, 2011 6:47pm, Mike Snitzer <snitzer@redhat.com> wrote:
>> On Tue, Apr 12 2011 at 10:59am -0400,
>
>> DarkNovaNick@gmail.com DarkNovaNick@gmail.com> wrote:
>
>
>
>> > I recently purchased a Crucial C300 SSD and set it up as my primary
>
>> > drive. I wanted to use LVM, and my reading indicates that LVM added
>
>> > support to pass-through TRIM commands with kernel 2.6.37. I'm
>
>> > running Ubuntu 10.10, but I installed the latest 11.04 kernel, so
>
>> > I'm running kernel 2.6.38-8. I added "discard" to fstab so my
>
>> > mountpoint looks like:
>
>> >
>
>> > /dev/mapper/vg0-vol0 on / type ext4
>
>> > (rw,noatime,nodiratime,errors=remount-ro,discard,commit=0)
>
>
>
>> If the discards aren't working then ext4 will print a warning and stop
>
>> issuing them, see: http://git.kernel.org/linus/a30eec2a8650a77f7
>
>
>
>> So if you don't see "discard not supported, disabling" in your dmesg (or
>
>> /var/log/messages) then TRIM is likely working as expected.
>
>
>
>> > I found directions on various sites like:
>
>> >
>> http://duopetalflower.blogspot.com/2010/11/enterprise-kernel-6-has-ssd-trim.html
>>
>
>> > on how to confirm if TRIM is working. I ran (as root):
>
>> >
>
>> > dd if=/dev/urandom of=tempfile count=100 bs=512k oflag=direct
>
>> > hdparm --fibmap tempfile
>
>> > (then took the first begin_LBA)
>
>> > hdparm --read-sector 191710208 /dev/sda
>
>> >
>
>> > and it printed:
>
>> >
>
>> > /dev/sda:
>
>> > reading sector 191710208: succeeded
>
>> > 3254 3a32 3834 313a 2b36 3030 303a 2030
>
>> > 4544 5542 2047 3728 3a29 5320 7661 6e69
>
>> > 2067 6e69 6564 2078 6f66 2072 7270 646f
>
>> > ............
>
>> >
>
>> > Then I ran:
>
>> > rm tempfile
>
>> > sync
>
>> > hdparm --read-sector 191710208 /dev/sda
>
>> >
>
>> > and it still returned:
>
>> > /dev/sda:
>
>> > reading sector 191710208: succeeded
>
>> > 3254 3a32 3834 313a 2b36 3030 303a 2030
>
>> > 4544 5542 2047 3728 3a29 5320 7661 6e69
>
>> > 2067 6e69 6564 2078 6f66 2072 7270 646f
>
>> > .............
>
>> >
>
>> > If TRIM is working, the sector is supposed to contain all zeros.
>
>
>
>> That is only if the SSD implements TRIM so that it zeroes the discarded
>
>> blocks. You can check if the device at least knows enough to report
>
>> discard_zeroes_data:
>
>
>
>> cat /sys/block/sda/queue/discard_zeroes_data
>
>
>
>> Also, even if the SSD does zero, it may take some time -- well after the
>
>> discard has completed -- for the SSD to actually zero the blocks (this
>
>> is true of some lesser quality SSDs).
>
>
>
>> > Am I doing something wrong or do I need to do something more to get
>
>> > LVM to pass down the TRIM command?
>
>
>
>> LVM (device mapper specifically) passes discards perfectly well. But
>
>> the snapshot and dm-crypt targets don't have discard support. All other
>
>> targets do have discard support. What is your output for: dmsetup table
>
>
>
>> Mike
>
>
>
>
> I get 0 back when I do "cat /sys/block/sda/queue/discard_zeroes_data" so does
> this mean that my drive doesn't zero out the TRIMed blocks? I don't really
> care if it does or does not, I just want to make sure the drive is getting the
> TRIM commands as the benchmarks I've seen show performance degradation if they
> aren't used.
As I've the same drive - what kind of performance degradation do you see?
Note: this drive (unlike many others) is processing TRIM command on
background. So 'traditional' benchmark do not provide valuable results.
(As we've been evaluating this drive somewhat to compare with Intel).
Outcome is - when you issue TRIM command - it's immediately finished - but the
performance of drive is temporarily decreased while the drive is performing
block trimming for some time. Thus you might get 'impressive' results when
some benchmarks are evaluating TRIM performance ;)
Zdenek
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] Testing TRIM with LVM
2011-04-13 8:41 ` Zdenek Kabelac
@ 2011-04-13 15:38 ` DarkNovaNick
0 siblings, 0 replies; 54+ messages in thread
From: DarkNovaNick @ 2011-04-13 15:38 UTC (permalink / raw)
To: Zdenek Kabelac, DarkNovaNick
Cc: Mike Snitzer, LVM general discussion and development
[-- Attachment #1: Type: text/plain, Size: 4741 bytes --]
On Apr 13, 2011 3:41am, Zdenek Kabelac <zdenek.kabelac@gmail.com> wrote:
> Dne 13.4.2011 03:47, DarkNovaNick@gmail.com napsal(a):
> > On Apr 12, 2011 6:47pm, Mike Snitzer snitzer@redhat.com> wrote:
> >> On Tue, Apr 12 2011 at 10:59am -0400,
> >
> >> DarkNovaNick@gmail.com DarkNovaNick@gmail.com> wrote:
> >
> >
> >
> >> > I recently purchased a Crucial C300 SSD and set it up as my primary
> >
> >> > drive. I wanted to use LVM, and my reading indicates that LVM added
> >
> >> > support to pass-through TRIM commands with kernel 2.6.37. I'm
> >
> >> > running Ubuntu 10.10, but I installed the latest 11.04 kernel, so
> >
> >> > I'm running kernel 2.6.38-8. I added "discard" to fstab so my
> >
> >> > mountpoint looks like:
> >
> >> >
> >
> >> > /dev/mapper/vg0-vol0 on / type ext4
> >
> >> > (rw,noatime,nodiratime,errors=remount-ro,discard,commit=0)
> >
> >
> >
> >> If the discards aren't working then ext4 will print a warning and stop
> >
> >> issuing them, see: http://git.kernel.org/linus/a30eec2a8650a77f7
> >
> >
> >
> >> So if you don't see "discard not supported, disabling" in your dmesg
> (or
> >
> >> /var/log/messages) then TRIM is likely working as expected.
> >
> >
> >
> >> > I found directions on various sites like:
> >
> >> >
> >>
> http://duopetalflower.blogspot.com/2010/11/enterprise-kernel-6-has-ssd-trim.html
> >>
> >
> >> > on how to confirm if TRIM is working. I ran (as root):
> >
> >> >
> >
> >> > dd if=/dev/urandom of=tempfile count=100 bs=512k oflag=direct
> >
> >> > hdparm --fibmap tempfile
> >
> >> > (then took the first begin_LBA)
> >
> >> > hdparm --read-sector 191710208 /dev/sda
> >
> >> >
> >
> >> > and it printed:
> >
> >> >
> >
> >> > /dev/sda:
> >
> >> > reading sector 191710208: succeeded
> >
> >> > 3254 3a32 3834 313a 2b36 3030 303a 2030
> >
> >> > 4544 5542 2047 3728 3a29 5320 7661 6e69
> >
> >> > 2067 6e69 6564 2078 6f66 2072 7270 646f
> >
> >> > ............
> >
> >> >
> >
> >> > Then I ran:
> >
> >> > rm tempfile
> >
> >> > sync
> >
> >> > hdparm --read-sector 191710208 /dev/sda
> >
> >> >
> >
> >> > and it still returned:
> >
> >> > /dev/sda:
> >
> >> > reading sector 191710208: succeeded
> >
> >> > 3254 3a32 3834 313a 2b36 3030 303a 2030
> >
> >> > 4544 5542 2047 3728 3a29 5320 7661 6e69
> >
> >> > 2067 6e69 6564 2078 6f66 2072 7270 646f
> >
> >> > .............
> >
> >> >
> >
> >> > If TRIM is working, the sector is supposed to contain all zeros.
> >
> >
> >
> >> That is only if the SSD implements TRIM so that it zeroes the discarded
> >
> >> blocks. You can check if the device at least knows enough to report
> >
> >> discard_zeroes_data:
> >
> >
> >
> >> cat /sys/block/sda/queue/discard_zeroes_data
> >
> >
> >
> >> Also, even if the SSD does zero, it may take some time -- well after
> the
> >
> >> discard has completed -- for the SSD to actually zero the blocks (this
> >
> >> is true of some lesser quality SSDs).
> >
> >
> >
> >> > Am I doing something wrong or do I need to do something more to get
> >
> >> > LVM to pass down the TRIM command?
> >
> >
> >
> >> LVM (device mapper specifically) passes discards perfectly well. But
> >
> >> the snapshot and dm-crypt targets don't have discard support. All other
> >
> >> targets do have discard support. What is your output for: dmsetup table
> >
> >
> >
> >> Mike
> >
> >
> >
> >
> > I get 0 back when I do "cat /sys/block/sda/queue/discard_zeroes_data"
> so does
> > this mean that my drive doesn't zero out the TRIMed blocks? I don't
> really
> > care if it does or does not, I just want to make sure the drive is
> getting the
> > TRIM commands as the benchmarks I've seen show performance degradation
> if they
> > aren't used.
> As I've the same drive - what kind of performance degradation do you see?
> Note: this drive (unlike many others) is processing TRIM command on
> background. So 'traditional' benchmark do not provide valuable results.
> (As we've been evaluating this drive somewhat to compare with Intel).
> Outcome is - when you issue TRIM command - it's immediately finished -
> but the
> performance of drive is temporarily decreased while the drive is
> performing
> block trimming for some time. Thus you might get 'impressive' results when
> some benchmarks are evaluating TRIM performance ;)
> Zdenek
I may not have been clear, but the benchmarks I am referring to show the
performance degradation if you are *not* using TRIM (ie just relying on the
drive's internal garbage collector). Which is why I want to make sure the
TRIM is working so it doesn't get into this poor state.
Nick
[-- Attachment #2: Type: text/html, Size: 8591 bytes --]
^ permalink raw reply [flat|nested] 54+ messages in thread
* [linux-lvm] [PATCH] dm snapshot: add discard support to the snapshot-origin target [was: Re: Testing TRIM with LVM]
2011-04-13 1:47 ` DarkNovaNick
2011-04-13 8:41 ` Zdenek Kabelac
@ 2011-04-13 22:40 ` Mike Snitzer
2011-04-13 23:48 ` [linux-lvm] " Mike Snitzer
2011-04-14 15:31 ` [linux-lvm] [PATCH] dm snapshot: add discard support to the snapshot-origin target [was: Re: Testing TRIM wi DarkNovaNick
1 sibling, 2 replies; 54+ messages in thread
From: Mike Snitzer @ 2011-04-13 22:40 UTC (permalink / raw)
To: DarkNovaNick; +Cc: dm-devel, LVM general discussion and development
On Tue, Apr 12 2011 at 9:47pm -0400,
DarkNovaNick@gmail.com <DarkNovaNick@gmail.com> wrote:
> I get 0 back when I do "cat
> /sys/block/sda/queue/discard_zeroes_data" so does this mean that my
> drive doesn't zero out the TRIMed blocks? I don't really care if it
> does or does not, I just want to make sure the drive is getting the
> TRIM commands as the benchmarks I've seen show performance
> degradation if they aren't used.
Right, means TRIM doesn't zero the TRIM'd blocks.
But otherwise it looks like you're in good shape relative to TRIM.
> Regarding snapshot targets, the only instance of "discard not
> supported" came as I did a snapshot today:
>
> Apr 12 15:27:58 darknovanick kernel: [65147.211211] EXT4-fs warning
> (device dm-0): release_blocks_on_commit:2672: discard not supported,
> disabling
> Apr 12 15:28:16 darknovanick kernel: [65166.103061] EXT4-fs (dm-1):
> 72 orphan inodes deleted
> Apr 12 15:28:16 darknovanick kernel: [65166.103062] EXT4-fs (dm-1):
> recovery complete
> Apr 12 15:28:17 darknovanick kernel: [65166.232257] EXT4-fs (dm-1):
> mounted filesystem with ordered data mode. Opts: (null)
>
> This snapshot was of a LV on the SSD, but the snapshot itself was
> being stored on a mechanical hard drive. Does this message mean that
> discard was just disabled for the snapshot itself (which was deleted
> a half hour later), or is it now disabled for the entire volume now
> that I've done a snapshot, even though its been removed?
Unfortunately, the failed discard to the snapshot-origin device disables
ext4's discards for the entire volume (even after you remove the
snapshot).
But the following patch fixes that:
Subject: dm snapshot: add discard support to the snapshot-origin target
Allow the snapshot-origin target to pass discards to the origin device
(after any copyout of the region being discarded; but generally speaking
that copyout won't be needed as the region will have already been copied
to the cow).
So for example, when you remove a file from an ext4 mounted origin
volume it triggers copyout to the snapshot. When the DISCARD is then
issued during the subsequent journal commit no copyout is needed.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/dm-snap.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index a2d3309..028d216 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -2076,6 +2076,7 @@ static int origin_ctr(struct dm_target *ti, unsigned int argc, char **argv)
ti->private = dev;
ti->num_flush_requests = 1;
+ ti->num_discard_requests = 1;
return 0;
}
@@ -2153,7 +2154,7 @@ static int origin_iterate_devices(struct dm_target *ti,
static struct target_type origin_target = {
.name = "snapshot-origin",
- .version = {1, 7, 1},
+ .version = {1, 8, 0},
.module = THIS_MODULE,
.ctr = origin_ctr,
.dtr = origin_dtr,
^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: [linux-lvm] dm snapshot: add discard support to the snapshot-origin target [was: Re: Testing TRIM with LVM]
2011-04-13 22:40 ` [linux-lvm] [PATCH] dm snapshot: add discard support to the snapshot-origin target [was: Re: Testing TRIM with LVM] Mike Snitzer
@ 2011-04-13 23:48 ` Mike Snitzer
2011-04-26 17:32 ` Mike Snitzer
2011-04-14 15:31 ` [linux-lvm] [PATCH] dm snapshot: add discard support to the snapshot-origin target [was: Re: Testing TRIM wi DarkNovaNick
1 sibling, 1 reply; 54+ messages in thread
From: Mike Snitzer @ 2011-04-13 23:48 UTC (permalink / raw)
To: DarkNovaNick; +Cc: dm-devel, LVM general discussion and development
On Wed, Apr 13 2011 at 6:40pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
> On Tue, Apr 12 2011 at 9:47pm -0400,
> DarkNovaNick@gmail.com <DarkNovaNick@gmail.com> wrote:
>
> > I get 0 back when I do "cat
> > /sys/block/sda/queue/discard_zeroes_data" so does this mean that my
> > drive doesn't zero out the TRIMed blocks? I don't really care if it
> > does or does not, I just want to make sure the drive is getting the
> > TRIM commands as the benchmarks I've seen show performance
> > degradation if they aren't used.
>
> Right, means TRIM doesn't zero the TRIM'd blocks.
>
> But otherwise it looks like you're in good shape relative to TRIM.
>
> > Regarding snapshot targets, the only instance of "discard not
> > supported" came as I did a snapshot today:
> >
> > Apr 12 15:27:58 darknovanick kernel: [65147.211211] EXT4-fs warning
> > (device dm-0): release_blocks_on_commit:2672: discard not supported,
> > disabling
> > Apr 12 15:28:16 darknovanick kernel: [65166.103061] EXT4-fs (dm-1):
> > 72 orphan inodes deleted
> > Apr 12 15:28:16 darknovanick kernel: [65166.103062] EXT4-fs (dm-1):
> > recovery complete
> > Apr 12 15:28:17 darknovanick kernel: [65166.232257] EXT4-fs (dm-1):
> > mounted filesystem with ordered data mode. Opts: (null)
> >
> > This snapshot was of a LV on the SSD, but the snapshot itself was
> > being stored on a mechanical hard drive. Does this message mean that
> > discard was just disabled for the snapshot itself (which was deleted
> > a half hour later), or is it now disabled for the entire volume now
> > that I've done a snapshot, even though its been removed?
>
> Unfortunately, the failed discard to the snapshot-origin device disables
> ext4's discards for the entire volume (even after you remove the
> snapshot).
>
> But the following patch fixes that:
>
> Subject: dm snapshot: add discard support to the snapshot-origin target
>
> Allow the snapshot-origin target to pass discards to the origin device
> (after any copyout of the region being discarded; but generally speaking
> that copyout won't be needed as the region will have already been copied
> to the cow).
>
> So for example, when you remove a file from an ext4 mounted origin
> volume it triggers copyout to the snapshot. When the DISCARD is then
> issued during the subsequent journal commit no copyout is needed.
It should be noted that discards are treated as writes. This means that
a snapshot should not be taken of an origin volume before a filesystem
is placed on that origin. Doing so may result in the entire origin
volume being discarded (mkfs.ext4 does this) which would cause copyout
of the entire origin volume to the snapshot.
Mike
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [PATCH] dm snapshot: add discard support to the snapshot-origin target [was: Re: Testing TRIM wi
2011-04-13 22:40 ` [linux-lvm] [PATCH] dm snapshot: add discard support to the snapshot-origin target [was: Re: Testing TRIM with LVM] Mike Snitzer
2011-04-13 23:48 ` [linux-lvm] " Mike Snitzer
@ 2011-04-14 15:31 ` DarkNovaNick
1 sibling, 0 replies; 54+ messages in thread
From: DarkNovaNick @ 2011-04-14 15:31 UTC (permalink / raw)
To: Mike Snitzer; +Cc: LVM general discussion and development
[-- Attachment #1: Type: text/plain, Size: 1772 bytes --]
On Apr 13, 2011 5:40pm, Mike Snitzer <snitzer@redhat.com> wrote:
> Unfortunately, the failed discard to the snapshot-origin device disables
> ext4's discards for the entire volume (even after you remove the
> snapshot).
> But the following patch fixes that:
> Subject: dm snapshot: add discard support to the snapshot-origin target
> Allow the snapshot-origin target to pass discards to the origin device
> (after any copyout of the region being discarded; but generally speaking
> that copyout won't be needed as the region will have already been copied
> to the cow).
> So for example, when you remove a file from an ext4 mounted origin
> volume it triggers copyout to the snapshot. When the DISCARD is then
> issued during the subsequent journal commit no copyout is needed.
> Signed-off-by: Mike Snitzer snitzer@redhat.com>
> ---
> drivers/md/dm-snap.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
> diff --git a/drivers/md/dm-snap.cb/drivers/md/dm-snap.c
> index a2d3309..028d216 100644
> --- a/drivers/md/dm-snap.c
> +++ b/drivers/md/dm-snap.c
> @@ -2076,6 +2076,7 @@ static int origin_ctr(struct dm_target *ti,
> unsigned int argc, char **argv)
> ti->private = dev;
> ti->num_flush_requests = 1;
> + ti->num_discard_requests = 1;
> return 0;
> }
> @@ -2153,7 +2154,7 @@ static int origin_iterate_devices(struct dm_target
> *ti,
> static struct target_type origin_target = {
> .name = "snapshot-origin",
> - .version = {1, 7, 1},
> + .version = {1, 8, 0},
> .module = THIS_MODULE,
> .ctr = origin_ctr,
> .dtr = origin_dtr,
Thanks, I will put that patch in my kernel. Do you anticipate this change
making its way into the main kernel source at some point in the future?
Thanks again,
Nick
[-- Attachment #2: Type: text/html, Size: 2717 bytes --]
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] dm snapshot: add discard support to the snapshot-origin target [was: Re: Testing TRIM with LVM]
2011-04-13 23:48 ` [linux-lvm] " Mike Snitzer
@ 2011-04-26 17:32 ` Mike Snitzer
2011-04-28 0:19 ` [linux-lvm] [PATCH] dm snapshot: ignore discards issued to the snapshot-origin target Mike Snitzer
0 siblings, 1 reply; 54+ messages in thread
From: Mike Snitzer @ 2011-04-26 17:32 UTC (permalink / raw)
To: DarkNovaNick; +Cc: dm-devel, LVM general discussion and development
On Wed, Apr 13 2011 at 7:48pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
> On Wed, Apr 13 2011 at 6:40pm -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
>
> > Unfortunately, the failed discard to the snapshot-origin device disables
> > ext4's discards for the entire volume (even after you remove the
> > snapshot).
> >
> > But the following patch fixes that:
> >
> > Subject: dm snapshot: add discard support to the snapshot-origin target
> >
> > Allow the snapshot-origin target to pass discards to the origin device
> > (after any copyout of the region being discarded; but generally speaking
> > that copyout won't be needed as the region will have already been copied
> > to the cow).
> >
> > So for example, when you remove a file from an ext4 mounted origin
> > volume it triggers copyout to the snapshot. When the DISCARD is then
> > issued during the subsequent journal commit no copyout is needed.
>
> It should be noted that discards are treated as writes. This means that
> a snapshot should not be taken of an origin volume before a filesystem
> is placed on that origin. Doing so may result in the entire origin
> volume being discarded (mkfs.ext4 does this) which would cause copyout
> of the entire origin volume to the snapshot.
Just a quick follow-up. We had a discussion on snapshot-origin discard
support and arrived at this needing to be done in 2 steps:
1) have snapshot-origin ignore discard but return success
2) process the discard, discard treated as a write to the origin, if
user passes an option to enable it
The concern is that the patch I proposed can break existing scripts that
create filesystems on pristine origin devices (that have a snapshot(s)).
Instead of only copying out the various minimalist changes that mkfs
does (when discard isn't used) the entire origin device would be copied
out (when the entire device is discarded early in mkfs.ext4).
I'll circle back to working on this in a few weeks (the annoying bit is
the userspace change to make the snapshot-origin's discard behavior
configurable via lvcreate -s --enable-origin-discards ...; maybe someone
else will get to it before I do :)
Mike
^ permalink raw reply [flat|nested] 54+ messages in thread
* [linux-lvm] [PATCH] dm snapshot: ignore discards issued to the snapshot-origin target
2011-04-26 17:32 ` Mike Snitzer
@ 2011-04-28 0:19 ` Mike Snitzer
2011-04-28 7:53 ` [linux-lvm] [dm-devel] " Christoph Hellwig
0 siblings, 1 reply; 54+ messages in thread
From: Mike Snitzer @ 2011-04-28 0:19 UTC (permalink / raw)
To: dm-devel; +Cc: DarkNovaNick, LVM general discussion and development
Discards pose a problem for the snapshot-origin target because they are
treated as writes. Treating a discard as a write would trigger a
copyout to the snapshot. Such copyout can prove too costly in the face
of otherwise benign scenarios (e.g. create a snapshot and then mkfs.ext4
the origin -- mkfs.ext4 discards the entire volume by default, which
would copyout the entire origin volume to the snapshot).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/dm-snap.c | 9 ++++++++-
1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index a2d3309..639af12 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -2076,6 +2076,7 @@ static int origin_ctr(struct dm_target *ti, unsigned int argc, char **argv)
ti->private = dev;
ti->num_flush_requests = 1;
+ ti->num_discard_requests = 1;
return 0;
}
@@ -2095,6 +2096,12 @@ static int origin_map(struct dm_target *ti, struct bio *bio,
if (bio->bi_rw & REQ_FLUSH)
return DM_MAPIO_REMAPPED;
+ if (bio->bi_rw & REQ_DISCARD) {
+ /* ignore discard requests */
+ bio_endio(bio, 0);
+ return DM_MAPIO_SUBMITTED;
+ }
+
/* Only tell snapshots if this is a write */
return (bio_rw(bio) == WRITE) ? do_origin(dev, bio) : DM_MAPIO_REMAPPED;
}
@@ -2153,7 +2160,7 @@ static int origin_iterate_devices(struct dm_target *ti,
static struct target_type origin_target = {
.name = "snapshot-origin",
- .version = {1, 7, 1},
+ .version = {1, 8, 0},
.module = THIS_MODULE,
.ctr = origin_ctr,
.dtr = origin_dtr,
^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] [PATCH] dm snapshot: ignore discards issued to the snapshot-origin target
2011-04-28 0:19 ` [linux-lvm] [PATCH] dm snapshot: ignore discards issued to the snapshot-origin target Mike Snitzer
@ 2011-04-28 7:53 ` Christoph Hellwig
2011-04-28 20:59 ` [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target] Mike Snitzer
0 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2011-04-28 7:53 UTC (permalink / raw)
To: device-mapper development
Cc: DarkNovaNick, LVM general discussion and development
On Wed, Apr 27, 2011 at 08:19:13PM -0400, Mike Snitzer wrote:
> Discards pose a problem for the snapshot-origin target because they are
> treated as writes. Treating a discard as a write would trigger a
> copyout to the snapshot. Such copyout can prove too costly in the face
> of otherwise benign scenarios (e.g. create a snapshot and then mkfs.ext4
> the origin -- mkfs.ext4 discards the entire volume by default, which
> would copyout the entire origin volume to the snapshot).
You also need to make sure that we don't claim discard_zeroes_data for
the origin volume in this case. Especially as ext4 started to rely
on this actually working (very bad idea IMHO, but that's another story)
^ permalink raw reply [flat|nested] 54+ messages in thread
* [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-04-28 7:53 ` [linux-lvm] [dm-devel] " Christoph Hellwig
@ 2011-04-28 20:59 ` Mike Snitzer
2011-04-28 21:28 ` Eric Sandeen
` (2 more replies)
0 siblings, 3 replies; 54+ messages in thread
From: Mike Snitzer @ 2011-04-28 20:59 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: sandeen, dm-devel, linux-ext4, DarkNovaNick, linux-lvm
[cc'ing linux-ext4]
On Thu, Apr 28 2011 at 3:53am -0400,
Christoph Hellwig <hch@infradead.org> wrote:
> On Wed, Apr 27, 2011 at 08:19:13PM -0400, Mike Snitzer wrote:
> > Discards pose a problem for the snapshot-origin target because they are
> > treated as writes. Treating a discard as a write would trigger a
> > copyout to the snapshot. Such copyout can prove too costly in the face
> > of otherwise benign scenarios (e.g. create a snapshot and then mkfs.ext4
> > the origin -- mkfs.ext4 discards the entire volume by default, which
> > would copyout the entire origin volume to the snapshot).
>
> You also need to make sure that we don't claim discard_zeroes_data for
> the origin volume in this case. Especially as ext4 started to rely
> on this actually working (very bad idea IMHO, but that's another story)
Eric Sandeen helped me see that having the DM snapshot-origin target
return success but actually ignore discards is just bad form.
Especially when you consider that this exercise was motivated by the
fact that ext4 will disable discards on the first discard failure, see:
http://www.redhat.com/archives/dm-devel/2011-April/msg00070.html
Eric and I think it is best to revert this commit:
a30eec2 ext4: stop issuing discards if not supported by device
(though ideally ext4 would still WARN_ONCE per superblock with something
like: "discard failed, please consider disabling discard support")
1) The user asked for discards (with '-o discard' mount option)
- what is the real harm in coninuing to issue them even if it _seems_
they aren't supported?
2) assuming the entire block device uniformly supports discards can
be flawed (a DM device's discard support can vary based on logical
offset).
Thoughts?
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-04-28 20:59 ` [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target] Mike Snitzer
@ 2011-04-28 21:28 ` Eric Sandeen
2011-04-28 22:59 ` Alasdair G Kergon
2011-04-29 1:12 ` Andreas Dilger
2011-04-29 9:30 ` Lukas Czerner
2 siblings, 1 reply; 54+ messages in thread
From: Eric Sandeen @ 2011-04-28 21:28 UTC (permalink / raw)
To: Mike Snitzer
Cc: Christoph Hellwig, dm-devel, linux-ext4, DarkNovaNick, linux-lvm
On 4/28/11 3:59 PM, Mike Snitzer wrote:
> [cc'ing linux-ext4]
>
> On Thu, Apr 28 2011 at 3:53am -0400,
> Christoph Hellwig <hch@infradead.org> wrote:
>
>> On Wed, Apr 27, 2011 at 08:19:13PM -0400, Mike Snitzer wrote:
>>> Discards pose a problem for the snapshot-origin target because they are
>>> treated as writes. Treating a discard as a write would trigger a
>>> copyout to the snapshot. Such copyout can prove too costly in the face
>>> of otherwise benign scenarios (e.g. create a snapshot and then mkfs.ext4
>>> the origin -- mkfs.ext4 discards the entire volume by default, which
>>> would copyout the entire origin volume to the snapshot).
>>
>> You also need to make sure that we don't claim discard_zeroes_data for
>> the origin volume in this case. Especially as ext4 started to rely
>> on this actually working (very bad idea IMHO, but that's another story)
>
> Eric Sandeen helped me see that having the DM snapshot-origin target
> return success but actually ignore discards is just bad form.
>
> Especially when you consider that this exercise was motivated by the
> fact that ext4 will disable discards on the first discard failure, see:
> http://www.redhat.com/archives/dm-devel/2011-April/msg00070.html
>
> Eric and I think it is best to revert this commit:
> a30eec2 ext4: stop issuing discards if not supported by device
>
> (though ideally ext4 would still WARN_ONCE per superblock with something
> like: "discard failed, please consider disabling discard support")
>
> 1) The user asked for discards (with '-o discard' mount option)
> - what is the real harm in coninuing to issue them even if it _seems_
> they aren't supported?
TBH I sent a30eec2 on a whim. Seemed reasonable at the time, but if
discard-ability changes over time, it may not be the best plan.
> 2) assuming the entire block device uniformly supports discards can
> be flawed (a DM device's discard support can vary based on logical
> offset).
I still think that concats of floppies, usb disks, and ssds should be rare, so I'm less concerned about that ;)
I think Mike is right though, that if you do not do anything with a discard, you should return -EOPNOTSUPP, and not pretend that you honored it.
We should, IMHO, deal with the truth of the matter at the filesystem caller.
-Eric
> Thoughts?
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-04-28 21:28 ` Eric Sandeen
@ 2011-04-28 22:59 ` Alasdair G Kergon
2011-04-28 23:01 ` Eric Sandeen
0 siblings, 1 reply; 54+ messages in thread
From: Alasdair G Kergon @ 2011-04-28 22:59 UTC (permalink / raw)
To: LVM general discussion and development, Mike Snitzer,
Christoph Hellwig, dm-devel, linux-ext4, DarkNovaNick
On Thu, Apr 28, 2011 at 04:28:17PM -0500, Eric Sandeen wrote:
> I still think that concats of floppies, usb disks, and ssds should be rare, so I'm less concerned about that ;)
It's the 'undefined' cases that cause us the trouble though: we do have
to return something and I prefer to work with defined and documented
behaviour, rather than pretending something is 'undefined' or
'unimportant' and later finding people relied on its actual behaviour.
Alasdair
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-04-28 22:59 ` Alasdair G Kergon
@ 2011-04-28 23:01 ` Eric Sandeen
2011-04-28 23:11 ` Alasdair G Kergon
0 siblings, 1 reply; 54+ messages in thread
From: Eric Sandeen @ 2011-04-28 23:01 UTC (permalink / raw)
To: LVM general discussion and development, Mike Snitzer,
Christoph Hellwig, dm-devel, linux-ext4, DarkNovaNick
On 4/28/11 5:59 PM, Alasdair G Kergon wrote:
> On Thu, Apr 28, 2011 at 04:28:17PM -0500, Eric Sandeen wrote:
>> I still think that concats of floppies, usb disks, and ssds should be rare, so I'm less concerned about that ;)
>
> It's the 'undefined' cases that cause us the trouble though: we do have
> to return something and I prefer to work with defined and documented
> behaviour, rather than pretending something is 'undefined' or
> 'unimportant' and later finding people relied on its actual behaviour.
If you are saying that a target should return EOPNOTSUPP when it does not support the operation, then we are in violent agreement. :)
-Eric
> Alasdair
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-04-28 23:01 ` Eric Sandeen
@ 2011-04-28 23:11 ` Alasdair G Kergon
0 siblings, 0 replies; 54+ messages in thread
From: Alasdair G Kergon @ 2011-04-28 23:11 UTC (permalink / raw)
To: Eric Sandeen
Cc: Mike Snitzer, Christoph Hellwig, dm-devel, DarkNovaNick,
LVM general discussion and development, linux-ext4
On Thu, Apr 28, 2011 at 06:01:18PM -0500, Eric Sandeen wrote:
> If you are saying that a target should return EOPNOTSUPP when it does not support the operation, then we are in violent agreement. :)
Naturally, with the definition you used in the first post.
(I.e. if the target does *anything* with it, you do not get that error, even
if it did not process it completely.)
Alasdair
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-04-28 20:59 ` [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target] Mike Snitzer
2011-04-28 21:28 ` Eric Sandeen
@ 2011-04-29 1:12 ` Andreas Dilger
2011-04-29 13:55 ` Mike Snitzer
2011-04-29 9:30 ` Lukas Czerner
2 siblings, 1 reply; 54+ messages in thread
From: Andreas Dilger @ 2011-04-29 1:12 UTC (permalink / raw)
To: Mike Snitzer
Cc: sandeen, Christoph Hellwig, dm-devel, DarkNovaNick, linux-lvm,
linux-ext4
On Apr 28, 2011, at 14:59, Mike Snitzer wrote:
> On Thu, Apr 28 2011 at 3:53am -0400,
> Christoph Hellwig <hch@infradead.org> wrote:
>> On Wed, Apr 27, 2011 at 08:19:13PM -0400, Mike Snitzer wrote:
>>> Discards pose a problem for the snapshot-origin target because they are
>>> treated as writes. Treating a discard as a write would trigger a
>>> copyout to the snapshot. Such copyout can prove too costly in the face
>>> of otherwise benign scenarios (e.g. create a snapshot and then mkfs.ext4
>>> the origin -- mkfs.ext4 discards the entire volume by default, which
>>> would copyout the entire origin volume to the snapshot).
>>
>> You also need to make sure that we don't claim discard_zeroes_data for
>> the origin volume in this case. Especially as ext4 started to rely
>> on this actually working (very bad idea IMHO, but that's another story)
>
> Eric Sandeen helped me see that having the DM snapshot-origin target
> return success but actually ignore discards is just bad form.
>
> Especially when you consider that this exercise was motivated by the
> fact that ext4 will disable discards on the first discard failure, see:
> http://www.redhat.com/archives/dm-devel/2011-April/msg00070.html
>
> Eric and I think it is best to revert this commit:
> a30eec2 ext4: stop issuing discards if not supported by device
>
> (though ideally ext4 would still WARN_ONCE per superblock with something
> like: "discard failed, please consider disabling discard support")
>
> 1) The user asked for discards (with '-o discard' mount option)
> - what is the real harm in coninuing to issue them even if it _seems_
> they aren't supported?
Aren't discard operations considered barriers? If there are needless discard operations being sent to the disk then it seems like this would have a non-zero performance penalty, just by virtue of blocking IO submissions at the block device layer.
If my understanding of the mechanics is flawed, then please ignore.
> 2) assuming the entire block device uniformly supports discards can
> be flawed (a DM device's discard support can vary based on logical
> offset).
Hybrid storage is potentially an increasingly popular configuration. We've been looking at DM-mapped SSD RAID-1 + HDD RAID-6 arrays for ext4 to allow storing only the metadata on SSD. That said, it may make sense to have the DM translation layer check the component devices of an LV to see if discard is supported on any of them, and only return -EOPNOTSUPP if none of them do.
While not a perfect solution for the case where an LV is migrated onto a device that does support discard, I don't think we have to handle every corner case perfectly to still handle reasonable use cases in a sensible manner.
Cheers, Andreas
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-04-28 20:59 ` [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target] Mike Snitzer
2011-04-28 21:28 ` Eric Sandeen
2011-04-29 1:12 ` Andreas Dilger
@ 2011-04-29 9:30 ` Lukas Czerner
2011-04-29 12:24 ` [linux-lvm] [dm-devel] " Alasdair G Kergon
2 siblings, 1 reply; 54+ messages in thread
From: Lukas Czerner @ 2011-04-29 9:30 UTC (permalink / raw)
To: Mike Snitzer
Cc: sandeen, Christoph Hellwig, dm-devel, DarkNovaNick, linux-lvm,
linux-ext4
On Thu, 28 Apr 2011, Mike Snitzer wrote:
> [cc'ing linux-ext4]
>
> On Thu, Apr 28 2011 at 3:53am -0400,
> Christoph Hellwig <hch@infradead.org> wrote:
>
> > On Wed, Apr 27, 2011 at 08:19:13PM -0400, Mike Snitzer wrote:
> > > Discards pose a problem for the snapshot-origin target because they are
> > > treated as writes. Treating a discard as a write would trigger a
> > > copyout to the snapshot. Such copyout can prove too costly in the face
> > > of otherwise benign scenarios (e.g. create a snapshot and then mkfs.ext4
> > > the origin -- mkfs.ext4 discards the entire volume by default, which
> > > would copyout the entire origin volume to the snapshot).
> >
> > You also need to make sure that we don't claim discard_zeroes_data for
> > the origin volume in this case. Especially as ext4 started to rely
> > on this actually working (very bad idea IMHO, but that's another story)
I do not think that it is bad idea. It is supposed to work and we do not
want to "optimize" for broken devices (or broken cheap crap, as someone
concisely described before).
>
> Eric Sandeen helped me see that having the DM snapshot-origin target
> return success but actually ignore discards is just bad form.
>
> Especially when you consider that this exercise was motivated by the
> fact that ext4 will disable discards on the first discard failure, see:
> http://www.redhat.com/archives/dm-devel/2011-April/msg00070.html
>
> Eric and I think it is best to revert this commit:
> a30eec2 ext4: stop issuing discards if not supported by device
>
> (though ideally ext4 would still WARN_ONCE per superblock with something
> like: "discard failed, please consider disabling discard support")
I think that we do not need to revert it, we just need to do the "right
thing" in the underlying layers. That said:
1. We need to honor all the "discard limits" so the discard bios does
not actually fail.
2. If the device is composed of various other devices, we should return
-EOPNOTSUPP is none of the devices support discard.
3. We should succeed, if at least one of the devices support discard and
it does not fail for any reason.
4. We should not advertise discard_zeroes_data if any of the devices
does not zero data or does not support discard.
I am not sure how "hard" is to assure those conditions in DM. If those
conditions are met, we can rely on consistent information in the layers
above.
Thanks!
-Lukas
>
> 1) The user asked for discards (with '-o discard' mount option)
> - what is the real harm in coninuing to issue them even if it _seems_
> they aren't supported?
> 2) assuming the entire block device uniformly supports discards can
> be flawed (a DM device's discard support can vary based on logical
> offset).
>
> Thoughts?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-04-29 9:30 ` Lukas Czerner
@ 2011-04-29 12:24 ` Alasdair G Kergon
2011-04-29 12:29 ` Christoph Hellwig
2011-05-02 7:16 ` Lukas Czerner
0 siblings, 2 replies; 54+ messages in thread
From: Alasdair G Kergon @ 2011-04-29 12:24 UTC (permalink / raw)
To: Lukas Czerner
Cc: sandeen, Mike Snitzer, Christoph Hellwig, dm-devel, DarkNovaNick,
linux-lvm, linux-ext4
On Fri, Apr 29, 2011 at 11:30:47AM +0200, Lukas Czerner wrote:
> 1. We need to honor all the "discard limits" so the discard bios does
> not actually fail.
> 2. If the device is composed of various other devices, we should return
> -EOPNOTSUPP is none of the devices support discard.
> 3. We should succeed, if at least one of the devices support discard and
> it does not fail for any reason.
> 4. We should not advertise discard_zeroes_data if any of the devices
> does not zero data or does not support discard.
> I am not sure how "hard" is to assure those conditions in DM. If those
> conditions are met, we can rely on consistent information in the layers
> above.
Remember that -EOPNOTSUPP return applies only to that one *specific*
discard. It does not tell you for sure whether or not another future
discard to the device will succeed. (It's a property of offset - if
there are several devices underneath - and of time - if the device or
one below it gets reconfigured.)
The core issue here is whether a filesytem should decide that the
receipt of a single -EOPNOTSUPP is a reason never to send any more,
whether a more sophisticated algorithm should be used (considering
the proportion/offsets of them over given periods of time and retrying
later), or whether more comprehensive information about the discard
capabilities of the device should be presented - and whether this should
be handled automatically or whether it should be under userspace control
(i.e. the sysadmin can instruct the filesystem what to do).
dm's role is simply to handle a discard if it can, and report back
if it couldn't. Additionally it could provide more comprehensive
information about the discard capabilities of the device, but my sense
is that most consider this unnecessary as normally dm devices will have
a coherent behaviour throughout.
Alasdair
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-04-29 12:24 ` [linux-lvm] [dm-devel] " Alasdair G Kergon
@ 2011-04-29 12:29 ` Christoph Hellwig
2011-04-29 14:28 ` Eric Sandeen
2011-05-02 7:16 ` Lukas Czerner
1 sibling, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2011-04-29 12:29 UTC (permalink / raw)
To: Lukas Czerner, Mike Snitzer, sandeen, Christoph Hellwig, dm-devel,
DarkNovaNick, linux-lvm, linux-ext4
FYI, that disable discard on failure seems to be an ext4 special and no
one else picked up that stupid idea. If ext4 wants to misbehave for
that just let them..
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-04-29 1:12 ` Andreas Dilger
@ 2011-04-29 13:55 ` Mike Snitzer
0 siblings, 0 replies; 54+ messages in thread
From: Mike Snitzer @ 2011-04-29 13:55 UTC (permalink / raw)
To: Andreas Dilger
Cc: Christoph Hellwig, sandeen,
linux-ext4@vger.kernel.org device-mapper development,
DarkNovaNick, linux-lvm
On Thu, Apr 28 2011 at 9:12pm -0400,
Andreas Dilger <adilger.kernel@dilger.ca> wrote:
> On Apr 28, 2011, at 14:59, Mike Snitzer wrote:
> > Eric and I think it is best to revert this commit:
> > a30eec2 ext4: stop issuing discards if not supported by device
> >
> > (though ideally ext4 would still WARN_ONCE per superblock with something
> > like: "discard failed, please consider disabling discard support")
> >
> > 1) The user asked for discards (with '-o discard' mount option)
> > - what is the real harm in coninuing to issue them even if it _seems_
> > they aren't supported?
>
> Aren't discard operations considered barriers? If there are needless
> discard operations being sent to the disk then it seems like this
> would have a non-zero performance penalty, just by virtue of blocking
> IO submissions at the block device layer.
The blkdev_issue_discard() interface is synchronous. Given that, yes it
will cause jbd2 to block waiting for it to complete.
For DM, the discard would not actually be issued to the physical storage
there is still work that would be done to submit the bio only to have DM
return -EOPNOTSUPP.
But ext4 doesn't care if it is on DM or not; so I agree that cost of
continuing to issue discards would be higher on non-DM storage.
That said, the user asked for discards via -o discard. Ext4 is taking
that to mean "no the user only _really_ asked for discards if they will
always work".
If ext4 is committed to that disable-on-first-failure hueristic then
Documentation/filesystems/ext4.txt should probably speak to it.
> > 2) assuming the entire block device uniformly supports discards can
> > be flawed (a DM device's discard support can vary based on logical
> > offset).
>
> Hybrid storage is potentially an increasingly popular configuration.
> We've been looking at DM-mapped SSD RAID-1 + HDD RAID-6 arrays for
> ext4 to allow storing only the metadata on SSD.
Meaning you assemble a tailormade DM linear (concat) device that mixes
the two? What are you using to assemble that device -- presummably some
code that is aware of ext4's metadata and data layout? Code hooked off
of mkfs?
> That said, it may
> make sense to have the DM translation layer check the component
> devices of an LV to see if discard is supported on any of them, and
> only return -EOPNOTSUPP if none of them do.
That is what DM does. DM will only return -EOPNOTSUPP if a particular
target doesn't support discards. If only one device supports discards
then the entire logical DM device advertises discard support.
Mike
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-04-29 12:29 ` Christoph Hellwig
@ 2011-04-29 14:28 ` Eric Sandeen
2011-04-29 15:13 ` Ray Morris
2011-05-04 16:33 ` Ted Ts'o
0 siblings, 2 replies; 54+ messages in thread
From: Eric Sandeen @ 2011-04-29 14:28 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Mike Snitzer, dm-devel, DarkNovaNick, linux-lvm, Lukas Czerner,
linux-ext4
On 4/29/11 7:29 AM, Christoph Hellwig wrote:
> FYI, that disable discard on failure seems to be an ext4 special and no
> one else picked up that stupid idea. If ext4 wants to misbehave for
> that just let them..
It was my "stupid idea," and I'm ok with reverting it ;)
-Eric
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-04-29 14:28 ` Eric Sandeen
@ 2011-04-29 15:13 ` Ray Morris
2011-05-04 16:33 ` Ted Ts'o
1 sibling, 0 replies; 54+ messages in thread
From: Ray Morris @ 2011-04-29 15:13 UTC (permalink / raw)
To: LVM general discussion and development
Cc: sandeen, Mike Snitzer, Lukas, Christoph Hellwig, dm-devel,
DarkNovaNick, Czerner, linux-ext4
> On 4/29/11 7:29 AM, Christoph Hellwig wrote:
> > no one else picked up that stupid idea. If ext4 wants to
> > misbehave for that just let them..
>
> It was my "stupid idea," and I'm ok with reverting it ;)
>
> -Eric
http://www.google.com/search?q=how+to+remove+foot+from+mouth
--
Ray Morris
support@bettercgi.com
Strongbox - The next generation in site security:
http://www.bettercgi.com/strongbox/
Throttlebox - Intelligent Bandwidth Control
http://www.bettercgi.com/throttlebox/
Strongbox / Throttlebox affiliate program:
http://www.bettercgi.com/affiliates/user/register.php
On Fri, 29 Apr 2011 09:28:51 -0500
Eric Sandeen <sandeen@redhat.com> wrote:
> On 4/29/11 7:29 AM, Christoph Hellwig wrote:
> > FYI, that disable discard on failure seems to be an ext4 special
> > and no one else picked up that stupid idea. If ext4 wants to
> > misbehave for that just let them..
>
> It was my "stupid idea," and I'm ok with reverting it ;)
>
> -Eric
> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
>
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-04-29 12:24 ` [linux-lvm] [dm-devel] " Alasdair G Kergon
2011-04-29 12:29 ` Christoph Hellwig
@ 2011-05-02 7:16 ` Lukas Czerner
2011-05-02 8:13 ` Alasdair G Kergon
1 sibling, 1 reply; 54+ messages in thread
From: Lukas Czerner @ 2011-05-02 7:16 UTC (permalink / raw)
To: Alasdair G Kergon
Cc: sandeen, Mike Snitzer, Christoph Hellwig, dm-devel, DarkNovaNick,
linux-lvm, Lukas Czerner, linux-ext4
On Fri, 29 Apr 2011, Alasdair G Kergon wrote:
> On Fri, Apr 29, 2011 at 11:30:47AM +0200, Lukas Czerner wrote:
> > 1. We need to honor all the "discard limits" so the discard bios does
> > not actually fail.
> > 2. If the device is composed of various other devices, we should return
> > -EOPNOTSUPP is none of the devices support discard.
> > 3. We should succeed, if at least one of the devices support discard and
> > it does not fail for any reason.
> > 4. We should not advertise discard_zeroes_data if any of the devices
> > does not zero data or does not support discard.
> > I am not sure how "hard" is to assure those conditions in DM. If those
> > conditions are met, we can rely on consistent information in the layers
> > above.
>
> Remember that -EOPNOTSUPP return applies only to that one *specific*
> discard. It does not tell you for sure whether or not another future
> discard to the device will succeed. (It's a property of offset - if
> there are several devices underneath - and of time - if the device or
> one below it gets reconfigured.)
>
> The core issue here is whether a filesytem should decide that the
> receipt of a single -EOPNOTSUPP is a reason never to send any more,
> whether a more sophisticated algorithm should be used (considering
> the proportion/offsets of them over given periods of time and retrying
> later), or whether more comprehensive information about the discard
> capabilities of the device should be presented - and whether this should
> be handled automatically or whether it should be under userspace control
> (i.e. the sysadmin can instruct the filesystem what to do).
>
> dm's role is simply to handle a discard if it can, and report back
> if it couldn't. Additionally it could provide more comprehensive
> information about the discard capabilities of the device, but my sense
> is that most consider this unnecessary as normally dm devices will have
> a coherent behaviour throughout.
>
> Alasdair
>
>
Hi Alasdair,
I like the last idea the best. Dm is the only one who has the
overall information about the storage, hence it should provide
some of that information for others to do the right thing.
Christoph mentioned that ext4 is the only one actually disabling
discard on -EOPNOTSUPP, but firstly that's not true (gfs2 and
nilfs does that as well) and secondly it is the right thing to
do! What else should we be doing when it tells us that "discard is
not supported"? continue trying ? that does not make sense at all!
However when we have dm device where part of the device supports
discard due to underlying hardware capability we just can not return
EOPNOTSUPP from blkdev_issue_discard, because it is just not true! It
does support discard, but it is not consistent through the whole device
and at some layer we should notice that and not let it through. How to
do this is just implementation detail. We can either do it at dm level,
while handling bios, or we can export "do not return eopnotsupp"
information and blkdev_issue_discard() should use that information to do
the right thing.
Thanks!
-Lukas
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-02 7:16 ` Lukas Czerner
@ 2011-05-02 8:13 ` Alasdair G Kergon
2011-05-02 8:19 ` Christoph Hellwig
0 siblings, 1 reply; 54+ messages in thread
From: Alasdair G Kergon @ 2011-05-02 8:13 UTC (permalink / raw)
To: device-mapper development
Cc: sandeen, Mike Snitzer, Christoph Hellwig, DarkNovaNick, linux-lvm,
Lukas Czerner, linux-ext4, Alasdair G Kergon
On Mon, May 02, 2011 at 09:16:21AM +0200, Lukas Czerner wrote:
> However when we have dm device where part of the device supports
> discard due to underlying hardware capability we just can not return
> EOPNOTSUPP from blkdev_issue_discard, because it is just not true!
EOPNOTSUPP from dm means the operation was not supported on that *one* bio.
It does *not* tell you anything in general about the device, or whether
you'd get the same error from different bios in future.
Alasdair
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-02 8:13 ` Alasdair G Kergon
@ 2011-05-02 8:19 ` Christoph Hellwig
2011-05-02 10:24 ` Lukas Czerner
0 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2011-05-02 8:19 UTC (permalink / raw)
To: device-mapper development, Alasdair G Kergon, sandeen,
Mike Snitzer, Christoph Hellwig, DarkNovaNick, linux-lvm,
Lukas Czerner, linux-ext4
On Mon, May 02, 2011 at 09:13:08AM +0100, Alasdair G Kergon wrote:
> On Mon, May 02, 2011 at 09:16:21AM +0200, Lukas Czerner wrote:
> > However when we have dm device where part of the device supports
> > discard due to underlying hardware capability we just can not return
> > EOPNOTSUPP from blkdev_issue_discard, because it is just not true!
>
> EOPNOTSUPP from dm means the operation was not supported on that *one* bio.
> It does *not* tell you anything in general about the device, or whether
> you'd get the same error from different bios in future.
Exactly. We already have the information in the queue limits to tell
the filesystem if discard is supported at all or not.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-02 8:19 ` Christoph Hellwig
@ 2011-05-02 10:24 ` Lukas Czerner
2011-05-02 12:48 ` [linux-lvm] " Mike Snitzer
` (2 more replies)
0 siblings, 3 replies; 54+ messages in thread
From: Lukas Czerner @ 2011-05-02 10:24 UTC (permalink / raw)
To: Christoph Hellwig
Cc: sandeen, Mike Snitzer, device-mapper development, DarkNovaNick,
linux-lvm, Lukas Czerner, linux-ext4, Alasdair G Kergon
On Mon, 2 May 2011, Christoph Hellwig wrote:
> On Mon, May 02, 2011 at 09:13:08AM +0100, Alasdair G Kergon wrote:
> > On Mon, May 02, 2011 at 09:16:21AM +0200, Lukas Czerner wrote:
> > > However when we have dm device where part of the device supports
> > > discard due to underlying hardware capability we just can not return
> > > EOPNOTSUPP from blkdev_issue_discard, because it is just not true!
> >
> > EOPNOTSUPP from dm means the operation was not supported on that *one* bio.
> > It does *not* tell you anything in general about the device, or whether
> > you'd get the same error from different bios in future.
>
> Exactly. We already have the information in the queue limits to tell
> the filesystem if discard is supported at all or not.
>
So I gave it a try. First of all the device composed of SSD and spinning
disk does export discard_support information properly, however it also
advertise discard_zeroes_data which is wrong and possibly dangerous and
should be fixed!
And second of all, strictly speaking if EOPNOTSUPP from dm means that
operation was not supported on that *one* bio, blkdev_issue_discard()
should handle that and do not return EOPNOTSUPP further if queue limits
tells that device has discard support. Is this acceptable solution for
you guys ? I can make that change since I am changing blkdev_issue_discard()
anyway. Or, we can make that change in filesystem where we disable
discard on mount time, when we notice that discard mount option was
specified, but the device does not support it (we should probably do
this regardless on blkdev_issue_discard() change).
Thanks!
-Lukas
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-02 10:24 ` Lukas Czerner
@ 2011-05-02 12:48 ` Mike Snitzer
2011-05-02 13:05 ` Lukas Czerner
2011-05-02 13:48 ` [linux-lvm] [dm-devel] " Martin K. Petersen
2011-05-02 14:20 ` Martin K. Petersen
2 siblings, 1 reply; 54+ messages in thread
From: Mike Snitzer @ 2011-05-02 12:48 UTC (permalink / raw)
To: Lukas Czerner
Cc: sandeen, Christoph Hellwig, device-mapper development,
DarkNovaNick, linux-lvm, linux-ext4, Alasdair G Kergon
On Mon, May 02 2011 at 6:24am -0400,
Lukas Czerner <lczerner@redhat.com> wrote:
> On Mon, 2 May 2011, Christoph Hellwig wrote:
>
> > On Mon, May 02, 2011 at 09:13:08AM +0100, Alasdair G Kergon wrote:
> > > On Mon, May 02, 2011 at 09:16:21AM +0200, Lukas Czerner wrote:
> > > > However when we have dm device where part of the device supports
> > > > discard due to underlying hardware capability we just can not return
> > > > EOPNOTSUPP from blkdev_issue_discard, because it is just not true!
> > >
> > > EOPNOTSUPP from dm means the operation was not supported on that *one* bio.
> > > It does *not* tell you anything in general about the device, or whether
> > > you'd get the same error from different bios in future.
> >
> > Exactly. We already have the information in the queue limits to tell
> > the filesystem if discard is supported at all or not.
> >
>
> So I gave it a try. First of all the device composed of SSD and spinning
> disk does export discard_support information properly, however it also
> advertise discard_zeroes_data which is wrong and possibly dangerous and
> should be fixed!
You're effectively advocating that blk_stack_limits() needs to clear the
topmost device's discard_zeroes_data flag if any one bottom device does
not have discard_zeroes_data.
> And second of all, strictly speaking if EOPNOTSUPP from dm means that
> operation was not supported on that *one* bio, blkdev_issue_discard()
> should handle that and do not return EOPNOTSUPP further if queue limits
> tells that device has discard support. Is this acceptable solution for
> you guys ? I can make that change since I am changing blkdev_issue_discard()
> anyway. Or, we can make that change in filesystem where we disable
> discard on mount time, when we notice that discard mount option was
> specified, but the device does not support it (we should probably do
> this regardless on blkdev_issue_discard() change).
The blkdev_issue_discard() change you propose could be fine (mask
EOPNOTSUPP return if device advertises support for discards) -- though
Eric said we shouldn't ever say we did something when we didn't.
But that blkdev_issue_discard() change is really only safe if the
discard_zeroes_data flag is cleared by blk_stack_limits() if finds
inconsistent discard_zeroes_data support.
Mike
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-02 12:48 ` [linux-lvm] " Mike Snitzer
@ 2011-05-02 13:05 ` Lukas Czerner
2011-05-02 14:47 ` Eric Sandeen
0 siblings, 1 reply; 54+ messages in thread
From: Lukas Czerner @ 2011-05-02 13:05 UTC (permalink / raw)
To: Mike Snitzer
Cc: sandeen, Christoph Hellwig, device-mapper development,
DarkNovaNick, linux-lvm, Lukas Czerner, linux-ext4,
Alasdair G Kergon
On Mon, 2 May 2011, Mike Snitzer wrote:
> On Mon, May 02 2011 at 6:24am -0400,
> Lukas Czerner <lczerner@redhat.com> wrote:
>
> > On Mon, 2 May 2011, Christoph Hellwig wrote:
> >
> > > On Mon, May 02, 2011 at 09:13:08AM +0100, Alasdair G Kergon wrote:
> > > > On Mon, May 02, 2011 at 09:16:21AM +0200, Lukas Czerner wrote:
> > > > > However when we have dm device where part of the device supports
> > > > > discard due to underlying hardware capability we just can not return
> > > > > EOPNOTSUPP from blkdev_issue_discard, because it is just not true!
> > > >
> > > > EOPNOTSUPP from dm means the operation was not supported on that *one* bio.
> > > > It does *not* tell you anything in general about the device, or whether
> > > > you'd get the same error from different bios in future.
> > >
> > > Exactly. We already have the information in the queue limits to tell
> > > the filesystem if discard is supported at all or not.
> > >
> >
> > So I gave it a try. First of all the device composed of SSD and spinning
> > disk does export discard_support information properly, however it also
> > advertise discard_zeroes_data which is wrong and possibly dangerous and
> > should be fixed!
>
> You're effectively advocating that blk_stack_limits() needs to clear the
> topmost device's discard_zeroes_data flag if any one bottom device does
> not have discard_zeroes_data.
Exactly.
>
> > And second of all, strictly speaking if EOPNOTSUPP from dm means that
> > operation was not supported on that *one* bio, blkdev_issue_discard()
> > should handle that and do not return EOPNOTSUPP further if queue limits
> > tells that device has discard support. Is this acceptable solution for
> > you guys ? I can make that change since I am changing blkdev_issue_discard()
> > anyway. Or, we can make that change in filesystem where we disable
> > discard on mount time, when we notice that discard mount option was
> > specified, but the device does not support it (we should probably do
> > this regardless on blkdev_issue_discard() change).
>
> The blkdev_issue_discard() change you propose could be fine (mask
> EOPNOTSUPP return if device advertises support for discards) -- though
> Eric said we shouldn't ever say we did something when we didn't.
Exactly, so we should not say that it is not supported when it is, but
we just hit the "wrong" part of the device:) I would just very much like
to keep the abstraction of having one consistent device underneath the
file system and not deal with several devices, or regions with different
behaviour in the file system itself (let the pixies underneath deal with
that, after all not all of us are btrfs:))
>
> But that blkdev_issue_discard() change is really only safe if the
> discard_zeroes_data flag is cleared by blk_stack_limits() if finds
> inconsistent discard_zeroes_data support.
Agreed
>
> Mike
>
Thanks!
-Lukas
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-02 10:24 ` Lukas Czerner
2011-05-02 12:48 ` [linux-lvm] " Mike Snitzer
@ 2011-05-02 13:48 ` Martin K. Petersen
2011-05-02 14:20 ` Martin K. Petersen
2 siblings, 0 replies; 54+ messages in thread
From: Martin K. Petersen @ 2011-05-02 13:48 UTC (permalink / raw)
To: Lukas Czerner
Cc: sandeen, Mike Snitzer, Christoph Hellwig,
device-mapper development, DarkNovaNick, linux-lvm, linux-ext4,
Alasdair G Kergon
>>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
Lukas> however it also advertise discard_zeroes_data which is wrong and
Lukas> possibly dangerous and should be fixed!
That shouldn't happen. I'll have a look...
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-02 10:24 ` Lukas Czerner
2011-05-02 12:48 ` [linux-lvm] " Mike Snitzer
2011-05-02 13:48 ` [linux-lvm] [dm-devel] " Martin K. Petersen
@ 2011-05-02 14:20 ` Martin K. Petersen
2011-05-02 14:39 ` Lukas Czerner
2 siblings, 1 reply; 54+ messages in thread
From: Martin K. Petersen @ 2011-05-02 14:20 UTC (permalink / raw)
To: Lukas Czerner
Cc: sandeen, Mike Snitzer, Christoph Hellwig,
device-mapper development, DarkNovaNick, linux-lvm, linux-ext4,
Alasdair G Kergon
>>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
Lukas> So I gave it a try. First of all the device composed of SSD and
Lukas> spinning disk does export discard_support information properly,
Lukas> however it also advertise discard_zeroes_data which is wrong and
Lukas> possibly dangerous and should be fixed!
I can't reproduce this here. If I mix discard and non-discard devices
things work correctly. discard_zeroes_data also gets cleared if I mix
discard-capable drives where one zeroes and one doesn't.
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-02 14:20 ` Martin K. Petersen
@ 2011-05-02 14:39 ` Lukas Czerner
2011-05-02 14:50 ` Martin K. Petersen
` (2 more replies)
0 siblings, 3 replies; 54+ messages in thread
From: Lukas Czerner @ 2011-05-02 14:39 UTC (permalink / raw)
To: Martin K. Petersen
Cc: sandeen, Mike Snitzer, Christoph Hellwig,
device-mapper development, DarkNovaNick, linux-lvm, Lukas Czerner,
linux-ext4, Alasdair G Kergon
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1779 bytes --]
On Mon, 2 May 2011, Martin K. Petersen wrote:
> >>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
>
> Lukas> So I gave it a try. First of all the device composed of SSD and
> Lukas> spinning disk does export discard_support information properly,
> Lukas> however it also advertise discard_zeroes_data which is wrong and
> Lukas> possibly dangerous and should be fixed!
>
> I can't reproduce this here. If I mix discard and non-discard devices
> things work correctly. discard_zeroes_data also gets cleared if I mix
> discard-capable drives where one zeroes and one doesn't.
>
>
[root@trim ~]# hdparm -I /dev/sdb | grep -i trim
[root@trim ~]# cat /sys/block/sdb/queue/discard_zeroes_data
0
[root@trim ~]# hdparm -I /dev/sdd | grep -i trim
* Data Set Management TRIM supported
* Deterministic read after TRIM
[root@trim ~]# cat /sys/block/sdd/queue/discard_zeroes_data
1
[root@trim ~]# pvcreate /dev/sdd1
Physical volume "/dev/sdd1" successfully created
[root@trim ~]# pvcreate /dev/sdb3
Physical volume "/dev/sdb3" successfully created
[root@trim ~]# vgcreate vg_test /dev/sdd1 /dev/sdb3
Volume group "vg_test" successfully created
[root@trim ~]# lvcreate -L 3500M vg_test
Logical volume "lvol0" created
[root@trim ~]# ls -lah /dev/mapper/vg_test-lvol0
lrwxrwxrwx. 1 root root 7 2.�kv� 10.30 /dev/mapper/vg_test-lvol0 -> ../dm-0
[root@trim ~]# cat /sys/block/dm-0/queue/discard_zeroes_data
1
[root@trim ~]# uname -r
2.6.39-rc5-blkdev+
[root@trim ~]# cat /proc/partitions
8 19 1951897 sdb3
8 49 1951866 sdd1
253 0 3584000 dm-0
So I assume it got fixes just recently ?. I'll give it a try with the
up-to-date kernel.
Thanks!
-Lukas
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-02 13:05 ` Lukas Czerner
@ 2011-05-02 14:47 ` Eric Sandeen
2011-05-02 14:48 ` Christoph Hellwig
2011-05-02 14:58 ` Lukas Czerner
0 siblings, 2 replies; 54+ messages in thread
From: Eric Sandeen @ 2011-05-02 14:47 UTC (permalink / raw)
To: Lukas Czerner
Cc: Mike Snitzer, Christoph Hellwig, device-mapper development,
DarkNovaNick, linux-lvm, linux-ext4, Alasdair G Kergon
On 5/2/11 8:05 AM, Lukas Czerner wrote:
> On Mon, 2 May 2011, Mike Snitzer wrote:
...
>> The blkdev_issue_discard() change you propose could be fine (mask
>> EOPNOTSUPP return if device advertises support for discards) -- though
>> Eric said we shouldn't ever say we did something when we didn't.
>
> Exactly, so we should not say that it is not supported when it is, but
> we just hit the "wrong" part of the device:) I would just very much like
> to keep the abstraction of having one consistent device underneath the
> file system and not deal with several devices, or regions with different
> behaviour in the file system itself (let the pixies underneath deal with
> that, after all not all of us are btrfs:))
I still think we need to stick with the simple rule: "EOPNOTSUPP returned for a particular bio means that it is not supported for that particular bio" - I don't know what else we can do, without creating an ambiguity...
This does, however, suck for the layer calling in to a complex device.
What is the overhead for sending discard bios down to a device that does not support it?
-Eric
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-02 14:47 ` Eric Sandeen
@ 2011-05-02 14:48 ` Christoph Hellwig
2011-05-02 14:58 ` Lukas Czerner
1 sibling, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2011-05-02 14:48 UTC (permalink / raw)
To: Eric Sandeen
Cc: Mike Snitzer, Christoph Hellwig, device-mapper development,
DarkNovaNick, linux-lvm, Lukas Czerner, linux-ext4,
Alasdair G Kergon
On Mon, May 02, 2011 at 09:47:16AM -0500, Eric Sandeen wrote:
> I still think we need to stick with the simple rule: "EOPNOTSUPP returned for a particular bio means that it is not supported for that particular bio" - I don't know what else we can do, without creating an ambiguity...
>
> This does, however, suck for the layer calling in to a complex device.
>
> What is the overhead for sending discard bios down to a device that does not support it?
For a DM-like device there is very little overhead. The bio gets
dispatched to make_request and failed in there almost immediately.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-02 14:39 ` Lukas Czerner
@ 2011-05-02 14:50 ` Martin K. Petersen
2011-05-02 14:58 ` [linux-lvm] " Mike Snitzer
2011-05-02 16:58 ` [linux-lvm] [dm-devel] " Martin K. Petersen
2 siblings, 0 replies; 54+ messages in thread
From: Martin K. Petersen @ 2011-05-02 14:50 UTC (permalink / raw)
To: Lukas Czerner
Cc: sandeen, Martin K. Petersen, Mike Snitzer, Christoph Hellwig,
device-mapper development, DarkNovaNick, linux-lvm, linux-ext4,
Alasdair G Kergon
>>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
Lukas> So I assume it got fixes just recently ?. I'll give it a try with
Lukas> the up-to-date kernel.
Ok, turns out the SSD I used for the test predates advertising
TRIM. Beats me why this is broken for the zeroes/!zeroes case. I'll get
it fixed...
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-02 14:39 ` Lukas Czerner
2011-05-02 14:50 ` Martin K. Petersen
@ 2011-05-02 14:58 ` Mike Snitzer
2011-05-02 16:58 ` [linux-lvm] [dm-devel] " Martin K. Petersen
2 siblings, 0 replies; 54+ messages in thread
From: Mike Snitzer @ 2011-05-02 14:58 UTC (permalink / raw)
To: Lukas Czerner
Cc: sandeen, Martin K. Petersen, Christoph Hellwig,
device-mapper development, DarkNovaNick, linux-lvm, linux-ext4,
Alasdair G Kergon
On Mon, May 02 2011 at 10:39am -0400,
Lukas Czerner <lczerner@redhat.com> wrote:
> On Mon, 2 May 2011, Martin K. Petersen wrote:
>
> > >>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
> >
> > Lukas> So I gave it a try. First of all the device composed of SSD and
> > Lukas> spinning disk does export discard_support information properly,
> > Lukas> however it also advertise discard_zeroes_data which is wrong and
> > Lukas> possibly dangerous and should be fixed!
> >
> > I can't reproduce this here. If I mix discard and non-discard devices
> > things work correctly. discard_zeroes_data also gets cleared if I mix
> > discard-capable drives where one zeroes and one doesn't.
> >
> >
>
> [root@trim ~]# hdparm -I /dev/sdb | grep -i trim
> [root@trim ~]# cat /sys/block/sdb/queue/discard_zeroes_data
> 0
> [root@trim ~]# hdparm -I /dev/sdd | grep -i trim
> * Data Set Management TRIM supported
> * Deterministic read after TRIM
> [root@trim ~]# cat /sys/block/sdd/queue/discard_zeroes_data
> 1
> [root@trim ~]# pvcreate /dev/sdd1
> Physical volume "/dev/sdd1" successfully created
> [root@trim ~]# pvcreate /dev/sdb3
> Physical volume "/dev/sdb3" successfully created
> [root@trim ~]# vgcreate vg_test /dev/sdd1 /dev/sdb3
> Volume group "vg_test" successfully created
> [root@trim ~]# lvcreate -L 3500M vg_test
> Logical volume "lvol0" created
> [root@trim ~]# ls -lah /dev/mapper/vg_test-lvol0
> lrwxrwxrwx. 1 root root 7 2. kvě 10.30 /dev/mapper/vg_test-lvol0 -> ../dm-0
> [root@trim ~]# cat /sys/block/dm-0/queue/discard_zeroes_data
> 1
Hmm, Strange considering DM just uses blk_stack_limits().
As Martin said, current blk_stack_limits() code is fine:
t->discard_zeroes_data &= b->discard_zeroes_data;
> So I assume it got fixes just recently ?. I'll give it a try with the
> up-to-date kernel.
Hasn't changed at all since it was introduced:
98262f2 v2.6.33-rc1 block: Allow devices to indicate whether discarded blocks are zeroed
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-02 14:47 ` Eric Sandeen
2011-05-02 14:48 ` Christoph Hellwig
@ 2011-05-02 14:58 ` Lukas Czerner
1 sibling, 0 replies; 54+ messages in thread
From: Lukas Czerner @ 2011-05-02 14:58 UTC (permalink / raw)
To: Eric Sandeen
Cc: Mike Snitzer, Christoph Hellwig, device-mapper development,
DarkNovaNick, linux-lvm, Lukas Czerner, linux-ext4,
Alasdair G Kergon
On Mon, 2 May 2011, Eric Sandeen wrote:
> On 5/2/11 8:05 AM, Lukas Czerner wrote:
> > On Mon, 2 May 2011, Mike Snitzer wrote:
>
> ...
>
> >> The blkdev_issue_discard() change you propose could be fine (mask
> >> EOPNOTSUPP return if device advertises support for discards) -- though
> >> Eric said we shouldn't ever say we did something when we didn't.
> >
> > Exactly, so we should not say that it is not supported when it is, but
> > we just hit the "wrong" part of the device:) I would just very much like
> > to keep the abstraction of having one consistent device underneath the
> > file system and not deal with several devices, or regions with different
> > behaviour in the file system itself (let the pixies underneath deal with
> > that, after all not all of us are btrfs:))
>
> I still think we need to stick with the simple rule: "EOPNOTSUPP returned for a particular bio means that it is not supported for that particular bio" - I don't know what else we can do, without creating an ambiguity...
I agree, but we do not care about bios in file system. We need to know
whether the device itself supports it or not. Also consider BLKDISCARD
ioctl(), now as it is it's completely broken on such dm devices, because
you are not able to use it due to -EOPNOTSUPP first time you hit the
device which does not support it, but it can very well be just a small
part of dm device.
Again, it does not make sense to return -EOPNOTSUPP from
blkdev_issue_discard() when the device actually supports it.
-Lukas
>
> This does, however, suck for the layer calling in to a complex device.
>
> What is the overhead for sending discard bios down to a device that does not support it?
>
> -Eric
>
--
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-02 14:39 ` Lukas Czerner
2011-05-02 14:50 ` Martin K. Petersen
2011-05-02 14:58 ` [linux-lvm] " Mike Snitzer
@ 2011-05-02 16:58 ` Martin K. Petersen
2011-05-03 8:57 ` Lukas Czerner
2 siblings, 1 reply; 54+ messages in thread
From: Martin K. Petersen @ 2011-05-02 16:58 UTC (permalink / raw)
To: Lukas Czerner
Cc: sandeen, Martin K. Petersen, Mike Snitzer, Christoph Hellwig,
device-mapper development, DarkNovaNick, linux-lvm, linux-ext4,
Alasdair G Kergon
>>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
Lukas,
Lukas> [root@trim ~]# lvcreate -L 3500M vg_test
Lukas> Logical volume "lvol0" created
Ok, so here's what I think is going on. You're creating a linear target
which happens to fit inside the first PV.
Here's two devices. One that supports discard_zeroes_data=1 (8:17) and
one that doesn't (8:49).
# dmsetup table
foo-bar: 0 1032192 striped 2 32 8:17 384 8:49 384
foo-baz: 0 106496 linear 8:17 516480
# grep . /sys/block/d*/queue/discard_z*
/sys/block/dm-0/queue/discard_zeroes_data:0
/sys/block/dm-1/queue/discard_zeroes_data:1
LV foo/bar (dm-0) is striped, straddling two devices with incompatible
values. Hence 0.
LV foo/baz (dm-1) is linear and fits inside the first PV. Thus it has
discard_zeroes_data=1.
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-02 16:58 ` [linux-lvm] [dm-devel] " Martin K. Petersen
@ 2011-05-03 8:57 ` Lukas Czerner
2011-05-04 15:10 ` Martin K. Petersen
2011-05-04 15:16 ` [linux-lvm] [dm-devel] " Martin K. Petersen
0 siblings, 2 replies; 54+ messages in thread
From: Lukas Czerner @ 2011-05-03 8:57 UTC (permalink / raw)
To: Martin K. Petersen
Cc: sandeen, Mike Snitzer, Christoph Hellwig,
device-mapper development, DarkNovaNick, linux-lvm, Lukas Czerner,
linux-ext4, Alasdair G Kergon
[-- Attachment #1: Type: TEXT/PLAIN, Size: 2129 bytes --]
On Mon, 2 May 2011, Martin K. Petersen wrote:
> >>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
>
> Lukas,
>
> Lukas> [root@trim ~]# lvcreate -L 3500M vg_test
> Lukas> Logical volume "lvol0" created
>
> Ok, so here's what I think is going on. You're creating a linear target
> which happens to fit inside the first PV.
>
> Here's two devices. One that supports discard_zeroes_data=1 (8:17) and
> one that doesn't (8:49).
>
> # dmsetup table
> foo-bar: 0 1032192 striped 2 32 8:17 384 8:49 384
> foo-baz: 0 106496 linear 8:17 516480
>
> # grep . /sys/block/d*/queue/discard_z*
> /sys/block/dm-0/queue/discard_zeroes_data:0
> /sys/block/dm-1/queue/discard_zeroes_data:1
>
> LV foo/bar (dm-0) is striped, straddling two devices with incompatible
> values. Hence 0.
>
> LV foo/baz (dm-1) is linear and fits inside the first PV. Thus it has
> discard_zeroes_data=1.
Well that's why I have included /proc/partition, so you can see that
this is not happening, because both partitions are 1.86GB long and the LV I
am creating from those is 3.42GB long.
[root@trim ~]# dmsetup table
vg_test-lvol0: 0 3899392 linear 8:49 2048
vg_test-lvol0: 3899392 3268608 linear 8:19 2048
Nevertheless there is something weird going on, because even when I
create striped volume I get this:
[root@trim ~]# dmsetup table
vg_test-lvol0: 0 7176192 striped 2 16 8:49 2048 8:19 2048
[root@trim ~]# ls -lah /dev/mapper/vg_test-lvol0
lrwxrwxrwx. 1 root root 7 3.�kv� 04.42 /dev/mapper/vg_test-lvol0 ->
../dm-0
[root@trim ~]# grep . /sys/block/dm-0/queue/discard_*
/sys/block/dm-0/queue/discard_granularity:512
/sys/block/dm-0/queue/discard_max_bytes:4294966784
/sys/block/dm-0/queue/discard_zeroes_data:1
And of course (8:49) is a partition on the SSD which does zero data and
(8:19) is a partition on the spinning device which does not zero data,
nor does it support discard. And this is on yesterdays kernel.
[root@trim ~]# uname -r
2.6.39-rc5+
So assuming that you're trying recent kernel as well, someone is doing
something different :)
Thanks!
-Lukas
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-03 8:57 ` Lukas Czerner
@ 2011-05-04 15:10 ` Martin K. Petersen
2011-05-04 16:02 ` [linux-lvm] " Mike Snitzer
` (2 more replies)
2011-05-04 15:16 ` [linux-lvm] [dm-devel] " Martin K. Petersen
1 sibling, 3 replies; 54+ messages in thread
From: Martin K. Petersen @ 2011-05-04 15:10 UTC (permalink / raw)
To: Lukas Czerner
Cc: sandeen, Martin K. Petersen, Mike Snitzer, Christoph Hellwig,
device-mapper development, DarkNovaNick, linux-lvm, linux-ext4,
Alasdair G Kergon
>>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
Lukas> Nevertheless there is something weird going on, because even when
Lukas> I create striped volume I get this:
Could you please try the following patch? It has a bunch of small tweaks
to the discard stack in it. I'll split it up before posting for real but
I'd like to know if it fixes your issue...
block/libata/scsi: Various logical block provisioning fixes
- Add sysfs documentation for the discard topology parameters
- Fix discard stacking problem
- Switch our libata SAT over to using the WRITE SAME limits
- UNMAP alignment needs to be converted to bytes
- Only report alignment and zeroes_data if the device supports discard
Reported-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index 4873c75..c1eb41c 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -142,3 +142,67 @@ Description:
with the previous I/O request are enabled. When set to 2,
all merge tries are disabled. The default value is 0 -
which enables all types of merge tries.
+
+What: /sys/block/<disk>/discard_alignment
+Date: May 2011
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+ Devices that support discard functionality may
+ internally allocate space in units that are bigger than
+ the exported logical block size. The discard_alignment
+ parameter indicates how many bytes the beginning of the
+ device is offset from the internal allocation unit's
+ natural alignment.
+
+What: /sys/block/<disk>/<partition>/discard_alignment
+Date: May 2011
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+ Devices that support discard functionality may
+ internally allocate space in units that are bigger than
+ the exported logical block size. The discard_alignment
+ parameter indicates how many bytes the beginning of the
+ partition is offset from the internal allocation unit's
+ natural alignment.
+
+What: /sys/block/<disk>/queue/discard_granularity
+Date: May 2011
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+ Devices that support discard functionality may
+ internally allocate space using units that are bigger
+ than the logical block size. The discard_granularity
+ parameter indicates the size of the internal allocation
+ unit in bytes if reported by the device. Otherwise the
+ discard_granularity will be set to match the device's
+ physical block size. A discard_granularity of 0 means
+ that the device does not support discard functionality.
+
+What: /sys/block/<disk>/queue/discard_max_bytes
+Date: May 2011
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+ Devices that support discard functionality may have
+ internal limits on the number of bytes that can be
+ trimmed or unmapped in a single operation. Some storage
+ protocols also have inherent limits on the number of
+ blocks that can be described in a single command. The
+ discard_max_bytes parameter is set by the device driver
+ to the maximum number of bytes that can be discarded in
+ a single operation. Discard requests issued to the
+ device must not exceed this limit. A discard_max_bytes
+ value of 0 means that the device does not support
+ discard functionality.
+
+What: /sys/block/<disk>/queue/discard_zeroes_data
+Date: May 2011
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+ Devices that support discard functionality may return
+ stale or random data when a previously discarded block
+ is read back. This can cause problems if the filesystem
+ expects discarded blocks to be explicitly cleared. If a
+ device reports that it deterministically returns zeroes
+ when a discarded area is read the discard_zeroes_data
+ parameter will be set to one. Otherwise it will be 0 and
+ the result of reading a discarded area is undefined.
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 1fa7692..42d3bf5 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -120,7 +120,7 @@ void blk_set_default_limits(struct queue_limits *lim)
lim->discard_granularity = 0;
lim->discard_alignment = 0;
lim->discard_misaligned = 0;
- lim->discard_zeroes_data = -1;
+ lim->discard_zeroes_data = 1;
lim->logical_block_size = lim->physical_block_size = lim->io_min = 512;
lim->bounce_pfn = (unsigned long)(BLK_BOUNCE_ANY >> PAGE_SHIFT);
lim->alignment_offset = 0;
@@ -166,6 +166,7 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn)
blk_set_default_limits(&q->limits);
blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS);
+ q->limits.discard_zeroes_data = 0;
/*
* by default assume old behaviour and bounce for any highmem page
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index e2f57e9e..30ea95f 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2138,7 +2138,7 @@ static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf)
* with the unmap bit set.
*/
if (ata_id_has_trim(args->id)) {
- put_unaligned_be32(65535 * 512 / 8, &rbuf[20]);
+ put_unaligned_be64(65535 * 512 / 8, &rbuf[36]);
put_unaligned_be32(1, &rbuf[28]);
}
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index bd0806e..cc081dd 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -490,7 +490,8 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
unsigned int max_blocks = 0;
q->limits.discard_zeroes_data = sdkp->lbprz;
- q->limits.discard_alignment = sdkp->unmap_alignment;
+ q->limits.discard_alignment = sdkp->unmap_alignment *
+ (logical_block_size >> 9);
q->limits.discard_granularity =
max(sdkp->physical_block_size,
sdkp->unmap_granularity * logical_block_size);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2ad95fa..1416e3c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -257,7 +257,7 @@ struct queue_limits {
unsigned char misaligned;
unsigned char discard_misaligned;
unsigned char cluster;
- signed char discard_zeroes_data;
+ unsigned char discard_zeroes_data;
};
struct request_queue
@@ -1066,13 +1066,16 @@ static inline int queue_limit_discard_alignment(struct queue_limits *lim, sector
{
unsigned int alignment = (sector << 9) & (lim->discard_granularity - 1);
+ if (!lim->max_discard_sectors)
+ return 0;
+
return (lim->discard_granularity + lim->discard_alignment - alignment)
& (lim->discard_granularity - 1);
}
static inline unsigned int queue_discard_zeroes_data(struct request_queue *q)
{
- if (q->limits.discard_zeroes_data == 1)
+ if (q->limits.max_discard_sectors && q->limits.discard_zeroes_data == 1)
return 1;
return 0;
^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-03 8:57 ` Lukas Czerner
2011-05-04 15:10 ` Martin K. Petersen
@ 2011-05-04 15:16 ` Martin K. Petersen
2011-05-04 16:12 ` Lukas Czerner
2011-05-05 8:33 ` Karel Zak
1 sibling, 2 replies; 54+ messages in thread
From: Martin K. Petersen @ 2011-05-04 15:16 UTC (permalink / raw)
To: Lukas Czerner, Karel Zak
Cc: sandeen, Martin K. Petersen, Mike Snitzer, Christoph Hellwig,
device-mapper development, DarkNovaNick, linux-lvm, linux-ext4,
Alasdair G Kergon
>>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
I got tired of poking around in sysfs to find the discard topology.
Here's a patch against lsblk that adds a -D option to present this
information in a human-readable form:
# lsblk -D
NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda 0 0B 0B 0
└─sda1 0 0B 0B 0
sdb 0 512B 2G 1
└─sdb1 0 512B 2G 1
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
diff --git a/misc-utils/lsblk.8 b/misc-utils/lsblk.8
index 38ff48f..d7d7aa8 100644
--- a/misc-utils/lsblk.8
+++ b/misc-utils/lsblk.8
@@ -29,6 +29,8 @@ Print the SIZE column in bytes rather than in human-readable format.
.IP "\fB\-d, \-\-nodeps\fP"
Don't print device holders or slaves. For example "lsblk --nodeps /dev/sda" prints
information about the sda device only.
+.IP "\fB\-D, \-\-discard\fP"
+Print information about the discard (TRIM, UNMAP) capabilities for each device.
.IP "\fB\-e, \-\-exclude \fIlist\fP
Exclude the devices specified by a comma-separated \fIlist\fR of major device numbers.
Note that RAM disks (major=1) are excluded by default.
diff --git a/misc-utils/lsblk.c b/misc-utils/lsblk.c
index 38326d0..671e690 100644
--- a/misc-utils/lsblk.c
+++ b/misc-utils/lsblk.c
@@ -77,6 +77,10 @@ enum {
COL_ROTA,
COL_SCHED,
COL_TYPE,
+ COL_DALIGN,
+ COL_DGRAN,
+ COL_DMAX,
+ COL_DZERO,
__NCOLUMNS
};
@@ -112,8 +116,11 @@ static struct colinfo infos[__NCOLUMNS] = {
[COL_PHYSEC] = { "PHY-SEC", 7, TT_FL_RIGHT, N_("physical sector size") },
[COL_LOGSEC] = { "LOG-SEC", 7, TT_FL_RIGHT, N_("logical sector size") },
[COL_SCHED] = { "SCHED", 0.1, 0, N_("I/O scheduler name") },
- [COL_TYPE] = { "TYPE", 4, 0, N_("device type") }
-
+ [COL_TYPE] = { "TYPE", 4, 0, N_("device type") },
+ [COL_DALIGN] = { "DISC-ALN", 6, TT_FL_RIGHT, N_("discard alignment offset") },
+ [COL_DGRAN] = { "DISC-GRAN", 6, TT_FL_RIGHT, N_("discard granularity") },
+ [COL_DMAX] = { "DISC-MAX", 6, TT_FL_RIGHT, N_("discard max bytes") },
+ [COL_DZERO] = { "DISC-ZERO", 1, TT_FL_RIGHT, N_("discard zeroes data") },
};
struct lsblk {
@@ -702,6 +709,33 @@ static void set_tt_data(struct blkdev_cxt *cxt, int col, int id, struct tt_line
if (p)
tt_line_set_data(ln, col, p);
break;
+ case COL_DALIGN:
+ p = sysfs_strdup(cxt, "discard_alignment");
+ if (p)
+ tt_line_set_data(ln, col, p);
+ break;
+ case COL_DGRAN:
+ p = sysfs_strdup(cxt, "queue/discard_granularity");
+ if (!lsblk->bytes)
+ p = size_to_human_string(atoi(p));
+
+ if (p)
+ tt_line_set_data(ln, col, p);
+ break;
+ case COL_DMAX:
+ p = sysfs_strdup(cxt, "queue/discard_max_bytes");
+
+ if (!lsblk->bytes)
+ p = size_to_human_string(atoi(p));
+
+ if (p)
+ tt_line_set_data(ln, col, p);
+ break;
+ case COL_DZERO:
+ p = sysfs_strdup(cxt, "queue/discard_zeroes_data");
+ if (p)
+ tt_line_set_data(ln, col, p);
+ break;
};
}
@@ -930,6 +964,7 @@ static void __attribute__((__noreturn__)) help(FILE *out)
" -a, --all print all devices\n"
" -b, --bytes print SIZE in bytes rather than in human readable format\n"
" -d, --nodeps don't print slaves or holders\n"
+ " -D, --discard print discard capabilities\n"
" -e, --exclude <list> exclude devices by major number (default: RAM disks)\n"
" -f, --fs output info about filesystems\n"
" -h, --help usage information (this)\n"
@@ -967,6 +1002,7 @@ int main(int argc, char *argv[])
{ "all", 0, 0, 'a' },
{ "bytes", 0, 0, 'b' },
{ "nodeps", 0, 0, 'd' },
+ { "discard", 0, 0, 'D' },
{ "help", 0, 0, 'h' },
{ "output", 1, 0, 'o' },
{ "perms", 0, 0, 'm' },
@@ -987,7 +1023,7 @@ int main(int argc, char *argv[])
lsblk = &_ls;
memset(lsblk, 0, sizeof(*lsblk));
- while((c = getopt_long(argc, argv, "abde:fhlnmo:irt", longopts, NULL)) != -1) {
+ while((c = getopt_long(argc, argv, "abdDe:fhlnmo:irt", longopts, NULL)) != -1) {
switch(c) {
case 'a':
lsblk->all_devices = 1;
@@ -998,6 +1034,13 @@ int main(int argc, char *argv[])
case 'd':
lsblk->nodeps = 1;
break;
+ case 'D':
+ columns[ncolumns++] = COL_NAME;
+ columns[ncolumns++] = COL_DALIGN;
+ columns[ncolumns++] = COL_DGRAN;
+ columns[ncolumns++] = COL_DMAX;
+ columns[ncolumns++] = COL_DZERO;
+ break;
case 'e':
parse_excludes(optarg);
break;
^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-04 15:10 ` Martin K. Petersen
@ 2011-05-04 16:02 ` Mike Snitzer
2011-05-04 16:50 ` Martin K. Petersen
2011-05-04 17:10 ` [linux-lvm] [dm-devel] " Lukas Czerner
2011-05-18 12:16 ` [linux-lvm] " Mike Snitzer
2 siblings, 1 reply; 54+ messages in thread
From: Mike Snitzer @ 2011-05-04 16:02 UTC (permalink / raw)
To: Martin K. Petersen
Cc: sandeen, Christoph Hellwig, device-mapper development,
DarkNovaNick, linux-lvm, Lukas Czerner, linux-ext4,
Alasdair G Kergon
On Wed, May 04 2011 at 11:10am -0400,
Martin K. Petersen <martin.petersen@oracle.com> wrote:
> >>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
>
> Lukas> Nevertheless there is something weird going on, because even when
> Lukas> I create striped volume I get this:
>
> Could you please try the following patch? It has a bunch of small tweaks
> to the discard stack in it. I'll split it up before posting for real but
> I'd like to know if it fixes your issue...
>
>
> block/libata/scsi: Various logical block provisioning fixes
>
> - Add sysfs documentation for the discard topology parameters
>
> - Fix discard stacking problem
>
> - Switch our libata SAT over to using the WRITE SAME limits
>
> - UNMAP alignment needs to be converted to bytes
>
> - Only report alignment and zeroes_data if the device supports discard
>
> Reported-by: Lukas Czerner <lczerner@redhat.com>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 1fa7692..42d3bf5 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -120,7 +120,7 @@ void blk_set_default_limits(struct queue_limits *lim)
> lim->discard_granularity = 0;
> lim->discard_alignment = 0;
> lim->discard_misaligned = 0;
> - lim->discard_zeroes_data = -1;
> + lim->discard_zeroes_data = 1;
> lim->logical_block_size = lim->physical_block_size = lim->io_min = 512;
> lim->bounce_pfn = (unsigned long)(BLK_BOUNCE_ANY >> PAGE_SHIFT);
> lim->alignment_offset = 0;
lim->discard_zeroes_data = -1; was suspect to me too.
But why default to 1 here?
> @@ -166,6 +166,7 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn)
>
> blk_set_default_limits(&q->limits);
> blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS);
> + q->limits.discard_zeroes_data = 0;
>
> /*
> * by default assume old behaviour and bounce for any highmem page
Only to then reset to 0 here? Shouldn't we default to 0 and only set to
1 where applicable (e.g. sd_config_discard)?
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-04 15:16 ` [linux-lvm] [dm-devel] " Martin K. Petersen
@ 2011-05-04 16:12 ` Lukas Czerner
2011-05-05 8:33 ` Karel Zak
1 sibling, 0 replies; 54+ messages in thread
From: Lukas Czerner @ 2011-05-04 16:12 UTC (permalink / raw)
To: Martin K. Petersen
Cc: sandeen, Mike Snitzer, Christoph Hellwig,
device-mapper development, Karel Zak, DarkNovaNick, linux-lvm,
Lukas Czerner, linux-ext4, Alasdair G Kergon
[-- Attachment #1: Type: TEXT/PLAIN, Size: 4904 bytes --]
On Wed, 4 May 2011, Martin K. Petersen wrote:
> >>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
>
> I got tired of poking around in sysfs to find the discard topology.
> Here's a patch against lsblk that adds a -D option to present this
> information in a human-readable form:
>
> # lsblk -D
> NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
> sda 0 0B 0B 0
> └─sda1 0 0B 0B 0
> sdb 0 512B 2G 1
> └─sdb1 0 512B 2G 1
Very useful indeed!
Thanks!
-Lukas
>
>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
>
> diff --git a/misc-utils/lsblk.8 b/misc-utils/lsblk.8
> index 38ff48f..d7d7aa8 100644
> --- a/misc-utils/lsblk.8
> +++ b/misc-utils/lsblk.8
> @@ -29,6 +29,8 @@ Print the SIZE column in bytes rather than in human-readable format.
> .IP "\fB\-d, \-\-nodeps\fP"
> Don't print device holders or slaves. For example "lsblk --nodeps /dev/sda" prints
> information about the sda device only.
> +.IP "\fB\-D, \-\-discard\fP"
> +Print information about the discard (TRIM, UNMAP) capabilities for each device.
> .IP "\fB\-e, \-\-exclude \fIlist\fP
> Exclude the devices specified by a comma-separated \fIlist\fR of major device numbers.
> Note that RAM disks (major=1) are excluded by default.
> diff --git a/misc-utils/lsblk.c b/misc-utils/lsblk.c
> index 38326d0..671e690 100644
> --- a/misc-utils/lsblk.c
> +++ b/misc-utils/lsblk.c
> @@ -77,6 +77,10 @@ enum {
> COL_ROTA,
> COL_SCHED,
> COL_TYPE,
> + COL_DALIGN,
> + COL_DGRAN,
> + COL_DMAX,
> + COL_DZERO,
>
> __NCOLUMNS
> };
> @@ -112,8 +116,11 @@ static struct colinfo infos[__NCOLUMNS] = {
> [COL_PHYSEC] = { "PHY-SEC", 7, TT_FL_RIGHT, N_("physical sector size") },
> [COL_LOGSEC] = { "LOG-SEC", 7, TT_FL_RIGHT, N_("logical sector size") },
> [COL_SCHED] = { "SCHED", 0.1, 0, N_("I/O scheduler name") },
> - [COL_TYPE] = { "TYPE", 4, 0, N_("device type") }
> -
> + [COL_TYPE] = { "TYPE", 4, 0, N_("device type") },
> + [COL_DALIGN] = { "DISC-ALN", 6, TT_FL_RIGHT, N_("discard alignment offset") },
> + [COL_DGRAN] = { "DISC-GRAN", 6, TT_FL_RIGHT, N_("discard granularity") },
> + [COL_DMAX] = { "DISC-MAX", 6, TT_FL_RIGHT, N_("discard max bytes") },
> + [COL_DZERO] = { "DISC-ZERO", 1, TT_FL_RIGHT, N_("discard zeroes data") },
> };
>
> struct lsblk {
> @@ -702,6 +709,33 @@ static void set_tt_data(struct blkdev_cxt *cxt, int col, int id, struct tt_line
> if (p)
> tt_line_set_data(ln, col, p);
> break;
> + case COL_DALIGN:
> + p = sysfs_strdup(cxt, "discard_alignment");
> + if (p)
> + tt_line_set_data(ln, col, p);
> + break;
> + case COL_DGRAN:
> + p = sysfs_strdup(cxt, "queue/discard_granularity");
> + if (!lsblk->bytes)
> + p = size_to_human_string(atoi(p));
> +
> + if (p)
> + tt_line_set_data(ln, col, p);
> + break;
> + case COL_DMAX:
> + p = sysfs_strdup(cxt, "queue/discard_max_bytes");
> +
> + if (!lsblk->bytes)
> + p = size_to_human_string(atoi(p));
> +
> + if (p)
> + tt_line_set_data(ln, col, p);
> + break;
> + case COL_DZERO:
> + p = sysfs_strdup(cxt, "queue/discard_zeroes_data");
> + if (p)
> + tt_line_set_data(ln, col, p);
> + break;
> };
> }
>
> @@ -930,6 +964,7 @@ static void __attribute__((__noreturn__)) help(FILE *out)
> " -a, --all print all devices\n"
> " -b, --bytes print SIZE in bytes rather than in human readable format\n"
> " -d, --nodeps don't print slaves or holders\n"
> + " -D, --discard print discard capabilities\n"
> " -e, --exclude <list> exclude devices by major number (default: RAM disks)\n"
> " -f, --fs output info about filesystems\n"
> " -h, --help usage information (this)\n"
> @@ -967,6 +1002,7 @@ int main(int argc, char *argv[])
> { "all", 0, 0, 'a' },
> { "bytes", 0, 0, 'b' },
> { "nodeps", 0, 0, 'd' },
> + { "discard", 0, 0, 'D' },
> { "help", 0, 0, 'h' },
> { "output", 1, 0, 'o' },
> { "perms", 0, 0, 'm' },
> @@ -987,7 +1023,7 @@ int main(int argc, char *argv[])
> lsblk = &_ls;
> memset(lsblk, 0, sizeof(*lsblk));
>
> - while((c = getopt_long(argc, argv, "abde:fhlnmo:irt", longopts, NULL)) != -1) {
> + while((c = getopt_long(argc, argv, "abdDe:fhlnmo:irt", longopts, NULL)) != -1) {
> switch(c) {
> case 'a':
> lsblk->all_devices = 1;
> @@ -998,6 +1034,13 @@ int main(int argc, char *argv[])
> case 'd':
> lsblk->nodeps = 1;
> break;
> + case 'D':
> + columns[ncolumns++] = COL_NAME;
> + columns[ncolumns++] = COL_DALIGN;
> + columns[ncolumns++] = COL_DGRAN;
> + columns[ncolumns++] = COL_DMAX;
> + columns[ncolumns++] = COL_DZERO;
> + break;
> case 'e':
> parse_excludes(optarg);
> break;
>
--
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-04-29 14:28 ` Eric Sandeen
2011-04-29 15:13 ` Ray Morris
@ 2011-05-04 16:33 ` Ted Ts'o
2011-05-04 17:02 ` Lukas Czerner
1 sibling, 1 reply; 54+ messages in thread
From: Ted Ts'o @ 2011-05-04 16:33 UTC (permalink / raw)
To: Eric Sandeen
Cc: Mike Snitzer, Christoph Hellwig, dm-devel, DarkNovaNick,
linux-lvm, Lukas Czerner, linux-ext4
On Fri, Apr 29, 2011 at 09:28:51AM -0500, Eric Sandeen wrote:
> On 4/29/11 7:29 AM, Christoph Hellwig wrote:
> > FYI, that disable discard on failure seems to be an ext4 special and no
> > one else picked up that stupid idea. If ext4 wants to misbehave for
> > that just let them..
>
> It was my "stupid idea," and I'm ok with reverting it ;)
I think I forgot to send the patch out, but it's been reverted in the
ext4 master branch, commit id: d9f34504e695. It's more than a revert,
actually, since I also dropped error checking for FITRIM. Otherwise
an attempt to use FITRIM would stop after hitting the first dm region
that didn't support discards.
- Ted
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-04 16:02 ` [linux-lvm] " Mike Snitzer
@ 2011-05-04 16:50 ` Martin K. Petersen
2011-05-04 18:03 ` Mike Snitzer
0 siblings, 1 reply; 54+ messages in thread
From: Martin K. Petersen @ 2011-05-04 16:50 UTC (permalink / raw)
To: Mike Snitzer
Cc: sandeen, Martin K. Petersen, Christoph Hellwig,
device-mapper development, DarkNovaNick, linux-lvm, Lukas Czerner,
linux-ext4, Alasdair G Kergon
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
Mike> lim->discard_zeroes_data = -1; was suspect to me too.
Mike> But why default to 1 here?
Because otherwise DM would default to having dzd to "unsupported",
meaning the feature would never be turned on regardless of the bottom
device capabilities.
The old approach used the -1 value to indicate "has not been set". That
was only really intended as a value for the stacking drivers, not for
the LLDs. It was a bit of a hack and I'd rather deal with dzd the same
way as we do with clustering.
>> @@ -166,6 +166,7 @@ void blk_queue_make_request(struct request_queue
>> *q, make_request_fn *mfn)
>>
>> blk_set_default_limits(&q->limits);
>> blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS);
>> + q->limits.discard_zeroes_data = 0;
>>
>> /*
>> * by default assume old behaviour and bounce for any highmem page
Mike> Only to then reset to 0 here? Shouldn't we default to 0 and only
Mike> set to 1 where applicable (e.g. sd_config_discard)?
My first approach was to set it in dm-table.c before stacking. But I
thought it was icky to have the stacking driver ask for defaults and
then have to tweak them for things to work correctly.
The other option is to have blk_set_default_stacking_limits(). Or we
could add a flag to blk_set_default_limits to indicate whether this is a
LLD or a stacking driver.
We already special-case BLK_SAFE_MAX_SECTORS when setting the request
function. And that's the only non-stacking user of the default limits
call. So that's why I disabled dzd there. Since this is a stable bugfix
I also wanted to keep it small and simple. But I'm totally open to
suggestions.
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-04 16:33 ` Ted Ts'o
@ 2011-05-04 17:02 ` Lukas Czerner
0 siblings, 0 replies; 54+ messages in thread
From: Lukas Czerner @ 2011-05-04 17:02 UTC (permalink / raw)
To: Ted Ts'o
Cc: Eric Sandeen, Mike Snitzer, Christoph Hellwig, dm-devel,
DarkNovaNick, linux-lvm, Lukas Czerner, linux-ext4
On Wed, 4 May 2011, Ted Ts'o wrote:
> On Fri, Apr 29, 2011 at 09:28:51AM -0500, Eric Sandeen wrote:
> > On 4/29/11 7:29 AM, Christoph Hellwig wrote:
> > > FYI, that disable discard on failure seems to be an ext4 special and no
> > > one else picked up that stupid idea. If ext4 wants to misbehave for
> > > that just let them..
> >
> > It was my "stupid idea," and I'm ok with reverting it ;)
>
> I think I forgot to send the patch out, but it's been reverted in the
> ext4 master branch, commit id: d9f34504e695. It's more than a revert,
> actually, since I also dropped error checking for FITRIM. Otherwise
> an attempt to use FITRIM would stop after hitting the first dm region
> that didn't support discards.
>
> - Ted
>
However I would very much like to change blkdev_issue_discard() not NOT
return EOPNOTSUPP if QUEUE_FLAG_DISCARD is set, this way we can rely on
EOPNOTSUPP saying really that it is NOT supported. Regardless on ext4
change.
I will send a patch for it, if you guy agree. Also consider BLKDISCARD
which is now broken and this change will fix that!
Thanks!
-Lukas
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-04 15:10 ` Martin K. Petersen
2011-05-04 16:02 ` [linux-lvm] " Mike Snitzer
@ 2011-05-04 17:10 ` Lukas Czerner
2011-05-04 17:32 ` Martin K. Petersen
2011-05-18 12:16 ` [linux-lvm] " Mike Snitzer
2 siblings, 1 reply; 54+ messages in thread
From: Lukas Czerner @ 2011-05-04 17:10 UTC (permalink / raw)
To: Martin K. Petersen
Cc: sandeen, Mike Snitzer, Christoph Hellwig,
device-mapper development, DarkNovaNick, linux-lvm, Lukas Czerner,
linux-ext4, Alasdair G Kergon
[-- Attachment #1: Type: TEXT/PLAIN, Size: 9258 bytes --]
On Wed, 4 May 2011, Martin K. Petersen wrote:
> >>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
>
> Lukas> Nevertheless there is something weird going on, because even when
> Lukas> I create striped volume I get this:
>
> Could you please try the following patch? It has a bunch of small tweaks
> to the discard stack in it. I'll split it up before posting for real but
> I'd like to know if it fixes your issue...
It works thanks! This is before the patch applied:
NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sdb 0 0B 0B 0
О©╫О©╫sdb1 4293918720 0B 0B 0
О©╫О©╫sdb2 4293918720 0B 0B 0
О©╫О©╫sdb3 4293918720 0B 0B 0
О©╫О©╫vg_test-lvol0 (dm-0) 0 512B 16E 1
sdd 0 512B 16E 1
О©╫О©╫sdd1 0 512B 16E 1
О©╫О©╫vg_test-lvol0 (dm-0) 0 512B 16E 1
and this is with the patch:
NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sdb 0 0B 0B 0
О©╫О©╫sdb1 0 0B 0B 0
О©╫О©╫sdb2 0 0B 0B 0
О©╫О©╫sdb3 0 0B 0B 0
О©╫О©╫vg_test-lvol0 (dm-0) 0 512B 2G 0
sdd 0 512B 2G 1
О©╫О©╫sdd1 0 512B 2G 1
О©╫О©╫vg_test-lvol0 (dm-0) 0 512B 2G 0
It also works properly with two devices supporting discard.
NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sdd 0 512B 2G 1
О©╫О©╫sdd1 0 512B 2G 1
О©╫ О©╫О©╫vg_test-lvol0 (dm-0) 0 512B 2G 1
О©╫О©╫sdd2 0 512B 2G 1
О©╫О©╫vg_test-lvol0 (dm-0) 0 512B 2G 1
Also I am not sure why this changed discard_max_bytes from 4294966784
to 2147450880 in my case, that does not seem right...lsblk in the first
place apparently overflowed.
Thanks for solving this!
-Lukas
>
>
> block/libata/scsi: Various logical block provisioning fixes
>
> - Add sysfs documentation for the discard topology parameters
>
> - Fix discard stacking problem
>
> - Switch our libata SAT over to using the WRITE SAME limits
>
> - UNMAP alignment needs to be converted to bytes
>
> - Only report alignment and zeroes_data if the device supports discard
>
> Reported-by: Lukas Czerner <lczerner@redhat.com>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
>
> diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
> index 4873c75..c1eb41c 100644
> --- a/Documentation/ABI/testing/sysfs-block
> +++ b/Documentation/ABI/testing/sysfs-block
> @@ -142,3 +142,67 @@ Description:
> with the previous I/O request are enabled. When set to 2,
> all merge tries are disabled. The default value is 0 -
> which enables all types of merge tries.
> +
> +What: /sys/block/<disk>/discard_alignment
> +Date: May 2011
> +Contact: Martin K. Petersen <martin.petersen@oracle.com>
> +Description:
> + Devices that support discard functionality may
> + internally allocate space in units that are bigger than
> + the exported logical block size. The discard_alignment
> + parameter indicates how many bytes the beginning of the
> + device is offset from the internal allocation unit's
> + natural alignment.
> +
> +What: /sys/block/<disk>/<partition>/discard_alignment
> +Date: May 2011
> +Contact: Martin K. Petersen <martin.petersen@oracle.com>
> +Description:
> + Devices that support discard functionality may
> + internally allocate space in units that are bigger than
> + the exported logical block size. The discard_alignment
> + parameter indicates how many bytes the beginning of the
> + partition is offset from the internal allocation unit's
> + natural alignment.
> +
> +What: /sys/block/<disk>/queue/discard_granularity
> +Date: May 2011
> +Contact: Martin K. Petersen <martin.petersen@oracle.com>
> +Description:
> + Devices that support discard functionality may
> + internally allocate space using units that are bigger
> + than the logical block size. The discard_granularity
> + parameter indicates the size of the internal allocation
> + unit in bytes if reported by the device. Otherwise the
> + discard_granularity will be set to match the device's
> + physical block size. A discard_granularity of 0 means
> + that the device does not support discard functionality.
> +
> +What: /sys/block/<disk>/queue/discard_max_bytes
> +Date: May 2011
> +Contact: Martin K. Petersen <martin.petersen@oracle.com>
> +Description:
> + Devices that support discard functionality may have
> + internal limits on the number of bytes that can be
> + trimmed or unmapped in a single operation. Some storage
> + protocols also have inherent limits on the number of
> + blocks that can be described in a single command. The
> + discard_max_bytes parameter is set by the device driver
> + to the maximum number of bytes that can be discarded in
> + a single operation. Discard requests issued to the
> + device must not exceed this limit. A discard_max_bytes
> + value of 0 means that the device does not support
> + discard functionality.
> +
> +What: /sys/block/<disk>/queue/discard_zeroes_data
> +Date: May 2011
> +Contact: Martin K. Petersen <martin.petersen@oracle.com>
> +Description:
> + Devices that support discard functionality may return
> + stale or random data when a previously discarded block
> + is read back. This can cause problems if the filesystem
> + expects discarded blocks to be explicitly cleared. If a
> + device reports that it deterministically returns zeroes
> + when a discarded area is read the discard_zeroes_data
> + parameter will be set to one. Otherwise it will be 0 and
> + the result of reading a discarded area is undefined.
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 1fa7692..42d3bf5 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -120,7 +120,7 @@ void blk_set_default_limits(struct queue_limits *lim)
> lim->discard_granularity = 0;
> lim->discard_alignment = 0;
> lim->discard_misaligned = 0;
> - lim->discard_zeroes_data = -1;
> + lim->discard_zeroes_data = 1;
> lim->logical_block_size = lim->physical_block_size = lim->io_min = 512;
> lim->bounce_pfn = (unsigned long)(BLK_BOUNCE_ANY >> PAGE_SHIFT);
> lim->alignment_offset = 0;
> @@ -166,6 +166,7 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn)
>
> blk_set_default_limits(&q->limits);
> blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS);
> + q->limits.discard_zeroes_data = 0;
>
> /*
> * by default assume old behaviour and bounce for any highmem page
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index e2f57e9e..30ea95f 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -2138,7 +2138,7 @@ static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf)
> * with the unmap bit set.
> */
> if (ata_id_has_trim(args->id)) {
> - put_unaligned_be32(65535 * 512 / 8, &rbuf[20]);
> + put_unaligned_be64(65535 * 512 / 8, &rbuf[36]);
> put_unaligned_be32(1, &rbuf[28]);
> }
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index bd0806e..cc081dd 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -490,7 +490,8 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
> unsigned int max_blocks = 0;
>
> q->limits.discard_zeroes_data = sdkp->lbprz;
> - q->limits.discard_alignment = sdkp->unmap_alignment;
> + q->limits.discard_alignment = sdkp->unmap_alignment *
> + (logical_block_size >> 9);
> q->limits.discard_granularity =
> max(sdkp->physical_block_size,
> sdkp->unmap_granularity * logical_block_size);
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 2ad95fa..1416e3c 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -257,7 +257,7 @@ struct queue_limits {
> unsigned char misaligned;
> unsigned char discard_misaligned;
> unsigned char cluster;
> - signed char discard_zeroes_data;
> + unsigned char discard_zeroes_data;
> };
>
> struct request_queue
> @@ -1066,13 +1066,16 @@ static inline int queue_limit_discard_alignment(struct queue_limits *lim, sector
> {
> unsigned int alignment = (sector << 9) & (lim->discard_granularity - 1);
>
> + if (!lim->max_discard_sectors)
> + return 0;
> +
> return (lim->discard_granularity + lim->discard_alignment - alignment)
> & (lim->discard_granularity - 1);
> }
>
> static inline unsigned int queue_discard_zeroes_data(struct request_queue *q)
> {
> - if (q->limits.discard_zeroes_data == 1)
> + if (q->limits.max_discard_sectors && q->limits.discard_zeroes_data == 1)
> return 1;
>
> return 0;
>
--
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-04 17:10 ` [linux-lvm] [dm-devel] " Lukas Czerner
@ 2011-05-04 17:32 ` Martin K. Petersen
2011-05-04 17:35 ` Lukas Czerner
0 siblings, 1 reply; 54+ messages in thread
From: Martin K. Petersen @ 2011-05-04 17:32 UTC (permalink / raw)
To: Lukas Czerner
Cc: sandeen, Martin K. Petersen, Mike Snitzer, Christoph Hellwig,
device-mapper development, DarkNovaNick, linux-lvm, linux-ext4,
Alasdair G Kergon
>>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
Lukas> Also I am not sure why this changed discard_max_bytes from
Lukas> 4294966784 to 2147450880 in my case, that does not seem
Lukas> right...lsblk in the first place apparently overflowed.
2G is correct. The relevant libata patch was missing from the series I
submitted during the merge window and I don't see it in the archives. It
was at the end of the stack so I guess I had an off-by-one in my commit
range...
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-04 17:32 ` Martin K. Petersen
@ 2011-05-04 17:35 ` Lukas Czerner
0 siblings, 0 replies; 54+ messages in thread
From: Lukas Czerner @ 2011-05-04 17:35 UTC (permalink / raw)
To: Martin K. Petersen
Cc: sandeen, Mike Snitzer, Christoph Hellwig,
device-mapper development, DarkNovaNick, linux-lvm, Lukas Czerner,
linux-ext4, Alasdair G Kergon
On Wed, 4 May 2011, Martin K. Petersen wrote:
> >>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
>
> Lukas> Also I am not sure why this changed discard_max_bytes from
> Lukas> 4294966784 to 2147450880 in my case, that does not seem
> Lukas> right...lsblk in the first place apparently overflowed.
>
> 2G is correct. The relevant libata patch was missing from the series I
> submitted during the merge window and I don't see it in the archives. It
> was at the end of the stack so I guess I had an off-by-one in my commit
> range...
>
Ok, thanks.
-Lukas
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-04 16:50 ` Martin K. Petersen
@ 2011-05-04 18:03 ` Mike Snitzer
0 siblings, 0 replies; 54+ messages in thread
From: Mike Snitzer @ 2011-05-04 18:03 UTC (permalink / raw)
To: Martin K. Petersen
Cc: sandeen, Christoph Hellwig, device-mapper development,
DarkNovaNick, linux-lvm, Lukas Czerner, linux-ext4,
Alasdair G Kergon
On Wed, May 04 2011 at 12:50pm -0400,
Martin K. Petersen <martin.petersen@oracle.com> wrote:
> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
>
> Mike> lim->discard_zeroes_data = -1; was suspect to me too.
> Mike> But why default to 1 here?
>
> Because otherwise DM would default to having dzd to "unsupported",
> meaning the feature would never be turned on regardless of the bottom
> device capabilities.
>
> The old approach used the -1 value to indicate "has not been set". That
> was only really intended as a value for the stacking drivers, not for
> the LLDs. It was a bit of a hack and I'd rather deal with dzd the same
> way as we do with clustering.
>
>
> >> @@ -166,6 +166,7 @@ void blk_queue_make_request(struct request_queue
> >> *q, make_request_fn *mfn)
> >>
> >> blk_set_default_limits(&q->limits);
> >> blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS);
> >> + q->limits.discard_zeroes_data = 0;
> >>
> >> /*
> >> * by default assume old behaviour and bounce for any highmem page
>
> Mike> Only to then reset to 0 here? Shouldn't we default to 0 and only
> Mike> set to 1 where applicable (e.g. sd_config_discard)?
>
> My first approach was to set it in dm-table.c before stacking. But I
> thought it was icky to have the stacking driver ask for defaults and
> then have to tweak them for things to work correctly.
>
> The other option is to have blk_set_default_stacking_limits(). Or we
> could add a flag to blk_set_default_limits to indicate whether this is a
> LLD or a stacking driver.
>
> We already special-case BLK_SAFE_MAX_SECTORS when setting the request
> function. And that's the only non-stacking user of the default limits
> call. So that's why I disabled dzd there. Since this is a stable bugfix
> I also wanted to keep it small and simple. But I'm totally open to
> suggestions.
Your current approach sounds good. Might be good to briefly speak to
the duality of the stacking vs non-stacking approach in the associated
patch header.
Thanks for clarifying.
Mike
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-04 15:16 ` [linux-lvm] [dm-devel] " Martin K. Petersen
2011-05-04 16:12 ` Lukas Czerner
@ 2011-05-05 8:33 ` Karel Zak
2011-05-05 10:48 ` Lukas Czerner
1 sibling, 1 reply; 54+ messages in thread
From: Karel Zak @ 2011-05-05 8:33 UTC (permalink / raw)
To: Martin K. Petersen
Cc: sandeen, Mike Snitzer, Christoph Hellwig,
device-mapper development, DarkNovaNick, linux-lvm, Lukas Czerner,
linux-ext4, Alasdair G Kergon
On Wed, May 04, 2011 at 11:16:05AM -0400, Martin K. Petersen wrote:
> >>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
>
> I got tired of poking around in sysfs to find the discard topology.
> Here's a patch against lsblk that adds a -D option to present this
> information in a human-readable form:
Applied, thanks.
> # lsblk -D
> NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
> sda 0 0B 0B 0
> └─sda1 0 0B 0B 0
> sdb 0 512B 2G 1
> └─sdb1 0 512B 2G 1
I have a question, 2.6.35 on my ThinkPad, non-SSD disk:
NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda 0 0B 0B 0
├─sda1 4294935040 0B 0B 0
├─sda2 4188038656 0B 0B 0
├─sda3 1346205184 0B 0B 0
├─sda4 3231165440 0B 0B 0
├─sda5 4188006400 0B 0B 0
│ └─kzak-home (dm-0) 0 0B 0B 0
└─sda6 2035725312 0B 0B 0
Does is make sense? The DISC-ALN is non-zero but DISC-GRAN is zero.
Note that cat /sys/block/sda/sda*/discard_alignment returns the same
numbers.
Karel
--
Karel Zak <kzak@redhat.com>
http://karelzak.blogspot.com
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] [dm-devel] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-05 8:33 ` Karel Zak
@ 2011-05-05 10:48 ` Lukas Czerner
0 siblings, 0 replies; 54+ messages in thread
From: Lukas Czerner @ 2011-05-05 10:48 UTC (permalink / raw)
To: Karel Zak
Cc: sandeen, Mike Snitzer, Martin K. Petersen, Christoph Hellwig,
device-mapper development, DarkNovaNick, linux-lvm, Lukas Czerner,
linux-ext4, Alasdair G Kergon
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1632 bytes --]
On Thu, 5 May 2011, Karel Zak wrote:
> On Wed, May 04, 2011 at 11:16:05AM -0400, Martin K. Petersen wrote:
> > >>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
> >
> > I got tired of poking around in sysfs to find the discard topology.
> > Here's a patch against lsblk that adds a -D option to present this
> > information in a human-readable form:
>
> Applied, thanks.
>
> > # lsblk -D
> > NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
> > sda 0 0B 0B 0
> > └─sda1 0 0B 0B 0
> > sdb 0 512B 2G 1
> > └─sdb1 0 512B 2G 1
>
> I have a question, 2.6.35 on my ThinkPad, non-SSD disk:
>
> NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
> sda 0 0B 0B 0
> ├─sda1 4294935040 0B 0B 0
> ├─sda2 4188038656 0B 0B 0
> ├─sda3 1346205184 0B 0B 0
> ├─sda4 3231165440 0B 0B 0
> ├─sda5 4188006400 0B 0B 0
> │ └─kzak-home (dm-0) 0 0B 0B 0
> └─sda6 2035725312 0B 0B 0
>
>
> Does is make sense? The DISC-ALN is non-zero but DISC-GRAN is zero.
>
> Note that cat /sys/block/sda/sda*/discard_alignment returns the same
> numbers.
>
> Karel
>
>
That is fixed by the kernel patch posted above :)
-Lukas
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-04 15:10 ` Martin K. Petersen
2011-05-04 16:02 ` [linux-lvm] " Mike Snitzer
2011-05-04 17:10 ` [linux-lvm] [dm-devel] " Lukas Czerner
@ 2011-05-18 12:16 ` Mike Snitzer
2011-05-18 12:52 ` Mike Snitzer
2 siblings, 1 reply; 54+ messages in thread
From: Mike Snitzer @ 2011-05-18 12:16 UTC (permalink / raw)
To: Martin K. Petersen
Cc: sandeen, Christoph Hellwig, device-mapper development,
DarkNovaNick, linux-lvm, Lukas Czerner, linux-ext4,
Alasdair G Kergon
Hey Martin,
On Wed, May 04 2011 at 11:10am -0400,
Martin K. Petersen <martin.petersen@oracle.com> wrote:
> >>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
>
> Lukas> Nevertheless there is something weird going on, because even when
> Lukas> I create striped volume I get this:
>
> Could you please try the following patch? It has a bunch of small tweaks
> to the discard stack in it. I'll split it up before posting for real but
> I'd like to know if it fixes your issue...
Would be ideal to get these fixes staged for 2.6.40.
Do you intend to split these patches up and post for 2.6.40 inclussion?
Please advise, thanks!
> block/libata/scsi: Various logical block provisioning fixes
>
> - Add sysfs documentation for the discard topology parameters
>
> - Fix discard stacking problem
>
> - Switch our libata SAT over to using the WRITE SAME limits
>
> - UNMAP alignment needs to be converted to bytes
>
> - Only report alignment and zeroes_data if the device supports discard
>
> Reported-by: Lukas Czerner <lczerner@redhat.com>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
>
> diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
> index 4873c75..c1eb41c 100644
> --- a/Documentation/ABI/testing/sysfs-block
> +++ b/Documentation/ABI/testing/sysfs-block
> @@ -142,3 +142,67 @@ Description:
> with the previous I/O request are enabled. When set to 2,
> all merge tries are disabled. The default value is 0 -
> which enables all types of merge tries.
> +
> +What: /sys/block/<disk>/discard_alignment
> +Date: May 2011
> +Contact: Martin K. Petersen <martin.petersen@oracle.com>
> +Description:
> + Devices that support discard functionality may
> + internally allocate space in units that are bigger than
> + the exported logical block size. The discard_alignment
> + parameter indicates how many bytes the beginning of the
> + device is offset from the internal allocation unit's
> + natural alignment.
> +
> +What: /sys/block/<disk>/<partition>/discard_alignment
> +Date: May 2011
> +Contact: Martin K. Petersen <martin.petersen@oracle.com>
> +Description:
> + Devices that support discard functionality may
> + internally allocate space in units that are bigger than
> + the exported logical block size. The discard_alignment
> + parameter indicates how many bytes the beginning of the
> + partition is offset from the internal allocation unit's
> + natural alignment.
> +
> +What: /sys/block/<disk>/queue/discard_granularity
> +Date: May 2011
> +Contact: Martin K. Petersen <martin.petersen@oracle.com>
> +Description:
> + Devices that support discard functionality may
> + internally allocate space using units that are bigger
> + than the logical block size. The discard_granularity
> + parameter indicates the size of the internal allocation
> + unit in bytes if reported by the device. Otherwise the
> + discard_granularity will be set to match the device's
> + physical block size. A discard_granularity of 0 means
> + that the device does not support discard functionality.
> +
> +What: /sys/block/<disk>/queue/discard_max_bytes
> +Date: May 2011
> +Contact: Martin K. Petersen <martin.petersen@oracle.com>
> +Description:
> + Devices that support discard functionality may have
> + internal limits on the number of bytes that can be
> + trimmed or unmapped in a single operation. Some storage
> + protocols also have inherent limits on the number of
> + blocks that can be described in a single command. The
> + discard_max_bytes parameter is set by the device driver
> + to the maximum number of bytes that can be discarded in
> + a single operation. Discard requests issued to the
> + device must not exceed this limit. A discard_max_bytes
> + value of 0 means that the device does not support
> + discard functionality.
> +
> +What: /sys/block/<disk>/queue/discard_zeroes_data
> +Date: May 2011
> +Contact: Martin K. Petersen <martin.petersen@oracle.com>
> +Description:
> + Devices that support discard functionality may return
> + stale or random data when a previously discarded block
> + is read back. This can cause problems if the filesystem
> + expects discarded blocks to be explicitly cleared. If a
> + device reports that it deterministically returns zeroes
> + when a discarded area is read the discard_zeroes_data
> + parameter will be set to one. Otherwise it will be 0 and
> + the result of reading a discarded area is undefined.
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 1fa7692..42d3bf5 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -120,7 +120,7 @@ void blk_set_default_limits(struct queue_limits *lim)
> lim->discard_granularity = 0;
> lim->discard_alignment = 0;
> lim->discard_misaligned = 0;
> - lim->discard_zeroes_data = -1;
> + lim->discard_zeroes_data = 1;
> lim->logical_block_size = lim->physical_block_size = lim->io_min = 512;
> lim->bounce_pfn = (unsigned long)(BLK_BOUNCE_ANY >> PAGE_SHIFT);
> lim->alignment_offset = 0;
> @@ -166,6 +166,7 @@ void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn)
>
> blk_set_default_limits(&q->limits);
> blk_queue_max_hw_sectors(q, BLK_SAFE_MAX_SECTORS);
> + q->limits.discard_zeroes_data = 0;
>
> /*
> * by default assume old behaviour and bounce for any highmem page
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index e2f57e9e..30ea95f 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -2138,7 +2138,7 @@ static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf)
> * with the unmap bit set.
> */
> if (ata_id_has_trim(args->id)) {
> - put_unaligned_be32(65535 * 512 / 8, &rbuf[20]);
> + put_unaligned_be64(65535 * 512 / 8, &rbuf[36]);
> put_unaligned_be32(1, &rbuf[28]);
> }
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index bd0806e..cc081dd 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -490,7 +490,8 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
> unsigned int max_blocks = 0;
>
> q->limits.discard_zeroes_data = sdkp->lbprz;
> - q->limits.discard_alignment = sdkp->unmap_alignment;
> + q->limits.discard_alignment = sdkp->unmap_alignment *
> + (logical_block_size >> 9);
> q->limits.discard_granularity =
> max(sdkp->physical_block_size,
> sdkp->unmap_granularity * logical_block_size);
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 2ad95fa..1416e3c 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -257,7 +257,7 @@ struct queue_limits {
> unsigned char misaligned;
> unsigned char discard_misaligned;
> unsigned char cluster;
> - signed char discard_zeroes_data;
> + unsigned char discard_zeroes_data;
> };
>
> struct request_queue
> @@ -1066,13 +1066,16 @@ static inline int queue_limit_discard_alignment(struct queue_limits *lim, sector
> {
> unsigned int alignment = (sector << 9) & (lim->discard_granularity - 1);
>
> + if (!lim->max_discard_sectors)
> + return 0;
> +
> return (lim->discard_granularity + lim->discard_alignment - alignment)
> & (lim->discard_granularity - 1);
> }
>
> static inline unsigned int queue_discard_zeroes_data(struct request_queue *q)
> {
> - if (q->limits.discard_zeroes_data == 1)
> + if (q->limits.max_discard_sectors && q->limits.discard_zeroes_data == 1)
> return 1;
>
> return 0;
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target]
2011-05-18 12:16 ` [linux-lvm] " Mike Snitzer
@ 2011-05-18 12:52 ` Mike Snitzer
0 siblings, 0 replies; 54+ messages in thread
From: Mike Snitzer @ 2011-05-18 12:52 UTC (permalink / raw)
To: Martin K. Petersen
Cc: sandeen, Christoph Hellwig, device-mapper development,
DarkNovaNick, linux-lvm, Lukas Czerner, linux-ext4,
Alasdair G Kergon
On Wed, May 18 2011 at 8:16am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
> Hey Martin,
>
> On Wed, May 04 2011 at 11:10am -0400,
> Martin K. Petersen <martin.petersen@oracle.com> wrote:
>
> > >>>>> "Lukas" == Lukas Czerner <lczerner@redhat.com> writes:
> >
> > Lukas> Nevertheless there is something weird going on, because even when
> > Lukas> I create striped volume I get this:
> >
> > Could you please try the following patch? It has a bunch of small tweaks
> > to the discard stack in it. I'll split it up before posting for real but
> > I'd like to know if it fixes your issue...
>
> Would be ideal to get these fixes staged for 2.6.40.
>
> Do you intend to split these patches up and post for 2.6.40 inclussion?
Ah, I now see you posted them and Jens picked some up.. thanks
Mike
^ permalink raw reply [flat|nested] 54+ messages in thread
end of thread, other threads:[~2011-05-18 12:52 UTC | newest]
Thread overview: 54+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-04-12 14:59 [linux-lvm] Testing TRIM with LVM DarkNovaNick
2011-04-12 23:47 ` Mike Snitzer
2011-04-13 1:47 ` DarkNovaNick
2011-04-13 8:41 ` Zdenek Kabelac
2011-04-13 15:38 ` DarkNovaNick
2011-04-13 22:40 ` [linux-lvm] [PATCH] dm snapshot: add discard support to the snapshot-origin target [was: Re: Testing TRIM with LVM] Mike Snitzer
2011-04-13 23:48 ` [linux-lvm] " Mike Snitzer
2011-04-26 17:32 ` Mike Snitzer
2011-04-28 0:19 ` [linux-lvm] [PATCH] dm snapshot: ignore discards issued to the snapshot-origin target Mike Snitzer
2011-04-28 7:53 ` [linux-lvm] [dm-devel] " Christoph Hellwig
2011-04-28 20:59 ` [linux-lvm] do not disable ext4 discards on first discard failure? [was: Re: dm snapshot: ignore discards issued to the snapshot-origin target] Mike Snitzer
2011-04-28 21:28 ` Eric Sandeen
2011-04-28 22:59 ` Alasdair G Kergon
2011-04-28 23:01 ` Eric Sandeen
2011-04-28 23:11 ` Alasdair G Kergon
2011-04-29 1:12 ` Andreas Dilger
2011-04-29 13:55 ` Mike Snitzer
2011-04-29 9:30 ` Lukas Czerner
2011-04-29 12:24 ` [linux-lvm] [dm-devel] " Alasdair G Kergon
2011-04-29 12:29 ` Christoph Hellwig
2011-04-29 14:28 ` Eric Sandeen
2011-04-29 15:13 ` Ray Morris
2011-05-04 16:33 ` Ted Ts'o
2011-05-04 17:02 ` Lukas Czerner
2011-05-02 7:16 ` Lukas Czerner
2011-05-02 8:13 ` Alasdair G Kergon
2011-05-02 8:19 ` Christoph Hellwig
2011-05-02 10:24 ` Lukas Czerner
2011-05-02 12:48 ` [linux-lvm] " Mike Snitzer
2011-05-02 13:05 ` Lukas Czerner
2011-05-02 14:47 ` Eric Sandeen
2011-05-02 14:48 ` Christoph Hellwig
2011-05-02 14:58 ` Lukas Czerner
2011-05-02 13:48 ` [linux-lvm] [dm-devel] " Martin K. Petersen
2011-05-02 14:20 ` Martin K. Petersen
2011-05-02 14:39 ` Lukas Czerner
2011-05-02 14:50 ` Martin K. Petersen
2011-05-02 14:58 ` [linux-lvm] " Mike Snitzer
2011-05-02 16:58 ` [linux-lvm] [dm-devel] " Martin K. Petersen
2011-05-03 8:57 ` Lukas Czerner
2011-05-04 15:10 ` Martin K. Petersen
2011-05-04 16:02 ` [linux-lvm] " Mike Snitzer
2011-05-04 16:50 ` Martin K. Petersen
2011-05-04 18:03 ` Mike Snitzer
2011-05-04 17:10 ` [linux-lvm] [dm-devel] " Lukas Czerner
2011-05-04 17:32 ` Martin K. Petersen
2011-05-04 17:35 ` Lukas Czerner
2011-05-18 12:16 ` [linux-lvm] " Mike Snitzer
2011-05-18 12:52 ` Mike Snitzer
2011-05-04 15:16 ` [linux-lvm] [dm-devel] " Martin K. Petersen
2011-05-04 16:12 ` Lukas Czerner
2011-05-05 8:33 ` Karel Zak
2011-05-05 10:48 ` Lukas Czerner
2011-04-14 15:31 ` [linux-lvm] [PATCH] dm snapshot: add discard support to the snapshot-origin target [was: Re: Testing TRIM wi DarkNovaNick
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).