* Re: fstrim on newly created filesystem tries to discard data beyond the last sector of a device
From: Mike Frysinger @ 2014-11-21 21:20 UTC
To: Lutz Vieweg
Cc: util-linux, linux-fsdevel, linux-xfs
On 21 Nov 2014 18:09, Lutz Vieweg wrote:
> The relevant strace output of the above fstrim command:
> > stat("/mnt/PFexp1", {st_mode=S_IFDIR|0755, st_size=6, ...}) = 0
> > open("/mnt/PFexp1", O_RDONLY) = 3
> > ioctl(3, FITRIM, 0x7fff0733a4c0) = -1 EIO (Input/output error)
strace hasn't decoded the ioctl argument (send a patch for that), but my guess
is that you're passing down the default:
	range.start = 0;
	range.minlen = 0;
	range.len = ULLONG_MAX;
in which case the expectation is that the kernel layers will take care of
trimming everything and not die when they hit the end of the device.
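For reference, a minimal Python sketch of that default argument. It assumes the struct fstrim_range layout from <linux/fs.h> (three 64-bit fields: start, len, minlen) and the asm-generic ioctl number encoding used on x86; other architectures may encode the direction bits differently. The helper names are illustrative only, and the sketch builds the buffer and request number without issuing the ioctl.

```python
import struct

ULLONG_MAX = 2**64 - 1


def pack_fstrim_range(start=0, length=ULLONG_MAX, minlen=0):
    """Pack struct fstrim_range { __u64 start; __u64 len; __u64 minlen; }."""
    return struct.pack("=QQQ", start, length, minlen)


def iowr(type_char, nr, size):
    """_IOWR() per asm-generic/ioctl.h: dir(2 bits) | size(14) | type(8) | nr(8)."""
    ioc_read_write = 3
    return (ioc_read_write << 30) | (size << 16) | (ord(type_char) << 8) | nr


# FITRIM is defined as _IOWR('X', 121, struct fstrim_range).
FITRIM = iowr("X", 121, struct.calcsize("=QQQ"))

if __name__ == "__main__":
    print(hex(FITRIM))
    print(pack_fstrim_range().hex())
```

With these defaults the request covers "everything from byte 0 onwards", so any clamping to the real size has to happen below the ioctl.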
-mike
* Re: fstrim on newly created filesystem tries to discard data beyond the last sector of a device
From: Karel Zak @ 2014-11-24 9:23 UTC
To: Lutz Vieweg, util-linux, linux-fsdevel, linux-xfs; +Cc: Lukas Czerner
On Fri, Nov 21, 2014 at 04:20:44PM -0500, Mike Frysinger wrote:
> On 21 Nov 2014 18:09, Lutz Vieweg wrote:
> > The relevant strace output of the above fstrim command:
> > > stat("/mnt/PFexp1", {st_mode=S_IFDIR|0755, st_size=6, ...}) = 0
> > > open("/mnt/PFexp1", O_RDONLY) = 3
> > > ioctl(3, FITRIM, 0x7fff0733a4c0) = -1 EIO (Input/output error)
>
> that hasn't decoded the ioctl (send a patch for that), but guess is that you're
> passing down the default:
> range.start = 0;
> range.minlen = 0;
> range.len = ULLONG_MAX;
>
> in which case the expectation is the kernel layers will take care of trimming
> everything and not die when it hits the end of the device.
Yep, it's fine to specify such a range; see xfs_ioc_trim():
	end = start + BTOBBT(range.len) - 1;
	...
	if (end > XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1)
		end = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1;
It really seems like a kernel issue (CC'ing Lukas).
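The clamping in that excerpt can be modelled with a few lines of Python. This is a simplified sketch, not the kernel code: it assumes XFS "basic blocks" are 512-byte units, takes the start already expressed in basic blocks, and ignores the other validation xfs_ioc_trim() performs. The function name is made up for illustration.

```python
BBSHIFT = 9  # assumed: one XFS basic block = 512 bytes


def clamp_trim_end(start_bb, len_bytes, fs_blocks, fs_block_size):
    """Return the last basic block a trim request may touch after clamping."""
    fs_size_bb = (fs_blocks * fs_block_size) >> BBSHIFT
    end = start_bb + (len_bytes >> BBSHIFT) - 1  # truncating byte-to-BB conversion
    return min(end, fs_size_bb - 1)


if __name__ == "__main__":
    # Default fstrim range on the reported filesystem geometry.
    print(clamp_trim_end(0, 2**64 - 1, 250051158, 4096))
```

For the filesystem in this report (250051158 blocks of 4096 bytes) the default range is clamped to basic block 2000409263, the last 512-byte sector of the filesystem.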
Karel
--
Karel Zak <kzak@redhat.com>
http://karelzak.blogspot.com
* Re: fstrim on newly created filesystem tries to discard data beyond the last sector of a device
From: Lukáš Czerner @ 2014-11-24 12:25 UTC
To: Lutz Vieweg; +Cc: linux-fsdevel, util-linux, linux-xfs
On Fri, 21 Nov 2014, Lutz Vieweg wrote:
> Date: Fri, 21 Nov 2014 18:09:17 +0100
> From: Lutz Vieweg <lvml@5t9.de>
> To: linux-fsdevel@vger.kernel.org
> Cc: util-linux@vger.kernel.org, linux-xfs@oss.sgi.com
> Subject: fstrim on newly created filesystem tries to discard data beyond the
> last sector of a device
>
> I'm experiencing a 100% reproduceable misbehaviour of
> fstrim, which seems to put data integrity on stake:
>
> Whenever I use "fstrim" on a just newly "mkfs.xfs"ed
> filesystem on a newly installed SSD (Crucial_CT1024M550SSD1,
> firmware MU01), I get (after some activity on the device)
> this error message:
> > fitrim ioctl failed: input/output error
>
> Looking into the dmesg output reveals:
> > [1039455.530947] sd 0:0:1:0: [sdb]
> > [1039455.533192] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > [1039455.535369] sd 0:0:1:0: [sdb]
> > [1039455.537521] Sense Key : Illegal Request [current]
> > [1039455.539684] Info fld=0x772cdab0
> > [1039455.541802] sd 0:0:1:0: [sdb]
> > [1039455.543877] Add. Sense: Logical block address out of range
> > [1039455.545966] sd 0:0:1:0: [sdb] CDB:
> > [1039455.548008] Unmap/Read sub-channel: 42 00 00 00 00 00 00 00 18 00
> > [1039455.550080] end_request: critical target error, dev sdb, sector 1999428272
This is very odd. The file system should only send discard requests for
its own free data ranges, never for anything outside the file system.
There might be a bug somewhere in there, but so far I have never seen
this with any SSD or other discard-capable device.
Can you please try to reproduce the problem with a loop device?
# truncate -s1T /path/to/new/file
# losetup --show -f /path/to/new/file
(this will print out the new loop device for example /dev/loop0)
# mkfs.ext4 /dev/loop0
# mount /dev/loop0 /mount/point
# fstrim -v /mount/point
Do you see any errors, or does it succeed?
Another thing to try is to rule out the file system entirely. Can
you run blkdiscard on the SSD device directly?
# blkdiscard /dev/sdb
# sync
# blkdiscard /dev/sdb
Why twice? Because I've seen devices behave weirdly after receiving
a bunch of discard commands, and mkfs itself attempts to discard the
device before it creates the file system on top of it.
Speaking of which, can you try to reproduce your problem with mkfs
discard turned off?
mkfs.ext4 -E nodiscard ...
mkfs.xfs -K ...
Does it make any difference?
>
> (I bought 4 of the same SSD model, and the error occurs the same with
> the other exemplars, so I can assume this is not some hardware issue.)
This might still very well be a firmware issue, precisely because you
have 4 identical devices, presumably all running the same firmware.
Now, the sector that is reported as "out of range" actually seems to
be well within range of the file system. From the mkfs.xfs output I
can see that the file system has 250051158 blocks of 4096 bytes, which
is 1024209543168 bytes. The sector mentioned in the error output is
1999428272, i.e. byte offset 1999428272 * 512 = 1023707275264, which
is within the file system. According to the data from /proc/partitions,
it is also within the device as a whole.
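The arithmetic above can be re-checked with a short Python sketch. The numbers are taken from the mkfs.xfs, dmesg and /proc/partitions output in the report; the only assumption is that /proc/partitions counts 1 KiB blocks and that dmesg reports 512-byte sectors.

```python
SECTOR_SIZE = 512

fs_bytes = 250051158 * 4096               # blocks * bsize from mkfs.xfs
dev_bytes = 1000204632 * 1024             # #blocks from /proc/partitions (1 KiB units)
failed_offset = 1999428272 * SECTOR_SIZE  # sector from the end_request line


def in_range(offset, size):
    """True if the byte offset lies inside an object of the given size."""
    return 0 <= offset < size


if __name__ == "__main__":
    print(fs_bytes, dev_bytes, failed_offset)
    print(in_range(failed_offset, fs_bytes), in_range(failed_offset, dev_bytes))
```

Both sizes come out identical to the capacity smartctl reports, and the failing offset lies below both of them.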
I can see that the device reports a 4096-byte physical sector size, so
there might be a bug regarding 4k physical sectors somewhere in the
block layer or a driver?
>
> The "Logical block address out of range" error says no less than that
> fstrim issued a fitrim ioctl that was asking the device to discard the
> content of sectors well beyond the boundaries of the device. If it
> wasn't for the "end of the physical device" making the SSD return an error,
> if instead there was another partition behind a filesystem to trim, then
> valuable, live data would have been discarded.
>
> I've tried the same with ext4 instead of XFS, and the very same
> error occurs, just with a slightly different sector being named
> by the dmesg error output:
> > [710565.947608] end_request: critical target error, dev sdb, sector 2000158720
>
>
> Here's a list of properties of the system that might be
> relevant for the issue:
>
> According to smartctl, the capacity of this SSD is:
> > User Capacity: 1,024,209,543,168 bytes [1.02 TB]
> > Sector Sizes: 512 bytes logical, 4096 bytes physical
>
> And cat /proc/partitions tells:
> > major minor  #blocks     name
> >     8    16  1000204632  sdb
>
> Kernel is mainline linux-3.17.1
>
> fstrim --version says:
> > fstrim from util-linux 2.23.2
>
> Distribution is CentOS 7.
>
> mkfs.xfs -V says:
> > mkfs.xfs version 3.2.0-alpha2
> rpm -qif /usr/sbin/mkfs.xfs
> > Name : xfsprogs
> > Version : 3.2.0
> > Release : 0.10.alpha2.el7
>
> (Should I be concerned that CentOS 7 comes with a mkfs.xfs
> version having an -alpha2 suffix?)
>
> The filesystem is created with:
> > mkfs.xfs -l lazy-count=1 -f /dev/sdb
> > meta-data=/dev/sdb               isize=256    agcount=4, agsize=62512790 blks
> >          =                       sectsz=4096  attr=2, projid32bit=1
> >          =                       crc=0
> > data     =                       bsize=4096   blocks=250051158, imaxpct=25
> >          =                       sunit=0      swidth=0 blks
> > naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
> > log      =internal log           bsize=4096   blocks=122095, version=2
> >          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> > realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> The filesystem is mounted with:
> > mount /dev/sdb /mnt/PFexp1
>
> fstrim was started this way:
> > > fstrim -v /mnt/PFexp1
> > fstrim: /mnt/PFexp1: FITRIM ioctl failed: Input/output error
>
> The relevant strace output of the above fstrim command:
> > stat("/mnt/PFexp1", {st_mode=S_IFDIR|0755, st_size=6, ...}) = 0
> > open("/mnt/PFexp1", O_RDONLY) = 3
> > ioctl(3, FITRIM, 0x7fff0733a4c0) = -1 EIO (Input/output error)
>
> Any idea why that happenes?
> Do we need to fear a loss of data when using fstrim in general?
No, you definitely should not be. While some bugs might appear, we
have extensive test cases to catch them. In fact, while there have
been several bugs in the file system fstrim implementations, AFAIK
none of them was ever a data loss scenario. So far I do not believe
this is the case here either, but we'll have to investigate first.
Thanks!
-Lukas
>
> Regards,
>
> Lutz Vieweg
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
* Re: fstrim on newly created filesystem tries to discard data beyond the last sector of a device
From: Lutz Vieweg @ 2014-11-24 19:30 UTC
To: linux-fsdevel; +Cc: util-linux, linux-xfs
On 11/24/2014 01:25 PM, Lukáš Czerner wrote:
> Can you please try to reproduce the problem with the loop device ?
>
> # truncate -s1T /path/to/new/file
> # losetup --show -f /path/to/new/file
> (this will print out the new loop device for example /dev/loop0)
>
> # mkfs.ext4 /dev/loop0
> # mount /dev/loop0 /mount/point
> # fstrim -v /mount/point
>
> Can you see any errors or will it succeed ?
I see no errors when doing this. (But then again, do we know whether
the loop device code would complain about a discard beyond its end?)
> Now another thing to try is rule out the file system entirely. Can
> you try to run blkdiscard on the ssd device directly ?
>
> # blkdiscard /dev/sdb
This indeed also reliably triggers an Input/Output error:
>> blkdiscard -v /dev/sdb
> blkdiscard: /dev/sdb: BLKDISCARD ioctl failed: Input/output error
> [971965.901014] sd 0:0:1:0: [sdb]
> [971965.902856] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [971965.904654] sd 0:0:1:0: [sdb]
> [971965.906422] Sense Key : Illegal Request [current]
> [971965.908182] Info fld=0x76fff120
> [971965.909928] sd 0:0:1:0: [sdb]
> [971965.911659] Add. Sense: Logical block address out of range
> [971965.913402] sd 0:0:1:0: [sdb] CDB:
> [971965.915136] Unmap/Read sub-channel: 42 00 00 00 00 00 00 00 18 00
> [971965.916936] end_request: critical target error, dev sdb, sector 1996484896
The relevant associated part of strace output:
> 13230 stat("/dev/sdb", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 16), ...}) = 0
> 13230 open("/dev/sdb", O_WRONLY) = 3
> 13230 ioctl(3, BLKGETSIZE64, 1024209543168) = 0
> 13230 ioctl(3, BLKSSZGET, 512) = 0
> 13230 ioctl(3, BLKDISCARD, {0, 7fffa8b8dd10}) = -1 EIO (Input/output error)
Since the issue also occurred with both xfs and ext4, I think we can be
sure now it's not a bug in a filesystem that triggers it.
> Now looking at the sector that seems to be "out of range" seems to
> be actually well in range of the file system. From the mkfs.xfs
> output I can see that the file system has 250051158 blocks of 4096
> Bytes which is 1024209543168 Bytes. Now the sector mentioned in that
> error output is 1999428272 which is (1999428272 * 512 =
> 1023707275264) which is in range of the file system. According the
> data from /proc/partitions it is also true for the entire device.
I could envision that block discarding happens in larger chunks
(certainly issuing fewer than "one TRIM command per 4k"), so maybe some
coarser granularity of such chunks causes the end of the last chunk
to extend beyond the end of the device?
Of course this is speculation - is there a way to tell which
size the last/failed TRIM command actually intended to discard?
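As a partial answer, here is a small Python sketch that decodes what the dmesg lines do contain. It assumes the 10-byte SCSI UNMAP CDB layout (opcode 0x42, parameter list length in bytes 7-8) and the UNMAP parameter format of an 8-byte header followed by 16-byte block descriptors. The LBA and block count of the request travel in that parameter data, which dmesg does not print, so the intended size cannot be read from the CDB itself.

```python
def decode_unmap_cdb(hex_bytes):
    """Return (parameter_list_length, descriptor_count) for an UNMAP CDB."""
    cdb = bytes.fromhex(hex_bytes)
    if len(cdb) != 10 or cdb[0] != 0x42:
        raise ValueError("not a 10-byte UNMAP CDB")
    param_len = int.from_bytes(cdb[7:9], "big")
    # Assumed layout: 8-byte header, then one 16-byte descriptor per range.
    descriptors = max(param_len - 8, 0) // 16
    return param_len, descriptors


def info_field_to_lba(info_hex):
    """Convert the 'Info fld' sense value to a decimal LBA."""
    return int(info_hex, 16)


if __name__ == "__main__":
    print(decode_unmap_cdb("42 00 00 00 00 00 00 00 18 00"))
    print(info_field_to_lba("0x76fff120"))
```

Under these assumptions the logged CDB carries 24 bytes of parameter data, i.e. a single range descriptor, and the Info field matches the sector number in the end_request line. The length of that range would have to come from a block trace.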
> I can see that the device reports 4096 physical sector size so it
> might be that there is a bug regarding 4k physical sector size
> somewhere in block layer or a driver ?
That could certainly be relevant for branching into a buggy code path.
Then there's another idea: the device is a SATA SSD, but attached
to a SAS2 expander chip on the backplane of the server (LSI SAS2X28),
which in turn is connected to an LSI SAS HBA 9207-4i4e.
Could the TRIM command maybe, just maybe, be modified wrongly
on its way through these devices or their respective drivers?
>> Do we need to fear a loss of data when using fstrim in general?
>
> No you definitely should not be. While some bugs might appear we
> have extensive test cases to catch that. In fact while there has
> been several bugs in the file system fstrim implementation AFAIK it
> was never data loss scenario. And so far I do not believe this is
> the case here either, but we'll have to investigate first.
I was thinking about how I could setup a proof-of-concept scenario
where the effect actually discards valid data.
I tried creating two partitions on the device, one big covering
most of the SSD, one very small at its end, like:
>    Device Boot       Start         End      Blocks  Id  System
> /dev/sdb1             2048  2000409247  1000203600  83  Linux
> /dev/sdb2       2000409248  2000409263           8  83  Linux
I did this for several sizes of sdb2, not just Blocks=8.
Then I did:
> dd if=/dev/urandom of=/dev/sdb2 bs=512 oflag=direct
> dd if=/dev/sdb2 bs=512 iflag=direct | md5sum
> blkdiscard -v /dev/sdb1
> sync
> dd if=/dev/sdb2 bs=512 iflag=direct | md5sum
... and checked whether the md5sum result was still the same.
The good news is, in no case, when using partitions, would
the blkdiscard /dev/sdb1 command trigger an I/O error, and in all cases
the MD5 sums were the same.
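The same before/after comparison can be scripted. The following Python sketch is an illustration only: it hashes a path in 512-byte reads, runs an arbitrary action, and hashes again. The function names are invented here, the demo uses a temporary file instead of a real partition, and unlike the dd commands above it does not use direct I/O.

```python
import hashlib
import os
import tempfile


def md5_of(path, chunk_size=512):
    """MD5 of a file or block device, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unchanged_after(path, action):
    """Run action() and report whether the content of path survived it."""
    before = md5_of(path)
    action()
    return md5_of(path) == before


def demo():
    """Stand-in for the sdb2 check: 16 random sectors, no-op 'discard'."""
    with tempfile.TemporaryDirectory() as tmp:
        neighbour = os.path.join(tmp, "neighbour.img")
        with open(neighbour, "wb") as f:
            f.write(os.urandom(16 * 512))
        return unchanged_after(neighbour, lambda: None)
```

On a real system the action would invoke blkdiscard on the neighbouring partition followed by sync, and the path would be the small partition being watched.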
The bad news is: blkdiscard on /dev/sdb2 consistently triggers the Input/output error:
> blkdiscard -v /dev/sdb2
> blkdiscard: /dev/sdb2: BLKDISCARD ioctl failed: Input/output error
Strange, what might be so different when discarding at the end of
the physical device?
Regards,
Lutz Vieweg
* Re: fstrim on newly created filesystem tries to discard data beyond the last sector of a device
From: Dave Chinner @ 2014-11-24 21:24 UTC
To: Lutz Vieweg
Cc: linux-fsdevel, util-linux, linux-xfs
On Fri, Nov 21, 2014 at 06:09:17PM +0100, Lutz Vieweg wrote:
> I'm experiencing a 100% reproduceable misbehaviour of
> fstrim, which seems to put data integrity on stake:
>
> Whenever I use "fstrim" on a just newly "mkfs.xfs"ed
> filesystem on a newly installed SSD (Crucial_CT1024M550SSD1,
> firmware MU01), I get (after some activity on the device)
> this error message:
> > fitrim ioctl failed: input/output error
>
> Looking into the dmesg output reveals:
> > [1039455.530947] sd 0:0:1:0: [sdb]
> > [1039455.533192] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > [1039455.535369] sd 0:0:1:0: [sdb]
> > [1039455.537521] Sense Key : Illegal Request [current]
> > [1039455.539684] Info fld=0x772cdab0
> > [1039455.541802] sd 0:0:1:0: [sdb]
> > [1039455.543877] Add. Sense: Logical block address out of range
> > [1039455.545966] sd 0:0:1:0: [sdb] CDB:
> > [1039455.548008] Unmap/Read sub-channel: 42 00 00 00 00 00 00 00 18 00
> > [1039455.550080] end_request: critical target error, dev sdb, sector 1999428272
So, that's a sector well within the advertised size of the device.
> (I bought 4 of the same SSD model, and the error occurs the same with
> the other exemplars, so I can assume this is not some hardware issue.)
Oh, I wouldn't bet on it. Very likely this is a firmware bug,
because...
> I've tried the same with ext4 instead of XFS, and the very same
> error occurs, just with a slightly different sector being named
> by the dmesg error output:
> > [710565.947608] end_request: critical target error, dev sdb, sector 2000158720
Even that is supposed to be within the device range.
> Here's a list of properties of the system that might be
> relevant for the issue:
>
> According to smartctl, the capacity of this SSD is:
> > User Capacity: 1,024,209,543,168 bytes [1.02 TB]
> > Sector Sizes: 512 bytes logical, 4096 bytes physical
They make 512e SSDs now? I haven't seen one of them before. Anyway,
for a device of that size the number of logical sectors is
2000409264, which means the above errors are 500MB and 128MB from
the end of the device, respectively.
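Those distances follow directly from the reported capacity. A quick Python sketch, assuming 512-byte logical sectors as reported by smartctl (the function name is illustrative):

```python
CAPACITY_BYTES = 1024209543168  # "User Capacity" from smartctl
SECTOR_SIZE = 512               # logical sector size from smartctl


def bytes_from_end(failed_sector, capacity=CAPACITY_BYTES, sector=SECTOR_SIZE):
    """Distance in bytes between a failing sector and the end of the device."""
    total_sectors = capacity // sector
    return (total_sectors - failed_sector) * sector


if __name__ == "__main__":
    print(CAPACITY_BYTES // SECTOR_SIZE)
    for failed in (1999428272, 2000158720):  # XFS and ext4 runs
        print(failed, bytes_from_end(failed))
```

The two failing sectors sit 502267904 and 128278528 bytes short of the end, i.e. roughly the 500MB and 128MB quoted above.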
> And cat /proc/partitions tells:
> > major minor  #blocks     name
> >     8    16  1000204632  sdb
They are also well within the end of the device as advertised by the
kernel. This doesn't look like a filesystem or kernel issue, though
you can rule that out completely with a block trace that will show
us exactly what IO errored out...
> Do we need to fear a loss of data when using fstrim in general?
In general, from a kernel perspective, no. However, from a "does my
hardware work correctly?" perspective, we have come across lots of
devices/firmwares with broken TRIM implementations over the years.
I'd suggest you upgrade your drive to the latest firmware before
testing it again...
Cheers,
Dave.
--
Dave Chinner
david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org