Linux block layer
 help / color / mirror / Atom feed
* Re: [PATCH 1/3] xfs/178: don't fail when SCRATCH_DEV contains random xfs superblocks
       [not found] ` <169687551395.3948976.8425812597156927952.stgit@frogsfrogsfrogs>
@ 2023-10-10  7:01   ` Christoph Hellwig
  2023-10-11 19:10     ` Darrick J. Wong
  0 siblings, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2023-10-10  7:01 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: zlang, linux-xfs, fstests, guan, linux-block, linux-scsi

On Mon, Oct 09, 2023 at 11:18:33AM -0700, Darrick J. Wong wrote:
> The storage advertises SCSI UNMAP support, but it is of the variety
> where the UNMAP command returns immediately but takes its time to unmap
> in the background.  Subsequent rereads are allowed to return stale
> contents, per DISCARD semantics.
> 
> When the fstests cloud is not busy, the old contents disappear in a few
> seconds.  However, at peak utilization, there are ~75 VMs running, and
> the storage backend can take several minutes to commit these background
> requests.

Umm, that is not valid behavior fo SCSI UNMAP or any other command
that Linux discard maps to.  All of them can do one of the two options
on a per-block basis:

 - return the unmap pattern (usually but not always 0) for any read
   following the unmap/trim/discard
 - always return the previous pattern until it is overwritten or
   discarded again

Changing the pattern some time after unmap is a grave bug, and we need
to blacklist the device.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/3] xfs/178: don't fail when SCRATCH_DEV contains random xfs superblocks
  2023-10-10  7:01   ` [PATCH 1/3] xfs/178: don't fail when SCRATCH_DEV contains random xfs superblocks Christoph Hellwig
@ 2023-10-11 19:10     ` Darrick J. Wong
  2023-10-12  4:26       ` Christoph Hellwig
  0 siblings, 1 reply; 4+ messages in thread
From: Darrick J. Wong @ 2023-10-11 19:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: zlang, linux-xfs, fstests, guan, linux-block, linux-scsi

On Tue, Oct 10, 2023 at 12:01:33AM -0700, Christoph Hellwig wrote:
> On Mon, Oct 09, 2023 at 11:18:33AM -0700, Darrick J. Wong wrote:
> > The storage advertises SCSI UNMAP support, but it is of the variety
> > where the UNMAP command returns immediately but takes its time to unmap
> > in the background.  Subsequent rereads are allowed to return stale
> > contents, per DISCARD semantics.
> > 
> > When the fstests cloud is not busy, the old contents disappear in a few
> > seconds.  However, at peak utilization, there are ~75 VMs running, and
> > the storage backend can take several minutes to commit these background
> > requests.
> 
> Umm, that is not valid behavior fo SCSI UNMAP or any other command
> that Linux discard maps to.  All of them can do one of the two options
> on a per-block basis:
> 
>  - return the unmap pattern (usually but not always 0) for any read
>    following the unmap/trim/discard
>  - always return the previous pattern until it is overwritten or
>    discarded again
> 
> Changing the pattern some time after unmap is a grave bug, and we need
> to blacklist the device.

Ok, I'll go pester them about fixing that, if they haven't already.
Apparently discard support is somewhat new.

I'm pretty sure I've seen some NVME SSDs where you can issue devicewide
DISCARDs and slowly watch the namespace utilization go down over tens of
minutes; and reads will only eventually start returning zeroes.

(Note that *writes* during the slow-discard period are persisted
correctly.)

However, that's orthogonal to this patch -- if the device doesn't
support discard, _scratch_mkfs won't zero the entire disk to remove old
dead superblocks that might have been written by previous tests.  After
we shatter the primary super, the xfs_repair scanning code can still
trip over those old supers and break the golden output.

--D

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/3] xfs/178: don't fail when SCRATCH_DEV contains random xfs superblocks
  2023-10-11 19:10     ` Darrick J. Wong
@ 2023-10-12  4:26       ` Christoph Hellwig
  2023-10-12  5:03         ` Darrick J. Wong
  0 siblings, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2023-10-12  4:26 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, zlang, linux-xfs, fstests, guan, linux-block,
	linux-scsi

On Wed, Oct 11, 2023 at 12:10:25PM -0700, Darrick J. Wong wrote:
> I'm pretty sure I've seen some NVME SSDs where you can issue devicewide
> DISCARDs and slowly watch the namespace utilization go down over tens of
> minutes; and reads will only eventually start returning zeroes.

Well, the second part is broken.  The first part is fine, and I've briefly
consulted with a firmware team implementing such a feature.  It just needs
to make sure to return zeroes right after the return of the discard
even if the blocks aren't erased yet, including after a powerfail.
(anyone who knows the XFS truncate / hole punch code will have a vague
idea of how that could work).

> However, that's orthogonal to this patch -- if the device doesn't
> support discard, _scratch_mkfs won't zero the entire disk to remove old
> dead superblocks that might have been written by previous tests.  After
> we shatter the primary super, the xfs_repair scanning code can still
> trip over those old supers and break the golden output.

True.  I have to admit I stopped reading the patch after the unmap
description.  I'll take another look.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/3] xfs/178: don't fail when SCRATCH_DEV contains random xfs superblocks
  2023-10-12  4:26       ` Christoph Hellwig
@ 2023-10-12  5:03         ` Darrick J. Wong
  0 siblings, 0 replies; 4+ messages in thread
From: Darrick J. Wong @ 2023-10-12  5:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: zlang, linux-xfs, fstests, guan, linux-block, linux-scsi

On Wed, Oct 11, 2023 at 09:26:51PM -0700, Christoph Hellwig wrote:
> On Wed, Oct 11, 2023 at 12:10:25PM -0700, Darrick J. Wong wrote:
> > I'm pretty sure I've seen some NVME SSDs where you can issue devicewide
> > DISCARDs and slowly watch the namespace utilization go down over tens of
> > minutes; and reads will only eventually start returning zeroes.
> 
> Well, the second part is broken.  The first part is fine, and I've briefly
> consulted with a firmware team implementing such a feature.  It just needs
> to make sure to return zeroes right after the return of the discard
> even if the blocks aren't erased yet, including after a powerfail.
> (anyone who knows the XFS truncate / hole punch code will have a vague
> idea of how that could work).
> 
> > However, that's orthogonal to this patch -- if the device doesn't
> > support discard, _scratch_mkfs won't zero the entire disk to remove old
> > dead superblocks that might have been written by previous tests.  After
> > we shatter the primary super, the xfs_repair scanning code can still
> > trip over those old supers and break the golden output.
> 
> True.  I have to admit I stopped reading the patch after the unmap
> description.  I'll take another look.

<nod> I think I'll update the next version of this patch to substitute
the paragraph that I wrote above for all the misleading ramblings about
DISCARD.  Today's revisit of that clod block device doesn't show the
weird stale reads that disappear behavior, so it's entirely possible
that they've fixed it already.

--D

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-10-12  5:03 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <169687550821.3948976.6892161616008393594.stgit@frogsfrogsfrogs>
     [not found] ` <169687551395.3948976.8425812597156927952.stgit@frogsfrogsfrogs>
2023-10-10  7:01   ` [PATCH 1/3] xfs/178: don't fail when SCRATCH_DEV contains random xfs superblocks Christoph Hellwig
2023-10-11 19:10     ` Darrick J. Wong
2023-10-12  4:26       ` Christoph Hellwig
2023-10-12  5:03         ` Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox