All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, Joe Thornber <thornber@redhat.com>,
	Sarthak Kukreti <sarthakkukreti@chromium.org>,
	dm-devel@redhat.com, "Michael S. Tsirkin" <mst@redhat.com>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Jason Wang <jasowang@redhat.com>,
	Bart Van Assche <bvanassche@google.com>,
	linux-kernel@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	Joe Thornber <ejt@redhat.com>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	linux-fsdevel@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
	linux-ext4@vger.kernel.org, Brian Foster <bfoster@redhat.com>,
	Alasdair Kergon <agk@redhat.com>
Subject: Re: [dm-devel] [PATCH v7 0/5] Introduce provisioning primitives
Date: Wed, 7 Jun 2023 19:50:25 -0400	[thread overview]
Message-ID: <ZIEXwTd17z0iYW4s@redhat.com> (raw)
In-Reply-To: <ZH/k9ss2Cg9HYrEV@dread.disaster.area>

On Tue, Jun 06 2023 at 10:01P -0400,
Dave Chinner <david@fromorbit.com> wrote:

> On Sat, Jun 03, 2023 at 11:57:48AM -0400, Mike Snitzer wrote:
> > On Fri, Jun 02 2023 at  8:52P -0400,
> > Dave Chinner <david@fromorbit.com> wrote:
> > 
> > > Mike, I think you might have misunderstood what I have been proposing.
> > > Possibly unintentionally, I didn't call it REQ_OP_PROVISION but
> > > that's what I intended - the operation does not contain data at all.
> > > It's an operation like REQ_OP_DISCARD or REQ_OP_WRITE_ZEROS - it
> > > contains a range of sectors that need to be provisioned (or
> > > discarded), and nothing else.
> > 
> > No, I understood that.
> > 
> > > The write IOs themselves are not tagged with anything special at all.
> > 
> > I know, but I've been looking at how to also handle the delalloc
> > usecase (and yes I know you feel it doesn't need handling, the issue
> > is XFS does deal nicely with ensuring it has space when it tracks its
> > allocations on "thick" storage
> 
> Oh, no it doesn't. It -works for most cases-, but that does not mean
> it provides any guarantees at all. We can still get ENOSPC for user
> data when delayed allocation reservations "run out".
> 
> This may be news to you, but the ephemeral XFS delayed allocation
> space reservation is not accurate. It contains a "fudge factor"
> called "indirect length". This is a "wet finger in the wind"
> estimation of how much new metadata will need to be allocated to
> index the physical allocations when they are made. It assumes large
> data extents are allocated, which is good enough for most cases, but
> it is no guarantee when there are no large data extents available to
> allocate (e.g. near ENOSPC!).
> 
> And therein lies the fundamental problem with ephemeral range
> reservations: at the time of reservation, we don't know how many
> individual physical LBA ranges the reserved data range is actually
> going to span.
> 
> As a result, XFS delalloc reservations are a "close-but-not-quite"
> reservation backed by a global reserve pool that can be dipped into
> if we run out of delalloc reservation. If the reserve pool is then
> fully depleted before all delalloc conversion completes, we'll still
> give ENOSPC. The pool is sized such that the vast majority of
> workloads will complete delalloc conversion successfully before the
> pool is depleted.
> 
> Hence XFS gives everyone the -appearance- that it deals nicely with
> ENOSPC conditions, but it never provides a -guarantee- that any
> accepted write will always succeed without ENOSPC.
> 
> IMO, using this "close-but-not-quite" reservation as the basis of
> space requirements for other layers to provide "won't ENOSPC"
> guarantees is fraught with problems. We already know that it is
> insufficient in important corner cases at the filesystem level, and
> we also know that lower layers trying to do ephemeral space
> reservations will have exactly the same problems providing a
> guarantee. And these are problems we've been unable to engineer
> around in the past, so the likelihood we can engineer around them
> now or in the future is also very unlikely.

Thanks for clarifying. Wasn't aware of XFS delalloc's "wet finger in
the air" ;)

So do you think it reasonable to require applications to fallocate
their data files? Unaware if users are aware to take that extra step.

> > -- so adding coordination between XFS
> > and dm-thin layers provides comparable safety.. that safety is an
> > expected norm).
> >
> > But rather than discuss in terms of data vs metadata, the distinction
> > is:
> > 1) LBA range reservation (normal case, your proposal)
> > 2) non-LBA reservation (absolute value, LBA range is known at later stage)
> > 
> > But I'm clearly going off script for dwelling on wanting to handle
> > both.
> 
> Right, because if we do 1) then we don't need 2). :)

Sure.

> > My looking at (ab)using REQ_META being set (use 1) vs not (use 2) was
> > a crude simplification for branching between the 2 approaches.
> > 
> > And I understand I made you nervous by expanding the scope to a much
> > more muddled/shitty interface. ;)
> 
> Nervous? No, I'm simply trying to make sure that everyone is on the
> same page. i.e. that if we water down the guarantee that 1) relies
> on, then it's not actually useful to filesystems at all.

Yeah, makes sense.
 
> > > Put simply: if we restrict REQ_OP_PROVISION guarantees to just
> > > REQ_META writes (or any other specific type of write operation) then
> > > it's simply not worth persuing at the filesystem level because the
> > > guarantees we actually need just aren't there and the complexity of
> > > discovering and handling those corner cases just isn't worth the
> > > effort.
> > 
> > Here is where I get to say: I think you misunderstood me (but it was
> > my fault for not being absolutely clear: I'm very much on the same
> > page as you and Joe; and your visions need to just be implemented
> > ASAP).
> 
> OK, good that we've clarified the misunderstandings on both sides
> quickly :)

Do you think you're OK to scope out, and/or implement, the XFS changes
if you use v7 of this patchset as the starting point? (v8 should just
be v7 minus the dm-thin.c and dm-snap.c changes).  The thinp
support in v7 will work enough to allow XFS to issue REQ_OP_PROVISION
and/or fallocate (via mkfs.xfs) to dm-thin devices.

And Joe and I can make independent progress on the dm-thin.c changes
needed to ensure the REQ_OP_PROVISION gaurantee you need.

Thanks,
Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


WARNING: multiple messages have this Message-ID (diff)
From: Mike Snitzer <snitzer@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	Christoph Hellwig <hch@infradead.org>,
	Joe Thornber <thornber@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Jason Wang <jasowang@redhat.com>,
	Bart Van Assche <bvanassche@google.com>,
	linux-kernel@vger.kernel.org, Joe Thornber <ejt@redhat.com>,
	linux-block@vger.kernel.org, dm-devel@redhat.com,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Sarthak Kukreti <sarthakkukreti@chromium.org>,
	linux-fsdevel@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
	linux-ext4@vger.kernel.org, Brian Foster <bfoster@redhat.com>,
	Alasdair Kergon <agk@redhat.com>
Subject: Re: [PATCH v7 0/5] Introduce provisioning primitives
Date: Wed, 7 Jun 2023 19:50:25 -0400	[thread overview]
Message-ID: <ZIEXwTd17z0iYW4s@redhat.com> (raw)
In-Reply-To: <ZH/k9ss2Cg9HYrEV@dread.disaster.area>

On Tue, Jun 06 2023 at 10:01P -0400,
Dave Chinner <david@fromorbit.com> wrote:

> On Sat, Jun 03, 2023 at 11:57:48AM -0400, Mike Snitzer wrote:
> > On Fri, Jun 02 2023 at  8:52P -0400,
> > Dave Chinner <david@fromorbit.com> wrote:
> > 
> > > Mike, I think you might have misunderstood what I have been proposing.
> > > Possibly unintentionally, I didn't call it REQ_OP_PROVISION but
> > > that's what I intended - the operation does not contain data at all.
> > > It's an operation like REQ_OP_DISCARD or REQ_OP_WRITE_ZEROS - it
> > > contains a range of sectors that need to be provisioned (or
> > > discarded), and nothing else.
> > 
> > No, I understood that.
> > 
> > > The write IOs themselves are not tagged with anything special at all.
> > 
> > I know, but I've been looking at how to also handle the delalloc
> > usecase (and yes I know you feel it doesn't need handling, the issue
> > is XFS does deal nicely with ensuring it has space when it tracks its
> > allocations on "thick" storage
> 
> Oh, no it doesn't. It -works for most cases-, but that does not mean
> it provides any guarantees at all. We can still get ENOSPC for user
> data when delayed allocation reservations "run out".
> 
> This may be news to you, but the ephemeral XFS delayed allocation
> space reservation is not accurate. It contains a "fudge factor"
> called "indirect length". This is a "wet finger in the wind"
> estimation of how much new metadata will need to be allocated to
> index the physical allocations when they are made. It assumes large
> data extents are allocated, which is good enough for most cases, but
> it is no guarantee when there are no large data extents available to
> allocate (e.g. near ENOSPC!).
> 
> And therein lies the fundamental problem with ephemeral range
> reservations: at the time of reservation, we don't know how many
> individual physical LBA ranges the reserved data range is actually
> going to span.
> 
> As a result, XFS delalloc reservations are a "close-but-not-quite"
> reservation backed by a global reserve pool that can be dipped into
> if we run out of delalloc reservation. If the reserve pool is then
> fully depleted before all delalloc conversion completes, we'll still
> give ENOSPC. The pool is sized such that the vast majority of
> workloads will complete delalloc conversion successfully before the
> pool is depleted.
> 
> Hence XFS gives everyone the -appearance- that it deals nicely with
> ENOSPC conditions, but it never provides a -guarantee- that any
> accepted write will always succeed without ENOSPC.
> 
> IMO, using this "close-but-not-quite" reservation as the basis of
> space requirements for other layers to provide "won't ENOSPC"
> guarantees is fraught with problems. We already know that it is
> insufficient in important corner cases at the filesystem level, and
> we also know that lower layers trying to do ephemeral space
> reservations will have exactly the same problems providing a
> guarantee. And these are problems we've been unable to engineer
> around in the past, so the likelihood we can engineer around them
> now or in the future is also very unlikely.

Thanks for clarifying. Wasn't aware of XFS delalloc's "wet finger in
the air" ;)

So do you think it reasonable to require applications to fallocate
their data files? Unaware if users are aware to take that extra step.

> > -- so adding coordination between XFS
> > and dm-thin layers provides comparable safety.. that safety is an
> > expected norm).
> >
> > But rather than discuss in terms of data vs metadata, the distinction
> > is:
> > 1) LBA range reservation (normal case, your proposal)
> > 2) non-LBA reservation (absolute value, LBA range is known at later stage)
> > 
> > But I'm clearly going off script for dwelling on wanting to handle
> > both.
> 
> Right, because if we do 1) then we don't need 2). :)

Sure.

> > My looking at (ab)using REQ_META being set (use 1) vs not (use 2) was
> > a crude simplification for branching between the 2 approaches.
> > 
> > And I understand I made you nervous by expanding the scope to a much
> > more muddled/shitty interface. ;)
> 
> Nervous? No, I'm simply trying to make sure that everyone is on the
> same page. i.e. that if we water down the guarantee that 1) relies
> on, then it's not actually useful to filesystems at all.

Yeah, makes sense.
 
> > > Put simply: if we restrict REQ_OP_PROVISION guarantees to just
> > > REQ_META writes (or any other specific type of write operation) then
> > > it's simply not worth persuing at the filesystem level because the
> > > guarantees we actually need just aren't there and the complexity of
> > > discovering and handling those corner cases just isn't worth the
> > > effort.
> > 
> > Here is where I get to say: I think you misunderstood me (but it was
> > my fault for not being absolutely clear: I'm very much on the same
> > page as you and Joe; and your visions need to just be implemented
> > ASAP).
> 
> OK, good that we've clarified the misunderstandings on both sides
> quickly :)

Do you think you're OK to scope out, and/or implement, the XFS changes
if you use v7 of this patchset as the starting point? (v8 should just
be v7 minus the dm-thin.c and dm-snap.c changes).  The thinp
support in v7 will work enough to allow XFS to issue REQ_OP_PROVISION
and/or fallocate (via mkfs.xfs) to dm-thin devices.

And Joe and I can make independent progress on the dm-thin.c changes
needed to ensure the REQ_OP_PROVISION gaurantee you need.

Thanks,
Mike

  reply	other threads:[~2023-06-07 23:50 UTC|newest]

Thread overview: 107+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-18 22:33 [dm-devel] [PATCH v7 0/5] Introduce provisioning primitives Sarthak Kukreti
2023-05-18 22:33 ` Sarthak Kukreti
2023-05-18 22:33 ` [dm-devel] [PATCH v7 1/5] block: Don't invalidate pagecache for invalid falloc modes Sarthak Kukreti
2023-05-18 22:33   ` Sarthak Kukreti
2023-05-19  4:09   ` [dm-devel] " Christoph Hellwig
2023-05-19  4:09     ` Christoph Hellwig
2023-05-19 15:17   ` [dm-devel] " Darrick J. Wong
2023-05-19 15:17     ` Darrick J. Wong
2023-05-18 22:33 ` [dm-devel] [PATCH v7 2/5] block: Introduce provisioning primitives Sarthak Kukreti
2023-05-18 22:33   ` Sarthak Kukreti
2023-05-19  4:18   ` [dm-devel] " Christoph Hellwig
2023-05-19  4:18     ` Christoph Hellwig
2023-06-09 20:00   ` [dm-devel] " Mike Snitzer
2023-06-09 20:00     ` Mike Snitzer
2023-05-18 22:33 ` [dm-devel] [PATCH v7 3/5] dm: Add block provisioning support Sarthak Kukreti
2023-05-18 22:33   ` Sarthak Kukreti
2023-05-18 22:33 ` [dm-devel] [PATCH v7 4/5] dm-thin: Add REQ_OP_PROVISION support Sarthak Kukreti
2023-05-18 22:33   ` Sarthak Kukreti
2023-05-19 15:23   ` [dm-devel] " Mike Snitzer
2023-05-19 15:23     ` Mike Snitzer
2023-06-08 21:24     ` [dm-devel] " Mike Snitzer
2023-06-08 21:24       ` Mike Snitzer
2023-06-09  0:28       ` [dm-devel] " Mike Snitzer
2023-06-09  0:28         ` Mike Snitzer
2023-05-18 22:33 ` [dm-devel] [PATCH v7 5/5] loop: Add support for provision requests Sarthak Kukreti
2023-05-18 22:33   ` Sarthak Kukreti
2023-05-22 16:37   ` [dm-devel] " Darrick J. Wong
2023-05-22 16:37     ` Darrick J. Wong
2023-05-22 22:09     ` Sarthak Kukreti
2023-05-22 22:09       ` Sarthak Kukreti
2023-05-23  1:22       ` [dm-devel] " Darrick J. Wong
2023-05-23  1:22         ` Darrick J. Wong
2023-10-07  1:29         ` [dm-devel] " Sarthak Kukreti
2023-10-07  1:29           ` Sarthak Kukreti
2023-05-19  4:09 ` [dm-devel] [PATCH v7 0/5] Introduce provisioning primitives Christoph Hellwig
2023-05-19  4:09   ` Christoph Hellwig
2023-05-19 14:41   ` [dm-devel] " Mike Snitzer
2023-05-19 14:41     ` Mike Snitzer
2023-05-19 23:07     ` [dm-devel] " Dave Chinner
2023-05-19 23:07       ` Dave Chinner
2023-05-22 18:27       ` [dm-devel] " Mike Snitzer
2023-05-22 18:27         ` Mike Snitzer
2023-05-23 14:05         ` [dm-devel] " Brian Foster
2023-05-23 14:05           ` Brian Foster
2023-05-23 15:26           ` [dm-devel] " Mike Snitzer
2023-05-23 15:26             ` Mike Snitzer
2023-05-24  0:40             ` [dm-devel] " Dave Chinner
2023-05-24  0:40               ` Dave Chinner
2023-05-24 20:02               ` [dm-devel] " Mike Snitzer
2023-05-24 20:02                 ` Mike Snitzer
2023-05-25 11:39                 ` [dm-devel] " Dave Chinner
2023-05-25 11:39                   ` Dave Chinner
2023-05-25 16:00                   ` [dm-devel] " Mike Snitzer
2023-05-25 16:00                     ` Mike Snitzer
2023-05-25 22:47                     ` [dm-devel] " Sarthak Kukreti
2023-05-25 22:47                       ` Sarthak Kukreti
2023-05-26  1:36                       ` [dm-devel] " Dave Chinner
2023-05-26  1:36                         ` Dave Chinner
2023-05-26  2:35                         ` [dm-devel] " Sarthak Kukreti
2023-05-26  2:35                           ` Sarthak Kukreti
2023-05-26 15:56                           ` [dm-devel] " Brian Foster
2023-05-26 15:56                             ` Brian Foster
2023-05-25 16:19               ` [dm-devel] " Brian Foster
2023-05-25 16:19                 ` Brian Foster
2023-05-26  9:37                 ` [dm-devel] " Dave Chinner
2023-05-26  9:37                   ` Dave Chinner
2023-05-26 11:04                   ` [dm-devel] " Joe Thornber
2023-05-26 23:45                     ` Dave Chinner
2023-05-26 23:45                       ` Dave Chinner
2023-05-30  7:27                       ` [dm-devel] " Joe Thornber
2023-05-30 14:02                         ` Mike Snitzer
2023-05-30 14:02                           ` Mike Snitzer
2023-05-30 14:55                           ` [dm-devel] " Joe Thornber
2023-05-30 15:28                             ` Mike Snitzer
2023-05-30 15:28                               ` Mike Snitzer
2023-06-02 18:44                               ` [dm-devel] " Sarthak Kukreti
2023-06-02 18:44                                 ` Sarthak Kukreti
2023-06-02 21:50                                 ` [dm-devel] " Mike Snitzer
2023-06-02 21:50                                   ` Mike Snitzer
2023-06-03  0:52                                 ` [dm-devel] " Dave Chinner
2023-06-03  0:52                                   ` Dave Chinner
2023-06-03 15:57                                   ` [dm-devel] " Mike Snitzer
2023-06-03 15:57                                     ` Mike Snitzer
2023-06-05 21:14                                     ` [dm-devel] " Sarthak Kukreti
2023-06-05 21:14                                       ` Sarthak Kukreti
2023-06-07  2:15                                       ` [dm-devel] " Dave Chinner
2023-06-07  2:15                                         ` Dave Chinner
2023-06-07 23:27                                       ` [dm-devel] " Mike Snitzer
2023-06-07 23:27                                         ` Mike Snitzer
2023-06-09 20:31                                         ` [dm-devel] " Mike Snitzer
2023-06-09 20:31                                           ` Mike Snitzer
2023-06-09 21:54                                           ` [dm-devel] " Dave Chinner
2023-06-09 21:54                                             ` Dave Chinner
2023-10-07  1:30                                           ` [dm-devel] " Sarthak Kukreti
2023-10-07  1:30                                             ` Sarthak Kukreti
2023-06-07  2:01                                     ` [dm-devel] " Dave Chinner
2023-06-07  2:01                                       ` Dave Chinner
2023-06-07 23:50                                       ` Mike Snitzer [this message]
2023-06-07 23:50                                         ` Mike Snitzer
2023-06-09  3:32                                         ` [dm-devel] " Dave Chinner
2023-06-09  3:32                                           ` Dave Chinner
2023-06-08  2:03                                   ` [dm-devel] " Martin K. Petersen
2023-06-08  2:03                                     ` Martin K. Petersen
2023-06-09  0:10                                     ` [dm-devel] " Dave Chinner
2023-06-09  0:10                                       ` Dave Chinner
2023-05-26 15:47                   ` [dm-devel] " Brian Foster
2023-05-26 15:47                     ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZIEXwTd17z0iYW4s@redhat.com \
    --to=snitzer@kernel.org \
    --cc=adilger.kernel@dilger.ca \
    --cc=agk@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=bfoster@redhat.com \
    --cc=bvanassche@google.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=dm-devel@redhat.com \
    --cc=ejt@redhat.com \
    --cc=hch@infradead.org \
    --cc=jasowang@redhat.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=sarthakkukreti@chromium.org \
    --cc=stefanha@redhat.com \
    --cc=thornber@redhat.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.