linux-ext4.vger.kernel.org archive mirror
From: Mike Snitzer <snitzer@redhat.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Daniil Lunev <dlunev@google.com>, Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
	Sarthak Kukreti <sarthakkukreti@chromium.org>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	Bart Van Assche <bvanassche@google.com>,
	Mike Snitzer <snitzer@kernel.org>,
	linux-kernel@vger.kernel.org,
	Gwendal Grignou <gwendal@google.com>,
	virtualization@lists.linux-foundation.org, dm-devel@redhat.com,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	linux-ext4@vger.kernel.org, Evan Green <evgreen@google.com>,
	Alasdair Kergon <agk@redhat.com>
Subject: Re: [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage
Date: Fri, 23 Sep 2022 10:08:11 -0400
Message-ID: <Yy29y/jUvWM6GRZ5@redhat.com>
In-Reply-To: <Yy1zkMH0f9ski4Sg@infradead.org>

On Fri, Sep 23 2022 at  4:51P -0400,
Christoph Hellwig <hch@infradead.org> wrote:

> On Wed, Sep 21, 2022 at 07:48:50AM +1000, Daniil Lunev wrote:
> > > There is no such thing as WRITE UNAVAILABLE in NVMe.
> > Apologies, that is WRITE UNCORRECTABLE. Chapter 3.2.7 of
> > NVM Express NVM Command Set Specification 1.0b
> 
> Write uncorrectable is a very different thing, and the equivalent of the
> horribly misnamed SCSI WRITE LONG COMMAND.  It injects an unrecoverable
> error, and does not provision anything.
> 
> > * Each application is potentially allowed to consume the entirety
> >   of the disk space - there is no strict size limit for application
> > * Applications need to pre-allocate space sometimes, for which
> >   they use fallocate. Once the operation succeeds, the application
> >   assumes the space is guaranteed to be there for it.
> > * Since filesystems on the volumes are independent, filesystem
> >   level enforcement of size constraints is impossible and the only
> >   common level is the thin pool, thus, each fallocate has to find its
> >   representation in thin pool one way or another - otherwise you
> >   may end up in the situation, where FS thinks it has allocated space
> >   but when it tries to actually write it, the thin pool is already
> >   exhausted.
> > * Hole-Punching fallocate will not reach the thin pool, so the only
> >   solution presently is zero-writing pre-allocate.
> 
> To me it sounds like you want a non-thin pool in dm-thin and/or
> guaranteed space reservations for it.

What is implemented in this patchset: enablement for dm-thinp to
actually provide guarantees which fallocate requires.
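
For reference, the application-side call discussed in this thread looks
roughly like the sketch below. It uses the existing Linux fallocate(2)
with mode=0; the helper name preallocate_file() is made up for
illustration. Today a successful return only tells you the filesystem
allocated the extents, it says nothing about whether a thin pool
underneath has backing space for them.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* fallocate() is a Linux extension in glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: preallocate `len` bytes at offset 0 of `path`
 * using fallocate mode=0.  Returns 0 on success, negative errno on error.
 */
static int preallocate_file(const char *path, off_t len)
{
	int fd, ret = 0;

	assert(path != NULL && len > 0);

	fd = open(path, O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return -errno;

	/* mode=0: allocate blocks and extend i_size if needed */
	if (fallocate(fd, 0, 0, len) < 0)
		ret = -errno;

	close(fd);
	return ret;
}
```

A caller would treat a 0 return as "space is reserved", which is exactly
the assumption that does not hold on an overcommitted thin pool.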

Seems you're getting hung up on the finishing details in HW (details
which are _not_ the point of this patchset).

The proposed changes are in service to _Linux_ code. The patchset
implements the primitive from top (ext4) to bottom (dm-thinp, loop).
It stops short of implementing handling everywhere that'd need it
(e.g. in XFS, etc). But those changes can come as follow-on work once
the primitive is established top to bottom.

But you know all this ;)

> > * Thus, a provisioning block operation allows an interface specific
> >   operation that guarantees the presence of the block in the
> >   mapped space. LVM Thin-pool itself is the primary target for our
> >   use case but the argument is that this operation maps well to
> >   other interfaces which allow thinly provisioned units.
> 
> I think where you are trying to go here is badly mistaken.  With flash
> (or hard drive SMR) there is no such thing as provisioning LBAs.  Every
> write is out of place, and a one time space allocation does not help
> you at all.  So fundamentally what you try to do here just goes against
> the actual physics of modern storage media.  While there are some
> layers that keep up a pretence, trying to do that at an exposed API
> level is a really bad idea.

This doesn't need to be so feudal.  Reserving an LBA in physical HW
really isn't the point.

Fact remains: an operation that ensures space is actually reserved via
fallocate is long overdue (just because an FS did its job doesn't mean
underlying layers reflect that). And certainly useful, even if "only"
benefiting dm-thinp and the loop driver. Like other block primitives,
REQ_OP_PROVISION is filtered out by block core if the device doesn't
support it.

That said, I agree with Brian Foster that we need really solid
documentation and justification for why fallocate mode=0 cannot be
used (but the case has been made in this thread).

Also, I do see an issue with the implementation (relative to stacked
devices): dm_table_supports_provision() is too myopic about DM. It
needs to go a step further and verify that some layer in the stack
actually services REQ_OP_PROVISION. Will respond to DM patch too.
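
Roughly what I have in mind, as a userspace sketch rather than DM code:
struct model_dev and its fields are invented for illustration, and
"every lower device must service it" is just one possible policy. A
layer counts only if it acts on the request itself, or if it forwards
the request and everything below it ends up servicing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a stacked block device; not the DM structures. */
struct model_dev {
	bool services_provision;	/* this layer acts on provision itself */
	bool passes_provision;		/* this layer forwards provision downward */
	const struct model_dev **lower;	/* underlying devices, may be NULL */
	size_t nr_lower;
};

/*
 * True if a provision request sent to `dev` is serviced by this layer,
 * or is forwarded and serviced below on every underlying device.
 */
static bool model_stack_services_provision(const struct model_dev *dev)
{
	size_t i;

	assert(dev != NULL);

	if (dev->services_provision)
		return true;

	/* A pass-through layer with nothing underneath services nothing. */
	if (!dev->passes_provision || dev->nr_lower == 0)
		return false;

	for (i = 0; i < dev->nr_lower; i++)
		if (!model_stack_services_provision(dev->lower[i]))
			return false;

	return true;
}
```

With this, a linear mapping over a thin volume reports support, while
the same mapping over a plain disk does not, even though the linear
layer itself is willing to pass the request along.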


Thread overview: 41+ messages
2022-09-15 16:48 [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage Sarthak Kukreti
2022-09-15 16:48 ` [PATCH RFC 1/8] block: Introduce provisioning primitives Sarthak Kukreti
2022-09-23 15:15   ` Mike Snitzer
2022-12-29  8:17     ` Sarthak Kukreti
2022-09-15 16:48 ` [PATCH RFC 2/8] dm: Add support for block provisioning Sarthak Kukreti
2022-09-23 14:23   ` Mike Snitzer
2022-12-29  8:22     ` Sarthak Kukreti
2022-09-15 16:48 ` [PATCH RFC 3/8] virtio_blk: Add support for provision requests Sarthak Kukreti
2022-09-16  5:48   ` Stefan Hajnoczi
2022-09-20  2:33     ` Sarthak Kukreti
2022-09-27 21:37   ` Michael S. Tsirkin
2022-09-15 16:48 ` [PATCH RFC 4/8] fs: Introduce FALLOC_FL_PROVISION Sarthak Kukreti
2022-09-16 11:56   ` Brian Foster
2022-09-16 21:02     ` Sarthak Kukreti
2022-09-21 15:39       ` Brian Foster
2022-09-22  8:04         ` Sarthak Kukreti
2022-09-22 18:29           ` Brian Foster
2022-12-29  8:13             ` Sarthak Kukreti
2022-09-20  7:49   ` Christoph Hellwig
2022-09-21  5:54     ` Sarthak Kukreti
2022-09-21 15:21       ` Mike Snitzer
2022-09-22  8:08         ` Sarthak Kukreti
2022-09-23  8:45       ` Christoph Hellwig
2022-12-29  8:14         ` Sarthak Kukreti
2022-09-15 16:48 ` [PATCH RFC 5/8] loop: Add support for provision requests Sarthak Kukreti
2022-09-15 16:48 ` [PATCH RFC 6/8] ext4: Add support for FALLOC_FL_PROVISION Sarthak Kukreti
2022-09-15 16:48 ` [PATCH RFC 7/8] ext4: Add mount option for provisioning blocks during allocations Sarthak Kukreti
2022-09-15 16:48 ` [PATCH RFC 8/8] ext4: Add a per-file provision override xattr Sarthak Kukreti
2022-09-16  6:09 ` [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage Stefan Hajnoczi
2022-09-16 18:48   ` Sarthak Kukreti
2022-09-16 20:01     ` Bart Van Assche
2022-09-16 21:59       ` Sarthak Kukreti
2022-09-20  7:46     ` Christoph Hellwig
     [not found]       ` <CAAKderPF5Z5QLxyEb80Y+90+eR0sfRmL-WfgXLp=eL=HxWSZ9g@mail.gmail.com>
2022-09-20 11:30         ` Christoph Hellwig
     [not found]           ` <CAAKderNcHpbBqWqqd5-WuKLRCQQUt7a_4D4ti4gy15+fKGK0vQ@mail.gmail.com>
2022-09-21 15:08             ` Mike Snitzer
2022-09-23  8:51             ` Christoph Hellwig
2022-09-23 14:08               ` Mike Snitzer [this message]
2022-12-29  8:17                 ` Sarthak Kukreti
2022-09-17  3:03 ` [dm-devel] " Darrick J. Wong
2022-09-17 19:46   ` Sarthak Kukreti
2022-09-19 16:36     ` Stefan Hajnoczi
