From: Mike Snitzer <snitzer@redhat.com>
To: Daniil Lunev <dlunev@google.com>
Cc: Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
Sarthak Kukreti <sarthakkukreti@chromium.org>,
"Michael S . Tsirkin" <mst@redhat.com>,
Jason Wang <jasowang@redhat.com>,
Bart Van Assche <bvanassche@google.com>,
Mike Snitzer <snitzer@kernel.org>,
linux-kernel@vger.kernel.org,
Gwendal Grignou <gwendal@google.com>,
virtualization@lists.linux-foundation.org,
Christoph Hellwig <hch@infradead.org>,
dm-devel@redhat.com, Andreas Dilger <adilger.kernel@dilger.ca>,
Stefan Hajnoczi <stefanha@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
linux-ext4@vger.kernel.org, Evan Green <evgreen@google.com>,
Alasdair Kergon <agk@redhat.com>
Subject: Re: [dm-devel] [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage
Date: Wed, 21 Sep 2022 11:08:43 -0400
Message-ID: <Yyso+9ChDJQUf9B1@redhat.com>
In-Reply-To: <CAAKderNcHpbBqWqqd5-WuKLRCQQUt7a_4D4ti4gy15+fKGK0vQ@mail.gmail.com>
On Tue, Sep 20 2022 at 5:48P -0400,
Daniil Lunev <dlunev@google.com> wrote:
> > There is no such thing as WRITE UNAVAILABLE in NVMe.
> Apologies, that is WRITE UNCORRECTABLE. Chapter 3.2.7 of
> NVM Express NVM Command Set Specification 1.0b
>
> > That being said, you still haven't actually explained what problem
> > you're even trying to solve.
>
> The specific problem is the following:
> * There is a thin pool over a physical device
> * There are multiple logical volumes over the thin pool
> * Each logical volume has an independent file system and an
> independent application running over it
> * Each application is potentially allowed to consume the entirety
> of the disk space - there is no strict per-application size limit
> * Applications sometimes need to pre-allocate space, for which
> they use fallocate. Once the operation succeeds, the application
> assumes the space is guaranteed to be there for it.
> * Since filesystems on the volumes are independent, filesystem
> level enforcement of size constraints is impossible and the only
> common level is the thin pool, thus, each fallocate has to find its
> representation in thin pool one way or another - otherwise you
> may end up in the situation, where FS thinks it has allocated space
> but when it tries to actually write it, the thin pool is already
> exhausted.
> * Hole-Punching fallocate will not reach the thin pool, so the only
> solution presently is zero-writing pre-allocate.
> * Not all storage devices support zero-writing efficiently - apart
> from NVMe being or not being capable of doing efficient write
> zero - changing which is easier said than done, and would take
> years - there are also other types of storage devices that do not
> have WRITE ZERO capability in the first place or have it in a
> peculiar way. And adding custom WRITE ZERO to LVM would be
> arguably a much bigger hack.
> * Thus, a provisioning block operation allows an interface-specific
> operation that guarantees the presence of the block in the
> mapped space. LVM Thin-pool itself is the primary target for our
> use case but the argument is that this operation maps well to
> other interfaces which allow thinly provisioned units.
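
To make the gap in the list above concrete, here is a minimal sketch (not
code from the patch series) of the two pre-allocation strategies an
application has today. The helper names `preallocate` and `zero_fill` and the
`CHUNK` size are assumptions made for illustration; `os.posix_fallocate`,
`os.pwrite` and `os.fsync` are standard Python calls. Per the description
above, the first approach reserves space only at the filesystem level, while
the second forces real writes that the thin pool has to back.

```python
import os

CHUNK = 1 << 20  # 1 MiB of zeros per write; an arbitrary illustrative size


def preallocate(fd: int, length: int) -> None:
    """Ask the filesystem to reserve `length` bytes starting at offset 0.

    Assumption taken from the thread: on a thin pool this reserves
    filesystem blocks only, so a later write can still fail if the
    pool has been exhausted in the meantime.
    """
    os.posix_fallocate(fd, 0, length)


def zero_fill(fd: int, length: int) -> int:
    """Write zeros over [0, length) so every block is actually written.

    Returns the number of bytes written. This is the "zero-writing
    pre-allocate" workaround: slow, but the writes reach the layer
    below the filesystem.
    """
    zeros = bytes(CHUNK)
    written = 0
    while written < length:
        n = min(CHUNK, length - written)
        # pwrite may write fewer than n bytes; the loop resumes
        # from the updated offset in that case.
        written += os.pwrite(fd, zeros[:n], written)
    os.fsync(fd)
    return written
```

An application would call `preallocate(fd, size)` when filesystem-level
reservation is enough, and fall back to `zero_fill(fd, size)` when it needs
the space to be backed by the pool. The cost of that second path on devices
without an efficient zero-write is the motivation given for a dedicated
provisioning operation.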

Thanks for this overview. Should help level-set others.

Adding fallocate support has been a long-standing dm-thin TODO item
for me. I just never got around to it. So thanks to Sarthak, you and
anyone else who had a hand in developing this.

I had a look at the DM thin implementation and it looks pretty simple
(doesn't require a thin-metadata change, etc). I'll look closer at
the broader implementation (block, etc) but I'm encouraged by what I'm
seeing.

Mike