From: Mike Snitzer <snitzer@kernel.org>
To: "Darrick J. Wong" <djwong@kernel.org>,
Sarthak Kukreti <sarthakkukreti@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>,
Christoph Hellwig <hch@infradead.org>,
Theodore Ts'o <tytso@mit.edu>,
"Michael S. Tsirkin" <mst@redhat.com>,
Jason Wang <jasowang@redhat.com>,
Bart Van Assche <bvanassche@google.com>,
linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
dm-devel@redhat.com, Andreas Dilger <adilger.kernel@dilger.ca>,
Daniil Lunev <dlunev@google.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
Brian Foster <bfoster@redhat.com>,
Alasdair Kergon <agk@redhat.com>
Subject: Re: [dm-devel] [PATCH v4 1/4] block: Introduce provisioning primitives
Date: Wed, 19 Apr 2023 12:17:34 -0400 [thread overview]
Message-ID: <ZEAUHnWqt9cIiJRb@redhat.com> (raw)
In-Reply-To: <20230419153611.GE360885@frogsfrogsfrogs>
On Wed, Apr 19 2023 at 11:36P -0400,
Darrick J. Wong <djwong@kernel.org> wrote:
> On Tue, Apr 18, 2023 at 03:12:04PM -0700, Sarthak Kukreti wrote:
> > Introduce block request REQ_OP_PROVISION. The intent of this request
> > is to request underlying storage to preallocate disk space for the given
> > block range. Block devices that support this capability will export
> > a provision limit within their request queues.
> >
> > This patch also adds the capability to call fallocate() in mode 0
> > on block devices, which will send REQ_OP_PROVISION to the block
> > device for the specified range,
> >
> > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > ---
> > block/blk-core.c | 5 ++++
> > block/blk-lib.c | 53 +++++++++++++++++++++++++++++++++++++++
> > block/blk-merge.c | 18 +++++++++++++
> > block/blk-settings.c | 19 ++++++++++++++
> > block/blk-sysfs.c | 8 ++++++
> > block/bounce.c | 1 +
> > block/fops.c | 25 +++++++++++++-----
> > include/linux/bio.h | 6 +++--
> > include/linux/blk_types.h | 5 +++-
> > include/linux/blkdev.h | 16 ++++++++++++
> > 10 files changed, 147 insertions(+), 9 deletions(-)
> >
>
> <cut to the fallocate part; the block/ changes look fine to /me/ at
> first glance, but what do I know... ;)>
>
> > diff --git a/block/fops.c b/block/fops.c
> > index d2e6be4e3d1c..e1775269654a 100644
> > --- a/block/fops.c
> > +++ b/block/fops.c
> > @@ -611,9 +611,13 @@ static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
> > return ret;
> > }
> >
> > +#define BLKDEV_FALLOC_FL_TRUNCATE \
>
> At first I thought from this name that you were defining a new truncate
> mode for fallocate, then I realized that this is mask for deciding if we
> /want/ to truncate the pagecache.
>
> #define BLKDEV_FALLOC_TRUNCATE_MASK ?
>
> > + (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE | \
>
> Ok, so discarding and writing zeroes truncates the page cache, makes
> sense since we're "writing" directly to the block device.
>
> > + FALLOC_FL_NO_HIDE_STALE)
>
> Here things get tricky -- some of the FALLOC_FL mode bits are really an
> opcode and cannot be specified together, whereas others select optional
> behavior for certain opcodes.
>
> IIRC, the mutually exclusive opcodes are:
>
> PUNCH_HOLE
> ZERO_RANGE
> COLLAPSE_RANGE
> INSERT_RANGE
> (none of the above, for allocation)
>
> and the "variants on a theme are":
>
> KEEP_SIZE
> NO_HIDE_STALE
> UNSHARE_RANGE
>
> not all of which are supported by all the opcodes.
>
> Does it make sense to truncate the page cache if userspace passes in
> mode == NO_HIDE_STALE? There's currently no defined meaning for this
> combination, but I think this means we'll truncate the pagecache before
> deciding if we're actually going to issue any commands.
>
> I think that's just a bug in the existing code -- it should be
> validating that @mode is any of the supported combinations *before*
> truncating the pagecache.
>
> Otherwise you could have a mkfs program that starts writing new fs
> metadata, decides to provision the storage (say for a logging region),
> doesn't realize it's running on an old kernel, and then oops the
> provision attempt fails but have we now shredded the pagecache and lost
> all the writes?
While that just caused me to have an "oh shit, that's crazy" (in a
scary way) belly laugh...
(And obviously needs fixing independent of this patchset)
Shouldn't mkfs first check that the underlying storage supports
REQ_OP_PROVISION by verifying
/sys/block/<dev>/queue/provision_max_bytes exists and is not 0?
(Just saying, we need to add new features more defensively.. you just
made the case based on this scenario's implications alone)
Sarthak, please note I said "provision_max_bytes": all other ops
(e.g. DISCARD, WRITE_ZEROES, etc) have <op>_max_bytes exported through
sysfs, not <op>_max_sectors. Please export provision_max_bytes, e.g.:
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 202aa78f933e..2e5ac7b1ffbd 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -605,12 +605,12 @@ QUEUE_RO_ENTRY(queue_io_min, "minimum_io_size");
QUEUE_RO_ENTRY(queue_io_opt, "optimal_io_size");
QUEUE_RO_ENTRY(queue_max_discard_segments, "max_discard_segments");
-QUEUE_RO_ENTRY(queue_max_provision_sectors, "max_provision_sectors");
QUEUE_RO_ENTRY(queue_discard_granularity, "discard_granularity");
QUEUE_RO_ENTRY(queue_discard_max_hw, "discard_max_hw_bytes");
QUEUE_RW_ENTRY(queue_discard_max, "discard_max_bytes");
QUEUE_RO_ENTRY(queue_discard_zeroes_data, "discard_zeroes_data");
+QUEUE_RO_ENTRY(queue_provision_max, "provision_max_bytes");
QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes");
QUEUE_RO_ENTRY(queue_write_zeroes_max, "write_zeroes_max_bytes");
QUEUE_RO_ENTRY(queue_zone_append_max, "zone_append_max_bytes");
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
WARNING: multiple messages have this Message-ID (diff)
From: Mike Snitzer <snitzer@kernel.org>
To: "Darrick J. Wong" <djwong@kernel.org>,
Sarthak Kukreti <sarthakkukreti@chromium.org>
Cc: dm-devel@redhat.com, linux-block@vger.kernel.org,
linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
Theodore Ts'o <tytso@mit.edu>,
"Michael S. Tsirkin" <mst@redhat.com>,
Jason Wang <jasowang@redhat.com>,
Bart Van Assche <bvanassche@google.com>,
Christoph Hellwig <hch@infradead.org>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Daniil Lunev <dlunev@google.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Brian Foster <bfoster@redhat.com>,
Alasdair Kergon <agk@redhat.com>
Subject: Re: [PATCH v4 1/4] block: Introduce provisioning primitives
Date: Wed, 19 Apr 2023 12:17:34 -0400 [thread overview]
Message-ID: <ZEAUHnWqt9cIiJRb@redhat.com> (raw)
In-Reply-To: <20230419153611.GE360885@frogsfrogsfrogs>
On Wed, Apr 19 2023 at 11:36P -0400,
Darrick J. Wong <djwong@kernel.org> wrote:
> On Tue, Apr 18, 2023 at 03:12:04PM -0700, Sarthak Kukreti wrote:
> > Introduce block request REQ_OP_PROVISION. The intent of this request
> > is to request underlying storage to preallocate disk space for the given
> > block range. Block devices that support this capability will export
> > a provision limit within their request queues.
> >
> > This patch also adds the capability to call fallocate() in mode 0
> > on block devices, which will send REQ_OP_PROVISION to the block
> > device for the specified range,
> >
> > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > ---
> > block/blk-core.c | 5 ++++
> > block/blk-lib.c | 53 +++++++++++++++++++++++++++++++++++++++
> > block/blk-merge.c | 18 +++++++++++++
> > block/blk-settings.c | 19 ++++++++++++++
> > block/blk-sysfs.c | 8 ++++++
> > block/bounce.c | 1 +
> > block/fops.c | 25 +++++++++++++-----
> > include/linux/bio.h | 6 +++--
> > include/linux/blk_types.h | 5 +++-
> > include/linux/blkdev.h | 16 ++++++++++++
> > 10 files changed, 147 insertions(+), 9 deletions(-)
> >
>
> <cut to the fallocate part; the block/ changes look fine to /me/ at
> first glance, but what do I know... ;)>
>
> > diff --git a/block/fops.c b/block/fops.c
> > index d2e6be4e3d1c..e1775269654a 100644
> > --- a/block/fops.c
> > +++ b/block/fops.c
> > @@ -611,9 +611,13 @@ static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
> > return ret;
> > }
> >
> > +#define BLKDEV_FALLOC_FL_TRUNCATE \
>
> At first I thought from this name that you were defining a new truncate
> mode for fallocate, then I realized that this is mask for deciding if we
> /want/ to truncate the pagecache.
>
> #define BLKDEV_FALLOC_TRUNCATE_MASK ?
>
> > + (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE | \
>
> Ok, so discarding and writing zeroes truncates the page cache, makes
> sense since we're "writing" directly to the block device.
>
> > + FALLOC_FL_NO_HIDE_STALE)
>
> Here things get tricky -- some of the FALLOC_FL mode bits are really an
> opcode and cannot be specified together, whereas others select optional
> behavior for certain opcodes.
>
> IIRC, the mutually exclusive opcodes are:
>
> PUNCH_HOLE
> ZERO_RANGE
> COLLAPSE_RANGE
> INSERT_RANGE
> (none of the above, for allocation)
>
> and the "variants on a theme are":
>
> KEEP_SIZE
> NO_HIDE_STALE
> UNSHARE_RANGE
>
> not all of which are supported by all the opcodes.
>
> Does it make sense to truncate the page cache if userspace passes in
> mode == NO_HIDE_STALE? There's currently no defined meaning for this
> combination, but I think this means we'll truncate the pagecache before
> deciding if we're actually going to issue any commands.
>
> I think that's just a bug in the existing code -- it should be
> validating that @mode is any of the supported combinations *before*
> truncating the pagecache.
>
> Otherwise you could have a mkfs program that starts writing new fs
> metadata, decides to provision the storage (say for a logging region),
> doesn't realize it's running on an old kernel, and then oops the
> provision attempt fails but have we now shredded the pagecache and lost
> all the writes?
While that just caused me to have an "oh shit, that's crazy" (in a
scary way) belly laugh...
(And obviously needs fixing independent of this patchset)
Shouldn't mkfs first check that the underlying storage supports
REQ_OP_PROVISION by verifying
/sys/block/<dev>/queue/provision_max_bytes exists and is not 0?
(Just saying, we need to add new features more defensively.. you just
made the case based on this scenario's implications alone)
Sarthak, please note I said "provision_max_bytes": all other ops
(e.g. DISCARD, WRITE_ZEROES, etc) have <op>_max_bytes exported through
sysfs, not <op>_max_sectors. Please export provision_max_bytes, e.g.:
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 202aa78f933e..2e5ac7b1ffbd 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -605,12 +605,12 @@ QUEUE_RO_ENTRY(queue_io_min, "minimum_io_size");
QUEUE_RO_ENTRY(queue_io_opt, "optimal_io_size");
QUEUE_RO_ENTRY(queue_max_discard_segments, "max_discard_segments");
-QUEUE_RO_ENTRY(queue_max_provision_sectors, "max_provision_sectors");
QUEUE_RO_ENTRY(queue_discard_granularity, "discard_granularity");
QUEUE_RO_ENTRY(queue_discard_max_hw, "discard_max_hw_bytes");
QUEUE_RW_ENTRY(queue_discard_max, "discard_max_bytes");
QUEUE_RO_ENTRY(queue_discard_zeroes_data, "discard_zeroes_data");
+QUEUE_RO_ENTRY(queue_provision_max, "provision_max_bytes");
QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes");
QUEUE_RO_ENTRY(queue_write_zeroes_max, "write_zeroes_max_bytes");
QUEUE_RO_ENTRY(queue_zone_append_max, "zone_append_max_bytes");
next prev parent reply other threads:[~2023-04-19 16:17 UTC|newest]
Thread overview: 117+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20221229071647.437095-1-sarthakkukreti@chromium.org>
2023-04-14 0:02 ` [dm-devel] [PATCH v3 0/3] Introduce provisioning primitives for thinly provisioned storage Sarthak Kukreti
2023-04-14 0:02 ` Sarthak Kukreti
2023-04-14 0:02 ` [dm-devel] [PATCH v3 1/3] block: Introduce provisioning primitives Sarthak Kukreti
2023-04-14 0:02 ` Sarthak Kukreti
2023-04-17 17:35 ` [dm-devel] " Brian Foster
2023-04-17 17:35 ` Brian Foster
2023-04-18 22:13 ` [dm-devel] " Sarthak Kukreti
2023-04-18 22:13 ` Sarthak Kukreti
2023-04-14 0:02 ` [dm-devel] [PATCH v3 2/3] dm: Add support for block provisioning Sarthak Kukreti
2023-04-14 0:02 ` Sarthak Kukreti
2023-04-14 13:32 ` [dm-devel] " Joe Thornber
2023-04-14 18:14 ` Mike Snitzer
2023-04-14 18:14 ` Mike Snitzer
2023-04-14 21:58 ` [dm-devel] " Mike Snitzer
2023-04-14 21:58 ` Mike Snitzer
2023-04-18 22:13 ` [dm-devel] " Sarthak Kukreti
2023-04-18 22:13 ` Sarthak Kukreti
2023-04-14 0:02 ` [dm-devel] [PATCH v3 3/3] loop: Add support for provision requests Sarthak Kukreti
2023-04-14 0:02 ` Sarthak Kukreti
2023-04-18 22:12 ` [dm-devel] [PATCH v4 0/4] Introduce provisioning primitives for thinly provisioned storage Sarthak Kukreti
2023-04-18 22:12 ` Sarthak Kukreti
2023-04-18 22:12 ` [dm-devel] [PATCH v4 1/4] block: Introduce provisioning primitives Sarthak Kukreti
2023-04-18 22:12 ` Sarthak Kukreti
2023-04-18 22:43 ` [dm-devel] " Bart Van Assche
2023-04-18 22:43 ` Bart Van Assche
2023-04-20 17:41 ` [dm-devel] " Sarthak Kukreti
2023-04-20 17:41 ` Sarthak Kukreti
2023-04-19 15:36 ` [dm-devel] " Darrick J. Wong
2023-04-19 15:36 ` Darrick J. Wong
2023-04-19 16:17 ` Mike Snitzer [this message]
2023-04-19 16:17 ` Mike Snitzer
2023-04-19 17:26 ` [dm-devel] " Darrick J. Wong
2023-04-19 17:26 ` Darrick J. Wong
2023-04-19 23:21 ` [dm-devel] " Dave Chinner
2023-04-19 23:21 ` Dave Chinner
2023-04-20 0:53 ` [dm-devel] " Sarthak Kukreti
2023-04-20 0:53 ` Sarthak Kukreti
2023-04-18 22:12 ` [dm-devel] [PATCH v4 2/4] dm: Add block provisioning support Sarthak Kukreti
2023-04-18 22:12 ` Sarthak Kukreti
2023-04-18 22:12 ` [dm-devel] [PATCH v4 3/4] dm-thin: Add REQ_OP_PROVISION support Sarthak Kukreti
2023-04-18 22:12 ` Sarthak Kukreti
2023-04-18 22:12 ` [dm-devel] [PATCH v4 4/4] loop: Add support for provision requests Sarthak Kukreti
2023-04-18 22:12 ` Sarthak Kukreti
2023-04-20 0:48 ` [dm-devel] [PATCH v5 0/5] Introduce block provisioning primitives Sarthak Kukreti
2023-04-20 0:48 ` Sarthak Kukreti
2023-04-20 0:48 ` [dm-devel] [PATCH v5 1/5] block: Don't invalidate pagecache for invalid falloc modes Sarthak Kukreti
2023-04-20 0:48 ` Sarthak Kukreti
2023-04-20 1:22 ` [dm-devel] " Darrick J. Wong
2023-04-20 1:22 ` Darrick J. Wong
2023-04-20 1:48 ` [dm-devel] " Sarthak Kukreti
2023-04-20 1:48 ` Sarthak Kukreti
2023-04-20 1:47 ` [dm-devel] [PATCH v5-fix " Sarthak Kukreti
2023-04-20 1:47 ` Sarthak Kukreti
2023-04-20 16:20 ` [dm-devel] " Mike Snitzer
2023-04-20 16:20 ` Mike Snitzer
2023-04-20 17:28 ` [dm-devel] " Sarthak Kukreti
2023-04-20 17:28 ` Sarthak Kukreti
2023-04-20 18:17 ` [dm-devel] " Sarthak Kukreti
2023-04-20 18:17 ` Sarthak Kukreti
2023-04-20 18:15 ` [dm-devel] " Sarthak Kukreti
2023-04-20 18:15 ` Sarthak Kukreti
2023-04-24 15:54 ` [dm-devel] [PATCH v5 " kernel test robot
2023-04-24 15:54 ` kernel test robot
2023-05-04 8:50 ` [dm-devel] " kernel test robot
2023-05-04 8:50 ` kernel test robot
2023-04-20 0:48 ` [dm-devel] [PATCH v5 2/5] block: Introduce provisioning primitives Sarthak Kukreti
2023-04-20 0:48 ` Sarthak Kukreti
2023-04-20 0:48 ` [dm-devel] [PATCH v5 3/5] dm: Add block provisioning support Sarthak Kukreti
2023-04-20 0:48 ` Sarthak Kukreti
2023-04-20 0:48 ` [dm-devel] [PATCH v5 4/5] dm-thin: Add REQ_OP_PROVISION support Sarthak Kukreti
2023-04-20 0:48 ` Sarthak Kukreti
2023-05-01 19:15 ` [dm-devel] " Mike Snitzer
2023-05-01 19:15 ` Mike Snitzer
2023-05-06 6:32 ` [dm-devel] " Sarthak Kukreti
2023-05-06 6:32 ` Sarthak Kukreti
2023-04-20 0:48 ` [dm-devel] [PATCH v5 5/5] loop: Add support for provision requests Sarthak Kukreti
2023-04-20 0:48 ` Sarthak Kukreti
2023-05-06 6:29 ` [dm-devel] [PATCH v6 0/5] Introduce block provisioning primitives Sarthak Kukreti
2023-05-06 6:29 ` Sarthak Kukreti
2023-05-06 6:29 ` [dm-devel] [PATCH v6 1/5] block: Don't invalidate pagecache for invalid falloc modes Sarthak Kukreti
2023-05-06 6:29 ` Sarthak Kukreti
2023-05-09 16:51 ` [dm-devel] " Mike Snitzer
2023-05-09 16:51 ` Mike Snitzer
2023-05-12 18:31 ` [dm-devel] " Darrick J. Wong
2023-05-12 18:31 ` Darrick J. Wong
2023-05-06 6:29 ` [dm-devel] [PATCH v6 2/5] block: Introduce provisioning primitives Sarthak Kukreti
2023-05-06 6:29 ` Sarthak Kukreti
2023-05-09 16:52 ` [dm-devel] " Mike Snitzer
2023-05-09 16:52 ` Mike Snitzer
2023-05-12 18:37 ` [dm-devel] " Darrick J. Wong
2023-05-12 18:37 ` Darrick J. Wong
2023-05-15 21:55 ` [dm-devel] " Sarthak Kukreti
2023-05-15 21:55 ` Sarthak Kukreti
2023-05-06 6:29 ` [dm-devel] [PATCH v6 3/5] dm: Add block provisioning support Sarthak Kukreti
2023-05-06 6:29 ` Sarthak Kukreti
2023-05-09 16:52 ` [dm-devel] " Mike Snitzer
2023-05-09 16:52 ` Mike Snitzer
2023-05-06 6:29 ` [dm-devel] [PATCH v6 4/5] dm-thin: Add REQ_OP_PROVISION support Sarthak Kukreti
2023-05-06 6:29 ` Sarthak Kukreti
2023-05-09 16:58 ` [dm-devel] " Mike Snitzer
2023-05-09 16:58 ` Mike Snitzer
2023-05-11 20:03 ` [dm-devel] " Sarthak Kukreti
2023-05-11 20:03 ` Sarthak Kukreti
2023-05-12 14:34 ` [dm-devel] " Mike Snitzer
2023-05-12 14:34 ` Mike Snitzer
2023-05-12 17:32 ` [dm-devel] " Mike Snitzer
2023-05-12 17:32 ` Mike Snitzer
2023-05-15 21:19 ` [dm-devel] " Sarthak Kukreti
2023-05-15 21:19 ` Sarthak Kukreti
2023-05-06 6:29 ` [dm-devel] [PATCH v6 5/5] loop: Add support for provision requests Sarthak Kukreti
2023-05-06 6:29 ` Sarthak Kukreti
2023-05-15 12:40 ` [dm-devel] " Brian Foster
2023-05-15 12:40 ` Brian Foster
2023-05-15 21:31 ` [dm-devel] " Sarthak Kukreti
2023-05-15 21:31 ` Sarthak Kukreti
2023-05-12 18:28 ` [dm-devel] [PATCH v6 0/5] Introduce block provisioning primitives Darrick J. Wong
2023-05-12 18:28 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZEAUHnWqt9cIiJRb@redhat.com \
--to=snitzer@kernel.org \
--cc=adilger.kernel@dilger.ca \
--cc=agk@redhat.com \
--cc=axboe@kernel.dk \
--cc=bfoster@redhat.com \
--cc=bvanassche@google.com \
--cc=djwong@kernel.org \
--cc=dlunev@google.com \
--cc=dm-devel@redhat.com \
--cc=hch@infradead.org \
--cc=jasowang@redhat.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mst@redhat.com \
--cc=sarthakkukreti@chromium.org \
--cc=stefanha@redhat.com \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.