From: Brian Foster <bfoster@redhat.com>
To: Sarthak Kukreti <sarthakkukreti@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>,
Gwendal Grignou <gwendal@google.com>,
Theodore Ts'o <tytso@mit.edu>,
"Michael S . Tsirkin" <mst@redhat.com>,
Jason Wang <jasowang@redhat.com>,
Bart Van Assche <bvanassche@google.com>,
Mike Snitzer <snitzer@kernel.org>,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
linux-block@vger.kernel.org, dm-devel@redhat.com,
Andreas Dilger <adilger.kernel@dilger.ca>,
Daniil Lunev <dlunev@google.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
linux-ext4@vger.kernel.org, Evan Green <evgreen@google.com>,
Alasdair Kergon <agk@redhat.com>
Subject: Re: [dm-devel] [PATCH RFC 4/8] fs: Introduce FALLOC_FL_PROVISION
Date: Thu, 22 Sep 2022 14:29:36 -0400 [thread overview]
Message-ID: <YyypkOr3l2lx4Odi@bfoster> (raw)
In-Reply-To: <CAG9=OMPEoShYMx6A+p97-tw4MuLpgOEpy7aFs5CH6wTedptALQ@mail.gmail.com>
On Thu, Sep 22, 2022 at 01:04:33AM -0700, Sarthak Kukreti wrote:
> On Wed, Sep 21, 2022 at 8:39 AM Brian Foster <bfoster@redhat.com> wrote:
> >
> > On Fri, Sep 16, 2022 at 02:02:31PM -0700, Sarthak Kukreti wrote:
> > > On Fri, Sep 16, 2022 at 4:56 AM Brian Foster <bfoster@redhat.com> wrote:
> > > >
> > > > On Thu, Sep 15, 2022 at 09:48:22AM -0700, Sarthak Kukreti wrote:
> > > > > From: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > > > >
> > > > > FALLOC_FL_PROVISION is a new fallocate() allocation mode that
> > > > > sends a hint to (supported) thinly provisioned block devices to
> > > > > allocate space for the given range of sectors via REQ_OP_PROVISION.
> > > > >
> > > > > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > > > > ---
> > > > > block/fops.c | 7 ++++++-
> > > > > include/linux/falloc.h | 3 ++-
> > > > > include/uapi/linux/falloc.h | 8 ++++++++
> > > > > 3 files changed, 16 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/block/fops.c b/block/fops.c
> > > > > index b90742595317..a436a7596508 100644
> > > > > --- a/block/fops.c
> > > > > +++ b/block/fops.c
> > > > ...
> > > > > @@ -661,6 +662,10 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
> > > > > error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
> > > > > len >> SECTOR_SHIFT, GFP_KERNEL);
> > > > > break;
> > > > > + case FALLOC_FL_PROVISION:
> > > > > + error = blkdev_issue_provision(bdev, start >> SECTOR_SHIFT,
> > > > > + len >> SECTOR_SHIFT, GFP_KERNEL);
> > > > > + break;
> > > > > default:
> > > > > error = -EOPNOTSUPP;
> > > > > }
> > > >
> > > > Hi Sarthak,
> > > >
> > > > Neat mechanism.. I played with something very similar in the past (that
> > > > was much more crudely hacked up to target dm-thin) to allow filesystems
> > > > to request a thinly provisioned device to allocate blocks and try to do
> > > > a better job of avoiding inactivation when overprovisioned.
> > > >
> > > > One thing I'm a little curious about here.. what's the need for a new
> > > > fallocate mode? On a cursory glance, the provision mode looks fairly
> > > > analogous to normal (mode == 0) allocation mode with the exception of
> > > > sending the request down to the bdev. blkdev_fallocate() already maps
> > > > some of the logical falloc modes (i.e. punch hole, zero range) to
> > > > sending write sames or discards, etc., and it doesn't currently look
> > > > like it supports allocation mode, so could it not map such requests to
> > > > the underlying REQ_OP_PROVISION op?
> > > >
> > > > I guess the difference would be at the filesystem level where we'd
> > > > probably need to rely on a mount option or some such to control whether
> > > > traditional fallocate issues provision ops (like you've implemented for
> > > > ext4) vs. the specific falloc command, but that seems fairly consistent
> > > > with historical punch hole/discard behavior too. Hm? You might want to
> > > > cc linux-fsdevel in future posts in any event to get some more feedback
> > > > on how other filesystems might want to interact with such a thing.
> > > >
> > > Thanks for the feedback!
> > > Argh, I completely forgot that I should add linux-fsdevel. Let me
> > > re-send this with linux-fsdevel cc'd
> > >
> > > There's a slight distinction is that the current filesystem-level
> > > controls are usually for default handling, but userspace can still
> > > call the relevant functions manually if they need to. For example, for
> > > ext4, the 'discard' mount option dictates whether free blocks are
> > > discarded, but it doesn't set the policy to allow/disallow userspace
> > > from manually punching holes into files even if the mount opt is
> > > 'nodiscard'. FALLOC_FL_PROVISION is similar in that regard; it adds a
> > > manual mechanism for users to provision the files' extents, that is
> > > separate from the filesystems' default handling of provisioning files.
> > >
> >
> > What I'm trying to understand is why not let blkdev_fallocate() issue a
> > provision based on the default mode (i.e. mode == 0) of fallocate(),
> > which is already defined to mean "perform allocation?" It currently
> > issues discards or write zeroes based on variants of
> > FALLOC_FL_PUNCH_HOLE without the need for a separate FALLOC_FL_DISCARD
> > mode, for example.
> >
> It's mostly to keep the block device fallocate() semantics in-line and
> consistent with the file-specific modes: I added the separate
> filesystem fallocate() mode under the assumption that we'd want to
> keep the traditional handling for filesystems intact with (mode == 0).
> And for block devices, I didn't map the requests to mode == 0 so that
> it's less confusing to describe (eg. mode == 0 on block devices will
> issue provision; mode == 0 on files will not). It would complicate
> loopback devices, for instance; if the loop device is backed by a
> file, it would need to use (mode == FALLOC_FL_PROVISION) but if the
> loop device is backed by another block device, then the fallocate()
> call would need to switch to (mode == 0).
>
I would expect the loopback scenario for provision to behave similar to
how discards are handled. I.e., loopback receives a provision request
and translates that to fallocate(mode = 0). If the backing device is
block, blkdev_fallocate(mode = 0) translates that to another provision
request. If the backing device is a file, the associated fallocate
handler allocs/maps, if necessary, and then issues a provision on
allocation, if enabled by the fs.
AFAICT there's no need for FL_PROVISION at all in that scenario. Is
there a functional purpose to FL_PROVISION? Is the intent to try and
guarantee that a provision request propagates down the I/O stack? If so,
what happens if blocks were already preallocated in the backing file (in
the loopback file example)?
BTW, an unrelated thing I noticed is that blkdev_fallocate()
unconditionally calls truncate_bdev_range(), which probably doesn't make
sense for any sort of alloc mode.
> With the separate mode, we can describe the semantics of falllcate()
> modes a bit more cleanly, and it is common for both files and block
> devices:
>
> 1. mode == 0: allocation at the same layer, will not provision on the
> underlying device/filesystem (unsupported for block devices).
> 2. mode == FALLOC_FL_PROVISION, allocation at the layer, will
> provision on the underlying device/filesystem.
>
I think I see why you make the distinction, since the block layer
doesn't have a "this layer only" mode, but IMO it's also quite confusing
to say that mode == FL_PROVISION can allocate at the current and
underlying layer(s) but mode == 0 to that underlying layer cannot.
Either way, if you want to propose a new falloc mode/modifier, it
probably warrants a more detailed commit log with more explanation of
the purpose, examples of behavior, perhaps some details on how the mode
might be documented in man pages, etc.
Brian
> Block devices don't technically need to use a separate mode, but it
> makes it much less confusing if filesystems are already using a
> separate mode for provision.
>
> Best
> Sarthak
>
> > Brian
> >
> > > > BTW another thing that might be useful wrt to dm-thin is to support
> > > > FALLOC_FL_UNSHARE. I.e., it looks like the previous dm-thin patch only
> > > > checks that blocks are allocated, but not whether those blocks are
> > > > shared (re: lookup_result.shared). It might be useful to do the COW in
> > > > such cases if the caller passes down a REQ_UNSHARE or some such flag.
> > > >
> > > That's an interesting idea! There's a few more things on the TODO list
> > > for this patch series but I think we can follow up with a patch to
> > > handle that as well.
> > >
> > > Sarthak
> > >
> > > > Brian
> > > >
> > > > > diff --git a/include/linux/falloc.h b/include/linux/falloc.h
> > > > > index f3f0b97b1675..a0e506255b20 100644
> > > > > --- a/include/linux/falloc.h
> > > > > +++ b/include/linux/falloc.h
> > > > > @@ -30,7 +30,8 @@ struct space_resv {
> > > > > FALLOC_FL_COLLAPSE_RANGE | \
> > > > > FALLOC_FL_ZERO_RANGE | \
> > > > > FALLOC_FL_INSERT_RANGE | \
> > > > > - FALLOC_FL_UNSHARE_RANGE)
> > > > > + FALLOC_FL_UNSHARE_RANGE | \
> > > > > + FALLOC_FL_PROVISION)
> > > > >
> > > > > /* on ia32 l_start is on a 32-bit boundary */
> > > > > #if defined(CONFIG_X86_64)
> > > > > diff --git a/include/uapi/linux/falloc.h b/include/uapi/linux/falloc.h
> > > > > index 51398fa57f6c..2d323d113eed 100644
> > > > > --- a/include/uapi/linux/falloc.h
> > > > > +++ b/include/uapi/linux/falloc.h
> > > > > @@ -77,4 +77,12 @@
> > > > > */
> > > > > #define FALLOC_FL_UNSHARE_RANGE 0x40
> > > > >
> > > > > +/*
> > > > > + * FALLOC_FL_PROVISION acts as a hint for thinly provisioned devices to allocate
> > > > > + * blocks for the range/EOF.
> > > > > + *
> > > > > + * FALLOC_FL_PROVISION can only be used with allocate-mode fallocate.
> > > > > + */
> > > > > +#define FALLOC_FL_PROVISION 0x80
> > > > > +
> > > > > #endif /* _UAPI_FALLOC_H_ */
> > > > > --
> > > > > 2.31.0
> > > > >
> > > >
> > >
> >
>
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel
WARNING: multiple messages have this Message-ID (diff)
From: Brian Foster <bfoster@redhat.com>
To: Sarthak Kukreti <sarthakkukreti@chromium.org>
Cc: dm-devel@redhat.com, linux-block@vger.kernel.org,
linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
Jens Axboe <axboe@kernel.dk>,
"Michael S . Tsirkin" <mst@redhat.com>,
Jason Wang <jasowang@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Alasdair Kergon <agk@redhat.com>,
Mike Snitzer <snitzer@kernel.org>, Theodore Ts'o <tytso@mit.edu>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Bart Van Assche <bvanassche@google.com>,
Daniil Lunev <dlunev@google.com>, Evan Green <evgreen@google.com>,
Gwendal Grignou <gwendal@google.com>
Subject: Re: [PATCH RFC 4/8] fs: Introduce FALLOC_FL_PROVISION
Date: Thu, 22 Sep 2022 14:29:36 -0400 [thread overview]
Message-ID: <YyypkOr3l2lx4Odi@bfoster> (raw)
In-Reply-To: <CAG9=OMPEoShYMx6A+p97-tw4MuLpgOEpy7aFs5CH6wTedptALQ@mail.gmail.com>
On Thu, Sep 22, 2022 at 01:04:33AM -0700, Sarthak Kukreti wrote:
> On Wed, Sep 21, 2022 at 8:39 AM Brian Foster <bfoster@redhat.com> wrote:
> >
> > On Fri, Sep 16, 2022 at 02:02:31PM -0700, Sarthak Kukreti wrote:
> > > On Fri, Sep 16, 2022 at 4:56 AM Brian Foster <bfoster@redhat.com> wrote:
> > > >
> > > > On Thu, Sep 15, 2022 at 09:48:22AM -0700, Sarthak Kukreti wrote:
> > > > > From: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > > > >
> > > > > FALLOC_FL_PROVISION is a new fallocate() allocation mode that
> > > > > sends a hint to (supported) thinly provisioned block devices to
> > > > > allocate space for the given range of sectors via REQ_OP_PROVISION.
> > > > >
> > > > > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > > > > ---
> > > > > block/fops.c | 7 ++++++-
> > > > > include/linux/falloc.h | 3 ++-
> > > > > include/uapi/linux/falloc.h | 8 ++++++++
> > > > > 3 files changed, 16 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/block/fops.c b/block/fops.c
> > > > > index b90742595317..a436a7596508 100644
> > > > > --- a/block/fops.c
> > > > > +++ b/block/fops.c
> > > > ...
> > > > > @@ -661,6 +662,10 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
> > > > > error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
> > > > > len >> SECTOR_SHIFT, GFP_KERNEL);
> > > > > break;
> > > > > + case FALLOC_FL_PROVISION:
> > > > > + error = blkdev_issue_provision(bdev, start >> SECTOR_SHIFT,
> > > > > + len >> SECTOR_SHIFT, GFP_KERNEL);
> > > > > + break;
> > > > > default:
> > > > > error = -EOPNOTSUPP;
> > > > > }
> > > >
> > > > Hi Sarthak,
> > > >
> > > > Neat mechanism.. I played with something very similar in the past (that
> > > > was much more crudely hacked up to target dm-thin) to allow filesystems
> > > > to request a thinly provisioned device to allocate blocks and try to do
> > > > a better job of avoiding inactivation when overprovisioned.
> > > >
> > > > One thing I'm a little curious about here.. what's the need for a new
> > > > fallocate mode? On a cursory glance, the provision mode looks fairly
> > > > analogous to normal (mode == 0) allocation mode with the exception of
> > > > sending the request down to the bdev. blkdev_fallocate() already maps
> > > > some of the logical falloc modes (i.e. punch hole, zero range) to
> > > > sending write sames or discards, etc., and it doesn't currently look
> > > > like it supports allocation mode, so could it not map such requests to
> > > > the underlying REQ_OP_PROVISION op?
> > > >
> > > > I guess the difference would be at the filesystem level where we'd
> > > > probably need to rely on a mount option or some such to control whether
> > > > traditional fallocate issues provision ops (like you've implemented for
> > > > ext4) vs. the specific falloc command, but that seems fairly consistent
> > > > with historical punch hole/discard behavior too. Hm? You might want to
> > > > cc linux-fsdevel in future posts in any event to get some more feedback
> > > > on how other filesystems might want to interact with such a thing.
> > > >
> > > Thanks for the feedback!
> > > Argh, I completely forgot that I should add linux-fsdevel. Let me
> > > re-send this with linux-fsdevel cc'd
> > >
> > > There's a slight distinction is that the current filesystem-level
> > > controls are usually for default handling, but userspace can still
> > > call the relevant functions manually if they need to. For example, for
> > > ext4, the 'discard' mount option dictates whether free blocks are
> > > discarded, but it doesn't set the policy to allow/disallow userspace
> > > from manually punching holes into files even if the mount opt is
> > > 'nodiscard'. FALLOC_FL_PROVISION is similar in that regard; it adds a
> > > manual mechanism for users to provision the files' extents, that is
> > > separate from the filesystems' default handling of provisioning files.
> > >
> >
> > What I'm trying to understand is why not let blkdev_fallocate() issue a
> > provision based on the default mode (i.e. mode == 0) of fallocate(),
> > which is already defined to mean "perform allocation?" It currently
> > issues discards or write zeroes based on variants of
> > FALLOC_FL_PUNCH_HOLE without the need for a separate FALLOC_FL_DISCARD
> > mode, for example.
> >
> It's mostly to keep the block device fallocate() semantics in-line and
> consistent with the file-specific modes: I added the separate
> filesystem fallocate() mode under the assumption that we'd want to
> keep the traditional handling for filesystems intact with (mode == 0).
> And for block devices, I didn't map the requests to mode == 0 so that
> it's less confusing to describe (eg. mode == 0 on block devices will
> issue provision; mode == 0 on files will not). It would complicate
> loopback devices, for instance; if the loop device is backed by a
> file, it would need to use (mode == FALLOC_FL_PROVISION) but if the
> loop device is backed by another block device, then the fallocate()
> call would need to switch to (mode == 0).
>
I would expect the loopback scenario for provision to behave similar to
how discards are handled. I.e., loopback receives a provision request
and translates that to fallocate(mode = 0). If the backing device is
block, blkdev_fallocate(mode = 0) translates that to another provision
request. If the backing device is a file, the associated fallocate
handler allocs/maps, if necessary, and then issues a provision on
allocation, if enabled by the fs.
AFAICT there's no need for FL_PROVISION at all in that scenario. Is
there a functional purpose to FL_PROVISION? Is the intent to try and
guarantee that a provision request propagates down the I/O stack? If so,
what happens if blocks were already preallocated in the backing file (in
the loopback file example)?
BTW, an unrelated thing I noticed is that blkdev_fallocate()
unconditionally calls truncate_bdev_range(), which probably doesn't make
sense for any sort of alloc mode.
> With the separate mode, we can describe the semantics of falllcate()
> modes a bit more cleanly, and it is common for both files and block
> devices:
>
> 1. mode == 0: allocation at the same layer, will not provision on the
> underlying device/filesystem (unsupported for block devices).
> 2. mode == FALLOC_FL_PROVISION, allocation at the layer, will
> provision on the underlying device/filesystem.
>
I think I see why you make the distinction, since the block layer
doesn't have a "this layer only" mode, but IMO it's also quite confusing
to say that mode == FL_PROVISION can allocate at the current and
underlying layer(s) but mode == 0 to that underlying layer cannot.
Either way, if you want to propose a new falloc mode/modifier, it
probably warrants a more detailed commit log with more explanation of
the purpose, examples of behavior, perhaps some details on how the mode
might be documented in man pages, etc.
Brian
> Block devices don't technically need to use a separate mode, but it
> makes it much less confusing if filesystems are already using a
> separate mode for provision.
>
> Best
> Sarthak
>
> > Brian
> >
> > > > BTW another thing that might be useful wrt to dm-thin is to support
> > > > FALLOC_FL_UNSHARE. I.e., it looks like the previous dm-thin patch only
> > > > checks that blocks are allocated, but not whether those blocks are
> > > > shared (re: lookup_result.shared). It might be useful to do the COW in
> > > > such cases if the caller passes down a REQ_UNSHARE or some such flag.
> > > >
> > > That's an interesting idea! There's a few more things on the TODO list
> > > for this patch series but I think we can follow up with a patch to
> > > handle that as well.
> > >
> > > Sarthak
> > >
> > > > Brian
> > > >
> > > > > diff --git a/include/linux/falloc.h b/include/linux/falloc.h
> > > > > index f3f0b97b1675..a0e506255b20 100644
> > > > > --- a/include/linux/falloc.h
> > > > > +++ b/include/linux/falloc.h
> > > > > @@ -30,7 +30,8 @@ struct space_resv {
> > > > > FALLOC_FL_COLLAPSE_RANGE | \
> > > > > FALLOC_FL_ZERO_RANGE | \
> > > > > FALLOC_FL_INSERT_RANGE | \
> > > > > - FALLOC_FL_UNSHARE_RANGE)
> > > > > + FALLOC_FL_UNSHARE_RANGE | \
> > > > > + FALLOC_FL_PROVISION)
> > > > >
> > > > > /* on ia32 l_start is on a 32-bit boundary */
> > > > > #if defined(CONFIG_X86_64)
> > > > > diff --git a/include/uapi/linux/falloc.h b/include/uapi/linux/falloc.h
> > > > > index 51398fa57f6c..2d323d113eed 100644
> > > > > --- a/include/uapi/linux/falloc.h
> > > > > +++ b/include/uapi/linux/falloc.h
> > > > > @@ -77,4 +77,12 @@
> > > > > */
> > > > > #define FALLOC_FL_UNSHARE_RANGE 0x40
> > > > >
> > > > > +/*
> > > > > + * FALLOC_FL_PROVISION acts as a hint for thinly provisioned devices to allocate
> > > > > + * blocks for the range/EOF.
> > > > > + *
> > > > > + * FALLOC_FL_PROVISION can only be used with allocate-mode fallocate.
> > > > > + */
> > > > > +#define FALLOC_FL_PROVISION 0x80
> > > > > +
> > > > > #endif /* _UAPI_FALLOC_H_ */
> > > > > --
> > > > > 2.31.0
> > > > >
> > > >
> > >
> >
>
WARNING: multiple messages have this Message-ID (diff)
From: Brian Foster <bfoster@redhat.com>
To: Sarthak Kukreti <sarthakkukreti@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>,
Gwendal Grignou <gwendal@google.com>,
Theodore Ts'o <tytso@mit.edu>,
"Michael S . Tsirkin" <mst@redhat.com>,
Bart Van Assche <bvanassche@google.com>,
Mike Snitzer <snitzer@kernel.org>,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
linux-block@vger.kernel.org, dm-devel@redhat.com,
Andreas Dilger <adilger.kernel@dilger.ca>,
Daniil Lunev <dlunev@google.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
linux-ext4@vger.kernel.org, Evan Green <evgreen@google.com>,
Alasdair Kergon <agk@redhat.com>
Subject: Re: [PATCH RFC 4/8] fs: Introduce FALLOC_FL_PROVISION
Date: Thu, 22 Sep 2022 14:29:36 -0400 [thread overview]
Message-ID: <YyypkOr3l2lx4Odi@bfoster> (raw)
In-Reply-To: <CAG9=OMPEoShYMx6A+p97-tw4MuLpgOEpy7aFs5CH6wTedptALQ@mail.gmail.com>
On Thu, Sep 22, 2022 at 01:04:33AM -0700, Sarthak Kukreti wrote:
> On Wed, Sep 21, 2022 at 8:39 AM Brian Foster <bfoster@redhat.com> wrote:
> >
> > On Fri, Sep 16, 2022 at 02:02:31PM -0700, Sarthak Kukreti wrote:
> > > On Fri, Sep 16, 2022 at 4:56 AM Brian Foster <bfoster@redhat.com> wrote:
> > > >
> > > > On Thu, Sep 15, 2022 at 09:48:22AM -0700, Sarthak Kukreti wrote:
> > > > > From: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > > > >
> > > > > FALLOC_FL_PROVISION is a new fallocate() allocation mode that
> > > > > sends a hint to (supported) thinly provisioned block devices to
> > > > > allocate space for the given range of sectors via REQ_OP_PROVISION.
> > > > >
> > > > > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > > > > ---
> > > > > block/fops.c | 7 ++++++-
> > > > > include/linux/falloc.h | 3 ++-
> > > > > include/uapi/linux/falloc.h | 8 ++++++++
> > > > > 3 files changed, 16 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/block/fops.c b/block/fops.c
> > > > > index b90742595317..a436a7596508 100644
> > > > > --- a/block/fops.c
> > > > > +++ b/block/fops.c
> > > > ...
> > > > > @@ -661,6 +662,10 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
> > > > > error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
> > > > > len >> SECTOR_SHIFT, GFP_KERNEL);
> > > > > break;
> > > > > + case FALLOC_FL_PROVISION:
> > > > > + error = blkdev_issue_provision(bdev, start >> SECTOR_SHIFT,
> > > > > + len >> SECTOR_SHIFT, GFP_KERNEL);
> > > > > + break;
> > > > > default:
> > > > > error = -EOPNOTSUPP;
> > > > > }
> > > >
> > > > Hi Sarthak,
> > > >
> > > > Neat mechanism.. I played with something very similar in the past (that
> > > > was much more crudely hacked up to target dm-thin) to allow filesystems
> > > > to request a thinly provisioned device to allocate blocks and try to do
> > > > a better job of avoiding inactivation when overprovisioned.
> > > >
> > > > One thing I'm a little curious about here.. what's the need for a new
> > > > fallocate mode? On a cursory glance, the provision mode looks fairly
> > > > analogous to normal (mode == 0) allocation mode with the exception of
> > > > sending the request down to the bdev. blkdev_fallocate() already maps
> > > > some of the logical falloc modes (i.e. punch hole, zero range) to
> > > > sending write sames or discards, etc., and it doesn't currently look
> > > > like it supports allocation mode, so could it not map such requests to
> > > > the underlying REQ_OP_PROVISION op?
> > > >
> > > > I guess the difference would be at the filesystem level where we'd
> > > > probably need to rely on a mount option or some such to control whether
> > > > traditional fallocate issues provision ops (like you've implemented for
> > > > ext4) vs. the specific falloc command, but that seems fairly consistent
> > > > with historical punch hole/discard behavior too. Hm? You might want to
> > > > cc linux-fsdevel in future posts in any event to get some more feedback
> > > > on how other filesystems might want to interact with such a thing.
> > > >
> > > Thanks for the feedback!
> > > Argh, I completely forgot that I should add linux-fsdevel. Let me
> > > re-send this with linux-fsdevel cc'd
> > >
> > > There's a slight distinction is that the current filesystem-level
> > > controls are usually for default handling, but userspace can still
> > > call the relevant functions manually if they need to. For example, for
> > > ext4, the 'discard' mount option dictates whether free blocks are
> > > discarded, but it doesn't set the policy to allow/disallow userspace
> > > from manually punching holes into files even if the mount opt is
> > > 'nodiscard'. FALLOC_FL_PROVISION is similar in that regard; it adds a
> > > manual mechanism for users to provision the files' extents, that is
> > > separate from the filesystems' default handling of provisioning files.
> > >
> >
> > What I'm trying to understand is why not let blkdev_fallocate() issue a
> > provision based on the default mode (i.e. mode == 0) of fallocate(),
> > which is already defined to mean "perform allocation?" It currently
> > issues discards or write zeroes based on variants of
> > FALLOC_FL_PUNCH_HOLE without the need for a separate FALLOC_FL_DISCARD
> > mode, for example.
> >
> It's mostly to keep the block device fallocate() semantics in-line and
> consistent with the file-specific modes: I added the separate
> filesystem fallocate() mode under the assumption that we'd want to
> keep the traditional handling for filesystems intact with (mode == 0).
> And for block devices, I didn't map the requests to mode == 0 so that
> it's less confusing to describe (eg. mode == 0 on block devices will
> issue provision; mode == 0 on files will not). It would complicate
> loopback devices, for instance; if the loop device is backed by a
> file, it would need to use (mode == FALLOC_FL_PROVISION) but if the
> loop device is backed by another block device, then the fallocate()
> call would need to switch to (mode == 0).
>
I would expect the loopback scenario for provision to behave similar to
how discards are handled. I.e., loopback receives a provision request
and translates that to fallocate(mode = 0). If the backing device is
block, blkdev_fallocate(mode = 0) translates that to another provision
request. If the backing device is a file, the associated fallocate
handler allocs/maps, if necessary, and then issues a provision on
allocation, if enabled by the fs.
AFAICT there's no need for FL_PROVISION at all in that scenario. Is
there a functional purpose to FL_PROVISION? Is the intent to try and
guarantee that a provision request propagates down the I/O stack? If so,
what happens if blocks were already preallocated in the backing file (in
the loopback file example)?
BTW, an unrelated thing I noticed is that blkdev_fallocate()
unconditionally calls truncate_bdev_range(), which probably doesn't make
sense for any sort of alloc mode.
> With the separate mode, we can describe the semantics of falllcate()
> modes a bit more cleanly, and it is common for both files and block
> devices:
>
> 1. mode == 0: allocation at the same layer, will not provision on the
> underlying device/filesystem (unsupported for block devices).
> 2. mode == FALLOC_FL_PROVISION, allocation at the layer, will
> provision on the underlying device/filesystem.
>
I think I see why you make the distinction, since the block layer
doesn't have a "this layer only" mode, but IMO it's also quite confusing
to say that mode == FL_PROVISION can allocate at the current and
underlying layer(s) but mode == 0 to that underlying layer cannot.
Either way, if you want to propose a new falloc mode/modifier, it
probably warrants a more detailed commit log with more explanation of
the purpose, examples of behavior, perhaps some details on how the mode
might be documented in man pages, etc.
Brian
> Block devices don't technically need to use a separate mode, but it
> makes it much less confusing if filesystems are already using a
> separate mode for provision.
>
> Best
> Sarthak
>
> > Brian
> >
> > > > BTW another thing that might be useful wrt to dm-thin is to support
> > > > FALLOC_FL_UNSHARE. I.e., it looks like the previous dm-thin patch only
> > > > checks that blocks are allocated, but not whether those blocks are
> > > > shared (re: lookup_result.shared). It might be useful to do the COW in
> > > > such cases if the caller passes down a REQ_UNSHARE or some such flag.
> > > >
> > > That's an interesting idea! There's a few more things on the TODO list
> > > for this patch series but I think we can follow up with a patch to
> > > handle that as well.
> > >
> > > Sarthak
> > >
> > > > Brian
> > > >
> > > > > diff --git a/include/linux/falloc.h b/include/linux/falloc.h
> > > > > index f3f0b97b1675..a0e506255b20 100644
> > > > > --- a/include/linux/falloc.h
> > > > > +++ b/include/linux/falloc.h
> > > > > @@ -30,7 +30,8 @@ struct space_resv {
> > > > > FALLOC_FL_COLLAPSE_RANGE | \
> > > > > FALLOC_FL_ZERO_RANGE | \
> > > > > FALLOC_FL_INSERT_RANGE | \
> > > > > - FALLOC_FL_UNSHARE_RANGE)
> > > > > + FALLOC_FL_UNSHARE_RANGE | \
> > > > > + FALLOC_FL_PROVISION)
> > > > >
> > > > > /* on ia32 l_start is on a 32-bit boundary */
> > > > > #if defined(CONFIG_X86_64)
> > > > > diff --git a/include/uapi/linux/falloc.h b/include/uapi/linux/falloc.h
> > > > > index 51398fa57f6c..2d323d113eed 100644
> > > > > --- a/include/uapi/linux/falloc.h
> > > > > +++ b/include/uapi/linux/falloc.h
> > > > > @@ -77,4 +77,12 @@
> > > > > */
> > > > > #define FALLOC_FL_UNSHARE_RANGE 0x40
> > > > >
> > > > > +/*
> > > > > + * FALLOC_FL_PROVISION acts as a hint for thinly provisioned devices to allocate
> > > > > + * blocks for the range/EOF.
> > > > > + *
> > > > > + * FALLOC_FL_PROVISION can only be used with allocate-mode fallocate.
> > > > > + */
> > > > > +#define FALLOC_FL_PROVISION 0x80
> > > > > +
> > > > > #endif /* _UAPI_FALLOC_H_ */
> > > > > --
> > > > > 2.31.0
> > > > >
> > > >
> > >
> >
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
next prev parent reply other threads:[~2022-09-22 18:29 UTC|newest]
Thread overview: 102+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-15 16:48 [dm-devel] [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage Sarthak Kukreti
2022-09-15 16:48 ` Sarthak Kukreti
2022-09-15 16:48 ` [dm-devel] [PATCH RFC 1/8] block: Introduce provisioning primitives Sarthak Kukreti
2022-09-15 16:48 ` Sarthak Kukreti
2022-09-23 15:15 ` [dm-devel] " Mike Snitzer
2022-09-23 15:15 ` Mike Snitzer
2022-09-23 15:15 ` Mike Snitzer
2022-12-29 8:17 ` [dm-devel] " Sarthak Kukreti
2022-12-29 8:17 ` Sarthak Kukreti
2022-09-15 16:48 ` [dm-devel] [PATCH RFC 2/8] dm: Add support for block provisioning Sarthak Kukreti
2022-09-15 16:48 ` Sarthak Kukreti
2022-09-23 14:23 ` [dm-devel] " Mike Snitzer
2022-09-23 14:23 ` Mike Snitzer
2022-09-23 14:23 ` Mike Snitzer
2022-12-29 8:22 ` [dm-devel] " Sarthak Kukreti
2022-12-29 8:22 ` Sarthak Kukreti
2022-09-15 16:48 ` [dm-devel] [PATCH RFC 3/8] virtio_blk: Add support for provision requests Sarthak Kukreti
2022-09-15 16:48 ` Sarthak Kukreti
2022-09-16 5:48 ` [dm-devel] " Stefan Hajnoczi
2022-09-16 5:48 ` Stefan Hajnoczi
2022-09-16 5:48 ` Stefan Hajnoczi
2022-09-20 2:33 ` [dm-devel] " Sarthak Kukreti
2022-09-20 2:33 ` Sarthak Kukreti
2022-09-27 21:37 ` [dm-devel] " Michael S. Tsirkin
2022-09-27 21:37 ` Michael S. Tsirkin
2022-09-27 21:37 ` Michael S. Tsirkin
2022-09-15 16:48 ` [dm-devel] [PATCH RFC 4/8] fs: Introduce FALLOC_FL_PROVISION Sarthak Kukreti
2022-09-15 16:48 ` Sarthak Kukreti
2022-09-16 11:56 ` [dm-devel] " Brian Foster
2022-09-16 11:56 ` Brian Foster
2022-09-16 11:56 ` Brian Foster
2022-09-16 21:02 ` [dm-devel] " Sarthak Kukreti
2022-09-16 21:02 ` Sarthak Kukreti
2022-09-21 15:39 ` [dm-devel] " Brian Foster
2022-09-21 15:39 ` Brian Foster
2022-09-21 15:39 ` Brian Foster
2022-09-22 8:04 ` [dm-devel] " Sarthak Kukreti
2022-09-22 8:04 ` Sarthak Kukreti
2022-09-22 18:29 ` Brian Foster [this message]
2022-09-22 18:29 ` Brian Foster
2022-09-22 18:29 ` Brian Foster
2022-12-29 8:13 ` [dm-devel] " Sarthak Kukreti
2022-12-29 8:13 ` Sarthak Kukreti
2022-09-20 7:49 ` [dm-devel] " Christoph Hellwig
2022-09-20 7:49 ` Christoph Hellwig
2022-09-20 7:49 ` Christoph Hellwig
2022-09-21 5:54 ` [dm-devel] " Sarthak Kukreti
2022-09-21 5:54 ` Sarthak Kukreti
2022-09-21 15:21 ` [dm-devel] " Mike Snitzer
2022-09-21 15:21 ` Mike Snitzer
2022-09-21 15:21 ` Mike Snitzer
2022-09-22 8:08 ` [dm-devel] " Sarthak Kukreti
2022-09-22 8:08 ` Sarthak Kukreti
2022-09-23 8:45 ` [dm-devel] " Christoph Hellwig
2022-09-23 8:45 ` Christoph Hellwig
2022-09-23 8:45 ` Christoph Hellwig
2022-12-29 8:14 ` [dm-devel] " Sarthak Kukreti
2022-12-29 8:14 ` Sarthak Kukreti
2022-09-15 16:48 ` [dm-devel] [PATCH RFC 5/8] loop: Add support for provision requests Sarthak Kukreti
2022-09-15 16:48 ` Sarthak Kukreti
2022-09-15 16:48 ` [dm-devel] [PATCH RFC 6/8] ext4: Add support for FALLOC_FL_PROVISION Sarthak Kukreti
2022-09-15 16:48 ` Sarthak Kukreti
2022-09-15 16:48 ` [dm-devel] [PATCH RFC 7/8] ext4: Add mount option for provisioning blocks during allocations Sarthak Kukreti
2022-09-15 16:48 ` Sarthak Kukreti
2022-09-15 16:48 ` [dm-devel] [PATCH RFC 8/8] ext4: Add a per-file provision override xattr Sarthak Kukreti
2022-09-15 16:48 ` Sarthak Kukreti
2022-09-16 6:09 ` [dm-devel] [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage Stefan Hajnoczi
2022-09-16 6:09 ` Stefan Hajnoczi
2022-09-16 6:09 ` Stefan Hajnoczi
2022-09-16 18:48 ` [dm-devel] " Sarthak Kukreti
2022-09-16 18:48 ` Sarthak Kukreti
2022-09-16 20:01 ` [dm-devel] " Bart Van Assche
2022-09-16 20:01 ` Bart Van Assche
2022-09-16 20:01 ` Bart Van Assche
2022-09-16 21:59 ` [dm-devel] " Sarthak Kukreti
2022-09-16 21:59 ` Sarthak Kukreti
2022-09-20 7:46 ` [dm-devel] " Christoph Hellwig
2022-09-20 7:46 ` Christoph Hellwig
2022-09-20 7:46 ` Christoph Hellwig
2022-09-20 10:17 ` [dm-devel] " Daniil Lunev
2022-09-20 11:30 ` Christoph Hellwig
2022-09-20 11:30 ` Christoph Hellwig
2022-09-20 11:30 ` Christoph Hellwig
2022-09-20 21:48 ` [dm-devel] " Daniil Lunev
2022-09-21 15:08 ` Mike Snitzer
2022-09-21 15:08 ` Mike Snitzer
2022-09-21 15:08 ` Mike Snitzer
2022-09-23 8:51 ` [dm-devel] " Christoph Hellwig
2022-09-23 8:51 ` Christoph Hellwig
2022-09-23 8:51 ` Christoph Hellwig
2022-09-23 14:08 ` [dm-devel] " Mike Snitzer
2022-09-23 14:08 ` Mike Snitzer
2022-09-23 14:08 ` Mike Snitzer
2022-12-29 8:17 ` [dm-devel] " Sarthak Kukreti
2022-12-29 8:17 ` Sarthak Kukreti
2022-09-17 3:03 ` [dm-devel] " Darrick J. Wong
2022-09-17 3:03 ` Darrick J. Wong
2022-09-17 19:46 ` Sarthak Kukreti
2022-09-17 19:46 ` Sarthak Kukreti
2022-09-19 16:36 ` Stefan Hajnoczi
2022-09-19 16:36 ` Stefan Hajnoczi
2022-09-19 16:36 ` Stefan Hajnoczi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YyypkOr3l2lx4Odi@bfoster \
--to=bfoster@redhat.com \
--cc=adilger.kernel@dilger.ca \
--cc=agk@redhat.com \
--cc=axboe@kernel.dk \
--cc=bvanassche@google.com \
--cc=dlunev@google.com \
--cc=dm-devel@redhat.com \
--cc=evgreen@google.com \
--cc=gwendal@google.com \
--cc=jasowang@redhat.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=sarthakkukreti@chromium.org \
--cc=snitzer@kernel.org \
--cc=stefanha@redhat.com \
--cc=tytso@mit.edu \
--cc=virtualization@lists.linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.