* Re: [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage
       [not found] <20220915164826.1396245-1-sarthakkukreti@google.com>
@ 2022-09-16 6:09 ` Stefan Hajnoczi
       [not found]   ` <CAG9=OMPHZqdDhX=M+ovdg5fa3x4-Q_1r5SWPa8pMTQw0mr5fPg@mail.gmail.com>
       [not found] ` <YyU5CyQfS+64xmnm@magnolia>
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2022-09-16 6:09 UTC
  To: Sarthak Kukreti
  Cc: Jens Axboe, Gwendal Grignou, Theodore Ts'o, Michael S. Tsirkin,
	Bart Van Assche, Mike Snitzer, linux-kernel, virtualization,
	linux-block, dm-devel, Andreas Dilger, Daniil Lunev,
	Paolo Bonzini, linux-ext4, Evan Green, Alasdair Kergon

On Thu, Sep 15, 2022 at 09:48:18AM -0700, Sarthak Kukreti wrote:
> From: Sarthak Kukreti <sarthakkukreti@chromium.org>
>
> Hi,
>
> This patch series is an RFC of a mechanism to pass through provision
> requests on stacked thinly provisioned storage devices/filesystems.
>
> The Linux kernel provides several mechanisms to set up thinly
> provisioned block storage abstractions (e.g. dm-thin, loop devices over
> sparse files), either directly as block devices or backing storage for
> filesystems. Currently, short of writing data to either the device or
> filesystem, there is no way for users to pre-allocate space for use in
> such storage setups. Consider the following use-cases:
>
> 1) Suspend-to-disk and resume from a dm-thin device: In order to
>    ensure that the underlying thinpool metadata is not modified during
>    the suspend mechanism, the dm-thin device needs to be fully
>    provisioned.
> 2) If a filesystem uses a loop device over a sparse file, fallocate()
>    on the filesystem will allocate blocks for files but the underlying
>    sparse file will remain intact.
> 3) Another example is a virtual machine using a sparse file/dm-thin as
>    a storage device; by default, allocations within the VM boundaries
>    will not affect the host.
> 4) Several storage standards support mechanisms for thin provisioning
>    on real hardware devices. For example:
>    a. The NVMe spec 1.0b section 2.1.1 loosely talks about thin
>       provisioning: "When the THINP bit in the NSFEAT field of the
>       Identify Namespace data structure is set to ‘1’, the controller ...
>       shall track the number of allocated blocks in the Namespace
>       Utilization field"
>    b. The SCSI Block Commands reference - 4 section references "Thin
>       provisioned logical units",
>    c. UFS 3.0 spec section 13.3.3 references "Thin provisioning".

When REQ_OP_PROVISION is sent on an already-allocated range of blocks,
are those blocks zeroed? NVMe Write Zeroes with Deallocate=0 works this
way, for example. That behavior is counterintuitive since the operation
name suggests it just affects the logical block's provisioning state,
not the contents of the blocks.

> In all of the above situations, currently the only way for
> pre-allocating space is to issue writes (or use
> WRITE_ZEROES/WRITE_SAME). However, that does not scale well with
> larger pre-allocation sizes.

What exactly is the issue with WRITE_ZEROES scalability? Are you
referring to cases where the device doesn't support an efficient
WRITE_ZEROES command and actually writes blocks filled with zeroes
instead of updating internal allocation metadata cheaply?

Stefan

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[parent not found: <CAG9=OMPHZqdDhX=M+ovdg5fa3x4-Q_1r5SWPa8pMTQw0mr5fPg@mail.gmail.com>]
* Re: [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage
       [not found]   ` <CAG9=OMPHZqdDhX=M+ovdg5fa3x4-Q_1r5SWPa8pMTQw0mr5fPg@mail.gmail.com>
@ 2022-09-16 20:01     ` Bart Van Assche
  2022-09-20 7:46     ` Christoph Hellwig
  1 sibling, 0 replies; 18+ messages in thread
From: Bart Van Assche @ 2022-09-16 20:01 UTC
  To: Sarthak Kukreti, Stefan Hajnoczi
  Cc: Jens Axboe, Gwendal Grignou, Theodore Ts'o, Michael S. Tsirkin,
	Bart Van Assche, Mike Snitzer, linux-kernel, virtualization,
	linux-block, dm-devel, Andreas Dilger, Daniil Lunev,
	Paolo Bonzini, linux-ext4, Evan Green, Alasdair Kergon

On 9/16/22 11:48, Sarthak Kukreti wrote:
> Yes. On ChromiumOS, we regularly deal with storage devices that don't
> support WRITE_ZEROES or that need to have it disabled, via a quirk,
> due to a bug in the vendor's implementation. Using WRITE_ZEROES for
> allocation makes the allocation path quite slow for such devices (not
> to mention the effect on storage lifetime), so having a separate
> provisioning construct is very appealing. Even for devices that do
> support an efficient WRITE_ZEROES implementation but don't support
> logical provisioning per-se, I suppose that the allocation path might
> be a bit faster (the device driver's request queue would report
> 'max_provision_sectors'=0 and the request would be short circuited
> there) although I haven't benchmarked the difference.

Some background information about why ChromiumOS uses thin provisioning
instead of a single filesystem across the entire storage device would be
welcome. Although UFS devices support thin provisioning, I am not aware
of any use cases in Android that would benefit from UFS thin
provisioning support.

Thanks,

Bart.
* Re: [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage
       [not found]   ` <CAG9=OMPHZqdDhX=M+ovdg5fa3x4-Q_1r5SWPa8pMTQw0mr5fPg@mail.gmail.com>
  2022-09-16 20:01     ` Bart Van Assche
@ 2022-09-20 7:46     ` Christoph Hellwig
       [not found]       ` <CAAKderPF5Z5QLxyEb80Y+90+eR0sfRmL-WfgXLp=eL=HxWSZ9g@mail.gmail.com>
  1 sibling, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2022-09-20 7:46 UTC
  To: Sarthak Kukreti
  Cc: Jens Axboe, Gwendal Grignou, Theodore Ts'o, Michael S. Tsirkin,
	Bart Van Assche, Mike Snitzer, linux-kernel, virtualization,
	linux-block, dm-devel, Andreas Dilger, Daniil Lunev,
	Stefan Hajnoczi, Paolo Bonzini, linux-ext4, Evan Green,
	Alasdair Kergon

On Fri, Sep 16, 2022 at 11:48:34AM -0700, Sarthak Kukreti wrote:
> Yes. On ChromiumOS, we regularly deal with storage devices that don't
> support WRITE_ZEROES or that need to have it disabled, via a quirk,
> due to a bug in the vendor's implementation.

So bloody punish the vendors for it. Unlike most of the Linux community
you actually have purchasing power and you'd help everyone by making use
of that instead of adding hacks to upstream.
[parent not found: <CAAKderPF5Z5QLxyEb80Y+90+eR0sfRmL-WfgXLp=eL=HxWSZ9g@mail.gmail.com>]
* Re: [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage
       [not found]       ` <CAAKderPF5Z5QLxyEb80Y+90+eR0sfRmL-WfgXLp=eL=HxWSZ9g@mail.gmail.com>
@ 2022-09-20 11:30         ` Christoph Hellwig
       [not found]           ` <CAAKderNcHpbBqWqqd5-WuKLRCQQUt7a_4D4ti4gy15+fKGK0vQ@mail.gmail.com>
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2022-09-20 11:30 UTC
  To: Daniil Lunev
  Cc: Jens Axboe, linux-block, Theodore Ts'o, Sarthak Kukreti,
	Michael S. Tsirkin, Bart Van Assche, Mike Snitzer, linux-kernel,
	Gwendal Grignou, virtualization, Christoph Hellwig, dm-devel,
	Andreas Dilger, Stefan Hajnoczi, Paolo Bonzini, linux-ext4,
	Evan Green, Alasdair Kergon

On Tue, Sep 20, 2022 at 08:17:10PM +1000, Daniil Lunev wrote:
> to WRITE ZERO command in NVMe, but to WRITE UNAVAILABLE in

There is no such thing as WRITE UNAVAILABLE in NVMe.

> NVME 2.0 spec, and to UNMAP ANCHORED in SCSI spec.

The SCSI anchored LBA state is quite complicated, and in addition to
UNMAP you can also create it using WRITE SAME, which is at least
partially useful, as it allows for a sensible initialization pattern.
For the purpose of Linux that would be 0.

That being said you still haven't actually explained what problem
you're even trying to solve.
[parent not found: <CAAKderNcHpbBqWqqd5-WuKLRCQQUt7a_4D4ti4gy15+fKGK0vQ@mail.gmail.com>]
* Re: [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage
       [not found]           ` <CAAKderNcHpbBqWqqd5-WuKLRCQQUt7a_4D4ti4gy15+fKGK0vQ@mail.gmail.com>
@ 2022-09-21 15:08             ` Mike Snitzer
  2022-09-23 8:51             ` Christoph Hellwig
  1 sibling, 0 replies; 18+ messages in thread
From: Mike Snitzer @ 2022-09-21 15:08 UTC
  To: Daniil Lunev
  Cc: Jens Axboe, linux-block, Theodore Ts'o, Sarthak Kukreti,
	Michael S. Tsirkin, Bart Van Assche, Mike Snitzer, linux-kernel,
	Gwendal Grignou, virtualization, Christoph Hellwig, dm-devel,
	Andreas Dilger, Stefan Hajnoczi, Paolo Bonzini, linux-ext4,
	Evan Green, Alasdair Kergon

On Tue, Sep 20 2022 at 5:48P -0400,
Daniil Lunev <dlunev@google.com> wrote:

> > There is no such thing as WRITE UNAVAILABLE in NVMe.
> Apologies, that is WRITE UNCORRECTABLE. Chapter 3.2.7 of
> NVM Express NVM Command Set Specification 1.0b
>
> > That being said you still haven't actually explained what problem
> > you're even trying to solve.
>
> The specific problem is the following:
> * There is a thinpool over a physical device
> * There are multiple logical volumes over the thin pool
> * Each logical volume has an independent file system and an
>   independent application running over it
> * Each application is potentially allowed to consume the entirety
>   of the disk space - there is no strict size limit for application
> * Applications need to pre-allocate space sometime, for which
>   they use fallocate. Once the operation succeeded, the application
>   assumed the space is guaranteed to be there for it.
> * Since filesystems on the volumes are independent, filesystem
>   level enforcement of size constraints is impossible and the only
>   common level is the thin pool, thus, each fallocate has to find its
>   representation in thin pool one way or another - otherwise you
>   may end up in the situation, where FS thinks it has allocated space
>   but when it tries to actually write it, the thin pool is already
>   exhausted.
> * Hole-Punching fallocate will not reach the thin pool, so the only
>   solution presently is zero-writing pre-allocate.
> * Not all storage devices support zero-writing efficiently - apart
>   from NVMe being or not being capable of doing efficient write
>   zero - changing which is easier said than done, and would take
>   years - there are also other types of storage devices that do not
>   have WRITE ZERO capability in the first place or have it in a
>   peculiar way. And adding custom WRITE ZERO to LVM would be
>   arguably a much bigger hack.
> * Thus, a provisioning block operation allows an interface specific
>   operation that guarantees the presence of the block in the
>   mapped space. LVM Thin-pool itself is the primary target for our
>   use case but the argument is that this operation maps well to
>   other interfaces which allow thinly provisioned units.

Thanks for this overview. Should help level-set others.

Adding fallocate support has been a long-standing dm-thin TODO item for
me. I just never got around to it. So thanks to Sarthak, you and anyone
else who had a hand in developing this.

I had a look at the DM thin implementation and it looks pretty simple
(doesn't require a thin-metadata change, etc). I'll look closer at the
broader implementation (block, etc) but I'm encouraged by what I'm
seeing.

Mike
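The overcommit scenario in the quoted list can be made concrete with a toy userspace model. This is only a sketch under stated assumptions: every name in it (`toy_pool`, `toy_volume`, `toy_fallocate_fs_only`, `toy_fallocate_provision`, `toy_write_reserved`) is hypothetical, blocks are counted rather than mapped, and none of it is dm-thin code. It contrasts a fallocate that stops at the filesystem with one that is passed down to the shared pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: one pool of physical blocks shared by several thin volumes. */
struct toy_pool {
	size_t total_blocks;   /* physical capacity of the pool */
	size_t mapped_blocks;  /* blocks already handed out to volumes */
};

struct toy_volume {
	struct toy_pool *pool;
	size_t fs_reserved;    /* blocks the filesystem has promised to the app */
	size_t pool_mapped;    /* blocks the pool has actually mapped for us */
};

/* fallocate() that stops at the filesystem: the pool never hears about it. */
static bool toy_fallocate_fs_only(struct toy_volume *v, size_t nblocks)
{
	assert(v && v->pool);
	v->fs_reserved += nblocks;
	return true;
}

/* fallocate() that is passed down: the pool maps the blocks or refuses. */
static bool toy_fallocate_provision(struct toy_volume *v, size_t nblocks)
{
	assert(v && v->pool);
	struct toy_pool *p = v->pool;

	if (p->total_blocks - p->mapped_blocks < nblocks)
		return false;          /* shortage is reported at fallocate time */
	p->mapped_blocks += nblocks;
	v->pool_mapped += nblocks;
	v->fs_reserved += nblocks;
	return true;
}

/* Write the first nblocks of the reserved range; unmapped blocks need pool space. */
static bool toy_write_reserved(struct toy_volume *v, size_t nblocks)
{
	assert(v && v->pool && nblocks <= v->fs_reserved);
	struct toy_pool *p = v->pool;
	size_t need = nblocks > v->pool_mapped ? nblocks - v->pool_mapped : 0;

	if (p->total_blocks - p->mapped_blocks < need)
		return false;          /* pool exhausted at write time */
	p->mapped_blocks += need;
	v->pool_mapped += need;
	return true;
}
```

With a 100-block pool and two volumes that each reserve 80 blocks, the filesystem-only variant accepts both reservations and the second volume's write fails later; the pass-through variant rejects the second reservation up front, which is the guarantee the applications are assuming.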
* Re: [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage
       [not found]           ` <CAAKderNcHpbBqWqqd5-WuKLRCQQUt7a_4D4ti4gy15+fKGK0vQ@mail.gmail.com>
  2022-09-21 15:08             ` Mike Snitzer
@ 2022-09-23 8:51             ` Christoph Hellwig
  2022-09-23 14:08               ` Mike Snitzer
  1 sibling, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2022-09-23 8:51 UTC
  To: Daniil Lunev
  Cc: Jens Axboe, linux-block, Theodore Ts'o, Sarthak Kukreti,
	Michael S. Tsirkin, Bart Van Assche, Mike Snitzer, linux-kernel,
	Gwendal Grignou, virtualization, Christoph Hellwig, dm-devel,
	Andreas Dilger, Stefan Hajnoczi, Paolo Bonzini, linux-ext4,
	Evan Green, Alasdair Kergon

On Wed, Sep 21, 2022 at 07:48:50AM +1000, Daniil Lunev wrote:
> > There is no such thing as WRITE UNAVAILABLE in NVMe.
> Apologies, that is WRITE UNCORRECTABLE. Chapter 3.2.7 of
> NVM Express NVM Command Set Specification 1.0b

Write uncorrectable is a very different thing, and the equivalent of the
horribly misnamed SCSI WRITE LONG COMMAND. It injects an unrecoverable
error, and does not provision anything.

> * Each application is potentially allowed to consume the entirety
>   of the disk space - there is no strict size limit for application
> * Applications need to pre-allocate space sometime, for which
>   they use fallocate. Once the operation succeeded, the application
>   assumed the space is guaranteed to be there for it.
> * Since filesystems on the volumes are independent, filesystem
>   level enforcement of size constraints is impossible and the only
>   common level is the thin pool, thus, each fallocate has to find its
>   representation in thin pool one way or another - otherwise you
>   may end up in the situation, where FS thinks it has allocated space
>   but when it tries to actually write it, the thin pool is already
>   exhausted.
> * Hole-Punching fallocate will not reach the thin pool, so the only
>   solution presently is zero-writing pre-allocate.

To me it sounds like you want a non-thin pool in dm-thin and/or
guaranteed space reservations for it.

> * Thus, a provisioning block operation allows an interface specific
>   operation that guarantees the presence of the block in the
>   mapped space. LVM Thin-pool itself is the primary target for our
>   use case but the argument is that this operation maps well to
>   other interfaces which allow thinly provisioned units.

I think where you are trying to go here is badly mistaken. With flash
(or hard drive SMR) there is no such thing as provisioning LBAs. Every
write is out of place, and a one time space allocation does not help
you at all. So fundamentally what you try to do here just goes against
the actual physics of modern storage media. While there are some
layers that keep up a pretence, trying to do that at an exposed API
level is a really bad idea.
* Re: [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage
  2022-09-23 8:51             ` Christoph Hellwig
@ 2022-09-23 14:08               ` Mike Snitzer
  0 siblings, 0 replies; 18+ messages in thread
From: Mike Snitzer @ 2022-09-23 14:08 UTC
  To: Christoph Hellwig
  Cc: Jens Axboe, Gwendal Grignou, Theodore Ts'o, Stefan Hajnoczi,
	Michael S. Tsirkin, Bart Van Assche, Mike Snitzer, linux-kernel,
	virtualization, linux-block, dm-devel, Andreas Dilger,
	Daniil Lunev, Sarthak Kukreti, Paolo Bonzini, linux-ext4,
	Evan Green, Alasdair Kergon

On Fri, Sep 23 2022 at 4:51P -0400,
Christoph Hellwig <hch@infradead.org> wrote:

> On Wed, Sep 21, 2022 at 07:48:50AM +1000, Daniil Lunev wrote:
> > > There is no such thing as WRITE UNAVAILABLE in NVMe.
> > Apologies, that is WRITE UNCORRECTABLE. Chapter 3.2.7 of
> > NVM Express NVM Command Set Specification 1.0b
>
> Write uncorrectable is a very different thing, and the equivalent of the
> horribly misnamed SCSI WRITE LONG COMMAND. It injects an unrecoverable
> error, and does not provision anything.
>
> > * Each application is potentially allowed to consume the entirety
> >   of the disk space - there is no strict size limit for application
> > * Applications need to pre-allocate space sometime, for which
> >   they use fallocate. Once the operation succeeded, the application
> >   assumed the space is guaranteed to be there for it.
> > * Since filesystems on the volumes are independent, filesystem
> >   level enforcement of size constraints is impossible and the only
> >   common level is the thin pool, thus, each fallocate has to find its
> >   representation in thin pool one way or another - otherwise you
> >   may end up in the situation, where FS thinks it has allocated space
> >   but when it tries to actually write it, the thin pool is already
> >   exhausted.
> > * Hole-Punching fallocate will not reach the thin pool, so the only
> >   solution presently is zero-writing pre-allocate.
>
> To me it sounds like you want a non-thin pool in dm-thin and/or
> guaranteed space reservations for it.

What is implemented in this patchset: enablement for dm-thinp to
actually provide guarantees which fallocate requires.

Seems you're getting hung up on the finishing details in HW (details
which are _not_ the point of this patchset).

The proposed changes are in service to _Linux_ code. The patchset
implements the primitive from top (ext4) to bottom (dm-thinp, loop).
It stops short of implementing handling everywhere that'd need it
(e.g. in XFS, etc). But those changes can come as follow-on work once
the primitive is established top to bottom.

But you know all this ;)

> > * Thus, a provisioning block operation allows an interface specific
> >   operation that guarantees the presence of the block in the
> >   mapped space. LVM Thin-pool itself is the primary target for our
> >   use case but the argument is that this operation maps well to
> >   other interfaces which allow thinly provisioned units.
>
> I think where you are trying to go here is badly mistaken. With flash
> (or hard drive SMR) there is no such thing as provisioning LBAs. Every
> write is out of place, and a one time space allocation does not help
> you at all. So fundamentally what you try to do here just goes against
> the actual physics of modern storage media. While there are some
> layers that keep up a pretence, trying to do that at an exposed API
> level is a really bad idea.

This doesn't need to be so feudal. Reserving an LBA in physical HW
really isn't the point.

Fact remains: an operation that ensures space is actually reserved via
fallocate is long overdue (just because an FS did its job doesn't mean
underlying layers reflect that). And certainly useful, even if "only"
benefiting dm-thinp and the loop driver. Like other block primitives,
REQ_OP_PROVISION is filtered out by block core if the device doesn't
support it.

That said, I agree with Brian Foster that we need really solid
documentation and justification for why fallocate mode=0 cannot be used
(but the case has been made in this thread).

Also, I do see an issue with the implementation (relative to stacked
devices): dm_table_supports_provision() is too myopic about DM. It
needs to go a step further and verify that some layer in the stack
actually services REQ_OP_PROVISION. Will respond to DM patch too.
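The stacking concern raised above, that a pass-through layer should not advertise provisioning unless something beneath it really allocates space, can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: `toy_bdev`, `toy_provision_role` and `toy_stack_services_provision()` are hypothetical names, each device has a single lower device (real DM tables can have several), and this is not how dm_table_supports_provision() is written in the patchset.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of how one layer treats provision requests. */
enum toy_provision_role {
	TOY_PROVISION_UNSUPPORTED, /* layer rejects provision requests */
	TOY_PROVISION_PASSTHROUGH, /* layer only forwards them to the device below */
	TOY_PROVISION_SERVICES,    /* layer allocates space itself */
};

/* Toy stacked block device with at most one device underneath it. */
struct toy_bdev {
	const char *name;
	enum toy_provision_role role;
	const struct toy_bdev *lower; /* NULL at the bottom of the stack */
};

/*
 * Walk down the stack: report support only if some layer actually services
 * the request, not merely because the top layer is willing to forward it.
 */
static bool toy_stack_services_provision(const struct toy_bdev *dev)
{
	for (; dev != NULL; dev = dev->lower) {
		if (dev->role == TOY_PROVISION_SERVICES)
			return true;
		if (dev->role == TOY_PROVISION_UNSUPPORTED)
			return false;
		/* pass-through layer: keep looking further down */
	}
	return false;
}
```

In this model a pass-through target stacked on a plain disk reports no support, while the same target stacked on a thin-style device does, which is the distinction the review comment asks for.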
[parent not found: <YyU5CyQfS+64xmnm@magnolia>]
[parent not found: <CAG9=OMNPnsjaUw2EUG0XFjV94-V1eD63V+1anoGM=EWKyzXEfg@mail.gmail.com>]
* Re: [dm-devel] [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage
       [not found]     ` <CAG9=OMNPnsjaUw2EUG0XFjV94-V1eD63V+1anoGM=EWKyzXEfg@mail.gmail.com>
@ 2022-09-19 16:36       ` Stefan Hajnoczi
  0 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2022-09-19 16:36 UTC
  To: Sarthak Kukreti
  Cc: Jens Axboe, Gwendal Grignou, Theodore Ts'o, Michael S. Tsirkin,
	Darrick J. Wong, Bart Van Assche, Mike Snitzer, linux-kernel,
	virtualization, linux-block, dm-devel, Evan Green, Daniil Lunev,
	Paolo Bonzini, Andreas Dilger, linux-ext4, Alasdair Kergon

On Sat, Sep 17, 2022 at 12:46:33PM -0700, Sarthak Kukreti wrote:
> On Fri, Sep 16, 2022 at 8:03 PM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Thu, Sep 15, 2022 at 09:48:18AM -0700, Sarthak Kukreti wrote:
> > > From: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > >
> > > Hi,
> > >
> > > This patch series is an RFC of a mechanism to pass through provision
> > > requests on stacked thinly provisioned storage devices/filesystems.
> >
> > [Reflowed text]
> >
> > > The Linux kernel provides several mechanisms to set up thinly
> > > provisioned block storage abstractions (e.g. dm-thin, loop devices over
> > > sparse files), either directly as block devices or backing storage for
> > > filesystems. Currently, short of writing data to either the device or
> > > filesystem, there is no way for users to pre-allocate space for use in
> > > such storage setups. Consider the following use-cases:
> > >
> > > 1) Suspend-to-disk and resume from a dm-thin device: In order to
> > > ensure that the underlying thinpool metadata is not modified during
> > > the suspend mechanism, the dm-thin device needs to be fully
> > > provisioned.
> > > 2) If a filesystem uses a loop device over a sparse file, fallocate()
> > > on the filesystem will allocate blocks for files but the underlying
> > > sparse file will remain intact.
> > > 3) Another example is a virtual machine using a sparse file/dm-thin as a
> > > storage device; by default, allocations within the VM boundaries will
> > > not affect the host.
> > > 4) Several storage standards support mechanisms for thin provisioning
> > > on real hardware devices. For example:
> > > a. The NVMe spec 1.0b section 2.1.1 loosely talks about thin
> > > provisioning: "When the THINP bit in the NSFEAT field of the
> > > Identify Namespace data structure is set to ‘1’, the controller ...
> > > shall track the number of allocated blocks in the Namespace
> > > Utilization field"
> > > b. The SCSI Block Commands reference - 4 section references "Thin
> > > provisioned logical units",
> > > c. UFS 3.0 spec section 13.3.3 references "Thin provisioning".
> > >
> > > In all of the above situations, currently the only way for
> > > pre-allocating space is to issue writes (or use
> > > WRITE_ZEROES/WRITE_SAME). However, that does not scale well with
> > > larger pre-allocation sizes.
> > >
> > > This patchset introduces primitives to support block-level
> > > provisioning (note: the term 'provisioning' is used to prevent
> > > overloading the term 'allocations/pre-allocations') requests across
> > > filesystems and block devices. This allows fallocate() and file
> > > creation requests to reserve space across stacked layers of block
> > > devices and filesystems. Currently, the patchset covers a prototype on
> > > the device-mapper targets, loop device and ext4, but the same
> > > mechanism can be extended to other filesystems/block devices as well
> > > as extended for use with devices in 4 a-c.
> >
> > If you call REQ_OP_PROVISION on an unmapped LBA range of a block device
> > and then try to read the provisioned blocks, what do you get? Zeroes?
> > Random stale disk contents?
> >
> > I think I saw elsewhere in the thread that any mapped LBAs within the
> > provisioning range are left alone (i.e. not zeroed) so I'll proceed on
> > that basis.
> >
> For block devices, I'd say it's definitely possible to get stale data, depending
> on the implementation of the allocation layer; for example, with dm-thinpool,
> the default setting via using LVM2 tools is to zero out blocks on allocation.
> But that's configurable and can be turned off to improve performance.
>
> Similarly, for actual devices that end up supporting thin provisioning, unless
> the specification absolutely mandates that an LBA contains zeroes post
> allocation, some implementations will definitely miss out on that (probably
> similar to the semantics of discard_zeroes_data today). I'm operating under
> the assumption that it's possible to get stale data from LBAs allocated using
> provision requests at the block layer and trying to see if we can create a
> safe default operating model from that.

Please explain the semantics of REQ_OP_PROVISION in the
code/documentation in the next revision.

Thanks,
Stefan
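The semantics being discussed (already-mapped blocks are left alone, and newly mapped blocks may or may not be zeroed depending on configuration) can be sketched as a toy model. This is an illustration only, assuming an identity mapping and one byte per block; `toy_thin`, `toy_provision()` and the `zero_new_blocks` switch are hypothetical names and are not taken from dm-thin or the patchset.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NBLOCKS 4

/* Toy thin device: one byte stands in for the contents of a whole block. */
struct toy_thin {
	unsigned char data[TOY_NBLOCKS]; /* whatever is physically on the media */
	bool mapped[TOY_NBLOCKS];        /* is the block provisioned? */
	bool zero_new_blocks;            /* zero blocks when they are first mapped */
};

/*
 * Provision [first, first + count): blocks that are already mapped keep their
 * contents; newly mapped blocks are zeroed only if zero_new_blocks is set.
 * Returns 0 on success, -1 if the range is out of bounds.
 */
static int toy_provision(struct toy_thin *t, size_t first, size_t count)
{
	if (first > TOY_NBLOCKS || count > TOY_NBLOCKS - first)
		return -1;

	for (size_t i = first; i < first + count; i++) {
		if (t->mapped[i])
			continue;           /* mapped data is never touched */
		t->mapped[i] = true;
		if (t->zero_new_blocks)
			t->data[i] = 0;
		/* otherwise a later read returns whatever was on the media */
	}
	return 0;
}
```

With zeroing turned off, a read of a freshly provisioned block in this model returns the stale byte, which is the case the thread asks to have spelled out in the REQ_OP_PROVISION documentation.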
[parent not found: <20220915164826.1396245-5-sarthakkukreti@google.com>]
* Re: [PATCH RFC 4/8] fs: Introduce FALLOC_FL_PROVISION
       [not found] ` <20220915164826.1396245-5-sarthakkukreti@google.com>
@ 2022-09-16 11:56   ` Brian Foster
       [not found]     ` <CAG9=OMNL1Z3DiO-usdH0k90NDsDkDQ7A7CHc4Nu6MCXKNKjWdw@mail.gmail.com>
  2022-09-20 7:49   ` Christoph Hellwig
  1 sibling, 1 reply; 18+ messages in thread
From: Brian Foster @ 2022-09-16 11:56 UTC
  To: Sarthak Kukreti
  Cc: Jens Axboe, Gwendal Grignou, Theodore Ts'o, Michael S. Tsirkin,
	Bart Van Assche, Mike Snitzer, linux-kernel, virtualization,
	linux-block, dm-devel, Andreas Dilger, Daniil Lunev,
	Stefan Hajnoczi, Paolo Bonzini, linux-ext4, Evan Green,
	Alasdair Kergon

On Thu, Sep 15, 2022 at 09:48:22AM -0700, Sarthak Kukreti wrote:
> From: Sarthak Kukreti <sarthakkukreti@chromium.org>
>
> FALLOC_FL_PROVISION is a new fallocate() allocation mode that
> sends a hint to (supported) thinly provisioned block devices to
> allocate space for the given range of sectors via REQ_OP_PROVISION.
>
> Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> ---
>  block/fops.c                | 7 ++++++-
>  include/linux/falloc.h      | 3 ++-
>  include/uapi/linux/falloc.h | 8 ++++++++
>  3 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/block/fops.c b/block/fops.c
> index b90742595317..a436a7596508 100644
> --- a/block/fops.c
> +++ b/block/fops.c
...
> @@ -661,6 +662,10 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>  		error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
>  					     len >> SECTOR_SHIFT, GFP_KERNEL);
>  		break;
> +	case FALLOC_FL_PROVISION:
> +		error = blkdev_issue_provision(bdev, start >> SECTOR_SHIFT,
> +					       len >> SECTOR_SHIFT, GFP_KERNEL);
> +		break;
>  	default:
>  		error = -EOPNOTSUPP;
>  	}

Hi Sarthak,

Neat mechanism.. I played with something very similar in the past (that
was much more crudely hacked up to target dm-thin) to allow filesystems
to request a thinly provisioned device to allocate blocks and try to do
a better job of avoiding inactivation when overprovisioned.

One thing I'm a little curious about here.. what's the need for a new
fallocate mode? On a cursory glance, the provision mode looks fairly
analogous to normal (mode == 0) allocation mode with the exception of
sending the request down to the bdev. blkdev_fallocate() already maps
some of the logical falloc modes (i.e. punch hole, zero range) to
sending write sames or discards, etc., and it doesn't currently look
like it supports allocation mode, so could it not map such requests to
the underlying REQ_OP_PROVISION op?

I guess the difference would be at the filesystem level where we'd
probably need to rely on a mount option or some such to control whether
traditional fallocate issues provision ops (like you've implemented for
ext4) vs. the specific falloc command, but that seems fairly consistent
with historical punch hole/discard behavior too. Hm? You might want to
cc linux-fsdevel in future posts in any event to get some more feedback
on how other filesystems might want to interact with such a thing.

BTW another thing that might be useful wrt dm-thin is to support
FALLOC_FL_UNSHARE_RANGE. I.e., it looks like the previous dm-thin patch
only checks that blocks are allocated, but not whether those blocks are
shared (re: lookup_result.shared). It might be useful to do the COW in
such cases if the caller passes down a REQ_UNSHARE or some such flag.

Brian

> diff --git a/include/linux/falloc.h b/include/linux/falloc.h
> index f3f0b97b1675..a0e506255b20 100644
> --- a/include/linux/falloc.h
> +++ b/include/linux/falloc.h
> @@ -30,7 +30,8 @@ struct space_resv {
>  					 FALLOC_FL_COLLAPSE_RANGE |	\
>  					 FALLOC_FL_ZERO_RANGE |		\
>  					 FALLOC_FL_INSERT_RANGE |	\
> -					 FALLOC_FL_UNSHARE_RANGE)
> +					 FALLOC_FL_UNSHARE_RANGE |	\
> +					 FALLOC_FL_PROVISION)
>
>  /* on ia32 l_start is on a 32-bit boundary */
>  #if defined(CONFIG_X86_64)
> diff --git a/include/uapi/linux/falloc.h b/include/uapi/linux/falloc.h
> index 51398fa57f6c..2d323d113eed 100644
> --- a/include/uapi/linux/falloc.h
> +++ b/include/uapi/linux/falloc.h
> @@ -77,4 +77,12 @@
>   */
>  #define FALLOC_FL_UNSHARE_RANGE		0x40
>
> +/*
> + * FALLOC_FL_PROVISION acts as a hint for thinly provisioned devices to allocate
> + * blocks for the range/EOF.
> + *
> + * FALLOC_FL_PROVISION can only be used with allocate-mode fallocate.
> + */
> +#define FALLOC_FL_PROVISION		0x80
> +
>  #endif /* _UAPI_FALLOC_H_ */
> --
> 2.31.0
>
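The alternative raised in this review, letting blkdev_fallocate() treat plain mode == 0 as a provision request instead of adding a flag, can be sketched as a standalone dispatch function. This is a simplified illustration, not the kernel code: the `TOY_` constants mirror the uapi flag values but are redefined locally, `toy_blkdev_fallocate_op()` and the `provision_on_mode0` switch are hypothetical, the existing cases are an approximation of the mainline switch of that period, and the value 0x80 is only what this RFC proposes.

```c
#include <stdbool.h>

/* Local copies of fallocate flag values so the sketch is self-contained. */
#define TOY_FALLOC_FL_KEEP_SIZE      0x01
#define TOY_FALLOC_FL_PUNCH_HOLE     0x02
#define TOY_FALLOC_FL_NO_HIDE_STALE  0x04
#define TOY_FALLOC_FL_ZERO_RANGE     0x10
#define TOY_FALLOC_FL_PROVISION      0x80 /* value proposed in the RFC patch */

/* Which block-layer operation a given fallocate mode would turn into. */
enum toy_block_op {
	TOY_OP_UNSUPPORTED,
	TOY_OP_ZEROOUT,
	TOY_OP_DISCARD,
	TOY_OP_PROVISION,
};

/*
 * provision_on_mode0 == false models the RFC as posted (explicit flag only);
 * provision_on_mode0 == true models the suggestion to map mode == 0 as well.
 */
static enum toy_block_op toy_blkdev_fallocate_op(int mode, bool provision_on_mode0)
{
	switch (mode) {
	case 0:
		return provision_on_mode0 ? TOY_OP_PROVISION : TOY_OP_UNSUPPORTED;
	case TOY_FALLOC_FL_ZERO_RANGE:
	case TOY_FALLOC_FL_ZERO_RANGE | TOY_FALLOC_FL_KEEP_SIZE:
	case TOY_FALLOC_FL_PUNCH_HOLE | TOY_FALLOC_FL_KEEP_SIZE:
		return TOY_OP_ZEROOUT;
	case TOY_FALLOC_FL_PUNCH_HOLE | TOY_FALLOC_FL_KEEP_SIZE |
	     TOY_FALLOC_FL_NO_HIDE_STALE:
		return TOY_OP_DISCARD;
	case TOY_FALLOC_FL_PROVISION:
		return TOY_OP_PROVISION;
	default:
		return TOY_OP_UNSUPPORTED;
	}
}
```

The sketch shows that both designs end in the same block operation; the open question in the thread is only whether userspace selects it with a dedicated flag or gets it implicitly from allocate-mode fallocate.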
[parent not found: <CAG9=OMNL1Z3DiO-usdH0k90NDsDkDQ7A7CHc4Nu6MCXKNKjWdw@mail.gmail.com>]
* Re: [PATCH RFC 4/8] fs: Introduce FALLOC_FL_PROVISION
       [not found]     ` <CAG9=OMNL1Z3DiO-usdH0k90NDsDkDQ7A7CHc4Nu6MCXKNKjWdw@mail.gmail.com>
@ 2022-09-21 15:39       ` Brian Foster
       [not found]         ` <CAG9=OMPEoShYMx6A+p97-tw4MuLpgOEpy7aFs5CH6wTedptALQ@mail.gmail.com>
  0 siblings, 1 reply; 18+ messages in thread
From: Brian Foster @ 2022-09-21 15:39 UTC
  To: Sarthak Kukreti
  Cc: Jens Axboe, Gwendal Grignou, Theodore Ts'o, Michael S. Tsirkin,
	Bart Van Assche, Mike Snitzer, linux-kernel, virtualization,
	linux-block, dm-devel, Andreas Dilger, Daniil Lunev,
	Stefan Hajnoczi, Paolo Bonzini, linux-ext4, Evan Green,
	Alasdair Kergon

On Fri, Sep 16, 2022 at 02:02:31PM -0700, Sarthak Kukreti wrote:
> On Fri, Sep 16, 2022 at 4:56 AM Brian Foster <bfoster@redhat.com> wrote:
> >
> > On Thu, Sep 15, 2022 at 09:48:22AM -0700, Sarthak Kukreti wrote:
> > > From: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > >
> > > FALLOC_FL_PROVISION is a new fallocate() allocation mode that
> > > sends a hint to (supported) thinly provisioned block devices to
> > > allocate space for the given range of sectors via REQ_OP_PROVISION.
> > >
> > > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> > > ---
> > >  block/fops.c                | 7 ++++++-
> > >  include/linux/falloc.h      | 3 ++-
> > >  include/uapi/linux/falloc.h | 8 ++++++++
> > >  3 files changed, 16 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/block/fops.c b/block/fops.c
> > > index b90742595317..a436a7596508 100644
> > > --- a/block/fops.c
> > > +++ b/block/fops.c
> > ...
> > > @@ -661,6 +662,10 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
> > >  		error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
> > >  					     len >> SECTOR_SHIFT, GFP_KERNEL);
> > >  		break;
> > > +	case FALLOC_FL_PROVISION:
> > > +		error = blkdev_issue_provision(bdev, start >> SECTOR_SHIFT,
> > > +					       len >> SECTOR_SHIFT, GFP_KERNEL);
> > > +		break;
> > >  	default:
> > >  		error = -EOPNOTSUPP;
> > >  	}
> >
> > Hi Sarthak,
> >
> > Neat mechanism.. I played with something very similar in the past (that
> > was much more crudely hacked up to target dm-thin) to allow filesystems
> > to request a thinly provisioned device to allocate blocks and try to do
> > a better job of avoiding inactivation when overprovisioned.
> >
> > One thing I'm a little curious about here.. what's the need for a new
> > fallocate mode? On a cursory glance, the provision mode looks fairly
> > analogous to normal (mode == 0) allocation mode with the exception of
> > sending the request down to the bdev. blkdev_fallocate() already maps
> > some of the logical falloc modes (i.e. punch hole, zero range) to
> > sending write sames or discards, etc., and it doesn't currently look
> > like it supports allocation mode, so could it not map such requests to
> > the underlying REQ_OP_PROVISION op?
> >
> > I guess the difference would be at the filesystem level where we'd
> > probably need to rely on a mount option or some such to control whether
> > traditional fallocate issues provision ops (like you've implemented for
> > ext4) vs. the specific falloc command, but that seems fairly consistent
> > with historical punch hole/discard behavior too. Hm? You might want to
> > cc linux-fsdevel in future posts in any event to get some more feedback
> > on how other filesystems might want to interact with such a thing.
> >
> Thanks for the feedback!
> Argh, I completely forgot that I should add linux-fsdevel. Let me
> re-send this with linux-fsdevel cc'd
>
> There's a slight distinction in that the current filesystem-level
> controls are usually for default handling, but userspace can still
> call the relevant functions manually if they need to. For example, for
> ext4, the 'discard' mount option dictates whether free blocks are
> discarded, but it doesn't set the policy to allow/disallow userspace
> from manually punching holes into files even if the mount opt is
> 'nodiscard'. FALLOC_FL_PROVISION is similar in that regard; it adds a
> manual mechanism for users to provision the files' extents, that is
> separate from the filesystems' default handling of provisioning files.
>

What I'm trying to understand is why not let blkdev_fallocate() issue a
provision based on the default mode (i.e. mode == 0) of fallocate(),
which is already defined to mean "perform allocation?" It currently
issues discards or write zeroes based on variants of
FALLOC_FL_PUNCH_HOLE without the need for a separate FALLOC_FL_DISCARD
mode, for example.

Brian

> > BTW another thing that might be useful wrt dm-thin is to support
> > FALLOC_FL_UNSHARE_RANGE. I.e., it looks like the previous dm-thin patch
> > only checks that blocks are allocated, but not whether those blocks are
> > shared (re: lookup_result.shared). It might be useful to do the COW in
> > such cases if the caller passes down a REQ_UNSHARE or some such flag.
> >
> That's an interesting idea! There's a few more things on the TODO list
> for this patch series but I think we can follow up with a patch to
> handle that as well.
>
> Sarthak
>
> > Brian
> >
> > > diff --git a/include/linux/falloc.h b/include/linux/falloc.h
> > > index f3f0b97b1675..a0e506255b20 100644
> > > --- a/include/linux/falloc.h
> > > +++ b/include/linux/falloc.h
> > > @@ -30,7 +30,8 @@ struct space_resv {
> > >  					 FALLOC_FL_COLLAPSE_RANGE |	\
> > >  					 FALLOC_FL_ZERO_RANGE |		\
> > >  					 FALLOC_FL_INSERT_RANGE |	\
> > > -					 FALLOC_FL_UNSHARE_RANGE)
> > > +					 FALLOC_FL_UNSHARE_RANGE |	\
> > > +					 FALLOC_FL_PROVISION)
> > >
> > >  /* on ia32 l_start is on a 32-bit boundary */
> > >  #if defined(CONFIG_X86_64)
> > > diff --git a/include/uapi/linux/falloc.h b/include/uapi/linux/falloc.h
> > > index 51398fa57f6c..2d323d113eed 100644
> > > --- a/include/uapi/linux/falloc.h
> > > +++ b/include/uapi/linux/falloc.h
> > > @@ -77,4 +77,12 @@
> > >   */
> > >  #define FALLOC_FL_UNSHARE_RANGE		0x40
> > >
> > > +/*
> > > + * FALLOC_FL_PROVISION acts as a hint for thinly provisioned devices to allocate
> > > + * blocks for the range/EOF.
> > > + *
> > > + * FALLOC_FL_PROVISION can only be used with allocate-mode fallocate.
> > > + */
> > > +#define FALLOC_FL_PROVISION		0x80
> > > +
> > >  #endif /* _UAPI_FALLOC_H_ */
> > > --
> > > 2.31.0
> > >
> >
>
[parent not found: <CAG9=OMPEoShYMx6A+p97-tw4MuLpgOEpy7aFs5CH6wTedptALQ@mail.gmail.com>]
* Re: [PATCH RFC 4/8] fs: Introduce FALLOC_FL_PROVISION [not found] ` <CAG9=OMPEoShYMx6A+p97-tw4MuLpgOEpy7aFs5CH6wTedptALQ@mail.gmail.com> @ 2022-09-22 18:29 ` Brian Foster 0 siblings, 0 replies; 18+ messages in thread From: Brian Foster @ 2022-09-22 18:29 UTC (permalink / raw) To: Sarthak Kukreti Cc: Jens Axboe, Gwendal Grignou, Theodore Ts'o, Michael S . Tsirkin, Bart Van Assche, Mike Snitzer, linux-kernel, virtualization, linux-block, dm-devel, Andreas Dilger, Daniil Lunev, Stefan Hajnoczi, Paolo Bonzini, linux-ext4, Evan Green, Alasdair Kergon On Thu, Sep 22, 2022 at 01:04:33AM -0700, Sarthak Kukreti wrote: > On Wed, Sep 21, 2022 at 8:39 AM Brian Foster <bfoster@redhat.com> wrote: > > > > On Fri, Sep 16, 2022 at 02:02:31PM -0700, Sarthak Kukreti wrote: > > > On Fri, Sep 16, 2022 at 4:56 AM Brian Foster <bfoster@redhat.com> wrote: > > > > > > > > On Thu, Sep 15, 2022 at 09:48:22AM -0700, Sarthak Kukreti wrote: > > > > > From: Sarthak Kukreti <sarthakkukreti@chromium.org> > > > > > > > > > > FALLOC_FL_PROVISION is a new fallocate() allocation mode that > > > > > sends a hint to (supported) thinly provisioned block devices to > > > > > allocate space for the given range of sectors via REQ_OP_PROVISION. > > > > > > > > > > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org> > > > > > --- > > > > > block/fops.c | 7 ++++++- > > > > > include/linux/falloc.h | 3 ++- > > > > > include/uapi/linux/falloc.h | 8 ++++++++ > > > > > 3 files changed, 16 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/block/fops.c b/block/fops.c > > > > > index b90742595317..a436a7596508 100644 > > > > > --- a/block/fops.c > > > > > +++ b/block/fops.c > > > > ... 
> > > > > @@ -661,6 +662,10 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start, > > > > > error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT, > > > > > len >> SECTOR_SHIFT, GFP_KERNEL); > > > > > break; > > > > > + case FALLOC_FL_PROVISION: > > > > > + error = blkdev_issue_provision(bdev, start >> SECTOR_SHIFT, > > > > > + len >> SECTOR_SHIFT, GFP_KERNEL); > > > > > + break; > > > > > default: > > > > > error = -EOPNOTSUPP; > > > > > } > > > > > > > > Hi Sarthak, > > > > > > > > Neat mechanism.. I played with something very similar in the past (that > > > > was much more crudely hacked up to target dm-thin) to allow filesystems > > > > to request a thinly provisioned device to allocate blocks and try to do > > > > a better job of avoiding inactivation when overprovisioned. > > > > > > > > One thing I'm a little curious about here.. what's the need for a new > > > > fallocate mode? On a cursory glance, the provision mode looks fairly > > > > analogous to normal (mode == 0) allocation mode with the exception of > > > > sending the request down to the bdev. blkdev_fallocate() already maps > > > > some of the logical falloc modes (i.e. punch hole, zero range) to > > > > sending write sames or discards, etc., and it doesn't currently look > > > > like it supports allocation mode, so could it not map such requests to > > > > the underlying REQ_OP_PROVISION op? > > > > > > > > I guess the difference would be at the filesystem level where we'd > > > > probably need to rely on a mount option or some such to control whether > > > > traditional fallocate issues provision ops (like you've implemented for > > > > ext4) vs. the specific falloc command, but that seems fairly consistent > > > > with historical punch hole/discard behavior too. Hm? You might want to > > > > cc linux-fsdevel in future posts in any event to get some more feedback > > > > on how other filesystems might want to interact with such a thing. 
> > > > > > > Thanks for the feedback! > > > Argh, I completely forgot that I should add linux-fsdevel. Let me > > > re-send this with linux-fsdevel cc'd > > > > > > There's a slight distinction is that the current filesystem-level > > > controls are usually for default handling, but userspace can still > > > call the relevant functions manually if they need to. For example, for > > > ext4, the 'discard' mount option dictates whether free blocks are > > > discarded, but it doesn't set the policy to allow/disallow userspace > > > from manually punching holes into files even if the mount opt is > > > 'nodiscard'. FALLOC_FL_PROVISION is similar in that regard; it adds a > > > manual mechanism for users to provision the files' extents, that is > > > separate from the filesystems' default handling of provisioning files. > > > > > > > What I'm trying to understand is why not let blkdev_fallocate() issue a > > provision based on the default mode (i.e. mode == 0) of fallocate(), > > which is already defined to mean "perform allocation?" It currently > > issues discards or write zeroes based on variants of > > FALLOC_FL_PUNCH_HOLE without the need for a separate FALLOC_FL_DISCARD > > mode, for example. > > > It's mostly to keep the block device fallocate() semantics in-line and > consistent with the file-specific modes: I added the separate > filesystem fallocate() mode under the assumption that we'd want to > keep the traditional handling for filesystems intact with (mode == 0). > And for block devices, I didn't map the requests to mode == 0 so that > it's less confusing to describe (eg. mode == 0 on block devices will > issue provision; mode == 0 on files will not). It would complicate > loopback devices, for instance; if the loop device is backed by a > file, it would need to use (mode == FALLOC_FL_PROVISION) but if the > loop device is backed by another block device, then the fallocate() > call would need to switch to (mode == 0). 
>

I would expect the loopback scenario for provision to behave similarly to
how discards are handled. I.e., loopback receives a provision request
and translates that to fallocate(mode = 0). If the backing device is
block, blkdev_fallocate(mode = 0) translates that to another provision
request. If the backing device is a file, the associated fallocate
handler allocs/maps, if necessary, and then issues a provision on
allocation, if enabled by the fs. AFAICT there's no need for
FL_PROVISION at all in that scenario.

Is there a functional purpose to FL_PROVISION? Is the intent to try and
guarantee that a provision request propagates down the I/O stack? If so,
what happens if blocks were already preallocated in the backing file (in
the loopback file example)?

BTW, an unrelated thing I noticed is that blkdev_fallocate()
unconditionally calls truncate_bdev_range(), which probably doesn't make
sense for any sort of alloc mode.

> With the separate mode, we can describe the semantics of fallocate()
> modes a bit more cleanly, and it is common for both files and block
> devices:
>
> 1. mode == 0: allocation at the same layer, will not provision on the
> underlying device/filesystem (unsupported for block devices).
> 2. mode == FALLOC_FL_PROVISION: allocation at the same layer, will
> provision on the underlying device/filesystem.
>

I think I see why you make the distinction, since the block layer
doesn't have a "this layer only" mode, but IMO it's also quite confusing
to say that mode == FL_PROVISION can allocate at the current and
underlying layer(s) but mode == 0 to that underlying layer cannot.

Either way, if you want to propose a new falloc mode/modifier, it
probably warrants a more detailed commit log with more explanation of
the purpose, examples of behavior, perhaps some details on how the mode
might be documented in man pages, etc.

Brian > Block devices don't technically need to use a separate mode, but it > makes it much less confusing if filesystems are already using a > separate mode for provision. > > Best > Sarthak > > > Brian > > > > > > BTW another thing that might be useful wrt to dm-thin is to support > > > > FALLOC_FL_UNSHARE. I.e., it looks like the previous dm-thin patch only > > > > checks that blocks are allocated, but not whether those blocks are > > > > shared (re: lookup_result.shared). It might be useful to do the COW in > > > > such cases if the caller passes down a REQ_UNSHARE or some such flag. > > > > > > > That's an interesting idea! There's a few more things on the TODO list > > > for this patch series but I think we can follow up with a patch to > > > handle that as well. > > > > > > Sarthak > > > > > > > Brian > > > > > > > > > diff --git a/include/linux/falloc.h b/include/linux/falloc.h > > > > > index f3f0b97b1675..a0e506255b20 100644 > > > > > --- a/include/linux/falloc.h > > > > > +++ b/include/linux/falloc.h > > > > > @@ -30,7 +30,8 @@ struct space_resv { > > > > > FALLOC_FL_COLLAPSE_RANGE | \ > > > > > FALLOC_FL_ZERO_RANGE | \ > > > > > FALLOC_FL_INSERT_RANGE | \ > > > > > - FALLOC_FL_UNSHARE_RANGE) > > > > > + FALLOC_FL_UNSHARE_RANGE | \ > > > > > + FALLOC_FL_PROVISION) > > > > > > > > > > /* on ia32 l_start is on a 32-bit boundary */ > > > > > #if defined(CONFIG_X86_64) > > > > > diff --git a/include/uapi/linux/falloc.h b/include/uapi/linux/falloc.h > > > > > index 51398fa57f6c..2d323d113eed 100644 > > > > > --- a/include/uapi/linux/falloc.h > > > > > +++ b/include/uapi/linux/falloc.h > > > > > @@ -77,4 +77,12 @@ > > > > > */ > > > > > #define FALLOC_FL_UNSHARE_RANGE 0x40 > > > > > > > > > > +/* > > > > > + * FALLOC_FL_PROVISION acts as a hint for thinly provisioned devices to allocate > > > > > + * blocks for the range/EOF. > > > > > + * > > > > > + * FALLOC_FL_PROVISION can only be used with allocate-mode fallocate. 
> > > > > + */ > > > > > +#define FALLOC_FL_PROVISION 0x80 > > > > > + > > > > > #endif /* _UAPI_FALLOC_H_ */ > > > > > -- > > > > > 2.31.0 > > > > > > > > > > > > > > > _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 18+ messages in thread
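The two designs debated in this sub-thread can be compared with a small userspace sketch. It is an illustration only, not kernel code: the `SK_`/`sk_` names and the `alloc_means_provision` switch are hypothetical, the existing-mode cases only roughly mirror the switch in blkdev_fallocate(), and 0x80 is the value proposed in the RFC patch rather than an upstream flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as in include/uapi/linux/falloc.h; the SK_ prefix marks them
 * as local to this sketch. SK_FL_PROVISION is the value proposed in the RFC
 * patch (assumption: it is not an upstream flag). */
enum {
    SK_FL_KEEP_SIZE     = 0x01,
    SK_FL_PUNCH_HOLE    = 0x02,
    SK_FL_NO_HIDE_STALE = 0x04,
    SK_FL_ZERO_RANGE    = 0x10,
    SK_FL_PROVISION     = 0x80,
};

/* Hypothetical stand-ins for the block operation a request turns into. */
enum sk_op {
    SK_OP_UNSUPPORTED,
    SK_OP_WRITE_ZEROES,
    SK_OP_DISCARD,
    SK_OP_PROVISION,
};

/*
 * Map an fallocate() mode on a block device to a block operation.
 *
 * alloc_means_provision == false: the RFC's design, where only the new
 *   flag provisions and mode == 0 stays unsupported on block devices.
 * alloc_means_provision == true: the alternative raised in review, where
 *   plain allocation (mode == 0) provisions and no new flag exists.
 */
enum sk_op sk_map_mode(int mode, bool alloc_means_provision)
{
    switch (mode) {
    case 0:
        return alloc_means_provision ? SK_OP_PROVISION : SK_OP_UNSUPPORTED;
    case SK_FL_PROVISION:
        return alloc_means_provision ? SK_OP_UNSUPPORTED : SK_OP_PROVISION;
    case SK_FL_ZERO_RANGE:
    case SK_FL_ZERO_RANGE | SK_FL_KEEP_SIZE:
    case SK_FL_PUNCH_HOLE | SK_FL_KEEP_SIZE:
        return SK_OP_WRITE_ZEROES;
    case SK_FL_PUNCH_HOLE | SK_FL_KEEP_SIZE | SK_FL_NO_HIDE_STALE:
        return SK_OP_DISCARD;
    default:
        return SK_OP_UNSUPPORTED;
    }
}

/* Usage example: a loop-style caller that always passes mode == 0 reaches
 * the provision op only under the alternative design. */
void sk_demo_map_mode(void)
{
    assert(sk_map_mode(0, true) == SK_OP_PROVISION);
    assert(sk_map_mode(0, false) == SK_OP_UNSUPPORTED);
}
```

Under the alternative design a stacked caller never has to know whether its backing store is a file or a block device, which is the loopback point made above.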
* Re: [PATCH RFC 4/8] fs: Introduce FALLOC_FL_PROVISION
       [not found] ` <20220915164826.1396245-5-sarthakkukreti@google.com>
  2022-09-16 11:56   ` [PATCH RFC 4/8] fs: Introduce FALLOC_FL_PROVISION Brian Foster
@ 2022-09-20  7:49   ` Christoph Hellwig
       [not found]     ` <CAG9=OMNoG01UUStNs_Zhsv6mXZw0M0q2v54ZriJvHZ4aspvjEQ@mail.gmail.com>
  1 sibling, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2022-09-20  7:49 UTC (permalink / raw)
  To: Sarthak Kukreti
  Cc: Jens Axboe, Gwendal Grignou, Theodore Ts'o, Michael S. Tsirkin,
	Bart Van Assche, Mike Snitzer, linux-kernel, virtualization,
	linux-block, dm-devel, Andreas Dilger, Daniil Lunev,
	Stefan Hajnoczi, Paolo Bonzini, linux-ext4, Evan Green,
	Alasdair Kergon

On Thu, Sep 15, 2022 at 09:48:22AM -0700, Sarthak Kukreti wrote:
> From: Sarthak Kukreti <sarthakkukreti@chromium.org>
>
> FALLOC_FL_PROVISION is a new fallocate() allocation mode that
> sends a hint to (supported) thinly provisioned block devices to
> allocate space for the given range of sectors via REQ_OP_PROVISION.

So, how does that "provisioning" actually work in today's world, where
storage is usually doing out-of-place writes in one or more layers,
including the flash storage everyone is using? Does it give you one
write? An unlimited number? Some undecided number in between?

How is it affected by write zeroes to that range or a discard?

[parent not found: <CAG9=OMNoG01UUStNs_Zhsv6mXZw0M0q2v54ZriJvHZ4aspvjEQ@mail.gmail.com>]
* Re: [PATCH RFC 4/8] fs: Introduce FALLOC_FL_PROVISION [not found] ` <CAG9=OMNoG01UUStNs_Zhsv6mXZw0M0q2v54ZriJvHZ4aspvjEQ@mail.gmail.com> @ 2022-09-21 15:21 ` Mike Snitzer 2022-09-23 8:45 ` Christoph Hellwig 1 sibling, 0 replies; 18+ messages in thread From: Mike Snitzer @ 2022-09-21 15:21 UTC (permalink / raw) To: Sarthak Kukreti Cc: Jens Axboe, linux-block, Theodore Ts'o, Michael S . Tsirkin, Bart Van Assche, Mike Snitzer, linux-kernel, Gwendal Grignou, virtualization, Christoph Hellwig, dm-devel, Andreas Dilger, Daniil Lunev, Stefan Hajnoczi, Paolo Bonzini, linux-ext4, Evan Green, Alasdair Kergon On Wed, Sep 21 2022 at 1:54P -0400, Sarthak Kukreti <sarthakkukreti@chromium.org> wrote: > On Tue, Sep 20, 2022 at 12:49 AM Christoph Hellwig <hch@infradead.org> wrote: > > > > On Thu, Sep 15, 2022 at 09:48:22AM -0700, Sarthak Kukreti wrote: > > > From: Sarthak Kukreti <sarthakkukreti@chromium.org> > > > > > > FALLOC_FL_PROVISION is a new fallocate() allocation mode that > > > sends a hint to (supported) thinly provisioned block devices to > > > allocate space for the given range of sectors via REQ_OP_PROVISION. > > > > So, how does that "provisioning" actually work in todays world where > > storage is usually doing out of place writes in one or more layers, > > including the flash storage everyone is using. Does it give you one > > write? And unlimited number? Some undecided number inbetween? > > Apologies, the patchset was a bit short on describing the semantics so > I'll expand more in the next revision; I'd say that it's the minimum > of regular mode fallocate() guarantees at each allocation layer. 
For
> example, the guarantees from a contrived storage stack like (left to
> right is bottom to top):
>
> [ mmc0blkp1 | ext4(1) | sparse file | loop | dm-thinp | dm-thin | ext4(2) ]
>
> would be predicated on the guarantees of fallocate() per allocation
> layer; if ext4(1) was replaced by a filesystem that did not support
> fallocate(), then there would be no guarantee that a write to a file
> on ext4(2) succeeds.
>
> For dm-thinp, in the current implementation, the provision request
> allocates blocks for the range specified and adds the mapping to the
> thinpool metadata. All subsequent writes are to the same block, so
> you'll be able to write to the same block infinitely. Brian mentioned
> this above, one case it doesn't cover is if provision is called on a
> shared block, but the natural extension would be to allocate and
> assign a new block and copy the contents of the shared block (kind of
> like copy-on-provision).

It follows that ChromiumOS isn't using dm-thinp's snapshot support?

But please do fold in incremental dm-thinp support to properly handle
shared blocks (dm-thinp already handles breaking sharing, etc., so I'll
need to see where you're hooking in such that you don't get this "for
free").

Mike

* Re: [PATCH RFC 4/8] fs: Introduce FALLOC_FL_PROVISION
       [not found] ` <CAG9=OMNoG01UUStNs_Zhsv6mXZw0M0q2v54ZriJvHZ4aspvjEQ@mail.gmail.com>
  2022-09-21 15:21   ` Mike Snitzer
@ 2022-09-23  8:45   ` Christoph Hellwig
  1 sibling, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2022-09-23  8:45 UTC (permalink / raw)
  To: Sarthak Kukreti
  Cc: Jens Axboe, linux-block, Theodore Ts'o, Michael S. Tsirkin,
	Bart Van Assche, Mike Snitzer, linux-kernel, Gwendal Grignou,
	virtualization, Christoph Hellwig, dm-devel, Andreas Dilger,
	Daniil Lunev, Stefan Hajnoczi, Paolo Bonzini, linux-ext4,
	Evan Green, Alasdair Kergon

On Tue, Sep 20, 2022 at 10:54:32PM -0700, Sarthak Kukreti wrote:
> [ mmc0blkp1 | ext4(1) | sparse file | loop | dm-thinp | dm-thin | ext4(2) ]
>
> would be predicated on the guarantees of fallocate() per allocation
> layer; if ext4(1) was replaced by a filesystem that did not support
> fallocate(), then there would be no guarantee that a write to a file
> on ext4(2) succeeds.

A write, or an unlimited number of writes?

[parent not found: <20220915164826.1396245-3-sarthakkukreti@google.com>]
* Re: [PATCH RFC 2/8] dm: Add support for block provisioning [not found] ` <20220915164826.1396245-3-sarthakkukreti@google.com> @ 2022-09-23 14:23 ` Mike Snitzer 0 siblings, 0 replies; 18+ messages in thread From: Mike Snitzer @ 2022-09-23 14:23 UTC (permalink / raw) To: Sarthak Kukreti Cc: Jens Axboe, Gwendal Grignou, Theodore Ts'o, Michael S . Tsirkin, Bart Van Assche, Mike Snitzer, linux-kernel, virtualization, linux-block, dm-devel, Andreas Dilger, Daniil Lunev, Stefan Hajnoczi, Paolo Bonzini, linux-ext4, Evan Green, Alasdair Kergon On Thu, Sep 15 2022 at 12:48P -0400, Sarthak Kukreti <sarthakkukreti@chromium.org> wrote: > From: Sarthak Kukreti <sarthakkukreti@chromium.org> > > Add support to dm devices for REQ_OP_PROVISION. The default mode > is to pass through the request and dm-thin will utilize it to provision > blocks. > > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org> > --- > drivers/md/dm-crypt.c | 4 +- > drivers/md/dm-linear.c | 1 + > drivers/md/dm-table.c | 17 +++++++ > drivers/md/dm-thin.c | 86 +++++++++++++++++++++++++++++++++-- > drivers/md/dm.c | 4 ++ > include/linux/device-mapper.h | 6 +++ > 6 files changed, 113 insertions(+), 5 deletions(-) > > diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c > index 159c6806c19b..357f0899cfb6 100644 > --- a/drivers/md/dm-crypt.c > +++ b/drivers/md/dm-crypt.c > @@ -3081,6 +3081,8 @@ static int crypt_ctr_optional(struct dm_target *ti, unsigned int argc, char **ar > if (ret) > return ret; > > + ti->num_provision_bios = 1; > + > while (opt_params--) { > opt_string = dm_shift_arg(&as); > if (!opt_string) { > @@ -3384,7 +3386,7 @@ static int crypt_map(struct dm_target *ti, struct bio *bio) > * - for REQ_OP_DISCARD caller must use flush if IO ordering matters > */ > if (unlikely(bio->bi_opf & REQ_PREFLUSH || > - bio_op(bio) == REQ_OP_DISCARD)) { > + bio_op(bio) == REQ_OP_DISCARD || bio_op(bio) == REQ_OP_PROVISION)) { > bio_set_dev(bio, cc->dev->bdev); > if (bio_sectors(bio)) > 
bio->bi_iter.bi_sector = cc->start +
> diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
> index 3212ef6aa81b..1aa782149428 100644
> --- a/drivers/md/dm-linear.c
> +++ b/drivers/md/dm-linear.c
> @@ -61,6 +61,7 @@ static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	ti->num_discard_bios = 1;
>  	ti->num_secure_erase_bios = 1;
>  	ti->num_write_zeroes_bios = 1;
> +	ti->num_provision_bios = 1;
>  	ti->private = lc;
>  	return 0;
>
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 332f96b58252..b7f9cb66b7ba 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -1853,6 +1853,18 @@ static bool dm_table_supports_write_zeroes(struct dm_table *t)
>  	return true;
>  }
>
> +static bool dm_table_supports_provision(struct dm_table *t)
> +{
> +	for (unsigned int i = 0; i < t->num_targets; i++) {
> +		struct dm_target *ti = dm_table_get_target(t, i);
> +
> +		if (ti->num_provision_bios)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +

This needs to go a step further and verify that a device in the stack
actually services REQ_OP_PROVISION. Please see
dm_table_supports_discards(): it iterates all devices in the table and
checks that support is advertised.

For discard, DM requires that _all_ devices in a table advertise support
(that is pretty strict and likely could be relaxed to _any_).

You'll need ti->provision_supported (like ->discards_supported) to
advertise that actual support is provided by dm-thinp (even if the
underlying devices don't support it).

And yeah, dm-thinp passdown support for REQ_OP_PROVISION can follow
later as needed (if there is actual HW that would benefit from
REQ_OP_PROVISION).

Mike

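The difference between the _all_ and _any_ policies mentioned in this review can be sketched in plain C. This is a simplified model under stated assumptions, not the DM implementation: `struct sk_target` and every `sk_` function are hypothetical, `provision_supported` stands in for the ti->provision_supported field suggested above, and `devices_support_provision` collapses what an iterate_devices() walk over the underlying devices would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct dm_target. */
struct sk_target {
    unsigned int num_provision_bios; /* 0: target never receives provision bios */
    bool provision_supported;        /* target services provision itself (e.g. thin) */
    bool devices_support_provision;  /* all underlying devices advertise support */
};

/* A target counts only if it accepts provision bios AND something actually
 * services them: either the target itself or the devices underneath it. */
static bool sk_target_supports_provision(const struct sk_target *ti)
{
    if (!ti->num_provision_bios)
        return false;
    return ti->provision_supported || ti->devices_support_provision;
}

/* require_all == true models the strict policy described for discard;
 * require_all == false models the relaxed "any target" policy. */
bool sk_table_supports_provision(const struct sk_target *targets, size_t n,
                                 bool require_all)
{
    for (size_t i = 0; i < n; i++) {
        bool ok = sk_target_supports_provision(&targets[i]);

        if (require_all && !ok)
            return false;
        if (!require_all && ok)
            return true;
    }
    /* Strict: no target failed. Relaxed: no target qualified. */
    return require_all;
}

/* Usage example: a passthrough target over a device without support next to
 * a thin-like target that services provision itself. */
void sk_demo_table_support(void)
{
    const struct sk_target table[] = {
        { .num_provision_bios = 1, .provision_supported = false,
          .devices_support_provision = false },
        { .num_provision_bios = 1, .provision_supported = true,
          .devices_support_provision = false },
    };

    assert(!sk_table_supports_provision(table, 2, true));
    assert(sk_table_supports_provision(table, 2, false));
}
```

Note that the check in the quoted patch would report support for the first target in this table too, because it looks only at num_provision_bios; that is the gap the review points out.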
[parent not found: <20220915164826.1396245-2-sarthakkukreti@google.com>]
* Re: [PATCH RFC 1/8] block: Introduce provisioning primitives [not found] ` <20220915164826.1396245-2-sarthakkukreti@google.com> @ 2022-09-23 15:15 ` Mike Snitzer 0 siblings, 0 replies; 18+ messages in thread From: Mike Snitzer @ 2022-09-23 15:15 UTC (permalink / raw) To: Sarthak Kukreti Cc: Jens Axboe, Gwendal Grignou, Theodore Ts'o, Michael S . Tsirkin, Bart Van Assche, Mike Snitzer, linux-kernel, virtualization, linux-block, dm-devel, Andreas Dilger, Daniil Lunev, Stefan Hajnoczi, Paolo Bonzini, linux-ext4, Evan Green, Alasdair Kergon On Thu, Sep 15 2022 at 12:48P -0400, Sarthak Kukreti <sarthakkukreti@chromium.org> wrote: > From: Sarthak Kukreti <sarthakkukreti@chromium.org> > > Introduce block request REQ_OP_PROVISION. The intent of this request > is to request underlying storage to preallocate disk space for the given > block range. Block device that support this capability will export > a provision limit within their request queues. > > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org> > --- > block/blk-core.c | 5 ++++ > block/blk-lib.c | 55 +++++++++++++++++++++++++++++++++++++++ > block/blk-merge.c | 17 ++++++++++++ > block/blk-settings.c | 19 ++++++++++++++ > block/blk-sysfs.c | 8 ++++++ > block/bounce.c | 1 + > include/linux/bio.h | 6 +++-- > include/linux/blk_types.h | 5 +++- > include/linux/blkdev.h | 16 ++++++++++++ > 9 files changed, 129 insertions(+), 3 deletions(-) > > diff --git a/block/blk-settings.c b/block/blk-settings.c > index 8bb9eef5310e..be79ad68b330 100644 > --- a/block/blk-settings.c > +++ b/block/blk-settings.c > @@ -57,6 +57,7 @@ void blk_set_default_limits(struct queue_limits *lim) > lim->misaligned = 0; > lim->zoned = BLK_ZONED_NONE; > lim->zone_write_granularity = 0; > + lim->max_provision_sectors = 0; > } > EXPORT_SYMBOL(blk_set_default_limits); > > @@ -81,6 +82,7 @@ void blk_set_stacking_limits(struct queue_limits *lim) > lim->max_dev_sectors = UINT_MAX; > lim->max_write_zeroes_sectors = UINT_MAX; > 
lim->max_zone_append_sectors = UINT_MAX;
> +	lim->max_provision_sectors = UINT_MAX;
>  }
>  EXPORT_SYMBOL(blk_set_stacking_limits);
>

Please work through the blk_stack_limits() implementation too, so that
max_provision_sectors is stacked there as well (a simple
min_not_zero()?).

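A minimal userspace sketch of what stacking the new limit with min_not_zero() semantics would look like. The `sk_` names and `struct sk_limits` are hypothetical; only the max_provision_sectors field name comes from the quoted patch, and `sk_min_not_zero()` is a function-form approximation of the kernel's min_not_zero() macro for unsigned int.

```c
#include <assert.h>
#include <limits.h>

/* Smaller of the two values, ignoring a zero on either side; the result is
 * zero only when both inputs are zero. */
static unsigned int sk_min_not_zero(unsigned int a, unsigned int b)
{
    if (a == 0)
        return b;
    if (b == 0)
        return a;
    return a < b ? a : b;
}

/* Hypothetical cut-down stand-in for struct queue_limits. */
struct sk_limits {
    unsigned int max_provision_sectors;
};

/* Fold the limit of a lower (bottom) device into the stacked (top) limits,
 * the way a blk_stack_limits()-style helper would for one field. */
void sk_stack_provision_limit(struct sk_limits *top,
                              const struct sk_limits *bottom)
{
    top->max_provision_sectors =
        sk_min_not_zero(top->max_provision_sectors,
                        bottom->max_provision_sectors);
}

/* Usage example: start from the stacking default (UINT_MAX) and fold in
 * two lower devices with different limits. */
void sk_demo_stacking(void)
{
    struct sk_limits top = { .max_provision_sectors = UINT_MAX };
    const struct sk_limits a = { .max_provision_sectors = 4096 };
    const struct sk_limits b = { .max_provision_sectors = 1024 };

    sk_stack_provision_limit(&top, &a);
    sk_stack_provision_limit(&top, &b);
    assert(top.max_provision_sectors == 1024);
}
```

One consequence of this choice: a lower device reporting 0 (no support) does not clear the stacked value, so the stacked limit alone cannot say whether every device supports provisioning. A plain minimum would propagate the zero instead.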
[parent not found: <20220915164826.1396245-4-sarthakkukreti@google.com>]
* Re: [PATCH RFC 3/8] virtio_blk: Add support for provision requests [not found] ` <20220915164826.1396245-4-sarthakkukreti@google.com> @ 2022-09-16 5:48 ` Stefan Hajnoczi 2022-09-27 21:37 ` Michael S. Tsirkin 1 sibling, 0 replies; 18+ messages in thread From: Stefan Hajnoczi @ 2022-09-16 5:48 UTC (permalink / raw) To: Sarthak Kukreti Cc: Jens Axboe, Gwendal Grignou, Theodore Ts'o, Michael S . Tsirkin, Bart Van Assche, Mike Snitzer, linux-kernel, virtualization, linux-block, dm-devel, Andreas Dilger, Daniil Lunev, Paolo Bonzini, linux-ext4, Evan Green, Alasdair Kergon [-- Attachment #1.1: Type: text/plain, Size: 4536 bytes --] On Thu, Sep 15, 2022 at 09:48:21AM -0700, Sarthak Kukreti wrote: > From: Sarthak Kukreti <sarthakkukreti@chromium.org> > > Adds support for provision requests. Provision requests act like > the inverse of discards. > > Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org> > --- > drivers/block/virtio_blk.c | 48 +++++++++++++++++++++++++++++++++ > include/uapi/linux/virtio_blk.h | 9 +++++++ > 2 files changed, 57 insertions(+) Please send a VIRTIO spec patch too: https://github.com/oasis-tcs/virtio-spec#providing-feedback Stefan > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c > index 30255fcaf181..eacc2bffe1d1 100644 > --- a/drivers/block/virtio_blk.c > +++ b/drivers/block/virtio_blk.c > @@ -178,6 +178,39 @@ static int virtblk_setup_discard_write_zeroes(struct request *req, bool unmap) > return 0; > } > > +static int virtblk_setup_provision(struct request *req) > +{ > + unsigned short segments = blk_rq_nr_discard_segments(req); > + unsigned short n = 0; > + > + struct virtio_blk_discard_write_zeroes *range; > + struct bio *bio; > + u32 flags = 0; > + > + range = kmalloc_array(segments, sizeof(*range), GFP_ATOMIC); > + if (!range) > + return -ENOMEM; > + > + __rq_for_each_bio(bio, req) { > + u64 sector = bio->bi_iter.bi_sector; > + u32 num_sectors = bio->bi_iter.bi_size >> SECTOR_SHIFT; > + > + range[n].flags = 
cpu_to_le32(flags); > + range[n].num_sectors = cpu_to_le32(num_sectors); > + range[n].sector = cpu_to_le64(sector); > + n++; > + } > + > + WARN_ON_ONCE(n != segments); > + > + req->special_vec.bv_page = virt_to_page(range); > + req->special_vec.bv_offset = offset_in_page(range); > + req->special_vec.bv_len = sizeof(*range) * segments; > + req->rq_flags |= RQF_SPECIAL_PAYLOAD; > + > + return 0; > +} > + > static void virtblk_unmap_data(struct request *req, struct virtblk_req *vbr) > { > if (blk_rq_nr_phys_segments(req)) > @@ -243,6 +276,9 @@ static blk_status_t virtblk_setup_cmd(struct virtio_device *vdev, > case REQ_OP_DRV_IN: > type = VIRTIO_BLK_T_GET_ID; > break; > + case REQ_OP_PROVISION: > + type = VIRTIO_BLK_T_PROVISION; > + break; > default: > WARN_ON_ONCE(1); > return BLK_STS_IOERR; > @@ -256,6 +292,11 @@ static blk_status_t virtblk_setup_cmd(struct virtio_device *vdev, > return BLK_STS_RESOURCE; > } > > + if (type == VIRTIO_BLK_T_PROVISION) { > + if (virtblk_setup_provision(req)) > + return BLK_STS_RESOURCE; > + } > + > return 0; > } > > @@ -1075,6 +1116,12 @@ static int virtblk_probe(struct virtio_device *vdev) > blk_queue_max_write_zeroes_sectors(q, v ? v : UINT_MAX); > } > > + if (virtio_has_feature(vdev, VIRTIO_BLK_F_PROVISION)) { > + virtio_cread(vdev, struct virtio_blk_config, > + max_provision_sectors, &v); > + q->limits.max_provision_sectors = v ? 
v : UINT_MAX; > + } > + > virtblk_update_capacity(vblk, false); > virtio_device_ready(vdev); > > @@ -1177,6 +1224,7 @@ static unsigned int features[] = { > VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE, > VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE, > VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD, VIRTIO_BLK_F_WRITE_ZEROES, > + VIRTIO_BLK_F_PROVISION, > }; > > static struct virtio_driver virtio_blk = { > diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h > index d888f013d9ff..184f8cf6d185 100644 > --- a/include/uapi/linux/virtio_blk.h > +++ b/include/uapi/linux/virtio_blk.h > @@ -40,6 +40,7 @@ > #define VIRTIO_BLK_F_MQ 12 /* support more than one vq */ > #define VIRTIO_BLK_F_DISCARD 13 /* DISCARD is supported */ > #define VIRTIO_BLK_F_WRITE_ZEROES 14 /* WRITE ZEROES is supported */ > +#define VIRTIO_BLK_F_PROVISION 15 /* provision is supported */ > > /* Legacy feature bits */ > #ifndef VIRTIO_BLK_NO_LEGACY > @@ -120,6 +121,11 @@ struct virtio_blk_config { > */ > __u8 write_zeroes_may_unmap; > > + /* > + * The maximum number of sectors in a provision request. > + */ > + __virtio32 max_provision_sectors; > + > __u8 unused1[3]; > } __attribute__((packed)); > > @@ -155,6 +161,9 @@ struct virtio_blk_config { > /* Write zeroes command */ > #define VIRTIO_BLK_T_WRITE_ZEROES 13 > > +/* Provision command */ > +#define VIRTIO_BLK_T_PROVISION 14 > + > #ifndef VIRTIO_BLK_NO_LEGACY > /* Barrier before this op. */ > #define VIRTIO_BLK_T_BARRIER 0x80000000 > -- > 2.31.0 > [-- Attachment #1.2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] [-- Attachment #2: Type: text/plain, Size: 183 bytes --] _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH RFC 3/8] virtio_blk: Add support for provision requests
       [not found] ` <20220915164826.1396245-4-sarthakkukreti@google.com>
  2022-09-16  5:48   ` [PATCH RFC 3/8] virtio_blk: Add support for provision requests Stefan Hajnoczi
@ 2022-09-27 21:37   ` Michael S. Tsirkin
  1 sibling, 0 replies; 18+ messages in thread
From: Michael S. Tsirkin @ 2022-09-27 21:37 UTC (permalink / raw)
  To: Sarthak Kukreti
  Cc: Jens Axboe, Gwendal Grignou, Theodore Ts'o, Bart Van Assche,
	Mike Snitzer, linux-kernel, virtualization, linux-block,
	dm-devel, Andreas Dilger, Daniil Lunev, Stefan Hajnoczi,
	Paolo Bonzini, linux-ext4, Evan Green, Alasdair Kergon

On Thu, Sep 15, 2022 at 09:48:21AM -0700, Sarthak Kukreti wrote:
> From: Sarthak Kukreti <sarthakkukreti@chromium.org>
>
> Adds support for provision requests. Provision requests act like
> the inverse of discards.
>
> Signed-off-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
> ---
>  drivers/block/virtio_blk.c      | 48 +++++++++++++++++++++++++++++++++
>  include/uapi/linux/virtio_blk.h |  9 +++++++
>  2 files changed, 57 insertions(+)
>
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 30255fcaf181..eacc2bffe1d1 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -178,6 +178,39 @@ static int virtblk_setup_discard_write_zeroes(struct request *req, bool unmap)
>  	return 0;
>  }
>
> +static int virtblk_setup_provision(struct request *req)
> +{
> +	unsigned short segments = blk_rq_nr_discard_segments(req);
> +	unsigned short n = 0;
> +
> +	struct virtio_blk_discard_write_zeroes *range;
> +	struct bio *bio;
> +	u32 flags = 0;
> +
> +	range = kmalloc_array(segments, sizeof(*range), GFP_ATOMIC);
> +	if (!range)
> +		return -ENOMEM;
> +
> +	__rq_for_each_bio(bio, req) {
> +		u64 sector = bio->bi_iter.bi_sector;
> +		u32 num_sectors = bio->bi_iter.bi_size >> SECTOR_SHIFT;
> +
> +		range[n].flags = cpu_to_le32(flags);
> +		range[n].num_sectors = cpu_to_le32(num_sectors);
> +		range[n].sector = cpu_to_le64(sector);
> +		n++;
> +	}
> +
> +	WARN_ON_ONCE(n != segments);
> +
> +	req->special_vec.bv_page = virt_to_page(range);
> +	req->special_vec.bv_offset = offset_in_page(range);
> +	req->special_vec.bv_len = sizeof(*range) * segments;
> +	req->rq_flags |= RQF_SPECIAL_PAYLOAD;
> +
> +	return 0;
> +}
> +
>  static void virtblk_unmap_data(struct request *req, struct virtblk_req *vbr)
>  {
>  	if (blk_rq_nr_phys_segments(req))
> @@ -243,6 +276,9 @@ static blk_status_t virtblk_setup_cmd(struct virtio_device *vdev,
>  	case REQ_OP_DRV_IN:
>  		type = VIRTIO_BLK_T_GET_ID;
>  		break;
> +	case REQ_OP_PROVISION:
> +		type = VIRTIO_BLK_T_PROVISION;
> +		break;
>  	default:
>  		WARN_ON_ONCE(1);
>  		return BLK_STS_IOERR;
> @@ -256,6 +292,11 @@ static blk_status_t virtblk_setup_cmd(struct virtio_device *vdev,
>  		return BLK_STS_RESOURCE;
>  	}
>
> +	if (type == VIRTIO_BLK_T_PROVISION) {
> +		if (virtblk_setup_provision(req))
> +			return BLK_STS_RESOURCE;
> +	}
> +
>  	return 0;
>  }
>
> @@ -1075,6 +1116,12 @@ static int virtblk_probe(struct virtio_device *vdev)
>  		blk_queue_max_write_zeroes_sectors(q, v ? v : UINT_MAX);
>  	}
>
> +	if (virtio_has_feature(vdev, VIRTIO_BLK_F_PROVISION)) {
> +		virtio_cread(vdev, struct virtio_blk_config,
> +			     max_provision_sectors, &v);
> +		q->limits.max_provision_sectors = v ? v : UINT_MAX;
> +	}
> +
>  	virtblk_update_capacity(vblk, false);
>  	virtio_device_ready(vdev);
>
> @@ -1177,6 +1224,7 @@ static unsigned int features[] = {
>  	VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE,
>  	VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE,
>  	VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD, VIRTIO_BLK_F_WRITE_ZEROES,
> +	VIRTIO_BLK_F_PROVISION,
>  };
>
>  static struct virtio_driver virtio_blk = {
> diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h
> index d888f013d9ff..184f8cf6d185 100644
> --- a/include/uapi/linux/virtio_blk.h
> +++ b/include/uapi/linux/virtio_blk.h
> @@ -40,6 +40,7 @@
>  #define VIRTIO_BLK_F_MQ		12	/* support more than one vq */
>  #define VIRTIO_BLK_F_DISCARD	13	/* DISCARD is supported */
>  #define VIRTIO_BLK_F_WRITE_ZEROES	14	/* WRITE ZEROES is supported */
> +#define VIRTIO_BLK_F_PROVISION	15	/* provision is supported */
>
>  /* Legacy feature bits */
>  #ifndef VIRTIO_BLK_NO_LEGACY
> @@ -120,6 +121,11 @@ struct virtio_blk_config {
>  	 */
>  	__u8 write_zeroes_may_unmap;
>
> +	/*
> +	 * The maximum number of sectors in a provision request.
> +	 */
> +	__virtio32 max_provision_sectors;
> +
>  	__u8 unused1[3];
>  } __attribute__((packed));
>
> @@ -155,6 +161,9 @@ struct virtio_blk_config {
>  /* Write zeroes command */
>  #define VIRTIO_BLK_T_WRITE_ZEROES	13
>
> +/* Provision command */
> +#define VIRTIO_BLK_T_PROVISION	14
> +
>  #ifndef VIRTIO_BLK_NO_LEGACY
>  /* Barrier before this op. */
>  #define VIRTIO_BLK_T_BARRIER	0x80000000

Feature bit has to be reserved in the virtio spec.
Pls do this through the virtio TC mailing list.

> --
> 2.31.0
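For readers following the quoted patch, here is a rough userspace sketch of what virtblk_setup_provision() does with each bio: it emits one range descriptor holding the start sector, the length in 512-byte sectors, and a zero flags word. Everything named below (struct extent, struct provision_range, build_provision_ranges) is hypothetical and exists only for illustration. The descriptor is assumed to mirror the field order of struct virtio_blk_discard_write_zeroes, and the values stay in host byte order, whereas the patch converts them with cpu_to_le32()/cpu_to_le64().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors, as in the block layer */

/* Hypothetical stand-in for a bio: start sector plus length in bytes. */
struct extent {
	uint64_t sector;
	uint32_t size_bytes;
};

/*
 * Hypothetical descriptor, assumed to follow the field order of
 * struct virtio_blk_discard_write_zeroes. Kept in host byte order
 * here; the patch stores little-endian values.
 */
struct provision_range {
	uint64_t sector;
	uint32_t num_sectors;
	uint32_t flags;
};

/*
 * Fill one descriptor per extent and return the payload length in
 * bytes, which corresponds to special_vec.bv_len in the patch.
 */
size_t build_provision_ranges(const struct extent *ext, size_t n,
			      struct provision_range *out)
{
	size_t i;

	assert(ext != NULL && out != NULL);

	for (i = 0; i < n; i++) {
		out[i].sector = ext[i].sector;
		out[i].num_sectors = ext[i].size_bytes >> SECTOR_SHIFT;
		out[i].flags = 0; /* the patch defines no provision flags */
	}

	return n * sizeof(*out);
}
```

A caller would pass an array of extents and an output array of the same length; the returned byte count is what would be attached to the request as its special payload.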
end of thread, other threads:[~2022-09-27 21:38 UTC | newest]
Thread overview: 18+ messages
[not found] <20220915164826.1396245-1-sarthakkukreti@google.com>
2022-09-16 6:09 ` [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage Stefan Hajnoczi
[not found] ` <CAG9=OMPHZqdDhX=M+ovdg5fa3x4-Q_1r5SWPa8pMTQw0mr5fPg@mail.gmail.com>
2022-09-16 20:01 ` Bart Van Assche
2022-09-20 7:46 ` Christoph Hellwig
[not found] ` <CAAKderPF5Z5QLxyEb80Y+90+eR0sfRmL-WfgXLp=eL=HxWSZ9g@mail.gmail.com>
2022-09-20 11:30 ` Christoph Hellwig
[not found] ` <CAAKderNcHpbBqWqqd5-WuKLRCQQUt7a_4D4ti4gy15+fKGK0vQ@mail.gmail.com>
2022-09-21 15:08 ` Mike Snitzer
2022-09-23 8:51 ` Christoph Hellwig
2022-09-23 14:08 ` Mike Snitzer
[not found] ` <YyU5CyQfS+64xmnm@magnolia>
[not found] ` <CAG9=OMNPnsjaUw2EUG0XFjV94-V1eD63V+1anoGM=EWKyzXEfg@mail.gmail.com>
2022-09-19 16:36 ` [dm-devel] " Stefan Hajnoczi
[not found] ` <20220915164826.1396245-5-sarthakkukreti@google.com>
2022-09-16 11:56 ` [PATCH RFC 4/8] fs: Introduce FALLOC_FL_PROVISION Brian Foster
[not found] ` <CAG9=OMNL1Z3DiO-usdH0k90NDsDkDQ7A7CHc4Nu6MCXKNKjWdw@mail.gmail.com>
2022-09-21 15:39 ` Brian Foster
[not found] ` <CAG9=OMPEoShYMx6A+p97-tw4MuLpgOEpy7aFs5CH6wTedptALQ@mail.gmail.com>
2022-09-22 18:29 ` Brian Foster
2022-09-20 7:49 ` Christoph Hellwig
[not found] ` <CAG9=OMNoG01UUStNs_Zhsv6mXZw0M0q2v54ZriJvHZ4aspvjEQ@mail.gmail.com>
2022-09-21 15:21 ` Mike Snitzer
2022-09-23 8:45 ` Christoph Hellwig
[not found] ` <20220915164826.1396245-3-sarthakkukreti@google.com>
2022-09-23 14:23 ` [PATCH RFC 2/8] dm: Add support for block provisioning Mike Snitzer
[not found] ` <20220915164826.1396245-2-sarthakkukreti@google.com>
2022-09-23 15:15 ` [PATCH RFC 1/8] block: Introduce provisioning primitives Mike Snitzer
[not found] ` <20220915164826.1396245-4-sarthakkukreti@google.com>
2022-09-16 5:48 ` [PATCH RFC 3/8] virtio_blk: Add support for provision requests Stefan Hajnoczi
2022-09-27 21:37 ` Michael S. Tsirkin