* [PATCH 1/1] virtio-blk: Add description for blk_size field @ 2024-09-25 14:52 Max Gurtovoy 2024-09-25 20:57 ` Daniel Verkamp 0 siblings, 1 reply; 10+ messages in thread From: Max Gurtovoy @ 2024-09-25 14:52 UTC (permalink / raw) To: mst, virtualization, stefanha, virtio-dev; +Cc: oren, parav, Max Gurtovoy This field is only valid when the VIRTIO_BLK_F_BLK_SIZE feature bit is set. The blk_size field represents the smallest addressable unit of data that can be read from or written to the device. It is always a power of two and typically ranges from 512 bytes to larger values such as 4 KB. Linux/Windows systems typically use 512-byte/4-KB block sizes. This description provides clarity on the constraints of the blk_size field. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> --- device-types/blk/description.tex | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/device-types/blk/description.tex b/device-types/blk/description.tex index 2712ada..88a7591 100644 --- a/device-types/blk/description.tex +++ b/device-types/blk/description.tex @@ -3,7 +3,10 @@ \section{Block Device}\label{sec:Device Types / Block Device} The virtio block device is a simple virtual block device (ie. disk). Read and write requests (and other exotic requests) are placed in one of its queues, and serviced (probably out of order) by the -device except where noted. +device except where noted. Each block device consists of a sequence of logical +blocks. A logical block represents the smallest addressable unit of data that +can be read from or written to the device. The logical block size is always a +power of two. Logical block sizes may be 512 bytes, 1KB, 2KB, 4KB, 8KB, etc. \subsection{Device ID}\label{sec:Device Types / Block Device / Device ID} 2 @@ -135,6 +138,9 @@ \subsection{Device configuration layout}\label{sec:Device Types / Block Device / present. The availability of the others all depend on various feature bits as indicated above. +The field \field{blk_size} exists only if VIRTIO_BLK_F_BLK_SIZE is set. This field +reports the logical block size of the device, expressed in bytes. + The field \field{num_queues} only exists if VIRTIO_BLK_F_MQ is set. This field specifies the number of queues. @@ -228,7 +234,7 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic \item The device size can be read from \field{capacity}. \item If the VIRTIO_BLK_F_BLK_SIZE feature is negotiated, - \field{blk_size} can be read to determine the optimal sector size + \field{blk_size} can be read to determine the logical block size for the driver to use. This does not affect the units used in the protocol (always 512 bytes), but awareness of the correct value can affect performance. @@ -282,6 +288,12 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic \drivernormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization} +Drivers MUST negotiate VIRTIO_BLK_F_BLK_SIZE if the feature is offered by the +device. + +If the VIRTIO_BLK_F_BLK_SIZE feature is not offered by the device, then drivers +MAY assume that the logical block size is 512 bytes. + Drivers SHOULD NOT negotiate VIRTIO_BLK_F_FLUSH if they are incapable of sending VIRTIO_BLK_T_FLUSH commands. @@ -319,6 +331,10 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic \devicenormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization} +Devices SHOULD always offer VIRTIO_BLK_F_BLK_SIZE feature. When this feature is +offered, devices MUST initialize \field{blk_size} to a power of two greater +than or equal to 512. + Devices SHOULD always offer VIRTIO_BLK_F_FLUSH, and MUST offer it if they offer VIRTIO_BLK_F_CONFIG_WCE. -- 2.18.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/1] virtio-blk: Add description for blk_size field 2024-09-25 14:52 [PATCH 1/1] virtio-blk: Add description for blk_size field Max Gurtovoy @ 2024-09-25 20:57 ` Daniel Verkamp [not found] ` <ca954af2-4c61-41f5-8cf0-69a1233704ba@nvidia.com> 0 siblings, 1 reply; 10+ messages in thread From: Daniel Verkamp @ 2024-09-25 20:57 UTC (permalink / raw) To: Max Gurtovoy; +Cc: mst, virtualization, stefanha, virtio-dev, oren, parav On Wed, Sep 25, 2024 at 8:05 AM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > > This field is only valid when the VIRTIO_BLK_F_BLK_SIZE feature bit is > set. > > The blk_size field represents the smallest addressable unit of data that > can be read from or written to the device. It is always a power of two > and typically ranges from 512 bytes to larger values such as 4 KB. I agree that this is what the blk_size field probably *should* have meant (matching other storage protocols like ATA, SCSI, NVMe, ...), but I believe rewriting the spec like this changes the meaning of the blk_size field in an incompatible way. Previously, blk_size was described as the "optimal" sector size, so it seems clear that the driver is still allowed to send requests that are not aligned to blk_size ("awareness of the correct value can affect performance" but no mention of correctness). A device may need to do extra work to support this, of course, such as turning an unaligned write into a read-modify-write sequence if the underlying storage device doesn't support such an access directly, but such requests are still valid. It's also perfectly fine for a driver to ignore device-offered features like VIRTIO_BLK_F_BLK_SIZE and still operate correctly. > Linux/Windows systems typically use 512-byte/4-KB block sizes. While probably true, this doesn't really have anything to do with the change (and the logical block size is a property of the storage medium and controller, not anything to do with the operating system). > This description provides clarity on the constraints of the blk_size > field. > > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> > --- > device-types/blk/description.tex | 20 ++++++++++++++++++-- > 1 file changed, 18 insertions(+), 2 deletions(-) > > diff --git a/device-types/blk/description.tex b/device-types/blk/description.tex > index 2712ada..88a7591 100644 > --- a/device-types/blk/description.tex > +++ b/device-types/blk/description.tex > @@ -3,7 +3,10 @@ \section{Block Device}\label{sec:Device Types / Block Device} > The virtio block device is a simple virtual block device (ie. > disk). Read and write requests (and other exotic requests) are > placed in one of its queues, and serviced (probably out of order) by the > -device except where noted. > +device except where noted. Each block device consists of a sequence of logical > +blocks. A logical block represents the smallest addressable unit of data that > +can be read from or written to the device. The logical block size is always a > +power of two. Logical block sizes may be 512 bytes, 1KB, 2KB, 4KB, 8KB, etc. Does this imply that a device may return an error for a request where the sector or num_sectors fields are not aligned to the blk_size? If so, it should be called out as a more explicit normative requirement, but that would definitely be an incompatible change. The "smallest addressable unit" wording is also more confusing than clarifying here, since in virtio-blk, the "smallest addressable unit" is always 512 bytes, regardless of blk_size (even if a change like this would enforce limits on the sector values that would actually be allowed, it would still be possible to create a request that refers to a smaller unit than blk_size, unlike other storage protocols where logical block size changes the multiplier of the address, making it impossible to address smaller units). > \subsection{Device ID}\label{sec:Device Types / Block Device / Device ID} > 2 > @@ -135,6 +138,9 @@ \subsection{Device configuration layout}\label{sec:Device Types / Block Device / > present. The availability of the others all depend on various feature > bits as indicated above. > > +The field \field{blk_size} exists only if VIRTIO_BLK_F_BLK_SIZE is set. This field > +reports the logical block size of the device, expressed in bytes. Not a new problem with this change (see another example right below), but "set" is unclear - does that mean it is present if the device offered the feature, or only if the driver also accepted it? > The field \field{num_queues} only exists if VIRTIO_BLK_F_MQ is set. This field specifies > the number of queues. > > @@ -228,7 +234,7 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic > \item The device size can be read from \field{capacity}. > > \item If the VIRTIO_BLK_F_BLK_SIZE feature is negotiated, > - \field{blk_size} can be read to determine the optimal sector size > + \field{blk_size} can be read to determine the logical block size > for the driver to use. This does not affect the units used in > the protocol (always 512 bytes), but awareness of the correct > value can affect performance. > @@ -282,6 +288,12 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic > > \drivernormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization} > > +Drivers MUST negotiate VIRTIO_BLK_F_BLK_SIZE if the feature is offered by the > +device. Why is this a MUST? This is also an incompatible change since there was no such requirement before, and drivers are free to ignore features they don't care about. > +If the VIRTIO_BLK_F_BLK_SIZE feature is not offered by the device, then drivers > +MAY assume that the logical block size is 512 bytes. > + > Drivers SHOULD NOT negotiate VIRTIO_BLK_F_FLUSH if they are incapable of > sending VIRTIO_BLK_T_FLUSH commands. > [...] In my opinion, this change should probably be something more like a non-normative comment, along the lines of "blk_size is often the logical block size of the storage medium", without changing the existing meaning from previous spec versions. If there is a desire to introduce a hard requirement for logical-block-aligned I/O requests, that seems like it would have to be a new field and/or feature bit. Thanks, -- Daniel ^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <ca954af2-4c61-41f5-8cf0-69a1233704ba@nvidia.com>]
* Re: [PATCH 1/1] virtio-blk: Add description for blk_size field [not found] ` <ca954af2-4c61-41f5-8cf0-69a1233704ba@nvidia.com> @ 2024-09-26 21:44 ` Daniel Verkamp 2024-09-28 0:22 ` Max Gurtovoy 0 siblings, 1 reply; 10+ messages in thread From: Daniel Verkamp @ 2024-09-26 21:44 UTC (permalink / raw) To: Max Gurtovoy; +Cc: mst, virtualization, stefanha, virtio-dev, oren, parav On Wed, Sep 25, 2024 at 4:42 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > > Hi Daniel, > > On 25/09/2024 23:57, Daniel Verkamp wrote: >> >> On Wed, Sep 25, 2024 at 8:05 AM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: >> >> This field is only valid when the VIRTIO_BLK_F_BLK_SIZE feature bit is >> set. >> >> The blk_size field represents the smallest addressable unit of data that >> can be read from or written to the device. It is always a power of two >> and typically ranges from 512 bytes to larger values such as 4 KB. >> >> I agree that this is what the blk_size field probably *should* have >> meant (matching other storage protocols like ATA, SCSI, NVMe, ...), >> but I believe rewriting the spec like this changes the meaning of the >> blk_size field in an incompatible way. > > I believe that this is more of a clarification than a re-write. > >> Previously, blk_size was described as the "optimal" sector size, so it >> seems clear that the driver is still allowed to send requests that are >> not aligned to blk_size ("awareness of the correct value can affect >> performance" but no mention of correctness). A device may need to do >> extra work to support this, of course, such as turning an unaligned >> write into a read-modify-write sequence if the underlying storage >> device doesn't support such an access directly, but such requests are >> still valid. It's also perfectly fine for a driver to ignore >> device-offered features like VIRTIO_BLK_F_BLK_SIZE and still operate >> correctly. > > A device should report its block size (logical) using the VIRTIO_BLK_F_BLK_SIZE feature. For legacy devices that don't report this feature, Linux assumes a 512-byte block size. Sure, I agree with both points here. > Drivers should not ignore this feature when it's offered by the device. It's crucial for correct operation IMO. This is a "should", not a "must", though - adding a new MUST is a breaking change. > The statement "driver is still allowed to send requests that are not aligned to blk_size" needs explicit clarification in the spec. It definitely would be better to have it written out explicitly, one way or the other, no disagreement there. > From my understanding of the spec, the driver is not allowed to do so (otherwise error will be returned or the behavior is undefined). This doesn't seem to follow from my reading of the spec. Otherwise, phrases like "optimal sector size" and "can affect performance" would not make sense (such a request would instead be described as illegal/returning an error). > In Linux, the reported blk_size is used to set logical_block_size, while physical_block_exp is used to calculate the physical_block_size. Sure, but this is describing the behavior of one driver implementation, not really relevant to the meaning of the spec. > Regarding the specification statement: "This does not affect the units used in the protocol (always 512 bytes)": > > This likely refers to how certain fields in the virtio-blk protocol are expressed, not the actual logical block size used for I/O operations. For example: capacity reporting, max_discard_sectors and Other fields that are expressed in 512 byte units. The way the fields are expressed in the virtio-blk protocol is what is relevant in the virtio spec. Even if a virtio-blk device is sending requests to a physical SCSI/ATA/... device behind the scenes, it's the job of that virtio-blk device implementation to handle the translation and whatever mismatches there may be between the protocols. Particularly, the "sector" value in a virtio_blk_req used for VIRTIO_BLK_T_IN/VIRTIO_BLK_T_OUT is always in 512-byte units, regardless of what blk_size the device reports, and regardless of what underlying physical media may be used. From the virtio spec: "The sector number indicates the offset (multiplied by 512) where the read or write is to occur." (This could probably use clarification that the "offset" is in bytes, but it still seems pretty clear that it does not depend on blk_size.) So a request with sector = 1 always means an offset of 512 bytes from the beginning of the block device, even if the advertised blk_size is 4096. This is different from (e.g.) a SCSI READ where the logical block address is in units of the block length returned by READ CAPACITY, so it is impossible to send a request that is not block size aligned. [...] >>> diff --git a/device-types/blk/description.tex b/device-types/blk/description.tex >>> index 2712ada..88a7591 100644 >>> --- a/device-types/blk/description.tex >>> +++ b/device-types/blk/description.tex >>> @@ -3,7 +3,10 @@ \section{Block Device}\label{sec:Device Types / Block Device} >>> The virtio block device is a simple virtual block device (ie. >>> disk). Read and write requests (and other exotic requests) are >>> placed in one of its queues, and serviced (probably out of order) by the >>> -device except where noted. >>> +device except where noted. Each block device consists of a sequence of logical >>> +blocks. A logical block represents the smallest addressable unit of data that >>> +can be read from or written to the device. The logical block size is always a >>> +power of two. Logical block sizes may be 512 bytes, 1KB, 2KB, 4KB, 8KB, etc. >> >> Does this imply that a device may return an error for a request where >> the sector or num_sectors fields are not aligned to the blk_size? If >> so, it should be called out as a more explicit normative requirement, >> but that would definitely be an incompatible change. > > As mentioned, the logical block size, such as 512 bytes, defines the smallest addressable unit for I/O operations on the storage device. > This has important implications for I/O requests: > 1. Minimum I/O size: You cannot send I/O requests smaller than the logical block size. For example, with a 512-byte logical block size, requests of 128 bytes are not valid. 128-byte requests are not valid, certainly, but that is already covered by the wording in the spec ("in multiples of 512 bytes" in the VIRTIO_BLK_T_IN/VIRTIO_BLK_T_OUT description, and "The length of data MUST be a multiple of 512 bytes for VIRTIO_BLK_T_IN and VIRTIO_BLK_T_OUT requests" in the driver requirements). > 2. I/O alignment: All I/O requests must be aligned to the logical block size. This means the target start address (on the media device) of the I/O must be a multiple of the logical block size (the expressed units in the spec are not relevant). Unless there is a change to the spec (like this proposed patch), I don't think this is required. I suppose a device that can't manage to perform a misaligned I/O could choose to deny such requests and return VIRTIO_BLK_S_IOERR, but this is not explicitly allowed or denied by the current spec. > 3. I/O size granularity: The size of I/O requests must be a multiple of the logical block size. For a 512-byte logical block size, valid I/O sizes would be 512 bytes, 1024 bytes, 1536 bytes, and so on. Same as point 1 above, the only requirement (that I can see, anyway) is that the I/O request has to be a multiple of 512 bytes. >> The "smallest addressable unit" wording is also more confusing than >> clarifying here, since in virtio-blk, the "smallest addressable unit" >> is always 512 bytes, regardless of blk_size (even if a change like >> this would enforce limits on the sector values that would actually be >> allowed, it would still be possible to create a request that refers to >> a smaller unit than blk_size, unlike other storage protocols where >> logical block size changes the multiplier of the address, making it >> impossible to address smaller units). > > Can you please point me to the place in spec that explicitly say that "in virtio-blk, the smallest addressable unit is always 512 bytes, regardless of blk_size" ? The description of struct virtio_blk_req's sector field: "The sector number indicates the offset (multiplied by 512) where the read or write is to occur." Unless we have different definitions of "smallest addressable unit", that seems pretty clear. It is always possible to address a part of the disk in 512-byte multiples (e.g. sectors 1 through 3) even when the blk_size is larger than 512. Maybe those requests won't succeed, but it is always possible to describe a region of the block device with an address that is not a multiple of blk_size. > I didn't find it in the spec and I don't agree with this call if it is implicit. I agree it should be clarified. The current wording requires too much reading between the lines. > The implementation of the Linux kernel driver isn't implemented in this manner as well. When filling the sector field, the Linux virtio-blk driver uses the value of blk_rq_pos(), which is also always in units of 512 bytes, as far as I can tell (it returns sector_t, which is in units of 512 bytes per the definition in include/linux/blk_types.h). [...] >>> @@ -228,7 +234,7 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic >>> \item The device size can be read from \field{capacity}. >>> >>> \item If the VIRTIO_BLK_F_BLK_SIZE feature is negotiated, >>> - \field{blk_size} can be read to determine the optimal sector size >>> + \field{blk_size} can be read to determine the logical block size >>> for the driver to use. This does not affect the units used in >>> the protocol (always 512 bytes), but awareness of the correct >>> value can affect performance. >>> @@ -282,6 +288,12 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic >>> >>> \drivernormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization} >>> >>> +Drivers MUST negotiate VIRTIO_BLK_F_BLK_SIZE if the feature is offered by the >>> +device. >> >> Why is this a MUST? This is also an incompatible change since there >> was no such requirement before, and drivers are free to ignore >> features they don't care about. > > I can say SHOULD, but then the driver that is ignoring this feature may guess/assume the logical block size and it can be wrong. > > This may lead to an undefined behavior. Unless the spec is changed, if a device can't correctly handle non-blk_size-aligned requests, this seems like it would be a device bug ("undefined behavior") or at least non-spec-derived limitation (returning an error would not be undefined, but also probably wouldn't result in a working system). Certainly it would be preferable for the driver to send aligned requests, but the existing wording seems to cover that pretty well already. Thanks, -- Daniel ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/1] virtio-blk: Add description for blk_size field 2024-09-26 21:44 ` Daniel Verkamp @ 2024-09-28 0:22 ` Max Gurtovoy 2024-10-01 1:45 ` Daniel Verkamp 0 siblings, 1 reply; 10+ messages in thread From: Max Gurtovoy @ 2024-09-28 0:22 UTC (permalink / raw) To: Daniel Verkamp; +Cc: mst, virtualization, stefanha, virtio-dev, oren, parav On 27/09/2024 0:44, Daniel Verkamp wrote: > On Wed, Sep 25, 2024 at 4:42 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: >> Hi Daniel, >> >> On 25/09/2024 23:57, Daniel Verkamp wrote: >>> On Wed, Sep 25, 2024 at 8:05 AM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: >>> >>> This field is only valid when the VIRTIO_BLK_F_BLK_SIZE feature bit is >>> set. >>> >>> The blk_size field represents the smallest addressable unit of data that >>> can be read from or written to the device. It is always a power of two >>> and typically ranges from 512 bytes to larger values such as 4 KB. >>> >>> I agree that this is what the blk_size field probably *should* have >>> meant (matching other storage protocols like ATA, SCSI, NVMe, ...), >>> but I believe rewriting the spec like this changes the meaning of the >>> blk_size field in an incompatible way. >> I believe that this is more of a clarification than a re-write. >> >>> Previously, blk_size was described as the "optimal" sector size, so it >>> seems clear that the driver is still allowed to send requests that are >>> not aligned to blk_size ("awareness of the correct value can affect >>> performance" but no mention of correctness). A device may need to do >>> extra work to support this, of course, such as turning an unaligned >>> write into a read-modify-write sequence if the underlying storage >>> device doesn't support such an access directly, but such requests are >>> still valid. It's also perfectly fine for a driver to ignore >>> device-offered features like VIRTIO_BLK_F_BLK_SIZE and still operate >>> correctly. >> A device should report its block size (logical) using the VIRTIO_BLK_F_BLK_SIZE feature. For legacy devices that don't report this feature, Linux assumes a 512-byte block size. > Sure, I agree with both points here. Great. So I guess we agree that blk_size == logical_block_size. > >> Drivers should not ignore this feature when it's offered by the device. It's crucial for correct operation IMO. > This is a "should", not a "must", though - adding a new MUST is a > breaking change. I can change it to 'SHOULD', but this modification has implications: 1. drivers that ignore the VIRTIO_BLK_F_BLK_SIZE feature will make assumptions about the logical block size. 2. Incorrect assumptions could lead to undefined behavior. I'm ok with 'SHOULD', since it's a strong recommendation for drivers to negotiate and respect this feature when available. I'm not really familiar with drivers that ignore this feature so I won't argue here. >> The statement "driver is still allowed to send requests that are not aligned to blk_size" needs explicit clarification in the spec. > It definitely would be better to have it written out explicitly, one > way or the other, no disagreement there. > >> From my understanding of the spec, the driver is not allowed to do so (otherwise error will be returned or the behavior is undefined). > This doesn't seem to follow from my reading of the spec. Otherwise, > phrases like "optimal sector size" and "can affect performance" would > not make sense (such a request would instead be described as > illegal/returning an error). Is it possible that certain parts of the spec could be improved ? Is it possible that certain implicit parts of the spec could be understood differently by various implementers or users? In reality "optimal block size that can effect performance" probably refers to logical vs. physical block size. This patch tries to make the spec clearer. > >> In Linux, the reported blk_size is used to set logical_block_size, while physical_block_exp is used to calculate the physical_block_size. > Sure, but this is describing the behavior of one driver > implementation, not really relevant to the meaning of the spec. It is relevant IMO. It's worth considering the historical context of the specification. To my understanding, the spec was initially derived from the Linux implementation, rather than the other way around. This translation from implementation to specification could have introduced some inconsistencies or ambiguities. Also worth considering virtio-block devices that were implemented before this feature bit. The WG probably tried to maintain backward compatibility with these devices. > >> Regarding the specification statement: "This does not affect the units used in the protocol (always 512 bytes)": >> >> This likely refers to how certain fields in the virtio-blk protocol are expressed, not the actual logical block size used for I/O operations. For example: capacity reporting, max_discard_sectors and Other fields that are expressed in 512 byte units. > The way the fields are expressed in the virtio-blk protocol is what is > relevant in the virtio spec. Even if a virtio-blk device is sending > requests to a physical SCSI/ATA/... device behind the scenes, it's the > job of that virtio-blk device implementation to handle the translation > and whatever mismatches there may be between the protocols. This is true. > > Particularly, the "sector" value in a virtio_blk_req used for > VIRTIO_BLK_T_IN/VIRTIO_BLK_T_OUT is always in 512-byte units, > regardless of what blk_size the device reports, and regardless of what > underlying physical media may be used. > > From the virtio spec: "The sector number indicates the offset > (multiplied by 512) where the read or write is to occur." > > (This could probably use clarification that the "offset" is in bytes, > but it still seems pretty clear that it does not depend on blk_size.) > > So a request with sector = 1 always means an offset of 512 bytes from > the beginning of the block device, even if the advertised blk_size is > 4096. I appreciate your perspective, but I disagree with it. When a block device has a logical_block_size of 4K, it's not possible to perform read/write operations at arbitrary offsets like 512 bytes. The smallest addressable unit is the full 4K logical block in that case. This is a fundamental principle of block device operations, not just an optimization. Also IO granularity. I believe it's important to ensure that the specification accurately reflects these fundamental block device principles. I believe that the reason that the "sector" is multiplication of 512 is to maintain backward compatibility, and not to allow breaking these fundamental principles. > > This is different from (e.g.) a SCSI READ where the logical block > address is in units of the block length returned by READ CAPACITY, so > it is impossible to send a request that is not block size aligned. IMO, it is impossible for virtio block device and any other block device as well. > > [...] >>>> diff --git a/device-types/blk/description.tex b/device-types/blk/description.tex >>>> index 2712ada..88a7591 100644 >>>> --- a/device-types/blk/description.tex >>>> +++ b/device-types/blk/description.tex >>>> @@ -3,7 +3,10 @@ \section{Block Device}\label{sec:Device Types / Block Device} >>>> The virtio block device is a simple virtual block device (ie. >>>> disk). Read and write requests (and other exotic requests) are >>>> placed in one of its queues, and serviced (probably out of order) by the >>>> -device except where noted. >>>> +device except where noted. Each block device consists of a sequence of logical >>>> +blocks. A logical block represents the smallest addressable unit of data that >>>> +can be read from or written to the device. The logical block size is always a >>>> +power of two. Logical block sizes may be 512 bytes, 1KB, 2KB, 4KB, 8KB, etc. >>> Does this imply that a device may return an error for a request where >>> the sector or num_sectors fields are not aligned to the blk_size? If >>> so, it should be called out as a more explicit normative requirement, >>> but that would definitely be an incompatible change. >> As mentioned, the logical block size, such as 512 bytes, defines the smallest addressable unit for I/O operations on the storage device. >> This has important implications for I/O requests: >> 1. Minimum I/O size: You cannot send I/O requests smaller than the logical block size. For example, with a 512-byte logical block size, requests of 128 bytes are not valid. > 128-byte requests are not valid, certainly, but that is already > covered by the wording in the spec ("in multiples of 512 bytes" in the > VIRTIO_BLK_T_IN/VIRTIO_BLK_T_OUT description, and "The length of data > MUST be a multiple of 512 bytes for VIRTIO_BLK_T_IN and > VIRTIO_BLK_T_OUT requests" in the driver requirements). I guess I should have given 4KB logical block size as a better example. In case logical block size is 4KB, the minimum IO size is 4KB and not 512. Agreed ? > >> 2. I/O alignment: All I/O requests must be aligned to the logical block size. This means the target start address (on the media device) of the I/O must be a multiple of the logical block size (the expressed units in the spec are not relevant). > Unless there is a change to the spec (like this proposed patch), I > don't think this is required. I suppose a device that can't manage to > perform a misaligned I/O could choose to deny such requests and return > VIRTIO_BLK_S_IOERR, but this is not explicitly allowed or denied by > the current spec. This is a clarification of the existing specification, not a change. I believe I've identified the main point of disagreement regarding this patch: I believe that I/O operations to storage block devices MUST be aligned to the logical block size. In the other hand, you believe it is just a recommendation for virtio block device. Maybe let's see what others think as well ? Stefan ? >> 3. I/O size granularity: The size of I/O requests must be a multiple of the logical block size. For a 512-byte logical block size, valid I/O sizes would be 512 bytes, 1024 bytes, 1536 bytes, and so on. > Same as point 1 above, the only requirement (that I can see, anyway) > is that the I/O request has to be a multiple of 512 bytes. Do you think that for logical block size of 4k one can send IO of 1K to such device and succeed ? > >>> The "smallest addressable unit" wording is also more confusing than >>> clarifying here, since in virtio-blk, the "smallest addressable unit" >>> is always 512 bytes, regardless of blk_size (even if a change like >>> this would enforce limits on the sector values that would actually be >>> allowed, it would still be possible to create a request that refers to >>> a smaller unit than blk_size, unlike other storage protocols where >>> logical block size changes the multiplier of the address, making it >>> impossible to address smaller units). >> Can you please point me to the place in spec that explicitly say that "in virtio-blk, the smallest addressable unit is always 512 bytes, regardless of blk_size" ? > The description of struct virtio_blk_req's sector field: > "The sector number indicates the offset (multiplied by 512) where the > read or write is to occur." > > Unless we have different definitions of "smallest addressable unit", > that seems pretty clear. It is always possible to address a part of > the disk in 512-byte multiples (e.g. sectors 1 through 3) even when > the blk_size is larger than 512. Maybe those requests won't succeed, > but it is always possible to describe a region of the block device > with an address that is not a multiple of blk_size. It doesn't say anything explicit about IO alignment and granularity. > >> I didn't find it in the spec and I don't agree with this call if it is implicit. > I agree it should be clarified. The current wording requires too much > reading between the lines. > >> The implementation of the Linux kernel driver isn't implemented in this manner as well. > When filling the sector field, the Linux virtio-blk driver uses the > value of blk_rq_pos(), which is also always in units of 512 bytes, as > far as I can tell (it returns sector_t, which is in units of 512 bytes > per the definition in include/linux/blk_types.h). Ignoring the units for the discussion, can it ever be unaligned to the logical block size? > > [...] >>>> @@ -228,7 +234,7 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic >>>> \item The device size can be read from \field{capacity}. >>>> >>>> \item If the VIRTIO_BLK_F_BLK_SIZE feature is negotiated, >>>> - \field{blk_size} can be read to determine the optimal sector size >>>> + \field{blk_size} can be read to determine the logical block size >>>> for the driver to use. This does not affect the units used in >>>> the protocol (always 512 bytes), but awareness of the correct >>>> value can affect performance. >>>> @@ -282,6 +288,12 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic >>>> >>>> \drivernormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization} >>>> >>>> +Drivers MUST negotiate VIRTIO_BLK_F_BLK_SIZE if the feature is offered by the >>>> +device. >>> Why is this a MUST? This is also an incompatible change since there >>> was no such requirement before, and drivers are free to ignore >>> features they don't care about. >> I can say SHOULD, but then the driver that is ignoring this feature may guess/assume the logical block size and it can be wrong. >> >> This may lead to an undefined behavior. > Unless the spec is changed, if a device can't correctly handle > non-blk_size-aligned requests, this seems like it would be a device > bug ("undefined behavior") or at least non-spec-derived limitation > (returning an error would not be undefined, but also probably wouldn't > result in a working system). Again, this is a clarification of the existing specification, not a change. IMO, storage block device implementations should be able to assume that I/O requests from drivers are aligned to the logical block size. If a device receives misaligned I/O requests, it might be appropriate for the device to fail these requests. It is not a device "bug" IMO. I'm curious if you've encountered any actual implementations where drivers send non-block-size-aligned requests to block devices. I never saw such implementation. Sounds like a theoretical objection. IMO, the specification could benefit from explicitly stating this alignment principal, which would guide both device and driver implementations. I believe that this patch is clarifying important points and would lead to more robust definition. > > Certainly it would be preferable for the driver to send aligned > requests, but the existing wording seems to cover that pretty well > already. > > Thanks, > -- Daniel ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/1] virtio-blk: Add description for blk_size field 2024-09-28 0:22 ` Max Gurtovoy @ 2024-10-01 1:45 ` Daniel Verkamp 2024-10-02 13:42 ` Stefan Hajnoczi 2024-10-02 23:56 ` Daniel Verkamp 0 siblings, 2 replies; 10+ messages in thread From: Daniel Verkamp @ 2024-10-01 1:45 UTC (permalink / raw) To: Max Gurtovoy; +Cc: mst, virtualization, stefanha, virtio-dev, oren, parav [side note: I thought I subscribed to the virtio-dev list previously, but the confirmation got lost in the spam filter initially, should be fixed now] On Fri, Sep 27, 2024 at 5:22 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > > > On 27/09/2024 0:44, Daniel Verkamp wrote: > > On Wed, Sep 25, 2024 at 4:42 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > >> Hi Daniel, > >> > >> On 25/09/2024 23:57, Daniel Verkamp wrote: > >>> On Wed, Sep 25, 2024 at 8:05 AM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > >>> > >>> This field is only valid when the VIRTIO_BLK_F_BLK_SIZE feature bit is > >>> set. > >>> > >>> The blk_size field represents the smallest addressable unit of data that > >>> can be read from or written to the device. It is always a power of two > >>> and typically ranges from 512 bytes to larger values such as 4 KB. > >>> > >>> I agree that this is what the blk_size field probably *should* have > >>> meant (matching other storage protocols like ATA, SCSI, NVMe, ...), > >>> but I believe rewriting the spec like this changes the meaning of the > >>> blk_size field in an incompatible way. > >> I believe that this is more of a clarification than a re-write. > >> > >>> Previously, blk_size was described as the "optimal" sector size, so it > >>> seems clear that the driver is still allowed to send requests that are > >>> not aligned to blk_size ("awareness of the correct value can affect > >>> performance" but no mention of correctness). A device may need to do > >>> extra work to support this, of course, such as turning an unaligned > >>> write into a read-modify-write sequence if the underlying storage > >>> device doesn't support such an access directly, but such requests are > >>> still valid. It's also perfectly fine for a driver to ignore > >>> device-offered features like VIRTIO_BLK_F_BLK_SIZE and still operate > >>> correctly. > >> A device should report its block size (logical) using the VIRTIO_BLK_F_BLK_SIZE feature. For legacy devices that don't report this feature, Linux assumes a 512-byte block size. > > Sure, I agree with both points here. > > Great. > > So I guess we agree that blk_size == logical_block_size. Not necessarily (hence "should report" rather than "must report"). The device is free to return any arbitrary size that it feels will improve performance in blk_size (say, a RAID stripe size), even if it's not the same as the logical block size of any backing storage. > >> Drivers should not ignore this feature when it's offered by the device. It's crucial for correct operation IMO. > > This is a "should", not a "must", though - adding a new MUST is a > > breaking change. > > I can change it to 'SHOULD', but this modification has implications: > > 1. drivers that ignore the VIRTIO_BLK_F_BLK_SIZE feature will make > assumptions about the logical block size. > > 2. Incorrect assumptions could lead to undefined behavior. Where does the undefined behavior come from? > I'm ok with 'SHOULD', since it's a strong recommendation for drivers to > negotiate and respect this feature when available. > > I'm not really familiar with drivers that ignore this feature so I won't > argue here. > > >> The statement "driver is still allowed to send requests that are not aligned to blk_size" needs explicit clarification in the spec. > > It definitely would be better to have it written out explicitly, one > > way or the other, no disagreement there. > > > >> From my understanding of the spec, the driver is not allowed to do so (otherwise error will be returned or the behavior is undefined). > > This doesn't seem to follow from my reading of the spec. Otherwise, > > phrases like "optimal sector size" and "can affect performance" would > > not make sense (such a request would instead be described as > > illegal/returning an error). > > Is it possible that certain parts of the spec could be improved ? > > Is it possible that certain implicit parts of the spec could be > understood differently by various implementers or users? > > In reality "optimal block size that can effect performance" probably > refers to logical vs. physical block size. If this is the case, then it would be describing the physical_block_exp and related fields from the VIRTIO_BLK_F_TOPOLOGY feature, but it's referring to blk_size/VIRTIO_BLK_F_BLK_SIZE. Maybe that's a mistake, but that's what the spec currently says (and has said since at least the virtio-0.9.5 draft, which is the oldest one I could dig up). > This patch tries to make the spec clearer. > > > > >> In Linux, the reported blk_size is used to set logical_block_size, while physical_block_exp is used to calculate the physical_block_size. > > Sure, but this is describing the behavior of one driver > > implementation, not really relevant to the meaning of the spec. > > It is relevant IMO. > > It's worth considering the historical context of the specification. To > my understanding, the spec was initially derived from the Linux > implementation, rather than the other way around. The Linux implementation that inspired the spec originally is useful when trying to understand the context, but if a current-day implementor has to go read Linux code to understand how to interpret the spec, then the spec is not clear enough and needs to be improved (I would guess we agree on this). If the spec never matched reality, then the spec is not useful, but I don't think that's the case here. > This translation from implementation to specification could have > introduced some inconsistencies or ambiguities. > > Also worth considering virtio-block devices that were implemented before > this feature bit. The WG probably tried to maintain backward > compatibility with these devices. This feature bit and its description in the spec has existed for a long time (since before virtio 1.0), so it's likely that many more virtio-blk implementations have been written since this part of the spec was added than before it. > >> Regarding the specification statement: "This does not affect the units used in the protocol (always 512 bytes)": > >> > >> This likely refers to how certain fields in the virtio-blk protocol are expressed, not the actual logical block size used for I/O operations. For example: capacity reporting, max_discard_sectors and Other fields that are expressed in 512 byte units. > > The way the fields are expressed in the virtio-blk protocol is what is > > relevant in the virtio spec. Even if a virtio-blk device is sending > > requests to a physical SCSI/ATA/... device behind the scenes, it's the > > job of that virtio-blk device implementation to handle the translation > > and whatever mismatches there may be between the protocols. > > This is true. > > > > > Particularly, the "sector" value in a virtio_blk_req used for > > VIRTIO_BLK_T_IN/VIRTIO_BLK_T_OUT is always in 512-byte units, > > regardless of what blk_size the device reports, and regardless of what > > underlying physical media may be used. > > > > From the virtio spec: "The sector number indicates the offset > > (multiplied by 512) where the read or write is to occur." > > > > (This could probably use clarification that the "offset" is in bytes, > > but it still seems pretty clear that it does not depend on blk_size.) > > > > So a request with sector = 1 always means an offset of 512 bytes from > > the beginning of the block device, even if the advertised blk_size is > > 4096. > > I appreciate your perspective, but I disagree with it. > > When a block device has a logical_block_size of 4K, it's not possible to > perform read/write operations at arbitrary offsets like 512 bytes. The > smallest addressable unit is the full 4K logical block in that case. Maybe we disagree on the meaning of "addressable", then. What I assume when I read "addressable" is "I can write the address" of such a location (regardless of whether it's valid to submit a request using this address). I can always write the address of the area of a virtio-blk device that starts at an offset of 512 bytes, and that address is always sector=1, even if the device advertises blk_size=4096 and rejects unaligned I/O, so 512 is always the smallest addressable unit. This is just like a CPU that has byte-addressable memory but only allows aligned loads/stores - it's possible to write an address like 0x123 and try to load a word from there, so that location is addressable, even though it's not valid when trying to load something larger than a byte from it. I would not object to clarifying the wording, but it can't change the meaning (if the smallest addressable unit was 4096 when blk_size=4096, then sector=1 would mean 4096, but that's not how it works). Rather than using wording like "smallest addressable unit", something like "I/O request `sector` and data length must be a multiple of `blk_size`" would accurately describe what you are trying to say (I think), but this is still a breaking change in my opinion. In theory, such a change could be introduced as a requirement tied to a new feature bit (if the device advertises it and the driver acks it, then I/O requests must be aligned, or something like that), but since it would be a new, optional feature, I don't know how useful it would be. > This is a fundamental principle of block device operations, not just an > optimization. For normal block devices that speak ATA or SCSI or NVMe command sets, yes, I agree - but this is not how virtio-blk was defined. And again, I agree that it was probably a mistake to define virtio-blk in a way that is different from all other normal block device command sets, but that's the way it has been described in the spec since at least 2012. > Also IO granularity. > > I believe it's important to ensure that the specification accurately > reflects these fundamental block device principles. > > I believe that the reason that the "sector" is multiplication of 512 is > to maintain backward compatibility, and not to allow breaking these > fundamental principles. Yes, it is exactly this backward compatibility that I'm concerned with - old drivers (or drivers that ignore blk_size, which they are clearly allowed to do per my reading of the spec) must still work correctly with devices written against new versions of the spec. > > This is different from (e.g.) a SCSI READ where the logical block > > address is in units of the block length returned by READ CAPACITY, so > > it is impossible to send a request that is not block size aligned. > > IMO, it is impossible for virtio block device and any other block device > as well. I don't understand how it would be impossible; inconvenient, sure, but definitely possible. If a device receives a VIRTIO_BLK_T_IN request that is not aligned to its preferred blk_size, it can read the minimum blk_size-aligned region around the request into an intermediate buffer and copy only the requested parts into the virtio buffer. If a device receives a VIRTIO_BLK_T_OUT request that is not aligned to its preferred blk_size, it can read the aligned region into a temporary buffer (as before), copy the data from the virtio buffer on top, and perform the aligned write. If a device implementation can't afford to buffer data (such as a hardware device with minimal on-board memory), then maybe it would be reasonable for that device implementation to choose to return an error for unaligned I/O, but it must not fail in a way that would cause undefined behavior like corrupting parts of the disk. > > [...] > >>>> diff --git a/device-types/blk/description.tex b/device-types/blk/description.tex > >>>> index 2712ada..88a7591 100644 > >>>> --- a/device-types/blk/description.tex > >>>> +++ b/device-types/blk/description.tex > >>>> @@ -3,7 +3,10 @@ \section{Block Device}\label{sec:Device Types / Block Device} > >>>> The virtio block device is a simple virtual block device (ie. > >>>> disk). Read and write requests (and other exotic requests) are > >>>> placed in one of its queues, and serviced (probably out of order) by the > >>>> -device except where noted. > >>>> +device except where noted. Each block device consists of a sequence of logical > >>>> +blocks. A logical block represents the smallest addressable unit of data that > >>>> +can be read from or written to the device. The logical block size is always a > >>>> +power of two. Logical block sizes may be 512 bytes, 1KB, 2KB, 4KB, 8KB, etc. > >>> Does this imply that a device may return an error for a request where > >>> the sector or num_sectors fields are not aligned to the blk_size? If > >>> so, it should be called out as a more explicit normative requirement, > >>> but that would definitely be an incompatible change. > >> As mentioned, the logical block size, such as 512 bytes, defines the smallest addressable unit for I/O operations on the storage device. > >> This has important implications for I/O requests: > >> 1. Minimum I/O size: You cannot send I/O requests smaller than the logical block size. For example, with a 512-byte logical block size, requests of 128 bytes are not valid. > > 128-byte requests are not valid, certainly, but that is already > > covered by the wording in the spec ("in multiples of 512 bytes" in the > > VIRTIO_BLK_T_IN/VIRTIO_BLK_T_OUT description, and "The length of data > > MUST be a multiple of 512 bytes for VIRTIO_BLK_T_IN and > > VIRTIO_BLK_T_OUT requests" in the driver requirements). > > I guess I should have given 4KB logical block size as a better example. > > In case logical block size is 4KB, the minimum IO size is 4KB and not 512. > > Agreed ? In the case of SCSI or whatever, absolutely. virtio-blk doesn't have a concept of logical block size today, only the existing blk_size field, which I argue is not necessarily equivalent. (I have noticed that the virtio_blk_topology fields are defined in terms of "logical blocks", which I feel is a mistake - either these should be described in terms of blk_size, or the blk_size field needs to be redefined to explicitly mean "logical block size" as in your patch.) > >> 2. I/O alignment: All I/O requests must be aligned to the logical block size. This means the target start address (on the media device) of the I/O must be a multiple of the logical block size (the expressed units in the spec are not relevant). > > Unless there is a change to the spec (like this proposed patch), I > > don't think this is required. I suppose a device that can't manage to > > perform a misaligned I/O could choose to deny such requests and return > > VIRTIO_BLK_S_IOERR, but this is not explicitly allowed or denied by > > the current spec. > > This is a clarification of the existing specification, not a change. > > I believe I've identified the main point of disagreement regarding this > patch: > > I believe that I/O operations to storage block devices MUST be aligned > to the logical block size. In the other hand, you believe it is just a > recommendation for virtio block device. > > Maybe let's see what others think as well ? > > Stefan ? I agree that it would be useful to have more input from other interested parties. > >> 3. I/O size granularity: The size of I/O requests must be a multiple of the logical block size. For a 512-byte logical block size, valid I/O sizes would be 512 bytes, 1024 bytes, 1536 bytes, and so on. > > Same as point 1 above, the only requirement (that I can see, anyway) > > is that the I/O request has to be a multiple of 512 bytes. > > Do you think that for logical block size of 4k one can send IO of 1K to > such device and succeed ? For a virtio-blk device, yes (assuming "such a device" is a virtio-blk device implementation and we substitute blk_size for "logical block size"). > >>> The "smallest addressable unit" wording is also more confusing than > >>> clarifying here, since in virtio-blk, the "smallest addressable unit" > >>> is always 512 bytes, regardless of blk_size (even if a change like > >>> this would enforce limits on the sector values that would actually be > >>> allowed, it would still be possible to create a request that refers to > >>> a smaller unit than blk_size, unlike other storage protocols where > >>> logical block size changes the multiplier of the address, making it > >>> impossible to address smaller units). > >> Can you please point me to the place in spec that explicitly say that "in virtio-blk, the smallest addressable unit is always 512 bytes, regardless of blk_size" ? > > The description of struct virtio_blk_req's sector field: > > "The sector number indicates the offset (multiplied by 512) where the > > read or write is to occur." > > > > Unless we have different definitions of "smallest addressable unit", > > that seems pretty clear. It is always possible to address a part of > > the disk in 512-byte multiples (e.g. sectors 1 through 3) even when > > the blk_size is larger than 512. Maybe those requests won't succeed, > > but it is always possible to describe a region of the block device > > with an address that is not a multiple of blk_size. > > It doesn't say anything explicit about IO alignment and granularity. > > > > > >> I didn't find it in the spec and I don't agree with this call if it is implicit. > > I agree it should be clarified. The current wording requires too much > > reading between the lines. > > > >> The implementation of the Linux kernel driver isn't implemented in this manner as well. > > When filling the sector field, the Linux virtio-blk driver uses the > > value of blk_rq_pos(), which is also always in units of 512 bytes, as > > far as I can tell (it returns sector_t, which is in units of 512 bytes > > per the definition in include/linux/blk_types.h). > > Ignoring the units for the discussion, can it ever be unaligned to the > logical block size? The Linux driver will always request I/O that is aligned to the blk_size (if the device reports one), yes - but that's one driver's choice. Other drivers (which were written against the spec, not "whatever the Linux driver does") may operate differently. > > [...] > >>>> @@ -228,7 +234,7 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic > >>>> \item The device size can be read from \field{capacity}. > >>>> > >>>> \item If the VIRTIO_BLK_F_BLK_SIZE feature is negotiated, > >>>> - \field{blk_size} can be read to determine the optimal sector size > >>>> + \field{blk_size} can be read to determine the logical block size > >>>> for the driver to use. This does not affect the units used in > >>>> the protocol (always 512 bytes), but awareness of the correct > >>>> value can affect performance. > >>>> @@ -282,6 +288,12 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic > >>>> > >>>> \drivernormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization} > >>>> > >>>> +Drivers MUST negotiate VIRTIO_BLK_F_BLK_SIZE if the feature is offered by the > >>>> +device. > >>> Why is this a MUST? This is also an incompatible change since there > >>> was no such requirement before, and drivers are free to ignore > >>> features they don't care about. > >> I can say SHOULD, but then the driver that is ignoring this feature may guess/assume the logical block size and it can be wrong. > >> > >> This may lead to an undefined behavior. > > Unless the spec is changed, if a device can't correctly handle > > non-blk_size-aligned requests, this seems like it would be a device > > bug ("undefined behavior") or at least non-spec-derived limitation > > (returning an error would not be undefined, but also probably wouldn't > > result in a working system). > > Again, this is a clarification of the existing specification, not a change. > > IMO, storage block device implementations should be able to assume that > I/O requests from drivers are aligned to the logical block size. If a > device receives misaligned I/O requests, it might be appropriate for the > device to fail these requests. It is not a device "bug" IMO. > > I'm curious if you've encountered any actual implementations where > drivers send non-block-size-aligned requests to block devices. I never > saw such implementation. I don't know of any real-world implementations like this, but the point of the spec is to allow interoperability without having to know about every implementation. > Sounds like a theoretical objection. > > IMO, the specification could benefit from explicitly stating this > alignment principal, which would guide both device and driver > implementations. > > I believe that this patch is clarifying important points and would lead > to more robust definition. From my point of view, it would be fine to clarify a few things: - blk_size should (not must) be the logical block size of the underlying storage device - data should (not must) be a multiple of blk_size for best performance And maybe: - devices may choose to return IOERR if a driver submits an I/O request that does not conform to the above recommendations (but this conflicts with the "performance"-related wording that exists now) > > Certainly it would be preferable for the driver to send aligned > > requests, but the existing wording seems to cover that pretty well > > already. > > > > Thanks, > > -- Daniel ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/1] virtio-blk: Add description for blk_size field 2024-10-01 1:45 ` Daniel Verkamp @ 2024-10-02 13:42 ` Stefan Hajnoczi 2024-10-02 23:56 ` Daniel Verkamp 1 sibling, 0 replies; 10+ messages in thread From: Stefan Hajnoczi @ 2024-10-02 13:42 UTC (permalink / raw) To: Daniel Verkamp; +Cc: Max Gurtovoy, mst, virtualization, virtio-dev, oren, parav [-- Attachment #1: Type: text/plain, Size: 1119 bytes --] On Mon, Sep 30, 2024 at 06:45:23PM -0700, Daniel Verkamp wrote: > From my point of view, it would be fine to clarify a few things: > - blk_size should (not must) be the logical block size of the > underlying storage device > - data should (not must) be a multiple of blk_size for best performance > > And maybe: > - devices may choose to return IOERR if a driver submits an I/O > request that does not conform to the above recommendations (but this > conflicts with the "performance"-related wording that exists now) QEMU's virtio-blk implementation returns IOERR when the driver submits VIRTIO_BLK_T_IN/VIRTIO_BLK_T_OUT requests that are not aligned to the logical block size: https://gitlab.com/qemu-project/qemu/-/blob/master/hw/block/virtio-blk.c#L367 Although I interpret the virtio-blk spec in the same way as you (blk_size is just a hint for optimal performance), I guess in practice drivers align requests to blk_size. Adding a note that devices may return IOERR is worthwhile. It will tell driver authors not to expect device implementations to accept misaligned requests. Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/1] virtio-blk: Add description for blk_size field 2024-10-01 1:45 ` Daniel Verkamp 2024-10-02 13:42 ` Stefan Hajnoczi @ 2024-10-02 23:56 ` Daniel Verkamp 2024-10-06 11:35 ` Max Gurtovoy 2024-10-08 20:09 ` Stefan Hajnoczi 1 sibling, 2 replies; 10+ messages in thread From: Daniel Verkamp @ 2024-10-02 23:56 UTC (permalink / raw) To: Max Gurtovoy; +Cc: mst, virtualization, stefanha, virtio-dev, oren, parav On Mon, Sep 30, 2024 at 6:45 PM Daniel Verkamp <dverkamp@chromium.org> wrote: > On Wed, Sep 25, 2024 at 4:42 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: [...] > > I'm curious if you've encountered any actual implementations where > > drivers send non-block-size-aligned requests to block devices. I never > > saw such implementation. > > I don't know of any real-world implementations like this, but the > point of the spec is to allow interoperability without having to know > about every implementation. I did find at least one existing real-world driver implementation that ignores blk_size and VIRTIO_BLK_F_BLK_SIZE - the u-boot virtio-blk driver: https://github.com/u-boot/u-boot/blob/master/drivers/virtio/virtio_blk.c So this is not just spec language lawyering; this driver depends on the ability to read/write at arbitrary 512-byte sector addresses and data sizes, and it seems like it should be allowed to do that based on my reading of the spec. (Perhaps u-boot's driver does not work with the QEMU's virtio-blk device with certain storage configurations, per Stefan's note, but that is again just one device implementation.) -- Daniel ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/1] virtio-blk: Add description for blk_size field 2024-10-02 23:56 ` Daniel Verkamp @ 2024-10-06 11:35 ` Max Gurtovoy 2024-10-08 1:58 ` Daniel Verkamp 2024-10-08 20:09 ` Stefan Hajnoczi 1 sibling, 1 reply; 10+ messages in thread From: Max Gurtovoy @ 2024-10-06 11:35 UTC (permalink / raw) To: Daniel Verkamp; +Cc: mst, virtualization, stefanha, virtio-dev, oren, parav On 03/10/2024 2:56, Daniel Verkamp wrote: > On Mon, Sep 30, 2024 at 6:45 PM Daniel Verkamp <dverkamp@chromium.org> wrote: >> On Wed, Sep 25, 2024 at 4:42 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > [...] >>> I'm curious if you've encountered any actual implementations where >>> drivers send non-block-size-aligned requests to block devices. I never >>> saw such implementation. >> I don't know of any real-world implementations like this, but the >> point of the spec is to allow interoperability without having to know >> about every implementation. > I did find at least one existing real-world driver implementation that > ignores blk_size and VIRTIO_BLK_F_BLK_SIZE - the u-boot virtio-blk > driver: https://github.com/u-boot/u-boot/blob/master/drivers/virtio/virtio_blk.c > > So this is not just spec language lawyering; this driver depends on > the ability to read/write at arbitrary 512-byte sector addresses and > data sizes, and it seems like it should be allowed to do that based on > my reading of the spec. > > (Perhaps u-boot's driver does not work with the QEMU's virtio-blk > device with certain storage configurations, per Stefan's note, but > that is again just one device implementation.) Vague specifications and inconsistent interpretations should be avoided. I'm trying to make the situation better. I'll send a V2 according to the comments. BTW, Are you familiar with some virtio-blk device that implements the buffer-cache logic you've described ? or is it only a theoretical discussion ? > > -- Daniel ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/1] virtio-blk: Add description for blk_size field 2024-10-06 11:35 ` Max Gurtovoy @ 2024-10-08 1:58 ` Daniel Verkamp 0 siblings, 0 replies; 10+ messages in thread From: Daniel Verkamp @ 2024-10-08 1:58 UTC (permalink / raw) To: Max Gurtovoy; +Cc: mst, virtualization, stefanha, virtio-dev, oren, parav On Sun, Oct 6, 2024 at 4:36 AM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > > > On 03/10/2024 2:56, Daniel Verkamp wrote: > > On Mon, Sep 30, 2024 at 6:45 PM Daniel Verkamp <dverkamp@chromium.org> wrote: > >> On Wed, Sep 25, 2024 at 4:42 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote: > > [...] > >>> I'm curious if you've encountered any actual implementations where > >>> drivers send non-block-size-aligned requests to block devices. I never > >>> saw such implementation. > >> I don't know of any real-world implementations like this, but the > >> point of the spec is to allow interoperability without having to know > >> about every implementation. > > I did find at least one existing real-world driver implementation that > > ignores blk_size and VIRTIO_BLK_F_BLK_SIZE - the u-boot virtio-blk > > driver: https://github.com/u-boot/u-boot/blob/master/drivers/virtio/virtio_blk.c > > > > So this is not just spec language lawyering; this driver depends on > > the ability to read/write at arbitrary 512-byte sector addresses and > > data sizes, and it seems like it should be allowed to do that based on > > my reading of the spec. > > > > (Perhaps u-boot's driver does not work with the QEMU's virtio-blk > > device with certain storage configurations, per Stefan's note, but > > that is again just one device implementation.) > > Vague specifications and inconsistent interpretations should be avoided. > > I'm trying to make the situation better. Thanks, I do want to improve the situation as well, and hopefully we can reach that while keeping compatibility. > I'll send a V2 according to the comments. > > BTW, Are you familiar with some virtio-blk device that implements the > buffer-cache logic you've described ? or is it only a theoretical > discussion ? The most straightforward (software) implementation of a virtio-blk device on a Linux host system would be a raw disk image stored as a regular file on a filesystem, opened in the normal (non-O_DIRECT) way, with the device implementation translating VIRTIO_BLK_T_IN/OUT requests into [p]read/write syscalls; arguably, the host operating system's fs/mm code would be handling the buffering in this case, so it's not really part of the block device implementation per se, but I would still count it. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/1] virtio-blk: Add description for blk_size field 2024-10-02 23:56 ` Daniel Verkamp 2024-10-06 11:35 ` Max Gurtovoy @ 2024-10-08 20:09 ` Stefan Hajnoczi 1 sibling, 0 replies; 10+ messages in thread From: Stefan Hajnoczi @ 2024-10-08 20:09 UTC (permalink / raw) To: Daniel Verkamp; +Cc: Max Gurtovoy, mst, virtualization, virtio-dev, oren, parav [-- Attachment #1: Type: text/plain, Size: 509 bytes --] On Wed, Oct 02, 2024 at 04:56:10PM -0700, Daniel Verkamp wrote: > (Perhaps u-boot's driver does not work with the QEMU's virtio-blk > device with certain storage configurations, per Stefan's note, but > that is again just one device implementation.) Yes, IOERR only happens under certain conditions: when --device virtio-blk-pci,logical_block_size=<size> is configured. When that's not the case, QEMU buffers as necessary to adjust for guest block size vs host block size differences without errors. Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2024-10-08 20:09 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-25 14:52 [PATCH 1/1] virtio-blk: Add description for blk_size field Max Gurtovoy
2024-09-25 20:57 ` Daniel Verkamp
[not found] ` <ca954af2-4c61-41f5-8cf0-69a1233704ba@nvidia.com>
2024-09-26 21:44 ` Daniel Verkamp
2024-09-28 0:22 ` Max Gurtovoy
2024-10-01 1:45 ` Daniel Verkamp
2024-10-02 13:42 ` Stefan Hajnoczi
2024-10-02 23:56 ` Daniel Verkamp
2024-10-06 11:35 ` Max Gurtovoy
2024-10-08 1:58 ` Daniel Verkamp
2024-10-08 20:09 ` Stefan Hajnoczi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).