All of lore.kernel.org
 help / color / mirror / Atom feed
* Block protocol incompatibilities with 4K logical sector size disks
@ 2024-08-29 10:59 Roger Pau Monné
  2024-08-29 13:15 ` Anthony PERARD
  0 siblings, 1 reply; 12+ messages in thread
From: Roger Pau Monné @ 2024-08-29 10:59 UTC (permalink / raw)
  To: xen-devel
  Cc: Paul Durrant, Owen Smith, Mark Syms, Anthony Perard,
	Stefano Stabellini, Juergen Gross

Hello,

To give some context, this started as a bug report against FreeBSD failing to
work with PV blkif attached disks with 4K logical sectors when the backend is
Linux kernel blkback:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280884

Further investigation has lead me to discover that the protocol described in
the public blkif.h header is not implemented uniformly, and there are major
inconsistencies between implementations regarding the meaning of the `sectors`
and `sector-size` xenstore nodes, and the sector_number and {first,last}_sect
struct request fields.  Below is a summary of the findings on the
implementation I've analyzed.

Linux blk{front,back} always assumes the `sectors` xenstore node to be in 512b
units, regardless of the value of the `sector-size` node.  Equally the ring
request sector_number and the segments {first,last}_sect fields are always
assumed to be in units of 512b regardless of the value of `sector-size`.  The
`feature-large-sector-size` node is neither exposed by blkfront, neither
checked by blkback before exposing a `sector-size` node different than 512b.

FreeBSD blk{front,back} calculates (and for blkback exposes) the disk size as
`sectors` * `sector-size` based on the values in the xenstore nodes (as
described in blkif.h).  The ring sector_number is filled with the sector number
based on the `sector-size` value, however the {first,last}_sect fields are
always calculated as 512b units.   The `feature-large-sector-size` node is
neither exposed by blkfront, neither checked by blkback before exposing a
`sector-size` node different than 512b.

QEMU qdisk blkback implementation exposes the `sectors` disk size in units of
`sector-size` (as FreeBSD blkback).  The ring structure fields sector_number
and {first,last}_sect are assumed to be in units of `sector-size`.  This
implementation will not expose a `sector-size` node with a value different than
512 unless the frontend xenstore path has the `feature-large-sector-size` node
present.

Windows blkfront calculates the disk size as `sectors` * `sector-size` from the
xenstore nodes exposed by blkback.   The ring structure fields sector_number
and {first,last}_sect are assumed to be in units of `sector-size`.  This
frontend implementation exposes `feature-large-sector-size`.

When using a disk with a logical sector size different than 512b, Linux is only
compatible with itself, same for FreeBSD.  QEMU blkback implementation is also
only compatible with the Windows blkfront implementation.  The
`feature-large-sector-size` seems to only be implemented for the QEMU/Windows
combination, both Linux and FreeBSD don't implement any support for it neither
in the backend or the frontend.

The following table attempts to summarize in which units the following fields
are defined for the analyzed implementations (please correct me if I got some
of this wrong):

                        │ sectors xenbus node │ requests sector_number │ requests {first,last}_sect
────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
FreeBSD blk{front,back} │     sector-size     │      sector-size       │           512
────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
Linux blk{front,back}   │         512         │          512           │           512
────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
QEMU blkback            │     sector-size     │      sector-size       │       sector-size
────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
Windows blkfront        │     sector-size     │      sector-size       │       sector-size
────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
MiniOS                  │     sector-size     │          512           │           512
────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
tapdisk blkback         │         512         │      sector-size       │           512

It's all a mess, I'm surprised we didn't get more reports about brokenness when
using disks with 4K logical sectors.

Overall I think the in-kernel backends are more difficult to update (as it
might require a kernel rebuild), compared to QEMU or blktap.  Hence my slight
preference would be to adjust the public interface to match the behavior of
Linux blkback, and then adjust the implementation in the rest of the backends
and frontends.

There was an attempt in 2019 to introduce a new frontend feature flag to signal
whether the frontend supported `sector-size` xenstore nodes different than 512 [0].
However that was only ever implemented for QEMU blkback and Windows blkfront,
all the other backends will expose `sector-size` different than 512 without
checking if `feature-large-sector-size` is exposed by the frontend.  I'm afraid
it's now too late to retrofit that feature into existing backends, seeing as
they already expose `sector-size` nodes greater than 512 without checking if
`feature-large-sector-size` is reported by the frontend.

My proposal would be to adjust the public interface with:

 * Disk size is calculated as: `sectors` * 512 (`sectors` being the contents of
   such xenstore backend node).

 * All the sector related fields in blkif ring requests use a 512b base sector
   size, regardless of the value in the `sector-size` xenstore node.

 * The `sector-size` contains the disk logical sector size.  The frontend must
   ensure that all request segments addresses are aligned and it's length is
   a multiple of such size.  Otherwise the backend will refuse to process the
   request.

Regards, Roger.

[0] https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=67e1c050e36b2c9900cca83618e56189effbad98


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Block protocol incompatibilities with 4K logical sector size disks
  2024-08-29 10:59 Block protocol incompatibilities with 4K logical sector size disks Roger Pau Monné
@ 2024-08-29 13:15 ` Anthony PERARD
  2024-08-29 15:42   ` Roger Pau Monné
  0 siblings, 1 reply; 12+ messages in thread
From: Anthony PERARD @ 2024-08-29 13:15 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Paul Durrant, Owen Smith, Mark Syms,
	Stefano Stabellini, Juergen Gross

On Thu, Aug 29, 2024 at 12:59:43PM +0200, Roger Pau Monné wrote:
> Hello,
>
> To give some context, this started as a bug report against FreeBSD failing to
> work with PV blkif attached disks with 4K logical sectors when the backend is
> Linux kernel blkback:
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280884
>
> Further investigation has lead me to discover that the protocol described in
> the public blkif.h header is not implemented uniformly, and there are major
> inconsistencies between implementations regarding the meaning of the `sectors`
> and `sector-size` xenstore nodes, and the sector_number and {first,last}_sect
> struct request fields.  Below is a summary of the findings on the
> implementation I've analyzed.
>
> Linux blk{front,back} always assumes the `sectors` xenstore node to be in 512b
> units, regardless of the value of the `sector-size` node.  Equally the ring
> request sector_number and the segments {first,last}_sect fields are always
> assumed to be in units of 512b regardless of the value of `sector-size`.  The
> `feature-large-sector-size` node is neither exposed by blkfront, neither
> checked by blkback before exposing a `sector-size` node different than 512b.
>
> FreeBSD blk{front,back} calculates (and for blkback exposes) the disk size as
> `sectors` * `sector-size` based on the values in the xenstore nodes (as
> described in blkif.h).  The ring sector_number is filled with the sector number
> based on the `sector-size` value, however the {first,last}_sect fields are
> always calculated as 512b units.   The `feature-large-sector-size` node is
> neither exposed by blkfront, neither checked by blkback before exposing a
> `sector-size` node different than 512b.
>
> QEMU qdisk blkback implementation exposes the `sectors` disk size in units of
> `sector-size` (as FreeBSD blkback).  The ring structure fields sector_number
> and {first,last}_sect are assumed to be in units of `sector-size`.  This
> implementation will not expose a `sector-size` node with a value different than
> 512 unless the frontend xenstore path has the `feature-large-sector-size` node
> present.
>
> Windows blkfront calculates the disk size as `sectors` * `sector-size` from the
> xenstore nodes exposed by blkback.   The ring structure fields sector_number
> and {first,last}_sect are assumed to be in units of `sector-size`.  This
> frontend implementation exposes `feature-large-sector-size`.
>
> When using a disk with a logical sector size different than 512b, Linux is only
> compatible with itself, same for FreeBSD.  QEMU blkback implementation is also
> only compatible with the Windows blkfront implementation.  The
> `feature-large-sector-size` seems to only be implemented for the QEMU/Windows
> combination, both Linux and FreeBSD don't implement any support for it neither
> in the backend or the frontend.
>
> The following table attempts to summarize in which units the following fields
> are defined for the analyzed implementations (please correct me if I got some
> of this wrong):
>
>                         │ sectors xenbus node │ requests sector_number │ requests {first,last}_sect
> ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> FreeBSD blk{front,back} │     sector-size     │      sector-size       │           512
> ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> Linux blk{front,back}   │         512         │          512           │           512
> ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> QEMU blkback            │     sector-size     │      sector-size       │       sector-size
> ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> Windows blkfront        │     sector-size     │      sector-size       │       sector-size
> ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> MiniOS                  │     sector-size     │          512           │           512
> ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> tapdisk blkback         │         512         │      sector-size       │           512

There's OVMF as well, which copied MiniOS's implementation, and looks
like it's still the same as MiniOS for the table above:

OVMF (base on MiniOS)   │     sector-size     │          512           │           512

>
> It's all a mess, I'm surprised we didn't get more reports about brokenness when
> using disks with 4K logical sectors.
>
> Overall I think the in-kernel backends are more difficult to update (as it
> might require a kernel rebuild), compared to QEMU or blktap.  Hence my slight
> preference would be to adjust the public interface to match the behavior of
> Linux blkback, and then adjust the implementation in the rest of the backends
> and frontends.

I would add that making "sector-size" been different from 512 illegal
makes going forward easier, has every implementation will work with a
"sector-size" of 512, and it probably going to be the most common sector
size for a while longer.

> There was an attempt in 2019 to introduce a new frontend feature flag to signal
> whether the frontend supported `sector-size` xenstore nodes different than 512 [0].
> However that was only ever implemented for QEMU blkback and Windows blkfront,
> all the other backends will expose `sector-size` different than 512 without
> checking if `feature-large-sector-size` is exposed by the frontend.  I'm afraid
> it's now too late to retrofit that feature into existing backends, seeing as
> they already expose `sector-size` nodes greater than 512 without checking if
> `feature-large-sector-size` is reported by the frontend.

Much before that, "physical-sector-size" was introduced (2013):
    https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=a67e2dac9f8339681b30b0f89274a64e691ea139

Linux seems to implement it, but QEMU or OVMF don't have it.

> My proposal would be to adjust the public interface with:
>
>  * Disk size is calculated as: `sectors` * 512 (`sectors` being the contents of
>    such xenstore backend node).
>
>  * All the sector related fields in blkif ring requests use a 512b base sector
>    size, regardless of the value in the `sector-size` xenstore node.
>
>  * The `sector-size` contains the disk logical sector size.  The frontend must
>    ensure that all request segments addresses are aligned and it's length is
>    a multiple of such size.  Otherwise the backend will refuse to process the
>    request.

You still want to try to have a "sector-size" different from 512? To me
this just add confusion to the confusion. There would be no way fro
backend or frontend to know if setting something other than 512 is going
to work. Also, it is probably easier to update backend than frontend, so
it is just likely that something is going to lag behind and broke.

Why not make use of the node "physical-sector-size" that have existed
for 10 years, even if unused or unadvertised, and if an IO request isn't
aligned on it, it is just going to be slow (as backend would have to
read,update,write instead of just write sectors).

Cheers,

--

Anthony Perard | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Block protocol incompatibilities with 4K logical sector size disks
  2024-08-29 13:15 ` Anthony PERARD
@ 2024-08-29 15:42   ` Roger Pau Monné
  2024-08-30 16:09     ` Anthony PERARD
  0 siblings, 1 reply; 12+ messages in thread
From: Roger Pau Monné @ 2024-08-29 15:42 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: xen-devel, Paul Durrant, Owen Smith, Mark Syms,
	Stefano Stabellini, Juergen Gross

On Thu, Aug 29, 2024 at 01:15:42PM +0000, Anthony PERARD wrote:
> On Thu, Aug 29, 2024 at 12:59:43PM +0200, Roger Pau Monné wrote:
> > Hello,
> >
> > To give some context, this started as a bug report against FreeBSD failing to
> > work with PV blkif attached disks with 4K logical sectors when the backend is
> > Linux kernel blkback:
> >
> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280884
> >
> > Further investigation has lead me to discover that the protocol described in
> > the public blkif.h header is not implemented uniformly, and there are major
> > inconsistencies between implementations regarding the meaning of the `sectors`
> > and `sector-size` xenstore nodes, and the sector_number and {first,last}_sect
> > struct request fields.  Below is a summary of the findings on the
> > implementation I've analyzed.
> >
> > Linux blk{front,back} always assumes the `sectors` xenstore node to be in 512b
> > units, regardless of the value of the `sector-size` node.  Equally the ring
> > request sector_number and the segments {first,last}_sect fields are always
> > assumed to be in units of 512b regardless of the value of `sector-size`.  The
> > `feature-large-sector-size` node is neither exposed by blkfront, neither
> > checked by blkback before exposing a `sector-size` node different than 512b.
> >
> > FreeBSD blk{front,back} calculates (and for blkback exposes) the disk size as
> > `sectors` * `sector-size` based on the values in the xenstore nodes (as
> > described in blkif.h).  The ring sector_number is filled with the sector number
> > based on the `sector-size` value, however the {first,last}_sect fields are
> > always calculated as 512b units.   The `feature-large-sector-size` node is
> > neither exposed by blkfront, neither checked by blkback before exposing a
> > `sector-size` node different than 512b.
> >
> > QEMU qdisk blkback implementation exposes the `sectors` disk size in units of
> > `sector-size` (as FreeBSD blkback).  The ring structure fields sector_number
> > and {first,last}_sect are assumed to be in units of `sector-size`.  This
> > implementation will not expose a `sector-size` node with a value different than
> > 512 unless the frontend xenstore path has the `feature-large-sector-size` node
> > present.
> >
> > Windows blkfront calculates the disk size as `sectors` * `sector-size` from the
> > xenstore nodes exposed by blkback.   The ring structure fields sector_number
> > and {first,last}_sect are assumed to be in units of `sector-size`.  This
> > frontend implementation exposes `feature-large-sector-size`.
> >
> > When using a disk with a logical sector size different than 512b, Linux is only
> > compatible with itself, same for FreeBSD.  QEMU blkback implementation is also
> > only compatible with the Windows blkfront implementation.  The
> > `feature-large-sector-size` seems to only be implemented for the QEMU/Windows
> > combination, both Linux and FreeBSD don't implement any support for it neither
> > in the backend or the frontend.
> >
> > The following table attempts to summarize in which units the following fields
> > are defined for the analyzed implementations (please correct me if I got some
> > of this wrong):
> >
> >                         │ sectors xenbus node │ requests sector_number │ requests {first,last}_sect
> > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > FreeBSD blk{front,back} │     sector-size     │      sector-size       │           512
> > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > Linux blk{front,back}   │         512         │          512           │           512
> > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > QEMU blkback            │     sector-size     │      sector-size       │       sector-size
> > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > Windows blkfront        │     sector-size     │      sector-size       │       sector-size
> > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > MiniOS                  │     sector-size     │          512           │           512
> > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > tapdisk blkback         │         512         │      sector-size       │           512
> 
> There's OVMF as well, which copied MiniOS's implementation, and looks
> like it's still the same as MiniOS for the table above:
> 
> OVMF (base on MiniOS)   │     sector-size     │          512           │           512
> 
> >
> > It's all a mess, I'm surprised we didn't get more reports about brokenness when
> > using disks with 4K logical sectors.
> >
> > Overall I think the in-kernel backends are more difficult to update (as it
> > might require a kernel rebuild), compared to QEMU or blktap.  Hence my slight
> > preference would be to adjust the public interface to match the behavior of
> > Linux blkback, and then adjust the implementation in the rest of the backends
> > and frontends.
> 
> I would add that making "sector-size" been different from 512 illegal
> makes going forward easier, has every implementation will work with a
> "sector-size" of 512, and it probably going to be the most common sector
> size for a while longer.

My main concern is the amount of backends out there that already
expose a "sector-size" different than 512.  I fear any changes here
will take time to propagate to in-kernel backends, and hence my
approach was to avoid modifying Linux blkback, because (as seen in the
FreeBSD bug report) there are already instances of 4K logical sector
disks being used by users.  Modifying the frontends is likely easier,
as that's under the owner of the VM control.

> > There was an attempt in 2019 to introduce a new frontend feature flag to signal
> > whether the frontend supported `sector-size` xenstore nodes different than 512 [0].
> > However that was only ever implemented for QEMU blkback and Windows blkfront,
> > all the other backends will expose `sector-size` different than 512 without
> > checking if `feature-large-sector-size` is exposed by the frontend.  I'm afraid
> > it's now too late to retrofit that feature into existing backends, seeing as
> > they already expose `sector-size` nodes greater than 512 without checking if
> > `feature-large-sector-size` is reported by the frontend.
> 
> Much before that, "physical-sector-size" was introduced (2013):
>     https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=a67e2dac9f8339681b30b0f89274a64e691ea139
> 
> Linux seems to implement it, but QEMU or OVMF don't have it.

Yeah, I was aware of this, normal disks already have a physical sector
size (optimal sector size) and a logical sector size (minimal size
supported by the drive).  Some implement a smaller logical than
physical sector size by doing read-modify-write.

> > My proposal would be to adjust the public interface with:
> >
> >  * Disk size is calculated as: `sectors` * 512 (`sectors` being the contents of
> >    such xenstore backend node).
> >
> >  * All the sector related fields in blkif ring requests use a 512b base sector
> >    size, regardless of the value in the `sector-size` xenstore node.
> >
> >  * The `sector-size` contains the disk logical sector size.  The frontend must
> >    ensure that all request segments addresses are aligned and it's length is
> >    a multiple of such size.  Otherwise the backend will refuse to process the
> >    request.
> 
> You still want to try to have a "sector-size" different from 512? To me
> this just add confusion to the confusion. There would be no way fro
> backend or frontend to know if setting something other than 512 is going
> to work.

But that's already the case, most (all?) backends except QEMU will set
"sector-size" to the underlying block storage logical sector size
without any way to tell if the frontend supports sector-sizes != 512.
So the issue is not inherently with the setting of the "sector-size"
node to a value different than 512, but rather how different
implementations have diverged regarding which is the base unit of
other fields.

> Also, it is probably easier to update backend than frontend, so
> it is just likely that something is going to lag behind and broke.

Hm, I'm not convinced, sometimes the owner of a VM has no control over
the version of the backends if it's not the admin of the host.  OTOH
the owner of a VM could always update the kernel in order to
workaround such blkfront/blkback incompatibility issues.  Hence my
preference was for solutions that didn't involve changing Linux
blkback, as I believe that's the most commonly used backend.

> Why not make use of the node "physical-sector-size" that have existed
> for 10 years, even if unused or unadvertised, and if an IO request isn't
> aligned on it, it is just going to be slow (as backend would have to
> read,update,write instead of just write sectors).

I don't really fancy implementing read-modify-write on the backends,
as it's going to add more complexity to blkback implementations,
specially the in-kernel ones I would assume.

All frontends I've looked into support "sector-size" != 512, but
there's a lack of uniformity on whether other units used in the
protocol are based on the blkback exposed "sector-size", or hardcoded
to 512.

So your suggestion would be to hardcode "sector-size" to 512 and use
the "physical-sector-size" node value to set the block device logical
sector size the frontends?

If we go that route I would suggest that backends are free to refuse
requests that aren't a multiple of "physical-sector-size".

Thanks, Roger.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Block protocol incompatibilities with 4K logical sector size disks
  2024-08-29 15:42   ` Roger Pau Monné
@ 2024-08-30 16:09     ` Anthony PERARD
  2024-09-02  8:55       ` Roger Pau Monné
  0 siblings, 1 reply; 12+ messages in thread
From: Anthony PERARD @ 2024-08-30 16:09 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Paul Durrant, Owen Smith, Mark Syms,
	Stefano Stabellini, Juergen Gross

On Thu, Aug 29, 2024 at 05:42:45PM +0200, Roger Pau Monné wrote:
> On Thu, Aug 29, 2024 at 01:15:42PM +0000, Anthony PERARD wrote:
> > On Thu, Aug 29, 2024 at 12:59:43PM +0200, Roger Pau Monné wrote:
> > > The following table attempts to summarize in which units the following fields
> > > are defined for the analyzed implementations (please correct me if I got some
> > > of this wrong):
> > >
> > >                         │ sectors xenbus node │ requests sector_number │ requests {first,last}_sect
> > > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > > FreeBSD blk{front,back} │     sector-size     │      sector-size       │           512
> > > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > > Linux blk{front,back}   │         512         │          512           │           512
> > > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > > QEMU blkback            │     sector-size     │      sector-size       │       sector-size
> > > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > > Windows blkfront        │     sector-size     │      sector-size       │       sector-size
> > > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > > MiniOS                  │     sector-size     │          512           │           512
> > > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > > tapdisk blkback         │         512         │      sector-size       │           512

Tapdisk situation seems more like:

     tapdisk blkback         │      ??????????     │      ???????????       │         ?????

I've looks at the implementation at xapi-project/blktat[1] and the way
sector_number or {first,last}_sect seems to be used varied on which
backend is used (block-vhd, block-nbd, block-aio).

[1] https://github.com/xapi-project/blktap

block-vhd seems mostly sectors of 512 but recalculated with "s->spb"
(sector per block?) but still, sector seems to be only 512.

block-nbd seems to set "sector-size" to always 512, but uses
"sector-size" for sector_number and {first,last}_sect.

The weirdest one is block-aio, where on read it multiply sector_number
and {first,last}_sect by 512, but on write, those are multiplied by
"sector-size". With "sector-size" set by ioctl(BLKSSZGET)

At least, is seems "sectors" is a multiple of 512 on all those, like in
the table, but I've only look at those 3 "drivers".


> > There's OVMF as well, which copied MiniOS's implementation, and looks
> > like it's still the same as MiniOS for the table above:
> >
> > OVMF (base on MiniOS)   │     sector-size     │          512           │           512
> >
> > >
> > > It's all a mess, I'm surprised we didn't get more reports about brokenness when
> > > using disks with 4K logical sectors.
> > >
> > > Overall I think the in-kernel backends are more difficult to update (as it
> > > might require a kernel rebuild), compared to QEMU or blktap.  Hence my slight
> > > preference would be to adjust the public interface to match the behavior of
> > > Linux blkback, and then adjust the implementation in the rest of the backends
> > > and frontends.
> >
> > I would add that making "sector-size" been different from 512 illegal
> > makes going forward easier, has every implementation will work with a
> > "sector-size" of 512, and it probably going to be the most common sector
> > size for a while longer.
>
> My main concern is the amount of backends out there that already
> expose a "sector-size" different than 512.  I fear any changes here
> will take time to propagate to in-kernel backends, and hence my
> approach was to avoid modifying Linux blkback, because (as seen in the
> FreeBSD bug report) there are already instances of 4K logical sector
> disks being used by users.  Modifying the frontends is likely easier,
> as that's under the owner of the VM control.
>
> > > There was an attempt in 2019 to introduce a new frontend feature flag to signal
> > > whether the frontend supported `sector-size` xenstore nodes different than 512 [0].
> > > However that was only ever implemented for QEMU blkback and Windows blkfront,
> > > all the other backends will expose `sector-size` different than 512 without
> > > checking if `feature-large-sector-size` is exposed by the frontend.  I'm afraid
> > > it's now too late to retrofit that feature into existing backends, seeing as
> > > they already expose `sector-size` nodes greater than 512 without checking if
> > > `feature-large-sector-size` is reported by the frontend.
> >
> > Much before that, "physical-sector-size" was introduced (2013):
> >     https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=a67e2dac9f8339681b30b0f89274a64e691ea139
> >
> > Linux seems to implement it, but QEMU or OVMF don't have it.
>
> Yeah, I was aware of this, normal disks already have a physical sector
> size (optimal sector size) and a logical sector size (minimal size
> supported by the drive).  Some implement a smaller logical than
> physical sector size by doing read-modify-write.
>
> > > My proposal would be to adjust the public interface with:
> > >
> > >  * Disk size is calculated as: `sectors` * 512 (`sectors` being the contents of
> > >    such xenstore backend node).
> > >
> > >  * All the sector related fields in blkif ring requests use a 512b base sector
> > >    size, regardless of the value in the `sector-size` xenstore node.
> > >
> > >  * The `sector-size` contains the disk logical sector size.  The frontend must
> > >    ensure that all request segments addresses are aligned and it's length is
> > >    a multiple of such size.  Otherwise the backend will refuse to process the
> > >    request.
> >
> > You still want to try to have a "sector-size" different from 512? To me
> > this just add confusion to the confusion. There would be no way fro
> > backend or frontend to know if setting something other than 512 is going
> > to work.
>
> But that's already the case, most (all?) backends except QEMU will set
> "sector-size" to the underlying block storage logical sector size

QEMU, only if feature-large-sector-size is set, indeed, otherwise it
just return an error if it have to set "sector-size" to a value
different from 512.

Otherwise, yes for Linux, FreeBSD, and maybe yes for blktap. For blktap
it seems to depend of the storage, more or less:
    - block-vhd: always "sector-size" = 512
    - block-nbd: always "sector-size" = 512
    - block-aio: physical storage sector size

> without any way to tell if the frontend supports sector-sizes != 512.
> So the issue is not inherently with the setting of the "sector-size"
> node to a value different than 512, but rather how different
> implementations have diverged regarding which is the base unit of
> other fields.
>
> > Also, it is probably easier to update backend than frontend, so
> > it is just likely that something is going to lag behind and broke.
>
> Hm, I'm not convinced, sometimes the owner of a VM has no control over
> the version of the backends if it's not the admin of the host.  OTOH
> the owner of a VM could always update the kernel in order to
> workaround such blkfront/blkback incompatibility issues.  Hence my
> preference was for solutions that didn't involve changing Linux
> blkback, as I believe that's the most commonly used backend.

Going the Linux way might be the least bad option indeed. sectors in
requests has been described as a 512-bytes for a long while. It's only
"sectors" that have been described as "sector-size"-bytes size.

> > Why not make use of the node "physical-sector-size" that have existed
> > for 10 years, even if unused or unadvertised, and if an IO request isn't
> > aligned on it, it is just going to be slow (as backend would have to
> > read,update,write instead of just write sectors).
>
> I don't really fancy implementing read-modify-write on the backends,
> as it's going to add more complexity to blkback implementations,
> specially the in-kernel ones I would assume.
>
> All frontends I've looked into support "sector-size" != 512, but
> there's a lack of uniformity on whether other units used in the
> protocol are based on the blkback exposed "sector-size", or hardcoded
> to 512.
>
> So your suggestion would be to hardcode "sector-size" to 512 and use
> the "physical-sector-size" node value to set the block device logical
> sector size the frontends?
>
> If we go that route I would suggest that backends are free to refuse
> requests that aren't a multiple of "physical-sector-size".

After looking in more detail in the different implementations, and linux
one, I don't think changing "physical-sector-size" meaning is going to
be helpful.

What to do about "feature-large-sector-size"? Should backend refuse to
connect to the front end if that flag is set and "sector-size" want to
be different than 512? This would just be Windows frontend I guess.
(Just as an helper for updated backend)


So yes, after more research, having sector in the protocol been a
512-byte size seems the least bad option. "sector_number" and
"{first,last}_sect" have been described as is for a long while. Only
"sectors" for the size has been described as a "sector-size" quantity.

Cheers,

--

Anthony Perard | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Block protocol incompatibilities with 4K logical sector size disks
  2024-08-30 16:09     ` Anthony PERARD
@ 2024-09-02  8:55       ` Roger Pau Monné
  2024-09-02 14:19         ` Paul Durrant
  2024-09-02 14:50         ` Mark Syms
  0 siblings, 2 replies; 12+ messages in thread
From: Roger Pau Monné @ 2024-09-02  8:55 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: xen-devel, Paul Durrant, Owen Smith, Mark Syms,
	Stefano Stabellini, Juergen Gross

On Fri, Aug 30, 2024 at 04:09:25PM +0000, Anthony PERARD wrote:
> On Thu, Aug 29, 2024 at 05:42:45PM +0200, Roger Pau Monné wrote:
> > On Thu, Aug 29, 2024 at 01:15:42PM +0000, Anthony PERARD wrote:
> > > On Thu, Aug 29, 2024 at 12:59:43PM +0200, Roger Pau Monné wrote:
> > > > The following table attempts to summarize in which units the following fields
> > > > are defined for the analyzed implementations (please correct me if I got some
> > > > of this wrong):
> > > >
> > > >                         │ sectors xenbus node │ requests sector_number │ requests {first,last}_sect
> > > > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > > > FreeBSD blk{front,back} │     sector-size     │      sector-size       │           512
> > > > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > > > Linux blk{front,back}   │         512         │          512           │           512
> > > > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > > > QEMU blkback            │     sector-size     │      sector-size       │       sector-size
> > > > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > > > Windows blkfront        │     sector-size     │      sector-size       │       sector-size
> > > > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > > > MiniOS                  │     sector-size     │          512           │           512
> > > > ────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
> > > > tapdisk blkback         │         512         │      sector-size       │           512
> 
> Tapdisk situation seems more like:
> 
>      tapdisk blkback         │      ??????????     │      ???????????       │         ?????
> 
> I've looks at the implementation at xapi-project/blktat[1] and the way
> sector_number or {first,last}_sect seems to be used varied on which
> backend is used (block-vhd, block-nbd, block-aio).
> 
> [1] https://github.com/xapi-project/blktap
> 
> block-vhd seems mostly sectors of 512 but recalculated with "s->spb"
> (sector per block?) but still, sector seems to be only 512.
> 
> block-nbd seems to set "sector-size" to always 512, but uses
> "sector-size" for sector_number and {first,last}_sect.
> 
> The weirdest one is block-aio, where on read it multiply sector_number
> and {first,last}_sect by 512, but on write, those are multiplied by
> "sector-size". With "sector-size" set by ioctl(BLKSSZGET)
> 
> At least, is seems "sectors" is a multiple of 512 on all those, like in
> the table, but I've only look at those 3 "drivers".

You looked more than myself, I've just looked at the block-aio write
path I think, and I was happy enough to see it was yet different from
the other implementations.

I think it's clear enough that every implementation has slight
differences, and I don't plan to fix all of them.

> 
> > > There's OVMF as well, which copied MiniOS's implementation, and looks
> > > like it's still the same as MiniOS for the table above:
> > >
> > > OVMF (base on MiniOS)   │     sector-size     │          512           │           512
> > >
> > > >
> > > > It's all a mess, I'm surprised we didn't get more reports about brokenness when
> > > > using disks with 4K logical sectors.
> > > >
> > > > Overall I think the in-kernel backends are more difficult to update (as it
> > > > might require a kernel rebuild), compared to QEMU or blktap.  Hence my slight
> > > > preference would be to adjust the public interface to match the behavior of
> > > > Linux blkback, and then adjust the implementation in the rest of the backends
> > > > and frontends.
> > >
> > > I would add that making "sector-size" been different from 512 illegal
> > > makes going forward easier, has every implementation will work with a
> > > "sector-size" of 512, and it probably going to be the most common sector
> > > size for a while longer.
> >
> > My main concern is the amount of backends out there that already
> > expose a "sector-size" different than 512.  I fear any changes here
> > will take time to propagate to in-kernel backends, and hence my
> > approach was to avoid modifying Linux blkback, because (as seen in the
> > FreeBSD bug report) there are already instances of 4K logical sector
> > disks being used by users.  Modifying the frontends is likely easier,
> > as that's under the owner of the VM control.
> >
> > > > There was an attempt in 2019 to introduce a new frontend feature flag to signal
> > > > whether the frontend supported `sector-size` xenstore nodes different than 512 [0].
> > > > However that was only ever implemented for QEMU blkback and Windows blkfront,
> > > > all the other backends will expose `sector-size` different than 512 without
> > > > checking if `feature-large-sector-size` is exposed by the frontend.  I'm afraid
> > > > it's now too late to retrofit that feature into existing backends, seeing as
> > > > they already expose `sector-size` nodes greater than 512 without checking if
> > > > `feature-large-sector-size` is reported by the frontend.
> > >
> > > Much before that, "physical-sector-size" was introduced (2013):
> > >     https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=a67e2dac9f8339681b30b0f89274a64e691ea139
> > >
> > > Linux seems to implement it, but QEMU or OVMF don't have it.
> >
> > Yeah, I was aware of this, normal disks already have a physical sector
> > size (optimal sector size) and a logical sector size (minimal size
> > supported by the drive).  Some implement a smaller logical than
> > physical sector size by doing read-modify-write.
> >
> > > > My proposal would be to adjust the public interface with:
> > > >
> > > >  * Disk size is calculated as: `sectors` * 512 (`sectors` being the contents of
> > > >    such xenstore backend node).
> > > >
> > > >  * All the sector related fields in blkif ring requests use a 512b base sector
> > > >    size, regardless of the value in the `sector-size` xenstore node.
> > > >
> > > >  * The `sector-size` contains the disk logical sector size.  The frontend must
> > > >    ensure that all request segments addresses are aligned and it's length is
> > > >    a multiple of such size.  Otherwise the backend will refuse to process the
> > > >    request.
> > >
> > > You still want to try to have a "sector-size" different from 512? To me
> > > this just add confusion to the confusion. There would be no way fro
> > > backend or frontend to know if setting something other than 512 is going
> > > to work.
> >
> > But that's already the case, most (all?) backends except QEMU will set
> > "sector-size" to the underlying block storage logical sector size
> 
> QEMU, only if feature-large-sector-size is set, indeed, otherwise it
> just return an error if it have to set "sector-size" to a value
> different from 512.
> 
> Otherwise, yes for Linux, FreeBSD, and maybe yes for blktap. For blktap
> it seems to depend of the storage, more or less:
>     - block-vhd: always "sector-size" = 512
>     - block-nbd: always "sector-size" = 512
>     - block-aio: physical storage sector size
> 
> > without any way to tell if the frontend supports sector-sizes != 512.
> > So the issue is not inherently with the setting of the "sector-size"
> > node to a value different than 512, but rather how different
> > implementations have diverged regarding which is the base unit of
> > other fields.
> >
> > > Also, it is probably easier to update backend than frontend, so
> > > it is just likely that something is going to lag behind and broke.
> >
> > Hm, I'm not convinced, sometimes the owner of a VM has no control over
> > the version of the backends if it's not the admin of the host.  OTOH
> > the owner of a VM could always update the kernel in order to
> > workaround such blkfront/blkback incompatibility issues.  Hence my
> > preference was for solutions that didn't involve changing Linux
> > blkback, as I believe that's the most commonly used backend.
> 
> Going the Linux way might be the least bad option indeed. sectors in
> requests has been described as a 512-bytes for a long while. It's only
> "sectors" that have been described as "sector-size"-bytes size.
> 
> > > Why not make use of the node "physical-sector-size" that have existed
> > > for 10 years, even if unused or unadvertised, and if an IO request isn't
> > > aligned on it, it is just going to be slow (as backend would have to
> > > read,update,write instead of just write sectors).
> >
> > I don't really fancy implementing read-modify-write on the backends,
> > as it's going to add more complexity to blkback implementations,
> > specially the in-kernel ones I would assume.
> >
> > All frontends I've looked into support "sector-size" != 512, but
> > there's a lack of uniformity on whether other units used in the
> > protocol are based on the blkback exposed "sector-size", or hardcoded
> > to 512.
> >
> > So your suggestion would be to hardcode "sector-size" to 512 and use
> > the "physical-sector-size" node value to set the block device logical
> > sector size the frontends?
> >
> > If we go that route I would suggest that backends are free to refuse
> > requests that aren't a multiple of "physical-sector-size".
> 
> After looking in more detail in the different implementations, and linux
> one, I don't think changing "physical-sector-size" meaning is going to
> be helpful.
> 
> What to do about "feature-large-sector-size"? Should backend refuse to
> connect to the front end if that flag is set and "sector-size" want to
> be different than 512? This would just be Windows frontend I guess.
> (Just as an helper for updated backend)

I think it's likely too late to mandate exposing
"feature-large-sector-size" in the frontends, as for example Linux
blkfront will handle "sector-size" != 512, yet it doesn't expose
"feature-large-sector-size".  If we retroactively push this change to
the backends we might break setups that were working fine
previously.

> 
> So yes, after more research, having sector in the protocol been a
> 512-byte size seems the least bad option. "sector_number" and
> "{first,last}_sect" have been described as is for a long while. Only
> "sectors" for the size has been described as a "sector-size" quantity.

Thanks for your input.  I would also like to hear from the blktap and
Windows PV drivers maintainers, as the change that I'm proposing here
will require changes to their implementations.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Block protocol incompatibilities with 4K logical sector size disks
  2024-09-02  8:55       ` Roger Pau Monné
@ 2024-09-02 14:19         ` Paul Durrant
  2024-09-02 15:23           ` Roger Pau Monné
  2024-09-02 14:50         ` Mark Syms
  1 sibling, 1 reply; 12+ messages in thread
From: Paul Durrant @ 2024-09-02 14:19 UTC (permalink / raw)
  To: Roger Pau Monné, Anthony PERARD
  Cc: xen-devel, Owen Smith, Mark Syms, Stefano Stabellini,
	Juergen Gross

On 02/09/2024 09:55, Roger Pau Monné wrote:
[snip]
> 
> Thanks for your input.  I would also like to hear from the blktap and
> Windows PV drivers maintainers, as the change that I'm proposing here
> will require changes to their implementations.
> 

So IIUC you are proposing to refuse to connect to a frontend that sets 
feature-large-sector-size if sector-size != 512? Is that right?

   Paul



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Block protocol incompatibilities with 4K logical sector size disks
  2024-09-02  8:55       ` Roger Pau Monné
  2024-09-02 14:19         ` Paul Durrant
@ 2024-09-02 14:50         ` Mark Syms
  2024-09-02 15:26           ` Roger Pau Monné
  1 sibling, 1 reply; 12+ messages in thread
From: Mark Syms @ 2024-09-02 14:50 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Anthony PERARD, xen-devel, Paul Durrant, Owen Smith,
	Stefano Stabellini, Juergen Gross

On Mon, 2 Sept 2024 at 09:55, Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> > So yes, after more research, having sector in the protocol been a
> > 512-byte size seems the least bad option. "sector_number" and
> > "{first,last}_sect" have been described as is for a long while. Only
> > "sectors" for the size has been described as a "sector-size" quantity.
>
> Thanks for your input.  I would also like to hear from the blktap and
> Windows PV drivers maintainers, as the change that I'm proposing here
> will require changes to their implementations.

Well, that's a whole big mess isn't it ;( FWIW, it's tacitly assumed
that tapdisk is only running on 512 or 512e storage as its primary use
case is VHD and that driver explodes spectacularly on 4KN. So,
hardening those implicit conditions into hard explicit ones seems like
an entirely reasonable thing.

Mark


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Block protocol incompatibilities with 4K logical sector size disks
  2024-09-02 14:19         ` Paul Durrant
@ 2024-09-02 15:23           ` Roger Pau Monné
  2024-09-02 15:25             ` Paul Durrant
  2024-09-02 15:50             ` Anthony PERARD
  0 siblings, 2 replies; 12+ messages in thread
From: Roger Pau Monné @ 2024-09-02 15:23 UTC (permalink / raw)
  To: paul
  Cc: Anthony PERARD, xen-devel, Owen Smith, Mark Syms,
	Stefano Stabellini, Juergen Gross

On Mon, Sep 02, 2024 at 03:19:56PM +0100, Paul Durrant wrote:
> On 02/09/2024 09:55, Roger Pau Monné wrote:
> [snip]
> > 
> > Thanks for your input.  I would also like to hear from the blktap and
> > Windows PV drivers maintainers, as the change that I'm proposing here
> > will require changes to their implementations.
> > 
> 
> So IIUC you are proposing to refuse to connect to a frontend that sets
> feature-large-sector-size if sector-size != 512? Is that right?

Is is worth retrofitting this into existing backends?  My suggestion
would be to make `feature-large-sector-size` not mandatory to expose a
sector-size != 512, but I wouldn't go as far as refusing to connect to
frontends that expose the feature.  I have no idea which frontends
might expose `feature-large-sector-size` but still be compatible with
Linux blkback regarding sector-size != 512 (I know the Windows one
isn't).

I think we have reached consensus with Anthony on the approach, so it
might be best if I just draft a proposal change to blkif.h because
that's less ambiguous than attempting to describe it here.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Block protocol incompatibilities with 4K logical sector size disks
  2024-09-02 15:23           ` Roger Pau Monné
@ 2024-09-02 15:25             ` Paul Durrant
  2024-09-02 15:50             ` Anthony PERARD
  1 sibling, 0 replies; 12+ messages in thread
From: Paul Durrant @ 2024-09-02 15:25 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Anthony PERARD, xen-devel, Owen Smith, Mark Syms,
	Stefano Stabellini, Juergen Gross

On 02/09/2024 16:23, Roger Pau Monné wrote:
> On Mon, Sep 02, 2024 at 03:19:56PM +0100, Paul Durrant wrote:
>> On 02/09/2024 09:55, Roger Pau Monné wrote:
>> [snip]
>>>
>>> Thanks for your input.  I would also like to hear from the blktap and
>>> Windows PV drivers maintainers, as the change that I'm proposing here
>>> will require changes to their implementations.
>>>
>>
>> So IIUC you are proposing to refuse to connect to a frontend that sets
>> feature-large-sector-size if sector-size != 512? Is that right?
> 
> Is is worth retrofitting this into existing backends?  My suggestion
> would be to make `feature-large-sector-size` not mandatory to expose a
> sector-size != 512, but I wouldn't go as far as refusing to connect to
> frontends that expose the feature.  I have no idea which frontends
> might expose `feature-large-sector-size` but still be compatible with
> Linux blkback regarding sector-size != 512 (I know the Windows one
> isn't).
> 
> I think we have reached consensus with Anthony on the approach, so it
> might be best if I just draft a proposal change to blkif.h because
> that's less ambiguous than attempting to describe it here.
> 

Yes, I think that would be best at this point.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Block protocol incompatibilities with 4K logical sector size disks
  2024-09-02 14:50         ` Mark Syms
@ 2024-09-02 15:26           ` Roger Pau Monné
  0 siblings, 0 replies; 12+ messages in thread
From: Roger Pau Monné @ 2024-09-02 15:26 UTC (permalink / raw)
  To: Mark Syms
  Cc: Anthony PERARD, xen-devel, Paul Durrant, Owen Smith,
	Stefano Stabellini, Juergen Gross

On Mon, Sep 02, 2024 at 03:50:17PM +0100, Mark Syms wrote:
> On Mon, 2 Sept 2024 at 09:55, Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > > So yes, after more research, having sector in the protocol been a
> > > 512-byte size seems the least bad option. "sector_number" and
> > > "{first,last}_sect" have been described as is for a long while. Only
> > > "sectors" for the size has been described as a "sector-size" quantity.
> >
> > Thanks for your input.  I would also like to hear from the blktap and
> > Windows PV drivers maintainers, as the change that I'm proposing here
> > will require changes to their implementations.
> 
> Well, that's a whole big mess isn't it ;( FWIW, it's tacitly assumed
> that tapdisk is only running on 512 or 512e storage as its primary use
> case is VHD and that driver explodes spectacularly on 4KN. So,
> hardening those implicit conditions into hard explicit ones seems like
> an entirely reasonable thing.

OK, so I take you are fine with the adjustments to the protocol being
suggested here, and will be happy to adjust blktap if/when required to
meet them (if it ever supports exposing 4K sector sized disks).

Thanks for the feedback, will Cc you on the patch to blkif.h for one
extra review if possible.

Roger.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Block protocol incompatibilities with 4K logical sector size disks
  2024-09-02 15:23           ` Roger Pau Monné
  2024-09-02 15:25             ` Paul Durrant
@ 2024-09-02 15:50             ` Anthony PERARD
  2024-09-02 16:08               ` Roger Pau Monné
  1 sibling, 1 reply; 12+ messages in thread
From: Anthony PERARD @ 2024-09-02 15:50 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: paul, xen-devel, Owen Smith, Mark Syms, Stefano Stabellini,
	Juergen Gross

On Mon, Sep 02, 2024 at 05:23:37PM +0200, Roger Pau Monné wrote:
> On Mon, Sep 02, 2024 at 03:19:56PM +0100, Paul Durrant wrote:
> > On 02/09/2024 09:55, Roger Pau Monné wrote:
> > [snip]
> > >
> > > Thanks for your input.  I would also like to hear from the blktap and
> > > Windows PV drivers maintainers, as the change that I'm proposing here
> > > will require changes to their implementations.
> > >
> >
> > So IIUC you are proposing to refuse to connect to a frontend that sets
> > feature-large-sector-size if sector-size != 512? Is that right?
>
> Is is worth retrofitting this into existing backends?  My suggestion
> would be to make `feature-large-sector-size` not mandatory to expose a
> sector-size != 512, but I wouldn't go as far as refusing to connect to
> frontends that expose the feature.  I have no idea which frontends
> might expose `feature-large-sector-size` but still be compatible with
> Linux blkback regarding sector-size != 512 (I know the Windows one
> isn't).

The frontend exposing "feature-large-sector-size" are not going to work
with Linux's backend if it set "sector-size" to a value different from
512.

From blkif.h:
    feature-large-sector-size
         A value of "1" indicates that the frontend will correctly supply and
         interpret all sector-based quantities in terms of the "sector-size"
         value supplied in the backend info, whatever that may be set to.
         ...

But Linux interprets "sector-based quantities" as 512 bytes. This is
incompatible with "feature-large-sector-size".

This is why I proposed to mark "feature-large-sector-size" as broken or
incompatible with your proposal to use 512B for all sector-based
quantities.

Cheers,

--

Anthony Perard | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Block protocol incompatibilities with 4K logical sector size disks
  2024-09-02 15:50             ` Anthony PERARD
@ 2024-09-02 16:08               ` Roger Pau Monné
  0 siblings, 0 replies; 12+ messages in thread
From: Roger Pau Monné @ 2024-09-02 16:08 UTC (permalink / raw)
  To: Anthony PERARD
  Cc: paul, xen-devel, Owen Smith, Mark Syms, Stefano Stabellini,
	Juergen Gross

On Mon, Sep 02, 2024 at 03:50:05PM +0000, Anthony PERARD wrote:
> On Mon, Sep 02, 2024 at 05:23:37PM +0200, Roger Pau Monné wrote:
> > On Mon, Sep 02, 2024 at 03:19:56PM +0100, Paul Durrant wrote:
> > > On 02/09/2024 09:55, Roger Pau Monné wrote:
> > > [snip]
> > > >
> > > > Thanks for your input.  I would also like to hear from the blktap and
> > > > Windows PV drivers maintainers, as the change that I'm proposing here
> > > > will require changes to their implementations.
> > > >
> > >
> > > So IIUC you are proposing to refuse to connect to a frontend that sets
> > > feature-large-sector-size if sector-size != 512? Is that right?
> >
> > Is is worth retrofitting this into existing backends?  My suggestion
> > would be to make `feature-large-sector-size` not mandatory to expose a
> > sector-size != 512, but I wouldn't go as far as refusing to connect to
> > frontends that expose the feature.  I have no idea which frontends
> > might expose `feature-large-sector-size` but still be compatible with
> > Linux blkback regarding sector-size != 512 (I know the Windows one
> > isn't).
> 
> The frontend exposing "feature-large-sector-size" are not going to work
> with Linux's backend if it set "sector-size" to a value different from
> 512.
> 
> From blkif.h:
>     feature-large-sector-size
>          A value of "1" indicates that the frontend will correctly supply and
>          interpret all sector-based quantities in terms of the "sector-size"
>          value supplied in the backend info, whatever that may be set to.
>          ...

While this is what the protocol specification says, just look how
different implementations have managed to diverge in all kind of
different ways.  I wouldn't discard there's a frontend somewhere that
exposes `feature-large-sector-size` without doing the calculations as
stated in the specification.

> But Linux interprets "sector-based quantities" as 512 bytes. This is
> incompatible with "feature-large-sector-size".
> 
> This is why I proposed to mark "feature-large-sector-size" as broken or
> incompatible with your proposal to use 512B for all sector-based
> quantities.

Yeah, that's the intention, to mark `feature-large-sector-size` as
deprecated in blkif.h and note it shouldn't be exposed by frontends.

I however wasn't planning to change Linux blkback for example to
refuse to attach to frontends that expose `feature-large-sector-size`
when `sector-size` != 512.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-09-02 16:08 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-29 10:59 Block protocol incompatibilities with 4K logical sector size disks Roger Pau Monné
2024-08-29 13:15 ` Anthony PERARD
2024-08-29 15:42   ` Roger Pau Monné
2024-08-30 16:09     ` Anthony PERARD
2024-09-02  8:55       ` Roger Pau Monné
2024-09-02 14:19         ` Paul Durrant
2024-09-02 15:23           ` Roger Pau Monné
2024-09-02 15:25             ` Paul Durrant
2024-09-02 15:50             ` Anthony PERARD
2024-09-02 16:08               ` Roger Pau Monné
2024-09-02 14:50         ` Mark Syms
2024-09-02 15:26           ` Roger Pau Monné

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.