xen-devel.lists.xenproject.org archive mirror
* bug when using 4K sectors?
From: James Harper @ 2012-08-13 14:12 UTC
  To: xen-devel@lists.xen.org

I notice this code in drivers/block/xen-blkback/common.h

#define vbd_sz(_v)      ((_v)->bdev->bd_part ? \
                         (_v)->bdev->bd_part->nr_sects : \
                          get_capacity((_v)->bdev->bd_disk))

Is the value returned by vbd_sz(_v) the number of sectors in the Linux device (e.g. size / 4096), or the number of 512-byte sectors? I suspect the former, which is causing block requests beyond 1/8th of the device size to fail. (That assumes 4K sectors are expected to work at all; I can't quite get my head around how that would work. Does Linux do the read-modify-write if required?)
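As background (the thread itself does not settle this): in the Linux block layer, nr_sects and get_capacity() count fixed 512-byte units regardless of the device's logical block size, so vbd_sz() should already be in 512-byte sectors. The standalone sketch below is not blkback code and its helper names are hypothetical; it only shows the unit conversions involved.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed 512-byte unit used by the Linux block layer for capacities and
 * sector numbers, whatever the device's logical block size. */
#define SECTOR_SHIFT 9

/* Illustrative helper (hypothetical name): bytes from a 512-byte count
 * such as the value vbd_sz() yields. */
static uint64_t capacity_bytes(uint64_t nr_sects)
{
    return nr_sects << SECTOR_SHIFT;
}

/* Illustrative helper (hypothetical name): the same capacity expressed
 * in logical blocks of 'logical_block_size' bytes. */
static uint64_t capacity_logical_blocks(uint64_t nr_sects,
                                        uint32_t logical_block_size)
{
    assert(logical_block_size >= 512 && logical_block_size % 512 == 0);
    return capacity_bytes(nr_sects) / logical_block_size;
}
```

For a 1 GiB device, vbd_sz() would be 2097152; capacity_bytes() gives 1073741824 and capacity_logical_blocks(2097152, 4096) gives 262144. A bounds check that compared 512-byte sector numbers against that 4 KiB block count would reject everything beyond the first eighth of the device, which is the kind of failure suspected above.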

I can't test until tomorrow AEDT, but maybe someone here knows the answer already?

James

* Re: bug when using 4K sectors?
From: Konrad Rzeszutek Wilk @ 2012-09-05 20:29 UTC
  To: James Harper; +Cc: xen-devel@lists.xen.org

On Mon, Aug 13, 2012 at 02:12:58PM +0000, James Harper wrote:
> I notice this code in drivers/block/xen-blkback/common.h
> 
> #define vbd_sz(_v)      ((_v)->bdev->bd_part ? \
>                          (_v)->bdev->bd_part->nr_sects : \
>                           get_capacity((_v)->bdev->bd_disk))
> 
> Is the value returned by vbd_sz(_v) the number of sectors in the Linux device (e.g. size / 4096), or the number of 512-byte sectors? I suspect the former, which is causing block requests beyond 1/8th of the device size to fail. (That assumes 4K sectors are expected to work at all; I can't quite get my head around how that would work. Does Linux do the read-modify-write if required?)

I think you need to instrument it to be sure. More interesting, though: do you actually
have a disk that exposes a 4KB hardware and logical sector? So far I've only found
SSDs that expose a 512-byte logical sector but also expose the 4KB hardware sector.

I never could figure out how that is all supposed to work, as blkback
is filled with << 9 in a bunch of places.

> 
> I can't test until tomorrow AEDT, but maybe someone here knows the answer already?
> 
> James
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

* Re: bug when using 4K sectors?
From: James Harper @ 2012-09-05 23:56 UTC
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel@lists.xen.org

> On Mon, Aug 13, 2012 at 02:12:58PM +0000, James Harper wrote:
> > I notice this code in drivers/block/xen-blkback/common.h
> >
> > #define vbd_sz(_v)      ((_v)->bdev->bd_part ? \
> >                          (_v)->bdev->bd_part->nr_sects : \
> >                           get_capacity((_v)->bdev->bd_disk))
> >
> > Is the value returned by vbd_sz(_v) the number of sectors in the Linux
> > device (e.g. size / 4096), or the number of 512-byte sectors? I suspect
> > the former, which is causing block requests beyond 1/8th of the device
> > size to fail. (That assumes 4K sectors are expected to work at all; I
> > can't quite get my head around how that would work. Does Linux do the
> > read-modify-write if required?)
> 
> I think you need to instrument it to be sure. More interesting, though: do you
> actually have a disk that exposes a 4KB hardware and logical sector? So far
> I've only found SSDs that expose a 512-byte logical sector but also expose the
> 4KB hardware sector.
> 
> I never could figure out how that is all supposed to work, as blkback is filled
> with << 9 in a bunch of places.
> 

I was using bcache, which exposes a 4K block size by default. I changed it to 512 and it all works now, although I haven't tested whether there is any loss of performance.

Does Xen provide a way to tell Windows that the underlying device is 512e (4K physical sectors with a 512-byte emulated interface)? This would keep everything working as is but allow Windows to align writes to 4K boundaries where possible.
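For reference, a 512e drive accepts 512-byte logical I/O but writes its medium in 4K units, so a write smaller than a physical block becomes a read-modify-write inside the drive. A minimal in-memory sketch of that behaviour; the array standing in for the medium and all function names are hypothetical, and this is not Xen, Linux or drive firmware code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LOGICAL_SIZE   512u   /* size exposed to the OS (512e) */
#define PHYSICAL_SIZE  4096u  /* size the medium actually writes */
#define NR_PHYS_BLOCKS 16u    /* tiny in-memory "disk" for illustration */

/* Hypothetical backing store standing in for the physical medium. */
static uint8_t medium[NR_PHYS_BLOCKS][PHYSICAL_SIZE];
static unsigned phys_writes; /* counts whole-block writes to the medium */

static void phys_read(uint64_t pblk, uint8_t *buf)
{
    memcpy(buf, medium[pblk], PHYSICAL_SIZE);
}

static void phys_write(uint64_t pblk, const uint8_t *buf)
{
    memcpy(medium[pblk], buf, PHYSICAL_SIZE);
    phys_writes++;
}

/* Write one 512-byte logical sector on a 4K-physical device:
 * read the enclosing physical block, patch 512 bytes, write it back. */
static void write_logical_sector(uint64_t lba, const uint8_t *data)
{
    const uint64_t per_phys = PHYSICAL_SIZE / LOGICAL_SIZE; /* 8 */
    uint64_t pblk = lba / per_phys;
    size_t off = (size_t)(lba % per_phys) * LOGICAL_SIZE;
    uint8_t block[PHYSICAL_SIZE];

    assert(pblk < NR_PHYS_BLOCKS);
    phys_read(pblk, block);                  /* read   */
    memcpy(block + off, data, LOGICAL_SIZE); /* modify */
    phys_write(pblk, block);                 /* write  */
}
```

An aligned, full 4K write would not need the read step at all, which is why telling the guest where the 4K boundaries are is worthwhile.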

James

* Re: bug when using 4K sectors?
From: Konrad Rzeszutek Wilk @ 2012-09-06 10:58 UTC
  To: James Harper; +Cc: xen-devel@lists.xen.org

On Wed, Sep 05, 2012 at 11:56:08PM +0000, James Harper wrote:
> > On Mon, Aug 13, 2012 at 02:12:58PM +0000, James Harper wrote:
> > > I notice this code in drivers/block/xen-blkback/common.h
> > >
> > > #define vbd_sz(_v)      ((_v)->bdev->bd_part ? \
> > >                          (_v)->bdev->bd_part->nr_sects : \
> > >                           get_capacity((_v)->bdev->bd_disk))
> > >
> > > Is the value returned by vbd_sz(_v) the number of sectors in the Linux
> > > device (e.g. size / 4096), or the number of 512-byte sectors? I suspect
> > > the former, which is causing block requests beyond 1/8th of the device
> > > size to fail. (That assumes 4K sectors are expected to work at all; I
> > > can't quite get my head around how that would work. Does Linux do the
> > > read-modify-write if required?)
> > 
> > I think you need to instrument it to be sure. More interesting, though: do you
> > actually have a disk that exposes a 4KB hardware and logical sector? So far
> > I've only found SSDs that expose a 512-byte logical sector but also expose the
> > 4KB hardware sector.
> > 
> > I never could figure out how that is all supposed to work, as blkback is filled
> > with << 9 in a bunch of places.
> > 
> 
> I was using bcache, which exposes a 4K block size by default. I changed it to 512 and it all works now, although I haven't tested whether there is any loss of performance.

OK, let me see how I can set up bcache and play with that.
> 
> Does Xen provide a way to tell Windows that the underlying device is 512e (4K physical sectors with a 512-byte emulated interface)? This would keep everything working as is but allow Windows to align writes to 4K boundaries where possible.

We can certainly expose that via the XenBus interface.
> 
> James

* Re: bug when using 4K sectors?
From: Joseph Glanville @ 2012-09-16  7:00 UTC
  To: Konrad Rzeszutek Wilk; +Cc: James Harper, xen-devel@lists.xen.org

On 6 September 2012 20:58, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Wed, Sep 05, 2012 at 11:56:08PM +0000, James Harper wrote:
>> > On Mon, Aug 13, 2012 at 02:12:58PM +0000, James Harper wrote:
>> > > I notice this code in drivers/block/xen-blkback/common.h
>> > >
>> > > #define vbd_sz(_v)      ((_v)->bdev->bd_part ? \
>> > >                          (_v)->bdev->bd_part->nr_sects : \
>> > >                           get_capacity((_v)->bdev->bd_disk))
>> > >
>> > > Is the value returned by vbd_sz(_v) the number of sectors in the Linux
>> > > device (e.g. size / 4096), or the number of 512-byte sectors? I suspect
>> > > the former, which is causing block requests beyond 1/8th of the device
>> > > size to fail. (That assumes 4K sectors are expected to work at all; I
>> > > can't quite get my head around how that would work. Does Linux do the
>> > > read-modify-write if required?)
>> >
>> > I think you need to instrument it to be sure. More interesting, though: do you
>> > actually have a disk that exposes a 4KB hardware and logical sector? So far
>> > I've only found SSDs that expose a 512-byte logical sector but also expose the
>> > 4KB hardware sector.
>> >
>> > I never could figure out how that is all supposed to work, as blkback is filled
>> > with << 9 in a bunch of places.
>> >
>>
>> I was using bcache, which exposes a 4K block size by default. I changed it to 512 and it all works now, although I haven't tested whether there is any loss of performance.
>
> OK, let me see how I can set up bcache and play with that.
>>
>> Does Xen provide a way to tell Windows that the underlying device is 512e (4K physical sectors with a 512-byte emulated interface)? This would keep everything working as is but allow Windows to align writes to 4K boundaries where possible.
>
> We can certainly expose that via the XenBus interface.
>>
>> James

After reading through blkback, it appears that it can only support 512-byte
sector sizes, and removing this limitation would take quite a bit
of work.
It uses hard-coded bit shifts pervasively to convert between numbers of
requests/pages and sector sizes, etc. (that is what all the >> 9
everywhere is for).
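The shifts Joseph mentions are unit conversions: the PV block protocol expresses a request's start sector, and the first/last sector of each segment within a granted page, in 512-byte units. A minimal sketch of that arithmetic, assuming 4 KiB pages; the struct is modelled loosely on the blkif segment layout and the function names are hypothetical. As far as this arithmetic goes, nothing requires the backing device itself to have 512-byte sectors; a 4K device simply needs values that are multiples of 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* the "<< 9" / ">> 9": 512-byte transport units */

/* Modelled loosely on a blkif segment: an inclusive range of 512-byte
 * sectors inside one 4 KiB granted page (so indices 0..7). */
struct seg {
    uint8_t first_sect;
    uint8_t last_sect;
};

/* Bytes carried by one segment. */
static uint32_t seg_bytes(const struct seg *s)
{
    assert(s->first_sect <= s->last_sect && s->last_sect < 8);
    return (uint32_t)(s->last_sect - s->first_sect + 1) << SECTOR_SHIFT;
}

/* Byte offset on the device of a request starting at a 512-byte sector. */
static uint64_t req_byte_offset(uint64_t sector_number)
{
    return sector_number << SECTOR_SHIFT;
}

/* Total bytes in a request made of several segments. */
static uint64_t req_bytes(const struct seg *segs, size_t nr_segs)
{
    uint64_t total = 0;

    for (size_t i = 0; i < nr_segs; i++)
        total += seg_bytes(&segs[i]);
    return total;
}
```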

I am going to see what I can do about getting it to support 4k
sectors too, and eventually decoupled logical/physical sizes, but that
would take even more work as far as I can tell.

Being able to use 4k sectors seems like it would provide pretty
massive performance gains just through greater efficiency, let alone
through a larger share of aligned writes to the underlying block storage system.

Joseph.

-- 
CTO | Orion Virtualisation Solutions | www.orionvm.com.au
Phone: 1300 56 99 52 | Mobile: 0428 754 846

* Re: bug when using 4K sectors?
From: Keir Fraser @ 2012-09-16  8:31 UTC
  To: Joseph Glanville, Konrad Rzeszutek Wilk
  Cc: James Harper, xen-devel@lists.xen.org

On 16/09/2012 08:00, "Joseph Glanville" <joseph.glanville@orionvm.com.au>
wrote:

> After reading through blkback, it appears that it can only support 512-byte
> sector sizes, and removing this limitation would take quite a bit
> of work.
> It uses hard-coded bit shifts pervasively to convert between numbers of
> requests/pages and sector sizes, etc. (that is what all the >> 9
> everywhere is for).
> 
> I am going to see what I can do about getting it to support 4k
> sectors too, and eventually decoupled logical/physical sizes, but that
> would take even more work as far as I can tell.
> 
> Being able to use 4k sectors seems like it would provide pretty
> massive performance gains just through greater efficiency, let alone
> through a larger share of aligned writes to the underlying block storage system.

The PV blk transport may be based on 512-byte sectors, but the real sector
size is communicated between blkfront and blkback via xenbus (field
'sector-size'), and blkfront is expected to make only requests that are
multiples of, and aligned to, that real 'sector-size'.
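Under the rule Keir describes, a request whose start and length are still expressed in 512-byte transport units is valid for a larger 'sector-size' only if both are whole multiples of it. A minimal sketch of such a check; the function name is hypothetical, and whether blkback performs exactly this test is not established in this thread:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: is a request (start in 512-byte transport sectors,
 * length in bytes) a multiple of, and aligned to, the advertised
 * 'sector-size' in bytes? */
static bool req_ok_for_sector_size(uint64_t sector_number, uint64_t nr_bytes,
                                   uint32_t sector_size)
{
    assert(sector_size >= 512 && sector_size % 512 == 0);
    /* Transport sectors per real sector: 4 for 2 KiB, 8 for 4 KiB. */
    uint64_t per_sector = sector_size / 512;

    return nr_bytes != 0 &&
           sector_number % per_sector == 0 &&
           nr_bytes % sector_size == 0;
}
```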

I would kind of expect it to work, as CD-ROMs have a larger sector size (2kB
IIRC) and we support those...

Bashing your head against the PV blk transport code may be premature. ;)

 -- Keir

* Re: bug when using 4K sectors?
From: Joseph Glanville @ 2012-09-16  9:00 UTC
  To: Keir Fraser; +Cc: James Harper, xen-devel@lists.xen.org, Konrad Rzeszutek Wilk

On 16 September 2012 18:31, Keir Fraser <keir.xen@gmail.com> wrote:
> On 16/09/2012 08:00, "Joseph Glanville" <joseph.glanville@orionvm.com.au>
> wrote:
>
>> After reading through blkback, it appears that it can only support 512-byte
>> sector sizes, and removing this limitation would take quite a bit
>> of work.
>> It uses hard-coded bit shifts pervasively to convert between numbers of
>> requests/pages and sector sizes, etc. (that is what all the >> 9
>> everywhere is for).
>>
>> I am going to see what I can do about getting it to support 4k
>> sectors too, and eventually decoupled logical/physical sizes, but that
>> would take even more work as far as I can tell.
>>
>> Being able to use 4k sectors seems like it would provide pretty
>> massive performance gains just through greater efficiency, let alone
>> through a larger share of aligned writes to the underlying block storage system.
>
> The PV blk transport may be based on 512-byte sectors, but the real sector
> size is communicated between blkfront and blkback via xenbus (field
> 'sector-size'), and blkfront is expected to make only requests that are
> multiples of, and aligned to, that real 'sector-size'.
>
> I would kind of expect it to work, as CD-ROMs have a larger sector size (2kB
> IIRC) and we support those...
>
> Bashing your head against the PV blk transport code may be premature. ;)
>
>  -- Keir
>
>

Understood, still have a fair bit of reading to do. :)

Thanks,
Joseph.


* Re: bug when using 4K sectors?
From: James Harper @ 2012-09-16 10:37 UTC
  To: Keir Fraser, Joseph Glanville, Konrad Rzeszutek Wilk
  Cc: xen-devel@lists.xen.org

> > Being able to use 4k sectors seems like it would provide pretty
> > massive performance gains just through greater efficiency, let alone
> > through a larger share of aligned writes to the underlying block storage system.
> 
> The PV blk transport may be based on 512-byte sectors, but the real sector
> size is communicated between blkfront and blkback via xenbus (field
> 'sector-size'), and blkfront is expected to make only requests that are
> multiples of, and aligned to, that real 'sector-size'.
> 
> I would kind of expect it to work, as CD-ROMs have a larger sector size (2kB
> IIRC) and we support those...
> 
> Bashing your head against the PV blk transport code may be premature. ;)
> 

So a sector-size of 4096 would basically make it a 512e device, allowing the underlying OS to communicate in 512-byte blocks while knowing that things work best with 4096-byte transfers aligned to multiples of 4096 bytes, right?

James

* Re: bug when using 4K sectors?
From: Keir Fraser @ 2012-09-16 11:18 UTC
  To: James Harper, Joseph Glanville, Konrad Rzeszutek Wilk
  Cc: xen-devel@lists.xen.org

On 16/09/2012 11:37, "James Harper" <james.harper@bendigoit.com.au> wrote:

>>> Being able to use 4k sectors seems like it would provide pretty
>>> massive performance gains just through greater efficiency, let alone
>>> through a larger share of aligned writes to the underlying block storage system.
>> 
>> The PV blk transport may be based on 512-byte sectors, but the real sector
>> size is communicated between blkfront and blkback via xenbus (field
>> 'sector-size'), and blkfront is expected to make only requests that are
>> multiples of, and aligned to, that real 'sector-size'.
>> 
>> I would kind of expect it to work, as CD-ROMs have a larger sector size (2kB
>> IIRC) and we support those...
>> 
>> Bashing your head against the PV blk transport code may be premature. ;)
>> 
> 
> So a sector-size of 4096 would basically make it a 512e device, allowing the
> underlying OS to communicate in 512-byte blocks while knowing that things
> work best with 4096-byte transfers aligned to multiples of 4096 bytes,
> right?

My recollection is that blkfront is required to submit only appropriately
sized and aligned requests; i.e. it's not merely advisory. I remember this
got added for CD-ROM support, and if CD-ROMs had worked without it, I'm sure
we wouldn't have bothered!
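Since the requirement is mandatory rather than advisory, a frontend whose guest OS might issue smaller I/O would have to widen each request to whole sectors itself. A minimal sketch of that rounding, assuming a power-of-two 'sector-size'; the type and function names are hypothetical and do not come from blkfront:

```c
#include <assert.h>
#include <stdint.h>

/* A byte range [start, end) on the virtual disk. */
struct range {
    uint64_t start;
    uint64_t end;
};

/* Widen a byte range outward to whole sectors of 'sector_size' bytes,
 * which must be a power of two (512, 2048, 4096, ...). */
static struct range align_out(struct range r, uint32_t sector_size)
{
    assert(sector_size != 0 && (sector_size & (sector_size - 1)) == 0);
    uint64_t mask = (uint64_t)sector_size - 1;
    struct range out = {
        .start = r.start & ~mask,        /* round start down */
        .end   = (r.end + mask) & ~mask, /* round end up     */
    };
    return out;
}
```

The transport start sector would then be out.start >> 9. Any bytes the widened range covers beyond the original request must be read first and written back unchanged, so the read-modify-write moves into the guest.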

 -- Keir

> James
> 

* Re: bug when using 4K sectors?
From: James Harper @ 2012-09-16 11:21 UTC
  To: Keir Fraser, Joseph Glanville, Konrad Rzeszutek Wilk
  Cc: xen-devel@lists.xen.org

> > So a sector-size of 4096 would basically make it a 512e device, allowing
> > the underlying OS to communicate in 512-byte blocks while knowing that
> > things work best with 4096-byte transfers aligned to
> > multiples of 4096 bytes, right?
> 
> My recollection is that blkfront is required to submit only appropriately sized
> and aligned requests; i.e. it's not merely advisory. I remember this got
> added for CD-ROM support, and if CD-ROMs had worked without it, I'm sure we
> wouldn't have bothered!
> 

That's a shame. It would be good to have separate values for physical and logical block sizes so the guest VM can make appropriate alignment decisions. In fact, there is a lot of information in /sys for block devices that it would be nice to have mapped into xenstore!
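Linux already reports both values for its own block devices, as queue/logical_block_size and queue/physical_block_size under /sys/block/<dev>/. A minimal sketch of how a guest could use such a pair if a backend advertised both; the struct, enum and function names are hypothetical, and no particular xenstore keys are assumed to exist:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of values a backend could advertise: the size the
 * guest must use, and the size the storage prefers. */
struct blk_geometry {
    uint32_t logical;  /* e.g. 512  */
    uint32_t physical; /* e.g. 4096 */
};

enum io_fit {
    IO_INVALID, /* not a whole number of logical blocks          */
    IO_PARTIAL, /* valid, but storage may need read-modify-write */
    IO_OPTIMAL, /* aligned to and sized in physical blocks       */
};

/* Classify an I/O given its byte offset and length. */
static enum io_fit classify_io(const struct blk_geometry *g,
                               uint64_t offset, uint64_t len)
{
    assert(g->logical != 0 && g->physical % g->logical == 0);
    if (len == 0 || offset % g->logical != 0 || len % g->logical != 0)
        return IO_INVALID;
    if (offset % g->physical != 0 || len % g->physical != 0)
        return IO_PARTIAL;
    return IO_OPTIMAL;
}
```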

James

* Re: bug when using 4K sectors?
From: Alan Cox @ 2012-09-16 11:27 UTC
  To: Keir Fraser
  Cc: Joseph Glanville, James Harper, xen-devel@lists.xen.org,
	Konrad Rzeszutek Wilk

> I would kind of expect it to work, as CD-ROMs have a larger sector size (2kB
> IIRC) and we support those...

For data blocks they are 2K, as are some magneto-opticals.

The more complicated case is modern hard disks: while you can access them
on 512-byte boundaries, they actually use bigger block sizes, and the
large blocks are not necessarily on the 0 boundary, in order to get
optimal alignment for existing file systems and partitioning.

So knowing the block size isn't the whole story.
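Alan's point can be made concrete. Suppose, purely for illustration, a drive that lays out its 4K physical blocks so that one begins at the classic DOS partition start of LBA 63 (byte 32256) rather than at byte 0. Then "offset is a multiple of 4096" is the wrong test. A sketch with a hypothetical phys_origin parameter; Linux exposes a related per-device value as alignment_offset in sysfs, but its exact sign convention is not assumed here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if byte 'offset' starts a physical block, given that some
 * physical block starts at byte 'phys_origin' (hypothetical parameter)
 * and physical blocks are 'physical' bytes long. */
static bool phys_aligned(uint64_t offset, uint64_t phys_origin,
                         uint32_t physical)
{
    assert(physical != 0);
    /* Both must leave the same remainder modulo the physical size. */
    return offset % physical == phys_origin % physical;
}
```

With phys_origin = 32256, bytes 0 and 4096 are not physically aligned, while 32256 and 36352 are.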

Alan

* Re: bug when using 4K sectors?
From: James Harper @ 2012-09-16 11:50 UTC
  To: Alan Cox, Keir Fraser
  Cc: Joseph Glanville, xen-devel@lists.xen.org, Konrad Rzeszutek Wilk

> 
> > I would kind of expect it to work, as CD-ROMs have a larger sector
> > size (2kB IIRC) and we support those...
> 
> For data blocks they are 2K, as are some magneto-opticals.
> 
> The more complicated case is modern hard disks: while you can access them
> on 512-byte boundaries, they actually use bigger block sizes, and the large
> blocks are not necessarily on the 0 boundary, in order to get optimal
> alignment for existing file systems and partitioning.
> 
> So knowing the block size isn't the whole story.
> 

Are you saying that Xen and/or Linux need to worry about a user setting up a poorly aligned filesystem to pass to a VM? It seems simpler just to set things up right in the first place.

Or did you mean something else?

James

* Re: bug when using 4K sectors?
From: Alan Cox @ 2012-09-16 16:07 UTC
  To: James Harper
  Cc: Joseph Glanville, Keir Fraser, xen-devel@lists.xen.org,
	Konrad Rzeszutek Wilk

> > So knowing the block size isn't the whole story.
> > 
> 
> Are you saying that Xen and/or Linux need to worry about a user setting up a poorly aligned filesystem to pass to a VM? It seems simpler just to set things up right in the first place.

That assumes things like there being a file system and the existing layout
being correct. Plus, you also have to set the thing up, which means you have
to know about such stuff.

For file systems, Linux itself does indeed take the approach of "so
partition sensibly", because in the fs case it's really hard, if not
impossible, to do a good job any other way.

For raw devices, and things like databases wanting atomicity of block
writes, however, it's quite different and you need to be aware of the
alignments.

Alan

Thread overview: 13+ messages
2012-08-13 14:12 bug when using 4K sectors? James Harper
2012-09-05 20:29 ` Konrad Rzeszutek Wilk
2012-09-05 23:56   ` James Harper
2012-09-06 10:58     ` Konrad Rzeszutek Wilk
2012-09-16  7:00       ` Joseph Glanville
2012-09-16  8:31         ` Keir Fraser
2012-09-16  9:00           ` Joseph Glanville
2012-09-16 10:37           ` James Harper
2012-09-16 11:18             ` Keir Fraser
2012-09-16 11:21               ` James Harper
2012-09-16 11:27           ` Alan Cox
2012-09-16 11:50             ` James Harper
2012-09-16 16:07               ` Alan Cox
