* [Qemu-devel] Re: [PATCH v2] virtio-blk physical block size
[not found] <1262018363-15871-1-git-send-email-avi@redhat.com>
From: Rusty Russell @ 2010-01-04 3:08 UTC
To: Avi Kivity; +Cc: qemu-devel, kvm, virtualization
On Tue, 29 Dec 2009 03:09:23 am Avi Kivity wrote:
> This patch adds a physical block size attribute to virtio disks,
> corresponding to /sys/devices/.../physical_block_size. It is defined as
> the request alignment which will not trigger RMW cycles. This can be
> important for modern disks which use 4K physical sectors (though they
> still support 512-byte logical sectors), and for file-backed disk images (which
> have both the underlying filesystem block size and their own allocation
> granularity to consider).
>
> Installers use this to align partitions to physical block boundaries.
>
> Note that the spec already defined blk_size as the performance alignment
> rather than the minimum alignment. However, the driver interpreted it as the
> logical block size, so I updated the spec to match the driver, assuming that
> the driver predates the spec and that the spec's definition is the error.
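The patch description defines the physical block size as the request alignment that avoids read-modify-write (RMW) cycles, and says installers use it to align partitions. A minimal C sketch of both checks, with hypothetical function names, sizes in bytes, and the assumption that the physical block size is a power of two (true for the 512- and 4096-byte cases discussed in this thread):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a request covering [offset, offset + len) starts and ends on
 * physical block boundaries, so the device never has to read back a
 * partial physical block to merge the new data into it. */
bool avoids_rmw(uint64_t offset, uint64_t len, uint32_t phys)
{
    assert(phys != 0 && (phys & (phys - 1)) == 0); /* power of two */
    return offset % phys == 0 && len % phys == 0;
}

/* Round a proposed partition start (in bytes) up to the next physical
 * block boundary, as an installer might before writing a partition table. */
uint64_t align_partition_start(uint64_t start, uint32_t phys)
{
    assert(phys != 0 && (phys & (phys - 1)) == 0);
    return (start + phys - 1) / phys * phys;
}
```

For example, the traditional DOS layout that starts the first partition at LBA 63 (byte 32256) is not 4096-byte aligned; `align_partition_start(32256, 4096)` rounds it up to byte 32768, i.e. LBA 64.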
I thought this was what I was doing, but I have shown over and over that
I have no idea about block devices.
Our current driver treats BLK_SIZE as the logical and physical size (see
blk_queue_logical_block_size).
I have no idea what "logical" vs. "physical" actually means. Anyone? Most
importantly, is it some Linux-internal difference or a real I/O-visible
distinction?
Rusty.
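On Linux the two values are visible to userspace per device, as the sysfs attributes `logical_block_size` and `physical_block_size` under `/sys/block/<dev>/queue/`. A minimal C sketch that reads one of them; `parse_block_size` and `read_queue_attr` are hypothetical helper names, and error handling is kept deliberately simple:

```c
#include <assert.h>
#include <stdio.h>

/* Parse the single unsigned decimal value a sysfs attribute file holds.
 * Returns 0 on success, -1 if no number could be read. */
int parse_block_size(FILE *f, unsigned long *out)
{
    return fscanf(f, "%lu", out) == 1 ? 0 : -1;
}

/* Read /sys/block/<dev>/queue/<attr>, e.g. attr = "logical_block_size"
 * or "physical_block_size". Returns 0 on success, -1 on any failure. */
int read_queue_attr(const char *dev, const char *attr, unsigned long *out)
{
    assert(dev != NULL && attr != NULL && out != NULL);

    char path[256];
    int n = snprintf(path, sizeof path, "/sys/block/%s/queue/%s", dev, attr);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;                      /* name too long for the buffer */

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;                      /* no such device or attribute */

    int rc = parse_block_size(f, out);
    fclose(f);
    return rc;
}
```

Under the driver behaviour described above, both attributes of a virtio disk would presumably report the same value; on a drive that emulates 512-byte sectors on 4096-byte media they would differ.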
* [Qemu-devel] Re: [PATCH v2] virtio-blk physical block size
From: Avi Kivity @ 2010-01-04 7:02 UTC
To: Rusty Russell; +Cc: qemu-devel, kvm, virtualization
On 01/04/2010 05:08 AM, Rusty Russell wrote:
> On Tue, 29 Dec 2009 03:09:23 am Avi Kivity wrote:
>
>> This patch adds a physical block size attribute to virtio disks,
>> corresponding to /sys/devices/.../physical_block_size. It is defined as
>> the request alignment which will not trigger RMW cycles. This can be
>> important for modern disks which use 4K physical sectors (though they
>> still support 512-byte logical sectors), and for file-backed disk images (which
>> have both the underlying filesystem block size and their own allocation
>> granularity to consider).
>>
>> Installers use this to align partitions to physical block boundaries.
>>
>> Note that the spec already defined blk_size as the performance alignment
>> rather than the minimum alignment. However, the driver interpreted it as the
>> logical block size, so I updated the spec to match the driver, assuming that
>> the driver predates the spec and that the spec's definition is the error.
>>
> I thought this was what I was doing, but I have shown over and over that
> I have no idea about block devices.
>
> Our current driver treats BLK_SIZE as the logical and physical size (see
> blk_queue_logical_block_size).
>
But we want them to be different.
> I have no idea what "logical" vs. "physical" actually means. Anyone? Most
> importantly, is it some Linux-internal difference or a real I/O-visible
> distinction?
>
Yes.
Logical block size is the minimum block size the hardware will allow.
Try to write less than that, and the hardware will laugh in your face.
Physical block size is what the logical block size would have been
if software didn't suck. In theory they should be the same, but since
compatibility reasons clamp the logical block size to 512, they have to
differ. A disk may have a physical block size of 4096 and emulate a
logical block size of 512 on top of that using read-modify-write.
Or so I understand it.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
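Avi's description of 512-byte emulation on top of 4096-byte physical sectors can be modelled with a toy in-memory drive. This C sketch is an illustration only (the struct, macros and function name are hypothetical, and real drives may cache or coalesce writes): every 512-byte logical write reads the containing 4096-byte physical sector, patches 512 bytes of it, and writes the whole sector back.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LOGICAL  512u
#define PHYSICAL 4096u
#define NPHYS    4u     /* a tiny toy disk: 4 physical sectors */

/* Toy drive with 4096-byte physical sectors presenting 512-byte logical
 * sectors. Media is only ever written a whole physical sector at a time;
 * phys_writes counts those media writes. */
struct toy_disk {
    uint8_t  media[NPHYS][PHYSICAL];
    unsigned phys_writes;
};

/* Write one 512-byte logical sector by read-modify-write. */
void write_logical(struct toy_disk *d, uint32_t lba, const uint8_t *buf)
{
    uint32_t per_phys = PHYSICAL / LOGICAL;      /* 8 logical per physical */
    uint32_t p = lba / per_phys;                 /* containing physical sector */
    uint32_t off = (lba % per_phys) * LOGICAL;   /* byte offset inside it */
    uint8_t tmp[PHYSICAL];

    assert(p < NPHYS);
    memcpy(tmp, d->media[p], PHYSICAL);          /* read */
    memcpy(tmp + off, buf, LOGICAL);             /* modify */
    memcpy(d->media[p], tmp, PHYSICAL);          /* write whole sector back */
    d->phys_writes++;
}
```

In this model, eight 512-byte writes that together fill one physical sector cost eight full-sector read-modify-write cycles, whereas a single aligned 4096-byte write would need no read at all.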
* Re: [Qemu-devel] Re: [PATCH v2] virtio-blk physical block size
From: Christoph Hellwig @ 2010-01-04 8:30 UTC
To: Rusty Russell; +Cc: virtualization, Avi Kivity, kvm, qemu-devel
On Mon, Jan 04, 2010 at 01:38:51PM +1030, Rusty Russell wrote:
> I thought this was what I was doing, but I have shown over and over that
> I have no idea about block devices.
>
> Our current driver treats BLK_SIZE as the logical and physical size (see
> blk_queue_logical_block_size).
>
> I have no idea what "logical" vs. "physical" actually means. Anyone? Most
> importantly, is it some Linux-internal difference or a real I/O-visible
> distinction?
Those should be the same for any sane interface. They are for classical
disk devices with larger block sizes (MO, s390 DASD) and also for the
now-appearing 4k-sector SCSI disks. But in the IDE world people are
concerned about DOS/Windows legacy compatibility, so they came up with a
nasty hack:
- there is a physical block size as used by the disk internally
(4k initially)
- all the interfaces to the operating system still happen in the
traditional 512 byte blocks to not break any existing assumptions
- to make sure modern operating systems can optimize for the larger
physical sectors the disks expose this size, too.
- even worse, disks can also have alignment hacks for the traditional
DOS partition tables, so that the 512 byte block zero might even
have an offset into the first larger physical block. This is also
exposed in the ATA identify information.
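The last point means that an aligned partition start is not simply a multiple of the physical block size. A minimal C sketch, assuming the offset is expressed the way Linux's `alignment_offset` attribute expresses it (a byte offset x from the start of the device counts as physically aligned when x mod physical_block_size equals the offset); the function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Smallest LBA >= start whose byte offset is physically aligned, under the
 * assumed convention: byte offset x is aligned when x % phys == align_off. */
uint64_t first_aligned_lba(uint64_t start, uint32_t logical, uint32_t phys,
                           uint32_t align_off)
{
    assert(logical != 0 && phys % logical == 0 && align_off < phys);
    assert(align_off % logical == 0);

    uint64_t per_phys = phys / logical;      /* logical blocks per physical */
    uint64_t want = align_off / logical;     /* required LBA residue */
    uint64_t have = start % per_phys;        /* residue of the start LBA */
    return start + (want + per_phys - have) % per_phys;
}
```

With 512-byte logical and 4096-byte physical blocks and an alignment offset of 3584 bytes, LBA 63 (the classic DOS partition start) is already aligned; with an offset of 0, the first aligned LBA at or after 63 is 64.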
All in all I don't think this mess is a good idea to replicate in
virtio. Virtio by definition requires virtualization-aware guests, so we
should just follow the SCSI way of larger real block sizes here.
>
> Rusty.
>
---end quoted text---
* Re: [Qemu-devel] Re: [PATCH v2] virtio-blk physical block size
From: Rusty Russell @ 2010-01-05 12:56 UTC
To: Christoph Hellwig; +Cc: virtualization, Avi Kivity, kvm, qemu-devel
On Mon, 4 Jan 2010 07:00:35 pm Christoph Hellwig wrote:
> On Mon, Jan 04, 2010 at 01:38:51PM +1030, Rusty Russell wrote:
> > I thought this was what I was doing, but I have shown over and over that
> > I have no idea about block devices.
> >
> > Our current driver treats BLK_SIZE as the logical and physical size (see
> > blk_queue_logical_block_size).
> >
> > I have no idea what "logical" vs. "physical" actually means. Anyone? Most
> > importantly, is it some Linux-internal difference or a real I/O-visible
> > distinction?
>
> Those should be the same for any sane interface. They are for classical
> disk devices with larger block sizes (MO, s390 DASD) and also for the
> now-appearing 4k-sector SCSI disks. But in the IDE world people are
> concerned about DOS/Windows legacy compatibility, so they came up with a
> nasty hack:
>
> - there is a physical block size as used by the disk internally
> (4k initially)
> - all the interfaces to the operating system still happen in the
> traditional 512 byte blocks to not break any existing assumptions
> - to make sure modern operating systems can optimize for the larger
> physical sectors the disks expose this size, too.
> - even worse, disks can also have alignment hacks for the traditional
> DOS partition tables, so that the 512 byte block zero might even
> have an offset into the first larger physical block. This is also
> exposed in the ATA identify information.
>
> All in all I don't think this mess is a good idea to replicate in
> virtio. Virtio by definition requires virtualization-aware guests, so we
> should just follow the SCSI way of larger real block sizes here.
Yes. The current VIRTIO_BLK_F_BLK_SIZE says "please use this block size".
We haven't actually specified what happens if the guest doesn't, but the
spec says "must", and the Linux implementation does so AFAICT.
If we want a "soft" size, we could add that as a separate feature.
Cheers,
Rusty.
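Rusty's reading treats VIRTIO_BLK_F_BLK_SIZE as a hard requirement on the guest. A minimal C sketch of that interpretation: the feature bit number 6 matches Linux's `<linux/virtio_blk.h>`, but `struct blk_cfg` is a simplified, hypothetical stand-in for the real `struct virtio_blk_config`, and the 512-byte fallback reflects virtio-blk's 512-byte sector unit when the feature is not negotiated. The function names are also hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit number as in Linux's <linux/virtio_blk.h>. */
#define VIRTIO_BLK_F_BLK_SIZE 6

/* Simplified view of the device config: only the field used here,
 * not the real struct virtio_blk_config layout. */
struct blk_cfg {
    uint32_t blk_size;   /* meaningful only if the feature was negotiated */
};

/* Block size the guest must honour: blk_size if offered, else 512. */
uint32_t guest_block_size(uint64_t features, const struct blk_cfg *cfg)
{
    if (features & (1ULL << VIRTIO_BLK_F_BLK_SIZE))
        return cfg->blk_size;
    return 512;
}

/* Hard ("must") interpretation: a request must start and end on a
 * multiple of the negotiated block size. */
bool request_allowed(uint64_t features, const struct blk_cfg *cfg,
                     uint64_t offset, uint64_t len)
{
    uint32_t bs = guest_block_size(features, cfg);
    assert(bs != 0);
    return offset % bs == 0 && len % bs == 0;
}
```

A "soft" size, as suggested above, would be a separate field behind its own feature bit that the guest may treat as a hint, leaving this hard check unchanged.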
* Re: [Qemu-devel] Re: [PATCH v2] virtio-blk physical block size
From: Avi Kivity @ 2010-01-05 12:58 UTC
To: Rusty Russell; +Cc: virtualization, Christoph Hellwig, kvm, qemu-devel
On 01/05/2010 02:56 PM, Rusty Russell wrote:
>
>> Those should be the same for any sane interface. They are for classical
>> disk devices with larger block sizes (MO, s390 DASD) and also for the
>> now-appearing 4k-sector SCSI disks. But in the IDE world people are
>> concerned about DOS/Windows legacy compatibility, so they came up with a
>> nasty hack:
>>
>> - there is a physical block size as used by the disk internally
>> (4k initially)
>> - all the interfaces to the operating system still happen in the
>> traditional 512 byte blocks to not break any existing assumptions
>> - to make sure modern operating systems can optimize for the larger
>> physical sectors the disks expose this size, too.
>> - even worse, disks can also have alignment hacks for the traditional
>> DOS partition tables, so that the 512 byte block zero might even
>> have an offset into the first larger physical block. This is also
>> exposed in the ATA identify information.
>>
>> All in all I don't think this mess is a good idea to replicate in
>> virtio. Virtio by definition requires virtualization-aware guests, so we
>> should just follow the SCSI way of larger real block sizes here.
>>
> Yes. The current VIRTIO_BLK_F_BLK_SIZE says "please use this block size".
> We haven't actually specified what happens if the guest doesn't, but the
> spec says "must", and the Linux implementation does so AFAICT.
>
> If we want a "soft" size, we could add that as a separate feature.
>
No - I agree with Christoph, there's no reason to use a 512/4096
monstrosity with virtio.
--
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] Re: [PATCH v2] virtio-blk physical block size
From: Jamie Lokier @ 2010-01-05 20:16 UTC
To: Avi Kivity
Cc: qemu-devel, Rusty Russell, Christoph Hellwig, kvm, virtualization
Avi Kivity wrote:
> On 01/05/2010 02:56 PM, Rusty Russell wrote:
> >
> >>Those should be the same for any sane interface. They are for classical
> >>disk devices with larger block sizes (MO, s390 DASD) and also for the
> >>now-appearing 4k-sector SCSI disks. But in the IDE world people are
> >>concerned about DOS/Windows legacy compatibility, so they came up with a
> >>nasty hack:
> >>
> >> - there is a physical block size as used by the disk internally
> >> (4k initially)
> >> - all the interfaces to the operating system still happen in the
> >> traditional 512 byte blocks to not break any existing assumptions
> >> - to make sure modern operating systems can optimize for the larger
> >> physical sectors the disks expose this size, too.
> >> - even worse, disks can also have alignment hacks for the traditional
> >> DOS partition tables, so that the 512 byte block zero might even
> >> have an offset into the first larger physical block. This is also
> >> exposed in the ATA identify information.
> >>
> >>All in all I don't think this mess is a good idea to replicate in
> >>virtio. Virtio by definition requires virtualization-aware guests, so we
> >>should just follow the SCSI way of larger real block sizes here.
> >>
> >Yes. The current VIRTIO_BLK_F_BLK_SIZE says "please use this block size".
> >We haven't actually specified what happens if the guest doesn't, but the
> >spec says "must", and the Linux implementation does so AFAICT.
> >
> >If we want a "soft" size, we could add that as a separate feature.
> >
>
> No - I agree with Christoph, there's no reason to use a 512/4096
> monstrosity with virtio.
It would be good if virtio relayed the backing device's basic topology
hints, so:
- If the backing dev is a real disk with 512-byte sectors,
virtio should indicate 512-byte blocks to the guest.
- If the backing dev is a real disk with 4096-byte sectors,
virtio should indicate 4096-byte blocks to the guest.
With databases and filesystems, if you care about data integrity:
- If the backing dev is a real disk with 4096-byte sectors,
or a file whose access is through a 4096-byte-per-page cache,
virtio must indicate 4096-byte blocks; otherwise guest
journalling is not host-powerfail safe.
You get the idea. If there is only one parameter, it really should be
at least as large as the smallest unit which may be corrupted by
writes when errors occur.
-- Jamie
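Jamie's rule of thumb, that a single advertised size should be at least the smallest unit a failed write may corrupt, could be sketched in C as follows. The `struct backing` description and the function name are hypothetical, and treating the host page size as the corruption unit for page-cache-backed files follows Jamie's message rather than any measured behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of what backs a virtual disk. */
struct backing {
    uint32_t sector_size;     /* physical sector size of the host disk */
    bool     via_page_cache;  /* file accessed through the host page cache */
    uint32_t page_size;       /* host page size, used if via_page_cache */
};

/* One block size to advertise: at least as large as the smallest unit
 * that a failed or interrupted write could corrupt. */
uint32_t advertised_block_size(const struct backing *b)
{
    uint32_t bs = b->sector_size;
    if (b->via_page_cache && b->page_size > bs)
        bs = b->page_size;
    assert(bs != 0 && (bs & (bs - 1)) == 0);   /* expect a power of two */
    return bs;
}
```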
* Re: [Qemu-devel] Re: [PATCH v2] virtio-blk physical block size
From: Jamie Lokier @ 2010-01-05 20:18 UTC
To: Avi Kivity; +Cc: Rusty Russell, qemu-devel, kvm, virtualization
Avi Kivity wrote:
> Physical block size is what the logical block size would have been
> if software didn't suck. In theory they should be the same, but since
> compatibility reasons clamp the logical block size to 512, they have to
> differ. A disk may have a physical block size of 4096 and emulate a
> logical block size of 512 on top of that using read-modify-write.
>
> Or so I understand it.
I think that's right, but a side effect is that if you get a power
failure during the read-modify-write, bytes anywhere in the 4096-byte sector
may be incorrect, so journalling (etc.) needs to use 4096-byte blocks
for data integrity, even though the drive emulates smaller writes.
-- Jamie
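Jamie's point can be made concrete by computing which bytes are at risk when a write is interrupted on a drive that emulates smaller writes by read-modify-write of larger physical sectors. A minimal C sketch, with a hypothetical function name, assuming the whole of every physical sector touched by the write may be damaged:

```c
#include <assert.h>
#include <stdint.h>

/* Byte range [*lo, *hi) that may hold incorrect data if power fails while
 * writing [offset, offset + len) to a device that emulates smaller writes
 * by read-modify-write of phys-sized sectors. */
void at_risk_range(uint64_t offset, uint64_t len, uint32_t phys,
                   uint64_t *lo, uint64_t *hi)
{
    assert(phys != 0 && len != 0);
    *lo = offset / phys * phys;                      /* round start down */
    *hi = (offset + len + phys - 1) / phys * phys;   /* round end up */
}
```

For instance, a 512-byte journal commit record written at byte 1024 of a 4096-byte-sector drive puts bytes 0 to 4095 at risk, which is why the journal should use 4096-byte blocks.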
* Re: [Qemu-devel] Re: [PATCH v2] virtio-blk physical block size
From: Christoph Hellwig @ 2010-01-08 15:40 UTC
To: Jamie Lokier
Cc: kvm, Rusty Russell, qemu-devel, virtualization, Avi Kivity,
Christoph Hellwig
On Tue, Jan 05, 2010 at 08:16:15PM +0000, Jamie Lokier wrote:
> It would be good if virtio relayed the backing device's basic topology
> hints, so:
>
> - If the backing dev is a real disk with 512-byte sectors,
> virtio should indicate 512-byte blocks to the guest.
>
> - If the backing dev is a real disk with 4096-byte sectors,
> virtio should indicate 4096-byte blocks to the guest.
>
> With databases and filesystems, if you care about data integrity:
>
> - If the backing dev is a real disk with 4096-byte sectors,
> or a file whose access is through a 4096-byte-per-page cache,
> virtio must indicate 4096-byte blocks; otherwise guest
> journalling is not host-powerfail safe.
>
> You get the idea. If there is only one parameter, it really should be
> at least as large as the smallest unit which may be corrupted by
> writes when errors occur.
It's not that easy. IDE only supports larger sector sizes with the
physical sector size attribute, not native larger sectors. While SCSI
does support native larger sectors, it's atypical, and I would not expect
guests to always get it right. So the best approach is to use the
transport-native way to express that we have larger sectors and expect
the guest to do the right thing.
I did some work on autodetecting larger sector sizes a while ago.
But now that people have brought up migration, I wonder if that makes
sense: if we migrate from a disk with 512-byte sectors to one with
4096-byte sectors, we can't simply change guest-visible attributes.
Maybe we should pick one at image creation and then stick to it. For an
image format we could record this information in the image, but for
raw images that's impossible.
* Re: [Qemu-devel] Re: [PATCH v2] virtio-blk physical block size
From: Avi Kivity @ 2010-01-10 12:35 UTC
To: Christoph Hellwig; +Cc: Rusty Russell, qemu-devel, kvm, virtualization
On 01/08/2010 05:40 PM, Christoph Hellwig wrote:
> Maybe we should pick one at image creation and then stick to it. For an
> image format we could record this information in the image, but for
> raw images that's impossible.
>
The management system should remember it (like it remembers which images
belong to which guest, and how to expose them).
--
error compiling committee.c: too many arguments to function
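Christoph suggests fixing the guest-visible block size when an image is created, and Avi suggests that the management system remember it. One way to sketch that in C: a hypothetical per-image record plus a placement check before migrating. The check assumes direct (O_DIRECT-style) host I/O, where each request must be a multiple of the destination device's logical block size; a host could instead emulate smaller blocks with read-modify-write, at a cost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record kept by a management layer for each image: the
 * block sizes chosen at creation and exposed to the guest ever since,
 * independent of whatever host storage currently backs the image. */
struct image_record {
    uint32_t guest_logical;    /* e.g. 512 or 4096, fixed at creation */
    uint32_t guest_physical;   /* exposed unchanged; not used by the check */
};

/* Can the image be placed (or migrated) onto storage whose logical block
 * size is dest_logical without changing guest-visible sizes? Every guest
 * request is a multiple of guest_logical, so that must be a multiple of
 * dest_logical for direct I/O to work. */
bool placement_ok(const struct image_record *r, uint32_t dest_logical)
{
    assert(dest_logical != 0);
    return r->guest_logical % dest_logical == 0;
}
```

Under this rule an image created with 4096-byte blocks can move between 512- and 4096-byte hosts, while one created with 512-byte blocks cannot be served directly from storage with 4096-byte logical blocks.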