* Re: booting from virtio-blk
[not found] ` <47F24ABA.3020404@us.ibm.com>
@ 2008-04-01 16:13 ` Hollis Blanchard
2008-04-01 17:05 ` Anthony Liguori
2008-04-01 17:09 ` Anthony Liguori
0 siblings, 2 replies; 8+ messages in thread
From: Hollis Blanchard @ 2008-04-01 16:13 UTC (permalink / raw)
To: Anthony Liguori
Cc: kvm-ppc-devel, Christian Ehrhardt, Rusty Russell, kvm-devel
[-- Attachment #1: Type: text/plain, Size: 2579 bytes --]
On Tue, 2008-04-01 at 09:46 -0500, Anthony Liguori wrote:
> Hollis Blanchard wrote:
> > On Tue, 2008-04-01 at 14:08 +0200, Christian Ehrhardt wrote:
> >> bash-3.00# cat /proc/partitions
> >> major minor #blocks name
> >> [...]
> >> 254 0 22517998136852480 vda <- ?broken?
> >>
> >
> > My guess is this is run-of-the-mill endianness mismatch.
> > 22517998136852480 = 0x00500000_00000000, which 64-bit byteswapped would
> > be 0x5000, and that's probably a reasonable number of 512-byte blocks.
> > Is your disk image 10MB?
> >
> > Why would we have a problem, since both guest and host are big-endian?
> > Because virtio is a PCI device, and PCI MMIO are LE, so
> > __virtio_config_val() in the guest is (correctly) using le64_to_cpu().
> >
> > Why didn't we have problems with virtio-net? Because virtio-net doesn't
> > seem to have anything interesting in PCI config space. virtio-blk's
> > config space contains the capacity and a few other pieces of
> > information.
> >
> > The fix needs to be in qemu, and given the lack of qemu endianness
> > infrastructure, I'm afraid it will be a hack. See
> > http://svn.savannah.nongnu.org/viewvc/trunk/hw/e1000.c?root=qemu&r1=4046&r2=4045&pathrev=4046 for reference. We all know that TARGET_WORDS_BIGENDIAN is totally wrong, but unfortunately it also seems to be the only (accidentally) working solution in qemu without major IO system rework. :(
>
> It's actually not so bad since the virtio config space is already read
> one byte at a time. The following should help.
>
> diff --git a/qemu/hw/virtio-blk.c b/qemu/hw/virtio-blk.c
> index 0f55d2a..492bd7f 100644
> --- a/qemu/hw/virtio-blk.c
> +++ b/qemu/hw/virtio-blk.c
> @@ -134,8 +134,8 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uin
>      int64_t capacity;
>
>      bdrv_get_geometry(s->bs, &capacity);
> -    blkcfg.capacity = capacity;
> -    blkcfg.seg_max = 128 - 2;
> +    blkcfg.capacity = cpu_to_le64(capacity);
> +    blkcfg.seg_max = cpu_to_le32(128 - 2);
>      memcpy(config, &blkcfg, sizeof(blkcfg));
>  }
Thanks Anthony, you've saved me a lot of debug time! Rusty, doing 64-bit
PCI config space accesses with ioread8() definitely violates the
principle of least surprise, and would have taken me a long time to
track down. :(
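As a sanity check on the byte-swap arithmetic quoted above, here is a minimal C sketch. The helper names bswap64_sketch() and sectors_to_mib() are made up for illustration and are not taken from qemu or the kernel. It follows the quoted reasoning in treating the value as a count of 512-byte sectors. Note that /proc/partitions normally reports 1 KiB blocks rather than 512-byte sectors, so the underlying capacity field may have been twice this value; the sketch only reproduces the arithmetic as quoted.

```c
#include <stdint.h>

/* Illustrative helper (not a qemu or kernel function): reverse the
 * order of the 8 bytes in a 64-bit value. */
static inline uint64_t bswap64_sketch(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);  /* move the lowest byte of v to the top of r */
        v >>= 8;
    }
    return r;
}

/* Illustrative helper: convert a count of 512-byte sectors to MiB. */
static inline uint64_t sectors_to_mib(uint64_t sectors)
{
    return sectors * 512 / (1024 * 1024);
}
```

Feeding the reported value 22517998136852480 (0x0050000000000000) through bswap64_sketch() gives 0x5000, and sectors_to_mib(0x5000) gives 10, which matches the 10MB guess.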
Attached is a boot log of a PowerPC guest booting from virtio-blk root.
"ramdisk_image" is the standard ~4MB image provided with DENX Embedded
Linux Development Kit. Booting is also *way* faster than NFS root (a few
seconds to get to a shell :) .
--
Hollis Blanchard
IBM Linux Technology Center
[-- Attachment #2: virtio-blk.log --]
[-- Type: text/x-log, Size: 5196 bytes --]
bash-3.00# ./qemu-system-ppcemb -M bamboo -nographic -kernel ../../uImage.bamboo -L ../pc-bios/ -append "root=/dev/vda rw debug" -net nic,model=virtio -net tap -drive file=/images/ramdisk_image,if=virtio,boot=on
bamboo_init: START
Ram size passed is: 144 MB
Calling function ppc440_init
setup mmio
setup universal controller
trying to setup sdram controller
sdram_unmap_bcr: Unmap RAM area 0000000000000000 00400000
sdram_unmap_bcr: Unmap RAM area 0000000000000000 00400000
sdram_set_bcr: Map RAM area 0000000000000000 08000000
sdram_set_bcr: Map RAM area 0000000000000000 01000000
Initializing first serial port
ppc405_serial_init: offset 0000000000000300
Done calling ppc440_init
bamboo_init: load kernel
kernel is at guest address: 0x0
bamboo_init: load device tree file
device tree address is at guest address: 0x2b2100
bamboo_init: loading kvm registers
bamboo_init: DONE
Using Bamboo machine description
Linux version 2.6.25-rc3-hg1858cec8eb87-dirty (hollisb@basalt) (gcc version 3.4.2) #152 Tue Apr 1 10:52:01 CDT 2008
Found legacy serial port 0 for /plb/opb/serial@ef600300
mem=ef600300, taddr=ef600300, irq=0, clk=11059200, speed=115200
Found legacy serial port 1 for /plb/opb/serial@ef600400
mem=ef600400, taddr=ef600400, irq=0, clk=11059200, speed=0
console [udbg0] enabled
Entering add_active_range(0, 0, 36864) 0 entries of 256 used
setup_arch: bootmem
arch: exit
Top of RAM: 0x9000000, Total RAM: 0x9000000
Memory hole size: 0MB
Zone PFN ranges:
DMA 0 -> 36864
Normal 36864 -> 36864
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
0: 0 -> 36864
On node 0 totalpages: 36864
DMA zone: 288 pages used for memmap
DMA zone: 0 pages reserved
DMA zone: 36576 pages, LIFO batch:7
Normal zone: 0 pages used for memmap
Movable zone: 0 pages used for memmap
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 36576
Kernel command line: root=/dev/vda rw debug
irq: Allocated host of type 2 @0xc03f3880
UIC0 (32 IRQ sources) at DCR 0xc0
irq: Default host set to @0xc03f3880
PID hash table entries: 1024 (order: 10, 4096 bytes)
time_init: decrementer frequency = 666.666660 MHz
time_init: processor frequency = 666.666660 MHz
clocksource: timebase mult[600000] shift[22] registered
clockevent: decrementer mult[aaaa] shift[16] cpu[0]
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
Memory: 143060k/147456k available (2632k kernel code, 4252k reserved, 100k data, 125k bss, 132k init)
SLUB: Genslabs=10, HWalign=32, Order=0-1, MinObjects=4, CPUs=1, Nodes=1
Calibrating delay loop... 2490.36 BogoMIPS (lpj=4980736)
Mount-cache hash table entries: 512
net_namespace: 156 bytes
NET: Registered protocol family 16
PCI host bridge /plb/pci@ec000000 (primary) ranges:
MEM 0x00000000a0000000..0x00000000bfffffff -> 0x00000000a0000000
IO 0x00000000e8000000..0x00000000e800ffff -> 0x0000000000000000
4xx PCI DMA offset set to 0x00000000
PCI: Probing PCI hardware
PCI: Hiding 4xx host bridge resources 0000:00:00.0
irq: irq_create_mapping(0xc03f3880, 0x1c)
irq: -> using host @c03f3880
irq: -> obtained virq 28
irq: irq_create_mapping(0xc03f3880, 0x1b)
irq: -> using host @c03f3880
irq: -> obtained virq 27
Time: timebase clocksource has been installed.
NET: Registered protocol family 2
IP route cache hash table entries: 2048 (order: 1, 8192 bytes)
TCP established hash table entries: 8192 (order: 4, 65536 bytes)
TCP bind hash table entries: 8192 (order: 3, 32768 bytes)
TCP: Hash tables configured (established 8192 bind 8192)
TCP reno registered
irq: irq_create_mapping(0xc03f3880, 0x0)
irq: -> using host @c03f3880
irq: -> obtained virq 16
irq: irq_create_mapping(0xc03f3880, 0x1)
irq: -> using host @c03f3880
irq: -> obtained virq 17
io scheduler noop registered
io scheduler anticipatory registered (default)
io scheduler deadline registered
io scheduler cfq registered
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250.0: ttyS0 at MMIO 0xef600300 (irq = 16) is a 16450
console handover: boot [udbg0] -> real [ttyS0]
irq: irq_create_mapping(0xc03f3880, 0x0)
irq: -> using host @c03f3880
irq: -> existing mapping on virq 16
ef600300.serial: ttyS0 at MMIO 0xef600300 (irq = 16) is a 16450
irq: irq_create_mapping(0xc03f3880, 0x1)
irq: -> using host @c03f3880
irq: -> existing mapping on virq 17
brd: module loaded
Intel(R) PRO/1000 Network Driver - version 7.3.20-k2
Copyright (c) 1999-2006 Intel Corporation.
pcnet32.c:v1.34 14.Aug.2007 tsbogend@alpha.franken.de
tun: Universal TUN/TAP device driver, 1.6
tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
PCI: Enabling device 0000:00:01.0 (0000 -> 0001)
vda: unknown partition table
PCI: Enabling device 0000:00:02.0 (0000 -> 0001)
TCP cubic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
VFS: Mounted root (ext2 filesystem).
Freeing unused kernel memory: 132k init
root:~> ### Application running ...
root:~> ls
bin etc home linuxrc sbin usr
dev ftp lib proc tmp var
root:~>
[-- Attachment #3: Type: text/plain, Size: 278 bytes --]
-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
[-- Attachment #4: Type: text/plain, Size: 158 bytes --]
_______________________________________________
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel
* Re: booting from virtio-blk
2008-04-01 16:13 ` booting from virtio-blk Hollis Blanchard
@ 2008-04-01 17:05 ` Anthony Liguori
2008-04-01 17:09 ` Anthony Liguori
1 sibling, 0 replies; 8+ messages in thread
From: Anthony Liguori @ 2008-04-01 17:05 UTC (permalink / raw)
To: Hollis Blanchard
Cc: kvm-ppc-devel, Christian Ehrhardt, Rusty Russell, kvm-devel
Hollis Blanchard wrote:
> On Tue, 2008-04-01 at 09:46 -0500, Anthony Liguori wrote:
>
> Thanks Anthony, you've saved me a lot of debug time! Rusty, doing 64-bit
> PCI config space accesses with ioread8() definitely violates the
> principle of least surprise, and would have taken me a long time to
> track down. :(
>
> Attached is a boot log of a PowerPC guest booting from virtio-blk root.
>
> "ramdisk_image" is the standard ~4MB image provided with DENX Embedded
> Linux Development Kit. Booting is also *way* faster than NFS root (a few
> seconds to get to a shell :) .
>
That suggests you have vmexit latency issues. A 4MB disk is pretty much
entirely cacheable in memory, so you probably end up with only a handful
of requests to get the full disk into memory. Conversely, when using
NFS, every single filesystem operation results in multiple packets
being delivered/received. To complicate matters further, NFS means you
won't be doing any dentry caching, so every single filesystem access will
result in requests as opposed to just the first access.
What sort of ping latency do you get with virtio-net?
Regards,
Anthony Liguori
* Re: booting from virtio-blk
2008-04-01 16:13 ` booting from virtio-blk Hollis Blanchard
2008-04-01 17:05 ` Anthony Liguori
@ 2008-04-01 17:09 ` Anthony Liguori
2008-04-01 20:36 ` [kvm-ppc-devel] " Benjamin Herrenschmidt
1 sibling, 1 reply; 8+ messages in thread
From: Anthony Liguori @ 2008-04-01 17:09 UTC (permalink / raw)
To: Hollis Blanchard
Cc: kvm-ppc-devel, Christian Ehrhardt, Rusty Russell, kvm-devel
Hollis Blanchard wrote:
>
> Thanks Anthony, you've saved me a lot of debug time! Rusty, doing 64-bit
> PCI config space accesses with ioread8() definitely violates the
> principle of least surprise, and would have taken me a long time to
> track down. :(
>
It's the unfortunate side-effect of using PCI config space without
passing its semantics through to the virtio devices. Right now, you do
a config_get which is basically a memcpy. If we didn't do accesses with
ioread8(), you could potentially have a caller that did a config_get()
of size 4 that didn't intend on having endian conversion applied.
The other option would have been to provide config_get() and
config_get8/16/32/64(), the latter performing endian conversion.
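To make the two options concrete, here is a rough C sketch of a raw byte-wise copy next to typed accessors that decode little-endian fields. All names (cfg_get_raw, cfg_get_le, cfg_get8/16/32/64) and the plain byte array standing in for the device's config window are assumptions for illustration only; a real guest would issue one ioread8() per byte, and this is not the actual virtio API. No bounds checking is done.

```c
#include <stddef.h>
#include <stdint.h>

/* Raw copy, comparable in spirit to a memcpy-like config_get(): the
 * caller receives the bytes exactly as the device laid them out, with
 * no endian conversion. 'cfg' is a hypothetical stand-in for the
 * config window; each array read models one 1-byte access. */
static inline void cfg_get_raw(const uint8_t *cfg, size_t off,
                               void *buf, size_t len)
{
    uint8_t *out = buf;
    for (size_t i = 0; i < len; i++)
        out[i] = cfg[off + i];
}

/* Assemble an n-byte little-endian field into a host integer. Shifting
 * by byte position gives the same result on big- and little-endian
 * hosts, so no explicit swap is needed. */
static inline uint64_t cfg_get_le(const uint8_t *cfg, size_t off, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)cfg[off + i] << (8 * i);
    return v;
}

/* Typed accessors: these are the variants that apply endian conversion. */
static inline uint8_t cfg_get8(const uint8_t *cfg, size_t off)
{
    return (uint8_t)cfg_get_le(cfg, off, 1);
}

static inline uint16_t cfg_get16(const uint8_t *cfg, size_t off)
{
    return (uint16_t)cfg_get_le(cfg, off, 2);
}

static inline uint32_t cfg_get32(const uint8_t *cfg, size_t off)
{
    return (uint32_t)cfg_get_le(cfg, off, 4);
}

static inline uint64_t cfg_get64(const uint8_t *cfg, size_t off)
{
    return cfg_get_le(cfg, off, 8);
}
```

With this split, a caller that wants an opaque blob uses the raw copy, while a caller that wants a number picks the accessor matching the field width and never has to think about host byte order.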
Regards,
Anthony Liguori
> Attached is a boot log of a PowerPC guest booting from virtio-blk root.
>
> "ramdisk_image" is the standard ~4MB image provided with DENX Embedded
> Linux Development Kit. Booting is also *way* faster than NFS root (a few
> seconds to get to a shell :) .
>
>
* Re: [kvm-ppc-devel] booting from virtio-blk
2008-04-01 17:09 ` Anthony Liguori
@ 2008-04-01 20:36 ` Benjamin Herrenschmidt
2008-04-01 21:03 ` Anthony Liguori
0 siblings, 1 reply; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2008-04-01 20:36 UTC (permalink / raw)
To: Anthony Liguori; +Cc: kvm-ppc-devel, kvm-devel, Rusty Russell, Hollis Blanchard
On Tue, 2008-04-01 at 12:09 -0500, Anthony Liguori wrote:
> It's the unfortunate side-effect of using PCI config space without
> passing its semantics through to the virtio devices. Right now, you do
> a config_get which is basically a memcpy. If we didn't do accesses with
> ioread8(), you could potentially have a caller that did a config_get()
> of size 4 that didn't intend on having endian conversion applied.
>
> The other option would have been to provide config_get() and
> config_get8/16/32/64(), the latter performing endian conversion.
Config space should be 8/16/32. Is that ever bridged to real PCI config
space anyway ? Or only virtio ? And it should be endian swapped at the
low level, either by your HV calls or by the low level kernel. Always.
That's how PCI config space is supposed to work.
Ben.
* Re: [kvm-ppc-devel] booting from virtio-blk
2008-04-01 20:36 ` [kvm-ppc-devel] " Benjamin Herrenschmidt
@ 2008-04-01 21:03 ` Anthony Liguori
2008-04-01 21:14 ` Benjamin Herrenschmidt
2008-04-01 21:18 ` Hollis Blanchard
0 siblings, 2 replies; 8+ messages in thread
From: Anthony Liguori @ 2008-04-01 21:03 UTC (permalink / raw)
To: benh; +Cc: kvm-ppc-devel, kvm-devel, Rusty Russell, Hollis Blanchard
Benjamin Herrenschmidt wrote:
> On Tue, 2008-04-01 at 12:09 -0500, Anthony Liguori wrote:
>
>
>> It's the unfortunate side-effect of using PCI config space without
>> passing its semantics through to the virtio devices. Right now, you do
>> a config_get which is basically a memcpy. If we didn't do accesses with
>> ioread8(), you could potentially have a caller that did a config_get()
>> of size 4 that didn't intend on having endian conversion applied.
>>
>> The other option would have been to provide config_get() and
>> config_get8/16/32/64(), the latter performing endian conversion.
>>
>
> Config space should be 8/16/32. Is that ever bridged to real PCI config
> space anyway ? Or only virtio ? And it should be endian swapped at the
> low level, either by your HV calls or by the low level kernel. Always.
> That's how PCI config space is supposed to work.
>
I guess the point is that virtio config space is an abstraction, and
the PCI-based implementation converts all accesses into a series of
8-bit accesses. The virtio config space happens to be little endian,
just like the PCI config space.
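On the device side, "config space is little endian" means fields are stored least-significant byte first whatever the host's byte order is. The patch earlier in the thread gets this with cpu_to_le64()/cpu_to_le32() on the struct fields; the sketch below shows the same idea byte by byte. The helper names and the field offsets (capacity at 0, seg_max at 8) are assumptions for illustration, not the real virtio-blk config layout.

```c
#include <stdint.h>

/* Store a 32-bit value least-significant byte first, independent of
 * the host's native byte order. */
static inline void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Store a 64-bit value least-significant byte first. */
static inline void put_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Hypothetical config writer: the offsets used here are made up for
 * this sketch. 'config' must point to at least 12 writable bytes. */
static inline void fill_config_sketch(uint8_t *config,
                                      uint64_t capacity_sectors,
                                      uint32_t seg_max)
{
    put_le64(config + 0, capacity_sectors);
    put_le32(config + 8, seg_max);
}
```

Because the buffer is built explicitly, a guest reading it one byte at a time sees the same bytes on big-endian and little-endian hosts.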
Regards,
Anthony Liguori
> Ben.
>
>
* Re: [kvm-ppc-devel] booting from virtio-blk
2008-04-01 21:03 ` Anthony Liguori
@ 2008-04-01 21:14 ` Benjamin Herrenschmidt
2008-04-01 21:18 ` Hollis Blanchard
1 sibling, 0 replies; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2008-04-01 21:14 UTC (permalink / raw)
To: Anthony Liguori; +Cc: kvm-ppc-devel, kvm-devel, Rusty Russell, Hollis Blanchard
> > Config space should be 8/16/32. Is that ever bridged to real PCI config
> > space anyway ? Or only virtio ? And it should be endian swapped at the
> > low level, either by your HV calls or by the low level kernel. Always.
> > That's how PCI config space is supposed to work.
> >
>
> I guess the point is that virtio config space is an abstraction, and
> the PCI-based implementation converts all accesses into a series of
> 8-bit accesses. The virtio config space happens to be little endian,
> just like the PCI config space.
But PCI does -not- convert all accesses into a series of 8-bit
accesses :-)
Ben.
* Re: [kvm-ppc-devel] booting from virtio-blk
2008-04-01 21:03 ` Anthony Liguori
2008-04-01 21:14 ` Benjamin Herrenschmidt
@ 2008-04-01 21:18 ` Hollis Blanchard
2008-04-01 21:24 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 8+ messages in thread
From: Hollis Blanchard @ 2008-04-01 21:18 UTC (permalink / raw)
To: Anthony Liguori; +Cc: kvm-ppc-devel, benh, Rusty Russell, kvm-devel
On Tue, 2008-04-01 at 16:03 -0500, Anthony Liguori wrote:
> Benjamin Herrenschmidt wrote:
> > On Tue, 2008-04-01 at 12:09 -0500, Anthony Liguori wrote:
> >
> >
> >> It's the unfortunate side-effect of using PCI config space without
> >> passing its semantics through to the virtio devices. Right now, you do
> >> a config_get which is basically a memcpy. If we didn't do accesses with
> >> ioread8(), you could potentially have a caller that did a config_get()
> >> of size 4 that didn't intend on having endian conversion applied.
> >>
> >> The other option would have been to provide config_get() and
> >> config_get8/16/32/64(), the latter performing endian conversion.
> >>
> >
> > Config space should be 8/16/32. Is that ever bridged to real PCI config
> > space anyway ? Or only virtio ? And it should be endian swapped at the
> > low level, either by your HV calls or by the low level kernel. Always.
> > That's how PCI config space is supposed to work.
Virtio accesses will not be bridged to real PCI space.
> I guess the point is that virtio config space is an abstraction, and
> the PCI-based implementation converts all accesses into a series of
> 8-bit accesses. The virtio config space happens to be little endian,
> just like the PCI config space.
The point is that a virtio device appears as a PCI device. Like all
other PCI devices, it has config space. Unlike all other PCI devices,
its config space is accessed with 1-byte reads.
--
Hollis Blanchard
IBM Linux Technology Center
* Re: [kvm-ppc-devel] booting from virtio-blk
2008-04-01 21:18 ` Hollis Blanchard
@ 2008-04-01 21:24 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2008-04-01 21:24 UTC (permalink / raw)
To: Hollis Blanchard; +Cc: kvm-ppc-devel, kvm-devel, Rusty Russell
On Tue, 2008-04-01 at 16:18 -0500, Hollis Blanchard wrote:
> The point is that a virtio device appears as a PCI device. Like all
> other PCI devices, it has config space. Unlike all other PCI devices,
> its config space is accessed with 1-byte reads.
Which is weirdo ... if you guys make it look like PCI, then -really-
make it look like PCI :-)
Ben.