* [ANNOUNCE] Native Linux KVM tool v2
@ 2011-06-15 15:53 Pekka Enberg
2011-06-15 16:30 ` Avi Kivity
` (4 more replies)
0 siblings, 5 replies; 43+ messages in thread
From: Pekka Enberg @ 2011-06-15 15:53 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Avi Kivity, Andrew Morton, Linus Torvalds, Ingo Molnar,
Prasad Joshi, Sasha Levin, Cyrill Gorcunov, Asias He
Hi all,
We’re proud to announce the second version of the Native Linux KVM tool! We’re
now officially aiming to merge into mainline in 3.1.
Highlights:
- Experimental GUI support using SDL and VNC
- SMP support. tools/kvm/ now has a highly scalable, largely lockless driver
interface, and the individual drivers use fine-grained locks.
- TAP-based virtio networking
- Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See the
following URL for test result details: https://gist.github.com/1026888
- Virtio-9p support for host filesystem access in guests (see the mount example after this list)
- Virtio Random Number Generator
- Host block devices as in-memory copy-on-write guest images on 64-bit hosts
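A quick illustration of the virtio-9p item above: inside the guest, a host share is
mounted with the 9p filesystem over virtio. The mount tag ("hostfs") is only a
placeholder here, and the guest kernel needs 9p support (CONFIG_NET_9P,
CONFIG_NET_9P_VIRTIO, CONFIG_9P_FS); check the tool's 9p option for the tag it
actually exports:

mount -t 9p -o trans=virtio hostfs /mnt/host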
1. To try out the tool, clone the git repository:
git clone git://github.com/penberg/linux-kvm.git
or alternatively, if you already have a kernel source tree:
git remote add kvm-tool git://github.com/penberg/linux-kvm.git
git remote update
git checkout -b kvm-tool kvm-tool/master
2. Compile the tool:
cd tools/kvm && make
3. Download a guest disk image:
Minimal:
wget http://wiki.qemu.org/download/linux-0.2.img.bz2 && bunzip2 linux-0.2.img.bz2
Debian Squeeze QCOW2 image:
wget http://people.debian.org/~aurel32/qemu/i386/debian_squeeze_i386_standard.qcow2
4. Build a kernel with the following options:
CONFIG_VIRTIO_BLK=y
CONFIG_VIRTIO_NET=y
CONFIG_VIRTIO_CONSOLE=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_HW_RANDOM_VIRTIO=y
CONFIG_FB_VESA=y
Note: also make sure you have CONFIG_EXT2_FS or CONFIG_EXT4_FS if you use the
above images.
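If you are building the guest kernel from a fresh defconfig, one way to switch these
options on is the scripts/config helper in the kernel tree. This is only a sketch of
one possible flow, not part of the official instructions:

cd ../..              # back to the top of the kernel tree from tools/kvm/
make defconfig
./scripts/config -e VIRTIO_BLK -e VIRTIO_NET -e VIRTIO_CONSOLE \
                 -e SERIAL_8250_CONSOLE -e HW_RANDOM_VIRTIO -e FB_VESA -e EXT4_FS
yes "" | make oldconfig
make -j$(nproc) bzImage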
5. And finally, launch the hypervisor:
./kvm run -d linux-0.2.img
or
./kvm run -d debian_squeeze_i386_standard.qcow2
or
sudo ./kvm run -d linux-0.2.img -n virtio
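A guest kernel, memory size, and VCPU count can also be given explicitly if the tool
does not pick up the bzImage from step 4 automatically. The flags below (-k for the
kernel image, -m for memory in MiB, -c for VCPUs) are assumptions based on this
version's help output, so check './kvm run --help' against your build:

./kvm run -d linux-0.2.img -k ../../arch/x86/boot/bzImage -m 512 -c 4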
This release was brought to you by the following people:
Sasha Levin
Pekka Enberg
Asias He
Prasad Joshi
Cyrill Gorcunov
Ingo Molnar
John Floren
Amos Kong
Giuseppe Calderaro
Amerigo Wang
Paul Bolle
David Ahern
Most of us developers are hanging out on the #kvm channel at irc.freenode.net
if you want to drop by with questions, comments, and bug reports.
Pekka
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-15 15:53 [ANNOUNCE] Native Linux KVM tool v2 Pekka Enberg @ 2011-06-15 16:30 ` Avi Kivity 2011-06-15 17:10 ` Pekka Enberg 2011-06-15 21:41 ` Anthony Liguori ` (3 subsequent siblings) 4 siblings, 1 reply; 43+ messages in thread From: Avi Kivity @ 2011-06-15 16:30 UTC (permalink / raw) To: Pekka Enberg Cc: linux-kernel, kvm, Andrew Morton, Linus Torvalds, Ingo Molnar, Prasad Joshi, Sasha Levin, Cyrill Gorcunov, Asias He On 06/15/2011 06:53 PM, Pekka Enberg wrote: > - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See the > following URL for test result details: https://gist.github.com/1026888 This is surprising. How is qemu invoked? btw the dump above is a little hard to interpret. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-15 16:30 ` Avi Kivity @ 2011-06-15 17:10 ` Pekka Enberg 2011-06-15 20:13 ` Prasad Joshi 0 siblings, 1 reply; 43+ messages in thread From: Pekka Enberg @ 2011-06-15 17:10 UTC (permalink / raw) To: Avi Kivity Cc: linux-kernel, kvm, Andrew Morton, Linus Torvalds, Ingo Molnar, Prasad Joshi, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe On Wed, Jun 15, 2011 at 7:30 PM, Avi Kivity <avi@redhat.com> wrote: > On 06/15/2011 06:53 PM, Pekka Enberg wrote: >> >> - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See >> the >> following URL for test result details: https://gist.github.com/1026888 > > This is surprising. How is qemu invoked? Prasad will have the details. Please note that the above are with Qemu defaults which doesn't use virtio. The results with virtio are little better but still in favor of tools/kvm. > btw the dump above is a little hard to interpret. It's what fio reports. The relevant bits are: Qemu: Run status group 0 (all jobs): READ: io=204800KB, aggrb=61152KB/s, minb=15655KB/s, maxb=17845KB/s, mint=2938msec, maxt=3349msec WRITE: io=68544KB, aggrb=28045KB/s, minb=6831KB/s, maxb=7858KB/s, mint=2292msec, maxt=2444msec Run status group 1 (all jobs): READ: io=204800KB, aggrb=61779KB/s, minb=15815KB/s, maxb=17189KB/s, mint=3050msec, maxt=3315msec WRITE: io=66576KB, aggrb=24165KB/s, minb=6205KB/s, maxb=7166KB/s, mint=2485msec, maxt=2755msec Run status group 2 (all jobs): READ: io=204800KB, aggrb=6722KB/s, minb=1720KB/s, maxb=1737KB/s, mint=30178msec, maxt=30467msec WRITE: io=65424KB, aggrb=2156KB/s, minb=550KB/s, maxb=573KB/s, mint=29682msec, maxt=30342msec Run status group 3 (all jobs): READ: io=204800KB, aggrb=6994KB/s, minb=1790KB/s, maxb=1834KB/s, mint=28574msec, maxt=29279msec WRITE: io=68192KB, aggrb=2382KB/s, minb=548KB/s, maxb=740KB/s, mint=27121msec, maxt=28625msec Disk stats (read/write): sdb: ios=60583/6652, merge=0/164, ticks=156340/672030, in_queue=828230, util=82.71% tools/kvm: Run status group 0 (all jobs): READ: io=204800KB, aggrb=149162KB/s, minb=38185KB/s, maxb=46030KB/s, mint=1139msec, maxt=1373msec WRITE: io=70528KB, aggrb=79156KB/s, minb=18903KB/s, maxb=23726KB/s, mint=804msec, maxt=891msec Run status group 1 (all jobs): READ: io=204800KB, aggrb=188235KB/s, minb=48188KB/s, maxb=57932KB/s, mint=905msec, maxt=1088msec WRITE: io=64464KB, aggrb=84821KB/s, minb=21751KB/s, maxb=27392KB/s, mint=570msec, maxt=760msec Run status group 2 (all jobs): READ: io=204800KB, aggrb=20005KB/s, minb=5121KB/s, maxb=5333KB/s, mint=9830msec, maxt=10237msec WRITE: io=66624KB, aggrb=6615KB/s, minb=1671KB/s, maxb=1781KB/s, mint=9558msec, maxt=10071msec Run status group 3 (all jobs): READ: io=204800KB, aggrb=66149KB/s, minb=16934KB/s, maxb=17936KB/s, mint=2923msec, maxt=3096msec WRITE: io=69600KB, aggrb=26717KB/s, minb=6595KB/s, maxb=7342KB/s, mint=2530msec, maxt=2605msec Disk stats (read/write): vdb: ios=61002/6654, merge=0/183, ticks=27270/205780, in_queue=232220, util=69.46% ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-15 17:10 ` Pekka Enberg @ 2011-06-15 20:13 ` Prasad Joshi 2011-06-15 20:23 ` Sasha Levin ` (2 more replies) 0 siblings, 3 replies; 43+ messages in thread From: Prasad Joshi @ 2011-06-15 20:13 UTC (permalink / raw) To: Pekka Enberg, Avi Kivity Cc: linux-kernel, kvm, Andrew Morton, Linus Torvalds, Ingo Molnar, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe On Wed, Jun 15, 2011 at 6:10 PM, Pekka Enberg <penberg@kernel.org> wrote: > On Wed, Jun 15, 2011 at 7:30 PM, Avi Kivity <avi@redhat.com> wrote: >> On 06/15/2011 06:53 PM, Pekka Enberg wrote: >>> >>> - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See >>> the >>> following URL for test result details: https://gist.github.com/1026888 >> >> This is surprising. How is qemu invoked? > > Prasad will have the details. Please note that the above are with Qemu > defaults which doesn't use virtio. The results with virtio are little > better but still in favor of tools/kvm. > The qcow2 image used for testing was copied on to /dev/shm to avoid the disk delays in performance measurement. QEMU was invoked with following parameters $ qemu-system-x86_64 -hda <disk image on hard disk> -hdb /dev/shm/test.qcow2 -m 1024M FIO job file used for measuring the numbers was prasad@prasad-vm:~$ cat fio-mixed.job ; fio-mixed.job for autotest [global] name=fio-sync directory=/mnt rw=randrw rwmixread=67 rwmixwrite=33 bsrange=16K-256K direct=0 end_fsync=1 verify=crc32 ;ioscheduler=x numjobs=4 [file1] size=50M ioengine=sync mem=malloc [file2] stonewall size=50M ioengine=aio mem=shm iodepth=4 [file3] stonewall size=50M ioengine=mmap mem=mmap direct=1 [file4] stonewall size=50M ioengine=splice mem=malloc direct=1 - The test generates 16 file each of ~50MB, so in total ~800MB data was written. - The test.qcow2 was newly created before it was used with QEMU or KVM tool - The size of the QCOW2 image was 1.5GB. - The host machine had 2GB RAM. - The guest machine in both the cases was started with 1GB memory. Thanks and Regards, Prasad >> btw the dump above is a little hard to interpret. > > It's what fio reports. 
The relevant bits are: > > > Qemu: > > Run status group 0 (all jobs): > READ: io=204800KB, aggrb=61152KB/s, minb=15655KB/s, maxb=17845KB/s, > mint=2938msec, maxt=3349msec > WRITE: io=68544KB, aggrb=28045KB/s, minb=6831KB/s, maxb=7858KB/s, > mint=2292msec, maxt=2444msec > > Run status group 1 (all jobs): > READ: io=204800KB, aggrb=61779KB/s, minb=15815KB/s, maxb=17189KB/s, > mint=3050msec, maxt=3315msec > WRITE: io=66576KB, aggrb=24165KB/s, minb=6205KB/s, maxb=7166KB/s, > mint=2485msec, maxt=2755msec > > Run status group 2 (all jobs): > READ: io=204800KB, aggrb=6722KB/s, minb=1720KB/s, maxb=1737KB/s, > mint=30178msec, maxt=30467msec > WRITE: io=65424KB, aggrb=2156KB/s, minb=550KB/s, maxb=573KB/s, > mint=29682msec, maxt=30342msec > > Run status group 3 (all jobs): > READ: io=204800KB, aggrb=6994KB/s, minb=1790KB/s, maxb=1834KB/s, > mint=28574msec, maxt=29279msec > WRITE: io=68192KB, aggrb=2382KB/s, minb=548KB/s, maxb=740KB/s, > mint=27121msec, maxt=28625msec > > Disk stats (read/write): > sdb: ios=60583/6652, merge=0/164, ticks=156340/672030, > in_queue=828230, util=82.71% > > tools/kvm: > > Run status group 0 (all jobs): > READ: io=204800KB, aggrb=149162KB/s, minb=38185KB/s, > maxb=46030KB/s, mint=1139msec, maxt=1373msec > WRITE: io=70528KB, aggrb=79156KB/s, minb=18903KB/s, maxb=23726KB/s, > mint=804msec, maxt=891msec > > Run status group 1 (all jobs): > READ: io=204800KB, aggrb=188235KB/s, minb=48188KB/s, > maxb=57932KB/s, mint=905msec, maxt=1088msec > WRITE: io=64464KB, aggrb=84821KB/s, minb=21751KB/s, maxb=27392KB/s, > mint=570msec, maxt=760msec > > Run status group 2 (all jobs): > READ: io=204800KB, aggrb=20005KB/s, minb=5121KB/s, maxb=5333KB/s, > mint=9830msec, maxt=10237msec > WRITE: io=66624KB, aggrb=6615KB/s, minb=1671KB/s, maxb=1781KB/s, > mint=9558msec, maxt=10071msec > > Run status group 3 (all jobs): > READ: io=204800KB, aggrb=66149KB/s, minb=16934KB/s, maxb=17936KB/s, > mint=2923msec, maxt=3096msec > WRITE: io=69600KB, aggrb=26717KB/s, minb=6595KB/s, maxb=7342KB/s, > mint=2530msec, maxt=2605msec > > Disk stats (read/write): > vdb: ios=61002/6654, merge=0/183, ticks=27270/205780, > in_queue=232220, util=69.46% > ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-15 20:13 ` Prasad Joshi @ 2011-06-15 20:23 ` Sasha Levin 2011-06-15 20:49 ` Prasad Joshi 2011-06-15 21:53 ` Anthony Liguori 2011-06-15 22:04 ` Anthony Liguori 2 siblings, 1 reply; 43+ messages in thread From: Sasha Levin @ 2011-06-15 20:23 UTC (permalink / raw) To: Prasad Joshi Cc: Pekka Enberg, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Ingo Molnar, Cyrill Gorcunov, Asias He, Jens Axboe On Wed, 2011-06-15 at 21:13 +0100, Prasad Joshi wrote: > On Wed, Jun 15, 2011 at 6:10 PM, Pekka Enberg <penberg@kernel.org> wrote: > > On Wed, Jun 15, 2011 at 7:30 PM, Avi Kivity <avi@redhat.com> wrote: > >> On 06/15/2011 06:53 PM, Pekka Enberg wrote: > >>> > >>> - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See > >>> the > >>> following URL for test result details: https://gist.github.com/1026888 > >> > >> This is surprising. How is qemu invoked? > > > > Prasad will have the details. Please note that the above are with Qemu > > defaults which doesn't use virtio. The results with virtio are little > > better but still in favor of tools/kvm. > > > > The qcow2 image used for testing was copied on to /dev/shm to avoid > the disk delays in performance measurement. > > QEMU was invoked with following parameters > > $ qemu-system-x86_64 -hda <disk image on hard disk> -hdb > /dev/shm/test.qcow2 -m 1024M > Prasad, Could you please run this test with '-drive file=/dev/shm/test.qcow2,if=virtio' instead of the '-hdb' thing? > FIO job file used for measuring the numbers was > > prasad@prasad-vm:~$ cat fio-mixed.job > ; fio-mixed.job for autotest > > [global] > name=fio-sync > directory=/mnt > rw=randrw > rwmixread=67 > rwmixwrite=33 > bsrange=16K-256K > direct=0 > end_fsync=1 > verify=crc32 > ;ioscheduler=x > numjobs=4 > > [file1] > size=50M > ioengine=sync > mem=malloc > > [file2] > stonewall > size=50M > ioengine=aio > mem=shm > iodepth=4 > > [file3] > stonewall > size=50M > ioengine=mmap > mem=mmap > direct=1 > > [file4] > stonewall > size=50M > ioengine=splice > mem=malloc > direct=1 > > - The test generates 16 file each of ~50MB, so in total ~800MB data was written. > - The test.qcow2 was newly created before it was used with QEMU or KVM tool > - The size of the QCOW2 image was 1.5GB. > - The host machine had 2GB RAM. > - The guest machine in both the cases was started with 1GB memory. > > Thanks and Regards, > Prasad > > >> btw the dump above is a little hard to interpret. > > > > It's what fio reports. 
The relevant bits are: > > > > > > Qemu: > > > > Run status group 0 (all jobs): > > READ: io=204800KB, aggrb=61152KB/s, minb=15655KB/s, maxb=17845KB/s, > > mint=2938msec, maxt=3349msec > > WRITE: io=68544KB, aggrb=28045KB/s, minb=6831KB/s, maxb=7858KB/s, > > mint=2292msec, maxt=2444msec > > > > Run status group 1 (all jobs): > > READ: io=204800KB, aggrb=61779KB/s, minb=15815KB/s, maxb=17189KB/s, > > mint=3050msec, maxt=3315msec > > WRITE: io=66576KB, aggrb=24165KB/s, minb=6205KB/s, maxb=7166KB/s, > > mint=2485msec, maxt=2755msec > > > > Run status group 2 (all jobs): > > READ: io=204800KB, aggrb=6722KB/s, minb=1720KB/s, maxb=1737KB/s, > > mint=30178msec, maxt=30467msec > > WRITE: io=65424KB, aggrb=2156KB/s, minb=550KB/s, maxb=573KB/s, > > mint=29682msec, maxt=30342msec > > > > Run status group 3 (all jobs): > > READ: io=204800KB, aggrb=6994KB/s, minb=1790KB/s, maxb=1834KB/s, > > mint=28574msec, maxt=29279msec > > WRITE: io=68192KB, aggrb=2382KB/s, minb=548KB/s, maxb=740KB/s, > > mint=27121msec, maxt=28625msec > > > > Disk stats (read/write): > > sdb: ios=60583/6652, merge=0/164, ticks=156340/672030, > > in_queue=828230, util=82.71% > > > > tools/kvm: > > > > Run status group 0 (all jobs): > > READ: io=204800KB, aggrb=149162KB/s, minb=38185KB/s, > > maxb=46030KB/s, mint=1139msec, maxt=1373msec > > WRITE: io=70528KB, aggrb=79156KB/s, minb=18903KB/s, maxb=23726KB/s, > > mint=804msec, maxt=891msec > > > > Run status group 1 (all jobs): > > READ: io=204800KB, aggrb=188235KB/s, minb=48188KB/s, > > maxb=57932KB/s, mint=905msec, maxt=1088msec > > WRITE: io=64464KB, aggrb=84821KB/s, minb=21751KB/s, maxb=27392KB/s, > > mint=570msec, maxt=760msec > > > > Run status group 2 (all jobs): > > READ: io=204800KB, aggrb=20005KB/s, minb=5121KB/s, maxb=5333KB/s, > > mint=9830msec, maxt=10237msec > > WRITE: io=66624KB, aggrb=6615KB/s, minb=1671KB/s, maxb=1781KB/s, > > mint=9558msec, maxt=10071msec > > > > Run status group 3 (all jobs): > > READ: io=204800KB, aggrb=66149KB/s, minb=16934KB/s, maxb=17936KB/s, > > mint=2923msec, maxt=3096msec > > WRITE: io=69600KB, aggrb=26717KB/s, minb=6595KB/s, maxb=7342KB/s, > > mint=2530msec, maxt=2605msec > > > > Disk stats (read/write): > > vdb: ios=61002/6654, merge=0/183, ticks=27270/205780, > > in_queue=232220, util=69.46% > > -- Sasha. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-15 20:23 ` Sasha Levin @ 2011-06-15 20:49 ` Prasad Joshi 0 siblings, 0 replies; 43+ messages in thread From: Prasad Joshi @ 2011-06-15 20:49 UTC (permalink / raw) To: Sasha Levin Cc: Pekka Enberg, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Ingo Molnar, Cyrill Gorcunov, Asias He, Jens Axboe On Wed, Jun 15, 2011 at 9:23 PM, Sasha Levin <levinsasha928@gmail.com> wrote: > On Wed, 2011-06-15 at 21:13 +0100, Prasad Joshi wrote: >> On Wed, Jun 15, 2011 at 6:10 PM, Pekka Enberg <penberg@kernel.org> wrote: >> > On Wed, Jun 15, 2011 at 7:30 PM, Avi Kivity <avi@redhat.com> wrote: >> >> On 06/15/2011 06:53 PM, Pekka Enberg wrote: >> >>> >> >>> - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See >> >>> the >> >>> following URL for test result details: https://gist.github.com/1026888 >> >> >> >> This is surprising. How is qemu invoked? >> > >> > Prasad will have the details. Please note that the above are with Qemu >> > defaults which doesn't use virtio. The results with virtio are little >> > better but still in favor of tools/kvm. >> > >> >> The qcow2 image used for testing was copied on to /dev/shm to avoid >> the disk delays in performance measurement. >> >> QEMU was invoked with following parameters >> >> $ qemu-system-x86_64 -hda <disk image on hard disk> -hdb >> /dev/shm/test.qcow2 -m 1024M >> > > Prasad, Could you please run this test with '-drive > file=/dev/shm/test.qcow2,if=virtio' instead of the '-hdb' thing? > Infact I have already tried that. Like Pekka mentioned, the results are still in favour of KVM tools. I machine that I work on is not with me at the moment, I will be able to mail the exact numbers tomorrow. Thanks and Regards, Prasad >> FIO job file used for measuring the numbers was >> >> prasad@prasad-vm:~$ cat fio-mixed.job >> ; fio-mixed.job for autotest >> >> [global] >> name=fio-sync >> directory=/mnt >> rw=randrw >> rwmixread=67 >> rwmixwrite=33 >> bsrange=16K-256K >> direct=0 >> end_fsync=1 >> verify=crc32 >> ;ioscheduler=x >> numjobs=4 >> >> [file1] >> size=50M >> ioengine=sync >> mem=malloc >> >> [file2] >> stonewall >> size=50M >> ioengine=aio >> mem=shm >> iodepth=4 >> >> [file3] >> stonewall >> size=50M >> ioengine=mmap >> mem=mmap >> direct=1 >> >> [file4] >> stonewall >> size=50M >> ioengine=splice >> mem=malloc >> direct=1 >> >> - The test generates 16 file each of ~50MB, so in total ~800MB data was written. >> - The test.qcow2 was newly created before it was used with QEMU or KVM tool >> - The size of the QCOW2 image was 1.5GB. >> - The host machine had 2GB RAM. >> - The guest machine in both the cases was started with 1GB memory. >> >> Thanks and Regards, >> Prasad >> >> >> btw the dump above is a little hard to interpret. >> > >> > It's what fio reports. 
The relevant bits are: >> > >> > >> > Qemu: >> > >> > Run status group 0 (all jobs): >> > READ: io=204800KB, aggrb=61152KB/s, minb=15655KB/s, maxb=17845KB/s, >> > mint=2938msec, maxt=3349msec >> > WRITE: io=68544KB, aggrb=28045KB/s, minb=6831KB/s, maxb=7858KB/s, >> > mint=2292msec, maxt=2444msec >> > >> > Run status group 1 (all jobs): >> > READ: io=204800KB, aggrb=61779KB/s, minb=15815KB/s, maxb=17189KB/s, >> > mint=3050msec, maxt=3315msec >> > WRITE: io=66576KB, aggrb=24165KB/s, minb=6205KB/s, maxb=7166KB/s, >> > mint=2485msec, maxt=2755msec >> > >> > Run status group 2 (all jobs): >> > READ: io=204800KB, aggrb=6722KB/s, minb=1720KB/s, maxb=1737KB/s, >> > mint=30178msec, maxt=30467msec >> > WRITE: io=65424KB, aggrb=2156KB/s, minb=550KB/s, maxb=573KB/s, >> > mint=29682msec, maxt=30342msec >> > >> > Run status group 3 (all jobs): >> > READ: io=204800KB, aggrb=6994KB/s, minb=1790KB/s, maxb=1834KB/s, >> > mint=28574msec, maxt=29279msec >> > WRITE: io=68192KB, aggrb=2382KB/s, minb=548KB/s, maxb=740KB/s, >> > mint=27121msec, maxt=28625msec >> > >> > Disk stats (read/write): >> > sdb: ios=60583/6652, merge=0/164, ticks=156340/672030, >> > in_queue=828230, util=82.71% >> > >> > tools/kvm: >> > >> > Run status group 0 (all jobs): >> > READ: io=204800KB, aggrb=149162KB/s, minb=38185KB/s, >> > maxb=46030KB/s, mint=1139msec, maxt=1373msec >> > WRITE: io=70528KB, aggrb=79156KB/s, minb=18903KB/s, maxb=23726KB/s, >> > mint=804msec, maxt=891msec >> > >> > Run status group 1 (all jobs): >> > READ: io=204800KB, aggrb=188235KB/s, minb=48188KB/s, >> > maxb=57932KB/s, mint=905msec, maxt=1088msec >> > WRITE: io=64464KB, aggrb=84821KB/s, minb=21751KB/s, maxb=27392KB/s, >> > mint=570msec, maxt=760msec >> > >> > Run status group 2 (all jobs): >> > READ: io=204800KB, aggrb=20005KB/s, minb=5121KB/s, maxb=5333KB/s, >> > mint=9830msec, maxt=10237msec >> > WRITE: io=66624KB, aggrb=6615KB/s, minb=1671KB/s, maxb=1781KB/s, >> > mint=9558msec, maxt=10071msec >> > >> > Run status group 3 (all jobs): >> > READ: io=204800KB, aggrb=66149KB/s, minb=16934KB/s, maxb=17936KB/s, >> > mint=2923msec, maxt=3096msec >> > WRITE: io=69600KB, aggrb=26717KB/s, minb=6595KB/s, maxb=7342KB/s, >> > mint=2530msec, maxt=2605msec >> > >> > Disk stats (read/write): >> > vdb: ios=61002/6654, merge=0/183, ticks=27270/205780, >> > in_queue=232220, util=69.46% >> > > > -- > > Sasha. > > ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-15 20:13 ` Prasad Joshi 2011-06-15 20:23 ` Sasha Levin @ 2011-06-15 21:53 ` Anthony Liguori 2011-06-15 22:04 ` Anthony Liguori 2 siblings, 0 replies; 43+ messages in thread From: Anthony Liguori @ 2011-06-15 21:53 UTC (permalink / raw) To: Prasad Joshi Cc: Pekka Enberg, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Ingo Molnar, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe On 06/15/2011 03:13 PM, Prasad Joshi wrote: > On Wed, Jun 15, 2011 at 6:10 PM, Pekka Enberg<penberg@kernel.org> wrote: >> On Wed, Jun 15, 2011 at 7:30 PM, Avi Kivity<avi@redhat.com> wrote: >>> On 06/15/2011 06:53 PM, Pekka Enberg wrote: >>>> >>>> - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See >>>> the >>>> following URL for test result details: https://gist.github.com/1026888 >>> >>> This is surprising. How is qemu invoked? >> >> Prasad will have the details. Please note that the above are with Qemu >> defaults which doesn't use virtio. The results with virtio are little >> better but still in favor of tools/kvm. >> > > The qcow2 image used for testing was copied on to /dev/shm to avoid > the disk delays in performance measurement. Our experience has been that this is actually not a great way to simulate fast storage. Spindle based storage has very different characteristics than memory as there is a significant cost for seeking. -hdb uses IDE too. That's pretty unfair since IDE is limited to a single request at a time whereas virtio can support multiple requests (and native kvm tools is using virtio). Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-15 20:13 ` Prasad Joshi 2011-06-15 20:23 ` Sasha Levin 2011-06-15 21:53 ` Anthony Liguori @ 2011-06-15 22:04 ` Anthony Liguori 2011-06-15 22:07 ` Alexander Graf 2011-06-16 5:29 ` Stefan Hajnoczi 2 siblings, 2 replies; 43+ messages in thread From: Anthony Liguori @ 2011-06-15 22:04 UTC (permalink / raw) To: Prasad Joshi Cc: Pekka Enberg, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Ingo Molnar, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe On 06/15/2011 03:13 PM, Prasad Joshi wrote: > On Wed, Jun 15, 2011 at 6:10 PM, Pekka Enberg<penberg@kernel.org> wrote: >> On Wed, Jun 15, 2011 at 7:30 PM, Avi Kivity<avi@redhat.com> wrote: >>> On 06/15/2011 06:53 PM, Pekka Enberg wrote: >>>> >>>> - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See >>>> the >>>> following URL for test result details: https://gist.github.com/1026888 >>> >>> This is surprising. How is qemu invoked? >> >> Prasad will have the details. Please note that the above are with Qemu >> defaults which doesn't use virtio. The results with virtio are little >> better but still in favor of tools/kvm. >> > > The qcow2 image used for testing was copied on to /dev/shm to avoid > the disk delays in performance measurement. > > QEMU was invoked with following parameters > > $ qemu-system-x86_64 -hda<disk image on hard disk> -hdb > /dev/shm/test.qcow2 -m 1024M Looking more closely at native KVM tools, you would need to use the following invocation to have an apples-to-apples comparison: qemu-system-x86_64 -drive file=/dev/shm/test.qcow2,cache=writeback,if=virtio It doesn't appear that writes are stable by default with native KVM tools. They are stable by default in QEMU because since many guests simply do not inject FLUSH's reliably. cache=writeback with qcow2 will use the same mode that native KVM tools is using, unstable writes for data with metadata consistency preserved. This is almost certainly while you're seeing such high performance btw. You should also advertise WCE=1 to the guest from a correctness perspective. You aren't doing that right now. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-15 22:04 ` Anthony Liguori @ 2011-06-15 22:07 ` Alexander Graf 2011-06-15 22:20 ` Anthony Liguori 2011-06-16 5:45 ` Pekka Enberg 2011-06-16 5:29 ` Stefan Hajnoczi 1 sibling, 2 replies; 43+ messages in thread From: Alexander Graf @ 2011-06-15 22:07 UTC (permalink / raw) To: Anthony Liguori Cc: Prasad Joshi, Pekka Enberg, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Ingo Molnar, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe On 16.06.2011, at 00:04, Anthony Liguori wrote: > On 06/15/2011 03:13 PM, Prasad Joshi wrote: >> On Wed, Jun 15, 2011 at 6:10 PM, Pekka Enberg<penberg@kernel.org> wrote: >>> On Wed, Jun 15, 2011 at 7:30 PM, Avi Kivity<avi@redhat.com> wrote: >>>> On 06/15/2011 06:53 PM, Pekka Enberg wrote: >>>>> >>>>> - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See >>>>> the >>>>> following URL for test result details: https://gist.github.com/1026888 >>>> >>>> This is surprising. How is qemu invoked? >>> >>> Prasad will have the details. Please note that the above are with Qemu >>> defaults which doesn't use virtio. The results with virtio are little >>> better but still in favor of tools/kvm. >>> >> >> The qcow2 image used for testing was copied on to /dev/shm to avoid >> the disk delays in performance measurement. >> >> QEMU was invoked with following parameters >> >> $ qemu-system-x86_64 -hda<disk image on hard disk> -hdb >> /dev/shm/test.qcow2 -m 1024M > > Looking more closely at native KVM tools, you would need to use the following invocation to have an apples-to-apples comparison: > > qemu-system-x86_64 -drive file=/dev/shm/test.qcow2,cache=writeback,if=virtio Wouldn't this still be using threaded AIO mode? I thought KVM tools used native AIO? Alex ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-15 22:07 ` Alexander Graf @ 2011-06-15 22:20 ` Anthony Liguori 2011-06-15 22:44 ` Anthony Liguori 2011-06-16 5:45 ` Pekka Enberg 1 sibling, 1 reply; 43+ messages in thread From: Anthony Liguori @ 2011-06-15 22:20 UTC (permalink / raw) To: Alexander Graf Cc: Prasad Joshi, Pekka Enberg, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Ingo Molnar, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe On 06/15/2011 05:07 PM, Alexander Graf wrote: > > On 16.06.2011, at 00:04, Anthony Liguori wrote: > >> On 06/15/2011 03:13 PM, Prasad Joshi wrote: >>> On Wed, Jun 15, 2011 at 6:10 PM, Pekka Enberg<penberg@kernel.org> wrote: >>>> On Wed, Jun 15, 2011 at 7:30 PM, Avi Kivity<avi@redhat.com> wrote: >>>>> On 06/15/2011 06:53 PM, Pekka Enberg wrote: >>>>>> >>>>>> - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See >>>>>> the >>>>>> following URL for test result details: https://gist.github.com/1026888 >>>>> >>>>> This is surprising. How is qemu invoked? >>>> >>>> Prasad will have the details. Please note that the above are with Qemu >>>> defaults which doesn't use virtio. The results with virtio are little >>>> better but still in favor of tools/kvm. >>>> >>> >>> The qcow2 image used for testing was copied on to /dev/shm to avoid >>> the disk delays in performance measurement. >>> >>> QEMU was invoked with following parameters >>> >>> $ qemu-system-x86_64 -hda<disk image on hard disk> -hdb >>> /dev/shm/test.qcow2 -m 1024M >> >> Looking more closely at native KVM tools, you would need to use the following invocation to have an apples-to-apples comparison: >> >> qemu-system-x86_64 -drive file=/dev/shm/test.qcow2,cache=writeback,if=virtio > > Wouldn't this still be using threaded AIO mode? I thought KVM tools used native AIO? Nope. The relevant code is: > /* blk device ?*/ > disk = blkdev__probe(filename, &st); > if (disk) > return disk; > > fd = open(filename, readonly ? O_RDONLY : O_RDWR); > if (fd < 0) > return NULL; > > /* qcow image ?*/ > disk = qcow_probe(fd, readonly); > if (disk) > return disk; > > /* raw image ?*/ > disk = raw_image__probe(fd, &st, readonly); > if (disk) > return disk; It uses a synchronous I/O model similar to qcow2 in QEMU with what I assume is a global lock that's outside of the actual implementation. I think it lacks some of the caching that Kevin's added recently though so I assume that if QEMU was run with cache=writeback, it would probably do quite a bit better than native KVM tool. It also turns out that while they have the infrastructure to deal with FLUSH, they don't implement it for qcow2 :-/ So even if the guest does an fsync(), it native KVM tool will never actually sync the data to disk... That's probably why it's fast, it doesn't preserve data integrity :( Regards, Anthony Liguori > > > Alex > ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-15 22:20 ` Anthony Liguori @ 2011-06-15 22:44 ` Anthony Liguori 2011-06-16 5:41 ` Pekka Enberg 0 siblings, 1 reply; 43+ messages in thread From: Anthony Liguori @ 2011-06-15 22:44 UTC (permalink / raw) To: Alexander Graf Cc: Prasad Joshi, Pekka Enberg, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Ingo Molnar, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe On 06/15/2011 05:20 PM, Anthony Liguori wrote: > On 06/15/2011 05:07 PM, Alexander Graf wrote: >> >> On 16.06.2011, at 00:04, Anthony Liguori wrote: >> >>> On 06/15/2011 03:13 PM, Prasad Joshi wrote: >>>> On Wed, Jun 15, 2011 at 6:10 PM, Pekka Enberg<penberg@kernel.org> >>>> wrote: >>>>> On Wed, Jun 15, 2011 at 7:30 PM, Avi Kivity<avi@redhat.com> wrote: >>>>>> On 06/15/2011 06:53 PM, Pekka Enberg wrote: >>>>>>> >>>>>>> - Fast QCOW2 image read-write support beating Qemu in fio >>>>>>> benchmarks. See >>>>>>> the >>>>>>> following URL for test result details: >>>>>>> https://gist.github.com/1026888 >>>>>> >>>>>> This is surprising. How is qemu invoked? >>>>> >>>>> Prasad will have the details. Please note that the above are with Qemu >>>>> defaults which doesn't use virtio. The results with virtio are little >>>>> better but still in favor of tools/kvm. >>>>> >>>> >>>> The qcow2 image used for testing was copied on to /dev/shm to avoid >>>> the disk delays in performance measurement. >>>> >>>> QEMU was invoked with following parameters >>>> >>>> $ qemu-system-x86_64 -hda<disk image on hard disk> -hdb >>>> /dev/shm/test.qcow2 -m 1024M >>> >>> Looking more closely at native KVM tools, you would need to use the >>> following invocation to have an apples-to-apples comparison: >>> >>> qemu-system-x86_64 -drive >>> file=/dev/shm/test.qcow2,cache=writeback,if=virtio >> >> Wouldn't this still be using threaded AIO mode? I thought KVM tools >> used native AIO? > > Nope. The relevant code is: > >> /* blk device ?*/ >> disk = blkdev__probe(filename, &st); >> if (disk) >> return disk; >> >> fd = open(filename, readonly ? O_RDONLY : O_RDWR); >> if (fd < 0) >> return NULL; >> >> /* qcow image ?*/ >> disk = qcow_probe(fd, readonly); >> if (disk) >> return disk; >> >> /* raw image ?*/ >> disk = raw_image__probe(fd, &st, readonly); >> if (disk) >> return disk; > > It uses a synchronous I/O model similar to qcow2 in QEMU with what I > assume is a global lock that's outside of the actual implementation. > > I think it lacks some of the caching that Kevin's added recently though > so I assume that if QEMU was run with cache=writeback, it would probably > do quite a bit better than native KVM tool. > > It also turns out that while they have the infrastructure to deal with > FLUSH, they don't implement it for qcow2 :-/ > > So even if the guest does an fsync(), it native KVM tool will never > actually sync the data to disk... > > That's probably why it's fast, it doesn't preserve data integrity :( Actually, I misread the code. It does unstable writes but it does do fsync() on FLUSH. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-15 22:44 ` Anthony Liguori @ 2011-06-16 5:41 ` Pekka Enberg 2011-06-16 6:21 ` Pekka Enberg 0 siblings, 1 reply; 43+ messages in thread From: Pekka Enberg @ 2011-06-16 5:41 UTC (permalink / raw) To: Anthony Liguori Cc: Alexander Graf, Prasad Joshi, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Ingo Molnar, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe On Thu, Jun 16, 2011 at 1:44 AM, Anthony Liguori <anthony@codemonkey.ws> wrote: >> That's probably why it's fast, it doesn't preserve data integrity :( > > Actually, I misread the code. It does unstable writes but it does do > fsync() on FLUSH. Yes. That's fine, right? Or did we misread how virtio block devices are supposed to work? Btw, unstable writes doesn't really explain why *read* performance is better. Pekka ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-16 5:41 ` Pekka Enberg @ 2011-06-16 6:21 ` Pekka Enberg 2011-06-16 9:24 ` Christoph Hellwig 0 siblings, 1 reply; 43+ messages in thread From: Pekka Enberg @ 2011-06-16 6:21 UTC (permalink / raw) To: Anthony Liguori Cc: Alexander Graf, Prasad Joshi, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Ingo Molnar, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe On Thu, Jun 16, 2011 at 1:44 AM, Anthony Liguori <anthony@codemonkey.ws> wrote: >>> That's probably why it's fast, it doesn't preserve data integrity :( >> >> Actually, I misread the code. It does unstable writes but it does do >> fsync() on FLUSH. On Thu, Jun 16, 2011 at 8:41 AM, Pekka Enberg <penberg@kernel.org> wrote: > Yes. That's fine, right? Or did we misread how virtio block devices > are supposed to work? And btw, we use sync_file_range() to make sure the metadata part of a QCOW2 image is never corrupted. The rational here is that if the guest doesn't do VIRTIO_BLK_T_FLUSH, you can corrupt your _guest filesystem_ but the _image_ will still work just fine and you can do fsck on it. Also, Prasad ran xfstests and did over-night stress tests to iron out corruption issues. Now we obviously can't promise that we'll never eat your data but I can assure you that we've done as much as we've been able to with the resources we have at the moment. Pekka ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-16 6:21 ` Pekka Enberg @ 2011-06-16 9:24 ` Christoph Hellwig 2011-06-16 9:34 ` Pekka Enberg 0 siblings, 1 reply; 43+ messages in thread From: Christoph Hellwig @ 2011-06-16 9:24 UTC (permalink / raw) To: Pekka Enberg Cc: Anthony Liguori, Alexander Graf, Prasad Joshi, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Ingo Molnar, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe On Thu, Jun 16, 2011 at 09:21:03AM +0300, Pekka Enberg wrote: > And btw, we use sync_file_range() Which doesn't help you at all. sync_file_range is just a hint for VM writeback, but never commits filesystem metadata nor the physical disk's write cache. In short it's a completely dangerous interface, and that is pretty well documented in the man page. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-16 9:24 ` Christoph Hellwig @ 2011-06-16 9:34 ` Pekka Enberg 2011-06-16 9:48 ` Christoph Hellwig 0 siblings, 1 reply; 43+ messages in thread From: Pekka Enberg @ 2011-06-16 9:34 UTC (permalink / raw) To: Christoph Hellwig Cc: Anthony Liguori, Alexander Graf, Prasad Joshi, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Ingo Molnar, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe Hi Christoph, On Thu, Jun 16, 2011 at 09:21:03AM +0300, Pekka Enberg wrote: >> And btw, we use sync_file_range() On Thu, Jun 16, 2011 at 12:24 PM, Christoph Hellwig <hch@infradead.org> wrote: > Which doesn't help you at all. sync_file_range is just a hint for VM > writeback, but never commits filesystem metadata nor the physical > disk's write cache. In short it's a completely dangerous interface, and > that is pretty well documented in the man page. Doh - I didn't read it carefully enough and got hung up with: Therefore, unless the application is strictly performing overwrites of already-instantiated disk blocks, there are no guarantees that the data will be available after a crash. without noticing that it obviously doesn't work with filesystems like btrfs that do copy-on-write. What's the right thing to do here? Is fdatasync() sufficient? Pekka ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-16 9:34 ` Pekka Enberg @ 2011-06-16 9:48 ` Christoph Hellwig 2011-06-16 9:57 ` Ingo Molnar 2011-06-16 9:57 ` Pekka Enberg 0 siblings, 2 replies; 43+ messages in thread From: Christoph Hellwig @ 2011-06-16 9:48 UTC (permalink / raw) To: Pekka Enberg Cc: Christoph Hellwig, Anthony Liguori, Alexander Graf, Prasad Joshi, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Ingo Molnar, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe On Thu, Jun 16, 2011 at 12:34:04PM +0300, Pekka Enberg wrote: > Hi Christoph, > > On Thu, Jun 16, 2011 at 09:21:03AM +0300, Pekka Enberg wrote: > >> And btw, we use sync_file_range() > > On Thu, Jun 16, 2011 at 12:24 PM, Christoph Hellwig <hch@infradead.org> wrote: > > Which doesn't help you at all. ?sync_file_range is just a hint for VM > > writeback, but never commits filesystem metadata nor the physical > > disk's write cache. ?In short it's a completely dangerous interface, and > > that is pretty well documented in the man page. > > Doh - I didn't read it carefully enough and got hung up with: > > Therefore, unless the application is strictly performing overwrites of > already-instantiated disk blocks, there are no guarantees that the data will > be available after a crash. > > without noticing that it obviously doesn't work with filesystems like > btrfs that do copy-on-write. You also missed: " This system call does not flush disk write caches and thus does not provide any data integrity on systems with volatile disk write caches." so it's not safe if you either have a cache, or are using btrfs, or are using a sparse image, or are using an image preallocated using fallocate/posix_fallocate. > What's the right thing to do here? Is fdatasync() sufficient? Yes. ^ permalink raw reply [flat|nested] 43+ messages in thread
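(For reference, the pattern recommended above boils down to completing the data writes
and then issuing fdatasync() before acknowledging the guest's flush request. A minimal,
self-contained sketch of that pattern follows; it is an illustration, not the actual
tools/kvm or QEMU code.)

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Write data with plain pwrite(), then make it durable with fdatasync()
 * before completing the guest's flush request.  Unlike sync_file_range(),
 * fdatasync() also commits the metadata needed to reach the data and
 * flushes the disk write cache. */
static int write_then_flush(int fd, const void *buf, size_t len, off_t off)
{
	if (pwrite(fd, buf, len, off) != (ssize_t) len)
		return -1;
	return fdatasync(fd);
}

int main(void)
{
	char buf[4096] = "guest data";
	int fd = open("test.img", O_RDWR | O_CREAT, 0644);

	if (fd < 0 || write_then_flush(fd, buf, sizeof(buf), 0) < 0) {
		perror("write_then_flush");
		return EXIT_FAILURE;
	}
	close(fd);
	return EXIT_SUCCESS;
}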
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-16 9:48 ` Christoph Hellwig @ 2011-06-16 9:57 ` Ingo Molnar 2011-06-16 9:57 ` Pekka Enberg 1 sibling, 0 replies; 43+ messages in thread From: Ingo Molnar @ 2011-06-16 9:57 UTC (permalink / raw) To: Christoph Hellwig Cc: Pekka Enberg, Anthony Liguori, Alexander Graf, Prasad Joshi, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe * Christoph Hellwig <hch@infradead.org> wrote: > > What's the right thing to do here? Is fdatasync() sufficient? > > Yes. Prasad, Pekka, mind redoing the numbers with fdatasync()? I'd be surprised if they were significantly worse but it has to be done to have apples-to-apples numbers. Thanks, Ingo ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-16 9:48 ` Christoph Hellwig 2011-06-16 9:57 ` Ingo Molnar @ 2011-06-16 9:57 ` Pekka Enberg 2011-06-16 10:02 ` Christoph Hellwig 1 sibling, 1 reply; 43+ messages in thread From: Pekka Enberg @ 2011-06-16 9:57 UTC (permalink / raw) To: Christoph Hellwig Cc: Anthony Liguori, Alexander Graf, Prasad Joshi, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Ingo Molnar, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe On Thu, Jun 16, 2011 at 12:48 PM, Christoph Hellwig <hch@infradead.org> wrote: > You also missed: > > " This system call does not flush disk write caches and thus does not > provide any data integrity on systems with volatile disk write > caches." > > so it's not safe if you either have a cache, or are using btrfs, or > are using a sparse image, or are using an image preallocated using > fallocate/posix_fallocate. Uh-oh. Someone needs to apply this patch to sync_file_range(): diff --git a/fs/sync.c b/fs/sync.c index ba76b96..32078aa 100644 --- a/fs/sync.c +++ b/fs/sync.c @@ -277,6 +277,8 @@ SYSCALL_DEFINE(sync_file_range)(int fd, loff_t offset, loff_t nbytes, int fput_needed; umode_t i_mode; + WARN_ONCE(1, "when this breaks, you get to keep both pieces"); + ret = -EINVAL; if (flags & ~VALID_FLAGS) goto out; >> What's the right thing to do here? Is fdatasync() sufficient? > > Yes. We'll fix that up. Thanks Christoph! Pekka ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-16 9:57 ` Pekka Enberg @ 2011-06-16 10:02 ` Christoph Hellwig 2011-06-16 11:22 ` Ingo Molnar 0 siblings, 1 reply; 43+ messages in thread From: Christoph Hellwig @ 2011-06-16 10:02 UTC (permalink / raw) To: Pekka Enberg Cc: Christoph Hellwig, Anthony Liguori, Alexander Graf, Prasad Joshi, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Ingo Molnar, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe On Thu, Jun 16, 2011 at 12:57:36PM +0300, Pekka Enberg wrote: > Uh-oh. Someone needs to apply this patch to sync_file_range(): There actually are a few cases where using it makes sense. It's just the minority. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-16 10:02 ` Christoph Hellwig @ 2011-06-16 11:22 ` Ingo Molnar 2011-06-16 11:25 ` Christoph Hellwig 2011-06-17 7:21 ` Jeff Garzik 0 siblings, 2 replies; 43+ messages in thread From: Ingo Molnar @ 2011-06-16 11:22 UTC (permalink / raw) To: Christoph Hellwig Cc: Pekka Enberg, Anthony Liguori, Alexander Graf, Prasad Joshi, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe * Christoph Hellwig <hch@infradead.org> wrote: > On Thu, Jun 16, 2011 at 12:57:36PM +0300, Pekka Enberg wrote: > > Uh-oh. Someone needs to apply this patch to sync_file_range(): > > There actually are a few cases where using it makes sense. [...] Such as? I don't think apps can actually know whether disk blocks have been 'instantiated' by a particular filesystem or not, so the manpage: Some details None of these operations write out the file’s metadata. Therefore, unless the appli- cation is strictly performing overwrites of already-instantiated disk blocks, there are no guarantees that the data will be available after a crash. is rather misleading. This is a dangerous (and rather pointless) syscall and this should be made much clearer in the manpage. Thanks, Ingo ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-16 11:22 ` Ingo Molnar @ 2011-06-16 11:25 ` Christoph Hellwig 2011-06-16 11:40 ` Ingo Molnar 2011-06-17 7:21 ` Jeff Garzik 1 sibling, 1 reply; 43+ messages in thread From: Christoph Hellwig @ 2011-06-16 11:25 UTC (permalink / raw) To: Ingo Molnar Cc: Christoph Hellwig, Pekka Enberg, Anthony Liguori, Alexander Graf, Prasad Joshi, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe On Thu, Jun 16, 2011 at 01:22:30PM +0200, Ingo Molnar wrote: > Such as? I don't think apps can actually know whether disk blocks > have been 'instantiated' by a particular filesystem or not, so the > manpage: In general they can't. The only good use case for sync_file_range is to paper over^H^H^H^H^H^H^H^H^Hcontrol write back behaviour. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-16 11:25 ` Christoph Hellwig @ 2011-06-16 11:40 ` Ingo Molnar 2011-06-16 11:51 ` Christoph Hellwig 0 siblings, 1 reply; 43+ messages in thread From: Ingo Molnar @ 2011-06-16 11:40 UTC (permalink / raw) To: Christoph Hellwig Cc: Pekka Enberg, Anthony Liguori, Alexander Graf, Prasad Joshi, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe * Christoph Hellwig <hch@infradead.org> wrote: > On Thu, Jun 16, 2011 at 01:22:30PM +0200, Ingo Molnar wrote: > > Such as? I don't think apps can actually know whether disk blocks > > have been 'instantiated' by a particular filesystem or not, so > > the manpage: > > In general they can't. The only good use case for sync_file_range > is to paper over^H^H^H^H^H^H^H^H^Hcontrol write back behaviour. Well, if overwrite is fundamentally safe on a filesystem (which is most of them) then sync_file_range() would work - and it has the big advantage that it's a pretty simple facility. Filesystems that cannot guarantee that should map their sync_file_range() implementation to fdatasync() or so, right? Thanks, Ingo ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-16 11:40 ` Ingo Molnar @ 2011-06-16 11:51 ` Christoph Hellwig 0 siblings, 0 replies; 43+ messages in thread From: Christoph Hellwig @ 2011-06-16 11:51 UTC (permalink / raw) To: Ingo Molnar Cc: Christoph Hellwig, Pekka Enberg, Anthony Liguori, Alexander Graf, Prasad Joshi, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe On Thu, Jun 16, 2011 at 01:40:45PM +0200, Ingo Molnar wrote: > Filesystems that cannot guarantee that should map their > sync_file_range() implementation to fdatasync() or so, right? Filesystems aren't even told about sync_file_range, it's purely a VM thing, which is the root of the problem. In-kernel we have all the infrastructure for a real ranged fsync/fdatasync, and once we get a killer users for that can triviall export it at the syscall level. I don't think mapping sync_file_range with it's weird set of flags and confusing behaviour to it is a good idea, though. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-16 11:22 ` Ingo Molnar 2011-06-16 11:25 ` Christoph Hellwig @ 2011-06-17 7:21 ` Jeff Garzik 1 sibling, 0 replies; 43+ messages in thread From: Jeff Garzik @ 2011-06-17 7:21 UTC (permalink / raw) To: Ingo Molnar Cc: Christoph Hellwig, Pekka Enberg, Anthony Liguori, Alexander Graf, Prasad Joshi, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe On 06/16/2011 07:22 AM, Ingo Molnar wrote: > > * Christoph Hellwig<hch@infradead.org> wrote: > >> On Thu, Jun 16, 2011 at 12:57:36PM +0300, Pekka Enberg wrote: >>> Uh-oh. Someone needs to apply this patch to sync_file_range(): >> >> There actually are a few cases where using it makes sense. [...] > > Such as? I don't think apps can actually know whether disk blocks > have been 'instantiated' by a particular filesystem or not, so the > manpage: > > Some details > None of these operations write out the file’s metadata. Therefore, unless the appli- > cation is strictly performing overwrites of already-instantiated disk blocks, there > are no guarantees that the data will be available after a crash. > > is rather misleading. This is a dangerous (and rather pointless) > syscall and this should be made much clearer in the manpage. Not pointless at all -- see Linus's sync_file_range() examples in "Re: Unexpected splice "always copy" behavior observed" thread from May 2010. Apps like MythTV may use it for streaming data to disk, basically shoving the VM out of the way to give the app more fine-grained writeout control. Just don't mistake sync_file_range() for a data integrity syscall. Jeff ^ permalink raw reply [flat|nested] 43+ messages in thread
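(The writeout-control use described above looks roughly like the sketch below: start
asynchronous writeback for the chunk just written and wait on the chunk before it, so
dirty page cache stays bounded. Illustrative only and, as discussed, not a data
integrity mechanism.)

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define CHUNK (8 * 1024 * 1024)

/* Writeback throttling for streaming writes -- not a durability mechanism. */
static void throttle_writeback(int fd, off_t off)
{
	/* kick off asynchronous writeback of the chunk just written */
	sync_file_range(fd, off, CHUNK, SYNC_FILE_RANGE_WRITE);
	if (off >= (off_t) CHUNK)
		/* wait for the previous chunk so dirty memory stays bounded */
		sync_file_range(fd, off - CHUNK, CHUNK,
				SYNC_FILE_RANGE_WAIT_BEFORE |
				SYNC_FILE_RANGE_WRITE |
				SYNC_FILE_RANGE_WAIT_AFTER);
}

int main(void)
{
	static char buf[CHUNK];
	int fd = open("stream.out", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	off_t off;

	memset(buf, 'x', sizeof(buf));
	for (off = 0; off < 10 * (off_t) CHUNK; off += CHUNK) {
		if (pwrite(fd, buf, CHUNK, off) != CHUNK)
			break;
		throttle_writeback(fd, off);
	}
	close(fd);
	return 0;
}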
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-15 22:07 ` Alexander Graf 2011-06-15 22:20 ` Anthony Liguori @ 2011-06-16 5:45 ` Pekka Enberg 2011-06-16 7:24 ` Ingo Molnar 1 sibling, 1 reply; 43+ messages in thread From: Pekka Enberg @ 2011-06-16 5:45 UTC (permalink / raw) To: Alexander Graf Cc: Anthony Liguori, Prasad Joshi, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Ingo Molnar, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe On Thu, Jun 16, 2011 at 1:07 AM, Alexander Graf <agraf@suse.de> wrote: >> qemu-system-x86_64 -drive file=/dev/shm/test.qcow2,cache=writeback,if=virtio > > Wouldn't this still be using threaded AIO mode? I thought KVM tools used native AIO? We don't use AIO at all. It's just normal read()/write() with a thread pool. I actually looked at AIO but didn't really see why we'd want to use it. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-16 5:45 ` Pekka Enberg @ 2011-06-16 7:24 ` Ingo Molnar 2011-06-16 7:33 ` Pekka Enberg 2011-06-16 9:09 ` Stefan Hajnoczi 0 siblings, 2 replies; 43+ messages in thread From: Ingo Molnar @ 2011-06-16 7:24 UTC (permalink / raw) To: Pekka Enberg Cc: Alexander Graf, Anthony Liguori, Prasad Joshi, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe * Pekka Enberg <penberg@kernel.org> wrote: > On Thu, Jun 16, 2011 at 1:07 AM, Alexander Graf <agraf@suse.de> wrote: > >> qemu-system-x86_64 -drive file=/dev/shm/test.qcow2,cache=writeback,if=virtio > > > > Wouldn't this still be using threaded AIO mode? I thought KVM tools used native AIO? > > We don't use AIO at all. It's just normal read()/write() with a > thread pool. I actually looked at AIO but didn't really see why > we'd want to use it. We could certainly try kernel AIO, it would allow us to do all the virtio-blk logic from the vcpu thread, without single threading it - turning the QCOW2 logic into an AIO driven state machine in essence. Advantages: - we wouldnt do context-switching between the vcpu thread and the helper threads - it would potentially provide tighter caching and potentially would allow higher scalability. Disadvantages: - the kaio codepaths are actually *more* complex than the regular read()/write() IO codepaths - they keep track of an 'IO context', so part of the efficiency advantages are spent on AIO tracking. - executing AIO in the vcpu thread eats up precious vcpu execution time: combined QCOW2 throughput would be limited by a single core's performance, and any time spent on QCOW2 processing would not be spent running the guest CPU. (In such a model we certainly couldnt do more intelligent, CPU-intense storage solutions like on the fly compress/decompress of QCOW2 data.) - state machines are also fragile in the sense that any unintentional blocking of the vcpu context will kill the performance and latencies of *all* processing in certain circumstances. So we generally strive to keep the vcpu demux path obvious, simple and atomic. - more advanced security models go out the window as well: we couldnt isolate drivers from each other if all of them execute in the same vcpu context ... - state machines are also notoriously difficult to develop, debug and maintain. So careful performance, scalability, IO delay and maintainability measurements have to accompany such a model switch, because the disadvantages are numerous. I'd only consider KAIO it if it provides some *real* measurable performance advantage of at least 10% in some important usecase. A few percent probably wouldnt be worth it. Thanks, Ingo ^ permalink raw reply [flat|nested] 43+ messages in thread
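(For readers who have not used the kernel AIO interface weighed above, a bare-bones
libaio round trip looks like the sketch below; illustrative only, error handling
omitted, link with -laio.)

#define _GNU_SOURCE
#include <fcntl.h>
#include <libaio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	io_context_t ctx = 0;
	struct iocb cb, *cbs[1] = { &cb };
	struct io_event ev;
	void *buf;
	int fd = open("test.img", O_RDONLY | O_DIRECT);

	if (fd < 0 || posix_memalign(&buf, 512, 4096))
		return 1;

	io_setup(8, &ctx);			/* one submission context */
	io_prep_pread(&cb, fd, buf, 4096, 0);	/* queue a 4 KiB read at offset 0 */
	io_submit(ctx, 1, cbs);			/* runs in the submitting (vcpu) thread */
	io_getevents(ctx, 1, 1, &ev, NULL);	/* reap the completion */

	io_destroy(ctx);
	close(fd);
	free(buf);
	return 0;
}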
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-16 7:24 ` Ingo Molnar @ 2011-06-16 7:33 ` Pekka Enberg 2011-06-16 8:07 ` Ingo Molnar 2011-06-16 9:09 ` Stefan Hajnoczi 1 sibling, 1 reply; 43+ messages in thread From: Pekka Enberg @ 2011-06-16 7:33 UTC (permalink / raw) To: Ingo Molnar Cc: Alexander Graf, Anthony Liguori, Prasad Joshi, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe Hi Ingo, On Thu, Jun 16, 2011 at 10:24 AM, Ingo Molnar <mingo@elte.hu> wrote: > - executing AIO in the vcpu thread eats up precious vcpu execution > time: combined QCOW2 throughput would be limited by a single > core's performance, and any time spent on QCOW2 processing would > not be spent running the guest CPU. (In such a model we certainly > couldnt do more intelligent, CPU-intense storage solutions like on > the fly compress/decompress of QCOW2 data.) Most image formats have optional on-the-fly compression/decompression so we'd need to keep the current I/O thread scheme anyway. > I'd only consider KAIO it if it provides some *real* measurable > performance advantage of at least 10% in some important usecase. > A few percent probably wouldnt be worth it. I've only been following AIO kernel development from the sidelines but I really haven't seen any reports of significant gains over read()/write() from a thread pool. Are there any such reports? Pekka ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-16 7:33 ` Pekka Enberg @ 2011-06-16 8:07 ` Ingo Molnar 0 siblings, 0 replies; 43+ messages in thread From: Ingo Molnar @ 2011-06-16 8:07 UTC (permalink / raw) To: Pekka Enberg Cc: Alexander Graf, Anthony Liguori, Prasad Joshi, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe * Pekka Enberg <penberg@kernel.org> wrote: > Hi Ingo, > > On Thu, Jun 16, 2011 at 10:24 AM, Ingo Molnar <mingo@elte.hu> wrote: > > - executing AIO in the vcpu thread eats up precious vcpu execution > > time: combined QCOW2 throughput would be limited by a single > > core's performance, and any time spent on QCOW2 processing would > > not be spent running the guest CPU. (In such a model we certainly > > couldnt do more intelligent, CPU-intense storage solutions like on > > the fly compress/decompress of QCOW2 data.) > > Most image formats have optional on-the-fly > compression/decompression so we'd need to keep the current I/O > thread scheme anyway. Yeah - although high-performance setups will probably not use that. > > I'd only consider KAIO it if it provides some *real* measurable > > performance advantage of at least 10% in some important usecase. > > A few percent probably wouldnt be worth it. > > I've only been following AIO kernel development from the sidelines > but I really haven't seen any reports of significant gains over > read()/write() from a thread pool. Are there any such reports? I've measured such gains myself a couple of years ago, using an Oracle DB and a well-known OLTP benchmark, on a 64-way system. I also profiled+tuned the kernel-side AIO implementation to be more scalable so i'm reasonably certain that the gains exist, and they were above 10%. So the kaio gains existed back then but they needed sane userspace (POSIX AIO with signal notification sucks) and needed a well-tuned in-kernel implementation as well. (the current AIO code might have bitrotted) Also, synchronous read()/write() [and scheduler() :-)] scalability improvements have not stopped in the past few years so the performance picture might have shifted in favor of a thread pool. Thanks, Ingo ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2 2011-06-16 7:24 ` Ingo Molnar 2011-06-16 7:33 ` Pekka Enberg @ 2011-06-16 9:09 ` Stefan Hajnoczi 1 sibling, 0 replies; 43+ messages in thread From: Stefan Hajnoczi @ 2011-06-16 9:09 UTC (permalink / raw) To: Ingo Molnar Cc: Pekka Enberg, Alexander Graf, Anthony Liguori, Prasad Joshi, Avi Kivity, linux-kernel, kvm, Andrew Morton, Linus Torvalds, Sasha Levin, Cyrill Gorcunov, Asias He, Jens Axboe On Thu, Jun 16, 2011 at 8:24 AM, Ingo Molnar <mingo@elte.hu> wrote: > - executing AIO in the vcpu thread eats up precious vcpu execution > time: combined QCOW2 throughput would be limited by a single > core's performance, and any time spent on QCOW2 processing would > not be spent running the guest CPU. (In such a model we certainly > couldnt do more intelligent, CPU-intense storage solutions like on > the fly compress/decompress of QCOW2 data.) This has been a problem in qemu-kvm. io_submit(2) steals time from the guest (I think it was around 20us on the system I measured last year). Add the fact that the guest kernel might be holding a spinlock and it becomes a scalability problem for SMP guests. Anything that takes noticable CPU time should be done outside the vcpu thread. Stefan ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2
2011-06-15 22:04 ` Anthony Liguori
2011-06-15 22:07 ` Alexander Graf
@ 2011-06-16 5:29 ` Stefan Hajnoczi
2011-06-16 5:42 ` Pekka Enberg
1 sibling, 1 reply; 43+ messages in thread
From: Stefan Hajnoczi @ 2011-06-16 5:29 UTC (permalink / raw)
To: Prasad Joshi
Cc: Pekka Enberg, Avi Kivity, linux-kernel, kvm, Andrew Morton,
Linus Torvalds, Ingo Molnar, Sasha Levin, Cyrill Gorcunov, Asias He,
Jens Axboe, Anthony Liguori

On Wed, Jun 15, 2011 at 11:04 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 06/15/2011 03:13 PM, Prasad Joshi wrote:
>> On Wed, Jun 15, 2011 at 6:10 PM, Pekka Enberg<penberg@kernel.org> wrote:
>>> On Wed, Jun 15, 2011 at 7:30 PM, Avi Kivity<avi@redhat.com> wrote:
>>>> On 06/15/2011 06:53 PM, Pekka Enberg wrote:
>>>>> - Fast QCOW2 image read-write support beating Qemu in fio benchmarks.
>>>>> See the following URL for test result details:
>>>>> https://gist.github.com/1026888
>>>>
>>>> This is surprising. How is qemu invoked?
>>>
>>> Prasad will have the details. Please note that the above are with Qemu
>>> defaults which doesn't use virtio. The results with virtio are little
>>> better but still in favor of tools/kvm.
>>
>> The qcow2 image used for testing was copied on to /dev/shm to avoid
>> the disk delays in performance measurement.
>>
>> QEMU was invoked with following parameters
>>
>> $ qemu-system-x86_64 -hda<disk image on hard disk> -hdb
>> /dev/shm/test.qcow2 -m 1024M
>
> Looking more closely at native KVM tools, you would need to use the
> following invocation to have an apples-to-apples comparison:
>
> qemu-system-x86_64 -drive file=/dev/shm/test.qcow2,cache=writeback,if=virtio

In addition to this it is important to set identical guest RAM sizes
(QEMU's -m <ram_mb> option).

If you are comparing with qemu.git rather than qemu-kvm.git then you
need to ./configure --enable-io-thread and launch with QEMU's
-enable-kvm option.

Stefan

^ permalink raw reply	[flat|nested] 43+ messages in thread
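[Putting Anthony's and Stefan's suggestions together, an apples-to-apples
rerun would look something like the lines below. The 1024 MB figure is just
an example - the point is that both sides use the same image, the same guest
RAM size and virtio, with guest memory matched on the tools/kvm side through
whatever option it provides for that.]

# qemu-kvm (or qemu.git built with ./configure --enable-io-thread)
qemu-system-x86_64 -enable-kvm -m 1024 \
    -drive file=/dev/shm/test.qcow2,cache=writeback,if=virtio

# tools/kvm, same image (and the same guest RAM size)
./kvm run -d /dev/shm/test.qcow2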
* Re: [ANNOUNCE] Native Linux KVM tool v2
2011-06-16 5:29 ` Stefan Hajnoczi
@ 2011-06-16 5:42 ` Pekka Enberg
0 siblings, 0 replies; 43+ messages in thread
From: Pekka Enberg @ 2011-06-16 5:42 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Prasad Joshi, Avi Kivity, linux-kernel, kvm, Andrew Morton,
Linus Torvalds, Ingo Molnar, Sasha Levin, Cyrill Gorcunov, Asias He,
Jens Axboe, Anthony Liguori

On Thu, Jun 16, 2011 at 8:29 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Wed, Jun 15, 2011 at 11:04 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
>> On 06/15/2011 03:13 PM, Prasad Joshi wrote:
>>> On Wed, Jun 15, 2011 at 6:10 PM, Pekka Enberg<penberg@kernel.org> wrote:
>>>> On Wed, Jun 15, 2011 at 7:30 PM, Avi Kivity<avi@redhat.com> wrote:
>>>>> On 06/15/2011 06:53 PM, Pekka Enberg wrote:
>>>>>> - Fast QCOW2 image read-write support beating Qemu in fio benchmarks.
>>>>>> See the following URL for test result details:
>>>>>> https://gist.github.com/1026888
>>>>>
>>>>> This is surprising. How is qemu invoked?
>>>>
>>>> Prasad will have the details. Please note that the above are with Qemu
>>>> defaults which doesn't use virtio. The results with virtio are little
>>>> better but still in favor of tools/kvm.
>>>
>>> The qcow2 image used for testing was copied on to /dev/shm to avoid
>>> the disk delays in performance measurement.
>>>
>>> QEMU was invoked with following parameters
>>>
>>> $ qemu-system-x86_64 -hda<disk image on hard disk> -hdb
>>> /dev/shm/test.qcow2 -m 1024M
>>
>> Looking more closely at native KVM tools, you would need to use the
>> following invocation to have an apples-to-apples comparison:
>>
>> qemu-system-x86_64 -drive file=/dev/shm/test.qcow2,cache=writeback,if=virtio
>
> In addition to this it is important to set identical guest RAM sizes
> (QEMU's -m <ram_mb> option).

Right.

> If you are comparing with qemu.git rather than qemu-kvm.git then you
> need to ./configure --enable-io-thread and launch with QEMU's
> -enable-kvm option.

I think Prasad is testing qemu-kvm.git.

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2
2011-06-15 15:53 [ANNOUNCE] Native Linux KVM tool v2 Pekka Enberg
2011-06-15 16:30 ` Avi Kivity
@ 2011-06-15 21:41 ` Anthony Liguori
2011-06-16 14:28 ` Michael S. Tsirkin
` (2 subsequent siblings)
4 siblings, 0 replies; 43+ messages in thread
From: Anthony Liguori @ 2011-06-15 21:41 UTC (permalink / raw)
To: Pekka Enberg
Cc: linux-kernel, kvm, Avi Kivity, Andrew Morton, Linus Torvalds,
Ingo Molnar, Prasad Joshi, Sasha Levin, Cyrill Gorcunov, Asias He

On 06/15/2011 10:53 AM, Pekka Enberg wrote:
> Hi all,
>
> We’re proud to announce the second version of the Native Linux KVM tool! We’re
> now officially aiming for merging to mainline in 3.1.
>
> Highlights:
>
> - Experimental GUI support using SDL and VNC
>
> - SMP support. tools/kvm/ now has a highly scalable, largely lockless driver
>   interface and the individual drivers are using finegrained locks.
>
> - TAP-based virtio networking
>
> - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See the
>   following URL for test result details: https://gist.github.com/1026888

What was the commit hash for the QEMU you tested?  The following caused
a major regression in qcow2:

commit a16c53b101a9897b0b2be96a1bb3bde7c04380f2
Author: Anthony Liguori <aliguori@us.ibm.com>
Date:   Mon Jun 6 08:25:06 2011 -0500

    Fix regression introduced by -machine accel=

    Commit 85097db6 changed the timing when kvm_allowed is set until after
    kvm is initialized.  During initialization, the ioeventfd initialization
    code checks kvm_enabled() and after this change, ioeventfd is effectively
    disabled.

If it's not in your tree, it would be useful to rerun the test with the
latest git.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 43+ messages in thread
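[For anyone rerunning the numbers, a quick way to check whether the QEMU tree
that was benchmarked already contains that fix - shown here only as a sketch,
run from the benchmarked QEMU source tree:]

# list local branches that already contain the fix
git branch --contains a16c53b101a9897b0b2be96a1bb3bde7c04380f2

# or ask which tag first shipped it (fails if the commit is absent)
git describe --contains a16c53b101a9897b0b2be96a1bb3bde7c04380f2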
* Re: [ANNOUNCE] Native Linux KVM tool v2
2011-06-15 15:53 [ANNOUNCE] Native Linux KVM tool v2 Pekka Enberg
2011-06-15 16:30 ` Avi Kivity
2011-06-15 21:41 ` Anthony Liguori
@ 2011-06-16 14:28 ` Michael S. Tsirkin
2011-06-16 15:01 ` Asias He
2011-06-16 14:48 ` Pekka Enberg
2011-06-17 7:31 ` justin
4 siblings, 1 reply; 43+ messages in thread
From: Michael S. Tsirkin @ 2011-06-16 14:28 UTC (permalink / raw)
To: Pekka Enberg
Cc: linux-kernel, kvm, Avi Kivity, Andrew Morton, Linus Torvalds,
Ingo Molnar, Prasad Joshi, Sasha Levin, Cyrill Gorcunov, Asias He

On Wed, Jun 15, 2011 at 06:53:34PM +0300, Pekka Enberg wrote:
> Hi all,
>
> We’re proud to announce the second version of the Native Linux KVM tool! We’re
> now officially aiming for merging to mainline in 3.1.
>
> Highlights:
>
> - Experimental GUI support using SDL and VNC
>
> - SMP support. tools/kvm/ now has a highly scalable, largely lockless driver
>   interface and the individual drivers are using finegrained locks.
>
> - TAP-based virtio networking

Wanted to ask for a while: would it make sense to use vhost-net?
Or maybe use that exclusively?
Less hypervisor code to support would help the focus.

-- 
MST

^ permalink raw reply	[flat|nested] 43+ messages in thread
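[For context, wiring a virtio-net device to vhost-net means handing the vring
and the TAP file descriptor to the kernel instead of copying packets in
userspace. The sketch below shows the rough shape of the ioctl sequence; it
is illustrative only, not tools/kvm code, the function and variable names are
made up, and the memory-table and vring setup calls are elided.]

/* rough sketch of attaching a TAP fd to vhost-net; not complete */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

static void attach_vhost_net(int tap_fd, unsigned int queue_index)
{
	struct vhost_vring_file backend;
	int vhost_fd = open("/dev/vhost-net", O_RDWR);

	ioctl(vhost_fd, VHOST_SET_OWNER, NULL);	/* claim the device for this process */

	/* ...VHOST_SET_MEM_TABLE and the VHOST_SET_VRING_* calls would go here,
	 *    so the kernel can translate guest addresses and find the vring... */

	backend.index = queue_index;
	backend.fd = tap_fd;
	ioctl(vhost_fd, VHOST_NET_SET_BACKEND, &backend);	/* kernel now moves the packets */
}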
* Re: [ANNOUNCE] Native Linux KVM tool v2
2011-06-16 14:28 ` Michael S. Tsirkin
@ 2011-06-16 15:01 ` Asias He
2011-06-19 8:15 ` Michael S. Tsirkin
0 siblings, 1 reply; 43+ messages in thread
From: Asias He @ 2011-06-16 15:01 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Pekka Enberg, linux-kernel, kvm, Avi Kivity, Andrew Morton,
Linus Torvalds, Ingo Molnar, Prasad Joshi, Sasha Levin, Cyrill Gorcunov

On 06/16/2011 10:28 PM, Michael S. Tsirkin wrote:
> On Wed, Jun 15, 2011 at 06:53:34PM +0300, Pekka Enberg wrote:
>> Hi all,
>>
>> We’re proud to announce the second version of the Native Linux KVM tool! We’re
>> now officially aiming for merging to mainline in 3.1.
>>
>> Highlights:
>>
>> - Experimental GUI support using SDL and VNC
>>
>> - SMP support. tools/kvm/ now has a highly scalable, largely lockless driver
>>   interface and the individual drivers are using finegrained locks.
>>
>> - TAP-based virtio networking
>
> Wanted to ask for a while: would it make sense to use vhost-net?
> Or maybe use that exclusively?
> Less hypervisor code to support would help the focus.

Sure. We are planning to use vhost-net. We're just out of time right
now: we are currently working on simple user-mode network support which
allows a plain user to use the network without root privileges.

-- 
Best Regards,
Asias He

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2
2011-06-16 15:01 ` Asias He
@ 2011-06-19 8:15 ` Michael S. Tsirkin
0 siblings, 0 replies; 43+ messages in thread
From: Michael S. Tsirkin @ 2011-06-19 8:15 UTC (permalink / raw)
To: Asias He
Cc: Pekka Enberg, linux-kernel, kvm, Avi Kivity, Andrew Morton,
Linus Torvalds, Ingo Molnar, Prasad Joshi, Sasha Levin, Cyrill Gorcunov

On Thu, Jun 16, 2011 at 11:01:22PM +0800, Asias He wrote:
> On 06/16/2011 10:28 PM, Michael S. Tsirkin wrote:
> > On Wed, Jun 15, 2011 at 06:53:34PM +0300, Pekka Enberg wrote:
> >> Hi all,
> >>
> >> We’re proud to announce the second version of the Native Linux KVM tool! We’re
> >> now officially aiming for merging to mainline in 3.1.
> >>
> >> Highlights:
> >>
> >> - Experimental GUI support using SDL and VNC
> >>
> >> - SMP support. tools/kvm/ now has a highly scalable, largely lockless driver
> >>   interface and the individual drivers are using finegrained locks.
> >>
> >> - TAP-based virtio networking
> >
> > Wanted to ask for a while: would it make sense to use vhost-net?
> > Or maybe use that exclusively?
> > Less hypervisor code to support would help the focus.
>
> Sure. We are planning to use vhost-net. We're just out of time right
> now: we are currently working on simple user-mode network support which
> allows a plain user to use the network without root privileges.

Yes, qemu does this by implementing NAT and the TCP stack in userspace.
What always made me unhappy about this solution is that we have a
perfectly fine NAT and TCP in the kernel; we just lack APIs to let an
unprivileged user make use of it the way we want. I hope you can avoid
this duplication.

Another question is whether you want to implement a DHCP server.

-- 
MST

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2
2011-06-15 15:53 [ANNOUNCE] Native Linux KVM tool v2 Pekka Enberg
` (2 preceding siblings ...)
2011-06-16 14:28 ` Michael S. Tsirkin
@ 2011-06-16 14:48 ` Pekka Enberg
2011-06-16 22:50 ` Anthony Liguori
2011-06-17 5:11 ` Stefan Hajnoczi
4 siblings, 2 replies; 43+ messages in thread
From: Pekka Enberg @ 2011-06-16 14:48 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Avi Kivity, Andrew Morton, Linus Torvalds, Ingo Molnar,
Prasad Joshi, Sasha Levin, Cyrill Gorcunov, Asias He

On Wed, Jun 15, 2011 at 6:53 PM, Pekka Enberg <penberg@kernel.org> wrote:
> - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See the
>   following URL for test result details: https://gist.github.com/1026888

It turns out we were benchmarking the wrong guest kernel version for
qemu-kvm which is why it performed so much worse. Here's a summary of
qemu-kvm beating tools/kvm:

https://raw.github.com/gist/1029359/9f9a714ecee64802c08a3455971e410d5029370b/gistfile1.txt

I'd ask for a brown paper bag if I wasn't so busy eating my hat at the moment.

			Pekka

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2
2011-06-16 14:48 ` Pekka Enberg
@ 2011-06-16 22:50 ` Anthony Liguori
2011-06-17 1:03 ` Sasha Levin
2011-06-17 5:11 ` Stefan Hajnoczi
1 sibling, 1 reply; 43+ messages in thread
From: Anthony Liguori @ 2011-06-16 22:50 UTC (permalink / raw)
To: Pekka Enberg
Cc: linux-kernel, kvm, Avi Kivity, Andrew Morton, Linus Torvalds,
Ingo Molnar, Prasad Joshi, Sasha Levin, Cyrill Gorcunov, Asias He

On 06/16/2011 09:48 AM, Pekka Enberg wrote:
> On Wed, Jun 15, 2011 at 6:53 PM, Pekka Enberg<penberg@kernel.org> wrote:
>> - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See the
>>   following URL for test result details: https://gist.github.com/1026888
>
> It turns out we were benchmarking the wrong guest kernel version for
> qemu-kvm which is why it performed so much worse. Here's a summary of
> qemu-kvm beating tools/kvm:
>
> https://raw.github.com/gist/1029359/9f9a714ecee64802c08a3455971e410d5029370b/gistfile1.txt
>
> I'd ask for a brown paper bag if I wasn't so busy eating my hat at the moment.

np, it happens.

Is that still with QEMU with IDE emulation, cache=writethrough, and
128MB of guest memory?

Does your raw driver support multiple parallel requests?  It doesn't
look like it does from how I read the code.  At some point, I'd be happy
to help ya'll do some benchmarking against QEMU.

It would be very useful to compare as we have some ugly things in QEMU
that we've never quite been able to determine how much they affect
performance.  Having an alternative implementation to benchmark against
would be quite helpful.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2
2011-06-16 22:50 ` Anthony Liguori
@ 2011-06-17 1:03 ` Sasha Levin
2011-06-17 5:00 ` Stefan Hajnoczi
2011-06-17 13:45 ` Anthony Liguori
0 siblings, 2 replies; 43+ messages in thread
From: Sasha Levin @ 2011-06-17 1:03 UTC (permalink / raw)
To: Anthony Liguori
Cc: Pekka Enberg, linux-kernel, kvm, Avi Kivity, Andrew Morton,
Linus Torvalds, Ingo Molnar, Prasad Joshi, Cyrill Gorcunov, Asias He

On Thu, 2011-06-16 at 17:50 -0500, Anthony Liguori wrote:
> On 06/16/2011 09:48 AM, Pekka Enberg wrote:
> > On Wed, Jun 15, 2011 at 6:53 PM, Pekka Enberg<penberg@kernel.org> wrote:
> >> - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See the
> >>   following URL for test result details: https://gist.github.com/1026888
> >
> > It turns out we were benchmarking the wrong guest kernel version for
> > qemu-kvm which is why it performed so much worse. Here's a summary of
> > qemu-kvm beating tools/kvm:
> >
> > https://raw.github.com/gist/1029359/9f9a714ecee64802c08a3455971e410d5029370b/gistfile1.txt
> >
> > I'd ask for a brown paper bag if I wasn't so busy eating my hat at the moment.
>
> np, it happens.
>
> Is that still with QEMU with IDE emulation, cache=writethrough, and
> 128MB of guest memory?
>
> Does your raw driver support multiple parallel requests?  It doesn't
> look like it does from how I read the code.  At some point, I'd be happy
> to help ya'll do some benchmarking against QEMU.
>

Each virtio-blk device can process requests regardless of other
virtio-blk devices, which means that we can do parallel requests for
devices.

Within each device, we support parallel requests in the sense that we do
vectored IO for each head (which may contain multiple blocks) in the
vring, we don't do multiple heads because when I've tried adding AIO
I've noticed that at most there are 2-3 possible heads - and since it
points to the same device it doesn't really help running them in
parallel.

> It would be very useful to compare as we have some ugly things in QEMU
> that we've never quite been able to determine how much they affect
> performance.  Having an alternative implementation to benchmark against
> would be quite helpful.

-- 
Sasha.

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2
2011-06-17 1:03 ` Sasha Levin
@ 2011-06-17 5:00 ` Stefan Hajnoczi
2011-06-17 13:41 ` Sasha Levin
1 sibling, 1 reply; 43+ messages in thread
From: Stefan Hajnoczi @ 2011-06-17 5:00 UTC (permalink / raw)
To: Sasha Levin
Cc: Anthony Liguori, Pekka Enberg, linux-kernel, kvm, Avi Kivity,
Andrew Morton, Linus Torvalds, Ingo Molnar, Prasad Joshi,
Cyrill Gorcunov, Asias He

On Fri, Jun 17, 2011 at 2:03 AM, Sasha Levin <levinsasha928@gmail.com> wrote:
> On Thu, 2011-06-16 at 17:50 -0500, Anthony Liguori wrote:
>> On 06/16/2011 09:48 AM, Pekka Enberg wrote:
>> > On Wed, Jun 15, 2011 at 6:53 PM, Pekka Enberg<penberg@kernel.org> wrote:
>> >> - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See the
>> >>   following URL for test result details: https://gist.github.com/1026888
>> >
>> > It turns out we were benchmarking the wrong guest kernel version for
>> > qemu-kvm which is why it performed so much worse. Here's a summary of
>> > qemu-kvm beating tools/kvm:
>> >
>> > https://raw.github.com/gist/1029359/9f9a714ecee64802c08a3455971e410d5029370b/gistfile1.txt
>> >
>> > I'd ask for a brown paper bag if I wasn't so busy eating my hat at the moment.
>>
>> np, it happens.
>>
>> Is that still with QEMU with IDE emulation, cache=writethrough, and
>> 128MB of guest memory?
>>
>> Does your raw driver support multiple parallel requests?  It doesn't
>> look like it does from how I read the code.  At some point, I'd be happy
>> to help ya'll do some benchmarking against QEMU.
>>
>
> Each virtio-blk device can process requests regardless of other
> virtio-blk devices, which means that we can do parallel requests for
> devices.
>
> Within each device, we support parallel requests in the sense that we do
> vectored IO for each head (which may contain multiple blocks) in the
> vring, we don't do multiple heads because when I've tried adding AIO
> I've noticed that at most there are 2-3 possible heads - and since it
> points to the same device it doesn't really help running them in
> parallel.

One thing that QEMU does but I'm a little suspicious of is request
merging.  virtio-blk will submit those 2-3 heads using
bdrv_aio_multiwrite() if they become available in the same virtqueue
notify.  The requests will be merged if possible.

My feeling is that we should already have merged requests coming
through virtio-blk and there should be no need to do any merging -
which could be a workaround for a poor virtio-blk vring configuration
that prevented the guest from sending large requests.  However, this
feature did yield performance improvements with qcow2 image files when
it was introduced, so that would be interesting to look at.

Are you enabling indirect descriptors on the virtio-blk vring?  That
should allow more requests to be made available because you don't run
out of vring descriptors so easily.

Stefan

^ permalink raw reply	[flat|nested] 43+ messages in thread
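[For reference, "enabling indirect descriptors" on the host side is largely a
matter of advertising the ring feature bit so the guest may negotiate it. The
snippet below is only a generic sketch of that one step; the device/feature
plumbing around it is whatever the hypervisor already has, and the function
name is made up.]

/* sketch: advertise indirect descriptors for a virtio-blk queue */
#include <stdint.h>
#include <linux/virtio_ring.h>

static uint32_t blk_host_features(uint32_t existing_features)
{
	/* VIRTIO_RING_F_INDIRECT_DESC lets the guest put a whole descriptor
	 * chain into one table referenced by a single ring slot, so large
	 * requests stop eating vring descriptors. */
	return existing_features | (1u << VIRTIO_RING_F_INDIRECT_DESC);
}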
* Re: [ANNOUNCE] Native Linux KVM tool v2
2011-06-17 5:00 ` Stefan Hajnoczi
@ 2011-06-17 13:41 ` Sasha Levin
0 siblings, 0 replies; 43+ messages in thread
From: Sasha Levin @ 2011-06-17 13:41 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Anthony Liguori, Pekka Enberg, linux-kernel, kvm, Avi Kivity,
Andrew Morton, Linus Torvalds, Ingo Molnar, Prasad Joshi,
Cyrill Gorcunov, Asias He

On Fri, 2011-06-17 at 06:00 +0100, Stefan Hajnoczi wrote:
> On Fri, Jun 17, 2011 at 2:03 AM, Sasha Levin <levinsasha928@gmail.com> wrote:
> > On Thu, 2011-06-16 at 17:50 -0500, Anthony Liguori wrote:
> >> On 06/16/2011 09:48 AM, Pekka Enberg wrote:
> >> > On Wed, Jun 15, 2011 at 6:53 PM, Pekka Enberg<penberg@kernel.org> wrote:
> >> >> - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See the
> >> >>   following URL for test result details: https://gist.github.com/1026888
> >> >
> >> > It turns out we were benchmarking the wrong guest kernel version for
> >> > qemu-kvm which is why it performed so much worse. Here's a summary of
> >> > qemu-kvm beating tools/kvm:
> >> >
> >> > https://raw.github.com/gist/1029359/9f9a714ecee64802c08a3455971e410d5029370b/gistfile1.txt
> >> >
> >> > I'd ask for a brown paper bag if I wasn't so busy eating my hat at the moment.
> >>
> >> np, it happens.
> >>
> >> Is that still with QEMU with IDE emulation, cache=writethrough, and
> >> 128MB of guest memory?
> >>
> >> Does your raw driver support multiple parallel requests?  It doesn't
> >> look like it does from how I read the code.  At some point, I'd be happy
> >> to help ya'll do some benchmarking against QEMU.
> >>
> >
> > Each virtio-blk device can process requests regardless of other
> > virtio-blk devices, which means that we can do parallel requests for
> > devices.
> >
> > Within each device, we support parallel requests in the sense that we do
> > vectored IO for each head (which may contain multiple blocks) in the
> > vring, we don't do multiple heads because when I've tried adding AIO
> > I've noticed that at most there are 2-3 possible heads - and since it
> > points to the same device it doesn't really help running them in
> > parallel.
>
> One thing that QEMU does but I'm a little suspicious of is request
> merging.  virtio-blk will submit those 2-3 heads using
> bdrv_aio_multiwrite() if they become available in the same virtqueue
> notify.  The requests will be merged if possible.
>
> My feeling is that we should already have merged requests coming
> through virtio-blk and there should be no need to do any merging -
> which could be a workaround for a poor virtio-blk vring configuration
> that prevented the guest from sending large requests.  However, this
> feature did yield performance improvements with qcow2 image files when
> it was introduced, so that would be interesting to look at.
>
> Are you enabling indirect descriptors on the virtio-blk vring?  That
> should allow more requests to be made available because you don't run
> out of vring descriptors so easily.

No, but we're usually not getting close to running out of vring
descriptors either.

-- 
Sasha.

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2
2011-06-17 1:03 ` Sasha Levin
2011-06-17 5:00 ` Stefan Hajnoczi
@ 2011-06-17 13:45 ` Anthony Liguori
1 sibling, 0 replies; 43+ messages in thread
From: Anthony Liguori @ 2011-06-17 13:45 UTC (permalink / raw)
To: Sasha Levin
Cc: Pekka Enberg, linux-kernel, kvm, Avi Kivity, Andrew Morton,
Linus Torvalds, Ingo Molnar, Prasad Joshi, Cyrill Gorcunov, Asias He

On 06/16/2011 08:03 PM, Sasha Levin wrote:
> On Thu, 2011-06-16 at 17:50 -0500, Anthony Liguori wrote:
> Each virtio-blk device can process requests regardless of other
> virtio-blk devices, which means that we can do parallel requests for
> devices.
>
> Within each device, we support parallel requests in the sense that we do
> vectored IO for each head (which may contain multiple blocks) in the
> vring, we don't do multiple heads because when I've tried adding AIO

A scatter/gather list isn't multiple requests, it's just one.  So you
handle one request at a time ATM.  There's nothing wrong with that, but
there's no use in saying "we support it in the sense..." :-)

> I've noticed that at most there are 2-3 possible heads - and since it
> points to the same device it doesn't really help running them in
> parallel.

Sure it does.  If you use the host page cache (and you do), then if you
have two requests, A and B, and request A requires a disk access and
request B can be satisfied from the page cache, then being able to
submit both requests means that you can return B almost immediately
instead of stalling out to finish A before starting B.

Not to mention that modern disks work better with multiple in-flight
requests because they have their own caching and reordering algorithms
in the drive.  With RAID and higher end storage devices, a single
device may map to multiple spindles.  The only way to have them all
spin at once is to submit parallel requests.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2
2011-06-16 14:48 ` Pekka Enberg
2011-06-16 22:50 ` Anthony Liguori
@ 2011-06-17 5:11 ` Stefan Hajnoczi
1 sibling, 0 replies; 43+ messages in thread
From: Stefan Hajnoczi @ 2011-06-17 5:11 UTC (permalink / raw)
To: Pekka Enberg
Cc: linux-kernel, kvm, Avi Kivity, Andrew Morton, Linus Torvalds,
Ingo Molnar, Prasad Joshi, Sasha Levin, Cyrill Gorcunov, Asias He

On Thu, Jun 16, 2011 at 3:48 PM, Pekka Enberg <penberg@kernel.org> wrote:
> On Wed, Jun 15, 2011 at 6:53 PM, Pekka Enberg <penberg@kernel.org> wrote:
>> - Fast QCOW2 image read-write support beating Qemu in fio benchmarks. See the
>>   following URL for test result details: https://gist.github.com/1026888
>
> It turns out we were benchmarking the wrong guest kernel version for
> qemu-kvm which is why it performed so much worse. Here's a summary of
> qemu-kvm beating tools/kvm:
>
> https://raw.github.com/gist/1029359/9f9a714ecee64802c08a3455971e410d5029370b/gistfile1.txt

Thanks for digging into the results so quickly and rerunning.

Stefan

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: [ANNOUNCE] Native Linux KVM tool v2
2011-06-15 15:53 [ANNOUNCE] Native Linux KVM tool v2 Pekka Enberg
` (3 preceding siblings ...)
2011-06-16 14:48 ` Pekka Enberg
@ 2011-06-17 7:31 ` justin
4 siblings, 0 replies; 43+ messages in thread
From: justin @ 2011-06-17 7:31 UTC (permalink / raw)
To: Pekka Enberg
Cc: linux-kernel, kvm, Avi Kivity, Andrew Morton, Linus Torvalds,
Ingo Molnar, Prasad Joshi, Sasha Levin, Cyrill Gorcunov, Asias He

On 2011-06-15 23:53, Pekka Enberg wrote:
> or alternatively, if you already have a kernel source tree:
>
>   git remote add kvm-tool git://github.com/penberg/linux-kvm.git
>   git remote update
>   git checkout -b kvm-tool/master kvm-tool

I tried this, but it does not work; something went wrong when I
executed the 3rd git command.

I tried "git checkout -b tools/kvm kvm-tool/master" instead, and it
seems to work fine.

-- 
justin

^ permalink raw reply	[flat|nested] 43+ messages in thread
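[For what it's worth, the argument order in the announcement's third command
looks reversed: git checkout -b takes the new branch name first and the start
point second, which matches what justin found. Something along these lines
should work - the local branch name here is only an example:]

git remote add kvm-tool git://github.com/penberg/linux-kvm.git
git remote update
git checkout -b kvm-tool kvm-tool/master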