* [PATCH][RFC] open HVM backing storage with O_SYNC
@ 2006-07-28 7:06 Rik van Riel
2006-07-28 17:02 ` Rik van Riel
2006-08-04 9:29 ` Christian Limpach
0 siblings, 2 replies; 7+ messages in thread
From: Rik van Riel @ 2006-07-28 7:06 UTC (permalink / raw)
To: xen-devel
[-- Attachment #1: Type: text/plain, Size: 758 bytes --]
I noticed that the new qemu-dm code has DMA_MULTI_THREAD defined, so
I/O already overlaps with CPU run time of the guest domain. This
means that we might as well open the backing storage with O_SYNC, so
writes done by the guest hit the disk when the guest expects them to,
and in the order the guest expects them to.
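As a rough illustration of what O_SYNC changes, here is a minimal sketch.
It assumes a POSIX system; write_durably() is a hypothetical helper and
not part of qemu-dm. With O_SYNC set, each write() returns only after the
kernel has pushed the data to the underlying storage device, although the
drive's own write cache may still be in play.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not from qemu-dm): write len bytes of buf to path
 * through a descriptor opened with O_SYNC.
 * Returns 0 on success, -1 on error. */
static int write_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0600);

    if (fd < 0)
        return -1;

    while (len > 0) {
        /* With O_SYNC, write() blocks until the data has been
         * transferred to the storage device. */
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return close(fd);
}
```

Without O_SYNC the same write() could return as soon as the data sits in
the host page cache, which is the window in which a host crash can lose
writes the guest already believes are complete.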
I am now running my postgresql HVM test domain (which has had its
database eaten a number of times by the async write behaviour) with
this patch, and will try to abuse it heavily over the next few days.
Any comments on this patch?
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
[-- Attachment #2: xen-hvm-osync.patch --]
[-- Type: text/x-patch, Size: 3160 bytes --]
Make sure disk writes really made it to disk before we report I/O
completion to the guest domain. The DMA_MULTI_THREAD functionality
from the qemu-dm IDE emulation should make the performance overhead
of synchronous writes bearable, or at least comparable to native
hardware.
Signed-off-by: Rik van Riel <riel@redhat.com>
--- xen-unstable-10712/tools/ioemu/block-bochs.c.osync 2006-07-28 02:15:56.000000000 -0400
+++ xen-unstable-10712/tools/ioemu/block-bochs.c 2006-07-28 02:21:08.000000000 -0400
@@ -91,7 +91,7 @@
int fd, i;
struct bochs_header bochs;
- fd = open(filename, O_RDWR | O_BINARY | O_LARGEFILE);
+ fd = open(filename, O_RDWR | O_BINARY | O_LARGEFILE | O_SYNC);
if (fd < 0) {
fd = open(filename, O_RDONLY | O_BINARY | O_LARGEFILE);
if (fd < 0)
--- xen-unstable-10712/tools/ioemu/block.c.osync 2006-07-28 02:15:56.000000000 -0400
+++ xen-unstable-10712/tools/ioemu/block.c 2006-07-28 02:19:27.000000000 -0400
@@ -677,7 +677,7 @@
int rv;
#endif
- fd = open(filename, O_RDWR | O_BINARY | O_LARGEFILE);
+ fd = open(filename, O_RDWR | O_BINARY | O_LARGEFILE | O_SYNC);
if (fd < 0) {
fd = open(filename, O_RDONLY | O_BINARY | O_LARGEFILE);
if (fd < 0)
--- xen-unstable-10712/tools/ioemu/block-cloop.c.osync 2006-07-28 02:15:56.000000000 -0400
+++ xen-unstable-10712/tools/ioemu/block-cloop.c 2006-07-28 02:17:13.000000000 -0400
@@ -55,7 +55,7 @@
BDRVCloopState *s = bs->opaque;
uint32_t offsets_size,max_compressed_block_size=1,i;
- s->fd = open(filename, O_RDONLY | O_BINARY | O_LARGEFILE);
+ s->fd = open(filename, O_RDONLY | O_BINARY | O_LARGEFILE | O_SYNC);
if (s->fd < 0)
return -1;
bs->read_only = 1;
--- xen-unstable-10712/tools/ioemu/block-cow.c.osync 2006-07-28 02:15:56.000000000 -0400
+++ xen-unstable-10712/tools/ioemu/block-cow.c 2006-07-28 02:21:34.000000000 -0400
@@ -69,7 +69,7 @@
struct cow_header_v2 cow_header;
int64_t size;
- fd = open(filename, O_RDWR | O_BINARY | O_LARGEFILE);
+ fd = open(filename, O_RDWR | O_BINARY | O_LARGEFILE | O_SYNC);
if (fd < 0) {
fd = open(filename, O_RDONLY | O_BINARY | O_LARGEFILE);
if (fd < 0)
--- xen-unstable-10712/tools/ioemu/block-qcow.c.osync 2006-07-28 02:15:56.000000000 -0400
+++ xen-unstable-10712/tools/ioemu/block-qcow.c 2006-07-28 02:20:05.000000000 -0400
@@ -95,7 +95,7 @@
int fd, len, i, shift;
QCowHeader header;
- fd = open(filename, O_RDWR | O_BINARY | O_LARGEFILE);
+ fd = open(filename, O_RDWR | O_BINARY | O_LARGEFILE | O_SYNC);
if (fd < 0) {
fd = open(filename, O_RDONLY | O_BINARY | O_LARGEFILE);
if (fd < 0)
--- xen-unstable-10712/tools/ioemu/block-vmdk.c.osync 2006-07-28 02:15:56.000000000 -0400
+++ xen-unstable-10712/tools/ioemu/block-vmdk.c 2006-07-28 02:20:20.000000000 -0400
@@ -96,7 +96,7 @@
uint32_t magic;
int l1_size;
- fd = open(filename, O_RDWR | O_BINARY | O_LARGEFILE);
+ fd = open(filename, O_RDWR | O_BINARY | O_LARGEFILE | O_SYNC);
if (fd < 0) {
fd = open(filename, O_RDONLY | O_BINARY | O_LARGEFILE);
if (fd < 0)
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH][RFC] open HVM backing storage with O_SYNC
2006-07-28 7:06 [PATCH][RFC] open HVM backing storage with O_SYNC Rik van Riel
@ 2006-07-28 17:02 ` Rik van Riel
2006-07-28 20:21 ` Rik van Riel
2006-08-04 9:29 ` Christian Limpach
1 sibling, 1 reply; 7+ messages in thread
From: Rik van Riel @ 2006-07-28 17:02 UTC (permalink / raw)
To: xen-devel
Rik van Riel wrote:
> Any comments on this patch?
I got some comments from Alan, who would like to see this behaviour
tunable with hdparm from inside the guest. This requires larger
qemu changes though, to be specific an ->fsync callback into each
of the backing store drivers, so that is something for the qemu
mailing list.
The current bottleneck seems to be that MAX_MULT_COUNT is only 16.
I will try raising this to 256 so we can transport a lot more data
per world and domain switch...
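For scale, a small worked calculation, assuming MAX_MULT_COUNT counts
512-byte sectors per multi-sector transfer (SECTOR_SIZE and
bytes_per_transfer() are illustrative names, not taken from the qemu
source):

```c
#define SECTOR_SIZE 512UL   /* assumed sector size in bytes */

/* Bytes moved by one transfer of the given number of sectors. */
static unsigned long bytes_per_transfer(unsigned long sectors)
{
    return sectors * SECTOR_SIZE;
}
```

Under that assumption, bytes_per_transfer(16) is 8192 (8 KiB) and
bytes_per_transfer(256) is 131072 (128 KiB), so the larger limit would
move 16 times as much data per switch.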
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH][RFC] open HVM backing storage with O_SYNC
2006-07-28 17:02 ` Rik van Riel
@ 2006-07-28 20:21 ` Rik van Riel
2006-07-29 0:44 ` Christian Limpach
0 siblings, 1 reply; 7+ messages in thread
From: Rik van Riel @ 2006-07-28 20:21 UTC (permalink / raw)
To: xen-devel
Rik van Riel wrote:
> Rik van Riel wrote:
>
>> Any comments on this patch?
>
> I got some comments from Alan, who would like to see this behaviour
> tunable with hdparm from inside the guest. This requires larger
> qemu changes though, to be specific an ->fsync callback into each
> of the backing store drivers, so that is something for the qemu
> mailing list.
Considering the AIO-based development going on in the qemu community,
I think we should stick with the O_SYNC band-aid. The idea Alan
described would just be a fancier band-aid.
> The current bottleneck seems to be that MAX_MULT_COUNT is only 16.
Upon closer inspection of the code, this seems to not be the case for
LBA48 transfers.
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH][RFC] open HVM backing storage with O_SYNC
2006-07-28 20:21 ` Rik van Riel
@ 2006-07-29 0:44 ` Christian Limpach
2006-07-30 22:45 ` Rik van Riel
0 siblings, 1 reply; 7+ messages in thread
From: Christian Limpach @ 2006-07-29 0:44 UTC (permalink / raw)
To: Rik van Riel; +Cc: xen-devel
On 7/28/06, Rik van Riel <riel@redhat.com> wrote:
> Rik van Riel wrote:
> > Rik van Riel wrote:
> >
> >> Any comments on this patch?
> >
> > I got some comments from Alan, who would like to see this behaviour
> > tunable with hdparm from inside the guest. This requires larger
> > qemu changes though, to be specific an ->fsync callback into each
> > of the backing store drivers, so that is something for the qemu
> > mailing list.
>
> Considering the AIO-based development going on in the qemu community,
> I think we should stick with the O_SYNC band-aid. The idea Alan
> described would just be a fancier band-aid.
Another possibility would be to integrate blktap/tapdisk into qemu,
which would provide asynchronous completion events and hide the
immediate AIO interaction from qemu. This should also make using qemu
inside a stub domain easier, since the code to talk to tapdisk will be
very similar to the blkfront code. It is also more or less required in
order to use tap devices for HVM domains; the alternative of using
blkfront within dom0 to export the device for qemu to use doesn't sound
too appealing.
Do you fancy looking into this?
> > The current bottleneck seems to be that MAX_MULT_COUNT is only 16.
>
> Upon closer inspection of the code, this seems to not be the case for
> LBA48 transfers.
Any other ideas what could be the bottleneck then?
christian
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH][RFC] open HVM backing storage with O_SYNC
2006-07-29 0:44 ` Christian Limpach
@ 2006-07-30 22:45 ` Rik van Riel
2006-08-02 23:26 ` Rik van Riel
0 siblings, 1 reply; 7+ messages in thread
From: Rik van Riel @ 2006-07-30 22:45 UTC (permalink / raw)
To: Christian.Limpach; +Cc: xen-devel
Christian Limpach wrote:
> Another possibility would be to integrate blktap/tapdisk into qemu
> which will provide asynchronous completion events and hide the
> immediate AIO interaction from qemu. This should also make using qemu
> inside a stub domain easier
Sounds like a very good idea indeed.
> Do you fancy looking into this?
Unfortunately we've got some nasty blocker bugs left for
Fedora Core 6 which we're trying to track down first...
>> > The current bottleneck seems to be that MAX_MULT_COUNT is only 16.
>>
>> Upon closer inspection of the code, this seems to not be the case for
>> LBA48 transfers.
>
> Any other ideas what could be the bottleneck then?
Probably scheduling latency. I'm running 2 VT domains on this
system, and both qemu-dm processes are taking up to 25% of the
CPU each, on a 3GHz system.
When running top inside the VT guest, a lot of CPU time is spent
in "hi" and "si" time, which is irq code being emulated by qemu-dm.
Of course, with qemu-dm taking this much CPU time, it'll have a
lower CPU priority and will not get scheduled immediately. Still
fast enough to have 10000+ context switches/second, but apparently
not quite fast enough for the VT guest to have decent performance
under heavy network traffic...
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH][RFC] open HVM backing storage with O_SYNC
2006-07-30 22:45 ` Rik van Riel
@ 2006-08-02 23:26 ` Rik van Riel
0 siblings, 0 replies; 7+ messages in thread
From: Rik van Riel @ 2006-08-02 23:26 UTC (permalink / raw)
To: Rik van Riel; +Cc: xen-devel, Christian.Limpach
Rik van Riel wrote:
> Christian Limpach wrote:
>> Any other ideas what could be the bottleneck then?
>
> Probably scheduling latency.
After switching to the rtl8139 network emulation (which now works
well), the CPU use of both qemu-dm and my VT guests dramatically
decreased and performance is a lot better now.
I'll let you know what the next bottleneck is once I run into
it :)
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH][RFC] open HVM backing storage with O_SYNC
2006-07-28 7:06 [PATCH][RFC] open HVM backing storage with O_SYNC Rik van Riel
2006-07-28 17:02 ` Rik van Riel
@ 2006-08-04 9:29 ` Christian Limpach
1 sibling, 0 replies; 7+ messages in thread
From: Christian Limpach @ 2006-08-04 9:29 UTC (permalink / raw)
To: Rik van Riel; +Cc: xen-devel
On 7/28/06, Rik van Riel <riel@redhat.com> wrote:
> I noticed that the new qemu-dm code has DMA_MULTI_THREAD defined, so
> I/O already overlaps with CPU run time of the guest domain. This
> means that we might as well open the backing storage with O_SYNC, so
> writes done by the guest hit the disk when the guest expects them to,
> and in the order the guest expects them to.
>
> I am now running my postgresql HVM test domain (which has had its
> database eaten a number of times by the async write behaviour) with
> this patch, and will try to abuse it heavily over the next few days.
>
> Any comments on this patch?
Applied, thanks!
christian
^ permalink raw reply [flat|nested] 7+ messages in thread