* [Qemu-devel] [patch] Fix block I/O hang.
@ 2008-11-11 16:35 Gerd Hoffmann
2008-11-11 16:49 ` Anthony Liguori
0 siblings, 1 reply; 13+ messages in thread
From: Gerd Hoffmann @ 2008-11-11 16:35 UTC (permalink / raw)
To: qemu-devel
[-- Attachment #1: Type: text/plain, Size: 48 bytes --]
Hi,
$subject says all.
please apply,
Gerd
[-- Attachment #2: 0050-Fix-qemu-hang-in-block-layer.patch --]
[-- Type: text/plain, Size: 836 bytes --]
>From db0f41a843004157fd1181082716f83879f518ca Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann <kraxel@redhat.com>
Date: Tue, 11 Nov 2008 17:31:55 +0100
Subject: [PATCH] Fix qemu hang in block layer.
Without s->rfd being non-blocking qemu hangs in posix_aio_read()
due to the read() syscall blocking.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
block-raw-posix.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/block-raw-posix.c b/block-raw-posix.c
index c06e38d..a08a773 100644
--- a/block-raw-posix.c
+++ b/block-raw-posix.c
@@ -591,6 +591,7 @@ static int posix_aio_init(void)
s->rfd = fds[0];
s->wfd = fds[1];
+ fcntl(s->rfd, F_SETFL, O_NONBLOCK);
fcntl(s->wfd, F_SETFL, O_NONBLOCK);
qemu_aio_set_fd_handler(s->rfd, posix_aio_read, NULL, posix_aio_flush, s);
--
1.5.6.5
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [patch] Fix block I/O hang.
2008-11-11 16:35 [Qemu-devel] [patch] Fix block I/O hang Gerd Hoffmann
@ 2008-11-11 16:49 ` Anthony Liguori
2008-11-11 17:48 ` Gerd Hoffmann
2008-11-12 13:03 ` Gerd Hoffmann
0 siblings, 2 replies; 13+ messages in thread
From: Anthony Liguori @ 2008-11-11 16:49 UTC (permalink / raw)
To: qemu-devel
Gerd Hoffmann wrote:
> Hi,
>
> $subject says all.
>
> please apply,
> Gerd
>
Under what circumstances? posix_aio_read() is only invoked from a
select callback. This means there should be data available to be read.
Regards,
Anthony Liguori
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [patch] Fix block I/O hang.
2008-11-11 16:49 ` Anthony Liguori
@ 2008-11-11 17:48 ` Gerd Hoffmann
2008-11-11 17:55 ` Anthony Liguori
2008-11-12 13:49 ` Johannes Stezenbach
2008-11-12 13:03 ` Gerd Hoffmann
1 sibling, 2 replies; 13+ messages in thread
From: Gerd Hoffmann @ 2008-11-11 17:48 UTC (permalink / raw)
To: qemu-devel
Anthony Liguori wrote:
> Gerd Hoffmann wrote:
>> Hi,
>>
>> $subject says all.
>>
>> please apply,
>> Gerd
>
> Under what circumstances? posix_aio_read() is only invoked from a
> select callback. This means there should be data available to be read.
Hmm, looked at the loop again a bit closer.
First, the check for errno == EAGAIN is superfluous when the filehandle
isn't in non-blocking mode. It will not happen.
Second, the check for errno == EINTR is superfluous too because the loop
will retry the read() syscall anyway for *any* error.
Third, when called from a select callback it shouldn't block indeed. It
does though for me now and then when booting xen guests (with a big
stack of xenner patches). Doesn't reproduce reliable though. Sprinkled
in a printk (with rfd being non-blocking) and got a EAGAIN once, so it
got called with rfd not having data.
cheers,
Gerd
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [patch] Fix block I/O hang.
2008-11-11 17:48 ` Gerd Hoffmann
@ 2008-11-11 17:55 ` Anthony Liguori
2008-11-11 20:51 ` Gerd Hoffmann
2008-11-12 13:49 ` Johannes Stezenbach
1 sibling, 1 reply; 13+ messages in thread
From: Anthony Liguori @ 2008-11-11 17:55 UTC (permalink / raw)
To: qemu-devel
Gerd Hoffmann wrote:
> Anthony Liguori wrote:
>
>> Gerd Hoffmann wrote:
>>
>>> Hi,
>>>
>>> $subject says all.
>>>
>>> please apply,
>>> Gerd
>>>
>> Under what circumstances? posix_aio_read() is only invoked from a
>> select callback. This means there should be data available to be read.
>>
>
> Hmm, looked at the loop again a bit closer.
>
> First, the check for errno == EAGAIN is superfluous when the filehandle
> isn't in non-blocking mode. It will not happen.
>
> Second, the check for errno == EINTR is superfluous too because the loop
> will retry the read() syscall anyway for *any* error.
>
Yes, this all is. I think the intention was to make it non-blocking so
that the full queue could be drained to avoid superfluous calls to
posix_aio_read() since it handles all outstanding ops at once. However,
> Third, when called from a select callback it shouldn't block indeed. It
> does though for me now and then when booting xen guests (with a big
> stack of xenner patches). Doesn't reproduce reliable though. Sprinkled
> in a printk (with rfd being non-blocking) and got a EAGAIN once, so it
> got called with rfd not having data.
>
I think it would be good to root cause this. AFAICT, it shouldn't block.
Regards,
Anthony Liguori
> cheers,
> Gerd
>
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [patch] Fix block I/O hang.
2008-11-11 17:55 ` Anthony Liguori
@ 2008-11-11 20:51 ` Gerd Hoffmann
2008-11-11 21:07 ` Anthony Liguori
0 siblings, 1 reply; 13+ messages in thread
From: Gerd Hoffmann @ 2008-11-11 20:51 UTC (permalink / raw)
To: qemu-devel
Anthony Liguori wrote:
> Yes, this all is. I think the intention was to make it non-blocking so
> that the full queue could be drained to avoid superfluous calls to
> posix_aio_read() since it handles all outstanding ops at once.
That is what I concluded after a brief look, without realizing that the
loop doesn't actually do that (yet?), thus went for the "oops, this fd
must be in non-blocking mode" fix.
> However,
>
>> Third, when called from a select callback it shouldn't block indeed. It
>> does though for me now and then when booting xen guests (with a big
>> stack of xenner patches). Doesn't reproduce reliable though. Sprinkled
>> in a printk (with rfd being non-blocking) and got a EAGAIN once, so it
>> got called with rfd not having data.
>
> I think it would be good to root cause this. AFAICT, it shouldn't block.
Yes, something is fishy here. Will try to pin it.
Ideas where to look are welcome in case you have any ;)
cheers,
Gerd
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [patch] Fix block I/O hang.
2008-11-11 20:51 ` Gerd Hoffmann
@ 2008-11-11 21:07 ` Anthony Liguori
2008-11-11 21:34 ` Gerd Hoffmann
0 siblings, 1 reply; 13+ messages in thread
From: Anthony Liguori @ 2008-11-11 21:07 UTC (permalink / raw)
To: qemu-devel
Gerd Hoffmann wrote:
>
>> However,
>>
>>
>>> Third, when called from a select callback it shouldn't block indeed. It
>>> does though for me now and then when booting xen guests (with a big
>>> stack of xenner patches). Doesn't reproduce reliable though. Sprinkled
>>> in a printk (with rfd being non-blocking) and got a EAGAIN once, so it
>>> got called with rfd not having data.
>>>
>> I think it would be good to root cause this. AFAICT, it shouldn't block.
>>
>
> Yes, something is fishy here. Will try to pin it.
> Ideas where to look are welcome in case you have any ;)
>
Are you doing something funky with threads in xenner?
Regards,
Anthony Liguori
> cheers,
> Gerd
>
>
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [patch] Fix block I/O hang.
2008-11-11 21:07 ` Anthony Liguori
@ 2008-11-11 21:34 ` Gerd Hoffmann
0 siblings, 0 replies; 13+ messages in thread
From: Gerd Hoffmann @ 2008-11-11 21:34 UTC (permalink / raw)
To: qemu-devel
Anthony Liguori wrote:
> Gerd Hoffmann wrote:
>>
>>> However,
>>>
>>>
>>>> Third, when called from a select callback it shouldn't block
>>>> indeed. It
>>>> does though for me now and then when booting xen guests (with a big
>>>> stack of xenner patches). Doesn't reproduce reliable though.
>>>> Sprinkled
>>>> in a printk (with rfd being non-blocking) and got a EAGAIN once, so it
>>>> got called with rfd not having data.
>>>>
>>> I think it would be good to root cause this. AFAICT, it shouldn't
>>> block.
>>>
>>
>> Yes, something is fishy here. Will try to pin it.
>> Ideas where to look are welcome in case you have any ;)
>>
>
> Are you doing something funky with threads in xenner?
No, I don't create any.
cheers,
Gerd
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [patch] Fix block I/O hang.
2008-11-11 16:49 ` Anthony Liguori
2008-11-11 17:48 ` Gerd Hoffmann
@ 2008-11-12 13:03 ` Gerd Hoffmann
2008-11-12 14:57 ` Anthony Liguori
1 sibling, 1 reply; 13+ messages in thread
From: Gerd Hoffmann @ 2008-11-12 13:03 UTC (permalink / raw)
To: qemu-devel
Anthony Liguori wrote:
> Gerd Hoffmann wrote:
>> Hi,
>>
>> $subject says all.
>>
>> please apply,
>> Gerd
> Under what circumstances? posix_aio_read() is only invoked from a
> select callback. This means there should be data available to be read.
Well, there are *two* select loops: main_loop_wait() and
qemu_aio_wait(). Calling sync block i/o functions from a i/o handler
causes the two select loops run nested => boom.
cheers,
Gerd
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [patch] Fix block I/O hang.
2008-11-11 17:48 ` Gerd Hoffmann
2008-11-11 17:55 ` Anthony Liguori
@ 2008-11-12 13:49 ` Johannes Stezenbach
2008-11-12 14:58 ` Anthony Liguori
2008-11-12 15:01 ` Gerd Hoffmann
1 sibling, 2 replies; 13+ messages in thread
From: Johannes Stezenbach @ 2008-11-12 13:49 UTC (permalink / raw)
To: qemu-devel
Hi,
On Tue, Nov 11, 2008 at 06:48:00PM +0100, Gerd Hoffmann wrote:
> Anthony Liguori wrote:
> >
> > Under what circumstances? posix_aio_read() is only invoked from a
> > select callback. This means there should be data available to be read.
...
>
> Third, when called from a select callback it shouldn't block indeed. It
> does though for me now and then when booting xen guests (with a big
> stack of xenner patches). Doesn't reproduce reliable though. Sprinkled
> in a printk (with rfd being non-blocking) and got a EAGAIN once, so it
> got called with rfd not having data.
I don't know what kind of fd you're talking about, but the
Linux select man page says:
BUGS
Under Linux, select() may report a socket file descriptor as "ready for
reading", while nevertheless a sub‐ sequent read blocks. This could for
example happen when data has arrived but upon examination has wrong
checksum and is discarded. There may be other circumstances in which
a file descriptor is spuriously reported as ready. Thus it may be safer to
use O_NONBLOCK on sockets that should not block.
HTH
Johannes
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [patch] Fix block I/O hang.
2008-11-12 13:03 ` Gerd Hoffmann
@ 2008-11-12 14:57 ` Anthony Liguori
2008-11-13 9:14 ` Gerd Hoffmann
0 siblings, 1 reply; 13+ messages in thread
From: Anthony Liguori @ 2008-11-12 14:57 UTC (permalink / raw)
To: qemu-devel
Gerd Hoffmann wrote:
> Anthony Liguori wrote:
>
>> Gerd Hoffmann wrote:
>>
>>> Hi,
>>>
>>> $subject says all.
>>>
>>> please apply,
>>> Gerd
>>>
>
>
>> Under what circumstances? posix_aio_read() is only invoked from a
>> select callback. This means there should be data available to be read.
>>
>
> Well, there are *two* select loops: main_loop_wait() and
> qemu_aio_wait(). Calling sync block i/o functions from a i/o handler
> causes the two select loops run nested => boom.
>
Yeah, qemu_aio_wait needs to die. Can you resubmit your patch with a
better description, and change the read() look in posix_aio_read() to
consume as much data as possible before hitting EAGAIN?
Regards,
Anthony Liguori
> cheers,
> Gerd
>
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [patch] Fix block I/O hang.
2008-11-12 13:49 ` Johannes Stezenbach
@ 2008-11-12 14:58 ` Anthony Liguori
2008-11-12 15:01 ` Gerd Hoffmann
1 sibling, 0 replies; 13+ messages in thread
From: Anthony Liguori @ 2008-11-12 14:58 UTC (permalink / raw)
To: qemu-devel
Johannes Stezenbach wrote:
> Hi,
>
> On Tue, Nov 11, 2008 at 06:48:00PM +0100, Gerd Hoffmann wrote:
>
>> Anthony Liguori wrote:
>>
>>> Under what circumstances? posix_aio_read() is only invoked from a
>>> select callback. This means there should be data available to be read.
>>>
> ...
>
>> Third, when called from a select callback it shouldn't block indeed. It
>> does though for me now and then when booting xen guests (with a big
>> stack of xenner patches). Doesn't reproduce reliable though. Sprinkled
>> in a printk (with rfd being non-blocking) and got a EAGAIN once, so it
>> got called with rfd not having data.
>>
>
> I don't know what kind of fd you're talking about, but the
> Linux select man page says:
>
It's a pipe. The socket issue has bit me a number of times in the past
though :-)
Regards,
Anthony Liguori
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [patch] Fix block I/O hang.
2008-11-12 13:49 ` Johannes Stezenbach
2008-11-12 14:58 ` Anthony Liguori
@ 2008-11-12 15:01 ` Gerd Hoffmann
1 sibling, 0 replies; 13+ messages in thread
From: Gerd Hoffmann @ 2008-11-12 15:01 UTC (permalink / raw)
To: qemu-devel
Johannes Stezenbach wrote:
> I don't know what kind of fd you're talking about, but the
> Linux select man page says:
pipe.
> BUGS
> Under Linux, select() may report a socket file descriptor as "ready for
> reading", while nevertheless a sub‐ sequent read blocks. This could for
> example happen when data has arrived but upon examination has wrong
> checksum and is discarded. There may be other circumstances in which
> a file descriptor is spuriously reported as ready. Thus it may be safer to
> use O_NONBLOCK on sockets that should not block.
Unlikely to apply here, I've also found the real problem (see other mail
in this thread).
cheers,
Gerd
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [patch] Fix block I/O hang.
2008-11-12 14:57 ` Anthony Liguori
@ 2008-11-13 9:14 ` Gerd Hoffmann
0 siblings, 0 replies; 13+ messages in thread
From: Gerd Hoffmann @ 2008-11-13 9:14 UTC (permalink / raw)
To: qemu-devel
Anthony Liguori wrote:
> Gerd Hoffmann wrote:
>>
>>> Under what circumstances? posix_aio_read() is only invoked from a
>>> select callback. This means there should be data available to be read.
>>>
>>
>> Well, there are *two* select loops: main_loop_wait() and
>> qemu_aio_wait(). Calling sync block i/o functions from a i/o handler
>> causes the two select loops run nested => boom.
>
> Yeah, qemu_aio_wait needs to die. Can you resubmit your patch with a
> better description, and change the read() look in posix_aio_read() to
> consume as much data as possible before hitting EAGAIN?
I've fixed my problem by changing xen_disk to use a bottom half for
actual work, so the block read/write calls are moved out of the select
loop anyway. Which turned out to be useful for aio support too.
So I'm fine again with the current state. I can create such a patch
nevertheless though.
cheers,
Gerd
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2008-11-13 9:14 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-11 16:35 [Qemu-devel] [patch] Fix block I/O hang Gerd Hoffmann
2008-11-11 16:49 ` Anthony Liguori
2008-11-11 17:48 ` Gerd Hoffmann
2008-11-11 17:55 ` Anthony Liguori
2008-11-11 20:51 ` Gerd Hoffmann
2008-11-11 21:07 ` Anthony Liguori
2008-11-11 21:34 ` Gerd Hoffmann
2008-11-12 13:49 ` Johannes Stezenbach
2008-11-12 14:58 ` Anthony Liguori
2008-11-12 15:01 ` Gerd Hoffmann
2008-11-12 13:03 ` Gerd Hoffmann
2008-11-12 14:57 ` Anthony Liguori
2008-11-13 9:14 ` Gerd Hoffmann
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).