* [Qemu-devel] 100% CPU when sockfd is half-closed and unexpected behavior for qemu_co_send()
From: Liu Yuan @ 2013-01-14 8:16 UTC (permalink / raw)
To: qemu-devel; +Cc: Stefan Hajnoczi
Hi List,
This problem can be reproduced by:
1. start a sheepdog cluster and create a volume 'test'*
2. attach 'test' to a bootable image like
$ qemu -hda image -drive if=virtio,file=sheepdog:test
3. pkill sheep # create a half-closed situation
I straced it and found that QEMU is busy doing nonsense read()/write()
calls after select() in os_host_main_loop_wait(). I have no knowledge of
glib_select_xxx, so could someone please help fix it?
Another unexpected behavior is that qemu_co_send() will send data
successfully in the half-closed situation, even though the other end is
completely down. I think the *expected* behavior is that we get notified
by a HUP and close the affected sockfd; then qemu_co_send() will not send
any data, and the caller of qemu_co_send() can handle the error case.
I don't know whom I should Cc, so I have only included Stefan.
* You can easily start up a one-node sheepdog cluster as follows:
$ git clone https://github.com/collie/sheepdog.git
$ cd sheepdog
$ apt-get install liburcu-dev
$ ./autogen.sh; ./configure --disable-corosync; make
# start up a one-node sheep cluster
$ mkdir store; ./sheep/sheep store -c local
$ collie/collie cluster format -c 1
# create a volume named test
$ collie/collie vdi create test 1G
Thanks,
Yuan
* Re: [Qemu-devel] 100% CPU when sockfd is half-closed and unexpected behavior for qemu_co_send()
From: Paolo Bonzini @ 2013-01-14 9:09 UTC (permalink / raw)
To: Liu Yuan; +Cc: qemu-devel, Stefan Hajnoczi
On 14/01/2013 09:16, Liu Yuan wrote:
> Hi List,
> This problem can be reproduced by:
> 1. start a sheepdog cluster and create a volume 'test'*
> 2. attach 'test' to a bootable image like
> $ qemu -hda image -drive if=virtio,file=sheepdog:test
> 3. pkill sheep # create a half-closed situation
>
> I straced it and found that QEMU is busy doing nonsense read()/write()
> calls after select() in os_host_main_loop_wait(). I have no knowledge of
> glib_select_xxx, so could someone please help fix it?
The read()/write() is not done by os_host_main_loop_wait() itself. It must
be done by qemu_co_send()/qemu_co_recv() after the handler has reentered
the coroutine.
> Another unexpected behavior is that qemu_co_send() will send data
> successfully in the half-closed situation, even though the other end is
> completely down. I think the *expected* behavior is that we get notified
> by a HUP and close the affected sockfd; then qemu_co_send() will not send
> any data, and the caller of qemu_co_send() can handle the error case.
qemu_co_send() should get an EPIPE or similar error. The first time it
will report a partial send, the second time it will report the error
directly to the caller.
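Roughly this pattern, i.e. (just a sketch of the semantics, not the actual
QEMU code):

/* Hypothetical sketch: on a broken connection, the first call reports a
 * partial send for whatever was already queued, and the next call hands
 * the error (e.g. EPIPE) straight back to the caller. */
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

static ssize_t co_send_sketch(int sockfd, const void *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t ret = send(sockfd, (const char *)buf + done, len - done,
                           MSG_NOSIGNAL);
        if (ret < 0) {
            if (errno == EINTR) {
                continue;
            }
            /* Report the partial count if anything was sent; otherwise
             * return the error directly. */
            return done > 0 ? (ssize_t)done : -1;
        }
        done += ret;
    }
    return done;
}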
Please check if this isn't a bug in the Sheepdog driver.
Paolo
> I don't know whom I should Cc, so I have only included Stefan.
>
> * You can easily start up a one-node sheepdog cluster as follows:
> $ git clone https://github.com/collie/sheepdog.git
> $ cd sheepdog
> $ apt-get install liburcu-dev
> $ ./autogen.sh; ./configure --disable-corosync; make
> # start up a one-node sheep cluster
> $ mkdir store; ./sheep/sheep store -c local
> $ collie/collie cluster format -c 1
> # create a volume named test
> $ collie/collie vdi create test 1G
>
> Thanks,
> Yuan
>
>
* Re: [Qemu-devel] 100% CPU when sockfd is half-closed and unexpected behavior for qemu_co_send()
From: Liu Yuan @ 2013-01-14 9:29 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, Stefan Hajnoczi
On 01/14/2013 05:09 PM, Paolo Bonzini wrote:
>> Another unexpected behavior is that qemu_co_send() will send data
>> successfully in the half-closed situation, even though the other end is
>> completely down. I think the *expected* behavior is that we get notified
>> by a HUP and close the affected sockfd; then qemu_co_send() will not send
>> any data, and the caller of qemu_co_send() can handle the error case.
> qemu_co_send() should get an EPIPE or similar error. The first time it
> will report a partial send, the second time it will report the error
> directly to the caller.
>
> Please check if this isn't a bug in the Sheepdog driver.
I don't think so. I used netstat to confirm that the connection is in
CLOSE_WAIT state, and I added a printf in qemu_co_send(): it did indeed
send successfully. This is backed by the Linux kernel source code:
static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
                                size_t size, int flags)
{
        ....
        /* Wait for a connection to finish. One exception is TCP Fast Open
         * (passive side) where data is allowed to be sent before a connection
         * is fully established.
         */
        if (((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) &&
            !tcp_passive_fastopen(sk)) {
                if ((err = sk_stream_wait_connect(sk, &timeo)) != 0)
                        goto out_err;
        }
        ....
}
which will put the data in the socket buffer and return success while in
CLOSE_WAIT state. I don't see any means in the Sheepdog driver code to get
a HUP notification for a connection that has actually been cut off.
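For what it's worth, the same behavior shows up with a tiny stand-alone
program (hypothetical demo, error checking omitted for brevity): the first
write() after the peer has gone away still succeeds because the kernel only
queues the data, and only a later write() fails (ECONNRESET/EPIPE) once the
RST has come back.

#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    struct sockaddr_in addr = { .sin_family = AF_INET };
    socklen_t alen = sizeof(addr);
    int lfd, cfd, sfd;
    char byte = 'x';
    ssize_t r;

    signal(SIGPIPE, SIG_IGN);          /* get EPIPE instead of a fatal signal */

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));  /* port 0 = ephemeral */
    listen(lfd, 1);
    getsockname(lfd, (struct sockaddr *)&addr, &alen);  /* learn the port */

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    connect(cfd, (struct sockaddr *)&addr, sizeof(addr));
    sfd = accept(lfd, NULL, NULL);

    close(sfd);                        /* peer goes away: cfd enters CLOSE_WAIT */
    sleep(1);                          /* let the FIN arrive */

    r = write(cfd, &byte, 1);          /* "succeeds": data is only queued */
    printf("1st write: %zd\n", r);

    sleep(1);                          /* the queued byte triggers an RST */
    r = write(cfd, &byte, 1);
    printf("2nd write: %zd (%s)\n", r, r < 0 ? strerror(errno) : "ok");
    return 0;
}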
Thanks,
Yuan
* Re: [Qemu-devel] 100% CPU when sockfd is half-closed and unexpected behavior for qemu_co_send()
From: Paolo Bonzini @ 2013-01-14 10:07 UTC (permalink / raw)
To: Liu Yuan; +Cc: Anthony Liguori, qemu-devel, Stefan Hajnoczi
On 14/01/2013 10:29, Liu Yuan wrote:
> I don't think so. I used netstat to confirm that the connection is in
> CLOSE_WAIT state, and I added a printf in qemu_co_send(): it did indeed
> send successfully. This is backed by the Linux kernel source code:
>
> static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>                                 size_t size, int flags)
> {
>         ....
>         /* Wait for a connection to finish. One exception is TCP Fast Open
>          * (passive side) where data is allowed to be sent before a connection
>          * is fully established.
>          */
>         if (((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) &&
>             !tcp_passive_fastopen(sk)) {
>                 if ((err = sk_stream_wait_connect(sk, &timeo)) != 0)
>                         goto out_err;
>         }
>         ....
> }
>
> which will put the data in the socket buffer and return success while in
> CLOSE_WAIT state. I don't see any means in the Sheepdog driver code to get
> a HUP notification for a connection that has actually been cut off.
Ok. I guess the problem is that we use select(), not poll(), so we have
no way to get POLLHUP notifications. But the write fd_set reports the
file descriptor as ready, because writes indeed will not block.
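For reference, with poll() we could ask for the hangup conditions
explicitly; something like this illustrative helper (POLLRDHUP is
Linux-specific and fires as soon as the peer's FIN arrives, POLLHUP/POLLERR
once the connection has been reset):

#define _GNU_SOURCE             /* for POLLRDHUP */
#include <poll.h>

static int sock_hung_up(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLRDHUP };

    if (poll(&pfd, 1, 0) <= 0) {
        return 0;               /* nothing pending (or poll() failed) */
    }
    return (pfd.revents & (POLLHUP | POLLERR | POLLRDHUP)) != 0;
}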
Stefan, Anthony, any ideas?
Paolo
* Re: [Qemu-devel] 100% CPU when sockfd is half-closed and unexpected behavior for qemu_co_send()
From: Stefan Hajnoczi @ 2013-01-14 10:23 UTC (permalink / raw)
To: Liu Yuan; +Cc: qemu-devel
On Mon, Jan 14, 2013 at 04:16:34PM +0800, Liu Yuan wrote:
> Hi List,
> This problem can be reproduced by:
> 1. start a sheepdog cluster and create a volume 'test'*
> 2. attach 'test' to a bootable image like
> $ qemu -hda image -drive if=virtio,file=sheepdog:test
> 3. pkill sheep # create a half-closed situation
>
> I straced it and found that QEMU is busy doing nonsense read()/write()
> calls after select() in os_host_main_loop_wait(). I have no knowledge of
> glib_select_xxx, so could someone please help fix it?
You mentioned a nonsense read(). What is the return value?
If you get a read with return value 0, this tells you the socket has
been closed. Can you handle these cases in block/sheepdog.c?
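Something along these lines in the driver's receive path would do it
(purely illustrative; the names are made up and this is not the actual
block/sheepdog.c code):

#include <errno.h>
#include <unistd.h>

static int recv_reply_sketch(int sockfd, void *buf, size_t len)
{
    while (len > 0) {
        ssize_t ret = read(sockfd, buf, len);
        if (ret == 0) {
            /* Peer closed the connection: stop watching this fd and
             * fail the pending requests instead of spinning. */
            return -EPIPE;
        }
        if (ret < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -errno;
        }
        buf = (char *)buf + ret;
        len -= ret;
    }
    return 0;
}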
Stefan
* Re: [Qemu-devel] 100% CPU when sockfd is half-closed and unexpected behavior for qemu_co_send()
From: Liu Yuan @ 2013-01-14 10:26 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: qemu-devel
On 01/14/2013 06:23 PM, Stefan Hajnoczi wrote:
> On Mon, Jan 14, 2013 at 04:16:34PM +0800, Liu Yuan wrote:
>> Hi List,
>> This problem can be reproduced by:
>> 1. start a sheepdog cluster and create a volume 'test'*
>> 2. attach 'test' to a bootable image like
>> $ qemu -hda image -drive if=virtio,file=sheepdog:test
>> 3. pkill sheep # create a half-closed situation
>>
>> I straced it and found that QEMU is busy doing nonsense read()/write()
>> calls after select() in os_host_main_loop_wait(). I have no knowledge of
>> glib_select_xxx, so could someone please help fix it?
>
> You mentioned a nonsense read(). What is the return value?
>
> If you get a read with return value 0, this tells you the socket has
> been closed. Can you handle these cases in block/sheepdog.c?
>
This is what I saw repeatedly:
select(25, [3 4 5 8 9 10 13 18 19 24], [], [], {1, 0}) = 2 (in [5 13],
left {0, 999994})
read(5, "\6\0\0\0\0\0\0\0", 16) = 8
write(5, "\1\0\0\0\0\0\0\0", 8) = 8
write(5, "\1\0\0\0\0\0\0\0", 8) = 8
write(5, "\1\0\0\0\0\0\0\0", 8) = 8
write(5, "\1\0\0\0\0\0\0\0", 8) = 8
write(5, "\1\0\0\0\0\0\0\0", 8) = 8
write(5, "\1\0\0\0\0\0\0\0", 8) = 8
....
fd 5 isn't a sockfd that sheepdog uses.

I don't know whether I can handle this in sheepdog.c, because I noticed
that no driver function gets called when this happens. So I suspect it
should be handled in the upper block core layer.
Thanks,
Yuan
* Re: [Qemu-devel] 100% CPU when sockfd is half-closed and unexpected behavior for qemu_co_send()
From: Liu Yuan @ 2013-01-16 6:39 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Anthony Liguori, qemu-devel, Stefan Hajnoczi
On 01/14/2013 06:07 PM, Paolo Bonzini wrote:
> Ok. I guess the problem is that we use select(), not poll(), so we have
> no way to get POLLHUP notifications. But the write fd_set reports the
> file descriptor as ready, because writes indeed will not block.
Hi Paolo,
is there any movement to switch from select() to poll()?
Thanks,
Yuan