* [Qemu-devel] 100% CPU when sockfd is half-closed and unexpected behavior for qemu_co_send()
@ 2013-01-14  8:16 Liu Yuan
  2013-01-14  9:09 ` Paolo Bonzini
  2013-01-14 10:23 ` Stefan Hajnoczi
  0 siblings, 2 replies; 7+ messages in thread

From: Liu Yuan @ 2013-01-14  8:16 UTC (permalink / raw)
To: qemu-devel; +Cc: Stefan Hajnoczi

Hi List,

This problem can be reproduced by:

 1. start a sheepdog cluster and create a volume 'test'*
 2. attach 'test' to a bootable image:
    $ qemu -hda image -drive if=virtio,file=sheepdog:test
 3. pkill sheep   # create a half-closed situation

strace shows that QEMU busy-loops, doing pointless read()/write() calls
after select() in os_host_main_loop_wait(). I have no knowledge of
glib_select_xxx, so could someone please help fix it?

Another unexpected behavior is that qemu_co_send() sends data
successfully in the half-closed situation, even though the other end is
completely down. I think the *expected* behavior is that we get notified
by a HUP and close the affected sockfd, so that qemu_co_send() does not
send any data and its caller can handle the error case.

I don't know whom I should Cc, so I have only included Stefan.

* You can easily start up a one-node sheepdog cluster as follows:

  $ git clone https://github.com/collie/sheepdog.git
  $ cd sheepdog
  $ apt-get install liburcu-dev
  $ ./autogen.sh; ./configure --disable-corosync; make
  # start up a one-node sheep cluster
  $ mkdir store; ./sheep/sheep store -c local
  $ collie/collie cluster format -c 1
  # create a volume named test
  $ collie/collie vdi create test 1G

Thanks,
Yuan

^ permalink raw reply	[flat|nested] 7+ messages in thread
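The half-closed behavior Yuan reports can be reproduced outside QEMU and sheepdog with a few lines of Python; the loopback server below is a hypothetical stand-in for the sheep daemon being killed, not sheepdog code:

```python
import socket
import time

# Loopback "server", standing in for the sheep daemon.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(srv.getsockname())
conn, _ = srv.accept()

# "pkill sheep": the peer closes and sends a FIN.  The client side is
# now half-closed and sits in CLOSE_WAIT (visible with netstat/ss).
conn.close()
srv.close()
time.sleep(0.2)

# recv() returning b"" is the only in-band sign that the peer is gone.
eof = cli.recv(16)

# Yet send() still succeeds: the kernel just queues the data in the
# socket buffer, even though nobody will ever read it.
sent = cli.send(b"request")
cli.close()

print(eof, sent)   # b'' 7
```

Nothing on the client fd ever reports "hung up" here unless the application either checks for the zero-length read or uses a poll-style HUP event, which is exactly the gap the thread discusses.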
* Re: [Qemu-devel] 100% CPU when sockfd is half-closed and unexpected behavior for qemu_co_send()
  2013-01-14  8:16 [Qemu-devel] 100% CPU when sockfd is half-closed and unexpected behavior for qemu_co_send() Liu Yuan
@ 2013-01-14  9:09 ` Paolo Bonzini
  2013-01-14  9:29   ` Liu Yuan
  1 sibling, 1 reply; 7+ messages in thread

From: Paolo Bonzini @ 2013-01-14  9:09 UTC (permalink / raw)
To: Liu Yuan; +Cc: qemu-devel, Stefan Hajnoczi

On 14/01/2013 09:16, Liu Yuan wrote:
> Hi List,
> This problem can be reproduced by:
> 1. start a sheepdog cluster and create a volume 'test'*
> 2. attach 'test' to a bootable image like
>    $ qemu -hda image -drive if=virtio,file=sheepdog:test
> 3. pkill sheep # create a half-closed situation
>
> I have straced it that QEMU is busy doing nonsense read/write() after
> select() in os_host_main_loop_wait(). I have no knowledge of
> glib_select_xxx, so someone please help fix it.

The read()/write() calls are not done by os_host_main_loop_wait().  They
must be done by qemu_co_send()/qemu_co_recv() after the handler has
reentered the coroutine.

> Another unexpected behavior is that qemu_co_send() will send data
> successfully for the half-closed situation, even the other end is
> completely down. I think the *expected* behavior is that we get notified
> by a HUP and close the affected sockfd, then qemu_co_send() will not
> send any data, then the caller of qemu_co_send() can handle error case.

qemu_co_send() should get an EPIPE or similar error.  The first time it
will report a partial send, the second time it will report the error
directly to the caller.

Please check if this isn't a bug in the Sheepdog driver.

Paolo
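Paolo's two-step error sequence (a send that succeeds first, a hard error once the peer's RST has arrived) can be checked with a small Python experiment; the loopback pair below is a hypothetical stand-in for the QEMU-to-sheep connection:

```python
import errno
import socket
import time

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(srv.getsockname())
conn, _ = srv.accept()

conn.close()                 # peer sends FIN; client enters CLOSE_WAIT
time.sleep(0.2)

# First send succeeds: in CLOSE_WAIT the kernel still queues the data.
first_send_ok = cli.send(b"x") == 1

# The closed peer answers that data with an RST; once it has arrived,
# further sends fail with ECONNRESET or EPIPE instead of succeeding.
got_errno = None
for _ in range(10):
    time.sleep(0.1)
    try:
        cli.send(b"x")
    except OSError as e:
        got_errno = e.errno
        break
cli.close()
```

So the error does eventually surface on the sending side, but only after at least one send has already "succeeded" into a dead connection, which matches what Yuan observed with his printf.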
* Re: [Qemu-devel] 100% CPU when sockfd is half-closed and unexpected behavior for qemu_co_send()
  2013-01-14  9:09 ` Paolo Bonzini
@ 2013-01-14  9:29   ` Liu Yuan
  2013-01-14 10:07     ` Paolo Bonzini
  0 siblings, 1 reply; 7+ messages in thread

From: Liu Yuan @ 2013-01-14  9:29 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, Stefan Hajnoczi

On 01/14/2013 05:09 PM, Paolo Bonzini wrote:
>> Another unexpected behavior is that qemu_co_send() will send data
>> successfully for the half-closed situation, even the other end is
>> completely down. I think the *expected* behavior is that we get notified
>> by a HUP and close the affected sockfd, then qemu_co_send() will not
>> send any data, then the caller of qemu_co_send() can handle error case.
>
> qemu_co_send() should get an EPIPE or similar error. The first time it
> will report a partial send, the second time it will report the error
> directly to the caller.
>
> Please check if this isn't a bug in the Sheepdog driver.

I don't think so. I used netstat to verify that the connection is in
CLOSE_WAIT state, and I added a printf in qemu_co_send(): it does indeed
send successfully. This is backed by the Linux kernel source code:

static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
				size_t size, int flags)
{
	...
	/* Wait for a connection to finish. One exception is TCP Fast Open
	 * (passive side) where data is allowed to be sent before a connection
	 * is fully established.
	 */
	if (((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) &&
	    !tcp_passive_fastopen(sk)) {
		if ((err = sk_stream_wait_connect(sk, &timeo)) != 0)
			goto out_err;
	}
	...
}

which puts the data in the socket buffer and returns success in the
CLOSE_WAIT state. I don't see any means in the Sheepdog driver code to
get a HUP notification for a connection that has actually been cut off.

Thanks,
Yuan
* Re: [Qemu-devel] 100% CPU when sockfd is half-closed and unexpected behavior for qemu_co_send()
  2013-01-14  9:29 ` Liu Yuan
@ 2013-01-14 10:07   ` Paolo Bonzini
  2013-01-16  6:39     ` Liu Yuan
  0 siblings, 1 reply; 7+ messages in thread

From: Paolo Bonzini @ 2013-01-14 10:07 UTC (permalink / raw)
To: Liu Yuan; +Cc: Anthony Liguori, qemu-devel, Stefan Hajnoczi

On 14/01/2013 10:29, Liu Yuan wrote:
> I don't think so. I use netstat to assure that the connection is in
> closed_wait state and I added a printf in the qemu_co_send() and it
> indeed sent successfully, this can be backed by the Linux kernel
> source code:
>
> static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
> 				size_t size, int flags)
> {
> 	...
> 	/* Wait for a connection to finish. One exception is TCP Fast Open
> 	 * (passive side) where data is allowed to be sent before a connection
> 	 * is fully established.
> 	 */
> 	if (((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) &&
> 	    !tcp_passive_fastopen(sk)) {
> 		if ((err = sk_stream_wait_connect(sk, &timeo)) != 0)
> 			goto out_err;
> 	}
> 	...
> }
>
> which will put data in the sock buf and returns successful in a
> CLOSED_WAIT state. I don't see means in Sheepdog driver code to get a
> HUP notification for a actual cut off connection.

Ok.  I guess the problem is that we use select(), not poll(), so we have
no way to get POLLHUP notifications.  But the write fd_set receives the
file descriptor because indeed writes will not block.

Stefan, Anthony, any ideas?

Paolo
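The select()-versus-poll() difference Paolo points at can be seen in isolation. The sketch below uses Python's select module; POLLRDHUP is Linux-specific and its numeric value is hard-coded as a fallback in case this Python build does not expose the constant:

```python
import select
import socket
import time

# POLLRDHUP is Linux-specific; fall back to its known value (0x2000) if
# this Python's select module does not expose it.
POLLRDHUP = getattr(select, "POLLRDHUP", 0x2000)

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(srv.getsockname())
conn, _ = srv.accept()
conn.close()                 # peer half-closes the connection
time.sleep(0.2)

# select() merely reports the fd readable (and writable) -- from the
# caller's point of view this is indistinguishable from real data.
readable, writable, _ = select.select([cli], [cli], [], 0)

# poll() can report the half-close explicitly via POLLRDHUP.
p = select.poll()
p.register(cli, select.POLLIN | POLLRDHUP)
events = dict(p.poll(0))
saw_rdhup = bool(events.get(cli.fileno(), 0) & POLLRDHUP)
cli.close()
```

With select() the only recourse is attempting the read and checking for a zero-length result; with poll() the event mask itself says the peer has hung up, before any I/O is issued.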
* Re: [Qemu-devel] 100% CPU when sockfd is half-closed and unexpected behavior for qemu_co_send()
  2013-01-14 10:07 ` Paolo Bonzini
@ 2013-01-16  6:39   ` Liu Yuan
  0 siblings, 0 replies; 7+ messages in thread

From: Liu Yuan @ 2013-01-16  6:39 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Anthony Liguori, qemu-devel, Stefan Hajnoczi

On 01/14/2013 06:07 PM, Paolo Bonzini wrote:
> Ok. I guess the problem is that we use select(), not poll(), so we have
> no way to get POLLHUP notifications. But the write fd_set receives the
> file descriptor because indeed writes will not block.

Hi Paolo, is there any movement on switching from select() to poll()?

Thanks,
Yuan
* Re: [Qemu-devel] 100% CPU when sockfd is half-closed and unexpected behavior for qemu_co_send()
  2013-01-14  8:16 [Qemu-devel] 100% CPU when sockfd is half-closed and unexpected behavior for qemu_co_send() Liu Yuan
@ 2013-01-14 10:23 ` Stefan Hajnoczi
  2013-01-14 10:26   ` Liu Yuan
  1 sibling, 1 reply; 7+ messages in thread

From: Stefan Hajnoczi @ 2013-01-14 10:23 UTC (permalink / raw)
To: Liu Yuan; +Cc: qemu-devel

On Mon, Jan 14, 2013 at 04:16:34PM +0800, Liu Yuan wrote:
> Hi List,
> This problem can be reproduced by:
> 1. start a sheepdog cluster and create a volume 'test'*
> 2. attach 'test' to a bootable image like
>    $ qemu -hda image -drive if=virtio,file=sheepdog:test
> 3. pkill sheep # create a half-closed situation
>
> I have straced it that QEMU is busy doing nonsense read/write() after
> select() in os_host_main_loop_wait(). I have no knowledge of
> glib_select_xxx, so someone please help fix it.

You mentioned a nonsense read().  What is the return value?

If you get a read with return value 0, this tells you the socket has
been closed.  Can you handle these cases in block/sheepdog.c?

Stefan
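The handling Stefan suggests — treating a zero-length read as "peer closed" — might look like the following sketch. Python stands in for the C code in block/sheepdog.c, and read_reply is a hypothetical helper, not an actual sheepdog function:

```python
import socket

def read_reply(sock, n):
    """Read exactly n bytes, treating EOF as a dead connection.

    A recv() that returns b"" (read() == 0 in C) means the peer closed
    its end of the connection, so close the fd and fail the request
    instead of retrying forever.
    """
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if chunk == b"":          # read() == 0: socket closed by peer
            sock.close()
            raise ConnectionError("server closed the connection")
        buf += chunk
    return buf
```

The key point is that the zero-length read must be turned into an error that propagates to the request's caller; silently looping on a readable-but-closed fd is exactly how the 100% CPU spin arises.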
* Re: [Qemu-devel] 100% CPU when sockfd is half-closed and unexpected behavior for qemu_co_send()
  2013-01-14 10:23 ` Stefan Hajnoczi
@ 2013-01-14 10:26   ` Liu Yuan
  0 siblings, 0 replies; 7+ messages in thread

From: Liu Yuan @ 2013-01-14 10:26 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: qemu-devel

On 01/14/2013 06:23 PM, Stefan Hajnoczi wrote:
> You mentioned a nonsense read().  What is the return value?
>
> If you get a read with return value 0, this tells you the socket has
> been closed.  Can you handle these cases in block/sheepdog.c?

This is what I saw repeatedly:

select(25, [3 4 5 8 9 10 13 18 19 24], [], [], {1, 0}) = 2 (in [5 13], left {0, 999994})
read(5, "\6\0\0\0\0\0\0\0", 16) = 8
write(5, "\1\0\0\0\0\0\0\0", 8) = 8
write(5, "\1\0\0\0\0\0\0\0", 8) = 8
write(5, "\1\0\0\0\0\0\0\0", 8) = 8
write(5, "\1\0\0\0\0\0\0\0", 8) = 8
write(5, "\1\0\0\0\0\0\0\0", 8) = 8
write(5, "\1\0\0\0\0\0\0\0", 8) = 8
....

fd 5 isn't a sockfd that sheepdog uses. I don't know whether I can
handle this in sheepdog.c, because as far as I can see no driver
function is called when this happens. So I suspect it should be handled
in the upper block core.

Thanks,
Yuan
end of thread, other threads: [~2013-01-16  6:39 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2013-01-14  8:16 [Qemu-devel] 100% CPU when sockfd is half-closed and unexpected behavior for qemu_co_send() Liu Yuan
2013-01-14  9:09 ` Paolo Bonzini
2013-01-14  9:29   ` Liu Yuan
2013-01-14 10:07     ` Paolo Bonzini
2013-01-16  6:39       ` Liu Yuan
2013-01-14 10:23 ` Stefan Hajnoczi
2013-01-14 10:26   ` Liu Yuan