From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:52016)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <paolo.bonzini@gmail.com>) id 1Tug3C-00021h-OK
	for qemu-devel@nongnu.org; Mon, 14 Jan 2013 04:09:57 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <paolo.bonzini@gmail.com>) id 1Tug36-0000dP-O7
	for qemu-devel@nongnu.org; Mon, 14 Jan 2013 04:09:50 -0500
Received: from mail-wg0-f52.google.com ([74.125.82.52]:62263)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <paolo.bonzini@gmail.com>) id 1Tug36-0000cS-HN
	for qemu-devel@nongnu.org; Mon, 14 Jan 2013 04:09:44 -0500
Received: by mail-wg0-f52.google.com with SMTP id 12so1933851wgh.7
	for <qemu-devel@nongnu.org>; Mon, 14 Jan 2013 01:09:43 -0800 (PST)
Sender: Paolo Bonzini <paolo.bonzini@gmail.com>
Message-ID: <50F3CB54.6080506@redhat.com>
Date: Mon, 14 Jan 2013 10:09:40 +0100
From: Paolo Bonzini <pbonzini@redhat.com>
MIME-Version: 1.0
References: <50F3BEE2.5090802@gmail.com>
In-Reply-To: <50F3BEE2.5090802@gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] 100% CPU when sockfd is half-closed and unexpected
 behavior for qemu_co_send()
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Liu Yuan <namei.unix@gmail.com>
Cc: qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>

On 14/01/2013 09:16, Liu Yuan wrote:
> Hi List,
>   This problem can be reproduced by:
>   1. start a sheepdog cluster and create a volume 'test'*
>   2. attach 'test' to a bootable image like
>      $ qemu -hda image -drive if=virtio,file=sheepdog:test
>   3. pkill sheep # create a half-closed situation
> 
> strace shows that QEMU is busy doing pointless read()/write() calls
> after select() in os_host_main_loop_wait(). I am not familiar with
> glib_select_xxx, so could someone please help fix it?

os_host_main_loop_wait() does not call read()/write() itself.

Those calls must come from qemu_co_send()/qemu_co_recv(), after the fd
handler has reentered the coroutine.
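
As a rough illustration of how such a loop can spin, here is a minimal
sketch, not QEMU code. It assumes an AF_UNIX socketpair in place of the
Sheepdog connection, and count_eof_wakeups() and demo_eof_spin() are
hypothetical names. Once the peer is gone, poll() keeps reporting the fd
as ready and recv() keeps returning 0, so a handler that neither
deregisters nor closes the fd gets woken up on every iteration.

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: run `rounds` non-blocking poll() calls on fd and
 * count how many times the fd was reported ready while recv() returned
 * 0 (end of stream).
 */
static int count_eof_wakeups(int fd, int rounds)
{
    int wakeups = 0;

    for (int i = 0; i < rounds; i++) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        char buf[16];

        /* Timeout 0: just ask whether the fd is ready right now. */
        if (poll(&pfd, 1, 0) <= 0) {
            continue;
        }
        /* A closed peer shows up as POLLIN and/or POLLHUP; recv() gives 0. */
        if ((pfd.revents & (POLLIN | POLLHUP)) &&
            recv(fd, buf, sizeof(buf), 0) == 0) {
            wakeups++;
        }
    }
    return wakeups;
}

/* Assumption: an AF_UNIX socketpair stands in for the real connection. */
static int demo_eof_spin(void)
{
    int sv[2];
    int n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    close(sv[1]);                      /* the "sheep" side goes away */
    n = count_eof_wakeups(sv[0], 5);   /* every round sees EOF again */
    close(sv[0]);
    return n;
}
```

Whether this is what actually happens in the Sheepdog driver still needs
to be confirmed with strace or a debugger.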

> Another unexpected behavior is that qemu_co_send() sends data
> successfully in the half-closed situation, even though the other end is
> completely down. I think the *expected* behavior is that we get notified
> by a HUP and close the affected sockfd, so that qemu_co_send() does not
> send any data and its caller can handle the error case.

qemu_co_send() should get an EPIPE or similar error.  The first time it
will report a partial send; the second time it will report the error
directly to the caller.
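
To make the expected error path concrete, here is a minimal sketch, not
the QEMU implementation. send_all() is a hypothetical stand-in for a
"send everything" loop that returns the byte count if any progress was
made and -errno otherwise, and demo_send_after_peer_close() is likewise
hypothetical. It uses an AF_UNIX socketpair, where the error shows up on
the first send() after the peer closes; with TCP the first send() after
the peer's close typically still succeeds and only a later one fails.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical "send everything" loop (not qemu_co_send() itself).
 * Returns the number of bytes sent if any progress was made, so a
 * failure in the middle is seen as a partial send; returns -errno if
 * nothing could be sent at all.
 */
static ssize_t send_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t ret = send(fd, buf + done, len - done, 0);

        if (ret < 0) {
            if (errno == EINTR) {
                continue;              /* interrupted, just retry */
            }
            return done > 0 ? (ssize_t)done : -errno;
        }
        done += (size_t)ret;
    }
    return (ssize_t)done;
}

/* Assumption: an AF_UNIX socketpair stands in for the real connection. */
static ssize_t demo_send_after_peer_close(void)
{
    int sv[2];
    ssize_t first, second;

    /* Without this, the failing send() would kill us with SIGPIPE. */
    signal(SIGPIPE, SIG_IGN);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -errno;
    }
    first = send_all(sv[0], "ping", 4);    /* peer alive: all 4 bytes go out */
    close(sv[1]);                          /* peer goes away */
    second = send_all(sv[0], "ping", 4);   /* nothing sent: error reported */
    close(sv[0]);

    return first == 4 ? second : first;
}
```

A caller that sees a short count from such a loop should treat the
request as failed or retry it; the retry is where the error itself
becomes visible.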

Please check whether this is a bug in the Sheepdog driver.

Paolo

> I don't know whom I should Cc, so I have only included Stefan.
> 
> * You can easily start up a one-node sheepdog cluster as follows:
>  $ git clone https://github.com/collie/sheepdog.git
>  $ cd sheepdog
>  $ apt-get install liburcu-dev
>  $ ./autogen.sh; ./configure --disable-corosync;make
>  #start up a one node sheep cluster
>  $ mkdir store;./sheep/sheep store -c local
>  $ collie/collie cluster format -c 1
>  #create a volume named test
>  $ collie/collie vdi create test 1G
> 
> Thanks,
> Yuan
> 
>