Date: Fri, 30 Dec 2011 10:35:01 +0000
From: Stefan Hajnoczi
To: Christoph Hellwig
Cc: kwolf@redhat.com, sheepdog@lists.wpkg.org, qemu-devel@nongnu.org, MORITA Kazutaka
Subject: Re: [Qemu-devel] coroutine bug?, was Re: [PATCH] sheepdog: use coroutines
Message-ID: <20111230103500.GA1740@stefanha-thinkpad.localdomain>
In-Reply-To: <20111229120626.GA32331@lst.de>
References: <1313152395-25248-1-git-send-email-morita.kazutaka@lab.ntt.co.jp> <20111223133850.GA12770@lst.de> <20111229120626.GA32331@lst.de>

On Thu, Dec 29, 2011 at 01:06:26PM +0100, Christoph Hellwig wrote:
> On Fri, Dec 23, 2011 at 02:38:50PM +0100, Christoph Hellwig wrote:
> > FYI, this causes segfaults when doing large streaming writes when
> > running against a sheepdog cluster which:
> >
> >  a) has relatively fast SSDs
> >
> > and
> >
> >  b) uses buffered I/O.
> >
> > Unfortunately I can't get a useful backtrace out of gdb.  When running just
> > this commit I at least get some debugging messages:
> >
> >   qemu-system-x86_64: failed to recv a rsp, Socket operation on non-socket
> >   qemu-system-x86_64: failed to get the header, Socket operation on non-socket
> >
> > but on least qemu these don't show up either.
>
> s/least/latest/
>
> Some more debugging.  Just for the call that eventually segfaults s->fd
> turns from its normal value (normally 13 for me) into 0.  This is entirely
> reproducible in my testing, and given that the sheepdog driver never
> assigns to that value except when opening the device this seems to point
> to an issue in the coroutine code to me.

Are you building with gcc 4.5.3 or later?  (Earlier versions may
mis-compile, see https://bugs.launchpad.net/qemu/+bug/902148.)

If you can reproduce this bug and suspect coroutines are involved, then I
suggest using gdb to record the address of s and its last valid field
values before the coroutine yields.  When the coroutine is re-entered,
make sure that s still has the same address and check whether the field
values are the same as before.  If the address differs, the local pointer
itself was clobbered; if only the fields differ, something wrote to the
structure while the coroutine was suspended.

I don't have a sheepdog setup here, but if there's an easy way to
reproduce please let me know and I'll take a look.

Stefan
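
For reference, the before/after comparison suggested above can also be done
in code rather than by hand in gdb.  What follows is only a minimal,
standalone sketch, not code from the sheepdog driver or from QEMU's coroutine
implementation: struct demo_state, snapshot_take(), snapshot_check() and
demo_detects_clobber() are all hypothetical names, and only the fd field and
its values (13 and 0) come from the report quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver state; only "fd" is from the report. */
struct demo_state {
    int fd;
};

/* What we remember just before the coroutine yields. */
struct state_snapshot {
    const struct demo_state *addr;  /* address of s at the yield point */
    int fd;                         /* last known good value of s->fd */
};

static struct state_snapshot snapshot_take(const struct demo_state *s)
{
    struct state_snapshot snap = { .addr = s, .fd = s->fd };
    return snap;
}

/*
 * Call right after the coroutine is re-entered.  Returns true if s still
 * has the same address and the same fd as when the snapshot was taken.
 */
static bool snapshot_check(const struct state_snapshot *snap,
                           const struct demo_state *s)
{
    if (snap->addr != s) {
        /* The pointer itself changed; do not dereference the new value. */
        fprintf(stderr, "s moved: %p -> %p\n",
                (const void *)snap->addr, (const void *)s);
        return false;
    }
    if (snap->fd != s->fd) {
        /* Same object, but a field was overwritten while suspended. */
        fprintf(stderr, "s->fd changed: %d -> %d\n", snap->fd, s->fd);
        return false;
    }
    return true;
}

/* Simulated yield/re-entry: returns true if the clobbered fd is detected. */
static bool demo_detects_clobber(void)
{
    struct demo_state st = { .fd = 13 };
    struct state_snapshot snap = snapshot_take(&st);

    assert(snapshot_check(&snap, &st));  /* nothing has changed yet */
    st.fd = 0;                           /* simulate the reported corruption */
    return !snapshot_check(&snap, &st);
}
```

In a real debugging session the snapshot would be taken immediately before
the yield and checked immediately after re-entry.  The two failure messages
distinguish a clobbered local pointer from a write to the structure itself.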