Message-ID: <538EF278.30308@redhat.com>
Date: Wed, 04 Jun 2014 12:18:32 +0200
From: Paolo Bonzini
To: Stefan Hajnoczi, Marcin Gibuła
Cc: qemu-devel
Subject: Re: [Qemu-devel] [PATCH v2] thread-pool: fix deadlock when callbacks depend on each other
In-Reply-To: <20140604100156.GF26902@stefanha-thinkpad.redhat.com>
References: <538C248F.2000801@beyond.pl> <20140604100156.GF26902@stefanha-thinkpad.redhat.com>

On 04/06/2014 12:01, Stefan Hajnoczi wrote:
>> Without this patch, I could reproduce this bug with snapshot-commit with
>> about 1 per 10 tries. With this patch, I couldn't reproduce it any more.
>>
>> Signed-off-by: Marcin Gibula
>> ---
>
> This is an interesting bug that definitely needs a test case to prevent
> regressions in the future.
>
> Please take a look at tests/test-thread-pool.c and add a test to it. It
> can be reproduced deterministically - just call aio_poll() after the
> dummy worker functions have both completed. Then the next aio_poll()
> call in the thread pool callback will suffer the problem you described.

The question is whether we want to consider this thread-pool.c behavior
a real bug or just a misfeature (the real bug being elsewhere). Even
though this patch avoids the performance problems of v1, we would have
to fix at least two other cases, and it is not obvious (a) that those
two are the only ones, or (b) that those two can be fixed without
affecting performance.

If the bottom half code is immune to this event notifier problem,
bdrv_drain/bdrv_drain_all calls in coroutine context can defer the
actual draining to a bottom half and reenter the coroutine afterwards;
we can then audit that all other calls come from the main loop rather
than aio_poll.

Paolo
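
(For concreteness, a minimal sketch of the kind of test Stefan suggests
could look like the following. It assumes the file-scope ctx and pool
variables that tests/test-thread-pool.c already sets up in main(); all
other names are invented for illustration and this is not the test that
was eventually merged.)

#include <glib.h>
#include "block/aio.h"
#include "block/thread-pool.h"

/* Sketch only: two work items whose completion callback re-enters
 * aio_poll(), the way a callback that calls bdrv_drain_all() would. */

static int done;

static int dummy_worker(void *opaque)
{
    return 0;               /* the worker itself does nothing */
}

static void reentrant_cb(void *opaque, int ret)
{
    done++;
    /* Re-enter the event loop from inside the first callback.  If the
     * outer aio_poll() already drained the thread pool's notifier,
     * this nested call hangs waiting for a wakeup that never comes. */
    while (done < 2) {
        aio_poll(ctx, true);
    }
}

static void test_callback_reentrancy(void)
{
    done = 0;
    thread_pool_submit_aio(pool, dummy_worker, NULL, reentrant_cb, NULL);
    thread_pool_submit_aio(pool, dummy_worker, NULL, reentrant_cb, NULL);

    /* Give both workers time to finish before the first aio_poll(),
     * so both completions are pending when the callbacks run. */
    g_usleep(100000);

    while (done < 2) {
        aio_poll(ctx, true);
    }
}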
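
(And a rough sketch of the deferral idea from the last paragraph; the
names and structure are made up to illustrate the pattern, not a
proposal of actual code. The coroutine yields, the bottom half does the
real draining in the main loop, then re-enters the coroutine.)

typedef struct {
    Coroutine *co;
} DrainData;

static void drain_bh_cb(void *opaque)
{
    DrainData *dd = opaque;

    bdrv_drain_all();                    /* real draining, outside coroutine context */
    qemu_coroutine_enter(dd->co, NULL);  /* resume the waiting coroutine */
}

static void coroutine_fn drain_all_from_coroutine(void)
{
    DrainData dd = { .co = qemu_coroutine_self() };
    QEMUBH *bh = qemu_bh_new(drain_bh_cb, &dd);

    qemu_bh_schedule(bh);
    qemu_coroutine_yield();              /* woken up again by drain_bh_cb() */
    qemu_bh_delete(bh);
}

With something like this in place, the audit reduces to checking that
the remaining bdrv_drain/bdrv_drain_all callers run from the main loop.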