Message-ID: <538EF278.30308@redhat.com>
Date: Wed, 04 Jun 2014 12:18:32 +0200
From: Paolo Bonzini
To: Stefan Hajnoczi, Marcin Gibuła
Cc: qemu-devel
Subject: Re: [Qemu-devel] [PATCH v2] thread-pool: fix deadlock when callbacks depend on each other
In-Reply-To: <20140604100156.GF26902@stefanha-thinkpad.redhat.com>
References: <538C248F.2000801@beyond.pl> <20140604100156.GF26902@stefanha-thinkpad.redhat.com>

On 04/06/2014 12:01, Stefan Hajnoczi wrote:
>> Without this patch, I could reproduce this bug with snapshot-commit with
>> about 1 per 10 tries. With this patch, I couldn't reproduce it any more.
>>
>> Signed-off-by: Marcin Gibula
>> ---
>
> This is an interesting bug that definitely needs a test case to prevent
> regressions in the future.
>
> Please take a look at tests/test-thread-pool.c and add a test to it. It
> can be reproduced deterministically - just call aio_poll() after the
> dummy worker functions have both completed. Then the next aio_poll()
> call in the thread pool callback will suffer the problem you described.

The question is whether we want to consider this thread-pool.c behavior
a real bug or just a misfeature (the real bug being elsewhere). Even
though this patch avoids the performance problems of v1, we would have
to fix at least two other cases, and it is not obvious (a) that those
two are the only ones, or (b) that those two can be fixed without
affecting performance.

If the bottom half code is immune to this event notifier problem,
bdrv_drain/bdrv_drain_all calls in coroutine context can defer the
actual draining to a bottom half and reenter the coroutine afterwards;
we can then audit that all other calls come from the main loop rather
than aio_poll.

Paolo
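
(For concreteness, a minimal sketch of the kind of test Stefan suggests
could look like the following. It assumes the file-scope ctx and pool
variables that tests/test-thread-pool.c already sets up in main(); all
other names are invented for illustration and this is not the test that
was eventually merged.)

#include <glib.h>
#include "block/aio.h"
#include "block/thread-pool.h"

/* Sketch only: two work items whose completion callback re-enters
 * aio_poll(), the way a callback that calls bdrv_drain_all() would. */

static int done;

static int dummy_worker(void *opaque)
{
    return 0;               /* the worker itself does nothing */
}

static void reentrant_cb(void *opaque, int ret)
{
    done++;
    /* Re-enter the event loop from inside the first callback.  If the
     * outer aio_poll() already drained the thread pool's notifier,
     * this nested call hangs waiting for a wakeup that never comes. */
    while (done < 2) {
        aio_poll(ctx, true);
    }
}

static void test_callback_reentrancy(void)
{
    done = 0;
    thread_pool_submit_aio(pool, dummy_worker, NULL, reentrant_cb, NULL);
    thread_pool_submit_aio(pool, dummy_worker, NULL, reentrant_cb, NULL);

    /* Give both workers time to finish before the first aio_poll(),
     * so both completions are pending when the callbacks run. */
    g_usleep(100000);

    while (done < 2) {
        aio_poll(ctx, true);
    }
}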
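
(And a rough sketch of the deferral idea from the last paragraph; the
names and structure are made up to illustrate the pattern, not a
proposal of actual code. The coroutine yields, the bottom half does the
real draining in the main loop, then re-enters the coroutine.)

typedef struct {
    Coroutine *co;
} DrainData;

static void drain_bh_cb(void *opaque)
{
    DrainData *dd = opaque;

    bdrv_drain_all();                    /* real draining, outside coroutine context */
    qemu_coroutine_enter(dd->co, NULL);  /* resume the waiting coroutine */
}

static void coroutine_fn drain_all_from_coroutine(void)
{
    DrainData dd = { .co = qemu_coroutine_self() };
    QEMUBH *bh = qemu_bh_new(drain_bh_cb, &dd);

    qemu_bh_schedule(bh);
    qemu_coroutine_yield();              /* woken up again by drain_bh_cb() */
    qemu_bh_delete(bh);
}

With something like this in place, the audit reduces to checking that
the remaining bdrv_drain/bdrv_drain_all callers run from the main loop.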