From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48926) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1X2eGz-0008Lj-8Z for qemu-devel@nongnu.org; Thu, 03 Jul 2014 06:29:55 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1X2eGt-0007Pk-36 for qemu-devel@nongnu.org; Thu, 03 Jul 2014 06:29:49 -0400 Received: from mx1.redhat.com ([209.132.183.28]:55902) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1X2eGs-0007Pe-QW for qemu-devel@nongnu.org; Thu, 03 Jul 2014 06:29:43 -0400 Message-ID: <53B5308F.3030008@redhat.com> Date: Thu, 03 Jul 2014 12:29:35 +0200 From: Paolo Bonzini MIME-Version: 1.0 References: <20140627120129.GO12061@stefanha-thinkpad.muc.redhat.com> <53ADE769.3060903@redhat.com> <20140630080850.GB30969@stefanha-thinkpad.redhat.com> <53B2E69A.1090707@redhat.com> <20140702085453.GI4660@stefanha-thinkpad.redhat.com> <53B43194.80102@redhat.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [regression] dataplane: throughout -40% by commit 580b6b2aa2 List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Ming Lei Cc: Kevin Wolf , Fam Zheng , "Michael S. Tsirkin" , Stefan Hajnoczi , qemu-devel , Stefan Hajnoczi Il 03/07/2014 06:54, Ming Lei ha scritto: > On Thu, Jul 3, 2014 at 12:21 AM, Paolo Bonzini wrote: >> Il 02/07/2014 17:45, Ming Lei ha scritto: >>> The attachment debug patch skips aio_notify() if qemu_bh_schedule >>> is running from current aio context, but looks there is still 120K >>> writes triggered. (without the patch, 400K can be observed in >>> same test) >> >> Nice. Another observation is that after aio_dispatch we'll always >> re-evaluate everything (bottom halves, file descriptors and timeouts), > > The idea is very good. > > If aio_notify() is called from the 1st aio_dispatch() in aio_poll(), > ctc->notifier might need to be set, but it can be handled easily. Yes, you can just move the atomic_inc/atomic_dec in aio_poll. >> so we can skip the aio_notify if we're inside aio_dispatch. >> >> So what about this untested patch: >> >> diff --git a/aio-posix.c b/aio-posix.c >> index f921d4f..a23d85d 100644 >> --- a/aio-posix.c >> +++ b/aio-posix.c > > #include "qemu/atomic.h" > >> @@ -124,6 +124,9 @@ static bool aio_dispatch(AioContext *ctx) >> AioHandler *node; >> bool progress = false; >> >> + /* No need to set the event notifier during aio_notify. */ >> + ctx->running++; >> + >> /* >> * We have to walk very carefully in case qemu_aio_set_fd_handler is >> * called while we're walking. >> @@ -169,6 +171,11 @@ static bool aio_dispatch(AioContext *ctx) >> /* Run our timers */ >> progress |= timerlistgroup_run_timers(&ctx->tlg); >> >> + smp_wmb(); >> + ctx->iter_count++; >> + smp_wmb(); >> + ctx->running--; >> + >> return progress; >> } >> >> diff --git a/async.c b/async.c >> index 5b6fe6b..1f56afa 100644 >> --- a/async.c >> +++ b/async.c > > #include "qemu/atomic.h" > >> @@ -249,7 +249,19 @@ ThreadPool *aio_get_thread_pool(AioContext *ctx) >> >> void aio_notify(AioContext *ctx) >> { >> - event_notifier_set(&ctx->notifier); >> + uint32_t iter_count; >> + do { >> + iter_count = ctx->iter_count; >> + /* Read ctx->iter_count before ctx->running. */ >> + smb_rmb(); > > s/smb/smp > >> + if (!ctx->running) { >> + event_notifier_set(&ctx->notifier); >> + return; >> + } >> + /* Read ctx->running before ctx->iter_count. */ >> + smb_rmb(); > > s/smb/smp > >> + /* ctx might have gone to sleep. */ >> + } while (iter_count != ctx->iter_count); >> } > > Since both 'running' and 'iter_count' may be read lockless, something > like ACCESS_ONCE() should be used to avoid compiler optimization. No, smp_rmb() is enough to avoid them. See also include/qemu/seqlock.h The first access to ctx->iter_count _could_ be protected by ACCESS_ONCE(), which in QEMU we call atomic_read()/atomic_set(), but it's not necessary. See docs/atomics.txt for a description for QEMU's atomic access functions. > In my test, it does decrease write() very much, and I hope > a formal version can be applied soon. Can you take care of that (you can add my Signed-off-by), since you have the best testing environment? v5 of the plug/unplug series will be good to go, I think. Paolo