From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4E452994.9080805@codemonkey.ws>
Date: Fri, 12 Aug 2011 08:24:36 -0500
From: Anthony Liguori
MIME-Version: 1.0
References: <1312803458-2272-1-git-send-email-avi@redhat.com>
In-Reply-To: <1312803458-2272-1-git-send-email-avi@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] [PATCH] posix-aio-compat: fix latency issues
To: Avi Kivity
Cc: Kevin Wolf, qemu-devel@nongnu.org, kvm@vger.kernel.org

On 08/08/2011 06:37 AM, Avi Kivity wrote:
> In certain circumstances, posix-aio-compat can incur a lot of latency:
>  - threads are created by vcpu threads, so if vcpu affinity is set,
>    aio threads inherit vcpu affinity.  This can cause many aio threads
>    to compete for one cpu.
>  - we can create up to max_threads (64) aio threads in one go; since a
>    pthread_create can take around 30μs, we have up to 2ms of cpu time
>    under a global lock.
>
> Fix by:
>  - moving thread creation to the main thread, so we inherit the main
>    thread's affinity instead of the vcpu thread's affinity.
>  - if a thread is currently being created, and we need to create yet
>    another thread, let the thread being born create the new thread,
>    reducing the amount of time we spend under the main thread.
>  - drop the local lock while creating a thread (we may still hold the
>    global mutex, though)
>
> Note this doesn't eliminate latency completely; scheduler artifacts or
> lack of host cpu resources can still cause it.  We may want pre-allocated
> threads when this cannot be tolerated.
>
> Thanks to Uli Obergfell of Red Hat for his excellent analysis and
> suggestions.
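
To make that mechanism concrete, it can be condensed into the following
self-contained sketch (build with "cc -pthread").  The names worker(),
main_loop_poll() and bh_scheduled are invented stand-ins for aio_thread(),
the main loop and qemu_bh_schedule()/QEMUBH; this is only an illustration
of the idea, not the patch itself:

#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int new_threads;      /* backlog of threads we still need to create */
static int pending_threads;  /* threads created but not yet running */
static int bh_scheduled;     /* stands in for qemu_bh_schedule()/QEMUBH */

static void do_spawn_thread(void);

static void *worker(void *opaque)
{
    pthread_mutex_lock(&lock);
    /* A freshly born worker creates the next queued thread itself, so the
     * main loop only ever has to create the first thread of a batch. */
    if (new_threads) {
        do_spawn_thread();
    }
    pending_threads--;
    pthread_mutex_unlock(&lock);

    /* ... a real worker would now loop over queued aio requests ... */
    return NULL;
}

static void do_spawn_thread(void)   /* called with the lock held */
{
    pthread_t tid;

    if (!new_threads) {
        return;
    }
    new_threads--;
    pending_threads++;

    /* Drop the lock across pthread_create(), which can take tens of
     * microseconds, so nobody else has to wait for it. */
    pthread_mutex_unlock(&lock);
    pthread_create(&tid, NULL, worker, NULL);
    pthread_mutex_lock(&lock);
}

/* Called from a vcpu thread: only record the request and, if nobody is
 * already spawning, wake the main loop so the pthread_create() happens
 * there and workers inherit the main thread's affinity. */
static void spawn_thread(void)
{
    pthread_mutex_lock(&lock);
    new_threads++;
    if (!pending_threads) {
        bh_scheduled = 1;    /* qemu_bh_schedule(new_thread_bh) in the patch */
    }
    pthread_mutex_unlock(&lock);
}

static void main_loop_poll(void)    /* stands in for the bottom-half callback */
{
    if (bh_scheduled) {
        bh_scheduled = 0;
        pthread_mutex_lock(&lock);
        do_spawn_thread();
        pthread_mutex_unlock(&lock);
    }
}

int main(void)
{
    spawn_thread();      /* a "vcpu" asks for a worker */
    main_loop_poll();    /* the main loop actually creates it */
    sleep(1);            /* let the worker start and exit */
    return 0;
}

The point of the indirection is that a vcpu thread never calls
pthread_create() itself; it only bumps a counter and schedules work for the
main loop, and any worker already being created picks up the rest of the
backlog.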
>
> Signed-off-by: Avi Kivity
> ---
>  posix-aio-compat.c |   48 ++++++++++++++++++++++++++++++++++++++++++++++--
>  1 files changed, 46 insertions(+), 2 deletions(-)
>
> diff --git a/posix-aio-compat.c b/posix-aio-compat.c
> index 8dc00cb..aa30673 100644
> --- a/posix-aio-compat.c
> +++ b/posix-aio-compat.c
> @@ -30,6 +30,7 @@
>
>  #include "block/raw-posix-aio.h"
>
> +static void do_spawn_thread(void);
>
>  struct qemu_paiocb {
>      BlockDriverAIOCB common;
> @@ -64,6 +65,9 @@ static pthread_attr_t attr;
>  static int max_threads = 64;
>  static int cur_threads = 0;
>  static int idle_threads = 0;
> +static int new_threads = 0;     /* backlog of threads we need to create */
> +static int pending_threads = 0; /* threads created but not running yet */
> +static QEMUBH *new_thread_bh;
>  static QTAILQ_HEAD(, qemu_paiocb) request_list;
>
>  #ifdef CONFIG_PREADV
> @@ -311,6 +315,13 @@ static void *aio_thread(void *unused)
>
>      pid = getpid();
>
> +    mutex_lock(&lock);
> +    if (new_threads) {
> +        do_spawn_thread();
> +    }
> +    pending_threads--;
> +    mutex_unlock(&lock);
> +
>      while (1) {
>          struct qemu_paiocb *aiocb;
>          ssize_t ret = 0;
> @@ -381,11 +392,18 @@ static void *aio_thread(void *unused)
>      return NULL;
>  }
>
> -static void spawn_thread(void)
> +static void do_spawn_thread(void)
>  {
>      sigset_t set, oldset;
>
> -    cur_threads++;
> +    if (!new_threads) {
> +        return;
> +    }
> +
> +    new_threads--;
> +    pending_threads++;
> +
> +    mutex_unlock(&lock);
>
>      /* block all signals */
>      if (sigfillset(&set)) die("sigfillset");
> @@ -394,6 +412,31 @@
>      thread_create(&thread_id, &attr, aio_thread, NULL);
>
>      if (sigprocmask(SIG_SETMASK, &oldset, NULL)) die("sigprocmask restore");
> +
> +    mutex_lock(&lock);
> +}
> +
> +static void spawn_thread_bh_fn(void *opaque)
> +{
> +    mutex_lock(&lock);
> +    do_spawn_thread();
> +    mutex_unlock(&lock);
> +}

The locking here is odd.  Why not call do_spawn_thread() without the lock,
and acquire the lock for the section that needs to hold it?

Otherwise, the logic seems correct to me.

Kevin, could you also take a look at this patch?

Regards,

Anthony Liguori

> +
> +static void spawn_thread(void)
> +{
> +    cur_threads++;
> +    new_threads++;
> +    /* If there are threads being created, they will spawn new workers, so
> +     * we don't spend time creating many threads in a loop holding a mutex or
> +     * starving the current vcpu.
> +     *
> +     * If there are no idle threads, ask the main thread to create one, so we
> +     * inherit the correct affinity instead of the vcpu affinity.
> +     */
> +    if (!pending_threads) {
> +        qemu_bh_schedule(new_thread_bh);
> +    }
>  }
>
>  static void qemu_paio_submit(struct qemu_paiocb *aiocb)
> @@ -665,6 +708,7 @@ int paio_init(void)
>          die2(ret, "pthread_attr_setdetachstate");
>
>      QTAILQ_INIT(&request_list);
> +    new_thread_bh = qemu_bh_new(spawn_thread_bh_fn, NULL);
>
>      posix_aio_state = s;
>      return 0;
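
For concreteness, the restructuring asked about in the reply above (taking
the mutex only around the counter updates, so that neither the bottom-half
callback nor pthread_create() runs under it) might look roughly like the
fragment below.  It reuses the patch's names (do_spawn_thread,
spawn_thread_bh_fn, new_threads, pending_threads) but replaces QEMU's
thread_create()/die() helpers with plain pthread/libc calls so it compiles
on its own; it is an untested sketch, not a follow-up patch:

#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

/* Bookkeeping repeated here so the fragment stands alone; in QEMU these
 * already exist in posix-aio-compat.c. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_t thread_id;
static int new_threads;
static int pending_threads;

static void die(const char *what)       /* simplified stand-in */
{
    perror(what);
    exit(1);
}

static void *aio_thread(void *unused)   /* worker loop, as in the patch */
{
    return NULL;
}

/* Callers do NOT hold the lock; it is taken only where it is needed. */
static void do_spawn_thread(void)
{
    sigset_t set, oldset;

    pthread_mutex_lock(&lock);
    if (!new_threads) {
        pthread_mutex_unlock(&lock);
        return;
    }
    new_threads--;
    pending_threads++;
    pthread_mutex_unlock(&lock);

    /* Signal masking and thread creation now run with no lock held. */
    if (sigfillset(&set)) die("sigfillset");
    if (sigprocmask(SIG_SETMASK, &set, &oldset)) die("sigprocmask");

    if (pthread_create(&thread_id, NULL, aio_thread, NULL))
        die("pthread_create");

    if (sigprocmask(SIG_SETMASK, &oldset, NULL)) die("sigprocmask restore");
}

static void spawn_thread_bh_fn(void *opaque)
{
    /* The bottom half no longer wraps the call in the mutex. */
    do_spawn_thread();
}

int main(void)
{
    new_threads = 1;            /* pretend one worker has been requested */
    spawn_thread_bh_fn(NULL);   /* what the main loop's bottom half would do */
    pthread_join(thread_id, NULL);
    return 0;
}

The same shape would apply to the call made from a newly started
aio_thread(): the caller would drop the lock before calling
do_spawn_thread() instead of relying on the callee to release and
re-acquire it.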