From: Avi Kivity
Subject: Re: [Qemu-devel] [PATCH] posix-aio-compat: fix latency issues
Date: Mon, 08 Aug 2011 15:42:11 +0300
Message-ID: <4E3FD9A3.9010109@redhat.com>
References: <1312803458-2272-1-git-send-email-avi@redhat.com> <4E3FD7E7.4090509@codemonkey.ws>
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org
To: Anthony Liguori
In-Reply-To: <4E3FD7E7.4090509@codemonkey.ws>

On 08/08/2011 03:34 PM, Anthony Liguori wrote:
> On 08/08/2011 06:37 AM, Avi Kivity wrote:
>> In certain circumstances, posix-aio-compat can incur a lot of latency:
>>  - threads are created by vcpu threads, so if vcpu affinity is set,
>>    aio threads inherit vcpu affinity. This can cause many aio threads
>>    to compete for one cpu.
>>  - we can create up to max_threads (64) aio threads in one go; since a
>>    pthread_create can take around 30μs, we have up to 2ms of cpu time
>>    under a global lock.
>>
>> Fix by:
>>  - moving thread creation to the main thread, so we inherit the main
>>    thread's affinity instead of the vcpu thread's affinity.
>>  - if a thread is currently being created, and we need to create yet
>>    another thread, let the thread being born create the new thread,
>>    reducing the amount of time we spend in the main thread.
>>  - drop the local lock while creating a thread (we may still hold the
>>    global mutex, though)
>>
>> Note this doesn't eliminate latency completely; scheduler artifacts or
>> lack of host cpu resources can still cause it. We may want pre-allocated
>> threads when this cannot be tolerated.
>>
>> Thanks to Uli Obergfell of Red Hat for his excellent analysis and
>> suggestions.
>
> Do you have a scenario where you can measure the benefits of this change?

It's a customer scenario, so I can't share it. Not that I know exactly
what happened there in terms of workload.

> The idle time in the thread pool is rather large; it surprises me that
> it'd be an issue in practice.
>

Just starting up a virtio guest will fill the queue with > max_threads
requests, and if the vcpu is pinned, all 64 thread creations and
executions will have to run on the same cpu, and will likely preempt the
vcpu, since the vcpu thread is classified as a "cpu hog" by some schedulers.

-- 
error compiling committee.c: too many arguments to function
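Two of the fixes in the quoted patch description (let the thread being born create the next thread, and drop the local lock around pthread_create) can be illustrated with a small pthreads pool. This is a hypothetical sketch, not the posix-aio-compat code: the names (spawn_one_locked, submit_request, pool_wait_and_stop, pending_spawns, spawn_in_flight), the "grow only when no worker is idle" policy and the simplified error handling are all assumptions, and it does not model moving creation to the main thread.

```c
/* Hypothetical sketch: not the code from the posix-aio-compat patch. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_THREADS 64              /* mirrors max_threads in the mail */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_cv = PTHREAD_COND_INITIALIZER; /* wakes workers */
static pthread_cond_t done_cv = PTHREAD_COND_INITIALIZER; /* wakes waiter */
static int cur_threads;        /* live threads, including one being born */
static int idle_threads;       /* workers blocked waiting for requests */
static int pending_spawns;     /* threads wanted but not yet created */
static bool spawn_in_flight;   /* a thread was created but has not run yet */
static bool shutting_down;
static int queued_requests, submitted_requests, done_requests;
static int total_created;

static void *worker(void *arg);

/* Called with `lock` held; returns with it held. */
static void spawn_one_locked(void)
{
    pthread_t tid;
    int err;

    pending_spawns--;
    assert(pending_spawns >= 0);
    spawn_in_flight = true;
    cur_threads++;
    total_created++;
    /* pthread_create() may take tens of microseconds: drop the lock. */
    pthread_mutex_unlock(&lock);
    err = pthread_create(&tid, NULL, worker, NULL);
    pthread_mutex_lock(&lock);
    if (err == 0) {
        pthread_detach(tid);
    } else {
        /* Simplification: forget the remaining backlog on failure. */
        cur_threads--;
        total_created--;
        spawn_in_flight = false;
        pending_spawns = 0;
        pthread_cond_broadcast(&done_cv);
    }
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    /* Newborn: take over creations requested while we were being born. */
    spawn_in_flight = false;
    if (pending_spawns > 0)
        spawn_one_locked();

    for (;;) {
        while (queued_requests == 0 && !shutting_down) {
            idle_threads++;
            pthread_cond_wait(&work_cv, &lock);
            idle_threads--;
        }
        if (queued_requests == 0)
            break;                       /* shutting down, queue empty */
        queued_requests--;
        pthread_mutex_unlock(&lock);
        /* ... the actual I/O would be performed here, unlocked ... */
        pthread_mutex_lock(&lock);
        done_requests++;
        pthread_cond_broadcast(&done_cv);
    }
    cur_threads--;
    pthread_cond_broadcast(&done_cv);
    pthread_mutex_unlock(&lock);
    return NULL;
}

void submit_request(void)
{
    pthread_mutex_lock(&lock);
    queued_requests++;
    submitted_requests++;
    /* Assumed policy: grow the pool only when no worker is idle. */
    if (idle_threads == 0 && cur_threads + pending_spawns < MAX_THREADS) {
        pending_spawns++;
        /* If a thread is being born, it will create this one for us. */
        if (!spawn_in_flight)
            spawn_one_locked();
    }
    pthread_cond_signal(&work_cv);
    pthread_mutex_unlock(&lock);
}

void pool_wait_and_stop(void)
{
    pthread_mutex_lock(&lock);
    while (done_requests < submitted_requests)
        pthread_cond_wait(&done_cv, &lock);
    shutting_down = true;
    pthread_cond_broadcast(&work_cv);
    while (cur_threads > 0 || spawn_in_flight)
        pthread_cond_wait(&done_cv, &lock);
    pthread_mutex_unlock(&lock);
}
```

In this sketch each submit_request() call performs at most one pthread_create(), and none at all while another thread is still being born; the backlog is chained through the newborn threads instead. Build with -pthread.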