Message-ID: <4E3FD9A3.9010109@redhat.com>
Date: Mon, 08 Aug 2011 15:42:11 +0300
From: Avi Kivity
References: <1312803458-2272-1-git-send-email-avi@redhat.com> <4E3FD7E7.4090509@codemonkey.ws>
In-Reply-To: <4E3FD7E7.4090509@codemonkey.ws>
Subject: Re: [Qemu-devel] [PATCH] posix-aio-compat: fix latency issues
To: Anthony Liguori
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org

On 08/08/2011 03:34 PM, Anthony Liguori wrote:
> On 08/08/2011 06:37 AM, Avi Kivity wrote:
>> In certain circumstances, posix-aio-compat can incur a lot of latency:
>>  - threads are created by vcpu threads, so if vcpu affinity is set,
>>    aio threads inherit vcpu affinity.  This can cause many aio threads
>>    to compete for one cpu.
>>  - we can create up to max_threads (64) aio threads in one go; since a
>>    pthread_create can take around 30μs, we have up to 2ms of cpu time
>>    under a global lock.
>>
>> Fix by:
>>  - moving thread creation to the main thread, so we inherit the main
>>    thread's affinity instead of the vcpu thread's affinity.
>>  - if a thread is currently being created, and we need to create yet
>>    another thread, letting the thread being born create the new
>>    thread, reducing the amount of time we spend in the main thread.
>>  - dropping the local lock while creating a thread (we may still hold
>>    the global mutex, though).
>>
>> Note this doesn't eliminate latency completely; scheduler artifacts or
>> lack of host cpu resources can still cause it.  We may want
>> pre-allocated threads when this cannot be tolerated.
>>
>> Thanks to Uli Obergfell of Red Hat for his excellent analysis and
>> suggestions.
>
> Do you have a scenario where you can measure the benefits of this
> change?

It's a customer scenario, so I can't share it.  Nor do I know exactly
what happened there in terms of workload.

> The idle time in the thread pool is rather large; it surprises me
> that it'd be an issue in practice.

Just starting up a virtio guest will fill the queue with more than
max_threads requests, and if the vcpu is pinned, all 64 thread creations
and executions will have to run on the same cpu, and will likely preempt
the vcpu, since it's classified as a "cpu hog" by some schedulers.

-- 
error compiling committee.c: too many arguments to function
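
P.S. To make the scheme above concrete, here is a minimal sketch of the
pattern -- this is not the actual posix-aio-compat code, and the names
(submit_request(), spawn_handler(), and so on) are invented for
illustration.  A submitter (vcpu thread) never calls pthread_create()
itself; it kicks the main loop through a pipe, so the new worker
inherits the main thread's affinity, and the pool lock is dropped
around the slow create:

#include <pthread.h>
#include <unistd.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int idle_threads;    /* workers currently waiting for a request */
static int spawn_wanted;    /* a submitter asked for a new worker */
static int notify_pipe[2];  /* wakes the main loop, like a qemu bottom half */

static void pool_init(void)
{
    if (pipe(notify_pipe) < 0) {
        perror("pipe");
    }
}

static void *aio_worker(void *opaque)
{
    /* ... pop requests off the queue and execute them ... */
    return NULL;
}

/* Runs in a vcpu thread: queue work, but never pthread_create() here,
 * or the worker would inherit this (possibly pinned) thread's
 * affinity.  Kick the main loop instead. */
static void submit_request(void)
{
    pthread_mutex_lock(&lock);
    /* ... enqueue the request ... */
    if (idle_threads == 0 && !spawn_wanted) {
        spawn_wanted = 1;
        if (write(notify_pipe[1], "", 1) < 0) {
            perror("write");
        }
    }
    pthread_mutex_unlock(&lock);
}

/* Runs in the main thread when notify_pipe becomes readable, so the
 * new worker inherits the main thread's affinity. */
static void spawn_handler(void)
{
    char c;
    pthread_t tid;

    if (read(notify_pipe[0], &c, 1) < 0) {
        perror("read");
    }
    pthread_mutex_lock(&lock);
    while (spawn_wanted) {
        spawn_wanted = 0;
        /* Drop the pool lock around the slow (~30μs) pthread_create()
         * so submitters are not serialized behind it. */
        pthread_mutex_unlock(&lock);
        pthread_create(&tid, NULL, aio_worker, NULL);
        pthread_mutex_lock(&lock);
    }
    pthread_mutex_unlock(&lock);
}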