From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [PATCH] posix-aio-compat: fix latency issues
Date: Mon, 08 Aug 2011 15:54:43 +0300
Message-ID: <4E3FDC93.3000701@redhat.com>
References: <1312803458-2272-1-git-send-email-avi@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org
To: Frediano Ziglio
Sender: qemu-devel-bounces+gceq-qemu-devel=gmane.org@nongnu.org
List-Id: kvm.vger.kernel.org

On 08/08/2011 03:49 PM, Frediano Ziglio wrote:
> 2011/8/8 Avi Kivity:
> > In certain circumstances, posix-aio-compat can incur a lot of latency:
> >  - threads are created by vcpu threads, so if vcpu affinity is set,
> >    aio threads inherit vcpu affinity.  This can cause many aio threads
> >    to compete for one cpu.
> >  - we can create up to max_threads (64) aio threads in one go; since a
> >    pthread_create can take around 30μs, we have up to 2ms of cpu time
> >    under a global lock.
> >
> > Fix by:
> >  - moving thread creation to the main thread, so we inherit the main
> >    thread's affinity instead of the vcpu thread's affinity.
> >  - if a thread is currently being created, and we need to create yet
> >    another thread, let thread being born create the new thread, reducing
> >    the amount of time we spend under the main thread.
> >  - drop the local lock while creating a thread (we may still hold the
> >    global mutex, though)
> >
> > Note this doesn't eliminate latency completely; scheduler artifacts or
> > lack of host cpu resources can still cause it.  We may want pre-allocated
> > threads when this cannot be tolerated.
> >
> > Thanks to Uli Obergfell of Red Hat for his excellent analysis and suggestions.
> >
> > Signed-off-by: Avi Kivity
>
> Why not calling pthread_attr_setaffinity_np (where available) before
> thread creation or sched_setaffinity at thread start instead of telling
> another thread to create a thread for us just to get affinity cleared?
>

The entire qemu process may be affined to a subset of the host cpus; we
don't want to break that.

For example:

   taskset 0xf0 qemu ....
   (qemu) info cpus

-- 
error compiling committee.c: too many arguments to function
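
The inheritance behaviour discussed in this thread can be illustrated with a small sketch. On Linux, a new thread starts with the affinity mask of the thread that created it, so a worker spawned by a pinned thread is confined to that thread's cpu, while a worker spawned by the main thread keeps whatever mask the process was started with (for instance via taskset). The following is only a minimal illustration in Python, assuming Linux (os.sched_setaffinity is not available elsewhere); the helper names are hypothetical and none of this is qemu code.

```python
import os
import threading


def affinity_of_new_thread():
    """Start a thread from the caller and return the CPU mask it begins with."""
    result = []
    t = threading.Thread(target=lambda: result.append(os.sched_getaffinity(0)))
    t.start()
    t.join()
    return result[0]


def spawn_from_pinned_thread(cpu):
    """Stand-in for a pinned vcpu thread: pin to one CPU, then create a worker."""
    result = []

    def pinned():
        # pid 0 refers to the calling thread only, not the whole process.
        os.sched_setaffinity(0, {cpu})
        result.append(affinity_of_new_thread())

    t = threading.Thread(target=pinned)
    t.start()
    t.join()
    return result[0]


# Mask the process was started with (e.g. restricted by taskset).
process_mask = os.sched_getaffinity(0)
one_cpu = min(process_mask)

# Worker created by the pinned thread: confined to a single CPU.
from_pinned = spawn_from_pinned_thread(one_cpu)

# Worker created by the main thread: keeps the process-wide mask.
from_main = affinity_of_new_thread()

print("process mask:       ", sorted(process_mask))
print("created by pinned:  ", sorted(from_pinned))
print("created by main:    ", sorted(from_main))
```

Creating the worker from the main thread keeps the process-wide restriction intact without the worker having to know what that restriction is, which is the concern raised in the reply about setting affinity explicitly.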