From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752935AbbIONgm (ORCPT );
	Tue, 15 Sep 2015 09:36:42 -0400
Received: from e06smtp16.uk.ibm.com ([195.75.94.112]:40162 "EHLO
	e06smtp16.uk.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752336AbbIONgk (ORCPT );
	Tue, 15 Sep 2015 09:36:40 -0400
X-Helo: d06dlp02.portsmouth.uk.ibm.com
X-MailFrom: borntraeger@de.ibm.com
X-RcptTo: linux-kernel@vger.kernel.org
Subject: Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace
 signal_struct->group_rwsem with a global percpu_rwsem) causes regression
 for libvirt/kvm
To: Peter Zijlstra
References: <55F8097A.7000206@de.ibm.com>
 <20150915130550.GC16853@twins.programming.kicks-ass.net>
Cc: Tejun Heo , Ingo Molnar ,
 "linux-kernel@vger.kernel.org >> Linux Kernel Mailing List" ,
 KVM list , Oleg Nesterov , Paul McKenney
From: Christian Borntraeger
Message-ID: <55F81EE2.4090708@de.ibm.com>
Date: Tue, 15 Sep 2015 15:36:34 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.2.0
MIME-Version: 1.0
In-Reply-To: <20150915130550.GC16853@twins.programming.kicks-ass.net>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 15091513-0025-0000-0000-000006DD9A5E
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 15.09.2015 at 15:05, Peter Zijlstra wrote:
> On Tue, Sep 15, 2015 at 02:05:14PM +0200, Christian Borntraeger wrote:
>> Tejun,
>>
>>
>> commit d59cfc09c32a2ae31f1c3bc2983a0cd79afb3f14 (sched, cgroup: replace
>> signal_struct->group_rwsem with a global percpu_rwsem) causes some noticeable
>> hiccups when starting several kvm guests (which libvirt will move into cgroups
>> - each vcpu thread and each i/o thread)
>> When you now start lots of guests in parallel on a bigger system (32CPUs with
>> 2way smt in my case) the system is so busy that systemd runs into several timeouts
>> like "Did not receive a reply. Possible causes include: the remote application did
>> not send a reply, the message bus security policy blocked the reply, the reply
>> timeout expired, or the network connection was broken."
>>
>> The problem seems to be that the newly used percpu_rwsem does a
>> rcu_synchronize_sched_expedited for all write downs/ups.
>
> Can you try:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev.2015.09.11ab

yes, dev.2015.09.11a seems to help, thanks. Getting rid of the expedited hammer
was really helpful - I guess.

>
> those include Oleg's rework of the percpu rwsem which should hopefully
> improve things somewhat.
>
> But yes, pounding a global lock on a big machine will always suck.

By hacking out the fast path I actually degraded percpu rwsem to a real global
lock, but things were still a lot faster.
I am wondering why the old code behaved in such fatal ways. Is there some
interaction between waiting for a reschedule in the synchronize_sched writer
and some fork code actually waiting for the read side to get the lock, together
with some rescheduling going on waiting for a lock that fork holds?
lockdep does not give me any hints, so I have no clue :-(

Christian