To: Tejun Heo
Cc: Ingo Molnar, Peter Zijlstra, Linux Kernel Mailing List (linux-kernel@vger.kernel.org), KVM list
From: Christian Borntraeger
Subject: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm
Message-ID: <55F8097A.7000206@de.ibm.com>
Date: Tue, 15 Sep 2015 14:05:14 +0200

Tejun,

commit d59cfc09c32a2ae31f1c3bc2983a0cd79afb3f14 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes noticeable hiccups when starting several KVM guests (libvirt moves each vCPU thread and each I/O thread of a guest into cgroups).

When many guests are started in parallel on a larger system (32 CPUs with 2-way SMT in my case), the system becomes so busy that systemd runs into several timeouts such as:

  "Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken."
The problem seems to be that the newly used percpu_rwsem does a synchronize_sched_expedited() on every write-side lock and unlock (percpu_down_write()/percpu_up_write()). Hacking percpu_rw_semaphore to always take the slow path, and thus avoid the synchronize_sched_expedited() calls, seems to fix the issue. For some reason not yet clear to me, the synchronize_sched_expedited() calls together with the reader fast path seem to block writers for a very long time.

To trigger the problem, the guests must be CPU-bound; idle guests do not seem to trigger the pathological case.

Any idea how to improve the situation? This looks like a real regression for larger KVM installations.

Christian