From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751927AbbIOMFZ (ORCPT <rfc822;w@1wt.eu>);
	Tue, 15 Sep 2015 08:05:25 -0400
Received: from e06smtp07.uk.ibm.com ([195.75.94.103]:58527 "EHLO
	e06smtp07.uk.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751567AbbIOMFX (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Tue, 15 Sep 2015 08:05:23 -0400
X-Helo: d06dlp02.portsmouth.uk.ibm.com
X-MailFrom: borntraeger@de.ibm.com
X-RcptTo: linux-kernel@vger.kernel.org
To: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>, Peter Zijlstra <peterz@infradead.org>,
        "linux-kernel@vger.kernel.org >> Linux Kernel Mailing List" 
	<linux-kernel@vger.kernel.org>,
        KVM list <kvm@vger.kernel.org>
From: Christian Borntraeger <borntraeger@de.ibm.com>
Subject: [4.2] commit d59cfc09c32 (sched, cgroup: replace
 signal_struct->group_rwsem with a global percpu_rwsem) causes regression for
 libvirt/kvm
Message-ID: <55F8097A.7000206@de.ibm.com>
Date: Tue, 15 Sep 2015 14:05:14 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.2.0
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Tejun,


Commit d59cfc09c32a2ae31f1c3bc2983a0cd79afb3f14 (sched, cgroup: replace
signal_struct->group_rwsem with a global percpu_rwsem) causes noticeable
hiccups when starting several KVM guests (libvirt moves each guest's vCPU
threads and I/O threads into cgroups).
When many guests are started in parallel on a bigger system (32 CPUs with
2-way SMT in my case), the system becomes so busy that systemd runs into
several timeouts like "Did not receive a reply. Possible causes include: the
remote application did not send a reply, the message bus security policy
blocked the reply, the reply timeout expired, or the network connection was
broken."

The problem seems to be that the newly used percpu_rwsem calls
synchronize_sched_expedited() on every write-side lock and unlock
(percpu_down_write()/percpu_up_write()).

Hacking the percpu_rw_semaphore to always take the slow path and skip the
synchronize_sched call seems to fix the issue. For some reason (still unknown
to me), the synchronize_sched calls together with the fast path seem to block
writers for extremely long times.

To trigger the problem, the guests must be CPU bound; idle guests do not seem
to trigger the pathological case. Any idea how to improve the situation? This
looks like a real regression for larger KVM installations.

Christian