From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751533AbcFYNms (ORCPT); Sat, 25 Jun 2016 09:42:48 -0400
Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:45757 "EHLO
	mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751249AbcFYNmq (ORCPT); Sat, 25 Jun 2016 09:42:46 -0400
X-IBM-Helo: d01dlp03.pok.ibm.com
X-IBM-MailFrom: xinhui.pan@linux.vnet.ibm.com
From: Pan Xinhui
To: linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, mingo@redhat.com, dave@stgolabs.net,
	will.deacon@arm.com, Waiman.Long@hpe.com, benh@kernel.crashing.org,
	Pan Xinhui
Subject: [PATCH] locking/osq: Drop the overload of osq lock
Date: Sat, 25 Jun 2016 13:42:03 -0400
X-Mailer: git-send-email 2.4.11
X-TM-AS-GCONF: 00
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 16062513-0040-0000-0000-000000A8449C
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 16062513-0041-0000-0000-000004823785
Message-Id: <1466876523-33437-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,,
	definitions=2016-06-25_06:,, signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
	spamscore=0 suspectscore=1 malwarescore=0 phishscore=0 adultscore=0
	bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1
	engine=8.0.1-1604210000 definitions=main-1606250162
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

An over-committed guest with more vCPUs than pCPUs suffers heavy overhead
in osq_lock(). This is because vCPU A holds the osq lock and yields out,
while vCPU B spins waiting for the per-cpu node->locked to be set; IOW,
vCPU B waits for vCPU A to run again and unlock the osq lock. Even though
there is a need_resched() check, it does not help in this scenario.

To fix this issue, add a threshold to the spin loop in osq_lock(). The
threshold value is roughly equal to SPIN_THRESHOLD.
perf record -a perf bench sched messaging -g 400 -p && perf report

before patch:
18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
 5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
 3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
 3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
 2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call

after patch:
 7.62%  sched-messaging  [kernel.kallsyms]  [k] wait_consider_task
 7.30%  sched-messaging  [kernel.kallsyms]  [k] _raw_write_lock_irq
 5.93%  sched-messaging  [kernel.kallsyms]  [k] mutex_unlock
 5.74%  sched-messaging  [unknown]          [H] 0xc000000000077590
 4.37%  sched-messaging  [kernel.kallsyms]  [k] __copy_tofrom_user_powe
 2.58%  sched-messaging  [kernel.kallsyms]  [k] system_call

Signed-off-by: Pan Xinhui
---
 kernel/locking/osq_lock.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 05a3785..922fe5d 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -81,12 +81,16 @@ osq_wait_next(struct optimistic_spin_queue *lock,
 	return next;
 }
 
+/* The threshold should take nearly 0.5ms on most archs */
+#define OSQ_SPIN_THRESHOLD (1 << 15)
+
 bool osq_lock(struct optimistic_spin_queue *lock)
 {
 	struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
 	struct optimistic_spin_node *prev, *next;
 	int curr = encode_cpu(smp_processor_id());
 	int old;
+	int loops = 0;
 
 	node->locked = 0;
 	node->next = NULL;
@@ -118,8 +122,14 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	while (!READ_ONCE(node->locked)) {
 		/*
 		 * If we need to reschedule bail... so we can block.
+		 * An over-committed guest with more vCPUs than pCPUs
+		 * might fall in this loop and cause a huge overload.
+		 * This is because vCPU A (prev) holds the osq lock and yields,
+		 * while vCPU B (node) waits for ->locked to be set; IOW, it
+		 * waits until vCPU A runs and unlocks the osq lock.
+		 * NOTE that vCPU A and vCPU B might run on the same physical CPU.
 		 */
-		if (need_resched())
+		if (need_resched() || loops++ >= OSQ_SPIN_THRESHOLD)
 			goto unqueue;
 
 		cpu_relax_lowlatency();
-- 
2.4.11