From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <57489FE7.8080402@hpe.com>
Date: Fri, 27 May 2016 15:28:39 -0400
From: Waiman Long
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130109 Thunderbird/10.0.12
MIME-Version: 1.0
To: Boqun Feng
CC: Peter Zijlstra, Ingo Molnar, Pan Xinhui, Scott J Norton, Douglas Hatch
Subject: Re: [PATCH 1/2] locking/pvqspinlock: Fix missed PV wakeup problem
References: <1464286918-39748-1-git-send-email-Waiman.Long@hpe.com>
 <1464286918-39748-2-git-send-email-Waiman.Long@hpe.com>
 <20160527074331.GB8096@insomnia>
In-Reply-To: <20160527074331.GB8096@insomnia>
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
X-OriginatorOrg: hpe.com
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On 05/27/2016 03:43 AM, Boqun Feng wrote:
> Hi Waiman,
>
> On Thu, May 26, 2016 at 02:21:57PM -0400, Waiman Long wrote:
>> Currently, calling pv_hash() and setting _Q_SLOW_VAL is only
>> done once for any pv_node. It is either in pv_kick_node() or in
>> pv_wait_head_or_lock(). Because of lock stealing, a pv_kick'ed node is
>> not guaranteed to get the lock before the spinning threshold expires
>> and has to call pv_wait() again. As a result, the new lock holder
>> won't see _Q_SLOW_VAL and so won't wake up the sleeping vCPU.
>>
>> This patch fixes this missed PV wakeup problem by allowing multiple
>> _Q_SLOW_VAL settings within pv_wait_head_or_lock() and matching each
>> successful setting of _Q_SLOW_VAL to a pv_hash() call.
>>
>> Reported-by: Pan Xinhui
>> Signed-off-by: Waiman Long
>> ---
>>  kernel/locking/qspinlock_paravirt.h |   48 ++++++++++++++++++++++------------
>>  1 files changed, 31 insertions(+), 17 deletions(-)
>>
>> diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
>> index 21ede57..452d06d 100644
>> --- a/kernel/locking/qspinlock_paravirt.h
>> +++ b/kernel/locking/qspinlock_paravirt.h
>> @@ -369,12 +369,16 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
>>  	/*
>>  	 * Put the lock into the hash table and set the _Q_SLOW_VAL.
>>  	 *
>> -	 * As this is the same vCPU that will check the _Q_SLOW_VAL value and
>> -	 * the hash table later on at unlock time, no atomic instruction is
>> -	 * needed.
>> +	 * It is very unlikely that this will race with the _Q_SLOW_VAL setting
>> +	 * in pv_wait_head_or_lock(). However, we use cmpxchg() here to be
>> +	 * sure that we won't do a double pv_hash().
>> +	 *
>> +	 * As it is the lock holder, it won't race with
>> +	 * __pv_queued_spin_unlock().
>>  	 */
>> -	WRITE_ONCE(l->locked, _Q_SLOW_VAL);
>> -	(void)pv_hash(lock, pn);
>> +	if (likely(cmpxchg(&l->locked, _Q_LOCKED_VAL, _Q_SLOW_VAL)
>> +			== _Q_LOCKED_VAL))
>> +		pv_hash(lock, pn);
>>  }
>>
>>  /*
>> @@ -389,18 +393,10 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
>>  {
>>  	struct pv_node *pn = (struct pv_node *)node;
>>  	struct __qspinlock *l = (void *)lock;
>> -	struct qspinlock **lp = NULL;
>>  	int waitcnt = 0;
>>  	int loop;
>>
>>  	/*
>> -	 * If pv_kick_node() already advanced our state, we don't need to
>> -	 * insert ourselves into the hash table anymore.
>> -	 */
>> -	if (READ_ONCE(pn->state) == vcpu_hashed)
>> -		lp = (struct qspinlock **)1;
>> -
>> -	/*
>>  	 * Tracking # of slowpath locking operations
>>  	 */
>>  	qstat_inc(qstat_pv_lock_slowpath, true);
>> @@ -422,11 +418,19 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
>>  			goto gotlock;
>>  		cpu_relax();
>>  	}
>> -	clear_pending(lock);
>>
>> +	/*
>> +	 * Make sure the lock value check below is executed after
>> +	 * all the previous loads.
>> +	 */
>> +	smp_rmb();
>>
>> -	if (!lp) {	/* ONCE */
>> -		lp = pv_hash(lock, pn);
>> +	/*
>> +	 * Set _Q_SLOW_VAL and hash the PV node, if necessary.
>> +	 */
>> +	if (READ_ONCE(l->locked) != _Q_SLOW_VAL) {
>> +		struct qspinlock **lp = pv_hash(lock, pn);
>> +		u8 locked;
>> +
> Just out of curiosity, what if the following sequence happens:
>
> 	CPU 0				CPU 1
> 	=================		====================
> 	spin_lock():			spin_lock():
> 	  pv_kick_node():		  pv_wait_head_or_lock():
> 					    if (READ_ONCE(l->locked) != _Q_SLOW_VAL) { // True
> 					      pv_hash();
> 	    cmpxchg(&l->locked,
> 	            _Q_LOCKED_VAL,
> 	            _Q_SLOW_VAL);
> 	    pv_hash();
> 					      locked = xchg(&l->locked, _Q_SLOW_VAL);
> 	do_something();			      if (...) {
> 					      }
> 	spin_unlock():
> 	  pv_unhash();
> 					      else if (unlikely(locked == _Q_SLOW_VAL)) {
> 						WRITE_ONCE(*lp, NULL);
>
> Because pv_hash() on CPU 1 is called before the one on CPU 0, the hash
> entry from CPU 1 precedes the hash entry from CPU 0 in the hash table.
> So when pv_unhash() is called, the hash entry from CPU 1 will be
> unhashed; however, the WRITE_ONCE(*lp, NULL) on CPU 1 will also unhash
> that same entry, leaving the hash entry from CPU 0 never unhashed.
>
> This could result in several interesting problems, right? ;-)

This is a very unlikely scenario, but I agree that it can happen. I
think the only way to close this loophole is to make pv_unhash()
atomic. Using pv_unhash() instead of WRITE_ONCE() should fix this
issue. I will send out an updated patch for that.

Cheers,
Longman