From mboxrd@z Thu Jan 1 00:00:00 1970
From: paulmck@linux.vnet.ibm.com (Paul E. McKenney)
Date: Thu, 5 Feb 2015 09:02:28 -0800
Subject: [PATCH v2] ARM: Don't use complete() during __cpu_die
In-Reply-To: <20150205161100.GQ8656@n2100.arm.linux.org.uk>
References: <1423131270-24047-1-git-send-email-k.kozlowski@samsung.com>
 <20150205105035.GL8656@n2100.arm.linux.org.uk>
 <20150205142918.GA10634@linux.vnet.ibm.com>
 <20150205161100.GQ8656@n2100.arm.linux.org.uk>
Message-ID: <20150205170228.GZ5370@linux.vnet.ibm.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Thu, Feb 05, 2015 at 04:11:00PM +0000, Russell King - ARM Linux wrote:
> On Thu, Feb 05, 2015 at 06:29:18AM -0800, Paul E. McKenney wrote:
> > Works for me, assuming no hidden uses of RCU in the IPI code. ;-)
>
> Sigh... I kinda knew it wouldn't be this simple. The gic code which
> actually raises the IPI takes a raw spinlock, so it's not going to be
> this simple - there's a small theoretical window where we have taken
> this lock, written the register to send the IPI, and then dropped the
> lock - the update to the lock to release it could get lost if the
> CPU power is quickly cut at that point.
>
> Also, we _do_ need the second cache flush in place to ensure that the
> unlock is seen by other CPUs.
>
> We could work around that by taking and releasing the lock in the IPI
> processing function... but this is starting to look less attractive
> as the lock is private to irq-gic.c.
>
> Well, we're very close to 3.19, we're too close to be trying to sort
> this out, so I'm hoping that your changes which cause this RCU error
> are *not* going in during this merge window, because we seem to have
> something of a problem right now which needs more time to resolve.

Most likely into the 3.20 merge window.
But please keep in mind that RCU is just the messenger here -- the
current code will break if any CPU for whatever reason takes more than
a jiffy to get from its _stop_machine() handler to the end of its last
RCU read-side critical section on its way out. A jiffy may sound like
a lot, but it is not hard to exceed this limit, especially in
virtualized environments.

So not likely to go into v3.19, but it does need to be resolved.

							Thanx, Paul

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1758109AbbBERCl (ORCPT ); Thu, 5 Feb 2015 12:02:41 -0500
Received: from e33.co.us.ibm.com ([32.97.110.151]:44114 "EHLO
 e33.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1753616AbbBERCk (ORCPT );
 Thu, 5 Feb 2015 12:02:40 -0500
Date: Thu, 5 Feb 2015 09:02:28 -0800
From: "Paul E. McKenney"
To: Russell King - ARM Linux
Cc: Krzysztof Kozlowski, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, Arnd Bergmann, Mark Rutland,
 Bartlomiej Zolnierkiewicz, Marek Szyprowski, Stephen Boyd,
 Catalin Marinas, Will Deacon
Subject: Re: [PATCH v2] ARM: Don't use complete() during __cpu_die
Message-ID: <20150205170228.GZ5370@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <1423131270-24047-1-git-send-email-k.kozlowski@samsung.com>
 <20150205105035.GL8656@n2100.arm.linux.org.uk>
 <20150205142918.GA10634@linux.vnet.ibm.com>
 <20150205161100.GQ8656@n2100.arm.linux.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150205161100.GQ8656@n2100.arm.linux.org.uk>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 15020517-0009-0000-0000-00000889173F
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Feb 05, 2015 at 04:11:00PM +0000, Russell King - ARM Linux wrote:
> On Thu, Feb 05, 2015 at 06:29:18AM -0800, Paul E. McKenney wrote:
> > Works for me, assuming no hidden uses of RCU in the IPI code. ;-)
>
> Sigh... I kinda knew it wouldn't be this simple. The gic code which
> actually raises the IPI takes a raw spinlock, so it's not going to be
> this simple - there's a small theoretical window where we have taken
> this lock, written the register to send the IPI, and then dropped the
> lock - the update to the lock to release it could get lost if the
> CPU power is quickly cut at that point.
>
> Also, we _do_ need the second cache flush in place to ensure that the
> unlock is seen by other CPUs.
>
> We could work around that by taking and releasing the lock in the IPI
> processing function... but this is starting to look less attractive
> as the lock is private to irq-gic.c.
>
> Well, we're very close to 3.19, we're too close to be trying to sort
> this out, so I'm hoping that your changes which cause this RCU error
> are *not* going in during this merge window, because we seem to have
> something of a problem right now which needs more time to resolve.

Most likely into the 3.20 merge window. But please keep in mind that
RCU is just the messenger here -- the current code will break if any
CPU for whatever reason takes more than a jiffy to get from its
_stop_machine() handler to the end of its last RCU read-side critical
section on its way out. A jiffy may sound like a lot, but it is not
hard to exceed this limit, especially in virtualized environments.

So not likely to go into v3.19, but it does need to be resolved.

							Thanx, Paul