From mboxrd@z Thu Jan 1 00:00:00 1970
From: michal.simek@petalogix.com (Michal Simek)
Date: Tue, 31 May 2011 13:08:35 +0200
Subject: [BUG] "sched: Remove rq->lock from the first half of ttwu()" locks up on ARM
In-Reply-To: <1306588381.2497.481.camel@laptop>
References: <1306405979.1200.63.camel@twins>
 <1306407759.27474.207.camel@e102391-lin.cambridge.arm.com>
 <1306409575.1200.71.camel@twins>
 <1306412511.1200.90.camel@twins>
 <20110526122623.GA11875@elte.hu>
 <20110526123137.GG24876@n2100.arm.linux.org.uk>
 <20110526125007.GA27083@elte.hu>
 <20110527120629.GA32617@elte.hu>
 <20110527205240.GT24876@n2100.arm.linux.org.uk>
 <1306588381.2497.481.camel@laptop>
Message-ID: <4DE4CC33.7090404@petalogix.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

Peter Zijlstra wrote:
> On Fri, 2011-05-27 at 21:52 +0100, Russell King - ARM Linux wrote:
>> On Fri, May 27, 2011 at 02:06:29PM +0200, Ingo Molnar wrote:
>>> The expectations are to have irqs off (we are holding the runqueue
>>> lock if !__ARCH_WANT_INTERRUPTS_ON_CTXSW), so that's not workable i
>>> suspect.
>> Just a thought, but we _might_ be able to avoid a lot of this hastle if
>> we had a new arch hook in finish_task_switch(), after finish_lock_switch()
>> returns but before the old MM is dropped.
>
> I'd be more than willing to provide this.
>
>> For the new ASID-based switch_mm(), we currently do this:
>>
>> 1. check ASID validity
>> 2. flush branch predictor
>> 3. set reserved ASID value
>> 4. set new page tables
>> 5. set new ASID value
>>
>> This will be shortly changed to:
>>
>> 1. check ASID validity
>> 2. flush branch predictor
>> 3. set swapper_pg_dir tables
>> 4. set new ASID value
>> 5. set new page tables
>>
>> We could change switch_mm() to only do:
>>
>> 1. flush branch predictor
>> 2. set swapper_pg_dir tables
>> 3. check ASID validity
>> 4. set new ASID value
>>
>> At this point, we have no user mappings, and so nothing will be using the
>> ASID at this point.
>> Then in a new post-finish_lock_switch() arch hook:
>>
>> 5. check whether we need to do flushing as a result of ASID change
>> 6. set new page tables
>>
>> I think this may simplify the ASID code. It needs prototyping out,
>> reviewing and testing, but I think it may work.
>>
>> And I think it may also be workable with the CPUs which need to flush
>> the caches on context switches - we can postpone their page table
>> switch to this new arch hook too, which will mean we wouldn't require
>> __ARCH_WANT_INTERRUPTS_ON_CTXSW on ARM at all.
>>
>> Any thoughts (if you've followed what I'm going on about) ?
>
> Yeah, definitely worth a try, you mentioned on IRC the problem of
> detecting if switch_mm() happened in the new arch hook. Since
> switch_mm() gets a @next pointer we can set a TIF flag there and have
> the new arch hook test for that and conditionally perform the required
> work.
>
> Now, supposing we can get ARM to not rely on
> __ARCH_WANT_INTERRUPTS_ON_CTXSW anymore, there's only microblaze left,
> Michal, would a similar scheme work for you? If so we can fully
> deprecate and remove this exception from the scheduler (yay!).

Hi,

please correct me if I am wrong, but this is a workaround just for ARM.
I am not aware that we need to do anything with caches on microblaze.
I enabled that option after our discussion
(http://lkml.org/lkml/2009/12/3/204) because of problems with lockdep.

I will check whether I can remove that option, but it will require some
changes in the code.

switch_to should be called with IRQs off, right?

Michal

-- 
Michal Simek, Ing. (M.Eng)
PetaLogix - Linux Solutions for a Reconfigurable World
w: www.petalogix.com p: +61-7-30090663,+42-0-721842854 f: +61-7-30090663
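
For reference, the deferral scheme discussed in the quoted text (switch_mm()
only drops the user mappings and marks @next, and a new arch hook after
finish_lock_switch() installs the new page tables) can be modelled in a few
lines of user-space C. This is only a rough sketch and not kernel code: every
name in it (struct task, struct cpu_state, switch_mm_pending,
sketch_switch_mm(), sketch_finish_arch_switch(), sketch_context_switch()) is
hypothetical, the boolean field merely stands in for the proposed TIF flag,
and branch predictor and ASID handling are reduced to comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real kernel definitions. */
struct mm { int id; };

struct task {
	struct mm *mm;
	bool switch_mm_pending;		/* plays the role of the proposed TIF flag */
};

struct cpu_state {
	const struct mm *page_tables;	/* NULL models swapper_pg_dir (no user mappings) */
	bool rq_locked;			/* models holding rq->lock with irqs off */
	int deferred_switches;		/* how often the hook did real work */
};

/* Cheap half: runs with the runqueue lock held. */
static void sketch_switch_mm(struct cpu_state *cpu, struct mm *prev_mm,
			     struct task *next)
{
	assert(cpu->rq_locked);
	if (prev_mm == next->mm)
		return;			/* same address space, nothing to defer */
	/* flush branch predictor, check ASID validity, set new ASID: omitted */
	cpu->page_tables = NULL;	/* "set swapper_pg_dir tables" */
	next->switch_mm_pending = true;	/* tell the hook there is work to do */
}

/* Expensive half: hypothetical arch hook, runs after the lock is dropped. */
static void sketch_finish_arch_switch(struct cpu_state *cpu, struct task *curr)
{
	assert(!cpu->rq_locked);
	if (!curr->switch_mm_pending)
		return;
	curr->switch_mm_pending = false;
	/* any flushing needed as a result of an ASID change would go here */
	cpu->page_tables = curr->mm;	/* "set new page tables" */
	cpu->deferred_switches++;
}

/* Models the ordering in context_switch() / finish_task_switch(). */
static void sketch_context_switch(struct cpu_state *cpu, struct task *prev,
				  struct task *next)
{
	cpu->rq_locked = true;
	sketch_switch_mm(cpu, prev->mm, next);
	/* switch_to(prev, next) would happen here */
	cpu->rq_locked = false;		/* finish_lock_switch() */
	sketch_finish_arch_switch(cpu, next);
}
```

In this model the flag is only set when the address space actually changes,
so switching between two tasks that share an mm leaves the hook with nothing
to do.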