From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752675AbdKHQJE (ORCPT ); Wed, 8 Nov 2017 11:09:04 -0500
Received: from usa-sjc-mx-foss1.foss.arm.com ([217.140.101.70]:35070 "EHLO
	foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751763AbdKHQJC (ORCPT ); Wed, 8 Nov 2017 11:09:02 -0500
Message-ID: <5A032BB2.2000806@arm.com>
Date: Wed, 08 Nov 2017 16:07:14 +0000
From: James Morse
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.6.0
MIME-Version: 1.0
To: Peter Zijlstra
CC: Thomas Gleixner, "linux-kernel@vger.kernel.org"
Subject: Re: get_online_cpus() from a preemptible() context (bug?)
References: <59FC8119.8030608@arm.com> <20171106103212.GG3165@worktop.lehotels.local> <5A00AF37.7030606@arm.com> <20171106210718.GB3326@worktop>
In-Reply-To: <20171106210718.GB3326@worktop>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Peter,

On 06/11/17 21:07, Peter Zijlstra wrote:
> On Mon, Nov 06, 2017 at 06:51:35PM +0000, James Morse wrote:
>>> If you look at percpu_down_read(), you'll note it'll disable preemption
>>> before calling __percpu_down_read().
>>
>> Yes, this is how __percpu_down_read() protects the combination of it's fast/slow
>> paths.
>>
>> But next percpu_down_read() calls preempt_enable(), I can't see what stops us
>> migrating before percpu_up_read() preempt_disable()s to call __this_cpu_dec(),
>> which now affects a different variable.
>>
>
> Ah, so the two operations that comment talks about are:
>
>     percpu_down_read_preempt_disable()
>       preempt_disable();
> 1)    __this_cpu_inc(*sem->read_count);
>       if (unlikely(!rcu_sync_is_idle(&sem->rss)))
>         __percpu_down_read()
>           smp_mb()
>           if (likely(!smp_load_acquire(&sem->readers_block))) // false
>           __percpu_up_read()
>             smp_mb()
> 2)          __this_cpu_dec(*sem->read_count);
>             rcuwait_wake_up(&sem->writer);
>           preempt_enable_no_resched();
>
> If you want more detail on this, I'll actually have to go think :-)

I think this was the answer to a much smarter question than mine!

I've tried (and failed) to break it instead.

To answer my own question: I thought this was potentially-broken because the
__this_cpu_{add,dec}() out in {get,put}_online_cpus() will operate on different
per-cpu read_count variables if we migrate. (not the pair above)

This isn't a problem as the only thing that reads the read_count is
readers_active_check(), which per_cpu_sum()s them all together before comparing
against zero. As they are all unsigned-ints it uses unsigned-overflow to do the
right thing.

This even works if a CPU holding a vital part of the read_count is offline, as
per_cpu_sum() uses for_each_possible_cpu().


Thanks!

James
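
P.S. For anyone following along, the per_cpu_sum() argument above can be
sketched in plain userspace C. This is only an illustration under stated
assumptions, not the kernel code: NR_CPUS, the read_count[] array and the
helper names reader_inc(), reader_dec(), sum_read_count(), readers_idle()
and demo_migrated_reader() are all made up for the example. The array
stands in for the per-cpu variable and the loop stands in for
for_each_possible_cpu().

```c
#include <assert.h>

#define NR_CPUS 4	/* hypothetical number of possible CPUs */

/* Stand-in for the per-cpu read_count: one slot per possible CPU. */
static unsigned int read_count[NR_CPUS];

/* Reader entry, as if executed on @cpu. */
static void reader_inc(int cpu)
{
	read_count[cpu]++;
}

/* Reader exit, possibly on a different CPU after a migration. */
static void reader_dec(int cpu)
{
	read_count[cpu]--;
}

/* Sum every slot, online or not, in the spirit of per_cpu_sum(). */
static unsigned int sum_read_count(void)
{
	unsigned int sum = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		sum += read_count[cpu];

	return sum;
}

/* In the spirit of readers_active_check(): non-zero when no reader is in. */
static int readers_idle(void)
{
	return sum_read_count() == 0;
}

/* Enter on CPU 0, "migrate", leave on CPU 3. Returns readers_idle(). */
static int demo_migrated_reader(void)
{
	reader_inc(0);
	assert(!readers_idle());	/* one reader is inside */

	reader_dec(3);			/* slot 3 wraps below zero */

	return readers_idle();
}
```

After demo_migrated_reader() the individual slots are unbalanced (slot 0
holds 1, slot 3 holds the wrapped-around value), but because unsigned
arithmetic in C is modular the total is still zero, so a writer looking
only at the sum sees no active readers.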