Message-ID: <50C9F38F.3020005@linux.vnet.ibm.com>
Date: Thu, 13 Dec 2012 20:56:07 +0530
From: "Srivatsa S. Bhat"
To: Oleg Nesterov
CC: tglx@linutronix.de, peterz@infradead.org, paulmck@linux.vnet.ibm.com,
 rusty@rustcorp.com.au, mingo@kernel.org, akpm@linux-foundation.org,
 namhyung@kernel.org, vincent.guittot@linaro.org, tj@kernel.org,
 sbw@mit.edu, amit.kucheria@linaro.org, rostedt@goodmis.org, rjw@sisk.pl,
 wangyun@linux.vnet.ibm.com, xiaoguangrong@linux.vnet.ibm.com,
 nikunj@linux.vnet.ibm.com, linux-pm@vger.kernel.org,
 linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
References: <20121211140314.23621.64088.stgit@srivatsabhat.in.ibm.com>
 <20121211140358.23621.97011.stgit@srivatsabhat.in.ibm.com>
 <20121212171720.GA22289@redhat.com> <50C8C4A5.4080104@linux.vnet.ibm.com>
 <20121212180248.GA24882@redhat.com> <50C8CD52.8040808@linux.vnet.ibm.com>
 <20121212184849.GA26784@redhat.com> <50C8D739.6030903@linux.vnet.ibm.com>
In-Reply-To: <50C8D739.6030903@linux.vnet.ibm.com>

On 12/13/2012 12:42 AM, Srivatsa S. Bhat wrote:
> On 12/13/2012 12:18 AM, Oleg Nesterov wrote:
>> On 12/13, Srivatsa S. Bhat wrote:
>>>
>>> On 12/12/2012 11:32 PM, Oleg Nesterov wrote:
>>>> And _perhaps_ get_ can avoid it too?
>>>>
>>>> I didn't really try to think, probably this is not right, but can't
>>>> something like this work?
>>>>
>>>>         #define XXXX    (1 << 16)
>>>>         #define MASK    (XXXX - 1)
>>>>
>>>>         void get_online_cpus_atomic(void)
>>>>         {
>>>>                 preempt_disable();
>>>>
>>>>                 // only for writer
>>>>                 __this_cpu_add(reader_percpu_refcnt, XXXX);
>>>>
>>>>                 if (__this_cpu_read(reader_percpu_refcnt) & MASK) {
>>>>                         __this_cpu_inc(reader_percpu_refcnt);
>>>>                 } else {
>>>>                         smp_wmb();
>>>>                         if (writer_active()) {
>>>>                                 ...
>>>>                         }
>>>>                 }
>>>>
>>>>                 __this_cpu_sub(reader_percpu_refcnt, XXXX);
>>>>         }
>>>>
>>>
>>> Sorry, maybe I'm too blind to see, but I didn't understand the logic
>>> of how the mask helps us avoid disabling interrupts..
>>
>> Why do we need cli/sti at all? We should prevent the following race:
>>
>>         - the writer already holds hotplug_rwlock, so get_ must not
>>           succeed.
>>
>>         - a new reader comes in; it increments reader_percpu_refcnt,
>>           but before it checks writer_active() ...
>>
>>         - ... an irq handler does get_online_cpus_atomic() and sees
>>           reader_nested_percpu() == T, so it simply increments
>>           reader_percpu_refcnt and succeeds.
>>
>> OTOH, why do we need to increment the reader_percpu_refcnt counter
>> in advance? To ensure that either we see writer_active() or the
>> writer sees reader_percpu_refcnt != 0 (and that is why they should
>> write/read in reverse order).
>>
>> The code above tries to avoid this race using the lower 16 bits
>> as a "nested-counter", and the upper bits to avoid the race with
>> the writer.
>>
>>         // only for writer
>>         __this_cpu_add(reader_percpu_refcnt, XXXX);
>>
>> If an irq comes in and does get_online_cpus_atomic(), it won't be
>> confused by the __this_cpu_add(XXXX): it will check the lower bits
>> and switch to the "slow path".
>>
>
> This is a very clever scheme indeed! :-) Thanks a lot for explaining
> it in detail.
>
>>
>> But once again, so far I didn't really try to think. It is quite
>> possible I missed something.
>>
>
> Even I don't spot anything wrong with it. But I'll give it some more
> thought..

Since an interrupt handler can also run get_online_cpus_atomic(), we
cannot use the __this_cpu_* ops to modify reader_percpu_refcnt, right?
To preserve the integrity of each update, we would have to use the
this_cpu_* variants instead, and that basically plays spoil-sport on
this whole scheme... :-(
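IOW, if I remember the generic fallbacks in include/linux/percpu.h
correctly, they boil down to something like the simplified sketch below
(archs such as x86 override these with single, naturally irq-safe
instructions, but the generic case is what the scheme has to survive):

        /* __this_cpu_*() ops: a plain read-modify-write, not irq-safe */
        #define __this_cpu_generic_to_op(pcp, val, op)                  \
        do {                                                            \
                *__this_cpu_ptr(&(pcp)) op val;                         \
        } while (0)

        /* this_cpu_*() ops: the same update, wrapped in irq disabling */
        #define _this_cpu_generic_to_op(pcp, val, op)                   \
        do {                                                            \
                unsigned long flags;                                    \
                raw_local_irq_save(flags);                              \
                *__this_cpu_ptr(&(pcp)) op val;                         \
                raw_local_irq_restore(flags);                           \
        } while (0)

So on archs without atomic percpu ops, every this_cpu_add() or
this_cpu_sub() on reader_percpu_refcnt would disable and re-enable
interrupts around the update, which is pretty much the cli/sti we set
out to avoid in the first place.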
But still, this scheme is better, because the reader doesn't have to
spin on the read_lock() with interrupts disabled. That way, interrupt
handlers that are not hotplug readers can continue to run on this CPU
while another CPU is being taken offline.

Regards,
Srivatsa S. Bhat