From mboxrd@z Thu Jan 1 00:00:00 1970 From: Oleg Nesterov Subject: Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context Date: Mon, 10 Dec 2012 18:24:10 +0100 Message-ID: <20121210172410.GA28479@redhat.com> References: <20121207173702.27305.1486.stgit@srivatsabhat.in.ibm.com> <20121207173759.27305.84316.stgit@srivatsabhat.in.ibm.com> <20121209191437.GA2816@redhat.com> <50C4EB79.5050203@linux.vnet.ibm.com> <20121209202234.GA5793@redhat.com> <50C564ED.9090803@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Return-path: Content-Disposition: inline In-Reply-To: <50C564ED.9090803@linux.vnet.ibm.com> Sender: linux-kernel-owner@vger.kernel.org To: "Srivatsa S. Bhat" Cc: tglx@linutronix.de, peterz@infradead.org, paulmck@linux.vnet.ibm.com, rusty@rustcorp.com.au, mingo@kernel.org, akpm@linux-foundation.org, namhyung@kernel.org, vincent.guittot@linaro.org, tj@kernel.org, sbw@mit.edu, amit.kucheria@linaro.org, rostedt@goodmis.org, rjw@sisk.pl, wangyun@linux.vnet.ibm.com, xiaoguangrong@linux.vnet.ibm.com, nikunj@linux.vnet.ibm.com, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org List-Id: linux-pm@vger.kernel.org On 12/10, Srivatsa S. Bhat wrote: > > On 12/10/2012 01:52 AM, Oleg Nesterov wrote: > > On 12/10, Srivatsa S. Bhat wrote: > >> > >> On 12/10/2012 12:44 AM, Oleg Nesterov wrote: > >> > >>> But yes, it is easy to blame somebody else's code ;) And I can't suggest > >>> something better at least right now. If I understand correctly, we can not > >>> use, say, synchronize_sched() in _cpu_down() path > >> > >> We can't sleep in that code.. so that's a no-go. > > > > But we can? > > > > Note that I meant _cpu_down(), not get_online_cpus_atomic() or take_cpu_down(). > > > > Maybe I'm missing something, but how would it help if we did a > synchronize_sched() so early (in _cpu_down())? Another bunch of preempt_disable() > sections could start immediately after our call to synchronize_sched() no? > How would we deal with that? Sorry for confusion. Of course synchronize_sched() alone is not enough. But we can use it to synchronize with preempt-disabled section and avoid the barriers/atomic in the fast-path. For example, bool writer_pending; DEFINE_RWLOCK(writer_rwlock); DEFINE_PER_CPU(int, reader_ctr); void get_online_cpus_atomic(void) { preempt_disable(); if (likely(!writer_pending) || __this_cpu_read(reader_ctr)) { __this_cpu_inc(reader_ctr); return; } read_lock(&writer_rwlock); __this_cpu_inc(reader_ctr); read_unlock(&writer_rwlock); } // lacks release semantics, but we don't care void put_online_cpus_atomic(void) { __this_cpu_dec(reader_ctr); preempt_enable(); } Now, _cpu_down() does writer_pending = true; synchronize_sched(); before stop_one_cpu(). When synchronize_sched() returns, we know that every get_online_cpus_atomic() must see writer_pending == T. And, if any CPU incremented its reader_ctr we must see it is not zero. take_cpu_down() does write_lock(&writer_rwlock); for_each_online_cpu(cpu) { while (per_cpu(reader_ctr, cpu)) cpu_relax(); } and takes the lock. However. This can lead to the deadlock we already discussed. So take_cpu_down() should do retry: write_lock(&writer_rwlock); for_each_online_cpu(cpu) { if (per_cpu(reader_ctr, cpu)) { write_unlock(&writer_rwlock); goto retry; } } to take the lock. But this is livelockable. However, I do not think it is possible to avoid the livelock. Just in case, the code above is only for illustration, perhaps it is not 100% correct and perhaps we can do it better. cpu_hotplug.active_writer is ignored for simplicity, get/put should check current == active_writer. Oleg.