Subject: Re: [PATCH 1/4] generic-smp: remove single ipi fallback for smp_call_function_many()
From: Peter Zijlstra
To: Oleg Nesterov
Cc: Linus Torvalds, Nick Piggin, Jens Axboe, "Paul E. McKenney",
    Ingo Molnar, Rusty Russell, Steven Rostedt, linux-kernel@vger.kernel.org
In-Reply-To: <20090216191008.GA1521@redhat.com>
References: <20090216163847.431174825@chello.nl>
            <20090216164114.433430761@chello.nl>
            <20090216191008.GA1521@redhat.com>
Date: Mon, 16 Feb 2009 20:41:32 +0100
Message-Id: <1234813292.30178.327.camel@laptop>

On Mon, 2009-02-16 at 20:10 +0100, Oleg Nesterov wrote:
> On 02/16, Peter Zijlstra wrote:
> >
> > @@ -347,31 +384,32 @@ void smp_call_function_many(const struct
> >  		return;
> >  	}
> >
> > -	data = kmalloc(sizeof(*data) + cpumask_size(), GFP_ATOMIC);
> > -	if (unlikely(!data)) {
> > -		/* Slow path. */
> > -		for_each_online_cpu(cpu) {
> > -			if (cpu == smp_processor_id())
> > -				continue;
> > -			if (cpumask_test_cpu(cpu, mask))
> > -				smp_call_function_single(cpu, func, info, wait);
> > -		}
> > -		return;
> > +	data = kmalloc(sizeof(*data), GFP_ATOMIC);
> > +	if (data)
> > +		data->csd.flags = CSD_FLAG_ALLOC;
> > +	else {
> > +		data = &per_cpu(cfd_data, me);
> > +		/*
> > +		 * We need to wait for all previous users to go away.
> > +		 */
> > +		while (data->csd.flags & CSD_FLAG_LOCK)
> > +			cpu_relax();
> > +		data->csd.flags = CSD_FLAG_LOCK;
> >  	}
> >
> >  	spin_lock_init(&data->lock);
> > -	data->csd.flags = CSD_FLAG_ALLOC;
> >  	if (wait)
> >  		data->csd.flags |= CSD_FLAG_WAIT;
> >  	data->csd.func = func;
> >  	data->csd.info = info;
> > -	cpumask_and(to_cpumask(data->cpumask_bits), mask, cpu_online_mask);
> > -	cpumask_clear_cpu(smp_processor_id(), to_cpumask(data->cpumask_bits));
> > -	data->refs = cpumask_weight(to_cpumask(data->cpumask_bits));
> > -
> > -	spin_lock_irqsave(&call_function_lock, flags);
> > -	list_add_tail_rcu(&data->csd.list, &call_function_queue);
> > -	spin_unlock_irqrestore(&call_function_lock, flags);
> > +	cpumask_and(&data->cpumask, mask, cpu_online_mask);
> > +	cpumask_clear_cpu(smp_processor_id(), &data->cpumask);
>
> (perhaps it makes sense to use "me" instead of smp_processor_id())

Ah, missed one it seems, thanks ;-)

> > +	data->refs = cpumask_weight(&data->cpumask);
> > +
> > +	spin_lock_irqsave(&call_function.lock, flags);
> > +	call_function.counter++;
> > +	list_add_tail_rcu(&data->csd.list, &call_function.queue);
> > +	spin_unlock_irqrestore(&call_function.lock, flags);
>
> What if the initialization above leaks into the critical section?
>
> I mean, generic_smp_call_function_interrupt() running on another CPU
> can see the result of list_add_tail_rcu() and cpumask_and(data->cpumask)
> but not (say) "data->refs = ...".

Ho humm, nice :-)

So the best fix would be to put that data initialization under data->lock.
This would be a bug in the original code too.

> Actually I don't understand the counter/free_list logic.
>
> generic_smp_call_function_interrupt:
>
> 	/*
> 	 * When the global queue is empty, it's guaranteed that no cpu
> 	 * is still observing any entry on the free_list, therefore
> 	 * we can go ahead and unlock them.
> 	 */
> 	if (!--call_function.counter)
> 		list_splice_init(&call_function.free_list, &free_list);
>
> I can't see why "it's guaranteed that no cpu ...".
> Let's suppose CPU 0 "hangs" for some reason in
> generic_smp_call_function_interrupt() right before the
> "if (!cpumask_test_cpu(cpu, data->cpumask))" test. Then it is possible
> that another CPU removes the single entry (which doesn't have CPU 0 in
> data->cpumask) from call_function.queue.

Then call_function.counter wouldn't be 0, and we would not release the
list entries.

> Now, if that entry is re-added, we can have a problem if CPU 0 sees
> the partly initialized entry.

Right, so the scenario is that a cpu observes the list entry after we do
list_del_rcu() -- most likely a cpu not in the original mask, traversing
the list for the next entry -- so we have to wait for some quiescent
state to ensure nobody is still referencing it.

We cannot use regular RCU, because its quiescent state takes forever to
arrive, therefore this implements a simple counting RCU for the queue
domain only. When the list is empty, nobody can still be looking at any
element, ergo we can release the entries and re-use them.

> But why do we need counter/free_list at all?
>
> Can't we do the following:
>
> 	- initialize call_function_data.lock at boot time
>
> 	- change smp_call_function_many() to initialize cfd_data
> 	  under ->lock
>
> 	- change generic_smp_call_function_interrupt() to do
>
> 	list_for_each_entry_rcu(data) {
>
> 		if (!cpumask_test_cpu(cpu, data->cpumask))
> 			continue;
>
> 		spin_lock(&data->lock);
> 		if (!cpumask_test_cpu(cpu, data->cpumask)) {
> 			spin_unlock(&data->lock);
> 			continue;
> 		}
>
> 		cpumask_clear_cpu(cpu, data->cpumask);
> 		refs = --data->refs;
>
> 		func = data->func;
> 		info = data->info;
> 		spin_unlock(&data->lock);
>
> 		func(info);
>
> 		if (refs)
> 			continue;
>
> 		...
> 	}
>
> Afaics, it is OK if smp_call_function_many() sees the "unneeded" entry
> on the list; the only thing we must ensure is that we can't
> "misunderstand" this entry.
>
> No?

Hmm, could that not lead to loops and/or missed entries in the list
iteration?
The safe approach seemed to be to wait until we're sure nobody still
observes the entry before re-using it.

> Off-topic question, looks like smp_call_function_single() must not be
> called from interrupt/bh handler, right? But the comment says nothing.
>
> And,
>
> smp_call_function_single:
>
> 	/* Can deadlock when called with interrupts disabled */
> 	WARN_ON(irqs_disabled());
>
> Just curious, we can only deadlock if wait = T, right?

Right.