Message-ID: <48A084AC.4090006@goop.org>
Date: Mon, 11 Aug 2008 11:27:56 -0700
From: Jeremy Fitzhardinge
To: Nick Piggin
CC: Venki Pallipadi, Jens Axboe, Ingo Molnar, npiggin@suse.de, linux-kernel, suresh.b.siddha@intel.com
Subject: Re: [PATCH] stack and rcu interaction bug in smp_call_function_mask()
In-Reply-To: <200808111449.48123.nickpiggin@yahoo.com.au>
List-ID: linux-kernel@vger.kernel.org

Nick Piggin wrote:
> Well that's implemented with the optimized call-single code of course,
> so it could be used to implement the masked calls...
>
> I had wanted to look into finding a good cutoff point and use the
> percpu queues for lightweight masks, and the single global queue for
> larger ones.
>
> A queue per cpu is not going to be perfect, though. In the current
> implementation, you would need a lot of data structures. You could
> alleviate this problem by using per-CPU vectors rather than lists,
> but then you get the added problem of resource starvation at the
> remote end too.
>
> For heavyweight masks on large systems, I'd say the single queue
> will be a win. But I never did detailed measurements, so I'm open
> to being proven wrong.

Yeah, there are a lot of parameters there. And as I've mentioned before, I wonder whether we should take NUMA topology into account when deciding where and when to use queues. My intuition is that most cross-cpu calls stay within the cpus of a single node, on the grounds that most are mm->cpu_vm_mask calls, and the rest of the system tries hard to co-locate processes sharing memory on one node.

Waffle, handwave.

    J