From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1760454AbZDGT65 (ORCPT );
	Tue, 7 Apr 2009 15:58:57 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1756838AbZDGT6n (ORCPT );
	Tue, 7 Apr 2009 15:58:43 -0400
Received: from courier.cs.helsinki.fi ([128.214.9.1]:36586 "EHLO
	mail.cs.helsinki.fi" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1760192AbZDGT6m (ORCPT );
	Tue, 7 Apr 2009 15:58:42 -0400
Message-ID: <49DBAF7E.30704@cs.helsinki.fi>
Date: Tue, 07 Apr 2009 22:54:38 +0300
From: Pekka Enberg
User-Agent: Thunderbird 2.0.0.21 (Macintosh/20090302)
MIME-Version: 1.0
To: David Rientjes
CC: Christoph Lameter, linux-kernel@vger.kernel.org
Subject: Re: [patch] slub: default min_partial to at least highest cpus per node
References: <49DBA23A.3000106@cs.helsinki.fi> <84144f020904071209j638aae9bv406661ec401af7af@mail.gmail.com>
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

David Rientjes wrote:
> On Tue, 7 Apr 2009, Pekka Enberg wrote:
>
>>> Hmm, partial lists are per-node, so wouldn't it be better to do the
>>> adjustment for every struct kmem_cache_node separately? The
>>> 'min_partial_per_node' global seems just too ugly and confusing to live
>>> with.
>>
>> Btw, that requires moving ->min_partial to struct kmem_cache_node from
>> struct kmem_cache. But I think that makes a whole lot of sense if
>> some nodes may have more CPUs than others.
>>
>> And while the improvement is kinda obvious, I would be interested to
>> know what kind of workload benefits from this patch (and see numbers
>> if there are any).
>>
>
> It doesn't really depend on the workload, it depends on the type of NUMA
> machine it's running on (and whether that NUMA is asymmetric amongst
> cpus).
>
> Since min_partial_per_node is capped at MAX_PARTIAL, this is only really
> relevant for remote node defragmentation if it's allowed (and not just 2%
> of the time like the default). We want to avoid stealing partial slabs
> from remote nodes if there are fewer than the number of cpus on that node.
>
> Otherwise, it's possible for each cpu on the victim node to try to
> allocate a single object and require nr_cpus_node(node) new slab
> allocations. In this case it's entirely possible for the majority of cpus
> to have cpu slabs from remote nodes. This change reduces the likelihood of
> that happening because we'll always have cpu slab replacements on our
> local partial list before allowing remote defragmentation.
>
> I'd be just as happy with the following, although it would require
> changing MIN_PARTIAL to be greater than its default of 5 if a node
> supports more cpus for optimal performance (the old patch did that
> automatically up to MAX_PARTIAL).

Hmm, but why not move ->min_partial to struct kmem_cache_node as I
suggested and make sure it's adjusted properly with nr_cpus_node()?

> diff --git a/mm/slub.c b/mm/slub.c
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1326,11 +1326,13 @@ static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags)
> 	zonelist = node_zonelist(slab_node(current->mempolicy), flags);
> 	for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
> 		struct kmem_cache_node *n;
> +		int node;
>
> -		n = get_node(s, zone_to_nid(zone));
> +		node = zone_to_nid(zone);
> +		n = get_node(s, node);
>
> 		if (n && cpuset_zone_allowed_hardwall(zone, flags) &&
> -		    n->nr_partial > s->min_partial) {
> +		    n->nr_partial > nr_cpus_node(node)) {
> 			page = get_partial_node(n);
> 			if (page)
> 				return page;