Re: [PATCH] mm: page_alloc: High-order per-cpu page allocator v3

From: Mel Gorman <mgorman@techsingularity.net>
To: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Christoph Lameter <cl@linux.com>, Michal Hocko <mhocko@suse.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Linux-MM <linux-mm@kvack.org>,
	Linux-Kernel <linux-kernel@vger.kernel.org>,
	Rick Jones <rick.jones2@hpe.com>, Paolo Abeni <pabeni@redhat.com>
Subject: Re: [PATCH] mm: page_alloc: High-order per-cpu page allocator v3
Date: Wed, 30 Nov 2016 14:06:15 +0000
Message-ID: <20161130140615.3bbn7576iwbyc3op@techsingularity.net>
In-Reply-To: <20161130134034.3b60c7f0@redhat.com>

On Wed, Nov 30, 2016 at 01:40:34PM +0100, Jesper Dangaard Brouer wrote:
> 
> On Sun, 27 Nov 2016 13:19:54 +0000 Mel Gorman <mgorman@techsingularity.net> wrote:
> 
> [...]
> > SLUB has been the default small kernel object allocator for quite some time
> > but it is not universally used due to performance concerns and a reliance
> > on high-order pages. The high-order concern has two major components --
> > high-order pages are not always available and high-order page allocations
> > potentially contend on the zone->lock. This patch addresses some concerns
> > about the zone lock contention by extending the per-cpu page allocator to
> > cache high-order pages. The patch makes the following modifications
> > 
> > o New per-cpu lists are added to cache the high-order pages. This increases
> >   the cache footprint of the per-cpu allocator and overall usage but for
> >   some workloads, this will be offset by reduced contention on zone->lock.
> 
> This will also help the performance of NIC drivers that allocate
> higher-order pages for their RX-ring queue (and chop them up for MTU).
> I do like this patch, even though I'm working on moving drivers away
> from allocating these high-order pages.
> 
> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
> 

Thanks.

> [...]
> > This is the result from netperf running UDP_STREAM on localhost. It was
> > selected on the basis that it is slab-intensive and has been the subject
> > of previous SLAB vs SLUB comparisons with the caveat that this is not
> > testing between two physical hosts.
> 
> I do like that you are using a networking test to benchmark this. Looking
> at the results, my initial response is that the improvements are basically
> too good to be true.
> 

FWIW, LKP independently measured the boost to be 23% so it's expected
there will be different results depending on exact configuration and CPU.

> Can you share how you tested this with netperf and the specific netperf
> parameters? 

The mmtests config file used is
configs/config-global-dhp__network-netperf-unbound so all details can be
extrapolated or reproduced from that.

> e.g.
>  How do you configure the send/recv sizes?

Static range of sizes specified in the config file.

>  Have you pinned netperf and netserver on different CPUs?
> 

No. While it's possible to do a pinned test which helps stability, it
also tends to be less reflective of what happens in a variety of
workloads so I took the "harder" option.
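
For readers who want to see the difference between the two setups, here is
a minimal sketch of how a pinned and an unbound netperf UDP_STREAM run could
be invoked by hand. It is not the mmtests invocation; the helper name
netperf_cmd, the 10 second duration and the message sizes in the loop are
assumptions made for illustration only (the real sizes come from the mmtests
config file named above). The only netperf options used are -H, -t, -l, -T
and the test-specific -m/-M, with -T taking the same "1,2" form quoted in
this thread.

```python
def netperf_cmd(send_size, pin=None, host="127.0.0.1", seconds=10):
    """Build a netperf UDP_STREAM command line as an argv list.

    pin=None    -> unbound: the scheduler may migrate netperf/netserver.
    pin=(1, 2)  -> netperf bound to CPU 1, netserver bound to CPU 2.
    """
    argv = ["netperf", "-H", host, "-t", "UDP_STREAM", "-l", str(seconds)]
    if pin is not None:
        local_cpu, remote_cpu = pin
        argv += ["-T", f"{local_cpu},{remote_cpu}"]
    # Everything after "--" is a test-specific option:
    # -m is the send message size, -M the receive message size.
    argv += ["--", "-m", str(send_size), "-M", str(send_size)]
    return argv


if __name__ == "__main__":
    # Hypothetical sizes, NOT the range used by the mmtests config.
    for size in (64, 1024, 16384):
        print("unbound:", " ".join(netperf_cmd(size)))
        print("pinned: ", " ".join(netperf_cmd(size, pin=(1, 2))))
```

The argv list can be passed to subprocess.run() on a machine where netserver
is already running on localhost.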

> For localhost testing, when netperf and netserver run on the same CPU,
> you observe half the performance, quite intuitively.  When pinning
> netperf and netserver (via e.g. option -T 1,2) you observe the most
> stable results.  When allowing netperf and netserver to migrate between
> CPUs (default setting), the real fun starts and the results become
> unstable, because now the CPU scheduler is also being tested.  In my
> experience, more "fun" memory situations also occur, as I guess we are
> hopping between more per-CPU alloc caches (also affecting the SLUB
> per-CPU usage pattern).
> 

Yes, which is another reason why I used an unbound configuration. I didn't
want to get an artificial boost from pinned server/client using the same
per-cpu caches. As a side-effect, it may mean that machines with fewer
CPUs get a greater boost as there are fewer per-cpu caches being used.
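
As a rough illustration of why the CPU on which pages are allocated and
freed matters for a per-cpu cache, here is a toy model. It is not code from
the patch: ToyZone, lock_count, the batch size of 4 and the use of order 2
are all assumptions made for the sketch, and the draining of over-full
per-cpu lists back to the zone is deliberately left out. The model only
counts how often a refill has to take the (simulated) zone lock.

```python
from collections import defaultdict


class ToyZone:
    """Toy zone with one cached page list per (cpu, order) pair."""

    def __init__(self, batch=4):
        self.batch = batch             # pages pulled in per refill (assumed)
        self.lock_acquisitions = 0     # stand-in for taking zone->lock
        self.pcp = defaultdict(list)   # (cpu, order) -> cached page ids
        self._next_page = 0

    def _refill(self, cpu, order):
        # Slow path: take the zone lock once and grab a batch of pages.
        self.lock_acquisitions += 1
        for _ in range(self.batch):
            self.pcp[(cpu, order)].append(self._next_page)
            self._next_page += 1

    def alloc(self, cpu, order):
        pages = self.pcp[(cpu, order)]
        if not pages:
            self._refill(cpu, order)
        return pages.pop()

    def free(self, cpu, order, page):
        # Fast path: the page goes onto the freeing CPU's own list.
        self.pcp[(cpu, order)].append(page)


def lock_count(alloc_cpu, free_cpu, rounds=100, order=2):
    """Allocate on one CPU, free on another, count zone lock acquisitions."""
    zone = ToyZone()
    for _ in range(rounds):
        page = zone.alloc(alloc_cpu, order)
        zone.free(free_cpu, order, page)
    return zone.lock_acquisitions


if __name__ == "__main__":
    print("alloc and free on the same CPU:", lock_count(0, 0))   # 1
    print("alloc on CPU 0, free on CPU 1: ", lock_count(0, 1))   # 25
```

When alloc and free hit the same per-cpu list, a single refill serves all
100 rounds; when they land on different lists, the allocating CPU has to
refill every 4 allocations. How a real workload maps onto these two extremes
depends on the scheduler and on where the allocations and frees actually
happen.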

> > 2-socket modern machine
> >                                 4.9.0-rc5             4.9.0-rc5
> >                                   vanilla             hopcpu-v3
> 
> The only difference between the 4.9.0-rc5-vanilla and 4.9.0-rc5-hopcpu-v3
> kernels is this single change, right?

Yes.

-- 
Mel Gorman
SUSE Labs
