Re: [PATCH 2/2] mm/page_alloc: Allow high-order pages to be stored on the per-cpu lists


* Re: [PATCH 2/2] mm/page_alloc: Allow high-order pages to be stored on the per-cpu lists
       [not found] ` <20210531120412.17411-3-mgorman@techsingularity.net>
@ 2021-05-31 15:23   ` Jesper Dangaard Brouer
  2021-06-01 12:45     ` Mel Gorman
  0 siblings, 1 reply; 3+ messages in thread
From: Jesper Dangaard Brouer @ 2021-05-31 15:23 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, Dave Hansen, Vlastimil Babka, Michal Hocko, LKML,
	brouer, netdev@vger.kernel.org

On Mon, 31 May 2021 13:04:12 +0100
Mel Gorman <mgorman@techsingularity.net> wrote:

> The per-cpu page allocator (PCP) only stores order-0 pages. This means
> that all THP and "cheap" high-order allocations, including SLUB, contend
> on the zone->lock. This patch extends the PCP allocator to store THP and
> "cheap" high-order pages. Note that struct per_cpu_pages increases in
> size to 256 bytes (4 cache lines) on x86-64.
> 
> Note that this is not necessarily a universal performance win because of
> how it is implemented. High-order pages can cause pcp->high to be exceeded
> prematurely for lower orders; for example, a large number of THP pages
> being freed could release order-0 pages from the PCP lists. Hence, much
> depends on the allocation/free pattern as observed by a single CPU to
> determine if caching helps or hurts a particular workload.
> 
> That said, basic performance testing passed. The following is a netperf
> UDP_STREAM test which hits the relevant patches as some of the network
> allocations are high-order.
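
The pcp->high caveat quoted above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class ToyPCP, its fields, and the oldest-first eviction policy are hypothetical and are not the kernel's data structures or its actual drain logic. The only facts taken from the patch description are that the limit is shared across orders and that freeing high-order pages can push order-0 pages off the per-cpu lists. Order 9 is used as the THP order (512 base pages with 4 KiB pages on x86-64).

```python
from collections import deque


class ToyPCP:
    """Toy per-cpu page cache; hypothetical, not the kernel's struct per_cpu_pages."""

    def __init__(self, high):
        self.high = high        # shared limit in base (order-0) pages, like pcp->high
        self.count = 0          # base pages currently cached
        self.cached = deque()   # orders of cached pages, oldest first
        self.drained = []       # orders of pages handed back to the zone

    def free_page(self, order):
        # An order-n page accounts for 2**n base pages against the shared limit.
        self.cached.append(order)
        self.count += 1 << order
        # Assumed policy for illustration: evict oldest entries until the
        # count is no longer above the limit.
        while self.count > self.high and self.cached:
            victim = self.cached.popleft()
            self.count -= 1 << victim
            self.drained.append(victim)


pcp = ToyPCP(high=1024)
for _ in range(100):
    pcp.free_page(0)    # cache 100 order-0 pages
for _ in range(2):
    pcp.free_page(9)    # then free two THP-sized pages

order0_drained = pcp.drained.count(0)
print(f"order-0 pages drained: {order0_drained}, still cached: {list(pcp.cached)}")
```

In this toy run the second THP-sized free pushes the count over the limit, and every cached order-0 page is drained to make room. Whether a real workload sees a similar effect depends on the allocation/free pattern on each CPU, as the patch description notes.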

This series[1] looks very interesting!  I confirm that some network
allocations do use high-order allocations.  Thus, I think this will
increase network performance in general, like you confirm below:

> netperf-udp
>                                  5.13.0-rc2             5.13.0-rc2
>                            mm-pcpburst-v3r4   mm-pcphighorder-v1r7
> Hmean     send-64         261.46 (   0.00%)      266.30 *   1.85%*
> Hmean     send-128        516.35 (   0.00%)      536.78 *   3.96%*
> Hmean     send-256       1014.13 (   0.00%)     1034.63 *   2.02%*
> Hmean     send-1024      3907.65 (   0.00%)     4046.11 *   3.54%*
> Hmean     send-2048      7492.93 (   0.00%)     7754.85 *   3.50%*
> Hmean     send-3312     11410.04 (   0.00%)    11772.32 *   3.18%*
> Hmean     send-4096     13521.95 (   0.00%)    13912.34 *   2.89%*
> Hmean     send-8192     21660.50 (   0.00%)    22730.72 *   4.94%*
> Hmean     send-16384    31902.32 (   0.00%)    32637.50 *   2.30%*
> 
> From a functional point of view, a patch like this is necessary to
> make bulk allocation of high-order pages work with similar performance
> to order-0 bulk allocations. The bulk allocator is not updated in this
> series as it would have to be determined by bulk allocation users how
> they want to track the order of pages allocated with the bulk allocator.
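
The percentage column in the netperf-udp table above can be rechecked from the two Hmean columns. The sketch below copies the values from the table; the helper name percent_gain and the dictionary layout are illustrative only, and the gain is assumed to be computed relative to the mm-pcpburst-v3r4 baseline.

```python
# (baseline, patched) Hmean throughput copied from the netperf-udp table:
# baseline = mm-pcpburst-v3r4, patched = mm-pcphighorder-v1r7.
results = {
    "send-64": (261.46, 266.30),
    "send-128": (516.35, 536.78),
    "send-256": (1014.13, 1034.63),
    "send-1024": (3907.65, 4046.11),
    "send-2048": (7492.93, 7754.85),
    "send-3312": (11410.04, 11772.32),
    "send-4096": (13521.95, 13912.34),
    "send-8192": (21660.50, 22730.72),
    "send-16384": (31902.32, 32637.50),
}


def percent_gain(baseline, patched):
    """Relative change of patched over baseline, in percent."""
    return (patched - baseline) / baseline * 100.0


gains = {name: round(percent_gain(b, p), 2) for name, (b, p) in results.items()}
for name, gain in gains.items():
    print(f"{name:<12} {gain:6.2f}%")

best = max(gains, key=gains.get)
print(f"largest gain: {best}")
```

The recomputed gains match the percentages reported in the table, with the largest improvement at the send-8192 message size.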

Thanks for working on this Mel, it is great to see! :-)

Message-Id: <20210531120412.17411-3-mgorman@techsingularity.net>
 [1] https://lore.kernel.org/linux-mm/20210531120412.17411-3-mgorman@techsingularity.net/
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



* Re: [PATCH 2/2] mm/page_alloc: Allow high-order pages to be stored on the per-cpu lists
  2021-05-31 15:23   ` [PATCH 2/2] mm/page_alloc: Allow high-order pages to be stored on the per-cpu lists Jesper Dangaard Brouer
@ 2021-06-01 12:45     ` Mel Gorman
  2021-06-02 13:53       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 3+ messages in thread
From: Mel Gorman @ 2021-06-01 12:45 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Linux-MM, Dave Hansen, Vlastimil Babka, Michal Hocko, LKML,
	netdev@vger.kernel.org

On Mon, May 31, 2021 at 05:23:38PM +0200, Jesper Dangaard Brouer wrote:
> On Mon, 31 May 2021 13:04:12 +0100
> Mel Gorman <mgorman@techsingularity.net> wrote:
> 
> > The per-cpu page allocator (PCP) only stores order-0 pages. This means
> > that all THP and "cheap" high-order allocations, including SLUB, contend
> > on the zone->lock. This patch extends the PCP allocator to store THP and
> > "cheap" high-order pages. Note that struct per_cpu_pages increases in
> > size to 256 bytes (4 cache lines) on x86-64.
> > 
> > Note that this is not necessarily a universal performance win because of
> > how it is implemented. High-order pages can cause pcp->high to be exceeded
> > prematurely for lower orders; for example, a large number of THP pages
> > being freed could release order-0 pages from the PCP lists. Hence, much
> > depends on the allocation/free pattern as observed by a single CPU to
> > determine if caching helps or hurts a particular workload.
> > 
> > That said, basic performance testing passed. The following is a netperf
> > UDP_STREAM test which hits the relevant patches as some of the network
> > allocations are high-order.
> 
> This series[1] looks very interesting!  I confirm that some network
> allocations do use high-order allocations.  Thus, I think this will
> increase network performance in general, like you confirm below:
> 

Would you be able to do a small test on a real high-speed network? It's
something I can easily do myself in a few weeks, but I do not have a testbed
readily available at the moment. It's ok if you do not have the time,
it would just be nice if I could include independent results in the
changelog if the results are positive. Alternatively, a negative result
would mean going back to the drawing board :)

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH 2/2] mm/page_alloc: Allow high-order pages to be stored on the per-cpu lists
  2021-06-01 12:45     ` Mel Gorman
@ 2021-06-02 13:53       ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 3+ messages in thread
From: Jesper Dangaard Brouer @ 2021-06-02 13:53 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Linux-MM, Dave Hansen, Vlastimil Babka, Michal Hocko, LKML,
	netdev@vger.kernel.org, brouer

On Tue, 1 Jun 2021 13:45:33 +0100
Mel Gorman <mgorman@techsingularity.net> wrote:

> On Mon, May 31, 2021 at 05:23:38PM +0200, Jesper Dangaard Brouer wrote:
> > On Mon, 31 May 2021 13:04:12 +0100
> > Mel Gorman <mgorman@techsingularity.net> wrote:
> >   
> > > The per-cpu page allocator (PCP) only stores order-0 pages. This means
> > > that all THP and "cheap" high-order allocations, including SLUB, contend
> > > on the zone->lock. This patch extends the PCP allocator to store THP and
> > > "cheap" high-order pages. Note that struct per_cpu_pages increases in
> > > size to 256 bytes (4 cache lines) on x86-64.
> > > 
> > > Note that this is not necessarily a universal performance win because of
> > > how it is implemented. High-order pages can cause pcp->high to be exceeded
> > > prematurely for lower orders; for example, a large number of THP pages
> > > being freed could release order-0 pages from the PCP lists. Hence, much
> > > depends on the allocation/free pattern as observed by a single CPU to
> > > determine if caching helps or hurts a particular workload.
> > > 
> > > That said, basic performance testing passed. The following is a netperf
> > > UDP_STREAM test which hits the relevant patches as some of the network
> > > allocations are high-order.  
> > 
> > This series[1] looks very interesting!  I confirm that some network
> > allocations do use high-order allocations.  Thus, I think this will
> > increase network performance in general, like you confirm below:
> >   
> 
> Would you be able to do a small test on a real high-speed network? It's
> something I can easily do myself in a few weeks, but I do not have a testbed
> readily available at the moment. It's ok if you do not have the time,
> it would just be nice if I could include independent results in the
> changelog if the results are positive. 

I don't have time right now.

If others have time, you can use this git tree provided by Mel:

 https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git/
 git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git
 branch: mm-pcphighorder-v1r7


> Alternatively, a negative result would mean going back to the drawing
> board :)

I'm confident that this will be a positive performance change. (I
remember we played with similar patches back in 2017).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


