* Re: [PATCH 2/2] mm/page_alloc: Allow high-order pages to be stored on the per-cpu lists
From: Jesper Dangaard Brouer @ 2021-05-31 15:23 UTC
To: Mel Gorman
Cc: Linux-MM, Dave Hansen, Vlastimil Babka, Michal Hocko, LKML, brouer, netdev@vger.kernel.org

On Mon, 31 May 2021 13:04:12 +0100
Mel Gorman <mgorman@techsingularity.net> wrote:

> The per-cpu page allocator (PCP) only stores order-0 pages. This means
> that all THP and "cheap" high-order allocations including SLUB contends
> on the zone->lock. This patch extends the PCP allocator to store THP and
> "cheap" high-order pages. Note that struct per_cpu_pages increases in
> size to 256 bytes (4 cache lines) on x86-64.
>
> Note that this is not necessarily a universal performance win because of
> how it is implemented. High-order pages can cause pcp->high to be exceeded
> prematurely for lower-orders so for example, a large number of THP pages
> being freed could release order-0 pages from the PCP lists. Hence, much
> depends on the allocation/free pattern as observed by a single CPU to
> determine if caching helps or hurts a particular workload.
>
> That said, basic performance testing passed. The following is a netperf
> UDP_STREAM test which hits the relevant patches as some of the network
> allocations are high-order.

This series[1] looks very interesting! I confirm that some network
allocations do use high-order allocations. Thus, I think this will
increase network performance in general, like you confirm below:

> netperf-udp
>                                5.13.0-rc2             5.13.0-rc2
>                          mm-pcpburst-v3r4   mm-pcphighorder-v1r7
> Hmean     send-64         261.46 (   0.00%)      266.30 *   1.85%*
> Hmean     send-128        516.35 (   0.00%)      536.78 *   3.96%*
> Hmean     send-256       1014.13 (   0.00%)     1034.63 *   2.02%*
> Hmean     send-1024      3907.65 (   0.00%)     4046.11 *   3.54%*
> Hmean     send-2048      7492.93 (   0.00%)     7754.85 *   3.50%*
> Hmean     send-3312     11410.04 (   0.00%)    11772.32 *   3.18%*
> Hmean     send-4096     13521.95 (   0.00%)    13912.34 *   2.89%*
> Hmean     send-8192     21660.50 (   0.00%)    22730.72 *   4.94%*
> Hmean     send-16384    31902.32 (   0.00%)    32637.50 *   2.30%*
>
> From a functional point of view, a patch like this is necessary to
> make bulk allocation of high-order pages work with similar performance
> to order-0 bulk allocations. The bulk allocator is not updated in this
> series as it would have to be determined by bulk allocation users how
> they want to track the order of pages allocated with the bulk allocator.

Thanks for working on this Mel, it is great to see! :-)

Message-Id: <20210531120412.17411-3-mgorman@techsingularity.net>
[1] https://lore.kernel.org/linux-mm/20210531120412.17411-3-mgorman@techsingularity.net/

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
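
[Editorial sketch, not part of the original mail] The caveat in the quoted patch description, that high-order pages can push the cache past pcp->high early and so evict order-0 pages, can be illustrated with a toy model. Everything below is an assumption made for illustration: struct toy_pcp, toy_free(), toy_drain() and toy_demo() are hypothetical names, the round-robin drain policy is a simplification, and none of it is the kernel's struct per_cpu_pages or the code in the patch. The only idea taken from the thread is that cached memory is counted in base pages, so one order-9 block counts as 512.

```c
#include <assert.h>

#define TOY_NR_ORDERS 10   /* orders 0..9; order 9 is 512 base pages */

/* Toy stand-in for a per-cpu page cache; NOT the kernel's struct per_cpu_pages. */
struct toy_pcp {
    int count;               /* cached memory, counted in base (order-0) pages */
    int high;                /* drain once count exceeds this limit */
    int nr[TOY_NR_ORDERS];   /* number of cached blocks for each order */
};

/* Release blocks round-robin across orders until count is at or below high. */
static void toy_drain(struct toy_pcp *pcp)
{
    int order = 0;

    while (pcp->count > pcp->high) {
        if (pcp->nr[order] > 0) {
            pcp->nr[order]--;
            pcp->count -= 1 << order;
        }
        order = (order + 1) % TOY_NR_ORDERS;
    }
}

/* Cache a freed block of the given order, then drain if the limit is exceeded. */
static void toy_free(struct toy_pcp *pcp, int order)
{
    assert(order >= 0 && order < TOY_NR_ORDERS);
    pcp->nr[order]++;
    pcp->count += 1 << order;
    toy_drain(pcp);
}

/*
 * Cache 500 order-0 pages (well under high = 1024), then free two order-9
 * blocks. Returns how many order-0 pages were evicted by the resulting drain.
 */
static int toy_demo(void)
{
    struct toy_pcp pcp = { .count = 0, .high = 1024, .nr = { 0 } };
    int before;

    for (int i = 0; i < 500; i++)
        toy_free(&pcp, 0);
    before = pcp.nr[0];

    toy_free(&pcp, 9);   /* count becomes 1012, still within the limit */
    toy_free(&pcp, 9);   /* count becomes 1524, which triggers a drain */

    return before - pcp.nr[0];
}
```

In this toy, 500 order-0 frees never trigger a drain, yet two order-9 frees do, and the drain takes an order-0 page along with one order-9 block. Whether a real workload gains or loses from this depends on the allocation/free pattern each CPU sees, as the patch description says.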
* Re: [PATCH 2/2] mm/page_alloc: Allow high-order pages to be stored on the per-cpu lists
From: Mel Gorman @ 2021-06-01 12:45 UTC
To: Jesper Dangaard Brouer
Cc: Linux-MM, Dave Hansen, Vlastimil Babka, Michal Hocko, LKML, netdev@vger.kernel.org

On Mon, May 31, 2021 at 05:23:38PM +0200, Jesper Dangaard Brouer wrote:
> On Mon, 31 May 2021 13:04:12 +0100
> Mel Gorman <mgorman@techsingularity.net> wrote:
>
> > The per-cpu page allocator (PCP) only stores order-0 pages. This means
> > that all THP and "cheap" high-order allocations including SLUB contends
> > on the zone->lock. This patch extends the PCP allocator to store THP and
> > "cheap" high-order pages. Note that struct per_cpu_pages increases in
> > size to 256 bytes (4 cache lines) on x86-64.
> >
> > Note that this is not necessarily a universal performance win because of
> > how it is implemented. High-order pages can cause pcp->high to be exceeded
> > prematurely for lower-orders so for example, a large number of THP pages
> > being freed could release order-0 pages from the PCP lists. Hence, much
> > depends on the allocation/free pattern as observed by a single CPU to
> > determine if caching helps or hurts a particular workload.
> >
> > That said, basic performance testing passed. The following is a netperf
> > UDP_STREAM test which hits the relevant patches as some of the network
> > allocations are high-order.
>
> This series[1] looks very interesting! I confirm that some network
> allocations do use high-order allocations. Thus, I think this will
> increase network performance in general, like you confirm below:
>

Would you be able to do a small test on a real high-speed network? It's
something I can do easily myself in a few weeks but I do not have testbed
readily available at the moment. It's ok if you do not have the time,
it would just be nice if I could include independent results in the
changelog if the results are positive. Alternatively, a negative result
would mean going back to the drawing board :)

-- 
Mel Gorman
SUSE Labs
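
[Editorial sketch, not part of the original mail] For anyone producing the independent results Mel asks for, the percentage column in the netperf table earlier in the thread appears to be the relative change of the patched kernel over the baseline. The helpers below are a minimal sketch under two assumptions that the thread does not state: that "Hmean" denotes the harmonic mean of the per-run throughput samples, and that the percentage is (patched - baseline) / baseline. The names harmonic_mean() and relative_gain_pct() are hypothetical and do not come from netperf or any benchmark harness.

```c
#include <assert.h>
#include <stddef.h>

/* Harmonic mean of n positive samples: n / sum(1 / x_i). */
static double harmonic_mean(const double *samples, size_t n)
{
    double inv_sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        assert(samples[i] > 0.0);
        inv_sum += 1.0 / samples[i];
    }
    return (double)n / inv_sum;
}

/* Relative change of the patched result over the baseline, in percent. */
static double relative_gain_pct(double baseline, double patched)
{
    assert(baseline > 0.0);
    return (patched - baseline) / baseline * 100.0;
}
```

As a check against the quoted table, relative_gain_pct(261.46, 266.30) comes out at roughly 1.85, which matches the send-64 row.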
* Re: [PATCH 2/2] mm/page_alloc: Allow high-order pages to be stored on the per-cpu lists
From: Jesper Dangaard Brouer @ 2021-06-02 13:53 UTC
To: Mel Gorman
Cc: Linux-MM, Dave Hansen, Vlastimil Babka, Michal Hocko, LKML, netdev@vger.kernel.org, brouer

On Tue, 1 Jun 2021 13:45:33 +0100
Mel Gorman <mgorman@techsingularity.net> wrote:

> On Mon, May 31, 2021 at 05:23:38PM +0200, Jesper Dangaard Brouer wrote:
> > On Mon, 31 May 2021 13:04:12 +0100
> > Mel Gorman <mgorman@techsingularity.net> wrote:
> >
> > > The per-cpu page allocator (PCP) only stores order-0 pages. This means
> > > that all THP and "cheap" high-order allocations including SLUB contends
> > > on the zone->lock. This patch extends the PCP allocator to store THP and
> > > "cheap" high-order pages. Note that struct per_cpu_pages increases in
> > > size to 256 bytes (4 cache lines) on x86-64.
> > >
> > > Note that this is not necessarily a universal performance win because of
> > > how it is implemented. High-order pages can cause pcp->high to be exceeded
> > > prematurely for lower-orders so for example, a large number of THP pages
> > > being freed could release order-0 pages from the PCP lists. Hence, much
> > > depends on the allocation/free pattern as observed by a single CPU to
> > > determine if caching helps or hurts a particular workload.
> > >
> > > That said, basic performance testing passed. The following is a netperf
> > > UDP_STREAM test which hits the relevant patches as some of the network
> > > allocations are high-order.
> >
> > This series[1] looks very interesting! I confirm that some network
> > allocations do use high-order allocations. Thus, I think this will
> > increase network performance in general, like you confirm below:
> >
>
> Would you be able to do a small test on a real high-speed network? It's
> something I can do easily myself in a few weeks but I do not have testbed
> readily available at the moment. It's ok if you do not have the time,
> it would just be nice if I could include independent results in the
> changelog if the results are positive.

I don't have time right now. If others have time, you can use this git
tree provided by Mel:

 https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git/
 git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git
 branch: mm-pcphighorder-v1r7

> Alternatively, a negative result would mean going back to the drawing
> board :)

I'm confident that this will be a positive performance change.
(I remember we played with similar patches back in 2017).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer