From: Mel Gorman <mgorman@suse.de>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Nicolas Saenz Julienne <nsaenzju@redhat.com>,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, frederic@kernel.org, tglx@linutronix.de,
peterz@infradead.org, nilal@redhat.com,
linux-rt-users@vger.kernel.org, vbabka@suse.cz, cl@linux.com,
ppandit@redhat.com
Subject: Re: [PATCH v2 3/3] mm/page_alloc: Remotely drain per-cpu lists
Date: Fri, 10 Dec 2021 10:55:49 +0000 [thread overview]
Message-ID: <20211210105549.GJ3301@suse.de> (raw)
In-Reply-To: <20211209174535.GA70283@fuller.cnet>
On Thu, Dec 09, 2021 at 02:45:35PM -0300, Marcelo Tosatti wrote:
> On Fri, Dec 03, 2021 at 02:13:06PM +0000, Mel Gorman wrote:
> > On Wed, Nov 03, 2021 at 06:05:12PM +0100, Nicolas Saenz Julienne wrote:
> > > Some setups, notably NOHZ_FULL CPUs, are too busy to handle the per-cpu
> > > drain work queued by __drain_all_pages(). So introduce a new mechanism
> > > to remotely drain the per-cpu lists. It is made possible by remotely
> > > locking the new per-cpu spinlocks in 'struct per_cpu_pages'. A benefit
> > > of this new scheme is that drain operations are now migration safe.
> > >
> > > There was no observed performance degradation vs. the previous scheme.
> > > Both netperf and hackbench were run in parallel while triggering the
> > > __drain_all_pages(NULL, true) code path around 100 times per second.
> > > The new scheme performs a bit better (~5%), although the important
> > > point here is that there are no performance regressions vs. the
> > > previous mechanism. Per-cpu lists draining happens only in slow paths.
> > >
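For anyone following along, the shape of what Nicolas is describing is
roughly the following. This is a minimal sketch only, assuming a spinlock
embedded in struct per_cpu_pages; the field and helper names are
illustrative rather than the patch's actual code:

	struct per_cpu_pages {
		spinlock_t lock;	/* protects the lists below */
		int count;		/* pages on the lists */
		/* ... high, batch and friends ... */
		struct list_head lists[NR_PCP_LISTS];
	};

	/* Drain a remote CPU's lists without queueing work on it. */
	static void drain_remote_pages(struct zone *zone, int cpu)
	{
		struct per_cpu_pages *pcp;
		unsigned long flags;

		pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
		spin_lock_irqsave(&pcp->lock, flags);
		if (pcp->count)
			free_pcppages_bulk(zone, pcp->count, pcp);
		spin_unlock_irqrestore(&pcp->lock, flags);
	}

The key property is that the draining CPU takes the remote CPU's lock
directly instead of queueing work that a NOHZ_FULL CPU may never get
around to running.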
> >
> > netperf and hackbench are not great indicators of page allocator
> > performance as IIRC they are more slab-intensive than page allocator
> > intensive. I ran the series through a few benchmarks and can confirm
> > that there was negligible difference to netperf and hackbench.
> >
> > However, on Page Fault Test (pft in mmtests), it is noticeable. On a
> > 2-socket Cascade Lake machine I get
> >
> > pft timings
> >                           5.16.0-rc1           5.16.0-rc1
> >                              vanilla  mm-remotedrain-v2r1
> > Amean  system-1     27.48 (  0.00%)    27.85 *  -1.35%*
> > Amean  system-4     28.65 (  0.00%)    30.84 *  -7.65%*
> > Amean  system-7     28.70 (  0.00%)    32.43 * -13.00%*
> > Amean  system-12    30.33 (  0.00%)    34.21 * -12.80%*
> > Amean  system-21    37.14 (  0.00%)    41.51 * -11.76%*
> > Amean  system-30    36.79 (  0.00%)    46.15 * -25.43%*
> > Amean  system-48    58.95 (  0.00%)    65.28 * -10.73%*
> > Amean  system-79   111.61 (  0.00%)   114.78 *  -2.84%*
> > Amean  system-80   113.59 (  0.00%)   116.73 *  -2.77%*
> > Amean  elapsed-1    32.83 (  0.00%)    33.12 *  -0.88%*
> > Amean  elapsed-4     8.60 (  0.00%)     9.17 *  -6.66%*
> > Amean  elapsed-7     4.97 (  0.00%)     5.53 * -11.30%*
> > Amean  elapsed-12    3.08 (  0.00%)     3.43 * -11.41%*
> > Amean  elapsed-21    2.19 (  0.00%)     2.41 * -10.06%*
> > Amean  elapsed-30    1.73 (  0.00%)     2.04 * -17.87%*
> > Amean  elapsed-48    1.73 (  0.00%)     2.03 * -17.77%*
> > Amean  elapsed-79    1.61 (  0.00%)     1.64 *  -1.90%*
> > Amean  elapsed-80    1.60 (  0.00%)     1.64 *  -2.50%*
> >
> > It's not specific to Cascade Lake, I see regressions of varying size
> > on different Intel and AMD chips, some better and some worse than this
> > result. The smallest regression was on a single-CPU Skylake machine
> > with a 2-6% hit. The worst was Zen1 with a 3-107% hit.
> >
> > I didn't profile it to establish why, but in all cases the system CPU
> > usage was much higher. It *might* be because the spinlock in
> > per_cpu_pages crosses into a new cache line that may be cold, although
> > the penalty seems a bit high for that to be the only factor.
> >
> > Code-wise, the patches look fine but the apparent penalty for PFT is
> > too severe.
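
On the cache line point above: if the new lock does push the hot fields
onto a second, possibly cold, line, one cheap experiment would be packing
the lock next to them explicitly. A hypothetical, untested layout:

	struct per_cpu_pages {
		spinlock_t lock;	/* same line as the hot fields */
		int count;
		int high;
		int batch;
		short free_factor;
		struct list_head lists[NR_PCP_LISTS];
	} ____cacheline_aligned_in_smp;

If the regression persisted with that layout, cache line placement could
be ruled out as the dominant factor.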
>
> Mel,
>
> Have you read Nicolas' RCU patches?
>
I agree with Vlastimil's review on overhead. I think it would be more
straightforward to disable the pcp allocator for NOHZ_FULL CPUs, similar
to what zone_pcp_disable() does but for individual CPUs, with care taken
to not accidentally re-enable NOHZ_FULL CPUs in zone_pcp_enable(). The
downside is that there will be a performance penalty if an application
running on a NOHZ_FULL CPU is page allocator intensive for whatever
reason. However, I guess this is unlikely because if there was a lot
of kernel activity on a NOHZ_FULL CPU, the vmstat shepherd would also
cause interference.
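
A rough sketch of that idea, assuming the 5.16-era struct per_cpu_pages
layout. The helper name is hypothetical and locking is omitted:

	/*
	 * Force the pcp lists of an isolated CPU to stay empty, in the
	 * spirit of zone_pcp_disable() but for a single CPU.
	 */
	static void zone_pcp_disable_cpu(struct zone *zone, int cpu)
	{
		struct per_cpu_pages *pcp;

		pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);

		/*
		 * With high == 0 and batch == 1, frees go straight to
		 * the buddy allocator and the lists never fill up.
		 */
		WRITE_ONCE(pcp->high, 0);
		WRITE_ONCE(pcp->batch, 1);
	}

The lists would still need one drain when the CPU enters isolation, but
after that there is nothing left for __drain_all_pages() to do remotely,
and zone_pcp_enable() would have to skip such CPUs when restoring the
high and batch values.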
--
Mel Gorman
SUSE Labs