From: Mel Gorman <mgorman@techsingularity.net>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Thomas Gleixner <tglx@linutronix.de>,
	Matt Fleming <matt@codeblueprint.co.uk>,
	Borislav Petkov <bp@alien8.de>, Linux-MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/3] mm, meminit: Recalculate pcpu batch and high limits after init completes
Date: Fri, 18 Oct 2019 15:09:59 +0100
Message-ID: <20191018140959.GK3321@techsingularity.net>
In-Reply-To: <20191018130127.GP5017@dhcp22.suse.cz>

On Fri, Oct 18, 2019 at 03:01:27PM +0200, Michal Hocko wrote:
> On Fri 18-10-19 11:56:05, Mel Gorman wrote:
> > Deferred memory initialisation updates zone->managed_pages during
> > the initialisation phase but before that finishes, the per-cpu page
> > allocator (pcpu) calculates the number of pages allocated/freed in
> > batches as well as the maximum number of pages allowed on a per-cpu list.
> > As zone->managed_pages is not up to date yet, the pcpu initialisation
> > calculates inappropriately low batch and high values.
> > 
> > This increases zone lock contention quite severely in some cases with the
> > degree of severity depending on how many CPUs share a local zone and the
> > size of the zone. A private report indicated that kernel build times were
> > excessive with extremely high system CPU usage. A perf profile indicated
> > that a large chunk of time was lost on zone->lock contention.
> > 
> > This patch recalculates the pcpu batch and high values after deferred
> > initialisation completes on each node. It was tested on a 2-socket AMD
> > EPYC 2 machine using a kernel compilation workload -- allmodconfig and
> > all available CPUs.
> > 
> > mmtests configuration: config-workload-kernbench-max
> > Configuration was modified to build on a fresh XFS partition.
> > 
> > kernbench
> >                                 5.4.0-rc3              5.4.0-rc3
> >                                   vanilla         resetpcpu-v1r1
> > Amean     user-256    13249.50 (   0.00%)    15928.40 * -20.22%*
> > Amean     syst-256    14760.30 (   0.00%)     4551.77 *  69.16%*
> > Amean     elsp-256      162.42 (   0.00%)      118.46 *  27.06%*
> > Stddev    user-256       42.97 (   0.00%)       50.83 ( -18.30%)
> > Stddev    syst-256      336.87 (   0.00%)       33.70 (  90.00%)
> > Stddev    elsp-256        2.46 (   0.00%)        0.81 (  67.01%)
> > 
> >                    5.4.0-rc3      5.4.0-rc3
> >                      vanilla resetpcpu-v1r1
> > Duration User       39766.24       47802.92
> > Duration System     44298.10       13671.93
> > Duration Elapsed      519.11         387.65
> > 
> > The patch reduces system CPU usage by 69.16% and total build time by
> > 27.06%. The variance of system CPU usage is also much reduced.
> 
> The fix makes sense. It would be nice to see the difference in the batch
> sizes from the initial setup compared to the one after the deferred
> initialization is done.
> 

Before the patch, the breakdown of batch and high values over all zones
was

    256               batch: 1
    256               batch: 63
    512               batch: 7

    256               high:  0
    256               high:  378
    512               high:  42

i.e. 512 pcpu pagesets had a batch limit of 7 and a high limit of 42.
These were for the NORMAL zones on the system. After the patch, the
breakdown was

    256               batch: 1
    768               batch: 63

    256               high:  0
    768               high:  378
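
For anyone wondering where 7/42 and 63/378 come from, the following is a
rough userspace sketch of the arithmetic, modelled on my reading of
zone_batchsize() in mm/page_alloc.c around v5.4. It is not the kernel
code: the sketch_* names and SKETCH_PAGE_SIZE are made up for
illustration, 4 KiB pages are assumed, and the "high = 6 * batch" rule is
assumed to be the default that applies when percpu_pagelist_fraction is
not set. It makes no attempt to explain the batch 1 / high 0 entries.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, as on x86-64. */
#define SKETCH_PAGE_SIZE 4096UL

/* Largest power of two that is <= n, for n >= 1. */
static unsigned long sketch_rounddown_pow_of_two(unsigned long n)
{
	unsigned long p = 1;

	while (p <= n / 2)
		p *= 2;
	return p;
}

/*
 * Hypothetical stand-in for zone_batchsize(): derive the pcpu batch
 * size from the number of pages the zone currently manages.
 */
static unsigned long sketch_zone_batchsize(unsigned long managed_pages)
{
	/* Start at roughly 1/1024 of the zone. */
	unsigned long batch = managed_pages / 1024;

	/* Cap the intermediate value at 1 MiB worth of pages. */
	if (batch * SKETCH_PAGE_SIZE > 1024UL * 1024UL)
		batch = (1024UL * 1024UL) / SKETCH_PAGE_SIZE;

	batch /= 4;
	if (batch < 1)
		batch = 1;

	/* Clamp to a value of the form 2^n - 1. */
	return sketch_rounddown_pow_of_two(batch + batch / 2) - 1;
}

/* Assumed default: the per-cpu list may hold six batches. */
static unsigned long sketch_pcp_high(unsigned long batch)
{
	return 6 * batch;
}
```

With those assumptions, a zone that only has about 128 MiB managed when
the pcpu limits are computed (32768 pages) gets batch 7 and high 42,
while a fully initialised multi-gigabyte zone hits the cap and gets
batch 63 and high 378, which matches the breakdowns above.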

> > Cc: stable@vger.kernel.org # v4.15+
> 
> Hmm, are you sure about 4.15? Doesn't this go all the way down to
> deferred initialization? I do not see any recent changes on when
> setup_per_cpu_pageset is called.
> 

No, I'm not 100% sure. From the code, it looks like this was always an
issue, but it did not happen on at least one 4.12-based distribution
kernel for reasons that are non-obvious. Either way, the tag should have
been "v4.1+".

Thanks.

-- 
Mel Gorman
SUSE Labs

