From: Mel Gorman <mgorman@techsingularity.net>
To: Huang Ying <ying.huang@intel.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Arjan Van De Ven <arjan@linux.intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
Vlastimil Babka <vbabka@suse.cz>,
David Hildenbrand <david@redhat.com>,
Johannes Weiner <jweiner@redhat.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Michal Hocko <mhocko@suse.com>,
Pavel Tatashin <pasha.tatashin@soleen.com>,
Matthew Wilcox <willy@infradead.org>,
Christoph Lameter <cl@linux.com>
Subject: Re: [PATCH 01/10] mm, pcp: avoid to drain PCP when process exit
Date: Wed, 11 Oct 2023 13:46:10 +0100
Message-ID: <20231011124610.4punxroovolyvmgr@techsingularity.net>
In-Reply-To: <20230920061856.257597-2-ying.huang@intel.com>

On Wed, Sep 20, 2023 at 02:18:47PM +0800, Huang Ying wrote:
> In commit f26b3fa04611 ("mm/page_alloc: limit number of high-order
> pages on PCP during bulk free"), the PCP (Per-CPU Pageset) will be
> drained when PCP is mostly used for high-order pages freeing to
> improve the cache-hot pages reusing between page allocation and
> freeing CPUs.
>
> But, the PCP draining mechanism may be triggered unexpectedly when
> process exits. With some customized trace point, it was found that
> PCP draining (free_high == true) was triggered with the order-1 page
> freeing with the following call stack,
>
> => free_unref_page_commit
> => free_unref_page
> => __mmdrop
> => exit_mm
> => do_exit
> => do_group_exit
> => __x64_sys_exit_group
> => do_syscall_64
>
> Checking the source code, this is the page table PGD being freed
> (mm_free_pgd()). It is an order-1 page free if
> CONFIG_PAGE_TABLE_ISOLATION=y, which is a common configuration for
> security.
>
> Just before that, page freeing with the following call stack was
> found,
>
> => free_unref_page_commit
> => free_unref_page_list
> => release_pages
> => tlb_batch_pages_flush
> => tlb_finish_mmu
> => exit_mmap
> => __mmput
> => exit_mm
> => do_exit
> => do_group_exit
> => __x64_sys_exit_group
> => do_syscall_64
>
> So, when a process exits,
>
> - a large number of user pages of the process will be freed without
> page allocation, so it's highly likely that pcp->free_factor
> becomes positive (> 0).
>
> - after freeing all user pages, the PGD will be freed, which is an
> order-1 page free, and the PCP will be drained.
>
> All in all, when a process exits, it's highly likely that the PCP will
> be drained. This is unexpected behavior.
>
> To avoid this, in the patch, PCP draining will only be triggered
> by 2 consecutive high-order page frees.
>
> On a 2-socket Intel server with 224 logical CPU, we tested kbuild on
> one socket with `make -j 112`. With the patch, the build time
> decreases 3.4% (from 206s to 199s). The cycles% of the spinlock
> contention (mostly for zone lock) decreases from 43.6% to 40.3% (with
> PCP size == 361). The number of PCP draining for high order pages
> freeing (free_high) decreases 50.8%.
>
> This helps network workload too for reduced zone lock contention. On
> a 2-socket Intel server with 128 logical CPU, with the patch, the
> network bandwidth of the UNIX (AF_UNIX) test case of lmbench test
> suite with 16-pair processes increase 17.1%. The cycles% of the
> spinlock contention (mostly for zone lock) decreases from 50.0% to
> 45.8%. The number of PCP draining for high order pages
> freeing (free_high) decreases 27.4%. The cache miss rate keeps 0.3%.
>
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>

Acked-by: Mel Gorman <mgorman@techsingularity.net>

However, I want to note that batching on exit is not necessarily
unexpected. For processes that are multi-TB in size, the time to exit
can actually be quite large and batching is of benefit, but optimising
for exit is rarely a winning strategy. The pattern of "all allocs on CPU
A and all frees on CPU B" or "short-lived tasks triggering a premature
drain" is a bit more compelling but not worth a changelog rewrite.
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 4106fbc5b4b3..64d5ed2bb724 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -676,12 +676,15 @@ enum zone_watermarks {
> #define high_wmark_pages(z) (z->_watermark[WMARK_HIGH] + z->watermark_boost)
> #define wmark_pages(z, i) (z->_watermark[i] + z->watermark_boost)
>
> +#define PCPF_PREV_FREE_HIGH_ORDER 0x01
> +
The meaning of the flag and its intent should have been documented.
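
As an illustration only, a documented version of the flag might read
along these lines. This is a user-space sketch, not the patch itself:
struct pcp_model and pcp_should_free_high() are hypothetical names, and
the logic is reconstructed from the changelog's description that a drain
is only triggered by 2 consecutive high-order page frees while
pcp->free_factor is non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * PCPF_PREV_FREE_HIGH_ORDER: the previous page freed to this PCP was a
 * high-order page (0 < order <= PAGE_ALLOC_COSTLY_ORDER). It lets the
 * free path tell a run of high-order frees apart from a single one,
 * such as the order-1 PGD freed at the end of process exit.
 */
#define PCPF_PREV_FREE_HIGH_ORDER 0x01
#define PAGE_ALLOC_COSTLY_ORDER   3

/* Hypothetical stand-in for the two per_cpu_pages fields used here. */
struct pcp_model {
	unsigned int flags;
	int free_factor;	/* > 0 when freeing without allocation */
};

/*
 * Assumed decision logic: report free_high only when this free and the
 * previous one were both high-order and free_factor is non-zero. The
 * flag is updated on every call to record the current free.
 */
static bool pcp_should_free_high(struct pcp_model *pcp, unsigned int order)
{
	bool free_high = false;

	assert(pcp != NULL);

	if (order && order <= PAGE_ALLOC_COSTLY_ORDER) {
		free_high = pcp->free_factor > 0 &&
			    (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER);
		pcp->flags |= PCPF_PREV_FREE_HIGH_ORDER;
	} else {
		pcp->flags &= ~PCPF_PREV_FREE_HIGH_ORDER;
	}

	return free_high;
}
```

Under this model, the exit sequence from the changelog (many order-0
frees followed by one order-1 PGD free) no longer reports free_high,
while back-to-back high-order frees still do.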
--
Mel Gorman
SUSE Labs