From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
To: Daniel Jordan <daniel.m.jordan@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>,
Herbert Xu <herbert@gondor.apana.org.au>,
Steffen Klassert <steffen.klassert@secunet.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
Dan Williams <dan.j.williams@intel.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
David Hildenbrand <david@redhat.com>,
Jason Gunthorpe <jgg@ziepe.ca>, Jonathan Corbet <corbet@lwn.net>,
Josh Triplett <josh@joshtriplett.org>,
Kirill Tkhai <ktkhai@virtuozzo.com>,
Michal Hocko <mhocko@kernel.org>, Pavel Machek <pavel@ucw.cz>,
Pavel Tatashin <pasha.tatashin@soleen.com>,
Peter Zijlstra <peterz@infradead.org>,
Randy Dunlap <rdunlap@infradead.org>,
Shile Zhang <shile.zhang@linux.alibaba.com>,
Tejun Heo <tj@kernel.org>, Zi Yan <ziy@nvidia.com>,
linux-crypto@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 5/7] mm: move zone iterator outside of deferred_init_maxorder()
Date: Thu, 30 Apr 2020 14:43:28 -0700
Message-ID: <deadac9a-fbef-6c66-207c-83d251d2ef50@linux.intel.com>
In-Reply-To: <20200430201125.532129-6-daniel.m.jordan@oracle.com>
On 4/30/2020 1:11 PM, Daniel Jordan wrote:
> padata will soon divide up pfn ranges between threads when parallelizing
> deferred init, and deferred_init_maxorder() complicates that by using an
> opaque index in addition to start and end pfns. Move the index outside
> the function to make splitting the job easier, and simplify the code
> while at it.
>
> deferred_init_maxorder() now always iterates within a single pfn range
> instead of potentially multiple ranges, and advances start_pfn to the
> end of that range instead of the max-order block so partial pfn ranges
> in the block aren't skipped in a later iteration. The section alignment
> check in deferred_grow_zone() is removed as well since this alignment is
> no longer guaranteed. It's not clear what value the alignment provided
> originally.
>
> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
So part of the reason for splitting it up along section-aligned
boundaries was that deferred_grow_zone() already had existing
functionality that pulled out a section-aligned chunk and processed it
to prepare enough memory for other threads to keep running. I suspect
the section alignment was chosen because, I believe, that is normally
also the alignment for memory onlining.
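
For reference, the section check that the patch removes,
"(first_deferred_pfn ^ spfn) < PAGES_PER_SECTION", can be sketched in
userspace as below. This is only an illustration, not the kernel code:
PAGES_PER_SECTION is assumed to be 1 << 15 (the x86_64 value with 4K
pages), and same_section() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption for illustration: 2^15 pages per section (x86_64, 4K pages). */
#define PAGES_PER_SECTION (1UL << 15)

/* The XOR trick below only works when the section size is a power of two. */
static_assert((PAGES_PER_SECTION & (PAGES_PER_SECTION - 1)) == 0,
	      "PAGES_PER_SECTION must be a power of two");

/*
 * Hypothetical helper: true when both pfns share every bit at or above
 * the section shift, i.e. they have the same section number. The old
 * loop kept going ("continue") while this held, so it could only stop
 * once spfn had moved into a different section.
 */
static bool same_section(unsigned long first_pfn, unsigned long cur_pfn)
{
	return (first_pfn ^ cur_pfn) < PAGES_PER_SECTION;
}
```

Because spfn is the exclusive end of the range just processed, the
check first fails when spfn reaches or crosses the next section
boundary, which is where the old loop tested its quota.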
With this already breaking things up over multiple threads, how does
this work with deferred_grow_zone()? Which thread is it trying to
allocate from if it needs to allocate some memory for itself?
Also, what is to stop deferred_grow_zone() from bailing out in the
middle of a max-order page block if there is a hole in the middle of
the block?
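
The hole scenario can be sketched with a small userspace model of the
new loop structure. This is a simplified sketch under stated
assumptions, not the kernel implementation: MAX_ORDER_NR_PAGES is
assumed to be 1024, every pfn in a range is assumed to be valid and
counted, and struct pfn_range, init_maxorder() and grow_zone_stop_pfn()
are hypothetical names standing in for the memblock iterator,
deferred_init_maxorder() and the loop in deferred_grow_zone().

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for illustration: 1024 pages per max-order block. */
#define MAX_ORDER_NR_PAGES 1024UL
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Hypothetical stand-in for one free range handed out by the iterator. */
struct pfn_range {
	unsigned long start, end;	/* [start, end) */
};

/*
 * Models the patched deferred_init_maxorder(): handle at most one
 * max-order block, clamp to the end of the current range, and advance
 * *start_pfn. Returns the number of pages covered.
 */
static unsigned long init_maxorder(unsigned long *start_pfn,
				   unsigned long end_pfn)
{
	unsigned long pfn = ALIGN_UP(*start_pfn + 1, MAX_ORDER_NR_PAGES);
	unsigned long nr_pages;

	assert(*start_pfn < end_pfn);
	if (pfn > end_pfn)
		pfn = end_pfn;

	nr_pages = pfn - *start_pfn;
	*start_pfn = pfn;
	return nr_pages;
}

/*
 * Models the patched loop in deferred_grow_zone(): walk the ranges and
 * stop as soon as the quota is met. Returns the pfn where it stopped,
 * which is what would be stored in first_deferred_pfn.
 */
static unsigned long grow_zone_stop_pfn(const struct pfn_range *r, size_t n,
					unsigned long nr_pages_needed)
{
	unsigned long nr_pages = 0, spfn = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long epfn = r[i].end;

		spfn = r[i].start;
		while (spfn < epfn) {
			nr_pages += init_maxorder(&spfn, epfn);
			if (nr_pages >= nr_pages_needed)
				return spfn;
		}
	}
	return spfn;
}
```

With ranges [0, 1536) and [1792, 4096), so a hole inside the block
[1024, 2048), and a quota of 1500 pages, this model stops at pfn 1536.
That is in the middle of a max-order block, with [1792, 2048) still
untouched. Whether that matters in the kernel depends on the buddy
handling in __free_one_page(), which the model does not cover.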
> ---
> mm/page_alloc.c | 88 +++++++++++++++----------------------------------
> 1 file changed, 27 insertions(+), 61 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 68669d3a5a665..990514d8f0d94 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1708,55 +1708,23 @@ deferred_init_mem_pfn_range_in_zone(u64 *i, struct zone *zone,
> }
>
> /*
> - * Initialize and free pages. We do it in two loops: first we initialize
> - * struct page, then free to buddy allocator, because while we are
> - * freeing pages we can access pages that are ahead (computing buddy
> - * page in __free_one_page()).
> - *
> - * In order to try and keep some memory in the cache we have the loop
> - * broken along max page order boundaries. This way we will not cause
> - * any issues with the buddy page computation.
> + * Initialize the struct pages and then free them to the buddy allocator at
> + * most a max order block at a time because while we are freeing pages we can
> + * access pages that are ahead (computing buddy page in __free_one_page()).
> + * It's also cache friendly.
> */
> static unsigned long __init
> -deferred_init_maxorder(u64 *i, struct zone *zone, unsigned long *start_pfn,
> - unsigned long *end_pfn)
> +deferred_init_maxorder(struct zone *zone, unsigned long *start_pfn,
> + unsigned long end_pfn)
> {
> - unsigned long mo_pfn = ALIGN(*start_pfn + 1, MAX_ORDER_NR_PAGES);
> - unsigned long spfn = *start_pfn, epfn = *end_pfn;
> - unsigned long nr_pages = 0;
> - u64 j = *i;
> -
> - /* First we loop through and initialize the page values */
> - for_each_free_mem_pfn_range_in_zone_from(j, zone, start_pfn, end_pfn) {
> - unsigned long t;
> -
> - if (mo_pfn <= *start_pfn)
> - break;
> -
> - t = min(mo_pfn, *end_pfn);
> - nr_pages += deferred_init_pages(zone, *start_pfn, t);
> -
> - if (mo_pfn < *end_pfn) {
> - *start_pfn = mo_pfn;
> - break;
> - }
> - }
> -
> - /* Reset values and now loop through freeing pages as needed */
> - swap(j, *i);
> -
> - for_each_free_mem_pfn_range_in_zone_from(j, zone, &spfn, &epfn) {
> - unsigned long t;
> -
> - if (mo_pfn <= spfn)
> - break;
> + unsigned long nr_pages, pfn;
>
> - t = min(mo_pfn, epfn);
> - deferred_free_pages(spfn, t);
> + pfn = ALIGN(*start_pfn + 1, MAX_ORDER_NR_PAGES);
> + pfn = min(pfn, end_pfn);
>
> - if (mo_pfn <= epfn)
> - break;
> - }
> + nr_pages = deferred_init_pages(zone, *start_pfn, pfn);
> + deferred_free_pages(*start_pfn, pfn);
> + *start_pfn = pfn;
>
> return nr_pages;
> }
> @@ -1814,9 +1782,11 @@ static int __init deferred_init_memmap(void *data)
> * that we can avoid introducing any issues with the buddy
> * allocator.
> */
> - while (spfn < epfn) {
> - nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
> - cond_resched();
> + for_each_free_mem_pfn_range_in_zone_from(i, zone, &spfn, &epfn) {
> + while (spfn < epfn) {
> + nr_pages += deferred_init_maxorder(zone, &spfn, epfn);
> + cond_resched();
> + }
> }
> zone_empty:
> /* Sanity check that the next zone really is unpopulated */
> @@ -1883,22 +1853,18 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
> * that we can avoid introducing any issues with the buddy
> * allocator.
> */
> - while (spfn < epfn) {
> - /* update our first deferred PFN for this section */
> - first_deferred_pfn = spfn;
> -
> - nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
> - touch_nmi_watchdog();
> -
> - /* We should only stop along section boundaries */
> - if ((first_deferred_pfn ^ spfn) < PAGES_PER_SECTION)
> - continue;
> -
> - /* If our quota has been met we can stop here */
> - if (nr_pages >= nr_pages_needed)
> - break;
> + for_each_free_mem_pfn_range_in_zone_from(i, zone, &spfn, &epfn) {
> + while (spfn < epfn) {
> + nr_pages += deferred_init_maxorder(zone, &spfn, epfn);
> + touch_nmi_watchdog();
> +
> + /* If our quota has been met we can stop here */
> + if (nr_pages >= nr_pages_needed)
> + goto out;
> + }
> }
>
> +out:
> pgdat->first_deferred_pfn = spfn;
> pgdat_resize_unlock(pgdat, &flags);
>
>