From: Andrew Morton <akpm@linux-foundation.org>
To: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: steven.sistare@oracle.com, daniel.m.jordan@oracle.com,
m.mizuma@jp.fujitsu.com, mhocko@suse.com,
catalin.marinas@arm.com, takahiro.akashi@linaro.org,
gi-oh.kim@profitbricks.com, heiko.carstens@de.ibm.com,
baiyaowei@cmss.chinamobile.com, richard.weiyang@gmail.com,
paul.burton@mips.com, miles.chen@mediatek.com, vbabka@suse.cz,
mgorman@suse.de, hannes@cmpxchg.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 1/1] mm: initialize pages on demand during boot
Date: Thu, 8 Feb 2018 12:03:34 -0800 [thread overview]
Message-ID: <20180208120334.0779ed0726bb527a9cad0336@linux-foundation.org> (raw)
In-Reply-To: <20180208184555.5855-2-pasha.tatashin@oracle.com>
On Thu, 8 Feb 2018 13:45:55 -0500 Pavel Tatashin <pasha.tatashin@oracle.com> wrote:
> Deferred page initialization allows the boot cpu to initialize a small
> subset of the system's pages early in boot, with other cpus doing the rest
> later on.
>
> It is, however, problematic to know how many pages the kernel needs during
> boot. Different modules and kernel parameters may change the requirement,
> so the boot cpu either initializes too many pages or runs out of memory.
>
> To fix that, initialize early pages on demand. This ensures the kernel
> does the minimum amount of work to initialize pages during boot and leaves
> the rest to be divided in the multithreaded initialization path
> (deferred_init_memmap).
>
> The on-demand code is permanently disabled using static branching once
> deferred pages are initialized. After the static branch is changed to
> false, the overhead is up-to two branch-always instructions if the zone
> watermark check fails or if rmqueue fails.
>
> ...
>
> @@ -1604,6 +1566,96 @@ static int __init deferred_init_memmap(void *data)
> pgdat_init_report_one_done();
> return 0;
> }
> +
> +/*
> + * Protects some early interrupt threads, and also for a short period of time
> + * from smp_init() to page_alloc_init_late() when deferred pages are
> + * initialized.
> + */
> +static __initdata DEFINE_SPINLOCK(deferred_zone_grow_lock);
Comment is a little confusing. Locks don't protect "threads" - they
protect data. Can we be specific about which data is being protected?
Why is a new lock needed here? Those data structures already have
designated locks, don't they?
If the lock protects "early interrupt threads" then it's surprising to
see it taken with spin_lock() and not spin_lock_irqsave()?
> +DEFINE_STATIC_KEY_TRUE(deferred_pages);
> +
> +/*
> + * If this zone has deferred pages, try to grow it by initializing enough
> + * deferred pages to satisfy the allocation specified by order, rounded up to
> + * the nearest PAGES_PER_SECTION boundary. So we're adding memory in increments
> + * of SECTION_SIZE bytes by initializing struct pages in increments of
> + * PAGES_PER_SECTION * sizeof(struct page) bytes.
> + */
Please also document the return value.
> +static noinline bool __init
Why was noinline needed?
> +deferred_grow_zone(struct zone *zone, unsigned int order)
> +{
> + int zid = zone_idx(zone);
> + int nid = zone->node;
> + pg_data_t *pgdat = NODE_DATA(nid);
> + unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION);
> + unsigned long nr_pages = 0;
> + unsigned long first_init_pfn, first_deferred_pfn, spfn, epfn, t;
> + phys_addr_t spa, epa;
> + u64 i;
> +
> + /* Only the last zone may have deferred pages */
> + if (zone_end_pfn(zone) != pgdat_end_pfn(pgdat))
> + return false;
> +
> + first_deferred_pfn = READ_ONCE(pgdat->first_deferred_pfn);
It would be nice to have a little comment explaining why READ_ONCE was
needed.
Would it still be needed if this code was moved into the locked region?
> + first_init_pfn = max(zone->zone_start_pfn, first_deferred_pfn);
> +
> + if (first_init_pfn >= pgdat_end_pfn(pgdat))
> + return false;
> +
> + spin_lock(&deferred_zone_grow_lock);
> + /*
> + * Bail if we raced with another thread that disabled on demand
> + * initialization.
> + */
> + if (!static_branch_unlikely(&deferred_pages)) {
> + spin_unlock(&deferred_zone_grow_lock);
> + return false;
> + }
> +
> + for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
> + spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
> + epfn = min_t(unsigned long, zone_end_pfn(zone), PFN_DOWN(epa));
> +
> + while (spfn < epfn && nr_pages < nr_pages_needed) {
> + t = ALIGN(spfn + PAGES_PER_SECTION, PAGES_PER_SECTION);
> + first_deferred_pfn = min(t, epfn);
> + nr_pages += deferred_init_pages(nid, zid, spfn,
> + first_deferred_pfn);
> + spfn = first_deferred_pfn;
> + }
> +
> + if (nr_pages >= nr_pages_needed)
> + break;
> + }
> +
> + for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
> + spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
> + epfn = min_t(unsigned long, first_deferred_pfn, PFN_DOWN(epa));
> + deferred_free_pages(nid, zid, spfn, epfn);
> +
> + if (first_deferred_pfn == epfn)
> + break;
> + }
> + WRITE_ONCE(pgdat->first_deferred_pfn, first_deferred_pfn);
> + spin_unlock(&deferred_zone_grow_lock);
> +
> + return nr_pages >= nr_pages_needed;
> +}
> +
> +/*
> + * deferred_grow_zone() is __init, but it is called from
> + * get_page_from_freelist() during early boot until deferred_pages permanently
> + * disables this call. This is why, we have refdata wrapper to avoid warning,
> + * and ensure that the function body gets unloaded.
s/why,/why/
s/ensure/to ensure/
> + */
> +static bool __ref
> +_deferred_grow_zone(struct zone *zone, unsigned int order)
> +{
> + return deferred_grow_zone(zone, order);
> +}
> +
> #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
>
> void __init page_alloc_init_late(void)
> @@ -1613,6 +1665,14 @@ void __init page_alloc_init_late(void)
> #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> int nid;
>
> + /*
> + * We are about to initialize the rest of deferred pages, permanently
> + * disable on-demand struct page initialization.
> + */
> + spin_lock(&deferred_zone_grow_lock);
> + static_branch_disable(&deferred_pages);
> + spin_unlock(&deferred_zone_grow_lock);
Ah, so the new lock is to protect the static branch machinery only?
> /* There will be num_node_state(N_MEMORY) threads */
> atomic_set(&pgdat_init_n_undone, num_node_state(N_MEMORY));
> for_each_node_state(nid, N_MEMORY) {
>
> ...
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2018-02-08 20:03 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-02-08 18:45 [PATCH v2 0/1] initialize pages on demand during boot Pavel Tatashin
2018-02-08 18:45 ` [PATCH v2 1/1] mm: " Pavel Tatashin
2018-02-08 20:03 ` Andrew Morton [this message]
2018-02-08 22:27 ` Pavel Tatashin
2018-02-09 0:58 ` Pavel Tatashin
2018-02-10 5:24 ` kbuild test robot
2018-02-10 5:37 ` kbuild test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180208120334.0779ed0726bb527a9cad0336@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=baiyaowei@cmss.chinamobile.com \
--cc=catalin.marinas@arm.com \
--cc=daniel.m.jordan@oracle.com \
--cc=gi-oh.kim@profitbricks.com \
--cc=hannes@cmpxchg.org \
--cc=heiko.carstens@de.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=m.mizuma@jp.fujitsu.com \
--cc=mgorman@suse.de \
--cc=mhocko@suse.com \
--cc=miles.chen@mediatek.com \
--cc=pasha.tatashin@oracle.com \
--cc=paul.burton@mips.com \
--cc=richard.weiyang@gmail.com \
--cc=steven.sistare@oracle.com \
--cc=takahiro.akashi@linaro.org \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).