From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org,
Pavel Tatashin <pasha.tatashin@oracle.com>,
Michal Hocko <mhocko@suse.com>,
Mel Gorman <mgorman@techsingularity.net>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.13 32/35] mm/page_alloc.c: broken deferred calculation
Date: Wed, 22 Nov 2017 11:12:26 +0100
Message-ID: <20171122101139.672630598@linuxfoundation.org>
In-Reply-To: <20171122101137.661212603@linuxfoundation.org>
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Tatashin <pasha.tatashin@oracle.com>
commit d135e5750205a21a212a19dbb05aeb339e2cbea7 upstream.
In reset_deferred_meminit() we determine the number of pages that must
not be deferred. We initialize pages for at least 2G of memory, but also
pages for reserved memory in this node.
The reserved memory is determined by memblock_reserved_memory_within(),
which operates on physical addresses and returns a size in bytes.
However, reset_deferred_meminit() assumes that this function operates
on pfns and returns a page count.
The result is that in the best case the machine boots slower than
expected because more pages than needed are initialized in a single
thread, and in the worst case it panics because fewer pages than needed
are initialized early.
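The unit mismatch described above can be sketched in user space. This is
only an illustration, not kernel code: it assumes 4 KiB pages
(PAGE_SHIFT of 12), models the reserved memory as a single hypothetical
region, and uses reserved_bytes_within() as a stand-in for
memblock_reserved_memory_within(). The names struct region,
pgcnt_buggy() and pgcnt_fixed() are invented for the example, and the
final clamp against node_spanned_pages is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel takes PAGE_SHIFT from the architecture. */
#define PAGE_SHIFT 12
#define PFN_PHYS(pfn)  ((uint64_t)(pfn) << PAGE_SHIFT)   /* pfn -> byte address */
#define PHYS_PFN(addr) ((uint64_t)(addr) >> PAGE_SHIFT)  /* bytes -> page count */

/* Hypothetical model of one reserved region, in bytes. */
struct region {
	uint64_t base;
	uint64_t size;
};

/*
 * Stand-in for memblock_reserved_memory_within(): takes a physical
 * address range [start, end) and returns the overlap with the
 * reserved region in BYTES.
 */
uint64_t reserved_bytes_within(const struct region *r,
			       uint64_t start, uint64_t end)
{
	uint64_t r_end = r->base + r->size;
	uint64_t lo = r->base > start ? r->base : start;
	uint64_t hi = r_end < end ? r_end : end;

	assert(start <= end);
	return hi > lo ? hi - lo : 0;
}

/* Old behaviour: pfns passed as addresses, bytes added to a page count. */
uint64_t pgcnt_buggy(const struct region *r,
		     uint64_t start_pfn, uint64_t max_pgcnt)
{
	uint64_t reserved = reserved_bytes_within(r, start_pfn,
						  start_pfn + max_pgcnt);

	return max_pgcnt + reserved;
}

/* Fixed behaviour: convert pfns to addresses, and bytes back to pages. */
uint64_t pgcnt_fixed(const struct region *r,
		     uint64_t start_pfn, uint64_t max_pgcnt)
{
	uint64_t start_addr = PFN_PHYS(start_pfn);
	uint64_t end_addr = PFN_PHYS(start_pfn + max_pgcnt);
	uint64_t reserved = reserved_bytes_within(r, start_addr, end_addr);

	return max_pgcnt + PHYS_PFN(reserved);
}
```

With a 2G estimate (524288 pages) on a node starting at pfn 0, a 256M
reservation at 512M is missed entirely by pgcnt_buggy(), which returns
524288, while pgcnt_fixed() returns 589824. A 256K reservation at 64K
goes the other way: pgcnt_buggy() returns 786432 where pgcnt_fixed()
returns 524352.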
Link: http://lkml.kernel.org/r/20171021011707.15191-1-pasha.tatashin@oracle.com
Fixes: 864b9a393dcb ("mm: consider memblock reservations for deferred memory initialization sizing")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/linux/mmzone.h | 3 ++-
mm/page_alloc.c | 27 ++++++++++++++++++---------
2 files changed, 20 insertions(+), 10 deletions(-)
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -691,7 +691,8 @@ typedef struct pglist_data {
* is the first PFN that needs to be initialised.
*/
unsigned long first_deferred_pfn;
- unsigned long static_init_size;
+ /* Number of non-deferred pages */
+ unsigned long static_init_pgcnt;
#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -289,28 +289,37 @@ EXPORT_SYMBOL(nr_online_nodes);
int page_group_by_mobility_disabled __read_mostly;
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+
+/*
+ * Determine how many pages need to be initialized durig early boot
+ * (non-deferred initialization).
+ * The value of first_deferred_pfn will be set later, once non-deferred pages
+ * are initialized, but for now set it ULONG_MAX.
+ */
static inline void reset_deferred_meminit(pg_data_t *pgdat)
{
- unsigned long max_initialise;
- unsigned long reserved_lowmem;
+ phys_addr_t start_addr, end_addr;
+ unsigned long max_pgcnt;
+ unsigned long reserved;
/*
* Initialise at least 2G of a node but also take into account that
* two large system hashes that can take up 1GB for 0.25TB/node.
*/
- max_initialise = max(2UL << (30 - PAGE_SHIFT),
- (pgdat->node_spanned_pages >> 8));
+ max_pgcnt = max(2UL << (30 - PAGE_SHIFT),
+ (pgdat->node_spanned_pages >> 8));
/*
* Compensate the all the memblock reservations (e.g. crash kernel)
* from the initial estimation to make sure we will initialize enough
* memory to boot.
*/
- reserved_lowmem = memblock_reserved_memory_within(pgdat->node_start_pfn,
- pgdat->node_start_pfn + max_initialise);
- max_initialise += reserved_lowmem;
+ start_addr = PFN_PHYS(pgdat->node_start_pfn);
+ end_addr = PFN_PHYS(pgdat->node_start_pfn + max_pgcnt);
+ reserved = memblock_reserved_memory_within(start_addr, end_addr);
+ max_pgcnt += PHYS_PFN(reserved);
- pgdat->static_init_size = min(max_initialise, pgdat->node_spanned_pages);
+ pgdat->static_init_pgcnt = min(max_pgcnt, pgdat->node_spanned_pages);
pgdat->first_deferred_pfn = ULONG_MAX;
}
@@ -337,7 +346,7 @@ static inline bool update_defer_init(pg_
if (zone_end < pgdat_end_pfn(pgdat))
return true;
(*nr_initialised)++;
- if ((*nr_initialised > pgdat->static_init_size) &&
+ if ((*nr_initialised > pgdat->static_init_pgcnt) &&
(pfn & (PAGES_PER_SECTION - 1)) == 0) {
pgdat->first_deferred_pfn = pfn;
return false;