* [PATCH 0/2] fix a kernel oops in reading sysfs valid_zones
From: Toshi Kani @ 2017-01-26 21:44 UTC
To: akpm, gregkh
Cc: linux-mm, zhenzhang.zhang, arbab, dan.j.williams, abanman, rientjes, linux-kernel

A sysfs memory file is created for each 128MiB or 2GiB of a memory
block on x86. [1]  When the start address of a memory block is not
backed by struct page, i.e. memory range is not aligned by the memory
block size, reading its valid_zones attribute file leads to a kernel
oops.  This patch-set fixes this issue.

Patch 1 first fixes an issue in test_pages_in_a_zone() that it does
not test the start section.

Patch 2 then fixes the kernel oops by extending test_pages_in_a_zone()
to return valid [start, end).

[1] 2GB when the system has 64GB or larger memory.

---
Toshi Kani (2):
 1/2 mm/memory_hotplug.c: check start_pfn in test_pages_in_a_zone()
 2/2 base/memory, hotplug: fix a kernel oops in show_valid_zones()

---
 drivers/base/memory.c          | 12 ++++++------
 include/linux/memory_hotplug.h |  3 ++-
 mm/memory_hotplug.c            | 28 +++++++++++++++++++++-------
 3 files changed, 29 insertions(+), 14 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* [PATCH 1/2] mm/memory_hotplug.c: check start_pfn in test_pages_in_a_zone()
From: Toshi Kani @ 2017-01-26 21:44 UTC
To: akpm, gregkh
Cc: linux-mm, zhenzhang.zhang, arbab, dan.j.williams, abanman, rientjes, linux-kernel, Toshi Kani

test_pages_in_a_zone() does not check 'start_pfn' when it is
aligned by section since 'sec_end_pfn' is set equal to 'pfn'.
Since this function is called for testing the range of a sysfs
memory file, 'start_pfn' is always aligned by section.

Fix it by properly setting 'sec_end_pfn' to the next section pfn.

Also make sure that this function returns 1 only when the range
belongs to a zone.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
---
 mm/memory_hotplug.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e43142c1..7836606 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1477,7 +1477,7 @@ bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
 }
 
 /*
- * Confirm all pages in a range [start, end) is belongs to the same zone.
+ * Confirm all pages in a range [start, end) belong to the same zone.
  */
 int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
 {
@@ -1485,9 +1485,9 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
 	struct zone *zone = NULL;
 	struct page *page;
 	int i;
-	for (pfn = start_pfn, sec_end_pfn = SECTION_ALIGN_UP(start_pfn);
+	for (pfn = start_pfn, sec_end_pfn = SECTION_ALIGN_UP(start_pfn + 1);
 	     pfn < end_pfn;
-	     pfn = sec_end_pfn + 1, sec_end_pfn += PAGES_PER_SECTION) {
+	     pfn = sec_end_pfn, sec_end_pfn += PAGES_PER_SECTION) {
 		/* Make sure the memory section is present first */
 		if (!present_section_nr(pfn_to_section_nr(pfn)))
 			continue;
@@ -1506,7 +1506,11 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
 			zone = page_zone(page);
 		}
 	}
-	return 1;
+
+	if (zone)
+		return 1;
+	else
+		return 0;
 }
 
 /*

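The off-by-one described in patch 1 can be reproduced outside the kernel. The following is only a user-space sketch, not kernel code: PAGES_PER_SECTION is given an illustrative value, and the helper names first_pfn_scanned_old() and first_pfn_scanned_new() are hypothetical. Each helper walks the outer loop header exactly as written before and after the patch and reports the first pfn at which the inner per-section scan (which only runs while pfn < sec_end_pfn) would start.

```c
#include <stdio.h>

/* Illustrative value only; the real constant is architecture-dependent. */
#define PAGES_PER_SECTION 32768UL
#define SECTION_ALIGN_UP(pfn) \
	(((pfn) + PAGES_PER_SECTION - 1) & ~(PAGES_PER_SECTION - 1))
#define NO_PFN (~0UL)

/* Pre-patch loop header: first pfn the inner scan would start from. */
static unsigned long first_pfn_scanned_old(unsigned long start_pfn,
					   unsigned long end_pfn)
{
	unsigned long pfn, sec_end_pfn;

	for (pfn = start_pfn, sec_end_pfn = SECTION_ALIGN_UP(start_pfn);
	     pfn < end_pfn;
	     pfn = sec_end_pfn + 1, sec_end_pfn += PAGES_PER_SECTION) {
		/* With an aligned start_pfn, pfn == sec_end_pfn here. */
		if (pfn < sec_end_pfn)
			return pfn;
	}
	return NO_PFN;
}

/* Post-patch loop header: first pfn the inner scan would start from. */
static unsigned long first_pfn_scanned_new(unsigned long start_pfn,
					   unsigned long end_pfn)
{
	unsigned long pfn, sec_end_pfn;

	for (pfn = start_pfn, sec_end_pfn = SECTION_ALIGN_UP(start_pfn + 1);
	     pfn < end_pfn;
	     pfn = sec_end_pfn, sec_end_pfn += PAGES_PER_SECTION) {
		if (pfn < sec_end_pfn)
			return pfn;
	}
	return NO_PFN;
}
```

For a section-aligned start such as 4 * PAGES_PER_SECTION, the old header yields start_pfn + 1 (the first pass is empty and the next one begins one page late), while the new header yields start_pfn itself.
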
* [PATCH 2/2] base/memory, hotplug: fix a kernel oops in show_valid_zones()
From: Toshi Kani @ 2017-01-26 21:44 UTC
To: akpm, gregkh
Cc: linux-mm, zhenzhang.zhang, arbab, dan.j.williams, abanman, rientjes, linux-kernel, Toshi Kani

Reading a sysfs memoryN/valid_zones file leads to the following
oops when the first page of a range is not backed by struct page.
show_valid_zones() assumes that 'start_pfn' is always valid for
page_zone().

 BUG: unable to handle kernel paging request at ffffea017a000000
 IP: show_valid_zones+0x6f/0x160

Since test_pages_in_a_zone() already checks holes, extend this
function to return 'valid_start' and 'valid_end' for a given range.
show_valid_zones() then proceeds with the valid range.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Zhang Zhen <zhenzhang.zhang@huawei.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
---
 drivers/base/memory.c          | 12 ++++++------
 include/linux/memory_hotplug.h |  3 ++-
 mm/memory_hotplug.c            | 20 +++++++++++++++-----
 3 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 8ab8ea1..2c9aad9 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -389,33 +389,33 @@ static ssize_t show_valid_zones(struct device *dev,
 {
 	struct memory_block *mem = to_memory_block(dev);
 	unsigned long start_pfn, end_pfn;
+	unsigned long valid_start, valid_end, valid_pages;
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
-	struct page *first_page;
 	struct zone *zone;
 	int zone_shift = 0;
 
 	start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	end_pfn = start_pfn + nr_pages;
-	first_page = pfn_to_page(start_pfn);
 
 	/* The block contains more than one zone can not be offlined. */
-	if (!test_pages_in_a_zone(start_pfn, end_pfn))
+	if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start, &valid_end))
 		return sprintf(buf, "none\n");
 
-	zone = page_zone(first_page);
+	zone = page_zone(pfn_to_page(valid_start));
+	valid_pages = valid_end - valid_start;
 
 	/* MMOP_ONLINE_KEEP */
 	sprintf(buf, "%s", zone->name);
 
 	/* MMOP_ONLINE_KERNEL */
-	zone_shift = zone_can_shift(start_pfn, nr_pages, ZONE_NORMAL);
+	zone_shift = zone_can_shift(valid_start, valid_pages, ZONE_NORMAL);
 	if (zone_shift) {
 		strcat(buf, " ");
 		strcat(buf, (zone + zone_shift)->name);
 	}
 
 	/* MMOP_ONLINE_MOVABLE */
-	zone_shift = zone_can_shift(start_pfn, nr_pages, ZONE_MOVABLE);
+	zone_shift = zone_can_shift(valid_start, valid_pages, ZONE_MOVABLE);
 	if (zone_shift) {
 		strcat(buf, " ");
 		strcat(buf, (zone + zone_shift)->name);
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 01033fa..b6aa972 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -85,7 +85,8 @@ extern int zone_grow_waitqueues(struct zone *zone, unsigned long nr_pages);
 extern int add_one_highpage(struct page *page, int pfn, int bad_ppro);
 /* VM interface that may be used by firmware interface */
 extern int online_pages(unsigned long, unsigned long, int);
-extern int test_pages_in_a_zone(unsigned long, unsigned long);
+extern int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn,
+	unsigned long *valid_start, unsigned long *valid_end);
 extern void __offline_isolated_pages(unsigned long, unsigned long);
 
 typedef void (*online_page_callback_t)(struct page *page);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7836606..9de2f83 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1478,10 +1478,13 @@ bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
 
 /*
  * Confirm all pages in a range [start, end) belong to the same zone.
+ * When true, return its valid [start, end).
  */
-int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
+int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn,
+			 unsigned long *valid_start, unsigned long *valid_end)
 {
 	unsigned long pfn, sec_end_pfn;
+	unsigned long start, end;
 	struct zone *zone = NULL;
 	struct page *page;
 	int i;
@@ -1503,14 +1506,20 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn)
 			page = pfn_to_page(pfn + i);
 			if (zone && page_zone(page) != zone)
 				return 0;
+			if (!zone)
+				start = pfn + i;
 			zone = page_zone(page);
+			end = pfn + MAX_ORDER_NR_PAGES;
 		}
 	}
 
-	if (zone)
+	if (zone) {
+		*valid_start = start;
+		*valid_end = end;
 		return 1;
-	else
+	} else {
 		return 0;
+	}
 }
 
 /*
@@ -1837,6 +1846,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	long offlined_pages;
 	int ret, drain, retry_max, node;
 	unsigned long flags;
+	unsigned long valid_start, valid_end;
 	struct zone *zone;
 	struct memory_notify arg;
 
@@ -1847,10 +1857,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		return -EINVAL;
 	/* This makes hotplug much easier...and readable.
 	   we assume this for now. .*/
-	if (!test_pages_in_a_zone(start_pfn, end_pfn))
+	if (!test_pages_in_a_zone(start_pfn, end_pfn, &valid_start, &valid_end))
 		return -EINVAL;
 
-	zone = page_zone(pfn_to_page(start_pfn));
+	zone = page_zone(pfn_to_page(valid_start));
 	node = zone_to_nid(zone);
 	nr_pages = end_pfn - start_pfn;
 

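The idea behind patch 2, skipping holes and handing the backed sub-range back to the caller through out-parameters, can be modelled in user space. This is a simplified sketch under stated assumptions, not the kernel implementation: struct fake_page and range_in_one_zone() are hypothetical names, a negative zone stands in for "no struct page", and the scan is per page rather than per MAX_ORDER_NR_PAGES chunk, so the reported end is the last backed pfn plus one rather than a chunk boundary.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: zone < 0 means "not backed". */
struct fake_page {
	int zone;
};

/*
 * Return true if every backed page in [start, end) belongs to one zone,
 * and report the backed sub-range through valid_start/valid_end.
 * The outputs are written only on success, as in the patch.
 */
static bool range_in_one_zone(const struct fake_page *map,
			      size_t start, size_t end,
			      size_t *valid_start, size_t *valid_end)
{
	int zone = -1;
	size_t first = 0, last = 0;

	for (size_t pfn = start; pfn < end; pfn++) {
		if (map[pfn].zone < 0)
			continue;	/* hole: skip it, never look up its zone */
		if (zone >= 0 && map[pfn].zone != zone)
			return false;	/* range spans more than one zone */
		if (zone < 0)
			first = pfn;	/* first backed page seen */
		zone = map[pfn].zone;
		last = pfn + 1;		/* exclusive end of the backed range */
	}

	if (zone < 0)
		return false;		/* nothing in the range is backed */

	*valid_start = first;
	*valid_end = last;
	return true;
}
```

A caller in this model would then derive the zone from map[valid_start] instead of map[start], which mirrors how show_valid_zones() uses valid_start in place of start_pfn after the patch.
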
* Re: [PATCH 2/2] base/memory, hotplug: fix a kernel oops in show_valid_zones()
From: Andrew Morton @ 2017-01-26 21:52 UTC
To: Toshi Kani
Cc: gregkh, linux-mm, zhenzhang.zhang, arbab, dan.j.williams, abanman, rientjes, linux-kernel

On Thu, 26 Jan 2017 14:44:15 -0700 Toshi Kani <toshi.kani@hpe.com> wrote:

> Reading a sysfs memoryN/valid_zones file leads to the following
> oops when the first page of a range is not backed by struct page.
> show_valid_zones() assumes that 'start_pfn' is always valid for
> page_zone().
>
>  BUG: unable to handle kernel paging request at ffffea017a000000
>  IP: show_valid_zones+0x6f/0x160
>
> Since test_pages_in_a_zone() already checks holes, extend this
> function to return 'valid_start' and 'valid_end' for a given range.
> show_valid_zones() then proceeds with the valid range.

This doesn't apply to current mainline due to changes in
zone_can_shift().  Please redo and resend.

Please also update the changelog to provide sufficient information for
others to decide which kernel(s) need the fix.  In particular: under
what circumstances will it occur?  On real machines which real people
own?

* Re: [PATCH 2/2] base/memory, hotplug: fix a kernel oops in show_valid_zones()
From: Kani, Toshimitsu @ 2017-01-26 22:26 UTC
To: akpm@linux-foundation.org
Cc: zhenzhang.zhang@huawei.com, linux-kernel@vger.kernel.org, arbab@linux.vnet.ibm.com, abanman@sgi.com, linux-mm@kvack.org, dan.j.williams@intel.com, gregkh@linuxfoundation.org, rientjes@google.com

On Thu, 2017-01-26 at 13:52 -0800, Andrew Morton wrote:
> On Thu, 26 Jan 2017 14:44:15 -0700 Toshi Kani <toshi.kani@hpe.com>
> wrote:
>
> > Reading a sysfs memoryN/valid_zones file leads to the following
> > oops when the first page of a range is not backed by struct page.
> > show_valid_zones() assumes that 'start_pfn' is always valid for
> > page_zone().
> >
> >  BUG: unable to handle kernel paging request at ffffea017a000000
> >  IP: show_valid_zones+0x6f/0x160
> >
> > Since test_pages_in_a_zone() already checks holes, extend this
> > function to return 'valid_start' and 'valid_end' for a given range.
> > show_valid_zones() then proceeds with the valid range.
>
> This doesn't apply to current mainline due to changes in
> zone_can_shift().  Please redo and resend.

Sorry, I will rebase to the -mm tree and resend the patches.

> Please also update the changelog to provide sufficient information
> for others to decide which kernel(s) need the fix.  In particular:
> under what circumstances will it occur?  On real machines which real
> people own?

Yes, this issue happens on real x86 machines with 64GiB or more memory.
On such systems, the memory block size is bumped up to 2GiB. [1]

Here is an example system.  0x3240000000 is only aligned by 1GiB and
its memory block starts from 0x3200000000, which is not backed by
struct page.

 BIOS-e820: [mem 0x0000003240000000-0x000000603fffffff] usable

I will add the descriptions to the patch.

[1] http://lkml.iu.edu/hypermail/linux/kernel/1411.0/02287.html

Thanks,
-Toshi

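The address arithmetic in the example above can be checked with a few lines of C. This is a stand-alone sketch, not kernel code; the helper names is_aligned() and block_start() and the GIB macro are hypothetical, and the block size is assumed to be a power of two.

```c
#include <stdint.h>

#define GIB (1ULL << 30)

/* True if addr is a multiple of align (align must be a power of two). */
static int is_aligned(uint64_t addr, uint64_t align)
{
	return (addr & (align - 1)) == 0;
}

/* Round addr down to the start of its block (power-of-two block_size). */
static uint64_t block_start(uint64_t addr, uint64_t block_size)
{
	return addr & ~(block_size - 1);
}
```

With the e820 entry quoted above, is_aligned(0x3240000000, GIB) holds but is_aligned(0x3240000000, 2 * GIB) does not, and block_start(0x3240000000, 2 * GIB) gives 0x3200000000. The first 1GiB of that 2GiB memory block therefore lies below the start of usable memory, which is the part reported as not backed by struct page.
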
* Re: [PATCH 2/2] base/memory, hotplug: fix a kernel oops in show_valid_zones()
From: gregkh @ 2017-01-27 7:48 UTC
To: Kani, Toshimitsu
Cc: akpm@linux-foundation.org, zhenzhang.zhang@huawei.com, linux-kernel@vger.kernel.org, arbab@linux.vnet.ibm.com, abanman@sgi.com, linux-mm@kvack.org, dan.j.williams@intel.com, rientjes@google.com

On Thu, Jan 26, 2017 at 10:26:23PM +0000, Kani, Toshimitsu wrote:
> On Thu, 2017-01-26 at 13:52 -0800, Andrew Morton wrote:
> > On Thu, 26 Jan 2017 14:44:15 -0700 Toshi Kani <toshi.kani@hpe.com>
> > wrote:
> >
> > > Reading a sysfs memoryN/valid_zones file leads to the following
> > > oops when the first page of a range is not backed by struct page.
> > > show_valid_zones() assumes that 'start_pfn' is always valid for
> > > page_zone().
> > >
> > >  BUG: unable to handle kernel paging request at ffffea017a000000
> > >  IP: show_valid_zones+0x6f/0x160
> > >
> > > Since test_pages_in_a_zone() already checks holes, extend this
> > > function to return 'valid_start' and 'valid_end' for a given range.
> > > show_valid_zones() then proceeds with the valid range.
> >
> > This doesn't apply to current mainline due to changes in
> > zone_can_shift().  Please redo and resend.
>
> Sorry, I will rebase to the -mm tree and resend the patches.
>
> > Please also update the changelog to provide sufficient information
> > for others to decide which kernel(s) need the fix.  In particular:
> > under what circumstances will it occur?  On real machines which real
> > people own?
>
> Yes, this issue happens on real x86 machines with 64GiB or more memory.
> On such systems, the memory block size is bumped up to 2GiB. [1]
>
> Here is an example system.  0x3240000000 is only aligned by 1GiB and
> its memory block starts from 0x3200000000, which is not backed by
> struct page.
>
>  BIOS-e820: [mem 0x0000003240000000-0x000000603fffffff] usable
>
> I will add the descriptions to the patch.

Should it also be backported to the stable kernels to resolve the issue
there?

thanks,

greg k-h

* Re: [PATCH 2/2] base/memory, hotplug: fix a kernel oops in show_valid_zones()
From: Kani, Toshimitsu @ 2017-01-27 17:47 UTC
To: gregkh@linuxfoundation.org
Cc: zhenzhang.zhang@huawei.com, linux-kernel@vger.kernel.org, arbab@linux.vnet.ibm.com, abanman@sgi.com, linux-mm@kvack.org, dan.j.williams@intel.com, akpm@linux-foundation.org, rientjes@google.com

On Fri, 2017-01-27 at 08:48 +0100, gregkh@linuxfoundation.org wrote:
> On Thu, Jan 26, 2017 at 10:26:23PM +0000, Kani, Toshimitsu wrote:
> > On Thu, 2017-01-26 at 13:52 -0800, Andrew Morton wrote:
> > > On Thu, 26 Jan 2017 14:44:15 -0700 Toshi Kani <toshi.kani@hpe.com>
> > > wrote:
> > >
> > > > Reading a sysfs memoryN/valid_zones file leads to the following
> > > > oops when the first page of a range is not backed by struct
> > > > page. show_valid_zones() assumes that 'start_pfn' is always
> > > > valid for page_zone().
> > > >
> > > >  BUG: unable to handle kernel paging request at ffffea017a000000
> > > >  IP: show_valid_zones+0x6f/0x160
> > > >
> > > > Since test_pages_in_a_zone() already checks holes, extend this
> > > > function to return 'valid_start' and 'valid_end' for a given
> > > > range. show_valid_zones() then proceeds with the valid range.
> > >
> > > This doesn't apply to current mainline due to changes in
> > > zone_can_shift().  Please redo and resend.
> >
> > Sorry, I will rebase to the -mm tree and resend the patches.
> >
> > > Please also update the changelog to provide sufficient
> > > information for others to decide which kernel(s) need the
> > > fix.  In particular: under what circumstances will it occur?  On
> > > real machines which real people own?
> >
> > Yes, this issue happens on real x86 machines with 64GiB or more
> > memory.  On such systems, the memory block size is bumped up to
> > 2GiB. [1]
> >
> > Here is an example system.  0x3240000000 is only aligned by 1GiB
> > and its memory block starts from 0x3200000000, which is not backed
> > by struct page.
> >
> >  BIOS-e820: [mem 0x0000003240000000-0x000000603fffffff] usable
> >
> > I will add the descriptions to the patch.
>
> Should it also be backported to the stable kernels to resolve the
> issue there?

Yes, it should be backported to the stable kernels.  The memory block
size change was made by commit bdee237c034, which was accepted to 3.9.
However, this patch-set depends on (and fixes) the change to
test_pages_in_a_zone() made by commit 5f0f2887f4, which was accepted to
4.4.  So, in the current form, I'd recommend we backport it up to 4.4.

Thanks,
-Toshi