* [PATCH 1/3] x86: Kill E820_RESERVED_KERN
2015-08-27 7:05 [PATCH 0/3] PM / hibernate: Fix hibernation panic caused by inconsistent e820 memory map Chen Yu
@ 2015-08-27 7:05 ` Chen Yu
2015-08-27 7:06 ` [PATCH 2/3] PM / hibernate: avoid unsafe pages in e820 reserved regions Chen Yu
2015-08-27 7:06 ` [PATCH 3/3] PM / hibernate: Remove the restriction when checking memory size before/after hibernation Chen Yu
2 siblings, 0 replies; 6+ messages in thread
From: Chen Yu @ 2015-08-27 7:05 UTC
To: tglx, mingo, rjw, pavel, hpa
Cc: len.brown, yinghai, joeyli.kernel, rui.zhang, linux-pm,
linux-kernel
From: Yinghai Lu <yinghai@kernel.org>
E820_RESERVED_KERN can cause resume from hibernation to fail:
https://bugzilla.kernel.org/show_bug.cgi?id=96111
This happens because E820_RESERVED_KERN can leave e820 table entries
that are not page aligned. During boot, the non-page-aligned part at such
a boundary is mistaken for a "hole" and added to the nosave region list.
When resuming, the hibernation code treats these regions as invalid and
terminates the resume process, which causes the failure.
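To make the failure mode concrete, here is a simplified userspace model of the pre-patch loop in e820_mark_nosave_regions(). Only part of that function appears in the diff below, so the rest is an approximation, and the limit_pfn cut-off is dropped. The type and function names are hypothetical stand-ins and 4 KiB pages are assumed. A setup_data node that does not start or end on a page boundary splits one usable region into RAM / RESERVED_KERN / RAM entries, and each split boundary yields a one-page nosave "hole" even though all of the memory is usable:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel macros; 4 KiB pages assumed. */
#define PAGE_SHIFT  12
#define PAGE_SIZE   (1ULL << PAGE_SHIFT)
#define PFN_DOWN(x) ((uint64_t)(x) >> PAGE_SHIFT)
#define PFN_UP(x)   (((uint64_t)(x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

enum { E820_RAM = 1, E820_RESERVED_KERN = 128 };

struct e820entry {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
};

/* Half-open pfn range [start, end). */
struct pfn_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Hypothetical model of the pre-patch e820_mark_nosave_regions() loop
 * (limit_pfn cut-off omitted). A gap between the rounded-down end of one
 * entry and the rounded-up start of the next is recorded as a hole, and
 * any type other than E820_RAM/E820_RESERVED_KERN is recorded as nosave.
 * Returns the number of ranges stored in out[].
 */
static size_t model_nosave_regions(const struct e820entry *map, size_t n,
				   struct pfn_range *out, size_t max)
{
	size_t count = 0;
	uint64_t pfn = 0;

	assert(out != NULL || max == 0);

	for (size_t i = 0; i < n; i++) {
		const struct e820entry *ei = &map[i];

		/* A non-page-aligned boundary leaves a one-page "hole" here. */
		if (pfn < PFN_UP(ei->addr) && count < max)
			out[count++] = (struct pfn_range){ pfn, PFN_UP(ei->addr) };

		pfn = PFN_DOWN(ei->addr + ei->size);

		if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN &&
		    count < max)
			out[count++] = (struct pfn_range){ PFN_UP(ei->addr), pfn };
	}
	return count;
}
```

For a map split at 0x1800 and 0x2800, the model records pfns 1 and 2 as nosave holes. With a single merged E820_RAM entry covering the same 16 KiB, which is what the map looks like once setup_data no longer splits it, no nosave range is recorded.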
So we need to remove the impact of E820_RESERVED_KERN on hibernation.
In fact, we do not need to touch the e820 map at all, and
E820_RESERVED_KERN can be removed safely because:
1. E820_RESERVED_KERN was originally introduced to do early allocation
for setup_data when early_res was still used together with the e820 map.
We now use memblock for early resource reservation/allocation, and
setup_data is already reserved in memblock early on.
2. On the kexec path, kexec generates setup_data (kexec-tools now creates
SETUP_EFI and SETUP_E820_EXT) and passes a pointer to the second kernel,
which reserves setup_data on its own without using the e820 map.
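Point 1 relies on setup_data being a singly linked list of physical addresses that early boot code can walk and reserve node by node. The sketch below models that walk in userspace, along the lines of memblock_x86_reserve_range_setup_data() (whose body is not shown in this patch). The fake_phys buffer, put_setup_data() and reserve_setup_data() are hypothetical stand-ins for physical memory, early_memremap() and memblock_reserve():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* setup_data type values as defined in asm/bootparam.h. */
#define SETUP_E820_EXT 1
#define SETUP_EFI      4

/* Header of struct setup_data; the variable-length payload follows it. */
struct setup_data_hdr {
	uint64_t next; /* physical address of the next node, 0 ends the list */
	uint32_t type;
	uint32_t len;  /* payload length in bytes */
};

struct reserved_range {
	uint64_t start;
	uint64_t size;
};

/* Hypothetical flat "physical memory" so the walk runs in userspace. */
static unsigned char fake_phys[4096];

/* Hypothetical helper: store a node header at physical address pa. */
static void put_setup_data(uint64_t pa, uint64_t next, uint32_t type,
			   uint32_t len)
{
	struct setup_data_hdr hdr = { next, type, len };

	assert(pa + sizeof(hdr) + len <= sizeof(fake_phys));
	memcpy(fake_phys + pa, &hdr, sizeof(hdr));
}

/*
 * Walk the list starting at pa_data (boot_params.hdr.setup_data in the
 * kernel) and record one reservation per node covering header + payload,
 * the same size the removed e820_reserve_setup_data() used.
 */
static size_t reserve_setup_data(uint64_t pa_data,
				 struct reserved_range *out, size_t max)
{
	size_t n = 0;

	while (pa_data && n < max) {
		struct setup_data_hdr hdr;

		assert(pa_data + sizeof(hdr) <= sizeof(fake_phys));
		memcpy(&hdr, fake_phys + pa_data, sizeof(hdr)); /* ~ early_memremap() */
		out[n].start = pa_data;
		out[n].size = sizeof(hdr) + hdr.len;
		n++;
		pa_data = hdr.next;
	}
	return n;
}
```

Each reservation is a plain address range, so the walk never has to rewrite or re-sanitize e820 entries.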
This makes the code simpler and at the same time fixes the hibernation
bug mentioned above: E820_RAM and E820_RESERVED_KERN ranges are adjacent
and their boundary is not page aligned, which hibernation cannot handle.
Link: https://bugzilla.opensuse.org/show_bug.cgi?id=913885
Link: https://bugzilla.kernel.org/show_bug.cgi?id=96111
Reported-by: "Lee, Chun-Yi" <jlee@suse.com>
Tested-by: "Lee, Chun-Yi" <jlee@suse.com>
Reported-by: "Tian, Ye" <yex.tian@intel.com>
Tested-by: "Tian, Ye" <yex.tian@intel.com>
Cc: "Lee, Chun-Yi" <jlee@suse.com>
Cc: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Richard L Maliszewski <richard.l.maliszewski@intel.com>
Cc: Gang Wei <gang.wei@intel.com>
Cc: Shane Wang <shane.wang@intel.com>
Cc: tboot-devel@lists.sourceforge.net
Cc: stable@vger.kernel.org
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
arch/x86/include/uapi/asm/e820.h | 8 --------
arch/x86/kernel/e820.c | 6 ++----
arch/x86/kernel/setup.c | 25 -------------------------
arch/x86/kernel/tboot.c | 3 +--
arch/x86/mm/init_64.c | 11 ++++-------
5 files changed, 7 insertions(+), 46 deletions(-)
diff --git a/arch/x86/include/uapi/asm/e820.h b/arch/x86/include/uapi/asm/e820.h
index 0f457e6..a9216a1 100644
--- a/arch/x86/include/uapi/asm/e820.h
+++ b/arch/x86/include/uapi/asm/e820.h
@@ -45,14 +45,6 @@
*/
#define E820_PRAM 12
-/*
- * reserved RAM used by kernel itself
- * if CONFIG_INTEL_TXT is enabled, memory of this type will be
- * included in the S3 integrity calculation and so should not include
- * any memory that BIOS might alter over the S3 transition
- */
-#define E820_RESERVED_KERN 128
-
#ifndef __ASSEMBLY__
#include <linux/types.h>
struct e820entry {
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index a102564..2770069 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -134,7 +134,6 @@ static void __init e820_print_type(u32 type)
{
switch (type) {
case E820_RAM:
- case E820_RESERVED_KERN:
printk(KERN_CONT "usable");
break;
case E820_RESERVED:
@@ -693,7 +692,7 @@ void __init e820_mark_nosave_regions(unsigned long limit_pfn)
pfn = PFN_DOWN(ei->addr + ei->size);
- if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN)
+ if (ei->type != E820_RAM)
register_nosave_region(PFN_UP(ei->addr), pfn);
if (pfn >= limit_pfn)
@@ -914,7 +913,6 @@ void __init finish_e820_parsing(void)
static inline const char *e820_type_to_string(int e820_type)
{
switch (e820_type) {
- case E820_RESERVED_KERN:
case E820_RAM: return "System RAM";
case E820_ACPI: return "ACPI Tables";
case E820_NVS: return "ACPI Non-volatile Storage";
@@ -1111,7 +1109,7 @@ void __init memblock_x86_fill(void)
if (end != (resource_size_t)end)
continue;
- if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN)
+ if (ei->type != E820_RAM)
continue;
memblock_add(ei->addr, ei->size);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 80f874b..2ee40ef 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -457,29 +457,6 @@ static void __init parse_setup_data(void)
}
}
-static void __init e820_reserve_setup_data(void)
-{
- struct setup_data *data;
- u64 pa_data;
-
- pa_data = boot_params.hdr.setup_data;
- if (!pa_data)
- return;
-
- while (pa_data) {
- data = early_memremap(pa_data, sizeof(*data));
- e820_update_range(pa_data, sizeof(*data)+data->len,
- E820_RAM, E820_RESERVED_KERN);
- pa_data = data->next;
- early_memunmap(data, sizeof(*data));
- }
-
- sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
- memcpy(&e820_saved, &e820, sizeof(struct e820map));
- printk(KERN_INFO "extended physical RAM map:\n");
- e820_print_map("reserve setup_data");
-}
-
static void __init memblock_x86_reserve_range_setup_data(void)
{
struct setup_data *data;
@@ -1018,8 +995,6 @@ void __init setup_arch(char **cmdline_p)
early_dump_pci_devices();
#endif
- /* update the e820_saved too */
- e820_reserve_setup_data();
finish_e820_parsing();
if (efi_enabled(EFI_BOOT))
diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
index 91a4496..3c2752a 100644
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -195,8 +195,7 @@ static int tboot_setup_sleep(void)
tboot->num_mac_regions = 0;
for (i = 0; i < e820.nr_map; i++) {
- if ((e820.map[i].type != E820_RAM)
- && (e820.map[i].type != E820_RESERVED_KERN))
+ if (e820.map[i].type != E820_RAM)
continue;
add_mac_region(e820.map[i].addr, e820.map[i].size);
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 3fba623..bd302a9 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -412,8 +412,7 @@ phys_pte_init(pte_t *pte_page, unsigned long addr, unsigned long end,
next = (addr & PAGE_MASK) + PAGE_SIZE;
if (addr >= end) {
if (!after_bootmem &&
- !e820_any_mapped(addr & PAGE_MASK, next, E820_RAM) &&
- !e820_any_mapped(addr & PAGE_MASK, next, E820_RESERVED_KERN))
+ !e820_any_mapped(addr & PAGE_MASK, next, E820_RAM))
set_pte(pte, __pte(0));
continue;
}
@@ -459,9 +458,8 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end,
next = (address & PMD_MASK) + PMD_SIZE;
if (address >= end) {
- if (!after_bootmem &&
- !e820_any_mapped(address & PMD_MASK, next, E820_RAM) &&
- !e820_any_mapped(address & PMD_MASK, next, E820_RESERVED_KERN))
+ if (!after_bootmem && !e820_any_mapped(
+ address & PMD_MASK, next, E820_RAM))
set_pmd(pmd, __pmd(0));
continue;
}
@@ -534,8 +532,7 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
next = (addr & PUD_MASK) + PUD_SIZE;
if (addr >= end) {
if (!after_bootmem &&
- !e820_any_mapped(addr & PUD_MASK, next, E820_RAM) &&
- !e820_any_mapped(addr & PUD_MASK, next, E820_RESERVED_KERN))
+ !e820_any_mapped(addr & PUD_MASK, next, E820_RAM))
set_pud(pud, __pud(0));
continue;
}
--
1.8.4.2
* [PATCH 2/3] PM / hibernate: avoid unsafe pages in e820 reserved regions
2015-08-27 7:05 [PATCH 0/3] PM / hibernate: Fix hibernation panic caused by inconsistent e820 memory map Chen Yu
2015-08-27 7:05 ` [PATCH 1/3] x86: Kill E820_RESERVED_KERN Chen Yu
@ 2015-08-27 7:06 ` Chen Yu
2015-08-27 7:06 ` [PATCH 3/3] PM / hibernate: Remove the restriction when checking memory size before/after hibernation Chen Yu
2 siblings, 0 replies; 6+ messages in thread
From: Chen Yu @ 2015-08-27 7:06 UTC
To: tglx, mingo, rjw, pavel, hpa
Cc: len.brown, yinghai, joeyli.kernel, rui.zhang, linux-pm,
linux-kernel
From: "Lee, Chun-Yi" <joeyli.kernel@gmail.com>
If the machine does not keep the e820 map persistent across hibernation
and resume, a page fault may occur when the image is written to the
snapshot buffer:
[ 17.929495] BUG: unable to handle kernel paging request at ffff880069d4f000
[ 17.933469] IP: [<ffffffff810a1cf0>] load_image_lzo+0x810/0xe40
[ 17.933469] PGD 2194067 PUD 77ffff067 PMD 2197067 PTE 0
[ 17.933469] Oops: 0002 [#1] SMP
...
The page at ffff880069d4f000 lies in an e820 reserved region of the
resume boot kernel:
[ 0.000000] BIOS-e820: [mem 0x0000000069d4f000-0x0000000069e12fff] reserved
...
[ 0.000000] PM: Registered nosave memory: [mem 0x69d4f000-0x69e12fff]
So snapshot.c marks the pfn in the forbidden pages map. However, this
page is also present in the memory bitmap of the snapshot image, because
it is an original page used by the image kernel, so prepare_image() also
marks it as an unsafe (free) page.
This means that a page in an e820 reserved region at resume time is
marked both "forbidden" and "free", which makes get_buffer() treat it as
an allocated unsafe page. snapshot_write_next() then returns this page to
load_image(), which writes to this address even though the page was never
actually allocated, and so we get a page fault.
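The collision between the two bitmaps can be modeled with a pair of flags per page frame. The sketch below assumes the lowmem path of get_buffer(), which uses a frame directly when it is both forbidden and free, the same marking a frame allocated for the image receives. All type and function names here are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-frame view of the two snapshot.c bitmaps. */
struct frame_flags {
	bool forbidden; /* forbidden_pages_map */
	bool free;      /* free_pages_map */
};

enum load_target {
	LOAD_IN_PLACE,     /* write the loaded page straight into this frame */
	LOAD_VIA_SAFE_PAGE /* stage it in a separately allocated safe page */
};

/* Model of the lowmem test in get_buffer(): forbidden + free is read as
 * "this frame was allocated for the image, use it directly". */
static enum load_target model_get_buffer(struct frame_flags f)
{
	if (f.forbidden && f.free)
		return LOAD_IN_PLACE;
	return LOAD_VIA_SAFE_PAGE;
}

/* A frame the resume kernel really allocated carries both flags. */
static struct frame_flags allocated_frame(void)
{
	return (struct frame_flags){ .forbidden = true, .free = true };
}

/*
 * An e820-reserved frame before the fix: registered as nosave
 * (forbidden), then marked free by mark_unsafe_pages() if the image
 * kernel used it.
 */
static struct frame_flags e820_reserved_frame_prepatch(bool in_image)
{
	struct frame_flags f = { .forbidden = true, .free = false };

	if (in_image)
		f.free = true;
	return f;
}
```

Before the fix, an e820-reserved frame that the image kernel used ends up with exactly the same flags as a genuinely allocated frame, so the model, like the kernel, cannot tell them apart.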
Although the root cause lies in the BIOS, an aggressive check with a
clear kernel message is better than a page fault for tracking down the
issue, especially when no serial console is available.
This patch adds a check in mark_unsafe_pages() for free pages that lie
in a nosave region. If one is found, it prints a message and returns an
error to stop the whole S4 resume process:
[ 8.166004] PM: Image loading progress: 0%
[ 8.658717] PM: 0x6796c000 in e820 nosave region: [mem 0x6796c000-0x6796cfff]
[ 8.918737] PM: Read 2511940 kbytes in 1.04 seconds (2415.32 MB/s)
[ 8.926633] PM: Error -14 resuming
[ 8.933534] PM: Failed to load hibernation image, recovering.
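A userspace model of the new check follows. It assumes an array in place of the kernel's nosave_regions list and writes the pr_err() text into a buffer instead of the log; the pfn_valid() test is omitted, and the model_* names are hypothetical. It reproduces the range formatting used in the messages above:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SHIFT 12 /* 4 KiB pages assumed */

/* Simplified nosave region: half-open pfn range [start_pfn, end_pfn). */
struct nosave_region {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* Hypothetical model of is_nosave_page(): array instead of a list, and
 * the message goes into msg instead of pr_err(). */
static bool model_is_nosave_page(const struct nosave_region *regions,
				 size_t nr, unsigned long pfn,
				 char *msg, size_t msglen)
{
	for (size_t i = 0; i < nr; i++) {
		const struct nosave_region *r = &regions[i];

		if (pfn >= r->start_pfn && pfn < r->end_pfn) {
			snprintf(msg, msglen,
				 "PM: %#010llx in e820 nosave region: "
				 "[mem %#010llx-%#010llx]",
				 (unsigned long long)pfn << PAGE_SHIFT,
				 (unsigned long long)r->start_pfn << PAGE_SHIFT,
				 ((unsigned long long)r->end_pfn << PAGE_SHIFT) - 1);
			return true;
		}
	}
	return false;
}

/* Model of the patched mark_unsafe_pages() loop: -EFAULT on the first
 * image pfn that lies in a nosave region, 0 otherwise. */
static int model_mark_unsafe(const struct nosave_region *regions, size_t nr,
			     const unsigned long *pfns, size_t npfns,
			     char *msg, size_t msglen)
{
	assert(msg != NULL && msglen > 0);
	memset(msg, 0, msglen);

	for (size_t i = 0; i < npfns; i++)
		if (model_is_nosave_page(regions, nr, pfns[i], msg, msglen))
			return -EFAULT;
	return 0;
}
```

On Linux, EFAULT is 14, which matches the "Error -14 resuming" line in the log.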
Signed-off-by: Lee, Chun-Yi <jlee@suse.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
kernel/power/snapshot.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 5235dd4..c24d5a2 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -955,6 +955,25 @@ static void mark_nosave_pages(struct memory_bitmap *bm)
}
}
+static bool is_nosave_page(unsigned long pfn)
+{
+ struct nosave_region *region;
+
+ list_for_each_entry(region, &nosave_regions, list) {
+ if (pfn >= region->start_pfn && pfn < region->end_pfn) {
+ pr_err("PM: %#010llx in e820 nosave region: "
+ "[mem %#010llx-%#010llx]\n",
+ (unsigned long long) pfn << PAGE_SHIFT,
+ (unsigned long long) region->start_pfn << PAGE_SHIFT,
+ ((unsigned long long) region->end_pfn << PAGE_SHIFT)
+ - 1);
+ return true;
+ }
+ }
+
+ return false;
+}
+
/**
* create_basic_memory_bitmaps - create bitmaps needed for marking page
* frames that should not be saved and free page frames. The pointers
@@ -2023,7 +2042,7 @@ static int mark_unsafe_pages(struct memory_bitmap *bm)
do {
pfn = memory_bm_next_pfn(bm);
if (likely(pfn != BM_END_OF_MAP)) {
- if (likely(pfn_valid(pfn)))
+ if (likely(pfn_valid(pfn)) && !is_nosave_page(pfn))
swsusp_set_page_free(pfn_to_page(pfn));
else
return -EFAULT;
--
1.8.4.2
* [PATCH 3/3] PM / hibernate: Remove the restriction when checking memory size before/after hibernation
2015-08-27 7:05 [PATCH 0/3] PM / hibernate: Fix hibernation panic caused by inconsistent e820 memory map Chen Yu
2015-08-27 7:05 ` [PATCH 1/3] x86: Kill E820_RESERVED_KERN Chen Yu
2015-08-27 7:06 ` [PATCH 2/3] PM / hibernate: avoid unsafe pages in e820 reserved regions Chen Yu
@ 2015-08-27 7:06 ` Chen Yu
2015-08-28 5:37 ` Ingo Molnar
2 siblings, 1 reply; 6+ messages in thread
From: Chen Yu @ 2015-08-27 7:06 UTC
To: tglx, mingo, rjw, pavel, hpa
Cc: len.brown, yinghai, joeyli.kernel, rui.zhang, linux-pm,
linux-kernel
Resuming from hibernation might fail if the system has a different
number of page frames before and after hibernation, because the current
implementation regards this as an invalid resume. However, consider the
following scenario: the resuming system has a larger memory capacity
than it had before hibernation, and its memory regions are a superset of
the earlier ones. The resume should be allowed in this case, for example
when someone installs more DRAM before resuming from hibernation.
Here is an example of this situation:
e820 memory map before hibernation:
BIOS-e820: [mem 0x0000000020200000-0x0000000077517fff] usable
BIOS-e820: [mem 0x0000000077518000-0x0000000077567fff] reserved
e820 memory map during resuming:
BIOS-e820: [mem 0x0000000020200000-0x000000007753ffff] usable
BIOS-e820: [mem 0x0000000077540000-0x0000000077567fff] reserved
The current code terminates the resume because the usable memory sizes
differ, but the resume should be allowed to continue because
[0x0000000020200000-0x000000007753ffff] is a superset of
[0x0000000020200000-0x0000000077517fff].
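The containment argument can be checked with simple arithmetic. This is a minimal sketch that looks only at the single usable range from each map above (num_physpages in the kernel counts all RAM, so the real totals differ), and the helper names are hypothetical. Note that the patch itself does not implement a containment check; it drops the page-count comparison and relies on is_nosave_page():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* 4 KiB pages assumed */

/* Inclusive physical range, as printed in "BIOS-e820: [mem A-B]". */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/* True if every byte of inner also lies inside outer. */
static bool range_contains(struct mem_range outer, struct mem_range inner)
{
	assert(inner.start <= inner.end && outer.start <= outer.end);
	return outer.start <= inner.start && inner.end <= outer.end;
}

/* Number of 4 KiB page frames in a page-aligned inclusive range. */
static uint64_t range_pages(struct mem_range r)
{
	return (r.end + 1 - r.start) >> PAGE_SHIFT;
}
```

For the example maps, the usable range at resume holds 357184 pages versus 357144 before hibernation, a difference of 40 pages (160 KiB), so an exact page-count comparison fails even though the old range is fully contained in the new one.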
This patch removes the constraint that the number of page frames must
be exactly the same before and after hibernation.
Note: This patch depends on commit ec93ef809f34 ("PM / hibernate: avoid
unsafe pages in e820 reserved regions") being applied first.
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
kernel/power/snapshot.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index c24d5a2..5b1a071 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -2072,8 +2072,12 @@ static int check_header(struct swsusp_info *info)
char *reason;
reason = check_image_kernel(info);
- if (!reason && info->num_physpages != get_num_physpages())
- reason = "memory size";
+ /*
+ * No need to check num_physpages with get_num_physpages
+ * as we did before(please refer to git log), because
+ * is_nosave_page will ensure that each page is safe
+ * to be restored.
+ */
if (reason) {
printk(KERN_ERR "PM: Image mismatch: %s\n", reason);
return -EPERM;
--
1.8.4.2