* [PATCH] x86/mm, hibernate: Fix misjudgment of register setup_data page to nosave region
@ 2015-01-30 3:58 Lee, Chun-Yi
2015-01-30 7:35 ` Yinghai Lu
2015-01-30 8:30 ` Yinghai Lu
0 siblings, 2 replies; 6+ messages in thread
From: Lee, Chun-Yi @ 2015-01-30 3:58 UTC (permalink / raw)
To: Rafael J. Wysocki, Ingo Molnar, Pavel Machek
Cc: Thomas Gleixner, x86, linux-kernel, Lee, Chun-Yi, Len Brown,
H. Peter Anvin, Yinghai Lu, Takashi Iwai
The reserve setup_data action break usable regions to not align to
page size. As following case:
BIOS-e820: [mem 0x0000000000088000-0x00000000000bffff] reserved
BIOS-e820: [mem 0x0000000000100000-0x0000000094caffff] usable
...
e820: update [mem 0x93c5f018-0x93c6f057] usable ==> usable /* reserve setup_data */
e820: update [mem 0x93c4f018-0x93c5e057] usable ==> usable /* reserve setup_data */
...
reserve setup_data: [mem0x0000000000088000-0x00000000000bffff] reserved
reserve setup_data: [mem0x0000000000100000-0x0000000093c4f017] usable /* not align */
reserve setup_data: [mem0x0000000093c4f018-0x0000000093c5e057] usable /* not align */
reserve setup_data: [mem0x0000000093c5e058-0x0000000093c5f017] usable /* not align */
reserve setup_data: [mem0x0000000093c5f018-0x0000000093c6f057] usable /* not align */
reserve setup_data: [mem0x0000000093c6f058-0x0000000094caffff] usable
The codes in e820_mark_nosave_regions() check pfn of each e820
entry to find out the hole between two entries then register it to
nosave region. This logic misjudges the non-align but continuous usable
region, then register one page in usable region to nosave. As following:
PM: Registered nosave memory: [mem 0x000c0000-0x000fffff]
PM: Registered nosave memory: [mem 0x93c4f000-0x93c4ffff] /* misjudgment */
PM: Registered nosave memory: [mem 0x93c5e000-0x93c5efff] /* misjudgment */
PM: Registered nosave memory: [mem 0x93c5f000-0x93c5ffff] /* misjudgment */
PM: Registered nosave memory: [mem 0x93c6f000-0x93c6ffff] /* misjudgment */
PM: Registered nosave memory: [mem 0x94cb0000-0x960affff]
The issue causes some usable pages didn't collect to hibernate snapshot image.
And, it also misjudges a usable page in nosave regions then hibernate resuming
process interrupted by the unsafe pages checking:
https://bugzilla.opensuse.org/show_bug.cgi?id=913885
This patch changed the code in e820_mark_nosave_regions() to check the
address instead of pfn to avoid above issue.
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <len.brown@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Lee, Chun-Yi <jlee@suse.com>
---
arch/x86/kernel/e820.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 49f8864..6eae021 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -687,15 +687,16 @@ void __init parse_e820_ext(u64 phys_addr, u32 data_len)
void __init e820_mark_nosave_regions(unsigned long limit_pfn)
{
int i;
- unsigned long pfn = 0;
+ unsigned long pfn = 0, pfnaddr = 0;
for (i = 0; i < e820.nr_map; i++) {
struct e820entry *ei = &e820.map[i];
- if (pfn < PFN_UP(ei->addr))
+ if (pfnaddr < ei->addr)
register_nosave_region(pfn, PFN_UP(ei->addr));
- pfn = PFN_DOWN(ei->addr + ei->size);
+ pfnaddr = ei->addr + ei->size;
+ pfn = PFN_DOWN(pfnaddr);
if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN)
register_nosave_region(PFN_UP(ei->addr), pfn);
--
1.8.4.5
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/mm, hibernate: Fix misjudgment of register setup_data page to nosave region
2015-01-30 3:58 [PATCH] x86/mm, hibernate: Fix misjudgment of register setup_data page to nosave region Lee, Chun-Yi
@ 2015-01-30 7:35 ` Yinghai Lu
2015-01-30 14:41 ` joeyli
2015-01-30 8:30 ` Yinghai Lu
1 sibling, 1 reply; 6+ messages in thread
From: Yinghai Lu @ 2015-01-30 7:35 UTC (permalink / raw)
To: Lee, Chun-Yi
Cc: Rafael J. Wysocki, Ingo Molnar, Pavel Machek, Thomas Gleixner,
the arch/x86 maintainers, Linux Kernel Mailing List, Lee, Chun-Yi,
Len Brown, H. Peter Anvin, Takashi Iwai
On Thu, Jan 29, 2015 at 7:58 PM, Lee, Chun-Yi <joeyli.kernel@gmail.com> wrote:
>
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index 49f8864..6eae021 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -687,15 +687,16 @@ void __init parse_e820_ext(u64 phys_addr, u32 data_len)
> void __init e820_mark_nosave_regions(unsigned long limit_pfn)
> {
> int i;
> - unsigned long pfn = 0;
> + unsigned long pfn = 0, pfnaddr = 0;
>
> for (i = 0; i < e820.nr_map; i++) {
> struct e820entry *ei = &e820.map[i];
>
> - if (pfn < PFN_UP(ei->addr))
> + if (pfnaddr < ei->addr)
> register_nosave_region(pfn, PFN_UP(ei->addr));
>
> - pfn = PFN_DOWN(ei->addr + ei->size);
> + pfnaddr = ei->addr + ei->size;
> + pfn = PFN_DOWN(pfnaddr);
> if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN)
> register_nosave_region(PFN_UP(ei->addr), pfn);
>
Those changes may not fix the problem.
those E820_RESERVED_KERN and E820_RAM should be continuous.
So you need to find the real end for those continuous ranges.
Thanks
Yinghai
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/mm, hibernate: Fix misjudgment of register setup_data page to nosave region
2015-01-30 3:58 [PATCH] x86/mm, hibernate: Fix misjudgment of register setup_data page to nosave region Lee, Chun-Yi
2015-01-30 7:35 ` Yinghai Lu
@ 2015-01-30 8:30 ` Yinghai Lu
2015-01-30 14:46 ` joeyli
1 sibling, 1 reply; 6+ messages in thread
From: Yinghai Lu @ 2015-01-30 8:30 UTC (permalink / raw)
To: Lee, Chun-Yi, Huang Ying
Cc: Rafael J. Wysocki, Ingo Molnar, Pavel Machek, Thomas Gleixner,
the arch/x86 maintainers, Linux Kernel Mailing List, Lee, Chun-Yi,
Len Brown, H. Peter Anvin, Takashi Iwai
[-- Attachment #1: Type: text/plain, Size: 2750 bytes --]
On Thu, Jan 29, 2015 at 7:58 PM, Lee, Chun-Yi <joeyli.kernel@gmail.com> wrote:
> The reserve setup_data action break usable regions to not align to
> page size. As following case:
>
> BIOS-e820: [mem 0x0000000000088000-0x00000000000bffff] reserved
> BIOS-e820: [mem 0x0000000000100000-0x0000000094caffff] usable
> ...
> e820: update [mem 0x93c5f018-0x93c6f057] usable ==> usable /* reserve setup_data */
> e820: update [mem 0x93c4f018-0x93c5e057] usable ==> usable /* reserve setup_data */
> ...
> reserve setup_data: [mem0x0000000000088000-0x00000000000bffff] reserved
> reserve setup_data: [mem0x0000000000100000-0x0000000093c4f017] usable /* not align */
> reserve setup_data: [mem0x0000000093c4f018-0x0000000093c5e057] usable /* not align */
> reserve setup_data: [mem0x0000000093c5e058-0x0000000093c5f017] usable /* not align */
> reserve setup_data: [mem0x0000000093c5f018-0x0000000093c6f057] usable /* not align */
> reserve setup_data: [mem0x0000000093c6f058-0x0000000094caffff] usable
>
> The codes in e820_mark_nosave_regions() check pfn of each e820
> entry to find out the hole between two entries then register it to
> nosave region. This logic misjudges the non-align but continuous usable
> region, then register one page in usable region to nosave. As following:
>
> PM: Registered nosave memory: [mem 0x000c0000-0x000fffff]
> PM: Registered nosave memory: [mem 0x93c4f000-0x93c4ffff] /* misjudgment */
> PM: Registered nosave memory: [mem 0x93c5e000-0x93c5efff] /* misjudgment */
> PM: Registered nosave memory: [mem 0x93c5f000-0x93c5ffff] /* misjudgment */
> PM: Registered nosave memory: [mem 0x93c6f000-0x93c6ffff] /* misjudgment */
> PM: Registered nosave memory: [mem 0x94cb0000-0x960affff]
>
> The issue causes some usable pages didn't collect to hibernate snapshot image.
> And, it also misjudges a usable page in nosave regions then hibernate resuming
> process interrupted by the unsafe pages checking:
> https://bugzilla.opensuse.org/show_bug.cgi?id=913885
>
> This patch changed the code in e820_mark_nosave_regions() to check the
> address instead of pfn to avoid above issue.
[+cc: Ying Huang]
would like to suggest use attached instead:
Subject: [RFC PATCH] x86: Kill E820_RESERVED_KERN
Now we are using memblock to do early resource allocation instead of using
e820 map directly, and setup_data is reserved in memblock early.
Also kexec pass setup_data pointer to second kernel, so second kernel
will reserve setup_data by their own.
So we can kill E820_RESERVED_KERN and not touch e820 map at all.
That will fix bug in mark_nonsave_region that can not handle that
case: E820_RAM and E820_RESERVED_KERN continuous and boundary is
not page aligned.
Not sure about why tboot need to use this...
[-- Attachment #2: kill_e820_reserved_kern.patch --]
[-- Type: text/x-patch, Size: 5940 bytes --]
Subject: [RFC PATCH] x86: Kill E820_RESERVED_KERN
Now we are using memblock to do early resource allocation instead of using
e820 map directly, and setup_data is reserved in memblock early.
Also kexec pass setup_data pointer to second kernel, so second kernel
will reserve setup_data by their own.
So we can kill E820_RESERVED_KERN and not touch e820 map at all.
That will fix bug in mark_nonsave_region that can not handle that
case: E820_RAM and E820_RESERVED_KERN continuous and boundary is
not page aligned.
Not sure about why tboot need to use this...
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
arch/x86/include/uapi/asm/e820.h | 9 ---------
arch/x86/kernel/e820.c | 6 ++----
arch/x86/kernel/setup.c | 26 --------------------------
arch/x86/kernel/tboot.c | 3 +--
arch/x86/mm/init_64.c | 11 ++++-------
5 files changed, 7 insertions(+), 48 deletions(-)
Index: linux-2.6/arch/x86/include/uapi/asm/e820.h
===================================================================
--- linux-2.6.orig/arch/x86/include/uapi/asm/e820.h
+++ linux-2.6/arch/x86/include/uapi/asm/e820.h
@@ -33,15 +33,6 @@
#define E820_NVS 4
#define E820_UNUSABLE 5
-
-/*
- * reserved RAM used by kernel itself
- * if CONFIG_INTEL_TXT is enabled, memory of this type will be
- * included in the S3 integrity calculation and so should not include
- * any memory that BIOS might alter over the S3 transition
- */
-#define E820_RESERVED_KERN 128
-
#ifndef __ASSEMBLY__
#include <linux/types.h>
struct e820entry {
Index: linux-2.6/arch/x86/kernel/e820.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820.c
+++ linux-2.6/arch/x86/kernel/e820.c
@@ -134,7 +134,6 @@ static void __init e820_print_type(u32 t
{
switch (type) {
case E820_RAM:
- case E820_RESERVED_KERN:
printk(KERN_CONT "usable");
break;
case E820_RESERVED:
@@ -688,7 +687,7 @@ void __init e820_mark_nosave_regions(uns
register_nosave_region(pfn, PFN_UP(ei->addr));
pfn = PFN_DOWN(ei->addr + ei->size);
- if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN)
+ if (ei->type != E820_RAM)
register_nosave_region(PFN_UP(ei->addr), pfn);
if (pfn >= limit_pfn)
@@ -902,7 +901,6 @@ void __init finish_e820_parsing(void)
static inline const char *e820_type_to_string(int e820_type)
{
switch (e820_type) {
- case E820_RESERVED_KERN:
case E820_RAM: return "System RAM";
case E820_ACPI: return "ACPI Tables";
case E820_NVS: return "ACPI Non-volatile Storage";
@@ -1077,7 +1075,7 @@ void __init memblock_x86_fill(void)
if (end != (resource_size_t)end)
continue;
- if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN)
+ if (ei->type != E820_RAM)
continue;
memblock_add(ei->addr, ei->size);
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -456,30 +456,6 @@ static void __init parse_setup_data(void
}
}
-static void __init e820_reserve_setup_data(void)
-{
- struct setup_data *data;
- u64 pa_data;
- int found = 0;
-
- pa_data = boot_params.hdr.setup_data;
- while (pa_data) {
- data = early_memremap(pa_data, sizeof(*data));
- e820_update_range(pa_data, sizeof(*data)+data->len,
- E820_RAM, E820_RESERVED_KERN);
- found = 1;
- pa_data = data->next;
- early_iounmap(data, sizeof(*data));
- }
- if (!found)
- return;
-
- sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
- memcpy(&e820_saved, &e820, sizeof(struct e820map));
- printk(KERN_INFO "extended physical RAM map:\n");
- e820_print_map("reserve setup_data");
-}
-
static void __init memblock_x86_reserve_range_setup_data(void)
{
struct setup_data *data;
@@ -1011,8 +987,6 @@ void __init setup_arch(char **cmdline_p)
early_dump_pci_devices();
#endif
- /* update the e820_saved too */
- e820_reserve_setup_data();
finish_e820_parsing();
if (efi_enabled(EFI_BOOT))
Index: linux-2.6/arch/x86/kernel/tboot.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/tboot.c
+++ linux-2.6/arch/x86/kernel/tboot.c
@@ -195,8 +195,7 @@ static int tboot_setup_sleep(void)
tboot->num_mac_regions = 0;
for (i = 0; i < e820.nr_map; i++) {
- if ((e820.map[i].type != E820_RAM)
- && (e820.map[i].type != E820_RESERVED_KERN))
+ if (e820.map[i].type != E820_RAM)
continue;
add_mac_region(e820.map[i].addr, e820.map[i].size);
Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -426,8 +426,7 @@ phys_pte_init(pte_t *pte_page, unsigned
next = (addr & PAGE_MASK) + PAGE_SIZE;
if (addr >= end) {
if (!after_bootmem &&
- !e820_any_mapped(addr & PAGE_MASK, next, E820_RAM) &&
- !e820_any_mapped(addr & PAGE_MASK, next, E820_RESERVED_KERN))
+ !e820_any_mapped(addr & PAGE_MASK, next, E820_RAM))
set_pte(pte, __pte(0));
continue;
}
@@ -473,9 +472,8 @@ phys_pmd_init(pmd_t *pmd_page, unsigned
next = (address & PMD_MASK) + PMD_SIZE;
if (address >= end) {
- if (!after_bootmem &&
- !e820_any_mapped(address & PMD_MASK, next, E820_RAM) &&
- !e820_any_mapped(address & PMD_MASK, next, E820_RESERVED_KERN))
+ if (!after_bootmem && !e820_any_mapped(
+ address & PMD_MASK, next, E820_RAM))
set_pmd(pmd, __pmd(0));
continue;
}
@@ -548,8 +546,7 @@ phys_pud_init(pud_t *pud_page, unsigned
next = (addr & PUD_MASK) + PUD_SIZE;
if (addr >= end) {
if (!after_bootmem &&
- !e820_any_mapped(addr & PUD_MASK, next, E820_RAM) &&
- !e820_any_mapped(addr & PUD_MASK, next, E820_RESERVED_KERN))
+ !e820_any_mapped(addr & PUD_MASK, next, E820_RAM))
set_pud(pud, __pud(0));
continue;
}
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/mm, hibernate: Fix misjudgment of register setup_data page to nosave region
2015-01-30 7:35 ` Yinghai Lu
@ 2015-01-30 14:41 ` joeyli
0 siblings, 0 replies; 6+ messages in thread
From: joeyli @ 2015-01-30 14:41 UTC (permalink / raw)
To: Yinghai Lu
Cc: Lee, Chun-Yi, Rafael J. Wysocki, Ingo Molnar, Pavel Machek,
Thomas Gleixner, the arch/x86 maintainers,
Linux Kernel Mailing List, Len Brown, H. Peter Anvin,
Takashi Iwai
Hi,
On Thu, Jan 29, 2015 at 11:35:49PM -0800, Yinghai Lu wrote:
> On Thu, Jan 29, 2015 at 7:58 PM, Lee, Chun-Yi <joeyli.kernel@gmail.com> wrote:
> >
> > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> > index 49f8864..6eae021 100644
> > --- a/arch/x86/kernel/e820.c
> > +++ b/arch/x86/kernel/e820.c
> > @@ -687,15 +687,16 @@ void __init parse_e820_ext(u64 phys_addr, u32 data_len)
> > void __init e820_mark_nosave_regions(unsigned long limit_pfn)
> > {
> > int i;
> > - unsigned long pfn = 0;
> > + unsigned long pfn = 0, pfnaddr = 0;
> >
> > for (i = 0; i < e820.nr_map; i++) {
> > struct e820entry *ei = &e820.map[i];
> >
> > - if (pfn < PFN_UP(ei->addr))
> > + if (pfnaddr < ei->addr)
> > register_nosave_region(pfn, PFN_UP(ei->addr));
> >
> > - pfn = PFN_DOWN(ei->addr + ei->size);
> > + pfnaddr = ei->addr + ei->size;
> > + pfn = PFN_DOWN(pfnaddr);
> > if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN)
> > register_nosave_region(PFN_UP(ei->addr), pfn);
> >
> Those changes may not fix the problem.
>
> those E820_RESERVED_KERN and E820_RAM should be continuous.
Yes, they are continuous but not aligned.
>
> So you need to find the real end for those continuous ranges.
>
Sorry for I didn't capture the point for need to find the real end
because those misjudgments are happened in the boundary between
E820_RESERVED_KERN and E820_RAM.
> Thanks
>
> Yinghai
e.g.
reserve setup_data: [mem 0x0000000000100000-0x0000000093c4f017] usable
reserve setup_data: [mem 0x0000000093c4f018-0x0000000093c5e057] usable
pfn = PFN_DOWN(0x93c4f017+1) = 0x93c4f
PFN_UP(0x93c4f018) = (0x93c4f018 + 0x1000 - 1 >> PAGE_SHIFT) = 0x93C50
The above case match with "if (pfn < PFN_UP(ei->addr))" logic, it causes
e820_mark_nosave_regions() add one usable page to nosave region:
PM: Registered nosave memory: [mem 0x93c4f000-0x93c4ffff]
Comparing the pfn of regions is not enough so my patch compares the address
instead of pfn. It works to me to avoid those one page area show in nosave
region.
Thanks a lot!
Joey Lee
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/mm, hibernate: Fix misjudgment of register setup_data page to nosave region
2015-01-30 8:30 ` Yinghai Lu
@ 2015-01-30 14:46 ` joeyli
2015-01-30 16:28 ` joeyli
0 siblings, 1 reply; 6+ messages in thread
From: joeyli @ 2015-01-30 14:46 UTC (permalink / raw)
To: Yinghai Lu
Cc: Lee, Chun-Yi, Huang Ying, Rafael J. Wysocki, Ingo Molnar,
Pavel Machek, Thomas Gleixner, the arch/x86 maintainers,
Linux Kernel Mailing List, Len Brown, H. Peter Anvin,
Takashi Iwai
On Fri, Jan 30, 2015 at 12:30:00AM -0800, Yinghai Lu wrote:
> On Thu, Jan 29, 2015 at 7:58 PM, Lee, Chun-Yi <joeyli.kernel@gmail.com> wrote:
> > The reserve setup_data action break usable regions to not align to
> > page size. As following case:
> >
> > BIOS-e820: [mem 0x0000000000088000-0x00000000000bffff] reserved
> > BIOS-e820: [mem 0x0000000000100000-0x0000000094caffff] usable
> > ...
> > e820: update [mem 0x93c5f018-0x93c6f057] usable ==> usable /* reserve setup_data */
> > e820: update [mem 0x93c4f018-0x93c5e057] usable ==> usable /* reserve setup_data */
> > ...
> > reserve setup_data: [mem0x0000000000088000-0x00000000000bffff] reserved
> > reserve setup_data: [mem0x0000000000100000-0x0000000093c4f017] usable /* not align */
> > reserve setup_data: [mem0x0000000093c4f018-0x0000000093c5e057] usable /* not align */
> > reserve setup_data: [mem0x0000000093c5e058-0x0000000093c5f017] usable /* not align */
> > reserve setup_data: [mem0x0000000093c5f018-0x0000000093c6f057] usable /* not align */
> > reserve setup_data: [mem0x0000000093c6f058-0x0000000094caffff] usable
> >
> > The codes in e820_mark_nosave_regions() check pfn of each e820
> > entry to find out the hole between two entries then register it to
> > nosave region. This logic misjudges the non-align but continuous usable
> > region, then register one page in usable region to nosave. As following:
> >
> > PM: Registered nosave memory: [mem 0x000c0000-0x000fffff]
> > PM: Registered nosave memory: [mem 0x93c4f000-0x93c4ffff] /* misjudgment */
> > PM: Registered nosave memory: [mem 0x93c5e000-0x93c5efff] /* misjudgment */
> > PM: Registered nosave memory: [mem 0x93c5f000-0x93c5ffff] /* misjudgment */
> > PM: Registered nosave memory: [mem 0x93c6f000-0x93c6ffff] /* misjudgment */
> > PM: Registered nosave memory: [mem 0x94cb0000-0x960affff]
> >
> > The issue causes some usable pages didn't collect to hibernate snapshot image.
> > And, it also misjudges a usable page in nosave regions then hibernate resuming
> > process interrupted by the unsafe pages checking:
> > https://bugzilla.opensuse.org/show_bug.cgi?id=913885
> >
> > This patch changed the code in e820_mark_nosave_regions() to check the
> > address instead of pfn to avoid above issue.
>
> [+cc: Ying Huang]
>
> would like to suggest use attached instead:
>
> Subject: [RFC PATCH] x86: Kill E820_RESERVED_KERN
>
> Now we are using memblock to do early resource allocation instead of using
> e820 map directly, and setup_data is reserved in memblock early.
> Also kexec pass setup_data pointer to second kernel, so second kernel
> will reserve setup_data by their own.
>
> So we can kill E820_RESERVED_KERN and not touch e820 map at all.
>
> That will fix bug in mark_nonsave_region that can not handle that
> case: E820_RAM and E820_RESERVED_KERN continuous and boundary is
> not page aligned.
>
> Not sure about why tboot need to use this...
Yes, that's good to totally remove the E820_RESERVED_KERN because we already
use memblock to reserve setup_data ranges.
Thanks
Joey Lee
> Subject: [RFC PATCH] x86: Kill E820_RESERVED_KERN
>
> Now we are using memblock to do early resource allocation instead of using
> e820 map directly, and setup_data is reserved in memblock early.
> Also kexec pass setup_data pointer to second kernel, so second kernel
> will reserve setup_data by their own.
>
> So we can kill E820_RESERVED_KERN and not touch e820 map at all.
>
> That will fix bug in mark_nonsave_region that can not handle that
> case: E820_RAM and E820_RESERVED_KERN continuous and boundary is
> not page aligned.
>
> Not sure about why tboot need to use this...
>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>
> ---
> arch/x86/include/uapi/asm/e820.h | 9 ---------
> arch/x86/kernel/e820.c | 6 ++----
> arch/x86/kernel/setup.c | 26 --------------------------
> arch/x86/kernel/tboot.c | 3 +--
> arch/x86/mm/init_64.c | 11 ++++-------
> 5 files changed, 7 insertions(+), 48 deletions(-)
>
> Index: linux-2.6/arch/x86/include/uapi/asm/e820.h
> ===================================================================
> --- linux-2.6.orig/arch/x86/include/uapi/asm/e820.h
> +++ linux-2.6/arch/x86/include/uapi/asm/e820.h
> @@ -33,15 +33,6 @@
> #define E820_NVS 4
> #define E820_UNUSABLE 5
>
> -
> -/*
> - * reserved RAM used by kernel itself
> - * if CONFIG_INTEL_TXT is enabled, memory of this type will be
> - * included in the S3 integrity calculation and so should not include
> - * any memory that BIOS might alter over the S3 transition
> - */
> -#define E820_RESERVED_KERN 128
> -
> #ifndef __ASSEMBLY__
> #include <linux/types.h>
> struct e820entry {
> Index: linux-2.6/arch/x86/kernel/e820.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/e820.c
> +++ linux-2.6/arch/x86/kernel/e820.c
> @@ -134,7 +134,6 @@ static void __init e820_print_type(u32 t
> {
> switch (type) {
> case E820_RAM:
> - case E820_RESERVED_KERN:
> printk(KERN_CONT "usable");
> break;
> case E820_RESERVED:
> @@ -688,7 +687,7 @@ void __init e820_mark_nosave_regions(uns
> register_nosave_region(pfn, PFN_UP(ei->addr));
>
> pfn = PFN_DOWN(ei->addr + ei->size);
> - if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN)
> + if (ei->type != E820_RAM)
> register_nosave_region(PFN_UP(ei->addr), pfn);
>
> if (pfn >= limit_pfn)
> @@ -902,7 +901,6 @@ void __init finish_e820_parsing(void)
> static inline const char *e820_type_to_string(int e820_type)
> {
> switch (e820_type) {
> - case E820_RESERVED_KERN:
> case E820_RAM: return "System RAM";
> case E820_ACPI: return "ACPI Tables";
> case E820_NVS: return "ACPI Non-volatile Storage";
> @@ -1077,7 +1075,7 @@ void __init memblock_x86_fill(void)
> if (end != (resource_size_t)end)
> continue;
>
> - if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN)
> + if (ei->type != E820_RAM)
> continue;
>
> memblock_add(ei->addr, ei->size);
> Index: linux-2.6/arch/x86/kernel/setup.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/setup.c
> +++ linux-2.6/arch/x86/kernel/setup.c
> @@ -456,30 +456,6 @@ static void __init parse_setup_data(void
> }
> }
>
> -static void __init e820_reserve_setup_data(void)
> -{
> - struct setup_data *data;
> - u64 pa_data;
> - int found = 0;
> -
> - pa_data = boot_params.hdr.setup_data;
> - while (pa_data) {
> - data = early_memremap(pa_data, sizeof(*data));
> - e820_update_range(pa_data, sizeof(*data)+data->len,
> - E820_RAM, E820_RESERVED_KERN);
> - found = 1;
> - pa_data = data->next;
> - early_iounmap(data, sizeof(*data));
> - }
> - if (!found)
> - return;
> -
> - sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
> - memcpy(&e820_saved, &e820, sizeof(struct e820map));
> - printk(KERN_INFO "extended physical RAM map:\n");
> - e820_print_map("reserve setup_data");
> -}
> -
> static void __init memblock_x86_reserve_range_setup_data(void)
> {
> struct setup_data *data;
> @@ -1011,8 +987,6 @@ void __init setup_arch(char **cmdline_p)
> early_dump_pci_devices();
> #endif
>
> - /* update the e820_saved too */
> - e820_reserve_setup_data();
> finish_e820_parsing();
>
> if (efi_enabled(EFI_BOOT))
> Index: linux-2.6/arch/x86/kernel/tboot.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/tboot.c
> +++ linux-2.6/arch/x86/kernel/tboot.c
> @@ -195,8 +195,7 @@ static int tboot_setup_sleep(void)
> tboot->num_mac_regions = 0;
>
> for (i = 0; i < e820.nr_map; i++) {
> - if ((e820.map[i].type != E820_RAM)
> - && (e820.map[i].type != E820_RESERVED_KERN))
> + if (e820.map[i].type != E820_RAM)
> continue;
>
> add_mac_region(e820.map[i].addr, e820.map[i].size);
> Index: linux-2.6/arch/x86/mm/init_64.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/mm/init_64.c
> +++ linux-2.6/arch/x86/mm/init_64.c
> @@ -426,8 +426,7 @@ phys_pte_init(pte_t *pte_page, unsigned
> next = (addr & PAGE_MASK) + PAGE_SIZE;
> if (addr >= end) {
> if (!after_bootmem &&
> - !e820_any_mapped(addr & PAGE_MASK, next, E820_RAM) &&
> - !e820_any_mapped(addr & PAGE_MASK, next, E820_RESERVED_KERN))
> + !e820_any_mapped(addr & PAGE_MASK, next, E820_RAM))
> set_pte(pte, __pte(0));
> continue;
> }
> @@ -473,9 +472,8 @@ phys_pmd_init(pmd_t *pmd_page, unsigned
>
> next = (address & PMD_MASK) + PMD_SIZE;
> if (address >= end) {
> - if (!after_bootmem &&
> - !e820_any_mapped(address & PMD_MASK, next, E820_RAM) &&
> - !e820_any_mapped(address & PMD_MASK, next, E820_RESERVED_KERN))
> + if (!after_bootmem && !e820_any_mapped(
> + address & PMD_MASK, next, E820_RAM))
> set_pmd(pmd, __pmd(0));
> continue;
> }
> @@ -548,8 +546,7 @@ phys_pud_init(pud_t *pud_page, unsigned
> next = (addr & PUD_MASK) + PUD_SIZE;
> if (addr >= end) {
> if (!after_bootmem &&
> - !e820_any_mapped(addr & PUD_MASK, next, E820_RAM) &&
> - !e820_any_mapped(addr & PUD_MASK, next, E820_RESERVED_KERN))
> + !e820_any_mapped(addr & PUD_MASK, next, E820_RAM))
> set_pud(pud, __pud(0));
> continue;
> }
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/mm, hibernate: Fix misjudgment of register setup_data page to nosave region
2015-01-30 14:46 ` joeyli
@ 2015-01-30 16:28 ` joeyli
0 siblings, 0 replies; 6+ messages in thread
From: joeyli @ 2015-01-30 16:28 UTC (permalink / raw)
To: Yinghai Lu
Cc: Lee, Chun-Yi, Huang Ying, Rafael J. Wysocki, Ingo Molnar,
Pavel Machek, Thomas Gleixner, the arch/x86 maintainers,
Linux Kernel Mailing List, Len Brown, H. Peter Anvin,
Takashi Iwai
On Fri, Jan 30, 2015 at 10:46:48PM +0800, joeyli wrote:
> On Fri, Jan 30, 2015 at 12:30:00AM -0800, Yinghai Lu wrote:
> > On Thu, Jan 29, 2015 at 7:58 PM, Lee, Chun-Yi <joeyli.kernel@gmail.com> wrote:
> > > The reserve setup_data action break usable regions to not align to
> > > page size. As following case:
> > >
> > > BIOS-e820: [mem 0x0000000000088000-0x00000000000bffff] reserved
> > > BIOS-e820: [mem 0x0000000000100000-0x0000000094caffff] usable
> > > ...
> > > e820: update [mem 0x93c5f018-0x93c6f057] usable ==> usable /* reserve setup_data */
> > > e820: update [mem 0x93c4f018-0x93c5e057] usable ==> usable /* reserve setup_data */
> > > ...
> > > reserve setup_data: [mem0x0000000000088000-0x00000000000bffff] reserved
> > > reserve setup_data: [mem0x0000000000100000-0x0000000093c4f017] usable /* not align */
> > > reserve setup_data: [mem0x0000000093c4f018-0x0000000093c5e057] usable /* not align */
> > > reserve setup_data: [mem0x0000000093c5e058-0x0000000093c5f017] usable /* not align */
> > > reserve setup_data: [mem0x0000000093c5f018-0x0000000093c6f057] usable /* not align */
> > > reserve setup_data: [mem0x0000000093c6f058-0x0000000094caffff] usable
> > >
> > > The codes in e820_mark_nosave_regions() check pfn of each e820
> > > entry to find out the hole between two entries then register it to
> > > nosave region. This logic misjudges the non-align but continuous usable
> > > region, then register one page in usable region to nosave. As following:
> > >
> > > PM: Registered nosave memory: [mem 0x000c0000-0x000fffff]
> > > PM: Registered nosave memory: [mem 0x93c4f000-0x93c4ffff] /* misjudgment */
> > > PM: Registered nosave memory: [mem 0x93c5e000-0x93c5efff] /* misjudgment */
> > > PM: Registered nosave memory: [mem 0x93c5f000-0x93c5ffff] /* misjudgment */
> > > PM: Registered nosave memory: [mem 0x93c6f000-0x93c6ffff] /* misjudgment */
> > > PM: Registered nosave memory: [mem 0x94cb0000-0x960affff]
> > >
> > > The issue causes some usable pages didn't collect to hibernate snapshot image.
> > > And, it also misjudges a usable page in nosave regions then hibernate resuming
> > > process interrupted by the unsafe pages checking:
> > > https://bugzilla.opensuse.org/show_bug.cgi?id=913885
> > >
> > > This patch changed the code in e820_mark_nosave_regions() to check the
> > > address instead of pfn to avoid above issue.
> >
> > [+cc: Ying Huang]
> >
> > would like to suggest use attached instead:
> >
> > Subject: [RFC PATCH] x86: Kill E820_RESERVED_KERN
> >
> > Now we are using memblock to do early resource allocation instead of using
> > e820 map directly, and setup_data is reserved in memblock early.
> > Also kexec pass setup_data pointer to second kernel, so second kernel
> > will reserve setup_data by their own.
> >
> > So we can kill E820_RESERVED_KERN and not touch e820 map at all.
> >
> > That will fix bug in mark_nonsave_region that can not handle that
> > case: E820_RAM and E820_RESERVED_KERN continuous and boundary is
> > not page aligned.
> >
> > Not sure about why tboot need to use this...
>
> Yes, that's good to totally remove the E820_RESERVED_KERN because we already
> use memblock to reserve setup_data ranges.
>
>
> Thanks
> Joey Lee
This patch works on my Gateway EG50_HW notebook to fix issue in nosave regions.
Tested-by: Lee, Chun-Yi <jlee@suse.com>
Thanks a lot!
Joey Lee
>
> > Subject: [RFC PATCH] x86: Kill E820_RESERVED_KERN
> >
> > Now we are using memblock to do early resource allocation instead of using
> > e820 map directly, and setup_data is reserved in memblock early.
> > Also kexec pass setup_data pointer to second kernel, so second kernel
> > will reserve setup_data by their own.
> >
> > So we can kill E820_RESERVED_KERN and not touch e820 map at all.
> >
> > That will fix bug in mark_nonsave_region that can not handle that
> > case: E820_RAM and E820_RESERVED_KERN continuous and boundary is
> > not page aligned.
> >
> > Not sure about why tboot need to use this...
> >
> > Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> >
> > ---
> > arch/x86/include/uapi/asm/e820.h | 9 ---------
> > arch/x86/kernel/e820.c | 6 ++----
> > arch/x86/kernel/setup.c | 26 --------------------------
> > arch/x86/kernel/tboot.c | 3 +--
> > arch/x86/mm/init_64.c | 11 ++++-------
> > 5 files changed, 7 insertions(+), 48 deletions(-)
> >
> > Index: linux-2.6/arch/x86/include/uapi/asm/e820.h
> > ===================================================================
> > --- linux-2.6.orig/arch/x86/include/uapi/asm/e820.h
> > +++ linux-2.6/arch/x86/include/uapi/asm/e820.h
> > @@ -33,15 +33,6 @@
> > #define E820_NVS 4
> > #define E820_UNUSABLE 5
> >
> > -
> > -/*
> > - * reserved RAM used by kernel itself
> > - * if CONFIG_INTEL_TXT is enabled, memory of this type will be
> > - * included in the S3 integrity calculation and so should not include
> > - * any memory that BIOS might alter over the S3 transition
> > - */
> > -#define E820_RESERVED_KERN 128
> > -
> > #ifndef __ASSEMBLY__
> > #include <linux/types.h>
> > struct e820entry {
> > Index: linux-2.6/arch/x86/kernel/e820.c
> > ===================================================================
> > --- linux-2.6.orig/arch/x86/kernel/e820.c
> > +++ linux-2.6/arch/x86/kernel/e820.c
> > @@ -134,7 +134,6 @@ static void __init e820_print_type(u32 t
> > {
> > switch (type) {
> > case E820_RAM:
> > - case E820_RESERVED_KERN:
> > printk(KERN_CONT "usable");
> > break;
> > case E820_RESERVED:
> > @@ -688,7 +687,7 @@ void __init e820_mark_nosave_regions(uns
> > register_nosave_region(pfn, PFN_UP(ei->addr));
> >
> > pfn = PFN_DOWN(ei->addr + ei->size);
> > - if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN)
> > + if (ei->type != E820_RAM)
> > register_nosave_region(PFN_UP(ei->addr), pfn);
> >
> > if (pfn >= limit_pfn)
> > @@ -902,7 +901,6 @@ void __init finish_e820_parsing(void)
> > static inline const char *e820_type_to_string(int e820_type)
> > {
> > switch (e820_type) {
> > - case E820_RESERVED_KERN:
> > case E820_RAM: return "System RAM";
> > case E820_ACPI: return "ACPI Tables";
> > case E820_NVS: return "ACPI Non-volatile Storage";
> > @@ -1077,7 +1075,7 @@ void __init memblock_x86_fill(void)
> > if (end != (resource_size_t)end)
> > continue;
> >
> > - if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN)
> > + if (ei->type != E820_RAM)
> > continue;
> >
> > memblock_add(ei->addr, ei->size);
> > Index: linux-2.6/arch/x86/kernel/setup.c
> > ===================================================================
> > --- linux-2.6.orig/arch/x86/kernel/setup.c
> > +++ linux-2.6/arch/x86/kernel/setup.c
> > @@ -456,30 +456,6 @@ static void __init parse_setup_data(void
> > }
> > }
> >
> > -static void __init e820_reserve_setup_data(void)
> > -{
> > - struct setup_data *data;
> > - u64 pa_data;
> > - int found = 0;
> > -
> > - pa_data = boot_params.hdr.setup_data;
> > - while (pa_data) {
> > - data = early_memremap(pa_data, sizeof(*data));
> > - e820_update_range(pa_data, sizeof(*data)+data->len,
> > - E820_RAM, E820_RESERVED_KERN);
> > - found = 1;
> > - pa_data = data->next;
> > - early_iounmap(data, sizeof(*data));
> > - }
> > - if (!found)
> > - return;
> > -
> > - sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
> > - memcpy(&e820_saved, &e820, sizeof(struct e820map));
> > - printk(KERN_INFO "extended physical RAM map:\n");
> > - e820_print_map("reserve setup_data");
> > -}
> > -
> > static void __init memblock_x86_reserve_range_setup_data(void)
> > {
> > struct setup_data *data;
> > @@ -1011,8 +987,6 @@ void __init setup_arch(char **cmdline_p)
> > early_dump_pci_devices();
> > #endif
> >
> > - /* update the e820_saved too */
> > - e820_reserve_setup_data();
> > finish_e820_parsing();
> >
> > if (efi_enabled(EFI_BOOT))
> > Index: linux-2.6/arch/x86/kernel/tboot.c
> > ===================================================================
> > --- linux-2.6.orig/arch/x86/kernel/tboot.c
> > +++ linux-2.6/arch/x86/kernel/tboot.c
> > @@ -195,8 +195,7 @@ static int tboot_setup_sleep(void)
> > tboot->num_mac_regions = 0;
> >
> > for (i = 0; i < e820.nr_map; i++) {
> > - if ((e820.map[i].type != E820_RAM)
> > - && (e820.map[i].type != E820_RESERVED_KERN))
> > + if (e820.map[i].type != E820_RAM)
> > continue;
> >
> > add_mac_region(e820.map[i].addr, e820.map[i].size);
> > Index: linux-2.6/arch/x86/mm/init_64.c
> > ===================================================================
> > --- linux-2.6.orig/arch/x86/mm/init_64.c
> > +++ linux-2.6/arch/x86/mm/init_64.c
> > @@ -426,8 +426,7 @@ phys_pte_init(pte_t *pte_page, unsigned
> > next = (addr & PAGE_MASK) + PAGE_SIZE;
> > if (addr >= end) {
> > if (!after_bootmem &&
> > - !e820_any_mapped(addr & PAGE_MASK, next, E820_RAM) &&
> > - !e820_any_mapped(addr & PAGE_MASK, next, E820_RESERVED_KERN))
> > + !e820_any_mapped(addr & PAGE_MASK, next, E820_RAM))
> > set_pte(pte, __pte(0));
> > continue;
> > }
> > @@ -473,9 +472,8 @@ phys_pmd_init(pmd_t *pmd_page, unsigned
> >
> > next = (address & PMD_MASK) + PMD_SIZE;
> > if (address >= end) {
> > - if (!after_bootmem &&
> > - !e820_any_mapped(address & PMD_MASK, next, E820_RAM) &&
> > - !e820_any_mapped(address & PMD_MASK, next, E820_RESERVED_KERN))
> > + if (!after_bootmem && !e820_any_mapped(
> > + address & PMD_MASK, next, E820_RAM))
> > set_pmd(pmd, __pmd(0));
> > continue;
> > }
> > @@ -548,8 +546,7 @@ phys_pud_init(pud_t *pud_page, unsigned
> > next = (addr & PUD_MASK) + PUD_SIZE;
> > if (addr >= end) {
> > if (!after_bootmem &&
> > - !e820_any_mapped(addr & PUD_MASK, next, E820_RAM) &&
> > - !e820_any_mapped(addr & PUD_MASK, next, E820_RESERVED_KERN))
> > + !e820_any_mapped(addr & PUD_MASK, next, E820_RAM))
> > set_pud(pud, __pud(0));
> > continue;
> > }
>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2015-01-30 16:28 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-01-30 3:58 [PATCH] x86/mm, hibernate: Fix misjudgment of register setup_data page to nosave region Lee, Chun-Yi
2015-01-30 7:35 ` Yinghai Lu
2015-01-30 14:41 ` joeyli
2015-01-30 8:30 ` Yinghai Lu
2015-01-30 14:46 ` joeyli
2015-01-30 16:28 ` joeyli
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox