* [PATCH] x86/mm, hibernate: Fix misjudgment of register setup_data page to nosave region @ 2015-01-30 3:58 Lee, Chun-Yi 2015-01-30 7:35 ` Yinghai Lu 2015-01-30 8:30 ` Yinghai Lu 0 siblings, 2 replies; 6+ messages in thread From: Lee, Chun-Yi @ 2015-01-30 3:58 UTC (permalink / raw) To: Rafael J. Wysocki, Ingo Molnar, Pavel Machek Cc: Thomas Gleixner, x86, linux-kernel, Lee, Chun-Yi, Len Brown, H. Peter Anvin, Yinghai Lu, Takashi Iwai The reserve setup_data action break usable regions to not align to page size. As following case: BIOS-e820: [mem 0x0000000000088000-0x00000000000bffff] reserved BIOS-e820: [mem 0x0000000000100000-0x0000000094caffff] usable ... e820: update [mem 0x93c5f018-0x93c6f057] usable ==> usable /* reserve setup_data */ e820: update [mem 0x93c4f018-0x93c5e057] usable ==> usable /* reserve setup_data */ ... reserve setup_data: [mem0x0000000000088000-0x00000000000bffff] reserved reserve setup_data: [mem0x0000000000100000-0x0000000093c4f017] usable /* not align */ reserve setup_data: [mem0x0000000093c4f018-0x0000000093c5e057] usable /* not align */ reserve setup_data: [mem0x0000000093c5e058-0x0000000093c5f017] usable /* not align */ reserve setup_data: [mem0x0000000093c5f018-0x0000000093c6f057] usable /* not align */ reserve setup_data: [mem0x0000000093c6f058-0x0000000094caffff] usable The codes in e820_mark_nosave_regions() check pfn of each e820 entry to find out the hole between two entries then register it to nosave region. This logic misjudges the non-align but continuous usable region, then register one page in usable region to nosave. As following: PM: Registered nosave memory: [mem 0x000c0000-0x000fffff] PM: Registered nosave memory: [mem 0x93c4f000-0x93c4ffff] /* misjudgment */ PM: Registered nosave memory: [mem 0x93c5e000-0x93c5efff] /* misjudgment */ PM: Registered nosave memory: [mem 0x93c5f000-0x93c5ffff] /* misjudgment */ PM: Registered nosave memory: [mem 0x93c6f000-0x93c6ffff] /* misjudgment */ PM: Registered nosave memory: [mem 0x94cb0000-0x960affff] The issue causes some usable pages didn't collect to hibernate snapshot image. And, it also misjudges a usable page in nosave regions then hibernate resuming process interrupted by the unsafe pages checking: https://bugzilla.opensuse.org/show_bug.cgi?id=913885 This patch changed the code in e820_mark_nosave_regions() to check the address instead of pfn to avoid above issue. Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <len.brown@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Takashi Iwai <tiwai@suse.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Lee, Chun-Yi <jlee@suse.com> --- arch/x86/kernel/e820.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index 49f8864..6eae021 100644 --- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -687,15 +687,16 @@ void __init parse_e820_ext(u64 phys_addr, u32 data_len) void __init e820_mark_nosave_regions(unsigned long limit_pfn) { int i; - unsigned long pfn = 0; + unsigned long pfn = 0, pfnaddr = 0; for (i = 0; i < e820.nr_map; i++) { struct e820entry *ei = &e820.map[i]; - if (pfn < PFN_UP(ei->addr)) + if (pfnaddr < ei->addr) register_nosave_region(pfn, PFN_UP(ei->addr)); - pfn = PFN_DOWN(ei->addr + ei->size); + pfnaddr = ei->addr + ei->size; + pfn = PFN_DOWN(pfnaddr); if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN) register_nosave_region(PFN_UP(ei->addr), pfn); -- 1.8.4.5 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/mm, hibernate: Fix misjudgment of register setup_data page to nosave region 2015-01-30 3:58 [PATCH] x86/mm, hibernate: Fix misjudgment of register setup_data page to nosave region Lee, Chun-Yi @ 2015-01-30 7:35 ` Yinghai Lu 2015-01-30 14:41 ` joeyli 2015-01-30 8:30 ` Yinghai Lu 1 sibling, 1 reply; 6+ messages in thread From: Yinghai Lu @ 2015-01-30 7:35 UTC (permalink / raw) To: Lee, Chun-Yi Cc: Rafael J. Wysocki, Ingo Molnar, Pavel Machek, Thomas Gleixner, the arch/x86 maintainers, Linux Kernel Mailing List, Lee, Chun-Yi, Len Brown, H. Peter Anvin, Takashi Iwai On Thu, Jan 29, 2015 at 7:58 PM, Lee, Chun-Yi <joeyli.kernel@gmail.com> wrote: > > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c > index 49f8864..6eae021 100644 > --- a/arch/x86/kernel/e820.c > +++ b/arch/x86/kernel/e820.c > @@ -687,15 +687,16 @@ void __init parse_e820_ext(u64 phys_addr, u32 data_len) > void __init e820_mark_nosave_regions(unsigned long limit_pfn) > { > int i; > - unsigned long pfn = 0; > + unsigned long pfn = 0, pfnaddr = 0; > > for (i = 0; i < e820.nr_map; i++) { > struct e820entry *ei = &e820.map[i]; > > - if (pfn < PFN_UP(ei->addr)) > + if (pfnaddr < ei->addr) > register_nosave_region(pfn, PFN_UP(ei->addr)); > > - pfn = PFN_DOWN(ei->addr + ei->size); > + pfnaddr = ei->addr + ei->size; > + pfn = PFN_DOWN(pfnaddr); > if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN) > register_nosave_region(PFN_UP(ei->addr), pfn); > Those changes may not fix the problem. those E820_RESERVED_KERN and E820_RAM should be continuous. So you need to find the real end for those continuous ranges. Thanks Yinghai ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/mm, hibernate: Fix misjudgment of register setup_data page to nosave region 2015-01-30 7:35 ` Yinghai Lu @ 2015-01-30 14:41 ` joeyli 0 siblings, 0 replies; 6+ messages in thread From: joeyli @ 2015-01-30 14:41 UTC (permalink / raw) To: Yinghai Lu Cc: Lee, Chun-Yi, Rafael J. Wysocki, Ingo Molnar, Pavel Machek, Thomas Gleixner, the arch/x86 maintainers, Linux Kernel Mailing List, Len Brown, H. Peter Anvin, Takashi Iwai Hi, On Thu, Jan 29, 2015 at 11:35:49PM -0800, Yinghai Lu wrote: > On Thu, Jan 29, 2015 at 7:58 PM, Lee, Chun-Yi <joeyli.kernel@gmail.com> wrote: > > > > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c > > index 49f8864..6eae021 100644 > > --- a/arch/x86/kernel/e820.c > > +++ b/arch/x86/kernel/e820.c > > @@ -687,15 +687,16 @@ void __init parse_e820_ext(u64 phys_addr, u32 data_len) > > void __init e820_mark_nosave_regions(unsigned long limit_pfn) > > { > > int i; > > - unsigned long pfn = 0; > > + unsigned long pfn = 0, pfnaddr = 0; > > > > for (i = 0; i < e820.nr_map; i++) { > > struct e820entry *ei = &e820.map[i]; > > > > - if (pfn < PFN_UP(ei->addr)) > > + if (pfnaddr < ei->addr) > > register_nosave_region(pfn, PFN_UP(ei->addr)); > > > > - pfn = PFN_DOWN(ei->addr + ei->size); > > + pfnaddr = ei->addr + ei->size; > > + pfn = PFN_DOWN(pfnaddr); > > if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN) > > register_nosave_region(PFN_UP(ei->addr), pfn); > > > Those changes may not fix the problem. > > those E820_RESERVED_KERN and E820_RAM should be continuous. Yes, they are continuous but not aligned. > > So you need to find the real end for those continuous ranges. > Sorry for I didn't capture the point for need to find the real end because those misjudgments are happened in the boundary between E820_RESERVED_KERN and E820_RAM. > Thanks > > Yinghai e.g. reserve setup_data: [mem 0x0000000000100000-0x0000000093c4f017] usable reserve setup_data: [mem 0x0000000093c4f018-0x0000000093c5e057] usable pfn = PFN_DOWN(0x93c4f017+1) = 0x93c4f PFN_UP(0x93c4f018) = (0x93c4f018 + 0x1000 - 1 >> PAGE_SHIFT) = 0x93C50 The above case match with "if (pfn < PFN_UP(ei->addr))" logic, it causes e820_mark_nosave_regions() add one usable page to nosave region: PM: Registered nosave memory: [mem 0x93c4f000-0x93c4ffff] Comparing the pfn of regions is not enough so my patch compares the address instead of pfn. It works to me to avoid those one page area show in nosave region. Thanks a lot! Joey Lee ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/mm, hibernate: Fix misjudgment of register setup_data page to nosave region 2015-01-30 3:58 [PATCH] x86/mm, hibernate: Fix misjudgment of register setup_data page to nosave region Lee, Chun-Yi 2015-01-30 7:35 ` Yinghai Lu @ 2015-01-30 8:30 ` Yinghai Lu 2015-01-30 14:46 ` joeyli 1 sibling, 1 reply; 6+ messages in thread From: Yinghai Lu @ 2015-01-30 8:30 UTC (permalink / raw) To: Lee, Chun-Yi, Huang Ying Cc: Rafael J. Wysocki, Ingo Molnar, Pavel Machek, Thomas Gleixner, the arch/x86 maintainers, Linux Kernel Mailing List, Lee, Chun-Yi, Len Brown, H. Peter Anvin, Takashi Iwai [-- Attachment #1: Type: text/plain, Size: 2750 bytes --] On Thu, Jan 29, 2015 at 7:58 PM, Lee, Chun-Yi <joeyli.kernel@gmail.com> wrote: > The reserve setup_data action break usable regions to not align to > page size. As following case: > > BIOS-e820: [mem 0x0000000000088000-0x00000000000bffff] reserved > BIOS-e820: [mem 0x0000000000100000-0x0000000094caffff] usable > ... > e820: update [mem 0x93c5f018-0x93c6f057] usable ==> usable /* reserve setup_data */ > e820: update [mem 0x93c4f018-0x93c5e057] usable ==> usable /* reserve setup_data */ > ... > reserve setup_data: [mem0x0000000000088000-0x00000000000bffff] reserved > reserve setup_data: [mem0x0000000000100000-0x0000000093c4f017] usable /* not align */ > reserve setup_data: [mem0x0000000093c4f018-0x0000000093c5e057] usable /* not align */ > reserve setup_data: [mem0x0000000093c5e058-0x0000000093c5f017] usable /* not align */ > reserve setup_data: [mem0x0000000093c5f018-0x0000000093c6f057] usable /* not align */ > reserve setup_data: [mem0x0000000093c6f058-0x0000000094caffff] usable > > The codes in e820_mark_nosave_regions() check pfn of each e820 > entry to find out the hole between two entries then register it to > nosave region. This logic misjudges the non-align but continuous usable > region, then register one page in usable region to nosave. As following: > > PM: Registered nosave memory: [mem 0x000c0000-0x000fffff] > PM: Registered nosave memory: [mem 0x93c4f000-0x93c4ffff] /* misjudgment */ > PM: Registered nosave memory: [mem 0x93c5e000-0x93c5efff] /* misjudgment */ > PM: Registered nosave memory: [mem 0x93c5f000-0x93c5ffff] /* misjudgment */ > PM: Registered nosave memory: [mem 0x93c6f000-0x93c6ffff] /* misjudgment */ > PM: Registered nosave memory: [mem 0x94cb0000-0x960affff] > > The issue causes some usable pages didn't collect to hibernate snapshot image. > And, it also misjudges a usable page in nosave regions then hibernate resuming > process interrupted by the unsafe pages checking: > https://bugzilla.opensuse.org/show_bug.cgi?id=913885 > > This patch changed the code in e820_mark_nosave_regions() to check the > address instead of pfn to avoid above issue. [+cc: Ying Huang] would like to suggest use attached instead: Subject: [RFC PATCH] x86: Kill E820_RESERVED_KERN Now we are using memblock to do early resource allocation instead of using e820 map directly, and setup_data is reserved in memblock early. Also kexec pass setup_data pointer to second kernel, so second kernel will reserve setup_data by their own. So we can kill E820_RESERVED_KERN and not touch e820 map at all. That will fix bug in mark_nonsave_region that can not handle that case: E820_RAM and E820_RESERVED_KERN continuous and boundary is not page aligned. Not sure about why tboot need to use this... [-- Attachment #2: kill_e820_reserved_kern.patch --] [-- Type: text/x-patch, Size: 5940 bytes --] Subject: [RFC PATCH] x86: Kill E820_RESERVED_KERN Now we are using memblock to do early resource allocation instead of using e820 map directly, and setup_data is reserved in memblock early. Also kexec pass setup_data pointer to second kernel, so second kernel will reserve setup_data by their own. So we can kill E820_RESERVED_KERN and not touch e820 map at all. That will fix bug in mark_nonsave_region that can not handle that case: E820_RAM and E820_RESERVED_KERN continuous and boundary is not page aligned. Not sure about why tboot need to use this... Signed-off-by: Yinghai Lu <yinghai@kernel.org> --- arch/x86/include/uapi/asm/e820.h | 9 --------- arch/x86/kernel/e820.c | 6 ++---- arch/x86/kernel/setup.c | 26 -------------------------- arch/x86/kernel/tboot.c | 3 +-- arch/x86/mm/init_64.c | 11 ++++------- 5 files changed, 7 insertions(+), 48 deletions(-) Index: linux-2.6/arch/x86/include/uapi/asm/e820.h =================================================================== --- linux-2.6.orig/arch/x86/include/uapi/asm/e820.h +++ linux-2.6/arch/x86/include/uapi/asm/e820.h @@ -33,15 +33,6 @@ #define E820_NVS 4 #define E820_UNUSABLE 5 - -/* - * reserved RAM used by kernel itself - * if CONFIG_INTEL_TXT is enabled, memory of this type will be - * included in the S3 integrity calculation and so should not include - * any memory that BIOS might alter over the S3 transition - */ -#define E820_RESERVED_KERN 128 - #ifndef __ASSEMBLY__ #include <linux/types.h> struct e820entry { Index: linux-2.6/arch/x86/kernel/e820.c =================================================================== --- linux-2.6.orig/arch/x86/kernel/e820.c +++ linux-2.6/arch/x86/kernel/e820.c @@ -134,7 +134,6 @@ static void __init e820_print_type(u32 t { switch (type) { case E820_RAM: - case E820_RESERVED_KERN: printk(KERN_CONT "usable"); break; case E820_RESERVED: @@ -688,7 +687,7 @@ void __init e820_mark_nosave_regions(uns register_nosave_region(pfn, PFN_UP(ei->addr)); pfn = PFN_DOWN(ei->addr + ei->size); - if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN) + if (ei->type != E820_RAM) register_nosave_region(PFN_UP(ei->addr), pfn); if (pfn >= limit_pfn) @@ -902,7 +901,6 @@ void __init finish_e820_parsing(void) static inline const char *e820_type_to_string(int e820_type) { switch (e820_type) { - case E820_RESERVED_KERN: case E820_RAM: return "System RAM"; case E820_ACPI: return "ACPI Tables"; case E820_NVS: return "ACPI Non-volatile Storage"; @@ -1077,7 +1075,7 @@ void __init memblock_x86_fill(void) if (end != (resource_size_t)end) continue; - if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN) + if (ei->type != E820_RAM) continue; memblock_add(ei->addr, ei->size); Index: linux-2.6/arch/x86/kernel/setup.c =================================================================== --- linux-2.6.orig/arch/x86/kernel/setup.c +++ linux-2.6/arch/x86/kernel/setup.c @@ -456,30 +456,6 @@ static void __init parse_setup_data(void } } -static void __init e820_reserve_setup_data(void) -{ - struct setup_data *data; - u64 pa_data; - int found = 0; - - pa_data = boot_params.hdr.setup_data; - while (pa_data) { - data = early_memremap(pa_data, sizeof(*data)); - e820_update_range(pa_data, sizeof(*data)+data->len, - E820_RAM, E820_RESERVED_KERN); - found = 1; - pa_data = data->next; - early_iounmap(data, sizeof(*data)); - } - if (!found) - return; - - sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map); - memcpy(&e820_saved, &e820, sizeof(struct e820map)); - printk(KERN_INFO "extended physical RAM map:\n"); - e820_print_map("reserve setup_data"); -} - static void __init memblock_x86_reserve_range_setup_data(void) { struct setup_data *data; @@ -1011,8 +987,6 @@ void __init setup_arch(char **cmdline_p) early_dump_pci_devices(); #endif - /* update the e820_saved too */ - e820_reserve_setup_data(); finish_e820_parsing(); if (efi_enabled(EFI_BOOT)) Index: linux-2.6/arch/x86/kernel/tboot.c =================================================================== --- linux-2.6.orig/arch/x86/kernel/tboot.c +++ linux-2.6/arch/x86/kernel/tboot.c @@ -195,8 +195,7 @@ static int tboot_setup_sleep(void) tboot->num_mac_regions = 0; for (i = 0; i < e820.nr_map; i++) { - if ((e820.map[i].type != E820_RAM) - && (e820.map[i].type != E820_RESERVED_KERN)) + if (e820.map[i].type != E820_RAM) continue; add_mac_region(e820.map[i].addr, e820.map[i].size); Index: linux-2.6/arch/x86/mm/init_64.c =================================================================== --- linux-2.6.orig/arch/x86/mm/init_64.c +++ linux-2.6/arch/x86/mm/init_64.c @@ -426,8 +426,7 @@ phys_pte_init(pte_t *pte_page, unsigned next = (addr & PAGE_MASK) + PAGE_SIZE; if (addr >= end) { if (!after_bootmem && - !e820_any_mapped(addr & PAGE_MASK, next, E820_RAM) && - !e820_any_mapped(addr & PAGE_MASK, next, E820_RESERVED_KERN)) + !e820_any_mapped(addr & PAGE_MASK, next, E820_RAM)) set_pte(pte, __pte(0)); continue; } @@ -473,9 +472,8 @@ phys_pmd_init(pmd_t *pmd_page, unsigned next = (address & PMD_MASK) + PMD_SIZE; if (address >= end) { - if (!after_bootmem && - !e820_any_mapped(address & PMD_MASK, next, E820_RAM) && - !e820_any_mapped(address & PMD_MASK, next, E820_RESERVED_KERN)) + if (!after_bootmem && !e820_any_mapped( + address & PMD_MASK, next, E820_RAM)) set_pmd(pmd, __pmd(0)); continue; } @@ -548,8 +546,7 @@ phys_pud_init(pud_t *pud_page, unsigned next = (addr & PUD_MASK) + PUD_SIZE; if (addr >= end) { if (!after_bootmem && - !e820_any_mapped(addr & PUD_MASK, next, E820_RAM) && - !e820_any_mapped(addr & PUD_MASK, next, E820_RESERVED_KERN)) + !e820_any_mapped(addr & PUD_MASK, next, E820_RAM)) set_pud(pud, __pud(0)); continue; } ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/mm, hibernate: Fix misjudgment of register setup_data page to nosave region 2015-01-30 8:30 ` Yinghai Lu @ 2015-01-30 14:46 ` joeyli 2015-01-30 16:28 ` joeyli 0 siblings, 1 reply; 6+ messages in thread From: joeyli @ 2015-01-30 14:46 UTC (permalink / raw) To: Yinghai Lu Cc: Lee, Chun-Yi, Huang Ying, Rafael J. Wysocki, Ingo Molnar, Pavel Machek, Thomas Gleixner, the arch/x86 maintainers, Linux Kernel Mailing List, Len Brown, H. Peter Anvin, Takashi Iwai On Fri, Jan 30, 2015 at 12:30:00AM -0800, Yinghai Lu wrote: > On Thu, Jan 29, 2015 at 7:58 PM, Lee, Chun-Yi <joeyli.kernel@gmail.com> wrote: > > The reserve setup_data action break usable regions to not align to > > page size. As following case: > > > > BIOS-e820: [mem 0x0000000000088000-0x00000000000bffff] reserved > > BIOS-e820: [mem 0x0000000000100000-0x0000000094caffff] usable > > ... > > e820: update [mem 0x93c5f018-0x93c6f057] usable ==> usable /* reserve setup_data */ > > e820: update [mem 0x93c4f018-0x93c5e057] usable ==> usable /* reserve setup_data */ > > ... > > reserve setup_data: [mem0x0000000000088000-0x00000000000bffff] reserved > > reserve setup_data: [mem0x0000000000100000-0x0000000093c4f017] usable /* not align */ > > reserve setup_data: [mem0x0000000093c4f018-0x0000000093c5e057] usable /* not align */ > > reserve setup_data: [mem0x0000000093c5e058-0x0000000093c5f017] usable /* not align */ > > reserve setup_data: [mem0x0000000093c5f018-0x0000000093c6f057] usable /* not align */ > > reserve setup_data: [mem0x0000000093c6f058-0x0000000094caffff] usable > > > > The codes in e820_mark_nosave_regions() check pfn of each e820 > > entry to find out the hole between two entries then register it to > > nosave region. This logic misjudges the non-align but continuous usable > > region, then register one page in usable region to nosave. As following: > > > > PM: Registered nosave memory: [mem 0x000c0000-0x000fffff] > > PM: Registered nosave memory: [mem 0x93c4f000-0x93c4ffff] /* misjudgment */ > > PM: Registered nosave memory: [mem 0x93c5e000-0x93c5efff] /* misjudgment */ > > PM: Registered nosave memory: [mem 0x93c5f000-0x93c5ffff] /* misjudgment */ > > PM: Registered nosave memory: [mem 0x93c6f000-0x93c6ffff] /* misjudgment */ > > PM: Registered nosave memory: [mem 0x94cb0000-0x960affff] > > > > The issue causes some usable pages didn't collect to hibernate snapshot image. > > And, it also misjudges a usable page in nosave regions then hibernate resuming > > process interrupted by the unsafe pages checking: > > https://bugzilla.opensuse.org/show_bug.cgi?id=913885 > > > > This patch changed the code in e820_mark_nosave_regions() to check the > > address instead of pfn to avoid above issue. > > [+cc: Ying Huang] > > would like to suggest use attached instead: > > Subject: [RFC PATCH] x86: Kill E820_RESERVED_KERN > > Now we are using memblock to do early resource allocation instead of using > e820 map directly, and setup_data is reserved in memblock early. > Also kexec pass setup_data pointer to second kernel, so second kernel > will reserve setup_data by their own. > > So we can kill E820_RESERVED_KERN and not touch e820 map at all. > > That will fix bug in mark_nonsave_region that can not handle that > case: E820_RAM and E820_RESERVED_KERN continuous and boundary is > not page aligned. > > Not sure about why tboot need to use this... Yes, that's good to totally remove the E820_RESERVED_KERN because we already use memblock to reserve setup_data ranges. Thanks Joey Lee > Subject: [RFC PATCH] x86: Kill E820_RESERVED_KERN > > Now we are using memblock to do early resource allocation instead of using > e820 map directly, and setup_data is reserved in memblock early. > Also kexec pass setup_data pointer to second kernel, so second kernel > will reserve setup_data by their own. > > So we can kill E820_RESERVED_KERN and not touch e820 map at all. > > That will fix bug in mark_nonsave_region that can not handle that > case: E820_RAM and E820_RESERVED_KERN continuous and boundary is > not page aligned. > > Not sure about why tboot need to use this... > > Signed-off-by: Yinghai Lu <yinghai@kernel.org> > > --- > arch/x86/include/uapi/asm/e820.h | 9 --------- > arch/x86/kernel/e820.c | 6 ++---- > arch/x86/kernel/setup.c | 26 -------------------------- > arch/x86/kernel/tboot.c | 3 +-- > arch/x86/mm/init_64.c | 11 ++++------- > 5 files changed, 7 insertions(+), 48 deletions(-) > > Index: linux-2.6/arch/x86/include/uapi/asm/e820.h > =================================================================== > --- linux-2.6.orig/arch/x86/include/uapi/asm/e820.h > +++ linux-2.6/arch/x86/include/uapi/asm/e820.h > @@ -33,15 +33,6 @@ > #define E820_NVS 4 > #define E820_UNUSABLE 5 > > - > -/* > - * reserved RAM used by kernel itself > - * if CONFIG_INTEL_TXT is enabled, memory of this type will be > - * included in the S3 integrity calculation and so should not include > - * any memory that BIOS might alter over the S3 transition > - */ > -#define E820_RESERVED_KERN 128 > - > #ifndef __ASSEMBLY__ > #include <linux/types.h> > struct e820entry { > Index: linux-2.6/arch/x86/kernel/e820.c > =================================================================== > --- linux-2.6.orig/arch/x86/kernel/e820.c > +++ linux-2.6/arch/x86/kernel/e820.c > @@ -134,7 +134,6 @@ static void __init e820_print_type(u32 t > { > switch (type) { > case E820_RAM: > - case E820_RESERVED_KERN: > printk(KERN_CONT "usable"); > break; > case E820_RESERVED: > @@ -688,7 +687,7 @@ void __init e820_mark_nosave_regions(uns > register_nosave_region(pfn, PFN_UP(ei->addr)); > > pfn = PFN_DOWN(ei->addr + ei->size); > - if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN) > + if (ei->type != E820_RAM) > register_nosave_region(PFN_UP(ei->addr), pfn); > > if (pfn >= limit_pfn) > @@ -902,7 +901,6 @@ void __init finish_e820_parsing(void) > static inline const char *e820_type_to_string(int e820_type) > { > switch (e820_type) { > - case E820_RESERVED_KERN: > case E820_RAM: return "System RAM"; > case E820_ACPI: return "ACPI Tables"; > case E820_NVS: return "ACPI Non-volatile Storage"; > @@ -1077,7 +1075,7 @@ void __init memblock_x86_fill(void) > if (end != (resource_size_t)end) > continue; > > - if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN) > + if (ei->type != E820_RAM) > continue; > > memblock_add(ei->addr, ei->size); > Index: linux-2.6/arch/x86/kernel/setup.c > =================================================================== > --- linux-2.6.orig/arch/x86/kernel/setup.c > +++ linux-2.6/arch/x86/kernel/setup.c > @@ -456,30 +456,6 @@ static void __init parse_setup_data(void > } > } > > -static void __init e820_reserve_setup_data(void) > -{ > - struct setup_data *data; > - u64 pa_data; > - int found = 0; > - > - pa_data = boot_params.hdr.setup_data; > - while (pa_data) { > - data = early_memremap(pa_data, sizeof(*data)); > - e820_update_range(pa_data, sizeof(*data)+data->len, > - E820_RAM, E820_RESERVED_KERN); > - found = 1; > - pa_data = data->next; > - early_iounmap(data, sizeof(*data)); > - } > - if (!found) > - return; > - > - sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map); > - memcpy(&e820_saved, &e820, sizeof(struct e820map)); > - printk(KERN_INFO "extended physical RAM map:\n"); > - e820_print_map("reserve setup_data"); > -} > - > static void __init memblock_x86_reserve_range_setup_data(void) > { > struct setup_data *data; > @@ -1011,8 +987,6 @@ void __init setup_arch(char **cmdline_p) > early_dump_pci_devices(); > #endif > > - /* update the e820_saved too */ > - e820_reserve_setup_data(); > finish_e820_parsing(); > > if (efi_enabled(EFI_BOOT)) > Index: linux-2.6/arch/x86/kernel/tboot.c > =================================================================== > --- linux-2.6.orig/arch/x86/kernel/tboot.c > +++ linux-2.6/arch/x86/kernel/tboot.c > @@ -195,8 +195,7 @@ static int tboot_setup_sleep(void) > tboot->num_mac_regions = 0; > > for (i = 0; i < e820.nr_map; i++) { > - if ((e820.map[i].type != E820_RAM) > - && (e820.map[i].type != E820_RESERVED_KERN)) > + if (e820.map[i].type != E820_RAM) > continue; > > add_mac_region(e820.map[i].addr, e820.map[i].size); > Index: linux-2.6/arch/x86/mm/init_64.c > =================================================================== > --- linux-2.6.orig/arch/x86/mm/init_64.c > +++ linux-2.6/arch/x86/mm/init_64.c > @@ -426,8 +426,7 @@ phys_pte_init(pte_t *pte_page, unsigned > next = (addr & PAGE_MASK) + PAGE_SIZE; > if (addr >= end) { > if (!after_bootmem && > - !e820_any_mapped(addr & PAGE_MASK, next, E820_RAM) && > - !e820_any_mapped(addr & PAGE_MASK, next, E820_RESERVED_KERN)) > + !e820_any_mapped(addr & PAGE_MASK, next, E820_RAM)) > set_pte(pte, __pte(0)); > continue; > } > @@ -473,9 +472,8 @@ phys_pmd_init(pmd_t *pmd_page, unsigned > > next = (address & PMD_MASK) + PMD_SIZE; > if (address >= end) { > - if (!after_bootmem && > - !e820_any_mapped(address & PMD_MASK, next, E820_RAM) && > - !e820_any_mapped(address & PMD_MASK, next, E820_RESERVED_KERN)) > + if (!after_bootmem && !e820_any_mapped( > + address & PMD_MASK, next, E820_RAM)) > set_pmd(pmd, __pmd(0)); > continue; > } > @@ -548,8 +546,7 @@ phys_pud_init(pud_t *pud_page, unsigned > next = (addr & PUD_MASK) + PUD_SIZE; > if (addr >= end) { > if (!after_bootmem && > - !e820_any_mapped(addr & PUD_MASK, next, E820_RAM) && > - !e820_any_mapped(addr & PUD_MASK, next, E820_RESERVED_KERN)) > + !e820_any_mapped(addr & PUD_MASK, next, E820_RAM)) > set_pud(pud, __pud(0)); > continue; > } ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/mm, hibernate: Fix misjudgment of register setup_data page to nosave region 2015-01-30 14:46 ` joeyli @ 2015-01-30 16:28 ` joeyli 0 siblings, 0 replies; 6+ messages in thread From: joeyli @ 2015-01-30 16:28 UTC (permalink / raw) To: Yinghai Lu Cc: Lee, Chun-Yi, Huang Ying, Rafael J. Wysocki, Ingo Molnar, Pavel Machek, Thomas Gleixner, the arch/x86 maintainers, Linux Kernel Mailing List, Len Brown, H. Peter Anvin, Takashi Iwai On Fri, Jan 30, 2015 at 10:46:48PM +0800, joeyli wrote: > On Fri, Jan 30, 2015 at 12:30:00AM -0800, Yinghai Lu wrote: > > On Thu, Jan 29, 2015 at 7:58 PM, Lee, Chun-Yi <joeyli.kernel@gmail.com> wrote: > > > The reserve setup_data action break usable regions to not align to > > > page size. As following case: > > > > > > BIOS-e820: [mem 0x0000000000088000-0x00000000000bffff] reserved > > > BIOS-e820: [mem 0x0000000000100000-0x0000000094caffff] usable > > > ... > > > e820: update [mem 0x93c5f018-0x93c6f057] usable ==> usable /* reserve setup_data */ > > > e820: update [mem 0x93c4f018-0x93c5e057] usable ==> usable /* reserve setup_data */ > > > ... > > > reserve setup_data: [mem0x0000000000088000-0x00000000000bffff] reserved > > > reserve setup_data: [mem0x0000000000100000-0x0000000093c4f017] usable /* not align */ > > > reserve setup_data: [mem0x0000000093c4f018-0x0000000093c5e057] usable /* not align */ > > > reserve setup_data: [mem0x0000000093c5e058-0x0000000093c5f017] usable /* not align */ > > > reserve setup_data: [mem0x0000000093c5f018-0x0000000093c6f057] usable /* not align */ > > > reserve setup_data: [mem0x0000000093c6f058-0x0000000094caffff] usable > > > > > > The codes in e820_mark_nosave_regions() check pfn of each e820 > > > entry to find out the hole between two entries then register it to > > > nosave region. This logic misjudges the non-align but continuous usable > > > region, then register one page in usable region to nosave. As following: > > > > > > PM: Registered nosave memory: [mem 0x000c0000-0x000fffff] > > > PM: Registered nosave memory: [mem 0x93c4f000-0x93c4ffff] /* misjudgment */ > > > PM: Registered nosave memory: [mem 0x93c5e000-0x93c5efff] /* misjudgment */ > > > PM: Registered nosave memory: [mem 0x93c5f000-0x93c5ffff] /* misjudgment */ > > > PM: Registered nosave memory: [mem 0x93c6f000-0x93c6ffff] /* misjudgment */ > > > PM: Registered nosave memory: [mem 0x94cb0000-0x960affff] > > > > > > The issue causes some usable pages didn't collect to hibernate snapshot image. > > > And, it also misjudges a usable page in nosave regions then hibernate resuming > > > process interrupted by the unsafe pages checking: > > > https://bugzilla.opensuse.org/show_bug.cgi?id=913885 > > > > > > This patch changed the code in e820_mark_nosave_regions() to check the > > > address instead of pfn to avoid above issue. > > > > [+cc: Ying Huang] > > > > would like to suggest use attached instead: > > > > Subject: [RFC PATCH] x86: Kill E820_RESERVED_KERN > > > > Now we are using memblock to do early resource allocation instead of using > > e820 map directly, and setup_data is reserved in memblock early. > > Also kexec pass setup_data pointer to second kernel, so second kernel > > will reserve setup_data by their own. > > > > So we can kill E820_RESERVED_KERN and not touch e820 map at all. > > > > That will fix bug in mark_nonsave_region that can not handle that > > case: E820_RAM and E820_RESERVED_KERN continuous and boundary is > > not page aligned. > > > > Not sure about why tboot need to use this... > > Yes, that's good to totally remove the E820_RESERVED_KERN because we already > use memblock to reserve setup_data ranges. > > > Thanks > Joey Lee This patch works on my Gateway EG50_HW notebook to fix issue in nosave regions. Tested-by: Lee, Chun-Yi <jlee@suse.com> Thanks a lot! Joey Lee > > > Subject: [RFC PATCH] x86: Kill E820_RESERVED_KERN > > > > Now we are using memblock to do early resource allocation instead of using > > e820 map directly, and setup_data is reserved in memblock early. > > Also kexec pass setup_data pointer to second kernel, so second kernel > > will reserve setup_data by their own. > > > > So we can kill E820_RESERVED_KERN and not touch e820 map at all. > > > > That will fix bug in mark_nonsave_region that can not handle that > > case: E820_RAM and E820_RESERVED_KERN continuous and boundary is > > not page aligned. > > > > Not sure about why tboot need to use this... > > > > Signed-off-by: Yinghai Lu <yinghai@kernel.org> > > > > --- > > arch/x86/include/uapi/asm/e820.h | 9 --------- > > arch/x86/kernel/e820.c | 6 ++---- > > arch/x86/kernel/setup.c | 26 -------------------------- > > arch/x86/kernel/tboot.c | 3 +-- > > arch/x86/mm/init_64.c | 11 ++++------- > > 5 files changed, 7 insertions(+), 48 deletions(-) > > > > Index: linux-2.6/arch/x86/include/uapi/asm/e820.h > > =================================================================== > > --- linux-2.6.orig/arch/x86/include/uapi/asm/e820.h > > +++ linux-2.6/arch/x86/include/uapi/asm/e820.h > > @@ -33,15 +33,6 @@ > > #define E820_NVS 4 > > #define E820_UNUSABLE 5 > > > > - > > -/* > > - * reserved RAM used by kernel itself > > - * if CONFIG_INTEL_TXT is enabled, memory of this type will be > > - * included in the S3 integrity calculation and so should not include > > - * any memory that BIOS might alter over the S3 transition > > - */ > > -#define E820_RESERVED_KERN 128 > > - > > #ifndef __ASSEMBLY__ > > #include <linux/types.h> > > struct e820entry { > > Index: linux-2.6/arch/x86/kernel/e820.c > > =================================================================== > > --- linux-2.6.orig/arch/x86/kernel/e820.c > > +++ linux-2.6/arch/x86/kernel/e820.c > > @@ -134,7 +134,6 @@ static void __init e820_print_type(u32 t > > { > > switch (type) { > > case E820_RAM: > > - case E820_RESERVED_KERN: > > printk(KERN_CONT "usable"); > > break; > > case E820_RESERVED: > > @@ -688,7 +687,7 @@ void __init e820_mark_nosave_regions(uns > > register_nosave_region(pfn, PFN_UP(ei->addr)); > > > > pfn = PFN_DOWN(ei->addr + ei->size); > > - if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN) > > + if (ei->type != E820_RAM) > > register_nosave_region(PFN_UP(ei->addr), pfn); > > > > if (pfn >= limit_pfn) > > @@ -902,7 +901,6 @@ void __init finish_e820_parsing(void) > > static inline const char *e820_type_to_string(int e820_type) > > { > > switch (e820_type) { > > - case E820_RESERVED_KERN: > > case E820_RAM: return "System RAM"; > > case E820_ACPI: return "ACPI Tables"; > > case E820_NVS: return "ACPI Non-volatile Storage"; > > @@ -1077,7 +1075,7 @@ void __init memblock_x86_fill(void) > > if (end != (resource_size_t)end) > > continue; > > > > - if (ei->type != E820_RAM && ei->type != E820_RESERVED_KERN) > > + if (ei->type != E820_RAM) > > continue; > > > > memblock_add(ei->addr, ei->size); > > Index: linux-2.6/arch/x86/kernel/setup.c > > =================================================================== > > --- linux-2.6.orig/arch/x86/kernel/setup.c > > +++ linux-2.6/arch/x86/kernel/setup.c > > @@ -456,30 +456,6 @@ static void __init parse_setup_data(void > > } > > } > > > > -static void __init e820_reserve_setup_data(void) > > -{ > > - struct setup_data *data; > > - u64 pa_data; > > - int found = 0; > > - > > - pa_data = boot_params.hdr.setup_data; > > - while (pa_data) { > > - data = early_memremap(pa_data, sizeof(*data)); > > - e820_update_range(pa_data, sizeof(*data)+data->len, > > - E820_RAM, E820_RESERVED_KERN); > > - found = 1; > > - pa_data = data->next; > > - early_iounmap(data, sizeof(*data)); > > - } > > - if (!found) > > - return; > > - > > - sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map); > > - memcpy(&e820_saved, &e820, sizeof(struct e820map)); > > - printk(KERN_INFO "extended physical RAM map:\n"); > > - e820_print_map("reserve setup_data"); > > -} > > - > > static void __init memblock_x86_reserve_range_setup_data(void) > > { > > struct setup_data *data; > > @@ -1011,8 +987,6 @@ void __init setup_arch(char **cmdline_p) > > early_dump_pci_devices(); > > #endif > > > > - /* update the e820_saved too */ > > - e820_reserve_setup_data(); > > finish_e820_parsing(); > > > > if (efi_enabled(EFI_BOOT)) > > Index: linux-2.6/arch/x86/kernel/tboot.c > > =================================================================== > > --- linux-2.6.orig/arch/x86/kernel/tboot.c > > +++ linux-2.6/arch/x86/kernel/tboot.c > > @@ -195,8 +195,7 @@ static int tboot_setup_sleep(void) > > tboot->num_mac_regions = 0; > > > > for (i = 0; i < e820.nr_map; i++) { > > - if ((e820.map[i].type != E820_RAM) > > - && (e820.map[i].type != E820_RESERVED_KERN)) > > + if (e820.map[i].type != E820_RAM) > > continue; > > > > add_mac_region(e820.map[i].addr, e820.map[i].size); > > Index: linux-2.6/arch/x86/mm/init_64.c > > =================================================================== > > --- linux-2.6.orig/arch/x86/mm/init_64.c > > +++ linux-2.6/arch/x86/mm/init_64.c > > @@ -426,8 +426,7 @@ phys_pte_init(pte_t *pte_page, unsigned > > next = (addr & PAGE_MASK) + PAGE_SIZE; > > if (addr >= end) { > > if (!after_bootmem && > > - !e820_any_mapped(addr & PAGE_MASK, next, E820_RAM) && > > - !e820_any_mapped(addr & PAGE_MASK, next, E820_RESERVED_KERN)) > > + !e820_any_mapped(addr & PAGE_MASK, next, E820_RAM)) > > set_pte(pte, __pte(0)); > > continue; > > } > > @@ -473,9 +472,8 @@ phys_pmd_init(pmd_t *pmd_page, unsigned > > > > next = (address & PMD_MASK) + PMD_SIZE; > > if (address >= end) { > > - if (!after_bootmem && > > - !e820_any_mapped(address & PMD_MASK, next, E820_RAM) && > > - !e820_any_mapped(address & PMD_MASK, next, E820_RESERVED_KERN)) > > + if (!after_bootmem && !e820_any_mapped( > > + address & PMD_MASK, next, E820_RAM)) > > set_pmd(pmd, __pmd(0)); > > continue; > > } > > @@ -548,8 +546,7 @@ phys_pud_init(pud_t *pud_page, unsigned > > next = (addr & PUD_MASK) + PUD_SIZE; > > if (addr >= end) { > > if (!after_bootmem && > > - !e820_any_mapped(addr & PUD_MASK, next, E820_RAM) && > > - !e820_any_mapped(addr & PUD_MASK, next, E820_RESERVED_KERN)) > > + !e820_any_mapped(addr & PUD_MASK, next, E820_RAM)) > > set_pud(pud, __pud(0)); > > continue; > > } > ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2015-01-30 16:28 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-01-30 3:58 [PATCH] x86/mm, hibernate: Fix misjudgment of register setup_data page to nosave region Lee, Chun-Yi 2015-01-30 7:35 ` Yinghai Lu 2015-01-30 14:41 ` joeyli 2015-01-30 8:30 ` Yinghai Lu 2015-01-30 14:46 ` joeyli 2015-01-30 16:28 ` joeyli
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox