* Reserved page flaging of 2.4 kernel memory changed recently?
@ 2004-02-05 2:07 Michael Frank
2004-02-08 2:06 ` Andrea Arcangeli
0 siblings, 1 reply; 12+ messages in thread
From: Michael Frank @ 2004-02-05 2:07 UTC (permalink / raw)
To: linux-kernel; +Cc: Nigel Cunningham
The question is related to saving the kernel with swsusp.
Looking at 2.4.24 x86 kernel page flags, kernel memory is flagged reserved
the same way as video, BIOS pages.
Is this a recent change since using the aa vm and should it be like that?
If so, should hardware related reserved pages i.e. video, BIOS be flagged
PG_nosave upon init?
What about iomemory?
Michael
Note: (Flags & 0x4000) == PG_reserved
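The note above can be sketched as a small user-space helper for reading the FLAGS column of "kmem -p". This is only an illustration: flags_reserved() is a hypothetical name, and the bit number 14 is inferred from the 0x4000 mask stated in the note.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit number inferred from the note: 1 << 14 == 0x4000. */
#define PG_reserved 14

/* Return true if a FLAGS value printed by "kmem -p" has PG_reserved set. */
static bool flags_reserved(unsigned long flags)
{
    return (flags & (1UL << PG_reserved)) != 0;
}
```

With this, a FLAGS value of 4000 reads as reserved and a value of 0 does not.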
# crash vmlinux
crash 3.7-5.2
Copyright (C) 2002, 2003 Red Hat, Inc.
Copyright (C) 1998-2003 Hewlett-Packard Co
Copyright (C) 1999, 2002 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb Red Hat Linux (5.3post-0.20021129.36rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
6c 14
WARNING: net_init: unknown device type for net device
KERNEL: vmlinux
DUMPFILE: /dev/mem
CPUS: 1
DATE: Thu Feb 5 09:36:36 2004
UPTIME: 00:57:01
LOAD AVERAGE: 0.08, 0.02, 0.01
TASKS: 76
NODENAME: mhfl4
RELEASE: 2.4.24-mhf169
VERSION: #2 Sat Jan 31 16:03:07 HKT 2004
MACHINE: i686 (2399 Mhz)
MEMORY: 496 MB
PID: 1872
COMMAND: "crash"
TASK: cec66000
CPU: 0
STATE: TASK_RUNNING (ACTIVE)
crash> kmem -p
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
c100001c 0 0 0 0 4000
c1000048 1000 0 0 0 4000
c1000074 2000 0 0 0 4000
c10000a0 3000 0 0 0 4000
c10000cc 4000 0 0 0 0
c10000f8 5000 0 0 0 0
c1000124 6000 0 0 0 0
c1000150 7000 0 0 0 0
c100017c 8000 0 0 0 0
c10001a8 9000 0 0 0 0
c10001d4 a000 0 0 0 0
c1000200 b000 0 0 0 0
[]
c1001b70 9f000 0 0 0 4000
c1001b9c a0000 0 0 0 4000
c1001bc8 a1000 0 0 0 4000
c1001bf4 a2000 0 0 0 4000
c1001c20 a3000 0 0 0 4000
c1001c4c a4000 0 0 0 4000
c1001c78 a5000 0 0 0 4000
c1001ca4 a6000 0 0 0 4000
c1001cd0 a7000 0 0 0 4000
c1001cfc a8000 0 0 0 4000
c1001d28 a9000 0 0 0 4000
c1001d54 aa000 0 0 0 4000
c1001d80 ab000 0 0 0 4000
c1001dac ac000 0 0 0 4000
c1001dd8 ad000 0 0 0 4000
c1001e04 ae000 0 0 0 4000
[]
c1002b98 fd000 0 0 0 4000
c1002bc4 fe000 0 0 0 4000
c1002bf0 ff000 0 0 0 4000
c1002c1c 100000 0 0 0 4000
c1002c48 101000 0 0 0 4000
c1002c74 102000 0 0 0 4000
c1002ca0 103000 0 0 0 4000
c1002ccc 104000 0 0 1425 4000
c1002cf8 105000 0 0 0 4000
c1002d24 106000 0 0 0 4000
c1002d50 107000 0 0 0 4000
c1002d7c 108000 0 0 0 4000
c1002da8 109000 0 0 0 4000
[]
c100b2b0 40f000 0 0 0 4000
c100b2dc 410000 0 0 0 4000
c100b308 411000 0 0 0 4000
c100b334 412000 0 0 0 0
c100b360 413000 0 0 0 0
c100b38c 414000 0 0 0 0
c100b3b8 415000 0 0 0 0
c100b3e4 416000 0 0 0 0
c100b410 417000 0 0 0 0
c100b43c 418000 0 0 0 0
c100b468 419000 0 0 0 0
c100b494 41a000 0 0 0 0
c100b4c0 41b000 0 0 0 0
c100b4ec 41c000 0 0 0 0
c100b518 41d000 0 0 0 0
c100b544 41e000 0 0 0 0
c100b570 41f000 0 0 0 0
c100b59c 420000 0 0 0 0
c100b5c8 421000 0 0 0 0
c100b5f4 422000 0 0 0 0
c100b620 423000 0 0 0 0
c100b64c 424000 0 0 0 0
c100b678 425000 0 0 0 0
c100b6a4 426000 0 0 0 0
c100b6d0 427000 0 0 0 0
c100b6fc 428000 0 0 0 0
c100b728 429000 0 0 0 0
c100b754 42a000 0 0 0 0
c100b780 42b000 0 0 0 0
c100b7ac 42c000 0 0 0 0
c100b7d8 42d000 0 0 0 0
c100b804 42e000 0 0 0 0
c100b830 42f000 0 0 0 0
c100b85c 430000 0 0 0 4000
c100b888 431000 0 0 0 4000
c100b8b4 432000 0 0 0 4000
c100b8e0 433000 0 0 0 4000
c100b90c 434000 0 0 0 4000
c100b938 435000 0 0 0 4000
c100b964 436000 0 0 0 4000
c100b990 437000 0 0 0 4000
c100b9bc 438000 0 0 0 4000
c100b9e8 439000 0 0 0 4000
c100ba14 43a000 0 0 0 4000
[]
c100c6a0 483000 0 0 0 4000
c100c6cc 484000 0 0 0 4000
c100c6f8 485000 0 0 0 4000
c100c724 486000 0 0 0 4000
c100c750 487000 0 0 0 4000
c100c77c 488000 0 0 0 4000
c100c7a8 489000 0 0 0 0
c100c7d4 48a000 0 0 0 0
c100c800 48b000 0 0 0 0
c100c82c 48c000 0 0 0 0
c100c858 48d000 0 0 0 4000
c100c884 48e000 0 0 0 0
c100c8b0 48f000 0 0 0 0
c100c8dc 490000 0 0 0 0
c100c908 491000 0 0 0 0
c100c934 492000 0 0 0 0
c100c960 493000 0 0 0 0
c100c98c 494000 0 0 0 0
c100c9b8 495000 0 0 0 0
c100c9e4 496000 0 0 0 0
c100ca10 497000 0 0 0 0
c100ca3c 498000 0 0 0 0
c100ca68 499000 0 0 0 0
c100ca94 49a000 0 0 0 0
c100cac0 49b000 0 0 0 0
* Re: Reserved page flaging of 2.4 kernel memory changed recently?
2004-02-05 2:07 Reserved page flaging of 2.4 kernel memory changed recently? Michael Frank
@ 2004-02-08 2:06 ` Andrea Arcangeli
2004-02-10 15:24 ` Michael Frank
0 siblings, 1 reply; 12+ messages in thread
From: Andrea Arcangeli @ 2004-02-08 2:06 UTC (permalink / raw)
To: Michael Frank; +Cc: linux-kernel, Nigel Cunningham
On Thu, Feb 05, 2004 at 10:07:35AM +0800, Michael Frank wrote:
> The question is related to saving the kernel with swsusp.
>
> Looking at 2.4.24 x86 kernel page flags, kernel memory is flagged reserved
> the same way as video, BIOS pages.
>
> Is this a recent change since using the aa vm and should it be like that?
this is the same as 2.2 too, the reserved bit means this isn't a normal
"ram" page, this is either non-ram in the mem_map region or a ram page
being used by a device driver for source/destination dma or similar
special usage.
> If so, should hardware related reserved pages i.e. video, BIOS be flagged
> PG_nosave upon init?
the non-ram regions of the physical address space present in the
mem_map_t array are marked as reserved at boot.
About the ram pieces of the mem_map_t, it's by the time the device
driver needs some ram to do dma on it, that you alloc one page with
alloc_pages and you mark it reserved.
marking physical ram pages as reserved is only needed when you want to
make this page visible to userspace via ->mmap/mmap(2). if you only work
with copy_to_user/copy_from_user read(2)/write(2), nothing will change
if the page is reserved or not (same goes for the mmio areas part of the
mem_map_t array).
the PG_reserved plays a role by the time you map the page in userspace,
then a fork() won't copy-on-write, such a page will be shared, since
it's a special page that the hardware "owns", if you would copy-on-write
you couldn't talk with the device anymore on the copied page. After all
references to the device have been released, the release callback is run
by the vfs, so you know the page isn't mapped in userspace anymore and
if it's a ram page you can clear the PG_reserved and then free the page
(if you free the page w/o clearing PG_reserved first you'll leak memory
silently).
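The ordering described above (mark reserved while the device owns the page, clear PG_reserved before freeing) can be modelled in user space. This is a toy sketch, not kernel code: struct toy_page and every toy_* function are hypothetical stand-ins, and the "free is skipped for reserved pages" rule is just a model of the silent leak mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

#define PG_reserved 14

/* Toy stand-in for a page descriptor; not the kernel's struct page. */
struct toy_page {
    unsigned long flags;
    bool allocated;
};

static void toy_set_reserved(struct toy_page *p)
{
    p->flags |= (1UL << PG_reserved);
}

static void toy_clear_reserved(struct toy_page *p)
{
    p->flags &= ~(1UL << PG_reserved);
}

static bool toy_reserved(const struct toy_page *p)
{
    return ((p->flags >> PG_reserved) & 1UL) != 0;
}

/* Driver setup: take a RAM page and mark it reserved before mapping it. */
static void toy_driver_setup(struct toy_page *p)
{
    p->allocated = true;
    toy_set_reserved(p);
}

/*
 * Model of freeing: a page that is still reserved is not handed back,
 * which is the silent leak. Returns true if the page was really freed.
 */
static bool toy_free_page(struct toy_page *p)
{
    if (toy_reserved(p))
        return false;
    p->allocated = false;
    return true;
}

/* Driver release: clear PG_reserved first, then free. */
static bool toy_driver_release(struct toy_page *p)
{
    toy_clear_reserved(p);
    return toy_free_page(p);
}
```

Calling toy_free_page() on a page that is still reserved leaves it allocated, while toy_driver_release() returns it to the pool.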
Those regions normally are also marked VM_IO in the vma, to avoid ptrace
or rawio to mess with those dma pages, which isn't guaranteed to be safe
and could lockup the bus.
> What about iomemory?
iomemory (i.e. MMIO) is not ram and normally it doesn't fit by mistake
in the mem_map_t array either, so if there's no page struct they can't
be marked reserved either. The vm will automatically recognize and
treat pages outside the mem_map_t as reserved.
ioremap is needed to access MMIO memory and it's a different matter.
not sure what's the reason of the question though. with regard to
suspend to disk you should probably use the original e820 map to find if
the reserved pages are ram or non ram, the reserved ram pages should
probably be saved/restored, however the saving/restore process should be
probably directed by the device driver owning those reserved ram pages
to be very safe (can suspend to disk be made safe at all? :). the non
ram pages shouldn't need to be saved/restored (as you found there's the
bios in there). Basically you've to differentiate between reserved ram
pages and reserved non-ram (marked as reserved just because their
physical address fits in the mem_map_t array).
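The distinction drawn above, reserved RAM versus reserved non-RAM decided from the e820 map, might look roughly like the following user-space sketch. All toy_* names, the entry layout and the type constants are assumptions made for illustration; the real kernel keeps this information in its own e820 structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT    12
#define TOY_E820_RAM      1
#define TOY_E820_RESERVED 2

/* Hypothetical, simplified copy of one firmware memory map entry. */
struct toy_e820_entry {
    unsigned long long addr;
    unsigned long long size;
    int type;
};

/* A page counts as RAM only if it lies entirely inside one RAM entry. */
static bool toy_page_is_ram(const struct toy_e820_entry *map, size_t n,
                            unsigned long pfn)
{
    unsigned long long start = (unsigned long long)pfn << TOY_PAGE_SHIFT;
    unsigned long long end = start + (1ULL << TOY_PAGE_SHIFT);

    for (size_t i = 0; i < n; i++) {
        if (map[i].type != TOY_E820_RAM)
            continue;
        if (start >= map[i].addr && end <= map[i].addr + map[i].size)
            return true;
    }
    return false;
}

enum toy_action { TOY_SAVE_NORMAL, TOY_SAVE_VIA_DRIVER, TOY_SKIP };

/* Non-RAM is never touched; reserved RAM is left to the owning driver. */
static enum toy_action toy_classify(const struct toy_e820_entry *map, size_t n,
                                    unsigned long pfn, bool reserved)
{
    if (!toy_page_is_ram(map, n, pfn))
        return TOY_SKIP;
    return reserved ? TOY_SAVE_VIA_DRIVER : TOY_SAVE_NORMAL;
}
```

The point of the split is that the reserved bit alone cannot tell the two cases apart, so the map has to be consulted as well.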
I've seen in 2.6 there's a PG_nosave, but it seems to have a different
purpose than a "PG_ram" that tells you if the page is ram or not. From a
quick read of the code it seems all reserved pages are stored except the
ones in the nosave segment (which is also marked protected as part of
the static kernel .text). So in short it looks like we save/restore the
non-ram too, maybe it's ok, dunno but I would find it a lot safer not to
touch that non-ram.
* Re: Reserved page flaging of 2.4 kernel memory changed recently?
2004-02-08 2:06 ` Andrea Arcangeli
@ 2004-02-10 15:24 ` Michael Frank
2004-02-10 18:51 ` Andrea Arcangeli
2004-02-19 7:26 ` Pavel Machek
0 siblings, 2 replies; 12+ messages in thread
From: Michael Frank @ 2004-02-10 15:24 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Nigel Cunningham, linux-kernel
Andrea,
Thank you very much for your reply.
On Sunday 08 February 2004 10:06, Andrea Arcangeli wrote:
> not sure what's the reason of the question though. with regard to
Reason for the question is to gain more understanding for centralizing
flagging of pages which should not be touched.
I also was confused by having seen a netdump patch which uses the
PG_reserved bit wrongly, and asked if anything was changed recently
to explain whether this patch actually could work as is.
> suspend to disk you should probably use the original e820 map to find if
> the reserved pages are ram or non ram,
Generally yes, but the process should be centralized using the standard page
flag mechanism. This is the only clean and application-independent way.
Examples showing why centralisation matters, since one would not want to
deal with these issues separately in the implementation of swsusp, netdump
or debuggers:
- Drivers may be sensitive to reading/writing their DMA buffers without
considering ongoing transfers.
- (Embedded) devices iomemory could be mapped into DMA zone and
get trashed by parasitic accesses.
- A bad ppro (see patch) can be locked up by accessing "the wrong" page.
> the reserved ram pages should probably be saved/restored,
Except when marked PG_nosave ;)
> however the saving/restore process should be probably directed by the
> device driver owning those reserved ram pages to be very safe
Yes, exactly, only the driver knows what is going on with its data and
_must_ be made responsible for taking care of its data.
Also in case of DMA buffers, DMA must be properly managed (suspended)
by the driver.
- sidenote as to PM - Device buffers such as disk buffers and port FIFO's
(serial, USB serial...) must be flushed too by their drivers -
> (can suspend to disk be made safe at all? :).
Well, it seems that putting a kernel and drivers into a fridge and reviving
them on taking them out is like freezing and reviving hell :)
Nigel's Software suspend 2.0 is stable on 2.4 and 2.6 and well tested,
but there are issues affecting 1 in 1000, such as MCEs occurring -
more on that will follow.
And with drivers, we continue to have problems and have high hopes
for PM in 2.6.
> the non
> ram pages shouldn't need to be saved/restored (as you found there's the
> bios in there). Basically you've to differentiate between reserved ram
> pages and reserved non-ram (marked as reserved just because their
> physical address fits in the mem_map_t array).
It seems unclean and unsafe to touch non-RAM regions. On lots of
"proper" non-PC hardware, there would even be bus timeouts
if the location is not accessible (such as write to BIOS).
>
> I've seen in 2.6 there's a PG_nosave, but it seems to have a different
> purpose than a "PG_ram" that tells you if the page is ram or not. From a
> quick read of the code it seems all reserved pages are stored except the
> ones in the nosave segment (which is also marked protected as part of
> the static kernel .text). So in short it looks like we save/restore the
> non-ram too, maybe it's ok, dunno but I would find it a lot safer not to
> touch that non-ram.
>
By what I read on LKML, 64bit is probably more fussy than 32bit, e.g.
accessing non-existing memory with /dev/mem, such as on a system with
memory holes, often causes MCEs.
And here is an example of touching non-RAM going wrong on a x86 PC:
One swsusp user received an MCE on swsusp accessing 0xa0000 (video).
This seems to be quite recent hardware: an Athlon mobile XP 20000.
This Compaq evo is running alright with NOMCE on the commandline.
(I had posted a question about this on LKML -
"Reserved pages not flagged on Compaq evo? on 4 Feb", but found out
meanwhile (to my big surprise) that swsusp accesses the area in
question, thus likely rendering the question obsolete)
Here is a patch for 2.4.2[45], which marks non-ram, CPU-broken-pages, and
nosave kernel-pages with PG_nosave.
Applications such as swsusp, netdump or debuggers have just to check
the PG_nosave bit to be safe.
I actually would like to rename the bit PG_nosave to PG_donttouch ;)
diff -uN -r -X /home/mhf/sys/dont/dontdiff linux-2.4.24-Vanilla/arch/i386/mm/init.c linux-2.4.24-mhf179/arch/i386/mm/init.c
--- linux-2.4.24-Vanilla/arch/i386/mm/init.c 2004-01-21 15:53:01.000000000 +0800
+++ linux-2.4.24-mhf179/arch/i386/mm/init.c 2004-02-10 06:15:31.000000000 +0800
@@ -451,15 +451,18 @@
{
if (!page_is_ram(pfn)) {
SetPageReserved(page);
+ SetPageNosave(page);
return;
}
if (bad_ppro && page_kills_ppro(pfn)) {
SetPageReserved(page);
+ SetPageNosave(page);
return;
}
ClearPageReserved(page);
+ ClearPageNosave(page);
set_bit(PG_highmem, &page->flags);
atomic_set(&page->count, 1);
__free_page(page);
@@ -478,10 +481,18 @@
#endif
}
+
+/* The definition of a nosave region is part of software suspend
+ * and repeated to enable compilation and execution if this patch
+ * with a nosave region using a vanilla kernel.
+ */
+//extern char __nosave_begin, __nosave_end;
+
static int __init free_pages_init(void)
{
extern int ppro_with_ram_bug(void);
int bad_ppro, reservedpages, pfn;
+ unsigned long addr;
bad_ppro = ppro_with_ram_bug();
@@ -489,12 +500,29 @@
totalram_pages += free_all_bootmem();
reservedpages = 0;
- for (pfn = 0; pfn < max_low_pfn; pfn++) {
+ addr = (unsigned long)__va(0);
+ for (pfn = 0; pfn < max_low_pfn; pfn++, addr += PAGE_SIZE) {
+ if (page_is_ram(pfn)) {
+ /*
+ * Only count reserved RAM pages
+ */
+ if (PageReserved(mem_map+pfn))
+ reservedpages++;
+#if defined(__nosave_begin)
+ /*
+ * Mark nosave pages
+ */
+ if (addr < (unsigned long)&__nosave_begin ||
+ addr >= (unsigned long)&__nosave_end)
+ continue;
+#else
+ continue;
+#endif
+ }
/*
- * Only count reserved RAM pages
+ * All other pages such as non-RAM pages are always nosave
*/
- if (page_is_ram(pfn) && PageReserved(mem_map+pfn))
- reservedpages++;
+ SetPageNosave(mem_map+pfn);
}
#ifdef CONFIG_HIGHMEM
for (pfn = highend_pfn-1; pfn >= highstart_pfn; pfn--)
diff -uN -r -X /home/mhf/sys/dont/dontdiff linux-2.4.24-Vanilla/include/linux/mm.h linux-2.4.24-mhf179/include/linux/mm.h
--- linux-2.4.24-Vanilla/include/linux/mm.h 2004-02-06 15:40:59.000000000 +0800
+++ linux-2.4.24-mhf179/include/linux/mm.h 2004-02-10 01:19:40.000000000 +0800
@@ -301,6 +301,21 @@
#define PG_launder 15 /* written out by VM pressure.. */
#define PG_fs_1 16 /* Filesystem specific */
+/* This page is either part of the nosave region of software suspend or
+ * hardware reserved.
+ *
+ * This page may be part of ROM, device mapped memory or broken on the CPU
+ * in use and should _not_ ever be accessed except when in the software
+ * suspend nosave region or when refering device mapped memory.
+ *
+ * This page is not saved by software suspend, ignored by netdump and by
+ * debuggers performing memory dumps.
+ *
+ * Debuggers may wish to implement an overide to allow access to this page
+ * for specialized debugging.
+ */
+#define PG_nosave 17
+
#ifndef arch_set_page_uptodate
#define arch_set_page_uptodate(page)
#endif
@@ -327,6 +342,9 @@
#define SetPageLaunder(page) set_bit(PG_launder, &(page)->flags)
#define ClearPageLaunder(page) clear_bit(PG_launder, &(page)->flags)
#define ClearPageArch1(page) clear_bit(PG_arch_1, &(page)->flags)
+#define PageNosave(page) test_bit(PG_nosave, &(page)->flags)
+#define SetPageNosave(page) set_bit(PG_nosave, &(page)->flags)
+#define ClearPageNosave(page) clear_bit(PG_nosave, &(page)->flags)
/*
* The zone field is never updated after free_area_init_core()
What is your opinion of this approach?
BTW, The patch below is needed to run it with a nosave region on
Vanilla 2.4.2[45]:
diff -ruN linux-2.4.24/arch/i386/vmlinux.lds software-suspend-linux-2.4.24-rev7/arch/i386/vmlinux.lds
--- linux-2.4.24/arch/i386/vmlinux.lds 2004-01-22 19:46:03.000000000 +1300
+++ software-suspend-linux-2.4.24-rev7/arch/i386/vmlinux.lds 2004-01-30 15:23:38.000000000 +1300
@@ -53,6 +53,12 @@
__init_end = .;
. = ALIGN(4096);
+ __nosave_begin = .;
+ .data_nosave : { *(.data.nosave) }
+ . = ALIGN(4096);
+ __nosave_end = .;
+
+ . = ALIGN(4096);
.data.page_aligned : { *(.data.idt) }
. = ALIGN(32);
also uncomment in mm/init.c the line:
//extern char __nosave_begin, __nosave_end;
Regards
Michael
* Re: Reserved page flaging of 2.4 kernel memory changed recently?
2004-02-10 15:24 ` Michael Frank
@ 2004-02-10 18:51 ` Andrea Arcangeli
2004-02-10 19:38 ` Michael Frank
2004-02-11 8:36 ` Michael Frank
2004-02-19 7:26 ` Pavel Machek
1 sibling, 2 replies; 12+ messages in thread
From: Andrea Arcangeli @ 2004-02-10 18:51 UTC (permalink / raw)
To: Michael Frank; +Cc: Nigel Cunningham, linux-kernel
On Tue, Feb 10, 2004 at 11:24:01PM +0800, Michael Frank wrote:
> By what I read on LKML, 64bit is probably more fussy than 32bit. eg when
> accessing non-existing memory such as on a system with memory holes
> with /dev/mem often causes MCE's.
yes, this happens on ia64 and it may happen on x86-64 too.
> And here is an example of touching non-RAM going wrong on a x86 PC:
>
> One swsusp user received an MCE on swsusp accessing 0xa0000 (video).
> This seems to be quite recent hardware: an Athlon mobile XP 20000.
> This Compaq evo is running alright with NOMCE on the commandline.
this is possible too.
> Here is a patch for 2.4.2[45], which marks non-ram, CPU-broken-pages, and
> nosave kernel-pages with PG_nosave.
>
> Applications such as swsusp, netdump or debuggers have just to check
> the PG_nosave bit to be safe.
>
> I actually would like to rename the bit PG_nosave to PG_donttouch ;)
;)
>
> diff -uN -r -X /home/mhf/sys/dont/dontdiff linux-2.4.24-Vanilla/arch/i386/mm/init.c linux-2.4.24-mhf179/arch/i386/mm/init.c
> --- linux-2.4.24-Vanilla/arch/i386/mm/init.c 2004-01-21 15:53:01.000000000 +0800
> +++ linux-2.4.24-mhf179/arch/i386/mm/init.c 2004-02-10 06:15:31.000000000 +0800
> @@ -451,15 +451,18 @@
> {
> if (!page_is_ram(pfn)) {
> SetPageReserved(page);
> + SetPageNosave(page);
> return;
> }
>
> if (bad_ppro && page_kills_ppro(pfn)) {
> SetPageReserved(page);
> + SetPageNosave(page);
> return;
> }
>
> ClearPageReserved(page);
> + ClearPageNosave(page);
why this clearpagenosave? looks superfluous, you're not doing it in the
normal zone anyways.
> +#if defined(__nosave_begin)
this won't work right, __nosave_begin isn't a preprocessor thing so it
will be ignored when you uncomment it. You probably can use #if 0
instead and a comment near __nosave_begin to turn it to 1 when enabling
the suspend code.
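The preprocessor point above can be reproduced in plain C. In this sketch, toy_nosave_begin stands in for __nosave_begin, and TOY_HAVE_NOSAVE is a hypothetical hand-set switch playing the role of the suggested "#if 0" that gets turned to 1; neither name comes from the patch.

```c
#include <assert.h>

/* Declared as a C object, the way the patch declares __nosave_begin. */
extern char toy_nosave_begin;

/* defined() only sees macros, so this branch is never taken. */
#if defined(toy_nosave_begin)
#define TOY_SEEN_BY_DEFINED 1
#else
#define TOY_SEEN_BY_DEFINED 0
#endif

/* An explicit switch: set to 1 by hand when the suspend code is enabled. */
#define TOY_HAVE_NOSAVE 1

#if TOY_HAVE_NOSAVE
#define TOY_NOSAVE_ENABLED 1
#else
#define TOY_NOSAVE_ENABLED 0
#endif

static int toy_seen_by_defined(void)
{
    return TOY_SEEN_BY_DEFINED;
}

static int toy_nosave_enabled(void)
{
    return TOY_NOSAVE_ENABLED;
}
```

Even though the extern declaration is present, the defined() test evaluates to false, so the guarded code would be compiled out regardless of whether the declaration is uncommented.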
> What is your opinion of this approach?
except for the above two nitpicks, the patch is correct and needed for
safe suspend IMHO. 2.6 seems to miss this thing too, why not add it to
2.6 first?
> BTW, The patch below is needed to run it with a nosave region on
> Vanilla 2.4.2[45]:
>
> diff -ruN linux-2.4.24/arch/i386/vmlinux.lds software-suspend-linux-2.4.24-rev7/arch/i386/vmlinux.lds
> --- linux-2.4.24/arch/i386/vmlinux.lds 2004-01-22 19:46:03.000000000 +1300
> +++ software-suspend-linux-2.4.24-rev7/arch/i386/vmlinux.lds 2004-01-30 15:23:38.000000000 +1300
> @@ -53,6 +53,12 @@
> __init_end = .;
>
> . = ALIGN(4096);
> + __nosave_begin = .;
> + .data_nosave : { *(.data.nosave) }
> + . = ALIGN(4096);
> + __nosave_end = .;
> +
> + . = ALIGN(4096);
> .data.page_aligned : { *(.data.idt) }
>
> . = ALIGN(32);
>
> also uncomment in mm/init.c the line:
> //extern char __nosave_begin, __nosave_end;
yep.
* Re: Reserved page flaging of 2.4 kernel memory changed recently?
2004-02-10 18:51 ` Andrea Arcangeli
@ 2004-02-10 19:38 ` Michael Frank
2004-02-11 8:36 ` Michael Frank
1 sibling, 0 replies; 12+ messages in thread
From: Michael Frank @ 2004-02-10 19:38 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Nigel Cunningham, linux-kernel
On Wednesday 11 February 2004 02:51, Andrea Arcangeli wrote:
> On Tue, Feb 10, 2004 at 11:24:01PM +0800, Michael Frank wrote:
> >
> > diff -uN -r -X /home/mhf/sys/dont/dontdiff linux-2.4.24-Vanilla/arch/i386/mm/init.c linux-2.4.24-mhf179/arch/i386/mm/init.c
> > --- linux-2.4.24-Vanilla/arch/i386/mm/init.c 2004-01-21 15:53:01.000000000 +0800
> > +++ linux-2.4.24-mhf179/arch/i386/mm/init.c 2004-02-10 06:15:31.000000000 +0800
> > @@ -451,15 +451,18 @@
> > {
> > if (!page_is_ram(pfn)) {
> > SetPageReserved(page);
> > + SetPageNosave(page);
> > return;
> > }
> >
> > if (bad_ppro && page_kills_ppro(pfn)) {
> > SetPageReserved(page);
> > + SetPageNosave(page);
> > return;
> > }
> >
> > ClearPageReserved(page);
> > + ClearPageNosave(page);
>
> why this clearpagenosave? looks superfluous, you're not doing it in the
> normal zone anyways.
I'll sleep on it and get back to you with my arguments.
>
> > +#if defined(__nosave_begin)
>
> this won't work right, __nosave_begin isn't a preprocessor thing so it
> will be ignored when you uncomment it. You probably can use #if 0
> instead and a comment near __nosave_begin to turn it to 1 when enabling
> the suspend code.
Oh sh*t, this is what one gets for fixing things up for a demo after
a long night... Will bite my lower rear portion after the nap.
>
> > What is your opinion of this approach?
>
> except for the above two nitpicks, the patch is correct and needed for
> safe suspend IMHO. 2.6 seems to miss this thing too, why not add it to
> 2.6 first?
Swsusp won't be in 2.4 anyway, if Nigel accepts the patch, it will become part
of his next releases for 2.4 and 2.6.
Anyway, I'll fix the patch up for 2.6, test it and post the patch in a few days.
Regards
Michael
* Re: Reserved page flaging of 2.4 kernel memory changed recently?
2004-02-10 18:51 ` Andrea Arcangeli
2004-02-10 19:38 ` Michael Frank
@ 2004-02-11 8:36 ` Michael Frank
1 sibling, 0 replies; 12+ messages in thread
From: Michael Frank @ 2004-02-11 8:36 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Nigel Cunningham, linux-kernel
On Wednesday 11 February 2004 02:51, Andrea Arcangeli wrote:
> On Tue, Feb 10, 2004 at 11:24:01PM +0800, Michael Frank wrote:
> > + ClearPageNosave(page);
>
> why this clearpagenosave? looks superfluous, you're not doing it in the
> normal zone anyways.
OK, this gets removed.
Regards
Michael
* Re: Reserved page flaging of 2.4 kernel memory changed recently?
2004-02-10 15:24 ` Michael Frank
2004-02-10 18:51 ` Andrea Arcangeli
@ 2004-02-19 7:26 ` Pavel Machek
2004-02-19 9:00 ` Michael Frank
1 sibling, 1 reply; 12+ messages in thread
From: Pavel Machek @ 2004-02-19 7:26 UTC (permalink / raw)
To: Michael Frank; +Cc: Andrea Arcangeli, Nigel Cunningham, linux-kernel
Hi!
> I actually would like to rename the bit PG_nosave to PG_donttouch ;)
It's used for swsusp internal data, too...
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms
* Re: Reserved page flaging of 2.4 kernel memory changed recently?
2004-02-19 7:26 ` Pavel Machek
@ 2004-02-19 9:00 ` Michael Frank
2004-02-19 16:14 ` Pavel Machek
0 siblings, 1 reply; 12+ messages in thread
From: Michael Frank @ 2004-02-19 9:00 UTC (permalink / raw)
To: Pavel Machek; +Cc: Andrea Arcangeli, Nigel Cunningham, linux-kernel
On Thu, 19 Feb 2004 08:26:30 +0100, Pavel Machek <pavel@suse.cz> wrote:
> Hi!
>
mhf wrote:
>> I actually would like to rename the bit PG_nosave to PG_donttouch ;)
to make a point with regard to:
no transfer of page contents during suspend/resume
no netdump
no debugger access without override
... but the name does not matter and we do not have to change it.
>
> It's used for swsusp internal data, too...
Yes of course - how else would swsusp run, but these data are also not
"touched" during suspend and resume wrt transfer of page content.
x86 Pages for PG_nosave:
Video/BIOS 0xA0000-0XFFFFF
Anything reserved < max_pfn
Pentium 2 broken highmem pages
Driver specific areas in DMA zone are also thinkable
.. or else you get MCEs or possibly crashes on newer x86 HW, and on 64-bit
for sure.
- we had an MCE recently at 0xa0000 on an Athlon XP and I went digging...
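The first item of the list above, the video/BIOS window from 0xA0000 to 0xFFFFF, translates into a page frame range as sketched below. The helper name and the 4 KB page size constant are assumptions for illustration; the other list items (reserved areas below max_pfn, broken highmem pages, driver areas) need information this sketch does not have.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SHIFT 12   /* 4 KB pages, as on x86 */

/* Physical window named in the list above. */
#define TOY_VIDEO_BIOS_START 0xA0000UL
#define TOY_VIDEO_BIOS_END   0xFFFFFUL   /* inclusive */

/* True if the page frame lies in the legacy video/BIOS window. */
static bool toy_is_video_bios_pfn(unsigned long pfn)
{
    return pfn >= (TOY_VIDEO_BIOS_START >> TOY_PAGE_SHIFT) &&
           pfn <= (TOY_VIDEO_BIOS_END >> TOY_PAGE_SHIFT);
}
```

Page frames 0xa0 through 0xff fall inside the window, which includes the 0xa0000 address where the MCE was seen.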
Regards
Michael
--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
* Re: Reserved page flaging of 2.4 kernel memory changed recently?
2004-02-19 9:00 ` Michael Frank
@ 2004-02-19 16:14 ` Pavel Machek
2004-02-19 17:37 ` Michael Frank
0 siblings, 1 reply; 12+ messages in thread
From: Pavel Machek @ 2004-02-19 16:14 UTC (permalink / raw)
To: Michael Frank; +Cc: Andrea Arcangeli, Nigel Cunningham, linux-kernel
Hi!
> >>I actually would like to rename the bit PG_nosave to PG_donttouch ;)
>
> to make a point with regard to:
>
> no transfer of page contents during suspend/resume
> no netdump
> no debugger access without override
>
> ... but the name does not matter and we do not have to change it.
>
> >It's used for swsusp internal data, too...
>
> Yes of course - how else would swsusp run, but these data are also not
> "touched"
> during suspend and resume wrt transfer of page content.
Yes. But I still want to be able to access swsusp internal data
through debugger, and I want them in the netdump.
That means that PG_nosave | PG_reserved indeed is "PG_donttouch", but
PG_nosave has slightly different meaning.
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
* Re: Reserved page flaging of 2.4 kernel memory changed recently?
2004-02-19 17:37 ` Michael Frank
@ 2004-02-19 17:35 ` Pavel Machek
2004-02-19 17:59 ` Michael Frank
0 siblings, 1 reply; 12+ messages in thread
From: Pavel Machek @ 2004-02-19 17:35 UTC (permalink / raw)
To: Michael Frank; +Cc: Andrea Arcangeli, Nigel Cunningham, linux-kernel
Hi!
> >That means that PG_nosave | PG_reserved indeed is "PG_donttouch", but
> >PG_nosave has slightly different meaning.
>
> Makes sense, but PG_reserved is used to keep VM out of these pages.
>
> Can we have a separate bit PG_donttouch which is set with PG_nosave
> | PG_reserved in reserved/video/BIOS/Broken CPU areas?
Why?
I do not see what is wrong with 2 separate flags... In fact, you might
want to
#define PG_donttouch (PG_reserved | PG_nosave)
and (modulo atomic macros etc), it would work for everyone...
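The "modulo atomic macros" caveat matters because PG_reserved and PG_nosave are bit numbers, not masks, so OR-ing them directly would not give a usable value. A minimal sketch of the idea with explicit masks follows; TOY_MASK, TOY_DONTTOUCH_MASK and toy_donttouch() are hypothetical names, and the bit numbers are the ones used in this thread (14 and 17).

```c
#include <assert.h>
#include <stdbool.h>

#define PG_reserved 14
#define PG_nosave   17

/* Turn a bit number into a mask before combining. */
#define TOY_MASK(bit)      (1UL << (bit))
#define TOY_DONTTOUCH_MASK (TOY_MASK(PG_reserved) | TOY_MASK(PG_nosave))

/* "Don't touch" here means both bits are set at once. */
static bool toy_donttouch(unsigned long flags)
{
    return (flags & TOY_DONTTOUCH_MASK) == TOY_DONTTOUCH_MASK;
}
```

A page with only one of the two bits set is not considered "don't touch" under this reading.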
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
* Re: Reserved page flaging of 2.4 kernel memory changed recently?
2004-02-19 16:14 ` Pavel Machek
@ 2004-02-19 17:37 ` Michael Frank
2004-02-19 17:35 ` Pavel Machek
0 siblings, 1 reply; 12+ messages in thread
From: Michael Frank @ 2004-02-19 17:37 UTC (permalink / raw)
To: Pavel Machek; +Cc: Andrea Arcangeli, Nigel Cunningham, linux-kernel
On Thu, 19 Feb 2004 17:14:55 +0100, Pavel Machek <pavel@suse.cz> wrote:
> Hi!
>
>> >>I actually would like to rename the bit PG_nosave to PG_donttouch ;)
>>
>> to make a point with regard to:
>>
>> no transfer of page contents during suspend/resume
>> no netdump
>> no debugger access without override
>>
>> ... but the name does not matter and we do not have to change it.
>>
>> >It's used for swsusp internal data, too...
>>
>> Yes of course - how else would swsusp run, but these data are also not
>> "touched"
>> during suspend and resume wrt transfer of page content.
>
> Yes. But I still want to be able to access swsusp internal data
> through debugger, and I want them in the netdump.
>
> That means that PG_nosave | PG_reserved indeed is "PG_donttouch", but
> PG_nosave has slightly different meaning.
Makes sense, but PG_reserved is used to keep VM out of these pages.
Can we have a separate bit PG_donttouch which is set with PG_nosave
| PG_reserved in reserved/video/BIOS/Broken CPU areas?
This way
- debugger and netdump use PG_donttouch to prevent accesses
which might result in MCE's and CPU crashes.
- Swsusp uses PG_nosave.
- VM continues to use PG_reserved.
This is also a safe provision for hardware/driver changes.
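The division of labour proposed above, with one flag per consumer, can be sketched as follows. The bit number 18 for PG_donttouch is purely hypothetical, as are the region names and all toy_* helpers; only PG_reserved (14) and PG_nosave (17) come from the thread.

```c
#include <assert.h>
#include <stdbool.h>

#define PG_reserved  14
#define PG_nosave    17
#define PG_donttouch 18   /* hypothetical bit number, not from any patch */

#define TOY_BIT(n) (1UL << (n))

enum toy_region {
    TOY_NORMAL_RAM,       /* ordinary pages managed by the VM */
    TOY_KERNEL_RESERVED,  /* kernel image and similar reserved RAM */
    TOY_SWSUSP_NOSAVE,    /* swsusp internal data */
    TOY_HW_HOLE           /* reserved/video/BIOS/broken-CPU areas */
};

/* Flags each kind of region would carry under the proposal. */
static unsigned long toy_flags_for(enum toy_region r)
{
    switch (r) {
    case TOY_KERNEL_RESERVED:
        return TOY_BIT(PG_reserved);
    case TOY_SWSUSP_NOSAVE:
        return TOY_BIT(PG_reserved) | TOY_BIT(PG_nosave);
    case TOY_HW_HOLE:
        return TOY_BIT(PG_reserved) | TOY_BIT(PG_nosave) |
               TOY_BIT(PG_donttouch);
    default:
        return 0UL;
    }
}

/* Each consumer looks at exactly one bit. */
static bool toy_vm_may_manage(unsigned long flags)
{
    return (flags & TOY_BIT(PG_reserved)) == 0;
}

static bool toy_swsusp_saves(unsigned long flags)
{
    return (flags & TOY_BIT(PG_nosave)) == 0;
}

static bool toy_debugger_may_read(unsigned long flags)
{
    return (flags & TOY_BIT(PG_donttouch)) == 0;
}
```

Under this layout the swsusp nosave pages stay readable for debuggers and netdump, while the hardware holes are blocked for everyone.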
As an example of using PG_nosave and PG_reserved, here are the actual
page flags of a running 2.4.24 kernel using PG_nosave for
video/BIOS, shown using the crash utility:
Flags 4000 is PG_reserved
Flags 20000 is PG_nosave
By my proposal we would set PG_donttouch as well where PG_reserved
&& PG_nosave are set right now.
crash> kmem -p
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
Interrupt vectors
c100001c 0 0 0 0 4000
c1000048 1000 0 0 0 4000
c1000074 2000 0 0 0 4000
c10000a0 3000 0 0 0 4000
Main memory
c10000cc 4000 deb641c4 3862 2 c8
c10000f8 5000 deb641c4 3861 2 c8
c1000124 6000 deb641c4 3860 2 c8
c1000150 7000 deb641c4 3859 2 c8
c100017c 8000 0 1 2 4c
[]
c1000ca8 49000 0 4 2 4c
c1000cd4 4a000 0 15 2 4c
c1000d00 4b000 0 0 2 4c
c1000d2c 4c000 0 4 2 4c
c1000d58 4d000 0 3 2 4c
c1000d84 4e000 0 7 2 4c
c1000db0 4f000 0 6 2 4c
c1000ddc 50000 cb35b6a4 669 2 4c
c1000e08 51000 cb35b6a4 668 2 4c
c1000e34 52000 cb35b6a4 667 2 4c
c1000e60 53000 cb35b6a4 666 2 4c
c1000e8c 54000 dee6e564 7085941 2 c0
c1000eb8 55000 cb35b6a4 604 2 4c
c1000ee4 56000 0 25308 1 108
c1000f10 57000 cb35b6a4 605 2 4c
c1000f3c 58000 cb35b6a4 606 2 4c
c1000f68 59000 0 0 2 4c
c1000f94 5a000 cb35b6a4 615 2 4c
[]
c1001a94 9a000 deb641c4 3818 2 c8
c1001ac0 9b000 deb641c4 3817 2 c8
c1001aec 9c000 deb641c4 3826 2 c8
c1001b18 9d000 deb641c4 3825 2 c8
c1001b44 9e000 cb35b8a4 1 2 4c
Donttouch reserved/video/BIOS area
c1001b70 9f000 0 0 0 24000
c1001b9c a0000 0 0 0 24000
c1001bc8 a1000 0 0 0 24000
c1001bf4 a2000 0 0 0 24000
c1001c20 a3000 0 0 0 24000
c1001c4c a4000 0 0 0 24000
c1001c78 a5000 0 0 0 24000
c1001ca4 a6000 0 0 0 24000
c1001cd0 a7000 0 0 0 24000
c1001cfc a8000 0 0 0 24000
c1001d28 a9000 0 0 0 24000
[]
c1002b14 fa000 0 0 0 24000
c1002b40 fb000 0 0 0 24000
c1002b6c fc000 0 0 0 24000
c1002b98 fd000 0 0 0 24000
c1002bc4 fe000 0 0 0 24000
c1002bf0 ff000 0 0 0 24000
Kernel
c1002c1c 100000 0 0 0 4000
c1002c48 101000 0 0 0 4000
c1002c74 102000 0 0 0 4000
c1002ca0 103000 0 0 0 4000
c1002ccc 104000 0 0 11875 4000
[]
c100b360 413000 0 0 0 4000
c100b38c 414000 0 0 0 4000
c100b3b8 415000 0 0 0 4000
Here is 120K ex init memory
c100b3e4 416000 deb641c4 3820 2 c8
c100b410 417000 deb641c4 3819 2 c8
c100b43c 418000 deb641c4 3866 2 c8
c100b468 419000 deb641c4 3865 2 c8
c100b494 41a000 deb641c4 3864 2 c8
c100b4c0 41b000 deb641c4 3863 2 c8
c100b4ec 41c000 deb641c4 3828 2 c8
c100b518 41d000 deb641c4 3827 2 c8
c100b544 41e000 0 0 1 100
c100b570 41f000 deb641c4 3807 2 c8
c100b59c 420000 cb35b6a4 3652 2 c8
c100b5c8 421000 cb35b6a4 3651 2 c8
c100b5f4 422000 cb35b6a4 3650 2 c8
c100b620 423000 cb35b6a4 3649 2 c8
c100b64c 424000 cb35b6a4 113 2 cc
c100b678 425000 0 0 1 100
c100b6a4 426000 0 0 1 5c
c100b6d0 427000 0 1 1 5c
c100b6fc 428000 0 25405 1 108
c100b728 429000 cb35b6a4 498 2 cc
c100b754 42a000 cb35b6a4 543 2 cc
c100b780 42b000 cb35b6a4 542 2 cc
c100b7ac 42c000 cb35b6a4 365 2 cc
c100b7d8 42d000 cb35b6a4 364 2 cc
c100b804 42e000 cb35b6a4 484 2 cc
c100b830 42f000 dee6e564 7077953 2 c0
c100b85c 430000 dee6e564 7078648 2 c0
c100b888 431000 cb35b6a4 508 2 cc
c100b8b4 432000 cb35b6a4 561 2 4c
c100b8e0 433000 cb35b6a4 560 2 4c
swsusp nosave area - 1 page
c100b90c 434000 0 0 0 24000
some more kernel data
c100b938 435000 0 0 0 4000
c100b964 436000 0 0 0 4000
c100b990 437000 0 0 0 4000
c100b9bc 438000 0 0 0 4000
c100b9e8 439000 0 0 0 4000
[]
c100c800 48b000 0 0 0 4000
c100c82c 48c000 0 0 0 4000
End of kernel data / main memory
c100c858 48d000 cb35b6a4 116 2 cc
c100c884 48e000 deb641c4 3822 2 c8
c100c8b0 48f000 deb641c4 3821 2 c8
c100c8dc 490000 cb35b6a4 117 2 cc
...
Regards
Michael
* Re: Reserved page flaging of 2.4 kernel memory changed recently?
2004-02-19 17:35 ` Pavel Machek
@ 2004-02-19 17:59 ` Michael Frank
0 siblings, 0 replies; 12+ messages in thread
From: Michael Frank @ 2004-02-19 17:59 UTC (permalink / raw)
To: Pavel Machek; +Cc: Andrea Arcangeli, Nigel Cunningham, linux-kernel
On Thu, 19 Feb 2004 18:35:14 +0100, Pavel Machek <pavel@suse.cz> wrote:
> Hi!
>
>> >That means that PG_nosave | PG_reserved indeed is "PG_donttouch", but
>> >PG_nosave has slightly different meaning.
>>
>> Makes sense, but PG_reserved is used to keep VM out of these pages.
>>
>> Can we have a separate bit PG_donttouch which is set with PG_nosave
>> | PG_reserved in reserved/video/BIOS/Broken CPU areas?
>
> Why?
>
> I do not see what is wrong with 2 separate flags... In fact, you might
> want to
>
> #define PG_donttouch (PG_reserved | PG_nosave)
>
> and (modulo atomic macros etc), it would work for everyone...
>
As your earlier post pointed out, it would not work in swsusp nosave area
which is only PG_reserved | PG_nosave.
Are we too short of bits ? ;)
What about:
- export swsusp __nosave range for netdump override to dump __nosave page(s)
- debugger (linked in) uses swsusp __nosave range to enable access to __nosave page(s)
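The override idea in the two points above could be sketched like this: a dumper refuses pages that carry both PG_reserved and PG_nosave unless the address falls inside the exported nosave range. The struct toy_range type and the toy_* functions are hypothetical; in a real kernel the bounds would come from the __nosave_begin and __nosave_end linker symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PG_reserved 14
#define PG_nosave   17

#define TOY_BOTH ((1UL << PG_reserved) | (1UL << PG_nosave))

/* Half-open range [begin, end), standing in for the nosave section bounds. */
struct toy_range {
    uintptr_t begin;
    uintptr_t end;
};

static bool toy_in_nosave(const struct toy_range *r, uintptr_t addr)
{
    return addr >= r->begin && addr < r->end;
}

/*
 * Pages without both bits are readable as usual. Pages with both bits
 * are readable only when they belong to the swsusp nosave range.
 */
static bool toy_dump_may_read(unsigned long flags, uintptr_t addr,
                              const struct toy_range *nosave)
{
    if ((flags & TOY_BOTH) != TOY_BOTH)
        return true;
    return toy_in_nosave(nosave, addr);
}
```

This keeps the flag count at two, at the cost of every dumper or debugger needing to know the nosave range.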
Regards
Michael