* Reserved page flaging of 2.4 kernel memory changed recently?
@ 2004-02-05  2:07 Michael Frank
  2004-02-08  2:06 ` Andrea Arcangeli
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Frank @ 2004-02-05  2:07 UTC
  To: linux-kernel; +Cc: Nigel Cunningham

The question is related to saving the kernel with swsusp.

Looking at 2.4.24 x86 kernel page flags, kernel memory is flagged reserved
the same way as video and BIOS pages.

Is this a recent change since using the aa VM, and should it be like that?

If so, should hardware-related reserved pages (e.g. video, BIOS) be flagged
PG_nosave upon init?

What about iomemory?

Michael

Note: (Flags & 0x4000) == PG_reserved
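
The FLAGS column in the kmem -p listing that follows is the raw page flags
word in hex. A minimal sketch of decoding it, assuming PG_reserved is bit 14
as the 0x4000 mask in the note implies; the helper name is made up for
illustration and is not part of crash or the kernel:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Assumption: PG_reserved is bit 14, matching the 0x4000 mask in the note. */
#define PG_reserved 14

/* Hypothetical helper: parse the hex FLAGS column of "kmem -p" output
 * and report whether the reserved bit is set. */
static bool flags_column_is_reserved(const char *hex_flags)
{
	unsigned long flags = strtoul(hex_flags, NULL, 16);

	return (flags & (1UL << PG_reserved)) != 0;
}
```

With this, a FLAGS value of "4000" decodes as reserved and "0" as not reserved.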

# crash vmlinux
crash 3.7-5.2
Copyright (C) 2002, 2003  Red Hat, Inc.
Copyright (C) 1998-2003  Hewlett-Packard Co
Copyright (C) 1999, 2002  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb Red Hat Linux (5.3post-0.20021129.36rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...

6c 14
WARNING: net_init: unknown device type for net device

      KERNEL: vmlinux
    DUMPFILE: /dev/mem
        CPUS: 1
        DATE: Thu Feb  5 09:36:36 2004
      UPTIME: 00:57:01
LOAD AVERAGE: 0.08, 0.02, 0.01
       TASKS: 76
    NODENAME: mhfl4
     RELEASE: 2.4.24-mhf169
     VERSION: #2 Sat Jan 31 16:03:07 HKT 2004
     MACHINE: i686  (2399 Mhz)
      MEMORY: 496 MB
         PID: 1872
     COMMAND: "crash"
        TASK: cec66000
         CPU: 0
       STATE: TASK_RUNNING (ACTIVE)

crash> kmem -p
  PAGE    PHYSICAL   MAPPING    INDEX CNT FLAGS
c100001c         0         0         0  0 4000
c1000048      1000         0         0  0 4000
c1000074      2000         0         0  0 4000
c10000a0      3000         0         0  0 4000
c10000cc      4000         0         0  0 0
c10000f8      5000         0         0  0 0
c1000124      6000         0         0  0 0
c1000150      7000         0         0  0 0
c100017c      8000         0         0  0 0
c10001a8      9000         0         0  0 0
c10001d4      a000         0         0  0 0
c1000200      b000         0         0  0 0
[]
c1001b70     9f000         0         0  0 4000
c1001b9c     a0000         0         0  0 4000
c1001bc8     a1000         0         0  0 4000
c1001bf4     a2000         0         0  0 4000
c1001c20     a3000         0         0  0 4000
c1001c4c     a4000         0         0  0 4000
c1001c78     a5000         0         0  0 4000
c1001ca4     a6000         0         0  0 4000
c1001cd0     a7000         0         0  0 4000
c1001cfc     a8000         0         0  0 4000
c1001d28     a9000         0         0  0 4000
c1001d54     aa000         0         0  0 4000
c1001d80     ab000         0         0  0 4000
c1001dac     ac000         0         0  0 4000
c1001dd8     ad000         0         0  0 4000
c1001e04     ae000         0         0  0 4000
[]
c1002b98     fd000         0         0  0 4000
c1002bc4     fe000         0         0  0 4000
c1002bf0     ff000         0         0  0 4000
c1002c1c    100000         0         0  0 4000
c1002c48    101000         0         0  0 4000
c1002c74    102000         0         0  0 4000
c1002ca0    103000         0         0  0 4000
c1002ccc    104000         0         0 1425 4000
c1002cf8    105000         0         0  0 4000
c1002d24    106000         0         0  0 4000
c1002d50    107000         0         0  0 4000
c1002d7c    108000         0         0  0 4000
c1002da8    109000         0         0  0 4000
[]
c100b2b0    40f000         0         0  0 4000
c100b2dc    410000         0         0  0 4000
c100b308    411000         0         0  0 4000
c100b334    412000         0         0  0 0
c100b360    413000         0         0  0 0
c100b38c    414000         0         0  0 0
c100b3b8    415000         0         0  0 0
c100b3e4    416000         0         0  0 0
c100b410    417000         0         0  0 0
c100b43c    418000         0         0  0 0
c100b468    419000         0         0  0 0
c100b494    41a000         0         0  0 0
c100b4c0    41b000         0         0  0 0
c100b4ec    41c000         0         0  0 0
c100b518    41d000         0         0  0 0
c100b544    41e000         0         0  0 0
c100b570    41f000         0         0  0 0
c100b59c    420000         0         0  0 0
c100b5c8    421000         0         0  0 0
c100b5f4    422000         0         0  0 0
c100b620    423000         0         0  0 0
c100b64c    424000         0         0  0 0
c100b678    425000         0         0  0 0
c100b6a4    426000         0         0  0 0
c100b6d0    427000         0         0  0 0
c100b6fc    428000         0         0  0 0
c100b728    429000         0         0  0 0
c100b754    42a000         0         0  0 0
c100b780    42b000         0         0  0 0
c100b7ac    42c000         0         0  0 0
c100b7d8    42d000         0         0  0 0
c100b804    42e000         0         0  0 0
c100b830    42f000         0         0  0 0
c100b85c    430000         0         0  0 4000
c100b888    431000         0         0  0 4000
c100b8b4    432000         0         0  0 4000
c100b8e0    433000         0         0  0 4000
c100b90c    434000         0         0  0 4000
c100b938    435000         0         0  0 4000
c100b964    436000         0         0  0 4000
c100b990    437000         0         0  0 4000
c100b9bc    438000         0         0  0 4000
c100b9e8    439000         0         0  0 4000
c100ba14    43a000         0         0  0 4000
[]
c100c6a0    483000         0         0  0 4000
c100c6cc    484000         0         0  0 4000
c100c6f8    485000         0         0  0 4000
c100c724    486000         0         0  0 4000
c100c750    487000         0         0  0 4000
c100c77c    488000         0         0  0 4000
c100c7a8    489000         0         0  0 0
c100c7d4    48a000         0         0  0 0
c100c800    48b000         0         0  0 0
c100c82c    48c000         0         0  0 0
c100c858    48d000         0         0  0 4000
c100c884    48e000         0         0  0 0
c100c8b0    48f000         0         0  0 0
c100c8dc    490000         0         0  0 0
c100c908    491000         0         0  0 0
c100c934    492000         0         0  0 0
c100c960    493000         0         0  0 0
c100c98c    494000         0         0  0 0
c100c9b8    495000         0         0  0 0
c100c9e4    496000         0         0  0 0
c100ca10    497000         0         0  0 0
c100ca3c    498000         0         0  0 0
c100ca68    499000         0         0  0 0
c100ca94    49a000         0         0  0 0
c100cac0    49b000         0         0  0 0






* Re: Reserved page flaging of 2.4 kernel memory changed recently?
  2004-02-05  2:07 Reserved page flaging of 2.4 kernel memory changed recently? Michael Frank
@ 2004-02-08  2:06 ` Andrea Arcangeli
  2004-02-10 15:24   ` Michael Frank
  0 siblings, 1 reply; 12+ messages in thread
From: Andrea Arcangeli @ 2004-02-08  2:06 UTC
  To: Michael Frank; +Cc: linux-kernel, Nigel Cunningham

On Thu, Feb 05, 2004 at 10:07:35AM +0800, Michael Frank wrote:
> The question is related to saving the kernel with swsusp.
> 
> Looking at 2.4.24 x86 kernel page flags, kernel memory is flaged reserved 
> the same way as video, BIOS pages.
> 
> Is this a recent change since using the aa vm and should it be like that?

this is the same as 2.2 too, the reserved bit means this isn't a normal
"ram" page, this is either non-ram in the mem_map region or a ram page
being used by a device driver for source/destination dma or similar
special usage.

> If so, should hardware related reserved pages i.e video, BIOS be flaged
> PG_nosave upon init?

the non-ram regions of the physical address space present in the
mem_map_t array are marked as reserved at boot.

As for the ram pieces of the mem_map_t array: when a device driver
needs some ram to do dma on, it allocs one page with alloc_pages and
marks it reserved.

marking physical ram pages as reserved is only needed when you want to
make this page visible to userspace via ->mmap/mmap(2). if you only work
with copy_to_user/copy_from_user read(2)/write(2), nothing will change
if the page is reserved or not (same goes for the mmio areas part of the
mem_map_t array).

PG_reserved plays a role once you map the page in userspace: a fork()
won't copy-on-write such a page, it will be shared, since it's a
special page that the hardware "owns". If you copied-on-write, you
couldn't talk with the device anymore on the copied page. After all
references to the device have been released, the release callback is run
by the vfs, so you know the page isn't mapped in userspace anymore, and
if it's a ram page you can clear PG_reserved and then free the page
(if you free the page w/o clearing PG_reserved first you'll leak memory
silently).

Those regions normally are also marked VM_IO in the vma, to avoid ptrace
or rawio to mess with those dma pages, which isn't guaranteed to be safe
and could lockup the bus.

> What about iomemory?

iomemory (i.e. MMIO) is not ram and normally it doesn't fit by mistake
in the mem_map_t array either, so if there's no page struct they can't
be marked reserved either. The vm will automatically recognize and
treat pages outside the mem_map_t as reserved.

ioremap is needed to access MMIO memory and it's a different matter.

not sure what's the reason of the question though. with regard to
suspend to disk you should probably use the original e820 map to find if
the reserved pages are ram or non ram, the reserved ram pages should
probably be saved/restored, however the saving/restore process should be
probably directed by the device driver owning those reserved ram pages
to be very safe (can suspend to disk be made safe at all? :). the non
ram pages shouldn't need to be saved/restored (as you found there's the
bios in there). Basically you've to differentiate between reserved ram
pages and reserved non-ram (marked as reserved just because their
physical address fits in the mem_map_t array).
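
The distinction above can be sketched roughly as follows. This is a userspace
model, not kernel code: struct map_entry is a simplified stand-in for an e820
entry, and the names pfn_is_ram and classify_page are invented for the
example. It assumes 4 KB pages.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT 12	/* assumption: 4 KB pages */

/* Simplified stand-in for a firmware memory-map entry; not the kernel's struct. */
struct map_entry {
	unsigned long long start;	/* first byte of the region */
	unsigned long long size;	/* length in bytes */
	bool is_ram;			/* true for usable RAM */
};

enum page_class { PAGE_NORMAL, PAGE_RESERVED_RAM, PAGE_RESERVED_NONRAM };

/* True only if the whole page lies inside one RAM entry of the map. */
static bool pfn_is_ram(const struct map_entry *map, size_t n, unsigned long pfn)
{
	unsigned long long addr = (unsigned long long)pfn << PAGE_SHIFT;
	unsigned long long end = addr + (1ULL << PAGE_SHIFT);

	for (size_t i = 0; i < n; i++) {
		if (map[i].is_ram && addr >= map[i].start &&
		    end <= map[i].start + map[i].size)
			return true;
	}
	return false;
}

/* Split reserved pages into "RAM owned by someone" and "not RAM at all". */
static enum page_class classify_page(const struct map_entry *map, size_t n,
				     unsigned long pfn, bool reserved)
{
	if (!reserved)
		return PAGE_NORMAL;
	return pfn_is_ram(map, n, pfn) ? PAGE_RESERVED_RAM : PAGE_RESERVED_NONRAM;
}
```

Under this model a suspend implementation would leave PAGE_RESERVED_NONRAM
pages alone and hand PAGE_RESERVED_RAM pages to the owning driver.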

I've seen in 2.6 there's a PG_nosave, but it seems to have a different
purpose than a "PG_ram" that tells you if the page is ram or not. From a
quick read of the code it seems all reserved pages are stored except the
ones in the nosave segment (which is also marked protected as part of
the static kernel .text). So in short it looks like we save/restore the
non-ram too, maybe it's ok, dunno but I would find it a lot safer not to
touch that non-ram.


* Re: Reserved page flaging of 2.4 kernel memory changed recently?
  2004-02-08  2:06 ` Andrea Arcangeli
@ 2004-02-10 15:24   ` Michael Frank
  2004-02-10 18:51     ` Andrea Arcangeli
  2004-02-19  7:26     ` Pavel Machek
  0 siblings, 2 replies; 12+ messages in thread
From: Michael Frank @ 2004-02-10 15:24 UTC
  To: Andrea Arcangeli; +Cc: Nigel Cunningham, linux-kernel

Andrea,

Thank you very much for your reply.

On Sunday 08 February 2004 10:06, Andrea Arcangeli wrote:
> not sure what's the reason of the question though. with regard to

The reason for the question is to gain more understanding for centralizing
the flagging of pages which should not be touched.

I also was confused by having seen a netdump patch which uses the 
PG_reserved bit wrongly, and asked if anything was changed recently
to explain whether this patch actually could work as is. 

> suspend to disk you should probably use the original e820 map to find if
> the reserved pages are ram or non ram, 

Generally yes, but the process should be centralized using the standard page 
flag mechanism. This is the only clean and application independent way.

The following examples show why centralisation matters, as one would not 
want to handle these issues separately in swsusp, netdump 
or debuggers:

- Drivers may be sensitive to reading/writing their DMA buffers without
  considering ongoing transfers. 
- (Embedded) device iomemory could be mapped into the DMA zone and 
  get trashed by parasitic accesses. 
- A bad ppro (see patch) can be locked up by accessing "the wrong" page. 

> the reserved ram pages should probably be saved/restored, 

Except when marked PG_nosave ;)

> however the saving/restore process should be probably directed by the 
> device driver owning those reserved ram pages to be very safe 

Yes, exactly, only the driver knows what is going on with its data and 
_must_ be made responsible for taking care of its data.

Also in case of DMA buffers, DMA must be properly managed (suspended)
by the driver.

- sidenote as to PM - Device buffers such as disk buffers and port FIFOs 
(serial, USB serial...) must be flushed too by their drivers -

> (can suspend to disk be made safe at all? :). 

Well, putting a kernel and its drivers into a fridge and reviving 
them on taking them out seems like freezing and reviving hell :)

Nigel's Software suspend 2.0 is stable on 2.4 and 2.6 and well tested,
but there are issues affecting 1 in 1000 users, such as MCEs occurring -
more on that will follow.
And with drivers, we continue to have problems and have high hopes 
for PM in 2.6.

> the non
> ram pages shouldn't need to be saved/restored (as you found there's the
> bios in there). Basically you've to differentiate between reserved ram
> pages and reserved non-ram (marked as reserved just because their
> physical address fits in the mem_map_t array).

It seems unclean and unsafe to touch non-RAM regions. On lots of 
"proper" non-PC hardware, there would even be bus timeouts
if the location is not accessible (such as a write to BIOS). 

> 
> I've seen in 2.6 there's a PG_nosave, but it seems to have a different
> purpose than a "PG_ram" that tells you if the page is ram or not. From a
> quick read of the code it seems all reserved pages are stored except the
> ones in the nosave segment (which is also marked protected as part of
> the static kernel .text). So in short it looks like we save/restore the
> non-ram too, maybe it's ok, dunno but I would find it a lot safer not to
> touch that non-ram.
> 

By what I read on LKML, 64bit is probably more fussy than 32bit, e.g. 
accessing non-existent memory through /dev/mem on a system with memory 
holes often causes MCEs. 

And here is an example of touching non-RAM going wrong on a x86 PC:

One swsusp user received an MCE on swsusp accessing 0xa0000 (video). 
This seems to be quite recent hardware: an Athlon mobile XP 20000.
This Compaq evo is running alright with NOMCE on the commandline.

(I had posted a question about this on LKML, 
"Reserved pages not flagged on Compaq evo?" on 4 Feb, but found out
meanwhile (to my big surprise) that swsusp accesses the area in 
question, thus likely rendering the question obsolete)

Here is a patch for 2.4.2[45], which marks non-ram pages, CPU-broken pages, and 
nosave kernel pages with PG_nosave. 

Applications such as swsusp, netdump or debuggers just have to check 
the PG_nosave bit to be safe.
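
As a rough illustration of what such a consumer-side check could look like,
here is a userspace model. It assumes PG_nosave is bit 17, as defined in the
patch in this mail; struct toy_page and for_each_touchable_page are invented
names and do not exist in swsusp, netdump or the kernel.

```c
#include <stddef.h>

#define PG_nosave 17	/* bit number taken from the patch in this mail */

/* Toy stand-in for struct page; only the flags word matters here. */
struct toy_page {
	unsigned long flags;
};

static int toy_page_nosave(const struct toy_page *page)
{
	return (page->flags >> PG_nosave) & 1UL;
}

/* Hypothetical consumer loop: call visit() on every page a dumper may
 * read, skip PG_nosave pages, and return how many pages were visited. */
static size_t for_each_touchable_page(const struct toy_page *map, size_t npages,
				      void (*visit)(size_t pfn))
{
	size_t visited = 0;

	for (size_t pfn = 0; pfn < npages; pfn++) {
		if (toy_page_nosave(&map[pfn]))
			continue;
		if (visit)
			visit(pfn);
		visited++;
	}
	return visited;
}
```

The point of the design is that the consumer needs no knowledge of why a page
is off limits; the one flag test is the whole policy.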

I actually would like to rename the bit PG_nosave to PG_donttouch ;)

diff -uN -r -X /home/mhf/sys/dont/dontdiff linux-2.4.24-Vanilla/arch/i386/mm/init.c linux-2.4.24-mhf179/arch/i386/mm/init.c
--- linux-2.4.24-Vanilla/arch/i386/mm/init.c	2004-01-21 15:53:01.000000000 +0800
+++ linux-2.4.24-mhf179/arch/i386/mm/init.c	2004-02-10 06:15:31.000000000 +0800
@@ -451,15 +451,18 @@
 {
 	if (!page_is_ram(pfn)) {
 		SetPageReserved(page);
+		SetPageNosave(page);
 		return;
 	}
 	
 	if (bad_ppro && page_kills_ppro(pfn)) {
 		SetPageReserved(page);
+		SetPageNosave(page);
 		return;
 	}
 	
 	ClearPageReserved(page);
+	ClearPageNosave(page);
 	set_bit(PG_highmem, &page->flags);
 	atomic_set(&page->count, 1);
 	__free_page(page);
@@ -478,10 +481,18 @@
 #endif
 }
 
+
+/* The definition of a nosave region is part of software suspend
+ * and repeated to enable compilation and execution if this patch
+ * with a nosave region using a vanilla kernel.
+ */
+//extern char __nosave_begin, __nosave_end;
+
 static int __init free_pages_init(void)
 {
 	extern int ppro_with_ram_bug(void);
 	int bad_ppro, reservedpages, pfn;
+	unsigned long addr;
 
 	bad_ppro = ppro_with_ram_bug();
 
@@ -489,12 +500,29 @@
 	totalram_pages += free_all_bootmem();
 
 	reservedpages = 0;
-	for (pfn = 0; pfn < max_low_pfn; pfn++) {
+	addr = (unsigned long)__va(0);
+	for (pfn = 0; pfn < max_low_pfn; pfn++, addr += PAGE_SIZE) {
+		if (page_is_ram(pfn)) {
+			/*
+			 * Only count reserved RAM pages
+			 */
+			if (PageReserved(mem_map+pfn))
+				reservedpages++;
+#if defined(__nosave_begin)
+			/*
+			 * Mark nosave pages
+			 */
+			if (addr < (unsigned long)&__nosave_begin ||
+					addr >= (unsigned long)&__nosave_end)
+                                continue;
+#else
+			continue;
+#endif
+		}
 		/*
-		 * Only count reserved RAM pages
+		 * All other pages such as non-RAM pages are always nosave
 		 */
-		if (page_is_ram(pfn) && PageReserved(mem_map+pfn))
-			reservedpages++;
+		SetPageNosave(mem_map+pfn);
 	}
 #ifdef CONFIG_HIGHMEM
 	for (pfn = highend_pfn-1; pfn >= highstart_pfn; pfn--)
diff -uN -r -X /home/mhf/sys/dont/dontdiff linux-2.4.24-Vanilla/include/linux/mm.h linux-2.4.24-mhf179/include/linux/mm.h
--- linux-2.4.24-Vanilla/include/linux/mm.h	2004-02-06 15:40:59.000000000 +0800
+++ linux-2.4.24-mhf179/include/linux/mm.h	2004-02-10 01:19:40.000000000 +0800
@@ -301,6 +301,21 @@
 #define PG_launder		15	/* written out by VM pressure.. */
 #define PG_fs_1			16	/* Filesystem specific */
 
+/* This page is either part of the nosave region of software suspend or
+ * hardware reserved.
+ *
+ * This page may be part of ROM, device mapped memory or broken on the CPU
+ * in use and should _not_ ever be accessed except when in the software
+ * suspend nosave region or when refering device mapped memory.
+ *
+ * This page is not saved by software suspend, ignored by netdump and by
+ * debuggers performing memory dumps.
+ *
+ * Debuggers may wish to implement an overide to allow access to this page
+ * for specialized debugging.
+ */
+#define PG_nosave		17
+
 #ifndef arch_set_page_uptodate
 #define arch_set_page_uptodate(page)
 #endif
@@ -327,6 +342,9 @@
 #define SetPageLaunder(page)	set_bit(PG_launder, &(page)->flags)
 #define ClearPageLaunder(page)	clear_bit(PG_launder, &(page)->flags)
 #define ClearPageArch1(page)	clear_bit(PG_arch_1, &(page)->flags)
+#define PageNosave(page)	test_bit(PG_nosave, &(page)->flags)
+#define SetPageNosave(page)	set_bit(PG_nosave, &(page)->flags)
+#define ClearPageNosave(page)	clear_bit(PG_nosave, &(page)->flags)
 
 /*
  * The zone field is never updated after free_area_init_core()

What is your opinion of this approach?

BTW, The patch below is needed to run it with a nosave region on 
Vanilla 2.4.2[45]:

diff -ruN linux-2.4.24/arch/i386/vmlinux.lds software-suspend-linux-2.4.24-rev7/arch/i386/vmlinux.lds
--- linux-2.4.24/arch/i386/vmlinux.lds	2004-01-22 19:46:03.000000000 +1300
+++ software-suspend-linux-2.4.24-rev7/arch/i386/vmlinux.lds	2004-01-30 15:23:38.000000000 +1300
@@ -53,6 +53,12 @@
   __init_end = .;
 
   . = ALIGN(4096);
+  __nosave_begin = .;
+  .data_nosave : { *(.data.nosave) }
+  . = ALIGN(4096);
+  __nosave_end = .;
+
+  . = ALIGN(4096);
   .data.page_aligned : { *(.data.idt) }
 
   . = ALIGN(32);

also uncomment in mm/init.c the line:
//extern char __nosave_begin, __nosave_end; 

Regards
Michael




* Re: Reserved page flaging of 2.4 kernel memory changed recently?
  2004-02-10 15:24   ` Michael Frank
@ 2004-02-10 18:51     ` Andrea Arcangeli
  2004-02-10 19:38       ` Michael Frank
  2004-02-11  8:36       ` Michael Frank
  2004-02-19  7:26     ` Pavel Machek
  1 sibling, 2 replies; 12+ messages in thread
From: Andrea Arcangeli @ 2004-02-10 18:51 UTC
  To: Michael Frank; +Cc: Nigel Cunningham, linux-kernel

On Tue, Feb 10, 2004 at 11:24:01PM +0800, Michael Frank wrote:
> By what I read on LKML, 64bit is probably more fussy then 32bit. eg when 
> accessing non-existing memory such as on a system with memory holes 
> with /dev/mem often causes MCE's. 

yes, this happens on ia64 and it may happen on x86-64 too.

> And here is an example of touching non-RAM going wrong on a x86 PC:
> 
> One swsusp user received a MCE on swsusp accessing 0xa0000 (video). 
> This seems to be quite recent hardware: a Athlon mobile XP 20000.
> This Compaq evo is running alright with NOMCE on the commandline.

this is possible too.

> Here is a patch for 2.4.2[45], which marks non-ram, CPU-broken-pages, and 
> nosave kernel-pages pages with PG_nosave. 
> 
> Applications such as swsusp, netdump or debuggers have just to check 
> the PG_nosave bit to be safe.
> 
> I actually would like to rename the bit PG_nosave to PG_donttouch ;)

;)

> 
> diff -uN -r -X /home/mhf/sys/dont/dontdiff linux-2.4.24-Vanilla/arch/i386/mm/init.c linux-2.4.24-mhf179/arch/i386/mm/init.c
> --- linux-2.4.24-Vanilla/arch/i386/mm/init.c	2004-01-21 15:53:01.000000000 +0800
> +++ linux-2.4.24-mhf179/arch/i386/mm/init.c	2004-02-10 06:15:31.000000000 +0800
> @@ -451,15 +451,18 @@
>  {
>  	if (!page_is_ram(pfn)) {
>  		SetPageReserved(page);
> +		SetPageNosave(page);
>  		return;
>  	}
>  	
>  	if (bad_ppro && page_kills_ppro(pfn)) {
>  		SetPageReserved(page);
> +		SetPageNosave(page);
>  		return;
>  	}
>  	
>  	ClearPageReserved(page);
> +	ClearPageNosave(page);

why this clearpagenosave? looks superfluous, you're not doing it in the
normal zone anyways.

> +#if defined(__nosave_begin)

this won't work right, __nosave_begin isn't a preprocessor thing so it
will be ignored when you uncomment it. You probably can use #if 0
instead and a comment near __nosave_begin to turn it to 1 when enabling
the suspend code.
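
To illustrate the point: a linker-script symbol is invisible to the
preprocessor, so defined(__nosave_begin) is always false. One possible shape
of the fix is an explicit build-time switch. The macro HAVE_NOSAVE_REGION and
both helper functions below are hypothetical names chosen for this sketch,
not something from the patch.

```c
#include <stdbool.h>

/* Hypothetical build-time switch; set to 1 when the suspend code, and with
 * it the linker-script symbols, is part of the build. */
#ifndef HAVE_NOSAVE_REGION
#define HAVE_NOSAVE_REGION 0
#endif

#if HAVE_NOSAVE_REGION
extern char __nosave_begin, __nosave_end;	/* provided by the linker script */
#endif

/* True if addr falls inside the half-open range [begin, end). */
static bool addr_in_range(unsigned long addr, unsigned long begin,
			  unsigned long end)
{
	return addr >= begin && addr < end;
}

static bool addr_is_nosave(unsigned long addr)
{
#if HAVE_NOSAVE_REGION
	return addr_in_range(addr, (unsigned long)&__nosave_begin,
			     (unsigned long)&__nosave_end);
#else
	(void)addr;	/* no nosave region in this build */
	return false;
#endif
}
```

With the switch left at 0 the function compiles without the symbols and always
reports false, which matches the behaviour wanted on a vanilla kernel.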

> What is your opinion of this approach?

except for the above two nitpicks, the patch is correct and needed for
safe suspend IMHO. 2.6 seems to miss this thing too, why not add it to
2.6 first?

> BTW, The patch below is needed to run it with a nosave region on 
> Vanilla 2.4.2[45]:
> 
> diff -ruN linux-2.4.24/arch/i386/vmlinux.lds software-suspend-linux-2.4.24-rev7/arch/i386/vmlinux.lds
> --- linux-2.4.24/arch/i386/vmlinux.lds	2004-01-22 19:46:03.000000000 +1300
> +++ software-suspend-linux-2.4.24-rev7/arch/i386/vmlinux.lds	2004-01-30 15:23:38.000000000 +1300
> @@ -53,6 +53,12 @@
>    __init_end = .;
>  
>    . = ALIGN(4096);
> +  __nosave_begin = .;
> +  .data_nosave : { *(.data.nosave) }
> +  . = ALIGN(4096);
> +  __nosave_end = .;
> +
> +  . = ALIGN(4096);
>    .data.page_aligned : { *(.data.idt) }
>  
>    . = ALIGN(32);
> 
> also uncomment in mm/init.c the line:
> //extern char __nosave_begin, __nosave_end; 

yep.


* Re: Reserved page flaging of 2.4 kernel memory changed recently?
  2004-02-10 18:51     ` Andrea Arcangeli
@ 2004-02-10 19:38       ` Michael Frank
  2004-02-11  8:36       ` Michael Frank
  1 sibling, 0 replies; 12+ messages in thread
From: Michael Frank @ 2004-02-10 19:38 UTC
  To: Andrea Arcangeli; +Cc: Nigel Cunningham, linux-kernel

On Wednesday 11 February 2004 02:51, Andrea Arcangeli wrote:
> On Tue, Feb 10, 2004 at 11:24:01PM +0800, Michael Frank wrote:
> > 
> > diff -uN -r -X /home/mhf/sys/dont/dontdiff linux-2.4.24-Vanilla/arch/i386/mm/init.c linux-2.4.24-mhf179/arch/i386/mm/init.c
> > --- linux-2.4.24-Vanilla/arch/i386/mm/init.c	2004-01-21 15:53:01.000000000 +0800
> > +++ linux-2.4.24-mhf179/arch/i386/mm/init.c	2004-02-10 06:15:31.000000000 +0800
> > @@ -451,15 +451,18 @@
> >  {
> >  	if (!page_is_ram(pfn)) {
> >  		SetPageReserved(page);
> > +		SetPageNosave(page);
> >  		return;
> >  	}
> >  	
> >  	if (bad_ppro && page_kills_ppro(pfn)) {
> >  		SetPageReserved(page);
> > +		SetPageNosave(page);
> >  		return;
> >  	}
> >  	
> >  	ClearPageReserved(page);
> > +	ClearPageNosave(page);
> 
> why this clearpagenosave? looks superflous, you're not doing it in the
> normal zone anyways.

I'll sleep on it and get back to you with my arguments.

> 
> > +#if defined(__nosave_begin)
> 
> this won't work right, __nosave_begin isn't a preprocessor thing so it
> will be ignored when you uncomment it. You probably can use #if 0
> instead and a comment near __nosave_begin to turn it to 1 when enabling
> the suspend code.

Oh sh*t, this is what one gets for fixing things up for a demo after
a long night... Will bite my lower rear portion after the nap.

> 
> > What is your opinion of this approach?
> 
> except for the above two nitpicks, the patch is correct and needed for
> safe suspend IMHO. 2.6 seems to miss this thing too, why not add it to
> 2.6 first?

Swsusp won't be in 2.4 anyway; if Nigel accepts the patch, it will become part 
of his next releases for 2.4 and 2.6.

Anyway, I'll fix the patch up for 2.6, test it and post the patch in a few days.

Regards
Michael



* Re: Reserved page flaging of 2.4 kernel memory changed recently?
  2004-02-10 18:51     ` Andrea Arcangeli
  2004-02-10 19:38       ` Michael Frank
@ 2004-02-11  8:36       ` Michael Frank
  1 sibling, 0 replies; 12+ messages in thread
From: Michael Frank @ 2004-02-11  8:36 UTC
  To: Andrea Arcangeli; +Cc: Nigel Cunningham, linux-kernel

On Wednesday 11 February 2004 02:51, Andrea Arcangeli wrote:
> On Tue, Feb 10, 2004 at 11:24:01PM +0800, Michael Frank wrote:
> > +	ClearPageNosave(page);
> 
> why this clearpagenosave? looks superflous, you're not doing it in the
> normal zone anyways.

OK, this gets removed.

Regards
Michael



* Re: Reserved page flaging of 2.4 kernel memory changed recently?
  2004-02-10 15:24   ` Michael Frank
  2004-02-10 18:51     ` Andrea Arcangeli
@ 2004-02-19  7:26     ` Pavel Machek
  2004-02-19  9:00       ` Michael Frank
  1 sibling, 1 reply; 12+ messages in thread
From: Pavel Machek @ 2004-02-19  7:26 UTC
  To: Michael Frank; +Cc: Andrea Arcangeli, Nigel Cunningham, linux-kernel

Hi!

> I actually would like to rename the bit PG_nosave to PG_donttouch ;)

It's used for swsusp internal data, too...
-- 
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms         



* Re: Reserved page flaging of 2.4 kernel memory changed recently?
  2004-02-19  7:26     ` Pavel Machek
@ 2004-02-19  9:00       ` Michael Frank
  2004-02-19 16:14         ` Pavel Machek
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Frank @ 2004-02-19  9:00 UTC
  To: Pavel Machek; +Cc: Andrea Arcangeli, Nigel Cunningham, linux-kernel

On Thu, 19 Feb 2004 08:26:30 +0100, Pavel Machek <pavel@suse.cz> wrote:

> Hi!
>

mhf wrote:

>> I actually would like to rename the bit PG_nosave to PG_donttouch ;)

to make a point with regard to:

	no transfer of page contents during suspend/resume
	no netdump
	no debugger access without override

... but the name does not matter and we do not have to change it.

>
> Its used for swsusp internal data, too...

Yes of course - how else would swsusp run, but these data are also not
"touched" during suspend and resume wrt transfer of page content.

x86 Pages for PG_nosave:

Video/BIOS 0xA0000-0xFFFFF
Anything reserved < max_pfn
Pentium 2 broken highmem pages
Driver specific areas in DMA zone are also conceivable

.. or else you get MCEs or possibly crashes on newer x86 HW, and on 64Bit
for sure.
	- we had an MCE recently at 0xa0000 on an Athlon XP and I went digging...

Regards
Michael

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/


* Re: Reserved page flaging of 2.4 kernel memory changed recently?
  2004-02-19  9:00       ` Michael Frank
@ 2004-02-19 16:14         ` Pavel Machek
  2004-02-19 17:37           ` Michael Frank
  0 siblings, 1 reply; 12+ messages in thread
From: Pavel Machek @ 2004-02-19 16:14 UTC
  To: Michael Frank; +Cc: Andrea Arcangeli, Nigel Cunningham, linux-kernel

Hi!

> >>I actually would like to rename the bit PG_nosave to PG_donttouch ;)
> 
> to make a point with regard to:
> 
> 	no transfer of page contents during suspend/resume
> 	no netdump
> 	no debugger access without override
> 
> ... but the name does not matter and we do not have to change it.
> 
> >Its used for swsusp internal data, too...
> 
> Yes of course - how else would swsusp run, but these data are also not  
> "touched"
> during suspend and resume wrt transfer of page content.

Yes. But I still want to be able to access swsusp internal data
through debugger, and I want them in the netdump.

That means that PG_nosave | PG_reserved indeed is "PG_donttouch", but
PG_nosave has slightly different meaning.
								Pavel
-- 
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]


* Re: Reserved page flaging of 2.4 kernel memory changed recently?
  2004-02-19 17:37           ` Michael Frank
@ 2004-02-19 17:35             ` Pavel Machek
  2004-02-19 17:59               ` Michael Frank
  0 siblings, 1 reply; 12+ messages in thread
From: Pavel Machek @ 2004-02-19 17:35 UTC
  To: Michael Frank; +Cc: Andrea Arcangeli, Nigel Cunningham, linux-kernel

Hi!

> >That means that PG_nosave | PG_reserved indeed is "PG_donttouch", but
> >PG_nosave has slightly different meaning.
> 
> Makes sense, but PG_reserved is used to keep VM out of these pages.
> 
> Can we have a seperate bit PG_donttouch which is set with PG_nosave
> | PG_reserved in reserved/video/BIOS/Broken CPU areas?

Why?

I do not see what is wrong with 2 separate flags... In fact, you might
want to 

#define PG_donttouch (PG_reserved | PG_nosave)

and (modulo atomic macros etc), it would work for everyone...
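
Spelled out a little more, and keeping in mind that the PG_* names are bit
numbers rather than masks, the idea might look like the sketch below. The bit
numbers 14 and 17 are the ones used elsewhere in this thread; the mask and
helper names are invented for the example, and the atomic bitops are left out.

```c
#include <stdbool.h>

#define PG_reserved	14	/* bit numbers as used elsewhere in this thread */
#define PG_nosave	17

/* The PG_* names are bit numbers, so the combined value has to be
 * built from shifted masks rather than by OR-ing the numbers. */
#define PG_DONTTOUCH_MASK ((1UL << PG_reserved) | (1UL << PG_nosave))

/* A page counts as "don't touch" only when both bits are set. */
static bool flags_donttouch(unsigned long flags)
{
	return (flags & PG_DONTTOUCH_MASK) == PG_DONTTOUCH_MASK;
}
```

A page with only PG_nosave set (swsusp internal data, for instance) stays
readable by debuggers and netdump under this test.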
 
								Pavel

-- 
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]


* Re: Reserved page flaging of 2.4 kernel memory changed recently?
  2004-02-19 16:14         ` Pavel Machek
@ 2004-02-19 17:37           ` Michael Frank
  2004-02-19 17:35             ` Pavel Machek
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Frank @ 2004-02-19 17:37 UTC
  To: Pavel Machek; +Cc: Andrea Arcangeli, Nigel Cunningham, linux-kernel

On Thu, 19 Feb 2004 17:14:55 +0100, Pavel Machek <pavel@suse.cz> wrote:

> Hi!
>
>> >>I actually would like to rename the bit PG_nosave to PG_donttouch ;)
>>
>> to make a point with regard to:
>>
>> 	no transfer of page contents during suspend/resume
>> 	no netdump
>> 	no debugger access without override
>>
>> ... but the name does not matter and we do not have to change it.
>>
>> >Its used for swsusp internal data, too...
>>
>> Yes of course - how else would swsusp run, but these data are also not
>> "touched"
>> during suspend and resume wrt transfer of page content.
>
> Yes. But I still want to be able to access swsusp internal data
> through debugger, and I want them in the netdump.
>
> That means that PG_nosave | PG_reserved indeed is "PG_donttouch", but
> PG_nosave has slightly different meaning.

Makes sense, but PG_reserved is used to keep VM out of these pages.

Can we have a separate bit PG_donttouch which is set with PG_nosave
| PG_reserved in reserved/video/BIOS/Broken CPU areas?

This way

- debugger and netdump use PG_donttouch to prevent accesses
   which might result in MCEs and CPU crashes.

- Swsusp uses PG_nosave.

- VM continues to use PG_reserved.

This is also a safe provision for hardware/driver changes.
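
The three-way split above could be sketched in user-space C as follows. The
PG_reserved and PG_nosave masks are the values visible in the crash output
in this thread; the PG_donttouch mask (0x40000) and all three function names
are assumptions made up for illustration, and each predicate models only the
single flag check named in the list, not the full logic of the VM, swsusp or
netdump.

```c
#include <stdbool.h>

#define PG_RESERVED_MASK  0x4000UL   /* from the crash output in this thread */
#define PG_NOSAVE_MASK    0x20000UL  /* from the crash output in this thread */
#define PG_DONTTOUCH_MASK 0x40000UL  /* hypothetical: bit position is made up */

/* VM: stays out of pages marked PG_reserved. */
bool vm_may_manage(unsigned long flags)
{
    return (flags & PG_RESERVED_MASK) == 0;
}

/* swsusp: does not transfer the contents of PG_nosave pages. */
bool swsusp_may_copy(unsigned long flags)
{
    return (flags & PG_NOSAVE_MASK) == 0;
}

/* debugger/netdump: refuse to read pages marked PG_donttouch. */
bool dump_may_read(unsigned long flags)
{
    return (flags & PG_DONTTOUCH_MASK) == 0;
}
```

Under these assumptions a video/BIOS page would carry all three bits and be
refused by every consumer, while a swsusp-internal page with only
PG_reserved | PG_nosave would be skipped by swsusp but still readable by the
debugger and netdump.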

As an example of using PG_nosave and PG_reserved, here are the actual
page flags of a running 2.4.24 kernel that uses PG_nosave for
video/BIOS, shown with the crash utility:

Flags 4000 is PG_reserved
Flags 20000 is PG_nosave

Under my proposal we would set PG_donttouch as well where PG_reserved
&& PG_nosave are set right now.
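
For reading the FLAGS column, a small sketch that classifies a value using
only the two masks given above (24000 is simply 4000 | 20000). The enum and
function names are made up for illustration; all other flag bits are
ignored.

```c
#define PG_RESERVED_MASK 0x4000UL
#define PG_NOSAVE_MASK   0x20000UL

/* Hypothetical classification of one FLAGS value from `kmem -p`. */
enum page_class {
    PAGE_NORMAL,           /* neither bit set, e.g. c8 or 4c */
    PAGE_RESERVED,         /* PG_reserved only, e.g. 4000    */
    PAGE_NOSAVE,           /* PG_nosave only                 */
    PAGE_RESERVED_NOSAVE   /* both bits set, e.g. 24000      */
};

enum page_class classify_flags(unsigned long flags)
{
    int reserved = (flags & PG_RESERVED_MASK) != 0;
    int nosave   = (flags & PG_NOSAVE_MASK) != 0;

    if (reserved && nosave)
        return PAGE_RESERVED_NOSAVE;
    if (reserved)
        return PAGE_RESERVED;
    if (nosave)
        return PAGE_NOSAVE;
    return PAGE_NORMAL;
}
```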

crash> kmem -p

   PAGE    PHYSICAL   MAPPING    INDEX CNT FLAGS

Interrupt vectors

c100001c         0         0         0  0 4000
c1000048      1000         0         0  0 4000
c1000074      2000         0         0  0 4000
c10000a0      3000         0         0  0 4000

Main memory

c10000cc      4000  deb641c4      3862  2 c8
c10000f8      5000  deb641c4      3861  2 c8
c1000124      6000  deb641c4      3860  2 c8
c1000150      7000  deb641c4      3859  2 c8
c100017c      8000         0         1  2 4c
[]
c1000ca8     49000         0         4  2 4c
c1000cd4     4a000         0        15  2 4c
c1000d00     4b000         0         0  2 4c
c1000d2c     4c000         0         4  2 4c
c1000d58     4d000         0         3  2 4c
c1000d84     4e000         0         7  2 4c
c1000db0     4f000         0         6  2 4c
c1000ddc     50000  cb35b6a4       669  2 4c
c1000e08     51000  cb35b6a4       668  2 4c
c1000e34     52000  cb35b6a4       667  2 4c
c1000e60     53000  cb35b6a4       666  2 4c
c1000e8c     54000  dee6e564   7085941  2 c0
c1000eb8     55000  cb35b6a4       604  2 4c
c1000ee4     56000         0     25308  1 108
c1000f10     57000  cb35b6a4       605  2 4c
c1000f3c     58000  cb35b6a4       606  2 4c
c1000f68     59000         0         0  2 4c
c1000f94     5a000  cb35b6a4       615  2 4c
[]
c1001a94     9a000  deb641c4      3818  2 c8
c1001ac0     9b000  deb641c4      3817  2 c8
c1001aec     9c000  deb641c4      3826  2 c8
c1001b18     9d000  deb641c4      3825  2 c8
c1001b44     9e000  cb35b8a4         1  2 4c

Donttouch reserved/video/BIOS area

c1001b70     9f000         0         0  0 24000
c1001b9c     a0000         0         0  0 24000
c1001bc8     a1000         0         0  0 24000
c1001bf4     a2000         0         0  0 24000
c1001c20     a3000         0         0  0 24000
c1001c4c     a4000         0         0  0 24000
c1001c78     a5000         0         0  0 24000
c1001ca4     a6000         0         0  0 24000
c1001cd0     a7000         0         0  0 24000
c1001cfc     a8000         0         0  0 24000
c1001d28     a9000         0         0  0 24000
[]
c1002b14     fa000         0         0  0 24000
c1002b40     fb000         0         0  0 24000
c1002b6c     fc000         0         0  0 24000
c1002b98     fd000         0         0  0 24000
c1002bc4     fe000         0         0  0 24000
c1002bf0     ff000         0         0  0 24000

Kernel

c1002c1c    100000         0         0  0 4000
c1002c48    101000         0         0  0 4000
c1002c74    102000         0         0  0 4000
c1002ca0    103000         0         0  0 4000
c1002ccc    104000         0         0 11875 4000
[]
c100b360    413000         0         0  0 4000
c100b38c    414000         0         0  0 4000
c100b3b8    415000         0         0  0 4000

Here is 120K ex init memory

c100b3e4    416000  deb641c4      3820  2 c8
c100b410    417000  deb641c4      3819  2 c8
c100b43c    418000  deb641c4      3866  2 c8
c100b468    419000  deb641c4      3865  2 c8
c100b494    41a000  deb641c4      3864  2 c8
c100b4c0    41b000  deb641c4      3863  2 c8
c100b4ec    41c000  deb641c4      3828  2 c8
c100b518    41d000  deb641c4      3827  2 c8
c100b544    41e000         0         0  1 100
c100b570    41f000  deb641c4      3807  2 c8
c100b59c    420000  cb35b6a4      3652  2 c8
c100b5c8    421000  cb35b6a4      3651  2 c8
c100b5f4    422000  cb35b6a4      3650  2 c8
c100b620    423000  cb35b6a4      3649  2 c8
c100b64c    424000  cb35b6a4       113  2 cc
c100b678    425000         0         0  1 100
c100b6a4    426000         0         0  1 5c
c100b6d0    427000         0         1  1 5c
c100b6fc    428000         0     25405  1 108
c100b728    429000  cb35b6a4       498  2 cc
c100b754    42a000  cb35b6a4       543  2 cc
c100b780    42b000  cb35b6a4       542  2 cc
c100b7ac    42c000  cb35b6a4       365  2 cc
c100b7d8    42d000  cb35b6a4       364  2 cc
c100b804    42e000  cb35b6a4       484  2 cc
c100b830    42f000  dee6e564   7077953  2 c0
c100b85c    430000  dee6e564   7078648  2 c0
c100b888    431000  cb35b6a4       508  2 cc
c100b8b4    432000  cb35b6a4       561  2 4c
c100b8e0    433000  cb35b6a4       560  2 4c

swsusp nosave area - 1 page

c100b90c    434000         0         0  0 24000

some more kernel data

c100b938    435000         0         0  0 4000
c100b964    436000         0         0  0 4000
c100b990    437000         0         0  0 4000
c100b9bc    438000         0         0  0 4000
c100b9e8    439000         0         0  0 4000
[]
c100c800    48b000         0         0  0 4000
c100c82c    48c000         0         0  0 4000

End of kernel data / main memory

c100c858    48d000  cb35b6a4       116  2 cc
c100c884    48e000  deb641c4      3822  2 c8
c100c8b0    48f000  deb641c4      3821  2 c8
c100c8dc    490000  cb35b6a4       117  2 cc
...

Regards
Michael


* Re: Reserved page flaging of 2.4 kernel memory changed recently?
  2004-02-19 17:35             ` Pavel Machek
@ 2004-02-19 17:59               ` Michael Frank
  0 siblings, 0 replies; 12+ messages in thread
From: Michael Frank @ 2004-02-19 17:59 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Andrea Arcangeli, Nigel Cunningham, linux-kernel

On Thu, 19 Feb 2004 18:35:14 +0100, Pavel Machek <pavel@suse.cz> wrote:

> Hi!
>
>> >That means that PG_nosave | PG_reserved indeed is "PG_donttouch", but
>> >PG_nosave has slightly different meaning.
>>
>> Makes sense, but PG_reserved is used to keep VM out of these pages.
>>
>> Can we have a separate bit PG_donttouch which is set with PG_nosave
>> | PG_reserved in reserved/video/BIOS/Broken CPU areas?
>
> Why?
>
> I do not see what is wrong with 2 separate flags... In fact, you might
> want to
>
> #define PG_donttouch (PG_reserved | PG_nosave)
>
> and (modulo atomic macros etc), it would work for everyone...
>

As your earlier post pointed out, that would not work for the swsusp nosave
area, which carries exactly the same PG_reserved | PG_nosave flags and so
could not be told apart from the video/BIOS pages.

Are we too short of bits? ;)

What about:
  - export swsusp __nosave range for netdump override to dump __nosave page(s)

  - debugger (linked in) uses swsusp __nosave range to enable access to __nosave page(s)
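
A rough user-space sketch of that range override, assuming the exported
range is a half-open interval of physical addresses. The struct and function
names are hypothetical, and the mask values are the ones from the crash
output earlier in the thread.

```c
#include <stdbool.h>

#define PG_RESERVED_MASK 0x4000UL
#define PG_NOSAVE_MASK   0x20000UL

/* Hypothetical exported swsusp __nosave range: [begin, end), physical. */
struct nosave_range {
    unsigned long begin;
    unsigned long end;
};

/*
 * Hypothetical check for netdump or a linked-in debugger.
 * Pages with both bits set are refused unless they lie inside the
 * exported __nosave range, which holds swsusp's own data.
 */
bool dump_may_read_page(unsigned long phys, unsigned long flags,
                        const struct nosave_range *r)
{
    unsigned long both = PG_RESERVED_MASK | PG_NOSAVE_MASK;

    if ((flags & both) != both)
        return true;    /* ordinary memory or reserved kernel page */

    return phys >= r->begin && phys < r->end;
}
```

With the range set to the single page at 434000 from the earlier listing,
that page stays readable while the 9f000-ff000 video/BIOS pages, which carry
the same flags, are refused.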

Regards
Michael






end of thread, other threads:[~2004-02-19 17:52 UTC | newest]

Thread overview: 12+ messages
2004-02-05  2:07 Reserved page flaging of 2.4 kernel memory changed recently? Michael Frank
2004-02-08  2:06 ` Andrea Arcangeli
2004-02-10 15:24   ` Michael Frank
2004-02-10 18:51     ` Andrea Arcangeli
2004-02-10 19:38       ` Michael Frank
2004-02-11  8:36       ` Michael Frank
2004-02-19  7:26     ` Pavel Machek
2004-02-19  9:00       ` Michael Frank
2004-02-19 16:14         ` Pavel Machek
2004-02-19 17:37           ` Michael Frank
2004-02-19 17:35             ` Pavel Machek
2004-02-19 17:59               ` Michael Frank
