public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/1] i386 Encapsulate copying of pgd entries
@ 2005-08-06  0:26 zach
  2005-08-06  1:13 ` Chris Wright
  2005-08-07 19:00 ` Pavel Machek
  0 siblings, 2 replies; 8+ messages in thread
From: zach @ 2005-08-06  0:26 UTC (permalink / raw)
  To: akpm, chrisl, davej, hpa, linux-kernel, pavel, pratap, Riley,
	zach

Add a clone operation for pgd updates.

This helps complete the encapsulation of updates to page tables (or pages
about to become page tables) into accessor functions rather than using
memcpy() to duplicate them.  This is both generally good for consistency
and also necessary for running in a hypervisor which requires explicit
updates to page table entries.

The new function is:

clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
 
   dst - pointer to pgd range anwhere on a pgd page
   src - ""
   count - the number of pgds to copy.

   dst and src can be on the same page, but the range must not overlap
   and must not cross a page boundary.

Note that I ommitted using this call to copy pgd entries into the
software suspend page root, since this is not technically a live paging
structure, rather it is used on resume from suspend.  CC'ing Pavel in case
he has any feedback on this.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Index: linux-2.6.13/arch/i386/mm/pgtable.c
===================================================================
--- linux-2.6.13.orig/arch/i386/mm/pgtable.c	2005-08-04 12:02:10.000000000 -0700
+++ linux-2.6.13/arch/i386/mm/pgtable.c	2005-08-05 17:13:29.000000000 -0700
@@ -207,19 +207,18 @@
 {
 	unsigned long flags;
 
+	memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
 	if (PTRS_PER_PMD == 1)
 		spin_lock_irqsave(&pgd_lock, flags);
 
-	memcpy((pgd_t *)pgd + USER_PTRS_PER_PGD,
+	clone_pgd_range((pgd_t *)pgd + USER_PTRS_PER_PGD,
 			swapper_pg_dir + USER_PTRS_PER_PGD,
-			(PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
-
+			KERNEL_PGD_PTRS);
 	if (PTRS_PER_PMD > 1)
 		return;
 
 	pgd_list_add(pgd);
 	spin_unlock_irqrestore(&pgd_lock, flags);
-	memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
 }
 
 /* never called when PTRS_PER_PMD > 1 */
Index: linux-2.6.13/arch/i386/kernel/smpboot.c
===================================================================
--- linux-2.6.13.orig/arch/i386/kernel/smpboot.c	2005-08-04 12:02:10.000000000 -0700
+++ linux-2.6.13/arch/i386/kernel/smpboot.c	2005-08-04 13:15:45.000000000 -0700
@@ -1017,8 +1017,8 @@
 	tsc_sync_disabled = 1;
 
 	/* init low mem mapping */
-	memcpy(swapper_pg_dir, swapper_pg_dir + USER_PGD_PTRS,
-			sizeof(swapper_pg_dir[0]) * KERNEL_PGD_PTRS);
+	clone_pgd_range(swapper_pg_dir, swapper_pg_dir + USER_PGD_PTRS,
+			KERNEL_PGD_PTRS);
 	flush_tlb_all();
 	schedule_work(&task);
 	wait_for_completion(&done);
Index: linux-2.6.13/include/asm-i386/pgtable.h
===================================================================
--- linux-2.6.13.orig/include/asm-i386/pgtable.h	2005-08-04 12:02:10.000000000 -0700
+++ linux-2.6.13/include/asm-i386/pgtable.h	2005-08-05 17:12:33.000000000 -0700
@@ -276,6 +276,21 @@
 }
 
 /*
+ * clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
+ *
+ *  dst - pointer to pgd range anwhere on a pgd page
+ *  src - ""
+ *  count - the number of pgds to copy.
+ *
+ * dst and src can be on the same page, but the range must not overlap,
+ * and must not cross a page boundary.
+ */
+static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
+{
+       memcpy(dst, src, count * sizeof(pgd_t));
+}
+
+/*
  * Macro to mark a page protection value as "uncacheable".  On processors which do not support
  * it, this is a no-op.
  */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] i386 Encapsulate copying of pgd entries
  2005-08-06  0:26 [PATCH 1/1] i386 Encapsulate copying of pgd entries zach
@ 2005-08-06  1:13 ` Chris Wright
  2005-08-06  1:52   ` Zachary Amsden
  2005-08-06  8:05   ` Zachary Amsden
  2005-08-07 19:00 ` Pavel Machek
  1 sibling, 2 replies; 8+ messages in thread
From: Chris Wright @ 2005-08-06  1:13 UTC (permalink / raw)
  To: zach; +Cc: akpm, chrisl, davej, hpa, linux-kernel, pavel, pratap, Riley

* zach@vmware.com (zach@vmware.com) wrote:
> +	memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
>  	if (PTRS_PER_PMD == 1)
>  		spin_lock_irqsave(&pgd_lock, flags);
>  
> -	memcpy((pgd_t *)pgd + USER_PTRS_PER_PGD,
> +	clone_pgd_range((pgd_t *)pgd + USER_PTRS_PER_PGD,
>  			swapper_pg_dir + USER_PTRS_PER_PGD,
> -			(PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
> -
> +			KERNEL_PGD_PTRS);
>  	if (PTRS_PER_PMD > 1)
>  		return;
>  
>  	pgd_list_add(pgd);
>  	spin_unlock_irqrestore(&pgd_lock, flags);
> -	memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));

Why memset was never done on PAE?

thanks,
-chris

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] i386 Encapsulate copying of pgd entries
  2005-08-06  1:13 ` Chris Wright
@ 2005-08-06  1:52   ` Zachary Amsden
  2005-08-06  6:00     ` Zachary Amsden
  2005-08-06  8:05   ` Zachary Amsden
  1 sibling, 1 reply; 8+ messages in thread
From: Zachary Amsden @ 2005-08-06  1:52 UTC (permalink / raw)
  To: Chris Wright; +Cc: akpm, chrisl, davej, hpa, linux-kernel, pavel, pratap, Riley

Chris Wright wrote:

>* zach@vmware.com (zach@vmware.com) wrote:
>  
>
>>+	memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
>> 	if (PTRS_PER_PMD == 1)
>> 		spin_lock_irqsave(&pgd_lock, flags);
>> 
>>-	memcpy((pgd_t *)pgd + USER_PTRS_PER_PGD,
>>+	clone_pgd_range((pgd_t *)pgd + USER_PTRS_PER_PGD,
>> 			swapper_pg_dir + USER_PTRS_PER_PGD,
>>-			(PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
>>-
>>+			KERNEL_PGD_PTRS);
>> 	if (PTRS_PER_PMD > 1)
>> 		return;
>> 
>> 	pgd_list_add(pgd);
>> 	spin_unlock_irqrestore(&pgd_lock, flags);
>>-	memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
>>    
>>
>
>Why memset was never done on PAE?
>  
>

That's a good point.  The memset() is redundant on PAE, since it 
allocates all 4 PMDs immediately after that (in pgd_alloc).  There are 
two reasons for moving the memset() - one is that it can potentially 
perform useful work ahead of the lock and effectively act as a 
prefetch.  The second is that at least on a hypervisor, 
clone_pgd_range() is likely to be taken as a page allocation hint, and 
thus moving the memset() before this operation allows only the actually 
present page directory entry updates to be passed to the hypervisor.

Actually, the memset() could be redundant on non-PAE as well, since we 
should have gone through free_pgtables, which would have done a 
pmd_clear() on each user level pmd, and the kernel level pmds are copied 
in again inside the lock.

I'll try it out to see if this is possible.

Zach

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] i386 Encapsulate copying of pgd entries
  2005-08-06  1:52   ` Zachary Amsden
@ 2005-08-06  6:00     ` Zachary Amsden
  0 siblings, 0 replies; 8+ messages in thread
From: Zachary Amsden @ 2005-08-06  6:00 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Chris Wright, akpm, chrisl, davej, hpa, linux-kernel, pavel,
	pratap, Riley

[-- Attachment #1: Type: text/plain, Size: 5163 bytes --]

Zachary Amsden wrote:

> Chris Wright wrote:
>
>>
>> Why memset was never done on PAE?
>
>
> That's a good point.  The memset() is redundant on PAE, since it 
> allocates all 4 PMDs immediately after that (in pgd_alloc).  There are 
> two reasons for moving the memset() - one is that it can potentially 
> perform useful work ahead of the lock and effectively act as a 
> prefetch.  The second is that at least on a hypervisor, 
> clone_pgd_range() is likely to be taken as a page allocation hint, and 
> thus moving the memset() before this operation allows only the 
> actually present page directory entry updates to be passed to the 
> hypervisor.
>
> Actually, the memset() could be redundant on non-PAE as well, since we 
> should have gone through free_pgtables, which would have done a 
> pmd_clear() on each user level pmd, and the kernel level pmds are 
> copied in again inside the lock.
>
> I'll try it out to see if this is possible.
>
> Zach
>

So that turned out to be a really bad idea.  But, I did notice that the 
pmds in PAE mode could be cached with the pgds instead of destroying and 
re-allocating them.  Unfortunately, this spends three pages per cached 
PAE pgd, and doesn't look like a big win.  I ran microbenchmarks, stolen 
mostly from lmbench (thank you Larry!), and this patch shows almost no 
improvement.  Judging by the fact the the kmem slab cache seems to work 
very efficiently, I don't think the extra overhead from memset in the 
constructor is of much significance.

Here's the benchmark results on native hardware (P4, 2.4 GHz, PAE kernel):

Before:
(getpid and segv truncated beyond my scrollback, but of no significance)
  forkwait: 0.596u 3.932s 0:04.54 99.5% 0+0k 0+0io 0pf+0w
  forkwait: 0.632u 3.876s 0:04.50 100.0%        0+0k 0+0io 0pf+0w
  forkwait: 0.468u 4.048s 0:04.51 99.7% 0+0k 0+0io 0pf+0w
  forkwait: 0.516u 3.988s 0:04.50 99.7% 0+0k 0+0io 0pf+0w
  forkwait: 0.644u 3.908s 0:04.55 99.7% 0+0k 0+0io 0pf+0w
   divzero: 1.356u 6.712s 0:08.07 99.8% 0+0k 0+0io 0pf+0w
   divzero: 1.332u 6.620s 0:07.94 100.1%        0+0k 0+0io 0pf+0w
   divzero: 1.300u 6.652s 0:07.95 100.0%        0+0k 0+0io 0pf+0w
   divzero: 1.672u 6.312s 0:07.98 100.0%        0+0k 0+0io 0pf+0w
   divzero: 1.128u 6.824s 0:07.95 99.8% 0+0k 0+0io 0pf+0w
  lat_pipe: 0.228u 8.196s 0:16.98 49.5% 0+0k 0+0io 0pf+0w
  lat_pipe: 0.220u 8.420s 0:17.15 50.3% 0+0k 0+0io 0pf+0w
  lat_pipe: 0.236u 8.376s 0:17.00 50.5% 0+0k 0+0io 0pf+0w
  lat_pipe: 0.220u 8.140s 0:16.97 49.2% 0+0k 0+0io 0pf+0w
  lat_pipe: 0.232u 8.488s 0:16.86 51.6% 0+0k 0+0io 0pf+0w
    Switch: 5.896u 7.172s 0:11.97 109.1%        0+0k 0+0io 53pf+0w
    Switch: 6.168u 6.792s 0:11.23 115.3%        0+0k 0+0io 1pf+0w
    Switch: 6.084u 7.044s 0:11.22 116.9%        0+0k 0+0io 1pf+0w
    Switch: 6.044u 7.088s 0:11.34 115.6%        0+0k 0+0io 1pf+0w
    Switch: 6.252u 7.212s 0:11.45 117.5%        0+0k 0+0io 1pf+0w

After:

zach-dev2:Micro-bench $ cat out.post-patch
    getpid: 0.076u 0.000s 0:00.08 87.5% 0+0k 0+0io 0pf+0w
    getpid: 0.076u 0.004s 0:00.07 100.0%        0+0k 0+0io 0pf+0w
    getpid: 0.080u 0.000s 0:00.08 100.0%        0+0k 0+0io 0pf+0w
    getpid: 0.076u 0.000s 0:00.07 100.0%        0+0k 0+0io 0pf+0w
    getpid: 0.072u 0.004s 0:00.07 100.0%        0+0k 0+0io 0pf+0w
      segv: 1.168u 8.552s 0:09.72 99.8% 0+0k 0+0io 0pf+0w
      segv: 1.160u 8.544s 0:09.70 100.0%        0+0k 0+0io 0pf+0w
      segv: 1.248u 8.364s 0:09.61 99.8% 0+0k 0+0io 0pf+0w
      segv: 1.296u 8.368s 0:09.66 99.8% 0+0k 0+0io 0pf+0w
      segv: 1.312u 8.288s 0:09.59 100.0%        0+0k 0+0io 0pf+0w
  forkwait: 0.600u 3.932s 0:04.53 100.0%        0+0k 0+0io 0pf+0w
  forkwait: 0.580u 3.940s 0:04.51 100.2%        0+0k 0+0io 0pf+0w
  forkwait: 0.576u 3.948s 0:04.52 99.7% 0+0k 0+0io 0pf+0w
  forkwait: 0.492u 3.996s 0:04.48 100.0%        0+0k 0+0io 0pf+0w
  forkwait: 0.604u 3.908s 0:04.51 99.7% 0+0k 0+0io 0pf+0w
   divzero: 1.304u 6.740s 0:08.04 100.0%        0+0k 0+0io 0pf+0w
   divzero: 1.360u 6.704s 0:08.06 100.0%        0+0k 0+0io 0pf+0w
   divzero: 1.344u 6.696s 0:08.03 100.0%        0+0k 0+0io 0pf+0w
   divzero: 1.428u 6.600s 0:08.02 100.0%        0+0k 0+0io 0pf+0w
   divzero: 1.308u 6.720s 0:08.02 100.0%        0+0k 0+0io 0pf+0w
  lat_pipe: 0.212u 7.648s 0:16.40 47.8% 0+0k 0+0io 0pf+0w
  lat_pipe: 0.268u 8.208s 0:16.78 50.4% 0+0k 0+0io 0pf+0w
  lat_pipe: 0.188u 8.296s 0:16.42 51.5% 0+0k 0+0io 0pf+0w
  lat_pipe: 0.180u 8.084s 0:16.91 48.8% 0+0k 0+0io 0pf+0w
  lat_pipe: 0.160u 7.668s 0:16.85 46.4% 0+0k 0+0io 0pf+0w
    Switch: 6.168u 6.740s 0:11.91 108.3%        0+0k 0+0io 53pf+0w
    Switch: 5.860u 7.332s 0:11.45 115.1%        0+0k 0+0io 1pf+0w
    Switch: 5.804u 7.140s 0:11.34 114.1%        0+0k 0+0io 1pf+0w
    Switch: 6.168u 6.644s 0:11.12 115.1%        0+0k 0+0io 1pf+0w
    Switch: 6.076u 6.896s 0:11.34 114.2%        0+0k 0+0io 1pf+0w

So lat_pipe seems to have improved slightly.. but it could be noise.  
Yeah, not worth it.  Plus, this patch is obviously broken - the panic() 
could be avoided by reworking the code, but this seems like a large 
amount of work for very little gain.  Nevertheless, I have attached the 
patch for posterity's sake.

Zach

[-- Attachment #2: pae-fork-opt --]
[-- Type: text/plain, Size: 2602 bytes --]


Index: linux-2.6.13-rc4-mm1/arch/i386/mm/pgtable.c
===================================================================
--- linux-2.6.13-rc4-mm1.orig/arch/i386/mm/pgtable.c	2005-08-04 05:42:32.000000000 -0700
+++ linux-2.6.13-rc4-mm1/arch/i386/mm/pgtable.c	2005-08-05 13:04:14.000000000 -0700
@@ -214,8 +214,16 @@
 	clone_pgd_range((pgd_t *)pgd + USER_PTRS_PER_PGD,
 			swapper_pg_dir + USER_PTRS_PER_PGD,
 			PTRS_PER_PGD - USER_PTRS_PER_PGD);
-	if (PTRS_PER_PMD > 1)
+	if (PTRS_PER_PMD > 1) {
+		int i;
+		for (i = 0; i < USER_PTRS_PER_PGD; ++i) {
+			pmd_t *pmd = kmem_cache_alloc(pmd_cache, GFP_KERNEL);
+			if (!pmd)
+				panic("oom");
+			set_pgd(&((pgd_t *)pgd)[i], __pgd(1 + __pa(pmd)));
+		}
 		return;
+	}
 
 	pgd_list_add(pgd);
 	spin_unlock_irqrestore(&pgd_lock, flags);
@@ -226,6 +234,14 @@
 {
 	unsigned long flags; /* can be called from interrupt context */
 
+	/* in the PAE case user pgd entries are overwritten before usage */
+	if (PTRS_PER_PMD > 1) {
+		int i;
+		for (i = 0; i < USER_PTRS_PER_PGD; ++i)
+			kmem_cache_free(pmd_cache, (void *)__va(pgd_val(((pgd_t *)pgd)[i])-1));
+		return;
+	}
+
 	spin_lock_irqsave(&pgd_lock, flags);
 	pgd_list_del(pgd);
 	spin_unlock_irqrestore(&pgd_lock, flags);
@@ -233,35 +249,12 @@
 
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-	int i;
 	pgd_t *pgd = kmem_cache_alloc(pgd_cache, GFP_KERNEL);
-
-	if (PTRS_PER_PMD == 1 || !pgd)
-		return pgd;
-
-	for (i = 0; i < USER_PTRS_PER_PGD; ++i) {
-		pmd_t *pmd = kmem_cache_alloc(pmd_cache, GFP_KERNEL);
-		if (!pmd)
-			goto out_oom;
-		set_pgd(&pgd[i], __pgd(1 + __pa(pmd)));
-	}
 	return pgd;
 
-out_oom:
-	for (i--; i >= 0; i--)
-		kmem_cache_free(pmd_cache, (void *)__va(pgd_val(pgd[i])-1));
-	kmem_cache_free(pgd_cache, pgd);
-	return NULL;
 }
 
 void pgd_free(pgd_t *pgd)
 {
-	int i;
-
-	/* in the PAE case user pgd entries are overwritten before usage */
-	if (PTRS_PER_PMD > 1)
-		for (i = 0; i < USER_PTRS_PER_PGD; ++i)
-			kmem_cache_free(pmd_cache, (void *)__va(pgd_val(pgd[i])-1));
-	/* in the non-PAE case, free_pgtables() clears user pgd entries */
 	kmem_cache_free(pgd_cache, pgd);
 }
Index: linux-2.6.13-rc4-mm1/arch/i386/mm/init.c
===================================================================
--- linux-2.6.13-rc4-mm1.orig/arch/i386/mm/init.c	2005-08-04 04:01:27.000000000 -0700
+++ linux-2.6.13-rc4-mm1/arch/i386/mm/init.c	2005-08-05 13:02:34.000000000 -0700
@@ -635,7 +635,7 @@
 				PTRS_PER_PGD*sizeof(pgd_t),
 				0,
 				pgd_ctor,
-				PTRS_PER_PMD == 1 ? pgd_dtor : NULL);
+				pgd_dtor);
 	if (!pgd_cache)
 		panic("pgtable_cache_init(): Cannot create pgd cache");
 }

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] i386 Encapsulate copying of pgd entries
  2005-08-06  1:13 ` Chris Wright
  2005-08-06  1:52   ` Zachary Amsden
@ 2005-08-06  8:05   ` Zachary Amsden
  1 sibling, 0 replies; 8+ messages in thread
From: Zachary Amsden @ 2005-08-06  8:05 UTC (permalink / raw)
  To: Chris Wright, akpm; +Cc: chrisl, davej, hpa, linux-kernel, pavel, pratap, Riley

[-- Attachment #1: Type: text/plain, Size: 782 bytes --]

Chris Wright wrote:

>* zach@vmware.com (zach@vmware.com) wrote:
>  
>
>>+	memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
>> 	if (PTRS_PER_PMD == 1)
>> 		spin_lock_irqsave(&pgd_lock, flags);
>> 
>>-	memcpy((pgd_t *)pgd + USER_PTRS_PER_PGD,
>>+	clone_pgd_range((pgd_t *)pgd + USER_PTRS_PER_PGD,
>> 			swapper_pg_dir + USER_PTRS_PER_PGD,
>>-			(PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
>>-
>>+			KERNEL_PGD_PTRS);
>> 	if (PTRS_PER_PMD > 1)
>> 		return;
>> 
>> 	pgd_list_add(pgd);
>> 	spin_unlock_irqrestore(&pgd_lock, flags);
>>-	memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
>>    
>>
>
>Why memset was never done on PAE?
>
>thanks,
>-chris
>  
>

Update patch.  Thanks for catching this.  Tested PAE and non-PAE on 
native hardware and in a virtual machine :)

Zach

[-- Attachment #2: pgd-clone-call --]
[-- Type: text/plain, Size: 3424 bytes --]

Add a clone operation for pgd updates.

This helps complete the encapsulation of updates to page tables (or pages
about to become page tables) into accessor functions rather than using
memcpy() to duplicate them.  This is both generally good for consistency
and also necessary for running in a hypervisor which requires explicit
updates to page table entries.

The new function is:

clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
 
   dst - pointer to pgd range anwhere on a pgd page
   src - ""
   count - the number of pgds to copy.

   dst and src can be on the same page, but the range must not overlap
   and must not cross a page boundary.

Note that I ommitted using this call to copy pgd entries into the
software suspend page root, since this is not technically a live paging
structure, rather it is used on resume from suspend.  CC'ing Pavel in case
he has any feedback on this.

Thanks to Chris Wright for noticing that this could be more optimal in
PAE compiles by eliminating the memset.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Index: linux-2.6.13/arch/i386/mm/pgtable.c
===================================================================
--- linux-2.6.13.orig/arch/i386/mm/pgtable.c	2005-08-04 12:02:10.000000000 -0700
+++ linux-2.6.13/arch/i386/mm/pgtable.c	2005-08-06 00:43:51.000000000 -0700
@@ -207,19 +207,19 @@
 {
 	unsigned long flags;
 
-	if (PTRS_PER_PMD == 1)
+	if (PTRS_PER_PMD == 1) {
+		memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
 		spin_lock_irqsave(&pgd_lock, flags);
+	}
 
-	memcpy((pgd_t *)pgd + USER_PTRS_PER_PGD,
+	clone_pgd_range((pgd_t *)pgd + USER_PTRS_PER_PGD,
 			swapper_pg_dir + USER_PTRS_PER_PGD,
-			(PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
-
+			KERNEL_PGD_PTRS);
 	if (PTRS_PER_PMD > 1)
 		return;
 
 	pgd_list_add(pgd);
 	spin_unlock_irqrestore(&pgd_lock, flags);
-	memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
 }
 
 /* never called when PTRS_PER_PMD > 1 */
Index: linux-2.6.13/arch/i386/kernel/smpboot.c
===================================================================
--- linux-2.6.13.orig/arch/i386/kernel/smpboot.c	2005-08-04 12:02:10.000000000 -0700
+++ linux-2.6.13/arch/i386/kernel/smpboot.c	2005-08-04 13:15:45.000000000 -0700
@@ -1017,8 +1017,8 @@
 	tsc_sync_disabled = 1;
 
 	/* init low mem mapping */
-	memcpy(swapper_pg_dir, swapper_pg_dir + USER_PGD_PTRS,
-			sizeof(swapper_pg_dir[0]) * KERNEL_PGD_PTRS);
+	clone_pgd_range(swapper_pg_dir, swapper_pg_dir + USER_PGD_PTRS,
+			KERNEL_PGD_PTRS);
 	flush_tlb_all();
 	schedule_work(&task);
 	wait_for_completion(&done);
Index: linux-2.6.13/include/asm-i386/pgtable.h
===================================================================
--- linux-2.6.13.orig/include/asm-i386/pgtable.h	2005-08-04 12:02:10.000000000 -0700
+++ linux-2.6.13/include/asm-i386/pgtable.h	2005-08-05 17:12:33.000000000 -0700
@@ -276,6 +276,21 @@
 }
 
 /*
+ * clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
+ *
+ *  dst - pointer to pgd range anwhere on a pgd page
+ *  src - ""
+ *  count - the number of pgds to copy.
+ *
+ * dst and src can be on the same page, but the range must not overlap,
+ * and must not cross a page boundary.
+ */
+static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
+{
+       memcpy(dst, src, count * sizeof(pgd_t));
+}
+
+/*
  * Macro to mark a page protection value as "uncacheable".  On processors which do not support
  * it, this is a no-op.
  */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] i386 Encapsulate copying of pgd entries
  2005-08-06  0:26 [PATCH 1/1] i386 Encapsulate copying of pgd entries zach
  2005-08-06  1:13 ` Chris Wright
@ 2005-08-07 19:00 ` Pavel Machek
  2005-08-07 19:20   ` Zachary Amsden
  1 sibling, 1 reply; 8+ messages in thread
From: Pavel Machek @ 2005-08-07 19:00 UTC (permalink / raw)
  To: zach; +Cc: akpm, chrisl, davej, hpa, linux-kernel, pavel, pratap, Riley

Hi!

> This helps complete the encapsulation of updates to page tables (or pages
> about to become page tables) into accessor functions rather than using
> memcpy() to duplicate them.  This is both generally good for consistency
> and also necessary for running in a hypervisor which requires explicit
> updates to page table entries.

Hmm, I'm not sure if this kind of hypervisor can reasonably work with swsusp;
swsusp is just copying memory, it does not know which part of memory are page tables.
				Pavel
-- 
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms         


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] i386 Encapsulate copying of pgd entries
  2005-08-07 19:00 ` Pavel Machek
@ 2005-08-07 19:20   ` Zachary Amsden
  2005-08-07 19:37     ` Pavel Machek
  0 siblings, 1 reply; 8+ messages in thread
From: Zachary Amsden @ 2005-08-07 19:20 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel, pratap

Pavel Machek wrote:

>Hi!
>
>  
>
>>This helps complete the encapsulation of updates to page tables (or pages
>>about to become page tables) into accessor functions rather than using
>>memcpy() to duplicate them.  This is both generally good for consistency
>>and also necessary for running in a hypervisor which requires explicit
>>updates to page table entries.
>>    
>>
>
>Hmm, I'm not sure if this kind of hypervisor can reasonably work with swsusp;
>swsusp is just copying memory, it does not know which part of memory are page tables.
>				Pavel
>  
>

There are good and bad things about this.  Everything with swap suspend 
should be fine for the suspend portion of things; it is when resuming 
back up that one must take care to call the hypervisor functions for 
reloading control registers and updating page tables.

I'm not sure that Xen can cope with that scenario - it would have to go 
into shadowed mode first, then after resume, go back from shadow mode 
into direct page table mode.  Otherwise, as you say, making swap suspend 
aware of what pages are page tables would be quite difficult and 
needlessly complicate the code.

Since most hypervisors will likely provide a suspend/resume mechanism 
that is external to the guest, most of this is a moot point anyways.  
But I wondered if you thought the pgd_clone() accessor would make this 
cleaner or if it is just most confusing:

#ifdef CONFIG_SOFTWARE_SUSPEND
/*
 * Swap suspend & friends need this for resume because things like the 
intel-agp
 * driver might have split up a kernel 4MB mapping.
 */
char __nosavedata swsusp_pg_dir[PAGE_SIZE]
        __attribute__ ((aligned (PAGE_SIZE)));

static inline void save_pg_dir(void)
{
        memcpy(swsusp_pg_dir, swapper_pg_dir, PAGE_SIZE);  <--- could be 
clone_pgd_range()
}

Zach

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/1] i386 Encapsulate copying of pgd entries
  2005-08-07 19:20   ` Zachary Amsden
@ 2005-08-07 19:37     ` Pavel Machek
  0 siblings, 0 replies; 8+ messages in thread
From: Pavel Machek @ 2005-08-07 19:37 UTC (permalink / raw)
  To: Zachary Amsden; +Cc: linux-kernel, pratap

Hi!

> Since most hypervisors will likely provide a suspend/resume mechanism 
> that is external to the guest, most of this is a moot point anyways.  

Ok.

> But I wondered if you thought the pgd_clone() accessor would make this 
> cleaner or if it is just most confusing:
> 
> #ifdef CONFIG_SOFTWARE_SUSPEND
> /*
> * Swap suspend & friends need this for resume because things like the 
> intel-agp
> * driver might have split up a kernel 4MB mapping.
> */
> char __nosavedata swsusp_pg_dir[PAGE_SIZE]
>        __attribute__ ((aligned (PAGE_SIZE)));
> 
> static inline void save_pg_dir(void)
> {
>        memcpy(swsusp_pg_dir, swapper_pg_dir, PAGE_SIZE);  <--- could be 
> clone_pgd_range()
> }

Yep, clone_pgd_range would make sense here... 
								Pavel

-- 
if you have sharp zaurus hardware you don't need... you know my address

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2005-08-07 19:37 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-08-06  0:26 [PATCH 1/1] i386 Encapsulate copying of pgd entries zach
2005-08-06  1:13 ` Chris Wright
2005-08-06  1:52   ` Zachary Amsden
2005-08-06  6:00     ` Zachary Amsden
2005-08-06  8:05   ` Zachary Amsden
2005-08-07 19:00 ` Pavel Machek
2005-08-07 19:20   ` Zachary Amsden
2005-08-07 19:37     ` Pavel Machek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox