* Question regarding x86_64 __PHYSICAL_MASK_SHIFT
@ 2005-10-04 11:59 Tejun Heo
  2005-10-04 12:02 ` Tejun Heo
  2005-10-04 17:24 ` Andi Kleen
  0 siblings, 2 replies; 8+ messages in thread
From: Tejun Heo @ 2005-10-04 11:59 UTC (permalink / raw)
  To: Andi Kleen; +Cc: lkml


  Hello, Andi.

  In include/asm-x86_64/page.h, __VIRTUAL_MASK_SHIFT is defined as 48 
bits which is the size of virtual address space on current x86_64 
machines as used as such.  OTOH, __PHYSICAL_MASK_SHIFT is defined as 46 
and used as mask shift for physical page address (i.e. physaddr >> 12).

  In addition to being a bit confusing due to similar names but 
different meanings, this means that we assume processors can physically 
address 58 (46 + 12) bits, but both the amd64 and IA-32e manuals say 
that the current architectural limit is 52 bits and that bits 52-62 are 
reserved in all page table entries.  This doesn't cause any problem 
currently (or in the foreseeable future), but it's still a bit weird.
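
  To make the overshoot concrete, here is a minimal userspace sketch 
(not kernel code; the two mask #defines mirror the current header 
values, the rest is plain arithmetic):

#include <stdio.h>

/* mirrors of the current include/asm-x86_64/page.h values */
#define PAGE_SHIFT		12
#define __PHYSICAL_MASK_SHIFT	46
#define __PHYSICAL_MASK		((1UL << __PHYSICAL_MASK_SHIFT) - 1)

int main(void)
{
	/* pte_pfn() masks the pfn, so the largest pfn it can return is
	 * __PHYSICAL_MASK itself: a 46-bit pfn, and therefore a
	 * 46 + 12 = 58 bit physical address. */
	unsigned long max_pfn  = __PHYSICAL_MASK;
	unsigned long max_addr = max_pfn << PAGE_SHIFT;

	printf("max pfn:  0x%016lx\n", max_pfn);   /* 0x00003fffffffffff */
	printf("max addr: 0x%016lx\n", max_addr);  /* 0x03fffffffffff000 */
	return 0;
}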

  Am I missing something?

  Thanks.

-- 
tejun


* Re: Question regarding x86_64 __PHYSICAL_MASK_SHIFT
  2005-10-04 11:59 Question regarding x86_64 __PHYSICAL_MASK_SHIFT Tejun Heo
@ 2005-10-04 12:02 ` Tejun Heo
  2005-10-04 17:24 ` Andi Kleen
  1 sibling, 0 replies; 8+ messages in thread
From: Tejun Heo @ 2005-10-04 12:02 UTC (permalink / raw)
  To: Andi Kleen; +Cc: lkml

 Sorry, corrections.

On Tue, Oct 04, 2005 at 08:59:48PM +0900, Tejun Heo wrote:
> 
>  Hello, Andi.
> 
>  In include/asm-x86_64/page.h, __VIRTUAL_MASK_SHIFT is defined as 48 
> bits which is the size of virtual address space on current x86_64 
> machines as used as such.  OTOH, __PHYSICAL_MASK_SHIFT is defined as 46 

 "as used as such" above should read "and used as such".

> and used as mask shift for physical page address (i.e. physaddr >> 12).

 "physical page address" should read "physical page number".

 Sorry about the extra noise; I gotta eat something, feeling dizzy.  :-)

-- 
tejun


* Re: Question regarding x86_64 __PHYSICAL_MASK_SHIFT
  2005-10-04 11:59 Question regarding x86_64 __PHYSICAL_MASK_SHIFT Tejun Heo
  2005-10-04 12:02 ` Tejun Heo
@ 2005-10-04 17:24 ` Andi Kleen
  2005-10-04 18:52   ` Tejun Heo
  1 sibling, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2005-10-04 17:24 UTC (permalink / raw)
  To: Tejun Heo; +Cc: lkml, discuss

On Tuesday 04 October 2005 13:59, Tejun Heo wrote:
>   Hello, Andi.
>
>   In include/asm-x86_64/page.h, __VIRTUAL_MASK_SHIFT is defined as 48
> bits which is the size of virtual address space on current x86_64
> machines as used as such.  OTOH, __PHYSICAL_MASK_SHIFT is defined as 46
> and used as mask shift for physical page address (i.e. physaddr >> 12).
>
>   In addition to being a bit confusing due to similar names but
> different meanings, this means that we assume processors can physically
> address 58 (46 + 12) bits, but both the amd64 and IA-32e manuals say
> that the current architectural limit is 52 bits and that bits 52-62 are
> reserved in all page table entries.  This doesn't cause any problem
> currently (or in the foreseeable future), but it's still a bit weird.

You're right - PHYSICAL_MASK shouldn't be applied to PFNs, only to full
addresses. Fixed with appended patch.

The 46-bit limit is because half of the 48-bit virtual space 
is used for user space, and the other 47-bit half is divided between
the direct mapping and other mappings (ioremap, vmalloc, etc.).  All 
physical memory has to fit into the direct mapping, so you 
end up with a 46-bit limit.

See also Documentation/x86-64/mm.txt
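
To restate that arithmetic as a toy userspace sketch (the exact region 
boundaries are in mm.txt; the values here only mirror the sizes):

#include <stdio.h>

int main(void)
{
	unsigned long va_bits    = 48;                   /* implemented virtual bits */
	unsigned long user_half  = 1UL << (va_bits - 1); /* 2^47 for user space */
	unsigned long kern_half  = 1UL << (va_bits - 1); /* 2^47 for the kernel */
	/* the kernel half is split between the direct mapping of all
	 * physical memory and everything else (vmalloc, ioremap, ...) */
	unsigned long direct_map = kern_half / 2;        /* 2^46 */

	printf("user space:     2^%d bytes\n", __builtin_ctzl(user_half));  /* 47 */
	printf("direct mapping: 2^%d bytes\n", __builtin_ctzl(direct_map)); /* 46 */
	return 0;
}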

-Andi

Don't apply __PHYSICAL_MASK to page frame numbers

Pointed out by Tejun Heo.

Cc: htejun@gmail.com

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux/include/asm-x86_64/pgtable.h
===================================================================
--- linux.orig/include/asm-x86_64/pgtable.h
+++ linux/include/asm-x86_64/pgtable.h
@@ -259,7 +259,7 @@ static inline unsigned long pud_bad(pud_
 #define pages_to_mb(x) ((x) >> (20-PAGE_SHIFT))	/* FIXME: is this
 						   right? */
 #define pte_page(x)	pfn_to_page(pte_pfn(x))
-#define pte_pfn(x)  ((pte_val(x) >> PAGE_SHIFT) & __PHYSICAL_MASK)
+#define pte_pfn(x)  ((pte_val(x) & __PHYSICAL_MASK) >> PAGE_SHIFT)
 
 static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
 {
@@ -384,7 +384,7 @@ static inline pud_t *__pud_offset_k(pud_
 #define pmd_clear(xp)	do { set_pmd(xp, __pmd(0)); } while (0)
 #define	pmd_bad(x)	((pmd_val(x) & (~PTE_MASK & ~_PAGE_USER)) != _KERNPG_TABLE )
 #define pfn_pmd(nr,prot) (__pmd(((nr) << PAGE_SHIFT) | pgprot_val(prot)))
-#define pmd_pfn(x)  ((pmd_val(x) >> PAGE_SHIFT) & __PHYSICAL_MASK)
+#define pmd_pfn(x)  ((pmd_val(x) & __PHYSICAL_MASK) >> PAGE_SHIFT)
 
 #define pte_to_pgoff(pte) ((pte_val(pte) & PHYSICAL_PAGE_MASK) >> PAGE_SHIFT)
 #define pgoff_to_pte(off) ((pte_t) { ((off) << PAGE_SHIFT) | _PAGE_FILE })


* Re: Question regarding x86_64 __PHYSICAL_MASK_SHIFT
  2005-10-04 17:24 ` Andi Kleen
@ 2005-10-04 18:52   ` Tejun Heo
  2005-10-04 19:01     ` Andi Kleen
  0 siblings, 1 reply; 8+ messages in thread
From: Tejun Heo @ 2005-10-04 18:52 UTC (permalink / raw)
  To: Andi Kleen; +Cc: lkml, discuss

 Hello, Andi.

On Tue, Oct 04, 2005 at 07:24:56PM +0200, Andi Kleen wrote:
> 
> You're right - PHYSICAL_MASK shouldn't be applied to PFNs, only to full
> addresses. Fixed with appended patch.
> 
> The 46-bit limit is because half of the 48-bit virtual space 
> is used for user space, and the other 47-bit half is divided between
> the direct mapping and other mappings (ioremap, vmalloc, etc.).  All 
> physical memory has to fit into the direct mapping, so you 
> end up with a 46-bit limit.

 __PHYSICAL_MASK is only used to mask out non-pfn bits from page table
entries.  I don't really see how it's related to virtual space
construction.

> 
> See also Documentation/x86-64/mm.txt
> 

 Thanks.  :-)

 I think PHYSICAL_PAGE_MASK and PTE_FILE_MAX_BITS should also be
updated.  How about the following patch?  Compile & boot tested.


 This patch cleans up __PHYSICAL_MASK and related constants.

- __PHYSICAL_MASK_SHIFT is changed to 52 to reflect the architectural
  physical address limit.

- PHYSICAL_PAGE_MASK is fixed to properly mask the pfn bits of page
  table entries.

- PTE_FILE_MAX_BITS is fixed to properly reflect the number of bits
  available in a page table entry (40 bits).

- Macros dealing with page table entries are modified to universally
  use PTE_MASK (which equals PHYSICAL_PAGE_MASK) when extracting the
  pfn, for consistency.

Signed-off-by: Tejun Heo <htejun@gmail.com>

diff --git a/include/asm-x86_64/page.h b/include/asm-x86_64/page.h
--- a/include/asm-x86_64/page.h
+++ b/include/asm-x86_64/page.h
@@ -11,7 +11,7 @@
 #define PAGE_SIZE	(1UL << PAGE_SHIFT)
 #endif
 #define PAGE_MASK	(~(PAGE_SIZE-1))
-#define PHYSICAL_PAGE_MASK	(~(PAGE_SIZE-1) & (__PHYSICAL_MASK << PAGE_SHIFT))
+#define PHYSICAL_PAGE_MASK	(PAGE_MASK & __PHYSICAL_MASK)
 
 #define THREAD_ORDER 1 
 #ifdef __ASSEMBLY__
@@ -81,7 +81,7 @@ typedef struct { unsigned long pgprot; }
 #define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
 
 /* See Documentation/x86_64/mm.txt for a description of the memory map. */
-#define __PHYSICAL_MASK_SHIFT	46
+#define __PHYSICAL_MASK_SHIFT	52
 #define __PHYSICAL_MASK		((1UL << __PHYSICAL_MASK_SHIFT) - 1)
 #define __VIRTUAL_MASK_SHIFT	48
 #define __VIRTUAL_MASK		((1UL << __VIRTUAL_MASK_SHIFT) - 1)
diff --git a/include/asm-x86_64/pgtable.h b/include/asm-x86_64/pgtable.h
--- a/include/asm-x86_64/pgtable.h
+++ b/include/asm-x86_64/pgtable.h
@@ -101,7 +101,7 @@ static inline void pgd_clear (pgd_t * pg
 }
 
 #define pud_page(pud) \
-((unsigned long) __va(pud_val(pud) & PHYSICAL_PAGE_MASK))
+((unsigned long) __va(pud_val(pud) & PTE_MASK))
 
 #define ptep_get_and_clear(mm,addr,xp)	__pte(xchg(&(xp)->pte, 0))
 
@@ -245,7 +245,7 @@ static inline unsigned long pud_bad(pud_
 #define pages_to_mb(x) ((x) >> (20-PAGE_SHIFT))	/* FIXME: is this
 						   right? */
 #define pte_page(x)	pfn_to_page(pte_pfn(x))
-#define pte_pfn(x)  ((pte_val(x) >> PAGE_SHIFT) & __PHYSICAL_MASK)
+#define pte_pfn(x)	((pte_val(x) & PTE_MASK) >> PAGE_SHIFT)
 
 static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
 {
@@ -352,11 +352,11 @@ static inline pud_t *__pud_offset_k(pud_
 #define pmd_clear(xp)	do { set_pmd(xp, __pmd(0)); } while (0)
 #define	pmd_bad(x)	((pmd_val(x) & (~PTE_MASK & ~_PAGE_USER)) != _KERNPG_TABLE )
 #define pfn_pmd(nr,prot) (__pmd(((nr) << PAGE_SHIFT) | pgprot_val(prot)))
-#define pmd_pfn(x)  ((pmd_val(x) >> PAGE_SHIFT) & __PHYSICAL_MASK)
+#define pmd_pfn(x)	((pmd_val(x) & PTE_MASK) >> PAGE_SHIFT)
 
-#define pte_to_pgoff(pte) ((pte_val(pte) & PHYSICAL_PAGE_MASK) >> PAGE_SHIFT)
+#define pte_to_pgoff(pte) ((pte_val(pte) & PTE_MASK) >> PAGE_SHIFT)
 #define pgoff_to_pte(off) ((pte_t) { ((off) << PAGE_SHIFT) | _PAGE_FILE })
-#define PTE_FILE_MAX_BITS __PHYSICAL_MASK_SHIFT
+#define PTE_FILE_MAX_BITS (__PHYSICAL_MASK_SHIFT - PAGE_SHIFT)
 
 /* PTE - Level 1 access. */
 


* Re: Question regarding x86_64 __PHYSICAL_MASK_SHIFT
  2005-10-04 18:52   ` Tejun Heo
@ 2005-10-04 19:01     ` Andi Kleen
  2005-10-04 19:19       ` Tejun Heo
  0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2005-10-04 19:01 UTC (permalink / raw)
  To: Tejun Heo; +Cc: lkml, discuss

On Tuesday 04 October 2005 20:52, Tejun Heo wrote:
>  Hello, Andi.
>
> On Tue, Oct 04, 2005 at 07:24:56PM +0200, Andi Kleen wrote:
> > You're right - PHYSICAL_MASK shouldn't be applied to PFNs, only to full
> > addresses. Fixed with appended patch.
> >
> > The 46-bit limit is because half of the 48-bit virtual space
> > is used for user space, and the other 47-bit half is divided between
> > the direct mapping and other mappings (ioremap, vmalloc, etc.).  All
> > physical memory has to fit into the direct mapping, so you
> > end up with a 46-bit limit.
>
>  __PHYSICAL_MASK is only used to mask out non-pfn bits from page table
> entries.  I don't really see how it's related to virtual space
> construction.

Any other bits are not needed and should be pte_bad()ed.

Ok, there could be IO mappings beyond 46 bits in theory, but I will worry
about those when they happen.  For now it's better to error out, to more
easily detect memory corruption in page tables (some x86-64 CPUs tend to
machine check when presented with unmapped physical addresses, which
is nasty to track down).

>
> > See also Documentation/x86-64/mm.txt
>
>  Thanks.  :-)
>
>  I think PHYSICAL_PAGE_MASK and PTE_FILE_MAX_BITS should also be
> updated.  How about the following patch?  Compile & boot tested.

No, I think the existing code with my patch is fine.

-Andi


* Re: Question regarding x86_64 __PHYSICAL_MASK_SHIFT
  2005-10-04 19:01     ` Andi Kleen
@ 2005-10-04 19:19       ` Tejun Heo
  2005-10-04 19:27         ` Andi Kleen
  0 siblings, 1 reply; 8+ messages in thread
From: Tejun Heo @ 2005-10-04 19:19 UTC (permalink / raw)
  To: Andi Kleen; +Cc: lkml, discuss

Andi Kleen wrote:
> On Tuesday 04 October 2005 20:52, Tejun Heo wrote:
> 
>> Hello, Andi.
>>
>>On Tue, Oct 04, 2005 at 07:24:56PM +0200, Andi Kleen wrote:
>>
>>>You're right - PHYSICAL_MASK shouldn't be applied to PFNs, only to full
>>>addresses. Fixed with appended patch.
>>>
>>>The 46-bit limit is because half of the 48-bit virtual space
>>>is used for user space, and the other 47-bit half is divided between
>>>the direct mapping and other mappings (ioremap, vmalloc, etc.).  All
>>>physical memory has to fit into the direct mapping, so you
>>>end up with a 46-bit limit.
>>
>> __PHYSICAL_MASK is only used to mask out non-pfn bits from page table
>>entries.  I don't really see how it's related to virtual space
>>construction.
> 
> 
> Any other bits are not needed and should be pte_bad()ed.
> 
> Ok, there could be IO mappings beyond 46 bits in theory, but I will worry
> about those when they happen.  For now it's better to error out, to more
> easily detect memory corruption in page tables (some x86-64 CPUs tend to
> machine check when presented with unmapped physical addresses, which
> is nasty to track down).
> 

  Ahh.. I see.

> 
>>>See also Documentation/x86-64/mm.txt
>>
>> Thanks.  :-)
>>
>> I think PHYSICAL_PAGE_MASK and PTE_FILE_MAX_BITS should also be
>>updated.  How about the following patch?  Compile & boot tested.
> 
> 
> No, I think the existing code with my patch is fine.

  Hmmm.. but, currently

* PHYSICAL_PAGE_MASK == (~(PAGE_SIZE-1) & (__PHYSICAL_MASK << PAGE_SHIFT))
	== (0xffffffff_fffff000 & (0x00003fff_ffffffff << 12))
  	== 0x03ffffff_fffff000
  while it actually should be 0x00003fff_fffff000

* PTE_FILE_MAX_BITS == __PHYSICAL_MASK_SHIFT == 46, but only 40 bits are 
available in page table entries.
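
  A quick userspace check of the first value above (assuming 4K pages; 
this just re-evaluates the current macros):

#include <stdio.h>

#define PAGE_SHIFT		12
#define PAGE_SIZE		(1UL << PAGE_SHIFT)
#define __PHYSICAL_MASK_SHIFT	46
#define __PHYSICAL_MASK		((1UL << __PHYSICAL_MASK_SHIFT) - 1)
/* the current (pre-fix) definition from include/asm-x86_64/page.h */
#define PHYSICAL_PAGE_MASK	(~(PAGE_SIZE-1) & (__PHYSICAL_MASK << PAGE_SHIFT))

int main(void)
{
	/* prints 0x03fffffffffff000: bits 46-57 are wrongly kept */
	printf("PHYSICAL_PAGE_MASK = 0x%016lx\n", PHYSICAL_PAGE_MASK);
	return 0;
}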

-- 
tejun


* Re: Question regarding x86_64 __PHYSICAL_MASK_SHIFT
  2005-10-04 19:19       ` Tejun Heo
@ 2005-10-04 19:27         ` Andi Kleen
  2005-10-05  4:01           ` Tejun Heo
  0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2005-10-04 19:27 UTC (permalink / raw)
  To: Tejun Heo; +Cc: lkml, discuss

On Tuesday 04 October 2005 21:19, Tejun Heo wrote:

>   Hmmm.. but, currently
>
> * PHYSICAL_PAGE_MASK == (~(PAGE_SIZE-1) & (__PHYSICAL_MASK << PAGE_SHIFT))
> 	== (0xffffffff_fffff000 & (0x00003fff_ffffffff << 12))
>   	== 0x03ffffff_fffff000
>   while it actually should be 0x00003fff_fffff000

Right. Fixed now

> * PTE_FILE_MAX_BITS == __PHYSICAL_MASK_SHIFT == 46, but only 40 bits are
> available in page table entries.

The non-linear mapping format is independent of the MMU; the number
is pretty much arbitrary, but it is consistent to make it the same as
for other ptes, for easier sanity checking.

-Andi


* Re: Question regarding x86_64 __PHYSICAL_MASK_SHIFT
  2005-10-04 19:27         ` Andi Kleen
@ 2005-10-05  4:01           ` Tejun Heo
  0 siblings, 0 replies; 8+ messages in thread
From: Tejun Heo @ 2005-10-05  4:01 UTC (permalink / raw)
  To: Andi Kleen; +Cc: lkml, discuss


  Hello, Andi.

Andi Kleen wrote:
> On Tuesday 04 October 2005 21:19, Tejun Heo wrote:
> 
> 
>>  Hmmm.. but, currently
>>
>>* PHYSICAL_PAGE_MASK == (~(PAGE_SIZE-1) & (__PHYSICAL_MASK << PAGE_SHIFT))
>>	== (0xffffffff_fffff000 & (0x00003fff_ffffffff << 12))
>>  	== 0x03ffffff_fffff000
>>  while it actually should be 0x00003fff_fffff000
> 
> 
> Right. Fixed now
> 
> 
>>* PTE_FILE_MAX_BITS == __PHYSICAL_MASK_SHIFT == 46, but only 40 bits are
>>available in page table entries.
> 
> 
> The non-linear mapping format is independent of the MMU; the number
> is pretty much arbitrary, but it is consistent to make it the same as
> for other ptes, for easier sanity checking.

  Okay, please forgive me if I'm bugging you with something stupid, but I 
still don't quite get it.  When using a NONLINEAR mapping, the pgoff is 
stored in the pte for later use, when faulting the page back in.  The 
pgoff is stored and read with the following macros.

#define pte_to_pgoff(pte) \
	((pte_val(pte) & PHYSICAL_PAGE_MASK) >> PAGE_SHIFT)
#define pgoff_to_pte(off) \
	((pte_t) { ((off) << PAGE_SHIFT) | _PAGE_FILE })

  In pte_to_pgoff we're masking the pte value with PHYSICAL_PAGE_MASK, 
which gives us 34 bits with the patches applied.  This means that if a 
pgoff goes through pgoff_to_pte and then pte_to_pgoff, only 34 bits survive.

  sys_remap_file_pages() checks whether the required pgoffs fit in ptes 
using PTE_FILE_MAX_BITS.

#define PTE_FILE_MAX_BITS __PHYSICAL_MASK_SHIFT

  This is 46 with the patches applied, meaning that we could end up 
shoving a value larger than 34 bits into a pte and losing information 
when reading it back (and it's only 16GB!).
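
  To illustrate, a minimal userspace sketch of the round trip (the mask 
value assumes the patches applied; the _PAGE_FILE value here is just an 
assumed placeholder flag):

#include <stdio.h>

#define PAGE_SHIFT		12
#define _PAGE_FILE		0x040UL			/* assumed flag value */
#define PHYSICAL_PAGE_MASK	0x00003ffffffff000UL	/* with the fix applied */

/* the two macros, written as functions for a userspace test */
static unsigned long pgoff_to_pte(unsigned long off)
{
	return (off << PAGE_SHIFT) | _PAGE_FILE;
}

static unsigned long pte_to_pgoff(unsigned long pte)
{
	return (pte & PHYSICAL_PAGE_MASK) >> PAGE_SHIFT;
}

int main(void)
{
	unsigned long off  = 1UL << 34;	/* one bit past the 34 that survive */
	unsigned long back = pte_to_pgoff(pgoff_to_pte(off));

	/* prints "stored 0x400000000, read back 0x0": bit 34 is lost */
	printf("stored 0x%lx, read back 0x%lx\n", off, back);
	return 0;
}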

  So, IMHO, we should either shrink PTE_FILE_MAX_BITS to 36 or change 
the pte_to_pgoff/pgoff_to_pte macros to carry more bits (as pte bits 52-62 
are available, we can shove in 46 bits easily).

  No?

-- 
tejun
