From mboxrd@z Thu Jan 1 00:00:00 1970
From: PUCCETTI Armand
Subject: Re: idle_pg_tables??
Date: Tue, 05 Sep 2006 12:50:16 +0200
Message-ID: <44FD5668.7080806@cea.fr>
References: <907625E08839C4409CE5768403633E0BA7FEDC@sefsexmb1.amd.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
In-Reply-To: <907625E08839C4409CE5768403633E0BA7FEDC@sefsexmb1.amd.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: "Petersson, Mats"
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

Petersson, Mats wrote:
>> -----Original Message-----
>> From: xen-devel-bounces@lists.xensource.com
>> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of
>> PUCCETTI Armand
>> Sent: 01 September 2006 17:11
>> To: xen-devel@lists.xensource.com
>> Subject: [Xen-devel] idle_pg_tables??
>>
>> In the paging mechanism of XEN what is the role of the
>> 'idle_pg_table*' variables ??
>>
>> For a 4-levels paging system these variables are defined in
>> x86_64.S and partially initialised.
>> Here is the code, copied from x86_64.S:
>>
>> __________________________________________________
>> ...
>>
>> /* Initial PML4 -- level-4 page table. */
>>         .org 0x2000
>> ENTRY(idle_pg_table)
>> ENTRY(idle_pg_table_4)
>>         .quad idle_pg_table_l3 - __PAGE_OFFSET + 7 # PML4[0]
>>         .fill 261,8,0
>>         .quad idle_pg_table_l3 - __PAGE_OFFSET + 7 # PML4[262]
>>
>> /* Initial PDP -- level-3 page table. */
>>         .org 0x3000
>> ENTRY(idle_pg_table_l3)
>>         .quad idle_pg_table_l2 - __PAGE_OFFSET + 7
>>
>> /* Initial PDE -- level-2 page table. Maps first 64MB physical memory.
>> */
>>         .org 0x4000
>> ENTRY(idle_pg_table_l2)
>>         .macro identmap from=0, count=32
>>         .if \count-1
>>         identmap "(\from+0)","(\count/2)"
>>         identmap "(\from+(0x200000*(\count/2)))","(\count/2)"
>>         .else
>>         .quad 0x00000000000001e3 + \from
>>         .endif
>>         .endm
>>         identmap
>>
>>         .org 0x4000 + PAGE_SIZE
>>         .code64
>>
>>         .section ".bss.stack_aligned","w"
>> ENTRY(cpu0_stack)
>>         .fill STACK_SIZE,1,0
>> ______________________________________________________
>> trying to understand that:
>>
>> - idle_pg_table_l4 is the same as idle_pg_table and contains
>> 263 entries, all zeroed but two (identical) ones. These
>> two pointers point somewhere close to idle_pg_table_l3. Why are there
>> two identical pointers and why shift them by __PAGE_OFFSET +7?
>
> So that we can have a map for both LOW memory (address zero and 1GB
> forward) and a map for the upper range of memory where Xen's
> base-virtual address is (__PAGE_OFFSET). I think you'll find that if you
> shift __PAGE_OFFSET a sufficient number of bits (30 or so), the remaining
> number is 262... [I haven't checked this]. Reusing the same page-table
> entry allows the use of a single entry in the next page-table level.
>
> It's shifted by PAGE_OFFSET because the code is linked such that
> everything is based on the virtual address that we eventually will use
> in the system. But the page-table wants to have a PHYSICAL address, so
> we subtract the virtual base address from the location that we want the
> PT entry to point to.
>
> The magic number of 7 represents the flags for the page-entry, which is
> bit 0=Present, bit 1=R/W (Writable) and bit 2 U/S => User accessible.
> Since this is the top-level page, it makes sense to set it all to
> present and allow full access, since next level down can always override
> a permission (but can't allow something forbidden by upper level).
>
>> - idle_pg_table_l3 is located between 0x3000 and 0x4000,
>> with only the first slot initialised. The latter points to
>> level 2 table with some offset.
>
> This allows the next 128MB of memory to be mapped. Which is sufficient
> for the initialization of the system.
>
>> - idle_pg_table_l2 has terrible code with a recursive macro,
>> which expands into 63 quad constants. It is unclear
>> to me why this complicated macro?? I would have put a table
>> of constants pretty simply... Every entry in that l2 table points to a
>> fixed address, at intervals of 4K (a page). l2 tables are
>> located between 0x01E3 to 0x03E001E3 in groups. Every group
>> is apparently a set of 4 page tables and each table has a
>> size of 128K.
>> Groups are separated by approx 256MB.
>> Why are these spacings and groups?
>
> I can't explain why there is a macro and why it does things in the way
> it does, except I think you'll find that it's related to the code being
> located at a virtual address which is non-zero at this level [I haven't
> checked this out].
>
> The value 0x1E3 is used to indicate that the pages are 2MB, Dirty
> (prevents the MMU from rewriting them dirty if they are later written),
> Accessed (same reason as D), Writeable and Present.
>
>> - idle_pg_table_l1 is not an entry and so l1 tables are not
>> allocated. Why?
>
> Because the value 1E3 (or part thereof) is indicating that the page is
> 2MB pages, so we don't need a L1 table entry for the pages defined in
> the above way.

OK, pages are 2MB in size on AMD64.
Now, what is the supposed size of that idle_pg_table_l2?
According to page.h, the C view of that variable is

extern l2_pgentry_t idle_pg_table_l2[ROOT_PAGETABLE_ENTRIES]

with ROOT_PAGETABLE_ENTRIES=512, but according to x86_64.S, the memory
area where that variable is placed, namely between 0x4000 and
0x4000+PAGE_SIZE, is much bigger. What is the real size of that variable?

Same question for idle_pg_table_l4 and _l3? They are, a priori,
512*8 bytes=2M long...

> May I ask what you're trying to achieve - as far as I know, the above
> code is working just fine, so messing with it doesn't seem like a good
> plan [Getting page-table initialization and such things to work right is
> notoriously complicated, because it tends to break without any way of
> really debugging it].

Sorry that I forgot to write my intentions: the purpose is to understand
the source code and perform a static analysis of it. That's part of the
IST FP6 OPENTC project.
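
For reference, the numbers discussed in this thread can be checked mechanically. The Python sketch below is an illustration only, not Xen code. It re-implements the recursion of the identmap macro, decodes the flag values 7 and 0x1e3 using the standard x86-64 page-table bit positions, and computes the size of one 512-entry table. All helper names are hypothetical, and the value used for __PAGE_OFFSET (0xffff830000000000) is an assumption: it is not given in the thread and was picked to be consistent with the "PML4[262]" comment in the quoted assembly.

```python
# Illustrative sketch only; helper names are hypothetical, not Xen identifiers.

PAGE_SIZE = 4096          # x86 base page size in bytes
ENTRY_BYTES = 8           # one .quad per page-table entry
ENTRIES_PER_TABLE = 512   # ROOT_PAGETABLE_ENTRIES as quoted from page.h

# Architectural x86-64 page-table flag bits (low bits of an entry).
FLAG_BITS = {
    0: "Present",
    1: "Writable",
    2: "User",
    5: "Accessed",
    6: "Dirty",
    7: "PageSize",   # in a level-2 entry: maps a 2MB page, no level-1 table
    8: "Global",
}


def decode_flags(entry):
    """Return the names of the flag bits set in a page-table entry."""
    return [name for bit, name in sorted(FLAG_BITS.items())
            if (entry >> bit) & 1]


def identmap(start=0, count=32):
    """Mirror the recursive identmap macro: halve count until it reaches 1."""
    if count - 1:
        half = count // 2
        return identmap(start, half) + identmap(start + 0x200000 * half, half)
    return [0x1e3 + start]


def pml4_index(vaddr):
    """Level-4 table index: bits 39..47 of the virtual address."""
    return (vaddr >> 39) & 0x1ff


# ASSUMPTION: not stated in the thread; consistent with the PML4[262] comment.
ASSUMED_PAGE_OFFSET = 0xffff830000000000

if __name__ == "__main__":
    l2_entries = identmap()
    print("flags of 7:     ", decode_flags(7))
    print("flags of 0x1e3: ", decode_flags(0x1e3))
    print("L2 entries:     ", len(l2_entries))
    print("last L2 entry:  ", hex(l2_entries[-1]))
    print("mapped MB:      ", len(l2_entries) * 0x200000 // (1024 * 1024))
    print("PML4 index:     ", pml4_index(ASSUMED_PAGE_OFFSET))
    print("table bytes:    ", ENTRIES_PER_TABLE * ENTRY_BYTES)
```

Under these assumptions the macro produces 32 entries spaced 2MB apart (64MB in total, matching the comment in the assembly), the last one being 0x3e001e3. A table of 512 eight-byte entries occupies 4096 bytes, which is exactly the span from 0x4000 to 0x4000 + PAGE_SIZE when PAGE_SIZE is 4096.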