From mboxrd@z Thu Jan 1 00:00:00 1970 From: zhu Subject: [PATCH - proposed] XI Shadow Page Table Mechanism] Date: Fri, 30 Jun 2006 10:37:16 +0800 Message-ID: <44A48E5C.8060609@gmail.com> References: <44A3CA74.4000206@virtualiron.com> <44A3D3CE.3040408@virtualiron.com> <44A3E08B.9090103@gmail.com> <44A3E98B.9040602@virtualiron.com> <44A3F073.50305@virtualiron.com> Mime-Version: 1.0 Content-Type: text/plain; charset=gb18030; format=flowed Content-Transfer-Encoding: quoted-printable Return-path: In-Reply-To: <44A3F073.50305@virtualiron.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: "Robert S. Phillips" Cc: Xen-devel@lists.xensource.com List-Id: xen-devel@lists.xenproject.org Hi, Robert, I found out another confusing code snippet: in void xi_invl_mfn(struct domain *d, unsigned long mfn) if (ext && pfn < ext->large_page_aligned_size) According to the code, it should be if (ext && (pfn>>SPT_ENTRIES_ORDER) < ext->large_page_aligned_size) If I made any mistake, please point it out. _______________________________________________________ Best Regards, hanzhu Robert S. Phillips =D0=B4=B5=C0: > Hi Han, > Yes it is a bug but of limited impact. The effect is that we will fail= =20 > to keep the 'accessed' flag up-to-date but only for L3 guest page table= =20 > entries in 64-bit guests. Still I'm very pleased that you spotted it. > -- rsp >=20 > Robert S. Phillips wrote: >> Hi Han, >> I think you're right. I'll get back to you soon after we review the=20 >> ramifications. >> Thanks! >> -- rsp >> p.s. Out of curiosity ... where do you work? >> >> han wrote: >>> Hi Robert, >>> Thanks a lot for your thorough explanation. >>> However, in dt_init_action I notice this code snippet: >>> + int _32pae_l3 =3D c_nr_levels <=3D 3 && c_level =3D=3D 3= ; >>> Seems c_nr_levels is always <=3D 3, isn't it? >>> I'm afraid the code means that c_nr_levels =3D=3D 3. >>> >>> I suppose the c_nr_levels should be 3 if the guest is 32bit(pae or=20 >>> not), and c_nr_levels should be 4 if the guest is 64bit. So I=20 >>> mentioned you about the code for uncombine_nr_levels. >>> >>> >>> _______________________________________________________ >>> Best Regards, >>> hanzhu >>> >>> >>> Robert S. Phillips =D0=B4=B5=C0: >>>> Hello Han, >>>> >>>> The puzzling function, uncombine_nr_levels(), should be reviewed=20 >>>> together with the previous two functions: combine nr_levels() and=20 >>>> uncombine_level(), shown below. The problem I'm attempting to solve= =20 >>>> is to minimize the size of the decision table. It is currently 4096= =20 >>>> entries. >>>> >>>> The decision function depends on "level" (that is, the level of the=20 >>>> current shadow page table) and, to a lesser extent, on "nr_levels"=20 >>>> (that is, the number of guest paging levels). It really only cares=20 >>>> if nr_levels is 3 or not, which is why uncombine_nr_levels() returns= =20 >>>> 3 or not 3. >>>> >>>> Normally each of the two values can each have three values (2, 3, or= =20 >>>> 4) making 9 combinations. >>>> I squeeze them down to 4 combinations, which cuts the decision table= =20 >>>> size by more than half. >>>> >>>> The function combine_levels() is used to encode nr_levels and=20 >>>> levels into a number from 0 to 3. >>>> >>>> The functions uncombine_nr_levels() and uncombine_level() do the=20 >>>> inverse, sort of: >>>> uncombine_level() returns the number of levels (1, 2, 3, or 4) >>>> and uncombine_nr_levels() returns 3 if the number of guest paging=20 >>>> levels was 3, 0 otherwise. This is all dt_init_action() needs to=20 >>>> initialize the decision table. It only tests if c_nr_levels is 3. >>>> >>>> I hope this explanation helps. >>>> >>>> -- rsp >>>> >>>> /* Combine nr_levels and level into 4 combinations by >>>> * taking advantage of the fact that the rules for PTE levels >>>> * 3 and 4 are identical >>>> * except that the PDPT (ie nr_level =3D=3D 3, level =3D=3D 3) >>>> * is slightly irregular. */ >>>> static inline int combine_levels(int c_nr_levels, >>>> int c_level) >>>> { >>>> switch (c_level) { >>>> case 1: return 0; >>>> case 2: return 1; >>>> case 3: return c_nr_levels =3D=3D 3 >>>> ? 2 /* PDPT */ >>>> : 3; /* treat like level 4*/ >>>> default: /* the compiler needs this */ >>>> case 4: return 3; >>>> } >>>> } >>>> >>>> static inline int uncombine_level(int combo) >>>> { >>>> return combo + 1; >>>> } >>>> >>>> static inline int uncombine_nr_levels(int combo) >>>> { >>>> return combo =3D=3D 2 ? 3 : 0; >>>> } >>>> >>>> Ben Thomas wrote: >>>>> >>>>> >>>>> -------- Original Message -------- >>>>> Subject: Re: [Xen-devel] [PATCH - proposed] XI Shadow Page Table=20 >>>>> Mechanism >>>>> Date: Thu, 29 Jun 2006 20:25:13 +0800 >>>>> From: han >>>>> To: Ben Thomas >>>>> References: <44A055FA.1010405@virtualiron.com> >>>>> >>>>> Hi, Ben >>>>> I'm learning about your patch for the rewritten shadow pagetable co= de. >>>>> However, I feel confused about some part of your code: >>>>> +static inline int uncombine_nr_levels(int combo) >>>>> +{ >>>>> + return combo =3D=3D 2 ? 3 : 0; >>>>> +} >>>>> shouldn't it be like below snippet: >>>>> +static inline int uncombine_nr_levels(int combo) >>>>> +{ >>>>> + return combo <=3D 2 ? 3 : 4; >>>>> +} >>>>> >>>>> _______________________________________________________ >>>>> Best Regards, >>>>> hanzhu >>>>> >>>>> >>>>> Ben Thomas =D0=B4=B5=C0: >>>>>> A post last week contained the design document for a shadow page >>>>>> table mechanism. This post contains more information, and a pointe= r >>>>>> to the code. >>>>>> >>>>>> As a recap, the "XI Shadow Mechanism" is a design for shadow >>>>>> page table code for fully virtualized HVM domains running on a 64-= bit >>>>>> Xen hypervisor. >>>>>> >>>>>> This work was undertaken to address a number of goals. These are >>>>>> enumerated in the document posted last week and include: >>>>>> >>>>>> - ability to run fully virtualized 32, 32PAE and 64 bit guest >>>>>> domains concurrently on a 64-bit hypervisor >>>>>> >>>>>> - start support of live migration of fully virtualized domains >>>>>> >>>>>> - provide good performance and robustness >>>>>> >>>>>> - limit size and scope of change to Xen hypervisor >>>>>> >>>>>> This design has been coded, and tested on a variety of operating >>>>>> systems. I've included this weekend's test report. >>>>>> >>>>>> Of particular interest from the following table are the Average Ti= me >>>>>> comparisions (smaller is better): >>>>>> >>>>>> pfault 10.56 vs 2.43 (XI) >>>>>> sync_all 39.72 vs 33.51 (XI) >>>>>> invl_pg 22.75 vs 4.00 (XI) >>>>>> >>>>>> The patch is too large for the mailing list, and may be obtained f= rom >>>>>> >>>>>> http://s3.amazonaws.com/VIDownloads/xi-20060626.patch >>>>>> >>>>>> The patch applies against changeset 10352 (and possibly others) >>>>>> >>>>>> We look forward to any comments and suggestions. >>>>>> >>>>>> Thanks, >>>>>> -b >>>>>> >>>>>> >>>>>> >>>>>> XI Shadow Test Report >>>>>> >>>>>> This report shows a comparsion of test runs done on xen-unstable=20 >>>>>> changeset >>>>>> 10352 with and without the XI patch applied. It also compares=20 >>>>>> pfault, >>>>>> sync_all, and invl_pg performance. >>>>>> >>>>>> With the XI patch 32bit SMP (PAE) guests now boot and pass the=20 >>>>>> regression >>>>>> tests, closing Bug #661. With the XI patch 64bit SMP guests are=20 >>>>>> much more >>>>>> stable and are no longer experiencing SEGVs and GPFs, closing Bug=20 >>>>>> #678. >>>>>> >>>>>> -----------------------------------------------------------------= -----=20 >>>>>> >>>>>> | XEN | Guest Kernel SMP kernels booted with 2=20 >>>>>> CPUs | >>>>>> |=20 >>>>>> Changeset|--------------------------------------------------------= ---|=20 >>>>>> >>>>>> | | 32bit Non-XI | 32bit XI | 64bit Non-XI | 64bit=20 >>>>>> XI | >>>>>> | =20 >>>>>> |--------------|--------------|--------------|--------------| >>>>>> | | Boot | Test | Boot | Test | Boot | Test | Boot |=20 >>>>>> Test | >>>>>> |----------|------|-------|------|-------|------|-------|------|--= -----|=20 >>>>>> >>>>>> | 10352 | Fail | | Pass | 852/1 | Pass | 852/0 | Pass |=20 >>>>>> 852/0 | >>>>>> | | (1) | | | (2) | |(3,4,5)| =20 >>>>>> |(3,4) | >>>>>> -----------------------------------------------------------------= -----=20 >>>>>> >>>>>> >>>>>> 1. BUG 661: 32bit SMP guest crashes on boot: >>>>>> "(XEN) __hvm_bug at vmx.c:2289" >>>>>> 2. BUG 666: 32bit UP guest fail ltp gettimeofday02: >>>>>> "Time is going backwards" >>>>>> 3. BUG 663: 64bit SMP guest report these messages while running: >>>>>> "(XEN) spurious IRQ irq got=3D-1" >>>>>> 4. BUG 662: 64bit SMP guest report this message in ltp pth_str01/2= /3: >>>>>> "(XEN) level trig mode repeatedly for=20 >>>>>> vector 252" >>>>>> 5. BUG 678: 64bit 2CPU SMP guest failed several ltp tests with=20 >>>>>> SEGVs or GPF >>>>>> "sshd[23654] general protection rip:2a9555d426 rsp:7fbfffd000=20 >>>>>> error 0" >>>>>> >>>>>> The XI patch can be enabled or disabled at build time and includes= =20 >>>>>> some >>>>>> performance statistics that can be examined whether or not it's=20 >>>>>> enabled. >>>>>> This allows us to compare the performance of XI vs the existing=20 >>>>>> shadow code. >>>>>> You access the statistics from the XEN console (^a^a^a). Typing a = 'Y' >>>>>> clears the counters and a 'y' displays the counters since the last= =20 >>>>>> clear. >>>>>> Here is a comparsion of XI vs Non-XI shadow code done on xen-unsta= ble >>>>>> changeset 10352. The guest is a RHEL4U2-64bit 2 CPU SMP 256MB=20 >>>>>> Guest running >>>>>> a usex -b24 load for 5 minutes. Shadow statistics cleared at=20 >>>>>> start of run. >>>>>> >>>>>> (XEN) Non-XI shadow performance statistics for the last 303 second= s >>>>>> (XEN) Op type # of=20 >>>>>> Operations Total =20 >>>>>> Avg Tick Profile >>>>>> (XEN) <2usec <3usec <4usec <8usec =20 >>>>>> <16usec <32usec <64usec >=3D64usec ops/msec usec In= =20 >>>>>> Op Locked Total >>>>>> (XEN) pfault 6750685 1128922 624988 3249437 =20 >>>>>> 3833201 2863229 1224908 315992 19991362 10.56 =20 >>>>>> 0 0 0 >>>>>> (XEN) 33.7% 5.6% 3.1% 16.2% =20 >>>>>> 19.1% 14.3% 6.1% 1.5% 211175 >>>>>> (XEN) sync_all 8176 380 333 159011 =20 >>>>>> 167619 305470 814598 298411 1753998 39.72 =20 >>>>>> 0 0 0 >>>>>> (XEN) 0.4% 0.0% 0.0% 9.0% =20 >>>>>> 9.5% 17.4% 46.4% 17.0% 69670 >>>>>> (XEN) invl_pg 219 17 114 23333 =20 >>>>>> 40087 21904 24622 4639 114935 22.75 =20 >>>>>> 0 0 0 >>>>>> (XEN) 0.1% 0.0% 0.0% 20.3% =20 >>>>>> 34.8% 19.0% 21.4% 4.0% 2615 >>>>>> >>>>>> (XEN) XI shadow performance statistics for the last 302 seconds >>>>>> (XEN) Op type # of=20 >>>>>> Operations Total =20 >>>>>> Avg Tick Profile >>>>>> (XEN) <2usec <3usec <4usec <8usec =20 >>>>>> <16usec <32usec <64usec >=3D64usec ops/msec usec In= =20 >>>>>> Op Locked Total >>>>>> (XEN) pfault 23078414 4694139 485499 613419 =20 >>>>>> 396088 1427265 11976 8228 30715028 2.43 =20 >>>>>> 0 0 0 >>>>>> (XEN) 75.1% 15.2% 1.5% 1.9% =20 >>>>>> 1.2% 4.6% 0.0% 0.0% 74690 >>>>>> (XEN) sync_all 1734 97 169 305433 =20 >>>>>> 484991 174300 150199 130405 1247328 33.51 =20 >>>>>> 0 0 0 >>>>>> (XEN) 0.1% 0.0% 0.0% 24.4% =20 >>>>>> 38.8% 13.9% 12.0% 10.4% 41810 >>>>>> (XEN) invl_pg 10169 302977 125270 44927 =20 >>>>>> 5385 17603 776 1656 508763 4.00 =20 >>>>>> 0 0 0 >>>>>> (XEN) 1.9% 59.5% 24.6% 8.8% =20 >>>>>> 1.0% 3.4% 0.1% 0.3% 2037 >>>>>> >>>>>> >>>>> >>>> >> >=20