Hi, Robert, I found out another confusing code snippet: in void xi_invl_mfn(struct domain *d, unsigned long mfn) if (ext && pfn < ext->large_page_aligned_size) According to the code, it should be if (ext && (pfn>>SPT_ENTRIES_ORDER) < ext->large_page_aligned_size) If I made any mistake, please point it out. _______________________________________________________ Best Regards, hanzhu Robert S. Phillips дµÀ: > Hi Han, > Yes it is a bug but of limited impact. The effect is that we will fail > to keep the 'accessed' flag up-to-date but only for L3 guest page table > entries in 64-bit guests. Still I'm very pleased that you spotted it. > -- rsp > > Robert S. Phillips wrote: >> Hi Han, >> I think you're right. I'll get back to you soon after we review the >> ramifications. >> Thanks! >> -- rsp >> p.s. Out of curiosity ... where do you work? >> >> han wrote: >>> Hi Robert, >>> Thanks a lot for your thorough explanation. >>> However, in dt_init_action I notice this code snippet: >>> + int _32pae_l3 = c_nr_levels <= 3 && c_level == 3; >>> Seems c_nr_levels is always <= 3, isn't it? >>> I'm afraid the code means that c_nr_levels == 3. >>> >>> I suppose the c_nr_levels should be 3 if the guest is 32bit(pae or >>> not), and c_nr_levels should be 4 if the guest is 64bit. So I >>> mentioned you about the code for uncombine_nr_levels. >>> >>> >>> _______________________________________________________ >>> Best Regards, >>> hanzhu >>> >>> >>> Robert S. Phillips дµÀ: >>>> Hello Han, >>>> >>>> The puzzling function, uncombine_nr_levels(), should be reviewed >>>> together with the previous two functions: combine nr_levels() and >>>> uncombine_level(), shown below. The problem I'm attempting to solve >>>> is to minimize the size of the decision table. It is currently 4096 >>>> entries. >>>> >>>> The decision function depends on "level" (that is, the level of the >>>> current shadow page table) and, to a lesser extent, on "nr_levels" >>>> (that is, the number of guest paging levels). It really only cares >>>> if nr_levels is 3 or not, which is why uncombine_nr_levels() returns >>>> 3 or not 3. >>>> >>>> Normally each of the two values can each have three values (2, 3, or >>>> 4) making 9 combinations. >>>> I squeeze them down to 4 combinations, which cuts the decision table >>>> size by more than half. >>>> >>>> The function combine_levels() is used to encode nr_levels and >>>> levels into a number from 0 to 3. >>>> >>>> The functions uncombine_nr_levels() and uncombine_level() do the >>>> inverse, sort of: >>>> uncombine_level() returns the number of levels (1, 2, 3, or 4) >>>> and uncombine_nr_levels() returns 3 if the number of guest paging >>>> levels was 3, 0 otherwise. This is all dt_init_action() needs to >>>> initialize the decision table. It only tests if c_nr_levels is 3. >>>> >>>> I hope this explanation helps. >>>> >>>> -- rsp >>>> >>>> /* Combine nr_levels and level into 4 combinations by >>>> * taking advantage of the fact that the rules for PTE levels >>>> * 3 and 4 are identical >>>> * except that the PDPT (ie nr_level == 3, level == 3) >>>> * is slightly irregular. */ >>>> static inline int combine_levels(int c_nr_levels, >>>> int c_level) >>>> { >>>> switch (c_level) { >>>> case 1: return 0; >>>> case 2: return 1; >>>> case 3: return c_nr_levels == 3 >>>> ? 2 /* PDPT */ >>>> : 3; /* treat like level 4*/ >>>> default: /* the compiler needs this */ >>>> case 4: return 3; >>>> } >>>> } >>>> >>>> static inline int uncombine_level(int combo) >>>> { >>>> return combo + 1; >>>> } >>>> >>>> static inline int uncombine_nr_levels(int combo) >>>> { >>>> return combo == 2 ? 3 : 0; >>>> } >>>> >>>> Ben Thomas wrote: >>>>> >>>>> >>>>> -------- Original Message -------- >>>>> Subject: Re: [Xen-devel] [PATCH - proposed] XI Shadow Page Table >>>>> Mechanism >>>>> Date: Thu, 29 Jun 2006 20:25:13 +0800 >>>>> From: han >>>>> To: Ben Thomas >>>>> References: <44A055FA.1010405@virtualiron.com> >>>>> >>>>> Hi, Ben >>>>> I'm learning about your patch for the rewritten shadow pagetable code. >>>>> However, I feel confused about some part of your code: >>>>> +static inline int uncombine_nr_levels(int combo) >>>>> +{ >>>>> + return combo == 2 ? 3 : 0; >>>>> +} >>>>> shouldn't it be like below snippet: >>>>> +static inline int uncombine_nr_levels(int combo) >>>>> +{ >>>>> + return combo <= 2 ? 3 : 4; >>>>> +} >>>>> >>>>> _______________________________________________________ >>>>> Best Regards, >>>>> hanzhu >>>>> >>>>> >>>>> Ben Thomas дµÀ: >>>>>> A post last week contained the design document for a shadow page >>>>>> table mechanism. This post contains more information, and a pointer >>>>>> to the code. >>>>>> >>>>>> As a recap, the "XI Shadow Mechanism" is a design for shadow >>>>>> page table code for fully virtualized HVM domains running on a 64-bit >>>>>> Xen hypervisor. >>>>>> >>>>>> This work was undertaken to address a number of goals. These are >>>>>> enumerated in the document posted last week and include: >>>>>> >>>>>> - ability to run fully virtualized 32, 32PAE and 64 bit guest >>>>>> domains concurrently on a 64-bit hypervisor >>>>>> >>>>>> - start support of live migration of fully virtualized domains >>>>>> >>>>>> - provide good performance and robustness >>>>>> >>>>>> - limit size and scope of change to Xen hypervisor >>>>>> >>>>>> This design has been coded, and tested on a variety of operating >>>>>> systems. I've included this weekend's test report. >>>>>> >>>>>> Of particular interest from the following table are the Average Time >>>>>> comparisions (smaller is better): >>>>>> >>>>>> pfault 10.56 vs 2.43 (XI) >>>>>> sync_all 39.72 vs 33.51 (XI) >>>>>> invl_pg 22.75 vs 4.00 (XI) >>>>>> >>>>>> The patch is too large for the mailing list, and may be obtained from >>>>>> >>>>>> http://s3.amazonaws.com/VIDownloads/xi-20060626.patch >>>>>> >>>>>> The patch applies against changeset 10352 (and possibly others) >>>>>> >>>>>> We look forward to any comments and suggestions. >>>>>> >>>>>> Thanks, >>>>>> -b >>>>>> >>>>>> >>>>>> >>>>>> XI Shadow Test Report >>>>>> >>>>>> This report shows a comparsion of test runs done on xen-unstable >>>>>> changeset >>>>>> 10352 with and without the XI patch applied. It also compares >>>>>> pfault, >>>>>> sync_all, and invl_pg performance. >>>>>> >>>>>> With the XI patch 32bit SMP (PAE) guests now boot and pass the >>>>>> regression >>>>>> tests, closing Bug #661. With the XI patch 64bit SMP guests are >>>>>> much more >>>>>> stable and are no longer experiencing SEGVs and GPFs, closing Bug >>>>>> #678. >>>>>> >>>>>> ---------------------------------------------------------------------- >>>>>> >>>>>> | XEN | Guest Kernel SMP kernels booted with 2 >>>>>> CPUs | >>>>>> | >>>>>> Changeset|-----------------------------------------------------------| >>>>>> >>>>>> | | 32bit Non-XI | 32bit XI | 64bit Non-XI | 64bit >>>>>> XI | >>>>>> | >>>>>> |--------------|--------------|--------------|--------------| >>>>>> | | Boot | Test | Boot | Test | Boot | Test | Boot | >>>>>> Test | >>>>>> |----------|------|-------|------|-------|------|-------|------|-------| >>>>>> >>>>>> | 10352 | Fail | | Pass | 852/1 | Pass | 852/0 | Pass | >>>>>> 852/0 | >>>>>> | | (1) | | | (2) | |(3,4,5)| >>>>>> |(3,4) | >>>>>> ---------------------------------------------------------------------- >>>>>> >>>>>> >>>>>> 1. BUG 661: 32bit SMP guest crashes on boot: >>>>>> "(XEN) __hvm_bug at vmx.c:2289" >>>>>> 2. BUG 666: 32bit UP guest fail ltp gettimeofday02: >>>>>> "Time is going backwards" >>>>>> 3. BUG 663: 64bit SMP guest report these messages while running: >>>>>> "(XEN) spurious IRQ irq got=-1" >>>>>> 4. BUG 662: 64bit SMP guest report this message in ltp pth_str01/2/3: >>>>>> "(XEN) level trig mode repeatedly for >>>>>> vector 252" >>>>>> 5. BUG 678: 64bit 2CPU SMP guest failed several ltp tests with >>>>>> SEGVs or GPF >>>>>> "sshd[23654] general protection rip:2a9555d426 rsp:7fbfffd000 >>>>>> error 0" >>>>>> >>>>>> The XI patch can be enabled or disabled at build time and includes >>>>>> some >>>>>> performance statistics that can be examined whether or not it's >>>>>> enabled. >>>>>> This allows us to compare the performance of XI vs the existing >>>>>> shadow code. >>>>>> You access the statistics from the XEN console (^a^a^a). Typing a 'Y' >>>>>> clears the counters and a 'y' displays the counters since the last >>>>>> clear. >>>>>> Here is a comparsion of XI vs Non-XI shadow code done on xen-unstable >>>>>> changeset 10352. The guest is a RHEL4U2-64bit 2 CPU SMP 256MB >>>>>> Guest running >>>>>> a usex -b24 load for 5 minutes. Shadow statistics cleared at >>>>>> start of run. >>>>>> >>>>>> (XEN) Non-XI shadow performance statistics for the last 303 seconds >>>>>> (XEN) Op type # of >>>>>> Operations Total >>>>>> Avg Tick Profile >>>>>> (XEN) <2usec <3usec <4usec <8usec >>>>>> <16usec <32usec <64usec >=64usec ops/msec usec In >>>>>> Op Locked Total >>>>>> (XEN) pfault 6750685 1128922 624988 3249437 >>>>>> 3833201 2863229 1224908 315992 19991362 10.56 >>>>>> 0 0 0 >>>>>> (XEN) 33.7% 5.6% 3.1% 16.2% >>>>>> 19.1% 14.3% 6.1% 1.5% 211175 >>>>>> (XEN) sync_all 8176 380 333 159011 >>>>>> 167619 305470 814598 298411 1753998 39.72 >>>>>> 0 0 0 >>>>>> (XEN) 0.4% 0.0% 0.0% 9.0% >>>>>> 9.5% 17.4% 46.4% 17.0% 69670 >>>>>> (XEN) invl_pg 219 17 114 23333 >>>>>> 40087 21904 24622 4639 114935 22.75 >>>>>> 0 0 0 >>>>>> (XEN) 0.1% 0.0% 0.0% 20.3% >>>>>> 34.8% 19.0% 21.4% 4.0% 2615 >>>>>> >>>>>> (XEN) XI shadow performance statistics for the last 302 seconds >>>>>> (XEN) Op type # of >>>>>> Operations Total >>>>>> Avg Tick Profile >>>>>> (XEN) <2usec <3usec <4usec <8usec >>>>>> <16usec <32usec <64usec >=64usec ops/msec usec In >>>>>> Op Locked Total >>>>>> (XEN) pfault 23078414 4694139 485499 613419 >>>>>> 396088 1427265 11976 8228 30715028 2.43 >>>>>> 0 0 0 >>>>>> (XEN) 75.1% 15.2% 1.5% 1.9% >>>>>> 1.2% 4.6% 0.0% 0.0% 74690 >>>>>> (XEN) sync_all 1734 97 169 305433 >>>>>> 484991 174300 150199 130405 1247328 33.51 >>>>>> 0 0 0 >>>>>> (XEN) 0.1% 0.0% 0.0% 24.4% >>>>>> 38.8% 13.9% 12.0% 10.4% 41810 >>>>>> (XEN) invl_pg 10169 302977 125270 44927 >>>>>> 5385 17603 776 1656 508763 4.00 >>>>>> 0 0 0 >>>>>> (XEN) 1.9% 59.5% 24.6% 8.8% >>>>>> 1.0% 3.4% 0.1% 0.3% 2037 >>>>>> >>>>>> >>>>> >>>> >> >