From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1IXk5O-00089k-SF for qemu-devel@nongnu.org; Tue, 18 Sep 2007 16:54:22 -0400 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1IXk5M-00089V-Ap for qemu-devel@nongnu.org; Tue, 18 Sep 2007 16:54:22 -0400 Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1IXk5M-00089Q-6G for qemu-devel@nongnu.org; Tue, 18 Sep 2007 16:54:20 -0400 Received: from bangui.magic.fr ([195.154.194.245]) by monty-python.gnu.org with esmtps (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60) (envelope-from ) id 1IXk5L-000662-Bz for qemu-devel@nongnu.org; Tue, 18 Sep 2007 16:54:19 -0400 Received: from [192.168.0.2] (ppp-36.net-123.static.magiconline.fr [80.118.184.36]) by bangui.magic.fr (8.13.1/8.13.1) with ESMTP id l8IKsGYg005636 for ; Tue, 18 Sep 2007 22:54:16 +0200 Subject: [Fwd: Re: [Qemu-devel] [PATCH] SVM support] From: "J. Mayer" Content-Type: text/plain Date: Tue, 18 Sep 2007 22:54:16 +0200 Message-Id: <1190148856.14938.247.camel@rapid> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Reply-To: qemu-devel@nongnu.org List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: qemu-devel@nongnu.org Forwarded, as requested. -------- Forwarded Message -------- > From: Alexander Graf > To: J.Mayer > Subject: Re: [Qemu-devel] [PATCH] SVM support > Date: Tue, 18 Sep 2007 13:51:26 +0200 > > On Sep 18, 2007, at 12:09 PM, J. Mayer wrote: > > > On Tue, 2007-09-18 at 03:02 +0200, Alexander Graf wrote: > >> Jocelyn Mayer wrote: > >>> On Mon, 2007-09-17 at 12:02 +0200, Alexander Graf wrote: > >>> > >>>> J. Mayer wrote: > >>>> > >>>>> On Thu, 2007-09-13 at 17:27 +0200, Alexander Graf wrote: > >>>>> > >>>>> > >>>>>> Thiemo Seufer wrote: > >>>>>> > >>>>>> > >>>>>>> Alexander Graf wrote: > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>> Thanks to Michael Peter who tried the patch on an x86 host I > >>>>>>>> found some > >>>>>>>> compilation problems. > >>>>>>>> This updated patch addresses these issues and thus should > >>>>>>>> compile on > >>>>>>>> every platform for every target available. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>> [...] > >>>>> > >>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>> Wow sorry about that, looks like I missed these. > >>>>>> > >>>>>> > >>>>> Index: qemu-0.9.0.cvs/exec-all.h > >>>>> ================================================================== > >>>>> = > >>>>> --- qemu-0.9.0.cvs.orig/exec-all.h > >>>>> +++ qemu-0.9.0.cvs/exec-all.h > >>>>> @@ -166,6 +166,7 @@ static inline int tlb_set_page(CPUState > >>>>> typedef struct TranslationBlock { > >>>>> target_ulong pc; /* simulated PC corresponding to this > >>>>> block (EIP > >>>>> + CS base) */ > >>>>> target_ulong cs_base; /* CS base for this block */ > >>>>> + uint64_t intercept; /* SVM intercept vector */ > >>>>> unsigned int flags; /* flags defining in which context the > >>>>> code was > >>>>> generated */ > >>>>> uint16_t size; /* size of target code for this block > >>>>> (1 <= > >>>>> size <= TARGET_PAGE_SIZE) */ > >>>>> Index: qemu-0.9.0.cvs/cpu-all.h > >>>>> ================================================================== > >>>>> = > >>>>> --- qemu-0.9.0.cvs.orig/cpu-all.h > >>>>> +++ qemu-0.9.0.cvs/cpu-all.h > >>>>> @@ -715,6 +715,7 @@ extern int code_copy_enabled; > >>>>> #define CPU_INTERRUPT_HALT 0x20 /* CPU halt wanted */ > >>>>> #define CPU_INTERRUPT_SMI 0x40 /* (x86 only) SMI interrupt > >>>>> pending > >>>>> */ > >>>>> #define CPU_INTERRUPT_DEBUG 0x80 /* Debug event occured. */ > >>>>> +#define CPU_INTERRUPT_VIRQ 0x100 /* virtual interrupt > >>>>> pending. */ > >>>>> > >>>>> void cpu_interrupt(CPUState *s, int mask); > >>>>> void cpu_reset_interrupt(CPUState *env, int mask); > >>>>> > >>>>> Those two patches look ugly to me as target specific features > >>>>> should > >>>>> never go in generic code or structures. > >>>>> The CPU_INTERRUPT flags should just contain information about the > >>>>> emulator behavior, thus CPU_INTERRUPT_TIMER, CPU_INTERRUPT_SMI > >>>>> are to be > >>>>> removed. Target specific informations about the exception > >>>>> nature should > >>>>> go in the CPUState structure... Then, adding a > >>>>> CPU_INTERRUPT_VIRQ seems > >>>>> not a good idea at all: it's outside of the generic emulator > >>>>> scope and > >>>>> pointless for most targets. > >>>>> > >>>>> > >>>> I don't see any practical reason not to do it this way. We could > >>>> define > >>>> a CPU_INTERRUPT_TARGET interrupt, that stores the information in > >>>> a the > >>>> target specific CPUState, but this would hit performance > >>>> (marginally > >>>> though) and does not improve the situation. I don't think that > >>>> we should > >>>> should forcefully try to seperate targets where we are not even > >>>> close to > >>>> running out of constants. If everyone on this list sees the > >>>> situation as > >>>> Jocelyn does, I would be fine with writing a patch that puts target > >>>> specific interrupts to the specific targets. > >>>> > >>> > >>> I was to do the same to add some functionality to the PowerPC > >>> target, > >>> long time ago, and Fabrice Bellard convinced me not to do and > >>> agreed it > >>> has been a bad idea to add the target specific CPU_INTERRUPT_xxx > >>> flags. > >>> Then I did manage the PowerPC target to use only generic > >>> CPU_INTERRUPT_xxx and put all other target specific informations > >>> in the > >>> CPUState structure. It absolutelly changes nothing in terms of > >>> functionality nor performance. The only thing is that it makes the > >>> generic parts simpler and could even allow the cpu_exec function to > >>> contain almost no target specific code, which would really great > >>> imho. > >>> > >> > >> I can give that a try :-). Sounds reasonable for me. > > > >> [Next message] > >> Oh well I just thought about this a bit more and while stumbling > >> across > >> CPU_INTERRUPT_FIQ which does the same thing one major problem came > >> to me > >> on that one: Priority. Real interrupts have priority over virtual > >> interrupts so the VIRQs have to be processed after HARD IRQs, whereas > >> SMIs and NMIs have to be taken care of before the HARD IRQs. It > >> simply > >> wouldn't work out to generalize the IRQs, just as it would not > >> work with > >> the VIRQ, as it has to be handled as a real IRQ but without accessing > >> the APIC which has to be done for HARD IRQs. > >> > >> I guess we're stuck with what's there now. > > > > Priorities are not an issue, you can take a look to the > > ppc_hw_interrupt function > > to see how the PowerPC emulation code manages priorities and > > masking between > > 10 different hardware interruption sources. The code could be > > better (I wanted > > to preserve existing code) but it's a solution that actually works... > > But looking better the x86 emulation code, I agree that if you > > don't want to change it > > too much, you have to add a flag. The other solution (and best, > > imho) would be > > to do the same as what has been done for PowerPC: just let the > > generic code in > > the cpu_exec loop and move the x86 specific code somewhere in > > target-i386 code. > > Going this way, I see no reason why the interruption code in the > > cpu_exec loop > > could not finally become: > > interrupt_request = env->interrupt_request; > > if (__builtin_expect(interrupt_request, 0)) { > > if (interrupt_request & CPU_INTERRUPT_DEBUG) { > > env->interrupt_request &= > > ~CPU_INTERRUPT_DEBUG; > > env->exception_index = EXCP_DEBUG; > > cpu_loop_exit(); > > } > > if (interrupt_request & CPU_INTERRUPT_HALT) { > > env->interrupt_request &= ~CPU_INTERRUPT_HALT; > > env->halted = 1; > > env->exception_index = EXCP_HLT; > > cpu_loop_exit(); > > } > > if (interrupt_request & CPU_INTERRUPT_HARD) { > > hw_interrupt(env); > > if (env->pending_interrupts == 0) > > env->interrupt_request &= > > ~CPU_INTERRUPT_HARD; > > #if defined(__sparc__) && !defined(HOST_SOLARIS) > > tmp_T0 = 0; > > #else > > T0 = 0; > > #endif > > } > > /* Don't use the cached interupt_request value, > > do_interrupt may have updated the EXITTB > > flag. */ > > if (interrupt_request & CPU_INTERRUPT_EXITTB) { > > env->interrupt_request &= > > ~CPU_INTERRUPT_EXITTB; > > /* ensure that no TB jump will be modified as > > the program flow was changed */ > > #if defined(__sparc__) && !defined(HOST_SOLARIS) > > tmp_T0 = 0; > > #else > > T0 = 0; > > #endif > > } > > if (interrupt_request & CPU_INTERRUPT_EXIT) { > > env->interrupt_request &= ~CPU_INTERRUPT_EXIT; > > env->exception_index = EXCP_INTERRUPT; > > cpu_loop_exit(); > > } > > } > > All the targets specific tricks could be done in the hw_interrupt > > routine. And the generic code > > would become much more readable. But this needs some works (not so > > much) and intensive tests... > > And I guess nobody feels like taking this risk right now ;-) > > But I think this will have to be done someday... > > I definitely agree with you on that one. The current interrupt > handlers are filled with #ifdefs and not easily readable. This is a > different task though and maybe I will get the time to do this. I > believe it should be done "all archs at once" though, so either we > use the current method (adding non-gerneric interrupts) for now and > tidy up everything later on or tidy up the code now and port the svm > patch on that. As I don't see the cleanup coming in the near future > I'd vote for including it now and have it reworked at once when > someone feels dare enough to clean up the whole generic interrupt > handler. > > Nevertheless I really do like the idea of having a cleaned up > interrupt handler. This is a different patch though and is rather a > TODO than anything that should be mangled with the svm patch. > > > > >>> > >>>>> For the same reason, the intercept field in the TB structure > >>>>> seems not > >>>>> acceptable, as TB specific target informations are already to > >>>>> be stored > >>>>> in the flags field. As intercept seems only to be a bit field, > >>>>> it should > >>>>> go, in a way or another, in tb flags. And as it seems that some > >>>>> > >>>>> > >>>> TB flags is a 32-bit bitfield, where several bits are already used. > >>>> Currently SVM supports 45 intercepts and each combination can > >>>> generate a > >>>> different TB, as we do not want to reduce performance for the > >>>> non-intercepted case. Think of the intercept field as flags2, > >>>> that could > >>>> be used by other virtualization implementations on other > >>>> architectures > >>>> too (though I do not know of any) > >>>> > >>> > >>> So, why not make it generic and extend the flag field to as many > >>> bits as > >>> you need ? Intercept is x86 specific and has no meaning outside > >>> of this > >>> scope. Having more bits for flags is generic and may be useful > >>> for some > >>> other targets... The easy way is to convert flag to 64 bits, if > >>> you have > >>> enough bits. Another way is to make it a table, but this may need > >>> changes in all targets code... > >>> > >> > >> I think it might be quite a bad idea to extend flags until everything > >> fits in there. I'm planning on implementing VMX as well and there are > >> intercepts too so the meaning of the different fields change and I > >> would > >> not want that happening with the normal flags vector. What would you > >> think of an #ifdef TARGET_I386? This way intercepts are x86 only and > >> don't interfere with any other target architecture, but are still > >> seperated from the flags (which is semantically correct as far as it > >> goes for me). > > > > I don't consider as a problem that one bit in the TB flags field can > > have different meanings. > > This field is something opaque from the generic code. I actually have > > the same thing in > > the PowerPC emulation (even if it's not fully done for now): TB flags > > are just an > > selection of the Machine State Register bits. Some of those bits can > > have different > > meanings depending on the PowerPC implementation (ie CPU model). > > But the > > only thing important is to know if the current CPUState flags are > > equal > > to the > > TB flags, not the actual meaning of the bits. As the MSR is 32 bits on > > 32 bits > > PowerPC implementations, I have no hesitation to use those bits > > directly > > even if their actual meaning, from the CPU "point of view" changes. > > Then, imho, you should not hesitate to put those intercept bits in a > > generic field... > > > Ok, I will try to shift the intercepts in an uint_64 flags variable > in the TB. > > > > >>> > >>>>> interceptions are related with functions implemented in helpers > >>>>> (not > >>>>> micro-ops), you'd better check the interception in the helper at > >>>>> runtime, which would add no visible overhead (as calling a > >>>>> helper is > >>>>> slow compared to direct micro-ops execution), then you would > >>>>> not need to > >>>>> store those infos in the TB structure. This may even make the > >>>>> emulation > >>>>> > >>>>> > >>>> This was the case in an earlier version of the patch. The current > >>>> implementation you see is an optimization to make it running > >>>> faster, as > >>>> there is (nearly) no need for runtime detection. > >>>> > >>> > >>> For all interceptions related to operations in helpers, it's not a > >>> actual optimisation. Things done in helpers are slow and not > >>> widely used > >>> (compared to the translated code, I mean), then adding a single > >>> test for > >>> interception in a helper may not hurt global performances. You > >>> may need > >>> TB flags only for operations you want to intercept that would be > >>> inlined > >>> in the translated code, then maybe reduce the number of bits needed > >>> without decreasing performanced. But, not knowing a lot about x86 VM > >>> specification I may be wrong... > >>> > >>> > >> > >> I feel really uncomfortable with supporting hundreds of different > >> ways > >> of interception. It took me days to take all the intercept checks > >> out of > >> the helpers into the translate.c, as it is way easier to maintain and > >> read that way. I do not like the idea of converting the intercept > >> vector > >> to something internal. It's quite nice to be able to use the very > >> same > >> constants as kvm does. This makes debugging really easy and you don't > >> have to define everything twice. Additionally this way you can > >> just set > >> a variable without calculating anything, which does come in handy > >> as well. > >> > >> Basically the reason I really want to have everyting in > >> translate.c is > >> simplicity though. > > > > OK, I understand that using the same code for all interceptions can > > make > > your coding > > far easier... > > > >>>>> run faster has you won't fill the TB cache with multiple > >>>>> translation of > >>>>> the same code each time the env->intercept changes, thus have > >>>>> chance to > >>>>> avoid many TB caches flushes. > >>>>> > >>>>> > >>>> This is actually intended, as I believe the emulated machine > >>>> should run > >>>> as fast as before when not using any interceptions > >>>> > >>> > >>> It depends if those flags are to be changed often or not, which > >>> depends > >>> on the hypervisor software in use I guess. If the flags never > >>> change, > >>> there'll be no TB flush, I agree... > >>> > >> > >> Usually (at least that's the case with kvm) you have two different > >> sets > >> of intercept flags. One for the real machine and one for the > >> virtualized > >> one. There is only one minor situation where that might change but > >> that's not worth mentioning here. Nevertheless TBs point to physical > >> memory (afaik) so since the virtual kernel is on a different page > >> than > >> the real one, we could not reuse TBs anyway so having the intercept > >> information in the TB only helps in finding out which intercepts to > >> check for as only a minority of the possible intercepts actually > >> get used. > > > > So as far as the hypervisor do not provide shared routines to be > > used by > > the OS, > > you should never have any "collisison" between translated blocks in > > different environments ? > > I guess you need a complete TB flush when switching between the > > virtualized environments. > > Then, OK, the interception flags should never lead to fill the TB > > cache > > with multiple translations > > of the same blocks... > > Exactly. The only possible "collision" would be two virtual machines > that share code on the same physical page somehow. > > Cheers, > > Alex -- J. Mayer Never organized