From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1IXk5O-00089k-SF
	for qemu-devel@nongnu.org; Tue, 18 Sep 2007 16:54:22 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1IXk5M-00089V-Ap
	for qemu-devel@nongnu.org; Tue, 18 Sep 2007 16:54:22 -0400
Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1IXk5M-00089Q-6G
	for qemu-devel@nongnu.org; Tue, 18 Sep 2007 16:54:20 -0400
Received: from bangui.magic.fr ([195.154.194.245])
	by monty-python.gnu.org with esmtps
	(TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60)
	(envelope-from <l_indien@magic.fr>) id 1IXk5L-000662-Bz
	for qemu-devel@nongnu.org; Tue, 18 Sep 2007 16:54:19 -0400
Received: from [192.168.0.2] (ppp-36.net-123.static.magiconline.fr
	[80.118.184.36])
	by bangui.magic.fr (8.13.1/8.13.1) with ESMTP id l8IKsGYg005636
	for <qemu-devel@nongnu.org>; Tue, 18 Sep 2007 22:54:16 +0200
Subject: [Fwd: Re: [Qemu-devel] [PATCH] SVM support]
From: "J. Mayer" <l_indien@magic.fr>
Content-Type: text/plain
Date: Tue, 18 Sep 2007 22:54:16 +0200
Message-Id: <1190148856.14938.247.camel@rapid>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: qemu-devel@nongnu.org

Forwarded, as requested.

-------- Forwarded Message --------
> From: Alexander Graf <agraf@suse.de>
> To: J.Mayer <l_indien@magic.fr>
> Subject: Re: [Qemu-devel] [PATCH] SVM support
> Date: Tue, 18 Sep 2007 13:51:26 +0200
> 
> On Sep 18, 2007, at 12:09 PM, J. Mayer wrote:
> 
> > On Tue, 2007-09-18 at 03:02 +0200, Alexander Graf wrote:
> >> Jocelyn Mayer wrote:
> >>> On Mon, 2007-09-17 at 12:02 +0200, Alexander Graf wrote:
> >>>
> >>>> J. Mayer wrote:
> >>>>
> >>>>> On Thu, 2007-09-13 at 17:27 +0200, Alexander Graf wrote:
> >>>>>
> >>>>>
> >>>>>> Thiemo Seufer wrote:
> >>>>>>
> >>>>>>
> >>>>>>> Alexander Graf wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> Thanks to Michael Peter who tried the patch on an x86 host I  
> >>>>>>>> found some
> >>>>>>>> compilation problems.
> >>>>>>>> This updated patch addresses these issues and thus should  
> >>>>>>>> compile on
> >>>>>>>> every platform for every target available.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>> [...]
> >>>>>
> >>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>> Wow sorry about that, looks like I missed these.
> >>>>>>
> >>>>>>
> >>>>> Index: qemu-0.9.0.cvs/exec-all.h
> >>>>> ================================================================== 
> >>>>> =
> >>>>> --- qemu-0.9.0.cvs.orig/exec-all.h
> >>>>> +++ qemu-0.9.0.cvs/exec-all.h
> >>>>> @@ -166,6 +166,7 @@ static inline int tlb_set_page(CPUState
> >>>>>  typedef struct TranslationBlock {
> >>>>>      target_ulong pc;   /* simulated PC corresponding to this  
> >>>>> block (EIP
> >>>>> + CS base) */
> >>>>>      target_ulong cs_base; /* CS base for this block */
> >>>>> +    uint64_t intercept; /* SVM intercept vector */
> >>>>>      unsigned int flags; /* flags defining in which context the  
> >>>>> code was
> >>>>> generated */
> >>>>>      uint16_t size;      /* size of target code for this block  
> >>>>> (1 <=
> >>>>>                             size <= TARGET_PAGE_SIZE) */
> >>>>> Index: qemu-0.9.0.cvs/cpu-all.h
> >>>>> ================================================================== 
> >>>>> =
> >>>>> --- qemu-0.9.0.cvs.orig/cpu-all.h
> >>>>> +++ qemu-0.9.0.cvs/cpu-all.h
> >>>>> @@ -715,6 +715,7 @@ extern int code_copy_enabled;
> >>>>>  #define CPU_INTERRUPT_HALT   0x20 /* CPU halt wanted */
> >>>>>  #define CPU_INTERRUPT_SMI    0x40 /* (x86 only) SMI interrupt  
> >>>>> pending
> >>>>> */
> >>>>>  #define CPU_INTERRUPT_DEBUG  0x80 /* Debug event occured.  */
> >>>>> +#define CPU_INTERRUPT_VIRQ   0x100 /* virtual interrupt  
> >>>>> pending.  */
> >>>>>
> >>>>>  void cpu_interrupt(CPUState *s, int mask);
> >>>>>  void cpu_reset_interrupt(CPUState *env, int mask);
> >>>>>
> >>>>> Those two patches look ugly to me as target specific features  
> >>>>> should
> >>>>> never go in generic code or structures.
> >>>>> The CPU_INTERRUPT flags should just contain information about the
> >>>>> emulator behavior, thus CPU_INTERRUPT_TIMER, CPU_INTERRUPT_SMI  
> >>>>> are to be
> >>>>> removed. Target specific informations about the exception  
> >>>>> nature should
> >>>>> go in the CPUState structure... Then, adding a  
> >>>>> CPU_INTERRUPT_VIRQ seems
> >>>>> not a good idea at all: it's outside of the generic emulator  
> >>>>> scope and
> >>>>> pointless for most targets.
> >>>>>
> >>>>>
> >>>> I don't see any practical reason not to do it this way. We could  
> >>>> define
> >>>> a CPU_INTERRUPT_TARGET interrupt, that stores the information in  
> >>>> a the
> >>>> target specific CPUState, but this would hit performance  
> >>>> (marginally
> >>>> though) and does not improve the situation. I don't think that  
> >>>> we should
> >>>> should forcefully try to seperate targets where we are not even  
> >>>> close to
> >>>> running out of constants. If everyone on this list sees the  
> >>>> situation as
> >>>> Jocelyn does, I would be fine with writing a patch that puts target
> >>>> specific interrupts to the specific targets.
> >>>>
> >>>
> >>> I was to do the same to add some functionality to the PowerPC  
> >>> target,
> >>> long time ago, and Fabrice Bellard convinced me not to do and  
> >>> agreed it
> >>> has been a bad idea to add the target specific CPU_INTERRUPT_xxx  
> >>> flags.
> >>> Then I did manage the PowerPC target to use only generic
> >>> CPU_INTERRUPT_xxx and put all other target specific informations  
> >>> in the
> >>> CPUState structure. It absolutelly changes nothing in terms of
> >>> functionality nor performance. The only thing is that it makes the
> >>> generic parts simpler and could even allow the cpu_exec function to
> >>> contain almost no target specific code, which would really great  
> >>> imho.
> >>>
> >>
> >> I can give that a try :-). Sounds reasonable for me.
> >
> >> [Next message]
> >> Oh well I just thought about this a bit more and while stumbling  
> >> across
> >> CPU_INTERRUPT_FIQ which does the same thing one major problem came  
> >> to me
> >> on that one: Priority. Real interrupts have priority over virtual
> >> interrupts so the VIRQs have to be processed after HARD IRQs, whereas
> >> SMIs and NMIs have to be taken care of before the HARD IRQs. It  
> >> simply
> >> wouldn't work out to generalize the IRQs, just as it would not  
> >> work with
> >> the VIRQ, as it has to be handled as a real IRQ but without accessing
> >> the APIC which has to be done for HARD IRQs.
> >>
> >> I guess we're stuck with what's there now.
> >
> > Priorities are not an issue, you can take a look to the  
> > ppc_hw_interrupt function
> > to see how the PowerPC emulation code manages priorities and  
> > masking between
> > 10 different hardware interruption sources. The code could be  
> > better (I wanted
> > to preserve existing code) but it's a solution that actually works...
> > But looking better the x86 emulation code, I agree that if you  
> > don't want to change it
> > too much, you have to add a flag. The other solution (and best,  
> > imho) would be
> > to do the same as what has been done for PowerPC: just let the  
> > generic code in
> > the cpu_exec loop and move the x86 specific code somewhere in  
> > target-i386 code.
> > Going this way, I see no reason why the interruption code in the  
> > cpu_exec loop
> > could not finally become:
> >                 interrupt_request = env->interrupt_request;
> >                 if (__builtin_expect(interrupt_request, 0)) {
> >                     if (interrupt_request & CPU_INTERRUPT_DEBUG) {
> >                         env->interrupt_request &=  
> > ~CPU_INTERRUPT_DEBUG;
> >                         env->exception_index = EXCP_DEBUG;
> >                         cpu_loop_exit();
> >                     }
> >                     if (interrupt_request & CPU_INTERRUPT_HALT) {
> >                         env->interrupt_request &= ~CPU_INTERRUPT_HALT;
> >                         env->halted = 1;
> >                         env->exception_index = EXCP_HLT;
> >                         cpu_loop_exit();
> >                     }
> >                     if (interrupt_request & CPU_INTERRUPT_HARD) {
> >                         hw_interrupt(env);
> >                         if (env->pending_interrupts == 0)
> >                             env->interrupt_request &=  
> > ~CPU_INTERRUPT_HARD;
> > #if defined(__sparc__) && !defined(HOST_SOLARIS)
> >                         tmp_T0 = 0;
> > #else
> >                         T0 = 0;
> > #endif
> >                     }
> >                    /* Don't use the cached interupt_request value,
> >                       do_interrupt may have updated the EXITTB  
> > flag. */
> >                     if (interrupt_request & CPU_INTERRUPT_EXITTB) {
> >                         env->interrupt_request &=  
> > ~CPU_INTERRUPT_EXITTB;
> >                         /* ensure that no TB jump will be modified as
> >                            the program flow was changed */
> > #if defined(__sparc__) && !defined(HOST_SOLARIS)
> >                         tmp_T0 = 0;
> > #else
> >                         T0 = 0;
> > #endif
> >                     }
> >                     if (interrupt_request & CPU_INTERRUPT_EXIT) {
> >                         env->interrupt_request &= ~CPU_INTERRUPT_EXIT;
> >                         env->exception_index = EXCP_INTERRUPT;
> >                         cpu_loop_exit();
> >                     }
> >                 }
> > All the targets specific tricks could be done in the hw_interrupt  
> > routine. And the generic code
> > would become much more readable. But this needs some works (not so  
> > much) and intensive tests...
> > And I guess nobody feels like taking this risk right now ;-)
> > But I think this will have to be done someday...
> 
> I definitely agree with you on that one. The current interrupt  
> handlers are filled with #ifdefs and not easily readable. This is a  
> different task though and maybe I will get the time to do this. I  
> believe it should be done "all archs at once" though, so either we  
> use the current method (adding non-gerneric interrupts) for now and  
> tidy up everything later on or tidy up the code now and port the svm  
> patch on that. As I don't see the cleanup coming in the near future  
> I'd vote for including it now and have it reworked at once when  
> someone feels dare enough to clean up the whole generic interrupt  
> handler.
> 
> Nevertheless I really do like the idea of having a cleaned up  
> interrupt handler. This is a different patch though and is rather a  
> TODO than anything that should be mangled with the svm patch.
> 
> >
> >>>
> >>>>> For the same reason, the intercept field in the TB structure  
> >>>>> seems not
> >>>>> acceptable, as TB specific target informations are already to  
> >>>>> be stored
> >>>>> in the flags field. As intercept seems only to be a bit field,  
> >>>>> it should
> >>>>> go, in a way or another, in tb flags. And as it seems that some
> >>>>>
> >>>>>
> >>>> TB flags is a 32-bit bitfield, where several bits are already used.
> >>>> Currently SVM supports 45 intercepts and each combination can  
> >>>> generate a
> >>>> different TB, as we do not want to reduce performance for the
> >>>> non-intercepted case. Think of the intercept field as flags2,  
> >>>> that could
> >>>> be used by other virtualization implementations on other  
> >>>> architectures
> >>>> too (though I do not know of any)
> >>>>
> >>>
> >>> So, why not make it generic and extend the flag field to as many  
> >>> bits as
> >>> you need ? Intercept is x86 specific and has no meaning outside  
> >>> of this
> >>> scope. Having more bits for flags is generic and may be useful  
> >>> for some
> >>> other targets... The easy way is to convert flag to 64 bits, if  
> >>> you have
> >>> enough bits. Another way is to make it a table, but this may need
> >>> changes in all targets code...
> >>>
> >>
> >> I think it might be quite a bad idea to extend flags until everything
> >> fits in there. I'm planning on implementing VMX as well and there are
> >> intercepts too so the meaning of the different fields change and I  
> >> would
> >> not want that happening with the normal flags vector. What would you
> >> think of an #ifdef TARGET_I386? This way intercepts are x86 only and
> >> don't interfere with any other target architecture, but are still
> >> seperated from the flags (which is semantically correct as far as it
> >> goes for me).
> >
> > I don't consider as a problem that one bit in the TB flags field can
> > have different meanings.
> > This field is something opaque from the generic code. I actually have
> > the same thing in
> > the PowerPC emulation (even if it's not fully done for now): TB flags
> > are just an
> > selection of the Machine State Register bits. Some of those bits can
> > have different
> > meanings depending on the PowerPC implementation (ie CPU model).  
> > But the
> > only thing important is to know if the current CPUState flags are  
> > equal
> > to the
> > TB flags, not the actual meaning of the bits. As the MSR is 32 bits on
> > 32 bits
> > PowerPC implementations, I have no hesitation to use those bits  
> > directly
> > even if their actual meaning, from the CPU "point of view" changes.
> > Then, imho, you should not hesitate to put those intercept bits in a
> > generic field...
> 
> 
> Ok, I will try to shift the intercepts in an uint_64 flags variable  
> in the TB.
> 
> >
> >>>
> >>>>> interceptions are related with functions implemented in helpers  
> >>>>> (not
> >>>>> micro-ops), you'd better check the interception in the helper at
> >>>>> runtime, which would add no visible overhead (as calling a  
> >>>>> helper is
> >>>>> slow compared to direct micro-ops execution), then you would  
> >>>>> not need to
> >>>>> store those infos in the TB structure. This may even make the  
> >>>>> emulation
> >>>>>
> >>>>>
> >>>> This was the case in an earlier version of the patch. The current
> >>>> implementation you see is an optimization to make it running  
> >>>> faster, as
> >>>> there is (nearly) no need for runtime detection.
> >>>>
> >>>
> >>> For all interceptions related to operations in helpers, it's not a
> >>> actual optimisation. Things done in helpers are slow and not  
> >>> widely used
> >>> (compared to the translated code, I mean), then adding a single  
> >>> test for
> >>> interception in a helper may not hurt global performances. You  
> >>> may need
> >>> TB flags only for operations you want to intercept that would be  
> >>> inlined
> >>> in the translated code, then maybe reduce the number of bits needed
> >>> without decreasing performanced. But, not knowing a lot about x86 VM
> >>> specification I may be wrong...
> >>>
> >>>
> >>
> >> I feel really uncomfortable with supporting hundreds of different  
> >> ways
> >> of interception. It took me days to take all the intercept checks  
> >> out of
> >> the helpers into the translate.c, as it is way easier to maintain and
> >> read that way. I do not like the idea of converting the intercept  
> >> vector
> >> to something internal. It's quite nice to be able to use the very  
> >> same
> >> constants as kvm does. This makes debugging really easy and you don't
> >> have to define everything twice. Additionally this way you can  
> >> just set
> >> a variable without calculating anything, which does come in handy  
> >> as well.
> >>
> >> Basically the reason I really want to have everyting in  
> >> translate.c is
> >> simplicity though.
> >
> > OK, I understand that using the same code for all interceptions can  
> > make
> > your coding
> > far easier...
> >
> >>>>> run faster has you won't fill the TB cache with multiple  
> >>>>> translation of
> >>>>> the same code each time the env->intercept changes, thus have  
> >>>>> chance to
> >>>>> avoid many TB caches flushes.
> >>>>>
> >>>>>
> >>>> This is actually intended, as I believe the emulated machine  
> >>>> should run
> >>>> as fast as before when not using any interceptions
> >>>>
> >>>
> >>> It depends if those flags are to be changed often or not, which  
> >>> depends
> >>> on the hypervisor software in use I guess. If the flags never  
> >>> change,
> >>> there'll be no TB flush, I agree...
> >>>
> >>
> >> Usually (at least that's the case with kvm) you have two different  
> >> sets
> >> of intercept flags. One for the real machine and one for the  
> >> virtualized
> >> one. There is only one minor situation where that might change but
> >> that's not worth mentioning here. Nevertheless TBs point to physical
> >> memory  (afaik) so since the virtual kernel is on a different page  
> >> than
> >> the real one, we could not reuse TBs anyway so having the intercept
> >> information in the TB only helps in finding out which intercepts to
> >> check for as only a minority of the possible intercepts actually  
> >> get used.
> >
> > So as far as the hypervisor do not provide shared routines to be  
> > used by
> > the OS,
> > you should never have any "collisison" between translated blocks in
> > different environments ?
> > I guess you need a complete TB flush when switching between the
> > virtualized environments.
> > Then, OK, the interception flags should never lead to fill the TB  
> > cache
> > with multiple translations
> > of the same blocks...
> 
> Exactly. The only possible "collision" would be two virtual machines  
> that share code on the same physical page somehow.
> 
> Cheers,
> 
> Alex
-- 
J. Mayer <l_indien@magic.fr>
Never organized