From mboxrd@z Thu Jan  1 00:00:00 1970
From: Zachary Amsden <zach@vmware.com>
Subject: Re: hardwired VMI crap
Date: Thu, 08 Mar 2007 12:46:20 -0800
Message-ID: <45F0761C.6060107@vmware.com>
References: <45EE1EA3.90803@vmware.com>	
	<1173256666.24738.576.camel@localhost.localdomain>	
	<45EEF966.6060902@goop.org>
	<Line.LNX.4.64.0703071304080.1851@d.namei>	
	<45EF0CF5.5090305@goop.org> <45EF175D.6030609@vmware.com>	
	<1173302503.24738.795.camel@localhost.localdomain>	
	<45EF372E.7030600@goop.org>	
	<1173308717.24738.898.camel@localhost.localdomain>	
	<45EF49E9.7040509@vmware.com> <20070308091019.GA19460@elte.hu>	
	<45EFE010.7080108@vmware.com>
	<1173352154.24738.1023.camel@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Return-path: <virtualization-bounces@lists.osdl.org>
In-Reply-To: <1173352154.24738.1023.camel@localhost.localdomain>
List-Unsubscribe: <https://lists.osdl.org/mailman/listinfo/virtualization>,
	<mailto:virtualization-request@lists.osdl.org?subject=unsubscribe>
List-Archive: <http://lists.osdl.org/pipermail/virtualization>
List-Post: <mailto:virtualization@lists.osdl.org>
List-Help: <mailto:virtualization-request@lists.osdl.org?subject=help>
List-Subscribe: <https://lists.osdl.org/mailman/listinfo/virtualization>,
	<mailto:virtualization-request@lists.osdl.org?subject=subscribe>
Sender: virtualization-bounces@lists.osdl.org
Errors-To: virtualization-bounces@lists.osdl.org
To: tglx@linutronix.de
Cc: john stultz <johnstul@us.ibm.com>, LKML <linux-kernel@vger.kernel.org>, Chris Wright <chrisw@sous-sol.org>, Virtualization Mailing List <virtualization@lists.osdl.org>, Ingo Molnar <mingo@elte.hu>, Linus Torvalds <torvalds@linux-foundation.org>, akpm@linux-foundation.org
List-Id: virtualization@lists.linuxfoundation.org

Thomas Gleixner wrote:
> On Thu, 2007-03-08 at 02:06 -0800, Zachary Amsden wrote:
>   =

>>>> The correct solution here is to properly separate the APIC, SMP, and =

>>>> timer code so the logic of it which we want to reuse is separated from =

>>>> the hardware dependence.  Clock events and clocksources take care of =

>>>> most of the timer issues, but there is still ugliness from SMP timer =

>>>> events depending on having part of the APIC infrastructure for wiring =

>>>> the interrupt gates.
>>>>     =

>>>>         =

>>> what are you talking about? A clockevents driver does not need to know =

>>> about lapic details, at all. In terms of interrupt gates for the =

>>> hypervisor to notify about clock events - use a virtual interrupt =

>>> controller via genirq.
>>>   =

>>>       =

>> See my last e-mail.  It is not possible on i386, since local per-cpu =

>> interrupts are only supported via the APIC.
>>     =

>
> It is not possible from your POV. It is possible, as we have already a
> complete irq abstraction layer, which supports _ALL_ of the
> requirements.
>
> To make use of it in a maintainable way, it just needs the work of doing
> a proper client for the genirq layer, which get's its interrupt injected
> by the hypervisor.
>
> genirq() does not care by which mechanism handle_percpu_irq() is called.
>
> We provided the abstractions and you just tell us straight in the face,
> that your hypervisor works that way and therefor we have to accept that
> you do it that way.
>
> It's not rocket science to implement an abstract interrupt controller,
> which lets you inject per cpu or global interrupts into the generic
> layer. It needs some preparatory work to distangle the boot code
> assumptions from the implicit hardware, but this is a better spent time,
> than another set of hackery, which you already advertised for smpboot.c
>   =


When we're about two weeks away from a product release and you are =

threatening to unmerge or block our code because we didn't create an =

abstract interrupt controller, we re-used the APIC and IO-APIC, this is =

uber rocket science.  We've been doing things this way, with public =

patches for over a year, and you've even been CC'd on some of the =

discussions.  So it is a little late to tell us - "redesign your =

hypervisor, or else.."

> All we want you and the other hypervisor folks to do is to =

>
> - use existing abstractions in the way they are designed
> - create new ones where applicable
>   =


Great.
> - break the hardwired hardware assumptions, so a sane emulation model
> can be used.
>   =


Why?  This is your own invention, as you think it would make life =

easier.  It doesn't - you still have real hardware to deal with, and =

your code will always be designed to operate on silicon with these =

hardwired assumptions.  Breaking away from that can actually make the =

code more complex, both in the hypervisor and in Linux.

>   =

>> So far, all you have done is not complain about our code until it was =

>> merged, the pursue every tactic possible to break it.  It is not us that =

>> are stonewalling.
>>     =

>
> You have been told before. Andi asked you more than once to move to
> clockevents.
>   =


Which we have done.  And now you refuse to give any feedback on =

technical points, but maintain an objection to the way we have done it.

> If you can not change your hypervisor model to use a sane abstraction of
> interrupts, then please emulate lapic, io_apic and everything else
> _OUTSIDE_ of the kernel.
>   =


We faithfully emulate lapic, io_apic, the pit, pic, and a normal =

interrupt subsystem.  We can't magically stop using these things because =

we have to support traditional full virtualization.  Which means any =

version of Linux, virtual interrupt controller or not, is going to boot =

up, find these things, and try to use them.  So for a paravirt kernel, =

either we have to disable each of these things in the Linux code or just =

re-use them.

So we re-use them.  We don't even change their semantics.  Where we get =

into trouble is the fact that only the lapic can deliver per-cpu timer =

IRQs, and we need to provide a better time abstraction than TSC.  So we =

need a time device, but there is no way to implement it in the =

traditional hardware model.

And I ask again for your feedback on which approach you think is correct:

1) Rewrite the interrupt subsystem of our hypervisor, making it =

incompatible with full virtualization, so that we can support an =

abstract interrupt controller with a "clean" interface
2) Reuse the same method that HPET, PIT and other time clients in i386 =

use - the global_clock_event pointer which allows you to wrest control =

back from the APIC and reuse the lapic_events local clockevents.
3) Create a new low level interrupt handler for the per-cpu VMI timer =

IRQs instead of re-using the APIC handler
4) Use the irq APIs to allocate IRQ-0 as a percpu IRQ, then change the =

IO-APIC code so it can know not to convert this PIC IRQ into a IO-APIC =

edge IRQ.
5) Disable the io-apic code entirely in paravirt mode.  Rather than =

change it, merge a parallel copy of it into the VMI code
 so that we can use the 99% of the code we need, with the one bugfix for =

#4 above
6) Disable the apic code entirely in paravirt mode.  Rather than change =

it, merge a parallel copy of into the VMI code so that we can use the =

90% of the code we need, with changes to the LVT0 timer handling.
7) For SMP only, allocate a non-shared IO-APIC IRQ, then after the =

IO-APIC is initialized, magically switch this to a percpu handler and =

start delivering local timer interrupts via this IRQ.
8) Create a pie-in-the-sky single interrupt source, reserve an IDT =

vector for it (or steal the lapic timer slot), and use the irq apis to =

set it up to be handled as a per-cpu interrupt.  This actually sounds =

pretty good, to me.  The only problem is we will need to switch the =

timer IRQ from IRQ 0 to this vector when the APIC is initialized, but I =

think we already have all the machinery we need to handle that.
9) ???

This is a serious question, I would appreciate a serious response =

instead of snide comments about the crappiness of our interface and our =

code.  Which do help a little, because by process of elimination,  we =

can rule out the approaches you don't like.  But it would be more =

productive if we could carry on a traditional dialogue and I could just =

ask a question and you could answer and vice versa.

Zach