From: Zachary Amsden <zach@vmware.com>
To: tglx@linutronix.de
Cc: john stultz <johnstul@us.ibm.com>,
LKML <linux-kernel@vger.kernel.org>,
Chris Wright <chrisw@sous-sol.org>,
Virtualization Mailing List <virtualization@lists.osdl.org>,
Ingo Molnar <mingo@elte.hu>,
Linus Torvalds <torvalds@linux-foundation.org>,
akpm@linux-foundation.org
Subject: Re: hardwired VMI crap
Date: Thu, 08 Mar 2007 12:46:20 -0800 [thread overview]
Message-ID: <45F0761C.6060107@vmware.com> (raw)
In-Reply-To: <1173352154.24738.1023.camel@localhost.localdomain>
Thomas Gleixner wrote:
> On Thu, 2007-03-08 at 02:06 -0800, Zachary Amsden wrote:
>
>>>> The correct solution here is to properly separate the APIC, SMP, and
>>>> timer code so the logic of it which we want to reuse is separated from
>>>> the hardware dependence. Clock events and clocksources take care of
>>>> most of the timer issues, but there is still ugliness from SMP timer
>>>> events depending on having part of the APIC infrastructure for wiring
>>>> the interrupt gates.
>>>>
>>>>
>>> what are you talking about? A clockevents driver does not need to know
>>> about lapic details, at all. In terms of interrupt gates for the
>>> hypervisor to notify about clock events - use a virtual interrupt
>>> controller via genirq.
>>>
>>>
>> See my last e-mail. It is not possible on i386, since local per-cpu
>> interrupts are only supported via the APIC.
>>
>
> It is not possible from your POV. It is possible, as we have already a
> complete irq abstraction layer, which supports _ALL_ of the
> requirements.
>
> To make use of it in a maintainable way, it just needs the work of doing
> a proper client for the genirq layer, which get's its interrupt injected
> by the hypervisor.
>
> genirq() does not care by which mechanism handle_percpu_irq() is called.
>
> We provided the abstractions and you just tell us straight in the face,
> that your hypervisor works that way and therefor we have to accept that
> you do it that way.
>
> It's not rocket science to implement an abstract interrupt controller,
> which lets you inject per cpu or global interrupts into the generic
> layer. It needs some preparatory work to distangle the boot code
> assumptions from the implicit hardware, but this is a better spent time,
> than another set of hackery, which you already advertised for smpboot.c
>
When we're about two weeks away from a product release and you are
threatening to unmerge or block our code because we didn't create an
abstract interrupt controller, we re-used the APIC and IO-APIC, this is
uber rocket science. We've been doing things this way, with public
patches for over a year, and you've even been CC'd on some of the
discussions. So it is a little late to tell us - "redesign your
hypervisor, or else.."
> All we want you and the other hypervisor folks to do is to
>
> - use existing abstractions in the way they are designed
> - create new ones where applicable
>
Great.
> - break the hardwired hardware assumptions, so a sane emulation model
> can be used.
>
Why? This is your own invention, as you think it would make life
easier. It doesn't - you still have real hardware to deal with, and
your code will always be designed to operate on silicon with these
hardwired assumptions. Breaking away from that can actually make the
code more complex, both in the hypervisor and in Linux.
>
>> So far, all you have done is not complain about our code until it was
>> merged, the pursue every tactic possible to break it. It is not us that
>> are stonewalling.
>>
>
> You have been told before. Andi asked you more than once to move to
> clockevents.
>
Which we have done. And now you refuse to give any feedback on
technical points, but maintain an objection to the way we have done it.
> If you can not change your hypervisor model to use a sane abstraction of
> interrupts, then please emulate lapic, io_apic and everything else
> _OUTSIDE_ of the kernel.
>
We faithfully emulate lapic, io_apic, the pit, pic, and a normal
interrupt subsystem. We can't magically stop using these things because
we have to support traditional full virtualization. Which means any
version of Linux, virtual interrupt controller or not, is going to boot
up, find these things, and try to use them. So for a paravirt kernel,
either we have to disable each of these things in the Linux code or just
re-use them.
So we re-use them. We don't even change their semantics. Where we get
into trouble is the fact that only the lapic can deliver per-cpu timer
IRQs, and we need to provide a better time abstraction than TSC. So we
need a time device, but there is no way to implement it in the
traditional hardware model.
And I ask again for your feedback on which approach you think is correct:
1) Rewrite the interrupt subsystem of our hypervisor, making it
incompatible with full virtualization, so that we can support an
abstract interrupt controller with a "clean" interface
2) Reuse the same method that HPET, PIT and other time clients in i386
use - the global_clock_event pointer which allows you to wrest control
back from the APIC and reuse the lapic_events local clockevents.
3) Create a new low level interrupt handler for the per-cpu VMI timer
IRQs instead of re-using the APIC handler
4) Use the irq APIs to allocate IRQ-0 as a percpu IRQ, then change the
IO-APIC code so it can know not to convert this PIC IRQ into a IO-APIC
edge IRQ.
5) Disable the io-apic code entirely in paravirt mode. Rather than
change it, merge a parallel copy of it into the VMI code
so that we can use the 99% of the code we need, with the one bugfix for
#4 above
6) Disable the apic code entirely in paravirt mode. Rather than change
it, merge a parallel copy of into the VMI code so that we can use the
90% of the code we need, with changes to the LVT0 timer handling.
7) For SMP only, allocate a non-shared IO-APIC IRQ, then after the
IO-APIC is initialized, magically switch this to a percpu handler and
start delivering local timer interrupts via this IRQ.
8) Create a pie-in-the-sky single interrupt source, reserve an IDT
vector for it (or steal the lapic timer slot), and use the irq apis to
set it up to be handled as a per-cpu interrupt. This actually sounds
pretty good, to me. The only problem is we will need to switch the
timer IRQ from IRQ 0 to this vector when the APIC is initialized, but I
think we already have all the machinery we need to handle that.
9) ???
This is a serious question, I would appreciate a serious response
instead of snide comments about the crappiness of our interface and our
code. Which do help a little, because by process of elimination, we
can rule out the approaches you don't like. But it would be more
productive if we could carry on a traditional dialogue and I could just
ask a question and you could answer and vice versa.
Zach
next prev parent reply other threads:[~2007-03-08 20:46 UTC|newest]
Thread overview: 105+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <200703060654.l266sVxr014860@shell0.pdx.osdl.net>
[not found] ` <45ED16D2.3000202@vmware.com>
[not found] ` <20070306084258.GA15745@elte.hu>
[not found] ` <20070306084647.GA16280@elte.hu>
2007-03-06 8:55 ` + stupid-hack-to-make-mainline-build.patch added to -mm tree Zachary Amsden
2007-03-06 10:59 ` Thomas Gleixner
2007-03-06 21:07 ` Dan Hecht
2007-03-06 22:21 ` Andi Kleen
2007-03-06 21:32 ` Dan Hecht
2007-03-06 23:53 ` Thomas Gleixner
2007-03-07 0:24 ` Jeremy Fitzhardinge
2007-03-07 0:35 ` Dan Hecht
2007-03-07 0:49 ` Thomas Gleixner
2007-03-07 0:53 ` Dan Hecht
2007-03-07 1:18 ` Thomas Gleixner
2007-03-07 2:08 ` Dan Hecht
2007-03-07 8:37 ` Thomas Gleixner
2007-03-07 17:41 ` Jeremy Fitzhardinge
2007-03-07 17:49 ` Ingo Molnar
2007-03-07 18:03 ` James Morris
2007-03-07 18:35 ` Jeremy Fitzhardinge
2007-03-08 0:45 ` Alan Cox
2007-03-07 17:52 ` Ingo Molnar
2007-03-07 18:28 ` Jeremy Fitzhardinge
2007-03-07 18:53 ` Thomas Gleixner
2007-03-07 18:11 ` James Morris
2007-03-07 18:56 ` Thomas Gleixner
2007-03-07 19:05 ` Jeremy Fitzhardinge
2007-03-07 19:49 ` Dan Hecht
2007-03-07 20:11 ` Jeremy Fitzhardinge
2007-03-07 20:49 ` Dan Hecht
2007-03-07 21:14 ` Thomas Gleixner
2007-03-07 20:57 ` Thomas Gleixner
2007-03-07 21:02 ` Dan Hecht
2007-03-07 21:08 ` Jeremy Fitzhardinge
2007-03-07 21:19 ` Thomas Gleixner
2007-03-07 21:14 ` Dan Hecht
2007-03-07 21:21 ` Thomas Gleixner
2007-03-07 21:33 ` Dan Hecht
2007-03-07 22:05 ` Jeremy Fitzhardinge
2007-03-07 23:05 ` Thomas Gleixner
2007-03-07 23:25 ` Zachary Amsden
2007-03-07 23:36 ` Jeremy Fitzhardinge
2007-03-07 23:40 ` Zachary Amsden
2007-03-08 18:30 ` Chris Wright
2007-03-08 0:22 ` Thomas Gleixner
2007-03-08 1:01 ` Daniel Arai
2007-03-08 1:23 ` Jeremy Fitzhardinge
2007-03-08 7:02 ` Thomas Gleixner
2007-03-08 7:28 ` Thomas Gleixner
2007-03-08 8:01 ` Zachary Amsden
2007-03-08 18:24 ` Chris Wright
2007-03-08 18:44 ` Daniel Arai
2007-03-08 19:14 ` Chris Wright
2007-03-08 19:17 ` Ingo Molnar
2007-03-08 19:42 ` Jeremy Fitzhardinge
2007-03-08 19:47 ` Chris Wright
2007-03-08 19:52 ` Jeremy Fitzhardinge
2007-03-08 20:10 ` Chris Wright
2007-03-08 20:18 ` Jeremy Fitzhardinge
2007-03-08 20:23 ` Chris Wright
2007-03-08 20:33 ` Jeremy Fitzhardinge
2007-03-08 20:42 ` Chris Wright
2007-03-08 20:42 ` Jeremy Fitzhardinge
2007-03-08 21:45 ` Andi Kleen
2007-03-08 19:54 ` Ingo Molnar
[not found] ` <20070308091019.GA19460@elte.hu>
2007-03-08 10:06 ` hardwired VMI crap Zachary Amsden
2007-03-08 11:09 ` Thomas Gleixner
2007-03-08 20:46 ` Zachary Amsden [this message]
2007-03-08 21:15 ` Jeremy Fitzhardinge
2007-03-08 21:34 ` Ingo Molnar
2007-03-08 21:43 ` Andi Kleen
2007-03-08 23:39 ` Jeremy Fitzhardinge
2007-03-08 23:55 ` Zachary Amsden
2007-03-09 0:10 ` Jeremy Fitzhardinge
2007-03-09 0:29 ` Linus Torvalds
2007-03-09 0:22 ` Daniel Walker
2007-03-09 0:28 ` Thomas Gleixner
2007-03-09 0:04 ` Thomas Gleixner
2007-03-09 0:44 ` Jeremy Fitzhardinge
2007-03-08 22:31 ` Zachary Amsden
2007-03-08 21:39 ` Andi Kleen
2007-03-08 22:58 ` Zachary Amsden
2007-03-08 18:35 ` Chris Wright
2007-03-07 23:33 ` + stupid-hack-to-make-mainline-build.patch added to -mm tree Jeremy Fitzhardinge
2007-03-07 23:52 ` Dan Hecht
2007-03-08 0:19 ` Jeremy Fitzhardinge
2007-03-08 0:35 ` Thomas Gleixner
2007-03-08 0:38 ` Jeremy Fitzhardinge
2007-03-07 20:40 ` Thomas Gleixner
2007-03-07 21:07 ` Jeremy Fitzhardinge
2007-03-07 21:40 ` Thomas Gleixner
2007-03-07 21:34 ` Dan Hecht
2007-03-07 22:14 ` Thomas Gleixner
2007-03-07 22:17 ` Zachary Amsden
2007-03-07 22:31 ` Thomas Gleixner
2007-03-07 22:28 ` Dan Hecht
2007-03-08 8:01 ` Ingo Molnar
2007-03-08 8:15 ` Keir Fraser
2007-03-08 8:41 ` Jeremy Fitzhardinge
2007-03-08 10:26 ` Rusty Russell
2007-03-07 21:42 ` Dan Hecht
2007-03-07 22:07 ` Thomas Gleixner
2007-03-07 5:10 ` Jeremy Fitzhardinge
2007-03-07 0:40 ` Thomas Gleixner
2007-03-07 0:42 ` Dan Hecht
2007-03-07 1:22 ` Thomas Gleixner
2007-03-07 1:44 ` Dan Hecht
2007-03-07 7:48 ` Thomas Gleixner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=45F0761C.6060107@vmware.com \
--to=zach@vmware.com \
--cc=akpm@linux-foundation.org \
--cc=chrisw@sous-sol.org \
--cc=johnstul@us.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=virtualization@lists.osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).