From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dan Hecht
Subject: Re: + stupid-hack-to-make-mainline-build.patch added to -mm tree
Date: Wed, 07 Mar 2007 13:33:41 -0800
Message-ID: <45EF2FB5.1080604@vmware.com>
References: <200703060654.l266sVxr014860@shell0.pdx.osdl.net>
 <45ED16D2.3000202@vmware.com>
 <20070306084258.GA15745@elte.hu>
 <20070306084647.GA16280@elte.hu>
 <45ED2C82.3080008@vmware.com>
 <1173178774.24738.311.camel@localhost.localdomain>
 <45EDD82F.90204@vmware.com>
 <1173225182.24738.507.camel@localhost.localdomain>
 <45EE0628.1080108@goop.org>
 <45EE08E8.2020008@vmware.com>
 <1173228544.24738.514.camel@localhost.localdomain>
 <45EE0D10.7070807@vmware.com>
 <1173230305.24738.529.camel@localhost.localdomain>
 <45EE1EA3.90803@vmware.com>
 <1173256666.24738.576.camel@localhost.localdomain>
 <45EEF966.6060902@goop.org>
 <45EF0CF5.5090305@goop.org>
 <45EF175D.6030609@vmware.com>
 <1173302503.24738.795.camel@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <1173302503.24738.795.camel@localhost.localdomain>
Sender: linux-kernel-owner@vger.kernel.org
To: tglx@linutronix.de
Cc: Jeremy Fitzhardinge, James Morris, Virtualization Mailing List,
 akpm@linux-foundation.org, john stultz, Ingo Molnar, LKML
List-Id: virtualization@lists.linuxfoundation.org

On 03/07/2007 01:21 PM, Thomas Gleixner wrote:
> On Wed, 2007-03-07 at 11:49 -0800, Dan Hecht wrote:
>> Jeremy, I saw you sent out the Xen version earlier, thanks.  Here's ours
>> for reference (please excuse any formatting issues); it's also lean.
>> We'll send out a proper patch later after some more testing:
>
> Ah. Bitching loud enough speeds things up. :)
>

We've always planned to do this.  We just didn't want to create the
dependency between paravirt_ops and clockevents too early such that they
would depend on each other to merge to main line.  Now that they are both
there, we are all for it.

>> /** vmi clockevent */
>>
>> static struct clock_event_device vmi_global_clockevent;
>>
>> static inline u32 vmi_alarm_wiring(struct clock_event_device *evt)
>> {
>>     return (evt == &vmi_global_clockevent) ?
>>         VMI_ALARM_WIRED_IRQ0 : VMI_ALARM_WIRED_LVTT;
>> }
>>
>> static void vmi_timer_set_mode(enum clock_event_mode mode,
>>                                struct clock_event_device *evt)
>> {
>>     u32 wiring;
>>     cycle_t now, cycles_per_hz;
>>     BUG_ON(!irqs_disabled());
>>
>>     wiring = vmi_alarm_wiring(evt);
>>     if (wiring == VMI_ALARM_WIRED_LVTT)
>>         /* Route the interrupt to the correct vector */
>>         apic_write_around(APIC_LVTT, LOCAL_TIMER_VECTOR);
>
> Wire that in the hypervisor.
>
>>     switch (mode) {
>>     case CLOCK_EVT_MODE_ONESHOT:
>>         break;
>>     case CLOCK_EVT_MODE_PERIODIC:
>>         cycles_per_hz = vmi_timer_ops.get_cycle_frequency();
>>         (void)do_div(cycles_per_hz, HZ);
>>         now = vmi_timer_ops.get_cycle_counter(vmi_counter(VMI_PERIODIC));
>>         vmi_timer_ops.set_alarm(wiring | VMI_PERIODIC,
>>                                 now, cycles_per_hz);
>
> paravirt_ops->paravirt_clockevent->set_periodic(vcpu, period);
>

Huh?  paravirt_ops isn't a hypervisor interface, it's just a Linux code
abstraction.  The code on both sides of paravirt_ops is *Linux* code, any
way you cut it.  clockevents is already a Linux code abstraction.  Why
introduce the redundancy?

>>         break;
>>     case CLOCK_EVT_MODE_UNUSED:
>>     case CLOCK_EVT_MODE_SHUTDOWN:
>
> paravirt_ops->paravirt_clockevent->stop_event(vcpu, mode);
>

You would be introducing the same redundancy.

>
>>         switch (evt->mode) {
>>         case CLOCK_EVT_MODE_ONESHOT:
>>             vmi_timer_ops.cancel_alarm(VMI_ONESHOT);
>>             break;
>>         case CLOCK_EVT_MODE_PERIODIC:
>>             vmi_timer_ops.cancel_alarm(VMI_PERIODIC);
>>             break;
>>         default:
>>             break;
>>         }
>>         break;
>>     default:
>>         break;
>>     }
>> }
>
> This whole vmi_timer_ops thing is horrible. All hypervisors can share
> paravirt_ops->paravirt_clockevent and retrieve the methods on boot.
>

vmi_timer_ops.whatever is where the kernel <-> hypervisor boundary is
crossed for VMI.

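To make the layering concrete, here is a minimal user-space sketch.  It is
not the VMI code and not the kernel's clockevents API: every name prefixed
mock_ or fake_ is invented for illustration.  It only shows that a
clockevents-style device (a struct of function pointers) already gives the
generic code one indirection, and that the backend behind it is the single
place that knows what the hypervisor ABI wants (here, assumed to be an
absolute expiry).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct clock_event_device. */
enum mock_mode { MOCK_MODE_UNUSED, MOCK_MODE_ONESHOT };

struct mock_clock_event_device {
    const char *name;
    enum mock_mode mode;
    int (*set_next_event)(unsigned long delta,
                          struct mock_clock_event_device *evt);
};

/* Hypothetical hypervisor ABI: a cycle counter and an alarm register. */
static uint64_t fake_hv_counter;   /* current time, in cycles */
static uint64_t fake_hv_alarm;     /* last absolute expiry programmed */

static uint64_t fake_hv_get_cycle_counter(void)
{
    return fake_hv_counter;
}

static void fake_hv_set_alarm(uint64_t absolute_expiry)
{
    fake_hv_alarm = absolute_expiry;
}

/* Backend: converts the relative delta into the absolute expiry that
 * this (assumed) ABI expects, then crosses into the "hypervisor". */
static int fake_backend_next_event(unsigned long delta,
                                   struct mock_clock_event_device *evt)
{
    assert(evt->mode == MOCK_MODE_ONESHOT);
    fake_hv_set_alarm(fake_hv_get_cycle_counter() + delta);
    return 0;
}

static struct mock_clock_event_device fake_clockevent = {
    .name = "fake-hv",
    .mode = MOCK_MODE_ONESHOT,
    .set_next_event = fake_backend_next_event,
};

/* Generic caller: it only sees the device's function pointer and never
 * learns which hypervisor, if any, sits behind it. */
static int mock_program_event(struct mock_clock_event_device *evt,
                              unsigned long delta)
{
    return evt->set_next_event(delta, evt);
}
```

With the fake counter at 1000, mock_program_event(&fake_clockevent, 250)
leaves the fake alarm programmed for cycle 1250.  Putting another ops table
between mock_program_event() and the backend would add a second indirection
without hiding anything the first one does not already hide.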
>> static int vmi_timer_next_event(unsigned long delta,
>>                                 struct clock_event_device *evt)
>> {
>>     /* Unfortunately, set_next_event interface only passes relative
>>      * expiry, but we want absolute expiry.  It'd be better if we
>>      * were passed an absolute expiry, since a bunch of time may
>>      * have been stolen between the time the delta is computed and
>>      * when we set the alarm below. */
>>     cycle_t now = vmi_timer_ops.get_cycle_counter(vmi_counter(VMI_ONESHOT));
>>
>>     BUG_ON(evt->mode != CLOCK_EVT_MODE_ONESHOT);
>>     vmi_timer_ops.set_alarm(vmi_alarm_wiring(evt) | VMI_ONESHOT,
>>                             now + delta, 0);
>>     return 0;
>> }
>
> Great. Now we have:
>
> s64 event = startup_offset + ktime_to_ns(evt->next_event);
>
> if (HYPERVISOR_set_timer_op(event) < 0)
>     BUG();
> and
>
> vmi_timer_ops.set_alarm(vmi_alarm_wiring(evt) | VMI_ONESHOT, now + delta, 0);
>
> How will the next implementations look like ?
>
> lguest_program_timer(delta + lguest_current_time(), LGUEST_TIMER_SHOOT_ONCE);
>
> virt_nextgen_ops.set_timer_event(delta, NO_WE_NEED_NO_FLAGS);
>
> .......
>
> This is tinkering of the best. My understanding of the paravirt
> discussion at Kernel Summit was, that paravirt ops are exactly there to
> prevent the above random hackery in the kernel and to allow _ALL_
> hypervisors to interact via a sane interface inside of the kernel.
>

No, that was not the point of paravirt_ops.  It is actually the complete
opposite of the intention of paravirt_ops.  paravirt_ops' intent is
exactly to allow for *multiple* hypervisor ABIs to exist in the kernel.
At Kernel Summit, paravirt_ops was proposed to allow for multiple
hypervisor ABIs to be targeted by the kernel.  The code on both sides of
paravirt_ops is *Linux* code.

> You are just perverting the whole idea of a standardized
> paravirtualization interface.
>
> These things can be done for clocksources, clockevents, interrupts (the
> generic irq code allows this) and probably for a whole bunch of other
> stuff.
>
> The current paravirt interface is completely insane and will explode
> into an unmaintainable nightmare within no time, if we keep accepting
> that crap further.
>
> No thanks.
>

Again, you are misunderstanding the intent of paravirt_ops and the history
behind its development.

>> #ifdef CONFIG_X86_LOCAL_APIC
>>
>> /* Replacement for lapic timer local clock event.
>>  * paravirt_ops.setup_boot_clock = vmi_nop
>>  *   (continue using global_clock_event on cpu0)
>>  * paravirt_ops.setup_secondary_clock = vmi_timer_setup_local_alarm
>>  */
>> void __devinit vmi_timer_setup_local_alarm(void)
>> {
>>     struct clock_event_device *evt = &__get_cpu_var(local_clock_events);
>>
>>     /* Then, start it back up as a local clockevent device. */
>>     memcpy(evt, &vmi_clockevent, sizeof(*evt));
>>     evt->cpumask = cpumask_of_cpu(smp_processor_id());
>>
>>     printk(KERN_WARNING "vmi: registering clock event %s. mult=%lu shift=%u\n",
>>            evt->name, evt->mult, evt->shift);
>>     clockevents_register_device(evt);
>> }
>>
>> #endif
>
> Why the hell do you need an lapic emulator here? This is exactly the
> kind of crap, we do not want to have. clockevents do not care which
> piece of hardware is calling them and we do not care how a particular
> hypervisor is wiring that hardware.
>

Again, I said in a previous mail that we are fine with introducing our own
interrupt handler rather than using the lapic one.

Dan