public inbox for linux-kernel@vger.kernel.org
From: Zachary Amsden <zach@vmware.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>,
	Jeremy Fitzhardinge <jeremy@goop.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	john stultz <johnstul@us.ibm.com>,
	akpm@linux-foundation.org, LKML <linux-kernel@vger.kernel.org>,
	Rusty Russell <rusty@rustcorp.com.au>, Andi Kleen <ak@suse.de>,
	Chris Wright <chrisw@sous-sol.org>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>
Subject: Re: ABI coupling to hypervisors via CONFIG_PARAVIRT
Date: Fri, 09 Mar 2007 15:07:15 -0800
Message-ID: <45F1E8A3.6000100@vmware.com>
In-Reply-To: <Pine.LNX.4.64.0703091423570.10832@woody.linux-foundation.org>

Linus Torvalds wrote:
>> but ... maybe because VMI is so lowlevel and covers /all/ of x86 today, 
>> it will always be able to emulate whatever different concept we can come 
>> up with? Do we really know this absolutely sure?
>>     
>
> "For sure"? Absolutely not. But since any new interfaces we come up with 
> for doing timers etc had better work perfectly fine on an old hardware 
> platform too, we can't exactly require any interfaces that do things that 
> a bog-standard old dual-PPro didn't do 10 years ago, can we?
>
> So assuming that the VMI interface is roughly as powerful (by virtue of 
> basically emulating it) as the old single-ioapic/single-lapic systems we 
> used to use, I don't think it should ever be a real problem. Hmm?
>   

Sorry to keep you in a thread you want nothing more to do with, Linus, 
but to answer the question completely directly:

There are four design requirements which are inviolable for achieving a 
large, measurable performance gain, which is the primary benefit of i386 
paravirt-ops for us.  These changes are required for /ALL/ 
high-performance paravirtualized kernels not running in hardware 
virtualization, across a broad spectrum of hypervisors, and they either 
have zero negative impact or offer opportunity for additional gain when 
hardware virtualization is enabled.

1) Ability to run the kernel at non-zero CPL
2) Ability to replace hardware interrupt masking functions with 
virtualizable equivalents
3) Notification when page tables are allocated and released
4) Notification in some form when page table entries are updated (or 
VMAs are changed)

Everything beyond these is an incremental gain, some more valuable than 
others, but none as significant as the above four (apic_write, 
incidentally, _is_ one of the more substantial gains for us).

These don't seem to be a major burden on the kernel at all.

#1 is already the default case now for even native hardware.
#2 requires a lot of hooks because interrupt masking is a common 
function, and this is where the large number of hook points Ingo was 
demonstrating came from, but the icache effects and costs land on 
already expensive instructions.  In fact, it appears that on some 
hardware the nop padding around cli / sti / pushf / popf contributes to 
a mysterious performance gain, perhaps due to some pipelining anomaly.
#3 is not a common enough operation to be of performance concern.  It 
does, however, require page tables, just as native hardware does, which 
we can implement perfectly well anyway in our backend, just as the 
native backend would if some reckless madman removed all notions of page 
tables from a paravirt kernel.
#4 involves an extra call in page fault paths and from some points in 
the mm layer.

There are no ABI requirements tied to these, merely the presence of any 
usable API for them in paravirt-ops.
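
As a rough illustration of requirements 2 through 4, the hooks could be 
expressed as a table of function pointers with one backend per platform.  
This is only a sketch under stated assumptions: every name in it 
(pv_hooks, native_*, hv_*, pv_demo) is hypothetical and is not the actual 
paravirt-ops or VMI interface, and the "native" interrupt flag is 
simulated with a plain variable so the code can run in user space 
instead of executing cli / sti.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hook table; names are illustrative, not the kernel's. */
struct pv_hooks {
    void (*irq_disable)(void);                 /* requirement 2 */
    void (*irq_enable)(void);
    void (*pt_alloc)(unsigned long pfn);       /* requirement 3 */
    void (*pt_release)(unsigned long pfn);
    void (*pte_update)(unsigned long *ptep,    /* requirement 4 */
                       unsigned long val);
};

/* "Native" backend. Real hardware would use cli/sti here; a flag
 * stands in for the interrupt flag so this runs unprivileged. */
static bool native_if = true;
static void native_irq_disable(void) { native_if = false; }
static void native_irq_enable(void)  { native_if = true; }
static void native_pt_alloc(unsigned long pfn)   { (void)pfn; } /* no-op */
static void native_pt_release(unsigned long pfn) { (void)pfn; } /* no-op */
static void native_pte_update(unsigned long *ptep, unsigned long val)
{
    *ptep = val;                /* plain store, nobody to notify */
}

/* Simulated hypervisor backend: masking toggles a flag in memory
 * rather than trapping, and page table events are counted. */
static bool virt_irq_enabled = true;
static unsigned int shadowed_tables;
static unsigned int pte_notifications;
static void hv_irq_disable(void) { virt_irq_enabled = false; }
static void hv_irq_enable(void)  { virt_irq_enabled = true; }
static void hv_pt_alloc(unsigned long pfn)   { (void)pfn; shadowed_tables++; }
static void hv_pt_release(unsigned long pfn) { (void)pfn; shadowed_tables--; }
static void hv_pte_update(unsigned long *ptep, unsigned long val)
{
    *ptep = val;
    pte_notifications++;        /* stand-in for telling the hypervisor */
}

static const struct pv_hooks native_hooks = {
    native_irq_disable, native_irq_enable,
    native_pt_alloc, native_pt_release, native_pte_update,
};
static const struct pv_hooks hv_hooks = {
    hv_irq_disable, hv_irq_enable,
    hv_pt_alloc, hv_pt_release, hv_pte_update,
};

/* Generic code path: identical no matter which backend is plugged in.
 * Allocates a fake page table, writes one entry with interrupts
 * masked, then releases the table. Returns 0 on success. */
int pv_demo(const struct pv_hooks *pv)
{
    unsigned long fake_table[4] = { 0 };

    pv->pt_alloc(0x1000UL);
    pv->irq_disable();
    pv->pte_update(&fake_table[1], 0x2003UL);
    pv->irq_enable();
    pv->pt_release(0x1000UL);

    assert(fake_table[1] == 0x2003UL);
    return 0;
}
```

Calling pv_demo(&native_hooks) and pv_demo(&hv_hooks) exercises the same 
caller against both backends; only the hypervisor backend records 
anything.  The point of the sketch is that the caller depends on the 
presence of the hooks, not on any particular binary interface behind 
them.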

Linus is right - our virtual hardware is an exact replica of real 
hardware.  So no matter how you change paravirtualization in the kernel, 
anything that runs on real hardware will continue to run on VMware.  VMI 
is tied very closely to the hardware, on purpose, and follows the rules 
of native hardware extremely closely.  So you can pretty much twist and 
abuse paravirt-ops in a number of ways, and as long as it continues to 
run on real hardware with the above four requirements, it still runs 
even on VMI.  Violate the above four requirements, and it costs a lot of 
performance, but we still continue to run.

Zach
