virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
From: Zachary Amsden <zach@vmware.com>
To: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@osdl.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Virtualization Mailing List <virtualization@lists.osdl.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Chris Wright <chrisw@sous-sol.org>, Avi Kivity <avi@qumranet.com>,
	Jeremy Fitzhardinge <jeremy@goop.org>
Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt
Date: Wed, 22 Aug 2007 13:43:30 -0700	[thread overview]
Message-ID: <46CC9FF2.3040406@vmware.com> (raw)
In-Reply-To: <20070822194632.GD8058@bingen.suse.de>

Andi Kleen wrote:
>
> How is that measured? In a loop? In the same pipeline state?
>
> It seems a little dubious to me.
>   

I did the experiments in a controlled environment, with interrupts 
disabled and care to get the pipeline in the same state.  It was a 
perfectly repeatable experiment.  I don't have exact cycle time anymore, 
but they were the tightest measurements I've even seen on cycle counts 
because of the unique nature of serializing the processor for the fault 
/ privilege transition.  I tested a variety of different conditions, 
including different types of #GP (yes, the cost does vary), #NP, #PF, 
sysenter, int $0xxx.  Sysenter was the fastest, by far.  Int was about 
5x the cost.  #GP and friends were all about similar costs.  #PF was the 
most expensive.


>   
>>>> to verify protection in the page tables mapping the page allows 
>>>> execution (P, !NX, and U/S check).  This is a lot more expensive than a 
>>>>    
>>>>         
>>> When the page is not executable or not present you get #PF not #GP. 
>>> So the hardware already checks that.
>>>
>>> The only case where you would need to check yourself is if you emulate
>>> NX on non NX capable hardware, but I can't see you doing that.
>>>  
>>>       
>> No, it doesn't.  Between the #GP and decode, you have an SMP race where 
>> another processor can rewrite the instruction.
>>     
>
> That can be ignored imho. If the page goes away you'll notice
> when you handle the page fault on read. If it becomes NX then the execution
> just happened to be logically a little earlier.
>
>   

No, you can't ignore it.  The page protections won't change between the 
GP and the decoder execution, but the instruction can, causing you to 
decode into the next page where the processor would not have.  !P 
becomes obvious, but failure to respect NX or U/S is an exploitable 
bug.  Put a 1 byte instruction at the end of a page crossing into a NX 
(or supervisor page).  Remotely, change keep switching between the 
instruction and a segment override.

Result: user executes instruction on supervisor code page, learning data 
as a result of this; code on NX page gets executed.

> Or easier to just write a backend for the lguest virtio drivers,
> that will be likely faster in the end anyways than this gross
> hack.
>   

We already have drivers for all of our hardware in Linux.  Most of the 
hardware we emulate is physical hardware, and there are no virtual 
drivers for it.  Should we take the BusLogic driver and "paravirtualize" 
it by adding VMI hypercalls?  We might benefit from it, but would the 
BusLogic driver?  It sets a nasty precedent for maintenance as different 
hypervisors and emulators hack up different drivers for their own 
performance.

Our SCSI and IDE emulation and thus the drivers used by Linux are pretty 
much fixed in stone; we are not going to go about changing a tricky 
hardware interface to a virtual one, it is simply too risky for 
something as critical as storage.  We might be able to move our network 
driver over to virtio, but that is not a short-term prospect either.

There is great advantage in talking to our existing device layer faster, 
and this is something that is valuable today.

> Really LinuxHAL^wparavirt ops is already so complicated that
> any new hooks need an extremly good justification and that is
> just not here for this.
>
> We can add it if you find an equivalent number of hooks
> to eliminate.
>   

Interesting trade.  What if I sanitized the whole I/O messy macros into 
something fun and friendly:

native_port_in(int port, iosize_t opsize, int delay)
native_port_out(int port, iosize_t opsize, u32 output, int delay)
native_port_string_in(int port, void *ptr, iosize_t opsize, unsigned 
count, int delay)
native_port_string_out(int port, void *ptr, iosize_t opsize, unsigned 
count, int delay)

Then we can be rid of all the macro goo in io.h, which frightens my 
mother.  We might even be able to get rid of the umpteen different 
places where drivers wrap iospace access with their own byte / word / 
long functions so they can switch between port I/O and memory mapped I/O 
by moving it all into common infrastructure.

We could make similar (unwelcome?) advances on the pte functions if it 
were not for the regrettable disconnect between pte_high / pte_low and 
the rest.  Perhaps if it was hidden in macros?

Zach

  reply	other threads:[~2007-08-22 20:43 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-08-22  5:23 [PATCH] Add I/O hypercalls for i386 paravirt Zachary Amsden
2007-08-22  5:34 ` Avi Kivity
2007-08-22  5:40   ` Zachary Amsden
2007-08-22  8:37     ` Avi Kivity
2007-08-22 16:16       ` Zachary Amsden
2007-08-22  6:25   ` Rusty Russell
2007-08-22 10:35     ` Andi Kleen
2007-08-22  9:51       ` Avi Kivity
2007-08-22 11:08         ` Andi Kleen
2007-08-22 10:23           ` Avi Kivity
2007-08-22 11:23             ` Andi Kleen
2007-08-22 12:09               ` Avi Kivity
2007-08-22 13:15                 ` Andi Kleen
2007-08-22 12:32                   ` Avi Kivity
2007-08-22 16:11             ` Jeff Garzik
     [not found]     ` <1188237281.5972.69.camel@localhost.localdomain>
     [not found]       ` <46D3C60B.3050605@vmware.com>
2007-08-28 11:18         ` Benjamin Herrenschmidt
2007-08-22  6:00 ` H. Peter Anvin
2007-08-22  6:03   ` Zachary Amsden
2007-08-24 12:20     ` Pavel Machek
2007-08-22 10:22 ` Andi Kleen
2007-08-22 16:48   ` Zachary Amsden
2007-08-22 17:59     ` Andi Kleen
2007-08-22 17:07       ` Zachary Amsden
2007-08-22 19:46         ` Andi Kleen
2007-08-22 20:43           ` Zachary Amsden [this message]
2007-08-22 22:04             ` Andi Kleen
2007-08-22 21:25               ` Alan Cox
2007-08-22 21:41                 ` Zachary Amsden
2007-08-23  5:34                 ` Rusty Russell
2007-08-22 21:45               ` Zachary Amsden
2007-08-22 17:34       ` Alan Cox
2007-08-22 21:54 ` Jeremy Fitzhardinge
2007-08-22 22:15   ` James Courtier-Dutton
2007-08-22 22:20     ` Chris Wright
2007-08-22 22:29       ` James Courtier-Dutton
2007-08-22 22:34         ` Chris Wright
2007-08-22 23:14         ` Jeremy Fitzhardinge
2007-08-23  0:47           ` Andi Kleen
2007-08-23  0:38             ` Jeremy Fitzhardinge
2007-08-23  2:34               ` Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=46CC9FF2.3040406@vmware.com \
    --to=zach@vmware.com \
    --cc=ak@suse.de \
    --cc=akpm@osdl.org \
    --cc=avi@qumranet.com \
    --cc=chrisw@sous-sol.org \
    --cc=jeremy@goop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rusty@rustcorp.com.au \
    --cc=virtualization@lists.osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).