public inbox for linux-ia64@vger.kernel.org
* Fast Syscalls and Virtualised Linux on Linux
@ 2005-06-29  4:20 Peter Chubb
  2005-06-29 15:03 ` Magenheimer, Dan (HP Labs Fort Collins)
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Peter Chubb @ 2005-06-29  4:20 UTC
  To: linux-ia64


Hi,
	I'm working on a User-Mode-Linux-like thing for IA64.  I have
something working, but performance is awful.  The reason is
syscall-via-break.

When a process running in the guest OS calls a system call via break
(or a system call is restarted, and so falls back to syscall via
break) the host OS catches it and tries to execute the system call.
This is obviously undesirable.

The trick I'm using at present (not my own, Matt Chapman implemented
it) is to ptrace the Virtual Machine Monitor, catch all system calls,
and redirect ones that are for the guest OS to the guest.

Obviously, when a process running under the guest uses the fast system
call path, the host doesn't see it, and so it can run at full speed.
But there are enough cases where things fall back to the old via-break
path that the ptrace hack is still needed.  And because it intercepts
*every* system call, even the host system calls the VMM makes to
implement virtualised operations, such as reads and writes to the
virtual disks, pay the overhead.
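
The redirection can be pictured as a decision the tracer makes at every
syscall stop.  The sketch below is a hypothetical model, not the actual
monitor: struct trapped_syscall, dispatch() and the origin flag are
invented for illustration, and a real tracer would get this information
from ptrace() and the traced task's registers.  It shows why the cost is
paid on every break-based call, including the VMM's own host I/O.

```c
/* Hypothetical model of ptrace-based syscall redirection.  A real
 * monitor stops the traced task with ptrace(PTRACE_SYSCALL, ...) and
 * reads the syscall number from its registers; here a trapped call is
 * just a struct so the decision logic stands on its own. */

enum origin {
    FROM_VMM,            /* the monitor itself, e.g. I/O on a disk image */
    FROM_GUEST_PROCESS   /* a process running under the guest kernel */
};

struct trapped_syscall {
    long nr;             /* syscall number requested (illustrative) */
    enum origin origin;  /* who executed the break */
};

enum action {
    RUN_ON_HOST,         /* let the host kernel execute it */
    REDIRECT_TO_GUEST    /* cancel it on the host, hand it to the guest */
};

/* Called once per syscall stop.  Every break-based syscall reaches this
 * point, so stop_count grows for host-bound calls as well as guest
 * ones: that round trip through the tracer is the overhead. */
static enum action dispatch(const struct trapped_syscall *sc,
                            unsigned long *stop_count)
{
    ++*stop_count;
    if (sc->origin == FROM_GUEST_PROCESS)
        return REDIRECT_TO_GUEST;
    return RUN_ON_HOST;
}
```

Calls made on the fast system call path never produce a stop in this
scheme, which is why they run at full speed.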

One approach might be to redefine __BREAK_SYSCALL for the guest OS.
That'd require a specially compiled glibc and kernel, and possibly
other executables.
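
A different immediate would work because IA-64 encodes the request in
the break instruction itself: the immediate is reported to the kernel
in cr.iim, and the host treats only its own syscall value as a system
call.  The sketch models that classification, assuming the host value
is 0x100000 (the value of __BREAK_SYSCALL on Linux/ia64);
GUEST_BREAK_SYSCALL and its value are hypothetical.

```c
#include <stdint.h>

/* Host syscall immediate: assumed to match __BREAK_SYSCALL (0x100000)
 * on Linux/ia64.  The guest value is hypothetical; any immediate the
 * host does not treat as a syscall would do. */
#define HOST_BREAK_SYSCALL  0x100000u
#define GUEST_BREAK_SYSCALL 0x100001u   /* hypothetical */

enum break_kind {
    HOST_SYSCALL,   /* host kernel runs it as a system call */
    GUEST_SYSCALL,  /* assumed to reach the VMM as a break fault/signal */
    OTHER_BREAK     /* any other break immediate */
};

/* Classify a break by its immediate, as a trap handler would after
 * reading cr.iim. */
static enum break_kind classify_break(uint32_t imm)
{
    if (imm == HOST_BREAK_SYSCALL)
        return HOST_SYSCALL;
    if (imm == GUEST_BREAK_SYSCALL)
        return GUEST_SYSCALL;
    return OTHER_BREAK;
}
```

The weakness shows up directly in the model: any binary that still
contains the stock break 0x100000, such as a statically linked program
built against an unmodified glibc, classifies as HOST_SYSCALL and
bypasses the guest.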

Does anyone have a better idea?


-- 
Dr Peter Chubb  http://www.gelato.unsw.edu.au  peterc AT gelato.unsw.edu.au
The technical we do immediately,  the political takes *forever*

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: Fast Syscalls and Virtualised Linux on Linux
  2005-06-29  4:20 Fast Syscalls and Virtualised Linux on Linux Peter Chubb
@ 2005-06-29 15:03 ` Magenheimer, Dan (HP Labs Fort Collins)
  2005-06-30 14:12 ` Jes Sorensen
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Magenheimer, Dan (HP Labs Fort Collins) @ 2005-06-29 15:03 UTC
  To: linux-ia64

I gather you are measuring with lmbench?  Is the performance
of a real, non-synthetic benchmark awful?

Also, I'm guessing that your guests are running at PL0?

Dan 

* Re: Fast Syscalls and Virtualised Linux on Linux
  2005-06-29  4:20 Fast Syscalls and Virtualised Linux on Linux Peter Chubb
  2005-06-29 15:03 ` Magenheimer, Dan (HP Labs Fort Collins)
@ 2005-06-30 14:12 ` Jes Sorensen
  2005-06-30 14:15 ` Christoph Hellwig
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Jes Sorensen @ 2005-06-30 14:12 UTC
  To: linux-ia64

>>>>> "Peter" = Peter Chubb <peterc@gelato.unsw.edu.au> writes:

Peter> One approach might be to redefine __BREAK_SYSCALL for the guest
Peter> OS.  That'd require a specially compiled glibc and kernel, and
Peter> possibly other executables.

Peter> Does anyone have a better idea?

Hi Peter,

That approach is begging for hidden nasties all over the place.  You
would also need to catch any static binaries, and anything else that
might use the break instruction directly.

Cheers,
Jes

* Re: Fast Syscalls and Virtualised Linux on Linux
  2005-06-29  4:20 Fast Syscalls and Virtualised Linux on Linux Peter Chubb
  2005-06-29 15:03 ` Magenheimer, Dan (HP Labs Fort Collins)
  2005-06-30 14:12 ` Jes Sorensen
@ 2005-06-30 14:15 ` Christoph Hellwig
  2005-06-30 23:05 ` Peter Chubb
  2005-07-04 15:09 ` Christoph Hellwig
  4 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2005-06-30 14:15 UTC
  To: linux-ia64

On Wed, Jun 29, 2005 at 02:20:45PM +1000, Peter Chubb wrote:
> One approach might be to redefine __BREAK_SYSCALL for the guest OS.
> That'd require a specially compiled glibc and kernel, and possibly
> other executables.
> 
> Does anyone have a better idea?

Make sure glibc always does syscalls via the vdso (and that the kernel
actually supports this ;-)).  Then the guest kernel can provide an
alternate vdso using whatever syscall implementation it wants.  This
also allows the kernel to choose different syscall implementations for
native kernels later on.  x86 is already doing this to support sysenter.
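
The point of routing every call through the vdso is that libc then
holds only a pointer, and whoever supplies the vdso decides what is
behind it.  A minimal model, with all names hypothetical: struct vdso,
host_entry() and guest_entry() stand in for a host kernel's and a guest
kernel's syscall entry code (on x86 the real entry address is handed to
libc in the ELF auxiliary vector as AT_SYSINFO).

```c
/* Hypothetical model of vdso-based syscall dispatch.  libc never
 * issues the trap instruction itself; it always calls through the
 * entry point the (host or guest) kernel provided. */

enum path { PATH_HOST_KERNEL, PATH_GUEST_KERNEL };

static enum path last_path;  /* records which implementation ran */

typedef long (*syscall_entry_fn)(long nr, long a0, long a1, long a2);

struct vdso {
    syscall_entry_fn syscall_entry;  /* set by whichever kernel maps the vdso */
};

/* Stand-in for the host kernel's entry sequence. */
static long host_entry(long nr, long a0, long a1, long a2)
{
    (void)a0; (void)a1; (void)a2;
    last_path = PATH_HOST_KERNEL;
    return nr;   /* a real entry would return the syscall result */
}

/* Stand-in for a guest kernel's entry, e.g. a call into the VMM. */
static long guest_entry(long nr, long a0, long a1, long a2)
{
    (void)a0; (void)a1; (void)a2;
    last_path = PATH_GUEST_KERNEL;
    return nr;
}

/* What every libc syscall wrapper would reduce to. */
static long libc_syscall(const struct vdso *v, long nr,
                         long a0, long a1, long a2)
{
    return v->syscall_entry(nr, a0, a1, a2);
}
```

The same unmodified libc binary works under both kernels; only the vdso
differs.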


* Re: Fast Syscalls and Virtualised Linux on Linux
  2005-06-29  4:20 Fast Syscalls and Virtualised Linux on Linux Peter Chubb
                   ` (2 preceding siblings ...)
  2005-06-30 14:15 ` Christoph Hellwig
@ 2005-06-30 23:05 ` Peter Chubb
  2005-07-04 15:09 ` Christoph Hellwig
  4 siblings, 0 replies; 6+ messages in thread
From: Peter Chubb @ 2005-06-30 23:05 UTC
  To: linux-ia64

>>>>> "Christoph" = Christoph Hellwig <hch@infradead.org> writes:

Christoph> On Wed, Jun 29, 2005 at 02:20:45PM +1000, Peter Chubb
Christoph> wrote:
>> One approach might be to redefine __BREAK_SYSCALL for the guest OS.
>> That'd require a specially compiled glibc and kernel, and possibly
>> other executables.
>> 
>> Does anyone have a better idea?

Christoph> Make sure glibc is always doing syscalls via the vdso (and
Christoph> that the kernel actually support this ;-)) Then the guest

It doesn't; that's the problem (otherwise it would be easy).

Problem areas are:
	sys_sigreturn
	sys_clone2
	sys_syscall
	sys_vfork



In the clone2() commentary there's:

	/*
	 * clone2() is special: the child cannot execute br.ret right
	 * after the system call returns, because it starts out
	 * executing on an empty stack.  Because of this, we can't use
	 * the new (lightweight) syscall convention here.  Instead, we
	 * just fall back on always using "break".
	 *
	 * Furthermore, since the child starts with an empty stack, we
	 * need to avoid unwinding past invalid memory.  To that end,
	 * we'll pretend now that __clone2() is the end of the
	 * call-chain.  This is wrong for the parent, but only until
	 * it returns from clone2() but it's better than the
	 * alternative.
	 */


-- 
Dr Peter Chubb  http://www.gelato.unsw.edu.au  peterc AT gelato.unsw.edu.au
The technical we do immediately,  the political takes *forever*

* Re: Fast Syscalls and Virtualised Linux on Linux
  2005-06-29  4:20 Fast Syscalls and Virtualised Linux on Linux Peter Chubb
                   ` (3 preceding siblings ...)
  2005-06-30 23:05 ` Peter Chubb
@ 2005-07-04 15:09 ` Christoph Hellwig
  4 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2005-07-04 15:09 UTC
  To: linux-ia64

On Fri, Jul 01, 2005 at 09:05:36AM +1000, Peter Chubb wrote:
> In the clone2() commentary there's:
> 
> 	/*
> 	 * clone2() is special: the child cannot execute br.ret right
> 	 * after the system call returns, because it starts out
> 	 * executing on an empty stack.  Because of this, we can't use
> 	 * the new (lightweight) syscall convention here.  Instead, we
> 	 * just fall back on always using "break".
> 	 *
> 	 * Furthermore, since the child starts with an empty stack, we
> 	 * need to avoid unwinding past invalid memory.  To that end,
> 	 * we'll pretend now that __clone2() is the end of the
> 	 * call-chain.  This is wrong for the parent, but only until
> 	 * it returns from clone2() but it's better than the
> 	 * alternative.
> 	 */

I might be missing something, so please help me a little more.  If you
look at x86, they always provide a vdso: on hardware recent enough they
use sysenter, otherwise they do the same old int 0x80 an old libc would
do.  Now, in the ia64 vdso, couldn't you do a syscall similar enough to
the old one, just with the fixups you'd want the patched libc for done
inline?
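
One way to read this suggestion: keep the old break-compatible sequence
for the calls Peter listed, but have it live inside the vdso (with
whatever fixups clone2 needs done inline) rather than in libc, so a
guest kernel can replace it along with everything else.  The sketch
only models the routing decision; vdso_convention() and the two
convention names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: the two calling sequences a single ia64 vdso entry
 * might choose between. */
enum convention {
    FAST_PATH,        /* lightweight convention, returns with br.ret */
    BREAK_COMPATIBLE  /* old break-style sequence, fixups inline */
};

/* The calls Peter lists as still falling back to break. */
static const char *const break_only[] = {
    "sys_sigreturn", "sys_clone2", "sys_syscall", "sys_vfork",
};

/* Decide, inside the vdso, which sequence a call needs.  Because both
 * sequences live in kernel-supplied code, a guest kernel can substitute
 * its own versions of both. */
static enum convention vdso_convention(const char *name)
{
    for (size_t i = 0; i < sizeof break_only / sizeof break_only[0]; i++)
        if (strcmp(name, break_only[i]) == 0)
            return BREAK_COMPATIBLE;
    return FAST_PATH;
}
```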

