public inbox for linux-kernel@vger.kernel.org
* [RFC, PATCH 0/24] VMI i386 Linux virtualization interface proposal
@ 2006-03-13 17:58 Zachary Amsden
  2006-03-13 18:09 ` Arjan van de Ven
                   ` (3 more replies)
  0 siblings, 4 replies; 32+ messages in thread
From: Zachary Amsden @ 2006-03-13 17:58 UTC (permalink / raw)
  To: Linus Torvalds, Linux Kernel Mailing List,
	Virtualization Mailing List, Xen-devel, Andrew Morton,
	Zachary Amsden, Dan Hecht, Dan Arai, Anne Holler,
	Pratap Subrahmanyam, Christopher Li, Joshua LeVasseur,
	Chris Wright, Rik Van Riel, Jyothy Reddy, Jack Lo, Kip Macy,
	Jan Beulich, Ky Srinivasan, Wim Coekaerts, Leendert van Doorn,
	Zachary Amsden

At OLS 2005, we described the work that we have been doing at VMware
with respect to a common interface for paravirtualization of Linux. We
shared the general vision in Rik's virtualization BoF.

This note is an update on our further work on the Virtual Machine
Interface, VMI.  The patches provided have been tested on 2.6.16-rc6.
We are currently recollecting performance information for the new -rc6
kernel, but expect our numbers to match previous results, which showed
no impact whatsoever on macro benchmarks, and nearly negligible impact
on microbenchmarks.

Unlike the full-virtualization techniques used in the traditional VMware
products, paravirtualization is a technique where the operating system
is modified to enlighten the hypervisor with timely knowledge about the
operating system's activities. Since the hypervisor now depends on the
kernel to tell it about common idioms, it does not need to write-protect
OS objects such as page and descriptor tables the way a solution based
on full virtualization must. This has two important effects: (a) it
shortens the critical path, since faulting is expensive on modern
processors; (b) by eliminating complex heuristics, the hypervisor is
simplified. While the former delivers performance, the latter is quite
important too. 

Not surprisingly, paravirtualization's strength, ie that it encourages
tighter communication between the kernel and the hypervisor, is also its
weakness. Unless the changes to the operating system are moderated, you
can very quickly find yourself with a kernel that (a) looks and feels
like a brand new kernel or (b) cannot run on native machines or on newer
versions of the hypervisor without a full recompile. The former can
impede innovation in the Linux kernel, and the latter can be a problem
for software vendors. 

VMware proposes VMI as a paravirtualization interface for Linux that
solves these problems. 
  - A VMI'fied Linux kernel runs unmodified on native hardware, and on
    many hypervisors, while simultaneously delivering on the performance
    promise of paravirtualization. 
  - VMI has a rich and low-level interface, which allows the kernel to
    cope with future hardware evolution by querying for hardware
    capability. It is our expectation that a single kernel will run
    unmodified on today's processors with limited hardware
    virtualization support and also keep up with any evolution on the
    processor front.
  - VMI Linux is a fairly clean interface, with distinct name spaces
    for objects from the kernel and the hypervisor. Nowhere do we mingle
names from the hypervisor with those of the kernel. This separation
    allows innovation in the kernel to proceed at the same speed as
    always. For most kernel developers, a VMI kernel looks and feels like
    a regular Linux kernel.  
  - VMI Linux still supports "native" hypervisor device drivers, for
    example a hypervisor vendor's own private network or block device
    drivers which are free to use any interface desired to communicate
    with the hypervisor.

At present, we are sharing a working implementation of the VMI for
2.6.16-rc6 version of Linux. We have verified that VMI Linux does indeed
run well on native machines (both P4 and Opterons), and on VMware style
hypervisors. VMI Linux has negligible overheads on native machines, so
much so that we are confident that VMI Linux can, in the long run, be
the default Linux for i386.  We believe that this interface is both
cleaner and more powerful than other proposals that have been made
towards virtualization of Linux, and can easily be adapted to work with
other hypervisors.

This is by no means finished work. A few of the areas that need more
attention and exploration are (a) 64-bit support is still lacking, but we
feel a port of VMI to 64-bit Linux can be done without too much
trouble; (b) the Xen compatibility layer needs some work to bring it
up to the Xen 3.0 interfaces.  Work is underway on this already, and
no major issues are expected at this time. 

Two final notes.  This is not an attempt to force a proprietary interface
into the Linux kernel.  This is an attempt to find a common interface
that can be used by many hypervisors by isolating hypervisor specific
idioms into a neutral layer.  This new layer is just what it claims to
be - a virtual machine interface, which allows hypervisor dependent code
to be abstracted in a way that benefits both Linux and hypervisor
development.

This is also not an attempt to define an exact and final specification
of how virtualization should be done in Linux.  This is very much a work
in progress, and it is understood that the interfaces proposed here will
change in time to accommodate the needs of all interested parties.  We 
hope to find a common solution that can eventually become part of the
Linux kernel and serve as a model for other operating systems as well.

We appreciate your feedback on this design and the patches to Linux, and
welcome working with anyone who is interested in making virtualization
in Linux a friendly environment to innovate in.  If you find the ideas
here interesting, please volunteer to help improve them.

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [RFC, PATCH 0/24] VMI i386 Linux virtualization interface proposal
@ 2006-03-17 15:56 Chuck Ebbert
  2006-03-17 17:52 ` Zachary Amsden
  0 siblings, 1 reply; 32+ messages in thread
From: Chuck Ebbert @ 2006-03-17 15:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Zachary Amsden, Arjan van de Ven, Linus Torvalds, linux-kernel,
	Virtualization Mailing List, Xen-devel, Chris Wright

In-Reply-To: <20060315102522.GA5926@infradead.org>

On Wed, 15 Mar 2006 10:25:22 +0000, Christoph Hellwig wrote:

> I agree with Zach here, the Xen hypervisor <-> kernel interface is
> not very nice.  This proposal seems like a step forward, although it'll
> probably need to go through a few iterations.  Without an actually
> usable open-source hypervisor reference implementation it's totally
> unacceptable, though.

I'd like to see a test harness implementation that has no actual
hypervisor functionality and just implements the VMI calls natively.
This could be used to test the interface and would provide a nice
starting point for those who want to write a VMI hypervisor.


-- 
Chuck
"Penguins don't come from next door, they come from the Antarctic!"


* RE: [RFC, PATCH 0/24] VMI i386 Linux virtualization interface proposal
@ 2006-03-20 22:03 Anne Holler
  0 siblings, 0 replies; 32+ messages in thread
From: Anne Holler @ 2006-03-20 22:03 UTC (permalink / raw)
  To: Anne Holler, Zach Amsden, Linus Torvalds,
	Linux Kernel Mailing List, Virtualization Mailing List, Xen-devel,
	Andrew Morton, Zach Amsden, Daniel Hecht, Daniel Arai,
	Pratap Subrahmanyam, Christopher Li, Joshua LeVasseur,
	Chris Wright, Rik Van Riel, Jyothy Reddy, Jack Lo, Kip Macy,
	Jan Beulich, Ky Srinivasan, Wim Coekaerts, Leendert van Doorn,
	Zach Amsden

[-- Attachment #1: Type: text/plain, Size: 2593 bytes --]

[Apologies for resend: earlier email with html attachments was
 rejected.  Resending with txt attachments.]

>From: Zachary Amsden [mailto:zach@vmware.com]
>Sent: Monday, March 13, 2006 9:58 AM

>At OLS 2005, we described the work that we have been doing at VMware
>with respect to a common interface for paravirtualization of Linux. We
>shared the general vision in Rik's virtualization BoF.

>This note is an update on our further work on the Virtual Machine
>Interface, VMI.  The patches provided have been tested on 2.6.16-rc6.
>We are currently recollecting performance information for the new -rc6
>kernel, but expect our numbers to match previous results, which showed
>no impact whatsoever on macro benchmarks, and nearly negligible impact
>on microbenchmarks.

Folks,

I'm a member of the performance team at VMware & I recently did a
round of testing measuring the performance of a set of benchmarks
on the following 2 linux variants, both running natively:
 1) 2.6.16-rc6 including VMI + 64MB hole
 2) 2.6.16-rc6 not including VMI + no 64MB hole
The intent was to measure the overhead of VMI calls on native runs.
Data was collected on both p4 & opteron boxes.  The workloads used
were dbench/1client, netperf/receive+send, UP+SMP kernel compile,
lmbench, & some VMware in-house kernel microbenchmarks.  The CPU(s)
were pegged for all workloads except netperf, for which I include
CPU utilization measurements.

Attached please find a text file presenting the benchmark results
collected in terms of ratio of 1) to 2), along with the raw scores
given in brackets.  System configurations & benchmark descriptions
are given at the end of the page; more details are available on
request.  Also attached for reference is a text file giving the
width of the 95% confidence interval around the mean of the scores
reported for each benchmark, expressed as a percentage of the mean.

The VMI-Native & Native scores for almost all workloads match
within the 95% confidence interval.  On the P4, only 4 workloads,
all lmbench microbenchmarks (forkproc,shproc,mmap,pagefault) were
outside the interval & the overheads (2%,1%,2%,1%, respectively)
were low.  The opteron microbenchmark data was a little more
ragged than the P4 in terms of variance, but it appears that only
a few lmbench microbenchmarks (forkproc,execproc,shproc) were
outside their confidence intervals and they show low overheads
(4%,3%,2%, respectively); our in-house segv & divzero seemed to
show measurable overheads as well (8%,9%).

-Regards, Anne Holler (anne@vmware.com)

[-- Attachment #2: score.2.6.16-rc6.txt --]
[-- Type: text/plain, Size: 3892 bytes --]

2.6.16-rc6 Transparent Paravirtualization Performance Scoreboard
Updated: 03/20/2006 * Contact: Anne Holler (anne@vmware.com)

Throughput benchmarks -> HIGHER IS BETTER -> Higher ratio is better
                     P4                  Opteron 
                     VMI-Native/Native   VMI-Native/Native   Comments
 Dbench
  1client            1.00 [312/311]      1.00 [425/425]
 Netperf
  Receive            1.00 [948/947]      1.00 [937/937]      CpuUtil:P4(VMI:43%,Ntv:42%);Opteron(VMI:36%,Ntv:34%)
  Send               1.00 [939/939]      1.00 [937/936]      CpuUtil:P4(VMI:25%,Ntv:25%);Opteron(VMI:62%,Ntv:60%)

Latency benchmarks -> LOWER IS BETTER -> Lower ratio is better
                     P4                  Opteron 
                     VMI-Native/Native   VMI-Native/Native   Comments
 Kernel compile
  UP                 1.00 [221/220]      1.00 [131/131]
  SMP/2way           1.00 [117/117]      1.00 [67/67]
 Lmbench process time latencies
  null call          1.00 [0.17/0.17]    1.00 [0.08/0.08]
  null i/o           1.00 [0.29/0.29]    0.92 [0.23/0.25]    opteron: wide confidence interval
  stat               0.99 [2.14/2.16]    0.94 [2.25/2.39]    opteron: odd, 1% outside wide confidence interval
  open clos          1.01 [3.00/2.96]    0.98 [3.16/3.24]
  slct TCP           1.00 [8.84/8.83]    0.94 [11.8/12.5]    opteron: wide confidence interval
  sig inst           0.99 [0.68/0.69]    1.09 [0.36/0.33]    opteron: best is 1.03 [0.34/0.33]
  sig hndl           0.99 [2.19/2.21]    1.05 [1.20/1.14]    opteron: best is 1.02 [1.13/1.11]
  fork proc          1.02 [137/134]      1.04 [100/96]
  exec proc          1.02 [536/525]      1.03 [309/301]
  sh proc            1.01 [3204/3169]    1.02 [1551/1528]
 Lmbench context switch time latencies
  2p/0K              1.00 [2.84/2.84]    1.14 [0.74/0.65]    opteron: wide confidence interval
  2p/16K             1.01 [2.98/2.95]    0.93 [0.74/0.80]    opteron: wide confidence interval
  2p/64K             1.02 [3.06/3.01]    1.00 [4.19/4.18]
  8p/16K             1.02 [3.31/3.26]    0.97 [1.86/1.91]
  8p/64K             1.01 [30.4/30.0]    1.00 [4.33/4.34]
  16p/16K            0.96 [7.76/8.06]    0.97 [2.03/2.10]
  16p/64K            1.00 [41.5/41.4]    1.00 [15.9/15.9]
 Lmbench system latencies
  Mmap               1.02 [6681/6542]    1.00 [3452/3441]
  Prot Fault         1.06 [0.920/0.872]  1.07 [0.197/0.184]  p4+opteron: wide confidence interval
  Page Fault         1.01 [2.065/2.050]  1.00 [1.10/1.10]
 Kernel Microbenchmarks
  getppid            1.00 [1.70/1.70]    1.00 [0.83/0.83]
  segv               0.99 [7.05/7.09]    1.08 [2.95/2.72]
  forkwaitn          1.02 [3.60/3.54]    1.05 [2.61/2.48]
  divzero            0.99 [5.68/5.73]    1.09 [2.71/2.48]

System Configurations:
 P4:      CPU: 2.4GHz; MEM: 1024MB; DISK: 10K SCSI; Server+Client NICs: Intel e1000 server adapter
 Opteron: CPU: 2.2Ghz; MEM: 1024MB; DISK: 10K SCSI; Server+Client NICs: Broadcom NetXtreme BCM5704
 UP kernel used for all workloads except SMP kernel compile

Benchmark Descriptions:
 Dbench: repeat N times until 95% confidence interval 5% around mean; report mean
  version 2.0 run as "time ./dbench -c client_plain.txt 1"
 Netperf: best of 5 runs
  MessageSize:8192+SocketSize:65536; netperf -H client-ip -l 60 -t TCP_STREAM
 Kernel compile: best of 3 runs
  Build of 2.6.11 kernel w/gcc 4.0.2 via "time make -j 16 bzImage"
 Lmbench: average of best 18 of 30 runs
  version 3.0-a4; obtained from sourceforge
 Kernel microbenchmarks: average of best 3 of 5 runs
  getppid: loop of 10 calls to getppid, repeated 1,000,000 times
  segv: signal of SIGSEGV, repeated 3,000,000 times
  forkwaitn: fork/wait for child to exit, repeated 40,000 times
  divzero: divide by 0 fault 3,000,000 times

[-- Attachment #3: confid.2.6.16-rc6.txt --]
[-- Type: text/plain, Size: 2123 bytes --]

2.6.16-rc6 Transparent Paravirtualization Performance Confidence Interval Widths
Updated: 03/20/2006 * Contact: Anne Holler (anne@vmware.com)
Values are 95% confidence interval width around mean given in terms of percentage of mean

                   P4                  Opteron
                   Native VMI-Native   Native VMI-Native
 Dbench2.0
  1client            5.0%  1.4%          0.8%  3.6%
 Netperf
  Receive            0.1%  0.0%          0.0%  0.0%
  Send               0.6%  1.8%          0.0%  0.0%
 Kernel compile
  UP                 3.4%  2.6%          2.2%  0.0%
  SMP/2way           2.4%  4.9%          4.3%  4.2%
 Lmbench process time latencies
  null call          0.0%  0.0%          0.0%  0.0%
  null i/o           0.0%  0.0%          5.2% 10.8%
  stat               1.0%  1.0%          1.7%  3.2%
  open clos          1.3%  0.7%          2.4%  3.0%
  slct TCP           0.3%  0.3%         19.9% 20.1%
  sig inst           0.3%  0.5%          0.0%  5.5%
  sig hndl           0.4%  0.4%          2.0%  2.0%
  fork proc          0.5%  0.9%          0.8%  1.0%
  exec proc          0.8%  0.9%          1.0%  0.7%
  sh proc            0.1%  0.2%          0.9%  0.4%
 Lmbench context switch time latencies
  2p/0K              0.8%  1.8%         16.1%  9.9%
  2p/16K             1.5%  1.8%         10.5% 10.1%
  2p/64K             2.4%  3.0%          1.8%  1.4%
  8p/16K             4.5%  4.2%          2.4%  4.2%
  8p/64K             3.0%  2.8%          1.6%  1.5%
  16p/16K            3.1%  6.7%          2.6%  3.2%
  16p/64K            0.5%  0.5%          2.9%  2.9%
 Lmbench system latencies
  Mmap               0.7%  0.3%          2.2%  2.4%
  Prot Fault         7.4%  7.5%         49.4% 38.7%
  Page Fault         0.2%  0.2%          2.4%  2.9%
 Kernel Microbenchmarks
  getppid            1.7%  2.9%          3.5%  3.5%
  segv               2.3%  0.7%          1.8%  1.9%
  forkwaitn          0.8%  0.8%          5.3%  2.2%
  divzero            0.9%  1.3%          1.2%  1.1%


end of thread, other threads:[~2006-03-20 22:06 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-03-13 17:58 [RFC, PATCH 0/24] VMI i386 Linux virtualization interface proposal Zachary Amsden
2006-03-13 18:09 ` Arjan van de Ven
2006-03-13 18:22   ` Zachary Amsden
2006-03-13 18:26     ` Arjan van de Ven
2006-03-13 18:30       ` Zachary Amsden
2006-03-13 18:42         ` Arjan van de Ven
2006-03-13 18:48           ` Zachary Amsden
2006-03-13 19:02             ` Chris Wright
2006-03-13 18:56           ` Joshua LeVasseur
2006-03-16 18:52             ` Jan Engelhardt
2006-03-13 18:56         ` Hollis Blanchard
2006-03-13 18:59           ` Zachary Amsden
2006-03-15 10:25     ` Christoph Hellwig
2006-03-15 15:57       ` Zachary Amsden
2006-03-15 17:38       ` Joshua LeVasseur
2006-03-15 20:02         ` Andrew Morton
2006-03-16  0:05           ` Joshua LeVasseur
2006-03-13 20:17 ` Sam Vilain
2006-03-14  0:39 ` Anthony Liguori
2006-03-14  4:01   ` Zachary Amsden
2006-03-14  4:04     ` Rik van Riel
2006-03-14  4:55       ` Zachary Amsden
2006-03-14  4:13 ` Anthony Liguori
2006-03-14  4:26   ` Zachary Amsden
2006-03-14  4:30     ` Rik van Riel
2006-03-14  5:46       ` Zachary Amsden
2006-03-14 12:44         ` Rik van Riel
2006-03-14 16:22           ` Zachary Amsden
2006-03-16 18:58         ` Jan Engelhardt
  -- strict thread matches above, loose matches on Subject: below --
2006-03-17 15:56 Chuck Ebbert
2006-03-17 17:52 ` Zachary Amsden
2006-03-20 22:03 Anne Holler
