From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [patch 00/13] RFC: split the global mutex
Date: Sun, 20 Apr 2008 14:16:52 +0300
Message-ID: <480B2624.9040805@qumranet.com>
References: <20080417201021.515148882@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Cc: kvm-devel
To: Marcelo Tosatti
In-Reply-To: <20080417201021.515148882@localhost.localdomain>
Sender: kvm-devel-bounces@lists.sourceforge.net
Errors-To: kvm-devel-bounces@lists.sourceforge.net
List-Id: kvm.vger.kernel.org

Marcelo Tosatti wrote:
> Introduce QEMUDevice, making the ioport/iomem->device relationship visible.
>
> At the moment it only contains a lock, but could be extended.
>
> With it the following is possible:
> - vcpu's to read/write via ioports/iomem while the iothread is working on
>   some unrelated device, or just copying data from the kernel.
> - vcpu's to read/write via ioports/iomem to different devices simultaneously.
>
> This patchset is only a proof of concept kind of thing, so only serial+raw image
> are supported.
>
> Tried two benchmarks, iperf and tiobench. With tiobench the reported latency is
> significantly lower (20%+), but throughput with IDE is only slightly higher.
>
> Expect to see larger improvements with a higher performing IO scheme (SCSI still buggy,
> looking at it).
>
> The iperf numbers are pretty good. Performance of UP guests increase slightly but SMP
> is quite significant.
>

I expect you're seeing contention induced by memcpy()s and inefficient
emulation. With the dma api, I expect the benefit will drop.

> Note that workloads with multiple busy devices (such as databases, web servers) should
> be the real winners.
>
> What is the feeling on this? Its not _that_ intrusive and can be easily NOP'ed out for
> QEMU.
>

I think many parts are missing (or maybe, I missed them).
You need to lock the qemu internals (there are many read-mostly qemu caches
scattered around the code), lock against hotplug, etc. For pure cpu
emulation, there is a ton of work to be done: protecting the translator as
well as making the translated code smp safe.

I think that QemuDevice makes sense, and that we want this long term, but
that we first need to improve efficiency (which reduces cpu utilization
_and_ improves scalability) rather than look at scalability alone (which is
much harder in addition to the drawback of not reducing cpu utilization).

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.