From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dan Magenheimer
Subject: RE: [RFC 0/2] kvm: Transcendent Memory (tmem) on KVM
Date: Thu, 15 Mar 2012 12:16:04 -0700 (PDT)
Message-ID:
References: <1331224181.2585.16.camel@aks> <4F621FC0.7050800@redhat.com> <4F622E90.5080001@redhat.com> <20120315180233.GF452@phenom.dumpdata.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8BIT
Cc: Akshay Karle, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, ashu tripathi, nishant gulhane, amarmore2006, Shreyas Mahure, mahesh mohan
To: Konrad Wilk, Avi Kivity
Return-path:
Received: from acsinet15.oracle.com ([141.146.126.227]:35333 "EHLO acsinet15.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965311Ab2COTQU convert rfc822-to-8bit (ORCPT ); Thu, 15 Mar 2012 15:16:20 -0400
In-Reply-To: <20120315180233.GF452@phenom.dumpdata.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

> From: Konrad Rzeszutek Wilk
> Subject: Re: [RFC 0/2] kvm: Transcendent Memory (tmem) on KVM
>
> On Thu, Mar 15, 2012 at 08:01:52PM +0200, Avi Kivity wrote:
> > On 03/15/2012 07:49 PM, Dan Magenheimer wrote:
> > >
> > > The "WasActive" patch (https://lkml.org/lkml/2012/1/25/300)
> > > is intended to avoid the streaming situation you are creating here.
> > > It increases the "quality" of cached pages placed into zcache
> > > and should probably also be used on the guest-side stubs (and/or maybe
> > > the host-side zcache... I don't know KVM well enough to determine
> > > if that would work).
> > >
> > > As Dave Hansen pointed out, the WasActive patch is not yet correct
> > > and, as akpm points out, pageflag bits are scarce on 32-bit systems,
> > > so it remains to be seen if the WasActive patch can be upstreamed.
> > > Or maybe there is a different way to achieve the same goal.
> > > But I wanted to let you know that the streaming issue is understood
> > > and needs to be resolved for some cleancache backends just as it was
> > > resolved in the core mm code.
> >
> > Nice. This takes care of the tail-end of the streaming (the more
> > important one - since it always involves a cold copy). What about the
> > other side? Won't the read code invoke cleancache_get_page() for every
> > page? (this one is just a null hypercall, so it's cheaper, but still
> > expensive).
>
> That is something we should fix - I think it was mentioned in the frontswap
> email thread the need for batching and it certainly seems required as those
> hypercalls aren't that cheap.

And exactly how expensive ARE hypercalls these days? On the first
VT/SVM systems they were tens of thousands of cycles... now they
are closer to sub-thousand, are they not? (I remember seeing a graph
of hypercall overhead dropping across generations of CPUs... anybody
have a pointer to a public graph of this?)

One of my favorite papers these days is "When Poll is Better than
Interrupt" (http://static.usenix.org/events/fast12/tech/full_papers/Yang.pdf)
which argues that wasting some CPU cycles doing a busy-wait is often
more efficient than slogging through the Block I/O subsystem to set up
and respond to an interrupt, if the device is fast enough. I wonder
if the same might be true comparing hypercall overhead for tmem
vs the path for KVM to get a page from the host via its normal path?

Ignoring that for now, if excessive hypercalls are a problem, a better
solution than batching may be to modify the Maharashtra approach
to be more like RAMster: Put zcache in the guest-side and treat
the host like a "remote" system. But let's wait for the Maharashtra
team to do some measurements first before we make any assumptions
or change any designs...
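
For anyone who wants to play with the batching question once real numbers
exist, here is a rough back-of-envelope model in Python. It only shows how
a fixed per-hypercall cost is amortized when several page requests share one
hypercall. The function names and every cycle count in it are hypothetical
placeholders, not measurements and not part of any tmem, zcache, or KVM
interface.

```python
# Back-of-envelope model of hypercall batching. All names and all cycle
# counts here are illustrative assumptions, not measured values.


def cycles_per_page(hypercall_cycles, per_page_work_cycles, batch_size):
    """Amortized cost of one page when batch_size pages share one hypercall."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return hypercall_cycles / batch_size + per_page_work_cycles


def break_even_batch(hypercall_cycles, per_page_work_cycles, budget_cycles):
    """Smallest batch size whose amortized per-page cost fits the budget.

    Returns None when the per-page work alone already meets or exceeds the
    budget, since no amount of batching can help in that case.
    """
    if budget_cycles <= per_page_work_cycles:
        return None
    batch = 1
    while cycles_per_page(hypercall_cycles, per_page_work_cycles, batch) > budget_cycles:
        batch += 1
    return batch


if __name__ == "__main__":
    # Assumed figures: 1000 cycles per hypercall, 200 cycles of work per page.
    for batch in (1, 2, 4, 8, 16):
        print(batch, cycles_per_page(1000, 200, batch))
```

If the measured per-page work turns out to dominate the hypercall cost,
batching buys little, which is one more reason to wait for measurements.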