From mboxrd@z Thu Jan 1 00:00:00 1970
From: Robin Holt
Subject: Re: [PATCH] export notifier #1
Date: Wed, 23 Jan 2008 04:52:47 -0600
Message-ID: <20080123105246.GG26420@sgi.com>
References: <478F9C9C.7070500@qumranet.com>
 <20080117193252.GC24131@v2.random>
 <20080121125204.GJ6970@v2.random>
 <4795F9D2.1050503@qumranet.com>
 <20080122144332.GE7331@v2.random>
 <20080122200858.GB15848@v2.random>
 <20080122223139.GD15848@v2.random>
 <479716AD.5070708@qumranet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Cc: Andrew Morton, Nick Piggin, Andrea Arcangeli,
 linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
 Benjamin Herrenschmidt,
 steiner-sJ/iWh9BUns@public.gmane.org,
 linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org,
 daniel.blueman-xqY44rlHlBpWk0Htik3J/w@public.gmane.org,
 holt-sJ/iWh9BUns@public.gmane.org,
 Hugh Dickins, Christoph Lameter
To: Avi Kivity
Content-Disposition: inline
In-Reply-To: <479716AD.5070708-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
Sender: kvm-devel-bounces-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
Errors-To: kvm-devel-bounces-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
List-Id: kvm.vger.kernel.org

On Wed, Jan 23, 2008 at 12:27:57PM +0200, Avi Kivity wrote:
>> The approach with the export notifier is page based not based on the
>> mm_struct. We only need a single page count for a page that is exported to
>> a number of remote instances of linux. The page count is dropped when all
>> the remote instances have unmapped the page.
>
> That won't work for kvm. If we have a hundred virtual machines, that means
> 99 no-op notifications.

But 100 callouts holding spinlocks will not work for our implementation
and even if the callouts are made with spinlocks released, we would very
strongly prefer a single callout which messages the range to the other
side.

> Also, our rmap key for finding the spte is keyed on (mm, va). I imagine
> most RDMA cards are similar.

For our RDMA rmap, it is based upon physical address.

>> There is only the need to walk twice for pages that are marked Exported.
>> And the double walk is only necessary if the exporter does not have its
>> own rmap. The cross partition thing that we are doing has such an rmap and
>> its a matter of walking the exporters rmap to clear out the external
>> references and then we walk the local rmaps. All once.
>>
>
> The problem is that external mmus need a reverse mapping structure to
> locate their ptes. We can't expand struct page so we need to base it on mm
> + va.

Our rmap takes a physical address and turns it into mm+va.

> Can they wait on that bit? PageLocked(page) should work, right?

We already have a backoff mechanism so we expect to be able to adapt it
to include a PageLocked(page) check.

Thanks,
Robin
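
For readers following the thread, here is a rough userspace sketch of
the kind of lookup described above: a reverse map that takes a physical
address and returns the (mm, va) pairs mapping it. This is not the
actual SGI or kvm code. Every name in it (rmap_entry, rmap_add,
rmap_lookup, mm_id, the table sizes and the 4 KiB page size) is an
assumption made up for illustration, and it leaves out the locking and
backoff a real kernel implementation would need.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT       12   /* assumption: 4 KiB pages */
#define RMAP_BUCKETS     64   /* assumption: small fixed table for the sketch */
#define RMAP_MAX_ENTRIES 256  /* assumption: static pool, no allocator */

/* One reverse-mapping record: physical frame -> (mm, va). */
struct rmap_entry {
    uint64_t pfn;             /* physical frame number (pa >> PAGE_SHIFT) */
    int mm_id;                /* hypothetical stand-in for a struct mm_struct pointer */
    uint64_t va;              /* virtual address within that mm */
    struct rmap_entry *next;  /* next record in the same hash bucket */
};

static struct rmap_entry pool[RMAP_MAX_ENTRIES];
static size_t pool_used;
static struct rmap_entry *buckets[RMAP_BUCKETS];

static size_t bucket_of(uint64_t pfn)
{
    return (size_t)(pfn % RMAP_BUCKETS);
}

/* Record that (mm_id, va) maps the page containing physical address pa.
 * Returns 0 on success, -1 if the static pool is exhausted. */
int rmap_add(uint64_t pa, int mm_id, uint64_t va)
{
    struct rmap_entry *e;
    size_t b;

    if (pool_used == RMAP_MAX_ENTRIES)
        return -1;
    e = &pool[pool_used++];
    e->pfn = pa >> PAGE_SHIFT;
    e->mm_id = mm_id;
    e->va = va;
    b = bucket_of(e->pfn);
    e->next = buckets[b];     /* push onto the front of the bucket chain */
    buckets[b] = e;
    return 0;
}

/* Copy up to max mappings of the page containing pa into out.
 * Returns the number of records copied. */
size_t rmap_lookup(uint64_t pa, struct rmap_entry *out, size_t max)
{
    uint64_t pfn = pa >> PAGE_SHIFT;
    struct rmap_entry *e;
    size_t n = 0;

    for (e = buckets[bucket_of(pfn)]; e != NULL && n < max; e = e->next)
        if (e->pfn == pfn)
            out[n++] = *e;
    return n;
}
```

With a table like this, one callout per page can walk the exporter's
own records by physical address, which is the contrast with an rmap
keyed on (mm, va) that the thread is discussing.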