From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mats Petersson
Subject: Re: [RFC/PATCH] Improve speed of mapping guest memory into Dom0
Date: Wed, 14 Nov 2012 16:43:07 +0000
Message-ID: <50A3CA1B.6050907@citrix.com>
References: <50A37CC7.8050700@citrix.com> <50A397E1.7000602@citrix.com> <50A3C94C.2040806@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <50A3C94C.2040806@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: David Vrabel
Cc: "xen-devel@lists.xen.org", Konrad Rzeszutek Wilk
List-Id: xen-devel@lists.xenproject.org

On 14/11/12 16:39, David Vrabel wrote:
> On 14/11/12 13:08, David Vrabel wrote:
>> On 14/11/12 11:13, Mats Petersson wrote:
>>
>>> I have also found that the munmap() call used to unmap the guest memory
>>> from Dom0 is about 35% slower in the 3.7 kernel than in the 2.6 kernel
>>> (3.8M cycles vs 2.8M cycles).
>> This performance reduction only occurs with 32-bit guests, as Xen
>> then traps-and-emulates both halves of the PTE write.
>>
>>> I think this could be made quicker by using a
>>> direct write of zero rather than the compare exchange operation that is
>>> currently used [which traps into Xen, performs the compare & exchange] -
>> This is something I noticed but never got around to producing a patch.
>> How about this (uncompiled!) patch?
>>
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -1146,8 +1146,16 @@ again:
>>                                       page->index > details->last_index))
>>                                           continue;
>>                           }
>> -                        ptent = ptep_get_and_clear_full(mm, addr, pte,
>> -                                                        tlb->fullmm);
>> +                        /*
>> +                         * No need for the expensive atomic get and
>> +                         * clear for anonymous mappings as the dirty
>> +                         * and young bits are not used.
>> +                         */
>> +                        if (PageAnon(page))
> The mapping might not be backed by pages (e.g., foreign mappings) so:
>
>         if (!page || PageAnon(page))

Indeed, this works fine - it now takes just under 500K cycles to "unmap"
1024 pages - compared to 3800K cycles with the original code.

--
Mats
>
>> +                                pte_clear(mm, addr, pte);
>> +                        else
>> +                                ptent = ptep_get_and_clear_full(mm, addr, pte,
>> +                                                                tlb->fullmm);
>>                           tlb_remove_tlb_entry(tlb, pte, addr);
>>                           if (unlikely(!page))
>>                                   continue;
> David
>
>
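
For anyone following the thread without a kernel tree at hand, the branch under discussion can be modelled in user space. This is only a rough sketch, not kernel code: model_pte_t, struct model_page, struct zap_stats and the model_* functions are all hypothetical stand-ins invented for illustration, and a C11 atomic exchange merely approximates the "get and clear" that Xen has to trap and emulate. It shows nothing more than the decision being proposed: a plain store of zero when there is no backing page or the page is anonymous, and the atomic get-and-clear otherwise.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's pte_t and struct page. */
typedef struct { _Atomic uint64_t val; } model_pte_t;
struct model_page { bool anon; };

/* Counts how often each clearing strategy was chosen. */
struct zap_stats {
    unsigned long plain_clears;
    unsigned long atomic_clears;
};

/* Models the expensive path: atomically read the old entry and zero it. */
static uint64_t model_get_and_clear(model_pte_t *pte)
{
    return atomic_exchange(&pte->val, 0);
}

/* Models the cheap path: a direct write of zero, old value not needed. */
static void model_pte_clear(model_pte_t *pte)
{
    atomic_store_explicit(&pte->val, 0, memory_order_relaxed);
}

/*
 * Models the proposed branch: mappings with no backing page (such as
 * foreign mappings) and anonymous pages take the cheap path; everything
 * else keeps the atomic get-and-clear.
 */
static void model_zap_one(model_pte_t *pte, const struct model_page *page,
                          struct zap_stats *stats)
{
    assert(pte != NULL && stats != NULL);

    if (!page || page->anon) {
        model_pte_clear(pte);
        stats->plain_clears++;
    } else {
        (void)model_get_and_clear(pte);
        stats->atomic_clears++;
    }
}
```

Calling model_zap_one() once with a NULL page, once with an anonymous page and once with a non-anonymous page should leave all three entries zero, with two plain clears and one atomic clear recorded. The model says nothing about the cycle counts quoted above; those come from the real trap-and-emulate cost in Xen.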