From mboxrd@z Thu Jan  1 00:00:00 1970
From: Mats Petersson <mats.petersson@citrix.com>
Subject: Re: [RFC/PATCH] Improve speed of mapping guest memory
 into	Dom0
Date: Wed, 14 Nov 2012 16:43:07 +0000
Message-ID: <50A3CA1B.6050907@citrix.com>
References: <50A37CC7.8050700@citrix.com> <50A397E1.7000602@citrix.com>
	<50A3C94C.2040806@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <50A3C94C.2040806@citrix.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: David Vrabel <david.vrabel@citrix.com>
Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
List-Id: xen-devel@lists.xenproject.org

On 14/11/12 16:39, David Vrabel wrote:
> On 14/11/12 13:08, David Vrabel wrote:
>> On 14/11/12 11:13, Mats Petersson wrote:
>>
>>> I have also found that the munmap() call used to unmap the guest memory
>>> from Dom0 is about 35% slower in 3.7 kernel than in the 2.6 kernel (3.8M
>>> cycles vs 2.8M cycles).
>> This performance reduction only occurs with 32-bit guests, as Xen
>> then traps-and-emulates both halves of the PTE write.
>>
>>> I think this could be made quicker by using a
>>> direct write of zero rather than the compare exchange operation that is
>>> currently used [which traps into Xen, performs the compare & exchange] -
>> This is something I noticed but never got around to producing a patch.
>> How about this (uncompiled!) patch?
>>
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -1146,8 +1146,16 @@ again:
>>   				     page->index > details->last_index))
>>   					continue;
>>   			}
>> -			ptent = ptep_get_and_clear_full(mm, addr, pte,
>> -							tlb->fullmm);
>> +			/*
>> +			 * No need for the expensive atomic get and
>> +			 * clear for anonymous mappings as the dirty
>> +			 * and young bits are not used.
>> +			 */
>> +			if (PageAnon(page))
> The mapping might not be backed by pages (e.g., foreign mappings) so:
>
> if (!page || PageAnon(page))
Indeed, this works fine: it now takes just under 500K cycles to "unmap"
1024 pages, compared to 3800K cycles with the original code.
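
For anyone following along, here is a rough user-space model of the branch
being discussed. It is only a sketch: page_model, pte_model_t and
zap_pte_model() are hypothetical stand-ins, not kernel definitions, and the
C11 atomics merely imitate the difference between a plain pte_clear() and
the atomic ptep_get_and_clear_full(). It does not model the Xen trap cost.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct page; only models PageAnon(page). */
struct page_model {
    bool anon;
};

/* Hypothetical stand-in for a PTE slot. */
typedef _Atomic uint64_t pte_model_t;

/*
 * Clear one PTE slot.
 *
 * Returns the old PTE value when it was fetched atomically, or 0 when the
 * cheap path was taken and the old value was never read back.
 * *used_atomic reports which path ran.
 */
static uint64_t zap_pte_model(pte_model_t *pte, const struct page_model *page,
                              bool *used_atomic)
{
    if (!page || page->anon) {
        /* Models pte_clear(): a plain store of zero. */
        atomic_store_explicit(pte, 0, memory_order_relaxed);
        *used_atomic = false;
        return 0;
    }

    /* Models ptep_get_and_clear_full(): atomic exchange with zero. */
    *used_atomic = true;
    return atomic_exchange(pte, 0);
}
```

Passing NULL for the page models the foreign-mapping case, which takes the
cheap path just like an anonymous page; only a non-anonymous page goes
through the atomic exchange.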

--
Mats
>
>> +				pte_clear(mm, addr, pte);
>> +			else
>> +				ptent = ptep_get_and_clear_full(mm, addr, pte,
>> +								tlb->fullmm);
>>   			tlb_remove_tlb_entry(tlb, pte, addr);
>>   			if (unlikely(!page))
>>   				continue;
> David
>
>