From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1161263AbXDWKpE (ORCPT );
	Mon, 23 Apr 2007 06:45:04 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1161361AbXDWKpE (ORCPT );
	Mon, 23 Apr 2007 06:45:04 -0400
Received: from mx1.redhat.com ([66.187.233.31]:37443 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1161263AbXDWKpB (ORCPT );
	Mon, 23 Apr 2007 06:45:01 -0400
Message-ID: <462C8E1D.8000706@redhat.com>
Date: Mon, 23 Apr 2007 06:44:45 -0400
From: Rik van Riel
Organization: Red Hat, Inc
User-Agent: Thunderbird 1.5.0.7 (X11/20061008)
MIME-Version: 1.0
To: Nick Piggin
CC: Andrew Morton, linux-kernel, linux-mm, shak,
	jakub@redhat.com, drepper@redhat.com
Subject: Re: [PATCH] lazy freeing of memory through MADV_FREE
References: <46247427.6000902@redhat.com>
	<20070420135715.f6e8e091.akpm@linux-foundation.org>
	<462932BE.4020005@redhat.com>
	<20070420150618.179d31a4.akpm@linux-foundation.org>
	<4629524C.5040302@redhat.com>
	<462ACA40.8070407@yahoo.com.au>
	<462B0156.9020407@redhat.com>
	<462BFAF3.4040509@yahoo.com.au>
	<462C2DC7.5070709@redhat.com>
	<462C2F33.8090508@redhat.com>
	<462C7A6F.9030905@redhat.com>
	<462C88B1.8080906@yahoo.com.au>
	<462C8B0A.8060801@redhat.com>
	<462C8BFF.2050405@yahoo.com.au>
In-Reply-To: <462C8BFF.2050405@yahoo.com.au>
Content-Type: multipart/mixed;
	boundary="------------090202030605030100060308"
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

This is a multi-part message in MIME format.
--------------090202030605030100060308
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

Use TLB batching for MADV_FREE.  This improves the MySQL sysbench
results on my quad-core system by another 10-15%.

Signed-off-by: Rik van Riel
---

Nick Piggin wrote:

>> 3) because of this, we can treat any such accesses as
>>    happening simultaneously with the MADV_FREE and
>>    as illegal, aka undefined behaviour territory and
>>    we do not need to worry about them
>
> Yes, but I'm wondering if it is legal in all architectures.

It's similar to trying to access memory during an munmap.

You may be able to for a short time, but it'll come back to
haunt you.

>> 4) because we flush the tlb before releasing the page
>>    table lock, other CPUs cannot remove this page from
>>    the address space - they will block on the page
>>    table lock before looking at this pte
>
> We don't when the ptl is split.

Even then we do.  Each invocation of zap_pte_range() only touches
one page table page, and it flushes the TLB before releasing the
page table lock.

> What the tlb flush used to be able to assume is that the page
> has been removed from the pagetables when they are put in the
> tlb flush batch.

All the tlb flush code seems to assume is that the tlb entries
should be invalidated.

> I'm not saying there is any bugs, but just suggesting there
> might be.

Jakub found a potential bug, in that I did not use an atomic
operation to clear the page table entries.  I've attached a
new patch which simply uses ptep_test_and_clear_dirty/young to
get rid of the dirty and accessed bits.

It uses the same atomic accesses we use elsewhere in the VM and
the code is a line shorter than before.

Andrew, please use this one.

-- 
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.

--------------090202030605030100060308
Content-Type: text/x-patch;
	name="linux-2.6-madv_free-lazytlb.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
	filename="linux-2.6-madv_free-lazytlb.patch"

--- linux-2.6.20.x86_64/mm/memory.c.orig	2007-04-23 02:48:36.000000000 -0400
+++ linux-2.6.20.x86_64/mm/memory.c	2007-04-23 02:54:42.000000000 -0400
@@ -677,11 +677,14 @@ static unsigned long zap_pte_range(struc
 					remove_exclusive_swap_page(page);
 					unlock_page(page);
 				}
-				ptep_clear_flush_dirty(vma, addr, pte);
-				ptep_clear_flush_young(vma, addr, pte);
+				ptep_test_and_clear_dirty(vma, addr, pte);
+				ptep_test_and_clear_young(vma, addr, pte);
 				SetPageLazyFree(page);
 				if (PageActive(page))
 					deactivate_tail_page(page);
+				/* tlb_remove_page frees it again */
+				get_page(page);
+				tlb_remove_page(tlb, page);
 				continue;
 			}
 		}

--------------090202030605030100060308--