From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755015AbYA2S3R (ORCPT ); Tue, 29 Jan 2008 13:29:17 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1756529AbYA2S2m (ORCPT ); Tue, 29 Jan 2008 13:28:42 -0500 Received: from host36-195-149-62.serverdedicati.aruba.it ([62.149.195.36]:56266 "EHLO mx.cpushare.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754945AbYA2S2l (ORCPT ); Tue, 29 Jan 2008 13:28:41 -0500 Date: Tue, 29 Jan 2008 19:28:32 +0100 From: Andrea Arcangeli To: Christoph Lameter Cc: Robin Holt , Avi Kivity , Izik Eidus , Nick Piggin , kvm-devel@lists.sourceforge.net, Benjamin Herrenschmidt , Peter Zijlstra , steiner@sgi.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, daniel.blueman@quadrics.com, Hugh Dickins Subject: Re: [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges Message-ID: <20080129182831.GS7233@v2.random> References: <20080128202840.974253868@sgi.com> <20080128202923.849058104@sgi.com> <20080129162004.GL7233@v2.random> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080129162004.GL7233@v2.random> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Christoph, the below patch should fix the current leak of the pinned pages. I hope the page-pin that should be dropped by the invalidate_range op, is enough to prevent the "physical page" mapped on that "mm+address" to change before invalidate_range returns. If that would ever happen, there would be a coherency loss between the guest VM writes and the writes coming from userland on the same mm+address from a different thread (qemu, whatever). invalidate_page before PT lock was obviously safe. Now we entirely relay on the pin to prevent the page to change before invalidate_range returns. If the pte is unmapped and the page is mapped back in with a minor fault that's ok, as long as the physical page remains the same for that mm+address, until all sptes are gone. Signed-off-by: Andrea Arcangeli diff --git a/mm/fremap.c b/mm/fremap.c --- a/mm/fremap.c +++ b/mm/fremap.c @@ -212,8 +212,8 @@ asmlinkage long sys_remap_file_pages(uns spin_unlock(&mapping->i_mmap_lock); } + err = populate_range(mm, vma, start, size, pgoff); mmu_notifier(invalidate_range, mm, start, start + size, 0); - err = populate_range(mm, vma, start, size, pgoff); if (!err && !(flags & MAP_NONBLOCK)) { if (unlikely(has_write_lock)) { downgrade_write(&mm->mmap_sem); diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -1639,8 +1639,6 @@ gotten: /* * Re-check the pte - we dropped the lock */ - mmu_notifier(invalidate_range, mm, address, - address + PAGE_SIZE - 1, 0); page_table = pte_offset_map_lock(mm, pmd, address, &ptl); if (likely(pte_same(*page_table, orig_pte))) { if (old_page) { @@ -1676,6 +1674,8 @@ gotten: page_cache_release(old_page); unlock: pte_unmap_unlock(page_table, ptl); + mmu_notifier(invalidate_range, mm, address, + address + PAGE_SIZE - 1, 0); if (dirty_page) { if (vma->vm_file) file_update_time(vma->vm_file); From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andrea Arcangeli Subject: Re: [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges Date: Tue, 29 Jan 2008 19:28:32 +0100 Message-ID: <20080129182831.GS7233@v2.random> References: <20080128202840.974253868@sgi.com> <20080128202923.849058104@sgi.com> <20080129162004.GL7233@v2.random> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Cc: Nick Piggin , Peter Zijlstra , linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, Benjamin Herrenschmidt , steiner-sJ/iWh9BUns@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Avi Kivity , kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org, daniel.blueman-xqY44rlHlBpWk0Htik3J/w@public.gmane.org, Robin Holt , Hugh Dickins To: Christoph Lameter Return-path: Content-Disposition: inline In-Reply-To: <20080129162004.GL7233-lysg2Xt5kKMAvxtiuMwx3w@public.gmane.org> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: kvm-devel-bounces-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org Errors-To: kvm-devel-bounces-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org List-Id: kvm.vger.kernel.org Christoph, the below patch should fix the current leak of the pinned pages. I hope the page-pin that should be dropped by the invalidate_range op, is enough to prevent the "physical page" mapped on that "mm+address" to change before invalidate_range returns. If that would ever happen, there would be a coherency loss between the guest VM writes and the writes coming from userland on the same mm+address from a different thread (qemu, whatever). invalidate_page before PT lock was obviously safe. Now we entirely relay on the pin to prevent the page to change before invalidate_range returns. If the pte is unmapped and the page is mapped back in with a minor fault that's ok, as long as the physical page remains the same for that mm+address, until all sptes are gone. Signed-off-by: Andrea Arcangeli diff --git a/mm/fremap.c b/mm/fremap.c --- a/mm/fremap.c +++ b/mm/fremap.c @@ -212,8 +212,8 @@ asmlinkage long sys_remap_file_pages(uns spin_unlock(&mapping->i_mmap_lock); } + err = populate_range(mm, vma, start, size, pgoff); mmu_notifier(invalidate_range, mm, start, start + size, 0); - err = populate_range(mm, vma, start, size, pgoff); if (!err && !(flags & MAP_NONBLOCK)) { if (unlikely(has_write_lock)) { downgrade_write(&mm->mmap_sem); diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -1639,8 +1639,6 @@ gotten: /* * Re-check the pte - we dropped the lock */ - mmu_notifier(invalidate_range, mm, address, - address + PAGE_SIZE - 1, 0); page_table = pte_offset_map_lock(mm, pmd, address, &ptl); if (likely(pte_same(*page_table, orig_pte))) { if (old_page) { @@ -1676,6 +1674,8 @@ gotten: page_cache_release(old_page); unlock: pte_unmap_unlock(page_table, ptl); + mmu_notifier(invalidate_range, mm, address, + address + PAGE_SIZE - 1, 0); if (dirty_page) { if (vma->vm_file) file_update_time(vma->vm_file); ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Tue, 29 Jan 2008 19:28:32 +0100 From: Andrea Arcangeli Subject: Re: [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges Message-ID: <20080129182831.GS7233@v2.random> References: <20080128202840.974253868@sgi.com> <20080128202923.849058104@sgi.com> <20080129162004.GL7233@v2.random> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080129162004.GL7233@v2.random> Sender: owner-linux-mm@kvack.org Return-Path: To: Christoph Lameter Cc: Robin Holt , Avi Kivity , Izik Eidus , Nick Piggin , kvm-devel@lists.sourceforge.net, Benjamin Herrenschmidt , Peter Zijlstra , steiner@sgi.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, daniel.blueman@quadrics.com, Hugh Dickins List-ID: Christoph, the below patch should fix the current leak of the pinned pages. I hope the page-pin that should be dropped by the invalidate_range op, is enough to prevent the "physical page" mapped on that "mm+address" to change before invalidate_range returns. If that would ever happen, there would be a coherency loss between the guest VM writes and the writes coming from userland on the same mm+address from a different thread (qemu, whatever). invalidate_page before PT lock was obviously safe. Now we entirely relay on the pin to prevent the page to change before invalidate_range returns. If the pte is unmapped and the page is mapped back in with a minor fault that's ok, as long as the physical page remains the same for that mm+address, until all sptes are gone. Signed-off-by: Andrea Arcangeli diff --git a/mm/fremap.c b/mm/fremap.c --- a/mm/fremap.c +++ b/mm/fremap.c @@ -212,8 +212,8 @@ asmlinkage long sys_remap_file_pages(uns spin_unlock(&mapping->i_mmap_lock); } + err = populate_range(mm, vma, start, size, pgoff); mmu_notifier(invalidate_range, mm, start, start + size, 0); - err = populate_range(mm, vma, start, size, pgoff); if (!err && !(flags & MAP_NONBLOCK)) { if (unlikely(has_write_lock)) { downgrade_write(&mm->mmap_sem); diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -1639,8 +1639,6 @@ gotten: /* * Re-check the pte - we dropped the lock */ - mmu_notifier(invalidate_range, mm, address, - address + PAGE_SIZE - 1, 0); page_table = pte_offset_map_lock(mm, pmd, address, &ptl); if (likely(pte_same(*page_table, orig_pte))) { if (old_page) { @@ -1676,6 +1674,8 @@ gotten: page_cache_release(old_page); unlock: pte_unmap_unlock(page_table, ptl); + mmu_notifier(invalidate_range, mm, address, + address + PAGE_SIZE - 1, 0); if (dirty_page) { if (vma->vm_file) file_update_time(vma->vm_file); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org