From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx132.postini.com [74.125.245.132])
    by kanga.kvack.org (Postfix) with SMTP id 764906B0032
    for ; Tue, 28 May 2013 03:11:40 -0400 (EDT)
Received: by mail-lb0-f174.google.com with SMTP id u10so7271953lbi.5
    for ; Tue, 28 May 2013 00:11:38 -0700 (PDT)
Message-ID: <51A45861.1010008@gmail.com>
Date: Tue, 28 May 2013 11:10:25 +0400
From: Max Filippov
MIME-Version: 1.0
Subject: Re: TLB and PTE coherency during munmap
References:
In-Reply-To:
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Peter Zijlstra, KAMEZAWA Hiroyuki, linux-arch@vger.kernel.org,
    linux-mm@kvack.org
Cc: Ralf Baechle, Chris Zankel, Marc Gauthier,
    linux-xtensa@linux-xtensa.org, Hugh Dickins

On Sun, May 26, 2013 at 6:50 AM, Max Filippov wrote:
> Hello arch and mm people.
>
> Is it intentional that threads of a process that invoked munmap syscall
> can see TLB entries pointing to already freed pages, or it is a bug?
>
> I'm talking about zap_pmd_range and zap_pte_range:
>
>     zap_pmd_range
>       zap_pte_range
>         arch_enter_lazy_mmu_mode
>           ptep_get_and_clear_full
>           tlb_remove_tlb_entry
>           __tlb_remove_page
>         arch_leave_lazy_mmu_mode
>       cond_resched
>
> With the default arch_{enter,leave}_lazy_mmu_mode, tlb_remove_tlb_entry
> and __tlb_remove_page there is a loop in the zap_pte_range that clears
> PTEs and frees corresponding pages, but doesn't flush TLB, and
> surrounding loop in the zap_pmd_range that calls cond_resched. If a thread
> of the same process gets scheduled then it is able to see TLB entries
> pointing to already freed physical pages.
>
> I've noticed that with xtensa arch when I added a test before returning to
> userspace checking that TLB contents agrees with page tables of the
> current mm. This check reliably fires with the LTP test mtest05 that
> maps, unmaps and accesses memory from multiple threads.
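
For readers following along, the sequence described above can be sketched
as a toy user-space model. This is not kernel code: the arrays and the
toy_* names are assumptions made up for illustration, and the model only
mirrors the order of operations (clear the PTE, free the page, leave the
TLB alone) that results when the hooks are no-ops.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NPAGES 4
#define TOY_NONE   (-1)

/* Hypothetical state: one PTE and one TLB slot per virtual page. */
static int  toy_pte[TOY_NPAGES];       /* virtual -> physical, or TOY_NONE */
static int  toy_tlb[TOY_NPAGES];       /* cached translation, or TOY_NONE  */
static bool toy_page_free[TOY_NPAGES]; /* true once the page was freed     */

/* Map every virtual page 1:1 and preload the TLB with the same entries. */
static void toy_map_all(void)
{
    for (int v = 0; v < TOY_NPAGES; v++) {
        toy_pte[v] = v;
        toy_tlb[v] = v;
        toy_page_free[v] = false;
    }
}

/* Stand-in for the zap_pte_range loop: clears PTEs and frees the pages,
 * but never touches the TLB. */
static void toy_zap_range(int start, int end)
{
    assert(start >= 0 && end <= TOY_NPAGES);
    for (int v = start; v < end; v++) {
        int phys = toy_pte[v];
        if (phys == TOY_NONE)
            continue;
        toy_pte[v] = TOY_NONE;
        toy_page_free[phys] = true;
    }
}

/* Stand-in for the TLB flush that has not happened yet at cond_resched. */
static void toy_flush_tlb_range(int start, int end)
{
    assert(start >= 0 && end <= TOY_NPAGES);
    for (int v = start; v < end; v++)
        toy_tlb[v] = TOY_NONE;
}

/* Stand-in for the return-to-userspace check: count TLB entries that
 * disagree with the page table. */
static int toy_count_stale(void)
{
    int stale = 0;
    for (int v = 0; v < TOY_NPAGES; v++)
        if (toy_tlb[v] != TOY_NONE && toy_tlb[v] != toy_pte[v])
            stale++;
    return stale;
}
```

Calling toy_map_all(), then toy_zap_range(0, 2), leaves two TLB entries
pointing at freed pages until toy_flush_tlb_range(0, 2) runs. The window
between those two calls is where a sibling thread scheduled from
cond_resched would observe them.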
>
> Is there anything wrong in my description, maybe something specific to
> my arch, or this issue really exists?

Hi,

I've made a similar checking function for MIPS (because qemu is my only
choice and it simulates the MIPS TLB) and ran my tests on the mips-malta
machine in qemu. With MIPS I can also see this issue. I hope I did it
right; the patch at the bottom is for reference. The test I run and the
diagnostic output are as follows:

# ./runltp -p -q -T 100 -s mtest05
...
mmstress    0  TINFO  :  test2: Test case tests the race condition between
                simultaneous write faults in the same address space.
[  439.010000] 14: 70d68000: 03178000/00000000
mmstress    2  TPASS  :  TEST 2 Passed
...
mmstress    0  TINFO  :  test2: Test case tests the race condition between
                simultaneous write faults in the same address space.
[  947.390000] 10: 6f9d2000: 03639000/00000000
[  947.390000] 10: 6f9d3000: 03638000/00000000
mmstress    2  TPASS  :  TEST 2 Passed
...
mmstress    0  TINFO  :  test1: Test case tests the race condition between
                simultaneous read faults in the same address space.
[ 1922.680000] 10: 68e12000: 03b59000/00000000
[ 1922.680000] 10: 68e13000: 03b58000/00000000
mmstress    1  TPASS  :  TEST 1 Passed
...

To me it looks like the cond_resched in zap_pmd_range is the root cause
of this issue (leaving the SMP case aside for now). It was introduced in
the following commit:

  commit 97a894136f29802da19a15541de3c019e1ca147e
  Author: Peter Zijlstra
  Date:   Tue May 24 17:12:04 2011 -0700

      mm: Remove i_mmap_lock lockbreak

Peter, Kamezawa, other reviewers of that commit, could you please comment?

------8<------