From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751813Ab2G0Iwz (ORCPT ); Fri, 27 Jul 2012 04:52:55 -0400 Received: from cantor2.suse.de ([195.135.220.15]:54699 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751113Ab2G0Iwx (ORCPT ); Fri, 27 Jul 2012 04:52:53 -0400 Date: Fri, 27 Jul 2012 09:52:50 +0100 From: Mel Gorman To: Rik van Riel Cc: Linux-MM , Michal Hocko , Hugh Dickins , David Gibson , Ken Chen , Cong Wang , LKML , Larry Woodman Subject: Re: [PATCH] mm: hugetlbfs: Close race during teardown of hugetlbfs shared page tables v2 Message-ID: <20120727085250.GC612@suse.de> References: <20120720134937.GG9222@suse.de> <5011AFEC.2040609@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <5011AFEC.2040609@redhat.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jul 26, 2012 at 05:00:28PM -0400, Rik van Riel wrote: > On 07/20/2012 09:49 AM, Mel Gorman wrote: > >This V2 is still the mmap_sem approach that fixes a potential deadlock > >problem pointed out by Michal. > > Larry and I were looking around the hugetlb code some > more, and found what looks like yet another race. > > In hugetlb_no_page, we have the following code: > > > spin_lock(&mm->page_table_lock); > size = i_size_read(mapping->host) >> huge_page_shift(h); > if (idx >= size) > goto backout; > > ret = 0; > if (!huge_pte_none(huge_ptep_get(ptep))) > goto backout; > > if (anon_rmap) > hugepage_add_new_anon_rmap(page, vma, address); > else > page_dup_rmap(page); > new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE) > && (vma->vm_flags & VM_SHARED))); > set_huge_pte_at(mm, address, ptep, new_pte); > ... > spin_unlock(&mm->page_table_lock); > > Notice how we check !huge_pte_none with our own > mm->page_table_lock held. > > This offers no protection at all against other > processes, that also hold their own page_table_lock. > Yes, the page_table_lock is close to useless once shared page tables are involved. It's why if we ever wanted to make shared page tables a core MM thing we'd have to revisit how PTE locking at any level that can share page tables works. > In short, it looks like it is possible for multiple > processes to go through the above code simultaneously, > potentially resulting in: > > 1) one process overwriting the pte just created by > another process > > 2) data corruption, as one partially written page > gets superceded by an newly zeroed page, but no > TLB invalidates get sent to other CPUs > > 3) a memory leak of a huge page > > Is there anything that would make this race impossible, > or is this a real bug? > In this case it all happens under the hugetlb instantiation mutex in hugetlb_fault(). It's yet another reason why removing that mutex would be a serious undertaking and the gain for doing so is marginal. -- Mel Gorman SUSE Labs