From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51]) by kanga.kvack.org (Postfix) with ESMTP id 23C3B6B0089 for ; Wed, 20 May 2009 10:52:56 -0400 (EDT) Subject: Re: [Bugme-new] [Bug 13302] New: "bad pmd" on fork() of process with hugepage shared memory segments attached From: Lee Schermerhorn In-Reply-To: <20090520113525.GA4409@csn.ul.ie> References: <6.2.5.6.2.20090515145151.03a55298@binnacle.cx> <20090520113525.GA4409@csn.ul.ie> Content-Type: text/plain Date: Wed, 20 May 2009 10:53:38 -0400 Message-Id: <1242831218.6194.13.camel@lts-notebook> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: Mel Gorman Cc: starlight@binnacle.cx, Andrew Morton , linux-mm@kvack.org, bugzilla-daemon@bugzilla.kernel.org, bugme-daemon@bugzilla.kernel.org, Adam Litke , Eric B Munson , riel@redhat.com List-ID: On Wed, 2009-05-20 at 12:35 +0100, Mel Gorman wrote: > On Fri, May 15, 2009 at 02:53:27PM -0400, starlight@binnacle.cx wrote: > > Here's another possible clue: > > > > I tried the first 'tcbm' testcase on a 2.6.27.7 > > kernel that was hanging around from a few months > > ago and it breaks it 100% of the time. > > > > Completely hoses huge memory. Enough "bad pmd" > > errors to fill the kernel log. > > > > So I investigated what's wrong with 2.6.27.7. The problem is a race between > exec() and the handling of mlock()ed VMAs but I can't see where. The normal > teardown of pages is applied to a shared memory segment as if VM_HUGETLB > was not set. > > This was fixed between 2.6.27 and 2.6.28 but apparently by accident during the > introduction of CONFIG_UNEVITABLE_LRU. This patchset made a number of changes > to how mlock()ed are handled but I didn't spot which was the relevant change > that fixed the problem and reverse bisecting didn't help. I've added two people > that were working on the unevictable LRU patches to see if they spot something. Hi, Mel: and still do. With the unevictable lru, mlock()/mmap('LOCKED) now move the mlocked pages to the unevictable lru list and munlock, including at exit, must rescue them from the unevictable list. Since hugepages are not maintained on the lru and don't get reclaimed, we don't want to move them to the unevictable list, However, we still want to populate the page tables. So, we still call [_]mlock_vma_pages_range() for hugepage vmas, but after making the pages present to preserve prior behavior, we remove the VM_LOCKED flag from the vma. The basic change to handling of hugepage handling with the unevictable lru patches is that we no longer keep a huge page vma marked with VM_LOCKED. So, at exit time, there is no record that this is a vmlocked vma. A bit of context: before the unevictable lru, mlock() or mmap(MAP_LOCKED) would just set the VM_LOCKED flag and "make_pages_present()" for all but a few vma types. We've always excluded those that get_user_pages() can't handle and still do. With the unevictable lru, mlock()/mmap('LOCKED) now move the mlocked pages to the unevictable lru list and munlock, including at exit, must rescue them from the unevictable list. Since hugepages are not maintained on the lru and don't get reclaimed, we don't want to move them to the unevictable list, However, we still want to populate the page tables. So, we still call [_]mlock_vma_pages_range() for hugepage vmas, but after making the pages present to preserve prior behavior, we remove the VM_LOCKED flag from the vma. This may have resulted in the apparent fix to the subject problem in 2.6.28... > > For context, the two attached files are used to reproduce a problem > where bad pmd messages are scribbled all over the console on 2.6.27.7. > Do something like > > echo 64 > /proc/sys/vm/nr_hugepages > mount -t hugetlbfs none /mnt > sh ./test-tcbm.sh > > I did confirm that it didn't matter to 2.6.29.1 if CONFIG_UNEVITABLE_LRU is > set or not. It's possible the race it still there but I don't know where > it is. > > Any ideas where the race might be? No, sorry. Haven't had time to investigate this. Lee > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org