From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751837Ab0CXWta (ORCPT ); Wed, 24 Mar 2010 18:49:30 -0400
Received: from mx1.redhat.com ([209.132.183.28]:49164 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751572Ab0CXWt3 (ORCPT ); Wed, 24 Mar 2010 18:49:29 -0400
Date: Wed, 24 Mar 2010 23:48:58 +0100
From: Andrea Arcangeli
To: Johannes Weiner
Cc: Andrew Morton , Naoya Horiguchi ,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [rfc 5/5] mincore: transparent huge page support
Message-ID: <20100324224858.GP10659@random.random>
References: <1269354902-18975-1-git-send-email-hannes@cmpxchg.org>
	<1269354902-18975-6-git-send-email-hannes@cmpxchg.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1269354902-18975-6-git-send-email-hannes@cmpxchg.org>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 23, 2010 at 03:35:02PM +0100, Johannes Weiner wrote:
> +static int mincore_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
> +			unsigned long addr, unsigned long end,
> +			unsigned char *vec)
> +{
> +	int huge = 0;
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +	spin_lock(&vma->vm_mm->page_table_lock);
> +	if (likely(pmd_trans_huge(*pmd))) {
> +		huge = !pmd_trans_splitting(*pmd);

Under mmap_sem (read or write) a hugepage can't materialize under
us. So here the pmd_trans_huge can be lockless and run _before_ taking
the page_table_lock. That's the invariant I used to keep identical
performance for all fast paths.

And if it wasn't the case it wouldn't be safe to return huge = 0 as
the page_table_lock is released at that point.

> +		spin_unlock(&vma->vm_mm->page_table_lock);
> +		/*
> +		 * If we have an intact huge pmd entry, all pages in
> +		 * the range are present in the mincore() sense of
> +		 * things.
> +		 *
> +		 * But if the entry is currently being split into
> +		 * normal page mappings, wait for it to finish and
> +		 * signal the fallback to ptes.
> +		 */
> +		if (huge)
> +			memset(vec, 1, (end - addr) >> PAGE_SHIFT);
> +		else
> +			wait_split_huge_page(vma->anon_vma, pmd);
> +	} else
> +		spin_unlock(&vma->vm_mm->page_table_lock);
> +#endif
> +	return huge;
> +}
> +

It's probably cleaner to move the block into huge_memory.c and create
a dummy for the #ifndef version like I did for all the rest.

I'll incorporate and take care of those changes myself if you don't
mind, as I'm going to do a new submit for -mm.

I greatly appreciate you taking the time to port to transhuge, it
helps a lot! ;)

Thanks,
Andrea
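
For illustration only, the ordering discussed above (lockless
pmd_trans_huge check first, page_table_lock taken only when the entry
looks huge) can be sketched as a small userspace model. This is not
kernel code and not the final patch: toy_lock, toy_pmd and
toy_mincore_huge_pmd are hypothetical stand-ins made up for this
sketch, the caller is simply assumed to hold the equivalent of
mmap_sem, and the wait for a split to finish is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for page_table_lock and a pmd entry. */
struct toy_lock {
	int held;
	unsigned long acquisitions;	/* lets a caller see whether the lock was taken */
};

struct toy_pmd {
	bool trans_huge;	/* models pmd_trans_huge() */
	bool splitting;		/* models pmd_trans_splitting() */
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);	/* single-threaded model: no recursion */
	l->held = 1;
	l->acquisitions++;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/*
 * Returns 1 and marks every page in vec as resident when the entry is
 * an intact huge mapping, 0 when the caller has to fall back to ptes.
 */
static int toy_mincore_huge_pmd(struct toy_lock *ptl, const struct toy_pmd *pmd,
				unsigned char *vec, size_t nr_pages)
{
	int huge;

	/*
	 * Lockless fast path. Assumption: the caller holds the model's
	 * equivalent of mmap_sem, so a huge entry cannot appear after
	 * this check and returning 0 without the lock is safe.
	 */
	if (!pmd->trans_huge)
		return 0;

	/* A huge entry may still be mid-split, so look again under the lock. */
	toy_lock_acquire(ptl);
	huge = pmd->trans_huge && !pmd->splitting;
	toy_lock_release(ptl);

	if (huge)
		memset(vec, 1, nr_pages);
	/* else: real code would wait for the split to finish before falling back */

	return huge;
}
```

In this model a non-huge entry never touches the lock, so the
acquisitions counter stays unchanged on that path, which is the
fast-path property the lockless check is meant to preserve.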