From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail190.messagelabs.com (mail190.messagelabs.com [216.82.249.51]) by kanga.kvack.org (Postfix) with SMTP id 66CC36B0044 for ; Sat, 19 Dec 2009 10:10:25 -0500 (EST) Date: Sat, 19 Dec 2009 16:09:16 +0100 From: Andrea Arcangeli Subject: Re: [PATCH 00 of 28] Transparent Hugepage support #2 Message-ID: <20091219150916.GW29790@random.random> References: <4B2A8D83.30305@redhat.com> <20091218140530.GE29790@random.random> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org To: Christoph Lameter Cc: Rik van Riel , linux-mm@kvack.org, Marcelo Tosatti , Adam Litke , Avi Kivity , Izik Eidus , Hugh Dickins , Nick Piggin , Mel Gorman , Andi Kleen , Dave Hansen , Benjamin Herrenschmidt , Ingo Molnar , Mike Travis , KAMEZAWA Hiroyuki , Chris Wright , Andrew Morton List-ID: On Fri, Dec 18, 2009 at 12:33:36PM -0600, Christoph Lameter wrote: > On Fri, 18 Dec 2009, Andrea Arcangeli wrote: > > > On Thu, Dec 17, 2009 at 02:09:47PM -0600, Christoph Lameter wrote: > > > Can we do this step by step? This splitting thing and its > > > associated overhead causes me concerns. > > > > The split_huge_page* functionality whole point is exactly to do things > > step by step. Removing it would mean doing it all at once. > > The split huge page thing involved introducing new refcounting and locking > features into the VM. Not a first step thing. And certainly difficult to > verify if it is correct. I can explain how it works no problem. I already did with Marcelo who also audited my change to put_page. > > This is like the big kernel lock when SMP initially was > > introduced. Surely kernel would have been a little faster if the big > > kernel lock was never introduced but over time the split_huge_page can > > be removed just like the big kernel lock has been removed. Then the > > PG_compound_lock can go away too. > > That is a pretty strange comparison. Split huge page is like introducing > the split pte lock after removing the bkl. You first want to solve the > simpler issues (anon huge) and then see if there is a way to avoid > introducing new locking methods. I can't get your comparison... The reasoning behind my comparison is very simple: we can't put spinlocks everywhere and pretend the kernel to become threaded as a whole overnight. But we can put a BKL (split_huge_page) that takes care of all not-yet-threaded (hugepage aware) code paths that can't be converted overnight (swap, and all the rest of mm/*.c) while we start threading file-by-file. First the scheduler (malloc/free) and then the rest... Removing split_huge_page if needed is simple. > > scalable. In the future mmu notifier users that calls gup will stop > > using FOLL_GET and in turn they will stop calling put_page, so > > eliminating any need to take the PG_compound_lock in all KVM fast paths. > > Maybe do that first then and never introduce the lock in the first place? It's not feasible as I documented in previous emails. Removing FOLL_GET surely would remove the need of all refcounting changes in put_page for tail pages, and it would remove the need of PG_compound_lock, but this only works for gup users that are registered into mmu notifier! O_DIRECT will never be able to use mmu notifier because it does DMA and we can't interrupt DMA in the middle from the mmu notifier invalidate handler. To say it in another way the only way to remove the PG_compound_lock used by put_page _only_ when called on PageTail pages, is to force anybody calling gup to be registered into mmu notifier and supporting interrupting any access to the physical page returned by gup before returning from mmu_notifier_invalidate*. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org