Date: Fri, 18 Dec 2009 15:05:30 +0100
From: Andrea Arcangeli
Subject: Re: [PATCH 00 of 28] Transparent Hugepage support #2
Message-ID: <20091218140530.GE29790@random.random>
References: <4B2A8D83.30305@redhat.com>
To: Christoph Lameter
Cc: Rik van Riel, linux-mm@kvack.org, Marcelo Tosatti, Adam Litke,
    Avi Kivity, Izik Eidus, Hugh Dickins, Nick Piggin, Mel Gorman,
    Andi Kleen, Dave Hansen, Benjamin Herrenschmidt, Ingo Molnar,
    Mike Travis, KAMEZAWA Hiroyuki, Chris Wright, Andrew Morton

On Thu, Dec 17, 2009 at 02:09:47PM -0600, Christoph Lameter wrote:
> Can we do this step by step? This splitting thing and its
> associated overhead causes me concerns.

The whole point of the split_huge_page* functionality is exactly to do
things step by step. Removing it would mean doing it all at once.

This is like the big kernel lock when SMP was initially introduced.
Surely the kernel would have been a little faster if the big kernel
lock had never been introduced, but over time split_huge_page can be
removed just like the big kernel lock has been removed. Then the
PG_compound_lock can go away too.

> Frankly I am not sure that there is a problem. The word swap is mostly
> synonymous with "problem". Huge pages are good. I dont think one
> needs to necessarily associate something good (huge page) with a known
> problem (swap) otherwise the whole may not improve.

Others have already answered extensively on why it is needed. Also
look at Hugh's effort to make KSM pages swappable.
Plus, locking the huge pages in RAM wouldn't actually remove the need
for split_huge_page in all the other places in the VM that aren't
hugepage aware yet and where there is no urgency to make them swap
aware either.

NOTE: especially after "echo madvise >
/sys/kernel/mm/transparent_hugepage/enabled" the risk of overhead
caused by split_huge_page is exactly zero! (Well, unless you swap, but
at that point you're generally I/O bound, or the locking on the
anon_vma lock is surely a bigger scalability concern than the CPU cost
of splitting, with or without split_huge_page.)

Also, for hugetlbfs the overhead caused by the PG_compound_lock taken
on tail pages is zero for anything but O_DIRECT: O_DIRECT is the only
thing that can call put_page on tail pages. Everything else only works
with head pages, and with head pages there is zero slowdown caused by
the PG_compound_lock. This is true for transparent hugepages too, in
fact, and O_DIRECT is I/O bound, so the PG_compound_lock shouldn't be
a big issue given that it is a per-compound-page lock and so fully
scalable.

In the future, mmu notifier users that call gup will stop using
FOLL_GET and in turn they will stop calling put_page, eliminating any
need to take the PG_compound_lock in all KVM fast paths.
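
For the "madvise" mode mentioned in the NOTE above, here is a rough
userspace sketch (not part of the patchset) of how an application
might check the active mode and opt a mapping in. It assumes the sysfs
file reports the active choice in square brackets (for example
"always [madvise] never") and that Python's mmap module exposes
MADV_HUGEPAGE on the running system. The helper names
active_thp_mode and map_with_hugepage_hint are made up for this
illustration.

```python
import mmap

# Path taken from the echo command quoted above.
THP_ENABLED_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def active_thp_mode(contents):
    """Return the bracketed word from e.g. 'always [madvise] never'.

    Assumption: the active mode is the one wrapped in square brackets.
    Returns None when no bracketed word is present.
    """
    for word in contents.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def map_with_hugepage_hint(length):
    """Create an anonymous mapping and try to hint it with MADV_HUGEPAGE.

    Returns (mapping, hinted). hinted is False when the constant or the
    madvise() method is unavailable, or when the kernel rejects the hint.
    """
    region = mmap.mmap(-1, length)
    hinted = False
    advice = getattr(mmap, "MADV_HUGEPAGE", None)
    if advice is not None and hasattr(region, "madvise"):
        try:
            region.madvise(advice)
            hinted = True
        except OSError:
            # Kernel built without transparent hugepage support, or the
            # hint was refused; the mapping is still usable.
            pass
    return region, hinted


if __name__ == "__main__":
    try:
        with open(THP_ENABLED_PATH) as f:
            print("active mode:", active_thp_mode(f.read()))
    except OSError:
        print("transparent hugepage sysfs file not available")
    region, hinted = map_with_hugepage_hint(4 * 1024 * 1024)
    print("hint accepted:", hinted)
    region.close()
```

In "madvise" mode only regions hinted this way are candidates for
transparent hugepages, which is why everything else sees no
split_huge_page overhead.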