Date: Wed, 25 May 2022 20:07:31 +0100
From: Matthew Wilcox
To: Zach O'Keefe
Cc: David Rientjes, linux-mm@kvack.org
Subject: Re: mm/khugepaged: collapse file/shmem compound pages

On Tue, May 24, 2022 at 03:42:55PM -0700, Zach O'Keefe wrote:
> Hey Matthew,
>
> I'm leading an attempt to add a new madvise mode, MADV_COLLAPSE, to
> allow userspace-directed collapse of memory into THPs [1].  The
> initial proposal only supports anonymous memory, but I'm working on
> adding support for file-backed and shmem memory.
>
> The intended behavior of MADV_COLLAPSE is that it should return
> "success" if all hugepage-aligned/-sized regions requested are backed
> by pmd-mapped THPs on return (races aside).  IOW: either we were able
> to collapse the memory successfully, or it was already backed by
> pmd-mapped THPs.
>
> Currently there is a nice "XXX: khugepaged should compact smaller
> compound pages into a PMD sized page" comment in khugepaged_scan_file()
> for the case where we encounter a compound page during scanning.  Do
> you know what kind of gotchas or technical difficulties would be
> involved in doing this?  I presume this work would also benefit those
> relying on khugepaged to collapse read-only file and shmem memory, and
> I'd be happy to help move it forward.

Hi Zach,

Thanks for your interest, and I'd love some help on this.
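Just to restate the contract you describe in code form, a caller's-eye
sketch might be (purely illustrative; it assumes the MADV_COLLAPSE
advice value from the proposal in [1] is visible to userspace, and
collapse_range() is just a name made up for the example):

	#include <stdio.h>
	#include <sys/mman.h>

	#ifndef MADV_COLLAPSE
	/* fallback for illustration only; take the real value from your uapi headers */
	#define MADV_COLLAPSE	25
	#endif

	/*
	 * Ask the kernel to back [addr, addr + len) with PMD-mapped THPs.
	 * A zero return is only supposed to mean that every hugepage-aligned,
	 * hugepage-sized region in the range is PMD-mapped on return.
	 */
	static int collapse_range(void *addr, size_t len)
	{
		if (madvise(addr, len, MADV_COLLAPSE) != 0) {
			perror("madvise(MADV_COLLAPSE)");
			return -1;
		}
		return 0;
	}

so the kernel side has to be able to make that guarantee for file and
shmem mappings too, not just anon.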
The khugepaged code (like much of the mm used to) assumes that memory
comes in two sizes, PTE and PMD.  That's still true for anon and shmem
for now, but hopefully we'll start managing both anon & shmem memory
in larger chunks, without necessarily going as far as PMD.

I think the purpose of khugepaged should continue to be to construct
PMD-size pages; I don't see the point of it wandering through process
VMs replacing order-2 pages with order-5 pages.  I may be wrong about
that, of course, so feel free to argue with me.

Anyway, the meaning behind that comment is that the PageTransCompound()
test is going to be true on any compound page (PageTransCompound()
doesn't check that the page is necessarily a THP).  So that particular
test should be folio_test_pmd_mappable(), but there are probably other
things which ought to be changed too, including converting the entire
file from dealing in pages to dealing in folios.
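To make that concrete, the compound-page check in khugepaged_scan_file()
might end up looking roughly like this once the file speaks folios
(only a sketch of the idea; the surrounding loop details are from
memory and will need adjusting):

		struct folio *folio = page_folio(page);

		/*
		 * A PMD-sized folio already gives us what we want.  A
		 * smaller compound folio is exactly the case the XXX
		 * comment is about: instead of bailing out here, it
		 * would need new handling in collapse_file().
		 */
		if (folio_test_pmd_mappable(folio)) {
			result = SCAN_PAGE_COMPOUND;
			break;
		}

Most of the real work is presumably in teaching collapse_file() to cope
with compound pages of intermediate orders, rather than in this test
itself.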
I actually have one patch which starts in that direction, but I haven't
followed it up yet with all the other patches to that file which will
be needed:

From a64ac45ad951557103a1040c8bcc3f229022cd26 Mon Sep 17 00:00:00 2001
From: "Matthew Wilcox (Oracle)"
Date: Fri, 7 May 2021 23:40:19 -0400
Subject: [PATCH] mm/khugepaged: Allocate folios

khugepaged only wants to deal in terms of folios, so switch to using
the folio allocation functions.  This eliminates the calls to
prep_transhuge_page() and saves dozens of bytes of text.

Signed-off-by: Matthew Wilcox (Oracle)
---
 mm/khugepaged.c | 32 ++++++++++++--------------------
 1 file changed, 12 insertions(+), 20 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 637bfecd6bf5..ec60ee4e14c9 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -854,18 +854,20 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
 static struct page *
 khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
 {
+	struct folio *folio;
+
 	VM_BUG_ON_PAGE(*hpage, *hpage);
 
-	*hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
-	if (unlikely(!*hpage)) {
+	folio = __folio_alloc_node(gfp, HPAGE_PMD_ORDER, node);
+	if (unlikely(!folio)) {
 		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
 		*hpage = ERR_PTR(-ENOMEM);
 		return NULL;
 	}
 
-	prep_transhuge_page(*hpage);
 	count_vm_event(THP_COLLAPSE_ALLOC);
-	return *hpage;
+	*hpage = &folio->page;
+	return &folio->page;
 }
 #else
 static int khugepaged_find_target_node(void)
@@ -873,24 +875,14 @@ static int khugepaged_find_target_node(void)
 	return 0;
 }
 
-static inline struct page *alloc_khugepaged_hugepage(void)
-{
-	struct page *page;
-
-	page = alloc_pages(alloc_hugepage_khugepaged_gfpmask(),
-			   HPAGE_PMD_ORDER);
-	if (page)
-		prep_transhuge_page(page);
-	return page;
-}
-
 static struct page *khugepaged_alloc_hugepage(bool *wait)
 {
-	struct page *hpage;
+	struct folio *folio;
 
 	do {
-		hpage = alloc_khugepaged_hugepage();
-		if (!hpage) {
+		folio = folio_alloc(alloc_hugepage_khugepaged_gfpmask(),
+				HPAGE_PMD_ORDER);
+		if (!folio) {
 			count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
 			if (!*wait)
 				return NULL;
@@ -899,9 +891,9 @@ static struct page *khugepaged_alloc_hugepage(bool *wait)
 			khugepaged_alloc_sleep();
 		} else
 			count_vm_event(THP_COLLAPSE_ALLOC);
-	} while (unlikely(!hpage) && likely(khugepaged_enabled()));
+	} while (unlikely(!folio) && likely(khugepaged_enabled()));
 
-	return hpage;
+	return &folio->page;
 }
 
 static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
-- 
2.34.1
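One more note on the patch above: both functions still hand back
&folio->page only because the callers (collapse_huge_page(),
collapse_file(), and khugepaged_do_scan()'s prealloc path) still pass
a struct page around.  Once those are converted, the allocation helper
could shrink to something like the following (a hand-waving sketch,
not the actual follow-up; khugepaged_alloc_folio() is a made-up name):

	static struct folio *khugepaged_alloc_folio(gfp_t gfp, int node)
	{
		struct folio *folio;

		/* Allocate a PMD-order folio directly; no prep_transhuge_page() needed. */
		folio = __folio_alloc_node(gfp, HPAGE_PMD_ORDER, node);
		if (!folio) {
			count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
			return NULL;
		}
		count_vm_event(THP_COLLAPSE_ALLOC);
		return folio;
	}

with the ERR_PTR / *hpage bookkeeping moving out to the callers.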