From: "Zach O'Keefe"
Date: Wed, 25 May 2022 18:23:52 -0700
Subject: Re: mm/khugepaged: collapse file/shmem compound pages
To: Matthew Wilcox
Cc: David Rientjes, linux-mm@kvack.org

On Wed, May 25, 2022 at 12:07 PM Matthew Wilcox wrote:
>
> On Tue, May 24, 2022 at 03:42:55PM -0700, Zach O'Keefe wrote:
> > Hey Matthew,
> >
> > I'm leading an attempt to add a new madvise mode, MADV_COLLAPSE, to
> > allow userspace-directed collapse of memory into THPs[1]. The
> > initial proposal only supports anonymous memory, but I'm working on
> > adding support for file-backed and shmem memory.
> >
> > The intended behavior of MADV_COLLAPSE is that it should return
> > "success" if all hugepage-aligned / sized regions requested are
> > backed by pmd-mapped THPs on return (races aside). IOW: either we
> > were able to successfully collapse the memory, or it was already
> > backed by pmd-mapped THPs.
> >
> > Currently there is a nice "XXX: khugepaged should compact smaller
> > compound pages into a PMD sized page" comment in
> > khugepaged_scan_file() for when we encounter a compound page during
> > scanning. Do you know what kind of gotchas or technical
> > difficulties would be involved in doing this? I presume this work
> > would also benefit those relying on khugepaged to collapse
> > read-only file and shmem memory, and I'd be happy to help move it
> > forward.

Hey Matthew,

Thanks for taking the time!

> Hi Zach,
>
> Thanks for your interest, and I'd love some help on this.
>
> The khugepaged code (like much of the mm code used to) assumes that
> memory comes in two sizes, PTE and PMD. That's still true for anon
> and shmem for now, but hopefully we'll start managing both anon &
> shmem memory in larger chunks, without necessarily going as far as
> PMD.
>
> I think the purpose of khugepaged should continue to be to construct
> PMD-size pages; I don't see the point of it wandering through process
> VMs replacing order-2 pages with order-5 pages. I may be wrong about
> that, of course, so feel free to argue with me.

I'd agree here.

> Anyway, the meaning behind that comment is that the
> PageTransCompound() test is going to be true on any compound page
> (TransCompound doesn't check that the page is necessarily a THP). So
> that particular test should be folio_test_pmd_mappable(), but there
> are probably other things which ought to be changed, including
> converting the entire file from dealing in pages to dealing in
> folios.

Right. At this point the page might be a pmd-mapped THP, or it could
be a pte-mapped compound page (I'm unsure whether we can encounter
compound pages here other than hugepages). If we could tell it's
already pmd-mapped, we're done :) IIUC, folio_test_pmd_mappable() is a
necessary but not sufficient condition for determining this (I've
tried to make this concrete in the sketch below).

Otherwise, if the folio isn't pmd-mappable, is it safe to continue?
Suppose we find a folio with 0 < order < HPAGE_PMD_ORDER: are we able
to safely try to extend it to a PMD-sized page, or would we break
filesystems that expect a folio of a particular order?

> I actually have one patch which starts in that direction, but I
> haven't followed it up yet with all the other patches to that file
> which will be needed:

Thanks for the head start! I'm not an expert here, but would you say
converting this file to use folios is a necessary first step?
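To try and make that concrete, here's roughly what I picture the
PageTransCompound() check in khugepaged_scan_file() turning into after
a folio conversion. Completely untested sketch, just reusing the
existing SCAN_PAGE_COMPOUND result code, to spell out the two cases
I'm asking about:

	struct folio *folio = page_folio(page);

	if (folio_test_pmd_mappable(folio)) {
		/*
		 * Already a PMD-sized folio, so khugepaged has nothing
		 * to do. For MADV_COLLAPSE, though, we'd still need to
		 * confirm it is actually pmd-mapped (or install the pmd
		 * ourselves) before reporting success.
		 */
		result = SCAN_PAGE_COMPOUND;
		break;
	}
	if (folio_test_large(folio)) {
		/*
		 * 0 < order < HPAGE_PMD_ORDER: the open question above.
		 * Either bail out as we do today, or try to collapse
		 * the range into a new PMD-sized folio, if filesystems
		 * can tolerate having their folios replaced.
		 */
		result = SCAN_PAGE_COMPOUND;
		break;
	}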
Again, thanks for your time,
Zach

> From a64ac45ad951557103a1040c8bcc3f229022cd26 Mon Sep 17 00:00:00 2001
> From: "Matthew Wilcox (Oracle)"
> Date: Fri, 7 May 2021 23:40:19 -0400
> Subject: [PATCH] mm/khugepaged: Allocate folios
>
> khugepaged only wants to deal in terms of folios, so switch to using
> the folio allocation functions. This eliminates the calls to
> prep_transhuge_page() and saves dozens of bytes of text.
>
> Signed-off-by: Matthew Wilcox (Oracle)
> ---
>  mm/khugepaged.c | 32 ++++++++++++--------------------
>  1 file changed, 12 insertions(+), 20 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 637bfecd6bf5..ec60ee4e14c9 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -854,18 +854,20 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
>  static struct page *
>  khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
>  {
> +	struct folio *folio;
> +
>  	VM_BUG_ON_PAGE(*hpage, *hpage);
>
> -	*hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
> -	if (unlikely(!*hpage)) {
> +	folio = __folio_alloc_node(gfp, HPAGE_PMD_ORDER, node);
> +	if (unlikely(!folio)) {
>  		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
>  		*hpage = ERR_PTR(-ENOMEM);
>  		return NULL;
>  	}
>
> -	prep_transhuge_page(*hpage);
>  	count_vm_event(THP_COLLAPSE_ALLOC);
> -	return *hpage;
> +	*hpage = &folio->page;
> +	return &folio->page;
>  }
>  #else
>  static int khugepaged_find_target_node(void)
> @@ -873,24 +875,14 @@ static int khugepaged_find_target_node(void)
>  	return 0;
>  }
>
> -static inline struct page *alloc_khugepaged_hugepage(void)
> -{
> -	struct page *page;
> -
> -	page = alloc_pages(alloc_hugepage_khugepaged_gfpmask(),
> -			   HPAGE_PMD_ORDER);
> -	if (page)
> -		prep_transhuge_page(page);
> -	return page;
> -}
> -
>  static struct page *khugepaged_alloc_hugepage(bool *wait)
>  {
> -	struct page *hpage;
> +	struct folio *folio;
>
>  	do {
> -		hpage = alloc_khugepaged_hugepage();
> -		if (!hpage) {
> +		folio = folio_alloc(alloc_hugepage_khugepaged_gfpmask(),
> +				HPAGE_PMD_ORDER);
> +		if (!folio) {
>  			count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
>  			if (!*wait)
>  				return NULL;
> @@ -899,9 +891,9 @@ static struct page *khugepaged_alloc_hugepage(bool *wait)
>  			khugepaged_alloc_sleep();
>  		} else
>  			count_vm_event(THP_COLLAPSE_ALLOC);
> -	} while (unlikely(!hpage) && likely(khugepaged_enabled()));
> +	} while (unlikely(!folio) && likely(khugepaged_enabled()));
>
> -	return hpage;
> +	return &folio->page;
>  }
>
>  static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
> --
> 2.34.1
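P.S. For anyone skimming the thread, the userspace side we're aiming
for with MADV_COLLAPSE looks roughly like the below. MADV_COLLAPSE
isn't in the uapi headers yet, so the constant here is taken from the
proposed series; treat this as a sketch of the intended semantics
rather than a final ABI:

	#include <stdlib.h>
	#include <string.h>
	#include <sys/mman.h>

	#ifndef MADV_COLLAPSE
	#define MADV_COLLAPSE 25	/* proposed value; check the series */
	#endif

	int main(void)
	{
		size_t len = 2UL << 20;	/* one PMD-sized region (2MiB on x86-64) */
		void *buf;

		/* Hugepage-aligned anonymous mapping, faulted in as small pages. */
		if (posix_memalign(&buf, len, len))
			return 1;
		memset(buf, 1, len);

		/*
		 * Ask the kernel to synchronously collapse the range;
		 * returns 0 iff every hugepage-aligned/sized region in
		 * [buf, buf + len) is backed by a pmd-mapped THP on return.
		 */
		return madvise(buf, len, MADV_COLLAPSE);
	}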