Date: Thu, 26 May 2022 04:36:04 +0100
From: Matthew Wilcox
To: Zach O'Keefe
Cc: David Rientjes, "linux-mm@kvack.org"
Subject: Re: mm/khugepaged: collapse file/shmem compound pages

On Wed, May 25, 2022 at 06:23:52PM -0700, Zach O'Keefe wrote:
> On Wed, May 25, 2022 at 12:07 PM Matthew Wilcox wrote:
> > The khugepaged code (like much of the mm used to) assumes that memory
> > comes in two sizes, PTE and PMD. That's still true for anon and shmem
> > for now, but hopefully we'll start managing both anon & shmem memory in
> > larger chunks, without necessarily going as far as PMD.
> >
> > I think the purpose of khugepaged should continue to be to construct
> > PMD-size pages; I don't see the point of it wandering through process VMs
> > replacing order-2 pages with order-5 pages. I may be wrong about that,
> > of course, so feel free to argue with me.
>
> I'd agree here.
>
> > Anyway, that meaning behind that comment is that the PageTransCompound()
> > test is going to be true on any compound page (TransCompound doesn't
> > check that the page is necessarily a THP).
> > So that particular test should be folio_test_pmd_mappable(), but
> > there are probably other things which ought to be changed, including
> > converting the entire file from dealing in pages to dealing in folios.
>
> Right, at this point, the page might be a pmd-mapped THP, or it could
> be a pte-mapped compound page (I'm unsure if we can encounter compound
> pages outside hugepages).

Today, there is a way. We can find a folio with an order between 0 and
PMD_ORDER if the underlying filesystem supports large folios and the
file is executable and we've enabled CONFIG_READ_ONLY_THP_FOR_FS.
In this case, we'll simply skip over it because the code believes that
means it's already a PMD.

> If we could tell it's already pmd-mapped, we're done :) IIUC,
> folio_test_pmd_mappable() is a necessary but not sufficient condition
> to determine this.

It is necessary, but from khugepaged's point of view, it's sufficient
because khugepaged's job is to create PMD-sized folios -- it's not up to
khugepaged to ensure that PMD-sized folios are actually mapped using
a PMD. There may be some other component of the system (eg DAMON?)
which has chosen to temporarily map the PMD-sized folio using PTEs
in order to track whether the memory is all being used. It may also
be the case that (for file-based memory), the VMA is mis-aligned and
despite creating a PMD-sized folio, it can't be mapped with a PMD.

> Else, if it's not, is it safe to try and continue? Suppose we find a
> folio of 0 < order < HPAGE_PMD_ORDER. Are we safely able to try and
> extend it, or will we break some filesystems that expect a certain
> order folio?

We're not giving filesystems the opportunity to request that ;-)
Filesystems are expected to handle folios of arbitrary order (if they
claim the ability to support large folios at all).
In practice, I've capped the folio creation size at PMD_ORDER (because
I don't want to track down all the places that assume pmd_page() is
necessarily a head page), but filesystems shouldn't rely on it.

shmem still expects folios to be of order either 0 or PMD_ORDER.
That assumption extends into the swap code and I haven't had the heart
to go and fix all those places yet. Plus Neil was doing major surgery
to the swap code in the most recent development cycle and I didn't want
to get in his way.

So I am absolutely fine with khugepaged allocating a PMD-size folio for
any inode that claims mapping_large_folio_support(). If any filesystems
break, we'll fix them.

> > I actually have one patch which starts in that direction, but I haven't
> > followed it up yet with all the other patches to that file which will
> > be needed:
>
> Thanks for the head start! Not an expert here, but would you say
> converting this file to use folios is a necessary first step?

Not _necessary_, but I find it helps keep things clearer. Plus it's
something that needs to happen anyway.
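
For illustration only, the difference between the two tests discussed
above can be modelled in userspace. This is a sketch, not kernel code:
struct model_folio, model_trans_compound(), model_pmd_mappable() and
MODEL_PMD_ORDER are hypothetical names, the value 9 assumes x86-64 with
4KiB base pages, and treating "PMD-mappable" as "order >= PMD order" is
an assumption about how such a test would be defined.

```c
#include <stdbool.h>

/* Assumption: x86-64 with 4KiB base pages, so a PMD covers an order-9 folio. */
#define MODEL_PMD_ORDER 9

/* Hypothetical stand-in for struct folio; only the order matters here. */
struct model_folio {
	unsigned int order;
};

/* Models the PageTransCompound()-style test: true for ANY compound page. */
static bool model_trans_compound(const struct model_folio *folio)
{
	return folio->order > 0;
}

/* Models a PMD-mappable test: true only once the folio is PMD-sized. */
static bool model_pmd_mappable(const struct model_folio *folio)
{
	return folio->order >= MODEL_PMD_ORDER;
}

/*
 * Would a collapse pass still have work to do on this folio?
 * With the compound-page test, an order-2 folio is wrongly treated as done;
 * with the PMD-mappable test, only PMD-sized folios are skipped.
 */
static bool model_needs_collapse(const struct model_folio *folio)
{
	return !model_pmd_mappable(folio);
}
```

With these definitions an order-2 folio passes the first test but not
the second, which is the case where the existing check skips a folio
that is not yet PMD-sized.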