From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 67F2EC433F5 for ; Sat, 28 May 2022 03:48:49 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 9EFF58D0003; Fri, 27 May 2022 23:48:48 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 99F028D0001; Fri, 27 May 2022 23:48:48 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 88BE18D0003; Fri, 27 May 2022 23:48:48 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 765DF8D0001 for ; Fri, 27 May 2022 23:48:48 -0400 (EDT) Received: from smtpin23.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 430A020BD0 for ; Sat, 28 May 2022 03:48:48 +0000 (UTC) X-FDA: 79513770336.23.93E29D5 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) by imf22.hostedemail.com (Postfix) with ESMTP id E4883C0040 for ; Sat, 28 May 2022 03:48:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=7aHrE4aJBJp2oNP9YAJPCLNXOvIxCkEvrOtLHsW/B08=; b=owz7B6HdtvGC1+8A3iLKKkGTbh /OTjuic2op0wlvBJ/lnpaZjP6DVSBMsnw9AG4WtU+8KrVqnKy+6nDk3k3YDWdA8dGsgt8zGcpWwwe LqOpPbpsmvPXLSHGUDRxm51LCrGrSDqC4HDOAtBH0qMd+Dlwiy0GGnbTKm3l5VnY/bKGDBE8Aq5uI yLthX1iN11y8cwtiFHdH454+KRfagZusbhzy5zL8+ca5hnMqx2YJ+2myXVaaXII1kQDtiAQ3LyNVN gCpFsuLPaFfCH/RgE93uWITk4oq5v7zt2sDP6o2Lb/RcYjPvFq6lfEBwLrqAKsf3r31jQkrsCLw3T 7Fev9lpg==; Received: from willy by casper.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1nunRV-002azJ-Ra; Sat, 28 May 2022 03:48:45 +0000 Date: Sat, 28 May 2022 04:48:45 +0100 From: Matthew Wilcox To: Zach O'Keefe Cc: David Rientjes , "linux-mm@kvack.org" Subject: Re: mm/khugepaged: collapse file/shmem compound pages Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Rspam-User: X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: E4883C0040 X-Stat-Signature: ffmkptyntg54n84qxft78osorhtty7q4 Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b=owz7B6Hd; dmarc=none; spf=none (imf22.hostedemail.com: domain of willy@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=willy@infradead.org X-HE-Tag: 1653709724-854352 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, May 27, 2022 at 09:27:33AM -0700, Zach O'Keefe wrote: > On Thu, May 26, 2022 at 8:47 PM Matthew Wilcox wrote: > > Because PageTransCompound() does not do what it says on the tin. > > > > static inline int PageTransCompound(struct page *page) > > { > > return PageCompound(page); > > } > > > > So any compound page is treated as if it's a PMD-sized page. > > Right - therein lies the problem :) I think I misattributed your > comment "we'll simply skip over it because the code believes that > means it's already a PMD" as a solution, not as the current state of > things. What we need to be able to do is: > > 1) If folio order == 0: do what we've been doing > 2) If folio order == HPAGE_PMD_ORDER: check if it's _actually_ > pmd-mapped. If it is, we're done. If not, continue to step (3) I would not do that part. Just leave it alone and assume everything's good. > 3) Else (folio order > 0 and not pmd-mapped): new magic; hopefully > it's ~ same as step (1) Yes, exactly this. > > > I thought the point / benefit of khugepaged was precisely to try and > > > find places where we can collapse many pte entries into a single pmd > > > mapping? > > > > Ideally, yes. But if a file is mapped at an address which isn't > > PMD-aligned, it can't. Maybe it should just decline to operate in that > > case. > > To make sure I'm not missing anything here: It's not actually > important that the file is mapped at a pmd-aligned address. All that > is important is that the region of memory being collapsed is > pmd-aligned. If we wanted to collapse memory mapped to the start of > the file, then sure, the file has to be mapped suitably. Ah, what you're probably missing is that for file pages/folios, they have to be naturally aligned. The data structure underlying the page cache simply can't cope with askew pages. (It kind of can under some circumstances that are so complicated that they shouldn't be explained, and it's far easier just to say "folios must be naturally aligned within the file") > > > > shmem still expects folios to be of order either 0 or PMD_ORDER. > > > > That assumption extends into the swap code and I haven't had the heart > > > > to go and fix all those places yet. Plus Neil was doing major surgery > > > > to the swap code in the most recent deveopment cycle and I didn't want > > > > to get in his way. > > > > > > > > So I am absolutely fine with khugepaged allocating a PMD-size folio for > > > > any inode that claims mapping_large_folio_support(). If any filesystems > > > > break, we'll fix them. > > > > > > Just for clarification, what is the equivalent code today that > > > enforces mapping_large_folio_support()? I.e. today, khugepaged can > > > successfully collapse file without checking if the inode supports it > > > (we only check that it's a regular file not opened for writing). > > > > Yeah, that's a dodgy hack which needs to go away. But we need a lot > > more filesystems converted to supporting large folios before we can > > delete it. Not your responsibility; I'm doing my best to encourage > > fs maintainers to do this part. > > Got it. In the meantime, do we want to check the old conditions + > mapping_large_folio_support()? Yes, that should work. khugepaged should be free to create large folios if the underlying filesystem supports them OR (executable, read-only, CONFIG_THP_RO, etc, etc). > > > Also, just to check, there isn't anything wrong with following > > > collapse_file()'s approach, even for folios of 0 < order < > > > HPAGE_PMD_ORDER? I.e this part: > > > > > > * Basic scheme is simple, details are more complex: > > > * - allocate and lock a new huge page; > > > * - scan page cache replacing old pages with the new one > > > * + swap/gup in pages if necessary; > > > * + fill in gaps; > > > * + keep old pages around in case rollback is required; > > > * - if replacing succeeds: > > > * + copy data over; > > > * + free old pages; > > > * + unlock huge page; > > > * - if replacing failed; > > > * + put all pages back and unfreeze them; > > > * + restore gaps in the page cache; > > > * + unlock and free huge page; > > > */ > > > > Correct. At least, as far as I know! Working on folios has been quite > > the education for me ... > > Great! Well, perhaps I'll run into a snafu here or there (and > hopefully learn something myself) but this gives me enough confidence > to naively give it a try and see what happens! > > Again, thank you very much for your time, help and advice with this, You're welcome! Thanks for putting in some work on this project!