Date: Tue, 24 Jun 2025 21:29:31 +0100
From: Matthew Wilcox <willy@infradead.org>
To: David Hildenbrand
Cc: linux-mm@kvack.org
Subject: Re: How should we RCU-free folios?
On Tue, Jun 24, 2025 at 02:13:47PM +0200, David Hildenbrand wrote:
> On 29.05.25 17:02, Matthew Wilcox wrote:
> > When folios are allocated separately from the underlying pages they
> > represent, they must also be freed.  See
> > https://kernelnewbies.org/MatthewWilcox/FolioAlloc
> > 
> > Since we want to do lockless lookups of folios in the page cache and
> > GUP,
> 
> And in PFN walkers as well.

Good point.  (For those not quite clear what David means here: think of
migration, where we're doing a physical walk and trying to decide what
to do with the memory.  That's just an example; hwpoison detection has
similar problems.)

> > 1. Free the folio back to the slab immediately, and mark the slab as
> > TYPESAFE_BY_RCU.  That means that the folio may get reallocated at
> > any time, but it must always remain a folio (until an RCU grace period
> > has passed and then the entire slab may be reallocated to a different
> > purpose).  Lookups will do:
> > 
> > a. Get a pointer to the folio
> > b. Tryget a refcount on the folio
> > c. If it succeeds, re-check the folio is still the one we want
> >    (if pagecache, check the xarray still points to the folio; if GUP,
> >    check the page still points to the folio)
> 
> Hm, that means that all PFN walkers would now also have to do a tryget
> unconditionally.

To a certain extent.  At least for migration, there's a first pass where
we can just look at the value contained in the memdesc to decide whether
this block is migratable; then in the second pass we get the refcount
and start doing migration-things to each page.
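For the page cache, the lookup steps in option 1 above are roughly what
we already do today (cf. filemap_get_entry() in mm/filemap.c).  From
memory, so treat this as an untested sketch, and lookup_folio() is just
an illustrative name:

#include <linux/pagemap.h>
#include <linux/rcupdate.h>
#include <linux/xarray.h>

static struct folio *lookup_folio(struct address_space *mapping, pgoff_t index)
{
	XA_STATE(xas, &mapping->i_pages, index);
	struct folio *folio;

	rcu_read_lock();
repeat:
	xas_reset(&xas);
	folio = xas_load(&xas);			/* a. get a pointer to the folio */
	if (xas_retry(&xas, folio))
		goto repeat;
	if (!folio || xa_is_value(folio)) {	/* empty slot or shadow/swap entry */
		folio = NULL;
		goto out;
	}
	if (!folio_try_get(folio))		/* b. tryget the refcount */
		goto repeat;
	/* c. the folio may have been freed and reused under us; recheck */
	if (unlikely(folio != xas_reload(&xas))) {
		folio_put(folio);
		goto repeat;
	}
out:
	rcu_read_unlock();
	return folio;
}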
> Also, free hugetlb folios have a refcount of 0 right now ...

Right ...

I think handling of hugetlb folios will probably change a bit.  A free
hugetlb folio probably doesn't free the folio, but might set a flag
indicating that it's free.  It'd be up to the PFN walker to, say, grab
the hugetlb_lock, which would make sure this hugetlb folio wasn't
allocated while it's messing with it.

> > 2. RCU-free the folio.  The folio will not be reallocated until the
> > reader drops the RCU read lock.  The read side still needs to tryget
> > the folio refcount.  However, if it succeeds, it does not need to
> > re-check the pointer to the folio as the folio cannot have been
> > freed.  The downside is that folios will hang around in the system for
> > longer before being reallocated, and this may be an unacceptable
> > increase in memory usage.
> > 
> > 3. RCU-free the folio and RCU-free the memory it controls.  Now an
> > RCU-protected lookup doesn't need to bump the refcount; if it found the
> > pointer, it knows the memory cannot be freed.  I think this is a
> > step too far and would
> 
> That sounds nice, though :)
> 
> > I'm favouring option 1; it's what we currently do.  But I wanted to
> > give people a chance to chime in and tell me my tradeoffs are wrong.
> > Or propose a fourth option.
> 
> I really dislike the refcount dependency.
> 
> Also ... what about memdescs without a refcount (e.g., PFN walkers and
> slab?)?

Depending on the PFN walker, it needs to know how to handle each kind
of memdesc.  Migration might choose to skip slabs and so "handle" them
by moving on to the next block.  hwpoison doesn't need to handle them
either (the system is dead if we see poison in a slab).

I'm not sure how a PFN walker can protect against slab "doing something"
with the struct slab.  Maybe something like slab_lock() will be needed
(yes, I know slab mostly bypasses slab_lock).  But it is going to be a
per-memdesc kind of problem to solve.

Two things I did want to raise, though:

First, this is an improvement.  There's altogether too much code that
thinks "if I raise the refcount on the page, that will prevent the
memory from being freed".  It will certainly prevent the page from
being returned to the page allocator, but it won't prevent the slab
allocator from reusing the memory.  Other allocators (e.g. dma_pool)?
No idea.

Second, struct slab doesn't need to be RCU-freed (unless we discover
PFN walkers are going to force us to).  The slab allocator knows it is
the only user, and when it's done, it can just free it; there's no
chance anybody else is looking at it.  Unless PFN walkers look at it,
which they can't today because struct slab is in mm/slab.h.
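P.S. To make the difference between options 1 and 2 concrete for a
hypothetical separately-allocated struct folio: purely illustrative,
none of these helpers exist today, and option 2 assumes struct folio
grows an rcu_head (it doesn't have one):

#include <linux/rcupdate.h>
#include <linux/slab.h>

static struct kmem_cache *folio_cachep;

/* Option 1: the object may be reused for another folio immediately;
 * readers must tryget and then re-check their pointer (steps a-c above).
 */
static void __init folio_cache_init_option1(void)
{
	folio_cachep = kmem_cache_create("folio", sizeof(struct folio), 0,
					 SLAB_TYPESAFE_BY_RCU, NULL);
}

static void free_folio_option1(struct folio *folio)
{
	kmem_cache_free(folio_cachep, folio);
}

/* Option 2: delay reuse until a grace period has elapsed; readers still
 * tryget, but a successful tryget means the pointer is still the folio
 * they looked up.  The cache would be created without TYPESAFE_BY_RCU,
 * and struct folio would need an rcu_head member.
 */
static void folio_free_rcu(struct rcu_head *head)
{
	kmem_cache_free(folio_cachep,
			container_of(head, struct folio, rcu_head));
}

static void free_folio_option2(struct folio *folio)
{
	call_rcu(&folio->rcu_head, folio_free_rcu);
}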