From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <94d6c446-a8a6-485e-bb3c-ee809ebb1d3b@kernel.org>
Date: Mon, 15 Jun 2026 17:27:57 +0200
X-Mailing-List: linux-trace-kernel@vger.kernel.org
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM)
To: "David Hildenbrand (Arm)", Gregory Price
Cc: Balbir Singh, lsf-pc@lists.linux-foundation.org,
 linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org,
 cgroups@vger.kernel.org, linux-mm@kvack.org,
 linux-trace-kernel@vger.kernel.org, damon@lists.linux.dev,
 kernel-team@meta.com, gregkh@linuxfoundation.org, rafael@kernel.org,
 dakr@kernel.org, dave@stgolabs.net, jonathan.cameron@huawei.com,
 dave.jiang@intel.com, alison.schofield@intel.com, vishal.l.verma@intel.com,
 ira.weiny@intel.com, dan.j.williams@intel.com, longman@redhat.com,
 akpm@linux-foundation.org, lorenzo.stoakes@oracle.com,
 Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org, surenb@google.com,
 mhocko@suse.com, osalvador@suse.de, ziy@nvidia.com, matthew.brost@intel.com,
 joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
 ying.huang@linux.alibaba.com, apopple@nvidia.com, axelrasmussen@google.com,
 yuanchu@google.com, weixugc@google.com, yury.norov@gmail.com,
 linux@rasmusvillemoes.dk, mhiramat@kernel.org,
 mathieu.desnoyers@efficios.com, tj@kernel.org, hannes@cmpxchg.org,
 mkoutny@suse.com, jackmanb@google.com, sj@kernel.org,
 baolin.wang@linux.alibaba.com, npache@redhat.com, ryan.roberts@arm.com,
 dev.jain@arm.com, baohua@kernel.org, lance.yang@linux.dev,
 muchun.song@linux.dev, xu.xin16@zte.com.cn, chengming.zhou@linux.dev,
 jannh@google.com, linmiaohe@huawei.com, nao.horiguchi@gmail.com,
 pfalcato@suse.de, rientjes@google.com, shakeel.butt@linux.dev,
 riel@surriel.com, harry.yoo@oracle.com, cl@gentwo.org,
 roman.gushchin@linux.dev, chrisl@kernel.org, kasong@tencent.com,
 shikemeng@huaweicloud.com, nphamcs@gmail.com, bhe@redhat.com,
 zhengqi.arch@bytedance.com, terry.bowman@amd.com, Matthew Wilcox
References: <9f1815b0-896b-44ab-9e6d-9316d8f11033@kernel.org>
From: "Vlastimil Babka (SUSE)"

On 6/15/26 17:18, David Hildenbrand (Arm) wrote:
> On 6/15/26 16:38, Vlastimil Babka (SUSE) wrote:
>> On 6/12/26 17:29, Gregory Price wrote:
>>> On Wed, Jun 10, 2026 at 04:12:52PM -0400, Gregory Price wrote:
>>>> ... snip ...
>>>>
>>>> I will still probably send the next RFC version tomorrow or friday,
>>>> as I want to get some eyes on the __GFP_PRIVATE-less pattern.
>>>>
>>>> Also, I made a new `anondax` driver which enables userland testing
>>>> of this functionality without any specialty hardware.
>>>>
>>>
>>> (apologies for the length of this email: this will all be covered in
>>> the coming cover letter, but I just wanted to share a bit of a preview)
>>>
>>> ===
>>>
>>> Just another small update - I am planning to post the RFC today once i
>>> get some mild cleanup done. It will be based on the dax atomic hotplug
>>>
>>> https://lore.kernel.org/linux-mm/20260605211911.2160954-1-gourry@gourry.net/
>>>
>>> But a couple specific details regarding the memalloc pieces that i've
>>> learned the past couple of days playing with it.
>>>
>>> 1) memalloc_folio is required to ensure non-folio allocations don't land
>>>    on the private node, even if it happens within a memalloc_private
>>>    context. Since memalloc_folio may be useful in contexts outside of
>>>    private nodes, I kept this as a separate flag.
>>>
>>>    If we think there will *never* be additional users of memalloc_folio,
>>>    then we could fold _folio into _private to save the flag for now and
>>>    add it back when we actually need it.
>>>
>>> 2) memalloc_private is needed to unlock private nodes, but in the
>>>    original NOFALLBACK-only design, you also needed __GFP_THISNODE.
>>>
>>>    This is *highly* restrictive. I found when playing with mbind that
>>>    MPOL_BIND + __GFP_THISNODE generates a WARN (valid WARN, it normally
>>>    implies a bug).
>>>
>>> That leads me to #3
>>
>> I think the memalloc approach is dangerous due to unexpected nesting. There
>> might be nested page allocations in page allocation itself (due to some
>> debugging option). But also interrupts do not change what "current" points
>> to. Suddenly those could start requesting folios and/or private nodes and be
>> surprised, I'm afraid.
>
> Yeah, we'd need some way to distinguish the main allocation from these other
> (nested) allocations.

That goes against the very principle of scopes. And I don't see how, except
via a ...
flag to the main allocation :D

>>
>> The memalloc scopes only work well when they restrict the context wrt
>> reclaim, and allocations in IRQ have to be already restricted heavily
>> (atomic) so further memalloc restrictions don't do anything in practice. But
>> to make them change other aspects of the allocations like this won't work.
>
> I was assuming that memalloc_pin_save() would already violate that, but really
> it only restricts where movable allocations land, and that doesn't matter for
> other kernel allocations.

Hm yeah, it's suboptimal, as it can turn a movable allocation unmovable. But it
shouldn't cause outright bugs.

> Do you see any other way to make something like an allocation context work, and
> avoid introducing more GFP flags?

Yeah, the idea of augmenting gfp flags with alloc_flags that are no longer
strictly internal to the page allocator seems like a way to achieve what we
need.
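
To make the nesting concern concrete, here is a minimal userspace sketch, not
kernel code. Every name in it (PF_MEMALLOC_PRIVATE, ALLOC_PRIVATE,
current_task, the save/restore helpers and the fake IRQ) is a hypothetical
stand-in made up for illustration and is not taken from the RFC. It only
models the fact that a scope lives in per-task state, which an interrupt
inherits, while an explicit flag travels with a single call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; not real kernel APIs. */
#define PF_MEMALLOC_PRIVATE 0x1u /* scope bit stored in per-task state */
#define ALLOC_PRIVATE       0x1u /* explicit flag passed with one call */

struct task { unsigned int flags; };

static struct task init_task;
/* Models "current": an interrupt does not switch it to another task. */
static struct task *current_task = &init_task;

/* Open the scope and return the previous bit so that scopes can nest. */
static unsigned int memalloc_private_save(void)
{
    unsigned int old = current_task->flags & PF_MEMALLOC_PRIVATE;

    current_task->flags |= PF_MEMALLOC_PRIVATE;
    return old;
}

/* Close the scope by putting back the bit that was saved on entry. */
static void memalloc_private_restore(unsigned int old)
{
    current_task->flags = (current_task->flags & ~PF_MEMALLOC_PRIVATE) | old;
}

/* Scope based: the answer depends on ambient state of the current task. */
static bool scoped_alloc_may_use_private(void)
{
    return (current_task->flags & PF_MEMALLOC_PRIVATE) != 0;
}

/* Flag based: the answer depends only on what this caller passed in. */
static bool flagged_alloc_may_use_private(unsigned int alloc_flags)
{
    return (alloc_flags & ALLOC_PRIVATE) != 0;
}

struct irq_view { bool scoped; bool flagged; };

/* Simulated interrupt handler that allocates without asking for anything. */
static struct irq_view fake_irq(void)
{
    struct irq_view v = {
        .scoped  = scoped_alloc_may_use_private(),
        .flagged = flagged_alloc_may_use_private(0),
    };
    return v;
}

/* The "interrupt" arrives while the task sits inside the private scope. */
static struct irq_view irq_inside_scope(void)
{
    unsigned int old = memalloc_private_save();
    struct irq_view v = fake_irq();

    memalloc_private_restore(old);
    /* Once the scope is closed the ambient bit is gone again. */
    assert(!(current_task->flags & PF_MEMALLOC_PRIVATE));
    return v;
}
```

In this model irq_inside_scope() reports scoped == true and flagged == false:
the handler never asked for a private node, yet the scoped check lets it
through, while the explicitly flagged check does not.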