Date: Wed, 1 Jul 2026 11:49:16 -0400
From: Johannes Weiner
To: Gregory Price
Cc: "Huang, Ying", Andrew Morton, David Hildenbrand, Zi Yan,
    Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
    Alistair Popple, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Neha Gholkar
Subject: Re: [PATCH] mm: mempolicy: fix automatic numa balancing for shmem
References: <20260629163337.1264881-1-hannes@cmpxchg.org> <87h5mkz33h.fsf@DESKTOP-5N7EMDA> <877bnfynsr.fsf@DESKTOP-5N7EMDA>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jul 01, 2026 at 11:33:38AM -0400, Gregory Price wrote:
> Something i found while seeing if i could make ZONE_NORMAL nodes more
> reliably hotpluggable:
>
> diff --git a/lib/stackdepot.c b/lib/stackdepot.c
> index dd2717ff94bf..9ceeb56574ef 100644
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -682,7 +682,15 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
>  	 * we won't be able to do that under the lock.
>  	 */
>  	if (unlikely(can_alloc && !READ_ONCE(new_pool))) {
> -		page = alloc_pages(gfp_nested_mask(alloc_flags),
> +		/*
> +		 * The stack depot pool is a global, never-freed allocation.
> +		 * Use alloc_pages_node() on the CPU-local node instead of
> +		 * alloc_pages() so the pool does not inherit a transient task's
> +		 * NUMA mempolicy (e.g. MPOL_BIND to a CPU-less/bound node), which
> +		 * would strand this long-lived page on that node forever.
> +		 */
> +		page = alloc_pages_node(numa_node_id(),
> +					gfp_nested_mask(alloc_flags),
>  				   DEPOT_POOL_ORDER);
>  		if (page)
>  			prealloc = page_address(page);
>
> This is a global, permanently allocated, resource that inherits a task
> mempolicy's placement because that task *happened* to be the first one
> to touch it.
>
> There are many alloc_pages() calls (155 instances kernel-wide) that
> inherit a task mempolicy when that's probably not what we want.
>
> alloc_pages() is called in: net/, lib/, kexec_core/, drivers/, arch/
>
> you can imagine a task setting `set_mempolicy(INTERLEAVE, ALL)` and the
> result is a bunch of random driver memory gets spread all over the place
> along with the task's heap. Is that really what the caller wanted, or
> did they just want userland data spread about?
>
> But at this point it's a 20 year old interface, not much we can do about
> it without making *someone* sad :[
>
> I considered proposing MPOL_F_MOVABLE_ONLY to mean (roughly) "userland
> memory only" - and then slowly trying to migrate numactl to make this
> the default.

Hm. Kernel allocations that are totally incidental like the stackdepot
example above should not follow task policy. But there are kernel
allocations (kernel stack, inodes, pipes) that are directly allocated
on behalf of a process, and so probably SHOULD follow task policy.

That's an annotation problem that I think we have solved already,
because cgroups need the same distinction for what allocations to
charge to the current task's cgroup context.

We could rename __GFP_ACCOUNT / SLAB_ACCOUNT to __GFP_TASKPOLICY /
SLAB_TASKPOLICY or something, and have mempolicy follow it too.

There is still the whole "changing 20 year old behavior" aspect, but I
think the polarity works in our favor: big important allocations have
already been following the policy correctly. The behavior changes
primarily for smaller, random allocations.
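
To illustrate the polarity, here is a rough userspace toy model, not
kernel code and not a patch. Every name in it (TOY_GFP_TASKPOLICY,
struct toy_task, toy_pick_node()) is hypothetical and exists only for
this sketch; it assumes a single BIND-style policy and ignores
everything else the real allocator considers. The only point is that
the task policy is consulted when the caller marks the allocation as
made on behalf of the task, and ignored otherwise.

```c
#include <assert.h>

/* Hypothetical flag, standing in for a renamed __GFP_ACCOUNT. */
#define TOY_GFP_TASKPOLICY	0x1u

/* Toy policy modes; only a BIND-like mode is modeled here. */
enum toy_mode { TOY_MPOL_DEFAULT, TOY_MPOL_BIND };

/* Toy stand-in for the task's memory policy state. */
struct toy_task {
	enum toy_mode mode;
	int bind_node;		/* meaningful only for TOY_MPOL_BIND */
};

/*
 * Pick the node for an allocation. The task policy applies only when
 * the caller passes TOY_GFP_TASKPOLICY; incidental allocations fall
 * back to the CPU-local node.
 */
static int toy_pick_node(const struct toy_task *task, unsigned int gfp,
			 int local_node)
{
	assert(task);

	if ((gfp & TOY_GFP_TASKPOLICY) && task->mode == TOY_MPOL_BIND)
		return task->bind_node;

	return local_node;
}
```

In this model, a stackdepot-style caller would pass no flag and get the
local node regardless of the task's policy, while an allocation made on
behalf of the task (kernel stack, inode, pipe) would pass the flag and
land on the bound node.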