Date: Mon, 12 Jan 2026 19:23:36 +0000
From: Bing Jiao
To: Joshua Hahn
Cc: Donet Tom, linux-mm@kvack.org, Andrew Morton, Johannes Weiner,
    David Hildenbrand, Michal Hocko, Qi Zheng, Shakeel Butt,
    Lorenzo Stoakes, Axel Rasmussen, Yuanchu Xie, Wei Xu,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 1/2] mm/vmscan: balance demotion allocation in alloc_demote_folio()
Message-ID:
References: <20260110005229.1348817-1-joshua.hahnjy@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org
In-Reply-To: <20260110005229.1348817-1-joshua.hahnjy@gmail.com>

On Fri, Jan 09, 2026 at 04:52:28PM -0800, Joshua Hahn wrote:
> On Fri, 9 Jan 2026 23:45:57 +0000 Bing Jiao wrote:
>
> > On Thu, Jan 08, 2026 at 06:14:02PM +0530, Donet Tom wrote:
> > >
> > > On 1/7/26 12:58 PM, Bing Jiao wrote:
> > > > +	/* Randomly select a node from fallback nodes for balanced allocation */
> > > > +	if (allowed_mask) {
> > > > +		mtc->nid = node_random(allowed_mask);
> > >
> > >
> > > This random selection can cause allocations to fall back to distant memory
> > > even when the nearer demotion target has sufficient free memory, correct?
> > > Could this also lead to increased promotion latency?
> >
> > Hi Donet,
> >
> > Thanks for your questions.
> >
> > Yes, the random selection could select a distant node and lead to
> > increased promotion latency.
> >
> > I just realized that the fallback allocation should not be weighted
> > by a single metric, such as node distance, capacity, or free space.
>
> Hello Bing, I hope you are doing well!
>
> Yes -- this is also what I believe, and I think this idea of "how should we
> select demotion / allocation targets" is something that is a difficult problem
> (and one that may not have a single solution that "just works").
>
> It's also a question that I have been thinking about, and what was discussed
> in part at LSFMMBPF last year. At the time, I made some auto-tuning weights [1]
> for weighted interleave based on bandwidth capacity, since the main benefit of
> weighted interleave is to distribute memory accesses across multiple nodes
> to maximize how much bandwidth the system can use at once. A follow-up was to
> think about how these weights could change over time, and what heuristics
> should be used to determine how the weights are selected.
>
> Ultimately, we agreed that the heuristics should probably be delegated to
> userspace, since there are just so many scenarios that could change what
> metrics should take priority. (Jonathan Corbet wrote a great summary of the
> discussion in an LWN article [2])
>
> Coming back to this patchset, I think that all of the ideas above apply
> nicely here as well. What nodes should be selected for demotion and how they
> should be weighted is a difficult question, and one that is probably best
> answered by userspace and what workload they expect to use on their specific
> system.
>
> What I do believe though, is that an unweighted random selection / round-robin
> approach to selecting demotion targets might lead to some unexpected
> performance implications.
>
> > We need a thorough study before changing alloc_demote_folio().
>
> So I think this is the way to go : -)
> Although, I'm not actively exploring this at the moment ;)
>
> Please let me know what you think, I hope you have a great day!
> Joshua
>
> [1] https://lore.kernel.org/all/20250109185048.28587-1-joshua.hahnjy@gmail.com/
> [2] https://lwn.net/Articles/1016842/

Hi Joshua, hope you had a great weekend!

I appreciate you sharing that information. I really enjoyed reading these
articles and discussions.

It makes sense to assume that users understand their own requirements, but I
think the kernel still needs internal heuristics for weight adjustment. Users
often lack the comprehensive, up-to-date information needed to update their
configuration in a timely manner, unless the system has an omniscient
administrator who can oversee and (pre)allocate resources for all tasks
running on it. Therefore, I think the kernel should still play a role in
weight adjustment.

I will think more about this and explore it further, whether from userspace,
from kernel space, or with a hybrid approach.

Thank you again for sharing!

Best,
Bing