From: SeongJae Park <sj@kernel.org>
To: sashiko-bot@kernel.org
Cc: SeongJae Park <sj@kernel.org>, "Ravi Jonnalagadda", damon@lists.linux.dev
Subject: Re: [PATCH v9.2] mm/damon: add node_eligible_mem_bp goal metric
Date: Mon, 27 Apr 2026 20:57:53 -0700
Message-ID: <20260428035754.116202-1-sj@kernel.org>
In-Reply-To: <20260428032542.98E37C2BCAF@smtp.kernel.org>

On Tue, 28 Apr 2026 03:25:42 +0000 sashiko-bot@kernel.org wrote:

> Sashiko AI review found 4 potential issue(s):
> - [High] Missing `cond_resched()` inside the inner PFN-scanning loop causes
>   CPU soft lockups.
> - [High] Casting 64-bit `phys_addr_t` to 32-bit `unsigned long` causes
>   massive quota calculation errors on 32-bit PAE systems.
> - [High] Physical address wraparound of `folio_end` causes an infinite loop
>   and system hang.
> - [High] Pre-existing `VM_BUG_ON` in speculative folio access within
>   `damon_get_folio()`.

I find none of the above to be a blocker for this patch. Please read below
for more details.

> --
>
> commit 377bf0664129c87741fa640c563456b379c8a816
> Author: Ravi Jonnalagadda
>
> mm/damon: add node_eligible_mem_bp goal metric
>
> This patch enables system-wide page distribution with target-state goals
> using PA-mode DAMON schemes. It tracks the ratio of scheme-eligible memory
> on a target node to adjust migration aggressiveness. The implementation
> calculates the current memory distribution across regions that match the
> scheme's access pattern.
>
> > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > index 3dbbbfdeff71..9a4220174a61 100644
> > --- a/mm/damon/core.c
> > +++ b/mm/damon/core.c
> [ ... ]
> > @@ -2287,7 +2306,115 @@ static unsigned long damos_get_node_memcg_used_bp(
> [ ... ]
> > +			while (addr < end_addr) {
> > +				struct folio *folio;
> > +				phys_addr_t folio_start, folio_end;
> > +				phys_addr_t overlap_start, overlap_end;
> > +				phys_addr_t counted;
> > +
> > +				folio = damon_get_folio(PHYS_PFN(addr));
>
> Since this code heavily invokes damon_get_folio() in a new PFN scanner, could
> the speculative folio access inside damon_get_folio() trigger a VM_BUG_ON? If
> damon_get_folio() tests folio_test_lru() before acquiring a reference with
> folio_try_get(), the page could be concurrently split or reallocated as a tail
> page.

I'm not sure which VM_BUG_ON() Sashiko is talking about. Even if Sashiko is
correct, it argues the issue is inside damon_get_folio(), which is not
modified by this patch. So this is not a blocker of this patch.
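For reference, the usual way to make this kind of speculative folio access
safe is to take the reference first and only trust the folio's state after
re-checking the pfn-to-folio mapping. A minimal sketch of that pattern is
below; the helper name is made up for illustration, and this is not claiming
to be the current damon_get_folio() implementation.

#include <linux/mm.h>
#include <linux/memory_hotplug.h>

/* Sketch of the reference-then-recheck pattern for raw-PFN folio access */
static struct folio *sketch_get_folio(unsigned long pfn)
{
	struct page *page = pfn_to_online_page(pfn);
	struct folio *folio;

	if (!page)
		return NULL;

	folio = page_folio(page);
	/* Pin the folio before trusting any of its state. */
	if (!folio_try_get(folio))
		return NULL;
	/*
	 * The page may have been split or reallocated while we raced;
	 * verify it still belongs to the folio we pinned.
	 */
	if (unlikely(page_folio(page) != folio)) {
		folio_put(folio);
		return NULL;
	}
	return folio;
}

With the reference held, a concurrent split cannot free the folio under the
scanner, so the later folio_size() and folio_nid() reads stay safe.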
>
> > +				if (!folio) {
> > +					addr = PAGE_ALIGN_DOWN(addr +
> > +							PAGE_SIZE);
> > +					if (!addr)
> > +						break;
> > +					continue;
> > +				}
> > +
> > +				/*
> > +				 * Calculate exact overlap between the region
> > +				 * [addr, end_addr) and the folio range.
> > +				 * The folio may start before addr if addr is
> > +				 * in the middle of a large folio.
> > +				 */
> > +				folio_start = PFN_PHYS(folio_pfn(folio));
> > +				folio_end = folio_start + folio_size(folio);
> > +
> > +				overlap_start = max(addr, folio_start);
> > +				overlap_end = min(end_addr, folio_end);
> > +
> > +				if (overlap_end > overlap_start) {
> > +					counted = overlap_end - overlap_start;
> > +					total_eligible += counted;
> > +					if (folio_nid(folio) == nid)
> > +						node_eligible += counted;
> > +				}
> > +
> > +				/* Advance past the entire folio */
> > +				addr = folio_end;
>
> If a folio is located at the very end of the physical address space, could
> folio_end wrap around to 0? If addr becomes 0, could it cause an infinite
> loop since addr < end_addr would remain true?

The folio is a valid one, so it should be located in a valid physical address
range. And to my understanding, no real-world machine allows the 2**64 - 1
address. So this should be no blocker of this patch.

>
> > +				folio_put(folio);
> > +			}
> > +			cond_resched();
>
> Could scanning a large DAMON region without yielding the CPU in the inner loop
> cause soft lockups? Moving the cond_resched() inside the while loop would
> prevent it from monopolizing the processor.

As replied on previous versions, this is unlikely to happen. We can revisit
this if it really causes such an issue.

>
> > +		}
> > +	}
> > +
> > +	*total = total_eligible;
> > +	return node_eligible;
> > +}
> > +
> > +static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c,
> > +		struct damos *s, int nid)
> > +{
> > +	phys_addr_t total_eligible = 0;
> > +	phys_addr_t node_eligible;
> > +
> > +	if (c->ops.id != DAMON_OPS_PADDR)
> > +		return 0;
> > +
> > +	if (nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
> > +		return 0;
> > +
> > +	node_eligible = damos_calc_eligible_bytes(c, s, nid, &total_eligible);
> > +
> > +	if (!(unsigned long)total_eligible)
> > +		return 0;
> > +
> > +	return mult_frac((unsigned long)node_eligible, 10000,
> > +			(unsigned long)total_eligible);
>
> Does casting these phys_addr_t variables to unsigned long cause truncation
> errors on 32-bit PAE systems where physical addresses are 64-bit? For
> example, if the eligible memory size exceeds 4GB, the truncation could
> corrupt the ratio metrics and break the quota autotuner. Would converting
> byte counts to page counts before casting avoid this?

As replied on previous versions, the user impact is trivial. No blocker.
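That said, if the 32-bit behavior ever needs tightening, doing the division
in 64-bit avoids both the casts and the truncation. Note that even converting
to page counts, as Sashiko suggests, can still overflow mult_frac()'s
'remainder * 10000' intermediate in a 32-bit unsigned long once the total
passes roughly 1.6GiB with 4K pages. A minimal sketch, with a made-up
function name:

#include <linux/math64.h>
#include <linux/types.h>

/*
 * Sketch: node's share of eligible bytes in basis points, computed
 * entirely in u64. The multiplication cannot overflow unless the
 * eligible total exceeds about 1.8PB.
 */
static unsigned long damos_eligible_bp_sketch(phys_addr_t node_eligible,
		phys_addr_t total_eligible)
{
	if (!total_eligible)
		return 0;

	return div64_u64((u64)node_eligible * 10000,
			(u64)total_eligible);
}

div64_u64() is a plain division on 64-bit and a library routine on 32-bit,
so this stays correct on PAE without special casing.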

Thanks,
SJ

[...]