Date: Mon, 29 Jun 2026 14:22:54 -0400
From: Johannes Weiner
To: Gregory Price
Cc: Andrew Morton, David Hildenbrand, Zi Yan, Matthew Brost, Joshua Hahn,
 Rakie Kim, Byungchul Park, Ying Huang, Alistair Popple, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Neha Gholkar
Subject: Re: [PATCH] mm: mempolicy: fix automatic numa balancing for shmem
References: <20260629163337.1264881-1-hannes@cmpxchg.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jun 29, 2026 at 01:59:41PM -0400, Gregory Price wrote:
> On Mon, Jun 29, 2026 at 12:33:37PM -0400, Johannes Weiner wrote:
> > Neha reports that mapped shmem aren't considered for NUMA balancing,
> > noting convergence problems and bandwidth bottlenecking for cachelib
> > based workloads on tiered memory systems.
> >
> > Looking at the code and going through the git history, this doesn't
> > actually seem intentional:
> >
> > Commit fc3147245d19 ("mm: numa: Limit NUMA scanning to migrate-on-fault
> > VMAs") added a vma_policy_mof() gate to task_numa_work() so VMAs whose
> > policy lacks MPOL_F_MOF are skipped from NUMA balancing scans. The
> > motivation was a real usecase: Oracle was pinning shared segments with
> > mbind(MPOL_BIND) so trapping faults was both expensive and pointless.
> >
> > The handling of NULL from vm_ops->get_policy, however, treated "user
> > explicitly opted out" the same as "user never specified anything."
> > For VMAs whose shared policy is absent - the common case for shmem - the
> > scan was disabled too.
> >
> > This issue is old. It probably hurts less in conventional NUMA. But it's
> > very noticeable on tiered systems, where entire tmpfs workingsets can get
> > stuck on lower-bandwidth memory.
> >
>
> Eugh.
>
> Demotions don't care about mempolicy, so opting shmem out of NUMA
> balancing and mbind'ing on a tiered system is just full sadness.

Right, mbinding in tiered mode is a whole other ball of wax. I'm just
trying to make the default case work ;-)

> This is all just more evidence that demotion needs to be completely
> redone, it's creating a mess of undefined behavior for memory placement.

No argument from me.

> > Fix this by having vma_policy_mof() use __get_vma_policy() directly, and
> > thereby handle the fallback to task policy (-> preferred_node_policy()
> > has MPOL_F_MOF per default). Every other consumer of vm_ops->get_policy
> > already handles it this way, the scan-eligibility check was the outlier.
> >
> > This preserves Mel's intended fix: don't scan stuff the user explicitly
> > pinned. But allow default policy vmas to participate in balancing.
> >
> > Reported-by: Neha Gholkar
> > Tested-by: Neha Gholkar
> > Fixes: fc3147245d19 ("mm: numa: Limit NUMA scanning to migrate-on-fault VMAs")
> > Signed-off-by: Johannes Weiner
>
> Reviewed-by: Gregory Price

Thanks! Sorry for making you feel bad.
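
For anyone skimming the thread, the before/after eligibility logic described
in the quoted changelog can be modeled in userspace roughly as follows. This
is only a sketch built from that description, not the kernel code: the
model_* types and functions, the MODEL_F_MOF value, and the single
shared_policy field are all made up for illustration, and the real
vma_policy_mof() deals with reference counting and per-VMA policies that are
left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag value only; not the kernel's MPOL_F_MOF definition. */
#define MODEL_F_MOF 0x1u

/* Hypothetical stand-in for a memory policy: just a flags word. */
struct model_policy {
	unsigned int flags;
};

/* Hypothetical stand-in for a shmem VMA. */
struct model_vma {
	/* What the mapping's get_policy hook would return; NULL if unset. */
	const struct model_policy *shared_policy;
};

/* Stand-in for the task's default policy, which has MOF set per the thread. */
static const struct model_policy model_task_default = { .flags = MODEL_F_MOF };

/* Old check as described: a NULL shared policy reads as "not MOF". */
bool model_mof_old(const struct model_vma *vma)
{
	const struct model_policy *pol = vma->shared_policy;

	return pol != NULL && (pol->flags & MODEL_F_MOF) != 0;
}

/* New check as described: a NULL shared policy falls back to task policy. */
bool model_mof_new(const struct model_vma *vma)
{
	const struct model_policy *pol = vma->shared_policy;

	if (pol == NULL)
		pol = &model_task_default;
	return (pol->flags & MODEL_F_MOF) != 0;
}

/* Self-check covering the two cases the changelog distinguishes. */
void model_selfcheck(void)
{
	/* Explicitly bound segment, e.g. via mbind(MPOL_BIND): no MOF flag. */
	const struct model_policy bound = { .flags = 0 };
	const struct model_vma pinned = { .shared_policy = &bound };
	/* Default shmem mapping: nobody set a shared policy. */
	const struct model_vma plain = { .shared_policy = NULL };

	/* Pinned segments stay out of the scan in both versions. */
	assert(!model_mof_old(&pinned));
	assert(!model_mof_new(&pinned));
	/* Default-policy shmem was skipped before and is eligible after. */
	assert(!model_mof_old(&plain));
	assert(model_mof_new(&plain));
}
```

The only difference between the two functions is the NULL fallback, which is
the distinction between "user explicitly opted out" and "user never specified
anything" that the changelog draws.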