Message-ID: <4ebeb470d5ab495f5d5342454db57b1a6ccc4753.camel@linux.intel.com>
Subject: Re: [PATCH v2 1/5] mm: Introduce zone_appears_fragmented()
From: Thomas Hellström
To: Matthew Brost
Cc: "David Hildenbrand (Arm)", intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org, Andrew Morton, Lorenzo Stoakes, "Liam R.
Howlett", Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Johannes Weiner
Date: Thu, 30 Apr 2026 21:59:43 +0200
References: <20260423055656.1696379-1-matthew.brost@intel.com> <20260423055656.1696379-2-matthew.brost@intel.com> <76191a17-18bf-4e9b-9ab5-dc9a48abfabb@kernel.org> <291406b26b8badf2e565996515931d9ebe50208f.camel@linux.intel.com>
Organization: Intel Sweden AB, Registration Number: 556189-6027
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2026-04-30 at 09:34 -0700, Matthew Brost wrote:
> On Thu, Apr 30, 2026 at 09:47:37AM +0200, Thomas Hellström wrote:
> > On Wed, 2026-04-29 at 19:47 -0700, Matthew Brost wrote:
> > > On Fri, Apr 24, 2026 at 09:26:18AM +0200, David Hildenbrand (Arm) wrote:
> > > > On 4/24/26 09:05, Thomas Hellström wrote:
> > > > > On Thu, 2026-04-23 at 15:21 -0700, Matthew Brost wrote:
> > > > > > On Thu, Apr 23, 2026 at 12:08:36PM -0700, Matthew Brost wrote:
> > > > > > >
> > > > > > > If the order were included in shrink_control, there is
> > > > > > > about a 95% chance that this change would allow TTM / Xe
> > > > > > > to break the problematic kswapd feedback loop. This may
> > > > > > > also better express the intent of the problem we are
> > > > > > > trying to fix here.
> > > > > > >
> > > > > > > For reference, the cover letter [1] details the problem.
> > > > > > >
> > > > > > > Any guidance from the core MM folks would be appreciated.
> > > > > > > Would adding the order to shrink_control be an acceptable
> > > > > > > solution?
> > > > > > >
> > > > > > > Matt
> > > > > > >
> > > > > > > [1] https://patchwork.freedesktop.org/series/165330/
> > > > > > >
> > > > > >
> > > > > > It doesn't look like __GFP_NORETRY, __GFP_RETRY_MAYFAIL, or
> > > > > > __GFP_NOFAIL make it from the caller into the sc->gfp_mask
> > > > > > flags and get into the kswapd loop...
> > > > >
> > > > > Perhaps that's because they mostly (only?) make sense from
> > > > > direct reclaim? Looks like the trace is from kswapd.
> > > >
> > > > kswapd obtains the desired order through pgdat->kswapd_order, as
> > > > a hint from allocation code (wakeup_kswapd). The order can be
> > > > easily merged (just use the max)
> > > >
> > >
> > > Yes.
> > >
> > > My current thinking is to wire the order into shrink_control, as
> > > that is quite straightforward, and to only call this helper and
> > > short-circuit the shrinker on higher orders.
> > >
> > > > We do have the gfp_flags there, but merging them from different
> > > > wakeups is a bit more tricky (and when to reset?).
> > > >
> > > > Assume we have one urgent request for order-0 and one non-urgent
> > > > (noretry, nofail, ...) request for order-9, we'd have to figure
> > > > out a way to represent that. Gets more complicated for more
> > > > orders.
> > > >
> > > > Of course, we could have some kind of array, and try to store
> > > > some "priority" per order. But I assume plumbing that into the
> > > > rest of kswapd might not be that easy.
> > >
> > > Yes, this seems non-trivial. I was also on a call with Google
> > > today discussing what Android (client Linux) would like from
> > > shrinking, and my initial feeling is that we will need to do some
> > > surgery to the shrinker core and GPU shrinkers to make all of this
> > > work well over the next year or so.
> > >
> > > So again, I think wiring order into shrink_control and this
> > > helper is a good place to start, as it fixes an immediate issue.
> > >
> > > Let me know if that seems like a reasonable direction.
> >
> > +1 for wiring order into shrink_control, and possibly also the
> > priority as mentioned in an earlier email.
> >
>
> Let me look at how the priority field is used as well.
>
> > However, for cgroup-aware shrinkers, the amount of free memory in a
> > zone might not be an indication of fragmentation-triggered reclaim
> > at all; it could be the result of the cgroup hitting its memory
> > limits.
> >
>
> I agree that for cgroups what is in place here is not sufficient, and
> based on Google's feedback that every user space in Android is
> assigned a cgroup, we will quickly need a cgroup story.
>
> > So I think if we can solve this with a combination of GFP flags,
> > plumbed-through order and plumbed-through priority, that would be
> > ideal.
>
> That is an idea. The other thing that came up is that the TTM LRU
> doesn't understand the relevance of hotness compared to other
> shrinkers' LRUs (e.g., core pages), so our TTM shrinker may be
> evicting hot GPU pages while cold non-GPU pages could be evicted
> instead, which would create less stress on the system. Perhaps
> priority / GFP flags will help here?

IIRC priority is used to calculate how many pages we are requested to
shrink compared to the number we say we have available, but that needs
to be double-checked.

FWIW i915 has a check that xe's shrinker lacks: shrinking is not
attempted unless the number of pages requested is >= an average buffer
object size.

/Thomas

>
> Matt
>
> >
> > Thanks,
> > Thomas
> >
> > >
> > > Matt
> > >
> > > >
> > > >
> > > > --
> > > > Cheers,
> > > >
> > > > David
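
The two points in the reply above (priority scaling the scan target, and gating a shrink on average buffer object size) can be sketched roughly as below. This is a user-space illustration only, not kernel or driver code. It assumes the scaling resembles the one in do_shrink_slab(), i.e. (freeable >> priority) * 4 / seeks, which, as noted in the reply, should be double-checked against the tree; deferred work and the upper clamp are left out. The struct, the function names and DEF_PRIORITY_SKETCH are hypothetical, and the gate is only a guess at the shape of an i915-style check, not the actual i915 logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: assumed to match the kernel's default reclaim priority. */
#define DEF_PRIORITY_SKETCH 12

/* Hypothetical stand-in for what a shrinker invocation gets to see. */
struct scan_request {
	unsigned long freeable; /* pages the shrinker reports as reclaimable */
	int priority;           /* reclaim priority, lower means more pressure */
	unsigned int seeks;     /* relative cost of recreating an object */
};

/*
 * Assumed scaling: (freeable >> priority) * 4 / seeks, or half of
 * freeable when seeks is zero. Lower priority yields a larger target.
 */
static unsigned long scan_target(const struct scan_request *req)
{
	unsigned long delta;

	assert(req->priority >= 0 && req->priority <= DEF_PRIORITY_SKETCH);

	if (req->seeks == 0)
		return req->freeable / 2;

	delta = req->freeable >> req->priority;
	delta *= 4;
	return delta / req->seeks;
}

/*
 * Hypothetical gate: only shrink when the request covers at least one
 * average-sized buffer object, measured in pages.
 */
static bool worth_shrinking(unsigned long nr_to_scan,
			    unsigned long total_pages,
			    unsigned long num_objects)
{
	if (num_objects == 0)
		return false;

	return nr_to_scan >= total_pages / num_objects;
}
```

With 4096 freeable pages spread over 8 objects and seeks of 2, a request at priority 12 yields a target of 2 pages, far below the 512-page average object, so the gate would skip it; at priority 0 the target is 8192 pages and the gate passes.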