Re: [PATCH v4 0/6] mm, drm/ttm, drm/xe: Avoid reclaim/eviction loops under fragmentation

The Linux Kernel Mailing List
 help / color / mirror / Atom feed

From: Dave Chinner <dgc@kernel.org>
To: Matthew Brost <matthew.brost@intel.com>
Cc: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	"Dave Chinner" <david@fromorbit.com>,
	"Qi Zheng" <zhengqi.arch@bytedance.com>,
	"Roman Gushchin" <roman.gushchin@linux.dev>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Shakeel Butt" <shakeel.butt@linux.dev>,
	"Kairui Song" <kasong@tencent.com>,
	"Barry Song" <baohua@kernel.org>,
	"Axel Rasmussen" <axelrasmussen@google.com>,
	"Yuanchu Xie" <yuanchu@google.com>, "Wei Xu" <weixugc@google.com>,
	"Tvrtko Ursulin" <tvrtko.ursulin@igalia.com>,
	"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"Carlos Santa" <carlos.santa@intel.com>,
	"Christian Koenig" <christian.koenig@amd.com>,
	"Huang Rui" <ray.huang@amd.com>,
	"Matthew Auld" <matthew.auld@intel.com>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Maxime Ripard" <mripard@kernel.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Daniel Colascione" <dancol@dancol.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"David Hildenbrand" <david@kernel.org>,
	"Lorenzo Stoakes" <ljs@kernel.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	"Vlastimil Babka" <vbabka@kernel.org>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Suren Baghdasaryan" <surenb@google.com>,
	"Michal Hocko" <mhocko@suse.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 0/6] mm, drm/ttm, drm/xe: Avoid reclaim/eviction loops under fragmentation
Date: Fri, 1 May 2026 11:42:19 +1000	[thread overview]
Message-ID: <afQE-2JOzfOm8enM@dread> (raw)
In-Reply-To: <20260430191809.2142544-1-matthew.brost@intel.com>

On Thu, Apr 30, 2026 at 12:18:03PM -0700, Matthew Brost wrote:
> TTM allocations at higher orders can drive Xe into a pathological
> reclaim loop when memory is fragmented:
> 
> kswapd → shrinker → eviction → rebind (exec ioctl) → repeat
> 
> In this state, reclaim is triggered despite substantial free memory,
> but fails to produce contiguous higher-order pages. The Xe shrinker then
> evicts active buffer objects, increasing faulting and rebind activity
> and further feeding the loop. The result is high CPU overhead and poor
> GPU forward progress.
> 
> This issue was first reported in [1] and independently observed
> internally and by Google.
> 
> A simple reproducer is:
> 
> - Boot an iGPU system with mem=8G
> - Launch 10 Chrome tabs running the WebGL aquarium demo
> - Configure each tab with ~5k fish
> 
> Under this workload, ftrace shows a continuous loop of:
> 
> xe_shrinker_scan (kswapd)
> xe_vma_rebind_exec
> 
> Performance degrades significantly, with each tab dropping to ~2 FPS on
> PTL (Ubuntu 24.04).
> 
> At the same time, /proc/buddyinfo shows substantial free memory but no
> higher-order availability. For example, the Normal zone:
> 
> Count: 4063 4595 3455 3400 3139 2762 2293 1655 643 0 0
> 
> This corresponds to ~2.8GB free memory, but no order-9 (2MB) blocks,
> indicating severe fragmentation.
> 
> This series addresses the issue in two ways:
> 
> TTM: Restrict direct reclaim to beneficial_order. Larger allocations
> use __GFP_NORETRY to fail quickly rather than triggering reclaim.

NACK.

As I have said to the people trying to hack around direct reclaim
for high order allocations being costly for the page cache, fix the
problem with direct reclaim. (e.g.
https://lore.kernel.org/linux-xfs/adLlrSZ5oRAa_Hfd@dread/)

We should not be hacking around a problem in the mm infrastructure
by changing allocation context flags every high order allocation 
call site that needs high order allocations. Understand and fix the
infrastructure problem once and for all.

> Xe: Introduce a heuristic in the shrinker to avoid eviction when
> running under kswapd and the system appears memory-rich but
> fragmented.

NACK on architectural grounds.

Custom heuristics in individual shrinkers to decide whether the
should do what the mm subsystem has asked them to do has -always-
been a mistake to allow. The mm subsystem makes the decision on how
much cache shrinkage needs to occur, the shrinkers just do what they
are told to do.

If we have a problem where a workload causes excessive shrinker
reclaim, then we need to address the problem in the infrastructure
because excessive reclaim affects the performance of -all-
subsystems with shrinkable caches, not just the TTM subsystem.

As it is, I can't review what you've actually implemented because
you only cc'd me on a single patch in the series. In future, please
cc me on the whole patchset because shrinkers need to work as a
coherent whole, not just in isolation....

-Dave.
-- 
Dave Chinner
dgc@kernel.org

next prev parent reply	other threads:[~2026-05-01  1:42 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-30 19:18 [PATCH v4 0/6] mm, drm/ttm, drm/xe: Avoid reclaim/eviction loops under fragmentation Matthew Brost
2026-04-30 19:18 ` [PATCH v4 1/6] mm: Wire up order in shrink_control Matthew Brost
2026-04-30 19:18 ` [PATCH v4 2/6] mm: Introduce zone_maybe_fragmented_in_shrinker() Matthew Brost
2026-05-01  0:50   ` Santa, Carlos
2026-05-01 19:08   ` PATCH v4 0/6] mm, drm/ttm, drm/xe: Avoid reclaim/eviction loops under fragmentation Kenneth Crudup
2026-05-01 20:00     ` Matthew Brost
2026-05-01 20:05       ` Kenneth Crudup
2026-05-01 21:10         ` Matthew Brost
2026-05-01 22:33           ` Matthew Brost
2026-05-07 18:55     ` Kenneth Crudup
2026-04-30 23:01 ` [PATCH " Andrew Morton
2026-05-01  6:28   ` Matthew Brost
2026-05-01 12:51     ` Andrew Morton
2026-05-01  1:42 ` Dave Chinner [this message]
2026-05-01  7:09   ` Matthew Brost

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=afQE-2JOzfOm8enM@dread \
    --to=dgc@kernel.org \
    --cc=Liam.Howlett@oracle.com \
    --cc=airlied@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=baohua@kernel.org \
    --cc=carlos.santa@intel.com \
    --cc=christian.koenig@amd.com \
    --cc=dancol@dancol.org \
    --cc=david@fromorbit.com \
    --cc=david@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hannes@cmpxchg.org \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=kasong@tencent.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=matthew.auld@intel.com \
    --cc=matthew.brost@intel.com \
    --cc=mhocko@suse.com \
    --cc=mripard@kernel.org \
    --cc=ray.huang@amd.com \
    --cc=roman.gushchin@linux.dev \
    --cc=rppt@kernel.org \
    --cc=shakeel.butt@linux.dev \
    --cc=simona@ffwll.ch \
    --cc=surenb@google.com \
    --cc=thomas.hellstrom@linux.intel.com \
    --cc=tvrtko.ursulin@igalia.com \
    --cc=tzimmermann@suse.de \
    --cc=vbabka@kernel.org \
    --cc=weixugc@google.com \
    --cc=yuanchu@google.com \
    --cc=zhengqi.arch@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox