* Re: [PATCH] drm/ttm: Fix ttm_bo_shrink() infinite LRU walk on backup failure
2026-05-11 16:24 [PATCH] drm/ttm: Fix ttm_bo_shrink() infinite LRU walk on backup failure Thomas Hellström
@ 2026-05-12 13:30 ` Matthew Auld
2026-05-13 7:20 ` kernel test robot
2026-05-13 10:24 ` kernel test robot
2 siblings, 0 replies; 4+ messages in thread
From: Matthew Auld @ 2026-05-12 13:30 UTC (permalink / raw)
To: Thomas Hellström, intel-xe
Cc: Christian König, Huang Rui, Matthew Brost, Dave Airlie,
dri-devel, stable
On 11/05/2026 17:24, Thomas Hellström wrote:
> Apply the same fix as b2ed01e7ad ("drm/ttm: Fix ttm_bo_swapout()
> infinite LRU walk on swapout failure") to the ttm_bo_shrink() path.
>
> Move del_bulk_move from before the backup to after success only,
> using ttm_resource_del_bulk_move_unevictable() since the resource
> is now unevictable once fully backed up.
>
> Fixes: 70d645deac98 ("drm/ttm: Add helpers for shrinking")
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Huang Rui <ray.huang@amd.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: <stable@vger.kernel.org> # v6.15+
> Assisted-by: GitHub_Copilot:claude-opus-4.6
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
> ---
> drivers/gpu/drm/ttm/ttm_bo_util.c | 11 +++--------
> 1 file changed, 3 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> index f83b7d5ec6c6..3e3c201a0222 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> @@ -1112,19 +1112,14 @@ long ttm_bo_shrink(struct ttm_operation_ctx *ctx, struct ttm_buffer_object *bo,
>  	if (lret < 0)
>  		return lret;
>  
> -	if (bo->bulk_move) {
> -		spin_lock(&bdev->lru_lock);
> -		ttm_resource_del_bulk_move(bo->resource, bo);
> -		spin_unlock(&bdev->lru_lock);
> -	}
> -
>  	lret = ttm_tt_backup(bdev, bo->ttm, (struct ttm_backup_flags)
>  			     {.purge = flags.purge,
>  			      .writeback = flags.writeback});
>  
> -	if (lret <= 0 && bo->bulk_move) {
> +	if (lret > 0) {
>  		spin_lock(&bdev->lru_lock);
> -		ttm_resource_add_bulk_move(bo->resource, bo);
> +		ttm_resource_del_bulk_move_unevictable(bo->resource, bo);
> +		ttm_resource_move_to_lru_tail(bo->resource);
>  		spin_unlock(&bdev->lru_lock);
>  	}
>
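
The ordering change in the hunk above can be illustrated with a small
user-space sketch. This is not TTM code: struct toy_lru, toy_backup(),
toy_move_to_tail() and toy_walk() are hypothetical names, and the model
assumes that re-adding an entry after a failed backup places it at the
list tail, i.e. ahead of the walker again. Under that assumption a walk
over entries that never back up successfully does not terminate on its
own, while leaving failed entries in place lets the cursor step past them.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ITEMS 64

/* Toy LRU list: ids in LRU order. Hypothetical model, not a TTM structure. */
struct toy_lru {
	int ids[TOY_MAX_ITEMS];
	size_t len;
};

/* Hypothetical backup step: every entry fails and reports 0 pages. */
static long toy_backup(int id)
{
	(void)id;
	return 0;
}

/* Move the entry at @idx to the tail of the list, keeping the others in order. */
static void toy_move_to_tail(struct toy_lru *lru, size_t idx)
{
	int id;
	size_t i;

	assert(idx < lru->len);
	id = lru->ids[idx];
	for (i = idx; i + 1 < lru->len; i++)
		lru->ids[i] = lru->ids[i + 1];
	lru->ids[lru->len - 1] = id;
}

/*
 * Walk the list from the head and return the number of entries visited,
 * capped at @max_visits so that a non-terminating walk can be observed.
 *
 * @requeue_on_failure != 0 models the old ordering (assumption): the entry
 * is unlinked before the backup and re-added on failure, landing at the tail.
 * @requeue_on_failure == 0 models the new ordering: a failed entry is left
 * where it is and the cursor advances.
 */
static size_t toy_walk(struct toy_lru *lru, int requeue_on_failure,
		       size_t max_visits)
{
	size_t cursor = 0, visits = 0;

	while (cursor < lru->len && visits < max_visits) {
		long ret = toy_backup(lru->ids[cursor]);

		visits++;
		if (ret <= 0 && requeue_on_failure) {
			/* The failed entry reappears ahead of the cursor. */
			toy_move_to_tail(lru, cursor);
			continue;
		}
		cursor++;
	}
	return visits;
}
```

With a three-entry list and a cap of 1000 visits, toy_walk(&lru, 1, 1000)
runs into the cap, whereas toy_walk(&lru, 0, 1000) stops after three visits.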
* Re: [PATCH] drm/ttm: Fix ttm_bo_shrink() infinite LRU walk on backup failure
From: kernel test robot @ 2026-05-13 7:20 UTC (permalink / raw)
To: Thomas Hellström, intel-xe
Cc: oe-kbuild-all, Thomas Hellström, Christian König,
Huang Rui, Matthew Auld, Matthew Brost, Dave Airlie, dri-devel,
stable
Hi Thomas,
kernel test robot noticed the following build errors:
[auto build test ERROR on drm-misc/drm-misc-next]
[also build test ERROR on linus/master v7.1-rc3 next-20260508]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Thomas-Hellstr-m/drm-ttm-Fix-ttm_bo_shrink-infinite-LRU-walk-on-backup-failure/20260513-095356
base: https://gitlab.freedesktop.org/drm/misc/kernel.git drm-misc-next
patch link: https://lore.kernel.org/r/20260511162443.24352-1-thomas.hellstrom%40linux.intel.com
patch subject: [PATCH] drm/ttm: Fix ttm_bo_shrink() infinite LRU walk on backup failure
config: powerpc-allmodconfig (https://download.01.org/0day-ci/archive/20260513/202605131522.yUSpVs9Q-lkp@intel.com/config)
compiler: powerpc64-linux-gcc (GCC) 15.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260513/202605131522.yUSpVs9Q-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605131522.yUSpVs9Q-lkp@intel.com/
All errors (new ones prefixed by >>):
   drivers/gpu/drm/ttm/ttm_bo_util.c: In function 'ttm_bo_shrink':
>> drivers/gpu/drm/ttm/ttm_bo_util.c:1121:17: error: implicit declaration of function 'ttm_resource_del_bulk_move_unevictable'; did you mean 'ttm_resource_del_bulk_move'? [-Wimplicit-function-declaration]
    1121 |                 ttm_resource_del_bulk_move_unevictable(bo->resource, bo);
         |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         |                 ttm_resource_del_bulk_move
vim +1121 drivers/gpu/drm/ttm/ttm_bo_util.c
  1067
  1068  /**
  1069   * ttm_bo_shrink() - Helper to shrink a ttm buffer object.
  1070   * @ctx: The struct ttm_operation_ctx used for the shrinking operation.
  1071   * @bo: The buffer object.
  1072   * @flags: Flags governing the shrinking behaviour.
  1073   *
  1074   * The function uses the ttm_tt_back_up functionality to back up or
  1075   * purge a struct ttm_tt. If the bo is not in system, it's first
  1076   * moved there.
  1077   *
  1078   * Return: The number of pages shrunken or purged, or
  1079   * negative error code on failure.
  1080   */
  1081  long ttm_bo_shrink(struct ttm_operation_ctx *ctx, struct ttm_buffer_object *bo,
  1082  		   const struct ttm_bo_shrink_flags flags)
  1083  {
  1084  	static const struct ttm_place sys_placement_flags = {
  1085  		.fpfn = 0,
  1086  		.lpfn = 0,
  1087  		.mem_type = TTM_PL_SYSTEM,
  1088  		.flags = 0,
  1089  	};
  1090  	static struct ttm_placement sys_placement = {
  1091  		.num_placement = 1,
  1092  		.placement = &sys_placement_flags,
  1093  	};
  1094  	struct ttm_device *bdev = bo->bdev;
  1095  	long lret;
  1096
  1097  	dma_resv_assert_held(bo->base.resv);
  1098
  1099  	if (flags.allow_move && bo->resource->mem_type != TTM_PL_SYSTEM) {
  1100  		int ret = ttm_bo_validate(bo, &sys_placement, ctx);
  1101
  1102  		/* Consider -ENOMEM and -ENOSPC non-fatal. */
  1103  		if (ret) {
  1104  			if (ret == -ENOMEM || ret == -ENOSPC)
  1105  				ret = -EBUSY;
  1106  			return ret;
  1107  		}
  1108  	}
  1109
  1110  	ttm_bo_unmap_virtual(bo);
  1111  	lret = ttm_bo_wait_ctx(bo, ctx);
  1112  	if (lret < 0)
  1113  		return lret;
  1114
  1115  	lret = ttm_tt_backup(bdev, bo->ttm, (struct ttm_backup_flags)
  1116  			     {.purge = flags.purge,
  1117  			      .writeback = flags.writeback});
  1118
  1119  	if (lret > 0) {
  1120  		spin_lock(&bdev->lru_lock);
> 1121  		ttm_resource_del_bulk_move_unevictable(bo->resource, bo);
  1122  		ttm_resource_move_to_lru_tail(bo->resource);
  1123  		spin_unlock(&bdev->lru_lock);
  1124  	}
  1125
  1126  	if (lret < 0 && lret != -EINTR)
  1127  		return -EBUSY;
  1128
  1129  	return lret;
  1130  }
  1131  EXPORT_SYMBOL(ttm_bo_shrink);
  1132
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH] drm/ttm: Fix ttm_bo_shrink() infinite LRU walk on backup failure
From: kernel test robot @ 2026-05-13 10:24 UTC (permalink / raw)
To: Thomas Hellström, intel-xe
Cc: oe-kbuild-all, Thomas Hellström, Christian König,
Huang Rui, Matthew Auld, Matthew Brost, Dave Airlie, dri-devel,
stable
Hi Thomas,
kernel test robot noticed the following build errors:
[auto build test ERROR on drm-misc/drm-misc-next]
[also build test ERROR on linus/master v7.1-rc3 next-20260508]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Thomas-Hellstr-m/drm-ttm-Fix-ttm_bo_shrink-infinite-LRU-walk-on-backup-failure/20260513-095356
base: https://gitlab.freedesktop.org/drm/misc/kernel.git drm-misc-next
patch link: https://lore.kernel.org/r/20260511162443.24352-1-thomas.hellstrom%40linux.intel.com
patch subject: [PATCH] drm/ttm: Fix ttm_bo_shrink() infinite LRU walk on backup failure
config: x86_64-allmodconfig (https://download.01.org/0day-ci/archive/20260513/202605131824.SbQ7agaE-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260513/202605131824.SbQ7agaE-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605131824.SbQ7agaE-lkp@intel.com/
All errors (new ones prefixed by >>):
>> drivers/gpu/drm/ttm/ttm_bo_util.c:1121:3: error: call to undeclared function 'ttm_resource_del_bulk_move_unevictable'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    1121 |                 ttm_resource_del_bulk_move_unevictable(bo->resource, bo);
         |                 ^
   drivers/gpu/drm/ttm/ttm_bo_util.c:1121:3: note: did you mean 'ttm_resource_del_bulk_move'?
   include/drm/ttm/ttm_resource.h:449:6: note: 'ttm_resource_del_bulk_move' declared here
     449 | void ttm_resource_del_bulk_move(struct ttm_resource *res,
         |      ^
   1 error generated.
vim +/ttm_resource_del_bulk_move_unevictable +1121 drivers/gpu/drm/ttm/ttm_bo_util.c
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki