* [PATCH v4 0/4] TTM unlockable restartable LRU list iteration
@ 2024-03-06  7:01 Thomas Hellström
  2024-03-06  7:01 ` [PATCH v4 1/4] drm/ttm: Allow TTM LRU list nodes of different types Thomas Hellström
                   ` (14 more replies)
  0 siblings, 15 replies; 21+ messages in thread
From: Thomas Hellström @ 2024-03-06  7:01 UTC
  To: intel-xe, intel-gfx
  Cc: Thomas Hellström, Somalapuram Amaranath,
	Christian König, dri-devel

This patch-set is a prerequisite for a standalone TTM shrinker
and for exhaustive TTM eviction using sleeping dma_resv locks,
which is the motivation for it.

Currently, when the TTM LRU list lock is dropped, iteration needs
to be restarted from the beginning of the list rather than from
the next LRU list node. This can be a big problem: if eviction or
shrinking fails for whatever reason after the unlock, restarting
is likely to cause the same failure over and over again.

There are various schemes that allow list iteration to continue
from where it left off. One such scheme, used by the GEM LRU list
traversal, is to pull items already considered off the LRU list
and reinsert them when iteration is done. This has the drawback
that concurrent list iteration doesn't see the complete list
(which is bad for exhaustive eviction). It also doesn't lend
itself well to bulk-move sublists, since these get split when
items from them are temporarily pulled off the list and moved to
the list tail.
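
The following is a minimal userspace sketch of the pull-off-and-reinsert scheme described above. It assumes a plain circular doubly linked list with no locking. The names `struct item` and `demo_pull_off()` are hypothetical, and this is not the GEM code. The sketch shows both drawbacks. While scanned items sit on the iterator's private list, a concurrent walker sees only part of the LRU. Once the items are reinserted at the tail, they are no longer where they were, which is what would split a bulk-move sublist.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical LRU item; a sketch, not the GEM or TTM structures. */
struct item {
	struct item *prev, *next;
	int id;
};

static void item_init(struct item *h) { h->prev = h->next = h; }

static void item_del(struct item *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Append n at the tail of the list headed by h. */
static void item_add_tail(struct item *n, struct item *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Number of items currently linked on the list headed by h. */
static int count(const struct item *h)
{
	int c = 0;

	for (const struct item *n = h->next; n != h; n = n->next)
		c++;
	return c;
}

/*
 * Builds an LRU of items 1..5, pulls the first k off onto a private
 * "scanned" list (as an iterator would before dropping the lock), records
 * how many items a concurrent walker could still see, then moves the
 * scanned items back to the LRU tail. Returns the id now at the LRU head.
 */
static int demo_pull_off(int k, int *visible_during)
{
	struct item lru, scanned, it[5];
	int i;

	item_init(&lru);
	item_init(&scanned);
	for (i = 0; i < 5; i++) {
		it[i].id = i + 1;
		item_add_tail(&it[i], &lru);
	}
	for (i = 0; i < k; i++) {
		struct item *n = lru.next;

		item_del(n);
		item_add_tail(n, &scanned);
	}
	*visible_during = count(&lru);   /* concurrent view is incomplete */

	/* Iteration done: reinsert the scanned items at the LRU tail. */
	while (scanned.next != &scanned) {
		struct item *n = scanned.next;

		item_del(n);
		item_add_tail(n, &lru);
	}
	return lru.next->id;
}
```

With five items and `k = 2`, the concurrent walker sees only three items, and the LRU order afterwards is 3, 4, 5, 1, 2.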

The approach taken here is for list iterators to insert themselves
into the list at the next position, using a special list node.
When restarting, iteration then uses that list node as its starting
point. Concurrent iterators simply skip over the special list nodes.

This is implemented in patches 1 and 2.
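
To make the special-node ("hitch") idea concrete, here is a minimal single-threaded sketch. It assumes a simple circular doubly linked list in which each node carries a type tag. `enum lru_type`, `cursor_next()` and `demo_restart()` are hypothetical names, not the TTM API introduced by these patches. Dropping and re-taking the LRU lock is only simulated, by letting a second cursor walk the list between calls on the first one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds: real resources vs. iterator position markers. */
enum lru_type { LRU_RES, LRU_HITCH };

struct lru_node {
	struct lru_node *prev, *next;
	enum lru_type type;
};

struct res {
	struct lru_node lru;   /* first member, so a node pointer casts back */
	int id;
};

static void list_init(struct lru_node *head) { head->prev = head->next = head; }

static void list_del(struct lru_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

/* Insert n immediately after pos. */
static void list_add_after(struct lru_node *n, struct lru_node *pos)
{
	n->prev = pos;
	n->next = pos->next;
	pos->next->prev = n;
	pos->next = n;
}

/* Insert n at the list tail (just before head). */
static void list_add_tail(struct lru_node *n, struct lru_node *head)
{
	list_add_after(n, head->prev);
}

/*
 * Return the next resource after this cursor's hitch and move the hitch
 * just past it. Other cursors' hitches are skipped. NULL at end of list.
 */
static struct res *cursor_next(struct lru_node *head, struct lru_node *hitch)
{
	struct lru_node *n;

	for (n = hitch->next; n != head; n = n->next) {
		if (n->type != LRU_RES)
			continue;          /* a concurrent iterator's hitch */
		list_del(hitch);
		list_add_after(hitch, n);
		return (struct res *)n;
	}
	return NULL;
}

/*
 * Cursor A visits two resources, then "drops the lock" while cursor B
 * walks the whole list. A then resumes from its hitch. Returns the ids
 * seen by A as decimal digits; *seen_by_b gets B's resource count.
 */
static int demo_restart(int *seen_by_b)
{
	struct lru_node head, hitch_a, hitch_b;
	struct res r[5], *cur;
	int i, seq = 0;

	list_init(&head);
	for (i = 0; i < 5; i++) {
		r[i].lru.type = LRU_RES;
		r[i].id = i + 1;
		list_add_tail(&r[i].lru, &head);
	}
	hitch_a.type = hitch_b.type = LRU_HITCH;
	list_add_after(&hitch_a, &head);

	for (i = 0; i < 2; i++)
		seq = seq * 10 + cursor_next(&head, &hitch_a)->id;

	/* Lock "dropped": a concurrent iterator still sees every resource. */
	*seen_by_b = 0;
	list_add_after(&hitch_b, &head);
	while (cursor_next(&head, &hitch_b))
		(*seen_by_b)++;
	list_del(&hitch_b);

	/* Lock "re-taken": A continues from its hitch, not from the head. */
	while ((cur = cursor_next(&head, &hitch_a)))
		seq = seq * 10 + cur->id;
	list_del(&hitch_a);
	return seq;
}
```

In `demo_restart()`, cursor A sees resources 1 and 2. The second cursor walks all five while skipping A's hitch. A then resumes with 3, 4 and 5 instead of starting over, so the function returns 12345.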

For bulk-move sublists the approach is the same, but when a bulk-move
sublist is moved to the tail, an iterator positioned inside it is moved
along with it, causing us to skip parts of the list. That is undesirable.
Patch 3 deals with this: when an iterator detects that it is traversing
a sublist, it registers with the ttm_lru_bulk_move struct using a linked
list, and when that bulk-move sublist is moved to the tail, any iterator
registered with it is first moved to the tail of the sublist.
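
The sublist adjustment can be sketched the same way. This is a hypothetical, simplified model with two assumptions. First, the bulk-move record keeps a fixed-size array of registered hitches instead of the linked list mentioned above. Second, "moved to the tail of the sublist" is modelled as parking the hitch right after the sublist's last node, outside the range that gets spliced. Names such as `struct bulk`, `bulk_move_tail()` and `demo_bulk()` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: id > 0 is a resource, id == 0 is a cursor hitch. */
struct node {
	struct node *prev, *next;
	int id;
};

/* Hypothetical bulk-move record: contiguous run [first, last] plus the
 * hitches of cursors currently positioned inside that run. */
#define MAX_CURSORS 4
struct bulk {
	struct node *first, *last;
	struct node *hitches[MAX_CURSORS];
	int nr_hitches;
};

static void node_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static void node_add_after(struct node *n, struct node *pos)
{
	n->prev = pos;
	n->next = pos->next;
	pos->next->prev = n;
	pos->next = n;
}

/* Move the run [first, last] (inclusive) to just before head. */
static void splice_tail(struct node *head, struct node *first, struct node *last)
{
	first->prev->next = last->next;      /* unlink the run */
	last->next->prev = first->prev;
	first->prev = head->prev;            /* relink it at the tail */
	last->next = head;
	head->prev->next = first;
	head->prev = last;
}

/* Park registered hitches after the run so they stay put, then splice. */
static void bulk_move_tail(struct node *head, struct bulk *b)
{
	int i;

	for (i = 0; i < b->nr_hitches; i++) {
		node_del(b->hitches[i]);
		node_add_after(b->hitches[i], b->last);
	}
	b->nr_hitches = 0;
	splice_tail(head, b->first, b->last);
}

/* Resources 1..6, sublist 2..4; the cursor has already seen 1 and 2.
 * Returns the visit order as decimal digits. */
static int demo_bulk(int adjust)
{
	struct node head, hitch, r[6], *n;
	struct bulk b = { .nr_hitches = 0 };
	int i, seq = 12;                     /* 1 and 2 already visited */

	head.prev = head.next = &head;
	head.id = -1;
	for (i = 0; i < 6; i++) {
		r[i].id = i + 1;
		node_add_after(&r[i], head.prev);   /* append */
	}
	b.first = &r[1];
	b.last = &r[3];
	hitch.id = 0;
	node_add_after(&hitch, &r[1]);      /* hitch sits inside the run */
	b.hitches[b.nr_hitches++] = &hitch;

	if (adjust)
		bulk_move_tail(&head, &b);
	else
		splice_tail(&head, b.first, b.last);   /* hitch travels along */

	for (n = hitch.next; n != &head; n = n->next)
		if (n->id > 0)
			seq = seq * 10 + n->id;
	return seq;
}
```

Without the adjustment, the cursor travels with the sublist, continues with 3 and 4, and skips 5 and 6, so `demo_bulk(0)` returns 1234. Parking the hitch first keeps the cursor in place, so it visits 5, 6, 2, 3, 4 and `demo_bulk(1)` returns 1256234. Resource 2 is seen twice, but nothing is skipped.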

The restartable property is used in patch 4 to restart swapout if
needed, but the main purpose is to pave the way for a shrinker and
for exhaustive eviction.

v2:
- Rework patch 3 completely.
v3:
- Fix a NULL pointer dereference found by Xe CI.
v4:
- Remove some leftover code causing build problems.

Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: <dri-devel@lists.freedesktop.org>

Thomas Hellström (4):
  drm/ttm: Allow TTM LRU list nodes of different types
  drm/ttm: Use LRU hitches
  drm/ttm, drm/amdgpu, drm/xe: Consider hitch moves within bulk sublist
    moves
  drm/ttm: Allow continued swapout after -ENOSPC failure

 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |   4 +
 drivers/gpu/drm/ttm/ttm_bo.c           |   1 +
 drivers/gpu/drm/ttm/ttm_device.c       |  33 +++-
 drivers/gpu/drm/ttm/ttm_resource.c     | 228 ++++++++++++++++++++-----
 drivers/gpu/drm/xe/xe_vm.c             |   4 +
 include/drm/ttm/ttm_device.h           |   2 +
 include/drm/ttm/ttm_resource.h         |  96 +++++++++--
 7 files changed, 308 insertions(+), 60 deletions(-)

-- 
2.44.0


* Re: [PATCH v4 4/4] drm/ttm: Allow continued swapout after -ENOSPC failure
@ 2024-03-12 18:55 kernel test robot
  0 siblings, 0 replies; 21+ messages in thread
From: kernel test robot @ 2024-03-12 18:55 UTC
  To: oe-kbuild; +Cc: lkp, Julia Lawall

BCC: lkp@intel.com
CC: oe-kbuild-all@lists.linux.dev
In-Reply-To: <20240306070125.27071-5-thomas.hellstrom@linux.intel.com>
References: <20240306070125.27071-5-thomas.hellstrom@linux.intel.com>
TO: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
TO: intel-xe@lists.freedesktop.org
TO: intel-gfx@lists.freedesktop.org
CC: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
CC: "Christian König" <christian.koenig@amd.com>
CC: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
CC: dri-devel@lists.freedesktop.org

Hi Thomas,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-misc/drm-misc-next]
[also build test WARNING on drm-intel/for-linux-next]
[cannot apply to drm-intel/for-linux-next-fixes drm-tip/drm-tip linus/master v6.8 next-20240312]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Thomas-Hellstr-m/drm-ttm-Allow-TTM-LRU-list-nodes-of-different-types/20240306-150355
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:    https://lore.kernel.org/r/20240306070125.27071-5-thomas.hellstrom%40linux.intel.com
patch subject: [PATCH v4 4/4] drm/ttm: Allow continued swapout after -ENOSPC failure
:::::: branch date: 7 days ago
:::::: commit date: 7 days ago
config: x86_64-randconfig-r051-20240312 (https://download.01.org/0day-ci/archive/20240313/202403130229.ada3fBTp-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Julia Lawall <julia.lawall@inria.fr>
| Closes: https://lore.kernel.org/r/202403130229.ada3fBTp-lkp@intel.com/

cocci warnings: (new ones prefixed by >>)
>> drivers/gpu/drm/ttm/ttm_device.c:156:1-10: second lock on line 176
   drivers/gpu/drm/ttm/ttm_device.c:176:4-13: second lock on line 176

vim +156 drivers/gpu/drm/ttm/ttm_device.c

f9e2a03e110ad0 Christian König  2020-10-06  146  
f9e2a03e110ad0 Christian König  2020-10-06  147  int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
f9e2a03e110ad0 Christian König  2020-10-06  148  		       gfp_t gfp_flags)
f9e2a03e110ad0 Christian König  2020-10-06  149  {
5d05b988f1c0fd Christian König  2021-06-08  150  	struct ttm_resource_cursor cursor;
f9e2a03e110ad0 Christian König  2020-10-06  151  	struct ttm_resource_manager *man;
6a9b028994025f Christian König  2021-07-16  152  	struct ttm_resource *res;
5d05b988f1c0fd Christian König  2021-06-08  153  	unsigned i;
ebd59851c796c2 Christian König  2020-10-06  154  	int ret;
ebd59851c796c2 Christian König  2020-10-06  155  
a1f091f8ef2b68 Christian König  2020-10-06 @156  	spin_lock(&bdev->lru_lock);
f9e2a03e110ad0 Christian König  2020-10-06  157  	for (i = TTM_PL_SYSTEM; i < TTM_NUM_MEM_TYPES; ++i) {
f9e2a03e110ad0 Christian König  2020-10-06  158  		man = ttm_manager_type(bdev, i);
f9e2a03e110ad0 Christian König  2020-10-06  159  		if (!man || !man->use_tt)
f9e2a03e110ad0 Christian König  2020-10-06  160  			continue;
f9e2a03e110ad0 Christian König  2020-10-06  161  
5d05b988f1c0fd Christian König  2021-06-08  162  		ttm_resource_manager_for_each_res(man, &cursor, res) {
5d05b988f1c0fd Christian König  2021-06-08  163  			struct ttm_buffer_object *bo = res->bo;
81b0d0e4f81155 Christian König  2022-06-03  164  			uint32_t num_pages;
ebd59851c796c2 Christian König  2020-10-06  165  
9a9a8fe2675133 Thomas Hellström 2023-03-07  166  			if (!bo || bo->resource != res)
81b0d0e4f81155 Christian König  2022-06-03  167  				continue;
81b0d0e4f81155 Christian König  2022-06-03  168  
81b0d0e4f81155 Christian König  2022-06-03  169  			num_pages = PFN_UP(bo->base.size);
ebd59851c796c2 Christian König  2020-10-06  170  			ret = ttm_bo_swapout(bo, ctx, gfp_flags);
14fb6041b4dee4 Thomas Hellström 2024-03-06  171  			/* Couldn't swap out, and retained the lru_lock */
14fb6041b4dee4 Thomas Hellström 2024-03-06  172  			if (ret == -EBUSY)
14fb6041b4dee4 Thomas Hellström 2024-03-06  173  				continue;
14fb6041b4dee4 Thomas Hellström 2024-03-06  174  			/* Couldn't swap out and dropped the lru_lock */
14fb6041b4dee4 Thomas Hellström 2024-03-06  175  			if (ret == -ENOSPC) {
14fb6041b4dee4 Thomas Hellström 2024-03-06 @176  				spin_lock(&bdev->lru_lock);
14fb6041b4dee4 Thomas Hellström 2024-03-06  177  				continue;
cd259b6156f7e7 Thomas Hellström 2024-03-06  178  			}
14fb6041b4dee4 Thomas Hellström 2024-03-06  179  			/*
14fb6041b4dee4 Thomas Hellström 2024-03-06  180  			 * Dropped the lock and either succeeded or
14fb6041b4dee4 Thomas Hellström 2024-03-06  181  			 * hit an error that forces us to break.
14fb6041b4dee4 Thomas Hellström 2024-03-06  182  			 */
cd259b6156f7e7 Thomas Hellström 2024-03-06  183  			ttm_resource_cursor_fini(&cursor);
14fb6041b4dee4 Thomas Hellström 2024-03-06  184  			return ret ? ret : num_pages;
ebd59851c796c2 Christian König  2020-10-06  185  		}
cd259b6156f7e7 Thomas Hellström 2024-03-06  186  	}
cd259b6156f7e7 Thomas Hellström 2024-03-06  187  	ttm_resource_cursor_fini_locked(&cursor);
a1f091f8ef2b68 Christian König  2020-10-06  188  	spin_unlock(&bdev->lru_lock);
ebd59851c796c2 Christian König  2020-10-06  189  	return 0;
ebd59851c796c2 Christian König  2020-10-06  190  }
f9e2a03e110ad0 Christian König  2020-10-06  191  EXPORT_SYMBOL(ttm_device_swapout);
ebd59851c796c2 Christian König  2020-10-06  192  
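
Coccinelle flags line 176 because it sees spin_lock() taken at line 156 and again at line 176, with no visible unlock in between. It also flags line 176 against itself on later loop iterations. According to the comments in the listing, the unlock happens inside ttm_bo_swapout() whenever it returns anything other than -EBUSY. The following userspace model of that return-code convention is a sketch under that assumption. The lock is a hypothetical boolean flag whose asserts would fire on a real double lock, and `fake_bo_swapout()` and `fake_device_swapout()` are stand-ins, not TTM functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for bdev->lru_lock; not a real spinlock. */
static bool lru_locked;

static void fake_lock(void)   { assert(!lru_locked); lru_locked = true; }
static void fake_unlock(void) { assert(lru_locked);  lru_locked = false; }

/*
 * Models the convention in the comments above: -EBUSY returns with the
 * lock still held; success, -ENOSPC and other errors return with it dropped.
 */
static int fake_bo_swapout(int outcome)
{
	if (outcome == -EBUSY)
		return -EBUSY;
	fake_unlock();
	return outcome;
}

/*
 * Same loop shape as ttm_device_swapout(): returns npages on the first
 * success, the first hard error, or 0 if every object was skipped.
 */
static int fake_device_swapout(const int *outcomes, int n, int npages)
{
	int i, ret;

	fake_lock();
	for (i = 0; i < n; i++) {
		ret = fake_bo_swapout(outcomes[i]);
		if (ret == -EBUSY)
			continue;                /* lock retained */
		if (ret == -ENOSPC) {
			fake_lock();             /* callee dropped it: re-take */
			continue;
		}
		return ret ? ret : npages;   /* lock already dropped */
	}
	fake_unlock();
	return 0;
}
```

If the convention holds, every path through the loop leaves the flag consistent. That suggests the warning may be a false positive for this code, though this depends on ttm_bo_swapout() actually following the convention on every return path.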

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


end of thread, other threads:[~2024-03-13 12:26 UTC | newest]

Thread overview: 21+ messages
2024-03-06  7:01 [PATCH v4 0/4] TTM unlockable restartable LRU list iteration Thomas Hellström
2024-03-06  7:01 ` [PATCH v4 1/4] drm/ttm: Allow TTM LRU list nodes of different types Thomas Hellström
2024-03-06  7:01 ` [PATCH v4 2/4] drm/ttm: Use LRU hitches Thomas Hellström
2024-03-08 13:20   ` Somalapuram, Amaranath
2024-03-11 13:04     ` Thomas Hellström
2024-03-06  7:01 ` [PATCH v4 3/4] drm/ttm, drm/amdgpu, drm/xe: Consider hitch moves within bulk sublist moves Thomas Hellström
2024-03-06  7:01 ` [PATCH v4 4/4] drm/ttm: Allow continued swapout after -ENOSPC failure Thomas Hellström
2024-03-06  7:06 ` ✓ CI.Patch_applied: success for TTM unlockable restartable LRU list iteration (rev4) Patchwork
2024-03-06  7:06 ` ✓ CI.checkpatch: " Patchwork
2024-03-06  7:07 ` ✓ CI.KUnit: " Patchwork
2024-03-06  7:18 ` ✓ CI.Build: " Patchwork
2024-03-06  7:18 ` ✓ CI.Hooks: " Patchwork
2024-03-06  7:19 ` ✗ CI.checksparse: warning " Patchwork
2024-03-06  7:54 ` ✓ CI.BAT: success " Patchwork
2024-03-06 13:40 ` ✗ Fi.CI.SPARSE: warning " Patchwork
2024-03-06 13:54 ` ✓ Fi.CI.BAT: success " Patchwork
2024-03-07 13:11 ` ✗ Fi.CI.IGT: failure " Patchwork
2024-03-08  7:43 ` [PATCH v4 0/4] TTM unlockable restartable LRU list iteration Somalapuram, Amaranath
2024-03-11 13:07   ` Thomas Hellström
2024-03-13 12:26     ` Thomas Hellström
  -- strict thread matches above, loose matches on Subject: below --
2024-03-12 18:55 [PATCH v4 4/4] drm/ttm: Allow continued swapout after -ENOSPC failure kernel test robot
