From mboxrd@z Thu Jan 1 00:00:00 1970
From: Thomas Hellström
To: intel-xe@lists.freedesktop.org
Cc: Thomas Hellström, Matthew Brost, Maarten Lankhorst, Michal Mrozek,
 John Falkowski, Rodrigo Vivi, Lahtinen Joonas, David Howells,
 Christian Brauner, Kees Cook, Davidlohr Bueso, Christian König,
 Dave Airlie, Simona Vetter, dri-devel@lists.freedesktop.org, LMKL
Subject: [PATCH 2/4] drm/xe: Add fault injection for rebind worker -ENOSPC
Date: Fri, 12 Jun 2026 15:53:38 +0200
Message-ID: <20260612135340.116100-3-thomas.hellstrom@linux.intel.com>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260612135340.116100-1-thomas.hellstrom@linux.intel.com>
References: <20260612135340.116100-1-thomas.hellstrom@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain;
 charset=UTF-8
Content-Transfer-Encoding: 8bit

Add fault injection support using the kernel fault injection
infrastructure to inject -ENOSPC early in the success path of
preempt_rebind_work_func(), before xe_svm_notifier_lock() is taken,
testing the error handling paths without interference from real
resource exhaustion.

Injection is restricted to restartable VMs. When triggered, the worker
deactivates the VM (rebind_deactivated). Upcoming patches will then also
post an error event to userspace.

Enable via debugfs:
  echo 1 > /sys/kernel/debug/dri/0/fail_rebind/times
  echo 100 > /sys/kernel/debug/dri/0/fail_rebind/probability

Assisted-by: GitHub_Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström
---
 drivers/gpu/drm/xe/xe_debugfs.c |  4 +++-
 drivers/gpu/drm/xe/xe_vm.c      | 32 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm.h      |  5 +++++
 3 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
index 22b471303984..1a92c52ccd83 100644
--- a/drivers/gpu/drm/xe/xe_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_debugfs.c
@@ -35,8 +35,8 @@
 #ifdef CONFIG_DRM_XE_DEBUG
 #include "xe_bo_evict.h"
 #include "xe_migrate.h"
-#include "xe_vm.h"
 #endif
+#include "xe_vm.h"
 
 DECLARE_FAULT_ATTR(gt_reset_failure);
 DECLARE_FAULT_ATTR(inject_csc_hw_error);
@@ -612,6 +612,8 @@ void xe_debugfs_register(struct xe_device *xe)
 
 	fault_create_debugfs_attr("fail_gt_reset", root, &gt_reset_failure);
 
+	xe_vm_debugfs_register(root);
+
 	if (IS_SRIOV_PF(xe))
 		xe_sriov_pf_debugfs_register(xe, root);
 	else if (IS_SRIOV_VF(xe))
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 86ed8f31a219..b69a2e5bd9c9 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -18,6 +18,9 @@
 #include
 #include
 #include
+#ifdef CONFIG_DEBUG_FS
+#include
+#endif
 
 #include
 
@@ -43,6 +46,17 @@
 #include "xe_vm_madvise.h"
 #include "xe_wa.h"
 
+#ifdef CONFIG_FAULT_INJECTION
+static DECLARE_FAULT_ATTR(rebind_enospc);
+
+static void xe_vm_register_fault_attrs(struct dentry *root)
+{
+	fault_create_debugfs_attr("fail_rebind", root, &rebind_enospc);
+}
+#else
+static inline void xe_vm_register_fault_attrs(struct dentry *root) {}
+#endif
+
 static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm)
 {
 	return vm->gpuvm.r_obj;
@@ -529,6 +543,13 @@ static void preempt_rebind_work_func(struct work_struct *w)
 		goto out_unlock;
 	}
 
+#ifdef CONFIG_FAULT_INJECTION
+	if (xe_vm_is_restartable(vm) && should_fail(&rebind_enospc, 1)) {
+		err = -ENOSPC;
+		goto out_unlock;
+	}
+#endif
+
 #define retry_required(__tries, __vm) \
 	(IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT) ? \
 	(!(__tries)++ || __xe_vm_userptr_needs_repin(__vm)) : \
@@ -5042,3 +5063,14 @@ void xe_vm_remove_exec_queue(struct xe_vm *vm, struct xe_exec_queue *q)
 	}
 	up_write(&vm->exec_queues.lock);
 }
+
+#ifdef CONFIG_DEBUG_FS
+/**
+ * xe_vm_debugfs_register() - Register xe_vm debugfs entries
+ * @root: debugfs root dentry for this device
+ */
+void xe_vm_debugfs_register(struct dentry *root)
+{
+	xe_vm_register_fault_attrs(root);
+}
+#endif
diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
index 9ee44599cacd..0f9a38d97bf6 100644
--- a/drivers/gpu/drm/xe/xe_vm.h
+++ b/drivers/gpu/drm/xe/xe_vm.h
@@ -216,6 +216,11 @@ int xe_vm_restart_ioctl(struct drm_device *dev, void *data,
 			struct drm_file *file);
 void xe_vm_close_and_put(struct xe_vm *vm);
 
+#ifdef CONFIG_DEBUG_FS
+struct dentry;
+void xe_vm_debugfs_register(struct dentry *root);
+#endif
+
 static inline bool xe_vm_in_fault_mode(struct xe_vm *vm)
 {
 	return vm->flags & XE_VM_FLAG_FAULT_MODE;
-- 
2.54.0