From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.14])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id A91651448E3
	for <linux-kernel@vger.kernel.org>; Thu,  6 Mar 2025 20:57:09 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=198.175.65.14
ARC-Seal:i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1741294631; cv=fail; b=eB7poki8bvN4K58FLlkyAYS9SIVU6ToiSJ1NjXAjQ94zFG+01VmK1NAZCaOamluuJKRmJFNwd67/YdOXDzlV8FM+2FkIOMKnSq9J44m6tYzi+4NooYEOWoApLqbTS5DD5/FCyDCWS88hqU6RGCOuXEkpeoBXPJY+URomybnk2d8=
ARC-Message-Signature:i=2; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1741294631; c=relaxed/simple;
	bh=+Jca9lHWIKszEc9Y3Ty+i9h4VhgrgaJcfXO70UFws6Y=;
	h=Date:From:To:CC:Subject:Message-ID:References:Content-Type:
	 Content-Disposition:In-Reply-To:MIME-Version; b=k/5UzpK5qh0FJg9A3NMW1+lNqWLLBaLU+y45iR3QtVIQQ455Vk5YLxWiBVhz4BQWsf7zUc2FPsbpvEhOMK0jHThHi2sLP0znXYYEpYWudSNhHHX0O81yhRBpn+X7NWAxLMjEcjg0Dt+xrvyYtMP9VAGNIMkVP8fs1IP8okEZzAo=
ARC-Authentication-Results:i=2; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=OJqbMD7I; arc=fail smtp.client-ip=198.175.65.14
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="OJqbMD7I"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1741294630; x=1772830630;
  h=date:from:to:cc:subject:message-id:references:
   in-reply-to:mime-version;
  bh=+Jca9lHWIKszEc9Y3Ty+i9h4VhgrgaJcfXO70UFws6Y=;
  b=OJqbMD7ItVG5pA3hamwKo7bEpIgh/hogObn82+4RBuO+G/OUH29SGchv
   15aWvv7V6mcLcfHMYVOh+6iIsAEXFghNPyCUVgy8C5mV3/McZnO5B8pej
   RItFTPfxc3LkW22uFQWcOPpd8MAxqIbV5hB+BixOrkCnOkEWDRq86vUYS
   TZHrH4bDLp0V9X3eehvWyFZKCP9pDj8jSM87+0DkH70HLbDwdTa0Rc/yL
   rDcgEA+clXLuW8l7s5wKMdYv7jkQI3g5a3KP8aUQo6yievCRYX5bjZEQq
   Z7Js4KDE2QFA34J4AnZAK2QrKqCUJGqQFKBEVZZZKEkcwuwbNmohs2dfe
   A==;
X-CSE-ConnectionGUID: xEnxeaFQTrCUROya9QU6kg==
X-CSE-MsgGUID: iy5Rcxz2Qg6gwQfO+Cnvrg==
X-IronPort-AV: E=McAfee;i="6700,10204,11365"; a="46101340"
X-IronPort-AV: E=Sophos;i="6.14,227,1736841600"; 
   d="scan'208";a="46101340"
Received: from orviesa002.jf.intel.com ([10.64.159.142])
  by orvoesa106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Mar 2025 12:57:09 -0800
X-CSE-ConnectionGUID: lQbiUAs5SPeXkVfKn3cj3w==
X-CSE-MsgGUID: kiRfK3lxQXuyg+8J/ZsE/A==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="6.14,227,1736841600"; 
   d="scan'208";a="149928326"
Received: from orsmsx901.amr.corp.intel.com ([10.22.229.23])
  by orviesa002.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Mar 2025 12:57:09 -0800
Received: from ORSMSX901.amr.corp.intel.com (10.22.229.23) by
 ORSMSX901.amr.corp.intel.com (10.22.229.23) with Microsoft SMTP Server
 (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.2.1544.14; Thu, 6 Mar 2025 12:57:08 -0800
Received: from orsedg603.ED.cps.intel.com (10.7.248.4) by
 ORSMSX901.amr.corp.intel.com (10.22.229.23) with Microsoft SMTP Server
 (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.2.1544.14 via Frontend Transport; Thu, 6 Mar 2025 12:57:08 -0800
Received: from NAM10-MW2-obe.outbound.protection.outlook.com (104.47.55.47) by
 edgegateway.intel.com (134.134.137.100) with Microsoft SMTP Server
 (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.1.2507.44; Thu, 6 Mar 2025 12:57:08 -0800
ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none;
 b=Njeo6LlvNTIFBbaVCyHF6GQSMwzf1VP1Z4MhSwTWOGvVfxk/HF4FXQiu2S0Oia811FeGzlKiUY6BKWKn5QaKdGk54RMSPmFmfhqMhjyE+irDdbDby3uytnBpzuy/rYrjZGj7yzLi5JrJyEkhrqkpeyxyxdUGQ4E+a3RmAaZ4ZoeTMGxTJuzJY36a8o/crK/9VYBt7/cxtshP6hwHYZjNfJkC+tQ2A965adnnkxpKvTbadMS5Th3TYlNCblKpnAhvzVPOeg7ZZpxWUXCuM99BOZhHeno+z4UUTmypz4cC9v5AP8gE0VZKXZAJdG7yGXWfBLpiMlS2VVoKvK7cIoibjA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
 s=arcselector10001;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=OqLwkKZ6/4Ed+B08i9Gqf76jl7JotCWUAh7Lb81VX60=;
 b=czVoX5mGzoak9DVQCdiik8kfWz+jByk75ui2dA0XcL/DXtVLdr20M+zuccz+/cMq18HlBzrE7g2M2WcUXcl1jMNGhinlY9ViZsHRybAX9lfFeT1XfcutD7Khd9pom/0H+EP86YY0uMgGDoZYNzl9RbmlrAL0L9D5SxSO/eKJi8xo7nKRKeI1L8MVDK449NjHqcvFKSPLUjyuwK0Ljq1bkCl0nrtsms+S18CdPFtQdrkTyVweNc+hyWvk9exS4eBR9jZiHLS20MFGxkVrzx8PRDDOQ594AQNUhTOBGmX1DltyugnZMrV8xeObaLzpHWsVbs4Z4VzlexsH1uzQSrspCw==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=intel.com; dmarc=pass action=none header.from=intel.com;
 dkim=pass header.d=intel.com; arc=none
Authentication-Results: dkim=none (message not signed)
 header.d=none;dmarc=none action=none header.from=intel.com;
Received: from PH7PR11MB6522.namprd11.prod.outlook.com (2603:10b6:510:212::12)
 by SA1PR11MB6894.namprd11.prod.outlook.com (2603:10b6:806:2b1::14) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8511.18; Thu, 6 Mar
 2025 20:56:39 +0000
Received: from PH7PR11MB6522.namprd11.prod.outlook.com
 ([fe80::9e94:e21f:e11a:332]) by PH7PR11MB6522.namprd11.prod.outlook.com
 ([fe80::9e94:e21f:e11a:332%3]) with mapi id 15.20.8511.017; Thu, 6 Mar 2025
 20:56:38 +0000
Date: Thu, 6 Mar 2025 12:57:45 -0800
From: Matthew Brost <matthew.brost@intel.com>
To: Philipp Stanner <phasta@kernel.org>
CC: Danilo Krummrich <dakr@kernel.org>, Christian =?iso-8859-1?Q?K=F6nig?=
	<ckoenig.leichtzumerken@gmail.com>, Maarten Lankhorst
	<maarten.lankhorst@linux.intel.com>, Maxime Ripard <mripard@kernel.org>,
	Thomas Zimmermann <tzimmermann@suse.de>, David Airlie <airlied@gmail.com>,
	Simona Vetter <simona@ffwll.ch>, Sumit Semwal <sumit.semwal@linaro.org>,
	<dri-devel@lists.freedesktop.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v7 3/3] drm/sched: Update timedout_job()'s documentation
Message-ID: <Z8oMSWulN0mF43aB@lstrano-desk.jf.intel.com>
References: <20250305130551.136682-2-phasta@kernel.org>
 <20250305130551.136682-5-phasta@kernel.org>
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <20250305130551.136682-5-phasta@kernel.org>
X-ClientProxiedBy: MW4PR03CA0216.namprd03.prod.outlook.com
 (2603:10b6:303:b9::11) To PH7PR11MB6522.namprd11.prod.outlook.com
 (2603:10b6:510:212::12)
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
X-MS-PublicTrafficType: Email
X-MS-TrafficTypeDiagnostic: PH7PR11MB6522:EE_|SA1PR11MB6894:EE_
X-MS-Office365-Filtering-Correlation-Id: 0b6c7261-4239-4236-b1c5-08dd5cf163b3
X-MS-Exchange-SenderADCheck: 1
X-MS-Exchange-AntiSpam-Relay: 0
X-Microsoft-Antispam: BCL:0;ARA:13230040|1800799024|366016|376014|7416014|7053199007;
X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:PH7PR11MB6522.namprd11.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230040)(1800799024)(366016)(376014)(7416014)(7053199007);DIR:OUT;SFP:1101;
X-MS-Exchange-CrossTenant-Network-Message-Id: 0b6c7261-4239-4236-b1c5-08dd5cf163b3
X-MS-Exchange-CrossTenant-AuthSource: PH7PR11MB6522.namprd11.prod.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 06 Mar 2025 20:56:38.8070
 (UTC)
X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted
X-MS-Exchange-CrossTenant-Id: 46c98d88-e344-4ed4-8496-4ed7712e255d
X-MS-Exchange-CrossTenant-MailboxType: HOSTED
X-MS-Exchange-CrossTenant-UserPrincipalName: /Z1Dln2TDcp30bl0+oLZhmiZZgrezsX76GFoAeBXZ5MVzkbTkgKI4aQD3SZCPyol4HOkoL5h/UddsoYFeBPsXA==
X-MS-Exchange-Transport-CrossTenantHeadersStamped: SA1PR11MB6894
X-OriginatorOrg: intel.com

On Wed, Mar 05, 2025 at 02:05:52PM +0100, Philipp Stanner wrote:
> drm_sched_backend_ops.timedout_job()'s documentation is outdated. It
> mentions the deprecated function drm_sched_resubmit_jobs(). Furthermore,
> it does not point out the important distinction between hardware and
> firmware schedulers.
> 
> Since firmware schedulers typically only use one entity per scheduler,
> timeout handling is significantly more simple because the entity the
> faulted job came from can just be killed without affecting innocent
> processes.
> 
> Update the documentation with that distinction and other details.
> 
> Reformat the docstring to work to a unified style with the other
> handles.
> 

Looks really good, one suggestion.

> Signed-off-by: Philipp Stanner <phasta@kernel.org>
> ---
>  include/drm/gpu_scheduler.h | 78 ++++++++++++++++++++++---------------
>  1 file changed, 47 insertions(+), 31 deletions(-)
> 
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 6381baae8024..1a7e377d4cbb 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -383,8 +383,15 @@ struct drm_sched_job {
>  	struct xarray			dependencies;
>  };
>  
> +/**
> + * enum drm_gpu_sched_stat - the scheduler's status
> + *
> + * @DRM_GPU_SCHED_STAT_NONE: Reserved. Do not use.
> + * @DRM_GPU_SCHED_STAT_NOMINAL: Operation succeeded.
> + * @DRM_GPU_SCHED_STAT_ENODEV: Error: Device is not available anymore.
> + */
>  enum drm_gpu_sched_stat {
> -	DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
> +	DRM_GPU_SCHED_STAT_NONE,
>  	DRM_GPU_SCHED_STAT_NOMINAL,
>  	DRM_GPU_SCHED_STAT_ENODEV,
>  };
> @@ -447,43 +454,52 @@ struct drm_sched_backend_ops {
>  	 * @timedout_job: Called when a job has taken too long to execute,
>  	 * to trigger GPU recovery.
>  	 *
> -	 * This method is called in a workqueue context.
> +	 * @sched_job: The job that has timed out
>  	 *
> -	 * Drivers typically issue a reset to recover from GPU hangs, and this
> -	 * procedure usually follows the following workflow:
> +	 * Drivers typically issue a reset to recover from GPU hangs.
> +	 * This procedure looks very different depending on whether a firmware
> +	 * or a hardware scheduler is being used.
>  	 *
> -	 * 1. Stop the scheduler using drm_sched_stop(). This will park the
> -	 *    scheduler thread and cancel the timeout work, guaranteeing that
> -	 *    nothing is queued while we reset the hardware queue
> -	 * 2. Try to gracefully stop non-faulty jobs (optional)
> -	 * 3. Issue a GPU reset (driver-specific)
> -	 * 4. Re-submit jobs using drm_sched_resubmit_jobs()
> -	 * 5. Restart the scheduler using drm_sched_start(). At that point, new
> -	 *    jobs can be queued, and the scheduler thread is unblocked
> +	 * For a FIRMWARE SCHEDULER, each ring has one scheduler, and each
> +	 * scheduler has one entity. Hence, the steps taken typically look as
> +	 * follows:
> +	 *
> +	 * 1. Stop the scheduler using drm_sched_stop(). This will pause the
> +	 *    scheduler workqueues and cancel the timeout work, guaranteeing
> +	 *    that nothing is queued while the ring is being removed.
> +	 * 2. Remove the ring. The firmware will make sure that the
> +	 *    corresponding parts of the hardware are resetted, and that other
> +	 *    rings are not impacted.
> +	 * 3. Kill the entity and the associated scheduler.

Xe doesn't do step 3.

It does:
- Ban the entity / scheduler so future submissions are a NOP. These would
  be submissions with unmet dependencies. Submissions at the IOCTL are
  disallowed.
- Signal the fences of all jobs on the pending list.
- Restart the scheduler so free_job() is naturally called.

I'm unsure if this is how other firmware schedulers do this, but it seems
to work quite well in Xe.
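
As a rough illustration of the sequence above, a userspace model in C
could look like the sketch below. This is not the Xe code and not the
DRM scheduler API: every toy_* name is hypothetical, the fence is
reduced to a flag plus an error code, and the initial stop is assumed
from step 1 of the patch rather than taken from the list above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model. Every toy_* name is hypothetical, not DRM or Xe API. */
struct toy_job {
	bool fence_signaled;		/* stands in for the job's fence */
	int fence_error;
	struct toy_job *next;
};

struct toy_sched {
	bool banned;			/* set once a job on this queue timed out */
	bool running;
	struct toy_job *pending;	/* submitted, not yet freed */
	int freed;			/* counts modeled free_job() calls */
};

/* IOCTL path: new submissions to a banned queue are rejected. */
static int toy_submit(struct toy_sched *s, struct toy_job *job)
{
	if (s->banned)
		return -ECANCELED;
	job->next = s->pending;
	s->pending = job;
	return 0;
}

/* Run path for already-queued jobs: a NOP (returns false) once banned. */
static bool toy_run_job(struct toy_sched *s, struct toy_job *job)
{
	(void)job;
	return !s->banned;
}

/* Regular cleanup path: free jobs whose fence has signaled. */
static void toy_free_signaled(struct toy_sched *s)
{
	while (s->running && s->pending && s->pending->fence_signaled) {
		s->pending = s->pending->next;
		s->freed++;
	}
}

/* Timeout handling: stop, ban, signal pending fences, restart. */
static void toy_timedout(struct toy_sched *s)
{
	struct toy_job *job;

	assert(s != NULL);
	s->running = false;		/* assumed: stop before touching the list */
	s->banned = true;
	for (job = s->pending; job; job = job->next) {
		if (!job->fence_signaled) {
			job->fence_error = -ECANCELED;
			job->fence_signaled = true;
		}
	}
	s->running = true;		/* restart: cleanup runs the usual way */
	toy_free_signaled(s);
}
```

The point of the model is that nothing is torn down in the timeout
handler itself. Once the fences are signaled and the queue is running
again, the jobs leave the pending list through the same cleanup path
they would take after normal completion.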

Matt

> +	 *
> +	 *
> +	 * For a HARDWARE SCHEDULER, a scheduler instance schedules jobs from
> +	 * one or more entities to one ring. This implies that all entities
> +	 * associated with the affected scheduler cannot be torn down, because
> +	 * this would effectively also affect innocent userspace processes which
> +	 * did not submit faulty jobs (for example).
> +	 *
> +	 * Consequently, the procedure to recover with a hardware scheduler
> +	 * should look like this:
> +	 *
> +	 * 1. Stop all schedulers impacted by the reset using drm_sched_stop().
> +	 * 2. Kill the entity the faulty job stems from.
> +	 * 3. Issue a GPU reset on all faulty rings (driver-specific).
> +	 * 4. Re-submit jobs on all schedulers impacted by re-submitting them to
> +	 *    the entities which are still alive.
> +	 * 5. Restart all schedulers that were stopped in step #1 using
> +	 *    drm_sched_start().
>  	 *
>  	 * Note that some GPUs have distinct hardware queues but need to reset
>  	 * the GPU globally, which requires extra synchronization between the
> -	 * timeout handler of the different &drm_gpu_scheduler. One way to
> -	 * achieve this synchronization is to create an ordered workqueue
> -	 * (using alloc_ordered_workqueue()) at the driver level, and pass this
> -	 * queue to drm_sched_init(), to guarantee that timeout handlers are
> -	 * executed sequentially. The above workflow needs to be slightly
> -	 * adjusted in that case:
> +	 * timeout handlers of different schedulers. One way to achieve this
> +	 * synchronization is to create an ordered workqueue (using
> +	 * alloc_ordered_workqueue()) at the driver level, and pass this queue
> +	 * as drm_sched_init()'s @timeout_wq parameter. This will guarantee
> +	 * that timeout handlers are executed sequentially.
>  	 *
> -	 * 1. Stop all schedulers impacted by the reset using drm_sched_stop()
> -	 * 2. Try to gracefully stop non-faulty jobs on all queues impacted by
> -	 *    the reset (optional)
> -	 * 3. Issue a GPU reset on all faulty queues (driver-specific)
> -	 * 4. Re-submit jobs on all schedulers impacted by the reset using
> -	 *    drm_sched_resubmit_jobs()
> -	 * 5. Restart all schedulers that were stopped in step #1 using
> -	 *    drm_sched_start()
> +	 * Return: The scheduler's status, defined by &enum drm_gpu_sched_stat
>  	 *
> -	 * Return DRM_GPU_SCHED_STAT_NOMINAL, when all is normal,
> -	 * and the underlying driver has started or completed recovery.
> -	 *
> -	 * Return DRM_GPU_SCHED_STAT_ENODEV, if the device is no longer
> -	 * available, i.e. has been unplugged.
>  	 */
>  	enum drm_gpu_sched_stat (*timedout_job)(struct drm_sched_job *sched_job);
>  
> -- 
> 2.48.1
>