Date: Fri, 7 Mar 2025 09:35:39 -0800
From: Matthew Brost
Cc: Danilo Krummrich, Christian König, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal
Subject: Re: [PATCH v7 3/3] drm/sched: Update timedout_job()'s documentation
References: <20250305130551.136682-2-phasta@kernel.org> <20250305130551.136682-5-phasta@kernel.org> <0ff8b5ddce856a7d9f5ffbabcb220e345b2dcfaa.camel@mailbox.org>
In-Reply-To: <0ff8b5ddce856a7d9f5ffbabcb220e345b2dcfaa.camel@mailbox.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Mar 07, 2025 at 10:37:04AM +0100, Philipp Stanner wrote:
> On Thu, 2025-03-06 at 12:57 -0800, Matthew Brost wrote:
> > On Wed, Mar 05, 2025 at 02:05:52PM +0100, Philipp Stanner wrote:
> > > drm_sched_backend_ops.timedout_job()'s documentation is outdated. It
> > > mentions the deprecated function drm_sched_resubmit_jobs(). Furthermore,
> > > it does not point out the important distinction between hardware and
> > > firmware schedulers.
> > >
> > > Since firmware schedulers typically only use one entity per scheduler,
> > > timeout handling is significantly more simple because the entity the
> > > faulted job came from can just be killed without affecting innocent
> > > processes.
> > >
> > > Update the documentation with that distinction and other details.
> > >
> > > Reformat the docstring to work to a unified style with the other
> > > handles.
> > >
> >
> > Looks really good, one suggestion.
>
> Already merged. But I'm working already on the TODO and could address
> your feedback in that followup.
>
> Of course, would also be great if you could provide a proposal in a
> patch? :)
>
> >
> > > Signed-off-by: Philipp Stanner
> > > ---
> > >  include/drm/gpu_scheduler.h | 78 ++++++++++++++++++++++---------------
> > >  1 file changed, 47 insertions(+), 31 deletions(-)
> > >
> > > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > > index 6381baae8024..1a7e377d4cbb 100644
> > > --- a/include/drm/gpu_scheduler.h
> > > +++ b/include/drm/gpu_scheduler.h
> > > @@ -383,8 +383,15 @@ struct drm_sched_job {
> > >   struct xarray dependencies;
> > >  };
> > >
> > > +/**
> > > + * enum drm_gpu_sched_stat - the scheduler's status
> > > + *
> > > + * @DRM_GPU_SCHED_STAT_NONE: Reserved. Do not use.
> > > + * @DRM_GPU_SCHED_STAT_NOMINAL: Operation succeeded.
> > > + * @DRM_GPU_SCHED_STAT_ENODEV: Error: Device is not available anymore.
> > > + */
> > >  enum drm_gpu_sched_stat {
> > > - DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
> > > + DRM_GPU_SCHED_STAT_NONE,
> > >   DRM_GPU_SCHED_STAT_NOMINAL,
> > >   DRM_GPU_SCHED_STAT_ENODEV,
> > >  };
> > > @@ -447,43 +454,52 @@ struct drm_sched_backend_ops {
> > >   * @timedout_job: Called when a job has taken too long to execute,
> > >   * to trigger GPU recovery.
> > >   *
> > > - * This method is called in a workqueue context.
> > > + * @sched_job: The job that has timed out
> > >   *
> > > - * Drivers typically issue a reset to recover from GPU hangs, and this
> > > - * procedure usually follows the following workflow:
> > > + * Drivers typically issue a reset to recover from GPU hangs.
> > > + * This procedure looks very different depending on whether a firmware
> > > + * or a hardware scheduler is being used.
> > >   *
> > > - * 1. Stop the scheduler using drm_sched_stop(). This will park the
> > > - *    scheduler thread and cancel the timeout work, guaranteeing that
> > > - *    nothing is queued while we reset the hardware queue
> > > - * 2. Try to gracefully stop non-faulty jobs (optional)
> > > - * 3. Issue a GPU reset (driver-specific)
> > > - * 4. Re-submit jobs using drm_sched_resubmit_jobs()
> > > - * 5. Restart the scheduler using drm_sched_start(). At that point, new
> > > - *    jobs can be queued, and the scheduler thread is unblocked
> > > + * For a FIRMWARE SCHEDULER, each ring has one scheduler, and each
> > > + * scheduler has one entity. Hence, the steps taken typically look as
> > > + * follows:
> > > + *
> > > + * 1. Stop the scheduler using drm_sched_stop(). This will pause the
> > > + *    scheduler workqueues and cancel the timeout work, guaranteeing
> > > + *    that nothing is queued while the ring is being removed.
> > > + * 2. Remove the ring. The firmware will make sure that the
> > > + *    corresponding parts of the hardware are resetted, and that other
> > > + *    rings are not impacted.
> > > + * 3. Kill the entity and the associated scheduler.
> >
> > Xe doesn't do step 3.
> >
> > It does:
> > - Ban entity / scheduler so future submissions are a NOP. This would be
> >   submissions with unmet dependencies. Submissions at the IOCTL are
> >   disallowed
> > - Signal all jobs' fences on the pending list
> > - Restart scheduler so free_job() is naturally called
> >
> > I'm unsure if this is how other firmware schedulers do this, but it seems
> > to work quite well in Xe.

Missed this part of the reply.

>
> Alright, so if I interpret this correctly you do that to avoid our
> infamous memory leaks. That makes sense.
>

Yes.

> The memory leaks are documented in drm_sched_fini()'s docu, but it
> could make sense to mention them here, too.
>

The jobs in Xe ref count the scheduler, so we never call drm_sched_fini()
until free_job() has been called for all jobs in the pending list and
dependency queues.

> … thinking about it, we probably actually have to rephrase this line.
> Just tearing down entity & sched makes those leaks very likely. Argh.
>
> Nouveau, also a firmware scheduler, has effectively a copy of the
> pending_list and also ensures that all fences get signalled. Only once
> that copy of the pending list is empty it calls into drm_sched_fini().
> Take a look at nouveau_sched.c if you want, the code is quite
> straightforward.
>

Same idea in Xe, I think; we just access the pending list directly. Let
me look at what Nouveau is doing before posting an updated doc patch
here.

Matt

> P.
> >
> > Matt
> >
> > > + *
> > > + *
> > > + * For a HARDWARE SCHEDULER, a scheduler instance schedules jobs from
> > > + * one or more entities to one ring. This implies that all entities
> > > + * associated with the affected scheduler cannot be torn down, because
> > > + * this would effectively also affect innocent userspace processes which
> > > + * did not submit faulty jobs (for example).
> > > + *
> > > + * Consequently, the procedure to recover with a hardware scheduler
> > > + * should look like this:
> > > + *
> > > + * 1. Stop all schedulers impacted by the reset using drm_sched_stop().
> > > + * 2. Kill the entity the faulty job stems from.
> > > + * 3. Issue a GPU reset on all faulty rings (driver-specific).
> > > + * 4. Re-submit jobs on all schedulers impacted by re-submitting them to
> > > + *    the entities which are still alive.
> > > + * 5. Restart all schedulers that were stopped in step #1 using
> > > + *    drm_sched_start().
> > >   *
> > >   * Note that some GPUs have distinct hardware queues but need to reset
> > >   * the GPU globally, which requires extra synchronization between the
> > > - * timeout handler of the different &drm_gpu_scheduler. One way to
> > > - * achieve this synchronization is to create an ordered workqueue
> > > - * (using alloc_ordered_workqueue()) at the driver level, and pass this
> > > - * queue to drm_sched_init(), to guarantee that timeout handlers are
> > > - * executed sequentially. The above workflow needs to be slightly
> > > - * adjusted in that case:
> > > + * timeout handlers of different schedulers. One way to achieve this
> > > + * synchronization is to create an ordered workqueue (using
> > > + * alloc_ordered_workqueue()) at the driver level, and pass this queue
> > > + * as drm_sched_init()'s @timeout_wq parameter. This will guarantee
> > > + * that timeout handlers are executed sequentially.
> > >   *
> > > - * 1. Stop all schedulers impacted by the reset using drm_sched_stop()
> > > - * 2. Try to gracefully stop non-faulty jobs on all queues impacted by
> > > - *    the reset (optional)
> > > - * 3. Issue a GPU reset on all faulty queues (driver-specific)
> > > - * 4. Re-submit jobs on all schedulers impacted by the reset using
> > > - *    drm_sched_resubmit_jobs()
> > > - * 5. Restart all schedulers that were stopped in step #1 using
> > > - *    drm_sched_start()
> > > + * Return: The scheduler's status, defined by &enum drm_gpu_sched_stat
> > >   *
> > > - * Return DRM_GPU_SCHED_STAT_NOMINAL, when all is normal,
> > > - * and the underlying driver has started or completed recovery.
> > > - *
> > > - * Return DRM_GPU_SCHED_STAT_ENODEV, if the device is no longer
> > > - * available, i.e. has been unplugged.
> > >   */
> > >   enum drm_gpu_sched_stat (*timedout_job)(struct drm_sched_job *sched_job);
> > >
> > > --
> > > 2.48.1
> > >
>