From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sat, 27 Sep 2025 16:11:27 -0700
From: Matthew Brost
To: "Lis, Tomasz"
Subject: Re: [PATCH v2 26/34] drm/xe/vf: Replay GuC submission state on pause / unpause
References: <20250924011601.888293-1-matthew.brost@intel.com>
 <20250924011601.888293-27-matthew.brost@intel.com>
 <7d9b5b08-f9bb-40d1-8dd3-8a01ace4a76e@intel.com>
In-Reply-To: <7d9b5b08-f9bb-40d1-8dd3-8a01ace4a76e@intel.com>
Content-Type: text/plain; charset="us-ascii"
List-Id: Intel Xe graphics driver

On Sat, Sep 27, 2025 at 03:33:43PM +0200, Lis, Tomasz wrote:
> 
> On 9/24/2025 3:15 AM, Matthew Brost wrote:
> > Fixup GuC submission pause / unpause functions to properly replay any
> > possible state lost during VF post migration recovery.
> >
> > Signed-off-by: Matthew Brost
> > ---
> >  drivers/gpu/drm/xe/xe_gpu_scheduler.c        |  14 ++
> >  drivers/gpu/drm/xe/xe_gpu_scheduler.h        |   2 +
> >  drivers/gpu/drm/xe/xe_gt_sriov_vf.c          |   1 +
> >  drivers/gpu/drm/xe/xe_guc_exec_queue_types.h |  15 ++
> >  drivers/gpu/drm/xe/xe_guc_submit.c           | 225 +++++++++++++++++--
> >  drivers/gpu/drm/xe/xe_guc_submit.h           |   1 +
> >  drivers/gpu/drm/xe/xe_sched_job_types.h      |   4 +
> >  7 files changed, 247 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.c b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
> > index 455ccaf17314..af300adc7e1a 100644
> > --- a/drivers/gpu/drm/xe/xe_gpu_scheduler.c
> > +++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.c
> > @@ -135,3 +135,17 @@ void xe_sched_add_msg_locked(struct xe_gpu_scheduler *sched,
> >  	list_add_tail(&msg->link, &sched->msgs);
> >  	xe_sched_process_msg_queue(sched);
> >  }
> > +
> > +/**
> > + * xe_sched_add_msg_head() - Xe GPU scheduler add message to head of list
> > + * @sched: Xe GPU scheduler
> > + * @msg: Message to add
> > + */
> > +void xe_sched_add_msg_head(struct xe_gpu_scheduler *sched,
> > +			   struct xe_sched_msg *msg)
> > +{
> > +	lockdep_assert_held(&sched->base.job_list_lock);
> > +
> > +	list_add(&msg->link, &sched->msgs);
> > +	xe_sched_process_msg_queue(sched);
> > +}
> > diff --git a/drivers/gpu/drm/xe/xe_gpu_scheduler.h b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> > index e548b2aed95a..010003a6103a 100644
> > --- a/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> > +++ b/drivers/gpu/drm/xe/xe_gpu_scheduler.h
> > @@ -29,6 +29,8 @@ void xe_sched_add_msg(struct xe_gpu_scheduler *sched,
> >  		      struct xe_sched_msg *msg);
> >  void xe_sched_add_msg_locked(struct xe_gpu_scheduler *sched,
> >  			     struct xe_sched_msg *msg);
> > +void xe_sched_add_msg_head(struct xe_gpu_scheduler *sched,
> > +			   struct xe_sched_msg *msg);
> >
> >  static inline void xe_sched_msg_lock(struct xe_gpu_scheduler *sched)
> >  {
> > diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> > index a987560de2c7..91e7dbe80ab2 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> > +++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
> > @@ -1217,6 +1217,7 @@ static int vf_post_migration_fixups(struct xe_gt *gt)
> >  static void vf_post_migration_rearm(struct xe_gt *gt)
> >  {
> >  	xe_guc_ct_restart(&gt->uc.guc.ct);
> > +	xe_guc_submit_unpause_prepare(&gt->uc.guc);
> >  }
> >
> >  static void vf_post_migration_kickstart(struct xe_gt *gt)
> > diff --git a/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h b/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
> > index c30c0e3ccbbb..a3b034e4b205 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
> > +++ b/drivers/gpu/drm/xe/xe_guc_exec_queue_types.h
> > @@ -51,6 +51,21 @@ struct xe_guc_exec_queue {
> >  	wait_queue_head_t suspend_wait;
> >  	/** @suspend_pending: a suspend of the exec_queue is pending */
> >  	bool suspend_pending;
> > +	/**
> > +	 * @needs_cleanup: Needs a cleanup message during VF post migration
> > +	 * recovery.
> > +	 */
> > +	bool needs_cleanup;
> > +	/**
> > +	 * @needs_suspend: Needs a suspend message during VF post migration
> > +	 * recovery.
> > +	 */
> > +	bool needs_suspend;
> > +	/**
> > +	 * @needs_resume: Needs a resume message during VF post migration
> > +	 * recovery.
> > +	 */
> > +	bool needs_resume;
> >  };
> >
> >  #endif
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index 8bee65dd9ca6..b112a4a91a5b 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -425,6 +425,11 @@ static void set_exec_queue_destroyed(struct xe_exec_queue *q)
> >  	atomic_or(EXEC_QUEUE_STATE_DESTROYED, &q->guc->state);
> >  }
> >
> > +static void clear_exec_queue_destroyed(struct xe_exec_queue *q)
> > +{
> > +	atomic_and(~EXEC_QUEUE_STATE_DESTROYED, &q->guc->state);
> > +}
> > +
> >  static bool exec_queue_banned(struct xe_exec_queue *q)
> >  {
> >  	return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_BANNED;
> > @@ -505,7 +510,12 @@ static void set_exec_queue_extra_ref(struct xe_exec_queue *q)
> >  	atomic_or(EXEC_QUEUE_STATE_EXTRA_REF, &q->guc->state);
> >  }
> >
> > -static bool __maybe_unused exec_queue_pending_resume(struct xe_exec_queue *q)
> > +static void clear_exec_queue_extra_ref(struct xe_exec_queue *q)
> > +{
> > +	atomic_and(~EXEC_QUEUE_STATE_EXTRA_REF, &q->guc->state);
> > +}
> > +
> > +static bool exec_queue_pending_resume(struct xe_exec_queue *q)
> >  {
> >  	return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_PENDING_RESUME;
> >  }
> > @@ -520,7 +530,7 @@ static void clear_exec_queue_pending_resume(struct xe_exec_queue *q)
> >  	atomic_and(~EXEC_QUEUE_STATE_PENDING_RESUME, &q->guc->state);
> >  }
> >
> > -static bool __maybe_unused exec_queue_pending_tdr_exit(struct xe_exec_queue *q)
> > +static bool exec_queue_pending_tdr_exit(struct xe_exec_queue *q)
> >  {
> >  	return atomic_read(&q->guc->state) & EXEC_QUEUE_STATE_PENDING_TDR_EXIT;
> >  }
> > @@ -1080,7 +1090,7 @@ static void wq_item_append(struct xe_exec_queue *q)
> >  }
> >
> >  #define RESUME_PENDING	~0x0ull
> > -static void submit_exec_queue(struct xe_exec_queue *q)
> > +static void submit_exec_queue(struct xe_exec_queue *q, struct xe_sched_job *job)
> >  {
> >  	struct xe_guc *guc = exec_queue_to_guc(q);
> >  	struct xe_lrc *lrc = q->lrc[0];
> > @@ -1092,10 +1102,13 @@ static void submit_exec_queue(struct xe_exec_queue *q)
> >
> >  	xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
> >
> > -	if (xe_exec_queue_is_parallel(q))
> > -		wq_item_append(q);
> > -	else
> > -		xe_lrc_set_ring_tail(lrc, lrc->ring.tail);
> > +	if (!job->skip_emit || job->last_replay) {
> > +		if (xe_exec_queue_is_parallel(q))
> > +			wq_item_append(q);
> > +		else
> > +			xe_lrc_set_ring_tail(lrc, lrc->ring.tail);
> > +		job->last_replay = false;
> > +	}
> >
> >  	if (exec_queue_suspended(q) && !xe_exec_queue_is_parallel(q))
> >  		return;
> > @@ -1148,8 +1161,10 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
> >  	if (!killed_or_banned_or_wedged && !xe_sched_job_is_error(job)) {
> >  		if (!exec_queue_registered(q))
> >  			register_exec_queue(q, GUC_CONTEXT_NORMAL);
> > -		q->ring_ops->emit_job(job);
> > -		submit_exec_queue(q);
> > +		if (!job->skip_emit)
> > +			q->ring_ops->emit_job(job);
> > +		submit_exec_queue(q, job);
> > +		job->skip_emit = false;
> >  	}
> >
> >  	/*
> > @@ -1860,6 +1875,7 @@ static void __guc_exec_queue_process_msg_resume(struct xe_sched_msg *msg)
> >  #define RESUME		4
> >  #define OPCODE_MASK	0xf
> >  #define MSG_LOCKED	BIT(8)
> > +#define MSG_HEAD	BIT(9)
> >
> >  static void guc_exec_queue_process_msg(struct xe_sched_msg *msg)
> >  {
> > @@ -1984,12 +2000,24 @@ static void guc_exec_queue_add_msg(struct xe_exec_queue *q, struct xe_sched_msg
> >  	msg->private_data = q;
> >
> >  	trace_xe_sched_msg_add(msg);
> > -	if (opcode & MSG_LOCKED)
> > +	if (opcode & MSG_HEAD)
> > +		xe_sched_add_msg_head(&q->guc->sched, msg);
> > +	else if (opcode & MSG_LOCKED)
> >  		xe_sched_add_msg_locked(&q->guc->sched, msg);
> >  	else
> >  		xe_sched_add_msg(&q->guc->sched, msg);
> >  }
> >
> > +static void guc_exec_queue_try_add_msg_head(struct xe_exec_queue *q,
> > +					    struct xe_sched_msg *msg,
> > +					    u32 opcode)
> > +{
> > +	if (!list_empty(&msg->link))
> > +		return;
> > +
> > +	guc_exec_queue_add_msg(q, msg, opcode | MSG_LOCKED | MSG_HEAD);
> > +}
> > +
> >  static bool guc_exec_queue_try_add_msg(struct xe_exec_queue *q,
> >  				       struct xe_sched_msg *msg,
> >  				       u32 opcode)
> > @@ -2264,6 +2292,93 @@ void xe_guc_submit_stop(struct xe_guc *guc)
> >  }
> >
> > +/*
> > + * This function is quite complex but only real way to ensure no state is lost
> > + * during VF resume flows. The function scans the queue state, make adjustments
> > + * as needed, and queues jobs / messages which replayed upon unpause.
> > + */
> > +static void guc_exec_queue_pause(struct xe_guc *guc, struct xe_exec_queue *q)
> > +{
> > +	struct xe_gpu_scheduler *sched = &q->guc->sched;
> > +	struct xe_sched_job *job;
> > +	bool pending_enable, pending_disable, pending_resume;
> > +	int i;
> > +
> > +	lockdep_assert_held(&guc->submission_state.lock);
> > +
> > +	/* Stop scheduling + flush any DRM scheduler operations */
> > +	xe_sched_submission_stop(sched);
> > +	if (xe_exec_queue_is_lr(q))
> > +		cancel_work_sync(&q->guc->lr_tdr);
> > +	else
> > +		cancel_delayed_work_sync(&sched->base.work_tdr);
>
> We're doing the same cancelling in `__guc_exec_queue_destroy_async()`, maybe
> close it into a function?
>

I could, but in a follow-up I'm going to drop &q->guc->lr_tdr and just use the
DRM scheduler TDR, so this will reduce to a single cancel_delayed_work_sync().
I have the code for this lying around, and a couple of upcoming features get
simpler if &q->guc->lr_tdr is dropped.
> > +
> > +	pending_enable = exec_queue_pending_enable(q);
> > +	pending_resume = exec_queue_pending_resume(q);
> > +
> > +	if (pending_enable && pending_resume)
> > +		q->guc->needs_resume = true;
> > +
> > +	if (pending_enable && !pending_resume &&
> > +	    !exec_queue_pending_tdr_exit(q)) {
> > +		clear_exec_queue_registered(q);
> > +		if (xe_exec_queue_is_lr(q))
> > +			xe_exec_queue_put(q);
> > +	}
> > +
> > +	if (pending_enable) {
> > +		clear_exec_queue_enabled(q);
> > +		clear_exec_queue_pending_resume(q);
> > +		clear_exec_queue_pending_tdr_exit(q);
> > +		clear_exec_queue_pending_enable(q);
> > +	}
> > +
> > +	if (exec_queue_destroyed(q) && exec_queue_registered(q)) {
> > +		clear_exec_queue_destroyed(q);
> > +		if (exec_queue_extra_ref(q))
> > +			xe_exec_queue_put(q);
> > +		else
> > +			q->guc->needs_cleanup = true;
> > +		clear_exec_queue_extra_ref(q);
> > +	}
> > +
> > +	pending_disable = exec_queue_pending_disable(q);
> > +
> > +	if (pending_disable && exec_queue_suspended(q)) {
> > +		clear_exec_queue_suspended(q);
> > +		q->guc->needs_suspend = true;
> > +	}
> > +
> > +	if (pending_disable) {
> > +		if (!pending_enable)
> > +			set_exec_queue_enabled(q);
> > +		clear_exec_queue_pending_disable(q);
> > +		clear_exec_queue_check_timeout(q);
> > +	}
>
> maybe we can close the above into a separate function as well?
>
> ie. guc_exec_queue_undo_unfinished_state_change()?
>

Do you mean moving all of the exec queue state parsing into a single function?
That seems reasonable. I don't really want to abstract each individual if
statement here, though; IMO that is too much abstraction and makes it hard to
figure out exactly what is going on.

> guc_exec_queue_revert_pending_state_change()?
>
> That would make this function easier to read, but also describe what we're
> doing.
>
> Then, a counterfunction could be ripped out of guc_exec_queue_unpause().
> > +
> > +	q->guc->resume_time = 0;
> > +
> > +	if (xe_exec_queue_is_parallel(q)) {
> > +		struct xe_device *xe = guc_to_xe(guc);
> > +		struct iosys_map map = xe_lrc_parallel_map(q->lrc[0]);
> > +
> > +		for (i = 0; i < WQ_SIZE / sizeof(u32); ++i)
> > +			parallel_write(xe, map, wq[i],
> > +				       FIELD_PREP(WQ_TYPE_MASK, WQ_TYPE_NOOP) |
> > +				       FIELD_PREP(WQ_LEN_MASK, 0));
>
> ok so for parallel wq we're NOP'ing everything and adding the items back at
> new positions? Maybe a comment here would help in understanding that.
>

Yes, the GuC didn't like us messing with the head / tail. I'll add a comment.

Matt

> -Tomasz
>
> > +	}
> > +
> > +	job = xe_sched_first_pending_job(sched);
> > +	if (job) {
> > +		/*
> > +		 * Adjust software tail so jobs submitted overwrite previous
> > +		 * position in ring buffer with new GGTT addresses.
> > +		 */
> > +		for (i = 0; i < q->width; ++i)
> > +			q->lrc[i]->ring.tail = job->ptrs[i].head;
> > +	}
> > +}
> > +
> >  /**
> >   * xe_guc_submit_pause - Stop further runs of submission tasks on given GuC.
> >   * @guc: the &xe_guc struct instance whose scheduler is to be disabled
> > @@ -2273,8 +2388,12 @@ void xe_guc_submit_pause(struct xe_guc *guc)
> >  	struct xe_exec_queue *q;
> >  	unsigned long index;
> >
> > +	xe_gt_assert(guc_to_gt(guc), vf_recovery(guc));
> > +
> > +	mutex_lock(&guc->submission_state.lock);
> >  	xa_for_each(&guc->submission_state.exec_queue_lookup, index, q)
> > -		xe_sched_submission_stop_async(&q->guc->sched);
> > +		guc_exec_queue_pause(guc, q);
> > +	mutex_unlock(&guc->submission_state.lock);
> >  }
> >
> >  static void guc_exec_queue_start(struct xe_exec_queue *q)
> > @@ -2323,11 +2442,87 @@ int xe_guc_submit_start(struct xe_guc *guc)
> >  	return 0;
> >  }
> >
> > -static void guc_exec_queue_unpause(struct xe_exec_queue *q)
> > +static void guc_exec_queue_unpause_prepare(struct xe_guc *guc,
> > +					   struct xe_exec_queue *q)
> >  {
> >  	struct xe_gpu_scheduler *sched = &q->guc->sched;
> > +	struct drm_sched_job *s_job;
> > +	struct xe_sched_job *job = NULL;
> > +
> > +	list_for_each_entry(s_job, &sched->base.pending_list, list) {
> > +		job = to_xe_sched_job(s_job);
> > +
> > +		q->ring_ops->emit_job(job);
> > +		job->skip_emit = true;
> > +	}
> > +
> > +	if (job)
> > +		job->last_replay = true;
> > +}
> > +
> > +/**
> > + * xe_guc_submit_unpause_prepare - Prepare unpause submission tasks on given GuC.
> > + * @guc: the &xe_guc struct instance whose scheduler is to be prepared for unpause
> > + */
> > +void xe_guc_submit_unpause_prepare(struct xe_guc *guc)
> > +{
> > +	struct xe_exec_queue *q;
> > +	unsigned long index;
> > +
> > +	xe_gt_assert(guc_to_gt(guc), vf_recovery(guc));
> > +
> > +	mutex_lock(&guc->submission_state.lock);
> > +	xa_for_each(&guc->submission_state.exec_queue_lookup, index, q)
> > +		guc_exec_queue_unpause_prepare(guc, q);
> > +	mutex_unlock(&guc->submission_state.lock);
> > +}
> > +
> > +static void guc_exec_queue_unpause(struct xe_guc *guc, struct xe_exec_queue *q)
> > +{
> > +	struct xe_gpu_scheduler *sched = &q->guc->sched;
> > +	struct xe_sched_msg *msg;
> > +	bool needs_tdr = exec_queue_killed_or_banned_or_wedged(q);
> > +
> > +	lockdep_assert_held(&guc->submission_state.lock);
> > +
> > +	xe_sched_resubmit_jobs(sched);
> > +
> > +	if (q->guc->needs_cleanup) {
> > +		msg = q->guc->static_msgs + STATIC_MSG_CLEANUP;
> > +
> > +		guc_exec_queue_add_msg(q, msg, CLEANUP);
> > +		q->guc->needs_cleanup = false;
> > +	}
> > +
> > +	if (q->guc->needs_suspend) {
> > +		msg = q->guc->static_msgs + STATIC_MSG_SUSPEND;
> > +
> > +		xe_sched_msg_lock(sched);
> > +		guc_exec_queue_try_add_msg_head(q, msg, SUSPEND);
> > +		xe_sched_msg_unlock(sched);
> > +
> > +		q->guc->needs_suspend = false;
> > +	}
> > +
> > +	/*
> > +	 * The resume must be in the message queue before the suspend as it is
> > +	 * not possible for a resume to be issued if a suspend pending is, but
> > +	 * the inverse is possible.
> > +	 */
> > +	if (q->guc->needs_resume) {
> > +		msg = q->guc->static_msgs + STATIC_MSG_RESUME;
> > +
> > +		xe_sched_msg_lock(sched);
> > +		guc_exec_queue_try_add_msg_head(q, msg, RESUME);
> > +		xe_sched_msg_unlock(sched);
> > +
> > +		q->guc->needs_resume = false;
> > +	}
> >
> >  	xe_sched_submission_start(sched);
> > +	if (needs_tdr)
> > +		xe_guc_exec_queue_trigger_cleanup(q);
> > +	xe_sched_submission_resume_tdr(sched);
> >  }
> >
> >  /**
> > @@ -2339,10 +2534,10 @@ void xe_guc_submit_unpause(struct xe_guc *guc)
> >  	struct xe_exec_queue *q;
> >  	unsigned long index;
> >
> > +	mutex_lock(&guc->submission_state.lock);
> >  	xa_for_each(&guc->submission_state.exec_queue_lookup, index, q)
> > -		guc_exec_queue_unpause(q);
> > -
> > -	wake_up_all(&guc->ct.wq);
> > +		guc_exec_queue_unpause(guc, q);
> > +	mutex_unlock(&guc->submission_state.lock);
> >  }
> >
> >  /**
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h
> > index fe82c317048e..b49a2748ec46 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.h
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.h
> > @@ -22,6 +22,7 @@ void xe_guc_submit_stop(struct xe_guc *guc);
> >  int xe_guc_submit_start(struct xe_guc *guc);
> >  void xe_guc_submit_pause(struct xe_guc *guc);
> >  void xe_guc_submit_unpause(struct xe_guc *guc);
> > +void xe_guc_submit_unpause_prepare(struct xe_guc *guc);
> >  void xe_guc_submit_pause_abort(struct xe_guc *guc);
> >  void xe_guc_submit_wedge(struct xe_guc *guc);
> >
> > diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h b/drivers/gpu/drm/xe/xe_sched_job_types.h
> > index 7ce58765a34a..13e7a12b03ad 100644
> > --- a/drivers/gpu/drm/xe/xe_sched_job_types.h
> > +++ b/drivers/gpu/drm/xe/xe_sched_job_types.h
> > @@ -63,6 +63,10 @@ struct xe_sched_job {
> >  	bool ring_ops_flush_tlb;
> >  	/** @ggtt: mapped in ggtt. */
> >  	bool ggtt;
> > +	/** @skip_emit: skip emitting the job */
> > +	bool skip_emit;
> > +	/** @last_replay: last job being replayed */
> > +	bool last_replay;
> >  	/** @ptrs: per instance pointers. */
> >  	struct xe_job_ptrs ptrs[];
> >  };