From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 27 Feb 2026 16:36:00 -0800
From: Matthew Brost
To: Gwan-gyeong Mun
CC: Mika Kuoppala, Jan Maślak
Subject: Re: [PATCH 21/22] drm/xe/eudebug: Introduce EU pagefault handling interface
References: <20260223140318.1822138-1-mika.kuoppala@linux.intel.com>
 <20260223140318.1822138-22-mika.kuoppala@linux.intel.com>
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
List-Id: Intel Xe graphics driver
Sender: "Intel-xe" <intel-xe-bounces@lists.freedesktop.org>

On Fri, Feb 27, 2026 at 02:10:26PM -0800, Gwan-gyeong Mun wrote:
> On 2/23/26 11:08 AM, Matthew Brost wrote:
> > On Mon, Feb 23, 2026 at 04:03:16PM +0200, Mika Kuoppala wrote:
> > > From: Gwan-gyeong Mun
> > >
> >
> > Not a complete review but a few quick comments below.
> >
> Thank you for your comments. I have left comments below for each point.
>
> > > The XE2 (and PVC) HW has a limitation that a pagefault due to an
> > > invalid access will halt the corresponding EUs. To solve this problem,
> > > introduce EU pagefault handling functionality, which allows
> > > pagefaulted EU threads to be unhalted and lets the EU debugger be
> > > informed about the attention state of EU threads during execution.
> > >
> > > If a pagefault occurs, send the DRM_XE_EUDEBUG_EVENT_PAGEFAULT event
> > > after handling the pagefault. The pagefault eudebug event follows
> > > the newly added drm_xe_eudebug_event_pagefault type.
> > > While a pagefault is being handled, sending the
> > > DRM_XE_EUDEBUG_EVENT_EU_ATTENTION event to the client is suppressed.
> > >
> > > Page fault event delivery follows this policy:
> > > (1) If EU debugger discovery has completed and the pagefaulted EU
> > >     threads turn on their attention bits, the pagefault handler
> > >     delivers the pagefault event directly.
> > > (2) If a pagefault occurs during the EU debugger discovery process,
> > >     the pagefault handler queues a pagefault event and sends the
> > >     queued event once discovery has completed and the pagefaulted EU
> > >     threads turn on their attention bits.
> > > (3) If a pagefaulted EU thread fails to turn on its attention bit
> > >     within the specified time, the attention scan worker sends the
> > >     pagefault event when it detects that the attention bit is set.
> > >
> > > If multiple EU threads are running and pagefault on the same invalid
> > > address, send a single pagefault event (DRM_XE_EUDEBUG_EVENT_PAGEFAULT
> > > type) to the user debugger instead of one event per EU thread.
> > > If EU threads (other than the ones that faulted before) access new
> > > invalid addresses, send a new pagefault event.
> > >
> > > As the attention scan worker sends the EU attention event whenever
> > > the attention bit is turned on, the user debugger receives the
> > > attention event immediately after the pagefault event. The pagefault
> > > event therefore always precedes the attention event.
> > >
> > > When the user debugger receives an attention event after a pagefault
> > > event, it can detect whether additional breakpoints or interrupts
> > > occurred beyond the existing pagefault by comparing the EU threads
> > > that pagefaulted with the EU threads whose attention bits are newly
> > > enabled.
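If it helps review, the three delivery cases above reduce to a small decision function. This is an illustrative sketch only; the names are mine, not the patch's API:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of the three delivery cases described above.
 * All names here are hypothetical, not the driver's actual code. */
enum pf_action {
	PF_SEND_NOW,                 /* case (1): deliver directly */
	PF_QUEUE,                    /* case (2): queue until discovery done */
	PF_DEFER_TO_ATTENTION_SCAN,  /* case (3): attention scan worker sends */
};

static enum pf_action pf_delivery_policy(bool discovery_done, bool attn_bit_on)
{
	if (!discovery_done)
		return PF_QUEUE;
	if (attn_bit_on)
		return PF_SEND_NOW;
	return PF_DEFER_TO_ATTENTION_SCAN;
}
```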
> > >
> > > v2: use only force exception (Joonas, Mika)
> > > v3: rebased on v4 (Mika)
> > > v4: streamline uapi, cleanups (Mika)
> > > v5: struct member documentation (Mika)
> > > v6: fault to fault_type (Mika)
> > >
> > > Signed-off-by: Gwan-gyeong Mun
> > > Signed-off-by: Jan Maślak
> > > Signed-off-by: Mika Kuoppala
> > > ---
> > >  drivers/gpu/drm/xe/Makefile               |   2 +-
> > >  drivers/gpu/drm/xe/xe_eudebug.c           | 100 ++++-
> > >  drivers/gpu/drm/xe/xe_eudebug.h           |   9 +
> > >  drivers/gpu/drm/xe/xe_eudebug_hw.c        |  15 +-
> > >  drivers/gpu/drm/xe/xe_eudebug_pagefault.c | 440 ++++++++++++++++++++++
> > >  drivers/gpu/drm/xe/xe_eudebug_pagefault.h |  47 +++
> > >  drivers/gpu/drm/xe/xe_eudebug_types.h     |  69 +++-
> > >  drivers/gpu/drm/xe/xe_pagefault_types.h   |   4 +
> > >  include/uapi/drm/xe_drm_eudebug.h         |  12 +
> > >  9 files changed, 676 insertions(+), 22 deletions(-)
> > >  create mode 100644 drivers/gpu/drm/xe/xe_eudebug_pagefault.c
> > >  create mode 100644 drivers/gpu/drm/xe/xe_eudebug_pagefault.h
> > >
> > > diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> > > index 34db797ef8fc..b49fe7ae18e7 100644
> > > --- a/drivers/gpu/drm/xe/Makefile
> > > +++ b/drivers/gpu/drm/xe/Makefile
> > > @@ -152,7 +152,7 @@ xe-$(CONFIG_DRM_XE_GPUSVM) += xe_svm.o
> > >  xe-$(CONFIG_DRM_GPUSVM) += xe_userptr.o
> > >  # debugging shaders with gdb (eudebug) support
> > > -xe-$(CONFIG_DRM_XE_EUDEBUG) += xe_eudebug.o xe_eudebug_vm.o xe_eudebug_hw.o xe_gt_debug.o
> > > +xe-$(CONFIG_DRM_XE_EUDEBUG) += xe_eudebug.o xe_eudebug_vm.o xe_eudebug_hw.o xe_eudebug_pagefault.o xe_gt_debug.o
> > >  # graphics hardware monitoring (HWMON) support
> > >  xe-$(CONFIG_HWMON) += xe_hwmon.o
> > > diff --git a/drivers/gpu/drm/xe/xe_eudebug.c b/drivers/gpu/drm/xe/xe_eudebug.c
> > > index eae93c5f5e86..4b2f0dd9d234 100644
> > > --- a/drivers/gpu/drm/xe/xe_eudebug.c
> > > +++ b/drivers/gpu/drm/xe/xe_eudebug.c
> > > @@ -17,12 +17,16 @@
> > >  #include "xe_eudebug.h"
> > >  #include "xe_eudebug_hw.h"
> > >  #include "xe_eudebug_types.h"
> > > +#include "xe_eudebug_pagefault.h"
> > >  #include "xe_eudebug_vm.h"
> > >  #include "xe_exec_queue.h"
> > > +#include "xe_force_wake.h"
> > >  #include "xe_gt.h"
> > >  #include "xe_hw_engine.h"
> > >  #include "xe_gt.h"
> > >  #include "xe_gt_debug.h"
> > > +#include "xe_gt_mcr.h"
> > > +#include "regs/xe_gt_regs.h"
> > >  #include "xe_macros.h"
> > >  #include "xe_pm.h"
> > >  #include "xe_sriov_pf.h"
> > > @@ -263,6 +267,7 @@ static void xe_eudebug_free(struct kref *ref)
> > >  	while (kfifo_get(&d->events.fifo, &event))
> > >  		kfree(event);
> > > +	xe_eudebug_pagefault_fini(d);
> > >  	xe_eudebug_resources_destroy(d);
> > >  	mutex_destroy(&d->target.lock);
> > >  	XE_WARN_ON(d->target.xef);
> > > @@ -461,7 +466,7 @@ static int _xe_eudebug_disconnect(struct xe_eudebug *d,
> > >  	} \
> > >  })
> > > -static struct xe_eudebug *
> > > +struct xe_eudebug *
> > >  xe_eudebug_get_nolock(struct xe_file *xef)
> > >  {
> > >  	struct xe_eudebug *d;
> > > @@ -1888,10 +1893,6 @@ static int xe_eudebug_handle_gt_attention(struct xe_gt *gt)
> > >  {
> > >  	int ret;
> > > -	ret = xe_gt_eu_threads_needing_attention(gt);
> > > -	if (ret <= 0)
> > > -		return ret;
> > > -
> > >  	ret = xe_send_gt_attention(gt);
> > >  	/* Discovery in progress, fake it */
> > > @@ -1901,6 +1902,65 @@ static int xe_eudebug_handle_gt_attention(struct xe_gt *gt)
> > >  	return ret;
> > >  }
> > > +int xe_eudebug_send_pagefault_event(struct xe_eudebug *d,
> > > +				    struct xe_eudebug_pagefault *pf)
> > > +{
> > > +	struct drm_xe_eudebug_event_pagefault *ep;
> > > +	struct drm_xe_eudebug_event *event;
> > > +	int h_queue, h_lrc;
> > > +	u32 size = xe_gt_eu_attention_bitmap_size(pf->q->gt) * 3;
> > > +	u32 sz = struct_size(ep, bitmask, size);
> > > +	int ret;
> > > +
> > > +	XE_WARN_ON(pf->lrc_idx < 0 || pf->lrc_idx >= pf->q->width);
> > > +
> > > +	XE_WARN_ON(!xe_exec_queue_is_debuggable(pf->q));
> > > +
> > > +	h_queue = find_handle(d, XE_EUDEBUG_RES_TYPE_EXEC_QUEUE, pf->q);
> > > +	if (h_queue < 0)
> > > +		return h_queue;
> > > +
> > > +	h_lrc = find_handle(d, XE_EUDEBUG_RES_TYPE_LRC, pf->q->lrc[pf->lrc_idx]);
> > > +	if (h_lrc < 0)
> > > +		return h_lrc;
> > > +
> > > +	event = xe_eudebug_create_event(d, DRM_XE_EUDEBUG_EVENT_PAGEFAULT, 0,
> > > +					DRM_XE_EUDEBUG_EVENT_STATE_CHANGE, sz);
> > > +	if (!event)
> > > +		return -ENOSPC;
> > > +
> > > +	ep = cast_event(ep, event);
> > > +	ep->exec_queue_handle = h_queue;
> > > +	ep->lrc_handle = h_lrc;
> > > +	ep->bitmask_size = size;
> > > +	ep->pagefault_address = pf->fault.addr;
> > > +
> > > +	memcpy(ep->bitmask, pf->attentions.before.att, pf->attentions.before.size);
> > > +	memcpy(ep->bitmask + pf->attentions.before.size,
> > > +	       pf->attentions.after.att, pf->attentions.after.size);
> > > +	memcpy(ep->bitmask + pf->attentions.before.size + pf->attentions.after.size,
> > > +	       pf->attentions.resolved.att, pf->attentions.resolved.size);
> > > +
> > > +	event->seqno = atomic_long_inc_return(&d->events.seqno);
> > > +
> > > +	ret = xe_eudebug_queue_event(d, event);
> > > +	if (ret)
> > > +		xe_eudebug_disconnect(d, ret);
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +static void handle_attention_fail(struct xe_gt *gt, int gt_id, int ret)
> > > +{
> > > +	/* TODO: error capture */
> > > +	drm_info(&gt_to_xe(gt)->drm,
> > > +		 "gt:%d unable to handle eu attention ret = %d\n",
> > > +		 gt_id, ret);
> > > +
> > > +	xe_gt_reset_async(gt);
> > > +}
> > > +
> > >  static void attention_poll_work(struct work_struct *work)
> > >  {
> > >  	struct xe_device *xe = container_of(work, typeof(*xe),
> > > @@ -1923,15 +1983,15 @@ static void attention_poll_work(struct work_struct *work)
> > >  		if (gt->info.type != XE_GT_TYPE_MAIN)
> > >  			continue;
> > > -		ret = xe_eudebug_handle_gt_attention(gt);
> > > -		if (ret) {
> > > -			/* TODO: error capture */
> > > -			drm_info(&gt_to_xe(gt)->drm,
> > > -				 "gt:%d unable to handle eu attention ret=%d\n",
> > > -				 gt_id, ret);
> > > +		if (!xe_gt_eu_threads_needing_attention(gt))
> > > +			continue;
> > > +
> > > +		ret = xe_eudebug_handle_pagefaults(gt);
> > > +		if (!ret)
> > > +			ret = xe_eudebug_handle_gt_attention(gt);
> > > -			xe_gt_reset_async(gt);
> > > -		}
> > > +		if (ret)
> > > +			handle_attention_fail(gt, gt_id, ret);
> > >  	}
> > >  	xe_pm_runtime_put(xe);
> > > @@ -1940,12 +2000,12 @@ static void attention_poll_work(struct work_struct *work)
> > >  	schedule_delayed_work(&xe->eudebug.attention_dwork, delay);
> > >  }
> > > -static void attention_poll_stop(struct xe_device *xe)
> > > +void xe_eudebug_attention_poll_stop(struct xe_device *xe)
> > >  {
> > >  	cancel_delayed_work_sync(&xe->eudebug.attention_dwork);
> > >  }
> > > -static void attention_poll_start(struct xe_device *xe)
> > > +void xe_eudebug_attention_poll_start(struct xe_device *xe)
> > >  {
> > >  	mod_delayed_work(system_wq, &xe->eudebug.attention_dwork, 0);
> > >  }
> > > @@ -1988,6 +2048,8 @@ xe_eudebug_connect(struct xe_device *xe,
> > >  	kref_init(&d->ref);
> > >  	mutex_init(&d->target.lock);
> > > +	mutex_init(&d->pf_lock);
> > > +	INIT_LIST_HEAD(&d->pagefaults);
> > >  	init_waitqueue_head(&d->events.write_done);
> > >  	init_waitqueue_head(&d->events.read_done);
> > >  	init_completion(&d->discovery);
> > > @@ -2019,7 +2081,7 @@ xe_eudebug_connect(struct xe_device *xe,
> > >  	kref_get(&d->ref);
> > >  	queue_work(xe->eudebug.wq, &d->discovery_work);
> > > -	attention_poll_start(xe);
> > > +	xe_eudebug_attention_poll_start(xe);
> > >  	eu_dbg(d, "connected session %lld", d->session);
> > > @@ -2098,9 +2160,9 @@ int xe_eudebug_enable(struct xe_device *xe, bool enable)
> > >  	mutex_unlock(&xe->eudebug.lock);
> > >  	if (enable) {
> > > -		attention_poll_start(xe);
> > > +		xe_eudebug_attention_poll_start(xe);
> > >  	} else {
> > > -		attention_poll_stop(xe);
> > > +		xe_eudebug_attention_poll_stop(xe);
> > >  		if (IS_SRIOV_PF(xe))
> > >  			xe_sriov_pf_end_lockdown(xe);
> > > @@ -2153,7 +2215,7 @@ static void xe_eudebug_fini(struct drm_device *dev, void *__unused)
> > >  	xe_assert(xe, list_empty(&xe->eudebug.targets));
> > > -	attention_poll_stop(xe);
> > > +	xe_eudebug_attention_poll_stop(xe);
> > >  }
> > >  void xe_eudebug_init(struct xe_device *xe)
> > > diff --git a/drivers/gpu/drm/xe/xe_eudebug.h b/drivers/gpu/drm/xe/xe_eudebug.h
> > > index bd9fd7bf454f..34938e87be13 100644
> > > --- a/drivers/gpu/drm/xe/xe_eudebug.h
> > > +++ b/drivers/gpu/drm/xe/xe_eudebug.h
> > > @@ -13,12 +13,14 @@ struct drm_file;
> > >  struct xe_debug_data;
> > >  struct xe_device;
> > >  struct xe_file;
> > > +struct xe_gt;
> > >  struct xe_vm;
> > >  struct xe_vma;
> > >  struct xe_vma_ops;
> > >  struct xe_exec_queue;
> > >  struct xe_user_fence;
> > >  struct xe_eudebug;
> > > +struct xe_eudebug_pagefault;
> > >  #if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG)
> > > @@ -72,8 +74,15 @@ void xe_eudebug_ufence_init(struct xe_user_fence *ufence);
> > >  void xe_eudebug_ufence_fini(struct xe_user_fence *ufence);
> > >  bool xe_eudebug_ufence_track(struct xe_user_fence *ufence);
> > > +struct xe_eudebug *xe_eudebug_get_nolock(struct xe_file *xef);
> > >  void xe_eudebug_put(struct xe_eudebug *d);
> > > +int xe_eudebug_send_pagefault_event(struct xe_eudebug *d,
> > > +				    struct xe_eudebug_pagefault *pf);
> > > +
> > > +void xe_eudebug_attention_poll_stop(struct xe_device *xe);
> > > +void xe_eudebug_attention_poll_start(struct xe_device *xe);
> > > +
> > >  #else
> > >  static inline int xe_eudebug_connect_ioctl(struct drm_device *dev,
> > > diff --git a/drivers/gpu/drm/xe/xe_eudebug_hw.c b/drivers/gpu/drm/xe/xe_eudebug_hw.c
> > > index 5365265a67b3..270f7abc82e9 100644
> > > --- a/drivers/gpu/drm/xe/xe_eudebug_hw.c
> > > +++ b/drivers/gpu/drm/xe/xe_eudebug_hw.c
> > > @@ -322,6 +322,7 @@ static int do_eu_control(struct xe_eudebug *d,
> > >  	struct xe_device *xe = d->xe;
> > >  	u8 *bits = NULL;
> > >  	unsigned int hw_attn_size, attn_size;
> > > +	struct dma_fence *pf_fence;
> > >  	struct xe_exec_queue *q;
> > >  	struct xe_lrc *lrc;
> > >  	u64 seqno;
> > > @@ -376,8 +377,20 @@ static int do_eu_control(struct xe_eudebug *d,
> > >  		goto out_free;
> > >  	}
> > > -	ret = -EINVAL;
> > >  	mutex_lock(&d->hw.lock);
> > > +	do {
> > > +		pf_fence = dma_fence_get(d->pf_fence);
> > > +		if (pf_fence) {
> > > +			mutex_unlock(&d->hw.lock);
> > > +			ret = dma_fence_wait(pf_fence, true);
> > > +			dma_fence_put(pf_fence);
> > > +			if (ret)
> > > +				goto out_free;
> > > +			mutex_lock(&d->hw.lock);
> > > +		}
> > > +	} while (pf_fence);
> > > +
> > > +	ret = -EINVAL;
> > >  	switch (arg->cmd) {
> > >  	case DRM_XE_EUDEBUG_EU_CONTROL_CMD_INTERRUPT_ALL:
> > > diff --git a/drivers/gpu/drm/xe/xe_eudebug_pagefault.c b/drivers/gpu/drm/xe/xe_eudebug_pagefault.c
> > > new file mode 100644
> > > index 000000000000..edd368a7f6ae
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/xe/xe_eudebug_pagefault.c
> > > @@ -0,0 +1,440 @@
> > > +// SPDX-License-Identifier: MIT
> > > +/*
> > > + * Copyright © 2023-2025 Intel Corporation
> > > + */
> > > +
> > > +#include "xe_eudebug_pagefault.h"
> > > +
> > > +#include
> > > +
> > > +#include "xe_exec_queue.h"
> > > +#include "xe_eudebug.h"
> > > +#include "xe_eudebug_hw.h"
> > > +#include "xe_force_wake.h"
> > > +#include "xe_gt_debug.h"
> > > +#include "xe_gt_mcr.h"
> > > +#include "regs/xe_gt_regs.h"
> > > +#include "xe_vm.h"
> > > +
> > > +static struct xe_gt *
> > > +pf_to_gt(struct xe_eudebug_pagefault *pf)
> > > +{
> > > +	return pf->q->gt;
> > > +}
> > > +
> > > +static void destroy_pagefault(struct xe_eudebug_pagefault *pf)
> > > +{
> > > +	xe_exec_queue_put(pf->q);
> > > +	kfree(pf);
> > > +}
> > > +
> > > +static int queue_pagefault(struct xe_eudebug_pagefault *pf)
> > > +{
> > > +	struct xe_eudebug *d;
> > > +
> > > +	d = xe_eudebug_get_nolock(pf->q->vm->xef);
> > > +	if (!d)
> > > +		return -EINVAL;
> > > +
> > > +	mutex_lock(&d->pf_lock);
> > > +	list_add_tail(&pf->link, &d->pagefaults);
> > > +	mutex_unlock(&d->pf_lock);
> > > +
> > > +	xe_eudebug_put(d);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static int send_pagefault(struct xe_eudebug_pagefault *pf,
> > > +			  bool from_attention_scan)
> > > +{
> > > +	struct xe_gt *gt = pf_to_gt(pf);
> > > +	struct xe_eudebug *d;
> > > +	struct xe_exec_queue *q;
> > > +	int ret, lrc_idx;
> > > +
> > > +	q = xe_gt_runalone_active_queue_get(gt, &lrc_idx);
> > > +	if (IS_ERR(q))
> > > +		return PTR_ERR(q);
> > > +
> > > +	if (!xe_exec_queue_is_debuggable(q)) {
> > > +		ret = -EPERM;
> > > +		goto out_exec_queue_put;
> > > +	}
> > > +
> > > +	d = xe_eudebug_get_nolock(q->vm->xef);
> > > +	if (!d) {
> > > +		ret = -ENOTCONN;
> > > +		goto out_exec_queue_put;
> > > +	}
> > > +
> > > +	if (pf->deferred_resolved) {
> > > +		xe_gt_eu_attentions_read(gt, &pf->attentions.resolved,
> > > +					 XE_GT_ATTENTION_TIMEOUT_MS);
> > > +
> > > +		if (!xe_eu_attentions_xor_count(&pf->attentions.after,
> > > +						&pf->attentions.resolved) &&
> > > +		    !from_attention_scan) {
> > > +			eu_dbg(d, "xe attentions not yet updated\n");
> > > +			ret = -EBUSY;
> > > +			goto out_eudebug_put;
> > > +		}
> > > +	}
> > > +
> > > +	ret = xe_eudebug_send_pagefault_event(d, pf);
> > > +
> > > +out_eudebug_put:
> > > +	xe_eudebug_put(d);
> > > +out_exec_queue_put:
> > > +	xe_exec_queue_put(q);
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +static const char *
> > > +pagefault_get_driver_name(struct dma_fence *dma_fence)
> > > +{
> > > +	return "xe";
> > > +}
> > > +
> > > +static const char *
> > > +pagefault_fence_get_timeline_name(struct dma_fence *dma_fence)
> > > +{
> > > +	return "eudebug_pagefault_fence";
> > > +}
> > > +
> > > +static const struct dma_fence_ops pagefault_fence_ops = {
> > > +	.get_driver_name = pagefault_get_driver_name,
> > > +	.get_timeline_name = pagefault_fence_get_timeline_name,
> > > +};
> > > +
> > > +struct pagefault_fence {
> > > +	struct dma_fence base;
> > > +	spinlock_t lock;
> > > +};
> > > +
> > > +static struct pagefault_fence *pagefault_fence_create(void)
> > > +{
> > > +	struct pagefault_fence *fence;
> > > +
> > > +	fence = kzalloc_obj(*fence, GFP_KERNEL);
> > > +	if (fence == NULL)
> > > +		return NULL;
> > > +
> > > +	spin_lock_init(&fence->lock);
> > > +	dma_fence_init(&fence->base, &pagefault_fence_ops, &fence->lock,
> > > +		       dma_fence_context_alloc(1), 1);
> > > +
> > > +	return fence;
> > > +}
> > > +
> > > +void
> > > +xe_eudebug_pagefault_create(struct xe_vm *vm, struct xe_pagefault *pf)
> >
> > This function, as written, is basically a no from me given that
> > DRM_XE_EUDEBUG is enabled by default. It adds time complexity via
> > xe_vm_find_vma_by_addr(), which is O(log N) where N is the number of
> > VMAs.
> >
> > Page faults are going to be heavily optimized since this is a critical
> > path. Anything less than O(1) here when no EU connection exists —
> > combined with DRM_XE_EUDEBUG being on — is likely to receive pushback
> > from me.
> >
> I'll consider an implementation where eudebug directly uses the vma value
> returned by xe_vm_find_vma_by_addr(), which is called by
> xe_pagefault_service(). This way avoids the performance degradation
> caused by the additional xe_vm_find_vma_by_addr() call. (Previously, due
> to lock dependencies, eudebug directly called xe_vm_find_vma_by_addr();
> I will verify whether this issue still exists.)
>

Yes, this would work for me.

> > > +{
> > > +	struct pagefault_fence *pf_fence;
> > > +	struct xe_eudebug_pagefault *epf;
> > > +	struct xe_vma *vma;
> > > +	struct xe_gt *gt = pf->gt;
> > > +	struct xe_exec_queue *q;
> > > +	struct dma_fence *fence;
> > > +	struct xe_eudebug *d;
> > > +	unsigned int fw_ref;
> > > +	int lrc_idx;
> > > +	u32 td_ctl;
> > > +
> > > +	pf->consumer.epf = NULL;
> > > +
> > > +	down_read(&vm->lock);
> > > +	vma = xe_vm_find_vma_by_addr(vm, pf->consumer.page_addr);
> > > +	up_read(&vm->lock);
> >
> > See my comment in [1] — this doesn't work for SVM. This will need to be
> > rethought.
> >
> > [1] https://patchwork.freedesktop.org/patch/706437/?series=161979&rev=1#comment_1299420
> >
> Additional implementation of the eudebug pagefault routine for SVM is
> required. I have replied to the mentioned email thread.
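As an aside for reviewers less familiar with the pf_fence handshake quoted above (do_eu_control() waits on the fence the fault handler installs, dropping the lock before waiting and re-checking after reacquiring it), the shape of that loop can be modeled outside the kernel. Lock and fence are stubbed; all names here are hypothetical:

```c
#include <assert.h>

/* Stand-alone model of do_eu_control()'s wait loop: under the lock, peek
 * for an in-flight pagefault fence; if present, drop the lock, wait,
 * reacquire, and re-check, since a new fault may install a new fence.
 * lock/unlock and the fence wait are stubbed. */
struct dbg { int locked; int pending_fences; int waits; };

static void dbg_lock(struct dbg *d)   { d->locked = 1; }
static void dbg_unlock(struct dbg *d) { d->locked = 0; }

static void wait_all_fences(struct dbg *d)
{
	int fence;

	dbg_lock(d);
	do {
		fence = d->pending_fences > 0;
		if (fence) {
			dbg_unlock(d);       /* never wait while holding the lock */
			d->pending_fences--; /* stand-in for dma_fence_wait() */
			d->waits++;
			dbg_lock(d);
		}
	} while (fence);                     /* re-check under the lock */
}
```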
>

Reading this one and thinking this through.

Matt

> > > +
> > > +	if (vma)
> > > +		return;
> > > +
> > > +	d = xe_eudebug_get_nolock(vm->xef);
> > > +	if (!d)
> > > +		return;
> > > +
> > > +	q = xe_gt_runalone_active_queue_get(gt, &lrc_idx);
> > > +	if (IS_ERR(q))
> > > +		goto err_put_eudebug;
> > > +
> > > +	if (XE_WARN_ON(q->vm != vm))
> > > +		goto err_put_exec_queue;
> > > +
> > > +	if (!xe_exec_queue_is_debuggable(q))
> > > +		goto err_put_exec_queue;
> > > +
> > > +	fw_ref = xe_force_wake_get(gt_to_fw(gt), q->hwe->domain);
> > > +	if (!fw_ref)
> > > +		goto err_put_exec_queue;
> > > +
> > > +	/*
> > > +	 * If there is no debug functionality (TD_CTL_GLOBAL_DEBUG_ENABLE,
> > > +	 * etc.), don't proceed with the pagefault routine for the eu debugger.
> > > +	 */
> > > +	td_ctl = xe_gt_mcr_unicast_read_any(gt, TD_CTL);
> > > +	if (!td_ctl)
> > > +		goto err_put_fw;
> > > +
> > > +	epf = kzalloc_obj(*epf, GFP_KERNEL);
> > > +	if (!epf)
> > > +		goto err_put_fw;
> > > +
> > > +	xe_eudebug_attention_poll_stop(gt_to_xe(gt));
> > > +
> > > +	mutex_lock(&d->hw.lock);
> > > +	fence = dma_fence_get(d->pf_fence);
> > > +	if (fence) {
> > > +		/*
> > > +		 * TODO: If the new incoming pagefaulted address is different
> > > +		 * from the pagefaulted address it is currently handling on the
> > > +		 * same ASID, it needs a routine to wait here and then handle
> > > +		 * the following pagefault.
> > > +		 */
> > > +		dma_fence_put(fence);
> > > +		goto err_unlock_hw_lock;
> > > +	}
> > > +
> > > +	pf_fence = pagefault_fence_create();
> > > +	if (!pf_fence)
> > > +		goto err_unlock_hw_lock;
> > > +
> > > +	d->pf_fence = &pf_fence->base;
> > > +
> > > +	INIT_LIST_HEAD(&epf->link);
> > > +
> > > +	xe_gt_eu_attentions_read(gt, &epf->attentions.before, 0);
> > > +
> > > +	if (td_ctl & TD_CTL_FORCE_EXCEPTION)
> > > +		eu_warn(d, "force exception already set!");
> > > +
> > > +	/* Halt regardless of thread dependencies */
> > > +	while (!(td_ctl & TD_CTL_FORCE_EXCEPTION)) {
> > > +		xe_gt_mcr_multicast_write(gt, TD_CTL,
> > > +					  td_ctl | TD_CTL_FORCE_EXCEPTION);
> > > +		udelay(200);
> > > +		td_ctl = xe_gt_mcr_unicast_read_any(gt, TD_CTL);
> > > +	}
> > > +
> > > +	xe_gt_eu_attentions_read(gt, &epf->attentions.after,
> > > +				 XE_GT_ATTENTION_TIMEOUT_MS);
> > > +
> > > +	mutex_unlock(&d->hw.lock);
> > > +
> > > +	/*
> > > +	 * xe_exec_queue_put() will be called from xe_eudebug_pagefault_destroy()
> > > +	 * or handle_pagefault()
> > > +	 */
> > > +	epf->q = q;
> > > +	epf->lrc_idx = lrc_idx;
> > > +	epf->fault.addr = pf->consumer.page_addr;
> > > +	epf->fault.type_level = pf->consumer.fault_type_level;
> > > +	epf->fault.access_type = pf->consumer.access_type;
> > > +
> > > +	pf->consumer.epf = epf;
> > > +
> > > +	xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > > +	xe_eudebug_put(d);
> > > +
> > > +	return;
> > > +
> > > +err_unlock_hw_lock:
> > > +	mutex_unlock(&d->hw.lock);
> > > +	xe_eudebug_attention_poll_start(gt_to_xe(gt));
> > > +	kfree(epf);
> > > +err_put_fw:
> > > +	xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > > +err_put_exec_queue:
> > > +	xe_exec_queue_put(q);
> > > +err_put_eudebug:
> > > +	xe_eudebug_put(d);
> > > +}
> > > +
> > > +struct xe_vma *xe_eudebug_create_vma(struct xe_vm *vm, struct xe_pagefault *pf)
> > > +{
> > > +	struct xe_vma *vma = NULL;
> > > +
> > > +	if (!pf->consumer.epf)
> > > +		return NULL;
> > > +
> > > +	vma = xe_vm_create_null_vma(vm, pf->consumer.page_addr);
> > > +	if (IS_ERR(vma))
> > > +		return vma;
> > > +
> > > +	pf->consumer.epf->is_null = true;
> > > +
> > > +	return vma;
> > > +}
> > > +
> > > +static void
> > > +xe_eudebug_pagefault_process(struct xe_eudebug_pagefault *pf)
> > > +{
> > > +	struct xe_gt *gt = pf->q->gt;
> > > +
> > > +	xe_gt_eu_attentions_read(gt, &pf->attentions.resolved,
> > > +				 XE_GT_ATTENTION_TIMEOUT_MS);
> > > +
> > > +	if (!xe_eu_attentions_xor_count(&pf->attentions.after,
> > > +					&pf->attentions.resolved))
> > > +		pf->deferred_resolved = true;
> > > +}
> > > +
> > > +static void
> > > +_xe_eudebug_pagefault_destroy(struct xe_eudebug_pagefault *pf)
> > > +{
> > > +	struct xe_gt *gt = pf->q->gt;
> > > +	struct xe_vm *vm = pf->q->vm;
> > > +	struct xe_eudebug *d;
> > > +	unsigned int fw_ref;
> > > +	u32 td_ctl;
> > > +	bool queued, try_send;
> > > +	int ret;
> > > +
> > > +	fw_ref = xe_force_wake_get(gt_to_fw(gt), pf->q->hwe->domain);
> > > +	if (!fw_ref) {
> > > +		struct xe_device *xe = gt_to_xe(gt);
> > > +
> > > +		drm_warn(&xe->drm, "Forcewake fail: Can not recover TD_CTL");
> > > +	} else {
> > > +		td_ctl = xe_gt_mcr_unicast_read_any(gt, TD_CTL);
> > > +		xe_gt_mcr_multicast_write(gt, TD_CTL, td_ctl &
> > > +					  ~(TD_CTL_FORCE_EXCEPTION));
> > > +		xe_force_wake_put(gt_to_fw(gt), fw_ref);
> > > +	}
> > > +
> > > +	queued = false;
> > > +	try_send = pf->is_null;
> > > +	if (try_send) {
> > > +		ret = send_pagefault(pf, false);
> > > +
> > > +		/*
> > > +		 * if debugger discovery is not completed or resolved attentions
> > > +		 * are not updated, then queue pagefault
> > > +		 */
> > > +		if (ret == -EBUSY) {
> > > +			ret = queue_pagefault(pf);
> > > +			if (!ret)
> > > +				queued = true;
> > > +		}
> > > +	}
> > > +
> > > +	d = xe_eudebug_get_nolock(vm->xef);
> > > +	if (d) {
> > > +		struct dma_fence *f;
> > > +
> > > +		mutex_lock(&d->hw.lock);
> > > +		f = d->pf_fence;
> > > +		d->pf_fence = NULL;
> > > +		mutex_unlock(&d->hw.lock);
> > > +
> > > +		if (f) {
> > > +			if (!queued)
> + dma_fence_signal(f); > > > + > > > + dma_fence_put(f); > > > + } > > > + > > > + xe_eudebug_put(d); > > > + } > > > + > > > + if (!queued) > > > + destroy_pagefault(pf); > > > + > > > + xe_eudebug_attention_poll_start(gt_to_xe(gt)); > > > +} > > > + > > > +static int send_queued_pagefaults(struct xe_eudebug *d) > > > +{ > > > + struct xe_eudebug_pagefault *pf, *pf_temp; > > > + int ret = 0; > > > + > > > + mutex_lock(&d->pf_lock); > > > + list_for_each_entry_safe(pf, pf_temp, &d->pagefaults, link) { > > > + ret = send_pagefault(pf, true); > > > + > > > + /* if resolved attentions are not updated */ > > > + if (ret == -EBUSY) > > > + break; > > > + > > > + list_del(&pf->link); > > > + > > > + destroy_pagefault(pf); > > > + > > > + if (ret) > > > + break; > > > + } > > > + mutex_unlock(&d->pf_lock); > > > + > > > + return ret; > > > +} > > > + > > > +int xe_eudebug_handle_pagefaults(struct xe_gt *gt) > > > +{ > > > + struct xe_exec_queue *q; > > > + struct xe_eudebug *d; > > > + int ret, lrc_idx; > > > + > > > + q = xe_gt_runalone_active_queue_get(gt, &lrc_idx); > > > + if (IS_ERR(q)) > > > + return PTR_ERR(q); > > > + > > > + if (!xe_exec_queue_is_debuggable(q)) { > > > + ret = -EPERM; > > > + goto out_exec_queue_put; > > > + } > > > + > > > + d = xe_eudebug_get_nolock(q->vm->xef); > > > + if (!d) { > > > + ret = -ENOTCONN; > > > + goto out_exec_queue_put; > > > + } > > > + > > > + ret = send_queued_pagefaults(d); > > > + > > > + xe_eudebug_put(d); > > > + > > > +out_exec_queue_put: > > > + xe_exec_queue_put(q); > > > + > > > + return ret; > > > +} > > > + > > > +void xe_eudebug_pagefault_service(struct xe_pagefault *pf) > > > +{ > > > + struct xe_eudebug_pagefault *f = pf->consumer.epf; > > > + > > > + if (!f) > > > + return; > > > + > > > + if (f->is_null) > > > + xe_eudebug_pagefault_process(f); > > > +} > > > + > > > +void xe_eudebug_pagefault_destroy(struct xe_pagefault *pf, int err) > > > +{ > > > + struct xe_eudebug_pagefault *f = pf->consumer.epf; > > > 
+ > > > + if (!f) > > > + return; > > > + > > > + if (err) > > > + f->is_null = false; > > > + > > > + _xe_eudebug_pagefault_destroy(f); > > > +} > > > + > > > +void xe_eudebug_pagefault_fini(struct xe_eudebug *d) > > > +{ > > > + struct xe_eudebug_pagefault *pf, *pf_temp; > > > + > > > + /* Since it's the last reference no race here */ > > > + > > > + list_for_each_entry_safe(pf, pf_temp, &d->pagefaults, link) { > > > + list_del(&pf->link); > > > + destroy_pagefault(pf); > > > + } > > > + > > > + XE_WARN_ON(d->pf_fence); > > > +} > > > diff --git a/drivers/gpu/drm/xe/xe_eudebug_pagefault.h b/drivers/gpu/drm/xe/xe_eudebug_pagefault.h > > > new file mode 100644 > > > index 000000000000..1ba20beac3cf > > > --- /dev/null > > > +++ b/drivers/gpu/drm/xe/xe_eudebug_pagefault.h > > > @@ -0,0 +1,47 @@ > > > +/* SPDX-License-Identifier: MIT */ > > > +/* > > > + * Copyright © 2023-2025 Intel Corporation > > > + */ > > > + > > > +#ifndef _XE_EUDEBUG_PAGEFAULT_H_ > > > +#define _XE_EUDEBUG_PAGEFAULT_H_ > > > + > > > +#include > > > + > > > +struct xe_eudebug; > > > +struct xe_gt; > > > +struct xe_pagefault; > > > +struct xe_eudebug_pagefault; > > > +struct xe_vm; > > > + > > > +void xe_eudebug_pagefault_fini(struct xe_eudebug *d); > > > +int xe_eudebug_handle_pagefaults(struct xe_gt *gt); > > > + > > > +#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG) > > > +void xe_eudebug_pagefault_create(struct xe_vm *vm, struct xe_pagefault *pf); > > > +struct xe_vma *xe_eudebug_create_vma(struct xe_vm *vm, struct xe_pagefault *pf); > > > +void xe_eudebug_pagefault_service(struct xe_pagefault *pf); > > > +void xe_eudebug_pagefault_destroy(struct xe_pagefault *pf, int err); > > > +#else > > > + > > > +static inline void > > > +xe_eudebug_pagefault_create(struct xe_vm *vm, struct xe_pagefault *pf) > > > +{ > > > +} > > > + > > > +static inline struct xe_vma *xe_eudebug_create_vma(struct xe_vm *vm, struct xe_pagefault *pf) > > > +{ > > > + return NULL; > > > +} > > > + > > > +static inline void 
xe_eudebug_pagefault_service(struct xe_pagefault *pf) > > > +{ > > > +} > > > + > > > +static inline void xe_eudebug_pagefault_destroy(struct xe_pagefault *pf, int err) > > > +{ > > > +} > > > + > > > +#endif > > > + > > > +#endif /* _XE_EUDEBUG_PAGEFAULT_H_ */ > > > diff --git a/drivers/gpu/drm/xe/xe_eudebug_types.h b/drivers/gpu/drm/xe/xe_eudebug_types.h > > > index 386b5c78ecff..09bfae8b94ab 100644 > > > --- a/drivers/gpu/drm/xe/xe_eudebug_types.h > > > +++ b/drivers/gpu/drm/xe/xe_eudebug_types.h > > > @@ -15,6 +15,8 @@ > > > #include > > > #include > > > +#include "xe_gt_debug_types.h" > > > + > > > struct xe_device; > > > struct task_struct; > > > struct xe_eudebug; > > > @@ -37,7 +39,7 @@ enum xe_eudebug_state { > > > }; > > > #define CONFIG_DRM_XE_DEBUGGER_EVENT_QUEUE_SIZE 64 > > > -#define XE_EUDEBUG_MAX_EVENT_TYPE DRM_XE_EUDEBUG_EVENT_EU_ATTENTION > > > +#define XE_EUDEBUG_MAX_EVENT_TYPE DRM_XE_EUDEBUG_EVENT_PAGEFAULT > > > /** > > > * struct xe_eudebug_handle - eudebug resource handle > > > @@ -164,6 +166,71 @@ struct xe_eudebug { > > > /** @ops: operations for eu_control */ > > > struct xe_eudebug_eu_control_ops *ops; > > > + > > > + /** @pf_lock: guards access to pagefaults list*/ > > > + struct mutex pf_lock; > > > + /** @pagefaults: xe_eudebug_pagefault list for pagefault event queuing */ > > > + struct list_head pagefaults; > > > + /** > > > + * @pf_fence: fence on operations of eus (eu thread control and attention) > > > + * when page faults are being handled, protected by @eu_lock. 
> > > + */ > > > + struct dma_fence *pf_fence; > > > +}; > > > + > > > +/** > > > + * struct xe_eudebug_pagefault - eudebug structure for queuing pagefault > > > + */ > > > +struct xe_eudebug_pagefault { > > > + /** @link: link into the xe_eudebug.pagefaults */ > > > + struct list_head link; > > > + /** @q: exec_queue which raised pagefault */ > > > + struct xe_exec_queue *q; > > > + /** @lrc_idx: lrc index of the workload which raised pagefault */ > > > + int lrc_idx; > > > + > > > + /** @fault: pagefault raw partial data passed from guc */ > > > + struct { > > > + /** @addr: ppgtt address where the pagefault occurred */ > > > + u64 addr; > > > + u8 type_level; > > > + u8 access_type; > > > + } fault; > > > + > > > + /** @attentions: attention states in different phases of fault */ > > > + struct { > > > + /** @before: state of attention bits before page fault WA processing*/ > > > + struct xe_eu_attentions before; > > > + /** > > > + * @after: status of attention bits during page fault WA processing. > > > + * It includes eu threads where attention bits are turned on for > > > + * reasons other than page fault WA (breakpoint, interrupt, etc.). > > > + */ > > > + struct xe_eu_attentions after; > > > + /** > > > + * @resolved: state of the attention bits after page fault WA. > > > + * It includes the eu thread that caused the page fault. > > > + * To determine the eu thread that caused the page fault, > > > + * do XOR attentions.after and attentions.resolved. > > > + */ > > > + struct xe_eu_attentions resolved; > > > + } attentions; > > > + > > > + /** > > > + * @deferred_resolved: to update attentions.resolved again when attention > > > + * bits are ready if the eu thread fails to turn on attention bits within > > > + * a certain time after page fault WA processing. > > > + */ > > > + bool deferred_resolved; > > > + > > > + /** > > > + * @is_null: marks if this vma is null or not. 
The lookup for the > > > + * vma is done in two phases and eudebug pagefault struct needs > > > + * to be allocated apriori to resolving if we need null vma or not. > > > + * So we keep the state here so that processing and teardown > > > + * know which type of fault resulted in creation of this eudebug pf. > > > + */ > > > + bool is_null; > > > }; > > > #endif /* _XE_EUDEBUG_TYPES_H_ */ > > > diff --git a/drivers/gpu/drm/xe/xe_pagefault_types.h b/drivers/gpu/drm/xe/xe_pagefault_types.h > > > index 0e378f41ede6..2bee858da597 100644 > > > --- a/drivers/gpu/drm/xe/xe_pagefault_types.h > > > +++ b/drivers/gpu/drm/xe/xe_pagefault_types.h > > > @@ -10,6 +10,7 @@ > > > struct xe_gt; > > > struct xe_pagefault; > > > +struct xe_eudebug_pagefault; > > > /** enum xe_pagefault_access_type - Xe page fault access type */ > > > enum xe_pagefault_access_type { > > > @@ -84,6 +85,9 @@ struct xe_pagefault { > > > u8 engine_class; > > > /** @consumer.engine_instance: engine instance */ > > > u8 engine_instance; > > > +#if IS_ENABLED(CONFIG_DRM_XE_EUDEBUG) > > > + struct xe_eudebug_pagefault *epf; > > > +#endif > > > > > > This will grow the pagefault struct from 64 bytes to 128 bytes. > > Everything will still be functionally correct, but I’d really prefer not > > to increase the size of this structure. The u64 reserved field will be > > used to implement the page-fault cache for fault storms, so that is a > > non-starter. > > > > Can we replace producer->private with epf and set a mask bit in the > > lower 3 bits to indicate that producer->private has been replaced by > > epf, then unwind epf vs. the original private on the producer side > > during the ack/cleanup? In that case, we would store the original > > producer->private in epf, if that isn’t clear. > > > Thank you for your feedback. It seems I can change the implementation to > store the epf in producer->private. I will incorporate this change in the > next version. 
>
> > Another thing we will have to consider is how the EU debug interface for
> > page faults will interact with the pagefault cache for fault storms
> > that's in the pipe [2] (which I'll post as soon as CI is fixed). My
> > initial thought is that it should be fine, given that the head of a
> > fault storm will populate epf, and subsequent faults that hit the page
> > being serviced will not have it populated. I'll CC the EU debug team
> > when I post this code to ensure we aren't clobbering each other's
> > designs.
> >
> > [2] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-svn-perf-6-15-2025/-/commit/93669c7f4e00ec13d0a18e28d34dfcb41803b7c9
> >
> Yes, I've checked your patch series.
> https://patchwork.freedesktop.org/series/162167/
>
> The eudebug pagefault handling routine does not appear to conflict
> structurally with the pagefault cache for fault storms. After verifying the
> behavior of applying the eudebug changes on top of your relevant patch, I
> will provide an additional reply.
>
> G.G.
>
> > Matt
> >
> > >
> > >  		/** consumer.reserved: reserved bits for future expansion */
> > >  		u64 reserved;
> > >  	} consumer;
> > > diff --git a/include/uapi/drm/xe_drm_eudebug.h b/include/uapi/drm/xe_drm_eudebug.h
> > > index 54394a7e12ab..f7d035532be2 100644
> > > --- a/include/uapi/drm/xe_drm_eudebug.h
> > > +++ b/include/uapi/drm/xe_drm_eudebug.h
> > > @@ -53,6 +53,7 @@ struct drm_xe_eudebug_event {
> > >  #define DRM_XE_EUDEBUG_EVENT_VM_BIND_OP_DEBUG_DATA 5
> > >  #define DRM_XE_EUDEBUG_EVENT_VM_BIND_UFENCE 6
> > >  #define DRM_XE_EUDEBUG_EVENT_EU_ATTENTION 7
> > > +#define DRM_XE_EUDEBUG_EVENT_PAGEFAULT 8
> > >  	/** @flags: Flags */
> > >  	__u16 flags;
> > > @@ -358,6 +359,17 @@ struct drm_xe_eudebug_event_eu_attention {
> > >  	__u8 bitmask[];
> > >  };
> > > +struct drm_xe_eudebug_event_pagefault {
> > > +	struct drm_xe_eudebug_event base;
> > > +
> > > +	__u64 exec_queue_handle;
> > > +	__u64 lrc_handle;
> > > +	__u32 flags;
> > > +	__u32 bitmask_size;
> > > +	__u64 pagefault_address;
> > > +	__u8 bitmask[];
> > > +};
> > > +
> > >  #if defined(__cplusplus)
> > >  }
> > >  #endif
> > > --
> > > 2.43.0
> >
>