From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6A35AD42BA6 for ; Tue, 12 Nov 2024 16:25:36 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 2A4BC10E27C; Tue, 12 Nov 2024 16:25:36 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (1024-bit key; unprotected) header.d=amd.com header.i=@amd.com header.b="3VXWAVlV"; dkim-atps=neutral Received: from NAM11-CO1-obe.outbound.protection.outlook.com (mail-co1nam11on2056.outbound.protection.outlook.com [40.107.220.56]) by gabe.freedesktop.org (Postfix) with ESMTPS id A239410E08B; Tue, 12 Nov 2024 16:25:34 +0000 (UTC) ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=RJfAv1EzEFxeOjs3+rLKn7ZU04WEk2YT1Xeb2YyfpGdiW+P4oV2OeaEPod8eqyEZv4x5ckk+G9rnczkgt7e4mLigU08JRNT0S5n1DEgTA6vC8JBEiCpvgF0s0OAlS/T4D3WbfjGdgl6gv9uK8s03/Gu7EpikoMtxSIBV2DrM8ZYG6+TY98lD0oTpZKshy2spXWmiPxaYr/vhJaAeOkkFGCy7rmWJnqCMuFMSkI+97gU2vRF+HWVQPLX2q/MfFpip+g1wfmuL0Jqdt01KqTyQK5fLTsc7+x2mvve18CEL/17UlRG6pZNIUk8+7mz79Vn1IgLkrA/zNlGuuGLImT6uOw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=NEi9xUSVb6QRlJlx3mAItS0QoWPQw9WMn5tkqjik/5Y=; b=oOAR+pZjE/mh5JmC2ep5zFiG/hPuuqag8YbtqhdbyopAMdDSOcU9ZR0s2V2yM+z21OHRZi9WhKZjoiyxPcSgMOChycfyhgWzmsYpkzX/g5AqEiDJmv8fCUF1+2eL9mePnXGw26jWRSWYLiJ+XYmHCyiME0Kv0q2uDdLUEfc4gkXp0KQGcriC4/A2jCg5lPKZPuxBLaJo/9zy4lnUDZWalQEjxnVWkhedaTSTxq6f088LVsP5zz2DJKAoFy21U/qhGwtr+hJIQ5F5iojq3sOdW0DmYWJKkjop+l2q1ANBKqy/7mwYcyOTJ2jpqsI7HHF1Qqz+TyytOoLgmlz1NjOqDQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=amd.com; dmarc=pass action=none header.from=amd.com; dkim=pass header.d=amd.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=NEi9xUSVb6QRlJlx3mAItS0QoWPQw9WMn5tkqjik/5Y=; b=3VXWAVlV4V0dtnK/wZZeg6MTHKCIGLpw19+wlTkleKlXdRNPteDe7461zf2qw4GLJYZgUhGt4RgheoitaR/AClgrkbHrvo2bJi4UlE3JbuxG7v9L79s9O/e3p2zRMngz13Asi0qQIQ0D3gxMZnBsFCrxRJTO3beQt7ciiJ3ZfRA= Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=amd.com; Received: from PH7PR12MB5685.namprd12.prod.outlook.com (2603:10b6:510:13c::22) by SJ2PR12MB7823.namprd12.prod.outlook.com (2603:10b6:a03:4c9::17) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8158.16; Tue, 12 Nov 2024 16:25:31 +0000 Received: from PH7PR12MB5685.namprd12.prod.outlook.com ([fe80::46fb:96f2:7667:7ca5]) by PH7PR12MB5685.namprd12.prod.outlook.com ([fe80::46fb:96f2:7667:7ca5%2]) with mapi id 15.20.8137.027; Tue, 12 Nov 2024 16:25:30 +0000 Message-ID: Date: Tue, 12 Nov 2024 17:25:24 +0100 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access To: =?UTF-8?Q?Thomas_Hellstr=C3=B6m?= , Joonas Lahtinen , =?UTF-8?Q?Christian_K=C3=B6nig?= , Matthew Brost Cc: Simona Vetter , Rodrigo Vivi , Huang Rui , intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org, matthew.auld@intel.com, David Airlie , Simona Vetter References: <88ceb558-89d1-4d10-be8a-9ce2f3178fa5@amd.com> <85859dc3-cca3-4396-8c75-a726437fb81e@amd.com> <173131201749.35893.6727423786823542880@jlahtine-mobl.ger.corp.intel.com> <173133360247.70529.14349388659124962988@jlahtine-mobl.ger.corp.intel.com> <8d181425-db3b-46b7-ae73-ae6ef149d185@amd.com> <31b16d78-708d-4f19-9b33-891e31be5480@gmail.com> <173141886970.132411.13400253861916730373@jlahtine-mobl.ger.corp.intel.com> <27f8bd540ac1f04daf8a7786f4ae7828017d061b.camel@linux.intel.com> Content-Language: en-US From: =?UTF-8?Q?Christian_K=C3=B6nig?= In-Reply-To: <27f8bd540ac1f04daf8a7786f4ae7828017d061b.camel@linux.intel.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-ClientProxiedBy: FR2P281CA0142.DEUP281.PROD.OUTLOOK.COM (2603:10a6:d10:98::8) To PH7PR12MB5685.namprd12.prod.outlook.com (2603:10b6:510:13c::22) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: PH7PR12MB5685:EE_|SJ2PR12MB7823:EE_ X-MS-Office365-Filtering-Correlation-Id: a2243df7-8a6e-425d-3cbd-08dd0336a00f X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|366016|7416014|376014|1800799024; X-Microsoft-Antispam-Message-Info: =?utf-8?B?ZTJDcEtzZC9uZVhUMUwrWURzU29tKzdLRTh1RmwyTi9vOHlIdFhEZm9KdS9Y?= =?utf-8?B?RVJqN3lTR0FReXpObVR2eUIyamRLdmdMSnRGU2xuWk5XWTVWeTN1YXNaTVkw?= =?utf-8?B?Zy9SbVV3b1BRbXJuWVFvc3k1RVllU0RMbUFqT01VNGxZWTVjY1JXOTFDTzJO?= =?utf-8?B?UzNQT3MvN0VwRXR1UFJMaGJJRzlLVkdpRDFZY1ZIc3ZlcXBVRkhFMnAzTjZi?= =?utf-8?B?eFpHVWFvdlVkdDIrL2g0bDJSMjFZWlBObFFKeGE2MU9RNUlBeFliYXp3T3Rq?= =?utf-8?B?ODdLMFdaREpUQ01vOWN0KzBMaCsxTmVxQVk4UzlvQ1IzQ3BNYTc2S0RITWt4?= =?utf-8?B?cjR2SnFoWnd2MXgvVk9DN3lTL3dvWlFnenhTaVVKSFZKKy96N1B5VnRtOURK?= =?utf-8?B?R1J3WVpoWnhKVTR5UVBPMGhvUXI5WEFnVUYzdFlIYS9pcnhLWXpVa1puSExU?= =?utf-8?B?djN0d3FmWU9rM3ZzM0p6dHNWSnpXcU5yQjBidFdYUDBnMVNya0pScTBCRW9p?= =?utf-8?B?bDNKWDlhRzB4aGZmM3dlTldOU3lqa3g2WlB3Q2VEZm9LeXJkaXZmMnVtTnlH?= =?utf-8?B?UmhrNEErejEyR1FUSzlIT2xORnZVS0gwcFA5L2hkSWl2aWswamNXZ1Y3aS9H?= =?utf-8?B?enZwaHl4Y1oxVGlOZTJmWlhibW5QcFZNdURCYTgwS05xRVd2eE01dUhZYmQ2?= =?utf-8?B?aUxUblBTMzN4TGxTVmRxYXRqWlZrOW8vT0NzZnZWRHRpNW55czI4ZllXVFFL?= =?utf-8?B?TlV6MDZma1BwN0w1Nld0bmhmTkV4dFFQdllEN2tabUF3SWlhelZ2emdpN1Nt?= =?utf-8?B?dkdianR2cEsvb3BCQU1XOHpKSkNORll3MlFJUEYxVTk2azNOUVV3V1c2UVFp?= =?utf-8?B?NnowTFhndTdWZGJtLzBoWUI2ZjdybWlCemlKcVg2V0IySytBRUJ2MVdKZ244?= =?utf-8?B?SEJ0M3h6b2tiOEJ4RXRTemFKdjFkTU5CcGRHZnMvNVBheE9rdjMyaGcrSzFE?= =?utf-8?B?WGErdHlLRytYSUFKaGI4QlhZVlVnSVBKNkUyWlV2Q1REcTdCV2tvd1U1U1Js?= =?utf-8?B?SlIvcEp5RVo2djhIbVVTRGNPRUpKOXNNYnh1TWtMZkNydkIrUmxLdHdyR0hj?= =?utf-8?B?ak16emVpdGIzQWxKbW4vSStSYm5hY21uMk1NSkJRdFJiblBTR1JCN3MxRW96?= =?utf-8?B?b2lYazF0NWkvTTF6N3FZV0FOK2pOSVRTUTFyUXJJQmhnOVU5dHNtNExzdDhB?= =?utf-8?B?U3lHVHB0YjhjNldJZFl2emlwTjMrRDQ4WFZ3ci9HM3RWdDJIOFNzRmsvdU5y?= =?utf-8?B?cmg2a29NWnkxeVZBa1RZZlpPT29QTWszYVIrK294N0ZOOEdCMSt5ZTkzNDFm?= =?utf-8?B?NkU0QTlMQVY2UnY1a25NOVNBQjE1SzFyV3BSdzA2eHVWbGtnVHdQemQvWEll?= =?utf-8?B?STJGNW41Z1hlekNqWkdleUZaazhYNE85OEsyNDdPdzU4elNKUHVldXhKQVM3?= =?utf-8?B?amZlL2JjSnRobEM2N0JVajN5TUM3eUw2MktYaC82bkVkNWdaOHUxdFdNblEw?= =?utf-8?B?UzNvdUl4Y3EyVFhYblFIR3dLLzRnaWJ5Z1k1bVd6K0tmRDIzbVBVSENQbEpB?= =?utf-8?B?ODkza1cxQ3U0Vy9pY0JzZFozM1lFbXFBbDBkNlVNZWcwT1ZjRHVrS0pXek5K?= =?utf-8?B?Y2RRSXBlRmNDSDdLVHdlMVhGb3ZOeTlFcmNtL2g5c0pxQ3RDMkVJMTBBPT0=?= X-Forefront-Antispam-Report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:PH7PR12MB5685.namprd12.prod.outlook.com; PTR:; CAT:NONE; SFS:(13230040)(366016)(7416014)(376014)(1800799024); DIR:OUT; SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?utf-8?B?YXZmaW5HQmFhOFc3a0R2THhnS2hFMEhGRklYcmtXSEtFYTl1M3BhbEphUWdr?= =?utf-8?B?TjYzSlNSQW1RSnQ1enBPMDBNb2h2dWljL0tYTHg0ekhuN0thYTdzcEswY05Z?= =?utf-8?B?MG1Yc0RONE1pRmZEVEY5RWtzcDNiZjJabmFtaG1KYU8zbFN6Q2V1TnBiLzJ5?= =?utf-8?B?UE8vN1hUZ0FWci8zZTFhOWdrUGNUYTROOG5GbnBXVzQycWRmQWpyUjRWa3Ba?= =?utf-8?B?eVZtVjFiL1VIaWZUdWQ2OFJtYUcvMlQzQ3FLR21xQlgyZFMrWXpnb0dKdk1q?= =?utf-8?B?UFJSRVJoeUpSUld0TklFQkdxTnBXZExUVFFsQ00vL1NjRjlQd0R0eVIzUU15?= =?utf-8?B?MktXcUhJWUZnRUpibmwrRE4ySGVIYW5UbFFOUUF6ZmtvQ1VZTFVhRDBQY2V3?= =?utf-8?B?MTlsZVAxcDk3VEw1WXpaeWUxbHEwaWZrTmhJb3ZzbEJpRldZM3ptVnYzS2Y4?= =?utf-8?B?ZFVNTldmOU5XSnRYZFBFWVFOdUhjZllqVitGSUFRdGhGUDBCbXJjaEZIT2ZX?= =?utf-8?B?eU1WYlo3STlGQURuaGg5c3MzOEVxWVpWbWxwNnFFVk1uOTAvMS96cDRhcjhU?= =?utf-8?B?QWVZSDAxSjJObGtHUUNlQVdSZzU0Z0tNYkRFVm9FMGJWTlc1d094NnlaSTR4?= =?utf-8?B?VFpyaXgzVURLMklBdGpBSDhWZGloenBQZXpLZy9VeUFRaDZrZ0hsVHNDN0Vx?= =?utf-8?B?MXRoejJTSmF3S3ljcExiMFliRzdhMWdOemNRdGtzbU5JUmd1VWdGenpxZ2pr?= =?utf-8?B?RGljOVlCdHNaWUlHWmEvdEZ0a3lObnN3RjJmUGlBN21kenk5bkFINVpxRjFp?= =?utf-8?B?N0lTZDQ2djVsK0I4RHFnaC9xMmhEMWI0NTV1QnVFclp3cThuUmk1M0hFWW91?= =?utf-8?B?ejgwL3Q3NGhqZ0tzUWF0UmhnTHZNeWowM1pWSldEaEtOVDNrbWtJOGppZjRE?= =?utf-8?B?d29LcDJoRkUzMmVIZ2FtcU9zalNSNUVJVXhhelpCdUF2U0xjSEdWZWF4SDUw?= =?utf-8?B?MTF4dC9TRVV5MVBFZ1VKdFhVSFZLQmJhV1BoM1NzVHJiUVhNaGFOWWo0WWdn?= =?utf-8?B?QVQzUVhKZGU0M0tkczdUeDZMQ0FWWHJxVEU3a0JwVWUydzVGZGRqaG1RVGRj?= =?utf-8?B?VWtRc1A5dVVXU21QSngzOUdlRmJTcEVYVU1MMnBralBybUpKbVVGUEJUbHR2?= =?utf-8?B?RGFma1htMExqYUJONXh5M3Fzb2gyK3JXbEdOSFpQc296aFRrNG1mZU5tQ1hx?= =?utf-8?B?N3RoaElndlpRbVdhYUVuY3J6SHhRa0xMbHZpUEhVbXRlVGI0VE50VjFTeDN1?= =?utf-8?B?b2t6N0huNFNSaEtJRHFCMFBPcXl3aWZBMGJaVWdUM2F0OGNyb2VFV3Y1STNo?= =?utf-8?B?eEZSK0wvckJzU29sYm9qN3ZPa1F3UkZiL2FCTjRpem5NbUtwQ09ZRkdHcHpZ?= =?utf-8?B?ZnVjK2pZbnBzRzJlZnFXU3dDUDZ4TE9zK21DTFUzR0tONjRTVHpqcWc0aWht?= =?utf-8?B?NDcwMVhoT2dic1FlcTBVOEtiUWduV21rT2tyYitPaXhXZkkyRFFIK0FYVjNy?= =?utf-8?B?b05nZlFseHlBbmJOU3N6dllpMHIzSDJ6bjB6NTY5bk43M24vcXFuZ0FjNjBa?= =?utf-8?B?RnhYY0dCclhGMUtqUEhQSkxBZWlEUWhpaVhlT0hQenc5b2RYTXpRYm12cFFU?= =?utf-8?B?U2pBZE9QVGRoMTNlRzM4ME9heEVVSXBCR1J6aFFDNlhqYUJ2SXhMWk12UkhL?= =?utf-8?B?L3ovenpudUFZRWw1eWxVVndjdUVrb2NwOVJSYUtlZVBJTGJwblNCd3MybzBD?= =?utf-8?B?d3psck1YWmExUm1vYi90SHo5ZzhibmRRbG5sTHFhOURxaWRTK0FsWkhrY21s?= =?utf-8?B?YUFOUW16MnpaSmZ4V2VTNERSTGt3RGc5REdoR2p5THcvcllxUi9yS2o4Tjdo?= =?utf-8?B?QU53QVM4K1ovQ0VMeXo2akpSYjQ4MmxyOHMyTWxQeHpsa0lqODk2aENaVDJ0?= =?utf-8?B?dWZwK0g0ODRYaWpxZ3g4aWpCVmFKV0lKNEZETGF5WTAzZHFIQ3RVYWcwQ1p1?= =?utf-8?B?eTJ1QnZZeFgwMWV4R3lpUUxUaWhOM3Vudy85ZlkwbWVqUy9KWkNZdFZCUWRM?= =?utf-8?Q?swIob09f4u8DDMb6/IYbunmCQ?= X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-Network-Message-Id: a2243df7-8a6e-425d-3cbd-08dd0336a00f X-MS-Exchange-CrossTenant-AuthSource: PH7PR12MB5685.namprd12.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 12 Nov 2024 16:25:30.7977 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: ANuWBByLZSnpLqMOyBW8ZM/TSwShazVP/kZpC3CbmYj5kC4hgyPI2jK4NOlDaoGi X-MS-Exchange-Transport-CrossTenantHeadersStamped: SJ2PR12MB7823 X-BeenThere: intel-xe@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel Xe graphics driver List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-xe-bounces@lists.freedesktop.org Sender: "Intel-xe" Am 12.11.24 um 17:22 schrieb Thomas Hellström: > On Tue, 2024-11-12 at 15:41 +0200, Joonas Lahtinen wrote: >> (+ Thomas) >> >> Quoting Christian König (2024-11-12 11:23:36) >>> Am 11.11.24 um 23:45 schrieb Matthew Brost: >>> >>>     [SNIP] >>> >>>             So I think only way to allow interactive debugging is >>> to avoid the >>>             dma_fences. Curious to hear if there are ideas for >>> otherwise. >>> >>>         You need to guarantee somehow that the process is taken >>> from the hardware so >>>         that the preemption fence can signal. >>> >>> >>>     Our preemption fences have this functionality. >>> >>>     A preemption fence issues a suspend execution command to the >>> firmware. The >>>     firmware, in turn, attempts to preempt the workload. If it >>> doesn't respond >>>     within a specified period, it resets the hardware queue, sends >>> a message to KMD, >>>     bans the software queue, and signals the preemption fence. >>> >>>     We provide even more protection than that. If, for some reason, >>> the firmware >>>     doesn't respond within a longer timeout period, the KMD >>> performs a device reset, >>>     ban the offending software queue(s), and will signal the >>> preemption fences. >>> >>>     This flow remains the same whether a debugger is attached or, >>> for example, a >>>     user submits a 10-minute non-preemptable workload. In either >>> case, other >>>     processes are guaranteed to make forward progress. >>> >>> >>> Yeah that is pretty much the same argumentation I have heard before >>> and it >>> turned out to not be working. >>> >>> >>>     The example above illustrates the memory oversubscription case, >>> where two >>>     processes are using 51% of the memory. >>> >>> >>> That isn't even necessary. We have seen applications dying just >>> because the >>> core memory management tried to join back small pages into huge >>> pages in an >>> userptr. >>> >>> That the core memory management jumps in and requests that the pre- >>> emption >>> fence signals can happen all the time. >> Ouch. Does there happen to be a known reproducer for this behavior or >> maybe >> bug report? >> >>> You can mitigate that a bit, Fedora for example disables joining >>> back small >>> pages into huge pages by default for example and we even had people >>> suggesting >>> to use mprotect() so that userptrs VMAs don't fork() any more >>> (which is of >>> course completely illegal). >>> >>> But my long term take away is that you can't block all causes of >>> sudden >>> requests to let a pre-emption fence signal. >> I think this problem equally applies to the LR-workloads like the EU >> debugging ones. >> >>>     Another preemption scenario involves two processes sharing >>> hardware resources. >>>     Our firmware follows the same flow here. If an LR workload is >>> using a hardware >>>     resource and a DMA-fence workload is waiting, and if the LR >>> workload doesn't >>>     preempt the in a timely manner, the firmware issues a hardware >>> reset, notifies >>>     KMD, and bans the LR software queue. The DMA-fence workload >>> then can make >>>     forward progress >>> >>>     With the above in mind, this is why I say that if a user tries >>> to run a game and >>>     a non-preemptable LR workload, either oversubscribing memory or >>> sharing hardware >>>     resources, it is unlikely to work well. However, I don't think >>> this is a common >>>     use case. I would expect that when a debugger is open, it is >>> typically by a >>>     power user who knows how to disable other GPU tasks (e.g., by >>> enabling software >>>     rendering or using a machine without any display). >>> >>>     Given this, please to reconsider your position. >>> >>> >>> The key point here is that this isn't stable, you can do that as a >>> tech demo >>> but it can always be that debugging an application just randomly >>> dies. And >>> believe me AMD has tried this to a rather extreme extend as well. >> It's not really only limited to the debuggable applications at all, >> the >> normal LR workloads are equally impacted as far as I understand. Just >> harder to catch the issue with LR-workloads if the pre-emption fence >> signaling is sporadic. >> >>> What you could potentially work is to taint the kernel and make >>> sure that this >>> function is only available to user who absolutely know what they >>> are doing. >>> >>> But I would say we can only allow that if all other options have >>> been exercised >>> and doing it like this is really the only option left. >> It sounds like servicing the memory pre-empt fence by stealing the >> pages from underneath the workload would be the way to resolve this >> issue. >> >> This has been extensively discussed already, but was expected to >> really >> only be needed for low-on-memory scenarios. However it now seems like >> the need is much earlier due to the random userptr page joining by >> core >> mm. > Just to clarify here: > > In Long-Running mode with recoverable pagefaults enabled we don't have > any preempt-fences, but rather just zap the PTEs pointing to the > affected memory and flush TLB. So from a memory resource POW a > breakpoint should be safe, and no mmu notifier nor shrinker will be > blocked. That sounds like a HMM based approach which would clearly work. But where is that? I don't see any HMM based approach anywhere in the XE driver. Regards, Christian. > > Nor will there be any jobs with published dma-fences depending on the > job blocked either temporarily by a pagefault or long-term by a > debugger breakpoint. > > /Thomas > > >> If that is done and the memory pre-empt fence is serviced even for >> debuggable contexts, do you have further concerns with the presented >> approach >> from dma-buf and drm/sched perspective? >> >> Regards, Joonas >> >>> Regards, >>> Christian. >>> >>> >>>         This means that a breakpoint or core dump doesn't halt GPU >>> threads, but >>>         rather suspends them. E.g. all running wave data is >>> collected into a state >>>         bag which can be restored later on. >>> >>>         I was under the impression that those long running compute >>> threads do >>>         exactly that, but when the hardware can't switch out the >>> GPU thread/process >>>         while in a break then that isn't the case. >>> >>>         As long as you don't find a way to avoid that this patch >>> set is a pretty >>>         clear NAK from my side as DMA-buf and TTM maintainer. >>> >>> >>>     I believe this is addressed above. >>> >>>     Matt >>> >>> >>>         What might work is to keep the submission on the hardware >>> in the break state >>>         but forbid any memory access. This way you can signal your >>> preemption fence >>>         even when the hardware isn't made available. >>> >>>         Before you continue XE setups a new pre-emption fence and >>> makes sure that >>>         all page tables etc... are up to date. >>> >>>         Could be tricky to get this right if completion fence based >>> submissions are >>>         mixed in as well, but that gives you at least a direction >>> you could >>>         potentially go. >>> >>>         Regards, >>>         Christian. >>> >>> >>>             Regards, Joonas >>> >>> >>>                 Regards, >>>                 Christian. >>> >>> >>>                     Some wash-up thoughts from me below, but >>> consider them fairly irrelevant >>>                     since I think the main driver for these big >>> questions here should be >>>                     gdb/userspace. >>> >>> >>>                         Quoting Christian König (2024-11-07 >>> 11:44:33) >>> >>>                             Am 06.11.24 um 18:00 schrieb Matthew >>> Brost: >>> >>>                                   [SNIP] >>> >>>                                   This is not a generic interface >>> that anyone can freely access. The same >>>                                   permissions used by ptrace are >>> checked when opening such an interface. >>>                                   See [1] [2]. >>> >>> >>> [1]https://patchwork.freedesktop.org/patch/617470/?series=136572&re >>> v=2 >>> >>> [2]https://patchwork.freedesktop.org/patch/617471/?series=136572&re >>> v=2 >>> >>> >>>                             Thanks a lot for those pointers, that >>> is exactly what I was looking for. >>> >>>                             And yeah, it is what I feared. You are >>> re-implementing existing functionality, >>>                             but see below. >>> >>>                         Could you elaborate on what this "existing >>> functionality" exactly is? >>>                         I do not think this functionality exists at >>> this time. >>> >>>                         The EU debugging architecture for Xe >>> specifically avoids the need for GDB >>>                         to attach with ptrace to the CPU process or >>> interfere with the CPU process for >>>                         the debugging via parasitic threads or so. >>> >>>                         Debugger connection is opened to the DRM >>> driver for given PID (which uses the >>>                         ptrace may access check for now) after >>> which the all DRM client of that >>>                         PID are exposed to the debugger process. >>> >>>                         What we want to expose via that debugger >>> connection is the ability for GDB to >>>                         read/write the different GPU VM address >>> spaces (ppGTT for Intel GPUs) just like >>>                         the EU threads would see them. Note that >>> the layout of the ppGTT is >>>                         completely up to the userspace driver to >>> setup and is mostly only partially >>>                         equal to the CPU address space. >>> >>>                         Specifically as part of reading/writing the >>> ppGTT for debugging purposes, >>>                         there are deep flushes needed: for example >>> flushing instruction cache >>>                         when adding/removing breakpoints. >>> >>>                         Maybe that will explain the background. I >>> elaborate on this at the end some more. >>> >>> >>>                                           kmap/vmap are used >>> everywhere in the DRM subsystem to access BOs, so I’m >>>                                           failing to see the >>> problem with adding a simple helper based on existing >>>                                           code. >>> >>>                                       What#s possible and often >>> done is to do kmap/vmap if you need to implement a >>>                                       CPU copy for scanout for >>> example or for copying/validating command buffers. >>>                                       But that usually requires >>> accessing the whole BO and has separate security >>>                                       checks. >>> >>>                                       When you want to access only >>> a few bytes of a BO that sounds massively like >>>                                       a peek/poke like interface >>> and we have already rejected that more than once. >>>                                       There even used to be >>> standardized GEM IOCTLs for that which have been >>>                                       removed by now. >>> >>>                         Referring to the explanation at top: These >>> IOCTL are not for the debugging target >>>                         process to issue. The peek/poke interface >>> is specifically for GDB only >>>                         to facilitate the emulation of memory >>> reads/writes on the GPU address >>>                         space as they were done by EUs themselves. >>> And to recap: for modifying >>>                         instructions for example (add/remove >>> breakpoint), extra level of cache flushing is >>>                         needed which is not available to regular >>> userspace. >>> >>>                         I specifically discussed with Sima on the >>> difference before moving forward with this >>>                         design originally. If something has changed >>> since then, I'm of course happy to rediscuss. >>> >>>                         However, if this code can't be added, not >>> sure how we would ever be able >>>                         to implement core dumps for GPU >>> threads/memory? >>> >>> >>>                                       If you need to access BOs >>> which are placed in not CPU accessible memory then >>>                                       implement the access callback >>> for ptrace, see amdgpu_ttm_access_memory for >>>                                       an example how to do this. >>> >>>                         As also mentioned above, we don't work via >>> ptrace at all when it comes >>>                         to debugging the EUs. The only thing used >>> for now is the ptrace_may_access to >>>                         implement similar access restrictions as >>> ptrace has. This can be changed >>>                         to something else if needed. >>> >>> >>>                                   Ptrace access via >>> vm_operations_struct.access → ttm_bo_vm_access. >>> >>>                                   This series renames >>> ttm_bo_vm_access to ttm_bo_access, with no code changes. >>> >>>                                   The above function accesses a BO >>> via kmap if it is in SYSTEM / TT, >>>                                   which is existing code. >>> >>>                                   This function is only exposed to >>> user space via ptrace permissions. >>> >>>                         Maybe this sentence is what caused the >>> confusion. >>> >>>                         Userspace is never exposed with peek/poke >>> interface, only the debugger >>>                         connection which is its own FD. >>> >>> >>>                                   In this series, we implement a >>> function [3] similar to >>>                                   amdgpu_ttm_access_memory for the >>> TTM vfunc access_memory. What is >>>                                   missing is non-visible CPU memory >>> access, similar to >>>                                   amdgpu_ttm_access_memory_sdma. >>> This will be addressed in a follow-up and >>>                                   was omitted in this series given >>> its complexity. >>> >>>                                   So, this looks more or less >>> identical to AMD's ptrace implementation, >>>                                   but in GPU address space. Again, >>> I fail to see what the problem is here. >>>                                   What am I missing? >>> >>> >>>                             The main question is why can't you use >>> the existing interfaces directly? >>> >>>                         We're not working on the CPU address space >>> or BOs. We're working >>>                         strictly on the GPU address space as would >>> be seen by an EU thread if it >>>                         accessed address X. >>> >>> >>>                             Additional to the peek/poke interface >>> of ptrace Linux has the pidfd_getfd >>>                             system call, see >>> herehttps://man7.org/linux/man-pages/man2/pidfd_getfd.2.html. >>> >>>                             The pidfd_getfd() allows to dup() the >>> render node file descriptor into your gdb >>>                             process. That in turn gives you all the >>> access you need from gdb, including >>>                             mapping BOs and command submission on >>> behalf of the application. >>> >>>                         We're not operating on the CPU address >>> space nor are we operating on BOs >>>                         (there is no concept of BO in the EU debug >>> interface). Each VMA in the VM >>>                         could come from anywhere, only the start >>> address and size matter. And >>>                         neither do we need to interfere with the >>> command submission of the >>>                         process under debug. >>> >>> >>>                             As far as I can see that allows for the >>> same functionality as the eudebug >>>                             interface, just without any driver >>> specific code messing with ptrace >>>                             permissions and peek/poke interfaces. >>> >>>                             So the question is still why do you >>> need the whole eudebug interface in the >>>                             first place? I might be missing >>> something, but that seems to be superfluous >>>                             from a high level view. >>> >>>                         Recapping from above. It is to allow the >>> debugging of EU threads per DRM >>>                         client, completely independent of the CPU >>> process. If ptrace_may_acces >>>                         is the sore point, we could consider other >>> permission checks, too. There >>>                         is no other connection to ptrace in this >>> architecture as single >>>                         permission check to know if PID is fair >>> game to access by debugger >>>                         process. >>> >>>                         Why no parasitic thread or ptrace: Going >>> forward, binding the EU debugging to >>>                         the DRM client would also pave way for >>> being able to extend core kernel generated >>>                         core dump with each DRM client's EU >>> thread/memory dump. We have similar >>>                         feature called "Offline core dump" enabled >>> in the downstream public >>>                         trees for i915, where we currently attach >>> the EU thread dump to i915 error state >>>                         and then later combine i915 error state >>> with CPU core dump file with a >>>                         tool. >>> >>>                         This is relatively little amount of extra >>> code, as this baseline series >>>                         already introduces GDB the ability to >>> perform the necessary actions. >>>                         It's just the matter of kernel driver >>> calling: "stop all threads", then >>>                         copying the memory map and memory contents >>> for GPU threads, just like is >>>                         done for CPU threads. >>> >>>                         With parasitic thread injection, not sure >>> if there is such way forward, >>>                         as it would seem to require to inject quite >>> abit more logic to core kernel? >>> >>> >>>                             It's true that the AMD KFD part has >>> still similar functionality, but that is >>>                             because of the broken KFD design of >>> tying driver state to the CPU process >>>                             (which makes it inaccessible for gdb >>> even with imported render node fd). >>> >>>                             Both Sima and I (and partially Dave as >>> well) have pushed back on the KFD >>>                             approach. And the long term plan is to >>> get rid of such device driver specific >>>                             interface which re-implement existing >>> functionality just differently. >>> >>>                         Recapping, this series is not adding it >>> back. The debugger connection >>>                         is a separate FD from the DRM one, with >>> separate IOCTL set. We don't allow >>>                         the DRM FD any new operations based on >>> ptrace is attached or not. We >>>                         don't ever do that check even. >>> >>>                         We only restrict the opening of the >>> debugger connection to given PID with >>>                         ptrace_may_access check for now. That can >>> be changed to something else, >>>                         if necessary. >>> >>>                     Yeah I think unnecessarily tying gpu processes >>> to cpu processes is a bad >>>                     thing, least because even today all the svm >>> discussions we have still hit >>>                     clear use-cases, where a 1:1 match is not >>> wanted (like multiple gpu svm >>>                     sections with offsets). Not even speaking of >>> all the gpu usecases where >>>                     the gpu vm space is still entirely independent >>> of the cpu side. >>> >>>                     So that's why I think this entirely separate >>> approach looks like the right >>>                     one, with ptrace_may_access as the access >>> control check to make sure we >>>                     match ptrace on the cpu side. >>> >>>                     But there's very obviously a bikeshed to be had >>> on what the actual uapi >>>                     should look like, especially how gdb opens up a >>> gpu debug access fd. But I >>>                     also think that's not much on drm to decide, >>> but whatever gdb wants. And >>>                     then we aim for some consistency on that >>> lookup/access control part >>>                     (ideally, I might be missing some reasons why >>> this is a bad idea) across >>>                     drm drivers. >>> >>> >>>                             So you need to have a really really >>> good explanation why the eudebug interface >>>                             is actually necessary. >>> >>>                         TL;DR The main point is to decouple the >>> debugging of the EU workloads from the >>>                         debugging of the CPU process. This avoids >>> the interference with the CPU process with >>>                         parasitic thread injection. Further this >>> also allows generating a core dump >>>                         without any GDB connected. There are also >>> many other smaller pros/cons >>>                         which can be discussed but for the context >>> of this patch, this is the >>>                         main one. >>> >>>                         So unlike parasitic thread injection, we >>> don't unlock any special IOCTL for >>>                         the process under debug to be performed by >>> the parasitic thread, but we >>>                         allow the minimal set of operations to be >>> performed by GDB as if those were >>>                         done on the EUs themselves. >>> >>>                         One can think of it like the minimal subset >>> of ptrace but for EU threads, >>>                         not the CPU threads. And thus, building on >>> this it's possible to extend >>>                         the core kernel generated core dumps with >>> DRM specific extension which >>>                         would contain the EU thread/memory dump. >>> >>>                     It might be good to document (in that debugging >>> doc patch probably) why >>>                     thread injection is not a great option, and why >>> the tradeoffs for >>>                     debugging are different than for for >>> checkpoint/restore, where with CRIU >>>                     we landed on doing most of this in userspace, >>> and often requiring >>>                     injection threads to make it all work. >>> >>>                     Cheers, Sima >>> >>> >>>                         Regards, Joonas >>> >>> >>>                             Regards, >>>                             Christian. >>> >>> >>> >>>                                   Matt >>> >>> >>> [3]https://patchwork.freedesktop.org/patch/622520/?series=140200&re >>> v=6 >>> >>> >>>                                       Regards, >>>                                       Christian. >>> >>> >>>                                           Matt >>> >>> >>>                                               Regards, >>>                                               Christian. >>> >>> >>>