From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7CA40C54791 for ; Wed, 13 Mar 2024 14:21:24 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id D727310F30C; Wed, 13 Mar 2024 14:21:23 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (1024-bit key; unprotected) header.d=amd.com header.i=@amd.com header.b="dsgiqxFz"; dkim-atps=neutral Received: from NAM11-BN8-obe.outbound.protection.outlook.com (mail-bn8nam11on2040.outbound.protection.outlook.com [40.107.236.40]) by gabe.freedesktop.org (Postfix) with ESMTPS id 1B31D10F80C for ; Wed, 13 Mar 2024 14:21:22 +0000 (UTC) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=hkjF45BDqYgWzuSmYi7A8Shrjslzs2Hur7mSNO4mdIEOXlxJP4kGiBQzdU1DrKCOAvi43giF1Y2YiJaacwzm91PemUKGAGVR/mxr4YehXTmCSEmj7rJzGSoN9UQHQBZ8ZSztN5JznvZGO8Gg0J+OD8Y5EY0ZkyyAUTZRVLkbCz++Xfiz4oEgK8MxmbBgJSUVcAZXBYg/apGV3pWidr14rXy9SB2YnXmSZGIva7Oqaa9gE3fAEDUwZA+MR8XM0nZGzs/zsTS/QUb3kB2oMPaXcw1bnxFRKHktNPi6ZDSRmBTDS54CdVz6w5BN3yt8sjbDYRFbvF9CXr4QRDMrAq2Qjw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=rXgMB+xQkQ4gpX+++YsXdEeANSaz7X9as2JYDTFCNQU=; b=WvTtwSht6bhNsVS3O9GUQP0iOQdAm9lCC4BiRR9whDLY3+Emc2m4O9AakELdqpl2HVhk0IOTzTlf1Brr+hiIHiov18SmoKmIVogQoCvytiSJbsyUGxN35+Pk+IIDgy5JQO2o+9GSVuQY9Gv1T0wOxKEnFUXv6yIZH75JOCQF1G8+Bl06wmPnp9RxNt4stVoIJ3X9VFLG54jAwWuvN5ocg4ebRoiy2MFyIXvnyYZrEomMiJEYiaKnzzO+nN1rZCmaVJV5Mo5ji4xTvQ3K1gsLuzalxdoeJhW+CmC0gB1OyNQP+RUxZVFwHRjwR2iDQmpoUwsBEWUi39Zq7FOE15Xd8g== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=lists.freedesktop.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=rXgMB+xQkQ4gpX+++YsXdEeANSaz7X9as2JYDTFCNQU=; b=dsgiqxFzpw2sOotTfc5C2r1VPiVHR+z29H5UJi3N1KgOr0M9vKBIa1vzxgLuzCcECEz9G3pN/IJ4wt4HqMkGsmcQ2NSlmbOi10UUkieES3y44h+vUUscI350G10aUnfyvj0bdt4Q3t07uwckCtkb1kp4a0QsTBi6Uvr11dkcLOs= Received: from MW4PR03CA0019.namprd03.prod.outlook.com (2603:10b6:303:8f::24) by MN0PR12MB6080.namprd12.prod.outlook.com (2603:10b6:208:3c8::10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7386.20; Wed, 13 Mar 2024 14:21:17 +0000 Received: from CO1PEPF000044FD.namprd21.prod.outlook.com (2603:10b6:303:8f:cafe::73) by MW4PR03CA0019.outlook.office365.com (2603:10b6:303:8f::24) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7339.38 via Frontend Transport; Wed, 13 Mar 2024 14:21:17 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by CO1PEPF000044FD.mail.protection.outlook.com (10.167.241.203) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.7409.0 via Frontend Transport; Wed, 13 Mar 2024 14:21:17 +0000 Received: from jonathan-KFD.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.35; Wed, 13 Mar 2024 09:21:16 -0500 From: Jonathan Kim To: CC: , , Jonathan Kim , Jonathan Kim , Jesse Zhang Subject: [PATCH] drm/amdkfd: range check cp bad op exception interrupts Date: Wed, 13 Mar 2024 10:21:05 -0400 Message-ID: <20240313142105.3028030-1-Jonathan.Kim@amd.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB04.amd.com (10.181.40.145) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: CO1PEPF000044FD:EE_|MN0PR12MB6080:EE_ X-MS-Office365-Filtering-Correlation-Id: 7fdf130d-b8a4-47cb-0e7d-08dc4368d8e5 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: qaee4nkH8SHuP2qmi8hu1JMvh+AjtjyDDjH1tkVZy7L4DntlCAjd+1kTFhX9MZ3TMQYe/umGM+uHSVaBMaP4lfQTzwBMtnRhnjG4mrXJCi9GAGryekhVSCkP+UzFw4hGXoCXtXORhQsaFIhMgLOZ6nmW/iAOAZTy0Th8RHP8fj4kalGWxniN3dGFHtbH2bWPMAhaFMmF5a6WNU0hMIqVD7k1T+hxcw/iMPNHCFSwv2RGflZl7yiFVjx4IIAZZ1Y7WNdYwCGEcW7fWRHZrKEg8MMeRpUzRtMt4NTdaNzgD2z7+DC2BkM90NdFw6holbORPwGFshJlYXhXrvDe37YbhwoRhbDEV9x219c6g3BeWqCJ6Rm6YK6si1F92R1z387aGrT+A0af979dQ+E3baUZSDmAKTh3RqqLxqEiydwzqvdlN1avyBqi4PDiAIIyiGSrExLSKg2xhVYSdoUiTNjIj5iR3cB+NjXALFhfEXWjOIR1O33GXZ/sc6yWy0eKuaSDgytGg1EkO1ClwlPlI4o3NLO6Oqj1RQ6Q51H+14U8YPU61/aIB0qOVdbR+qnuoCP12gMsl0Kylb4773sypvS3vJkkf5TwEFB18qFzKW6vYB6vHJa3vAlPyaZs1kjRKF8Y5WJMpxI3cNKhOEukSYzVOQmjBSx3YN08zvjlc6E2tffRgu7IOFneKAPTgn7nBmwyCgro9gtJb+FyPP6dbezTq3YL3g1HSuVqys8yep1r/cRE+pbDMK+hBICet+UVi7Iy X-Forefront-Antispam-Report: CIP:165.204.84.17; CTRY:US; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:SATLEXMB04.amd.com; PTR:InfoDomainNonexistent; CAT:NONE; SFS:(13230031)(1800799015)(36860700004)(82310400014)(376005); DIR:OUT; SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 13 Mar 2024 14:21:17.2598 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 7fdf130d-b8a4-47cb-0e7d-08dc4368d8e5 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d; Ip=[165.204.84.17]; Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: CO1PEPF000044FD.namprd21.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: MN0PR12MB6080 X-BeenThere: amd-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Discussion list for AMD gfx List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: amd-gfx-bounces@lists.freedesktop.org Sender: "amd-gfx" Due to a CP interrupt bug, bad packet garbage exception codes are raised. Do a range check so that the debugger and runtime do not receive garbage codes. Update the user api to guard exception code type checking as well. Signed-off-by: Jonathan Kim Tested-by: Jesse Zhang --- .../gpu/drm/amd/amdkfd/kfd_int_process_v10.c | 3 ++- .../gpu/drm/amd/amdkfd/kfd_int_process_v11.c | 3 ++- drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 3 ++- include/uapi/linux/kfd_ioctl.h | 17 ++++++++++++++--- 4 files changed, 20 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c index a8e76287dde0..013d0a073b9b 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c @@ -339,7 +339,8 @@ static void event_interrupt_wq_v10(struct kfd_node *dev, break; } kfd_signal_event_interrupt(pasid, context_id0 & 0x7fffff, 23); - } else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE) { + } else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE && + KFD_DBG_EC_TYPE_IS_PACKET(KFD_DEBUG_CP_BAD_OP_ECODE(context_id0))) { kfd_set_dbg_ev_from_interrupt(dev, pasid, KFD_DEBUG_DOORBELL_ID(context_id0), KFD_EC_MASK(KFD_DEBUG_CP_BAD_OP_ECODE(context_id0)), diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c index 7e2859736a55..fe2ad0c0de95 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c @@ -328,7 +328,8 @@ static void event_interrupt_wq_v11(struct kfd_node *dev, /* CP */ if (source_id == SOC15_INTSRC_CP_END_OF_PIPE) kfd_signal_event_interrupt(pasid, context_id0, 32); - else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE) + else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE && + KFD_DBG_EC_TYPE_IS_PACKET(KFD_CTXID0_CP_BAD_OP_ECODE(context_id0))) kfd_set_dbg_ev_from_interrupt(dev, pasid, KFD_CTXID0_DOORBELL_ID(context_id0), KFD_EC_MASK(KFD_CTXID0_CP_BAD_OP_ECODE(context_id0)), diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c index ff7392336795..5483211c5d3d 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c @@ -388,7 +388,8 @@ static void event_interrupt_wq_v9(struct kfd_node *dev, break; } kfd_signal_event_interrupt(pasid, sq_int_data, 24); - } else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE) { + } else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE && + KFD_DBG_EC_TYPE_IS_PACKET(KFD_DEBUG_CP_BAD_OP_ECODE(context_id0))) { kfd_set_dbg_ev_from_interrupt(dev, pasid, KFD_DEBUG_DOORBELL_ID(context_id0), KFD_EC_MASK(KFD_DEBUG_CP_BAD_OP_ECODE(context_id0)), diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index 9ce46edc62a5..2040a470ddb4 100644 --- a/include/uapi/linux/kfd_ioctl.h +++ b/include/uapi/linux/kfd_ioctl.h @@ -913,14 +913,25 @@ enum kfd_dbg_trap_exception_code { KFD_EC_MASK(EC_DEVICE_NEW)) #define KFD_EC_MASK_PROCESS (KFD_EC_MASK(EC_PROCESS_RUNTIME) | \ KFD_EC_MASK(EC_PROCESS_DEVICE_REMOVE)) +#define KFD_EC_MASK_PACKET (KFD_EC_MASK(EC_QUEUE_PACKET_DISPATCH_DIM_INVALID) | \ + KFD_EC_MASK(EC_QUEUE_PACKET_DISPATCH_GROUP_SEGMENT_SIZE_INVALID) | \ + KFD_EC_MASK(EC_QUEUE_PACKET_DISPATCH_CODE_INVALID) | \ + KFD_EC_MASK(EC_QUEUE_PACKET_RESERVED) | \ + KFD_EC_MASK(EC_QUEUE_PACKET_UNSUPPORTED) | \ + KFD_EC_MASK(EC_QUEUE_PACKET_DISPATCH_WORK_GROUP_SIZE_INVALID) | \ + KFD_EC_MASK(EC_QUEUE_PACKET_DISPATCH_REGISTER_INVALID) | \ + KFD_EC_MASK(EC_QUEUE_PACKET_VENDOR_UNSUPPORTED)) /* Checks for exception code types for KFD search */ +#define KFD_DBG_EC_IS_VALID(ecode) (ecode > EC_NONE && ecode < EC_MAX) #define KFD_DBG_EC_TYPE_IS_QUEUE(ecode) \ - (!!(KFD_EC_MASK(ecode) & KFD_EC_MASK_QUEUE)) + (KFD_DBG_EC_IS_VALID(ecode) && !!(KFD_EC_MASK(ecode) & KFD_EC_MASK_QUEUE)) #define KFD_DBG_EC_TYPE_IS_DEVICE(ecode) \ - (!!(KFD_EC_MASK(ecode) & KFD_EC_MASK_DEVICE)) + (KFD_DBG_EC_IS_VALID(ecode) && !!(KFD_EC_MASK(ecode) & KFD_EC_MASK_DEVICE)) #define KFD_DBG_EC_TYPE_IS_PROCESS(ecode) \ - (!!(KFD_EC_MASK(ecode) & KFD_EC_MASK_PROCESS)) + (KFD_DBG_EC_IS_VALID(ecode) && !!(KFD_EC_MASK(ecode) & KFD_EC_MASK_PROCESS)) +#define KFD_DBG_EC_TYPE_IS_PACKET(ecode) \ + (KFD_DBG_EC_IS_VALID(ecode) && !!(KFD_EC_MASK(ecode) & KFD_EC_MASK_PACKET)) /* Runtime enable states */ -- 2.34.1