From: André Almeida
To: "Alex Deucher", Christian König, siqueira@igalia.com, airlied@gmail.com, simona@ffwll.ch, "Raag Jadav", rodrigo.vivi@intel.com, jani.nikula@linux.intel.com, Xaver Hugl, Krzysztof Karas
Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, kernel-dev@igalia.com, amd-gfx@lists.freedesktop.org, intel-xe@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, André Almeida
Subject: [PATCH v5 0/3] drm: Create a task info option for wedge events
Date: Tue, 20 May 2025 13:32:40 -0300
Message-ID: <20250520163243.328746-1-andrealmeid@igalia.com>

This patchset implements a request made by Xaver Hugl about wedge events:

"I'd really like to have the PID of the client that triggered the GPU
reset, so that we can kill it if multiple resets are triggered in a
row (or switch to software rendering if it's KWin itself) and show a
user-friendly notification about why their app(s) crashed, but that
can be added later."

From https://lore.kernel.org/dri-devel/CAFZQkGwJ4qgHV8WTp2=svJ_VXhb-+Y8_VNtKB=jLsk6DqMYp9w@mail.gmail.com/

To test both wedge event paths in the driver, I've used amdgpu's
debug_mask options debug_disable_soft_recovery and
debug_disable_gpu_ring_reset. To trigger a ring timeout, I've used
this app:

https://gitlab.freedesktop.org/andrealmeid/gpu-timeout

Thanks!

Changelog:

v5:
- Change from app to task also in structs, commit message and docs
- Add a check for NULL or empty task name string

v4:
- Change from APP to TASK
- Add defines for event_string and pid_string length

v3:
- Make comm_string and pid_string empty when there's no app info
- Change "app that caused ..." to "app involved ..."
- Clarify that devcoredump has more information about what happened

v2:
- Rebased on top of drm/drm-next
- Added new patch for documentation

André Almeida (3):
  drm: Create a task info option for wedge events
  drm/doc: Add a section about "Task information" for the wedge API
  drm/amdgpu: Make use of drm_wedge_task_info

 Documentation/gpu/drm-uapi.rst             | 17 +++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +++++++++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  6 +++++-
 drivers/gpu/drm/drm_drv.c                  | 19 +++++++++++++++----
 drivers/gpu/drm/i915/gt/intel_reset.c      |  3 ++-
 drivers/gpu/drm/xe/xe_device.c             |  3 ++-
 include/drm/drm_device.h                   |  8 ++++++++
 include/drm/drm_drv.h                      |  3 ++-
 8 files changed, 68 insertions(+), 10 deletions(-)

-- 
2.49.0