From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 326DBD6554C for ; Tue, 26 Nov 2024 19:19:01 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id DAE4010E083; Tue, 26 Nov 2024 19:19:00 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="HqCfj54o"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) by gabe.freedesktop.org (Postfix) with ESMTPS id C0D5910E083 for ; Tue, 26 Nov 2024 19:18:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1732648740; x=1764184740; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=99oFKyB5Bq/hXOqvzZAKVH/aVX7Ec+9FiZUcP6/389U=; b=HqCfj54oV0QwkK0iIItn4oUfvBd3pH7nlONMHLgM1dJIFm3w59qgVfiI Nh5pHLvVMhdDVBdgEgf3jwZ3ClGojCxcWp7aEQCulH+joMaYUiClN+886 tpDcODZ9wPAVIqi13tL5r+lYiUIQxm5sKQFAeSRYsCY+oKMeH4AU2Sjvj UNBC0h6kFDIaDpoySQyb6b9wG1foJrBpJEGYByXvwJf+xBAl8ww7c7Mxz 6Y05XDfRu3VU0gh42aU9t4P3Rg9IwAIM2QQ18a7DGoP2Gge1BKgL6y4/y kQhynDxIZ+JeFqGyjkpt62zJX51YyQ4hw3yehc6dUTA0CJYL2V/o6utX1 Q==; X-CSE-ConnectionGUID: qEq7eZmPTpmJsRkN+6/nPg== X-CSE-MsgGUID: /IP0ug/LSn235n7+hpjHww== X-IronPort-AV: E=McAfee;i="6700,10204,11268"; a="44217843" X-IronPort-AV: E=Sophos;i="6.12,186,1728975600"; d="scan'208";a="44217843" Received: from orviesa006.jf.intel.com ([10.64.159.146]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 26 Nov 2024 11:18:58 -0800 X-CSE-ConnectionGUID: HeGNuCGXTBq1/o3jjFaFAg== X-CSE-MsgGUID: QkQ390vdT3673rTkrjXnKg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,186,1728975600"; d="scan'208";a="91811732" Received: from relo-linux-5.jf.intel.com ([10.165.21.152]) by orviesa006.jf.intel.com with ESMTP; 26 Nov 2024 11:18:58 -0800 From: John.C.Harrison@Intel.com To: Intel-Xe@Lists.FreeDesktop.Org Cc: John Harrison Subject: [PATCH v5 2/3] drm/xe: Move the coredump registration to the worker thread Date: Tue, 26 Nov 2024 11:18:56 -0800 Message-ID: <20241126191857.2583429-3-John.C.Harrison@Intel.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20241126191857.2583429-1-John.C.Harrison@Intel.com> References: <20241126191857.2583429-1-John.C.Harrison@Intel.com> MIME-Version: 1.0 Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ Content-Transfer-Encoding: 8bit X-BeenThere: intel-xe@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel Xe graphics driver List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-xe-bounces@lists.freedesktop.org Sender: "Intel-xe" From: John Harrison Adding lockdep checking to the coredump code showed that there was an existing violation. The dev_coredumpm_timeout() call is used to register the dump with the base coredump subsystem. However, that makes multiple memory allocations, only some of which use the GFP_ flags passed in. So that also needs to be deferred to the worker function where it is safe to allocate with arbitrary flags. In order to not add protoypes for the callback functions, moving the _timeout call also means moving the worker thread function to later in the file. Signed-off-by: John Harrison --- drivers/gpu/drm/xe/xe_devcoredump.c | 63 ++++++++++++++++------------- 1 file changed, 34 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c index f4c77f525819..5d19a4e3d5af 100644 --- a/drivers/gpu/drm/xe/xe_devcoredump.c +++ b/drivers/gpu/drm/xe/xe_devcoredump.c @@ -167,31 +167,6 @@ static void xe_devcoredump_snapshot_free(struct xe_devcoredump_snapshot *ss) ss->vm = NULL; } -static void xe_devcoredump_deferred_snap_work(struct work_struct *work) -{ - struct xe_devcoredump_snapshot *ss = container_of(work, typeof(*ss), work); - struct xe_devcoredump *coredump = container_of(ss, typeof(*coredump), snapshot); - unsigned int fw_ref; - - /* keep going if fw fails as we still want to save the memory and SW data */ - fw_ref = xe_force_wake_get(gt_to_fw(ss->gt), XE_FORCEWAKE_ALL); - if (!xe_force_wake_ref_has_domain(fw_ref, XE_FORCEWAKE_ALL)) - xe_gt_info(ss->gt, "failed to get forcewake for coredump capture\n"); - xe_vm_snapshot_capture_delayed(ss->vm); - xe_guc_exec_queue_snapshot_capture_delayed(ss->ge); - xe_force_wake_put(gt_to_fw(ss->gt), fw_ref); - - /* Calculate devcoredump size */ - ss->read.size = __xe_devcoredump_read(NULL, INT_MAX, coredump); - - ss->read.buffer = kvmalloc(ss->read.size, GFP_USER); - if (!ss->read.buffer) - return; - - __xe_devcoredump_read(ss->read.buffer, ss->read.size, coredump); - xe_devcoredump_snapshot_free(ss); -} - static ssize_t xe_devcoredump_read(char *buffer, loff_t offset, size_t count, void *data, size_t datalen) { @@ -240,6 +215,40 @@ static void xe_devcoredump_free(void *data) "Xe device coredump has been deleted.\n"); } +static void xe_devcoredump_deferred_snap_work(struct work_struct *work) +{ + struct xe_devcoredump_snapshot *ss = container_of(work, typeof(*ss), work); + struct xe_devcoredump *coredump = container_of(ss, typeof(*coredump), snapshot); + unsigned int fw_ref; + + /* + * NB: Despite passing a GFP_ flags parameter here, more allocations are done + * internally using GFP_KERNEL expliictly. Hence this call must be in the worker + * thread and not in the initial capture call. + */ + dev_coredumpm_timeout(gt_to_xe(ss->gt)->drm.dev, THIS_MODULE, coredump, 0, GFP_KERNEL, + xe_devcoredump_read, xe_devcoredump_free, + XE_COREDUMP_TIMEOUT_JIFFIES); + + /* keep going if fw fails as we still want to save the memory and SW data */ + fw_ref = xe_force_wake_get(gt_to_fw(ss->gt), XE_FORCEWAKE_ALL); + if (!xe_force_wake_ref_has_domain(fw_ref, XE_FORCEWAKE_ALL)) + xe_gt_info(ss->gt, "failed to get forcewake for coredump capture\n"); + xe_vm_snapshot_capture_delayed(ss->vm); + xe_guc_exec_queue_snapshot_capture_delayed(ss->ge); + xe_force_wake_put(gt_to_fw(ss->gt), fw_ref); + + /* Calculate devcoredump size */ + ss->read.size = __xe_devcoredump_read(NULL, INT_MAX, coredump); + + ss->read.buffer = kvmalloc(ss->read.size, GFP_USER); + if (!ss->read.buffer) + return; + + __xe_devcoredump_read(ss->read.buffer, ss->read.size, coredump); + xe_devcoredump_snapshot_free(ss); +} + static void devcoredump_snapshot(struct xe_devcoredump *coredump, struct xe_exec_queue *q, struct xe_sched_job *job) @@ -328,10 +337,6 @@ void xe_devcoredump(struct xe_exec_queue *q, struct xe_sched_job *job, const cha drm_info(&xe->drm, "Xe device coredump has been created\n"); drm_info(&xe->drm, "Check your /sys/class/drm/card%d/device/devcoredump/data\n", xe->drm.primary->index); - - dev_coredumpm_timeout(xe->drm.dev, THIS_MODULE, coredump, 0, GFP_KERNEL, - xe_devcoredump_read, xe_devcoredump_free, - XE_COREDUMP_TIMEOUT_JIFFIES); } static void xe_driver_devcoredump_fini(void *arg) -- 2.47.0