From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 4FCE8C46CD2 for ; Tue, 2 Jan 2024 15:31:03 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id E601210E0D5; Tue, 2 Jan 2024 15:31:02 +0000 (UTC) Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id 567D910E0D5 for ; Tue, 2 Jan 2024 15:31:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1704209462; x=1735745462; h=from:to:cc:subject:in-reply-to:references:date: message-id:mime-version; bh=XrzRX4qCF5PQJKTgfSwo5ckULXIn0GswpFb93AVai/k=; b=C3bjzvtyUWKcwx00lLfpHe2aYgWrPyOysp71Un7o+UUxRtrSIjCXTpez Heg/uZjFvwyfuywMm9/n0sogqGMP5kM1Fltmhwqrohy7+UepExr0e5vOv s+dKVi5WYx1mvxvp4pGDefuMBkNF4rmws41Vfveo3Gz4m5ww6mLEc8fyx AqPJxfP2ij8rfLfIaXoxCf097jM0XCM7tZiwU10nDjSXkU5QR021MNDdD S75KCOx9MjKpkGt4Psamjs7mqspEU+GTdGQcfOPO6BT8uYocfjC3HAM+v 2ETnqRSrKLkOB2EBvEYEifSfCyQFR7Ao4/YWRjQt5LNaXLnO7br0ONAQ3 w==; X-IronPort-AV: E=McAfee;i="6600,9927,10941"; a="461194541" X-IronPort-AV: E=Sophos;i="6.04,325,1695711600"; d="scan'208";a="461194541" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Jan 2024 07:31:01 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10941"; a="845612785" X-IronPort-AV: E=Sophos;i="6.04,325,1695711600"; d="scan'208";a="845612785" Received: from mkuoppal-desk.fi.intel.com (HELO localhost) ([10.237.72.193]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Jan 2024 07:31:00 -0800 From: Mika Kuoppala To: "Poosa, Karthik" , "Gupta, Anshuman" , "intel-xe@lists.freedesktop.org" Subject: Re: [PATCH] drm/xe/debugfs: Add support for gt synchronous force reset In-Reply-To: <5f11046a-81c1-4984-bc60-ce904ff605ca@intel.com> References: <20231231144733.568481-1-karthik.poosa@intel.com> <5f11046a-81c1-4984-bc60-ce904ff605ca@intel.com> Date: Tue, 02 Jan 2024 17:26:32 +0200 Message-ID: <878r574uzb.fsf@mkuoppal-desk> MIME-Version: 1.0 Content-Type: text/plain X-BeenThere: intel-xe@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel Xe graphics driver List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Vivi, Rodrigo" Errors-To: intel-xe-bounces@lists.freedesktop.org Sender: "Intel-xe" "Poosa, Karthik" writes: > Hi Anshuman, > > Please find my comments inline below. > > Thanks, > > Karthik. > > On 01-01-2024 20:07, Gupta, Anshuman wrote: >> >>> -----Original Message----- >>> From: Poosa, Karthik >>> Sent: Sunday, December 31, 2023 8:18 PM >>> To: intel-xe@lists.freedesktop.org >>> Cc: Gupta, Anshuman ; Nilawar, Badal >>> ; Brost, Matthew ; >>> Vivi, Rodrigo ; Poosa, Karthik >>> >>> Subject: [PATCH] drm/xe/debugfs: Add support for gt synchronous force reset >>> >>> This support added as igt test freq_reset_multiple fails sporadically in case >>> xe_guc_pc is not started. >>> A completion is added in force_reset for this. >>> Writing non-zero to force_reset debugfs entry does synchronous reset. >> We need IGT patch as well for this patch. >>> v2: >>> - Changed wait for completion to interruptible (Anshuman). >>> - Moved timeout to xe_gt.h (Anshuman). >>> - Created a debugfs for updating timeout (Rodrigo). >>> >>> v3: >>> - Update force_reset debugfs with write support. >>> value 0 -> async reset, non-zero -> sync reset (Matthew Brost). >>> >>> Signed-off-by: Karthik Poosa >>> --- >>> drivers/gpu/drm/xe/xe_gt.c | 2 + >>> drivers/gpu/drm/xe/xe_gt_debugfs.c | 72 >>> ++++++++++++++++++++++++++++-- drivers/gpu/drm/xe/xe_gt_debugfs.h >>> | 3 ++ >>> drivers/gpu/drm/xe/xe_gt_types.h | 8 ++++ >>> 4 files changed, 82 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c index >>> 3af2adec1295..c8d2c18b0c62 100644 >>> --- a/drivers/gpu/drm/xe/xe_gt.c >>> +++ b/drivers/gpu/drm/xe/xe_gt.c >>> @@ -65,6 +65,7 @@ struct xe_gt *xe_gt_alloc(struct xe_tile *tile) >>> >>> gt->tile = tile; >>> gt->ordered_wq = alloc_ordered_workqueue("gt-ordered-wq", 0); >>> + init_completion(>->reset.done); >>> >>> return gt; >>> } >>> @@ -633,6 +634,7 @@ static int gt_reset(struct xe_gt *gt) >>> xe_device_mem_access_put(gt_to_xe(gt)); >>> XE_WARN_ON(err); >>> >>> + complete(>->reset.done); >>> xe_gt_info(gt, "reset done\n"); >>> >>> return 0; >>> diff --git a/drivers/gpu/drm/xe/xe_gt_debugfs.c >>> b/drivers/gpu/drm/xe/xe_gt_debugfs.c >>> index c4b67cf09f8f..3518ab89ccfc 100644 >>> --- a/drivers/gpu/drm/xe/xe_gt_debugfs.c >>> +++ b/drivers/gpu/drm/xe/xe_gt_debugfs.c >>> @@ -55,15 +55,71 @@ static int hw_engines(struct seq_file *m, void *data) >>> return 0; >>> } >>> >>> -static int force_reset(struct seq_file *m, void *data) >>> +static int force_reset(struct xe_gt *gt, bool is_sync) >>> { >>> - struct xe_gt *gt = node_to_gt(m->private); >>> + struct xe_device *xe = gt_to_xe(gt); >>> + long ret; >>> >>> xe_gt_reset_async(gt); >>> >>> + if (is_sync) { >>> + drm_info(&xe->drm, "waiting for gt reset completion"); >>> + ret = wait_for_completion_interruptible_timeout(>- >>>> reset.done, >>> + msecs_to_jiffies(gt- >>>> reset.timeout_ms)); >>> + if (ret <= 0) { >>> + drm_err(&xe->drm, "gt reset timed out/interrputed, >>> ret %ld\n", ret); >>> + return -ETIMEDOUT; >>> + } >>> + } >>> + >>> + return 0; >>> +} >>> + >>> +ssize_t force_reset_write(struct file *f, const char __user *buf, >>> +size_t len, loff_t *offset) { >>> + struct xe_gt *gt = file_inode(f)->i_private; >>> + struct xe_device *xe = gt_to_xe(gt); >>> + char arg_str[5]; >>> + size_t size; >>> + unsigned long arg; >>> + int ret; >>> + >>> + size = min(sizeof(arg_str)-1, len); >>> + arg = copy_from_user((void *)arg_str, buf, size); >>> + if (arg < 0) { >>> + drm_err(&xe->drm, "reading force_reset arg failed, ret %ld", >>> arg); >>> + return arg; >>> + } >>> + arg_str[size] = '\0'; >>> + >>> + ret = kstrtoul(arg_str, 0, &arg); >>> + if (ret) { >>> + drm_err(&xe->drm, "force_reset arg parse failed, ret %d", ret); >>> + return ret; >>> + } >>> + >>> + ret = force_reset(gt, (arg && 1)); >> How about !!arg > && will use a single instruction. > > but !! will happen with 2 instructions, so I think && would be better. This is not in a hotpath, more readable is better. -Mika > >> > + if (ret != 0) { >>> + drm_err(&xe->drm, "force_reset failed, ret %d\n", ret); >>> + return ret; >>> + } >>> + return len; >>> +} >>> + >>> +ssize_t force_reset_read(struct file *f, char __user *buf, size_t len, >>> +loff_t *offset) { >>> + struct xe_gt *gt = file_inode(f)->i_private; >>> + >>> + force_reset(gt, 0); >>> return 0; >>> } >>> >>> +static const struct file_operations force_reset_fops = { >>> + .owner = THIS_MODULE, >>> + .write = force_reset_write, >>> + .read = force_reset_read, >>> +}; >>> + >>> static int sa_info(struct seq_file *m, void *data) { >>> struct xe_tile *tile = gt_to_tile(node_to_gt(m->private)); >>> @@ -192,7 +248,6 @@ static int vecs_default_lrc(struct seq_file *m, void >>> *data) >>> >>> static const struct drm_info_list debugfs_list[] = { >>> {"hw_engines", hw_engines, 0}, >>> - {"force_reset", force_reset, 0}, >>> {"sa_info", sa_info, 0}, >>> {"topology", topology, 0}, >>> {"steering", steering, 0}, >>> @@ -245,5 +300,16 @@ void xe_gt_debugfs_register(struct xe_gt *gt) >>> ARRAY_SIZE(debugfs_list), >>> root, minor); >>> >>> + gt->reset.fr_dentry = debugfs_create_file("force_reset", 0600, root, >>> xe, >>> + &force_reset_fops); >> Please use DEFINE_SHOW_STORE_ATTRIBUTE here to create rw debugfs entry. >> Use force_reset_show and force_reset_restore function name. >>> + if (!gt->reset.fr_dentry) >>> + drm_warn(&xe->drm, "force_reset debugfs file creation >>> failed"); >>> + d_inode(gt->reset.fr_dentry)->i_private = gt; >>> + >>> + debugfs_create_u32("gt_reset_timeout_ms", 0600, root, >>> + >->reset.timeout_ms); >>> + /* set a default timeout value */ >>> + gt->reset.timeout_ms = 1000; >> xe_gt_debugfs_register() should only do the job of debugfs registration. >> This is odd here. > > Shall I move this to xe_gt_alloc (xe_gt.c), where gt is allocated and > initialized. > >>> + >>> xe_uc_debugfs_register(>->uc, root); } diff --git >>> a/drivers/gpu/drm/xe/xe_gt_debugfs.h >>> b/drivers/gpu/drm/xe/xe_gt_debugfs.h >>> index 5a329f118a57..40adcf1822c6 100644 >>> --- a/drivers/gpu/drm/xe/xe_gt_debugfs.h >>> +++ b/drivers/gpu/drm/xe/xe_gt_debugfs.h >>> @@ -6,8 +6,11 @@ >>> #ifndef _XE_GT_DEBUGFS_H_ >>> #define _XE_GT_DEBUGFS_H_ >>> >>> +#include >>> struct xe_gt; >>> >>> void xe_gt_debugfs_register(struct xe_gt *gt); >>> +ssize_t force_reset_write(struct file *f, const char __user *buf, >>> +size_t len, loff_t *offset); ssize_t force_reset_read(struct file *f, >>> +char __user *buf, size_t len, loff_t *offset); >>> >>> #endif >>> diff --git a/drivers/gpu/drm/xe/xe_gt_types.h >>> b/drivers/gpu/drm/xe/xe_gt_types.h >>> index f74684660475..d6134215094b 100644 >>> --- a/drivers/gpu/drm/xe/xe_gt_types.h >>> +++ b/drivers/gpu/drm/xe/xe_gt_types.h >>> @@ -148,6 +148,14 @@ struct xe_gt { >>> * code to safely flush all code paths >>> */ >>> struct work_struct worker; >>> + /** @done : completion for GT reset */ >>> + struct completion done; >>> + /** @timeout_ms : gt synchronous reset timeout in ms */ >>> + u32 timeout_ms; >>> + /** @dentry: dentry for force_reset */ >>> + struct dentry *fr_dentry; >>> + /** @is_sync_reset : gt reset is synchronous */ >>> + //bool is_sync_reset; >> Remove thus unused var. >> >> Thanks, >> Anshuman Gupta. >>> } reset; >>> >>> /** @tlb_invalidation: TLB invalidation state */ >>> -- >>> 2.25.1