From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 2D8D2F36B9A for ; Fri, 10 Apr 2026 01:37:36 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id E395910E88D; Fri, 10 Apr 2026 01:37:35 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="be4bUIQ1"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) by gabe.freedesktop.org (Postfix) with ESMTPS id 0DC8710E88D for ; Fri, 10 Apr 2026 01:37:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1775785054; x=1807321054; h=date:message-id:from:to:cc:subject:in-reply-to: references:mime-version; bh=gZ8f1aLlXHOTDWSJnkpidRohUYSQCEdeJE//e5vz46Y=; b=be4bUIQ1DMicFClVfrm49xfdz8iQgKJgZXOmJwiU52DDEhpnCG0eDtwk JeBktj9n5CCKgDEJrpBIeOvlMtRbNkO9rCZPLK79CKuJDKPL7tLXaa3VE 8ahUwvsZ14cmB5W8IKtx60HRCGKTV+cjz9D55kxaA2kMPsfSm0w4E3jgb 2Ppc3CJ+n7D5fQyrpZsK0T9gcRVr1aLrbahCR7SXVEmSUJvhTpmRY+7wO oKmToc1yEzAEVinFjyYM0uNnddmgqkhsxt8B6Www0YHqZu2Icqa2WPn9j /TshF8JwqH+k49bSAjpkJAZ7av5Mb8KfmtP0J1EreJqr962MLg2ehC8Fi g==; X-CSE-ConnectionGUID: 0zHmaifeTxOumMU88dAS4w== X-CSE-MsgGUID: /KIeif2mQuG3QwI3SxAMlg== X-IronPort-AV: E=McAfee;i="6800,10657,11754"; a="75971488" X-IronPort-AV: E=Sophos;i="6.23,170,1770624000"; d="scan'208";a="75971488" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by fmvoesa112.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Apr 2026 18:37:33 -0700 X-CSE-ConnectionGUID: OXb38DFwTMWQTRHGPmymOQ== X-CSE-MsgGUID: iM0pcmawTvWYLf7MnWuKGQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,170,1770624000"; d="scan'208";a="259421236" Received: from cmaltby-mobl.amr.corp.intel.com (HELO adixit-MOBL3.intel.com) ([10.125.66.140]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Apr 2026 18:37:32 -0700 Date: Thu, 09 Apr 2026 18:37:31 -0700 Message-ID: <87a4vb8ujo.wl-ashutosh.dixit@intel.com> From: "Dixit, Ashutosh" To: Harish Chegondi Cc: , , , , Subject: Re: [PATCH v4 1/1] drm/xe/eustall: Return ENODEV from read if EU stall registers get reset In-Reply-To: <4360d082795f1e16e338de7f253926b1680e0beb.1775023744.git.harish.chegondi@intel.com> References: <4360d082795f1e16e338de7f253926b1680e0beb.1775023744.git.harish.chegondi@intel.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?ISO-8859-4?Q?Goj=F2?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/30.2 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-BeenThere: intel-xe@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel Xe graphics driver List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-xe-bounces@lists.freedesktop.org Sender: "Intel-xe" On Tue, 31 Mar 2026 23:15:44 -0700, Harish Chegondi wrote: > > If a reset (GT or engine) happens during EU stall data sampling, all the > EU stall registers can get reset to 0. This will result in EU stall data > buffers' read and write pointer register values to be out of sync with > the cached values. This will result in read() returning invalid data. To > prevent this, check the value of a EU stall base register. If it is zero, > it indicates a reset may have happened that wiped the register to zero. > If this happens, return ENODEV from read() upon which the user space > should disable and enable EU stall data sampling or close the fd and > open a new fd for a new EU stall data collection session. > > This patch has been tested by running two IGT tests simultaneously > xe_eu_stall and xe_exec_reset. > > Signed-off-by: Harish Chegondi > --- > v2: Move base register check from read to the poll function > v3: Don't reschedule work item after reset detected (Ashutosh) > v4: Return ENODEV as errno instead of EBADFD > Reset reset_detected in enable() > > drivers/gpu/drm/xe/xe_eu_stall.c | 39 +++++++++++++++++++++++++------- > 1 file changed, 31 insertions(+), 8 deletions(-) > > diff --git a/drivers/gpu/drm/xe/xe_eu_stall.c b/drivers/gpu/drm/xe/xe_eu_stall.c > index c34408cfd292..8863d8ebd5d1 100644 > --- a/drivers/gpu/drm/xe/xe_eu_stall.c > +++ b/drivers/gpu/drm/xe/xe_eu_stall.c > @@ -44,6 +44,7 @@ struct per_xecore_buf { > struct xe_eu_stall_data_stream { > bool pollin; > bool enabled; > + bool reset_detected; > int wait_num_reports; > int sampling_rate_mult; > wait_queue_head_t poll_wq; > @@ -428,9 +429,20 @@ static bool eu_stall_data_buf_poll(struct xe_eu_stall_data_stream *stream) > set_bit(xecore, stream->data_drop.mask); > xecore_buf->write = write_ptr; > } > + /* If a GT or engine reset happens during EU stall sampling, > + * all EU stall registers get reset to 0 and the cached values of > + * the EU stall data buffers' read pointers are out of sync with > + * the register values. This causes invalid data to be returned > + * from read(). To prevent this, check the value of a EU stall base > + * register. If it is zero, there has been a reset. > + */ > + if (unlikely(!xe_gt_mcr_unicast_read_any(gt, XEHPC_EUSTALL_BASE))) > + stream->reset_detected = true; > + > + stream->pollin = min_data_present || stream->reset_detected; > mutex_unlock(&stream->xecore_buf_lock); > > - return min_data_present; > + return stream->pollin; > } > > static void clear_dropped_eviction_line_bit(struct xe_gt *gt, u16 group, u16 instance) > @@ -554,6 +566,15 @@ static ssize_t xe_eu_stall_stream_read_locked(struct xe_eu_stall_data_stream *st > } > stream->data_drop.reported_to_user = false; > } > + /* If EU stall registers got reset due to a GT/engine reset, > + * continuing with the read() will return invalid data to > + * the user space. Just return -ENODEV instead. > + */ > + if (unlikely(stream->reset_detected)) { > + xe_gt_dbg(gt, "EU stall base register has been reset\n"); > + mutex_unlock(&stream->xecore_buf_lock); > + return -ENODEV; > + } Likely OK, but could you quickly check if this new if statement should be moved above the previous 'if (bitmap_weight...)' if statement. Otherwise, this is now: Reviewed-by: Ashutosh Dixit > > for_each_dss_steering(xecore, gt, group, instance) { > ret = xe_eu_stall_data_buf_read(stream, buf, count, &total_size, > @@ -609,7 +630,8 @@ static ssize_t xe_eu_stall_stream_read(struct file *file, char __user *buf, > * We don't want to block the next read() when there is data in the buffer > * now, but couldn't be accommodated in the small user buffer. > */ > - stream->pollin = false; > + if (!stream->reset_detected) > + stream->pollin = false; > > return ret; > } > @@ -692,6 +714,7 @@ static int xe_eu_stall_stream_enable(struct xe_eu_stall_data_stream *stream) > xecore_buf->write = write_ptr; > xecore_buf->read = write_ptr; > } > + stream->reset_detected = false; > stream->data_drop.reported_to_user = false; > bitmap_zero(stream->data_drop.mask, XE_MAX_DSS_FUSE_BITS); > > @@ -717,13 +740,13 @@ static void eu_stall_data_buf_poll_work_fn(struct work_struct *work) > container_of(work, typeof(*stream), buf_poll_work.work); > struct xe_gt *gt = stream->gt; > > - if (eu_stall_data_buf_poll(stream)) { > - stream->pollin = true; > + if (eu_stall_data_buf_poll(stream)) > wake_up(&stream->poll_wq); > - } > - queue_delayed_work(gt->eu_stall->buf_ptr_poll_wq, > - &stream->buf_poll_work, > - msecs_to_jiffies(POLL_PERIOD_MS)); > + > + if (!stream->reset_detected) > + queue_delayed_work(gt->eu_stall->buf_ptr_poll_wq, > + &stream->buf_poll_work, > + msecs_to_jiffies(POLL_PERIOD_MS)); > } > > static int xe_eu_stall_stream_init(struct xe_eu_stall_data_stream *stream, > -- > 2.43.0 >