From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 43E5CC0218F for ; Thu, 30 Jan 2025 17:05:50 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 0B97910E02D; Thu, 30 Jan 2025 17:05:50 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="jUOHB4ul"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.9]) by gabe.freedesktop.org (Postfix) with ESMTPS id D7EBC10E02D for ; Thu, 30 Jan 2025 17:05:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1738256749; x=1769792749; h=date:message-id:from:to:cc:subject:in-reply-to: references:mime-version; bh=0mxxr9AzZmLLV8LhZGfIixR5BwocaiT9iCTtpOH1Ibk=; b=jUOHB4ulIDqJjwCxypmEJTLwg3acmzjC/nmPRhnDEka1Yv8DCHg2rjUZ v5SE/xsHXTnbSADUE0xOj0jtLxaSNiwUMYuh/1L2nZkJiJDEWY3wDeWJT WzSZt5CyTr+iSQ0ycjSflHzgoH2w5+qIEAfDcHSvhpPD4FuKFpEUrzuWi KyBnSNwIGoXjTdiimD4anQpeQwR+pPffYwPV3Ag4cRd0KKN/1X4mhvbux PgjsNX4LdFtZ54WvG5MNW2M1B8IoO9OfrWatObSzUzcq7g3lu5i3FB3kz 9wpiT4OrKoBKJ33zOZ3TRtxXHw1yy8gYEUIsBS63B9Aqv3O3lToE4prvY g==; X-CSE-ConnectionGUID: 7eJT6f7jRh2LzLyb2kQ9Lw== X-CSE-MsgGUID: IBwy3X3aRiWKnb1a6pBYuw== X-IronPort-AV: E=McAfee;i="6700,10204,11331"; a="49469533" X-IronPort-AV: E=Sophos;i="6.13,245,1732608000"; d="scan'208";a="49469533" Received: from orviesa008.jf.intel.com ([10.64.159.148]) by fmvoesa103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Jan 2025 09:05:10 -0800 X-CSE-ConnectionGUID: J2DLg7PlTbiUix1vygJMtg== X-CSE-MsgGUID: 02cn6eUMREayx516QdbOgQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,224,1728975600"; d="scan'208";a="110282074" Received: from orsosgc001.jf.intel.com (HELO orsosgc001.intel.com) ([10.165.21.142]) by orviesa008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Jan 2025 09:05:10 -0800 Date: Thu, 30 Jan 2025 09:05:10 -0800 Message-ID: <8534h0fd15.wl-ashutosh.dixit@intel.com> From: "Dixit, Ashutosh" To: Harish Chegondi Cc: Subject: Re: [PATCH v8 4/7] drm/xe/eustall: Return -EIO error from read() if HW drops data In-Reply-To: <857c6cgb94.wl-ashutosh.dixit@intel.com> References: <9dc0f747ae215f9704fe1edec2e7007e19f8dd0b.1736970203.git.harish.chegondi@intel.com> <857c6cgb94.wl-ashutosh.dixit@intel.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?ISO-8859-4?Q?Goj=F2?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/28.2 (x86_64-redhat-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-BeenThere: intel-xe@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel Xe graphics driver List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-xe-bounces@lists.freedesktop.org Sender: "Intel-xe" On Wed, 29 Jan 2025 20:45:59 -0800, Dixit, Ashutosh wrote: > > On Wed, 15 Jan 2025 12:02:10 -0800, Harish Chegondi wrote: > > > > If the user space doesn't read the EU stall data fast enough, > > it is possible that the EU stall data buffer can get filled, > > and if the hardware wants to write more data, it simply drops > > data due to unavailable buffer space. In that case, hardware > > sets a bit in a register. If the driver detects data drop, > > the driver read() returns -EIO error to let the user space > > know that HW has dropped data. The -EIO error is returned > > even if there is EU stall data in the buffer. A subsequent > > read by the user space returns the remaining EU stall data. > > As I mentioned earlier, entire dropped packet handling should be in this > patch, so we can see the entire logic around this. So data_drop struct > should be defined in this patch. > > > > > Signed-off-by: Harish Chegondi > > --- > > drivers/gpu/drm/xe/xe_eu_stall.c | 12 ++++++++++++ > > drivers/gpu/drm/xe/xe_eu_stall.h | 1 + > > 2 files changed, 13 insertions(+) > > > > diff --git a/drivers/gpu/drm/xe/xe_eu_stall.c b/drivers/gpu/drm/xe/xe_eu_stall.c > > index c388d733b857..437782f8433c 100644 > > --- a/drivers/gpu/drm/xe/xe_eu_stall.c > > +++ b/drivers/gpu/drm/xe/xe_eu_stall.c > > @@ -472,6 +472,7 @@ xe_eu_stall_stream_read_locked(struct xe_eu_stall_data_stream *stream, > > * before calling read(). > > * > > * Returns: The number of bytes copied or a negative error code on failure. > > + * -EIO if HW drops any EU stall data when the buffer is full. > > */ > > static ssize_t xe_eu_stall_stream_read(struct file *file, char __user *buf, > > size_t count, loff_t *ppos) > > @@ -485,6 +486,16 @@ static ssize_t xe_eu_stall_stream_read(struct file *file, char __user *buf, > > return -EINVAL; > > } > > > > + if (bitmap_weight(stream->data_drop.mask, XE_MAX_DSS_FUSE_BITS)) { > > Since data_drop.mask is being touched elsewhere under xecore_buf->lock, > here also it should be accessed under the same lock. So this returning -EIO > should probably be moved into xe_eu_stall_stream_read_locked? > > > + if (!stream->data_drop.reported_to_user) { > > + stream->data_drop.reported_to_user = true; > > + xe_gt_dbg(gt, "EU stall data dropped in XeCores: %*pb\n", > > + XE_MAX_DSS_FUSE_BITS, stream->data_drop.mask); > > + return -EIO; > > + } > > + stream->data_drop.reported_to_user = false; > > I don't think this logic is correct. We should set this to false only after > we have cleared all set bits (e.g. only after bitmap_weight) otherwise we > might keep returning -EIO multiple times? > > If HW continues to drop data and keep setting the line, while we are > resetting the bit, it is possible bitmap_weight might never become 0. I > think that is ok, we have returned -EIO at least once to indicate to > userspace that it is not reading data fast enough and HW is dropping data. > > Or we may return -EIO multiple times as is happening here, where > reported_to_user is set to 0 before all bits might have been cleared. So > what is happening here might be ok too. > > To see this clearly and evaluate it is why I am saying move all of this > data drop handling and -EIO return into this one patch. So we can decide > which approach to take: return -EIO just once or return multiple times. > > We can also maybe defer this patch and merge the other stuff first if it's > a separate patch. > > So maybe this is ok, maybe not, anyway something to think about. Because we don't want to complicate this, here's another simple idea: In eu_stall_data_buf_check(), when we see the drop bit set for a particular dss, move the read pointer (effectively discarding any data from that dss) and also clear_dropped_eviction_line_bit(), but the dss with dropped data is saved off in data_drop.mask. So basically we are clearing and saving the error condition. Discarding data from that dss might be ok, since HW has already dropped data. Now in the next read_locked, if any data_drop.mask bit is set, return -EIO and clear data_drop.mask. This way we don't need data_drop.reported_to_user. So this way the entire situation is cleared and handled once we have returned -EIO. If more data gets dropped before the next read, same process repeats and -EIO will be returned again. > > > + } > > + > > if (!(file->f_flags & O_NONBLOCK)) { > > do { > > if (!stream->pollin) { > > @@ -680,6 +691,7 @@ static int xe_eu_stall_stream_init(struct xe_eu_stall_data_stream *stream, > > if (!stream->xecore_buf) > > return -ENOMEM; > > > > + stream->data_drop.reported_to_user = false; > > bitmap_zero(stream->data_drop.mask, XE_MAX_DSS_FUSE_BITS); > > Stream is kzalloc'd, why do you need to init these? > > > > > xe_pm_runtime_get(gt_to_xe(gt)); > > diff --git a/drivers/gpu/drm/xe/xe_eu_stall.h b/drivers/gpu/drm/xe/xe_eu_stall.h > > index f97c8bf8e852..8bc44e9e98af 100644 > > --- a/drivers/gpu/drm/xe/xe_eu_stall.h > > +++ b/drivers/gpu/drm/xe/xe_eu_stall.h > > @@ -31,6 +31,7 @@ struct xe_eu_stall_data_stream { > > struct xe_bo *bo; > > struct per_xecore_buf *xecore_buf; > > struct { > > + bool reported_to_user; > > xe_dss_mask_t mask; > > } data_drop; > > struct hrtimer poll_check_timer; > > -- > > 2.47.1 > >