From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B5D62C0218F for ; Thu, 30 Jan 2025 04:46:02 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 4D61610E13B; Thu, 30 Jan 2025 04:46:02 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="cx2EcESW"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) by gabe.freedesktop.org (Postfix) with ESMTPS id 8164210E13B for ; Thu, 30 Jan 2025 04:46:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1738212360; x=1769748360; h=date:message-id:from:to:cc:subject:in-reply-to: references:mime-version; bh=VttJv2iwnrjwVOOGbphJZq0TYo+cPz3zOUMO4olyhBU=; b=cx2EcESWXTc+ns7cYd2B/sqQj2WAKCb7SdCTQ8pQbdfdx2VxQPiw+egR shamMgsXvVHQdMkbt22Ojkj9BuO+z/vxU6aeNc2lPPjH6tXsaY4/KGEWw gwgSTigyE+MP63eScplX+jwHygkaIhXLSXyei4JIqbJIlIK+1sUQ4YvCz vIrfG0KecqA37+zgPvqH7RtHuYwD6zPFza1jxyCROZuy7jdR/Vq9QWEul Jx3KzvIKXRuwWXk2XWW/S6QvSueJxKYOh0fk/mS5jPqBP1J54FlZE6Xhd xX6Ri70ciIJJdQ+5J8ev73eCUagWHTXqVvPzMvZB15abopPR726dF2rQD w==; X-CSE-ConnectionGUID: 6iOnUkXbQUC6cxhZr/P0MQ== X-CSE-MsgGUID: U+SKdR4jTDuYH2A+lPmhxw== X-IronPort-AV: E=McAfee;i="6700,10204,11330"; a="61219985" X-IronPort-AV: E=Sophos;i="6.13,244,1732608000"; d="scan'208";a="61219985" Received: from orviesa010.jf.intel.com ([10.64.159.150]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Jan 2025 20:46:00 -0800 X-CSE-ConnectionGUID: nTiIS3lKTu+2QlMS4/qI9w== X-CSE-MsgGUID: MwFbiIvsTmWqjXFV5exP6Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.12,224,1728975600"; d="scan'208";a="109080348" Received: from orsosgc001.jf.intel.com (HELO orsosgc001.intel.com) ([10.165.21.142]) by orviesa010-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Jan 2025 20:46:00 -0800 Date: Wed, 29 Jan 2025 20:45:59 -0800 Message-ID: <857c6cgb94.wl-ashutosh.dixit@intel.com> From: "Dixit, Ashutosh" To: Harish Chegondi Cc: Subject: Re: [PATCH v8 4/7] drm/xe/eustall: Return -EIO error from read() if HW drops data In-Reply-To: <9dc0f747ae215f9704fe1edec2e7007e19f8dd0b.1736970203.git.harish.chegondi@intel.com> References: <9dc0f747ae215f9704fe1edec2e7007e19f8dd0b.1736970203.git.harish.chegondi@intel.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?ISO-8859-4?Q?Goj=F2?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/28.2 (x86_64-redhat-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-BeenThere: intel-xe@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel Xe graphics driver List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-xe-bounces@lists.freedesktop.org Sender: "Intel-xe" On Wed, 15 Jan 2025 12:02:10 -0800, Harish Chegondi wrote: > > If the user space doesn't read the EU stall data fast enough, > it is possible that the EU stall data buffer can get filled, > and if the hardware wants to write more data, it simply drops > data due to unavailable buffer space. In that case, hardware > sets a bit in a register. If the driver detects data drop, > the driver read() returns -EIO error to let the user space > know that HW has dropped data. The -EIO error is returned > even if there is EU stall data in the buffer. A subsequent > read by the user space returns the remaining EU stall data. As I mentioned earlier, entire dropped packet handling should be in this patch, so we can see the entire logic around this. So data_drop struct should be defined in this patch. > > Signed-off-by: Harish Chegondi > --- > drivers/gpu/drm/xe/xe_eu_stall.c | 12 ++++++++++++ > drivers/gpu/drm/xe/xe_eu_stall.h | 1 + > 2 files changed, 13 insertions(+) > > diff --git a/drivers/gpu/drm/xe/xe_eu_stall.c b/drivers/gpu/drm/xe/xe_eu_stall.c > index c388d733b857..437782f8433c 100644 > --- a/drivers/gpu/drm/xe/xe_eu_stall.c > +++ b/drivers/gpu/drm/xe/xe_eu_stall.c > @@ -472,6 +472,7 @@ xe_eu_stall_stream_read_locked(struct xe_eu_stall_data_stream *stream, > * before calling read(). > * > * Returns: The number of bytes copied or a negative error code on failure. > + * -EIO if HW drops any EU stall data when the buffer is full. > */ > static ssize_t xe_eu_stall_stream_read(struct file *file, char __user *buf, > size_t count, loff_t *ppos) > @@ -485,6 +486,16 @@ static ssize_t xe_eu_stall_stream_read(struct file *file, char __user *buf, > return -EINVAL; > } > > + if (bitmap_weight(stream->data_drop.mask, XE_MAX_DSS_FUSE_BITS)) { Since data_drop.mask is being touched elsewhere under xecore_buf->lock, here also it should be accessed under the same lock. So this returning -EIO should probably be moved into xe_eu_stall_stream_read_locked? > + if (!stream->data_drop.reported_to_user) { > + stream->data_drop.reported_to_user = true; > + xe_gt_dbg(gt, "EU stall data dropped in XeCores: %*pb\n", > + XE_MAX_DSS_FUSE_BITS, stream->data_drop.mask); > + return -EIO; > + } > + stream->data_drop.reported_to_user = false; I don't think this logic is correct. We should set this to false only after we have cleared all set bits (e.g. only after bitmap_weight) otherwise we might keep returning -EIO multiple times? If HW continues to drop data and keep setting the line, while we are resetting the bit, it is possible bitmap_weight might never become 0. I think that is ok, we have returned -EIO at least once to indicate to userspace that it is not reading data fast enough and HW is dropping data. Or we may return -EIO multiple times as is happening here, where reported_to_user is set to 0 before all bits might have been cleared. So what is happening here might be ok too. To see this clearly and evaluate it is why I am saying move all of this data drop handling and -EIO return into this one patch. So we can decide which approach to take: return -EIO just once or return multiple times. We can also maybe defer this patch and merge the other stuff first if it's a separate patch. So maybe this is ok, maybe not, anyway something to think about. > + } > + > if (!(file->f_flags & O_NONBLOCK)) { > do { > if (!stream->pollin) { > @@ -680,6 +691,7 @@ static int xe_eu_stall_stream_init(struct xe_eu_stall_data_stream *stream, > if (!stream->xecore_buf) > return -ENOMEM; > > + stream->data_drop.reported_to_user = false; > bitmap_zero(stream->data_drop.mask, XE_MAX_DSS_FUSE_BITS); Stream is kzalloc'd, why do you need to init these? > > xe_pm_runtime_get(gt_to_xe(gt)); > diff --git a/drivers/gpu/drm/xe/xe_eu_stall.h b/drivers/gpu/drm/xe/xe_eu_stall.h > index f97c8bf8e852..8bc44e9e98af 100644 > --- a/drivers/gpu/drm/xe/xe_eu_stall.h > +++ b/drivers/gpu/drm/xe/xe_eu_stall.h > @@ -31,6 +31,7 @@ struct xe_eu_stall_data_stream { > struct xe_bo *bo; > struct per_xecore_buf *xecore_buf; > struct { > + bool reported_to_user; > xe_dss_mask_t mask; > } data_drop; > struct hrtimer poll_check_timer; > -- > 2.47.1 >