From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 162C7C54E66 for ; Thu, 14 Mar 2024 00:51:32 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id BEA4010E959; Thu, 14 Mar 2024 00:51:31 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="LRT+NZau"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.14]) by gabe.freedesktop.org (Postfix) with ESMTPS id EE18810E959 for ; Thu, 14 Mar 2024 00:51:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1710377491; x=1741913491; h=date:message-id:from:to:cc:subject:in-reply-to: references:mime-version; bh=dxy61VRIeQ31R6ujmP2Lk44vkotmwKr1Dj+ruEm0aTE=; b=LRT+NZaugY7KT8uvYn0vCfkegDHQFIU7k9BFq0IJ5urLmZqIzGdSJImJ Q1HwODGFR1KvkAdKuUojYwMcRzN9zCEQng902VEQmSeKfMPkKuy4GvCZK SHO6fJbKmSD1u0+R7lplG5WpJGZiSU+aki1BbRg2CZYLLQqhtz+tS+lB0 KMKVUPmpR6lyidnX66UcM31cLnHYUhosBDd9vcaz9i7O0Rd3+vQl+uy3X ggHVLaSWJ+ZYGi1YbYLv8iSrGjs6fYFjm/X0N8Y6wJrXpAdc+wWx8pxXf OfSInJX1VG4K03FzIkVYyv+ltVh3sQKehaWeguyUHdQvzXPlpp6QgwhA/ A==; X-IronPort-AV: E=McAfee;i="6600,9927,11012"; a="5368852" X-IronPort-AV: E=Sophos;i="6.07,124,1708416000"; d="scan'208";a="5368852" Received: from fmviesa003.fm.intel.com ([10.60.135.143]) by fmvoesa108.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Mar 2024 17:48:18 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.07,124,1708416000"; d="scan'208";a="16711176" Received: from orsosgc001.jf.intel.com (HELO unerlige-ril.intel.com) ([10.165.21.138]) by fmviesa003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 13 Mar 2024 17:48:18 -0700 Date: Wed, 13 Mar 2024 17:48:17 -0700 Message-ID: <85wmq563la.wl-ashutosh.dixit@intel.com> From: "Dixit, Ashutosh" To: Umesh Nerlige Ramappa Cc: intel-xe@lists.freedesktop.org Subject: Re: [PATCH 17/17] drm/xe/oa: Enable Xe2+ overrun mode In-Reply-To: References: <20240312034003.2747815-1-ashutosh.dixit@intel.com> <20240312034003.2747815-2-ashutosh.dixit@intel.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?ISO-8859-4?Q?Goj=F2?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/28.2 (x86_64-redhat-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-BeenThere: intel-xe@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel Xe graphics driver List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-xe-bounces@lists.freedesktop.org Sender: "Intel-xe" On Tue, 12 Mar 2024 13:14:14 -0700, Umesh Nerlige Ramappa wrote: > Hi Umesh, > On Mon, Mar 11, 2024 at 08:40:03PM -0700, Ashutosh Dixit wrote: > > Enable Xe2+ overrun mode. For Xe2+, when overrun mode is enabled, there are > > no partial reports at the end of buffer, making the OA buffer effectively a > > non-power-of-2 size circular buffer whose size, circ_size, is a multiple of > > the report size. > > > > Signed-off-by: Ashutosh Dixit > > --- > > drivers/gpu/drm/xe/xe_oa.c | 36 +++++++++++++++++++++++++------- > > drivers/gpu/drm/xe/xe_oa_types.h | 3 +++ > > 2 files changed, 31 insertions(+), 8 deletions(-) > > > > diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c > > index 6f5bbb0787d9..6a0d2e229254 100644 > > --- a/drivers/gpu/drm/xe/xe_oa.c > > +++ b/drivers/gpu/drm/xe/xe_oa.c > > @@ -106,7 +106,15 @@ static const struct xe_oa_format oa_formats[] = { > > > > static u32 xe_oa_circ_diff(struct xe_oa_stream *stream, u32 tail, u32 head) > > { > > - return (tail - head) & (XE_OA_BUFFER_SIZE - 1); > > + if (stream->oa_buffer.circ_size == XE_OA_BUFFER_SIZE) > > + return (tail - head) & (XE_OA_BUFFER_SIZE - 1); > > + else > > + return (tail - head) % stream->oa_buffer.circ_size; > > +} > > For ex: consider a 16 MB buffer with a report size of 384 bytes. At the end > of the buffer, you would have an empty space of 256 bytes (16 MB % 384) > > (For ref: 16 MB = 0x1000000, 384 = 0x180) > In this case circ_size = 0xFFFF00 > > Let's say your head is pointing to 0xFFFD80 and tail is pointing to 0x180 > (essentially there is one unread report at the end of the buffer and one > unread report at the beginning of the buffer). > > In this case, (tail - head) % stream->oa_buffer.circ_size, is not > calculating the correct size. Should be 0x300, but I am not getting > that. Can you please check/verify? > > I am thinking we need something like this (roughly). We don't need the mod > operation. First of all, thank you so much for checking, and catching, this huge bug. I did some digging into it and it is to do with how the % operator behaves with -ve numbers, as well as with u32 vs. s32 data types. And yes we should not use the % operation. If this bug had got merged, I can only imagine what a nightmare it would be to find and fix it, so it's great it is caught in the code review. I missed this completely because IGT tests with Xe2 overrun mode enabled were passing in spite of this bug! Specially igt@non-zero-reason would hang previously, because we wouldn't find any reports because we would be misaligned, due to the empty space at the end of the OA buffer. This test was also passing in spite of this bug, I am still trying to figure out how it could still pass. In any case, I have sent out the latest version of just this patch (rather than the entire series) with this issue hopefully fixed. Please take a look. > > diff --git a/drivers/gpu/drm/xe/xe_oa_types.h b/drivers/gpu/drm/xe/xe_oa_types.h > > index 6984e7d04be5..d8d5c9d8c22e 100644 > > --- a/drivers/gpu/drm/xe/xe_oa_types.h > > +++ b/drivers/gpu/drm/xe/xe_oa_types.h > > @@ -163,6 +163,9 @@ struct xe_oa_buffer { > > > > /** @tail: The last verified cached tail where HW has completed writing */ > > u32 tail; > > + > > + /** @circ_size: The effective circular buffer size, for Xe2+ */ > > + u32 circ_size; > > You could store the difference here instead. > > /** @empty_space: empty space at tend of buffer */ > u32 empty_space; I have left circ_size here as it was in the previous version, since I think it better indicates that we still have a non-power-of-2 sized circular buffer even in this Xe2+ overrun case. Thanks. -- Ashutosh