From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 47499C2BB1D for ; Tue, 14 Apr 2020 19:37:38 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 214A0206E9 for ; Tue, 14 Apr 2020 19:37:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 214A0206E9 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id BF17C6E144; Tue, 14 Apr 2020 19:37:37 +0000 (UTC) Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by gabe.freedesktop.org (Postfix) with ESMTPS id 4F4086E144 for ; Tue, 14 Apr 2020 19:37:36 +0000 (UTC) IronPort-SDR: y7iNsH00njPCEhutZuipngslifoPlPQlljH9FMAeZc0kg2zq2jvpJe9GGYdjEwAzsZRzhwomAT v8ds3K6oNrug== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Apr 2020 12:37:35 -0700 IronPort-SDR: 8gwO2e2vBACAOwfkof1Rwnm8sjZfBhNub15mUFkzyQDdZrAPwrXsB4AlLY1EbODbzGL6/byFha IiBp4gMw5/nA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,384,1580803200"; d="scan'208";a="454669775" Received: from adixit-mobl.amr.corp.intel.com (HELO adixit-arch.intel.com) ([10.212.39.201]) by fmsmga006.fm.intel.com with ESMTP; 14 Apr 2020 12:37:30 -0700 Date: Tue, 14 Apr 2020 12:37:25 -0700 Message-ID: <87tv1l505m.wl-ashutosh.dixit@intel.com> From: "Dixit, Ashutosh" To: Umesh Nerlige Ramappa In-Reply-To: <87v9m151ft.wl-ashutosh.dixit@intel.com> References: <20200413154822.11620-1-umesh.nerlige.ramappa@intel.com> <20200413154822.11620-2-umesh.nerlige.ramappa@intel.com> <87y2qy4psw.wl-ashutosh.dixit@intel.com> <87wo6i4kqd.wl-ashutosh.dixit@intel.com> <20200414165948.GD61072@orsosgc001.amr.corp.intel.com> <87v9m151ft.wl-ashutosh.dixit@intel.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM/1.14.9 (=?ISO-8859-4?Q?Goj=F2?=) APEL/10.8 EasyPG/1.0.0 Emacs/26 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Subject: Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for blocking perf OA reads X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: intel-gfx@lists.freedesktop.org Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" On Tue, 14 Apr 2020 12:09:42 -0700, Dixit, Ashutosh wrote: > > On Tue, 14 Apr 2020 09:59:48 -0700, Umesh Nerlige Ramappa wrote: > > On Mon, Apr 13, 2020 at 11:58:18PM -0700, Dixit, Ashutosh wrote: > > > On Mon, 13 Apr 2020 22:08:47 -0700, Dixit, Ashutosh wrote: > > >> > > >> On Mon, 13 Apr 2020 08:48:20 -0700, Umesh Nerlige Ramappa wrote: > > >> > > > >> > A condition in wait_event_interruptible seems to be checked twice before > > >> > waiting on the event to occur. These checks are redundant when hrtimer > > >> > events will call oa_buffer_check_unlocked to update the oa_buffer tail > > >> > pointers. The redundant checks add cpu overhead. Simplify the check > > >> > to reduce cpu overhead when using blocking io to read oa buffer reports. > > >> > > > >> > Signed-off-by: Umesh Nerlige Ramappa > > >> > --- > > >> > drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++- > > >> > 1 file changed, 27 insertions(+), 1 deletion(-) > > >> > > > >> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c > > >> > index 5cde3e4e7be6..e28a3ab83fde 100644 > > >> > --- a/drivers/gpu/drm/i915/i915_perf.c > > >> > +++ b/drivers/gpu/drm/i915/i915_perf.c > > >> > @@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream) > > >> > return pollin; > > >> > } > > >> > > > >> > +/** > > >> > + * oa_buffer_check_reports - quick check if reports are available > > >> > + * @stream: i915 stream instance > > >> > + * > > >> > + * The return from this function is used as a condition for > > >> > + * wait_event_interruptible in blocking read. This is used to detect > > >> > + * available reports. > > >> > + * > > >> > + * A condition in wait_event_interruptible seems to be checked twice before > > >> > + * waiting on an event to occur. These checks are redundant when hrtimer events > > >> > + * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers. The > > >> > + * redundant checks add cpu overhead. We simplify the check to reduce cpu > > >> > + * overhead. > > >> > + */ > > >> > +static bool oa_buffer_check_reports(struct i915_perf_stream *stream) > > >> > +{ > > >> > + unsigned long flags; > > >> > + bool available; > > >> > + > > >> > + spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags); > > >> > + available = stream->oa_buffer.tail != stream->oa_buffer.head; > > >> > + spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags); > > >> > + > > >> > + return available; > > >> > +} > > >> > + > > >> > /** > > >> > * append_oa_status - Appends a status record to a userspace read() buffer. > > >> > * @stream: An i915-perf stream opened for OA metrics > > >> > @@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream) > > >> > return -EIO; > > >> > > > >> > return wait_event_interruptible(stream->poll_wq, > > >> > - oa_buffer_check_unlocked(stream)); > > >> > + oa_buffer_check_reports(stream)); > > >> > } > > >> > > >> I think the problem with this patch is that the original code had the > > >> property that the condition for data availability is checked (and the OA > > >> tail advanced) /before/ entering the wait. So the tail was advanced and if > > >> there was data there was no need to wait at all. This change has lost that > > >> property. With this change we will first always enter the wait and then get > > >> unblocked after the timer interrupt which will advance the tail and wake us > > >> up. > > >> > > >> I think if we want to do this, it is possible to do without losing the > > >> original property. Just call oa_buffer_check_unlocked() first (outside > > >> wait_event) and if there is data just return. If not, put yourself on > > >> stream->poll_wq from which the timer callback will wake us up. I think this > > >> is going to be something like prepare_to_wait() / schedule() / > > >> finish_wait() pattern of which there are numerous examples in the > > >> kernel. So in this case we won't call wait_event_interruptible() which > > >> checks the condition upon waking up. No need to define > > >> oa_buffer_check_reports() either. > > > > > > If on the other hand we say that this should actually be the desired > > > behavior of the blocking read, that it should not unblock immediately but > > > only after the next timer interrupt (so that an app can call the blocking > > > read repeatedly without worrying about it will return a little bit of data > > > in each call at a considerable amount of CPU load), then we may be able to > > > do something like this, but even then we may have to think about what would > > > be the correct way to do that. Though this may be that correct way and we > > > may just need to change the commit message, but is that what we want? > > > > I am not quite clear on how the blocking read should behave in terms of the > > API itself. > > > > The change here is solely to reduce cpu overhead from the additional 2 > > calls to oa_buffer_check_unlocked before blocking and that would happen on > > every call to the read. I agree that the read would block if the timer did > > not update the tail yet, but that makes sense in a way that we don't want > > read to constantly return data when the OA sampling rates are high (say 100 > > us). With this change the behavior becomes consistent irrespective of the > > OA sampling rate and user has the flexibility to set the POLL_OA_PERIOD to > > whatever value is suitable for the OA sampling rate chosen. > > I think the patch commit message should clearly state the reason for the > patch. Afais if the purpose of the patch is to reduce cpu overhead, the > patch should not change API behavior. If on the other hand the purpose of > the patch is to make the API behavior consistent, then please resubmit the > patch after modifying the commit message to reflect the purpose of the > patch so that the patch can be reviewed from that point of view. > > Also I think we should be able to just use stream->pollin instead of > defining the new oa_buffer_check_reports(). E.g. isn't > stream->pollin == (stream->oa_buffer.tail != stream->oa_buffer.head) ? Actually yes, using stream->pollin will make the wait for both blocking and non-blocking reads identical. So the blocking wait just becomes: static int i915_oa_wait_unlocked(struct i915_perf_stream *stream) { ... return wait_event_interruptible(stream->poll_wq, stream->pollin); } That is if we re-purpose the patch to make the API behavior consistent across blocking and non-blocking reads. _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx