From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=MmFH=56=lists.freedesktop.org=intel-gfx-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-6.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS,
	INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,
	URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 47499C2BB1D
	for <intel-gfx@archiver.kernel.org>; Tue, 14 Apr 2020 19:37:38 +0000 (UTC)
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 214A0206E9
	for <intel-gfx@archiver.kernel.org>; Tue, 14 Apr 2020 19:37:38 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 214A0206E9
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id BF17C6E144;
	Tue, 14 Apr 2020 19:37:37 +0000 (UTC)
Received: from mga05.intel.com (mga05.intel.com [192.55.52.43])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 4F4086E144
 for <intel-gfx@lists.freedesktop.org>; Tue, 14 Apr 2020 19:37:36 +0000 (UTC)
IronPort-SDR: y7iNsH00njPCEhutZuipngslifoPlPQlljH9FMAeZc0kg2zq2jvpJe9GGYdjEwAzsZRzhwomAT
 v8ds3K6oNrug==
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga006.fm.intel.com ([10.253.24.20])
 by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 14 Apr 2020 12:37:35 -0700
IronPort-SDR: 8gwO2e2vBACAOwfkof1Rwnm8sjZfBhNub15mUFkzyQDdZrAPwrXsB4AlLY1EbODbzGL6/byFha
 IiBp4gMw5/nA==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.72,384,1580803200"; d="scan'208";a="454669775"
Received: from adixit-mobl.amr.corp.intel.com (HELO adixit-arch.intel.com)
 ([10.212.39.201])
 by fmsmga006.fm.intel.com with ESMTP; 14 Apr 2020 12:37:30 -0700
Date: Tue, 14 Apr 2020 12:37:25 -0700
Message-ID: <87tv1l505m.wl-ashutosh.dixit@intel.com>
From: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
To: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
In-Reply-To: <87v9m151ft.wl-ashutosh.dixit@intel.com>
References: <20200413154822.11620-1-umesh.nerlige.ramappa@intel.com>
 <20200413154822.11620-2-umesh.nerlige.ramappa@intel.com>
 <87y2qy4psw.wl-ashutosh.dixit@intel.com>
 <87wo6i4kqd.wl-ashutosh.dixit@intel.com>
 <20200414165948.GD61072@orsosgc001.amr.corp.intel.com>
 <87v9m151ft.wl-ashutosh.dixit@intel.com>
User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue)
 FLIM/1.14.9 (=?ISO-8859-4?Q?Goj=F2?=) APEL/10.8 EasyPG/1.0.0 Emacs/26
 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)
MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue")
Subject: Re: [Intel-gfx] [PATCH 1/3] drm/i915/perf: Reduce cpu overhead for
 blocking perf OA reads
X-BeenThere: intel-gfx@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Intel graphics driver community testing & development
 <intel-gfx.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/intel-gfx>,
 <mailto:intel-gfx-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/intel-gfx>
List-Post: <mailto:intel-gfx@lists.freedesktop.org>
List-Help: <mailto:intel-gfx-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/intel-gfx>,
 <mailto:intel-gfx-request@lists.freedesktop.org?subject=subscribe>
Cc: intel-gfx@lists.freedesktop.org
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

On Tue, 14 Apr 2020 12:09:42 -0700, Dixit, Ashutosh wrote:
>
> On Tue, 14 Apr 2020 09:59:48 -0700, Umesh Nerlige Ramappa wrote:
> > On Mon, Apr 13, 2020 at 11:58:18PM -0700, Dixit, Ashutosh wrote:
> > > On Mon, 13 Apr 2020 22:08:47 -0700, Dixit, Ashutosh wrote:
> > >>
> > >> On Mon, 13 Apr 2020 08:48:20 -0700, Umesh Nerlige Ramappa wrote:
> > >> >
> > >> > A condition in wait_event_interruptible seems to be checked twice before
> > >> > waiting on the event to occur. These checks are redundant when hrtimer
> > >> > events will call oa_buffer_check_unlocked to update the oa_buffer tail
> > >> > pointers. The redundant checks add cpu overhead. Simplify the check
> > >> > to reduce cpu overhead when using blocking io to read oa buffer reports.
> > >> >
> > >> > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> > >> > ---
> > >> >  drivers/gpu/drm/i915/i915_perf.c | 28 +++++++++++++++++++++++++++-
> > >> >  1 file changed, 27 insertions(+), 1 deletion(-)
> > >> >
> > >> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> > >> > index 5cde3e4e7be6..e28a3ab83fde 100644
> > >> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > >> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > >> > @@ -541,6 +541,32 @@ static bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)
> > >> >	return pollin;
> > >> >  }
> > >> >
> > >> > +/**
> > >> > + * oa_buffer_check_reports - quick check if reports are available
> > >> > + * @stream: i915 stream instance
> > >> > + *
> > >> > + * The return from this function is used as a condition for
> > >> > + * wait_event_interruptible in blocking read. This is used to detect
> > >> > + * available reports.
> > >> > + *
> > >> > + * A condition in wait_event_interruptible seems to be checked twice before
> > >> > + * waiting on an event to occur. These checks are redundant when hrtimer events
> > >> > + * will call oa_buffer_check_unlocked to update the oa_buffer tail pointers. The
> > >> > + * redundant checks add cpu overhead. We simplify the check to reduce cpu
> > >> > + * overhead.
> > >> > + */
> > >> > +static bool oa_buffer_check_reports(struct i915_perf_stream *stream)
> > >> > +{
> > >> > +	unsigned long flags;
> > >> > +	bool available;
> > >> > +
> > >> > +	spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
> > >> > +	available = stream->oa_buffer.tail != stream->oa_buffer.head;
> > >> > +	spin_unlock_irqrestore(&stream->oa_buffer.ptr_lock, flags);
> > >> > +
> > >> > +	return available;
> > >> > +}
> > >> > +
> > >> >  /**
> > >> >   * append_oa_status - Appends a status record to a userspace read() buffer.
> > >> >   * @stream: An i915-perf stream opened for OA metrics
> > >> > @@ -1150,7 +1176,7 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
> > >> >		return -EIO;
> > >> >
> > >> >	return wait_event_interruptible(stream->poll_wq,
> > >> > -					oa_buffer_check_unlocked(stream));
> > >> > +					oa_buffer_check_reports(stream));
> > >> >  }
> > >>
> > >> I think the problem with this patch is that the original code had the
> > >> property that the condition for data availability is checked (and the OA
> > >> tail advanced) /before/ entering the wait. So the tail was advanced and if
> > >> there was data there was no need to wait at all. This change has lost that
> > >> property. With this change we will first always enter the wait and then get
> > >> unblocked after the timer interrupt which will advance the tail and wake us
> > >> up.
> > >>
> > >> I think if we want to do this, it is possible to do without losing the
> > >> original property. Just call oa_buffer_check_unlocked() first (outside
> > >> wait_event) and if there is data just return. If not, put yourself on
> > >> stream->poll_wq from which the timer callback will wake us up. I think this
> > >> is going to be something like prepare_to_wait() / schedule() /
> > >> finish_wait() pattern of which there are numerous examples in the
> > >> kernel. So in this case we won't call wait_event_interruptible() which
> > >> checks the condition upon waking up. No need to define
> > >> oa_buffer_check_reports() either.
> > >
> > > If on the other hand we say that this should actually be the desired
> > > behavior of the blocking read, that it should not unblock immediately but
> > > only after the next timer interrupt (so that an app can call the blocking
> > > read repeatedly without worrying about it will return a little bit of data
> > > in each call at a considerable amount of CPU load), then we may be able to
> > > do something like this, but even then we may have to think about what would
> > > be the correct way to do that. Though this may be that correct way and we
> > > may just need to change the commit message, but is that what we want?
> >
> > I am not quite clear on how the blocking read should behave in terms of the
> > API itself.
> >
> > The change here is solely to reduce cpu overhead from the additional 2
> > calls to oa_buffer_check_unlocked before blocking and that would happen on
> > every call to the read. I agree that the read would block if the timer did
> > not update the tail yet, but that makes sense in a way that we don't want
> > read to constantly return data when the OA sampling rates are high (say 100
> > us). With this change the behavior becomes consistent irrespective of the
> > OA sampling rate and user has the flexibility to set the POLL_OA_PERIOD to
> > whatever value is suitable for the OA sampling rate chosen.
>
> I think the patch commit message should clearly state the reason for the
> patch. Afais if the purpose of the patch is to reduce cpu overhead, the
> patch should not change API behavior. If on the other hand the purpose of
> the patch is to make the API behavior consistent, then please resubmit the
> patch after modifying the commit message to reflect the purpose of the
> patch so that the patch can be reviewed from that point of view.
>
> Also I think we should be able to just use stream->pollin instead of
> defining the new oa_buffer_check_reports(). E.g. isn't
> stream->pollin == (stream->oa_buffer.tail != stream->oa_buffer.head) ?

Actually yes, using stream->pollin will make the wait for both blocking and
non-blocking reads identical. So the blocking wait just becomes:

static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
{
	...

        return wait_event_interruptible(stream->poll_wq, stream->pollin);
}

That is if we re-purpose the patch to make the API behavior consistent
across blocking and non-blocking reads.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx