From: Tvrtko Ursulin
Organization: Intel Corporation UK Plc
To: Chris Wilson, intel-gfx@lists.freedesktop.org
Date: Wed, 6 Jan 2021 15:57:49 +0000
Subject: Re: [Intel-gfx] [PATCH 4/4] drm/i915/gt: Remove timeslice suppression
Message-ID: <0ed63aeb-d58e-5ec6-2072-65d17be612dc@linux.intel.com>
In-Reply-To: <20210106123939.18435-4-chris@chris-wilson.co.uk>
References: <20210106123939.18435-1-chris@chris-wilson.co.uk>
 <20210106123939.18435-4-chris@chris-wilson.co.uk>

On 06/01/2021 12:39, Chris Wilson wrote:
> In the next^W future patch, we remove the strict priority system and
> continuously re-evaluate the relative priority of tasks. As such we need
> to enable the timeslice whenever there is more than one context in the
> pipeline. This simplifies the decision and removes some of the tweaks to
> suppress timeslicing, allowing us to lift the timeslice enabling to a
> common spot at the end of running the submission tasklet.
>
> One consequence of the suppression is that it was reducing fairness
> between virtual engines on an over saturated system; undermining the
> principle for timeslicing.
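
Read as code, the rule in the commit message reduces to a priority-free predicate: slice only when the engine supports timeslicing, the running request is still live, no submission is awaiting its ACK, and something else could run. A minimal userspace sketch of the patch's needs_timeslice(), assuming a hypothetical `struct ts_state` that flattens the engine fields it reads into booleans; it is a model, not the i915 code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, flattened view of the state read by the patch's
 * needs_timeslice(); the field names are illustrative only.
 */
struct ts_state {
	bool has_timeslices;	/* intel_engine_has_timeslices() */
	bool active;		/* *el->active is non-NULL */
	bool active_complete;	/* the running request has completed */
	bool pending_ack;	/* execlists.pending[0] set, awaiting ACK */
	bool elsp1_occupied;	/* another request follows in ELSP[1] */
	bool queue_nonempty;	/* the priority queue holds requests */
	bool virtual_nonempty;	/* a virtual engine offers a request */
};

/* True when some other context could take a turn; priority is ignored. */
static bool sketch_needs_timeslice(const struct ts_state *s)
{
	/* Sanity: "complete" only makes sense for an active request. */
	assert(s->active || !s->active_complete);

	if (!s->has_timeslices)
		return false;

	/* Nothing running, or about to switch anyway: wait for next event. */
	if (!s->active || s->active_complete)
		return false;

	/* A submission is in flight; start the slice after the ACK. */
	if (s->pending_ack)
		return false;

	/* Slice whenever anything at all is waiting behind ELSP[0]. */
	return s->elsp1_occupied || s->queue_nonempty || s->virtual_nonempty;
}
```

A lone running request with empty queues yields false; adding one waiting request on a virtual engine yields true whatever its priority, which is the change the commit message credits with restoring fairness between virtual engines. The removed need_timeslice() instead compared the best waiting priority against effective_prio() of the running request.
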
>
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2802
> Testcase: igt/gem_exec_balancer/fairslice
> Signed-off-by: Chris Wilson
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_types.h  |  10 -
>  .../drm/i915/gt/intel_execlists_submission.c  | 173 +++++++-----------
>  2 files changed, 68 insertions(+), 115 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 430066e5884c..df62e793e747 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -238,16 +238,6 @@ struct intel_engine_execlists {
>  	 */
>  	unsigned int port_mask;
>
> -	/**
> -	 * @switch_priority_hint: Second context priority.
> -	 *
> -	 * We submit multiple contexts to the HW simultaneously and would
> -	 * like to occasionally switch between them to emulate timeslicing.
> -	 * To know when timeslicing is suitable, we track the priority of
> -	 * the context submitted second.
> -	 */
> -	int switch_priority_hint;
> -
>  	/**
>  	 * @queue_priority_hint: Highest pending priority.
>  	 *
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index ba3114fd4389..50d4308023f3 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -1143,25 +1143,6 @@ static void defer_active(struct intel_engine_cs *engine)
>  	defer_request(rq, i915_sched_lookup_priolist(engine, rq_prio(rq)));
>  }
>
> -static bool
> -need_timeslice(const struct intel_engine_cs *engine,
> -	       const struct i915_request *rq)
> -{
> -	int hint;
> -
> -	if (!intel_engine_has_timeslices(engine))
> -		return false;
> -
> -	hint = max(engine->execlists.queue_priority_hint,
> -		   virtual_prio(&engine->execlists));
> -
> -	if (!list_is_last(&rq->sched.link, &engine->active.requests))
> -		hint = max(hint, rq_prio(list_next_entry(rq, sched.link)));
> -
> -	GEM_BUG_ON(hint >= I915_PRIORITY_UNPREEMPTABLE);
> -	return hint >= effective_prio(rq);
> -}
> -
>  static bool
>  timeslice_yield(const struct intel_engine_execlists *el,
>  		const struct i915_request *rq)
> @@ -1181,76 +1162,68 @@ timeslice_yield(const struct intel_engine_execlists *el,
>  	return rq->context->lrc.ccid == READ_ONCE(el->yield);
>  }
>
> -static bool
> -timeslice_expired(const struct intel_engine_execlists *el,
> -		  const struct i915_request *rq)
> +static bool needs_timeslice(const struct intel_engine_cs *engine,
> +			    const struct i915_request *rq)
>  {
> +	if (!intel_engine_has_timeslices(engine))
> +		return false;
> +
> +	/* If not currently active, or about to switch, wait for next event */
> +	if (!rq || __i915_request_is_complete(rq))
> +		return false;
> +
> +	/* We do not need to start the timeslice until after the ACK */
> +	if (READ_ONCE(engine->execlists.pending[0]))
> +		return false;
> +
> +	/* If ELSP[1] is occupied, always check to see if worth slicing */
> +	if (!list_is_last_rcu(&rq->sched.link, &engine->active.requests))
> +		return true;
> +
> +	/* Otherwise, ELSP[0] is by itself, but may be waiting in the queue */
> +	if (!RB_EMPTY_ROOT(&engine->execlists.queue.rb_root))
> +		return true;
> +
> +	return !RB_EMPTY_ROOT(&engine->execlists.virtual.rb_root);
> +}
> +
> +static bool
> +timeslice_expired(struct intel_engine_cs *engine, const struct i915_request *rq)
> +{
> +	const struct intel_engine_execlists *el = &engine->execlists;
> +
> +	if (i915_request_has_nopreempt(rq) && __i915_request_has_started(rq))
> +		return false;
> +
> +	if (!needs_timeslice(engine, rq))
> +		return false;
> +
>  	return timer_expired(&el->timer) || timeslice_yield(el, rq);
>  }
>
> -static int
> -switch_prio(struct intel_engine_cs *engine, const struct i915_request *rq)
> -{
> -	if (list_is_last(&rq->sched.link, &engine->active.requests))
> -		return engine->execlists.queue_priority_hint;
> -
> -	return rq_prio(list_next_entry(rq, sched.link));
> -}
> -
> -static inline unsigned long
> -timeslice(const struct intel_engine_cs *engine)
> +static unsigned long timeslice(const struct intel_engine_cs *engine)
>  {
>  	return READ_ONCE(engine->props.timeslice_duration_ms);
>  }
>
> -static unsigned long active_timeslice(const struct intel_engine_cs *engine)
> -{
> -	const struct intel_engine_execlists *execlists = &engine->execlists;
> -	const struct i915_request *rq = *execlists->active;
> -
> -	if (!rq || __i915_request_is_complete(rq))
> -		return 0;
> -
> -	if (READ_ONCE(execlists->switch_priority_hint) < effective_prio(rq))
> -		return 0;
> -
> -	return timeslice(engine);
> -}
> -
> -static void set_timeslice(struct intel_engine_cs *engine)
> +static void start_timeslice(struct intel_engine_cs *engine)
>  {
> +	struct intel_engine_execlists *el = &engine->execlists;
>  	unsigned long duration;
>
> -	if (!intel_engine_has_timeslices(engine))
> -		return;
> +	/* Disable the timer if there is nothing to switch to */
> +	duration = 0;
> +	if (needs_timeslice(engine, *el->active)) {
> +		if (el->timer.expires) {

Why not just a timer_pending() check? Are you sure timer->expires cannot
legitimately be at jiffy 0 in wrap conditions?

> +			if (!timer_pending(&el->timer))
> +				tasklet_hi_schedule(&engine->execlists.tasklet);
> +			return;
> +		}
>
> -	duration = active_timeslice(engine);
> -	ENGINE_TRACE(engine, "bump timeslicing, interval:%lu", duration);
> +		duration = timeslice(engine);
> +	}
>
> -	set_timer_ms(&engine->execlists.timer, duration);
> -}
> -
> -static void start_timeslice(struct intel_engine_cs *engine, int prio)
> -{
> -	struct intel_engine_execlists *execlists = &engine->execlists;
> -	unsigned long duration;
> -
> -	if (!intel_engine_has_timeslices(engine))
> -		return;
> -
> -	WRITE_ONCE(execlists->switch_priority_hint, prio);
> -	if (prio == INT_MIN)
> -		return;
> -
> -	if (timer_pending(&execlists->timer))
> -		return;
> -
> -	duration = timeslice(engine);
> -	ENGINE_TRACE(engine,
> -		     "start timeslicing, prio:%d, interval:%lu",
> -		     prio, duration);
> -
> -	set_timer_ms(&execlists->timer, duration);
> +	set_timer_ms(&el->timer, duration);
>  }
>
>  static void record_preemption(struct intel_engine_execlists *execlists)
> @@ -1363,16 +1336,16 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  			__unwind_incomplete_requests(engine);
>
>  			last = NULL;
> -		} else if (need_timeslice(engine, last) &&
> -			   timeslice_expired(execlists, last)) {
> +		} else if (timeslice_expired(engine, last)) {
>  			ENGINE_TRACE(engine,
> -				     "expired last=%llx:%lld, prio=%d, hint=%d, yield?=%s\n",
> -				     last->fence.context,
> -				     last->fence.seqno,
> -				     last->sched.attr.priority,
> +				     "expired:%s last=%llx:%lld, prio=%d, hint=%d, yield?=%s\n",
> +				     yesno(timer_expired(&execlists->timer)),
> +				     last->fence.context, last->fence.seqno,
> +				     rq_prio(last),
>  				     execlists->queue_priority_hint,
>  				     yesno(timeslice_yield(execlists, last)));
>
> +			cancel_timer(&execlists->timer);

What is this cancel for?
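
Both questions above turn on `el->timer.expires` doubling as a state flag. The sketch below is a userspace model of the three states that start_timeslice() appears to distinguish. It assumes, without verifying it here, that the i915 arming helper never stores 0 as a deadline and that cancel_timer() resets `expires` to 0; `struct sketch_timer` and the `sketch_*` helpers are hypothetical, and the `pending` field stands in for timer_pending().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct timer_list; not the kernel type. */
struct sketch_timer {
	unsigned long expires;	/* 0 is reserved to mean "not armed" */
	bool pending;		/* models timer_pending(): queued, not fired */
};

/* Arm for 'timeout' ticks after 'now'; the unsigned sum may wrap to 0. */
static void sketch_arm(struct sketch_timer *t, unsigned long now,
		       unsigned long timeout)
{
	unsigned long deadline = now + timeout;

	if (deadline == 0)
		deadline = 1;	/* assumption: keep 0 free as the sentinel */
	t->expires = deadline;
	t->pending = true;
	assert(t->expires != 0);
}

/* Models the timer callback having run: no longer pending. */
static void sketch_fire(struct sketch_timer *t)
{
	t->pending = false;
}

/* Models cancel: dequeue the timer and clear the sentinel. */
static void sketch_cancel(struct sketch_timer *t)
{
	t->pending = false;
	t->expires = 0;
}

enum slice_state { SLICE_IDLE, SLICE_RUNNING, SLICE_EXPIRED };

/*
 * timer_pending() alone cannot tell "never armed" from "armed and already
 * fired"; the non-zero expires is what separates IDLE from EXPIRED.
 */
static enum slice_state sketch_state(const struct sketch_timer *t)
{
	if (!t->expires)
		return SLICE_IDLE;
	return t->pending ? SLICE_RUNNING : SLICE_EXPIRED;
}
```

Arming at now = `~0UL - 2` for 3 ticks computes a raw deadline of 0, which this sketch stores as 1, so an armed timer is never mistaken for an idle one; without such a nudge, the wrap case raised in the review would be real. On this reading, a cancel after the slice is consumed returns the timer to IDLE, so the next start_timeslice() arms a fresh slice instead of kicking the tasklet.
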

Regards,

Tvrtko

>  			ring_set_paused(engine, 1);
>  			defer_active(engine);
>
> @@ -1408,7 +1381,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  				 * of timeslices, our queue might be.
>  				 */
>  				spin_unlock(&engine->active.lock);
> -				start_timeslice(engine, queue_prio(execlists));
>  				return;
>  			}
>  		}
> @@ -1435,7 +1407,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  		if (last && !can_merge_rq(last, rq)) {
>  			spin_unlock(&ve->base.active.lock);
>  			spin_unlock(&engine->active.lock);
> -			start_timeslice(engine, rq_prio(rq));
>  			return; /* leave this for another sibling */
>  		}
>
> @@ -1599,29 +1570,23 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  	execlists->queue_priority_hint = queue_prio(execlists);
>  	spin_unlock(&engine->active.lock);
>
> -	if (submit) {
> -		/*
> -		 * Skip if we ended up with exactly the same set of requests,
> -		 * e.g. trying to timeslice a pair of ordered contexts
> -		 */
> -		if (!memcmp(execlists->active,
> -			    execlists->pending,
> -			    (port - execlists->pending) * sizeof(*port)))
> -			goto skip_submit;
> -
> +	/*
> +	 * We can skip poking the HW if we ended up with exactly the same set
> +	 * of requests as currently running, e.g. trying to timeslice a pair
> +	 * of ordered contexts.
> +	 */
> +	if (submit &&
> +	    memcmp(execlists->active,
> +		   execlists->pending,
> +		   (port - execlists->pending) * sizeof(*port))) {
>  		*port = NULL;
>  		while (port-- != execlists->pending)
>  			execlists_schedule_in(*port, port - execlists->pending);
>
> -		execlists->switch_priority_hint =
> -			switch_prio(engine, *execlists->pending);
> -
>  		WRITE_ONCE(execlists->yield, -1);
>  		set_preempt_timeout(engine, *execlists->active);
>  		execlists_submit_ports(engine);
>  	} else {
> -		start_timeslice(engine, execlists->queue_priority_hint);
> -skip_submit:
>  		ring_set_paused(engine, 0);
>  		while (port-- != execlists->pending)
>  			i915_request_put(*port);
> @@ -1979,8 +1944,6 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
>  		}
>  	} while (head != tail);
>
> -	set_timeslice(engine);
> -
>  	/*
>  	 * Gen11 has proven to fail wrt global observation point between
>  	 * entry and tail update, failing on the ordering and thus
> @@ -1993,6 +1956,7 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
>  	 * invalidation before.
>  	 */
>  	invalidate_csb_entries(&buf[0], &buf[num_entries - 1]);
> +	cancel_timer(&execlists->timer);
>
>  	return inactive;
>  }
> @@ -2405,8 +2369,10 @@ static void execlists_submission_tasklet(unsigned long data)
>  		execlists_reset(engine, msg);
>  	}
>
> -	if (!engine->execlists.pending[0])
> +	if (!engine->execlists.pending[0]) {
>  		execlists_dequeue_irq(engine);
> +		start_timeslice(engine);
> +	}
>
>  	post_process_csb(post, inactive);
>  	rcu_read_unlock();
> @@ -3851,9 +3817,6 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
>  		show_request(m, last, "\t\t", 0);
>  	}
>
> -	if (execlists->switch_priority_hint != INT_MIN)
> -		drm_printf(m, "\t\tSwitch priority hint: %d\n",
> -			   READ_ONCE(execlists->switch_priority_hint));
>  	if (execlists->queue_priority_hint != INT_MIN)
>  		drm_printf(m, "\t\tQueue priority hint: %d\n",
>  			   READ_ONCE(execlists->queue_priority_hint));

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
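
The reworked submit condition in execlists_dequeue() quoted above compares port arrays with memcmp(), so "the same set of requests" means the same request pointers in the same port order: swapping ELSP[0] and ELSP[1] counts as a change and is submitted. A standalone sketch with hypothetical names (`struct req`, `ports_unchanged()`, `should_submit()`), where `n` plays the role of `port - execlists->pending`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct i915_request. */
struct req {
	int id;
};

/* True if the first n pending ports hold exactly the active pointers. */
static bool ports_unchanged(struct req *const *active,
			    struct req *const *pending, size_t n)
{
	assert(active != NULL && pending != NULL);
	return memcmp(active, pending, n * sizeof(*pending)) == 0;
}

/* Mirrors "if (submit && memcmp(...))": only poke the HW on a real change. */
static bool should_submit(bool submit, struct req *const *active,
			  struct req *const *pending, size_t n)
{
	return submit && !ports_unchanged(active, pending, n);
}
```

For an ordered pair that cannot be reordered, dequeue rebuilds the same {A, B} ports, ports_unchanged() holds and the hardware is left alone; a rebuilt {B, A} would be submitted.
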