* Trio of latency sensitive patches
@ 2018-07-30 15:25 Chris Wilson
2018-07-30 15:25 ` [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request Chris Wilson
` (6 more replies)
0 siblings, 7 replies; 15+ messages in thread
From: Chris Wilson @ 2018-07-30 15:25 UTC (permalink / raw)
To: intel-gfx
Just a couple of bug reports suggesting we should do better with power
distribution...
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 15+ messages in thread* [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request 2018-07-30 15:25 Trio of latency sensitive patches Chris Wilson @ 2018-07-30 15:25 ` Chris Wilson 2018-08-01 9:56 ` Chris Wilson 2018-08-03 10:48 ` Tvrtko Ursulin 2018-07-30 15:25 ` [PATCH 2/3] drm/i915: Do not use iowait while waiting for the GPU Chris Wilson ` (5 subsequent siblings) 6 siblings, 2 replies; 15+ messages in thread From: Chris Wilson @ 2018-07-30 15:25 UTC (permalink / raw) To: intel-gfx; +Cc: Eero Tamminen If we are waiting for the currently executing request, we have a good idea that it will be completed in the very near future and so want to cap the CPU_DMA_LATENCY to ensure that we wake up the client quickly. v2: Not allowed to block in kmalloc after setting TASK_INTERRUPTIBLE. v3: Avoid the blocking notifier as well for TASK_INTERRUPTIBLE v4: Beautification? v5: And ignore the preemptibility of queue_work before schedule. Testcase: igt/gem_sync/store-default Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Eero Tamminen <eero.t.tamminen@intel.com> Cc: Francisco Jerez <currojerez@riseup.net> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> --- drivers/gpu/drm/i915/i915_request.c | 52 +++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 5c2c93cbab12..f3ff8dbe363d 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1258,6 +1258,51 @@ static bool __i915_wait_request_check_and_reset(struct i915_request *request) return true; } +struct wait_dma_qos { + struct pm_qos_request req; + struct work_struct add, del; +}; + +static void __wait_dma_qos_add(struct work_struct *work) +{ + struct wait_dma_qos *qos = container_of(work, typeof(*qos), add); + + pm_qos_add_request(&qos->req, PM_QOS_CPU_DMA_LATENCY, 50); +} + +static void __wait_dma_qos_del(struct work_struct *work) +{ + struct wait_dma_qos *qos = container_of(work, typeof(*qos), del); + + if (!cancel_work_sync(&qos->add)) + pm_qos_remove_request(&qos->req); + + kfree(qos); +} + +static struct wait_dma_qos *wait_dma_qos_add(void) +{ + struct wait_dma_qos *qos; + + /* Called under TASK_INTERRUPTIBLE, so not allowed to sleep/block. */ + qos = kzalloc(sizeof(*qos), GFP_NOWAIT | __GFP_NOWARN); + if (!qos) + return NULL; + + INIT_WORK(&qos->add, __wait_dma_qos_add); + INIT_WORK(&qos->del, __wait_dma_qos_del); + schedule_work_on(raw_smp_processor_id(), &qos->add); + + return qos; +} + +static void wait_dma_qos_del(struct wait_dma_qos *qos) +{ + /* Defer to worker so not incur extra latency for our woken client. */ + if (qos) + schedule_work(&qos->del); +} + /** * i915_request_wait - wait until execution of request has finished * @rq: the request to wait upon @@ -1286,6 +1331,7 @@ long i915_request_wait(struct i915_request *rq, wait_queue_head_t *errq = &rq->i915->gpu_error.wait_queue; DEFINE_WAIT_FUNC(reset, default_wake_function); DEFINE_WAIT_FUNC(exec, default_wake_function); + struct wait_dma_qos *qos = NULL; struct intel_wait wait; might_sleep(); @@ -1363,6 +1409,11 @@ long i915_request_wait(struct i915_request *rq, break; } + if (!qos && + i915_seqno_passed(intel_engine_get_seqno(rq->engine), + wait.seqno - 1)) + qos = wait_dma_qos_add(); + timeout = io_schedule_timeout(timeout); if (intel_wait_complete(&wait) && @@ -1412,6 +1463,7 @@ long i915_request_wait(struct i915_request *rq, if (flags & I915_WAIT_LOCKED) remove_wait_queue(errq, &reset); remove_wait_queue(&rq->execute, &exec); + wait_dma_qos_del(qos); trace_i915_request_wait_end(rq); return timeout; -- 2.18.0 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request 2018-07-30 15:25 ` [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request Chris Wilson @ 2018-08-01 9:56 ` Chris Wilson 2018-08-03 10:48 ` Tvrtko Ursulin 1 sibling, 0 replies; 15+ messages in thread From: Chris Wilson @ 2018-08-01 9:56 UTC (permalink / raw) To: intel-gfx; +Cc: Eero Tamminen Quoting Chris Wilson (2018-07-30 16:25:20) > If we are waiting for the currently executing request, we have a good > idea that it will be completed in the very near future and so want to > cap the CPU_DMA_LATENCY to ensure that we wake up the client quickly. > > v2: Not allowed to block in kmalloc after setting TASK_INTERRUPTIBLE. > v3: Avoid the blocking notifier as well for TASK_INTERRUPTIBLE > v4: Beautification? > v5: And ignore the preemptibility of queue_work before schedule. > > Testcase: igt/gem_sync/store-default > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > Cc: Eero Tamminen <eero.t.tamminen@intel.com> > Cc: Francisco Jerez <currojerez@riseup.net> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> media_bench disagrees with dropping iowait, but agrees with setting the DMA_LATENCY pm_qos. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request 2018-07-30 15:25 ` [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request Chris Wilson 2018-08-01 9:56 ` Chris Wilson @ 2018-08-03 10:48 ` Tvrtko Ursulin 2018-08-03 11:07 ` Chris Wilson 1 sibling, 1 reply; 15+ messages in thread From: Tvrtko Ursulin @ 2018-08-03 10:48 UTC (permalink / raw) To: Chris Wilson, intel-gfx; +Cc: Eero Tamminen On 30/07/2018 16:25, Chris Wilson wrote: > If we are waiting for the currently executing request, we have a good > idea that it will be completed in the very near future and so want to > cap the CPU_DMA_LATENCY to ensure that we wake up the client quickly. Maybe, but I have a feeling we shouldn't assume what the userspace wants. On the other hand "seqno - 1" guard alleviates some of my concerns, just not sure if all. > v2: Not allowed to block in kmalloc after setting TASK_INTERRUPTIBLE. > v3: Avoid the blocking notifier as well for TASK_INTERRUPTIBLE > v4: Beautification? > v5: And ignore the preemptibility of queue_work before schedule. > > Testcase: igt/gem_sync/store-default > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > Cc: Eero Tamminen <eero.t.tamminen@intel.com> > Cc: Francisco Jerez <currojerez@riseup.net> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > --- > drivers/gpu/drm/i915/i915_request.c | 52 +++++++++++++++++++++++++++++ > 1 file changed, 52 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c > index 5c2c93cbab12..f3ff8dbe363d 100644 > --- a/drivers/gpu/drm/i915/i915_request.c > +++ b/drivers/gpu/drm/i915/i915_request.c > @@ -1258,6 +1258,51 @@ static bool __i915_wait_request_check_and_reset(struct i915_request *request) > return true; > } > > +struct wait_dma_qos { > + struct pm_qos_request req; > + struct work_struct add, del; > +}; > + > +static void __wait_dma_qos_add(struct work_struct *work) > +{ > + struct wait_dma_qos *qos = container_of(work, typeof(*qos), add); > + > + pm_qos_add_request(&qos->req, PM_QOS_CPU_DMA_LATENCY, 50); > +} > + > +static void __wait_dma_qos_del(struct work_struct *work) > +{ > + struct wait_dma_qos *qos = container_of(work, typeof(*qos), del); > + > + if (!cancel_work_sync(&qos->add)) > + pm_qos_remove_request(&qos->req); > + > + kfree(qos); > +} > + > +static struct wait_dma_qos *wait_dma_qos_add(void) > +{ > + struct wait_dma_qos *qos; > + > + /* Called under TASK_INTERRUPTIBLE, so not allowed to sleep/block. */ > + qos = kzalloc(sizeof(*qos), GFP_NOWAIT | __GFP_NOWARN); Is it too big to put on the stack in i915_request_wait? Looks like that would be simpler. > + if (!qos) > + return NULL; > + > + INIT_WORK(&qos->add, __wait_dma_qos_add); > + INIT_WORK(&qos->del, __wait_dma_qos_del); > + schedule_work_on(raw_smp_processor_id(), &qos->add); Do we want to use the highpri wq? But in any case we do have a worker latency here, which may completely defeat the 50us QoS request. :( Also, do you need to specify the CPU manually or is that in fact detrimental to the worker running ASAP? AFAIU this makes the worker only be able to start once we go to sleep, with potentially other stuff in there preceding our work item. > + > + return qos; > +} > + > +static void wait_dma_qos_del(struct wait_dma_qos *qos) > +{ > + /* Defer to worker so not incur extra latency for our woken client. */ > + if (qos) > + schedule_work(&qos->del); > +} > + > /** > * i915_request_wait - wait until execution of request has finished > * @rq: the request to wait upon > @@ -1286,6 +1331,7 @@ long i915_request_wait(struct i915_request *rq, > wait_queue_head_t *errq = &rq->i915->gpu_error.wait_queue; > DEFINE_WAIT_FUNC(reset, default_wake_function); > DEFINE_WAIT_FUNC(exec, default_wake_function); > + struct wait_dma_qos *qos = NULL; > struct intel_wait wait; > > might_sleep(); > @@ -1363,6 +1409,11 @@ long i915_request_wait(struct i915_request *rq, > break; > } > > + if (!qos && > + i915_seqno_passed(intel_engine_get_seqno(rq->engine), > + wait.seqno - 1)) > + qos = wait_dma_qos_add(); > + > timeout = io_schedule_timeout(timeout); > > if (intel_wait_complete(&wait) && > @@ -1412,6 +1463,7 @@ long i915_request_wait(struct i915_request *rq, > if (flags & I915_WAIT_LOCKED) > remove_wait_queue(errq, &reset); > remove_wait_queue(&rq->execute, &exec); > + wait_dma_qos_del(qos); > trace_i915_request_wait_end(rq); > > return timeout; > Another thing we talked about on IRC is a potential to introduce an explicit low-latency flag to gem_wait ioctl. That would punt the responsibility to userspace to know if it cares, but on the other hand if some benchmark benefit from implicit setting that could be tempting. You said media-bench likes it so I'll try it. Explicit request would also simplify the code by removing the need for workers. And remove the worker latency from the request which may be the most attractive benefit. And also could apply the QoS request to before the seqno assignment. Currently I think there is a small window where wait can race with it, and fall into high-latency sleep, even if later it would chose to request low-latency. Regards, Tvrtko _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request 2018-08-03 10:48 ` Tvrtko Ursulin @ 2018-08-03 11:07 ` Chris Wilson 2018-08-03 13:00 ` Tvrtko Ursulin 0 siblings, 1 reply; 15+ messages in thread From: Chris Wilson @ 2018-08-03 11:07 UTC (permalink / raw) To: Tvrtko Ursulin, intel-gfx; +Cc: Eero Tamminen Quoting Tvrtko Ursulin (2018-08-03 11:48:44) > > On 30/07/2018 16:25, Chris Wilson wrote: > > If we are waiting for the currently executing request, we have a good > > idea that it will be completed in the very near future and so want to > > cap the CPU_DMA_LATENCY to ensure that we wake up the client quickly. > > Maybe, but I have a feeling we shouldn't assume what the userspace > wants. On the other hand "seqno - 1" guard alleviates some of my > concerns, just not sure if all. Yup, it at least pretends to not be wholly evil. > > v2: Not allowed to block in kmalloc after setting TASK_INTERRUPTIBLE. > > v3: Avoid the blocking notifier as well for TASK_INTERRUPTIBLE > > v4: Beautification? > > v5: And ignore the preemptibility of queue_work before schedule. > > > > Testcase: igt/gem_sync/store-default > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > > Cc: Eero Tamminen <eero.t.tamminen@intel.com> > > Cc: Francisco Jerez <currojerez@riseup.net> > > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > > --- > > drivers/gpu/drm/i915/i915_request.c | 52 +++++++++++++++++++++++++++++ > > 1 file changed, 52 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c > > index 5c2c93cbab12..f3ff8dbe363d 100644 > > --- a/drivers/gpu/drm/i915/i915_request.c > > +++ b/drivers/gpu/drm/i915/i915_request.c > > @@ -1258,6 +1258,51 @@ static bool __i915_wait_request_check_and_reset(struct i915_request *request) > > return true; > > } > > > > +struct wait_dma_qos { > > + struct pm_qos_request req; > > + struct work_struct add, del; > > +}; > > + > > +static void __wait_dma_qos_add(struct work_struct *work) > > +{ > > + struct wait_dma_qos *qos = container_of(work, typeof(*qos), add); > > + > > + pm_qos_add_request(&qos->req, PM_QOS_CPU_DMA_LATENCY, 50); > > +} > > + > > +static void __wait_dma_qos_del(struct work_struct *work) > > +{ > > + struct wait_dma_qos *qos = container_of(work, typeof(*qos), del); > > + > > + if (!cancel_work_sync(&qos->add)) > > + pm_qos_remove_request(&qos->req); > > + > > + kfree(qos); > > +} > > + > > +static struct wait_dma_qos *wait_dma_qos_add(void) > > +{ > > + struct wait_dma_qos *qos; > > + > > + /* Called under TASK_INTERRUPTIBLE, so not allowed to sleep/block. */ > > + qos = kzalloc(sizeof(*qos), GFP_NOWAIT | __GFP_NOWARN); > > Is it too big to put on the stack in i915_request_wait? Looks like that > would be simpler. We don't want to be synchronous in our out path, as that directly adds to the client latency. > > + if (!qos) > > + return NULL; > > + > > + INIT_WORK(&qos->add, __wait_dma_qos_add); > > + INIT_WORK(&qos->del, __wait_dma_qos_del); > > + schedule_work_on(raw_smp_processor_id(), &qos->add); > > Do we want to use the highpri wq? But in any case we do have a worker > latency here, which may completely defeat the 50us QoS request. :( > > Also, do you need to specify the CPU manually or is that in fact > detrimental to the worker running ASAP? AFAIU this makes the worker only > be able to start once we go to sleep, with potentially other stuff in > there preceding our work item. That was the idea with pinning it to the current cpu; that the worker is only run when we schedule ourselves. If we skipped the schedule, we wouldn't need to adjust the pm_qos. There is still some wiggle with preemption, but I don't see that being a huge concern, if we are switching cpus we may as well make sure we don't enter a high C-state. > > + > > + return qos; > > +} > > + > > +static void wait_dma_qos_del(struct wait_dma_qos *qos) > > +{ > > + /* Defer to worker so not incur extra latency for our woken client. */ > > + if (qos) > > + schedule_work(&qos->del); > > +} > > + > > /** > > * i915_request_wait - wait until execution of request has finished > > * @rq: the request to wait upon > > @@ -1286,6 +1331,7 @@ long i915_request_wait(struct i915_request *rq, > > wait_queue_head_t *errq = &rq->i915->gpu_error.wait_queue; > > DEFINE_WAIT_FUNC(reset, default_wake_function); > > DEFINE_WAIT_FUNC(exec, default_wake_function); > > + struct wait_dma_qos *qos = NULL; > > struct intel_wait wait; > > > > might_sleep(); > > @@ -1363,6 +1409,11 @@ long i915_request_wait(struct i915_request *rq, > > break; > > } > > > > + if (!qos && > > + i915_seqno_passed(intel_engine_get_seqno(rq->engine), > > + wait.seqno - 1)) > > + qos = wait_dma_qos_add(); > > + > > timeout = io_schedule_timeout(timeout); > > > > if (intel_wait_complete(&wait) && > > @@ -1412,6 +1463,7 @@ long i915_request_wait(struct i915_request *rq, > > if (flags & I915_WAIT_LOCKED) > > remove_wait_queue(errq, &reset); > > remove_wait_queue(&rq->execute, &exec); > > + wait_dma_qos_del(qos); > > trace_i915_request_wait_end(rq); > > > > return timeout; > > > > Another thing we talked about on IRC is a potential to introduce an > explicit low-latency flag to gem_wait ioctl. That would punt the > responsibility to userspace to know if it cares, but on the other hand > if some benchmark benefit from implicit setting that could be tempting. > You said media-bench likes it so I'll try it. > > Explicit request would also simplify the code by removing the need for > workers. And remove the worker latency from the request which may be the > most attractive benefit. No, we still need the workers as the pm_qos_update can take a while if it decides to change C-state there and then, so definitely don't want that in the tail, and we might as well keep the trick to only do the pm_qos_update if we sleep. > And also could apply the QoS request to before the seqno assignment. > Currently I think there is a small window where wait can race with it, > and fall into high-latency sleep, even if later it would chose to > request low-latency. That window should be followed by an interrupt, I think, and we should have applied the irq_seqno barrier for gen5-gen7, so the seqno should be valid wrt to the previous interrupt. A hard problem to debug for sure. Maybe we should preserve the last-interrupt timestamp in the HWSP and then we can inspect that. Ideally though we want a journal so we can go back in case the next interrupt arrives before we wakeup making the latency seem less. Still some information (with detectable error) is better than none. Let's see what that looks like. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request 2018-08-03 11:07 ` Chris Wilson @ 2018-08-03 13:00 ` Tvrtko Ursulin 2018-08-03 13:57 ` Chris Wilson 0 siblings, 1 reply; 15+ messages in thread From: Tvrtko Ursulin @ 2018-08-03 13:00 UTC (permalink / raw) To: Chris Wilson, intel-gfx; +Cc: Eero Tamminen On 03/08/2018 12:07, Chris Wilson wrote: > Quoting Tvrtko Ursulin (2018-08-03 11:48:44) >> >> On 30/07/2018 16:25, Chris Wilson wrote: >>> If we are waiting for the currently executing request, we have a good >>> idea that it will be completed in the very near future and so want to >>> cap the CPU_DMA_LATENCY to ensure that we wake up the client quickly. >> >> Maybe, but I have a feeling we shouldn't assume what the userspace >> wants. On the other hand "seqno - 1" guard alleviates some of my >> concerns, just not sure if all. > > Yup, it at least pretends to not be wholly evil. > >>> v2: Not allowed to block in kmalloc after setting TASK_INTERRUPTIBLE. >>> v3: Avoid the blocking notifier as well for TASK_INTERRUPTIBLE >>> v4: Beautification? >>> v5: And ignore the preemptibility of queue_work before schedule. >>> >>> Testcase: igt/gem_sync/store-default >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> >>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> >>> Cc: Eero Tamminen <eero.t.tamminen@intel.com> >>> Cc: Francisco Jerez <currojerez@riseup.net> >>> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> >>> --- >>> drivers/gpu/drm/i915/i915_request.c | 52 +++++++++++++++++++++++++++++ >>> 1 file changed, 52 insertions(+) >>> >>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c >>> index 5c2c93cbab12..f3ff8dbe363d 100644 >>> --- a/drivers/gpu/drm/i915/i915_request.c >>> +++ b/drivers/gpu/drm/i915/i915_request.c >>> @@ -1258,6 +1258,51 @@ static bool __i915_wait_request_check_and_reset(struct i915_request *request) >>> return true; >>> } >>> >>> +struct wait_dma_qos { >>> + struct pm_qos_request req; >>> + struct work_struct add, del; >>> +}; >>> + >>> +static void __wait_dma_qos_add(struct work_struct *work) >>> +{ >>> + struct wait_dma_qos *qos = container_of(work, typeof(*qos), add); >>> + >>> + pm_qos_add_request(&qos->req, PM_QOS_CPU_DMA_LATENCY, 50); >>> +} >>> + >>> +static void __wait_dma_qos_del(struct work_struct *work) >>> +{ >>> + struct wait_dma_qos *qos = container_of(work, typeof(*qos), del); >>> + >>> + if (!cancel_work_sync(&qos->add)) >>> + pm_qos_remove_request(&qos->req); >>> + >>> + kfree(qos); >>> +} >>> + >>> +static struct wait_dma_qos *wait_dma_qos_add(void) >>> +{ >>> + struct wait_dma_qos *qos; >>> + >>> + /* Called under TASK_INTERRUPTIBLE, so not allowed to sleep/block. */ >>> + qos = kzalloc(sizeof(*qos), GFP_NOWAIT | __GFP_NOWARN); >> >> Is it too big to put on the stack in i915_request_wait? Looks like that >> would be simpler. > > We don't want to be synchronous in our out path, as that directly adds > to the client latency. True, I missed the fact we wouldn't be able to return until syncing with the worker. >>> + if (!qos) >>> + return NULL; >>> + >>> + INIT_WORK(&qos->add, __wait_dma_qos_add); >>> + INIT_WORK(&qos->del, __wait_dma_qos_del); >>> + schedule_work_on(raw_smp_processor_id(), &qos->add); >> >> Do we want to use the highpri wq? But in any case we do have a worker >> latency here, which may completely defeat the 50us QoS request. :( >> >> Also, do you need to specify the CPU manually or is that in fact >> detrimental to the worker running ASAP? AFAIU this makes the worker only >> be able to start once we go to sleep, with potentially other stuff in >> there preceding our work item. > > That was the idea with pinning it to the current cpu; that the worker is > only run when we schedule ourselves. If we skipped the schedule, we > wouldn't need to adjust the pm_qos. There is still some wiggle with > preemption, but I don't see that being a huge concern, if we are > switching cpus we may as well make sure we don't enter a high C-state. I guess it is OK, if there are preceding wq items which will delay our PM QoS setup they just add latency which we would get anyway after sleeping and before can be woken up. If I understand how these things work.. But highpri wq would still be a good move I think. > >>> + >>> + return qos; >>> +} >>> + >>> +static void wait_dma_qos_del(struct wait_dma_qos *qos) >>> +{ >>> + /* Defer to worker so not incur extra latency for our woken client. */ >>> + if (qos) >>> + schedule_work(&qos->del); >>> +} >>> + >>> /** >>> * i915_request_wait - wait until execution of request has finished >>> * @rq: the request to wait upon >>> @@ -1286,6 +1331,7 @@ long i915_request_wait(struct i915_request *rq, >>> wait_queue_head_t *errq = &rq->i915->gpu_error.wait_queue; >>> DEFINE_WAIT_FUNC(reset, default_wake_function); >>> DEFINE_WAIT_FUNC(exec, default_wake_function); >>> + struct wait_dma_qos *qos = NULL; >>> struct intel_wait wait; >>> >>> might_sleep(); >>> @@ -1363,6 +1409,11 @@ long i915_request_wait(struct i915_request *rq, >>> break; >>> } >>> >>> + if (!qos && >>> + i915_seqno_passed(intel_engine_get_seqno(rq->engine), >>> + wait.seqno - 1)) >>> + qos = wait_dma_qos_add(); >>> + >>> timeout = io_schedule_timeout(timeout); >>> >>> if (intel_wait_complete(&wait) && >>> @@ -1412,6 +1463,7 @@ long i915_request_wait(struct i915_request *rq, >>> if (flags & I915_WAIT_LOCKED) >>> remove_wait_queue(errq, &reset); >>> remove_wait_queue(&rq->execute, &exec); >>> + wait_dma_qos_del(qos); >>> trace_i915_request_wait_end(rq); >>> >>> return timeout; >>> >> >> Another thing we talked about on IRC is a potential to introduce an >> explicit low-latency flag to gem_wait ioctl. That would punt the >> responsibility to userspace to know if it cares, but on the other hand >> if some benchmark benefit from implicit setting that could be tempting. >> You said media-bench likes it so I'll try it. >> >> Explicit request would also simplify the code by removing the need for >> workers. And remove the worker latency from the request which may be the >> most attractive benefit. > > No, we still need the workers as the pm_qos_update can take a while if > it decides to change C-state there and then, so definitely don't want > that in the tail, and we might as well keep the trick to only do the > pm_qos_update if we sleep. Okay true, but we would be able to set up the PM QoS earlier, before we do our own busy spin, and if on different CPU that would mean partially avoiding the worker latency. Which would be nice, since if we are giving some attempts at wake-up guarantees it would be nice to be as close as possible. In the meantime I have exercised this via media-bench and it did seems there are gains for a few workloads (and almost no reliable regressions). But when ran under tracing the gains disappeared which makes me think either the 2s workload duration I selected is too short to give reliable numbers, or tracing brings a similar benefit wo/ PM QoS patch. Only conclusion is that more thorough testing is needed. Regards, Tvrtko >> And also could apply the QoS request to before the seqno assignment. >> Currently I think there is a small window where wait can race with it, >> and fall into high-latency sleep, even if later it would chose to >> request low-latency. > > That window should be followed by an interrupt, I think, and we should > have applied the irq_seqno barrier for gen5-gen7, so the seqno should be > valid wrt to the previous interrupt. > > A hard problem to debug for sure. Maybe we should preserve the > last-interrupt timestamp in the HWSP and then we can inspect that. > Ideally though we want a journal so we can go back in case the next > interrupt arrives before we wakeup making the latency seem less. Still > some information (with detectable error) is better than none. Let's see > what that looks like. > -Chris > _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request 2018-08-03 13:00 ` Tvrtko Ursulin @ 2018-08-03 13:57 ` Chris Wilson 0 siblings, 0 replies; 15+ messages in thread From: Chris Wilson @ 2018-08-03 13:57 UTC (permalink / raw) To: Tvrtko Ursulin, intel-gfx; +Cc: Eero Tamminen Quoting Tvrtko Ursulin (2018-08-03 14:00:33) > In the meantime I have exercised this via media-bench and it did seems > there are gains for a few workloads (and almost no reliable > regressions). But when ran under tracing the gains disappeared which > makes me think either the 2s workload duration I selected is too short > to give reliable numbers, or tracing brings a similar benefit wo/ PM QoS > patch. Only conclusion is that more thorough testing is needed. Hmm, right tracing really does a number on cpufreq. Switched to + if (IS_ENABLED(CONFIG_DRM_I915_TRACE_IRQ_LATENCY)) { + struct drm_i915_private *dev_priv = rq->i915; + u32 now = I915_READ(RING_TIMESTAMP(rq->engine->mmio_base)); + + trace_i915_wait_irq_latency(rq, + now - intel_engine_get_timestamp(rq->engine), + rq->global_seqno == intel_engine_get_seqno(rq->engine)); + } + from a plain pr_err() and the latency effect just vanished. That makes it tricky. gem_sync 324 [002] 100.695373: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6882, prio=6882, delay=231 gem_sync 324 [002] 100.695536: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6883, prio=6883, delay=219 gem_sync 324 [002] 100.695697: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6884, prio=6884, delay=215 gem_sync 324 [002] 100.695860: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6885, prio=6885, delay=231 gem_sync 324 [002] 100.696040: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6886, prio=6886, delay=313 gem_sync 324 [002] 100.696203: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6887, prio=6887, delay=218 gem_sync 324 [002] 100.696366: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6888, prio=6888, delay=216 gem_sync 324 [002] 100.696527: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6889, prio=6889, delay=214 gem_sync 324 [002] 100.696689: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6890, prio=6890, delay=218 gem_sync 324 [002] 100.696851: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6891, prio=6891, delay=215 gem_sync 324 [002] 100.697013: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6892, prio=6892, delay=214 gem_sync 324 [002] 100.697175: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6893, prio=6893, delay=216 gem_sync 324 [002] 100.697336: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6894, prio=6894, delay=217 from [ 14.232750] Wake up latency: 4004 cycles [ 14.233779] Wake up latency: 3236 cycles [ 14.234831] Wake up latency: 3580 cycles [ 14.235864] Wake up latency: 3220 cycles [ 14.236865] Wake up latency: 2653 cycles [ 14.237893] Wake up latency: 3209 cycles [ 14.238941] Wake up latency: 3492 cycles [ 14.239975] Wake up latency: 3227 cycles [ 14.240943] Wake up latency: 2019 cycles [ 14.241971] Wake up latency: 3199 cycles [ 14.243019] Wake up latency: 3500 cycles [ 14.244051] Wake up latency: 3198 cycles [ 14.245073] Wake up latency: 3014 cycles [ 14.246108] Wake up latency: 3238 cycles [ 14.247158] Wake up latency: 3539 cycles [ 14.248188] Wake up latency: 3200 cycles -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/3] drm/i915: Do not use iowait while waiting for the GPU 2018-07-30 15:25 Trio of latency sensitive patches Chris Wilson 2018-07-30 15:25 ` [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request Chris Wilson @ 2018-07-30 15:25 ` Chris Wilson 2018-07-31 13:03 ` Mika Kuoppala 2018-07-30 15:25 ` [PATCH 3/3] drm/i915: Interactive RPS mode Chris Wilson ` (4 subsequent siblings) 6 siblings, 1 reply; 15+ messages in thread From: Chris Wilson @ 2018-07-30 15:25 UTC (permalink / raw) To: intel-gfx; +Cc: Eero Tamminen A recent trend for cpufreq is to boost the CPU frequencies for iowaiters, in particularly to benefit high frequency I/O. We do the same and boost the GPU clocks to try and minimise time spent waiting for the GPU. However, as the igfx and CPU share the same TDP, boosting the CPU frequency will result in the GPU being throttled and its frequency being reduced. Thus declaring iowait negatively impacts on GPU throughput. v2: Both sleeps! Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107410 References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Eero Tamminen <eero.t.tamminen@intel.com> Cc: Francisco Jerez <currojerez@riseup.net> --- drivers/gpu/drm/i915/i915_request.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index f3ff8dbe363d..3e48ea87b324 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1376,7 +1376,7 @@ long i915_request_wait(struct i915_request *rq, goto complete; } - timeout = io_schedule_timeout(timeout); + timeout = schedule_timeout(timeout); } while (1); GEM_BUG_ON(!intel_wait_has_seqno(&wait)); @@ -1414,7 +1414,7 @@ long i915_request_wait(struct i915_request *rq, wait.seqno - 1)) qos = wait_dma_qos_add(); - timeout = io_schedule_timeout(timeout); + timeout = schedule_timeout(timeout); if (intel_wait_complete(&wait) && intel_wait_check_request(&wait, rq)) -- 2.18.0 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 2/3] drm/i915: Do not use iowait while waiting for the GPU 2018-07-30 15:25 ` [PATCH 2/3] drm/i915: Do not use iowait while waiting for the GPU Chris Wilson @ 2018-07-31 13:03 ` Mika Kuoppala 2018-07-31 19:25 ` Francisco Jerez 0 siblings, 1 reply; 15+ messages in thread From: Mika Kuoppala @ 2018-07-31 13:03 UTC (permalink / raw) To: Chris Wilson, intel-gfx; +Cc: Eero Tamminen Chris Wilson <chris@chris-wilson.co.uk> writes: > A recent trend for cpufreq is to boost the CPU frequencies for > iowaiters, in particularly to benefit high frequency I/O. We do the same > and boost the GPU clocks to try and minimise time spent waiting for the > GPU. However, as the igfx and CPU share the same TDP, boosting the CPU > frequency will result in the GPU being throttled and its frequency being > reduced. Thus declaring iowait negatively impacts on GPU throughput. > > v2: Both sleeps! > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107410 > References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup") The commit above has it's own heuristics on when to actual ramp up, inspecting the interval of io waits. Regardless of that, with shared tdp, the waiter should not stand in a way. And that it fixes a regression: Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> On other way around, the atomic commit code on updating planes, could potentially benefit of changing to the io_schedule_timeout. (and/or adopting c state limits) -Mika > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > Cc: Eero Tamminen <eero.t.tamminen@intel.com> > Cc: Francisco Jerez <currojerez@riseup.net> > --- > drivers/gpu/drm/i915/i915_request.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c > index f3ff8dbe363d..3e48ea87b324 100644 > --- a/drivers/gpu/drm/i915/i915_request.c > +++ b/drivers/gpu/drm/i915/i915_request.c > @@ -1376,7 +1376,7 @@ long i915_request_wait(struct i915_request *rq, > goto complete; > } > > - timeout = io_schedule_timeout(timeout); > + timeout = schedule_timeout(timeout); > } while (1); > > GEM_BUG_ON(!intel_wait_has_seqno(&wait)); > @@ -1414,7 +1414,7 @@ long i915_request_wait(struct i915_request *rq, > wait.seqno - 1)) > qos = wait_dma_qos_add(); > > - timeout = io_schedule_timeout(timeout); > + timeout = schedule_timeout(timeout); > > if (intel_wait_complete(&wait) && > intel_wait_check_request(&wait, rq)) > -- > 2.18.0 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/3] drm/i915: Do not use iowait while waiting for the GPU 2018-07-31 13:03 ` Mika Kuoppala @ 2018-07-31 19:25 ` Francisco Jerez 0 siblings, 0 replies; 15+ messages in thread From: Francisco Jerez @ 2018-07-31 19:25 UTC (permalink / raw) To: Mika Kuoppala, Chris Wilson, intel-gfx; +Cc: Eero Tamminen [-- Attachment #1.1.1: Type: text/plain, Size: 4214 bytes --] Mika Kuoppala <mika.kuoppala@linux.intel.com> writes: > Chris Wilson <chris@chris-wilson.co.uk> writes: > >> A recent trend for cpufreq is to boost the CPU frequencies for >> iowaiters, in particularly to benefit high frequency I/O. We do the same >> and boost the GPU clocks to try and minimise time spent waiting for the >> GPU. However, as the igfx and CPU share the same TDP, boosting the CPU >> frequency will result in the GPU being throttled and its frequency being >> reduced. Thus declaring iowait negatively impacts on GPU throughput. >> >> v2: Both sleeps! >> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107410 >> References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup") > > The commit above has it's own heuristics on when to actual ramp up, > inspecting the interval of io waits. > > Regardless of that, with shared tdp, the waiter should not stand in a > way. I've been running some tests with this series (and your previous ones). I still see statistically significant regressions in latency-sensitive benchmarks with this series applied: qgears2/render-backend=XRender Extension/test-mode=Text: XXX ±0.26% x12 -> XXX ±0.36% x15 d=-0.97% ±0.32% p=0.00% lightsmark: XXX ±0.51% x22 -> XXX ±0.49% x20 d=-1.58% ±0.50% p=0.00% gputest/triangle: XXX ±0.67% x10 -> XXX ±1.76% x20 d=-1.73% ±1.47% p=0.52% synmark/OglMultithread:ĝ XXX ±0.47% x10 -> XXX ±1.06% x20 d=-3.59% ±0.88% p=0.00% Numbers above are from a partial benchmark run on BXT J3455 -- I'm still waiting to get the results of a full run though. Worse, in combination with my intel_pstate branch the effect of this patch is strictly negative. There are no improvements because the cpufreq governor is able to figure out by itself that boosting the frequency of the CPU under GPU-bound conditions cannot possibly help (The HWP boost logic could be fixed to do the same thing easily which would allow us to obtain the best of both worlds on big core). The reason for the regressions is that IOWAIT is a useful signal for the cpufreq governor to provide reduced latency in applications that are unable to parallelize enough work between CPU and the IO device -- The upstream governor is just using it rather ineffectively. > And that it fixes a regression: > This patch isn't necessary anymore to fix the regression, there is another change going in that mitigates the problem [1]. Can we please keep the IO schedule calls here? (and elsewhere in the atomic commit code) [1] https://lkml.org/lkml/2018/7/30/880 > Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> > > On other way around, the atomic commit code on updating > planes, could potentially benefit of changing to the > io_schedule_timeout. (and/or adopting c state limits) > > -Mika > >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> >> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> >> Cc: Eero Tamminen <eero.t.tamminen@intel.com> >> Cc: Francisco Jerez <currojerez@riseup.net> >> --- >> drivers/gpu/drm/i915/i915_request.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c >> index f3ff8dbe363d..3e48ea87b324 100644 >> --- a/drivers/gpu/drm/i915/i915_request.c >> +++ b/drivers/gpu/drm/i915/i915_request.c >> @@ -1376,7 +1376,7 @@ long i915_request_wait(struct i915_request *rq, >> goto complete; >> } >> >> - timeout = io_schedule_timeout(timeout); >> + timeout = schedule_timeout(timeout); >> } while (1); >> >> GEM_BUG_ON(!intel_wait_has_seqno(&wait)); >> @@ -1414,7 +1414,7 @@ long i915_request_wait(struct i915_request *rq, >> wait.seqno - 1)) >> qos = wait_dma_qos_add(); >> >> - timeout = io_schedule_timeout(timeout); >> + timeout = schedule_timeout(timeout); >> >> if (intel_wait_complete(&wait) && >> intel_wait_check_request(&wait, rq)) >> -- >> 2.18.0 [-- Attachment #1.2: signature.asc --] [-- Type: application/pgp-signature, Size: 227 bytes --] [-- Attachment #2: Type: text/plain, Size: 160 bytes --] _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 3/3] drm/i915: Interactive RPS mode 2018-07-30 15:25 Trio of latency sensitive patches Chris Wilson 2018-07-30 15:25 ` [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request Chris Wilson 2018-07-30 15:25 ` [PATCH 2/3] drm/i915: Do not use iowait while waiting for the GPU Chris Wilson @ 2018-07-30 15:25 ` Chris Wilson 2018-07-30 15:41 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/3] drm/i915: Limit C-states when waiting for the active request Patchwork ` (3 subsequent siblings) 6 siblings, 0 replies; 15+ messages in thread From: Chris Wilson @ 2018-07-30 15:25 UTC (permalink / raw) To: intel-gfx RPS provides a feedback loop where we use the load during the previous evaluation interval to decide whether to up or down clock the GPU frequency. Our responsiveness is split into 3 regimes, a high and low plateau with the intent to keep the gpu clocked high to cover occasional stalls under high load, and low despite occasional glitches under steady low load, and inbetween. However, we run into situations like kodi where we want to stay at low power (video decoding is done efficiently inside the fixed function HW and doesn't need high clocks even for high bitrate streams), but just occasionally the pipeline is more complex than a video decode and we need a smidgen of extra GPU power to present on time. In the high power regime, we sample at sub frame intervals with a bias to upclocking, and conversely at low power we sample over a few frames worth to provide what we consider to be the right levels of responsiveness respectively. At low power, we more or less expect to be kicked out to high power at the start of a busy sequence by waitboosting. Prior to commit e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active request") whenever we missed the frame or stalled, we would immediate go full throttle and upclock the GPU to max. But in commit e9af4ea2b9e7, we relaxed the waitboosting to only apply if the pipeline was deep to avoid over-committing resources for a near miss. Sadly though, a near miss is still a miss, and perceptible as jitter in the frame delivery. To try and prevent the near miss before having to resort to boosting after the fact, we use the pageflip queue as an indication that we are in an "interactive" regime and so should sample the load more frequently to provide power before the frame misses it vblank. This will make us more favorable to providing a small power increase (one or two bins) as required rather than going all the way to maximum and then having to work back down again. (We still keep the waitboosting mechanism around just in case a dramatic change in system load requires urgent uplocking, faster than we can provide in a few evaluation intervals.) v2: Reduce rps_set_interactive to a boolean parameter to avoid the confusion of what if they wanted a new power mode after pinning to a different mode (which to choose?) v3: Only reprogram RPS while the GT is awake, it will be set when we wake the GT, and while off warns about being used outside of rpm. v4: Fix deferred application of interactive mode v5: s/state/interactive/ Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107111 Fixes: e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active request") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> --- drivers/gpu/drm/i915/i915_debugfs.c | 1 + drivers/gpu/drm/i915/i915_drv.h | 4 ++ drivers/gpu/drm/i915/intel_display.c | 20 ++++++ drivers/gpu/drm/i915/intel_drv.h | 2 + drivers/gpu/drm/i915/intel_pm.c | 91 +++++++++++++++++++--------- 5 files changed, 89 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 59dc0610ea44..08d9b0748914 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -2218,6 +2218,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data) seq_printf(m, "CPU waiting? %d\n", count_irq_waiters(dev_priv)); seq_printf(m, "Boosts outstanding? %d\n", atomic_read(&rps->num_waiters)); + seq_printf(m, "Interactive? %d\n", READ_ONCE(rps->interactive)); seq_printf(m, "Frequency requested %d\n", intel_gpu_freq(dev_priv, rps->cur_freq)); seq_printf(m, " min hard:%d, soft:%d; max soft:%d, hard:%d\n", diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 0f49f9988dfa..4ee3f8870da6 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -784,6 +784,8 @@ struct intel_rps { int last_adj; enum { LOW_POWER, BETWEEN, HIGH_POWER } power; + unsigned int interactive; + struct mutex power_lock; bool enabled; atomic_t num_waiters; @@ -3422,6 +3424,8 @@ extern void i915_redisable_vga_power_on(struct drm_i915_private *dev_priv); extern bool ironlake_set_drps(struct drm_i915_private *dev_priv, u8 val); extern void intel_init_pch_refclk(struct drm_i915_private *dev_priv); extern int intel_set_rps(struct drm_i915_private *dev_priv, u8 val); +extern void intel_rps_mark_interactive(struct drm_i915_private *i915, + bool interactive); extern bool intel_set_memory_cxsr(struct drm_i915_private *dev_priv, bool enable); diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 577b30dde45b..73c6d56ba3ec 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -13104,6 +13104,19 @@ intel_prepare_plane_fb(struct drm_plane *plane, add_rps_boost_after_vblank(new_state->crtc, new_state->fence); } + /* + * We declare pageflips to be interactive and so merit a small bias + * towards upclocking to deliver the frame on time. By only changing + * the RPS thresholds to sample more regularly and aim for higher + * clocks we can hopefully deliver low power workloads (like kodi) + * that are not quite steady state without resorting to forcing + * maximum clocks following a vblank miss (see do_rps_boost()). + */ + if (!intel_state->rps_interactive) { + intel_rps_mark_interactive(dev_priv, true); + intel_state->rps_interactive = true; + } + return 0; } @@ -13120,8 +13133,15 @@ void intel_cleanup_plane_fb(struct drm_plane *plane, struct drm_plane_state *old_state) { + struct intel_atomic_state *intel_state = + to_intel_atomic_state(old_state->state); struct drm_i915_private *dev_priv = to_i915(plane->dev); + if (intel_state->rps_interactive) { + intel_rps_mark_interactive(dev_priv, false); + intel_state->rps_interactive = false; + } + /* Should only be called after a successful intel_prepare_plane_fb()! */ mutex_lock(&dev_priv->drm.struct_mutex); intel_plane_unpin_fb(to_intel_plane_state(old_state)); diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h index 99a5f5be5b82..1ad7c1124bef 100644 --- a/drivers/gpu/drm/i915/intel_drv.h +++ b/drivers/gpu/drm/i915/intel_drv.h @@ -484,6 +484,8 @@ struct intel_atomic_state { */ bool skip_intermediate_wm; + bool rps_interactive; + /* Gen9+ only */ struct skl_ddb_values wm_results; diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c index 8a4152244571..f1b45a7f6550 100644 --- a/drivers/gpu/drm/i915/intel_pm.c +++ b/drivers/gpu/drm/i915/intel_pm.c @@ -6256,41 +6256,14 @@ static u32 intel_rps_limits(struct drm_i915_private *dev_priv, u8 val) return limits; } -static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val) +static void rps_set_power(struct drm_i915_private *dev_priv, int new_power) { struct intel_rps *rps = &dev_priv->gt_pm.rps; - int new_power; u32 threshold_up = 0, threshold_down = 0; /* in % */ u32 ei_up = 0, ei_down = 0; - new_power = rps->power; - switch (rps->power) { - case LOW_POWER: - if (val > rps->efficient_freq + 1 && - val > rps->cur_freq) - new_power = BETWEEN; - break; + lockdep_assert_held(&rps->power_lock); - case BETWEEN: - if (val <= rps->efficient_freq && - val < rps->cur_freq) - new_power = LOW_POWER; - else if (val >= rps->rp0_freq && - val > rps->cur_freq) - new_power = HIGH_POWER; - break; - - case HIGH_POWER: - if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 && - val < rps->cur_freq) - new_power = BETWEEN; - break; - } - /* Max/min bins are special */ - if (val <= rps->min_freq_softlimit) - new_power = LOW_POWER; - if (val >= rps->max_freq_softlimit) - new_power = HIGH_POWER; if (new_power == rps->power) return; @@ -6357,9 +6330,68 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val) rps->power = new_power; rps->up_threshold = threshold_up; rps->down_threshold = threshold_down; +} + +static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val) +{ + struct intel_rps *rps = &dev_priv->gt_pm.rps; + int new_power; + + new_power = rps->power; + switch (rps->power) { + case LOW_POWER: + if (val > rps->efficient_freq + 1 && + val > rps->cur_freq) + new_power = BETWEEN; + break; + + case BETWEEN: + if (val <= rps->efficient_freq && + val < rps->cur_freq) + new_power = LOW_POWER; + else if (val >= rps->rp0_freq && + val > rps->cur_freq) + new_power = HIGH_POWER; + break; + + case HIGH_POWER: + if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 && + val < rps->cur_freq) + new_power = BETWEEN; + break; + } + /* Max/min bins are special */ + if (val <= rps->min_freq_softlimit) + new_power = LOW_POWER; + if (val >= rps->max_freq_softlimit) + new_power = HIGH_POWER; + + mutex_lock(&rps->power_lock); + if (rps->interactive) + new_power = HIGH_POWER; + rps_set_power(dev_priv, new_power); + mutex_unlock(&rps->power_lock); rps->last_adj = 0; } +void intel_rps_mark_interactive(struct drm_i915_private *i915, bool interactive) +{ + struct intel_rps *rps = &i915->gt_pm.rps; + + if (INTEL_GEN(i915) < 6) + return; + + mutex_lock(&rps->power_lock); + if (interactive) { + if (!rps->interactive++ && READ_ONCE(i915->gt.awake)) + rps_set_power(i915, HIGH_POWER); + } else { + GEM_BUG_ON(!rps->interactive); + rps->interactive--; + } + mutex_unlock(&rps->power_lock); +} + static u32 gen6_rps_pm_mask(struct drm_i915_private *dev_priv, u8 val) { struct intel_rps *rps = &dev_priv->gt_pm.rps; @@ -9596,6 +9628,7 @@ int intel_freq_opcode(struct drm_i915_private *dev_priv, int val) void intel_pm_setup(struct drm_i915_private *dev_priv) { mutex_init(&dev_priv->pcu_lock); + mutex_init(&dev_priv->gt_pm.rps.power_lock); atomic_set(&dev_priv->gt_pm.rps.num_waiters, 0); -- 2.18.0 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 15+ messages in thread
* ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/3] drm/i915: Limit C-states when waiting for the active request 2018-07-30 15:25 Trio of latency sensitive patches Chris Wilson ` (2 preceding siblings ...) 2018-07-30 15:25 ` [PATCH 3/3] drm/i915: Interactive RPS mode Chris Wilson @ 2018-07-30 15:41 ` Patchwork 2018-07-30 15:43 ` ✗ Fi.CI.SPARSE: " Patchwork ` (2 subsequent siblings) 6 siblings, 0 replies; 15+ messages in thread From: Patchwork @ 2018-07-30 15:41 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx == Series Details == Series: series starting with [1/3] drm/i915: Limit C-states when waiting for the active request URL : https://patchwork.freedesktop.org/series/47435/ State : warning == Summary == $ dim checkpatch origin/drm-tip 79a7b4257bd9 drm/i915: Limit C-states when waiting for the active request c9a918d455b0 drm/i915: Do not use iowait while waiting for the GPU -:16: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line) #16: References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup") -:16: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup")' #16: References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup") total: 1 errors, 1 warnings, 0 checks, 16 lines checked 8e9cd9e9f5bd drm/i915: Interactive RPS mode -:27: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active request")' #27: full throttle and upclock the GPU to max. But in commit e9af4ea2b9e7, we -:79: CHECK:UNCOMMENTED_DEFINITION: struct mutex definition without comment #79: FILE: drivers/gpu/drm/i915/i915_drv.h:788: + struct mutex power_lock; -:87: CHECK:AVOID_EXTERNS: extern prototypes should be avoided in .h files #87: FILE: drivers/gpu/drm/i915/i915_drv.h:3427: +extern void intel_rps_mark_interactive(struct drm_i915_private *i915, total: 1 errors, 0 warnings, 2 checks, 183 lines checked _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
* ✗ Fi.CI.SPARSE: warning for series starting with [1/3] drm/i915: Limit C-states when waiting for the active request 2018-07-30 15:25 Trio of latency sensitive patches Chris Wilson ` (3 preceding siblings ...) 2018-07-30 15:41 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/3] drm/i915: Limit C-states when waiting for the active request Patchwork @ 2018-07-30 15:43 ` Patchwork 2018-07-30 16:03 ` ✓ Fi.CI.BAT: success " Patchwork 2018-07-30 17:47 ` ✗ Fi.CI.IGT: failure " Patchwork 6 siblings, 0 replies; 15+ messages in thread From: Patchwork @ 2018-07-30 15:43 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx == Series Details == Series: series starting with [1/3] drm/i915: Limit C-states when waiting for the active request URL : https://patchwork.freedesktop.org/series/47435/ State : warning == Summary == $ dim sparse origin/drm-tip Commit: drm/i915: Limit C-states when waiting for the active request Okay! Commit: drm/i915: Do not use iowait while waiting for the GPU Okay! Commit: drm/i915: Interactive RPS mode -drivers/gpu/drm/i915/selftests/../i915_drv.h:3645:16: warning: expression using sizeof(void) +drivers/gpu/drm/i915/selftests/../i915_drv.h:3649:16: warning: expression using sizeof(void) _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
* ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Limit C-states when waiting for the active request 2018-07-30 15:25 Trio of latency sensitive patches Chris Wilson ` (4 preceding siblings ...) 2018-07-30 15:43 ` ✗ Fi.CI.SPARSE: " Patchwork @ 2018-07-30 16:03 ` Patchwork 2018-07-30 17:47 ` ✗ Fi.CI.IGT: failure " Patchwork 6 siblings, 0 replies; 15+ messages in thread From: Patchwork @ 2018-07-30 16:03 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx == Series Details == Series: series starting with [1/3] drm/i915: Limit C-states when waiting for the active request URL : https://patchwork.freedesktop.org/series/47435/ State : success == Summary == = CI Bug Log - changes from CI_DRM_4595 -> Patchwork_9812 = == Summary - WARNING == Minor unknown changes coming with Patchwork_9812 need to be verified manually. If you think the reported changes have nothing to do with the changes introduced in Patchwork_9812, please notify your bug team to allow them to document this new failure mode, which will reduce false positives in CI. External URL: https://patchwork.freedesktop.org/api/1.0/series/47435/revisions/1/mbox/ == Possible new issues == Here are the unknown changes that may have been introduced in Patchwork_9812: === IGT changes === ==== Warnings ==== igt@drv_selftest@live_evict: fi-cnl-psr: SKIP -> PASS +9 == Known issues == Here are the changes found in Patchwork_9812 that come from known issues: === IGT changes === ==== Issues hit ==== igt@drv_selftest@live_hangcheck: fi-skl-guc: PASS -> DMESG-FAIL (fdo#107174) fi-kbl-guc: PASS -> DMESG-FAIL (fdo#106947) igt@drv_selftest@live_objects: fi-cnl-psr: SKIP -> DMESG-FAIL (fdo#107398) igt@drv_selftest@live_workarounds: fi-kbl-7560u: PASS -> DMESG-FAIL (fdo#107292) igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c: fi-kbl-7567u: PASS -> FAIL (fdo#103191) {igt@kms_psr@primary_mmap_gtt}: fi-cnl-psr: PASS -> DMESG-WARN (fdo#107372) ==== Possible fixes ==== igt@debugfs_test@read_all_entries: fi-snb-2520m: INCOMPLETE (fdo#103713) -> PASS igt@drv_selftest@live_coherency: {fi-icl-u}: DMESG-FAIL -> PASS igt@drv_selftest@live_requests: {fi-bsw-kefka}: INCOMPLETE (fdo#105876) -> PASS igt@drv_selftest@live_workarounds: {fi-cfl-8109u}: DMESG-FAIL (fdo#107292) -> PASS {fi-bsw-kefka}: DMESG-FAIL (fdo#107292) -> PASS fi-cnl-psr: DMESG-FAIL (fdo#107292) -> PASS igt@kms_chamelium@dp-edid-read: fi-kbl-7500u: FAIL (fdo#103841) -> PASS ==== Warnings ==== {igt@kms_psr@primary_page_flip}: fi-cnl-psr: DMESG-FAIL (fdo#107372) -> DMESG-WARN (fdo#107372) {name}: This element is suppressed. This means it is ignored when computing the status of the difference (SUCCESS, WARNING, or FAILURE). fdo#103191 https://bugs.freedesktop.org/show_bug.cgi?id=103191 fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713 fdo#103841 https://bugs.freedesktop.org/show_bug.cgi?id=103841 fdo#105876 https://bugs.freedesktop.org/show_bug.cgi?id=105876 fdo#106947 https://bugs.freedesktop.org/show_bug.cgi?id=106947 fdo#107174 https://bugs.freedesktop.org/show_bug.cgi?id=107174 fdo#107292 https://bugs.freedesktop.org/show_bug.cgi?id=107292 fdo#107372 https://bugs.freedesktop.org/show_bug.cgi?id=107372 fdo#107398 https://bugs.freedesktop.org/show_bug.cgi?id=107398 == Participating hosts (52 -> 47) == Additional (1): fi-byt-j1900 Missing (6): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-byt-clapper == Build changes == * Linux: CI_DRM_4595 -> Patchwork_9812 CI_DRM_4595: f133adaa57cf7118381b5ffc081bba3bbb1dbc83 @ git://anongit.freedesktop.org/gfx-ci/linux IGT_4581: f1c868dae24056ebc27e4f3c197724ce9b956a8a @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools Patchwork_9812: 8e9cd9e9f5bd27007a72b01037a18ee944a8e14c @ git://anongit.freedesktop.org/gfx-ci/linux == Linux commits == 8e9cd9e9f5bd drm/i915: Interactive RPS mode c9a918d455b0 drm/i915: Do not use iowait while waiting for the GPU 79a7b4257bd9 drm/i915: Limit C-states when waiting for the active request == Logs == For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9812/issues.html _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
* ✗ Fi.CI.IGT: failure for series starting with [1/3] drm/i915: Limit C-states when waiting for the active request 2018-07-30 15:25 Trio of latency sensitive patches Chris Wilson ` (5 preceding siblings ...) 2018-07-30 16:03 ` ✓ Fi.CI.BAT: success " Patchwork @ 2018-07-30 17:47 ` Patchwork 6 siblings, 0 replies; 15+ messages in thread From: Patchwork @ 2018-07-30 17:47 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx == Series Details == Series: series starting with [1/3] drm/i915: Limit C-states when waiting for the active request URL : https://patchwork.freedesktop.org/series/47435/ State : failure == Summary == = CI Bug Log - changes from CI_DRM_4595_full -> Patchwork_9812_full = == Summary - FAILURE == Serious unknown changes coming with Patchwork_9812_full absolutely need to be verified manually. If you think the reported changes have nothing to do with the changes introduced in Patchwork_9812_full, please notify your bug team to allow them to document this new failure mode, which will reduce false positives in CI. == Possible new issues == Here are the unknown changes that may have been introduced in Patchwork_9812_full: === IGT changes === ==== Possible regressions ==== igt@pm_rps@min-max-config-loaded: shard-glk: PASS -> FAIL == Known issues == Here are the changes found in Patchwork_9812_full that come from known issues: === IGT changes === ==== Issues hit ==== igt@kms_flip@plain-flip-fb-recreate-interruptible: shard-glk: PASS -> FAIL (fdo#100368) +1 igt@kms_rotation_crc@sprite-rotation-180: shard-hsw: PASS -> FAIL (fdo#103925) igt@pm_rps@min-max-config-loaded: shard-apl: PASS -> FAIL (fdo#102250) ==== Possible fixes ==== igt@gem_ctx_isolation@vcs0-s3: shard-kbl: INCOMPLETE (fdo#103665) -> PASS igt@kms_flip@2x-flip-vs-expired-vblank: shard-hsw: FAIL (fdo#102887) -> PASS igt@kms_flip@flip-vs-expired-vblank-interruptible: shard-kbl: FAIL (fdo#102887, fdo#105363) -> PASS igt@kms_flip@plain-flip-ts-check: shard-glk: FAIL (fdo#100368) -> PASS igt@kms_setmode@basic: shard-apl: FAIL (fdo#99912) -> PASS fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368 fdo#102250 https://bugs.freedesktop.org/show_bug.cgi?id=102250 fdo#102887 https://bugs.freedesktop.org/show_bug.cgi?id=102887 fdo#103665 https://bugs.freedesktop.org/show_bug.cgi?id=103665 fdo#103925 https://bugs.freedesktop.org/show_bug.cgi?id=103925 fdo#105363 https://bugs.freedesktop.org/show_bug.cgi?id=105363 fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912 == Participating hosts (5 -> 5) == No changes in participating hosts == Build changes == * Linux: CI_DRM_4595 -> Patchwork_9812 CI_DRM_4595: f133adaa57cf7118381b5ffc081bba3bbb1dbc83 @ git://anongit.freedesktop.org/gfx-ci/linux IGT_4581: f1c868dae24056ebc27e4f3c197724ce9b956a8a @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools Patchwork_9812: 8e9cd9e9f5bd27007a72b01037a18ee944a8e14c @ git://anongit.freedesktop.org/gfx-ci/linux piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit == Logs == For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9812/shards.html _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2018-08-03 13:58 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-07-30 15:25 Trio of latency sensitive patches Chris Wilson 2018-07-30 15:25 ` [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request Chris Wilson 2018-08-01 9:56 ` Chris Wilson 2018-08-03 10:48 ` Tvrtko Ursulin 2018-08-03 11:07 ` Chris Wilson 2018-08-03 13:00 ` Tvrtko Ursulin 2018-08-03 13:57 ` Chris Wilson 2018-07-30 15:25 ` [PATCH 2/3] drm/i915: Do not use iowait while waiting for the GPU Chris Wilson 2018-07-31 13:03 ` Mika Kuoppala 2018-07-31 19:25 ` Francisco Jerez 2018-07-30 15:25 ` [PATCH 3/3] drm/i915: Interactive RPS mode Chris Wilson 2018-07-30 15:41 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/3] drm/i915: Limit C-states when waiting for the active request Patchwork 2018-07-30 15:43 ` ✗ Fi.CI.SPARSE: " Patchwork 2018-07-30 16:03 ` ✓ Fi.CI.BAT: success " Patchwork 2018-07-30 17:47 ` ✗ Fi.CI.IGT: failure " Patchwork
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.