public inbox for intel-xe@lists.freedesktop.org
From: Jani Nikula <jani.nikula@linux.intel.com>
To: Ville Syrjala <ville.syrjala@linux.intel.com>,
	intel-gfx@lists.freedesktop.org
Cc: intel-xe@lists.freedesktop.org
Subject: Re: [PATCH 3/3] drm/i915/de: Implement register polling in the display code
Date: Fri, 13 Mar 2026 17:03:16 +0200	[thread overview]
Message-ID: <c642aab2724d1db478cdcd3af623117e6f9dad5c@intel.com> (raw)
In-Reply-To: <20260313111028.25159-4-ville.syrjala@linux.intel.com>

On Fri, 13 Mar 2026, Ville Syrjala <ville.syrjala@linux.intel.com> wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>
> The plan is to move all the mmio stuff into the display code itself.
> As a first step implement the register polling in intel_de.c.
>
> Currently i915 and xe implement this stuff in slightly different
> ways, so there are some functional changes here. Try to go for a
> reasonable middle ground between the i915 and xe implementations:
> - the exponential backoff limit is the simpler approach taken
>   by i915 (== just clamp the max sleep duration to 1 ms)

The fact that xe has no upper limit is just bonkers, and with an unfortunate
sleep value the exponential backoff can get really bad, the timeout
happening almost 2x later than it should. Also, there's a bunch of quick
hammering reads with small sleeps at first, which aren't at all
necessary with longer timeouts. The exponential backoff has downsides
at both ends because of its behaviour.

i915 also doesn't actually clamp the max: the wait only doubles while
it's under the "max", but that doubling is allowed to go over it. Ditto
in this patch.

> - the fast vs. slow timeout handling is similar to i915 where
>   we first try the fast timeout and then again the slow timeout
>   if the condition still isn't satisfied. xe just adds up the
>   timeouts together, which is a bit weird.

Side note, IMO the fast vs. slow split must remain an implementation
detail and not leak outside of intel_de.c. If it really matters, I think
it's more obvious to do it the way gmbus_wait() currently does it, i.e.
first atomic then regular in the call site itself. If the wait functions
get too complex, we either end up with too many functions or functions
with too many parameters, and the call sites end up having to use
poll_timeout_us() directly anyway.

> - the atomic wait variant uses udelay() like xe, whereas i915
>   has no udelay()s in its atomic loop. As a compromise go for a
>   fixed 1 usec delay  for short waits, instead of the somewhat
>   peculiar xe behaviour where it effectively just does one
>   iteration of the loop.

Overall I really prefer having separate functions for atomic and
non-atomic, similar to poll_timeout_us() and poll_timeout_us_atomic(). I
think it's much easier to reason about the code that way than with
atomic as a parameter.

> - keep the "use udelay() for < 10 usec waits" logic (which
>   more or less mirrors fsleep()), but include an explicit
>   might_sleep() even for these short waits when called from
>   a non-atomic intel_de_wait*() function. This should prevent
>   people from calling the non-atomic functions from the wrong
>   place.

Another difference between i915/xe and poll_timeout_us() is the range
for usleep_range(), which is *also* different from fsleep().

i915/xe have:

	usleep_range(wait__, wait__ * 2);

iopoll has:

	usleep_range((__sleep_us >> 2) + 1, __sleep_us);

fsleep() has:

	usleep_range(usecs, usecs + (usecs >> max_slack_shift));

I'm inclined to think all of the non-atomic variants should just use
fsleep(), but especially changing iopoll might be risky.

> Eventually we may want to switch over to poll_timeout*(),
> but that lacks the exponential backoff, so a bit too
> radical to change in one go.

Yeah, *sigh*. I fear it'll be too radical to add exponential backoff in
iopoll too. But I'm not entirely sure doing 10, 20, 40, 80, 160, etc. us
waits first when the timeout is like 1000000 us makes any sense
either. It's just wasteful hammering. Maybe the initial wait should be
relative to the timeout/sleep instead of fixed.

I don't know, lots of talk here. And lots of stuff I don't actually like
about this *or* the existing implementations all that much. But since
this largely remains an implementation detail that can be changed, I
guess I'm fine.

In the long run I kind of do expect the atomic and non-atomic paths to
be split. Having them combined is what I dislike most. Plus they'd have
to be split for migrating to poll_timeout_us() and
poll_timeout_us_atomic() anyway.

As far as the change is concerned, I think it does what it says on the
box. Please take the comments into consideration, especially regarding
future changes, but I'm not insisting on any changes now.

Reviewed-by: Jani Nikula <jani.nikula@intel.com>


>
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/display/intel_de.c       | 99 +++++++++++++++++--
>  .../drm/xe/compat-i915-headers/intel_uncore.h | 31 ------
>  2 files changed, 91 insertions(+), 39 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/display/intel_de.c b/drivers/gpu/drm/i915/display/intel_de.c
> index fce92535bd6a..6cbe50f3e2b4 100644
> --- a/drivers/gpu/drm/i915/display/intel_de.c
> +++ b/drivers/gpu/drm/i915/display/intel_de.c
> @@ -3,10 +3,85 @@
>   * Copyright © 2026 Intel Corporation
>   */
>  
> +#include <linux/delay.h>
> +
>  #include <drm/drm_print.h>
>  
>  #include "intel_de.h"
>  
> +static int __intel_de_wait_for_register(struct intel_display *display,
> +					i915_reg_t reg, u32 mask, u32 value,
> +					unsigned int timeout_us,
> +					u32 (*read)(struct intel_display *display, i915_reg_t reg),
> +					u32 *out_val, bool is_atomic)
> +{
> +	const ktime_t end = ktime_add_us(ktime_get_raw(), timeout_us);
> +	int wait_max = 1000;
> +	int wait = 10;
> +	u32 reg_value;
> +	int ret;
> +
> +	might_sleep_if(!is_atomic);
> +
> +	if (timeout_us <= 10) {
> +		is_atomic = true;
> +		wait = 1;
> +	}
> +
> +	for (;;) {
> +		bool expired = ktime_after(ktime_get_raw(), end);
> +
> +		/* guarantee the condition is evaluated after timeout expired */
> +		barrier();
> +
> +		reg_value = read(display, reg);
> +		if ((reg_value & mask) == value) {
> +			ret = 0;
> +			break;
> +		}
> +
> +		if (expired) {
> +			ret = -ETIMEDOUT;
> +			break;
> +		}
> +
> +		if (is_atomic)
> +			udelay(wait);
> +		else
> +			usleep_range(wait, wait << 1);
> +
> +		if (wait < wait_max)
> +			wait <<= 1;
> +	}
> +
> +	if (out_val)
> +		*out_val = reg_value;
> +
> +	return ret;
> +}
> +
> +static int intel_de_wait_for_register(struct intel_display *display,
> +				      i915_reg_t reg, u32 mask, u32 value,
> +				      unsigned int fast_timeout_us,
> +				      unsigned int slow_timeout_us,
> +				      u32 (*read)(struct intel_display *display, i915_reg_t reg),
> +				      u32 *out_value, bool is_atomic)
> +{
> +	int ret;
> +
> +	if (fast_timeout_us)
> +		ret = __intel_de_wait_for_register(display, reg, mask, value,
> +						   fast_timeout_us, read,
> +						   out_value, is_atomic);
> +
> +	if (ret && slow_timeout_us)
> +		ret = __intel_de_wait_for_register(display, reg, mask, value,
> +						   slow_timeout_us, read,
> +						   out_value, is_atomic);
> +
> +	return ret;
> +}
> +
>  int intel_de_wait_us(struct intel_display *display, i915_reg_t reg,
>  		     u32 mask, u32 value, unsigned int timeout_us,
>  		     u32 *out_value)
> @@ -15,8 +90,10 @@ int intel_de_wait_us(struct intel_display *display, i915_reg_t reg,
>  
>  	intel_dmc_wl_get(display, reg);
>  
> -	ret = __intel_wait_for_register(__to_uncore(display), reg, mask,
> -					value, timeout_us, 0, out_value);
> +	ret = intel_de_wait_for_register(display, reg, mask, value,
> +					 timeout_us, 0,
> +					 intel_de_read,
> +					 out_value, false);
>  
>  	intel_dmc_wl_put(display, reg);
>  
> @@ -31,8 +108,10 @@ int intel_de_wait_ms(struct intel_display *display, i915_reg_t reg,
>  
>  	intel_dmc_wl_get(display, reg);
>  
> -	ret = __intel_wait_for_register(__to_uncore(display), reg, mask,
> -					value, 2, timeout_ms, out_value);
> +	ret = intel_de_wait_for_register(display, reg, mask, value,
> +					 2, timeout_ms * 1000,
> +					 intel_de_read,
> +					 out_value, false);
>  
>  	intel_dmc_wl_put(display, reg);
>  
> @@ -43,16 +122,20 @@ int intel_de_wait_fw_ms(struct intel_display *display, i915_reg_t reg,
>  			u32 mask, u32 value, unsigned int timeout_ms,
>  			u32 *out_value)
>  {
> -	return __intel_wait_for_register_fw(__to_uncore(display), reg, mask,
> -					    value, 2, timeout_ms, out_value);
> +	return intel_de_wait_for_register(display, reg, mask, value,
> +					  2, timeout_ms * 1000,
> +					  intel_de_read_fw,
> +					  out_value, false);
>  }
>  
>  int intel_de_wait_fw_us_atomic(struct intel_display *display, i915_reg_t reg,
>  			       u32 mask, u32 value, unsigned int timeout_us,
>  			       u32 *out_value)
>  {
> -	return __intel_wait_for_register_fw(__to_uncore(display), reg, mask,
> -					    value, timeout_us, 0, out_value);
> +	return intel_de_wait_for_register(display, reg, mask, value,
> +					  timeout_us, 0,
> +					  intel_de_read_fw,
> +					  out_value, true);
>  }
>  
>  int intel_de_wait_for_set_us(struct intel_display *display, i915_reg_t reg,
> diff --git a/drivers/gpu/drm/xe/compat-i915-headers/intel_uncore.h b/drivers/gpu/drm/xe/compat-i915-headers/intel_uncore.h
> index a8cfd65119e0..08d7ab933672 100644
> --- a/drivers/gpu/drm/xe/compat-i915-headers/intel_uncore.h
> +++ b/drivers/gpu/drm/xe/compat-i915-headers/intel_uncore.h
> @@ -98,37 +98,6 @@ static inline u32 intel_uncore_rmw(struct intel_uncore *uncore,
>  	return xe_mmio_rmw32(__compat_uncore_to_mmio(uncore), reg, clear, set);
>  }
>  
> -static inline int
> -__intel_wait_for_register(struct intel_uncore *uncore, i915_reg_t i915_reg,
> -			  u32 mask, u32 value, unsigned int fast_timeout_us,
> -			  unsigned int slow_timeout_ms, u32 *out_value)
> -{
> -	struct xe_reg reg = XE_REG(i915_mmio_reg_offset(i915_reg));
> -	bool atomic;
> -
> -	/*
> -	 * Replicate the behavior from i915 here, in which sleep is not
> -	 * performed if slow_timeout_ms == 0. This is necessary because
> -	 * of some paths in display code where waits are done in atomic
> -	 * context.
> -	 */
> -	atomic = !slow_timeout_ms && fast_timeout_us > 0;
> -
> -	return xe_mmio_wait32(__compat_uncore_to_mmio(uncore), reg, mask, value,
> -			      fast_timeout_us + 1000 * slow_timeout_ms,
> -			      out_value, atomic);
> -}
> -
> -static inline int
> -__intel_wait_for_register_fw(struct intel_uncore *uncore, i915_reg_t i915_reg,
> -			     u32 mask, u32 value, unsigned int fast_timeout_us,
> -			     unsigned int slow_timeout_ms, u32 *out_value)
> -{
> -	return __intel_wait_for_register(uncore, i915_reg, mask, value,
> -					 fast_timeout_us, slow_timeout_ms,
> -					 out_value);
> -}
> -
>  static inline u32 intel_uncore_read_fw(struct intel_uncore *uncore,
>  				       i915_reg_t i915_reg)
>  {

-- 
Jani Nikula, Intel


Thread overview: 19+ messages
2026-03-13 11:10 [PATCH 0/3] drm/i915/de: Move register polling into display code Ville Syrjala
2026-03-13 11:10 ` [PATCH 1/3] drm/i915/de: Introduce intel_de.c and move intel_de_{read, write}8() there Ville Syrjala
2026-03-13 14:21   ` Jani Nikula
2026-03-13 11:10 ` [PATCH 2/3] drm/i915/de: Move intel_de_wait*() into intel_de.c Ville Syrjala
2026-03-13 14:21   ` Jani Nikula
2026-03-13 11:10 ` [PATCH 3/3] drm/i915/de: Implement register polling in the display code Ville Syrjala
2026-03-13 15:03   ` Jani Nikula [this message]
2026-03-17  7:52     ` Ville Syrjälä
2026-03-23  9:50     ` Ville Syrjälä
2026-03-14  8:09   ` kernel test robot
2026-03-23  9:43   ` [PATCH v2 " Ville Syrjala
2026-03-13 11:15 ` ✗ CI.checkpatch: warning for drm/i915/de: Move register polling into " Patchwork
2026-03-13 11:17 ` ✓ CI.KUnit: success " Patchwork
2026-03-13 11:51 ` ✓ Xe.CI.BAT: " Patchwork
2026-03-14 14:29 ` ✗ Xe.CI.FULL: failure " Patchwork
2026-03-23 10:43 ` ✗ CI.checkpatch: warning for drm/i915/de: Move register polling into display code (rev2) Patchwork
2026-03-23 10:45 ` ✓ CI.KUnit: success " Patchwork
2026-03-23 11:26 ` ✓ Xe.CI.BAT: " Patchwork
2026-03-23 13:56 ` ✓ Xe.CI.FULL: " Patchwork
