intel-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] drm/i915/dp: Give up link training clock recovery after 10 failed attempts
@ 2018-07-16 18:14 Nathan Ciobanu
  2018-07-16 19:22 ` Marc Herbert
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Nathan Ciobanu @ 2018-07-16 18:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: marc.herbert, dhinakaran.pandiyan, rodrigo.vivi

Limit the link training clock recovery loop to 10 failed attempts at
LANEx_CR_DONE per DP 1.4 spec section 3.5.1.2.2. Some USB-C MST hubs
cause us to get stuck in this loop indefinitely requesting

    voltage swing: 0, pre-emphasis level: 2
    voltage swing: 1, pre-emphasis level: 2
    voltage swing: 0, pre-emphasis level: 3

over and over: max_vswing is never reached, drm_dp_clock_recovery_ok()
never returns true and voltage_tries always gets reset to 1. The driver
sends those values to the hub but the hub keeps requesting new values
every time in the pattern shown above.

Changes in v2:
    - updated commit message (DK, Manasi)
    - defined DP_DP14_MAX_CR_TRIES (Marc)
    - made the loop iterate for max 10 times (Rodrigo, Marc)

Changes in v3:
    - changed error message to use DP_DP14_MAX_CR_TRIES

Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++--
 include/drm/drm_dp_helper.h                   | 1 +
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c
index 4da6e33c7fa1..4bfba65c431c 100644
--- a/drivers/gpu/drm/i915/intel_dp_link_training.c
+++ b/drivers/gpu/drm/i915/intel_dp_link_training.c
@@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
 intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp)
 {
 	uint8_t voltage;
-	int voltage_tries, max_vswing_tries;
+	int voltage_tries, max_vswing_tries, cr_tries;
 	uint8_t link_config[2];
 	uint8_t link_bw, rate_select;
 
@@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
 
 	voltage_tries = 1;
 	max_vswing_tries = 0;
-	for (;;) {
+	for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) {
 		uint8_t link_status[DP_LINK_STATUS_SIZE];
 
 		drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd);
@@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
 			++max_vswing_tries;
 
 	}
+	DRM_ERROR("Failed clock recovery %d times, giving up!\n",
+		  DP_DP14_MAX_CR_TRIES);
+	return false;
 }
 
 /*
diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h
index c01564991a9f..2303ad8ed24e 100644
--- a/include/drm/drm_dp_helper.h
+++ b/include/drm/drm_dp_helper.h
@@ -820,6 +820,7 @@
 
 #define DP_DP13_DPCD_REV                    0x2200
 #define DP_DP13_MAX_LINK_RATE               0x2201
+#define DP_DP14_MAX_CR_TRIES                10
 
 #define DP_DPRX_FEATURE_ENUMERATION_LIST    0x2210  /* DP 1.3 */
 # define DP_GTC_CAP					(1 << 0)  /* DP 1.3 */
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] drm/i915/dp: Give up link training clock recovery after 10 failed attempts
  2018-07-16 18:14 [PATCH v3] drm/i915/dp: Give up link training clock recovery after 10 failed attempts Nathan Ciobanu
@ 2018-07-16 19:22 ` Marc Herbert
  2018-07-16 22:30   ` Rodrigo Vivi
  2018-07-16 22:34 ` Rodrigo Vivi
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Marc Herbert @ 2018-07-16 19:22 UTC (permalink / raw)
  To: Nathan Ciobanu, intel-gfx; +Cc: dhinakaran.pandiyan, rodrigo.vivi


While the shortness of v3 is really nice, I think it's rather a good
opportunity to make much clearer (as a separate, no functional change
patch?)  its existing terminating condition(s) and what the existing loop
iterates on. I mean it's painful and risky enough to _combine multiple
counters in the same loop_, so at least these different counters should be
_invidually_ crystal-clear with enough comments. For instance let's pretend
there are 4 possible voltage values and that each value is tried 4 times -
then the last value will never be tried! Can this ever happen? If not then
why not? Not even with a buggy sink?  If it can happen then is it fine to
give up before trying some values? Is it compliant with the spec? Etc.

This should incidentally help clarify why and how the current logic allows
infinite loops.

BTW "max_vswing_tries" looks like a boolean, correct? If correct then please
remove this confusing "increment from false to true":

- if (max_vswing_tries == 1)
+ if (max_vswing_tries)

- max_vswing_tries++;
+ max_vwing_tries = true;

Marc


On 16/07/2018 11:14, Nathan Ciobanu wrote:
> Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++--
>  include/drm/drm_dp_helper.h                   | 1 +
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c
> index 4da6e33c7fa1..4bfba65c431c 100644
> --- a/drivers/gpu/drm/i915/intel_dp_link_training.c
> +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c
> @@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
>  intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp)
>  {
>  	uint8_t voltage;
> -	int voltage_tries, max_vswing_tries;
> +	int voltage_tries, max_vswing_tries, cr_tries;
>  	uint8_t link_config[2];
>  	uint8_t link_bw, rate_select;
>  
> @@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
>  
>  	voltage_tries = 1;
>  	max_vswing_tries = 0;
> -	for (;;) {
> +	for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) {
>  		uint8_t link_status[DP_LINK_STATUS_SIZE];
>  
>  		drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd);
> @@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
>  			++max_vswing_tries;
>  
>  	}
> +	DRM_ERROR("Failed clock recovery %d times, giving up!\n",
> +		  DP_DP14_MAX_CR_TRIES);
> +	return false;
>  }
>  
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] drm/i915/dp: Give up link training clock recovery after 10 failed attempts
  2018-07-16 19:22 ` Marc Herbert
@ 2018-07-16 22:30   ` Rodrigo Vivi
  2018-07-16 23:27     ` Nathan Ciobanu
  0 siblings, 1 reply; 16+ messages in thread
From: Rodrigo Vivi @ 2018-07-16 22:30 UTC (permalink / raw)
  To: Marc Herbert; +Cc: intel-gfx, dhinakaran.pandiyan

On Mon, Jul 16, 2018 at 12:22:13PM -0700, Marc Herbert wrote:
> 
> While the shortness of v3 is really nice, I think it's rather a good
> opportunity to make much clearer (as a separate, no functional change
> patch?)  its existing terminating condition(s) and what the existing loop
> iterates on. I mean it's painful and risky enough to _combine multiple
> counters in the same loop_, so at least these different counters should be
> _invidually_ crystal-clear with enough comments. For instance let's pretend
> there are 4 possible voltage values and that each value is tried 4 times -
> then the last value will never be tried! Can this ever happen? If not then
> why not? Not even with a buggy sink?  If it can happen then is it fine to
> give up before trying some values? Is it compliant with the spec? Etc.

hm.. it seems that that code has room for improvements to make it more
clear and easier to map with the spec...
but yeap, separated patches.

> 
> This should incidentally help clarify why and how the current logic allows
> infinite loops.
> 
> BTW "max_vswing_tries" looks like a boolean, correct? If correct then please
> remove this confusing "increment from false to true":
> 
> - if (max_vswing_tries == 1)
> + if (max_vswing_tries)
> 
> - max_vswing_tries++;
> + max_vwing_tries = true;

if we change to boolean it is better to really change the type
and remove "_tries" from the name....

> 
> Marc
> 
> 
> On 16/07/2018 11:14, Nathan Ciobanu wrote:
> > Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++--
> >  include/drm/drm_dp_helper.h                   | 1 +
> >  2 files changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c
> > index 4da6e33c7fa1..4bfba65c431c 100644
> > --- a/drivers/gpu/drm/i915/intel_dp_link_training.c
> > +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c
> > @@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> >  intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp)
> >  {
> >  	uint8_t voltage;
> > -	int voltage_tries, max_vswing_tries;
> > +	int voltage_tries, max_vswing_tries, cr_tries;
> >  	uint8_t link_config[2];
> >  	uint8_t link_bw, rate_select;
> >  
> > @@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> >  
> >  	voltage_tries = 1;
> >  	max_vswing_tries = 0;
> > -	for (;;) {
> > +	for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) {
> >  		uint8_t link_status[DP_LINK_STATUS_SIZE];
> >  
> >  		drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd);
> > @@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> >  			++max_vswing_tries;
> >  
> >  	}
> > +	DRM_ERROR("Failed clock recovery %d times, giving up!\n",
> > +		  DP_DP14_MAX_CR_TRIES);
> > +	return false;
> >  }
> >  
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] drm/i915/dp: Give up link training clock recovery after 10 failed attempts
  2018-07-16 18:14 [PATCH v3] drm/i915/dp: Give up link training clock recovery after 10 failed attempts Nathan Ciobanu
  2018-07-16 19:22 ` Marc Herbert
@ 2018-07-16 22:34 ` Rodrigo Vivi
  2018-07-16 23:23   ` Nathan Ciobanu
  2018-07-17  8:01 ` ✗ Fi.CI.BAT: failure for drm/i915/dp: Give up link training clock recovery after 10 failed attempts (rev3) Patchwork
  2018-07-17  9:24 ` ✓ Fi.CI.IGT: success " Patchwork
  3 siblings, 1 reply; 16+ messages in thread
From: Rodrigo Vivi @ 2018-07-16 22:34 UTC (permalink / raw)
  To: Nathan Ciobanu; +Cc: intel-gfx, dhinakaran.pandiyan, marc.herbert

On Mon, Jul 16, 2018 at 11:14:45AM -0700, Nathan Ciobanu wrote:
> Limit the link training clock recovery loop to 10 failed attempts at
> LANEx_CR_DONE per DP 1.4 spec section 3.5.1.2.2.

Thanks... so this made me look to DP 1.3 and DP 1.4 differences here.
Are we already addressing all new conditions?

I wonder if that should be at drm level.

I noticed that one difference, 100us wait difference is already
addressed there at drm level...

> Some USB-C MST hubs
> cause us to get stuck in this loop indefinitely requesting
> 
>     voltage swing: 0, pre-emphasis level: 2
>     voltage swing: 1, pre-emphasis level: 2
>     voltage swing: 0, pre-emphasis level: 3
> 
> over and over: max_vswing is never reached, drm_dp_clock_recovery_ok()
> never returns true and voltage_tries always gets reset to 1. The driver
> sends those values to the hub but the hub keeps requesting new values
> every time in the pattern shown above.
> 
> Changes in v2:
>     - updated commit message (DK, Manasi)
>     - defined DP_DP14_MAX_CR_TRIES (Marc)
>     - made the loop iterate for max 10 times (Rodrigo, Marc)
> 
> Changes in v3:
>     - changed error message to use DP_DP14_MAX_CR_TRIES
> 
> Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++--
>  include/drm/drm_dp_helper.h                   | 1 +
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c
> index 4da6e33c7fa1..4bfba65c431c 100644
> --- a/drivers/gpu/drm/i915/intel_dp_link_training.c
> +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c
> @@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
>  intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp)
>  {
>  	uint8_t voltage;
> -	int voltage_tries, max_vswing_tries;
> +	int voltage_tries, max_vswing_tries, cr_tries;
>  	uint8_t link_config[2];
>  	uint8_t link_bw, rate_select;
>  
> @@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
>  
>  	voltage_tries = 1;
>  	max_vswing_tries = 0;
> -	for (;;) {
> +	for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) {

I know that I was the on that asked you to make this global here,
but just now with better spec pointers and definition is that I noticed
that this is only for DP 1.4 spec while this code here runs for all
DP...  So we need to change in a way that we check the spec version somehow...

and possibly at drm level?!

>  		uint8_t link_status[DP_LINK_STATUS_SIZE];
>  
>  		drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd);
> @@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
>  			++max_vswing_tries;
>  
>  	}
> +	DRM_ERROR("Failed clock recovery %d times, giving up!\n",
> +		  DP_DP14_MAX_CR_TRIES);
> +	return false;
>  }
>  
>  /*
> diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h
> index c01564991a9f..2303ad8ed24e 100644
> --- a/include/drm/drm_dp_helper.h
> +++ b/include/drm/drm_dp_helper.h
> @@ -820,6 +820,7 @@
>  
>  #define DP_DP13_DPCD_REV                    0x2200
>  #define DP_DP13_MAX_LINK_RATE               0x2201
> +#define DP_DP14_MAX_CR_TRIES                10
>  
>  #define DP_DPRX_FEATURE_ENUMERATION_LIST    0x2210  /* DP 1.3 */
>  # define DP_GTC_CAP					(1 << 0)  /* DP 1.3 */
> -- 
> 1.9.1
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] drm/i915/dp: Give up link training clock recovery after 10 failed attempts
  2018-07-16 22:34 ` Rodrigo Vivi
@ 2018-07-16 23:23   ` Nathan Ciobanu
  2018-07-16 23:40     ` Rodrigo Vivi
  0 siblings, 1 reply; 16+ messages in thread
From: Nathan Ciobanu @ 2018-07-16 23:23 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-gfx, dhinakaran.pandiyan, marc.herbert

On Mon, Jul 16, 2018 at 03:34:58PM -0700, Rodrigo Vivi wrote:
> On Mon, Jul 16, 2018 at 11:14:45AM -0700, Nathan Ciobanu wrote:
> > Limit the link training clock recovery loop to 10 failed attempts at
> > LANEx_CR_DONE per DP 1.4 spec section 3.5.1.2.2.
> 
> Thanks... so this made me look to DP 1.3 and DP 1.4 differences here.
> Are we already addressing all new conditions?
> 
> I wonder if that should be at drm level.
> 
> I noticed that one difference, 100us wait difference is already
> addressed there at drm level...
> 
> > Some USB-C MST hubs
> > cause us to get stuck in this loop indefinitely requesting
> > 
> >     voltage swing: 0, pre-emphasis level: 2
> >     voltage swing: 1, pre-emphasis level: 2
> >     voltage swing: 0, pre-emphasis level: 3
> > 
> > over and over: max_vswing is never reached, drm_dp_clock_recovery_ok()
> > never returns true and voltage_tries always gets reset to 1. The driver
> > sends those values to the hub but the hub keeps requesting new values
> > every time in the pattern shown above.
> > 
> > Changes in v2:
> >     - updated commit message (DK, Manasi)
> >     - defined DP_DP14_MAX_CR_TRIES (Marc)
> >     - made the loop iterate for max 10 times (Rodrigo, Marc)
> > 
> > Changes in v3:
> >     - changed error message to use DP_DP14_MAX_CR_TRIES
> > 
> > Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++--
> >  include/drm/drm_dp_helper.h                   | 1 +
> >  2 files changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c
> > index 4da6e33c7fa1..4bfba65c431c 100644
> > --- a/drivers/gpu/drm/i915/intel_dp_link_training.c
> > +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c
> > @@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> >  intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp)
> >  {
> >  	uint8_t voltage;
> > -	int voltage_tries, max_vswing_tries;
> > +	int voltage_tries, max_vswing_tries, cr_tries;
> >  	uint8_t link_config[2];
> >  	uint8_t link_bw, rate_select;
> >  
> > @@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> >  
> >  	voltage_tries = 1;
> >  	max_vswing_tries = 0;
> > -	for (;;) {
> > +	for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) {
> 
> I know that I was the on that asked you to make this global here,
> but just now with better spec pointers and definition is that I noticed
> that this is only for DP 1.4 spec while this code here runs for all
> DP...  So we need to change in a way that we check the spec version somehow...
I think the bug is with this infinite loop which is at the mercy of an external device
and in my case I have this MST hub which appears to be DP 1.2 that triggers the
infinite loop case. We have to limit the number of iterations and
DP 1.4 spec happened to fix this issue. I'm a newbie in this area but in this case
shouldn't we apply the same counter to all <= "DP 1.4" devices?

-Nathan 
> 
> and possibly at drm level?!
> 
> >  		uint8_t link_status[DP_LINK_STATUS_SIZE];
> >  
> >  		drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd);
> > @@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> >  			++max_vswing_tries;
> >  
> >  	}
> > +	DRM_ERROR("Failed clock recovery %d times, giving up!\n",
> > +		  DP_DP14_MAX_CR_TRIES);
> > +	return false;
> >  }
> >  
> >  /*
> > diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h
> > index c01564991a9f..2303ad8ed24e 100644
> > --- a/include/drm/drm_dp_helper.h
> > +++ b/include/drm/drm_dp_helper.h
> > @@ -820,6 +820,7 @@
> >  
> >  #define DP_DP13_DPCD_REV                    0x2200
> >  #define DP_DP13_MAX_LINK_RATE               0x2201
> > +#define DP_DP14_MAX_CR_TRIES                10
> >  
> >  #define DP_DPRX_FEATURE_ENUMERATION_LIST    0x2210  /* DP 1.3 */
> >  # define DP_GTC_CAP					(1 << 0)  /* DP 1.3 */
> > -- 
> > 1.9.1
> > 
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] drm/i915/dp: Give up link training clock recovery after 10 failed attempts
  2018-07-16 22:30   ` Rodrigo Vivi
@ 2018-07-16 23:27     ` Nathan Ciobanu
  2018-07-16 23:39       ` Rodrigo Vivi
  0 siblings, 1 reply; 16+ messages in thread
From: Nathan Ciobanu @ 2018-07-16 23:27 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: Marc Herbert, intel-gfx, dhinakaran.pandiyan

On Mon, Jul 16, 2018 at 03:30:49PM -0700, Rodrigo Vivi wrote:
> On Mon, Jul 16, 2018 at 12:22:13PM -0700, Marc Herbert wrote:
> > 
> > While the shortness of v3 is really nice, I think it's rather a good
> > opportunity to make much clearer (as a separate, no functional change
> > patch?)  its existing terminating condition(s) and what the existing loop
> > iterates on. I mean it's painful and risky enough to _combine multiple
> > counters in the same loop_, so at least these different counters should be
> > _invidually_ crystal-clear with enough comments. For instance let's pretend
> > there are 4 possible voltage values and that each value is tried 4 times -
> > then the last value will never be tried! Can this ever happen? If not then
> > why not? Not even with a buggy sink?  If it can happen then is it fine to
> > give up before trying some values? Is it compliant with the spec? Etc.
> 
> hm.. it seems that that code has room for improvements to make it more
> clear and easier to map with the spec...
> but yeap, separated patches.
Do we like to add macros for the other counters? I can try to clarify this a bit.

> 
> > 
> > This should incidentally help clarify why and how the current logic allows
> > infinite loops.
> > 
> > BTW "max_vswing_tries" looks like a boolean, correct? If correct then please
> > remove this confusing "increment from false to true":
> > 
> > - if (max_vswing_tries == 1)
> > + if (max_vswing_tries)
> > 
> > - max_vswing_tries++;
> > + max_vwing_tries = true;
> 
> if we change to boolean it is better to really change the type
> and remove "_tries" from the name....
Yup, will do!

> 
> > 
> > Marc
> > 
> > 
> > On 16/07/2018 11:14, Nathan Ciobanu wrote:
> > > Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++--
> > >  include/drm/drm_dp_helper.h                   | 1 +
> > >  2 files changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c
> > > index 4da6e33c7fa1..4bfba65c431c 100644
> > > --- a/drivers/gpu/drm/i915/intel_dp_link_training.c
> > > +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c
> > > @@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> > >  intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp)
> > >  {
> > >  	uint8_t voltage;
> > > -	int voltage_tries, max_vswing_tries;
> > > +	int voltage_tries, max_vswing_tries, cr_tries;
> > >  	uint8_t link_config[2];
> > >  	uint8_t link_bw, rate_select;
> > >  
> > > @@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> > >  
> > >  	voltage_tries = 1;
> > >  	max_vswing_tries = 0;
> > > -	for (;;) {
> > > +	for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) {
> > >  		uint8_t link_status[DP_LINK_STATUS_SIZE];
> > >  
> > >  		drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd);
> > > @@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> > >  			++max_vswing_tries;
> > >  
> > >  	}
> > > +	DRM_ERROR("Failed clock recovery %d times, giving up!\n",
> > > +		  DP_DP14_MAX_CR_TRIES);
> > > +	return false;
> > >  }
> > >  
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] drm/i915/dp: Give up link training clock recovery after 10 failed attempts
  2018-07-16 23:27     ` Nathan Ciobanu
@ 2018-07-16 23:39       ` Rodrigo Vivi
  0 siblings, 0 replies; 16+ messages in thread
From: Rodrigo Vivi @ 2018-07-16 23:39 UTC (permalink / raw)
  To: Nathan Ciobanu; +Cc: Marc Herbert, intel-gfx, dhinakaran.pandiyan

On Mon, Jul 16, 2018 at 04:27:47PM -0700, Nathan Ciobanu wrote:
> On Mon, Jul 16, 2018 at 03:30:49PM -0700, Rodrigo Vivi wrote:
> > On Mon, Jul 16, 2018 at 12:22:13PM -0700, Marc Herbert wrote:
> > > 
> > > While the shortness of v3 is really nice, I think it's rather a good
> > > opportunity to make much clearer (as a separate, no functional change
> > > patch?)  its existing terminating condition(s) and what the existing loop
> > > iterates on. I mean it's painful and risky enough to _combine multiple
> > > counters in the same loop_, so at least these different counters should be
> > > _invidually_ crystal-clear with enough comments. For instance let's pretend
> > > there are 4 possible voltage values and that each value is tried 4 times -
> > > then the last value will never be tried! Can this ever happen? If not then
> > > why not? Not even with a buggy sink?  If it can happen then is it fine to
> > > give up before trying some values? Is it compliant with the spec? Etc.
> > 
> > hm.. it seems that that code has room for improvements to make it more
> > clear and easier to map with the spec...
> > but yeap, separated patches.
> Do we like to add macros for the other counters? I can try to clarify this a bit.
> 
> > 
> > > 
> > > This should incidentally help clarify why and how the current logic allows
> > > infinite loops.
> > > 
> > > BTW "max_vswing_tries" looks like a boolean, correct? If correct then please
> > > remove this confusing "increment from false to true":
> > > 
> > > - if (max_vswing_tries == 1)
> > > + if (max_vswing_tries)
> > > 
> > > - max_vswing_tries++;
> > > + max_vwing_tries = true;
> > 
> > if we change to boolean it is better to really change the type
> > and remove "_tries" from the name....
> Yup, will do!

please note I'm not telling that we "should do"... but only "if we do" ;)

> 
> > 
> > > 
> > > Marc
> > > 
> > > 
> > > On 16/07/2018 11:14, Nathan Ciobanu wrote:
> > > > Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++--
> > > >  include/drm/drm_dp_helper.h                   | 1 +
> > > >  2 files changed, 6 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c
> > > > index 4da6e33c7fa1..4bfba65c431c 100644
> > > > --- a/drivers/gpu/drm/i915/intel_dp_link_training.c
> > > > +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c
> > > > @@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> > > >  intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp)
> > > >  {
> > > >  	uint8_t voltage;
> > > > -	int voltage_tries, max_vswing_tries;
> > > > +	int voltage_tries, max_vswing_tries, cr_tries;
> > > >  	uint8_t link_config[2];
> > > >  	uint8_t link_bw, rate_select;
> > > >  
> > > > @@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> > > >  
> > > >  	voltage_tries = 1;
> > > >  	max_vswing_tries = 0;
> > > > -	for (;;) {
> > > > +	for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) {
> > > >  		uint8_t link_status[DP_LINK_STATUS_SIZE];
> > > >  
> > > >  		drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd);
> > > > @@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> > > >  			++max_vswing_tries;
> > > >  
> > > >  	}
> > > > +	DRM_ERROR("Failed clock recovery %d times, giving up!\n",
> > > > +		  DP_DP14_MAX_CR_TRIES);
> > > > +	return false;
> > > >  }
> > > >  
> > 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] drm/i915/dp: Give up link training clock recovery after 10 failed attempts
  2018-07-16 23:23   ` Nathan Ciobanu
@ 2018-07-16 23:40     ` Rodrigo Vivi
  2018-07-16 23:51       ` Marc Herbert
  0 siblings, 1 reply; 16+ messages in thread
From: Rodrigo Vivi @ 2018-07-16 23:40 UTC (permalink / raw)
  To: Nathan Ciobanu; +Cc: intel-gfx, dhinakaran.pandiyan, marc.herbert

On Mon, Jul 16, 2018 at 04:23:23PM -0700, Nathan Ciobanu wrote:
> On Mon, Jul 16, 2018 at 03:34:58PM -0700, Rodrigo Vivi wrote:
> > On Mon, Jul 16, 2018 at 11:14:45AM -0700, Nathan Ciobanu wrote:
> > > Limit the link training clock recovery loop to 10 failed attempts at
> > > LANEx_CR_DONE per DP 1.4 spec section 3.5.1.2.2.
> > 
> > Thanks... so this made me look to DP 1.3 and DP 1.4 differences here.
> > Are we already addressing all new conditions?
> > 
> > I wonder if that should be at drm level.
> > 
> > I noticed that one difference, 100us wait difference is already
> > addressed there at drm level...
> > 
> > > Some USB-C MST hubs
> > > cause us to get stuck in this loop indefinitely requesting
> > > 
> > >     voltage swing: 0, pre-emphasis level: 2
> > >     voltage swing: 1, pre-emphasis level: 2
> > >     voltage swing: 0, pre-emphasis level: 3
> > > 
> > > over and over: max_vswing is never reached, drm_dp_clock_recovery_ok()
> > > never returns true and voltage_tries always gets reset to 1. The driver
> > > sends those values to the hub but the hub keeps requesting new values
> > > every time in the pattern shown above.
> > > 
> > > Changes in v2:
> > >     - updated commit message (DK, Manasi)
> > >     - defined DP_DP14_MAX_CR_TRIES (Marc)
> > >     - made the loop iterate for max 10 times (Rodrigo, Marc)
> > > 
> > > Changes in v3:
> > >     - changed error message to use DP_DP14_MAX_CR_TRIES
> > > 
> > > Signed-off-by: Nathan Ciobanu <nathan.d.ciobanu@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/intel_dp_link_training.c | 7 +++++--
> > >  include/drm/drm_dp_helper.h                   | 1 +
> > >  2 files changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_dp_link_training.c b/drivers/gpu/drm/i915/intel_dp_link_training.c
> > > index 4da6e33c7fa1..4bfba65c431c 100644
> > > --- a/drivers/gpu/drm/i915/intel_dp_link_training.c
> > > +++ b/drivers/gpu/drm/i915/intel_dp_link_training.c
> > > @@ -129,7 +129,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> > >  intel_dp_link_training_clock_recovery(struct intel_dp *intel_dp)
> > >  {
> > >  	uint8_t voltage;
> > > -	int voltage_tries, max_vswing_tries;
> > > +	int voltage_tries, max_vswing_tries, cr_tries;
> > >  	uint8_t link_config[2];
> > >  	uint8_t link_bw, rate_select;
> > >  
> > > @@ -172,7 +172,7 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> > >  
> > >  	voltage_tries = 1;
> > >  	max_vswing_tries = 0;
> > > -	for (;;) {
> > > +	for (cr_tries = 0; cr_tries < DP_DP14_MAX_CR_TRIES; ++cr_tries) {
> > 
> > I know that I was the on that asked you to make this global here,
> > but just now with better spec pointers and definition is that I noticed
> > that this is only for DP 1.4 spec while this code here runs for all
> > DP...  So we need to change in a way that we check the spec version somehow...
> I think the bug is with this infinite loop which is at the mercy of an external device
> and in my case I have this MST hub which appears to be DP 1.2 that triggers the
> infinite loop case. We have to limit the number of iterations and
> DP 1.4 spec happened to fix this issue. I'm a newbie in this area but in this case
> shouldn't we apply the same counter to all <= "DP 1.4" devices?

ok, the infinite loop situation is really bad... but I don't believe
we are going with the right fix...
and a good indication is that your fix is for DP-1.4 while your bug is seeing on DP-1.2

> 
> -Nathan 
> > 
> > and possibly at drm level?!
> > 
> > >  		uint8_t link_status[DP_LINK_STATUS_SIZE];
> > >  
> > >  		drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd);
> > > @@ -216,6 +216,9 @@ static bool intel_dp_link_max_vswing_reached(struct intel_dp *intel_dp)
> > >  			++max_vswing_tries;
> > >  
> > >  	}
> > > +	DRM_ERROR("Failed clock recovery %d times, giving up!\n",
> > > +		  DP_DP14_MAX_CR_TRIES);
> > > +	return false;
> > >  }
> > >  
> > >  /*
> > > diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h
> > > index c01564991a9f..2303ad8ed24e 100644
> > > --- a/include/drm/drm_dp_helper.h
> > > +++ b/include/drm/drm_dp_helper.h
> > > @@ -820,6 +820,7 @@
> > >  
> > >  #define DP_DP13_DPCD_REV                    0x2200
> > >  #define DP_DP13_MAX_LINK_RATE               0x2201
> > > +#define DP_DP14_MAX_CR_TRIES                10
> > >  
> > >  #define DP_DPRX_FEATURE_ENUMERATION_LIST    0x2210  /* DP 1.3 */
> > >  # define DP_GTC_CAP					(1 << 0)  /* DP 1.3 */
> > > -- 
> > > 1.9.1
> > > 
> > 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] drm/i915/dp: Give up link training clock recovery after 10 failed attempts
  2018-07-16 23:40     ` Rodrigo Vivi
@ 2018-07-16 23:51       ` Marc Herbert
  2018-07-17 22:21         ` Dhinakaran Pandiyan
  0 siblings, 1 reply; 16+ messages in thread
From: Marc Herbert @ 2018-07-16 23:51 UTC (permalink / raw)
  Cc: intel-gfx, dhinakaran.pandiyan


>> I think the bug is with this infinite loop which is at the mercy of an external device
>> and in my case I have this MST hub which appears to be DP 1.2 that triggers the
>> infinite loop case. We have to limit the number of iterations and
>> DP 1.4 spec happened to fix this issue. I'm a newbie in this area but in this case
>> shouldn't we apply the same counter to all <= "DP 1.4" devices?
> 
> ok, the infinite loop situation is really bad... but I don't believe
> we are going with the right fix...
> and a good indication is that your fix is for DP-1.4 while your bug is seeing on DP-1.2

I thought the only reason the infinite loop fix isn't in 1.2 is just because there's
no 1.2.1 spec... (plus the naive assumption that buggy sinks don't exist)



_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* ✗ Fi.CI.BAT: failure for drm/i915/dp: Give up link training clock recovery after 10 failed attempts (rev3)
  2018-07-16 18:14 [PATCH v3] drm/i915/dp: Give up link training clock recovery after 10 failed attempts Nathan Ciobanu
  2018-07-16 19:22 ` Marc Herbert
  2018-07-16 22:34 ` Rodrigo Vivi
@ 2018-07-17  8:01 ` Patchwork
  2018-07-17  9:24 ` ✓ Fi.CI.IGT: success " Patchwork
  3 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2018-07-17  8:01 UTC (permalink / raw)
  To: Nathan Ciobanu; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/dp: Give up link training clock recovery after 10 failed attempts (rev3)
URL   : https://patchwork.freedesktop.org/series/46506/
State : failure

== Summary ==

= CI Bug Log - changes from CI_DRM_4500 -> Patchwork_9680 =

== Summary - FAILURE ==

  Serious unknown changes coming with Patchwork_9680 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9680, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/46506/revisions/3/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9680:

  === IGT changes ===

    ==== Possible regressions ====

    igt@drv_selftest@live_workarounds:
      fi-bsw-n3050:       PASS -> DMESG-FAIL

    
    ==== Warnings ====

    igt@gem_exec_gttfill@basic:
      fi-pnv-d510:        PASS -> SKIP

    
== Known issues ==

  Here are the changes found in Patchwork_9680 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@drv_selftest@live_coherency:
      fi-gdg-551:         PASS -> DMESG-FAIL (fdo#107164)

    igt@gem_exec_suspend@basic-s4-devices:
      {fi-kbl-8809g}:     NOTRUN -> INCOMPLETE (fdo#107139)

    igt@kms_busy@basic-flip-c:
      fi-skl-6700hq:      PASS -> DMESG-WARN (fdo#105998) +2

    
    ==== Possible fixes ====

    igt@debugfs_test@read_all_entries:
      fi-snb-2520m:       INCOMPLETE (fdo#103713) -> PASS

    igt@gem_exec_suspend@basic-s3:
      {fi-cfl-8109u}:     DMESG-WARN -> PASS

    igt@kms_flip@basic-flip-vs-modeset:
      fi-glk-j4005:       DMESG-WARN (fdo#106000) -> PASS

    igt@kms_pipe_crc_basic@hang-read-crc-pipe-b:
      fi-skl-6700hq:      DMESG-WARN (fdo#105998) -> PASS +5

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c:
      fi-bxt-dsi:         INCOMPLETE (fdo#103927) -> PASS

    
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713
  fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
  fdo#105998 https://bugs.freedesktop.org/show_bug.cgi?id=105998
  fdo#106000 https://bugs.freedesktop.org/show_bug.cgi?id=106000
  fdo#107139 https://bugs.freedesktop.org/show_bug.cgi?id=107139
  fdo#107164 https://bugs.freedesktop.org/show_bug.cgi?id=107164


== Participating hosts (46 -> 42) ==

  Additional (1): fi-kbl-8809g 
  Missing    (5): fi-ctg-p8600 fi-ilk-m540 fi-byt-squawks fi-bsw-cyan fi-hsw-4200u 


== Build changes ==

    * Linux: CI_DRM_4500 -> Patchwork_9680

  CI_DRM_4500: 83c81d39fd8918d96ddbc330d5dab44415479112 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4559: 6d341aac2124836443ce74e8e97a4508ac8d5095 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9680: 3413714843c032b9e813b7aa2a9e79adfd23d4c0 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

3413714843c0 drm/i915/dp: Give up link training clock recovery after 10 failed attempts

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9680/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* ✓ Fi.CI.IGT: success for drm/i915/dp: Give up link training clock recovery after 10 failed attempts (rev3)
  2018-07-16 18:14 [PATCH v3] drm/i915/dp: Give up link training clock recovery after 10 failed attempts Nathan Ciobanu
                   ` (2 preceding siblings ...)
  2018-07-17  8:01 ` ✗ Fi.CI.BAT: failure for drm/i915/dp: Give up link training clock recovery after 10 failed attempts (rev3) Patchwork
@ 2018-07-17  9:24 ` Patchwork
  3 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2018-07-17  9:24 UTC (permalink / raw)
  To: Nathan Ciobanu; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/dp: Give up link training clock recovery after 10 failed attempts (rev3)
URL   : https://patchwork.freedesktop.org/series/46506/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4500_full -> Patchwork_9680_full =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_9680_full need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9680_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9680_full:

  === IGT changes ===

    ==== Warnings ====

    igt@kms_vblank@pipe-b-query-forked-hang:
      shard-snb:          PASS -> SKIP

    
== Known issues ==

  Here are the changes found in Patchwork_9680_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@gem_cs_tlb@render:
      shard-snb:          PASS -> INCOMPLETE (fdo#105411)

    igt@gem_exec_big:
      shard-hsw:          PASS -> INCOMPLETE (fdo#103540)

    igt@perf@blocking:
      shard-hsw:          PASS -> FAIL (fdo#102252)

    igt@perf_pmu@rc6-runtime-pm-long:
      shard-glk:          PASS -> FAIL (fdo#105010)

    
    ==== Possible fixes ====

    igt@kms_flip@2x-dpms-vs-vblank-race:
      shard-hsw:          FAIL (fdo#103060) -> PASS

    igt@kms_flip@2x-flip-vs-expired-vblank:
      shard-glk:          FAIL (fdo#105363) -> PASS

    igt@kms_flip@plain-flip-fb-recreate-interruptible:
      shard-glk:          FAIL (fdo#100368) -> PASS

    igt@kms_setmode@basic:
      shard-snb:          FAIL (fdo#99912) -> PASS

    igt@kms_universal_plane@cursor-fb-leak-pipe-a:
      shard-apl:          FAIL (fdo#107241) -> PASS

    
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#102252 https://bugs.freedesktop.org/show_bug.cgi?id=102252
  fdo#103060 https://bugs.freedesktop.org/show_bug.cgi?id=103060
  fdo#103540 https://bugs.freedesktop.org/show_bug.cgi?id=103540
  fdo#105010 https://bugs.freedesktop.org/show_bug.cgi?id=105010
  fdo#105363 https://bugs.freedesktop.org/show_bug.cgi?id=105363
  fdo#105411 https://bugs.freedesktop.org/show_bug.cgi?id=105411
  fdo#107241 https://bugs.freedesktop.org/show_bug.cgi?id=107241
  fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912


== Participating hosts (5 -> 5) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_4500 -> Patchwork_9680

  CI_DRM_4500: 83c81d39fd8918d96ddbc330d5dab44415479112 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4559: 6d341aac2124836443ce74e8e97a4508ac8d5095 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9680: 3413714843c032b9e813b7aa2a9e79adfd23d4c0 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9680/shards.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] drm/i915/dp: Give up link training clock recovery after 10 failed attempts
  2018-07-16 23:51       ` Marc Herbert
@ 2018-07-17 22:21         ` Dhinakaran Pandiyan
  2018-07-18  1:05           ` Nathan Ciobanu
  0 siblings, 1 reply; 16+ messages in thread
From: Dhinakaran Pandiyan @ 2018-07-17 22:21 UTC (permalink / raw)
  To: Marc Herbert, intel-gfx, Rodrigo Vivi, Nathan Ciobanu

On Mon, 2018-07-16 at 16:51 -0700, Marc Herbert wrote:
> > 
> > > 
> > > I think the bug is with this infinite loop which is at the mercy
> > > of an external device
> > > and in my case I have this MST hub which appears to be DP 1.2
> > > that triggers the
> > > infinite loop case. We have to limit the number of iterations and
> > > DP 1.4 spec happened to fix this issue. I'm a newbie in this area
> > > but in this case
> > > shouldn't we apply the same counter to all <= "DP 1.4" devices?
> > ok, the infinite loop situation is really bad... but I don't
> > believe
> > we are going with the right fix...
> > and a good indication is that your fix is for DP-1.4 while your bug
> > is seeing on DP-1.2
> I thought the only reason the infinite loop fix isn't in 1.2 is just
> because there's
> no 1.2.1 spec... (plus the naive assumption that buggy sinks don't
> exist)
> 
Irrespective of whether the sink is DP1.2 or 1.4, if there are sinks
out there that keep toggling between two values there should be an
overall limit to how many times this loop gets executed. Even if this
isn't right fix for the issue at hand, the loop has to break.

Then there's the question of what the limit should be. We could use the
DP1.4 limit as a reference and apply it widely. But there's a problem
here, we have 4 vswing values and 4 pre-emphasis values. If the sink
progressively changes one variable at a time, we'll need at least 16
iterations. Nathan's patch here will prematurely error out and that
doesn't sound reasonable to impose on non DP1.4 sinks.

-DK


> 
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] drm/i915/dp: Give up link training clock recovery after 10 failed attempts
  2018-07-17 22:21         ` Dhinakaran Pandiyan
@ 2018-07-18  1:05           ` Nathan Ciobanu
  2018-07-18  3:00             ` Dhinakaran Pandiyan
  2018-07-19 17:01             ` Rodrigo Vivi
  0 siblings, 2 replies; 16+ messages in thread
From: Nathan Ciobanu @ 2018-07-18  1:05 UTC (permalink / raw)
  To: Dhinakaran Pandiyan; +Cc: Marc Herbert, intel-gfx, Rodrigo Vivi

On Tue, Jul 17, 2018 at 03:21:17PM -0700, Dhinakaran Pandiyan wrote:
> On Mon, 2018-07-16 at 16:51 -0700, Marc Herbert wrote:
> > > 
> > > > 
> > > > I think the bug is with this infinite loop which is at the mercy
> > > > of an external device
> > > > and in my case I have this MST hub which appears to be DP 1.2
> > > > that triggers the
> > > > infinite loop case. We have to limit the number of iterations and
> > > > DP 1.4 spec happened to fix this issue. I'm a newbie in this area
> > > > but in this case
> > > > shouldn't we apply the same counter to all <= "DP 1.4" devices?
> > > ok, the infinite loop situation is really bad... but I don't
> > > believe
> > > we are going with the right fix...
> > > and a good indication is that your fix is for DP-1.4 while your bug
> > > is seeing on DP-1.2
> > I thought the only reason the infinite loop fix isn't in 1.2 is just
> > because there's
> > no 1.2.1 spec... (plus the naive assumption that buggy sinks don't
> > exist)
> > 
> Irrespective of whether the sink is DP1.2 or 1.4, if there are sinks
> out there that keep toggling between two values there should be an
> overall limit to how many times this loop gets executed. Even if this
> isn't right fix for the issue at hand, the loop has to break.
> 
> Then there's the question of what the limit should be. We could use the
> DP1.4 limit as a reference and apply it widely. But there's a problem
> here, we have 4 vswing values and 4 pre-emphasis values. If the sink
> progressively changes one variable at a time, we'll need at least 16
> iterations. Nathan's patch here will prematurely error out and that
> doesn't sound reasonable to impose on non DP1.4 sinks.

I think it would be a max of 13 iterations since one of the vswing
values will be max_vswing and the loop will terminate at that point
without trying the other 3 preemph values for that voltage, correct?

-Nathan
> 
> -DK
> 
> 
> > 
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] drm/i915/dp: Give up link training clock recovery after 10 failed attempts
  2018-07-18  1:05           ` Nathan Ciobanu
@ 2018-07-18  3:00             ` Dhinakaran Pandiyan
  2018-07-19 17:01             ` Rodrigo Vivi
  1 sibling, 0 replies; 16+ messages in thread
From: Dhinakaran Pandiyan @ 2018-07-18  3:00 UTC (permalink / raw)
  To: Nathan Ciobanu; +Cc: Marc Herbert, intel-gfx, Rodrigo Vivi

On Tue, 2018-07-17 at 18:05 -0700, Nathan Ciobanu wrote:
> On Tue, Jul 17, 2018 at 03:21:17PM -0700, Dhinakaran Pandiyan wrote:
> > 
> > On Mon, 2018-07-16 at 16:51 -0700, Marc Herbert wrote:
> > > 
> > > > 
> > > > 
> > > > > 
> > > > > 
> > > > > I think the bug is with this infinite loop which is at the
> > > > > mercy
> > > > > of an external device
> > > > > and in my case I have this MST hub which appears to be DP 1.2
> > > > > that triggers the
> > > > > infinite loop case. We have to limit the number of iterations
> > > > > and
> > > > > DP 1.4 spec happened to fix this issue. I'm a newbie in this
> > > > > area
> > > > > but in this case
> > > > > shouldn't we apply the same counter to all <= "DP 1.4"
> > > > > devices?
> > > > ok, the infinite loop situation is really bad... but I don't
> > > > believe
> > > > we are going with the right fix...
> > > > and a good indication is that your fix is for DP-1.4 while your
> > > > bug
> > > > is seeing on DP-1.2
> > > I thought the only reason the infinite loop fix isn't in 1.2 is
> > > just
> > > because there's
> > > no 1.2.1 spec... (plus the naive assumption that buggy sinks
> > > don't
> > > exist)
> > > 
> > Irrespective of whether the sink is DP1.2 or 1.4, if there are
> > sinks
> > out there that keep toggling between two values there should be an
> > overall limit to how many times this loop gets executed. Even if
> > this
> > isn't right fix for the issue at hand, the loop has to break.
> > 
> > Then there's the question of what the limit should be. We could use
> > the
> > DP1.4 limit as a reference and apply it widely. But there's a
> > problem
> > here, we have 4 vswing values and 4 pre-emphasis values. If the
> > sink
> > progressively changes one variable at a time, we'll need at least
> > 16
> > iterations. Nathan's patch here will prematurely error out and that
> > doesn't sound reasonable to impose on non DP1.4 sinks.
> I think it would be a max of 13 iterations since one of the vswing
> values will be max_vswing and the loop will terminate at that point
> without trying the other 3 preemph values for that voltage, correct?
> 
The spec defines maximum voltage swing level as the "sum of the
VOLTAGE_SWING_LANEx and PREEMPHASIS_LEVEL_LANEx value". So, we should
be looping through all values.

Can you check if https://patchwork.freedesktop.org/patch/239520/ helps?
Also, please file a fdo bug with dmesg.

-DK



> -Nathan
> > 
> > 
> > -DK
> > 
> > 
> > > 
> > > 
> > > 
> > > _______________________________________________
> > > Intel-gfx mailing list
> > > Intel-gfx@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] drm/i915/dp: Give up link training clock recovery after 10 failed attempts
  2018-07-18  1:05           ` Nathan Ciobanu
  2018-07-18  3:00             ` Dhinakaran Pandiyan
@ 2018-07-19 17:01             ` Rodrigo Vivi
  2018-07-19 18:42               ` Nathan Ciobanu
  1 sibling, 1 reply; 16+ messages in thread
From: Rodrigo Vivi @ 2018-07-19 17:01 UTC (permalink / raw)
  To: Nathan Ciobanu; +Cc: Marc Herbert, intel-gfx, Dhinakaran Pandiyan

On Tue, Jul 17, 2018 at 06:05:51PM -0700, Nathan Ciobanu wrote:
> On Tue, Jul 17, 2018 at 03:21:17PM -0700, Dhinakaran Pandiyan wrote:
> > On Mon, 2018-07-16 at 16:51 -0700, Marc Herbert wrote:
> > > > 
> > > > > 
> > > > > I think the bug is with this infinite loop which is at the mercy
> > > > > of an external device
> > > > > and in my case I have this MST hub which appears to be DP 1.2
> > > > > that triggers the
> > > > > infinite loop case. We have to limit the number of iterations and
> > > > > DP 1.4 spec happened to fix this issue. I'm a newbie in this area
> > > > > but in this case
> > > > > shouldn't we apply the same counter to all <= "DP 1.4" devices?
> > > > ok, the infinite loop situation is really bad... but I don't
> > > > believe
> > > > we are going with the right fix...
> > > > and a good indication is that your fix is for DP-1.4 while your bug
> > > > is seeing on DP-1.2
> > > I thought the only reason the infinite loop fix isn't in 1.2 is just
> > > because there's
> > > no 1.2.1 spec... (plus the naive assumption that buggy sinks don't
> > > exist)
> > > 
> > Irrespective of whether the sink is DP1.2 or 1.4, if there are sinks
> > out there that keep toggling between two values there should be an
> > overall limit to how many times this loop gets executed. Even if this
> > isn't right fix for the issue at hand, the loop has to break.
> > 
> > Then there's the question of what the limit should be. We could use the
> > DP1.4 limit as a reference and apply it widely. But there's a problem
> > here, we have 4 vswing values and 4 pre-emphasis values. If the sink
> > progressively changes one variable at a time, we'll need at least 16
> > iterations. Nathan's patch here will prematurely error out and that
> > doesn't sound reasonable to impose on non DP1.4 sinks.
> 
> I think it would be a max of 13 iterations since one of the vswing
> values will be max_vswing and the loop will terminate at that point
> without trying the other 3 preemph values for that voltage, correct?

I was talking to DK yesterday and he suggested that we should go with
a huge number for DP_1.2 and with the spec for DP_1.4. According to DK
80 was covering all combinations few times.

So you get your patch and create some max_cr_tries = dp_1.4 ? spec : 80;

or something like that.

I think I like that because infinite loop situation here is so bad
and we should avoid even if it was something else that got really wrong.

Thanks,
Rodrigo.

> 
> -Nathan
> > 
> > -DK
> > 
> > 
> > > 
> > > 
> > > _______________________________________________
> > > Intel-gfx mailing list
> > > Intel-gfx@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> > 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] drm/i915/dp: Give up link training clock recovery after 10 failed attempts
  2018-07-19 17:01             ` Rodrigo Vivi
@ 2018-07-19 18:42               ` Nathan Ciobanu
  0 siblings, 0 replies; 16+ messages in thread
From: Nathan Ciobanu @ 2018-07-19 18:42 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: Marc Herbert, intel-gfx, Dhinakaran Pandiyan

On Thu, Jul 19, 2018 at 10:01:41AM -0700, Rodrigo Vivi wrote:
> On Tue, Jul 17, 2018 at 06:05:51PM -0700, Nathan Ciobanu wrote:
> > On Tue, Jul 17, 2018 at 03:21:17PM -0700, Dhinakaran Pandiyan wrote:
> > > On Mon, 2018-07-16 at 16:51 -0700, Marc Herbert wrote:
> > > > > 
> > > > > > 
> > > > > > I think the bug is with this infinite loop which is at the mercy
> > > > > > of an external device
> > > > > > and in my case I have this MST hub which appears to be DP 1.2
> > > > > > that triggers the
> > > > > > infinite loop case. We have to limit the number of iterations and
> > > > > > DP 1.4 spec happened to fix this issue. I'm a newbie in this area
> > > > > > but in this case
> > > > > > shouldn't we apply the same counter to all <= "DP 1.4" devices?
> > > > > ok, the infinite loop situation is really bad... but I don't
> > > > > believe
> > > > > we are going with the right fix...
> > > > > and a good indication is that your fix is for DP-1.4 while your bug
> > > > > is seeing on DP-1.2
> > > > I thought the only reason the infinite loop fix isn't in 1.2 is just
> > > > because there's
> > > > no 1.2.1 spec... (plus the naive assumption that buggy sinks don't
> > > > exist)
> > > > 
> > > Irrespective of whether the sink is DP1.2 or 1.4, if there are sinks
> > > out there that keep toggling between two values there should be an
> > > overall limit to how many times this loop gets executed. Even if this
> > > isn't right fix for the issue at hand, the loop has to break.
> > > 
> > > Then there's the question of what the limit should be. We could use the
> > > DP1.4 limit as a reference and apply it widely. But there's a problem
> > > here, we have 4 vswing values and 4 pre-emphasis values. If the sink
> > > progressively changes one variable at a time, we'll need at least 16
> > > iterations. Nathan's patch here will prematurely error out and that
> > > doesn't sound reasonable to impose on non DP1.4 sinks.
> > 
> > I think it would be a max of 13 iterations since one of the vswing
> > values will be max_vswing and the loop will terminate at that point
> > without trying the other 3 preemph values for that voltage, correct?
> 
> I was talking to DK yesterday and he suggested that we should go with
> a huge number for DP_1.2 and with the spec for DP_1.4. According to DK
> 80 was covering all combinations few times.
> 
> So you get your patch and create some max_cr_tries = dp_1.4 ? spec : 80;
> 
> or something like that.
> 
> I think I like that because infinite loop situation here is so bad
> and we should avoid even if it was something else that got really wrong.
I just sent v4 to do that. Thanks!
-Nathan
> 
> Thanks,
> Rodrigo.
> 
> > 
> > -Nathan
> > > 
> > > -DK
> > > 
> > > 
> > > > 
> > > > 
> > > > _______________________________________________
> > > > Intel-gfx mailing list
> > > > Intel-gfx@lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> > > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2018-07-19 18:50 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-07-16 18:14 [PATCH v3] drm/i915/dp: Give up link training clock recovery after 10 failed attempts Nathan Ciobanu
2018-07-16 19:22 ` Marc Herbert
2018-07-16 22:30   ` Rodrigo Vivi
2018-07-16 23:27     ` Nathan Ciobanu
2018-07-16 23:39       ` Rodrigo Vivi
2018-07-16 22:34 ` Rodrigo Vivi
2018-07-16 23:23   ` Nathan Ciobanu
2018-07-16 23:40     ` Rodrigo Vivi
2018-07-16 23:51       ` Marc Herbert
2018-07-17 22:21         ` Dhinakaran Pandiyan
2018-07-18  1:05           ` Nathan Ciobanu
2018-07-18  3:00             ` Dhinakaran Pandiyan
2018-07-19 17:01             ` Rodrigo Vivi
2018-07-19 18:42               ` Nathan Ciobanu
2018-07-17  8:01 ` ✗ Fi.CI.BAT: failure for drm/i915/dp: Give up link training clock recovery after 10 failed attempts (rev3) Patchwork
2018-07-17  9:24 ` ✓ Fi.CI.IGT: success " Patchwork

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).