netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Corinna Vinschen <vinschen@redhat.com>
To: Chris H <christopher.s.hall@intel.com>
Cc: intel-wired-lan@lists.osuosl.org, david.zage@intel.com,
	vinicius.gomes@intel.com, netdev@vger.kernel.org,
	rodrigo.cadore@l-acoustics.com,
	Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Subject: Re: [PATCH iwl-net v2 1/4] igc: Ensure the PTM cycle is reliably triggered
Date: Wed, 30 Oct 2024 11:17:58 +0100	[thread overview]
Message-ID: <ZyIH1rVt35JAFTW8@calimero.vinschen.de> (raw)
In-Reply-To: <20241023023040.111429-2-christopher.s.hall@intel.com>

Hi Chris,

On Oct 23 02:30, Chris H wrote:
> From: Christopher S M Hall <christopher.s.hall@intel.com>
> 
> Writing to clear the PTM status 'valid' bit while the PTM cycle is
> triggered results in unreliable PTM operation. To fix this, clear the
> PTM 'trigger' and status after each PTM transaction.
> 
> The issue can be reproduced with the following:
> 
> $ sudo phc2sys -R 1000 -O 0 -i tsn0 -m
> 
> Note: 1000 Hz (-R 1000) is unrealistically large, but provides a way to
> quickly reproduce the issue.
> 
> PHC2SYS exits with:
> 
> "ioctl PTP_OFFSET_PRECISE: Connection timed out" when the PTM transaction
>   fails
> 
> Fixes: a90ec8483732 ("igc: Add support for PTP getcrosststamp()")
> Signed-off-by: Christopher S M Hall <christopher.s.hall@intel.com>
> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

In the v1 thread in August I asked for adding a description of the kdump
problem we observed to the commit message.  I came up with this text,
maybe we can still add it, or maybe a shhortened version of it?

  This patch also fixes a hang in igc_probe() when loading the igc
  driver in the kdump kernel on systems supporting PTM.

  The igc driver running in the base kernel enables PTM trigger in
  igc_probe().  Therefore the driver is always in PTM trigger mode,
  except in brief periods when manually triggering a PTM cycle.

  When a crash occurs, the NIC is reset while PTM trigger is enabled.
  Due to a hardware problem, the NIC is subsequently in a bad busmaster
  state and doesn't handle register reads/writes.  When running
  igc_probe() in the kdump kernel, the first register access to a NIC
  register hangs driver probing and ultimately breaks kdump.

  With this patch, igc has PTM trigger disabled most of the time,
  and the trigger is only enabled for very brief (10 - 100 us) periods
  when manually triggering a PTM cycle.  Chances that a crash occurs
  during a PTM trigger are not 0, but extremly reduced.


Thanks,
Corinna


> ---
>  drivers/net/ethernet/intel/igc/igc_defines.h |  1 +
>  drivers/net/ethernet/intel/igc/igc_ptp.c     | 70 ++++++++++++--------
>  2 files changed, 42 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
> index 8e449904aa7d..afd0512dc9f8 100644
> --- a/drivers/net/ethernet/intel/igc/igc_defines.h
> +++ b/drivers/net/ethernet/intel/igc/igc_defines.h
> @@ -593,6 +593,7 @@
>  #define IGC_PTM_STAT_T4M1_OVFL		BIT(3) /* T4 minus T1 overflow */
>  #define IGC_PTM_STAT_ADJUST_1ST		BIT(4) /* 1588 timer adjusted during 1st PTM cycle */
>  #define IGC_PTM_STAT_ADJUST_CYC		BIT(5) /* 1588 timer adjusted during non-1st PTM cycle */
> +#define IGC_PTM_STAT_ALL        	GENMASK(5, 0) /* Used to clear all status */
>  
>  /* PCIe PTM Cycle Control */
>  #define IGC_PTM_CYCLE_CTRL_CYC_TIME(msec)	((msec) & 0x3ff) /* PTM Cycle Time (msec) */
> diff --git a/drivers/net/ethernet/intel/igc/igc_ptp.c b/drivers/net/ethernet/intel/igc/igc_ptp.c
> index 946edbad4302..00cc80d8d164 100644
> --- a/drivers/net/ethernet/intel/igc/igc_ptp.c
> +++ b/drivers/net/ethernet/intel/igc/igc_ptp.c
> @@ -974,11 +974,38 @@ static void igc_ptm_log_error(struct igc_adapter *adapter, u32 ptm_stat)
>  	}
>  }
>  
> +static void igc_ptm_trigger(struct igc_hw *hw)
> +{
> +	u32 ctrl;
> +
> +	/* To "manually" start the PTM cycle we need to set the
> +	 * trigger (TRIG) bit
> +	 */
> +	ctrl = rd32(IGC_PTM_CTRL);
> +	ctrl |= IGC_PTM_CTRL_TRIG;
> +	wr32(IGC_PTM_CTRL, ctrl);
> +	/* Perform flush after write to CTRL register otherwise
> +	 * transaction may not start
> +	 */
> +	wrfl();
> +}
> +
> +static void igc_ptm_reset(struct igc_hw *hw)
> +{
> +	u32 ctrl;
> +
> +	ctrl = rd32(IGC_PTM_CTRL);
> +	ctrl &= ~IGC_PTM_CTRL_TRIG;
> +	wr32(IGC_PTM_CTRL, ctrl);
> +	/* Write to clear all status */
> +	wr32(IGC_PTM_STAT, IGC_PTM_STAT_ALL);
> +}
> +
>  static int igc_phc_get_syncdevicetime(ktime_t *device,
>  				      struct system_counterval_t *system,
>  				      void *ctx)
>  {
> -	u32 stat, t2_curr_h, t2_curr_l, ctrl;
> +	u32 stat, t2_curr_h, t2_curr_l;
>  	struct igc_adapter *adapter = ctx;
>  	struct igc_hw *hw = &adapter->hw;
>  	int err, count = 100;
> @@ -994,25 +1021,13 @@ static int igc_phc_get_syncdevicetime(ktime_t *device,
>  		 * are transitory. Repeating the process returns valid
>  		 * data eventually.
>  		 */
> -
> -		/* To "manually" start the PTM cycle we need to clear and
> -		 * then set again the TRIG bit.
> -		 */
> -		ctrl = rd32(IGC_PTM_CTRL);
> -		ctrl &= ~IGC_PTM_CTRL_TRIG;
> -		wr32(IGC_PTM_CTRL, ctrl);
> -		ctrl |= IGC_PTM_CTRL_TRIG;
> -		wr32(IGC_PTM_CTRL, ctrl);
> -
> -		/* The cycle only starts "for real" when software notifies
> -		 * that it has read the registers, this is done by setting
> -		 * VALID bit.
> -		 */
> -		wr32(IGC_PTM_STAT, IGC_PTM_STAT_VALID);
> +		igc_ptm_trigger(hw);
>  
>  		err = readx_poll_timeout(rd32, IGC_PTM_STAT, stat,
>  					 stat, IGC_PTM_STAT_SLEEP,
>  					 IGC_PTM_STAT_TIMEOUT);
> +		igc_ptm_reset(hw);
> +
>  		if (err < 0) {
>  			netdev_err(adapter->netdev, "Timeout reading IGC_PTM_STAT register\n");
>  			return err;
> @@ -1021,15 +1036,7 @@ static int igc_phc_get_syncdevicetime(ktime_t *device,
>  		if ((stat & IGC_PTM_STAT_VALID) == IGC_PTM_STAT_VALID)
>  			break;
>  
> -		if (stat & ~IGC_PTM_STAT_VALID) {
> -			/* An error occurred, log it. */
> -			igc_ptm_log_error(adapter, stat);
> -			/* The STAT register is write-1-to-clear (W1C),
> -			 * so write the previous error status to clear it.
> -			 */
> -			wr32(IGC_PTM_STAT, stat);
> -			continue;
> -		}
> +		igc_ptm_log_error(adapter, stat);
>  	} while (--count);
>  
>  	if (!count) {
> @@ -1255,7 +1262,7 @@ void igc_ptp_stop(struct igc_adapter *adapter)
>  void igc_ptp_reset(struct igc_adapter *adapter)
>  {
>  	struct igc_hw *hw = &adapter->hw;
> -	u32 cycle_ctrl, ctrl;
> +	u32 cycle_ctrl, ctrl, stat;
>  	unsigned long flags;
>  	u32 timadj;
>  
> @@ -1290,14 +1297,19 @@ void igc_ptp_reset(struct igc_adapter *adapter)
>  		ctrl = IGC_PTM_CTRL_EN |
>  			IGC_PTM_CTRL_START_NOW |
>  			IGC_PTM_CTRL_SHRT_CYC(IGC_PTM_SHORT_CYC_DEFAULT) |
> -			IGC_PTM_CTRL_PTM_TO(IGC_PTM_TIMEOUT_DEFAULT) |
> -			IGC_PTM_CTRL_TRIG;
> +			IGC_PTM_CTRL_PTM_TO(IGC_PTM_TIMEOUT_DEFAULT);
>  
>  		wr32(IGC_PTM_CTRL, ctrl);
>  
>  		/* Force the first cycle to run. */
> -		wr32(IGC_PTM_STAT, IGC_PTM_STAT_VALID);
> +		igc_ptm_trigger(hw);
> +
> +		if (readx_poll_timeout_atomic(rd32, IGC_PTM_STAT, stat,
> +					      stat, IGC_PTM_STAT_SLEEP,
> +					      IGC_PTM_STAT_TIMEOUT))
> +			netdev_err(adapter->netdev, "Timeout reading IGC_PTM_STAT register\n");
>  
> +		igc_ptm_reset(hw);
>  		break;
>  	default:
>  		/* No work to do. */
> -- 
> 2.34.1


  parent reply	other threads:[~2024-10-30 10:18 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-23  2:30 [PATCH iwl-net v2 0/4] igc: Fix PTM timeout Chris H
2024-10-23  2:30 ` [PATCH iwl-net v2 1/4] igc: Ensure the PTM cycle is reliably triggered Chris H
2024-10-29 21:21   ` Tony Nguyen
2024-10-30 10:17   ` Corinna Vinschen [this message]
2024-10-31  9:34   ` [Intel-wired-lan] " Avigail Dahan
2024-10-23  2:30 ` [PATCH iwl-net v2 2/4] igc: Lengthen the hardware retry time to prevent timeouts Chris H
2024-10-31  9:35   ` [Intel-wired-lan] " Avigail Dahan
2024-10-23  2:30 ` [PATCH iwl-net v2 3/4] igc: Move ktime snapshot into PTM retry loop Chris H
2024-10-31  9:35   ` [Intel-wired-lan] " Mor Bar-Gabay
2024-10-23  2:30 ` [PATCH iwl-net v2 4/4] igc: Add lock preventing multiple simultaneous PTM transactions Chris H
2024-10-29 21:20   ` Tony Nguyen
2024-10-30 10:20   ` Corinna Vinschen
2024-10-31  9:36   ` [Intel-wired-lan] " Mor Bar-Gabay

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZyIH1rVt35JAFTW8@calimero.vinschen.de \
    --to=vinschen@redhat.com \
    --cc=christopher.s.hall@intel.com \
    --cc=david.zage@intel.com \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=michal.swiatkowski@linux.intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=rodrigo.cadore@l-acoustics.com \
    --cc=vinicius.gomes@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).