netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Use ixgbe_ptp_reset on linkup/linkdown for X550
@ 2022-08-01 13:37 achaiken
  2022-08-01 21:34 ` Keller, Jacob E
  0 siblings, 1 reply; 9+ messages in thread
From: achaiken @ 2022-08-01 13:37 UTC (permalink / raw)
  To: jesse.brandeburg, richardcochran
  Cc: spayne, achaiken, alison, netdev, intel-wired-lan

From: Steve Payne <spayne@aurora.tech>

For an unknown reason, when `ixgbe_ptp_start_cyclecounter` is called
from `ixgbe_watchdog_link_is_down` the PHC on the NIC jumps backward
by a seemingly inconsistent amount, which causes discontinuities in
time synchronization. Explicitly reset the NIC's PHC to
`CLOCK_REALTIME` whenever the NIC goes up or down by calling
`ixgbe_ptp_reset` instead of the bare `ixgbe_ptp_start_cyclecounter`.

Signed-off-by: Steve Payne <spayne@aurora.tech>
Signed-off-by: Alison Chaiken <achaiken@aurora.tech>

---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 750b02bb2fdc2..ab1ec076fa75f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -7462,7 +7462,7 @@ static void ixgbe_watchdog_link_is_up(struct ixgbe_adapter *adapter)
 	adapter->last_rx_ptp_check = jiffies;
 
 	if (test_bit(__IXGBE_PTP_RUNNING, &adapter->state))
-		ixgbe_ptp_start_cyclecounter(adapter);
+		ixgbe_ptp_reset(adapter);
 
 	switch (link_speed) {
 	case IXGBE_LINK_SPEED_10GB_FULL:
@@ -7527,7 +7527,7 @@ static void ixgbe_watchdog_link_is_down(struct ixgbe_adapter *adapter)
 		adapter->flags2 |= IXGBE_FLAG2_SEARCH_FOR_SFP;
 
 	if (test_bit(__IXGBE_PTP_RUNNING, &adapter->state))
-		ixgbe_ptp_start_cyclecounter(adapter);
+		ixgbe_ptp_reset(adapter);
 
 	e_info(drv, "NIC Link is Down\n");
 	netif_carrier_off(netdev);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* RE: [PATCH] Use ixgbe_ptp_reset on linkup/linkdown for X550
  2022-08-01 13:37 [PATCH] Use ixgbe_ptp_reset on linkup/linkdown for X550 achaiken
@ 2022-08-01 21:34 ` Keller, Jacob E
       [not found]   ` <CAFzL-7tX845o2kJmE4o8EhbeD-=vkR6rmaiz_ZEWfSD4W+iWEA@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Keller, Jacob E @ 2022-08-01 21:34 UTC (permalink / raw)
  To: achaiken@aurora.tech, Brandeburg, Jesse, richardcochran@gmail.com
  Cc: spayne@aurora.tech, alison@she-devel.com, netdev@vger.kernel.org,
	intel-wired-lan@lists.osuosl.org



> -----Original Message-----
> From: achaiken@aurora.tech <achaiken@aurora.tech>
> Sent: Monday, August 01, 2022 6:38 AM
> To: Brandeburg, Jesse <jesse.brandeburg@intel.com>;
> richardcochran@gmail.com
> Cc: spayne@aurora.tech; achaiken@aurora.tech; alison@she-devel.com;
> netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org
> Subject: [PATCH] Use ixgbe_ptp_reset on linkup/linkdown for X550
> 
> From: Steve Payne <spayne@aurora.tech>
> 
> For an unknown reason, when `ixgbe_ptp_start_cyclecounter` is called
> from `ixgbe_watchdog_link_is_down` the PHC on the NIC jumps backward
> by a seemingly inconsistent amount, which causes discontinuities in
> time synchronization. Explicitly reset the NIC's PHC to
> `CLOCK_REALTIME` whenever the NIC goes up or down by calling
> `ixgbe_ptp_reset` instead of the bare `ixgbe_ptp_start_cyclecounter`.
> 
> Signed-off-by: Steve Payne <spayne@aurora.tech>
> Signed-off-by: Alison Chaiken <achaiken@aurora.tech>
> 

Resetting PTP could be a problem if the clock was not being synchronized with the kernel CLOCK_REALTIME, and does result in some loss of timer precision either way due to the delays involved with setting the time.

Do you have an example of the clock jump? How much is it? How often is it? Every time? More information would help in order to debug what is going wrong here.

Thanks,
Jake

> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 750b02bb2fdc2..ab1ec076fa75f 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -7462,7 +7462,7 @@ static void ixgbe_watchdog_link_is_up(struct
> ixgbe_adapter *adapter)
>  	adapter->last_rx_ptp_check = jiffies;
> 
>  	if (test_bit(__IXGBE_PTP_RUNNING, &adapter->state))
> -		ixgbe_ptp_start_cyclecounter(adapter);
> +		ixgbe_ptp_reset(adapter);
> 
>  	switch (link_speed) {
>  	case IXGBE_LINK_SPEED_10GB_FULL:
> @@ -7527,7 +7527,7 @@ static void ixgbe_watchdog_link_is_down(struct
> ixgbe_adapter *adapter)
>  		adapter->flags2 |= IXGBE_FLAG2_SEARCH_FOR_SFP;
> 
>  	if (test_bit(__IXGBE_PTP_RUNNING, &adapter->state))
> -		ixgbe_ptp_start_cyclecounter(adapter);
> +		ixgbe_ptp_reset(adapter);
> 
>  	e_info(drv, "NIC Link is Down\n");
>  	netif_carrier_off(netdev);
> --
> 2.32.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Fwd: [PATCH] Use ixgbe_ptp_reset on linkup/linkdown for X550
       [not found]     ` <CAJmffrqxwFyRGpMRYRYLPi3yrLQgzqnW5UKgbgACGNqoN_hsVQ@mail.gmail.com>
@ 2022-08-01 23:00       ` Ilya Evenbach
  2022-08-01 23:29         ` Jacob Keller
  0 siblings, 1 reply; 9+ messages in thread
From: Ilya Evenbach @ 2022-08-01 23:00 UTC (permalink / raw)
  To: Alison Chaiken, jacob.e.keller, Steve Payne, jesse.brandeburg,
	richardcochran, netdev, intel-wired-lan

> > -----Original Message-----
> > From: achaiken@aurora.tech <achaiken@aurora.tech>
> > Sent: Monday, August 01, 2022 6:38 AM
> > To: Brandeburg, Jesse <jesse.brandeburg@intel.com>;
> > richardcochran@gmail.com
> > Cc: spayne@aurora.tech; achaiken@aurora.tech; alison@she-devel.com;
> > netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org
> > Subject: [PATCH] Use ixgbe_ptp_reset on linkup/linkdown for X550
> >
> > From: Steve Payne <spayne@aurora.tech>
> >
> > For an unknown reason, when `ixgbe_ptp_start_cyclecounter` is called
> > from `ixgbe_watchdog_link_is_down` the PHC on the NIC jumps backward
> > by a seemingly inconsistent amount, which causes discontinuities in
> > time synchronization. Explicitly reset the NIC's PHC to
> > `CLOCK_REALTIME` whenever the NIC goes up or down by calling
> > `ixgbe_ptp_reset` instead of the bare `ixgbe_ptp_start_cyclecounter`.
> >
> > Signed-off-by: Steve Payne <spayne@aurora.tech>
> > Signed-off-by: Alison Chaiken <achaiken@aurora.tech>
> >
>
> Resetting PTP could be a problem if the clock was not being synchronized with the kernel CLOCK_REALTIME,

That is true, but most likely not really important, as the unmitigated
problem also introduces significant discontinuities in time.
Basically, this patch does not make things worse.

>
> and does result in some loss of timer precision either way due to the delays involved with setting the time.

 That precision loss is negligible compared to jumps resulting from
link down/up, and should be corrected by normal PTP operation very
quickly.

>
> Do you have an example of the clock jump? How much is it?

2021-02-12T09:24:37.741191+00:00 bench-12 phc2sys: [195230.451]
CLOCK_REALTIME phc offset        61 s2 freq  -36503 delay   2298
2021-02-12T09:24:38.741315+00:00 bench-12 phc2sys: [195231.451]
CLOCK_REALTIME phc offset       169 s2 freq  -36377 delay   2294
2021-02-12T09:24:39.741407+00:00 bench-12 phc2sys: [195232.451]
CLOCK_REALTIME phc offset 195213702387037 s2 freq +100000000 delay
2301
2021-02-12T09:24:40.741489+00:00 bench-12 phc2sys: [195233.452]
CLOCK_REALTIME phc offset 195213591220495 s2 freq +100000000 delay
2081

>
> How often is it? Every time?

Every time (though the specific amount differs, it is usually at
similar magnitude)

> More information would help in order to debug what is going wrong here.
>
> Thanks,
> Jake
>
> > ---
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > index 750b02bb2fdc2..ab1ec076fa75f 100644
> > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > @@ -7462,7 +7462,7 @@ static void ixgbe_watchdog_link_is_up(struct
> > ixgbe_adapter *adapter)
> >       adapter->last_rx_ptp_check = jiffies;
> >
> >       if (test_bit(__IXGBE_PTP_RUNNING, &adapter->state))
> > -             ixgbe_ptp_start_cyclecounter(adapter);
> > +             ixgbe_ptp_reset(adapter);
> >
> >       switch (link_speed) {
> >       case IXGBE_LINK_SPEED_10GB_FULL:
> > @@ -7527,7 +7527,7 @@ static void ixgbe_watchdog_link_is_down(struct
> > ixgbe_adapter *adapter)
> >               adapter->flags2 |= IXGBE_FLAG2_SEARCH_FOR_SFP;
> >
> >       if (test_bit(__IXGBE_PTP_RUNNING, &adapter->state))
> > -             ixgbe_ptp_start_cyclecounter(adapter);
> > +             ixgbe_ptp_reset(adapter);
> >
> >       e_info(drv, "NIC Link is Down\n");
> >       netif_carrier_off(netdev);
> > --
> > 2.32.0
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Fwd: [PATCH] Use ixgbe_ptp_reset on linkup/linkdown for X550
  2022-08-01 23:00       ` Fwd: " Ilya Evenbach
@ 2022-08-01 23:29         ` Jacob Keller
  2022-08-02  0:24           ` Alison Chaiken
  2022-08-02  0:26           ` Jacob Keller
  0 siblings, 2 replies; 9+ messages in thread
From: Jacob Keller @ 2022-08-01 23:29 UTC (permalink / raw)
  To: Ilya Evenbach, Alison Chaiken, Steve Payne, jesse.brandeburg,
	richardcochran, netdev, intel-wired-lan



On 8/1/2022 4:00 PM, Ilya Evenbach wrote:
>>> -----Original Message-----
>>> From: achaiken@aurora.tech <achaiken@aurora.tech>
>>> Sent: Monday, August 01, 2022 6:38 AM
>>> To: Brandeburg, Jesse <jesse.brandeburg@intel.com>;
>>> richardcochran@gmail.com
>>> Cc: spayne@aurora.tech; achaiken@aurora.tech; alison@she-devel.com;
>>> netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org
>>> Subject: [PATCH] Use ixgbe_ptp_reset on linkup/linkdown for X550
>>>
>>> From: Steve Payne <spayne@aurora.tech>
>>>
>>> For an unknown reason, when `ixgbe_ptp_start_cyclecounter` is called
>>> from `ixgbe_watchdog_link_is_down` the PHC on the NIC jumps backward
>>> by a seemingly inconsistent amount, which causes discontinuities in
>>> time synchronization. Explicitly reset the NIC's PHC to
>>> `CLOCK_REALTIME` whenever the NIC goes up or down by calling
>>> `ixgbe_ptp_reset` instead of the bare `ixgbe_ptp_start_cyclecounter`.
>>>
>>> Signed-off-by: Steve Payne <spayne@aurora.tech>
>>> Signed-off-by: Alison Chaiken <achaiken@aurora.tech>
>>>
>>
>> Resetting PTP could be a problem if the clock was not being synchronized with the kernel CLOCK_REALTIME,
> 
> That is true, but most likely not really important, as the unmitigated
> problem also introduces significant discontinuities in time.
> Basically, this patch does not make things worse.
> 

Sure, but I am trying to see if I can understand *why* things get wonky.
I suspect the issue is caused because of how we're resetting the
cyclecounter.

>>
>> and does result in some loss of timer precision either way due to the delays involved with setting the time.
> 
>  That precision loss is negligible compared to jumps resulting from
> link down/up, and should be corrected by normal PTP operation very
> quickly.
> 

Only if CLOCK_REALTIME is actually being synchronized. Yes, that is
generally true, but its not necessarily guaranteed.

>>
>> Do you have an example of the clock jump? How much is it?
> 
> 2021-02-12T09:24:37.741191+00:00 bench-12 phc2sys: [195230.451]
> CLOCK_REALTIME phc offset        61 s2 freq  -36503 delay   2298
> 2021-02-12T09:24:38.741315+00:00 bench-12 phc2sys: [195231.451]
> CLOCK_REALTIME phc offset       169 s2 freq  -36377 delay   2294
> 2021-02-12T09:24:39.741407+00:00 bench-12 phc2sys: [195232.451]
> CLOCK_REALTIME phc offset 195213702387037 s2 freq +100000000 delay
> 2301
> 2021-02-12T09:24:40.741489+00:00 bench-12 phc2sys: [195233.452]
> CLOCK_REALTIME phc offset 195213591220495 s2 freq +100000000 delay
> 2081
> 

Thanks.

I think what's actually going on is a bug in the
ixgbe_ptp_start_cyclecounter function where the system time registers
are being reset.

What hardware are you operating on? Do you know if its an X550 board? It
looks like this has been the case since a9763f3cb54c ("ixgbe: Update PTP
to support X550EM_x devices").

The start_cyclecounter was never supposed to modify the current time
registers, but resetting it to 0 as it does for X550 devices would give
the exact behavior you're seeing.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Fwd: [PATCH] Use ixgbe_ptp_reset on linkup/linkdown for X550
  2022-08-01 23:29         ` Jacob Keller
@ 2022-08-02  0:24           ` Alison Chaiken
  2022-08-02  0:38             ` Jacob Keller
  2022-08-02  0:26           ` Jacob Keller
  1 sibling, 1 reply; 9+ messages in thread
From: Alison Chaiken @ 2022-08-02  0:24 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Ilya Evenbach, Steve Payne, jesse.brandeburg, richardcochran,
	netdev, intel-wired-lan

On Mon, Aug 1, 2022 at 4:29 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
>
>
>
> On 8/1/2022 4:00 PM, Ilya Evenbach wrote:
> >>> -----Original Message-----
> >>> From: achaiken@aurora.tech <achaiken@aurora.tech>
> >>> Sent: Monday, August 01, 2022 6:38 AM
> >>> To: Brandeburg, Jesse <jesse.brandeburg@intel.com>;
> >>> richardcochran@gmail.com
> >>> Cc: spayne@aurora.tech; achaiken@aurora.tech; alison@she-devel.com;
> >>> netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org
> >>> Subject: [PATCH] Use ixgbe_ptp_reset on linkup/linkdown for X550
> >>>
> >>> From: Steve Payne <spayne@aurora.tech>
> >>>
> >>> For an unknown reason, when `ixgbe_ptp_start_cyclecounter` is called
> >>> from `ixgbe_watchdog_link_is_down` the PHC on the NIC jumps backward
> >>> by a seemingly inconsistent amount, which causes discontinuities in
> >>> time synchronization. Explicitly reset the NIC's PHC to
> >>> `CLOCK_REALTIME` whenever the NIC goes up or down by calling
> >>> `ixgbe_ptp_reset` instead of the bare `ixgbe_ptp_start_cyclecounter`.
> >>>
> >>> Signed-off-by: Steve Payne <spayne@aurora.tech>
> >>> Signed-off-by: Alison Chaiken <achaiken@aurora.tech>
> >>>
> >>
> >> Resetting PTP could be a problem if the clock was not being synchronized with the kernel CLOCK_REALTIME,
> >
> > That is true, but most likely not really important, as the unmitigated
> > problem also introduces significant discontinuities in time.
> > Basically, this patch does not make things worse.
> >
>
> Sure, but I am trying to see if I can understand *why* things get wonky.
> I suspect the issue is caused because of how we're resetting the
> cyclecounter.
>
> >>
> >> and does result in some loss of timer precision either way due to the delays involved with setting the time.
> >
> >  That precision loss is negligible compared to jumps resulting from
> > link down/up, and should be corrected by normal PTP operation very
> > quickly.
> >
>
> Only if CLOCK_REALTIME is actually being synchronized. Yes, that is
> generally true, but its not necessarily guaranteed.
>
> >>
> >> Do you have an example of the clock jump? How much is it?
> >
> > 2021-02-12T09:24:37.741191+00:00 bench-12 phc2sys: [195230.451]
> > CLOCK_REALTIME phc offset        61 s2 freq  -36503 delay   2298
> > 2021-02-12T09:24:38.741315+00:00 bench-12 phc2sys: [195231.451]
> > CLOCK_REALTIME phc offset       169 s2 freq  -36377 delay   2294
> > 2021-02-12T09:24:39.741407+00:00 bench-12 phc2sys: [195232.451]
> > CLOCK_REALTIME phc offset 195213702387037 s2 freq +100000000 delay
> > 2301
> > 2021-02-12T09:24:40.741489+00:00 bench-12 phc2sys: [195233.452]
> > CLOCK_REALTIME phc offset 195213591220495 s2 freq +100000000 delay
> > 2081
> >
>
> Thanks.
>
> I think what's actually going on is a bug in the
> ixgbe_ptp_start_cyclecounter function where the system time registers
> are being reset.
>
> What hardware are you operating on? Do you know if its an X550 board?

Indeed it is.

> It
> looks like this has been the case since a9763f3cb54c ("ixgbe: Update PTP
> to support X550EM_x devices").

The current test results come from v5.15.49-rt47. We observed the same
problem in 5.4.93-rt51, which contains a9763f3cb54c.

> The start_cyclecounter was never supposed to modify the current time
> registers, but resetting it to 0 as it does for X550 devices would give
> the exact behavior you're seeing.

That certainly sounds plausible.

Thanks,
Alison Chaiken
Aurora Innovation

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Fwd: [PATCH] Use ixgbe_ptp_reset on linkup/linkdown for X550
  2022-08-01 23:29         ` Jacob Keller
  2022-08-02  0:24           ` Alison Chaiken
@ 2022-08-02  0:26           ` Jacob Keller
  2022-10-04 18:14             ` Alison Chaiken
  1 sibling, 1 reply; 9+ messages in thread
From: Jacob Keller @ 2022-08-02  0:26 UTC (permalink / raw)
  To: Ilya Evenbach, Alison Chaiken, Steve Payne, jesse.brandeburg,
	richardcochran, netdev, intel-wired-lan



On 8/1/2022 4:29 PM, Jacob Keller wrote:
> 
> 
> On 8/1/2022 4:00 PM, Ilya Evenbach wrote:
>>>> -----Original Message-----
>>>> From: achaiken@aurora.tech <achaiken@aurora.tech>
>>>> Sent: Monday, August 01, 2022 6:38 AM
>>>> To: Brandeburg, Jesse <jesse.brandeburg@intel.com>;
>>>> richardcochran@gmail.com
>>>> Cc: spayne@aurora.tech; achaiken@aurora.tech; alison@she-devel.com;
>>>> netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org
>>>> Subject: [PATCH] Use ixgbe_ptp_reset on linkup/linkdown for X550
>>>>
>>>> From: Steve Payne <spayne@aurora.tech>
>>>>
>>>> For an unknown reason, when `ixgbe_ptp_start_cyclecounter` is called
>>>> from `ixgbe_watchdog_link_is_down` the PHC on the NIC jumps backward
>>>> by a seemingly inconsistent amount, which causes discontinuities in
>>>> time synchronization. Explicitly reset the NIC's PHC to
>>>> `CLOCK_REALTIME` whenever the NIC goes up or down by calling
>>>> `ixgbe_ptp_reset` instead of the bare `ixgbe_ptp_start_cyclecounter`.
>>>>
>>>> Signed-off-by: Steve Payne <spayne@aurora.tech>
>>>> Signed-off-by: Alison Chaiken <achaiken@aurora.tech>
>>>>
>>>
>>> Resetting PTP could be a problem if the clock was not being synchronized with the kernel CLOCK_REALTIME,
>>
>> That is true, but most likely not really important, as the unmitigated
>> problem also introduces significant discontinuities in time.
>> Basically, this patch does not make things worse.
>>
> 
> Sure, but I am trying to see if I can understand *why* things get wonky.
> I suspect the issue is caused because of how we're resetting the
> cyclecounter.
> 
>>>
>>> and does result in some loss of timer precision either way due to the delays involved with setting the time.
>>
>>  That precision loss is negligible compared to jumps resulting from
>> link down/up, and should be corrected by normal PTP operation very
>> quickly.
>>
> 
> Only if CLOCK_REALTIME is actually being synchronized. Yes, that is
> generally true, but its not necessarily guaranteed.
> 
>>>
>>> Do you have an example of the clock jump? How much is it?
>>
>> 2021-02-12T09:24:37.741191+00:00 bench-12 phc2sys: [195230.451]
>> CLOCK_REALTIME phc offset        61 s2 freq  -36503 delay   2298
>> 2021-02-12T09:24:38.741315+00:00 bench-12 phc2sys: [195231.451]
>> CLOCK_REALTIME phc offset       169 s2 freq  -36377 delay   2294
>> 2021-02-12T09:24:39.741407+00:00 bench-12 phc2sys: [195232.451]
>> CLOCK_REALTIME phc offset 195213702387037 s2 freq +100000000 delay
>> 2301
>> 2021-02-12T09:24:40.741489+00:00 bench-12 phc2sys: [195233.452]
>> CLOCK_REALTIME phc offset 195213591220495 s2 freq +100000000 delay
>> 2081
>>
> 
> Thanks.
> 
> I think what's actually going on is a bug in the
> ixgbe_ptp_start_cyclecounter function where the system time registers
> are being reset.
> 
> What hardware are you operating on? Do you know if its an X550 board? It
> looks like this has been the case since a9763f3cb54c ("ixgbe: Update PTP
> to support X550EM_x devices").
> 
> The start_cyclecounter was never supposed to modify the current time
> registers, but resetting it to 0 as it does for X550 devices would give
> the exact behavior you're seeing.

I just posted an alternative fix which I believe resolves this issue.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Fwd: [PATCH] Use ixgbe_ptp_reset on linkup/linkdown for X550
  2022-08-02  0:24           ` Alison Chaiken
@ 2022-08-02  0:38             ` Jacob Keller
  0 siblings, 0 replies; 9+ messages in thread
From: Jacob Keller @ 2022-08-02  0:38 UTC (permalink / raw)
  To: Alison Chaiken
  Cc: Ilya Evenbach, Steve Payne, jesse.brandeburg, richardcochran,
	netdev, intel-wired-lan



On 8/1/2022 5:24 PM, Alison Chaiken wrote:
> On Mon, Aug 1, 2022 at 4:29 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
>>
>>
>>
>> On 8/1/2022 4:00 PM, Ilya Evenbach wrote:
>>>>> -----Original Message-----
>>>>> From: achaiken@aurora.tech <achaiken@aurora.tech>
>>>>> Sent: Monday, August 01, 2022 6:38 AM
>>>>> To: Brandeburg, Jesse <jesse.brandeburg@intel.com>;
>>>>> richardcochran@gmail.com
>>>>> Cc: spayne@aurora.tech; achaiken@aurora.tech; alison@she-devel.com;
>>>>> netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org
>>>>> Subject: [PATCH] Use ixgbe_ptp_reset on linkup/linkdown for X550
>>>>>
>>>>> From: Steve Payne <spayne@aurora.tech>
>>>>>
>>>>> For an unknown reason, when `ixgbe_ptp_start_cyclecounter` is called
>>>>> from `ixgbe_watchdog_link_is_down` the PHC on the NIC jumps backward
>>>>> by a seemingly inconsistent amount, which causes discontinuities in
>>>>> time synchronization. Explicitly reset the NIC's PHC to
>>>>> `CLOCK_REALTIME` whenever the NIC goes up or down by calling
>>>>> `ixgbe_ptp_reset` instead of the bare `ixgbe_ptp_start_cyclecounter`.
>>>>>
>>>>> Signed-off-by: Steve Payne <spayne@aurora.tech>
>>>>> Signed-off-by: Alison Chaiken <achaiken@aurora.tech>
>>>>>
>>>>
>>>> Resetting PTP could be a problem if the clock was not being synchronized with the kernel CLOCK_REALTIME,
>>>
>>> That is true, but most likely not really important, as the unmitigated
>>> problem also introduces significant discontinuities in time.
>>> Basically, this patch does not make things worse.
>>>
>>
>> Sure, but I am trying to see if I can understand *why* things get wonky.
>> I suspect the issue is caused because of how we're resetting the
>> cyclecounter.
>>
>>>>
>>>> and does result in some loss of timer precision either way due to the delays involved with setting the time.
>>>
>>>  That precision loss is negligible compared to jumps resulting from
>>> link down/up, and should be corrected by normal PTP operation very
>>> quickly.
>>>
>>
>> Only if CLOCK_REALTIME is actually being synchronized. Yes, that is
>> generally true, but its not necessarily guaranteed.
>>
>>>>
>>>> Do you have an example of the clock jump? How much is it?
>>>
>>> 2021-02-12T09:24:37.741191+00:00 bench-12 phc2sys: [195230.451]
>>> CLOCK_REALTIME phc offset        61 s2 freq  -36503 delay   2298
>>> 2021-02-12T09:24:38.741315+00:00 bench-12 phc2sys: [195231.451]
>>> CLOCK_REALTIME phc offset       169 s2 freq  -36377 delay   2294
>>> 2021-02-12T09:24:39.741407+00:00 bench-12 phc2sys: [195232.451]
>>> CLOCK_REALTIME phc offset 195213702387037 s2 freq +100000000 delay
>>> 2301
>>> 2021-02-12T09:24:40.741489+00:00 bench-12 phc2sys: [195233.452]
>>> CLOCK_REALTIME phc offset 195213591220495 s2 freq +100000000 delay
>>> 2081
>>>
>>
>> Thanks.
>>
>> I think what's actually going on is a bug in the
>> ixgbe_ptp_start_cyclecounter function where the system time registers
>> are being reset.
>>
>> What hardware are you operating on? Do you know if its an X550 board?
> 
> Indeed it is.
> 
>> It
>> looks like this has been the case since a9763f3cb54c ("ixgbe: Update PTP
>> to support X550EM_x devices").
> 
> The current test results come from v5.15.49-rt47. We observed the same
> problem in 5.4.93-rt51, which contains a9763f3cb54c.
> 
>> The start_cyclecounter was never supposed to modify the current time
>> registers, but resetting it to 0 as it does for X550 devices would give
>> the exact behavior you're seeing.
> 
> That certainly sounds plausible.
> 
> Thanks,
> Alison Chaiken
> Aurora Innovation

I just posted a fix which moves the SYSTIME clearing out of
start_cyclecounter and into ixgbe_ptp_reset. I'm fairly confident that
its the correct fix, based on the function comments. I think the
implementor for the X550 simply didn't understand the separation of
ixgbe_ptp_start_cyclecouter and ixgbe_ptp_reset.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Fwd: [PATCH] Use ixgbe_ptp_reset on linkup/linkdown for X550
  2022-08-02  0:26           ` Jacob Keller
@ 2022-10-04 18:14             ` Alison Chaiken
  2022-10-04 21:08               ` Jacob Keller
  0 siblings, 1 reply; 9+ messages in thread
From: Alison Chaiken @ 2022-10-04 18:14 UTC (permalink / raw)
  To: anthony.l.nguyen
  Cc: Steve Payne, jesse.brandeburg, richardcochran, netdev,
	intel-wired-lan, Jacob Keller

How about this Intel X550 PTP fix for 6.1?

https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20220801/029590.html

Thanks,
Alison Chaiken
Aurora Innovation

On Mon, Aug 1, 2022 at 5:26 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
>
> On 8/1/2022 4:29 PM, Jacob Keller wrote:
> >
> >
> > On 8/1/2022 4:00 PM, Ilya Evenbach wrote:
> >>>> -----Original Message-----
> >>>> From: achaiken@aurora.tech <achaiken@aurora.tech>
> >>>> Sent: Monday, August 01, 2022 6:38 AM
> >>>> To: Brandeburg, Jesse <jesse.brandeburg@intel.com>;
> >>>> richardcochran@gmail.com
> >>>> Cc: spayne@aurora.tech; achaiken@aurora.tech; alison@she-devel.com;
> >>>> netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org
> >>>> Subject: [PATCH] Use ixgbe_ptp_reset on linkup/linkdown for X550
> >>>>
> >>>> From: Steve Payne <spayne@aurora.tech>
> >>>>
> >>>> For an unknown reason, when `ixgbe_ptp_start_cyclecounter` is called
> >>>> from `ixgbe_watchdog_link_is_down` the PHC on the NIC jumps backward
> >>>> by a seemingly inconsistent amount, which causes discontinuities in
> >>>> time synchronization. Explicitly reset the NIC's PHC to
> >>>> `CLOCK_REALTIME` whenever the NIC goes up or down by calling
> >>>> `ixgbe_ptp_reset` instead of the bare `ixgbe_ptp_start_cyclecounter`.
> >>>>
> >>>> Signed-off-by: Steve Payne <spayne@aurora.tech>
> >>>> Signed-off-by: Alison Chaiken <achaiken@aurora.tech>
> >>>>
> >>>
> >>> Resetting PTP could be a problem if the clock was not being synchronized with the kernel CLOCK_REALTIME,
> >>
> >> That is true, but most likely not really important, as the unmitigated
> >> problem also introduces significant discontinuities in time.
> >> Basically, this patch does not make things worse.
> >>
> >
> > Sure, but I am trying to see if I can understand *why* things get wonky.
> > I suspect the issue is caused because of how we're resetting the
> > cyclecounter.
> >
> >>>
> >>> and does result in some loss of timer precision either way due to the delays involved with setting the time.
> >>
> >>  That precision loss is negligible compared to jumps resulting from
> >> link down/up, and should be corrected by normal PTP operation very
> >> quickly.
> >>
> >
> > Only if CLOCK_REALTIME is actually being synchronized. Yes, that is
> > generally true, but its not necessarily guaranteed.
> >
> >>>
> >>> Do you have an example of the clock jump? How much is it?
> >>
> >> 2021-02-12T09:24:37.741191+00:00 bench-12 phc2sys: [195230.451]
> >> CLOCK_REALTIME phc offset        61 s2 freq  -36503 delay   2298
> >> 2021-02-12T09:24:38.741315+00:00 bench-12 phc2sys: [195231.451]
> >> CLOCK_REALTIME phc offset       169 s2 freq  -36377 delay   2294
> >> 2021-02-12T09:24:39.741407+00:00 bench-12 phc2sys: [195232.451]
> >> CLOCK_REALTIME phc offset 195213702387037 s2 freq +100000000 delay
> >> 2301
> >> 2021-02-12T09:24:40.741489+00:00 bench-12 phc2sys: [195233.452]
> >> CLOCK_REALTIME phc offset 195213591220495 s2 freq +100000000 delay
> >> 2081
> >>
> >
> > Thanks.
> >
> > I think what's actually going on is a bug in the
> > ixgbe_ptp_start_cyclecounter function where the system time registers
> > are being reset.
> >
> > What hardware are you operating on? Do you know if its an X550 board? It
> > looks like this has been the case since a9763f3cb54c ("ixgbe: Update PTP
> > to support X550EM_x devices").
> >
> > The start_cyclecounter was never supposed to modify the current time
> > registers, but resetting it to 0 as it does for X550 devices would give
> > the exact behavior you're seeing.
>
> I just posted an alternative fix which I believe resolves this issue.
>
> Thanks,
> Jake

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Fwd: [PATCH] Use ixgbe_ptp_reset on linkup/linkdown for X550
  2022-10-04 18:14             ` Alison Chaiken
@ 2022-10-04 21:08               ` Jacob Keller
  0 siblings, 0 replies; 9+ messages in thread
From: Jacob Keller @ 2022-10-04 21:08 UTC (permalink / raw)
  To: Alison Chaiken, anthony.l.nguyen
  Cc: Steve Payne, jesse.brandeburg, richardcochran, netdev,
	intel-wired-lan



On 10/4/2022 11:14 AM, Alison Chaiken wrote:
> How about this Intel X550 PTP fix for 6.1?
> 
> https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20220801/029590.html
> 
> Thanks,
> Alison Chaiken
> Aurora Innovation
> 


This was already submitted to net as [1], and it looks like it's already
applied to the stable Linux 5.19, 5.15, and 5.10 stable trees.

I've confirmed that it's in the v6.0 tag as commit 25d7a5f5a6bb ("ixgbe:
stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter")

[1]
https://lore.kernel.org/netdev/20220824193748.874343-2-anthony.l.nguyen@intel.com/

Hope that helps!

Thanks,
Jake

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-10-04 21:08 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-08-01 13:37 [PATCH] Use ixgbe_ptp_reset on linkup/linkdown for X550 achaiken
2022-08-01 21:34 ` Keller, Jacob E
     [not found]   ` <CAFzL-7tX845o2kJmE4o8EhbeD-=vkR6rmaiz_ZEWfSD4W+iWEA@mail.gmail.com>
     [not found]     ` <CAJmffrqxwFyRGpMRYRYLPi3yrLQgzqnW5UKgbgACGNqoN_hsVQ@mail.gmail.com>
2022-08-01 23:00       ` Fwd: " Ilya Evenbach
2022-08-01 23:29         ` Jacob Keller
2022-08-02  0:24           ` Alison Chaiken
2022-08-02  0:38             ` Jacob Keller
2022-08-02  0:26           ` Jacob Keller
2022-10-04 18:14             ` Alison Chaiken
2022-10-04 21:08               ` Jacob Keller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).