public inbox for netdev@vger.kernel.org
From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
To: Bob Van Valzah <bob@vanvalzah.com>, intel-wired-lan@lists.osuosl.org
Cc: anthony.l.nguyen@intel.com, netdev@vger.kernel.org,
	julianstj@fb.com, jeff@jeffgeerling.com
Subject: Re: [PATCH] igc: fix Tx timestamp timeout caused by unlocked TIMINCA write in adj fine
Date: Mon, 30 Mar 2026 17:39:08 +0100
Message-ID: <65977d5b-16eb-418c-995e-6a918f67707a@linux.dev>
In-Reply-To: <D1C3B3DF-960F-40C7-BBD7-994359F0C8AD@vanvalzah.com>

On 29/03/2026 04:25, Bob Van Valzah wrote:
> Hi,
> 
> We found a race in igc_ptp_adjfine_i225() that causes "Tx timestamp
> timeout" errors and eventually wedges EXTTS when a PTP grandmaster
> (ptp4l with hardware timestamping) runs concurrently with PHC
> frequency discipline (any GPSDO calling clock_adjtime ADJ_FREQUENCY).
> 
> Root cause: igc_ptp_adjfine_i225() writes IGC_TIMINCA without holding
> any lock.  Every other PTP clock operation in igc_ptp.c (adjtime,
> gettime, settime) holds tmreg_lock, but adjfine does not.  When the
> increment rate changes while the hardware is capturing a TX timestamp,
> the captured value is corrupt.  The driver retries for
> IGC_PTP_TX_TIMEOUT (15s), then logs the timeout and frees the skb.
> Repeated occurrences eventually prevent EXTTS from delivering events.
> 
> The attached reproducer (triggers in ~17 seconds on i226):
> 
>    One thread calling clock_adjtime(ADJ_FREQUENCY) at ~200k/s on the
>    PHC, another sending UDP packets with SO_TIMESTAMPING requesting
>    hardware TX timestamps at ~100k/s.  A Python reproducer is at:
>    https://github.com/bobvan/PePPAR-Fix/blob/main/tools/igc_tx_timeout_repro.py
> 
>    At realistic rates (1 Hz adjfine from a GPSDO + ptp4l at 128 Hz
>    sync), the race triggers in ~30 minutes.
> 
> The attached patch holds ptp_tx_lock around the TIMINCA write and
> skips the write if any TX timestamps are pending (tx_tstamp[i].skb
> != NULL), returning -EBUSY.  This doesn't fully close the hardware
> race (a new TX capture can start between the check and the write),
> but at realistic rates the residual probability gives ~25 year MTBF
> vs ~30 minutes without the patch.
> 
> A complete fix would likely require either disabling TX timestamping
> around TIMINCA writes (via TSYNCTXCTL), or making the timeout recovery
> path more robust so a single corrupt timestamp doesn't wedge the
> subsystem.  We'd welcome guidance from the igc maintainers on the
> preferred approach.
> 
> Tested on:
>    - Intel i226 (TimeHAT v5 board on Raspberry Pi 5)
>    - Kernel 6.12.62+rpt-rpi-2712 (Raspberry Pi OS)
>    - Intel out-of-tree igc driver 5.4.0-7642.46
>    - Stock upstream igc_ptp.c (same code, same bug)
> 
> 	Bob
> 
> ---
> 
>   drivers/net/ethernet/intel/igc/igc_ptp.c | 18 +++++++++++++++++-
>   1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc_ptp.c b/drivers/net/ethernet/intel/igc/igc_ptp.c
> index XXXXXXX..XXXXXXX 100644
> --- a/drivers/net/ethernet/intel/igc/igc_ptp.c
> +++ b/drivers/net/ethernet/intel/igc/igc_ptp.c
> @@ -47,8 +47,10 @@ static int igc_ptp_adjfine_i225(struct ptp_clock_info *ptp, long scaled_ppm)
>   {
>          struct igc_adapter *igc = container_of(ptp, struct igc_adapter,
>                                                 ptp_caps);
>          struct igc_hw *hw = &igc->hw;
> +       unsigned long flags;
>          int neg_adj = 0;
>          u64 rate;
>          u32 inca;
> +       int i;
> 
>          if (scaled_ppm < 0) {
>                  neg_adj = 1;
> @@ -63,7 +65,21 @@ static int igc_ptp_adjfine_i225(struct ptp_clock_info *ptp, long scaled_ppm)
>          if (neg_adj)
>                  inca |= ISGN;
> 
> -       wr32(IGC_TIMINCA, inca);
> +       /* Changing the clock increment rate while a TX timestamp is being
> +        * captured by the hardware can corrupt the timestamp, causing the
> +        * driver to report "Tx timestamp timeout" and eventually wedging
> +        * the EXTTS subsystem.  Serialize with pending TX timestamps:
> +        * skip the rate change if any are in flight.
> +        */
> +       spin_lock_irqsave(&igc->ptp_tx_lock, flags);
> +       for (i = 0; i < IGC_MAX_TX_TSTAMP_REGS; i++) {
> +               if (igc->tx_tstamp[i].skb) {
> +                       spin_unlock_irqrestore(&igc->ptp_tx_lock, flags);
> +                       return -EBUSY;
> +               }
> +       }
> +       wr32(IGC_TIMINCA, inca);
> +       spin_unlock_irqrestore(&igc->ptp_tx_lock, flags);

This is a somewhat odd solution: with a large number of TX timestamp
packets in flight, we may end up with no successful calls to adjfine at
all. Another problem is that access to the timing registers is guarded
by tmreg_lock, but here you take ptp_tx_lock, which protects the
queue.
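
To make the first concern concrete, here is a minimal userspace sketch,
not driver code. It assumes the patched adjfine returns -EBUSY whenever
a TX timestamp is pending, so a caller has to retry. The names adj_fn,
adjfreq_retry and fake_adjfine are hypothetical and exist only for this
illustration; a real caller would wrap clock_adjtime() with
ADJ_FREQUENCY in the callback.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback: applies a frequency offset and returns 0 on
 * success or a negative errno.  In a real tool this would wrap
 * clock_adjtime() with ADJ_FREQUENCY on the PHC.
 */
typedef int (*adj_fn)(long scaled_ppm, void *ctx);

/* Retry an adjustment that may be rejected with -EBUSY.
 * Returns 0 on success, the first non-EBUSY error, or -EBUSY if all
 * max_tries attempts were rejected.  If tries_out is non-NULL it
 * receives the number of attempts made.
 */
static int adjfreq_retry(adj_fn fn, void *ctx, long scaled_ppm,
                         int max_tries, int *tries_out)
{
    int ret = -EBUSY;
    int tries = 0;

    assert(fn != NULL);

    while (tries < max_tries) {
        tries++;
        ret = fn(scaled_ppm, ctx);
        if (ret != -EBUSY)
            break;
    }
    if (tries_out)
        *tries_out = tries;
    return ret;
}

/* Stand-in for the patched driver (assumption, for illustration only):
 * rejects the call while *pending > 0, and one pending TX timestamp
 * completes between consecutive calls.
 */
static int fake_adjfine(long scaled_ppm, void *ctx)
{
    int *pending = ctx;

    (void)scaled_ppm;
    if (*pending > 0) {
        (*pending)--;
        return -EBUSY;
    }
    return 0;
}
```

If TX timestamps are requested faster than they complete, the pending
count in this model never reaches zero and every attempt is rejected,
which is the case where adjfine never succeeds.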

Were you able to recover the "corrupted" timestamps to figure out why
they are discarded?


> 
>          return 0;
>   }
> --
> 2.39.2


