* [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 @ 2026-02-05 7:54 Kurt Kanzenbach 2026-02-05 9:47 ` [Intel-wired-lan] " Loktionov, Aleksandr 2026-02-05 12:12 ` Sebastian Andrzej Siewior 0 siblings, 2 replies; 34+ messages in thread From: Kurt Kanzenbach @ 2026-02-05 7:54 UTC (permalink / raw) To: Tony Nguyen, Przemek Kitszel Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Richard Cochran, Vinicius Costa Gomes, Sebastian Andrzej Siewior, Paul Menzel, Vadim Fedorenko, Miroslav Lichvar, Jacob Keller, intel-wired-lan, netdev, linux-kernel, Kurt Kanzenbach Retrieve Tx timestamp directly from interrupt handler for i210. The current implementation uses schedule_work() which is executed by the system work queue to retrieve Tx timestamps. This increases latency and can lead to timeouts in case of heavy system load. i210 is often used in industrial systems, where timestamp timeouts can be fatal. Therefore, fetch the timestamp directly from the interrupt handler. The work queue code stays for all other NICs supported by igb. Tested on Intel i210 and i350. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> --- Changes in v3: - Switch back to IRQ, but for i210 only - Keep kworker for all other NICs like i350 (Miroslav) - Link to v2: https://lore.kernel.org/r/20250822-igb_irq_ts-v2-1-1ac37078a7a4@linutronix.de Changes in v2: - Switch from IRQ to PTP aux worker due to NTP performance regression (Miroslav) - Link to v1: https://lore.kernel.org/r/20250815-igb_irq_ts-v1-1-8c6fc0353422@linutronix.de --- drivers/net/ethernet/intel/igb/igb.h | 1 + drivers/net/ethernet/intel/igb/igb_main.c | 5 ++++- drivers/net/ethernet/intel/igb/igb_ptp.c | 22 ++++++++++++++++++++++ 3 files changed, 27 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h index 0fff1df81b7b..1de29670784e 100644 --- a/drivers/net/ethernet/intel/igb/igb.h +++ b/drivers/net/ethernet/intel/igb/igb.h @@ -776,6 +776,7 @@ int igb_ptp_hwtstamp_get(struct net_device *netdev, int igb_ptp_hwtstamp_set(struct net_device *netdev, struct kernel_hwtstamp_config *config, struct netlink_ext_ack *extack); +void igb_ptp_tx_tstamp_event(struct igb_adapter *adapter); void igb_set_flag_queue_pairs(struct igb_adapter *, const u32); unsigned int igb_get_max_rss_queues(struct igb_adapter *); #ifdef CONFIG_IGB_HWMON diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index dbea37269d2c..d0d9245e6d72 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -7078,7 +7078,10 @@ static void igb_tsync_interrupt(struct igb_adapter *adapter) if (tsicr & E1000_TSICR_TXTS) { /* retrieve hardware timestamp */ - schedule_work(&adapter->ptp_tx_work); + if (hw->mac.type == e1000_i210) + igb_ptp_tx_tstamp_event(adapter); + else + schedule_work(&adapter->ptp_tx_work); } if (tsicr & TSINTR_TT0) diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c b/drivers/net/ethernet/intel/igb/igb_ptp.c index bd85d02ecadd..8c8f2b8615f7 100644 --- a/drivers/net/ethernet/intel/igb/igb_ptp.c +++ b/drivers/net/ethernet/intel/igb/igb_ptp.c @@ -796,6 +796,28 @@ static int igb_ptp_verify_pin(struct ptp_clock_info *ptp, unsigned int pin, return 0; } +/** + * igb_ptp_tx_tstamp_event + * @adapter: pointer to igb adapter + * + * This function checks the TSYNCTXCTL valid bit and stores the Tx hardware + * timestamp at the current skb. + **/ +void igb_ptp_tx_tstamp_event(struct igb_adapter *adapter) +{ + struct e1000_hw *hw = &adapter->hw; + u32 tsynctxctl; + + if (!adapter->ptp_tx_skb) + return; + + tsynctxctl = rd32(E1000_TSYNCTXCTL); + if (WARN_ON_ONCE(!(tsynctxctl & E1000_TSYNCTXCTL_VALID))) + return; + + igb_ptp_tx_hwtstamp(adapter); +} + /** * igb_ptp_tx_work * @work: pointer to work struct --- base-commit: e07d0d30939990da377672ef49ca09763b4fbc79 change-id: 20250813-igb_irq_ts-1aa77cc7b4cb Best regards, -- Kurt Kanzenbach <kurt@linutronix.de> ^ permalink raw reply related [flat|nested] 34+ messages in thread
* RE: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-05 7:54 [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 Kurt Kanzenbach @ 2026-02-05 9:47 ` Loktionov, Aleksandr 2026-02-05 10:03 ` Sebastian Andrzej Siewior 2026-02-05 11:58 ` Kurt Kanzenbach 2026-02-05 12:12 ` Sebastian Andrzej Siewior 1 sibling, 2 replies; 34+ messages in thread From: Loktionov, Aleksandr @ 2026-02-05 9:47 UTC (permalink / raw) To: Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw Cc: Paul Menzel, Vadim Fedorenko, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, Eric Dumazet, intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Jakub Kicinski, Paolo Abeni, David S. Miller, Sebastian Andrzej Siewior > -----Original Message----- > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf > Of Kurt Kanzenbach > Sent: Thursday, February 5, 2026 8:55 AM > To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, > Przemyslaw <przemyslaw.kitszel@intel.com> > Cc: Paul Menzel <pmenzel@molgen.mpg.de>; Vadim Fedorenko > <vadim.fedorenko@linux.dev>; Gomes, Vinicius > <vinicius.gomes@intel.com>; netdev@vger.kernel.org; Richard Cochran > <richardcochran@gmail.com>; Kurt Kanzenbach <kurt@linutronix.de>; > linux-kernel@vger.kernel.org; Andrew Lunn <andrew+netdev@lunn.ch>; > Eric Dumazet <edumazet@google.com>; intel-wired-lan@lists.osuosl.org; > Keller, Jacob E <jacob.e.keller@intel.com>; Jakub Kicinski > <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; David S. Miller > <davem@davemloft.net>; Sebastian Andrzej Siewior > <bigeasy@linutronix.de> > Subject: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx > timestamp directly from interrupt for i210 > > Retrieve Tx timestamp directly from interrupt handler for i210. > > The current implementation uses schedule_work() which is executed by > the system work queue to retrieve Tx timestamps. This increases > latency and can lead to timeouts in case of heavy system load. i210 is > often used in industrial systems, where timestamp timeouts can be > fatal. > > Therefore, fetch the timestamp directly from the interrupt handler. > > The work queue code stays for all other NICs supported by igb. > > Tested on Intel i210 and i350. > > Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> > --- > Changes in v3: > - Switch back to IRQ, but for i210 only > - Keep kworker for all other NICs like i350 (Miroslav) > - Link to v2: https://lore.kernel.org/r/20250822-igb_irq_ts-v2-1- > 1ac37078a7a4@linutronix.de > > Changes in v2: > - Switch from IRQ to PTP aux worker due to NTP performance regression > (Miroslav) > - Link to v1: https://lore.kernel.org/r/20250815-igb_irq_ts-v1-1- > 8c6fc0353422@linutronix.de > --- > drivers/net/ethernet/intel/igb/igb.h | 1 + > drivers/net/ethernet/intel/igb/igb_main.c | 5 ++++- > drivers/net/ethernet/intel/igb/igb_ptp.c | 22 ++++++++++++++++++++++ > 3 files changed, 27 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/intel/igb/igb.h > b/drivers/net/ethernet/intel/igb/igb.h > index 0fff1df81b7b..1de29670784e 100644 > --- a/drivers/net/ethernet/intel/igb/igb.h > +++ b/drivers/net/ethernet/intel/igb/igb.h > @@ -776,6 +776,7 @@ int igb_ptp_hwtstamp_get(struct net_device > *netdev, int igb_ptp_hwtstamp_set(struct net_device *netdev, > struct kernel_hwtstamp_config *config, > struct netlink_ext_ack *extack); > +void igb_ptp_tx_tstamp_event(struct igb_adapter *adapter); > void igb_set_flag_queue_pairs(struct igb_adapter *, const u32); > unsigned int igb_get_max_rss_queues(struct igb_adapter *); #ifdef > CONFIG_IGB_HWMON diff --git > a/drivers/net/ethernet/intel/igb/igb_main.c > b/drivers/net/ethernet/intel/igb/igb_main.c > index dbea37269d2c..d0d9245e6d72 100644 > --- a/drivers/net/ethernet/intel/igb/igb_main.c > +++ b/drivers/net/ethernet/intel/igb/igb_main.c > @@ -7078,7 +7078,10 @@ static void igb_tsync_interrupt(struct > igb_adapter *adapter) > > if (tsicr & E1000_TSICR_TXTS) { > /* retrieve hardware timestamp */ > - schedule_work(&adapter->ptp_tx_work); > + if (hw->mac.type == e1000_i210) > + igb_ptp_tx_tstamp_event(adapter); <-Called from IRQ! > + else > + schedule_work(&adapter->ptp_tx_work); > } > > if (tsicr & TSINTR_TT0) > diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c > b/drivers/net/ethernet/intel/igb/igb_ptp.c > index bd85d02ecadd..8c8f2b8615f7 100644 > --- a/drivers/net/ethernet/intel/igb/igb_ptp.c > +++ b/drivers/net/ethernet/intel/igb/igb_ptp.c > @@ -796,6 +796,28 @@ static int igb_ptp_verify_pin(struct > ptp_clock_info *ptp, unsigned int pin, > return 0; > } > > +/** > + * igb_ptp_tx_tstamp_event > + * @adapter: pointer to igb adapter > + * > + * This function checks the TSYNCTXCTL valid bit and stores the Tx > +hardware > + * timestamp at the current skb. > + **/ > +void igb_ptp_tx_tstamp_event(struct igb_adapter *adapter) { > + struct e1000_hw *hw = &adapter->hw; > + u32 tsynctxctl; > + > + if (!adapter->ptp_tx_skb) > + return; > + > + tsynctxctl = rd32(E1000_TSYNCTXCTL); > + if (WARN_ON_ONCE(!(tsynctxctl & E1000_TSYNCTXCTL_VALID))) > + return; > + > + igb_ptp_tx_hwtstamp(adapter); <-Calls existing function designed for work queue! skb_tstamp_tx() can sleep Smells like sleep-in-atomic isn't it? > +} > + > /** > * igb_ptp_tx_work > * @work: pointer to work struct > > --- > base-commit: e07d0d30939990da377672ef49ca09763b4fbc79 > change-id: 20250813-igb_irq_ts-1aa77cc7b4cb > > Best regards, > -- > Kurt Kanzenbach <kurt@linutronix.de> ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: RE: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-05 9:47 ` [Intel-wired-lan] " Loktionov, Aleksandr @ 2026-02-05 10:03 ` Sebastian Andrzej Siewior 2026-02-05 10:37 ` Loktionov, Aleksandr 2026-02-05 11:58 ` Kurt Kanzenbach 1 sibling, 1 reply; 34+ messages in thread From: Sebastian Andrzej Siewior @ 2026-02-05 10:03 UTC (permalink / raw) To: Loktionov, Aleksandr Cc: Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Vadim Fedorenko, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, Eric Dumazet, intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Jakub Kicinski, Paolo Abeni, David S. Miller On 2026-02-05 09:47:14 [+0000], Loktionov, Aleksandr wrote: … > > --- a/drivers/net/ethernet/intel/igb/igb_ptp.c > > +++ b/drivers/net/ethernet/intel/igb/igb_ptp.c > > @@ -796,6 +796,28 @@ static int igb_ptp_verify_pin(struct … > > + igb_ptp_tx_hwtstamp(adapter); <-Calls existing function designed for work queue! > > skb_tstamp_tx() can sleep > Smells like sleep-in-atomic isn't it? How or where can it sleep? Sebastian ^ permalink raw reply [flat|nested] 34+ messages in thread
* RE: RE: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-05 10:03 ` Sebastian Andrzej Siewior @ 2026-02-05 10:37 ` Loktionov, Aleksandr 2026-02-05 10:52 ` Sebastian Andrzej Siewior 2026-02-05 11:56 ` Vadim Fedorenko 0 siblings, 2 replies; 34+ messages in thread From: Loktionov, Aleksandr @ 2026-02-05 10:37 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Vadim Fedorenko, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, Eric Dumazet, intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Jakub Kicinski, Paolo Abeni, David S. Miller > -----Original Message----- > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > Sent: Thursday, February 5, 2026 11:04 AM > To: Loktionov, Aleksandr <aleksandr.loktionov@intel.com> > Cc: Kurt Kanzenbach <kurt@linutronix.de>; Nguyen, Anthony L > <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw > <przemyslaw.kitszel@intel.com>; Paul Menzel <pmenzel@molgen.mpg.de>; > Vadim Fedorenko <vadim.fedorenko@linux.dev>; Gomes, Vinicius > <vinicius.gomes@intel.com>; netdev@vger.kernel.org; Richard Cochran > <richardcochran@gmail.com>; linux-kernel@vger.kernel.org; Andrew Lunn > <andrew+netdev@lunn.ch>; Eric Dumazet <edumazet@google.com>; intel- > wired-lan@lists.osuosl.org; Keller, Jacob E > <jacob.e.keller@intel.com>; Jakub Kicinski <kuba@kernel.org>; Paolo > Abeni <pabeni@redhat.com>; David S. Miller <davem@davemloft.net> > Subject: Re: RE: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve > Tx timestamp directly from interrupt for i210 > > On 2026-02-05 09:47:14 [+0000], Loktionov, Aleksandr wrote: > … > > > --- a/drivers/net/ethernet/intel/igb/igb_ptp.c > > > +++ b/drivers/net/ethernet/intel/igb/igb_ptp.c > > > @@ -796,6 +796,28 @@ static int igb_ptp_verify_pin(struct > … > > > + igb_ptp_tx_hwtstamp(adapter); <-Calls existing function > designed for work queue! > > > > skb_tstamp_tx() can sleep > > Smells like sleep-in-atomic isn't it? > > How or where can it sleep? > > Sebastian igb_ptp_tx_hwtstamp() -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/skb_tstamp_tx -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/__skb_complete_tx_timestamp -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/sock_queue_err_skb -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/skb_queue_tail -> https://elixir.bootlin.com/linux/v6.19-rc5/source/net/core/skbuff.c#L4075 spin_lock_irqsave(&wq_head->lock, flags); <- RT mutex can sleep ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: RE: RE: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-05 10:37 ` Loktionov, Aleksandr @ 2026-02-05 10:52 ` Sebastian Andrzej Siewior 2026-02-05 11:56 ` Vadim Fedorenko 1 sibling, 0 replies; 34+ messages in thread From: Sebastian Andrzej Siewior @ 2026-02-05 10:52 UTC (permalink / raw) To: Loktionov, Aleksandr Cc: Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Vadim Fedorenko, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, Eric Dumazet, intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Jakub Kicinski, Paolo Abeni, David S. Miller On 2026-02-05 10:37:05 [+0000], Loktionov, Aleksandr wrote: > > > > How or where can it sleep? > > > > Sebastian > > igb_ptp_tx_hwtstamp() -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/skb_tstamp_tx -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/__skb_complete_tx_timestamp -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/sock_queue_err_skb -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/skb_queue_tail -> https://elixir.bootlin.com/linux/v6.19-rc5/source/net/core/skbuff.c#L4075 Would you please quote an actual call chain that can be looked up and not this where a line crosses 300 characters? > spin_lock_irqsave(&wq_head->lock, flags); <- RT mutex can sleep Okay. So you are concerned about this spinlock_t, I see. igb_tsync_interrupt() also invokes ptp_clock_event() which acquires pps_event_time::tsevqs_lock. Why is this not a problem? Sebastian ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-05 10:37 ` Loktionov, Aleksandr 2026-02-05 10:52 ` Sebastian Andrzej Siewior @ 2026-02-05 11:56 ` Vadim Fedorenko 2026-02-05 14:51 ` Sebastian Andrzej Siewior 1 sibling, 1 reply; 34+ messages in thread From: Vadim Fedorenko @ 2026-02-05 11:56 UTC (permalink / raw) To: Loktionov, Aleksandr, Sebastian Andrzej Siewior Cc: Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, Eric Dumazet, intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Jakub Kicinski, Paolo Abeni, David S. Miller On 05/02/2026 10:37, Loktionov, Aleksandr wrote: > > >> -----Original Message----- >> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> >> Sent: Thursday, February 5, 2026 11:04 AM >> To: Loktionov, Aleksandr <aleksandr.loktionov@intel.com> >> Cc: Kurt Kanzenbach <kurt@linutronix.de>; Nguyen, Anthony L >> <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw >> <przemyslaw.kitszel@intel.com>; Paul Menzel <pmenzel@molgen.mpg.de>; >> Vadim Fedorenko <vadim.fedorenko@linux.dev>; Gomes, Vinicius >> <vinicius.gomes@intel.com>; netdev@vger.kernel.org; Richard Cochran >> <richardcochran@gmail.com>; linux-kernel@vger.kernel.org; Andrew Lunn >> <andrew+netdev@lunn.ch>; Eric Dumazet <edumazet@google.com>; intel- >> wired-lan@lists.osuosl.org; Keller, Jacob E >> <jacob.e.keller@intel.com>; Jakub Kicinski <kuba@kernel.org>; Paolo >> Abeni <pabeni@redhat.com>; David S. Miller <davem@davemloft.net> >> Subject: Re: RE: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve >> Tx timestamp directly from interrupt for i210 >> >> On 2026-02-05 09:47:14 [+0000], Loktionov, Aleksandr wrote: >> … >>>> --- a/drivers/net/ethernet/intel/igb/igb_ptp.c >>>> +++ b/drivers/net/ethernet/intel/igb/igb_ptp.c >>>> @@ -796,6 +796,28 @@ static int igb_ptp_verify_pin(struct >> … >>>> + igb_ptp_tx_hwtstamp(adapter); <-Calls existing function >> designed for work queue! >>> >>> skb_tstamp_tx() can sleep >>> Smells like sleep-in-atomic isn't it? >> >> How or where can it sleep? >> >> Sebastian > > igb_ptp_tx_hwtstamp() -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/skb_tstamp_tx -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/__skb_complete_tx_timestamp -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/sock_queue_err_skb -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/skb_queue_tail -> https://elixir.bootlin.com/linux/v6.19-rc5/source/net/core/skbuff.c#L4075 > > spin_lock_irqsave(&wq_head->lock, flags); <- RT mutex can sleep Hmm... that actually means we have some drivers broken for RT kernels if they are processing TX timestamps within a single irq vector: - hisilicon/hns3 - intel/i40e (and ice probably) - marvell/mvpp2 For igb/igc/i40e it's still OK to process TX timestamps directly in MSI-X configuration, as ring processing has separate vector, right? But in general skb_tstamp_tx should be moved to BH processing (NAPI poll callback). ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-05 11:56 ` Vadim Fedorenko @ 2026-02-05 14:51 ` Sebastian Andrzej Siewior 2026-02-05 16:27 ` Vadim Fedorenko 0 siblings, 1 reply; 34+ messages in thread From: Sebastian Andrzej Siewior @ 2026-02-05 14:51 UTC (permalink / raw) To: Vadim Fedorenko Cc: Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, Eric Dumazet, intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Jakub Kicinski, Paolo Abeni, David S. Miller On 2026-02-05 11:56:44 [+0000], Vadim Fedorenko wrote: > On 05/02/2026 10:37, Loktionov, Aleksandr wrote: > > spin_lock_irqsave(&wq_head->lock, flags); <- RT mutex can sleep > > Hmm... that actually means we have some drivers broken for RT kernels if > they are processing TX timestamps within a single irq vector: > - hisilicon/hns3 > - intel/i40e (and ice probably) > - marvell/mvpp2 > > For igb/igc/i40e it's still OK to process TX timestamps directly in > MSI-X configuration, as ring processing has separate vector, right? The statement made above is not accurate. Each and every driver does request_irq() and here on PREEMPT_RT you can freely acquire spinlock_t. But !RT looks problematic… __skb_tstamp_tx() invokes skb_may_tx_timestamp() which should exit early most of the time due to the passed bool (which is true) or sysctl_tstamp_allow_data which is true. However, should both be false then it tries to read_lock_bh(&sk->sk_callback_lock); where lockdep will complain because this lock is now acquired with disabled interrupts. The function will attempt do free the fresh/ cloned skb in error case via kfree_skb(). Since it is fresh skb, sk_buff::destructor is NULL and the warning in skb_release_head_state() won't trigger. So the only thing that bothers me is the read_lock_bh() in skb_may_tx_timestamp() which deadlocks if the socket is write-locked on the same CPU. > But in general skb_tstamp_tx should be moved to BH processing (NAPI poll > callback). Sebastian ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-05 14:51 ` Sebastian Andrzej Siewior @ 2026-02-05 16:27 ` Vadim Fedorenko 2026-02-05 16:43 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 34+ messages in thread From: Vadim Fedorenko @ 2026-02-05 16:27 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, Eric Dumazet, intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Jakub Kicinski, Paolo Abeni, David S. Miller On 05/02/2026 14:51, Sebastian Andrzej Siewior wrote: > On 2026-02-05 11:56:44 [+0000], Vadim Fedorenko wrote: >> On 05/02/2026 10:37, Loktionov, Aleksandr wrote: >>> spin_lock_irqsave(&wq_head->lock, flags); <- RT mutex can sleep >> >> Hmm... that actually means we have some drivers broken for RT kernels if >> they are processing TX timestamps within a single irq vector: >> - hisilicon/hns3 >> - intel/i40e (and ice probably) >> - marvell/mvpp2 >> >> For igb/igc/i40e it's still OK to process TX timestamps directly in >> MSI-X configuration, as ring processing has separate vector, right? > > The statement made above is not accurate. Each and every driver does > request_irq() and here on PREEMPT_RT you can freely acquire spinlock_t. > > But !RT looks problematic… > > __skb_tstamp_tx() invokes skb_may_tx_timestamp() which should exit early > most of the time due to the passed bool (which is true) or > sysctl_tstamp_allow_data which is true. However, should both be false > then it tries to > read_lock_bh(&sk->sk_callback_lock); > > where lockdep will complain because this lock is now acquired with > disabled interrupts. > > The function will attempt do free the fresh/ cloned skb in error case > via kfree_skb(). Since it is fresh skb, sk_buff::destructor is NULL and > the warning in skb_release_head_state() won't trigger. > > So the only thing that bothers me is the read_lock_bh() in > skb_may_tx_timestamp() which deadlocks if the socket is write-locked on > the same CPU. Alright. Now you make me think whether we should enforce OPT_TSONLY option on socket which doesn't have CAP_NET_RAW? Then we can get rid of this check, and in case sysctl was flipped off - drop TX timestamps as it's done now? ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-05 16:27 ` Vadim Fedorenko @ 2026-02-05 16:43 ` Sebastian Andrzej Siewior 2026-02-05 16:48 ` Vadim Fedorenko 0 siblings, 1 reply; 34+ messages in thread From: Sebastian Andrzej Siewior @ 2026-02-05 16:43 UTC (permalink / raw) To: Vadim Fedorenko Cc: Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, Eric Dumazet, intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Jakub Kicinski, Paolo Abeni, David S. Miller On 2026-02-05 16:27:03 [+0000], Vadim Fedorenko wrote: > > So the only thing that bothers me is the read_lock_bh() in > > skb_may_tx_timestamp() which deadlocks if the socket is write-locked on > > the same CPU. > > Alright. Now you make me think whether we should enforce OPT_TSONLY > option on socket which doesn't have CAP_NET_RAW? Then we can get rid of this > check, and in case sysctl was flipped off - drop TX timestamps as > it's done now? This would "fix" this problem for all users which do deliver the timestamp from their IRQ handler instead of napi. There are a few of those… This would be considered stable material, right? (despite the fact that we have it for quite some time and nobody complained so far). Sebastian ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-05 16:43 ` Sebastian Andrzej Siewior @ 2026-02-05 16:48 ` Vadim Fedorenko 2026-02-05 21:41 ` Willem de Bruijn 0 siblings, 1 reply; 34+ messages in thread From: Vadim Fedorenko @ 2026-02-05 16:48 UTC (permalink / raw) To: Sebastian Andrzej Siewior, Willem de Bruijn, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller Cc: Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, intel-wired-lan@lists.osuosl.org, Keller, Jacob E On 05/02/2026 16:43, Sebastian Andrzej Siewior wrote: > On 2026-02-05 16:27:03 [+0000], Vadim Fedorenko wrote: >>> So the only thing that bothers me is the read_lock_bh() in >>> skb_may_tx_timestamp() which deadlocks if the socket is write-locked on >>> the same CPU. >> >> Alright. Now you make me think whether we should enforce OPT_TSONLY >> option on socket which doesn't have CAP_NET_RAW? Then we can get rid of this >> check, and in case sysctl was flipped off - drop TX timestamps as >> it's done now? > > This would "fix" this problem for all users which do deliver the > timestamp from their IRQ handler instead of napi. There are a few of > those… > This would be considered stable material, right? (despite the fact that > we have it for quite some time and nobody complained so far). cc: Willem as he is the author of the check introduced back in 2015. But it's more like a question to maintainers whether it is acceptable way of "fixing" drivers or it's no-go solution ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-05 16:48 ` Vadim Fedorenko @ 2026-02-05 21:41 ` Willem de Bruijn 2026-02-06 7:44 ` Sebastian Andrzej Siewior 2026-02-06 10:12 ` Vadim Fedorenko 0 siblings, 2 replies; 34+ messages in thread From: Willem de Bruijn @ 2026-02-05 21:41 UTC (permalink / raw) To: Vadim Fedorenko, Sebastian Andrzej Siewior, Willem de Bruijn, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller Cc: Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, intel-wired-lan@lists.osuosl.org, Keller, Jacob E Vadim Fedorenko wrote: > On 05/02/2026 16:43, Sebastian Andrzej Siewior wrote: > > On 2026-02-05 16:27:03 [+0000], Vadim Fedorenko wrote: > >>> So the only thing that bothers me is the read_lock_bh() in > >>> skb_may_tx_timestamp() which deadlocks if the socket is write-locked on > >>> the same CPU. > >> > >> Alright. Now you make me think whether we should enforce OPT_TSONLY > >> option on socket which doesn't have CAP_NET_RAW? Then we can get rid of this > >> check, and in case sysctl was flipped off - drop TX timestamps as > >> it's done now? > > > > This would "fix" this problem for all users which do deliver the > > timestamp from their IRQ handler instead of napi. There are a few of > > those… > > This would be considered stable material, right? (despite the fact that > > we have it for quite some time and nobody complained so far). > > cc: Willem as he is the author of the check introduced back in 2015. > > But it's more like a question to maintainers whether it is acceptable > way of "fixing" drivers or it's no-go solution Requiring OPT_TSONLY unless CAP_NET_RAW would break legacy users. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-05 21:41 ` Willem de Bruijn @ 2026-02-06 7:44 ` Sebastian Andrzej Siewior 2026-02-06 10:12 ` Vadim Fedorenko 1 sibling, 0 replies; 34+ messages in thread From: Sebastian Andrzej Siewior @ 2026-02-06 7:44 UTC (permalink / raw) To: Willem de Bruijn Cc: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, intel-wired-lan@lists.osuosl.org, Keller, Jacob E On 2026-02-05 16:41:03 [-0500], Willem de Bruijn wrote: > > Requiring OPT_TSONLY unless CAP_NET_RAW would break legacy users. okay. Can we move the check to sock_set_timestamping()/ setsockopt() time? On the plus side we could throw an error instead silently dropping packets. This might be a late win given that you describe the users as legacy users. I'm not sure if the "permission" can change over time and so get revoked while an application is running. Sebastian ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-05 21:41 ` Willem de Bruijn 2026-02-06 7:44 ` Sebastian Andrzej Siewior @ 2026-02-06 10:12 ` Vadim Fedorenko 2026-02-08 16:25 ` Willem de Bruijn 1 sibling, 1 reply; 34+ messages in thread From: Vadim Fedorenko @ 2026-02-06 10:12 UTC (permalink / raw) To: Willem de Bruijn, Sebastian Andrzej Siewior, Willem de Bruijn, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller Cc: Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, intel-wired-lan@lists.osuosl.org, Keller, Jacob E On 05.02.2026 21:41, Willem de Bruijn wrote: > Vadim Fedorenko wrote: >> On 05/02/2026 16:43, Sebastian Andrzej Siewior wrote: >>> On 2026-02-05 16:27:03 [+0000], Vadim Fedorenko wrote: >>>>> So the only thing that bothers me is the read_lock_bh() in >>>>> skb_may_tx_timestamp() which deadlocks if the socket is write-locked on >>>>> the same CPU. >>>> >>>> Alright. Now you make me think whether we should enforce OPT_TSONLY >>>> option on socket which doesn't have CAP_NET_RAW? Then we can get rid of this >>>> check, and in case sysctl was flipped off - drop TX timestamps as >>>> it's done now? >>> >>> This would "fix" this problem for all users which do deliver the >>> timestamp from their IRQ handler instead of napi. There are a few of >>> those… >>> This would be considered stable material, right? (despite the fact that >>> we have it for quite some time and nobody complained so far). >> >> cc: Willem as he is the author of the check introduced back in 2015. >> >> But it's more like a question to maintainers whether it is acceptable >> way of "fixing" drivers or it's no-go solution > > Requiring OPT_TSONLY unless CAP_NET_RAW would break legacy users. Well, they are kinda broken already. Without OPT_TSONLY and CAP_NET_RAW all TX timestamps are silently dropped. To receive these timestamps users have to get CAP_NET_RAW permission, and it will work with the updated logic as well... ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-06 10:12 ` Vadim Fedorenko @ 2026-02-08 16:25 ` Willem de Bruijn 2026-02-09 9:06 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 34+ messages in thread From: Willem de Bruijn @ 2026-02-08 16:25 UTC (permalink / raw) To: Vadim Fedorenko, Willem de Bruijn, Sebastian Andrzej Siewior, Willem de Bruijn, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller Cc: Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, intel-wired-lan@lists.osuosl.org, Keller, Jacob E Vadim Fedorenko wrote: > On 05.02.2026 21:41, Willem de Bruijn wrote: > > Vadim Fedorenko wrote: > >> On 05/02/2026 16:43, Sebastian Andrzej Siewior wrote: > >>> On 2026-02-05 16:27:03 [+0000], Vadim Fedorenko wrote: > >>>>> So the only thing that bothers me is the read_lock_bh() in > >>>>> skb_may_tx_timestamp() which deadlocks if the socket is write-locked on > >>>>> the same CPU. > >>>> > >>>> Alright. Now you make me think whether we should enforce OPT_TSONLY > >>>> option on socket which doesn't have CAP_NET_RAW? Then we can get rid of this > >>>> check, and in case sysctl was flipped off - drop TX timestamps as > >>>> it's done now? > >>> > >>> This would "fix" this problem for all users which do deliver the > >>> timestamp from their IRQ handler instead of napi. There are a few of > >>> those… > >>> This would be considered stable material, right? (despite the fact that > >>> we have it for quite some time and nobody complained so far). > >> > >> cc: Willem as he is the author of the check introduced back in 2015. > >> > >> But it's more like a question to maintainers whether it is acceptable > >> way of "fixing" drivers or it's no-go solution > > > > Requiring OPT_TSONLY unless CAP_NET_RAW would break legacy users. > > Well, they are kinda broken already. Without OPT_TSONLY and CAP_NET_RAW all TX > timestamps are silently dropped. Are you referring to sysctl_tstamp_allow_data? That is enabled by default. > To receive these timestamps users have to get > CAP_NET_RAW permission, and it will work with the updated logic as well... ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-08 16:25 ` Willem de Bruijn @ 2026-02-09 9:06 ` Sebastian Andrzej Siewior 2026-02-09 10:43 ` Vadim Fedorenko 0 siblings, 1 reply; 34+ messages in thread From: Sebastian Andrzej Siewior @ 2026-02-09 9:06 UTC (permalink / raw) To: Willem de Bruijn Cc: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, intel-wired-lan@lists.osuosl.org, Keller, Jacob E On 2026-02-08 11:25:40 [-0500], Willem de Bruijn wrote: > > >> But it's more like a question to maintainers whether it is acceptable > > >> way of "fixing" drivers or it's no-go solution > > > > > > Requiring OPT_TSONLY unless CAP_NET_RAW would break legacy users. > > > > Well, they are kinda broken already. Without OPT_TSONLY and CAP_NET_RAW all TX > > timestamps are silently dropped. > > Are you referring to sysctl_tstamp_allow_data? > > That is enabled by default. Yes. If so, then we don't need the check below which requires sk_callback_lock. Are SIOCSHWTSTAMP the legacy users or the ones which do not set OPT_TSONLY? I would suggest to move the CAP_NET_RAW check to the point where timestamping is getting enabled. Also if ndo_hwtstamp_set is the preferred method of getting things done, I could check how many old ones are can be easily converted… > > To receive these timestamps users have to get > > CAP_NET_RAW permission, and it will work with the updated logic as well... Sebastian ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-09 9:06 ` Sebastian Andrzej Siewior @ 2026-02-09 10:43 ` Vadim Fedorenko 2026-02-09 11:48 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 34+ messages in thread From: Vadim Fedorenko @ 2026-02-09 10:43 UTC (permalink / raw) To: Sebastian Andrzej Siewior, Willem de Bruijn Cc: Willem de Bruijn, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, intel-wired-lan@lists.osuosl.org, Keller, Jacob E On 09/02/2026 09:06, Sebastian Andrzej Siewior wrote: > On 2026-02-08 11:25:40 [-0500], Willem de Bruijn wrote: >>>>> But it's more like a question to maintainers whether it is acceptable >>>>> way of "fixing" drivers or it's no-go solution >>>> >>>> Requiring OPT_TSONLY unless CAP_NET_RAW would break legacy users. >>> >>> Well, they are kinda broken already. Without OPT_TSONLY and CAP_NET_RAW all TX >>> timestamps are silently dropped. >> >> Are you referring to sysctl_tstamp_allow_data? >> >> That is enabled by default. > > Yes. If so, then we don't need the check below which requires > sk_callback_lock. > > Are SIOCSHWTSTAMP the legacy users or the ones which do not set > OPT_TSONLY? > > I would suggest to move the CAP_NET_RAW check to the point where > timestamping is getting enabled. > Also if ndo_hwtstamp_set is the preferred method of getting things done, > I could check how many old ones are can be easily converted… Looks like you are mixing things. SIOCSHWTSTAMP/ndo_hwtstamp_set are HW configuration calls while OPT_TSONLY is socket option, which is setup via setsockopt, you can find points searching for SOF_TIMESTAMPING_OPT_TSONLY in the sources, basically sock_set_timestamping() is the function to check > >>> To receive these timestamps users have to get >>> CAP_NET_RAW permission, and it will work with the updated logic as well... > > Sebastian ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-09 10:43 ` Vadim Fedorenko @ 2026-02-09 11:48 ` Sebastian Andrzej Siewior 2026-02-09 12:24 ` Vadim Fedorenko 0 siblings, 1 reply; 34+ messages in thread From: Sebastian Andrzej Siewior @ 2026-02-09 11:48 UTC (permalink / raw) To: Vadim Fedorenko Cc: Willem de Bruijn, Willem de Bruijn, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, intel-wired-lan@lists.osuosl.org, Keller, Jacob E On 2026-02-09 10:43:55 [+0000], Vadim Fedorenko wrote: > On 09/02/2026 09:06, Sebastian Andrzej Siewior wrote: > > On 2026-02-08 11:25:40 [-0500], Willem de Bruijn wrote: > > > > > > But it's more like a question to maintainers whether it is acceptable > > > > > > way of "fixing" drivers or it's no-go solution > > > > > > > > > > Requiring OPT_TSONLY unless CAP_NET_RAW would break legacy users. > > > > > > > > Well, they are kinda broken already. Without OPT_TSONLY and CAP_NET_RAW all TX > > > > timestamps are silently dropped. > > > > > > Are you referring to sysctl_tstamp_allow_data? > > > > > > That is enabled by default. > > > > Yes. If so, then we don't need the check below which requires > > sk_callback_lock. > > > > Are SIOCSHWTSTAMP the legacy users or the ones which do not set > > OPT_TSONLY? > > > > I would suggest to move the CAP_NET_RAW check to the point where > > timestamping is getting enabled. > > Also if ndo_hwtstamp_set is the preferred method of getting things done, > > I could check how many old ones are can be easily converted… > > Looks like you are mixing things. SIOCSHWTSTAMP/ndo_hwtstamp_set are HW > configuration calls while OPT_TSONLY is socket option, which is setup via > setsockopt, you can find points searching for > SOF_TIMESTAMPING_OPT_TSONLY in the sources, basically > sock_set_timestamping() is the function to check Yeah, but what is the legacy user here? If you enable HW-timestamps but never set OPT_TSONLY and the sysctl is also 0 then you reply on the CAP_NET_RAW later on. Right? I just try to justify the CAP_NET_RAW check and if it is required to move it earlier (where HW timestamps are enabled). And if the sysctl check is enough then maybe it is not needed. > > > > To receive these timestamps users have to get > > > > CAP_NET_RAW permission, and it will work with the updated logic as well... Sebastian ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-09 11:48 ` Sebastian Andrzej Siewior @ 2026-02-09 12:24 ` Vadim Fedorenko 2026-02-09 12:46 ` Willem de Bruijn 0 siblings, 1 reply; 34+ messages in thread From: Vadim Fedorenko @ 2026-02-09 12:24 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Willem de Bruijn, Willem de Bruijn, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, intel-wired-lan@lists.osuosl.org, Keller, Jacob E On 09/02/2026 11:48, Sebastian Andrzej Siewior wrote: > On 2026-02-09 10:43:55 [+0000], Vadim Fedorenko wrote: >> On 09/02/2026 09:06, Sebastian Andrzej Siewior wrote: >>> On 2026-02-08 11:25:40 [-0500], Willem de Bruijn wrote: >>>>>>> But it's more like a question to maintainers whether it is acceptable >>>>>>> way of "fixing" drivers or it's no-go solution >>>>>> >>>>>> Requiring OPT_TSONLY unless CAP_NET_RAW would break legacy users. >>>>> >>>>> Well, they are kinda broken already. Without OPT_TSONLY and CAP_NET_RAW all TX >>>>> timestamps are silently dropped. >>>> >>>> Are you referring to sysctl_tstamp_allow_data? >>>> >>>> That is enabled by default. >>> >>> Yes. If so, then we don't need the check below which requires >>> sk_callback_lock. >>> >>> Are SIOCSHWTSTAMP the legacy users or the ones which do not set >>> OPT_TSONLY? >>> >>> I would suggest to move the CAP_NET_RAW check to the point where >>> timestamping is getting enabled. >>> Also if ndo_hwtstamp_set is the preferred method of getting things done, >>> I could check how many old ones are can be easily converted… >> >> Looks like you are mixing things. SIOCSHWTSTAMP/ndo_hwtstamp_set are HW >> configuration calls while OPT_TSONLY is socket option, which is setup via >> setsockopt, you can find points searching for >> SOF_TIMESTAMPING_OPT_TSONLY in the sources, basically >> sock_set_timestamping() is the function to check > > Yeah, but what is the legacy user here? If you enable HW-timestamps but > never set OPT_TSONLY and the sysctl is also 0 then you reply on the > CAP_NET_RAW later on. Right? Legacy users here means users of HW TX timestamps expecting full skb to be returned back with the TX timestamp. Legacy here means that skb will be returned with headers modified by stack, which is kind of exposure of data, which requires CAP_NET_RAW... > I just try to justify the CAP_NET_RAW check and if it is required to > move it earlier (where HW timestamps are enabled). And if the sysctl > check is enough then maybe it is not needed. Capabilities should not change during lifetime of the process, should be fine to move. On the other, sysctl can be changed system-wide which may affect users. > >>>>> To receive these timestamps users have to get >>>>> CAP_NET_RAW permission, and it will work with the updated logic as well... > > Sebastian ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-09 12:24 ` Vadim Fedorenko @ 2026-02-09 12:46 ` Willem de Bruijn 2026-02-10 12:12 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 34+ messages in thread From: Willem de Bruijn @ 2026-02-09 12:46 UTC (permalink / raw) To: Vadim Fedorenko, Sebastian Andrzej Siewior Cc: Willem de Bruijn, Willem de Bruijn, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, intel-wired-lan@lists.osuosl.org, Keller, Jacob E Vadim Fedorenko wrote: > On 09/02/2026 11:48, Sebastian Andrzej Siewior wrote: > > On 2026-02-09 10:43:55 [+0000], Vadim Fedorenko wrote: > >> On 09/02/2026 09:06, Sebastian Andrzej Siewior wrote: > >>> On 2026-02-08 11:25:40 [-0500], Willem de Bruijn wrote: > >>>>>>> But it's more like a question to maintainers whether it is acceptable > >>>>>>> way of "fixing" drivers or it's no-go solution > >>>>>> > >>>>>> Requiring OPT_TSONLY unless CAP_NET_RAW would break legacy users. > >>>>> > >>>>> Well, they are kinda broken already. Without OPT_TSONLY and CAP_NET_RAW all TX > >>>>> timestamps are silently dropped. > >>>> > >>>> Are you referring to sysctl_tstamp_allow_data? > >>>> > >>>> That is enabled by default. > >>> > >>> Yes. If so, then we don't need the check below which requires > >>> sk_callback_lock. > >>> > >>> Are SIOCSHWTSTAMP the legacy users or the ones which do not set > >>> OPT_TSONLY? > >>> > >>> I would suggest to move the CAP_NET_RAW check to the point where > >>> timestamping is getting enabled. > >>> Also if ndo_hwtstamp_set is the preferred method of getting things done, > >>> I could check how many old ones are can be easily converted… > >> > >> Looks like you are mixing things. SIOCSHWTSTAMP/ndo_hwtstamp_set are HW > >> configuration calls while OPT_TSONLY is socket option, which is setup via > >> setsockopt, you can find points searching for > >> SOF_TIMESTAMPING_OPT_TSONLY in the sources, basically > >> sock_set_timestamping() is the function to check > > > > Yeah, but what is the legacy user here? If you enable HW-timestamps but > > never set OPT_TSONLY and the sysctl is also 0 then you reply on the > > CAP_NET_RAW later on. Right? > > Legacy users here means users of HW TX timestamps expecting full skb to > be returned back with the TX timestamp. Legacy here means that skb will > be returned with headers modified by stack, which is kind of exposure of > data, which requires CAP_NET_RAW... > > > I just try to justify the CAP_NET_RAW check and if it is required to > > move it earlier (where HW timestamps are enabled). And if the sysctl > > check is enough then maybe it is not needed. > > Capabilities should not change during lifetime of the process, should be > fine to move. On the other, sysctl can be changed system-wide which may > affect users. Ignore the hardware configuration. That is entirely optional. Some devices will timestamp every packet. The capability check here is per-socket, independent from the system hardware configuration. I don't see how it could be moved. Before OPT_TSONLY was introduced packets were always queued with their payload. The sysctl check was added to optionally disallow this. The check could arguably be moved earlier in the socket lifecycle and the decision cached in the socket. But then flipping the sysctl would not affect existing sockets, so that is a change in ABI behavior. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-09 12:46 ` Willem de Bruijn @ 2026-02-10 12:12 ` Sebastian Andrzej Siewior 2026-02-10 16:14 ` Willem de Bruijn 0 siblings, 1 reply; 34+ messages in thread From: Sebastian Andrzej Siewior @ 2026-02-10 12:12 UTC (permalink / raw) To: Willem de Bruijn Cc: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, intel-wired-lan@lists.osuosl.org, Keller, Jacob E On 2026-02-09 07:46:01 [-0500], Willem de Bruijn wrote: > > > Yeah, but what is the legacy user here? If you enable HW-timestamps but > > > never set OPT_TSONLY and the sysctl is also 0 then you reply on the > > > CAP_NET_RAW later on. Right? > > > > Legacy users here means users of HW TX timestamps expecting full skb to > > be returned back with the TX timestamp. Legacy here means that skb will > > be returned with headers modified by stack, which is kind of exposure of > > data, which requires CAP_NET_RAW... Ah okay. I assumed the err-queue was the standard way of receiving timestamps. > > > I just try to justify the CAP_NET_RAW check and if it is required to > > > move it earlier (where HW timestamps are enabled). And if the sysctl > > > check is enough then maybe it is not needed. > > > > Capabilities should not change during lifetime of the process, should be > > fine to move. On the other, sysctl can be changed system-wide which may > > affect users. > > Ignore the hardware configuration. That is entirely optional. Some > devices will timestamp every packet. > > The capability check here is per-socket, independent from the system > hardware configuration. > > I don't see how it could be moved. > > Before OPT_TSONLY was introduced packets were always queued with their > payload. The sysctl check was added to optionally disallow this. The > check could arguably be moved earlier in the socket lifecycle and the > decision cached in the socket. But then flipping the sysctl would not > affect existing sockets, so that is a change in ABI behavior. You could cache only the part under sk_callback_lock. Any other suggestions? The access from IRQ is quick and avoids any detours. The alternative would be to move the whole routine into an aux_worker. For every driver doing it from the IRQ handler. Sebastian ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-10 12:12 ` Sebastian Andrzej Siewior @ 2026-02-10 16:14 ` Willem de Bruijn 2026-02-11 12:08 ` Kurt Kanzenbach 0 siblings, 1 reply; 34+ messages in thread From: Willem de Bruijn @ 2026-02-10 16:14 UTC (permalink / raw) To: Sebastian Andrzej Siewior, Willem de Bruijn Cc: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, intel-wired-lan@lists.osuosl.org, Keller, Jacob E Sebastian Andrzej Siewior wrote: > On 2026-02-09 07:46:01 [-0500], Willem de Bruijn wrote: > > > > Yeah, but what is the legacy user here? If you enable HW-timestamps but > > > > never set OPT_TSONLY and the sysctl is also 0 then you reply on the > > > > CAP_NET_RAW later on. Right? > > > > > > Legacy users here means users of HW TX timestamps expecting full skb to > > > be returned back with the TX timestamp. Legacy here means that skb will > > > be returned with headers modified by stack, which is kind of exposure of > > > data, which requires CAP_NET_RAW... > > Ah okay. I assumed the err-queue was the standard way of receiving > timestamps. It is, for tx timestamps. The open question is whether they are queued with or without packet payload. > > > > I just try to justify the CAP_NET_RAW check and if it is required to > > > > move it earlier (where HW timestamps are enabled). And if the sysctl > > > > check is enough then maybe it is not needed. > > > > > > Capabilities should not change during lifetime of the process, should be > > > fine to move. On the other, sysctl can be changed system-wide which may > > > affect users. > > > > Ignore the hardware configuration. That is entirely optional. Some > > devices will timestamp every packet. > > > > The capability check here is per-socket, independent from the system > > hardware configuration. > > > > I don't see how it could be moved. > > > > Before OPT_TSONLY was introduced packets were always queued with their > > payload. The sysctl check was added to optionally disallow this. The > > check could arguably be moved earlier in the socket lifecycle and the > > decision cached in the socket. But then flipping the sysctl would not > > affect existing sockets, so that is a change in ABI behavior. > > You could cache only the part under sk_callback_lock. Can you elaborate a bit? > Any other suggestions? > The access from IRQ is quick and avoids any detours. > The alternative would be to move the whole routine into an aux_worker. > For every driver doing it from the IRQ handler. Perhaps CAP_NET_RAW can be checked in a way that does not require this specific lock. See for instance other examples that instead use sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW). Though that uses current_cred() so probably won't work in interrupt context. Changing the check may subtly change behavior. But that may be acceptable as long as the consequences are clearly described. An alternative would be to delay until dequeue. But that would wake the reader and then drop the packets, so without any data to read. Not practical. The core issue seems to be that the ptp_tx_work is not scheduled quickly enough. I wonder if that is the issue to be fixed. When/why is this too slow? ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-10 16:14 ` Willem de Bruijn @ 2026-02-11 12:08 ` Kurt Kanzenbach 2026-02-11 16:29 ` Willem de Bruijn ` (2 more replies) 0 siblings, 3 replies; 34+ messages in thread From: Kurt Kanzenbach @ 2026-02-11 12:08 UTC (permalink / raw) To: Willem de Bruijn, Sebastian Andrzej Siewior, Willem de Bruijn Cc: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Loktionov, Aleksandr, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, intel-wired-lan@lists.osuosl.org, Keller, Jacob E [-- Attachment #1: Type: text/plain, Size: 1649 bytes --] On Tue Feb 10 2026, Willem de Bruijn wrote: > The core issue seems to be that the ptp_tx_work is not scheduled > quickly enough. I wonder if that is the issue to be fixed. When/why > is this too slow? The igb driver uses schedule_work() for the Tx timestamp retrieval. That means the ptp_tx_work item is queued to the kernel-global workqueue. In case there is load on the system, the kworker which handles ptp_tx_work might be delayed too much, which results in ptp4l timeouts. Easy solution would be to tune the priority/affinity of the kworker. However, we have to figure which kworker it is. Furthermore, this kworker might handle other things as well, which are not related to igb timestamping at all. Therefore, tuning the priority of the kworker is not practical. Moving the timestamping in IRQ looked like a good solution, because the device already signals that the Tx timestamp is available now. No need to schedule any worker/work at all. So, it'd be very nice if skb_tstamp_tx() could be called from IRQ context. BTW other drivers like igc call this function in IRQ context as well. Alternative solution for igb is to move from schedule_work() to PTP AUX worker. That is a dedicated PTP worker thread called ptpX, which could handle the timestamping. This can be easily tuned with taskset and chrt. However, there's one difference to the kworker approach: The kworker always runs on the same CPU, where the IRQ triggered, the AUX worker not necessarily. This means, Miroslav needs to be aware of this and tune the AUX worker for his NTP use cases. I hope, that makes the motivation for this patch and discussion clear. Thanks, Kurt [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 861 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-11 12:08 ` Kurt Kanzenbach @ 2026-02-11 16:29 ` Willem de Bruijn 2026-02-12 18:33 ` Sebastian Andrzej Siewior 2026-02-14 23:26 ` Sebastian Andrzej Siewior 2026-02-11 18:54 ` Jakub Kicinski 2026-02-11 19:29 ` Jacob Keller 2 siblings, 2 replies; 34+ messages in thread From: Willem de Bruijn @ 2026-02-11 16:29 UTC (permalink / raw) To: Kurt Kanzenbach, Willem de Bruijn, Sebastian Andrzej Siewior, Willem de Bruijn Cc: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Loktionov, Aleksandr, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, intel-wired-lan@lists.osuosl.org, Keller, Jacob E Kurt Kanzenbach wrote: > On Tue Feb 10 2026, Willem de Bruijn wrote: > > The core issue seems to be that the ptp_tx_work is not scheduled > > quickly enough. I wonder if that is the issue to be fixed. When/why > > is this too slow? > > The igb driver uses schedule_work() for the Tx timestamp retrieval. That > means the ptp_tx_work item is queued to the kernel-global workqueue. In > case there is load on the system, the kworker which handles ptp_tx_work > might be delayed too much, which results in ptp4l timeouts. > > Easy solution would be to tune the priority/affinity of the > kworker. However, we have to figure which kworker it is. Furthermore, > this kworker might handle other things as well, which are not related to > igb timestamping at all. Therefore, tuning the priority of the kworker > is not practical. > > Moving the timestamping in IRQ looked like a good solution, because the > device already signals that the Tx timestamp is available now. No need > to schedule any worker/work at all. So, it'd be very nice if > skb_tstamp_tx() could be called from IRQ context. BTW other drivers like > igc call this function in IRQ context as well. > > Alternative solution for igb is to move from schedule_work() to PTP AUX > worker. That is a dedicated PTP worker thread called ptpX, which could > handle the timestamping. This can be easily tuned with taskset and > chrt. However, there's one difference to the kworker approach: The > kworker always runs on the same CPU, where the IRQ triggered, the AUX > worker not necessarily. This means, Miroslav needs to be aware of this > and tune the AUX worker for his NTP use cases. > > I hope, that makes the motivation for this patch and discussion clear. It does thanks. I think we should look at the locking. It is not clear to me that sk_callback_lock needs to be held here at all. From the lock documentation itself its use is more limited. sk->sk_socket->file is indeed dereferenced elsewhere without holding such locks. sk_capable is another indication. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-11 16:29 ` Willem de Bruijn @ 2026-02-12 18:33 ` Sebastian Andrzej Siewior 2026-02-14 23:26 ` Sebastian Andrzej Siewior 1 sibling, 0 replies; 34+ messages in thread From: Sebastian Andrzej Siewior @ 2026-02-12 18:33 UTC (permalink / raw) To: Willem de Bruijn Cc: Kurt Kanzenbach, Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Loktionov, Aleksandr, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, intel-wired-lan@lists.osuosl.org, Keller, Jacob E On 2026-02-11 11:29:43 [-0500], Willem de Bruijn wrote: > I think we should look at the locking. It is not clear to me that > sk_callback_lock needs to be held here at all. > > From the lock documentation itself its use is more limited. > > sk->sk_socket->file is indeed dereferenced elsewhere without holding > such locks. > > sk_capable is another indication. That skb has a socket reference so that skbs should have a reference on 'sk' so it can't go away while the skb is around. sk_socket should be assigned once while sk is created and not change. Also that ->file should be assigned on accept and remain stable. That assignment happens in sock_graft() under the lock or in sock_init_data_uid() without it (but it both cases it should be new). If you close that socket then I think sock_close() will be invoked which ends in __sock_release() and assigns NULL to ->file. The filep itself is SLAB_TYPESAFE_BY_RCU (as it comes from alloc_file_pseudo()) so it safe to access it while in IRQ/ softirq because it can not go away. It should invoke sock_orphan() which sets ->sk_socket to NULL under the lock. I don't think that orphan can be called from outside the close path. And inode (socket) remains around as long as the filep is there. So based on this, if sk->sk_socket->file is accessed from within a syscall then the chain up to file should be valid because the fd can not be closed and so anything in that chain up to file become NULL. From within the IRQ/ softirq a close on the fd/ socket should not result in an immediate release. I *think* the skb holds sock holds the socket. If so that skb would hold the destruction of the socket back and this would make it safe to access it without the lock. This is the theory so far. Sebastian ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-11 16:29 ` Willem de Bruijn 2026-02-12 18:33 ` Sebastian Andrzej Siewior @ 2026-02-14 23:26 ` Sebastian Andrzej Siewior 1 sibling, 0 replies; 34+ messages in thread From: Sebastian Andrzej Siewior @ 2026-02-14 23:26 UTC (permalink / raw) To: Willem de Bruijn Cc: Kurt Kanzenbach, Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Loktionov, Aleksandr, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, intel-wired-lan@lists.osuosl.org, Keller, Jacob E On 2026-02-11 11:29:43 [-0500], Willem de Bruijn wrote: > I think we should look at the locking. It is not clear to me that > sk_callback_lock needs to be held here at all. It can go away, posted https://lore.kernel.org/all/20260214232456.A37oV4KQ@linutronix.de/ Sebastian ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-11 12:08 ` Kurt Kanzenbach 2026-02-11 16:29 ` Willem de Bruijn @ 2026-02-11 18:54 ` Jakub Kicinski 2026-02-12 16:28 ` Sebastian Andrzej Siewior 2026-02-11 19:29 ` Jacob Keller 2 siblings, 1 reply; 34+ messages in thread From: Jakub Kicinski @ 2026-02-11 18:54 UTC (permalink / raw) To: Kurt Kanzenbach Cc: Willem de Bruijn, Sebastian Andrzej Siewior, Vadim Fedorenko, Willem de Bruijn, Paolo Abeni, Eric Dumazet, David S. Miller, Loktionov, Aleksandr, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, intel-wired-lan@lists.osuosl.org, Keller, Jacob E On Wed, 11 Feb 2026 13:08:51 +0100 Kurt Kanzenbach wrote: > On Tue Feb 10 2026, Willem de Bruijn wrote: > > The core issue seems to be that the ptp_tx_work is not scheduled > > quickly enough. I wonder if that is the issue to be fixed. When/why > > is this too slow? > > The igb driver uses schedule_work() for the Tx timestamp retrieval. That > means the ptp_tx_work item is queued to the kernel-global workqueue. In > case there is load on the system, the kworker which handles ptp_tx_work > might be delayed too much, which results in ptp4l timeouts. > > Easy solution would be to tune the priority/affinity of the > kworker. However, we have to figure which kworker it is. Furthermore, > this kworker might handle other things as well, which are not related to > igb timestamping at all. Therefore, tuning the priority of the kworker > is not practical. > > Moving the timestamping in IRQ looked like a good solution, because the > device already signals that the Tx timestamp is available now. No need > to schedule any worker/work at all. So, it'd be very nice if > skb_tstamp_tx() could be called from IRQ context. BTW other drivers like > igc call this function in IRQ context as well. > > Alternative solution for igb is to move from schedule_work() to PTP AUX > worker. That is a dedicated PTP worker thread called ptpX, which could > handle the timestamping. This can be easily tuned with taskset and > chrt. However, there's one difference to the kworker approach: The > kworker always runs on the same CPU, where the IRQ triggered, the AUX > worker not necessarily. This means, Miroslav needs to be aware of this > and tune the AUX worker for his NTP use cases. > > I hope, that makes the motivation for this patch and discussion clear. Are you concerned about the latency of delivering the TS to the user space app / socket? Or purely reading the TS out of the HW fifo to make space for another packet to be timestamped? ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-11 18:54 ` Jakub Kicinski @ 2026-02-12 16:28 ` Sebastian Andrzej Siewior 0 siblings, 0 replies; 34+ messages in thread From: Sebastian Andrzej Siewior @ 2026-02-12 16:28 UTC (permalink / raw) To: Jakub Kicinski Cc: Kurt Kanzenbach, Willem de Bruijn, Vadim Fedorenko, Willem de Bruijn, Paolo Abeni, Eric Dumazet, David S. Miller, Loktionov, Aleksandr, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, intel-wired-lan@lists.osuosl.org, Keller, Jacob E On 2026-02-11 10:54:44 [-0800], Jakub Kicinski wrote: > Are you concerned about the latency of delivering the TS to the user > space app / socket? Or purely reading the TS out of the HW fifo to make > space for another packet to be timestamped? The problem with the global workqueue is the latency. If the system is idle it works as-is but if it gets busy the workqueue can be significantly delayed. You can't give this workqueue a higher priority because the workqueue is anonymous and changes. The next best thing here is the aux_worker because everything here is executed always under the same process/ PID so you can apply a priority and ensure that is handled before other "less important" tasks. However. There is actually no need for the workqueue so you could do it directly (which is what some other driver do). But then I stumbled upon the locking issue… Sebastian ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-11 12:08 ` Kurt Kanzenbach 2026-02-11 16:29 ` Willem de Bruijn 2026-02-11 18:54 ` Jakub Kicinski @ 2026-02-11 19:29 ` Jacob Keller 2026-02-11 21:44 ` Jakub Kicinski 2 siblings, 1 reply; 34+ messages in thread From: Jacob Keller @ 2026-02-11 19:29 UTC (permalink / raw) To: Kurt Kanzenbach, Willem de Bruijn, Sebastian Andrzej Siewior Cc: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller, Loktionov, Aleksandr, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, intel-wired-lan@lists.osuosl.org On 2/11/2026 4:08 AM, Kurt Kanzenbach wrote: > On Tue Feb 10 2026, Willem de Bruijn wrote: >> The core issue seems to be that the ptp_tx_work is not scheduled >> quickly enough. I wonder if that is the issue to be fixed. When/why >> is this too slow? > > The igb driver uses schedule_work() for the Tx timestamp retrieval. That > means the ptp_tx_work item is queued to the kernel-global workqueue. In > case there is load on the system, the kworker which handles ptp_tx_work > might be delayed too much, which results in ptp4l timeouts. > Right. Old versions of ptp4l wait for ~1 millisecond by default, and newer ones default to ~10 milliseconds.. but latency here can have negative impacts on sync capability especially on profiles with higher sync rates. > Easy solution would be to tune the priority/affinity of the > kworker. However, we have to figure which kworker it is. Furthermore, > this kworker might handle other things as well, which are not related to > igb timestamping at all. Therefore, tuning the priority of the kworker > is not practical. > I don't think it is even guaranteed that you would get the same kworker every time, so I would consider such tuning not just impractical but impossible. > Moving the timestamping in IRQ looked like a good solution, because the > device already signals that the Tx timestamp is available now. No need > to schedule any worker/work at all. So, it'd be very nice if > skb_tstamp_tx() could be called from IRQ context. BTW other drivers like > igc call this function in IRQ context as well. > Right. Reporting the timestamp from the interrupt is the simplest and lowest latency method. I know Miroslav had some situations and devices where it apparently caused more problems than it helped, though I don't believe anyone else has reproduced those? > Alternative solution for igb is to move from schedule_work() to PTP AUX > worker. That is a dedicated PTP worker thread called ptpX, which could > handle the timestamping. This can be easily tuned with taskset and > chrt. However, there's one difference to the kworker approach: The > kworker always runs on the same CPU, where the IRQ triggered, the AUX > worker not necessarily. This means, Miroslav needs to be aware of this > and tune the AUX worker for his NTP use cases. This should be the standard at minimum for all new drivers, when it is not possible to report the timestamp directly in the interrupt handler. > > I hope, that makes the motivation for this patch and discussion clear. > > Thanks, > Kurt ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-11 19:29 ` Jacob Keller @ 2026-02-11 21:44 ` Jakub Kicinski 2026-02-12 16:47 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 34+ messages in thread From: Jakub Kicinski @ 2026-02-11 21:44 UTC (permalink / raw) To: Jacob Keller Cc: Kurt Kanzenbach, Willem de Bruijn, Sebastian Andrzej Siewior, Vadim Fedorenko, Willem de Bruijn, Paolo Abeni, Eric Dumazet, David S. Miller, Loktionov, Aleksandr, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, intel-wired-lan@lists.osuosl.org On Wed, 11 Feb 2026 11:29:03 -0800 Jacob Keller wrote: > > Moving the timestamping in IRQ looked like a good solution, because the > > device already signals that the Tx timestamp is available now. No need > > to schedule any worker/work at all. So, it'd be very nice if > > skb_tstamp_tx() could be called from IRQ context. BTW other drivers like > > igc call this function in IRQ context as well. > > Right. Reporting the timestamp from the interrupt is the simplest and > lowest latency method. I know Miroslav had some situations and devices > where it apparently caused more problems than it helped, though I don't > believe anyone else has reproduced those? There's a BH workqueue now, as a replacement for tasklets. Presumably smallest fix would be to switch to that? ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-11 21:44 ` Jakub Kicinski @ 2026-02-12 16:47 ` Sebastian Andrzej Siewior 0 siblings, 0 replies; 34+ messages in thread From: Sebastian Andrzej Siewior @ 2026-02-12 16:47 UTC (permalink / raw) To: Jakub Kicinski Cc: Jacob Keller, Kurt Kanzenbach, Willem de Bruijn, Vadim Fedorenko, Willem de Bruijn, Paolo Abeni, Eric Dumazet, David S. Miller, Loktionov, Aleksandr, Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, intel-wired-lan@lists.osuosl.org On 2026-02-11 13:44:36 [-0800], Jakub Kicinski wrote: > On Wed, 11 Feb 2026 11:29:03 -0800 Jacob Keller wrote: > > > Moving the timestamping in IRQ looked like a good solution, because the > > > device already signals that the Tx timestamp is available now. No need > > > to schedule any worker/work at all. So, it'd be very nice if > > > skb_tstamp_tx() could be called from IRQ context. BTW other drivers like > > > igc call this function in IRQ context as well. > > > > Right. Reporting the timestamp from the interrupt is the simplest and > > lowest latency method. I know Miroslav had some situations and devices > > where it apparently caused more problems than it helped, though I don't > > believe anyone else has reproduced those? > > There's a BH workqueue now, as a replacement for tasklets. > Presumably smallest fix would be to switch to that? Hmm. This would raise the TASKLET softirq from the interrupt handler and then handle it on the irq-exit path. Let me look if that one lock can be removed first… Sebastian ^ permalink raw reply [flat|nested] 34+ messages in thread
* RE: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-05 9:47 ` [Intel-wired-lan] " Loktionov, Aleksandr 2026-02-05 10:03 ` Sebastian Andrzej Siewior @ 2026-02-05 11:58 ` Kurt Kanzenbach 2026-02-05 12:20 ` Loktionov, Aleksandr 1 sibling, 1 reply; 34+ messages in thread From: Kurt Kanzenbach @ 2026-02-05 11:58 UTC (permalink / raw) To: Loktionov, Aleksandr, Nguyen, Anthony L, Kitszel, Przemyslaw Cc: Paul Menzel, Vadim Fedorenko, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, Eric Dumazet, intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Jakub Kicinski, Paolo Abeni, David S. Miller, Sebastian Andrzej Siewior [-- Attachment #1: Type: text/plain, Size: 1013 bytes --] On Thu Feb 05 2026, Loktionov, Aleksandr wrote: >> +/** >> + * igb_ptp_tx_tstamp_event >> + * @adapter: pointer to igb adapter >> + * >> + * This function checks the TSYNCTXCTL valid bit and stores the Tx >> +hardware >> + * timestamp at the current skb. >> + **/ >> +void igb_ptp_tx_tstamp_event(struct igb_adapter *adapter) { >> + struct e1000_hw *hw = &adapter->hw; >> + u32 tsynctxctl; >> + >> + if (!adapter->ptp_tx_skb) >> + return; >> + >> + tsynctxctl = rd32(E1000_TSYNCTXCTL); >> + if (WARN_ON_ONCE(!(tsynctxctl & E1000_TSYNCTXCTL_VALID))) >> + return; >> + >> + igb_ptp_tx_hwtstamp(adapter); <-Calls existing function designed for work queue! > > skb_tstamp_tx() can sleep > Smells like sleep-in-atomic isn't it? AFAICS skb_tstamp_tx() is safe to call here. > spin_lock_irqsave(&wq_head->lock, flags); <- RT mutex can sleep In case you're worried about PREEMPT_RT: On -RT the IRQ runs a dedicated thread. BTW I've tested this with and without -RT and with CONFIG_DEBUG_ATOMIC_SLEEP. Thanks, Kurt [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 861 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* RE: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-05 11:58 ` Kurt Kanzenbach @ 2026-02-05 12:20 ` Loktionov, Aleksandr 2026-02-06 0:05 ` Jacob Keller 0 siblings, 1 reply; 34+ messages in thread From: Loktionov, Aleksandr @ 2026-02-05 12:20 UTC (permalink / raw) To: Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw Cc: Paul Menzel, Vadim Fedorenko, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, Eric Dumazet, intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Jakub Kicinski, Paolo Abeni, David S. Miller, Sebastian Andrzej Siewior > -----Original Message----- > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf > Of Kurt Kanzenbach > Sent: Thursday, February 5, 2026 12:58 PM > To: Loktionov, Aleksandr <aleksandr.loktionov@intel.com>; Nguyen, > Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw > <przemyslaw.kitszel@intel.com> > Cc: Paul Menzel <pmenzel@molgen.mpg.de>; Vadim Fedorenko > <vadim.fedorenko@linux.dev>; Gomes, Vinicius > <vinicius.gomes@intel.com>; netdev@vger.kernel.org; Richard Cochran > <richardcochran@gmail.com>; linux-kernel@vger.kernel.org; Andrew Lunn > <andrew+netdev@lunn.ch>; Eric Dumazet <edumazet@google.com>; intel- > wired-lan@lists.osuosl.org; Keller, Jacob E > <jacob.e.keller@intel.com>; Jakub Kicinski <kuba@kernel.org>; Paolo > Abeni <pabeni@redhat.com>; David S. Miller <davem@davemloft.net>; > Sebastian Andrzej Siewior <bigeasy@linutronix.de> > Subject: Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx > timestamp directly from interrupt for i210 > > On Thu Feb 05 2026, Loktionov, Aleksandr wrote: > >> +/** > >> + * igb_ptp_tx_tstamp_event > >> + * @adapter: pointer to igb adapter > >> + * > >> + * This function checks the TSYNCTXCTL valid bit and stores the Tx > >> +hardware > >> + * timestamp at the current skb. > >> + **/ > >> +void igb_ptp_tx_tstamp_event(struct igb_adapter *adapter) { > >> + struct e1000_hw *hw = &adapter->hw; > >> + u32 tsynctxctl; > >> + > >> + if (!adapter->ptp_tx_skb) > >> + return; > >> + > >> + tsynctxctl = rd32(E1000_TSYNCTXCTL); > >> + if (WARN_ON_ONCE(!(tsynctxctl & E1000_TSYNCTXCTL_VALID))) > >> + return; > >> + > >> + igb_ptp_tx_hwtstamp(adapter); <-Calls existing function > designed for work queue! > > > > skb_tstamp_tx() can sleep > > Smells like sleep-in-atomic isn't it? > > AFAICS skb_tstamp_tx() is safe to call here. > > > spin_lock_irqsave(&wq_head->lock, flags); <- RT mutex can sleep > > In case you're worried about PREEMPT_RT: On -RT the IRQ runs a > dedicated thread. BTW I've tested this with and without -RT and with > CONFIG_DEBUG_ATOMIC_SLEEP. > > Thanks, > Kurt Thank you, Kurt for sharing your experience. I don't have so many experience with RT Linux. For me calling a function, not designed to be called from IRQ context is a SUS. So, I rose the question about sleeping. But there is also a question about non-RT Kernels with Shared IRQ Vectors... I.e. regular packet processing (NAPI poll in softirq) and PTP timestamp events (in hardirq). I suspect we have potentially broken drivers with shared vectors (MSI-X should be safe I hope). I'd like the comment to be added: /* Safe to call from IRQ: dedicated misc vector + RT-safe on PREEMPT_RT */ igb_ptp_tx_hwtstamp(adapter); But in long term perspective prefer to refactor moving to NAPI is the safer, more maintainable pattern. What do you think? Alex ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-05 12:20 ` Loktionov, Aleksandr @ 2026-02-06 0:05 ` Jacob Keller 0 siblings, 0 replies; 34+ messages in thread From: Jacob Keller @ 2026-02-06 0:05 UTC (permalink / raw) To: Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw Cc: Paul Menzel, Vadim Fedorenko, Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn, Eric Dumazet, intel-wired-lan@lists.osuosl.org, Jakub Kicinski, Paolo Abeni, David S. Miller, Sebastian Andrzej Siewior On 2/5/2026 4:20 AM, Loktionov, Aleksandr wrote: > > >> -----Original Message----- >> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf >> Of Kurt Kanzenbach >> Sent: Thursday, February 5, 2026 12:58 PM >> To: Loktionov, Aleksandr <aleksandr.loktionov@intel.com>; Nguyen, >> Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw >> <przemyslaw.kitszel@intel.com> >> Cc: Paul Menzel <pmenzel@molgen.mpg.de>; Vadim Fedorenko >> <vadim.fedorenko@linux.dev>; Gomes, Vinicius >> <vinicius.gomes@intel.com>; netdev@vger.kernel.org; Richard Cochran >> <richardcochran@gmail.com>; linux-kernel@vger.kernel.org; Andrew Lunn >> <andrew+netdev@lunn.ch>; Eric Dumazet <edumazet@google.com>; intel- >> wired-lan@lists.osuosl.org; Keller, Jacob E >> <jacob.e.keller@intel.com>; Jakub Kicinski <kuba@kernel.org>; Paolo >> Abeni <pabeni@redhat.com>; David S. Miller <davem@davemloft.net>; >> Sebastian Andrzej Siewior <bigeasy@linutronix.de> >> Subject: Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx >> timestamp directly from interrupt for i210 >> >> On Thu Feb 05 2026, Loktionov, Aleksandr wrote: >>>> +/** >>>> + * igb_ptp_tx_tstamp_event >>>> + * @adapter: pointer to igb adapter >>>> + * >>>> + * This function checks the TSYNCTXCTL valid bit and stores the Tx >>>> +hardware >>>> + * timestamp at the current skb. >>>> + **/ >>>> +void igb_ptp_tx_tstamp_event(struct igb_adapter *adapter) { >>>> + struct e1000_hw *hw = &adapter->hw; >>>> + u32 tsynctxctl; >>>> + >>>> + if (!adapter->ptp_tx_skb) >>>> + return; >>>> + >>>> + tsynctxctl = rd32(E1000_TSYNCTXCTL); >>>> + if (WARN_ON_ONCE(!(tsynctxctl & E1000_TSYNCTXCTL_VALID))) >>>> + return; >>>> + >>>> + igb_ptp_tx_hwtstamp(adapter); <-Calls existing function >> designed for work queue! >>> >>> skb_tstamp_tx() can sleep >>> Smells like sleep-in-atomic isn't it? >> >> AFAICS skb_tstamp_tx() is safe to call here. >> >>> spin_lock_irqsave(&wq_head->lock, flags); <- RT mutex can sleep >> >> In case you're worried about PREEMPT_RT: On -RT the IRQ runs a >> dedicated thread. BTW I've tested this with and without -RT and with >> CONFIG_DEBUG_ATOMIC_SLEEP. >> >> Thanks, >> Kurt > > Thank you, Kurt for sharing your experience. I don't have so many experience with RT Linux. > For me calling a function, not designed to be called from IRQ context is a SUS. > So, I rose the question about sleeping. > My understanding is that RT is only safe to convert such spinlock_t to mutex *because* it also converts IRQs to threads. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 2026-02-05 7:54 [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 Kurt Kanzenbach 2026-02-05 9:47 ` [Intel-wired-lan] " Loktionov, Aleksandr @ 2026-02-05 12:12 ` Sebastian Andrzej Siewior 1 sibling, 0 replies; 34+ messages in thread From: Sebastian Andrzej Siewior @ 2026-02-05 12:12 UTC (permalink / raw) To: Kurt Kanzenbach Cc: Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Richard Cochran, Vinicius Costa Gomes, Paul Menzel, Vadim Fedorenko, Miroslav Lichvar, Jacob Keller, intel-wired-lan, netdev, linux-kernel On 2026-02-05 08:54:34 [+0100], Kurt Kanzenbach wrote: > Retrieve Tx timestamp directly from interrupt handler for i210. > > The current implementation uses schedule_work() which is executed by the > system work queue to retrieve Tx timestamps. This increases latency and can > lead to timeouts in case of heavy system load. i210 is often used in > industrial systems, where timestamp timeouts can be fatal. > > Therefore, fetch the timestamp directly from the interrupt handler. > > The work queue code stays for all other NICs supported by igb. > > Tested on Intel i210 and i350. > > Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> IMHO this is a compromise with Miroslav where he observed less PTP timestamps on the i350. While testing I did not get near Miroslav's difference but there was a small change. I don't understand *why* because the current workqueue usage reads the timestamp on the same CPU on which the interrupt occurred. Doing it directly just avoids the context switch. This feels beneficial. Sebastian ^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2026-02-14 23:26 UTC | newest] Thread overview: 34+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-02-05 7:54 [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 Kurt Kanzenbach 2026-02-05 9:47 ` [Intel-wired-lan] " Loktionov, Aleksandr 2026-02-05 10:03 ` Sebastian Andrzej Siewior 2026-02-05 10:37 ` Loktionov, Aleksandr 2026-02-05 10:52 ` Sebastian Andrzej Siewior 2026-02-05 11:56 ` Vadim Fedorenko 2026-02-05 14:51 ` Sebastian Andrzej Siewior 2026-02-05 16:27 ` Vadim Fedorenko 2026-02-05 16:43 ` Sebastian Andrzej Siewior 2026-02-05 16:48 ` Vadim Fedorenko 2026-02-05 21:41 ` Willem de Bruijn 2026-02-06 7:44 ` Sebastian Andrzej Siewior 2026-02-06 10:12 ` Vadim Fedorenko 2026-02-08 16:25 ` Willem de Bruijn 2026-02-09 9:06 ` Sebastian Andrzej Siewior 2026-02-09 10:43 ` Vadim Fedorenko 2026-02-09 11:48 ` Sebastian Andrzej Siewior 2026-02-09 12:24 ` Vadim Fedorenko 2026-02-09 12:46 ` Willem de Bruijn 2026-02-10 12:12 ` Sebastian Andrzej Siewior 2026-02-10 16:14 ` Willem de Bruijn 2026-02-11 12:08 ` Kurt Kanzenbach 2026-02-11 16:29 ` Willem de Bruijn 2026-02-12 18:33 ` Sebastian Andrzej Siewior 2026-02-14 23:26 ` Sebastian Andrzej Siewior 2026-02-11 18:54 ` Jakub Kicinski 2026-02-12 16:28 ` Sebastian Andrzej Siewior 2026-02-11 19:29 ` Jacob Keller 2026-02-11 21:44 ` Jakub Kicinski 2026-02-12 16:47 ` Sebastian Andrzej Siewior 2026-02-05 11:58 ` Kurt Kanzenbach 2026-02-05 12:20 ` Loktionov, Aleksandr 2026-02-06 0:05 ` Jacob Keller 2026-02-05 12:12 ` Sebastian Andrzej Siewior
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox