public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
@ 2026-02-05  7:54 Kurt Kanzenbach
  2026-02-05  9:47 ` [Intel-wired-lan] " Loktionov, Aleksandr
  2026-02-05 12:12 ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 34+ messages in thread
From: Kurt Kanzenbach @ 2026-02-05  7:54 UTC (permalink / raw)
  To: Tony Nguyen, Przemek Kitszel
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Richard Cochran, Vinicius Costa Gomes,
	Sebastian Andrzej Siewior, Paul Menzel, Vadim Fedorenko,
	Miroslav Lichvar, Jacob Keller, intel-wired-lan, netdev,
	linux-kernel, Kurt Kanzenbach

Retrieve Tx timestamp directly from interrupt handler for i210.

The current implementation uses schedule_work() which is executed by the
system work queue to retrieve Tx timestamps. This increases latency and can
lead to timeouts in case of heavy system load. i210 is often used in
industrial systems, where timestamp timeouts can be fatal.

Therefore, fetch the timestamp directly from the interrupt handler.

The work queue code stays for all other NICs supported by igb.

Tested on Intel i210 and i350.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
---
Changes in v3:
- Switch back to IRQ, but for i210 only
- Keep kworker for all other NICs like i350 (Miroslav)
- Link to v2: https://lore.kernel.org/r/20250822-igb_irq_ts-v2-1-1ac37078a7a4@linutronix.de

Changes in v2:
- Switch from IRQ to PTP aux worker due to NTP performance regression (Miroslav)
- Link to v1: https://lore.kernel.org/r/20250815-igb_irq_ts-v1-1-8c6fc0353422@linutronix.de
---
 drivers/net/ethernet/intel/igb/igb.h      |  1 +
 drivers/net/ethernet/intel/igb/igb_main.c |  5 ++++-
 drivers/net/ethernet/intel/igb/igb_ptp.c  | 22 ++++++++++++++++++++++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 0fff1df81b7b..1de29670784e 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -776,6 +776,7 @@ int igb_ptp_hwtstamp_get(struct net_device *netdev,
 int igb_ptp_hwtstamp_set(struct net_device *netdev,
 			 struct kernel_hwtstamp_config *config,
 			 struct netlink_ext_ack *extack);
+void igb_ptp_tx_tstamp_event(struct igb_adapter *adapter);
 void igb_set_flag_queue_pairs(struct igb_adapter *, const u32);
 unsigned int igb_get_max_rss_queues(struct igb_adapter *);
 #ifdef CONFIG_IGB_HWMON
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index dbea37269d2c..d0d9245e6d72 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -7078,7 +7078,10 @@ static void igb_tsync_interrupt(struct igb_adapter *adapter)
 
 	if (tsicr & E1000_TSICR_TXTS) {
 		/* retrieve hardware timestamp */
-		schedule_work(&adapter->ptp_tx_work);
+		if (hw->mac.type == e1000_i210)
+			igb_ptp_tx_tstamp_event(adapter);
+		else
+			schedule_work(&adapter->ptp_tx_work);
 	}
 
 	if (tsicr & TSINTR_TT0)
diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c b/drivers/net/ethernet/intel/igb/igb_ptp.c
index bd85d02ecadd..8c8f2b8615f7 100644
--- a/drivers/net/ethernet/intel/igb/igb_ptp.c
+++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
@@ -796,6 +796,28 @@ static int igb_ptp_verify_pin(struct ptp_clock_info *ptp, unsigned int pin,
 	return 0;
 }
 
+/**
+ * igb_ptp_tx_tstamp_event
+ * @adapter: pointer to igb adapter
+ *
+ * This function checks the TSYNCTXCTL valid bit and stores the Tx hardware
+ * timestamp at the current skb.
+ **/
+void igb_ptp_tx_tstamp_event(struct igb_adapter *adapter)
+{
+	struct e1000_hw *hw = &adapter->hw;
+	u32 tsynctxctl;
+
+	if (!adapter->ptp_tx_skb)
+		return;
+
+	tsynctxctl = rd32(E1000_TSYNCTXCTL);
+	if (WARN_ON_ONCE(!(tsynctxctl & E1000_TSYNCTXCTL_VALID)))
+		return;
+
+	igb_ptp_tx_hwtstamp(adapter);
+}
+
 /**
  * igb_ptp_tx_work
  * @work: pointer to work struct

---
base-commit: e07d0d30939990da377672ef49ca09763b4fbc79
change-id: 20250813-igb_irq_ts-1aa77cc7b4cb

Best regards,
--  
Kurt Kanzenbach <kurt@linutronix.de>


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* RE: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-05  7:54 [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 Kurt Kanzenbach
@ 2026-02-05  9:47 ` Loktionov, Aleksandr
  2026-02-05 10:03   ` Sebastian Andrzej Siewior
  2026-02-05 11:58   ` Kurt Kanzenbach
  2026-02-05 12:12 ` Sebastian Andrzej Siewior
  1 sibling, 2 replies; 34+ messages in thread
From: Loktionov, Aleksandr @ 2026-02-05  9:47 UTC (permalink / raw)
  To: Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw
  Cc: Paul Menzel, Vadim Fedorenko, Gomes, Vinicius,
	netdev@vger.kernel.org, Richard Cochran,
	linux-kernel@vger.kernel.org, Andrew Lunn, Eric Dumazet,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Jakub Kicinski,
	Paolo Abeni, David S. Miller, Sebastian Andrzej Siewior



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Kurt Kanzenbach
> Sent: Thursday, February 5, 2026 8:55 AM
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel,
> Przemyslaw <przemyslaw.kitszel@intel.com>
> Cc: Paul Menzel <pmenzel@molgen.mpg.de>; Vadim Fedorenko
> <vadim.fedorenko@linux.dev>; Gomes, Vinicius
> <vinicius.gomes@intel.com>; netdev@vger.kernel.org; Richard Cochran
> <richardcochran@gmail.com>; Kurt Kanzenbach <kurt@linutronix.de>;
> linux-kernel@vger.kernel.org; Andrew Lunn <andrew+netdev@lunn.ch>;
> Eric Dumazet <edumazet@google.com>; intel-wired-lan@lists.osuosl.org;
> Keller, Jacob E <jacob.e.keller@intel.com>; Jakub Kicinski
> <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; David S. Miller
> <davem@davemloft.net>; Sebastian Andrzej Siewior
> <bigeasy@linutronix.de>
> Subject: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx
> timestamp directly from interrupt for i210
> 
> Retrieve Tx timestamp directly from interrupt handler for i210.
> 
> The current implementation uses schedule_work() which is executed by
> the system work queue to retrieve Tx timestamps. This increases
> latency and can lead to timeouts in case of heavy system load. i210 is
> often used in industrial systems, where timestamp timeouts can be
> fatal.
> 
> Therefore, fetch the timestamp directly from the interrupt handler.
> 
> The work queue code stays for all other NICs supported by igb.
> 
> Tested on Intel i210 and i350.
> 
> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
> ---
> Changes in v3:
> - Switch back to IRQ, but for i210 only
> - Keep kworker for all other NICs like i350 (Miroslav)
> - Link to v2: https://lore.kernel.org/r/20250822-igb_irq_ts-v2-1-
> 1ac37078a7a4@linutronix.de
> 
> Changes in v2:
> - Switch from IRQ to PTP aux worker due to NTP performance regression
> (Miroslav)
> - Link to v1: https://lore.kernel.org/r/20250815-igb_irq_ts-v1-1-
> 8c6fc0353422@linutronix.de
> ---
>  drivers/net/ethernet/intel/igb/igb.h      |  1 +
>  drivers/net/ethernet/intel/igb/igb_main.c |  5 ++++-
> drivers/net/ethernet/intel/igb/igb_ptp.c  | 22 ++++++++++++++++++++++
>  3 files changed, 27 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb.h
> b/drivers/net/ethernet/intel/igb/igb.h
> index 0fff1df81b7b..1de29670784e 100644
> --- a/drivers/net/ethernet/intel/igb/igb.h
> +++ b/drivers/net/ethernet/intel/igb/igb.h
> @@ -776,6 +776,7 @@ int igb_ptp_hwtstamp_get(struct net_device
> *netdev,  int igb_ptp_hwtstamp_set(struct net_device *netdev,
>  			 struct kernel_hwtstamp_config *config,
>  			 struct netlink_ext_ack *extack);
> +void igb_ptp_tx_tstamp_event(struct igb_adapter *adapter);
>  void igb_set_flag_queue_pairs(struct igb_adapter *, const u32);
> unsigned int igb_get_max_rss_queues(struct igb_adapter *);  #ifdef
> CONFIG_IGB_HWMON diff --git
> a/drivers/net/ethernet/intel/igb/igb_main.c
> b/drivers/net/ethernet/intel/igb/igb_main.c
> index dbea37269d2c..d0d9245e6d72 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -7078,7 +7078,10 @@ static void igb_tsync_interrupt(struct
> igb_adapter *adapter)
> 
>  	if (tsicr & E1000_TSICR_TXTS) {
>  		/* retrieve hardware timestamp */
> -		schedule_work(&adapter->ptp_tx_work);
> +		if (hw->mac.type == e1000_i210)
> +			igb_ptp_tx_tstamp_event(adapter); <-Called from IRQ!

> +		else
> +			schedule_work(&adapter->ptp_tx_work);
>  	}
> 
>  	if (tsicr & TSINTR_TT0)
> diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c
> b/drivers/net/ethernet/intel/igb/igb_ptp.c
> index bd85d02ecadd..8c8f2b8615f7 100644
> --- a/drivers/net/ethernet/intel/igb/igb_ptp.c
> +++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
> @@ -796,6 +796,28 @@ static int igb_ptp_verify_pin(struct
> ptp_clock_info *ptp, unsigned int pin,
>  	return 0;
>  }
> 
> +/**
> + * igb_ptp_tx_tstamp_event
> + * @adapter: pointer to igb adapter
> + *
> + * This function checks the TSYNCTXCTL valid bit and stores the Tx
> +hardware
> + * timestamp at the current skb.
> + **/
> +void igb_ptp_tx_tstamp_event(struct igb_adapter *adapter) {
> +	struct e1000_hw *hw = &adapter->hw;
> +	u32 tsynctxctl;
> +
> +	if (!adapter->ptp_tx_skb)
> +		return;
> +
> +	tsynctxctl = rd32(E1000_TSYNCTXCTL);
> +	if (WARN_ON_ONCE(!(tsynctxctl & E1000_TSYNCTXCTL_VALID)))
> +		return;
> +
> +	igb_ptp_tx_hwtstamp(adapter); <-Calls existing function designed for work queue!

skb_tstamp_tx() can sleep
Smells like sleep-in-atomic isn't it?

> +}
> +
>  /**
>   * igb_ptp_tx_work
>   * @work: pointer to work struct
> 
> ---
> base-commit: e07d0d30939990da377672ef49ca09763b4fbc79
> change-id: 20250813-igb_irq_ts-1aa77cc7b4cb
> 
> Best regards,
> --
> Kurt Kanzenbach <kurt@linutronix.de>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RE: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-05  9:47 ` [Intel-wired-lan] " Loktionov, Aleksandr
@ 2026-02-05 10:03   ` Sebastian Andrzej Siewior
  2026-02-05 10:37     ` Loktionov, Aleksandr
  2026-02-05 11:58   ` Kurt Kanzenbach
  1 sibling, 1 reply; 34+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-05 10:03 UTC (permalink / raw)
  To: Loktionov, Aleksandr
  Cc: Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Paul Menzel, Vadim Fedorenko, Gomes, Vinicius,
	netdev@vger.kernel.org, Richard Cochran,
	linux-kernel@vger.kernel.org, Andrew Lunn, Eric Dumazet,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Jakub Kicinski,
	Paolo Abeni, David S. Miller

On 2026-02-05 09:47:14 [+0000], Loktionov, Aleksandr wrote:
…
> > --- a/drivers/net/ethernet/intel/igb/igb_ptp.c
> > +++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
> > @@ -796,6 +796,28 @@ static int igb_ptp_verify_pin(struct
> > +	igb_ptp_tx_hwtstamp(adapter); <-Calls existing function designed for work queue!
> 
> skb_tstamp_tx() can sleep
> Smells like sleep-in-atomic isn't it?

How or where can it sleep?

Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: RE: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-05 10:03   ` Sebastian Andrzej Siewior
@ 2026-02-05 10:37     ` Loktionov, Aleksandr
  2026-02-05 10:52       ` Sebastian Andrzej Siewior
  2026-02-05 11:56       ` Vadim Fedorenko
  0 siblings, 2 replies; 34+ messages in thread
From: Loktionov, Aleksandr @ 2026-02-05 10:37 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Paul Menzel, Vadim Fedorenko, Gomes, Vinicius,
	netdev@vger.kernel.org, Richard Cochran,
	linux-kernel@vger.kernel.org, Andrew Lunn, Eric Dumazet,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Jakub Kicinski,
	Paolo Abeni, David S. Miller



> -----Original Message-----
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Sent: Thursday, February 5, 2026 11:04 AM
> To: Loktionov, Aleksandr <aleksandr.loktionov@intel.com>
> Cc: Kurt Kanzenbach <kurt@linutronix.de>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; Paul Menzel <pmenzel@molgen.mpg.de>;
> Vadim Fedorenko <vadim.fedorenko@linux.dev>; Gomes, Vinicius
> <vinicius.gomes@intel.com>; netdev@vger.kernel.org; Richard Cochran
> <richardcochran@gmail.com>; linux-kernel@vger.kernel.org; Andrew Lunn
> <andrew+netdev@lunn.ch>; Eric Dumazet <edumazet@google.com>; intel-
> wired-lan@lists.osuosl.org; Keller, Jacob E
> <jacob.e.keller@intel.com>; Jakub Kicinski <kuba@kernel.org>; Paolo
> Abeni <pabeni@redhat.com>; David S. Miller <davem@davemloft.net>
> Subject: Re: RE: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve
> Tx timestamp directly from interrupt for i210
> 
> On 2026-02-05 09:47:14 [+0000], Loktionov, Aleksandr wrote:
> …
> > > --- a/drivers/net/ethernet/intel/igb/igb_ptp.c
> > > +++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
> > > @@ -796,6 +796,28 @@ static int igb_ptp_verify_pin(struct
> …
> > > +	igb_ptp_tx_hwtstamp(adapter); <-Calls existing function
> designed for work queue!
> >
> > skb_tstamp_tx() can sleep
> > Smells like sleep-in-atomic isn't it?
> 
> How or where can it sleep?
> 
> Sebastian

igb_ptp_tx_hwtstamp() -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/skb_tstamp_tx -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/__skb_complete_tx_timestamp -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/sock_queue_err_skb -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/skb_queue_tail -> https://elixir.bootlin.com/linux/v6.19-rc5/source/net/core/skbuff.c#L4075 

spin_lock_irqsave(&wq_head->lock, flags);  <- RT mutex can sleep



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RE: RE: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-05 10:37     ` Loktionov, Aleksandr
@ 2026-02-05 10:52       ` Sebastian Andrzej Siewior
  2026-02-05 11:56       ` Vadim Fedorenko
  1 sibling, 0 replies; 34+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-05 10:52 UTC (permalink / raw)
  To: Loktionov, Aleksandr
  Cc: Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Paul Menzel, Vadim Fedorenko, Gomes, Vinicius,
	netdev@vger.kernel.org, Richard Cochran,
	linux-kernel@vger.kernel.org, Andrew Lunn, Eric Dumazet,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Jakub Kicinski,
	Paolo Abeni, David S. Miller

On 2026-02-05 10:37:05 [+0000], Loktionov, Aleksandr wrote:
> > 
> > How or where can it sleep?
> > 
> > Sebastian
> 
> igb_ptp_tx_hwtstamp() -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/skb_tstamp_tx -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/__skb_complete_tx_timestamp -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/sock_queue_err_skb -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/skb_queue_tail -> https://elixir.bootlin.com/linux/v6.19-rc5/source/net/core/skbuff.c#L4075 

Would you please quote an actual call chain that can be looked up and
not this where a line crosses 300 characters?

> spin_lock_irqsave(&wq_head->lock, flags);  <- RT mutex can sleep

Okay. So you are concerned about this spinlock_t, I see.
igb_tsync_interrupt() also invokes ptp_clock_event() which acquires
pps_event_time::tsevqs_lock. Why is this not a problem?

Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-05 10:37     ` Loktionov, Aleksandr
  2026-02-05 10:52       ` Sebastian Andrzej Siewior
@ 2026-02-05 11:56       ` Vadim Fedorenko
  2026-02-05 14:51         ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 34+ messages in thread
From: Vadim Fedorenko @ 2026-02-05 11:56 UTC (permalink / raw)
  To: Loktionov, Aleksandr, Sebastian Andrzej Siewior
  Cc: Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org,
	Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn,
	Eric Dumazet, intel-wired-lan@lists.osuosl.org, Keller, Jacob E,
	Jakub Kicinski, Paolo Abeni, David S. Miller

On 05/02/2026 10:37, Loktionov, Aleksandr wrote:
> 
> 
>> -----Original Message-----
>> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>> Sent: Thursday, February 5, 2026 11:04 AM
>> To: Loktionov, Aleksandr <aleksandr.loktionov@intel.com>
>> Cc: Kurt Kanzenbach <kurt@linutronix.de>; Nguyen, Anthony L
>> <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
>> <przemyslaw.kitszel@intel.com>; Paul Menzel <pmenzel@molgen.mpg.de>;
>> Vadim Fedorenko <vadim.fedorenko@linux.dev>; Gomes, Vinicius
>> <vinicius.gomes@intel.com>; netdev@vger.kernel.org; Richard Cochran
>> <richardcochran@gmail.com>; linux-kernel@vger.kernel.org; Andrew Lunn
>> <andrew+netdev@lunn.ch>; Eric Dumazet <edumazet@google.com>; intel-
>> wired-lan@lists.osuosl.org; Keller, Jacob E
>> <jacob.e.keller@intel.com>; Jakub Kicinski <kuba@kernel.org>; Paolo
>> Abeni <pabeni@redhat.com>; David S. Miller <davem@davemloft.net>
>> Subject: Re: RE: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve
>> Tx timestamp directly from interrupt for i210
>>
>> On 2026-02-05 09:47:14 [+0000], Loktionov, Aleksandr wrote:
>> …
>>>> --- a/drivers/net/ethernet/intel/igb/igb_ptp.c
>>>> +++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
>>>> @@ -796,6 +796,28 @@ static int igb_ptp_verify_pin(struct
>> …
>>>> +	igb_ptp_tx_hwtstamp(adapter); <-Calls existing function
>> designed for work queue!
>>>
>>> skb_tstamp_tx() can sleep
>>> Smells like sleep-in-atomic isn't it?
>>
>> How or where can it sleep?
>>
>> Sebastian
> 
> igb_ptp_tx_hwtstamp() -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/skb_tstamp_tx -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/__skb_complete_tx_timestamp -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/sock_queue_err_skb -> https://elixir.bootlin.com/linux/v6.19-rc5/C/ident/skb_queue_tail -> https://elixir.bootlin.com/linux/v6.19-rc5/source/net/core/skbuff.c#L4075
> 
> spin_lock_irqsave(&wq_head->lock, flags);  <- RT mutex can sleep

Hmm... that actually means we have some drivers broken for RT kernels if
they are processing TX timestamps within a single irq vector:
- hisilicon/hns3
- intel/i40e (and ice probably)
- marvell/mvpp2

For igb/igc/i40e it's still OK to process TX timestamps directly in
MSI-X configuration, as ring processing has separate vector, right?

But in general skb_tstamp_tx should be moved to BH processing (NAPI poll
callback).

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-05  9:47 ` [Intel-wired-lan] " Loktionov, Aleksandr
  2026-02-05 10:03   ` Sebastian Andrzej Siewior
@ 2026-02-05 11:58   ` Kurt Kanzenbach
  2026-02-05 12:20     ` Loktionov, Aleksandr
  1 sibling, 1 reply; 34+ messages in thread
From: Kurt Kanzenbach @ 2026-02-05 11:58 UTC (permalink / raw)
  To: Loktionov, Aleksandr, Nguyen, Anthony L, Kitszel, Przemyslaw
  Cc: Paul Menzel, Vadim Fedorenko, Gomes, Vinicius,
	netdev@vger.kernel.org, Richard Cochran,
	linux-kernel@vger.kernel.org, Andrew Lunn, Eric Dumazet,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Jakub Kicinski,
	Paolo Abeni, David S. Miller, Sebastian Andrzej Siewior

[-- Attachment #1: Type: text/plain, Size: 1013 bytes --]

On Thu Feb 05 2026, Loktionov, Aleksandr wrote:
>> +/**
>> + * igb_ptp_tx_tstamp_event
>> + * @adapter: pointer to igb adapter
>> + *
>> + * This function checks the TSYNCTXCTL valid bit and stores the Tx
>> +hardware
>> + * timestamp at the current skb.
>> + **/
>> +void igb_ptp_tx_tstamp_event(struct igb_adapter *adapter) {
>> +	struct e1000_hw *hw = &adapter->hw;
>> +	u32 tsynctxctl;
>> +
>> +	if (!adapter->ptp_tx_skb)
>> +		return;
>> +
>> +	tsynctxctl = rd32(E1000_TSYNCTXCTL);
>> +	if (WARN_ON_ONCE(!(tsynctxctl & E1000_TSYNCTXCTL_VALID)))
>> +		return;
>> +
>> +	igb_ptp_tx_hwtstamp(adapter); <-Calls existing function designed for work queue!
>
> skb_tstamp_tx() can sleep
> Smells like sleep-in-atomic isn't it?

AFAICS skb_tstamp_tx() is safe to call here.

> spin_lock_irqsave(&wq_head->lock, flags);  <- RT mutex can sleep

In case you're worried about PREEMPT_RT: On -RT the IRQ runs a dedicated
thread. BTW I've tested this with and without -RT and with
CONFIG_DEBUG_ATOMIC_SLEEP.

Thanks,
Kurt

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 861 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-05  7:54 [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 Kurt Kanzenbach
  2026-02-05  9:47 ` [Intel-wired-lan] " Loktionov, Aleksandr
@ 2026-02-05 12:12 ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 34+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-05 12:12 UTC (permalink / raw)
  To: Kurt Kanzenbach
  Cc: Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Richard Cochran,
	Vinicius Costa Gomes, Paul Menzel, Vadim Fedorenko,
	Miroslav Lichvar, Jacob Keller, intel-wired-lan, netdev,
	linux-kernel

On 2026-02-05 08:54:34 [+0100], Kurt Kanzenbach wrote:
> Retrieve Tx timestamp directly from interrupt handler for i210.
> 
> The current implementation uses schedule_work() which is executed by the
> system work queue to retrieve Tx timestamps. This increases latency and can
> lead to timeouts in case of heavy system load. i210 is often used in
> industrial systems, where timestamp timeouts can be fatal.
> 
> Therefore, fetch the timestamp directly from the interrupt handler.
> 
> The work queue code stays for all other NICs supported by igb.
> 
> Tested on Intel i210 and i350.
> 
> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

IMHO this is a compromise with Miroslav where he observed less PTP
timestamps on the i350. While testing I did not get near Miroslav's
difference but there was a small change. I don't understand *why*
because the current workqueue usage reads the timestamp on the same CPU
on which the interrupt occurred. Doing it directly just avoids the
context switch. This feels beneficial.

Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-05 11:58   ` Kurt Kanzenbach
@ 2026-02-05 12:20     ` Loktionov, Aleksandr
  2026-02-06  0:05       ` Jacob Keller
  0 siblings, 1 reply; 34+ messages in thread
From: Loktionov, Aleksandr @ 2026-02-05 12:20 UTC (permalink / raw)
  To: Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw
  Cc: Paul Menzel, Vadim Fedorenko, Gomes, Vinicius,
	netdev@vger.kernel.org, Richard Cochran,
	linux-kernel@vger.kernel.org, Andrew Lunn, Eric Dumazet,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Jakub Kicinski,
	Paolo Abeni, David S. Miller, Sebastian Andrzej Siewior



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Kurt Kanzenbach
> Sent: Thursday, February 5, 2026 12:58 PM
> To: Loktionov, Aleksandr <aleksandr.loktionov@intel.com>; Nguyen,
> Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>
> Cc: Paul Menzel <pmenzel@molgen.mpg.de>; Vadim Fedorenko
> <vadim.fedorenko@linux.dev>; Gomes, Vinicius
> <vinicius.gomes@intel.com>; netdev@vger.kernel.org; Richard Cochran
> <richardcochran@gmail.com>; linux-kernel@vger.kernel.org; Andrew Lunn
> <andrew+netdev@lunn.ch>; Eric Dumazet <edumazet@google.com>; intel-
> wired-lan@lists.osuosl.org; Keller, Jacob E
> <jacob.e.keller@intel.com>; Jakub Kicinski <kuba@kernel.org>; Paolo
> Abeni <pabeni@redhat.com>; David S. Miller <davem@davemloft.net>;
> Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Subject: Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx
> timestamp directly from interrupt for i210
> 
> On Thu Feb 05 2026, Loktionov, Aleksandr wrote:
> >> +/**
> >> + * igb_ptp_tx_tstamp_event
> >> + * @adapter: pointer to igb adapter
> >> + *
> >> + * This function checks the TSYNCTXCTL valid bit and stores the Tx
> >> +hardware
> >> + * timestamp at the current skb.
> >> + **/
> >> +void igb_ptp_tx_tstamp_event(struct igb_adapter *adapter) {
> >> +	struct e1000_hw *hw = &adapter->hw;
> >> +	u32 tsynctxctl;
> >> +
> >> +	if (!adapter->ptp_tx_skb)
> >> +		return;
> >> +
> >> +	tsynctxctl = rd32(E1000_TSYNCTXCTL);
> >> +	if (WARN_ON_ONCE(!(tsynctxctl & E1000_TSYNCTXCTL_VALID)))
> >> +		return;
> >> +
> >> +	igb_ptp_tx_hwtstamp(adapter); <-Calls existing function
> designed for work queue!
> >
> > skb_tstamp_tx() can sleep
> > Smells like sleep-in-atomic isn't it?
> 
> AFAICS skb_tstamp_tx() is safe to call here.
> 
> > spin_lock_irqsave(&wq_head->lock, flags);  <- RT mutex can sleep
> 
> In case you're worried about PREEMPT_RT: On -RT the IRQ runs a
> dedicated thread. BTW I've tested this with and without -RT and with
> CONFIG_DEBUG_ATOMIC_SLEEP.
> 
> Thanks,
> Kurt

Thank you, Kurt for sharing your experience. I don't have so many experience with RT Linux.
For me calling a function, not designed to be called from IRQ context is a SUS.
So, I rose the question about sleeping.

But there is also a question about non-RT Kernels with Shared IRQ Vectors...
I.e. regular packet processing (NAPI poll in softirq) and PTP timestamp events (in hardirq).

I suspect we have potentially broken drivers with shared vectors (MSI-X should be safe I hope).

I'd like the comment to be added:
/* Safe to call from IRQ: dedicated misc vector + RT-safe on PREEMPT_RT */
igb_ptp_tx_hwtstamp(adapter);

But in long term perspective prefer to refactor moving to NAPI is the safer, more maintainable pattern.

What do you think?
Alex

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-05 11:56       ` Vadim Fedorenko
@ 2026-02-05 14:51         ` Sebastian Andrzej Siewior
  2026-02-05 16:27           ` Vadim Fedorenko
  0 siblings, 1 reply; 34+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-05 14:51 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L,
	Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius,
	netdev@vger.kernel.org, Richard Cochran,
	linux-kernel@vger.kernel.org, Andrew Lunn, Eric Dumazet,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Jakub Kicinski,
	Paolo Abeni, David S. Miller

On 2026-02-05 11:56:44 [+0000], Vadim Fedorenko wrote:
> On 05/02/2026 10:37, Loktionov, Aleksandr wrote:
> > spin_lock_irqsave(&wq_head->lock, flags);  <- RT mutex can sleep
> 
> Hmm... that actually means we have some drivers broken for RT kernels if
> they are processing TX timestamps within a single irq vector:
> - hisilicon/hns3
> - intel/i40e (and ice probably)
> - marvell/mvpp2
> 
> For igb/igc/i40e it's still OK to process TX timestamps directly in
> MSI-X configuration, as ring processing has separate vector, right?

The statement made above is not accurate. Each and every driver does
request_irq() and here on PREEMPT_RT you can freely acquire spinlock_t.

But !RT looks problematic…

__skb_tstamp_tx() invokes skb_may_tx_timestamp() which should exit early
most of the time due to the passed bool (which is true) or
sysctl_tstamp_allow_data which is true. However, should both be false
then it tries to
	read_lock_bh(&sk->sk_callback_lock);

where lockdep will complain because this lock is now acquired with
disabled interrupts.

The function will attempt do free the fresh/ cloned skb in error case
via kfree_skb(). Since it is fresh skb, sk_buff::destructor is NULL and
the warning in skb_release_head_state() won't trigger.

So the only thing that bothers me is the read_lock_bh() in
skb_may_tx_timestamp() which deadlocks if the socket is write-locked on
the same CPU.

> But in general skb_tstamp_tx should be moved to BH processing (NAPI poll
> callback).

Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-05 14:51         ` Sebastian Andrzej Siewior
@ 2026-02-05 16:27           ` Vadim Fedorenko
  2026-02-05 16:43             ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 34+ messages in thread
From: Vadim Fedorenko @ 2026-02-05 16:27 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L,
	Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius,
	netdev@vger.kernel.org, Richard Cochran,
	linux-kernel@vger.kernel.org, Andrew Lunn, Eric Dumazet,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Jakub Kicinski,
	Paolo Abeni, David S. Miller

On 05/02/2026 14:51, Sebastian Andrzej Siewior wrote:
> On 2026-02-05 11:56:44 [+0000], Vadim Fedorenko wrote:
>> On 05/02/2026 10:37, Loktionov, Aleksandr wrote:
>>> spin_lock_irqsave(&wq_head->lock, flags);  <- RT mutex can sleep
>>
>> Hmm... that actually means we have some drivers broken for RT kernels if
>> they are processing TX timestamps within a single irq vector:
>> - hisilicon/hns3
>> - intel/i40e (and ice probably)
>> - marvell/mvpp2
>>
>> For igb/igc/i40e it's still OK to process TX timestamps directly in
>> MSI-X configuration, as ring processing has separate vector, right?
> 
> The statement made above is not accurate. Each and every driver does
> request_irq() and here on PREEMPT_RT you can freely acquire spinlock_t.
> 
> But !RT looks problematic…
> 
> __skb_tstamp_tx() invokes skb_may_tx_timestamp() which should exit early
> most of the time due to the passed bool (which is true) or
> sysctl_tstamp_allow_data which is true. However, should both be false
> then it tries to
> 	read_lock_bh(&sk->sk_callback_lock);
> 
> where lockdep will complain because this lock is now acquired with
> disabled interrupts.
> 
> The function will attempt do free the fresh/ cloned skb in error case
> via kfree_skb(). Since it is fresh skb, sk_buff::destructor is NULL and
> the warning in skb_release_head_state() won't trigger.
> 
> So the only thing that bothers me is the read_lock_bh() in
> skb_may_tx_timestamp() which deadlocks if the socket is write-locked on
> the same CPU.

Alright. Now you make me think whether we should enforce OPT_TSONLY
option on socket which doesn't have CAP_NET_RAW? Then we can get rid of 
this check, and in case sysctl was flipped off - drop TX timestamps as
it's done now?


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-05 16:27           ` Vadim Fedorenko
@ 2026-02-05 16:43             ` Sebastian Andrzej Siewior
  2026-02-05 16:48               ` Vadim Fedorenko
  0 siblings, 1 reply; 34+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-05 16:43 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L,
	Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius,
	netdev@vger.kernel.org, Richard Cochran,
	linux-kernel@vger.kernel.org, Andrew Lunn, Eric Dumazet,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E, Jakub Kicinski,
	Paolo Abeni, David S. Miller

On 2026-02-05 16:27:03 [+0000], Vadim Fedorenko wrote:
> > So the only thing that bothers me is the read_lock_bh() in
> > skb_may_tx_timestamp() which deadlocks if the socket is write-locked on
> > the same CPU.
> 
> Alright. Now you make me think whether we should enforce OPT_TSONLY
> option on socket which doesn't have CAP_NET_RAW? Then we can get rid of this
> check, and in case sysctl was flipped off - drop TX timestamps as
> it's done now?

This would "fix" this problem for all users which do deliver the
timestamp from their IRQ handler instead of napi. There are a few of
those…
This would be considered stable material, right? (despite the fact that
we have it for quite some time and nobody complained so far).

Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-05 16:43             ` Sebastian Andrzej Siewior
@ 2026-02-05 16:48               ` Vadim Fedorenko
  2026-02-05 21:41                 ` Willem de Bruijn
  0 siblings, 1 reply; 34+ messages in thread
From: Vadim Fedorenko @ 2026-02-05 16:48 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Willem de Bruijn, Jakub Kicinski,
	Paolo Abeni, Eric Dumazet, David S. Miller
  Cc: Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L,
	Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius,
	netdev@vger.kernel.org, Richard Cochran,
	linux-kernel@vger.kernel.org, Andrew Lunn,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E

On 05/02/2026 16:43, Sebastian Andrzej Siewior wrote:
> On 2026-02-05 16:27:03 [+0000], Vadim Fedorenko wrote:
>>> So the only thing that bothers me is the read_lock_bh() in
>>> skb_may_tx_timestamp() which deadlocks if the socket is write-locked on
>>> the same CPU.
>>
>> Alright. Now you make me think whether we should enforce OPT_TSONLY
>> option on socket which doesn't have CAP_NET_RAW? Then we can get rid of this
>> check, and in case sysctl was flipped off - drop TX timestamps as
>> it's done now?
> 
> This would "fix" this problem for all users which do deliver the
> timestamp from their IRQ handler instead of napi. There are a few of
> those…
> This would be considered stable material, right? (despite the fact that
> we have it for quite some time and nobody complained so far).

cc: Willem as he is the author of the check introduced back in 2015.

But it's more like a question to maintainers whether it is acceptable
way of "fixing" drivers or it's no-go solution

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-05 16:48               ` Vadim Fedorenko
@ 2026-02-05 21:41                 ` Willem de Bruijn
  2026-02-06  7:44                   ` Sebastian Andrzej Siewior
  2026-02-06 10:12                   ` Vadim Fedorenko
  0 siblings, 2 replies; 34+ messages in thread
From: Willem de Bruijn @ 2026-02-05 21:41 UTC (permalink / raw)
  To: Vadim Fedorenko, Sebastian Andrzej Siewior, Willem de Bruijn,
	Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller
  Cc: Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L,
	Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius,
	netdev@vger.kernel.org, Richard Cochran,
	linux-kernel@vger.kernel.org, Andrew Lunn,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E

Vadim Fedorenko wrote:
> On 05/02/2026 16:43, Sebastian Andrzej Siewior wrote:
> > On 2026-02-05 16:27:03 [+0000], Vadim Fedorenko wrote:
> >>> So the only thing that bothers me is the read_lock_bh() in
> >>> skb_may_tx_timestamp() which deadlocks if the socket is write-locked on
> >>> the same CPU.
> >>
> >> Alright. Now you make me think whether we should enforce OPT_TSONLY
> >> option on socket which doesn't have CAP_NET_RAW? Then we can get rid of this
> >> check, and in case sysctl was flipped off - drop TX timestamps as
> >> it's done now?
> > 
> > This would "fix" this problem for all users which do deliver the
> > timestamp from their IRQ handler instead of napi. There are a few of
> > those…
> > This would be considered stable material, right? (despite the fact that
> > we have it for quite some time and nobody complained so far).
> 
> cc: Willem as he is the author of the check introduced back in 2015.
> 
> But it's more like a question to maintainers whether it is acceptable
> way of "fixing" drivers or it's no-go solution

Requiring OPT_TSONLY unless CAP_NET_RAW would break legacy users.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-05 12:20     ` Loktionov, Aleksandr
@ 2026-02-06  0:05       ` Jacob Keller
  0 siblings, 0 replies; 34+ messages in thread
From: Jacob Keller @ 2026-02-06  0:05 UTC (permalink / raw)
  To: Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L,
	Kitszel, Przemyslaw
  Cc: Paul Menzel, Vadim Fedorenko, Gomes, Vinicius,
	netdev@vger.kernel.org, Richard Cochran,
	linux-kernel@vger.kernel.org, Andrew Lunn, Eric Dumazet,
	intel-wired-lan@lists.osuosl.org, Jakub Kicinski, Paolo Abeni,
	David S. Miller, Sebastian Andrzej Siewior



On 2/5/2026 4:20 AM, Loktionov, Aleksandr wrote:
> 
> 
>> -----Original Message-----
>> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
>> Of Kurt Kanzenbach
>> Sent: Thursday, February 5, 2026 12:58 PM
>> To: Loktionov, Aleksandr <aleksandr.loktionov@intel.com>; Nguyen,
>> Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
>> <przemyslaw.kitszel@intel.com>
>> Cc: Paul Menzel <pmenzel@molgen.mpg.de>; Vadim Fedorenko
>> <vadim.fedorenko@linux.dev>; Gomes, Vinicius
>> <vinicius.gomes@intel.com>; netdev@vger.kernel.org; Richard Cochran
>> <richardcochran@gmail.com>; linux-kernel@vger.kernel.org; Andrew Lunn
>> <andrew+netdev@lunn.ch>; Eric Dumazet <edumazet@google.com>; intel-
>> wired-lan@lists.osuosl.org; Keller, Jacob E
>> <jacob.e.keller@intel.com>; Jakub Kicinski <kuba@kernel.org>; Paolo
>> Abeni <pabeni@redhat.com>; David S. Miller <davem@davemloft.net>;
>> Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>> Subject: Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx
>> timestamp directly from interrupt for i210
>>
>> On Thu Feb 05 2026, Loktionov, Aleksandr wrote:
>>>> +/**
>>>> + * igb_ptp_tx_tstamp_event
>>>> + * @adapter: pointer to igb adapter
>>>> + *
>>>> + * This function checks the TSYNCTXCTL valid bit and stores the Tx
>>>> +hardware
>>>> + * timestamp at the current skb.
>>>> + **/
>>>> +void igb_ptp_tx_tstamp_event(struct igb_adapter *adapter) {
>>>> +	struct e1000_hw *hw = &adapter->hw;
>>>> +	u32 tsynctxctl;
>>>> +
>>>> +	if (!adapter->ptp_tx_skb)
>>>> +		return;
>>>> +
>>>> +	tsynctxctl = rd32(E1000_TSYNCTXCTL);
>>>> +	if (WARN_ON_ONCE(!(tsynctxctl & E1000_TSYNCTXCTL_VALID)))
>>>> +		return;
>>>> +
>>>> +	igb_ptp_tx_hwtstamp(adapter); <-Calls existing function
>> designed for work queue!
>>>
>>> skb_tstamp_tx() can sleep
>>> Smells like sleep-in-atomic isn't it?
>>
>> AFAICS skb_tstamp_tx() is safe to call here.
>>
>>> spin_lock_irqsave(&wq_head->lock, flags);  <- RT mutex can sleep
>>
>> In case you're worried about PREEMPT_RT: On -RT the IRQ runs a
>> dedicated thread. BTW I've tested this with and without -RT and with
>> CONFIG_DEBUG_ATOMIC_SLEEP.
>>
>> Thanks,
>> Kurt
> 
> Thank you, Kurt for sharing your experience. I don't have so many experience with RT Linux.
> For me calling a function, not designed to be called from IRQ context is a SUS.
> So, I rose the question about sleeping.
> 

My understanding is that RT is only safe to convert such spinlock_t to 
mutex *because* it also converts IRQs to threads.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-05 21:41                 ` Willem de Bruijn
@ 2026-02-06  7:44                   ` Sebastian Andrzej Siewior
  2026-02-06 10:12                   ` Vadim Fedorenko
  1 sibling, 0 replies; 34+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-06  7:44 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, David S. Miller, Loktionov, Aleksandr,
	Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org,
	Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E

On 2026-02-05 16:41:03 [-0500], Willem de Bruijn wrote:
> 
> Requiring OPT_TSONLY unless CAP_NET_RAW would break legacy users.

okay. Can we move the check to sock_set_timestamping()/ setsockopt()
time? On the plus side we could throw an error instead silently dropping
packets. This might be a late win given that you describe the users as
legacy users.
I'm not sure if the "permission" can change over time and so get revoked
while an application is running.

Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-05 21:41                 ` Willem de Bruijn
  2026-02-06  7:44                   ` Sebastian Andrzej Siewior
@ 2026-02-06 10:12                   ` Vadim Fedorenko
  2026-02-08 16:25                     ` Willem de Bruijn
  1 sibling, 1 reply; 34+ messages in thread
From: Vadim Fedorenko @ 2026-02-06 10:12 UTC (permalink / raw)
  To: Willem de Bruijn, Sebastian Andrzej Siewior, Willem de Bruijn,
	Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller
  Cc: Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L,
	Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius,
	netdev@vger.kernel.org, Richard Cochran,
	linux-kernel@vger.kernel.org, Andrew Lunn,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E

On 05.02.2026 21:41, Willem de Bruijn wrote:
> Vadim Fedorenko wrote:
>> On 05/02/2026 16:43, Sebastian Andrzej Siewior wrote:
>>> On 2026-02-05 16:27:03 [+0000], Vadim Fedorenko wrote:
>>>>> So the only thing that bothers me is the read_lock_bh() in
>>>>> skb_may_tx_timestamp() which deadlocks if the socket is write-locked on
>>>>> the same CPU.
>>>>
>>>> Alright. Now you make me think whether we should enforce OPT_TSONLY
>>>> option on socket which doesn't have CAP_NET_RAW? Then we can get rid of this
>>>> check, and in case sysctl was flipped off - drop TX timestamps as
>>>> it's done now?
>>>
>>> This would "fix" this problem for all users which do deliver the
>>> timestamp from their IRQ handler instead of napi. There are a few of
>>> those…
>>> This would be considered stable material, right? (despite the fact that
>>> we have it for quite some time and nobody complained so far).
>>
>> cc: Willem as he is the author of the check introduced back in 2015.
>>
>> But it's more like a question to maintainers whether it is acceptable
>> way of "fixing" drivers or it's no-go solution
> 
> Requiring OPT_TSONLY unless CAP_NET_RAW would break legacy users.

Well, they are kinda broken already. Without OPT_TSONLY and CAP_NET_RAW all TX
timestamps are silently dropped. To receive these timestamps users have to get
CAP_NET_RAW permission, and it will work with the updated logic as well...

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-06 10:12                   ` Vadim Fedorenko
@ 2026-02-08 16:25                     ` Willem de Bruijn
  2026-02-09  9:06                       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 34+ messages in thread
From: Willem de Bruijn @ 2026-02-08 16:25 UTC (permalink / raw)
  To: Vadim Fedorenko, Willem de Bruijn, Sebastian Andrzej Siewior,
	Willem de Bruijn, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	David S. Miller
  Cc: Loktionov, Aleksandr, Kurt Kanzenbach, Nguyen, Anthony L,
	Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius,
	netdev@vger.kernel.org, Richard Cochran,
	linux-kernel@vger.kernel.org, Andrew Lunn,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E

Vadim Fedorenko wrote:
> On 05.02.2026 21:41, Willem de Bruijn wrote:
> > Vadim Fedorenko wrote:
> >> On 05/02/2026 16:43, Sebastian Andrzej Siewior wrote:
> >>> On 2026-02-05 16:27:03 [+0000], Vadim Fedorenko wrote:
> >>>>> So the only thing that bothers me is the read_lock_bh() in
> >>>>> skb_may_tx_timestamp() which deadlocks if the socket is write-locked on
> >>>>> the same CPU.
> >>>>
> >>>> Alright. Now you make me think whether we should enforce OPT_TSONLY
> >>>> option on socket which doesn't have CAP_NET_RAW? Then we can get rid of this
> >>>> check, and in case sysctl was flipped off - drop TX timestamps as
> >>>> it's done now?
> >>>
> >>> This would "fix" this problem for all users which do deliver the
> >>> timestamp from their IRQ handler instead of napi. There are a few of
> >>> those…
> >>> This would be considered stable material, right? (despite the fact that
> >>> we have it for quite some time and nobody complained so far).
> >>
> >> cc: Willem as he is the author of the check introduced back in 2015.
> >>
> >> But it's more like a question to maintainers whether it is acceptable
> >> way of "fixing" drivers or it's no-go solution
> > 
> > Requiring OPT_TSONLY unless CAP_NET_RAW would break legacy users.
> 
> Well, they are kinda broken already. Without OPT_TSONLY and CAP_NET_RAW all TX
> timestamps are silently dropped.

Are you referring to sysctl_tstamp_allow_data?

That is enabled by default.

> To receive these timestamps users have to get
> CAP_NET_RAW permission, and it will work with the updated logic as well...

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-08 16:25                     ` Willem de Bruijn
@ 2026-02-09  9:06                       ` Sebastian Andrzej Siewior
  2026-02-09 10:43                         ` Vadim Fedorenko
  0 siblings, 1 reply; 34+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-09  9:06 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, David S. Miller, Loktionov, Aleksandr,
	Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org,
	Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E

On 2026-02-08 11:25:40 [-0500], Willem de Bruijn wrote:
> > >> But it's more like a question to maintainers whether it is acceptable
> > >> way of "fixing" drivers or it's no-go solution
> > > 
> > > Requiring OPT_TSONLY unless CAP_NET_RAW would break legacy users.
> > 
> > Well, they are kinda broken already. Without OPT_TSONLY and CAP_NET_RAW all TX
> > timestamps are silently dropped.
> 
> Are you referring to sysctl_tstamp_allow_data?
> 
> That is enabled by default.

Yes. If so, then we don't need the check below which requires
sk_callback_lock.

Are SIOCSHWTSTAMP the legacy users or the ones which do not set
OPT_TSONLY?

I would suggest to move the CAP_NET_RAW check to the point where
timestamping is getting enabled.
Also if ndo_hwtstamp_set is the preferred method of getting things done,
I could check how many old ones are can be easily converted…

> > To receive these timestamps users have to get
> > CAP_NET_RAW permission, and it will work with the updated logic as well...

Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-09  9:06                       ` Sebastian Andrzej Siewior
@ 2026-02-09 10:43                         ` Vadim Fedorenko
  2026-02-09 11:48                           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 34+ messages in thread
From: Vadim Fedorenko @ 2026-02-09 10:43 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Willem de Bruijn
  Cc: Willem de Bruijn, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	David S. Miller, Loktionov, Aleksandr, Kurt Kanzenbach,
	Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel,
	Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran,
	linux-kernel@vger.kernel.org, Andrew Lunn,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E

On 09/02/2026 09:06, Sebastian Andrzej Siewior wrote:
> On 2026-02-08 11:25:40 [-0500], Willem de Bruijn wrote:
>>>>> But it's more like a question to maintainers whether it is acceptable
>>>>> way of "fixing" drivers or it's no-go solution
>>>>
>>>> Requiring OPT_TSONLY unless CAP_NET_RAW would break legacy users.
>>>
>>> Well, they are kinda broken already. Without OPT_TSONLY and CAP_NET_RAW all TX
>>> timestamps are silently dropped.
>>
>> Are you referring to sysctl_tstamp_allow_data?
>>
>> That is enabled by default.
> 
> Yes. If so, then we don't need the check below which requires
> sk_callback_lock.
> 
> Are SIOCSHWTSTAMP the legacy users or the ones which do not set
> OPT_TSONLY?
> 
> I would suggest to move the CAP_NET_RAW check to the point where
> timestamping is getting enabled.
> Also if ndo_hwtstamp_set is the preferred method of getting things done,
> I could check how many old ones are can be easily converted…

Looks like you are mixing things. SIOCSHWTSTAMP/ndo_hwtstamp_set are HW
configuration calls while OPT_TSONLY is socket option, which is setup 
via setsockopt, you can find points searching for
SOF_TIMESTAMPING_OPT_TSONLY in the sources, basically
sock_set_timestamping() is the function to check

> 
>>> To receive these timestamps users have to get
>>> CAP_NET_RAW permission, and it will work with the updated logic as well...
> 
> Sebastian


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-09 10:43                         ` Vadim Fedorenko
@ 2026-02-09 11:48                           ` Sebastian Andrzej Siewior
  2026-02-09 12:24                             ` Vadim Fedorenko
  0 siblings, 1 reply; 34+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-09 11:48 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: Willem de Bruijn, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, David S. Miller, Loktionov, Aleksandr,
	Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org,
	Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E

On 2026-02-09 10:43:55 [+0000], Vadim Fedorenko wrote:
> On 09/02/2026 09:06, Sebastian Andrzej Siewior wrote:
> > On 2026-02-08 11:25:40 [-0500], Willem de Bruijn wrote:
> > > > > > But it's more like a question to maintainers whether it is acceptable
> > > > > > way of "fixing" drivers or it's no-go solution
> > > > > 
> > > > > Requiring OPT_TSONLY unless CAP_NET_RAW would break legacy users.
> > > > 
> > > > Well, they are kinda broken already. Without OPT_TSONLY and CAP_NET_RAW all TX
> > > > timestamps are silently dropped.
> > > 
> > > Are you referring to sysctl_tstamp_allow_data?
> > > 
> > > That is enabled by default.
> > 
> > Yes. If so, then we don't need the check below which requires
> > sk_callback_lock.
> > 
> > Are SIOCSHWTSTAMP the legacy users or the ones which do not set
> > OPT_TSONLY?
> > 
> > I would suggest to move the CAP_NET_RAW check to the point where
> > timestamping is getting enabled.
> > Also if ndo_hwtstamp_set is the preferred method of getting things done,
> > I could check how many old ones are can be easily converted…
> 
> Looks like you are mixing things. SIOCSHWTSTAMP/ndo_hwtstamp_set are HW
> configuration calls while OPT_TSONLY is socket option, which is setup via
> setsockopt, you can find points searching for
> SOF_TIMESTAMPING_OPT_TSONLY in the sources, basically
> sock_set_timestamping() is the function to check

Yeah, but what is the legacy user here? If you enable HW-timestamps but
never set OPT_TSONLY and the sysctl is also 0 then you reply on the
CAP_NET_RAW later on. Right?

I just try to justify the CAP_NET_RAW check and if it is required to
move it earlier (where HW timestamps are enabled). And if the sysctl
check is enough then maybe it is not needed.
 
> > > > To receive these timestamps users have to get
> > > > CAP_NET_RAW permission, and it will work with the updated logic as well...

Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-09 11:48                           ` Sebastian Andrzej Siewior
@ 2026-02-09 12:24                             ` Vadim Fedorenko
  2026-02-09 12:46                               ` Willem de Bruijn
  0 siblings, 1 reply; 34+ messages in thread
From: Vadim Fedorenko @ 2026-02-09 12:24 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Willem de Bruijn, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, David S. Miller, Loktionov, Aleksandr,
	Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org,
	Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E

On 09/02/2026 11:48, Sebastian Andrzej Siewior wrote:
> On 2026-02-09 10:43:55 [+0000], Vadim Fedorenko wrote:
>> On 09/02/2026 09:06, Sebastian Andrzej Siewior wrote:
>>> On 2026-02-08 11:25:40 [-0500], Willem de Bruijn wrote:
>>>>>>> But it's more like a question to maintainers whether it is acceptable
>>>>>>> way of "fixing" drivers or it's no-go solution
>>>>>>
>>>>>> Requiring OPT_TSONLY unless CAP_NET_RAW would break legacy users.
>>>>>
>>>>> Well, they are kinda broken already. Without OPT_TSONLY and CAP_NET_RAW all TX
>>>>> timestamps are silently dropped.
>>>>
>>>> Are you referring to sysctl_tstamp_allow_data?
>>>>
>>>> That is enabled by default.
>>>
>>> Yes. If so, then we don't need the check below which requires
>>> sk_callback_lock.
>>>
>>> Are SIOCSHWTSTAMP the legacy users or the ones which do not set
>>> OPT_TSONLY?
>>>
>>> I would suggest to move the CAP_NET_RAW check to the point where
>>> timestamping is getting enabled.
>>> Also if ndo_hwtstamp_set is the preferred method of getting things done,
>>> I could check how many old ones are can be easily converted…
>>
>> Looks like you are mixing things. SIOCSHWTSTAMP/ndo_hwtstamp_set are HW
>> configuration calls while OPT_TSONLY is socket option, which is setup via
>> setsockopt, you can find points searching for
>> SOF_TIMESTAMPING_OPT_TSONLY in the sources, basically
>> sock_set_timestamping() is the function to check
> 
> Yeah, but what is the legacy user here? If you enable HW-timestamps but
> never set OPT_TSONLY and the sysctl is also 0 then you reply on the
> CAP_NET_RAW later on. Right?

Legacy users here means users of HW TX timestamps expecting full skb to 
be returned back with the TX timestamp. Legacy here means that skb will 
be returned with headers modified by stack, which is kind of exposure of
data, which requires CAP_NET_RAW...

> I just try to justify the CAP_NET_RAW check and if it is required to
> move it earlier (where HW timestamps are enabled). And if the sysctl
> check is enough then maybe it is not needed.

Capabilities should not change during lifetime of the process, should be
fine to move. On the other, sysctl can be changed system-wide which may 
affect users.

>   
>>>>> To receive these timestamps users have to get
>>>>> CAP_NET_RAW permission, and it will work with the updated logic as well...
> 
> Sebastian


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-09 12:24                             ` Vadim Fedorenko
@ 2026-02-09 12:46                               ` Willem de Bruijn
  2026-02-10 12:12                                 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 34+ messages in thread
From: Willem de Bruijn @ 2026-02-09 12:46 UTC (permalink / raw)
  To: Vadim Fedorenko, Sebastian Andrzej Siewior
  Cc: Willem de Bruijn, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, David S. Miller, Loktionov, Aleksandr,
	Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org,
	Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E

Vadim Fedorenko wrote:
> On 09/02/2026 11:48, Sebastian Andrzej Siewior wrote:
> > On 2026-02-09 10:43:55 [+0000], Vadim Fedorenko wrote:
> >> On 09/02/2026 09:06, Sebastian Andrzej Siewior wrote:
> >>> On 2026-02-08 11:25:40 [-0500], Willem de Bruijn wrote:
> >>>>>>> But it's more like a question to maintainers whether it is acceptable
> >>>>>>> way of "fixing" drivers or it's no-go solution
> >>>>>>
> >>>>>> Requiring OPT_TSONLY unless CAP_NET_RAW would break legacy users.
> >>>>>
> >>>>> Well, they are kinda broken already. Without OPT_TSONLY and CAP_NET_RAW all TX
> >>>>> timestamps are silently dropped.
> >>>>
> >>>> Are you referring to sysctl_tstamp_allow_data?
> >>>>
> >>>> That is enabled by default.
> >>>
> >>> Yes. If so, then we don't need the check below which requires
> >>> sk_callback_lock.
> >>>
> >>> Are SIOCSHWTSTAMP the legacy users or the ones which do not set
> >>> OPT_TSONLY?
> >>>
> >>> I would suggest to move the CAP_NET_RAW check to the point where
> >>> timestamping is getting enabled.
> >>> Also if ndo_hwtstamp_set is the preferred method of getting things done,
> >>> I could check how many old ones are can be easily converted…
> >>
> >> Looks like you are mixing things. SIOCSHWTSTAMP/ndo_hwtstamp_set are HW
> >> configuration calls while OPT_TSONLY is socket option, which is setup via
> >> setsockopt, you can find points searching for
> >> SOF_TIMESTAMPING_OPT_TSONLY in the sources, basically
> >> sock_set_timestamping() is the function to check
> > 
> > Yeah, but what is the legacy user here? If you enable HW-timestamps but
> > never set OPT_TSONLY and the sysctl is also 0 then you reply on the
> > CAP_NET_RAW later on. Right?
> 
> Legacy users here means users of HW TX timestamps expecting full skb to 
> be returned back with the TX timestamp. Legacy here means that skb will 
> be returned with headers modified by stack, which is kind of exposure of
> data, which requires CAP_NET_RAW...
> 
> > I just try to justify the CAP_NET_RAW check and if it is required to
> > move it earlier (where HW timestamps are enabled). And if the sysctl
> > check is enough then maybe it is not needed.
> 
> Capabilities should not change during lifetime of the process, should be
> fine to move. On the other, sysctl can be changed system-wide which may 
> affect users.

Ignore the hardware configuration. That is entirely optional. Some
devices will timestamp every packet.

The capability check here is per-socket, independent from the system
hardware configuration.

I don't see how it could be moved.

Before OPT_TSONLY was introduced packets were always queued with their
payload. The sysctl check was added to optionally disallow this. The
check could arguably be moved earlier in the socket lifecycle and the
decision cached in the socket. But then flipping the sysctl would not
affect existing sockets, so that is a change in ABI behavior.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-09 12:46                               ` Willem de Bruijn
@ 2026-02-10 12:12                                 ` Sebastian Andrzej Siewior
  2026-02-10 16:14                                   ` Willem de Bruijn
  0 siblings, 1 reply; 34+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-10 12:12 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, David S. Miller, Loktionov, Aleksandr,
	Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org,
	Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E

On 2026-02-09 07:46:01 [-0500], Willem de Bruijn wrote:
> > > Yeah, but what is the legacy user here? If you enable HW-timestamps but
> > > never set OPT_TSONLY and the sysctl is also 0 then you reply on the
> > > CAP_NET_RAW later on. Right?
> > 
> > Legacy users here means users of HW TX timestamps expecting full skb to 
> > be returned back with the TX timestamp. Legacy here means that skb will 
> > be returned with headers modified by stack, which is kind of exposure of
> > data, which requires CAP_NET_RAW...

Ah okay. I assumed the err-queue was the standard way of receiving
timestamps.

> > > I just try to justify the CAP_NET_RAW check and if it is required to
> > > move it earlier (where HW timestamps are enabled). And if the sysctl
> > > check is enough then maybe it is not needed.
> > 
> > Capabilities should not change during lifetime of the process, should be
> > fine to move. On the other, sysctl can be changed system-wide which may 
> > affect users.
> 
> Ignore the hardware configuration. That is entirely optional. Some
> devices will timestamp every packet.
> 
> The capability check here is per-socket, independent from the system
> hardware configuration.
> 
> I don't see how it could be moved.
> 
> Before OPT_TSONLY was introduced packets were always queued with their
> payload. The sysctl check was added to optionally disallow this. The
> check could arguably be moved earlier in the socket lifecycle and the
> decision cached in the socket. But then flipping the sysctl would not
> affect existing sockets, so that is a change in ABI behavior.

You could cache only the part under sk_callback_lock.
Any other suggestions?
The access from IRQ is quick and avoids any detours.
The alternative would be to move the whole routine into an aux_worker.
For every driver doing it from the IRQ handler.

Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-10 12:12                                 ` Sebastian Andrzej Siewior
@ 2026-02-10 16:14                                   ` Willem de Bruijn
  2026-02-11 12:08                                     ` Kurt Kanzenbach
  0 siblings, 1 reply; 34+ messages in thread
From: Willem de Bruijn @ 2026-02-10 16:14 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Willem de Bruijn
  Cc: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, David S. Miller, Loktionov, Aleksandr,
	Kurt Kanzenbach, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org,
	Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E

Sebastian Andrzej Siewior wrote:
> On 2026-02-09 07:46:01 [-0500], Willem de Bruijn wrote:
> > > > Yeah, but what is the legacy user here? If you enable HW-timestamps but
> > > > never set OPT_TSONLY and the sysctl is also 0 then you reply on the
> > > > CAP_NET_RAW later on. Right?
> > > 
> > > Legacy users here means users of HW TX timestamps expecting full skb to 
> > > be returned back with the TX timestamp. Legacy here means that skb will 
> > > be returned with headers modified by stack, which is kind of exposure of
> > > data, which requires CAP_NET_RAW...
> 
> Ah okay. I assumed the err-queue was the standard way of receiving
> timestamps.

It is, for tx timestamps. The open question is whether they are queued
with or without packet payload.
 
> > > > I just try to justify the CAP_NET_RAW check and if it is required to
> > > > move it earlier (where HW timestamps are enabled). And if the sysctl
> > > > check is enough then maybe it is not needed.
> > > 
> > > Capabilities should not change during lifetime of the process, should be
> > > fine to move. On the other, sysctl can be changed system-wide which may 
> > > affect users.
> > 
> > Ignore the hardware configuration. That is entirely optional. Some
> > devices will timestamp every packet.
> > 
> > The capability check here is per-socket, independent from the system
> > hardware configuration.
> > 
> > I don't see how it could be moved.
> > 
> > Before OPT_TSONLY was introduced packets were always queued with their
> > payload. The sysctl check was added to optionally disallow this. The
> > check could arguably be moved earlier in the socket lifecycle and the
> > decision cached in the socket. But then flipping the sysctl would not
> > affect existing sockets, so that is a change in ABI behavior.
> 
> You could cache only the part under sk_callback_lock.

Can you elaborate a bit?

> Any other suggestions?
> The access from IRQ is quick and avoids any detours.
> The alternative would be to move the whole routine into an aux_worker.
> For every driver doing it from the IRQ handler.

Perhaps CAP_NET_RAW can be checked in a way that does not require this
specific lock.

See for instance other examples that instead use
sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW). Though that
uses current_cred() so probably won't work in interrupt context.

Changing the check may subtly change behavior. But that may be
acceptable as long as the consequences are clearly described.

An alternative would be to delay until dequeue. But that would wake
the reader and then drop the packets, so without any data to read. Not
practical.

The core issue seems to be that the ptp_tx_work is not scheduled
quickly enough. I wonder if that is the issue to be fixed. When/why
is this too slow?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-10 16:14                                   ` Willem de Bruijn
@ 2026-02-11 12:08                                     ` Kurt Kanzenbach
  2026-02-11 16:29                                       ` Willem de Bruijn
                                                         ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Kurt Kanzenbach @ 2026-02-11 12:08 UTC (permalink / raw)
  To: Willem de Bruijn, Sebastian Andrzej Siewior, Willem de Bruijn
  Cc: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, David S. Miller, Loktionov, Aleksandr,
	Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel,
	Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran,
	linux-kernel@vger.kernel.org, Andrew Lunn,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E

[-- Attachment #1: Type: text/plain, Size: 1649 bytes --]

On Tue Feb 10 2026, Willem de Bruijn wrote:
> The core issue seems to be that the ptp_tx_work is not scheduled
> quickly enough. I wonder if that is the issue to be fixed. When/why
> is this too slow?

The igb driver uses schedule_work() for the Tx timestamp retrieval. That
means the ptp_tx_work item is queued to the kernel-global workqueue. In
case there is load on the system, the kworker which handles ptp_tx_work
might be delayed too much, which results in ptp4l timeouts.

Easy solution would be to tune the priority/affinity of the
kworker. However, we have to figure which kworker it is. Furthermore,
this kworker might handle other things as well, which are not related to
igb timestamping at all. Therefore, tuning the priority of the kworker
is not practical.

Moving the timestamping in IRQ looked like a good solution, because the
device already signals that the Tx timestamp is available now. No need
to schedule any worker/work at all. So, it'd be very nice if
skb_tstamp_tx() could be called from IRQ context. BTW other drivers like
igc call this function in IRQ context as well.

Alternative solution for igb is to move from schedule_work() to PTP AUX
worker. That is a dedicated PTP worker thread called ptpX, which could
handle the timestamping. This can be easily tuned with taskset and
chrt. However, there's one difference to the kworker approach: The
kworker always runs on the same CPU, where the IRQ triggered, the AUX
worker not necessarily. This means, Miroslav needs to be aware of this
and tune the AUX worker for his NTP use cases.

I hope, that makes the motivation for this patch and discussion clear.

Thanks,
Kurt

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 861 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-11 12:08                                     ` Kurt Kanzenbach
@ 2026-02-11 16:29                                       ` Willem de Bruijn
  2026-02-12 18:33                                         ` Sebastian Andrzej Siewior
  2026-02-14 23:26                                         ` Sebastian Andrzej Siewior
  2026-02-11 18:54                                       ` Jakub Kicinski
  2026-02-11 19:29                                       ` Jacob Keller
  2 siblings, 2 replies; 34+ messages in thread
From: Willem de Bruijn @ 2026-02-11 16:29 UTC (permalink / raw)
  To: Kurt Kanzenbach, Willem de Bruijn, Sebastian Andrzej Siewior,
	Willem de Bruijn
  Cc: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, David S. Miller, Loktionov, Aleksandr,
	Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel,
	Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran,
	linux-kernel@vger.kernel.org, Andrew Lunn,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E

Kurt Kanzenbach wrote:
> On Tue Feb 10 2026, Willem de Bruijn wrote:
> > The core issue seems to be that the ptp_tx_work is not scheduled
> > quickly enough. I wonder if that is the issue to be fixed. When/why
> > is this too slow?
> 
> The igb driver uses schedule_work() for the Tx timestamp retrieval. That
> means the ptp_tx_work item is queued to the kernel-global workqueue. In
> case there is load on the system, the kworker which handles ptp_tx_work
> might be delayed too much, which results in ptp4l timeouts.
> 
> Easy solution would be to tune the priority/affinity of the
> kworker. However, we have to figure which kworker it is. Furthermore,
> this kworker might handle other things as well, which are not related to
> igb timestamping at all. Therefore, tuning the priority of the kworker
> is not practical.
> 
> Moving the timestamping in IRQ looked like a good solution, because the
> device already signals that the Tx timestamp is available now. No need
> to schedule any worker/work at all. So, it'd be very nice if
> skb_tstamp_tx() could be called from IRQ context. BTW other drivers like
> igc call this function in IRQ context as well.
> 
> Alternative solution for igb is to move from schedule_work() to PTP AUX
> worker. That is a dedicated PTP worker thread called ptpX, which could
> handle the timestamping. This can be easily tuned with taskset and
> chrt. However, there's one difference to the kworker approach: The
> kworker always runs on the same CPU, where the IRQ triggered, the AUX
> worker not necessarily. This means, Miroslav needs to be aware of this
> and tune the AUX worker for his NTP use cases.
> 
> I hope, that makes the motivation for this patch and discussion clear.

It does thanks.

I think we should look at the locking. It is not clear to me that
sk_callback_lock needs to be held here at all.

From the lock documentation itself its use is more limited.

sk->sk_socket->file is indeed dereferenced elsewhere without holding
such locks.

sk_capable is another indication.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-11 12:08                                     ` Kurt Kanzenbach
  2026-02-11 16:29                                       ` Willem de Bruijn
@ 2026-02-11 18:54                                       ` Jakub Kicinski
  2026-02-12 16:28                                         ` Sebastian Andrzej Siewior
  2026-02-11 19:29                                       ` Jacob Keller
  2 siblings, 1 reply; 34+ messages in thread
From: Jakub Kicinski @ 2026-02-11 18:54 UTC (permalink / raw)
  To: Kurt Kanzenbach
  Cc: Willem de Bruijn, Sebastian Andrzej Siewior, Vadim Fedorenko,
	Willem de Bruijn, Paolo Abeni, Eric Dumazet, David S. Miller,
	Loktionov, Aleksandr, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org,
	Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E

On Wed, 11 Feb 2026 13:08:51 +0100 Kurt Kanzenbach wrote:
> On Tue Feb 10 2026, Willem de Bruijn wrote:
> > The core issue seems to be that the ptp_tx_work is not scheduled
> > quickly enough. I wonder if that is the issue to be fixed. When/why
> > is this too slow?  
> 
> The igb driver uses schedule_work() for the Tx timestamp retrieval. That
> means the ptp_tx_work item is queued to the kernel-global workqueue. In
> case there is load on the system, the kworker which handles ptp_tx_work
> might be delayed too much, which results in ptp4l timeouts.
> 
> Easy solution would be to tune the priority/affinity of the
> kworker. However, we have to figure which kworker it is. Furthermore,
> this kworker might handle other things as well, which are not related to
> igb timestamping at all. Therefore, tuning the priority of the kworker
> is not practical.
> 
> Moving the timestamping in IRQ looked like a good solution, because the
> device already signals that the Tx timestamp is available now. No need
> to schedule any worker/work at all. So, it'd be very nice if
> skb_tstamp_tx() could be called from IRQ context. BTW other drivers like
> igc call this function in IRQ context as well.
> 
> Alternative solution for igb is to move from schedule_work() to PTP AUX
> worker. That is a dedicated PTP worker thread called ptpX, which could
> handle the timestamping. This can be easily tuned with taskset and
> chrt. However, there's one difference to the kworker approach: The
> kworker always runs on the same CPU, where the IRQ triggered, the AUX
> worker not necessarily. This means, Miroslav needs to be aware of this
> and tune the AUX worker for his NTP use cases.
> 
> I hope, that makes the motivation for this patch and discussion clear.

Are you concerned about the latency of delivering the TS to the user
space app / socket? Or purely reading the TS out of the HW fifo to make
space for another packet to be timestamped?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-11 12:08                                     ` Kurt Kanzenbach
  2026-02-11 16:29                                       ` Willem de Bruijn
  2026-02-11 18:54                                       ` Jakub Kicinski
@ 2026-02-11 19:29                                       ` Jacob Keller
  2026-02-11 21:44                                         ` Jakub Kicinski
  2 siblings, 1 reply; 34+ messages in thread
From: Jacob Keller @ 2026-02-11 19:29 UTC (permalink / raw)
  To: Kurt Kanzenbach, Willem de Bruijn, Sebastian Andrzej Siewior
  Cc: Vadim Fedorenko, Willem de Bruijn, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, David S. Miller, Loktionov, Aleksandr,
	Nguyen, Anthony L, Kitszel, Przemyslaw, Paul Menzel,
	Gomes, Vinicius, netdev@vger.kernel.org, Richard Cochran,
	linux-kernel@vger.kernel.org, Andrew Lunn,
	intel-wired-lan@lists.osuosl.org



On 2/11/2026 4:08 AM, Kurt Kanzenbach wrote:
> On Tue Feb 10 2026, Willem de Bruijn wrote:
>> The core issue seems to be that the ptp_tx_work is not scheduled
>> quickly enough. I wonder if that is the issue to be fixed. When/why
>> is this too slow?
> 
> The igb driver uses schedule_work() for the Tx timestamp retrieval. That
> means the ptp_tx_work item is queued to the kernel-global workqueue. In
> case there is load on the system, the kworker which handles ptp_tx_work
> might be delayed too much, which results in ptp4l timeouts.
> 

Right. Old versions of ptp4l wait for ~1 millisecond by default, and 
newer ones default to ~10 milliseconds.. but latency here can have 
negative impacts on sync capability especially on profiles with higher 
sync rates.

> Easy solution would be to tune the priority/affinity of the
> kworker. However, we have to figure which kworker it is. Furthermore,
> this kworker might handle other things as well, which are not related to
> igb timestamping at all. Therefore, tuning the priority of the kworker
> is not practical.
> 

I don't think it is even guaranteed that you would get the same kworker 
every time, so I would consider such tuning not just impractical but 
impossible.

> Moving the timestamping in IRQ looked like a good solution, because the
> device already signals that the Tx timestamp is available now. No need
> to schedule any worker/work at all. So, it'd be very nice if
> skb_tstamp_tx() could be called from IRQ context. BTW other drivers like
> igc call this function in IRQ context as well.
> 

Right. Reporting the timestamp from the interrupt is the simplest and 
lowest latency method. I know Miroslav had some situations and devices 
where it apparently caused more problems than it helped, though I don't 
believe anyone else has reproduced those?

> Alternative solution for igb is to move from schedule_work() to PTP AUX
> worker. That is a dedicated PTP worker thread called ptpX, which could
> handle the timestamping. This can be easily tuned with taskset and
> chrt. However, there's one difference to the kworker approach: The
> kworker always runs on the same CPU, where the IRQ triggered, the AUX
> worker not necessarily. This means, Miroslav needs to be aware of this
> and tune the AUX worker for his NTP use cases.

This should be the standard at minimum for all new drivers, when it is 
not possible to report the timestamp directly in the interrupt handler.

> 
> I hope, that makes the motivation for this patch and discussion clear.
> 
> Thanks,
> Kurt


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-11 19:29                                       ` Jacob Keller
@ 2026-02-11 21:44                                         ` Jakub Kicinski
  2026-02-12 16:47                                           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 34+ messages in thread
From: Jakub Kicinski @ 2026-02-11 21:44 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Kurt Kanzenbach, Willem de Bruijn, Sebastian Andrzej Siewior,
	Vadim Fedorenko, Willem de Bruijn, Paolo Abeni, Eric Dumazet,
	David S. Miller, Loktionov, Aleksandr, Nguyen, Anthony L,
	Kitszel, Przemyslaw, Paul Menzel, Gomes, Vinicius,
	netdev@vger.kernel.org, Richard Cochran,
	linux-kernel@vger.kernel.org, Andrew Lunn,
	intel-wired-lan@lists.osuosl.org

On Wed, 11 Feb 2026 11:29:03 -0800 Jacob Keller wrote:
> > Moving the timestamping in IRQ looked like a good solution, because the
> > device already signals that the Tx timestamp is available now. No need
> > to schedule any worker/work at all. So, it'd be very nice if
> > skb_tstamp_tx() could be called from IRQ context. BTW other drivers like
> > igc call this function in IRQ context as well.
> 
> Right. Reporting the timestamp from the interrupt is the simplest and 
> lowest latency method. I know Miroslav had some situations and devices 
> where it apparently caused more problems than it helped, though I don't 
> believe anyone else has reproduced those?

There's a BH workqueue now, as a replacement for tasklets.
Presumably smallest fix would be to switch to that?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-11 18:54                                       ` Jakub Kicinski
@ 2026-02-12 16:28                                         ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 34+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-12 16:28 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Kurt Kanzenbach, Willem de Bruijn, Vadim Fedorenko,
	Willem de Bruijn, Paolo Abeni, Eric Dumazet, David S. Miller,
	Loktionov, Aleksandr, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org,
	Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E

On 2026-02-11 10:54:44 [-0800], Jakub Kicinski wrote:
> Are you concerned about the latency of delivering the TS to the user
> space app / socket? Or purely reading the TS out of the HW fifo to make
> space for another packet to be timestamped?

The problem with the global workqueue is the latency. If the system is
idle it works as-is but if it gets busy the workqueue can be
significantly delayed. You can't give this workqueue a higher priority
because the workqueue is anonymous and changes.
The next best thing here is the aux_worker because everything here is
executed always under the same process/ PID so you can apply a priority
and ensure that is handled before other "less important" tasks.

However. There is actually no need for the workqueue so you could do it
directly (which is what some other driver do). But then I stumbled upon
the locking issue…

Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-11 21:44                                         ` Jakub Kicinski
@ 2026-02-12 16:47                                           ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 34+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-12 16:47 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jacob Keller, Kurt Kanzenbach, Willem de Bruijn, Vadim Fedorenko,
	Willem de Bruijn, Paolo Abeni, Eric Dumazet, David S. Miller,
	Loktionov, Aleksandr, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org,
	Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn,
	intel-wired-lan@lists.osuosl.org

On 2026-02-11 13:44:36 [-0800], Jakub Kicinski wrote:
> On Wed, 11 Feb 2026 11:29:03 -0800 Jacob Keller wrote:
> > > Moving the timestamping in IRQ looked like a good solution, because the
> > > device already signals that the Tx timestamp is available now. No need
> > > to schedule any worker/work at all. So, it'd be very nice if
> > > skb_tstamp_tx() could be called from IRQ context. BTW other drivers like
> > > igc call this function in IRQ context as well.
> > 
> > Right. Reporting the timestamp from the interrupt is the simplest and 
> > lowest latency method. I know Miroslav had some situations and devices 
> > where it apparently caused more problems than it helped, though I don't 
> > believe anyone else has reproduced those?
> 
> There's a BH workqueue now, as a replacement for tasklets.
> Presumably smallest fix would be to switch to that?

Hmm. This would raise the TASKLET softirq from the interrupt handler and
then handle it on the irq-exit path. 
Let me look if that one lock can be removed first…

Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-11 16:29                                       ` Willem de Bruijn
@ 2026-02-12 18:33                                         ` Sebastian Andrzej Siewior
  2026-02-14 23:26                                         ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 34+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-12 18:33 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Kurt Kanzenbach, Vadim Fedorenko, Willem de Bruijn,
	Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller,
	Loktionov, Aleksandr, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org,
	Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E

On 2026-02-11 11:29:43 [-0500], Willem de Bruijn wrote:
> I think we should look at the locking. It is not clear to me that
> sk_callback_lock needs to be held here at all.
> 
> From the lock documentation itself its use is more limited.
> 
> sk->sk_socket->file is indeed dereferenced elsewhere without holding
> such locks.
>
> sk_capable is another indication.

That skb has a socket reference so that skbs should have a reference on
'sk' so it can't go away while the skb is around.
sk_socket should be assigned once while sk is created and not change.
Also that ->file should be assigned on accept and remain stable.
That assignment happens in sock_graft() under the lock or in
sock_init_data_uid() without it (but it both cases it should be new).

If you close that socket then I think sock_close() will be invoked which
ends in __sock_release() and assigns NULL to ->file. The filep itself
is SLAB_TYPESAFE_BY_RCU (as it comes from alloc_file_pseudo()) so it
safe to access it while in IRQ/ softirq because it can not go away.
It should invoke sock_orphan() which sets ->sk_socket to NULL under the
lock. I don't think that orphan can be called from outside the close
path. And inode (socket) remains around as long as the filep is there.

So based on this, if sk->sk_socket->file is accessed from within a
syscall then the chain up to file should be valid because the fd can not
be closed and so anything in that chain up to file become NULL.
From within the IRQ/ softirq a close on the fd/ socket should not result
in an immediate release. I *think* the skb holds sock holds the socket.
If so that skb would hold the destruction of the socket back and this
would make it safe to access it without the lock.
This is the theory so far.

Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210
  2026-02-11 16:29                                       ` Willem de Bruijn
  2026-02-12 18:33                                         ` Sebastian Andrzej Siewior
@ 2026-02-14 23:26                                         ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 34+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-14 23:26 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Kurt Kanzenbach, Vadim Fedorenko, Willem de Bruijn,
	Jakub Kicinski, Paolo Abeni, Eric Dumazet, David S. Miller,
	Loktionov, Aleksandr, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Paul Menzel, Gomes, Vinicius, netdev@vger.kernel.org,
	Richard Cochran, linux-kernel@vger.kernel.org, Andrew Lunn,
	intel-wired-lan@lists.osuosl.org, Keller, Jacob E

On 2026-02-11 11:29:43 [-0500], Willem de Bruijn wrote:
> I think we should look at the locking. It is not clear to me that
> sk_callback_lock needs to be held here at all.

It can go away, posted https://lore.kernel.org/all/20260214232456.A37oV4KQ@linutronix.de/

Sebastian

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2026-02-14 23:26 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-05  7:54 [PATCH iwl-next v3] igb: Retrieve Tx timestamp directly from interrupt for i210 Kurt Kanzenbach
2026-02-05  9:47 ` [Intel-wired-lan] " Loktionov, Aleksandr
2026-02-05 10:03   ` Sebastian Andrzej Siewior
2026-02-05 10:37     ` Loktionov, Aleksandr
2026-02-05 10:52       ` Sebastian Andrzej Siewior
2026-02-05 11:56       ` Vadim Fedorenko
2026-02-05 14:51         ` Sebastian Andrzej Siewior
2026-02-05 16:27           ` Vadim Fedorenko
2026-02-05 16:43             ` Sebastian Andrzej Siewior
2026-02-05 16:48               ` Vadim Fedorenko
2026-02-05 21:41                 ` Willem de Bruijn
2026-02-06  7:44                   ` Sebastian Andrzej Siewior
2026-02-06 10:12                   ` Vadim Fedorenko
2026-02-08 16:25                     ` Willem de Bruijn
2026-02-09  9:06                       ` Sebastian Andrzej Siewior
2026-02-09 10:43                         ` Vadim Fedorenko
2026-02-09 11:48                           ` Sebastian Andrzej Siewior
2026-02-09 12:24                             ` Vadim Fedorenko
2026-02-09 12:46                               ` Willem de Bruijn
2026-02-10 12:12                                 ` Sebastian Andrzej Siewior
2026-02-10 16:14                                   ` Willem de Bruijn
2026-02-11 12:08                                     ` Kurt Kanzenbach
2026-02-11 16:29                                       ` Willem de Bruijn
2026-02-12 18:33                                         ` Sebastian Andrzej Siewior
2026-02-14 23:26                                         ` Sebastian Andrzej Siewior
2026-02-11 18:54                                       ` Jakub Kicinski
2026-02-12 16:28                                         ` Sebastian Andrzej Siewior
2026-02-11 19:29                                       ` Jacob Keller
2026-02-11 21:44                                         ` Jakub Kicinski
2026-02-12 16:47                                           ` Sebastian Andrzej Siewior
2026-02-05 11:58   ` Kurt Kanzenbach
2026-02-05 12:20     ` Loktionov, Aleksandr
2026-02-06  0:05       ` Jacob Keller
2026-02-05 12:12 ` Sebastian Andrzej Siewior

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox