linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/2] fix two issues and optimize code on tpacket_snd()
@ 2025-07-11  9:32 Yun Lu
  2025-07-11  9:32 ` [PATCH v5 1/2] af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd() Yun Lu
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Yun Lu @ 2025-07-11  9:32 UTC (permalink / raw)
  To: willemdebruijn.kernel, davem, edumazet, kuba, pabeni, horms
  Cc: netdev, linux-kernel

From: Yun Lu <luyun@kylinos.cn>

This series fix two issues and optimize the code on tpacket_snd():
1, fix the SO_SNDTIMEO constraint not effective due to the changes in
commit 581073f626e3;
2, fix a soft lockup issue on a specific edge case, and also optimize
the loop logic to be clearer and more obvious;

---
Changes in v5:
- Still combine fix and optimization together, change to while(1). 
  Thanks: Willem de Bruijn.
- Link to v4: https://lore.kernel.org/all/20250710102639.280932-1-luyun_611@163.com/

Changes in v4:
- Fix a typo and add the missing semicolon. Thanks: Simon Horman.
- Split the second patch into two, one to fix, another to optimize.
  Thanks: Willem de Bruijn
- Link to v3: https://lore.kernel.org/all/20250709095653.62469-1-luyun_611@163.com/

Changes in v3:
- Split in two different patches.
- Simplify the code and reuse ph to continue. Thanks: Eric Dumazet.
- Link to v2: https://lore.kernel.org/all/20250708020642.27838-1-luyun_611@163.com/

Changes in v2:
- Add a Fixes tag.
- Link to v1: https://lore.kernel.org/all/20250707081629.10344-1-luyun_611@163.com/


Yun Lu (2):
  af_packet: fix the SO_SNDTIMEO constraint not effective on
    tpacked_snd()
  af_packet: fix soft lockup issue caused by tpacket_snd()

 net/packet/af_packet.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v5 1/2] af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd()
  2025-07-11  9:32 [PATCH v5 0/2] fix two issues and optimize code on tpacket_snd() Yun Lu
@ 2025-07-11  9:32 ` Yun Lu
  2025-07-11  9:33 ` [PATCH v5 2/2] af_packet: fix soft lockup issue caused by tpacket_snd() Yun Lu
  2025-07-13  0:40 ` [PATCH v5 0/2] fix two issues and optimize code on tpacket_snd() patchwork-bot+netdevbpf
  2 siblings, 0 replies; 5+ messages in thread
From: Yun Lu @ 2025-07-11  9:32 UTC (permalink / raw)
  To: willemdebruijn.kernel, davem, edumazet, kuba, pabeni, horms
  Cc: netdev, linux-kernel

From: Yun Lu <luyun@kylinos.cn>

Due to the changes in commit 581073f626e3 ("af_packet: do not call
packet_read_pending() from tpacket_destruct_skb()"), every time
tpacket_destruct_skb() is executed, the skb_completion is marked as
completed. When wait_for_completion_interruptible_timeout() returns
completed, the pending_refcnt has not yet been reduced to zero.
Therefore, when ph is NULL, the wait function may need to be called
multiple times until packet_read_pending() finally returns zero.

We should call sock_sndtimeo() only once, otherwise the SO_SNDTIMEO
constraint could be way off.

Fixes: 581073f626e3 ("af_packet: do not call packet_read_pending() from tpacket_destruct_skb()")
Cc: stable@kernel.org
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yun Lu <luyun@kylinos.cn>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
---
 net/packet/af_packet.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 3d43f3eae759..7089b8c2a655 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2785,7 +2785,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 	int len_sum = 0;
 	int status = TP_STATUS_AVAILABLE;
 	int hlen, tlen, copylen = 0;
-	long timeo = 0;
+	long timeo;
 
 	mutex_lock(&po->pg_vec_lock);
 
@@ -2839,6 +2839,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 	if ((size_max > dev->mtu + reserve + VLAN_HLEN) && !vnet_hdr_sz)
 		size_max = dev->mtu + reserve + VLAN_HLEN;
 
+	timeo = sock_sndtimeo(&po->sk, msg->msg_flags & MSG_DONTWAIT);
 	reinit_completion(&po->skb_completion);
 
 	do {
@@ -2846,7 +2847,6 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 					  TP_STATUS_SEND_REQUEST);
 		if (unlikely(ph == NULL)) {
 			if (need_wait && skb) {
-				timeo = sock_sndtimeo(&po->sk, msg->msg_flags & MSG_DONTWAIT);
 				timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
 				if (timeo <= 0) {
 					err = !timeo ? -ETIMEDOUT : -ERESTARTSYS;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v5 2/2] af_packet: fix soft lockup issue caused by tpacket_snd()
  2025-07-11  9:32 [PATCH v5 0/2] fix two issues and optimize code on tpacket_snd() Yun Lu
  2025-07-11  9:32 ` [PATCH v5 1/2] af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd() Yun Lu
@ 2025-07-11  9:33 ` Yun Lu
  2025-07-11 19:32   ` Willem de Bruijn
  2025-07-13  0:40 ` [PATCH v5 0/2] fix two issues and optimize code on tpacket_snd() patchwork-bot+netdevbpf
  2 siblings, 1 reply; 5+ messages in thread
From: Yun Lu @ 2025-07-11  9:33 UTC (permalink / raw)
  To: willemdebruijn.kernel, davem, edumazet, kuba, pabeni, horms
  Cc: netdev, linux-kernel

From: Yun Lu <luyun@kylinos.cn>

When MSG_DONTWAIT is not set, the tpacket_snd operation will wait for
pending_refcnt to decrement to zero before returning. The pending_refcnt
is decremented by 1 when the skb->destructor function is called,
indicating that the skb has been successfully sent and needs to be
destroyed.

If an error occurs during this process, the tpacket_snd() function will
exit and return error, but pending_refcnt may not yet have decremented to
zero. Assuming the next send operation is executed immediately, but there
are no available frames to be sent in tx_ring (i.e., packet_current_frame
returns NULL), and skb is also NULL, the function will not execute
wait_for_completion_interruptible_timeout() to yield the CPU. Instead, it
will enter a do-while loop, waiting for pending_refcnt to be zero. Even
if the previous skb has completed transmission, the skb->destructor
function can only be invoked in the ksoftirqd thread (assuming NAPI
threading is enabled). When both the ksoftirqd thread and the tpacket_snd
operation happen to run on the same CPU, and the CPU trapped in the
do-while loop without yielding, the ksoftirqd thread will not get
scheduled to run. As a result, pending_refcnt will never be reduced to
zero, and the do-while loop cannot exit, eventually leading to a CPU soft
lockup issue.

In fact, skb is true for all but the first iterations of that loop, and
as long as pending_refcnt is not zero, even if incremented by a previous
call, wait_for_completion_interruptible_timeout() should be executed to
yield the CPU, allowing the ksoftirqd thread to be scheduled. Therefore,
the execution condition of this function should be modified to check if
pending_refcnt is not zero, instead of check skb.

-	if (need_wait && skb) {
+	if (need_wait && packet_read_pending(&po->tx_ring)) {

As a result, the judgment conditions are duplicated with the end code of
the while loop, and packet_read_pending() is a very expensive function.
Actually, this loop can only exit when ph is NULL, so the loop condition
can be changed to while (1), and in the "ph = NULL" branch, if the
subsequent condition of if is not met,  the loop can break directly. Now,
the loop logic remains the same as origin but is clearer and more obvious.

Fixes: 89ed5b519004 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET")
Cc: stable@kernel.org
Suggested-by: LongJun Tang <tanglongjun@kylinos.cn>
Signed-off-by: Yun Lu <luyun@kylinos.cn>

---
Changes in v5:
- Still combine fix and optimization together, change to while(1). Thanks: Willem de Bruijn.
- Link to v4: https://lore.kernel.org/all/20250710102639.280932-3-luyun_611@163.com/

Changes in v4:
- Split to the fix alone. Thanks: Willem de Bruijn.
- Link to v3: https://lore.kernel.org/all/20250709095653.62469-3-luyun_611@163.com/

Changes in v3:
- Simplify the code and reuse ph to continue. Thanks: Eric Dumazet.
- Link to v2: https://lore.kernel.org/all/20250708020642.27838-1-luyun_611@163.com/

Changes in v2:
- Add a Fixes tag.
- Link to v1: https://lore.kernel.org/all/20250707081629.10344-1-luyun_611@163.com/
---
---
 net/packet/af_packet.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 7089b8c2a655..be608f07441f 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2846,15 +2846,21 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 		ph = packet_current_frame(po, &po->tx_ring,
 					  TP_STATUS_SEND_REQUEST);
 		if (unlikely(ph == NULL)) {
-			if (need_wait && skb) {
+			/* Note: packet_read_pending() might be slow if we
+			 * have to call it as it's per_cpu variable, but in
+			 * fast-path we don't have to call it, only when ph
+			 * is NULL, we need to check the pending_refcnt.
+			 */
+			if (need_wait && packet_read_pending(&po->tx_ring)) {
 				timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
 				if (timeo <= 0) {
 					err = !timeo ? -ETIMEDOUT : -ERESTARTSYS;
 					goto out_put;
 				}
-			}
-			/* check for additional frames */
-			continue;
+				/* check for additional frames */
+				continue;
+			} else
+				break;
 		}
 
 		skb = NULL;
@@ -2943,14 +2949,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 		}
 		packet_increment_head(&po->tx_ring);
 		len_sum += tp_len;
-	} while (likely((ph != NULL) ||
-		/* Note: packet_read_pending() might be slow if we have
-		 * to call it as it's per_cpu variable, but in fast-path
-		 * we already short-circuit the loop with the first
-		 * condition, and luckily don't have to go that path
-		 * anyway.
-		 */
-		 (need_wait && packet_read_pending(&po->tx_ring))));
+	} while (1);
 
 	err = len_sum;
 	goto out_put;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v5 2/2] af_packet: fix soft lockup issue caused by tpacket_snd()
  2025-07-11  9:33 ` [PATCH v5 2/2] af_packet: fix soft lockup issue caused by tpacket_snd() Yun Lu
@ 2025-07-11 19:32   ` Willem de Bruijn
  0 siblings, 0 replies; 5+ messages in thread
From: Willem de Bruijn @ 2025-07-11 19:32 UTC (permalink / raw)
  To: Yun Lu, willemdebruijn.kernel, davem, edumazet, kuba, pabeni,
	horms
  Cc: netdev, linux-kernel

Yun Lu wrote:
> From: Yun Lu <luyun@kylinos.cn>
> 
> When MSG_DONTWAIT is not set, the tpacket_snd operation will wait for
> pending_refcnt to decrement to zero before returning. The pending_refcnt
> is decremented by 1 when the skb->destructor function is called,
> indicating that the skb has been successfully sent and needs to be
> destroyed.
> 
> If an error occurs during this process, the tpacket_snd() function will
> exit and return error, but pending_refcnt may not yet have decremented to
> zero. Assuming the next send operation is executed immediately, but there
> are no available frames to be sent in tx_ring (i.e., packet_current_frame
> returns NULL), and skb is also NULL, the function will not execute
> wait_for_completion_interruptible_timeout() to yield the CPU. Instead, it
> will enter a do-while loop, waiting for pending_refcnt to be zero. Even
> if the previous skb has completed transmission, the skb->destructor
> function can only be invoked in the ksoftirqd thread (assuming NAPI
> threading is enabled). When both the ksoftirqd thread and the tpacket_snd
> operation happen to run on the same CPU, and the CPU trapped in the
> do-while loop without yielding, the ksoftirqd thread will not get
> scheduled to run. As a result, pending_refcnt will never be reduced to
> zero, and the do-while loop cannot exit, eventually leading to a CPU soft
> lockup issue.
> 
> In fact, skb is true for all but the first iterations of that loop, and
> as long as pending_refcnt is not zero, even if incremented by a previous
> call, wait_for_completion_interruptible_timeout() should be executed to
> yield the CPU, allowing the ksoftirqd thread to be scheduled. Therefore,
> the execution condition of this function should be modified to check if
> pending_refcnt is not zero, instead of check skb.
> 
> -	if (need_wait && skb) {
> +	if (need_wait && packet_read_pending(&po->tx_ring)) {
> 
> As a result, the judgment conditions are duplicated with the end code of
> the while loop, and packet_read_pending() is a very expensive function.
> Actually, this loop can only exit when ph is NULL, so the loop condition
> can be changed to while (1), and in the "ph = NULL" branch, if the
> subsequent condition of if is not met,  the loop can break directly. Now,
> the loop logic remains the same as origin but is clearer and more obvious.
> 
> Fixes: 89ed5b519004 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET")
> Cc: stable@kernel.org
> Suggested-by: LongJun Tang <tanglongjun@kylinos.cn>
> Signed-off-by: Yun Lu <luyun@kylinos.cn>

Reviewed-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v5 0/2] fix two issues and optimize code on tpacket_snd()
  2025-07-11  9:32 [PATCH v5 0/2] fix two issues and optimize code on tpacket_snd() Yun Lu
  2025-07-11  9:32 ` [PATCH v5 1/2] af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd() Yun Lu
  2025-07-11  9:33 ` [PATCH v5 2/2] af_packet: fix soft lockup issue caused by tpacket_snd() Yun Lu
@ 2025-07-13  0:40 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 5+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-07-13  0:40 UTC (permalink / raw)
  To: Yun Lu
  Cc: willemdebruijn.kernel, davem, edumazet, kuba, pabeni, horms,
	netdev, linux-kernel

Hello:

This series was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Fri, 11 Jul 2025 17:32:58 +0800 you wrote:
> From: Yun Lu <luyun@kylinos.cn>
> 
> This series fix two issues and optimize the code on tpacket_snd():
> 1, fix the SO_SNDTIMEO constraint not effective due to the changes in
> commit 581073f626e3;
> 2, fix a soft lockup issue on a specific edge case, and also optimize
> the loop logic to be clearer and more obvious;
> 
> [...]

Here is the summary with links:
  - [v5,1/2] af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd()
    https://git.kernel.org/netdev/net/c/c1ba3c0cbdb5
  - [v5,2/2] af_packet: fix soft lockup issue caused by tpacket_snd()
    https://git.kernel.org/netdev/net/c/55f0bfc03705

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-07-13  0:39 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-11  9:32 [PATCH v5 0/2] fix two issues and optimize code on tpacket_snd() Yun Lu
2025-07-11  9:32 ` [PATCH v5 1/2] af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd() Yun Lu
2025-07-11  9:33 ` [PATCH v5 2/2] af_packet: fix soft lockup issue caused by tpacket_snd() Yun Lu
2025-07-11 19:32   ` Willem de Bruijn
2025-07-13  0:40 ` [PATCH v5 0/2] fix two issues and optimize code on tpacket_snd() patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).