netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/2] fix two issues on tpacket_snd()
@ 2025-07-09  9:56 Yun Lu
  2025-07-09  9:56 ` [PATCH v3 1/2] af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd() Yun Lu
  2025-07-09  9:56 ` [PATCH v3 2/2] af_packet: fix soft lockup issue caused by tpacket_snd() Yun Lu
  0 siblings, 2 replies; 13+ messages in thread
From: Yun Lu @ 2025-07-09  9:56 UTC (permalink / raw)
  To: willemdebruijn.kernel, davem, edumazet, kuba, pabeni; +Cc: netdev, linux-kernel

From: Yun Lu <luyun@kylinos.cn>

This series fix two issues on tpacket_snd():
1, fix the SO_SNDTIMEO constraint not effective;
2, fix a soft lockup issue.

---
Changes in v3:
- Split in two different patches.
- Simplify the code and reuse ph to continue. Thanks: Eric Dumazet.
- Link to v2: https://lore.kernel.org/all/20250708020642.27838-1-luyun_611@163.com/

Changes in v2:
- Add a Fixes tag.
- Link to v1: https://lore.kernel.org/all/20250707081629.10344-1-luyun_611@163.com/

Yun Lu (2):
  af_packet: fix the SO_SNDTIMEO constraint not effective on
    tpacked_snd()
  af_packet: fix soft lockup issue caused by tpacket_snd()

 net/packet/af_packet.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3 1/2] af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd()
  2025-07-09  9:56 [PATCH v3 0/2] fix two issues on tpacket_snd() Yun Lu
@ 2025-07-09  9:56 ` Yun Lu
  2025-07-09 12:41   ` Eric Dumazet
                     ` (2 more replies)
  2025-07-09  9:56 ` [PATCH v3 2/2] af_packet: fix soft lockup issue caused by tpacket_snd() Yun Lu
  1 sibling, 3 replies; 13+ messages in thread
From: Yun Lu @ 2025-07-09  9:56 UTC (permalink / raw)
  To: willemdebruijn.kernel, davem, edumazet, kuba, pabeni; +Cc: netdev, linux-kernel

From: Yun Lu <luyun@kylinos.cn>

Due to the changes in commit 581073f626e3 ("af_packet: do not call
packet_read_pending() from tpacket_destruct_skb()"), every time
tpacket_destruct_skb() is executed, the skb_completion is marked as
completed. When wait_for_completion_interruptible_timeout() returns
completed, the pending_refcnt has not yet been reduced to zero.
Therefore, when ph is NULL, the wait function may need to be called
multiple times untill packet_read_pending() finally returns zero.

We should call sock_sndtimeo() only once, otherwise the SO_SNDTIMEO
constraint could be way off.

Fixes: 581073f626e3 ("af_packet: do not call packet_read_pending() from tpacket_destruct_skb()")
Cc: stable@kernel.org
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yun Lu <luyun@kylinos.cn>
---
 net/packet/af_packet.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 3d43f3eae759..7089b8c2a655 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2785,7 +2785,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 	int len_sum = 0;
 	int status = TP_STATUS_AVAILABLE;
 	int hlen, tlen, copylen = 0;
-	long timeo = 0;
+	long timeo;
 
 	mutex_lock(&po->pg_vec_lock);
 
@@ -2839,6 +2839,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 	if ((size_max > dev->mtu + reserve + VLAN_HLEN) && !vnet_hdr_sz)
 		size_max = dev->mtu + reserve + VLAN_HLEN;
 
+	timeo = sock_sndtimeo(&po->sk, msg->msg_flags & MSG_DONTWAIT);
 	reinit_completion(&po->skb_completion);
 
 	do {
@@ -2846,7 +2847,6 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 					  TP_STATUS_SEND_REQUEST);
 		if (unlikely(ph == NULL)) {
 			if (need_wait && skb) {
-				timeo = sock_sndtimeo(&po->sk, msg->msg_flags & MSG_DONTWAIT);
 				timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
 				if (timeo <= 0) {
 					err = !timeo ? -ETIMEDOUT : -ERESTARTSYS;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 2/2] af_packet: fix soft lockup issue caused by tpacket_snd()
  2025-07-09  9:56 [PATCH v3 0/2] fix two issues on tpacket_snd() Yun Lu
  2025-07-09  9:56 ` [PATCH v3 1/2] af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd() Yun Lu
@ 2025-07-09  9:56 ` Yun Lu
  2025-07-09 12:44   ` Eric Dumazet
                     ` (3 more replies)
  1 sibling, 4 replies; 13+ messages in thread
From: Yun Lu @ 2025-07-09  9:56 UTC (permalink / raw)
  To: willemdebruijn.kernel, davem, edumazet, kuba, pabeni; +Cc: netdev, linux-kernel

From: Yun Lu <luyun@kylinos.cn>

When MSG_DONTWAIT is not set, the tpacket_snd operation will wait for
pending_refcnt to decrement to zero before returning. The pending_refcnt
is decremented by 1 when the skb->destructor function is called,
indicating that the skb has been successfully sent and needs to be
destroyed.

If an error occurs during this process, the tpacket_snd() function will
exit and return error, but pending_refcnt may not yet have decremented to
zero. Assuming the next send operation is executed immediately, but there
are no available frames to be sent in tx_ring (i.e., packet_current_frame
returns NULL), and skb is also NULL, the function will not execute
wait_for_completion_interruptible_timeout() to yield the CPU. Instead, it
will enter a do-while loop, waiting for pending_refcnt to be zero. Even
if the previous skb has completed transmission, the skb->destructor
function can only be invoked in the ksoftirqd thread (assuming NAPI
threading is enabled). When both the ksoftirqd thread and the tpacket_snd
operation happen to run on the same CPU, and the CPU trapped in the
do-while loop without yielding, the ksoftirqd thread will not get
scheduled to run. As a result, pending_refcnt will never be reduced to
zero, and the do-while loop cannot exit, eventually leading to a CPU soft
lockup issue.

In fact, as long as pending_refcnt is not zero, even if skb is NULL,
wait_for_completion_interruptible_timeout() should be executed to yield
the CPU, allowing the ksoftirqd thread to be scheduled. Therefore, move
the penging_refcnt check to the start of the do-while loop, and reuse ph
to continue for the next iteration.

Fixes: 89ed5b519004 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET")
Cc: stable@kernel.org
Suggested-by: LongJun Tang <tanglongjun@kylinos.cn>
Signed-off-by: Yun Lu <luyun@kylinos.cn>

---
Changes in v3:
- Simplify the code and reuse ph to continue. Thanks: Eric Dumazet.
- Link to v2: https://lore.kernel.org/all/20250708020642.27838-1-luyun_611@163.com/

Changes in v2:
- Add a Fixes tag.
- Link to v1: https://lore.kernel.org/all/20250707081629.10344-1-luyun_611@163.com/
---
 net/packet/af_packet.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 7089b8c2a655..89a5d2a3a720 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2846,11 +2846,21 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 		ph = packet_current_frame(po, &po->tx_ring,
 					  TP_STATUS_SEND_REQUEST);
 		if (unlikely(ph == NULL)) {
-			if (need_wait && skb) {
+			/* Note: packet_read_pending() might be slow if we
+			 * have to call it as it's per_cpu variable, but in
+			 * fast-path we don't have to call it, only when ph
+			 * is NULL, we need to check pending_refcnt.
+			 */
+			if (need_wait && packet_read_pending(&po->tx_ring)) {
 				timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
 				if (timeo <= 0) {
 					err = !timeo ? -ETIMEDOUT : -ERESTARTSYS;
 					goto out_put;
+				} else {
+					/* Just reuse ph to continue for the next iteration, and
+					 * ph will be reassigned at the start of the next iteration.
+					 */
+					ph = (void *)1;
 				}
 			}
 			/* check for additional frames */
@@ -2943,14 +2953,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 		}
 		packet_increment_head(&po->tx_ring);
 		len_sum += tp_len;
-	} while (likely((ph != NULL) ||
-		/* Note: packet_read_pending() might be slow if we have
-		 * to call it as it's per_cpu variable, but in fast-path
-		 * we already short-circuit the loop with the first
-		 * condition, and luckily don't have to go that path
-		 * anyway.
-		 */
-		 (need_wait && packet_read_pending(&po->tx_ring))));
+	} while (likely(ph != NULL))
 
 	err = len_sum;
 	goto out_put;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/2] af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd()
  2025-07-09  9:56 ` [PATCH v3 1/2] af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd() Yun Lu
@ 2025-07-09 12:41   ` Eric Dumazet
  2025-07-09 17:06   ` Willem de Bruijn
  2025-07-09 18:15   ` Simon Horman
  2 siblings, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2025-07-09 12:41 UTC (permalink / raw)
  To: Yun Lu; +Cc: willemdebruijn.kernel, davem, kuba, pabeni, netdev, linux-kernel

On Wed, Jul 9, 2025 at 2:57 AM Yun Lu <luyun_611@163.com> wrote:
>
> From: Yun Lu <luyun@kylinos.cn>
>
> Due to the changes in commit 581073f626e3 ("af_packet: do not call
> packet_read_pending() from tpacket_destruct_skb()"), every time
> tpacket_destruct_skb() is executed, the skb_completion is marked as
> completed. When wait_for_completion_interruptible_timeout() returns
> completed, the pending_refcnt has not yet been reduced to zero.
> Therefore, when ph is NULL, the wait function may need to be called
> multiple times untill packet_read_pending() finally returns zero.
>
> We should call sock_sndtimeo() only once, otherwise the SO_SNDTIMEO
> constraint could be way off.
>
> Fixes: 581073f626e3 ("af_packet: do not call packet_read_pending() from tpacket_destruct_skb()")
> Cc: stable@kernel.org
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Yun Lu <luyun@kylinos.cn>

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/2] af_packet: fix soft lockup issue caused by tpacket_snd()
  2025-07-09  9:56 ` [PATCH v3 2/2] af_packet: fix soft lockup issue caused by tpacket_snd() Yun Lu
@ 2025-07-09 12:44   ` Eric Dumazet
  2025-07-10  2:18     ` luyun
  2025-07-09 18:14   ` Simon Horman
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2025-07-09 12:44 UTC (permalink / raw)
  To: Yun Lu; +Cc: willemdebruijn.kernel, davem, kuba, pabeni, netdev, linux-kernel

On Wed, Jul 9, 2025 at 2:57 AM Yun Lu <luyun_611@163.com> wrote:
>
> From: Yun Lu <luyun@kylinos.cn>
>
> When MSG_DONTWAIT is not set, the tpacket_snd operation will wait for
> pending_refcnt to decrement to zero before returning. The pending_refcnt
> is decremented by 1 when the skb->destructor function is called,
> indicating that the skb has been successfully sent and needs to be
> destroyed.
>
> If an error occurs during this process, the tpacket_snd() function will
> exit and return error, but pending_refcnt may not yet have decremented to
> zero. Assuming the next send operation is executed immediately, but there
> are no available frames to be sent in tx_ring (i.e., packet_current_frame
> returns NULL), and skb is also NULL, the function will not execute
> wait_for_completion_interruptible_timeout() to yield the CPU. Instead, it
> will enter a do-while loop, waiting for pending_refcnt to be zero. Even
> if the previous skb has completed transmission, the skb->destructor
> function can only be invoked in the ksoftirqd thread (assuming NAPI
> threading is enabled). When both the ksoftirqd thread and the tpacket_snd
> operation happen to run on the same CPU, and the CPU trapped in the
> do-while loop without yielding, the ksoftirqd thread will not get
> scheduled to run. As a result, pending_refcnt will never be reduced to
> zero, and the do-while loop cannot exit, eventually leading to a CPU soft
> lockup issue.
>
> In fact, as long as pending_refcnt is not zero, even if skb is NULL,
> wait_for_completion_interruptible_timeout() should be executed to yield
> the CPU, allowing the ksoftirqd thread to be scheduled. Therefore, move
> the penging_refcnt check to the start of the do-while loop, and reuse ph
> to continue for the next iteration.
>
> Fixes: 89ed5b519004 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET")
> Cc: stable@kernel.org
> Suggested-by: LongJun Tang <tanglongjun@kylinos.cn>
> Signed-off-by: Yun Lu <luyun@kylinos.cn>
>
> ---
> Changes in v3:
> - Simplify the code and reuse ph to continue. Thanks: Eric Dumazet.
> - Link to v2: https://lore.kernel.org/all/20250708020642.27838-1-luyun_611@163.com/
>
> Changes in v2:
> - Add a Fixes tag.
> - Link to v1: https://lore.kernel.org/all/20250707081629.10344-1-luyun_611@163.com/
> ---
>  net/packet/af_packet.c | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 7089b8c2a655..89a5d2a3a720 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -2846,11 +2846,21 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
>                 ph = packet_current_frame(po, &po->tx_ring,
>                                           TP_STATUS_SEND_REQUEST);
>                 if (unlikely(ph == NULL)) {
> -                       if (need_wait && skb) {
> +                       /* Note: packet_read_pending() might be slow if we
> +                        * have to call it as it's per_cpu variable, but in
> +                        * fast-path we don't have to call it, only when ph
> +                        * is NULL, we need to check pending_refcnt.
> +                        */
> +                       if (need_wait && packet_read_pending(&po->tx_ring)) {
>                                 timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
>                                 if (timeo <= 0) {
>                                         err = !timeo ? -ETIMEDOUT : -ERESTARTSYS;
>                                         goto out_put;
> +                               } else {

nit (in case a new version is sent) : No need for an else {} after a
"goto XXXXX;"

if (....) {
     .....
     goto out_put;
}
/* Just reuse ph to continue for the next iteration, and...
 * .....
 */
ph = (void *)1;


Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/2] af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd()
  2025-07-09  9:56 ` [PATCH v3 1/2] af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd() Yun Lu
  2025-07-09 12:41   ` Eric Dumazet
@ 2025-07-09 17:06   ` Willem de Bruijn
  2025-07-09 18:15   ` Simon Horman
  2 siblings, 0 replies; 13+ messages in thread
From: Willem de Bruijn @ 2025-07-09 17:06 UTC (permalink / raw)
  To: Yun Lu, willemdebruijn.kernel, davem, edumazet, kuba, pabeni
  Cc: netdev, linux-kernel

Yun Lu wrote:
> From: Yun Lu <luyun@kylinos.cn>
> 
> Due to the changes in commit 581073f626e3 ("af_packet: do not call
> packet_read_pending() from tpacket_destruct_skb()"), every time
> tpacket_destruct_skb() is executed, the skb_completion is marked as
> completed. When wait_for_completion_interruptible_timeout() returns
> completed, the pending_refcnt has not yet been reduced to zero.
> Therefore, when ph is NULL, the wait function may need to be called
> multiple times untill packet_read_pending() finally returns zero.
> 
> We should call sock_sndtimeo() only once, otherwise the SO_SNDTIMEO
> constraint could be way off.
> 
> Fixes: 581073f626e3 ("af_packet: do not call packet_read_pending() from tpacket_destruct_skb()")
> Cc: stable@kernel.org
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Yun Lu <luyun@kylinos.cn>

Reviewed-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/2] af_packet: fix soft lockup issue caused by tpacket_snd()
  2025-07-09  9:56 ` [PATCH v3 2/2] af_packet: fix soft lockup issue caused by tpacket_snd() Yun Lu
  2025-07-09 12:44   ` Eric Dumazet
@ 2025-07-09 18:14   ` Simon Horman
  2025-07-10  2:20     ` luyun
  2025-07-09 21:14   ` Willem de Bruijn
  2025-07-10  7:27   ` kernel test robot
  3 siblings, 1 reply; 13+ messages in thread
From: Simon Horman @ 2025-07-09 18:14 UTC (permalink / raw)
  To: Yun Lu
  Cc: willemdebruijn.kernel, davem, edumazet, kuba, pabeni, netdev,
	linux-kernel

On Wed, Jul 09, 2025 at 05:56:53PM +0800, Yun Lu wrote:

...

> @@ -2943,14 +2953,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
>  		}
>  		packet_increment_head(&po->tx_ring);
>  		len_sum += tp_len;
> -	} while (likely((ph != NULL) ||
> -		/* Note: packet_read_pending() might be slow if we have
> -		 * to call it as it's per_cpu variable, but in fast-path
> -		 * we already short-circuit the loop with the first
> -		 * condition, and luckily don't have to go that path
> -		 * anyway.
> -		 */
> -		 (need_wait && packet_read_pending(&po->tx_ring))));
> +	} while (likely(ph != NULL))

A semicolon is needed at the end of the line above.

>  
>  	err = len_sum;
>  	goto out_put;

-- 
pw-bot: changes-requested

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 1/2] af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd()
  2025-07-09  9:56 ` [PATCH v3 1/2] af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd() Yun Lu
  2025-07-09 12:41   ` Eric Dumazet
  2025-07-09 17:06   ` Willem de Bruijn
@ 2025-07-09 18:15   ` Simon Horman
  2 siblings, 0 replies; 13+ messages in thread
From: Simon Horman @ 2025-07-09 18:15 UTC (permalink / raw)
  To: Yun Lu
  Cc: willemdebruijn.kernel, davem, edumazet, kuba, pabeni, netdev,
	linux-kernel

On Wed, Jul 09, 2025 at 05:56:52PM +0800, Yun Lu wrote:
> From: Yun Lu <luyun@kylinos.cn>
> 
> Due to the changes in commit 581073f626e3 ("af_packet: do not call
> packet_read_pending() from tpacket_destruct_skb()"), every time
> tpacket_destruct_skb() is executed, the skb_completion is marked as
> completed. When wait_for_completion_interruptible_timeout() returns
> completed, the pending_refcnt has not yet been reduced to zero.
> Therefore, when ph is NULL, the wait function may need to be called
> multiple times untill packet_read_pending() finally returns zero.

nit: until

> 
> We should call sock_sndtimeo() only once, otherwise the SO_SNDTIMEO
> constraint could be way off.
> 
> Fixes: 581073f626e3 ("af_packet: do not call packet_read_pending() from tpacket_destruct_skb()")
> Cc: stable@kernel.org
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Yun Lu <luyun@kylinos.cn>

...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/2] af_packet: fix soft lockup issue caused by tpacket_snd()
  2025-07-09  9:56 ` [PATCH v3 2/2] af_packet: fix soft lockup issue caused by tpacket_snd() Yun Lu
  2025-07-09 12:44   ` Eric Dumazet
  2025-07-09 18:14   ` Simon Horman
@ 2025-07-09 21:14   ` Willem de Bruijn
  2025-07-10  2:36     ` luyun
  2025-07-10  7:27   ` kernel test robot
  3 siblings, 1 reply; 13+ messages in thread
From: Willem de Bruijn @ 2025-07-09 21:14 UTC (permalink / raw)
  To: Yun Lu, willemdebruijn.kernel, davem, edumazet, kuba, pabeni
  Cc: netdev, linux-kernel

Yun Lu wrote:
> From: Yun Lu <luyun@kylinos.cn>
> 
> When MSG_DONTWAIT is not set, the tpacket_snd operation will wait for
> pending_refcnt to decrement to zero before returning. The pending_refcnt
> is decremented by 1 when the skb->destructor function is called,
> indicating that the skb has been successfully sent and needs to be
> destroyed.
> 
> If an error occurs during this process, the tpacket_snd() function will
> exit and return error, but pending_refcnt may not yet have decremented to
> zero. Assuming the next send operation is executed immediately, but there
> are no available frames to be sent in tx_ring (i.e., packet_current_frame
> returns NULL), and skb is also NULL

This is a very specific edge case. And arguably the goal is to wait
for any pending skbs still, even if from a previous call.

skb is true for all but the first iterations of that loop. So your
earlier patch

-                       if (need_wait && skb) {
+                       if (need_wait && packet_read_pending(&po->tx_ring)) {

Is more concise and more obviously correct.

>, the function will not execute
> wait_for_completion_interruptible_timeout() to yield the CPU. Instead, it
> will enter a do-while loop, waiting for pending_refcnt to be zero. Even
> if the previous skb has completed transmission, the skb->destructor
> function can only be invoked in the ksoftirqd thread (assuming NAPI
> threading is enabled). When both the ksoftirqd thread and the tpacket_snd
> operation happen to run on the same CPU, and the CPU trapped in the
> do-while loop without yielding, the ksoftirqd thread will not get
> scheduled to run.

Interestingly, this is quite similar to the issue that caused adding
the completion in the first place. Commit 89ed5b519004 ("af_packet:
Block execution of tasks waiting for transmit to complete in
AF_PACKET") added the completion because a SCHED_FIFO task could delay
ksoftirqd indefinitely.

> As a result, pending_refcnt will never be reduced to
> zero, and the do-while loop cannot exit, eventually leading to a CPU soft
> lockup issue.
> 
> In fact, as long as pending_refcnt is not zero, even if skb is NULL,
> wait_for_completion_interruptible_timeout() should be executed to yield
> the CPU, allowing the ksoftirqd thread to be scheduled. Therefore, move
> the penging_refcnt check to the start of the do-while loop, and reuse ph
> to continue for the next iteration.
> 
> Fixes: 89ed5b519004 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET")
> Cc: stable@kernel.org
> Suggested-by: LongJun Tang <tanglongjun@kylinos.cn>
> Signed-off-by: Yun Lu <luyun@kylinos.cn>
> 
> ---
> Changes in v3:
> - Simplify the code and reuse ph to continue. Thanks: Eric Dumazet.
> - Link to v2: https://lore.kernel.org/all/20250708020642.27838-1-luyun_611@163.com/

If the fix alone is more obvious without this optimization, and
the extra packet_read_pending() is already present, not newly
introduced with the fix, then I would prefer to split the fix (to net,
and stable) from the optimization (to net-next).
 
> Changes in v2:
> - Add a Fixes tag.
> - Link to v1: https://lore.kernel.org/all/20250707081629.10344-1-luyun_611@163.com/
> ---
>  net/packet/af_packet.c | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 7089b8c2a655..89a5d2a3a720 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -2846,11 +2846,21 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
>  		ph = packet_current_frame(po, &po->tx_ring,
>  					  TP_STATUS_SEND_REQUEST);
>  		if (unlikely(ph == NULL)) {
> -			if (need_wait && skb) {
> +			/* Note: packet_read_pending() might be slow if we
> +			 * have to call it as it's per_cpu variable, but in
> +			 * fast-path we don't have to call it, only when ph
> +			 * is NULL, we need to check pending_refcnt.
> +			 */
> +			if (need_wait && packet_read_pending(&po->tx_ring)) {
>  				timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
>  				if (timeo <= 0) {
>  					err = !timeo ? -ETIMEDOUT : -ERESTARTSYS;
>  					goto out_put;
> +				} else {
> +					/* Just reuse ph to continue for the next iteration, and
> +					 * ph will be reassigned at the start of the next iteration.
> +					 */
> +					ph = (void *)1;
>  				}
>  			}
>  			/* check for additional frames */
> @@ -2943,14 +2953,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
>  		}
>  		packet_increment_head(&po->tx_ring);
>  		len_sum += tp_len;
> -	} while (likely((ph != NULL) ||
> -		/* Note: packet_read_pending() might be slow if we have
> -		 * to call it as it's per_cpu variable, but in fast-path
> -		 * we already short-circuit the loop with the first
> -		 * condition, and luckily don't have to go that path
> -		 * anyway.
> -		 */
> -		 (need_wait && packet_read_pending(&po->tx_ring))));
> +	} while (likely(ph != NULL))
>  
>  	err = len_sum;
>  	goto out_put;
> -- 
> 2.43.0
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/2] af_packet: fix soft lockup issue caused by tpacket_snd()
  2025-07-09 12:44   ` Eric Dumazet
@ 2025-07-10  2:18     ` luyun
  0 siblings, 0 replies; 13+ messages in thread
From: luyun @ 2025-07-10  2:18 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: willemdebruijn.kernel, davem, kuba, pabeni, netdev, linux-kernel


在 2025/7/9 20:44, Eric Dumazet 写道:
> On Wed, Jul 9, 2025 at 2:57 AM Yun Lu <luyun_611@163.com> wrote:
>> From: Yun Lu <luyun@kylinos.cn>
>>
>> When MSG_DONTWAIT is not set, the tpacket_snd operation will wait for
>> pending_refcnt to decrement to zero before returning. The pending_refcnt
>> is decremented by 1 when the skb->destructor function is called,
>> indicating that the skb has been successfully sent and needs to be
>> destroyed.
>>
>> If an error occurs during this process, the tpacket_snd() function will
>> exit and return error, but pending_refcnt may not yet have decremented to
>> zero. Assuming the next send operation is executed immediately, but there
>> are no available frames to be sent in tx_ring (i.e., packet_current_frame
>> returns NULL), and skb is also NULL, the function will not execute
>> wait_for_completion_interruptible_timeout() to yield the CPU. Instead, it
>> will enter a do-while loop, waiting for pending_refcnt to be zero. Even
>> if the previous skb has completed transmission, the skb->destructor
>> function can only be invoked in the ksoftirqd thread (assuming NAPI
>> threading is enabled). When both the ksoftirqd thread and the tpacket_snd
>> operation happen to run on the same CPU, and the CPU trapped in the
>> do-while loop without yielding, the ksoftirqd thread will not get
>> scheduled to run. As a result, pending_refcnt will never be reduced to
>> zero, and the do-while loop cannot exit, eventually leading to a CPU soft
>> lockup issue.
>>
>> In fact, as long as pending_refcnt is not zero, even if skb is NULL,
>> wait_for_completion_interruptible_timeout() should be executed to yield
>> the CPU, allowing the ksoftirqd thread to be scheduled. Therefore, move
>> the penging_refcnt check to the start of the do-while loop, and reuse ph
>> to continue for the next iteration.
>>
>> Fixes: 89ed5b519004 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET")
>> Cc: stable@kernel.org
>> Suggested-by: LongJun Tang <tanglongjun@kylinos.cn>
>> Signed-off-by: Yun Lu <luyun@kylinos.cn>
>>
>> ---
>> Changes in v3:
>> - Simplify the code and reuse ph to continue. Thanks: Eric Dumazet.
>> - Link to v2: https://lore.kernel.org/all/20250708020642.27838-1-luyun_611@163.com/
>>
>> Changes in v2:
>> - Add a Fixes tag.
>> - Link to v1: https://lore.kernel.org/all/20250707081629.10344-1-luyun_611@163.com/
>> ---
>>   net/packet/af_packet.c | 21 ++++++++++++---------
>>   1 file changed, 12 insertions(+), 9 deletions(-)
>>
>> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
>> index 7089b8c2a655..89a5d2a3a720 100644
>> --- a/net/packet/af_packet.c
>> +++ b/net/packet/af_packet.c
>> @@ -2846,11 +2846,21 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
>>                  ph = packet_current_frame(po, &po->tx_ring,
>>                                            TP_STATUS_SEND_REQUEST);
>>                  if (unlikely(ph == NULL)) {
>> -                       if (need_wait && skb) {
>> +                       /* Note: packet_read_pending() might be slow if we
>> +                        * have to call it as it's per_cpu variable, but in
>> +                        * fast-path we don't have to call it, only when ph
>> +                        * is NULL, we need to check pending_refcnt.
>> +                        */
>> +                       if (need_wait && packet_read_pending(&po->tx_ring)) {
>>                                  timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
>>                                  if (timeo <= 0) {
>>                                          err = !timeo ? -ETIMEDOUT : -ERESTARTSYS;
>>                                          goto out_put;
>> +                               } else {
> nit (in case a new version is sent) : No need for an else {} after a
> "goto XXXXX;"
>
> if (....) {
>       .....
>       goto out_put;
> }
> /* Just reuse ph to continue for the next iteration, and...
>   * .....
>   */
> ph = (void *)1;

Yes, the code of "else {} " is redundant. I will remove it in the next 
version.

Thanks.

>
>
> Reviewed-by: Eric Dumazet <edumazet@google.com>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/2] af_packet: fix soft lockup issue caused by tpacket_snd()
  2025-07-09 18:14   ` Simon Horman
@ 2025-07-10  2:20     ` luyun
  0 siblings, 0 replies; 13+ messages in thread
From: luyun @ 2025-07-10  2:20 UTC (permalink / raw)
  To: Simon Horman
  Cc: willemdebruijn.kernel, davem, edumazet, kuba, pabeni, netdev,
	linux-kernel


在 2025/7/10 02:14, Simon Horman 写道:
> On Wed, Jul 09, 2025 at 05:56:53PM +0800, Yun Lu wrote:
>
> ...
>
>> @@ -2943,14 +2953,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
>>   		}
>>   		packet_increment_head(&po->tx_ring);
>>   		len_sum += tp_len;
>> -	} while (likely((ph != NULL) ||
>> -		/* Note: packet_read_pending() might be slow if we have
>> -		 * to call it as it's per_cpu variable, but in fast-path
>> -		 * we already short-circuit the loop with the first
>> -		 * condition, and luckily don't have to go that path
>> -		 * anyway.
>> -		 */
>> -		 (need_wait && packet_read_pending(&po->tx_ring))));
>> +	} while (likely(ph != NULL))
> A semicolon is needed at the end of the line above.

Sorry, this was my mistake. I will fix it in the next version.

Thank you for pointing it out.

>
>>   
>>   	err = len_sum;
>>   	goto out_put;


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/2] af_packet: fix soft lockup issue caused by tpacket_snd()
  2025-07-09 21:14   ` Willem de Bruijn
@ 2025-07-10  2:36     ` luyun
  0 siblings, 0 replies; 13+ messages in thread
From: luyun @ 2025-07-10  2:36 UTC (permalink / raw)
  To: Willem de Bruijn, davem, edumazet, kuba, pabeni; +Cc: netdev, linux-kernel


在 2025/7/10 05:14, Willem de Bruijn 写道:
> Yun Lu wrote:
>> From: Yun Lu <luyun@kylinos.cn>
>>
>> When MSG_DONTWAIT is not set, the tpacket_snd operation will wait for
>> pending_refcnt to decrement to zero before returning. The pending_refcnt
>> is decremented by 1 when the skb->destructor function is called,
>> indicating that the skb has been successfully sent and needs to be
>> destroyed.
>>
>> If an error occurs during this process, the tpacket_snd() function will
>> exit and return error, but pending_refcnt may not yet have decremented to
>> zero. Assuming the next send operation is executed immediately, but there
>> are no available frames to be sent in tx_ring (i.e., packet_current_frame
>> returns NULL), and skb is also NULL
> This is a very specific edge case. And arguably the goal is to wait
> for any pending skbs still, even if from a previous call.
>
> skb is true for all but the first iterations of that loop. So your
> earlier patch
>
> -                       if (need_wait && skb) {
> +                       if (need_wait && packet_read_pending(&po->tx_ring)) {
>
> Is more concise and more obviously correct.
>
>> , the function will not execute
>> wait_for_completion_interruptible_timeout() to yield the CPU. Instead, it
>> will enter a do-while loop, waiting for pending_refcnt to be zero. Even
>> if the previous skb has completed transmission, the skb->destructor
>> function can only be invoked in the ksoftirqd thread (assuming NAPI
>> threading is enabled). When both the ksoftirqd thread and the tpacket_snd
>> operation happen to run on the same CPU, and the CPU trapped in the
>> do-while loop without yielding, the ksoftirqd thread will not get
>> scheduled to run.
> Interestingly, this is quite similar to the issue that caused adding
> the completion in the first place. Commit 89ed5b519004 ("af_packet:
> Block execution of tasks waiting for transmit to complete in
> AF_PACKET") added the completion because a SCHED_FIFO task could delay
> ksoftirqd indefinitely.
>
>> As a result, pending_refcnt will never be reduced to
>> zero, and the do-while loop cannot exit, eventually leading to a CPU soft
>> lockup issue.
>>
>> In fact, as long as pending_refcnt is not zero, even if skb is NULL,
>> wait_for_completion_interruptible_timeout() should be executed to yield
>> the CPU, allowing the ksoftirqd thread to be scheduled. Therefore, move
>> the penging_refcnt check to the start of the do-while loop, and reuse ph
>> to continue for the next iteration.
>>
>> Fixes: 89ed5b519004 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET")
>> Cc: stable@kernel.org
>> Suggested-by: LongJun Tang <tanglongjun@kylinos.cn>
>> Signed-off-by: Yun Lu <luyun@kylinos.cn>
>>
>> ---
>> Changes in v3:
>> - Simplify the code and reuse ph to continue. Thanks: Eric Dumazet.
>> - Link to v2: https://lore.kernel.org/all/20250708020642.27838-1-luyun_611@163.com/
> If the fix alone is more obvious without this optimization, and
> the extra packet_read_pending() is already present, not newly
> introduced with the fix, then I would prefer to split the fix (to net,
> and stable) from the optimization (to net-next).

Alright, referring to your suggestion, I will split this patch into two 
for the next version: one to fix the issue (as the first version, to 
net, and stable), and the other to optimize the code (to net-next).

Thanks for your review and suggestion.


>   
>> Changes in v2:
>> - Add a Fixes tag.
>> - Link to v1: https://lore.kernel.org/all/20250707081629.10344-1-luyun_611@163.com/
>> ---
>>   net/packet/af_packet.c | 21 ++++++++++++---------
>>   1 file changed, 12 insertions(+), 9 deletions(-)
>>
>> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
>> index 7089b8c2a655..89a5d2a3a720 100644
>> --- a/net/packet/af_packet.c
>> +++ b/net/packet/af_packet.c
>> @@ -2846,11 +2846,21 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
>>   		ph = packet_current_frame(po, &po->tx_ring,
>>   					  TP_STATUS_SEND_REQUEST);
>>   		if (unlikely(ph == NULL)) {
>> -			if (need_wait && skb) {
>> +			/* Note: packet_read_pending() might be slow if we
>> +			 * have to call it as it's per_cpu variable, but in
>> +			 * fast-path we don't have to call it, only when ph
>> +			 * is NULL, we need to check pending_refcnt.
>> +			 */
>> +			if (need_wait && packet_read_pending(&po->tx_ring)) {
>>   				timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
>>   				if (timeo <= 0) {
>>   					err = !timeo ? -ETIMEDOUT : -ERESTARTSYS;
>>   					goto out_put;
>> +				} else {
>> +					/* Just reuse ph to continue for the next iteration, and
>> +					 * ph will be reassigned at the start of the next iteration.
>> +					 */
>> +					ph = (void *)1;
>>   				}
>>   			}
>>   			/* check for additional frames */
>> @@ -2943,14 +2953,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
>>   		}
>>   		packet_increment_head(&po->tx_ring);
>>   		len_sum += tp_len;
>> -	} while (likely((ph != NULL) ||
>> -		/* Note: packet_read_pending() might be slow if we have
>> -		 * to call it as it's per_cpu variable, but in fast-path
>> -		 * we already short-circuit the loop with the first
>> -		 * condition, and luckily don't have to go that path
>> -		 * anyway.
>> -		 */
>> -		 (need_wait && packet_read_pending(&po->tx_ring))));
>> +	} while (likely(ph != NULL))
>>   
>>   	err = len_sum;
>>   	goto out_put;
>> -- 
>> 2.43.0
>>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 2/2] af_packet: fix soft lockup issue caused by tpacket_snd()
  2025-07-09  9:56 ` [PATCH v3 2/2] af_packet: fix soft lockup issue caused by tpacket_snd() Yun Lu
                     ` (2 preceding siblings ...)
  2025-07-09 21:14   ` Willem de Bruijn
@ 2025-07-10  7:27   ` kernel test robot
  3 siblings, 0 replies; 13+ messages in thread
From: kernel test robot @ 2025-07-10  7:27 UTC (permalink / raw)
  To: Yun Lu, willemdebruijn.kernel, davem, edumazet, kuba, pabeni
  Cc: oe-kbuild-all, netdev, linux-kernel

Hi Yun,

kernel test robot noticed the following build errors:

[auto build test ERROR on net-next/main]
[also build test ERROR on net/main linus/master v6.16-rc5 next-20250709]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yun-Lu/af_packet-fix-the-SO_SNDTIMEO-constraint-not-effective-on-tpacked_snd/20250709-175915
base:   net-next/main
patch link:    https://lore.kernel.org/r/20250709095653.62469-3-luyun_611%40163.com
patch subject: [PATCH v3 2/2] af_packet: fix soft lockup issue caused by tpacket_snd()
config: i386-buildonly-randconfig-001-20250710 (https://download.01.org/0day-ci/archive/20250710/202507101547.Li8m6iCU-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14+deb12u1) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250710/202507101547.Li8m6iCU-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202507101547.Li8m6iCU-lkp@intel.com/

All errors (new ones prefixed by >>):

   net/packet/af_packet.c: In function 'tpacket_snd':
>> net/packet/af_packet.c:2956:37: error: expected ';' before 'err'
    2956 |         } while (likely(ph != NULL))
         |                                     ^
         |                                     ;
    2957 | 
    2958 |         err = len_sum;
         |         ~~~                          


vim +2956 net/packet/af_packet.c

  2769	
  2770	static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
  2771	{
  2772		struct sk_buff *skb = NULL;
  2773		struct net_device *dev;
  2774		struct virtio_net_hdr *vnet_hdr = NULL;
  2775		struct sockcm_cookie sockc;
  2776		__be16 proto;
  2777		int err, reserve = 0;
  2778		void *ph;
  2779		DECLARE_SOCKADDR(struct sockaddr_ll *, saddr, msg->msg_name);
  2780		bool need_wait = !(msg->msg_flags & MSG_DONTWAIT);
  2781		int vnet_hdr_sz = READ_ONCE(po->vnet_hdr_sz);
  2782		unsigned char *addr = NULL;
  2783		int tp_len, size_max;
  2784		void *data;
  2785		int len_sum = 0;
  2786		int status = TP_STATUS_AVAILABLE;
  2787		int hlen, tlen, copylen = 0;
  2788		long timeo;
  2789	
  2790		mutex_lock(&po->pg_vec_lock);
  2791	
  2792		/* packet_sendmsg() check on tx_ring.pg_vec was lockless,
  2793		 * we need to confirm it under protection of pg_vec_lock.
  2794		 */
  2795		if (unlikely(!po->tx_ring.pg_vec)) {
  2796			err = -EBUSY;
  2797			goto out;
  2798		}
  2799		if (likely(saddr == NULL)) {
  2800			dev	= packet_cached_dev_get(po);
  2801			proto	= READ_ONCE(po->num);
  2802		} else {
  2803			err = -EINVAL;
  2804			if (msg->msg_namelen < sizeof(struct sockaddr_ll))
  2805				goto out;
  2806			if (msg->msg_namelen < (saddr->sll_halen
  2807						+ offsetof(struct sockaddr_ll,
  2808							sll_addr)))
  2809				goto out;
  2810			proto	= saddr->sll_protocol;
  2811			dev = dev_get_by_index(sock_net(&po->sk), saddr->sll_ifindex);
  2812			if (po->sk.sk_socket->type == SOCK_DGRAM) {
  2813				if (dev && msg->msg_namelen < dev->addr_len +
  2814					   offsetof(struct sockaddr_ll, sll_addr))
  2815					goto out_put;
  2816				addr = saddr->sll_addr;
  2817			}
  2818		}
  2819	
  2820		err = -ENXIO;
  2821		if (unlikely(dev == NULL))
  2822			goto out;
  2823		err = -ENETDOWN;
  2824		if (unlikely(!(dev->flags & IFF_UP)))
  2825			goto out_put;
  2826	
  2827		sockcm_init(&sockc, &po->sk);
  2828		if (msg->msg_controllen) {
  2829			err = sock_cmsg_send(&po->sk, msg, &sockc);
  2830			if (unlikely(err))
  2831				goto out_put;
  2832		}
  2833	
  2834		if (po->sk.sk_socket->type == SOCK_RAW)
  2835			reserve = dev->hard_header_len;
  2836		size_max = po->tx_ring.frame_size
  2837			- (po->tp_hdrlen - sizeof(struct sockaddr_ll));
  2838	
  2839		if ((size_max > dev->mtu + reserve + VLAN_HLEN) && !vnet_hdr_sz)
  2840			size_max = dev->mtu + reserve + VLAN_HLEN;
  2841	
  2842		timeo = sock_sndtimeo(&po->sk, msg->msg_flags & MSG_DONTWAIT);
  2843		reinit_completion(&po->skb_completion);
  2844	
  2845		do {
  2846			ph = packet_current_frame(po, &po->tx_ring,
  2847						  TP_STATUS_SEND_REQUEST);
  2848			if (unlikely(ph == NULL)) {
  2849				/* Note: packet_read_pending() might be slow if we
  2850				 * have to call it as it's per_cpu variable, but in
  2851				 * fast-path we don't have to call it, only when ph
  2852				 * is NULL, we need to check pending_refcnt.
  2853				 */
  2854				if (need_wait && packet_read_pending(&po->tx_ring)) {
  2855					timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
  2856					if (timeo <= 0) {
  2857						err = !timeo ? -ETIMEDOUT : -ERESTARTSYS;
  2858						goto out_put;
  2859					} else {
  2860						/* Just reuse ph to continue for the next iteration, and
  2861						 * ph will be reassigned at the start of the next iteration.
  2862						 */
  2863						ph = (void *)1;
  2864					}
  2865				}
  2866				/* check for additional frames */
  2867				continue;
  2868			}
  2869	
  2870			skb = NULL;
  2871			tp_len = tpacket_parse_header(po, ph, size_max, &data);
  2872			if (tp_len < 0)
  2873				goto tpacket_error;
  2874	
  2875			status = TP_STATUS_SEND_REQUEST;
  2876			hlen = LL_RESERVED_SPACE(dev);
  2877			tlen = dev->needed_tailroom;
  2878			if (vnet_hdr_sz) {
  2879				vnet_hdr = data;
  2880				data += vnet_hdr_sz;
  2881				tp_len -= vnet_hdr_sz;
  2882				if (tp_len < 0 ||
  2883				    __packet_snd_vnet_parse(vnet_hdr, tp_len)) {
  2884					tp_len = -EINVAL;
  2885					goto tpacket_error;
  2886				}
  2887				copylen = __virtio16_to_cpu(vio_le(),
  2888							    vnet_hdr->hdr_len);
  2889			}
  2890			copylen = max_t(int, copylen, dev->hard_header_len);
  2891			skb = sock_alloc_send_skb(&po->sk,
  2892					hlen + tlen + sizeof(struct sockaddr_ll) +
  2893					(copylen - dev->hard_header_len),
  2894					!need_wait, &err);
  2895	
  2896			if (unlikely(skb == NULL)) {
  2897				/* we assume the socket was initially writeable ... */
  2898				if (likely(len_sum > 0))
  2899					err = len_sum;
  2900				goto out_status;
  2901			}
  2902			tp_len = tpacket_fill_skb(po, skb, ph, dev, data, tp_len, proto,
  2903						  addr, hlen, copylen, &sockc);
  2904			if (likely(tp_len >= 0) &&
  2905			    tp_len > dev->mtu + reserve &&
  2906			    !vnet_hdr_sz &&
  2907			    !packet_extra_vlan_len_allowed(dev, skb))
  2908				tp_len = -EMSGSIZE;
  2909	
  2910			if (unlikely(tp_len < 0)) {
  2911	tpacket_error:
  2912				if (packet_sock_flag(po, PACKET_SOCK_TP_LOSS)) {
  2913					__packet_set_status(po, ph,
  2914							TP_STATUS_AVAILABLE);
  2915					packet_increment_head(&po->tx_ring);
  2916					kfree_skb(skb);
  2917					continue;
  2918				} else {
  2919					status = TP_STATUS_WRONG_FORMAT;
  2920					err = tp_len;
  2921					goto out_status;
  2922				}
  2923			}
  2924	
  2925			if (vnet_hdr_sz) {
  2926				if (virtio_net_hdr_to_skb(skb, vnet_hdr, vio_le())) {
  2927					tp_len = -EINVAL;
  2928					goto tpacket_error;
  2929				}
  2930				virtio_net_hdr_set_proto(skb, vnet_hdr);
  2931			}
  2932	
  2933			skb->destructor = tpacket_destruct_skb;
  2934			__packet_set_status(po, ph, TP_STATUS_SENDING);
  2935			packet_inc_pending(&po->tx_ring);
  2936	
  2937			status = TP_STATUS_SEND_REQUEST;
  2938			err = packet_xmit(po, skb);
  2939			if (unlikely(err != 0)) {
  2940				if (err > 0)
  2941					err = net_xmit_errno(err);
  2942				if (err && __packet_get_status(po, ph) ==
  2943					   TP_STATUS_AVAILABLE) {
  2944					/* skb was destructed already */
  2945					skb = NULL;
  2946					goto out_status;
  2947				}
  2948				/*
  2949				 * skb was dropped but not destructed yet;
  2950				 * let's treat it like congestion or err < 0
  2951				 */
  2952				err = 0;
  2953			}
  2954			packet_increment_head(&po->tx_ring);
  2955			len_sum += tp_len;
> 2956		} while (likely(ph != NULL))
  2957	
  2958		err = len_sum;
  2959		goto out_put;
  2960	
  2961	out_status:
  2962		__packet_set_status(po, ph, status);
  2963		kfree_skb(skb);
  2964	out_put:
  2965		dev_put(dev);
  2966	out:
  2967		mutex_unlock(&po->pg_vec_lock);
  2968		return err;
  2969	}
  2970	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-07-10  7:28 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-09  9:56 [PATCH v3 0/2] fix two issues on tpacket_snd() Yun Lu
2025-07-09  9:56 ` [PATCH v3 1/2] af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd() Yun Lu
2025-07-09 12:41   ` Eric Dumazet
2025-07-09 17:06   ` Willem de Bruijn
2025-07-09 18:15   ` Simon Horman
2025-07-09  9:56 ` [PATCH v3 2/2] af_packet: fix soft lockup issue caused by tpacket_snd() Yun Lu
2025-07-09 12:44   ` Eric Dumazet
2025-07-10  2:18     ` luyun
2025-07-09 18:14   ` Simon Horman
2025-07-10  2:20     ` luyun
2025-07-09 21:14   ` Willem de Bruijn
2025-07-10  2:36     ` luyun
2025-07-10  7:27   ` kernel test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).