netdev.vger.kernel.org archive mirror
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Yun Lu <luyun_611@163.com>,
	 willemdebruijn.kernel@gmail.com,  davem@davemloft.net,
	 edumazet@google.com,  kuba@kernel.org,  pabeni@redhat.com
Cc: netdev@vger.kernel.org,  linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 2/2] af_packet: fix soft lockup issue caused by tpacket_snd()
Date: Wed, 09 Jul 2025 17:14:32 -0400
Message-ID: <686edbb8943d2_a6f49294e2@willemb.c.googlers.com.notmuch>
In-Reply-To: <20250709095653.62469-3-luyun_611@163.com>

Yun Lu wrote:
> From: Yun Lu <luyun@kylinos.cn>
> 
> When MSG_DONTWAIT is not set, the tpacket_snd operation will wait for
> pending_refcnt to decrement to zero before returning. The pending_refcnt
> is decremented by 1 when the skb->destructor function is called,
> indicating that the skb has been successfully sent and needs to be
> destroyed.
> 
> If an error occurs during this process, the tpacket_snd() function will
> exit and return error, but pending_refcnt may not yet have decremented to
> zero. Assuming the next send operation is executed immediately, but there
> are no available frames to be sent in tx_ring (i.e., packet_current_frame
> returns NULL), and skb is also NULL

This is a very specific edge case. Arguably the goal is still to wait
for any pending skbs, even those from a previous call.

skb is non-NULL for all but the first iteration of that loop. So your
earlier patch

-                       if (need_wait && skb) {
+                       if (need_wait && packet_read_pending(&po->tx_ring)) {

is more concise and more obviously correct.
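
To make the difference between the two conditions concrete, here is a
minimal userspace sketch. It is only a model, not kernel code: struct
model_state and both helpers are hypothetical names, and the pending
field stands in for what packet_read_pending(&po->tx_ring) would return.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant tpacket_snd() state. */
struct model_state {
	bool need_wait;	/* true when MSG_DONTWAIT is not set */
	bool have_skb;	/* skb != NULL: this call already built an skb */
	int pending;	/* models packet_read_pending(&po->tx_ring) */
};

/* Old condition: wait only if this call has already built an skb. */
static bool should_wait_old(const struct model_state *s)
{
	return s->need_wait && s->have_skb;
}

/* Proposed condition: wait whenever any transmit is still pending. */
static bool should_wait_new(const struct model_state *s)
{
	return s->need_wait && s->pending > 0;
}
```

With need_wait set, no skb built in the current call, and one transmit
still pending from an earlier call, should_wait_old() is false while
should_wait_new() is true. That is the edge case described above.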

>, the function will not execute
> wait_for_completion_interruptible_timeout() to yield the CPU. Instead, it
> will enter a do-while loop, waiting for pending_refcnt to be zero. Even
> if the previous skb has completed transmission, the skb->destructor
> function can only be invoked in the ksoftirqd thread (assuming NAPI
> threading is enabled). When both the ksoftirqd thread and the tpacket_snd
> operation happen to run on the same CPU, and the CPU trapped in the
> do-while loop without yielding, the ksoftirqd thread will not get
> scheduled to run.

Interestingly, this is quite similar to the issue that led to adding
the completion in the first place. Commit 89ed5b519004 ("af_packet:
Block execution of tasks waiting for transmit to complete in
AF_PACKET") added the completion because a SCHED_FIFO task could delay
ksoftirqd indefinitely.
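
As a rough illustration of why both cases need the sender to block, the
sketch below models a single CPU shared by the sender and ksoftirqd. It
is a deliberately simplified userspace model under stated assumptions:
MODEL_TICKS and model_sender_finishes() are hypothetical, one tick is
one scheduling opportunity, and ksoftirqd runs only when the sender
gives up the CPU.

```c
#include <stdbool.h>

/* Arbitrary budget standing in for the soft lockup watchdog period. */
enum { MODEL_TICKS = 1000 };

/*
 * The sender loops until pending reaches zero. ksoftirqd, which runs
 * the skb destructor and decrements pending, only gets the CPU when
 * the sender blocks (sender_blocks == true).
 * Returns true if the sender finishes within the tick budget.
 */
static bool model_sender_finishes(int pending, bool sender_blocks)
{
	for (int tick = 0; tick < MODEL_TICKS; tick++) {
		if (pending == 0)
			return true;	/* loop exit condition met */
		if (sender_blocks)
			pending--;	/* ksoftirqd ran one destructor */
		/* otherwise the sender spins and pending never changes */
	}
	return false;			/* budget exhausted: models the lockup */
}
```

In this model a spinning sender with one pending skb never finishes,
while a sender that blocks finishes on the next tick.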

> As a result, pending_refcnt will never be reduced to
> zero, and the do-while loop cannot exit, eventually leading to a CPU soft
> lockup issue.
> 
> In fact, as long as pending_refcnt is not zero, even if skb is NULL,
> wait_for_completion_interruptible_timeout() should be executed to yield
> the CPU, allowing the ksoftirqd thread to be scheduled. Therefore, move
> the pending_refcnt check to the start of the do-while loop, and reuse ph
> to continue for the next iteration.
> 
> Fixes: 89ed5b519004 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET")
> Cc: stable@kernel.org
> Suggested-by: LongJun Tang <tanglongjun@kylinos.cn>
> Signed-off-by: Yun Lu <luyun@kylinos.cn>
> 
> ---
> Changes in v3:
> - Simplify the code and reuse ph to continue. Thanks: Eric Dumazet.
> - Link to v2: https://lore.kernel.org/all/20250708020642.27838-1-luyun_611@163.com/

If the fix alone is more obvious without this optimization, and
the extra packet_read_pending() is already present, not newly
introduced with the fix, then I would prefer to split the fix (to net,
and stable) from the optimization (to net-next).
 
> Changes in v2:
> - Add a Fixes tag.
> - Link to v1: https://lore.kernel.org/all/20250707081629.10344-1-luyun_611@163.com/
> ---
>  net/packet/af_packet.c | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 7089b8c2a655..89a5d2a3a720 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -2846,11 +2846,21 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
>  		ph = packet_current_frame(po, &po->tx_ring,
>  					  TP_STATUS_SEND_REQUEST);
>  		if (unlikely(ph == NULL)) {
> -			if (need_wait && skb) {
> +			/* Note: packet_read_pending() might be slow if we
> +			 * have to call it as it's per_cpu variable, but in
> +			 * fast-path we don't have to call it, only when ph
> +			 * is NULL, we need to check pending_refcnt.
> +			 */
> +			if (need_wait && packet_read_pending(&po->tx_ring)) {
>  				timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
>  				if (timeo <= 0) {
>  					err = !timeo ? -ETIMEDOUT : -ERESTARTSYS;
>  					goto out_put;
> +				} else {
> +					/* Just reuse ph to continue for the next iteration, and
> +					 * ph will be reassigned at the start of the next iteration.
> +					 */
> +					ph = (void *)1;
>  				}
>  			}
>  			/* check for additional frames */
> @@ -2943,14 +2953,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
>  		}
>  		packet_increment_head(&po->tx_ring);
>  		len_sum += tp_len;
> -	} while (likely((ph != NULL) ||
> -		/* Note: packet_read_pending() might be slow if we have
> -		 * to call it as it's per_cpu variable, but in fast-path
> -		 * we already short-circuit the loop with the first
> -		 * condition, and luckily don't have to go that path
> -		 * anyway.
> -		 */
> -		 (need_wait && packet_read_pending(&po->tx_ring))));
> +	} while (likely(ph != NULL));
>  
>  	err = len_sum;
>  	goto out_put;
> -- 
> 2.43.0
> 



Thread overview: 13+ messages
2025-07-09  9:56 [PATCH v3 0/2] fix two issues on tpacket_snd() Yun Lu
2025-07-09  9:56 ` [PATCH v3 1/2] af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd() Yun Lu
2025-07-09 12:41   ` Eric Dumazet
2025-07-09 17:06   ` Willem de Bruijn
2025-07-09 18:15   ` Simon Horman
2025-07-09  9:56 ` [PATCH v3 2/2] af_packet: fix soft lockup issue caused by tpacket_snd() Yun Lu
2025-07-09 12:44   ` Eric Dumazet
2025-07-10  2:18     ` luyun
2025-07-09 18:14   ` Simon Horman
2025-07-10  2:20     ` luyun
2025-07-09 21:14   ` Willem de Bruijn [this message]
2025-07-10  2:36     ` luyun
2025-07-10  7:27   ` kernel test robot
