netdev.vger.kernel.org archive mirror
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: luyun <luyun_611@163.com>,
	 Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	 davem@davemloft.net,  edumazet@google.com,  kuba@kernel.org,
	 pabeni@redhat.com,  horms@kernel.org
Cc: netdev@vger.kernel.org,  linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 2/3] af_packet: fix soft lockup issue caused by tpacket_snd()
Date: Fri, 11 Jul 2025 15:33:32 -0400
Message-ID: <6871670c9f9a5_168265294c9@willemb.c.googlers.com.notmuch>
In-Reply-To: <515fc9c6-a4a2-4fdf-8d91-396e42c95767@163.com>

luyun wrote:
> 
> On 2025/7/10 21:49, Willem de Bruijn wrote:
> > Yun Lu wrote:
> >> From: Yun Lu <luyun@kylinos.cn>
> >>
> >> When MSG_DONTWAIT is not set, the tpacket_snd operation will wait for
> >> pending_refcnt to decrement to zero before returning. The pending_refcnt
> >> is decremented by 1 when the skb->destructor function is called,
> >> indicating that the skb has been successfully sent and needs to be
> >> destroyed.
> >>
> >> If an error occurs during this process, the tpacket_snd() function will
> >> exit and return error, but pending_refcnt may not yet have decremented to
> >> zero. Assuming the next send operation is executed immediately, but there
> >> are no available frames to be sent in tx_ring (i.e., packet_current_frame
> >> returns NULL), and skb is also NULL, the function will not execute
> >> wait_for_completion_interruptible_timeout() to yield the CPU. Instead, it
> >> will enter a do-while loop, waiting for pending_refcnt to be zero. Even
> >> if the previous skb has completed transmission, the skb->destructor
> >> function can only be invoked in the ksoftirqd thread (assuming NAPI
> >> threading is enabled). When both the ksoftirqd thread and the tpacket_snd
> >> operation happen to run on the same CPU, and the CPU is trapped in the
> >> do-while loop without yielding, the ksoftirqd thread will not get
> >> scheduled to run. As a result, pending_refcnt will never be reduced to
> >> zero, and the do-while loop cannot exit, eventually leading to a CPU soft
> >> lockup issue.
> >>
> >> In fact, skb is true for all but the first iteration of that loop, and
> >> as long as pending_refcnt is not zero, even if incremented by a previous
> >> call, wait_for_completion_interruptible_timeout() should be executed to
> >> yield the CPU, allowing the ksoftirqd thread to be scheduled. Therefore,
> >> the execution condition of this function should be modified to check if
> >> pending_refcnt is not zero, instead of checking skb.
> >>
> >> As a result, packet_read_pending() may be called twice in the loop. This
> >> will be optimized in the following patch.
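
As an illustration of the spin described in the commit message above, here is a minimal user-space sketch in C. It is a model, not kernel code: struct model, model_wait() and model_send_empty_ring() are hypothetical names, pending stands in for pending_refcnt, and model_wait() assumes that yielding the CPU lets exactly one skb completion run. The iteration cap exists only so the model terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model {
	int pending; /* stands in for pending_refcnt */
	int spins;   /* iterations that neither slept nor made progress */
	int sleeps;  /* times the loop yielded the CPU */
};

/* Models yielding the CPU: assume one pending skb completes meanwhile. */
static void model_wait(struct model *m)
{
	m->sleeps++;
	if (m->pending > 0)
		m->pending--;
}

/*
 * Models a blocking send call that finds no frame to send (ph == NULL)
 * and has built no skb during this call (skb == NULL).
 * use_pending selects the patched wait condition.
 * Returns true if the loop exits on its own, false if it is still
 * spinning after max_iter iterations.
 */
static bool model_send_empty_ring(struct model *m, bool use_pending,
				  int max_iter)
{
	const void *skb = NULL;
	const bool need_wait = true; /* MSG_DONTWAIT not set */

	assert(max_iter > 0);
	for (int i = 0; i < max_iter; i++) {
		bool should_wait = use_pending
			? (need_wait && m->pending != 0)
			: (need_wait && skb != NULL);

		if (should_wait) {
			model_wait(m);
			continue;
		}
		/* Loop exit test: keep going while completions are pending. */
		if (!(need_wait && m->pending != 0))
			return true;
		m->spins++; /* busy loop: nothing slept, nothing completed */
	}
	return false;
}
```

In this model, starting with pending = 2, the old condition (use_pending = false) never sleeps and runs into the iteration cap, while the patched condition sleeps twice and then exits.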
> >>
> >> Fixes: 89ed5b519004 ("af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET")
> >> Cc: stable@kernel.org
> >> Suggested-by: LongJun Tang <tanglongjun@kylinos.cn>
> >> Signed-off-by: Yun Lu <luyun@kylinos.cn>
> >>
> >> ---
> >> Changes in v4:
> >> - Split to the fix alone. Thanks: Willem de Bruijn.
> >> - Link to v3: https://lore.kernel.org/all/20250709095653.62469-3-luyun_611@163.com/
> >>
> >> Changes in v3:
> >> - Simplify the code and reuse ph to continue. Thanks: Eric Dumazet.
> >> - Link to v2: https://lore.kernel.org/all/20250708020642.27838-1-luyun_611@163.com/
> >>
> >> Changes in v2:
> >> - Add a Fixes tag.
> >> - Link to v1: https://lore.kernel.org/all/20250707081629.10344-1-luyun_611@163.com/
> >> ---
> >> ---
> >>   net/packet/af_packet.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> >> index 7089b8c2a655..581a96ec8e1a 100644
> >> --- a/net/packet/af_packet.c
> >> +++ b/net/packet/af_packet.c
> >> @@ -2846,7 +2846,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
> >>   		ph = packet_current_frame(po, &po->tx_ring,
> >>   					  TP_STATUS_SEND_REQUEST);
> >>   		if (unlikely(ph == NULL)) {
> >> -			if (need_wait && skb) {
> >> +			if (need_wait && packet_read_pending(&po->tx_ring)) {
> > Unfortunately I did not immediately fully appreciate Eric's
> > suggestion.
> >
> > My comment was
> >
> >      If [..] the extra packet_read_pending() is already present, not
> >      newly introduced with the fix
> >
> > But of course that expensive call is newly introduced, so my
> > suggestion was invalid.
> >
> > It's btw also not possible to mix net and net-next patches in a single
> > series like this (see Documentation/process/maintainer-netdev.rst).
> 
> Sorry, I misunderstood your comments. In the next version, I will 
> combine the second and third patches together.

My original suggestion was just wrong, sorry. Thanks for revising again.
 
> >
> > But, instead of going back entirely to v2, perhaps we can make the
> > logic a bit more obvious by just having a while (1) at the end to show
> > that the only way to exit the loop (except errors) is in the ph == NULL
> > branch. And break in that loop directly.
> >
> > There are two other ways to reach that while statement. A continue
> > on PACKET_SOCK_TP_LOSS, or by regular control flow. In both cases, ph
> > is non-zero, so the condition is true anyway.
> 
> Following your suggestion, I tried modifying the code (as shown below).
> The loop condition is still the same as the original, but the logic is
> now clearer and more obvious.
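
The code luyun refers to is not quoted in this message. Purely as an illustration of the control flow discussed above (a while (1) loop whose only non-error exit is a break in the ph == NULL branch), here is a hedged user-space sketch in C. Every name in it (ring_model, current_frame, wait_for_completion_model, send_model) is hypothetical, error paths and timeouts are left out, and the wait helper assumes that each wait completes exactly one in-flight frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical frame states, loosely mirroring TP_STATUS_* values. */
enum frame_status { FRAME_AVAILABLE, FRAME_SEND_REQUEST, FRAME_SENDING };

struct ring_model {
	enum frame_status *frames;
	size_t nframes;
	size_t head;  /* next frame to inspect */
	int pending;  /* stands in for pending_refcnt */
	int waits;    /* how often the sender yielded the CPU */
};

/* Returns the frame at head if it is queued for sending, else NULL. */
static enum frame_status *current_frame(struct ring_model *r)
{
	if (r->nframes == 0)
		return NULL;
	enum frame_status *f = &r->frames[r->head];
	return *f == FRAME_SEND_REQUEST ? f : NULL;
}

/* Models sleeping until one in-flight frame has been completed. */
static void wait_for_completion_model(struct ring_model *r)
{
	r->waits++;
	for (size_t i = 0; i < r->nframes; i++) {
		if (r->frames[i] == FRAME_SENDING) {
			r->frames[i] = FRAME_AVAILABLE;
			break;
		}
	}
	if (r->pending > 0)
		r->pending--;
}

/* Returns the number of frames handed off for transmission. */
static int send_model(struct ring_model *r, bool need_wait)
{
	int sent = 0;
	enum frame_status *ph;

	do {
		ph = current_frame(r);
		if (ph == NULL) {
			if (need_wait && r->pending != 0) {
				wait_for_completion_model(r);
				continue; /* check for additional frames */
			}
			break; /* the only non-error exit from the loop */
		}
		*ph = FRAME_SENDING;
		r->pending++;
		r->head = (r->head + 1) % r->nframes;
		sent++;
	} while (1);

	return sent;
}
```

With two queued frames in a three-frame ring, a blocking call in this model hands off both frames, waits twice and returns with pending at zero; a non-blocking call returns right after handing them off.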

Thread overview: 7+ messages
2025-07-10 10:26 [PATCH v4 0/3] fix two issues and optimize code on tpacket_snd() Yun Lu
2025-07-10 10:26 ` [PATCH v4 1/3] af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd() Yun Lu
2025-07-10 10:26 ` [PATCH v4 2/3] af_packet: fix soft lockup issue caused by tpacket_snd() Yun Lu
2025-07-10 13:49   ` Willem de Bruijn
2025-07-11  7:20     ` luyun
2025-07-11 19:33       ` Willem de Bruijn [this message]
2025-07-10 10:26 ` [PATCH v4 3/3] af_packet: optimize the packet_read_pending function called on tpacket_snd() Yun Lu
