From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Eric Dumazet <edumazet@google.com>,
	"David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org, eric.dumazet@gmail.com,
	Eric Dumazet <edumazet@google.com>
Subject: Re: [PATCH net] net: fix races in netdev_tx_sent_queue()/dev_watchdog()
Date: Thu, 17 Oct 2024 18:11:22 +0200
Message-ID: <87y12myaj9.fsf@toke.dk>
In-Reply-To: <20241015194118.3951657-1-edumazet@google.com>

Eric Dumazet <edumazet@google.com> writes:

> Some workloads hit the infamous dev_watchdog() message:
>
> "NETDEV WATCHDOG: eth0 (xxxx): transmit queue XX timed out"
>
> It seems possible to hit this even for perfectly normal
> BQL-enabled drivers:
>
> 1) Assume a TX queue was idle for more than dev->watchdog_timeo
>    (5 seconds unless changed by the driver)
>
> 2) Assume a big packet is sent, exceeding current BQL limit.
>
> 3) Driver ndo_start_xmit() puts the packet in TX ring,
>    and netdev_tx_sent_queue() is called.
>
> 4) QUEUE_STATE_STACK_XOFF could be set from netdev_tx_sent_queue()
>    before txq->trans_start has been written.
>
> 5) txq->trans_start is written later, from netdev_start_xmit()
>
>     if (rc == NETDEV_TX_OK)
>           txq_trans_update(txq)
>
> dev_watchdog() running on another cpu could read the old
> txq->trans_start, and then see QUEUE_STATE_STACK_XOFF, because 5)
> did not happen yet.
>
> To solve the issue, write txq->trans_start right before one XOFF bit
> is set:
>
> - __QUEUE_STATE_DRV_XOFF from netif_tx_stop_queue()
> - __QUEUE_STATE_STACK_XOFF from netdev_tx_sent_queue()
>
> From dev_watchdog(), we have to read txq->state before txq->trans_start.
>
> Add memory barriers to enforce correct ordering.
>
> In the future, we could avoid writing over txq->trans_start for normal
> operations, and rename this field to txq->xoff_start_time.
>
> Fixes: bec251bc8b6a ("net: no longer stop all TX queues in dev_watchdog()")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
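
For readers following the ordering argument in the quoted commit
message, the race and the fix can be modelled in userspace with C11
atomics. This is only a sketch under stated assumptions, not the kernel
patch: struct model_txq, MODEL_XOFF and the model_*() functions are
hypothetical names invented for illustration, ticks are plain integers
rather than jiffies, and C11 fences stand in for the kernel's barrier
primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an XOFF bit in txq->state. */
#define MODEL_XOFF 1UL

/* Hypothetical, simplified stand-in for a TX queue. */
struct model_txq {
	atomic_ulong state;       /* XOFF bits */
	atomic_ulong trans_start; /* tick of the last stamp */
};

static void model_init(struct model_txq *q)
{
	atomic_init(&q->state, 0UL);
	atomic_init(&q->trans_start, 0UL);
}

/* Racy order: XOFF becomes visible while trans_start is still stale. */
static void model_stop_racy(struct model_txq *q)
{
	atomic_fetch_or_explicit(&q->state, MODEL_XOFF, memory_order_relaxed);
}

/* Fixed order: stamp trans_start, fence, then publish XOFF. */
static void model_stop_fixed(struct model_txq *q, unsigned long now)
{
	atomic_store_explicit(&q->trans_start, now, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_fetch_or_explicit(&q->state, MODEL_XOFF, memory_order_relaxed);
}

/* Watchdog side: read state first, fence, then read trans_start. */
static bool model_timed_out(struct model_txq *q, unsigned long now,
			    unsigned long timeo)
{
	unsigned long start;

	if (!(atomic_load_explicit(&q->state, memory_order_relaxed) & MODEL_XOFF))
		return false; /* queue not stopped, nothing to check */
	atomic_thread_fence(memory_order_acquire);
	start = atomic_load_explicit(&q->trans_start, memory_order_relaxed);
	return now - start > timeo; /* unsigned math tolerates wraparound */
}

/* Single-threaded walk-through of the cases from the commit message. */
static void model_selfcheck(void)
{
	struct model_txq q;

	model_init(&q);
	model_stop_racy(&q);                  /* queue idle since tick 0 */
	assert(model_timed_out(&q, 1000, 5)); /* spurious timeout */

	model_init(&q);
	model_stop_fixed(&q, 1000);
	assert(!model_timed_out(&q, 1001, 5)); /* fresh stamp, no timeout */
	assert(model_timed_out(&q, 1006, 5));  /* genuinely stuck queue */
}
```

model_stop_racy() reproduces the window between steps 4) and 5): the
watchdog sees the XOFF bit together with a stale timestamp and reports
a timeout for a queue that was merely idle. With model_stop_fixed(),
any reader that observes the XOFF bit is also guaranteed to observe the
fresh timestamp, because the release fence on the writer side pairs
with the acquire fence on the reader side.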


