From: Kevin Mitchell <kevmitch@arista.com>
To: Antoine Tenart <atenart@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: new warning caused by ("net-sysfs: update the queue counts in the unregistration path")
Date: Wed, 28 Sep 2022 16:20:33 -0700
Message-ID: <YzTWwf/FyzBKGaww@chmeee>
In-Reply-To: <166435838013.3919.14607521178984182789@kwain>

On Wed, Sep 28, 2022 at 11:46:20AM +0200, Antoine Tenart wrote:
> Quoting Kevin Mitchell (2022-09-28 03:27:46)
> > With the inclusion of d7dac083414e ("net-sysfs: update the queue counts in the
> > unregistration path"), we have started to see the following message during one of
> > our stress tests that brings an interface up and down while continuously
> > trying to send out packets on it:
> >
> > et3_11_1 selects TX queue 0, but real number of TX queues is 0
> >
> > It seems that this is a result of a race between remove_queue_kobjects() and
> > netdev_cap_txqueue() for the last packets before setting dev->flags &= ~IFF_UP
> > in __dev_close_many(). When this message is displayed, netdev_cap_txqueue()
> > selects queue 0 anyway (the noop queue at this point). As it did before the
> > above commit, that queue (which I guess is still around due to reference
> > counting) proceeds to drop the packet and return NET_XMIT_CN. So there doesn't
> > appear to be a functional change. However, the warning message seems to be
> > spurious if not slightly confusing.
>
> Do you know the call traces leading to this? Also I'm not 100% sure to
> follow as remove_queue_kobjects is called in the unregistration path
> while the test is setting the iface up & down. What driver is used?

Sorry, my language was imprecise. The device is being unregistered and
re-registered. The driver is out of tree for our front panel ports. I don't
think this is specific to the driver, but I'd be happy to be convinced
otherwise.

The call trace to the queue removal is

[  628.165565]  dump_stack+0x74/0x90
(remove_queue_kobject)
[  628.165569]  netdev_unregister_kobject+0x7a/0xb3
[  628.165572]  rollback_registered_many+0x560/0x5c4
[  628.165576]  unregister_netdevice_queue+0xa3/0xfc
[  628.165578]  unregister_netdev+0x1e/0x25
[  628.165589]  fdev_free+0x26e/0x29d [strata_dma_drv]

The call trace to the warning message is

[ 1094.355489]  dump_stack+0x74/0x90
(netdev_cap_txqueue)
[ 1094.355495]  netdev_core_pick_tx+0x91/0xaf
[ 1094.355500]  __dev_queue_xmit+0x249/0x602
[ 1094.355503]  ? printk+0x58/0x6f
[ 1094.355510]  dev_queue_xmit+0x10/0x12
[ 1094.355518]  packet_sendmsg+0xe88/0xeee
[ 1094.355524]  ? update_curr+0x6b/0x15d
[ 1094.355530]  sock_sendmsg_nosec+0x12/0x1d
[ 1094.355533]  sock_write_iter+0x8a/0xb6
[ 1094.355539]  new_sync_write+0x7c/0xb4
[ 1094.355543]  vfs_write+0xfe/0x12a
[ 1094.355547]  ksys_write+0x6e/0xb9
[ 1094.355552]  ? exit_to_user_mode_prepare+0xd3/0xf0
[ 1094.355555]  __x64_sys_write+0x1a/0x1c
[ 1094.355559]  do_syscall_64+0x31/0x40
[ 1094.355564]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

>
> As you said and looking around queue 0 is somewhat special and used as a
> fallback. My suggestion would be to 1) check if the above race is
> expected 2) if yes, a possible solution would be not to warn when
> real_num_tx_queues == 0 as in such cases selecting queue 0 would be the
> expected fallback (and you might want to check places like [1]).

Yes, this is exactly where this is happening, and that sounds like a good idea to
me. As far as I can tell, the message is completely innocuous. If there really
are no cases where it is useful to have this warning for real_num_tx_queues ==
0, I could submit a patch to not emit it in that case.
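
To make the idea concrete, here is a minimal userspace sketch of the check
with the warning suppressed when there are no real TX queues. It is not the
kernel code and not a tested patch: struct mock_net_device, mock_cap_txqueue()
and mock_warn_count are hypothetical stand-ins for illustration, and
fprintf() replaces the kernel's rate-limited warning.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the two net_device fields the check reads. */
struct mock_net_device {
	const char *name;
	unsigned int real_num_tx_queues;
};

/* Counts emitted warnings so the behaviour can be checked in userspace. */
static unsigned int mock_warn_count;

/*
 * Sketch of the queue-capping logic: an out-of-range index falls back to
 * queue 0. Assumption: when real_num_tx_queues is 0 the device is being
 * torn down and queue 0 is the expected fallback, so no warning is emitted.
 */
static uint16_t mock_cap_txqueue(const struct mock_net_device *dev,
				 uint16_t queue_index)
{
	if (queue_index >= dev->real_num_tx_queues) {
		if (dev->real_num_tx_queues != 0) {
			mock_warn_count++;
			fprintf(stderr,
				"%s selects TX queue %u, but real number of TX queues is %u\n",
				dev->name, (unsigned int)queue_index,
				dev->real_num_tx_queues);
		}
		return 0;
	}

	return queue_index;
}
```

With this shape, a device with zero real queues silently gets queue 0, while
an out-of-range index on a device that still has queues keeps the warning.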

>
> Thanks,
> Antoine
>
> [1] https://elixir.bootlin.com/linux/latest/source/net/core/dev.c#L4126


Thread overview: 4+ messages
2022-09-28  1:27 new warning caused by ("net-sysfs: update the queue counts in the unregistration path") Kevin Mitchell
2022-09-28  9:46 ` Antoine Tenart
2022-09-28 23:20   ` Kevin Mitchell [this message]
2022-09-30  2:11     ` Jakub Kicinski
