From: Jakub Kicinski <kuba@kernel.org>
To: davem@davemloft.net
Cc: netdev@vger.kernel.org, edumazet@google.com, pabeni@redhat.com,
dw@davidwei.uk, almasrymina@google.com, jdamato@fastly.com,
Jakub Kicinski <kuba@kernel.org>
Subject: [PATCH net-next 1/8] net: make sure we retain NAPI ordering on netdev->napi_list
Date: Fri, 3 Jan 2025 10:59:46 -0800 [thread overview]
Message-ID: <20250103185954.1236510-2-kuba@kernel.org> (raw)
In-Reply-To: <20250103185954.1236510-1-kuba@kernel.org>
Netlink code depends on NAPI instances being sorted by ID on
the netdev list for dump continuation. We need to be able to
find the position on the list where we left off if dump does
not fit in a single skb, and in the meantime NAPI instances
can come and go.
This was trivially true when we were assigning a new ID to every
new NAPI instance. Since we added the NAPI config API, we try
to retain the ID previously used for the same queue, but still
add the new NAPI instance at the start of the list.
This is fine if we reset the entire netdev and all NAPIs get
removed and added back. If driver replaces a NAPI instance
during an operation like DEVMEM queue reset, or recreates
a subset of NAPI instances in other ways we may end up with
broken ordering, and therefore Netlink dumps with either
missing or duplicated entries.
At this stage the problem is theoretical. Only two drivers
support queue API, bnxt and gve. gve recreates NAPIs during
queue reset, but it doesn't support NAPI config.
bnxt supports NAPI config but doesn't recreate instances
during reset.
We need to save the ID in the config as soon as it is assigned
because otherwise the new NAPI will not know what ID it will
get at enable time, at the time it is being added.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
net/core/dev.c | 42 ++++++++++++++++++++++++++++++++++++------
1 file changed, 36 insertions(+), 6 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index faa23042df38..dffa8f71d5cc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6713,13 +6713,14 @@ static void napi_restore_config(struct napi_struct *n)
n->gro_flush_timeout = n->config->gro_flush_timeout;
n->irq_suspend_timeout = n->config->irq_suspend_timeout;
/* a NAPI ID might be stored in the config, if so use it. if not, use
- * napi_hash_add to generate one for us. It will be saved to the config
- * in napi_disable.
+ * napi_hash_add to generate one for us.
*/
- if (n->config->napi_id)
+ if (n->config->napi_id) {
napi_hash_add_with_id(n, n->config->napi_id);
- else
+ } else {
napi_hash_add(n);
+ n->config->napi_id = n->napi_id;
+ }
}
static void napi_save_config(struct napi_struct *n)
@@ -6727,10 +6728,39 @@ static void napi_save_config(struct napi_struct *n)
n->config->defer_hard_irqs = n->defer_hard_irqs;
n->config->gro_flush_timeout = n->gro_flush_timeout;
n->config->irq_suspend_timeout = n->irq_suspend_timeout;
- n->config->napi_id = n->napi_id;
napi_hash_del(n);
}
+/* Netlink wants the NAPI list to be sorted by ID, if adding a NAPI which will
+ * inherit an existing ID try to insert it at the right position.
+ */
+static void
+netif_napi_dev_list_add(struct net_device *dev, struct napi_struct *napi)
+{
+ unsigned int new_id, pos_id;
+ struct list_head *higher;
+ struct napi_struct *pos;
+
+ new_id = UINT_MAX;
+ if (napi->config && napi->config->napi_id)
+ new_id = napi->config->napi_id;
+
+ higher = &dev->napi_list;
+ list_for_each_entry(pos, &dev->napi_list, dev_list) {
+ if (pos->napi_id >= MIN_NAPI_ID)
+ pos_id = pos->napi_id;
+ else if (pos->config)
+ pos_id = pos->config->napi_id;
+ else
+ pos_id = UINT_MAX;
+
+ if (pos_id <= new_id)
+ break;
+ higher = &pos->dev_list;
+ }
+ list_add_rcu(&napi->dev_list, higher); /* adds after higher */
+}
+
void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
int (*poll)(struct napi_struct *, int), int weight)
{
@@ -6757,7 +6787,7 @@ void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
napi->list_owner = -1;
set_bit(NAPI_STATE_SCHED, &napi->state);
set_bit(NAPI_STATE_NPSVC, &napi->state);
- list_add_rcu(&napi->dev_list, &dev->napi_list);
+ netif_napi_dev_list_add(dev, napi);
/* default settings from sysfs are applied to all NAPIs. any per-NAPI
* configuration will be loaded in napi_enable
--
2.47.1
next prev parent reply other threads:[~2025-01-03 18:59 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-03 18:59 [PATCH net-next 0/8] net: make sure we retain NAPI ordering on netdev->napi_list Jakub Kicinski
2025-01-03 18:59 ` Jakub Kicinski [this message]
2025-01-06 9:35 ` [PATCH net-next 1/8] " Eric Dumazet
2025-01-03 18:59 ` [PATCH net-next 2/8] netdev: define NETDEV_INTERNAL Jakub Kicinski
2025-01-06 9:36 ` Eric Dumazet
2025-01-07 21:04 ` Mina Almasry
2025-01-03 18:59 ` [PATCH net-next 3/8] netdevsim: support NAPI config Jakub Kicinski
2025-01-06 9:36 ` Eric Dumazet
2025-01-03 18:59 ` [PATCH net-next 4/8] netdevsim: allocate rqs individually Jakub Kicinski
2025-01-06 9:52 ` Eric Dumazet
2025-01-07 10:33 ` Paolo Abeni
2025-01-03 18:59 ` [PATCH net-next 5/8] netdevsim: add queue alloc/free helpers Jakub Kicinski
2025-01-06 9:57 ` Eric Dumazet
2025-01-07 21:08 ` Mina Almasry
2025-01-07 21:15 ` Jakub Kicinski
2025-01-03 18:59 ` [PATCH net-next 6/8] netdevsim: add queue management API support Jakub Kicinski
2025-01-06 16:45 ` Stanislav Fomichev
2025-01-07 14:03 ` Willem de Bruijn
2025-01-07 14:49 ` Jakub Kicinski
2025-01-07 15:16 ` Willem de Bruijn
2025-01-03 18:59 ` [PATCH net-next 7/8] netdevsim: add debugfs-triggered queue reset Jakub Kicinski
2025-01-06 16:46 ` Stanislav Fomichev
2025-01-07 11:06 ` Paolo Abeni
2025-01-07 14:04 ` Willem de Bruijn
2025-01-03 18:59 ` [PATCH net-next 8/8] selftests: net: test listing NAPI vs queue resets Jakub Kicinski
2025-01-06 16:47 ` Stanislav Fomichev
2025-01-07 14:06 ` [PATCH net-next 0/8] net: make sure we retain NAPI ordering on netdev->napi_list Willem de Bruijn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250103185954.1236510-2-kuba@kernel.org \
--to=kuba@kernel.org \
--cc=almasrymina@google.com \
--cc=davem@davemloft.net \
--cc=dw@davidwei.uk \
--cc=edumazet@google.com \
--cc=jdamato@fastly.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).