From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from smtp1.osuosl.org (smtp1.osuosl.org [140.211.166.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id AE5A4FEC0EB for ; Tue, 24 Mar 2026 18:13:12 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id 4A2AE83BA0; Tue, 24 Mar 2026 18:13:12 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id UzzpWZdfrFAU; Tue, 24 Mar 2026 18:13:11 +0000 (UTC) X-Comment: SPF check N/A for local connections - client-ip=140.211.166.142; helo=lists1.osuosl.org; envelope-from=intel-wired-lan-bounces@osuosl.org; receiver= DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org 5FB7D83BA1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osuosl.org; s=default; t=1774375991; bh=/nbVlsaioYKF4sVSgiEokAiqvKpmQsI/mQ3o3FZFtp4=; h=Date:From:To:Cc:References:In-Reply-To:Subject:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From; b=yWCSaJ/mrsmeTHQuTnSsMc054dAW4G3dFvVxgvJU4cmiuQMyoIRh4kStGIci1tj/q 0YhR9bnFEKNUxDX8xiRtNTIBC/tUNZ9/+6Q8bGuIuCF7byLctBsRYzoxjZPASkbQ5m CcCyQFmhY3PYMQsi7r1w4yWlU7yq4setu3pzjO5EJ4idn0nLO8mb715qLWNqy4NmtQ E8+JtsUnAjjtnY5324IJe3i1397XsGy2yRiAK8OwPSLVcR6TCa7cTEZFf3PqI1XJmd XjleFiYTwmiGW9O4v9SksyrxHbZ3ROj6PSnL7+V5laIEss2KPdCOqzWdthTO1z9u7Q yfLr/D7gWbejw== Received: from lists1.osuosl.org (lists1.osuosl.org [140.211.166.142]) by smtp1.osuosl.org (Postfix) with ESMTP id 5FB7D83BA1; Tue, 24 Mar 2026 18:13:11 +0000 (UTC) Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists1.osuosl.org (Postfix) with ESMTP id 6756B353 for ; Tue, 24 Mar 2026 18:13:09 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 6422560B9D for ; Tue, 24 Mar 2026 18:13:09 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id iOjSPG96x5XP for ; Tue, 24 Mar 2026 18:13:08 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2607:f8b0:4864:20::1232; helo=mail-dl1-x1232.google.com; envelope-from=stfomichev@gmail.com; receiver= DMARC-Filter: OpenDMARC Filter v1.4.2 smtp3.osuosl.org 5316660B4E DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org 5316660B4E Received: from mail-dl1-x1232.google.com (mail-dl1-x1232.google.com [IPv6:2607:f8b0:4864:20::1232]) by smtp3.osuosl.org (Postfix) with ESMTPS id 5316660B4E for ; Tue, 24 Mar 2026 18:13:08 +0000 (UTC) Received: by mail-dl1-x1232.google.com with SMTP id a92af1059eb24-128ebee22caso4639052c88.0 for ; Tue, 24 Mar 2026 11:13:08 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1774375987; x=1774980787; h=in-reply-to:content-disposition:mime-version:references :mail-followup-to:message-id:subject:cc:to:from:date:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=/nbVlsaioYKF4sVSgiEokAiqvKpmQsI/mQ3o3FZFtp4=; b=eWTQxk1x8t5aNNxQExOAWvR0Hw6kD2CRfY745eDiN8JGZP0dxV0SW0otKZRFMsV16x 0OH2+6E8nmAF6YzojN1RMgVk0ckiepHfrgR3pIHYS8TctpKfX7XE18lzE3e0OhFhlSMd yLSiTYnyubImqcExWqIYtxd3rMOsZ8klAyGgSvkiqdmP3VqdfgbBzwCSB3uOqhmkvCCa eGG6NDlR+bMFMx2By6zOMy/k112M6fknJ1QAeWyfu1k1kzuxDV4DiAnvv/ConcpoEfpo QZ3j/+PCNNa0Jt2gakJVGGnACBJdz5FeDr/pGE6MjnXC0azR6sqNXnRz5IxEAs6ydfXx x3dQ== X-Forwarded-Encrypted: i=1; AJvYcCWGvjeIAslfLiwSu2aVekkyOVzfbW71fFVJYGnQGDQZkLlgZptHAMFmuLePUTvOQUI5ijO0j1UfsbXZYzZs4bs=@lists.osuosl.org X-Gm-Message-State: AOJu0YxFFy0h03WVHLRwW5CLhKarSw7oVTYCg8pblYqNEObVxSKZ81LB scPxh7v4Tg36XD6VdbAq5Rkzh3G/8Ekofl7UrKBu3xAHYf/4cO8s+Z8= X-Gm-Gg: ATEYQzzbUu6//w+AVcOo2HDLT8D/FKxP0zgyosxvDmD5Hg9XHG3tfM3FBtomjS9kj0n q/mxy5zMOQyjBA7C7/NQ9NvFaUG0b7ORO7H/uDxWM7NncWs97WCSuU/I5m0iKRVd3Hiz+y1slnd ethgZJNR+67r3XKCmMvo8fqoC+9fatJjGazOoGwbIqwKKdWilKTm8jQBKMEzd4BmGaqF3k+Gcso ZvLmzywP9CX1E8bC9VOXx0BVEPcP2HlzeqL832RrvOz0mPW1uHzMIg7y/ZNekJIQFJsyUTyH6bb Tag8zrLLe0Fe8viXUySItzM4gBSdfTCqYpfD04kdeF15B33h6NZ8dFqE3/UpU7sEcLPjFuQTJ07 qy5AP3NSrHOJUlUWgpLEIe7xP8FU5qzBJ47doy8i2rD6Qnk/6XU/KYT1/tMFZGgAv64JxySYpuT MF5BS9BauP2xV1F4IIGzPsMVRGzOQxnRXEPdocY8PMOyTkHIFopHSVICyEZW1+I2ssCmihZbWR7 rQ0vQhY0jhCQ/Nf9A== X-Received: by 2002:a05:7022:69a:b0:11b:f056:a1b3 with SMTP id a92af1059eb24-12a96e5d9eamr272090c88.11.1774375986837; Tue, 24 Mar 2026 11:13:06 -0700 (PDT) Received: from localhost (c-76-102-12-149.hsd1.ca.comcast.net. [76.102.12.149]) by smtp.gmail.com with ESMTPSA id a92af1059eb24-12a7330d1c5sm11223913c88.0.2026.03.24.11.13.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 Mar 2026 11:13:05 -0700 (PDT) Date: Tue, 24 Mar 2026 11:13:04 -0700 From: Stanislav Fomichev To: Jakub Kicinski Cc: Stanislav Fomichev , netdev@vger.kernel.org, davem@davemloft.net, edumazet@google.com, pabeni@redhat.com, horms@kernel.org, corbet@lwn.net, skhan@linuxfoundation.org, andrew+netdev@lunn.ch, michael.chan@broadcom.com, pavan.chebbi@broadcom.com, anthony.l.nguyen@intel.com, przemyslaw.kitszel@intel.com, saeedm@nvidia.com, tariqt@nvidia.com, mbloch@nvidia.com, alexanderduyck@fb.com, kernel-team@meta.com, johannes@sipsolutions.net, sd@queasysnail.net, jianbol@nvidia.com, dtatulea@nvidia.com, mohsin.bashr@gmail.com, jacob.e.keller@intel.com, willemb@google.com, skhawaja@google.com, bestswngs@gmail.com, aleksandr.loktionov@intel.com, kees@kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, intel-wired-lan@lists.osuosl.org, linux-rdma@vger.kernel.org, linux-wireless@vger.kernel.org, linux-kselftest@vger.kernel.org, leon@kernel.org Message-ID: Mail-Followup-To: Stanislav Fomichev , Jakub Kicinski , Stanislav Fomichev , netdev@vger.kernel.org, davem@davemloft.net, edumazet@google.com, pabeni@redhat.com, horms@kernel.org, corbet@lwn.net, skhan@linuxfoundation.org, andrew+netdev@lunn.ch, michael.chan@broadcom.com, pavan.chebbi@broadcom.com, anthony.l.nguyen@intel.com, przemyslaw.kitszel@intel.com, saeedm@nvidia.com, tariqt@nvidia.com, mbloch@nvidia.com, alexanderduyck@fb.com, kernel-team@meta.com, johannes@sipsolutions.net, sd@queasysnail.net, jianbol@nvidia.com, dtatulea@nvidia.com, mohsin.bashr@gmail.com, jacob.e.keller@intel.com, willemb@google.com, skhawaja@google.com, bestswngs@gmail.com, aleksandr.loktionov@intel.com, kees@kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, intel-wired-lan@lists.osuosl.org, linux-rdma@vger.kernel.org, linux-wireless@vger.kernel.org, linux-kselftest@vger.kernel.org, leon@kernel.org References: <20260320012501.2033548-1-sdf@fomichev.me> <20260320012501.2033548-4-sdf@fomichev.me> <20260323162003.0d155055@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20260323162003.0d155055@kernel.org> X-Mailman-Original-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1774375987; x=1774980787; darn=lists.osuosl.org; h=in-reply-to:content-disposition:mime-version:references :mail-followup-to:message-id:subject:cc:to:from:date:from:to:cc :subject:date:message-id:reply-to; bh=/nbVlsaioYKF4sVSgiEokAiqvKpmQsI/mQ3o3FZFtp4=; b=KQzwBU4f+rxtRy+RFnTcZh3Eh/0MbeGbHp0iZzY/yM7j1ODSbL9x6b+LYcVlUGhLYK 6ibRuJj0iXNwaCfUlOkr3dFbHM683QA/VYjEwB9k3hGaWVLWohawv49Tb+M6TWClICsv 9LZQaz8JbtnDOFIhyNVmabXERHiD5BDY9BI5nSoDrZPYneweyBViNeDzDH+lGVayspSS dXUKh92UlohtdKknSdp3Ch7BA/826ZPkE65x0VjF1qhVP6qMEOTLQxTZXnM0wddOtJgc hqnk3u2PO/0PbtWCuus/tJ/vZhu2WzzcbBC/HNB7ygUKS4TtlA4un1PitzTujP7FFqtx Ystg== X-Mailman-Original-Authentication-Results: smtp3.osuosl.org; dmarc=pass (p=none dis=none) header.from=gmail.com X-Mailman-Original-Authentication-Results: smtp3.osuosl.org; dkim=pass (2048-bit key, unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20251104 header.b=KQzwBU4f Subject: Re: [Intel-wired-lan] [PATCH net-next v3 03/13] net: introduce ndo_set_rx_mode_async and dev_rx_mode_work X-BeenThere: intel-wired-lan@osuosl.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Intel Wired Ethernet Linux Kernel Driver Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-wired-lan-bounces@osuosl.org Sender: "Intel-wired-lan" On 03/23, Jakub Kicinski wrote: > On Thu, 19 Mar 2026 18:24:51 -0700 Stanislav Fomichev wrote: > > diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst > > index 35704d115312..dc83d78d3b27 100644 > > --- a/Documentation/networking/netdevices.rst > > +++ b/Documentation/networking/netdevices.rst > > @@ -289,6 +289,14 @@ struct net_device synchronization rules > > ndo_set_rx_mode: > > Synchronization: netif_addr_lock spinlock. > > Context: BHs disabled > > + Notes: Deprecated in favor of sleepable ndo_set_rx_mode_async. > > > > +ndo_set_rx_mode_async: > > + Synchronization: rtnl_lock() semaphore. In addition, netdev instance > > + lock if the driver implements queue management or shaper API. > > + Context: process (from a work queue) > > + Notes: Sleepable version of ndo_set_rx_mode. Receives snapshots > > It's probably just my weirdness but I find creating adjectives out > of random nouns by adding "-able" to be in poor taste. "sleepable" > took root in certain three letter subsystems but I hope it won't > in netdev. > > Please use your words: > > Notes: Async version of ndo_set_rx_mode which runs in process > context. Receives snapshots f the unicast and multicast address lists. SG, will do! > > + of the unicast and multicast address lists. > > > > ndo_setup_tc: > > ``TC_SETUP_BLOCK`` and ``TC_SETUP_FT`` are running under NFT locks > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > > index 469b7cdb3237..b05bdd67b807 100644 > > --- a/include/linux/netdevice.h > > +++ b/include/linux/netdevice.h > > @@ -1117,6 +1117,16 @@ struct netdev_net_notifier { > > * This function is called device changes address list filtering. > > * If driver handles unicast address filtering, it should set > > * IFF_UNICAST_FLT in its priv_flags. > > + * Cannot sleep, called with netif_addr_lock_bh held. > > + * Deprecated in favor of sleepable ndo_set_rx_mode_async. > > + * > > + * void (*ndo_set_rx_mode_async)(struct net_device *dev, > > + * struct netdev_hw_addr_list *uc, > > + * struct netdev_hw_addr_list *mc); > > + * Sleepable version of ndo_set_rx_mode. Called from a work queue > > + * with rtnl_lock and netdev_lock_ops(dev) held. The uc/mc parameters > > + * are snapshots of the address lists - iterate with > > + * netdev_hw_addr_list_for_each(ha, uc). > > * > > * int (*ndo_set_mac_address)(struct net_device *dev, void *addr); > > * This function is called when the Media Access Control address > > @@ -1437,6 +1447,9 @@ struct net_device_ops { > > void (*ndo_change_rx_flags)(struct net_device *dev, > > int flags); > > void (*ndo_set_rx_mode)(struct net_device *dev); > > + void (*ndo_set_rx_mode_async)(struct net_device *dev, > > + struct netdev_hw_addr_list *uc, > > + struct netdev_hw_addr_list *mc); > > int (*ndo_set_mac_address)(struct net_device *dev, > > void *addr); > > int (*ndo_validate_addr)(struct net_device *dev); > > @@ -1903,6 +1916,7 @@ enum netdev_reg_state { > > * has been enabled due to the need to listen to > > * additional unicast addresses in a device that > > * does not implement ndo_set_rx_mode() > > + * @rx_mode_work: Work queue entry for ndo_set_rx_mode_async() > > * @uc: unicast mac addresses > > * @mc: multicast mac addresses > > * @dev_addrs: list of device hw addresses > > @@ -2293,6 +2307,7 @@ struct net_device { > > unsigned int promiscuity; > > unsigned int allmulti; > > bool uc_promisc; > > + struct work_struct rx_mode_work; > > #ifdef CONFIG_LOCKDEP > > unsigned char nested_level; > > #endif > > @@ -4661,6 +4676,11 @@ static inline bool netif_device_present(const struct net_device *dev) > > return test_bit(__LINK_STATE_PRESENT, &dev->state); > > } > > > > +static inline bool netif_up_and_present(const struct net_device *dev) > > +{ > > + return (dev->flags & IFF_UP) && netif_device_present(dev); > > Is this really worth a dedicated helper? What are you trying to express > here semantically? I mostly added it to avoid repeating the same present & UP check. Will undo. > > + > > void netif_device_detach(struct net_device *dev); > > > > void netif_device_attach(struct net_device *dev); > > diff --git a/net/core/dev.c b/net/core/dev.c > > index 200d44883fc1..fedc423306fc 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -2381,6 +2381,8 @@ static void netstamp_clear(struct work_struct *work) > > static DECLARE_WORK(netstamp_work, netstamp_clear); > > #endif > > > > +static struct workqueue_struct *rx_mode_wq; > > + > > void net_enable_timestamp(void) > > { > > #ifdef CONFIG_JUMP_LABEL > > @@ -9669,22 +9671,84 @@ int netif_set_allmulti(struct net_device *dev, int inc, bool notify) > > return 0; > > } > > > > -/* > > - * Upload unicast and multicast address lists to device and > > - * configure RX filtering. When the device doesn't support unicast > > - * filtering it is put in promiscuous mode while unicast addresses > > - * are present. > > +static void dev_rx_mode_work(struct work_struct *work) > > +{ > > + struct net_device *dev = container_of(work, struct net_device, > > + rx_mode_work); > > + struct netdev_hw_addr_list uc_snap, mc_snap, uc_ref, mc_ref; > > + const struct net_device_ops *ops = dev->netdev_ops; > > + int err; > > + > > + __hw_addr_init(&uc_snap); > > + __hw_addr_init(&mc_snap); > > + __hw_addr_init(&uc_ref); > > + __hw_addr_init(&mc_ref); > > + > > + rtnl_lock(); > > + netdev_lock_ops(dev); > > + > > + if (!netif_up_and_present(dev)) > > + goto out; > > + > > + if (ops->ndo_set_rx_mode_async) { > > How did we get here if we don't have this op? > Are you planning to plumb more code thru this work in the future? > If yes the whole rx_mode handling should be in a dedicated helper > rather than indenting most of the function. I do expand this, yes, in the subsequent patches. With promisc handling (ndo_change_rx_flags) and ndo_set_rx_mode fallback. Let me try to see if I can add some helper+struct to manage a set of snapshots to make it a bit more clear. > > + netif_addr_lock_bh(dev); > > + > > + err = __hw_addr_list_snapshot(&uc_snap, &dev->uc, > > + dev->addr_len); > > + if (!err) > > + err = __hw_addr_list_snapshot(&uc_ref, &dev->uc, > > + dev->addr_len); > > + if (!err) > > + err = __hw_addr_list_snapshot(&mc_snap, &dev->mc, > > + dev->addr_len); > > + if (!err) > > + err = __hw_addr_list_snapshot(&mc_ref, &dev->mc, > > + dev->addr_len); > > This doesn't get slow with a few thousands of addresses? I can add kunit benchmark and attach the output? Although not sure where to go from that. The alternative to this is allocating an array of entries. I started with that initially but __hw_addr_sync_dev wants to kfree the individual entries and I decided not to have a separate helpers to manage the snapshots. > > + netif_addr_unlock_bh(dev); > > + > > + if (err) { > > + netdev_WARN(dev, "failed to sync uc/mc addresses\n"); > > + __hw_addr_flush(&uc_snap); > > + __hw_addr_flush(&uc_ref); > > + __hw_addr_flush(&mc_snap); > > + goto out; > > + } > > + > > + ops->ndo_set_rx_mode_async(dev, &uc_snap, &mc_snap); > > + > > + netif_addr_lock_bh(dev); > > + __hw_addr_list_reconcile(&dev->uc, &uc_snap, > > + &uc_ref, dev->addr_len); > > + __hw_addr_list_reconcile(&dev->mc, &mc_snap, > > + &mc_ref, dev->addr_len); > > + netif_addr_unlock_bh(dev); > > + } > > + > > +out: > > + netdev_unlock_ops(dev); > > + rtnl_unlock(); > > +} > > + > > +/** > > + * __dev_set_rx_mode() - upload unicast and multicast address lists to device > > + * and configure RX filtering. > > + * @dev: device > > + * > > + * When the device doesn't support unicast filtering it is put in promiscuous > > + * mode while unicast addresses are present. > > */ > > void __dev_set_rx_mode(struct net_device *dev) > > { > > const struct net_device_ops *ops = dev->netdev_ops; > > > > /* dev_open will call this function so the list will stay sane. */ > > - if (!(dev->flags&IFF_UP)) > > + if (!netif_up_and_present(dev)) > > return; > > > > - if (!netif_device_present(dev)) > > + if (ops->ndo_set_rx_mode_async) { > > + queue_work(rx_mode_wq, &dev->rx_mode_work); > > return; > > + } > > > > if (!(dev->priv_flags & IFF_UNICAST_FLT)) { > > /* Unicast addresses changes may only happen under the rtnl, > > @@ -11708,6 +11772,16 @@ void netdev_run_todo(void) > > > > __rtnl_unlock(); > > > > + /* Make sure all pending rx_mode work completes before returning. > > + * > > + * rx_mode_wq may be NULL during early boot: > > + * core_initcall(netlink_proto_init) vs subsys_initcall(net_dev_init). > > + * > > + * Check current_work() to avoid flushing from the wq. > > + */ > > + if (rx_mode_wq && !current_work()) > > + flush_workqueue(rx_mode_wq); > > Can we give the work a reference on the netdev (at init time) and > cancel + release it here instead of flushing / waiting? Not sure why cancel+release, maybe you're thinking about the unregister path? This is rtnl_unlock -> netdev_run_todo -> __rtnl_unlock + some extras. And the flush is here to plumb the addresses to the real devices before we return to the callers. Mostly because of the following things we have in the tests: # TEST: team cleanup mode lacp [FAIL] # macvlan unicast address not found on a slave Can you explain a bit more on the suggestion? > > /* Wait for rcu callbacks to finish before next phase */ > > if (!list_empty(&list)) > > rcu_barrier(); > > @@ -12099,6 +12173,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, > > #endif > > > > mutex_init(&dev->lock); > > + INIT_WORK(&dev->rx_mode_work, dev_rx_mode_work); > > > > dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM; > > setup(dev); > > @@ -12203,6 +12278,8 @@ void free_netdev(struct net_device *dev) > > > > kfree(rcu_dereference_protected(dev->ingress_queue, 1)); > > > > + cancel_work_sync(&dev->rx_mode_work); > > Should never happen so maybe wrap it in a WARN ? Or maybe just flush_workqueue here as well? To signal the intent that we are mostly waiting for the wq entry to be unused to be able to kfree it?