From: Eric Dumazet <dada1@cosmosbay.com>
To: Jiri Pirko <jpirko@redhat.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
jgarzik@pobox.com, davem@davemloft.net,
shemminger@linux-foundation.org,
bridge@lists.linux-foundation.org, fubar@us.ibm.com,
bonding-devel@lists.sourceforge.net, kaber@trash.net,
mschmidt@redhat.com, ivecera@redhat.com
Subject: Re: [PATCH 1/3] net: introduce a list of device addresses dev_addr_list
Date: Wed, 15 Apr 2009 11:27:50 +0200 [thread overview]
Message-ID: <49E5A896.90408@cosmosbay.com> (raw)
In-Reply-To: <20090415083223.GF21342@psychotron.englab.brq.redhat.com>
Jiri Pirko a écrit :
> This patch introduces a new list in struct net_device and brings a set of
> functions to handle the work with device address list. The list is a replacement
> for the original dev_addr field and because in some situations there is need to
> carry several device addresses with the net device. To be backward compatible,
> dev_addr is made to point to the first member of the list so original drivers
> sees no difference.
>
You see no difference ? Please look more closely...
I see one additional dereference in hot path, to small objects possibly
with false sharing effects.
So I would advise not changing dev_addr[] to a pointer.
And instead copy first netdev_hw_addr into it.
Also, doing a kzalloc(sizeof(struct netdev_hw_addr)) for allocating these structs
might give a block of memory < L1_CACHE_SIZE so kernel is free to give other
part of this cache line to some other layer that could be a hot spot, so
false sharing could happen.
kzalloc(max(sizeof(*ha), L1_CACHE_SIZE)) is thus higly recommended here.
> Note: patch adding list_first_entry_rcu (currently in Ingo's tip tree) needed.
>
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
> ---
> include/linux/etherdevice.h | 24 ++++
> include/linux/netdevice.h | 31 +++++-
> net/core/dev.c | 263 +++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 316 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
> index a1f17ab..348a75e 100644
> --- a/include/linux/etherdevice.h
> +++ b/include/linux/etherdevice.h
> @@ -205,4 +205,28 @@ static inline int compare_ether_header(const void *a, const void *b)
> (a32[1] ^ b32[1]) | (a32[2] ^ b32[2]);
> }
>
> +/**
> + * is_etherdev_addr - Tell if given Ethernet address belongs to the device.
> + * @dev: Pointer to a device structure
> + * @addr: Pointer to a six-byte array containing the Ethernet address
> + *
> + * Compare passed address with all addresses of the device. Return true if the
> + * address if one of the device addresses.
> + */
> +static inline bool is_etherdev_addr(const struct net_device *dev,
> + const u8 *addr)
> +{
> + struct netdev_hw_addr *ha;
> + int res = 1;
> +
> + rcu_read_lock();
> + for_each_dev_addr(dev, ha) {
> + res = compare_ether_addr(addr, ha->addr);
compare_ether_addr_64bits() please ?
> + if (!res)
> + break;
> + }
> + rcu_read_unlock();
> + return !res;
> +}
> +
> #endif /* _LINUX_ETHERDEVICE_H */
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 2e7783f..77abfdf 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -210,6 +210,12 @@ struct dev_addr_list
> #define dmi_users da_users
> #define dmi_gusers da_gusers
>
> +struct netdev_hw_addr {
> + struct list_head list;
> + unsigned char addr[MAX_ADDR_LEN];
> + int refcount;
> +};
> +
> struct hh_cache
> {
> struct hh_cache *hh_next; /* Next entry */
> @@ -776,8 +782,11 @@ struct net_device
> */
> unsigned long last_rx; /* Time of last Rx */
> /* Interface address info used in eth_type_trans() */
> - unsigned char dev_addr[MAX_ADDR_LEN]; /* hw address, (before bcast
> - because most packets are unicast) */
> + unsigned char *dev_addr; /* hw address, (before bcast
> + because most packets are
> + unicast) */
> +
> + struct list_head dev_addr_list; /* list of device hw addresses */
>
> unsigned char broadcast[MAX_ADDR_LEN]; /* hw bcast add */
>
> @@ -1778,6 +1787,13 @@ static inline void netif_addr_unlock_bh(struct net_device *dev)
> spin_unlock_bh(&dev->addr_list_lock);
> }
>
> +/*
> + * dev_addr_list walker. Should be used only for read access. Call with
> + * rcu_read_lock held.
> + */
> +#define for_each_dev_addr(dev, ha) \
> + list_for_each_entry_rcu(ha, &dev->dev_addr_list, list)
> +
> /* These functions live elsewhere (drivers/net/net_init.c, but related) */
>
> extern void ether_setup(struct net_device *dev);
> @@ -1790,6 +1806,17 @@ extern struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
> alloc_netdev_mq(sizeof_priv, name, setup, 1)
> extern int register_netdev(struct net_device *dev);
> extern void unregister_netdev(struct net_device *dev);
> +
> +/* Functions used for device addresses handling */
> +extern int dev_addr_add(struct net_device *dev,
> + unsigned char *addr);
> +extern int dev_addr_del(struct net_device *dev,
> + unsigned char *addr);
> +extern int dev_addr_add_multiple(struct net_device *to_dev,
> + struct net_device *from_dev);
> +extern int dev_addr_del_multiple(struct net_device *to_dev,
> + struct net_device *from_dev);
> +
> /* Functions used for secondary unicast and multicast support */
> extern void dev_set_rx_mode(struct net_device *dev);
> extern void __dev_set_rx_mode(struct net_device *dev);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 91d792d..04cddbb 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3437,6 +3437,262 @@ void dev_set_rx_mode(struct net_device *dev)
> netif_addr_unlock_bh(dev);
> }
>
> +/* hw addresses list handling functions */
> +
> +static int __hw_addr_add_ii(struct list_head *list, unsigned char *addr,
> + int addr_len, int ignore_index)
> +{
> + struct netdev_hw_addr *ha;
> + int i = 0;
> +
> + if (addr_len > MAX_ADDR_LEN)
> + return -EINVAL;
> +
> + rcu_read_lock();
This locking is highly suspect.
> + list_for_each_entry_rcu(ha, list, list) {
> + if (i++ != ignore_index &&
> + !memcmp(ha->addr, addr, addr_len)) {
> + ha->refcount++;
> + rcu_read_unlock();
> + return 0;
> + }
> + }
> + rcu_read_unlock();
Since you obviously need a write lock here to be sure following
can be done by one cpu only.
You have same problem all over this patch.
> +
> + ha = kzalloc(sizeof(*ha), GFP_ATOMIC);
kzalloc(max(sizeof(*ha), L1_CACHE_SIZE), GFP_...) is thus higly recommended here.
Also, why GFP_ATOMIC is needed here ?
> + if (!ha)
> + return -ENOMEM;
> + memcpy(ha->addr, addr, addr_len);
> + ha->refcount = 1;
> + list_add_tail_rcu(&ha->list, list);
> + return 0;
> +}
> +
> +static int __hw_addr_add(struct list_head *list, unsigned char *addr,
> + int addr_len)
> +{
> + return __hw_addr_add_ii(list, addr, addr_len, -1);
> +}
> +
> +static int __hw_addr_del_ii(struct list_head *list, unsigned char *addr,
> + int addr_len, int ignore_index)
> +{
> + struct netdev_hw_addr *ha;
> + int i = 0;
> +
> + list_for_each_entry(ha, list, list) {
> + if (i++ != ignore_index &&
> + !memcmp(ha->addr, addr, addr_len)) {
> + if (--ha->refcount)
> + return 0;
> + list_del_rcu(&ha->list);
> + synchronize_rcu();
Oh well... I'm pretty sure this synchronize_rcu() call can be avoided,
dont you think ? Check kfree_rcu() or equivalent, as it seems not yet
included in current kernels...
> + kfree(ha);
> + return 0;
> + }
> + }
> + return -ENOENT;
> +}
> +
> +static int __hw_addr_del(struct list_head *list, unsigned char *addr,
> + int addr_len)
> +{
> + return __hw_addr_del_ii(list, addr, addr_len, -1);
> +}
> +
> +static int __hw_addr_add_multiple_ii(struct list_head *to_list,
> + struct list_head *from_list,
> + int addr_len, int ignore_index)
> +{
> + int err = 0;
> + struct netdev_hw_addr *ha, *ha2;
> +
> + rcu_read_lock();
> + list_for_each_entry_rcu(ha, from_list, list) {
> + err = __hw_addr_add_ii(to_list, ha->addr, addr_len, 0);
> + if (err)
> + goto unroll;
> + }
> + goto unlock;
> +unroll:
> + list_for_each_entry_rcu(ha2, from_list, list) {
> + if (ha2 == ha)
> + break;
> + __hw_addr_del_ii(to_list, ha2->addr, addr_len, 0);
> + }
> +unlock:
> + rcu_read_unlock();
> + return err;
> +}
> +
> +static int __hw_addr_add_multiple(struct list_head *to_list,
> + struct list_head *from_list,
> + int addr_len)
> +{
> + return __hw_addr_add_multiple_ii(to_list, from_list, addr_len, -1);
> +}
> +
> +static void __hw_addr_del_multiple_ii(struct list_head *to_list,
> + struct list_head *from_list,
> + int addr_len, int ignore_index)
> +{
> + struct netdev_hw_addr *ha;
> +
> + rcu_read_lock();
> + list_for_each_entry_rcu(ha, from_list, list) {
> + __hw_addr_del_ii(to_list, ha->addr, addr_len, 0);
> + }
> + rcu_read_unlock();
> +}
> +
> +static void __hw_addr_del_multiple(struct list_head *to_list,
> + struct list_head *from_list,
> + int addr_len)
> +{
> + __hw_addr_del_multiple_ii(to_list, from_list, addr_len, -1);
> +}
> +
> +static void __hw_addr_flush(struct list_head *list)
> +{
> + struct netdev_hw_addr *ha, *tmp;
> +
> + list_for_each_entry_safe(ha, tmp, list, list) {
> + list_del_rcu(&ha->list);
> + synchronize_rcu();
Oh no... :(
> + kfree(ha);
> + }
> +}
> +
> +/* Device addresses handling functions */
> +
> +static void dev_addr_flush(struct net_device *dev)
> +{
> + __hw_addr_flush(&dev->dev_addr_list);
> + dev->dev_addr = NULL;
> +}
> +
> +static int dev_addr_init(struct net_device *dev)
> +{
> + unsigned char addr[MAX_ADDR_LEN];
> + struct netdev_hw_addr *ha;
> + int err;
> +
> + INIT_LIST_HEAD(&dev->dev_addr_list);
> + memset(addr, 0, sizeof(*addr));
> + err = __hw_addr_add(&dev->dev_addr_list, addr, sizeof(*addr));
> + if (!err) {
> + /*
> + * Get the first (previously created) address from the list
> + * and set dev_addr pointer to this location.
> + */
> + rcu_read_lock();
locking is not correct or unnecessary
> + ha = list_first_entry_rcu(&dev->dev_addr_list,
> + struct netdev_hw_addr, list);
> + dev->dev_addr = ha->addr;
> + rcu_read_unlock();
> + }
> + return err;
> +}
> +
> +/**
> + * dev_addr_add - Add a device address
> + * @dev: device
> + * @addr: address to add
> + *
> + * Add a device address to the device or increase the reference count if
> + * it already exists.
> + *
> + * The caller must hold the rtnl_mutex.
> + */
> +int dev_addr_add(struct net_device *dev, unsigned char *addr)
> +{
> + int err;
> +
> + ASSERT_RTNL();
> +
> + err = __hw_addr_add_ii(&dev->dev_addr_list, addr, dev->addr_len, 0);
> + if (!err)
> + call_netdevice_notifiers(NETDEV_CHANGEADDR, dev);
> + return err;
> +}
> +EXPORT_SYMBOL(dev_addr_add);
> +
> +/**
> + * dev_addr_del - Release a device address.
> + * @dev: device
> + * @addr: address to delete
> + *
> + * Release reference to a device address and remove it from the device
> + * if the reference count drops to zero.
> + *
> + * The caller must hold the rtnl_mutex.
> + */
> +int dev_addr_del(struct net_device *dev, unsigned char *addr)
> +{
> + int err;
> +
> + ASSERT_RTNL();
> +
> + err = __hw_addr_del_ii(&dev->dev_addr_list, addr, dev->addr_len, 0);
> + if (!err)
> + call_netdevice_notifiers(NETDEV_CHANGEADDR, dev);
> + return err;
> +}
> +EXPORT_SYMBOL(dev_addr_del);
> +
> +/**
> + * dev_addr_add_multiple - Add device addresses from another device
> + * @to_dev: device to which addresses will be added
> + * @from_dev: device from which addresses will be added
> + *
> + * Add device addresses of the one device to another.
> + *
> + * The caller must hold the rtnl_mutex.
> + */
> +int dev_addr_add_multiple(struct net_device *to_dev,
> + struct net_device *from_dev)
> +{
> + int err;
> +
> + ASSERT_RTNL();
> +
> + if (from_dev->addr_len != to_dev->addr_len)
> + return -EINVAL;
> + err = __hw_addr_add_multiple_ii(&to_dev->dev_addr_list,
> + &from_dev->dev_addr_list,
> + to_dev->addr_len, 0);
> + if (!err)
> + call_netdevice_notifiers(NETDEV_CHANGEADDR, to_dev);
> + return err;
> +}
> +EXPORT_SYMBOL(dev_addr_add_multiple);
> +
> +/**
> + * dev_addr_del_multiple - Delete device addresses by another device
> + * @to_dev: device where the addresses will be deleted
> + * @from_dev: device by which addresses the addresses will be deleted
> + *
> + * Deletes addresses in to device by the list of addresses in from device.
> + *
> + * The caller must hold the rtnl_mutex.
> + */
> +int dev_addr_del_multiple(struct net_device *to_dev,
> + struct net_device *from_dev)
> +{
> + ASSERT_RTNL();
> +
> + if (from_dev->addr_len != to_dev->addr_len)
> + return -EINVAL;
> + __hw_addr_add_multiple_ii(&to_dev->dev_addr_list,
> + &from_dev->dev_addr_list,
> + to_dev->addr_len, 0);
> + call_netdevice_notifiers(NETDEV_CHANGEADDR, to_dev);
> + return 0;
> +}
> +EXPORT_SYMBOL(dev_addr_del_multiple);
> +
> +/* unicast and multicast addresses handling functions */
> +
> int __dev_addr_delete(struct dev_addr_list **list, int *count,
> void *addr, int alen, int glbl)
> {
> @@ -4257,6 +4513,9 @@ static void rollback_registered(struct net_device *dev)
> */
> dev_addr_discard(dev);
>
> + /* Flush device addresses */
> + dev_addr_flush(dev);
> +
> if (dev->netdev_ops->ndo_uninit)
> dev->netdev_ops->ndo_uninit(dev);
>
> @@ -4779,6 +5038,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
>
> dev->gso_max_size = GSO_MAX_SIZE;
>
> + dev_addr_init(dev);
> netdev_init_queues(dev);
>
> INIT_LIST_HEAD(&dev->napi_list);
> @@ -4965,6 +5225,9 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
> */
> dev_addr_discard(dev);
>
> + /* Flush device addresses */
> + dev_addr_flush(dev);
> +
> netdev_unregister_kobject(dev);
>
> /* Actually switch the network namespace */
next prev parent reply other threads:[~2009-04-15 9:31 UTC|newest]
Thread overview: 92+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-03-13 18:33 [PATCH] bonding: allow bond in mode balance-alb to work properly in bridge Jiri Pirko
2009-03-14 5:39 ` Stephen Hemminger
2009-03-14 9:49 ` Jiri Pirko
2009-03-15 23:12 ` Stephen Hemminger
2009-03-16 11:11 ` Jiri Pirko
2009-03-19 6:20 ` David Miller
2009-03-19 8:44 ` Jiri Pirko
2009-03-19 10:21 ` David Miller
2009-03-19 11:19 ` Jiri Pirko
2009-03-19 8:50 ` Patrick McHardy
2009-03-19 16:31 ` Jiri Pirko
2009-03-25 13:04 ` [PATCH] bonding: allow bond in mode balance-alb to work properly in bridge -try2 Jiri Pirko
2009-03-25 13:40 ` Eric Dumazet
2009-03-25 14:39 ` Jiri Pirko
2009-03-25 15:19 ` [PATCH] bonding: allow bond in mode balance-alb to work properly in bridge -try3 Jiri Pirko
2009-03-25 16:31 ` Jay Vosburgh
2009-03-25 17:44 ` Jiri Pirko
2009-03-26 0:24 ` David Miller
2009-03-26 0:34 ` Jay Vosburgh
2009-03-26 11:12 ` Jiri Pirko
2009-03-26 15:52 ` [PATCH] bonding: allow bond in mode balance-alb to work properly in bridge -try4 Jiri Pirko
2009-03-27 7:38 ` David Miller
2009-03-27 7:46 ` Jiri Pirko
2009-03-27 7:53 ` Patrick McHardy
2009-03-27 8:41 ` Jiri Pirko
2009-03-27 8:55 ` Patrick McHardy
2009-03-27 9:47 ` Jiri Pirko
2009-03-29 20:53 ` David Miller
2009-03-30 12:04 ` Patrick McHardy
2009-03-30 12:40 ` Jiri Pirko
2009-03-30 12:47 ` Patrick McHardy
2009-03-30 12:52 ` Jiri Pirko
2009-03-30 12:58 ` Patrick McHardy
2009-04-13 8:37 ` [PATCH 0/4] bonding: allow bond in mode balance-alb to work properly in bridge -try5 Jiri Pirko
2009-04-13 8:38 ` [PATCH 1/4] net: introduce dev_mac_address_changed Jiri Pirko
2009-04-13 14:58 ` Stephen Hemminger
2009-04-13 8:42 ` [PATCH 2/4] net: introduce a list of device addresses dev_addr_list Jiri Pirko
2009-04-13 14:49 ` Stephen Hemminger
2009-04-13 22:54 ` David Miller
2009-04-13 22:53 ` David Miller
2009-04-13 8:44 ` [PATCH 3/4] net: bridge: use device address list instead of dev_addr Jiri Pirko
2009-04-13 14:54 ` Stephen Hemminger
2009-04-14 10:15 ` Jiri Pirko
2009-04-13 22:54 ` David Miller
2009-04-13 8:46 ` [PATCH 4/4] net: bonding: add slave device addresses in mode alb Jiri Pirko
2009-04-13 14:56 ` Stephen Hemminger
2009-04-15 8:17 ` [PATCH 0/3] bonding: allow bond in mode balance-alb to work properly in bridge -try6 Jiri Pirko
2009-04-15 8:18 ` [PATCH 1/3] net: introduce a list of device addresses dev_addr_list Jiri Pirko
2009-04-15 8:26 ` Li Zefan
2009-04-15 8:29 ` Jiri Pirko
2009-04-15 8:32 ` Jiri Pirko
2009-04-15 9:21 ` David Miller
2009-04-15 9:27 ` Eric Dumazet [this message]
2009-04-15 9:31 ` David Miller
2009-04-15 10:13 ` Patrick McHardy
2009-04-15 10:15 ` David Miller
2009-04-15 10:41 ` Patrick McHardy
2009-04-15 10:45 ` David Miller
2009-04-15 10:47 ` Patrick McHardy
2009-04-15 14:42 ` Jiri Pirko
2009-04-15 11:17 ` Jiri Pirko
2009-04-15 11:22 ` Patrick McHardy
2009-04-15 11:28 ` Jiri Pirko
2009-04-15 12:28 ` Eric Dumazet
2009-04-15 18:02 ` [PATCH 1/3] net: introduce a list of device addresses dev_addr_list (v2) Jiri Pirko
2009-04-15 18:54 ` Eric Dumazet
2009-04-16 8:46 ` Jiri Pirko
2009-04-17 11:57 ` [PATCH 1/3] net: introduce a list of device addresses dev_addr_list (v3) Jiri Pirko
2009-04-17 15:33 ` Stephen Hemminger
2009-04-18 7:01 ` Jiri Pirko
2009-04-18 7:35 ` Eric Dumazet
2009-04-18 7:44 ` Jiri Pirko
2009-04-18 8:06 ` Eric Dumazet
2009-04-18 8:58 ` [PATCH 1/3] net: introduce a list of device addresses dev_addr_list (v4) Jiri Pirko
2009-04-20 16:11 ` Jiri Pirko
2009-04-23 8:09 ` Jiri Pirko
2009-05-04 11:14 ` [PATCH] net: introduce a list of device addresses dev_addr_list (v5) Jiri Pirko
2009-05-05 4:37 ` David Miller
2009-05-05 6:37 ` Jiri Pirko
2009-05-05 12:48 ` [PATCH] net: introduce a list of device addresses dev_addr_list (v6) Jiri Pirko
2009-05-05 19:27 ` David Miller
2009-05-08 22:38 ` Stephen Hemminger
2009-05-08 23:00 ` David Miller
2009-05-08 23:12 ` Stephen Hemminger
2009-05-08 23:25 ` David Miller
2009-05-08 23:29 ` Stephen Hemminger
2009-04-15 8:21 ` [PATCH 2/3] net: bridge: use device address list instead of dev_addr Jiri Pirko
2009-05-06 14:46 ` [PATCH net-next] net: bridge: use device address list instead of dev_addr (repost) Jiri Pirko
2009-05-06 15:08 ` Eric Dumazet
2009-05-06 19:26 ` Stephen Hemminger
2009-05-07 22:03 ` David Miller
2009-04-15 8:22 ` [PATCH 3/3] net: bonding: add slave device addresses in mode alb Jiri Pirko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=49E5A896.90408@cosmosbay.com \
--to=dada1@cosmosbay.com \
--cc=bonding-devel@lists.sourceforge.net \
--cc=bridge@lists.linux-foundation.org \
--cc=davem@davemloft.net \
--cc=fubar@us.ibm.com \
--cc=ivecera@redhat.com \
--cc=jgarzik@pobox.com \
--cc=jpirko@redhat.com \
--cc=kaber@trash.net \
--cc=linux-kernel@vger.kernel.org \
--cc=lizf@cn.fujitsu.com \
--cc=mschmidt@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=shemminger@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox