* [RFC NET 00/02]: Secondary unicast address support
@ 2007-06-20 18:00 Patrick McHardy
2007-06-20 18:00 ` [RFC NET 01/02]: " Patrick McHardy
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Patrick McHardy @ 2007-06-20 18:00 UTC (permalink / raw)
To: netdev; +Cc: Patrick McHardy, shemminger, davem, jeff
These two patches contain a first shot at secondary unicast address support.
I'm still working on converting macvlan as an example, but since I'm about to
leave for tonight I thought I'd get them out for some comments now.
The patch adds two new functions, dev_unicast_add and dev_unicast_delete, to
add/remove addresses. Similar to dev_mc_add/dev_mc_delete, they refcount the
addresses and keep them on a list associated with the device.
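As a rough illustration of the refcounting described above, here is a minimal
userspace sketch. It is not the kernel code: struct uc_dev, struct uc_entry,
uc_find, uc_add and uc_del are made-up names for this example, there is no
locking, and nothing is uploaded to hardware.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ADDR_LEN 32

/* Hypothetical userspace stand-in for one list entry. */
struct uc_entry {
	struct uc_entry *next;
	unsigned char addr[MAX_ADDR_LEN];
	unsigned char addrlen;
	int users;		/* reference count */
};

/* Hypothetical stand-in for the per-device state. */
struct uc_dev {
	struct uc_entry *uc_list;
	int uc_count;
};

/* Return the link pointing at the matching entry, or at NULL if absent. */
struct uc_entry **uc_find(struct uc_dev *dev, const void *addr, int alen)
{
	struct uc_entry **p;

	for (p = &dev->uc_list; *p != NULL; p = &(*p)->next)
		if ((*p)->addrlen == alen &&
		    memcmp((*p)->addr, addr, alen) == 0)
			break;
	return p;
}

/* Take a reference; allocate the entry on first use. 0 on success. */
int uc_add(struct uc_dev *dev, const void *addr, int alen)
{
	struct uc_entry **p, *e;

	assert(alen > 0 && alen <= MAX_ADDR_LEN);
	p = uc_find(dev, addr, alen);
	if (*p) {
		(*p)->users++;
		return 0;
	}
	e = calloc(1, sizeof(*e));
	if (!e)
		return -1;
	memcpy(e->addr, addr, alen);
	e->addrlen = alen;
	e->users = 1;
	e->next = dev->uc_list;
	dev->uc_list = e;
	dev->uc_count++;
	return 0;
}

/* Drop a reference; free the entry when the last user goes away. */
int uc_del(struct uc_dev *dev, const void *addr, int alen)
{
	struct uc_entry **p = uc_find(dev, addr, alen);
	struct uc_entry *e = *p;

	if (!e)
		return -1;	/* no such address */
	if (--e->users == 0) {
		*p = e->next;
		dev->uc_count--;
		free(e);
	}
	return 0;
}
```

Adding the same address twice leaves one list entry with two users; the
entry only disappears after the second delete.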
dev_address_upload is responsible for uploading both the multicast and
unicast lists to the device. Devices that are capable of filtering multiple
unicast addresses need to provide a function dev->set_address_list that
deals with setting both unicast and multicast address filters. This seemed
like the easiest way for chips containing filters that can be used for
any address type; parts of the logic for deciding when to use HW filters are
also similar for unicast and multicast addresses. Devices not providing this
function are put in promiscuous mode when secondary addresses are present, and
the old set_multicast_list function is called to take care of multicast
filtering.
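The fallback for devices without set_address_list can be modelled in a few
lines of userspace C. This is only a sketch: struct model_dev and
model_address_upload are hypothetical names, and a plain counter stands in for
dev_set_promiscuity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the fields the upload logic looks at. */
struct model_dev {
	int uc_count;		/* secondary unicast addresses present */
	int uc_promisc;		/* did we ask for promisc on their behalf? */
	int promiscuity;	/* stand-in for the promiscuity refcount */
	void (*set_address_list)(struct model_dev *dev);
	void (*set_multicast_list)(struct model_dev *dev);
};

void model_address_upload(struct model_dev *dev)
{
	/* Capable devices handle unicast and multicast filters themselves. */
	if (dev->set_address_list) {
		dev->set_address_list(dev);
		return;
	}

	/* Otherwise hold exactly one promisc reference while addresses exist. */
	if (dev->uc_count > 0 && !dev->uc_promisc) {
		dev->promiscuity++;
		dev->uc_promisc = 1;
	} else if (dev->uc_count == 0 && dev->uc_promisc) {
		assert(dev->promiscuity > 0);
		dev->promiscuity--;
		dev->uc_promisc = 0;
	}

	if (dev->set_multicast_list)
		dev->set_multicast_list(dev);
}
```

The uc_promisc flag keeps repeated uploads from taking more than one
promiscuity reference. One thing worth checking against the real code: in the
patch as posted, dev_set_promiscuity itself calls dev_address_upload, so this
path appears to re-enter the upload while the TX lock is already held; the
model sidesteps that by only adjusting a counter.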
The dev_uc_list structure is kept similar to dev_mc_list to allow easier
integration in existing "fill address filters" loops.
E1000 is converted as an example; the patch worked fine in some limited
testing.
Comments welcome.
drivers/net/e1000/e1000_main.c | 39 ++++++---
include/linux/netdevice.h | 17 ++++
net/core/dev.c | 172 ++++++++++++++++++++++++++++++++++++++--
net/core/dev_mcast.c | 34 +-------
4 files changed, 212 insertions(+), 50 deletions(-)
Patrick McHardy (2):
[NET]: Secondary unicast address support
[E1000]: Secondary unicast address support
* [RFC NET 01/02]: Secondary unicast address support
2007-06-20 18:00 [RFC NET 00/02]: Secondary unicast address support Patrick McHardy
@ 2007-06-20 18:00 ` Patrick McHardy
2007-06-20 18:00 ` [RFC E1000 02/02]: " Patrick McHardy
2007-06-21 19:08 ` [RFC NET 00/02]: " Eric W. Biederman
2 siblings, 0 replies; 15+ messages in thread
From: Patrick McHardy @ 2007-06-20 18:00 UTC (permalink / raw)
To: netdev; +Cc: Patrick McHardy, shemminger, davem, jeff
[NETDEV]: Secondary unicast address support
Add support for configuring secondary unicast addresses on network devices.
Devices supporting this feature need to change their set_multicast_list
function to configure unicast filters as well and assign it to
dev->set_address_list instead of dev->set_multicast_list. Devices not
supporting this feature are put in promiscuous mode when secondary unicast
addresses are present.
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
commit 3f3f6e18b902ee177ecf5a108ba6ecbf1b5c9ba3
tree 8883aba620211e96d7419f96960cc596506cbeef
parent 890e2ae4ef5599ee34f280af4882f97c2dcfcb7b
author Patrick McHardy <kaber@trash.net> Wed, 20 Jun 2007 19:44:11 +0200
committer Patrick McHardy <kaber@trash.net> Wed, 20 Jun 2007 19:44:11 +0200
include/linux/netdevice.h | 17 ++++
net/core/dev.c | 172 +++++++++++++++++++++++++++++++++++++++++++--
net/core/dev_mcast.c | 34 +--------
3 files changed, 185 insertions(+), 38 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 868140d..a1cc2ea 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -191,6 +191,14 @@ struct dev_mc_list
int dmi_gusers;
};
+struct dev_uc_list
+{
+ struct dev_uc_list *next;
+ __u8 duci_addr[MAX_ADDR_LEN];
+ unsigned char duci_addrlen;
+ int duci_users;
+};
+
struct hh_cache
{
struct hh_cache *hh_next; /* Next entry */
@@ -389,7 +397,10 @@ struct net_device
unsigned short dev_id; /* for shared network cards */
struct dev_mc_list *mc_list; /* Multicast mac addresses */
+ struct dev_uc_list *uc_list; /* Secondary unicast mac addresses */
int mc_count; /* Number of installed mcasts */
+ int uc_count; /* Number of installed ucasts */
+ int uc_promisc;
int promiscuity;
int allmulti;
@@ -493,6 +504,8 @@ struct net_device
void *saddr,
unsigned len);
int (*rebuild_header)(struct sk_buff *skb);
+#define HAVE_ADDRESS_LIST
+ void (*set_address_list)(struct net_device *dev);
#define HAVE_MULTICAST
void (*set_multicast_list)(struct net_device *dev);
#define HAVE_SET_MAC_ADDR
@@ -1006,6 +1019,10 @@ extern void dev_mc_upload(struct net_device *dev);
extern int dev_mc_delete(struct net_device *dev, void *addr, int alen, int all);
extern int dev_mc_add(struct net_device *dev, void *addr, int alen, int newonly);
extern void dev_mc_discard(struct net_device *dev);
+extern int dev_unicast_delete(struct net_device *dev, void *addr, int alen);
+extern int dev_unicast_add(struct net_device *dev, void *addr, int alen);
+extern void __dev_address_upload(struct net_device *dev);
+extern void dev_address_upload(struct net_device *dev);
extern void dev_set_promiscuity(struct net_device *dev, int inc);
extern void dev_set_allmulti(struct net_device *dev, int inc);
extern void netdev_state_change(struct net_device *dev);
diff --git a/net/core/dev.c b/net/core/dev.c
index 5974e5b..4f4beb0 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -943,7 +943,7 @@ int dev_open(struct net_device *dev)
/*
* Initialize multicasting status
*/
- dev_mc_upload(dev);
+ dev_address_upload(dev);
/*
* Wakeup transmit queue engine
@@ -2522,6 +2522,163 @@ int netdev_set_master(struct net_device *slave, struct net_device *master)
return 0;
}
+void __dev_address_upload(struct net_device *dev)
+{
+ /* Don't do anything till we up the interface
+ * [dev_open will call this function so the list will
+ * stay sane]
+ */
+
+ if (!(dev->flags&IFF_UP))
+ return;
+
+ if (!netif_device_present(dev))
+ return;
+
+ if (dev->set_address_list)
+ dev->set_address_list(dev);
+ else {
+ if (dev->uc_count > 0 && !dev->uc_promisc) {
+ dev_set_promiscuity(dev, 1);
+ dev->uc_promisc = 1;
+ } else if (dev->uc_count == 0 && dev->uc_promisc) {
+ dev_set_promiscuity(dev, -1);
+ dev->uc_promisc = 0;
+ }
+
+ if (dev->set_multicast_list)
+ dev->set_multicast_list(dev);
+ }
+}
+
+/**
+ * dev_address_upload - upload address lists to device
+ * @dev: device
+ *
+ * Upload unicast and multicast address lists to device.
+ * When the device doesn't support unicast filtering it
+ * is put in promiscuous mode while addresses are present.
+ *
+ */
+void dev_address_upload(struct net_device *dev)
+{
+ netif_tx_lock_bh(dev);
+ __dev_address_upload(dev);
+ netif_tx_unlock_bh(dev);
+}
+
+/**
+ * dev_unicast_delete - Release secondary unicast address.
+ * @dev: device
+ *
+ * Release reference to a secondary unicast address and remove it
+ * from the device if the reference count drops to zero.
+ *
+ */
+int dev_unicast_delete(struct net_device *dev, void *addr, int alen)
+{
+ int err = 0;
+ struct dev_uc_list *duci, **ducip;
+
+ netif_tx_lock_bh(dev);
+
+ for (ducip = &dev->uc_list; (duci = *ducip) != NULL;
+ ducip = &duci->next) {
+ /*
+ * Find the entry we want to delete. The device could
+ * have variable length entries so check these too.
+ */
+ if (memcmp(duci->duci_addr, addr, duci->duci_addrlen) == 0 &&
+ alen == duci->duci_addrlen) {
+ if (--duci->duci_users)
+ goto done;
+
+ /*
+ * Last user. So delete the entry.
+ */
+ *ducip = duci->next;
+ dev->uc_count--;
+
+ kfree(duci);
+
+ /*
+ * We have altered the list, so the card
+ * loaded filter is now wrong. Fix it
+ */
+ __dev_address_upload(dev);
+
+ netif_tx_unlock_bh(dev);
+ return 0;
+ }
+ }
+ err = -ENOENT;
+done:
+ netif_tx_unlock_bh(dev);
+ return err;
+}
+EXPORT_SYMBOL(dev_unicast_delete);
+
+/**
+ * dev_unicast_add - add a secondary unicast address
+ * @dev: device
+ *
+ * Add a secondary unicast address to the device or increase
+ * the reference count if it already exists.
+ *
+ */
+int dev_unicast_add(struct net_device *dev, void *addr, int alen)
+{
+ int err = 0;
+ struct dev_uc_list *duci, *duci1;
+
+ duci1 = kmalloc(sizeof(*duci), GFP_ATOMIC);
+
+ netif_tx_lock_bh(dev);
+ for (duci = dev->uc_list; duci != NULL; duci = duci->next) {
+ if (memcmp(duci->duci_addr, addr, duci->duci_addrlen) == 0 &&
+ duci->duci_addrlen == alen) {
+ duci->duci_users++;
+ goto done;
+ }
+ }
+
+ if ((duci = duci1) == NULL) {
+ netif_tx_unlock_bh(dev);
+ return -ENOMEM;
+ }
+ memcpy(duci->duci_addr, addr, alen);
+ duci->duci_addrlen = alen;
+ duci->next = dev->uc_list;
+ duci->duci_users = 1;
+ dev->uc_list = duci;
+ dev->uc_count++;
+
+ __dev_address_upload(dev);
+
+ netif_tx_unlock_bh(dev);
+ return 0;
+
+done:
+ netif_tx_unlock_bh(dev);
+ kfree(duci1);
+ return err;
+}
+EXPORT_SYMBOL(dev_unicast_add);
+
+void dev_unicast_discard(struct net_device *dev)
+{
+ netif_tx_lock_bh(dev);
+
+ while (dev->uc_list != NULL) {
+ struct dev_uc_list *tmp = dev->uc_list;
+ dev->uc_list = tmp->next;
+ kfree(tmp);
+ }
+ dev->uc_count = 0;
+
+ netif_tx_unlock_bh(dev);
+}
+
/**
* dev_set_promiscuity - update promiscuity count on a device
* @dev: device
@@ -2541,7 +2698,7 @@ void dev_set_promiscuity(struct net_device *dev, int inc)
else
dev->flags |= IFF_PROMISC;
if (dev->flags != old_flags) {
- dev_mc_upload(dev);
+ dev_address_upload(dev);
printk(KERN_INFO "device %s %s promiscuous mode\n",
dev->name, (dev->flags & IFF_PROMISC) ? "entered" :
"left");
@@ -2574,7 +2731,7 @@ void dev_set_allmulti(struct net_device *dev, int inc)
if ((dev->allmulti += inc) == 0)
dev->flags &= ~IFF_ALLMULTI;
if (dev->flags ^ old_flags)
- dev_mc_upload(dev);
+ dev_address_upload(dev);
}
unsigned dev_get_flags(const struct net_device *dev)
@@ -2617,10 +2774,10 @@ int dev_change_flags(struct net_device *dev, unsigned flags)
IFF_ALLMULTI));
/*
- * Load in the correct multicast list now the flags have changed.
+ * Load in the correct address list now the flags have changed.
*/
- dev_mc_upload(dev);
+ dev_address_upload(dev);
/*
* Have we downed the interface. We handle IFF_UP ourselves
@@ -2633,7 +2790,7 @@ int dev_change_flags(struct net_device *dev, unsigned flags)
ret = ((old_flags & IFF_UP) ? dev_close : dev_open)(dev);
if (!ret)
- dev_mc_upload(dev);
+ dev_address_upload(dev);
}
if (dev->flags & IFF_UP &&
@@ -3497,8 +3654,9 @@ void unregister_netdevice(struct net_device *dev)
raw_notifier_call_chain(&netdev_chain, NETDEV_UNREGISTER, dev);
/*
- * Flush the multicast chain
+ * Flush the unicast and multicast chains
*/
+ dev_unicast_discard(dev);
dev_mc_discard(dev);
if (dev->uninit)
diff --git a/net/core/dev_mcast.c b/net/core/dev_mcast.c
index 5a54053..45d616b 100644
--- a/net/core/dev_mcast.c
+++ b/net/core/dev_mcast.c
@@ -63,37 +63,9 @@
* We block accesses to device mc filters with netif_tx_lock.
*/
-/*
- * Update the multicast list into the physical NIC controller.
- */
-
-static void __dev_mc_upload(struct net_device *dev)
-{
- /* Don't do anything till we up the interface
- * [dev_open will call this function so the list will
- * stay sane]
- */
-
- if (!(dev->flags&IFF_UP))
- return;
-
- /*
- * Devices with no set multicast or which have been
- * detached don't get set.
- */
-
- if (dev->set_multicast_list == NULL ||
- !netif_device_present(dev))
- return;
-
- dev->set_multicast_list(dev);
-}
-
void dev_mc_upload(struct net_device *dev)
{
- netif_tx_lock_bh(dev);
- __dev_mc_upload(dev);
- netif_tx_unlock_bh(dev);
+ dev_address_upload(dev);
}
/*
@@ -135,7 +107,7 @@ int dev_mc_delete(struct net_device *dev, void *addr, int alen, int glbl)
* We have altered the list, so the card
* loaded filter is now wrong. Fix it
*/
- __dev_mc_upload(dev);
+ __dev_address_upload(dev);
netif_tx_unlock_bh(dev);
return 0;
@@ -185,7 +157,7 @@ int dev_mc_add(struct net_device *dev, void *addr, int alen, int glbl)
dev->mc_list = dmi;
dev->mc_count++;
- __dev_mc_upload(dev);
+ __dev_address_upload(dev);
netif_tx_unlock_bh(dev);
return 0;
* [RFC E1000 02/02]: Secondary unicast address support
2007-06-20 18:00 [RFC NET 00/02]: Secondary unicast address support Patrick McHardy
2007-06-20 18:00 ` [RFC NET 01/02]: " Patrick McHardy
@ 2007-06-20 18:00 ` Patrick McHardy
2007-06-21 19:08 ` [RFC NET 00/02]: " Eric W. Biederman
2 siblings, 0 replies; 15+ messages in thread
From: Patrick McHardy @ 2007-06-20 18:00 UTC (permalink / raw)
To: netdev; +Cc: Patrick McHardy, shemminger, davem, jeff
[E1000]: Secondary unicast address support
Add support for configuring secondary unicast addresses. Unicast
addresses take precedence over multicast addresses when filling
the exact address filters to avoid going to promiscuous mode.
When more unicast addresses are present than filter slots,
unicast filtering is disabled and all slots can be used for
multicast addresses.
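The slot policy described above can be sketched in standalone C. This is a
simplified model, not the driver code: enum slot_kind and assign_slots are
hypothetical names, and multicast addresses that do not fit into the exact
filters are simply not modelled.

```c
#include <assert.h>

enum slot_kind { SLOT_STATION, SLOT_UNICAST, SLOT_MULTICAST, SLOT_EMPTY };

/*
 * Fill kinds[0..rar_entries-1] and return 1 if unicast promiscuous
 * mode is needed, 0 otherwise. Slot 0 always holds the station address.
 */
int assign_slots(enum slot_kind *kinds, int rar_entries,
		 int uc_count, int mc_count)
{
	/* Too many unicast addresses: give up on unicast filtering. */
	int upe = uc_count > rar_entries - 1;
	int uc_left = upe ? 0 : uc_count;
	int mc_left = mc_count;
	int i;

	assert(rar_entries >= 1);
	kinds[0] = SLOT_STATION;
	for (i = 1; i < rar_entries; i++) {
		if (uc_left > 0) {
			/* Unicast addresses are placed first. */
			kinds[i] = SLOT_UNICAST;
			uc_left--;
		} else if (mc_left > 0) {
			kinds[i] = SLOT_MULTICAST;
			mc_left--;
		} else {
			kinds[i] = SLOT_EMPTY;
		}
	}
	return upe;
}
```

With four slots, two unicast and five multicast addresses, the layout is
station, unicast, unicast, multicast. With four unicast addresses, unicast
filtering is dropped and every slot after the station address is available
for multicast.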
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
commit 7c1ab0a6f8db6b7f1b9ce6cdd72074e2cb0be8f6
tree f900bbe8894ecd15db22c37d5c60ff3f2acd9f9e
parent 3f3f6e18b902ee177ecf5a108ba6ecbf1b5c9ba3
author Patrick McHardy <kaber@trash.net> Wed, 20 Jun 2007 19:46:06 +0200
committer Patrick McHardy <kaber@trash.net> Wed, 20 Jun 2007 19:46:06 +0200
drivers/net/e1000/e1000_main.c | 39 +++++++++++++++++++++++++++------------
1 files changed, 27 insertions(+), 12 deletions(-)
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index cf8af92..b1f6724 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -149,7 +149,7 @@ static void e1000_clean_tx_ring(struct e1000_adapter *adapter,
struct e1000_tx_ring *tx_ring);
static void e1000_clean_rx_ring(struct e1000_adapter *adapter,
struct e1000_rx_ring *rx_ring);
-static void e1000_set_multi(struct net_device *netdev);
+static void e1000_set_address_list(struct net_device *netdev);
static void e1000_update_phy_info(unsigned long data);
static void e1000_watchdog(unsigned long data);
static void e1000_82547_tx_fifo_stall(unsigned long data);
@@ -513,7 +513,7 @@ static void e1000_configure(struct e1000_adapter *adapter)
struct net_device *netdev = adapter->netdev;
int i;
- e1000_set_multi(netdev);
+ e1000_set_address_list(netdev);
e1000_restore_vlan(adapter);
e1000_init_manageability(adapter);
@@ -924,7 +924,7 @@ e1000_probe(struct pci_dev *pdev,
netdev->stop = &e1000_close;
netdev->hard_start_xmit = &e1000_xmit_frame;
netdev->get_stats = &e1000_get_stats;
- netdev->set_multicast_list = &e1000_set_multi;
+ netdev->set_address_list = &e1000_set_address_list;
netdev->set_mac_address = &e1000_set_mac;
netdev->change_mtu = &e1000_change_mtu;
netdev->do_ioctl = &e1000_ioctl;
@@ -2412,20 +2412,21 @@ e1000_set_mac(struct net_device *netdev, void *p)
}
/**
- * e1000_set_multi - Multicast and Promiscuous mode set
+ * e1000_set_address_list - Secondary Unicast, Multicast and Promiscuous mode set
* @netdev: network interface device structure
*
- * The set_multi entry point is called whenever the multicast address
- * list or the network interface flags are updated. This routine is
+ * The set_address_list entry point is called whenever the unicast or multicast
+ * address lists or the network interface flags are updated. This routine is
* responsible for configuring the hardware for proper multicast,
* promiscuous mode, and all-multi behavior.
**/
static void
-e1000_set_multi(struct net_device *netdev)
+e1000_set_address_list(struct net_device *netdev)
{
struct e1000_adapter *adapter = netdev_priv(netdev);
struct e1000_hw *hw = &adapter->hw;
+ struct dev_uc_list *uc_ptr;
struct dev_mc_list *mc_ptr;
uint32_t rctl;
uint32_t hash_value;
@@ -2449,9 +2450,16 @@ e1000_set_multi(struct net_device *netdev)
rctl |= (E1000_RCTL_UPE | E1000_RCTL_MPE);
} else if (netdev->flags & IFF_ALLMULTI) {
rctl |= E1000_RCTL_MPE;
- rctl &= ~E1000_RCTL_UPE;
} else {
- rctl &= ~(E1000_RCTL_UPE | E1000_RCTL_MPE);
+ rctl &= ~E1000_RCTL_MPE;
+ }
+
+ uc_ptr = NULL;
+ if (netdev->uc_count > rar_entries - 1) {
+ rctl |= E1000_RCTL_UPE;
+ } else if (!(netdev->flags & IFF_PROMISC)) {
+ rctl &= ~E1000_RCTL_UPE;
+ uc_ptr = netdev->uc_list;
}
E1000_WRITE_REG(hw, RCTL, rctl);
@@ -2461,7 +2469,10 @@ e1000_set_multi(struct net_device *netdev)
if (hw->mac_type == e1000_82542_rev2_0)
e1000_enter_82542_rst(adapter);
- /* load the first 14 multicast address into the exact filters 1-14
+ /* load the first 14 addresses into the exact filters 1-14. Unicast
+ * addresses take precedence to avoid disabling unicast filtering
+ * when possible.
+ *
* RAR 0 is used for the station MAC adddress
* if there are not 14 addresses, go ahead and clear the filters
* -- with 82571 controllers only 0-13 entries are filled here
@@ -2469,7 +2480,10 @@ e1000_set_multi(struct net_device *netdev)
mc_ptr = netdev->mc_list;
for (i = 1; i < rar_entries; i++) {
- if (mc_ptr) {
+ if (uc_ptr) {
+ e1000_rar_set(hw, uc_ptr->duci_addr, i);
+ uc_ptr = uc_ptr->next;
+ } else if (mc_ptr) {
e1000_rar_set(hw, mc_ptr->dmi_addr, i);
mc_ptr = mc_ptr->next;
} else {
@@ -2479,6 +2493,7 @@ e1000_set_multi(struct net_device *netdev)
E1000_WRITE_FLUSH(hw);
}
}
+ WARN_ON(uc_ptr != NULL);
/* clear the old settings from the multicast hash table */
@@ -5098,7 +5113,7 @@ e1000_suspend(struct pci_dev *pdev, pm_message_t state)
if (wufc) {
e1000_setup_rctl(adapter);
- e1000_set_multi(netdev);
+ e1000_set_address_list(netdev);
/* turn on all-multi mode if wake on multicast is enabled */
if (wufc & E1000_WUFC_MC) {
* Re: [RFC NET 00/02]: Secondary unicast address support
2007-06-20 18:00 [RFC NET 00/02]: Secondary unicast address support Patrick McHardy
2007-06-20 18:00 ` [RFC NET 01/02]: " Patrick McHardy
2007-06-20 18:00 ` [RFC E1000 02/02]: " Patrick McHardy
@ 2007-06-21 19:08 ` Eric W. Biederman
2007-06-21 19:13 ` David Miller
2007-06-21 19:13 ` Patrick McHardy
2 siblings, 2 replies; 15+ messages in thread
From: Eric W. Biederman @ 2007-06-21 19:08 UTC (permalink / raw)
To: Patrick McHardy; +Cc: netdev, shemminger, davem, jeff
Patrick McHardy <kaber@trash.net> writes:
> These two patches contain a first shot at secondary unicast address support.
> I'm still working on converting macvlan as an example, but since I'm about to
> leave for tonight I thought I'd get them out for some comments now.
>
> The patch adds two new functions, dev_unicast_add and dev_unicast_delete, to
> add/remove addresses. Similar to dev_mc_add/dev_mc_delete, they refcount the
> addresses and keep them on a list associated with the device.
>
> dev_address_upload is responsible for uploading both the multicast and
> unicast lists to the device. Devices that are capable of filtering multiple
> unicast addresses need to provide a function dev->set_address_list that
> deals with setting both unicast and multicast address filters. This seemed
> like the easiest way for chips containing filters that can be used for
> any address type; parts of the logic for deciding when to use HW filters are
> also similar for unicast and multicast addresses. Devices not providing this
> function are put in promiscuous mode when secondary addresses are present, and
> the old set_multicast_list function is called to take care of multicast
> filtering.
>
> The dev_uc_list structure is kept similar to dev_mc_list to allow easier
> integration in existing "fill address filters" loops.
>
> E1000 is converted as an example; the patch worked fine in some limited
> testing.
>
> Comments welcome.
I'm trying to understand what the point of this patch is.
In one sense I find the concept of filtering and listening for multiple
mac addresses very interesting, especially if we could break out different
streams of traffic by destination mac address into separate network devices.
This would remove the need for any kind of ethernet tunnel and make multiple
network namespaces much more pleasant.
However this just seems to allow a card to decode multiple mac addresses
which in some oddball load balancing configurations may actually be
useful, but it seems fairly limited.
Do you have a specific use case you envision for this multiple mac
functionality?
Eric
* Re: [RFC NET 00/02]: Secondary unicast address support
2007-06-21 19:08 ` [RFC NET 00/02]: " Eric W. Biederman
@ 2007-06-21 19:13 ` David Miller
2007-06-21 21:11 ` Caitlin Bestler
2007-06-21 19:13 ` Patrick McHardy
1 sibling, 1 reply; 15+ messages in thread
From: David Miller @ 2007-06-21 19:13 UTC (permalink / raw)
To: ebiederm; +Cc: kaber, netdev, shemminger, jeff
From: ebiederm@xmission.com (Eric W. Biederman)
Date: Thu, 21 Jun 2007 13:08:12 -0600
> However this just seems to allow a card to decode multiple mac addresses
> which in some oddball load balancing configurations may actually be
> useful, but it seems fairly limited.
>
> Do you have a specific use case you envision for this multiple mac
> functionality?
Virtualization.
If you can't tell the ethernet card that more than one MAC address
is for it, you have to put the thing into promiscuous mode.
Networking on virtualization is typically done by giving each
guest a unique MAC address; the guests have a virtual network
device that connects to the control node (or dom0 in Xen parlance)
and/or other guests.
The control node has a switch that routes the packets from the
guests either to other guests or out the real ethernet interface.
Each guest gets a unique MAC so that the switch can know which
guest an incoming packet is for.
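A minimal sketch of that lookup, in standalone C. The names (struct guest,
switch_lookup, UPLINK) are invented for this example and do not come from any
real hypervisor or from the patches in this thread; a real switch would use a
hash table rather than a linear scan.

```c
#include <assert.h>
#include <string.h>

#define ETH_ALEN 6
#define UPLINK (-1)	/* hypothetical marker: send out the real NIC */

/* Hypothetical per-guest record: just the guest's unique MAC. */
struct guest {
	unsigned char mac[ETH_ALEN];
};

/*
 * Return the index of the guest that owns the destination MAC,
 * or UPLINK if no guest claims it.
 */
int switch_lookup(const struct guest *guests, int nguests,
		  const unsigned char *dst)
{
	int i;

	assert(nguests >= 0);
	for (i = 0; i < nguests; i++)
		if (memcmp(guests[i].mac, dst, ETH_ALEN) == 0)
			return i;
	return UPLINK;
}
```

The set of guest MACs is exactly what the control node would want to program
into the card's unicast filters instead of going promiscuous.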
* Re: [RFC NET 00/02]: Secondary unicast address support
2007-06-21 19:08 ` [RFC NET 00/02]: " Eric W. Biederman
2007-06-21 19:13 ` David Miller
@ 2007-06-21 19:13 ` Patrick McHardy
2007-06-21 20:31 ` Eric W. Biederman
1 sibling, 1 reply; 15+ messages in thread
From: Patrick McHardy @ 2007-06-21 19:13 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: netdev, shemminger, davem, jeff
Eric W. Biederman wrote:
> I'm trying to understand what the point of this patch is.
>
> In one sense I find the concept of filtering and listening for multiple
> mac addresses very interesting, especially if we could break out different
> streams of traffic by destination mac address into separate network devices.
> This would remove the need for any kind of ethernet tunnel and make multiple
> network namespaces much more pleasant.
>
> However this just seems to allow a card to decode multiple mac addresses
> which in some oddball load balancing configurations may actually be
> useful, but it seems fairly limited.
>
> Do you have a specific use case you envision for this multiple mac
> functionality?
>
Yes, please see the MACVLAN patch I posted one or two days earlier.
8021q can also make use of it and Dave mentioned some virtualization
devices want this as well.
* Re: [RFC NET 00/02]: Secondary unicast address support
2007-06-21 19:13 ` Patrick McHardy
@ 2007-06-21 20:31 ` Eric W. Biederman
2007-06-22 0:08 ` Patrick McHardy
2007-06-22 1:56 ` Patrick McHardy
0 siblings, 2 replies; 15+ messages in thread
From: Eric W. Biederman @ 2007-06-21 20:31 UTC (permalink / raw)
To: Patrick McHardy; +Cc: netdev, shemminger, davem, jeff
Patrick McHardy <kaber@trash.net> writes:
> Eric W. Biederman wrote:
>> I'm trying to understand what the point of this patch is.
>>
>> In one sense I find the concept of filtering and listening for multiple
>> mac addresses very interesting, especially if we could break out different
>> streams of traffic by destination mac address into separate network devices.
>> This would remove the need for any kind of ethernet tunnel and make multiple
>> network namespaces much more pleasant.
>>
>> However this just seems to allow a card to decode multiple mac addresses
>> which in some oddball load balancing configurations may actually be
>> useful, but it seems fairly limited.
>>
>> Do you have a specific use case you envision for this multiple mac
>> functionality?
>>
>
> Yes, please see the MACVLAN patch I posted one or two days earlier.
Thanks. That is what I was envisioning. I keep suspecting one of
the cool multi-rx queue nics may start doing some of the demux in hardware.
But whatever; if it works and is relatively fast, it is good enough for me.
> 8021q can also make use of it and Dave mentioned some virtualization
> devices want this as well.
Right, makes sense. And ethernet bridging (which is the general case
of the virtualization Dave mentioned) should also be able to take
advantage of multiple unicast addresses. So this definitely makes
sense.
Have you done any performance testing with the macvlan code? With
the ethernet tunnel device we keep getting copied unicast packets on
some path or other which slowed things down. Simply not doing the
firewalling until the packets have made it through the macvlan device
should help here.
For the macvlan code do we need to do anything special if we transmit
to a mac we would normally receive? Another unicast mac of the same
nic for example.
For the macvlan hash you just use an upper byte. Is that just a
simple starting place, or do we not need a more complex hash?
Eric
* RE: [RFC NET 00/02]: Secondary unicast address support
2007-06-21 19:13 ` David Miller
@ 2007-06-21 21:11 ` Caitlin Bestler
0 siblings, 0 replies; 15+ messages in thread
From: Caitlin Bestler @ 2007-06-21 21:11 UTC (permalink / raw)
To: netdev
netdev-owner@vger.kernel.org wrote:
> From: ebiederm@xmission.com (Eric W. Biederman)
> Date: Thu, 21 Jun 2007 13:08:12 -0600
>
>> However this just seems to allow a card to decode multiple mac
>> addresses which in some oddball load balancing configurations may
>> actually be useful, but it seems fairly limited.
>>
>> Do you have a specific use case you envision for this multiple mac
>> functionality?
>
> Virtualization.
>
> If you can't tell the ethernet card that more than one MAC
> address is for it, you have to put the thing into promiscuous mode.
>
> Networking on virtualization is typically done by giving each
> guest a unique MAC address, the guests have a virtual network
> device that connects to the control node (or dom0 in Xen
> parlance) and/or other guests.
>
> The control node has a switch that routes the packets from
> the guests either to other guests or out the real ethernet interface.
>
> Each guest gets a unique MAC so that the switch can know
> which guest an incoming packet is for.
The same software switch could also throw away the excess
frames that promiscuous mode would have admitted. Unless
the misdirected frames were common it would not seem to
be a major CPU burden.
Keep in mind that the only MAC addresses that would have
been transmitted are the ones that the input filter would
have listed.
* Re: [RFC NET 00/02]: Secondary unicast address support
2007-06-21 20:31 ` Eric W. Biederman
@ 2007-06-22 0:08 ` Patrick McHardy
2007-06-22 3:30 ` Ben Greear
2007-06-22 1:56 ` Patrick McHardy
1 sibling, 1 reply; 15+ messages in thread
From: Patrick McHardy @ 2007-06-22 0:08 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: netdev, shemminger, davem, jeff
Eric W. Biederman wrote:
> Patrick McHardy <kaber@trash.net> writes:
>
>
>> Eric W. Biederman wrote:
>>
>>> I'm trying to understand what the point of this patch is.
>>>
>>> In one sense I find the concept of filtering and listening for multiple
>>> mac addresses very interesting, especially if we could break out different
>>> streams of traffic by destination mac address into separate network devices.
>>> This would remove the need for any kind of ethernet tunnel and make multiple
>>> network namespaces much more pleasant.
>>>
>>> However this just seems to allow a card to decode multiple mac addresses
>>> which in some oddball load balancing configurations may actually be
>>> useful, but it seems fairly limited.
>>>
>>> Do you have a specific use case you envision for this multiple mac
>>> functionality?
>>>
>>>
>> Yes, please see the MACVLAN patch I posted one or two days earlier.
>>
>
> Thanks. That is what I was envisioning. I keep suspecting one of
> the cool multi-rx queue nics may start doing some of the demux in hardware.
> But whatever; if it works and is relatively fast, it is good enough for me.
>
When NICs support that, I guess the macvlan driver could be adapted
to take advantage of it.
>
>> 8021q can also make use of it and Dave mentioned some virtualization
>> devices want this as well.
>>
>
> Right, makes sense. And ethernet bridging (which is the general case
> of the virtualization Dave mentioned) should also be able to take
> advantage of multiple unicast addresses. So this definitely makes
> sense.
>
It needs promiscuous mode to learn, so I'm not sure how much
this will help bridging.
> Have you done any performance testing with the macvlan code? With
> the ethernet tunnel device we keep getting copied unicast packets on
> some path or other which slowed things down. Simply not doing the
> firewalling until the packets have made it through the macvlan device
> should help here.
>
Performance should be at least as good as on a bridge device since
the macvlan driver does basically nothing and uses the same functions
for receiving and sending packets.
> For the macvlan code do we need to do anything special if we transmit
> to a mac we would normally receive? Another unicast mac of the same
> nic for example.
That doesn't happen under normal circumstances. I don't believe
it would work.
> For the macvlan hash you just use an upper byte. Is that just a
> simple starting place, or do we not need a more complex hash?
>
It comes from the original code, I think it should be good enough.
* Re: [RFC NET 00/02]: Secondary unicast address support
2007-06-21 20:31 ` Eric W. Biederman
2007-06-22 0:08 ` Patrick McHardy
@ 2007-06-22 1:56 ` Patrick McHardy
2007-06-22 3:21 ` Ben Greear
1 sibling, 1 reply; 15+ messages in thread
From: Patrick McHardy @ 2007-06-22 1:56 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: netdev, shemminger, davem, jeff
Eric W. Biederman wrote:
> For the macvlan hash you just use an upper byte. Is that just a
> simple starting place, or do we not need a more complex hash?
>
That gave me an idea, since the default addresses are random
anyway I'm now using an incrementing counter for the upper byte.
* Re: [RFC NET 00/02]: Secondary unicast address support
2007-06-22 1:56 ` Patrick McHardy
@ 2007-06-22 3:21 ` Ben Greear
2007-06-22 11:36 ` Patrick McHardy
0 siblings, 1 reply; 15+ messages in thread
From: Ben Greear @ 2007-06-22 3:21 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Eric W. Biederman, netdev, shemminger, davem, jeff
Patrick McHardy wrote:
> Eric W. Biederman wrote:
>> For the macvlan hash you just use an upper byte. Is that just a
>> simple starting place, or do we not need a more complex hash?
>>
>
> That gave me an idea, since the default addresses are random
> anyway I'm now using an incrementing counter for the upper byte.
Is there not a (relatively) easy way to hash the entire 6 bytes?
I'd prefer to be able to set the MACs to anything I want, without
worrying about trivially hitting a worst-case hash scenario.
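To make the question concrete, something along these lines is what I have in
mind. It is only a sketch: mac_hash and HASH_SIZE are names made up here, the
multiplier 31 is an arbitrary choice, and this is not what the macvlan code
currently does.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_ALEN 6
#define HASH_SIZE 256	/* assumed number of hash buckets */

/* Mix all six address bytes into a bucket index. */
unsigned int mac_hash(const unsigned char *mac)
{
	unsigned int h = 0;
	size_t i;

	assert(mac != NULL);
	for (i = 0; i < ETH_ALEN; i++)
		h = h * 31 + mac[i];
	return h % HASH_SIZE;
}
```

Since the multiplier is odd, changing any single byte of the address changes
the bucket, so addresses that share an upper byte no longer all land in the
same chain. Someone choosing MACs on purpose can still force collisions, but
it stops being trivial.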
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: [RFC NET 00/02]: Secondary unicast address support
2007-06-22 0:08 ` Patrick McHardy
@ 2007-06-22 3:30 ` Ben Greear
2007-06-22 4:30 ` Eric W. Biederman
0 siblings, 1 reply; 15+ messages in thread
From: Ben Greear @ 2007-06-22 3:30 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Eric W. Biederman, netdev, shemminger, davem, jeff
Patrick McHardy wrote:
> Eric W. Biederman wrote:
>> For the macvlan code do we need to do anything special if we transmit
>> to a mac we would normally receive? Another unicast mac of the same
>> nic for example.
>
> That doesn't happen under normal circumstances. I don't believe
> it would work.
Assuming you mean you want to send between two mac-vlans on the same physical
nic...
This can work if your mac-vlans are on different subnets and you are
routing between them (and if you have my send-to-self patch or have
another way to let a system send packets to itself).
A normal ethernet switch will NOT turn a packet around on the same
interface it was received, so that is why you must have them on different
subnets and have a router in between.
For sending directly to yourself, something like the 'veth' driver
is probably more useful.
>
>> For the macvlan hash you just use an upper byte. Is that just a
>> simple starting place, or do we not need a more complex hash.
>>
>
> It comes from the original code, I think it should be good enough.
Ahhh, I knew my hash was lame for some reason!
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: [RFC NET 00/02]: Secondary unicast address support
2007-06-22 3:30 ` Ben Greear
@ 2007-06-22 4:30 ` Eric W. Biederman
2007-06-22 12:08 ` Ben Greear
0 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2007-06-22 4:30 UTC (permalink / raw)
To: Ben Greear; +Cc: Patrick McHardy, netdev, shemminger, davem, jeff
Ben Greear <greearb@candelatech.com> writes:
> Patrick McHardy wrote:
>> Eric W. Biederman wrote:
>
>>> For the macvlan code do we need to do anything special if we transmit
>>> to a mac we would normally receive? Another unicast mac of the same
>>> nic for example.
>>
>> That doesn't happen under normal circumstances. I don't believe
>> it would work.
>
> Assuming you mean you want to send between two mac-vlans on the same physical
> nic...
>
> This can work if your mac-vlans are on different subnets and you are
> routing between them (and if you have my send-to-self patch or have
> another way to let a system send packets to itself).
Ok. I didn't know if you could trigger this case without
having the endpoints in separate namespaces. I was suspecting
the routing code would realize what we were doing, see that the
route is local, and route through lo.
> A normal ethernet switch will NOT turn a packet around on the same
> interface it was received, so that is why you must have them on different
> subnets and have a router in between.
Yes. That is essentially the configuration I was wondering about.
> For sending directly to yourself, something like the 'veth' driver
> is probably more useful.
True. And I think it has a place. However the common case with
the tunnel devices is to just hook them all up to an ethernet
bridge as well as a real ethernet device.
The far ends of the ethernet tunnels are dropped into different namespaces.
Which gets a very similar effect to the mac vlan code.
I'm just wondering if I can avoid setting up an ethernet tunnel device
when my primary purpose is to talk to the outside world, but I occasionally
want a little in-the-box traffic.
Eric
* Re: [RFC NET 00/02]: Secondary unicast address support
2007-06-22 3:21 ` Ben Greear
@ 2007-06-22 11:36 ` Patrick McHardy
0 siblings, 0 replies; 15+ messages in thread
From: Patrick McHardy @ 2007-06-22 11:36 UTC (permalink / raw)
To: Ben Greear; +Cc: Eric W. Biederman, netdev, shemminger, davem, jeff
Ben Greear wrote:
> Patrick McHardy wrote:
>
>> Eric W. Biederman wrote:
>>
>>> For the macvlan hash you just use an upper byte. Is that just a
>>> simple starting place, or do we not need a more complex hash.
>>>
>>
>>
>> That gave me an idea, since the default addresses are random
>> anyway I'm now using an incrementing counter for the upper byte.
>
>
> Is there not a (relatively) easy way to hash the entire 6 bytes?
>
> I'd prefer to be able to set the MACs to anything I want, without
> worrying about trivially hitting a worst-case hash scenario.
That would only happen if all your addresses have the same high
byte. I can't see a reason why you would want to do this; even
with manually configured addresses it's still reasonable to
expect a uniform distribution.
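The worst case being discussed can be made concrete with a small sketch. It counts the longest hash chain produced when only one octet of each address selects the bucket. The names `longest_chain` and `HASH_OCTET` are hypothetical, and the choice of octet index is an assumption for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN    6
#define NBUCKETS   256
#define HASH_OCTET 5            /* assumption: the single octet used as key */

/*
 * Return the length of the longest chain when n addresses are placed
 * in a table indexed by one octet only. If every address shares that
 * octet, the result is n: lookups degrade to a linear list walk.
 */
static size_t longest_chain(const uint8_t addrs[][MAC_LEN], size_t n)
{
    size_t count[NBUCKETS];
    size_t max = 0;

    memset(count, 0, sizeof(count));
    for (size_t i = 0; i < n; i++) {
        size_t c = ++count[addrs[i][HASH_OCTET]];

        if (c > max)
            max = c;
    }
    return max;
}
```

Addresses that differ in the keyed octet give a longest chain of 1, while addresses that differ only in other octets all pile into one bucket.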
* Re: [RFC NET 00/02]: Secondary unicast address support
2007-06-22 4:30 ` Eric W. Biederman
@ 2007-06-22 12:08 ` Ben Greear
0 siblings, 0 replies; 15+ messages in thread
From: Ben Greear @ 2007-06-22 12:08 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Patrick McHardy, netdev, shemminger, davem, jeff
Eric W. Biederman wrote:
> Ben Greear <greearb@candelatech.com> writes:
>
>> Patrick McHardy wrote:
>>> Eric W. Biederman wrote:
>>>> For the macvlan code do we need to do anything special if we transmit
>>>> to a mac we would normally receive? Another unicast mac of the same
>>>> nic for example.
>>> That doesn't happen under normal circumstances. I don't believe
>>> it would work.
>> Assuming you mean you want to send between two mac-vlans on the same physical
>> nic...
>>
>> This can work if your mac-vlans are on different subnets and you are
>> routing between them (and if you have my send-to-self patch or have
>> another way to let a system send packets to itself).
>
> Ok. I didn't know if you could trigger this case without
> having the endpoints in separate namespaces. I was suspecting
> the routing code would realize what we were doing, see that the
> route is local, and route through lo.
The routing code will short-circuit by default. It takes quite
a bit of effort to make them _not_ short-circuit; that is what I
was talking about. Mac-vlans will be just like any
other ethernet nics as far as routing goes.
>
>> A normal ethernet switch will NOT turn a packet around on the same
>> interface it was received, so that is why you must have them on different
>> subnets and have a router in between.
>
> Yes. That is essentially the configuration I was wondering about.
>
>> For sending directly to yourself, something like the 'veth' driver
>> is probably more useful.
>
> True. And I think it has a place. However the common case with
> the tunnel devices is to just hook them all up to an ethernet
> bridge as well as a real ethernet device.
>
> The far ends of the ethernet tunnels are dropped into different namespaces.
>
> Which gets a very similar effect to the mac vlan code.
>
> I'm just wondering if I can avoid setting up an ethernet tunnel device
> when my primary purpose is to talk to the outside world, but I occasionally
> want a little in-the-box traffic.
mac-vlans should work on veth devices just fine, and the veths will also
short-circuit route (at least if they are in the same namespace).
I'm not sure I understand what you are trying to do, but in general
both veth and mac-vlans should act like ethernet nics. So if you can
find some case where that does _not_ hold, please let us know.
Thanks,
Ben
>
> Eric
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
end of thread, other threads:[~2007-06-22 12:13 UTC | newest]
Thread overview: 15+ messages
2007-06-20 18:00 [RFC NET 00/02]: Secondary unicast address support Patrick McHardy
2007-06-20 18:00 ` [RFC NET 01/02]: " Patrick McHardy
2007-06-20 18:00 ` [RFC E1000 02/02]: " Patrick McHardy
2007-06-21 19:08 ` [RFC NET 00/02]: " Eric W. Biederman
2007-06-21 19:13 ` David Miller
2007-06-21 21:11 ` Caitlin Bestler
2007-06-21 19:13 ` Patrick McHardy
2007-06-21 20:31 ` Eric W. Biederman
2007-06-22 0:08 ` Patrick McHardy
2007-06-22 3:30 ` Ben Greear
2007-06-22 4:30 ` Eric W. Biederman
2007-06-22 12:08 ` Ben Greear
2007-06-22 1:56 ` Patrick McHardy
2007-06-22 3:21 ` Ben Greear
2007-06-22 11:36 ` Patrick McHardy