[net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode

kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
@ 2011-11-09  7:55 Roopa Prabhu
  2011-11-09  7:55 ` [net-next-2.6 PATCH 1/6 v4] rtnetlink: Netlink interface for setting MAC and VLAN filters Roopa Prabhu
                   ` (6 more replies)
  0 siblings, 7 replies; 36+ messages in thread
From: Roopa Prabhu @ 2011-11-09  7:55 UTC (permalink / raw)
  To: netdev, davem
  Cc: chrisw, sri, dragos.tatulea, kvm, arnd, mst, gregory.v.rose,
	mchan, dwang2, shemminger, eric.dumazet, kaber, benve

v3 -> v4
- Removed RFC in subject-prefix
- Regenerated patches over latest net-next
(no code changes)

Thanks to Greg Rose <gregory.v.rose@intel.com> for evaluating v3

v2 -> v3
- Moved set and get filter ops from rtnl_link_ops to netdev_ops
- Support for SRIOV VFs.
	[Note: The get filters msg (in the way current get rtnetlink handles
	it) might get too big for SRIOV vfs. This patch follows existing sriov 
	vf get code and tries to accomodate filters for all VF's in a PF. 
        And for the SRIOV case I have only tested the fact that the VF 
	arguments are getting delivered to rtnetlink correctly. The code
	follows existing sriov vf handling code so rest of it should work fine]
- Fixed all op and netlink attribute names to start with IFLA_RX_FILTER
- Changed macvlan filter ops to call corresponding lowerdev op if lowerdev 
  supports it for passthru mode. Else it falls back on macvlan handling the 
  filters locally as in v1 and v2

v1 -> v2
- Instead of TUNSETTXFILTER introduced rtnetlink interface for the same


Background and details:
=======================
Today macvtap used in virtualized environment does not have support to 
propagate MAC, VLAN and interface flags from guest to lowerdev.
Which means to be able to register additional VLANs, unicast and multicast
addresses or change pkt filter flags in the guest, the lowerdev has to be
put in promisocous mode. Today the only macvlan mode that supports this is 
the PASSTHRU mode and it puts the lower dev in promiscous mode.

PASSTHRU mode was added primarily for the SRIOV usecase. In PASSTHRU mode 
there is a 1-1 mapping between macvtap and physical NIC or VF.

There are two problems with putting the lowerdev in promiscous mode (ie SRIOV 
VF's):
	- Some SRIOV cards dont support promiscous mode today (Thread on Intel
	driver indicates that http://lists.openwall.net/netdev/2011/09/27/6)
	- For the SRIOV NICs that support it, Putting the lowerdev in 
	promiscous mode leads to additional traffic being sent up to the 
	guest virtio-net to filter result in extra overheads.
	
Both the above problems can be solved by offloading filtering to the 
lowerdev hw. ie lowerdev does not need to be in promiscous mode as 
long as the guest filters are passed down to the lowerdev. 

This patch basically adds the infrastructure to set and get MAC and VLAN 
filters on an interface via rtnetlink. It adds new netlink msg and netdev
ops for the same. And implements these ops in macvlan for passthru mode.

- Netlink interface:
    This patch provides the following netlink interface to set mac and vlan
    filters :

    Interface to set RX filter on a SRIOV VF:
    [IFLA_VF_RX_FILTERS] = {
    	[IFLA_VF_RX_FILTER] = {
    		[IFLA_RX_FILTER_VF]
    		[IFLA_RX_FILTER_ADDR] = {
    			[IFLA_RX_FILTER_ADDR_FLAGS]
    			[IFLA_RX_FILTER_ADDR_UC_LIST] = {
    				[IFLA_ADDR_LIST_ENTRY]
    			}
    			[IFLA_RX_FILTER_ADDR_MC_LIST] = {
    				[IFLA_ADDR_LIST_ENTRY]
    			}
    		}
    		[IFLA_RX_FILTER_VLAN] = {
    			[IFLA_RX_FILTER_VLAN_BITMAP]
    		}
    	}
    	...
    }
    
    Interface to set RX filter on a any network interface.:
    [IFLA_RX_FILTER] = {
    	[IFLA_RX_FILTER_VF]
    	[IFLA_RX_FILTER_ADDR] = {
    		[IFLA_RX_FILTER_ADDR_FLAGS]
    		[IFLA_RX_FILTER_ADDR_UC_LIST] = {
    			[IFLA_ADDR_LIST_ENTRY]
    		}
    		[IFLA_RX_FILTER_ADDR_MC_LIST] = {
    			[IFLA_ADDR_LIST_ENTRY]
    		}
    	}
    	[IFLA_RX_FILTER_VLAN] = {
    		[IFLA_RX_FILTER_VLAN_BITMAP]
	}
    } 

    Note1: The IFLA_RX_FILTER_VLAN is a nested attribute, but contains only 
    IFLA_RX_FILTER_VLAN_BITMAP today. The idea is that the IFLA_RX_FILTER_VLAN 
    can be extended tomorrow to have a vlan list if some implementations 
    prefer a list instead. 

    And it provides the following netdev_ops to set/get MAC/VLAN filters:

    int                     (*ndo_set_rx_filter_addr)(
	                                        struct net_device *dev, int vf,
                                                struct nlattr *tb[]);
    int                     (*ndo_set_rx_filter_vlan)(
                                                struct net_device *dev, int vf,
                                                struct nlattr *tb[]);
    size_t                  (*ndo_get_rx_filter_addr_size)(
                                                const struct net_device *dev,
                                                int vf);
    size_t                  (*ndo_get_rx_filter_vlan_size)(
                                                const struct net_device *dev,
                                                int vf);
    int                     (*ndo_get_rx_filter_addr)(
                                                const struct net_device *dev,
                                                int vf, struct sk_buff *skb);
    int                     (*ndo_get_rx_filter_vlan)(
                                                const struct net_device *dev,
                                                int vf, struct sk_buff *skb);

Some answers to questions that were raised during the review:
- Protection against address spoofing:
	- This patch adds filtering support only for macvtap PASSTHRU 
	Mode. PASSTHRU mode is used mainly with SRIOV VF's. And SRIOV VF's 
	come with anti mac/vlan spoofing support in the lowerdev driver. 
	(netdev infrastructure to support this was added recently 
	with IFLA_VF_SPOOFCHK). For 802.1Qbh devices, the port profile has a 
	knob to enable/disable anti spoof check. Lowerdevice drivers also 
	enforce limits on the number of address registrations allowed. 
	For non-SRIOV VF's its the responsibility of the lowerdev driver
	to implement any such protection. The currrent netdev hooks for 
	SRIOV VF's spoof check could be extended to accomodate any network 
	interface in the future.

- Support for multiqueue devices: Enable filtering on individual queues (?):
	As i understand after the thread between (Micheal and Greg),
	VMdq Linux implementation is not in yet and dont know how its going to
	take shape. But Intel VMdq devices do accept filters on a per-queue
	basis. Since the netdev infrastructure for VMdq is not in yet, Its
	hard to say how this patch can support it.

	This patch makes use of current netdev infrastructure for setting
	address and vlan filters. And if that changes for vmdq tomorrow,
	then the work that this patch represents can be modified to accomodate
	vmdq devices at that time. 

	So i dont see a huge problem with this patch coming in the way for
	vmdq devices.

- Support for non-PASSTHRU mode:
	I started implementing this. But there are a couple of problems.	
	- Today, in non-PASSTHRU cases macvlan_handle_frame assumes that 
	every macvlan device has a single unique mac.
	And the macvlans are hashed on that single mac address. 
	To support filtering for non-PASSTHRU mode in addition to this 
	patch the following needs to be done:
		- non-passthru mode with a single macvlan over a lower dev
		can be treated as PASSTHRU case
		- For non-PASSTHRU mode with multiple macvlans over a single 
		lower dev:  
			- Multiple unicast mac's now need to be hashed to the 
			same macvlan device. The macvlan hash needs to change 
			for lookup based on any one of the multiple unicast 
			addresses a macvlan is interested in
			- We need to consider vlans during the lookup too
			- So the macvlan device hash needs to hash on both mac 
			and vlan
		- But the support for filtering in non-PASSTHRU mode can be 
		built on this patch

This patch series implements the following 
01/6 rtnetlink: Netlink interface for setting MAC and VLAN filters
02/6 netdev: Add netdev_ops to set and get MAC/VLAN rx filters
03/6 rtnetlink: Add support to set MAC/VLAN filters
04/6 rtnetlink: Add support to get MAC/VLAN filters
05/6 macvlan: Add support to set MAC/VLAN filter netdev ops
06/6 macvlan: Add support to get MAC/VLAN filter netdev ops

Please comment. Thanks.

Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [net-next-2.6 PATCH 1/6 v4] rtnetlink: Netlink interface for setting MAC and VLAN filters
  2011-11-09  7:55 [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode Roopa Prabhu
@ 2011-11-09  7:55 ` Roopa Prabhu
  2011-11-18  0:17   ` Ben Hutchings
  2011-11-09  7:55 ` [net-next-2.6 PATCH 2/6 v4] net: Add netdev_ops to set and get MAC/VLAN rx filters Roopa Prabhu
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Roopa Prabhu @ 2011-11-09  7:55 UTC (permalink / raw)
  To: netdev, davem
  Cc: chrisw, sri, dragos.tatulea, kvm, arnd, mst, gregory.v.rose,
	mchan, dwang2, shemminger, eric.dumazet, kaber, benve

From: Roopa Prabhu <roprabhu@cisco.com>

This patch introduces the following netlink interface to set
MAC and VLAN filters on an network interface. It can be used to
set RX filter on any network interface (if supported by the driver) and
also on a SRIOV VF via its PF

Interface to set RX filter on a SRIOV VF
[IFLA_VF_RX_FILTERS] = {
	[IFLA_VF_RX_FILTER] = {
		[IFLA_RX_FILTER_VF]
		[IFLA_RX_FILTER_ADDR] = {
			[IFLA_RX_FILTER_ADDR_FLAGS]
			[IFLA_RX_FILTER_ADDR_UC_LIST] = {
				[IFLA_ADDR_LIST_ENTRY]
			}
			[IFLA_RX_FILTER_ADDR_MC_LIST] = {
				[IFLA_ADDR_LIST_ENTRY]
			}
		}
		[IFLA_RX_FILTER_VLAN] = {
			[IFLA_RX_FILTER_VLAN_BITMAP]
		}
	}
	...
}

Interface to set RX filter on any network interface.:
[IFLA_RX_FILTER] = {
	[IFLA_RX_FILTER_VF]
	[IFLA_RX_FILTER_ADDR] = {
		[IFLA_RX_FILTER_ADDR_FLAGS]
		[IFLA_RX_FILTER_ADDR_UC_LIST] = {
			[IFLA_ADDR_LIST_ENTRY]
		}
		[IFLA_RX_FILTER_ADDR_MC_LIST] = {
			[IFLA_ADDR_LIST_ENTRY]
		}
	}
	[IFLA_RX_FILTER_VLAN] = {
		[IFLA_RX_FILTER_VLAN_BITMAP]
	}
}

Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
---
 include/linux/if_link.h |   61 +++++++++++++++++++++++++++++++++++++++++++++++
 net/core/rtnetlink.c    |   20 +++++++++++++++
 2 files changed, 81 insertions(+), 0 deletions(-)


diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index c52d4b5..74a9f17 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -137,6 +137,8 @@ enum {
 	IFLA_AF_SPEC,
 	IFLA_GROUP,		/* Group the device belongs to */
 	IFLA_NET_NS_FD,
+	IFLA_VF_RX_FILTERS,
+	IFLA_RX_FILTER,
 	__IFLA_MAX
 };
 
@@ -390,4 +392,63 @@ struct ifla_port_vsi {
 	__u8 pad[3];
 };
 
+/* VF rx filters management section
+ *
+ *	Nested layout of set/get msg is:
+ *
+ *	[IFLA_VF_RX_FILTERS]
+ *		[IFLA_VF_RX_FILTER]
+ *			[IFLA_RX_FILTER_*], ...
+ *		[IFLA_VF_RX_FILTER]
+ *			[IFLA_RX_FILTER_*], ...
+ *		...
+ *	[IFLA_RX_FILTER]
+ *		[IFLA_RX_FILTER_*], ...
+ */
+enum {
+	IFLA_VF_RX_FILTER_UNSPEC,
+	IFLA_VF_RX_FILTER,			/* nest */
+	__IFLA_VF_RX_FILTER_MAX,
+};
+
+#define IFLA_VF_RX_FILTER_MAX (__IFLA_VF_RX_FILTER_MAX - 1)
+
+enum {
+	IFLA_RX_FILTER_UNSPEC,
+	IFLA_RX_FILTER_VF,		/* __u32 */
+	IFLA_RX_FILTER_ADDR,
+	IFLA_RX_FILTER_VLAN,
+	__IFLA_RX_FILTER_MAX,
+};
+#define IFLA_RX_FILTER_MAX (__IFLA_RX_FILTER_MAX - 1)
+
+enum {
+	IFLA_RX_FILTER_ADDR_UNSPEC,
+	IFLA_RX_FILTER_ADDR_FLAGS,
+	IFLA_RX_FILTER_ADDR_UC_LIST,
+	IFLA_RX_FILTER_ADDR_MC_LIST,
+	__IFLA_RX_FILTER_ADDR_MAX,
+};
+#define IFLA_RX_FILTER_ADDR_MAX (__IFLA_RX_FILTER_ADDR_MAX - 1)
+
+#define RX_FILTER_FLAGS (IFF_UP | IFF_BROADCAST | IFF_MULTICAST | \
+				IFF_PROMISC | IFF_ALLMULTI)
+
+enum {
+	IFLA_ADDR_LIST_UNSPEC,
+	IFLA_ADDR_LIST_ENTRY,
+	__IFLA_ADDR_LIST_MAX,
+};
+#define IFLA_ADDR_LIST_MAX (__IFLA_ADDR_LIST_MAX - 1)
+
+enum {
+	IFLA_RX_FILTER_VLAN_UNSPEC,
+	IFLA_RX_FILTER_VLAN_BITMAP,
+	__IFLA_RX_FILTER_VLAN_MAX,
+};
+#define IFLA_RX_FILTER_VLAN_MAX (__IFLA_RX_FILTER_VLAN_MAX - 1)
+
+#define VLAN_BITMAP_SPLIT_MAX 8
+#define VLAN_BITMAP_SIZE	(VLAN_N_VID/VLAN_BITMAP_SPLIT_MAX)
+
 #endif /* _LINUX_IF_LINK_H */
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 9083e82..9eead8e 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -42,6 +42,7 @@
 
 #include <linux/inet.h>
 #include <linux/netdevice.h>
+#include <linux/if_vlan.h>
 #include <net/ip.h>
 #include <net/protocol.h>
 #include <net/arp.h>
@@ -1097,6 +1098,8 @@ const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_VF_PORTS]		= { .type = NLA_NESTED },
 	[IFLA_PORT_SELF]	= { .type = NLA_NESTED },
 	[IFLA_AF_SPEC]		= { .type = NLA_NESTED },
+	[IFLA_VF_RX_FILTERS]	= { .type = NLA_NESTED },
+	[IFLA_RX_FILTER]	= { .type = NLA_NESTED },
 };
 EXPORT_SYMBOL(ifla_policy);
 
@@ -1132,6 +1135,23 @@ static const struct nla_policy ifla_port_policy[IFLA_PORT_MAX+1] = {
 	[IFLA_PORT_RESPONSE]	= { .type = NLA_U16, },
 };
 
+static const struct nla_policy ifla_rx_filter_policy[IFLA_RX_FILTER_MAX+1] = {
+	[IFLA_RX_FILTER_VF]	= { .type = NLA_U32 },
+	[IFLA_RX_FILTER_ADDR]	= { .type = NLA_NESTED },
+	[IFLA_RX_FILTER_VLAN]	= { .type = NLA_NESTED },
+};
+
+static const struct nla_policy ifla_addr_filter_policy[IFLA_RX_FILTER_ADDR_MAX+1] = {
+	[IFLA_RX_FILTER_ADDR_FLAGS]	= { .type = NLA_U32 },
+	[IFLA_RX_FILTER_ADDR_UC_LIST]	= { .type = NLA_NESTED },
+	[IFLA_RX_FILTER_ADDR_MC_LIST]	= { .type = NLA_NESTED },
+};
+
+static const struct nla_policy ifla_vlan_filter_policy[IFLA_RX_FILTER_VLAN_MAX+1] = {
+	[IFLA_RX_FILTER_VLAN_BITMAP]	= { .type = NLA_BINARY,
+					    .len = VLAN_BITMAP_SIZE },
+};
+
 struct net *rtnl_link_get_net(struct net *src_net, struct nlattr *tb[])
 {
 	struct net *net;

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [net-next-2.6 PATCH 2/6 v4] net: Add netdev_ops to set and get MAC/VLAN rx filters
  2011-11-09  7:55 [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode Roopa Prabhu
  2011-11-09  7:55 ` [net-next-2.6 PATCH 1/6 v4] rtnetlink: Netlink interface for setting MAC and VLAN filters Roopa Prabhu
@ 2011-11-09  7:55 ` Roopa Prabhu
  2011-11-09  7:55 ` [net-next-2.6 PATCH 3/6 v4] rtnetlink: Add support to set MAC/VLAN filters Roopa Prabhu
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Roopa Prabhu @ 2011-11-09  7:55 UTC (permalink / raw)
  To: netdev, davem
  Cc: chrisw, sri, dragos.tatulea, kvm, arnd, mst, gregory.v.rose,
	mchan, dwang2, shemminger, eric.dumazet, kaber, benve

From: Roopa Prabhu <roprabhu@cisco.com>

This patch adds the following netdev_ops to set and get MAC/VLAN
filters on a SRIOV VF or any netdev interface. Each op takes a vf argument.
vf value of SELF_VF or -1 is for applying the operation directly on the
interface.

ndo_set_rx_filter_addr - to set address filter
ndo_get_rx_filter_addr_size - to get address filter size
ndo_get_rx_filter_addr - To get addr filter

ndo_set_rx_filter_vlan - to set vlan filter
ndo_get_rx_filter_vlan_size - to get vlan filter size
ndo_get_rx_filter_vlan - To get vlan filter

Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
---
 include/linux/netdevice.h |   32 ++++++++++++++++++++++++++++++++
 1 files changed, 32 insertions(+), 0 deletions(-)


diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cbeb586..3cbd700 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -855,6 +855,20 @@ struct netdev_tc_txq {
  *	feature set might be less than what was returned by ndo_fix_features()).
  *	Must return >0 or -errno if it changed dev->features itself.
  *
+ * Address Filter management functions:
+ * int (*ndo_set_rx_filter_addr)(struct net_device *dev, int vf,
+ *				 struct nlattr *tb[]);
+ * size_t (*ndo_get_rx_filter_addr_size)(const struct net_device *dev, int vf);
+ * int (*ndo_get_rx_filter_addr)(const struct net_device *dev, int vf,
+ *				 struct sk_buff *skb);
+ *
+ * Vlan Filter management functions:
+ * int (*ndo_set_rx_filter_vlan)(struct net_device *dev, int vf,
+ *				 struct nlattr *tb[]);
+ * size_t (*ndo_get_rx_filter_vlan_size)(const struct net_device *dev, int vf);
+ * int (*ndo_get_rx_filter_vlan)(const struct net_device *dev, int vf,
+ *				 struct sk_buff *skb);
+ *
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -948,6 +962,24 @@ struct net_device_ops {
 						    u32 features);
 	int			(*ndo_set_features)(struct net_device *dev,
 						    u32 features);
+	int			(*ndo_set_rx_filter_addr)(
+						struct net_device *dev, int vf,
+						struct nlattr *tb[]);
+	size_t			(*ndo_get_rx_filter_addr_size)(
+						const struct net_device *dev,
+						int vf);
+	int			(*ndo_get_rx_filter_addr)(
+						const struct net_device *dev,
+						int vf, struct sk_buff *skb);
+	int			(*ndo_set_rx_filter_vlan)(
+						struct net_device *dev, int vf,
+						struct nlattr *tb[]);
+	size_t			(*ndo_get_rx_filter_vlan_size)(
+						const struct net_device *dev,
+						int vf);
+	int			(*ndo_get_rx_filter_vlan)(
+						const struct net_device *dev,
+						int vf, struct sk_buff *skb);
 };
 
 /*

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [net-next-2.6 PATCH 3/6 v4] rtnetlink: Add support to set MAC/VLAN filters
  2011-11-09  7:55 [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode Roopa Prabhu
  2011-11-09  7:55 ` [net-next-2.6 PATCH 1/6 v4] rtnetlink: Netlink interface for setting MAC and VLAN filters Roopa Prabhu
  2011-11-09  7:55 ` [net-next-2.6 PATCH 2/6 v4] net: Add netdev_ops to set and get MAC/VLAN rx filters Roopa Prabhu
@ 2011-11-09  7:55 ` Roopa Prabhu
  2011-11-09  7:55 ` [net-next-2.6 PATCH 4/6 v4] rtnetlink: Add support to get " Roopa Prabhu
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Roopa Prabhu @ 2011-11-09  7:55 UTC (permalink / raw)
  To: netdev, davem
  Cc: chrisw, sri, dragos.tatulea, kvm, arnd, mst, gregory.v.rose,
	mchan, dwang2, shemminger, eric.dumazet, kaber, benve

From: Roopa Prabhu <roprabhu@cisco.com>

This patch adds support in rtnetlink for IFLA_RX_FILTER and
IFLA_VF_RX_FILTERS set. It calls netdev_ops->set_rx_filter_addr and
rtnl_link_ops->set_rx_filter_vlan

Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
---
 include/linux/if_link.h |    2 +
 net/core/rtnetlink.c    |  101 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 103 insertions(+), 0 deletions(-)


diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 74a9f17..a8c2c14 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -268,6 +268,8 @@ enum macvlan_mode {
 
 /* SR-IOV virtual function management section */
 
+#define SELF_VF		-1
+
 enum {
 	IFLA_VF_INFO_UNSPEC,
 	IFLA_VF_INFO,
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 9eead8e..a042910 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1294,6 +1294,66 @@ static int do_set_master(struct net_device *dev, int ifindex)
 	return 0;
 }
 
+static int do_set_rx_filter(struct net_device *dev, int vf,
+			    struct nlattr *rx_filter[],
+			    int *modified)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	int err = 0;
+
+	if (rx_filter[IFLA_RX_FILTER_ADDR]) {
+		struct nlattr *addr_filter[IFLA_RX_FILTER_ADDR_MAX+1];
+
+		if (!ops->ndo_set_rx_filter_addr) {
+			err = -EOPNOTSUPP;
+			goto errout;
+		}
+
+		err = nla_parse_nested(addr_filter, IFLA_RX_FILTER_ADDR_MAX,
+				rx_filter[IFLA_RX_FILTER_ADDR],
+				ifla_addr_filter_policy);
+		if (err < 0)
+			goto errout;
+
+		if (addr_filter[IFLA_RX_FILTER_ADDR_FLAGS]) {
+			unsigned int flags = nla_get_u32(
+					addr_filter[IFLA_RX_FILTER_ADDR_FLAGS]);
+			if (flags & ~RX_FILTER_FLAGS) {
+				err = -EINVAL;
+				goto errout;
+			}
+		}
+
+		err = ops->ndo_set_rx_filter_addr(dev, vf, addr_filter);
+		if (err < 0)
+			goto errout;
+		*modified = 1;
+	}
+
+	if (rx_filter[IFLA_RX_FILTER_VLAN]) {
+		struct nlattr *vlan_filter[IFLA_RX_FILTER_VLAN_MAX+1];
+
+		if (!ops->ndo_set_rx_filter_vlan) {
+			err = -EOPNOTSUPP;
+			goto errout;
+		}
+
+		err = nla_parse_nested(vlan_filter, IFLA_RX_FILTER_VLAN_MAX,
+				rx_filter[IFLA_RX_FILTER_VLAN],
+				ifla_vlan_filter_policy);
+		if (err < 0)
+			goto errout;
+
+		err = ops->ndo_set_rx_filter_vlan(dev, vf, vlan_filter);
+		if (err < 0)
+			goto errout;
+		*modified = 1;
+	}
+
+errout:
+	return err;
+}
+
 static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
 		      struct nlattr **tb, char *ifname, int modified)
 {
@@ -1515,6 +1575,47 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
 			modified = 1;
 		}
 	}
+
+	if (tb[IFLA_VF_RX_FILTERS]) {
+		struct nlattr *vf_rx_filter[IFLA_RX_FILTER_MAX+1];
+		struct nlattr *attr;
+		int vf;
+		int rem;
+
+		nla_for_each_nested(attr, tb[IFLA_VF_RX_FILTERS], rem) {
+			if (nla_type(attr) != IFLA_VF_RX_FILTER)
+				continue;
+			err = nla_parse_nested(vf_rx_filter, IFLA_RX_FILTER_MAX,
+					attr, ifla_rx_filter_policy);
+			if (err < 0)
+				goto errout;
+
+			if (!vf_rx_filter[IFLA_RX_FILTER_VF]) {
+				err = -EOPNOTSUPP;
+				goto errout;
+			}
+			vf = nla_get_u32(vf_rx_filter[IFLA_RX_FILTER_VF]);
+
+			err = do_set_rx_filter(dev, vf, vf_rx_filter,
+					 &modified);
+			if (err < 0)
+				goto errout;
+		}
+	}
+
+	if (tb[IFLA_RX_FILTER]) {
+		struct nlattr *rx_filter[IFLA_RX_FILTER_MAX+1];
+
+		err = nla_parse_nested(rx_filter, IFLA_RX_FILTER_MAX,
+				tb[IFLA_RX_FILTER], ifla_rx_filter_policy);
+		if (err < 0)
+			goto errout;
+
+		err = do_set_rx_filter(dev, SELF_VF, rx_filter, &modified);
+		if (err < 0)
+			goto errout;
+	}
+
 	err = 0;
 
 errout:


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [net-next-2.6 PATCH 4/6 v4] rtnetlink: Add support to get MAC/VLAN filters
  2011-11-09  7:55 [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode Roopa Prabhu
                   ` (2 preceding siblings ...)
  2011-11-09  7:55 ` [net-next-2.6 PATCH 3/6 v4] rtnetlink: Add support to set MAC/VLAN filters Roopa Prabhu
@ 2011-11-09  7:55 ` Roopa Prabhu
  2011-11-09  7:56 ` [net-next-2.6 PATCH 5/6 v4] macvlan: Add support to for netdev ops to set " Roopa Prabhu
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Roopa Prabhu @ 2011-11-09  7:55 UTC (permalink / raw)
  To: netdev, davem
  Cc: chrisw, sri, dragos.tatulea, kvm, arnd, mst, gregory.v.rose,
	mchan, dwang2, shemminger, eric.dumazet, kaber, benve

From: Roopa Prabhu <roprabhu@cisco.com>

This patch adds support in rtnetlink for IFLA_RX_VF_FILTERS and
IFLA_RX_FILTER get. It gets the size of the filters using
netdev_ops->get_rx_filter_addr_size and netdev_ops->get_rx_filter_vlan_size
and uses netdev_ops->get_rx_filter_addr and netdev_ops->get_rx_filter_vlan.
In case of IFLA_RX_VF_FILTERS it loops through all vf's to get the filter
data

Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
---
 net/core/rtnetlink.c |  159 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 158 insertions(+), 1 deletions(-)


diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a042910..ea861b4 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -475,6 +475,62 @@ static size_t rtnl_link_get_af_size(const struct net_device *dev)
 	return size;
 }
 
+static size_t rtnl_vf_rx_filter_size(const struct net_device *dev, int vf)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	size_t size;
+
+	/* IFLA_RX_FILTER  or IFLA_VF_RX_FILTER */
+	size = nla_total_size(sizeof(struct nlattr));
+
+	if (vf != SELF_VF)
+		size = nla_total_size(4); /* IFLA_RX_FILTER_VF */
+
+	if (ops->ndo_get_rx_filter_addr_size) {
+		size_t rx_filter_addr_size =
+				ops->ndo_get_rx_filter_addr_size(dev, vf);
+
+		if (rx_filter_addr_size)
+			/* IFLA_RX_FILTER_ADDR */
+			size += nla_total_size(sizeof(struct nlattr)) +
+					rx_filter_addr_size;
+	}
+
+	if (ops->ndo_get_rx_filter_vlan_size) {
+		size_t rx_filter_vlan_size =
+				ops->ndo_get_rx_filter_vlan_size(dev, vf);
+
+		if (rx_filter_vlan_size)
+			/* IFLA_RX_FILTER_VLAN */
+			size += nla_total_size(sizeof(struct nlattr)) +
+					rx_filter_vlan_size;
+	}
+
+	return size;
+}
+
+static size_t rtnl_rx_filter_size(const struct net_device *dev)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	int vf = SELF_VF;
+	size_t size;
+
+	if (!ops->ndo_get_rx_filter_addr_size &&
+	    !ops->ndo_get_rx_filter_vlan_size)
+		return 0;
+
+	size = rtnl_vf_rx_filter_size(dev, vf); /* SELF_VF */
+
+	if (dev->dev.parent && dev_num_vf(dev->dev.parent)) {
+		/* IFLA_VF_RX_FILTERS */
+		size = nla_total_size(sizeof(struct nlattr));
+		for (vf = 0; vf < dev_num_vf(dev->dev.parent); vf++)
+			size += rtnl_vf_rx_filter_size(dev, vf);
+	}
+
+	return size;
+}
+
 static int rtnl_link_fill(struct sk_buff *skb, const struct net_device *dev)
 {
 	const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
@@ -513,6 +569,102 @@ out:
 	return err;
 }
 
+static int rtnl_vf_rx_filter_fill(struct sk_buff *skb,
+				  const struct net_device *dev, int vf)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	struct nlattr *addr_filter = NULL, *vlan_filter = NULL;
+	struct nlattr *rx_filter;
+	int err = -EMSGSIZE;
+	int filter_attrtype =
+		(vf == SELF_VF ? IFLA_RX_FILTER : IFLA_VF_RX_FILTER);
+
+	rx_filter = nla_nest_start(skb, filter_attrtype);
+	if (rx_filter == NULL)
+		goto nla_put_failure;
+
+	if (vf != SELF_VF)
+		NLA_PUT_U32(skb, IFLA_RX_FILTER_VF, vf);
+
+	if (ops->ndo_get_rx_filter_addr) {
+		addr_filter = nla_nest_start(skb, IFLA_RX_FILTER_ADDR);
+		if (addr_filter == NULL)
+			goto err_cancel_rx_filter;
+		err = ops->ndo_get_rx_filter_addr(dev, vf, skb);
+		if (err == -ENODATA)
+			nla_nest_cancel(skb, addr_filter);
+		else if (err < 0)
+			goto err_cancel_addr_filter;
+		else
+			nla_nest_end(skb, addr_filter);
+	}
+
+	if (ops->ndo_get_rx_filter_vlan) {
+		vlan_filter = nla_nest_start(skb, IFLA_RX_FILTER_VLAN);
+		if (vlan_filter == NULL)
+			goto err_cancel_addr_filter;
+		err = ops->ndo_get_rx_filter_vlan(dev, vf, skb);
+		if (err == -ENODATA)
+			nla_nest_cancel(skb, vlan_filter);
+		else if (err)
+			goto err_cancel_vlan_filter;
+		else
+			nla_nest_end(skb, vlan_filter);
+	}
+	nla_nest_end(skb, rx_filter);
+
+	return 0;
+
+err_cancel_vlan_filter:
+	if (vlan_filter)
+		nla_nest_cancel(skb, vlan_filter);
+err_cancel_addr_filter:
+	if (addr_filter)
+		nla_nest_cancel(skb, addr_filter);
+err_cancel_rx_filter:
+	nla_nest_cancel(skb, rx_filter);
+nla_put_failure:
+	return err;
+}
+
+static int rtnl_rx_filter_fill(struct sk_buff *skb,
+			       const struct net_device *dev)
+{
+	struct nlattr *vf_rx_filters = NULL;
+	int vf = SELF_VF;
+	int err;
+
+	if (!dev->netdev_ops->ndo_get_rx_filter_addr &&
+	    !dev->netdev_ops->ndo_get_rx_filter_vlan)
+		return 0;
+
+	err = rtnl_vf_rx_filter_fill(skb, dev, vf); /* SELF_VF */
+	if (err)
+		return err;
+
+	if (dev->dev.parent && dev_num_vf(dev->dev.parent)) {
+		vf_rx_filters = nla_nest_start(skb, IFLA_VF_RX_FILTERS);
+		if (!vf_rx_filters)
+			return -EMSGSIZE;
+
+		for (vf = 0; vf < dev_num_vf(dev->dev.parent); vf++) {
+			err = rtnl_vf_rx_filter_fill(skb, dev, vf);
+			if (err == -EMSGSIZE)
+				goto err_cancel_nest_vf_rx_filters;
+		}
+
+		nla_nest_end(skb, vf_rx_filters);
+	}
+
+	return 0;
+
+err_cancel_nest_vf_rx_filters:
+	if (vf_rx_filters)
+		nla_nest_cancel(skb, vf_rx_filters);
+
+	return err;
+}
+
 static const int rtm_min[RTM_NR_FAMILIES] =
 {
 	[RTM_FAM(RTM_NEWLINK)]      = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
@@ -786,7 +938,9 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev)
 	       + rtnl_vfinfo_size(dev) /* IFLA_VFINFO_LIST */
 	       + rtnl_port_size(dev) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
 	       + rtnl_link_get_size(dev) /* IFLA_LINKINFO */
-	       + rtnl_link_get_af_size(dev); /* IFLA_AF_SPEC */
+	       + rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */
+		/* IFLA_VF_RX_FILTERS + IFLA_RX_FILTER */
+	       + rtnl_rx_filter_size(dev);
 }
 
 static int rtnl_vf_ports_fill(struct sk_buff *skb, struct net_device *dev)
@@ -996,6 +1150,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 	if (rtnl_port_fill(skb, dev))
 		goto nla_put_failure;
 
+	if (rtnl_rx_filter_fill(skb, dev) < 0)
+		goto nla_put_failure;
+
 	if (dev->rtnl_link_ops) {
 		if (rtnl_link_fill(skb, dev) < 0)
 			goto nla_put_failure;


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [net-next-2.6 PATCH 5/6 v4] macvlan: Add support to for netdev ops to set MAC/VLAN filters
  2011-11-09  7:55 [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode Roopa Prabhu
                   ` (3 preceding siblings ...)
  2011-11-09  7:55 ` [net-next-2.6 PATCH 4/6 v4] rtnetlink: Add support to get " Roopa Prabhu
@ 2011-11-09  7:56 ` Roopa Prabhu
  2011-11-09  7:56 ` [net-next-2.6 PATCH 6/6 v4] macvlan: Add support to get MAC/VLAN filter netdev ops Roopa Prabhu
  2011-11-18  0:15 ` [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode Ben Hutchings
  6 siblings, 0 replies; 36+ messages in thread
From: Roopa Prabhu @ 2011-11-09  7:56 UTC (permalink / raw)
  To: netdev, davem
  Cc: chrisw, sri, dragos.tatulea, kvm, arnd, mst, gregory.v.rose,
	mchan, dwang2, shemminger, eric.dumazet, kaber, benve

From: Roopa Prabhu <roprabhu@cisco.com>

This patch adds support for MAC and VLAN filter netdev ops
on a macvlan interface. It adds support for set_rx_filter_addr and
set_rx_filter_vlan netdev operations. It currently supports only macvlan
PASSTHRU mode. And removes the code that puts the lowerdev in promiscous mode.

For passthru mode,
	For both Address and vlan filters set, lowerdev
	netdev_ops->set_rx_filter_addr and netdev_ops->set_rx_filter_vlan
	are called if the lowerdev supports these ops.

	Else parse the filter data and update the lowerdev filters:
	 - Address filters: macvlan netdev uc and mc lists and flags are
	updated to reflect the addresses and address filter flags that came
	in the filter. Which inturn results in calls to macvlan_set_rx_mode and
	macvlan_change_rx_flags. These functions pass the filter addresses
	and flags to lowerdev netdev. And the lowerdev driver will pass it
	to the hw.

	- VLAN filter: Currently applied vlan bitmap is cached in
	struct macvlan_dev->vlan_filter. This vlan bitmap is updated to
	reflect the new bitmap that came in the netlink vlan filter msg.
	macvlan_vlan_rx_add_vid and macvlan_vlan_rx_kill_vid are called
	to update the vlan ids on the macvlan netdev, which in turn results in
	passing the vlan ids to the lowerdev using netdev_ops
	ndo_vlan_rx_add_vid and ndo_vlan_rx_kill_vid


Note: In future if most lowerdev drivers find use for these ops and start
supporting them, we could remove the local handling of filters for passthru
mode in macvlan

Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
---
 drivers/net/macvlan.c      |  331 ++++++++++++++++++++++++++++++++++++++++----
 include/linux/if_macvlan.h |    2 
 2 files changed, 300 insertions(+), 33 deletions(-)


diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 7413497..c2dea97 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -309,30 +309,37 @@ static int macvlan_open(struct net_device *dev)
 	struct net_device *lowerdev = vlan->lowerdev;
 	int err;
 
-	if (vlan->port->passthru) {
-		dev_set_promiscuity(lowerdev, 1);
-		goto hash_add;
-	}
+	if (!vlan->port->passthru) {
+		err = -EBUSY;
+		if (macvlan_addr_busy(vlan->port, dev->dev_addr))
+			goto out;
 
-	err = -EBUSY;
-	if (macvlan_addr_busy(vlan->port, dev->dev_addr))
-		goto out;
+		err = dev_uc_add(lowerdev, dev->dev_addr);
+		if (err < 0)
+			goto out;
+	}
 
-	err = dev_uc_add(lowerdev, dev->dev_addr);
-	if (err < 0)
-		goto out;
 	if (dev->flags & IFF_ALLMULTI) {
 		err = dev_set_allmulti(lowerdev, 1);
 		if (err < 0)
 			goto del_unicast;
 	}
 
-hash_add:
+	if (dev->flags & IFF_PROMISC) {
+		err = dev_set_promiscuity(lowerdev, 1);
+		if (err < 0)
+			goto unset_allmulti;
+	}
+
 	macvlan_hash_add(vlan);
 	return 0;
 
+unset_allmulti:
+	dev_set_allmulti(lowerdev, -1);
+
 del_unicast:
-	dev_uc_del(lowerdev, dev->dev_addr);
+	if (!vlan->port->passthru)
+		dev_uc_del(lowerdev, dev->dev_addr);
 out:
 	return err;
 }
@@ -342,18 +349,16 @@ static int macvlan_stop(struct net_device *dev)
 	struct macvlan_dev *vlan = netdev_priv(dev);
 	struct net_device *lowerdev = vlan->lowerdev;
 
-	if (vlan->port->passthru) {
-		dev_set_promiscuity(lowerdev, -1);
-		goto hash_del;
-	}
-
+	dev_uc_unsync(lowerdev, dev);
 	dev_mc_unsync(lowerdev, dev);
 	if (dev->flags & IFF_ALLMULTI)
 		dev_set_allmulti(lowerdev, -1);
+	if (dev->flags & IFF_PROMISC)
+		dev_set_promiscuity(lowerdev, -1);
 
-	dev_uc_del(lowerdev, dev->dev_addr);
+	if (!vlan->port->passthru)
+		dev_uc_del(lowerdev, dev->dev_addr);
 
-hash_del:
 	macvlan_hash_del(vlan, !dev->dismantle);
 	return 0;
 }
@@ -394,12 +399,16 @@ static void macvlan_change_rx_flags(struct net_device *dev, int change)
 
 	if (change & IFF_ALLMULTI)
 		dev_set_allmulti(lowerdev, dev->flags & IFF_ALLMULTI ? 1 : -1);
+	if (change & IFF_PROMISC)
+		dev_set_promiscuity(lowerdev,
+			dev->flags & IFF_PROMISC ? 1 : -1);
 }
 
-static void macvlan_set_multicast_list(struct net_device *dev)
+static void macvlan_set_rx_mode(struct net_device *dev)
 {
 	struct macvlan_dev *vlan = netdev_priv(dev);
 
+	dev_uc_sync(vlan->lowerdev, dev);
 	dev_mc_sync(vlan->lowerdev, dev);
 }
 
@@ -542,6 +551,257 @@ static void macvlan_vlan_rx_kill_vid(struct net_device *dev,
 		ops->ndo_vlan_rx_kill_vid(lowerdev, vid);
 }
 
+static inline void macvlan_set_filter_vlan(struct net_device *dev, int vid)
+{
+	struct macvlan_dev *vlan = netdev_priv(dev);
+
+	set_bit(vid, vlan->vlan_filter);
+	macvlan_vlan_rx_add_vid(dev, vid);
+}
+
+static inline void macvlan_clear_filter_vlan(struct net_device *dev, int vid)
+{
+	struct macvlan_dev *vlan = netdev_priv(dev);
+
+	clear_bit(vid, vlan->vlan_filter);
+	macvlan_vlan_rx_kill_vid(dev, vid);
+}
+
+static int macvlan_set_rx_filter_vlan_passthru(struct net_device *dev, int vf,
+					       struct nlattr *tb[])
+{
+	struct macvlan_dev *vlan = netdev_priv(dev);
+	struct net_device *lowerdev = vlan->lowerdev;
+	const struct net_device_ops *ops = lowerdev->netdev_ops;
+	unsigned long *vlans;
+	u16 vid;
+
+	if (ops->ndo_set_rx_filter_vlan)
+		return ops->ndo_set_rx_filter_vlan(dev, vf, tb);
+
+	if (!tb[IFLA_RX_FILTER_VLAN_BITMAP])
+		return -EINVAL;
+
+	vlans = nla_data(tb[IFLA_RX_FILTER_VLAN_BITMAP]);
+
+	/*
+	 *	Clear vlans that are not present in the new filter
+	 */
+	for_each_set_bit(vid, vlan->vlan_filter, VLAN_N_VID) {
+		if (!test_bit(vid, vlans))
+			macvlan_clear_filter_vlan(dev, vid);
+	}
+
+	/*
+	 *	Set new vlans that came in the filter
+	 */
+	for_each_set_bit(vid, vlans, VLAN_N_VID) {
+		if (!test_bit(vid, vlan->vlan_filter))
+			macvlan_set_filter_vlan(dev, vid);
+	}
+
+	return 0;
+}
+
+static int macvlan_set_rx_filter_vlan(struct net_device *dev, int vf,
+				      struct nlattr *tb[])
+{
+	struct macvlan_dev *vlan = netdev_priv(dev);
+	int err;
+
+	if (vf != SELF_VF)
+		return -EINVAL;
+
+	switch (vlan->mode) {
+	case MACVLAN_MODE_PASSTHRU:
+		return macvlan_set_rx_filter_vlan_passthru(dev, vf, tb);
+		break;
+	default:
+		err = -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+static int macvlan_addr_in_hw_list(struct netdev_hw_addr_list *list,
+				   u8 *addr, int addrlen)
+{
+	struct netdev_hw_addr *ha;
+
+	netdev_hw_addr_list_for_each(ha, list) {
+		if (!memcmp(ha->addr, addr, addrlen))
+			return 1;
+	}
+
+	return 0;
+}
+
+static int macvlan_addr_in_attrs(struct nlattr *addr_list, u8 *addr,
+				 int addrlen)
+{
+	struct nlattr *addr_attr;
+	int addr_rem;
+
+	nla_for_each_nested(addr_attr, addr_list, addr_rem) {
+		if (!memcmp(nla_data(addr_attr), addr, addrlen))
+			return 1;
+	}
+
+	return 0;
+}
+
+static int macvlan_update_hw_addr_list(struct net_device *dev,
+				struct netdev_hw_addr_list *curr_addr_list,
+				int addr_list_type,
+				struct nlattr *new_addr_attrs)
+{
+	struct nlattr *addr_attr;
+	int addr_rem;
+	u8 *addr;
+	int alen, i;
+	int err = 0;
+
+	if (!netdev_hw_addr_list_empty(curr_addr_list)) {
+		struct netdev_hw_addr *ha;
+		u8 *del_addrlist;
+		int del_addr_count = 0;
+
+		alen = ETH_ALEN * netdev_hw_addr_list_count(curr_addr_list);
+		del_addrlist = kmalloc(alen, GFP_KERNEL);
+		if (!del_addrlist) {
+			err = -ENOMEM;
+			goto err_out;
+		}
+
+		/*
+		 *	Get the addresses that need to be deleted
+		 */
+		netdev_hw_addr_list_for_each(ha, curr_addr_list) {
+			if (!macvlan_addr_in_attrs(new_addr_attrs, ha->addr,
+				ETH_ALEN))
+				memcpy(del_addrlist + (del_addr_count++ *
+					ETH_ALEN), ha->addr, ETH_ALEN);
+		}
+
+		/*
+		 * Delete addresses
+		 */
+		for (i = 0, addr = del_addrlist; i < del_addr_count && addr;
+			i++, addr += ETH_ALEN) {
+			if (addr_list_type == NETDEV_HW_ADDR_T_UNICAST)
+				dev_uc_del(dev, addr);
+			else if (addr_list_type == NETDEV_HW_ADDR_T_MULTICAST)
+				dev_mc_del(dev, addr);
+		}
+		kfree(del_addrlist);
+	}
+
+	/* Add new addresses */
+	nla_for_each_nested(addr_attr, new_addr_attrs, addr_rem) {
+		if (!macvlan_addr_in_hw_list(curr_addr_list,
+			nla_data(addr_attr), ETH_ALEN)) {
+			if (addr_list_type == NETDEV_HW_ADDR_T_UNICAST)
+				dev_uc_add(dev, nla_data(addr_attr));
+			else if (addr_list_type == NETDEV_HW_ADDR_T_MULTICAST)
+				dev_mc_add(dev, nla_data(addr_attr));
+		}
+	}
+
+	return 0;
+
+err_out:
+	return err;
+}
+
+static int macvlan_set_rx_filter_addr_passthru(struct net_device *dev,
+					       int vf, struct nlattr *tb[])
+{
+	struct macvlan_dev *vlan = netdev_priv(dev);
+	struct net_device *lowerdev = vlan->lowerdev;
+	const struct net_device_ops *ops = lowerdev->netdev_ops;
+	unsigned int flags, flags_changed;
+	int err;
+
+	if (ops->ndo_set_rx_filter_addr)
+		return ops->ndo_set_rx_filter_addr(vlan->lowerdev, vf, tb);
+
+	if (tb[IFLA_RX_FILTER_ADDR_FLAGS]) {
+		flags = nla_get_u32(tb[IFLA_RX_FILTER_ADDR_FLAGS]);
+
+		flags_changed = (dev->flags ^ flags) & RX_FILTER_FLAGS;
+		if (flags_changed)
+			dev_change_flags(dev, dev->flags ^ flags_changed);
+	}
+
+	if (tb[IFLA_RX_FILTER_ADDR_UC_LIST]) {
+		err = macvlan_update_hw_addr_list(dev, &dev->uc,
+				NETDEV_HW_ADDR_T_UNICAST,
+				tb[IFLA_RX_FILTER_ADDR_UC_LIST]);
+		if (err)
+			return err;
+	}
+
+	if (tb[IFLA_RX_FILTER_ADDR_MC_LIST]) {
+		err = macvlan_update_hw_addr_list(dev, &dev->mc,
+				NETDEV_HW_ADDR_T_MULTICAST,
+				tb[IFLA_RX_FILTER_ADDR_MC_LIST]);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int macvlan_validate_rx_filter_addr(struct net_device *dev, int vf,
+					   struct nlattr *tb[])
+{
+	struct nlattr *addr_attr;
+	int addr_rem;
+
+	if (vf != SELF_VF)
+		return -EINVAL;
+
+	if (tb[IFLA_RX_FILTER_ADDR_UC_LIST]) {
+		nla_for_each_nested(addr_attr, tb[IFLA_RX_FILTER_ADDR_UC_LIST],
+				    addr_rem) {
+			if ((nla_type(addr_attr) != IFLA_ADDR_LIST_ENTRY) ||
+				!is_unicast_ether_addr(nla_data(addr_attr)))
+				return -EINVAL;
+		}
+	}
+
+	if (tb[IFLA_RX_FILTER_ADDR_MC_LIST]) {
+		nla_for_each_nested(addr_attr, tb[IFLA_RX_FILTER_ADDR_MC_LIST],
+				    addr_rem) {
+			if ((nla_type(addr_attr) != IFLA_ADDR_LIST_ENTRY) ||
+				!is_multicast_ether_addr(nla_data(addr_attr)))
+				return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+static int macvlan_set_rx_filter_addr(struct net_device *dev, int vf,
+				      struct nlattr *tb[])
+{
+	struct macvlan_dev *vlan = netdev_priv(dev);
+	int err;
+
+	err = macvlan_validate_rx_filter_addr(dev, vf, tb);
+	if (err)
+		return err;
+
+	switch (vlan->mode) {
+	case MACVLAN_MODE_PASSTHRU:
+		return macvlan_set_rx_filter_addr_passthru(dev, vf, tb);
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
 static void macvlan_ethtool_get_drvinfo(struct net_device *dev,
 					struct ethtool_drvinfo *drvinfo)
 {
@@ -564,19 +824,21 @@ static const struct ethtool_ops macvlan_ethtool_ops = {
 };
 
 static const struct net_device_ops macvlan_netdev_ops = {
-	.ndo_init		= macvlan_init,
-	.ndo_uninit		= macvlan_uninit,
-	.ndo_open		= macvlan_open,
-	.ndo_stop		= macvlan_stop,
-	.ndo_start_xmit		= macvlan_start_xmit,
-	.ndo_change_mtu		= macvlan_change_mtu,
-	.ndo_change_rx_flags	= macvlan_change_rx_flags,
-	.ndo_set_mac_address	= macvlan_set_mac_address,
-	.ndo_set_rx_mode	= macvlan_set_multicast_list,
-	.ndo_get_stats64	= macvlan_dev_get_stats64,
-	.ndo_validate_addr	= eth_validate_addr,
-	.ndo_vlan_rx_add_vid	= macvlan_vlan_rx_add_vid,
-	.ndo_vlan_rx_kill_vid	= macvlan_vlan_rx_kill_vid,
+	.ndo_init			= macvlan_init,
+	.ndo_uninit			= macvlan_uninit,
+	.ndo_open			= macvlan_open,
+	.ndo_stop			= macvlan_stop,
+	.ndo_start_xmit			= macvlan_start_xmit,
+	.ndo_change_mtu			= macvlan_change_mtu,
+	.ndo_change_rx_flags		= macvlan_change_rx_flags,
+	.ndo_set_mac_address		= macvlan_set_mac_address,
+	.ndo_set_rx_mode		= macvlan_set_rx_mode,
+	.ndo_get_stats64		= macvlan_dev_get_stats64,
+	.ndo_validate_addr		= eth_validate_addr,
+	.ndo_vlan_rx_add_vid		= macvlan_vlan_rx_add_vid,
+	.ndo_vlan_rx_kill_vid		= macvlan_vlan_rx_kill_vid,
+	.ndo_set_rx_filter_addr		= macvlan_set_rx_filter_addr,
+	.ndo_set_rx_filter_vlan		= macvlan_set_rx_filter_vlan,
 };
 
 void macvlan_common_setup(struct net_device *dev)
@@ -584,6 +846,7 @@ void macvlan_common_setup(struct net_device *dev)
 	ether_setup(dev);
 
 	dev->priv_flags	       &= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING);
+	dev->priv_flags	       |= IFF_UNICAST_FLT;
 	dev->netdev_ops		= &macvlan_netdev_ops;
 	dev->destructor		= free_netdev;
 	dev->header_ops		= &macvlan_hard_header_ops,
@@ -711,6 +974,8 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 	if (data && data[IFLA_MACVLAN_MODE])
 		vlan->mode = nla_get_u32(data[IFLA_MACVLAN_MODE]);
 
+	memset(vlan->vlan_filter, 0, VLAN_BITMAP_SIZE);
+
 	if (vlan->mode == MACVLAN_MODE_PASSTHRU) {
 		if (port->count)
 			return -EINVAL;
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index d103dca..c0d84a5 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -7,6 +7,7 @@
 #include <linux/netlink.h>
 #include <net/netlink.h>
 #include <linux/u64_stats_sync.h>
+#include <linux/if_vlan.h>
 
 #if defined(CONFIG_MACVTAP) || defined(CONFIG_MACVTAP_MODULE)
 struct socket *macvtap_get_socket(struct file *);
@@ -65,6 +66,7 @@ struct macvlan_dev {
 	struct macvtap_queue	*taps[MAX_MACVTAP_QUEUES];
 	int			numvtaps;
 	int			minor;
+	unsigned long		vlan_filter[BITS_TO_LONGS(VLAN_N_VID)];
 };
 
 static inline void macvlan_count_rx(const struct macvlan_dev *vlan,

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [net-next-2.6 PATCH 6/6 v4] macvlan: Add support to get MAC/VLAN filter netdev ops
  2011-11-09  7:55 [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode Roopa Prabhu
                   ` (4 preceding siblings ...)
  2011-11-09  7:56 ` [net-next-2.6 PATCH 5/6 v4] macvlan: Add support to for netdev ops to set " Roopa Prabhu
@ 2011-11-09  7:56 ` Roopa Prabhu
  2011-11-18  0:15 ` [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode Ben Hutchings
  6 siblings, 0 replies; 36+ messages in thread
From: Roopa Prabhu @ 2011-11-09  7:56 UTC (permalink / raw)
  To: netdev, davem
  Cc: chrisw, sri, dragos.tatulea, kvm, arnd, mst, gregory.v.rose,
	mchan, dwang2, shemminger, eric.dumazet, kaber, benve

From: Roopa Prabhu <roprabhu@cisco.com>

This patch adds support to get MAC and VLAN filter netdev ops
on a macvlan interface. It adds support for get_rx_filter_addr_size,
get_rx_filter_vlan_size, fill_rx_filter_addr and fill_rx_filter_vlan
netdev ops

Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
---
 drivers/net/macvlan.c |  158 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 158 insertions(+), 0 deletions(-)


diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index c2dea97..8a5320b 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -623,6 +623,55 @@ static int macvlan_set_rx_filter_vlan(struct net_device *dev, int vf,
 	return 0;
 }
 
+static size_t macvlan_get_rx_filter_vlan_size(const struct net_device *dev,
+					      int vf)
+{
+	struct macvlan_dev *vlan = netdev_priv(dev);
+	struct net_device *lowerdev = vlan->lowerdev;
+	const struct net_device_ops *ops = lowerdev->netdev_ops;
+
+	if (vf != SELF_VF)
+		return -EINVAL;
+
+	switch (vlan->mode) {
+	case MACVLAN_MODE_PASSTHRU:
+		if (ops->ndo_get_rx_filter_vlan_size)
+			return ops->ndo_get_rx_filter_vlan_size(dev, vf);
+		/* IFLA_RX_FILTER_VLAN_BITMAP */
+		return nla_total_size(VLAN_BITMAP_SIZE);
+	default:
+		return 0;
+	}
+}
+
+static int macvlan_get_rx_filter_vlan(const struct net_device *dev, int vf,
+				      struct sk_buff *skb)
+{
+	struct macvlan_dev *vlan = netdev_priv(dev);
+	struct net_device *lowerdev = vlan->lowerdev;
+	const struct net_device_ops *ops = lowerdev->netdev_ops;
+
+	if (vf != SELF_VF)
+		return -EINVAL;
+
+	switch (vlan->mode) {
+	case MACVLAN_MODE_PASSTHRU:
+		if (ops->ndo_get_rx_filter_vlan)
+			return ops->ndo_get_rx_filter_vlan(dev, vf, skb);
+
+		NLA_PUT(skb, IFLA_RX_FILTER_VLAN_BITMAP, VLAN_BITMAP_SIZE,
+			vlan->vlan_filter);
+		break;
+	default:
+		return -ENODATA; /* No data to Fill */
+	}
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
 static int macvlan_addr_in_hw_list(struct netdev_hw_addr_list *list,
 				   u8 *addr, int addrlen)
 {
@@ -802,6 +851,111 @@ static int macvlan_set_rx_filter_addr(struct net_device *dev, int vf,
 	return 0;
 }
 
+static size_t macvlan_get_rx_filter_addr_passthru_size(
+			const struct net_device *dev, int vf)
+{
+	struct macvlan_dev *vlan = netdev_priv(dev);
+	struct net_device *lowerdev = vlan->lowerdev;
+	const struct net_device_ops *ops = lowerdev->netdev_ops;
+	size_t size;
+
+	if (ops->ndo_get_rx_filter_addr_size)
+		return ops->ndo_get_rx_filter_addr_size(dev, vf);
+
+	/* IFLA_RX_FILTER_ADDR_FLAGS */
+	size = nla_total_size(sizeof(u32));
+
+	if (netdev_uc_count(dev))
+		/* IFLA_RX_FILTER_ADDR_UC_LIST */
+		size += nla_total_size(netdev_uc_count(dev) *
+				       ETH_ALEN * sizeof(struct nlattr));
+
+	if (netdev_mc_count(dev))
+		/* IFLA_RX_FILTER_ADDR_MC_LIST */
+		size += nla_total_size(netdev_mc_count(dev) *
+				       ETH_ALEN * sizeof(struct nlattr));
+
+	return size;
+}
+
+static size_t macvlan_get_rx_filter_addr_size(const struct net_device *dev,
+					      int vf)
+{
+	struct macvlan_dev *vlan = netdev_priv(dev);
+
+	if (vf != SELF_VF)
+		return -EINVAL;
+
+	switch (vlan->mode) {
+	case MACVLAN_MODE_PASSTHRU:
+		return macvlan_get_rx_filter_addr_passthru_size(dev, vf);
+	default:
+		return 0;
+	}
+}
+
+static int macvlan_get_rx_filter_addr_passthru(const struct net_device *dev,
+					       int vf, struct sk_buff *skb)
+{
+	struct macvlan_dev *vlan = netdev_priv(dev);
+	struct net_device *lowerdev = vlan->lowerdev;
+	const struct net_device_ops *ops = lowerdev->netdev_ops;
+	struct nlattr *uninitialized_var(uc_list), *mc_list;
+	struct netdev_hw_addr *ha;
+
+	if (ops->ndo_get_rx_filter_addr)
+		return ops->ndo_get_rx_filter_addr(dev, vf, skb);
+
+	NLA_PUT_U32(skb, IFLA_RX_FILTER_ADDR_FLAGS,
+		dev->flags & RX_FILTER_FLAGS);
+
+	if (netdev_uc_count(dev)) {
+		uc_list = nla_nest_start(skb, IFLA_RX_FILTER_ADDR_UC_LIST);
+		if (uc_list == NULL)
+			goto nla_put_failure;
+
+		netdev_for_each_uc_addr(ha, dev) {
+			NLA_PUT(skb, IFLA_ADDR_LIST_ENTRY, ETH_ALEN, ha->addr);
+		}
+		nla_nest_end(skb, uc_list);
+	}
+
+	if (netdev_mc_count(dev)) {
+		mc_list = nla_nest_start(skb, IFLA_RX_FILTER_ADDR_MC_LIST);
+		if (mc_list == NULL)
+			goto nla_uc_list_cancel;
+
+		netdev_for_each_mc_addr(ha, dev) {
+			NLA_PUT(skb, IFLA_ADDR_LIST_ENTRY, ETH_ALEN, ha->addr);
+		}
+		nla_nest_end(skb, mc_list);
+	}
+
+	return 0;
+
+nla_uc_list_cancel:
+	if (netdev_uc_count(dev))
+		nla_nest_cancel(skb, uc_list);
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
+static int macvlan_get_rx_filter_addr(const struct net_device *dev, int vf,
+				      struct sk_buff *skb)
+{
+	struct macvlan_dev *vlan = netdev_priv(dev);
+
+	if (vf != SELF_VF)
+		return -EINVAL;
+
+	switch (vlan->mode) {
+	case MACVLAN_MODE_PASSTHRU:
+		return macvlan_get_rx_filter_addr_passthru(dev, vf, skb);
+	default:
+		return -ENODATA; /* No data to Fill */
+	}
+}
+
 static void macvlan_ethtool_get_drvinfo(struct net_device *dev,
 					struct ethtool_drvinfo *drvinfo)
 {
@@ -838,7 +992,11 @@ static const struct net_device_ops macvlan_netdev_ops = {
 	.ndo_vlan_rx_add_vid		= macvlan_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid		= macvlan_vlan_rx_kill_vid,
 	.ndo_set_rx_filter_addr		= macvlan_set_rx_filter_addr,
+	.ndo_get_rx_filter_addr_size	= macvlan_get_rx_filter_addr_size,
+	.ndo_get_rx_filter_addr		= macvlan_get_rx_filter_addr,
 	.ndo_set_rx_filter_vlan		= macvlan_set_rx_filter_vlan,
+	.ndo_get_rx_filter_vlan_size	= macvlan_get_rx_filter_vlan_size,
+	.ndo_get_rx_filter_vlan		= macvlan_get_rx_filter_vlan,
 };
 
 void macvlan_common_setup(struct net_device *dev)


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2011-11-09  7:55 [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode Roopa Prabhu
                   ` (5 preceding siblings ...)
  2011-11-09  7:56 ` [net-next-2.6 PATCH 6/6 v4] macvlan: Add support to get MAC/VLAN filter netdev ops Roopa Prabhu
@ 2011-11-18  0:15 ` Ben Hutchings
  2011-11-18  0:32   ` Greg Rose
  2011-11-20 16:30   ` Roopa Prabhu
  6 siblings, 2 replies; 36+ messages in thread
From: Ben Hutchings @ 2011-11-18  0:15 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: netdev, davem, chrisw, sri, dragos.tatulea, kvm, arnd, mst,
	gregory.v.rose, mchan, dwang2, shemminger, eric.dumazet, kaber,
	benve

Sorry to come to this rather late.

On Tue, 2011-11-08 at 23:55 -0800, Roopa Prabhu wrote:
[...]
> v2 -> v3
> - Moved set and get filter ops from rtnl_link_ops to netdev_ops
> - Support for SRIOV VFs.
>         [Note: The get filters msg (in the way current get rtnetlink handles
>         it) might get too big for SRIOV vfs. This patch follows existing sriov 
>         vf get code and tries to accomodate filters for all VF's in a PF. 
>         And for the SRIOV case I have only tested the fact that the VF 
>         arguments are getting delivered to rtnetlink correctly. The code
>         follows existing sriov vf handling code so rest of it should work fine]
[...]

This is already broken for large numbers of VFs, and increasing the
amount of information per VF is going to make the situation worse.  I am
no netlink expert but I think that the current approach of bundling all
information about an interface in a single message may not be
sustainable.

Also, I'm unclear on why this interface is to be used to set filtering
for the (PF) net device as well as for related VFs.  Doesn't that
duplicate the functionality of ndo_set_rx_mode and
ndo_vlan_rx_{add,kill}_vid?

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 1/6 v4] rtnetlink: Netlink interface for setting MAC and VLAN filters
  2011-11-09  7:55 ` [net-next-2.6 PATCH 1/6 v4] rtnetlink: Netlink interface for setting MAC and VLAN filters Roopa Prabhu
@ 2011-11-18  0:17   ` Ben Hutchings
  0 siblings, 0 replies; 36+ messages in thread
From: Ben Hutchings @ 2011-11-18  0:17 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: netdev, davem, chrisw, sri, dragos.tatulea, kvm, arnd, mst,
	gregory.v.rose, mchan, dwang2, shemminger, eric.dumazet, kaber,
	benve

On Tue, 2011-11-08 at 23:55 -0800, Roopa Prabhu wrote:
> From: Roopa Prabhu <roprabhu@cisco.com>
> 
> This patch introduces the following netlink interface to set
> MAC and VLAN filters on an network interface. It can be used to
> set RX filter on any network interface (if supported by the driver) and
> also on a SRIOV VF via its PF
>
> Interface to set RX filter on a SRIOV VF
> [IFLA_VF_RX_FILTERS] = {
> 	[IFLA_VF_RX_FILTER] = {
> 		[IFLA_RX_FILTER_VF]
> 		[IFLA_RX_FILTER_ADDR] = {
> 			[IFLA_RX_FILTER_ADDR_FLAGS]
> 			[IFLA_RX_FILTER_ADDR_UC_LIST] = {
> 				[IFLA_ADDR_LIST_ENTRY]
> 			}
> 			[IFLA_RX_FILTER_ADDR_MC_LIST] = {
> 				[IFLA_ADDR_LIST_ENTRY]
> 			}
> 		}
> 		[IFLA_RX_FILTER_VLAN] = {
> 			[IFLA_RX_FILTER_VLAN_BITMAP]
> 		}
> 	}
> 	...
> }
[...]

Please put the details of both syntax *and semantics* in if_link.h.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2011-11-18  0:15 ` [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode Ben Hutchings
@ 2011-11-18  0:32   ` Greg Rose
  2011-11-18  0:44     ` Ben Hutchings
  2011-11-20 16:30   ` Roopa Prabhu
  1 sibling, 1 reply; 36+ messages in thread
From: Greg Rose @ 2011-11-18  0:32 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Roopa Prabhu, netdev@vger.kernel.org, davem@davemloft.net,
	chrisw@redhat.com, sri@us.ibm.com, dragos.tatulea@gmail.com,
	kvm@vger.kernel.org, arnd@arndb.de, mst@redhat.com,
	mchan@broadcom.com, dwang2@cisco.com, shemminger@vyatta.com,
	eric.dumazet@gmail.com, kaber@trash.net, benve@cisco.com


On 11/17/2011 4:15 PM, Ben Hutchings wrote:
> Sorry to come to this rather late.
>
> On Tue, 2011-11-08 at 23:55 -0800, Roopa Prabhu wrote:
> [...]
>> v2 ->  v3
>> - Moved set and get filter ops from rtnl_link_ops to netdev_ops
>> - Support for SRIOV VFs.
>>          [Note: The get filters msg (in the way current get rtnetlink handles
>>          it) might get too big for SRIOV vfs. This patch follows existing sriov
>>          vf get code and tries to accomodate filters for all VF's in a PF.
>>          And for the SRIOV case I have only tested the fact that the VF
>>          arguments are getting delivered to rtnetlink correctly. The code
>>          follows existing sriov vf handling code so rest of it should work fine]
> [...]
>
> This is already broken for large numbers of VFs, and increasing the
> amount of information per VF is going to make the situation worse.  I am
> no netlink expert but I think that the current approach of bundling all
> information about an interface in a single message may not be
> sustainable.
>
> Also, I'm unclear on why this interface is to be used to set filtering
> for the (PF) net device as well as for related VFs.  Doesn't that
> duplicate the functionality of ndo_set_rx_mode and
> ndo_vlan_rx_{add,kill}_vid?

Functionally yes but contextually no.  This allows the PF driver to know 
that it is setting these filters in the context of the existence of VFs, 
allowing it to take appropriate action.  The other two functions may be 
called without the presence of SR-IOV enablement and the existence of VFs.

Anyway, that's why I asked Roopa to add that capability.

- Greg

>
> Ben.
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2011-11-18  0:32   ` Greg Rose
@ 2011-11-18  0:44     ` Ben Hutchings
  2011-11-18 16:58       ` Greg Rose
  0 siblings, 1 reply; 36+ messages in thread
From: Ben Hutchings @ 2011-11-18  0:44 UTC (permalink / raw)
  To: Greg Rose
  Cc: Roopa Prabhu, netdev@vger.kernel.org, davem@davemloft.net,
	chrisw@redhat.com, sri@us.ibm.com, dragos.tatulea@gmail.com,
	kvm@vger.kernel.org, arnd@arndb.de, mst@redhat.com,
	mchan@broadcom.com, dwang2@cisco.com, shemminger@vyatta.com,
	eric.dumazet@gmail.com, kaber@trash.net, benve@cisco.com

On Thu, 2011-11-17 at 16:32 -0800, Greg Rose wrote:
> On 11/17/2011 4:15 PM, Ben Hutchings wrote:
> > Sorry to come to this rather late.
> >
> > On Tue, 2011-11-08 at 23:55 -0800, Roopa Prabhu wrote:
> > [...]
> >> v2 ->  v3
> >> - Moved set and get filter ops from rtnl_link_ops to netdev_ops
> >> - Support for SRIOV VFs.
> >>          [Note: The get filters msg (in the way current get rtnetlink handles
> >>          it) might get too big for SRIOV vfs. This patch follows existing sriov
> >>          vf get code and tries to accomodate filters for all VF's in a PF.
> >>          And for the SRIOV case I have only tested the fact that the VF
> >>          arguments are getting delivered to rtnetlink correctly. The code
> >>          follows existing sriov vf handling code so rest of it should work fine]
> > [...]
> >
> > This is already broken for large numbers of VFs, and increasing the
> > amount of information per VF is going to make the situation worse.  I am
> > no netlink expert but I think that the current approach of bundling all
> > information about an interface in a single message may not be
> > sustainable.
> >
> > Also, I'm unclear on why this interface is to be used to set filtering
> > for the (PF) net device as well as for related VFs.  Doesn't that
> > duplicate the functionality of ndo_set_rx_mode and
> > ndo_vlan_rx_{add,kill}_vid?
> 
> Functionally yes but contextually no.  This allows the PF driver to know 
> that it is setting these filters in the context of the existence of VFs, 
> allowing it to take appropriate action.  The other two functions may be 
> called without the presence of SR-IOV enablement and the existence of VFs.
> 
> Anyway, that's why I asked Roopa to add that capability.

I don't follow.  The PF driver already knows whether it has enabled VFs.

How do filters set this way interact with filters set through the
existing operations?  Should they override promiscuous mode?  None of
this has been specified.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2011-11-18  0:44     ` Ben Hutchings
@ 2011-11-18 16:58       ` Greg Rose
  2011-11-18 17:40         ` Ben Hutchings
  0 siblings, 1 reply; 36+ messages in thread
From: Greg Rose @ 2011-11-18 16:58 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Roopa Prabhu, netdev@vger.kernel.org, davem@davemloft.net,
	chrisw@redhat.com, sri@us.ibm.com, dragos.tatulea@gmail.com,
	kvm@vger.kernel.org, arnd@arndb.de, mst@redhat.com,
	mchan@broadcom.com, dwang2@cisco.com, shemminger@vyatta.com,
	eric.dumazet@gmail.com, kaber@trash.net, benve@cisco.com


On 11/17/2011 4:44 PM, Ben Hutchings wrote:
> On Thu, 2011-11-17 at 16:32 -0800, Greg Rose wrote:
>> On 11/17/2011 4:15 PM, Ben Hutchings wrote:
>>> Sorry to come to this rather late.
>>>
>>> On Tue, 2011-11-08 at 23:55 -0800, Roopa Prabhu wrote:
>>> [...]
>>>> v2 ->   v3
>>>> - Moved set and get filter ops from rtnl_link_ops to netdev_ops
>>>> - Support for SRIOV VFs.
>>>>           [Note: The get filters msg (in the way current get rtnetlink handles
>>>>           it) might get too big for SRIOV vfs. This patch follows existing sriov
>>>>           vf get code and tries to accomodate filters for all VF's in a PF.
>>>>           And for the SRIOV case I have only tested the fact that the VF
>>>>           arguments are getting delivered to rtnetlink correctly. The code
>>>>           follows existing sriov vf handling code so rest of it should work fine]
>>> [...]
>>>
>>> This is already broken for large numbers of VFs, and increasing the
>>> amount of information per VF is going to make the situation worse.  I am
>>> no netlink expert but I think that the current approach of bundling all
>>> information about an interface in a single message may not be
>>> sustainable.
>>>
>>> Also, I'm unclear on why this interface is to be used to set filtering
>>> for the (PF) net device as well as for related VFs.  Doesn't that
>>> duplicate the functionality of ndo_set_rx_mode and
>>> ndo_vlan_rx_{add,kill}_vid?
>>
>> Functionally yes but contextually no.  This allows the PF driver to know
>> that it is setting these filters in the context of the existence of VFs,
>> allowing it to take appropriate action.  The other two functions may be
>> called without the presence of SR-IOV enablement and the existence of VFs.
>>
>> Anyway, that's why I asked Roopa to add that capability.
>
> I don't follow.  The PF driver already knows whether it has enabled VFs.
>
> How do filters set this way interact with filters set through the
> existing operations?  Should they override promiscuous mode?  None of
> this has been specified.

Promiscuous mode is exactly the issue this feature is intended for.  I'm 
not familiar with the solarflare device but Intel HW promiscuous mode is 
only promiscuous on the physical port, not on the VEB.  So a packet sent 
from a VF will not be captured by the PF across the VEB unless the MAC 
and VLAN filters have been programmed into the HW.  So you may not need 
the feature for your devices but it is required for Intel devices.  And 
it's a fairly simple request, just allow -1 to indicate that the target 
of the filter requests is for the PF itself.  Using the already existing 
set_rx_mode function wont' work because the PF driver will look at it 
and figure it's in promiscuous mode anyway, so it won't set the filters 
into the HW.  At least that is how it is in the case of our HW and 
driver.  Again, the behavior of your HW and driver is unknown to me and 
thus you may not require this feature.

- Greg

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2011-11-18 16:58       ` Greg Rose
@ 2011-11-18 17:40         ` Ben Hutchings
  2011-11-21 17:41           ` Greg Rose
  0 siblings, 1 reply; 36+ messages in thread
From: Ben Hutchings @ 2011-11-18 17:40 UTC (permalink / raw)
  To: Greg Rose
  Cc: Roopa Prabhu, netdev@vger.kernel.org, davem@davemloft.net,
	chrisw@redhat.com, sri@us.ibm.com, dragos.tatulea@gmail.com,
	kvm@vger.kernel.org, arnd@arndb.de, mst@redhat.com,
	mchan@broadcom.com, dwang2@cisco.com, shemminger@vyatta.com,
	eric.dumazet@gmail.com, kaber@trash.net, benve@cisco.com

On Fri, 2011-11-18 at 08:58 -0800, Greg Rose wrote:
> On 11/17/2011 4:44 PM, Ben Hutchings wrote:
> > On Thu, 2011-11-17 at 16:32 -0800, Greg Rose wrote:
> >> On 11/17/2011 4:15 PM, Ben Hutchings wrote:
> >>> Sorry to come to this rather late.
> >>>
> >>> On Tue, 2011-11-08 at 23:55 -0800, Roopa Prabhu wrote:
> >>> [...]
> >>>> v2 ->   v3
> >>>> - Moved set and get filter ops from rtnl_link_ops to netdev_ops
> >>>> - Support for SRIOV VFs.
> >>>>           [Note: The get filters msg (in the way current get rtnetlink handles
> >>>>           it) might get too big for SRIOV vfs. This patch follows existing sriov
> >>>>           vf get code and tries to accomodate filters for all VF's in a PF.
> >>>>           And for the SRIOV case I have only tested the fact that the VF
> >>>>           arguments are getting delivered to rtnetlink correctly. The code
> >>>>           follows existing sriov vf handling code so rest of it should work fine]
> >>> [...]
> >>>
> >>> This is already broken for large numbers of VFs, and increasing the
> >>> amount of information per VF is going to make the situation worse.  I am
> >>> no netlink expert but I think that the current approach of bundling all
> >>> information about an interface in a single message may not be
> >>> sustainable.
> >>>
> >>> Also, I'm unclear on why this interface is to be used to set filtering
> >>> for the (PF) net device as well as for related VFs.  Doesn't that
> >>> duplicate the functionality of ndo_set_rx_mode and
> >>> ndo_vlan_rx_{add,kill}_vid?
> >>
> >> Functionally yes but contextually no.  This allows the PF driver to know
> >> that it is setting these filters in the context of the existence of VFs,
> >> allowing it to take appropriate action.  The other two functions may be
> >> called without the presence of SR-IOV enablement and the existence of VFs.
> >>
> >> Anyway, that's why I asked Roopa to add that capability.
> >
> > I don't follow.  The PF driver already knows whether it has enabled VFs.
> >
> > How do filters set this way interact with filters set through the
> > existing operations?  Should they override promiscuous mode?  None of
> > this has been specified.
> 
> Promiscuous mode is exactly the issue this feature is intended for.  I'm 
> not familiar with the solarflare device but Intel HW promiscuous mode is 
> only promiscuous on the physical port, not on the VEB.  So a packet sent 
> from a VF will not be captured by the PF across the VEB unless the MAC 
> and VLAN filters have been programmed into the HW.

Yes, I get it.  The hardware bridge needs to know more about the address
configuration on the host than the driver is getting at the moment.

> So you may not need 
> the feature for your devices but it is required for Intel devices.

Well we don't have the hardware bridge but that means each VF driver
needs to know whether to fall back to the software bridge.  The net
driver needs much the same additional information.

> And 
> it's a fairly simple request, just allow -1 to indicate that the target 
> of the filter requests is for the PF itself.  Using the already existing 
> set_rx_mode function wont' work because the PF driver will look at it 
> and figure it's in promiscuous mode anyway, so it won't set the filters 
> into the HW.  At least that is how it is in the case of our HW and 
> driver.  Again, the behavior of your HW and driver is unknown to me and 
> thus you may not require this feature.

What concerns me is that this seems to be a workaround rather than a fix
for over-use of promiscuous mode, and it changes the semantics of
filtering modes in ways that haven't been well-specified.

What if there's a software bridge between two net devices corresponding
to separate physical ports, so that they really need to be promiscuous?
What if the administrator runs tcpdump and really wants the (PF) net
device to be promiscuous?

These cases shouldn't break because of VF acceleration.  Or at least we
should make a conscious and documented decision that 'promiscuous'
doesn't mean that if you enable it on your network adapter.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2011-11-18  0:15 ` [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode Ben Hutchings
  2011-11-18  0:32   ` Greg Rose
@ 2011-11-20 16:30   ` Roopa Prabhu
  2012-02-02  7:24     ` Michael S. Tsirkin
  1 sibling, 1 reply; 36+ messages in thread
From: Roopa Prabhu @ 2011-11-20 16:30 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: netdev, davem, chrisw, sri, dragos.tatulea, kvm, arnd, mst,
	gregory.v.rose, mchan, dwang2, shemminger, eric.dumazet, kaber,
	benve

On 11/17/11 4:15 PM, "Ben Hutchings" <bhutchings@solarflare.com> wrote:

> Sorry to come to this rather late.
> 
> On Tue, 2011-11-08 at 23:55 -0800, Roopa Prabhu wrote:
> [...]
>> v2 -> v3
>> - Moved set and get filter ops from rtnl_link_ops to netdev_ops
>> - Support for SRIOV VFs.
>>         [Note: The get filters msg (in the way current get rtnetlink handles
>>         it) might get too big for SRIOV vfs. This patch follows existing
>> sriov 
>>         vf get code and tries to accomodate filters for all VF's in a PF.
>>         And for the SRIOV case I have only tested the fact that the VF
>>         arguments are getting delivered to rtnetlink correctly. The code
>>         follows existing sriov vf handling code so rest of it should work
>> fine]
> [...]
> 
> This is already broken for large numbers of VFs, and increasing the
> amount of information per VF is going to make the situation worse.  I am
> no netlink expert but I think that the current approach of bundling all
> information about an interface in a single message may not be
> sustainable.

Yes agreed. I have the same concern.

> 
> Also, I'm unclear on why this interface is to be used to set filtering
> for the (PF) net device as well as for related VFs.  Doesn't that
> duplicate the functionality of ndo_set_rx_mode and
> ndo_vlan_rx_{add,kill}_vid?
> 
Yes..I have thought about this. But the reason the final version is the way
it is because its trying to accommodate sriov and non sriov cases because I
was just trying to make the netlink interface available to as many use cases
that might need it.

I just wanted to bring up the original intent of this patch.
Which was to add support for TUNSETTXFILTER to macvtap so that it can do
filtering instead of putting the lower dev (physical dev) in promiscuous
mode (This part really does not care if the lowerdev is an SRIOV VF or not).
And the focus was on macvlan passthru mode because it is the simplest case
to solve (you have to just pass the filters to lowerdev device/driver).
Now this may seem like It can be done with existing set_rx_mode/add_vlan_id
etc (which are essentially the mechanisms I am using in the macvlan driver
to send the filters to lowerdev for passthru mode), but it might not be the
case for other macvlan modes. Macvlan device might need to maintain and do a
filter lookup like the tap driver does today. And the only reason SRIOV came
up in the original patch was because PASSTHRU mode of macvlan was added for
SRIOV use case, though it really does not care if the lowerdev is an SRIOV
VF or not.

Instead of implementing TUNSETTXFILTER, michael had suggested netlink
interface instead. 
When implementing the netlink interface, I did go back and forth in deciding
whether this should be on every netdev or only virtual devices that support
rtnl link ops. And the existence of set_rx_mode and add_vlan_id netdev ops
Was the reason for confusion. So the next version implemented it as rtnl
link ops because all I really want is a mechanism like TUNSETTXFILTER which
can set/get filters for virtual devices that need to do filtering by
themselves. But restricting this interface for only virtual devices did not
make great sense so when greg pointed it out that he will need it for VF
netdevs, I was happy to move it to netdev ops.

And the only reason this patch works on both PF and VF if the final version
is because, its trying to accommodate both SRIOV and non-SRIOV devices.
So by saying PF and VF, for me it really means SRIOV VF and any other netdev
devices. So I intentionally did not put PF or VF in the name of the op.

Thanks,
Roopa

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2011-11-18 17:40         ` Ben Hutchings
@ 2011-11-21 17:41           ` Greg Rose
  2011-11-29 16:35             ` Ben Hutchings
  0 siblings, 1 reply; 36+ messages in thread
From: Greg Rose @ 2011-11-21 17:41 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Roopa Prabhu, netdev@vger.kernel.org, davem@davemloft.net,
	chrisw@redhat.com, sri@us.ibm.com, dragos.tatulea@gmail.com,
	kvm@vger.kernel.org, arnd@arndb.de, mst@redhat.com,
	mchan@broadcom.com, dwang2@cisco.com, shemminger@vyatta.com,
	eric.dumazet@gmail.com, kaber@trash.net, benve@cisco.com

On 11/18/2011 9:40 AM, Ben Hutchings wrote:
> On Fri, 2011-11-18 at 08:58 -0800, Greg Rose wrote:
>> On 11/17/2011 4:44 PM, Ben Hutchings wrote:
>>> On Thu, 2011-11-17 at 16:32 -0800, Greg Rose wrote:
>>>> On 11/17/2011 4:15 PM, Ben Hutchings wrote:
>>>>> Sorry to come to this rather late.
>>>>>
>>>>> On Tue, 2011-11-08 at 23:55 -0800, Roopa Prabhu wrote:
>>>>> [...]
>>>>>> v2 ->    v3
>>>>>> - Moved set and get filter ops from rtnl_link_ops to netdev_ops
>>>>>> - Support for SRIOV VFs.
>>>>>>            [Note: The get filters msg (in the way current get rtnetlink handles
>>>>>>            it) might get too big for SRIOV vfs. This patch follows existing sriov
>>>>>>            vf get code and tries to accomodate filters for all VF's in a PF.
>>>>>>            And for the SRIOV case I have only tested the fact that the VF
>>>>>>            arguments are getting delivered to rtnetlink correctly. The code
>>>>>>            follows existing sriov vf handling code so rest of it should work fine]
>>>>> [...]
>>>>>
>>>>> This is already broken for large numbers of VFs, and increasing the
>>>>> amount of information per VF is going to make the situation worse.  I am
>>>>> no netlink expert but I think that the current approach of bundling all
>>>>> information about an interface in a single message may not be
>>>>> sustainable.
>>>>>
>>>>> Also, I'm unclear on why this interface is to be used to set filtering
>>>>> for the (PF) net device as well as for related VFs.  Doesn't that
>>>>> duplicate the functionality of ndo_set_rx_mode and
>>>>> ndo_vlan_rx_{add,kill}_vid?
>>>>
>>>> Functionally yes but contextually no.  This allows the PF driver to know
>>>> that it is setting these filters in the context of the existence of VFs,
>>>> allowing it to take appropriate action.  The other two functions may be
>>>> called without the presence of SR-IOV enablement and the existence of VFs.
>>>>
>>>> Anyway, that's why I asked Roopa to add that capability.
>>>
>>> I don't follow.  The PF driver already knows whether it has enabled VFs.
>>>
>>> How do filters set this way interact with filters set through the
>>> existing operations?  Should they override promiscuous mode?  None of
>>> this has been specified.
>>
>> Promiscuous mode is exactly the issue this feature is intended for.  I'm
>> not familiar with the solarflare device but Intel HW promiscuous mode is
>> only promiscuous on the physical port, not on the VEB.  So a packet sent
>> from a VF will not be captured by the PF across the VEB unless the MAC
>> and VLAN filters have been programmed into the HW.
>
> Yes, I get it.  The hardware bridge needs to know more about the address
> configuration on the host than the driver is getting at the moment.
>
>> So you may not need
>> the feature for your devices but it is required for Intel devices.
>
> Well we don't have the hardware bridge but that means each VF driver
> needs to know whether to fall back to the software bridge.  The net
> driver needs much the same additional information.
>
>> And
>> it's a fairly simple request, just allow -1 to indicate that the target
>> of the filter requests is for the PF itself.  Using the already existing
>> set_rx_mode function wont' work because the PF driver will look at it
>> and figure it's in promiscuous mode anyway, so it won't set the filters
>> into the HW.  At least that is how it is in the case of our HW and
>> driver.  Again, the behavior of your HW and driver is unknown to me and
>> thus you may not require this feature.
>
> What concerns me is that this seems to be a workaround rather than a fix
> for over-use of promiscuous mode, and it changes the semantics of
> filtering modes in ways that haven't been well-specified.

I feel the opposite is true.  It allows a known set of receive filters 
so that you don't have to use promiscuous mode, which cuts down on 
overhead from processing packets the upper layer stack isn't really 
interested in.

>
> What if there's a software bridge between two net devices corresponding
> to separate physical ports, so that they really need to be promiscuous?
> What if the administrator runs tcpdump and really wants the (PF) net
> device to be promiscuous?

I don't believe there is anything in this patch set that removes 
promiscuous mode operation as it is commonly used.  Perhaps I've missed 
something.

>
> These cases shouldn't break because of VF acceleration.

I don't see how they would in the case of Intel HW.  And these new ops 
to set rx filters are something that Intel HW needs because it 
implements a VEB that is *not* a learning bridge and we don't want it to 
be a learning bridge because of security concerns.  It is better for our 
requirements to be allowed to set the MAC/VLAN filters that we want a VF 
to be able to see.

> Or at least we
> should make a conscious and documented decision that 'promiscuous'
> doesn't mean that if you enable it on your network adapter.

Again, I don't know of any plans to change anything relating to putting 
a device in promiscuous mode or changing the semantics of what 
promiscuous mode means.

- Greg

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2011-11-21 17:41           ` Greg Rose
@ 2011-11-29 16:35             ` Ben Hutchings
  2011-11-29 17:19               ` Ben Hutchings
  0 siblings, 1 reply; 36+ messages in thread
From: Ben Hutchings @ 2011-11-29 16:35 UTC (permalink / raw)
  To: Greg Rose
  Cc: Roopa Prabhu, netdev@vger.kernel.org, davem@davemloft.net,
	chrisw@redhat.com, sri@us.ibm.com, dragos.tatulea@gmail.com,
	kvm@vger.kernel.org, arnd@arndb.de, mst@redhat.com,
	mchan@broadcom.com, dwang2@cisco.com, shemminger@vyatta.com,
	eric.dumazet@gmail.com, kaber@trash.net, benve@cisco.com

On Mon, 2011-11-21 at 09:41 -0800, Greg Rose wrote:
> On 11/18/2011 9:40 AM, Ben Hutchings wrote:
[...]
> > What concerns me is that this seems to be a workaround rather than a fix
> > for over-use of promiscuous mode, and it changes the semantics of
> > filtering modes in ways that haven't been well-specified.
> 
> I feel the opposite is true.  It allows a known set of receive filters 
> so that you don't have to use promiscuous mode, which cuts down on 
> overhead from processing packets the upper layer stack isn't really 
> interested in.
>
> >
> > What if there's a software bridge between two net devices corresponding
> > to separate physical ports, so that they really need to be promiscuous?
> > What if the administrator runs tcpdump and really wants the (PF) net
> > device to be promiscuous?
> 
> I don't believe there is anything in this patch set that removes 
> promiscuous mode operation as it is commonly used.  Perhaps I've missed 
> something.
[...]

Maybe I missed something!

Let's be clear on what our models are for filtering.  At the moment we
have MAC filters set through ndo_set_rx_mode and VF filters set through
ndo_set_vf_{mac,vlan}.

Ignoring anti-spoofing for the moment, should the currently defined
filters look like this (a):

                TX ^   | RX
                   |   v
+------------------+---+-----------------+
|                  |  ++------------+    |
|                  |  |RX MAC filter|    |
|                  |  ++------------+    |
|                  |   |match            |
|                  ^   v                 |
|                  |  ++------------+    |
|                  |  |RX VF filters|    |
|                  |  +-------+-----+    |
|                 /|\     no /|\         |
|                | | \ match/ | |match 2 |
|                | ^  \    /  v |        |
|                | |   \  /match|        |
|                |  \   \/  1/  |        |
|                |   \  /\  /   |        |
|                ^    \/  \/    v        |
|                |    /\  /\    |        |
|                |   /  ||  \   |        |
|                |  /   ||   \  |        |
|                | /    ||    \ |        |
|                ||     ||     ||        |
+----------------++-----++-----++--------+
                 ||     ||     ||
                 PF    VF 1   VF 2

or like this (b):

                TX ^   | RX
                   |   v
+------------------+---+-----------------+
|                  |  ++------------+    |
|                  |  |RX VF filters|    |
|                  |  ++--------+---+    |
|                  | no|match  /|        |
|                  ^   v      | |        |
|                  | +-+----+ | |        |
|                  | |RX MAC| | |        |
|                  | |filter| | |        |
|                  | +------+ | |        |
|                  |   |match | |        |
|                 /|\  |      | |        |
|                | | \ | match| |match 2 |
|                | ^  \/    1 v |        |
|                | |  /\      | |        |
|                |  \/  \    /  |        |
|                |  /\   \  /   |        |
|                ^ /  \   \/    v        |
|                ||    \  /\    |        |
|                ||     ||  \   |        |
|                ||     ||   \  |        |
|                ||     ||    \ |        |
|                ||     ||     ||        |
+----------------++-----++-----++--------+
                 ||     ||     ||
                 PF    VF 1   VF 2

I think the current model is (a); do you agree?

So is the proposed new model something like this (c):

                TX ^   | RX
                   |   v
+------------------+---+-----------------+
|                  |  ++------------+    |
|                  |  |RX MAC filter|    |
|                  ^  ++------------+    |
|                  |   |match            |
|          no match|   v                 |
|  +----------------+ ++------------+    |
|  |loopback filters| |RX VF filters|    |
|  +---------+-----++ +-------+-----+    |
|           /|\   /|\ match  /|\         |
|          v | `-+>+-+-.2   / | |        |
|           \ \  | |m \ \   / | |        |
|     match 0\ `-+-+.a \ \ /  v |        |
|             \  | | \t \ X   / |        |
|              \ |  \ \c X X /  |        |
|               \|\  \ \h \ X   |        |
|                \ \  \/\1 X \  v        |
|                ||   /\ |/ \ \ |        |
|                |v  /  ||   \ \|        |
|                || /   ^|    \ |        |
|                ||/    |v     \|        |
|                ||     ||     ||        |
+----------------++-----++-----++--------+
                 ||     ||     ||
                 PF    VF 1   VF 2

(I've labelled the new filters as loopback filters here, and I'm still
leaving out anti-spoofing.)

If not, please explain what the new model *is*.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2011-11-29 16:35             ` Ben Hutchings
@ 2011-11-29 17:19               ` Ben Hutchings
  2011-11-30 17:34                 ` Greg Rose
  0 siblings, 1 reply; 36+ messages in thread
From: Ben Hutchings @ 2011-11-29 17:19 UTC (permalink / raw)
  To: Greg Rose
  Cc: Roopa Prabhu, netdev@vger.kernel.org, davem@davemloft.net,
	chrisw@redhat.com, sri@us.ibm.com, dragos.tatulea@gmail.com,
	kvm@vger.kernel.org, arnd@arndb.de, mst@redhat.com,
	mchan@broadcom.com, dwang2@cisco.com, shemminger@vyatta.com,
	eric.dumazet@gmail.com, kaber@trash.net, benve@cisco.com

On Tue, 2011-11-29 at 16:35 +0000, Ben Hutchings wrote:
> On Mon, 2011-11-21 at 09:41 -0800, Greg Rose wrote:
> > On 11/18/2011 9:40 AM, Ben Hutchings wrote:
> [...]
> > > What concerns me is that this seems to be a workaround rather than a fix
> > > for over-use of promiscuous mode, and it changes the semantics of
> > > filtering modes in ways that haven't been well-specified.
> > 
> > I feel the opposite is true.  It allows a known set of receive filters 
> > so that you don't have to use promiscuous mode, which cuts down on 
> > overhead from processing packets the upper layer stack isn't really 
> > interested in.
> >
> > >
> > > What if there's a software bridge between two net devices corresponding
> > > to separate physical ports, so that they really need to be promiscuous?
> > > What if the administrator runs tcpdump and really wants the (PF) net
> > > device to be promiscuous?
> > 
> > I don't believe there is anything in this patch set that removes 
> > promiscuous mode operation as it is commonly used.  Perhaps I've missed 
> > something.
> [...]
> 
> Maybe I missed something!
> 
> Let's be clear on what our models are for filtering.  At the moment we
> have MAC filters set through ndo_set_rx_mode and VF filters set through
> ndo_set_vf_{mac,vlan}.
> 
> Ignoring anti-spoofing for the moment, should the currently defined
> filters look like this (a):
> 
>                 TX ^   | RX
>                    |   v
> +------------------+---+-----------------+
> |                  |  ++------------+    |
> |                  |  |RX MAC filter|    |
> |                  |  ++------------+    |
> |                  |   |match            |
> |                  ^   v                 |
> |                  |  ++------------+    |
> |                  |  |RX VF filters|    |
> |                  |  +-------+-----+    |
> |                 /|\     no /|\         |
> |                | | \ match/ | |match 2 |
> |                | ^  \    /  v |        |
> |                | |   \  /match|        |
> |                |  \   \/  1/  |        |
> |                |   \  /\  /   |        |
> |                ^    \/  \/    v        |
> |                |    /\  /\    |        |
> |                |   /  ||  \   |        |
> |                |  /   ||   \  |        |
> |                | /    ||    \ |        |
> |                ||     ||     ||        |
> +----------------++-----++-----++--------+
>                  ||     ||     ||
>                  PF    VF 1   VF 2
> 
> or like this (b):
> 
>                 TX ^   | RX
>                    |   v
> +------------------+---+-----------------+
> |                  |  ++------------+    |
> |                  |  |RX VF filters|    |
> |                  |  ++--------+---+    |
> |                  | no|match  /|        |
> |                  ^   v      | |        |
> |                  | +-+----+ | |        |
> |                  | |RX MAC| | |        |
> |                  | |filter| | |        |
> |                  | +------+ | |        |
> |                  |   |match | |        |
> |                 /|\  |      | |        |
> |                | | \ | match| |match 2 |
> |                | ^  \/    1 v |        |
> |                | |  /\      | |        |
> |                |  \/  \    /  |        |
> |                |  /\   \  /   |        |
> |                ^ /  \   \/    v        |
> |                ||    \  /\    |        |
> |                ||     ||  \   |        |
> |                ||     ||   \  |        |
> |                ||     ||    \ |        |
> |                ||     ||     ||        |
> +----------------++-----++-----++--------+
>                  ||     ||     ||
>                  PF    VF 1   VF 2
> 
> I think the current model is (a); do you agree?
> 
> So is the proposed new model something like this (c):

Corrected diagram:

                TX ^   | RX
                   |   v
+------------------+---+-----------------+
|                  |  ++------------+    |
|                  |  |RX MAC filter|    |
|                  ^  ++------------+    |
|                  |   |match            |
|          no match|   v                 |
|  +----------------+ ++------------+    |
|  |loopback filters| |RX VF filters|    |
|  +---------+-----++ +-------+-----+    |
|           /|\   /|\ match  /|\         |
|          v | `-+>+-+-.2   / | |        |
|           \ \  | |m \ \   / | |        |
|     match 0\ `-+-+.a \ \ /  v |        |
|             \  | | \t \ X   / |        |
|              \ |  \ \c X \ /  |        |
|               \|   \ \h \ X   |        |
|                \    \/\1 X \  v        |
|                ||   /\ |/ \ \ |        |
|                |v  /  ||   \ \|        |
|                || /   ^|    \ |        |
|                ||/    |v     ||        |
|                ||     ||     ||        |
+----------------++-----++-----++--------+
                 ||     ||     ||
                 PF    VF 1   VF 2

> (I've labelled the new filters as loopback filters here, and I'm still
> leaving out anti-spoofing.)
> 
> If not, please explain what the new model *is*.
> 
> Ben.
> 

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2011-11-29 17:19               ` Ben Hutchings
@ 2011-11-30 17:34                 ` Greg Rose
  2011-11-30 18:48                   ` Ben Hutchings
  0 siblings, 1 reply; 36+ messages in thread
From: Greg Rose @ 2011-11-30 17:34 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Roopa Prabhu, netdev@vger.kernel.org, davem@davemloft.net,
	chrisw@redhat.com, sri@us.ibm.com, dragos.tatulea@gmail.com,
	kvm@vger.kernel.org, arnd@arndb.de, mst@redhat.com,
	mchan@broadcom.com, dwang2@cisco.com, shemminger@vyatta.com,
	eric.dumazet@gmail.com, kaber@trash.net, benve@cisco.com


On 11/29/2011 9:19 AM, Ben Hutchings wrote:
> On Tue, 2011-11-29 at 16:35 +0000, Ben Hutchings wrote:
>>
>> Maybe I missed something!
>>
>> Let's be clear on what our models are for filtering.  At the moment we
>> have MAC filters set through ndo_set_rx_mode and VF filters set through
>> ndo_set_vf_{mac,vlan}.
>>
>> Ignoring anti-spoofing for the moment, should the currently defined
>> filters look like this (a):
>>
>>                  TX ^   | RX
>>                     |   v
>> +------------------+---+-----------------+
>> |                  |  ++------------+    |
>> |                  |  |RX MAC filter|    |
>> |                  |  ++------------+    |
>> |                  |   |match            |
>> |                  ^   v                 |
>> |                  |  ++------------+    |
>> |                  |  |RX VF filters|    |
>> |                  |  +-------+-----+    |
>> |                 /|\     no /|\         |
>> |                | | \ match/ | |match 2 |
>> |                | ^  \    /  v |        |
>> |                | |   \  /match|        |
>> |                |  \   \/  1/  |        |
>> |                |   \  /\  /   |        |
>> |                ^    \/  \/    v        |
>> |                |    /\  /\    |        |
>> |                |   /  ||  \   |        |
>> |                |  /   ||   \  |        |
>> |                | /    ||    \ |        |
>> |                ||     ||     ||        |
>> +----------------++-----++-----++--------+
>>                   ||     ||     ||
>>                   PF    VF 1   VF 2
>>
>> or like this (b):
>>
>>                  TX ^   | RX
>>                     |   v
>> +------------------+---+-----------------+
>> |                  |  ++------------+    |
>> |                  |  |RX VF filters|    |
>> |                  |  ++--------+---+    |
>> |                  | no|match  /|        |
>> |                  ^   v      | |        |
>> |                  | +-+----+ | |        |
>> |                  | |RX MAC| | |        |
>> |                  | |filter| | |        |
>> |                  | +------+ | |        |
>> |                  |   |match | |        |
>> |                 /|\  |      | |        |
>> |                | | \ | match| |match 2 |
>> |                | ^  \/    1 v |        |
>> |                | |  /\      | |        |
>> |                |  \/  \    /  |        |
>> |                |  /\   \  /   |        |
>> |                ^ /  \   \/    v        |
>> |                ||    \  /\    |        |
>> |                ||     ||  \   |        |
>> |                ||     ||   \  |        |
>> |                ||     ||    \ |        |
>> |                ||     ||     ||        |
>> +----------------++-----++-----++--------+
>>                   ||     ||     ||
>>                   PF    VF 1   VF 2
>>
>> I think the current model is (a); do you agree?
>>
>> So is the proposed new model something like this (c):
>
> Corrected diagram:
>
>                  TX ^   | RX
>                     |   v
> +------------------+---+-----------------+
> |                  |  ++------------+    |
> |                  |  |RX MAC filter|    |
> |                  ^  ++------------+    |
> |                  |   |match            |
> |          no match|   v                 |
> |  +----------------+ ++------------+    |
> |  |loopback filters| |RX VF filters|    |
> |  +---------+-----++ +-------+-----+    |
> |           /|\   /|\ match  /|\         |
> |          v | `-+>+-+-.2   / | |        |
> |           \ \  | |m \ \   / | |        |
> |     match 0\ `-+-+.a \ \ /  v |        |
> |             \  | | \t \ X   / |        |
> |              \ |  \ \c X \ /  |        |
> |               \|   \ \h \ X   |        |
> |                \    \/\1 X \  v        |
> |                ||   /\ |/ \ \ |        |
> |                |v  /  ||   \ \|        |
> |                || /   ^|    \ |        |
> |                ||/    |v     ||        |
> |                ||     ||     ||        |
> +----------------++-----++-----++--------+
>                   ||     ||     ||
>                   PF    VF 1   VF 2
>
>> (I've labelled the new filters as loopback filters here, and I'm still
>> leaving out anti-spoofing.)
>>
>> If not, please explain what the new model *is*.

The new model is to incorporate a VEB into the NIC.  The current model 
doesn't address any of the requirements of a VEB in the NIC and this 
proposed set of patches allow us to set MAC filters for the *ports* on 
the internal NIC VEB.  Consider the PF and each of the VFs as just a 
port on the VEB.  We need the ability to set L2 filters (MAC, MC and 
VLAN) for each of the ports on that VEB.  There is no currently 
supported method for doing this.  So yes, this is a new model although 
it's a fairly simple one.

If you have an alternative proposal for allowing us to set L2 filters 
for the ports on our NIC VEB then I'm all ears (or eyes as the case may be).

- Greg

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2011-11-30 17:34                 ` Greg Rose
@ 2011-11-30 18:48                   ` Ben Hutchings
  2011-11-30 21:04                     ` Chris Wright
  0 siblings, 1 reply; 36+ messages in thread
From: Ben Hutchings @ 2011-11-30 18:48 UTC (permalink / raw)
  To: Greg Rose
  Cc: Roopa Prabhu, netdev@vger.kernel.org, davem@davemloft.net,
	chrisw@redhat.com, sri@us.ibm.com, dragos.tatulea@gmail.com,
	kvm@vger.kernel.org, arnd@arndb.de, mst@redhat.com,
	mchan@broadcom.com, dwang2@cisco.com, shemminger@vyatta.com,
	eric.dumazet@gmail.com, kaber@trash.net, benve@cisco.com

On Wed, 2011-11-30 at 09:34 -0800, Greg Rose wrote:
> On 11/29/2011 9:19 AM, Ben Hutchings wrote:
> > On Tue, 2011-11-29 at 16:35 +0000, Ben Hutchings wrote:
> >>
> >> Maybe I missed something!
[...]
> >> If not, please explain what the new model *is*.
> 
> The new model is to incorporate a VEB into the NIC.  The current model 
> doesn't address any of the requirements of a VEB in the NIC and this 
> proposed set of patches allow us to set MAC filters for the *ports* on 
> the internal NIC VEB.  Consider the PF and each of the VFs as just a 
> port on the VEB.  We need the ability to set L2 filters (MAC, MC and 
> VLAN) for each of the ports on that VEB.  There is no currently 
> supported method for doing this.  So yes, this is a new model although 
> it's a fairly simple one.

Explain precisely how the VEB changes the existing model.  Explain how
the existing MAC filter and VF filter APIs interact with port filters on
the VEB.  Refer to any relevant standards.

(I have really had enough of net driver API proposals where all the
difficult questions are punted to the implementation.  Either
implementations diverge and users and userspace developers are left
horribly confused, or else the second and subsequent implementations
have to follow whatever quirks the first implementation had.  It's an
essential part of the review process that such questions are asked and
answered.)

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2011-11-30 18:48                   ` Ben Hutchings
@ 2011-11-30 21:04                     ` Chris Wright
  2011-11-30 21:34                       ` Ben Hutchings
  0 siblings, 1 reply; 36+ messages in thread
From: Chris Wright @ 2011-11-30 21:04 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Greg Rose, Roopa Prabhu, netdev@vger.kernel.org,
	davem@davemloft.net, chrisw@redhat.com, sri@us.ibm.com,
	dragos.tatulea@gmail.com, kvm@vger.kernel.org, arnd@arndb.de,
	mst@redhat.com, mchan@broadcom.com, dwang2@cisco.com,
	shemminger@vyatta.com, eric.dumazet@gmail.com, kaber@trash.net,
	benve@cisco.com

* Ben Hutchings (bhutchings@solarflare.com) wrote:
> On Wed, 2011-11-30 at 09:34 -0800, Greg Rose wrote:
> > On 11/29/2011 9:19 AM, Ben Hutchings wrote:
> > > On Tue, 2011-11-29 at 16:35 +0000, Ben Hutchings wrote:
> > >>
> > >> Maybe I missed something!
> [...]
> > >> If not, please explain what the new model *is*.
> > 
> > The new model is to incorporate a VEB into the NIC.  The current model 
> > doesn't address any of the requirements of a VEB in the NIC and this 
> > proposed set of patches allow us to set MAC filters for the *ports* on 
> > the internal NIC VEB.  Consider the PF and each of the VFs as just a 
> > port on the VEB.  We need the ability to set L2 filters (MAC, MC and 
> > VLAN) for each of the ports on that VEB.  There is no currently 
> > supported method for doing this.  So yes, this is a new model although 
> > it's a fairly simple one.
> 
> Explain precisely how the VEB changes the existing model.  Explain how
> the existing MAC filter and VF filter APIs interact with port filters on
> the VEB.  Refer to any relevant standards.

I agree that it's confusing.  Couldn't you simplify your ascii art
(hopefully removing hw assumptions about receive processing, and
completely ignoring vlans for the moment) to something like:

             |RX
             v
+------------+-------------+
|     +------+--------+    |
|     | RX MAC filter |    |
|     |and port select|    |
|     +---------------+    |
|            /|\           |
|           / | \   match 2|
|          /  v  \         |
|         /match  \        |
|        /  1 |    \       |
|       /     |     \      |
|match /      |      \     |
|  0  /       |       \    |
|    v        |        v   |
|    |        |        |   |
+----+--------+--------+---+
     |        |        |
    PF       VF 1     VF 2

And there's an unclear number of ways to update "RX MAC filter and port
select" table.

1) PF ndo_set_mac_addr
I expect that to be implicit to match 0.

2) PF ndo_set_rx_mode
Less clear, but I'd still expect these to implicitly match 0

3) PF ndo_set_vf_mac
I expect these to be an explicit match to VF N (given the interface
specifices which VF's MAC is being programmed).

4) VF ndo_set_mac_addr
This one may or may not be allowed (setting MAC+port if the VF is owned
by a guest is likely not allowed), but would expect an implicit VF N.

5) VF ndo_set_rx_mode
Same as 4) above.

6) PF or VF? ndo_set_rx_filter_addr
The new proposal, which has an explicit VF, although when it's VF_SELF
I'm not clear if this is just the same as 5) above?

Have I missed anything?

thanks,
chris

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2011-11-30 21:04                     ` Chris Wright
@ 2011-11-30 21:34                       ` Ben Hutchings
  2011-11-30 23:00                         ` Chris Wright
  0 siblings, 1 reply; 36+ messages in thread
From: Ben Hutchings @ 2011-11-30 21:34 UTC (permalink / raw)
  To: Chris Wright
  Cc: Greg Rose, Roopa Prabhu, netdev@vger.kernel.org,
	davem@davemloft.net, sri@us.ibm.com, dragos.tatulea@gmail.com,
	kvm@vger.kernel.org, arnd@arndb.de, mst@redhat.com,
	mchan@broadcom.com, dwang2@cisco.com, shemminger@vyatta.com,
	eric.dumazet@gmail.com, kaber@trash.net, benve@cisco.com

On Wed, 2011-11-30 at 13:04 -0800, Chris Wright wrote:
> * Ben Hutchings (bhutchings@solarflare.com) wrote:
> > On Wed, 2011-11-30 at 09:34 -0800, Greg Rose wrote:
> > > On 11/29/2011 9:19 AM, Ben Hutchings wrote:
> > > > On Tue, 2011-11-29 at 16:35 +0000, Ben Hutchings wrote:
> > > >>
> > > >> Maybe I missed something!
> > [...]
> > > >> If not, please explain what the new model *is*.
> > > 
> > > The new model is to incorporate a VEB into the NIC.  The current model 
> > > doesn't address any of the requirements of a VEB in the NIC and this 
> > > proposed set of patches allow us to set MAC filters for the *ports* on 
> > > the internal NIC VEB.  Consider the PF and each of the VFs as just a 
> > > port on the VEB.  We need the ability to set L2 filters (MAC, MC and 
> > > VLAN) for each of the ports on that VEB.  There is no currently 
> > > supported method for doing this.  So yes, this is a new model although 
> > > it's a fairly simple one.
> > 
> > Explain precisely how the VEB changes the existing model.  Explain how
> > the existing MAC filter and VF filter APIs interact with port filters on
> > the VEB.  Refer to any relevant standards.
> 
> I agree that it's confusing.  Couldn't you simplify your ascii art
> (hopefully removing hw assumptions about receive processing, and
> completely ignoring vlans for the moment) to something like:
>
>              |RX
>              v
> +------------+-------------+
> |     +------+--------+    |
> |     | RX MAC filter |    |
> |     |and port select|    |
> |     +---------------+    |
> |            /|\           |
> |           / | \   match 2|
> |          /  v  \         |
> |         /match  \        |
> |        /  1 |    \       |
> |       /     |     \      |
> |match /      |      \     |
> |  0  /       |       \    |
> |    v        |        v   |
> |    |        |        |   |
> +----+--------+--------+---+
>      |        |        |
>     PF       VF 1     VF 2
> 
> And there's an unclear number of ways to update "RX MAC filter and port
> select" table.
> 
> 1) PF ndo_set_mac_addr
> I expect that to be implicit to match 0.
> 
> 2) PF ndo_set_rx_mode
> Less clear, but I'd still expect these to implicitly match 0
> 
> 3) PF ndo_set_vf_mac
> I expect these to be an explicit match to VF N (given the interface
> specifices which VF's MAC is being programmed).

I'm not sure whether this is supposed to implicitly add to the MAC
filter or whether that has to be changed too.  That's the main
difference between my models (a) and (b).

There's also PF ndo_set_vf_vlan.

> 4) VF ndo_set_mac_addr
> This one may or may not be allowed (setting MAC+port if the VF is owned
> by a guest is likely not allowed), but would expect an implicit VF N.
> 
> 5) VF ndo_set_rx_mode
> Same as 4) above.

So this is where we are today.

> 6) PF or VF? ndo_set_rx_filter_addr
> The new proposal, which has an explicit VF, although when it's VF_SELF
> I'm not clear if this is just the same as 5) above?
> 
> Have I missed anything?

Any physical port can be bridged to a mixture of guests with and without
their own VFs.  Packets sent from a guest with a VF to the address of a
guest without a VF need to be forwarded to the PF rather than the
physical port, but none of the drivers currently get to know about those
addresses.

Packets sent from a guest with a VF to the address of another guest with
a VF need to be forwarded similarly, but the driver should be able to
infer that from (3).

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2011-11-30 21:34                       ` Ben Hutchings
@ 2011-11-30 23:00                         ` Chris Wright
  2011-11-30 23:19                           ` Rose, Gregory V
  2011-11-30 23:30                           ` Sridhar Samudrala
  0 siblings, 2 replies; 36+ messages in thread
From: Chris Wright @ 2011-11-30 23:00 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Chris Wright, Greg Rose, Roopa Prabhu, netdev@vger.kernel.org,
	davem@davemloft.net, sri@us.ibm.com, dragos.tatulea@gmail.com,
	kvm@vger.kernel.org, arnd@arndb.de, mst@redhat.com,
	mchan@broadcom.com, dwang2@cisco.com, shemminger@vyatta.com,
	eric.dumazet@gmail.com, kaber@trash.net, benve@cisco.com

* Ben Hutchings (bhutchings@solarflare.com) wrote:
> On Wed, 2011-11-30 at 13:04 -0800, Chris Wright wrote:
> > I agree that it's confusing.  Couldn't you simplify your ascii art
> > (hopefully removing hw assumptions about receive processing, and
> > completely ignoring vlans for the moment) to something like:
> >
> >              |RX
> >              v
> > +------------+-------------+
> > |     +------+--------+    |
> > |     | RX MAC filter |    |
> > |     |and port select|    |
> > |     +---------------+    |
> > |            /|\           |
> > |           / | \   match 2|
> > |          /  v  \         |
> > |         /match  \        |
> > |        /  1 |    \       |
> > |       /     |     \      |
> > |match /      |      \     |
> > |  0  /       |       \    |
> > |    v        |        v   |
> > |    |        |        |   |
> > +----+--------+--------+---+
> >      |        |        |
> >     PF       VF 1     VF 2
> > 
> > And there's an unclear number of ways to update "RX MAC filter and port
> > select" table.
> > 
> > 1) PF ndo_set_mac_addr
> > I expect that to be implicit to match 0.
> > 
> > 2) PF ndo_set_rx_mode
> > Less clear, but I'd still expect these to implicitly match 0
> > 
> > 3) PF ndo_set_vf_mac
> > I expect these to be an explicit match to VF N (given the interface
> > specifices which VF's MAC is being programmed).
> 
> I'm not sure whether this is supposed to implicitly add to the MAC
> filter or whether that has to be changed too.  That's the main
> difference between my models (a) and (b).

I see now.  I wasn't entirely clear on the difference before.  It's also
going to be hw specific.  I think (Intel folks can verify) that the
Intel SR-IOV devices have a single global unicast exact match table,
for example.

> There's also PF ndo_set_vf_vlan.

Right, although I had mentioned I was trying to limit just to MAC
filtering to simplify.

> > 4) VF ndo_set_mac_addr
> > This one may or may not be allowed (setting MAC+port if the VF is owned
> > by a guest is likely not allowed), but would expect an implicit VF N.
> > 
> > 5) VF ndo_set_rx_mode
> > Same as 4) above.
> 
> So this is where we are today.

Cool, good that we agree there.

> > 6) PF or VF? ndo_set_rx_filter_addr
> > The new proposal, which has an explicit VF, although when it's VF_SELF
> > I'm not clear if this is just the same as 5) above?
> > 
> > Have I missed anything?
> 
> Any physical port can be bridged to a mixture of guests with and without
> their own VFs.  Packets sent from a guest with a VF to the address of a
> guest without a VF need to be forwarded to the PF rather than the
> physical port, but none of the drivers currently get to know about those
> addresses.

To clarify, do you mean something like this?

       physical port
             |
+------------+------------+
|         +-----+         |
|         | VEB |         |
|         +-----+         |
|        /   |   \        |
|       /    |    \       |
|      /     |     \      |
+-----+------+------+-----+
      |      |       |
     PF    VF 1    VF 2
     /       |       | 
 +---+---+  VM4  +---+---+
 |  sw   |       |macvtap|
 | switch|       +---+---+
 +-+-+-+-+           |
   / | \            VM5
  /  |  \
VM1 VM2 VM3

This has VMs 1-3 hanging of the PF via a linux bridge (traditional hv
switching), VM4 directly owning VF1 (pci device assignement), and VM5
indirectly owning VF2 (macvtap passthrough, that started this whole
thing).

So, I'm understanding you saying that VM4 or VM4 sending a packet to VM1
goes in to VEB, out PF, and into linux bridging code, rigth?  At which
point the PF is in promiscuous mode (btw, same does not work if bridge is
attached to VF, at least for some VFs, due to lack of promiscuous mode).

> Packets sent from a guest with a VF to the address of another guest with
> a VF need to be forwarded similarly, but the driver should be able to
> infer that from (3).

Right, and that works currently for the case where both guests are like
VM4, they directly own the VF via PCI device assignement.  But for VM4
to talk to VM5, VF3 is not in promiscuous mode and has a different MAC
address than VM5's vNIC.  If the embedded bridge does not learn, and
nobody programmed it to fwd frames for VM5 via VF3...

I believe this is what Roopa's patch will allow.  The question now is
whether there's a better way to handle this?

In my mind, we'd model the NIC's embedded bridge as, well, a bridge.
And set anti-spoofing, port mirroring, port mac/vlan filtering, etc via
that bridge.

thanks,
-chris

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2011-11-30 23:00                         ` Chris Wright
@ 2011-11-30 23:19                           ` Rose, Gregory V
  2011-11-30 23:30                           ` Sridhar Samudrala
  1 sibling, 0 replies; 36+ messages in thread
From: Rose, Gregory V @ 2011-11-30 23:19 UTC (permalink / raw)
  To: Chris Wright, Ben Hutchings
  Cc: Roopa Prabhu, netdev@vger.kernel.org, davem@davemloft.net,
	sri@us.ibm.com, dragos.tatulea@gmail.com, kvm@vger.kernel.org,
	arnd@arndb.de, mst@redhat.com, mchan@broadcom.com,
	dwang2@cisco.com, shemminger@vyatta.com, eric.dumazet@gmail.com,
	kaber@trash.net, benve@cisco.com

> -----Original Message-----
> From: Chris Wright [mailto:chrisw@redhat.com]
> Sent: Wednesday, November 30, 2011 3:01 PM
> To: Ben Hutchings
> Cc: Chris Wright; Rose, Gregory V; Roopa Prabhu; netdev@vger.kernel.org;
> davem@davemloft.net; sri@us.ibm.com; dragos.tatulea@gmail.com;
> kvm@vger.kernel.org; arnd@arndb.de; mst@redhat.com; mchan@broadcom.com;
> dwang2@cisco.com; shemminger@vyatta.com; eric.dumazet@gmail.com;
> kaber@trash.net; benve@cisco.com
> Subject: Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering
> support for passthru mode
> 
> * Ben Hutchings (bhutchings@solarflare.com) wrote:
> > On Wed, 2011-11-30 at 13:04 -0800, Chris Wright wrote:
> > > I agree that it's confusing.  Couldn't you simplify your ascii art
> > > (hopefully removing hw assumptions about receive processing, and
> > > completely ignoring vlans for the moment) to something like:
> > >
> > >              |RX
> > >              v
> > > +------------+-------------+
> > > |     +------+--------+    |
> > > |     | RX MAC filter |    |
> > > |     |and port select|    |
> > > |     +---------------+    |
> > > |            /|\           |
> > > |           / | \   match 2|
> > > |          /  v  \         |
> > > |         /match  \        |
> > > |        /  1 |    \       |
> > > |       /     |     \      |
> > > |match /      |      \     |
> > > |  0  /       |       \    |
> > > |    v        |        v   |
> > > |    |        |        |   |
> > > +----+--------+--------+---+
> > >      |        |        |
> > >     PF       VF 1     VF 2
> > >
> > > And there's an unclear number of ways to update "RX MAC filter and
> port
> > > select" table.
> > >
> > > 1) PF ndo_set_mac_addr
> > > I expect that to be implicit to match 0.
> > >
> > > 2) PF ndo_set_rx_mode
> > > Less clear, but I'd still expect these to implicitly match 0
> > >
> > > 3) PF ndo_set_vf_mac
> > > I expect these to be an explicit match to VF N (given the interface
> > > specifices which VF's MAC is being programmed).
> >
> > I'm not sure whether this is supposed to implicitly add to the MAC
> > filter or whether that has to be changed too.  That's the main
> > difference between my models (a) and (b).
> 
> I see now.  I wasn't entirely clear on the difference before.  It's also
> going to be hw specific.  I think (Intel folks can verify) that the
> Intel SR-IOV devices have a single global unicast exact match table,
> for example.
> 
> > There's also PF ndo_set_vf_vlan.
> 
> Right, although I had mentioned I was trying to limit just to MAC
> filtering to simplify.
> 
> > > 4) VF ndo_set_mac_addr
> > > This one may or may not be allowed (setting MAC+port if the VF is
> owned
> > > by a guest is likely not allowed), but would expect an implicit VF N.
> > >
> > > 5) VF ndo_set_rx_mode
> > > Same as 4) above.
> >
> > So this is where we are today.
> 
> Cool, good that we agree there.
> 
> > > 6) PF or VF? ndo_set_rx_filter_addr
> > > The new proposal, which has an explicit VF, although when it's VF_SELF
> > > I'm not clear if this is just the same as 5) above?
> > >
> > > Have I missed anything?
> >
> > Any physical port can be bridged to a mixture of guests with and without
> > their own VFs.  Packets sent from a guest with a VF to the address of a
> > guest without a VF need to be forwarded to the PF rather than the
> > physical port, but none of the drivers currently get to know about those
> > addresses.
> 
> To clarify, do you mean something like this?
> 
>        physical port
>              |
> +------------+------------+
> |         +-----+         |
> |         | VEB |         |
> |         +-----+         |
> |        /   |   \        |
> |       /    |    \       |
> |      /     |     \      |
> +-----+------+------+-----+
>       |      |       |
>      PF    VF 1    VF 2
>      /       |       |
>  +---+---+  VM4  +---+---+
>  |  sw   |       |macvtap|
>  | switch|       +---+---+
>  +-+-+-+-+           |
>    / | \            VM5
>   /  |  \
> VM1 VM2 VM3
> 
> This has VMs 1-3 hanging of the PF via a linux bridge (traditional hv
> switching), VM4 directly owning VF1 (pci device assignement), and VM5
> indirectly owning VF2 (macvtap passthrough, that started this whole
> thing).
> 
> So, I'm understanding you saying that VM4 or VM4 sending a packet to VM1
> goes in to VEB, out PF, and into linux bridging code, rigth?  At which
> point the PF is in promiscuous mode (btw, same does not work if bridge is
> attached to VF, at least for some VFs, due to lack of promiscuous mode).
> 
> > Packets sent from a guest with a VF to the address of another guest with
> > a VF need to be forwarded similarly, but the driver should be able to
> > infer that from (3).
> 
> Right, and that works currently for the case where both guests are like
> VM4, they directly own the VF via PCI device assignement.  But for VM4
> to talk to VM5, VF3 is not in promiscuous mode and has a different MAC
> address than VM5's vNIC.  If the embedded bridge does not learn, and
> nobody programmed it to fwd frames for VM5 via VF3...
> 
> I believe this is what Roopa's patch will allow.  The question now is
> whether there's a better way to handle this?
> 
> In my mind, we'd model the NIC's embedded bridge as, well, a bridge.
> And set anti-spoofing, port mirroring, port mac/vlan filtering, etc via
> that bridge.

If there was some way to push the bridge forwarding database down to the
underlying HW so that the filters could be programmed into the HW for
non-learning VEBs that would work too.

This hole has existed for a very long time, years now.  It'd be nice to get
it fixed.  If the community direction is to extend the current bridging
interface then that's fine, we'll go that way.

- Greg


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2011-11-30 23:00                         ` Chris Wright
  2011-11-30 23:19                           ` Rose, Gregory V
@ 2011-11-30 23:30                           ` Sridhar Samudrala
  2011-11-30 23:39                             ` Chris Wright
  1 sibling, 1 reply; 36+ messages in thread
From: Sridhar Samudrala @ 2011-11-30 23:30 UTC (permalink / raw)
  To: Chris Wright
  Cc: Ben Hutchings, Greg Rose, Roopa Prabhu, netdev@vger.kernel.org,
	davem@davemloft.net, dragos.tatulea@gmail.com,
	kvm@vger.kernel.org, arnd@arndb.de, mst@redhat.com,
	mchan@broadcom.com, dwang2@cisco.com, shemminger@vyatta.com,
	eric.dumazet@gmail.com, kaber@trash.net, benve@cisco.com

On 11/30/2011 3:00 PM, Chris Wright wrote:
> * Ben Hutchings (bhutchings@solarflare.com) wrote:
>> On Wed, 2011-11-30 at 13:04 -0800, Chris Wright wrote:
>>> I agree that it's confusing.  Couldn't you simplify your ascii art
>>> (hopefully removing hw assumptions about receive processing, and
>>> completely ignoring vlans for the moment) to something like:
>>>
>>>               |RX
>>>               v
>>> +------------+-------------+
>>> |     +------+--------+    |
>>> |     | RX MAC filter |    |
>>> |     |and port select|    |
>>> |     +---------------+    |
>>> |            /|\           |
>>> |           / | \   match 2|
>>> |          /  v  \         |
>>> |         /match  \        |
>>> |        /  1 |    \       |
>>> |       /     |     \      |
>>> |match /      |      \     |
>>> |  0  /       |       \    |
>>> |    v        |        v   |
>>> |    |        |        |   |
>>> +----+--------+--------+---+
>>>       |        |        |
>>>      PF       VF 1     VF 2
>>>
>>> And there's an unclear number of ways to update "RX MAC filter and port
>>> select" table.
>>>
>>> 1) PF ndo_set_mac_addr
>>> I expect that to be implicit to match 0.
>>>
>>> 2) PF ndo_set_rx_mode
>>> Less clear, but I'd still expect these to implicitly match 0
>>>
>>> 3) PF ndo_set_vf_mac
>>> I expect these to be an explicit match to VF N (given the interface
>>> specifices which VF's MAC is being programmed).
>> I'm not sure whether this is supposed to implicitly add to the MAC
>> filter or whether that has to be changed too.  That's the main
>> difference between my models (a) and (b).
> I see now.  I wasn't entirely clear on the difference before.  It's also
> going to be hw specific.  I think (Intel folks can verify) that the
> Intel SR-IOV devices have a single global unicast exact match table,
> for example.
>
>> There's also PF ndo_set_vf_vlan.
> Right, although I had mentioned I was trying to limit just to MAC
> filtering to simplify.
>
>>> 4) VF ndo_set_mac_addr
>>> This one may or may not be allowed (setting MAC+port if the VF is owned
>>> by a guest is likely not allowed), but would expect an implicit VF N.
>>>
>>> 5) VF ndo_set_rx_mode
>>> Same as 4) above.
>> So this is where we are today.
> Cool, good that we agree there.
>
>>> 6) PF or VF? ndo_set_rx_filter_addr
>>> The new proposal, which has an explicit VF, although when it's VF_SELF
>>> I'm not clear if this is just the same as 5) above?
>>>
>>> Have I missed anything?
>> Any physical port can be bridged to a mixture of guests with and without
>> their own VFs.  Packets sent from a guest with a VF to the address of a
>> guest without a VF need to be forwarded to the PF rather than the
>> physical port, but none of the drivers currently get to know about those
>> addresses.
> To clarify, do you mean something like this?
>
>         physical port
>               |
> +------------+------------+
> |         +-----+         |
> |         | VEB |         |
> |         +-----+         |
> |        /   |   \        |
> |       /    |    \       |
> |      /     |     \      |
> +-----+------+------+-----+
>        |      |       |
>       PF    VF 1    VF 2
>       /       |       |
>   +---+---+  VM4  +---+---+
>   |  sw   |       |macvtap|
>   | switch|       +---+---+
>   +-+-+-+-+           |
>     / | \            VM5
>    /  |  \
> VM1 VM2 VM3
>
> This has VMs 1-3 hanging of the PF via a linux bridge (traditional hv
> switching), VM4 directly owning VF1 (pci device assignement), and VM5
> indirectly owning VF2 (macvtap passthrough, that started this whole
> thing).
>
> So, I'm understanding you saying that VM4 or VM4 sending a packet to VM1
> goes in to VEB, out PF, and into linux bridging code, rigth?  At which
> point the PF is in promiscuous mode (btw, same does not work if bridge is
> attached to VF, at least for some VFs, due to lack of promiscuous mode).
>
>> Packets sent from a guest with a VF to the address of another guest with
>> a VF need to be forwarded similarly, but the driver should be able to
>> infer that from (3).
> Right, and that works currently for the case where both guests are like
> VM4, they directly own the VF via PCI device assignement.  But for VM4
> to talk to VM5, VF3 is not in promiscuous mode and has a different MAC
> address than VM5's vNIC.  If the embedded bridge does not learn, and
> nobody programmed it to fwd frames for VM5 via VF3...
I think you are referring to VF2. There is no VF3 in your picture.
In macvtap passthru mode, VF2 will be set to the same mac address as 
VM5's MAC.
So VM4 should be be able to talk to VM5.
>
> I believe this is what Roopa's patch will allow.  The question now is
> whether there's a better way to handle this?
My understanding is that Roopa's patch will allow setting additional mac 
addresses to
VM5 without the need to put VF5 in promiscous mode.

Thanks
Sridhar
>
> In my mind, we'd model the NIC's embedded bridge as, well, a bridge.
> And set anti-spoofing, port mirroring, port mac/vlan filtering, etc via
> that bridge.
>
> thanks,
> -chris
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2011-11-30 23:30                           ` Sridhar Samudrala
@ 2011-11-30 23:39                             ` Chris Wright
  0 siblings, 0 replies; 36+ messages in thread
From: Chris Wright @ 2011-11-30 23:39 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: Chris Wright, Ben Hutchings, Greg Rose, Roopa Prabhu,
	netdev@vger.kernel.org, davem@davemloft.net,
	dragos.tatulea@gmail.com, kvm@vger.kernel.org, arnd@arndb.de,
	mst@redhat.com, mchan@broadcom.com, dwang2@cisco.com,
	shemminger@vyatta.com, eric.dumazet@gmail.com, kaber@trash.net,
	benve@cisco.com

* Sridhar Samudrala (sri@us.ibm.com) wrote:
> On 11/30/2011 3:00 PM, Chris Wright wrote:
> >        physical port
> >              |
> >+------------+------------+
> >|         +-----+         |
> >|         | VEB |         |
> >|         +-----+         |
> >|        /   |   \        |
> >|       /    |    \       |
> >|      /     |     \      |
> >+-----+------+------+-----+
> >       |      |       |
> >      PF    VF 1    VF 2
> >      /       |       |
> >  +---+---+  VM4  +---+---+
> >  |  sw   |       |macvtap|
> >  | switch|       +---+---+
> >  +-+-+-+-+           |
> >    / | \            VM5
> >   /  |  \
> >VM1 VM2 VM3
> >
> >This has VMs 1-3 hanging of the PF via a linux bridge (traditional hv
> >switching), VM4 directly owning VF1 (pci device assignement), and VM5
> >indirectly owning VF2 (macvtap passthrough, that started this whole
> >thing).
> >
> >So, I'm understanding you saying that VM4 or VM4 sending a packet to VM1
> >goes in to VEB, out PF, and into linux bridging code, rigth?  At which
> >point the PF is in promiscuous mode (btw, same does not work if bridge is
> >attached to VF, at least for some VFs, due to lack of promiscuous mode).
> >
> >>Packets sent from a guest with a VF to the address of another guest with
> >>a VF need to be forwarded similarly, but the driver should be able to
> >>infer that from (3).
> >Right, and that works currently for the case where both guests are like
> >VM4, they directly own the VF via PCI device assignement.  But for VM4
> >to talk to VM5, VF3 is not in promiscuous mode and has a different MAC
> >address than VM5's vNIC.  If the embedded bridge does not learn, and
> >nobody programmed it to fwd frames for VM5 via VF3...
> I think you are referring to VF2. There is no VF3 in your picture.

*sigh*  (also meant 'VM4 or VM5' up above, not 'VM4 or VM4')...

> In macvtap passthru mode, VF2 will be set to the same mac address as VM5's
> MAC.  So VM4 should be be able to talk to VM5.

yes (i think macvtap in bridging or vepa mode w/ single VM has that issue,
not passthru)

> >I believe this is what Roopa's patch will allow.  The question now is
> >whether there's a better way to handle this?
> My understanding is that Roopa's patch will allow setting additional mac
> addresses to
> VM5 without the need to put VF5 in promiscous mode.

Thanks for your corrections Sridar.

cheers,
-chris

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2011-11-20 16:30   ` Roopa Prabhu
@ 2012-02-02  7:24     ` Michael S. Tsirkin
  2012-02-02  8:46       ` John Fastabend
  0 siblings, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2012-02-02  7:24 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Ben Hutchings, netdev, davem, chrisw, sri, dragos.tatulea, kvm,
	arnd, gregory.v.rose, mchan, dwang2, shemminger, eric.dumazet,
	kaber, benve

On Sun, Nov 20, 2011 at 08:30:24AM -0800, Roopa Prabhu wrote:
> 
> 
> 
> On 11/17/11 4:15 PM, "Ben Hutchings" <bhutchings@solarflare.com> wrote:
> 
> > Sorry to come to this rather late.
> > 
> > On Tue, 2011-11-08 at 23:55 -0800, Roopa Prabhu wrote:
> > [...]
> >> v2 -> v3
> >> - Moved set and get filter ops from rtnl_link_ops to netdev_ops
> >> - Support for SRIOV VFs.
> >>         [Note: The get filters msg (in the way current get rtnetlink handles
> >>         it) might get too big for SRIOV vfs. This patch follows existing
> >> sriov 
> >>         vf get code and tries to accomodate filters for all VF's in a PF.
> >>         And for the SRIOV case I have only tested the fact that the VF
> >>         arguments are getting delivered to rtnetlink correctly. The code
> >>         follows existing sriov vf handling code so rest of it should work
> >> fine]
> > [...]
> > 
> > This is already broken for large numbers of VFs, and increasing the
> > amount of information per VF is going to make the situation worse.  I am
> > no netlink expert but I think that the current approach of bundling all
> > information about an interface in a single message may not be
> > sustainable.
> 
> Yes agreed. I have the same concern.

So it seems that we need to extend the existing interface to allow
tweaking filters per VF. Does it need to block this
patchset though? After all, we'll need to support the existing
interface indefinitely, too.

-- 
MST

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2012-02-02  7:24     ` Michael S. Tsirkin
@ 2012-02-02  8:46       ` John Fastabend
  2012-02-02  8:50         ` Michael S. Tsirkin
                           ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: John Fastabend @ 2012-02-02  8:46 UTC (permalink / raw)
  To: Michael S. Tsirkin, Roopa Prabhu
  Cc: Ben Hutchings, netdev, davem, chrisw, sri, dragos.tatulea, kvm,
	arnd, gregory.v.rose, mchan, dwang2, shemminger, eric.dumazet,
	kaber, benve

On 2/1/2012 11:24 PM, Michael S. Tsirkin wrote:
> On Sun, Nov 20, 2011 at 08:30:24AM -0800, Roopa Prabhu wrote:
>>
>>
>>
>> On 11/17/11 4:15 PM, "Ben Hutchings" <bhutchings@solarflare.com> wrote:
>>
>>> Sorry to come to this rather late.
>>>
>>> On Tue, 2011-11-08 at 23:55 -0800, Roopa Prabhu wrote:
>>> [...]
>>>> v2 -> v3
>>>> - Moved set and get filter ops from rtnl_link_ops to netdev_ops
>>>> - Support for SRIOV VFs.
>>>>         [Note: The get filters msg (in the way current get rtnetlink handles
>>>>         it) might get too big for SRIOV vfs. This patch follows existing
>>>> sriov 
>>>>         vf get code and tries to accomodate filters for all VF's in a PF.
>>>>         And for the SRIOV case I have only tested the fact that the VF
>>>>         arguments are getting delivered to rtnetlink correctly. The code
>>>>         follows existing sriov vf handling code so rest of it should work
>>>> fine]
>>> [...]
>>>
>>> This is already broken for large numbers of VFs, and increasing the
>>> amount of information per VF is going to make the situation worse.  I am
>>> no netlink expert but I think that the current approach of bundling all
>>> information about an interface in a single message may not be
>>> sustainable.
>>
>> Yes agreed. I have the same concern.
> 
> So it seems that we need to extend the existing interface to allow
> tweaking filters per VF. Does it need to block this
> patchset though? After all, we'll need to support the existing

hmm not sure I follow what patchset is this blocking?

> interface indefinitely, too.
> 

OK finally got to read through this. And its not clear to me why we need
these per VF/PF filter netdevice ops and netlink extensions if we can
get the stacking correct. (Adding filters to the macvlan seems reasonable
to me)

In the cases I saw listed above I see a few enumerations:

PF <--> MACVLAN  <---> Guest <--- [...]

VF <--> MACVLAN  <---> Guest <--- [...]       

                    VF|Guest <--- [...]       direct assigned VF

                    PF|Guest <--- [...]       direct assigned PF

I used '[...]' to represent whatever additional stacking is done in the
guest unknown to the host. In the direct assign VF case (Greg Rose
correct me if I am wrong) the normal uc and mc addr lists should suffice
along with the netdev op ndo_set_rx_mode(). Here the guest adds MAC
addresses and/or VLANS as normal and then the VF<->PF back channel
should handle this if needed. This should work for Linux guests and other
OS's should do something similar.

In the direct assign PF case the hardware is owned by the guest so
no problems here.

This leaves the two MACVLAN cases which can be handled the same. If
the MACVLAN driver and netlink interface is extended to add filters
to the MACVLAN then the addresses can be pushed to the lower device
using the normal dev_uc_{add|del}() and dev_mc_{add|del}() routines.

I think this has some real advantages to the above scheme. First
we get rid of _all_ the drivers having to add a bunch of new
net_device ops and do it once in the layer above. This is nice
for driver implementers but also because your feature becomes usable
immediately and we don't have to wait for driver developers to implement
it.

Also it prunes down the number of netlink extensions being added
here. 

Additionally the existing semantics seem a bit strange to me on the
netlink message side. Taking a quick look at the macvlan implementation
it looks like every set has to have a complete list of address. But
the dev_uc_add and dev_uc_del seem to be using a refcnt scheme so
if I want to add a second address and then latter a third address
how does that work?

Is the expected flow from user space 'read uc_list -> write uc_list'?
This seems risky because with two adders in user space you might
lose addresses unless they are somehow kept in sync. IMHO it is likely
easier to implement an ADD and DEL attribute rather than a table
approach. Took a quick stab at something like this below but there
might be a better way to do this and allow direct modification of the
uc and mc lists I think means you could remove a uc address added
by some stacked device maybe a VLAN. (just guessing.)

Sorry if I missed something in the above thread I read most of it. And
maybe I missed something or oversimplified the problem.

Thanks,
John

+/* MACVLAN ADDRLIST management section 
+ *
+ * Contains attributes to expose multicast and unicast hardware
+ * RX address filters to user space.
+ *
+ * FIELDS:
+ * - IFLA_ADDRLIST_{UC|MC}
+ *
+ *   Read only attributes, returns currently set mc or uc addr list.
+ *
+ * - IFLA_ADDRLIST_{UC|MC}_ADD
+ *
+ *   Write only attributes, adds listed addresses to dev uc or mc
+ *   RX filter address lists.
+ *
+ * - IFLA_ADDRLIST_{UC|MC}_DEL
+ *
+ *   Write only attributes, deletes listed addresses in dev uc or
+ *   mc RX filter address lists.
+ *
+ * PRECEDENCE:
+ *
+ * Add operations are parsed before delete operations. Passing a
+ * single netlink message with a single address in both the add
+ * and del lists will result in an addresses being added and then
+ * removed.
+ *
+ * USAGE:
+ *
+ *     [IFLA_ADDRLISTS]
+ *             [IFLA_ADDRLIST_UC]
+ *                     [IFLA_ADDRLIST_ADDR], ...
+ *             [IFLA_ADDRLIST_UC_ADD]
+ *                     [IFLA_ADDRLIST_ADDR], ...
+ *             [IFLA_ADDRLIST_UC_DEL]
+ *                     [IFLA_ADDRLIST_ADDR}, ...
+ *             [IFLA_ADDRLIST_MC]
+ *                     [IFLA_ADDRLIST_ADDR], ...
+ *             [IFLA_ADDRLIST_MC_ADD]
+ *                     [IFLA_ADDRLIST_ADDR], ...
+ *             [IFLA_ADDRLIST_MC_DEL]
+ *                     [IFLA_ADDRLIST_ADDR}, ...
+ *
+ * NOTES:
+ *
+ * This interface exposes the uc and mc addresses. Addresses
+ * are handled with reference counting so adding the same address
+ * repeatedly will increment the reference count. No effort is
+ * made to determine if the address being deleted was not added
+ * by a stacked object earlier e.g. VLAN. This could for instance
+ * result in ingress VLAN traffic being dropped.
+ */

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2012-02-02  8:46       ` John Fastabend
@ 2012-02-02  8:50         ` Michael S. Tsirkin
  2012-02-02  9:04           ` John Fastabend
  2012-02-02 18:07         ` Roopa Prabhu
  2012-02-02 20:38         ` Ben Hutchings
  2 siblings, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2012-02-02  8:50 UTC (permalink / raw)
  To: John Fastabend
  Cc: Roopa Prabhu, Ben Hutchings, netdev, davem, chrisw, sri,
	dragos.tatulea, kvm, arnd, gregory.v.rose, mchan, dwang2,
	shemminger, eric.dumazet, kaber, benve

On Thu, Feb 02, 2012 at 12:46:57AM -0800, John Fastabend wrote:
> On 2/1/2012 11:24 PM, Michael S. Tsirkin wrote:
> > On Sun, Nov 20, 2011 at 08:30:24AM -0800, Roopa Prabhu wrote:
> >>
> >>
> >>
> >> On 11/17/11 4:15 PM, "Ben Hutchings" <bhutchings@solarflare.com> wrote:
> >>
> >>> Sorry to come to this rather late.
> >>>
> >>> On Tue, 2011-11-08 at 23:55 -0800, Roopa Prabhu wrote:
> >>> [...]
> >>>> v2 -> v3
> >>>> - Moved set and get filter ops from rtnl_link_ops to netdev_ops
> >>>> - Support for SRIOV VFs.
> >>>>         [Note: The get filters msg (in the way current get rtnetlink handles
> >>>>         it) might get too big for SRIOV vfs. This patch follows existing
> >>>> sriov 
> >>>>         vf get code and tries to accomodate filters for all VF's in a PF.
> >>>>         And for the SRIOV case I have only tested the fact that the VF
> >>>>         arguments are getting delivered to rtnetlink correctly. The code
> >>>>         follows existing sriov vf handling code so rest of it should work
> >>>> fine]
> >>> [...]
> >>>
> >>> This is already broken for large numbers of VFs, and increasing the
> >>> amount of information per VF is going to make the situation worse.  I am
> >>> no netlink expert but I think that the current approach of bundling all
> >>> information about an interface in a single message may not be
> >>> sustainable.
> >>
> >> Yes agreed. I have the same concern.
> > 
> > So it seems that we need to extend the existing interface to allow
> > tweaking filters per VF. Does it need to block this
> > patchset though? After all, we'll need to support the existing
> 
> hmm not sure I follow what patchset is this blocking?

The one you are replying to.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2012-02-02  8:50         ` Michael S. Tsirkin
@ 2012-02-02  9:04           ` John Fastabend
  0 siblings, 0 replies; 36+ messages in thread
From: John Fastabend @ 2012-02-02  9:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Roopa Prabhu, Ben Hutchings, netdev, davem, chrisw, sri,
	dragos.tatulea, kvm, arnd, gregory.v.rose, mchan, dwang2,
	shemminger, eric.dumazet, kaber, benve

On 2/2/2012 12:50 AM, Michael S. Tsirkin wrote:
> On Thu, Feb 02, 2012 at 12:46:57AM -0800, John Fastabend wrote:
>> On 2/1/2012 11:24 PM, Michael S. Tsirkin wrote:
>>> On Sun, Nov 20, 2011 at 08:30:24AM -0800, Roopa Prabhu wrote:
>>>>
>>>>
>>>>
>>>> On 11/17/11 4:15 PM, "Ben Hutchings" <bhutchings@solarflare.com> wrote:
>>>>
>>>>> Sorry to come to this rather late.
>>>>>
>>>>> On Tue, 2011-11-08 at 23:55 -0800, Roopa Prabhu wrote:
>>>>> [...]
>>>>>> v2 -> v3
>>>>>> - Moved set and get filter ops from rtnl_link_ops to netdev_ops
>>>>>> - Support for SRIOV VFs.
>>>>>>         [Note: The get filters msg (in the way current get rtnetlink handles
>>>>>>         it) might get too big for SRIOV vfs. This patch follows existing
>>>>>> sriov 
>>>>>>         vf get code and tries to accomodate filters for all VF's in a PF.
>>>>>>         And for the SRIOV case I have only tested the fact that the VF
>>>>>>         arguments are getting delivered to rtnetlink correctly. The code
>>>>>>         follows existing sriov vf handling code so rest of it should work
>>>>>> fine]
>>>>> [...]
>>>>>
>>>>> This is already broken for large numbers of VFs, and increasing the
>>>>> amount of information per VF is going to make the situation worse.  I am
>>>>> no netlink expert but I think that the current approach of bundling all
>>>>> information about an interface in a single message may not be
>>>>> sustainable.
>>>>
>>>> Yes agreed. I have the same concern.
>>>
>>> So it seems that we need to extend the existing interface to allow
>>> tweaking filters per VF. Does it need to block this
>>> patchset though? After all, we'll need to support the existing
>>
>> hmm not sure I follow what patchset is this blocking?
> 
> The one you are replying to.

Gotcha that would seem OK to me although I think you can avoid it altogether.

.John

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2012-02-02  8:46       ` John Fastabend
  2012-02-02  8:50         ` Michael S. Tsirkin
@ 2012-02-02 18:07         ` Roopa Prabhu
  2012-02-02 18:58           ` John Fastabend
  2012-02-02 20:38         ` Ben Hutchings
  2 siblings, 1 reply; 36+ messages in thread
From: Roopa Prabhu @ 2012-02-02 18:07 UTC (permalink / raw)
  To: John Fastabend, Michael S. Tsirkin
  Cc: Ben Hutchings, netdev, davem, chrisw, sri, dragos.tatulea, kvm,
	arnd, gregory.v.rose, mchan, dwang2, shemminger, eric.dumazet,
	kaber, benve




On 2/2/12 12:46 AM, "John Fastabend" <john.r.fastabend@intel.com> wrote:

> On 2/1/2012 11:24 PM, Michael S. Tsirkin wrote:
>> On Sun, Nov 20, 2011 at 08:30:24AM -0800, Roopa Prabhu wrote:
>>> 
>>> 
>>> 
>>> On 11/17/11 4:15 PM, "Ben Hutchings" <bhutchings@solarflare.com> wrote:
>>> 
>>>> Sorry to come to this rather late.
>>>> 
>>>> On Tue, 2011-11-08 at 23:55 -0800, Roopa Prabhu wrote:
>>>> [...]
>>>>> v2 -> v3
>>>>> - Moved set and get filter ops from rtnl_link_ops to netdev_ops
>>>>> - Support for SRIOV VFs.
>>>>>         [Note: The get filters msg (in the way current get rtnetlink
>>>>> handles
>>>>>         it) might get too big for SRIOV vfs. This patch follows existing
>>>>> sriov 
>>>>>         vf get code and tries to accomodate filters for all VF's in a PF.
>>>>>         And for the SRIOV case I have only tested the fact that the VF
>>>>>         arguments are getting delivered to rtnetlink correctly. The code
>>>>>         follows existing sriov vf handling code so rest of it should work
>>>>> fine]
>>>> [...]
>>>> 
>>>> This is already broken for large numbers of VFs, and increasing the
>>>> amount of information per VF is going to make the situation worse.  I am
>>>> no netlink expert but I think that the current approach of bundling all
>>>> information about an interface in a single message may not be
>>>> sustainable.
>>> 
>>> Yes agreed. I have the same concern.
>> 
>> So it seems that we need to extend the existing interface to allow
>> tweaking filters per VF. Does it need to block this
>> patchset though? After all, we'll need to support the existing
> 
> hmm not sure I follow what patchset is this blocking?
> 
>> interface indefinitely, too.
>> 
> 
> OK finally got to read through this. And its not clear to me why we need
> these per VF/PF filter netdevice ops and netlink extensions if we can
> get the stacking correct. (Adding filters to the macvlan seems reasonable
> to me)
> 
> In the cases I saw listed above I see a few enumerations:
> 
> PF <--> MACVLAN  <---> Guest <--- [...]
> 
> VF <--> MACVLAN  <---> Guest <--- [...]
> 
>                     VF|Guest <--- [...]       direct assigned VF
> 
>                     PF|Guest <--- [...]       direct assigned PF
> 
> 
> I used '[...]' to represent whatever additional stacking is done in the
> guest unknown to the host. In the direct assign VF case (Greg Rose
> correct me if I am wrong) the normal uc and mc addr lists should suffice
> along with the netdev op ndo_set_rx_mode(). Here the guest adds MAC
> addresses and/or VLANS as normal and then the VF<->PF back channel
> should handle this if needed. This should work for Linux guests and other
> OS's should do something similar.
> 
> In the direct assign PF case the hardware is owned by the guest so
> no problems here.
> 
> This leaves the two MACVLAN cases which can be handled the same. If
> the MACVLAN driver and netlink interface is extended to add filters
> to the MACVLAN then the addresses can be pushed to the lower device
> using the normal dev_uc_{add|del}() and dev_mc_{add|del}() routines.

My patches were trying to do just this (unless I am missing something).

> 
> I think this has some real advantages to the above scheme. First
> we get rid of _all_ the drivers having to add a bunch of new
> net_device ops and do it once in the layer above. This is nice
> for driver implementers but also because your feature becomes usable
> immediately and we don't have to wait for driver developers to implement
> it.

Yes my patches were targeting towards this too. I had macvlan implement the
netlink ops and macvlan internally was using the dev_uc_add and del routines
to pass the addr lists to lower device.

> 
> Also it prunes down the number of netlink extensions being added
> here. 
> 
> Additionally the existing semantics seem a bit strange to me on the
> netlink message side. Taking a quick look at the macvlan implementation
> it looks like every set has to have a complete list of address. But
> the dev_uc_add and dev_uc_del seem to be using a refcnt scheme so
> if I want to add a second address and then latter a third address
> how does that work?

Every set has a complete list of addresses because, for macvlan non-passthru
modes, in future we might want to have macvlan driver do the filtering (This
is for the case when we have a single lower device and multiple macvlans)

> 
> Is the expected flow from user space 'read uc_list -> write uc_list'?
> This seems risky because with two adders in user space you might
> lose addresses unless they are somehow kept in sync. IMHO it is likely
> easier to implement an ADD and DEL attribute rather than a table
> approach.

The ADD and DEL will work for macvlan passthru mode because it maps 1-1 with
the lowerdev uc and mc list. The table was for non passthru modes when
macvlan driver might need to do filtering. So my patchset started with
macvlan filter table for all macvlan modes (hopefully) with passthru mode as
a specific case of offloading everything to the lowerdevice.

 Also the table was mimicking existing tap device filter table for macvtap.

> Took a quick stab at something like this below but there
> might be a better way to do this and allow direct modification of the
> uc and mc lists I think means you could remove a uc address added
> by some stacked device maybe a VLAN. (just guessing.)
> 
> Sorry if I missed something in the above thread I read most of it. And
> maybe I missed something or oversimplified the problem.

I might be overcomplicating things :). I have had no time to look at this
again. I had started with looking at using current interfaces and I hadn't
found anything straight forward. But was planning to look at it again.

> 
> Thanks,
> John
> 
> 
> 
> +/* MACVLAN ADDRLIST management section
> + *
> + * Contains attributes to expose multicast and unicast hardware
> + * RX address filters to user space.
> + *
> + * FIELDS:
> + * - IFLA_ADDRLIST_{UC|MC}
> + *
> + *   Read only attributes, returns currently set mc or uc addr list.
> + *
> + * - IFLA_ADDRLIST_{UC|MC}_ADD
> + *
> + *   Write only attributes, adds listed addresses to dev uc or mc
> + *   RX filter address lists.
> + *
> + * - IFLA_ADDRLIST_{UC|MC}_DEL
> + *
> + *   Write only attributes, deletes listed addresses in dev uc or
> + *   mc RX filter address lists.
> + *
> + * PRECEDENCE:
> + *
> + * Add operations are parsed before delete operations. Passing a
> + * single netlink message with a single address in both the add
> + * and del lists will result in an addresses being added and then
> + * removed.
> + *
> + * USAGE:
> + *
> + *     [IFLA_ADDRLISTS]
> + *             [IFLA_ADDRLIST_UC]
> + *                     [IFLA_ADDRLIST_ADDR], ...
> + *             [IFLA_ADDRLIST_UC_ADD]
> + *                     [IFLA_ADDRLIST_ADDR], ...
> + *             [IFLA_ADDRLIST_UC_DEL]
> + *                     [IFLA_ADDRLIST_ADDR}, ...
> + *             [IFLA_ADDRLIST_MC]
> + *                     [IFLA_ADDRLIST_ADDR], ...
> + *             [IFLA_ADDRLIST_MC_ADD]
> + *                     [IFLA_ADDRLIST_ADDR], ...
> + *             [IFLA_ADDRLIST_MC_DEL]
> + *                     [IFLA_ADDRLIST_ADDR}, ...
> + *
> + * NOTES:
> + *
> + * This interface exposes the uc and mc addresses. Addresses
> + * are handled with reference counting so adding the same address
> + * repeatedly will increment the reference count. No effort is
> + * made to determine if the address being deleted was not added
> + * by a stacked object earlier e.g. VLAN. This could for instance
> + * result in ingress VLAN traffic being dropped.
> + */

In general since we don't have a netlink mechanism to add del mc/uc addr
list from userspace (which I was looking for in the first place initially)
such mechanism will be good to have too. I will also think about this some
more.

Thanks,
Roopa

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2012-02-02 18:07         ` Roopa Prabhu
@ 2012-02-02 18:58           ` John Fastabend
  2012-02-03 15:32             ` Roopa Prabhu
  0 siblings, 1 reply; 36+ messages in thread
From: John Fastabend @ 2012-02-02 18:58 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Michael S. Tsirkin, Ben Hutchings, netdev, davem, chrisw, sri,
	dragos.tatulea, kvm, arnd, gregory.v.rose, mchan, dwang2,
	shemminger, eric.dumazet, kaber, benve

On 2/2/2012 10:07 AM, Roopa Prabhu wrote:
> 
> 
> 
> On 2/2/12 12:46 AM, "John Fastabend" <john.r.fastabend@intel.com> wrote:
> 
>> On 2/1/2012 11:24 PM, Michael S. Tsirkin wrote:
>>> On Sun, Nov 20, 2011 at 08:30:24AM -0800, Roopa Prabhu wrote:
>>>>
>>>>
>>>>
>>>> On 11/17/11 4:15 PM, "Ben Hutchings" <bhutchings@solarflare.com> wrote:
>>>>
>>>>> Sorry to come to this rather late.
>>>>>
>>>>> On Tue, 2011-11-08 at 23:55 -0800, Roopa Prabhu wrote:
>>>>> [...]
>>>>>> v2 -> v3
>>>>>> - Moved set and get filter ops from rtnl_link_ops to netdev_ops
>>>>>> - Support for SRIOV VFs.
>>>>>>         [Note: The get filters msg (in the way current get rtnetlink
>>>>>> handles
>>>>>>         it) might get too big for SRIOV vfs. This patch follows existing
>>>>>> sriov 
>>>>>>         vf get code and tries to accomodate filters for all VF's in a PF.
>>>>>>         And for the SRIOV case I have only tested the fact that the VF
>>>>>>         arguments are getting delivered to rtnetlink correctly. The code
>>>>>>         follows existing sriov vf handling code so rest of it should work
>>>>>> fine]
>>>>> [...]
>>>>>
>>>>> This is already broken for large numbers of VFs, and increasing the
>>>>> amount of information per VF is going to make the situation worse.  I am
>>>>> no netlink expert but I think that the current approach of bundling all
>>>>> information about an interface in a single message may not be
>>>>> sustainable.
>>>>
>>>> Yes agreed. I have the same concern.
>>>
>>> So it seems that we need to extend the existing interface to allow
>>> tweaking filters per VF. Does it need to block this
>>> patchset though? After all, we'll need to support the existing
>>
>> hmm not sure I follow what patchset is this blocking?
>>
>>> interface indefinitely, too.
>>>
>>
>> OK finally got to read through this. And its not clear to me why we need
>> these per VF/PF filter netdevice ops and netlink extensions if we can
>> get the stacking correct. (Adding filters to the macvlan seems reasonable
>> to me)
>>
>> In the cases I saw listed above I see a few enumerations:
>>
>> PF <--> MACVLAN  <---> Guest <--- [...]
>>
>> VF <--> MACVLAN  <---> Guest <--- [...]
>>
>>                     VF|Guest <--- [...]       direct assigned VF
>>
>>                     PF|Guest <--- [...]       direct assigned PF
>>
>>
>> I used '[...]' to represent whatever additional stacking is done in the
>> guest unknown to the host. In the direct assign VF case (Greg Rose
>> correct me if I am wrong) the normal uc and mc addr lists should suffice
>> along with the netdev op ndo_set_rx_mode(). Here the guest adds MAC
>> addresses and/or VLANS as normal and then the VF<->PF back channel
>> should handle this if needed. This should work for Linux guests and other
>> OS's should do something similar.
>>
>> In the direct assign PF case the hardware is owned by the guest so
>> no problems here.
>>
>> This leaves the two MACVLAN cases which can be handled the same. If
>> the MACVLAN driver and netlink interface is extended to add filters
>> to the MACVLAN then the addresses can be pushed to the lower device
>> using the normal dev_uc_{add|del}() and dev_mc_{add|del}() routines.
> 
> My patches were trying to do just this (unless I am missing something).
> 

Right I was trying enumerate the cases. Your patches 5,6 seem to use
dev_{uc|mc}_{add|del} like this.

>>
>> I think this has some real advantages to the above scheme. First
>> we get rid of _all_ the drivers having to add a bunch of new
>> net_device ops and do it once in the layer above. This is nice
>> for driver implementers but also because your feature becomes usable
>> immediately and we don't have to wait for driver developers to implement
>> it.
> 
> Yes my patches were targeting towards this too. I had macvlan implement the
> netlink ops and macvlan internally was using the dev_uc_add and del routines
> to pass the addr lists to lower device.

Yes. But I am missing why the VF ops and netlink extensions are useful. Or
even the op/netlink extension into the PF for that matter.

> 
>>
>> Also it prunes down the number of netlink extensions being added
>> here. 
>>
>> Additionally the existing semantics seem a bit strange to me on the
>> netlink message side. Taking a quick look at the macvlan implementation
>> it looks like every set has to have a complete list of address. But
>> the dev_uc_add and dev_uc_del seem to be using a refcnt scheme so
>> if I want to add a second address and then latter a third address
>> how does that work?
> 
> Every set has a complete list of addresses because, for macvlan non-passthru
> modes, in future we might want to have macvlan driver do the filtering (This
> is for the case when we have a single lower device and multiple macvlans)
> 

hmm but lists seem problematic when hooked up to the netdev uc and mc addr
lists. Consider this case

read uc_list  <--- thread1: dumps unicast table
add vlan      <--- thread2: adds a vlan maybe inserting a uc addr
write uc_list <--- thread1: writes the table back + 1 addr

Does the uc addr of the vlan get deleted? And this case

read uc_list   <--- dump table
write uc_list  <--- add a new filter A to the uc list
read uc_list   <--- dump table
write uc_list  <--- add a second filter B to the uc list

Now based on your patch 4,5 it looks like the refcnt on the address A is
two so to remove it I have to call set filters twice without the A addr.

read  uc_list   <--- dump table
write uc_list   <--- list without A
write uc_list   <--- list without A

This seems really easy to get screwed up and it doesn't look like user
space can learn the refcnt (at least in this series).


>>
>> Is the expected flow from user space 'read uc_list -> write uc_list'?
>> This seems risky because with two adders in user space you might
>> lose addresses unless they are somehow kept in sync. IMHO it is likely
>> easier to implement an ADD and DEL attribute rather than a table
>> approach.
> 
> The ADD and DEL will work for macvlan passthru mode because it maps 1-1 with
> the lowerdev uc and mc list. The table was for non passthru modes when
> macvlan driver might need to do filtering. So my patchset started with
> macvlan filter table for all macvlan modes (hopefully) with passthru mode as
> a specific case of offloading everything to the lowerdevice.
> 

Still this doesn't require a table right. Repeated ADD/DEL should work correct?

>  Also the table was mimicking existing tap device filter table for macvtap.
> 

But the tap filter isn't directly manipulating the uc/mc betdev tables. I think
this is a key difference.

>> Took a quick stab at something like this below but there
>> might be a better way to do this and allow direct modification of the
>> uc and mc lists I think means you could remove a uc address added
>> by some stacked device maybe a VLAN. (just guessing.)
>>
>> Sorry if I missed something in the above thread I read most of it. And
>> maybe I missed something or oversimplified the problem.
> 
> I might be overcomplicating things :). I have had no time to look at this
> again. I had started with looking at using current interfaces and I hadn't
> found anything straight forward. But was planning to look at it again.
> 

I'm wondering if this is really just a macvlan specific thing after all. I
think your first v1 series was more closely done like this.

>>
>> Thanks,
>> John
>>
>>
>>
>> +/* MACVLAN ADDRLIST management section
>> + *
>> + * Contains attributes to expose multicast and unicast hardware
>> + * RX address filters to user space.
>> + *
>> + * FIELDS:
>> + * - IFLA_ADDRLIST_{UC|MC}
>> + *
>> + *   Read only attributes, returns currently set mc or uc addr list.
>> + *
>> + * - IFLA_ADDRLIST_{UC|MC}_ADD
>> + *
>> + *   Write only attributes, adds listed addresses to dev uc or mc
>> + *   RX filter address lists.
>> + *
>> + * - IFLA_ADDRLIST_{UC|MC}_DEL
>> + *
>> + *   Write only attributes, deletes listed addresses in dev uc or
>> + *   mc RX filter address lists.
>> + *
>> + * PRECEDENCE:
>> + *
>> + * Add operations are parsed before delete operations. Passing a
>> + * single netlink message with a single address in both the add
>> + * and del lists will result in an addresses being added and then
>> + * removed.
>> + *
>> + * USAGE:
>> + *
>> + *     [IFLA_ADDRLISTS]
>> + *             [IFLA_ADDRLIST_UC]
>> + *                     [IFLA_ADDRLIST_ADDR], ...
>> + *             [IFLA_ADDRLIST_UC_ADD]
>> + *                     [IFLA_ADDRLIST_ADDR], ...
>> + *             [IFLA_ADDRLIST_UC_DEL]
>> + *                     [IFLA_ADDRLIST_ADDR}, ...
>> + *             [IFLA_ADDRLIST_MC]
>> + *                     [IFLA_ADDRLIST_ADDR], ...
>> + *             [IFLA_ADDRLIST_MC_ADD]
>> + *                     [IFLA_ADDRLIST_ADDR], ...
>> + *             [IFLA_ADDRLIST_MC_DEL]
>> + *                     [IFLA_ADDRLIST_ADDR}, ...
>> + *
>> + * NOTES:
>> + *
>> + * This interface exposes the uc and mc addresses. Addresses
>> + * are handled with reference counting so adding the same address
>> + * repeatedly will increment the reference count. No effort is
>> + * made to determine if the address being deleted was not added
>> + * by a stacked object earlier e.g. VLAN. This could for instance
>> + * result in ingress VLAN traffic being dropped.
>> + */
> 
> In general since we don't have a netlink mechanism to add del mc/uc addr
> list from userspace (which I was looking for in the first place initially)
> such mechanism will be good to have too. I will also think about this some
> more.
> 

Are you sure they will be good to have? I'm  not so sure you want to be
able to manipulate the uc and mc tables from user space. MACVLAN seems to
be one type of device where it is useful but doing this to a PF or VF seems
hard to use for any real use case. Fun to test the embedded bridge though.

> Thanks,
> Roopa
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2012-02-02  8:46       ` John Fastabend
  2012-02-02  8:50         ` Michael S. Tsirkin
  2012-02-02 18:07         ` Roopa Prabhu
@ 2012-02-02 20:38         ` Ben Hutchings
  2012-02-02 21:18           ` John Fastabend
  2 siblings, 1 reply; 36+ messages in thread
From: Ben Hutchings @ 2012-02-02 20:38 UTC (permalink / raw)
  To: John Fastabend
  Cc: Michael S. Tsirkin, Roopa Prabhu, netdev, davem, chrisw, sri,
	dragos.tatulea, kvm, arnd, gregory.v.rose, mchan, dwang2,
	shemminger, eric.dumazet, kaber, benve

On Thu, 2012-02-02 at 00:46 -0800, John Fastabend wrote:
[...]
> OK finally got to read through this. And its not clear to me why we need
> these per VF/PF filter netdevice ops and netlink extensions if we can
> get the stacking correct. (Adding filters to the macvlan seems reasonable
> to me)
> 
> In the cases I saw listed above I see a few enumerations:
> 
> PF <--> MACVLAN  <---> Guest <--- [...]
> 
> VF <--> MACVLAN  <---> Guest <--- [...]       
> 
>                     VF|Guest <--- [...]       direct assigned VF
> 
>                     PF|Guest <--- [...]       direct assigned PF
> 
> 
> I used '[...]' to represent whatever additional stacking is done in the
> guest unknown to the host. In the direct assign VF case (Greg Rose
> correct me if I am wrong) the normal uc and mc addr lists should suffice
> along with the netdev op ndo_set_rx_mode(). Here the guest adds MAC
> addresses and/or VLANS as normal and then the VF<->PF back channel
> should handle this if needed. This should work for Linux guests and other
> OS's should do something similar.
> 
> In the direct assign PF case the hardware is owned by the guest so
> no problems here.
> 
> This leaves the two MACVLAN cases which can be handled the same. If
> the MACVLAN driver and netlink interface is extended to add filters
> to the MACVLAN then the addresses can be pushed to the lower device
> using the normal dev_uc_{add|del}() and dev_mc_{add|del}() routines.
[...]

There is another case: hybrid acceleration.  Without a bridge in the
NIC, you need a software bridge for multicast/broadcast replication,
traffic between guests, and traffic between guest and host.  A guest
driver can then send and receive to remote addresses through a VF while
retaining fallback to the software bridge.

In order to do this, the guest driver needs to know which addresses are
local.  The net driver for the PF can tell it about the addresses
assigned to each function, but if there are other devices included in
the bridge then it will not know about them.

In Solarflare's current out-of-tree implementation this is dealt with in
an extension to libvirt that writes the additional 'local' MAC addresses
to a driver-specific sysfs file, but that is obviously not likely to be
acceptable in-tree!  So I was interested in this proposal to extend MAC
filtering, but wanted to get the semantics clear.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2012-02-02 20:38         ` Ben Hutchings
@ 2012-02-02 21:18           ` John Fastabend
  0 siblings, 0 replies; 36+ messages in thread
From: John Fastabend @ 2012-02-02 21:18 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Michael S. Tsirkin, Roopa Prabhu, netdev, davem, chrisw, sri,
	dragos.tatulea, kvm, arnd, gregory.v.rose, mchan, dwang2,
	shemminger, eric.dumazet, kaber, benve

On 2/2/2012 12:38 PM, Ben Hutchings wrote:
> On Thu, 2012-02-02 at 00:46 -0800, John Fastabend wrote:
> [...]
>> OK finally got to read through this. And its not clear to me why we need
>> these per VF/PF filter netdevice ops and netlink extensions if we can
>> get the stacking correct. (Adding filters to the macvlan seems reasonable
>> to me)
>>
>> In the cases I saw listed above I see a few enumerations:
>>
>> PF <--> MACVLAN  <---> Guest <--- [...]
>>
>> VF <--> MACVLAN  <---> Guest <--- [...]       
>>
>>                     VF|Guest <--- [...]       direct assigned VF
>>
>>                     PF|Guest <--- [...]       direct assigned PF
>>
>>
>> I used '[...]' to represent whatever additional stacking is done in the
>> guest unknown to the host. In the direct assign VF case (Greg Rose
>> correct me if I am wrong) the normal uc and mc addr lists should suffice
>> along with the netdev op ndo_set_rx_mode(). Here the guest adds MAC
>> addresses and/or VLANS as normal and then the VF<->PF back channel
>> should handle this if needed. This should work for Linux guests and other
>> OS's should do something similar.
>>
>> In the direct assign PF case the hardware is owned by the guest so
>> no problems here.
>>
>> This leaves the two MACVLAN cases which can be handled the same. If
>> the MACVLAN driver and netlink interface is extended to add filters
>> to the MACVLAN then the addresses can be pushed to the lower device
>> using the normal dev_uc_{add|del}() and dev_mc_{add|del}() routines.
> [...]
> 
> There is another case: hybrid acceleration.  Without a bridge in the
> NIC, you need a software bridge for multicast/broadcast replication,
> traffic between guests, and traffic between guest and host.  A guest
> driver can then send and receive to remote addresses through a VF while
> retaining fallback to the software bridge.
> 
> In order to do this, the guest driver needs to know which addresses are
> local.  The net driver for the PF can tell it about the addresses
> assigned to each function, but if there are other devices included in
> the bridge then it will not know about them.

Off the top of my head another approach would be to add a flag to the
PF maybe NETIF_F_FDB_OFFLOADED and have the bridge push the fdb updates
down to the PF which could propagate them into the VFs. Then you don't
need any new netdevice ops or netlink extensions.

This also would allow any 'learned' addresses to be pushed into the
VF and your daemon wouldn't have to monitor the bridges FDB and send
updates down.

> 
> In Solarflare's current out-of-tree implementation this is dealt with in
> an extension to libvirt that writes the additional 'local' MAC addresses
> to a driver-specific sysfs file, but that is obviously not likely to be
> acceptable in-tree!  So I was interested in this proposal to extend MAC
> filtering, but wanted to get the semantics clear.

Agreed. Intel devices needs something similar to handle the bridge cases
where the MAC addresses of the virtual devices are not getting programmed
into the hardware forwarding table.

So we need something its just finding the right semantics and mechanism.

> 
> Ben.
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2012-02-02 18:58           ` John Fastabend
@ 2012-02-03 15:32             ` Roopa Prabhu
  2012-02-05 16:54               ` Roopa Prabhu
  0 siblings, 1 reply; 36+ messages in thread
From: Roopa Prabhu @ 2012-02-03 15:32 UTC (permalink / raw)
  To: John Fastabend
  Cc: Michael S. Tsirkin, Ben Hutchings, netdev, davem, chrisw, sri,
	dragos.tatulea, kvm, arnd, gregory.v.rose, mchan, dwang2,
	shemminger, eric.dumazet, kaber, benve




On 2/2/12 10:58 AM, "John Fastabend" <john.r.fastabend@intel.com> wrote:

> On 2/2/2012 10:07 AM, Roopa Prabhu wrote:

<snip>..
>> 
>> My patches were trying to do just this (unless I am missing something).
>> 
> 
> Right I was trying enumerate the cases. Your patches 5,6 seem to use
> dev_{uc|mc}_{add|del} like this.
> 
Ok

>>> 
>>> I think this has some real advantages to the above scheme. First
>>> we get rid of _all_ the drivers having to add a bunch of new
>>> net_device ops and do it once in the layer above. This is nice
>>> for driver implementers but also because your feature becomes usable
>>> immediately and we don't have to wait for driver developers to implement
>>> it.
>> 
>> Yes my patches were targeting towards this too. I had macvlan implement the
>> netlink ops and macvlan internally was using the dev_uc_add and del routines
>> to pass the addr lists to lower device.
> 
> Yes. But I am missing why the VF ops and netlink extensions are useful. Or
> even the op/netlink extension into the PF for that matter.
> 

This was to support something that intel wanted. I think Ben described that
in another email. This was done by v3 version of the patch. This is not
needed for the macvlan problem that this patch is trying to solve. We were
trying to make the new netlink interface fit other use cases if possible.

I believe at this point we are convinced that the hybrid acceleration with
PF/VF that ben described can be solved in possibly other ways.

>> 
>>> 
>>> Also it prunes down the number of netlink extensions being added
>>> here. 
>>> 
>>> Additionally the existing semantics seem a bit strange to me on the
>>> netlink message side. Taking a quick look at the macvlan implementation
>>> it looks like every set has to have a complete list of address. But
>>> the dev_uc_add and dev_uc_del seem to be using a refcnt scheme so
>>> if I want to add a second address and then latter a third address
>>> how does that work?
>> 
>> Every set has a complete list of addresses because, for macvlan non-passthru
>> modes, in future we might want to have macvlan driver do the filtering (This
>> is for the case when we have a single lower device and multiple macvlans)
>> 
> 
> hmm but lists seem problematic when hooked up to the netdev uc and mc addr
> lists. Consider this case
> 
> read uc_list  <--- thread1: dumps unicast table
> add vlan      <--- thread2: adds a vlan maybe inserting a uc addr
> write uc_list <--- thread1: writes the table back + 1 addr
> 
> Does the uc addr of the vlan get deleted? And this case

It should.

> 
> read uc_list   <--- dump table
> write uc_list  <--- add a new filter A to the uc list
> read uc_list   <--- dump table
> write uc_list  <--- add a second filter B to the uc list
> 
> Now based on your patch 4,5 it looks like the refcnt on the address A is
> two so to remove it I have to call set filters twice without the A addr.

I think this depends on the implementation of the handler. The macvtap
handler will remove A and add B if A is not part of the second filter.


> 
> read  uc_list   <--- dump table
> write uc_list   <--- list without A
> write uc_list   <--- list without A
> 
> This seems really easy to get screwed up and it doesn't look like user
> space can learn the refcnt (at least in this series).
> 
> 


the sequences you are describing above are possible but they depend on how
you implement the filter handler I think.
For macvlan, this filter op could just populate the internal filter.
Except that for passthru mode it tries to program the lower dev hw filter
using uc/mc lists. And in passthru mode it assumes that it owns the lower
device, and tries to make sure that it adds or deletes only addresses it
knows about. I think if you have another thread also adding/deleting address
to the lowerdev when it is assigned to macvtap in passthru mode, the other
thread might see inconsistent results.

The netlink filter interface which the patch was trying to add was not a
replacement for existing uc/mc lists. It was really targeted to virtual
devices that want to do filtering on their own.
I believe uc/mc lists are used to set/unset hw filters and they are not used
in fwding lookups in the kernel (pls correct me if I am wrong). The filter
that this netlink msg was trying to set was for devices that want to do
filtering/fwding lookups like the hw.

>>> 
>>> Is the expected flow from user space 'read uc_list -> write uc_list'?
>>> This seems risky because with two adders in user space you might
>>> lose addresses unless they are somehow kept in sync. IMHO it is likely
>>> easier to implement an ADD and DEL attribute rather than a table
>>> approach.
>> 
>> The ADD and DEL will work for macvlan passthru mode because it maps 1-1 with
>> the lowerdev uc and mc list. The table was for non passthru modes when
>> macvlan driver might need to do filtering. So my patchset started with
>> macvlan filter table for all macvlan modes (hopefully) with passthru mode as
>> a specific case of offloading everything to the lowerdevice.
>> 
> 
> Still this doesn't require a table right. Repeated ADD/DEL should work
> correct?
> 
Yes that's correct. You could still have an ADD/DEL interface to populate
the table.

>>  Also the table was mimicking existing tap device filter table for macvtap.
>> 
> 
> But the tap filter isn't directly manipulating the uc/mc betdev tables. I
> think
> this is a key difference.
> 
Yes. The uc/mc list came into picture only for macvlan passthru mode because
there is a 1-1 mapping between hw  dev and macvtap and its simpler to just
pass the addr filters to the lowerdev to program the hw via mc/uc.

>>> Took a quick stab at something like this below but there
>>> might be a better way to do this and allow direct modification of the
>>> uc and mc lists I think means you could remove a uc address added
>>> by some stacked device maybe a VLAN. (just guessing.)
>>> 
>>> Sorry if I missed something in the above thread I read most of it. And
>>> maybe I missed something or oversimplified the problem.
>> 
>> I might be overcomplicating things :). I have had no time to look at this
>> again. I had started with looking at using current interfaces and I hadn't
>> found anything straight forward. But was planning to look at it again.
>> 
> 
> I'm wondering if this is really just a macvlan specific thing after all. I
> think your first v1 series was more closely done like this.
> 
Yes that's correct. I still think v1 might be cleaner. But there we had to
extend the tun filter interface to handle vlans too. And michael had a point
when he said that we should probably add newer netlink interfaces instead of
extending the existing ioctl interfaces.


>> 
>> In general since we don't have a netlink mechanism to add del mc/uc addr
>> list from userspace (which I was looking for in the first place initially)
>> such mechanism will be good to have too. I will also think about this some
>> more.
>> 
> 
> Are you sure they will be good to have? I'm  not so sure you want to be
> able to manipulate the uc and mc tables from user space. MACVLAN seems to
> be one type of device where it is useful but doing this to a PF or VF seems
> hard to use for any real use case. Fun to test the embedded bridge though.
> 

I wont say I am sure. Would be nice have to have netlink interfaces to
ADD/DEL additional uc, mc addrs, filter flags and vlans. I have looked at
the existing interfaces and nothing seemed straightforward then. But I
forget and need to take a look again.
I think vlans and filter flags is somehow possible today. And maybe mc too.
But if I am right we don't have a way to add additional unicast addresses
from userspace. 

I will dig my notes and try and list down the problems with using the
existing netlink interfaces for this.

Thanks,
Roopa

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2012-02-03 15:32             ` Roopa Prabhu
@ 2012-02-05 16:54               ` Roopa Prabhu
  2012-02-09  2:03                 ` John Fastabend
  0 siblings, 1 reply; 36+ messages in thread
From: Roopa Prabhu @ 2012-02-05 16:54 UTC (permalink / raw)
  To: Roopa Prabhu, John Fastabend
  Cc: Michael S. Tsirkin, Ben Hutchings, netdev, davem, chrisw, sri,
	dragos.tatulea, kvm, arnd, gregory.v.rose, mchan, dwang2,
	shemminger, eric.dumazet, kaber, benve

On 2/3/12 7:32 AM, "Roopa Prabhu" <roprabhu@cisco.com> wrote:

> 
> 
> 
> On 2/2/12 10:58 AM, "John Fastabend" <john.r.fastabend@intel.com> wrote:
<snip>..

>> Are you sure they will be good to have? I'm  not so sure you want to be
>> able to manipulate the uc and mc tables from user space. MACVLAN seems to
>> be one type of device where it is useful but doing this to a PF or VF seems
>> hard to use for any real use case. Fun to test the embedded bridge though.
>> 
> 
> I wont say I am sure. Would be nice have to have netlink interfaces to
> ADD/DEL additional uc, mc addrs, filter flags and vlans. I have looked at
> the existing interfaces and nothing seemed straightforward then. But I
> forget and need to take a look again.
> I think vlans and filter flags is somehow possible today. And maybe mc too.
> But if I am right we don't have a way to add additional unicast addresses
> from userspace. 
> 
> I will dig my notes and try and list down the problems with using the
> existing netlink interfaces for this.

There are kernel api's/ops to add/del hw uc/mc/vlan/filter filter flags:
Ndo_set_rx_mode, add/del_vid, dev_uc_add, dev_mc_add and dev_filter_flags.

But there are no straight forward mechanisms to add these from userspace. L2
mc addresses can be added by SIOCADDMULTI. And filter flags maybe via
netlink. Nothing for uc and vlan as far as I know (correct me if I am
wrong). Setting of hw filters is usually done indirectly by the kernel
during creation of vlan devices for example.

There is a netlink msg to create a vlan device. But there is no way to add a
vlan filter directly to the hw. Nothing for secondary uc addrs.
This is ok for all cases except for the virtualization case I am trying to
solve. 

To summarize,

The requirement is to have a mechanism from userspace to populate hw filters
on a device. And this is required to program guest nic filters into the host
device backing the guest nic. In the direct attach case, its the macvtap
device and in turn the macvtap lower device.

Today I cant think of any other use case that would require this (except
that there is a brief chance that this could be used in the hybrid
acceleration stuff that ben and intel have been discussing).

I see the below ways this can be done:
1) TUNSETTXFILTER: My v1 patch, that targets only the above specific macvtap
problem. This works for only uc/mc and flags filter. Possibly requires a new
cmd TUNSETVLANFILTER for vlan filters.

2) rtnetlink ops for setting hw filters: My v2 patch targeting virtual
devices that implement rtnl_link_ops. Example macvtap/macvlan

This netlink interface to set filters follows TUNSETTXFILTER giving the
ability to set filters on these devices. The netlink payload must contain
all the uc, mc, vid's and filter flags that go on the device.

3) netdev_ops for setting hw filters: my v3 and v4 patches. This is same as
2 but moves the ops to netdev, so that it can be used by all devices if
required.

4) v5 (New approach. Not submitted yet):
In 2 and 3 above, the netlink msg could be broken down to have separate msgs
to support add/del of uc/mc/vlan. This should be close to what we have today
for vf vlan and vf mac. (Something similar to what John Fastabend was
suggesting too). Advantage, use existing hw ops. (This slightly varies from
the original goal which was not targeted at getting in to uc,mc lists alone.
The goal was to have macvlan maintain its own filters and use it in fwding
lookups if needed in the future. But I guess if we need this in the future
we could possibly use the macvlan uc, mc lists.)

Netlink msgs to set hw filters (basically for dev_uc_add/del,
dev_mc_add/del, and vlans). The below is not a final cut. Just attempting
something here. Please comment.

[IFLA_FILTER_ADDR] = {
    [IFLA_FILTER_ADD_ADDR] = {
        [IFLA_FILTER_HWADDR]      // Maybe a list here
    }

    [IFLA_FILTER_DEL_ADDR] = {
        [IFLA_FILTER_HWADDR]    // Maybe a list here
    }
}

[IFLA_FILTER_VLAN] = {
    [IFLA_FILTER_ADD_VLAN] = {
        [IFLA_FILTER_VID]          // Maybe a list here too
    }
    [IFLA_FILTER_DEL_VLAN] = {
        [IFLA_FILTER_VID] // Maybe a list here too
    }
}

No additional ops, these will call the dev_uc_add/del and add/del_vid api's
directly.

If people think this might be a better way to go, I can work up a patch for
this.

Or if anyone thinks we can use something existing please let me know.

Again, this is needed for passing guest filters in the virtualization case.
I don't see a need to add support for this in iproute2 too (unless anyone
thinks otherwise)

[Note1: Since the addr is a resource and multiple adds/dels are handled by
reference counts, a thread can really call delete multiple times and delete
all references of a particular addr even if he had not owned it. But this is
possible even with existing code today I think. Except that today the kernel
does not expose an interface to do this to userspace

Note2: We probably will need one for filter flags as well. I am still
thinking maybe we can use an existing interface for that. For symmetry a
IFLA_FILTER_FLAGS would be nice]

Comments ?
Thanks,
Roopa

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
  2012-02-05 16:54               ` Roopa Prabhu
@ 2012-02-09  2:03                 ` John Fastabend
  0 siblings, 0 replies; 36+ messages in thread
From: John Fastabend @ 2012-02-09  2:03 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Michael S. Tsirkin, Ben Hutchings, netdev, davem, chrisw, sri,
	dragos.tatulea, kvm, arnd, gregory.v.rose, mchan, dwang2,
	shemminger, eric.dumazet, kaber, benve

On 2/5/2012 8:54 AM, Roopa Prabhu wrote:
> 
> 
> 
> On 2/3/12 7:32 AM, "Roopa Prabhu" <roprabhu@cisco.com> wrote:
> 
>>
>>
>>
>> On 2/2/12 10:58 AM, "John Fastabend" <john.r.fastabend@intel.com> wrote:
> <snip>..
> 
>>> Are you sure they will be good to have? I'm  not so sure you want to be
>>> able to manipulate the uc and mc tables from user space. MACVLAN seems to
>>> be one type of device where it is useful but doing this to a PF or VF seems
>>> hard to use for any real use case. Fun to test the embedded bridge though.
>>>
>>
>> I wont say I am sure. Would be nice have to have netlink interfaces to
>> ADD/DEL additional uc, mc addrs, filter flags and vlans. I have looked at
>> the existing interfaces and nothing seemed straightforward then. But I
>> forget and need to take a look again.
>> I think vlans and filter flags is somehow possible today. And maybe mc too.
>> But if I am right we don't have a way to add additional unicast addresses
>> from userspace. 
>>
>> I will dig my notes and try and list down the problems with using the
>> existing netlink interfaces for this.
> 
> There are kernel api's/ops to add/del hw uc/mc/vlan/filter filter flags:
> Ndo_set_rx_mode, add/del_vid, dev_uc_add, dev_mc_add and dev_filter_flags.
> 
> But there are no straight forward mechanisms to add these from userspace. L2
> mc addresses can be added by SIOCADDMULTI. And filter flags maybe via
> netlink. Nothing for uc and vlan as far as I know (correct me if I am
> wrong). Setting of hw filters is usually done indirectly by the kernel
> during creation of vlan devices for example.
> 
> There is a netlink msg to create a vlan device. But there is no way to add a
> vlan filter directly to the hw. Nothing for secondary uc addrs.
> This is ok for all cases except for the virtualization case I am trying to
> solve. 

Well there is the somewhat asymmetric case where we allow port VLANs to be set
on a VF but not on the PF. I can think of cases (802.1Qbg) where firmware
might be doing EVB or CDCP and enforce a specific filtering. With current
asymmetric interfaces I'm not sure how to expose this on the PF.

to see how this works look for 'ifla_vf_vlan'.

> 
> To summarize,
> 
> The requirement is to have a mechanism from userspace to populate hw filters
> on a device. And this is required to program guest nic filters into the host
> device backing the guest nic. In the direct attach case, its the macvtap
> device and in turn the macvtap lower device.

Yes I agree we would like a mechanism to do this.

> 
> Today I cant think of any other use case that would require this (except
> that there is a brief chance that this could be used in the hybrid
> acceleration stuff that ben and intel have been discussing).

I'll post a RFC I hacked out today to do this with the ./net/bridge code
in a minute. (still needs testing and some fixups).

> 
> I see the below ways this can be done:
> 1) TUNSETTXFILTER: My v1 patch, that targets only the above specific macvtap
> problem. This works for only uc/mc and flags filter. Possibly requires a new
> cmd TUNSETVLANFILTER for vlan filters.
> 
> 
> 2) rtnetlink ops for setting hw filters: My v2 patch targeting virtual
> devices that implement rtnl_link_ops. Example macvtap/macvlan
> 
> This netlink interface to set filters follows TUNSETTXFILTER giving the
> ability to set filters on these devices. The netlink payload must contain
> all the uc, mc, vid's and filter flags that go on the device.
> 
> 
> 3) netdev_ops for setting hw filters: my v3 and v4 patches. This is same as
> 2 but moves the ops to netdev, so that it can be used by all devices if
> required.
> 
> 
> 4) v5 (New approach. Not submitted yet):
> In 2 and 3 above, the netlink msg could be broken down to have separate msgs
> to support add/del of uc/mc/vlan. This should be close to what we have today
> for vf vlan and vf mac. (Something similar to what John Fastabend was
> suggesting too). Advantage, use existing hw ops. (This slightly varies from
> the original goal which was not targeted at getting in to uc,mc lists alone.
> The goal was to have macvlan maintain its own filters and use it in fwding
> lookups if needed in the future. But I guess if we need this in the future
> we could possibly use the macvlan uc, mc lists.)
> 

hmm I'm on the fence here. I like that (4) is generic but I'm not sure I
would want user space to come in and whack a uc addr added from a stacked
device. Looks like there is a free bit in netdev_hw_addr maybe we could add
another bool here to let user space only modify/delete entries that haven't
been locked by the kernel.

That said (1) seems like the straight forward approach and if we don't have
a compelling reason and use case to add the ability to modify the uc and mc
lists generically then adding features for features sake might not be a good.
I'm leaning towards (1) unless someone has a use case for the others and is
really going to implement something with it.

.John

> 
> Netlink msgs to set hw filters (basically for dev_uc_add/del,
> dev_mc_add/del, and vlans). The below is not a final cut. Just attempting
> something here. Please comment.
> 
> [IFLA_FILTER_ADDR] = {
>     [IFLA_FILTER_ADD_ADDR] = {
>         [IFLA_FILTER_HWADDR]      // Maybe a list here
>     }
> 
>     [IFLA_FILTER_DEL_ADDR] = {
>         [IFLA_FILTER_HWADDR]    // Maybe a list here
>     }
> }
> 
> [IFLA_FILTER_VLAN] = {
>     [IFLA_FILTER_ADD_VLAN] = {
>         [IFLA_FILTER_VID]          // Maybe a list here too
>     }
>     [IFLA_FILTER_DEL_VLAN] = {
>         [IFLA_FILTER_VID] // Maybe a list here too
>     }
> }
> 
> 
> 
> No additional ops, these will call the dev_uc_add/del and add/del_vid api's
> directly.
> 
> If people think this might be a better way to go, I can work up a patch for
> this.
> 
> Or if anyone thinks we can use something existing please let me know.
> 
> Again, this is needed for passing guest filters in the virtualization case.
> I don't see a need to add support for this in iproute2 too (unless anyone
> thinks otherwise)
> 
> [Note1: Since the addr is a resource and multiple adds/dels are handled by
> reference counts, a thread can really call delete multiple times and delete
> all references of a particular addr even if he had not owned it. But this is
> possible even with existing code today I think. Except that today the kernel
> does not expose an interface to do this to userspace
> 
> Note2: We probably will need one for filter flags as well. I am still
> thinking maybe we can use an existing interface for that. For symmetry a
> IFLA_FILTER_FLAGS would be nice]
> 
> 
> 
> Comments ?
> Thanks,
> Roopa
> 
> 
> 
>  
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2012-02-09  2:03 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-11-09  7:55 [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode Roopa Prabhu
2011-11-09  7:55 ` [net-next-2.6 PATCH 1/6 v4] rtnetlink: Netlink interface for setting MAC and VLAN filters Roopa Prabhu
2011-11-18  0:17   ` Ben Hutchings
2011-11-09  7:55 ` [net-next-2.6 PATCH 2/6 v4] net: Add netdev_ops to set and get MAC/VLAN rx filters Roopa Prabhu
2011-11-09  7:55 ` [net-next-2.6 PATCH 3/6 v4] rtnetlink: Add support to set MAC/VLAN filters Roopa Prabhu
2011-11-09  7:55 ` [net-next-2.6 PATCH 4/6 v4] rtnetlink: Add support to get " Roopa Prabhu
2011-11-09  7:56 ` [net-next-2.6 PATCH 5/6 v4] macvlan: Add support to for netdev ops to set " Roopa Prabhu
2011-11-09  7:56 ` [net-next-2.6 PATCH 6/6 v4] macvlan: Add support to get MAC/VLAN filter netdev ops Roopa Prabhu
2011-11-18  0:15 ` [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode Ben Hutchings
2011-11-18  0:32   ` Greg Rose
2011-11-18  0:44     ` Ben Hutchings
2011-11-18 16:58       ` Greg Rose
2011-11-18 17:40         ` Ben Hutchings
2011-11-21 17:41           ` Greg Rose
2011-11-29 16:35             ` Ben Hutchings
2011-11-29 17:19               ` Ben Hutchings
2011-11-30 17:34                 ` Greg Rose
2011-11-30 18:48                   ` Ben Hutchings
2011-11-30 21:04                     ` Chris Wright
2011-11-30 21:34                       ` Ben Hutchings
2011-11-30 23:00                         ` Chris Wright
2011-11-30 23:19                           ` Rose, Gregory V
2011-11-30 23:30                           ` Sridhar Samudrala
2011-11-30 23:39                             ` Chris Wright
2011-11-20 16:30   ` Roopa Prabhu
2012-02-02  7:24     ` Michael S. Tsirkin
2012-02-02  8:46       ` John Fastabend
2012-02-02  8:50         ` Michael S. Tsirkin
2012-02-02  9:04           ` John Fastabend
2012-02-02 18:07         ` Roopa Prabhu
2012-02-02 18:58           ` John Fastabend
2012-02-03 15:32             ` Roopa Prabhu
2012-02-05 16:54               ` Roopa Prabhu
2012-02-09  2:03                 ` John Fastabend
2012-02-02 20:38         ` Ben Hutchings
2012-02-02 21:18           ` John Fastabend

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).