Netdev List
* bnx2x: VF RSS support
From: Ariel Elior @ 2013-09-04 11:09 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Eilon Greenstein, Ariel Elior

Hi Dave,

This patch series adds the capability for VFs to use multiple queues
and Receive / Transmit side scaling.

Patch #1 enhances the PF-side database to allow for multiple queues per PF
and to configure the HW appropriately, and adds the PF side of the VF-PF
channel message for configuring RSS.

Patch #2 adds to the VF side the ability to request multiple queues and, if
they are obtained, to configure RSS for them over the VF-PF channel.
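For readers new to RSS: a NIC typically hashes each packet's flow fields and uses the low bits of that hash to index an indirection table whose entries name Rx queues, so configuring RSS for a VF largely means filling such a table with the queues it obtained. The sketch below is generic and hypothetical (table size, function names, round-robin fill); it does not show bnx2x's actual structures or the VF-PF message format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table size; real NICs use device-specific sizes.
 * Must be a power of two for the mask in rss_select_queue(). */
#define RSS_TABLE_SIZE 128

/* Spread the table entries round-robin over the queues the VF obtained. */
static void rss_fill_table(uint8_t table[RSS_TABLE_SIZE], uint8_t num_queues)
{
	assert(num_queues > 0);
	for (size_t i = 0; i < RSS_TABLE_SIZE; i++)
		table[i] = (uint8_t)(i % num_queues);
}

/* Map a packet's RSS hash (assumed already computed, e.g. by the NIC)
 * to an Rx queue: the low bits of the hash index the table. */
static uint8_t rss_select_queue(const uint8_t table[RSS_TABLE_SIZE], uint32_t hash)
{
	return table[hash & (RSS_TABLE_SIZE - 1)];
}
```

Keeping the table separate from the hash lets queues be rebalanced by rewriting table entries without changing the hash function.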

Thanks,
Ariel Elior

^ permalink raw reply

* GSO/GRO and UDP performance
From: James Yonan @ 2013-09-04 10:07 UTC (permalink / raw)
  To: netdev

I'm looking at ways to improve UDP performance in the kernel. 
Specifically I'd like to take some of the ideas in GSO/GRO for TCP and 
apply them to UDP as well.  Our use case is OpenVPN, but these methods 
should apply to any UDP-based app.

As I understand GSO/GRO for TCP, there are essentially two central features:

(a) it's a way of batching packets with similar headers together so that 
they can efficiently traverse the network stack as a single unit

(b) it explicitly maps the batching of packets to the L4 segmenting 
features of TCP, so that batched packets can be coalesced into TCP segments.

This approach works great for TCP because of its built-in L4 segmenting 
features, but it tends to break down for UDP because of (b) in 
particular -- UDP doesn't have an L4 segmenting model, so the 
gso_segment method for UDP resorts to segmenting the packets with L3 IP 
fragmentation (i.e. UFO).  The problem is that IP fragmentation is 
broken on so many different levels that it can't be relied on for apps 
that need to communicate over the open internet (*).  Most UDP apps do 
their own app-level fragmentation and wouldn't want to be forced to buy 
into IP fragmentation in order to get the performance benefits of GSO/GRO.

So I would like to propose a GSO/GRO implementation for UDP that works 
by batching together separate UDP packets with similar headers into a 
single skb.  There is no tie-in with L3 IP fragmentation -- the packets 
are sent over the wire and received as individual UDP packets.

Here is an example of how this might work in practice:

When I call sendmmsg from userspace with a bunch of UDP packets having 
the same header, the kernel would assemble these packets into a single 
skb via shinfo(skb)->frag_list.  There would need to be a new gso_type 
indicating that frag_list is simply a list of UDP packets having the 
same header that should be transmitted separately.  No IP fragmentation 
would be necessary as long as the app has correctly sized the packets 
for the link MTU.
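On the send side, userspace can already batch syscalls with sendmmsg(2) (Linux 3.0 and later), but each datagram still travels the stack as its own skb; the proposal is about letting the kernel keep such a batch together. A minimal sketch of the userspace half, assuming IPv4 loopback as the destination and an arbitrary MAX_BATCH limit:

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define MAX_BATCH 16 /* arbitrary limit for this sketch */

/* Send `count` UDP datagrams to 127.0.0.1:port with a single sendmmsg()
 * call. Each payload remains a separate datagram on the wire. Returns the
 * number of datagrams the kernel accepted, or -1 with errno set. */
static int send_batch(int fd, unsigned short port, const char *payloads[],
		      const size_t lens[], unsigned int count)
{
	struct sockaddr_in dst;
	struct iovec iov[MAX_BATCH];
	struct mmsghdr msgs[MAX_BATCH];

	assert(count <= MAX_BATCH);

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(port);
	dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

	memset(msgs, 0, sizeof(msgs));
	for (unsigned int i = 0; i < count; i++) {
		iov[i].iov_base = (void *)payloads[i]; /* not modified by sendmmsg */
		iov[i].iov_len = lens[i];
		msgs[i].msg_hdr.msg_name = &dst;
		msgs[i].msg_hdr.msg_namelen = sizeof(dst);
		msgs[i].msg_hdr.msg_iov = &iov[i];
		msgs[i].msg_hdr.msg_iovlen = 1;
	}
	return sendmmsg(fd, msgs, count, 0);
}
```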

Once this skb is about to reach the driver, dev_hard_start_xmit could do 
the usual GSO thing and separate out the packets in 
shinfo(skb)->frag_list and pass them individually to the driver's 
ndo_start_xmit method, if the driver doesn't support batched UDP 
packets.  There would need to be a new gso_type for this batching model, 
e.g. "SKB_GSO_UDP_BUNDLE" that drivers could optionally support.
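The fallback described above can be modelled in userspace. In this sketch, struct udp_bundle stands in for an skb whose frag_list holds whole datagrams, and supports_bundle stands in for the proposed (not existing) SKB_GSO_UDP_BUNDLE feature; all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One whole UDP datagram inside a bundle (hypothetical model). */
struct dgram {
	const unsigned char *data;
	size_t len;
};

/* Models an skb whose shinfo(skb)->frag_list holds whole datagrams. */
struct udp_bundle {
	const struct dgram *dgrams;
	size_t count;
};

/* Hypothetical driver entry points: one per datagram (like ndo_start_xmit
 * today) and one that accepts a whole bundle. */
typedef int (*xmit_one_fn)(const struct dgram *d, void *ctx);
typedef int (*xmit_bundle_fn)(const struct udp_bundle *b, void *ctx);

struct fake_dev {
	bool supports_bundle; /* stands in for the proposed feature bit */
	xmit_one_fn xmit_one;
	xmit_bundle_fn xmit_bundle;
	void *ctx;
};

/* Pass the bundle through whole if the device supports it; otherwise split
 * it and hand each datagram to the driver separately. Returns 0 on success
 * or the first nonzero driver return code. */
static int bundle_xmit(const struct fake_dev *dev, const struct udp_bundle *b)
{
	assert(dev != NULL && b != NULL);
	if (dev->supports_bundle)
		return dev->xmit_bundle(b, dev->ctx);

	for (size_t i = 0; i < b->count; i++) {
		int rc = dev->xmit_one(&b->dgrams[i], dev->ctx);

		if (rc != 0)
			return rc;
	}
	return 0;
}

/* Example drivers that only count how often they are called. */
static int count_one(const struct dgram *d, void *ctx)
{
	(void)d;
	++*(int *)ctx;
	return 0;
}

static int count_bundle(const struct udp_bundle *b, void *ctx)
{
	(void)b;
	++*(int *)ctx;
	return 0;
}
```

A legacy driver sees three calls for a three-datagram bundle; a bundle-aware one sees a single call.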

On the receive side, we would define a gro_receive method for UDP (none 
currently exists) that does the same batching in reverse:  UDP packets 
with the same header would be collected into shinfo(skb)->frag_list and 
gso_type would be set to SKB_GSO_UDP_BUNDLE.
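The receive-side batching can likewise be sketched as grouping consecutive datagrams whose headers match. Here "same header" is assumed to mean the same addresses and ports, and max_bundle is a hypothetical cap on bundle size; real GRO also has flush conditions that this ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the header fields that must match to share a bundle. */
struct udp_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

static bool same_key(const struct udp_key *a, const struct udp_key *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport;
}

/* Group consecutive datagrams with identical keys into bundles of at most
 * max_bundle packets. Writes each bundle's size into sizes[] (room for n
 * entries) and returns the number of bundles. */
static size_t bundle_by_header(const struct udp_key *keys, size_t n,
			       size_t max_bundle, size_t *sizes)
{
	size_t nbundles = 0;

	assert(max_bundle > 0);
	for (size_t i = 0; i < n; ) {
		size_t j = i + 1;

		while (j < n && j - i < max_bundle && same_key(&keys[i], &keys[j]))
			j++;
		sizes[nbundles++] = j - i;
		i = j;
	}
	return nbundles;
}
```

Only runs of matching datagrams are merged, so interleaved flows simply produce more, smaller bundles; no datagram is reordered or modified.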

The bundle of UDP packets would traverse the stack as a unit until it 
reaches the socket layer, where recvmmsg could pass the whole bundle up 
to userspace in a single transaction (or recvmsg could disaggregate the 
bundle and pass each datagram individually).

This approach should also significantly speed up UDP apps running on VM 
guests, because the skbs of bundled UDP packets could be passed across 
the hypervisor/guest barrier in a single transaction.

Because this technique bundles UDP packets without coalescing or 
modifying them, the approach should be lossless with respect to 
bridging, hypervisor/guest communication, routing, etc.  It also doesn't 
interfere with existing hardware support for L4 checksum offloading 
(unlike UFO).

Could this work?  Are there problems with this that I'm not considering?
Are there better or existing ways of doing this?

Thanks,
James

---------------------

(*) Well-known issues of UDP/IP fragmentation:

1. Relies on PMTU discovery, which often doesn't work in the real world 
because of inconsistent ICMP forwarding policies.

2. Breaks down on high-bandwidth links because the IPv4 16-bit packet ID 
value can wrap around, causing data corruption.

3. One fragment lost in transit means that the whole superpacket is lost.
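Items 2 and 3 are easy to quantify. The helpers below assume one IPv4 ID per datagram, fixed-size datagrams, a saturated link and independent fragment loss; they are back-of-the-envelope models, not measurements.

```c
#include <assert.h>

/* Seconds until a sender exhausts the 16-bit IPv4 Identification space,
 * assuming one ID per datagram, fixed-size datagrams and a saturated link. */
static double ipv4_id_wrap_seconds(double link_bps, double datagram_bytes)
{
	assert(link_bps > 0.0);
	return 65536.0 * datagram_bytes * 8.0 / link_bps;
}

/* Probability that a datagram split into n fragments is reassembled, assuming
 * each fragment is lost independently with probability p. */
static double superpacket_delivery_prob(double p, unsigned int n)
{
	double q = 1.0;

	assert(p >= 0.0 && p <= 1.0);
	for (unsigned int i = 0; i < n; i++)
		q *= 1.0 - p;
	return q;
}
```

For example, 65,535-byte datagrams at 10 Gbit/s exhaust the ID space in about 3.4 s, well inside Linux's default 30-second reassembly timeout (net.ipv4.ipfrag_time). Such a datagram needs 45 fragments on a 1500-byte MTU, so 1% per-fragment loss delivers only about 64% of them.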

James

^ permalink raw reply

* Re: [PATCH net-next v2 5/6] bonding: restructure and add rcu for bond_for_each_slave_next()
From: Veaceslav Falico @ 2013-09-04 10:35 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Netdev
In-Reply-To: <522700EF.3000702@huawei.com>

On Wed, Sep 04, 2013 at 05:44:15PM +0800, Ding Tianhong wrote:
...snip...
>+/* Check whether the slave is the only one in bond */
>+#define bond_is_only_slave(bond, pos) \
>+	(((pos)->list.prev == &(bond)->slave_list) && \
>+				    ((pos)->list.next == &(bond)->slave_list))

Could be done without pos at all -

!list_empty(&(bond)->slave_list) && \
(bond)->slave_list.next == (bond)->slave_list.prev

If we have only one slave and pos is NOT that slave, then... well, we have
big trouble.

>+
> /**
>  * bond_for_each_slave_from - iterate the slaves list from a starting point
>  * @bond:	the bond holding this list.
>  * @pos:	current slave.
>- * @cnt:	counter for max number of moves
>  * @start:	starting point.
>  *
>  * Caller must hold bond->lock
>  */
>-#define bond_for_each_slave_from(bond, pos, cnt, start) \
>-	for (cnt = 0, pos = start; pos && cnt < (bond)->slave_cnt; \
>-	     cnt++, pos = bond_next_slave(bond, pos))
>+#define bond_for_each_slave_from(bond, pos, start) \
>+	for (pos = start; pos && (bond_is_only_slave(bond, start) ? \
>+	    &pos->list != &bond->slave_list : \
>+	    &pos->list != &start->list); bond_is_only_slave(bond, start) ? \
>+	    (pos = list_entry(pos->list.next, typeof(*pos), list)) : \
>+	    (pos = bond_next_slave(bond, pos)))

Did you check that?

pos = slave1 (bond has more than one slave);
pos && &pos->list != &slave1->list - false.

We won't ever enter this loop if we have >1 slaves.

I don't understand this at all.
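A userspace model makes the problem easy to check. It reimplements a minimal circular list (the names mirror the kernel's, but this is not kernel code), shows that the patch's condition rejects the first pass when there is more than one entry, and shows one way to visit every entry once starting from an arbitrary one. The do/while is only illustrative; a for-based iterator macro would need some other way (such as the original counter) to let the first pass through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace re-implementation of the kernel's circular list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Same test as the kernel's list_is_singular(): exactly one entry, no pos. */
static bool list_is_singular(const struct list_head *head)
{
	return head->next != head && head->next == head->prev;
}

/* The patch's loop condition with more than one slave: pos starts equal to
 * start, so the condition is already false and the body never runs. */
static size_t patch_loop_count(struct list_head *start)
{
	size_t n = 0;

	for (struct list_head *pos = start; pos && pos != start; pos = pos->next)
		n++;
	return n;
}

/* Visit every entry exactly once, beginning at start and skipping the head
 * when wrapping around. Stores the visited entries in out[]. */
static size_t collect_from(struct list_head *head, struct list_head *start,
			   struct list_head **out)
{
	struct list_head *pos = start;
	size_t n = 0;

	if (!start)
		return 0;
	assert(start != head); /* start must be a real entry, not the head */
	do {
		out[n++] = pos;
		pos = pos->next == head ? head->next : pos->next;
	} while (pos != start);
	return n;
}
```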

>+
>+/**
>+ * bond_for_each_slave_from_rcu - iterate the slaves list from a starting point
>+ * @bond:	the bond holding this list.
>+ * @pos:	current slave.
>+ * @start:	starting point.
>+ *
>+ * Caller must hold rcu_read_lock
>+ */
>+#define bond_for_each_slave_from_rcu(bond, pos, start) \
>+	for (pos = start; pos && (bond_is_only_slave(bond, start) ? \
>+	    &pos->list != &bond->slave_list : \
>+	    &pos->list != &start->list); bond_is_only_slave(bond, start) ? \
>+	    (pos = list_entry_rcu(pos->list.next, typeof(*pos), list)) : \
>+	    (pos = bond_next_slave_rcu(bond, pos)))

Same as bond_for_each_slave_from(); also, see my comment about RCU
in patch 1.

>
> /**
>  * bond_for_each_slave - iterate over all slaves
>-- 
>1.8.2.1
>
>
>

^ permalink raw reply

* Re: [PATCH v2 net-next] pkt_sched: fq: Fair Queue packet scheduler
From: Eric Dumazet @ 2013-09-04 10:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: David Miller, netdev, Yuchung Cheng, Neal Cardwell,
	Michael S. Tsirkin
In-Reply-To: <5226D39C.9070401@redhat.com>

On Wed, 2013-09-04 at 14:30 +0800, Jason Wang wrote:

> > And tcpdump would certainly help ;)
> 
> See attachment.
> 

Nothing obvious in the tcpdump (only that a lot of frames are missing).

1) Are you capturing only part of the payload (like tcpdump -s 128)?

2) What is the setup?

3) Please send the output of: tc -s -d qdisc

^ permalink raw reply

* Re: [PATCH net-next v2 1/6] bonding: simplify and use RCU protection for 3ad xmit path
From: Veaceslav Falico @ 2013-09-04 10:18 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Netdev
In-Reply-To: <522700D1.5060805@huawei.com>

On Wed, Sep 04, 2013 at 05:43:45PM +0800, Ding Tianhong wrote:
...snip...
>+/**
>+ * IMPORTANT: bond_first/last_slave_rcu can return NULL in case of an empty list
>+ * Caller must hold rcu_read_lock
>+ */
>+#define bond_first_slave_rcu(bond) \
>+	list_first_or_null_rcu(&(bond)->slave_list, struct slave, list)
>+#define bond_last_slave_rcu(bond) \
>+	(list_empty(&(bond)->slave_list) ? NULL : \
>+						bond_to_slave_rcu((bond)->slave_list.prev))

Here, bond_last_slave_rcu() is racy. The list can be non-empty when
list_empty() is checked, but it might become empty by the time you call
bond_to_slave_rcu(), and then you'll get bond_to_slave(bond->slave_list)
as the result, which is not a slave.

Take a look at list_first_or_null_rcu() for reference. The main idea is
that it first reads the ->next pointer once, with RCU protection, then
checks whether it is the list head, and if not, takes the container from
that same pointer. This way the ->next pointer can't change underneath you.

These kinds of bugs are really rare, but *EXTREMELY* hard to debug.
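The difference between the two shapes can be sketched in userspace: the racy form checks emptiness with one load and then uses a second load, while the fixed form decides everything from one snapshot. The structures are hypothetical stand-ins, and the GCC/Clang builtin __atomic_load_n only approximates rcu_dereference(). Note also that the kernel's RCU list primitives only guarantee readers a consistent forward (->next) walk, so whether head->prev may be followed under rcu_read_lock() at all is a separate question worth checking against the RCU list documentation.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-in for struct slave. */
struct slave {
	int id;
	struct list_head list;
};

/* container_of() equivalent: from the embedded list node back to the slave. */
#define to_slave(ptr) \
	((struct slave *)((char *)(ptr) - offsetof(struct slave, list)))

/* Shape of the patch's bond_last_slave_rcu(): the emptiness check and the
 * pointer actually used are two separate loads, so a removal in between can
 * leave head->prev pointing at the head, and to_slave() yields garbage. */
static struct slave *last_slave_racy(struct list_head *head)
{
	return head->next == head ? NULL : to_slave(head->prev);
}

/* Load the pointer once and decide from that snapshot, the same idea
 * list_first_or_null_rcu() applies to ->next. */
static struct slave *last_slave_snapshot(struct list_head *head)
{
	struct list_head *prev;

	assert(head != NULL);
	prev = __atomic_load_n(&head->prev, __ATOMIC_ACQUIRE);
	return prev == head ? NULL : to_slave(prev);
}
```

Single-threaded, both functions return the same result; only under a concurrent removal can the racy one return a pointer that is not a slave.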

>+
> #define bond_is_first_slave(bond, pos) ((pos)->list.prev == &(bond)->slave_list)
> #define bond_is_last_slave(bond, pos) ((pos)->list.next == &(bond)->slave_list)
>
>@@ -93,6 +106,15 @@
> 	(bond_is_first_slave(bond, pos) ? bond_last_slave(bond) : \
> 					  bond_to_slave((pos)->list.prev))
>
>+/* Since bond_first/last_slave_rcu can return NULL, these can return NULL too */
>+#define bond_next_slave_rcu(bond, pos) \
>+	(bond_is_last_slave(bond, pos) ? bond_first_slave_rcu(bond) : \
>+					 bond_to_slave_rcu((pos)->list.next))
>+
>+#define bond_prev_slave_rcu(bond, pos) \
>+	(bond_is_first_slave(bond, pos) ? bond_last_slave_rcu(bond) : \
>+					  bond_to_slave_rcu((pos)->list.prev))
>+

These two are also racy. bond_is_last/first_slave() is not rcu-ified, and
thus you can't rely on it without proper locking. Same ideas apply as per
bond_first_slave_rcu().
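The same snapshot idea applied to the next-with-wraparound helper: each decision is made from a single load of the pointer it then uses, instead of calling a separate, unprotected bond_is_last_slave()-style check first. As before, the structures are hypothetical userspace stand-ins and __atomic_load_n only approximates rcu_dereference().

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-in for struct slave. */
struct slave {
	int id;
	struct list_head list;
};

#define to_slave(ptr) \
	((struct slave *)((char *)(ptr) - offsetof(struct slave, list)))

/* Next slave after pos, wrapping to the first one. Each branch is decided
 * from a single load of the pointer it then uses. Returns NULL only if the
 * list was observed empty when wrapping. */
static struct slave *next_slave_snapshot(struct list_head *head, struct slave *pos)
{
	struct list_head *next;

	assert(head != NULL && pos != NULL);
	next = __atomic_load_n(&pos->list.next, __ATOMIC_ACQUIRE);
	if (next != head)
		return to_slave(next);

	/* pos was the last entry: wrap to the first, again from one load. */
	next = __atomic_load_n(&head->next, __ATOMIC_ACQUIRE);
	return next == head ? NULL : to_slave(next);
}
```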

^ permalink raw reply

* Re: [PATCH] r8169: add ethtool eeprom change/dump feature
From: Peter Wu @ 2013-09-04  9:55 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, romieu, hayeswang, bhutchings
In-Reply-To: <20130903.215935.1007110179484647485.davem@davemloft.net>

On Tuesday 03 September 2013 21:59:35 David Miller wrote:
> From: Peter Wu <lekensteyn@gmail.com>
> Date: Thu, 29 Aug 2013 16:44:51 +0200
> 
> > This adds the ability to read and change EEPROM for 93C46/93C56 serial
> > EEPROM. Two-Wire serial interface and SPI are not supported. (Do not
> > even try this for SPI or other chips, it may break your hardware.)
> 
> Please block the operation on configurations, such as the aforementioned
> SPI, where it won't work.

I do not know how to detect those; as far as I know it is harmless, and the
r8168 vendor driver does not perform other checks either. According to the
"Realtek RTL8411 EEPROM/eFUSE Datasheet 1.1", SPI access is done via CONFIG0,
which is at offset 51h rather than 50h (which is used for 9346CR).

This SPI seems to be used for the Boot ROM, not the EEPROM. Even the
"RTL8111E-VL-CG" datasheet (May 2012) mentions the 9346CR register, so I think
it is safe to assume that nothing breaks. At worst you read invalid values (ff)
and writes are no-ops.
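For reference, a READ on a 93C46/93C56 Microwire EEPROM in 16-bit mode is a start bit, the opcode 10, and the word address (6 bits on a 93C46, 8 on a 93C56), clocked out MSB first on the data-in line, after which 16 data bits are clocked in. The sketch below only builds and decodes those bit sequences. It follows the shape of read_eeprom() in the 8139too driver, but the function names here are hypothetical, and the 9346CR bit layout and timing are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Start bit 1 followed by READ opcode 10: binary 110 = 6. */
#define EE_READ_CMD 6u

/* Build the bits a driver would clock out on the EEPROM data-in line for a
 * READ of word `addr`, MSB first, one bit per array slot. addr_len is 6 for
 * a 93C46 and 8 for a 93C56. Returns the number of bits (addr_len + 3). */
static size_t ee_read_cmd_bits(unsigned addr, unsigned addr_len, uint8_t *bits)
{
	uint32_t cmd;
	size_t nbits;

	assert(addr_len == 6 || addr_len == 8);
	cmd = (EE_READ_CMD << addr_len) | (addr & ((1u << addr_len) - 1));
	nbits = addr_len + 3;
	for (size_t i = 0; i < nbits; i++)
		bits[i] = (cmd >> (nbits - 1 - i)) & 1u;
	return nbits;
}

/* Assemble the 16 bits sampled from the data-out line (MSB first). */
static uint16_t ee_assemble_word(const uint8_t *bits)
{
	uint16_t w = 0;

	for (int i = 0; i < 16; i++)
		w = (uint16_t)((w << 1) | (bits[i] & 1u));
	return w;
}
```

An all-ones read decodes to 0xffff, which matches the ff values mentioned above.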

I can remove this scary warning from the commit message, but it would really 
be nice if a Realtek engineer could give more information on this.

Regards,
Peter

^ permalink raw reply

* [PATCH net-next v2 5/6] bonding: restructure and add rcu for bond_for_each_slave_next()
From: Ding Tianhong @ 2013-09-04  9:44 UTC (permalink / raw)
  To: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Veaceslav Falico, Netdev

Remove the redundant int counter argument from bond_for_each_slave_from() and
add bond_for_each_slave_from_rcu() for future use.

Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Nikolay Aleksandrov <nikolay@redhat.com>
---
 drivers/net/bonding/bond_alb.c  |  3 +--
 drivers/net/bonding/bond_main.c |  6 ++----
 drivers/net/bonding/bonding.h   | 30 ++++++++++++++++++++++++++----
 3 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 91f179d..c75d383 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -383,7 +383,6 @@ static struct slave *rlb_next_rx_slave(struct bonding *bond)
 {
 	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
 	struct slave *rx_slave, *slave, *start_at;
-	int i = 0;
 
 	if (bond_info->next_rx_slave)
 		start_at = bond_info->next_rx_slave;
@@ -392,7 +391,7 @@ static struct slave *rlb_next_rx_slave(struct bonding *bond)
 
 	rx_slave = NULL;
 
-	bond_for_each_slave_from(bond, slave, i, start_at) {
+	bond_for_each_slave_from(bond, slave, start_at) {
 		if (SLAVE_IS_OK(slave)) {
 			if (!rx_slave) {
 				rx_slave = slave;
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 39e5b1c..4190389 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -782,7 +782,6 @@ static struct slave *bond_find_best_slave(struct bonding *bond)
 	struct slave *new_active, *old_active;
 	struct slave *bestslave = NULL;
 	int mintime = bond->params.updelay;
-	int i;
 
 	new_active = bond->curr_active_slave;
 
@@ -801,7 +800,7 @@ static struct slave *bond_find_best_slave(struct bonding *bond)
 	/* remember where to stop iterating over the slaves */
 	old_active = new_active;
 
-	bond_for_each_slave_from(bond, new_active, i, old_active) {
+	bond_for_each_slave_from(bond, new_active, old_active) {
 		if (new_active->link == BOND_LINK_UP) {
 			return new_active;
 		} else if (new_active->link == BOND_LINK_BACK &&
@@ -2756,7 +2755,6 @@ do_failover:
 static void bond_ab_arp_probe(struct bonding *bond)
 {
 	struct slave *slave, *next_slave;
-	int i;
 
 	read_lock(&bond->curr_slave_lock);
 
@@ -2788,7 +2786,7 @@ static void bond_ab_arp_probe(struct bonding *bond)
 
 	/* search for next candidate */
 	next_slave = bond_next_slave(bond, bond->current_arp_slave);
-	bond_for_each_slave_from(bond, slave, i, next_slave) {
+	bond_for_each_slave_from(bond, slave, next_slave) {
 		if (IS_UP(slave->dev)) {
 			slave->link = BOND_LINK_BACK;
 			bond_set_slave_active_flags(slave);
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index e97eee9..80b170a 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -115,18 +115,40 @@
 	(bond_is_first_slave(bond, pos) ? bond_last_slave_rcu(bond) : \
 					  bond_to_slave_rcu((pos)->list.prev))
 
+/* Check whether the slave is the only one in bond */
+#define bond_is_only_slave(bond, pos) \
+	(((pos)->list.prev == &(bond)->slave_list) && \
+				    ((pos)->list.next == &(bond)->slave_list))
+
 /**
  * bond_for_each_slave_from - iterate the slaves list from a starting point
  * @bond:	the bond holding this list.
  * @pos:	current slave.
- * @cnt:	counter for max number of moves
  * @start:	starting point.
  *
  * Caller must hold bond->lock
  */
-#define bond_for_each_slave_from(bond, pos, cnt, start) \
-	for (cnt = 0, pos = start; pos && cnt < (bond)->slave_cnt; \
-	     cnt++, pos = bond_next_slave(bond, pos))
+#define bond_for_each_slave_from(bond, pos, start) \
+	for (pos = start; pos && (bond_is_only_slave(bond, start) ? \
+	    &pos->list != &bond->slave_list : \
+	    &pos->list != &start->list); bond_is_only_slave(bond, start) ? \
+	    (pos = list_entry(pos->list.next, typeof(*pos), list)) : \
+	    (pos = bond_next_slave(bond, pos)))
+
+/**
+ * bond_for_each_slave_from_rcu - iterate the slaves list from a starting point
+ * @bond:	the bond holding this list.
+ * @pos:	current slave.
+ * @start:	starting point.
+ *
+ * Caller must hold rcu_read_lock
+ */
+#define bond_for_each_slave_from_rcu(bond, pos, start) \
+	for (pos = start; pos && (bond_is_only_slave(bond, start) ? \
+	    &pos->list != &bond->slave_list : \
+	    &pos->list != &start->list); bond_is_only_slave(bond, start) ? \
+	    (pos = list_entry_rcu(pos->list.next, typeof(*pos), list)) : \
+	    (pos = bond_next_slave_rcu(bond, pos)))
 
 /**
  * bond_for_each_slave - iterate over all slaves
-- 
1.8.2.1

^ permalink raw reply related

* [PATCH net-next v2 3/6] bonding: replace read_lock to rcu_read_lock for bond_3ad_get_active_agg_info()
From: Ding Tianhong @ 2013-09-04  9:43 UTC (permalink / raw)
  To: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Veaceslav Falico, Netdev

The bond slave list will be protected by RCU, so the read lock should be
replaced by rcu_read_lock().

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Nikolay Aleksandrov <nikolay@redhat.com>
---
 drivers/net/bonding/bond_3ad.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index c134f43..b678502 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2408,9 +2408,9 @@ int bond_3ad_get_active_agg_info(struct bonding *bond, struct ad_info *ad_info)
 {
 	int ret;
 
-	read_lock(&bond->lock);
+	rcu_read_lock();
 	ret = __bond_3ad_get_active_agg_info(bond, ad_info);
-	read_unlock(&bond->lock);
+	rcu_read_unlock();
 
 	return ret;
 }
-- 
1.8.2.1

^ permalink raw reply related

* [PATCH net-next v2 1/6] bonding: simplify and use RCU protection for 3ad xmit path
From: Ding Tianhong @ 2013-09-04  9:43 UTC (permalink / raw)
  To: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Veaceslav Falico, Netdev

Commit 278b20837511776dc9d5f6ee1c7fabd5479838bb
(bonding: initial RCU conversion) converted the round-robin, active-backup,
broadcast and xor xmit paths to RCU protection, which improves performance
for those modes. This patch converts the xmit path for 3ad mode as well.

Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Nikolay Aleksandrov <nikolay@redhat.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
---
 drivers/net/bonding/bond_3ad.c | 32 ++++++++++++++------------------
 drivers/net/bonding/bonding.h  | 22 ++++++++++++++++++++++
 2 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 9010265..7a3860f 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -143,7 +143,7 @@ static inline struct bonding *__get_bond_by_port(struct port *port)
  */
 static inline struct port *__get_first_port(struct bonding *bond)
 {
-	struct slave *first_slave = bond_first_slave(bond);
+	struct slave *first_slave = bond_first_slave_rcu(bond);
 
 	return first_slave ? &(SLAVE_AD_INFO(first_slave).port) : NULL;
 }
@@ -163,7 +163,7 @@ static inline struct port *__get_next_port(struct port *port)
 	// If there's no bond for this port, or this is the last slave
 	if (bond == NULL)
 		return NULL;
-	slave_next = bond_next_slave(bond, slave);
+	slave_next = bond_next_slave_rcu(bond, slave);
 	if (!slave_next || bond_is_first_slave(bond, slave_next))
 		return NULL;
 
@@ -2417,16 +2417,14 @@ int bond_3ad_get_active_agg_info(struct bonding *bond, struct ad_info *ad_info)
 
 int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
 {
-	struct slave *slave, *start_at;
 	struct bonding *bond = netdev_priv(dev);
+	struct slave *slave;
 	int slave_agg_no;
 	int slaves_in_agg;
 	int agg_id;
-	int i;
 	struct ad_info ad_info;
 	int res = 1;
 
-	read_lock(&bond->lock);
 	if (__bond_3ad_get_active_agg_info(bond, &ad_info)) {
 		pr_debug("%s: Error: __bond_3ad_get_active_agg_info failed\n",
 			 dev->name);
@@ -2444,13 +2442,17 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
 
 	slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
 
-	bond_for_each_slave(bond, slave) {
+	bond_for_each_slave_rcu(bond, slave) {
 		struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
 
 		if (agg && (agg->aggregator_identifier == agg_id)) {
-			slave_agg_no--;
-			if (slave_agg_no < 0)
-				break;
+			if (--slave_agg_no < 0) {
+				if (SLAVE_IS_OK(slave)) {
+					res = bond_dev_queue_xmit(bond,
+						skb, slave->dev);
+					goto out;
+				}
+			}
 		}
 	}
 
@@ -2460,23 +2462,17 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
 		goto out;
 	}
 
-	start_at = slave;
-
-	bond_for_each_slave_from(bond, slave, i, start_at) {
-		int slave_agg_id = 0;
+	bond_for_each_slave_rcu(bond, slave) {
 		struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
 
-		if (agg)
-			slave_agg_id = agg->aggregator_identifier;
-
-		if (SLAVE_IS_OK(slave) && agg && (slave_agg_id == agg_id)) {
+		if (SLAVE_IS_OK(slave) && agg &&
+			agg->aggregator_identifier == agg_id) {
 			res = bond_dev_queue_xmit(bond, skb, slave->dev);
 			break;
 		}
 	}
 
 out:
-	read_unlock(&bond->lock);
 	if (res) {
 		/* no suitable interface, frame not sent */
 		kfree_skb(skb);
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 4bf52d5..9898493 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -74,6 +74,9 @@
 /* slave list primitives */
 #define bond_to_slave(ptr) list_entry(ptr, struct slave, list)
 
+/* slave list primitives, Caller must hold rcu_read_lock */
+#define bond_to_slave_rcu(ptr) list_entry_rcu(ptr, struct slave, list)
+
 /* IMPORTANT: bond_first/last_slave can return NULL in case of an empty list */
 #define bond_first_slave(bond) \
 	list_first_entry_or_null(&(bond)->slave_list, struct slave, list)
@@ -81,6 +84,16 @@
 	(list_empty(&(bond)->slave_list) ? NULL : \
 					   bond_to_slave((bond)->slave_list.prev))
 
+/**
+ * IMPORTANT: bond_first/last_slave_rcu can return NULL in case of an empty list
+ * Caller must hold rcu_read_lock
+ */
+#define bond_first_slave_rcu(bond) \
+	list_first_or_null_rcu(&(bond)->slave_list, struct slave, list)
+#define bond_last_slave_rcu(bond) \
+	(list_empty(&(bond)->slave_list) ? NULL : \
+						bond_to_slave_rcu((bond)->slave_list.prev))
+
 #define bond_is_first_slave(bond, pos) ((pos)->list.prev == &(bond)->slave_list)
 #define bond_is_last_slave(bond, pos) ((pos)->list.next == &(bond)->slave_list)
 
@@ -93,6 +106,15 @@
 	(bond_is_first_slave(bond, pos) ? bond_last_slave(bond) : \
 					  bond_to_slave((pos)->list.prev))
 
+/* Since bond_first/last_slave_rcu can return NULL, these can return NULL too */
+#define bond_next_slave_rcu(bond, pos) \
+	(bond_is_last_slave(bond, pos) ? bond_first_slave_rcu(bond) : \
+					 bond_to_slave_rcu((pos)->list.next))
+
+#define bond_prev_slave_rcu(bond, pos) \
+	(bond_is_first_slave(bond, pos) ? bond_last_slave_rcu(bond) : \
+					  bond_to_slave_rcu((pos)->list.prev))
+
 /**
  * bond_for_each_slave_from - iterate the slaves list from a starting point
  * @bond:	the bond holding this list.
-- 
1.8.2.1
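The selection logic of the patched bond_3ad_xmit_xor() can be checked in isolation. The sketch flattens each slave into a hypothetical struct slave_view and models xmit_hash_policy() as hash % slaves_in_agg; it assumes the unchanged code between the two loops bails out when slave_agg_no is still non-negative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened view of a slave for this sketch. */
struct slave_view {
	int agg_id; /* aggregator the slave belongs to, or -1 if none */
	bool ok;    /* stands in for SLAVE_IS_OK() */
};

/* Return the index of the slave to transmit on, or -1 if none is usable.
 * Take the (hash % slaves_in_agg)-th member of the active aggregator; if it
 * is not OK, try the members after it, then fall back to the first OK
 * member from the start of the list. */
static int pick_3ad_slave(const struct slave_view *s, size_t n, int agg_id,
			  unsigned int slaves_in_agg, unsigned int hash)
{
	int slave_agg_no;

	assert(s != NULL || n == 0);
	if (slaves_in_agg == 0)
		return -1;
	slave_agg_no = (int)(hash % slaves_in_agg);

	for (size_t i = 0; i < n; i++) {
		if (s[i].agg_id != agg_id)
			continue;
		if (--slave_agg_no < 0 && s[i].ok)
			return (int)i;
	}
	if (slave_agg_no >= 0)
		return -1; /* fewer members than slaves_in_agg claimed */

	for (size_t i = 0; i < n; i++)
		if (s[i].ok && s[i].agg_id == agg_id)
			return (int)i;
	return -1;
}
```

Because slave_agg_no keeps decreasing once it goes negative, the first loop already tries every member after the hashed one; the second loop only has to cover the members before it.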

^ permalink raw reply related

* [PATCH net-next v2 2/6] bonding: remove the no effect lock for bond_3ad_lacpdu_recv()
From: Ding Tianhong @ 2013-09-04  9:43 UTC (permalink / raw)
  To: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Veaceslav Falico, Netdev

No pointer here needs read lock protection, so remove the unnecessary lock
and improve performance of the 3ad receive path.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Nikolay Aleksandrov <nikolay@redhat.com>
---
 drivers/net/bonding/bond_3ad.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 7a3860f..c134f43 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2494,9 +2494,7 @@ int bond_3ad_lacpdu_recv(const struct sk_buff *skb, struct bonding *bond,
 	if (!lacpdu)
 		return ret;
 
-	read_lock(&bond->lock);
 	ret = bond_3ad_rx_indication(lacpdu, slave, skb->len);
-	read_unlock(&bond->lock);
 	return ret;
 }
 
-- 
1.8.2.1

^ permalink raw reply related

* [PATCH net-next v2 6/6] bonding: use RCU protection for alb xmit path
From: Ding Tianhong @ 2013-09-04  9:44 UTC (permalink / raw)
  To: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Veaceslav Falico, Netdev, Hideaki YOSHIFUJI

Commit 278b20837511776dc9d5f6ee1c7fabd5479838bb
(bonding: initial RCU conversion) converted the round-robin, active-backup,
broadcast and xor xmit paths to RCU protection, which improves performance
for those modes. This patch converts the xmit path for alb mode as well.

Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Cc: Nikolay Aleksandrov <nikolay@redhat.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
---
 drivers/net/bonding/bond_alb.c | 20 +++++++++-----------
 drivers/net/bonding/bonding.h  |  2 +-
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index c75d383..c0bfd56 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -229,7 +229,7 @@ static struct slave *tlb_get_least_loaded_slave(struct bonding *bond)
 	max_gap = LLONG_MIN;
 
 	/* Find the slave with the largest gap */
-	bond_for_each_slave(bond, slave) {
+	bond_for_each_slave_rcu(bond, slave) {
 		if (SLAVE_IS_OK(slave)) {
 			long long gap = compute_gap(slave);
 
@@ -625,12 +625,14 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon
 {
 	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
 	struct arp_pkt *arp = arp_pkt(skb);
-	struct slave *assigned_slave;
+	struct slave *assigned_slave, *curr_active_slave;
 	struct rlb_client_info *client_info;
 	u32 hash_index = 0;
 
 	_lock_rx_hashtbl(bond);
 
+	curr_active_slave = rcu_dereference(bond->curr_active_slave);
+
 	hash_index = _simple_hash((u8 *)&arp->ip_dst, sizeof(arp->ip_dst));
 	client_info = &(bond_info->rx_hashtbl[hash_index]);
 
@@ -654,9 +656,9 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon
 			 * move the old client to primary (curr_active_slave) so
 			 * that the new client can be assigned to this entry.
 			 */
-			if (bond->curr_active_slave &&
-			    client_info->slave != bond->curr_active_slave) {
-				client_info->slave = bond->curr_active_slave;
+			if (curr_active_slave &&
+			    client_info->slave != curr_active_slave) {
+				client_info->slave = curr_active_slave;
 				rlb_update_client(client_info);
 			}
 		}
@@ -1338,8 +1340,6 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
 
 	/* make sure that the curr_active_slave do not change during tx
 	 */
-	read_lock(&bond->lock);
-	read_lock(&bond->curr_slave_lock);
 
 	switch (ntohs(skb->protocol)) {
 	case ETH_P_IP: {
@@ -1422,12 +1422,12 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
 
 	if (!tx_slave) {
 		/* unbalanced or unassigned, send through primary */
-		tx_slave = bond->curr_active_slave;
+		tx_slave = rcu_dereference(bond->curr_active_slave);
 		bond_info->unbalanced_load += skb->len;
 	}
 
 	if (tx_slave && SLAVE_IS_OK(tx_slave)) {
-		if (tx_slave != bond->curr_active_slave) {
+		if (tx_slave != rcu_dereference(bond->curr_active_slave)) {
 			memcpy(eth_data->h_source,
 			       tx_slave->dev->dev_addr,
 			       ETH_ALEN);
@@ -1442,8 +1442,6 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
 		}
 	}
 
-	read_unlock(&bond->curr_slave_lock);
-	read_unlock(&bond->lock);
 	if (res) {
 		/* no suitable interface, frame not sent */
 		kfree_skb(skb);
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 80b170a..4282e65 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -536,7 +536,7 @@ static inline struct slave *bond_slave_has_mac(struct bonding *bond,
 {
 	struct slave *tmp;
 
-	bond_for_each_slave(bond, tmp)
+	bond_for_each_slave_rcu(bond, tmp)
 		if (ether_addr_equal_64bits(mac, tmp->dev->dev_addr))
 			return tmp;
 
-- 
1.8.2.1
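With the read locks gone, bond_alb_xmit() above calls rcu_dereference(bond->curr_active_slave) twice, once for the fallback and once for the source-MAC comparison, and the two reads may return different slaves. A sketch of taking a single snapshot instead; the types are hypothetical userspace stand-ins and __atomic_load_n approximates rcu_dereference().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical userspace stand-ins for the bonding structures. */
struct slave {
	unsigned char dev_addr[ETH_ALEN];
	bool ok; /* stands in for SLAVE_IS_OK() */
};

struct bonding {
	struct slave *curr_active_slave;
};

/* Pick the tx slave and rewrite the source MAC from ONE snapshot of
 * curr_active_slave, so the fallback and the comparison agree. */
static struct slave *alb_pick_tx(struct bonding *bond, struct slave *tx_slave,
				 unsigned char h_source[ETH_ALEN])
{
	struct slave *active;

	assert(bond != NULL);
	active = __atomic_load_n(&bond->curr_active_slave, __ATOMIC_ACQUIRE);
	if (!tx_slave)
		tx_slave = active; /* unbalanced or unassigned: use the primary */
	if (!tx_slave || !tx_slave->ok)
		return NULL;
	if (tx_slave != active)
		memcpy(h_source, tx_slave->dev_addr, ETH_ALEN);
	return tx_slave;
}
```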

^ permalink raw reply related

* [PATCH net-next v2 4/6] bonding: add rtnl lock for bonding_store_xmit_hash
From: Ding Tianhong @ 2013-09-04  9:43 UTC (permalink / raw)
  To: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Veaceslav Falico, Netdev

bonding_store_xmit_hash() can update the bond's xmit_policy, which is used
in the xmit path for xor mode. A problem may be hard to trigger, but following
the rule "don't change anything slave-related without rtnl", I think the rtnl
lock fits here. :)

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Nikolay Aleksandrov <nikolay@redhat.com>
---
 drivers/net/bonding/bond_sysfs.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 0f539de..deb1d8e 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -382,6 +382,9 @@ static ssize_t bonding_store_xmit_hash(struct device *d,
 	int new_value, ret = count;
 	struct bonding *bond = to_bond(d);
 
+	if (!rtnl_trylock())
+		return restart_syscall();
+
 	new_value = bond_parse_parm(buf, xmit_hashtype_tbl);
 	if (new_value < 0)  {
 		pr_err("%s: Ignoring invalid xmit hash policy value %.*s.\n",
@@ -396,6 +399,7 @@ static ssize_t bonding_store_xmit_hash(struct device *d,
 			xmit_hashtype_tbl[new_value].modename, new_value);
 	}
 
+	rtnl_unlock();
 	return ret;
 }
 static DEVICE_ATTR(xmit_hash_policy, S_IRUGO | S_IWUSR,
-- 
1.8.2.1
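The rtnl_trylock()/restart_syscall() pattern above backs out instead of sleeping when the lock is busy; the commonly cited reason is that sleeping on RTNL inside a sysfs store can deadlock against device unregistration, which holds RTNL while waiting for sysfs users to finish. A userspace sketch of the control flow, with an atomic_flag standing in for the RTNL mutex and a hypothetical RESTART_REQUEST code standing in for -ERESTARTNOINTR:

```c
#include <errno.h>
#include <stdatomic.h>

/* Stand-in for the RTNL mutex (the kernel's is a sleeping mutex). */
static atomic_flag big_lock = ATOMIC_FLAG_INIT;

/* Hypothetical "please retry" code, playing the role of restart_syscall(). */
#define RESTART_REQUEST (-EAGAIN)

static int policy; /* stands in for bond->params.xmit_policy */

/* Try the lock; if it is busy, ask the caller to retry instead of blocking.
 * All other paths share a single unlock, as in the patch. */
static int store_xmit_hash(int new_value)
{
	int ret = 0;

	if (atomic_flag_test_and_set(&big_lock))
		return RESTART_REQUEST;

	if (new_value < 0)
		ret = -EINVAL;
	else
		policy = new_value;

	atomic_flag_clear(&big_lock);
	return ret;
}
```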

^ permalink raw reply related

* [PATCH net-next v2 0/6] bonding: Patchset for rcu use in bonding
From: Ding Tianhong @ 2013-09-04  9:43 UTC (permalink / raw)
  To: Jay Vosburgh, Andy Gospodarek, David S. Miller,
	Nikolay Aleksandrov, Veaceslav Falico, Netdev

Hi:

This patch set converts the xmit paths of 3ad and alb modes to use RCU,
restructures and adds more RCU list functions,
adds the rtnl lock to protect bonding_store_xmit_hash(),
and removes the read lock in bond_3ad_lacpdu_recv().

I have tested the patches and found no problems so far.

v1: added a patch named "remove the no effect lock for bond_3ad_lacpdu_recv()".
    Took advice from Nikolay Aleksandrov and fixed some mistakes in the code.

v2: accepted Nikolay Aleksandrov's advice and modified patch 05/06.
    The new bond_for_each_slave_from() no longer uses bond->slave_cnt or an
    int cnt, and it tests well.
    Moved the rcu_dereference call of curr_active_slave after the lock to
    avoid a race condition with slave removal.

Ding Tianhong (4):
  bonding: remove the no effect lock for bond_3ad_lacpdu_recv()
  bonding: replace read_lock to rcu_read_lock for
    bond_3ad_get_active_agg_info()
  bonding: add rtnl lock for bonding_store_xmit_hash
  bonding: restructure and simplify bond_for_each_slave_next()

Wang Yufen (1):
  bonding: simplify and use RCU protection for 3ad xmit path

Yang Yingliang (1):
  bonding: use RCU protection for alb xmit path

 drivers/net/bonding/bond_3ad.c   | 38 +++++++++++++++--------------------
 drivers/net/bonding/bond_alb.c   | 23 ++++++++++-----------
 drivers/net/bonding/bond_main.c  |  6 ++----
 drivers/net/bonding/bond_sysfs.c |  4 ++++
 drivers/net/bonding/bonding.h    | 43 +++++++++++++++++++++++++++++++++++-----
 5 files changed, 70 insertions(+), 44 deletions(-)

-- 
1.8.2.1

^ permalink raw reply

* Re: [PATCH 5/7] ixgbe: use pcie_capability_read_word() to simplify code
From: Jeff Kirsher @ 2013-09-04  9:26 UTC (permalink / raw)
  To: Yijing Wang
  Cc: e1000-devel, Benjamin Herrenschmidt, Hanjun Guo, linux-kernel,
	James E.J. Bottomley, netdev, linux-pci, Bjorn Helgaas,
	David S. Miller
In-Reply-To: <1378193715-25328-5-git-send-email-wangyijing@huawei.com>



On Tue, 2013-09-03 at 15:35 +0800, Yijing Wang wrote:
> use pcie_capability_read_word() to simplify code.
> 
> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> Cc: e1000-devel@lists.sourceforge.net
> Cc: netdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    6 ++----
>  1 files changed, 2 insertions(+), 4 deletions(-)

Thanks Yijing, I will add it to my queue.

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

[-- Attachment #3: Type: text/plain, Size: 257 bytes --]

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply

* [net-next v4 RFC] ixgbe: Get and display the notifications from changes of the Rx vxlan UDP port
From: Jeff Kirsher @ 2013-09-04  9:13 UTC (permalink / raw)
  To: davem
  Cc: Joseph Gasparakis, netdev, gospo, sassmann, Don Skidmore,
	Jeff Kirsher
In-Reply-To: <1378286019-8719-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Joseph Gasparakis <joseph.gasparakis@intel.com>

This RFC patch showcases how drivers can use the changes in
"vxlan: Notify drivers for listening UDP port changes", and its
sole purpose is to help people test the changes introduced there.
This is not meant to be submitted for inclusion in the kernel.

CC: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 7aba452..2d2c9b6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -47,6 +47,9 @@
 #include <linux/if_bridge.h>
 #include <linux/prefetch.h>
 #include <scsi/fc/fc_fcoe.h>
+#ifdef CONFIG_VXLAN_MODULE
+#include <net/vxlan.h>
+#endif
 
 #include "ixgbe.h"
 #include "ixgbe_common.h"
@@ -5178,6 +5181,9 @@ static int ixgbe_open(struct net_device *netdev)
 
 	ixgbe_up_complete(adapter);
 
+#ifdef CONFIG_VXLAN_MODULE
+	vxlan_get_rx_port(netdev);
+#endif
 	return 0;
 
 err_set_queues:
@@ -7290,6 +7296,21 @@ static int ixgbe_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
 	return ndo_dflt_bridge_getlink(skb, pid, seq, dev, mode);
 }
 
+#ifdef CONFIG_VXLAN_MODULE
+static void ixgbe_add_vxlan_port(struct net_device *netdev,
+				 sa_family_t sa_family, __u16 port)
+{
+	netdev_info(netdev, ">>>>> Adding VXLAN port %d / protocol IPv%d\n",
+			     port, sa_family == AF_INET ? 4 : 6);
+}
+
+static void ixgbe_del_vxlan_port(struct net_device *netdev,
+				 sa_family_t sa_family, __u16 port)
+{
+	netdev_info(netdev, "<<<<< Removing VXLAN port %d / protocol IPv%d\n",
+			     port, sa_family == AF_INET ? 4 : 6);
+}
+#endif
 static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_open		= ixgbe_open,
 	.ndo_stop		= ixgbe_close,
@@ -7334,6 +7355,10 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_fdb_add		= ixgbe_ndo_fdb_add,
 	.ndo_bridge_setlink	= ixgbe_ndo_bridge_setlink,
 	.ndo_bridge_getlink	= ixgbe_ndo_bridge_getlink,
+#ifdef CONFIG_VXLAN_MODULE
+	.ndo_add_vxlan_port	= ixgbe_add_vxlan_port,
+	.ndo_del_vxlan_port	= ixgbe_del_vxlan_port,
+#endif
 };
 
 /**
-- 
1.8.3.1

^ permalink raw reply related

* [net-next v4] vxlan: Notify drivers for listening UDP port changes
From: Jeff Kirsher @ 2013-09-04  9:13 UTC (permalink / raw)
  To: davem
  Cc: Joseph Gasparakis, netdev, gospo, sassmann, John Fastabend,
	Stephen Hemminger, Jeff Kirsher

From: Joseph Gasparakis <joseph.gasparakis@intel.com>

This patch adds two more ndo ops: ndo_add_vxlan_port() and
ndo_del_vxlan_port().

Through these functions, drivers are notified about changes to the UDP
port(s) VXLAN is listening on. Also, when physical ports come up, they can
now call vxlan_get_rx_port() to obtain the port number(s) of any VXLAN
interfaces that were already up before them.

This information about the listening UDP port can be used for VXLAN
related offloads.
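The scheme described above amounts to two paths: broadcast a newly listening port to every device that implements the optional hook, and replay the already-open ports to a single device that comes up later (the role of vxlan_get_rx_port()). The userspace sketch below models both with hypothetical types and helpers (`struct fake_dev`, `struct fake_ops`, `notify_add_port()`, `replay_ports()`); it does not model the kernel's RCU device walk or sock_lock. It converts the socket's port with ntohs(), because inet_sport is kept in network byte order. The patch's htons() yields the same value, but ntohs() states the direction of the conversion.

```c
#include <arpa/inet.h>   /* htons, ntohs */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>  /* sa_family_t, AF_INET */

struct fake_dev;

/* Hypothetical stand-in for the two new net_device_ops hooks. */
struct fake_ops {
	void (*add_port)(struct fake_dev *dev, sa_family_t family, uint16_t port);
	void (*del_port)(struct fake_dev *dev, sa_family_t family, uint16_t port);
};

struct fake_dev {
	const struct fake_ops *ops;
	uint16_t last_port;   /* last port (host order) reported to the driver */
	int add_calls;        /* how many add notifications were received */
};

/* Broadcast a newly listening port; devices without the hook are skipped. */
static void notify_add_port(struct fake_dev **devs, size_t ndevs,
			    sa_family_t family, uint16_t be_port)
{
	uint16_t port = ntohs(be_port); /* network -> host byte order */

	assert(devs != NULL || ndevs == 0);
	for (size_t i = 0; i < ndevs; i++)
		if (devs[i]->ops && devs[i]->ops->add_port)
			devs[i]->ops->add_port(devs[i], family, port);
}

/* Replay every already-open port to one device that has just come up. */
static void replay_ports(struct fake_dev *dev, const uint16_t *be_ports,
			 size_t nports, sa_family_t family)
{
	if (!dev || !dev->ops || !dev->ops->add_port)
		return;
	for (size_t i = 0; i < nports; i++)
		dev->ops->add_port(dev, family, ntohs(be_ports[i]));
}

/* Example driver callback: just record what it was told. */
static void record_add(struct fake_dev *dev, sa_family_t family, uint16_t port)
{
	(void)family;
	dev->last_port = port;
	dev->add_calls++;
}
```

Checking the hook pointer before calling it is what lets drivers without VXLAN offload ignore the new ops entirely.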

A big thank you to John Fastabend (john.r.fastabend@intel.com) for his
input and his suggestions on this patch set.

CC: John Fastabend <john.r.fastabend@intel.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/vxlan.c       | 68 ++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/netdevice.h | 19 +++++++++++++
 include/net/vxlan.h       |  1 +
 3 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index ebda3a1..0b62d82 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -558,6 +558,40 @@ static int vxlan_fdb_append(struct vxlan_fdb *f,
 	return 1;
 }
 
+/* Notify netdevs that UDP port started listening */
+static void vxlan_notify_add_rx_port(struct sock *sk)
+{
+	struct net_device *dev;
+	struct net *net = sock_net(sk);
+	sa_family_t sa_family = sk->sk_family;
+	u16 port = htons(inet_sk(sk)->inet_sport);
+
+	rcu_read_lock();
+	for_each_netdev_rcu(net, dev) {
+		if (dev->netdev_ops->ndo_add_vxlan_port)
+			dev->netdev_ops->ndo_add_vxlan_port(dev, sa_family,
+							    port);
+	}
+	rcu_read_unlock();
+}
+
+/* Notify netdevs that UDP port is no more listening */
+static void vxlan_notify_del_rx_port(struct sock *sk)
+{
+	struct net_device *dev;
+	struct net *net = sock_net(sk);
+	sa_family_t sa_family = sk->sk_family;
+	u16 port = htons(inet_sk(sk)->inet_sport);
+
+	rcu_read_lock();
+	for_each_netdev_rcu(net, dev) {
+		if (dev->netdev_ops->ndo_del_vxlan_port)
+			dev->netdev_ops->ndo_del_vxlan_port(dev, sa_family,
+							    port);
+	}
+	rcu_read_unlock();
+}
+
 /* Add new entry to forwarding table -- assumes lock held */
 static int vxlan_fdb_create(struct vxlan_dev *vxlan,
 			    const u8 *mac, union vxlan_addr *ip,
@@ -909,13 +943,16 @@ static void vxlan_sock_hold(struct vxlan_sock *vs)
 
 void vxlan_sock_release(struct vxlan_sock *vs)
 {
-	struct vxlan_net *vn = net_generic(sock_net(vs->sock->sk), vxlan_net_id);
+	struct sock *sk = vs->sock->sk;
+	struct net *net = sock_net(sk);
+	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
 
 	if (!atomic_dec_and_test(&vs->refcnt))
 		return;
 
 	spin_lock(&vn->sock_lock);
 	hlist_del_rcu(&vs->hlist);
+	vxlan_notify_del_rx_port(sk);
 	spin_unlock(&vn->sock_lock);
 
 	queue_work(vxlan_wq, &vs->del_work);
@@ -1980,6 +2017,34 @@ static struct device_type vxlan_type = {
 	.name = "vxlan",
 };
 
+/* Calls the ndo_add_vxlan_port of the caller in order to
+ * supply the listening VXLAN udp ports.
+ */
+void vxlan_get_rx_port(struct net_device *dev)
+{
+	struct vxlan_sock *vs;
+	struct net *net = dev_net(dev);
+	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
+	sa_family_t sa_family;
+	u16 port;
+	int i;
+
+	if (!dev || !dev->netdev_ops || !dev->netdev_ops->ndo_add_vxlan_port)
+		return;
+
+	spin_lock(&vn->sock_lock);
+	for (i = 0; i < PORT_HASH_SIZE; ++i) {
+		hlist_for_each_entry_rcu(vs, vs_head(net, i), hlist) {
+			port = htons(inet_sk(vs->sock->sk)->inet_sport);
+			sa_family = vs->sock->sk->sk_family;
+			dev->netdev_ops->ndo_add_vxlan_port(dev, sa_family,
+							    port);
+		}
+	}
+	spin_unlock(&vn->sock_lock);
+}
+EXPORT_SYMBOL_GPL(vxlan_get_rx_port);
+
 /* Initialize the device structure. */
 static void vxlan_setup(struct net_device *dev)
 {
@@ -2239,6 +2304,7 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
 
 	spin_lock(&vn->sock_lock);
 	hlist_add_head_rcu(&vs->hlist, vs_head(net, port));
+	vxlan_notify_add_rx_port(sk);
 	spin_unlock(&vn->sock_lock);
 
 	/* Mark socket as an encapsulation socket. */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3ad49b8..8ed4ae9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -948,6 +948,19 @@ struct netdev_phys_port_id {
  *	Called to get ID of physical port of this device. If driver does
  *	not implement this, it is assumed that the hw is not able to have
  *	multiple net devices on single physical port.
+ *
+ * void (*ndo_add_vxlan_port)(struct  net_device *dev,
+ *			      sa_family_t sa_family, __u16 port);
+ *	Called by vxlan to notify a driver about the UDP port and socket
+ *	address family that vxlan is listening to. It is called only when
+ *	a new port starts listening. The operation is protected by the
+ *	vxlan_net->sock_lock.
+ *
+ * void (*ndo_del_vxlan_port)(struct  net_device *dev,
+ *			      sa_family_t sa_family, __u16 port);
+ *	Called by vxlan to notify the driver about a UDP port and socket
+ *	address family that vxlan is not listening to anymore. The operation
+ *	is protected by the vxlan_net->sock_lock.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1078,6 +1091,12 @@ struct net_device_ops {
 						      bool new_carrier);
 	int			(*ndo_get_phys_port_id)(struct net_device *dev,
 							struct netdev_phys_port_id *ppid);
+	void			(*ndo_add_vxlan_port)(struct  net_device *dev,
+						      sa_family_t sa_family,
+						      __u16 port);
+	void			(*ndo_del_vxlan_port)(struct  net_device *dev,
+						      sa_family_t sa_family,
+						      __u16 port);
 };
 
 /*
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index e09c40b..2d64d3c 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -36,4 +36,5 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
 
 __be16 vxlan_src_port(__u16 port_min, __u16 port_max, struct sk_buff *skb);
 
+void vxlan_get_rx_port(struct net_device *netdev);
 #endif
-- 
1.8.3.1

^ permalink raw reply related

* [RFC Patch net-next] ipv6: do not allow ipv6 module to be removed
From: Cong Wang @ 2013-09-04  9:12 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Hideaki YOSHIFUJI, Stephen Hemminger, Cong Wang

There have been bug reports about the ipv6 module removal path before.
Also, as Stephen pointed out, now that the vxlan module has IPv6 support,
the ipv6 stub it uses is not safe against removal of this module either.
So, let's just remove inet6_exit() so that the ipv6 module can no longer
be unloaded.

Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>

---
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 136fe55..ad584eb 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -1023,51 +1023,4 @@ out_unregister_tcp_proto:
 }
 module_init(inet6_init);
 
-static void __exit inet6_exit(void)
-{
-	if (disable_ipv6_mod)
-		return;
-
-	/* First of all disallow new sockets creation. */
-	sock_unregister(PF_INET6);
-	/* Disallow any further netlink messages */
-	rtnl_unregister_all(PF_INET6);
-
-	udpv6_exit();
-	udplitev6_exit();
-	tcpv6_exit();
-
-	/* Cleanup code parts. */
-	ipv6_packet_cleanup();
-	ipv6_frag_exit();
-	ipv6_exthdrs_exit();
-	addrconf_cleanup();
-	ip6_flowlabel_cleanup();
-	ip6_route_cleanup();
-#ifdef CONFIG_PROC_FS
-
-	/* Cleanup code parts. */
-	if6_proc_exit();
-	ipv6_misc_proc_exit();
-	udplite6_proc_exit();
-	raw6_proc_exit();
-#endif
-	ipv6_netfilter_fini();
-	ipv6_stub = NULL;
-	igmp6_cleanup();
-	ndisc_cleanup();
-	ip6_mr_cleanup();
-	icmpv6_cleanup();
-	rawv6_exit();
-
-	unregister_pernet_subsys(&inet6_net_ops);
-	proto_unregister(&rawv6_prot);
-	proto_unregister(&udplitev6_prot);
-	proto_unregister(&udpv6_prot);
-	proto_unregister(&tcpv6_prot);
-
-	rcu_barrier(); /* Wait for completion of call_rcu()'s */
-}
-module_exit(inet6_exit);
-
 MODULE_ALIAS_NETPROTO(PF_INET6);

^ permalink raw reply related

* Re: [PATCH v3 1/3] Send loginuid and sessionid in SCM_AUDIT
From: Jan Kaluža @ 2013-09-04  9:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: rgb-H+wXaHxf7aLQT0dZR+AlfA, netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, LKML,
	eparis-H+wXaHxf7aLQT0dZR+AlfA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn, tj-DgEjT+Ai2ygdnm+yROfE0A,
	cgroups-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <87bo49gifv.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

On 09/04/2013 09:22 AM, Eric W. Biederman wrote:
> Jan Kaluza <jkaluza-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:
>
>> Server-like processes in many cases need credentials and other
>> metadata of the peer, to decide if the calling process is allowed to
>> request a specific action, or the server just wants to log away this
>> type of information for auditing tasks.
>>
>> The current practice to retrieve such process metadata is to look that
>> information up in procfs with the $PID received over SCM_CREDENTIALS.
>> This is sufficient for long-running tasks, but introduces a race which
>> cannot be worked around for short-lived processes; the calling
>> process and all the information in /proc/$PID/ is gone before the
>> receiver of the socket message can look it up.
>>
>> This introduces a new SCM type called SCM_AUDIT to allow the direct
>> attaching of "loginuid" and "sessionid" to SCM, which is significantly more
>> efficient and will reliably avoid the race with the round-trip over
>> procfs.
>
> Unless I am misreading something this patch will break the build if
> CONFIG_AUDITSYSCALL is not defined.

It does build. In this case, audit_get_loginuid returns INVALID_UID and 
audit_get_sessionid returns -1.

Jan Kaluza

> Eric
>
>
>> Signed-off-by: Jan Kaluza <jkaluza-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> ---
>>   include/linux/socket.h |  6 ++++++
>>   include/net/af_unix.h  |  2 ++
>>   include/net/scm.h      | 28 ++++++++++++++++++++++++++--
>>   net/unix/af_unix.c     |  7 +++++++
>>   4 files changed, 41 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/socket.h b/include/linux/socket.h
>> index 445ef75..505047a 100644
>> --- a/include/linux/socket.h
>> +++ b/include/linux/socket.h
>> @@ -130,6 +130,7 @@ static inline struct cmsghdr * cmsg_nxthdr (struct msghdr *__msg, struct cmsghdr
>>   #define	SCM_RIGHTS	0x01		/* rw: access rights (array of int) */
>>   #define SCM_CREDENTIALS 0x02		/* rw: struct ucred		*/
>>   #define SCM_SECURITY	0x03		/* rw: security label		*/
>> +#define SCM_AUDIT	0x04		/* rw: struct uaudit		*/
>>
>>   struct ucred {
>>   	__u32	pid;
>> @@ -137,6 +138,11 @@ struct ucred {
>>   	__u32	gid;
>>   };
>>
>> +struct uaudit {
>> +	__u32	loginuid;
>> +	__u32	sessionid;
>> +};
>> +
>>   /* Supported address families. */
>>   #define AF_UNSPEC	0
>>   #define AF_UNIX		1	/* Unix domain sockets 		*/
>> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
>> index a175ba4..3b9d22a 100644
>> --- a/include/net/af_unix.h
>> +++ b/include/net/af_unix.h
>> @@ -36,6 +36,8 @@ struct unix_skb_parms {
>>   	u32			secid;		/* Security ID		*/
>>   #endif
>>   	u32			consumed;
>> +	kuid_t			loginuid;
>> +	unsigned int		sessionid;
>>   };
>>
>>   #define UNIXCB(skb) 	(*(struct unix_skb_parms *)&((skb)->cb))
>> diff --git a/include/net/scm.h b/include/net/scm.h
>> index 8de2d37..e349a25 100644
>> --- a/include/net/scm.h
>> +++ b/include/net/scm.h
>> @@ -6,6 +6,7 @@
>>   #include <linux/security.h>
>>   #include <linux/pid.h>
>>   #include <linux/nsproxy.h>
>> +#include <linux/audit.h>
>>
>>   /* Well, we should have at least one descriptor open
>>    * to accept passed FDs 8)
>> @@ -18,6 +19,11 @@ struct scm_creds {
>>   	kgid_t	gid;
>>   };
>>
>> +struct scm_audit {
>> +	kuid_t loginuid;
>> +	unsigned int sessionid;
>> +};
>> +
>>   struct scm_fp_list {
>>   	short			count;
>>   	short			max;
>> @@ -28,6 +34,7 @@ struct scm_cookie {
>>   	struct pid		*pid;		/* Skb credentials */
>>   	struct scm_fp_list	*fp;		/* Passed files		*/
>>   	struct scm_creds	creds;		/* Skb credentials	*/
>> +	struct scm_audit	audit;		/* Skb audit	*/
>>   #ifdef CONFIG_SECURITY_NETWORK
>>   	u32			secid;		/* Passed security ID 	*/
>>   #endif
>> @@ -58,6 +65,13 @@ static __inline__ void scm_set_cred(struct scm_cookie *scm,
>>   	scm->creds.gid = gid;
>>   }
>>
>> +static inline void scm_set_audit(struct scm_cookie *scm,
>> +				    kuid_t loginuid, unsigned int sessionid)
>> +{
>> +	scm->audit.loginuid = loginuid;
>> +	scm->audit.sessionid = sessionid;
>> +}
>> +
>>   static __inline__ void scm_destroy_cred(struct scm_cookie *scm)
>>   {
>>   	put_pid(scm->pid);
>> @@ -77,8 +91,12 @@ static __inline__ int scm_send(struct socket *sock, struct msghdr *msg,
>>   	memset(scm, 0, sizeof(*scm));
>>   	scm->creds.uid = INVALID_UID;
>>   	scm->creds.gid = INVALID_GID;
>> -	if (forcecreds)
>> -		scm_set_cred(scm, task_tgid(current), current_uid(), current_gid());
>> +	if (forcecreds) {
>> +		scm_set_cred(scm, task_tgid(current), current_uid(),
>> +			     current_gid());
>> +		scm_set_audit(scm, audit_get_loginuid(current),
>> +			      audit_get_sessionid(current));
>> +	}
>>   	unix_get_peersec_dgram(sock, scm);
>>   	if (msg->msg_controllen <= 0)
>>   		return 0;
>> @@ -123,7 +141,13 @@ static __inline__ void scm_recv(struct socket *sock, struct msghdr *msg,
>>   			.uid = from_kuid_munged(current_ns, scm->creds.uid),
>>   			.gid = from_kgid_munged(current_ns, scm->creds.gid),
>>   		};
>> +		struct uaudit uaudits = {
>> +			.loginuid = from_kuid_munged(current_ns,
>> +						     scm->audit.loginuid),
>> +			.sessionid = scm->audit.sessionid,
>> +		};
>>   		put_cmsg(msg, SOL_SOCKET, SCM_CREDENTIALS, sizeof(ucreds), &ucreds);
>> +		put_cmsg(msg, SOL_SOCKET, SCM_AUDIT, sizeof(uaudits), &uaudits);
>>   	}
>>
>>   	scm_destroy_cred(scm);
>> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
>> index 86de99a..c410f76 100644
>> --- a/net/unix/af_unix.c
>> +++ b/net/unix/af_unix.c
>> @@ -1393,6 +1393,8 @@ static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool sen
>>   	UNIXCB(skb).pid  = get_pid(scm->pid);
>>   	UNIXCB(skb).uid = scm->creds.uid;
>>   	UNIXCB(skb).gid = scm->creds.gid;
>> +	UNIXCB(skb).loginuid = scm->audit.loginuid;
>> +	UNIXCB(skb).sessionid = scm->audit.sessionid;
>>   	UNIXCB(skb).fp = NULL;
>>   	if (scm->fp && send_fds)
>>   		err = unix_attach_fds(scm, skb);
>> @@ -1416,6 +1418,8 @@ static void maybe_add_creds(struct sk_buff *skb, const struct socket *sock,
>>   	    test_bit(SOCK_PASSCRED, &other->sk_socket->flags)) {
>>   		UNIXCB(skb).pid  = get_pid(task_tgid(current));
>>   		current_uid_gid(&UNIXCB(skb).uid, &UNIXCB(skb).gid);
>> +		UNIXCB(skb).loginuid = audit_get_loginuid(current);
>> +		UNIXCB(skb).sessionid = audit_get_sessionid(current);
>>   	}
>>   }
>>
>> @@ -1812,6 +1816,7 @@ static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock,
>>   		memset(&tmp_scm, 0, sizeof(tmp_scm));
>>   	}
>>   	scm_set_cred(siocb->scm, UNIXCB(skb).pid, UNIXCB(skb).uid, UNIXCB(skb).gid);
>> +	scm_set_audit(siocb->scm, UNIXCB(skb).loginuid, UNIXCB(skb).sessionid);
>>   	unix_set_secdata(siocb->scm, skb);
>>
>>   	if (!(flags & MSG_PEEK)) {
>> @@ -1993,6 +1998,8 @@ again:
>>   		} else if (test_bit(SOCK_PASSCRED, &sock->flags)) {
>>   			/* Copy credentials */
>>   			scm_set_cred(siocb->scm, UNIXCB(skb).pid, UNIXCB(skb).uid, UNIXCB(skb).gid);
>> +			scm_set_audit(siocb->scm, UNIXCB(skb).loginuid,
>> +				      UNIXCB(skb).sessionid);
>>   			check_creds = 1;
>>   		}

^ permalink raw reply

* Re: xen-netback: count number required slots for an skb more carefully
From: Wei Liu @ 2013-09-04  8:41 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Matt Wilson, annie li, Wei Liu, David Vrabel, xen-devel,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, netdev, msw
In-Reply-To: <1378283770.17510.52.camel@kazak.uk.xensource.com>

On Wed, Sep 04, 2013 at 09:36:10AM +0100, Ian Campbell wrote:
> On Tue, 2013-09-03 at 23:04 -0700, Matt Wilson wrote:
> > On Wed, Sep 04, 2013 at 10:25:59AM +0800, annie li wrote:
> > > On 2013-9-4 5:53, Wei Liu wrote:
> > [...]
> > > >Matt, do you fancy sending the final version? IIRC the commit message
> > > >needs to be re-written. I personally still prefer Matt's solution as
> > > >it a) makes efficient use of the ring, b) uses ring pointers to
> > > >calculate slots which is most accurate, c) removes the dependence on
> > > >MAX_SKB_FRAGS in guest RX path.
> > > >
> > > >Anyway, we should get this fixed ASAP.
> 
> Yes. Would there be any harm in just applying David's patch (simply
> because it is the one currently in our hands and it is based on a recent
> enough kernel)? It looks right from a correctness PoV to me.
> 
> > > Totally agree. This issue is easy to reproduce with a large MTU.
> > > It is better to upstream the fix soon in case others hit it and
> > > waste time fixing it.
> > 
> > I'd like to go with Xi's proposed patch that I posted earlier.
> 
> I don't think I saw that, do you have a message-id?
> 
> ("Xi" is a bit too short for a search on my INBOX unfortunately).
> 

This one <1373409659-22383-1-git-send-email-msw@amazon.com>

Wei.

> >  The main
> > thing that's kept me from sending a final version is lack of time to
> > retest against a newer kernel.
> > 
> > Could someone help out with that? I probably can't get to it until the
> > end of the week.
> > 
> > Sorry for the delay. :-(
> > 
> > --msw
> 

^ permalink raw reply

* Re: xen-netback: count number required slots for an skb more carefully
From: Ian Campbell @ 2013-09-04  8:36 UTC (permalink / raw)
  To: Matt Wilson
  Cc: annie li, Wei Liu, David Vrabel, xen-devel, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, netdev, msw
In-Reply-To: <20130904060446.GA25227@u109add4315675089e695.ant.amazon.com>

On Tue, 2013-09-03 at 23:04 -0700, Matt Wilson wrote:
> On Wed, Sep 04, 2013 at 10:25:59AM +0800, annie li wrote:
> > On 2013-9-4 5:53, Wei Liu wrote:
> [...]
> > >Matt, do you fancy sending the final version? IIRC the commit message
> > >needs to be re-written. I personally still prefer Matt's solution as
> > >it a) makes efficient use of the ring, b) uses ring pointers to
> > >calculate slots which is most accurate, c) removes the dependence on
> > >MAX_SKB_FRAGS in guest RX path.
> > >
> > >Anyway, we should get this fixed ASAP.

Yes. Would there be any harm in just applying David's patch (simply
because it is the one currently in our hands and it is based on a recent
enough kernel)? It looks right from a correctness PoV to me.

> > Totally agree. This issue is easy to reproduce with a large MTU.
> > It is better to upstream the fix soon in case others hit it and
> > waste time fixing it.
> 
> I'd like to go with Xi's proposed patch that I posted earlier.

I don't think I saw that, do you have a message-id?

("Xi" is a bit too short for a search on my INBOX unfortunately).

>  The main
> thing that's kept me from sending a final version is lack of time to
> retest against a newer kernel.
> 
> Could someone help out with that? I probably can't get to it until the
> end of the week.
> 
> Sorry for the delay. :-(
> 
> --msw
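This thread is about counting how many ring slots an skb needs on the guest RX path. Under the simplifying assumption that each slot carries data from at most one page and that fragments are not coalesced, a buffer needs one slot per page it touches. A fragment that straddles a page boundary therefore needs an extra slot even when it is smaller than a page. The sketch below counts slots that way. The 4096-byte page size, `struct frag`, `pages_spanned()` and `count_slots()` are assumptions for illustration, not the xen-netback code or any of the patches under discussion.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */

/* Hypothetical description of one piece of packet data. */
struct frag {
	size_t offset; /* byte offset of the data within its first page */
	size_t len;    /* length of the data in bytes */
};

/* Number of pages (and hence slots, in this model) touched by 'len'
 * bytes starting at 'offset' within a page. */
static size_t pages_spanned(size_t offset, size_t len)
{
	if (len == 0)
		return 0;
	offset %= SKETCH_PAGE_SIZE;
	return (offset + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Sum the slots needed by every piece of data; the linear area is
 * simply treated as one more entry in 'frags'. */
static size_t count_slots(const struct frag *frags, size_t nfrags)
{
	size_t slots = 0;

	assert(nfrags == 0 || frags != NULL);
	for (size_t i = 0; i < nfrags; i++)
		slots += pages_spanned(frags[i].offset, frags[i].len);
	return slots;
}
```

A count that assumes one slot per fragment, or that ignores the page offset, gets a 200-byte fragment at offset 4000 wrong (it needs 2 slots, not 1). The sketch does not reproduce how the proposed fixes bound the count against the ring.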

^ permalink raw reply

* Re: [Xen-devel] xen-netback: count number required slots for an skb more carefully
From: Wei Liu @ 2013-09-04  8:20 UTC (permalink / raw)
  To: annie li
  Cc: Wei Liu, Matt Wilson, David Vrabel, xen-devel,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, Ian Campbell, netdev, msw
In-Reply-To: <5226EA1B.1030303@oracle.com>

On Wed, Sep 04, 2013 at 04:06:51PM +0800, annie li wrote:
> 
> On 2013-9-4 15:38, Wei Liu wrote:
> >On Wed, Sep 04, 2013 at 02:56:20PM +0800, annie li wrote:
> >>On 2013-9-4 14:04, Matt Wilson wrote:
> >>>On Wed, Sep 04, 2013 at 10:25:59AM +0800, annie li wrote:
> >>>>On 2013-9-4 5:53, Wei Liu wrote:
> >>>[...]
> >>>>>Matt, do you fancy sending the final version? IIRC the commit message
> >>>>>needs to be re-written. I personally still prefer Matt's solution as
> >>>>>it a) makes efficient use of the ring, b) uses ring pointers to
> >>>>>calculate slots which is most accurate, c) removes the dependence on
> >>>>>MAX_SKB_FRAGS in guest RX path.
> >>>>>
> >>>>>Anyway, we should get this fixed ASAP.
> >>>>Totally agree. This issue is easy to reproduce with a large MTU.
> >>>>It is better to upstream the fix soon in case others hit it and
> >>>>waste time fixing it.
> >>>I'd like to go with Xi's proposed patch that I posted earlier. The main
> >>>thing that's kept me from sending a final version is lack of time to
> >>>retest against a newer kernel.
> >>>
> >>>Could someone help out with that? I probably can't get to it until the
> >>>end of the week.
> >>I can rebase it. Since Wei's NAPI & 1:1 model patch went into
> >>net-next already, it cannot be applied directly.
> >>
> >I think this one should go to net then queue up for stable, not
> >net-next.
> If so, that would be easier. It is almost the end of the day here, and I
> do not have enough time to test it until tomorrow. Please go ahead
> with it if you have time.
> 

Sure, I'm fine with this. I will spend some time rebasing it today. But
eventually I would like to get it tested as much as possible. That is to
say, I would need confirmation / Acked-by from you, David and Matt.

Wei.

> Thanks
> Annie
> >Wei.
> >
> >>Thanks
> >>Annie
> >>>Sorry for the delay. :-(
> >>>
> >>>--msw

^ permalink raw reply

* Re: [Xen-devel] xen-netback: count number required slots for an skb more carefully
From: annie li @ 2013-09-04  8:06 UTC (permalink / raw)
  To: Wei Liu
  Cc: Matt Wilson, David Vrabel, xen-devel, Konrad Rzeszutek Wilk,
	Boris Ostrovsky, Ian Campbell, netdev, msw
In-Reply-To: <20130904073857.GI14104@zion.uk.xensource.com>


On 2013-9-4 15:38, Wei Liu wrote:
> On Wed, Sep 04, 2013 at 02:56:20PM +0800, annie li wrote:
>> On 2013-9-4 14:04, Matt Wilson wrote:
>>> On Wed, Sep 04, 2013 at 10:25:59AM +0800, annie li wrote:
>>>> On 2013-9-4 5:53, Wei Liu wrote:
>>> [...]
>>>>> Matt, do you fancy sending the final version? IIRC the commit message
>>>>> needs to be re-written. I personally still prefer Matt's solution as
>>>>> it a) makes efficient use of the ring, b) uses ring pointers to
>>>>> calculate slots which is most accurate, c) removes the dependence on
>>>>> MAX_SKB_FRAGS in guest RX path.
>>>>>
>>>>> Anyway, we should get this fixed ASAP.
>>>> Totally agree. This issue is easy to reproduce with a large MTU.
>>>> It is better to upstream the fix soon in case others hit it and
>>>> waste time fixing it.
>>> I'd like to go with Xi's proposed patch that I posted earlier. The main
>>> thing that's kept me from sending a final version is lack of time to
>>> retest against a newer kernel.
>>>
>>> Could someone help out with that? I probably can't get to it until the
>>> end of the week.
>> I can rebase it. Since Wei's NAPI & 1:1 model patch went into
>> net-next already, it cannot be applied directly.
>>
> I think this one should go to net then queue up for stable, not
> net-next.
If so, that would be easier. It is almost the end of the day here, and I do
not have enough time to test it until tomorrow. Please go ahead with it
if you have time.

Thanks
Annie
> Wei.
>
>> Thanks
>> Annie
>>> Sorry for the delay. :-(
>>>
>>> --msw

^ permalink raw reply

* [PATCH RESEND net-next] net: usbnet: update addr_assign_type if appropriate
From: Bjørn Mork @ 2013-09-04  7:42 UTC (permalink / raw)
  To: netdev; +Cc: linux-usb, Bjørn Mork

This module generates a common default address on init,
using eth_random_addr. Set addr_assign_type to let
userspace know the address is random unless it was
overridden by the minidriver.
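The check in this patch works because the generated address stays in place unless a minidriver overwrites it. As a sketch, the function below builds an address the same way eth_random_addr() does (random bytes, multicast bit cleared, locally administered bit set), and a second helper classifies the device address by comparing it with the generated node_id. It uses rand() only to stay self-contained. `make_random_mac()`, `classify_addr()`, `MAC_LEN` and the `ADDR_*` values are hypothetical names, not the kernel's NET_ADDR_* constants.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical stand-ins for the address-assignment types. */
enum addr_assign { ADDR_DEFAULT, ADDR_RANDOM };

/* Fill addr with random bytes, then force a unicast (bit 0 of the first
 * octet clear) and locally administered (bit 1 set) address.
 * rand() is used only to keep the sketch self-contained. */
static void make_random_mac(unsigned char addr[MAC_LEN])
{
	assert(addr != NULL);
	for (int i = 0; i < MAC_LEN; i++)
		addr[i] = (unsigned char)(rand() & 0xff);
	addr[0] &= 0xfe; /* clear the multicast bit */
	addr[0] |= 0x02; /* set the locally administered bit */
}

/* If the minidriver left the generated address in place, report it as
 * random; otherwise keep the default assignment type. */
static enum addr_assign classify_addr(const unsigned char dev_addr[MAC_LEN],
				      const unsigned char node_id[MAC_LEN])
{
	return memcmp(dev_addr, node_id, MAC_LEN) == 0 ? ADDR_RANDOM
						       : ADDR_DEFAULT;
}
```

Comparing against the generated value, rather than tracking a separate flag, means any minidriver that writes its own address is handled without changes to the minidriver.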

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Oliver Neukum <oliver@neukum.org>
---
Sorry this patch missed the "addr_assign_type series" where it should
have been.

 drivers/net/usb/usbnet.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index e4811d7..74a8e3c 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1629,6 +1629,10 @@ usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod)
 		dev->rx_urb_size = dev->hard_mtu;
 	dev->maxpacket = usb_maxpacket (dev->udev, dev->out, 1);
 
+	/* let userspace know we have a random address */
+	if (ether_addr_equal(net->dev_addr, node_id))
+		net->addr_assign_type = NET_ADDR_RANDOM;
+
 	if ((dev->driver_info->flags & FLAG_WLAN) != 0)
 		SET_NETDEV_DEVTYPE(net, &wlan_type);
 	if ((dev->driver_info->flags & FLAG_WWAN) != 0)
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH v3 0/3] Send audit/procinfo/cgroup data in socket-level control message
From: Eric W. Biederman @ 2013-09-04  7:42 UTC (permalink / raw)
  To: Jan Kaluza
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, LKML, netdev-u79uwXL29TY76Z2rM5mHXA,
	eparis-H+wXaHxf7aLQT0dZR+AlfA, rgb-H+wXaHxf7aLQT0dZR+AlfA,
	tj-DgEjT+Ai2ygdnm+yROfE0A, lizefan-hv44wF8Li93QT0dZR+AlfA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn
In-Reply-To: <1378275261-4553-1-git-send-email-jkaluza-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Jan Kaluza <jkaluza-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:

> Hi,
>
> this patchset against net-next (applies also to linux-next) adds 3 new types
> of "Socket"-level control message (SCM_AUDIT, SCM_PROCINFO and SCM_CGROUP).
>
> Server-like processes in many cases need credentials and other
> metadata of the peer, to decide if the calling process is allowed to
> request a specific action, or the server just wants to log away this
> type of information for auditing tasks.
>
> The current practice to retrieve such process metadata is to look that
> information up in procfs with the $PID received over SCM_CREDENTIALS.
> This is sufficient for long-running tasks, but introduces a race which
> cannot be worked around for short-lived processes; the calling
> process and all the information in /proc/$PID/ is gone before the
> receiver of the socket message can look it up.

> Changes introduced in this patchset can also increase performance
> of such server-like processes, because the current way of opening and
> parsing /proc/$PID/* files is much more expensive than receiving this
> metadata using SCM.
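For reference, the existing SCM_CREDENTIALS mechanism mentioned above works like this from userspace on Linux. The receiver enables SO_PASSCRED, and each message then carries an SCM_CREDENTIALS control message holding a `struct ucred` with the sender's pid, uid and gid. This sketch uses a socketpair() inside one process so it can run standalone. `recv_with_creds()` is a hypothetical helper name, and the proposed SCM_AUDIT type is deliberately not used because it exists only in this patch series.

```c
#define _GNU_SOURCE /* struct ucred, SCM_CREDENTIALS */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Receive one datagram on 'fd' and copy the sender's credentials into *out.
 * Returns the payload length, or -1 on error or if no SCM_CREDENTIALS
 * control message was attached. */
static ssize_t recv_with_creds(int fd, char *buf, size_t len, struct ucred *out)
{
	union {                     /* control buffer, aligned for cmsghdr */
		char buf[CMSG_SPACE(sizeof(struct ucred))];
		struct cmsghdr align;
	} ctrl;
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = ctrl.buf,
		.msg_controllen = sizeof(ctrl.buf),
	};
	ssize_t n = recvmsg(fd, &msg, 0);

	if (n < 0)
		return -1;
	for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
		if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_CREDENTIALS) {
			memcpy(out, CMSG_DATA(c), sizeof(*out));
			return n;
		}
	}
	return -1;
}
```

The pid delivered this way is exactly what a server would then use to look up /proc/$PID/, which is the step the cover letter says becomes racy for short-lived senders.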

Can I just say ick, blech, barf, gag.

You don't require this information to be passed.  You are asking people
to support a lot of new code for the foreseeable future.  The only advantage
appears to be for short-lived racy processes that don't even bother to
make certain their message was acknowledged before exiting.

You sent this during the merge window which is the time for code
integration and testing not new code.

By my count you have overflowed cb in struct sk_buff and are stomping on
_skb_refdest.
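The objection above concerns the fixed-size control buffer `cb` in struct sk_buff. Protocols overlay their own per-packet structures on it, as the UNIXCB() cast in the patch does with struct unix_skb_parms. A compile-time size check catches an overflow before it can corrupt the fields that follow. The sketch assumes a 48-byte `cb` and uses a hypothetical struct loosely modelled on the fields visible in the patch. It illustrates the technique only and makes no claim about the real layout. On a typical 64-bit target this sketch struct occupies 40 bytes, so only 8 bytes would remain for further fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CB_SIZE 48 /* assumed size of the skb control buffer */

/* Hypothetical stand-in for a packet buffer with an opaque control area. */
struct fake_skb {
	_Alignas(8) char cb[CB_SIZE];
	void *next_field; /* whatever follows cb; an overflow would clobber it */
};

/* Hypothetical per-protocol metadata overlaid on cb, loosely modelled on
 * the fields visible in the patch, including the two new audit fields. */
struct sketch_unix_parms {
	void     *pid;
	uint32_t  uid;
	uint32_t  gid;
	void     *fp;
	uint32_t  secid;
	uint32_t  consumed;
	uint32_t  loginuid;
	uint32_t  sessionid;
};

/* The build fails if the overlay no longer fits, which is the job a
 * kernel-style BUILD_BUG_ON(sizeof(...) > sizeof(skb->cb)) check does. */
_Static_assert(sizeof(struct sketch_unix_parms) <=
	       sizeof(((struct fake_skb *)0)->cb),
	       "per-protocol data overflows the control buffer");

/* Bytes still free in cb after the overlay. */
static size_t cb_headroom(void)
{
	return sizeof(((struct fake_skb *)0)->cb) -
	       sizeof(struct sketch_unix_parms);
}
```

Adding fields to the sketch struct until the assertion fires shows how quickly the space runs out once pointers are involved. Whether a given series crosses the limit depends on the real layout and on config options such as CONFIG_SECURITY_NETWORK.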

If you are going to go crazy and pass things, is there a reason you do
not add a patch to pass the BSD SCM_CREDS?  That information seems more
relevant in a security context and for making security decisions than
about half the information you are passing.

Eric

^ permalink raw reply

* Re: [Xen-devel] xen-netback: count number required slots for an skb more carefully
From: Wei Liu @ 2013-09-04  7:38 UTC (permalink / raw)
  To: annie li
  Cc: Matt Wilson, Wei Liu, David Vrabel, xen-devel,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, Ian Campbell, netdev, msw
In-Reply-To: <5226D994.9000401@oracle.com>

On Wed, Sep 04, 2013 at 02:56:20PM +0800, annie li wrote:
> 
> On 2013-9-4 14:04, Matt Wilson wrote:
> >On Wed, Sep 04, 2013 at 10:25:59AM +0800, annie li wrote:
> >>On 2013-9-4 5:53, Wei Liu wrote:
> >[...]
> >>>Matt, do you fancy sending the final version? IIRC the commit message
> >>>needs to be re-written. I personally still prefer Matt's solution as
> >>>it a) makes efficient use of the ring, b) uses ring pointers to
> >>>calculate slots which is most accurate, c) removes the dependence on
> >>>MAX_SKB_FRAGS in guest RX path.
> >>>
> >>>Anyway, we should get this fixed ASAP.
> >>Totally agree. This issue is easy to reproduce with a large MTU.
> >>It is better to upstream the fix soon in case others hit it and
> >>waste time fixing it.
> >I'd like to go with Xi's proposed patch that I posted earlier. The main
> >thing that's kept me from sending a final version is lack of time to
> >retest against a newer kernel.
> >
> >Could someone help out with that? I probably can't get to it until the
> >end of the week.
> 
> I can rebase it. Since Wei's NAPI & 1:1 model patch went into
> net-next already, it cannot be applied directly.
> 

I think this one should go to net then queue up for stable, not
net-next.

Wei.

> Thanks
> Annie
> >
> >Sorry for the delay. :-(
> >
> >--msw

^ permalink raw reply

