Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net 1/2] Revert "net: use lib/percpu_counter API for fragmentation mem accounting"
From: Jesper Dangaard Brouer @ 2017-09-01  9:26 UTC (permalink / raw)
  To: netdev; +Cc: mkubecek, Florian Westphal, liujian56, Jesper Dangaard Brouer
In-Reply-To: <150425790711.22227.12264977619066874632.stgit@firesoul>

This reverts commit 6d7b857d541ecd1d9bd997c97242d4ef94b19de2.

There is a bug in fragmentation codes use of the percpu_counter API,
that can cause issues on systems with many CPUs.

The frag_mem_limit() just reads the global counter (fbc->count),
without considering other CPUs can have upto batch size (130K) that
haven't been subtracted yet.  Due to the 3MBytes lower thresh limit,
this become dangerous at >=24 CPUs (3*1024*1024/130000=24).

The correct API usage would be to use __percpu_counter_compare() which
does the right thing, and takes into account the number of (online)
CPUs and batch size, to account for this and call __percpu_counter_sum()
when needed.

We choose to revert the use of the lib/percpu_counter API for frag
memory accounting for several reasons:

1) On systems with CPUs > 24, the heavier fully locked
   __percpu_counter_sum() is always invoked, which will be more
   expensive than the atomic_t that is reverted to.

Given systems with more than 24 CPUs are becoming common this doesn't
seem like a good option.  To mitigate this, the batch size could be
decreased and thresh be increased.

2) The add_frag_mem_limit+sub_frag_mem_limit pairs happen on the RX
   CPU, before SKBs are pushed into sockets on remote CPUs.  Given
   NICs can only hash on L2 part of the IP-header, the NIC-RXq's will
   likely be limited.  Thus, a fair chance that atomic add+dec happen
   on the same CPU.

Revert note that commit 1d6119baf061 ("net: fix percpu memory leaks")
removed init_frag_mem_limit() and instead use inet_frags_init_net().
After this revert, inet_frags_uninit_net() becomes empty.

Fixes: 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting")
Fixes: 1d6119baf061 ("net: fix percpu memory leaks")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/net/inet_frag.h  |   30 +++++++++---------------------
 net/ipv4/inet_fragment.c |    4 +---
 2 files changed, 10 insertions(+), 24 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 6fdcd2427776..fa635aa6d0b9 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -1,14 +1,9 @@
 #ifndef __NET_FRAG_H__
 #define __NET_FRAG_H__

-#include <linux/percpu_counter.h>
-
 struct netns_frags {
-	/* The percpu_counter "mem" need to be cacheline aligned.
-	 *  mem.count must not share cacheline with other writers
-	 */
-	struct percpu_counter   mem ____cacheline_aligned_in_smp;
-
+	/* Keep atomic mem on separate cachelines in structs that include it */
+	atomic_t		mem ____cacheline_aligned_in_smp;
 	/* sysctls */
 	int			timeout;
 	int			high_thresh;
@@ -110,11 +105,11 @@ void inet_frags_fini(struct inet_frags *);

 static inline int inet_frags_init_net(struct netns_frags *nf)
 {
-	return percpu_counter_init(&nf->mem, 0, GFP_KERNEL);
+	atomic_set(&nf->mem, 0);
+	return 0;
 }
 static inline void inet_frags_uninit_net(struct netns_frags *nf)
 {
-	percpu_counter_destroy(&nf->mem);
 }

 void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f);
@@ -140,31 +135,24 @@ static inline bool inet_frag_evicting(struct inet_frag_queue *q)

 /* Memory Tracking Functions. */

-/* The default percpu_counter batch size is not big enough to scale to
- * fragmentation mem acct sizes.
- * The mem size of a 64K fragment is approx:
- *  (44 fragments * 2944 truesize) + frag_queue struct(200) = 129736 bytes
- */
-static unsigned int frag_percpu_counter_batch = 130000;
-
 static inline int frag_mem_limit(struct netns_frags *nf)
 {
-	return percpu_counter_read(&nf->mem);
+	return atomic_read(&nf->mem);
 }

 static inline void sub_frag_mem_limit(struct netns_frags *nf, int i)
 {
-	percpu_counter_add_batch(&nf->mem, -i, frag_percpu_counter_batch);
+	atomic_sub(i, &nf->mem);
 }

 static inline void add_frag_mem_limit(struct netns_frags *nf, int i)
 {
-	percpu_counter_add_batch(&nf->mem, i, frag_percpu_counter_batch);
+	atomic_add(i, &nf->mem);
 }

-static inline unsigned int sum_frag_mem_limit(struct netns_frags *nf)
+static inline int sum_frag_mem_limit(struct netns_frags *nf)
 {
-	return percpu_counter_sum_positive(&nf->mem);
+	return atomic_read(&nf->mem);
 }

 /* RFC 3168 support :
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 96e95e83cc61..af74d0433453 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -234,10 +234,8 @@ void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f)
 	cond_resched();

 	if (read_seqretry(&f->rnd_seqlock, seq) ||
-	    percpu_counter_sum(&nf->mem))
+	    sum_frag_mem_limit(nf))
 		goto evict_again;
-
-	percpu_counter_destroy(&nf->mem);
 }
 EXPORT_SYMBOL(inet_frags_exit_net);

^ permalink raw reply related

* [PATCH net 0/2] net: revert lib/percpu_counter API for fragmentation mem accounting
From: Jesper Dangaard Brouer @ 2017-09-01  9:26 UTC (permalink / raw)
  To: netdev; +Cc: mkubecek, Florian Westphal, liujian56, Jesper Dangaard Brouer

There is a bug in fragmentation codes use of the percpu_counter API,
that can cause issues on systems with many CPUs, above 24 CPUs.

After much consideration and different attempts at solving the API
usage.  The conclusion is to revert to the simple atomic_t API instead.

The ratio between batch size and threshold size make it a bad use-case
for the lib/percpu_counter API.  As using the correct API calls will
unfortunately cause systems with many CPUs to always execute an
expensive sum across all CPUs. Plus the added complexity is not worth it.

---

Jesper Dangaard Brouer (2):
      Revert "net: use lib/percpu_counter API for fragmentation mem accounting"
      Revert "net: fix percpu memory leaks"

 include/net/inet_frag.h                 |   35 ++++++++-----------------------
 net/ieee802154/6lowpan/reassembly.c     |   11 +++-------
 net/ipv4/inet_fragment.c                |    4 +---
 net/ipv4/ip_fragment.c                  |   12 +++--------
 net/ipv6/netfilter/nf_conntrack_reasm.c |   12 +++--------
 net/ipv6/reassembly.c                   |   12 +++--------
 6 files changed, 22 insertions(+), 64 deletions(-)

^ permalink raw reply

* [PATCH net] bridge: switchdev: Clear forward mark when transmitting packet
From: Ido Schimmel @ 2017-09-01  9:22 UTC (permalink / raw)
  To: netdev; +Cc: davem, stephen, nikolay, jiri, yotamg, mlxsw, bridge,
	Ido Schimmel

Commit 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for
stacked devices") added the 'offload_fwd_mark' bit to the skb in order
to allow drivers to indicate to the bridge driver that they already
forwarded the packet in L2.

In case the bit is set, before transmitting the packet from each port,
the port's mark is compared with the mark stored in the skb's control
block. If both marks are equal, we know the packet arrived from a switch
device that already forwarded the packet and it's not re-transmitted.

However, if the packet is transmitted from the bridge device itself
(e.g., br0), we should clear the 'offload_fwd_mark' bit as the mark
stored in the skb's control block isn't valid.

This scenario can happen in rare cases where a packet was trapped during
L3 forwarding and forwarded by the kernel to a bridge device.

Fixes: 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Yotam Gigi <yotamg@mellanox.com>
Tested-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 net/bridge/br_device.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 861ae2a165f4..5a7be3bddfa9 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -53,6 +53,9 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 	brstats->tx_bytes += skb->len;
 	u64_stats_update_end(&brstats->syncp);

+#ifdef CONFIG_NET_SWITCHDEV
+	skb->offload_fwd_mark = 0;
+#endif
 	BR_INPUT_SKB_CB(skb)->brdev = dev;

 	skb_reset_mac_header(skb);
-- 
2.13.5

^ permalink raw reply related

* [PATCH] net: phy: broadcom: force master mode for BCM54210E and B50212E
From: Rafał Miłecki @ 2017-09-01  9:21 UTC (permalink / raw)
  To: David S . Miller, netdev
  Cc: Andrew Lunn, Florian Fainelli, Hauke Mehrtens,
	bcm-kernel-feedback-list, Rafał Miłecki

From: Rafał Miłecki <rafal@milecki.pl>

First of all let me explain that the code we use for BCM54210E is also
executed for the B50212E. They are very similar so it probably makes
sense but it may be worth noting. The IDs are:
0x600d84a1: BCM54210E (rev B0)
0x600d84a2: BCM54210E (rev B1)
0x600d84a5: B50212E (rev B0)
0x600d84a6: B50212E (rev B1)

I got a report that a board with BCM47189 SoC and B50212E B1 PHY doesn't
work well with Intel's I217-LM and I218-LM:
http://ark.intel.com/products/60019/Intel-Ethernet-Connection-I217-LM
http://ark.intel.com/products/71307/Intel-Ethernet-Connection-I218-LM
I was told there are massive ping loss.

A solution to this problem is setting master mode in the 1000BASE-T
register. I noticed a similar fix is present in the tg3 driver. One
thing I'm not sure if this is needed for BCM54210E. It shouldn't hurt
however since both are so similar.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
---
David: I'm not 100% sure if this is the best fix, so let's give others
(Florian?) a moment to look at it / review it, please.
---
 drivers/net/phy/broadcom.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/phy/broadcom.c b/drivers/net/phy/broadcom.c
index 1e9ad30a35c8..2569db0923b0 100644
--- a/drivers/net/phy/broadcom.c
+++ b/drivers/net/phy/broadcom.c
@@ -43,6 +43,10 @@ static int bcm54210e_config_init(struct phy_device *phydev)
 	val &= ~BCM54810_SHD_CLK_CTL_GTXCLK_EN;
 	bcm_phy_write_shadow(phydev, BCM54810_SHD_CLK_CTL, val);

+	val = phy_read(phydev, MII_CTRL1000);
+	val |= CTL1000_AS_MASTER | CTL1000_ENABLE_MASTER;
+	phy_write(phydev, MII_CTRL1000, val);
+
 	return 0;
 }

-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH net-next 2/3] bpf: Inline LRU map lookup
From: Daniel Borkmann @ 2017-09-01  9:18 UTC (permalink / raw)
  To: Martin KaFai Lau, netdev; +Cc: Alexei Starovoitov, kernel-team
In-Reply-To: <20170901062713.1842249-3-kafai@fb.com>

On 09/01/2017 08:27 AM, Martin KaFai Lau wrote:
> Inline the lru map lookup to save the cost in making calls to
> bpf_map_lookup_elem() and htab_lru_map_lookup_elem().
>
> Different LRU hash size is tested.  The benefit diminishes when
> the cache miss starts to dominate in the bigger LRU hash.
> Considering the change is simple, it is still worth to optimize.
>
> First column: Size of the LRU hash
> Second column: Number of lookups/s
>
> Before:
>> for i in $(seq 9 20); do echo "$((2**i+1)): $(./map_perf_test 1024 1 $((2**i+1)) 10000000 | awk '{print $3}')"; done
> 513: 1132020
> 1025: 1056826
> 2049: 1007024
> 4097: 853298
> 8193: 742723
> 16385: 712600
> 32769: 688142
> 65537: 677028
> 131073: 619437
> 262145: 498770
> 524289: 316695
> 1048577: 260038
>
> After:
>> for i in $(seq 9 20); do echo "$((2**i+1)): $(./map_perf_test 1024 1 $((2**i+1)) 10000000 | awk '{print $3}')"; done
> 513: 1221851
> 1025: 1144695
> 2049: 1049902
> 4097: 884460
> 8193: 773731
> 16385: 729673
> 32769: 721989
> 65537: 715530
> 131073: 671665
> 262145: 516987
> 524289: 321125
> 1048577: 260048
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* [PATCH net-next] net: Add module reference to FIB notifiers
From: Ido Schimmel @ 2017-09-01  9:15 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, dsahern, mlxsw, Ido Schimmel

When a listener registers to the FIB notification chain it receives a
dump of the FIB entries and rules from existing address families by
invoking their dump operations.

While we call into these modules we need to make sure they aren't
removed. Do that by increasing their reference count before invoking
their dump operations and decrease it afterwards.

Fixes: 04b1d4e50e82 ("net: core: Make the FIB notification chain generic")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/fib_notifier.h |  2 ++
 net/core/fib_notifier.c    | 13 +++++++++++--
 net/ipv4/fib_notifier.c    |  2 ++
 net/ipv6/fib6_notifier.c   |  2 ++
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/net/fib_notifier.h b/include/net/fib_notifier.h
index 241475224f74..669b9716dc7a 100644
--- a/include/net/fib_notifier.h
+++ b/include/net/fib_notifier.h
@@ -2,6 +2,7 @@
 #define __NET_FIB_NOTIFIER_H
 
 #include <linux/types.h>
+#include <linux/module.h>
 #include <linux/notifier.h>
 #include <net/net_namespace.h>
 
@@ -26,6 +27,7 @@ struct fib_notifier_ops {
 	struct list_head list;
 	unsigned int (*fib_seq_read)(struct net *net);
 	int (*fib_dump)(struct net *net, struct notifier_block *nb);
+	struct module *owner;
 	struct rcu_head rcu;
 };
 
diff --git a/net/core/fib_notifier.c b/net/core/fib_notifier.c
index 292aab83702f..4fc202dbdfb6 100644
--- a/net/core/fib_notifier.c
+++ b/net/core/fib_notifier.c
@@ -2,6 +2,7 @@
 #include <linux/notifier.h>
 #include <linux/rcupdate.h>
 #include <linux/kernel.h>
+#include <linux/module.h>
 #include <linux/init.h>
 #include <net/net_namespace.h>
 #include <net/fib_notifier.h>
@@ -33,8 +34,12 @@ static unsigned int fib_seq_sum(void)
 
 	rtnl_lock();
 	for_each_net(net) {
-		list_for_each_entry(ops, &net->fib_notifier_ops, list)
+		list_for_each_entry(ops, &net->fib_notifier_ops, list) {
+			if (!try_module_get(ops->owner))
+				continue;
 			fib_seq += ops->fib_seq_read(net);
+			module_put(ops->owner);
+		}
 	}
 	rtnl_unlock();
 
@@ -46,8 +51,12 @@ static int fib_net_dump(struct net *net, struct notifier_block *nb)
 	struct fib_notifier_ops *ops;
 
 	list_for_each_entry_rcu(ops, &net->fib_notifier_ops, list) {
-		int err = ops->fib_dump(net, nb);
+		int err;
 
+		if (!try_module_get(ops->owner))
+			continue;
+		err = ops->fib_dump(net, nb);
+		module_put(ops->owner);
 		if (err)
 			return err;
 	}
diff --git a/net/ipv4/fib_notifier.c b/net/ipv4/fib_notifier.c
index 5d7afb145562..cfd420b0572c 100644
--- a/net/ipv4/fib_notifier.c
+++ b/net/ipv4/fib_notifier.c
@@ -2,6 +2,7 @@
 #include <linux/notifier.h>
 #include <linux/socket.h>
 #include <linux/kernel.h>
+#include <linux/export.h>
 #include <net/net_namespace.h>
 #include <net/fib_notifier.h>
 #include <net/netns/ipv4.h>
@@ -49,6 +50,7 @@ static const struct fib_notifier_ops fib4_notifier_ops_template = {
 	.family		= AF_INET,
 	.fib_seq_read	= fib4_seq_read,
 	.fib_dump	= fib4_dump,
+	.owner		= THIS_MODULE,
 };
 
 int __net_init fib4_notifier_init(struct net *net)
diff --git a/net/ipv6/fib6_notifier.c b/net/ipv6/fib6_notifier.c
index 66a103ef7e86..05f82baaa99e 100644
--- a/net/ipv6/fib6_notifier.c
+++ b/net/ipv6/fib6_notifier.c
@@ -1,6 +1,7 @@
 #include <linux/notifier.h>
 #include <linux/socket.h>
 #include <linux/kernel.h>
+#include <linux/export.h>
 #include <net/net_namespace.h>
 #include <net/fib_notifier.h>
 #include <net/netns/ipv6.h>
@@ -41,6 +42,7 @@ static const struct fib_notifier_ops fib6_notifier_ops_template = {
 	.family		= AF_INET6,
 	.fib_seq_read	= fib6_seq_read,
 	.fib_dump	= fib6_dump,
+	.owner		= THIS_MODULE,
 };
 
 int __net_init fib6_notifier_init(struct net *net)
-- 
2.13.5

^ permalink raw reply related

* Re: [PATCH net-next 1/3] bpf: Add lru_hash_lookup performance test
From: Daniel Borkmann @ 2017-09-01  9:12 UTC (permalink / raw)
  To: Martin KaFai Lau, netdev; +Cc: Alexei Starovoitov, kernel-team
In-Reply-To: <20170901062713.1842249-2-kafai@fb.com>

On 09/01/2017 08:27 AM, Martin KaFai Lau wrote:
> Create a new case to test the LRU lookup performance.
>
> At the beginning, the LRU map is fully loaded (i.e. the number of keys
> is equal to map->max_entries).   The lookup is done through key 0
> to num_map_entries and then repeats from 0 again.
>
> This patch also creates an anonymous struct to properly
> name the test params in stress_lru_hmap_alloc() in map_perf_test_kern.c.
>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* [PATCH net-next v2 3/4] net: mvpp2: use the GoP interrupt for link status changes
From: Antoine Tenart @ 2017-09-01  9:04 UTC (permalink / raw)
  To: davem
  Cc: Antoine Tenart, andrew, gregory.clement, thomas.petazzoni, nadavh,
	linux, linux-kernel, mw, stefanc, miquel.raynal, netdev
In-Reply-To: <20170901090455.32316-1-antoine.tenart@free-electrons.com>

This patch adds the GoP link interrupt support for when a port isn't
connected to a PHY. Because of this the phylib callback is never called
and the link status management isn't done. This patch use the GoP link
interrupt in such cases to still have a minimal link management. Without
this patch ports not connected to a PHY cannot work.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Tested-by: Marcin Wojtas <mw@semihalf.com>
---
 drivers/net/ethernet/marvell/mvpp2.c | 177 ++++++++++++++++++++++++++++++++++-
 1 file changed, 172 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index b4d4759ddc25..dd0ee2691c86 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -348,16 +348,24 @@
 #define     MVPP2_GMAC_FLOW_CTRL_AUTONEG	BIT(11)
 #define     MVPP2_GMAC_CONFIG_FULL_DUPLEX	BIT(12)
 #define     MVPP2_GMAC_AN_DUPLEX_EN		BIT(13)
+#define MVPP2_GMAC_STATUS0			0x10
+#define     MVPP2_GMAC_STATUS0_LINK_UP		BIT(0)
 #define MVPP2_GMAC_PORT_FIFO_CFG_1_REG		0x1c
 #define     MVPP2_GMAC_TX_FIFO_MIN_TH_OFFS	6
 #define     MVPP2_GMAC_TX_FIFO_MIN_TH_ALL_MASK	0x1fc0
 #define     MVPP2_GMAC_TX_FIFO_MIN_TH_MASK(v)	(((v) << 6) & \
 					MVPP2_GMAC_TX_FIFO_MIN_TH_ALL_MASK)
+#define MVPP22_GMAC_INT_STAT			0x20
+#define     MVPP22_GMAC_INT_STAT_LINK		BIT(1)
+#define MVPP22_GMAC_INT_MASK			0x24
+#define     MVPP22_GMAC_INT_MASK_LINK_STAT	BIT(1)
 #define MVPP22_GMAC_CTRL_4_REG			0x90
 #define     MVPP22_CTRL4_EXT_PIN_GMII_SEL	BIT(0)
 #define     MVPP22_CTRL4_DP_CLK_SEL		BIT(5)
 #define     MVPP22_CTRL4_SYNC_BYPASS_DIS	BIT(6)
 #define     MVPP22_CTRL4_QSGMII_BYPASS_ACTIVE	BIT(7)
+#define MVPP22_GMAC_INT_SUM_MASK		0xa4
+#define     MVPP22_GMAC_INT_SUM_MASK_LINK_STAT	BIT(1)
 
 /* Per-port XGMAC registers. PPv2.2 only, only for GOP port 0,
  * relative to port->base.
@@ -370,11 +378,19 @@
 #define MVPP22_XLG_CTRL1_REG			0x104
 #define     MVPP22_XLG_CTRL1_FRAMESIZELIMIT_OFFS	0
 #define     MVPP22_XLG_CTRL1_FRAMESIZELIMIT_MASK	0x1fff
+#define MVPP22_XLG_STATUS			0x10c
+#define     MVPP22_XLG_STATUS_LINK_UP		BIT(0)
+#define MVPP22_XLG_INT_STAT			0x114
+#define     MVPP22_XLG_INT_STAT_LINK		BIT(1)
+#define MVPP22_XLG_INT_MASK			0x118
+#define     MVPP22_XLG_INT_MASK_LINK		BIT(1)
 #define MVPP22_XLG_CTRL3_REG			0x11c
 #define     MVPP22_XLG_CTRL3_MACMODESELECT_MASK	(7 << 13)
 #define     MVPP22_XLG_CTRL3_MACMODESELECT_GMAC	(0 << 13)
 #define     MVPP22_XLG_CTRL3_MACMODESELECT_10G	(1 << 13)
-
+#define MVPP22_XLG_EXT_INT_MASK			0x15c
+#define     MVPP22_XLG_EXT_INT_MASK_XLG		BIT(1)
+#define     MVPP22_XLG_EXT_INT_MASK_GIG		BIT(2)
 #define MVPP22_XLG_CTRL4_REG			0x184
 #define     MVPP22_XLG_CTRL4_FWD_FC		BIT(5)
 #define     MVPP22_XLG_CTRL4_FWD_PFC		BIT(6)
@@ -837,6 +853,8 @@ struct mvpp2_port {
 	 */
 	int gop_id;
 
+	int link_irq;
+
 	struct mvpp2 *priv;
 
 	/* Per-port registers' base address */
@@ -4422,6 +4440,68 @@ static int mvpp22_gop_init(struct mvpp2_port *port)
 	return -EINVAL;
 }
 
+static void mvpp22_gop_unmask_irq(struct mvpp2_port *port)
+{
+	u32 val;
+
+	if (phy_interface_mode_is_rgmii(port->phy_interface) ||
+	    port->phy_interface == PHY_INTERFACE_MODE_SGMII) {
+		/* Enable the GMAC link status irq for this port */
+		val = readl(port->base + MVPP22_GMAC_INT_SUM_MASK);
+		val |= MVPP22_GMAC_INT_SUM_MASK_LINK_STAT;
+		writel(val, port->base + MVPP22_GMAC_INT_SUM_MASK);
+	}
+
+	if (port->gop_id == 0) {
+		/* Enable the XLG/GIG irqs for this port */
+		val = readl(port->base + MVPP22_XLG_EXT_INT_MASK);
+		if (port->phy_interface == PHY_INTERFACE_MODE_10GKR)
+			val |= MVPP22_XLG_EXT_INT_MASK_XLG;
+		else
+			val |= MVPP22_XLG_EXT_INT_MASK_GIG;
+		writel(val, port->base + MVPP22_XLG_EXT_INT_MASK);
+	}
+}
+
+static void mvpp22_gop_mask_irq(struct mvpp2_port *port)
+{
+	u32 val;
+
+	if (port->gop_id == 0) {
+		val = readl(port->base + MVPP22_XLG_EXT_INT_MASK);
+		val &= ~(MVPP22_XLG_EXT_INT_MASK_XLG |
+		         MVPP22_XLG_EXT_INT_MASK_GIG);
+		writel(val, port->base + MVPP22_XLG_EXT_INT_MASK);
+	}
+
+	if (phy_interface_mode_is_rgmii(port->phy_interface) ||
+	    port->phy_interface == PHY_INTERFACE_MODE_SGMII) {
+		val = readl(port->base + MVPP22_GMAC_INT_SUM_MASK);
+		val &= ~MVPP22_GMAC_INT_SUM_MASK_LINK_STAT;
+		writel(val, port->base + MVPP22_GMAC_INT_SUM_MASK);
+	}
+}
+
+static void mvpp22_gop_setup_irq(struct mvpp2_port *port)
+{
+	u32 val;
+
+	if (phy_interface_mode_is_rgmii(port->phy_interface) ||
+	    port->phy_interface == PHY_INTERFACE_MODE_SGMII) {
+		val = readl(port->base + MVPP22_GMAC_INT_MASK);
+		val |= MVPP22_GMAC_INT_MASK_LINK_STAT;
+		writel(val, port->base + MVPP22_GMAC_INT_MASK);
+	}
+
+	if (port->gop_id == 0) {
+		val = readl(port->base + MVPP22_XLG_INT_MASK);
+		val |= MVPP22_XLG_INT_MASK_LINK;
+		writel(val, port->base + MVPP22_XLG_INT_MASK);
+	}
+
+	mvpp22_gop_unmask_irq(port);
+}
+
 static int mvpp22_comphy_init(struct mvpp2_port *port)
 {
 	enum phy_mode mode;
@@ -5726,6 +5806,60 @@ static irqreturn_t mvpp2_isr(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+/* Per-port interrupt for link status changes */
+static irqreturn_t mvpp2_link_status_isr(int irq, void *dev_id)
+{
+	struct mvpp2_port *port = (struct mvpp2_port *)dev_id;
+	struct net_device *dev = port->dev;
+	bool event = false, link = false;
+	u32 val;
+
+	mvpp22_gop_mask_irq(port);
+
+	if (port->gop_id == 0 &&
+	    port->phy_interface == PHY_INTERFACE_MODE_10GKR) {
+		val = readl(port->base + MVPP22_XLG_INT_STAT);
+		if (val & MVPP22_XLG_INT_STAT_LINK) {
+			event = true;
+			val = readl(port->base + MVPP22_XLG_STATUS);
+			if (val & MVPP22_XLG_STATUS_LINK_UP)
+				link = true;
+		}
+	} else if (phy_interface_mode_is_rgmii(port->phy_interface) ||
+		   port->phy_interface == PHY_INTERFACE_MODE_SGMII) {
+		val = readl(port->base + MVPP22_GMAC_INT_STAT);
+		if (val & MVPP22_GMAC_INT_STAT_LINK) {
+			event = true;
+			val = readl(port->base + MVPP2_GMAC_STATUS0);
+			if (val & MVPP2_GMAC_STATUS0_LINK_UP)
+				link = true;
+		}
+	}
+
+	if (!netif_running(dev) || !event)
+		goto handled;
+
+	if (link) {
+		mvpp2_interrupts_enable(port);
+
+		mvpp2_egress_enable(port);
+		mvpp2_ingress_enable(port);
+		netif_carrier_on(dev);
+		netif_tx_wake_all_queues(dev);
+	} else {
+		netif_tx_stop_all_queues(dev);
+		netif_carrier_off(dev);
+		mvpp2_ingress_disable(port);
+		mvpp2_egress_disable(port);
+
+		mvpp2_interrupts_disable(port);
+	}
+
+handled:
+	mvpp22_gop_unmask_irq(port);
+	return IRQ_HANDLED;
+}
+
 static void mvpp2_gmac_set_autoneg(struct mvpp2_port *port,
 				   struct phy_device *phydev)
 {
@@ -5754,7 +5888,6 @@ static void mvpp2_gmac_set_autoneg(struct mvpp2_port *port,
 		val |= MVPP2_GMAC_CONFIG_MII_SPEED;
 
 	writel(val, port->base + MVPP2_GMAC_AUTONEG_CONFIG);
-
 }
 
 /* Adjust link */
@@ -6633,6 +6766,7 @@ static void mvpp2_irqs_deinit(struct mvpp2_port *port)
 static int mvpp2_open(struct net_device *dev)
 {
 	struct mvpp2_port *port = netdev_priv(dev);
+	struct mvpp2 *priv = port->priv;
 	unsigned char mac_bcast[ETH_ALEN] = {
 			0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
 	int err;
@@ -6678,12 +6812,24 @@ static int mvpp2_open(struct net_device *dev)
 		goto err_cleanup_txqs;
 	}
 
+	if (priv->hw_version == MVPP22 && !port->phy_node && port->link_irq) {
+		err = request_irq(port->link_irq, mvpp2_link_status_isr, 0,
+				  dev->name, port);
+		if (err) {
+			netdev_err(port->dev, "cannot request link IRQ %d\n",
+				   port->link_irq);
+			goto err_free_irq;
+		}
+
+		mvpp22_gop_setup_irq(port);
+	}
+
 	/* In default link is down */
 	netif_carrier_off(port->dev);
 
 	err = mvpp2_phy_connect(port);
 	if (err < 0)
-		goto err_free_irq;
+		goto err_free_link_irq;
 
 	/* Unmask interrupts on all CPUs */
 	on_each_cpu(mvpp2_interrupts_unmask, port, 1);
@@ -6693,6 +6839,9 @@ static int mvpp2_open(struct net_device *dev)
 
 	return 0;
 
+err_free_link_irq:
+	if (priv->hw_version == MVPP22 && !port->phy_node && port->link_irq)
+		free_irq(port->link_irq, port);
 err_free_irq:
 	mvpp2_irqs_deinit(port);
 err_cleanup_txqs:
@@ -6706,6 +6855,7 @@ static int mvpp2_stop(struct net_device *dev)
 {
 	struct mvpp2_port *port = netdev_priv(dev);
 	struct mvpp2_port_pcpu *port_pcpu;
+	struct mvpp2 *priv = port->priv;
 	int cpu;
 
 	mvpp2_stop_dev(port);
@@ -6715,6 +6865,9 @@ static int mvpp2_stop(struct net_device *dev)
 	on_each_cpu(mvpp2_interrupts_mask, port, 1);
 	mvpp2_shared_interrupt_mask_unmask(port, true);
 
+	if (priv->hw_version == MVPP22 && !port->phy_node && port->link_irq)
+		free_irq(port->link_irq, port);
+
 	mvpp2_irqs_deinit(port);
 	if (!port->has_tx_irqs) {
 		for_each_present_cpu(cpu) {
@@ -7413,6 +7566,15 @@ static int mvpp2_port_probe(struct platform_device *pdev,
 	if (err)
 		goto err_free_netdev;
 
+	port->link_irq = of_irq_get_byname(port_node, "link");
+	if (port->link_irq == -EPROBE_DEFER) {
+		err = -EPROBE_DEFER;
+		goto err_deinit_qvecs;
+	}
+	if (port->link_irq <= 0)
+		/* the link irq is optional */
+		port->link_irq = 0;
+
 	if (of_property_read_bool(port_node, "marvell,loopback"))
 		port->flags |= MVPP2_F_LOOPBACK;
 
@@ -7431,7 +7593,7 @@ static int mvpp2_port_probe(struct platform_device *pdev,
 		port->base = devm_ioremap_resource(&pdev->dev, res);
 		if (IS_ERR(port->base)) {
 			err = PTR_ERR(port->base);
-			goto err_deinit_qvecs;
+			goto err_free_irq;
 		}
 	} else {
 		if (of_property_read_u32(port_node, "gop-port-id",
@@ -7448,7 +7610,7 @@ static int mvpp2_port_probe(struct platform_device *pdev,
 	port->stats = netdev_alloc_pcpu_stats(struct mvpp2_pcpu_stats);
 	if (!port->stats) {
 		err = -ENOMEM;
-		goto err_deinit_qvecs;
+		goto err_free_irq;
 	}
 
 	mvpp2_port_copy_mac_addr(dev, priv, port_node, &mac_from);
@@ -7518,6 +7680,9 @@ static int mvpp2_port_probe(struct platform_device *pdev,
 		free_percpu(port->txqs[i]->pcpu);
 err_free_stats:
 	free_percpu(port->stats);
+err_free_irq:
+	if (port->link_irq)
+		irq_dispose_mapping(port->link_irq);
 err_deinit_qvecs:
 	mvpp2_queue_vectors_deinit(port);
 err_free_netdev:
@@ -7538,6 +7703,8 @@ static void mvpp2_port_remove(struct mvpp2_port *port)
 	for (i = 0; i < port->ntxqs; i++)
 		free_percpu(port->txqs[i]->pcpu);
 	mvpp2_queue_vectors_deinit(port);
+	if (port->link_irq)
+		irq_dispose_mapping(port->link_irq);
 	free_netdev(port->dev);
 }
 
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next v2 1/4] net: mvpp2: take advantage of the is_rgmii helper
From: Antoine Tenart @ 2017-09-01  9:04 UTC (permalink / raw)
  To: davem
  Cc: Antoine Tenart, andrew, gregory.clement, thomas.petazzoni, nadavh,
	linux, linux-kernel, mw, stefanc, miquel.raynal, netdev
In-Reply-To: <20170901090455.32316-1-antoine.tenart@free-electrons.com>

Convert all RGMII checks to use the phy_interface_mode_is_rgmii()
helper. This is a cosmetic patch.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 drivers/net/ethernet/marvell/mvpp2.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index 9e64b1ba3d43..f93d3a332c60 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -4463,10 +4463,7 @@ static void mvpp2_port_mii_gmac_configure_mode(struct mvpp2_port *port)
 		val |= MVPP2_GMAC_DISABLE_PADDING;
 		val &= ~MVPP2_GMAC_FLOW_CTRL_MASK;
 		writel(val, port->base + MVPP2_GMAC_CTRL_2_REG);
-	} else if (port->phy_interface == PHY_INTERFACE_MODE_RGMII ||
-		   port->phy_interface == PHY_INTERFACE_MODE_RGMII_ID ||
-		   port->phy_interface == PHY_INTERFACE_MODE_RGMII_RXID ||
-		   port->phy_interface == PHY_INTERFACE_MODE_RGMII_TXID) {
+	} else if (phy_interface_mode_is_rgmii(port->phy_interface)) {
 		val = readl(port->base + MVPP22_GMAC_CTRL_4_REG);
 		val |= MVPP22_CTRL4_EXT_PIN_GMII_SEL |
 		       MVPP22_CTRL4_SYNC_BYPASS_DIS |
@@ -4512,10 +4509,7 @@ static void mvpp2_port_mii_gmac_configure(struct mvpp2_port *port)
 	val = readl(port->base + MVPP2_GMAC_CTRL_2_REG);
 	if (port->phy_interface == PHY_INTERFACE_MODE_SGMII) {
 	        val |= MVPP2_GMAC_INBAND_AN_MASK | MVPP2_GMAC_PCS_ENABLE_MASK;
-	} else if (port->phy_interface == PHY_INTERFACE_MODE_RGMII ||
-		   port->phy_interface == PHY_INTERFACE_MODE_RGMII_ID ||
-		   port->phy_interface == PHY_INTERFACE_MODE_RGMII_RXID ||
-		   port->phy_interface == PHY_INTERFACE_MODE_RGMII_TXID) {
+	} else if (phy_interface_mode_is_rgmii(port->phy_interface)) {
 		val &= ~MVPP2_GMAC_PCS_ENABLE_MASK;
 		val |= MVPP2_GMAC_PORT_RGMII_MASK;
 	}
@@ -4575,10 +4569,7 @@ static void mvpp2_port_mii_set(struct mvpp2_port *port)
 	if (port->priv->hw_version == MVPP22)
 		mvpp22_port_mii_set(port);
 
-	if (port->phy_interface == PHY_INTERFACE_MODE_RGMII ||
-	    port->phy_interface == PHY_INTERFACE_MODE_RGMII_ID ||
-	    port->phy_interface == PHY_INTERFACE_MODE_RGMII_RXID ||
-	    port->phy_interface == PHY_INTERFACE_MODE_RGMII_TXID ||
+	if (phy_interface_mode_is_rgmii(port->phy_interface) ||
 	    port->phy_interface == PHY_INTERFACE_MODE_SGMII)
 		mvpp2_port_mii_gmac_configure(port);
 	else if (port->phy_interface == PHY_INTERFACE_MODE_10GKR)
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next v2 4/4] Documentation/bindings: net: marvell-pp2: add the link interrupt
From: Antoine Tenart @ 2017-09-01  9:04 UTC (permalink / raw)
  To: davem
  Cc: Antoine Tenart, andrew, gregory.clement, thomas.petazzoni, nadavh,
	linux, linux-kernel, mw, stefanc, miquel.raynal, netdev
In-Reply-To: <20170901090455.32316-1-antoine.tenart@free-electrons.com>

A link interrupt can be described. Document this valid interrupt name.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Tested-by: Marcin Wojtas <mw@semihalf.com>
---
 Documentation/devicetree/bindings/net/marvell-pp2.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/marvell-pp2.txt b/Documentation/devicetree/bindings/net/marvell-pp2.txt
index 49484db81583..c78f3187dfea 100644
--- a/Documentation/devicetree/bindings/net/marvell-pp2.txt
+++ b/Documentation/devicetree/bindings/net/marvell-pp2.txt
@@ -44,7 +44,7 @@ Optional properties (port):
 - interrupt-names: if more than a single interrupt for rx is given, must
                    be the name associated to the interrupts listed. Valid
                    names are: "tx-cpu0", "tx-cpu1", "tx-cpu2", "tx-cpu3",
-		   "rx-shared".
+		   "rx-shared", "link".
 - marvell,system-controller: a phandle to the system controller.
 
 Example for marvell,armada-375-pp2:
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next v2 2/4] net: mvpp2: make the phy optional
From: Antoine Tenart @ 2017-09-01  9:04 UTC (permalink / raw)
  To: davem
  Cc: Antoine Tenart, andrew, gregory.clement, thomas.petazzoni, nadavh,
	linux, linux-kernel, mw, stefanc, miquel.raynal, netdev
In-Reply-To: <20170901090455.32316-1-antoine.tenart@free-electrons.com>

There is not necessarily a PHY between the GoP and the physical port.
However, the driver currently makes the "phy" property mandatory,
contrary to what is stated in the device tree bindings. This patch makes
the PHY optional, and aligns the PPv2 driver on its device tree
documentation. However if a PHY is provided, the GoP link interrupt
won't be used.

With this patch switches directly connected to the serdes lanes and SFP
ports on the Armada 8040-db and Armada 7040-db can be used if the link
interrupt is described in the device tree.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Tested-by: Marcin Wojtas <mw@semihalf.com>
---
 drivers/net/ethernet/marvell/mvpp2.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index f93d3a332c60..b4d4759ddc25 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -6475,7 +6475,8 @@ static void mvpp2_start_dev(struct mvpp2_port *port)
 
 	mvpp2_port_mii_set(port);
 	mvpp2_port_enable(port);
-	phy_start(ndev->phydev);
+	if (ndev->phydev)
+		phy_start(ndev->phydev);
 	netif_tx_start_all_queues(port->dev);
 }
 
@@ -6501,7 +6502,8 @@ static void mvpp2_stop_dev(struct mvpp2_port *port)
 
 	mvpp2_egress_disable(port);
 	mvpp2_port_disable(port);
-	phy_stop(ndev->phydev);
+	if (ndev->phydev)
+		phy_stop(ndev->phydev);
 	phy_power_off(port->comphy);
 }
 
@@ -6558,6 +6560,10 @@ static int mvpp2_phy_connect(struct mvpp2_port *port)
 {
 	struct phy_device *phy_dev;
 
+	/* No PHY is attached */
+	if (!port->phy_node)
+		return 0;
+
 	phy_dev = of_phy_connect(port->dev, port->phy_node, mvpp2_link_event, 0,
 				 port->phy_interface);
 	if (!phy_dev) {
@@ -6578,6 +6584,9 @@ static void mvpp2_phy_disconnect(struct mvpp2_port *port)
 {
 	struct net_device *ndev = port->dev;
 
+	if (!ndev->phydev)
+		return;
+
 	phy_disconnect(ndev->phydev);
 }
 
@@ -7366,12 +7375,6 @@ static int mvpp2_port_probe(struct platform_device *pdev,
 		return -ENOMEM;
 
 	phy_node = of_parse_phandle(port_node, "phy", 0);
-	if (!phy_node) {
-		dev_err(&pdev->dev, "missing phy\n");
-		err = -ENODEV;
-		goto err_free_netdev;
-	}
-
 	phy_mode = of_get_phy_mode(port_node);
 	if (phy_mode < 0) {
 		dev_err(&pdev->dev, "incorrect phy mode\n");
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next v2 0/4] net: mvpp2: optional PHYs and GoP link irq
From: Antoine Tenart @ 2017-09-01  9:04 UTC (permalink / raw)
  To: davem
  Cc: Antoine Tenart, andrew, gregory.clement, thomas.petazzoni, nadavh,
	linux, linux-kernel, mw, stefanc, miquel.raynal, netdev

Hi all,

This series aims at making the driver work when no PHY is connected
between a port and the physical layer and not described as a fixed-phy.
This is useful for some usecases such as when a switch is connected
directly to the serdes lanes. It can also be used for SFP ports on the
7k-db and 8k-db while waiting for the phylink support to land in (which
should be part of another series).

This series makes the phy optional in the PPv2 driver, and then adds
the support for the GoP port link interrupt to handle link status
changes on such ports.

This was tested using the SFP ports on the 7k-db and 8k-db boards.

Thanks!
Antoine

Since v1:
  - Now use phy_interface_mode_is_rgmii() in the GoP link patch.
  - Added one cosmetic patch to take advantage of phy_interface_mode_is_rgmii()
    in the whole PPv2 driver.

Antoine Tenart (4):
  net: mvpp2: take advantage of the is_rgmii helper
  net: mvpp2: make the phy optional
  net: mvpp2: use the GoP interrupt for link status changes
  Documentation/bindings: net: marvell-pp2: add the link interrupt

 .../devicetree/bindings/net/marvell-pp2.txt        |   2 +-
 drivers/net/ethernet/marvell/mvpp2.c               | 211 ++++++++++++++++++---
 2 files changed, 187 insertions(+), 26 deletions(-)

-- 
2.13.5

^ permalink raw reply

* [PATCH net] vhost_net: correctly check tx avail during rx busy polling
From: Jason Wang @ 2017-09-01  9:02 UTC (permalink / raw)
  To: mst, jasowang; +Cc: kvm, virtualization, netdev, linux-kernel

We check tx avail through vhost_enable_notify() in the past which is
wrong since it only checks whether or not guest has filled more
available buffer since last avail idx synchronization which was just
done by vhost_vq_avail_empty() before. What we really want is checking
pending buffers in the avail ring. Fix this by calling
vhost_vq_avail_empty() instead.

This issue could be noticed by doing netperf TCP_RR benchmark as
client from guest (but not host). With this fix, TCP_RR from guest to
localhost restores from 1375.91 trans per sec to 55235.28 trans per
sec on my laptop (Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz).

Fixes: 030881372460 ("vhost_net: basic polling support")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
- The patch is needed for -stable
---
 drivers/vhost/net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 06d0448..1b68253 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -634,7 +634,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)

 		preempt_enable();

-		if (vhost_enable_notify(&net->dev, vq))
+		if (!vhost_vq_avail_empty(&net->dev, vq))
 			vhost_poll_queue(&vq->poll);
 		mutex_unlock(&vq->mutex);

-- 
2.7.4

^ permalink raw reply related

* [patch net-next 2/2] mlxsw: spectrum_router: Set abort trap in all virtual routers
From: Jiri Pirko @ 2017-09-01  8:58 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, mlxsw
In-Reply-To: <20170901085856.2286-1-jiri@resnulli.us>

From: Ido Schimmel <idosch@mellanox.com>

When the abort mechanism is invoked a default route directing packets to
the CPU is programmed in all the virtual routers currently in use. This
can result in packet loss in case a new VRF is configured.

Upon abort, program the default route in all virtual routers, whether
they are in use or not.

The patch is directed at net-next since post-abort fixes aren't critical
and packet loss due to a missing default route will be insignificant
compared to packet loss caused by the CPU port policer.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 00e3aec..de15eac 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -3993,9 +3993,6 @@ static int __mlxsw_sp_router_set_abort_trap(struct mlxsw_sp *mlxsw_sp,
 		char raltb_pl[MLXSW_REG_RALTB_LEN];
 		char ralue_pl[MLXSW_REG_RALUE_LEN];
 
-		if (!mlxsw_sp_vr_is_used(vr))
-			continue;
-
 		mlxsw_reg_raltb_pack(raltb_pl, vr->id, proto, tree_id);
 		err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(raltb),
 				      raltb_pl);
-- 
2.9.3

^ permalink raw reply related

* [patch net-next 1/2] mlxsw: spectrum_router: Trap packets hitting anycast routes
From: Jiri Pirko @ 2017-09-01  8:58 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, mlxsw
In-Reply-To: <20170901085856.2286-1-jiri@resnulli.us>

From: Ido Schimmel <idosch@mellanox.com>

I relied on the fact that anycast routes use the loopback device as
their nexthop device to trap packets hitting them to the CPU.

After commit 4832c30d5458 ("net: ipv6: put host and anycast routes on
device with address") this is no longer the case and such routes are
programmed with a forward action (note the 'offload' flag):

anycast cafe:: dev enp3s0np7 proto kernel metric 0 offload pref medium

This will prevent the router from locally receiving packets destined to
the Subnet-Router anycast address.

Fix this by specifically programming anycast routes with action trap,
which results in the following output:

anycast cafe:: dev enp3s0np7 proto kernel metric 0 pref medium

Fixes: 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 0cf6810..00e3aec 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -3659,7 +3659,7 @@ static void mlxsw_sp_fib6_entry_type_set(struct mlxsw_sp_fib_entry *fib_entry,
 	 * local, which will cause them to be trapped with a lower
 	 * priority than packets that need to be locally received.
 	 */
-	if (rt->rt6i_flags & RTF_LOCAL)
+	if (rt->rt6i_flags & (RTF_LOCAL | RTF_ANYCAST))
 		fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_TRAP;
 	else if (rt->rt6i_flags & RTF_REJECT)
 		fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_LOCAL;
-- 
2.9.3

^ permalink raw reply related

* [patch net-next 0/2] mlxsw: spectrum_router: Couple of fixes
From: Jiri Pirko @ 2017-09-01  8:58 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, mlxsw

From: Jiri Pirko <jiri@mellanox.com>

Ido Schimmel (2):
  mlxsw: spectrum_router: Trap packets hitting anycast routes
  mlxsw: spectrum_router: Set abort trap in all virtual routers

 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

-- 
2.9.3

^ permalink raw reply

* [patch net] mlxsw: spectrum: Forbid linking to devices that have uppers
From: Jiri Pirko @ 2017-09-01  8:52 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, nogahf, mlxsw

From: Ido Schimmel <idosch@mellanox.com>

The mlxsw driver relies on NETDEV_CHANGEUPPER events to configure the
device in case a port is enslaved to a master netdev such as bridge or
bond.

Since the driver ignores events unrelated to its ports and their
uppers, it's possible to engineer situations in which the device's data
path differs from the kernel's.

One example to such a situation is when a port is enslaved to a bond
that is already enslaved to a bridge. When the bond was enslaved the
driver ignored the event - as the bond wasn't one of its uppers - and
therefore a bridge port instance isn't created in the device.

Until such configurations are supported forbid them by checking that the
upper device doesn't have uppers of its own.

Fixes: 0d65fc13042f ("mlxsw: spectrum: Implement LAG port join/leave")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Nogah Frankel <nogahf@mellanox.com>
Tested-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 6 ++++++
 include/linux/netdevice.h                      | 2 ++
 net/core/dev.c                                 | 3 ++-
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 60bf8f2..c6a3e61b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -4139,6 +4139,8 @@ static int mlxsw_sp_netdevice_port_upper_event(struct net_device *lower_dev,
 			return -EINVAL;
 		if (!info->linking)
 			break;
+		if (netdev_has_any_upper_dev(upper_dev))
+			return -EINVAL;
 		if (netif_is_lag_master(upper_dev) &&
 		    !mlxsw_sp_master_lag_check(mlxsw_sp, upper_dev,
 					       info->upper_info))
@@ -4258,6 +4260,10 @@ static int mlxsw_sp_netdevice_port_vlan_event(struct net_device *vlan_dev,
 		upper_dev = info->upper_dev;
 		if (!netif_is_bridge_master(upper_dev))
 			return -EINVAL;
+		if (!info->linking)
+			break;
+		if (netdev_has_any_upper_dev(upper_dev))
+			return -EINVAL;
 		break;
 	case NETDEV_CHANGEUPPER:
 		upper_dev = info->upper_dev;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 779b235..c99ba79 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3866,6 +3866,8 @@ int netdev_walk_all_upper_dev_rcu(struct net_device *dev,
 bool netdev_has_upper_dev_all_rcu(struct net_device *dev,
 				  struct net_device *upper_dev);
 
+bool netdev_has_any_upper_dev(struct net_device *dev);
+
 void *netdev_lower_get_next_private(struct net_device *dev,
 				    struct list_head **iter);
 void *netdev_lower_get_next_private_rcu(struct net_device *dev,
diff --git a/net/core/dev.c b/net/core/dev.c
index 818dfa6..86b4b0a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5668,12 +5668,13 @@ EXPORT_SYMBOL(netdev_has_upper_dev_all_rcu);
  * Find out if a device is linked to an upper device and return true in case
  * it is. The caller must hold the RTNL lock.
  */
-static bool netdev_has_any_upper_dev(struct net_device *dev)
+bool netdev_has_any_upper_dev(struct net_device *dev)
 {
 	ASSERT_RTNL();
 
 	return !list_empty(&dev->adj_list.upper);
 }
+EXPORT_SYMBOL(netdev_has_any_upper_dev);
 
 /**
  * netdev_master_upper_dev_get - Get master upper device
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH 0/5] net: mdio-mux: Misc fix
From: Corentin Labbe @ 2017-09-01  8:30 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: f.fainelli, netdev, linux-kernel
In-Reply-To: <20170830190119.GK22289@lunn.ch>

On Wed, Aug 30, 2017 at 09:01:19PM +0200, Andrew Lunn wrote:
> On Wed, Aug 30, 2017 at 07:46:46PM +0200, Corentin Labbe wrote:
> > Hello
> > 
> > This patch series fix minor problems found when working on the
> > dwmac-sun8i syscon mdio-mux.
> 
> Hi Corentin
> 
> In geineral, a nice patchset.

Thanks, I will send a v2 with your suggestions.

> 
> Looking at the code, there are a few calls to devm_kfree() which look
> redundant. This one should probably stay:
> 
> 	   	if (r) {
> 			mdiobus_free(cb->mii_bus);
> 			devm_kfree(dev, cb);
> 		} else {
> 
> but i think the others can go.
> 
> Just a suggestion, not a problem if you don't feel like doing this...
> 

I will try but in another serie. For letting more time for test.

Regards

^ permalink raw reply

* hns3 use of pci_enable_msix_range
From: Christoph Hellwig @ 2017-09-01  8:26 UTC (permalink / raw)
  To: Salil; +Cc: Daode Huang, lipeng, Yisen Zhuang, Wei Hu, netdev

Hi Salil,

can you please switch your new hns3 driver to use pci_alloc_irq_vectors
instead of the deprecated pci_enable_msix_range interface?  Take a
look at Documentation/PCI/MSI-HOWTO.txt for the details.

^ permalink raw reply

* Re: [RFC PATCH] net: frag limit checks need to use percpu_counter_compare
From: Michal Kubecek @ 2017-09-01  8:10 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: liujian56, netdev, Florian Westphal
In-Reply-To: <20170901094156.12e6e0a3@redhat.com>

On Fri, Sep 01, 2017 at 09:41:56AM +0200, Jesper Dangaard Brouer wrote:
> On Thu, 31 Aug 2017 18:23:49 +0200 Michal Kubecek <mkubecek@suse.cz> wrote:
> 
> > If we go this way (which would IMHO require some benchmarks to make sure
> > it doesn't harm performance too much) we can drop the explicit checks
> > for zero thresholds which were added to work around the unreliability of
> > fast checks of percpu counters (or at least the second one was by commit
> > 30759219f562 ("net: disable fragment reassembly if high_thresh is zero").
>   
> After much considerations, together with Florian, I'm now instead
> looking at reverting the use of percpu_counter for this memory
> accounting use-case.  The complexity and maintenance cost is not worth
> it.  And I'm of-cause testing the perf effect, and currently I'm _not_
> seeing any perf regression on my 10G + 100G testlab (although this is
> not a NUMA system, which were my original optimization case).

This sounds reasonable to me. It is indeed questionable if percpu
counters are still worth the complexity if all checks have to be changed
to the exact version.

Perhaps there would be some gain for many CPUs if thresholds are large
enough to (almost) always avoid the need to calculate the sum. But once
we leave that safe area, I would be surprised if simple atomic_t
wouldn't be more efficient.

Michal Kubecek

^ permalink raw reply

* Re: [RFC PATCH] net: frag limit checks need to use percpu_counter_compare
From: Jesper Dangaard Brouer @ 2017-09-01  7:41 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: liujian56, netdev, Florian Westphal, brouer
In-Reply-To: <20170831162349.k3qnkfgkygdh2zqw@unicorn.suse.cz>

On Thu, 31 Aug 2017 18:23:49 +0200 Michal Kubecek <mkubecek@suse.cz> wrote:

> If we go this way (which would IMHO require some benchmarks to make sure
> it doesn't harm performance too much) we can drop the explicit checks
> for zero thresholds which were added to work around the unreliability of
> fast checks of percpu counters (or at least the second one was by commit
> 30759219f562 ("net: disable fragment reassembly if high_thresh is zero").

After much considerations, together with Florian, I'm now instead
looking at reverting the use of percpu_counter for this memory
accounting use-case.  The complexity and maintenance cost is not worth
it.  And I'm of-cause testing the perf effect, and currently I'm _not_
seeing any perf regression on my 10G + 100G testlab (although this is
not a NUMA system, which were my original optimization case).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Oakley Sunglass Warehouse
From: Oakley Sunglass Warehouse @ 2017-09-01  7:05 UTC (permalink / raw)
  To: Oakley Sunglass Warehouse

From:	Oakley Sunglass Warehouse
Email:	netdev@vger.kernel.org

------------------------------------------------------

Hi

It's our birthday and we are celebrating 2 years of our outlet.  We'd like to offer you only 19.95 USD for all styles of Oakley and Ray Ban Sunglasses. 
Please visit our online store for more details: http://www.eshop.trade  

YOU WON'T WANT TO MISS THESE OFFERS

To your success!

Regards, 

Mila

Oakley Sunglass Warehouse

------------------------------------------------------

^ permalink raw reply

* [PATCH 1/2] xfrm: Add support for network devices capable of removing the ESP trailer
From: Steffen Klassert @ 2017-09-01  7:30 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1504251046-17106-1-git-send-email-steffen.klassert@secunet.com>

From: Yossi Kuperman <yossiku@mellanox.com>

In conjunction with crypto offload [1], removing the ESP trailer by
hardware can potentially improve the performance by avoiding (1) a
cache miss incurred by reading the nexthdr field and (2) the necessity
to calculate the csum value of the trailer in order to keep skb->csum
valid.

This patch introduces the changes to the xfrm stack and merely serves
as an infrastructure. Subsequent patch to mlx5 driver will put this to
a good use.

[1] https://www.mail-archive.com/netdev@vger.kernel.org/msg175733.html

Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/xfrm.h    |  1 +
 net/ipv4/esp4.c       | 70 ++++++++++++++++++++++++++++++++++-----------------
 net/ipv6/esp6.c       | 51 ++++++++++++++++++++++++++-----------
 net/xfrm/xfrm_input.c |  5 ++++
 4 files changed, 89 insertions(+), 38 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 9c7b70c..f002a2c 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1019,6 +1019,7 @@ struct xfrm_offload {
 #define	CRYPTO_FALLBACK		8
 #define	XFRM_GSO_SEGMENT	16
 #define	XFRM_GRO		32
+#define	XFRM_ESP_NO_TRAILER	64
 
 	__u32			status;
 #define CRYPTO_SUCCESS				1
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 741acd7..3190005 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -499,19 +499,59 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 	return esp_output_tail(x, skb, &esp);
 }
 
+static inline int esp_remove_trailer(struct sk_buff *skb)
+{
+	struct xfrm_state *x = xfrm_input_state(skb);
+	struct xfrm_offload *xo = xfrm_offload(skb);
+	struct crypto_aead *aead = x->data;
+	int alen, hlen, elen;
+	int padlen, trimlen;
+	__wsum csumdiff;
+	u8 nexthdr[2];
+	int ret;
+
+	alen = crypto_aead_authsize(aead);
+	hlen = sizeof(struct ip_esp_hdr) + crypto_aead_ivsize(aead);
+	elen = skb->len - hlen;
+
+	if (xo && (xo->flags & XFRM_ESP_NO_TRAILER)) {
+		ret = xo->proto;
+		goto out;
+	}
+
+	if (skb_copy_bits(skb, skb->len - alen - 2, nexthdr, 2))
+		BUG();
+
+	ret = -EINVAL;
+	padlen = nexthdr[0];
+	if (padlen + 2 + alen >= elen) {
+		net_dbg_ratelimited("ipsec esp packet is garbage padlen=%d, elen=%d\n",
+				    padlen + 2, elen - alen);
+		goto out;
+	}
+
+	trimlen = alen + padlen + 2;
+	if (skb->ip_summed == CHECKSUM_COMPLETE) {
+		csumdiff = skb_checksum(skb, skb->len - trimlen, trimlen, 0);
+		skb->csum = csum_block_sub(skb->csum, csumdiff,
+					   skb->len - trimlen);
+	}
+	pskb_trim(skb, skb->len - trimlen);
+
+	ret = nexthdr[1];
+
+out:
+	return ret;
+}
+
 int esp_input_done2(struct sk_buff *skb, int err)
 {
 	const struct iphdr *iph;
 	struct xfrm_state *x = xfrm_input_state(skb);
 	struct xfrm_offload *xo = xfrm_offload(skb);
 	struct crypto_aead *aead = x->data;
-	int alen = crypto_aead_authsize(aead);
 	int hlen = sizeof(struct ip_esp_hdr) + crypto_aead_ivsize(aead);
-	int elen = skb->len - hlen;
 	int ihl;
-	u8 nexthdr[2];
-	int padlen, trimlen;
-	__wsum csumdiff;
 
 	if (!xo || (xo && !(xo->flags & CRYPTO_DONE)))
 		kfree(ESP_SKB_CB(skb)->tmp);
@@ -519,16 +559,10 @@ int esp_input_done2(struct sk_buff *skb, int err)
 	if (unlikely(err))
 		goto out;
 
-	if (skb_copy_bits(skb, skb->len-alen-2, nexthdr, 2))
-		BUG();
-
-	err = -EINVAL;
-	padlen = nexthdr[0];
-	if (padlen + 2 + alen >= elen)
+	err = esp_remove_trailer(skb);
+	if (unlikely(err < 0))
 		goto out;
 
-	/* ... check padding bits here. Silly. :-) */
-
 	iph = ip_hdr(skb);
 	ihl = iph->ihl * 4;
 
@@ -569,22 +603,12 @@ int esp_input_done2(struct sk_buff *skb, int err)
 			skb->ip_summed = CHECKSUM_UNNECESSARY;
 	}
 
-	trimlen = alen + padlen + 2;
-	if (skb->ip_summed == CHECKSUM_COMPLETE) {
-		csumdiff = skb_checksum(skb, skb->len - trimlen, trimlen, 0);
-		skb->csum = csum_block_sub(skb->csum, csumdiff,
-					   skb->len - trimlen);
-	}
-	pskb_trim(skb, skb->len - trimlen);
-
 	skb_pull_rcsum(skb, hlen);
 	if (x->props.mode == XFRM_MODE_TUNNEL)
 		skb_reset_transport_header(skb);
 	else
 		skb_set_transport_header(skb, -ihl);
 
-	err = nexthdr[1];
-
 	/* RFC4303: Drop dummy packets without any error */
 	if (err == IPPROTO_NONE)
 		err = -EINVAL;
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 74bde20..7fb41b0 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -461,29 +461,30 @@ static int esp6_output(struct xfrm_state *x, struct sk_buff *skb)
 	return esp6_output_tail(x, skb, &esp);
 }
 
-int esp6_input_done2(struct sk_buff *skb, int err)
+static inline int esp_remove_trailer(struct sk_buff *skb)
 {
 	struct xfrm_state *x = xfrm_input_state(skb);
 	struct xfrm_offload *xo = xfrm_offload(skb);
 	struct crypto_aead *aead = x->data;
-	int alen = crypto_aead_authsize(aead);
-	int hlen = sizeof(struct ip_esp_hdr) + crypto_aead_ivsize(aead);
-	int elen = skb->len - hlen;
-	int hdr_len = skb_network_header_len(skb);
+	int alen, hlen, elen;
 	int padlen, trimlen;
 	__wsum csumdiff;
 	u8 nexthdr[2];
+	int ret;
 
-	if (!xo || (xo && !(xo->flags & CRYPTO_DONE)))
-		kfree(ESP_SKB_CB(skb)->tmp);
+	alen = crypto_aead_authsize(aead);
+	hlen = sizeof(struct ip_esp_hdr) + crypto_aead_ivsize(aead);
+	elen = skb->len - hlen;
 
-	if (unlikely(err))
+	if (xo && (xo->flags & XFRM_ESP_NO_TRAILER)) {
+		ret = xo->proto;
 		goto out;
+	}
 
 	if (skb_copy_bits(skb, skb->len - alen - 2, nexthdr, 2))
 		BUG();
 
-	err = -EINVAL;
+	ret = -EINVAL;
 	padlen = nexthdr[0];
 	if (padlen + 2 + alen >= elen) {
 		net_dbg_ratelimited("ipsec esp packet is garbage padlen=%d, elen=%d\n",
@@ -491,26 +492,46 @@ int esp6_input_done2(struct sk_buff *skb, int err)
 		goto out;
 	}
 
-	/* ... check padding bits here. Silly. :-) */
-
 	trimlen = alen + padlen + 2;
 	if (skb->ip_summed == CHECKSUM_COMPLETE) {
-		skb_postpull_rcsum(skb, skb_network_header(skb),
-				   skb_network_header_len(skb));
 		csumdiff = skb_checksum(skb, skb->len - trimlen, trimlen, 0);
 		skb->csum = csum_block_sub(skb->csum, csumdiff,
 					   skb->len - trimlen);
 	}
 	pskb_trim(skb, skb->len - trimlen);
 
+	ret = nexthdr[1];
+
+out:
+	return ret;
+}
+
+int esp6_input_done2(struct sk_buff *skb, int err)
+{
+	struct xfrm_state *x = xfrm_input_state(skb);
+	struct xfrm_offload *xo = xfrm_offload(skb);
+	struct crypto_aead *aead = x->data;
+	int hlen = sizeof(struct ip_esp_hdr) + crypto_aead_ivsize(aead);
+	int hdr_len = skb_network_header_len(skb);
+
+	if (!xo || (xo && !(xo->flags & CRYPTO_DONE)))
+		kfree(ESP_SKB_CB(skb)->tmp);
+
+	if (unlikely(err))
+		goto out;
+
+	err = esp_remove_trailer(skb);
+	if (unlikely(err < 0))
+		goto out;
+
+	skb_postpull_rcsum(skb, skb_network_header(skb),
+			   skb_network_header_len(skb));
 	skb_pull_rcsum(skb, hlen);
 	if (x->props.mode == XFRM_MODE_TUNNEL)
 		skb_reset_transport_header(skb);
 	else
 		skb_set_transport_header(skb, -hdr_len);
 
-	err = nexthdr[1];
-
 	/* RFC4303: Drop dummy packets without any error */
 	if (err == IPPROTO_NONE)
 		err = -EINVAL;
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index f07eec5..2515cd2 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -247,6 +247,11 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 					goto drop;
 				}
 
+				if (xo->status & CRYPTO_INVALID_PROTOCOL) {
+					XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEPROTOERROR);
+					goto drop;
+				}
+
 				XFRM_INC_STATS(net, LINUX_MIB_XFRMINBUFFERERROR);
 				goto drop;
 			}
-- 
2.7.4

^ permalink raw reply related

* [PATCH 2/2] xfrm: Fix return value check of copy_sec_ctx.
From: Steffen Klassert @ 2017-09-01  7:30 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1504251046-17106-1-git-send-email-steffen.klassert@secunet.com>

A recent commit added an output_mark. When copying
this output_mark, the return value of copy_sec_ctx
is overwitten without a check. Fix this by copying
the output_mark before the security context.

Fixes: 077fbac405bf ("net: xfrm: support setting an output mark.")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_user.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index cc3268d..490132d 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -900,13 +900,13 @@ static int copy_to_user_state_extra(struct xfrm_state *x,
 		ret = copy_user_offload(&x->xso, skb);
 	if (ret)
 		goto out;
-	if (x->security)
-		ret = copy_sec_ctx(x->security, skb);
 	if (x->props.output_mark) {
 		ret = nla_put_u32(skb, XFRMA_OUTPUT_MARK, x->props.output_mark);
 		if (ret)
 			goto out;
 	}
+	if (x->security)
+		ret = copy_sec_ctx(x->security, skb);
 out:
 	return ret;
 }
-- 
2.7.4

^ permalink raw reply related

* pull request (net-next): ipsec-next 2017-09-01
From: Steffen Klassert @ 2017-09-01  7:30 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

This should be the last ipsec-next pull request for this
release cycle:

1) Support netdevice ESP trailer removal when decryption
   is offloaded. From Yossi Kuperman.

2) Fix overwritten return value of copy_sec_ctx().

Please pull or let me know if there are problems.

Thanks!

The following changes since commit acfb98b99647aa7dc7c111db52d5f4199d2b641f:

  liquidio: fix crash in presence of zeroed-out base address regs (2017-08-30 22:07:09 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git master

for you to fetch changes up to 8598112d04af21cf6c895670e72dcb8a9f58e74f:

  xfrm: Fix return value check of copy_sec_ctx. (2017-08-31 10:37:00 +0200)

----------------------------------------------------------------
Steffen Klassert (1):
      xfrm: Fix return value check of copy_sec_ctx.

Yossi Kuperman (1):
      xfrm: Add support for network devices capable of removing the ESP trailer

 include/net/xfrm.h    |  1 +
 net/ipv4/esp4.c       | 70 ++++++++++++++++++++++++++++++++++-----------------
 net/ipv6/esp6.c       | 51 ++++++++++++++++++++++++++-----------
 net/xfrm/xfrm_input.c |  5 ++++
 net/xfrm/xfrm_user.c  |  4 +--
 5 files changed, 91 insertions(+), 40 deletions(-)

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox