Netdev List

* Re: [Intel-wired-lan] v4.15-rc2 on thinkpad x60: ethernet stopped working
From: Neftin, Sasha @ 2017-12-18 15:50 UTC
  To: Pavel Machek, jacob.e.keller
  Cc: David Miller, bpoirier, nix.or.die, netdev, linux-kernel,
	intel-wired-lan, lsorense
In-Reply-To: <20171218115817.GA17054@amd>

On 12/18/2017 13:58, Pavel Machek wrote:
> On Mon 2017-12-18 13:24:40, Neftin, Sasha wrote:
>> On 12/18/2017 12:26, Pavel Machek wrote:
>>> Hi!
>>>
>>>>>>> In v4.15-rc2+, network manager can not see my ethernet card, and
>>>>>>> manual attempts to ifconfig it up did not really help, either.
>>>>>>>
>>>>>>> Card is:
>>>>>>>
>>>>>>> 02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet
>>>>>>> Controller
>>>>> ....
>>>>>>> Any ideas ?
>>>>>> Yes, 19110cfbb34d4af0cdfe14cd243f3b09dc95b013 broke it.
>>>>>>
>>>>>> See:
>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=198047
>>>>>>
>>>>>> Fix there:
>>>>>> https://marc.info/?l=linux-kernel&m=151272209903675&w=2
>>>>> I don't see the patch in latest mainline. Not having ethernet
>>>>> is... somehow annoying. What is going on there?
>>>> Generally speaking, e1000 maintenance has been handled very poorly over
>>>> the past few years, I have to say.
>>>>
>>>> Fixes take forever to propagate even when someone other than the
>>>> maintainer provides a working and tested fix, just like this case.
>>>>
>>>> Jeff, please take e1000 maintenance seriously and get these critical
>>>> bug fixes propagated.
>>> No response AFAICT. I guess I should test reverting
>>> 19110cfbb34d4af0cdfe14cd243f3b09dc95b013, then ask you for a revert?
>> Hello Pavel,
>>
>> Before asking for a revert of 19110cfbb..., please check whether the
>> following patch from Benjamin works for you: http://patchwork.ozlabs.org/patch/846825/
> Jacob said, in another email:
>
> # Digging into this, the problem is complicated. The original bug
> # assumed behavior of the .check_for_link call, which is universally not
> # implemented.
> #
> # I think the correct fix is to revert 19110cfbb34d ("e1000e: Separate
> # signaling for link check/link up", 2017-10-10) and find a more proper solution.
>
> ...which makes me think that a revert is preferred?
>
> 									Pavel
>
Pavel, before asking for a revert, let's check Benjamin's follow-up to his
previous patch. The previous patch was not complete, and the latest one
completes the changes.

* Re: [PATCH bpf-next 12/13] bpf: arm64: add JIT support for multi-function programs
From: Daniel Borkmann @ 2017-12-18 15:51 UTC
  To: Arnd Bergmann, Alexei Starovoitov
  Cc: David S . Miller, John Fastabend, Edward Cree, Jakub Kicinski,
	Networking, kernel-team
In-Reply-To: <CAK8P3a3XhCRiWoLVMd6VB9FMUho554UZt8AmxCm8zbkrok_cOw@mail.gmail.com>

On 12/18/2017 04:29 PM, Arnd Bergmann wrote:
> On Fri, Dec 15, 2017 at 2:55 AM, Alexei Starovoitov <ast@kernel.org> wrote:
> 
> 
>> +       if (jit_data->ctx.offset) {
>> +               ctx = jit_data->ctx;
>> +               image_ptr = jit_data->image;
>> +               header = jit_data->header;
>> +               extra_pass = true;
>> +               goto skip_init_ctx;
>> +       }
>>         memset(&ctx, 0, sizeof(ctx));
>>         ctx.prog = prog;
> 
> The 'goto' jumps over the 'image_size' initialization
> 
>>         prog->bpf_func = (void *)ctx.image;
>>         prog->jited = 1;
>>         prog->jited_len = image_size;
> 
> so we now get a warning here, starting with linux-next-20171218:
> 
> arch/arm64/net/bpf_jit_comp.c: In function 'bpf_int_jit_compile':
> arch/arm64/net/bpf_jit_comp.c:982:18: error: 'image_size' may be used
> uninitialized in this function [-Werror=maybe-uninitialized]
> 
> I could not figure out what the code should be doing instead, or if it is
> indeed safe and the warning is a false-positive.

Good catch, it's buggy indeed. A fix like the one below is needed; I can
submit it properly a bit later today:

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 396490c..a6fd585 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -855,6 +855,7 @@ static inline void bpf_flush_icache(void *start, void *end)
 struct arm64_jit_data {
 	struct bpf_binary_header *header;
 	u8 *image;
+	int image_size;
 	struct jit_ctx ctx;
 };

@@ -895,6 +896,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	if (jit_data->ctx.offset) {
 		ctx = jit_data->ctx;
 		image_ptr = jit_data->image;
+		image_size = jit_data->image_size;
 		header = jit_data->header;
 		extra_pass = true;
 		goto skip_init_ctx;
@@ -975,6 +977,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	} else {
 		jit_data->ctx = ctx;
 		jit_data->image = image_ptr;
+		jit_data->image_size = image_size;
 		jit_data->header = header;
 	}
 	prog->bpf_func = (void *)ctx.image;
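To make the failure mode concrete, here is a toy userspace model of the two-pass pattern the fix relies on. It is a sketch, not arm64 JIT code: every local that the extra pass jumps over with the goto has to be restored from the saved per-program state. The names struct toy_jit_state and toy_jit_compile() are hypothetical stand-ins for struct arm64_jit_data and bpf_int_jit_compile(), and the "4 bytes per instruction" sizing is an assumption for illustration only.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct arm64_jit_data: what survives between
 * the first JIT pass and the extra pass. */
struct toy_jit_state {
	bool saved;		/* set once the first pass stored its results */
	int image_size;		/* the field the fix adds */
};

/* Returns the value that would end up in prog->jited_len. */
static int toy_jit_compile(struct toy_jit_state *st, int insn_count)
{
	int image_size;

	if (st->saved) {
		/* Extra pass: restore everything the goto skips over. */
		image_size = st->image_size;
		goto skip_init_ctx;
	}

	/* First pass: size the image (4 bytes per insn, for illustration). */
	image_size = insn_count * 4;

	/* Save state for the extra pass, including image_size. */
	st->image_size = image_size;
	st->saved = true;

skip_init_ctx:
	return image_size;
}
```

Without the restore on the early path, the second call would return an uninitialized value, which is exactly what -Wmaybe-uninitialized flagged.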

* Re: pull-request: bpf-next 2017-12-18
From: David Miller @ 2017-12-18 15:51 UTC
  To: daniel; +Cc: ast, netdev
In-Reply-To: <20171218003307.10014-1-daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Mon, 18 Dec 2017 01:33:07 +0100

> The following pull-request contains BPF updates for your *net-next* tree.
> 
> The main changes are:
> 
> 1) Allow arbitrary function calls from one BPF function to another BPF function.
>    Until now, when writing BPF programs, __always_inline had to be used in
>    the BPF C programs for all functions, unnecessarily causing LLVM to inflate
>    code size. Handle this more naturally with support for BPF-to-BPF calls,
>    so that the __always_inline restriction can be overcome. As a result,
>    it allows for better optimized code and finally makes it possible to
>    introduce core BPF libraries in the future that can be reused across
>    different projects. x86 and arm64 JIT support was added as well, from Alexei.

Exciting... but now there's a lot of JIT work to do.

 ...
> Please consider pulling these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git

Pulled, thanks!

* Re: [bpf-next V1-RFC PATCH 01/14] xdp: base API for new XDP rx-queue info concept
From: Jesper Dangaard Brouer @ 2017-12-18 15:52 UTC
  To: David Ahern
  Cc: Daniel Borkmann, Alexei Starovoitov, netdev, gospo, bjorn.topel,
	michael.chan, brouer
In-Reply-To: <2effe097-6802-2020-075d-47cc3576f78f@gmail.com>

On Mon, 18 Dec 2017 06:23:40 -0700
David Ahern <dsahern@gmail.com> wrote:

> On 12/18/17 3:55 AM, Jesper Dangaard Brouer wrote:
> > 
> > Handling return-errors in the drivers complicated the driver code, as it
> > involves unraveling and deallocating other RX-rings etc. (that were
> > already allocated) if the reg fails. (Also notice that the next patch will
> > allow dev == NULL, if the right ptype is set.)
> > 
> > I'm not completely rejecting your idea, as this is a good optimization
> > trick, which is to move validation checks to setup-time, thus allowing
> > fewer validation checks at runtime.  I actually sort of already did
> > this, as I allow bpf to deref dev without a NULL check.  I would argue
> > this is good enough, as we will crash in a predictable way, and the above
> > WARN will point to which driver violated the API.
> > 
> > If people think it is valuable, I can change this API to return an err?
> 
> Saeed's suggested API in a comment on patch 12 also removes most of the
> WARN_ONs as it sets the device and index:
> 
> xdp_rxq_info_reg(netdev, rxq_index)
> {
>     rxqueue = netdev->_rx + rxq_index;
>     xdp_rxq = &rxqueue->xdp_rxq;
>     xdp_rxq_info_init(xdp_rxq);
>     xdp_rxq->dev = netdev;
>     xdp_rxq->queue_index = rxq_index;
> }
> 
> xdp_rxq_info_unreg(netdev, rxq_index)
> {
> ...
> }

No, we still need the other WARN_ONs.

I don't understand why you think the above API is better.  If
netdev == NULL, the system will simply crash on the deref of netdev.  That
case happened for both the i40e and mlx5 drivers when I was adding this.
The WARN_ON helped me quickly identify the issue, and in both drivers it
was a non-critical error, as these queues are not used by XDP. IMHO this is
a better experience for the driver developer.

IMHO, WARN_ONs are a good thing.  For example, the check:

 if (xdp_rxq->reg_state == REG_STATE_REGISTERED)
   WARN(1, "Missing unregister, handled but fix driver\n");

Just helped me identify a bug in the i40e driver.  It is triggered by
changing the RX-ring queue size via ethtool <-G|--set-ring> (_not_ the
number of RX-rings, but the frames per RX-ring): i40e_set_ringparam() then
allocates some temporary RX-rings and copies struct contents around, causing
this strange issue.  It will not crash with our current simple content,
but later it would cause a hard-to-debug issue.  I'm happy I could
catch this now, instead of later as a strange crash.

The WARNs are there to assist driver developers when using this API
in their drivers (better than a crash/BUG_ON, as they don't have to dig up
their serial console cable).  For me it is also part of the
documentation, as it documents the API assumptions/assertions together
with a small text field.
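A minimal userspace sketch of the "warn and recover instead of crash" argument: WARN() is replaced by a hypothetical warn_sketch() that prints to stderr and counts, and struct rxq_info_sketch only loosely follows the RFC's xdp_rxq_info. Note the assumption: returning an error for a NULL device is the alternative discussed in this thread, not what the RFC does, and that warning text is illustrative.

```c
#include <stdio.h>

enum reg_state { REG_STATE_NEW, REG_STATE_REGISTERED, REG_STATE_UNREGISTERED };

/* Hypothetical stand-in for struct xdp_rxq_info. */
struct rxq_info_sketch {
	const void *dev;		/* stands in for struct net_device * */
	unsigned int queue_index;
	enum reg_state reg_state;
};

static int warn_count;

/* Userspace stand-in for WARN(): report and count, but keep running. */
static int warn_sketch(int cond, const char *msg)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

static void rxq_info_unreg_sketch(struct rxq_info_sketch *q)
{
	q->dev = NULL;
	q->reg_state = REG_STATE_UNREGISTERED;
}

static int rxq_info_reg_sketch(struct rxq_info_sketch *q, const void *dev,
			       unsigned int queue_index)
{
	/* Double registration: warn, clean up, and carry on. */
	if (warn_sketch(q->reg_state == REG_STATE_REGISTERED,
			"Missing unregister, handled but fix driver"))
		rxq_info_unreg_sketch(q);

	/* A NULL device is reported here instead of crashing on a later deref. */
	if (warn_sketch(dev == NULL, "NULL device passed by driver"))
		return -1;

	q->dev = dev;
	q->queue_index = queue_index;
	q->reg_state = REG_STATE_REGISTERED;
	return 0;
}
```

A driver that forgets to unregister before re-registering gets one warning pointing at the bug, while the queue still ends up in a consistent registered state.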

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

* Re: [PATCH] net: qcom/emac: Change the order of mac up and sgmii open
From: Timur Tabi @ 2017-12-18 16:05 UTC
  To: Hemanth Puranik, netdev, linux-kernel
In-Reply-To: <1513576667-2967-1-git-send-email-hpuranik@codeaurora.org>

On 12/17/2017 11:57 PM, Hemanth Puranik wrote:
> This patch fixes the order of mac_up and sgmii_open for the
> reasons noted below:
> 
> - If open takes more time (if the SGMII block is not responding or
>    if we want to do some delay-based task), we will hit the NETDEV
>    watchdog.
> - The main reason: we should signal to upper layers that we are
>    ready to receive packets "only" when the entire path is initialized,
>    not the other way around. This is already followed in the reset path,
>    where we do mac_down, sgmii_reset and mac_up. This also makes the driver
>    uniform across the reset and open paths.
> - In the future there may be a need for delay-based tasks to be done in
>    sgmii open, which would trigger the NETDEV watchdog.
> - As per the documentation, the order of init should be sgmii, mac, rings
>    and DMA.
> 
> Signed-off-by: Hemanth Puranik <hpuranik@codeaurora.org>

Acked-by: Timur Tabi <timur@codeaurora.org>
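The ordering argued for in the patch can be sketched with stubbed helpers that record the call order. The helpers are hypothetical, not the emac driver's functions: the MAC, which is what tells the stack the device is ready, is only brought up once the SGMII side has opened successfully, and an SGMII failure returns before the MAC is touched.

```c
#include <string.h>

static char call_trace[64];	/* records call order for inspection */
static int sgmii_should_fail;	/* test knob: make the SGMII open fail */

/* Hypothetical stubs standing in for the driver's SGMII and MAC helpers. */
static int sgmii_open_stub(void)
{
	strcat(call_trace, "sgmii ");
	return sgmii_should_fail ? -1 : 0;
}

static int mac_up_stub(void)
{
	strcat(call_trace, "mac_up ");
	return 0;
}

/* Open path in the order the patch argues for: SGMII first, MAC last. */
static int open_sketch(void)
{
	int ret = sgmii_open_stub();

	if (ret)
		return ret;	/* MAC never brought up: stack not told we are ready */

	return mac_up_stub();
}
```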

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

* [RFC ipsec-next 0/4]: Support multiple VTIs with the same src+dst pair
From: Lorenzo Colitti @ 2017-12-18 16:16 UTC
  To: netdev; +Cc: steffen.klassert, subashab, nharold

When using IPsec tunnel mode, VTIs provide many benefits compared
to direct configuration of xfrm policies / states. However, one
limitation is that there can only be one VTI between a given pair
of IP addresses. This does not allow configuring multiple IPsec
tunnels to the same security gateway. This is required by some
deployments, for example I-WLAN [3GPP TS 24.327].

This patchset introduces a new VTI_KEYED flag that allows
configuration of multiple VTIs between the same pair of IP
addresses. The output path is the same as current VTI behaviour,
where a routing lookup selects a VTI interface, and the VTI's
okey specifies the mark to use in the XFRM lookup. The input and
ICMP error paths instead work by first looking up an SA with a
loose match that ignores the mark. The SA's mark is then used to
find the tunnel by ikey.

This approach is simple and requires few userspace changes, but
it has one limitation: ICMP errors received in response to
VTI-emitted packets can only be processed if the VTI's ikey and
okey are the same. This limitation could be lifted by introducing
another XFRM mark, similar to XFRMA_OUTPUT_MARK, but used for
input.
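The two-step input lookup described above can be modeled in a few lines. This is a toy sketch under simplifying assumptions: all tunnels share the same src+dst pair, SAs are matched on SPI only, and struct toy_vti / struct toy_sa are hypothetical, not kernel types.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical models; all tunnels are assumed to share one src+dst pair. */
struct toy_vti { int keyed; uint32_t i_key; };
struct toy_sa  { uint32_t spi; uint32_t mark; };

static const struct toy_vti *
toy_vti_input(const struct toy_vti *tuns, size_t ntun,
	      const struct toy_sa *sas, size_t nsa, uint32_t spi)
{
	/* Step 1: a regular (unkeyed) VTI matches on addresses alone; its
	 * i_key would then be used as the mark for the state lookup. */
	for (size_t i = 0; i < ntun; i++)
		if (!tuns[i].keyed)
			return &tuns[i];

	/* Step 2: find the SA by SPI, ignoring the mark... */
	for (size_t i = 0; i < nsa; i++) {
		if (sas[i].spi != spi)
			continue;
		/* ...then use the SA's mark to find the tunnel by i_key. */
		for (size_t j = 0; j < ntun; j++)
			if (tuns[j].keyed && tuns[j].i_key == sas[i].mark)
				return &tuns[j];
		return NULL;
	}
	return NULL;
}
```

With two keyed VTIs (i_key 10 and 20) and SAs carrying marks 10 and 20, the SPI alone is enough to pick the right tunnel, which is the case a single address-keyed VTI could not handle.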

* [RFC ipsec-next 1/4] net: xfrm: Add an xfrm lookup that ignores the mark.
From: Lorenzo Colitti @ 2017-12-18 16:16 UTC
  To: netdev; +Cc: steffen.klassert, subashab, nharold, Lorenzo Colitti
In-Reply-To: <20171218161656.40618-1-lorenzo@google.com>

The xfrm inbound and ICMP error paths can match inbound XFRM states
that have a mark, but only if the skb mark is already correctly set
to match the state mark. This typically requires iptables rules
(potentially even per SA iptables rules), which impose configuration
complexity.

In some cases, it may be useful to match such an SA anyway. An example
is when processing an ICMP error to an ESP packet that we previously
sent. In this case, the only information available to match the SA are
the IP addresses and the outbound SPI. Therefore, if the output SA has
a mark, the lookup will fail and the ICMP packet cannot be processed
unless the packet is somehow already marked.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
---
 include/net/xfrm.h    |  4 ++++
 net/xfrm/xfrm_state.c | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 1ec0c4760646..9d3b7c0ac6e2 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1550,6 +1550,10 @@ struct xfrm_state *xfrm_state_lookup_byaddr(struct net *net, u32 mark,
 					    const xfrm_address_t *saddr,
 					    u8 proto,
 					    unsigned short family);
+struct xfrm_state *xfrm_state_lookup_loose(struct net *net, u32 mark,
+					   const xfrm_address_t *daddr,
+					   __be32 spi, u8 proto,
+					   unsigned short family);
 #ifdef CONFIG_XFRM_SUB_POLICY
 int xfrm_tmpl_sort(struct xfrm_tmpl **dst, struct xfrm_tmpl **src, int n,
 		   unsigned short family, struct net *net);
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 1b7856be3eeb..ee678758547f 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -839,6 +839,38 @@ static struct xfrm_state *__xfrm_state_lookup(struct net *net, u32 mark,
 	return NULL;
 }

+struct xfrm_state *xfrm_state_lookup_loose(struct net *net, u32 mark,
+					   const xfrm_address_t *daddr,
+					   __be32 spi, u8 proto,
+					   unsigned short family)
+{
+	unsigned int h = xfrm_spi_hash(net, daddr, spi, proto, family);
+	struct xfrm_state *x, *cand = NULL;
+
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(x, net->xfrm.state_byspi + h, byspi) {
+		if (x->props.family != family ||
+		    x->id.spi       != spi ||
+		    x->id.proto     != proto ||
+		    !xfrm_addr_equal(&x->id.daddr, daddr, family))
+			continue;
+
+		if (((mark & x->mark.m) == x->mark.v) &&
+		    xfrm_state_hold_rcu(x)) {
+			if (cand)
+				xfrm_state_put(cand);
+			rcu_read_unlock();
+			return x;
+		}
+
+		if (!cand && xfrm_state_hold_rcu(x))
+			cand = x;
+	}
+
+	rcu_read_unlock();
+	return cand;
+}
+
 static struct xfrm_state *__xfrm_state_lookup_byaddr(struct net *net, u32 mark,
 						     const xfrm_address_t *daddr,
 						     const xfrm_address_t *saddr,
-- 
2.15.1.504.g5279b80103-goog
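The candidate selection in xfrm_state_lookup_loose() above can be isolated as follows. This sketch drops RCU, reference counting, and the family/proto/daddr checks, and scans a plain array instead of the byspi hash chain; struct sa_entry is hypothetical. It keeps the same preference: a state whose mark matches ((mark & m) == v) wins, otherwise the first SPI match is returned.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy SA entry: only the fields the loose lookup compares. */
struct sa_entry {
	uint32_t spi;
	uint32_t mark_v;	/* like x->mark.v */
	uint32_t mark_m;	/* like x->mark.m */
};

static const struct sa_entry *
lookup_loose(const struct sa_entry *tbl, size_t n, uint32_t spi, uint32_t mark)
{
	const struct sa_entry *cand = NULL;

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].spi != spi)
			continue;
		/* Exact mark match: return immediately. */
		if ((mark & tbl[i].mark_m) == tbl[i].mark_v)
			return &tbl[i];
		/* Otherwise remember the first SPI match as a fallback. */
		if (!cand)
			cand = &tbl[i];
	}
	return cand;
}
```

In the kernel version, "first" depends on hash-chain order, so which fallback candidate wins when several marked SAs share an SPI is not something callers should rely on.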

* [RFC ipsec-next 2/4] net: xfrm: find VTI interfaces from xfrm_input
From: Lorenzo Colitti @ 2017-12-18 16:16 UTC
  To: netdev; +Cc: steffen.klassert, subashab, nharold, Lorenzo Colitti
In-Reply-To: <20171218161656.40618-1-lorenzo@google.com>

Currently, the VTI input path works by first looking up the VTI
by its IP addresses, then setting the tunnel pointer in the
XFRM_TUNNEL_SKB_CB, and then having xfrm_input override the mark
with the mark in the tunnel.

This patch changes the order so that the tunnel is found by a
callback from xfrm_input. Each tunnel type (currently only ip_vti
and ip6_vti) implements a lookup function pointer that finds the
tunnel and sets it in the CB, and also does a state lookup.

This has the advantage that much more information is available to
the tunnel lookup function, including the looked-up XFRM state.
This will be used in a future change to allow finding the tunnel
not just from the IP addresses, but also from the xfrm lookup.

The lookup function pointer occupies the same space in the
XFRM_TUNNEL_SKB_CB as the IPv4/IPv6 tunnel pointer. The semantics
of the field are:
- When not running a handler that uses tunnels: always null.
- At the beginning of xfrm_input: lookup function pointer.
- After xfrm_input calls the lookup function: tunnel if found,
  else null.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
---
 include/net/xfrm.h     |  2 ++
 net/ipv4/ip_vti.c      | 43 ++++++++++++++++++++++++++++++++++++----
 net/ipv6/ip6_vti.c     | 53 +++++++++++++++++++++++++++++++++++++++++++++-----
 net/ipv6/xfrm6_input.c |  1 -
 net/xfrm/xfrm_input.c  | 34 +++++++++++++++++++-------------
 5 files changed, 109 insertions(+), 24 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 9d3b7c0ac6e2..3d245f2f6f6c 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -653,6 +653,8 @@ struct xfrm_tunnel_skb_cb {
 	} header;
 
 	union {
+		int (*lookup)(struct sk_buff *skb, int nexthdr, __be32 spi,
+			      __be32 seq, struct xfrm_state **x);
 		struct ip_tunnel *ip4;
 		struct ip6_tnl *ip6;
 	} tunnel;
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 949f432a5f04..850625598187 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -49,8 +49,8 @@ static struct rtnl_link_ops vti_link_ops __read_mostly;
 static unsigned int vti_net_id __read_mostly;
 static int vti_tunnel_init(struct net_device *dev);
 
-static int vti_input(struct sk_buff *skb, int nexthdr, __be32 spi,
-		     int encap_type)
+static struct ip_tunnel *
+vti4_find_tunnel(struct sk_buff *skb, __be32 spi, struct xfrm_state **x)
 {
 	struct ip_tunnel *tunnel;
 	const struct iphdr *iph = ip_hdr(skb);
@@ -59,19 +59,52 @@ static int vti_input(struct sk_buff *skb, int nexthdr, __be32 spi,
 
 	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY,
 				  iph->saddr, iph->daddr, 0);
+	if (tunnel) {
+		*x = xfrm_state_lookup(net, be32_to_cpu(tunnel->parms.i_key),
+				       (xfrm_address_t *)&iph->daddr,
+				       spi, iph->protocol, AF_INET);
+	}
+
+	return tunnel;
+}
+
+static int vti_lookup(struct sk_buff *skb, int nexthdr, __be32 spi, __be32 seq,
+		      struct xfrm_state **x)
+{
+	struct net *net = dev_net(skb->dev);
+	struct ip_tunnel *tunnel;
+
+	tunnel = vti4_find_tunnel(skb, spi, x);
 	if (tunnel) {
 		if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
 			goto drop;
 
+		if (!*x) {
+			XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOSTATES);
+			xfrm_audit_state_notfound(skb, AF_INET, spi, seq);
+			tunnel->dev->stats.rx_errors++;
+			tunnel->dev->stats.rx_dropped++;
+			goto drop;
+		}
+
 		XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4 = tunnel;
 
-		return xfrm_input(skb, nexthdr, spi, encap_type);
+		return 0;
 	}
 
 	return -EINVAL;
 drop:
+	if (*x)
+		xfrm_state_put(*x);
 	kfree_skb(skb);
-	return 0;
+	return -ESRCH;
+}
+
+static int vti_input(struct sk_buff *skb, int nexthdr, __be32 spi,
+		     int encap_type)
+{
+	XFRM_TUNNEL_SKB_CB(skb)->tunnel.lookup = vti_lookup;
+	return xfrm_input(skb, nexthdr, spi, encap_type);
 }
 
 static int vti_rcv(struct sk_buff *skb)
@@ -93,6 +126,8 @@ static int vti_rcv_cb(struct sk_buff *skb, int err)
 	u32 orig_mark = skb->mark;
 	int ret;
 
+	XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4 = NULL;
+
 	if (!tunnel)
 		return 1;
 
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index dbb74f3c57a7..d0676f2f99eb 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -297,13 +297,33 @@ static void vti6_dev_uninit(struct net_device *dev)
 	dev_put(dev);
 }
 
-static int vti6_rcv(struct sk_buff *skb)
+static struct ip6_tnl *
+vti6_find_tunnel(struct sk_buff *skb, __be32 spi, struct xfrm_state **x)
 {
+	const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
+	struct net *net = dev_net(skb->dev);
 	struct ip6_tnl *t;
+
+	t = vti6_tnl_lookup(net, &ipv6h->saddr, &ipv6h->daddr);
+	if (t) {
+		*x = xfrm_state_lookup(net, be32_to_cpu(t->parms.i_key),
+				       (xfrm_address_t *)&ipv6h->daddr,
+				       spi, ipv6h->nexthdr, AF_INET6);
+	}
+
+	return t;
+}
+
+static int
+vti6_lookup(struct sk_buff *skb, int nexthdr, __be32 spi, __be32 seq,
+	    struct xfrm_state **x)
+{
 	const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
+	struct net *net = dev_net(skb->dev);
+	struct ip6_tnl *t;
 
 	rcu_read_lock();
-	t = vti6_tnl_lookup(dev_net(skb->dev), &ipv6h->saddr, &ipv6h->daddr);
+	t = vti6_find_tunnel(skb, spi, x);
 	if (t) {
 		if (t->parms.proto != IPPROTO_IPV6 && t->parms.proto != 0) {
 			rcu_read_unlock();
@@ -312,7 +332,7 @@ static int vti6_rcv(struct sk_buff *skb)
 
 		if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) {
 			rcu_read_unlock();
-			return 0;
+			goto discard;
 		}
 
 		if (!ip6_tnl_rcv_ctl(t, &ipv6h->daddr, &ipv6h->saddr)) {
@@ -321,15 +341,36 @@ static int vti6_rcv(struct sk_buff *skb)
 			goto discard;
 		}
 
+		if (!*x) {
+			XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOSTATES);
+			xfrm_audit_state_notfound(skb, AF_INET6, spi, seq);
+			t->dev->stats.rx_errors++;
+			t->dev->stats.rx_dropped++;
+			rcu_read_unlock();
+			goto discard;
+		}
+
 		rcu_read_unlock();
 
-		return xfrm6_rcv_tnl(skb, t);
+		XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 = t;
+
+		return 0;
 	}
 	rcu_read_unlock();
 	return -EINVAL;
 discard:
+	if (*x)
+		xfrm_state_put(*x);
 	kfree_skb(skb);
-	return 0;
+	return -ESRCH;
+}
+
+static int vti6_rcv(struct sk_buff *skb)
+{
+	int nexthdr = skb_network_header(skb)[IP6CB(skb)->nhoff];
+
+	XFRM_TUNNEL_SKB_CB(skb)->tunnel.lookup = vti6_lookup;
+	return xfrm6_rcv_spi(skb, nexthdr, 0, NULL);
 }
 
 static int vti6_rcv_cb(struct sk_buff *skb, int err)
@@ -343,6 +384,8 @@ static int vti6_rcv_cb(struct sk_buff *skb, int err)
 	u32 orig_mark = skb->mark;
 	int ret;
 
+	XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 = NULL;
+
 	if (!t)
 		return 1;
 
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index fe04e23af986..6d1b734fef8d 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -25,7 +25,6 @@ int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb)
 int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi,
 		  struct ip6_tnl *t)
 {
-	XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 = t;
 	XFRM_SPI_SKB_CB(skb)->family = AF_INET6;
 	XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct ipv6hdr, daddr);
 	return xfrm_input(skb, nexthdr, spi, 0);
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index ac277b97e0d7..7b54f58454ee 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -267,18 +267,6 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 
 	family = XFRM_SPI_SKB_CB(skb)->family;
 
-	/* if tunnel is present override skb->mark value with tunnel i_key */
-	switch (family) {
-	case AF_INET:
-		if (XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4)
-			mark = be32_to_cpu(XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip4->parms.i_key);
-		break;
-	case AF_INET6:
-		if (XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6)
-			mark = be32_to_cpu(XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6->parms.i_key);
-		break;
-	}
-
 	err = secpath_set(skb);
 	if (err) {
 		XFRM_INC_STATS(net, LINUX_MIB_XFRMINERROR);
@@ -293,14 +281,29 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 
 	daddr = (xfrm_address_t *)(skb_network_header(skb) +
 				   XFRM_SPI_SKB_CB(skb)->daddroff);
+
+	if (XFRM_TUNNEL_SKB_CB(skb)->tunnel.lookup) {
+		err = XFRM_TUNNEL_SKB_CB(skb)->tunnel.lookup(skb, nexthdr,
+							     spi, seq, &x);
+		if (err) {
+			XFRM_TUNNEL_SKB_CB(skb)->tunnel.lookup = NULL;
+			return err;
+		}
+	}
+
 	do {
 		if (skb->sp->len == XFRM_MAX_DEPTH) {
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINBUFFERERROR);
+			if (x)
+				xfrm_state_put(x);
 			goto drop;
 		}
 
-		x = xfrm_state_lookup(net, mark, daddr, spi, nexthdr, family);
-		if (x == NULL) {
+		if (!x)
+			x = xfrm_state_lookup(net, mark, daddr, spi, nexthdr,
+					      family);
+
+		if (!x) {
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOSTATES);
 			xfrm_audit_state_notfound(skb, family, spi, seq);
 			goto drop;
@@ -420,6 +423,9 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINHDRERROR);
 			goto drop;
 		}
+
+		if (!err)
+			x = NULL;
 	} while (!err);
 
 	err = xfrm_rcv_cb(skb, family, x->type->proto, 0);
-- 
2.15.1.504.g5279b80103-goog
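The "lookup pointer first, tunnel pointer afterwards" semantics of the shared CB field can be illustrated with a small userspace model. The types and functions below are hypothetical; the point is that a successful callback overwrites the slot it was called through, so the caller must invoke it at most once and clear the slot on failure, as the patched xfrm_input() does.

```c
#include <errno.h>
#include <stddef.h>

struct toy_skb;
struct toy_tunnel { int id; };

/* One slot, two meanings over time, like the patched XFRM_TUNNEL_SKB_CB. */
union toy_tunnel_slot {
	int (*lookup)(struct toy_skb *skb);	/* before the lookup runs */
	struct toy_tunnel *tun;			/* after it succeeds */
};

struct toy_skb {
	union toy_tunnel_slot slot;
};

static struct toy_tunnel found_tunnel = { 42 };

/* Successful lookup: overwrites the slot with the tunnel it found. */
static int toy_lookup_ok(struct toy_skb *skb)
{
	skb->slot.tun = &found_tunnel;
	return 0;
}

/* Failed lookup: leaves the slot alone and reports an error. */
static int toy_lookup_fail(struct toy_skb *skb)
{
	(void)skb;
	return -EINVAL;
}

/* Stand-in for the new hook in xfrm_input(): call the callback once. */
static int toy_input(struct toy_skb *skb)
{
	if (skb->slot.lookup) {
		int err = skb->slot.lookup(skb);

		if (err) {
			skb->slot.lookup = NULL;	/* don't leave the callback behind */
			return err;
		}
	}
	return 0;
}
```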

* [RFC ipsec-next 3/4] net: xfrm: support multiple VTI tunnels
From: Lorenzo Colitti @ 2017-12-18 16:16 UTC
  To: netdev; +Cc: steffen.klassert, subashab, nharold, Lorenzo Colitti
In-Reply-To: <20171218161656.40618-1-lorenzo@google.com>

This commit allows the creation of multiple VTI tunnels with the
same src+dst pair, via a new VTI_KEYED flag. This makes it
possible to maintain multiple IPsec tunnels to the same security
gateway, with the tunnels distinguished by SPI.

The new semantics are as follows:

- The output path is the same as existing VTIs. A routing lookup
  matches a VTI interface. The VTI uses its o_key as the mark
  to select an XFRM state. The state transforms the packet.
- Input works as follows:
  1. Attempt to match a regular VTI by IP addresses only. If that
     succeeds, use the i_key as the mark to look up the xfrm
     state.
  2. If the match failed, do an XFRM state lookup that ignores
     the mark. If that finds a state, then use the state's
     mark to find the tunnel by its i_key.
- ICMP errors are similar to input, except the search is for the
  outbound XFRM state, because the only data that is available is
  the outbound SPI. Thus, ICMP errors are only processed if the
  ikey is the same as the okey. AFAICS this is
  consistent with GRE tunnels, but not with existing VTI
  behaviour.
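Why keyed ICMP errors only work when ikey equals okey can be seen from a toy model of that path: the only state that can be found is the outbound one, whose mark was chosen by the tunnel's o_key, but the tunnel is then searched by i_key. The struct and function below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_keyed_vti {
	uint32_t i_key;		/* used to find the tunnel on input/ICMP */
	uint32_t o_key;		/* mark carried by the outbound SA */
};

/* Keyed ICMP-error path: search tunnels by i_key using the outbound
 * SA's mark, which equals the sending tunnel's o_key. */
static const struct toy_keyed_vti *
toy_icmp_err_find(const struct toy_keyed_vti *tuns, size_t n,
		  uint32_t outbound_sa_mark)
{
	for (size_t i = 0; i < n; i++)
		if (tuns[i].i_key == outbound_sa_mark)
			return &tuns[i];
	return NULL;
}
```

A tunnel with i_key 7 and o_key 8 emits packets under mark 8, so an ICMP error for them searches for i_key 8 and misses its own tunnel.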

Tested: https://android-review.googlesource.com/571524
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
---
 include/uapi/linux/if_tunnel.h |   3 ++
 net/ipv4/ip_vti.c              |  75 +++++++++++++++++++++++--------
 net/ipv6/ip6_vti.c             | 100 +++++++++++++++++++++++++++++++----------
 3 files changed, 136 insertions(+), 42 deletions(-)

diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index 1b3d148c4560..c2ec509cbc9e 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -147,6 +147,8 @@ enum {
 
 /* VTI-mode i_flags */
 #define VTI_ISVTI ((__force __be16)0x0001)
+#define VTI_KEYED ((__force __be16)0x0002)
+#define VTI_IFLAG_MASK ((__force __be16)0x0003)
 
 enum {
 	IFLA_VTI_UNSPEC,
@@ -156,6 +158,7 @@ enum {
 	IFLA_VTI_LOCAL,
 	IFLA_VTI_REMOTE,
 	IFLA_VTI_FWMARK,
+	IFLA_VTI_IFLAGS,
 	__IFLA_VTI_MAX,
 };
 
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 850625598187..f5793782c418 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -63,6 +63,17 @@ vti4_find_tunnel(struct sk_buff *skb, __be32 spi, struct xfrm_state **x)
 		*x = xfrm_state_lookup(net, be32_to_cpu(tunnel->parms.i_key),
 				       (xfrm_address_t *)&iph->daddr,
 				       spi, iph->protocol, AF_INET);
+	} else {
+		*x = xfrm_state_lookup_loose(net, skb->mark,
+					     (xfrm_address_t *) &iph->daddr,
+					     spi, iph->protocol, AF_INET);
+		if (!*x)
+			return NULL;
+		tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_KEY,
+					  iph->saddr, iph->daddr,
+					  cpu_to_be32((*x)->mark.v));
+		if (!tunnel)
+			xfrm_state_put(*x);
 	}
 
 	return tunnel;
@@ -302,7 +313,6 @@ static netdev_tx_t vti_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 static int vti4_err(struct sk_buff *skb, u32 info)
 {
 	__be32 spi;
-	__u32 mark;
 	struct xfrm_state *x;
 	struct ip_tunnel *tunnel;
 	struct ip_esp_hdr *esph;
@@ -313,13 +323,6 @@ static int vti4_err(struct sk_buff *skb, u32 info)
 	int protocol = iph->protocol;
 	struct ip_tunnel_net *itn = net_generic(net, vti_net_id);
 
-	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY,
-				  iph->daddr, iph->saddr, 0);
-	if (!tunnel)
-		return -1;
-
-	mark = be32_to_cpu(tunnel->parms.o_key);
-
 	switch (protocol) {
 	case IPPROTO_ESP:
 		esph = (struct ip_esp_hdr *)(skb->data+(iph->ihl<<2));
@@ -347,18 +350,46 @@ static int vti4_err(struct sk_buff *skb, u32 info)
 		return 0;
 	}
 
-	x = xfrm_state_lookup(net, mark, (const xfrm_address_t *)&iph->daddr,
-			      spi, protocol, AF_INET);
-	if (!x)
-		return 0;
+	tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY,
+				  iph->daddr, iph->saddr, 0);
+	if (tunnel) {
+		x = xfrm_state_lookup(net, be32_to_cpu(tunnel->parms.o_key),
+				      (xfrm_address_t *)&iph->daddr,
+				      spi, iph->protocol, AF_INET);
+	} else {
+		x = xfrm_state_lookup_loose(net, skb->mark,
+					    (xfrm_address_t *)&iph->daddr,
+					    spi, iph->protocol, AF_INET);
+		if (!x)
+			goto out;
+		tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_KEY,
+					  iph->daddr, iph->saddr,
+					  cpu_to_be32(x->mark.v));
+	}
+
+	if (!tunnel || !x)
+		goto out;
 
 	if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH)
 		ipv4_update_pmtu(skb, net, info, 0, 0, protocol, 0);
 	else
 		ipv4_redirect(skb, net, 0, 0, protocol, 0);
-	xfrm_state_put(x);
 
-	return 0;
+out:
+	if (x)
+		xfrm_state_put(x);
+
+	return tunnel ? 0 : -1;
+}
+
+static __be16 vti_flags_to_tnl_flags(__be16 i_flags)
+{
+	return VTI_ISVTI | ((i_flags & VTI_KEYED) ? GRE_KEY : 0);
+}
+
+static __be16 tnl_flags_to_vti_flags(__be16 i_flags)
+{
+	return VTI_ISVTI | ((i_flags & GRE_KEY) ? VTI_KEYED : 0);
 }
 
 static int
@@ -381,7 +412,7 @@ vti_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	if (!(p.o_flags & GRE_KEY))
 		p.o_key = 0;
 
-	p.i_flags = VTI_ISVTI;
+	p.i_flags = vti_flags_to_tnl_flags(p.i_flags);
 
 	err = ip_tunnel_ioctl(dev, &p, cmd);
 	if (err)
@@ -508,8 +539,6 @@ static void vti_netlink_parms(struct nlattr *data[],
 	if (!data)
 		return;
 
-	parms->i_flags = VTI_ISVTI;
-
 	if (data[IFLA_VTI_LINK])
 		parms->link = nla_get_u32(data[IFLA_VTI_LINK]);
 
@@ -527,6 +556,11 @@ static void vti_netlink_parms(struct nlattr *data[],
 
 	if (data[IFLA_VTI_FWMARK])
 		*fwmark = nla_get_u32(data[IFLA_VTI_FWMARK]);
+
+	if (data[IFLA_VTI_IFLAGS])
+		parms->i_flags = nla_get_be16(data[IFLA_VTI_IFLAGS]);
+
+	parms->i_flags = vti_flags_to_tnl_flags(parms->i_flags);
 }
 
 static int vti_newlink(struct net *src_net, struct net_device *dev,
@@ -567,6 +601,8 @@ static size_t vti_get_size(const struct net_device *dev)
 		nla_total_size(4) +
 		/* IFLA_VTI_FWMARK */
 		nla_total_size(4) +
+		/* IFLA_VTI_IFLAGS */
+		nla_total_size(2) +
 		0;
 }
 
@@ -580,7 +616,9 @@ static int vti_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	    nla_put_be32(skb, IFLA_VTI_OKEY, p->o_key) ||
 	    nla_put_in_addr(skb, IFLA_VTI_LOCAL, p->iph.saddr) ||
 	    nla_put_in_addr(skb, IFLA_VTI_REMOTE, p->iph.daddr) ||
-	    nla_put_u32(skb, IFLA_VTI_FWMARK, t->fwmark))
+	    nla_put_u32(skb, IFLA_VTI_FWMARK, t->fwmark) ||
+	    nla_put_be16(skb, IFLA_VTI_IFLAGS,
+			 tnl_flags_to_vti_flags(p->i_flags)))
 		return -EMSGSIZE;
 
 	return 0;
@@ -593,6 +631,7 @@ static const struct nla_policy vti_policy[IFLA_VTI_MAX + 1] = {
 	[IFLA_VTI_LOCAL]	= { .len = FIELD_SIZEOF(struct iphdr, saddr) },
 	[IFLA_VTI_REMOTE]	= { .len = FIELD_SIZEOF(struct iphdr, daddr) },
 	[IFLA_VTI_FWMARK]	= { .type = NLA_U32 },
+	[IFLA_VTI_IFLAGS]	= { .type = NLA_U16 },
 };
 
 static struct rtnl_link_ops vti_link_ops __read_mostly = {
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index d0676f2f99eb..3797738c828f 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -54,9 +54,10 @@
 #define IP6_VTI_HASH_SIZE_SHIFT  5
 #define IP6_VTI_HASH_SIZE (1 << IP6_VTI_HASH_SIZE_SHIFT)
 
-static u32 HASH(const struct in6_addr *addr1, const struct in6_addr *addr2)
+static u32 HASH(const struct in6_addr *addr1, const struct in6_addr *addr2,
+		__be32 i_key)
 {
-	u32 hash = ipv6_addr_hash(addr1) ^ ipv6_addr_hash(addr2);
+	u32 hash = ipv6_addr_hash(addr1) ^ ipv6_addr_hash(addr2) ^ i_key;
 
 	return hash_32(hash, IP6_VTI_HASH_SIZE_SHIFT);
 }
@@ -78,11 +79,17 @@ struct vti6_net {
 #define for_each_vti6_tunnel_rcu(start) \
 	for (t = rcu_dereference(start); t; t = rcu_dereference(t->next))
 
+static __be32 vti6_get_hash_key(const struct __ip6_tnl_parm *p)
+{
+	return (p->i_flags & GRE_KEY) ? p->i_key : 0;
+}
+
 /**
- * vti6_tnl_lookup - fetch tunnel matching the end-point addresses
+ * vti6_tnl_lookup - fetch tunnel matching the end-point addresses and i_key
  *   @net: network namespace
  *   @remote: the address of the tunnel exit-point
  *   @local: the address of the tunnel entry-point
+ *   @i_key: the i_key of the tunnel
  *
  * Return:
  *   tunnel matching given end-points if found,
@@ -91,9 +98,9 @@ struct vti6_net {
  **/
 static struct ip6_tnl *
 vti6_tnl_lookup(struct net *net, const struct in6_addr *remote,
-		const struct in6_addr *local)
+		const struct in6_addr *local, __be32 i_key)
 {
-	unsigned int hash = HASH(remote, local);
+	unsigned int hash = HASH(remote, local, i_key);
 	struct ip6_tnl *t;
 	struct vti6_net *ip6n = net_generic(net, vti6_net_id);
 	struct in6_addr any;
@@ -101,21 +108,24 @@ vti6_tnl_lookup(struct net *net, const struct in6_addr *remote,
 	for_each_vti6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
 		if (ipv6_addr_equal(local, &t->parms.laddr) &&
 		    ipv6_addr_equal(remote, &t->parms.raddr) &&
+		    vti6_get_hash_key(&t->parms) == i_key &&
 		    (t->dev->flags & IFF_UP))
 			return t;
 	}
 
 	memset(&any, 0, sizeof(any));
-	hash = HASH(&any, local);
+	hash = HASH(&any, local, i_key);
 	for_each_vti6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
 		if (ipv6_addr_equal(local, &t->parms.laddr) &&
+		    vti6_get_hash_key(&t->parms) == i_key &&
 		    (t->dev->flags & IFF_UP))
 			return t;
 	}
 
-	hash = HASH(remote, &any);
+	hash = HASH(remote, &any, i_key);
 	for_each_vti6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
 		if (ipv6_addr_equal(remote, &t->parms.raddr) &&
+		    vti6_get_hash_key(&t->parms) == i_key &&
 		    (t->dev->flags & IFF_UP))
 			return t;
 	}
@@ -147,7 +157,7 @@ vti6_tnl_bucket(struct vti6_net *ip6n, const struct __ip6_tnl_parm *p)
 
 	if (!ipv6_addr_any(remote) || !ipv6_addr_any(local)) {
 		prio = 1;
-		h = HASH(remote, local);
+		h = HASH(remote, local, vti6_get_hash_key(p));
 	}
 	return &ip6n->tnls[prio][h];
 }
@@ -266,7 +276,8 @@ static struct ip6_tnl *vti6_locate(struct net *net, struct __ip6_tnl_parm *p,
 	     (t = rtnl_dereference(*tp)) != NULL;
 	     tp = &t->next) {
 		if (ipv6_addr_equal(local, &t->parms.laddr) &&
-		    ipv6_addr_equal(remote, &t->parms.raddr)) {
+		    ipv6_addr_equal(remote, &t->parms.raddr) &&
+		    vti6_get_hash_key(&t->parms) == vti6_get_hash_key(p)) {
 			if (create)
 				return NULL;
 
@@ -304,11 +315,21 @@ vti6_find_tunnel(struct sk_buff *skb, __be32 spi, struct xfrm_state **x)
 	struct net *net = dev_net(skb->dev);
 	struct ip6_tnl *t;
 
-	t = vti6_tnl_lookup(net, &ipv6h->saddr, &ipv6h->daddr);
+	t = vti6_tnl_lookup(net, &ipv6h->saddr, &ipv6h->daddr, 0);
 	if (t) {
 		*x = xfrm_state_lookup(net, be32_to_cpu(t->parms.i_key),
 				       (xfrm_address_t *)&ipv6h->daddr,
 				       spi, ipv6h->nexthdr, AF_INET6);
+	} else {
+		*x = xfrm_state_lookup_loose(net, skb->mark,
+					     (xfrm_address_t *) &ipv6h->daddr,
+					     spi, ipv6h->nexthdr, AF_INET6);
+		if (!*x)
+			return NULL;
+		t =  vti6_tnl_lookup(net, &ipv6h->saddr, &ipv6h->daddr,
+				     cpu_to_be32((*x)->mark.v));
+		if (!t)
+			xfrm_state_put(*x);
 	}
 
 	return t;
@@ -613,7 +634,6 @@ static int vti6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 		    u8 type, u8 code, int offset, __be32 info)
 {
 	__be32 spi;
-	__u32 mark;
 	struct xfrm_state *x;
 	struct ip6_tnl *t;
 	struct ip_esp_hdr *esph;
@@ -623,12 +643,6 @@ static int vti6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	const struct ipv6hdr *iph = (const struct ipv6hdr *)skb->data;
 	int protocol = iph->nexthdr;
 
-	t = vti6_tnl_lookup(dev_net(skb->dev), &iph->daddr, &iph->saddr);
-	if (!t)
-		return -1;
-
-	mark = be32_to_cpu(t->parms.o_key);
-
 	switch (protocol) {
 	case IPPROTO_ESP:
 		esph = (struct ip_esp_hdr *)(skb->data + offset);
@@ -650,19 +664,35 @@ static int vti6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	    type != NDISC_REDIRECT)
 		return 0;
 
-	x = xfrm_state_lookup(net, mark, (const xfrm_address_t *)&iph->daddr,
-			      spi, protocol, AF_INET6);
-	if (!x)
-		return 0;
+	t = vti6_tnl_lookup(net, &iph->daddr, &iph->saddr, 0);
+	if (t) {
+		x = xfrm_state_lookup(net, be32_to_cpu(t->parms.o_key),
+				      (xfrm_address_t *)&iph->daddr,
+				      spi, protocol, AF_INET6);
+	} else {
+		x = xfrm_state_lookup_loose(net, skb->mark,
+					    (xfrm_address_t *) &iph->daddr,
+					    spi, protocol, AF_INET6);
+		if (!x)
+			goto out;
+		t = vti6_tnl_lookup(net, &iph->daddr, &iph->saddr,
+				    cpu_to_be32(x->mark.v));
+	}
+
+	if (!t || !x)
+		goto out;
 
 	if (type == NDISC_REDIRECT)
 		ip6_redirect(skb, net, skb->dev->ifindex, 0,
 			     sock_net_uid(net, NULL));
 	else
 		ip6_update_pmtu(skb, net, info, 0, 0, sock_net_uid(net, NULL));
-	xfrm_state_put(x);
 
-	return 0;
+out:
+	if (x)
+		xfrm_state_put(x);
+
+	return t ? 0 : -1;
 }
 
 static void vti6_link_config(struct ip6_tnl *t)
@@ -957,9 +987,21 @@ static int vti6_validate(struct nlattr *tb[], struct nlattr *data[],
 	return 0;
 }
 
+static __be16 vti_flags_to_tnl_flags(__be16 i_flags)
+{
+	return VTI_ISVTI | ((i_flags & VTI_KEYED) ? GRE_KEY : 0);
+}
+
+static __be16 tnl_flags_to_vti_flags(__be16 i_flags)
+{
+	return VTI_ISVTI | ((i_flags & GRE_KEY) ? VTI_KEYED : 0);
+}
+
 static void vti6_netlink_parms(struct nlattr *data[],
 			       struct __ip6_tnl_parm *parms)
 {
+	__be16 i_flags = 0;
+
 	memset(parms, 0, sizeof(*parms));
 
 	if (!data)
@@ -982,6 +1024,11 @@ static void vti6_netlink_parms(struct nlattr *data[],
 
 	if (data[IFLA_VTI_FWMARK])
 		parms->fwmark = nla_get_u32(data[IFLA_VTI_FWMARK]);
+
+	if (data[IFLA_VTI_IFLAGS])
+		i_flags = nla_get_be16(data[IFLA_VTI_IFLAGS]);
+
+	parms->i_flags = vti_flags_to_tnl_flags(i_flags);
 }
 
 static int vti6_newlink(struct net *src_net, struct net_device *dev,
@@ -1051,6 +1098,8 @@ static size_t vti6_get_size(const struct net_device *dev)
 		nla_total_size(4) +
 		/* IFLA_VTI_FWMARK */
 		nla_total_size(4) +
+		/* IFLA_VTI_IFLAGS */
+		nla_total_size(2) +
 		0;
 }
 
@@ -1064,7 +1113,9 @@ static int vti6_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	    nla_put_in6_addr(skb, IFLA_VTI_REMOTE, &parm->raddr) ||
 	    nla_put_be32(skb, IFLA_VTI_IKEY, parm->i_key) ||
 	    nla_put_be32(skb, IFLA_VTI_OKEY, parm->o_key) ||
-	    nla_put_u32(skb, IFLA_VTI_FWMARK, parm->fwmark))
+	    nla_put_u32(skb, IFLA_VTI_FWMARK, parm->fwmark) ||
+	    nla_put_be16(skb, IFLA_VTI_IFLAGS,
+			 tnl_flags_to_vti_flags(parm->i_flags)))
 		goto nla_put_failure;
 	return 0;
 
@@ -1079,6 +1130,7 @@ static const struct nla_policy vti6_policy[IFLA_VTI_MAX + 1] = {
 	[IFLA_VTI_IKEY]		= { .type = NLA_U32 },
 	[IFLA_VTI_OKEY]		= { .type = NLA_U32 },
 	[IFLA_VTI_FWMARK]	= { .type = NLA_U32 },
+	[IFLA_VTI_IFLAGS]	= { .type = NLA_U16 },
 };
 
 static struct rtnl_link_ops vti6_link_ops __read_mostly = {
-- 
2.15.1.504.g5279b80103-goog

^ permalink raw reply related
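A userspace sketch of the key-aware bucket selection introduced in the ip6_vti patch above. `hash_32()` mirrors the generic multiplicative hash in include/linux/hash.h, `addr_fold()` is a simplified stand-in for `ipv6_addr_hash()`, and `SKETCH_KEY_FLAG` is a hypothetical stand-in for `GRE_KEY`. Byte order is ignored here, whereas the kernel XORs the `__be32` key as-is. The point it illustrates: because `vti6_get_hash_key()` only lets `i_key` into the hash when the keyed flag is set, an unkeyed tunnel (whose `i_key` still carries the xfrm mark) stays in the bucket that the first-stage `vti6_tnl_lookup(..., 0)` searches.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as IP6_VTI_HASH_SIZE_SHIFT: 1 << 5 = 32 buckets. */
#define HASH_SIZE_SHIFT 5

/* Hypothetical stand-in for the GRE_KEY bit in i_flags. */
#define SKETCH_KEY_FLAG 0x2000u

/* Mirrors the generic hash_32(): multiply by the 32-bit golden-ratio
 * constant and keep the top `bits` bits. */
static uint32_t hash_32(uint32_t val, unsigned int bits)
{
	assert(bits > 0 && bits < 32); /* a shift by 32 would be undefined */
	return (val * 0x61C88647u) >> (32 - bits);
}

/* Simplified stand-in for ipv6_addr_hash(): XOR-fold the four 32-bit
 * words of a 128-bit address. */
static uint32_t addr_fold(const uint32_t a[4])
{
	return a[0] ^ a[1] ^ a[2] ^ a[3];
}

/* Like vti6_get_hash_key(): only keyed tunnels contribute i_key. */
static uint32_t hash_key(uint16_t i_flags, uint32_t i_key)
{
	return (i_flags & SKETCH_KEY_FLAG) ? i_key : 0;
}

/* Like HASH(remote, local, i_key) combined with vti6_get_hash_key(). */
static uint32_t bucket(const uint32_t remote[4], const uint32_t local[4],
		       uint16_t i_flags, uint32_t i_key)
{
	uint32_t h = addr_fold(remote) ^ addr_fold(local) ^
		     hash_key(i_flags, i_key);

	return hash_32(h, HASH_SIZE_SHIFT);
}
```

With all-zero addresses and key 1, the keyed tunnel lands in bucket 12 while the unkeyed one stays in bucket 0, exactly where a lookup with key 0 looks.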

* [RFC ipsec-next 4/4] net: xfrm: don't pass tunnel objects to xfrm6_rcv_spi.
From: Lorenzo Colitti @ 2017-12-18 16:16 UTC (permalink / raw)
  To: netdev; +Cc: steffen.klassert, subashab, nharold, Lorenzo Colitti
In-Reply-To: <20171218161656.40618-1-lorenzo@google.com>

This change removes the tunnel parameter from xfrm6_rcv_spi and
deletes xfrm6_rcv_tnl. These were only used by the VTI code and
are now unused.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
---
 include/net/xfrm.h      |  4 +---
 net/ipv4/ip_vti.c       |  4 ++--
 net/ipv6/ip6_vti.c      |  2 +-
 net/ipv6/xfrm6_input.c  | 13 +++----------
 net/ipv6/xfrm6_tunnel.c |  2 +-
 5 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 3d245f2f6f6c..fc19dda73c50 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1638,10 +1638,8 @@ int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler, unsigned short family);
 void xfrm4_local_error(struct sk_buff *skb, u32 mtu);
 int xfrm6_extract_header(struct sk_buff *skb);
 int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb);
-int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi,
-		  struct ip6_tnl *t);
+int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi);
 int xfrm6_transport_finish(struct sk_buff *skb, int async);
-int xfrm6_rcv_tnl(struct sk_buff *skb, struct ip6_tnl *t);
 int xfrm6_rcv(struct sk_buff *skb);
 int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
 		     xfrm_address_t *saddr, u8 proto);
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index f5793782c418..144ec34fd975 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -384,12 +384,12 @@ static int vti4_err(struct sk_buff *skb, u32 info)
 
 static __be16 vti_flags_to_tnl_flags(__be16 i_flags)
 {
-	return VTI_ISVTI | ((i_flags & VTI_KEYED) ? GRE_KEY : 0);
+	return VTI_ISVTI | ((i_flags & VTI_KEYED) ? TUNNEL_KEY : 0);
 }
 
 static __be16 tnl_flags_to_vti_flags(__be16 i_flags)
 {
-	return VTI_ISVTI | ((i_flags & GRE_KEY) ? VTI_KEYED : 0);
+	return VTI_ISVTI | ((i_flags & TUNNEL_KEY) ? VTI_KEYED : 0);
 }
 
 static int
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 3797738c828f..3a68b7ba1b9c 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -391,7 +391,7 @@ static int vti6_rcv(struct sk_buff *skb)
 	int nexthdr = skb_network_header(skb)[IP6CB(skb)->nhoff];
 
 	XFRM_TUNNEL_SKB_CB(skb)->tunnel.lookup = vti6_lookup;
-	return xfrm6_rcv_spi(skb, nexthdr, 0, NULL);
+	return xfrm6_rcv_spi(skb, nexthdr, 0);
 }
 
 static int vti6_rcv_cb(struct sk_buff *skb, int err)
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 6d1b734fef8d..5f20e309263f 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -22,8 +22,7 @@ int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb)
 	return xfrm6_extract_header(skb);
 }
 
-int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi,
-		  struct ip6_tnl *t)
+int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi)
 {
 	XFRM_SPI_SKB_CB(skb)->family = AF_INET6;
 	XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct ipv6hdr, daddr);
@@ -59,16 +58,10 @@ int xfrm6_transport_finish(struct sk_buff *skb, int async)
 	return -1;
 }
 
-int xfrm6_rcv_tnl(struct sk_buff *skb, struct ip6_tnl *t)
-{
-	return xfrm6_rcv_spi(skb, skb_network_header(skb)[IP6CB(skb)->nhoff],
-			     0, t);
-}
-EXPORT_SYMBOL(xfrm6_rcv_tnl);
-
 int xfrm6_rcv(struct sk_buff *skb)
 {
-	return xfrm6_rcv_tnl(skb, NULL);
+	return xfrm6_rcv_spi(skb, skb_network_header(skb)[IP6CB(skb)->nhoff],
+			     0);
 }
 EXPORT_SYMBOL(xfrm6_rcv);
 int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
diff --git a/net/ipv6/xfrm6_tunnel.c b/net/ipv6/xfrm6_tunnel.c
index f85f0d7480ac..02161543a932 100644
--- a/net/ipv6/xfrm6_tunnel.c
+++ b/net/ipv6/xfrm6_tunnel.c
@@ -236,7 +236,7 @@ static int xfrm6_tunnel_rcv(struct sk_buff *skb)
 	__be32 spi;
 
 	spi = xfrm6_tunnel_spi_lookup(net, (const xfrm_address_t *)&iph->saddr);
-	return xfrm6_rcv_spi(skb, IPPROTO_IPV6, spi, NULL);
+	return xfrm6_rcv_spi(skb, IPPROTO_IPV6, spi);
 }
 
 static int xfrm6_tunnel_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
-- 
2.15.1.504.g5279b80103-goog

^ permalink raw reply related
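Patch 4/4 also switches the IPv4 helpers from `GRE_KEY` to `TUNNEL_KEY`. A minimal host-order sketch of what the two translation helpers do; the bit values are illustrative placeholders, not the kernel's definitions (the real flags are `__be16`). The netlink-facing `IFLA_VTI_IFLAGS` value and the internal `i_flags` differ only in which bit means "keyed", the ISVTI bit is always set, and any other bits are dropped in both directions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative placeholders for VTI_ISVTI, VTI_KEYED and TUNNEL_KEY. */
#define SK_VTI_ISVTI  0x0001u
#define SK_VTI_KEYED  0x0002u /* hypothetical value */
#define SK_TUNNEL_KEY 0x0004u /* hypothetical value */

/* netlink (uapi) flags -> internal tunnel flags,
 * like vti_flags_to_tnl_flags() */
static uint16_t vti_to_tnl(uint16_t i_flags)
{
	return SK_VTI_ISVTI | ((i_flags & SK_VTI_KEYED) ? SK_TUNNEL_KEY : 0);
}

/* internal tunnel flags -> netlink (uapi) flags,
 * like tnl_flags_to_vti_flags() */
static uint16_t tnl_to_vti(uint16_t i_flags)
{
	return SK_VTI_ISVTI | ((i_flags & SK_TUNNEL_KEY) ? SK_VTI_KEYED : 0);
}
```

Round-tripping a keyed setting yields ISVTI | KEYED, so only the keyed bit survives translation.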

* [PATCH 4.14 121/178] l2tp: cleanup l2tp_tunnel_delete calls
From: Greg Kroah-Hartman @ 2017-12-18 15:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Greg Kroah-Hartman, stable, Jiri Slaby, Sabrina Dubroca,
	Guillaume Nault, David S. Miller, netdev, Sasha Levin
In-Reply-To: <20171218152920.567991776@linuxfoundation.org>

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Jiri Slaby <jslaby@suse.cz>


[ Upstream commit 4dc12ffeaeac939097a3f55c881d3dc3523dff0c ]

l2tp_tunnel_delete does not return anything since commit 62b982eeb458
("l2tp: fix race condition in l2tp_tunnel_delete").  But call sites of
l2tp_tunnel_delete still do casts to void to avoid unused return value
warnings.

Kill these now useless casts.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: Guillaume Nault <g.nault@alphalink.fr>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Acked-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/l2tp/l2tp_core.c    |    2 +-
 net/l2tp/l2tp_netlink.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1891,7 +1891,7 @@ static __net_exit void l2tp_exit_net(str
 
 	rcu_read_lock_bh();
 	list_for_each_entry_rcu(tunnel, &pn->l2tp_tunnel_list, list) {
-		(void)l2tp_tunnel_delete(tunnel);
+		l2tp_tunnel_delete(tunnel);
 	}
 	rcu_read_unlock_bh();
 
--- a/net/l2tp/l2tp_netlink.c
+++ b/net/l2tp/l2tp_netlink.c
@@ -282,7 +282,7 @@ static int l2tp_nl_cmd_tunnel_delete(str
 	l2tp_tunnel_notify(&l2tp_nl_family, info,
 			   tunnel, L2TP_CMD_TUNNEL_DELETE);
 
-	(void) l2tp_tunnel_delete(tunnel);
+	l2tp_tunnel_delete(tunnel);
 
 	l2tp_tunnel_dec_refcount(tunnel);
 

^ permalink raw reply

* Re: BUG: unable to handle kernel NULL pointer dereference in rds_send_xmit
From: Santosh Shilimkar @ 2017-12-18 16:28 UTC (permalink / raw)
  To: syzbot, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	rds-devel-N0ozoZBvEnrZJqsBc5GL+g,
	syzkaller-bugs-/JYPxA39Uh5TLH3MbocFFw
In-Reply-To: <001a1145ac5480242305609956b3-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

On 12/18/2017 12:43 AM, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on 
> 6084b576dca2e898f5c101baef151f7bfdbb606d
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> 
> Unfortunately, I don't have any reproducer for this bug yet.
> 
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> program syz-executor6 is using a deprecated SCSI ioctl, please convert 
> it to SG_IO
> IP: rds_send_xmit+0x80/0x930 net/rds/send.c:186

Looks like another one tripping on an empty transport. The change below
should most likely address it, but we will test to confirm that it does.

diff --git a/net/rds/send.c b/net/rds/send.c
index 7244d2e..e2d0eaa 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -183,7 +183,7 @@ int rds_send_xmit(struct rds_conn_path *cp)
                 goto out;
         }

-       if (conn->c_trans->xmit_path_prepare)
+       if (conn->c_trans && conn->c_trans->xmit_path_prepare)
                 conn->c_trans->xmit_path_prepare(cp);



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related
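The proposed guard relies on `&&` short-circuit evaluation: `conn->c_trans` is tested before `->xmit_path_prepare` is read, so a connection with no transport skips the optional hook instead of faulting. A pared-down userspace model; the `sk_*` types are hypothetical stand-ins for `struct rds_connection` and `struct rds_transport`, and, like the one-line diff, it covers only this one call site.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rds_transport: one optional hook. */
struct sk_transport {
	void (*xmit_path_prepare)(int *calls);
};

/* Hypothetical stand-in for struct rds_connection. */
struct sk_conn {
	const struct sk_transport *c_trans; /* may be NULL */
};

/* Runs the optional hook if both the transport and the hook exist.
 * Returns 1 if the hook ran, 0 otherwise. */
static int prepare_if_possible(const struct sk_conn *conn, int *calls)
{
	/* && short-circuits: c_trans is checked before it is dereferenced */
	if (conn->c_trans && conn->c_trans->xmit_path_prepare) {
		conn->c_trans->xmit_path_prepare(calls);
		return 1;
	}
	return 0;
}

/* Example hook that just counts invocations. */
static void count_call(int *calls)
{
	(*calls)++;
}
```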

* [net  1/1] tipc: fix lost member events bug
From: Jon Maloy @ 2017-12-18 16:34 UTC (permalink / raw)
  To: davem, netdev
  Cc: tipc-discussion, hoang.h.le, mohan.krishna.ghanta.krishnamurthy

Group messages are not supposed to be returned to sender when the
destination socket disappears. This is done correctly for regular
traffic messages, by setting the 'dest_droppable' bit in the header.
But we forgot to do that in group protocol messages. This has the effect
that such messages may sometimes bounce back to the sender, be perceived
as a legitimate peer message, and wreak general havoc for the rest of
the session. In particular, we have seen that a member in state LEAVING
may go back to state RECLAIMED or REMITTED, hence causing suppression
of an otherwise expected 'member down' event to the user.

We fix this by setting the 'dest_droppable' bit even in group protocol
messages.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/group.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/tipc/group.c b/net/tipc/group.c
index 95fec2c..efb5714 100644
--- a/net/tipc/group.c
+++ b/net/tipc/group.c
@@ -648,6 +648,7 @@ static void tipc_group_proto_xmit(struct tipc_group *grp, struct tipc_member *m,
 	} else if (mtyp == GRP_REMIT_MSG) {
 		msg_set_grp_remitted(hdr, m->window);
 	}
+	msg_set_dest_droppable(hdr, true);
 	__skb_queue_tail(xmitq, skb);
 }

-- 
2.1.4


^ permalink raw reply related
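A simplified model of the behaviour the TIPC commit message describes: a message whose destination socket has disappeared bounces back to the sender unless the 'dest_droppable' bit is set, in which case it is dropped. The bit position `SK_DEST_DROPPABLE_BIT` and the `sk_*`/helper names are hypothetical (the real accessors such as `msg_set_dest_droppable()` live in net/tipc/msg.h); only the set-then-check logic is meant to carry over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position within header word 0. */
#define SK_DEST_DROPPABLE_BIT 19

static void set_dest_droppable(uint32_t *w0, bool d)
{
	if (d)
		*w0 |= UINT32_C(1) << SK_DEST_DROPPABLE_BIT;
	else
		*w0 &= ~(UINT32_C(1) << SK_DEST_DROPPABLE_BIT);
}

static bool dest_droppable(uint32_t w0)
{
	return (w0 >> SK_DEST_DROPPABLE_BIT) & 1u;
}

enum sk_verdict { SK_DELIVER, SK_DROP, SK_BOUNCE };

/* Receiver-side decision: deliver if the socket exists, otherwise
 * drop droppable messages and bounce everything else to the sender. */
static enum sk_verdict on_receive(uint32_t w0, bool dest_exists)
{
	if (dest_exists)
		return SK_DELIVER;
	return dest_droppable(w0) ? SK_DROP : SK_BOUNCE;
}
```

Before the fix, group protocol messages took the SK_BOUNCE path and could be misread by the sender as a peer message; setting the bit moves them to SK_DROP.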

* Re: [PATCH v2 0/5] Support for generalized use of make C={1,2} via a wrapper program
From: Knut Omang @ 2017-12-18 16:41 UTC (permalink / raw)
  To: Joe Perches, Jason Gunthorpe
  Cc: Stephen Hemminger, linux-kernel, Mauro Carvalho Chehab,
	Nicolas Palix, Jonathan Corbet, Santosh Shilimkar, Matthew Wilcox,
	cocci, rds-devel, linux-rdma, linux-doc, Doug Ledford,
	Mickaël Salaün, Shuah Khan, linux-kbuild, Michal Marek,
	Julia Lawall, John Haxby, Åsmund Østvold,
	Masahiro Yamada
In-Reply-To: <1513611003.31581.71.camel@perches.com>

On Mon, 2017-12-18 at 07:30 -0800, Joe Perches wrote:
> On Mon, 2017-12-18 at 14:05 +0100, Knut Omang wrote:
> > > Here is a list of the checkpatch messages for drivers/infiniband
> > > sorted by type.
> > > 
> > > Many of these might be corrected by using
> > > 
> > > $ ./scripts/checkpatch.pl -f --fix-inplace --types=<TYPE> \
> > >   $(git ls-files drivers/infiniband/)
> > 
> > Yes, and I already did that work piece by piece for individual types,
> > just to test the runchecks tool, and want to post that set once the 
> > runchecks script and Makefile changes itself are in,
> 
> I think those are independent of any runcheck tool changes and
> could be posted now.  In general, don't keep patches in a local
> tree waiting on some other unrelated patch.

It becomes related in that the runchecks.cfg file is updated
in all the patches to keep 'make C=2' running with 0 errors while
enabling more checks. I think they serve well as examples of
how a workflow with runchecks could look.

> Just fyi:
> 
> There is a script that helps automate checkpatch "by-type" conversions
> with compilation, .o difference checking, and git commit editing.
> 
> https://lkml.org/lkml/2014/7/11/794

oh - good to know - seems it would have been a good help
during my little exercise..

Thanks,
Knut

^ permalink raw reply

* Re: BUG: spinlock bad magic (2)
From: Santosh Shilimkar @ 2017-12-18 16:46 UTC (permalink / raw)
  To: syzbot, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	rds-devel-N0ozoZBvEnrZJqsBc5GL+g,
	syzkaller-bugs-/JYPxA39Uh5TLH3MbocFFw
In-Reply-To: <001a113fae28c2fd6605609c97a2-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

On 12/18/2017 4:36 AM, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on 
> 6084b576dca2e898f5c101baef151f7bfdbb606d
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> 
> Unfortunately, I don't have any reproducer for this bug yet.
> 
[...]

> BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> IP: rds_send_xmit+0x80/0x930 net/rds/send.c:186

This one seems to be the same bug as the one reported below:

BUG: unable to handle kernel NULL pointer dereference in rds_send_xmit

^ permalink raw reply

* Re: [PATCH 3/3] trace: print address if symbol not found
From: Steven Rostedt @ 2017-12-18 16:49 UTC (permalink / raw)
  To: Tobin C. Harding
  Cc: kernel-hardening, Tycho Andersen, Linus Torvalds, Kees Cook,
	Andrew Morton, Daniel Borkmann, Masahiro Yamada,
	Alexei Starovoitov, linux-kernel, Network Development
In-Reply-To: <1513554812-13014-4-git-send-email-me@tobin.cc>

On Mon, 18 Dec 2017 10:53:32 +1100
"Tobin C. Harding" <me@tobin.cc> wrote:

> Fixes behaviour modified by: commit bd6b239cdbb2 ("kallsyms: don't leak
> address when symbol not found")
> 
> Previous patch changed behaviour of kallsyms function sprint_symbol() to
> return an error code instead of printing the address if a symbol was not
> found. Ftrace relies on the original behaviour. We should not break
> tracing when applying the previous patch. We can maintain the original
> behaviour by checking the return code on calls to sprint_symbol() and
> friends.
> 
> Check return code and print actual address on error (i.e. symbol not
> found).
> 
> Signed-off-by: Tobin C. Harding <me@tobin.cc>
> ---
>  kernel/trace/trace.h             | 24 ++++++++++++++++++++++++
>  kernel/trace/trace_events_hist.c |  6 +++---
>  2 files changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 2a6d0325a761..881b1a577d75 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -1814,4 +1814,28 @@ static inline void trace_event_eval_update(struct trace_eval_map **map, int len)
>  
>  extern struct trace_iterator *tracepoint_print_iter;
>  
> +static inline int
> +trace_sprint_symbol(char *buffer, unsigned long address)
> +{
> +	int ret;
> +
> +	ret = sprint_symbol(buffer, address);
> +	if (ret == -1)
> +		ret = sprintf(buffer, "0x%lx", address);
> +
> +	return ret;
> +}
> +
> +static inline int
> +trace_sprint_symbol_no_offset(char *buffer, unsigned long address)
> +{
> +	int ret;
> +
> +	ret = sprint_symbol_no_offset(buffer, address);
> +	if (ret == -1)
> +		ret = sprintf(buffer, "0x%lx", address);
> +
> +	return ret;
> +}
> +
>  #endif /* _LINUX_KERNEL_TRACE_H */
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index 1e1558c99d56..3e28522a76f4 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -982,7 +982,7 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
>  			return;
>  
>  		seq_printf(m, "%*c", 1 + spaces, ' ');
> -		sprint_symbol(str, stacktrace_entries[i]);
> +		trace_sprint_symbol_addr(str, stacktrace_entries[i]);

Hmm, where is trace_sprint_symbol_addr() defined?

-- Steve

>  		seq_printf(m, "%s\n", str);
>  	}
>  }
> @@ -1014,12 +1014,12 @@ hist_trigger_entry_print(struct seq_file *m,
>  			seq_printf(m, "%s: %llx", field_name, uval);
>  		} else if (key_field->flags & HIST_FIELD_FL_SYM) {
>  			uval = *(u64 *)(key + key_field->offset);
> -			sprint_symbol_no_offset(str, uval);
> +			trace_sprint_symbol_no_offset(str, uval);
>  			seq_printf(m, "%s: [%llx] %-45s", field_name,
>  				   uval, str);
>  		} else if (key_field->flags & HIST_FIELD_FL_SYM_OFFSET) {
>  			uval = *(u64 *)(key + key_field->offset);
> -			sprint_symbol(str, uval);
> +			trace_sprint_symbol(str, uval);
>  			seq_printf(m, "%s: [%llx] %-55s", field_name,
>  				   uval, str);
>  		} else if (key_field->flags & HIST_FIELD_FL_EXECNAME) {

^ permalink raw reply
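The fallback pattern in the quoted `trace_sprint_symbol()` helper can be sketched in userspace. The two-entry table and `sk_sprint_symbol()` are hypothetical stand-ins for kallsyms, modelling the behaviour proposed earlier in the series where `sprint_symbol()` returns -1 when no symbol is found; the output format only approximates the kernel's `name+off/size`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical symbol table standing in for kallsyms. */
struct sk_sym {
	unsigned long addr;
	unsigned long size;
	const char *name;
};

static const struct sk_sym sk_syms[] = {
	{ 0x1000, 0x100, "do_work" },
	{ 0x2000, 0x80,  "do_more" },
};

/* Models the proposed sprint_symbol(): -1 when no symbol covers the
 * address, otherwise the number of characters written. */
static int sk_sprint_symbol(char *buf, unsigned long address)
{
	for (size_t i = 0; i < sizeof(sk_syms) / sizeof(sk_syms[0]); i++) {
		const struct sk_sym *s = &sk_syms[i];

		if (address >= s->addr && address < s->addr + s->size)
			return sprintf(buf, "%s+0x%lx/0x%lx", s->name,
				       address - s->addr, s->size);
	}
	return -1;
}

/* Same shape as trace_sprint_symbol(): fall back to the raw address. */
static int sk_trace_sprint_symbol(char *buf, unsigned long address)
{
	int ret = sk_sprint_symbol(buf, address);

	if (ret == -1)
		ret = sprintf(buf, "0x%lx", address);
	return ret;
}
```

Looking up 0x1010 prints `do_work+0x10/0x100`, while 0x3000, which no entry covers, falls back to `0x3000`.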

* [PATCH net-next 0/6] sfc: Initial X2000-series (Medford2) support
From: Edward Cree @ 2017-12-18 16:54 UTC (permalink / raw)
  To: linux-net-drivers, davem; +Cc: netdev

Basic PCI-level changes to support X2000-series NICs.
Also fix unexpected-PTP-event log messages, since the timestamp format has
 been changed in these NICs and that causes us to fail to probe PTP (but we
 still get the PPS events).

Bert Kenward (2):
  sfc: update EF10 register definitions
  sfc: populate the timer reload field

Edward Cree (4):
  sfc: make mem_bar a function rather than a constant
  sfc: support VI strides other than 8k
  sfc: add Medford2 (SFC9250) PCI Device IDs
  sfc: improve PTP error reporting

 drivers/net/ethernet/sfc/ef10.c       | 126 +++++++++++++++++++++++++---------
 drivers/net/ethernet/sfc/ef10_regs.h  |  46 ++++++++-----
 drivers/net/ethernet/sfc/efx.c        |  10 ++-
 drivers/net/ethernet/sfc/efx.h        |   5 --
 drivers/net/ethernet/sfc/io.h         |  19 ++---
 drivers/net/ethernet/sfc/mcdi.h       |   3 +
 drivers/net/ethernet/sfc/net_driver.h |   7 +-
 drivers/net/ethernet/sfc/ptp.c        |   4 +-
 drivers/net/ethernet/sfc/siena.c      |  10 ++-
 9 files changed, 162 insertions(+), 68 deletions(-)

^ permalink raw reply

* [PATCH net-next 1/6] sfc: make mem_bar a function rather than a constant
From: Edward Cree @ 2017-12-18 16:55 UTC (permalink / raw)
  To: linux-net-drivers, davem; +Cc: netdev
In-Reply-To: <f9e1279b-03d0-729c-2518-c1e204444447@solarflare.com>

Support using BAR 0 on SFC9250, even though the driver doesn't bind to such
 devices yet.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/ef10.c       | 26 +++++++++++++++++++++++---
 drivers/net/ethernet/sfc/efx.c        |  4 ++--
 drivers/net/ethernet/sfc/efx.h        |  5 -----
 drivers/net/ethernet/sfc/net_driver.h |  2 +-
 drivers/net/ethernet/sfc/siena.c      | 10 +++++++++-
 5 files changed, 35 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index e566dbb3343d..5cc786aec7c4 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -160,11 +160,31 @@ static int efx_ef10_get_warm_boot_count(struct efx_nic *efx)
 		EFX_DWORD_FIELD(reg, EFX_WORD_0) : -EIO;
 }
 
+/* On all EF10s up to and including SFC9220 (Medford1), all PFs use BAR 0 for
+ * I/O space and BAR 2(&3) for memory.  On SFC9250 (Medford2), there is no I/O
+ * BAR; PFs use BAR 0/1 for memory.
+ */
+static unsigned int efx_ef10_pf_mem_bar(struct efx_nic *efx)
+{
+	switch (efx->pci_dev->device) {
+	case 0x0b03: /* SFC9250 PF */
+		return 0;
+	default:
+		return 2;
+	}
+}
+
+/* All VFs use BAR 0/1 for memory */
+static unsigned int efx_ef10_vf_mem_bar(struct efx_nic *efx)
+{
+	return 0;
+}
+
 static unsigned int efx_ef10_mem_map_size(struct efx_nic *efx)
 {
 	int bar;
 
-	bar = efx->type->mem_bar;
+	bar = efx->type->mem_bar(efx);
 	return resource_size(&efx->pci_dev->resource[bar]);
 }
 
@@ -6392,7 +6412,7 @@ static int efx_ef10_udp_tnl_del_port(struct efx_nic *efx,
 
 const struct efx_nic_type efx_hunt_a0_vf_nic_type = {
 	.is_vf = true,
-	.mem_bar = EFX_MEM_VF_BAR,
+	.mem_bar = efx_ef10_vf_mem_bar,
 	.mem_map_size = efx_ef10_mem_map_size,
 	.probe = efx_ef10_probe_vf,
 	.remove = efx_ef10_remove,
@@ -6500,7 +6520,7 @@ const struct efx_nic_type efx_hunt_a0_vf_nic_type = {
 
 const struct efx_nic_type efx_hunt_a0_nic_type = {
 	.is_vf = false,
-	.mem_bar = EFX_MEM_BAR,
+	.mem_bar = efx_ef10_pf_mem_bar,
 	.mem_map_size = efx_ef10_mem_map_size,
 	.probe = efx_ef10_probe_pf,
 	.remove = efx_ef10_remove,
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index e3c492fcaff0..bbe4ace7dd9d 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -1248,7 +1248,7 @@ static int efx_init_io(struct efx_nic *efx)
 
 	netif_dbg(efx, probe, efx->net_dev, "initialising I/O\n");
 
-	bar = efx->type->mem_bar;
+	bar = efx->type->mem_bar(efx);
 
 	rc = pci_enable_device(pci_dev);
 	if (rc) {
@@ -1323,7 +1323,7 @@ static void efx_fini_io(struct efx_nic *efx)
 	}
 
 	if (efx->membase_phys) {
-		bar = efx->type->mem_bar;
+		bar = efx->type->mem_bar(efx);
 		pci_release_region(efx->pci_dev, bar);
 		efx->membase_phys = 0;
 	}
diff --git a/drivers/net/ethernet/sfc/efx.h b/drivers/net/ethernet/sfc/efx.h
index 52c84b782901..16da3e9a6000 100644
--- a/drivers/net/ethernet/sfc/efx.h
+++ b/drivers/net/ethernet/sfc/efx.h
@@ -14,11 +14,6 @@
 #include "net_driver.h"
 #include "filter.h"
 
-/* All controllers use BAR 0 for I/O space and BAR 2(&3) for memory */
-/* All VFs use BAR 0/1 for memory */
-#define EFX_MEM_BAR 2
-#define EFX_MEM_VF_BAR 0
-
 int efx_net_open(struct net_device *net_dev);
 int efx_net_stop(struct net_device *net_dev);
 
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index c0537ea06c9a..2b6599f8d9fa 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -1154,7 +1154,7 @@ struct efx_udp_tunnel {
  */
 struct efx_nic_type {
 	bool is_vf;
-	unsigned int mem_bar;
+	unsigned int (*mem_bar)(struct efx_nic *efx);
 	unsigned int (*mem_map_size)(struct efx_nic *efx);
 	int (*probe)(struct efx_nic *efx);
 	void (*remove)(struct efx_nic *efx);
diff --git a/drivers/net/ethernet/sfc/siena.c b/drivers/net/ethernet/sfc/siena.c
index a617f657eae3..22d49ebb347c 100644
--- a/drivers/net/ethernet/sfc/siena.c
+++ b/drivers/net/ethernet/sfc/siena.c
@@ -242,6 +242,14 @@ static int siena_dimension_resources(struct efx_nic *efx)
 	return 0;
 }
 
+/* On all Falcon-architecture NICs, PFs use BAR 0 for I/O space and BAR 2(&3)
+ * for memory.
+ */
+static unsigned int siena_mem_bar(struct efx_nic *efx)
+{
+	return 2;
+}
+
 static unsigned int siena_mem_map_size(struct efx_nic *efx)
 {
 	return FR_CZ_MC_TREG_SMEM +
@@ -950,7 +958,7 @@ static int siena_mtd_probe(struct efx_nic *efx)
 
 const struct efx_nic_type siena_a0_nic_type = {
 	.is_vf = false,
-	.mem_bar = EFX_MEM_BAR,
+	.mem_bar = siena_mem_bar,
 	.mem_map_size = siena_mem_map_size,
 	.probe = siena_probe_nic,
 	.remove = siena_remove_nic,

^ permalink raw reply related
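Turning `mem_bar` from a constant into a per-type callback lets the BAR depend on the PCI device ID rather than only on the NIC type. A pared-down sketch of the pattern; the `sk_*` structs are hypothetical stand-ins for `struct pci_dev`, `struct efx_nic` and `struct efx_nic_type`, while the 0x0b03 ID and the BAR numbers come from the patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for struct pci_dev and struct efx_nic. */
struct sk_pci_dev { unsigned short device; };
struct sk_nic { const struct sk_pci_dev *pci_dev; };

/* Hypothetical stand-in for struct efx_nic_type. */
struct sk_nic_type {
	int is_vf;
	unsigned int (*mem_bar)(const struct sk_nic *nic);
};

/* Mirrors efx_ef10_pf_mem_bar(): SFC9250 PFs have no I/O BAR. */
static unsigned int pf_mem_bar(const struct sk_nic *nic)
{
	switch (nic->pci_dev->device) {
	case 0x0b03: /* SFC9250 PF */
		return 0;
	default:
		return 2;
	}
}

/* Mirrors efx_ef10_vf_mem_bar(): all VFs use BAR 0/1 for memory. */
static unsigned int vf_mem_bar(const struct sk_nic *nic)
{
	(void)nic; /* silence unused-parameter warnings */
	return 0;
}

static const struct sk_nic_type pf_type = { .is_vf = 0, .mem_bar = pf_mem_bar };
static const struct sk_nic_type vf_type = { .is_vf = 1, .mem_bar = vf_mem_bar };
```

Callers change from reading `efx->type->mem_bar` to calling `efx->type->mem_bar(efx)`, as in the efx.c hunks.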

* [PATCH net-next 2/6] sfc: support VI strides other than 8k
From: Edward Cree @ 2017-12-18 16:56 UTC (permalink / raw)
  To: linux-net-drivers, davem; +Cc: netdev
In-Reply-To: <f9e1279b-03d0-729c-2518-c1e204444447@solarflare.com>

Medford2 can also have 16k or 64k VI stride.  This is reported by MCDI in
 GET_CAPABILITIES, which fortunately is called before the driver does
 anything sensitive to the VI stride (such as accessing or even allocating
 VIs past the zeroth).
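The stride selection and the resulting channel limit can be sketched as below. The `SK_MODE_*` encodings are hypothetical placeholders for the `MC_CMD_GET_CAPABILITIES_V3_OUT_VI_WINDOW_MODE_*` values, and the channel cap, memory-map size and queue-type count are passed in as parameters rather than taken from the driver.

```c
#include <assert.h>

/* Hypothetical encodings of the firmware's VI window mode. */
enum sk_vi_window_mode { SK_MODE_8K, SK_MODE_16K, SK_MODE_64K };

/* Stride in bytes, or 0 for an unrecognised mode
 * (the driver logs an error and returns -EIO in that case). */
static unsigned int vi_stride_from_mode(int mode)
{
	switch (mode) {
	case SK_MODE_8K:
		return 8192;
	case SK_MODE_16K:
		return 16384;
	case SK_MODE_64K:
		return 65536;
	default:
		return 0;
	}
}

/* min(max_ch, map_size / (vi_stride * txq_types)), the same shape as
 * the max_channels computation in efx_ef10_probe(). */
static unsigned int max_channels(unsigned int max_ch, unsigned long map_size,
				 unsigned int vi_stride, unsigned int txq_types)
{
	unsigned long n = map_size / ((unsigned long)vi_stride * txq_types);

	return n < max_ch ? (unsigned int)n : max_ch;
}
```

With a hypothetical 1 MiB memory map, two TX queue types and a cap of 32, an 8K stride would allow 64 channels before the cap applies, whereas a 64K stride allows only 8. The calculation needs `vi_stride` from GET_CAPABILITIES, which is presumably why the patch moves it later in `efx_ef10_probe()`; a result of 0 corresponds to the `WARN_ON(efx->max_channels == 0)` failure path.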

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/ef10.c       | 70 +++++++++++++++++++++++++----------
 drivers/net/ethernet/sfc/efx.c        |  2 +
 drivers/net/ethernet/sfc/io.h         | 19 ++++++----
 drivers/net/ethernet/sfc/mcdi.h       |  3 ++
 drivers/net/ethernet/sfc/net_driver.h |  3 ++
 5 files changed, 70 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 5cc786aec7c4..dcd6be14a430 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -233,7 +233,7 @@ static int efx_ef10_get_vf_index(struct efx_nic *efx)
 
 static int efx_ef10_init_datapath_caps(struct efx_nic *efx)
 {
-	MCDI_DECLARE_BUF(outbuf, MC_CMD_GET_CAPABILITIES_V2_OUT_LEN);
+	MCDI_DECLARE_BUF(outbuf, MC_CMD_GET_CAPABILITIES_V3_OUT_LEN);
 	struct efx_ef10_nic_data *nic_data = efx->nic_data;
 	size_t outlen;
 	int rc;
@@ -277,6 +277,35 @@ static int efx_ef10_init_datapath_caps(struct efx_nic *efx)
 		return -ENODEV;
 	}
 
+	if (outlen >= MC_CMD_GET_CAPABILITIES_V3_OUT_LEN) {
+		u8 vi_window_mode = MCDI_BYTE(outbuf,
+				GET_CAPABILITIES_V3_OUT_VI_WINDOW_MODE);
+
+		switch (vi_window_mode) {
+		case MC_CMD_GET_CAPABILITIES_V3_OUT_VI_WINDOW_MODE_8K:
+			efx->vi_stride = 8192;
+			break;
+		case MC_CMD_GET_CAPABILITIES_V3_OUT_VI_WINDOW_MODE_16K:
+			efx->vi_stride = 16384;
+			break;
+		case MC_CMD_GET_CAPABILITIES_V3_OUT_VI_WINDOW_MODE_64K:
+			efx->vi_stride = 65536;
+			break;
+		default:
+			netif_err(efx, probe, efx->net_dev,
+				  "Unrecognised VI window mode %d\n",
+				  vi_window_mode);
+			return -EIO;
+		}
+		netif_dbg(efx, probe, efx->net_dev, "vi_stride = %u\n",
+			  efx->vi_stride);
+	} else {
+		/* keep default VI stride */
+		netif_dbg(efx, probe, efx->net_dev,
+			  "firmware did not report VI window mode, assuming vi_stride = %u\n",
+			  efx->vi_stride);
+	}
+
 	return 0;
 }
 
@@ -609,17 +638,6 @@ static int efx_ef10_probe(struct efx_nic *efx)
 	struct efx_ef10_nic_data *nic_data;
 	int i, rc;
 
-	/* We can have one VI for each 8K region.  However, until we
-	 * use TX option descriptors we need two TX queues per channel.
-	 */
-	efx->max_channels = min_t(unsigned int,
-				  EFX_MAX_CHANNELS,
-				  efx_ef10_mem_map_size(efx) /
-				  (EFX_VI_PAGE_SIZE * EFX_TXQ_TYPES));
-	efx->max_tx_channels = efx->max_channels;
-	if (WARN_ON(efx->max_channels == 0))
-		return -EIO;
-
 	nic_data = kzalloc(sizeof(*nic_data), GFP_KERNEL);
 	if (!nic_data)
 		return -ENOMEM;
@@ -691,6 +709,20 @@ static int efx_ef10_probe(struct efx_nic *efx)
 	if (rc < 0)
 		goto fail5;
 
+	/* We can have one VI for each vi_stride-byte region.
+	 * However, until we use TX option descriptors we need two TX queues
+	 * per channel.
+	 */
+	efx->max_channels = min_t(unsigned int,
+				  EFX_MAX_CHANNELS,
+				  efx_ef10_mem_map_size(efx) /
+				  (efx->vi_stride * EFX_TXQ_TYPES));
+	efx->max_tx_channels = efx->max_channels;
+	if (WARN_ON(efx->max_channels == 0)) {
+		rc = -EIO;
+		goto fail5;
+	}
+
 	efx->rx_packet_len_offset =
 		ES_DZ_RX_PREFIX_PKTLEN_OFST - ES_DZ_RX_PREFIX_SIZE;
 
@@ -927,7 +959,7 @@ static int efx_ef10_link_piobufs(struct efx_nic *efx)
 			} else {
 				tx_queue->piobuf =
 					nic_data->pio_write_base +
-					index * EFX_VI_PAGE_SIZE + offset;
+					index * efx->vi_stride + offset;
 				tx_queue->piobuf_offset = offset;
 				netif_dbg(efx, probe, efx->net_dev,
 					  "linked VI %u to PIO buffer %u offset %x addr %p\n",
@@ -1273,19 +1305,19 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
 	 * for writing PIO buffers through.
 	 *
 	 * The UC mapping contains (channel_vis - 1) complete VIs and the
-	 * first half of the next VI.  Then the WC mapping begins with
-	 * the second half of this last VI.
+	 * first 4K of the next VI.  Then the WC mapping begins with
+	 * the remainder of this last VI.
 	 */
-	uc_mem_map_size = PAGE_ALIGN((channel_vis - 1) * EFX_VI_PAGE_SIZE +
+	uc_mem_map_size = PAGE_ALIGN((channel_vis - 1) * efx->vi_stride +
 				     ER_DZ_TX_PIOBUF);
 	if (nic_data->n_piobufs) {
 		/* pio_write_vi_base rounds down to give the number of complete
 		 * VIs inside the UC mapping.
 		 */
-		pio_write_vi_base = uc_mem_map_size / EFX_VI_PAGE_SIZE;
+		pio_write_vi_base = uc_mem_map_size / efx->vi_stride;
 		wc_mem_map_size = (PAGE_ALIGN((pio_write_vi_base +
 					       nic_data->n_piobufs) *
-					      EFX_VI_PAGE_SIZE) -
+					      efx->vi_stride) -
 				   uc_mem_map_size);
 		max_vis = pio_write_vi_base + nic_data->n_piobufs;
 	} else {
@@ -1357,7 +1389,7 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
 		nic_data->pio_write_vi_base = pio_write_vi_base;
 		nic_data->pio_write_base =
 			nic_data->wc_membase +
-			(pio_write_vi_base * EFX_VI_PAGE_SIZE + ER_DZ_TX_PIOBUF -
+			(pio_write_vi_base * efx->vi_stride + ER_DZ_TX_PIOBUF -
 			 uc_mem_map_size);
 
 		rc = efx_ef10_link_piobufs(efx);
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index bbe4ace7dd9d..e50049cba50b 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -27,6 +27,7 @@
 #include <net/udp_tunnel.h>
 #include "efx.h"
 #include "nic.h"
+#include "io.h"
 #include "selftest.h"
 #include "sriov.h"
 
@@ -2977,6 +2978,7 @@ static int efx_init_struct(struct efx_nic *efx,
 	efx->rx_packet_ts_offset =
 		efx->type->rx_ts_offset - efx->type->rx_prefix_size;
 	spin_lock_init(&efx->stats_lock);
+	efx->vi_stride = EFX_DEFAULT_VI_STRIDE;
 	mutex_init(&efx->mac_lock);
 	efx->phy_op = &efx_dummy_phy_operations;
 	efx->mdio.dev = net_dev;
diff --git a/drivers/net/ethernet/sfc/io.h b/drivers/net/ethernet/sfc/io.h
index afb94aa2c15e..89563170af52 100644
--- a/drivers/net/ethernet/sfc/io.h
+++ b/drivers/net/ethernet/sfc/io.h
@@ -222,18 +222,21 @@ static inline void efx_reado_table(struct efx_nic *efx, efx_oword_t *value,
 	efx_reado(efx, value, reg + index * sizeof(efx_oword_t));
 }
 
-/* Page size used as step between per-VI registers */
-#define EFX_VI_PAGE_SIZE 0x2000
+/* default VI stride (step between per-VI registers) is 8K */
+#define EFX_DEFAULT_VI_STRIDE 0x2000
 
 /* Calculate offset to page-mapped register */
-#define EFX_PAGED_REG(page, reg) \
-	((page) * EFX_VI_PAGE_SIZE + (reg))
+static inline unsigned int efx_paged_reg(struct efx_nic *efx, unsigned int page,
+					 unsigned int reg)
+{
+	return page * efx->vi_stride + reg;
+}
 
 /* Write the whole of RX_DESC_UPD or TX_DESC_UPD */
 static inline void _efx_writeo_page(struct efx_nic *efx, efx_oword_t *value,
 				    unsigned int reg, unsigned int page)
 {
-	reg = EFX_PAGED_REG(page, reg);
+	reg = efx_paged_reg(efx, page, reg);
 
 	netif_vdbg(efx, hw, efx->net_dev,
 		   "writing register %x with " EFX_OWORD_FMT "\n", reg,
@@ -262,7 +265,7 @@ static inline void
 _efx_writed_page(struct efx_nic *efx, const efx_dword_t *value,
 		 unsigned int reg, unsigned int page)
 {
-	efx_writed(efx, value, EFX_PAGED_REG(page, reg));
+	efx_writed(efx, value, efx_paged_reg(efx, page, reg));
 }
 #define efx_writed_page(efx, value, reg, page)				\
 	_efx_writed_page(efx, value,					\
@@ -288,10 +291,10 @@ static inline void _efx_writed_page_locked(struct efx_nic *efx,
 
 	if (page == 0) {
 		spin_lock_irqsave(&efx->biu_lock, flags);
-		efx_writed(efx, value, EFX_PAGED_REG(page, reg));
+		efx_writed(efx, value, efx_paged_reg(efx, page, reg));
 		spin_unlock_irqrestore(&efx->biu_lock, flags);
 	} else {
-		efx_writed(efx, value, EFX_PAGED_REG(page, reg));
+		efx_writed(efx, value, efx_paged_reg(efx, page, reg));
 	}
 }
 #define efx_writed_page_locked(efx, value, reg, page)			\
diff --git a/drivers/net/ethernet/sfc/mcdi.h b/drivers/net/ethernet/sfc/mcdi.h
index 154ef41d1927..ebd95972ae7b 100644
--- a/drivers/net/ethernet/sfc/mcdi.h
+++ b/drivers/net/ethernet/sfc/mcdi.h
@@ -208,6 +208,9 @@ void efx_mcdi_sensor_event(struct efx_nic *efx, efx_qword_t *ev);
 #define _MCDI_DWORD(_buf, _field)					\
 	((_buf) + (_MCDI_CHECK_ALIGN(MC_CMD_ ## _field ## _OFST, 4) >> 2))
 
+#define MCDI_BYTE(_buf, _field)						\
+	((void)BUILD_BUG_ON_ZERO(MC_CMD_ ## _field ## _LEN != 1),	\
+	 *MCDI_PTR(_buf, _field))
 #define MCDI_WORD(_buf, _field)						\
 	((u16)BUILD_BUG_ON_ZERO(MC_CMD_ ## _field ## _LEN != 2) +	\
 	 le16_to_cpu(*(__force const __le16 *)MCDI_PTR(_buf, _field)))
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 2b6599f8d9fa..2e41f2c39c4a 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -708,6 +708,7 @@ struct vfdi_status;
  * @reset_work: Scheduled reset workitem
  * @membase_phys: Memory BAR value as physical address
  * @membase: Memory BAR value
+ * @vi_stride: step between per-VI registers / memory regions
  * @interrupt_mode: Interrupt mode
  * @timer_quantum_ns: Interrupt timer quantum, in nanoseconds
  * @timer_max_ns: Interrupt timer maximum value, in nanoseconds
@@ -842,6 +843,8 @@ struct efx_nic {
 	resource_size_t membase_phys;
 	void __iomem *membase;
 
+	unsigned int vi_stride;
+
 	enum efx_int_mode interrupt_mode;
 	unsigned int timer_quantum_ns;
 	unsigned int timer_max_ns;

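The VI window mode change comes down to three small calculations: decoding the firmware-reported mode into a stride, locating a per-VI register (what efx_paged_reg() does), and sizing max_channels from the BAR mapping. Below is a minimal userspace C sketch of that arithmetic. It is not driver code. The `sketch_*` names and the enum values are hypothetical, because the real MC_CMD_GET_CAPABILITIES_V3_OUT_VI_WINDOW_MODE_* encodings come from the firmware protocol headers. `txq_types` is passed in rather than taken from EFX_TXQ_TYPES, because this patch does not show that constant's value.

```c
#include <stdint.h>

/* Hypothetical stand-ins for MC_CMD_GET_CAPABILITIES_V3_OUT_VI_WINDOW_MODE_*;
 * the real numeric encodings are not shown here and are only assumed. */
enum sketch_vi_window_mode {
	SKETCH_VI_WINDOW_MODE_8K,
	SKETCH_VI_WINDOW_MODE_16K,
	SKETCH_VI_WINDOW_MODE_64K,
};

/* Mirrors EFX_DEFAULT_VI_STRIDE: kept when firmware reports no mode. */
#define SKETCH_DEFAULT_VI_STRIDE 0x2000u

/* Decode a reported window mode into a VI stride in bytes.
 * Returns 0 on success, -1 for an unrecognised mode (the driver returns
 * -EIO); *stride is left untouched on failure. */
int sketch_vi_stride_from_mode(unsigned int mode, unsigned int *stride)
{
	switch (mode) {
	case SKETCH_VI_WINDOW_MODE_8K:
		*stride = 8192;
		return 0;
	case SKETCH_VI_WINDOW_MODE_16K:
		*stride = 16384;
		return 0;
	case SKETCH_VI_WINDOW_MODE_64K:
		*stride = 65536;
		return 0;
	default:
		return -1;
	}
}

/* Same arithmetic as efx_paged_reg(): register 'reg' of VI number 'page'
 * lives at page * vi_stride + reg within the BAR. */
unsigned int sketch_paged_reg(unsigned int vi_stride, unsigned int page,
			      unsigned int reg)
{
	return page * vi_stride + reg;
}

/* Same shape as the max_channels calculation in efx_ef10_probe():
 * map_size / (vi_stride * txq_types), capped at max_cap. */
unsigned int sketch_max_channels(uint64_t map_size, unsigned int vi_stride,
				 unsigned int txq_types, unsigned int max_cap)
{
	uint64_t n = map_size / ((uint64_t)vi_stride * txq_types);

	return n < max_cap ? (unsigned int)n : max_cap;
}
```

With an 8 MiB mapping and two TX queue types, an 8K stride yields 512 candidate channels (before the cap), while a 64K stride yields only 64. A mapping smaller than one stride times txq_types yields 0, which is the case the WARN_ON in efx_ef10_probe() rejects.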

* [PATCH net-next 3/6] sfc: add Medford2 (SFC9250) PCI Device IDs
From: Edward Cree @ 2017-12-18 16:56 UTC (permalink / raw)
  To: linux-net-drivers, davem; +Cc: netdev
In-Reply-To: <f9e1279b-03d0-729c-2518-c1e204444447@solarflare.com>

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/efx.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index e50049cba50b..7bcbedce07a5 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -2910,6 +2910,10 @@ static const struct pci_device_id efx_pci_table[] = {
 	 .driver_data = (unsigned long) &efx_hunt_a0_nic_type},
 	{PCI_DEVICE(PCI_VENDOR_ID_SOLARFLARE, 0x1a03),  /* SFC9220 VF */
 	 .driver_data = (unsigned long) &efx_hunt_a0_vf_nic_type},
+	{PCI_DEVICE(PCI_VENDOR_ID_SOLARFLARE, 0x0b03),  /* SFC9250 PF */
+	 .driver_data = (unsigned long) &efx_hunt_a0_nic_type},
+	{PCI_DEVICE(PCI_VENDOR_ID_SOLARFLARE, 0x1b03),  /* SFC9250 VF */
+	 .driver_data = (unsigned long) &efx_hunt_a0_vf_nic_type},
 	{0}			/* end of list */
 };
 


* [PATCH net-next 4/6] sfc: improve PTP error reporting
From: Edward Cree @ 2017-12-18 16:56 UTC (permalink / raw)
  To: linux-net-drivers, davem; +Cc: netdev
In-Reply-To: <f9e1279b-03d0-729c-2518-c1e204444447@solarflare.com>

Log a message if PTP probing fails; if we then, unexpectedly, get PTP
 events, only log a message for the first one on each device.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/ef10.c       | 9 ++++++++-
 drivers/net/ethernet/sfc/net_driver.h | 2 ++
 drivers/net/ethernet/sfc/ptp.c        | 4 +++-
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index dcd6be14a430..009bf28bdba5 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -747,7 +747,14 @@ static int efx_ef10_probe(struct efx_nic *efx)
 	if (rc && rc != -EPERM)
 		goto fail5;
 
-	efx_ptp_probe(efx, NULL);
+	rc = efx_ptp_probe(efx, NULL);
+	/* Failure to probe PTP is not fatal.
+	 * In the case of EPERM, efx_ptp_probe will print its own message (in
+	 * efx_ptp_get_attributes()), so we don't need to.
+	 */
+	if (rc && rc != -EPERM)
+		netif_warn(efx, drv, efx->net_dev,
+			   "Failed to probe PTP, rc=%d\n", rc);
 
 #ifdef CONFIG_SFC_SRIOV
 	if ((efx->pci_dev->physfn) && (!efx->pci_dev->is_physfn)) {
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 2e41f2c39c4a..6b8730a24513 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -813,6 +813,7 @@ struct vfdi_status;
  * @vf_init_count: Number of VFs that have been fully initialised.
  * @vi_scale: log2 number of vnics per VF.
  * @ptp_data: PTP state data
+ * @ptp_warned: has this NIC seen and warned about unexpected PTP events?
  * @vpd_sn: Serial number read from VPD
  * @monitor_work: Hardware monitor workitem
  * @biu_lock: BIU (bus interface unit) lock
@@ -968,6 +969,7 @@ struct efx_nic {
 #endif
 
 	struct efx_ptp_data *ptp_data;
+	bool ptp_warned;
 
 	char *vpd_sn;
 
diff --git a/drivers/net/ethernet/sfc/ptp.c b/drivers/net/ethernet/sfc/ptp.c
index caa89bf7603e..3b37d7ded3c4 100644
--- a/drivers/net/ethernet/sfc/ptp.c
+++ b/drivers/net/ethernet/sfc/ptp.c
@@ -1662,9 +1662,11 @@ void efx_ptp_event(struct efx_nic *efx, efx_qword_t *ev)
 	int code = EFX_QWORD_FIELD(*ev, MCDI_EVENT_CODE);
 
 	if (!ptp) {
-		if (net_ratelimit())
+		if (!efx->ptp_warned) {
 			netif_warn(efx, drv, efx->net_dev,
 				   "Received PTP event but PTP not set up\n");
+			efx->ptp_warned = true;
+		}
 		return;
 	}
 

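The PTP change replaces a global rate limit with a per-device "warn once" flag. The following is a minimal userspace C sketch of that pattern. `struct sketch_nic` and `sketch_ptp_event()` are hypothetical stand-ins for struct efx_nic and efx_ptp_event(), and fprintf() stands in for netif_warn().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-device state; the patch adds 'bool ptp_warned' to
 * struct efx_nic for the same purpose. */
struct sketch_nic {
	const char *name;
	void *ptp_data;                /* NULL when PTP probing failed */
	bool ptp_warned;               /* set after the first unexpected event */
	unsigned int warnings_emitted; /* counter so the behaviour is observable */
};

/* Returns true if the event was processed, false if it was dropped. */
bool sketch_ptp_event(struct sketch_nic *nic)
{
	if (!nic->ptp_data) {
		/* Warn only for the first unexpected event on this device. */
		if (!nic->ptp_warned) {
			fprintf(stderr,
				"%s: Received PTP event but PTP not set up\n",
				nic->name);
			nic->ptp_warned = true;
			nic->warnings_emitted++;
		}
		return false;
	}
	/* Normal PTP event processing would happen here. */
	return true;
}
```

net_ratelimit() shares a single rate-limit state across the networking stack, so the old code could keep printing periodically and compete with unrelated messages for the same budget. The per-device flag stops repeated warnings from a given NIC after the first one.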

* [PATCH net-next 5/6] sfc: update EF10 register definitions
From: Edward Cree @ 2017-12-18 16:57 UTC (permalink / raw)
  To: linux-net-drivers, davem; +Cc: netdev
In-Reply-To: <f9e1279b-03d0-729c-2518-c1e204444447@solarflare.com>

From: Bert Kenward <bkenward@solarflare.com>

The RX_L4_CLASS field has shrunk from 3 bits to 2 bits. The upper
bit was never used in previous hardware, so we can use the new
definition throughout.

The TSO OUTER_IPID field was previously spelt differently from the
external definitions.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/ef10.c      | 16 ++++++-------
 drivers/net/ethernet/sfc/ef10_regs.h | 46 +++++++++++++++++++++++-------------
 2 files changed, 37 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 009bf28bdba5..56a6bc60dac1 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -3292,8 +3292,8 @@ static u16 efx_ef10_handle_rx_event_errors(struct efx_channel *channel,
 		if (unlikely(rx_encap_hdr != ESE_EZ_ENCAP_HDR_VXLAN &&
 			     ((rx_l3_class != ESE_DZ_L3_CLASS_IP4 &&
 			       rx_l3_class != ESE_DZ_L3_CLASS_IP6) ||
-			      (rx_l4_class != ESE_DZ_L4_CLASS_TCP &&
-			       rx_l4_class != ESE_DZ_L4_CLASS_UDP))))
+			      (rx_l4_class != ESE_FZ_L4_CLASS_TCP &&
+			       rx_l4_class != ESE_FZ_L4_CLASS_UDP))))
 			netdev_WARN(efx->net_dev,
 				    "invalid class for RX_TCPUDP_CKSUM_ERR: event="
 				    EFX_QWORD_FMT "\n",
@@ -3330,8 +3330,8 @@ static u16 efx_ef10_handle_rx_event_errors(struct efx_channel *channel,
 				    EFX_QWORD_VAL(*event));
 		else if (unlikely((rx_l3_class != ESE_DZ_L3_CLASS_IP4 &&
 				   rx_l3_class != ESE_DZ_L3_CLASS_IP6) ||
-				  (rx_l4_class != ESE_DZ_L4_CLASS_TCP &&
-				   rx_l4_class != ESE_DZ_L4_CLASS_UDP)))
+				  (rx_l4_class != ESE_FZ_L4_CLASS_TCP &&
+				   rx_l4_class != ESE_FZ_L4_CLASS_UDP)))
 			netdev_WARN(efx->net_dev,
 				    "invalid class for RX_TCP_UDP_INNER_CHKSUM_ERR: event="
 				    EFX_QWORD_FMT "\n",
@@ -3366,7 +3366,7 @@ static int efx_ef10_handle_rx_event(struct efx_channel *channel,
 	next_ptr_lbits = EFX_QWORD_FIELD(*event, ESF_DZ_RX_DSC_PTR_LBITS);
 	rx_queue_label = EFX_QWORD_FIELD(*event, ESF_DZ_RX_QLABEL);
 	rx_l3_class = EFX_QWORD_FIELD(*event, ESF_DZ_RX_L3_CLASS);
-	rx_l4_class = EFX_QWORD_FIELD(*event, ESF_DZ_RX_L4_CLASS);
+	rx_l4_class = EFX_QWORD_FIELD(*event, ESF_FZ_RX_L4_CLASS);
 	rx_cont = EFX_QWORD_FIELD(*event, ESF_DZ_RX_CONT);
 	rx_encap_hdr =
 		nic_data->datapath_caps &
@@ -3444,8 +3444,8 @@ static int efx_ef10_handle_rx_event(struct efx_channel *channel,
 							 rx_l3_class, rx_l4_class,
 							 event);
 	} else {
-		bool tcpudp = rx_l4_class == ESE_DZ_L4_CLASS_TCP ||
-			      rx_l4_class == ESE_DZ_L4_CLASS_UDP;
+		bool tcpudp = rx_l4_class == ESE_FZ_L4_CLASS_TCP ||
+			      rx_l4_class == ESE_FZ_L4_CLASS_UDP;
 
 		switch (rx_encap_hdr) {
 		case ESE_EZ_ENCAP_HDR_VXLAN: /* VxLAN or GENEVE */
@@ -3466,7 +3466,7 @@ static int efx_ef10_handle_rx_event(struct efx_channel *channel,
 		}
 	}
 
-	if (rx_l4_class == ESE_DZ_L4_CLASS_TCP)
+	if (rx_l4_class == ESE_FZ_L4_CLASS_TCP)
 		flags |= EFX_RX_PKT_TCP;
 
 	channel->irq_mod_score += 2 * n_packets;
diff --git a/drivers/net/ethernet/sfc/ef10_regs.h b/drivers/net/ethernet/sfc/ef10_regs.h
index 2c4bf9476c37..6a56778cf06c 100644
--- a/drivers/net/ethernet/sfc/ef10_regs.h
+++ b/drivers/net/ethernet/sfc/ef10_regs.h
@@ -1,6 +1,6 @@
 /****************************************************************************
  * Driver for Solarflare network controllers and boards
- * Copyright 2012-2015 Solarflare Communications Inc.
+ * Copyright 2012-2017 Solarflare Communications Inc.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License version 2 as published
@@ -79,6 +79,8 @@
 #define	ER_DZ_EVQ_TMR 0x00000420
 #define	ER_DZ_EVQ_TMR_STEP 8192
 #define	ER_DZ_EVQ_TMR_ROWS 2048
+#define	ERF_FZ_TC_TMR_REL_VAL_LBN 16
+#define	ERF_FZ_TC_TMR_REL_VAL_WIDTH 14
 #define	ERF_DZ_TC_TIMER_MODE_LBN 14
 #define	ERF_DZ_TC_TIMER_MODE_WIDTH 2
 #define	ERF_DZ_TC_TIMER_VAL_LBN 0
@@ -159,16 +161,24 @@
 #define	ESF_DZ_RX_EV_SOFT2_WIDTH 2
 #define	ESF_DZ_RX_DSC_PTR_LBITS_LBN 48
 #define	ESF_DZ_RX_DSC_PTR_LBITS_WIDTH 4
-#define	ESF_DZ_RX_L4_CLASS_LBN 45
-#define	ESF_DZ_RX_L4_CLASS_WIDTH 3
-#define	ESE_DZ_L4_CLASS_RSVD7 7
-#define	ESE_DZ_L4_CLASS_RSVD6 6
-#define	ESE_DZ_L4_CLASS_RSVD5 5
-#define	ESE_DZ_L4_CLASS_RSVD4 4
-#define	ESE_DZ_L4_CLASS_RSVD3 3
-#define	ESE_DZ_L4_CLASS_UDP 2
-#define	ESE_DZ_L4_CLASS_TCP 1
-#define	ESE_DZ_L4_CLASS_UNKNOWN 0
+#define	ESF_DE_RX_L4_CLASS_LBN 45
+#define	ESF_DE_RX_L4_CLASS_WIDTH 3
+#define	ESE_DE_L4_CLASS_RSVD7 7
+#define	ESE_DE_L4_CLASS_RSVD6 6
+#define	ESE_DE_L4_CLASS_RSVD5 5
+#define	ESE_DE_L4_CLASS_RSVD4 4
+#define	ESE_DE_L4_CLASS_RSVD3 3
+#define	ESE_DE_L4_CLASS_UDP 2
+#define	ESE_DE_L4_CLASS_TCP 1
+#define	ESE_DE_L4_CLASS_UNKNOWN 0
+#define	ESF_FZ_RX_FASTPD_INDCTR_LBN 47
+#define	ESF_FZ_RX_FASTPD_INDCTR_WIDTH 1
+#define	ESF_FZ_RX_L4_CLASS_LBN 45
+#define	ESF_FZ_RX_L4_CLASS_WIDTH 2
+#define	ESE_FZ_L4_CLASS_RSVD3 3
+#define	ESE_FZ_L4_CLASS_UDP 2
+#define	ESE_FZ_L4_CLASS_TCP 1
+#define	ESE_FZ_L4_CLASS_UNKNOWN 0
 #define	ESF_DZ_RX_L3_CLASS_LBN 42
 #define	ESF_DZ_RX_L3_CLASS_WIDTH 3
 #define	ESE_DZ_L3_CLASS_RSVD7 7
@@ -215,6 +225,8 @@
 #define	ESF_EZ_RX_ABORT_WIDTH 1
 #define	ESF_DZ_RX_ECC_ERR_LBN 29
 #define	ESF_DZ_RX_ECC_ERR_WIDTH 1
+#define	ESF_DZ_RX_TRUNC_ERR_LBN 29
+#define	ESF_DZ_RX_TRUNC_ERR_WIDTH 1
 #define	ESF_DZ_RX_CRC1_ERR_LBN 28
 #define	ESF_DZ_RX_CRC1_ERR_WIDTH 1
 #define	ESF_DZ_RX_CRC0_ERR_LBN 27
@@ -332,6 +344,8 @@
 #define	ESE_DZ_TX_OPTION_DESC_CRC_CSUM 0
 #define	ESF_DZ_TX_TSO_OPTION_TYPE_LBN 56
 #define	ESF_DZ_TX_TSO_OPTION_TYPE_WIDTH 4
+#define	ESE_DZ_TX_TSO_OPTION_DESC_FATSO2B 3
+#define	ESE_DZ_TX_TSO_OPTION_DESC_FATSO2A 2
 #define	ESE_DZ_TX_TSO_OPTION_DESC_ENCAP 1
 #define	ESE_DZ_TX_TSO_OPTION_DESC_NORMAL 0
 #define	ESF_DZ_TX_TSO_TCP_FLAGS_LBN 48
@@ -341,7 +355,7 @@
 #define	ESF_DZ_TX_TSO_TCP_SEQNO_LBN 0
 #define	ESF_DZ_TX_TSO_TCP_SEQNO_WIDTH 32
 
-/* TX_TSO_FATSO2A_DESC */
+/* TX_TSO_V2_DESC_A */
 #define	ESF_DZ_TX_DESC_IS_OPT_LBN 63
 #define	ESF_DZ_TX_DESC_IS_OPT_WIDTH 1
 #define	ESF_DZ_TX_OPTION_TYPE_LBN 60
@@ -360,8 +374,7 @@
 #define	ESF_DZ_TX_TSO_TCP_SEQNO_LBN 0
 #define	ESF_DZ_TX_TSO_TCP_SEQNO_WIDTH 32
 
-
-/* TX_TSO_FATSO2B_DESC */
+/* TX_TSO_V2_DESC_B */
 #define	ESF_DZ_TX_DESC_IS_OPT_LBN 63
 #define	ESF_DZ_TX_DESC_IS_OPT_WIDTH 1
 #define	ESF_DZ_TX_OPTION_TYPE_LBN 60
@@ -375,11 +388,10 @@
 #define	ESE_DZ_TX_TSO_OPTION_DESC_FATSO2A 2
 #define	ESE_DZ_TX_TSO_OPTION_DESC_ENCAP 1
 #define	ESE_DZ_TX_TSO_OPTION_DESC_NORMAL 0
-#define	ESF_DZ_TX_TSO_OUTER_IP_ID_LBN 0
-#define	ESF_DZ_TX_TSO_OUTER_IP_ID_WIDTH 16
 #define	ESF_DZ_TX_TSO_TCP_MSS_LBN 32
 #define	ESF_DZ_TX_TSO_TCP_MSS_WIDTH 16
-
+#define	ESF_DZ_TX_TSO_OUTER_IPID_LBN 0
+#define	ESF_DZ_TX_TSO_OUTER_IPID_WIDTH 16
 
 /*************************************************************************/
 

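The RX_L4_CLASS change relies on bit 47 never having carried an L4 class value on earlier hardware. The userspace C sketch below copies the LBN/WIDTH values from the ef10_regs.h hunk and extracts the field both ways. `sk_field()` and the other `sk_*` names are hypothetical helpers that illustrate the idea behind EFX_QWORD_FIELD; they are not its implementation.

```c
#include <stdint.h>

/* Field positions copied from the ef10_regs.h hunk above. */
#define SK_DE_RX_L4_CLASS_LBN      45 /* old definition: 3 bits */
#define SK_DE_RX_L4_CLASS_WIDTH     3
#define SK_FZ_RX_L4_CLASS_LBN      45 /* new definition: 2 bits */
#define SK_FZ_RX_L4_CLASS_WIDTH     2
#define SK_FZ_RX_FASTPD_INDCTR_LBN 47 /* bit no longer part of the class */

/* Extract 'width' bits starting at bit 'lbn' of a 64-bit event word. */
unsigned int sk_field(uint64_t ev, unsigned int lbn, unsigned int width)
{
	return (unsigned int)((ev >> lbn) & ((UINT64_C(1) << width) - 1));
}

/* L4 class read with the old 3-bit definition. */
unsigned int sk_old_l4_class(uint64_t ev)
{
	return sk_field(ev, SK_DE_RX_L4_CLASS_LBN, SK_DE_RX_L4_CLASS_WIDTH);
}

/* L4 class read with the new 2-bit definition. */
unsigned int sk_new_l4_class(uint64_t ev)
{
	return sk_field(ev, SK_FZ_RX_L4_CLASS_LBN, SK_FZ_RX_L4_CLASS_WIDTH);
}
```

For classes 0 to 3 with bit 47 clear, both definitions agree. If ESF_FZ_RX_FASTPD_INDCTR (bit 47) is set, the old 3-bit read would report class + 4, whereas the 2-bit read still reports the true class.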

* [PATCH net-next 6/6] sfc: populate the timer reload field
From: Edward Cree @ 2017-12-18 16:57 UTC (permalink / raw)
  To: linux-net-drivers, davem; +Cc: netdev
In-Reply-To: <f9e1279b-03d0-729c-2518-c1e204444447@solarflare.com>

From: Bert Kenward <bkenward@solarflare.com>

The timer mode register now has a separate field for the reload value.
Since we always use this timer with the reload (for interrupt moderation)
we set this to the same as the initial value.

Previous hardware ignores this field, so we can safely set these bits
on all hardware that uses this register.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/ef10.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 56a6bc60dac1..1f64c7f60943 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -2010,8 +2010,9 @@ static void efx_ef10_push_irq_moderation(struct efx_channel *channel)
 	} else {
 		unsigned int ticks = efx_usecs_to_ticks(efx, usecs);
 
-		EFX_POPULATE_DWORD_2(timer_cmd, ERF_DZ_TC_TIMER_MODE, mode,
-				     ERF_DZ_TC_TIMER_VAL, ticks);
+		EFX_POPULATE_DWORD_3(timer_cmd, ERF_DZ_TC_TIMER_MODE, mode,
+				     ERF_DZ_TC_TIMER_VAL, ticks,
+				     ERF_FZ_TC_TMR_REL_VAL, ticks);
 		efx_writed_page(efx, &timer_cmd, ER_DZ_EVQ_TMR,
 				channel->channel);
 	}

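The timer command now carries the tick count twice: once as the initial value and once in the new reload field. The sketch below packs the dword in plain C, following what EFX_POPULATE_DWORD_3 does in the patch. `sk_insert()`, `sk_timer_cmd()` and `sk_legacy_view()` are hypothetical helpers. The TIMER_MODE and TC_TMR_REL_VAL positions come from ef10_regs.h above. The 14-bit width of ERF_DZ_TC_TIMER_VAL is an assumption, because its WIDTH line is not shown.

```c
#include <stdint.h>

#define SK_TC_TIMER_VAL_LBN      0
#define SK_TC_TIMER_VAL_WIDTH   14 /* assumption: WIDTH line not shown */
#define SK_TC_TIMER_MODE_LBN    14
#define SK_TC_TIMER_MODE_WIDTH   2
#define SK_TC_TMR_REL_VAL_LBN   16
#define SK_TC_TMR_REL_VAL_WIDTH 14

/* Replace the 'width'-bit field at 'lbn' in dw with val (masked to fit). */
static uint32_t sk_insert(uint32_t dw, unsigned int lbn, unsigned int width,
			  uint32_t val)
{
	uint32_t mask = ((UINT32_C(1) << width) - 1) << lbn;

	return (dw & ~mask) | ((val << lbn) & mask);
}

/* Build the timer command with the reload value equal to the initial
 * tick count, as the patch does for interrupt moderation. */
uint32_t sk_timer_cmd(unsigned int mode, unsigned int ticks)
{
	uint32_t dw = 0;

	dw = sk_insert(dw, SK_TC_TIMER_MODE_LBN, SK_TC_TIMER_MODE_WIDTH, mode);
	dw = sk_insert(dw, SK_TC_TIMER_VAL_LBN, SK_TC_TIMER_VAL_WIDTH, ticks);
	dw = sk_insert(dw, SK_TC_TMR_REL_VAL_LBN, SK_TC_TMR_REL_VAL_WIDTH, ticks);
	return dw;
}

/* What hardware that ignores the reload field effectively uses:
 * only bits 0..15 (TIMER_VAL and TIMER_MODE, under the width assumption). */
uint32_t sk_legacy_view(uint32_t dw)
{
	return dw & 0xffffu;
}
```

Masking with 0xffff models hardware that ignores bits 16 and up. That hardware sees exactly the mode and initial value it saw before the patch, which is why setting the reload bits is safe on older NICs.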

* Re: BUG: spinlock bad magic (2)
From: Dmitry Vyukov @ 2017-12-18 17:01 UTC (permalink / raw)
  To: Santosh Shilimkar
  Cc: syzbot, David Miller, LKML, linux-rdma, netdev, rds-devel,
	syzkaller-bugs
In-Reply-To: <6dbd4f85-f3f2-97f3-5b82-451276fbf877@oracle.com>

On Mon, Dec 18, 2017 at 5:46 PM, Santosh Shilimkar
<santosh.shilimkar@oracle.com> wrote:
> On 12/18/2017 4:36 AM, syzbot wrote:
>>
>> Hello,
>>
>> syzkaller hit the following crash on
>> 6084b576dca2e898f5c101baef151f7bfdbb606d
>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached
>> Raw console output is attached.
>>
>> Unfortunately, I don't have any reproducer for this bug yet.
>>
> [...]
>
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
>> IP: rds_send_xmit+0x80/0x930 net/rds/send.c:186
>
>
> This one seems to be the same bug as the one reported below.
>
> BUG: unable to handle kernel NULL pointer dereference in rds_send_xmit

Hi Santosh,

The proper syntax to tell syzbot about dups is this (from email footer):

> See https://goo.gl/tpsmEJ for details.
> Please credit me with: Reported-by: syzbot <syzkaller@googlegroups.com>
> syzbot will keep track of this bug report.
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> Note: all commands must start from beginning of the line in the email body.


* Re: [v2 PATCH -tip 3/6] net: sctp: Add SCTP ACK tracking trace event
From: Steven Rostedt @ 2017-12-18 17:05 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Ingo Molnar, Ian McDonald, Vlad Yasevich, Stephen Hemminger,
	Peter Zijlstra, Thomas Gleixner, LKML, H . Peter Anvin,
	Gerrit Renker, David S . Miller, Neil Horman, dccp, netdev,
	linux-sctp, Stephen Rothwell
In-Reply-To: <151358473510.28850.10475072993963389604.stgit@devbox>

On Mon, 18 Dec 2017 17:12:15 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> Add SCTP ACK tracking trace event to trace the changes of SCTP
> association state in response to incoming packets.
> It is used for debugging SCTP congestion control algorithms,
> and will replace sctp_probe module.
> 
> Note that this event is a bit tricky. Since it consists of 2
> events (sctp_probe and sctp_probe_path), you have to enable
> both events as below.
> 
>   # cd /sys/kernel/debug/tracing
>   # echo 1 > events/sctp/sctp_probe/enable
>   # echo 1 > events/sctp/sctp_probe_path/enable
> 
> Or, you can enable all the events under sctp.
> 
>   # echo 1 > events/sctp/enable
> 
> Since sctp_probe_path event is always invoked from sctp_probe
> event, you cannot see any output if you only enable
> sctp_probe_path.

I have to ask, why did you do it this way?


> +#include <trace/define_trace.h>
> diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
> index 8f8ccded13e4..c5f92b2cc5c3 100644
> --- a/net/sctp/sm_statefuns.c
> +++ b/net/sctp/sm_statefuns.c
> @@ -59,6 +59,9 @@
>  #include <net/sctp/sm.h>
>  #include <net/sctp/structs.h>
>  
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/sctp.h>
> +
>  static struct sctp_packet *sctp_abort_pkt_new(
>  					struct net *net,
>  					const struct sctp_endpoint *ep,
> @@ -3219,6 +3222,8 @@ enum sctp_disposition sctp_sf_eat_sack_6_2(struct net *net,
>  	struct sctp_sackhdr *sackh;
>  	__u32 ctsn;
>  
> +	trace_sctp_probe(ep, asoc, chunk);

What about doing this right after this probe:

	if (trace_sctp_probe_path_enabled()) {
		struct sctp_transport *sp;

		list_for_each_entry(sp, &asoc->peer.transport_addr_list,
				    transports) {
			trace_sctp_probe_path(sp, asoc);
		}
	}

The "trace_sctp_probe_path_enabled()" is a static branch, which means
it's a nop just like a tracepoint is, and will not add any overhead if
the trace_sctp_probe_path is not enabled.
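
This suggestion moves the per-path loop to the call site and guards it with trace_sctp_probe_path_enabled(). The userspace sketch below models only that control flow. Plain bools stand in for the two events' enabled states, and an array stands in for asoc->peer.transport_addr_list. A real static key is patched into the instruction stream rather than tested as a variable, so this sketch does not reproduce the zero-overhead property. All `sk_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sctp_transport. */
struct sk_transport {
	int id;
};

/* Hypothetical stand-ins for the enabled state of the two events. */
static bool sk_probe_enabled;      /* models the sctp_probe event */
static bool sk_probe_path_enabled; /* models trace_sctp_probe_path_enabled() */

/* Event counters so the behaviour can be observed. */
static unsigned int sk_probe_events;
static unsigned int sk_path_events;

static void sk_trace_probe(void)
{
	if (sk_probe_enabled)
		sk_probe_events++;
}

static void sk_trace_probe_path(const struct sk_transport *t)
{
	(void)t; /* a real event would record per-path state here */
	sk_path_events++;
}

/* Models the SACK handler: one association-level event, then one event per
 * transport, skipping the list walk entirely unless the per-path event is
 * enabled. */
void sk_on_sack(const struct sk_transport *transports, size_t n)
{
	sk_trace_probe();

	if (sk_probe_path_enabled) {
		for (size_t i = 0; i < n; i++)
			sk_trace_probe_path(&transports[i]);
	}
}
```

Because the loop sits at the call site instead of inside the sctp_probe event, per-path events fire even when only sctp_probe_path is enabled.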

-- Steve

> +
>  	if (!sctp_vtag_verify(chunk, asoc))
>  		return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
>  

