Netdev List
* [PATCH net 0/2] Fix vlan untag and insertion for bridge and vlan with reorder_hdr off
From: Toshiaki Makita @ 2018-03-13  5:51 UTC (permalink / raw)
  To: David S. Miller; +Cc: Toshiaki Makita, netdev, Brandon Carpenter, Vlad Yasevich

As Brandon Carpenter reported[1], sending non-vlan-offloaded packets from
bridge devices ends up with corrupted packets. He narrowed down this problem
and found that the root cause is in skb_reorder_vlan_header().

While I was working on fixing this problem, I found that the function does
not work properly for double tagged packets with reorder_hdr off as well.

Patch 1 fixes these 2 problems in skb_reorder_vlan_header().

While testing the fix, it turned out that fixing skb_reorder_vlan_header()
alone is not sufficient to receive double tagged packets with reorder_hdr
off. Vlan tags got out of order when vlan devices with reorder_hdr disabled
were stacked. Patch 2 fixes this problem.

[1] https://www.spinics.net/lists/linux-ethernet-bridging/msg07039.html

Toshiaki Makita (2):
  net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off
  vlan: Fix out of order vlan headers with reorder header off

 include/linux/if_vlan.h       | 66 +++++++++++++++++++++++++++++++++++--------
 include/uapi/linux/if_ether.h |  1 +
 net/8021q/vlan_core.c         |  4 +--
 net/core/skbuff.c             |  7 +++--
 4 files changed, 63 insertions(+), 15 deletions(-)

-- 
1.8.3.1


* [PATCH net 2/2] vlan: Fix out of order vlan headers with reorder header off
From: Toshiaki Makita @ 2018-03-13  5:51 UTC (permalink / raw)
  To: David S. Miller; +Cc: Toshiaki Makita, netdev, Brandon Carpenter, Vlad Yasevich
In-Reply-To: <1520920288-2483-1-git-send-email-makita.toshiaki@lab.ntt.co.jp>

With reorder header off, received packets are untagged in skb_vlan_untag()
called from within __netif_receive_skb_core(), and later the tag will be
inserted back in vlan_do_receive().

This caused out of order vlan headers when we create a vlan device on top
of another vlan device, because vlan_do_receive() inserts a tag as the
outermost vlan tag. E.g. the outer tag is first removed in skb_vlan_untag()
and inserted back in vlan_do_receive(), then the inner tag is next removed
and inserted back as the outermost tag.

This patch fixes the behaviour by inserting the inner tag at the right
position.
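
As a rough userspace illustration of "inserting at the right position"
(not the kernel code: insert_inner_tag() and its flat byte buffer are
hypothetical stand-ins for the skb handling, and the sketch shifts the
tail backwards instead of pushing the header forwards), the tag is placed
directly in front of the final EtherType, which sits at mac_len - ETH_TLEN:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_TLEN  2   /* bytes in the EtherType field */
#define ETH_HLEN  14  /* untagged Ethernet header */
#define VLAN_HLEN 4   /* one VLAN tag: TPID + TCI */

/*
 * Hypothetical helper: insert a VLAN tag so that it lands after any outer
 * tags already covered by mac_len. buf must have room for
 * len + VLAN_HLEN bytes. tpid and tci are host-order values and are
 * written in network byte order. Returns the new frame length.
 */
static size_t insert_inner_tag(uint8_t *buf, size_t len, uint16_t tpid,
			       uint16_t tci, size_t mac_len)
{
	size_t off;

	assert(mac_len >= ETH_HLEN && mac_len <= len);
	off = mac_len - ETH_TLEN;	/* offset of the final EtherType */

	/* Open a 4-byte gap at off; the EtherType and payload move back. */
	memmove(buf + off + VLAN_HLEN, buf + off, len - off);

	buf[off + 0] = (uint8_t)(tpid >> 8);
	buf[off + 1] = (uint8_t)(tpid & 0xff);
	buf[off + 2] = (uint8_t)(tci >> 8);
	buf[off + 3] = (uint8_t)(tci & 0xff);
	return len + VLAN_HLEN;
}
```

With mac_len == ETH_HLEN the tag becomes the outermost one, which matches
what the regular insert helper does. With mac_len == ETH_HLEN + VLAN_HLEN
it lands behind one existing outer tag.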

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
---
 include/linux/if_vlan.h | 66 ++++++++++++++++++++++++++++++++++++++++---------
 net/8021q/vlan_core.c   |  4 +--
 2 files changed, 57 insertions(+), 13 deletions(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 5e6a2d4..c4a1cff 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -300,30 +300,34 @@ static inline bool vlan_hw_offload_capable(netdev_features_t features,
 }
 
 /**
- * __vlan_insert_tag - regular VLAN tag inserting
+ * __vlan_insert_inner_tag - inner VLAN tag inserting
  * @skb: skbuff to tag
  * @vlan_proto: VLAN encapsulation protocol
  * @vlan_tci: VLAN TCI to insert
+ * @mac_len: MAC header length including outer vlan headers
  *
- * Inserts the VLAN tag into @skb as part of the payload
+ * Inserts the VLAN tag into @skb as part of the payload at offset mac_len
  * Returns error if skb_cow_head failes.
  *
  * Does not change skb->protocol so this function can be used during receive.
  */
-static inline int __vlan_insert_tag(struct sk_buff *skb,
-				    __be16 vlan_proto, u16 vlan_tci)
+static inline int __vlan_insert_inner_tag(struct sk_buff *skb,
+					  __be16 vlan_proto, u16 vlan_tci,
+					  unsigned int mac_len)
 {
 	struct vlan_ethhdr *veth;
 
 	if (skb_cow_head(skb, VLAN_HLEN) < 0)
 		return -ENOMEM;
 
-	veth = skb_push(skb, VLAN_HLEN);
+	skb_push(skb, VLAN_HLEN);
 
-	/* Move the mac addresses to the beginning of the new header. */
-	memmove(skb->data, skb->data + VLAN_HLEN, 2 * ETH_ALEN);
+	/* Move the mac header sans proto to the beginning of the new header. */
+	memmove(skb->data, skb->data + VLAN_HLEN, mac_len - ETH_TLEN);
 	skb->mac_header -= VLAN_HLEN;
 
+	veth = (struct vlan_ethhdr *)(skb->data + mac_len - ETH_HLEN);
+
 	/* first, the ethernet type */
 	veth->h_vlan_proto = vlan_proto;
 
@@ -334,12 +338,30 @@ static inline int __vlan_insert_tag(struct sk_buff *skb,
 }
 
 /**
- * vlan_insert_tag - regular VLAN tag inserting
+ * __vlan_insert_tag - regular VLAN tag inserting
  * @skb: skbuff to tag
  * @vlan_proto: VLAN encapsulation protocol
  * @vlan_tci: VLAN TCI to insert
  *
  * Inserts the VLAN tag into @skb as part of the payload
+ * Returns error if skb_cow_head failes.
+ *
+ * Does not change skb->protocol so this function can be used during receive.
+ */
+static inline int __vlan_insert_tag(struct sk_buff *skb,
+				    __be16 vlan_proto, u16 vlan_tci)
+{
+	return __vlan_insert_inner_tag(skb, vlan_proto, vlan_tci, ETH_HLEN);
+}
+
+/**
+ * vlan_insert_inner_tag - inner VLAN tag inserting
+ * @skb: skbuff to tag
+ * @vlan_proto: VLAN encapsulation protocol
+ * @vlan_tci: VLAN TCI to insert
+ * @mac_len: MAC header length including outer vlan headers
+ *
+ * Inserts the VLAN tag into @skb as part of the payload at offset mac_len
  * Returns a VLAN tagged skb. If a new skb is created, @skb is freed.
  *
  * Following the skb_unshare() example, in case of error, the calling function
@@ -347,12 +369,14 @@ static inline int __vlan_insert_tag(struct sk_buff *skb,
  *
  * Does not change skb->protocol so this function can be used during receive.
  */
-static inline struct sk_buff *vlan_insert_tag(struct sk_buff *skb,
-					      __be16 vlan_proto, u16 vlan_tci)
+static inline struct sk_buff *vlan_insert_inner_tag(struct sk_buff *skb,
+						    __be16 vlan_proto,
+						    u16 vlan_tci,
+						    unsigned int mac_len)
 {
 	int err;
 
-	err = __vlan_insert_tag(skb, vlan_proto, vlan_tci);
+	err = __vlan_insert_inner_tag(skb, vlan_proto, vlan_tci, mac_len);
 	if (err) {
 		dev_kfree_skb_any(skb);
 		return NULL;
@@ -361,6 +385,26 @@ static inline struct sk_buff *vlan_insert_tag(struct sk_buff *skb,
 }
 
 /**
+ * vlan_insert_tag - regular VLAN tag inserting
+ * @skb: skbuff to tag
+ * @vlan_proto: VLAN encapsulation protocol
+ * @vlan_tci: VLAN TCI to insert
+ *
+ * Inserts the VLAN tag into @skb as part of the payload
+ * Returns a VLAN tagged skb. If a new skb is created, @skb is freed.
+ *
+ * Following the skb_unshare() example, in case of error, the calling function
+ * doesn't have to worry about freeing the original skb.
+ *
+ * Does not change skb->protocol so this function can be used during receive.
+ */
+static inline struct sk_buff *vlan_insert_tag(struct sk_buff *skb,
+					      __be16 vlan_proto, u16 vlan_tci)
+{
+	return vlan_insert_inner_tag(skb, vlan_proto, vlan_tci, ETH_HLEN);
+}
+
+/**
  * vlan_insert_tag_set_proto - regular VLAN tag inserting
  * @skb: skbuff to tag
  * @vlan_proto: VLAN encapsulation protocol
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index 64aa9f7..45c9bf5 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -48,8 +48,8 @@ bool vlan_do_receive(struct sk_buff **skbp)
 		 * original position later
 		 */
 		skb_push(skb, offset);
-		skb = *skbp = vlan_insert_tag(skb, skb->vlan_proto,
-					      skb->vlan_tci);
+		skb = *skbp = vlan_insert_inner_tag(skb, skb->vlan_proto,
+						    skb->vlan_tci, skb->mac_len);
 		if (!skb)
 			return false;
 		skb_pull(skb, offset + VLAN_HLEN);
-- 
1.8.3.1


* Re: [PATCH v2 net] net: xfrm: use preempt-safe this_cpu_read() in ipcomp_alloc_tfms()
From: Steffen Klassert @ 2018-03-13  6:50 UTC (permalink / raw)
  To: Greg Hackmann; +Cc: Herbert Xu, David S. Miller, netdev, linux-kernel
In-Reply-To: <20180307224253.152470-1-ghackmann@google.com>

On Wed, Mar 07, 2018 at 02:42:53PM -0800, Greg Hackmann wrote:
> f7c83bcbfaf5 ("net: xfrm: use __this_cpu_read per-cpu helper") added a
> __this_cpu_read() call inside ipcomp_alloc_tfms().
> 
> At the time, __this_cpu_read() required the caller to either not care
> about races or to handle preemption/interrupt issues.  3.15 tightened
> the rules around some per-cpu operations, and now __this_cpu_read()
> should never be used in a preemptible context.  On 3.15 and later, we
> need to use this_cpu_read() instead.
> 

...

> Signed-off-by: Greg Hackmann <ghackmann@google.com>

Patch applied, thanks!


* Re: [PATCH net-next v2] sctp: fix error return code in sctp_sendmsg_new_asoc()
From: Xin Long @ 2018-03-13  6:57 UTC (permalink / raw)
  To: Wei Yongjun
  Cc: Vlad Yasevich, Neil Horman, linux-sctp, network dev,
	kernel-janitors
In-Reply-To: <1520910210-147500-1-git-send-email-weiyongjun1@huawei.com>

On Tue, Mar 13, 2018 at 11:03 AM, Wei Yongjun <weiyongjun1@huawei.com> wrote:
> Return error code -EINVAL in the address len check error handling
> case since 'err' can be overwrite to 0 by 'err = sctp_verify_addr()'
> in the for loop.
>
> Fixes: 2c0dbaa0c43d ("sctp: add support for SCTP_DSTADDRV4/6 Information for sendmsg")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> ---
> v1 -> v2: remove the 'err' initialization
> ---
>  net/sctp/socket.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 7d3476a..af5cf29 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -1677,7 +1677,7 @@ static int sctp_sendmsg_new_asoc(struct sock *sk, __u16 sflags,
>         struct sctp_association *asoc;
>         enum sctp_scope scope;
>         struct cmsghdr *cmsg;
> -       int err = -EINVAL;
> +       int err;
>
>         *tp = NULL;
>
> @@ -1761,16 +1761,20 @@ static int sctp_sendmsg_new_asoc(struct sock *sk, __u16 sflags,
>                 memset(daddr, 0, sizeof(*daddr));
>                 dlen = cmsg->cmsg_len - sizeof(struct cmsghdr);
>                 if (cmsg->cmsg_type == SCTP_DSTADDRV4) {
> -                       if (dlen < sizeof(struct in_addr))
> +                       if (dlen < sizeof(struct in_addr)) {
> +                               err = -EINVAL;
>                                 goto free;
> +                       }
>
>                         dlen = sizeof(struct in_addr);
>                         daddr->v4.sin_family = AF_INET;
>                         daddr->v4.sin_port = htons(asoc->peer.port);
>                         memcpy(&daddr->v4.sin_addr, CMSG_DATA(cmsg), dlen);
>                 } else {
> -                       if (dlen < sizeof(struct in6_addr))
> +                       if (dlen < sizeof(struct in6_addr)) {
> +                               err = -EINVAL;
>                                 goto free;
> +                       }
>
>                         dlen = sizeof(struct in6_addr);
>                         daddr->v6.sin6_family = AF_INET6;
>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
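
For readers following along, the error-code pattern the quoted patch
addresses can be sketched in plain userspace C. All names here
(verify_addr_ok(), check_addrs_buggy(), check_addrs_fixed()) are
hypothetical and only mimic the shape of the loop, not the SCTP code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for an address check that succeeds and returns 0. */
static int verify_addr_ok(void)
{
	return 0;
}

/*
 * Buggy shape: err starts as -EINVAL, but a successful call earlier in
 * the loop resets it to 0, so a later "goto out" on a short address
 * reports success.
 */
static int check_addrs_buggy(const size_t *lens, size_t n, size_t min_len)
{
	int err = -EINVAL;
	size_t i;

	assert(lens != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (lens[i] < min_len)
			goto out;	/* err may already be 0 here */
		err = verify_addr_ok();
		if (err)
			goto out;
	}
out:
	return err;
}

/* Fixed shape: set the error code at the point of failure. */
static int check_addrs_fixed(const size_t *lens, size_t n, size_t min_len)
{
	int err = 0;
	size_t i;

	assert(lens != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (lens[i] < min_len) {
			err = -EINVAL;
			goto out;
		}
		err = verify_addr_ok();
		if (err)
			goto out;
	}
out:
	return err;
}
```

The sketch initializes err to 0 in the fixed variant only so that an
empty list has a defined result; that detail is an assumption of the
sketch, not something taken from the patch.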


* Re: [PATCH v2] sctp: Fix double free in sctp_sendmsg_to_asoc
From: Xin Long @ 2018-03-13  7:03 UTC (permalink / raw)
  To: Neil Horman; +Cc: linux-sctp, network dev, davem
In-Reply-To: <20180312181525.21774-1-nhorman@tuxdriver.com>

On Tue, Mar 13, 2018 at 2:15 AM, Neil Horman <nhorman@tuxdriver.com> wrote:
> syzbot/kasan detected a double free in sctp_sendmsg_to_asoc:
> BUG: KASAN: use-after-free in sctp_association_free+0x7b7/0x930
> net/sctp/associola.c:332
> Read of size 8 at addr ffff8801d8006ae0 by task syzkaller914861/4202
>
> CPU: 1 PID: 4202 Comm: syzkaller914861 Not tainted 4.16.0-rc4+ #258
> Hardware name: Google Google Compute Engine/Google Compute Engine
> 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x24d lib/dump_stack.c:53
>  print_address_description+0x73/0x250 mm/kasan/report.c:256
>  kasan_report_error mm/kasan/report.c:354 [inline]
>  kasan_report+0x23c/0x360 mm/kasan/report.c:412
>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
>  sctp_association_free+0x7b7/0x930 net/sctp/associola.c:332
>  sctp_sendmsg+0xc67/0x1a80 net/sctp/socket.c:2075
>  inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763
>  sock_sendmsg_nosec net/socket.c:629 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:639
>  SYSC_sendto+0x361/0x5c0 net/socket.c:1748
>  SyS_sendto+0x40/0x50 net/socket.c:1716
>  do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x42/0xb7
>
> This was introduced by commit:
> f84af33 sctp: factor out sctp_sendmsg_to_asoc from sctp_sendmsg
>
> As the newly refactored function moved the wait_for_sndbuf call to a
> point after the association was connected, allowing for peeloff events
> to occur, which in turn caused wait_for_sndbuf to return -EPIPE which
> was not caught by the logic that determines if an association should be
> freed or not.
>
> Fix it the easy way by returning the ordering of
> sctp_primitive_ASSOCIATE and sctp_wait_for_sndbuf to the old order, to
> ensure that EPIPE will not happen.
>
> Tested by myself using the syzbot reproducers with positive results
>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: davem@davemloft.net
> CC: Xin Long <lucien.xin@gmail.com>
> Reported-by: syzbot+a4e4112c3aff00c8cfd8@syzkaller.appspotmail.com
>
> ---
> Change notes
> v2)
>  * Moved additional calls to restore origional ordering
>  * add sctp prefix
> ---
>  net/sctp/socket.c | 26 +++++++++++++-------------
>  1 file changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 7d3476a4860d..4bbfcf9532c2 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -1876,6 +1876,19 @@ static int sctp_sendmsg_to_asoc(struct sctp_association *asoc,
>                 goto err;
>         }
>
> +       if (asoc->pmtu_pending)
> +               sctp_assoc_pending_pmtu(asoc);
> +
> +       if (sctp_wspace(asoc) < msg_len)
> +               sctp_prsctp_prune(asoc, sinfo, msg_len - sctp_wspace(asoc));
> +
> +       if (!sctp_wspace(asoc)) {
> +               timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
> +               err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len);
> +               if (err)
> +                       goto err;
> +       }
> +
>         if (sctp_state(asoc, CLOSED)) {
>                 err = sctp_primitive_ASSOCIATE(net, asoc, NULL);
>                 if (err)
> @@ -1893,19 +1906,6 @@ static int sctp_sendmsg_to_asoc(struct sctp_association *asoc,
>                 pr_debug("%s: we associated primitively\n", __func__);
>         }
>
> -       if (asoc->pmtu_pending)
> -               sctp_assoc_pending_pmtu(asoc);
> -
> -       if (sctp_wspace(asoc) < msg_len)
> -               sctp_prsctp_prune(asoc, sinfo, msg_len - sctp_wspace(asoc));
> -
> -       if (!sctp_wspace(asoc)) {
> -               timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
> -               err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len);
> -               if (err)
> -                       goto err;
> -       }
> -
>         datamsg = sctp_datamsg_from_user(asoc, sinfo, &msg->msg_iter);
>         if (IS_ERR(datamsg)) {
>                 err = PTR_ERR(datamsg);
> --
> 2.14.3
>
Reviewed-by: Xin Long <lucien.xin@gmail.com>


* [PATCH 3/9] xfrm_user: uncoditionally validate esn replay attribute struct
From: Steffen Klassert @ 2018-03-13  7:09 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20180313070953.21317-1-steffen.klassert@secunet.com>

From: Florian Westphal <fw@strlen.de>

The sanity test added in ecd7918745234 can be bypassed: validation
only occurs if the XFRM_STATE_ESN flag is set, but the rest of the code
doesn't care about the flag and just checks whether the attribute itself
is present.

So always validate.  The alternative is to reject the attribute when it
comes without the flag, but that would change the ABI.
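
A minimal userspace sketch of "always validate" follows. The struct, the
STATE_ESN and MAX_BMP_LEN constants, and the simplified length check are
all hypothetical stand-ins; the real check in verify_replay() is in the
diff below:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define STATE_ESN   0x1u	/* hypothetical stand-in for XFRM_STATE_ESN */
#define MAX_BMP_LEN 128u	/* hypothetical bound on bitmap words */

struct replay_attr {		/* hypothetical stand-in for the attribute */
	uint32_t bmp_len;	/* number of 32-bit bitmap words claimed */
	size_t attr_len;	/* bytes actually supplied by userspace */
};

/* Bytes needed for a header word plus bmp_len bitmap words. */
static size_t replay_len(const struct replay_attr *rs)
{
	assert(rs != NULL);
	return sizeof(uint32_t) + (size_t)rs->bmp_len * sizeof(uint32_t);
}

/* Validate whenever the attribute is present, regardless of the flag. */
static int verify_replay_sketch(unsigned int flags,
				const struct replay_attr *rs)
{
	if (!rs)
		return (flags & STATE_ESN) ? -EINVAL : 0;

	if (rs->bmp_len > MAX_BMP_LEN)
		return -EINVAL;

	if (rs->attr_len < replay_len(rs))
		return -EINVAL;

	return 0;
}
```

The point of the shape is that an oversized bmp_len is rejected even when
the flag is clear, which the flag-gated variant would let through.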

Reported-by: syzbot+0ab777c27d2bb7588f73@syzkaller.appspotmail.com
Cc: Mathias Krause <minipli@googlemail.com>
Fixes: ecd7918745234 ("xfrm_user: ensure user supplied esn replay window is valid")
Fixes: d8647b79c3b7e ("xfrm: Add user interface for esn and big anti-replay windows")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_user.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 7f52b8eb177d..080035f056d9 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -121,22 +121,17 @@ static inline int verify_replay(struct xfrm_usersa_info *p,
 	struct nlattr *rt = attrs[XFRMA_REPLAY_ESN_VAL];
 	struct xfrm_replay_state_esn *rs;
 
-	if (p->flags & XFRM_STATE_ESN) {
-		if (!rt)
-			return -EINVAL;
+	if (!rt)
+		return (p->flags & XFRM_STATE_ESN) ? -EINVAL : 0;
 
-		rs = nla_data(rt);
+	rs = nla_data(rt);
 
-		if (rs->bmp_len > XFRMA_REPLAY_ESN_MAX / sizeof(rs->bmp[0]) / 8)
-			return -EINVAL;
-
-		if (nla_len(rt) < (int)xfrm_replay_state_esn_len(rs) &&
-		    nla_len(rt) != sizeof(*rs))
-			return -EINVAL;
-	}
+	if (rs->bmp_len > XFRMA_REPLAY_ESN_MAX / sizeof(rs->bmp[0]) / 8)
+		return -EINVAL;
 
-	if (!rt)
-		return 0;
+	if (nla_len(rt) < (int)xfrm_replay_state_esn_len(rs) &&
+	    nla_len(rt) != sizeof(*rs))
+		return -EINVAL;
 
 	/* As only ESP and AH support ESN feature. */
 	if ((p->id.proto != IPPROTO_ESP) && (p->id.proto != IPPROTO_AH))
-- 
2.14.1


* [PATCH 4/9] xfrm: reuse uncached_list to track xdsts
From: Steffen Klassert @ 2018-03-13  7:09 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20180313070953.21317-1-steffen.klassert@secunet.com>

From: Xin Long <lucien.xin@gmail.com>

Previously, when an xdst was freed, it was first inserted into
dst_garbage.list. If its refcnt was still held somewhere, it was
later moved to dst_busy_list in dst_gc_task().

When a dev was being unregistered, the dev of each dst in
dst_busy_list was replaced with loopback_dev and the reference to
the original dev was dropped. This kept the dev's removal from
being blocked and avoided the kmsg warning:

  kernel:unregister_netdevice: waiting for veth0 to become \
  free. Usage count = 2

However, since commit 52df157f17e5 ("xfrm: take refcnt of dst
when creating struct xfrm_dst bundle"), the xdst is no longer
freed by dst gc, and this warning appears.

To fix it, we need to find the xdsts that are still held by
others when the dev is removed, release each xdst's dev and
replace it with loopback_dev.

But unfortunately after flow_cache for xfrm was deleted, no
list tracks them anymore. So we need to save these xdsts
somewhere to release the xdst's dev later.

To make this easier, this patch reuses uncached_list to
track xdsts, so that the dev refcnt can be released when
fib_netdev_notifier handles the NETDEV_UNREGISTER event.
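
The tracking idea can be sketched in userspace roughly as follows. The
struct dev and struct tracked_dst types and both helpers are hypothetical
stand-ins, and the sketch ignores the per-cpu lists and locking that the
real uncached_list uses:

```c
#include <assert.h>
#include <stddef.h>

struct dev {			/* hypothetical stand-in for a net device */
	const char *name;
	int refcnt;
};

struct tracked_dst {		/* hypothetical stand-in for a tracked xdst */
	struct dev *dev;
	struct tracked_dst *next;
};

static struct tracked_dst *tracked_list;

/* Take a device reference and remember the entry for later lookup. */
static void tracked_add(struct tracked_dst *d, struct dev *dev)
{
	assert(d && dev);
	dev->refcnt++;
	d->dev = dev;
	d->next = tracked_list;
	tracked_list = d;
}

/* On unregister: move every entry that pins "gone" over to loopback. */
static void tracked_flush_dev(struct dev *gone, struct dev *loopback)
{
	struct tracked_dst *d;

	for (d = tracked_list; d; d = d->next) {
		if (d->dev != gone)
			continue;
		loopback->refcnt++;
		d->dev = loopback;
		gone->refcnt--;
	}
}
```

Without a list like this there is nothing to walk at unregister time, so
the references held by still-live entries would keep the device pinned.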

Thanks to Florian for helping to move this fix forward quickly.

Fixes: 52df157f17e5 ("xfrm: take refcnt of dst when creating struct xfrm_dst bundle")
Reported-by: Jianlin Shi <jishi@redhat.com>
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Tested-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/ip6_route.h |  3 +++
 include/net/route.h     |  3 +++
 net/ipv4/route.c        | 21 +++++++++++++--------
 net/ipv4/xfrm4_policy.c |  4 +++-
 net/ipv6/route.c        |  4 ++--
 net/ipv6/xfrm6_policy.c |  5 +++++
 6 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 27d23a65f3cd..ac0866bb9e93 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -179,6 +179,9 @@ void rt6_disable_ip(struct net_device *dev, unsigned long event);
 void rt6_sync_down_dev(struct net_device *dev, unsigned long event);
 void rt6_multipath_rebalance(struct rt6_info *rt);
 
+void rt6_uncached_list_add(struct rt6_info *rt);
+void rt6_uncached_list_del(struct rt6_info *rt);
+
 static inline const struct rt6_info *skb_rt6_info(const struct sk_buff *skb)
 {
 	const struct dst_entry *dst = skb_dst(skb);
diff --git a/include/net/route.h b/include/net/route.h
index 1eb9ce470e25..40b870d58f38 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -227,6 +227,9 @@ struct in_ifaddr;
 void fib_add_ifaddr(struct in_ifaddr *);
 void fib_del_ifaddr(struct in_ifaddr *, struct in_ifaddr *);
 
+void rt_add_uncached_list(struct rtable *rt);
+void rt_del_uncached_list(struct rtable *rt);
+
 static inline void ip_rt_put(struct rtable *rt)
 {
 	/* dst_release() accepts a NULL parameter.
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 49cc1c1df1ba..1d1e4abe04b0 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1383,7 +1383,7 @@ struct uncached_list {
 
 static DEFINE_PER_CPU_ALIGNED(struct uncached_list, rt_uncached_list);
 
-static void rt_add_uncached_list(struct rtable *rt)
+void rt_add_uncached_list(struct rtable *rt)
 {
 	struct uncached_list *ul = raw_cpu_ptr(&rt_uncached_list);
 
@@ -1394,14 +1394,8 @@ static void rt_add_uncached_list(struct rtable *rt)
 	spin_unlock_bh(&ul->lock);
 }
 
-static void ipv4_dst_destroy(struct dst_entry *dst)
+void rt_del_uncached_list(struct rtable *rt)
 {
-	struct dst_metrics *p = (struct dst_metrics *)DST_METRICS_PTR(dst);
-	struct rtable *rt = (struct rtable *) dst;
-
-	if (p != &dst_default_metrics && refcount_dec_and_test(&p->refcnt))
-		kfree(p);
-
 	if (!list_empty(&rt->rt_uncached)) {
 		struct uncached_list *ul = rt->rt_uncached_list;
 
@@ -1411,6 +1405,17 @@ static void ipv4_dst_destroy(struct dst_entry *dst)
 	}
 }
 
+static void ipv4_dst_destroy(struct dst_entry *dst)
+{
+	struct dst_metrics *p = (struct dst_metrics *)DST_METRICS_PTR(dst);
+	struct rtable *rt = (struct rtable *)dst;
+
+	if (p != &dst_default_metrics && refcount_dec_and_test(&p->refcnt))
+		kfree(p);
+
+	rt_del_uncached_list(rt);
+}
+
 void rt_flush_dev(struct net_device *dev)
 {
 	struct net *net = dev_net(dev);
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 05017e2c849c..8d33f7b311f4 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -102,6 +102,7 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
 	xdst->u.rt.rt_pmtu = rt->rt_pmtu;
 	xdst->u.rt.rt_table_id = rt->rt_table_id;
 	INIT_LIST_HEAD(&xdst->u.rt.rt_uncached);
+	rt_add_uncached_list(&xdst->u.rt);
 
 	return 0;
 }
@@ -241,7 +242,8 @@ static void xfrm4_dst_destroy(struct dst_entry *dst)
 	struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
 
 	dst_destroy_metrics_generic(dst);
-
+	if (xdst->u.rt.rt_uncached_list)
+		rt_del_uncached_list(&xdst->u.rt);
 	xfrm_dst_destroy(xdst);
 }
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index fb2d251c0500..38b75e9d6eae 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -128,7 +128,7 @@ struct uncached_list {
 
 static DEFINE_PER_CPU_ALIGNED(struct uncached_list, rt6_uncached_list);
 
-static void rt6_uncached_list_add(struct rt6_info *rt)
+void rt6_uncached_list_add(struct rt6_info *rt)
 {
 	struct uncached_list *ul = raw_cpu_ptr(&rt6_uncached_list);
 
@@ -139,7 +139,7 @@ static void rt6_uncached_list_add(struct rt6_info *rt)
 	spin_unlock_bh(&ul->lock);
 }
 
-static void rt6_uncached_list_del(struct rt6_info *rt)
+void rt6_uncached_list_del(struct rt6_info *rt)
 {
 	if (!list_empty(&rt->rt6i_uncached)) {
 		struct uncached_list *ul = rt->rt6i_uncached_list;
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 09fb44ee3b45..416fe67271a9 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -113,6 +113,9 @@ static int xfrm6_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
 	xdst->u.rt6.rt6i_gateway = rt->rt6i_gateway;
 	xdst->u.rt6.rt6i_dst = rt->rt6i_dst;
 	xdst->u.rt6.rt6i_src = rt->rt6i_src;
+	INIT_LIST_HEAD(&xdst->u.rt6.rt6i_uncached);
+	rt6_uncached_list_add(&xdst->u.rt6);
+	atomic_inc(&dev_net(dev)->ipv6.rt6_stats->fib_rt_uncache);
 
 	return 0;
 }
@@ -244,6 +247,8 @@ static void xfrm6_dst_destroy(struct dst_entry *dst)
 	if (likely(xdst->u.rt6.rt6i_idev))
 		in6_dev_put(xdst->u.rt6.rt6i_idev);
 	dst_destroy_metrics_generic(dst);
+	if (xdst->u.rt6.rt6i_uncached_list)
+		rt6_uncached_list_del(&xdst->u.rt6);
 	xfrm_dst_destroy(xdst);
 }
 
-- 
2.14.1


* pull request (net): ipsec 2018-03-13
From: Steffen Klassert @ 2018-03-13  7:09 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

1) Refuse to insert 32 bit userspace socket policies on 64
   bit systems, as we already do for standard policies. We don't
   have a compat layer, so inserting socket policies from
   32 bit userspace will lead to a broken configuration.

2) Make the policy hold queue work without the flowcache.
   Dummy bundles are not cached anymore, so we need to
   generate a new one on each lookup as long as the SAs
   are not yet in place.

3) Fix the validation of the esn replay attribute. The
   sanity check in verify_replay() is bypassed if
   the XFRM_STATE_ESN flag is not set. Fix this by doing
   the sanity check unconditionally.
   From Florian Westphal.

4) After most of the dst_entry garbage collection code
   is removed, we may leak xfrm_dst entries as they are
   neither cached nor tracked somewhere. Fix this by
   reusing the 'uncached_list' to track xfrm_dst entries
   too. From Xin Long.

5) Fix an rcu_read_lock/rcu_read_unlock imbalance in
   xfrm_get_tos(). From Xin Long.

6) Fix an infinite loop in xfrm_get_dst_nexthop. On
   transport mode we fetch the child dst_entry after
   we continue, so this pointer is never updated.
   Fix this by fetching it before we continue.

7) Fix ESN sequence number gap after IPsec GSO packets.
   We accidentally increment the sequence number counter
   on the xfrm_state by one packet too many in the ESN
   case. Fix this by setting the sequence number to the
   correct value.

8) Reset the ethernet protocol after decapsulation only if a
   mac header was set. Otherwise it breaks configurations
   with TUN devices. From Yossi Kuperman.

9) Fix __this_cpu_read() usage in preemptible code. Use
   this_cpu_read() instead in ipcomp_alloc_tfms().
   From Greg Hackmann.

Please pull or let me know if there are problems.

Thanks!

The following changes since commit 743ffffefac1c670c6618742c923f6275d819604:

  net: pxa168_eth: add netconsole support (2018-02-01 14:58:37 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git master

for you to fetch changes up to 0dcd7876029b58770f769cbb7b484e88e4a305e5:

  net: xfrm: use preempt-safe this_cpu_read() in ipcomp_alloc_tfms() (2018-03-13 07:46:37 +0100)

----------------------------------------------------------------
Florian Westphal (1):
      xfrm_user: uncoditionally validate esn replay attribute struct

Greg Hackmann (1):
      net: xfrm: use preempt-safe this_cpu_read() in ipcomp_alloc_tfms()

Steffen Klassert (4):
      xfrm: Refuse to insert 32 bit userspace socket policies on 64 bit systems
      xfrm: Fix policy hold queue after flowcache removal.
      xfrm: Fix infinite loop in xfrm_get_dst_nexthop with transport mode.
      xfrm: Fix ESN sequence number handling for IPsec GSO packets.

Xin Long (2):
      xfrm: reuse uncached_list to track xdsts
      xfrm: do not call rcu_read_unlock when afinfo is NULL in xfrm_get_tos

Yossi Kuperman (1):
      xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto

 include/net/ip6_route.h      |  3 +++
 include/net/route.h          |  3 +++
 net/ipv4/route.c             | 21 +++++++++++++--------
 net/ipv4/xfrm4_mode_tunnel.c |  3 ++-
 net/ipv4/xfrm4_policy.c      |  4 +++-
 net/ipv6/route.c             |  4 ++--
 net/ipv6/xfrm6_mode_tunnel.c |  3 ++-
 net/ipv6/xfrm6_policy.c      |  5 +++++
 net/xfrm/xfrm_ipcomp.c       |  2 +-
 net/xfrm/xfrm_policy.c       | 13 ++++++++-----
 net/xfrm/xfrm_replay.c       |  2 +-
 net/xfrm/xfrm_state.c        |  5 +++++
 net/xfrm/xfrm_user.c         | 21 ++++++++-------------
 13 files changed, 56 insertions(+), 33 deletions(-)


* [PATCH 5/9] xfrm: do not call rcu_read_unlock when afinfo is NULL in xfrm_get_tos
From: Steffen Klassert @ 2018-03-13  7:09 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20180313070953.21317-1-steffen.klassert@secunet.com>

From: Xin Long <lucien.xin@gmail.com>

When xfrm_policy_get_afinfo() returns NULL, it does not hold the rcu
read lock. In this case rcu_read_unlock() must not be called in
xfrm_get_tos(), just as in the other callers of
xfrm_policy_get_afinfo().
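
The imbalance can be modelled in userspace with a plain counter standing
in for the rcu read-side nesting depth. Everything in this sketch
(read_lock_depth, struct afinfo, get_afinfo() and the two get_tos
variants) is a hypothetical stand-in, not kernel API:

```c
#include <assert.h>
#include <stddef.h>

static int read_lock_depth;	/* models read-side lock nesting */

struct afinfo {			/* hypothetical stand-in */
	int tos;
};

/* Returns with the read lock held only when the result is non-NULL. */
static const struct afinfo *get_afinfo(const struct afinfo *registered)
{
	read_lock_depth++;
	if (!registered)
		read_lock_depth--;	/* lookup failed: lock is dropped */
	return registered;
}

/* Buggy shape: unlocks even when the lookup returned NULL. */
static int get_tos_buggy(const struct afinfo *registered)
{
	const struct afinfo *afinfo = get_afinfo(registered);
	int tos = afinfo ? afinfo->tos : 0;

	read_lock_depth--;		/* unbalanced when afinfo is NULL */
	return tos;
}

/* Fixed shape: return early, since no lock is held on failure. */
static int get_tos_fixed(const struct afinfo *registered)
{
	const struct afinfo *afinfo = get_afinfo(registered);
	int tos;

	if (!afinfo)
		return 0;

	tos = afinfo->tos;
	read_lock_depth--;
	assert(read_lock_depth >= 0);
	return tos;
}
```

In the buggy variant the counter goes negative on the NULL path, which is
the userspace analogue of an unlock without a matching lock.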

Fixes: f5e2bb4f5b22 ("xfrm: policy: xfrm_get_tos cannot fail")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_policy.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 8b3811ff002d..150d46633ce6 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1458,10 +1458,13 @@ xfrm_tmpl_resolve(struct xfrm_policy **pols, int npols, const struct flowi *fl,
 static int xfrm_get_tos(const struct flowi *fl, int family)
 {
 	const struct xfrm_policy_afinfo *afinfo;
-	int tos = 0;
+	int tos;
 
 	afinfo = xfrm_policy_get_afinfo(family);
-	tos = afinfo ? afinfo->get_tos(fl) : 0;
+	if (!afinfo)
+		return 0;
+
+	tos = afinfo->get_tos(fl);
 
 	rcu_read_unlock();
 
-- 
2.14.1


* [PATCH 1/9] xfrm: Refuse to insert 32 bit userspace socket policies on 64 bit systems
From: Steffen Klassert @ 2018-03-13  7:09 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20180313070953.21317-1-steffen.klassert@secunet.com>

We don't have a compat layer for xfrm, so userspace and kernel
structures have different sizes in this case. This results in
a broken configuration, so refuse to configure socket policies
inserted from 32 bit userspace, as we already do for policies
inserted via netlink.

Reported-and-tested-by: syzbot+e1a1577ca8bcb47b769a@syzkaller.appspotmail.com
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_state.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 54e21f19d722..f9d2f2233f09 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2056,6 +2056,11 @@ int xfrm_user_policy(struct sock *sk, int optname, u8 __user *optval, int optlen
 	struct xfrm_mgr *km;
 	struct xfrm_policy *pol = NULL;
 
+#ifdef CONFIG_COMPAT
+	if (in_compat_syscall())
+		return -EOPNOTSUPP;
+#endif
+
 	if (!optval && !optlen) {
 		xfrm_sk_policy_insert(sk, XFRM_POLICY_IN, NULL);
 		xfrm_sk_policy_insert(sk, XFRM_POLICY_OUT, NULL);
-- 
2.14.1


* [PATCH 8/9] xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto
From: Steffen Klassert @ 2018-03-13  7:09 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20180313070953.21317-1-steffen.klassert@secunet.com>

From: Yossi Kuperman <yossiku@mellanox.com>

Artem Savkov reported that commit 5efec5c655dd leads to packet loss in
an IPsec configuration. It appears that his setup includes a TUN device,
which does not have a MAC header.

Make sure a MAC header exists before writing to it.

Note: a TUN device sets a MAC header pointer, although it does not have
a MAC header.

Fixes: 5efec5c655dd ("xfrm: Fix eth_hdr(skb)->h_proto to reflect inner IP version")
Reported-by: Artem Savkov <artem.savkov@gmail.com>
Tested-by: Artem Savkov <artem.savkov@gmail.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/xfrm4_mode_tunnel.c | 3 ++-
 net/ipv6/xfrm6_mode_tunnel.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/xfrm4_mode_tunnel.c b/net/ipv4/xfrm4_mode_tunnel.c
index 63faeee989a9..2a9764bd1719 100644
--- a/net/ipv4/xfrm4_mode_tunnel.c
+++ b/net/ipv4/xfrm4_mode_tunnel.c
@@ -92,7 +92,8 @@ static int xfrm4_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
 
 	skb_reset_network_header(skb);
 	skb_mac_header_rebuild(skb);
-	eth_hdr(skb)->h_proto = skb->protocol;
+	if (skb->mac_len)
+		eth_hdr(skb)->h_proto = skb->protocol;
 
 	err = 0;
 
diff --git a/net/ipv6/xfrm6_mode_tunnel.c b/net/ipv6/xfrm6_mode_tunnel.c
index bb935a3b7fea..de1b0b8c53b0 100644
--- a/net/ipv6/xfrm6_mode_tunnel.c
+++ b/net/ipv6/xfrm6_mode_tunnel.c
@@ -92,7 +92,8 @@ static int xfrm6_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
 
 	skb_reset_network_header(skb);
 	skb_mac_header_rebuild(skb);
-	eth_hdr(skb)->h_proto = skb->protocol;
+	if (skb->mac_len)
+		eth_hdr(skb)->h_proto = skb->protocol;
 
 	err = 0;
 
-- 
2.14.1

^ permalink raw reply related

* [PATCH 6/9] xfrm: Fix infinite loop in xfrm_get_dst_nexthop with transport mode.
From: Steffen Klassert @ 2018-03-13  7:09 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20180313070953.21317-1-steffen.klassert@secunet.com>

In transport mode we forget to fetch the child dst_entry
before we continue the while loop, which leads to an infinite
loop. Fix this by fetching the child dst_entry before we
continue the while loop.

Fixes: 0f6c480f23f4 ("xfrm: Move dst->path into struct xfrm_dst")
Reported-by: syzbot+7d03c810e50aaedef98a@syzkaller.appspotmail.com
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_policy.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 150d46633ce6..625b3fca5704 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2732,14 +2732,14 @@ static const void *xfrm_get_dst_nexthop(const struct dst_entry *dst,
 	while (dst->xfrm) {
 		const struct xfrm_state *xfrm = dst->xfrm;
 
+		dst = xfrm_dst_child(dst);
+
 		if (xfrm->props.mode == XFRM_MODE_TRANSPORT)
 			continue;
 		if (xfrm->type->flags & XFRM_TYPE_REMOTE_COADDR)
 			daddr = xfrm->coaddr;
 		else if (!(xfrm->type->flags & XFRM_TYPE_LOCAL_COADDR))
 			daddr = &xfrm->id.daddr;
-
-		dst = xfrm_dst_child(dst);
 	}
 	return daddr;
 }
-- 
2.14.1

^ permalink raw reply related
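
The loop change in PATCH 6/9 above can be modelled in userspace. This is
only a sketch: struct dst, struct state and get_nexthop() are hypothetical
stand-ins for struct dst_entry, struct xfrm_state and
xfrm_get_dst_nexthop(), and the coaddr handling is left out. It shows why
the child pointer has to be fetched before the "continue".

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct xfrm_state / struct dst_entry. */
enum mode { MODE_TRANSPORT, MODE_TUNNEL };

struct state {
	enum mode mode;
	const void *daddr;
};

struct dst {
	const struct state *xfrm;	/* NULL terminates the bundle */
	const struct dst *child;
};

/* Walk the bundle and return the address of the last tunnel-mode state. */
const void *get_nexthop(const struct dst *dst, const void *daddr)
{
	while (dst->xfrm) {
		const struct state *xfrm = dst->xfrm;

		/* Advance first, so "continue" cannot revisit this entry. */
		dst = dst->child;

		if (xfrm->mode == MODE_TRANSPORT)
			continue;
		daddr = xfrm->daddr;
	}
	return daddr;
}
```

With the advance placed at the bottom of the loop body instead, a
transport-mode entry would hit "continue" with dst unchanged and the loop
would never terminate.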

* [PATCH 2/9] xfrm: Fix policy hold queue after flowcache removal.
From: Steffen Klassert @ 2018-03-13  7:09 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20180313070953.21317-1-steffen.klassert@secunet.com>

Now that the flowcache is removed we need to generate
a new dummy bundle every time we check if the needed
SAs are in place because the dummy bundle is not cached
anymore. Fix it by passing the XFRM_LOOKUP_QUEUE flag
to xfrm_lookup(). This makes sure that we get a dummy
bundle in case the SAs are not yet in place.

Fixes: 3ca28286ea80 ("xfrm_policy: bypass flow_cache_lookup")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_policy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 7a23078132cf..8b3811ff002d 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1891,7 +1891,7 @@ static void xfrm_policy_queue_process(struct timer_list *t)
 	spin_unlock(&pq->hold_queue.lock);
 
 	dst_hold(xfrm_dst_path(dst));
-	dst = xfrm_lookup(net, xfrm_dst_path(dst), &fl, sk, 0);
+	dst = xfrm_lookup(net, xfrm_dst_path(dst), &fl, sk, XFRM_LOOKUP_QUEUE);
 	if (IS_ERR(dst))
 		goto purge_queue;
 
-- 
2.14.1

^ permalink raw reply related

* [PATCH 9/9] net: xfrm: use preempt-safe this_cpu_read() in ipcomp_alloc_tfms()
From: Steffen Klassert @ 2018-03-13  7:09 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20180313070953.21317-1-steffen.klassert@secunet.com>

From: Greg Hackmann <ghackmann@google.com>

f7c83bcbfaf5 ("net: xfrm: use __this_cpu_read per-cpu helper") added a
__this_cpu_read() call inside ipcomp_alloc_tfms().

At the time, __this_cpu_read() required the caller to either not care
about races or to handle preemption/interrupt issues.  3.15 tightened
the rules around some per-cpu operations, and now __this_cpu_read()
should never be used in a preemptible context.  On 3.15 and later, we
need to use this_cpu_read() instead.

syzkaller reported this leading to the following kernel BUG while
fuzzing sendmsg:

BUG: using __this_cpu_read() in preemptible [00000000] code: repro/3101
caller is ipcomp_init_state+0x185/0x990
CPU: 3 PID: 3101 Comm: repro Not tainted 4.16.0-rc4-00123-g86f84779d8e9 #154
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
 dump_stack+0xb9/0x115
 check_preemption_disabled+0x1cb/0x1f0
 ipcomp_init_state+0x185/0x990
 ? __xfrm_init_state+0x876/0xc20
 ? lock_downgrade+0x5e0/0x5e0
 ipcomp4_init_state+0xaa/0x7c0
 __xfrm_init_state+0x3eb/0xc20
 xfrm_init_state+0x19/0x60
 pfkey_add+0x20df/0x36f0
 ? pfkey_broadcast+0x3dd/0x600
 ? pfkey_sock_destruct+0x340/0x340
 ? pfkey_seq_stop+0x80/0x80
 ? __skb_clone+0x236/0x750
 ? kmem_cache_alloc+0x1f6/0x260
 ? pfkey_sock_destruct+0x340/0x340
 ? pfkey_process+0x62a/0x6f0
 pfkey_process+0x62a/0x6f0
 ? pfkey_send_new_mapping+0x11c0/0x11c0
 ? mutex_lock_io_nested+0x1390/0x1390
 pfkey_sendmsg+0x383/0x750
 ? dump_sp+0x430/0x430
 sock_sendmsg+0xc0/0x100
 ___sys_sendmsg+0x6c8/0x8b0
 ? copy_msghdr_from_user+0x3b0/0x3b0
 ? pagevec_lru_move_fn+0x144/0x1f0
 ? find_held_lock+0x32/0x1c0
 ? do_huge_pmd_anonymous_page+0xc43/0x11e0
 ? lock_downgrade+0x5e0/0x5e0
 ? get_kernel_page+0xb0/0xb0
 ? _raw_spin_unlock+0x29/0x40
 ? do_huge_pmd_anonymous_page+0x400/0x11e0
 ? __handle_mm_fault+0x553/0x2460
 ? __fget_light+0x163/0x1f0
 ? __sys_sendmsg+0xc7/0x170
 __sys_sendmsg+0xc7/0x170
 ? SyS_shutdown+0x1a0/0x1a0
 ? __do_page_fault+0x5a0/0xca0
 ? lock_downgrade+0x5e0/0x5e0
 SyS_sendmsg+0x27/0x40
 ? __sys_sendmsg+0x170/0x170
 do_syscall_64+0x19f/0x640
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x7f0ee73dfb79
RSP: 002b:00007ffe14fc15a8 EFLAGS: 00000207 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0ee73dfb79
RDX: 0000000000000000 RSI: 00000000208befc8 RDI: 0000000000000004
RBP: 00007ffe14fc15b0 R08: 00007ffe14fc15c0 R09: 00007ffe14fc15c0
R10: 0000000000000000 R11: 0000000000000207 R12: 0000000000400440
R13: 00007ffe14fc16b0 R14: 0000000000000000 R15: 0000000000000000

Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_ipcomp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_ipcomp.c b/net/xfrm/xfrm_ipcomp.c
index ccfdc7115a83..a00ec715aa46 100644
--- a/net/xfrm/xfrm_ipcomp.c
+++ b/net/xfrm/xfrm_ipcomp.c
@@ -283,7 +283,7 @@ static struct crypto_comp * __percpu *ipcomp_alloc_tfms(const char *alg_name)
 		struct crypto_comp *tfm;
 
 		/* This can be any valid CPU ID so we don't need locking. */
-		tfm = __this_cpu_read(*pos->tfms);
+		tfm = this_cpu_read(*pos->tfms);
 
 		if (!strcmp(crypto_comp_name(tfm), alg_name)) {
 			pos->users++;
-- 
2.14.1

^ permalink raw reply related

* [PATCH 7/9] xfrm: Fix ESN sequence number handling for IPsec GSO packets.
From: Steffen Klassert @ 2018-03-13  7:09 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20180313070953.21317-1-steffen.klassert@secunet.com>

When IPsec offloading was introduced, we accidentally incremented
the sequence number counter on the xfrm_state by one packet
too much in the ESN case. This leads to a sequence number gap of
one packet after each GSO packet. Fix this by setting the sequence
number to the correct value.

Fixes: d7dbefc45cf5 ("xfrm: Add xfrm_replay_overflow functions for offloading")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_replay.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_replay.c b/net/xfrm/xfrm_replay.c
index 1d38c6acf8af..9e3a5e85f828 100644
--- a/net/xfrm/xfrm_replay.c
+++ b/net/xfrm/xfrm_replay.c
@@ -660,7 +660,7 @@ static int xfrm_replay_overflow_offload_esn(struct xfrm_state *x, struct sk_buff
 		} else {
 			XFRM_SKB_CB(skb)->seq.output.low = oseq + 1;
 			XFRM_SKB_CB(skb)->seq.output.hi = oseq_hi;
-			xo->seq.low = oseq = oseq + 1;
+			xo->seq.low = oseq + 1;
 			xo->seq.hi = oseq_hi;
 			oseq += skb_shinfo(skb)->gso_segs;
 		}
-- 
2.14.1

^ permalink raw reply related
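
The off-by-one fixed by PATCH 7/9 above can be seen with a small userspace
model. This is a sketch under stated assumptions: struct replay and both
reserve_*() helpers are hypothetical, only the low 32 bits are modelled,
and wraparound into the high half is ignored.

```c
#include <stdint.h>

/* Hypothetical stand-in for the low half of the output sequence state. */
struct replay {
	uint32_t oseq;	/* last sequence number handed out */
};

/*
 * Reserve sequence numbers for one GSO packet carrying "segs" segments.
 * Returns the number assigned to the first segment.
 */
uint32_t reserve_fixed(struct replay *r, uint32_t segs)
{
	uint32_t first = r->oseq + 1;	/* xo->seq.low = oseq + 1 */

	r->oseq += segs;		/* oseq += gso_segs */
	return first;
}

/* Pre-fix behaviour: oseq is bumped once here and again by "segs". */
uint32_t reserve_buggy(struct replay *r, uint32_t segs)
{
	uint32_t first = r->oseq = r->oseq + 1;	/* xo->seq.low = oseq = oseq + 1 */

	r->oseq += segs;
	return first;
}
```

Starting from oseq = 0 with two 3-segment packets, the fixed variant hands
out first sequence numbers 1 and 4, while the pre-fix variant hands out 1
and 5, leaving number 4 unused after the first GSO packet.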

* Re: [PATCH net-next v2 3/4] ibmvnic: Pad small packets to minimum MTU size
From: kbuild test robot @ 2018-03-13  7:15 UTC (permalink / raw)
  To: Thomas Falcon; +Cc: kbuild-all, netdev, davem, jallen, nfont, Thomas Falcon
In-Reply-To: <1520873465-23312-4-git-send-email-tlfalcon@linux.vnet.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 10072 bytes --]

Hi Thomas,

I love your patch! Yet something to improve:

[auto build test ERROR on v4.16-rc4]
[also build test ERROR on next-20180309]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Thomas-Falcon/ibmvnic-Fix-VLAN-and-other-device-errata/20180313-125518
config: powerpc-allmodconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=powerpc 

All error/warnings (new ones prefixed by >>):

   drivers/net/ethernet/ibm/ibmvnic.c: In function 'ibmvnic_xmit':
>> drivers/net/ethernet/ibm/ibmvnic.c:1386:36: error: passing argument 2 of 'ibmvnic_xmit_workarounds' from incompatible pointer type [-Werror=incompatible-pointer-types]
     if (ibmvnic_xmit_workarounds(skb, adapter)) {
                                       ^~~~~~~
   drivers/net/ethernet/ibm/ibmvnic.c:1336:12: note: expected 'struct net_device *' but argument is of type 'struct ibmvnic_adapter *'
    static int ibmvnic_xmit_workarounds(struct sk_buff *skb,
               ^~~~~~~~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/ibm/ibmvnic.c: In function 'ibmvnic_xmit_workarounds':
>> drivers/net/ethernet/ibm/ibmvnic.c:1347:1: warning: control reaches end of non-void function [-Wreturn-type]
    }
    ^
   cc1: some warnings being treated as errors

vim +/ibmvnic_xmit_workarounds +1386 drivers/net/ethernet/ibm/ibmvnic.c

  1335	
  1336	static int ibmvnic_xmit_workarounds(struct sk_buff *skb,
  1337					    struct net_device *netdev)
  1338	{
  1339		/* For some backing devices, mishandling of small packets
  1340		 * can result in a loss of connection or TX stall. Device
  1341		 * architects recommend that no packet should be smaller
  1342		 * than the minimum MTU value provided to the driver, so
  1343		 * pad any packets to that length
  1344		 */
  1345		if (skb->len < netdev->min_mtu)
  1346			return skb_put_padto(skb, netdev->min_mtu);
> 1347	}
  1348	
  1349	static int ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
  1350	{
  1351		struct ibmvnic_adapter *adapter = netdev_priv(netdev);
  1352		int queue_num = skb_get_queue_mapping(skb);
  1353		u8 *hdrs = (u8 *)&adapter->tx_rx_desc_req;
  1354		struct device *dev = &adapter->vdev->dev;
  1355		struct ibmvnic_tx_buff *tx_buff = NULL;
  1356		struct ibmvnic_sub_crq_queue *tx_scrq;
  1357		struct ibmvnic_tx_pool *tx_pool;
  1358		unsigned int tx_send_failed = 0;
  1359		unsigned int tx_map_failed = 0;
  1360		unsigned int tx_dropped = 0;
  1361		unsigned int tx_packets = 0;
  1362		unsigned int tx_bytes = 0;
  1363		dma_addr_t data_dma_addr;
  1364		struct netdev_queue *txq;
  1365		unsigned long lpar_rc;
  1366		union sub_crq tx_crq;
  1367		unsigned int offset;
  1368		int num_entries = 1;
  1369		unsigned char *dst;
  1370		u64 *handle_array;
  1371		int index = 0;
  1372		u8 proto = 0;
  1373		int ret = 0;
  1374	
  1375		if (adapter->resetting) {
  1376			if (!netif_subqueue_stopped(netdev, skb))
  1377				netif_stop_subqueue(netdev, queue_num);
  1378			dev_kfree_skb_any(skb);
  1379	
  1380			tx_send_failed++;
  1381			tx_dropped++;
  1382			ret = NETDEV_TX_OK;
  1383			goto out;
  1384		}
  1385	
> 1386		if (ibmvnic_xmit_workarounds(skb, adapter)) {
  1387			tx_dropped++;
  1388			tx_send_failed++;
  1389			ret = NETDEV_TX_OK;
  1390			goto out;
  1391		}
  1392	
  1393		tx_pool = &adapter->tx_pool[queue_num];
  1394		tx_scrq = adapter->tx_scrq[queue_num];
  1395		txq = netdev_get_tx_queue(netdev, skb_get_queue_mapping(skb));
  1396		handle_array = (u64 *)((u8 *)(adapter->login_rsp_buf) +
  1397			be32_to_cpu(adapter->login_rsp_buf->off_txsubm_subcrqs));
  1398	
  1399		index = tx_pool->free_map[tx_pool->consumer_index];
  1400	
  1401		if (skb_is_gso(skb)) {
  1402			offset = tx_pool->tso_index * IBMVNIC_TSO_BUF_SZ;
  1403			dst = tx_pool->tso_ltb.buff + offset;
  1404			memset(dst, 0, IBMVNIC_TSO_BUF_SZ);
  1405			data_dma_addr = tx_pool->tso_ltb.addr + offset;
  1406			tx_pool->tso_index++;
  1407			if (tx_pool->tso_index == IBMVNIC_TSO_BUFS)
  1408				tx_pool->tso_index = 0;
  1409		} else {
  1410			offset = index * (adapter->req_mtu + VLAN_HLEN);
  1411			dst = tx_pool->long_term_buff.buff + offset;
  1412			memset(dst, 0, adapter->req_mtu + VLAN_HLEN);
  1413			data_dma_addr = tx_pool->long_term_buff.addr + offset;
  1414		}
  1415	
  1416		if (skb_shinfo(skb)->nr_frags) {
  1417			int cur, i;
  1418	
  1419			/* Copy the head */
  1420			skb_copy_from_linear_data(skb, dst, skb_headlen(skb));
  1421			cur = skb_headlen(skb);
  1422	
  1423			/* Copy the frags */
  1424			for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
  1425				const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
  1426	
  1427				memcpy(dst + cur,
  1428				       page_address(skb_frag_page(frag)) +
  1429				       frag->page_offset, skb_frag_size(frag));
  1430				cur += skb_frag_size(frag);
  1431			}
  1432		} else {
  1433			skb_copy_from_linear_data(skb, dst, skb->len);
  1434		}
  1435	
  1436		tx_pool->consumer_index =
  1437		    (tx_pool->consumer_index + 1) %
  1438			adapter->req_tx_entries_per_subcrq;
  1439	
  1440		tx_buff = &tx_pool->tx_buff[index];
  1441		tx_buff->skb = skb;
  1442		tx_buff->data_dma[0] = data_dma_addr;
  1443		tx_buff->data_len[0] = skb->len;
  1444		tx_buff->index = index;
  1445		tx_buff->pool_index = queue_num;
  1446		tx_buff->last_frag = true;
  1447	
  1448		memset(&tx_crq, 0, sizeof(tx_crq));
  1449		tx_crq.v1.first = IBMVNIC_CRQ_CMD;
  1450		tx_crq.v1.type = IBMVNIC_TX_DESC;
  1451		tx_crq.v1.n_crq_elem = 1;
  1452		tx_crq.v1.n_sge = 1;
  1453		tx_crq.v1.flags1 = IBMVNIC_TX_COMP_NEEDED;
  1454		tx_crq.v1.correlator = cpu_to_be32(index);
  1455		if (skb_is_gso(skb))
  1456			tx_crq.v1.dma_reg = cpu_to_be16(tx_pool->tso_ltb.map_id);
  1457		else
  1458			tx_crq.v1.dma_reg = cpu_to_be16(tx_pool->long_term_buff.map_id);
  1459		tx_crq.v1.sge_len = cpu_to_be32(skb->len);
  1460		tx_crq.v1.ioba = cpu_to_be64(data_dma_addr);
  1461	
  1462		if (adapter->vlan_header_insertion) {
  1463			tx_crq.v1.flags2 |= IBMVNIC_TX_VLAN_INSERT;
  1464			tx_crq.v1.vlan_id = cpu_to_be16(skb->vlan_tci);
  1465		}
  1466	
  1467		if (skb->protocol == htons(ETH_P_IP)) {
  1468			tx_crq.v1.flags1 |= IBMVNIC_TX_PROT_IPV4;
  1469			proto = ip_hdr(skb)->protocol;
  1470		} else if (skb->protocol == htons(ETH_P_IPV6)) {
  1471			tx_crq.v1.flags1 |= IBMVNIC_TX_PROT_IPV6;
  1472			proto = ipv6_hdr(skb)->nexthdr;
  1473		}
  1474	
  1475		if (proto == IPPROTO_TCP)
  1476			tx_crq.v1.flags1 |= IBMVNIC_TX_PROT_TCP;
  1477		else if (proto == IPPROTO_UDP)
  1478			tx_crq.v1.flags1 |= IBMVNIC_TX_PROT_UDP;
  1479	
  1480		if (skb->ip_summed == CHECKSUM_PARTIAL) {
  1481			tx_crq.v1.flags1 |= IBMVNIC_TX_CHKSUM_OFFLOAD;
  1482			hdrs += 2;
  1483		}
  1484		if (skb_is_gso(skb)) {
  1485			tx_crq.v1.flags1 |= IBMVNIC_TX_LSO;
  1486			tx_crq.v1.mss = cpu_to_be16(skb_shinfo(skb)->gso_size);
  1487			hdrs += 2;
  1488		}
  1489		/* determine if l2/3/4 headers are sent to firmware */
  1490		if ((*hdrs >> 7) & 1) {
  1491			build_hdr_descs_arr(tx_buff, &num_entries, *hdrs);
  1492			tx_crq.v1.n_crq_elem = num_entries;
  1493			tx_buff->indir_arr[0] = tx_crq;
  1494			tx_buff->indir_dma = dma_map_single(dev, tx_buff->indir_arr,
  1495							    sizeof(tx_buff->indir_arr),
  1496							    DMA_TO_DEVICE);
  1497			if (dma_mapping_error(dev, tx_buff->indir_dma)) {
  1498				dev_kfree_skb_any(skb);
  1499				tx_buff->skb = NULL;
  1500				if (!firmware_has_feature(FW_FEATURE_CMO))
  1501					dev_err(dev, "tx: unable to map descriptor array\n");
  1502				tx_map_failed++;
  1503				tx_dropped++;
  1504				ret = NETDEV_TX_OK;
  1505				goto out;
  1506			}
  1507			lpar_rc = send_subcrq_indirect(adapter, handle_array[queue_num],
  1508						       (u64)tx_buff->indir_dma,
  1509						       (u64)num_entries);
  1510		} else {
  1511			lpar_rc = send_subcrq(adapter, handle_array[queue_num],
  1512					      &tx_crq);
  1513		}
  1514		if (lpar_rc != H_SUCCESS) {
  1515			dev_err(dev, "tx failed with code %ld\n", lpar_rc);
  1516	
  1517			if (tx_pool->consumer_index == 0)
  1518				tx_pool->consumer_index =
  1519					adapter->req_tx_entries_per_subcrq - 1;
  1520			else
  1521				tx_pool->consumer_index--;
  1522	
  1523			dev_kfree_skb_any(skb);
  1524			tx_buff->skb = NULL;
  1525	
  1526			if (lpar_rc == H_CLOSED) {
  1527				/* Disable TX and report carrier off if queue is closed.
  1528				 * Firmware guarantees that a signal will be sent to the
  1529				 * driver, triggering a reset or some other action.
  1530				 */
  1531				netif_tx_stop_all_queues(netdev);
  1532				netif_carrier_off(netdev);
  1533			}
  1534	
  1535			tx_send_failed++;
  1536			tx_dropped++;
  1537			ret = NETDEV_TX_OK;
  1538			goto out;
  1539		}
  1540	
  1541		if (atomic_inc_return(&tx_scrq->used)
  1542						>= adapter->req_tx_entries_per_subcrq) {
  1543			netdev_info(netdev, "Stopping queue %d\n", queue_num);
  1544			netif_stop_subqueue(netdev, queue_num);
  1545		}
  1546	
  1547		tx_packets++;
  1548		tx_bytes += skb->len;
  1549		txq->trans_start = jiffies;
  1550		ret = NETDEV_TX_OK;
  1551	
  1552	out:
  1553		netdev->stats.tx_dropped += tx_dropped;
  1554		netdev->stats.tx_bytes += tx_bytes;
  1555		netdev->stats.tx_packets += tx_packets;
  1556		adapter->tx_send_failed += tx_send_failed;
  1557		adapter->tx_map_failed += tx_map_failed;
  1558		adapter->tx_stats_buffers[queue_num].packets += tx_packets;
  1559		adapter->tx_stats_buffers[queue_num].bytes += tx_bytes;
  1560		adapter->tx_stats_buffers[queue_num].dropped_packets += tx_dropped;
  1561	
  1562		return ret;
  1563	}
  1564	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 56598 bytes --]

^ permalink raw reply
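
The two diagnostics in the report above concern a helper that can fall off
its end without returning a value, and a call site that passes the adapter
where a net_device is expected. A minimal userspace sketch of one way to
address both follows. It is not the driver code: struct pkt, struct netdev,
pad_to() and xmit_workarounds() are hypothetical stand-ins for
struct sk_buff, struct net_device, skb_put_padto() and
ibmvnic_xmit_workarounds().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for struct sk_buff / struct net_device. */
struct pkt {
	unsigned char data[64];
	size_t len;
};

struct netdev {
	size_t min_mtu;
};

/* Zero-pad the packet up to "len" bytes; stands in for skb_put_padto(). */
int pad_to(struct pkt *p, size_t len)
{
	if (len > sizeof(p->data))
		return -1;	/* no room; the kernel helper returns a negative errno */
	if (p->len < len) {
		memset(p->data + p->len, 0, len - p->len);
		p->len = len;
	}
	return 0;
}

/* Every path returns a value, and the second argument is the net device. */
int xmit_workarounds(struct pkt *p, const struct netdev *dev)
{
	if (p->len < dev->min_mtu)
		return pad_to(p, dev->min_mtu);

	return 0;
}
```

In the driver itself the corresponding call in ibmvnic_xmit() would pass
netdev rather than adapter, matching the helper's prototype.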

* Re: Problem with bridge (mcast-to-ucast + hairpin) and Broadcom's 802.11f in their FullMAC fw
From: Felix Fietkau @ 2018-03-13  7:17 UTC (permalink / raw)
  To: Rafał Miłecki, Linus Lüssing, Arend van Spriel,
	Franky Lin, Hante Meuleman, Chi-Hsien Lin, Wright Feng,
	Pieter-Paul Giesberts
  Cc: Network Development,
	open list:BROADCOM BRCM80211 IEEE802.11n WIRELESS DRIVER, bridge,
	brcm80211-dev-list, inux-wireless
In-Reply-To: <CACna6rz9L09g9oeHhvt209Tg1E3gKgmhGnYF653AdkXfZf=4kw@mail.gmail.com>

On 2018-02-27 11:08, Rafał Miłecki wrote:
> I've problem when using OpenWrt/LEDE on a home router with Broadcom's
> FullMAC WiFi chipset.
> 
> 
> First of all OpenWrt/LEDE uses bridge interface for LAN network with:
> 1) IFLA_BRPORT_MCAST_TO_UCAST
> 2) Clients isolation in hostapd
> 3) Hairpin mode enabled
> 
> For more details please see Linus's patch description:
> https://patchwork.kernel.org/patch/9530669/
> and maybe hairpin mode patch:
> https://lwn.net/Articles/347344/
> 
> Short version: in that setup packets received from a bridged wireless
> interface can be handed back to it for transmission.
> 
> 
> Now, Broadcom's firmware for their FullMAC chipsets in AP mode
> supports an obsoleted 802.11f AKA IAPP standard. It's a roaming
> standard that was replaced by 802.11r.
> 
> Whenever a new station associates, firmware generates a packet like:
> ff ff ff ff  ff ff ec 10  7b 5f ?? ??  00 06 00 01  af 81 01 00
> (just masked 2 bytes of my MAC)
> 
> For more details you can see discussion in my brcmfmac patch thread:
> https://patchwork.kernel.org/patch/10191451/
> 
> 
> The problem is that bridge (in setup as above) hands such a packet
> back to the device.
> 
> That makes Broadcom's FullMAC firmware believe that a given station
> just connected to another AP in a network (which doesn't even exist).
> As a result firmware immediately disassociates that station. It's
> simply impossible to connect to the router. Every association is
> followed by immediate disassociation.
> 
> 
> Can you see any solution for this problem? Is that an option to stop
> multicast-to-unicast from touching 802.11f packets? Some other ideas?
> Obviously I can't modify Broadcom's firmware and drop that obsoleted
> standard.
Let's look at it from a different angle: Since these packets are
forwarded as normal packets by the bridge, and the Broadcom firmware
reacts to them in this nasty way, that's basically a local DoS security
issue. In my opinion that matters a lot more than having support for an
obsolete feature that almost nobody will ever want to use.

I think the right approach to deal with this issue is to drop these
garbage packets in both the receive and transmit path of brcmfmac.

- Felix

^ permalink raw reply
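
Dropping these frames, as suggested in the reply above, first needs a way
to recognise them. The following is a sketch only, built from the 20 bytes
quoted in the report: broadcast destination, a length field of 0x0006, and
the payload 00 01 af 81 01 00. The helper name is_l2_update_frame() is
hypothetical, and accepting frames longer than 20 bytes (to tolerate
padding) is an assumption rather than something stated in the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { ADDR_LEN = 6, HDR_LEN = 14 };

/*
 * Hypothetical helper: true if "frame" starts like the broadcast frame
 * quoted above (length field 0x0006, payload 00 01 af 81 01 00).
 */
bool is_l2_update_frame(const uint8_t *frame, size_t len)
{
	static const uint8_t bcast[ADDR_LEN] = {
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	};
	static const uint8_t payload[6] = {
		0x00, 0x01, 0xaf, 0x81, 0x01, 0x00,
	};

	if (len < HDR_LEN + sizeof(payload))
		return false;
	if (memcmp(frame, bcast, ADDR_LEN) != 0)
		return false;
	/* Bytes 12-13 hold the 802.3 length field, here 6. */
	if (frame[12] != 0x00 || frame[13] != 0x06)
		return false;
	return memcmp(frame + HDR_LEN, payload, sizeof(payload)) == 0;
}
```

A driver could call such a check in its receive and transmit paths and
free the buffer when it matches; where exactly that belongs in brcmfmac is
left open here.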

* Re: Problem with bridge (mcast-to-ucast + hairpin) and Broadcom's 802.11f in their FullMAC fw
From: Felix Fietkau @ 2018-03-13  7:20 UTC (permalink / raw)
  To: Rafał Miłecki, Linus Lüssing, Arend van Spriel,
	Franky Lin, Hante Meuleman, Chi-Hsien Lin, Wright Feng,
	Pieter-Paul Giesberts
  Cc: Network Development,
	bridge-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	open list:BROADCOM BRCM80211 IEEE802.11n WIRELESS DRIVER,
	brcm80211-dev-list-+wT8y+m8/X5BDgjK7y7TUQ
In-Reply-To: <CACna6rz9L09g9oeHhvt209Tg1E3gKgmhGnYF653AdkXfZf=4kw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

[resent with fixed typo in linux-wireless address]

On 2018-02-27 11:08, Rafał Miłecki wrote:
> I've problem when using OpenWrt/LEDE on a home router with Broadcom's
> FullMAC WiFi chipset.
> 
> 
> First of all OpenWrt/LEDE uses bridge interface for LAN network with:
> 1) IFLA_BRPORT_MCAST_TO_UCAST
> 2) Clients isolation in hostapd
> 3) Hairpin mode enabled
> 
> For more details please see Linus's patch description:
> https://patchwork.kernel.org/patch/9530669/
> and maybe hairpin mode patch:
> https://lwn.net/Articles/347344/
> 
> Short version: in that setup packets received from a bridged wireless
> interface can be handed back to it for transmission.
> 
> 
> Now, Broadcom's firmware for their FullMAC chipsets in AP mode
> supports an obsoleted 802.11f AKA IAPP standard. It's a roaming
> standard that was replaced by 802.11r.
> 
> Whenever a new station associates, firmware generates a packet like:
> ff ff ff ff  ff ff ec 10  7b 5f ?? ??  00 06 00 01  af 81 01 00
> (just masked 2 bytes of my MAC)
> 
> For more details you can see discussion in my brcmfmac patch thread:
> https://patchwork.kernel.org/patch/10191451/
> 
> 
> The problem is that bridge (in setup as above) hands such a packet
> back to the device.
> 
> That makes Broadcom's FullMAC firmware believe that a given station
> just connected to another AP in a network (which doesn't even exist).
> As a result firmware immediately disassociates that station. It's
> simply impossible to connect to the router. Every association is
> followed by immediate disassociation.
> 
> 
> Can you see any solution for this problem? Is that an option to stop
> multicast-to-unicast from touching 802.11f packets? Some other ideas?
> Obviously I can't modify Broadcom's firmware and drop that obsoleted
> standard.
Let's look at it from a different angle: Since these packets are
forwarded as normal packets by the bridge, and the Broadcom firmware
reacts to them in this nasty way, that's basically a local DoS security
issue. In my opinion that matters a lot more than having support for an
obsolete feature that almost nobody will ever want to use.

I think the right approach to deal with this issue is to drop these
garbage packets in both the receive and transmit path of brcmfmac.

- Felix

^ permalink raw reply

* Re: [2/2] net/usb/ax88179_178a: Delete three unnecessary variables in ax88179_chk_eee()
From: SF Markus Elfring @ 2018-03-13  7:24 UTC (permalink / raw)
  To: Oliver Neukum, linux-usb, netdev
  Cc: kernel-janitors, LKML, Andrew F. Davis, Andrew Lunn,
	Bjørn Mork, David S. Miller, Philippe Reynes, Yuval Shaia
In-Reply-To: <1520849038.29340.3.camel@suse.com>

>> Use three values directly for a condition check without assigning them
>> to intermediate variables.
> 
> Hi,
> 
> what is the benefit of this?

I proposed a small source code reduction.

Other software design directions might become more interesting for this use case.

Regards,
Markus

^ permalink raw reply

* Re: [pci PATCH v5 1/4] pci: Add pci_sriov_configure_simple for PFs that don't manage VF resources
From: Christoph Hellwig @ 2018-03-13  7:44 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Keith Busch, Bjorn Helgaas, Duyck, Alexander H, linux-pci,
	virtio-dev, kvm, Netdev, Daly, Dan, LKML, linux-nvme, netanel,
	Maximilian Heyne, Wang, Liang-min, Rustad, Mark D,
	David Woodhouse, Christoph Hellwig, dwmw
In-Reply-To: <CAKgT0UfH+xXk__R_hEtFMsm7qkBG02hWC-S=8MgYkeeEx5zweA@mail.gmail.com>

On Mon, Mar 12, 2018 at 01:17:00PM -0700, Alexander Duyck wrote:
> No, I am aware of those. The problem is they aren't accessed as
> function pointers. As such converting them to static inline functions
> is easy. As I am sure you are aware an "inline" function doesn't
> normally generate a function pointer.

I think Keith's original idea of defining them to NULL is right.  That
takes care of all the current trivial assign to struct cases.

If someone wants to call these functions they'll still need the ifdef
around the call as those won't otherwise compile, but they probably
want the ifdef around the whole caller anyway.

^ permalink raw reply
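
The "define them to NULL" approach from the reply above can be sketched as
follows. All names are hypothetical stand-ins: HAVE_IOV plays the role of
the real Kconfig symbol, and struct driver and sriov_configure_simple
mirror the thread rather than the actual PCI headers.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the names mirror the thread, not the real headers. */
struct pci_dev;

struct driver {
	int (*sriov_configure)(struct pci_dev *dev, int num_vfs);
};

#ifdef HAVE_IOV		/* stands in for the real Kconfig symbol */
int sriov_configure_simple(struct pci_dev *dev, int num_vfs);
#else
/* No implementation built: the name still works as a struct initializer. */
#define sriov_configure_simple NULL
#endif

struct driver example_driver = {
	.sriov_configure = sriov_configure_simple,
};

/* A direct call would not compile without the ifdef around it. */
int configure(struct pci_dev *dev, int num_vfs)
{
#ifdef HAVE_IOV
	return sriov_configure_simple(dev, num_vfs);
#else
	(void)dev;
	(void)num_vfs;
	return -1;
#endif
}
```

The struct assignment compiles in both configurations, which covers the
trivial "assign to struct" users; only code that calls the function
directly needs its own ifdef.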

* Re: aio poll, io_pgetevents and a new in-kernel poll API V5
From: Christoph Hellwig @ 2018-03-13  7:46 UTC (permalink / raw)
  To: viro; +Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
	linux-kernel
In-Reply-To: <20180305212743.16664-1-hch@lst.de>

ping?

On Mon, Mar 05, 2018 at 01:27:07PM -0800, Christoph Hellwig wrote:
> Hi all,
> 
> this series adds support for the IOCB_CMD_POLL operation to poll for the
> readiness of file descriptors using the aio subsystem.  The API is based
> on patches that existed in RHAS2.1 and RHEL3, which means it already is
> supported by libaio.  To implement the poll support efficiently new
> methods to poll are introduced in struct file_operations:  get_poll_head
> and poll_mask.  The first one returns a wait_queue_head to wait on
> (lifetime is bound by the file), and the second does a non-blocking
> check for the POLL* events.  This allows aio poll to work without
> any additional context switches, unlike epoll.
> 
> To make the interface fully useful a new io_pgetevents system call is
> added, which atomically saves and restores the signal mask over the
> io_pgetevents system call.  It is the logical equivalent to pselect and
> ppoll for io_pgetevents.
> 
> The corresponding libaio changes for io_pgetevents support and
> documentation, as well as a test case will be posted in a separate
> series.
> 
> The changes were sponsored by Scylladb, and improve performance
> of the seastar framework up to 10%, while also removing the need
> for a privileged SCHED_FIFO epoll listener thread.
> 
>     git://git.infradead.org/users/hch/vfs.git aio-poll.5
> 
> Gitweb:
> 
>     http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/aio-poll.5
> 
> Libaio changes:
> 
>     https://pagure.io/libaio.git io-poll
> 
> Seastar changes (not updated for the new io_pgetevents ABI yet):
> 
>     https://github.com/avikivity/seastar/commits/aio
> 
> Changes since V4:
>  - rebased on top of Linux 4.16-rc4
> 
> Changes since V3:
>  - remove the pre-sleep ->poll_mask call in vfs_poll,
>    allow ->get_poll_head to return POLL* values.
> 
> Changes since V2:
>  - removed a double initialization
>  - new vfs_get_poll_head helper
>  - document that ->get_poll_head can return NULL
>  - call ->poll_mask before sleeping
>  - various ACKs
>  - add conversion of random to ->poll_mask
>  - add conversion of af_alg to ->poll_mask
>  - lacking ->poll_mask support now returns -EINVAL for IOCB_CMD_POLL
>  - reshuffled the series so that prep patches and everything not
>    requiring the new in-kernel poll API is in the beginning
> 
> Changes since V1:
>  - handle the NULL ->poll case in vfs_poll
>  - dropped the file argument to the ->poll_mask socket operation
>  - replace the ->pre_poll socket operation with ->get_poll_head as
>    in the file operations
---end quoted text---

^ permalink raw reply

* Re: WARNING in kmalloc_slab (4)
From: Steffen Klassert @ 2018-03-13  7:51 UTC (permalink / raw)
  To: syzbot; +Cc: davem, herbert, linux-kernel, netdev, syzkaller-bugs
In-Reply-To: <001a114214fac20a80056746440a@google.com>

On Tue, Mar 13, 2018 at 12:33:02AM -0700, syzbot wrote:
> Hello,
> 
> syzbot hit the following crash on net-next commit
> f44b1886a5f876c87b5889df463ad7b97834ba37 (Fri Mar 9 18:10:06 2018 +0000)
> Merge branch 's390-qeth-next'
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+6a7e7ed886bde43469c4@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
> 
> WARNING: CPU: 1 PID: 27333 at mm/slab_common.c:1012 kmalloc_slab+0x5d/0x70
> mm/slab_common.c:1012
> Kernel panic - not syncing: panic_on_warn set ...
> 
> syz-executor0: vmalloc: allocation failure: 17045651456 bytes,
> mode:0x14080c0(GFP_KERNEL|__GFP_ZERO), nodemask=(null)
> CPU: 1 PID: 27333 Comm: syz-executor2 Not tainted 4.16.0-rc4+ #260
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x24d lib/dump_stack.c:53
>  panic+0x1e4/0x41c kernel/panic.c:183
> syz-executor0 cpuset=
>  __warn+0x1dc/0x200 kernel/panic.c:547
> /
>  mems_allowed=0
>  report_bug+0x211/0x2d0 lib/bug.c:184
>  fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
>  fixup_bug arch/x86/kernel/traps.c:247 [inline]
>  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>  invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
> RIP: 0010:kmalloc_slab+0x5d/0x70 mm/slab_common.c:1012
> RSP: 0018:ffff8801ccfc72f0 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 0000000010000018 RCX: ffffffff84ec4fc8
> RDX: 0000000000000ba7 RSI: 0000000000000000 RDI: 0000000010000018
> RBP: ffff8801ccfc72f0 R08: 0000000000000000 R09: 1ffff100399f8e21
> R10: ffff8801ccfc7040 R11: 0000000000000001 R12: 0000000000000018
> R13: ffff8801ccfc7598 R14: 00000000014080c0 R15: ffff8801aebaad80
>  __do_kmalloc mm/slab.c:3700 [inline]
>  __kmalloc+0x25/0x760 mm/slab.c:3714
>  kmalloc include/linux/slab.h:517 [inline]
>  kzalloc include/linux/slab.h:701 [inline]
>  xfrm_alloc_replay_state_esn net/xfrm/xfrm_user.c:442 [inline]

This is likely fixed with:

commit d97ca5d714a5334aecadadf696875da40f1fbf3e
xfrm_user: uncoditionally validate esn replay attribute struct

The patch is included in the ipsec pull request for the net
tree I've sent this morning.
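
For readers following the trace, here is a minimal userspace sketch of the
arithmetic involved. It assumes RDI above (0x10000018) holds the size passed
down from kzalloc(). The struct mirrors the field layout of
struct xfrm_replay_state_esn, but the names esn_state, esn_len, esn_validate
and the cap ESN_MAX_BMP_WORDS are made up for illustration and are not the
code from the commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in with the same field layout as the uapi struct. */
struct esn_state {
	uint32_t bmp_len;        /* number of 32-bit words in bmp[] */
	uint32_t oseq;
	uint32_t seq;
	uint32_t oseq_hi;
	uint32_t seq_hi;
	uint32_t replay_window;
	uint32_t bmp[];          /* replay bitmap, bmp_len words */
};

/* Hypothetical cap for this sketch; not the kernel's constant. */
#define ESN_MAX_BMP_WORDS 128u

/* Bytes needed for the fixed header plus bmp_len bitmap words. */
size_t esn_len(uint32_t bmp_len)
{
	return sizeof(struct esn_state) + (size_t)bmp_len * sizeof(uint32_t);
}

/* Reject an untrusted bmp_len before it is ever turned into a size. */
int esn_validate(uint32_t bmp_len)
{
	if (bmp_len > ESN_MAX_BMP_WORDS)
		return -1;
	return 0;
}
```

With a 24-byte header, a bmp_len of 0x4000000 gives esn_len() == 0x10000018,
which matches the register value in the trace. A check along the lines of
esn_validate() would reject that length before any allocation is attempted.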

^ permalink raw reply

* Re: WARNING in kmalloc_slab (4)
From: Dmitry Vyukov @ 2018-03-13  8:04 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: syzbot, David Miller, Herbert Xu, LKML, netdev, syzkaller-bugs
In-Reply-To: <20180313075143.b3ymdpt3nj3vnz77@gauss3.secunet.de>

On Tue, Mar 13, 2018 at 10:51 AM, Steffen Klassert
<steffen.klassert@secunet.com> wrote:
> On Tue, Mar 13, 2018 at 12:33:02AM -0700, syzbot wrote:
>> Hello,
>>
>> syzbot hit the following crash on net-next commit
>> f44b1886a5f876c87b5889df463ad7b97834ba37 (Fri Mar 9 18:10:06 2018 +0000)
>> Merge branch 's390-qeth-next'
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>> Raw console output is attached.
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+6a7e7ed886bde43469c4@syzkaller.appspotmail.com
>> It will help syzbot understand when the bug is fixed. See footer for
>> details.
>> If you forward the report, please keep this part and the footer.
>>
>> WARNING: CPU: 1 PID: 27333 at mm/slab_common.c:1012 kmalloc_slab+0x5d/0x70
>> mm/slab_common.c:1012
>> Kernel panic - not syncing: panic_on_warn set ...
>>
>> syz-executor0: vmalloc: allocation failure: 17045651456 bytes,
>> mode:0x14080c0(GFP_KERNEL|__GFP_ZERO), nodemask=(null)
>> CPU: 1 PID: 27333 Comm: syz-executor2 Not tainted 4.16.0-rc4+ #260
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> Google 01/01/2011
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:17 [inline]
>>  dump_stack+0x194/0x24d lib/dump_stack.c:53
>>  panic+0x1e4/0x41c kernel/panic.c:183
>> syz-executor0 cpuset=
>>  __warn+0x1dc/0x200 kernel/panic.c:547
>> /
>>  mems_allowed=0
>>  report_bug+0x211/0x2d0 lib/bug.c:184
>>  fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
>>  fixup_bug arch/x86/kernel/traps.c:247 [inline]
>>  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
>>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>>  invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
>> RIP: 0010:kmalloc_slab+0x5d/0x70 mm/slab_common.c:1012
>> RSP: 0018:ffff8801ccfc72f0 EFLAGS: 00010246
>> RAX: 0000000000000000 RBX: 0000000010000018 RCX: ffffffff84ec4fc8
>> RDX: 0000000000000ba7 RSI: 0000000000000000 RDI: 0000000010000018
>> RBP: ffff8801ccfc72f0 R08: 0000000000000000 R09: 1ffff100399f8e21
>> R10: ffff8801ccfc7040 R11: 0000000000000001 R12: 0000000000000018
>> R13: ffff8801ccfc7598 R14: 00000000014080c0 R15: ffff8801aebaad80
>>  __do_kmalloc mm/slab.c:3700 [inline]
>>  __kmalloc+0x25/0x760 mm/slab.c:3714
>>  kmalloc include/linux/slab.h:517 [inline]
>>  kzalloc include/linux/slab.h:701 [inline]
>>  xfrm_alloc_replay_state_esn net/xfrm/xfrm_user.c:442 [inline]
>
> This is likely fixed with:
>
> commit d97ca5d714a5334aecadadf696875da40f1fbf3e
> xfrm_user: uncoditionally validate esn replay attribute struct
>
> The patch is included in the ipsec pull request for the net
> tree I've sent this morning.

Let's tell syzbot:

#syz fix: xfrm_user: uncoditionally validate esn replay attribute struct

^ permalink raw reply

* Re: [pci PATCH v5 3/4] ena: Migrate over to unmanaged SR-IOV support
From: David Woodhouse @ 2018-03-13  8:12 UTC (permalink / raw)
  To: Alexander Duyck, bhelgaas, alexander.h.duyck, linux-pci
  Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
	keith.busch, netanel, mheyne, liang-min.wang, mark.d.rustad, hch
In-Reply-To: <20180312172309.3487.76690.stgit@localhost.localdomain>

On Mon, 2018-03-12 at 10:23 -0700, Alexander Duyck wrote:
> 
> -       .sriov_configure = ena_sriov_configure,
> +#ifdef CONFIG_PCI_IOV
> +       .sriov_configure = pci_sriov_configure_simple,
> +#endif
>  };

I'd like to see that ifdef go away, as discussed. I agree that just
#define pci_sriov_configure_simple NULL
should suffice. As Christoph points out, it's not going to compile if
people try to just invoke it directly.
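
The pattern suggested above can be modelled in userspace as follows. This is
only a sketch: the mock_* types and functions and the MOCK_CONFIG_PCI_IOV
switch are hypothetical stand-ins for struct pci_dev, struct pci_driver,
pci_sriov_configure_simple and CONFIG_PCI_IOV, not the kernel definitions.

```c
#include <stddef.h>

/* Mock types standing in for the kernel's pci_dev and pci_driver. */
struct mock_pci_dev {
	int num_vfs;
};

struct mock_pci_driver {
	const char *name;
	int (*sriov_configure)(struct mock_pci_dev *dev, int num_vfs);
};

#ifdef MOCK_CONFIG_PCI_IOV
/* IOV built in: a real helper that records the requested VF count. */
int mock_sriov_configure_simple(struct mock_pci_dev *dev, int num_vfs)
{
	dev->num_vfs = num_vfs;
	return num_vfs;
}
#else
/* IOV compiled out: the name collapses to a null callback. */
#define mock_sriov_configure_simple NULL
#endif

/* The driver table needs no #ifdef of its own. */
struct mock_pci_driver mock_ena_driver = {
	.name = "mock_ena",
	.sriov_configure = mock_sriov_configure_simple,
};

/* Core-side caller: a null callback is treated as "not supported". */
int mock_set_numvfs(struct mock_pci_driver *drv,
		    struct mock_pci_dev *dev, int num_vfs)
{
	if (!drv->sriov_configure)
		return -1;
	return drv->sriov_configure(dev, num_vfs);
}
```

Built without MOCK_CONFIG_PCI_IOV, the initializer compiles unchanged and
mock_set_numvfs() returns -1. A direct call such as
mock_sriov_configure_simple(dev, 4) would expand to NULL(dev, 4) and fail to
compile, which is the property mentioned above.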

I'd also *really* like to see a way to enable this for PFs which don't
have (and don't need) a driver. We seem to have lost that along the
way.

^ permalink raw reply

* Re: [pci PATCH v5 3/4] ena: Migrate over to unmanaged SR-IOV support
From: Christoph Hellwig @ 2018-03-13  8:16 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Alexander Duyck, bhelgaas, alexander.h.duyck, linux-pci,
	virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
	keith.busch, netanel, mheyne, liang-min.wang, mark.d.rustad, hch
In-Reply-To: <1520928772.28745.53.camel@infradead.org>

On Tue, Mar 13, 2018 at 08:12:52AM +0000, David Woodhouse wrote:
> I'd also *really* like to see a way to enable this for PFs which don't
> have (and don't need) a driver. We seem to have lost that along the
> way.

We've been back and forth on that.  I agree that not having any driver
just seems dangerous.  If your PF really does nothing we should just
have a trivial pf_stub driver that does nothing but wire up
pci_sriov_configure_simple.  We can then add PCI IDs to it either
statically, or using the dynamic ids mechanism.
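
A rough userspace model of such a stub is sketched below. Everything in it is
hypothetical: the mock_* types, the stub_sriov_configure_simple stand-in, and
the vendor/device IDs are invented for illustration and only mimic the shape
of a PCI driver with a static ID table.

```c
#include <stddef.h>
#include <stdint.h>

struct mock_pci_id {
	uint16_t vendor;
	uint16_t device;
};

struct mock_pf {
	struct mock_pci_id id;
	int num_vfs;
};

/* Stand-in for pci_sriov_configure_simple: record the VF count. */
int stub_sriov_configure_simple(struct mock_pf *pf, int num_vfs)
{
	pf->num_vfs = num_vfs;
	return num_vfs;
}

/* Static ID table (made-up IDs), terminated by an all-zero entry. */
static const struct mock_pci_id pf_stub_ids[] = {
	{ 0x1234, 0x0001 },
	{ 0, 0 },
};

/* Probe does nothing except claim devices listed in the table. */
int pf_stub_probe(struct mock_pf *pf)
{
	const struct mock_pci_id *id;

	for (id = pf_stub_ids; id->vendor || id->device; id++)
		if (id->vendor == pf->id.vendor && id->device == pf->id.device)
			return 0;
	return -1;
}

struct mock_pf_driver {
	const char *name;
	int (*probe)(struct mock_pf *pf);
	int (*sriov_configure)(struct mock_pf *pf, int num_vfs);
};

/* The whole driver: a probe that claims the PF plus the simple helper. */
struct mock_pf_driver pf_stub_driver = {
	.name = "pf_stub",
	.probe = pf_stub_probe,
	.sriov_configure = stub_sriov_configure_simple,
};
```

The point of the design is that the PF is still bound to a driver, so
something owns the device, while the driver itself carries no logic beyond
the ID match and the SR-IOV callback.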

^ permalink raw reply
