* [PATCH net 0/2] Fix vlan untag and insertion for bridge and vlan with reorder_hdr off
From: Toshiaki Makita @ 2018-03-13 5:51 UTC
To: David S. Miller; +Cc: Toshiaki Makita, netdev, Brandon Carpenter, Vlad Yasevich
As Brandon Carpenter reported[1], sending non-vlan-offloaded packets from
bridge devices ends up with corrupted packets. He narrowed down this problem
and found that the root cause is in skb_reorder_vlan_header().
While I was working on fixing this problem, I found that the function does
not work properly for double tagged packets with reorder_hdr off as well.
Patch 1 fixes these 2 problems in skb_reorder_vlan_header().
While testing the fix, I also found that fixing skb_reorder_vlan_header()
alone is not sufficient to receive double tagged packets with reorder_hdr
off: vlan tags got out of order when vlan devices with reorder_hdr disabled
were stacked. Patch 2 fixes this problem.
[1] https://www.spinics.net/lists/linux-ethernet-bridging/msg07039.html
Toshiaki Makita (2):
net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off
vlan: Fix out of order vlan headers with reorder header off
include/linux/if_vlan.h | 66 +++++++++++++++++++++++++++++++++++--------
include/uapi/linux/if_ether.h | 1 +
net/8021q/vlan_core.c | 4 +--
net/core/skbuff.c | 7 +++--
4 files changed, 63 insertions(+), 15 deletions(-)
--
1.8.3.1
* [PATCH net 2/2] vlan: Fix out of order vlan headers with reorder header off
From: Toshiaki Makita @ 2018-03-13 5:51 UTC
To: David S. Miller; +Cc: Toshiaki Makita, netdev, Brandon Carpenter, Vlad Yasevich
In-Reply-To: <1520920288-2483-1-git-send-email-makita.toshiaki@lab.ntt.co.jp>
With reorder header off, received packets are untagged in skb_vlan_untag()
called from within __netif_receive_skb_core(), and later the tag will be
inserted back in vlan_do_receive().
This caused out of order vlan headers when we create a vlan device on top
of another vlan device, because vlan_do_receive() inserts a tag as the
outermost vlan tag. E.g. the outer tag is first removed in skb_vlan_untag()
and inserted back in vlan_do_receive(), then the inner tag is next removed
and inserted back as the outermost tag.
This patch fixes the behaviour by inserting the inner tag at the right
position.
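For illustration only (not part of the patch): a minimal userspace sketch
of the same byte shuffling on a flat buffer. The helper name
insert_inner_tag() and the headroom convention are assumptions made for
this example; the kernel code works on an skb via skb_cow_head() and
skb_push() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HLEN  14  /* dst MAC + src MAC + ethertype */
#define ETH_TLEN   2  /* ethertype field */
#define VLAN_HLEN  4  /* TPID + TCI */

/*
 * Hypothetical userspace analogue of __vlan_insert_inner_tag().
 * 'frame' points at the destination MAC and must have VLAN_HLEN bytes
 * of writable headroom in front of it. mac_len counts the Ethernet
 * header plus any outer tags that must stay in front of the new tag.
 * Returns the new start of the frame.
 */
static unsigned char *insert_inner_tag(unsigned char *frame, size_t mac_len,
				       uint16_t proto, uint16_t tci)
{
	unsigned char *data = frame - VLAN_HLEN;	/* like skb_push() */
	unsigned char *tag;

	assert(mac_len >= ETH_HLEN);

	/* Slide the MAC header, minus its last ethertype, to the front. */
	memmove(data, frame, mac_len - ETH_TLEN);

	/* The 4-byte gap that opens up is where the new tag is written. */
	tag = data + mac_len - ETH_TLEN;
	tag[0] = (unsigned char)(proto >> 8);
	tag[1] = (unsigned char)(proto & 0xff);
	tag[2] = (unsigned char)(tci >> 8);
	tag[3] = (unsigned char)(tci & 0xff);

	return data;
}
```

With mac_len = ETH_HLEN the tag lands directly after the source MAC, which
is the old vlan_insert_tag() behaviour. With mac_len = ETH_HLEN + VLAN_HLEN
it lands behind one outer tag, so the outer tag stays outermost.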
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
---
include/linux/if_vlan.h | 66 ++++++++++++++++++++++++++++++++++++++++---------
net/8021q/vlan_core.c | 4 +--
2 files changed, 57 insertions(+), 13 deletions(-)
diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 5e6a2d4..c4a1cff 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -300,30 +300,34 @@ static inline bool vlan_hw_offload_capable(netdev_features_t features,
}
/**
- * __vlan_insert_tag - regular VLAN tag inserting
+ * __vlan_insert_inner_tag - inner VLAN tag inserting
* @skb: skbuff to tag
* @vlan_proto: VLAN encapsulation protocol
* @vlan_tci: VLAN TCI to insert
+ * @mac_len: MAC header length including outer vlan headers
*
- * Inserts the VLAN tag into @skb as part of the payload
+ * Inserts the VLAN tag into @skb as part of the payload at offset mac_len
* Returns error if skb_cow_head failes.
*
* Does not change skb->protocol so this function can be used during receive.
*/
-static inline int __vlan_insert_tag(struct sk_buff *skb,
- __be16 vlan_proto, u16 vlan_tci)
+static inline int __vlan_insert_inner_tag(struct sk_buff *skb,
+ __be16 vlan_proto, u16 vlan_tci,
+ unsigned int mac_len)
{
struct vlan_ethhdr *veth;
if (skb_cow_head(skb, VLAN_HLEN) < 0)
return -ENOMEM;
- veth = skb_push(skb, VLAN_HLEN);
+ skb_push(skb, VLAN_HLEN);
- /* Move the mac addresses to the beginning of the new header. */
- memmove(skb->data, skb->data + VLAN_HLEN, 2 * ETH_ALEN);
+ /* Move the mac header sans proto to the beginning of the new header. */
+ memmove(skb->data, skb->data + VLAN_HLEN, mac_len - ETH_TLEN);
skb->mac_header -= VLAN_HLEN;
+ veth = (struct vlan_ethhdr *)(skb->data + mac_len - ETH_HLEN);
+
/* first, the ethernet type */
veth->h_vlan_proto = vlan_proto;
@@ -334,12 +338,30 @@ static inline int __vlan_insert_tag(struct sk_buff *skb,
}
/**
- * vlan_insert_tag - regular VLAN tag inserting
+ * __vlan_insert_tag - regular VLAN tag inserting
* @skb: skbuff to tag
* @vlan_proto: VLAN encapsulation protocol
* @vlan_tci: VLAN TCI to insert
*
* Inserts the VLAN tag into @skb as part of the payload
+ * Returns error if skb_cow_head failes.
+ *
+ * Does not change skb->protocol so this function can be used during receive.
+ */
+static inline int __vlan_insert_tag(struct sk_buff *skb,
+ __be16 vlan_proto, u16 vlan_tci)
+{
+ return __vlan_insert_inner_tag(skb, vlan_proto, vlan_tci, ETH_HLEN);
+}
+
+/**
+ * vlan_insert_inner_tag - inner VLAN tag inserting
+ * @skb: skbuff to tag
+ * @vlan_proto: VLAN encapsulation protocol
+ * @vlan_tci: VLAN TCI to insert
+ * @mac_len: MAC header length including outer vlan headers
+ *
+ * Inserts the VLAN tag into @skb as part of the payload at offset mac_len
* Returns a VLAN tagged skb. If a new skb is created, @skb is freed.
*
* Following the skb_unshare() example, in case of error, the calling function
@@ -347,12 +369,14 @@ static inline int __vlan_insert_tag(struct sk_buff *skb,
*
* Does not change skb->protocol so this function can be used during receive.
*/
-static inline struct sk_buff *vlan_insert_tag(struct sk_buff *skb,
- __be16 vlan_proto, u16 vlan_tci)
+static inline struct sk_buff *vlan_insert_inner_tag(struct sk_buff *skb,
+ __be16 vlan_proto,
+ u16 vlan_tci,
+ unsigned int mac_len)
{
int err;
- err = __vlan_insert_tag(skb, vlan_proto, vlan_tci);
+ err = __vlan_insert_inner_tag(skb, vlan_proto, vlan_tci, mac_len);
if (err) {
dev_kfree_skb_any(skb);
return NULL;
@@ -361,6 +385,26 @@ static inline struct sk_buff *vlan_insert_tag(struct sk_buff *skb,
}
/**
+ * vlan_insert_tag - regular VLAN tag inserting
+ * @skb: skbuff to tag
+ * @vlan_proto: VLAN encapsulation protocol
+ * @vlan_tci: VLAN TCI to insert
+ *
+ * Inserts the VLAN tag into @skb as part of the payload
+ * Returns a VLAN tagged skb. If a new skb is created, @skb is freed.
+ *
+ * Following the skb_unshare() example, in case of error, the calling function
+ * doesn't have to worry about freeing the original skb.
+ *
+ * Does not change skb->protocol so this function can be used during receive.
+ */
+static inline struct sk_buff *vlan_insert_tag(struct sk_buff *skb,
+ __be16 vlan_proto, u16 vlan_tci)
+{
+ return vlan_insert_inner_tag(skb, vlan_proto, vlan_tci, ETH_HLEN);
+}
+
+/**
* vlan_insert_tag_set_proto - regular VLAN tag inserting
* @skb: skbuff to tag
* @vlan_proto: VLAN encapsulation protocol
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index 64aa9f7..45c9bf5 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -48,8 +48,8 @@ bool vlan_do_receive(struct sk_buff **skbp)
* original position later
*/
skb_push(skb, offset);
- skb = *skbp = vlan_insert_tag(skb, skb->vlan_proto,
- skb->vlan_tci);
+ skb = *skbp = vlan_insert_inner_tag(skb, skb->vlan_proto,
+ skb->vlan_tci, skb->mac_len);
if (!skb)
return false;
skb_pull(skb, offset + VLAN_HLEN);
--
1.8.3.1
* Re: [PATCH v2 net] net: xfrm: use preempt-safe this_cpu_read() in ipcomp_alloc_tfms()
From: Steffen Klassert @ 2018-03-13 6:50 UTC
To: Greg Hackmann; +Cc: Herbert Xu, David S. Miller, netdev, linux-kernel
In-Reply-To: <20180307224253.152470-1-ghackmann@google.com>
On Wed, Mar 07, 2018 at 02:42:53PM -0800, Greg Hackmann wrote:
> f7c83bcbfaf5 ("net: xfrm: use __this_cpu_read per-cpu helper") added a
> __this_cpu_read() call inside ipcomp_alloc_tfms().
>
> At the time, __this_cpu_read() required the caller to either not care
> about races or to handle preemption/interrupt issues. 3.15 tightened
> the rules around some per-cpu operations, and now __this_cpu_read()
> should never be used in a preemptible context. On 3.15 and later, we
> need to use this_cpu_read() instead.
>
...
> Signed-off-by: Greg Hackmann <ghackmann@google.com>
Patch applied, thanks!
* Re: [PATCH net-next v2] sctp: fix error return code in sctp_sendmsg_new_asoc()
From: Xin Long @ 2018-03-13 6:57 UTC
To: Wei Yongjun
Cc: Vlad Yasevich, Neil Horman, linux-sctp, network dev,
kernel-janitors
In-Reply-To: <1520910210-147500-1-git-send-email-weiyongjun1@huawei.com>
On Tue, Mar 13, 2018 at 11:03 AM, Wei Yongjun <weiyongjun1@huawei.com> wrote:
> Return error code -EINVAL in the address len check error handling
> case since 'err' can be overwrite to 0 by 'err = sctp_verify_addr()'
> in the for loop.
>
> Fixes: 2c0dbaa0c43d ("sctp: add support for SCTP_DSTADDRV4/6 Information for sendmsg")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> ---
> v1 -> v2: remove the 'err' initialization
> ---
> net/sctp/socket.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 7d3476a..af5cf29 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -1677,7 +1677,7 @@ static int sctp_sendmsg_new_asoc(struct sock *sk, __u16 sflags,
> struct sctp_association *asoc;
> enum sctp_scope scope;
> struct cmsghdr *cmsg;
> - int err = -EINVAL;
> + int err;
>
> *tp = NULL;
>
> @@ -1761,16 +1761,20 @@ static int sctp_sendmsg_new_asoc(struct sock *sk, __u16 sflags,
> memset(daddr, 0, sizeof(*daddr));
> dlen = cmsg->cmsg_len - sizeof(struct cmsghdr);
> if (cmsg->cmsg_type == SCTP_DSTADDRV4) {
> - if (dlen < sizeof(struct in_addr))
> + if (dlen < sizeof(struct in_addr)) {
> + err = -EINVAL;
> goto free;
> + }
>
> dlen = sizeof(struct in_addr);
> daddr->v4.sin_family = AF_INET;
> daddr->v4.sin_port = htons(asoc->peer.port);
> memcpy(&daddr->v4.sin_addr, CMSG_DATA(cmsg), dlen);
> } else {
> - if (dlen < sizeof(struct in6_addr))
> + if (dlen < sizeof(struct in6_addr)) {
> + err = -EINVAL;
> goto free;
> + }
>
> dlen = sizeof(struct in6_addr);
> daddr->v6.sin6_family = AF_INET6;
>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
* Re: [PATCH v2] sctp: Fix double free in sctp_sendmsg_to_asoc
From: Xin Long @ 2018-03-13 7:03 UTC
To: Neil Horman; +Cc: linux-sctp, network dev, davem
In-Reply-To: <20180312181525.21774-1-nhorman@tuxdriver.com>
On Tue, Mar 13, 2018 at 2:15 AM, Neil Horman <nhorman@tuxdriver.com> wrote:
> syzbot/kasan detected a double free in sctp_sendmsg_to_asoc:
> BUG: KASAN: use-after-free in sctp_association_free+0x7b7/0x930
> net/sctp/associola.c:332
> Read of size 8 at addr ffff8801d8006ae0 by task syzkaller914861/4202
>
> CPU: 1 PID: 4202 Comm: syzkaller914861 Not tainted 4.16.0-rc4+ #258
> Hardware name: Google Google Compute Engine/Google Compute Engine
> 01/01/2011
> Call Trace:
> __dump_stack lib/dump_stack.c:17 [inline]
> dump_stack+0x194/0x24d lib/dump_stack.c:53
> print_address_description+0x73/0x250 mm/kasan/report.c:256
> kasan_report_error mm/kasan/report.c:354 [inline]
> kasan_report+0x23c/0x360 mm/kasan/report.c:412
> __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
> sctp_association_free+0x7b7/0x930 net/sctp/associola.c:332
> sctp_sendmsg+0xc67/0x1a80 net/sctp/socket.c:2075
> inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763
> sock_sendmsg_nosec net/socket.c:629 [inline]
> sock_sendmsg+0xca/0x110 net/socket.c:639
> SYSC_sendto+0x361/0x5c0 net/socket.c:1748
> SyS_sendto+0x40/0x50 net/socket.c:1716
> do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
> entry_SYSCALL_64_after_hwframe+0x42/0xb7
>
> This was introduced by commit:
> f84af33 sctp: factor out sctp_sendmsg_to_asoc from sctp_sendmsg
>
> As the newly refactored function moved the wait_for_sndbuf call to a
> point after the association was connected, allowing for peeloff events
> to occur, which in turn caused wait_for_sndbuf to return -EPIPE which
> was not caught by the logic that determines if an association should be
> freed or not.
>
> Fix it the easy way by returning the ordering of
> sctp_primitive_ASSOCIATE and sctp_wait_for_sndbuf to the old order, to
> ensure that EPIPE will not happen.
>
> Tested by myself using the syzbot reproducers with positive results
>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: davem@davemloft.net
> CC: Xin Long <lucien.xin@gmail.com>
> Reported-by: syzbot+a4e4112c3aff00c8cfd8@syzkaller.appspotmail.com
>
> ---
> Change notes
> v2)
> * Moved additional calls to restore origional ordering
> * add sctp prefix
> ---
> net/sctp/socket.c | 26 +++++++++++++-------------
> 1 file changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 7d3476a4860d..4bbfcf9532c2 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -1876,6 +1876,19 @@ static int sctp_sendmsg_to_asoc(struct sctp_association *asoc,
> goto err;
> }
>
> + if (asoc->pmtu_pending)
> + sctp_assoc_pending_pmtu(asoc);
> +
> + if (sctp_wspace(asoc) < msg_len)
> + sctp_prsctp_prune(asoc, sinfo, msg_len - sctp_wspace(asoc));
> +
> + if (!sctp_wspace(asoc)) {
> + timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
> + err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len);
> + if (err)
> + goto err;
> + }
> +
> if (sctp_state(asoc, CLOSED)) {
> err = sctp_primitive_ASSOCIATE(net, asoc, NULL);
> if (err)
> @@ -1893,19 +1906,6 @@ static int sctp_sendmsg_to_asoc(struct sctp_association *asoc,
> pr_debug("%s: we associated primitively\n", __func__);
> }
>
> - if (asoc->pmtu_pending)
> - sctp_assoc_pending_pmtu(asoc);
> -
> - if (sctp_wspace(asoc) < msg_len)
> - sctp_prsctp_prune(asoc, sinfo, msg_len - sctp_wspace(asoc));
> -
> - if (!sctp_wspace(asoc)) {
> - timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
> - err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len);
> - if (err)
> - goto err;
> - }
> -
> datamsg = sctp_datamsg_from_user(asoc, sinfo, &msg->msg_iter);
> if (IS_ERR(datamsg)) {
> err = PTR_ERR(datamsg);
> --
> 2.14.3
>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
* [PATCH 3/9] xfrm_user: uncoditionally validate esn replay attribute struct
From: Steffen Klassert @ 2018-03-13 7:09 UTC
To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20180313070953.21317-1-steffen.klassert@secunet.com>
From: Florian Westphal <fw@strlen.de>
The sanity test added in ecd7918745234 can be bypassed: validation
only occurs if the XFRM_STATE_ESN flag is set, but the rest of the code
doesn't care and just checks whether the attribute itself is present.
So always validate. The alternative is to reject if we have the attribute
without the flag, but that would change the ABI.
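For illustration only: a minimal userspace sketch of the resulting control
flow. The struct, the constants and the name check_replay() are trimmed-down
stand-ins invented for this example, and the later protocol and
replay-window checks of verify_replay() are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_ESN       0x1u   /* stand-in for XFRM_STATE_ESN */
#define REPLAY_ESN_MAX 4096u  /* stand-in for XFRMA_REPLAY_ESN_MAX */

/* Hypothetical attribute header; the bitmap words would follow it. */
struct replay_esn {
	uint32_t bmp_len;	/* number of 32-bit bitmap words */
	uint32_t replay_window;
};

static size_t replay_esn_len(const struct replay_esn *rs)
{
	return sizeof(*rs) + (size_t)rs->bmp_len * sizeof(uint32_t);
}

/* rs == NULL models "attribute absent"; attr_len is its payload length. */
static int check_replay(unsigned int flags, const struct replay_esn *rs,
			size_t attr_len)
{
	/* Without the attribute, only the flag decides. */
	if (!rs)
		return (flags & FLAG_ESN) ? -EINVAL : 0;

	/* Assumed: the attribute parser already enforced a minimum length. */
	assert(attr_len >= sizeof(*rs));

	/* From here on the attribute is validated whatever the flag says. */
	if (rs->bmp_len > REPLAY_ESN_MAX / sizeof(uint32_t) / 8)
		return -EINVAL;

	if (attr_len < replay_esn_len(rs) && attr_len != sizeof(*rs))
		return -EINVAL;

	return 0;
}
```

The point of the ordering is that an oversized bmp_len is rejected even
when the caller leaves the ESN flag clear.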
Reported-by: syzbot+0ab777c27d2bb7588f73@syzkaller.appspotmail.com
Cc: Mathias Krause <minipli@googlemail.com>
Fixes: ecd7918745234 ("xfrm_user: ensure user supplied esn replay window is valid")
Fixes: d8647b79c3b7e ("xfrm: Add user interface for esn and big anti-replay windows")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/xfrm/xfrm_user.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 7f52b8eb177d..080035f056d9 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -121,22 +121,17 @@ static inline int verify_replay(struct xfrm_usersa_info *p,
struct nlattr *rt = attrs[XFRMA_REPLAY_ESN_VAL];
struct xfrm_replay_state_esn *rs;
- if (p->flags & XFRM_STATE_ESN) {
- if (!rt)
- return -EINVAL;
+ if (!rt)
+ return (p->flags & XFRM_STATE_ESN) ? -EINVAL : 0;
- rs = nla_data(rt);
+ rs = nla_data(rt);
- if (rs->bmp_len > XFRMA_REPLAY_ESN_MAX / sizeof(rs->bmp[0]) / 8)
- return -EINVAL;
-
- if (nla_len(rt) < (int)xfrm_replay_state_esn_len(rs) &&
- nla_len(rt) != sizeof(*rs))
- return -EINVAL;
- }
+ if (rs->bmp_len > XFRMA_REPLAY_ESN_MAX / sizeof(rs->bmp[0]) / 8)
+ return -EINVAL;
- if (!rt)
- return 0;
+ if (nla_len(rt) < (int)xfrm_replay_state_esn_len(rs) &&
+ nla_len(rt) != sizeof(*rs))
+ return -EINVAL;
/* As only ESP and AH support ESN feature. */
if ((p->id.proto != IPPROTO_ESP) && (p->id.proto != IPPROTO_AH))
--
2.14.1
* [PATCH 4/9] xfrm: reuse uncached_list to track xdsts
From: Steffen Klassert @ 2018-03-13 7:09 UTC
To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20180313070953.21317-1-steffen.klassert@secunet.com>
From: Xin Long <lucien.xin@gmail.com>
Previously, when an xdst was freed, it was first inserted into
dst_garbage.list. If its refcnt was still held somewhere, it
was later moved to dst_busy_list in dst_gc_task().
When a dev was being unregistered, the dev of the dsts in
dst_busy_list was set to loopback_dev and the reference on the
original dev was dropped, so that the dev's removal wasn't
blocked and the following kmsg warning was avoided:
kernel:unregister_netdevice: waiting for veth0 to become \
free. Usage count = 2
However, since commit 52df157f17e5 ("xfrm: take refcnt of dst
when creating struct xfrm_dst bundle"), the xdst is no longer
freed by the dst gc, and this warning appears.
To fix it, we need to find the xdsts that are still held by
others when the dev is removed, release each xdst's dev, and
set it to loopback_dev.
Unfortunately, after the flow_cache for xfrm was deleted, no
list tracks them anymore, so we need to save these xdsts
somewhere in order to release the xdst's dev later.
To make this easier, this patch reuses uncached_list to
track xdsts, so that the dev refcnt can be released while
fib_netdev_notifier processes the NETDEV_UNREGISTER event.
Thanks to Florian, we could move forward with this fix quickly.
Fixes: 52df157f17e5 ("xfrm: take refcnt of dst when creating struct xfrm_dst bundle")
Reported-by: Jianlin Shi <jishi@redhat.com>
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Tested-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
include/net/ip6_route.h | 3 +++
include/net/route.h | 3 +++
net/ipv4/route.c | 21 +++++++++++++--------
net/ipv4/xfrm4_policy.c | 4 +++-
net/ipv6/route.c | 4 ++--
net/ipv6/xfrm6_policy.c | 5 +++++
6 files changed, 29 insertions(+), 11 deletions(-)
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 27d23a65f3cd..ac0866bb9e93 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -179,6 +179,9 @@ void rt6_disable_ip(struct net_device *dev, unsigned long event);
void rt6_sync_down_dev(struct net_device *dev, unsigned long event);
void rt6_multipath_rebalance(struct rt6_info *rt);
+void rt6_uncached_list_add(struct rt6_info *rt);
+void rt6_uncached_list_del(struct rt6_info *rt);
+
static inline const struct rt6_info *skb_rt6_info(const struct sk_buff *skb)
{
const struct dst_entry *dst = skb_dst(skb);
diff --git a/include/net/route.h b/include/net/route.h
index 1eb9ce470e25..40b870d58f38 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -227,6 +227,9 @@ struct in_ifaddr;
void fib_add_ifaddr(struct in_ifaddr *);
void fib_del_ifaddr(struct in_ifaddr *, struct in_ifaddr *);
+void rt_add_uncached_list(struct rtable *rt);
+void rt_del_uncached_list(struct rtable *rt);
+
static inline void ip_rt_put(struct rtable *rt)
{
/* dst_release() accepts a NULL parameter.
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 49cc1c1df1ba..1d1e4abe04b0 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1383,7 +1383,7 @@ struct uncached_list {
static DEFINE_PER_CPU_ALIGNED(struct uncached_list, rt_uncached_list);
-static void rt_add_uncached_list(struct rtable *rt)
+void rt_add_uncached_list(struct rtable *rt)
{
struct uncached_list *ul = raw_cpu_ptr(&rt_uncached_list);
@@ -1394,14 +1394,8 @@ static void rt_add_uncached_list(struct rtable *rt)
spin_unlock_bh(&ul->lock);
}
-static void ipv4_dst_destroy(struct dst_entry *dst)
+void rt_del_uncached_list(struct rtable *rt)
{
- struct dst_metrics *p = (struct dst_metrics *)DST_METRICS_PTR(dst);
- struct rtable *rt = (struct rtable *) dst;
-
- if (p != &dst_default_metrics && refcount_dec_and_test(&p->refcnt))
- kfree(p);
-
if (!list_empty(&rt->rt_uncached)) {
struct uncached_list *ul = rt->rt_uncached_list;
@@ -1411,6 +1405,17 @@ static void ipv4_dst_destroy(struct dst_entry *dst)
}
}
+static void ipv4_dst_destroy(struct dst_entry *dst)
+{
+ struct dst_metrics *p = (struct dst_metrics *)DST_METRICS_PTR(dst);
+ struct rtable *rt = (struct rtable *)dst;
+
+ if (p != &dst_default_metrics && refcount_dec_and_test(&p->refcnt))
+ kfree(p);
+
+ rt_del_uncached_list(rt);
+}
+
void rt_flush_dev(struct net_device *dev)
{
struct net *net = dev_net(dev);
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 05017e2c849c..8d33f7b311f4 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -102,6 +102,7 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
xdst->u.rt.rt_pmtu = rt->rt_pmtu;
xdst->u.rt.rt_table_id = rt->rt_table_id;
INIT_LIST_HEAD(&xdst->u.rt.rt_uncached);
+ rt_add_uncached_list(&xdst->u.rt);
return 0;
}
@@ -241,7 +242,8 @@ static void xfrm4_dst_destroy(struct dst_entry *dst)
struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
dst_destroy_metrics_generic(dst);
-
+ if (xdst->u.rt.rt_uncached_list)
+ rt_del_uncached_list(&xdst->u.rt);
xfrm_dst_destroy(xdst);
}
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index fb2d251c0500..38b75e9d6eae 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -128,7 +128,7 @@ struct uncached_list {
static DEFINE_PER_CPU_ALIGNED(struct uncached_list, rt6_uncached_list);
-static void rt6_uncached_list_add(struct rt6_info *rt)
+void rt6_uncached_list_add(struct rt6_info *rt)
{
struct uncached_list *ul = raw_cpu_ptr(&rt6_uncached_list);
@@ -139,7 +139,7 @@ static void rt6_uncached_list_add(struct rt6_info *rt)
spin_unlock_bh(&ul->lock);
}
-static void rt6_uncached_list_del(struct rt6_info *rt)
+void rt6_uncached_list_del(struct rt6_info *rt)
{
if (!list_empty(&rt->rt6i_uncached)) {
struct uncached_list *ul = rt->rt6i_uncached_list;
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 09fb44ee3b45..416fe67271a9 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -113,6 +113,9 @@ static int xfrm6_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
xdst->u.rt6.rt6i_gateway = rt->rt6i_gateway;
xdst->u.rt6.rt6i_dst = rt->rt6i_dst;
xdst->u.rt6.rt6i_src = rt->rt6i_src;
+ INIT_LIST_HEAD(&xdst->u.rt6.rt6i_uncached);
+ rt6_uncached_list_add(&xdst->u.rt6);
+ atomic_inc(&dev_net(dev)->ipv6.rt6_stats->fib_rt_uncache);
return 0;
}
@@ -244,6 +247,8 @@ static void xfrm6_dst_destroy(struct dst_entry *dst)
if (likely(xdst->u.rt6.rt6i_idev))
in6_dev_put(xdst->u.rt6.rt6i_idev);
dst_destroy_metrics_generic(dst);
+ if (xdst->u.rt6.rt6i_uncached_list)
+ rt6_uncached_list_del(&xdst->u.rt6);
xfrm_dst_destroy(xdst);
}
--
2.14.1
* pull request (net): ipsec 2018-03-13
From: Steffen Klassert @ 2018-03-13 7:09 UTC
To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
1) Refuse to insert 32 bit userspace socket policies on 64
bit systems, as we already do for standard policies. We don't
have a compat layer, so inserting socket policies from
32 bit userspace will lead to a broken configuration.
2) Make the policy hold queue work without the flowcache.
Dummy bundles are not cached anymore, so we need to
generate a new one on each lookup as long as the SAs
are not yet in place.
3) Fix the validation of the esn replay attribute. The
sanity check in verify_replay() is bypassed if
the XFRM_STATE_ESN flag is not set. Fix this by doing
the sanity check unconditionally.
From Florian Westphal.
4) After most of the dst_entry garbage collection code
is removed, we may leak xfrm_dst entries as they are
neither cached nor tracked somewhere. Fix this by
reusing the 'uncached_list' to track xfrm_dst entries
too. From Xin Long.
5) Fix an rcu_read_lock/rcu_read_unlock imbalance in
xfrm_get_tos(). From Xin Long.
6) Fix an infinite loop in xfrm_get_dst_nexthop. In
transport mode we fetch the child dst_entry after
we continue, so this pointer is never updated.
Fix this by fetching it before we continue.
7) Fix ESN sequence number gap after IPsec GSO packets.
We accidentally increment the sequence number counter
on the xfrm_state by one packet too many in the ESN
case. Fix this by setting the sequence number to the
correct value.
8) Reset the ethernet protocol after decapsulation only if a
mac header was set. Otherwise it breaks configurations
with TUN devices. From Yossi Kuperman.
9) Fix __this_cpu_read() usage in preemptible code. Use
this_cpu_read() instead in ipcomp_alloc_tfms().
From Greg Hackmann.
Please pull or let me know if there are problems.
Thanks!
The following changes since commit 743ffffefac1c670c6618742c923f6275d819604:
net: pxa168_eth: add netconsole support (2018-02-01 14:58:37 -0500)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git master
for you to fetch changes up to 0dcd7876029b58770f769cbb7b484e88e4a305e5:
net: xfrm: use preempt-safe this_cpu_read() in ipcomp_alloc_tfms() (2018-03-13 07:46:37 +0100)
----------------------------------------------------------------
Florian Westphal (1):
xfrm_user: uncoditionally validate esn replay attribute struct
Greg Hackmann (1):
net: xfrm: use preempt-safe this_cpu_read() in ipcomp_alloc_tfms()
Steffen Klassert (4):
xfrm: Refuse to insert 32 bit userspace socket policies on 64 bit systems
xfrm: Fix policy hold queue after flowcache removal.
xfrm: Fix infinite loop in xfrm_get_dst_nexthop with transport mode.
xfrm: Fix ESN sequence number handling for IPsec GSO packets.
Xin Long (2):
xfrm: reuse uncached_list to track xdsts
xfrm: do not call rcu_read_unlock when afinfo is NULL in xfrm_get_tos
Yossi Kuperman (1):
xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto
include/net/ip6_route.h | 3 +++
include/net/route.h | 3 +++
net/ipv4/route.c | 21 +++++++++++++--------
net/ipv4/xfrm4_mode_tunnel.c | 3 ++-
net/ipv4/xfrm4_policy.c | 4 +++-
net/ipv6/route.c | 4 ++--
net/ipv6/xfrm6_mode_tunnel.c | 3 ++-
net/ipv6/xfrm6_policy.c | 5 +++++
net/xfrm/xfrm_ipcomp.c | 2 +-
net/xfrm/xfrm_policy.c | 13 ++++++++-----
net/xfrm/xfrm_replay.c | 2 +-
net/xfrm/xfrm_state.c | 5 +++++
net/xfrm/xfrm_user.c | 21 ++++++++-------------
13 files changed, 56 insertions(+), 33 deletions(-)
* [PATCH 5/9] xfrm: do not call rcu_read_unlock when afinfo is NULL in xfrm_get_tos
From: Steffen Klassert @ 2018-03-13 7:09 UTC
To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20180313070953.21317-1-steffen.klassert@secunet.com>
From: Xin Long <lucien.xin@gmail.com>
When xfrm_policy_get_afinfo() returns NULL, it does not hold the rcu
read lock. In this case, rcu_read_unlock() must not be called
in xfrm_get_tos(), just as in the other places that call
xfrm_policy_get_afinfo().
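For illustration only: a minimal userspace sketch of the locking contract,
with a plain counter standing in for the rcu read lock. All names here
(fake_read_lock(), get_afinfo(), get_tos(), family value 2) are made up for
this example and are not kernel API.

```c
#include <assert.h>
#include <stddef.h>

static int lock_depth;	/* stand-in for the rcu read-side nesting count */

static void fake_read_lock(void)
{
	lock_depth++;
}

static void fake_read_unlock(void)
{
	/* An unbalanced unlock trips here. */
	assert(lock_depth > 0);
	lock_depth--;
}

struct afinfo {
	int (*get_tos)(int flow);
};

static int tos_of_flow(int flow)
{
	return flow & 0xff;
}

static const struct afinfo known_family = { tos_of_flow };

/* Returns with the lock held on success, and with it released on failure. */
static const struct afinfo *get_afinfo(int family)
{
	fake_read_lock();
	if (family != 2) {
		fake_read_unlock();
		return NULL;
	}
	return &known_family;
}

static int get_tos(int flow, int family)
{
	const struct afinfo *afinfo = get_afinfo(family);
	int tos;

	/* The lock is not held on this path, so there is nothing to drop. */
	if (!afinfo)
		return 0;

	tos = afinfo->get_tos(flow);
	fake_read_unlock();
	return tos;
}
```

Both the success path and the NULL path leave lock_depth at zero; calling
fake_read_unlock() unconditionally would trip the assertion on the NULL
path.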
Fixes: f5e2bb4f5b22 ("xfrm: policy: xfrm_get_tos cannot fail")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/xfrm/xfrm_policy.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 8b3811ff002d..150d46633ce6 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1458,10 +1458,13 @@ xfrm_tmpl_resolve(struct xfrm_policy **pols, int npols, const struct flowi *fl,
static int xfrm_get_tos(const struct flowi *fl, int family)
{
const struct xfrm_policy_afinfo *afinfo;
- int tos = 0;
+ int tos;
afinfo = xfrm_policy_get_afinfo(family);
- tos = afinfo ? afinfo->get_tos(fl) : 0;
+ if (!afinfo)
+ return 0;
+
+ tos = afinfo->get_tos(fl);
rcu_read_unlock();
--
2.14.1
* [PATCH 1/9] xfrm: Refuse to insert 32 bit userspace socket policies on 64 bit systems
From: Steffen Klassert @ 2018-03-13 7:09 UTC
To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20180313070953.21317-1-steffen.klassert@secunet.com>
We don't have a compat layer for xfrm, so userspace and kernel
structures have different sizes in this case. This results in
a broken configuration, so refuse to configure socket policies
inserted from 32 bit userspace, as we already do for policies
inserted via netlink.
Reported-and-tested-by: syzbot+e1a1577ca8bcb47b769a@syzkaller.appspotmail.com
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/xfrm/xfrm_state.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 54e21f19d722..f9d2f2233f09 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2056,6 +2056,11 @@ int xfrm_user_policy(struct sock *sk, int optname, u8 __user *optval, int optlen
struct xfrm_mgr *km;
struct xfrm_policy *pol = NULL;
+#ifdef CONFIG_COMPAT
+ if (in_compat_syscall())
+ return -EOPNOTSUPP;
+#endif
+
if (!optval && !optlen) {
xfrm_sk_policy_insert(sk, XFRM_POLICY_IN, NULL);
xfrm_sk_policy_insert(sk, XFRM_POLICY_OUT, NULL);
--
2.14.1
* [PATCH 8/9] xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto
From: Steffen Klassert @ 2018-03-13 7:09 UTC
To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20180313070953.21317-1-steffen.klassert@secunet.com>
From: Yossi Kuperman <yossiku@mellanox.com>
Artem Savkov reported that commit 5efec5c655dd leads to packet loss in an
IPsec configuration. It appears that his setup includes a TUN device,
which does not have a MAC header.
Make sure the MAC header exists before writing to it.
Note: a TUN device sets a MAC header pointer even though it has no MAC header.
Fixes: 5efec5c655dd ("xfrm: Fix eth_hdr(skb)->h_proto to reflect inner IP version")
Reported-by: Artem Savkov <artem.savkov@gmail.com>
Tested-by: Artem Savkov <artem.savkov@gmail.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/ipv4/xfrm4_mode_tunnel.c | 3 ++-
net/ipv6/xfrm6_mode_tunnel.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/xfrm4_mode_tunnel.c b/net/ipv4/xfrm4_mode_tunnel.c
index 63faeee989a9..2a9764bd1719 100644
--- a/net/ipv4/xfrm4_mode_tunnel.c
+++ b/net/ipv4/xfrm4_mode_tunnel.c
@@ -92,7 +92,8 @@ static int xfrm4_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
skb_reset_network_header(skb);
skb_mac_header_rebuild(skb);
- eth_hdr(skb)->h_proto = skb->protocol;
+ if (skb->mac_len)
+ eth_hdr(skb)->h_proto = skb->protocol;
err = 0;
diff --git a/net/ipv6/xfrm6_mode_tunnel.c b/net/ipv6/xfrm6_mode_tunnel.c
index bb935a3b7fea..de1b0b8c53b0 100644
--- a/net/ipv6/xfrm6_mode_tunnel.c
+++ b/net/ipv6/xfrm6_mode_tunnel.c
@@ -92,7 +92,8 @@ static int xfrm6_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
skb_reset_network_header(skb);
skb_mac_header_rebuild(skb);
- eth_hdr(skb)->h_proto = skb->protocol;
+ if (skb->mac_len)
+ eth_hdr(skb)->h_proto = skb->protocol;
err = 0;
--
2.14.1
* [PATCH 6/9] xfrm: Fix infinite loop in xfrm_get_dst_nexthop with transport mode.
From: Steffen Klassert @ 2018-03-13 7:09 UTC
To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20180313070953.21317-1-steffen.klassert@secunet.com>
In transport mode we forget to fetch the child dst_entry
before we continue the while loop, which leads to an infinite
loop. Fix this by fetching the child dst_entry before we
continue the while loop.
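For illustration only: a minimal userspace sketch of the corrected loop
shape. The types and the name nexthop_addr() are simplified stand-ins
invented for this example (addresses are plain ints), and the co-address
handling of the real function is left out.

```c
#include <assert.h>
#include <stddef.h>

enum mode { MODE_TRANSPORT, MODE_TUNNEL };	/* stand-ins for XFRM_MODE_* */

struct state {
	enum mode mode;
	int daddr;
};

struct dst {
	const struct state *xfrm;	/* NULL on the last entry of the chain */
	const struct dst *child;
};

static int nexthop_addr(const struct dst *dst, int daddr)
{
	assert(dst != NULL);

	while (dst->xfrm) {
		const struct state *xfrm = dst->xfrm;

		/*
		 * Advance first. If this line sat at the bottom of the
		 * loop body, the 'continue' below would skip it and the
		 * loop would test the same entry forever.
		 */
		dst = dst->child;

		if (xfrm->mode == MODE_TRANSPORT)
			continue;

		daddr = xfrm->daddr;
	}
	return daddr;
}
```

A chain that starts with a transport-mode entry now terminates, and the
address of the last non-transport entry is returned.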
Fixes: 0f6c480f23f4 ("xfrm: Move dst->path into struct xfrm_dst")
Reported-by: syzbot+7d03c810e50aaedef98a@syzkaller.appspotmail.com
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/xfrm/xfrm_policy.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 150d46633ce6..625b3fca5704 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2732,14 +2732,14 @@ static const void *xfrm_get_dst_nexthop(const struct dst_entry *dst,
while (dst->xfrm) {
const struct xfrm_state *xfrm = dst->xfrm;
+ dst = xfrm_dst_child(dst);
+
if (xfrm->props.mode == XFRM_MODE_TRANSPORT)
continue;
if (xfrm->type->flags & XFRM_TYPE_REMOTE_COADDR)
daddr = xfrm->coaddr;
else if (!(xfrm->type->flags & XFRM_TYPE_LOCAL_COADDR))
daddr = &xfrm->id.daddr;
-
- dst = xfrm_dst_child(dst);
}
return daddr;
}
--
2.14.1
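
For readers following along, the loop fix in this patch can be modelled in
user space. This is only a sketch: struct node, its fields and get_nexthop()
are hypothetical stand-ins for struct dst_entry, dst->xfrm, xfrm_dst_child()
and xfrm_get_dst_nexthop(), not the kernel code. It shows why the child has
to be fetched before the 'continue'.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel types. */
enum mode { MODE_TRANSPORT, MODE_TUNNEL };

struct node {
	const struct node *child;  /* plays the role of xfrm_dst_child(dst) */
	int has_xfrm;              /* plays the role of dst->xfrm != NULL */
	enum mode mode;
	const char *daddr;
};

/*
 * Walk the chain. The child is fetched first, so the 'continue' taken
 * for transport mode can no longer spin on the same entry.
 * Assumes the chain ends in a node with has_xfrm == 0.
 */
const char *get_nexthop(const struct node *dst, const char *daddr)
{
	while (dst->has_xfrm) {
		const struct node *cur = dst;

		dst = cur->child;          /* advance before any 'continue' */

		if (cur->mode == MODE_TRANSPORT)
			continue;          /* safe: dst already moved on */

		daddr = cur->daddr;
	}
	return daddr;
}
```

With the advance placed at the bottom of the loop body instead, a transport
mode entry would hit 'continue' with dst unchanged and never terminate.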
^ permalink raw reply related
* [PATCH 2/9] xfrm: Fix policy hold queue after flowcache removal.
From: Steffen Klassert @ 2018-03-13 7:09 UTC (permalink / raw)
To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20180313070953.21317-1-steffen.klassert@secunet.com>
Now that the flowcache is removed we need to generate
a new dummy bundle every time we check if the needed
SAs are in place because the dummy bundle is not cached
anymore. Fix it by passing the XFRM_LOOKUP_QUEUE flag
to xfrm_lookup(). This makes sure that we get a dummy
bundle in case the SAs are not yet in place.
Fixes: 3ca28286ea80 ("xfrm_policy: bypass flow_cache_lookup")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/xfrm/xfrm_policy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 7a23078132cf..8b3811ff002d 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1891,7 +1891,7 @@ static void xfrm_policy_queue_process(struct timer_list *t)
spin_unlock(&pq->hold_queue.lock);
dst_hold(xfrm_dst_path(dst));
- dst = xfrm_lookup(net, xfrm_dst_path(dst), &fl, sk, 0);
+ dst = xfrm_lookup(net, xfrm_dst_path(dst), &fl, sk, XFRM_LOOKUP_QUEUE);
if (IS_ERR(dst))
goto purge_queue;
--
2.14.1
^ permalink raw reply related
* [PATCH 9/9] net: xfrm: use preempt-safe this_cpu_read() in ipcomp_alloc_tfms()
From: Steffen Klassert @ 2018-03-13 7:09 UTC (permalink / raw)
To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20180313070953.21317-1-steffen.klassert@secunet.com>
From: Greg Hackmann <ghackmann@google.com>
f7c83bcbfaf5 ("net: xfrm: use __this_cpu_read per-cpu helper") added a
__this_cpu_read() call inside ipcomp_alloc_tfms().
At the time, __this_cpu_read() required the caller to either not care
about races or to handle preemption/interrupt issues. 3.15 tightened
the rules around some per-cpu operations, and now __this_cpu_read()
should never be used in a preemptible context. On 3.15 and later, we
need to use this_cpu_read() instead.
syzkaller reported this leading to the following kernel BUG while
fuzzing sendmsg:
BUG: using __this_cpu_read() in preemptible [00000000] code: repro/3101
caller is ipcomp_init_state+0x185/0x990
CPU: 3 PID: 3101 Comm: repro Not tainted 4.16.0-rc4-00123-g86f84779d8e9 #154
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
dump_stack+0xb9/0x115
check_preemption_disabled+0x1cb/0x1f0
ipcomp_init_state+0x185/0x990
? __xfrm_init_state+0x876/0xc20
? lock_downgrade+0x5e0/0x5e0
ipcomp4_init_state+0xaa/0x7c0
__xfrm_init_state+0x3eb/0xc20
xfrm_init_state+0x19/0x60
pfkey_add+0x20df/0x36f0
? pfkey_broadcast+0x3dd/0x600
? pfkey_sock_destruct+0x340/0x340
? pfkey_seq_stop+0x80/0x80
? __skb_clone+0x236/0x750
? kmem_cache_alloc+0x1f6/0x260
? pfkey_sock_destruct+0x340/0x340
? pfkey_process+0x62a/0x6f0
pfkey_process+0x62a/0x6f0
? pfkey_send_new_mapping+0x11c0/0x11c0
? mutex_lock_io_nested+0x1390/0x1390
pfkey_sendmsg+0x383/0x750
? dump_sp+0x430/0x430
sock_sendmsg+0xc0/0x100
___sys_sendmsg+0x6c8/0x8b0
? copy_msghdr_from_user+0x3b0/0x3b0
? pagevec_lru_move_fn+0x144/0x1f0
? find_held_lock+0x32/0x1c0
? do_huge_pmd_anonymous_page+0xc43/0x11e0
? lock_downgrade+0x5e0/0x5e0
? get_kernel_page+0xb0/0xb0
? _raw_spin_unlock+0x29/0x40
? do_huge_pmd_anonymous_page+0x400/0x11e0
? __handle_mm_fault+0x553/0x2460
? __fget_light+0x163/0x1f0
? __sys_sendmsg+0xc7/0x170
__sys_sendmsg+0xc7/0x170
? SyS_shutdown+0x1a0/0x1a0
? __do_page_fault+0x5a0/0xca0
? lock_downgrade+0x5e0/0x5e0
SyS_sendmsg+0x27/0x40
? __sys_sendmsg+0x170/0x170
do_syscall_64+0x19f/0x640
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x7f0ee73dfb79
RSP: 002b:00007ffe14fc15a8 EFLAGS: 00000207 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0ee73dfb79
RDX: 0000000000000000 RSI: 00000000208befc8 RDI: 0000000000000004
RBP: 00007ffe14fc15b0 R08: 00007ffe14fc15c0 R09: 00007ffe14fc15c0
R10: 0000000000000000 R11: 0000000000000207 R12: 0000000000400440
R13: 00007ffe14fc16b0 R14: 0000000000000000 R15: 0000000000000000
Signed-off-by: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/xfrm/xfrm_ipcomp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/xfrm/xfrm_ipcomp.c b/net/xfrm/xfrm_ipcomp.c
index ccfdc7115a83..a00ec715aa46 100644
--- a/net/xfrm/xfrm_ipcomp.c
+++ b/net/xfrm/xfrm_ipcomp.c
@@ -283,7 +283,7 @@ static struct crypto_comp * __percpu *ipcomp_alloc_tfms(const char *alg_name)
struct crypto_comp *tfm;
/* This can be any valid CPU ID so we don't need locking. */
- tfm = __this_cpu_read(*pos->tfms);
+ tfm = this_cpu_read(*pos->tfms);
if (!strcmp(crypto_comp_name(tfm), alg_name)) {
pos->users++;
--
2.14.1
^ permalink raw reply related
* [PATCH 7/9] xfrm: Fix ESN sequence number handling for IPsec GSO packets.
From: Steffen Klassert @ 2018-03-13 7:09 UTC (permalink / raw)
To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <20180313070953.21317-1-steffen.klassert@secunet.com>
When IPsec offloading was introduced, we accidentally incremented
the sequence number counter on the xfrm_state by one packet
too many in the ESN case. This leads to a sequence number gap of
one packet after each GSO packet. Fix this by setting the sequence
number to the correct value.
Fixes: d7dbefc45cf5 ("xfrm: Add xfrm_replay_overflow functions for offloading")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
net/xfrm/xfrm_replay.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/xfrm/xfrm_replay.c b/net/xfrm/xfrm_replay.c
index 1d38c6acf8af..9e3a5e85f828 100644
--- a/net/xfrm/xfrm_replay.c
+++ b/net/xfrm/xfrm_replay.c
@@ -660,7 +660,7 @@ static int xfrm_replay_overflow_offload_esn(struct xfrm_state *x, struct sk_buff
} else {
XFRM_SKB_CB(skb)->seq.output.low = oseq + 1;
XFRM_SKB_CB(skb)->seq.output.hi = oseq_hi;
- xo->seq.low = oseq = oseq + 1;
+ xo->seq.low = oseq + 1;
xo->seq.hi = oseq_hi;
oseq += skb_shinfo(skb)->gso_segs;
}
--
2.14.1
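
The off-by-one can be illustrated with a small user-space model of the low
sequence counter. This is a sketch under assumptions: struct seq_result and
the two advance_*() helpers are hypothetical names, the hi/overflow handling
is left out, and it assumes the segments of one GSO packet consume
consecutive sequence numbers starting at the value stored in xo->seq.low.

```c
#include <stdint.h>

/* Hypothetical result type for the model; not a kernel structure. */
struct seq_result {
	uint32_t first_seq;  /* value that would be stored in xo->seq.low */
	uint32_t next_oseq;  /* counter value left behind for the next packet */
};

/* Pre-fix arithmetic: the counter is bumped once, then again per segment. */
struct seq_result advance_buggy(uint32_t oseq, uint32_t gso_segs)
{
	struct seq_result r;

	r.first_seq = oseq = oseq + 1;
	oseq += gso_segs;
	r.next_oseq = oseq;
	return r;
}

/* Post-fix arithmetic: only the per-segment increment touches the counter. */
struct seq_result advance_fixed(uint32_t oseq, uint32_t gso_segs)
{
	struct seq_result r;

	r.first_seq = oseq + 1;
	oseq += gso_segs;
	r.next_oseq = oseq;
	return r;
}
```

Starting from oseq = 100 with three segments, both variants hand out 101 as
the first number, but the pre-fix version leaves the counter at 104 instead
of 103, so the next packet would start at 105 and 104 is never used.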
^ permalink raw reply related
* Re: [PATCH net-next v2 3/4] ibmvnic: Pad small packets to minimum MTU size
From: kbuild test robot @ 2018-03-13 7:15 UTC (permalink / raw)
To: Thomas Falcon; +Cc: kbuild-all, netdev, davem, jallen, nfont, Thomas Falcon
In-Reply-To: <1520873465-23312-4-git-send-email-tlfalcon@linux.vnet.ibm.com>
[-- Attachment #1: Type: text/plain, Size: 10072 bytes --]
Hi Thomas,
I love your patch! Yet something to improve:
[auto build test ERROR on v4.16-rc4]
[also build test ERROR on next-20180309]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Thomas-Falcon/ibmvnic-Fix-VLAN-and-other-device-errata/20180313-125518
config: powerpc-allmodconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc
All error/warnings (new ones prefixed by >>):
drivers/net/ethernet/ibm/ibmvnic.c: In function 'ibmvnic_xmit':
>> drivers/net/ethernet/ibm/ibmvnic.c:1386:36: error: passing argument 2 of 'ibmvnic_xmit_workarounds' from incompatible pointer type [-Werror=incompatible-pointer-types]
if (ibmvnic_xmit_workarounds(skb, adapter)) {
^~~~~~~
drivers/net/ethernet/ibm/ibmvnic.c:1336:12: note: expected 'struct net_device *' but argument is of type 'struct ibmvnic_adapter *'
static int ibmvnic_xmit_workarounds(struct sk_buff *skb,
^~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/ibm/ibmvnic.c: In function 'ibmvnic_xmit_workarounds':
>> drivers/net/ethernet/ibm/ibmvnic.c:1347:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
cc1: some warnings being treated as errors
vim +/ibmvnic_xmit_workarounds +1386 drivers/net/ethernet/ibm/ibmvnic.c
1335
1336 static int ibmvnic_xmit_workarounds(struct sk_buff *skb,
1337 struct net_device *netdev)
1338 {
1339 /* For some backing devices, mishandling of small packets
1340 * can result in a loss of connection or TX stall. Device
1341 * architects recommend that no packet should be smaller
1342 * than the minimum MTU value provided to the driver, so
1343 * pad any packets to that length
1344 */
1345 if (skb->len < netdev->min_mtu)
1346 return skb_put_padto(skb, netdev->min_mtu);
> 1347 }
1348
1349 static int ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
1350 {
1351 struct ibmvnic_adapter *adapter = netdev_priv(netdev);
1352 int queue_num = skb_get_queue_mapping(skb);
1353 u8 *hdrs = (u8 *)&adapter->tx_rx_desc_req;
1354 struct device *dev = &adapter->vdev->dev;
1355 struct ibmvnic_tx_buff *tx_buff = NULL;
1356 struct ibmvnic_sub_crq_queue *tx_scrq;
1357 struct ibmvnic_tx_pool *tx_pool;
1358 unsigned int tx_send_failed = 0;
1359 unsigned int tx_map_failed = 0;
1360 unsigned int tx_dropped = 0;
1361 unsigned int tx_packets = 0;
1362 unsigned int tx_bytes = 0;
1363 dma_addr_t data_dma_addr;
1364 struct netdev_queue *txq;
1365 unsigned long lpar_rc;
1366 union sub_crq tx_crq;
1367 unsigned int offset;
1368 int num_entries = 1;
1369 unsigned char *dst;
1370 u64 *handle_array;
1371 int index = 0;
1372 u8 proto = 0;
1373 int ret = 0;
1374
1375 if (adapter->resetting) {
1376 if (!netif_subqueue_stopped(netdev, skb))
1377 netif_stop_subqueue(netdev, queue_num);
1378 dev_kfree_skb_any(skb);
1379
1380 tx_send_failed++;
1381 tx_dropped++;
1382 ret = NETDEV_TX_OK;
1383 goto out;
1384 }
1385
> 1386 if (ibmvnic_xmit_workarounds(skb, adapter)) {
1387 tx_dropped++;
1388 tx_send_failed++;
1389 ret = NETDEV_TX_OK;
1390 goto out;
1391 }
1392
1393 tx_pool = &adapter->tx_pool[queue_num];
1394 tx_scrq = adapter->tx_scrq[queue_num];
1395 txq = netdev_get_tx_queue(netdev, skb_get_queue_mapping(skb));
1396 handle_array = (u64 *)((u8 *)(adapter->login_rsp_buf) +
1397 be32_to_cpu(adapter->login_rsp_buf->off_txsubm_subcrqs));
1398
1399 index = tx_pool->free_map[tx_pool->consumer_index];
1400
1401 if (skb_is_gso(skb)) {
1402 offset = tx_pool->tso_index * IBMVNIC_TSO_BUF_SZ;
1403 dst = tx_pool->tso_ltb.buff + offset;
1404 memset(dst, 0, IBMVNIC_TSO_BUF_SZ);
1405 data_dma_addr = tx_pool->tso_ltb.addr + offset;
1406 tx_pool->tso_index++;
1407 if (tx_pool->tso_index == IBMVNIC_TSO_BUFS)
1408 tx_pool->tso_index = 0;
1409 } else {
1410 offset = index * (adapter->req_mtu + VLAN_HLEN);
1411 dst = tx_pool->long_term_buff.buff + offset;
1412 memset(dst, 0, adapter->req_mtu + VLAN_HLEN);
1413 data_dma_addr = tx_pool->long_term_buff.addr + offset;
1414 }
1415
1416 if (skb_shinfo(skb)->nr_frags) {
1417 int cur, i;
1418
1419 /* Copy the head */
1420 skb_copy_from_linear_data(skb, dst, skb_headlen(skb));
1421 cur = skb_headlen(skb);
1422
1423 /* Copy the frags */
1424 for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
1425 const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
1426
1427 memcpy(dst + cur,
1428 page_address(skb_frag_page(frag)) +
1429 frag->page_offset, skb_frag_size(frag));
1430 cur += skb_frag_size(frag);
1431 }
1432 } else {
1433 skb_copy_from_linear_data(skb, dst, skb->len);
1434 }
1435
1436 tx_pool->consumer_index =
1437 (tx_pool->consumer_index + 1) %
1438 adapter->req_tx_entries_per_subcrq;
1439
1440 tx_buff = &tx_pool->tx_buff[index];
1441 tx_buff->skb = skb;
1442 tx_buff->data_dma[0] = data_dma_addr;
1443 tx_buff->data_len[0] = skb->len;
1444 tx_buff->index = index;
1445 tx_buff->pool_index = queue_num;
1446 tx_buff->last_frag = true;
1447
1448 memset(&tx_crq, 0, sizeof(tx_crq));
1449 tx_crq.v1.first = IBMVNIC_CRQ_CMD;
1450 tx_crq.v1.type = IBMVNIC_TX_DESC;
1451 tx_crq.v1.n_crq_elem = 1;
1452 tx_crq.v1.n_sge = 1;
1453 tx_crq.v1.flags1 = IBMVNIC_TX_COMP_NEEDED;
1454 tx_crq.v1.correlator = cpu_to_be32(index);
1455 if (skb_is_gso(skb))
1456 tx_crq.v1.dma_reg = cpu_to_be16(tx_pool->tso_ltb.map_id);
1457 else
1458 tx_crq.v1.dma_reg = cpu_to_be16(tx_pool->long_term_buff.map_id);
1459 tx_crq.v1.sge_len = cpu_to_be32(skb->len);
1460 tx_crq.v1.ioba = cpu_to_be64(data_dma_addr);
1461
1462 if (adapter->vlan_header_insertion) {
1463 tx_crq.v1.flags2 |= IBMVNIC_TX_VLAN_INSERT;
1464 tx_crq.v1.vlan_id = cpu_to_be16(skb->vlan_tci);
1465 }
1466
1467 if (skb->protocol == htons(ETH_P_IP)) {
1468 tx_crq.v1.flags1 |= IBMVNIC_TX_PROT_IPV4;
1469 proto = ip_hdr(skb)->protocol;
1470 } else if (skb->protocol == htons(ETH_P_IPV6)) {
1471 tx_crq.v1.flags1 |= IBMVNIC_TX_PROT_IPV6;
1472 proto = ipv6_hdr(skb)->nexthdr;
1473 }
1474
1475 if (proto == IPPROTO_TCP)
1476 tx_crq.v1.flags1 |= IBMVNIC_TX_PROT_TCP;
1477 else if (proto == IPPROTO_UDP)
1478 tx_crq.v1.flags1 |= IBMVNIC_TX_PROT_UDP;
1479
1480 if (skb->ip_summed == CHECKSUM_PARTIAL) {
1481 tx_crq.v1.flags1 |= IBMVNIC_TX_CHKSUM_OFFLOAD;
1482 hdrs += 2;
1483 }
1484 if (skb_is_gso(skb)) {
1485 tx_crq.v1.flags1 |= IBMVNIC_TX_LSO;
1486 tx_crq.v1.mss = cpu_to_be16(skb_shinfo(skb)->gso_size);
1487 hdrs += 2;
1488 }
1489 /* determine if l2/3/4 headers are sent to firmware */
1490 if ((*hdrs >> 7) & 1) {
1491 build_hdr_descs_arr(tx_buff, &num_entries, *hdrs);
1492 tx_crq.v1.n_crq_elem = num_entries;
1493 tx_buff->indir_arr[0] = tx_crq;
1494 tx_buff->indir_dma = dma_map_single(dev, tx_buff->indir_arr,
1495 sizeof(tx_buff->indir_arr),
1496 DMA_TO_DEVICE);
1497 if (dma_mapping_error(dev, tx_buff->indir_dma)) {
1498 dev_kfree_skb_any(skb);
1499 tx_buff->skb = NULL;
1500 if (!firmware_has_feature(FW_FEATURE_CMO))
1501 dev_err(dev, "tx: unable to map descriptor array\n");
1502 tx_map_failed++;
1503 tx_dropped++;
1504 ret = NETDEV_TX_OK;
1505 goto out;
1506 }
1507 lpar_rc = send_subcrq_indirect(adapter, handle_array[queue_num],
1508 (u64)tx_buff->indir_dma,
1509 (u64)num_entries);
1510 } else {
1511 lpar_rc = send_subcrq(adapter, handle_array[queue_num],
1512 &tx_crq);
1513 }
1514 if (lpar_rc != H_SUCCESS) {
1515 dev_err(dev, "tx failed with code %ld\n", lpar_rc);
1516
1517 if (tx_pool->consumer_index == 0)
1518 tx_pool->consumer_index =
1519 adapter->req_tx_entries_per_subcrq - 1;
1520 else
1521 tx_pool->consumer_index--;
1522
1523 dev_kfree_skb_any(skb);
1524 tx_buff->skb = NULL;
1525
1526 if (lpar_rc == H_CLOSED) {
1527 /* Disable TX and report carrier off if queue is closed.
1528 * Firmware guarantees that a signal will be sent to the
1529 * driver, triggering a reset or some other action.
1530 */
1531 netif_tx_stop_all_queues(netdev);
1532 netif_carrier_off(netdev);
1533 }
1534
1535 tx_send_failed++;
1536 tx_dropped++;
1537 ret = NETDEV_TX_OK;
1538 goto out;
1539 }
1540
1541 if (atomic_inc_return(&tx_scrq->used)
1542 >= adapter->req_tx_entries_per_subcrq) {
1543 netdev_info(netdev, "Stopping queue %d\n", queue_num);
1544 netif_stop_subqueue(netdev, queue_num);
1545 }
1546
1547 tx_packets++;
1548 tx_bytes += skb->len;
1549 txq->trans_start = jiffies;
1550 ret = NETDEV_TX_OK;
1551
1552 out:
1553 netdev->stats.tx_dropped += tx_dropped;
1554 netdev->stats.tx_bytes += tx_bytes;
1555 netdev->stats.tx_packets += tx_packets;
1556 adapter->tx_send_failed += tx_send_failed;
1557 adapter->tx_map_failed += tx_map_failed;
1558 adapter->tx_stats_buffers[queue_num].packets += tx_packets;
1559 adapter->tx_stats_buffers[queue_num].bytes += tx_bytes;
1560 adapter->tx_stats_buffers[queue_num].dropped_packets += tx_dropped;
1561
1562 return ret;
1563 }
1564
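
Both diagnostics above point at the new helper. A user-space sketch of the
shape the compiler is asking for follows. It is not the driver code: struct
pkt, struct dev, pad_to() and xmit_workarounds() are hypothetical stand-ins
for sk_buff, net_device, skb_put_padto() and ibmvnic_xmit_workarounds().

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two fields the helper reads. */
struct pkt { unsigned int len; };
struct dev { unsigned int min_mtu; };

/* Stand-in for skb_put_padto(): pad up to 'len', return 0 on success. */
static int pad_to(struct pkt *p, unsigned int len)
{
	if (p->len < len)
		p->len = len;
	return 0;
}

/* Every path returns a value, which addresses the -Wreturn-type warning. */
int xmit_workarounds(struct pkt *p, struct dev *d)
{
	if (p->len < d->min_mtu)
		return pad_to(p, d->min_mtu);

	return 0;
}
```

For the incompatible pointer error, the note on line 1336 says the helper
expects a 'struct net_device *', so the call on line 1386 would presumably
pass netdev rather than adapter.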
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 56598 bytes --]
^ permalink raw reply
* Re: Problem with bridge (mcast-to-ucast + hairpin) and Broadcom's 802.11f in their FullMAC fw
From: Felix Fietkau @ 2018-03-13 7:17 UTC (permalink / raw)
To: Rafał Miłecki, Linus Lüssing, Arend van Spriel,
Franky Lin, Hante Meuleman, Chi-Hsien Lin, Wright Feng,
Pieter-Paul Giesberts
Cc: Network Development,
open list:BROADCOM BRCM80211 IEEE802.11n WIRELESS DRIVER, bridge,
brcm80211-dev-list, inux-wireless
In-Reply-To: <CACna6rz9L09g9oeHhvt209Tg1E3gKgmhGnYF653AdkXfZf=4kw@mail.gmail.com>
On 2018-02-27 11:08, Rafał Miłecki wrote:
> I have a problem when using OpenWrt/LEDE on a home router with Broadcom's
> FullMAC WiFi chipset.
>
>
> First of all OpenWrt/LEDE uses bridge interface for LAN network with:
> 1) IFLA_BRPORT_MCAST_TO_UCAST
> 2) Client isolation in hostapd
> 3) Hairpin mode enabled
>
> For more details please see Linus's patch description:
> https://patchwork.kernel.org/patch/9530669/
> and maybe hairpin mode patch:
> https://lwn.net/Articles/347344/
>
> Short version: in that setup packets received from a bridged wireless
> interface can be handed back to it for transmission.
>
>
> Now, Broadcom's firmware for their FullMAC chipsets in AP mode
> supports an obsoleted 802.11f AKA IAPP standard. It's a roaming
> standard that was replaced by 802.11r.
>
> Whenever a new station associates, firmware generates a packet like:
> ff ff ff ff ff ff ec 10 7b 5f ?? ?? 00 06 00 01 af 81 01 00
> (just masked 2 bytes of my MAC)
>
> For more details you can see discussion in my brcmfmac patch thread:
> https://patchwork.kernel.org/patch/10191451/
>
>
> The problem is that bridge (in setup as above) hands such a packet
> back to the device.
>
> That makes Broadcom's FullMAC firmware believe that a given station
> just connected to another AP in a network (which doesn't even exist).
> As a result firmware immediately disassociates that station. It's
> simply impossible to connect to the router. Every association is
> followed by immediate disassociation.
>
>
> Can you see any solution for this problem? Is that an option to stop
> multicast-to-unicast from touching 802.11f packets? Some other ideas?
> Obviously I can't modify Broadcom's firmware and drop that obsoleted
> standard.
Let's look at it from a different angle: Since these packets are
forwarded as normal packets by the bridge, and the Broadcom firmware
reacts to them in this nasty way, that's basically a local DoS security
issue. In my opinion that matters a lot more than having support for an
obsolete feature that almost nobody will ever want to use.
I think the right approach to deal with this issue is to drop these
garbage packets in both the receive and transmit path of brcmfmac.
- Felix
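
A filter of the kind suggested here could look roughly like the following
user-space sketch. It is not brcmfmac code: is_fw_iapp_frame() is a
hypothetical helper, and the only pattern it knows is the one frame quoted
above (broadcast destination, then 00 06 00 01 af 81 01 00 after the source
address). Whether that pattern is the right match criterion for a real
driver is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Bytes that follow the two MAC addresses in the quoted frame:
 * 00 06, then 00 01 af 81 01 00.
 */
static const unsigned char iapp_tail[] = {
	0x00, 0x06, 0x00, 0x01, 0xaf, 0x81, 0x01, 0x00
};

/* Hypothetical helper: true if 'frame' matches the quoted pattern. */
bool is_fw_iapp_frame(const unsigned char *frame, size_t len)
{
	static const unsigned char bcast[6] = {
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff
	};

	if (len < 12 + sizeof(iapp_tail))
		return false;
	if (memcmp(frame, bcast, sizeof(bcast)) != 0)
		return false;
	/* Bytes 6..11 hold the source address and are not compared. */
	return memcmp(frame + 12, iapp_tail, sizeof(iapp_tail)) == 0;
}
```

A driver would call such a check in both the receive and the transmit path
and drop the frame when it returns true.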
^ permalink raw reply
* Re: Problem with bridge (mcast-to-ucast + hairpin) and Broadcom's 802.11f in their FullMAC fw
From: Felix Fietkau @ 2018-03-13 7:20 UTC (permalink / raw)
To: Rafał Miłecki, Linus Lüssing, Arend van Spriel,
Franky Lin, Hante Meuleman, Chi-Hsien Lin, Wright Feng,
Pieter-Paul Giesberts
Cc: Network Development,
bridge-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-wireless-u79uwXL29TY76Z2rM5mHXA,
open list:BROADCOM BRCM80211 IEEE802.11n WIRELESS DRIVER,
brcm80211-dev-list-+wT8y+m8/X5BDgjK7y7TUQ
In-Reply-To: <CACna6rz9L09g9oeHhvt209Tg1E3gKgmhGnYF653AdkXfZf=4kw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
[resent with fixed typo in linux-wireless address]
^ permalink raw reply
* Re: [2/2] net/usb/ax88179_178a: Delete three unnecessary variables in ax88179_chk_eee()
From: SF Markus Elfring @ 2018-03-13 7:24 UTC (permalink / raw)
To: Oliver Neukum, linux-usb, netdev
Cc: kernel-janitors, LKML, Andrew F. Davis, Andrew Lunn,
Bjørn Mork, David S. Miller, Philippe Reynes, Yuval Shaia
In-Reply-To: <1520849038.29340.3.camel@suse.com>
>> Use three values directly for a condition check without assigning them
>> to intermediate variables.
>
> Hi,
>
> what is the benefit of this?
I proposed a small source code reduction.
Other software design directions might become more interesting for this use case.
Regards,
Markus
^ permalink raw reply
* Re: [pci PATCH v5 1/4] pci: Add pci_sriov_configure_simple for PFs that don't manage VF resources
From: Christoph Hellwig @ 2018-03-13 7:44 UTC (permalink / raw)
To: Alexander Duyck
Cc: Keith Busch, Bjorn Helgaas, Duyck, Alexander H, linux-pci,
virtio-dev, kvm, Netdev, Daly, Dan, LKML, linux-nvme, netanel,
Maximilian Heyne, Wang, Liang-min, Rustad, Mark D,
David Woodhouse, Christoph Hellwig, dwmw
In-Reply-To: <CAKgT0UfH+xXk__R_hEtFMsm7qkBG02hWC-S=8MgYkeeEx5zweA@mail.gmail.com>
On Mon, Mar 12, 2018 at 01:17:00PM -0700, Alexander Duyck wrote:
> No, I am aware of those. The problem is they aren't accessed as
> function pointers. As such converting them to static inline functions
> is easy. As I am sure you are aware an "inline" function doesn't
> normally generate a function pointer.
I think Keith's original idea of defining them to NULL is right. That
takes care of all the current trivial assign to struct cases.
If someone wants to call these functions they'll still need the ifdef
around the call as those won't otherwise compile, but they probably
want the ifdef around the whole caller anyway.
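
The "define it to NULL" idea can be sketched in user space as follows. All
names here are hypothetical stand-ins (MODEL_CONFIG_PCI_IOV for
CONFIG_PCI_IOV, sriov_configure_simple for pci_sriov_configure_simple,
struct driver_model for struct pci_driver); this is not the PCI header.

```c
#include <stddef.h>

struct pci_dev_model;  /* hypothetical opaque stand-in for struct pci_dev */

#ifdef MODEL_CONFIG_PCI_IOV  /* hypothetical stand-in for CONFIG_PCI_IOV */
int sriov_configure_simple(struct pci_dev_model *dev, int nr_virtfn);
#else
#define sriov_configure_simple NULL
#endif

struct driver_model {
	const char *name;
	int (*sriov_configure)(struct pci_dev_model *dev, int nr_virtfn);
};

/*
 * No #ifdef at the assignment site: the initializer is either the real
 * function or NULL, depending on the configuration.
 */
struct driver_model example_driver = {
	.name = "example",
	.sriov_configure = sriov_configure_simple,
};
```

With the option off, a direct call such as sriov_configure_simple(dev, 1)
expands to a call through NULL and fails to compile, which is why a caller
would still need its own ifdef.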
^ permalink raw reply
* Re: aio poll, io_pgetevents and a new in-kernel poll API V5
From: Christoph Hellwig @ 2018-03-13 7:46 UTC (permalink / raw)
To: viro; +Cc: Avi Kivity, linux-aio, linux-fsdevel, netdev, linux-api,
linux-kernel
In-Reply-To: <20180305212743.16664-1-hch@lst.de>
ping?
On Mon, Mar 05, 2018 at 01:27:07PM -0800, Christoph Hellwig wrote:
> Hi all,
>
> this series adds support for the IOCB_CMD_POLL operation to poll for the
> readiness of file descriptors using the aio subsystem. The API is based
> on patches that existed in RHAS2.1 and RHEL3, which means it already is
> supported by libaio. To implement the poll support efficiently new
> methods to poll are introduced in struct file_operations: get_poll_head
> and poll_mask. The first one returns a wait_queue_head to wait on
> (lifetime is bound by the file), and the second does a non-blocking
> check for the POLL* events. This allows aio poll to work without
> any additional context switches, unlike epoll.
>
> To make the interface fully useful a new io_pgetevents system call is
> added, which atomically saves and restores the signal mask over the
> io_pgetevents system call. It is the logical equivalent to pselect and
> ppoll for io_pgetevents.
>
> The corresponding libaio changes for io_pgetevents support and
> documentation, as well as a test case will be posted in a separate
> series.
>
> The changes were sponsored by Scylladb, and improve performance
> of the seastar framework up to 10%, while also removing the need
> for a privileged SCHED_FIFO epoll listener thread.
>
> git://git.infradead.org/users/hch/vfs.git aio-poll.5
>
> Gitweb:
>
> http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/aio-poll.5
>
> Libaio changes:
>
> https://pagure.io/libaio.git io-poll
>
> Seastar changes (not updated for the new io_pgetevents ABI yet):
>
> https://github.com/avikivity/seastar/commits/aio
>
> Changes since V4:
> - rebased on top of Linux 4.16-rc4
>
> Changes since V3:
> - remove the pre-sleep ->poll_mask call in vfs_poll,
> allow ->get_poll_head to return POLL* values.
>
> Changes since V2:
> - removed a double initialization
> - new vfs_get_poll_head helper
> - document that ->get_poll_head can return NULL
> - call ->poll_mask before sleeping
> - various ACKs
> - add conversion of random to ->poll_mask
> - add conversion of af_alg to ->poll_mask
> - lacking ->poll_mask support now returns -EINVAL for IOCB_CMD_POLL
> - reshuffled the series so that prep patches and everything not
> requiring the new in-kernel poll API is in the beginning
>
> Changes since V1:
> - handle the NULL ->poll case in vfs_poll
> - dropped the file argument to the ->poll_mask socket operation
> - replace the ->pre_poll socket operation with ->get_poll_head as
> in the file operations
---end quoted text---
^ permalink raw reply
* Re: WARNING in kmalloc_slab (4)
From: Steffen Klassert @ 2018-03-13 7:51 UTC (permalink / raw)
To: syzbot; +Cc: davem, herbert, linux-kernel, netdev, syzkaller-bugs
In-Reply-To: <001a114214fac20a80056746440a@google.com>
On Tue, Mar 13, 2018 at 12:33:02AM -0700, syzbot wrote:
> Hello,
>
> syzbot hit the following crash on net-next commit
> f44b1886a5f876c87b5889df463ad7b97834ba37 (Fri Mar 9 18:10:06 2018 +0000)
> Merge branch 's390-qeth-next'
>
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+6a7e7ed886bde43469c4@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> WARNING: CPU: 1 PID: 27333 at mm/slab_common.c:1012 kmalloc_slab+0x5d/0x70
> mm/slab_common.c:1012
> Kernel panic - not syncing: panic_on_warn set ...
>
> syz-executor0: vmalloc: allocation failure: 17045651456 bytes,
> mode:0x14080c0(GFP_KERNEL|__GFP_ZERO), nodemask=(null)
> CPU: 1 PID: 27333 Comm: syz-executor2 Not tainted 4.16.0-rc4+ #260
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
> __dump_stack lib/dump_stack.c:17 [inline]
> dump_stack+0x194/0x24d lib/dump_stack.c:53
> panic+0x1e4/0x41c kernel/panic.c:183
> syz-executor0 cpuset=
> __warn+0x1dc/0x200 kernel/panic.c:547
> /
> mems_allowed=0
> report_bug+0x211/0x2d0 lib/bug.c:184
> fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
> fixup_bug arch/x86/kernel/traps.c:247 [inline]
> do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
> do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
> invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
> RIP: 0010:kmalloc_slab+0x5d/0x70 mm/slab_common.c:1012
> RSP: 0018:ffff8801ccfc72f0 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 0000000010000018 RCX: ffffffff84ec4fc8
> RDX: 0000000000000ba7 RSI: 0000000000000000 RDI: 0000000010000018
> RBP: ffff8801ccfc72f0 R08: 0000000000000000 R09: 1ffff100399f8e21
> R10: ffff8801ccfc7040 R11: 0000000000000001 R12: 0000000000000018
> R13: ffff8801ccfc7598 R14: 00000000014080c0 R15: ffff8801aebaad80
> __do_kmalloc mm/slab.c:3700 [inline]
> __kmalloc+0x25/0x760 mm/slab.c:3714
> kmalloc include/linux/slab.h:517 [inline]
> kzalloc include/linux/slab.h:701 [inline]
> xfrm_alloc_replay_state_esn net/xfrm/xfrm_user.c:442 [inline]
This is likely fixed with:
commit d97ca5d714a5334aecadadf696875da40f1fbf3e
xfrm_user: uncoditionally validate esn replay attribute struct
The patch is included in the ipsec pull request for the net
tree I've sent this morning.
^ permalink raw reply
* Re: WARNING in kmalloc_slab (4)
From: Dmitry Vyukov @ 2018-03-13 8:04 UTC (permalink / raw)
To: Steffen Klassert
Cc: syzbot, David Miller, Herbert Xu, LKML, netdev, syzkaller-bugs
In-Reply-To: <20180313075143.b3ymdpt3nj3vnz77@gauss3.secunet.de>
On Tue, Mar 13, 2018 at 10:51 AM, Steffen Klassert
<steffen.klassert@secunet.com> wrote:
> On Tue, Mar 13, 2018 at 12:33:02AM -0700, syzbot wrote:
>> Hello,
>>
>> syzbot hit the following crash on net-next commit
>> f44b1886a5f876c87b5889df463ad7b97834ba37 (Fri Mar 9 18:10:06 2018 +0000)
>> Merge branch 's390-qeth-next'
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>> Raw console output is attached.
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+6a7e7ed886bde43469c4@syzkaller.appspotmail.com
>> It will help syzbot understand when the bug is fixed. See footer for
>> details.
>> If you forward the report, please keep this part and the footer.
>>
>> WARNING: CPU: 1 PID: 27333 at mm/slab_common.c:1012 kmalloc_slab+0x5d/0x70
>> mm/slab_common.c:1012
>> Kernel panic - not syncing: panic_on_warn set ...
>>
>> syz-executor0: vmalloc: allocation failure: 17045651456 bytes,
>> mode:0x14080c0(GFP_KERNEL|__GFP_ZERO), nodemask=(null)
>> CPU: 1 PID: 27333 Comm: syz-executor2 Not tainted 4.16.0-rc4+ #260
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> Google 01/01/2011
>> Call Trace:
>> __dump_stack lib/dump_stack.c:17 [inline]
>> dump_stack+0x194/0x24d lib/dump_stack.c:53
>> panic+0x1e4/0x41c kernel/panic.c:183
>> syz-executor0 cpuset=
>> __warn+0x1dc/0x200 kernel/panic.c:547
>> /
>> mems_allowed=0
>> report_bug+0x211/0x2d0 lib/bug.c:184
>> fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
>> fixup_bug arch/x86/kernel/traps.c:247 [inline]
>> do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
>> do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>> invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
>> RIP: 0010:kmalloc_slab+0x5d/0x70 mm/slab_common.c:1012
>> RSP: 0018:ffff8801ccfc72f0 EFLAGS: 00010246
>> RAX: 0000000000000000 RBX: 0000000010000018 RCX: ffffffff84ec4fc8
>> RDX: 0000000000000ba7 RSI: 0000000000000000 RDI: 0000000010000018
>> RBP: ffff8801ccfc72f0 R08: 0000000000000000 R09: 1ffff100399f8e21
>> R10: ffff8801ccfc7040 R11: 0000000000000001 R12: 0000000000000018
>> R13: ffff8801ccfc7598 R14: 00000000014080c0 R15: ffff8801aebaad80
>> __do_kmalloc mm/slab.c:3700 [inline]
>> __kmalloc+0x25/0x760 mm/slab.c:3714
>> kmalloc include/linux/slab.h:517 [inline]
>> kzalloc include/linux/slab.h:701 [inline]
>> xfrm_alloc_replay_state_esn net/xfrm/xfrm_user.c:442 [inline]
>
> This is likely fixed with:
>
> commit d97ca5d714a5334aecadadf696875da40f1fbf3e
> xfrm_user: uncoditionally validate esn replay attribute struct
>
> The patch is included in the ipsec pull request for the net
> tree I've sent this morning.
Let's tell syzbot:
#syz fix: xfrm_user: uncoditionally validate esn replay attribute struct
^ permalink raw reply
* Re: [pci PATCH v5 3/4] ena: Migrate over to unmanaged SR-IOV support
From: David Woodhouse @ 2018-03-13 8:12 UTC (permalink / raw)
To: Alexander Duyck, bhelgaas, alexander.h.duyck, linux-pci
Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
keith.busch, netanel, mheyne, liang-min.wang, mark.d.rustad, hch
In-Reply-To: <20180312172309.3487.76690.stgit@localhost.localdomain>
On Mon, 2018-03-12 at 10:23 -0700, Alexander Duyck wrote:
>
> - .sriov_configure = ena_sriov_configure,
> +#ifdef CONFIG_PCI_IOV
> + .sriov_configure = pci_sriov_configure_simple,
> +#endif
> };
I'd like to see that ifdef go away, as discussed. I agree that just
#define pci_sriov_configure_simple NULL
should suffice. As Christoph points out, it's not going to compile if
people try to just invoke it directly.
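To make the suggestion concrete, here is a minimal userspace sketch of the pattern, not the actual kernel change. All demo_* names and the DEMO_PCI_IOV switch are hypothetical stand-ins (for pci_sriov_configure_simple and CONFIG_PCI_IOV respectively). With the feature compiled out, the helper name is defined to NULL, so the driver's initializer needs no #ifdef, while a direct call such as demo_sriov_configure_simple(&dev, 4) would expand to a call through NULL and fail to compile.

```c
#include <stddef.h>

/* Hypothetical switch standing in for CONFIG_PCI_IOV; left undefined here. */
/* #define DEMO_PCI_IOV 1 */

struct demo_dev {
	int num_vfs;
};

#ifdef DEMO_PCI_IOV
/* Stand-in for the real helper: records the VF count and returns it. */
static int demo_sriov_configure_simple(struct demo_dev *dev, int num_vfs)
{
	dev->num_vfs = num_vfs;
	return num_vfs;
}
#else
/* Feature compiled out: the name is only usable as a NULL initializer. */
#define demo_sriov_configure_simple NULL
#endif

struct demo_driver {
	const char *name;
	int (*sriov_configure)(struct demo_dev *dev, int num_vfs);
};

/* The use site carries no #ifdef in either configuration. */
static struct demo_driver demo_ena_driver = {
	.name = "demo_ena",
	.sriov_configure = demo_sriov_configure_simple,
};

/* Core-side caller: a NULL hook is treated as "SR-IOV not supported". */
static int demo_set_numvfs(struct demo_driver *drv, struct demo_dev *dev,
			   int num_vfs)
{
	if (!drv->sriov_configure)
		return -1;
	return drv->sriov_configure(dev, num_vfs);
}
```

The trade-off in this sketch is that the caller must check the hook for NULL, which a core that already treats the hook as optional would do anyway.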
I'd also *really* like to see a way to enable this for PFs which don't
have (and don't need) a driver. We seem to have lost that along the
way.
^ permalink raw reply
* Re: [pci PATCH v5 3/4] ena: Migrate over to unmanaged SR-IOV support
From: Christoph Hellwig @ 2018-03-13 8:16 UTC (permalink / raw)
To: David Woodhouse
Cc: Alexander Duyck, bhelgaas, alexander.h.duyck, linux-pci,
virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
keith.busch, netanel, mheyne, liang-min.wang, mark.d.rustad, hch
In-Reply-To: <1520928772.28745.53.camel@infradead.org>
On Tue, Mar 13, 2018 at 08:12:52AM +0000, David Woodhouse wrote:
> I'd also *really* like to see a way to enable this for PFs which don't
> have (and don't need) a driver. We seem to have lost that along the
> way.
We've been back and forth on that. I agree that not having any driver
just seems dangerous. If your PF really does nothing, we should just
have a trivial pf_stub driver that does nothing but wire up
pci_sriov_configure_simple. We can then add PCI IDs to it either
statically or through the dynamic IDs mechanism.
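As a rough illustration of the two ways of feeding IDs to such a stub, the following userspace sketch models a static ID table plus a run-time list. It is not kernel code: every demo_* name and the vendor/device values are made up, and the run-time list only loosely mirrors what the PCI core's dynamic ID support (the driver's new_id sysfs file) provides.

```c
#include <stddef.h>

struct demo_id {
	unsigned short vendor;
	unsigned short device;
};

#define DEMO_MAX_DYN_IDS 8

/* Static table: IDs known at build time (values are invented). */
static const struct demo_id demo_static_ids[] = {
	{ 0x1234, 0x0001 },
};

/* Dynamic list: IDs added at run time. */
static struct demo_id demo_dyn_ids[DEMO_MAX_DYN_IDS];
static size_t demo_dyn_count;

/* Append an ID to the dynamic list; returns -1 when the list is full. */
static int demo_add_dynamic_id(unsigned short vendor, unsigned short device)
{
	if (demo_dyn_count >= DEMO_MAX_DYN_IDS)
		return -1;
	demo_dyn_ids[demo_dyn_count].vendor = vendor;
	demo_dyn_ids[demo_dyn_count].device = device;
	demo_dyn_count++;
	return 0;
}

/* Returns 1 if the stub would claim the device, checking both lists. */
static int demo_stub_matches(unsigned short vendor, unsigned short device)
{
	size_t i;
	size_t n = sizeof(demo_static_ids) / sizeof(demo_static_ids[0]);

	for (i = 0; i < n; i++)
		if (demo_static_ids[i].vendor == vendor &&
		    demo_static_ids[i].device == device)
			return 1;
	for (i = 0; i < demo_dyn_count; i++)
		if (demo_dyn_ids[i].vendor == vendor &&
		    demo_dyn_ids[i].device == device)
			return 1;
	return 0;
}
```

In this model a device not listed statically is ignored until its ID is added at run time, which is the behaviour the dynamic route is meant to give for PFs the stub does not know about in advance.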
^ permalink raw reply