Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net-next v13 8/8] openvswitch: allow L3 netdev ports
From: Jiri Benc @ 2016-11-10 15:28 UTC (permalink / raw)
  To: netdev; +Cc: dev, Pravin Shelar, Lorand Jakab, Simon Horman
In-Reply-To: <cover.1478791347.git.jbenc@redhat.com>

Allow ARPHRD_NONE interfaces to be added to ovs bridge.

Based on previous versions by Lorand Jakab and Simon Horman.

Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
---
 net/openvswitch/vport-netdev.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index e825753de1e0..0389398fa4ab 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -57,8 +57,10 @@ static void netdev_port_receive(struct sk_buff *skb)
 	if (unlikely(!skb))
 		return;
 
-	skb_push(skb, ETH_HLEN);
-	skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
+	if (skb->dev->type == ARPHRD_ETHER) {
+		skb_push(skb, ETH_HLEN);
+		skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
+	}
 	ovs_vport_receive(vport, skb, skb_tunnel_info(skb));
 	return;
 error:
@@ -97,7 +99,8 @@ struct vport *ovs_netdev_link(struct vport *vport, const char *name)
 	}
 
 	if (vport->dev->flags & IFF_LOOPBACK ||
-	    vport->dev->type != ARPHRD_ETHER ||
+	    (vport->dev->type != ARPHRD_ETHER &&
+	     vport->dev->type != ARPHRD_NONE) ||
 	    ovs_is_internal_dev(vport->dev)) {
 		err = -EINVAL;
 		goto error_put;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next v13 5/8] openvswitch: add processing of L3 packets
From: Jiri Benc @ 2016-11-10 15:28 UTC (permalink / raw)
  To: netdev; +Cc: dev, Pravin Shelar, Lorand Jakab, Simon Horman
In-Reply-To: <cover.1478791347.git.jbenc@redhat.com>

Support receiving, extracting flow key and sending of L3 packets (packets
without an Ethernet header).

Note that even after this patch, non-Ethernet interfaces are still not
allowed to be added to bridges. Similarly, netlink interface for sending and
receiving L3 packets to/from user space is not in place yet.

Based on previous versions by Lorand Jakab and Simon Horman.

Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
v13:
- fix incorrect setting of packet to NULL in ovs_packet_cmd_execute
- dropping packet for interfaces with wrong type
---
 net/openvswitch/datapath.c |  13 +-----
 net/openvswitch/flow.c     | 106 ++++++++++++++++++++++++++++++++++-----------
 net/openvswitch/vport.c    |  19 ++++++++
 3 files changed, 101 insertions(+), 37 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index fa8760176b7d..1402f1be642d 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -560,7 +560,6 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 	struct sw_flow *flow;
 	struct sw_flow_actions *sf_acts;
 	struct datapath *dp;
-	struct ethhdr *eth;
 	struct vport *input_vport;
 	u16 mru = 0;
 	int len;
@@ -581,17 +580,6 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 
 	nla_memcpy(__skb_put(packet, len), a[OVS_PACKET_ATTR_PACKET], len);
 
-	skb_reset_mac_header(packet);
-	eth = eth_hdr(packet);
-
-	/* Normally, setting the skb 'protocol' field would be handled by a
-	 * call to eth_type_trans(), but it assumes there's a sending
-	 * device, which we may not have. */
-	if (eth_proto_is_802_3(eth->h_proto))
-		packet->protocol = eth->h_proto;
-	else
-		packet->protocol = htons(ETH_P_802_2);
-
 	/* Set packet's mru */
 	if (a[OVS_PACKET_ATTR_MRU]) {
 		mru = nla_get_u16(a[OVS_PACKET_ATTR_MRU]);
@@ -618,6 +606,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 	rcu_assign_pointer(flow->sf_acts, acts);
 	packet->priority = flow->key.phy.priority;
 	packet->mark = flow->key.phy.skb_mark;
+	packet->protocol = flow->key.eth.type;
 
 	rcu_read_lock();
 	dp = get_dp_rcu(net, ovs_header->dp_ifindex);
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 96c8c4716603..08aa926cd5cf 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -334,14 +334,17 @@ static int parse_vlan_tag(struct sk_buff *skb, struct vlan_head *key_vh)
 	return 1;
 }
 
-static int parse_vlan(struct sk_buff *skb, struct sw_flow_key *key)
+static void clear_vlan(struct sw_flow_key *key)
 {
-	int res;
-
 	key->eth.vlan.tci = 0;
 	key->eth.vlan.tpid = 0;
 	key->eth.cvlan.tci = 0;
 	key->eth.cvlan.tpid = 0;
+}
+
+static int parse_vlan(struct sk_buff *skb, struct sw_flow_key *key)
+{
+	int res;
 
 	if (skb_vlan_tag_present(skb)) {
 		key->eth.vlan.tci = htons(skb->vlan_tci);
@@ -483,17 +486,20 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key,
  *
  * Returns 0 if successful, otherwise a negative errno value.
  *
- * Initializes @skb header pointers as follows:
+ * Initializes @skb header fields as follows:
  *
- *    - skb->mac_header: the Ethernet header.
+ *    - skb->mac_header: the L2 header.
  *
- *    - skb->network_header: just past the Ethernet header, or just past the
- *      VLAN header, to the first byte of the Ethernet payload.
+ *    - skb->network_header: just past the L2 header, or just past the
+ *      VLAN header, to the first byte of the L2 payload.
  *
  *    - skb->transport_header: If key->eth.type is ETH_P_IP or ETH_P_IPV6
  *      on output, then just past the IP header, if one is present and
  *      of a correct length, otherwise the same as skb->network_header.
  *      For other key->eth.type values it is left untouched.
+ *
+ *    - skb->protocol: the type of the data starting at skb->network_header.
+ *      Equals to key->eth.type.
  */
 static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 {
@@ -505,28 +511,35 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
 
 	skb_reset_mac_header(skb);
 
-	/* Link layer.  We are guaranteed to have at least the 14 byte Ethernet
-	 * header in the linear data area.
-	 */
-	eth = eth_hdr(skb);
-	ether_addr_copy(key->eth.src, eth->h_source);
-	ether_addr_copy(key->eth.dst, eth->h_dest);
+	/* Link layer. */
+	clear_vlan(key);
+	if (key->mac_proto == MAC_PROTO_NONE) {
+		if (unlikely(eth_type_vlan(skb->protocol)))
+			return -EINVAL;
 
-	__skb_pull(skb, 2 * ETH_ALEN);
-	/* We are going to push all headers that we pull, so no need to
-	 * update skb->csum here.
-	 */
+		skb_reset_network_header(skb);
+	} else {
+		eth = eth_hdr(skb);
+		ether_addr_copy(key->eth.src, eth->h_source);
+		ether_addr_copy(key->eth.dst, eth->h_dest);
 
-	if (unlikely(parse_vlan(skb, key)))
-		return -ENOMEM;
+		__skb_pull(skb, 2 * ETH_ALEN);
+		/* We are going to push all headers that we pull, so no need to
+		* update skb->csum here.
+		*/
 
-	key->eth.type = parse_ethertype(skb);
-	if (unlikely(key->eth.type == htons(0)))
-		return -ENOMEM;
+		if (unlikely(parse_vlan(skb, key)))
+			return -ENOMEM;
+
+		skb->protocol = parse_ethertype(skb);
+		if (unlikely(skb->protocol == htons(0)))
+			return -ENOMEM;
 
-	skb_reset_network_header(skb);
+		skb_reset_network_header(skb);
+		__skb_push(skb, skb->data - skb_mac_header(skb));
+	}
 	skb_reset_mac_len(skb);
-	__skb_push(skb, skb->data - skb_mac_header(skb));
+	key->eth.type = skb->protocol;
 
 	/* Network layer. */
 	if (key->eth.type == htons(ETH_P_IP)) {
@@ -721,9 +734,25 @@ int ovs_flow_key_update(struct sk_buff *skb, struct sw_flow_key *key)
 	return key_extract(skb, key);
 }
 
+static int key_extract_mac_proto(struct sk_buff *skb)
+{
+	switch (skb->dev->type) {
+	case ARPHRD_ETHER:
+		return MAC_PROTO_ETHERNET;
+	case ARPHRD_NONE:
+		if (skb->protocol == htons(ETH_P_TEB))
+			return MAC_PROTO_ETHERNET;
+		return MAC_PROTO_NONE;
+	}
+	WARN_ON_ONCE(1);
+	return -EINVAL;
+}
+
 int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 			 struct sk_buff *skb, struct sw_flow_key *key)
 {
+	int res;
+
 	/* Extract metadata from packet. */
 	if (tun_info) {
 		key->tun_proto = ip_tunnel_info_af(tun_info);
@@ -751,7 +780,10 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 	key->phy.skb_mark = skb->mark;
 	ovs_ct_fill_key(skb, key);
 	key->ovs_flow_hash = 0;
-	key->mac_proto = MAC_PROTO_ETHERNET;
+	res = key_extract_mac_proto(skb);
+	if (res < 0)
+		return res;
+	key->mac_proto = res;
 	key->recirc_id = 0;
 
 	return key_extract(skb, key);
@@ -768,5 +800,29 @@ int ovs_flow_key_extract_userspace(struct net *net, const struct nlattr *attr,
 	if (err)
 		return err;
 
+	if (ovs_key_mac_proto(key) == MAC_PROTO_NONE) {
+		/* key_extract assumes that skb->protocol is set-up for
+		 * layer 3 packets which is the case for other callers,
+		 * in particular packets recieved from the network stack.
+		 * Here the correct value can be set from the metadata
+		 * extracted above.
+		 */
+		skb->protocol = key->eth.type;
+	} else {
+		struct ethhdr *eth;
+
+		skb_reset_mac_header(skb);
+		eth = eth_hdr(skb);
+
+		/* Normally, setting the skb 'protocol' field would be
+		 * handled by a call to eth_type_trans(), but it assumes
+		 * there's a sending device, which we may not have.
+		 */
+		if (eth_proto_is_802_3(eth->h_proto))
+			skb->protocol = eth->h_proto;
+		else
+			skb->protocol = htons(ETH_P_802_2);
+	}
+
 	return key_extract(skb, key);
 }
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 898ed377b5cc..b6c8524032a0 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -485,6 +485,25 @@ void ovs_vport_send(struct vport *vport, struct sk_buff *skb, u8 mac_proto)
 {
 	int mtu = vport->dev->mtu;
 
+	switch (vport->dev->type) {
+	case ARPHRD_NONE:
+		if (mac_proto == MAC_PROTO_ETHERNET) {
+			skb_reset_network_header(skb);
+			skb_reset_mac_len(skb);
+			skb->protocol = htons(ETH_P_TEB);
+		} else if (mac_proto != MAC_PROTO_NONE) {
+			WARN_ON_ONCE(1);
+			goto drop;
+		}
+		break;
+	case ARPHRD_ETHER:
+		if (mac_proto != MAC_PROTO_ETHERNET)
+			goto drop;
+		break;
+	default:
+		goto drop;
+	}
+
 	if (unlikely(packet_length(skb, vport->dev) > mtu &&
 		     !skb_is_gso(skb))) {
 		net_warn_ratelimited("%s: dropped over-mtu packet: %d > %d\n",
-- 
1.8.3.1

^ permalink raw reply related

* Re: BUG() can be hit in tcp_collapse()
From: Greg KH @ 2016-11-10 15:34 UTC (permalink / raw)
  To: Vladis Dronov; +Cc: netdev, stable, Marco Grassi
In-Reply-To: <1623420310.11961160.1478789246631.JavaMail.zimbra@redhat.com>

On Thu, Nov 10, 2016 at 09:47:26AM -0500, Vladis Dronov wrote:
> Hello,
> 
> It was discovered by Marco Grassi <marco.gra@gmail.com> (many thanks) that the
> latest stable Linux kernel v4.8.6 is crashing in tcp_collapse() after making
> certain syscalls:
> 
> [    9.622886] kernel BUG at net/ipv4/tcp_input.c:4813!
> [    9.623299] invalid opcode: 0000 [#1] SMP
> [    9.623642] Modules linked in: iptable_nat nf_nat_ipv4 nf_nat
> [    9.624287] CPU: 2 PID: 2871 Comm: poc Not tainted 4.8.6 #2
> [    9.624730] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014
> [    9.625459] task: ffff8801387b9a00 task.stack: ffff8801380e4000
> [    9.625929] RIP: 0010:[<ffffffff8178d4ec>]  [<ffffffff8178d4ec>] tcp_collapse+0x3ac/0x3b0
> [    9.626609] RSP: 0018:ffff8801380e7b78  EFLAGS: 00010282
> [    9.627028] RAX: 00000000fffffff2 RBX: 0000000000000ec0 RCX: 0000000000000ec0
> [    9.627587] RDX: ffff8801365cd000 RSI: 0000000000000000 RDI: ffff8801364106e0
> [    9.628142] RBP: ffff8801380e7bc8 R08: 0000000000000000 R09: ffff88013b003300
> [    9.628704] R10: ffff8801365cd000 R11: 0000000000000000 R12: 0000000000000ec0
> [    9.629259] R13: ffff88013663ae00 R14: 00000000cdf0ca26 R15: ffff8801364106e0
> [    9.629819] FS:  00007f2cef695800(0000) GS:ffff88013fc80000(0000) knlGS:0000000000000000
> [    9.630945] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    9.631655] CR2: 000000002002a000 CR3: 0000000139d46000 CR4: 00000000001406e0
> [    9.632462] Stack:
> [    9.632900]  0000000000000000 cdf0da2600000001 ffff880138050000 ffff8801380500a8
> [    9.634138]  ffff880100000000 ffff880138050688 0000000000000900 ffff8801364136e0
> [    9.635379]  ffff880138050000 ffff880138050688 ffff8801380e7c00 ffffffff8178d630
> [    9.636622] Call Trace:
> [    9.637087]  [<ffffffff8178d630>] tcp_try_rmem_schedule+0x140/0x380
> [    9.637834]  [<ffffffff81791aa8>] tcp_data_queue+0x898/0xcf0
> [    9.638538]  [<ffffffff8179210b>] tcp_rcv_established+0x20b/0x6c0
> [    9.639268]  [<ffffffff81710143>] ? sk_reset_timer+0x13/0x30
> [    9.639968]  [<ffffffff81813009>] tcp_v6_do_rcv+0x1b9/0x420
> [    9.640666]  [<ffffffff81710b02>] __release_sock+0x82/0xf0
> [    9.641353]  [<ffffffff81710b9b>] release_sock+0x2b/0x90
> [    9.642029]  [<ffffffff817890ca>] tcp_sendmsg+0x55a/0xb60
> [    9.642714]  [<ffffffff817b29d0>] inet_sendmsg+0x60/0x90
> [    9.643389]  [<ffffffff8170c7b3>] sock_sendmsg+0x33/0x40
> [    9.644064]  [<ffffffff8170ccee>] SYSC_sendto+0xee/0x160
> [    9.645530]  [<ffffffff8170d6f9>] SyS_sendto+0x9/0x10
> [    9.646190]  [<ffffffff81909df2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
> [    9.646947] Code: 48 c7 07 00 00 00 00 48 89 42 08 48 89 10 e8 cc 7e f8 ff 49 8b 47 30 48 8b 80 80 01 00 00 65 48 ff 80 b0 01 00 00 e9 72 fd ff ff <0f> 0b 66 90 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fe 53 8b 
> [    9.651794] RIP  [<ffffffff8178d4ec>] tcp_collapse+0x3ac/0x3b0
> [    9.652554]  RSP <ffff8801380e7b78>
> 
> The reproducer is generated by the syzkaller, please, see attached. The
> following BUG() is hit:
> 
> [net/ipv4/tcp_input.c]
> static void
> tcp_collapse(struct sock *sk, struct sk_buff_head *list,
>              struct sk_buff *head, struct sk_buff *tail,
>              u32 start, u32 end)
> {
> ...
> /* Copy data, releasing collapsed skbs. */
> while (copy > 0) {
>         int offset = start - TCP_SKB_CB(skb)->seq;
>         int size = TCP_SKB_CB(skb)->end_seq - start;
> 
>         BUG_ON(offset < 0);
>         if (size > 0) {
>                 size = min(copy, size);
> 4812:           if (skb_copy_bits(skb, offset, skb_put(nskb, size), size))
> 4813:                   BUG();
> 
> /usr/src/linux-4.8.6/net/ipv4/tcp_input.c: 4812
> 0xffffffff8178d390 <tcp_collapse+0x250>:        mov    %r12d,%esi
> 0xffffffff8178d393 <tcp_collapse+0x253>:        callq  0xffffffff81713ce0 <skb_put>
> 0xffffffff8178d398 <tcp_collapse+0x258>:        mov    -0x30(%rbp),%r8d
> 0xffffffff8178d39c <tcp_collapse+0x25c>:        mov    %r12d,%ecx
> 0xffffffff8178d39f <tcp_collapse+0x25f>:        mov    %rax,%rdx
> 0xffffffff8178d3a2 <tcp_collapse+0x262>:        mov    %r15,%rdi
> 0xffffffff8178d3a5 <tcp_collapse+0x265>:        mov    %r8d,%esi
> 0xffffffff8178d3a8 <tcp_collapse+0x268>:        callq  0xffffffff81714b90 <skb_copy_bits>
> 0xffffffff8178d3ad <tcp_collapse+0x26d>:        test   %eax,%eax
> 0xffffffff8178d3af <tcp_collapse+0x26f>:        jne    0xffffffff8178d4ec <tcp_collapse+0x3ac>
> ...
> /usr/src/linux-4.8.6/net/ipv4/tcp_input.c: 4813
> 0xffffffff8178d4ec <tcp_collapse+0x3ac>:        ud2    
> 
> I have checked that the reproducer can cause hitting this BUG() in the kernels
> since, at least v4.0. I was not checking the earlier kernels except RHEL-7 ones
> (3.10.0-xxx) which are not vulnerable.
> 
> The upstream kernels since v4.9-rc1 are not vulnerable too and I have bisected
> the repo to the commit c9c3321257 which fixes the issue.
> 
> $ git tag --contain c9c3321257e1b95be9b375f811fb250162af8d39
> v4.9-rc1
> 
> Stable v4.8.6 kernel with the c9c3321257 commit applied does not hit the BUG(),
> so I believe this commit should be backported to the stable branch. This commit
> applies cleanly to the v4.8.6 tree with just line offsets.

I'll be glad to take it if the network maintainer says it is safe to do
so and acks it :)

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH] usbnet: prevent device rpm suspend in usbnet_probe function
From: Alan Stern @ 2016-11-10 15:36 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Bjørn Mork, Oliver Neukum, linux-kernel, linux-usb, netdev
In-Reply-To: <CAAd53p5W4xpSYsFZ00deQDh-gZBijn_ApreE9NmC=_B_MAQaug@mail.gmail.com>

On Thu, 10 Nov 2016, Kai-Heng Feng wrote:

> Hi,
> 
> On Wed, Nov 9, 2016 at 8:32 PM, Bjørn Mork <bjorn@mork.no> wrote:
> > Oliver Neukum <oneukum@suse.com> writes:
> >
> >> On Tue, 2016-11-08 at 13:44 -0500, Alan Stern wrote:
> >>
> >>> These problems could very well be caused by running at SuperSpeed
> >>> (USB-3) instead of high speed (USB-2).
> 
> Yes, it's running at SuperSpeed, on a Kabylake laptop.
> 
> It does not have this issue on a Broadwell laptop, also running at SuperSpeed.
> 
> >>>
> >>> Is there any way to test what happens when the device is attached to
> >>> the computer by a USB-2 cable?  That would prevent it from operating at
> >>> SuperSpeed.
> 
> I recall old Intel PCH can change the USB host from XHCI to EHCI,
> newer PCH does not have this option.
> 
> Is there a way to force XHCI run at HighSpeed?

Yes, like I said above: Use a USB-2 cable instead of a USB-3 cable.

Alan Stern

^ permalink raw reply

* Re: [PATCH net-next resend 10/13] debugfs: constify argument to debugfs_real_fops()
From: Jakub Kicinski @ 2016-11-10 15:42 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: netdev, Nicolai Stange, Christian Lamparter, LKML
In-Reply-To: <20161110142559.GA15902@kroah.com>

On Thu, 10 Nov 2016 15:25:59 +0100, Greg Kroah-Hartman wrote:
> On Thu, Nov 03, 2016 at 05:12:06PM +0000, Jakub Kicinski wrote:
> > seq_file users can only access const version of file pointer,
> > because the ->file member of struct seq_operations is marked
> > as such.  Make parameter to debugfs_real_fops() const.
> > 
> > CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > CC: Nicolai Stange <nicstange@gmail.com>
> > CC: Christian Lamparter <chunkeey@gmail.com>
> > CC: LKML <linux-kernel@vger.kernel.org>
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > ---
> >  include/linux/debugfs.h | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h
> > index 4d3f0d1aec73..bf1907d96097 100644
> > --- a/include/linux/debugfs.h
> > +++ b/include/linux/debugfs.h
> > @@ -52,7 +52,8 @@ struct debugfs_regset32 {
> >   * Must only be called under the protection established by
> >   * debugfs_use_file_start().
> >   */
> > -static inline const struct file_operations *debugfs_real_fops(struct file *filp)
> > +static inline const struct file_operations *
> > +debugfs_real_fops(const struct file *filp)  
> 
> Ick.  Tell me that looks better :(
> 
> Please just don't wrap things like that, just make it go longer than 80
> columns, I can handle the complaints...

Ugh, I should've gone with my gut feeling then.  I'll post a fix up
shortly since this is already in Dave's tree.

^ permalink raw reply

* Re: BUG() can be hit in tcp_collapse()
From: Eric Dumazet @ 2016-11-10 15:44 UTC (permalink / raw)
  To: Vladis Dronov; +Cc: netdev, stable, Marco Grassi
In-Reply-To: <1623420310.11961160.1478789246631.JavaMail.zimbra@redhat.com>

On Thu, 2016-11-10 at 09:47 -0500, Vladis Dronov wrote:
> Hello,
> 
> It was discovered by Marco Grassi <marco.gra@gmail.com> (many thanks) that the
> latest stable Linux kernel v4.8.6 is crashing in tcp_collapse() after making
> certain syscalls:
> 
> [    9.622886] kernel BUG at net/ipv4/tcp_input.c:4813!
> [    9.623299] invalid opcode: 0000 [#1] SMP
> [    9.623642] Modules linked in: iptable_nat nf_nat_ipv4 nf_nat
> [    9.624287] CPU: 2 PID: 2871 Comm: poc Not tainted 4.8.6 #2
> [    9.624730] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014
> [    9.625459] task: ffff8801387b9a00 task.stack: ffff8801380e4000
> [    9.625929] RIP: 0010:[<ffffffff8178d4ec>]  [<ffffffff8178d4ec>] tcp_collapse+0x3ac/0x3b0
> [    9.626609] RSP: 0018:ffff8801380e7b78  EFLAGS: 00010282
> [    9.627028] RAX: 00000000fffffff2 RBX: 0000000000000ec0 RCX: 0000000000000ec0
> [    9.627587] RDX: ffff8801365cd000 RSI: 0000000000000000 RDI: ffff8801364106e0
> [    9.628142] RBP: ffff8801380e7bc8 R08: 0000000000000000 R09: ffff88013b003300
> [    9.628704] R10: ffff8801365cd000 R11: 0000000000000000 R12: 0000000000000ec0
> [    9.629259] R13: ffff88013663ae00 R14: 00000000cdf0ca26 R15: ffff8801364106e0
> [    9.629819] FS:  00007f2cef695800(0000) GS:ffff88013fc80000(0000) knlGS:0000000000000000
> [    9.630945] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    9.631655] CR2: 000000002002a000 CR3: 0000000139d46000 CR4: 00000000001406e0
> [    9.632462] Stack:
> [    9.632900]  0000000000000000 cdf0da2600000001 ffff880138050000 ffff8801380500a8
> [    9.634138]  ffff880100000000 ffff880138050688 0000000000000900 ffff8801364136e0
> [    9.635379]  ffff880138050000 ffff880138050688 ffff8801380e7c00 ffffffff8178d630
> [    9.636622] Call Trace:
> [    9.637087]  [<ffffffff8178d630>] tcp_try_rmem_schedule+0x140/0x380
> [    9.637834]  [<ffffffff81791aa8>] tcp_data_queue+0x898/0xcf0
> [    9.638538]  [<ffffffff8179210b>] tcp_rcv_established+0x20b/0x6c0
> [    9.639268]  [<ffffffff81710143>] ? sk_reset_timer+0x13/0x30
> [    9.639968]  [<ffffffff81813009>] tcp_v6_do_rcv+0x1b9/0x420
> [    9.640666]  [<ffffffff81710b02>] __release_sock+0x82/0xf0
> [    9.641353]  [<ffffffff81710b9b>] release_sock+0x2b/0x90
> [    9.642029]  [<ffffffff817890ca>] tcp_sendmsg+0x55a/0xb60
> [    9.642714]  [<ffffffff817b29d0>] inet_sendmsg+0x60/0x90
> [    9.643389]  [<ffffffff8170c7b3>] sock_sendmsg+0x33/0x40
> [    9.644064]  [<ffffffff8170ccee>] SYSC_sendto+0xee/0x160
> [    9.645530]  [<ffffffff8170d6f9>] SyS_sendto+0x9/0x10
> [    9.646190]  [<ffffffff81909df2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
> [    9.646947] Code: 48 c7 07 00 00 00 00 48 89 42 08 48 89 10 e8 cc 7e f8 ff 49 8b 47 30 48 8b 80 80 01 00 00 65 48 ff 80 b0 01 00 00 e9 72 fd ff ff <0f> 0b 66 90 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fe 53 8b 
> [    9.651794] RIP  [<ffffffff8178d4ec>] tcp_collapse+0x3ac/0x3b0
> [    9.652554]  RSP <ffff8801380e7b78>
> 
> The reproducer is generated by the syzkaller, please, see attached. The
> following BUG() is hit:
> 
> [net/ipv4/tcp_input.c]
> static void
> tcp_collapse(struct sock *sk, struct sk_buff_head *list,
>              struct sk_buff *head, struct sk_buff *tail,
>              u32 start, u32 end)
> {
> ...
> /* Copy data, releasing collapsed skbs. */
> while (copy > 0) {
>         int offset = start - TCP_SKB_CB(skb)->seq;
>         int size = TCP_SKB_CB(skb)->end_seq - start;
> 
>         BUG_ON(offset < 0);
>         if (size > 0) {
>                 size = min(copy, size);
> 4812:           if (skb_copy_bits(skb, offset, skb_put(nskb, size), size))
> 4813:                   BUG();
> 
> /usr/src/linux-4.8.6/net/ipv4/tcp_input.c: 4812
> 0xffffffff8178d390 <tcp_collapse+0x250>:        mov    %r12d,%esi
> 0xffffffff8178d393 <tcp_collapse+0x253>:        callq  0xffffffff81713ce0 <skb_put>
> 0xffffffff8178d398 <tcp_collapse+0x258>:        mov    -0x30(%rbp),%r8d
> 0xffffffff8178d39c <tcp_collapse+0x25c>:        mov    %r12d,%ecx
> 0xffffffff8178d39f <tcp_collapse+0x25f>:        mov    %rax,%rdx
> 0xffffffff8178d3a2 <tcp_collapse+0x262>:        mov    %r15,%rdi
> 0xffffffff8178d3a5 <tcp_collapse+0x265>:        mov    %r8d,%esi
> 0xffffffff8178d3a8 <tcp_collapse+0x268>:        callq  0xffffffff81714b90 <skb_copy_bits>
> 0xffffffff8178d3ad <tcp_collapse+0x26d>:        test   %eax,%eax
> 0xffffffff8178d3af <tcp_collapse+0x26f>:        jne    0xffffffff8178d4ec <tcp_collapse+0x3ac>
> ...
> /usr/src/linux-4.8.6/net/ipv4/tcp_input.c: 4813
> 0xffffffff8178d4ec <tcp_collapse+0x3ac>:        ud2    
> 
> I have checked that the reproducer can cause hitting this BUG() in the kernels
> since, at least v4.0. I was not checking the earlier kernels except RHEL-7 ones
> (3.10.0-xxx) which are not vulnerable.
> 
> The upstream kernels since v4.9-rc1 are not vulnerable too and I have bisected
> the repo to the commit c9c3321257 which fixes the issue.
> 
> $ git tag --contain c9c3321257e1b95be9b375f811fb250162af8d39
> v4.9-rc1
> 
> Stable v4.8.6 kernel with the c9c3321257 commit applied does not hit the BUG(),
> so I believe this commit should be backported to the stable branch. This commit
> applies cleanly to the v4.8.6 tree with just line offsets.
> 
> Meanwhile, I see that commit c9c3321257 just increases(?) an skb buffer(?)
> (which fixes hitting the BUG() for this exact reproducer), but does not fix the
> real reason (so another set of syscalls still may cause hitting the BUG()). This
> is why I'm emailing not only to stable@, but also to netdev@, asking to review the
> data above and probably develop a fix.

I do not believe c9c3321257 is a proper fix for this issue.

It only works around the bug for this specific use case, probably
because of :

+       /* In case all data was pulled from skb frags (in __pskb_pull_tail()),
+        * we can fix skb->truesize to its real value to avoid future drops.
+        * This is valid because skb is not yet charged to the socket.
+        * It has been noticed pure SACK packets were sometimes dropped
+        * (if cooked by drivers without copybreak feature).
+        */
+       if (!skb->data_len)
+               skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));

Meaning that (tcp_win_from_space(skb->truesize) > skb->len) expression
result differs in tcp_collapse() later.

^ permalink raw reply

* Re: [PATCH 2/2] rtl8xxxu: Fix for bogus data used to determine macpower
From: Jes Sorensen @ 2016-11-08 14:29 UTC (permalink / raw)
  To: Dave Jones; +Cc: Joe Perches, John Heenan, Kalle Valo, linux-wireless, netdev
In-Reply-To: <CANf5e8aQ+HZz47M3-4+qjsG=ZCaXhBFU_jVupLqT4rEYT2LQFQ@mail.gmail.com>

Dave Jones <s.dave.jones@gmail.com> writes:
> On Fri, Nov 04, 2016 at 09:56:00AM -0400, Jes Sorensen wrote:
>>
>> Joe Perches <joe@perches.com> writes:
>> > On Sun, 2016-10-30 at 19:02 -0400, Jes Sorensen wrote:
>> >> Code is 80 characters wide, and comments are /* */ never the ugly C++
>> >> crap.
>> >
>> > You might look at the recent Linus Torvalds authored commit
>> > 5e467652ffef (?printk: re-organize log_output() to be more legible")
>> > which does both of those: c99 // comments and > 80 columns.
>> >
>> > Absolutes are for zealots.
>>
>> What Linus does in his code, is totally up to him. What I pull into the
>> driver that *I* maintain, is up to me. It is perfectly normal to expect
>> submitters to respect the coding style of the piece of code they are
>> trying to edit.
>
> Bullshit.  It's perfectly normal to respect Linux coding style described in
> Documentation/CodingStyle.  Now let's back to the topic, could you
> apply John's patch or you just wanna improve your driver is 100% bug free?

First of all, I call for proper CodingStyle to be applied to my driver,
and I expect someone posting a patch to respect the codingstyle of the
driver in question. It is simple respect for the code. If you consider
that BS - that is on you!

Second I am NOT applying that patch as I have stated repeatedly because
I am not convinced it is safe to do so and it changes the code flow for
one type of chip and not the rest. In addition it uses a broken approach
to doing chip specific changes.

In short, the patch is broken!

Jes

^ permalink raw reply

* [PATCH] genetlink: fix unsigned int comparison with less than zero
From: Colin King @ 2016-11-10 15:57 UTC (permalink / raw)
  To: David S . Miller, Johannes Berg, pravin shelar, Wei Yongjun,
	Florian Westphal, Tycho Andersen, WANG Cong, Tom Herbert, netdev
  Cc: linux-kernel

From: Colin Ian King <colin.king@canonical.com>

family->id is unsigned, so the less than zero check for
failure return from idr_alloc is never true and so the error exit
is never handled.  Instead, assign err and check if this is less
than zero since this is a signed integer.

Issue found with static analysis with CoverityScan, CID 1375916

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 net/netlink/genetlink.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index f0b65fe..2ea61ba 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -360,12 +360,10 @@ int genl_register_family(struct genl_family *family)
 	} else
 		family->attrbuf = NULL;
 
-	family->id = idr_alloc(&genl_fam_idr, family,
+	family->id = err = idr_alloc(&genl_fam_idr, family,
 			       start, end + 1, GFP_KERNEL);
-	if (family->id < 0) {
-		err = family->id;
+	if (err < 0)
 		goto errout_free;
-	}
 
 	err = genl_validate_assign_mc_groups(family);
 	if (err)
-- 
2.10.2

^ permalink raw reply related

* [PATCH iproute2 1/1] tc: improved usage help for fw classifier.
From: Roman Mashak @ 2016-11-10 16:02 UTC (permalink / raw)
  To: stephen; +Cc: netdev, jhs, Roman Mashak

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 tc/f_fw.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/tc/f_fw.c b/tc/f_fw.c
index 29c9854..790bef9 100644
--- a/tc/f_fw.c
+++ b/tc/f_fw.c
@@ -25,10 +25,19 @@
 
 static void explain(void)
 {
-	fprintf(stderr, "Usage: ... fw [ classid CLASSID ] [ action ACTION_SPEC ]\n");
-	fprintf(stderr, "       ACTION_SPEC := ... look at individual actions\n");
-	fprintf(stderr, "       CLASSID := X:Y\n");
-	fprintf(stderr, "\nNOTE: CLASSID is parsed as hexadecimal input.\n");
+	fprintf(stderr,
+		"Usage: ... fw [ classid CLASSID ] [ indev DEV ] [ action ACTION_SPEC ]\n");
+	fprintf(stderr,
+		"       CLASSID := Push matching packets to the class identified by CLASSID with format X:Y\n");
+	fprintf(stderr,
+		"                  CLASSID is parsed as hexadecimal input.\n");
+	fprintf(stderr,
+		"       DEV := specify device for incoming device classification.\n");
+	fprintf(stderr,
+		"       ACTION_SPEC := Apply an action on matching packets.\n");
+	fprintf(stderr,
+		"       NOTE: handle is represented as HANDLE[/FWMASK].\n");
+	fprintf(stderr, "             FWMASK is 0xffffffff by default.\n");
 }
 
 static int fw_parse_opt(struct filter_util *qu, char *handle, int argc, char **argv, struct nlmsghdr *n)
-- 
1.9.1

^ permalink raw reply related

* [PATCH net,v2] ipv4: use new_gw for redirect neigh lookup
From: Stephen Suryaputra Lin @ 2016-11-10 16:16 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Suryaputra Lin

In v2.6, ip_rt_redirect() calls arp_bind_neighbour() which returns 0
and then the state of the neigh for the new_gw is checked. If the state
isn't valid then the redirected route is deleted. This behavior is
maintained up to v3.5.7 by check_peer_redirect() because rt->rt_gateway
is assigned to peer->redirect_learned.a4 before calling
ipv4_neigh_lookup().

After commit 5943634fc559 ("ipv4: Maintain redirect and PMTU info in
struct rtable again."), ipv4_neigh_lookup() is performed without the
rt_gateway assigned to the new_gw. In the case when rt_gateway (old_gw)
isn't zero, the function uses it as the key. The neigh is most likely
valid since the old_gw is the one that sends the ICMP redirect message.
Then the new_gw is assigned to fib_nh_exception. The problem is: the
new_gw ARP may never gets resolved and the traffic is blackholed.

So, use the new_gw for neigh lookup.

Changes from v1:
 - use __ipv4_neigh_lookup instead (per Eric Dumazet).

Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
Signed-off-by: Stephen Suryaputra Lin <ssurya@ieee.org>
---
 net/ipv4/route.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 62d4d90c1389..2a57566e6e91 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -753,7 +753,9 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
 			goto reject_redirect;
 	}

-	n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw);
+	n = __ipv4_neigh_lookup(rt->dst.dev, new_gw);
+	if (!n)
+		n = neigh_create(&arp_tbl, &new_gw, rt->dst.dev);
 	if (!IS_ERR(n)) {
 		if (!(n->nud_state & NUD_VALID)) {
 			neigh_event_send(n, NULL);
-- 
2.7.4

^ permalink raw reply related

* Re: AF_VSOCK loopback
From: Cathy Avery @ 2016-11-10 16:17 UTC (permalink / raw)
  To: Stefan Hajnoczi, Jorgen Hansen; +Cc: netdev
In-Reply-To: <20161110144329.GA19683@stefanha-x1.localdomain>

I had trouble with loopback on the esx host. Using the nc-vsock ( 
AF_VSOCK 56 ) the server and the client connected but they both 
terminated without error when I attempted to send characters over. It 
might be due to something other than vsock. I haven't pursued it yet.

On 11/10/2016 09:43 AM, Stefan Hajnoczi wrote:
> Hi Jorgen,
> Cathy Avery found that the AF_VSOCK VMCI transport does loopback inside
> the guest (but not on the host?).  The virtio transport currently does
> no loopback.
>
> The loopback scenario I'm thinking of is where process A listens on port
> 1234 and process B on the same machine connects to port 1234 both with
> the same CID.
>
> I'd like to make the virtio transport compatible with VMCI transport
> semantics so AF_VSOCK behaves the same regardless of the transport.
> This means loopback must be added to virtio-vsock.
>
> The core net/vmware/af_vsock.c code does not implement loopback.  How
> does VMCI do loopback?  Are the loopback packets reflected back from the
> host?  Or does the guest driver notice the loopback and avoid passing
> packets to the host in the first place?
>
> Maybe we can make the loopback code common in af_vsock.c if that avoids
> code duplication.
>
> Thanks,
> Stefan

^ permalink raw reply

* Re: [PATCH] net: ethernet: ti: davinci_cpdma: fix fixed prio cpdma ctlr configuration
From: Grygorii Strashko @ 2016-11-10 16:37 UTC (permalink / raw)
  To: Ivan Khoronzhuk, mugunthanvnm, netdev; +Cc: linux-omap, linux-kernel
In-Reply-To: <592966c0-ae4c-ba6b-43ba-8ca837e9a700@linaro.org>



On 11/09/2016 05:56 PM, Ivan Khoronzhuk wrote:
> 
> 
> On 09.11.16 23:09, Grygorii Strashko wrote:
>>
>>
>> On 11/08/2016 07:10 AM, Ivan Khoronzhuk wrote:
>>> The dma ctlr is reseted to 0 while cpdma start, thus cpdma ctlr
>>
>> I assume this is because cpdma_ctlr_start() does soft reset. Is it
>> correct?
> Probably not. I've seen this register doesn't hold any previous settings
> (just trash)

What register CPDMA_DMACONTROL or CPSW_DMACTRL?

> after cpdma_ctlr_stop(), actually after last channel is stopped (inside
> of cpdma_ctlr_stop()).

So, You are stating that Registers context is changed after stop?

> Then cpdma_ctlr_start() just reset it to 0.

"just trash" or "0".

Sry, I do not see how cpdma_ctlr_stop() can affect on registers state :(
and I'd very appreciated if can provide more detailed information.

> 
>>
>>> cannot be configured after cpdma is stopped. So, restore content
>>> of cpdma ctlr while off/on procedure.
>>>
>>> Based on net-next/master
>>
>> ^ remove it
> sure
> 
>>
>>>
>>> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
>>> ---
>>>  drivers/net/ethernet/ti/cpsw.c          |   6 +-
>>>  drivers/net/ethernet/ti/davinci_cpdma.c | 103
>>> +++++++++++++++++---------------
>>>  drivers/net/ethernet/ti/davinci_cpdma.h |   2 +
>>>  3 files changed, 58 insertions(+), 53 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/ti/cpsw.c
>>> b/drivers/net/ethernet/ti/cpsw.c
>>> index b1ddf89..4d04b8e 100644
>>> --- a/drivers/net/ethernet/ti/cpsw.c
>>> +++ b/drivers/net/ethernet/ti/cpsw.c
>>> @@ -1376,10 +1376,6 @@ static int cpsw_ndo_open(struct net_device *ndev)
>>>                    ALE_ALL_PORTS, ALE_ALL_PORTS, 0, 0);
>>>
>>>      if (!cpsw_common_res_usage_state(cpsw)) {
>>> -        /* setup tx dma to fixed prio and zero offset */
>>> -        cpdma_control_set(cpsw->dma, CPDMA_TX_PRIO_FIXED, 1);
>>> -        cpdma_control_set(cpsw->dma, CPDMA_RX_BUFFER_OFFSET, 0);
>>> -
>>>          /* disable priority elevation */
>>>          __raw_writel(0, &cpsw->regs->ptype);
>>>
>>> @@ -2710,6 +2706,8 @@ static int cpsw_probe(struct platform_device
>>> *pdev)
>>>      dma_params.desc_align        = 16;
>>>      dma_params.has_ext_regs        = true;
>>>      dma_params.desc_hw_addr         = dma_params.desc_mem_phys;
>>> +    dma_params.rxbuf_offset        = 0;
>>> +    dma_params.fixed_prio        = 1;
>>
>> Do we really need this new parameters? Do you have plans to use other
>> values?
> I'm ok if this is static (equally as a bunch of rest in dma_params), no
> see reason to use other values,

That's what i wanted to know :) - go static, pls.

> it at least now. But the issue is not only in these two parameters and
> not only in cpsw_ndo_open().
> It touches cpsw_set_channels() also, where ctlr stop/start is present.
> In order to not copy cpdma_control_set(cpsw->dma, CPDMA_TX_PRIO_FIXED,
> 1)...
> in all such kind places in eth drivers, better to allow the cpdma to
> control it's parameters...
> 
> The cpdma ctlr register holds a little more parameters (but only two of
> them are set in cpsw)
> Maybe there is reason to save them also. Actually I'd seen this problem
> when playing with
> on/off channel shapers, unfortunately the cpdma ctlr holds this info
> also, and it was lost
> while on/off (but I'm going to restore it in chan_start()).
> 

I understand you change, but I'm note sure about real root cause :(

-- 
regards,
-grygorii

^ permalink raw reply

* Re: [PATCH net-next 00/17] Bug fixes & Code improvements in HNS driver
From: David Miller @ 2016-11-10 16:46 UTC (permalink / raw)
  To: salil.mehta; +Cc: yisen.zhuang, mehta.salil.lnk, netdev, linux-kernel, linuxarm
In-Reply-To: <20161109181401.913728-1-salil.mehta@huawei.com>

From: Salil Mehta <salil.mehta@huawei.com>
Date: Wed, 9 Nov 2016 18:13:44 +0000

> This patch-set introduces some bug fixes and code improvements.
> These have been identified during internal review or testing of
> the driver by internal Hisilicon teams.

Series applied, thanks.

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH net] igb: re-assign hw address pointer on reset after PCI error
From: Guilherme G. Piccoli @ 2016-11-10 16:48 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: intel-wired-lan, Netdev, Fujinaka, Todd
In-Reply-To: <CAKgT0UdYrO14ytSgPfa6KgC0OPaydqtd2-j2+AQFgJMgz69d+w@mail.gmail.com>

On 11/09/2016 03:12 PM, Alexander Duyck wrote:
> On Mon, Oct 31, 2016 at 1:12 PM, Guilherme G. Piccoli
> <gpiccoli@linux.vnet.ibm.com> wrote:
>> Whenever the igb driver detects the result of a read operation returns
>> a value composed only by F's (like 0xFFFFFFFF), it will detach the
>> net_device, clear the hw_addr pointer and warn to the user that adapter's
>> link is lost - those steps happen on igb_rd32().
>>
>> In case a PCI error happens on Power architecture, there's a recovery
>> mechanism called EEH, that will reset the PCI slot and call driver's
>> handlers to reset the adapter and network functionality as well.
>>
>> We observed that once hw_addr is NULL after the error is detected on
>> igb_rd32(), it's never assigned back, so in the process of resetting
>> the network functionality we got a NULL pointer dereference in both
>> igb_configure_tx_ring() and igb_configure_rx_ring(). In order to avoid
>> such bug, we re-assign the hw_addr value in the beginning of the
>> function igb_reset(), in case the hw_addr is NULL when we reach that
>> path.
>>
>> Reported-by: Anthony H. Thai <ahthai@us.ibm.com>
>> Reported-by: Harsha Thyagaraja <hathyaga@in.ibm.com>
>> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
>> ---
>>  drivers/net/ethernet/intel/igb/igb_main.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
>> index edc9a6a..c19119c 100644
>> --- a/drivers/net/ethernet/intel/igb/igb_main.c
>> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
>> @@ -1873,6 +1873,13 @@ void igb_reset(struct igb_adapter *adapter)
>>         struct e1000_fc_info *fc = &hw->fc;
>>         u32 pba, hwm;
>>
>> +       /* In case of PCI error, adapter might have lost its HW
>> +        * address; if we reached this point after an error scenario,
>> +        * we should re-assign the hw_addr based on the saved io_addr.
>> +        */
>> +       if (!hw->hw_addr)
>> +               hw->hw_addr = adapter->io_addr;
>> +
>>         /* Repartition Pba for greater than 9k mtu
>>          * To take effect CTRL.RST is required.
>>          */
> 
> It seems like this would have the potential to get noisy pretty
> quickly since every reset would retrigger this.
> 
> It might make more sense to move this line into igb_io_slot_reset and
> igb_resume where the device would have gone through a reset of some
> sort and then had the state of the device restored and the device
> memory access was re-enabled via pci_enable_device_mem.  Also there is
> no point in really doing the "if" since you should always be okay to
> overwrite hw->hw_addr with adapter->io_addr.

Thanks for the good suggestion Alexander, will send a V2.
Cheers,

Guilherme


> 
> Thanks.
> 
> - Alex
> 

^ permalink raw reply

* Re: [PATCH 1/2] net: mvpp2: don't bring up on MAC address set
From: David Miller @ 2016-11-10 16:57 UTC (permalink / raw)
  To: baruch; +Cc: mw, netdev, thomas.petazzoni, gregory.clement
In-Reply-To: <ff17831771f3575f351c134703d3f153485b01c0.1478696194.git.baruch@tkos.co.il>

From: Baruch Siach <baruch@tkos.co.il>
Date: Wed,  9 Nov 2016 14:56:33 +0200

> Current .ndo_set_mac_address implementation brings up the interface when revert
> to original address after failure succeeds. Fix this.
> 
> Signed-off-by: Baruch Siach <baruch@tkos.co.il>
> ---
> Untested; I don't have the hardware.

The code which updates the parser should keep the existing
state if any part of the update fails.

This means it must attempt all memory allocations and whatever other
resource acquisition is necessary, and only after all of those
operations succeed and no more error cases are possible should it
update the tables and release the old entry.

In other worse, this whole mechanism must move to a proper "prepare
--> commit" model of making changes.

^ permalink raw reply

* Re: [PATCH net-next v5]] cadence: Add LSO support.
From: David Miller @ 2016-11-10 17:01 UTC (permalink / raw)
  To: rafalo; +Cc: nicolas.ferre, netdev, linux-kernel
In-Reply-To: <1478698862-12924-1-git-send-email-rafalo@cadence.com>

From: Rafal Ozieblo <rafalo@cadence.com>
Date: Wed, 9 Nov 2016 13:41:02 +0000

First, please remove the spurious closing bracket in your Subject line
in future submittions.

> +	if (is_udp) /* is_udp is only set when (is_lso) is checked */
> +		/* zero UDP checksum, not calculated by h/w for UFO */
> +		udp_hdr(skb)->check = 0;

This is really not ok.

If UFO is in use it should not silently disable UDP checksums.

If you cannot support UFO with proper checksumming, then you cannot
enable support for that feature.

^ permalink raw reply

* Re: [PATCH net v2] ipv4: update comment to document GSO fragmentation cases.
From: David Miller @ 2016-11-10 17:02 UTC (permalink / raw)
  To: lrichard; +Cc: netdev
In-Reply-To: <1478721879-11033-1-git-send-email-lrichard@redhat.com>

From: Lance Richardson <lrichard@redhat.com>
Date: Wed,  9 Nov 2016 15:04:39 -0500

> This is a follow-up to commit 9ee6c5dc816a ("ipv4: allow local
> fragmentation in ip_finish_output_gso()"), updating the comment
> documenting cases in which fragmentation is needed for egress
> GSO packets.
> 
> Suggested-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> Signed-off-by: Lance Richardson <lrichard@redhat.com>
> ---
> v2: corrected commit ID (v1 used local commit ID by mistake)

Applied.

^ permalink raw reply

* Re: [PATCH] genetlink: fix unsigned int comparison with less than zero
From: Cong Wang @ 2016-11-10 17:11 UTC (permalink / raw)
  To: Colin King
  Cc: David S . Miller, Johannes Berg, pravin shelar, Wei Yongjun,
	Florian Westphal, Tycho Andersen, Tom Herbert,
	Linux Kernel Network Developers, LKML
In-Reply-To: <20161110155758.26996-1-colin.king@canonical.com>

On Thu, Nov 10, 2016 at 7:57 AM, Colin King <colin.king@canonical.com> wrote:
> From: Colin Ian King <colin.king@canonical.com>
>
> family->id is unsigned, so the less than zero check for
> failure return from idr_alloc is never true and so the error exit
> is never handled.  Instead, assign err and check if this is less
> than zero since this is a signed integer.

Why family->id can't be just signed int? For me it should be.

^ permalink raw reply

* Re: [PATCH net-next V5 3/9] liquidio CN23XX: Mailbox support
From: David Miller @ 2016-11-10 17:19 UTC (permalink / raw)
  To: rvatsavayi
  Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
	felix.manlunas
In-Reply-To: <1478733232-15739-4-git-send-email-rvatsavayi@caviumnetworks.com>

From: Raghu Vatsavayi <rvatsavayi@caviumnetworks.com>
Date: Wed, 9 Nov 2016 15:13:46 -0800

> +		dev_info(&oct->pci_dev->dev,
> +			 "PF changed VF's MAC address to %02hhx:%02hhx:%02hhx:%02hhx:%02hhx:%02hhx\n",
> +			 mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);

We have a standard mechanism to print network addresses, including
MAC ones, via the %pXXX format specifier.  In this case you should
use "%pM" or "%pm" depending upon whether you want the address printed
with colon separators or not.

^ permalink raw reply

* [PATCH] debugfs: improve formatting of debugfs_real_fops()
From: Jakub Kicinski @ 2016-11-10 17:23 UTC (permalink / raw)
  To: netdev
  Cc: Greg Kroah-Hartman, Nicolai Stange, Christian Lamparter,
	Jakub Kicinski

Type of debugfs_real_fops() is longer than parameters and
the name, so there is no way to break the declaration nicely.
We have to go over 80 characters.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/linux/debugfs.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h
index bf1907d96097..374c6c67e6c0 100644
--- a/include/linux/debugfs.h
+++ b/include/linux/debugfs.h
@@ -52,8 +52,7 @@ struct debugfs_regset32 {
  * Must only be called under the protection established by
  * debugfs_use_file_start().
  */
-static inline const struct file_operations *
-debugfs_real_fops(const struct file *filp)
+static inline const struct file_operations *debugfs_real_fops(const struct file *filp)
 	__must_hold(&debugfs_srcu)
 {
 	/*
-- 
1.9.1

^ permalink raw reply related

* Re: [Intel-wired-lan] [PATCH] igb: use igb_adapter->io_addr instead of e1000_hw->hw_addr
From: Corinna Vinschen @ 2016-11-10 17:28 UTC (permalink / raw)
  To: Hisashi T Fujinaka
  Cc: Alexander Duyck, Netdev, linux-kernel@vger.kernel.org, Cao jin,
	intel-wired-lan, Izumi, Taku/泉 拓
In-Reply-To: <alpine.NEB.2.20.17.1611100544220.6177@chris.i8u.org>

[-- Attachment #1: Type: text/plain, Size: 975 bytes --]

On Nov 10 05:48, Hisashi T Fujinaka wrote:
> On Thu, 10 Nov 2016, Corinna Vinschen wrote:
> > On Nov  8 11:33, Alexander Duyck wrote:
> ...
> > > The question I would have is what is reading the device when it is in
> > > this state.  The watchdog and any other functions that would read the
> > > device should be disabled.
> > > 
> > > One possibility could be a race between a call to igb_close and the
> > > igb_suspend function.  We have seen some of those pop up recently on
> > > ixgbe and it looks like igb has the same bug.  We should probably be
> > > using the rtnl_lock to guarantee that netif_device_detach and the call
> > > to __igb_close are completed before igb_close could possibly be called
> > > by the network stack.
> > 
> > Do you have a pointer to the related ixgbe patch, by any chance?
> ...
> Here's the initial patch for igb I have, but it's on hold awaiting more
> changes in ixgbe regarding AER.

Thanks a lot!


Corinna

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* [mm PATCH v3 23/23] igb: Update code to better handle incrementing page count
From: Alexander Duyck @ 2016-11-10 11:36 UTC (permalink / raw)
  To: linux-mm, akpm; +Cc: netdev, Jeff Kirsher, linux-kernel
In-Reply-To: <20161110113027.76501.63030.stgit@ahduyck-blue-test.jf.intel.com>

This patch updates the driver code so that we do bulk updates of the page
reference count instead of just incrementing it by one reference at a time.
The advantage to doing this is that we cut down on atomic operations and
this in turn should give us a slight improvement in cycles per packet.  In
addition if we eventually move this over to using build_skb the gains will
be more noticeable.

Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h      |    7 ++++++-
 drivers/net/ethernet/intel/igb/igb_main.c |   24 +++++++++++++++++-------
 2 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 5387b3a..786de01 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -210,7 +210,12 @@ struct igb_tx_buffer {
 struct igb_rx_buffer {
 	dma_addr_t dma;
 	struct page *page;
-	unsigned int page_offset;
+#if (BITS_PER_LONG > 32) || (PAGE_SIZE >= 65536)
+	__u32 page_offset;
+#else
+	__u16 page_offset;
+#endif
+	__u16 pagecnt_bias;
 };
 
 struct igb_tx_queue_stats {
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index ba97392..f5a9fd6 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3937,7 +3937,8 @@ static void igb_clean_rx_ring(struct igb_ring *rx_ring)
 				     PAGE_SIZE,
 				     DMA_FROM_DEVICE,
 				     DMA_ATTR_SKIP_CPU_SYNC);
-		__free_page(buffer_info->page);
+		__page_frag_drain(buffer_info->page, 0,
+				  buffer_info->pagecnt_bias);
 
 		buffer_info->page = NULL;
 	}
@@ -6813,13 +6814,15 @@ static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer,
 				  struct page *page,
 				  unsigned int truesize)
 {
+	unsigned int pagecnt_bias = rx_buffer->pagecnt_bias--;
+
 	/* avoid re-using remote pages */
 	if (unlikely(igb_page_is_reserved(page)))
 		return false;
 
 #if (PAGE_SIZE < 8192)
 	/* if we are only owner of page we can reuse it */
-	if (unlikely(page_count(page) != 1))
+	if (unlikely(page_ref_count(page) != pagecnt_bias))
 		return false;
 
 	/* flip page offset to other buffer */
@@ -6832,10 +6835,14 @@ static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer,
 		return false;
 #endif
 
-	/* Even if we own the page, we are not allowed to use atomic_set()
-	 * This would break get_page_unless_zero() users.
+	/* If we have drained the page fragment pool we need to update
+	 * the pagecnt_bias and page count so that we fully restock the
+	 * number of references the driver holds.
 	 */
-	page_ref_inc(page);
+	if (unlikely(pagecnt_bias == 1)) {
+		page_ref_add(page, USHRT_MAX);
+		rx_buffer->pagecnt_bias = USHRT_MAX;
+	}
 
 	return true;
 }
@@ -6887,7 +6894,6 @@ static bool igb_add_rx_frag(struct igb_ring *rx_ring,
 			return true;
 
 		/* this page cannot be reused so discard it */
-		__free_page(page);
 		return false;
 	}
 
@@ -6958,10 +6964,13 @@ static struct sk_buff *igb_fetch_rx_buffer(struct igb_ring *rx_ring,
 		/* hand second half of page back to the ring */
 		igb_reuse_rx_page(rx_ring, rx_buffer);
 	} else {
-		/* we are not reusing the buffer so unmap it */
+		/* We are not reusing the buffer so unmap it and free
+		 * any references we are holding to it
+		 */
 		dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma,
 				     PAGE_SIZE, DMA_FROM_DEVICE,
 				     DMA_ATTR_SKIP_CPU_SYNC);
+		__page_frag_drain(page, 0, rx_buffer->pagecnt_bias);
 	}
 
 	/* clear contents of rx_buffer */
@@ -7235,6 +7244,7 @@ static bool igb_alloc_mapped_page(struct igb_ring *rx_ring,
 	bi->dma = dma;
 	bi->page = page;
 	bi->page_offset = 0;
+	bi->pagecnt_bias = 1;
 
 	return true;
 }

^ permalink raw reply related

* Re: Long delays creating a netns after deleting one (possibly RCU related)
From: Cong Wang @ 2016-11-10 17:37 UTC (permalink / raw)
  To: Rolf Neugebauer
  Cc: LKML, Linux Kernel Network Developers, Justin Cormack,
	Ian Campbell, Paul E. McKenney
In-Reply-To: <CA+pO-2f+7QfsLiFwvGcNqTrQ0t8B5+w0vZHFrshWsSWgDE6iUg@mail.gmail.com>

(Cc'ing Paul)

On Wed, Nov 9, 2016 at 7:42 AM, Rolf Neugebauer
<rolf.neugebauer@docker.com> wrote:
> Hi
>
> We noticed some long delays starting docker containers on some newer
> kernels (starting with 4.5.x and still present in 4.9-rc4, 4.4.x is
> fine). We narrowed this down to the creation of a network namespace
> being delayed directly after removing another one (details and
> reproduction below). We have seen delays of up to 60s on some systems.
>
> - The delay is proportional to the number of CPUs (online or offline).
> We first discovered it with a Hyper-V Linux VM. Hyper-V advertises up
> to 240 offline vCPUs even if one configures the VM with only, say 2
> vCPUs. We see linear increase in delay when we change NR_CPUS in the
> kernel config.
>
> - The delay is also dependent on some tunnel network interfaces being
> present (which we had compiled in in one of our kernel configs).
>
> - We can reproduce this issue with stock kernels from
> http://kernel.ubuntu.com/~kernel-ppa/mainline/running in Hyper-V VMs
> as well as other hypervisors like qemu and hyperkit where we have good
> control over the number of CPUs.
>
> A simple test is:
> modprobe ipip
> moprobe  ip_gre
> modprobe ip_vti
> echo -n "add netns foo ===> "; /usr/bin/time -f "%E" ip netns add foo
> echo -n "del netns foo ===> "; /usr/bin/time -f "%E" ip netns delete foo
> echo -n "add netns bar ===> "; /usr/bin/time -f "%E" ip netns add bar
> echo -n "del netns bar ===> "; /usr/bin/time -f "%E" ip netns delete bar
>
> with an output like:
> add netns foo ===> 0:00.00
> del netns foo ===> 0:00.01
> add netns bar ===> 0:08.53
> del netns bar ===> 0:00.01
>
> This is on a 4.9-rc4 kernel from the above URL configured with
> NR_CPUS=256 running in a Hyper-V VM (kernel config attached).
>
> Below is a dump of the work queues while the second 'ip add netns' is
> hanging. The state of the work queues does not seem to change while
> the command is delayed and the pattern shown is consistent across
> different kernel versions.
>
> Is this a known issue and/or is someone working on a fix?

Not to me.


>
> [  610.356272] sysrq: SysRq : Show Blocked State
> [  610.356742]   task                        PC stack   pid father
> [  610.357252] kworker/u480:1  D    0  1994      2 0x00000000
> [  610.357752] Workqueue: netns cleanup_net
> [  610.358239]  ffff9892f1065800 0000000000000000 ffff9892ee1e1e00
> ffff9892f8e59340
> [  610.358705]  ffff9892f4526900 ffffbf0104b5ba88 ffffffffbe486df3
> ffffbf0104b5ba60
> [  610.359168]  00ffffffbdcbe663 ffff9892f8e59340 0000000100012e70
> ffff9892ee1e1e00
> [  610.359677] Call Trace:
> [  610.360169]  [<ffffffffbe486df3>] ? __schedule+0x233/0x6e0
> [  610.360723]  [<ffffffffbe4872d6>] schedule+0x36/0x80
> [  610.361194]  [<ffffffffbe48a9ca>] schedule_timeout+0x22a/0x3f0
> [  610.361789]  [<ffffffffbe486dfb>] ? __schedule+0x23b/0x6e0
> [  610.362260]  [<ffffffffbe487d24>] wait_for_completion+0xb4/0x140
> [  610.362736]  [<ffffffffbdcb05a0>] ? wake_up_q+0x80/0x80
> [  610.363306]  [<ffffffffbdceb528>] __wait_rcu_gp+0xc8/0xf0
> [  610.363782]  [<ffffffffbdceea5c>] synchronize_sched+0x5c/0x80
> [  610.364137]  [<ffffffffbdcf0010>] ? call_rcu_bh+0x20/0x20
> [  610.364742]  [<ffffffffbdceb440>] ?
> trace_raw_output_rcu_utilization+0x60/0x60
> [  610.365337]  [<ffffffffbe3696bc>] synchronize_net+0x1c/0x30

This is a worker which holds the net_mutex and is waiting for
a RCU grace period to elapse.


> [  610.365846]  [<ffffffffbe369803>] netif_napi_del+0x23/0x80
> [  610.367494]  [<ffffffffc057f6f8>] ip_tunnel_dev_free+0x68/0xf0 [ip_tunnel]
> [  610.368007]  [<ffffffffbe372c10>] netdev_run_todo+0x230/0x330
> [  610.368454]  [<ffffffffbe37eb4e>] rtnl_unlock+0xe/0x10
> [  610.369001]  [<ffffffffc057f4df>] ip_tunnel_delete_net+0xdf/0x120 [ip_tunnel]
> [  610.369500]  [<ffffffffc058b92c>] ipip_exit_net+0x2c/0x30 [ipip]
> [  610.369997]  [<ffffffffbe362688>] ops_exit_list.isra.4+0x38/0x60
> [  610.370636]  [<ffffffffbe363674>] cleanup_net+0x1c4/0x2b0
> [  610.371130]  [<ffffffffbdc9e4ac>] process_one_work+0x1fc/0x4b0
> [  610.371812]  [<ffffffffbdc9e7ab>] worker_thread+0x4b/0x500
> [  610.373074]  [<ffffffffbdc9e760>] ? process_one_work+0x4b0/0x4b0
> [  610.373622]  [<ffffffffbdc9e760>] ? process_one_work+0x4b0/0x4b0
> [  610.374100]  [<ffffffffbdca4b09>] kthread+0xd9/0xf0
> [  610.374574]  [<ffffffffbdca4a30>] ? kthread_park+0x60/0x60
> [  610.375198]  [<ffffffffbe48c2b5>] ret_from_fork+0x25/0x30
> [  610.375678] ip              D    0  2149   2148 0x00000000
> [  610.376185]  ffff9892f0a99000 0000000000000000 ffff9892f0a66900
> [  610.376185]  ffff9892f8e59340
> [  610.376185]  ffff9892f4526900 ffffbf0101173db8 ffffffffbe486df3
> [  610.376753]  00000005fecffd76
> [  610.376762]  00ff9892f11d9820 ffff9892f8e59340 ffff989200000000
> ffff9892f0a66900
> [  610.377274] Call Trace:
> [  610.377789]  [<ffffffffbe486df3>] ? __schedule+0x233/0x6e0
> [  610.378306]  [<ffffffffbe4872d6>] schedule+0x36/0x80
> [  610.378992]  [<ffffffffbe48756e>] schedule_preempt_disabled+0xe/0x10
> [  610.379514]  [<ffffffffbe489199>] __mutex_lock_slowpath+0xb9/0x130
> [  610.380031]  [<ffffffffbde0fce2>] ? __kmalloc+0x162/0x1e0
> [  610.380556]  [<ffffffffbe48922f>] mutex_lock+0x1f/0x30
> [  610.381135]  [<ffffffffbe3637ff>] copy_net_ns+0x9f/0x170
> [  610.381647]  [<ffffffffbdca5e6b>] create_new_namespaces+0x11b/0x200
> [  610.382249]  [<ffffffffbdca60fa>] unshare_nsproxy_namespaces+0x5a/0xb0
> [  610.382818]  [<ffffffffbdc82dcd>] SyS_unshare+0x1cd/0x360
> [  610.383319]  [<ffffffffbe48c03b>] entry_SYSCALL_64_fastpath+0x1e/0xad

This process is apparently waiting for the net_mutex held by the previous one.

Either RCU implementation is broken or something else is missing.
Do you have more stack traces of related processes? For example,
rcu_tasks_kthread. And if anything you can help to narrow down the problem,
it would be great.

Thanks.

^ permalink raw reply

* Re: net/l2tp: use-after-free write in l2tp_ip6_close
From: Guillaume Nault @ 2016-11-10 17:44 UTC (permalink / raw)
  To: Andrey Konovalov
  Cc: David S. Miller, Eric Dumazet, Willem de Bruijn,
	Hannes Frederic Sowa, Soheil Hassas Yeganeh, Shmulik Ladkani,
	Wei Wang, Haishuang Yan, netdev, LKML, Dmitry Vyukov,
	Kostya Serebryany, Alexander Potapenko, syzkaller
In-Reply-To: <CAAeHK+wThP9rwA7ecbpTPLaj7BR6+qwTy_N=Zh3Jf+AsO_Njbw@mail.gmail.com>

On Mon, Nov 07, 2016 at 11:35:26PM +0100, Andrey Konovalov wrote:
> Hi,
> 
> I've got the following error report while running the syzkaller fuzzer:
> 
> ==================================================================
> BUG: KASAN: use-after-free in l2tp_ip6_close+0x239/0x2a0 at addr
> ffff8800677276d8
> Write of size 8 by task a.out/8668
> CPU: 0 PID: 8668 Comm: a.out Not tainted 4.9.0-rc4+ #354
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  ffff8800694d7b00 ffffffff81b46a64 ffff88006adb5780 ffff8800677276c0
>  ffff880067727c68 ffff8800677276c0 ffff8800694d7b28 ffffffff8150a86c
>  ffff8800694d7bb8 ffff88006adb5780 ffff8800e77276d8 ffff8800694d7ba8
> Call Trace:
>  [<     inline     >] __dump_stack lib/dump_stack.c:15
>  [<ffffffff81b46a64>] dump_stack+0xb3/0x10f lib/dump_stack.c:51
>  [<ffffffff8150a86c>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156
>  [<     inline     >] print_address_description mm/kasan/report.c:194
>  [<ffffffff8150ab07>] kasan_report_error+0x1f7/0x4d0 mm/kasan/report.c:283
>  [<     inline     >] kasan_report mm/kasan/report.c:303
>  [<ffffffff8150b01e>] __asan_report_store8_noabort+0x3e/0x40
> mm/kasan/report.c:329
>  [<     inline     >] __write_once_size ./include/linux/compiler.h:272
>  [<     inline     >] __hlist_del ./include/linux/list.h:622
>  [<     inline     >] hlist_del_init ./include/linux/list.h:637
>  [<ffffffff83825f49>] l2tp_ip6_close+0x239/0x2a0 net/l2tp/l2tp_ip6.c:239
>  [<ffffffff8316b31f>] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415
>  [<ffffffff832cd4d0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
>  [<ffffffff82b6d89e>] sock_release+0x8e/0x1d0 net/socket.c:570
>  [<ffffffff82b6d9f6>] sock_close+0x16/0x20 net/socket.c:1017
>  [<ffffffff81524bdd>] __fput+0x29d/0x720 fs/file_table.c:208
>  [<ffffffff815250e5>] ____fput+0x15/0x20 fs/file_table.c:244
>  [<ffffffff81172928>] task_work_run+0xf8/0x170 kernel/task_work.c:116
>  [<     inline     >] exit_task_work ./include/linux/task_work.h:21
>  [<ffffffff8111bda3>] do_exit+0x883/0x2ac0 kernel/exit.c:828
>  [<ffffffff8112234e>] do_group_exit+0x10e/0x340 kernel/exit.c:931
>  [<     inline     >] SYSC_exit_group kernel/exit.c:942
>  [<ffffffff8112259d>] SyS_exit_group+0x1d/0x20 kernel/exit.c:940
>  [<ffffffff83fc1501>] entry_SYSCALL_64_fastpath+0x1f/0xc2
> arch/x86/entry/entry_64.S:209
> Object at ffff8800677276c0, in cache L2TP/IPv6 size: 1448
> Allocated:
> PID = 8692
> [<ffffffff8107e236>] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
> [<ffffffff81509bd6>] save_stack+0x46/0xd0 mm/kasan/kasan.c:495
> [<     inline     >] set_track mm/kasan/kasan.c:507
> [<ffffffff81509e4b>] kasan_kmalloc+0xab/0xe0 mm/kasan/kasan.c:598
> [<ffffffff8150a3b2>] kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:537
> [<     inline     >] slab_post_alloc_hook mm/slab.h:417
> [<     inline     >] slab_alloc_node mm/slub.c:2708
> [<     inline     >] slab_alloc mm/slub.c:2716
> [<ffffffff81505064>] kmem_cache_alloc+0xb4/0x270 mm/slub.c:2721
> [<ffffffff82b77ca9>] sk_prot_alloc+0x69/0x2b0 net/core/sock.c:1327
> [<ffffffff82b80898>] sk_alloc+0x38/0xaf0 net/core/sock.c:1389
> [<ffffffff832cef05>] inet6_create+0x2e5/0xf60 net/ipv6/af_inet6.c:182
> [<ffffffff82b7301f>] __sock_create+0x37f/0x640 net/socket.c:1153
> [<     inline     >] sock_create net/socket.c:1193
> [<     inline     >] SYSC_socket net/socket.c:1223
> [<ffffffff82b73510>] SyS_socket+0xf0/0x1b0 net/socket.c:1203
> [<ffffffff83fc1501>] entry_SYSCALL_64_fastpath+0x1f/0xc2
> arch/x86/entry/entry_64.S:209
> Freed:
> PID = 8668
> [<ffffffff8107e236>] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
> [<ffffffff81509bd6>] save_stack+0x46/0xd0 mm/kasan/kasan.c:495
> [<     inline     >] set_track mm/kasan/kasan.c:507
> [<ffffffff8150a433>] kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:571
> [<     inline     >] slab_free_hook mm/slub.c:1352
> [<     inline     >] slab_free_freelist_hook mm/slub.c:1374
> [<     inline     >] slab_free mm/slub.c:2951
> [<ffffffff81506263>] kmem_cache_free+0xb3/0x2c0 mm/slub.c:2973
> [<     inline     >] sk_prot_free net/core/sock.c:1370
> [<ffffffff82b7c669>] __sk_destruct+0x319/0x480 net/core/sock.c:1445
> [<ffffffff82b82b94>] sk_destruct+0x44/0x80 net/core/sock.c:1453
> [<ffffffff82b82c24>] __sk_free+0x54/0x230 net/core/sock.c:1461
> [<ffffffff82b82e23>] sk_free+0x23/0x30 net/core/sock.c:1472
> [<     inline     >] sock_put ./include/net/sock.h:1591
> [<ffffffff82b84b04>] sk_common_release+0x294/0x3e0 net/core/sock.c:2745
> [<ffffffff83825f19>] l2tp_ip6_close+0x209/0x2a0 net/l2tp/l2tp_ip6.c:243
> [<ffffffff8316b31f>] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415
> [<ffffffff832cd4d0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
> [<ffffffff82b6d89e>] sock_release+0x8e/0x1d0 net/socket.c:570
> [<ffffffff82b6d9f6>] sock_close+0x16/0x20 net/socket.c:1017
> [<ffffffff81524bdd>] __fput+0x29d/0x720 fs/file_table.c:208
> [<ffffffff815250e5>] ____fput+0x15/0x20 fs/file_table.c:244
> [<ffffffff81172928>] task_work_run+0xf8/0x170 kernel/task_work.c:116
> [<     inline     >] exit_task_work ./include/linux/task_work.h:21
> [<ffffffff8111bda3>] do_exit+0x883/0x2ac0 kernel/exit.c:828
> [<ffffffff8112234e>] do_group_exit+0x10e/0x340 kernel/exit.c:931
> [<     inline     >] SYSC_exit_group kernel/exit.c:942
> [<ffffffff8112259d>] SyS_exit_group+0x1d/0x20 kernel/exit.c:940
> [<ffffffff83fc1501>] entry_SYSCALL_64_fastpath+0x1f/0xc2
> arch/x86/entry/entry_64.S:209
> Memory state around the buggy address:
>  ffff880067727580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ffff880067727600: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
> >ffff880067727680: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
>                                                     ^
>  ffff880067727700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ffff880067727780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ==================================================================
> 
> To reproduce run the attached program in a tight parallel loop using
> stress (https://godoc.org/golang.org/x/tools/cmd/stress):
> $ gcc -lpthread tmp.c
> $ ./stress ./a.out
> 
> On commit bc33b0ca11e3df467777a4fa7639ba488c9d4911 (Nov 5).
> 

Thanks for the report. It looks like l2tp_ip6_bind() is racy.
Can you try the following patch?

diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index ad3468c..9978d01 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -269,8 +269,6 @@ static int l2tp_ip6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
        int addr_type;
        int err;

-       if (!sock_flag(sk, SOCK_ZAPPED))
-               return -EINVAL;
        if (addr->l2tp_family != AF_INET6)
                return -EINVAL;
        if (addr_len < sizeof(*addr))
@@ -296,6 +294,9 @@ static int l2tp_ip6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
        lock_sock(sk);

        err = -EINVAL;
+       if (!sock_flag(sk, SOCK_ZAPPED))
+               goto out_unlock;
+
        if (sk->sk_state != TCP_CLOSE)
                goto out_unlock;

^ permalink raw reply related

* Re: [PATCH] debugfs: improve formatting of debugfs_real_fops()
From: Greg Kroah-Hartman @ 2016-11-10 17:51 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, Nicolai Stange, Christian Lamparter
In-Reply-To: <1478798629-22318-1-git-send-email-jakub.kicinski@netronome.com>

On Thu, Nov 10, 2016 at 05:23:49PM +0000, Jakub Kicinski wrote:
> Type of debugfs_real_fops() is longer than parameters and
> the name, so there is no way to break the declaration nicely.
> We have to go over 80 characters.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
>  include/linux/debugfs.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox