Netdev List
 help / color / mirror / Atom feed
* [PATCH net-next 0/4] net: qualcomm: rmnet: Updates 2018-03-12
From: Subash Abhinov Kasiviswanathan @ 2018-03-13  2:26 UTC (permalink / raw)
  To: davem, netdev; +Cc: Subash Abhinov Kasiviswanathan

This series contains some minor updates for rmnet driver.

Patch 1 contains fixes for sparse warnings.
Patch 2 updates the copyright date to 2018.
Patch 3 is a cleanup in receive path.
Patch 4 has the implementation of the fill_info operation.

Subash Abhinov Kasiviswanathan (4):
  net: qualcomm: rmnet: Fix casting issues
  net: qualcomm: rmnet: Update copyright year to 2018
  net: qualcomm: rmnet: Remove unnecessary device assignment
  net: qualcomm: rmnet: Implement fill_info

 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 32 +++++++++++++++++++++-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h |  2 +-
 .../net/ethernet/qualcomm/rmnet/rmnet_handlers.c   |  2 +-
 .../net/ethernet/qualcomm/rmnet/rmnet_handlers.h   |  2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h    |  4 +--
 .../ethernet/qualcomm/rmnet/rmnet_map_command.c    |  6 ++--
 .../net/ethernet/qualcomm/rmnet/rmnet_map_data.c   |  8 +++---
 .../net/ethernet/qualcomm/rmnet/rmnet_private.h    |  2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c    |  2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h    |  2 +-
 10 files changed, 46 insertions(+), 16 deletions(-)

-- 
1.9.1

^ permalink raw reply

* [PATCH net-next 1/4] net: qualcomm: rmnet: Fix casting issues
From: Subash Abhinov Kasiviswanathan @ 2018-03-13  2:26 UTC (permalink / raw)
  To: davem, netdev; +Cc: Subash Abhinov Kasiviswanathan
In-Reply-To: <1520907969-16696-1-git-send-email-subashab@codeaurora.org>

Fix warnings which were reported when running with sparse
(make C=1 CF=-D__CHECK_ENDIAN__)

drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c:81:15:
warning: cast to restricted __be16
drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:271:37:
warning: incorrect type in assignment (different base types)
expected unsigned short [unsigned] [usertype] pkt_len
got restricted __be16 [usertype] <noident>
drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:287:29:
warning: incorrect type in assignment (different base types)
expected unsigned short [unsigned] [usertype] pkt_len
got restricted __be16 [usertype] <noident>
drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:310:22:
warning: cast to restricted __be16
drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:319:13:
warning: cast to restricted __be16
drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c:49:18:
warning: cast to restricted __be16
drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c:50:18:
warning: cast to restricted __be32
drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c:74:21:
warning: cast to restricted __be16

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h         | 2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c | 4 ++--
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c    | 5 +++--
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
index 6ce31e2..65b074e 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
@@ -44,7 +44,7 @@ struct rmnet_map_header {
 	u8  reserved_bit:1;
 	u8  cd_bit:1;
 	u8  mux_id;
-	u16 pkt_len;
+	__be16 pkt_len;
 }  __aligned(1);
 
 struct rmnet_map_dl_csum_trailer {
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
index b0dbca0..b39b73b 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
@@ -46,8 +46,8 @@ static u8 rmnet_map_do_flow_control(struct sk_buff *skb,
 	vnd = ep->egress_dev;
 
 	ip_family = cmd->flow_control.ip_family;
-	fc_seq = ntohs(cmd->flow_control.flow_control_seq_num);
-	qos_id = ntohl(cmd->flow_control.qos_id);
+	fc_seq = ntohs((__force __be16)cmd->flow_control.flow_control_seq_num);
+	qos_id = ntohl((__force __be32)cmd->flow_control.qos_id);
 
 	/* Ignore the ip family and pass the sequence number for both v4 and v6
 	 * sequence. User space does not support creating dedicated flows for
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
index c74a6c5..4e342a3 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
@@ -307,7 +307,8 @@ struct sk_buff *rmnet_map_deaggregate(struct sk_buff *skb,
 		return NULL;
 
 	maph = (struct rmnet_map_header *)skb->data;
-	packet_len = ntohs(maph->pkt_len) + sizeof(struct rmnet_map_header);
+	packet_len = ntohs((__force __be16)maph->pkt_len) +
+		     sizeof(struct rmnet_map_header);
 
 	if (port->data_format & RMNET_INGRESS_FORMAT_MAP_CKSUMV4)
 		packet_len += sizeof(struct rmnet_map_dl_csum_trailer);
@@ -316,7 +317,7 @@ struct sk_buff *rmnet_map_deaggregate(struct sk_buff *skb,
 		return NULL;
 
 	/* Some hardware can send us empty frames. Catch them */
-	if (ntohs(maph->pkt_len) == 0)
+	if (ntohs((__force __be16)maph->pkt_len) == 0)
 		return NULL;
 
 	skbn = alloc_skb(packet_len + RMNET_MAP_DEAGGR_SPACING, GFP_ATOMIC);
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 2/4] net: qualcomm: rmnet: Update copyright year to 2018
From: Subash Abhinov Kasiviswanathan @ 2018-03-13  2:26 UTC (permalink / raw)
  To: davem, netdev; +Cc: Subash Abhinov Kasiviswanathan
In-Reply-To: <1520907969-16696-1-git-send-email-subashab@codeaurora.org>

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c      | 2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h      | 2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c    | 2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.h    | 2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h         | 2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c | 2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c    | 2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h     | 2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c         | 2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h         | 2 +-
 10 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
index c494918..096301a 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
@@ -1,4 +1,4 @@
-/* Copyright (c) 2013-2017, The Linux Foundation. All rights reserved.
+/* Copyright (c) 2013-2018, The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 and
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
index 00e4634..0b5b5da 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
@@ -1,4 +1,4 @@
-/* Copyright (c) 2013-2014, 2016-2017 The Linux Foundation. All rights reserved.
+/* Copyright (c) 2013-2014, 2016-2018 The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 and
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
index 601edec..c758248 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
@@ -1,4 +1,4 @@
-/* Copyright (c) 2013-2017, The Linux Foundation. All rights reserved.
+/* Copyright (c) 2013-2018, The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 and
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.h
index 3537e4c..1bc6828 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.h
@@ -1,4 +1,4 @@
-/* Copyright (c) 2013, 2016-2017 The Linux Foundation. All rights reserved.
+/* Copyright (c) 2013, 2016-2018 The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 and
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
index 65b074e..2aaf093 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
@@ -1,4 +1,4 @@
-/* Copyright (c) 2013-2017, The Linux Foundation. All rights reserved.
+/* Copyright (c) 2013-2018, The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 and
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
index b39b73b..88f2595 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
@@ -1,4 +1,4 @@
-/* Copyright (c) 2013-2017, The Linux Foundation. All rights reserved.
+/* Copyright (c) 2013-2018, The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 and
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
index 4e342a3..fad3d0b 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
@@ -1,4 +1,4 @@
-/* Copyright (c) 2013-2017, The Linux Foundation. All rights reserved.
+/* Copyright (c) 2013-2018, The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 and
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h
index de0143e..98365ef 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h
@@ -1,4 +1,4 @@
-/* Copyright (c) 2013-2014, 2016-2017 The Linux Foundation. All rights reserved.
+/* Copyright (c) 2013-2014, 2016-2018 The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 and
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
index 346d310..2ea16a0 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
@@ -1,4 +1,4 @@
-/* Copyright (c) 2013-2017, The Linux Foundation. All rights reserved.
+/* Copyright (c) 2013-2018, The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 and
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h
index 71e4c32..daab75c 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h
@@ -1,4 +1,4 @@
-/* Copyright (c) 2013-2017, The Linux Foundation. All rights reserved.
+/* Copyright (c) 2013-2018, The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 and
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 3/4] net: qualcomm: rmnet: Remove unnecessary device assignment
From: Subash Abhinov Kasiviswanathan @ 2018-03-13  2:26 UTC (permalink / raw)
  To: davem, netdev; +Cc: Subash Abhinov Kasiviswanathan
In-Reply-To: <1520907969-16696-1-git-send-email-subashab@codeaurora.org>

Device of the de-aggregated skb is correctly assigned after inspecting
the mux_id, so remove the assignment in rmnet_map_deaggregate().

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
index fad3d0b..134da43 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
@@ -324,7 +324,6 @@ struct sk_buff *rmnet_map_deaggregate(struct sk_buff *skb,
 	if (!skbn)
 		return NULL;
 
-	skbn->dev = skb->dev;
 	skb_reserve(skbn, RMNET_MAP_DEAGGR_HEADROOM);
 	skb_put(skbn, packet_len);
 	memcpy(skbn->data, skb->data, packet_len);
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 4/4] net: qualcomm: rmnet: Implement fill_info
From: Subash Abhinov Kasiviswanathan @ 2018-03-13  2:26 UTC (permalink / raw)
  To: davem, netdev; +Cc: Subash Abhinov Kasiviswanathan
In-Reply-To: <1520907969-16696-1-git-send-email-subashab@codeaurora.org>

This is needed to query the mux_id and flags of a rmnet device.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 30 ++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
index 096301a..d0f3e0f 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
@@ -331,6 +331,35 @@ static size_t rmnet_get_size(const struct net_device *dev)
 	       nla_total_size(sizeof(struct ifla_vlan_flags)); /* IFLA_VLAN_FLAGS */
 }
 
+static int rmnet_fill_info(struct sk_buff *skb, const struct net_device *dev)
+{
+	struct rmnet_priv *priv = netdev_priv(dev);
+	struct net_device *real_dev;
+	struct ifla_vlan_flags f;
+	struct rmnet_port *port;
+
+	real_dev = priv->real_dev;
+
+	if (!rmnet_is_real_dev_registered(real_dev))
+		return -ENODEV;
+
+	if (nla_put_u16(skb, IFLA_VLAN_ID, priv->mux_id))
+		goto nla_put_failure;
+
+	port = rmnet_get_port_rtnl(real_dev);
+
+	f.flags = port->data_format;
+	f.mask  = ~0;
+
+	if (nla_put(skb, IFLA_VLAN_FLAGS, sizeof(f), &f))
+		goto nla_put_failure;
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
 struct rtnl_link_ops rmnet_link_ops __read_mostly = {
 	.kind		= "rmnet",
 	.maxtype	= __IFLA_VLAN_MAX,
@@ -341,6 +370,7 @@ struct rtnl_link_ops rmnet_link_ops __read_mostly = {
 	.dellink	= rmnet_dellink,
 	.get_size	= rmnet_get_size,
 	.changelink     = rmnet_changelink,
+	.fill_info	= rmnet_fill_info,
 };
 
 /* Needs either rcu_read_lock() or rtnl lock */
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH] net: hns: use put_device() if device_register fail
From: arvindY @ 2018-03-13  2:57 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: David Miller, yisen.zhuang, salil.mehta, linyunsheng, LKML,
	netdev, linux-mtd
In-Reply-To: <CAFLxGvw1U2OZn4kU1O6uJ17yWYf1deE=+Cr56Foffdq7Q=5u3A@mail.gmail.com>



On Monday 12 March 2018 10:59 PM, Richard Weinberger wrote:
> On Mon, Mar 12, 2018 at 5:27 PM, arvindY <arvind.yadav.cs@gmail.com> wrote:
>>
>> On Monday 12 March 2018 08:13 PM, David Miller wrote:
>>> From: Arvind Yadav <arvind.yadav.cs@gmail.com>
>>> Date: Fri,  9 Mar 2018 16:11:17 +0530
>>>
>>>> if device_register() returned an error! Always use put_device()
>>>> to give up the reference initialized.
>>>>
>>>> Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
>>> I do not see anything giving cls_dev an initial non-zero reference
>>> count before this device_register() call.
>> Yes,  you are correct there is nothing to release (hnae_release).
> Wait, this is not what DaveM said.
> Since you sent also patches to MTD I care about this too.
>
> Do we have to call put_device() in any case after a failure of
> device_register() or not?
> In this case the release function is empty, so nothing is going to released.
> But technically a put_device() is needed and correct according to your
> change log.
>
> I have to admit I don't know all details of the driver core, maybe you
> can help me.
> device_register() calls device_initialize() followed by device_add().
> device_initialize() does a kobject_init() which again sets the
> reference counter to 1 via kref_init().
> In the next step device_add() does a get_device() --> reference
> counter goes up to 2.
> But in the exit path of the function a put_device() is done, also in
> case of an error.
> So device_register() always returns with reference count 1.
>
> If I understand this correctly a put_device() is mandatory.
> Also in this driver.
> Can you please enlighten me? :-)
>
Oh, I just miss understood david question.

But what ever you are telling that is correct.
device_initialize() will call kref_init() which is setting
refcount 1 by refcount_set(&kref->refcount, 1).
and device_add() will call kref_get() to increment refcount
by calling refcount_inc(&kref->refcount).
So we need put_device() to decrement and release the
kboject.
Internally it'll call kref_put() -> refcount_dec_and_test(&kref->refcount)
to decrement refcount.

~arvind

^ permalink raw reply

* [PATCH net-next v2] sctp: fix error return code in sctp_sendmsg_new_asoc()
From: Wei Yongjun @ 2018-03-13  3:03 UTC (permalink / raw)
  To: Vlad Yasevich, Neil Horman, Xin Long
  Cc: Wei Yongjun, linux-sctp, netdev, kernel-janitors
In-Reply-To: <1520856964-132516-1-git-send-email-weiyongjun1@huawei.com>

Return error code -EINVAL in the address len check error handling
case since 'err' can be overwrite to 0 by 'err = sctp_verify_addr()'
in the for loop.

Fixes: 2c0dbaa0c43d ("sctp: add support for SCTP_DSTADDRV4/6 Information for sendmsg")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
---
v1 -> v2: remove the 'err' initialization
---
 net/sctp/socket.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 7d3476a..af5cf29 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1677,7 +1677,7 @@ static int sctp_sendmsg_new_asoc(struct sock *sk, __u16 sflags,
 	struct sctp_association *asoc;
 	enum sctp_scope scope;
 	struct cmsghdr *cmsg;
-	int err = -EINVAL;
+	int err;
 
 	*tp = NULL;
 
@@ -1761,16 +1761,20 @@ static int sctp_sendmsg_new_asoc(struct sock *sk, __u16 sflags,
 		memset(daddr, 0, sizeof(*daddr));
 		dlen = cmsg->cmsg_len - sizeof(struct cmsghdr);
 		if (cmsg->cmsg_type == SCTP_DSTADDRV4) {
-			if (dlen < sizeof(struct in_addr))
+			if (dlen < sizeof(struct in_addr)) {
+				err = -EINVAL;
 				goto free;
+			}
 
 			dlen = sizeof(struct in_addr);
 			daddr->v4.sin_family = AF_INET;
 			daddr->v4.sin_port = htons(asoc->peer.port);
 			memcpy(&daddr->v4.sin_addr, CMSG_DATA(cmsg), dlen);
 		} else {
-			if (dlen < sizeof(struct in6_addr))
+			if (dlen < sizeof(struct in6_addr)) {
+				err = -EINVAL;
 				goto free;
+			}
 
 			dlen = sizeof(struct in6_addr);
 			daddr->v6.sin6_family = AF_INET6;

^ permalink raw reply related

* [PATCH v2] netfilter: nf_tables: remove VLA usage
From: Gustavo A. R. Silva @ 2018-03-13  3:16 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
	David S. Miller
  Cc: netfilter-devel, coreteam, netdev, linux-kernel,
	Gustavo A. R. Silva

In preparation to enabling -Wvla, remove VLA and replace it
with dynamic memory allocation.

>From a security viewpoint, the use of Variable Length Arrays can be
a vector for stack overflow attacks. Also, in general, as the code
evolves it is easy to lose track of how big a VLA can get. Thus, we
can end up having segfaults that are hard to debug.

Also, fixed as part of the directive to remove all VLAs from
the kernel: https://lkml.org/lkml/2018/3/7/621

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
Changes in v2:
 - Use kmalloc_array instead of kcalloc.

 net/netfilter/nf_tables_api.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 3f815b6..5a42e97 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -4357,16 +4357,20 @@ static struct nft_object *nft_obj_init(const struct nft_ctx *ctx,
 				       const struct nft_object_type *type,
 				       const struct nlattr *attr)
 {
-	struct nlattr *tb[type->maxattr + 1];
+	struct nlattr **tb;
 	const struct nft_object_ops *ops;
 	struct nft_object *obj;
-	int err;
+	int err = -ENOMEM;
+
+	tb = kmalloc_array(type->maxattr + 1, sizeof(*tb), GFP_KERNEL);
+	if (!tb)
+		goto err1;
 
 	if (attr) {
 		err = nla_parse_nested(tb, type->maxattr, attr, type->policy,
 				       NULL);
 		if (err < 0)
-			goto err1;
+			goto err2;
 	} else {
 		memset(tb, 0, sizeof(tb[0]) * (type->maxattr + 1));
 	}
@@ -4375,7 +4379,7 @@ static struct nft_object *nft_obj_init(const struct nft_ctx *ctx,
 		ops = type->select_ops(ctx, (const struct nlattr * const *)tb);
 		if (IS_ERR(ops)) {
 			err = PTR_ERR(ops);
-			goto err1;
+			goto err2;
 		}
 	} else {
 		ops = type->ops;
@@ -4383,18 +4387,21 @@ static struct nft_object *nft_obj_init(const struct nft_ctx *ctx,
 
 	err = -ENOMEM;
 	obj = kzalloc(sizeof(*obj) + ops->size, GFP_KERNEL);
-	if (obj == NULL)
-		goto err1;
+	if (!obj)
+		goto err2;
 
 	err = ops->init(ctx, (const struct nlattr * const *)tb, obj);
 	if (err < 0)
-		goto err2;
+		goto err3;
 
 	obj->ops = ops;
 
+	kfree(tb);
 	return obj;
-err2:
+err3:
 	kfree(obj);
+err2:
+	kfree(tb);
 err1:
 	return ERR_PTR(err);
 }
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH v3 net-next 1/4] net: macb: Reorganize macb_mii bringup
From: kbuild test robot @ 2018-03-13  4:25 UTC (permalink / raw)
  To: Brad Mouring
  Cc: kbuild-all, Nicolas Ferre, Rob Herring, David S . Miller,
	Michael Grzeschik, Andrew Lunn, Mark Rutland, netdev,
	Julia Cartwright, devicetree, Brad Mouring
In-Reply-To: <20180312171001.129209-2-brad.mouring@ni.com>

[-- Attachment #1: Type: text/plain, Size: 8078 bytes --]

Hi Brad,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v4.16-rc4]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Brad-Mouring/net-macb-Reorganize-macb_mii-bringup/20180313-113743
config: x86_64-randconfig-x003-201810 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

Note: the linux-review/Brad-Mouring/net-macb-Reorganize-macb_mii-bringup/20180313-113743 HEAD dbe45eb4d14a7cd3d0b9c7ee8e7ca034c1ef3852 builds fine.
      It only hurts bisectibility.

All errors (new ones prefixed by >>):

   drivers/net/ethernet/cadence/macb_main.c: In function 'macb_mii_init':
>> drivers/net/ethernet/cadence/macb_main.c:596:8: error: 'i' undeclared (first use in this function)
      for (i = 0; i < PHY_MAX_ADDR; i++)
           ^
   drivers/net/ethernet/cadence/macb_main.c:596:8: note: each undeclared identifier is reported only once for each function it appears in

vim +/i +596 drivers/net/ethernet/cadence/macb_main.c

6c36a707 drivers/net/macb.c                       frederic RODO      2007-07-12  564  
421d9df0 drivers/net/ethernet/cadence/macb.c      Cyrille Pitchen    2015-03-07  565  static int macb_mii_init(struct macb *bp)
6c36a707 drivers/net/macb.c                       frederic RODO      2007-07-12  566  {
84e0cdb0 drivers/net/ethernet/cadence/macb.c      Jamie Iles         2011-03-08  567  	struct macb_platform_data *pdata;
148cbb53 drivers/net/ethernet/cadence/macb.c      Boris Brezillon    2013-08-22  568  	struct device_node *np;
6576bbb8 drivers/net/ethernet/cadence/macb_main.c Brad Mouring       2018-03-12  569  	int err;
6c36a707 drivers/net/macb.c                       frederic RODO      2007-07-12  570  
3dbda77e drivers/net/macb.c                       Uwe Kleine-Koenig  2009-07-23  571  	/* Enable management port */
6c36a707 drivers/net/macb.c                       frederic RODO      2007-07-12  572  	macb_writel(bp, NCR, MACB_BIT(MPE));
6c36a707 drivers/net/macb.c                       frederic RODO      2007-07-12  573  
298cf9be drivers/net/macb.c                       Lennert Buytenhek  2008-10-08  574  	bp->mii_bus = mdiobus_alloc();
aa50b552 drivers/net/ethernet/cadence/macb.c      Moritz Fischer     2016-03-29  575  	if (!bp->mii_bus) {
298cf9be drivers/net/macb.c                       Lennert Buytenhek  2008-10-08  576  		err = -ENOMEM;
298cf9be drivers/net/macb.c                       Lennert Buytenhek  2008-10-08  577  		goto err_out;
298cf9be drivers/net/macb.c                       Lennert Buytenhek  2008-10-08  578  	}
298cf9be drivers/net/macb.c                       Lennert Buytenhek  2008-10-08  579  
298cf9be drivers/net/macb.c                       Lennert Buytenhek  2008-10-08  580  	bp->mii_bus->name = "MACB_mii_bus";
298cf9be drivers/net/macb.c                       Lennert Buytenhek  2008-10-08  581  	bp->mii_bus->read = &macb_mdio_read;
298cf9be drivers/net/macb.c                       Lennert Buytenhek  2008-10-08  582  	bp->mii_bus->write = &macb_mdio_write;
98d5e57e drivers/net/ethernet/cadence/macb.c      Florian Fainelli   2012-01-09  583  	snprintf(bp->mii_bus->id, MII_BUS_ID_SIZE, "%s-%x",
98d5e57e drivers/net/ethernet/cadence/macb.c      Florian Fainelli   2012-01-09  584  		 bp->pdev->name, bp->pdev->id);
298cf9be drivers/net/macb.c                       Lennert Buytenhek  2008-10-08  585  	bp->mii_bus->priv = bp;
cf669660 drivers/net/ethernet/cadence/macb.c      Florian Fainelli   2016-05-02  586  	bp->mii_bus->parent = &bp->pdev->dev;
c607a0d9 drivers/net/ethernet/cadence/macb.c      Jingoo Han         2013-08-30  587  	pdata = dev_get_platdata(&bp->pdev->dev);
6c36a707 drivers/net/macb.c                       frederic RODO      2007-07-12  588  
148cbb53 drivers/net/ethernet/cadence/macb.c      Boris Brezillon    2013-08-22  589  	dev_set_drvdata(&bp->dev->dev, bp->mii_bus);
148cbb53 drivers/net/ethernet/cadence/macb.c      Boris Brezillon    2013-08-22  590  
148cbb53 drivers/net/ethernet/cadence/macb.c      Boris Brezillon    2013-08-22  591  	np = bp->pdev->dev.of_node;
dacdbb4d drivers/net/ethernet/cadence/macb.c      Michael Grzeschik  2017-06-23  592  
6576bbb8 drivers/net/ethernet/cadence/macb_main.c Brad Mouring       2018-03-12  593  	if (np) {
148cbb53 drivers/net/ethernet/cadence/macb.c      Boris Brezillon    2013-08-22  594  		err = of_mdiobus_register(bp->mii_bus, np);
148cbb53 drivers/net/ethernet/cadence/macb.c      Boris Brezillon    2013-08-22  595  	} else {
83a77e9e drivers/net/ethernet/cadence/macb.c      Bartosz Folta      2016-12-14 @596  		for (i = 0; i < PHY_MAX_ADDR; i++)
83a77e9e drivers/net/ethernet/cadence/macb.c      Bartosz Folta      2016-12-14  597  			bp->mii_bus->irq[i] = PHY_POLL;
83a77e9e drivers/net/ethernet/cadence/macb.c      Bartosz Folta      2016-12-14  598  
148cbb53 drivers/net/ethernet/cadence/macb.c      Boris Brezillon    2013-08-22  599  		if (pdata)
148cbb53 drivers/net/ethernet/cadence/macb.c      Boris Brezillon    2013-08-22  600  			bp->mii_bus->phy_mask = pdata->phy_mask;
148cbb53 drivers/net/ethernet/cadence/macb.c      Boris Brezillon    2013-08-22  601  
148cbb53 drivers/net/ethernet/cadence/macb.c      Boris Brezillon    2013-08-22  602  		err = mdiobus_register(bp->mii_bus);
148cbb53 drivers/net/ethernet/cadence/macb.c      Boris Brezillon    2013-08-22  603  	}
89e5785f drivers/net/macb.c                       Haavard Skinnemoen 2006-11-09  604  
148cbb53 drivers/net/ethernet/cadence/macb.c      Boris Brezillon    2013-08-22  605  	if (err)
e7f4dc35 drivers/net/ethernet/cadence/macb.c      Andrew Lunn        2016-01-06  606  		goto err_out_free_mdiobus;
89e5785f drivers/net/macb.c                       Haavard Skinnemoen 2006-11-09  607  
7daa78e3 drivers/net/ethernet/cadence/macb.c      Boris Brezillon    2013-08-27  608  	err = macb_mii_probe(bp->dev);
7daa78e3 drivers/net/ethernet/cadence/macb.c      Boris Brezillon    2013-08-27  609  	if (err)
6c36a707 drivers/net/macb.c                       frederic RODO      2007-07-12  610  		goto err_out_unregister_bus;
6c36a707 drivers/net/macb.c                       frederic RODO      2007-07-12  611  
6c36a707 drivers/net/macb.c                       frederic RODO      2007-07-12  612  	return 0;
6c36a707 drivers/net/macb.c                       frederic RODO      2007-07-12  613  
6c36a707 drivers/net/macb.c                       frederic RODO      2007-07-12  614  err_out_unregister_bus:
298cf9be drivers/net/macb.c                       Lennert Buytenhek  2008-10-08  615  	mdiobus_unregister(bp->mii_bus);
9ce98140 drivers/net/ethernet/cadence/macb_main.c Michael Grzeschik  2017-11-08  616  	if (np && of_phy_is_fixed_link(np))
9ce98140 drivers/net/ethernet/cadence/macb_main.c Michael Grzeschik  2017-11-08  617  		of_phy_deregister_fixed_link(np);
6576bbb8 drivers/net/ethernet/cadence/macb_main.c Brad Mouring       2018-03-12  618  err_out_free_mdiobus:
6576bbb8 drivers/net/ethernet/cadence/macb_main.c Brad Mouring       2018-03-12  619  	of_node_put(bp->phy_node);
298cf9be drivers/net/macb.c                       Lennert Buytenhek  2008-10-08  620  	mdiobus_free(bp->mii_bus);
6c36a707 drivers/net/macb.c                       frederic RODO      2007-07-12  621  err_out:
6c36a707 drivers/net/macb.c                       frederic RODO      2007-07-12  622  	return err;
89e5785f drivers/net/macb.c                       Haavard Skinnemoen 2006-11-09  623  }
89e5785f drivers/net/macb.c                       Haavard Skinnemoen 2006-11-09  624  

:::::: The code at line 596 was first introduced by commit
:::::: 83a77e9ec4150ee4acc635638f7dedd9da523a26 net: macb: Added PCI wrapper for Platform Driver.

:::::: TO: Bartosz Folta <bfolta@cadence.com>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29486 bytes --]

^ permalink raw reply

* Re: [PATCH v3] kernel.h: Skip single-eval logic on literals in min()/max()
From: Kees Cook @ 2018-03-13  4:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Linux Kernel Mailing List, Josh Poimboeuf,
	Rasmus Villemoes, Gustavo A. R. Silva, Tobin C. Harding,
	Steven Rostedt, Jonathan Corbet, Chris Mason, Josef Bacik,
	David Sterba, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	Masahiro Yamada, Borislav Petkov, Randy Dunlap <rdunl
In-Reply-To: <CA+55aFwK7W0SsaP5eR+2TxOj-j_Mu_E11fRw2Gk8ptV71ebvww@mail.gmail.com>

On Mon, Mar 12, 2018 at 4:57 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Mar 12, 2018 at 3:55 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
>>
>> Replacing the __builtin_choose_expr() with ?: works of course.
>
> Hmm. That sounds like the right thing to do. We were so myopically
> staring at the __builtin_choose_expr() problem that we overlooked the
> obvious solution.
>
> Using __builtin_constant_p() together with a ?: is in fact our common
> pattern, so that should be fine. The only real reason to use
> __builtin_choose_expr() is if you want to get the *type* to vary
> depending on which side you choose, but that's not an issue for
> min/max.

This doesn't solve it for -Wvla, unfortunately. That was the point of
Josh's original suggestion of __builtin_choose_expr().

Try building with KCFLAGS=-Wval and checking net/ipv6/proc.c:

net/ipv6/proc.c: In function ‘snmp6_seq_show_item’:
net/ipv6/proc.c:198:2: warning: ISO C90 forbids array ‘buff’ whose
size can’t be evaluated [-Wvla]
  unsigned long buff[SNMP_MIB_MAX];
  ^~~~~~~~


-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH 01/15] ice: Add basic driver framework for Intel(R) E800 Series
From: Jakub Kicinski @ 2018-03-13  4:41 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: Anirudh Venkataramanan, netdev, intel-wired-lan
In-Reply-To: <1520873790.19785.4.camel@intel.com>

On Mon, 12 Mar 2018 09:56:30 -0700, Jeff Kirsher wrote:
> On Fri, 2018-03-09 at 14:16 -0800, Jakub Kicinski wrote:
> > On Fri,  9 Mar 2018 09:21:22 -0800, Anirudh Venkataramanan wrote:  
> > > diff --git a/Documentation/networking/ice.txt
> > > b/Documentation/networking/ice.txt
> > > new file mode 100644
> > > index 000000000000..6261c46378e1
> > > --- /dev/null
> > > +++ b/Documentation/networking/ice.txt  
> > 
> > nit: ice.rst, maybe?  
> 
> If that is the desired direction for kernel documentation, then we can
> work on generating *.rst for all of our wired ethernet drivers.  For
> now, we just have the simple text file.
> 
> Currently there are only 5 networking drivers that have a *.rst for
> documentation, so it looks like an opportunity to convert all the
> networking documentation to *.rst format, if that is desired. 

Oh, just a suggestion, I had a quick look and your file actually works
pretty nicely as rst though I'm not an expert..

^ permalink raw reply

* Re: [PATCH 1/1] net: check dev->reg_state before deref of napi netdev_ops
From: Josh Elsasser @ 2018-03-13  5:17 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, Greg Kroah-Hartman, Eric Dumazet, Sasha Levin,
	Willem de Bruijn, Alexander Potapenko, Michal Kubeček,
	Linux Kernel Network Developers, LKML
In-Reply-To: <CAM_iQpVWQ+4YKX=ZCCpJ3fzbexNVYJ8wONz9QnwuxBtbQwx3Lg@mail.gmail.com>

On Mon, Mar 12, 2018 at 4:17 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Sun, Mar 11, 2018 at 12:22 PM, Josh Elsasser <jelsasser@appneta.com> wrote:
>> init_dummy_netdev() leaves its netdev_ops pointer zeroed. This leads
>> to a NULL pointer dereference when sk_busy_loop fires against an iwlwifi
>> wireless adapter and checks napi->dev->netdev_ops->ndo_busy_poll.
>>
>> Avoid this by ensuring that napi->dev is not a dummy device before
>> dereferencing napi dev's netdev_ops, preventing the following panic:
>
> Hmm, how about just checking ->netdev_ops? Checking reg_state looks
> odd, although works.

Fair point. I was trying to differentiate between an unexpected
NULL pointer and a dummy netdev, but I guess it was clever
at the expense of readability.

I'll push up a v2 that just does the obvious.

^ permalink raw reply

* [PATCH 0/1] net: avoid a kernel panic during sk_busy_loop
From: Josh Elsasser @ 2018-03-13  5:31 UTC (permalink / raw)
  To: davem
  Cc: Josh Elsasser, Greg Kroah-Hartman, Eric Dumazet, Willem de Bruijn,
	Cong Wang, Michal Kubeček, Vlad Yasevich, netdev,
	linux-kernel

V2: just check napi->dev->netdev_ops instead of getting clever with the
netdev registration state.

Original cover letter:

Hi Dave,

I stumbled across a reproducible kernel panic while playing around with
busy_poll on a Linux 4.9.86 kernel. There's an unfortunate interaction
between init_dummy_netdev, which doesn't bother to fill in netdev_ops, and
sk_busy_loop, which assumed netdev_ops is a valid pointer.

To reproduce on the device under test (DUT), I did:

  $ ip addr show dev wlan0
  8: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq [...]
      inet 172.16.122.6/23 brd 172.16.123.255 scope global wlan0
  $ sysctl -w net.core.busy_read=50
  $ nc -l 172.16.122.6 5001

Then transmitted some data to this socket from a second host:

  $ echo "foo" | nc 172.16.122.6 5001

The DUT immediately hits a kernel panic.

I've attached a patch that applies cleanly to the 4.9.87 stable release.
This fix isn't necessary for net/net-next (ndo_busy_poll was removed in
linux-4.11), but a further backport of this commit is likely required for
any stable releases older than linux-4.5.

I hope this is the right way to raise something like this. I couldn't find
a clear answer from the -stable and netdev on how to handle bugs in features
that no longer exist in mainline.

Thanks,
    Josh

^ permalink raw reply

* [PATCH v2 1/1] net: check before dereferencing netdev_ops during busy poll
From: Josh Elsasser @ 2018-03-13  5:32 UTC (permalink / raw)
  To: davem
  Cc: Josh Elsasser, Greg Kroah-Hartman, Eric Dumazet, Willem de Bruijn,
	Alexander Potapenko, Cong Wang, Vlad Yasevich,
	Michal Kubeček, netdev, linux-kernel
In-Reply-To: <20180313053248.13654-1-jelsasser@appneta.com>

init_dummy_netdev() leaves its netdev_ops pointer zeroed. This leads
to a NULL pointer dereference when sk_busy_loop fires against an iwlwifi
wireless adapter and checks napi->dev->netdev_ops->ndo_busy_poll.

Avoid this by ensuring napi->dev->netdev_ops is valid before following
the pointer, avoiding the following panic when busy polling on a dummy
netdev:

  BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8
  IP: [<ffffffff817b4b72>] sk_busy_loop+0x92/0x2f0
  Call Trace:
   [<ffffffff815a3134>] ? uart_write_room+0x74/0xf0
   [<ffffffff817964a9>] sock_poll+0x99/0xa0
   [<ffffffff81223142>] do_sys_poll+0x2e2/0x520
   [<ffffffff8118d3fc>] ? get_page_from_freelist+0x3bc/0xa30
   [<ffffffff810ada22>] ? update_curr+0x62/0x140
   [<ffffffff811ea671>] ? __slab_free+0xa1/0x2a0
   [<ffffffff811ea671>] ? __slab_free+0xa1/0x2a0
   [<ffffffff8179dbb1>] ? skb_free_head+0x21/0x30
   [<ffffffff81221bd0>] ? poll_initwait+0x50/0x50
   [<ffffffff811eaa36>] ? kmem_cache_free+0x1c6/0x1e0
   [<ffffffff815a4884>] ? uart_write+0x124/0x1d0
   [<ffffffff810bd1cd>] ? remove_wait_queue+0x4d/0x60
   [<ffffffff810bd224>] ? __wake_up+0x44/0x50
   [<ffffffff81582731>] ? tty_write_unlock+0x31/0x40
   [<ffffffff8158c5c6>] ? tty_ldisc_deref+0x16/0x20
   [<ffffffff81584820>] ? tty_write+0x1e0/0x2f0
   [<ffffffff81587e50>] ? process_echoes+0x80/0x80
   [<ffffffff8120c17b>] ? __vfs_write+0x2b/0x130
   [<ffffffff8120d09a>] ? vfs_write+0x15a/0x1a0
   [<ffffffff81223455>] SyS_poll+0x75/0x100
   [<ffffffff819a6524>] entry_SYSCALL_64_fastpath+0x24/0xcf

Commit 79e7fff47b7b ("net: remove support for per driver ndo_busy_poll()")
indirectly fixed this upstream in linux-4.11 by removing the offending
pointer usage. No other users of napi->dev touch its netdev_ops.

Fixes: 060212928670 ("net: add low latency socket poll")
Fixes: ce6aea93f751 ("net: network drivers no longer need to implement ndo_busy_poll()") - 4.9.y
Signed-off-by: Josh Elsasser <jelsasser@appneta.com>
---
 net/core/dev.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 8898618bf341..1f50c131ed15 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5042,7 +5042,10 @@ bool sk_busy_loop(struct sock *sk, int nonblock)
 		goto out;
 
 	/* Note: ndo_busy_poll method is optional in linux-4.5 */
-	busy_poll = napi->dev->netdev_ops->ndo_busy_poll;
+	if (napi->dev->netdev_ops)
+		busy_poll = napi->dev->netdev_ops->ndo_busy_poll;
+	else
+		busy_poll = NULL;
 
 	do {
 		rc = 0;
-- 
2.11.0

^ permalink raw reply related

* BUG_ON triggered in skb_segment
From: Yonghong Song @ 2018-03-13  5:45 UTC (permalink / raw)
  To: Daniel Borkmann, Eric Dumazet, Alexei Starovoitov, netdev,
	Martin Lau, Yonghong Song

Hi,

One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
net-next function skb_segment, line 3667.

3472 struct sk_buff *skb_segment(struct sk_buff *head_skb,
3473                             netdev_features_t features)
3474 {
3475         struct sk_buff *segs = NULL;
3476         struct sk_buff *tail = NULL;
...
3665                 while (pos < offset + len) {
3666                         if (i >= nfrags) {
3667                                 BUG_ON(skb_headlen(list_skb));
3668
3669                                 i = 0;
3670                                 nfrags = 
skb_shinfo(list_skb)->nr_frags;
3671                                 frag = skb_shinfo(list_skb)->frags;
3672                                 frag_skb = list_skb;
...

call stack:
...
#0 [ffff883ffef034f8] machine_kexec at ffffffff81044c41
  #1 [ffff883ffef03558] __crash_kexec at ffffffff8110c525
  #2 [ffff883ffef03620] crash_kexec at ffffffff8110d5cc
  #3 [ffff883ffef03640] oops_end at ffffffff8101d7e7
  #4 [ffff883ffef03668] die at ffffffff8101deb2
  #5 [ffff883ffef03698] do_trap at ffffffff8101a700
  #6 [ffff883ffef036e8] do_error_trap at ffffffff8101abfe
  #7 [ffff883ffef037a0] do_invalid_op at ffffffff8101acd0
  #8 [ffff883ffef037b0] invalid_op at ffffffff81a00bab
     [exception RIP: skb_segment+3044]
     RIP: ffffffff817e4dd4  RSP: ffff883ffef03860  RFLAGS: 00010216
     RAX: 0000000000002bf6  RBX: ffff883feb7aaa00  RCX: 0000000000000011
     RDX: ffff883fb87910c0  RSI: 0000000000000011  RDI: ffff883feb7ab500
     RBP: ffff883ffef03928   R8: 0000000000002ce2   R9: 00000000000027da
     R10: 000001ea00000000  R11: 0000000000002d82  R12: ffff883f90a1ee80
     R13: ffff883fb8791120  R14: ffff883feb7abc00  R15: 0000000000002ce2
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  #9 [ffff883ffef03930] tcp_gso_segment at ffffffff818713e7
#10 [ffff883ffef03990] tcp4_gso_segment at ffffffff818717d8
#11 [ffff883ffef039b0] inet_gso_segment at ffffffff81882c9b
#12 [ffff883ffef03a10] skb_mac_gso_segment at ffffffff817f39b8
#13 [ffff883ffef03a38] __skb_gso_segment at ffffffff817f3ac9
#14 [ffff883ffef03a68] validate_xmit_skb at ffffffff817f3eed
#15 [ffff883ffef03aa8] validate_xmit_skb_list at ffffffff817f40a2
#16 [ffff883ffef03ad8] sch_direct_xmit at ffffffff81824efb
#17 [ffff883ffef03b20] __qdisc_run at ffffffff818251aa
#18 [ffff883ffef03b90] __dev_queue_xmit at ffffffff817f45ed
#19 [ffff883ffef03c08] dev_queue_xmit at ffffffff817f4b90
#20 [ffff883ffef03c18] __bpf_redirect at ffffffff81812b66
#21 [ffff883ffef03c40] skb_do_redirect at ffffffff81813209
#22 [ffff883ffef03c60] __netif_receive_skb_core at ffffffff817f310d
#23 [ffff883ffef03cc8] __netif_receive_skb at ffffffff817f32e8
#24 [ffff883ffef03ce8] netif_receive_skb_internal at ffffffff817f5538
#25 [ffff883ffef03d10] napi_gro_complete at ffffffff817f56c0
#26 [ffff883ffef03d28] dev_gro_receive at ffffffff817f5ea6
#27 [ffff883ffef03d78] napi_gro_receive at ffffffff817f6168
#28 [ffff883ffef03da0] mlx5e_handle_rx_cqe_mpwrq at ffffffff817381c2
#29 [ffff883ffef03e30] mlx5e_poll_rx_cq at ffffffff817386c2
#30 [ffff883ffef03e80] mlx5e_napi_poll at ffffffff8173926e
#31 [ffff883ffef03ed0] net_rx_action at ffffffff817f5a6e
#32 [ffff883ffef03f48] __softirqentry_text_start at ffffffff81c000c3
#33 [ffff883ffef03fa8] irq_exit at ffffffff8108f515
#34 [ffff883ffef03fb8] do_IRQ at ffffffff81a01b11
--- <IRQ stack> ---
bt: cannot transition from IRQ stack to current process stack:
         IRQ stack pointer: ffff883ffef034f8
     process stack pointer: ffffffff81a01ae9
        current stack base: ffffc9000c5c4000
...
Setup:
=====

The test will involve three machines:
   M_ipv6 <-> M_nat <-> M_ipv4

The M_nat will do ipv4<->ipv6 address translation and then forward packet
to proper destination. The control plane will configure M_nat properly
will understand virtual ipv4 address for machine M_ipv6, and
virtual ipv6 address for machine M_ipv4.

M_nat runs a bpf program, which is attached to clsact (ingress) qdisc.
The program uses bpf_skb_change_proto to do protocol conversion.
bpf_skb_change_proto will adjust skb header_len and len properly
based on protocol change.
After the conversion, the program will make proper change on
ethhdr and ip4/6 header, recalculate checksum, and send the packet out
through bpf_redirect.

Experiment:
===========

MTU: 1500B for all three machines.

The tso/lro/gro are enabled on the M_nat box.

ping works on both ways of M_ipv6 <-> M_ipv4.
It works for transfering a small file (4KB) between M_ipv6 and M_ipv4 
(both ways).
Transfering a large file (e.g., 4MB) from M_ipv6 to M_ipv4, failed with 
the above BUG_ON, really fast.
Did not really test from M_ipv4 to M_ipv6 with large file.

The error path likely to be (also from the above call stack):
   nic -> lro/gro -> bpf_program -> gso (BUG_ON)

In one of experiments, I explicitly printed the skb->len and 
skb->data_len. The values are below:
   skb_segment: len 2856, data_len 2686
They should be equal to avoid BUG.

In another experiment, I got:
   skb_segment: len 1428, data_len 1258

In both cases, the difference is 170 bytes. Not sure whether
this is just a coincidence or not.

Workaround:
===========

A workaround to avoid BUG_ON is to disable lro/gro. This way,
kernel will not receive big packets and hence gso is not really called.

I am not familiar with gso code. Does anybody hit this BUG_ON before?
Any suggestion on how to debug this?

Thanks!

Yonghong

^ permalink raw reply

* Re: [PATCH v2 1/1] net: check before dereferencing netdev_ops during busy poll
From: Eric Dumazet @ 2018-03-13  5:50 UTC (permalink / raw)
  To: Josh Elsasser, davem
  Cc: Greg Kroah-Hartman, Eric Dumazet, Willem de Bruijn,
	Alexander Potapenko, Cong Wang, Vlad Yasevich,
	Michal Kubeček, netdev, linux-kernel
In-Reply-To: <20180313053248.13654-2-jelsasser@appneta.com>



On 03/12/2018 10:32 PM, Josh Elsasser wrote:
> init_dummy_netdev() leaves its netdev_ops pointer zeroed. This leads
> to a NULL pointer dereference when sk_busy_loop fires against an iwlwifi
> wireless adapter and checks napi->dev->netdev_ops->ndo_busy_poll.
> 
> Avoid this by ensuring napi->dev->netdev_ops is valid before following
> the pointer, avoiding the following panic when busy polling on a dummy
> netdev:
>


> 
> Fixes: 060212928670 ("net: add low latency socket poll")
> Fixes: ce6aea93f751 ("net: network drivers no longer need to implement ndo_busy_poll()") - 4.9.y
> Signed-off-by: Josh Elsasser <jelsasser@appneta.com>
> ---
>   net/core/dev.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 8898618bf341..1f50c131ed15 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5042,7 +5042,10 @@ bool sk_busy_loop(struct sock *sk, int nonblock)
>   		goto out;
>   
>   	/* Note: ndo_busy_poll method is optional in linux-4.5 */
> -	busy_poll = napi->dev->netdev_ops->ndo_busy_poll;
> +	if (napi->dev->netdev_ops)
> +		busy_poll = napi->dev->netdev_ops->ndo_busy_poll;
> +	else
> +		busy_poll = NULL;
>   
>   	do {
>   		rc = 0;
> 
We could instead setup a non NULL netdev_ops pointer on these 'dummy' 
devices to not add a check in fast path, but I presume we do
not really care since this fix is for old kernels, and considering how 
long it took to discover this bug.

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* RE: [PATCH v4 2/6] staging: fsl-dpaa2/ethsw: Add Freescale DPAA2 Ethernet Switch driver
From: Razvan Stefanescu @ 2018-03-13  5:51 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: gregkh@linuxfoundation.org, devel@driverdev.osuosl.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	Alexander Graf, arnd@arndb.de, Alexandru Marginean,
	Ruxandra Ioana Ciocoi Radulescu, Ioana Ciornei, Laurentiu Tudor,
	stuyoder@gmail.com
In-Reply-To: <20180312143654.GB21068@lunn.ch>



> -----Original Message-----
> From: Andrew Lunn [mailto:andrew@lunn.ch]
> Sent: Monday, March 12, 2018 4:37 PM
> To: Razvan Stefanescu <razvan.stefanescu@nxp.com>
> Cc: gregkh@linuxfoundation.org; devel@driverdev.osuosl.org; linux-
> kernel@vger.kernel.org; netdev@vger.kernel.org; Alexander Graf
> <agraf@suse.de>; arnd@arndb.de; Alexandru Marginean
> <alexandru.marginean@nxp.com>; Ruxandra Ioana Ciocoi Radulescu
> <ruxandra.radulescu@nxp.com>; Ioana Ciornei <ioana.ciornei@nxp.com>;
> Laurentiu Tudor <laurentiu.tudor@nxp.com>; stuyoder@gmail.com
> Subject: Re: [PATCH v4 2/6] staging: fsl-dpaa2/ethsw: Add Freescale DPAA2
> Ethernet Switch driver
> 
> > +static int port_netdevice_event(struct notifier_block *unused,
> > +				unsigned long event, void *ptr)
> > +{
> > +	struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
> > +	struct netdev_notifier_changeupper_info *info = ptr;
> > +	struct net_device *upper_dev;
> > +	int err = 0;
> > +
> > +	if (netdev->netdev_ops != &ethsw_port_ops)
> > +		return NOTIFY_DONE;
> > +
> > +	/* Handle just upper dev link/unlink for the moment */
> > +	if (event == NETDEV_CHANGEUPPER) {
> > +		upper_dev = info->upper_dev;
> > +		if (netif_is_bridge_master(upper_dev)) {
> > +			if (info->linking)
> > +				err = port_bridge_join(netdev);
> > +			else
> > +				err = port_bridge_leave(netdev);
> > +		}
> > +	}
> > +
> > +	return notifier_from_errno(err);
> > +}
> 
> I could be missing something here, but don't you need to pass to
> port_bridge_join() which bridge the port is joining. There can be
> multiple bridges, so you need to ensure the port joins the correct
> bridge.
> 
Thank you for noticing this. I'll add proper checks in next version.

	Razvan

> 	Andrew

^ permalink raw reply

* [PATCH net 1/2] net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off
From: Toshiaki Makita @ 2018-03-13  5:51 UTC (permalink / raw)
  To: David S. Miller; +Cc: Toshiaki Makita, netdev, Brandon Carpenter, Vlad Yasevich
In-Reply-To: <1520920288-2483-1-git-send-email-makita.toshiaki@lab.ntt.co.jp>

When we have a bridge with vlan_filtering on and a vlan device on top of
it, packets would be corrupted in skb_vlan_untag() called from
br_dev_xmit().

The problem sits in skb_reorder_vlan_header() used in skb_vlan_untag(),
which makes use of skb->mac_len. In this function mac_len is meant for
handling rx path with vlan devices with reorder_header disabled, but in
tx path mac_len is typically 0 and cannot be used, which is the problem
in this case.

The current code even does not properly handle rx path (skb_vlan_untag()
called from __netif_receive_skb_core()) with reorder_header off actually.

In rx path single tag case, it works as follows:

- Before skb_reorder_vlan_header()

 mac_header                                data
   v                                        v
   +-------------------+-------------+------+----
   |        ETH        |    VLAN     | ETH  |
   |       ADDRS       | TPID | TCI  | TYPE |
   +-------------------+-------------+------+----
   <-------- mac_len --------->
                       <------------->
                        to be removed

- After skb_reorder_vlan_header()

            mac_header                     data
                 v                          v
                 +-------------------+------+----
                 |        ETH        | ETH  |
                 |       ADDRS       | TYPE |
                 +-------------------+------+----
                 <-------- mac_len --------->

This is ok, but in rx double tag case, it corrupts packets:

- Before skb_reorder_vlan_header()

 mac_header                                              data
   v                                                      v
   +-------------------+-------------+-------------+------+----
   |        ETH        |    VLAN     |    VLAN     | ETH  |
   |       ADDRS       | TPID | TCI  | TPID | TCI  | TYPE |
   +-------------------+-------------+-------------+------+----
   <--------------- mac_len ---------------->
                                     <------------->
                                    should be removed
                       <--------------------------->
                         actually will be removed

- After skb_reorder_vlan_header()

            mac_header                                   data
                 v                                        v
                               +-------------------+------+----
                               |        ETH        | ETH  |
                               |       ADDRS       | TYPE |
                               +-------------------+------+----
                 <--------------- mac_len ---------------->

So, two of vlan tags are both removed while only inner one should be
removed and mac_header (and mac_len) is broken.

skb_vlan_untag() is meant for removing the vlan header at (skb->data - 2),
so use skb->data and skb->mac_header to calculate the right offset.

Reported-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Fixes: a6e18ff11170 ("vlan: Fix untag operations of stacked vlans with REORDER_HEADER off")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
---
 include/uapi/linux/if_ether.h | 1 +
 net/core/skbuff.c             | 7 +++++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h
index 8bbbcb5..820de5d 100644
--- a/include/uapi/linux/if_ether.h
+++ b/include/uapi/linux/if_ether.h
@@ -30,6 +30,7 @@
  */
 
 #define ETH_ALEN	6		/* Octets in one ethernet addr	 */
+#define ETH_TLEN	2		/* Octets in ethernet type field */
 #define ETH_HLEN	14		/* Total octets in header.	 */
 #define ETH_ZLEN	60		/* Min. octets in frame sans FCS */
 #define ETH_DATA_LEN	1500		/* Max. octets in payload	 */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index baf9905..b103f46 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5020,13 +5020,16 @@ bool skb_gso_validate_mac_len(const struct sk_buff *skb, unsigned int len)
 
 static struct sk_buff *skb_reorder_vlan_header(struct sk_buff *skb)
 {
+	int mac_len;
+
 	if (skb_cow(skb, skb_headroom(skb)) < 0) {
 		kfree_skb(skb);
 		return NULL;
 	}
 
-	memmove(skb->data - ETH_HLEN, skb->data - skb->mac_len - VLAN_HLEN,
-		2 * ETH_ALEN);
+	mac_len = skb->data - skb_mac_header(skb);
+	memmove(skb_mac_header(skb) + VLAN_HLEN, skb_mac_header(skb),
+		mac_len - VLAN_HLEN - ETH_TLEN);
 	skb->mac_header += VLAN_HLEN;
 	return skb;
 }
-- 
1.8.3.1

^ permalink raw reply related

* Re: BUG_ON triggered in skb_segment
From: Eric Dumazet @ 2018-03-13  6:04 UTC (permalink / raw)
  To: Yonghong Song, Daniel Borkmann, Eric Dumazet, Alexei Starovoitov,
	netdev, Martin Lau
In-Reply-To: <9265b93f-253d-6b8c-f2b8-4b54eff1835c@fb.com>



On 03/12/2018 10:45 PM, Yonghong Song wrote:
> Hi,
> 
> One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
> net-next function skb_segment, line 3667.
> 
> 3472 struct sk_buff *skb_segment(struct sk_buff *head_skb,
> 3473                             netdev_features_t features)
> 3474 {
> 3475         struct sk_buff *segs = NULL;
> 3476         struct sk_buff *tail = NULL;
> ...
> 3665                 while (pos < offset + len) {
> 3666                         if (i >= nfrags) {
> 3667                                 BUG_ON(skb_headlen(list_skb));
> 3668
> 3669                                 i = 0;
> 3670                                 nfrags = 
> skb_shinfo(list_skb)->nr_frags;
> 3671                                 frag = skb_shinfo(list_skb)->frags;
> 3672                                 frag_skb = list_skb;
> ...
> 
> call stack:
> ...
> #0 [ffff883ffef034f8] machine_kexec at ffffffff81044c41
>   #1 [ffff883ffef03558] __crash_kexec at ffffffff8110c525
>   #2 [ffff883ffef03620] crash_kexec at ffffffff8110d5cc
>   #3 [ffff883ffef03640] oops_end at ffffffff8101d7e7
>   #4 [ffff883ffef03668] die at ffffffff8101deb2
>   #5 [ffff883ffef03698] do_trap at ffffffff8101a700
>   #6 [ffff883ffef036e8] do_error_trap at ffffffff8101abfe
>   #7 [ffff883ffef037a0] do_invalid_op at ffffffff8101acd0
>   #8 [ffff883ffef037b0] invalid_op at ffffffff81a00bab
>      [exception RIP: skb_segment+3044]
>      RIP: ffffffff817e4dd4  RSP: ffff883ffef03860  RFLAGS: 00010216
>      RAX: 0000000000002bf6  RBX: ffff883feb7aaa00  RCX: 0000000000000011
>      RDX: ffff883fb87910c0  RSI: 0000000000000011  RDI: ffff883feb7ab500
>      RBP: ffff883ffef03928   R8: 0000000000002ce2   R9: 00000000000027da
>      R10: 000001ea00000000  R11: 0000000000002d82  R12: ffff883f90a1ee80
>      R13: ffff883fb8791120  R14: ffff883feb7abc00  R15: 0000000000002ce2
>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>   #9 [ffff883ffef03930] tcp_gso_segment at ffffffff818713e7
> #10 [ffff883ffef03990] tcp4_gso_segment at ffffffff818717d8
> #11 [ffff883ffef039b0] inet_gso_segment at ffffffff81882c9b
> #12 [ffff883ffef03a10] skb_mac_gso_segment at ffffffff817f39b8
> #13 [ffff883ffef03a38] __skb_gso_segment at ffffffff817f3ac9
> #14 [ffff883ffef03a68] validate_xmit_skb at ffffffff817f3eed
> #15 [ffff883ffef03aa8] validate_xmit_skb_list at ffffffff817f40a2
> #16 [ffff883ffef03ad8] sch_direct_xmit at ffffffff81824efb
> #17 [ffff883ffef03b20] __qdisc_run at ffffffff818251aa
> #18 [ffff883ffef03b90] __dev_queue_xmit at ffffffff817f45ed
> #19 [ffff883ffef03c08] dev_queue_xmit at ffffffff817f4b90
> #20 [ffff883ffef03c18] __bpf_redirect at ffffffff81812b66
> #21 [ffff883ffef03c40] skb_do_redirect at ffffffff81813209
> #22 [ffff883ffef03c60] __netif_receive_skb_core at ffffffff817f310d
> #23 [ffff883ffef03cc8] __netif_receive_skb at ffffffff817f32e8
> #24 [ffff883ffef03ce8] netif_receive_skb_internal at ffffffff817f5538
> #25 [ffff883ffef03d10] napi_gro_complete at ffffffff817f56c0
> #26 [ffff883ffef03d28] dev_gro_receive at ffffffff817f5ea6
> #27 [ffff883ffef03d78] napi_gro_receive at ffffffff817f6168
> #28 [ffff883ffef03da0] mlx5e_handle_rx_cqe_mpwrq at ffffffff817381c2
> #29 [ffff883ffef03e30] mlx5e_poll_rx_cq at ffffffff817386c2
> #30 [ffff883ffef03e80] mlx5e_napi_poll at ffffffff8173926e
> #31 [ffff883ffef03ed0] net_rx_action at ffffffff817f5a6e
> #32 [ffff883ffef03f48] __softirqentry_text_start at ffffffff81c000c3
> #33 [ffff883ffef03fa8] irq_exit at ffffffff8108f515
> #34 [ffff883ffef03fb8] do_IRQ at ffffffff81a01b11
> --- <IRQ stack> ---
> bt: cannot transition from IRQ stack to current process stack:
>          IRQ stack pointer: ffff883ffef034f8
>      process stack pointer: ffffffff81a01ae9
>         current stack base: ffffc9000c5c4000
> ...
> Setup:
> =====
> 
> The test will involve three machines:
>    M_ipv6 <-> M_nat <-> M_ipv4
> 
> The M_nat will do ipv4<->ipv6 address translation and then forward packet
> to proper destination. The control plane will configure M_nat properly
> will understand virtual ipv4 address for machine M_ipv6, and
> virtual ipv6 address for machine M_ipv4.
> 
> M_nat runs a bpf program, which is attached to clsact (ingress) qdisc.
> The program uses bpf_skb_change_proto to do protocol conversion.
> bpf_skb_change_proto will adjust skb header_len and len properly
> based on protocol change.
> After the conversion, the program will make proper change on
> ethhdr and ip4/6 header, recalculate checksum, and send the packet out
> through bpf_redirect.
> 
> Experiment:
> ===========
> 
> MTU: 1500B for all three machines.
> 
> The tso/lro/gro are enabled on the M_nat box.
> 
> ping works on both ways of M_ipv6 <-> M_ipv4.
> It works for transfering a small file (4KB) between M_ipv6 and M_ipv4 
> (both ways).
> Transfering a large file (e.g., 4MB) from M_ipv6 to M_ipv4, failed with 
> the above BUG_ON, really fast.
> Did not really test from M_ipv4 to M_ipv6 with large file.
> 
> The error path likely to be (also from the above call stack):
>    nic -> lro/gro -> bpf_program -> gso (BUG_ON)
> 
> In one of experiments, I explicitly printed the skb->len and 
> skb->data_len. The values are below:
>    skb_segment: len 2856, data_len 2686
> They should be equal to avoid BUG.
> 
> In another experiment, I got:
>    skb_segment: len 1428, data_len 1258
> 
> In both cases, the difference is 170 bytes. Not sure whether
> this is just a coincidence or not.
> 
> Workaround:
> ===========
> 
> A workaround to avoid BUG_ON is to disable lro/gro. This way,
> kernel will not receive big packets and hence gso is not really called.
> 
> I am not familiar with gso code. Does anybody hit this BUG_ON before?
> Any suggestion on how to debug this?
> 

skb_segment() works if incoming GRO packet is not modified in its geometry.

In your case it seems you had to adjust gso_size (calling 
skb_decrease_gso_size() or skb_increase_gso_size()), and this breaks 
skb_segment() badly, because geometry changes, unless you had specific 
MTU/MSS restrictions.

You will have to make skb_segment() more generic if you really want this.

^ permalink raw reply

* Re: BUG_ON triggered in skb_segment
From: Yonghong Song @ 2018-03-13  6:08 UTC (permalink / raw)
  To: Eric Dumazet, Daniel Borkmann, Alexei Starovoitov, netdev,
	Martin Lau
In-Reply-To: <875f59f2-d1ec-c47c-cdd7-2ce4985c5143@gmail.com>



On 3/12/18 11:04 PM, Eric Dumazet wrote:
> 
> 
> On 03/12/2018 10:45 PM, Yonghong Song wrote:
>> Hi,
>>
>> One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
>> net-next function skb_segment, line 3667.
>>
>> 3472 struct sk_buff *skb_segment(struct sk_buff *head_skb,
>> 3473                             netdev_features_t features)
>> 3474 {
>> 3475         struct sk_buff *segs = NULL;
>> 3476         struct sk_buff *tail = NULL;
>> ...
>> 3665                 while (pos < offset + len) {
>> 3666                         if (i >= nfrags) {
>> 3667                                 BUG_ON(skb_headlen(list_skb));
>> 3668
>> 3669                                 i = 0;
>> 3670                                 nfrags = 
>> skb_shinfo(list_skb)->nr_frags;
>> 3671                                 frag = skb_shinfo(list_skb)->frags;
>> 3672                                 frag_skb = list_skb;
>> ...
>>
>> call stack:
>> ...
>> #0 [ffff883ffef034f8] machine_kexec at ffffffff81044c41
>>   #1 [ffff883ffef03558] __crash_kexec at ffffffff8110c525
>>   #2 [ffff883ffef03620] crash_kexec at ffffffff8110d5cc
>>   #3 [ffff883ffef03640] oops_end at ffffffff8101d7e7
>>   #4 [ffff883ffef03668] die at ffffffff8101deb2
>>   #5 [ffff883ffef03698] do_trap at ffffffff8101a700
>>   #6 [ffff883ffef036e8] do_error_trap at ffffffff8101abfe
>>   #7 [ffff883ffef037a0] do_invalid_op at ffffffff8101acd0
>>   #8 [ffff883ffef037b0] invalid_op at ffffffff81a00bab
>>      [exception RIP: skb_segment+3044]
>>      RIP: ffffffff817e4dd4  RSP: ffff883ffef03860  RFLAGS: 00010216
>>      RAX: 0000000000002bf6  RBX: ffff883feb7aaa00  RCX: 0000000000000011
>>      RDX: ffff883fb87910c0  RSI: 0000000000000011  RDI: ffff883feb7ab500
>>      RBP: ffff883ffef03928   R8: 0000000000002ce2   R9: 00000000000027da
>>      R10: 000001ea00000000  R11: 0000000000002d82  R12: ffff883f90a1ee80
>>      R13: ffff883fb8791120  R14: ffff883feb7abc00  R15: 0000000000002ce2
>>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>   #9 [ffff883ffef03930] tcp_gso_segment at ffffffff818713e7
>> #10 [ffff883ffef03990] tcp4_gso_segment at ffffffff818717d8
>> #11 [ffff883ffef039b0] inet_gso_segment at ffffffff81882c9b
>> #12 [ffff883ffef03a10] skb_mac_gso_segment at ffffffff817f39b8
>> #13 [ffff883ffef03a38] __skb_gso_segment at ffffffff817f3ac9
>> #14 [ffff883ffef03a68] validate_xmit_skb at ffffffff817f3eed
>> #15 [ffff883ffef03aa8] validate_xmit_skb_list at ffffffff817f40a2
>> #16 [ffff883ffef03ad8] sch_direct_xmit at ffffffff81824efb
>> #17 [ffff883ffef03b20] __qdisc_run at ffffffff818251aa
>> #18 [ffff883ffef03b90] __dev_queue_xmit at ffffffff817f45ed
>> #19 [ffff883ffef03c08] dev_queue_xmit at ffffffff817f4b90
>> #20 [ffff883ffef03c18] __bpf_redirect at ffffffff81812b66
>> #21 [ffff883ffef03c40] skb_do_redirect at ffffffff81813209
>> #22 [ffff883ffef03c60] __netif_receive_skb_core at ffffffff817f310d
>> #23 [ffff883ffef03cc8] __netif_receive_skb at ffffffff817f32e8
>> #24 [ffff883ffef03ce8] netif_receive_skb_internal at ffffffff817f5538
>> #25 [ffff883ffef03d10] napi_gro_complete at ffffffff817f56c0
>> #26 [ffff883ffef03d28] dev_gro_receive at ffffffff817f5ea6
>> #27 [ffff883ffef03d78] napi_gro_receive at ffffffff817f6168
>> #28 [ffff883ffef03da0] mlx5e_handle_rx_cqe_mpwrq at ffffffff817381c2
>> #29 [ffff883ffef03e30] mlx5e_poll_rx_cq at ffffffff817386c2
>> #30 [ffff883ffef03e80] mlx5e_napi_poll at ffffffff8173926e
>> #31 [ffff883ffef03ed0] net_rx_action at ffffffff817f5a6e
>> #32 [ffff883ffef03f48] __softirqentry_text_start at ffffffff81c000c3
>> #33 [ffff883ffef03fa8] irq_exit at ffffffff8108f515
>> #34 [ffff883ffef03fb8] do_IRQ at ffffffff81a01b11
>> --- <IRQ stack> ---
>> bt: cannot transition from IRQ stack to current process stack:
>>          IRQ stack pointer: ffff883ffef034f8
>>      process stack pointer: ffffffff81a01ae9
>>         current stack base: ffffc9000c5c4000
>> ...
>> Setup:
>> =====
>>
>> The test will involve three machines:
>>    M_ipv6 <-> M_nat <-> M_ipv4
>>
>> The M_nat will do ipv4<->ipv6 address translation and then forward packet
>> to proper destination. The control plane will configure M_nat properly
>> will understand virtual ipv4 address for machine M_ipv6, and
>> virtual ipv6 address for machine M_ipv4.
>>
>> M_nat runs a bpf program, which is attached to clsact (ingress) qdisc.
>> The program uses bpf_skb_change_proto to do protocol conversion.
>> bpf_skb_change_proto will adjust skb header_len and len properly
>> based on protocol change.
>> After the conversion, the program will make proper change on
>> ethhdr and ip4/6 header, recalculate checksum, and send the packet out
>> through bpf_redirect.
>>
>> Experiment:
>> ===========
>>
>> MTU: 1500B for all three machines.
>>
>> The tso/lro/gro are enabled on the M_nat box.
>>
>> ping works on both ways of M_ipv6 <-> M_ipv4.
>> It works for transfering a small file (4KB) between M_ipv6 and M_ipv4 
>> (both ways).
>> Transfering a large file (e.g., 4MB) from M_ipv6 to M_ipv4, failed 
>> with the above BUG_ON, really fast.
>> Did not really test from M_ipv4 to M_ipv6 with large file.
>>
>> The error path likely to be (also from the above call stack):
>>    nic -> lro/gro -> bpf_program -> gso (BUG_ON)
>>
>> In one of experiments, I explicitly printed the skb->len and 
>> skb->data_len. The values are below:
>>    skb_segment: len 2856, data_len 2686
>> They should be equal to avoid BUG.
>>
>> In another experiment, I got:
>>    skb_segment: len 1428, data_len 1258
>>
>> In both cases, the difference is 170 bytes. Not sure whether
>> this is just a coincidence or not.
>>
>> Workaround:
>> ===========
>>
>> A workaround to avoid BUG_ON is to disable lro/gro. This way,
>> kernel will not receive big packets and hence gso is not really called.
>>
>> I am not familiar with gso code. Does anybody hit this BUG_ON before?
>> Any suggestion on how to debug this?
>>
> 
> skb_segment() works if incoming GRO packet is not modified in its geometry.
> 
> In your case it seems you had to adjust gso_size (calling 
> skb_decrease_gso_size() or skb_increase_gso_size()), and this breaks 
> skb_segment() badly, because geometry changes, unless you had specific 
> MTU/MSS restrictions.
> 
> You will have to make skb_segment() more generic if you really want this.

In net/core/filter.c function bpf_skb_change_proto, which is called
in the bpf program, does some GSO adjustment. Could you help check
whether it satisfies my above use case or not? Thanks!

^ permalink raw reply

* linux-next: build warning after merge of the net-next tree
From: Stephen Rothwell @ 2018-03-13  6:11 UTC (permalink / raw)
  To: David Miller, Networking
  Cc: Linux-Next Mailing List, Linux Kernel Mailing List,
	Gustavo A. R. Silva

[-- Attachment #1: Type: text/plain, Size: 392 bytes --]

Hi all,

After merging the net-next tree, today's linux-next build (sparc
defconfig) produced this warning:

net/core/pktgen.c: In function 'pktgen_if_write':
net/core/pktgen.c:1710:1: warning: the frame size of 1048 bytes is larger than 1024 bytes [-Wframe-larger-than=]
 }
 ^

Introduced by commit

  35951393bbff ("pktgen: Remove VLA usage")

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: BUG_ON triggered in skb_segment
From: Yunsheng Lin @ 2018-03-13  6:18 UTC (permalink / raw)
  To: Yonghong Song, Daniel Borkmann, Eric Dumazet, Alexei Starovoitov,
	netdev, Martin Lau
In-Reply-To: <9265b93f-253d-6b8c-f2b8-4b54eff1835c@fb.com>

Hi, Song

On 2018/3/13 13:45, Yonghong Song wrote:
> Hi,
> 
> One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
> net-next function skb_segment, line 3667.
> 
> 3472 struct sk_buff *skb_segment(struct sk_buff *head_skb,
> 3473                             netdev_features_t features)
> 3474 {
> 3475         struct sk_buff *segs = NULL;
> 3476         struct sk_buff *tail = NULL;
> ...
> 3665                 while (pos < offset + len) {
> 3666                         if (i >= nfrags) {
> 3667                                 BUG_ON(skb_headlen(list_skb));
> 3668
> 3669                                 i = 0;
> 3670                                 nfrags = skb_shinfo(list_skb)->nr_frags;
> 3671                                 frag = skb_shinfo(list_skb)->frags;
> 3672                                 frag_skb = list_skb;
> ...
> 
> call stack:
> ...
> #0 [ffff883ffef034f8] machine_kexec at ffffffff81044c41
>  #1 [ffff883ffef03558] __crash_kexec at ffffffff8110c525
>  #2 [ffff883ffef03620] crash_kexec at ffffffff8110d5cc
>  #3 [ffff883ffef03640] oops_end at ffffffff8101d7e7
>  #4 [ffff883ffef03668] die at ffffffff8101deb2
>  #5 [ffff883ffef03698] do_trap at ffffffff8101a700
>  #6 [ffff883ffef036e8] do_error_trap at ffffffff8101abfe
>  #7 [ffff883ffef037a0] do_invalid_op at ffffffff8101acd0
>  #8 [ffff883ffef037b0] invalid_op at ffffffff81a00bab
>     [exception RIP: skb_segment+3044]
>     RIP: ffffffff817e4dd4  RSP: ffff883ffef03860  RFLAGS: 00010216
>     RAX: 0000000000002bf6  RBX: ffff883feb7aaa00  RCX: 0000000000000011
>     RDX: ffff883fb87910c0  RSI: 0000000000000011  RDI: ffff883feb7ab500
>     RBP: ffff883ffef03928   R8: 0000000000002ce2   R9: 00000000000027da
>     R10: 000001ea00000000  R11: 0000000000002d82  R12: ffff883f90a1ee80
>     R13: ffff883fb8791120  R14: ffff883feb7abc00  R15: 0000000000002ce2
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #9 [ffff883ffef03930] tcp_gso_segment at ffffffff818713e7
> #10 [ffff883ffef03990] tcp4_gso_segment at ffffffff818717d8
> #11 [ffff883ffef039b0] inet_gso_segment at ffffffff81882c9b
> #12 [ffff883ffef03a10] skb_mac_gso_segment at ffffffff817f39b8
> #13 [ffff883ffef03a38] __skb_gso_segment at ffffffff817f3ac9
> #14 [ffff883ffef03a68] validate_xmit_skb at ffffffff817f3eed
> #15 [ffff883ffef03aa8] validate_xmit_skb_list at ffffffff817f40a2
> #16 [ffff883ffef03ad8] sch_direct_xmit at ffffffff81824efb
> #17 [ffff883ffef03b20] __qdisc_run at ffffffff818251aa
> #18 [ffff883ffef03b90] __dev_queue_xmit at ffffffff817f45ed
> #19 [ffff883ffef03c08] dev_queue_xmit at ffffffff817f4b90
> #20 [ffff883ffef03c18] __bpf_redirect at ffffffff81812b66
> #21 [ffff883ffef03c40] skb_do_redirect at ffffffff81813209
> #22 [ffff883ffef03c60] __netif_receive_skb_core at ffffffff817f310d
> #23 [ffff883ffef03cc8] __netif_receive_skb at ffffffff817f32e8
> #24 [ffff883ffef03ce8] netif_receive_skb_internal at ffffffff817f5538
> #25 [ffff883ffef03d10] napi_gro_complete at ffffffff817f56c0
> #26 [ffff883ffef03d28] dev_gro_receive at ffffffff817f5ea6
> #27 [ffff883ffef03d78] napi_gro_receive at ffffffff817f6168
> #28 [ffff883ffef03da0] mlx5e_handle_rx_cqe_mpwrq at ffffffff817381c2
> #29 [ffff883ffef03e30] mlx5e_poll_rx_cq at ffffffff817386c2
> #30 [ffff883ffef03e80] mlx5e_napi_poll at ffffffff8173926e
> #31 [ffff883ffef03ed0] net_rx_action at ffffffff817f5a6e
> #32 [ffff883ffef03f48] __softirqentry_text_start at ffffffff81c000c3
> #33 [ffff883ffef03fa8] irq_exit at ffffffff8108f515
> #34 [ffff883ffef03fb8] do_IRQ at ffffffff81a01b11
> --- <IRQ stack> ---
> bt: cannot transition from IRQ stack to current process stack:
>         IRQ stack pointer: ffff883ffef034f8
>     process stack pointer: ffffffff81a01ae9
>        current stack base: ffffc9000c5c4000
> ...
> Setup:
> =====
> 
> The test will involve three machines:
>   M_ipv6 <-> M_nat <-> M_ipv4
> 
> The M_nat will do ipv4<->ipv6 address translation and then forward packet
> to proper destination. The control plane will configure M_nat properly
> will understand virtual ipv4 address for machine M_ipv6, and
> virtual ipv6 address for machine M_ipv4.
> 
> M_nat runs a bpf program, which is attached to clsact (ingress) qdisc.
> The program uses bpf_skb_change_proto to do protocol conversion.
> bpf_skb_change_proto will adjust skb header_len and len properly
> based on protocol change.
> After the conversion, the program will make proper change on
> ethhdr and ip4/6 header, recalculate checksum, and send the packet out
> through bpf_redirect.
> 
> Experiment:
> ===========
> 
> MTU: 1500B for all three machines.
> 
> The tso/lro/gro are enabled on the M_nat box.
> 
> ping works on both ways of M_ipv6 <-> M_ipv4.
> It works for transfering a small file (4KB) between M_ipv6 and M_ipv4 (both ways).
> Transfering a large file (e.g., 4MB) from M_ipv6 to M_ipv4, failed with the above BUG_ON, really fast.
> Did not really test from M_ipv4 to M_ipv6 with large file.
> 
> The error path likely to be (also from the above call stack):
>   nic -> lro/gro -> bpf_program -> gso (BUG_ON)
> 
> In one of experiments, I explicitly printed the skb->len and skb->data_len. The values are below:
>   skb_segment: len 2856, data_len 2686
> They should be equal to avoid BUG.
> 
> In another experiment, I got:
>   skb_segment: len 1428, data_len 1258
> 
> In both cases, the difference is 170 bytes. Not sure whether
> this is just a coincidence or not.
> 
> Workaround:
> ===========
> 
> A workaround to avoid BUG_ON is to disable lro/gro. This way,
> kernel will not receive big packets and hence gso is not really called.
> 
> I am not familiar with gso code. Does anybody hit this BUG_ON before?
> Any suggestion on how to debug this?

When the bpf_program do ipv4<->ipv6 address translation , shinfo->gso_type
maybe need to be changed, for example, SKB_GSO_TCPV4 -> SKB_GSO_TCPV6.
I am not sure if there are other field related to gro need to be changed,
you may want to debug it.

You may call call skb_mac_gso_segment with the packet after address
translation to debug this problem.

Hope this will help.

> 
> Thanks!
> 
> Yonghong
> 
> 

^ permalink raw reply

* Re: Problem with bridge (mcast-to-ucast + hairpin) and Broadcom's 802.11f in their FullMAC fw
From: Rafał Miłecki @ 2018-03-13  6:23 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Linus Lüssing, Felix Fietkau, Arend van Spriel, Franky Lin,
	Hante Meuleman, Chi-Hsien Lin, Wright Feng, Pieter-Paul Giesberts,
	Network Development,
	bridge-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	brcm80211-dev-list.pdl-dY08KVG/lbpWk0Htik3J/w,
	brcm80211-dev-list-+wT8y+m8/X5BDgjK7y7TUQ
In-Reply-To: <20180312160103.1a043936@xeon-e3>

On 03/13/2018 12:01 AM, Stephen Hemminger wrote:
> On Mon, 12 Mar 2018 23:42:48 +0100
> Rafał Miłecki <zajec5-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
>> 2) Blame bridge + mcast-to-ucast + hairpin for 802.11f incompatibility
>>
>> If we agree that 802.11f support in FullMAC firmware is acceptable, then
>> we have to make sure Linux's bridge doesn't break it by passing 802.11f
>> (broadcast) frames back to the source interface. That would require a
>> check like in below diff + proper code for handling such packets. I'm
>> afraid I'm not familiar with bridge code enough to complete that.
>>
>> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
>> index edae702..9e5d6ea 100644
>> --- a/net/bridge/br_input.c
>> +++ b/net/bridge/br_input.c
>> @@ -126,6 +126,27 @@ static void br_do_proxy_arp(struct sk_buff *skb, struct net_bridge *br,
>>   	}
>>   }
>>
>> +static bool br_skb_is_iapp_add_packet(struct sk_buff *skb)
>> +{
>> +	const u8 iapp_add_packet[6] __aligned(2) = {
>> +		0x00, 0x01, 0xaf, 0x81, 0x01, 0x00,
>> +	};
>> +#if !defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
>> +	const u16 *a = (const u16 *)skb->data;
>> +	const u16 *b = (const u16 *)iapp_add_packet;
>> +#endif
>> +
>> +	if (skb->len != 6)
>> +		return false;
>> +
>> +#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
>> +	return !(((*(const u32 *)skb->data) ^ (*(const u32 *)iapp_add_packet)) |
>> +	         ((*(const u16 *)(skb->data + 4)) ^ (*(const u16 *)(iapp_add_packet + 4))));
>> +#else
>> +	return !((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2]));
>> +#endif
>> +}
>> +
>>   /* note: already called with rcu_read_lock */
>>   int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
>>   {
>> @@ -155,6 +176,8 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
>>   	if (is_multicast_ether_addr(dest)) {
>>   		/* by definition the broadcast is also a multicast address */
>>   		if (is_broadcast_ether_addr(dest)) {
>> +			if (br_skb_is_iapp_add_packet(skb))
>> +				pr_warn("This packet should not be passed back to the source interface!\n");
>>   			pkt_type = BR_PKT_BROADCAST;
>>   			local_rcv = true;
>>   		} else {
>
>
> Don't like bridge doing special case code for magic received values directly in input path.
> Really needs to be generic which is why I suggested ebtables.

We need in-bridge solution only if we decide to support FullMAC
firmwares with 802.11f implementation.

In that case is this possible to use ebtables as a workaround at all?
Can I really use ebtables to set switch to don't pass 802.11f ADD frames
back to the original interface?

^ permalink raw reply

* Re: BUG_ON triggered in skb_segment
From: Eric Dumazet @ 2018-03-13  6:25 UTC (permalink / raw)
  To: Yonghong Song, Eric Dumazet, Daniel Borkmann, Alexei Starovoitov,
	netdev, Martin Lau
In-Reply-To: <d8d5a8c3-4004-47f2-9cfb-d94d0cd0d56c@fb.com>



On 03/12/2018 11:08 PM, Yonghong Song wrote:
> 
> 
> On 3/12/18 11:04 PM, Eric Dumazet wrote:
>>
>>
>> On 03/12/2018 10:45 PM, Yonghong Song wrote:
>>> Hi,
>>>
>>> One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
>>> net-next function skb_segment, line 3667.
>>>
>>> 3472 struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>> 3473                             netdev_features_t features)
>>> 3474 {
>>> 3475         struct sk_buff *segs = NULL;
>>> 3476         struct sk_buff *tail = NULL;
>>> ...
>>> 3665                 while (pos < offset + len) {
>>> 3666                         if (i >= nfrags) {
>>> 3667                                 BUG_ON(skb_headlen(list_skb));
>>> 3668
>>> 3669                                 i = 0;
>>> 3670                                 nfrags = 
>>> skb_shinfo(list_skb)->nr_frags;
>>> 3671                                 frag = skb_shinfo(list_skb)->frags;
>>> 3672                                 frag_skb = list_skb;
>>> ...
>>>
>>> call stack:
>>> ...
>>> #0 [ffff883ffef034f8] machine_kexec at ffffffff81044c41
>>>   #1 [ffff883ffef03558] __crash_kexec at ffffffff8110c525
>>>   #2 [ffff883ffef03620] crash_kexec at ffffffff8110d5cc
>>>   #3 [ffff883ffef03640] oops_end at ffffffff8101d7e7
>>>   #4 [ffff883ffef03668] die at ffffffff8101deb2
>>>   #5 [ffff883ffef03698] do_trap at ffffffff8101a700
>>>   #6 [ffff883ffef036e8] do_error_trap at ffffffff8101abfe
>>>   #7 [ffff883ffef037a0] do_invalid_op at ffffffff8101acd0
>>>   #8 [ffff883ffef037b0] invalid_op at ffffffff81a00bab
>>>      [exception RIP: skb_segment+3044]
>>>      RIP: ffffffff817e4dd4  RSP: ffff883ffef03860  RFLAGS: 00010216
>>>      RAX: 0000000000002bf6  RBX: ffff883feb7aaa00  RCX: 0000000000000011
>>>      RDX: ffff883fb87910c0  RSI: 0000000000000011  RDI: ffff883feb7ab500
>>>      RBP: ffff883ffef03928   R8: 0000000000002ce2   R9: 00000000000027da
>>>      R10: 000001ea00000000  R11: 0000000000002d82  R12: ffff883f90a1ee80
>>>      R13: ffff883fb8791120  R14: ffff883feb7abc00  R15: 0000000000002ce2
>>>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>>   #9 [ffff883ffef03930] tcp_gso_segment at ffffffff818713e7
>>> #10 [ffff883ffef03990] tcp4_gso_segment at ffffffff818717d8
>>> #11 [ffff883ffef039b0] inet_gso_segment at ffffffff81882c9b
>>> #12 [ffff883ffef03a10] skb_mac_gso_segment at ffffffff817f39b8
>>> #13 [ffff883ffef03a38] __skb_gso_segment at ffffffff817f3ac9
>>> #14 [ffff883ffef03a68] validate_xmit_skb at ffffffff817f3eed
>>> #15 [ffff883ffef03aa8] validate_xmit_skb_list at ffffffff817f40a2
>>> #16 [ffff883ffef03ad8] sch_direct_xmit at ffffffff81824efb
>>> #17 [ffff883ffef03b20] __qdisc_run at ffffffff818251aa
>>> #18 [ffff883ffef03b90] __dev_queue_xmit at ffffffff817f45ed
>>> #19 [ffff883ffef03c08] dev_queue_xmit at ffffffff817f4b90
>>> #20 [ffff883ffef03c18] __bpf_redirect at ffffffff81812b66
>>> #21 [ffff883ffef03c40] skb_do_redirect at ffffffff81813209
>>> #22 [ffff883ffef03c60] __netif_receive_skb_core at ffffffff817f310d
>>> #23 [ffff883ffef03cc8] __netif_receive_skb at ffffffff817f32e8
>>> #24 [ffff883ffef03ce8] netif_receive_skb_internal at ffffffff817f5538
>>> #25 [ffff883ffef03d10] napi_gro_complete at ffffffff817f56c0
>>> #26 [ffff883ffef03d28] dev_gro_receive at ffffffff817f5ea6
>>> #27 [ffff883ffef03d78] napi_gro_receive at ffffffff817f6168
>>> #28 [ffff883ffef03da0] mlx5e_handle_rx_cqe_mpwrq at ffffffff817381c2
>>> #29 [ffff883ffef03e30] mlx5e_poll_rx_cq at ffffffff817386c2
>>> #30 [ffff883ffef03e80] mlx5e_napi_poll at ffffffff8173926e
>>> #31 [ffff883ffef03ed0] net_rx_action at ffffffff817f5a6e
>>> #32 [ffff883ffef03f48] __softirqentry_text_start at ffffffff81c000c3
>>> #33 [ffff883ffef03fa8] irq_exit at ffffffff8108f515
>>> #34 [ffff883ffef03fb8] do_IRQ at ffffffff81a01b11
>>> --- <IRQ stack> ---
>>> bt: cannot transition from IRQ stack to current process stack:
>>>          IRQ stack pointer: ffff883ffef034f8
>>>      process stack pointer: ffffffff81a01ae9
>>>         current stack base: ffffc9000c5c4000
>>> ...
>>> Setup:
>>> =====
>>>
>>> The test will involve three machines:
>>>    M_ipv6 <-> M_nat <-> M_ipv4
>>>
>>> The M_nat will do ipv4<->ipv6 address translation and then forward 
>>> packet
>>> to proper destination. The control plane will configure M_nat properly
>>> will understand virtual ipv4 address for machine M_ipv6, and
>>> virtual ipv6 address for machine M_ipv4.
>>>
>>> M_nat runs a bpf program, which is attached to clsact (ingress) qdisc.
>>> The program uses bpf_skb_change_proto to do protocol conversion.
>>> bpf_skb_change_proto will adjust skb header_len and len properly
>>> based on protocol change.
>>> After the conversion, the program will make proper change on
>>> ethhdr and ip4/6 header, recalculate checksum, and send the packet out
>>> through bpf_redirect.
>>>
>>> Experiment:
>>> ===========
>>>
>>> MTU: 1500B for all three machines.
>>>
>>> The tso/lro/gro are enabled on the M_nat box.
>>>
>>> ping works on both ways of M_ipv6 <-> M_ipv4.
>>> It works for transfering a small file (4KB) between M_ipv6 and M_ipv4 
>>> (both ways).
>>> Transfering a large file (e.g., 4MB) from M_ipv6 to M_ipv4, failed 
>>> with the above BUG_ON, really fast.
>>> Did not really test from M_ipv4 to M_ipv6 with large file.
>>>
>>> The error path likely to be (also from the above call stack):
>>>    nic -> lro/gro -> bpf_program -> gso (BUG_ON)
>>>
>>> In one of experiments, I explicitly printed the skb->len and 
>>> skb->data_len. The values are below:
>>>    skb_segment: len 2856, data_len 2686
>>> They should be equal to avoid BUG.
>>>
>>> In another experiment, I got:
>>>    skb_segment: len 1428, data_len 1258
>>>
>>> In both cases, the difference is 170 bytes. Not sure whether
>>> this is just a coincidence or not.
>>>
>>> Workaround:
>>> ===========
>>>
>>> A workaround to avoid BUG_ON is to disable lro/gro. This way,
>>> kernel will not receive big packets and hence gso is not really called.
>>>
>>> I am not familiar with gso code. Does anybody hit this BUG_ON before?
>>> Any suggestion on how to debug this?
>>>
>>
>> skb_segment() works if incoming GRO packet is not modified in its 
>> geometry.
>>
>> In your case it seems you had to adjust gso_size (calling 
>> skb_decrease_gso_size() or skb_increase_gso_size()), and this breaks 
>> skb_segment() badly, because geometry changes, unless you had specific 
>> MTU/MSS restrictions.
>>
>> You will have to make skb_segment() more generic if you really want this.
> 
> In net/core/filter.c function bpf_skb_change_proto, which is called
> in the bpf program, does some GSO adjustment. Could you help check
> whether it satisfies my above use case or not? Thanks!

As I said this  helper ends up modifying gso_size by +/- 20 
(sizeof(ipv6 header) - sizeof(ipv4 header))

So it wont work if skb_segment() is called after this change.

Not clear why the GRO packet is not sent as is (as a TSO packet) since 
mlx4/mlx5 NICs certainly support TSO.

^ permalink raw reply

* [PATCH net 0/2] Fix vlan untag and insertion for bridge and vlan with reorder_hdr off
From: Toshiaki Makita @ 2018-03-13  5:51 UTC (permalink / raw)
  To: David S. Miller; +Cc: Toshiaki Makita, netdev, Brandon Carpenter, Vlad Yasevich

As Brandon Carpenter reported[1], sending non-vlan-offloaded packets from
bridge devices ends up with corrupted packets. He narrowed down this problem
and found that the root cause is in skb_reorder_vlan_header().

While I was working on fixing this problem, I found that the function does
not work properly for double tagged packets with reorder_hdr off as well.

Patch 1 fixes these 2 problems in skb_reorder_vlan_header().

And it turned out that fixing skb_reorder_vlan_header() is not sufficient
to receive double tagged packets with reorder_hdr off while I was testing the
fix. Vlan tags got out of order when vlan devices with reorder_hdr disabled
were stacked. Patch 2 fixes this problem.

[1] https://www.spinics.net/lists/linux-ethernet-bridging/msg07039.html

Toshiaki Makita (2):
  net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off
  vlan: Fix out of order vlan headers with reorder header off

 include/linux/if_vlan.h       | 66 +++++++++++++++++++++++++++++++++++--------
 include/uapi/linux/if_ether.h |  1 +
 net/8021q/vlan_core.c         |  4 +--
 net/core/skbuff.c             |  7 +++--
 4 files changed, 63 insertions(+), 15 deletions(-)

-- 
1.8.3.1

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox