Netdev List


* Re: [PATCH bpf-next 2/3] libbpf: add error reporting in XDP
From: Alexei Starovoitov @ 2017-12-27  2:27 UTC (permalink / raw)
  To: Eric Leblond; +Cc: netdev, daniel, linux-kernel, ast
In-Reply-To: <20171225221325.9680-3-eric@regit.org>

On Mon, Dec 25, 2017 at 11:13:24PM +0100, Eric Leblond wrote:
> Parse netlink ext attribute to get the error message returned by
> the card.
> 
> Signed-off-by: Eric Leblond <eric@regit.org>
...
> diff --git a/tools/lib/bpf/nlattr.c b/tools/lib/bpf/nlattr.c
> new file mode 100644
> index 000000000000..962de14f74e3
> --- /dev/null
> +++ b/tools/lib/bpf/nlattr.c
> @@ -0,0 +1,188 @@
> +
> +/*
> + * NETLINK      Netlink attributes
> + *
> + *		Authors:	Thomas Graf <tgraf@suug.ch>
> + *				Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> + */
...
> diff --git a/tools/lib/bpf/nlattr.h b/tools/lib/bpf/nlattr.h
> new file mode 100644
> index 000000000000..b95f3e64c14d
> --- /dev/null
> +++ b/tools/lib/bpf/nlattr.h
> @@ -0,0 +1,164 @@
> +#ifndef __NLATTR_H
> +#define __NLATTR_H

Every file in kernel repo has to have SPDX license identifier.
Also note that tools/lib/bpf is LGPL whereas _if_ you're copying
these functions from kernel lib/nlattr.c then it's GPL which we cannot mix.
Probably easier to copy from libnl instead which is LGPL.
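
As an illustration of the SPDX point, here is a minimal sketch of how the
tag could look at the top of the new header. Kernel convention puts the tag
on the first line, in a C-style comment for headers and a `//` comment for
.c files. The license expression shown is an assumption based on the remark
that tools/lib/bpf is LGPL; the exact identifier is the maintainers' call.

```c
/* SPDX-License-Identifier: LGPL-2.1 */
#ifndef __NLATTR_H
#define __NLATTR_H

/* ... declarations copied from an LGPL-compatible source ... */

#endif /* __NLATTR_H */
```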

^ permalink raw reply

* [PATCH net-next 00/10] net: qualcomm: rmnet: Enable csum offloads
From: Subash Abhinov Kasiviswanathan @ 2017-12-27  2:27 UTC (permalink / raw)
  To: davem, netdev; +Cc: Subash Abhinov Kasiviswanathan

This series introduces the MAPv4 packet format for checksum
offload plus some other minor changes.

Patches 1-3 are cleanups.

Patch 4 renames the ingress format to data format so that all data
formats can be configured using this going forward.

Patch 5 uses the pacing helper to improve TCP transmit performance.

Patches 6-9 define the MAPv4 format for checksum offload for RX and TX.
A new header and trailer format are used as part of MAPv4.
For RX checksum offload, hardware computes only the 1's complement
checksum over the IP payload portion. The metadata in the RX trailer
is used to verify the checksum field in the packet. Note that
hardware does not modify the IP packet or any of its fields.
For TX, the required metadata is filled in so that hardware can
compute the checksum.

Patch 10 enables GSO on rmnet devices.

Subash Abhinov Kasiviswanathan (10):
  net: qualcomm: rmnet: Remove redundant check when stamping map header
  net: qualcomm: rmnet: Remove invalid condition while stamping mux id
  net: qualcomm: rmnet: Remove unused function declaration
  net: qualcomm: rmnet: Rename ingress data format to data format
  net: qualcomm: rmnet: Set pacing rate
  net: qualcomm: rmnet: Define the MAPv4 packet formats
  net: qualcomm: rmnet: Add support for RX checksum offload
  net: qualcomm: rmnet: Handle command packets with checksum trailer
  net: qualcomm: rmnet: Add support for TX checksum offload
  net: qualcomm: rmnet: Add support for GSO

 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c |  10 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h |   2 +-
 .../net/ethernet/qualcomm/rmnet/rmnet_handlers.c   |  36 ++-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h    |  23 +-
 .../ethernet/qualcomm/rmnet/rmnet_map_command.c    |  17 +-
 .../net/ethernet/qualcomm/rmnet/rmnet_map_data.c   | 298 ++++++++++++++++++++-
 .../net/ethernet/qualcomm/rmnet/rmnet_private.h    |   2 +
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c    |   4 +
 8 files changed, 367 insertions(+), 25 deletions(-)

-- 
1.9.1

^ permalink raw reply

* [PATCH net-next 01/10] net: qualcomm: rmnet: Remove redundant check when stamping map header
From: Subash Abhinov Kasiviswanathan @ 2017-12-27  2:27 UTC (permalink / raw)
  To: davem, netdev; +Cc: Subash Abhinov Kasiviswanathan
In-Reply-To: <1514341685-11262-1-git-send-email-subashab@codeaurora.org>

We already check the headroom once in rmnet_map_egress_handler(),
so this is not needed.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
index 86b8c75..978ce26 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
@@ -32,9 +32,6 @@ struct rmnet_map_header *rmnet_map_add_map_header(struct sk_buff *skb,
 	u32 padding, map_datalen;
 	u8 *padbytes;
 
-	if (skb_headroom(skb) < sizeof(struct rmnet_map_header))
-		return NULL;
-
 	map_datalen = skb->len - hdrlen;
 	map_header = (struct rmnet_map_header *)
 			skb_push(skb, sizeof(struct rmnet_map_header));
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 02/10] net: qualcomm: rmnet: Remove invalid condition while stamping mux id
From: Subash Abhinov Kasiviswanathan @ 2017-12-27  2:27 UTC (permalink / raw)
  To: davem, netdev; +Cc: Subash Abhinov Kasiviswanathan
In-Reply-To: <1514341685-11262-1-git-send-email-subashab@codeaurora.org>

rmnet devices cannot have a mux id of 255. This is validated when
assigning the mux id to the rmnet devices. As a result, checking for
mux id 255 does not apply in egress path.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
index 0553932..b2d317e3 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
@@ -143,10 +143,7 @@ static int rmnet_map_egress_handler(struct sk_buff *skb,
 	if (!map_header)
 		goto fail;
 
-	if (mux_id == 0xff)
-		map_header->mux_id = 0;
-	else
-		map_header->mux_id = mux_id;
+	map_header->mux_id = mux_id;
 
 	skb->protocol = htons(ETH_P_MAP);
 
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 03/10] net: qualcomm: rmnet: Remove unused function declaration
From: Subash Abhinov Kasiviswanathan @ 2017-12-27  2:27 UTC (permalink / raw)
  To: davem, netdev; +Cc: Subash Abhinov Kasiviswanathan
In-Reply-To: <1514341685-11262-1-git-send-email-subashab@codeaurora.org>

rmnet_map_demultiplex() is only declared but not defined anywhere,
so remove it.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
index 4df359d..ef0eff2 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
@@ -67,7 +67,6 @@ struct rmnet_map_header {
 #define RMNET_MAP_NO_PAD_BYTES        0
 #define RMNET_MAP_ADD_PAD_BYTES       1
 
-u8 rmnet_map_demultiplex(struct sk_buff *skb);
 struct sk_buff *rmnet_map_deaggregate(struct sk_buff *skb);
 struct rmnet_map_header *rmnet_map_add_map_header(struct sk_buff *skb,
 						  int hdrlen, int pad);
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 04/10] net: qualcomm: rmnet: Rename ingress data format to data format
From: Subash Abhinov Kasiviswanathan @ 2017-12-27  2:27 UTC (permalink / raw)
  To: davem, netdev; +Cc: Subash Abhinov Kasiviswanathan
In-Reply-To: <1514341685-11262-1-git-send-email-subashab@codeaurora.org>

This is done so that we can use this field for both ingress and
egress flags.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c   | 10 +++++-----
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h   |  2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c |  5 ++---
 3 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
index cedacdd..7e7704d 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
@@ -143,7 +143,7 @@ static int rmnet_newlink(struct net *src_net, struct net_device *dev,
 			 struct nlattr *tb[], struct nlattr *data[],
 			 struct netlink_ext_ack *extack)
 {
-	int ingress_format = RMNET_INGRESS_FORMAT_DEAGGREGATION;
+	u32 data_format = RMNET_INGRESS_FORMAT_DEAGGREGATION;
 	struct net_device *real_dev;
 	int mode = RMNET_EPMODE_VND;
 	struct rmnet_endpoint *ep;
@@ -185,11 +185,11 @@ static int rmnet_newlink(struct net *src_net, struct net_device *dev,
 		struct ifla_vlan_flags *flags;
 
 		flags = nla_data(data[IFLA_VLAN_FLAGS]);
-		ingress_format = flags->flags & flags->mask;
+		data_format = flags->flags & flags->mask;
 	}
 
-	netdev_dbg(dev, "data format [ingress 0x%08X]\n", ingress_format);
-	port->ingress_data_format = ingress_format;
+	netdev_dbg(dev, "data format [0x%08X]\n", data_format);
+	port->data_format = data_format;
 
 	return 0;
 
@@ -353,7 +353,7 @@ static int rmnet_changelink(struct net_device *dev, struct nlattr *tb[],
 		struct ifla_vlan_flags *flags;
 
 		flags = nla_data(data[IFLA_VLAN_FLAGS]);
-		port->ingress_data_format = flags->flags & flags->mask;
+		port->data_format = flags->flags & flags->mask;
 	}
 
 	return 0;
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
index 2ea9fe3..00e4634 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
@@ -32,7 +32,7 @@ struct rmnet_endpoint {
  */
 struct rmnet_port {
 	struct net_device *dev;
-	u32 ingress_data_format;
+	u32 data_format;
 	u8 nr_rmnet_devs;
 	u8 rmnet_mode;
 	struct hlist_head muxed_ep[RMNET_MAX_LOGICAL_EP];
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
index b2d317e3..8e1f43a 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
@@ -69,8 +69,7 @@ static void rmnet_set_skb_proto(struct sk_buff *skb)
 	u16 len;
 
 	if (RMNET_MAP_GET_CD_BIT(skb)) {
-		if (port->ingress_data_format
-		    & RMNET_INGRESS_FORMAT_MAP_COMMANDS)
+		if (port->data_format & RMNET_INGRESS_FORMAT_MAP_COMMANDS)
 			return rmnet_map_command(skb, port);
 
 		goto free_skb;
@@ -114,7 +113,7 @@ static void rmnet_set_skb_proto(struct sk_buff *skb)
 		skb_push(skb, ETH_HLEN);
 	}
 
-	if (port->ingress_data_format & RMNET_INGRESS_FORMAT_DEAGGREGATION) {
+	if (port->data_format & RMNET_INGRESS_FORMAT_DEAGGREGATION) {
 		while ((skbn = rmnet_map_deaggregate(skb)) != NULL)
 			__rmnet_map_ingress_handler(skbn, port);
 
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 05/10] net: qualcomm: rmnet: Set pacing rate
From: Subash Abhinov Kasiviswanathan @ 2017-12-27  2:28 UTC (permalink / raw)
  To: davem, netdev; +Cc: Subash Abhinov Kasiviswanathan
In-Reply-To: <1514341685-11262-1-git-send-email-subashab@codeaurora.org>

With the default pacing shift of 10, the uplink data rate for a single
TCP stream is around 10Mbps. Setting it to 8 increases the rate to
146Mbps, which is the maximum supported transmit rate.
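
The value passed to sk_pacing_shift_update() is a shift, not a rate. As a
rough sketch of why a smaller shift helps: the amount of data TCP lets a
socket keep queued below the stack scales with pacing_rate >> shift. The
helper name below is hypothetical, and the floor and cap the kernel also
applies are left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): approximate number of bytes
 * a socket may keep queued for a given pacing rate and pacing shift.
 * Simplification: the real limit also has a per-packet floor and a
 * sysctl-controlled cap, both omitted here.
 */
static uint64_t queue_budget_bytes(uint64_t pacing_rate_bytes_per_sec,
				   unsigned int pacing_shift)
{
	return pacing_rate_bytes_per_sec >> pacing_shift;
}
```

A shift of 10 allows roughly 1/1024 s (about 1 ms) worth of data at the
current pacing rate, while a shift of 8 allows roughly 1/256 s (about 4 ms).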

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
index 8e1f43a..8f8c4f2 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
@@ -16,6 +16,7 @@
 #include <linux/netdevice.h>
 #include <linux/netdev_features.h>
 #include <linux/if_arp.h>
+#include <net/sock.h>
 #include "rmnet_private.h"
 #include "rmnet_config.h"
 #include "rmnet_vnd.h"
@@ -204,6 +205,8 @@ void rmnet_egress_handler(struct sk_buff *skb)
 	struct rmnet_priv *priv;
 	u8 mux_id;
 
+	sk_pacing_shift_update(skb->sk, 8);
+
 	orig_dev = skb->dev;
 	priv = netdev_priv(orig_dev);
 	skb->dev = priv->real_dev;
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 06/10] net: qualcomm: rmnet: Define the MAPv4 packet formats
From: Subash Abhinov Kasiviswanathan @ 2017-12-27  2:28 UTC (permalink / raw)
  To: davem, netdev; +Cc: Subash Abhinov Kasiviswanathan
In-Reply-To: <1514341685-11262-1-git-send-email-subashab@codeaurora.org>

The MAPv4 packet format adds support for RX / TX checksum offload.
For a bi-directional UDP stream at a rate of 570 / 146 Mbps, roughly
10% CPU cycles are saved.

For receive path, there is a checksum trailer appended to the end of
the MAP packet. The valid field indicates if hardware has computed
the checksum. csum_start_offset indicates the offset from the start
of the IP header from which hardware has computed checksum.
csum_length is the number of bytes over which the checksum was
computed and the resulting value is csum_value.

In the transmit path, a header is inserted between the end of the MAP
header and the start of the IP packet. csum_start_offset is the offset
in bytes from which hardware will compute the checksum if the
csum_enabled bit is set. udp_ip4_ind indicates if the checksum
value of 0 is valid or not. csum_insert_offset is the offset from the
csum_start_offset where hardware will insert the computed checksum.
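
To make the receive-side layout concrete, here is a userspace sketch that
decodes the 8-byte checksum trailer from raw bytes. The struct and function
names are hypothetical. Two things are assumptions: that all 16-bit fields
are in network byte order, and that the valid flag is the least significant
bit of the second byte (which is where a little-endian compiler would
typically place a leading 1-bit bitfield).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DL_CSUM_TRAILER_LEN 8

/* Decoded, host-order view of the downlink checksum trailer. */
struct dl_csum_trailer_view {
	bool valid;
	uint16_t csum_start_offset;
	uint16_t csum_length;
	uint16_t csum_value;
};

/* Read a big-endian 16-bit value. */
static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Decode the trailer found at buf. Returns false if fewer than
 * DL_CSUM_TRAILER_LEN bytes are available.
 * Byte 0 is reserved; byte 1 carries the valid flag (assumed bit 0).
 */
static bool parse_dl_csum_trailer(const uint8_t *buf, size_t len,
				  struct dl_csum_trailer_view *out)
{
	if (len < DL_CSUM_TRAILER_LEN)
		return false;

	out->valid = (buf[1] & 0x01) != 0;
	out->csum_start_offset = get_be16(buf + 2);
	out->csum_length = get_be16(buf + 4);
	out->csum_value = get_be16(buf + 6);
	return true;
}
```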

The use of this additional packet format for checksum offload is
explained in subsequent patches.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h     | 16 ++++++++++++++++
 drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h |  2 ++
 2 files changed, 18 insertions(+)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
index ef0eff2..01d876c 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
@@ -47,6 +47,22 @@ struct rmnet_map_header {
 	u16 pkt_len;
 }  __aligned(1);

+struct rmnet_map_dl_csum_trailer {
+	u8  reserved1;
+	u8  valid:1;
+	u8  reserved2:7;
+	u16 csum_start_offset;
+	u16 csum_length;
+	u16 csum_value;
+} __aligned(1);
+
+struct rmnet_map_ul_csum_header {
+	u16 csum_start_offset;
+	u16 csum_insert_offset:14;
+	u16 udp_ip4_ind:1;
+	u16 csum_enabled:1;
+} __aligned(1);
+
 #define RMNET_MAP_GET_MUX_ID(Y) (((struct rmnet_map_header *) \
 				 (Y)->data)->mux_id)
 #define RMNET_MAP_GET_CD_BIT(Y) (((struct rmnet_map_header *) \
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h
index d214280..de0143e 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h
@@ -21,6 +21,8 @@
 /* Constants */
 #define RMNET_INGRESS_FORMAT_DEAGGREGATION      BIT(0)
 #define RMNET_INGRESS_FORMAT_MAP_COMMANDS       BIT(1)
+#define RMNET_INGRESS_FORMAT_MAP_CKSUMV4        BIT(2)
+#define RMNET_EGRESS_FORMAT_MAP_CKSUMV4         BIT(3)

 /* Replace skb->dev to a virtual rmnet device and pass up the stack */
 #define RMNET_EPMODE_VND (1)
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 08/10] net: qualcomm: rmnet: Handle command packets with checksum trailer
From: Subash Abhinov Kasiviswanathan @ 2017-12-27  2:28 UTC (permalink / raw)
  To: davem, netdev; +Cc: Subash Abhinov Kasiviswanathan
In-Reply-To: <1514341685-11262-1-git-send-email-subashab@codeaurora.org>

When using the MAPv4 packet format in conjunction with MAP commands,
a dummy DL checksum trailer will be appended to the packet. Before
this packet is sent out as an ACK, the DL checksum trailer needs to be
removed.
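
The length handling described above can be sketched in isolation as
follows. The helper name is hypothetical, and the 4-byte MAP header size is
an assumption about struct rmnet_map_header; the 8-byte trailer size follows
from the trailer fields defined earlier in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAP_HEADER_LEN       4  /* assumed sizeof(struct rmnet_map_header) */
#define DL_CSUM_TRAILER_LEN  8  /* 1 + 1 + 2 + 2 + 2 bytes of trailer fields */

/*
 * Hypothetical helper mirroring the check before the ACK is sent:
 * returns false when the buffer cannot hold header + payload + trailer
 * (such packets are dropped), otherwise stores the length left after
 * the trailer is trimmed off.
 */
static bool ack_len_without_trailer(size_t buf_len, size_t map_payload_len,
				    size_t *trimmed_len)
{
	if (buf_len < MAP_HEADER_LEN + map_payload_len + DL_CSUM_TRAILER_LEN)
		return false;

	*trimmed_len = buf_len - DL_CSUM_TRAILER_LEN;
	return true;
}
```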

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
index 51e6049..6bc328f 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
@@ -58,11 +58,24 @@ static u8 rmnet_map_do_flow_control(struct sk_buff *skb,
 }
 
 static void rmnet_map_send_ack(struct sk_buff *skb,
-			       unsigned char type)
+			       unsigned char type,
+			       struct rmnet_port *port)
 {
 	struct rmnet_map_control_command *cmd;
 	int xmit_status;
 
+	if (port->data_format & RMNET_INGRESS_FORMAT_MAP_CKSUMV4) {
+		if (skb->len < sizeof(struct rmnet_map_header) +
+		    RMNET_MAP_GET_LENGTH(skb) +
+		    sizeof(struct rmnet_map_dl_csum_trailer)) {
+			kfree_skb(skb);
+			return;
+		}
+
+		skb_trim(skb, skb->len -
+			 sizeof(struct rmnet_map_dl_csum_trailer));
+	}
+
 	skb->protocol = htons(ETH_P_MAP);
 
 	cmd = RMNET_MAP_GET_CMD_START(skb);
@@ -100,5 +113,5 @@ void rmnet_map_command(struct sk_buff *skb, struct rmnet_port *port)
 		break;
 	}
 	if (rc == RMNET_MAP_COMMAND_ACK)
-		rmnet_map_send_ack(skb, rc);
+		rmnet_map_send_ack(skb, rc, port);
 }
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 07/10] net: qualcomm: rmnet: Add support for RX checksum offload
From: Subash Abhinov Kasiviswanathan @ 2017-12-27  2:28 UTC (permalink / raw)
  To: davem, netdev; +Cc: Subash Abhinov Kasiviswanathan
In-Reply-To: <1514341685-11262-1-git-send-email-subashab@codeaurora.org>

When using the MAPv4 packet format, receive checksum offload can be
enabled in hardware. The checksum computation over pseudo header is
not offloaded but the rest of the checksum computation over
the payload is offloaded. This applies only for TCP / UDP packets
which are not fragmented.

rmnet validates the TCP/UDP checksum for the packet using the checksum
from the checksum trailer added to the packet by hardware. The
validation is performed as follows:

1. Take the 1's complement of the checksum value from the trailer.
2. Compute the 1's complement checksum over the IPv4 / IPv6 header and
   subtract it from the value from step 1.
3. Compute the 1's complement checksum over the IPv4 / IPv6 pseudo
   header and add it to the value from step 2.
4. Subtract the checksum value in the TCP / UDP header from the
   value from step 3.
5. Compare the value from step 4 to the checksum value in the
   TCP / UDP header.
6. If the comparison in step 5 succeeds, set CHECKSUM_UNNECESSARY and
   pass the packet on to the network stack. On failure, pass the
   packet on as is, without modifying the ip_summed field.
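
The arithmetic behind the steps above can be sketched in userspace as
follows. This is a simplified model, not the patch's code: it works on
plain (uncomplemented) 16-bit one's complement sums in host order, so the
complement operations of the kernel helpers do not appear, and the name
dl_csum_ok() is hypothetical. csum16_add()/csum16_sub() are reimplemented
here to keep the example self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One's complement 16-bit addition with end-around carry. */
static uint16_t csum16_add(uint16_t a, uint16_t b)
{
	uint32_t s = (uint32_t)a + b;

	return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* One's complement subtraction: add the complement of b. */
static uint16_t csum16_sub(uint16_t a, uint16_t b)
{
	return csum16_add(a, (uint16_t)~b);
}

/* One's complement sum over a buffer of big-endian 16-bit words. */
static uint16_t ones_sum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((data[i] << 8) | data[i + 1]);
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * total_sum:  sum the hardware reports over IP header + transport segment
 * ip_hdr_sum: sum over the IP header alone
 * pseudo_sum: sum over the pseudo header
 * field:      checksum field read from the TCP / UDP header
 */
static bool dl_csum_ok(uint16_t total_sum, uint16_t ip_hdr_sum,
		       uint16_t pseudo_sum, uint16_t field)
{
	uint16_t payload = csum16_sub(total_sum, ip_hdr_sum);
	uint16_t with_pseudo = csum16_add(payload, pseudo_sum);
	uint16_t expected = (uint16_t)~csum16_sub(with_pseudo, field);

	/* 0x0000 and 0xffff both represent zero in one's complement. */
	if (expected == 0 && field == 0xffff)
		expected = 0xffff;

	return expected == field;
}
```

The fix-up for an expected value of zero plays the same role as the special
cases for a computed checksum of 0 in the patch.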

The checksum field is also checked for UDP checksum 0 as per RFC 768
and for unexpected TCP checksum of 0.

If checksum offload is disabled when using the MAPv4 packet format in
the receive path, the packet is queued as is to the network stack
without the validations above.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 .../net/ethernet/qualcomm/rmnet/rmnet_handlers.c   |  15 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h    |   4 +-
 .../net/ethernet/qualcomm/rmnet/rmnet_map_data.c   | 177 ++++++++++++++++++++-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c    |   2 +
 4 files changed, 192 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
index 8f8c4f2..3409458 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
@@ -66,8 +66,8 @@ static void rmnet_set_skb_proto(struct sk_buff *skb)
 			    struct rmnet_port *port)
 {
 	struct rmnet_endpoint *ep;
+	u16 len, pad;
 	u8 mux_id;
-	u16 len;
 
 	if (RMNET_MAP_GET_CD_BIT(skb)) {
 		if (port->data_format & RMNET_INGRESS_FORMAT_MAP_COMMANDS)
@@ -77,7 +77,8 @@ static void rmnet_set_skb_proto(struct sk_buff *skb)
 	}
 
 	mux_id = RMNET_MAP_GET_MUX_ID(skb);
-	len = RMNET_MAP_GET_LENGTH(skb) - RMNET_MAP_GET_PAD(skb);
+	pad = RMNET_MAP_GET_PAD(skb);
+	len = RMNET_MAP_GET_LENGTH(skb) - pad;
 
 	if (mux_id >= RMNET_MAX_LOGICAL_EP)
 		goto free_skb;
@@ -90,8 +91,14 @@ static void rmnet_set_skb_proto(struct sk_buff *skb)
 
 	/* Subtract MAP header */
 	skb_pull(skb, sizeof(struct rmnet_map_header));
-	skb_trim(skb, len);
 	rmnet_set_skb_proto(skb);
+
+	if (port->data_format & RMNET_INGRESS_FORMAT_MAP_CKSUMV4) {
+		if (!rmnet_map_checksum_downlink_packet(skb, len + pad))
+			skb->ip_summed = CHECKSUM_UNNECESSARY;
+	}
+
+	skb_trim(skb, len);
 	rmnet_deliver_skb(skb);
 	return;
 
@@ -115,7 +122,7 @@ static void rmnet_set_skb_proto(struct sk_buff *skb)
 	}
 
 	if (port->data_format & RMNET_INGRESS_FORMAT_DEAGGREGATION) {
-		while ((skbn = rmnet_map_deaggregate(skb)) != NULL)
+		while ((skbn = rmnet_map_deaggregate(skb, port)) != NULL)
 			__rmnet_map_ingress_handler(skbn, port);
 
 		consume_skb(skb);
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
index 01d876c..0539d99 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
@@ -83,9 +83,11 @@ struct rmnet_map_ul_csum_header {
 #define RMNET_MAP_NO_PAD_BYTES        0
 #define RMNET_MAP_ADD_PAD_BYTES       1
 
-struct sk_buff *rmnet_map_deaggregate(struct sk_buff *skb);
+struct sk_buff *rmnet_map_deaggregate(struct sk_buff *skb,
+				      struct rmnet_port *port);
 struct rmnet_map_header *rmnet_map_add_map_header(struct sk_buff *skb,
 						  int hdrlen, int pad);
 void rmnet_map_command(struct sk_buff *skb, struct rmnet_port *port);
+int rmnet_map_checksum_downlink_packet(struct sk_buff *skb, u16 len);
 
 #endif /* _RMNET_MAP_H_ */
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
index 978ce26..543e423 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
@@ -14,6 +14,9 @@
  */
 
 #include <linux/netdevice.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <net/ip6_checksum.h>
 #include "rmnet_config.h"
 #include "rmnet_map.h"
 #include "rmnet_private.h"
@@ -21,6 +24,144 @@
 #define RMNET_MAP_DEAGGR_SPACING  64
 #define RMNET_MAP_DEAGGR_HEADROOM (RMNET_MAP_DEAGGR_SPACING / 2)
 
+static u16 *rmnet_map_get_csum_field(unsigned char protocol,
+				     const void *txporthdr)
+{
+	u16 *check = 0;
+
+	switch (protocol) {
+	case IPPROTO_TCP:
+		check = &(((struct tcphdr *)txporthdr)->check);
+		break;
+
+	case IPPROTO_UDP:
+		check = &(((struct udphdr *)txporthdr)->check);
+		break;
+
+	default:
+		check = 0;
+		break;
+	}
+
+	return check;
+}
+
+static int
+rmnet_map_ipv4_dl_csum_trailer(struct sk_buff *skb,
+			       struct rmnet_map_dl_csum_trailer *csum_trailer)
+{
+	u16 ip_pseudo_payload_csum, pseudo_csum, ip_hdr_csum, *csum_field;
+	u16 csum_value, ip_payload_csum, csum_value_final;
+	struct iphdr *ip4h;
+	void *txporthdr;
+
+	ip4h = (struct iphdr *)(skb->data);
+	if ((ntohs(ip4h->frag_off) & IP_MF) ||
+	    ((ntohs(ip4h->frag_off) & IP_OFFSET) > 0))
+		return -EOPNOTSUPP;
+
+	txporthdr = skb->data + ip4h->ihl * 4;
+
+	csum_field = rmnet_map_get_csum_field(ip4h->protocol, txporthdr);
+
+	if (!csum_field)
+		return -EPROTONOSUPPORT;
+
+	/* RFC 768 - Skip IPv4 UDP packets where sender checksum field is 0 */
+	if (*csum_field == 0 && ip4h->protocol == IPPROTO_UDP)
+		return 0;
+
+	csum_value = ~ntohs(csum_trailer->csum_value);
+	ip_hdr_csum = ~ip_fast_csum(ip4h, (int)ip4h->ihl);
+	ip_payload_csum = csum16_sub(csum_value, ip_hdr_csum);
+
+	pseudo_csum = ~ntohs(csum_tcpudp_magic(ip4h->saddr, ip4h->daddr,
+			     (u16)(ntohs(ip4h->tot_len) - ip4h->ihl * 4),
+			     (u16)ip4h->protocol, 0));
+	ip_pseudo_payload_csum = csum16_add(ip_payload_csum, pseudo_csum);
+
+	csum_value_final = ~csum16_sub(ip_pseudo_payload_csum,
+				       ntohs(*csum_field));
+
+	if (unlikely(csum_value_final == 0)) {
+		switch (ip4h->protocol) {
+		case IPPROTO_UDP:
+			/* RFC 768 - DL4 1's complement rule for UDP csum 0 */
+			csum_value_final = ~csum_value_final;
+			break;
+
+		case IPPROTO_TCP:
+			/* DL4 Non-RFC compliant TCP checksum found */
+			if (*csum_field == 0xFFFF)
+				csum_value_final = ~csum_value_final;
+			break;
+		}
+	}
+
+	if (csum_value_final == ntohs(*csum_field))
+		return 0;
+	else
+		return -EINVAL;
+}
+
+#if IS_ENABLED(CONFIG_IPV6)
+static int
+rmnet_map_ipv6_dl_csum_trailer(struct sk_buff *skb,
+			       struct rmnet_map_dl_csum_trailer *csum_trailer)
+{
+	u16 ip_pseudo_payload_csum, pseudo_csum, ip6_hdr_csum, *csum_field;
+	u16 csum_value, ip6_payload_csum, csum_value_final;
+	struct ipv6hdr *ip6h;
+	void *txporthdr;
+	u32 length;
+
+	ip6h = (struct ipv6hdr *)(skb->data);
+
+	txporthdr = skb->data + sizeof(struct ipv6hdr);
+	csum_field = rmnet_map_get_csum_field(ip6h->nexthdr, txporthdr);
+
+	if (!csum_field)
+		return -EPROTONOSUPPORT;
+
+	csum_value = ~ntohs(csum_trailer->csum_value);
+	ip6_hdr_csum = ~ntohs(ip_compute_csum(ip6h,
+			      (int)(txporthdr - (void *)(skb->data))));
+	ip6_payload_csum = csum16_sub(csum_value, ip6_hdr_csum);
+
+	length = (ip6h->nexthdr == IPPROTO_UDP) ?
+		 ntohs(((struct udphdr *)txporthdr)->len) :
+		 ntohs(ip6h->payload_len);
+	pseudo_csum = ~ntohs(csum_ipv6_magic(&ip6h->saddr, &ip6h->daddr,
+			     length, ip6h->nexthdr, 0));
+	ip_pseudo_payload_csum = csum16_add(ip6_payload_csum, pseudo_csum);
+
+	csum_value_final = ~csum16_sub(ip_pseudo_payload_csum,
+				       ntohs(*csum_field));
+
+	if (unlikely(csum_value_final == 0)) {
+		switch (ip6h->nexthdr) {
+		case IPPROTO_UDP:
+			/* RFC 2460 section 8.1
+			 * DL6 One's complement rule for UDP checksum 0
+			 */
+			csum_value_final = ~csum_value_final;
+			break;
+
+		case IPPROTO_TCP:
+			/* DL6 Non-RFC compliant TCP checksum found */
+			if (*csum_field == 0xFFFF)
+				csum_value_final = ~csum_value_final;
+			break;
+		}
+	}
+
+	if (csum_value_final == ntohs(*csum_field))
+		return 0;
+	else
+		return -EINVAL;
+}
+#endif
+
 /* Adds MAP header to front of skb->data
  * Padding is calculated and set appropriately in MAP header. Mux ID is
  * initialized to 0.
@@ -66,7 +207,8 @@ struct rmnet_map_header *rmnet_map_add_map_header(struct sk_buff *skb,
  * returned, indicating that there are no more packets to deaggregate. Caller
  * is responsible for freeing the original skb.
  */
-struct sk_buff *rmnet_map_deaggregate(struct sk_buff *skb)
+struct sk_buff *rmnet_map_deaggregate(struct sk_buff *skb,
+				      struct rmnet_port *port)
 {
 	struct rmnet_map_header *maph;
 	struct sk_buff *skbn;
@@ -78,6 +220,9 @@ struct sk_buff *rmnet_map_deaggregate(struct sk_buff *skb)
 	maph = (struct rmnet_map_header *)skb->data;
 	packet_len = ntohs(maph->pkt_len) + sizeof(struct rmnet_map_header);
 
+	if (port->data_format & RMNET_INGRESS_FORMAT_MAP_CKSUMV4)
+		packet_len += sizeof(struct rmnet_map_dl_csum_trailer);
+
 	if (((int)skb->len - (int)packet_len) < 0)
 		return NULL;
 
@@ -97,3 +242,33 @@ struct sk_buff *rmnet_map_deaggregate(struct sk_buff *skb)
 
 	return skbn;
 }
+
+/* Validates packet checksums. Function takes a pointer to
+ * the beginning of a buffer which contains the IP payload +
+ * padding + checksum trailer.
+ * Only IPv4 and IPv6 are supported along with TCP & UDP.
+ * Fragmented or tunneled packets are not supported.
+ */
+int rmnet_map_checksum_downlink_packet(struct sk_buff *skb, u16 len)
+{
+	struct rmnet_map_dl_csum_trailer *csum_trailer;
+
+	if (unlikely(!(skb->dev->features & NETIF_F_RXCSUM)))
+		return -EOPNOTSUPP;
+
+	csum_trailer = (struct rmnet_map_dl_csum_trailer *)(skb->data + len);
+
+	if (!ntohs(csum_trailer->valid))
+		return -EINVAL;
+
+	if (skb->protocol == htons(ETH_P_IP))
+		return rmnet_map_ipv4_dl_csum_trailer(skb, csum_trailer);
+	else if (skb->protocol == htons(ETH_P_IPV6))
+#if IS_ENABLED(CONFIG_IPV6)
+		return rmnet_map_ipv6_dl_csum_trailer(skb, csum_trailer);
+#else
+		return -EPROTONOSUPPORT;
+#endif
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
index 5bb29f4..879a2e0 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
@@ -188,6 +188,8 @@ int rmnet_vnd_newlink(u8 id, struct net_device *rmnet_dev,
 	if (rmnet_get_endpoint(port, id))
 		return -EBUSY;
 
+	rmnet_dev->hw_features = NETIF_F_RXCSUM;
+
 	rc = register_netdevice(rmnet_dev);
 	if (!rc) {
 		ep->egress_dev = rmnet_dev;
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 09/10] net: qualcomm: rmnet: Add support for TX checksum offload
From: Subash Abhinov Kasiviswanathan @ 2017-12-27  2:28 UTC (permalink / raw)
  To: davem, netdev; +Cc: Subash Abhinov Kasiviswanathan
In-Reply-To: <1514341685-11262-1-git-send-email-subashab@codeaurora.org>

TX checksum offload applies to unfragmented TCP / UDP packets and uses
the MAPv4 checksum header. The following needs to be done to have the
checksum computed in hardware:

1. Set the checksum start offset and insert offset.
2. Set the csum_enabled bit.
3. Compute and set the 1's complement of the partial checksum field in
   the transport header.

If TX checksum offload is disabled, all the fields in the checksum
header are set to 0 and hardware will not perform any computation.
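
For illustration, the 4-byte uplink checksum header could be assembled
with explicit shifts as sketched below. The function names are
hypothetical. The bit positions (insert offset in bits 0-13, udp_ip4_ind in
bit 14, csum_enabled in bit 15 of the second 16-bit word, both words sent
in network byte order) are an assumption derived from how a little-endian
compiler would typically lay out the bitfields in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UL_CSUM_HEADER_LEN 4

/* Store a 16-bit value in big-endian (network) byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/*
 * Build the uplink checksum header into out[].
 * csum_start_offset:  offset from the start of the IP header at which
 *                     hardware starts summing
 * csum_insert_offset: offset from csum_start_offset at which hardware
 *                     stores the result (14 bits)
 */
static void build_ul_csum_header(uint8_t out[UL_CSUM_HEADER_LEN],
				 uint16_t csum_start_offset,
				 uint16_t csum_insert_offset,
				 bool is_udp_ipv4, bool csum_enabled)
{
	uint16_t word = csum_insert_offset & 0x3fff;

	if (is_udp_ipv4)
		word = (uint16_t)(word | 0x4000);
	if (csum_enabled)
		word = (uint16_t)(word | 0x8000);

	put_be16(out, csum_start_offset);
	put_be16(out + 2, word);
}
```

Passing zeros and false for every argument yields the all-zero header
described for the case where TX checksum offload is disabled.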

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 .../net/ethernet/qualcomm/rmnet/rmnet_handlers.c   |   8 ++
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h    |   2 +
 .../net/ethernet/qualcomm/rmnet/rmnet_map_data.c   | 118 +++++++++++++++++++++
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c    |   1 +
 4 files changed, 129 insertions(+)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
index 3409458..601edec 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
@@ -141,11 +141,19 @@ static int rmnet_map_egress_handler(struct sk_buff *skb,
 	additional_header_len = 0;
 	required_headroom = sizeof(struct rmnet_map_header);
 
+	if (port->data_format & RMNET_EGRESS_FORMAT_MAP_CKSUMV4) {
+		additional_header_len = sizeof(struct rmnet_map_ul_csum_header);
+		required_headroom += additional_header_len;
+	}
+
 	if (skb_headroom(skb) < required_headroom) {
 		if (pskb_expand_head(skb, required_headroom, 0, GFP_KERNEL))
 			goto fail;
 	}
 
+	if (port->data_format & RMNET_EGRESS_FORMAT_MAP_CKSUMV4)
+		rmnet_map_checksum_uplink_packet(skb, orig_dev);
+
 	map_header = rmnet_map_add_map_header(skb, additional_header_len, 0);
 	if (!map_header)
 		goto fail;
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
index 0539d99..c635dd7 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
@@ -89,5 +89,7 @@ struct rmnet_map_header *rmnet_map_add_map_header(struct sk_buff *skb,
 						  int hdrlen, int pad);
 void rmnet_map_command(struct sk_buff *skb, struct rmnet_port *port);
 int rmnet_map_checksum_downlink_packet(struct sk_buff *skb, u16 len);
+void rmnet_map_checksum_uplink_packet(struct sk_buff *skb,
+				      struct net_device *orig_dev);
 
 #endif /* _RMNET_MAP_H_ */
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
index 543e423..56923a5 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
@@ -162,6 +162,84 @@ static u16 *rmnet_map_get_csum_field(unsigned char protocol,
 }
 #endif
 
+static void rmnet_map_complement_ipv4_txporthdr_csum_field(void *iphdr)
+{
+	struct iphdr *ip4h = (struct iphdr *)iphdr;
+	void *txphdr;
+	u16 *csum;
+
+	txphdr = iphdr + ip4h->ihl * 4;
+
+	if (ip4h->protocol == IPPROTO_TCP || ip4h->protocol == IPPROTO_UDP) {
+		csum = (u16 *)rmnet_map_get_csum_field(ip4h->protocol, txphdr);
+		*csum = ~(*csum);
+	}
+}
+
+static void
+rmnet_map_ipv4_ul_csum_header(void *iphdr,
+			      struct rmnet_map_ul_csum_header *ul_header,
+			      struct sk_buff *skb)
+{
+	struct iphdr *ip4h = (struct iphdr *)iphdr;
+	u16 *hdr = (u16 *)ul_header;
+
+	ul_header->csum_start_offset = htons((u16)(skb_transport_header(skb) -
+						   (unsigned char *)iphdr));
+	ul_header->csum_insert_offset = skb->csum_offset;
+	ul_header->csum_enabled = 1;
+	if (ip4h->protocol == IPPROTO_UDP)
+		ul_header->udp_ip4_ind = 1;
+	else
+		ul_header->udp_ip4_ind = 0;
+
+	/* Changing remaining fields to network order */
+	hdr++;
+	*hdr = htons(*hdr);
+
+	skb->ip_summed = CHECKSUM_NONE;
+
+	rmnet_map_complement_ipv4_txporthdr_csum_field(iphdr);
+}
+
+#if IS_ENABLED(CONFIG_IPV6)
+static void rmnet_map_complement_ipv6_txporthdr_csum_field(void *ip6hdr)
+{
+	struct ipv6hdr *ip6h = (struct ipv6hdr *)ip6hdr;
+	void *txphdr;
+	u16 *csum;
+
+	txphdr = ip6hdr + sizeof(struct ipv6hdr);
+
+	if (ip6h->nexthdr == IPPROTO_TCP || ip6h->nexthdr == IPPROTO_UDP) {
+		csum = (u16 *)rmnet_map_get_csum_field(ip6h->nexthdr, txphdr);
+		*csum = ~(*csum);
+	}
+}
+
+static void
+rmnet_map_ipv6_ul_csum_header(void *ip6hdr,
+			      struct rmnet_map_ul_csum_header *ul_header,
+			      struct sk_buff *skb)
+{
+	u16 *hdr = (u16 *)ul_header;
+
+	ul_header->csum_start_offset = htons((u16)(skb_transport_header(skb) -
+						   (unsigned char *)ip6hdr));
+	ul_header->csum_insert_offset = skb->csum_offset;
+	ul_header->csum_enabled = 1;
+	ul_header->udp_ip4_ind = 0;
+
+	/* Changing remaining fields to network order */
+	hdr++;
+	*hdr = htons(*hdr);
+
+	skb->ip_summed = CHECKSUM_NONE;
+
+	rmnet_map_complement_ipv6_txporthdr_csum_field(ip6hdr);
+}
+#endif
+
 /* Adds MAP header to front of skb->data
  * Padding is calculated and set appropriately in MAP header. Mux ID is
  * initialized to 0.
@@ -272,3 +350,43 @@ int rmnet_map_checksum_downlink_packet(struct sk_buff *skb, u16 len)
 
 	return 0;
 }
+
+/* Generates UL checksum meta info header for IPv4 and IPv6 over TCP and UDP
+ * packets that are supported for UL checksum offload.
+ */
+void rmnet_map_checksum_uplink_packet(struct sk_buff *skb,
+				      struct net_device *orig_dev)
+{
+	struct rmnet_map_ul_csum_header *ul_header;
+	void *iphdr;
+
+	ul_header = (struct rmnet_map_ul_csum_header *)
+		    skb_push(skb, sizeof(struct rmnet_map_ul_csum_header));
+
+	if (unlikely(!(orig_dev->features &
+		     (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM))))
+		goto sw_csum;
+
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		iphdr = (char *)ul_header +
+			sizeof(struct rmnet_map_ul_csum_header);
+
+		if (skb->protocol == htons(ETH_P_IP)) {
+			rmnet_map_ipv4_ul_csum_header(iphdr, ul_header, skb);
+			return;
+		} else if (skb->protocol == htons(ETH_P_IPV6)) {
+#if IS_ENABLED(CONFIG_IPV6)
+			rmnet_map_ipv6_ul_csum_header(iphdr, ul_header, skb);
+			return;
+#else
+			goto sw_csum;
+#endif
+		}
+	}
+
+sw_csum:
+	ul_header->csum_start_offset = 0;
+	ul_header->csum_insert_offset = 0;
+	ul_header->csum_enabled = 0;
+	ul_header->udp_ip4_ind = 0;
+}
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
index 879a2e0..f7f57ce 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
@@ -189,6 +189,7 @@ int rmnet_vnd_newlink(u8 id, struct net_device *rmnet_dev,
 		return -EBUSY;
 
 	rmnet_dev->hw_features = NETIF_F_RXCSUM;
+	rmnet_dev->hw_features |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
 
 	rc = register_netdevice(rmnet_dev);
 	if (!rc) {
-- 
1.9.1


* [PATCH net-next 10/10] net: qualcomm: rmnet: Add support for GSO
From: Subash Abhinov Kasiviswanathan @ 2017-12-27  2:28 UTC (permalink / raw)
  To: davem, netdev; +Cc: Subash Abhinov Kasiviswanathan
In-Reply-To: <1514341685-11262-1-git-send-email-subashab@codeaurora.org>

Real devices may support scatter-gather (SG), so enable SG on rmnet
devices to use GSO. GSO reduces CPU cycles by 20% at a rate of
146 Mbps for a single-stream TCP connection.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
index f7f57ce..570a227 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
@@ -190,6 +190,7 @@ int rmnet_vnd_newlink(u8 id, struct net_device *rmnet_dev,
 
 	rmnet_dev->hw_features = NETIF_F_RXCSUM;
 	rmnet_dev->hw_features |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
+	rmnet_dev->hw_features |= NETIF_F_SG;
 
 	rc = register_netdevice(rmnet_dev);
 	if (!rc) {
-- 
1.9.1


* Re: [PATCH v2 bpf-next 2/2] tools/bpftool: fix bpftool build with bintutils >= 2.8
From: Alexei Starovoitov @ 2017-12-27  2:32 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: Roman Gushchin, netdev, linux-kernel, kernel-team, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann
In-Reply-To: <bc74e33b-0dbc-f12a-bae8-180dd41f007b@netronome.com>

On Fri, Dec 22, 2017 at 06:50:01PM +0000, Quentin Monnet wrote:
> Hi Roman,
> 
> 2017-12-22 16:11 UTC+0000 ~ Roman Gushchin <guro@fb.com>
> > Bpftool build is broken with binutils version 2.28 and later.
> 
> Could you check the binutils version? I believe it changed in 2.29
> instead of 2.28. Could you update your commit log and subject
> accordingly, please?
> 
> > The cause is commit 003ca0fd2286 ("Refactor disassembler selection")
> > in the binutils repo, which changed the disassembler() function
> > signature.
> > 
> > Fix this by adding a new "feature" to the tools/build/features
> > infrastructure and make it responsible for decision which
> > disassembler() function signature to use.
> > 
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> > Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Cc: Alexei Starovoitov <ast@kernel.org>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > ---
> >  tools/bpf/Makefile                                | 29 +++++++++++++++++++++++
> >  tools/bpf/bpf_jit_disasm.c                        |  7 ++++++
> >  tools/bpf/bpftool/Makefile                        | 24 +++++++++++++++++++
> >  tools/bpf/bpftool/jit_disasm.c                    |  7 ++++++
> >  tools/build/feature/Makefile                      |  4 ++++
> >  tools/build/feature/test-disassembler-four-args.c | 15 ++++++++++++
> >  6 files changed, 86 insertions(+)
> >  create mode 100644 tools/build/feature/test-disassembler-four-args.c
> > 
> > diff --git a/tools/bpf/Makefile b/tools/bpf/Makefile
> > index 07a6697466ef..c8ec0ae16bf0 100644
> > --- a/tools/bpf/Makefile
> > +++ b/tools/bpf/Makefile
> > @@ -9,6 +9,35 @@ MAKE = make
> >  CFLAGS += -Wall -O2
> >  CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
> >  
> > +ifeq ($(srctree),)
> > +srctree := $(patsubst %/,%,$(dir $(CURDIR)))
> > +srctree := $(patsubst %/,%,$(dir $(srctree)))
> > +endif
> > +
> > +FEATURE_USER = .bpf
> > +FEATURE_TESTS = libbfd disassembler-four-args
> > +FEATURE_DISPLAY = libbfd disassembler-four-args
> 
> Thanks for adding libbfd as I requested. However, you do not use it in
> the Makefile to prevent compilation if the feature is not detected (see
> "bpfdep" or "elfdep" in tools/lib/bpf/Makefile). Sorry, I should have
> pointed it out in my previous review.
> 
> But actually, I have another issue related to the libbfd feature: since
> commit 280e7c48c3b8 ("perf tools: fix BFD detection on opensuse") it
> requires libiberty so that libbfd is correctly detected, but libiberty
> is not needed on all distros (at least Ubuntu can have libbfd without
> libiberty). Typically, detection fails on my setup, although I do have
> libbfd installed. So forcing libbfd feature here may eventually force
> users to install libraries they do not need to compile bpftool, which is
> not what we want.
> 
> I do not have a clean work around to suggest. Maybe have one
> "libbfd-something" feature that tries to compile without libiberty, then
> another one that tries with it, and compile the tools if at least one of
> them succeeds. But it's probably for another patch series. In the
> meantime, would you please simply remove libbfd detection here and
> accept my apologies for suggesting to add it in the previous review?

Since libbfd is already used by bpftool, I think adding feature
detection is a good thing, even if it's not perfect on some setups.

Roman,
I think you still need to do one more respin to address the commit log nit.


* Re: [patch net-next v2 00/10] Add support for resource abstraction
From: David Ahern @ 2017-12-27  4:05 UTC (permalink / raw)
  To: Jiri Pirko, netdev, davem
  Cc: arkadis, mlxsw, andrew, vivien.didelot, f.fainelli, michael.chan,
	ganeshgr, saeedm, matanb, leonro, idosch, jakub.kicinski, ast,
	daniel, simon.horman, pieter.jansenvanvuuren, john.hurley,
	alexander.h.duyck, linville, gospo, steven.lin1, yuvalm, ogerlitz,
	roopa
In-Reply-To: <20171226112359.5313-1-jiri@resnulli.us>

On 12/26/17 5:23 AM, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> Many of the ASIC's internal resources are limited and are shared between
> several hardware procedures. For example, unified hash-based memory can
> be used for many lookup purposes, like FDB and LPM. In many cases the user
> can provide a partitioning scheme for such a resource in order to perform
> fine tuning for his application. In such cases performing driver reload is
> needed for the changes to take place, thus this patchset also adds support
> for hot reload.
> 
> Such an abstraction can be coupled with devlink's dpipe interface, which
> models the ASIC's pipeline as a graph of match/action tables. By modeling
> the hardware resource object, and by coupling it to several dpipe tables,
> further visibility can be achieved in order to debug ASIC-wide issues.
> 
> The proposed interface will provide the user the ability to understand the
> limitations of the hardware, and receive notification regarding its occupancy.
> Furthermore, monitoring the resource occupancy can be done in real-time and
> can be useful in many cases.

In the last RFC (not v1, but RFC) I asked for some kind of description
for each resource, and you and Arkadi have pushed back. Let's walk
through an example to see what I mean:

$ devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
  name kvd size 245760 size_valid true
  resources:
    name linear size 98304 occ 0
    name hash_double size 60416
    name hash_single size 87040

So this 2700 has 3 resources that can be managed -- some table or
resource or something named 'kvd' with linear, hash_double and
hash_single sub-resources. What are these names referring to? The above
output gives no description, and 'kvd' is not an industry term. Further,
what are these sizes that a user can control? The output contains no
units, no description, nothing. In short, the above output provides
random numbers associated with random names.
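
As a quick sanity check on those numbers, the three sub-resource sizes
do add up exactly to the kvd size (98304 + 60416 + 87040 = 245760). A
minimal C sketch of that arithmetic follows. The demo_* names are made up
for illustration and are not devlink APIs, and reading size_valid as
"the children fit in the parent" is an assumption on my part, not
something the output states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one line of the resource listing. */
struct demo_resource {
	const char *name;
	uint64_t size;
};

/* Sum the sizes of all sub-resources. */
static uint64_t demo_children_total(const struct demo_resource *children,
				    size_t n)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += children[i].size;
	return total;
}

/* Assumed validity rule: the sub-resources must fit in the parent. */
static int demo_children_fit(uint64_t parent_size,
			     const struct demo_resource *children, size_t n)
{
	return demo_children_total(children, n) <= parent_size;
}
```

Even with the arithmetic confirmed, nothing here says what unit the
sizes are in, which is the point of the question above.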

I can see dpipe tables exported by this device:

$ devlink dpipe header show pci/0000:03:00.0

pci/0000:03:00.0:
  name mlxsw_meta
  field:
    name erif_port bitwidth 32 mapping_type ifindex
    name l3_forward bitwidth 1
    name l3_drop bitwidth 1
    name adj_index bitwidth 32
    name adj_size bitwidth 32
    name adj_hash_index bitwidth 32

  name ipv6
  field:
    name destination ip bitwidth 128

  name ipv4
  field:
    name destination ip bitwidth 32

  name ethernet
  field:
    name destination mac bitwidth 48

but none mention 'kvd' or 'linear' or 'hash', and none of the other
various devlink options:

$ devlink
Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }
where  OBJECT := { dev | port | sb | monitor | dpipe }

seem to be related to resources.

So how does a user know what they are controlling by this 'resource'
option? Is the user expected to have a PRM or user guide on hand for the
specific device model that is being configured?

Again, I have no objections to the kvd, linear, hash, etc. terms as they do
relate to Mellanox products. But kvd/linear, for example, does correlate
to industry standard concepts in some way. My request is that the
resource listing guide the user in some way, stating what these
resources mean.

IMO the above output is not user friendly and having to keep a PRM on
hand for each device model is not a realistic solution.


* Re: BUG warnings in 4.14.9
From: alexander.levin @ 2017-12-27  4:25 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Willy Tarreau, Wei Wang, Martin KaFai Lau, Eric Dumazet,
	David S. Miller, Greg Kroah-Hartman, Chris Rankin,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <20171226205436.GA32546@splinter>

On Tue, Dec 26, 2017 at 10:54:37PM +0200, Ido Schimmel wrote:
>On Tue, Dec 26, 2017 at 07:59:55PM +0100, Willy Tarreau wrote:
>> Guys,
>>
>> Chris reported the bug below and confirmed that reverting commit
>> 9704f81 (ipv6: grab rt->rt6i_ref before allocating pcpu rt) seems to
>> have fixed the issue for him. This patch is a94b9367 in mainline.
>>
>> I personally have no opinion on the patch, just found it because it
>> was the only one touching this area between 4.14.8 and 4.14.9 :-)
>>
>> Should this be reverted or maybe fixed differently ?
>
>Maybe I'm missing something, but how come this patch even made its way
>into 4.14.y? It's part of a series to RCU-ify IPv6 FIB lookup that went
>into 4.15.
>
>Anyway, the mentioned bug was already fixed by commit 951f788a80ff
>("ipv6: fix a BUG in rt6_get_pcpu_route()") when the code was still in
>net-next.

Uh, you're right. Greg, please just revert 9704f81. Thanks!

-- 

Thanks,
Sasha


* Re: [PATCH net-next v2] net: sched: fix skb leak in dev_requeue_skb()
From: John Fastabend @ 2017-12-27  5:24 UTC (permalink / raw)
  To: Wei Yongjun, Jamal Hadi Salim, Cong Wang, Jiri Pirko; +Cc: netdev
In-Reply-To: <1514173746-165282-1-git-send-email-weiyongjun1@huawei.com>

On 12/24/2017 07:49 PM, Wei Yongjun wrote:
> When dev_requeue_skb() is called with a bulked skb list, only the
> first skb of the list is requeued to the qdisc layer, and the others
> are leaked without being freed.
> 
> TCP is broken by the skb leak, since an skb that is never freed is
> considered to be still in the host queue and is never retransmitted.
> This happens when dev_requeue_skb() is called from qdisc_restart().
>   qdisc_restart
>   |-- dequeue_skb
>   |-- sch_direct_xmit()
>       |-- dev_requeue_skb() <-- skb may be bulked
> 
> Fix dev_requeue_skb() to requeue the full bulked list.
> 
> Fixes: a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
> ---
> v1 -> v2: add net-next prefix
> ---

First, thanks for tracking this down.

> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 981c08f..0df2dbf 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -111,10 +111,16 @@ static inline void qdisc_enqueue_skb_bad_txq(struct Qdisc *q,
>  
>  static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
>  {
> -	__skb_queue_head(&q->gso_skb, skb);
> -	q->qstats.requeues++;
> -	qdisc_qstats_backlog_inc(q, skb);
> -	q->q.qlen++;	/* it's still part of the queue */
> +	while (skb) {
> +		struct sk_buff *next = skb->next;
> +
> +		__skb_queue_tail(&q->gso_skb, skb);

Was the change from __skb_queue_head to __skb_queue_tail here
intentional? We should re-queue packets to the head of the list.

> +		q->qstats.requeues++;
> +		qdisc_qstats_backlog_inc(q, skb);
> +		q->q.qlen++;	/* it's still part of the queue */
> +
> +		skb = next;
> +	}
>  	__netif_schedule(q);
>  
>  	return 0;
> @@ -124,13 +130,20 @@ static inline int dev_requeue_skb_locked(struct sk_buff *skb, struct Qdisc *q)
>  {
>  	spinlock_t *lock = qdisc_lock(q);
>  
> -	spin_lock(lock);
> -	__skb_queue_tail(&q->gso_skb, skb);
> -	spin_unlock(lock);
> +	while (skb) {
> +		struct sk_buff *next = skb->next;
> +
> +		spin_lock(lock);

In this case I suspect it's better to move the lock around the
while loop rather than grab and drop it repeatedly. I don't have
any data at this point, so OK either way, assuming the other head/tail
comment is addressed.

> +		__skb_queue_tail(&q->gso_skb, skb);

Same here *_tail should be *_head?

> +		spin_unlock(lock);
> +
> +		qdisc_qstats_cpu_requeues_inc(q);
> +		qdisc_qstats_cpu_backlog_inc(q, skb);
> +		qdisc_qstats_cpu_qlen_inc(q);
> +
> +		skb = next;
> +	}
>  
> -	qdisc_qstats_cpu_requeues_inc(q);
> -	qdisc_qstats_cpu_backlog_inc(q, skb);
> -	qdisc_qstats_cpu_qlen_inc(q);
>  	__netif_schedule(q);
>  
>  	return 0;
> 
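
To make the two comments above concrete, here is a minimal userspace
sketch, not kernel code. It splices a ->next-linked chain onto the head
of a queue in its original order and takes the lock once for the whole
chain. All demo_* names are hypothetical, and the lock is a stand-in
counter rather than a real spinlock. Pushing the chain elements to the
head one at a time would reverse their order, which is why the sketch
splices the chain as a unit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb linked through ->next. */
struct demo_pkt {
	int id;
	struct demo_pkt *next;
};

/* Hypothetical stand-in for the qdisc requeue list and its counters. */
struct demo_queue {
	struct demo_pkt *head;
	struct demo_pkt *tail;
	unsigned int qlen;
	unsigned int requeues;
	unsigned int lock_acquisitions;	/* counts how often the lock is taken */
};

static void demo_lock(struct demo_queue *q)
{
	q->lock_acquisitions++;
}

static void demo_unlock(struct demo_queue *q)
{
	(void)q;
}

/* Requeue a whole chain at the front of q, keeping the chain's order. */
static void demo_requeue_chain(struct demo_queue *q, struct demo_pkt *chain)
{
	struct demo_pkt *last = chain;
	unsigned int n = 1;

	if (!chain)
		return;

	/* Find the end of the chain and count its packets. */
	while (last->next) {
		last = last->next;
		n++;
	}

	demo_lock(q);			/* one acquisition for the whole chain */
	last->next = q->head;		/* old contents follow the chain */
	if (!q->head)
		q->tail = last;
	q->head = chain;
	q->qlen += n;
	q->requeues += n;
	demo_unlock(q);
}
```

Whether head or tail insertion is right for gso_skb is the open question
in this review; the sketch only shows what the head variant would look
like without reordering the bulked packets.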


* Re: [PATCH net-next v5 1/6] net: tcp: Add trace events for TCP congestion window tracing
From: Masami Hiramatsu @ 2017-12-27  5:43 UTC (permalink / raw)
  To: David Miller
  Cc: mingo, ian.mcdonald, vyasevich, stephen, rostedt, peterz, tglx,
	linux-kernel, hpa, gerrit, nhorman, dccp, netdev, linux-sctp, sfr
In-Reply-To: <20171226.185155.2132957966134649827.davem@davemloft.net>

On Tue, 26 Dec 2017 18:51:55 -0500 (EST)
David Miller <davem@davemloft.net> wrote:

> From: Masami Hiramatsu <mhiramat@kernel.org>
> Date: Fri, 22 Dec 2017 11:05:33 +0900
> 
> > This adds an event to trace TCP stat variables with
> > slightly intrusive trace-event. This uses ftrace/perf
> > event log buffer to trace those state, no needs to
> > prepare own ring-buffer, nor custom user apps.
> > 
> > User can use ftrace to trace this event as below;
> > 
> >   # cd /sys/kernel/debug/tracing
> >   # echo 1 > events/tcp/tcp_probe/enable
> >   (run workloads)
> >   # cat trace
> > 
> > Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
>  ...
> > +	TP_fast_assign(
> > +		const struct tcp_sock *tp = tcp_sk(sk);
> > +		const struct inet_sock *inet = inet_sk(sk);
> > +
> > +		memset(__entry->saddr, 0, sizeof(struct sockaddr_in6));
> > +		memset(__entry->daddr, 0, sizeof(struct sockaddr_in6));
> > +
> > +		if (sk->sk_family == AF_INET) {
> > +			struct sockaddr_in *v4 = (void *)__entry->saddr;
> > +
> > +			v4->sin_family = AF_INET;
> > +			v4->sin_port = inet->inet_sport;
> > +			v4->sin_addr.s_addr = inet->inet_saddr;
> > +			v4 = (void *)__entry->daddr;
> > +			v4->sin_family = AF_INET;
> > +			v4->sin_port = inet->inet_dport;
> > +			v4->sin_addr.s_addr = inet->inet_daddr;
> > +#if IS_ENABLED(CONFIG_IPV6)
> > +		} else if (sk->sk_family == AF_INET6) {
> 
> It looks like doing this ifdef test inside of a trace macro is very
> undesirable because it upsets sparse.
> 
> Please see the following commit which just went into 'net'.

OK, that shows me how to avoid it :)

I'll update the series.

Thank you,

> 
> ====================
> commit 6a6b0b9914e73a8a54253dd5f6f5e5dd5e4a756c
> Author: Mat Martineau <mathew.j.martineau@linux.intel.com>
> Date:   Thu Dec 21 10:29:09 2017 -0800
> 
>     tcp: Avoid preprocessor directives in tracepoint macro args
>     
>     Using a preprocessor directive to check for CONFIG_IPV6 in the middle of
>     a DECLARE_EVENT_CLASS macro's arg list causes sparse to report a series
>     of errors:
>     
>     ./include/trace/events/tcp.h:68:1: error: directive in argument list
>     ./include/trace/events/tcp.h:75:1: error: directive in argument list
>     ./include/trace/events/tcp.h:144:1: error: directive in argument list
>     ./include/trace/events/tcp.h:151:1: error: directive in argument list
>     ./include/trace/events/tcp.h:216:1: error: directive in argument list
>     ./include/trace/events/tcp.h:223:1: error: directive in argument list
>     ./include/trace/events/tcp.h:274:1: error: directive in argument list
>     ./include/trace/events/tcp.h:281:1: error: directive in argument list
>     
>     Once sparse finds an error, it stops printing warnings for the file it
>     is checking. This masks any sparse warnings that would normally be
>     reported for the core TCP code.
>     
>     Instead, handle the preprocessor conditionals in a couple of auxiliary
>     macros. This also has the benefit of reducing duplicate code.
>     
>     Cc: David Ahern <dsahern@gmail.com>
>     Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> index 07cccca..ab34c56 100644
> --- a/include/trace/events/tcp.h
> +++ b/include/trace/events/tcp.h
> @@ -25,6 +25,35 @@
>  		tcp_state_name(TCP_CLOSING),		\
>  		tcp_state_name(TCP_NEW_SYN_RECV))
>  
> +#define TP_STORE_V4MAPPED(__entry, saddr, daddr)		\
> +	do {							\
> +		struct in6_addr *pin6;				\
> +								\
> +		pin6 = (struct in6_addr *)__entry->saddr_v6;	\
> +		ipv6_addr_set_v4mapped(saddr, pin6);		\
> +		pin6 = (struct in6_addr *)__entry->daddr_v6;	\
> +		ipv6_addr_set_v4mapped(daddr, pin6);		\
> +	} while (0)
> +
> +#if IS_ENABLED(CONFIG_IPV6)
> +#define TP_STORE_ADDRS(__entry, saddr, daddr, saddr6, daddr6)		\
> +	do {								\
> +		if (sk->sk_family == AF_INET6) {			\
> +			struct in6_addr *pin6;				\
> +									\
> +			pin6 = (struct in6_addr *)__entry->saddr_v6;	\
> +			*pin6 = saddr6;					\
> +			pin6 = (struct in6_addr *)__entry->daddr_v6;	\
> +			*pin6 = daddr6;					\
> +		} else {						\
> +			TP_STORE_V4MAPPED(__entry, saddr, daddr);	\
> +		}							\
> +	} while (0)
> +#else
> +#define TP_STORE_ADDRS(__entry, saddr, daddr, saddr6, daddr6)	\
> +	TP_STORE_V4MAPPED(__entry, saddr, daddr)
> +#endif
> +
>  /*
>   * tcp event with arguments sk and skb
>   *
> @@ -50,7 +79,6 @@ DECLARE_EVENT_CLASS(tcp_event_sk_skb,
>  
>  	TP_fast_assign(
>  		struct inet_sock *inet = inet_sk(sk);
> -		struct in6_addr *pin6;
>  		__be32 *p32;
>  
>  		__entry->skbaddr = skb;
> @@ -65,20 +93,8 @@ DECLARE_EVENT_CLASS(tcp_event_sk_skb,
>  		p32 = (__be32 *) __entry->daddr;
>  		*p32 =  inet->inet_daddr;
>  
> -#if IS_ENABLED(CONFIG_IPV6)
> -		if (sk->sk_family == AF_INET6) {
> -			pin6 = (struct in6_addr *)__entry->saddr_v6;
> -			*pin6 = sk->sk_v6_rcv_saddr;
> -			pin6 = (struct in6_addr *)__entry->daddr_v6;
> -			*pin6 = sk->sk_v6_daddr;
> -		} else
> -#endif
> -		{
> -			pin6 = (struct in6_addr *)__entry->saddr_v6;
> -			ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
> -			pin6 = (struct in6_addr *)__entry->daddr_v6;
> -			ipv6_addr_set_v4mapped(inet->inet_daddr, pin6);
> -		}
> +		TP_STORE_ADDRS(__entry, inet->inet_saddr, inet->inet_daddr,
> +			      sk->sk_v6_rcv_saddr, sk->sk_v6_daddr);
>  	),
>  
>  	TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c",
> @@ -127,7 +143,6 @@ DECLARE_EVENT_CLASS(tcp_event_sk,
>  
>  	TP_fast_assign(
>  		struct inet_sock *inet = inet_sk(sk);
> -		struct in6_addr *pin6;
>  		__be32 *p32;
>  
>  		__entry->skaddr = sk;
> @@ -141,20 +156,8 @@ DECLARE_EVENT_CLASS(tcp_event_sk,
>  		p32 = (__be32 *) __entry->daddr;
>  		*p32 =  inet->inet_daddr;
>  
> -#if IS_ENABLED(CONFIG_IPV6)
> -		if (sk->sk_family == AF_INET6) {
> -			pin6 = (struct in6_addr *)__entry->saddr_v6;
> -			*pin6 = sk->sk_v6_rcv_saddr;
> -			pin6 = (struct in6_addr *)__entry->daddr_v6;
> -			*pin6 = sk->sk_v6_daddr;
> -		} else
> -#endif
> -		{
> -			pin6 = (struct in6_addr *)__entry->saddr_v6;
> -			ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
> -			pin6 = (struct in6_addr *)__entry->daddr_v6;
> -			ipv6_addr_set_v4mapped(inet->inet_daddr, pin6);
> -		}
> +		TP_STORE_ADDRS(__entry, inet->inet_saddr, inet->inet_daddr,
> +			       sk->sk_v6_rcv_saddr, sk->sk_v6_daddr);
>  	),
>  
>  	TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c",
> @@ -197,7 +200,6 @@ TRACE_EVENT(tcp_set_state,
>  
>  	TP_fast_assign(
>  		struct inet_sock *inet = inet_sk(sk);
> -		struct in6_addr *pin6;
>  		__be32 *p32;
>  
>  		__entry->skaddr = sk;
> @@ -213,20 +215,8 @@ TRACE_EVENT(tcp_set_state,
>  		p32 = (__be32 *) __entry->daddr;
>  		*p32 =  inet->inet_daddr;
>  
> -#if IS_ENABLED(CONFIG_IPV6)
> -		if (sk->sk_family == AF_INET6) {
> -			pin6 = (struct in6_addr *)__entry->saddr_v6;
> -			*pin6 = sk->sk_v6_rcv_saddr;
> -			pin6 = (struct in6_addr *)__entry->daddr_v6;
> -			*pin6 = sk->sk_v6_daddr;
> -		} else
> -#endif
> -		{
> -			pin6 = (struct in6_addr *)__entry->saddr_v6;
> -			ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
> -			pin6 = (struct in6_addr *)__entry->daddr_v6;
> -			ipv6_addr_set_v4mapped(inet->inet_daddr, pin6);
> -		}
> +		TP_STORE_ADDRS(__entry, inet->inet_saddr, inet->inet_daddr,
> +			       sk->sk_v6_rcv_saddr, sk->sk_v6_daddr);
>  	),
>  
>  	TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c oldstate=%s newstate=%s",
> @@ -256,7 +246,6 @@ TRACE_EVENT(tcp_retransmit_synack,
>  
>  	TP_fast_assign(
>  		struct inet_request_sock *ireq = inet_rsk(req);
> -		struct in6_addr *pin6;
>  		__be32 *p32;
>  
>  		__entry->skaddr = sk;
> @@ -271,20 +260,8 @@ TRACE_EVENT(tcp_retransmit_synack,
>  		p32 = (__be32 *) __entry->daddr;
>  		*p32 = ireq->ir_rmt_addr;
>  
> -#if IS_ENABLED(CONFIG_IPV6)
> -		if (sk->sk_family == AF_INET6) {
> -			pin6 = (struct in6_addr *)__entry->saddr_v6;
> -			*pin6 = ireq->ir_v6_loc_addr;
> -			pin6 = (struct in6_addr *)__entry->daddr_v6;
> -			*pin6 = ireq->ir_v6_rmt_addr;
> -		} else
> -#endif
> -		{
> -			pin6 = (struct in6_addr *)__entry->saddr_v6;
> -			ipv6_addr_set_v4mapped(ireq->ir_loc_addr, pin6);
> -			pin6 = (struct in6_addr *)__entry->daddr_v6;
> -			ipv6_addr_set_v4mapped(ireq->ir_rmt_addr, pin6);
> -		}
> +		TP_STORE_ADDRS(__entry, ireq->ir_loc_addr, ireq->ir_rmt_addr,
> +			      ireq->ir_v6_loc_addr, ireq->ir_v6_rmt_addr);
>  	),
>  
>  	TP_printk("sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c",
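
A minimal userspace sketch of the pattern used in the commit above,
assuming nothing beyond standard C: the conditional is resolved once,
where the helper macro is defined, so no preprocessor directive ever
appears inside another macro's argument list. DEMO_HAVE_IPV6,
DEMO_ASSIGN and the other demo_* names are hypothetical stand-ins, not
the kernel's TP_* macros.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_IPV6). */
#ifndef DEMO_HAVE_IPV6
#define DEMO_HAVE_IPV6 1
#endif

struct demo_entry {
	uint8_t saddr_v6[16];
	uint8_t daddr_v6[16];
};

/* Write the IPv4-mapped form ::ffff:a.b.c.d of a 4-byte address. */
static void demo_set_v4mapped(const uint8_t v4[4], uint8_t out[16])
{
	memset(out, 0, 10);
	out[10] = 0xff;
	out[11] = 0xff;
	memcpy(out + 12, v4, 4);
}

#define DEMO_STORE_V4MAPPED(entry, s4, d4)			\
	do {							\
		demo_set_v4mapped((s4), (entry)->saddr_v6);	\
		demo_set_v4mapped((d4), (entry)->daddr_v6);	\
	} while (0)

/* The #if lives here, outside of any macro argument list. */
#if DEMO_HAVE_IPV6
#define DEMO_STORE_ADDRS(entry, is_v6, s4, d4, s6, d6)		\
	do {							\
		if (is_v6) {					\
			memcpy((entry)->saddr_v6, (s6), 16);	\
			memcpy((entry)->daddr_v6, (d6), 16);	\
		} else {					\
			DEMO_STORE_V4MAPPED(entry, s4, d4);	\
		}						\
	} while (0)
#else
#define DEMO_STORE_ADDRS(entry, is_v6, s4, d4, s6, d6)		\
	DEMO_STORE_V4MAPPED(entry, s4, d4)
#endif

/* Stand-in for a macro that takes a block of code as its argument. */
#define DEMO_ASSIGN(body) do { body } while (0)

static void demo_fill(struct demo_entry *e, int is_v6,
		      const uint8_t s4[4], const uint8_t d4[4],
		      const uint8_t s6[16], const uint8_t d6[16])
{
	/* No directive inside the argument list: the helper hides it. */
	DEMO_ASSIGN(
		DEMO_STORE_ADDRS(e, is_v6, s4, d4, s6, d6);
	);
}
```

Building with -DDEMO_HAVE_IPV6=0 selects the v4-mapped-only variant
without touching demo_fill().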


-- 
Masami Hiramatsu <mhiramat@kernel.org>


* [PATCH 1/3] staging: irda: fix type from "unsigned" to "unsigned int"
From: JI-HUN KIM @ 2017-12-27  5:52 UTC (permalink / raw)
  To: samuel
  Cc: devel, gregkh, kernel-janitors, linux-kernel, shreeya.patel23498,
	netdev, davem

Clean up checkpatch warning:
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'

Signed-off-by: JI-HUN KIM <jihuun.k@gmail.com>
---
 drivers/staging/irda/drivers/esi-sir.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/irda/drivers/esi-sir.c b/drivers/staging/irda/drivers/esi-sir.c
index eb7aa64..a12cf55 100644
--- a/drivers/staging/irda/drivers/esi-sir.c
+++ b/drivers/staging/irda/drivers/esi-sir.c
@@ -39,7 +39,7 @@
 
 static int esi_open(struct sir_dev *);
 static int esi_close(struct sir_dev *);
-static int esi_change_speed(struct sir_dev *, unsigned);
+static int esi_change_speed(struct sir_dev *, unsigned int);
 static int esi_reset(struct sir_dev *);
 
 static struct dongle_driver esi = {
@@ -93,7 +93,7 @@ static int esi_close(struct sir_dev *dev)
  * Apparently (see old esi-driver) no delays are needed here...
  *
  */
-static int esi_change_speed(struct sir_dev *dev, unsigned speed)
+static int esi_change_speed(struct sir_dev *dev, unsigned int speed)
 {
 	int ret = 0;
 	int dtr, rts;
-- 
2.10.1 (Apple Git-78)


* [PATCH 2/3] staging: irda: add spaces around '|' operator
From: JI-HUN KIM @ 2017-12-27  5:54 UTC (permalink / raw)
  To: samuel
  Cc: gregkh, davem, shreeya.patel23498, netdev, devel, kernel-janitors,
	linux-kernel

Clean up checkpatch warning:
CHECK: spaces preferred around that '|' (ctx:VxV)

Signed-off-by: JI-HUN KIM <jihuun.k@gmail.com>
---
 drivers/staging/irda/drivers/esi-sir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/irda/drivers/esi-sir.c b/drivers/staging/irda/drivers/esi-sir.c
index a12cf55..00866a3 100644
--- a/drivers/staging/irda/drivers/esi-sir.c
+++ b/drivers/staging/irda/drivers/esi-sir.c
@@ -69,7 +69,7 @@ static int esi_open(struct sir_dev *dev)
 	/* Power up and set dongle to 9600 baud */
 	sirdev_set_dtr_rts(dev, FALSE, TRUE);
 
-	qos->baud_rate.bits &= IR_9600|IR_19200|IR_115200;
+	qos->baud_rate.bits &= IR_9600 | IR_19200 | IR_115200;
 	qos->min_turn_time.bits = 0x01; /* Needs at least 10 ms */
 	irda_qos_bits_to_value(qos);
 
-- 
2.10.1 (Apple Git-78)


* [PATCH 3/3] staging: irda: separate multiple assignments
From: JI-HUN KIM @ 2017-12-27  5:55 UTC (permalink / raw)
  To: samuel
  Cc: gregkh, davem, shreeya.patel23498, netdev, devel, kernel-janitors,
	linux-kernel

Clean up checkpatch warning:
CHECK: multiple assignments should be avoided

Signed-off-by: JI-HUN KIM <jihuun.k@gmail.com>
---
 drivers/staging/irda/drivers/esi-sir.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/irda/drivers/esi-sir.c b/drivers/staging/irda/drivers/esi-sir.c
index 00866a3..01097f1 100644
--- a/drivers/staging/irda/drivers/esi-sir.c
+++ b/drivers/staging/irda/drivers/esi-sir.c
@@ -104,7 +104,8 @@ static int esi_change_speed(struct sir_dev *dev, unsigned int speed)
 		rts = FALSE;
 		break;
 	case 115200:
-		dtr = rts = TRUE;
+		dtr = TRUE;
+		rts = TRUE;
 		break;
 	default:
 		ret = -EINVAL;
-- 
2.10.1 (Apple Git-78)



* Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry
From: Masami Hiramatsu @ 2017-12-27  5:56 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Josef Bacik, rostedt, mingo, davem, netdev,
	linux-kernel, ast, kernel-team, daniel, linux-btrfs, darrick.wong,
	Josef Bacik, Akinobu Mita
In-Reply-To: <20171227015730.jjggymg4uqllteuy@ast-mbp>

On Tue, 26 Dec 2017 17:57:32 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Tue, Dec 26, 2017 at 04:46:59PM +0900, Masami Hiramatsu wrote:
> > Check whether error injectable event is on function entry or not.
> > Currently it checks the event is ftrace-based kprobes or not,
> > but that is wrong. It should check if the event is on the entry
> > of target function. Since error injection will override a function
> > to just return with modified return value, that operation must
> > be done before the target function starts making stackframe.
> > 
> > As a side effect, bpf error injection is no need to depend on
> > function-tracer. It can work with sw-breakpoint based kprobe
> > events too.
> > 
> > Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> > ---
> >  kernel/trace/Kconfig        |    2 --
> >  kernel/trace/bpf_trace.c    |    6 +++---
> >  kernel/trace/trace_kprobe.c |    8 +++++---
> >  kernel/trace/trace_probe.h  |   12 ++++++------
> >  4 files changed, 14 insertions(+), 14 deletions(-)
> > 
> > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> > index ae3a2d519e50..6400e1bf97c5 100644
> > --- a/kernel/trace/Kconfig
> > +++ b/kernel/trace/Kconfig
> > @@ -533,9 +533,7 @@ config FUNCTION_PROFILER
> >  config BPF_KPROBE_OVERRIDE
> >  	bool "Enable BPF programs to override a kprobed function"
> >  	depends on BPF_EVENTS
> > -	depends on KPROBES_ON_FTRACE
> >  	depends on HAVE_KPROBE_OVERRIDE
> > -	depends on DYNAMIC_FTRACE_WITH_REGS
> >  	default n
> >  	help
> >  	 Allows BPF to override the execution of a probed function and
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index f6d2327ecb59..d663660f8392 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -800,11 +800,11 @@ int perf_event_attach_bpf_prog(struct perf_event *event,
> >  	int ret = -EEXIST;
> >  
> >  	/*
> > -	 * Kprobe override only works for ftrace based kprobes, and only if they
> > -	 * are on the opt-in list.
> > +	 * Kprobe override only works if they are on the function entry,
> > +	 * and only if they are on the opt-in list.
> >  	 */
> >  	if (prog->kprobe_override &&
> > -	    (!trace_kprobe_ftrace(event->tp_event) ||
> > +	    (!trace_kprobe_on_func_entry(event->tp_event) ||
> >  	     !trace_kprobe_error_injectable(event->tp_event)))
> >  		return -EINVAL;
> >  
> > diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> > index 91f4b57dab82..265e3e27e8dc 100644
> > --- a/kernel/trace/trace_kprobe.c
> > +++ b/kernel/trace/trace_kprobe.c
> > @@ -88,13 +88,15 @@ static nokprobe_inline unsigned long trace_kprobe_nhit(struct trace_kprobe *tk)
> >  	return nhit;
> >  }
> >  
> > -int trace_kprobe_ftrace(struct trace_event_call *call)
> > +bool trace_kprobe_on_func_entry(struct trace_event_call *call)
> >  {
> >  	struct trace_kprobe *tk = (struct trace_kprobe *)call->data;
> > -	return kprobe_ftrace(&tk->rp.kp);
> > +
> > +	return kprobe_on_func_entry(tk->rp.kp.addr, tk->rp.kp.symbol_name,
> > +				    tk->rp.kp.offset);
> 
> That would be nice, but did you test this?

Yes, because jprobe, which was the only official user of kprobes for modifying
the execution path, did the same check (and kretprobe also does it).
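
As a rough illustration of what "on the function entry" means here, below is a
toy user-space sketch, not the kernel implementation. The symbol table and the
names toy_sym, toy_lookup and toy_on_func_entry are all hypothetical; it only
assumes that the check reduces to "the probed address has offset zero within
its enclosing symbol".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol record: [start, start + size) covers one function. */
struct toy_sym {
	uint64_t start;
	uint64_t size;
};

/* Find the symbol containing addr, or NULL if no symbol covers it. */
static const struct toy_sym *toy_lookup(const struct toy_sym *tab, size_t n,
					uint64_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= tab[i].start && addr - tab[i].start < tab[i].size)
			return &tab[i];
	return NULL;
}

/* True only when addr is exactly the first byte of a known function. */
static bool toy_on_func_entry(const struct toy_sym *tab, size_t n,
			      uint64_t addr)
{
	const struct toy_sym *s;

	assert(tab != NULL || n == 0);
	s = toy_lookup(tab, n, addr);
	return s != NULL && addr == s->start;
}
```

With a table holding a function at 0x1000 of size 0x40, the sketch accepts
0x1000 and rejects 0x1004, which is the distinction the override check needs.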

> My understanding that kprobe will restore all regs and
> here we need to override return ip _and_ value.

Yes, no problem. kprobe restores all regs from pt_regs, including regs->ip.
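
To make that concrete, here is a toy user-space model, not kernel code. The
struct toy_regs and the toy_* helpers are hypothetical stand-ins; the only
assumption is that whatever the handler writes into the saved frame is what
gets reloaded when the probe returns, so both the resume address and the
return value can be overridden.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the saved register frame. */
struct toy_regs {
	uint64_t ip;	/* address where execution resumes */
	uint64_t ax;	/* return-value register */
};

/* Handler side: request a new resume address and a new return value. */
static void toy_override(struct toy_regs *saved, uint64_t new_ip, uint64_t rc)
{
	assert(saved);
	saved->ip = new_ip;
	saved->ax = rc;
}

/* Probe exit path: the live state is reloaded from the saved frame. */
static struct toy_regs toy_restore(const struct toy_regs *saved)
{
	return *saved;
}

/* One probe hit at entry_ip with an override applied by the handler. */
static struct toy_regs toy_probe_hit(uint64_t entry_ip, uint64_t new_ip,
				     uint64_t rc)
{
	struct toy_regs saved = { .ip = entry_ip, .ax = 0 };

	toy_override(&saved, new_ip, rc);
	return toy_restore(&saved);
}
```

In this model, a hit at 0x1000 with an override to 0x2000 and rc = -12 resumes
at 0x2000 with -12 in the return-value register.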

> Could you add a patch with the test the way Josef did
> or describe the steps to test this new mode?

Do you mean the patch below? If so, it should work without any change.

 [PATCH v10 4/5] samples/bpf: add a test for bpf_override_return

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply

* RE: [PATCH net] bnx2x: Improve reliability in case of nested PCI errors
From: Shaikh, Shahed @ 2017-12-27  6:24 UTC (permalink / raw)
  To: Guilherme G. Piccoli, Elior, Ariel, Dept-Eng Everest Linux L2
  Cc: netdev@vger.kernel.org, gpiccoli@protonmail.ch
In-Reply-To: <20171222150139.10244-1-gpiccoli@linux.vnet.ibm.com>

> -----Original Message-----
> From: Guilherme G. Piccoli [mailto:gpiccoli@linux.vnet.ibm.com]
> Sent: Friday, December 22, 2017 8:32 PM
> To: Elior, Ariel <Ariel.Elior@cavium.com>; Dept-Eng Everest Linux L2 <Dept-
> EngEverestLinuxL2@cavium.com>
> Cc: netdev@vger.kernel.org; gpiccoli@linux.vnet.ibm.com;
> gpiccoli@protonmail.ch
> Subject: [PATCH net] bnx2x: Improve reliability in case of nested PCI errors
> 
> While in recovery process of PCI error (called EEH on PowerPC arch), another
> PCI transaction could be corrupted causing a situation of nested PCI errors. Also,
> this scenario could be reproduced with error injection mechanisms (for debug
> purposes).
> 
> We observe that in case of nested PCI errors, bnx2x might attempt to initialize
> its shmem and cause a kernel crash due to bad addresses read from MCP.
> Multiple different stack traces were observed depending on the point the second
> PCI error happens.
> 
> This patch avoids the crashes by:
> 
>  * failing PCI recovery in case of nested errors (since multiple  PCI errors in a row
> are not expected to lead to a functional  adapter anyway), and by,
> 
>  * preventing access to adapter FW when MCP is failed (we mark it as  failed
> when shmem cannot get initialized properly).
> 
> Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>

Acked-by: Shahed Shaikh <Shahed.Shaikh@cavium.com>

Thanks,
Shahed

^ permalink raw reply

* [PATCH net-next] cxgb4: use CLIP with LIP6 on T6 for TCAM filters
From: Ganesh Goudar @ 2017-12-27  7:42 UTC (permalink / raw)
  To: netdev, davem
  Cc: nirranjan, indranil, venkatesh, Ganesh Goudar, Kumar Sanghvi

On T6, LIP compression is always enabled for IPv6 and uncompressed
IPv6 for LIP is not supported. So, for IPv6 TCAM filters on T6,
add LIP6 to CLIP on filter creation, and release the same on filter
deletion.
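
The get-on-create / release-on-delete pairing can be sketched as a toy
user-space model. This is not driver code: toy_filter, toy_clip_refs and the
toy_* helpers are hypothetical, and recording the held reference in the filter
itself is an assumption made only to keep the example balanced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical filter record; holds_clip is an illustration-only field. */
struct toy_filter {
	bool ipv6;		/* filter matches IPv6 */
	bool hash;		/* hash filter (otherwise TCAM) */
	bool holds_clip;	/* a CLIP reference was taken at creation */
};

/* Stand-in for the number of outstanding CLIP references. */
static int toy_clip_refs;

/* IPv6 filters need a CLIP entry if they are hash filters or run on T6. */
static bool toy_needs_clip(bool is_t6, const struct toy_filter *f)
{
	return f->ipv6 && (f->hash || is_t6);
}

/* Creation takes the reference when one is needed. */
static void toy_filter_create(bool is_t6, struct toy_filter *f)
{
	f->holds_clip = toy_needs_clip(is_t6, f);
	if (f->holds_clip)
		toy_clip_refs++;
}

/* Deletion drops exactly the reference that creation took. */
static void toy_filter_delete(struct toy_filter *f)
{
	if (f->holds_clip) {
		assert(toy_clip_refs > 0);
		toy_clip_refs--;
		f->holds_clip = false;
	}
}
```

In this model an IPv6 TCAM filter takes a reference only on T6, and deleting
it brings the count back to where it was before creation.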

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
index 5980f30..6829de9 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
@@ -694,7 +694,7 @@ void clear_filter(struct adapter *adap, struct filter_entry *f)
 	if (f->smt)
 		cxgb4_smt_release(f->smt);
 
-	if (f->fs.hash && f->fs.type)
+	if ((f->fs.hash || is_t6(adap->params.chip)) && f->fs.type)
 		cxgb4_clip_release(f->dev, (const u32 *)&f->fs.val.lip, 1);
 
 	/* The zeroing of the filter rule below clears the filter valid,
@@ -1291,6 +1291,16 @@ int __cxgb4_set_filter(struct net_device *dev, int filter_id,
 	if (f->valid)
 		clear_filter(adapter, f);
 
+	if (is_t6(adapter->params.chip) && fs->type &&
+	    ipv6_addr_type((const struct in6_addr *)fs->val.lip) !=
+	    IPV6_ADDR_ANY) {
+		ret = cxgb4_clip_get(dev, (const u32 *)&fs->val.lip, 1);
+		if (ret) {
+			cxgb4_clear_ftid(&adapter->tids, filter_id, PF_INET6);
+			return ret;
+		}
+	}
+
 	/* Convert the filter specification into our internal format.
 	 * We copy the PF/VF specification into the Outer VLAN field
 	 * here so the rest of the code -- including the interface to
-- 
2.1.0

^ permalink raw reply related

* [PATCH iproute2-next 01/10] rdma: Reduce scope of _dev_map_lookup call
From: Leon Romanovsky @ 2017-12-27  7:57 UTC (permalink / raw)
  To: David Ahern; +Cc: Leon Romanovsky, netdev, Stephen Hemminger
In-Reply-To: <20171227075759.15289-1-leon@kernel.org>

From: Leon Romanovsky <leonro@mellanox.com>

There are no external users of the _dev_map_lookup function,
so let's limit its scope to be local.

Fixes: 40df8263a0f0 ("rdma: Add dev object")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 rdma/rdma.h  | 1 -
 rdma/utils.c | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/rdma/rdma.h b/rdma/rdma.h
index d551eb29..c07493c9 100644
--- a/rdma/rdma.h
+++ b/rdma/rdma.h
@@ -78,7 +78,6 @@ int rd_exec_cmd(struct rd *rd, const struct rd_cmd *c, const char *str);
  */
 void rd_free_devmap(struct rd *rd);
 struct dev_map *dev_map_lookup(struct rd *rd, bool allow_port_index);
-struct dev_map *_dev_map_lookup(struct rd *rd, const char *dev_name);
 
 /*
  * Netlink
diff --git a/rdma/utils.c b/rdma/utils.c
index eb4377cf..6ce1fd70 100644
--- a/rdma/utils.c
+++ b/rdma/utils.c
@@ -236,7 +236,7 @@ int rd_recv_msg(struct rd *rd, mnl_cb_t callback, void *data, unsigned int seq)
 	return ret;
 }
 
-struct dev_map *_dev_map_lookup(struct rd *rd, const char *dev_name)
+static struct dev_map *_dev_map_lookup(struct rd *rd, const char *dev_name)
 {
 	struct dev_map *dev_map;
 
-- 
2.15.1

^ permalink raw reply related

* [PATCH iproute2-next 00/10] RDMAtool cleanup and refactoring code
From: Leon Romanovsky @ 2017-12-27  7:57 UTC (permalink / raw)
  To: David Ahern; +Cc: Leon Romanovsky, netdev, Stephen Hemminger

From: Leon Romanovsky <leonro@mellanox.com>

Hi,

The following patchset comes as preparation for more complex code,
which will add resource tracking visibility to rdmatool; the kernel
part is under review by the RDMA community [1].

Thanks

[1] https://marc.info/?l=linux-rdma&m=151412508816802&w=2

Leon Romanovsky (10):
  rdma: Reduce scope of _dev_map_lookup call
  rdma: Protect dev_map_lookup from wrong input
  rdma: Move per-device handler function to generic code
  rdma: Fix misspelled SYS_IMAGE_GUID
  rdma: Check that port index exists before operate on link layer
  rdma: Print supplied device name in case of wrong name
  rdma: Get rid of dev_map_free call
  rdma: Rename free function to be rd_cleanup
  rdma: Rename rd_free_devmap to be rd_free
  rdma: Move link execution logic to common code

 rdma/dev.c   |  28 +----------------
 rdma/link.c  |  51 +++---------------------------
 rdma/rdma.c  |   7 ++---
 rdma/rdma.h  |   5 +--
 rdma/utils.c | 100 ++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 5 files changed, 100 insertions(+), 91 deletions(-)

^ permalink raw reply
