Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next] net: bridge: add support for per-port vlan stats
From: Stephen Hemminger @ 2018-10-12 15:39 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, davem, bridge, Roopa Prabhu
In-Reply-To: <20181012104116.12751-1-nikolay@cumulusnetworks.com>

On Fri, 12 Oct 2018 13:41:16 +0300
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:

> This patch adds an option to have per-port vlan stats instead of the
> default global stats. The option can be set only when there are no port
> vlans in the bridge since we need to allocate the stats if it is set
> when vlans are being added to ports (and respectively free them
> when being deleted). Also bump RTNL_MAX_TYPE as the bridge is the
> largest user of options. The current stats design allows us to add
> these without any changes to the fast-path, it all comes down to
> the per-vlan stats pointer which, if this option is enabled, will
> be allocated for each port vlan instead of using the global bridge-wide
> one.
> 
> CC: bridge@lists.linux-foundation.org
> CC: Roopa Prabhu <roopa@cumulusnetworks.com>
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

Yes, per-vlan stats could be quite useful.
Most cases of statistics in the kernel are always on, and some API's
get them (and skip others).  Other than the additional memory overhead, why
not make the statistics as always on.

Also, is there any chance of creating too much data in a netlink
message if there are 4K-1 VLAN's?

^ permalink raw reply

* Re: [PATCH net] rxrpc: Fix incorrect conditional on IPV6
From: David Howells @ 2018-10-12 15:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: dhowells, Arnd Bergmann, netdev, linux-afs
In-Reply-To: <9ee1a767-b865-6afa-91a2-311422b8338d@gmail.com>

Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Nit : Correct attribution would require a Reported-by: tag

Point.  I'll repost it with a To: line also, I'm waiting to see if Arnd will
ack it.

David

^ permalink raw reply

* [PATCH iproute2] json: make 0xhex handle u64
From: Sabrina Dubroca @ 2018-10-12 15:34 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, Sabrina Dubroca, David Ahern

Stephen converted macsec's sci to use 0xhex, but 0xhex handles
unsigned int's, not 64 bits ints. Thus, the output of the "ip macsec
show" command is mangled, with half of the SCI replaced with 0s:

# ip macsec show
11: macsec0: [...]
    cipher suite: GCM-AES-128, using ICV length 16
    TXSC: 0000000001560001 on SA 0

# ip -d link show macsec0
11: macsec0@ens3: [...]
    link/ether 52:54:00:12:01:56 brd ff:ff:ff:ff:ff:ff promiscuity 0 
    macsec sci 5254001201560001 [...]

where TXSC and sci should match.

Fixes: c0b904de6211 ("macsec: support JSON")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 include/json_print.h | 2 +-
 lib/json_print.c     | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/json_print.h b/include/json_print.h
index 78a6c83fb516..218da31a73fe 100644
--- a/include/json_print.h
+++ b/include/json_print.h
@@ -66,7 +66,7 @@ _PRINT_FUNC(uint, unsigned int);
 _PRINT_FUNC(u64, uint64_t);
 _PRINT_FUNC(hu, unsigned short);
 _PRINT_FUNC(hex, unsigned int);
-_PRINT_FUNC(0xhex, unsigned int);
+_PRINT_FUNC(0xhex, unsigned long long int);
 _PRINT_FUNC(luint, unsigned long int);
 _PRINT_FUNC(lluint, unsigned long long int);
 _PRINT_FUNC(float, double);
diff --git a/lib/json_print.c b/lib/json_print.c
index eed109c56401..f7ef41c1570f 100644
--- a/lib/json_print.c
+++ b/lib/json_print.c
@@ -172,12 +172,12 @@ void print_color_0xhex(enum output_type type,
 		       enum color_attr color,
 		       const char *key,
 		       const char *fmt,
-		       unsigned int hex)
+		       unsigned long long hex)
 {
 	if (_IS_JSON_CONTEXT(type)) {
 		SPRINT_BUF(b1);
 
-		snprintf(b1, sizeof(b1), "%#x", hex);
+		snprintf(b1, sizeof(b1), "%#llx", hex);
 		print_string(PRINT_JSON, key, NULL, b1);
 	} else if (_IS_FP_CONTEXT(type)) {
 		color_fprintf(stdout, color, fmt, hex);
-- 
2.19.1

^ permalink raw reply related

* [PATCH iproute2] macsec: fix off-by-one when parsing attributes
From: Sabrina Dubroca @ 2018-10-12 15:34 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, Sabrina Dubroca, David Ahern

I seem to have had a massive brainfart with uses of
parse_rtattr_nested(). The rtattr* array must have MAX+1 elements, and
the call to parse_rtattr_nested must have MAX as its bound. Let's fix
those.

Fixes: b26fc590ce62 ("ip: add MACsec support")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 ip/ipmacsec.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/ip/ipmacsec.c b/ip/ipmacsec.c
index fa56e0eee774..007ce5404788 100644
--- a/ip/ipmacsec.c
+++ b/ip/ipmacsec.c
@@ -727,7 +727,7 @@ static void print_txsc_stats(const char *prefix, struct rtattr *attr)
 	if (!attr || show_stats == 0)
 		return;
 
-	parse_rtattr_nested(stats, MACSEC_TXSC_STATS_ATTR_MAX + 1, attr);
+	parse_rtattr_nested(stats, MACSEC_TXSC_STATS_ATTR_MAX, attr);
 
 	print_stats(prefix, txsc_stats_names, NUM_MACSEC_TXSC_STATS_ATTR,
 		    stats);
@@ -751,7 +751,7 @@ static void print_secy_stats(const char *prefix, struct rtattr *attr)
 	if (!attr || show_stats == 0)
 		return;
 
-	parse_rtattr_nested(stats, MACSEC_SECY_STATS_ATTR_MAX + 1, attr);
+	parse_rtattr_nested(stats, MACSEC_SECY_STATS_ATTR_MAX, attr);
 
 	print_stats(prefix, secy_stats_names,
 		    NUM_MACSEC_SECY_STATS_ATTR, stats);
@@ -772,7 +772,7 @@ static void print_rxsa_stats(const char *prefix, struct rtattr *attr)
 	if (!attr || show_stats == 0)
 		return;
 
-	parse_rtattr_nested(stats, MACSEC_SA_STATS_ATTR_MAX + 1, attr);
+	parse_rtattr_nested(stats, MACSEC_SA_STATS_ATTR_MAX, attr);
 
 	print_stats(prefix, rxsa_stats_names, NUM_MACSEC_SA_STATS_ATTR, stats);
 }
@@ -789,7 +789,7 @@ static void print_txsa_stats(const char *prefix, struct rtattr *attr)
 	if (!attr || show_stats == 0)
 		return;
 
-	parse_rtattr_nested(stats, MACSEC_SA_STATS_ATTR_MAX + 1, attr);
+	parse_rtattr_nested(stats, MACSEC_SA_STATS_ATTR_MAX, attr);
 
 	print_stats(prefix, txsa_stats_names, NUM_MACSEC_SA_STATS_ATTR, stats);
 }
@@ -817,7 +817,7 @@ static void print_tx_sc(const char *prefix, __u64 sci, __u8 encoding_sa,
 		bool state;
 
 		open_json_object(NULL);
-		parse_rtattr_nested(sa_attr, MACSEC_SA_ATTR_MAX + 1, a);
+		parse_rtattr_nested(sa_attr, MACSEC_SA_ATTR_MAX, a);
 		state = rta_getattr_u8(sa_attr[MACSEC_SA_ATTR_ACTIVE]);
 
 		print_string(PRINT_FP, NULL, "%s", prefix);
@@ -858,7 +858,7 @@ static void print_rxsc_stats(const char *prefix, struct rtattr *attr)
 	if (!attr || show_stats == 0)
 		return;
 
-	parse_rtattr_nested(stats, MACSEC_RXSC_STATS_ATTR_MAX + 1, attr);
+	parse_rtattr_nested(stats, MACSEC_RXSC_STATS_ATTR_MAX, attr);
 
 	print_stats(prefix, rxsc_stats_names,
 		    NUM_MACSEC_RXSC_STATS_ATTR, stats);
@@ -885,7 +885,7 @@ static void print_rx_sc(const char *prefix, __be64 sci, __u8 active,
 		bool state;
 
 		open_json_object(NULL);
-		parse_rtattr_nested(sa_attr, MACSEC_SA_ATTR_MAX + 1, a);
+		parse_rtattr_nested(sa_attr, MACSEC_SA_ATTR_MAX, a);
 		state = rta_getattr_u8(sa_attr[MACSEC_SA_ATTR_ACTIVE]);
 
 		print_string(PRINT_FP, NULL, "%s", prefix);
@@ -918,7 +918,7 @@ static void print_rxsc_list(struct rtattr *sc)
 
 		open_json_object(NULL);
 
-		parse_rtattr_nested(sc_attr, MACSEC_RXSC_ATTR_MAX + 1, c);
+		parse_rtattr_nested(sc_attr, MACSEC_RXSC_ATTR_MAX, c);
 		print_rx_sc("    ",
 			    rta_getattr_u64(sc_attr[MACSEC_RXSC_ATTR_SCI]),
 			    rta_getattr_u32(sc_attr[MACSEC_RXSC_ATTR_ACTIVE]),
@@ -958,7 +958,7 @@ static int process(const struct sockaddr_nl *who, struct nlmsghdr *n,
 	}
 
 	ifindex = rta_getattr_u32(attrs[MACSEC_ATTR_IFINDEX]);
-	parse_rtattr_nested(attrs_secy, MACSEC_SECY_ATTR_MAX + 1,
+	parse_rtattr_nested(attrs_secy, MACSEC_SECY_ATTR_MAX,
 			    attrs[MACSEC_ATTR_SECY]);
 
 	if (!validate_secy_dump(attrs_secy)) {
-- 
2.19.1

^ permalink raw reply related

* [PATCH net-next 3/4] s390/qeth: add support for IPv6 TSO
From: Julian Wiedmann @ 2018-10-12 15:27 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
	Stefan Raspl, Ursula Braun, Julian Wiedmann
In-Reply-To: <20181012152715.6153-1-jwi@linux.ibm.com>

This adds TSO6 support for L3 qeth devices.
Just like for standard IPv6 traffic, TSO6 doesn't use IP offload and
thus runs over the normal qeth_xmit() path.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
---
 drivers/s390/net/qeth_core.h      |  2 ++
 drivers/s390/net/qeth_core_main.c | 49 +++++++++++++++++++++++++++++++++++----
 drivers/s390/net/qeth_l3_main.c   | 45 +++++++++++++++--------------------
 3 files changed, 65 insertions(+), 31 deletions(-)

diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h
index cd44ff2df6fe..c1278785a13c 100644
--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -1047,6 +1047,8 @@ int qeth_vm_request_mac(struct qeth_card *card);
 int qeth_add_hw_header(struct qeth_card *card, struct sk_buff *skb,
 		       struct qeth_hdr **hdr, unsigned int hdr_len,
 		       unsigned int proto_len, unsigned int *elements);
+void qeth_fill_tso_ext(struct qeth_hdr_tso *hdr, unsigned int payload_len,
+		       struct sk_buff *skb, unsigned int proto_len);
 int qeth_xmit(struct qeth_card *card, struct sk_buff *skb,
 	      struct qeth_qdio_out_q *queue, int ipv, int cast_type,
 	      void (*fill_header)(struct qeth_card *card, struct qeth_hdr *hdr,
diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index ab022b6a41ea..3274f13aad57 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -4088,15 +4088,31 @@ int qeth_do_send_packet(struct qeth_card *card, struct qeth_qdio_out_q *queue,
 }
 EXPORT_SYMBOL_GPL(qeth_do_send_packet);
 
+void qeth_fill_tso_ext(struct qeth_hdr_tso *hdr, unsigned int payload_len,
+		       struct sk_buff *skb, unsigned int proto_len)
+{
+	struct qeth_hdr_ext_tso *ext = &hdr->ext;
+
+	ext->hdr_tot_len = sizeof(*ext);
+	ext->imb_hdr_no = 1;
+	ext->hdr_type = 1;
+	ext->hdr_version = 1;
+	ext->hdr_len = 28;
+	ext->payload_len = payload_len;
+	ext->mss = skb_shinfo(skb)->gso_size;
+	ext->dg_hdr_len = proto_len;
+}
+EXPORT_SYMBOL_GPL(qeth_fill_tso_ext);
+
 int qeth_xmit(struct qeth_card *card, struct sk_buff *skb,
 	      struct qeth_qdio_out_q *queue, int ipv, int cast_type,
 	      void (*fill_header)(struct qeth_card *card, struct qeth_hdr *hdr,
 				  struct sk_buff *skb, int ipv, int cast_type,
 				  unsigned int data_len))
 {
-	const unsigned int proto_len = IS_IQD(card) ? ETH_HLEN : 0;
-	const unsigned int hw_hdr_len = sizeof(struct qeth_hdr);
+	unsigned int proto_len, hw_hdr_len;
 	unsigned int frame_len = skb->len;
+	bool is_tso = skb_is_gso(skb);
 	unsigned int data_offset = 0;
 	struct qeth_hdr *hdr = NULL;
 	unsigned int hd_len = 0;
@@ -4104,6 +4120,14 @@ int qeth_xmit(struct qeth_card *card, struct sk_buff *skb,
 	int push_len, rc;
 	bool is_sg;
 
+	if (is_tso) {
+		hw_hdr_len = sizeof(struct qeth_hdr_tso);
+		proto_len = skb_transport_offset(skb) + tcp_hdrlen(skb);
+	} else {
+		hw_hdr_len = sizeof(struct qeth_hdr);
+		proto_len = IS_IQD(card) ? ETH_HLEN : 0;
+	}
+
 	rc = skb_cow_head(skb, hw_hdr_len);
 	if (rc)
 		return rc;
@@ -4112,13 +4136,16 @@ int qeth_xmit(struct qeth_card *card, struct sk_buff *skb,
 				      &elements);
 	if (push_len < 0)
 		return push_len;
-	if (!push_len) {
+	if (is_tso || !push_len) {
 		/* HW header needs its own buffer element. */
 		hd_len = hw_hdr_len + proto_len;
-		data_offset = proto_len;
+		data_offset = push_len + proto_len;
 	}
 	memset(hdr, 0, hw_hdr_len);
 	fill_header(card, hdr, skb, ipv, cast_type, frame_len);
+	if (is_tso)
+		qeth_fill_tso_ext((struct qeth_hdr_tso *) hdr,
+				  frame_len - proto_len, skb, proto_len);
 
 	is_sg = skb_is_nonlinear(skb);
 	if (IS_IQD(card)) {
@@ -4136,6 +4163,10 @@ int qeth_xmit(struct qeth_card *card, struct sk_buff *skb,
 			card->perf_stats.buf_elements_sent += elements;
 			if (is_sg)
 				card->perf_stats.sg_skbs_sent++;
+			if (is_tso) {
+				card->perf_stats.large_send_bytes += frame_len;
+				card->perf_stats.large_send_cnt++;
+			}
 		}
 	} else {
 		if (!push_len)
@@ -6516,7 +6547,7 @@ static int qeth_set_ipa_rx_csum(struct qeth_card *card, bool on)
 }
 
 #define QETH_HW_FEATURES (NETIF_F_RXCSUM | NETIF_F_IP_CSUM | NETIF_F_TSO | \
-			  NETIF_F_IPV6_CSUM)
+			  NETIF_F_IPV6_CSUM | NETIF_F_TSO6)
 /**
  * qeth_enable_hw_features() - (Re-)Enable HW functions for device features
  * @dev:	a net_device
@@ -6572,6 +6603,12 @@ int qeth_set_features(struct net_device *dev, netdev_features_t features)
 		if (rc)
 			changed ^= NETIF_F_TSO;
 	}
+	if (changed & NETIF_F_TSO6) {
+		rc = qeth_set_ipa_tso(card, features & NETIF_F_TSO6,
+				      QETH_PROT_IPV6);
+		if (rc)
+			changed ^= NETIF_F_TSO6;
+	}
 
 	/* everything changed successfully? */
 	if ((dev->features ^ features) == changed)
@@ -6597,6 +6634,8 @@ netdev_features_t qeth_fix_features(struct net_device *dev,
 		features &= ~NETIF_F_RXCSUM;
 	if (!qeth_is_supported(card, IPA_OUTBOUND_TSO))
 		features &= ~NETIF_F_TSO;
+	if (!qeth_is_supported6(card, IPA_OUTBOUND_TSO))
+		features &= ~NETIF_F_TSO6;
 	/* if the card isn't up, remove features that require hw changes */
 	if (card->state == CARD_STATE_DOWN ||
 	    card->state == CARD_STATE_RECOVER)
diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index 80893481bb85..ac7c8ae123c3 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -2099,22 +2099,6 @@ static void qeth_l3_fill_header(struct qeth_card *card, struct qeth_hdr *hdr,
 	rcu_read_unlock();
 }
 
-static void qeth_l3_fill_tso_ext(struct qeth_hdr_tso *hdr,
-				 unsigned int payload_len, struct sk_buff *skb,
-				 unsigned int proto_len)
-{
-	struct qeth_hdr_ext_tso *ext = &hdr->ext;
-
-	ext->hdr_tot_len = sizeof(*ext);
-	ext->imb_hdr_no = 1;
-	ext->hdr_type = 1;
-	ext->hdr_version = 1;
-	ext->hdr_len = 28;
-	ext->payload_len = payload_len;
-	ext->mss = skb_shinfo(skb)->gso_size;
-	ext->dg_hdr_len = proto_len;
-}
-
 static void qeth_l3_fixup_headers(struct sk_buff *skb)
 {
 	struct iphdr *iph = ip_hdr(skb);
@@ -2175,9 +2159,9 @@ static int qeth_l3_xmit(struct qeth_card *card, struct sk_buff *skb,
 	} else {
 		qeth_l3_fill_header(card, hdr, skb, ipv, cast_type, frame_len);
 		if (is_tso)
-			qeth_l3_fill_tso_ext((struct qeth_hdr_tso *) hdr,
-					     frame_len - proto_len, skb,
-					     proto_len);
+			qeth_fill_tso_ext((struct qeth_hdr_tso *) hdr,
+					  frame_len - proto_len, skb,
+					  proto_len);
 	}
 
 	is_sg = skb_is_nonlinear(skb);
@@ -2401,6 +2385,7 @@ static const struct net_device_ops qeth_l3_osa_netdev_ops = {
 
 static int qeth_l3_setup_netdev(struct qeth_card *card)
 {
+	unsigned int headroom;
 	int rc;
 
 	if (card->dev->netdev_ops)
@@ -2415,11 +2400,6 @@ static int qeth_l3_setup_netdev(struct qeth_card *card)
 		}
 
 		card->dev->netdev_ops = &qeth_l3_osa_netdev_ops;
-		card->dev->needed_headroom = sizeof(struct qeth_hdr);
-		/* allow for de-acceleration of NETIF_F_HW_VLAN_CTAG_TX: */
-		card->dev->needed_headroom += VLAN_HLEN;
-		if (qeth_is_supported(card, IPA_OUTBOUND_TSO))
-			card->dev->needed_headroom = sizeof(struct qeth_hdr_tso);
 
 		/*IPv6 address autoconfiguration stuff*/
 		qeth_l3_get_unique_id(card);
@@ -2438,10 +2418,22 @@ static int qeth_l3_setup_netdev(struct qeth_card *card)
 			card->dev->hw_features |= NETIF_F_IPV6_CSUM;
 			card->dev->vlan_features |= NETIF_F_IPV6_CSUM;
 		}
+		if (qeth_is_supported6(card, IPA_OUTBOUND_TSO)) {
+			card->dev->hw_features |= NETIF_F_TSO6;
+			card->dev->vlan_features |= NETIF_F_TSO6;
+		}
+
+		/* allow for de-acceleration of NETIF_F_HW_VLAN_CTAG_TX: */
+		if (card->dev->hw_features & NETIF_F_TSO6)
+			headroom = sizeof(struct qeth_hdr_tso) + VLAN_HLEN;
+		else if (card->dev->hw_features & NETIF_F_TSO)
+			headroom = sizeof(struct qeth_hdr_tso);
+		else
+			headroom = sizeof(struct qeth_hdr) + VLAN_HLEN;
 	} else if (card->info.type == QETH_CARD_TYPE_IQD) {
 		card->dev->flags |= IFF_NOARP;
 		card->dev->netdev_ops = &qeth_l3_netdev_ops;
-		card->dev->needed_headroom = sizeof(struct qeth_hdr) - ETH_HLEN;
+		headroom = sizeof(struct qeth_hdr) - ETH_HLEN;
 
 		rc = qeth_l3_iqd_read_initial_mac(card);
 		if (rc)
@@ -2452,13 +2444,14 @@ static int qeth_l3_setup_netdev(struct qeth_card *card)
 	} else
 		return -ENODEV;
 
+	card->dev->needed_headroom = headroom;
 	card->dev->ethtool_ops = &qeth_l3_ethtool_ops;
 	card->dev->features |=	NETIF_F_HW_VLAN_CTAG_TX |
 				NETIF_F_HW_VLAN_CTAG_RX |
 				NETIF_F_HW_VLAN_CTAG_FILTER;
 
 	netif_keep_dst(card->dev);
-	if (card->dev->hw_features & NETIF_F_TSO)
+	if (card->dev->hw_features & (NETIF_F_TSO | NETIF_F_TSO6))
 		netif_set_gso_max_size(card->dev,
 				       PAGE_SIZE * (QETH_MAX_BUFFER_ELEMENTS(card) - 1));
 
-- 
2.16.4

^ permalink raw reply related

* [PATCH net-next 4/4] s390/qeth: add TSO support for L2 devices
From: Julian Wiedmann @ 2018-10-12 15:27 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
	Stefan Raspl, Ursula Braun, Julian Wiedmann
In-Reply-To: <20181012152715.6153-1-jwi@linux.ibm.com>

Except for the new HW header id, this works just like TSO6 on L3 devices
and reuses all the existing data path support in qeth_xmit().

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
---
 drivers/s390/net/qeth_core.h    |  3 ++-
 drivers/s390/net/qeth_l2_main.c | 30 ++++++++++++++++++++++++------
 drivers/s390/net/qeth_l3_main.c |  2 +-
 3 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h
index c1278785a13c..6843bc7ee9f2 100644
--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -390,8 +390,9 @@ enum qeth_layer2_frame_flags {
 enum qeth_header_ids {
 	QETH_HEADER_TYPE_LAYER3 = 0x01,
 	QETH_HEADER_TYPE_LAYER2 = 0x02,
-	QETH_HEADER_TYPE_TSO	= 0x03,
+	QETH_HEADER_TYPE_L3_TSO	= 0x03,
 	QETH_HEADER_TYPE_OSN    = 0x04,
+	QETH_HEADER_TYPE_L2_TSO	= 0x06,
 };
 /* flags for qeth_hdr.ext_flags */
 #define QETH_HDR_EXT_VLAN_FRAME       0x01
diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c
index c810d53fff51..23aaf373f631 100644
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -197,15 +197,19 @@ static void qeth_l2_fill_header(struct qeth_card *card, struct qeth_hdr *hdr,
 				struct sk_buff *skb, int ipv, int cast_type,
 				unsigned int data_len)
 {
-	struct vlan_ethhdr *veth = (struct vlan_ethhdr *)skb_mac_header(skb);
+	struct vlan_ethhdr *veth = vlan_eth_hdr(skb);
 
-	hdr->hdr.l2.id = QETH_HEADER_TYPE_LAYER2;
 	hdr->hdr.l2.pkt_length = data_len;
 
-	if (skb->ip_summed == CHECKSUM_PARTIAL) {
-		qeth_tx_csum(skb, &hdr->hdr.l2.flags[1], ipv);
-		if (card->options.performance_stats)
-			card->perf_stats.tx_csum++;
+	if (skb_is_gso(skb)) {
+		hdr->hdr.l2.id = QETH_HEADER_TYPE_L2_TSO;
+	} else {
+		hdr->hdr.l2.id = QETH_HEADER_TYPE_LAYER2;
+		if (skb->ip_summed == CHECKSUM_PARTIAL) {
+			qeth_tx_csum(skb, &hdr->hdr.l2.flags[1], ipv);
+			if (card->options.performance_stats)
+				card->perf_stats.tx_csum++;
+		}
 	}
 
 	/* set byte byte 3 to casting flags */
@@ -897,6 +901,20 @@ static int qeth_l2_setup_netdev(struct qeth_card *card)
 		card->dev->hw_features |= NETIF_F_RXCSUM;
 		card->dev->vlan_features |= NETIF_F_RXCSUM;
 	}
+	if (qeth_is_supported(card, IPA_OUTBOUND_TSO)) {
+		card->dev->hw_features |= NETIF_F_TSO;
+		card->dev->vlan_features |= NETIF_F_TSO;
+	}
+	if (qeth_is_supported6(card, IPA_OUTBOUND_TSO)) {
+		card->dev->hw_features |= NETIF_F_TSO6;
+		card->dev->vlan_features |= NETIF_F_TSO6;
+	}
+
+	if (card->dev->hw_features & (NETIF_F_TSO | NETIF_F_TSO6)) {
+		card->dev->needed_headroom = sizeof(struct qeth_hdr_tso);
+		netif_set_gso_max_size(card->dev,
+				       PAGE_SIZE * (QDIO_MAX_ELEMENTS_PER_BUFFER - 1));
+	}
 
 	qeth_l2_request_initial_mac(card);
 	netif_napi_add(card->dev, &card->napi, qeth_poll, QETH_NAPI_WEIGHT);
diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index ac7c8ae123c3..0b161cc1fd2e 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -2037,7 +2037,7 @@ static void qeth_l3_fill_header(struct qeth_card *card, struct qeth_hdr *hdr,
 	hdr->hdr.l3.length = data_len;
 
 	if (skb_is_gso(skb)) {
-		hdr->hdr.l3.id = QETH_HEADER_TYPE_TSO;
+		hdr->hdr.l3.id = QETH_HEADER_TYPE_L3_TSO;
 	} else {
 		hdr->hdr.l3.id = QETH_HEADER_TYPE_LAYER3;
 		if (skb->ip_summed == CHECKSUM_PARTIAL) {
-- 
2.16.4

^ permalink raw reply related

* [PATCH net-next 1/4] s390/qeth: make TSO controls protocol-agnostic
From: Julian Wiedmann @ 2018-10-12 15:27 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
	Stefan Raspl, Ursula Braun, Julian Wiedmann
In-Reply-To: <20181012152715.6153-1-jwi@linux.ibm.com>

In preparation for IPv6 TSO, turn the protocol version into a parameter
for the TSO control code.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
---
 drivers/s390/net/qeth_core_main.c | 41 ++++++++++++++++++++-------------------
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index 1771d0073c0c..92e539d1fbd3 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -6396,27 +6396,27 @@ static int qeth_set_ipa_csum(struct qeth_card *card, bool on, int cstype,
 	return rc ? -EIO : 0;
 }
 
-static int qeth_set_ipa_tso(struct qeth_card *card, int on)
+static int qeth_set_tso_off(struct qeth_card *card,
+			    enum qeth_prot_versions prot)
 {
-	int rc;
+	return qeth_send_simple_setassparms_prot(card, IPA_OUTBOUND_TSO,
+						 IPA_CMD_ASS_STOP, 0, prot);
+}
 
-	QETH_CARD_TEXT(card, 3, "sttso");
+static int qeth_set_tso_on(struct qeth_card *card,
+			   enum qeth_prot_versions prot)
+{
+	return qeth_send_simple_setassparms_prot(card, IPA_OUTBOUND_TSO,
+						 IPA_CMD_ASS_START, 0, prot);
+}
 
-	if (on) {
-		rc = qeth_send_simple_setassparms(card, IPA_OUTBOUND_TSO,
-						  IPA_CMD_ASS_START, 0);
-		if (rc) {
-			dev_warn(&card->gdev->dev,
-				 "Starting outbound TCP segmentation offload for %s failed\n",
-				 QETH_CARD_IFNAME(card));
-			return -EIO;
-		}
-		dev_info(&card->gdev->dev, "Outbound TSO enabled\n");
-	} else {
-		rc = qeth_send_simple_setassparms(card, IPA_OUTBOUND_TSO,
-						  IPA_CMD_ASS_STOP, 0);
-	}
-	return rc;
+static int qeth_set_ipa_tso(struct qeth_card *card, bool on,
+			    enum qeth_prot_versions prot)
+{
+	int rc = on ? qeth_set_tso_on(card, prot) :
+		      qeth_set_tso_off(card, prot);
+
+	return rc ? -EIO : 0;
 }
 
 static int qeth_set_ipa_rx_csum(struct qeth_card *card, bool on)
@@ -6493,8 +6493,9 @@ int qeth_set_features(struct net_device *dev, netdev_features_t features)
 		if (rc)
 			changed ^= NETIF_F_RXCSUM;
 	}
-	if ((changed & NETIF_F_TSO)) {
-		rc = qeth_set_ipa_tso(card, features & NETIF_F_TSO ? 1 : 0);
+	if (changed & NETIF_F_TSO) {
+		rc = qeth_set_ipa_tso(card, features & NETIF_F_TSO,
+				      QETH_PROT_IPV4);
 		if (rc)
 			changed ^= NETIF_F_TSO;
 	}
-- 
2.16.4

^ permalink raw reply related

* [PATCH net-next 2/4] s390/qeth: enhance TSO control sequence
From: Julian Wiedmann @ 2018-10-12 15:27 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
	Stefan Raspl, Ursula Braun, Julian Wiedmann
In-Reply-To: <20181012152715.6153-1-jwi@linux.ibm.com>

TSO6 requires the full programming sequence, and not just a simple
START command. This implements the additional ENABLE command, and adds
some sanity checks that were missing for the START command.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
---
 drivers/s390/net/qeth_core_main.c | 77 ++++++++++++++++++++++++++++++++++++++-
 drivers/s390/net/qeth_core_mpc.h  | 26 +++++++++++++
 2 files changed, 101 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index 92e539d1fbd3..ab022b6a41ea 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -5394,6 +5394,21 @@ static int qeth_setassparms_inspect_rc(struct qeth_ipa_cmd *cmd)
 	return cmd->hdr.return_code;
 }
 
+static int qeth_setassparms_get_caps_cb(struct qeth_card *card,
+					struct qeth_reply *reply,
+					unsigned long data)
+{
+	struct qeth_ipa_cmd *cmd = (struct qeth_ipa_cmd *) data;
+	struct qeth_ipa_caps *caps = reply->param;
+
+	if (qeth_setassparms_inspect_rc(cmd))
+		return 0;
+
+	caps->supported = cmd->data.setassparms.data.caps.supported;
+	caps->enabled = cmd->data.setassparms.data.caps.enabled;
+	return 0;
+}
+
 int qeth_setassparms_cb(struct qeth_card *card,
 			struct qeth_reply *reply, unsigned long data)
 {
@@ -6396,6 +6411,20 @@ static int qeth_set_ipa_csum(struct qeth_card *card, bool on, int cstype,
 	return rc ? -EIO : 0;
 }
 
+static int qeth_start_tso_cb(struct qeth_card *card, struct qeth_reply *reply,
+			     unsigned long data)
+{
+	struct qeth_ipa_cmd *cmd = (struct qeth_ipa_cmd *) data;
+	struct qeth_tso_start_data *tso_data = reply->param;
+
+	if (qeth_setassparms_inspect_rc(cmd))
+		return 0;
+
+	tso_data->mss = cmd->data.setassparms.data.tso.mss;
+	tso_data->supported = cmd->data.setassparms.data.tso.supported;
+	return 0;
+}
+
 static int qeth_set_tso_off(struct qeth_card *card,
 			    enum qeth_prot_versions prot)
 {
@@ -6406,8 +6435,52 @@ static int qeth_set_tso_off(struct qeth_card *card,
 static int qeth_set_tso_on(struct qeth_card *card,
 			   enum qeth_prot_versions prot)
 {
-	return qeth_send_simple_setassparms_prot(card, IPA_OUTBOUND_TSO,
-						 IPA_CMD_ASS_START, 0, prot);
+	struct qeth_tso_start_data tso_data;
+	struct qeth_cmd_buffer *iob;
+	struct qeth_ipa_caps caps;
+	int rc;
+
+	iob = qeth_get_setassparms_cmd(card, IPA_OUTBOUND_TSO,
+				       IPA_CMD_ASS_START, 0, prot);
+	if (!iob)
+		return -ENOMEM;
+
+	rc = qeth_send_setassparms(card, iob, 0, 0 /* unused */,
+				   qeth_start_tso_cb, &tso_data);
+	if (rc)
+		return rc;
+
+	if (!tso_data.mss || !(tso_data.supported & QETH_IPA_LARGE_SEND_TCP)) {
+		qeth_set_tso_off(card, prot);
+		return -EOPNOTSUPP;
+	}
+
+	iob = qeth_get_setassparms_cmd(card, IPA_OUTBOUND_TSO,
+				       IPA_CMD_ASS_ENABLE, sizeof(caps), prot);
+	if (!iob) {
+		qeth_set_tso_off(card, prot);
+		return -ENOMEM;
+	}
+
+	/* enable TSO capability */
+	caps.supported = 0;
+	caps.enabled = QETH_IPA_LARGE_SEND_TCP;
+	rc = qeth_send_setassparms(card, iob, sizeof(caps), (long) &caps,
+				   qeth_setassparms_get_caps_cb, &caps);
+	if (rc) {
+		qeth_set_tso_off(card, prot);
+		return rc;
+	}
+
+	if (!qeth_ipa_caps_supported(&caps, QETH_IPA_LARGE_SEND_TCP) ||
+	    !qeth_ipa_caps_enabled(&caps, QETH_IPA_LARGE_SEND_TCP)) {
+		qeth_set_tso_off(card, prot);
+		return -EOPNOTSUPP;
+	}
+
+	dev_info(&card->gdev->dev, "TSOv%u enabled (MSS: %u)\n", prot,
+		 tso_data.mss);
+	return 0;
 }
 
 static int qeth_set_ipa_tso(struct qeth_card *card, bool on,
diff --git a/drivers/s390/net/qeth_core_mpc.h b/drivers/s390/net/qeth_core_mpc.h
index aa5de1fe01e1..e85090467afe 100644
--- a/drivers/s390/net/qeth_core_mpc.h
+++ b/drivers/s390/net/qeth_core_mpc.h
@@ -56,6 +56,21 @@ static inline bool qeth_intparm_is_iob(unsigned long intparm)
 #define IPA_CMD_INITIATOR_OSA_REPLY   0x81
 #define IPA_CMD_PRIM_VERSION_NO 0x01
 
+struct qeth_ipa_caps {
+	u32 supported;
+	u32 enabled;
+};
+
+static inline bool qeth_ipa_caps_supported(struct qeth_ipa_caps *caps, u32 mask)
+{
+	return (caps->supported & mask) == mask;
+}
+
+static inline bool qeth_ipa_caps_enabled(struct qeth_ipa_caps *caps, u32 mask)
+{
+	return (caps->enabled & mask) == mask;
+}
+
 enum qeth_card_types {
 	QETH_CARD_TYPE_OSD     = 1,
 	QETH_CARD_TYPE_IQD     = 5,
@@ -405,14 +420,25 @@ struct qeth_checksum_cmd {
 	__u32 enabled;
 } __packed;
 
+enum qeth_ipa_large_send_caps {
+	QETH_IPA_LARGE_SEND_TCP		= 0x00000001,
+};
+
+struct qeth_tso_start_data {
+	u32 mss;
+	u32 supported;
+};
+
 /* SETASSPARMS IPA Command: */
 struct qeth_ipacmd_setassparms {
 	struct qeth_ipacmd_setassparms_hdr hdr;
 	union {
 		__u32 flags_32bit;
+		struct qeth_ipa_caps caps;
 		struct qeth_checksum_cmd chksum;
 		struct qeth_arp_cache_entry add_arp_entry;
 		struct qeth_arp_query_data query_arp;
+		struct qeth_tso_start_data tso;
 		__u8 ip[16];
 	} data;
 } __attribute__ ((packed));
-- 
2.16.4

^ permalink raw reply related

* [PATCH net-next 0/4] s390/qeth: updates 2018-10-12
From: Julian Wiedmann @ 2018-10-12 15:27 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
	Stefan Raspl, Ursula Braun, Julian Wiedmann

Hi Dave,

please apply one more patchset for net-next. This extends the TSO support
in qeth.

Thanks,
Julian


Julian Wiedmann (4):
  s390/qeth: make TSO controls protocol-agnostic
  s390/qeth: enhance TSO control sequence
  s390/qeth: add support for IPv6 TSO
  s390/qeth: add TSO support for L2 devices

 drivers/s390/net/qeth_core.h      |   5 +-
 drivers/s390/net/qeth_core_main.c | 159 ++++++++++++++++++++++++++++++++------
 drivers/s390/net/qeth_core_mpc.h  |  26 +++++++
 drivers/s390/net/qeth_l2_main.c   |  30 +++++--
 drivers/s390/net/qeth_l3_main.c   |  47 +++++------
 5 files changed, 210 insertions(+), 57 deletions(-)

-- 
2.16.4

^ permalink raw reply

* Re: [PATCH net] rxrpc: Fix incorrect conditional on IPV6
From: Eric Dumazet @ 2018-10-12 15:19 UTC (permalink / raw)
  To: David Howells; +Cc: Arnd Bergmann, netdev, linux-afs
In-Reply-To: <153935592320.3029.14678717675993366.stgit@warthog.procyon.org.uk>



On 10/12/2018 07:52 AM, David Howells wrote:
> The udpv6_encap_enable() function is part of the ipv6 code, and if that is
> configured as a loadable module and rxrpc is built in then a build failure
> will occur because the conditional check is wrong:
> 
>   net/rxrpc/local_object.o: In function `rxrpc_lookup_local':
>   local_object.c:(.text+0x2688): undefined reference to `udpv6_encap_enable'
> 
> Use the correct config symbol (CONFIG_AF_RXRPC_IPV6) in the conditional
> check rather than CONFIG_IPV6 as that will do the right thing.
> 
> Fixes: 5271953cad31 ("rxrpc: Use the UDP encap_rcv hook")
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Arnd Bergmann <arnd@arndb.de>

Nit : Correct attribution would require a Reported-by: tag

^ permalink raw reply

* Re: [PATCH net] ipv6: rate-limit probes for neighbourless routes
From: Eric Dumazet @ 2018-10-12 15:17 UTC (permalink / raw)
  To: Sabrina Dubroca, netdev; +Cc: Stefano Brivio
In-Reply-To: <ae3154f520198f61e4cd51a082e0e6be6b8bd632.1539353746.git.sd@queasysnail.net>



On 10/12/2018 07:22 AM, Sabrina Dubroca wrote:
> When commit 270972554c91 ("[IPV6]: ROUTE: Add Router Reachability
> Probing (RFC4191).") introduced router probing, the rt6_probe() function
> required that a neighbour entry existed. This neighbour entry is used to
> record the timestamp of the last probe via the ->updated field.
> 
> Later, commit 2152caea7196 ("ipv6: Do not depend on rt->n in rt6_probe().")
> removed the requirement for a neighbour entry. Neighbourless routes skip
> the interval check and are not rate-limited.
> 

Interesting. Can you describe a little more what happens here ?

Could this explain gazillions of syzbot/syzkaller reports that have all in common :

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
rcu:    (detected by 0, t=10712 jiffies, g=90369, q=135)
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
rcu: All QSes seen, last rcu_preempt kthread activity 10548 (4295003843-4294993295), jiffies_till_next_fqs=1, root ->qsmask 0x0
syz-executor0   R
  running task    
 warn_alloc.cold.119+0xb7/0x1bd mm/page_alloc.c:3426
22896  7592   5826 0x8010000c
Call Trace:
 <IRQ>
 sched_show_task.cold.83+0x2b6/0x30a kernel/sched/core.c:5296
 __alloc_pages_slowpath+0x2667/0x2d80 mm/page_alloc.c:4297
 print_other_cpu_stall.cold.79+0xa83/0xba5 kernel/rcu/tree.c:1430
 check_cpu_stall kernel/rcu/tree.c:1557 [inline]
 __rcu_pending kernel/rcu/tree.c:3276 [inline]
 rcu_pending kernel/rcu/tree.c:3319 [inline]
 rcu_check_callbacks+0xafc/0x1990 kernel/rcu/tree.c:2665
 __alloc_pages_nodemask+0xa80/0xde0 mm/page_alloc.c:4390
 __alloc_pages include/linux/gfp.h:473 [inline]
 __alloc_pages_node include/linux/gfp.h:486 [inline]
 kmem_getpages mm/slab.c:1409 [inline]
 cache_grow_begin+0x91/0x8c0 mm/slab.c:2677
 fallback_alloc+0x203/0x2e0 mm/slab.c:3219
 ____cache_alloc_node+0x1c7/0x1e0 mm/slab.c:3287
 slab_alloc_node mm/slab.c:3327 [inline]
 kmem_cache_alloc_node+0xe3/0x730 mm/slab.c:3642
 __alloc_skb+0x119/0x770 net/core/skbuff.c:193
 alloc_skb include/linux/skbuff.h:997 [inline]
 ndisc_alloc_skb+0x144/0x340 net/ipv6/ndisc.c:403
 ndisc_send_rs+0x331/0x6e0 net/ipv6/ndisc.c:669
 update_process_times+0x2d/0x70 kernel/time/timer.c:1636
 addrconf_rs_timer+0x314/0x690 net/ipv6/addrconf.c:3836
 tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:164
 tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1274
 __run_hrtimer kernel/time/hrtimer.c:1398 [inline]
 __hrtimer_run_queues+0x41c/0x10d0 kernel/time/hrtimer.c:1460
 call_timer_fn+0x272/0x920 kernel/time/timer.c:1326
 hrtimer_interrupt+0x313/0x780 kernel/time/hrtimer.c:1518 

^ permalink raw reply

* RE: [PATCH net-next, v2] hv_netvsc: fix vf serial matching with pci slot info
From: Haiyang Zhang @ 2018-10-12 22:27 UTC (permalink / raw)
  To: Stephen Hemminger, Haiyang Zhang
  Cc: davem@davemloft.net, netdev@vger.kernel.org, olaf@aepfle.de,
	linux-kernel@vger.kernel.org, devel@linuxdriverproject.org,
	vkuznets
In-Reply-To: <20181012152030.4f58c65e@xeon-e3>



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Friday, October 12, 2018 6:21 PM
> To: Haiyang Zhang <haiyangz@linuxonhyperv.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>; davem@davemloft.net;
> netdev@vger.kernel.org; olaf@aepfle.de; linux-kernel@vger.kernel.org;
> devel@linuxdriverproject.org; vkuznets <vkuznets@redhat.com>
> Subject: Re: [PATCH net-next, v2] hv_netvsc: fix vf serial matching with pci slot
> info
> 
> On Fri, 12 Oct 2018 20:55:15 +0000
> Haiyang Zhang <haiyangz@linuxonhyperv.com> wrote:
> 
> Thanks for fixing this.
> 
> 
> > +	if (kstrtou32(kobject_name(&pdev->slot->kobj), 10, &serial)) {
> > +		netdev_notice(vf_netdev, "Invalid vf serial:%s\n",
> > +			      pdev->slot->kobj.name);
> > +		return NULL;
> > +	}
> 
> Shouldn't this use kobject_name() in the message as well.
> 
> Looking at the pci.h code there is already an API to get name from slot (it uses
> kobject_name()). So please use that one.

Sure, I will look for that api. Thanks.

^ permalink raw reply

* Re: [PATCH net-next, v2] hv_netvsc: fix vf serial matching with pci slot info
From: Stephen Hemminger @ 2018-10-12 22:20 UTC (permalink / raw)
  To: Haiyang Zhang
  Cc: olaf, netdev, haiyangz, linux-kernel, devel, vkuznets, davem
In-Reply-To: <20181012205515.23355-1-haiyangz@linuxonhyperv.com>

On Fri, 12 Oct 2018 20:55:15 +0000
Haiyang Zhang <haiyangz@linuxonhyperv.com> wrote:

Thanks for fixing this.

  
> +	if (kstrtou32(kobject_name(&pdev->slot->kobj), 10, &serial)) {
> +		netdev_notice(vf_netdev, "Invalid vf serial:%s\n",
> +			      pdev->slot->kobj.name);
> +		return NULL;
> +	}

Shouldn't this use kobject_name() in the message as well.

Looking at the pci.h code there is already an API to get name from
slot (it uses kobject_name()). So please use that one.

^ permalink raw reply

* Re: 9p/RDMA for syzkaller (Was: BUG: corrupted list in p9_read_work)
From: Dmitry Vyukov @ 2018-10-12 14:42 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Leon Romanovsky, syzbot, David Miller, Eric Van Hensbergen, LKML,
	Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs,
	v9fs-developer
In-Reply-To: <20181011142808.GC32030@nautica>

On Thu, Oct 11, 2018 at 4:28 PM, Dominique Martinet
<asmadeus@codewreck.org> wrote:
> Dmitry Vyukov wrote on Thu, Oct 11, 2018:
>> > Now we are talking!
>> > We generally assume that all modules are simply compiled into kernel.
>> > At least that's we have on syzbot. If somebody can't compile them in,
>> > we can suggest to add modprobe into init.
>> > So this boils down to just writing to /sys/module/rdma_rxe/parameters/add.
>>
>> This fails for me:
>>
>> root@syzkaller:~# echo -n syz1 > /sys/module/rdma_rxe/parameters/add
>> [20992.905406] rdma_rxe: interface syz1 not found
>> bash: echo: write error: Invalid argument
>
> Works here, I just did:
>
> [root@f2 ~]# modprobe rdma_rxe
> [root@f2 ~]# echo -n ens3 > /sys/module/rdma_rxe/parameters/add
>
> dmesg says:
> [   35.595534] rdma_rxe: set rxe0 active
> [   35.595541] rdma_rxe: added rxe0 to ens3
>
>
> Actually for a dummy interface if I try the echo directly the echo
> works, and a verb device is created, and I just confirmed I can use
> it... so not sure why rxe_cfg said EINVAL earlier...
>
> [root@f2 ~]# ip link add dummy0 type dummy
> [root@f2 ~]# ip link set dummy0 up
> [root@f2 ~]# ip addr add 10.1.1.1/24 dev dummy0
> [root@f2 ~]# modprobe rdma_rxe
> [root@f2 ~]# echo -n dummy0 > /sys/module/rdma_rxe/parameters/add
>
>
> (then using my test client:
> [root@f2 src]# ./rcat -s
> INFO:  trans_rdma.c (879), msk_cma_event_handler: CONNECT_REQUEST
> INFO:  trans_rdma.c (862), msk_cma_event_handler: ESTABLISHED
> INFO:  trans_rdma.c (917), msk_cma_event_handler: DISCONNECT EVENT...
>
> [root@f2 src]# ./rcat -c 10.1.1.1
> INFO:  trans_rdma.c (862), msk_cma_event_handler: ESTABLISHED
> ^C
> )
>
>
> I assume your syz1 interface is a tap device as you were saying earlier?
> Got anything in dmesg?

Umm... nope, just a random string.
Don't assume any prior knowledge on my side :)

^ permalink raw reply

* [PATCH net-next 0/3] Fixes & small enhancements related to the promisc mode in HNS3
From: Salil Mehta @ 2018-10-12 14:34 UTC (permalink / raw)
  To: davem
  Cc: salil.mehta, yisen.zhuang, lipeng321, mehta.salil, netdev,
	linux-kernel, linuxarm

This patch-set presents some fixes and enhancements related to promiscuous
mode and MAC VLAN Table full condition in HNS3 Ethernet Driver.

Jian Shen (3):
  net: hns3: Enable promisc mode when mac vlan table is full
  net: hns3: Resume promisc mode and vlan filter status after reset
  net: hns3: Resume promisc mode and vlan filter status after loopback
    test

 drivers/net/ethernet/hisilicon/hns3/hnae3.h        | 11 +++
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c    | 84 +++++++++++++++++++---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.h    |  3 +
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 10 ++-
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    |  9 ++-
 5 files changed, 103 insertions(+), 14 deletions(-)

-- 
2.7.4

^ permalink raw reply

* Re: netconsole warning in 4.19.0-rc7
From: Dave Jones @ 2018-10-12 14:32 UTC (permalink / raw)
  To: Cong Wang; +Cc: Meelis Roos, LKML, Linux Kernel Network Developers
In-Reply-To: <CAM_iQpU0+LVwgMea32KQWf8z6Bp1VTmWOGnWNwNSmV0PZzf_ew@mail.gmail.com>

On Wed, Oct 10, 2018 at 10:34:49PM -0700, Cong Wang wrote:
 > (Cc'ing Dave)
 > 
 > On Wed, Oct 10, 2018 at 5:14 AM Meelis Roos <mroos@linux.ee> wrote:
 > >
 > > Thies 4.19-rc7 on a bunch of test machines and got this warning from one.
 > > It is reproducible and I have not noticed it before.
 > >
 > [...]
 > > [    9.914805] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:168 __local_bh_enable_ip+0x2e/0x44
 > > [    9.914806] Modules linked in:
 > > [    9.914808] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc7 #210
 > > [    9.914810] Hardware name: MicroLink                       /D850MV                         , BIOS MV85010A.86A.0067.P24.0304081124 04/08/2003
 > > [    9.914811] EIP: __local_bh_enable_ip+0x2e/0x44
 > > [    9.914813] Code: cc 02 5f c8 a9 00 00 0f 00 75 1f 83 ea 01 f7 da 01 15 cc 02 5f c8 a1 cc 02 5f c8 a9 00 ff 1f 00 74 0c ff 0d cc 02 5f c8 5d c3 <0f> 0b eb dd 66 a1 80 cd 5e c8 66 85 c0 74 e9 e8 87 ff ff ff eb e2
 > > [    9.914814] EAX: 80010200 EBX: f602b000 ECX: 36346270 EDX: 00000200
 > > [    9.914815] ESI: f620ecc0 EDI: f620ebac EBP: f600de40 ESP: f600de40
 > > [    9.914816] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 EFLAGS: 00010006
 > > [    9.914817] CR0: 80050033 CR2: b7f5f000 CR3: 36389000 CR4: 000006d0
 > > [    9.914818] Call Trace:
 > > [    9.914819]  <IRQ>
 > > [    9.914820]  netpoll_send_skb_on_dev+0xa5/0x1b0
 > 
 > This is exactly what I mentioned in my review here:
 > https://marc.info/?l=linux-netdev&m=153816136624679&w=2
 > 
 > "But irq is disabled here, so not sure if rcu_read_lock_bh()
 > could cause trouble... "

ugh, what a mess.
I'm travelling right now so not going to get to look into this more
for a week or so.  Unless someone has a quick-fix, should we revert ?
We've traded one warning for another, which doesn't really feel like
progress.

	Dave

^ permalink raw reply

* Re: [PATCH] rxrpc: add IPV6 dependency
From: David Howells @ 2018-10-12 14:25 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: dhowells, David S. Miller, linux-afs, linux-kernel, netdev
In-Reply-To: <20181012123624.2001922-1-arnd@arndb.de>

Arnd Bergmann <arnd@arndb.de> wrote:

> +	depends on IPV6 || !IPV6

That looks weird.  It looks like it always ought to be true.

David

^ permalink raw reply

* [PATCH net] ipv6: rate-limit probes for neighbourless routes
From: Sabrina Dubroca @ 2018-10-12 14:22 UTC (permalink / raw)
  To: netdev; +Cc: Stefano Brivio, Sabrina Dubroca

When commit 270972554c91 ("[IPV6]: ROUTE: Add Router Reachability
Probing (RFC4191).") introduced router probing, the rt6_probe() function
required that a neighbour entry existed. This neighbour entry is used to
record the timestamp of the last probe via the ->updated field.

Later, commit 2152caea7196 ("ipv6: Do not depend on rt->n in rt6_probe().")
removed the requirement for a neighbour entry. Neighbourless routes skip
the interval check and are not rate-limited.

This patch adds rate-limiting for neighbourless routes, by recording the
timestamp of the last probe in the fib6_info itself.

Fixes: 2152caea7196 ("ipv6: Do not depend on rt->n in rt6_probe().")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
---
 include/net/ip6_fib.h |  4 ++++
 net/ipv6/route.c      | 12 ++++++------
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 3d4930528db0..2d31e22babd8 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -159,6 +159,10 @@ struct fib6_info {
 	struct rt6_info * __percpu	*rt6i_pcpu;
 	struct rt6_exception_bucket __rcu *rt6i_exception_bucket;
 
+#ifdef CONFIG_IPV6_ROUTER_PREF
+	unsigned long			last_probe;
+#endif
+
 	u32				fib6_metric;
 	u8				fib6_protocol;
 	u8				fib6_type;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index a366c05a239d..abcb5ae77319 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -520,10 +520,11 @@ static void rt6_probe_deferred(struct work_struct *w)
 
 static void rt6_probe(struct fib6_info *rt)
 {
-	struct __rt6_probe_work *work;
+	struct __rt6_probe_work *work = NULL;
 	const struct in6_addr *nh_gw;
 	struct neighbour *neigh;
 	struct net_device *dev;
+	struct inet6_dev *idev;
 
 	/*
 	 * Okay, this does not seem to be appropriate
@@ -539,15 +540,12 @@ static void rt6_probe(struct fib6_info *rt)
 	nh_gw = &rt->fib6_nh.nh_gw;
 	dev = rt->fib6_nh.nh_dev;
 	rcu_read_lock_bh();
+	idev = __in6_dev_get(dev);
 	neigh = __ipv6_neigh_lookup_noref(dev, nh_gw);
 	if (neigh) {
-		struct inet6_dev *idev;
-
 		if (neigh->nud_state & NUD_VALID)
 			goto out;
 
-		idev = __in6_dev_get(dev);
-		work = NULL;
 		write_lock(&neigh->lock);
 		if (!(neigh->nud_state & NUD_VALID) &&
 		    time_after(jiffies,
@@ -557,11 +555,13 @@ static void rt6_probe(struct fib6_info *rt)
 				__neigh_set_probe_once(neigh);
 		}
 		write_unlock(&neigh->lock);
-	} else {
+	} else if (time_after(jiffies, rt->last_probe +
+				       idev->cnf.rtr_probe_interval)) {
 		work = kmalloc(sizeof(*work), GFP_ATOMIC);
 	}
 
 	if (work) {
+		rt->last_probe = jiffies;
 		INIT_WORK(&work->work, rt6_probe_deferred);
 		work->target = *nh_gw;
 		dev_hold(dev);
-- 
2.19.1

^ permalink raw reply related

* Re: [PATCH] igb: shorten maximum PHC timecounter update interval
From: Richard Cochran @ 2018-10-12 14:08 UTC (permalink / raw)
  To: Miroslav Lichvar; +Cc: intel-wired-lan, netdev, Jacob Keller, Thomas Gleixner
In-Reply-To: <20181012111339.1361-1-mlichvar@redhat.com>

On Fri, Oct 12, 2018 at 01:13:39PM +0200, Miroslav Lichvar wrote:
> This fixes an issue with HW timestamps on 82580/I350/I354 being off by
> ~1100 seconds for few seconds every ~9 minutes.

This patch should go to the stable trees starting with v4.8.

Thanks,
Richard

^ permalink raw reply

* Re: [PATCH] igb: shorten maximum PHC timecounter update interval
From: Richard Cochran @ 2018-10-12 14:05 UTC (permalink / raw)
  To: Miroslav Lichvar; +Cc: intel-wired-lan, netdev, Jacob Keller, Thomas Gleixner
In-Reply-To: <20181012111339.1361-1-mlichvar@redhat.com>

On Fri, Oct 12, 2018 at 01:13:39PM +0200, Miroslav Lichvar wrote:
> Since commit 500462a9d ("timers: Switch to a non-cascading wheel"),
> scheduling of delayed work seems to be less accurate and a requested
> delay of 540 seconds may actually be longer than 550 seconds. Shorten
> the delay to 480 seconds to be sure the timecounter is updated in time.

Good catch.  This timer wheel change will affect other, similar
drivers.  Guess I'll go through and adjust their timeouts, too.

Thanks,
Richard

^ permalink raw reply

* Re: Possible bug in traffic control?
From: Josh Coombs @ 2018-10-12 13:59 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev
In-Reply-To: <CACcUnf-HTqht3ogcd4YKdWxe0HvFqo7OYRR8SmkCGWe5d8zDHw@mail.gmail.com>

I've been able to narrow the scope down, the issue is with macsec
itself.  I setup two hosts with a macsec link between them, and let a
couple iperf3 sessions blast traffic across.  At approximately 4.2
billion packets / 6TB data transferred one end stopped transmitting
packets.  Doing a tcpdump on the impacted node's macsec0 interface
shows packets coming in from the remote node, in this case arp
requests, and arp replies from the local host, but watching the
interface counters for macsec0 no packets are being recorded as
transmitting.  Again, nothing in dmesg implying an error.

Deleting the macsec interface via ip link delete macsec0 and
re-creating it gets traffic flowing again without a reboot.

Meanwhile my traffic control bridge without macsec has shuffled 19TB
via 22 billion packets and not skipped a beat, so it appears my
initial assumption of it being the culprit was wrong.

To replicate, setup two hosts with a direct ethernet link between each other.
- Bring up macsec between the two hosts, setup a dedicated /30 on the link.
- Use iperf3 or another traffic generating tool over the /30, one
session for each direction.
- Wait for traffic to stop.

My test bed is on Ubuntu Server 18.04 currently, kernel 4.15.0-36.
I'm going to spin up a vanilla kernel on 4.15 and then -current to see
if this is an Ubuntu-ism from their patches, specific to 4.15, or a
general issue with macsec.

The script I used on each host (keys, rxmacs and IPs updated as appropriate):
#!/bin/bash

# Interfaces:
# dif = Egress physical interface (Dest)
# eif = Encrypted interface
dif=ens224
eif=macsec0

# MACSec Keys:
# txkey = Transmit (Local) key
# rxkey = Receive (Remote) key
# rxmac = Receive (Remote) MAC addy
txkey=60995924232808431491190820961556
rxkey=87345530111733181210202106249824
rxmac=00:0c:29:c5:95:df

# Clear any existing IP config
ifconfig $dif 0.0.0.0

# Bring up macsec:
echo "* Enable MACSec"
modprobe macsec
ip link add link "$dif" "$eif" type macsec
ip macsec add "$eif" tx sa 0 pn 1 on key 02 "$txkey"
ip macsec add "$eif" rx address "$rxmac" port 1
ip macsec add "$eif" rx address "$rxmac" port 1 sa 0 pn 1 on key 01 "$rxkey"
ip link set "$eif" type macsec encrypt on

# Bring up the interfaces:
echo "* Light tunnel NICS"
ip link set "$dif" up
ip link set "$eif" up

# Set IP
ifconfig $eif 192.168.211.1/30

echo " --=[ MACSec Up ]=--"
On Thu, Oct 11, 2018 at 10:05 AM Josh Coombs <jcoombs@staff.gwi.net> wrote:
>
> I'm actually leaning towards macsec now.  I'm at 6TB transferred in a
> double hop, no macsec over the bridge setup without triggering the
> fault.  I'm going to let it continue to churn and setup a second
> testbed that JUST uses macsec without traffic control bridging to see
> if I can trip the issue there.    That should determine if it's macsec
> itself, or an interaction between macsec and traffic control.
>
> Joshua Coombs
> GWI
>
> office 207-494-2140
> www.gwi.net
>
> On Wed, Oct 10, 2018 at 12:39 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > On Wed, Oct 10, 2018 at 8:54 AM Josh Coombs <jcoombs@staff.gwi.net> wrote:
> > >
> > > 2.3 billion 1 byte packets failed to re-create the bug.  To try and
> > > simplify the setup I removed macsec from the equation, using a single
> > > host in the middle as the bridge.  Interestingly, rather than 1.3Gbits
> > > a second in both directions, it ran around 8Mbits a second.  Switching
> > > the filter from u32 to matchall didn't change the performance.  Going
> > > back to the four machine test bed, again removing macsec and just
> > > bridging through radically decreased the throughput to around 8Mbits.
> > > Flip on macsec for the bridge and 1.3Gbits?
> >
> > This is a great narrow down! We can rule out macsec for guilty.
> >
> > Can you share a minimum reproducer for this problem? If so I can take
> > a look.

^ permalink raw reply

* [RFC-resend 2/2] compat: Cleanup in_compat_syscall() callers
From: Dmitry Safonov @ 2018-10-12 13:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Dmitry Safonov, Ard Biesheuvel, Andy Lutomirsky,
	David S. Miller, Herbert Xu, H. Peter Anvin, Ingo Molnar,
	John Stultz, Kirill A. Shutemov, Oleg Nesterov, Steffen Klassert,
	Stephen Boyd, Steven Rostedt, Thomas Gleixner, x86, linux-efi,
	netdev
In-Reply-To: <20181012134253.23266-1-dima@arista.com>

Now that in_compat_syscall() == false on native i686, it's possible to
remove some ifdeffery and no more needed helpers.

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 drivers/firmware/efi/efivars.c | 16 ++++------------
 kernel/time/time.c             |  2 +-
 net/xfrm/xfrm_state.c          |  2 --
 net/xfrm/xfrm_user.c           |  2 --
 4 files changed, 5 insertions(+), 17 deletions(-)

diff --git a/drivers/firmware/efi/efivars.c b/drivers/firmware/efi/efivars.c
index 3e626fd9bd4e..8061667a6765 100644
--- a/drivers/firmware/efi/efivars.c
+++ b/drivers/firmware/efi/efivars.c
@@ -229,14 +229,6 @@ sanity_check(struct efi_variable *var, efi_char16_t *name, efi_guid_t vendor,
 	return 0;
 }
 
-static inline bool is_compat(void)
-{
-	if (IS_ENABLED(CONFIG_COMPAT) && in_compat_syscall())
-		return true;
-
-	return false;
-}
-
 static void
 copy_out_compat(struct efi_variable *dst, struct compat_efi_variable *src)
 {
@@ -263,7 +255,7 @@ efivar_store_raw(struct efivar_entry *entry, const char *buf, size_t count)
 	u8 *data;
 	int err;
 
-	if (is_compat()) {
+	if (in_compat_syscall()) {
 		struct compat_efi_variable *compat;
 
 		if (count != sizeof(*compat))
@@ -324,7 +316,7 @@ efivar_show_raw(struct efivar_entry *entry, char *buf)
 			     &entry->var.DataSize, entry->var.Data))
 		return -EIO;
 
-	if (is_compat()) {
+	if (in_compat_syscall()) {
 		compat = (struct compat_efi_variable *)buf;
 
 		size = sizeof(*compat);
@@ -418,7 +410,7 @@ static ssize_t efivar_create(struct file *filp, struct kobject *kobj,
 	struct compat_efi_variable *compat = (struct compat_efi_variable *)buf;
 	struct efi_variable *new_var = (struct efi_variable *)buf;
 	struct efivar_entry *new_entry;
-	bool need_compat = is_compat();
+	bool need_compat = in_compat_syscall();
 	efi_char16_t *name;
 	unsigned long size;
 	u32 attributes;
@@ -495,7 +487,7 @@ static ssize_t efivar_delete(struct file *filp, struct kobject *kobj,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
 
-	if (is_compat()) {
+	if (in_compat_syscall()) {
 		if (count != sizeof(*compat))
 			return -EINVAL;
 
diff --git a/kernel/time/time.c b/kernel/time/time.c
index ccdb351277ee..4638e8cb6378 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -863,7 +863,7 @@ int get_timespec64(struct timespec64 *ts,
 	ts->tv_sec = kts.tv_sec;
 
 	/* Zero out the padding for 32 bit systems or in compat mode */
-	if (IS_ENABLED(CONFIG_64BIT_TIME) && (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()))
+	if (IS_ENABLED(CONFIG_64BIT_TIME) && in_compat_syscall())
 		kts.tv_nsec &= 0xFFFFFFFFUL;
 
 	ts->tv_nsec = kts.tv_nsec;
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index b669262682c9..dc4a9f1fb941 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2077,10 +2077,8 @@ int xfrm_user_policy(struct sock *sk, int optname, u8 __user *optval, int optlen
 	struct xfrm_mgr *km;
 	struct xfrm_policy *pol = NULL;
 
-#ifdef CONFIG_COMPAT
 	if (in_compat_syscall())
 		return -EOPNOTSUPP;
-#endif
 
 	if (!optval && !optlen) {
 		xfrm_sk_policy_insert(sk, XFRM_POLICY_IN, NULL);
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index df7ca2dabc48..c3aedf8a99ff 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2621,10 +2621,8 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 	const struct xfrm_link *link;
 	int type, err;
 
-#ifdef CONFIG_COMPAT
 	if (in_compat_syscall())
 		return -EOPNOTSUPP;
-#endif
 
 	type = nlh->nlmsg_type;
 	if (type > XFRM_MSG_MAX)
-- 
2.13.6

^ permalink raw reply related

* [RFC-resend 1/2] x86/compat: Adjust in_compat_syscall() to generic code under !COMPAT
From: Dmitry Safonov @ 2018-10-12 13:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Dmitry Safonov, Ard Biesheuvel, Andy Lutomirsky,
	David S. Miller, Herbert Xu, H. Peter Anvin, Ingo Molnar,
	John Stultz, Kirill A. Shutemov, Oleg Nesterov, Steffen Klassert,
	Stephen Boyd, Steven Rostedt, Thomas Gleixner, x86, linux-efi,
	netdev
In-Reply-To: <20181012134253.23266-1-dima@arista.com>

The result of in_compat_syscall() can be pictured as:

x86 platform:
    ---------------------------------------------------
    |  Arch\syscall  |  64-bit  |   ia32   |   x32    |
    |-------------------------------------------------|
    |     x86_64     |  false   |   true   |   true   |
    |-------------------------------------------------|
    |      i686      |          |  <true>  |          |
    ---------------------------------------------------

Other platforms:
    -------------------------------------------
    |  Arch\syscall  |  64-bit  |   compat    |
    |-----------------------------------------|
    |     64-bit     |  false   |    true     |
    |-----------------------------------------|
    |    32-bit(?)   |          |   <false>   |
    -------------------------------------------

As it seen, the result of in_compat_syscall() on generic 32-bit platform
differs from i686.

There is no reason for in_compat_syscall() == true on native i686.
It also easy to misread code if the result on native 32-bit platform
differs between arches.
Because of that non arch-specific code has many places with:
    if (IS_ENABLED(CONFIG_COMPAT) && in_compat_syscall())
in different variations.

It looks-like the only non-x86 code which uses in_compat_syscall() not
under CONFIG_COMPAT guard is in amd/amdkfd. But according to
the commit a18069c132cb ("amdkfd: Disable support for 32-bit user
processes"), it actually should be disabled on native i686.

Rename in_compat_syscall() to in_32bit_syscall() for x86-specific code
and make in_compat_syscall() false under !CONFIG_COMPAT.

With a following patch I'll clean generic users which were forced
to check IS_ENABLED(CONFIG_COMPAT) with in_compat_syscall().

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 arch/x86/include/asm/compat.h |  9 ++++++++-
 arch/x86/include/asm/ftrace.h |  4 +---
 arch/x86/kernel/process_64.c  |  4 ++--
 arch/x86/kernel/sys_x86_64.c  | 11 ++++++-----
 arch/x86/mm/hugetlbpage.c     |  4 ++--
 arch/x86/mm/mmap.c            |  2 +-
 include/linux/compat.h        |  4 ++--
 7 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
index fb97cf7c4137..626bcf1d037d 100644
--- a/arch/x86/include/asm/compat.h
+++ b/arch/x86/include/asm/compat.h
@@ -232,11 +232,18 @@ static inline bool in_x32_syscall(void)
 	return false;
 }
 
-static inline bool in_compat_syscall(void)
+static inline bool in_32bit_syscall(void)
 {
 	return in_ia32_syscall() || in_x32_syscall();
 }
+
+#ifdef CONFIG_COMPAT
+static inline bool in_compat_syscall(void)
+{
+	return in_32bit_syscall();
+}
 #define in_compat_syscall in_compat_syscall	/* override the generic impl */
+#endif
 
 struct compat_siginfo;
 int __copy_siginfo_to_user32(struct compat_siginfo __user *to,
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index c18ed65287d5..cf350639e76d 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -76,9 +76,7 @@ static inline bool arch_syscall_match_sym_name(const char *sym, const char *name
 #define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS 1
 static inline bool arch_trace_is_compat_syscall(struct pt_regs *regs)
 {
-	if (in_compat_syscall())
-		return true;
-	return false;
+	return in_32bit_syscall();
 }
 #endif /* CONFIG_FTRACE_SYSCALLS && CONFIG_IA32_EMULATION */
 #endif /* !COMPILE_OFFSETS */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index ea5ea850348d..f96d9ee80086 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -573,10 +573,10 @@ static void __set_personality_x32(void)
 		current->mm->context.ia32_compat = TIF_X32;
 	current->personality &= ~READ_IMPLIES_EXEC;
 	/*
-	 * in_compat_syscall() uses the presence of the x32 syscall bit
+	 * in_32bit_syscall() uses the presence of the x32 syscall bit
 	 * flag to determine compat status.  The x86 mmap() code relies on
 	 * the syscall bitness so set x32 syscall bit right here to make
-	 * in_compat_syscall() work during exec().
+	 * in_32bit_syscall() work during exec().
 	 *
 	 * Pretend to come from a x32 execve.
 	 */
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 6a78d4b36a79..f7476ce23b6e 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -105,7 +105,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
 static void find_start_end(unsigned long addr, unsigned long flags,
 		unsigned long *begin, unsigned long *end)
 {
-	if (!in_compat_syscall() && (flags & MAP_32BIT)) {
+	if (!in_32bit_syscall() && (flags & MAP_32BIT)) {
 		/* This is usually used needed to map code in small
 		   model, so it needs to be in the first 31bit. Limit
 		   it to that.  This means we need to move the
@@ -122,7 +122,7 @@ static void find_start_end(unsigned long addr, unsigned long flags,
 	}
 
 	*begin	= get_mmap_base(1);
-	if (in_compat_syscall())
+	if (in_32bit_syscall())
 		*end = task_size_32bit();
 	else
 		*end = task_size_64bit(addr > DEFAULT_MAP_WINDOW);
@@ -193,7 +193,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 		return addr;
 
 	/* for MAP_32BIT mappings we force the legacy mmap base */
-	if (!in_compat_syscall() && (flags & MAP_32BIT))
+	if (!in_32bit_syscall() && (flags & MAP_32BIT))
 		goto bottomup;
 
 	/* requesting a specific address */
@@ -217,9 +217,10 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	 * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
 	 * in the full address space.
 	 *
-	 * !in_compat_syscall() check to avoid high addresses for x32.
+	 * !in_32bit_syscall() check to avoid high addresses for x32
+	 * (and make it no op on native i386).
 	 */
-	if (addr > DEFAULT_MAP_WINDOW && !in_compat_syscall())
+	if (addr > DEFAULT_MAP_WINDOW && !in_32bit_syscall())
 		info.high_limit += TASK_SIZE_MAX - DEFAULT_MAP_WINDOW;
 
 	info.align_mask = 0;
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 00b296617ca4..92e4c4b85bba 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -92,7 +92,7 @@ static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
 	 * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
 	 * in the full address space.
 	 */
-	info.high_limit = in_compat_syscall() ?
+	info.high_limit = in_32bit_syscall() ?
 		task_size_32bit() : task_size_64bit(addr > DEFAULT_MAP_WINDOW);
 
 	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
@@ -116,7 +116,7 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
 	 * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
 	 * in the full address space.
 	 */
-	if (addr > DEFAULT_MAP_WINDOW && !in_compat_syscall())
+	if (addr > DEFAULT_MAP_WINDOW && !in_32bit_syscall())
 		info.high_limit += TASK_SIZE_MAX - DEFAULT_MAP_WINDOW;
 
 	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 1e95d57760cf..db3165714521 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -166,7 +166,7 @@ unsigned long get_mmap_base(int is_legacy)
 	struct mm_struct *mm = current->mm;
 
 #ifdef CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
-	if (in_compat_syscall()) {
+	if (in_32bit_syscall()) {
 		return is_legacy ? mm->mmap_compat_legacy_base
 				 : mm->mmap_compat_base;
 	}
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 1a3c4f37e908..ac643a2f5f4b 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -1033,9 +1033,9 @@ int kcompat_sys_fstatfs64(unsigned int fd, compat_size_t sz,
 #else /* !CONFIG_COMPAT */
 
 #define is_compat_task() (0)
-#ifndef in_compat_syscall
+/* Ensure no one redefines in_compat_syscall() under !CONFIG_COMPAT */
+#define in_compat_syscall in_compat_syscall
 static inline bool in_compat_syscall(void) { return false; }
-#endif
 
 #endif /* CONFIG_COMPAT */
 
-- 
2.13.6

^ permalink raw reply related

* [RFC-resend 0/2] compat: in_compat_syscall() differs on x86
From: Dmitry Safonov @ 2018-10-12 13:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Dmitry Safonov, Ard Biesheuvel, Andy Lutomirsky,
	David S. Miller, Herbert Xu, H. Peter Anvin, Ingo Molnar,
	John Stultz, Kirill A. Shutemov, Oleg Nesterov, Steffen Klassert,
	Stephen Boyd, Steven Rostedt, Thomas Gleixner, x86, linux-efi,
	netdev

Reading xfrm (ipsec) code I've found such code:

: #ifdef CONFIG_COMPAT
:         if (in_compat_syscall())
:                 return -EOPNOTSUPP;
: #endif

While I can read that it's false on native i386, it's a bit misleading
and in result it's better to introduce a helper for that.
Grepping other code, I've found that there are already such helpers.
And the uniq behavior of in_compat_syscall() on x86 is disturbing.

Adjusting it to generic with the following..

(on the first non-resend RFC I managed to forget Cc'ing Andy..
 sorry about that, was sure I did add him).

Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: linux-efi@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Dmitry Safonov <0x7f454c46@gmail.com>

Dmitry Safonov (2):
  x86/compat: Adjust in_compat_syscall() to generic code under !COMPAT
  compat: Cleanup in_compat_syscall() callers

 arch/x86/include/asm/compat.h  |  9 ++++++++-
 arch/x86/include/asm/ftrace.h  |  4 +---
 arch/x86/kernel/process_64.c   |  4 ++--
 arch/x86/kernel/sys_x86_64.c   | 11 ++++++-----
 arch/x86/mm/hugetlbpage.c      |  4 ++--
 arch/x86/mm/mmap.c             |  2 +-
 drivers/firmware/efi/efivars.c | 16 ++++------------
 include/linux/compat.h         |  4 ++--
 kernel/time/time.c             |  2 +-
 net/xfrm/xfrm_state.c          |  2 --
 net/xfrm/xfrm_user.c           |  2 --
 11 files changed, 27 insertions(+), 33 deletions(-)

-- 
2.13.6

^ permalink raw reply

* Re: [PATCH bpf-next 2/2] bpf, libbpf: simplify perf RB walk and do incremental updates
From: Daniel Borkmann @ 2018-10-12 13:30 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: alexei.starovoitov, netdev
In-Reply-To: <4258c8c5-f18e-65ba-cb14-4a2f2e3187cc@iogearbox.net>

On 10/12/2018 10:39 AM, Daniel Borkmann wrote:
> On 10/12/2018 05:04 AM, Jakub Kicinski wrote:
>> On Thu, 11 Oct 2018 16:02:07 +0200, Daniel Borkmann wrote:
>>> Clean up and improve bpf_perf_event_read_simple() ring walk a bit
>>> to use similar tail update scheme as in perf and bcc allowing the
>>> kernel to make forward progress not only after full timely walk.
>>
>> The extra memory barriers won't impact performance?  If I read the code
>> correctly we now have:
>>
>> 	while (bla) {
>> 		head = HEAD
>> 		rmb()
>>
>> 		...
>>
>> 		mb()
>> 		TAIL = tail
>> 	}
>>
>> Would it make sense to try to piggy back on the mb() for head re-read
>> at least?  Perhaps that's a non-issue, just wondering.
> 
> From the scheme specified in the comment in prior patch my understanding
> would be that they don't pair (see B and C) so there would be no guarantee
> that load of head would happen before load of data. Fwiw, I've been using
> the exact same semantics as user space perf tool walks the perf mmap'ed
> ring buffer (tools/perf/util/mmap.{h,c}) here. Given kernel doesn't stop

On that note, I'll also respin, after some clarification with PeterZ on
why perf is using {rmb,mb}() barriers today as opposed to more lightweight
smp_{rmb,mb}() ones it turns out there is no real reason other than
historic one and perf can be changed and made more efficient as well. ;)

> pushing into ring buffer while user space walks it and indicates how far
> it has consumed data via tail update, it would allow for making room
> successively and not only after full run has complete, so we don't make
> any assumptions in the generic libbpf library helper on how slow/quick
> the callback would be processing resp. how full ring is, etc, and kernel
> pushing new data can be processed in the same run if necessary. One thing
> we could consider is to batch tail updates, say, every 8 elements and a
> final update once we break out walking the ring; probably okay'ish as a
> heuristic..
> 
>>> Also few other improvements to use realloc() instead of free() and
>>> malloc() combination and for the callback use proper perf_event_header
>>> instead of void pointer, so that real applications can use container_of()
>>> macro with proper type checking.
>>
>> FWIW the free() + malloc() was to avoid the the needless copy of the
>> previous event realloc() may do.  It makes sense to use realloc()
>> especially if you want to put extra info in front of the buffer, just
>> sayin' it wasn't a complete braino ;)
> 
> No strong preference from my side, I'd think that it might be sensible in
> any case from applications to call the bpf_perf_event_read_simple() with a
> already preallocated buffer, depending on the expected max element size from
> BPF could e.g. be a buffer of 1 page or so. Given 512 byte stack space from
> the BPF prog and MTU 1500 this would more than suffice to avoid new
> allocations altogether. Anyway, given we only grow the new memory area I
> did some testing on realloc() with an array of pointers to prior malloc()'ed
> buffers, running randomly 10M realloc()s to increase size over them and
> saw <1% where area had to be moved, so we're hitting corner case of a corner
> case, I'm also ok to leave the combination, though. :)
> 
> Thanks,
> Daniel

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox