Netdev List


* [PATCH 14/14] ARM: davinci: sffsdr: use device properties for at24 eeprom
From: Bartosz Golaszewski @ 2018-06-25 15:50 UTC (permalink / raw)
  To: Sekhar Nori, Kevin Hilman, Russell King, Grygorii Strashko,
	David S . Miller, Srinivas Kandagatla, Lukas Wunner, Rob Herring,
	Florian Fainelli, Dan Carpenter, Ivan Khoronzhuk, David Lechner,
	Greg Kroah-Hartman
  Cc: linux-arm-kernel, linux-kernel, linux-omap, netdev,
	Bartosz Golaszewski
In-Reply-To: <20180625155025.12567-1-brgl@bgdev.pl>

From: Bartosz Golaszewski <bgolaszewski@baylibre.com>

We want to work towards phasing out the at24_platform_data structure.
There are few users and its contents can be represented using generic
device properties. Using device properties only will allow us to
significantly simplify the at24 configuration code.

Remove the at24_platform_data structure and replace it with an array
of property entries. Drop the byte_len/size property, as the model name
already implies the EEPROM's size.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
---
 arch/arm/mach-davinci/board-sffsdr.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/arm/mach-davinci/board-sffsdr.c b/arch/arm/mach-davinci/board-sffsdr.c
index f6a4d094cbc3..680e5d7628a8 100644
--- a/arch/arm/mach-davinci/board-sffsdr.c
+++ b/arch/arm/mach-davinci/board-sffsdr.c
@@ -26,7 +26,7 @@
 #include <linux/init.h>
 #include <linux/platform_device.h>
 #include <linux/i2c.h>
-#include <linux/platform_data/at24.h>
+#include <linux/property.h>
 #include <linux/mtd/mtd.h>
 #include <linux/mtd/rawnand.h>
 #include <linux/mtd/partitions.h>
@@ -92,16 +92,15 @@ static struct platform_device davinci_sffsdr_nandflash_device = {
 	.resource	= davinci_sffsdr_nandflash_resource,
 };
 
-static struct at24_platform_data eeprom_info = {
-	.byte_len	= (64*1024) / 8,
-	.page_size	= 32,
-	.flags		= AT24_FLAG_ADDR16,
+static const struct property_entry eeprom_properties[] = {
+	PROPERTY_ENTRY_U32("pagesize", 32),
+	{ },
 };
 
 static struct i2c_board_info __initdata i2c_info[] =  {
 	{
 		I2C_BOARD_INFO("24c64", 0x50),
-		.platform_data	= &eeprom_info,
+		.properties = eeprom_properties,
 	},
 	/* Other I2C devices:
 	 * MSP430,  addr 0x23 (not used)
-- 
2.17.1

^ permalink raw reply related
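
The property array in the patch above ends with an empty entry, which acts as a sentinel for whoever walks it. A minimal userspace sketch of that lookup pattern follows, assuming a simplified entry type. `struct prop` and `prop_read_u32` are hypothetical stand-ins for illustration, not the kernel's `struct property_entry` or its accessors.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's struct property_entry. */
struct prop {
	const char *name;	/* NULL marks the end of the array */
	uint32_t value;
};

/* Mirrors the board file: one "pagesize" entry plus an empty sentinel. */
static const struct prop eeprom_props[] = {
	{ "pagesize", 32 },
	{ NULL, 0 },
};

/*
 * Walk the array until the sentinel. Store the value and return 0 on a
 * match, or return -1 so the caller can fall back to a default.
 */
static int prop_read_u32(const struct prop *props, const char *name,
			 uint32_t *out)
{
	for (; props->name; props++) {
		if (strcmp(props->name, name) == 0) {
			*out = props->value;
			return 0;
		}
	}
	return -1;
}
```

A consumer that finds no "size" entry would then have to derive the capacity some other way, which is what the commit message relies on when it says the model name already implies the size.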

* [PATCH 02/14] ARM: davinci: dm365-evm: use nvmem lookup for mac address
From: Bartosz Golaszewski @ 2018-06-25 15:50 UTC (permalink / raw)
  To: Sekhar Nori, Kevin Hilman, Russell King, Grygorii Strashko,
	David S . Miller, Srinivas Kandagatla, Lukas Wunner, Rob Herring,
	Florian Fainelli, Dan Carpenter, Ivan Khoronzhuk, David Lechner,
	Greg Kroah-Hartman
  Cc: linux-arm-kernel, linux-kernel, linux-omap, netdev,
	Bartosz Golaszewski
In-Reply-To: <20180625155025.12567-1-brgl@bgdev.pl>

From: Bartosz Golaszewski <bgolaszewski@baylibre.com>

We now support nvmem lookups for machine code. Add a lookup for
mac-address.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
---
 arch/arm/mach-davinci/board-dm365-evm.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/arm/mach-davinci/board-dm365-evm.c b/arch/arm/mach-davinci/board-dm365-evm.c
index 435f7ec7d9af..df640d977bfa 100644
--- a/arch/arm/mach-davinci/board-dm365-evm.c
+++ b/arch/arm/mach-davinci/board-dm365-evm.c
@@ -28,6 +28,7 @@
 #include <linux/spi/spi.h>
 #include <linux/spi/eeprom.h>
 #include <linux/v4l2-dv-timings.h>
+#include <linux/nvmem-provider.h>
 
 #include <asm/mach-types.h>
 #include <asm/mach/arch.h>
@@ -169,6 +170,15 @@ static struct platform_device davinci_nand_device = {
 	},
 };
 
+static struct nvmem_cell_lookup dm365evm_mac_address_cell = {
+	.info = {
+		.name = "mac-address",
+		.offset = 0x7f00,
+		.bytes = ETH_ALEN,
+	},
+	.nvmem_name = "1-00500",
+};
+
 static struct at24_platform_data eeprom_info = {
 	.byte_len       = (256*1024) / 8,
 	.page_size      = 64,
@@ -769,6 +779,8 @@ static __init void dm365_evm_init(void)
 
 	dm365_init_spi0(BIT(0), dm365_evm_spi_info,
 			ARRAY_SIZE(dm365_evm_spi_info));
+
+	nvmem_register_lookup(&dm365evm_mac_address_cell, 1);
 }
 
 MACHINE_START(DAVINCI_DM365_EVM, "DaVinci DM365 EVM")
-- 
2.17.1

^ permalink raw reply related
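
The lookup above describes the MAC address as a cell: a byte offset (0x7f00) and a length (ETH_ALEN) inside the EEPROM. A rough userspace sketch of what reading such a cell amounts to is shown below, assuming the EEPROM contents are available as a flat byte image. `struct cell` and `cell_read` are hypothetical names, not part of the nvmem API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6	/* same value as the kernel's ETH_ALEN */

/* Hypothetical stand-in for the offset/length part of an nvmem cell. */
struct cell {
	size_t offset;
	size_t bytes;
};

static const struct cell mac_cell = { .offset = 0x7f00, .bytes = MAC_LEN };

/*
 * Copy the cell out of an EEPROM image. Returns 0 on success, or -1 if
 * the cell does not fit inside the image or the output buffer.
 */
static int cell_read(const uint8_t *image, size_t image_len,
		     const struct cell *c, uint8_t *out, size_t out_len)
{
	if (c->bytes > out_len || c->offset > image_len ||
	    c->bytes > image_len - c->offset)
		return -1;
	memcpy(out, image + c->offset, c->bytes);
	return 0;
}
```

With the 32 KiB size given by byte_len for this board's EEPROM, the cell at 0x7f00 sits in the last 256 bytes of the device.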

* [PATCH 03/14] ARM: davinci: dm644-evm: use nvmem lookup for mac address
From: Bartosz Golaszewski @ 2018-06-25 15:50 UTC (permalink / raw)
  To: Sekhar Nori, Kevin Hilman, Russell King, Grygorii Strashko,
	David S . Miller, Srinivas Kandagatla, Lukas Wunner, Rob Herring,
	Florian Fainelli, Dan Carpenter, Ivan Khoronzhuk, David Lechner,
	Greg Kroah-Hartman
  Cc: linux-arm-kernel, linux-kernel, linux-omap, netdev,
	Bartosz Golaszewski
In-Reply-To: <20180625155025.12567-1-brgl@bgdev.pl>

From: Bartosz Golaszewski <bgolaszewski@baylibre.com>

We now support nvmem lookups for machine code. Add a lookup for
mac-address.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
---
 arch/arm/mach-davinci/board-dm644x-evm.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/arm/mach-davinci/board-dm644x-evm.c b/arch/arm/mach-davinci/board-dm644x-evm.c
index 48436f74fd71..adbe8630ef19 100644
--- a/arch/arm/mach-davinci/board-dm644x-evm.c
+++ b/arch/arm/mach-davinci/board-dm644x-evm.c
@@ -28,6 +28,7 @@
 #include <linux/v4l2-dv-timings.h>
 #include <linux/export.h>
 #include <linux/leds.h>
+#include <linux/nvmem-provider.h>
 
 #include <media/i2c/tvp514x.h>
 
@@ -476,6 +477,15 @@ static struct pcf857x_platform_data pcf_data_u35 = {
  *  - ... newer boards may have more
  */
 
+static struct nvmem_cell_lookup dm6446evm_mac_address_cell = {
+	.info = {
+		.name = "mac-address",
+		.offset = 0x7f00,
+		.bytes = ETH_ALEN,
+	},
+	.nvmem_name = "1-00500",
+};
+
 static struct at24_platform_data eeprom_info = {
 	.byte_len	= (256*1024) / 8,
 	.page_size	= 64,
@@ -828,6 +838,8 @@ static __init void davinci_evm_init(void)
 		phy_register_fixup_for_uid(LXT971_PHY_ID, LXT971_PHY_MASK,
 						davinci_phy_fixup);
 	}
+
+	nvmem_register_lookup(&dm6446evm_mac_address_cell, 1);
 }
 
 MACHINE_START(DAVINCI_EVM, "DaVinci DM644x EVM")
-- 
2.17.1

^ permalink raw reply related

* [PATCH 05/14] ARM: davinci: da830-evm: use nvmem lookup for mac address
From: Bartosz Golaszewski @ 2018-06-25 15:50 UTC (permalink / raw)
  To: Sekhar Nori, Kevin Hilman, Russell King, Grygorii Strashko,
	David S . Miller, Srinivas Kandagatla, Lukas Wunner, Rob Herring,
	Florian Fainelli, Dan Carpenter, Ivan Khoronzhuk, David Lechner,
	Greg Kroah-Hartman
  Cc: linux-arm-kernel, linux-kernel, linux-omap, netdev,
	Bartosz Golaszewski
In-Reply-To: <20180625155025.12567-1-brgl@bgdev.pl>

From: Bartosz Golaszewski <bgolaszewski@baylibre.com>

We now support nvmem lookups for machine code. Add a lookup for
mac-address.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
---
 arch/arm/mach-davinci/board-da830-evm.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/arm/mach-davinci/board-da830-evm.c b/arch/arm/mach-davinci/board-da830-evm.c
index 14a6fc061744..3be3e93f2f18 100644
--- a/arch/arm/mach-davinci/board-da830-evm.c
+++ b/arch/arm/mach-davinci/board-da830-evm.c
@@ -29,6 +29,7 @@
 #include <linux/platform_data/spi-davinci.h>
 #include <linux/platform_data/usb-davinci.h>
 #include <linux/regulator/machine.h>
+#include <linux/nvmem-provider.h>
 
 #include <asm/mach-types.h>
 #include <asm/mach/arch.h>
@@ -409,6 +410,15 @@ static inline void da830_evm_init_lcdc(int mux_mode)
 static inline void da830_evm_init_lcdc(int mux_mode) { }
 #endif
 
+static struct nvmem_cell_lookup da830_evm_mac_address_cell = {
+	.info = {
+		.name = "mac-address",
+		.offset = 0x7f00,
+		.bytes = ETH_ALEN,
+	},
+	.nvmem_name = "1-00500",
+};
+
 static struct at24_platform_data da830_evm_i2c_eeprom_info = {
 	.byte_len	= SZ_256K / 8,
 	.page_size	= 64,
@@ -618,6 +628,8 @@ static __init void da830_evm_init(void)
 		pr_warn("%s: spi 0 registration failed: %d\n", __func__, ret);
 
 	regulator_has_full_constraints();
+
+	nvmem_register_lookup(&da830_evm_mac_address_cell, 1);
 }
 
 #ifdef CONFIG_SERIAL_8250_CONSOLE
-- 
2.17.1

^ permalink raw reply related

* [PATCH 08/14] ARM: davinci: mityomapl138: don't read the MAC address from machine code
From: Bartosz Golaszewski @ 2018-06-25 15:50 UTC (permalink / raw)
  To: Sekhar Nori, Kevin Hilman, Russell King, Grygorii Strashko,
	David S . Miller, Srinivas Kandagatla, Lukas Wunner, Rob Herring,
	Florian Fainelli, Dan Carpenter, Ivan Khoronzhuk, David Lechner,
	Greg Kroah-Hartman
  Cc: linux-arm-kernel, linux-kernel, linux-omap, netdev,
	Bartosz Golaszewski
In-Reply-To: <20180625155025.12567-1-brgl@bgdev.pl>

From: Bartosz Golaszewski <bgolaszewski@baylibre.com>

This is now done by the emac driver using a registered nvmem cell.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
---
 arch/arm/mach-davinci/board-mityomapl138.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/arch/arm/mach-davinci/board-mityomapl138.c b/arch/arm/mach-davinci/board-mityomapl138.c
index 2ec31ff61dbd..6263e6afcbf0 100644
--- a/arch/arm/mach-davinci/board-mityomapl138.c
+++ b/arch/arm/mach-davinci/board-mityomapl138.c
@@ -120,7 +120,6 @@ static void read_factory_config(struct nvmem_device *nvmem, void *context)
 {
 	int ret;
 	const char *partnum = NULL;
-	struct davinci_soc_info *soc_info = &davinci_soc_info;
 
 	if (!IS_BUILTIN(CONFIG_NVMEM)) {
 		pr_warn("Factory Config not available without CONFIG_NVMEM\n");
@@ -146,13 +145,6 @@ static void read_factory_config(struct nvmem_device *nvmem, void *context)
 		goto bad_config;
 	}
 
-	pr_info("Found MAC = %pM\n", factory_config.mac);
-	if (is_valid_ether_addr(factory_config.mac))
-		memcpy(soc_info->emac_pdata->mac_addr,
-			factory_config.mac, ETH_ALEN);
-	else
-		pr_warn("Invalid MAC found in factory config block\n");
-
 	partnum = factory_config.partnum;
 	pr_info("Part Number = %s\n", partnum);
 
-- 
2.17.1

^ permalink raw reply related
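
The hunk removed above only copied the address when is_valid_ether_addr() accepted it. For reference, here is a minimal userspace sketch of that kind of check, assuming the usual definition (not all zeroes and not a multicast address). `mac_is_valid` is a hypothetical helper, and nothing here says where the emac driver performs its own validation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Accept an address only if it is non-zero and unicast. Bit 0 of the
 * first octet is the multicast bit; the broadcast address has it set too.
 */
static bool mac_is_valid(const uint8_t mac[6])
{
	uint8_t any = 0;

	for (int i = 0; i < 6; i++)
		any |= mac[i];

	return any != 0 && !(mac[0] & 0x01);
}
```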

* [PATCH 09/14] ARM: davinci: dm365-evm: use device properties for at24 eeprom
From: Bartosz Golaszewski @ 2018-06-25 15:50 UTC (permalink / raw)
  To: Sekhar Nori, Kevin Hilman, Russell King, Grygorii Strashko,
	David S . Miller, Srinivas Kandagatla, Lukas Wunner, Rob Herring,
	Florian Fainelli, Dan Carpenter, Ivan Khoronzhuk, David Lechner,
	Greg Kroah-Hartman
  Cc: linux-arm-kernel, linux-kernel, linux-omap, netdev,
	Bartosz Golaszewski
In-Reply-To: <20180625155025.12567-1-brgl@bgdev.pl>

From: Bartosz Golaszewski <bgolaszewski@baylibre.com>

We want to work towards phasing out the at24_platform_data structure.
There are few users and its contents can be represented using generic
device properties. Using device properties only will allow us to
significantly simplify the at24 configuration code.

Remove the at24_platform_data structure and replace it with an array
of property entries. Drop the byte_len/size property, as the model name
already implies the EEPROM's size.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
---
 arch/arm/mach-davinci/board-dm365-evm.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/arm/mach-davinci/board-dm365-evm.c b/arch/arm/mach-davinci/board-dm365-evm.c
index df640d977bfa..ffe93265f565 100644
--- a/arch/arm/mach-davinci/board-dm365-evm.c
+++ b/arch/arm/mach-davinci/board-dm365-evm.c
@@ -18,7 +18,7 @@
 #include <linux/i2c.h>
 #include <linux/io.h>
 #include <linux/clk.h>
-#include <linux/platform_data/at24.h>
+#include <linux/property.h>
 #include <linux/leds.h>
 #include <linux/mtd/mtd.h>
 #include <linux/mtd/partitions.h>
@@ -179,18 +179,15 @@ static struct nvmem_cell_lookup dm365evm_mac_address_cell = {
 	.nvmem_name = "1-00500",
 };
 
-static struct at24_platform_data eeprom_info = {
-	.byte_len       = (256*1024) / 8,
-	.page_size      = 64,
-	.flags          = AT24_FLAG_ADDR16,
-	.setup          = davinci_get_mac_addr,
-	.context	= (void *)0x7f00,
+static const struct property_entry eeprom_properties[] = {
+	PROPERTY_ENTRY_U32("pagesize", 64),
+	{ }
 };
 
 static struct i2c_board_info i2c_info[] = {
 	{
 		I2C_BOARD_INFO("24c256", 0x50),
-		.platform_data	= &eeprom_info,
+		.properties = eeprom_properties,
 	},
 	{
 		I2C_BOARD_INFO("tlv320aic3x", 0x18),
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH v2 ipsec-next] xfrm: policy: remove pcpu policy cache
From: Steffen Klassert @ 2018-06-25 15:55 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev
In-Reply-To: <20180625152602.23111-1-fw@strlen.de>

On Mon, Jun 25, 2018 at 05:26:02PM +0200, Florian Westphal wrote:
> Kristian Evensen says:
>   In a project I am involved in, we are running ipsec (Strongswan) on
>   different mt7621-based routers. Each router is configured as an
>   initiator and has around ~30 tunnels to different responders (running
>   on misc. devices). Before the flow cache was removed (kernel 4.9), we
>   got a combined throughput of around 70Mbit/s for all tunnels on one
>   router. However, we recently switched to kernel 4.14 (4.14.48), and
>   the total throughput is somewhere around 57Mbit/s (best-case). I.e., a
>   drop of around 20%. Reverting the flow cache removal restores, as
>   expected, performance levels to that of kernel 4.9.
> 
> When pcpu xdst exists, it has to be validated first before it can be
> used.
> 
> A negative hit thus increases cost vs. no-cache.
> 
> As number of tunnels increases, hit rate decreases so this pcpu caching
> isn't a viable strategy.
> 
> Furthermore, the xdst cache also needs to run with BH off, so when
> removing this the bh disable/enable pairs can be removed too.
> 
> Kristian tested a 4.14.y backport of this change and reported
> increased performance:
> 
>   In our tests, the throughput reduction has been reduced from around -20%
>   to -5%. We also see that the overall throughput is independent of the
>   number of tunnels, while before the throughput was reduced as the number
>   of tunnels increased.
> 
> Reported-by: Kristian Evensen <kristian.evensen@gmail.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Applied to ipsec-next, thanks a lot!

^ permalink raw reply
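
For the figures quoted from Kristian's report, the relative drop from 70Mbit/s to 57Mbit/s works out to roughly 18.6%, which matches the "around 20%" in the text. The arithmetic, as a small sketch (`percent_drop` is a hypothetical helper):

```c
/* Relative reduction from 'before' to 'after', in percent. */
static double percent_drop(double before, double after)
{
	return (before - after) / before * 100.0;
}
```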

* [PATCH net-next] net: preserve sock reference when scrubbing the skb.
From: Flavio Leitner @ 2018-06-25 15:56 UTC (permalink / raw)
  To: netdev
  Cc: Eric Dumazet, Paolo Abeni, David Miller, Florian Westphal,
	netfilter-devel, Flavio Leitner

The sock reference is lost when scrubbing the packet and that breaks
TSQ (TCP Small Queues) and XPS (Transmit Packet Steering) causing
performance impacts of about 50% in a single TCP stream when crossing
network namespaces.

XPS breaks because the queue mapping stored in the socket is not
available, so another random queue might be selected when the stack
needs to transmit something like a TCP ACK, or TCP Retransmissions.
That causes packet re-ordering and/or performance issues.

TSQ breaks because it orphans the packet while it is still in the
host, so packets are queued contributing to the buffer bloat problem.

Preserving the sock reference fixes both issues. The socket is
orphaned anyways in the receiving path before any relevant action,
but the transmit side needs some extra checking included in the
patch.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
---
 include/net/netfilter/nf_log.h         | 3 ++-
 net/core/skbuff.c                      | 1 -
 net/ipv4/netfilter/nf_log_ipv4.c       | 8 ++++----
 net/ipv6/netfilter/nf_log_ipv6.c       | 8 ++++----
 net/netfilter/nf_conntrack_broadcast.c | 2 +-
 net/netfilter/nf_log_common.c          | 5 +++--
 net/netfilter/nf_nat_core.c            | 6 +++++-
 net/netfilter/nft_meta.c               | 9 ++++++---
 net/netfilter/nft_socket.c             | 5 ++++-
 net/netfilter/xt_cgroup.c              | 6 ++++--
 net/netfilter/xt_owner.c               | 2 +-
 net/netfilter/xt_recent.c              | 3 ++-
 net/netfilter/xt_socket.c              | 6 ++++++
 13 files changed, 42 insertions(+), 22 deletions(-)

diff --git a/include/net/netfilter/nf_log.h b/include/net/netfilter/nf_log.h
index e811ac07ea94..0d3920896d50 100644
--- a/include/net/netfilter/nf_log.h
+++ b/include/net/netfilter/nf_log.h
@@ -106,7 +106,8 @@ int nf_log_dump_udp_header(struct nf_log_buf *m, const struct sk_buff *skb,
 int nf_log_dump_tcp_header(struct nf_log_buf *m, const struct sk_buff *skb,
 			   u8 proto, int fragment, unsigned int offset,
 			   unsigned int logflags);
-void nf_log_dump_sk_uid_gid(struct nf_log_buf *m, struct sock *sk);
+void nf_log_dump_sk_uid_gid(struct net *net, struct nf_log_buf *m,
+			    struct sock *sk);
 void nf_log_dump_packet_common(struct nf_log_buf *m, u_int8_t pf,
 			       unsigned int hooknum, const struct sk_buff *skb,
 			       const struct net_device *in,
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c642304f178c..37263095545a 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4911,7 +4911,6 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet)
 		return;
 
 	ipvs_reset(skb);
-	skb_orphan(skb);
 	skb->mark = 0;
 }
 EXPORT_SYMBOL_GPL(skb_scrub_packet);
diff --git a/net/ipv4/netfilter/nf_log_ipv4.c b/net/ipv4/netfilter/nf_log_ipv4.c
index 4388de0e5380..1e6f28c97d3a 100644
--- a/net/ipv4/netfilter/nf_log_ipv4.c
+++ b/net/ipv4/netfilter/nf_log_ipv4.c
@@ -35,7 +35,7 @@ static const struct nf_loginfo default_loginfo = {
 };
 
 /* One level of recursion won't kill us */
-static void dump_ipv4_packet(struct nf_log_buf *m,
+static void dump_ipv4_packet(struct net *net, struct nf_log_buf *m,
 			     const struct nf_loginfo *info,
 			     const struct sk_buff *skb, unsigned int iphoff)
 {
@@ -183,7 +183,7 @@ static void dump_ipv4_packet(struct nf_log_buf *m,
 			/* Max length: 3+maxlen */
 			if (!iphoff) { /* Only recurse once. */
 				nf_log_buf_add(m, "[");
-				dump_ipv4_packet(m, info, skb,
+				dump_ipv4_packet(net, m, info, skb,
 					    iphoff + ih->ihl*4+sizeof(_icmph));
 				nf_log_buf_add(m, "] ");
 			}
@@ -251,7 +251,7 @@ static void dump_ipv4_packet(struct nf_log_buf *m,
 
 	/* Max length: 15 "UID=4294967295 " */
 	if ((logflags & NF_LOG_UID) && !iphoff)
-		nf_log_dump_sk_uid_gid(m, skb->sk);
+		nf_log_dump_sk_uid_gid(net, m, skb->sk);
 
 	/* Max length: 16 "MARK=0xFFFFFFFF " */
 	if (!iphoff && skb->mark)
@@ -333,7 +333,7 @@ static void nf_log_ip_packet(struct net *net, u_int8_t pf,
 	if (in != NULL)
 		dump_ipv4_mac_header(m, loginfo, skb);
 
-	dump_ipv4_packet(m, loginfo, skb, 0);
+	dump_ipv4_packet(net, m, loginfo, skb, 0);
 
 	nf_log_buf_close(m);
 }
diff --git a/net/ipv6/netfilter/nf_log_ipv6.c b/net/ipv6/netfilter/nf_log_ipv6.c
index b397a8fe88b9..c6bf580d0f33 100644
--- a/net/ipv6/netfilter/nf_log_ipv6.c
+++ b/net/ipv6/netfilter/nf_log_ipv6.c
@@ -36,7 +36,7 @@ static const struct nf_loginfo default_loginfo = {
 };
 
 /* One level of recursion won't kill us */
-static void dump_ipv6_packet(struct nf_log_buf *m,
+static void dump_ipv6_packet(struct net *net, struct nf_log_buf *m,
 			     const struct nf_loginfo *info,
 			     const struct sk_buff *skb, unsigned int ip6hoff,
 			     int recurse)
@@ -258,7 +258,7 @@ static void dump_ipv6_packet(struct nf_log_buf *m,
 			/* Max length: 3+maxlen */
 			if (recurse) {
 				nf_log_buf_add(m, "[");
-				dump_ipv6_packet(m, info, skb,
+				dump_ipv6_packet(net, m, info, skb,
 						 ptr + sizeof(_icmp6h), 0);
 				nf_log_buf_add(m, "] ");
 			}
@@ -278,7 +278,7 @@ static void dump_ipv6_packet(struct nf_log_buf *m,
 
 	/* Max length: 15 "UID=4294967295 " */
 	if ((logflags & NF_LOG_UID) && recurse)
-		nf_log_dump_sk_uid_gid(m, skb->sk);
+		nf_log_dump_sk_uid_gid(net, m, skb->sk);
 
 	/* Max length: 16 "MARK=0xFFFFFFFF " */
 	if (recurse && skb->mark)
@@ -365,7 +365,7 @@ static void nf_log_ip6_packet(struct net *net, u_int8_t pf,
 	if (in != NULL)
 		dump_ipv6_mac_header(m, loginfo, skb);
 
-	dump_ipv6_packet(m, loginfo, skb, skb_network_offset(skb), 1);
+	dump_ipv6_packet(net, m, loginfo, skb, skb_network_offset(skb), 1);
 
 	nf_log_buf_close(m);
 }
diff --git a/net/netfilter/nf_conntrack_broadcast.c b/net/netfilter/nf_conntrack_broadcast.c
index a1086bdec242..5423b197d98a 100644
--- a/net/netfilter/nf_conntrack_broadcast.c
+++ b/net/netfilter/nf_conntrack_broadcast.c
@@ -32,7 +32,7 @@ int nf_conntrack_broadcast_help(struct sk_buff *skb,
 	__be32 mask = 0;
 
 	/* we're only interested in locally generated packets */
-	if (skb->sk == NULL)
+	if (skb->sk == NULL || !net_eq(nf_ct_net(ct), sock_net(skb->sk)))
 		goto out;
 	if (rt == NULL || !(rt->rt_flags & RTCF_BROADCAST))
 		goto out;
diff --git a/net/netfilter/nf_log_common.c b/net/netfilter/nf_log_common.c
index dc61399e30be..a8c5c846aec1 100644
--- a/net/netfilter/nf_log_common.c
+++ b/net/netfilter/nf_log_common.c
@@ -132,9 +132,10 @@ int nf_log_dump_tcp_header(struct nf_log_buf *m, const struct sk_buff *skb,
 }
 EXPORT_SYMBOL_GPL(nf_log_dump_tcp_header);
 
-void nf_log_dump_sk_uid_gid(struct nf_log_buf *m, struct sock *sk)
+void nf_log_dump_sk_uid_gid(struct net *net, struct nf_log_buf *m,
+			    struct sock *sk)
 {
-	if (!sk || !sk_fullsock(sk))
+	if (!sk || !sk_fullsock(sk) || !net_eq(net, sock_net(sk)))
 		return;
 
 	read_lock_bh(&sk->sk_callback_lock);
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index 46f9df99d276..86df2a1666fd 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -108,6 +108,7 @@ int nf_xfrm_me_harder(struct net *net, struct sk_buff *skb, unsigned int family)
 	struct flowi fl;
 	unsigned int hh_len;
 	struct dst_entry *dst;
+	struct sock *sk = skb->sk;
 	int err;
 
 	err = xfrm_decode_session(skb, &fl, family);
@@ -119,7 +120,10 @@ int nf_xfrm_me_harder(struct net *net, struct sk_buff *skb, unsigned int family)
 		dst = ((struct xfrm_dst *)dst)->route;
 	dst_hold(dst);
 
-	dst = xfrm_lookup(net, dst, &fl, skb->sk, 0);
+	if (sk && !net_eq(net, sock_net(sk)))
+		sk = NULL;
+
+	dst = xfrm_lookup(net, dst, &fl, sk, 0);
 	if (IS_ERR(dst))
 		return PTR_ERR(dst);
 
diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index 1105a23bda5e..2b94dcc43456 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -107,7 +107,8 @@ static void nft_meta_get_eval(const struct nft_expr *expr,
 		break;
 	case NFT_META_SKUID:
 		sk = skb_to_full_sk(skb);
-		if (!sk || !sk_fullsock(sk))
+		if (!sk || !sk_fullsock(sk) ||
+		    !net_eq(nft_net(pkt), sock_net(sk)))
 			goto err;
 
 		read_lock_bh(&sk->sk_callback_lock);
@@ -123,7 +124,8 @@ static void nft_meta_get_eval(const struct nft_expr *expr,
 		break;
 	case NFT_META_SKGID:
 		sk = skb_to_full_sk(skb);
-		if (!sk || !sk_fullsock(sk))
+		if (!sk || !sk_fullsock(sk) ||
+		    !net_eq(nft_net(pkt), sock_net(sk)))
 			goto err;
 
 		read_lock_bh(&sk->sk_callback_lock);
@@ -214,7 +216,8 @@ static void nft_meta_get_eval(const struct nft_expr *expr,
 #ifdef CONFIG_CGROUP_NET_CLASSID
 	case NFT_META_CGROUP:
 		sk = skb_to_full_sk(skb);
-		if (!sk || !sk_fullsock(sk))
+		if (!sk || !sk_fullsock(sk) ||
+		    !net_eq(nft_net(pkt), sock_net(sk)))
 			goto err;
 		*dest = sock_cgroup_classid(&sk->sk_cgrp_data);
 		break;
diff --git a/net/netfilter/nft_socket.c b/net/netfilter/nft_socket.c
index 74e1b3bd6954..998c2b546f6d 100644
--- a/net/netfilter/nft_socket.c
+++ b/net/netfilter/nft_socket.c
@@ -23,6 +23,9 @@ static void nft_socket_eval(const struct nft_expr *expr,
 	struct sock *sk = skb->sk;
 	u32 *dest = &regs->data[priv->dreg];
 
+	if (sk && !net_eq(nft_net(pkt), sock_net(sk)))
+		sk = NULL;
+
 	if (!sk)
 		switch(nft_pf(pkt)) {
 		case NFPROTO_IPV4:
@@ -39,7 +42,7 @@ static void nft_socket_eval(const struct nft_expr *expr,
 			return;
 		}
 
-	if(!sk) {
+	if (!sk) {
 		nft_reg_store8(dest, 0);
 		return;
 	}
diff --git a/net/netfilter/xt_cgroup.c b/net/netfilter/xt_cgroup.c
index 7df2dece57d3..5d92e1781980 100644
--- a/net/netfilter/xt_cgroup.c
+++ b/net/netfilter/xt_cgroup.c
@@ -72,8 +72,9 @@ static bool
 cgroup_mt_v0(const struct sk_buff *skb, struct xt_action_param *par)
 {
 	const struct xt_cgroup_info_v0 *info = par->matchinfo;
+	struct sock *sk = skb->sk;
 
-	if (skb->sk == NULL || !sk_fullsock(skb->sk))
+	if (!sk || !sk_fullsock(sk) || !net_eq(xt_net(par), sock_net(sk)))
 		return false;
 
 	return (info->id == sock_cgroup_classid(&skb->sk->sk_cgrp_data)) ^
@@ -85,8 +86,9 @@ static bool cgroup_mt_v1(const struct sk_buff *skb, struct xt_action_param *par)
 	const struct xt_cgroup_info_v1 *info = par->matchinfo;
 	struct sock_cgroup_data *skcd = &skb->sk->sk_cgrp_data;
 	struct cgroup *ancestor = info->priv;
+	struct sock *sk = skb->sk;
 
-	if (!skb->sk || !sk_fullsock(skb->sk))
+	if (!sk || !sk_fullsock(sk) || !net_eq(xt_net(par), sock_net(sk)))
 		return false;
 
 	if (ancestor)
diff --git a/net/netfilter/xt_owner.c b/net/netfilter/xt_owner.c
index 3d705c688a27..46686fb73784 100644
--- a/net/netfilter/xt_owner.c
+++ b/net/netfilter/xt_owner.c
@@ -67,7 +67,7 @@ owner_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	struct sock *sk = skb_to_full_sk(skb);
 	struct net *net = xt_net(par);
 
-	if (sk == NULL || sk->sk_socket == NULL)
+	if (!sk || !sk->sk_socket || !net_eq(net, sock_net(sk)))
 		return (info->match ^ info->invert) == 0;
 	else if (info->match & info->invert & XT_OWNER_SOCKET)
 		/*
diff --git a/net/netfilter/xt_recent.c b/net/netfilter/xt_recent.c
index 07085c22b19c..f44de4bc2100 100644
--- a/net/netfilter/xt_recent.c
+++ b/net/netfilter/xt_recent.c
@@ -265,7 +265,8 @@ recent_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	}
 
 	/* use TTL as seen before forwarding */
-	if (xt_out(par) != NULL && skb->sk == NULL)
+	if (xt_out(par) != NULL &&
+	    (!skb->sk || !net_eq(net, sock_net(skb->sk))))
 		ttl++;
 
 	spin_lock_bh(&recent_lock);
diff --git a/net/netfilter/xt_socket.c b/net/netfilter/xt_socket.c
index 5c0779c4fa3c..e2795f5f7000 100644
--- a/net/netfilter/xt_socket.c
+++ b/net/netfilter/xt_socket.c
@@ -58,6 +58,9 @@ socket_match(const struct sk_buff *skb, struct xt_action_param *par,
 
 	if (!sk)
 		sk = nf_sk_lookup_slow_v4(xt_net(par), skb, xt_in(par));
+	else if (!net_eq(xt_net(par), sock_net(sk)))
+		sk = NULL;
+
 	if (sk) {
 		bool wildcard;
 		bool transparent = true;
@@ -115,6 +118,9 @@ socket_mt6_v1_v2_v3(const struct sk_buff *skb, struct xt_action_param *par)
 
 	if (!sk)
 		sk = nf_sk_lookup_slow_v6(xt_net(par), skb, xt_in(par));
+	else if (!net_eq(xt_net(par), sock_net(sk)))
+		sk = NULL;
+
 	if (sk) {
 		bool wildcard;
 		bool transparent = true;
-- 
2.14.3

^ permalink raw reply related
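
Most of the netfilter hunks above repeat one pattern: once skb->sk survives the scrub, a socket from another namespace must be treated as if there were no socket at all. A minimal userspace sketch of that pattern follows, assuming namespace equality can be modelled as a pointer comparison. `struct net`, `struct sock` and `sk_if_same_net` here are hypothetical stand-ins, not the kernel definitions.

```c
#include <stddef.h>

struct net { int id; };			/* hypothetical stand-in */
struct sock { const struct net *net; };	/* hypothetical stand-in */

/*
 * Return the socket only if it belongs to the namespace doing the
 * evaluation; otherwise behave as if the packet had no socket.
 */
static const struct sock *sk_if_same_net(const struct net *net,
					 const struct sock *sk)
{
	if (sk && sk->net != net)
		return NULL;
	return sk;
}
```

Callers that already handle a NULL socket (for example by falling back to a slow-path lookup or by not matching) then need no further changes.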

* [PATCH nf-next v2] openvswitch: use nf_ct_get_tuplepr, invert_tuplepr
From: Florian Westphal @ 2018-06-25 15:55 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, pshelar, dev, Florian Westphal

These versions deal with the l3proto/l4proto details internally.
This removes the only caller of nf_ct_get_tuple, so make it static.

After this, l3proto->get_l4proto() can be removed in a followup patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 No changes since v1.

 This is a preparation patch to remove the l3proto indirections.
 Eventually nf_conntrack_l3proto will be removed.

 ipv4 and ipv6 protocol trackers will be part of nf_conntrack itself.

 include/net/netfilter/nf_conntrack_core.h |  7 -------
 net/netfilter/nf_conntrack_core.c         |  3 +--
 net/openvswitch/conntrack.c               | 17 +++--------------
 3 files changed, 4 insertions(+), 23 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_core.h b/include/net/netfilter/nf_conntrack_core.h
index 9b5e7634713e..90df45022c51 100644
--- a/include/net/netfilter/nf_conntrack_core.h
+++ b/include/net/netfilter/nf_conntrack_core.h
@@ -40,13 +40,6 @@ void nf_conntrack_cleanup_start(void);
 void nf_conntrack_init_end(void);
 void nf_conntrack_cleanup_end(void);
 
-bool nf_ct_get_tuple(const struct sk_buff *skb, unsigned int nhoff,
-		     unsigned int dataoff, u_int16_t l3num, u_int8_t protonum,
-		     struct net *net,
-		     struct nf_conntrack_tuple *tuple,
-		     const struct nf_conntrack_l3proto *l3proto,
-		     const struct nf_conntrack_l4proto *l4proto);
-
 bool nf_ct_invert_tuple(struct nf_conntrack_tuple *inverse,
 			const struct nf_conntrack_tuple *orig,
 			const struct nf_conntrack_l3proto *l3proto,
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 3465da2a98bd..160493f95fed 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -222,7 +222,7 @@ static u32 hash_conntrack(const struct net *net,
 	return scale_hash(hash_conntrack_raw(tuple, net));
 }
 
-bool
+static bool
 nf_ct_get_tuple(const struct sk_buff *skb,
 		unsigned int nhoff,
 		unsigned int dataoff,
@@ -244,7 +244,6 @@ nf_ct_get_tuple(const struct sk_buff *skb,
 
 	return l4proto->pkt_to_tuple(skb, dataoff, net, tuple);
 }
-EXPORT_SYMBOL_GPL(nf_ct_get_tuple);
 
 bool nf_ct_get_tuplepr(const struct sk_buff *skb, unsigned int nhoff,
 		       u_int16_t l3num,
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 284aca2a252d..e05bd3e53f0f 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -607,23 +607,12 @@ static struct nf_conn *
 ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
 		     u8 l3num, struct sk_buff *skb, bool natted)
 {
-	const struct nf_conntrack_l3proto *l3proto;
-	const struct nf_conntrack_l4proto *l4proto;
 	struct nf_conntrack_tuple tuple;
 	struct nf_conntrack_tuple_hash *h;
 	struct nf_conn *ct;
-	unsigned int dataoff;
-	u8 protonum;
 
-	l3proto = __nf_ct_l3proto_find(l3num);
-	if (l3proto->get_l4proto(skb, skb_network_offset(skb), &dataoff,
-				 &protonum) <= 0) {
-		pr_debug("ovs_ct_find_existing: Can't get protonum\n");
-		return NULL;
-	}
-	l4proto = __nf_ct_l4proto_find(l3num, protonum);
-	if (!nf_ct_get_tuple(skb, skb_network_offset(skb), dataoff, l3num,
-			     protonum, net, &tuple, l3proto, l4proto)) {
+	if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), l3num,
+			       net, &tuple)) {
 		pr_debug("ovs_ct_find_existing: Can't get tuple\n");
 		return NULL;
 	}
@@ -632,7 +621,7 @@ ovs_ct_find_existing(struct net *net, const struct nf_conntrack_zone *zone,
 	if (natted) {
 		struct nf_conntrack_tuple inverse;
 
-		if (!nf_ct_invert_tuple(&inverse, &tuple, l3proto, l4proto)) {
+		if (!nf_ct_invert_tuplepr(&inverse, &tuple)) {
 			pr_debug("ovs_ct_find_existing: Inversion failed!\n");
 			return NULL;
 		}
-- 
2.16.4

^ permalink raw reply related

* Re: [PATCH net-next 3/5] sctp: add spp_ipv6_flowlabel and spp_dscp for sctp_paddrparams
From: 吉藤英明 @ 2018-06-25 16:12 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Neil Horman, David Miller, lucien.xin, netdev, linux-sctp,
	吉藤英明, yoshfuji
In-Reply-To: <20180625130319.GA820@localhost.localdomain>

Hi,

2018-06-25 22:03 GMT+09:00 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>:
> On Mon, Jun 25, 2018 at 07:28:47AM -0400, Neil Horman wrote:
>> On Mon, Jun 25, 2018 at 04:31:26PM +0900, David Miller wrote:
>> > From: Xin Long <lucien.xin@gmail.com>
>> > Date: Mon, 25 Jun 2018 10:14:35 +0800
>> >
>> > >  struct sctp_paddrparams {
>> > > @@ -773,6 +775,8 @@ struct sctp_paddrparams {
>> > >   __u32                   spp_pathmtu;
>> > >   __u32                   spp_sackdelay;
>> > >   __u32                   spp_flags;
>> > > + __u32                   spp_ipv6_flowlabel;
>> > > + __u8                    spp_dscp;
>> > >  } __attribute__((packed, aligned(4)));
>> >
>> > I don't think you can change the size of this structure like this.
>> >
>> > This check in sctp_setsockopt_peer_addr_params():
>> >
>> >     if (optlen != sizeof(struct sctp_paddrparams))
>> >             return -EINVAL;
>> >
>> > is going to trigger in old kernels when executing programs
>> > built against the new struct definition.
>
> That will happen, yes, but do we really care about being future-proof
> here? I mean: if we also update such check(s) to support dealing with
> smaller-than-supported structs, newer kernels will be able to run
> programs built against the old struct, and the new one; while building
> using newer headers and running on older kernel may fool the
> application in other ways too (like enabling support for something
> that is available on newer kernel and that is not present in the older
> one).

We should not break existing apps.
We still accept apps of pre-2.4 era without sin6_scope_id
(e.g., net/ipv6/af_inet6.c:inet6_bind()).

>
>> >
>> I think thats also the reason its a packed aligned attribute, it can't be
>> changed, or older kernels won't be able to fill it out properly.
>> Neil
>
> It's more for supporting running 32-bits apps on 64-bit kernels
> (according to 20c9c825b12fc).
>
>   Marcelo

^ permalink raw reply

* Re: [PATCH net-next 3/5] sctp: add spp_ipv6_flowlabel and spp_dscp for sctp_paddrparams
From: Marcelo Ricardo Leitner @ 2018-06-25 16:31 UTC (permalink / raw)
  To: 吉藤英明
  Cc: Neil Horman, David Miller, lucien.xin, netdev, linux-sctp,
	yoshfuji
In-Reply-To: <CAPA1RqB48gBWKOQck6Ke5yH3cm1B5QtJKh1CL4SYfqes8Tc+Nw@mail.gmail.com>

Hi,

On Tue, Jun 26, 2018 at 01:12:00AM +0900, 吉藤英明 wrote:
> Hi,
> 
> 2018-06-25 22:03 GMT+09:00 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>:
> > On Mon, Jun 25, 2018 at 07:28:47AM -0400, Neil Horman wrote:
> >> On Mon, Jun 25, 2018 at 04:31:26PM +0900, David Miller wrote:
> >> > From: Xin Long <lucien.xin@gmail.com>
> >> > Date: Mon, 25 Jun 2018 10:14:35 +0800
> >> >
> >> > >  struct sctp_paddrparams {
> >> > > @@ -773,6 +775,8 @@ struct sctp_paddrparams {
> >> > >   __u32                   spp_pathmtu;
> >> > >   __u32                   spp_sackdelay;
> >> > >   __u32                   spp_flags;
> >> > > + __u32                   spp_ipv6_flowlabel;
> >> > > + __u8                    spp_dscp;
> >> > >  } __attribute__((packed, aligned(4)));
> >> >
> >> > I don't think you can change the size of this structure like this.
> >> >
> >> > This check in sctp_setsockopt_peer_addr_params():
> >> >
> >> >     if (optlen != sizeof(struct sctp_paddrparams))
> >> >             return -EINVAL;
> >> >
> >> > is going to trigger in old kernels when executing programs
> >> > built against the new struct definition.
> >
> > That will happen, yes, but do we really care about being future-proof
> > here? I mean: if we also update such check(s) to support dealing with
> > smaller-than-supported structs, newer kernels will be able to run
> > programs built against the old struct, and the new one; while building
> > using newer headers and running on older kernel may fool the
> > application in other ways too (like enabling support for something
> > that is available on newer kernel and that is not present in the older
> > one).
> 
> We should not break existing apps.
> We still accept apps of pre-2.4 era without sin6_scope_id
> (e.g., net/ipv6/af_inet6.c:inet6_bind()).

Yes. That's what I tried to say. That is supporting an old app built
with old kernel headers and running on a newer kernel, and not the
other way around (an app built with fresh headers and running on an
old kernel).
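
As a rough user-space sketch of that direction of compatibility (old
binary, newer kernel): a length check can accept both the old and the
current struct size and zero-fill the fields an old caller does not know
about. The struct and function names below are made up for illustration
and are not the real sctp_paddrparams or setsockopt code; it does nothing
for the other direction (new binary, old kernel).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical option struct; NOT the real struct sctp_paddrparams. */
struct demo_params {
	uint32_t pathmtu;
	uint32_t sackdelay;
	uint32_t flags;
	/* Fields below were appended in a later ABI revision. */
	uint32_t ipv6_flowlabel;
	uint8_t  dscp;
};

/* Size of the struct as old binaries know it (everything before the new fields). */
#define DEMO_PARAMS_V1_LEN offsetof(struct demo_params, ipv6_flowlabel)

/*
 * Copy a caller-supplied option buffer into *out.
 * Accepts the old (shorter) and the current layout; returns 0 on
 * success or -EINVAL for any other length.
 */
int demo_copy_params(const void *optval, size_t optlen,
		     struct demo_params *out)
{
	assert(optval != NULL && out != NULL);

	if (optlen != DEMO_PARAMS_V1_LEN && optlen != sizeof(*out))
		return -EINVAL;

	memset(out, 0, sizeof(*out));	/* new fields read as 0 for old callers */
	memcpy(out, optval, optlen);
	return 0;
}
```

An old caller would pass DEMO_PARAMS_V1_LEN bytes and get the new fields
as zero; a caller built against the new layout passes sizeof(struct
demo_params).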

> 
> >
> >> >
> >> I think thats also the reason its a packed aligned attribute, it can't be
> >> changed, or older kernels won't be able to fill it out properly.
> >> Neil
> >
> > It's more for supporting running 32-bits apps on 64-bit kernels
> > (according to 20c9c825b12fc).
> >
> >   Marcelo

^ permalink raw reply

* Re: [bpf PATCH v4 3/4] bpf: sockhash fix omitted bucket lock in sock_close
From: Martin KaFai Lau @ 2018-06-25 16:40 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev
In-Reply-To: <20180625153417.3057.19275.stgit@john-Precision-Tower-5810>

On Mon, Jun 25, 2018 at 08:34:17AM -0700, John Fastabend wrote:
> First in tcp_close, reduce the scope of sk_callback_lock(). The lock is
> only needed for protecting the maps list; the ingress and cork
> lists are protected by the sock lock. Having the lock in a wider scope is
> harmless but may confuse the reader, who may infer it is in fact
> needed.
> 
> Next, in sock_hash_delete_elem() the pattern is as follows,
> 
>   sock_hash_delete_elem()
>      [...]
>      spin_lock(bucket_lock)
>      l = lookup_elem_raw()
>      if (l)
>         hlist_del_rcu()
>         write_lock(sk_callback_lock)
>          .... destroy psock ...
>         write_unlock(sk_callback_lock)
>      spin_unlock(bucket_lock)
> 
> The ordering is necessary because we only know the {p}sock after
> dereferencing the hash table which we can't do unless we have the
> bucket lock held. Once we have the bucket lock and the psock element
> it is deleted from the hashmap to ensure any other path doing a lookup
> will fail. Finally, the refcnt is decremented and if zero the psock
> is destroyed.
> 
> In parallel with the above (or free'ing the map) a tcp close event
> may trigger tcp_close(). Which at the moment omits the bucket lock
> altogether (oops!) where the flow looks like this,
> 
>   bpf_tcp_close()
>      [...]
>      write_lock(sk_callback_lock)
>      for each psock->maps // list of maps this sock is part of
>          hlist_del_rcu(ref_hash_node);
>          .... destroy psock ...
>      write_unlock(sk_callback_lock)
> 
> Obviously, and demonstrated by syzbot, this is broken because
> we can have multiple threads deleting entries via hlist_del_rcu().
> 
> To fix this we might be tempted to wrap the hlist operation in a
> bucket lock but that would create a lock inversion problem. In
> summary to follow locking rules the psocks maps list needs the
> sk_callback_lock but we need the bucket lock to do the hlist_del_rcu.
> To resolve the lock inversion problem pop the head of the maps list
> repeatedly and remove the reference until no more are left. If a
> delete happens in parallel from the BPF API that is OK as well because
> it will do a similar action, lookup the lock in the map/hash, delete
> it from the map/hash, and dec the refcnt. We check for this case
> before doing a destroy on the psock to ensure we don't have two
> threads tearing down a psock. The new logic is as follows,
> 
>   bpf_tcp_close()
>   e = psock_map_pop(psock->maps) // done with sk_callback_lock
>   bucket_lock() // lock hash list bucket
>   l = lookup_elem_raw(head, hash, key, key_size);
>   if (l) {
>      //only get here if elmnt was not already removed
>      hlist_del_rcu()
>      ... destroy psock...
>   }
>   bucket_unlock()
> 
> And finally for all the above to work add missing sk_callback_lock
> around smap_list_remove in sock_hash_ctx_update_elem(). Otherwise
> delete and update may corrupt maps list. Then add RCU annotations and
> use rcu_dereference/rcu_assign_pointer to manage values relying on
> RCU so that the object is not free'd from sock_hash_free() while it
> is being referenced in bpf_tcp_close().
> 
> (As an aside the sk_callback_lock serves two purposes. The
>  first is to update the sock callbacks sk_data_ready, sk_write_space,
>  etc. The second is to protect the psock 'maps' list. The 'maps' list
>  is used (as shown above) to delete all map/hash references to a
>  sock when the sock is closed)
> 
> Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
> Fixes: 81110384441a ("bpf: sockmap, add hash map support")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
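
For readers following the locking argument in the quoted commit message,
here is a small user-space sketch of the "pop one entry, then take the
bucket lock" pattern. It is only a toy model under stated assumptions:
pthread mutexes stand in for sk_callback_lock and the bucket lock, plain
arrays stand in for the maps list and the hash bucket, and every demo_*
name is hypothetical rather than taken from the sockmap code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX 8

/* Toy stand-in for one hash bucket. */
struct demo_bucket {
	pthread_mutex_t lock;		/* plays the role of the bucket lock */
	int keys[DEMO_MAX];		/* keys currently stored in the bucket */
	size_t nkeys;
};

/* Toy stand-in for the per-socket state. */
struct demo_psock {
	pthread_mutex_t callback_lock;	/* plays the role of sk_callback_lock */
	int map_keys[DEMO_MAX];		/* keys under which the socket is stored */
	size_t nmaps;
};

/* Pop one entry from the maps list; only the list lock is held here. */
bool demo_maps_pop(struct demo_psock *ps, int *key)
{
	bool found = false;

	pthread_mutex_lock(&ps->callback_lock);
	if (ps->nmaps > 0) {
		*key = ps->map_keys[--ps->nmaps];
		found = true;
	}
	pthread_mutex_unlock(&ps->callback_lock);
	return found;
}

/* Remove key from the bucket if it is still there; bucket lock only. */
bool demo_bucket_remove(struct demo_bucket *b, int key)
{
	bool removed = false;

	pthread_mutex_lock(&b->lock);
	for (size_t i = 0; i < b->nkeys; i++) {
		if (b->keys[i] == key) {
			b->keys[i] = b->keys[--b->nkeys];
			removed = true;
			break;
		}
	}
	pthread_mutex_unlock(&b->lock);
	return removed;
}

/*
 * Close path: the two locks are taken one after the other, never nested,
 * so there is no lock-order inversion with the delete path.
 * Returns how many references this caller actually dropped.
 */
size_t demo_close(struct demo_psock *ps, struct demo_bucket *b)
{
	size_t dropped = 0;
	int key;

	while (demo_maps_pop(ps, &key)) {
		/* Count the entry only if we won the race to delete it. */
		if (demo_bucket_remove(b, key))
			dropped++;
	}
	return dropped;
}
```

If a delete from another path already removed an entry, demo_close()
simply finds nothing under the bucket lock and moves on to the next
popped entry.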

^ permalink raw reply

* Re: [offlist] Re: Crash in netlink/sk_filter_trim_cap on ARMv7 on 4.18rc1
From: Peter Robinson @ 2018-06-25 16:41 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Eric Dumazet, netdev, linux-arm-kernel, labbott
In-Reply-To: <CALeDE9PBZWJBp8KB0mB4zoNXqscmzxWzz+LnuqRA-z4t1e9T8g@mail.gmail.com>

On Mon, Jun 25, 2018 at 2:39 PM, Peter Robinson <pbrobinson@gmail.com> wrote:
> Hi Daniel,
>
>> On 06/24/2018 11:24 AM, Peter Robinson wrote:
>>>>> I'm seeing this netlink/sk_filter_trim_cap crash on ARMv7 across quite
>>>>> a few ARMv7 platforms on Fedora with 4.18rc1. I've tested RPi2/RPi3
>>>>> (doesn't happen on aarch64), AllWinner H3, BeagleBone and a few
>>>>> others, both LPAE/normal kernels.
>>
>> So this is arm32 right?
>
> Correct.
>
>>>>> I'm a bit out of my depth in this part of the kernel but I'm wondering
>>>>> if it's known, I couldn't find anything that looked obvious on a few
>>>>> mailing lists.
>>>>>
>>>>> Peter
>>>>
>>>> Hi Peter
>>>>
>>>> Could you provide symbolic information ?
>>>
>>> I passed in through scripts/decode_stacktrace.sh is that what you were after:
>>>
>>> [    8.673880] Internal error: Oops: a06 [#10] SMP ARM
>>> [    8.673949] ---[ end trace 049df4786ea3140a ]---
>>> [    8.678754] Modules linked in:
>>> [    8.678766] CPU: 1 PID: 206 Comm: systemd-udevd Tainted: G      D
>>>         4.18.0-0.rc1.git0.1.fc29.armv7hl+lpae #1
>>> [    8.678769] Hardware name: Allwinner sun8i Family
>>> [    8.678781] PC is at sk_filter_trim_cap ()
>>> [    8.678790] LR is at   (null)
>>> [    8.709463] pc : lr : psr: 60000013 ()
>>> [    8.715722] sp : c996bd60  ip : 00000000  fp : 00000000
>>> [    8.720939] r10: ee79dc00  r9 : c12c9f80  r8 : 00000000
>>> [    8.726157] r7 : 00000000  r6 : 00000001  r5 : f1648000  r4 : 00000000
>>> [    8.732674] r3 : 00000007  r2 : 00000000  r1 : 00000000  r0 : 00000000
>>> [    8.739193] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
>>> [    8.746318] Control: 30c5387d  Table: 6e7bc880  DAC: ffe75ece
>>> [    8.752055] Process systemd-udevd (pid: 206, stack limit = 0x(ptrval))
>>> [    8.758574] Stack: (0xc996bd60 to 0xc996c000)
>>
>> Do you have BPF JIT enabled or disabled? Does it happen with disabled?
>
> Enabled, I can test with it disabled, BPF configs bits are:
> CONFIG_BPF_EVENTS=y
> # CONFIG_BPFILTER is not set
> CONFIG_BPF_JIT_ALWAYS_ON=y
> CONFIG_BPF_JIT=y
> CONFIG_BPF_STREAM_PARSER=y
> CONFIG_BPF_SYSCALL=y
> CONFIG_BPF=y
> CONFIG_CGROUP_BPF=y
> CONFIG_HAVE_EBPF_JIT=y
> CONFIG_IPV6_SEG6_BPF=y
> CONFIG_LWTUNNEL_BPF=y
> # CONFIG_NBPFAXI_DMA is not set
> CONFIG_NET_ACT_BPF=m
> CONFIG_NET_CLS_BPF=m
> CONFIG_NETFILTER_XT_MATCH_BPF=m
> # CONFIG_TEST_BPF is not set
>
>> I can see one bug, but your stack trace seems unrelated.
>>
>> Anyway, could you try with this?
>
> Build in process.
>
>> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
>> index 6e8b716..f6a62ae 100644
>> --- a/arch/arm/net/bpf_jit_32.c
>> +++ b/arch/arm/net/bpf_jit_32.c
>> @@ -1844,7 +1844,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>>                 /* there are 2 passes here */
>>                 bpf_jit_dump(prog->len, image_size, 2, ctx.target);
>>
>> -       set_memory_ro((unsigned long)header, header->pages);
>> +       bpf_jit_binary_lock_ro(header);
>>         prog->bpf_func = (void *)ctx.target;
>>         prog->jited = 1;
>>         prog->jited_len = image_size;

So with that and the other fix there was no improvement. With those
and the BPF JIT disabled it works, but I'm not sure if the two patches
have any effect with the JIT disabled.

Will look at the other patches shortly. There's been some other issue
introduced between rc1 and rc2 which I have to work out before I can
test those, though.

Peter

^ permalink raw reply

* Re: [PATCH] selftests: bpf: enable NET_SCHED
From: Song Liu @ 2018-06-25 16:49 UTC (permalink / raw)
  To: Anders Roxell
  Cc: Alexei Starovoitov, Daniel Borkmann, shuah, Networking, open list,
	linux-kselftest
In-Reply-To: <20180625145605.13726-1-anders.roxell@linaro.org>

On Mon, Jun 25, 2018 at 7:56 AM, Anders Roxell <anders.roxell@linaro.org> wrote:
> CONFIG_NET_SCHED wasn't enabled in arm64's defconfig, only for x86.
> So the bpf/test_tunnel.sh tests fail with:
> RTNETLINK answers: Operation not supported
> RTNETLINK answers: Operation not supported
> We have an error talking to the kernel, -1
> Enable NET_SCHED and more tests pass.
>
> Fixes: 3bce593ac06b ("selftests: bpf: config: add config fragments")
> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  tools/testing/selftests/bpf/config | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
> index 1e0c547caf3c..7a6d92562dc6 100644
> --- a/tools/testing/selftests/bpf/config
> +++ b/tools/testing/selftests/bpf/config
> @@ -6,6 +6,7 @@ CONFIG_TEST_BPF=m
>  CONFIG_CGROUP_BPF=y
>  CONFIG_NETDEVSIM=m
>  CONFIG_NET_CLS_ACT=y
> +CONFIG_NET_SCHED=y
>  CONFIG_NET_SCH_INGRESS=y
>  CONFIG_NET_IPIP=y
>  CONFIG_IPV6=y
> --
> 2.18.0
>

^ permalink raw reply

* Re: [PATCH net-next 2/3] rds: Enable RDS IPv6 support
From: Sowmini Varadhan @ 2018-06-25 17:03 UTC (permalink / raw)
  To: Ka-Cheong Poon; +Cc: netdev, santosh.shilimkar, davem, rds-devel
In-Reply-To: <7f4f460079d3d78a18f7d759488048798e99c4db.1529922794.git.ka-cheong.poon@oracle.com>

On (06/25/18 03:38), Ka-Cheong Poon wrote:
> @@ -1105,8 +1105,27 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len)
>  			break;
>  
>  		case sizeof(*sin6): {
> -			ret = -EPROTONOSUPPORT;
> -			goto out;
> +			int addr_type;
                         :
                         :
> +			daddr = sin6->sin6_addr;
> +			dport = sin6->sin6_port;
> +			scope_id = sin6->sin6_scope_id;
> +			break;
>  		}

In rds_sendmsg, the scopeid passed to rds_conn_create_outgoing
may come from the msg_name (if msg_name is a link-local) or
may come from the rs_bound_scope_id (for connected socket, change
made in Patch 1 of the series). 

This sounds inconsistent.

If I bind to scopeid if1 and then send to fe80::1%if2 (without connect()),
we'd create an rds_connection with dev_if set to if2.
(first off, it's a bit unexpected to be sending to fe80::1%if2 when you
are bound to a link-local on if1!)

But then, if we got back a response from fe80::1%if2, I think we would
not find a matching conn in rds_recv_incoming? 

And this is even more confusing because the fastpath in rds_sendmsg
does not take the bound_scope_id into consideration at all:
        if (rs->rs_conn && ipv6_addr_equal(&rs->rs_conn->c_faddr, &daddr))
                conn = rs->rs_conn;
        else {
                conn = rds_conn_create_outgoing( /* .. */, scope_id)
so if I erroneously passed a msg_name on a connected rds socket, what
would happen? (see also question about rds_connect() itself, below)

Should we always use rs_bound_scope_id for creating the outgoing
rds_connection? (you may need something deterministic for this, 
like "if bound addr is linklocal, return error if daddr has a different
scopeid, else use the bound addr's scopeid", plus, "if bound addr is
not global, and daddr is link-local, we need a conn with the daddr's
scopeid")

Also, why is there no IPv6 support in rds_connect? 

(still looking through the rds-tcp changes, but wanted to get these
questions clarified first).

--Sowmini

^ permalink raw reply

* Re: [PATCH rdma-next 08/12] overflow.h: Add arithmetic shift helper
From: Jason Gunthorpe @ 2018-06-25 17:11 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Leon Romanovsky, Doug Ledford, Kees Cook, Leon Romanovsky,
	RDMA mailing list, Hadar Hen Zion, Matan Barak, Michael J Ruhl,
	Noa Osherovich, Raed Salem, Yishai Hadas, Saeed Mahameed,
	linux-netdev, linux-kernel
In-Reply-To: <CAKwiHFhgsyWYD+q+JFb2HJEphnjiiOp=o4Airv3MW031q2jx8w@mail.gmail.com>

On Mon, Jun 25, 2018 at 11:26:05AM +0200, Rasmus Villemoes wrote:

>    check_shift_overflow(a, s, d) {
>        unsigned _nbits = 8*sizeof(a);
>        typeof(a) _a = (a);
>        typeof(s) _s = (s);
>        typeof(d) _d = (d);
> 
>        *_d = ((u64)(_a) << (_s & (_nbits-1)));
>        _s >= _nbits || (_s > 0 && (_a >> (_nbits - _s -
>    is_signed_type(a))) != 0);
>    }

Those types are not quite right.. What about this?

    check_shift_overflow(a, s, d) ({
        unsigned int _nbits = 8*sizeof(d) - is_signed_type(d);
        typeof(d) _a = a;  // Shift is always performed on type 'd'
        typeof(s) _s = s;
        typeof(d) _d = d;
 
        *_d = (_a << (_s & (_nbits-1)));

        (((*_d) >> (_s & (_nbits-1))) != _a);
    })

And can we use mathematical invertibility to prove no overflow and
bound _a? As above.

Jason

^ permalink raw reply

* Re: [PATCH net-next] tcp: add SNMP counter for zero-window drops
From: Eric Dumazet @ 2018-06-25 17:26 UTC (permalink / raw)
  To: Yafang Shao, davem; +Cc: netdev
In-Reply-To: <1529848974-6384-1-git-send-email-laoar.shao@gmail.com>



On 06/24/2018 07:02 AM, Yafang Shao wrote:
> It will be helpful if we could display the drops due to zero window or no
> enough window space.
> So a new SNMP MIB entry is added to track this behavior.
> This entry is named LINUX_MIB_TCPZEROWINDOWDROP and published in
> /proc/net/netstat in TcpExt line as TCPZeroWindowDrop.
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---

Signed-off-by: Eric Dumazet <edumazet@google.com>

Thanks !

^ permalink raw reply

* Re: [PATCH net] KEYS: DNS: fix parsing multiple options
From: Eric Biggers @ 2018-06-25 17:37 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, David Howells, keyrings, Wang Lei, Eric Biggers
In-Reply-To: <8195.1528992870@warthog.procyon.org.uk>

On Thu, Jun 14, 2018 at 05:14:30PM +0100, David Howells wrote:
> The fix seems to work, but the use of kstrtoul():
> 
> 	ret = kstrtoul(eq, 10, &derrno);
> 
> is incorrect since the buffer can't be modified to block out the next
> argument if there is one, so the following fails:
> 
> 	perl -e 'print "#dnserror=1#", "\x00" x 1' |
> 	keyctl padd dns_resolver desc @s
> 
> (Note this is preexisting and nothing to do with your patch).
> 
> I'm not sure how best to handle this.
> 
> Anyway, Dave, can you take Eric's patch into the net tree with:
> 
> 	Acked-by: David Howells <dhowells@redhat.com>
> 
> David

It could be handled by copying the option value to a temporary buffer.
Anyway, that can be a separate fix...
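
Roughly what I have in mind, as a user-space sketch rather than a patch:
copy just the value bytes into a NUL-terminated temporary and parse that,
so the separator and any following option in the original buffer are
never seen by the parser. The function name and the 32-byte limit are
made up here, and strtoul() stands in for kstrtoul().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse the first `len` bytes of `val` as a decimal number.
 * The bytes are copied to a NUL-terminated temporary first, so the
 * caller's buffer is neither modified nor read past `len`.
 * Returns 0 on success, -EINVAL on bad input.
 */
int demo_parse_ulong(const char *val, size_t len, unsigned long *out)
{
	char tmp[32];
	char *end;

	assert(val != NULL && out != NULL);

	if (len == 0 || len >= sizeof(tmp))
		return -EINVAL;

	memcpy(tmp, val, len);
	tmp[len] = '\0';

	/* strtoul() would accept leading blanks or a sign; reject those. */
	if (tmp[0] < '0' || tmp[0] > '9')
		return -EINVAL;

	errno = 0;
	*out = strtoul(tmp, &end, 10);
	if (errno != 0 || *end != '\0')
		return -EINVAL;
	return 0;
}
```

For an input like "1#other=2" the caller would pass a length of 1, and
the "#other=2" part stays untouched in the original buffer.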

David (Miller), are you planning to take this through -net?

Thanks!

- Eric

^ permalink raw reply

* Re: [PATCH 00/14] ARM: davinci: step towards removing at24_platform_data
From: Andrew Lunn @ 2018-06-25 17:40 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Rob Herring, Grygorii Strashko, David Lechner, Ivan Khoronzhuk,
	Kevin Hilman, Greg Kroah-Hartman, Sekhar Nori, Russell King,
	linux-kernel, Bartosz Golaszewski, Lukas Wunner,
	Srinivas Kandagatla, linux-arm-kernel, netdev, Florian Fainelli,
	linux-omap, David S . Miller, Dan Carpenter
In-Reply-To: <20180625155025.12567-1-brgl@bgdev.pl>

On Mon, Jun 25, 2018 at 05:50:11PM +0200, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski <bgolaszewski@baylibre.com>
> 
> Since I took over maintainership of the at24 driver I've been working
> towards removing at24_platform_data in favor for device properties.
> 
> DaVinci is the only platform that's still using it - all other users
> have already been converted.
> 
> One of the obstacles in case of DaVinci is removing the setup() callback
> from the pdata struct, the only user of which are some davinci boards.

Hi Bartosz

Nice code.

I've got a platform i want to add sometime soon using the at24. I
noticed you doing the cleanup, so i avoided the setup() call. But
using it would have helped...

My platform is x86 based, so no device tree. I instantiate a number of
AT24 devices from a platform driver, and then add nvmem cells so i can
access data in these eeproms. This new code will make this simpler.

> The only board that's still using this callback is now mityomapl138.
> Unfortunately it stores more info in EEPROM than just the MAC address
> and will require some more work. Unfortunately I don't have access
> to this board so I can't test any actual solutions on a live hardware.

Depending on what i find in the EEPROM, i need to instantiate other
i2c devices. So i have the problem of knowing when the EEPROM has
actually probed and i can use the nvmem API to retrieve the contents.

What i have done so far is register a bus notifier on i2c_bus_type,
and look for BUS_NOTIFY_BOUND_DRIVER. I can then check if the i2c
client in the notifier is the at24 client. But when i then add more
i2c clients from inside the notifier i get lockdep splats. They are
false positives, but it does suggest it is not a good idea to do this.

So it would be good to have some sort of recommended alternative to
the setup() callback. Ideally it would be specific to a particular
at24, and safe to call other i2c functions from.

Do you have any ideas?

   Andrew

^ permalink raw reply

* Re: [PATCH net-next 2/3] rds: Enable RDS IPv6 support
From: Ka-Cheong Poon @ 2018-06-25 17:43 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: netdev, santosh.shilimkar, davem, rds-devel
In-Reply-To: <20180625170317.GA28578@oracle.com>

On 06/26/2018 01:03 AM, Sowmini Varadhan wrote:
> On (06/25/18 03:38), Ka-Cheong Poon wrote:
>> @@ -1105,8 +1105,27 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len)
>>   			break;
>>   
>>   		case sizeof(*sin6): {
>> -			ret = -EPROTONOSUPPORT;
>> -			goto out;
>> +			int addr_type;
>                           :
>                           :
>> +			daddr = sin6->sin6_addr;
>> +			dport = sin6->sin6_port;
>> +			scope_id = sin6->sin6_scope_id;
>> +			break;
>>   		}
> 
> In rds_sendmsg, the scopeid passed to rds_conn_create_outgoing
> may come from the msg_name (if msg_name is a link-local) or
> may come from the rs_bound_scope_id (for connected socket, change
> made in Patch 1 of the series).
> 
> This sounds inconsistent.
> 
> If I bind to scopeid if1 and then send to fe80::1%if2 (without connect()),
> we'd create an rds_connection with dev_if set to if2.
> (first off, its a bit unexpected to be sending to fe80::1%if2 when you
> are bound to a link-local on if1!)
> 
> But then, if we got back a response from fe80::1%if2, I think we would
> not find a matching conn in rds_recv_incoming?


Yes, I think if the socket is bound, it should check the scope_id
in msg_name (if not NULL) to make sure that they match.  A bound
RDS socket can send to multiple peers.  But if the bound local
address is link local, it should only be allowed to send to peers
on the same link.


> And this is even more confusing because the fastpath in rds_sendmsg
> does not take the bound_scope_id into consideration at all:
>         if (rs->rs_conn && ipv6_addr_equal(&rs->rs_conn->c_faddr, &daddr))
>                 conn = rs->rs_conn;
>         else {
>                 conn = rds_conn_create_outgoing( /* .. */, scope_id)
> so if I erroneously passed a msg_name on a connected rds socket, what
> would happen? (see also question about rds_connect() itself, below)


The check added above takes care of this.  The scope_id should
match.


> Should we always use rs_bound_scope_id for creating the outgoing
> rds_connection? (you may need something deterministic for this,
> like "if bound addr is linklocal, return error if daddr has a different
> scopeid, else use the bound addr's scopeid", plus, "if bound addr is
> not global, and daddr is link-local, we need a conn with the daddr's
> scopeid")


If a socket is bound, I guess the scope_id should be used.  So
if a socket is not bound to a link local address and the socket
is used to send to a link local peer, it should fail.


> Also, why is there no IPv6 support in rds_connect?


Oops, I missed this when I ported the internal version to the
net-next version.  Will add it back.



-- 
K. Poon
ka-cheong.poon@oracle.com

^ permalink raw reply

* Re: [PATCH 00/14] ARM: davinci: step towards removing at24_platform_data
From: Bartosz Golaszewski @ 2018-06-25 17:46 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Sekhar Nori, Kevin Hilman, Russell King, Grygorii Strashko,
	David S . Miller, Srinivas Kandagatla, Lukas Wunner, Rob Herring,
	Florian Fainelli, Dan Carpenter, Ivan Khoronzhuk, David Lechner,
	Greg Kroah-Hartman, Linux ARM, Linux Kernel Mailing List,
	linux-omap, netdev, Bartosz Golaszewski
In-Reply-To: <20180625174024.GB17417@lunn.ch>

2018-06-25 19:40 GMT+02:00 Andrew Lunn <andrew@lunn.ch>:
> On Mon, Jun 25, 2018 at 05:50:11PM +0200, Bartosz Golaszewski wrote:
>> From: Bartosz Golaszewski <bgolaszewski@baylibre.com>
>>
>> Since I took over maintainership of the at24 driver I've been working
>> towards removing at24_platform_data in favor for device properties.
>>
>> DaVinci is the only platform that's still using it - all other users
>> have already been converted.
>>
>> One of the obstacles in case of DaVinci is removing the setup() callback
>> from the pdata struct, the only user of which are some davinci boards.
>
> Hi Bartosz
>
> Nice code.
>
> I've got a platform i want to add sometime soon using the at24. I
> noticed you doing the cleanup, so i avoided the setup() call. But
> using it would have helped...
>
> My platform is x86 based, so no device tree. I instantiate a number of
> AT24 devices from a platform driver, and then add nvmem cells so i can
> access data in these eeproms. This new code will make this simpler.
>
>> The only board that's still using this callback is now mityomapl138.
>> Unfortunately it stores more info in EEPROM than just the MAC address
>> and will require some more work. Unfortunately I don't have access
>> to this board so I can't test any actual solutions on a live hardware.
>
> Depending on what i find in the EEPROM, i need to instantiate other
> i2c devices. So i have the problem of knowing when the EEPROM has
> actually probed and i can use the nvmem API to retrieve the contents.
>
> What i have done so far is register a bus notifier on i2c_bus_type,
> and look for BUS_NOTIFY_BOUND_DRIVER. I can then check if the i2c
> client in the notifier is the at24 client. But when i then add more
> i2c clients from inside the notifier i get lockdep splats. They are
> false positives, but it does suggest it is not a good idea to do this.
>
> So it would be good to have some sort of recommended alternative to
> the setup() callback. Ideally it would be specific to a particular
> at24, and safe to call other i2c functions from.
>
> Do you have any ideas?
>
>    Andrew

With my patch 1/14 you'll get -EPROBE_DEFER from nvmem_cell_get() if
the nvmem provider is not yet registered. Will that help in your case?

Bart

^ permalink raw reply

* Re: [PATCH net-next 2/3] rds: Enable RDS IPv6 support
From: Sowmini Varadhan @ 2018-06-25 17:50 UTC (permalink / raw)
  To: Ka-Cheong Poon; +Cc: netdev, santosh.shilimkar, davem, rds-devel
In-Reply-To: <25e1afda-7497-7f08-815a-286cf775bc09@oracle.com>

On (06/26/18 01:43), Ka-Cheong Poon wrote:
> 
> Yes, I think if the socket is bound, it should check the scope_id
> in msg_name (if not NULL) to make sure that they match.  A bound
> RDS socket can send to multiple peers.  But if the bound local
> address is link local, it should only be allowed to send to peers
> on the same link.

agree.


> If a socket is bound, I guess the scope_id should be used.  So
> if a socket is not bound to a link local address and the socket
> is used to sent to a link local peer, it should fail.

PF_RDS sockets *MUST* always be bound.  See
Documentation/networking/rds.txt:
"   Sockets must be bound before you can send or receive data.
    This is needed because binding also selects a transport and
    attaches it to the socket. Once bound, the transport assignment
    does not change."

Also, rds_sendmsg checks this (from net-next, your version
has the equivalent ipv6_addr_any etc check):

        if (daddr == 0 || rs->rs_bound_addr == 0) {
                release_sock(sk);
                ret = -ENOTCONN; /* XXX not a great errno */
                goto out;
        }

> 
> >Also, why is there no IPv6 support in rds_connect?
> 
> 
> Oops, I missed this when I ported the internal version to the
> net-next version.  Will add it back.

Ok

--Sowmini

^ permalink raw reply

* Re: Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Samudrala, Sridhar @ 2018-06-25 17:54 UTC (permalink / raw)
  To: Siwei Liu, Michael S. Tsirkin
  Cc: Cornelia Huck, Alexander Duyck, virtio-dev, aaron.f.brown,
	Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel, virtualization,
	konrad.wilk, boris.ostrovsky, Joao Martins, Venu Busireddy,
	vijay.balakrishna
In-Reply-To: <CADGSJ22DX8=rNPY+gNS_q=LCYLR5ieoE7oD8p4+8AnpzQsWSCg@mail.gmail.com>

On 6/22/2018 5:17 PM, Siwei Liu wrote:
> On Fri, Jun 22, 2018 at 4:40 PM, Siwei Liu <loseweigh@gmail.com> wrote:
>> On Fri, Jun 22, 2018 at 3:25 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> On Fri, Jun 22, 2018 at 02:51:11PM -0700, Siwei Liu wrote:
>>>> On Fri, Jun 22, 2018 at 2:29 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>> On Fri, Jun 22, 2018 at 01:03:04PM -0700, Siwei Liu wrote:
>>>>>> On Fri, Jun 22, 2018 at 1:00 PM, Siwei Liu <loseweigh@gmail.com> wrote:
>>>>>>> On Thu, Jun 21, 2018 at 7:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>> On Thu, Jun 21, 2018 at 06:21:55PM -0700, Siwei Liu wrote:
>>>>>>>>> On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck <cohuck@redhat.com> wrote:
>>>>>>>>>> On Wed, 20 Jun 2018 22:48:58 +0300
>>>>>>>>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote:
>>>>>>>>>>>> In any case, I'm not sure anymore why we'd want the extra uuid.
>>>>>>>>>>> It's mostly so we can have e.g. multiple devices with same MAC
>>>>>>>>>>> (which some people seem to want in order to then use
>>>>>>>>>>> then with different containers).
>>>>>>>>>>>
>>>>>>>>>>> But it is also handy for when you assign a PF, since then you
>>>>>>>>>>> can't set the MAC.
>>>>>>>>>>>
>>>>>>>>>> OK, so what about the following:
>>>>>>>>>>
>>>>>>>>>> - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
>>>>>>>>>>    that we have a new uuid field in the virtio-net config space
>>>>>>>>>> - in QEMU, add a property for virtio-net that allows to specify a uuid,
>>>>>>>>>>    offer VIRTIO_NET_F_STANDBY_UUID if set
>>>>>>>>>> - when configuring, set the property to the group UUID of the vfio-pci
>>>>>>>>>>    device
>>>>>>>>> If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe
>>>>>>>>> to still expose UUID in the config space on virtio-pci?
>>>>>>>>
>>>>>>>> Yes but guest is not supposed to read it.
>>>>>>>>
>>>>>>>>> I'm not even sure if it's sane to expose group UUID on the PCI bridge
>>>>>>>>> where the corresponding vfio-pci device attached to for a guest which
>>>>>>>>> doesn't support the feature (legacy).
>>>>>>>>>
>>>>>>>>> -Siwei
>>>>>>>> Yes but you won't add the primary behind such a bridge.
>>>>>>> I assume the UUID feature is a new one besides the exiting
>>>>>>> VIRTIO_NET_F_STANDBY feature, where I think the current proposal is,
>>>>>>> if UUID feature is present and supported by guest, we'll do pairing
>>>>>>> based on UUID; if UUID feature present
>>>>>>                                   ^^^^^^^  is NOT present
>>>>> Let's go back for a bit.
>>>>>
>>>>> What is wrong with matching by mac?
>>>>>
>>>>> 1. Does not allow multiple NICs with same mac
>>>>> 2. Requires MAC to be programmed by host in the PT device
>>>>>     (which is often possible with VFs but isn't possible with all PT
>>>>>     devices)
>>>> Might not neccessarily be something wrong, but it's very limited to
>>>> prohibit the MAC of VF from changing when enslaved by failover.
>>> You mean guest changing MAC? I'm not sure why we prohibit that.
>> I think Sridhar and Jiri might be better person to answer it. My
>> impression was that sync'ing the MAC address change between all 3
>> devices is challenging, as the failover driver uses MAC address to
>> match net_device internally.

Yes. The MAC address is assigned by the hypervisor and it needs to manage the movement
of the MAC between the PF and VF.  Allowing the guest to change the MAC will require
synchronization between the hypervisor and the PF/VF drivers. Most of the VF drivers
don't allow changing guest MAC unless it is a trusted VF.

^ permalink raw reply

* [PATCH 1/4] net: lan78xx: Allow for VLAN headers in timeout calcs
From: Dave Stevenson @ 2018-06-25 14:07 UTC (permalink / raw)
  To: woojung.huh, UNGLinuxDriver, davem, netdev; +Cc: Dave Stevenson
In-Reply-To: <cover.1529935234.git.dave.stevenson@raspberrypi.org>

The frame abort timeout being set by lan78xx_set_rx_max_frame_length
didn't account for any VLAN headers, resulting in very low
throughput if used with tagged VLANs.

Use VLAN_ETH_HLEN instead of ETH_HLEN to correct for this.
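
For illustration only, the arithmetic behind the change can be sketched
in stand-alone C. The DEMO_* constants mirror the values of ETH_HLEN,
VLAN_HLEN and VLAN_ETH_HLEN from the kernel headers; the helper name is
hypothetical and is not part of the driver.

```c
#include <assert.h>

/* Values mirror ETH_HLEN, VLAN_HLEN and VLAN_ETH_HLEN in the kernel headers. */
enum {
	DEMO_ETH_HLEN      = 14,	/* dst MAC + src MAC + EtherType */
	DEMO_VLAN_HLEN     = 4,		/* one 802.1Q tag */
	DEMO_VLAN_ETH_HLEN = DEMO_ETH_HLEN + DEMO_VLAN_HLEN,
};

/* Frame length (headers plus payload) that has to fit for a given MTU. */
int demo_rx_max_frame_len(int mtu, int allow_vlan_tag)
{
	assert(mtu >= 0);
	return mtu + (allow_vlan_tag ? DEMO_VLAN_ETH_HLEN : DEMO_ETH_HLEN);
}
```

With an MTU of 1500 the untagged limit works out to 1514 bytes, four
bytes short of a full-sized tagged frame at 1518.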

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
---
 drivers/net/usb/lan78xx.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index a89570f..2f793d4 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -2298,7 +2298,7 @@ static int lan78xx_change_mtu(struct net_device *netdev, int new_mtu)
 	if ((ll_mtu % dev->maxpacket) == 0)
 		return -EDOM;
 
-	ret = lan78xx_set_rx_max_frame_length(dev, new_mtu + ETH_HLEN);
+	ret = lan78xx_set_rx_max_frame_length(dev, new_mtu + VLAN_ETH_HLEN);
 
 	netdev->mtu = new_mtu;
 
@@ -2587,7 +2587,8 @@ static int lan78xx_reset(struct lan78xx_net *dev)
 	buf |= FCT_TX_CTL_EN_;
 	ret = lan78xx_write_reg(dev, FCT_TX_CTL, buf);
 
-	ret = lan78xx_set_rx_max_frame_length(dev, dev->net->mtu + ETH_HLEN);
+	ret = lan78xx_set_rx_max_frame_length(dev,
+					      dev->net->mtu + VLAN_ETH_HLEN);
 
 	ret = lan78xx_read_reg(dev, MAC_RX, &buf);
 	buf |= MAC_RX_RXEN_;
-- 
2.7.4

^ permalink raw reply related
