Netdev List
 help / color / mirror / Atom feed
* RE: [PATCH] arm64: dts: stratix10: fix multicast filtering
From: Koskinen, Aaro (Nokia - FI/Espoo) @ 2018-11-02 11:04 UTC (permalink / raw)
  To: Dinh Nguyen, Thor Thayer, Rob Herring, Mark Rutland
  Cc: devicetree@vger.kernel.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <20181102105355.18654-1-aaro.koskinen@nokia.com>

Hi,

Please don't apply this - looks like the company SMTP server mangled the From
header. I'll try to send v2 with a workaround.

A.

^ permalink raw reply

* Re: [RFC bpf-next] libbpf: increase rlimit before trying to create BPF maps
From: Quentin Monnet @ 2018-11-02 11:04 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev, oss-drivers
In-Reply-To: <62ba0595-8b04-3ad4-32cd-47829b503f6b@iogearbox.net>

2018-11-02 10:08 UTC+0100 ~ Daniel Borkmann <daniel@iogearbox.net>
> On 11/01/2018 06:18 PM, Quentin Monnet wrote:
>> 2018-10-30 15:23 UTC+0000 ~ Quentin Monnet <quentin.monnet@netronome.com>
>>> The limit for memory locked in the kernel by a process is usually set to
>>> 64 bytes by default. This can be an issue when creating large BPF maps.
>>> A workaround is to raise this limit for the current process before
>>> trying to create a new BPF map. Changing the hard limit requires the
>>> CAP_SYS_RESOURCE and can usually only be done by root user (but then
>>> only root can create BPF maps).
>>
>> Sorry, the parenthesis is not correct: non-root users can in fact create
>> BPF maps as well. If a non-root user calls the function to create a map,
>> setrlimit() will fail silently (but set errno), and the program will
>> simply go on with its rlimit unchanged.
>>
>>> As far as I know there is not API to get the current amount of memory
>>> locked for a user, therefore we cannot raise the limit only when
>>> required. One solution, used by bcc, is to try to create the map, and on
>>> getting a EPERM error, raising the limit to infinity before giving
>>> another try. Another approach, used in iproute, is to raise the limit in
>>> all cases, before trying to create the map.
>>>
>>> Here we do the same as in iproute2: the rlimit is raised to infinity
>>> before trying to load the map.
>>>
>>> I send this patch as a RFC to see if people would prefer the bcc
>>> approach instead, or the rlimit change to be in bpftool rather than in
>>> libbpf.
> 
> I'd avoid doing something like this in a generic library; it's basically an
> ugly hack for the kind of accounting we're doing and only shows that while
> this was "good enough" to start off with in the early days, we should be
> doing something better today if every application raises it to inf anyway
> then it's broken. :) It just shows that this missed its purpose. Similarly
> to the jit_limit discussion on rlimit, perhaps we should be considering
> switching to something else entirely from kernel side. Could be something
> like memcg but this definitely needs some more evaluation first.

Changing the way limitations are enforced sounds like a cleaner
long-term approach indeed.

> (Meanwhile
> I'd not change the lib but callers instead and once we have something better
> in place we remove this type of "raising to inf" from the tree ...)

Understood, for the time beeing I'll repost a patch adding the
modification to bpftool once bpf-next is open.

Thanks Daniel!
Quentin

^ permalink raw reply

* [PATCH iproute2] Fix warning in tc-skbprio.8 manpage
From: Luca Boccassi @ 2018-11-02 10:57 UTC (permalink / raw)
  To: netdev; +Cc: stephen, dsahern

". If" gets interpreted as a macro, so move the period to the previous
line:

  33: warning: macro `If' not defined

Fixes: 141b55f8544e ("Add SKB Priority qdisc support in tc(8)")

Signed-off-by: Luca Boccassi <bluca@debian.org>
---
 man/man8/tc-skbprio.8 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/man/man8/tc-skbprio.8 b/man/man8/tc-skbprio.8
index 844bbf46..8b259839 100644
--- a/man/man8/tc-skbprio.8
+++ b/man/man8/tc-skbprio.8
@@ -29,8 +29,8 @@ While SKB Priority Queue is agnostic to how
 .B skb->priority
 is assigned. A typical use case is to copy
 the 6-bit DS field of IPv4 and IPv6 packets using
-.BR tc-skbedit (8)
-. If
+.BR tc-skbedit (8) .
+If
 .B skb->priority
 is greater or equal to 64, the priority is assumed to be 63.
 Priorities less than 64 are taken at face value.
-- 
2.19.1

^ permalink raw reply related

* Re: [PATCH bpf] bpf: fix bpf_prog_get_info_by_fd to return 0 func_lens for unpriv
From: Sandipan Das @ 2018-11-02 10:55 UTC (permalink / raw)
  To: Daniel Borkmann, ast; +Cc: netdev, Sandipan Das, Song Liu
In-Reply-To: <20181102103546.4499-1-daniel@iogearbox.net>



On 02/11/18 4:05 PM, Daniel Borkmann wrote:
> While dbecd7388476 ("bpf: get kernel symbol addresses via syscall")
> zeroed info.nr_jited_ksyms in bpf_prog_get_info_by_fd() for queries
> from unprivileged users, commit 815581c11cc2 ("bpf: get JITed image
> lengths of functions via syscall") forgot about doing so and therefore
> returns the #elems of the user set up buffer which is incorrect. It
> also needs to indicate a info.nr_jited_func_lens of zero.
> 
> Fixes: 815581c11cc2 ("bpf: get JITed image lengths of functions via syscall")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Sandipan Das <sandipan@linux.vnet.ibm.com>
> Cc: Song Liu <songliubraving@fb.com>
> ---
>  kernel/bpf/syscall.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index ccb9327..1ea5ce1 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2078,6 +2078,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  		info.jited_prog_len = 0;
>  		info.xlated_prog_len = 0;
>  		info.nr_jited_ksyms = 0;
> +		info.nr_jited_func_lens = 0;
>  		goto done;
>  	}
> 

Looks good. Thanks for fixing this. 

- Sandipan

^ permalink raw reply

* [PATCH bpf] bpf: fix bpf_prog_get_info_by_fd to return 0 func_lens for unpriv
From: Daniel Borkmann @ 2018-11-02 10:35 UTC (permalink / raw)
  To: ast; +Cc: netdev, Daniel Borkmann, Sandipan Das, Song Liu

While dbecd7388476 ("bpf: get kernel symbol addresses via syscall")
zeroed info.nr_jited_ksyms in bpf_prog_get_info_by_fd() for queries
from unprivileged users, commit 815581c11cc2 ("bpf: get JITed image
lengths of functions via syscall") forgot about doing so and therefore
returns the #elems of the user set up buffer which is incorrect. It
also needs to indicate a info.nr_jited_func_lens of zero.

Fixes: 815581c11cc2 ("bpf: get JITed image lengths of functions via syscall")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Sandipan Das <sandipan@linux.vnet.ibm.com>
Cc: Song Liu <songliubraving@fb.com>
---
 kernel/bpf/syscall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ccb9327..1ea5ce1 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2078,6 +2078,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
 		info.jited_prog_len = 0;
 		info.xlated_prog_len = 0;
 		info.nr_jited_ksyms = 0;
+		info.nr_jited_func_lens = 0;
 		goto done;
 	}
 
-- 
2.9.5

^ permalink raw reply related

* Re: [PATCH bpf 2/3] bpf: show real jited address in bpf_prog_info->jited_ksyms
From: Daniel Borkmann @ 2018-11-02 10:19 UTC (permalink / raw)
  To: Song Liu, netdev; +Cc: kernel-team, ast, sandipan
In-Reply-To: <e779db0e-7933-c11b-5068-d919897d15ec@iogearbox.net>

On 11/02/2018 11:09 AM, Daniel Borkmann wrote:
> On 11/01/2018 08:00 AM, Song Liu wrote:
>> Currently, jited_ksyms in bpf_prog_info shows page addresses of jited
>> bpf program. This is not ideal for detailed profiling (find hot
>> instructions from stack traces). This patch replaces the page address
>> with real prog start address.
>>
>> Signed-off-by: Song Liu <songliubraving@fb.com>
>> ---
>>  kernel/bpf/syscall.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index ccb93277aae2..34a9eef5992c 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -2172,7 +2172,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>>  			user_ksyms = u64_to_user_ptr(info.jited_ksyms);
>>  			for (i = 0; i < ulen; i++) {
>>  				ksym_addr = (ulong) prog->aux->func[i]->bpf_func;
>> -				ksym_addr &= PAGE_MASK;
> 
> Note that the masking was done on purpose here and in patch 1/3 in order to
> not expose randomized start address to kallsyms at least. I suppose it's
> okay to change it here and for kallsyms given bpf_prog_get_info_by_fd() dump
> is for root only, and in each of the two cases we additionally apply
> kallsyms_show_value() logic, so for unpriv this is zeroed out plus only root
> loaded programs are added under kallsyms (capable(CAP_SYS_ADMIN)) anyway.

(Btw, something like above should have been in changelog to provide some more
historical context of why we used to do it like that and explaining why it is
okay to change it this way.)

>>  				if (put_user((u64) ksym_addr, &user_ksyms[i]))
>>  					return -EFAULT;
>>  			}
>>
> 

^ permalink raw reply

* [PATCH 6/6] can: m_can: Add support for transceiver as phy
From: Faiz Abbas @ 2018-11-02 19:26 UTC (permalink / raw)
  To: linux-kernel, devicetree, netdev, linux-can
  Cc: wg, mkl, robh+dt, mark.rutland, kishon, faiz_abbas
In-Reply-To: <20181102192616.28291-1-faiz_abbas@ti.com>

Add support for implementing the transceiver device as a phy. Instead
of a child node, the max_bitrate information is obtained by getting a
phy attribute. Fallback to the legacy API if no phy node is found.

Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
---
 drivers/net/can/m_can/m_can.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index 9b449400376b..ed6d84acea20 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -26,6 +26,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/iopoll.h>
 #include <linux/can/dev.h>
+#include <linux/phy/phy.h>
 #include <linux/pinctrl/consumer.h>
 
 /* napi related */
@@ -1582,6 +1583,7 @@ static int m_can_plat_probe(struct platform_device *pdev)
 	struct device_node *np;
 	u32 mram_config_vals[MRAM_CFG_LEN];
 	u32 tx_fifo_size;
+	struct phy *transceiver;
 
 	np = pdev->dev.of_node;
 
@@ -1671,7 +1673,26 @@ static int m_can_plat_probe(struct platform_device *pdev)
 
 	devm_can_led_init(dev);
 
-	of_can_transceiver(dev);
+	transceiver = devm_phy_optional_get(&pdev->dev, "can_transceiver");
+	if (IS_ERR(transceiver)) {
+		ret = PTR_ERR(transceiver);
+		dev_err(&pdev->dev, "Couldn't get phy. err=%d\n", ret);
+		return ret;
+	}
+
+	if (!transceiver) {
+		dev_warn(&pdev->dev, "No transceiver phy. Falling back to legacy API\n");
+		of_can_transceiver(dev);
+	} else {
+		ret = phy_power_on(transceiver);
+		if (ret) {
+			dev_err(&pdev->dev, "phy_power_on err:%d\n", ret);
+			return ret;
+		}
+		priv->can.bitrate_max = phy_get_max_bitrate(transceiver);
+		if (!priv->can.bitrate_max)
+			dev_warn(&pdev->dev, "Invalid value for transceiver max bitrate. Ignoring bitrate limit\n");
+	}
 
 	dev_info(&pdev->dev, "%s device registered (irq=%d, version=%d)\n",
 		 KBUILD_MODNAME, dev->irq, priv->version);
-- 
2.18.0

^ permalink raw reply related

* [PATCH 4/6] dt-bindings: can: m_can: Document transceiver implementation as a phy
From: Faiz Abbas @ 2018-11-02 19:26 UTC (permalink / raw)
  To: linux-kernel, devicetree, netdev, linux-can
  Cc: wg, mkl, robh+dt, mark.rutland, kishon, faiz_abbas
In-Reply-To: <20181102192616.28291-1-faiz_abbas@ti.com>

Some transceivers need a configuration step (for example, pulling
a standby line low) for them to start sending messages. Instead of a
parent child relationship, the transceiver can be implemented as a
phy with the configuration done in the phy driver. The bitrate
limitations can then be obtained by the driver using said phy
node.

Document the above implementation.

Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
---
 .../devicetree/bindings/net/can/m_can.txt     | 24 ++++++++++++-------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/can/m_can.txt b/Documentation/devicetree/bindings/net/can/m_can.txt
index ed614383af9c..c11548e74278 100644
--- a/Documentation/devicetree/bindings/net/can/m_can.txt
+++ b/Documentation/devicetree/bindings/net/can/m_can.txt
@@ -42,12 +42,14 @@ Required properties:
 
 			  Please refer to 2.4.1 Message RAM Configuration in
 			  Bosch M_CAN user manual for details.
+Optional properties:
+- phy-names		: must be "can_transceiver". The transceiver
+			  typically limits the maximum bitrate of the
+			  system. The phy node is used to discover this
+			  limitation.
+- phys			: phandle to the transceiver implemented as a phy
+			  node.
 
-Optional Subnode:
-- can-transceiver	: Can-transceiver subnode describing maximum speed
-			  that can be used for CAN/CAN-FD modes. See
-			  Documentation/devicetree/bindings/net/can/can-transceiver.txt
-			  for details.
 Example:
 SoC dtsi:
 m_can1: can@20e8000 {
@@ -64,12 +66,18 @@ m_can1: can@20e8000 {
 };
 
 Board dts:
+
 &m_can1 {
 	pinctrl-names = "default";
 	pinctrl-0 = <&pinctrl_m_can1>;
 	status = "enabled";
+	phy-names = "can_transceiver";
+	phys = <&transceiver1>;
+};
 
-	can-transceiver {
-		max-bitrate = <5000000>;
-	};
+transceiver1: can-transceiver {
+	compatible = "simple-phy";
+	max-bitrate = <5000000>;
+	pwr-supply = <&transceiver1_fixed>;
+	#phy-cells = <0>;
 };
-- 
2.18.0

^ permalink raw reply related

* [PATCH 3/6] phy: phy-of-simple: Add support for simple generic phy driver
From: Faiz Abbas @ 2018-11-02 19:26 UTC (permalink / raw)
  To: linux-kernel, devicetree, netdev, linux-can
  Cc: wg, mkl, robh+dt, mark.rutland, kishon, faiz_abbas
In-Reply-To: <20181102192616.28291-1-faiz_abbas@ti.com>

A good number of phy implementations don't have a dedicated register
map and only do simple clock or regulator configurations. Add support
for a generic driver which just does such simple configurations.

The driver optionally gets all the generic phy attributes and also the
pwr regulator node. It also includes a generic implementation of
phy_power_on() and phy_power_off().

Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
---
 drivers/phy/Kconfig         |  7 +++
 drivers/phy/Makefile        |  1 +
 drivers/phy/phy-of-simple.c | 90 +++++++++++++++++++++++++++++++++++++
 3 files changed, 98 insertions(+)
 create mode 100644 drivers/phy/phy-of-simple.c

diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig
index 60f949e2a684..e66039560da9 100644
--- a/drivers/phy/Kconfig
+++ b/drivers/phy/Kconfig
@@ -40,6 +40,13 @@ config PHY_XGENE
 	help
 	  This option enables support for APM X-Gene SoC multi-purpose PHY.
 
+config PHY_OF_SIMPLE
+	tristate "PHY Simple Driver"
+	select GENERIC_PHY
+	help
+	  This driver supports simple generic phy implementations which don't
+	  need a dedicated register map.
+
 source "drivers/phy/allwinner/Kconfig"
 source "drivers/phy/amlogic/Kconfig"
 source "drivers/phy/broadcom/Kconfig"
diff --git a/drivers/phy/Makefile b/drivers/phy/Makefile
index 0301e25d07c1..ffef0fdf2c68 100644
--- a/drivers/phy/Makefile
+++ b/drivers/phy/Makefile
@@ -4,6 +4,7 @@
 #
 
 obj-$(CONFIG_GENERIC_PHY)		+= phy-core.o
+obj-$(CONFIG_PHY_OF_SIMPLE)		+= phy-of-simple.o
 obj-$(CONFIG_PHY_LPC18XX_USB_OTG)	+= phy-lpc18xx-usb-otg.o
 obj-$(CONFIG_PHY_XGENE)			+= phy-xgene.o
 obj-$(CONFIG_PHY_PISTACHIO_USB)		+= phy-pistachio-usb.o
diff --git a/drivers/phy/phy-of-simple.c b/drivers/phy/phy-of-simple.c
new file mode 100644
index 000000000000..0194aae90d3c
--- /dev/null
+++ b/drivers/phy/phy-of-simple.c
@@ -0,0 +1,90 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * phy-of-simple.c - phy driver for simple implementations
+ *
+ * Copyright (C) 2018 Texas Instruments Incorporated - http://www.ti.com
+ *
+ */
+#include <linux/phy/phy.h>
+#include <linux/platform_device.h>
+#include <linux/module.h>
+#include <linux/regulator/consumer.h>
+
+static int phy_simple_power_on(struct phy *phy)
+{
+	if (phy->pwr)
+		return regulator_enable(phy->pwr);
+
+	return 0;
+}
+
+static int phy_simple_power_off(struct phy *phy)
+{
+	if (phy->pwr)
+		return regulator_disable(phy->pwr);
+
+	return 0;
+}
+
+static const struct phy_ops phy_simple_ops = {
+	.power_on	= phy_simple_power_on,
+	.power_off	= phy_simple_power_off,
+	.owner		= THIS_MODULE,
+};
+
+int phy_simple_probe(struct platform_device *pdev)
+{
+	struct phy_provider *phy_provider;
+	struct device *dev = &pdev->dev;
+	struct regulator *pwr = NULL;
+	struct phy *phy;
+	u32 bus_width = 0;
+	u32 max_bitrate = 0;
+	int ret;
+
+	phy = devm_phy_create(dev, dev->of_node,
+				      &phy_simple_ops);
+
+	if (IS_ERR(phy)) {
+		dev_err(dev, "Failed to create phy\n");
+		return PTR_ERR(phy);
+	}
+
+	device_property_read_u32(dev, "bus-width", &bus_width);
+	phy->attrs.bus_width = bus_width;
+	device_property_read_u32(dev, "max-bitrate", &max_bitrate);
+	phy->attrs.max_bitrate = max_bitrate;
+
+	pwr = devm_regulator_get_optional(dev, "pwr");
+	if (IS_ERR(pwr)) {
+		ret = PTR_ERR(pwr);
+		dev_err(dev, "Couldn't get regulator. ret=%d\n", ret);
+		return ret;
+	}
+	phy->pwr = pwr;
+
+	phy_provider = devm_of_phy_provider_register(dev, of_phy_simple_xlate);
+
+	return PTR_ERR_OR_ZERO(phy_provider);
+}
+
+static const struct of_device_id phy_simple_dt_ids[] = {
+	{ .compatible = "simple-phy"},
+	{}
+};
+
+MODULE_DEVICE_TABLE(of, phy_simple_phy_dt_ids);
+
+static struct platform_driver phy_simple_driver = {
+	.probe		= phy_simple_probe,
+	.driver		= {
+		.name	= "phy-of-simple",
+		.of_match_table = phy_simple_dt_ids,
+	},
+};
+
+module_platform_driver(phy_simple_driver);
+
+MODULE_AUTHOR("Faiz Abbas <faiz_abbas@ti.com>");
+MODULE_DESCRIPTION("Simple PHY driver");
+MODULE_LICENSE("GPL v2");
-- 
2.18.0

^ permalink raw reply related

* [PATCH 2/6] dt-bindings: phy: phy-of-simple: Document new binding
From: Faiz Abbas @ 2018-11-02 19:26 UTC (permalink / raw)
  To: linux-kernel, devicetree, netdev, linux-can
  Cc: wg, mkl, robh+dt, mark.rutland, kishon, faiz_abbas
In-Reply-To: <20181102192616.28291-1-faiz_abbas@ti.com>

Add documentation for the generic simple phy implementation.

Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
---
 .../devicetree/bindings/phy/phy-of-simple.txt | 29 +++++++++++++++++++
 1 file changed, 29 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/phy/phy-of-simple.txt

diff --git a/Documentation/devicetree/bindings/phy/phy-of-simple.txt b/Documentation/devicetree/bindings/phy/phy-of-simple.txt
new file mode 100644
index 000000000000..696f2763395c
--- /dev/null
+++ b/Documentation/devicetree/bindings/phy/phy-of-simple.txt
@@ -0,0 +1,29 @@
+Generic simple phy device tree binding
+--------------------------------------
+
+A good number of phy implementations merely read dts properties,
+enable clocks, regulators or do resets without having a dedicated register
+map. This binding implements a generic phy driver which can be used for
+such simple implementations and avoid boilerplate code duplication.
+
+Required Properties:
+-  compatible	: must be "simple-phy"
+-  phy-cells    : must be 0
+
+Optional Properties:
+-  bus-width	: generic bus-width. Must be positive.
+-  max-bitrate	: generic max-bitrate. Must be positive.
+-  pwr		: phandle to phy pwr regulator node.
+
+Example:
+
+The following example is a can transceiver implemented as a generic phy.
+It has a max-bitrate property and a pwr regulator.
+
+
+transceiver1: can-transceiver {
+	compatible = "simple-phy";
+	max-bitrate = <5000000>;
+	pwr-supply = <&transceiver1_fixed>;
+	#phy-cells = <0>;
+};
-- 
2.18.0

^ permalink raw reply related

* [PATCH 1/6] phy: Add max_bitrate attribute & phy_get_max_bitrate()
From: Faiz Abbas @ 2018-11-02 19:26 UTC (permalink / raw)
  To: linux-kernel, devicetree, netdev, linux-can
  Cc: wg, mkl, robh+dt, mark.rutland, kishon, faiz_abbas
In-Reply-To: <20181102192616.28291-1-faiz_abbas@ti.com>

In some subsystems (eg. CAN) the physical layer capabilities are
the limiting factor in the datarate of the device. Typically, the
physical layer transceiver does not provide a way to discover this
limitation at runtime. Thus this information needs to be represented as
a phy attribute which is read from the device tree.

Therefore, add an optional max_bitrate attribute to the generic phy
sybsystem. Also add the complementary API which enables the consumer
to get max_bitrate.

Signed-off-by: Faiz Abbas <faiz_abbas@ti.com>
---
 include/linux/phy/phy.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/phy/phy.h b/include/linux/phy/phy.h
index 03b319f89a34..31574464f09f 100644
--- a/include/linux/phy/phy.h
+++ b/include/linux/phy/phy.h
@@ -72,6 +72,7 @@ struct phy_ops {
  */
 struct phy_attrs {
 	u32			bus_width;
+	u32			max_bitrate;
 	enum phy_mode		mode;
 };
 
@@ -179,6 +180,10 @@ static inline void phy_set_bus_width(struct phy *phy, int bus_width)
 {
 	phy->attrs.bus_width = bus_width;
 }
+static inline int phy_get_max_bitrate(struct phy *phy)
+{
+	return phy->attrs.max_bitrate;
+}
 struct phy *phy_get(struct device *dev, const char *string);
 struct phy *phy_optional_get(struct device *dev, const char *string);
 struct phy *devm_phy_get(struct device *dev, const char *string);
-- 
2.18.0

^ permalink raw reply related

* Re: [PATCH bpf 2/3] bpf: show real jited address in bpf_prog_info->jited_ksyms
From: Daniel Borkmann @ 2018-11-02 10:09 UTC (permalink / raw)
  To: Song Liu, netdev; +Cc: kernel-team, ast, sandipan
In-Reply-To: <20181101070058.2760251-3-songliubraving@fb.com>

On 11/01/2018 08:00 AM, Song Liu wrote:
> Currently, jited_ksyms in bpf_prog_info shows page addresses of jited
> bpf program. This is not ideal for detailed profiling (find hot
> instructions from stack traces). This patch replaces the page address
> with real prog start address.
> 
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  kernel/bpf/syscall.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index ccb93277aae2..34a9eef5992c 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2172,7 +2172,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  			user_ksyms = u64_to_user_ptr(info.jited_ksyms);
>  			for (i = 0; i < ulen; i++) {
>  				ksym_addr = (ulong) prog->aux->func[i]->bpf_func;
> -				ksym_addr &= PAGE_MASK;

Note that the masking was done on purpose here and in patch 1/3 in order to
not expose randomized start address to kallsyms at least. I suppose it's
okay to change it here and for kallsyms given bpf_prog_get_info_by_fd() dump
is for root only, and in each of the two cases we additionally apply
kallsyms_show_value() logic, so for unpriv this is zeroed out plus only root
loaded programs are added under kallsyms (capable(CAP_SYS_ADMIN)) anyway.

>  				if (put_user((u64) ksym_addr, &user_ksyms[i]))
>  					return -EFAULT;
>  			}
> 

^ permalink raw reply

* [PATCH v3] arm64: dts: stratix10: fix multicast filtering
From: Aaro Koskinen @ 2018-11-02 19:10 UTC (permalink / raw)
  To: Dinh Nguyen, Thor Thayer, Rob Herring, Mark Rutland
  Cc: devicetree, linux-kernel, netdev, Aaro Koskinen

From: Aaro Koskinen <aaro.koskinen@nokia.com>

On Stratix 10, the EMAC has 256 hash buckets for multicast filtering. This
needs to be specified in DTS, otherwise the stmmac driver defaults to 64
buckets and initializes the filter incorrectly. As a result, e.g. valid
IPv6 multicast traffic ends up being dropped.

Fixes: 78cd6a9d8e15 ("arm64: dts: Add base stratix 10 dtsi")
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
---
 arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi | 3 +++
 1 file changed, 3 insertions(+)

	v3: Send the patch using sane SMTP server to preserve the correct
	    formatting.

diff --git a/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi b/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
index 8253a1a9e985..fef7351e9f67 100644
--- a/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
+++ b/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
@@ -139,6 +139,7 @@
 			clock-names = "stmmaceth";
 			tx-fifo-depth = <16384>;
 			rx-fifo-depth = <16384>;
+			snps,multicast-filter-bins = <256>;
 			status = "disabled";
 		};
 
@@ -154,6 +155,7 @@
 			clock-names = "stmmaceth";
 			tx-fifo-depth = <16384>;
 			rx-fifo-depth = <16384>;
+			snps,multicast-filter-bins = <256>;
 			status = "disabled";
 		};
 
@@ -169,6 +171,7 @@
 			clock-names = "stmmaceth";
 			tx-fifo-depth = <16384>;
 			rx-fifo-depth = <16384>;
+			snps,multicast-filter-bins = <256>;
 			status = "disabled";
 		};
 
-- 
2.17.0

^ permalink raw reply related

* Re: Ethernet on my CycloneV broke since 4.9.124
From: Clément Péron @ 2018-11-02 10:02 UTC (permalink / raw)
  To: Dinh Nguyen; +Cc: netdev
In-Reply-To: <d1029588-e181-b804-d2f7-cc9990f6bf70@kernel.org>

Hi Dinh,

On Wed, 31 Oct 2018 at 23:02, Dinh Nguyen <dinguyen@kernel.org> wrote:
>
> Hi Clement,
>
> On 10/31/2018 10:36 AM, Clément Péron wrote:
> > Hi Dinh,
> >
> > On Wed, 31 Oct 2018 at 15:42, Dinh Nguyen <dinguyen@kernel.org> wrote:
> >>
> >> Hi Clement,
> >>
> >> On 10/31/2018 08:01 AM, Clément Péron wrote:
> >>> Hi,
> >>>
> >>> The patch "net: stmmac: socfpga: add additional ocp reset line for
> >>> Stratix10" introduce in 4.9.124 broke the ethernet on my CycloneV
> >>> board.
> >>>
> >>> When I boot i have this issue :
> >>>
> >>> socfpga-dwmac ff702000.ethernet: error getting reset control of ocp -2
> >>> socfpga-dwmac: probe of ff702000.ethernet failed with error -2
> >>>
> >>> Reverting the commit : 6f37f7b62baa6a71d7f3f298acb64de51275e724 fix the issue.
> >>>
> >>
> >> Are you sure? I just booted v4.9.124 and did not see any errors. The
> >> error should not appear because the commit is using
> >> devm_reset_control_get_optional().
> >
> > I'm booting on 4.9.130 actually, Agree with you that
> > devm_reset_control_get_optional should not failed but checking other
> > usage of this helper
> > https://elixir.bootlin.com/linux/v4.9.135/source/drivers/i2c/busses/i2c-mv64xxx.c#L824
> > https://elixir.bootlin.com/linux/v4.9.135/source/drivers/crypto/sunxi-ss/sun4i-ss-core.c#L259
> > Show that they don't check for errors except for PROBE_DEFER
> >
>
> I made a mistake, I was booting linux-next. I am seeing the error with
> v4.9.124. It's due to this commit not getting backported:
>
> "bb475230b8e59a reset: make optional functions really optional"
>
> I have backported the patch and is available here if you like to take a
> look:

Thanks, works fine on my board too.
Regards,
Clement

>
> https://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux.git/log/?h=v4.9.124_optional_reset
>
> Dinh

^ permalink raw reply

* Re: [PATCH iproute2 net-next 0/3] ss: Allow selection of columns to be displayed
From: Stefano Brivio @ 2018-11-02  9:58 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: David Ahern, Yoann P., Stephen Hemminger, netdev
In-Reply-To: <20181101140623.4d6211a0@cakuba.netronome.com>

On Thu, 1 Nov 2018 14:06:23 -0700
Jakub Kicinski <kubakici@wp.pl> wrote:

> JSONification would probably be quite an undertaking for ss :(

Probably not too much, and we would skip buffering for JSON as we don't
need to print columns, so we don't have to care about buffering and
rendering performance too much.

All it takes is to extend a bit the out() function and pass the right
arguments to it -- it looks way easier than implementing format
strings, even though I'd like the latter more.

-- 
Stefano

^ permalink raw reply

* Re: [PATCH iproute2 net-next 0/3] ss: Allow selection of columns to be displayed
From: Stefano Brivio @ 2018-11-02  9:58 UTC (permalink / raw)
  To: David Ahern; +Cc: Yoann P., Stephen Hemminger, netdev
In-Reply-To: <adb1ade4-e284-73fc-ff77-c1b1b2b4c3f1@gmail.com>

On Wed, 31 Oct 2018 20:48:05 -0600
David Ahern <dsahern@gmail.com> wrote:

> On 10/30/18 11:34 AM, Stefano Brivio wrote:
> > 
> > - the current column abstraction is rather lightweight, things are
> >   already buffered in the defined column order so we don't have to jump
> >   back and forth in the buffer while rendering. Doing that needs some
> >   extra care to avoid a performance hit, but it's probably doable, I
> >   can put that on my to-do list  
> 
> The ss command is always a pain; it's much better in newer releases but
> I am always having to adjust terminal width.

Ouch. Do you have some examples?

-- 
Stefano

^ permalink raw reply

* Re: [PATCH bpf 3/3] bpf: show main program address in bpf_prog_info->jited_ksyms
From: Daniel Borkmann @ 2018-11-02  9:57 UTC (permalink / raw)
  To: Song Liu, netdev; +Cc: kernel-team, ast, sandipan
In-Reply-To: <20181101070058.2760251-4-songliubraving@fb.com>

On 11/01/2018 08:00 AM, Song Liu wrote:
> Currently, when there is not subprog (prog->aux->func_cnt == 0),
> bpf_prog_info does not return any jited_ksyms. This patch adds
> main program address (prog->bpf_func) to jited_ksyms.
> 
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  kernel/bpf/syscall.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 34a9eef5992c..7293b17ca62a 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2158,7 +2158,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  	}
>  
>  	ulen = info.nr_jited_ksyms;
> -	info.nr_jited_ksyms = prog->aux->func_cnt;
> +	info.nr_jited_ksyms = prog->aux->func_cnt ? : 1;
>  	if (info.nr_jited_ksyms && ulen) {
>  		if (bpf_dump_raw_ok()) {
>  			u64 __user *user_ksyms;
> @@ -2170,9 +2170,17 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  			 */
>  			ulen = min_t(u32, info.nr_jited_ksyms, ulen);
>  			user_ksyms = u64_to_user_ptr(info.jited_ksyms);
> -			for (i = 0; i < ulen; i++) {
> -				ksym_addr = (ulong) prog->aux->func[i]->bpf_func;
> -				if (put_user((u64) ksym_addr, &user_ksyms[i]))
> +			if (prog->aux->func_cnt) {
> +				for (i = 0; i < ulen; i++) {
> +					ksym_addr = (ulong)

Small nit: can we change ksym_addr, the above and below cast to kernel-style
'unsigned long' while at it?

> +						prog->aux->func[i]->bpf_func;
> +					if (put_user((u64) ksym_addr,
> +						     &user_ksyms[i]))
> +						return -EFAULT;
> +				}
> +			} else {
> +				ksym_addr = (ulong) prog->bpf_func;
> +				if (put_user((u64) ksym_addr, &user_ksyms[0]))
>  					return -EFAULT;

If we do this here, I think we should also update nr_jited_func_lens to copy
prog->jited_len to user space to be consistent with this change here. In case
of multi-func, the latter copies the len of the main program, and the lens of
the subprogs. Given we push the address for it to user space, we should then
also push the main prog len if it's only main prog there so this case doesn't
need any special handling by user space.

>  			}
>  		} else {
> 

^ permalink raw reply

* Re: [PULL] vhost: cleanups and fixes
From: Al Viro @ 2018-11-02 19:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: mst, mark.rutland, Kees Cook, kvm, virtualization, netdev,
	Linux Kernel Mailing List, Andrew Morton, bijan.mottahedeh,
	gedwards, joe, lenaic, liang.z.li, mhocko, mhocko, stefanha,
	wei.w.wang, jasowang
In-Reply-To: <CAHk-=wj5ZnqmpDuXBCcND8nMNitNyRf_4KQSzSUfQvX2-wOYsg@mail.gmail.com>

On Fri, Nov 02, 2018 at 10:15:56AM -0700, Linus Torvalds wrote:
> On Fri, Nov 2, 2018 at 10:10 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Don't you take over the VM with "use_mm()" when you do the copies? So
> > yes, it's a kernel thread, but it has a user VM, and though that
> > should have the user limits.
> 
> Oooh. *Just* as I sent this, I realized that "use_mm()" doesn't update
> the thread addr_limit.
> 
> That actually looks like a bug to me - although one that you've
> apparently been aware of and worked around.
> 
> Wouldn't it be nicer to just make "use_mm()" do
> 
>         set_fs(USER_DS);
> 
> instead? And undo it on unuse_mm()?
> 
> And, in fact, maybe we should default kernel threads to have a zero
> address limit, so that they can't do any user accesses at all without
> doing this?

Try it and watch it fail to set initramfs up, let alone exec the init...

> Adding Al to the cc, because I think he's been looking at set_fs() in general.

It would be the right thing (with return to KERNEL_DS), but I'm not certain
if GPU users will survive - these two
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h:157:                         use_mm(mmptr);                          \
drivers/gpu/drm/i915/gvt/kvmgt.c:1799:          use_mm(kvm->mm);
I don't understand the call chains there (especially for the first one) well
enough to tell.

^ permalink raw reply

* [PATCH] net: core: netpoll: Enable netconsole IPv6 link local address
From: Matwey V. Kornilov @ 2018-11-02 18:19 UTC (permalink / raw)
  To: David S. Miller
  Cc: matwey.kornilov, Eric Dumazet, Ingo Molnar, Thomas Gleixner,
	Frederic Weisbecker, Debabrata Banerjee, Dave Jones, netdev,
	linux-kernel, Matwey V. Kornilov

There is no reason to discard using source link local address when
remote netconsole IPv6 address is set to be link local one.

The patch allows administrators to use IPv6 netconsole without
explicitly configuring source address:

    netconsole=@/,@fe80::5054:ff:fe2f:6012/

Signed-off-by: Matwey V. Kornilov <matwey@sai.msu.ru>
---
 net/core/netpoll.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 5da9552b186b..2b9fdbc43205 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -717,7 +717,8 @@ int netpoll_setup(struct netpoll *np)
 
 				read_lock_bh(&idev->lock);
 				list_for_each_entry(ifp, &idev->addr_list, if_list) {
-					if (ipv6_addr_type(&ifp->addr) & IPV6_ADDR_LINKLOCAL)
+					if (!!(ipv6_addr_type(&ifp->addr) & IPV6_ADDR_LINKLOCAL) !=
+					    !!(ipv6_addr_type(&np->remote_ip.in6) & IPV6_ADDR_LINKLOCAL))
 						continue;
 					np->local_ip.in6 = ifp->addr;
 					err = 0;
-- 
2.16.4

^ permalink raw reply related

* Re: [RFC bpf-next] libbpf: increase rlimit before trying to create BPF maps
From: Daniel Borkmann @ 2018-11-02  9:08 UTC (permalink / raw)
  To: Quentin Monnet, Alexei Starovoitov; +Cc: netdev, oss-drivers
In-Reply-To: <bb9118cc-daa2-8e73-96e4-79fed2a1de0a@netronome.com>

On 11/01/2018 06:18 PM, Quentin Monnet wrote:
> 2018-10-30 15:23 UTC+0000 ~ Quentin Monnet <quentin.monnet@netronome.com>
>> The limit for memory locked in the kernel by a process is usually set to
>> 64 bytes by default. This can be an issue when creating large BPF maps.
>> A workaround is to raise this limit for the current process before
>> trying to create a new BPF map. Changing the hard limit requires the
>> CAP_SYS_RESOURCE and can usually only be done by root user (but then
>> only root can create BPF maps).
> 
> Sorry, the parenthesis is not correct: non-root users can in fact create
> BPF maps as well. If a non-root user calls the function to create a map,
> setrlimit() will fail silently (but set errno), and the program will
> simply go on with its rlimit unchanged.
> 
>> As far as I know there is not API to get the current amount of memory
>> locked for a user, therefore we cannot raise the limit only when
>> required. One solution, used by bcc, is to try to create the map, and on
>> getting a EPERM error, raising the limit to infinity before giving
>> another try. Another approach, used in iproute, is to raise the limit in
>> all cases, before trying to create the map.
>>
>> Here we do the same as in iproute2: the rlimit is raised to infinity
>> before trying to load the map.
>>
>> I send this patch as a RFC to see if people would prefer the bcc
>> approach instead, or the rlimit change to be in bpftool rather than in
>> libbpf.

I'd avoid doing something like this in a generic library; it's basically an
ugly hack for the kind of accounting we're doing and only shows that while
this was "good enough" to start off with in the early days, we should be
doing something better today if every application raises it to inf anyway
then it's broken. :) It just shows that this missed its purpose. Similarly
to the jit_limit discussion on rlimit, perhaps we should be considering
switching to something else entirely from kernel side. Could be something
like memcg but this definitely needs some more evaluation first. (Meanwhile
I'd not change the lib but callers instead and once we have something better
in place we remove this type of "raising to inf" from the tree ...)

>> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
>> ---
>>  tools/lib/bpf/bpf.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>> index 03f9bcc4ef50..456a5a7b112c 100644
>> --- a/tools/lib/bpf/bpf.c
>> +++ b/tools/lib/bpf/bpf.c
>> @@ -26,6 +26,8 @@
>>  #include <unistd.h>
>>  #include <asm/unistd.h>
>>  #include <linux/bpf.h>
>> +#include <sys/resource.h>
>> +#include <sys/types.h>
>>  #include "bpf.h"
>>  #include "libbpf.h"
>>  #include <errno.h>
>> @@ -68,8 +70,11 @@ static inline int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
>>  int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr)
>>  {
>>  	__u32 name_len = create_attr->name ? strlen(create_attr->name) : 0;
>> +	struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
>>  	union bpf_attr attr;
>>  
>> +	setrlimit(RLIMIT_MEMLOCK, &rinf);
>> +
>>  	memset(&attr, '\0', sizeof(attr));
>>  
>>  	attr.map_type = create_attr->map_type;
>>
> 

^ permalink raw reply

* Re: [PULL] vhost: cleanups and fixes
From: Michael S. Tsirkin @ 2018-11-02 18:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: mark.rutland, Kees Cook, kvm, virtualization, netdev,
	Linux Kernel Mailing List, Andrew Morton, bijan.mottahedeh,
	gedwards, joe, lenaic, liang.z.li, mhocko, mhocko, stefanha,
	wei.w.wang, jasowang
In-Reply-To: <CAHk-=wjAvx_rWKct5RtqRrVEgNxrFF=1qQwkz+AegDvd4yXW=g@mail.gmail.com>

On Fri, Nov 02, 2018 at 11:02:35AM -0700, Linus Torvalds wrote:
> On Fri, Nov 2, 2018 at 10:21 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > it seems that it depends on current not on the active mm.
> 
> Yes, see my other suggestion to just fix "use_mm()" to update the address range.
> 
> Would you mind testing that?

Sure, I'll test it.

> Because that would seem to be the *much* less error-prone model..
> 
>            Linus

I agree, it's always been bothering me.

-- 
MST

^ permalink raw reply

* Re: [PULL] vhost: cleanups and fixes
From: Michael S. Tsirkin @ 2018-11-02 17:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: mark.rutland, Kees Cook, kvm, virtualization, netdev,
	Linux Kernel Mailing List, Andrew Morton, bijan.mottahedeh,
	gedwards, joe, lenaic, liang.z.li, mhocko, mhocko, stefanha,
	wei.w.wang, jasowang
In-Reply-To: <CAHk-=wibzqo7C+mS+BgZxRbgdWe2w5F39EhuFUhZUxvotoGLuA@mail.gmail.com>

On Fri, Nov 02, 2018 at 10:10:45AM -0700, Linus Torvalds wrote:
> On Fri, Nov 2, 2018 at 9:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > Just for completeness I'd like to point out for vhost the copies are
> > done from the kernel thread.  So yes we can switch to copy_to/from_user
> > but for e.g. 32-bit userspace running on top of a 64 bit kernel it is
> > IIUC not sufficient - we must *also* do access_ok checks on control path
> > when addresses are passed to the kernel and when current points to the
> > correct task struct.
> 
> Don't you take over the VM with "use_mm()" when you do the copies?

Yes we do.

> So
> yes, it's a kernel thread, but it has a user VM, and though that
> should have the user limits.
> 
> No?
> 
>           Linus

Here's what I meant: we have

#define access_ok(type, addr, size)                                     \
({                                                                      \
        WARN_ON_IN_IRQ();                                               \
        likely(!__range_not_ok(addr, size, user_addr_max()));           \
})

and

#define user_addr_max() (current->thread.addr_limit.seg)

it seems that it depends on current not on the active mm.

get_user and friends are similar:

ENTRY(__get_user_1)
        mov PER_CPU_VAR(current_task), %_ASM_DX
        cmp TASK_addr_limit(%_ASM_DX),%_ASM_AX
        jae bad_get_user
        sbb %_ASM_DX, %_ASM_DX          /* array_index_mask_nospec() */
        and %_ASM_DX, %_ASM_AX
        ASM_STAC
1:      movzbl (%_ASM_AX),%edx
        xor %eax,%eax
        ASM_CLAC
        ret
ENDPROC(__get_user_1)
EXPORT_SYMBOL(__get_user_1)

-- 
MST

^ permalink raw reply

* Re: [PULL] vhost: cleanups and fixes
From: Linus Torvalds @ 2018-11-02 17:15 UTC (permalink / raw)
  To: mst, Al Viro
  Cc: mark.rutland, Kees Cook, kvm, virtualization, netdev,
	Linux Kernel Mailing List, Andrew Morton, bijan.mottahedeh,
	gedwards, joe, lenaic, liang.z.li, mhocko, mhocko, stefanha,
	wei.w.wang, jasowang
In-Reply-To: <CAHk-=wibzqo7C+mS+BgZxRbgdWe2w5F39EhuFUhZUxvotoGLuA@mail.gmail.com>

On Fri, Nov 2, 2018 at 10:10 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Don't you take over the VM with "use_mm()" when you do the copies? So
> yes, it's a kernel thread, but it has a user VM, and though that
> should have the user limits.

Oooh. *Just* as I sent this, I realized that "use_mm()" doesn't update
the thread addr_limit.

That actually looks like a bug to me - although one that you've
apparently been aware of and worked around.

Wouldn't it be nicer to just make "use_mm()" do

        set_fs(USER_DS);

instead? And undo it on unuse_mm()?

And, in fact, maybe we should default kernel threads to have a zero
address limit, so that they can't do any user accesses at all without
doing this?

Adding Al to the cc, because I think he's been looking at set_fs() in general.

                  Linus

^ permalink raw reply

* Re: [PULL] vhost: cleanups and fixes
From: Linus Torvalds @ 2018-11-02 17:10 UTC (permalink / raw)
  To: mst
  Cc: mark.rutland, lenaic, mhocko, Kees Cook, kvm, netdev, liang.z.li,
	Linux Kernel Mailing List, virtualization, stefanha, joe,
	Andrew Morton, mhocko, bijan.mottahedeh
In-Reply-To: <20181102122937-mutt-send-email-mst@kernel.org>

On Fri, Nov 2, 2018 at 9:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> Just for completeness I'd like to point out for vhost the copies are
> done from the kernel thread.  So yes we can switch to copy_to/from_user
> but for e.g. 32-bit userspace running on top of a 64 bit kernel it is
> IIUC not sufficient - we must *also* do access_ok checks on control path
> when addresses are passed to the kernel and when current points to the
> correct task struct.

Don't you take over the VM with "use_mm()" when you do the copies? So
yes, it's a kernel thread, but it has a user VM, and though that
should have the user limits.

No?

          Linus

^ permalink raw reply

* Re: [PATCH] net/mlx5e: fix high stack usage
From: Jason Gunthorpe @ 2018-11-02 17:00 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Saeed Mahameed, Leon Romanovsky, David S. Miller, Tariq Toukan,
	Eran Ben Elisha, Boris Pismenny, Ilya Lesokhin, Moshe Shemesh,
	Kamal Heib, netdev, linux-rdma, linux-kernel
In-Reply-To: <CAK8P3a2pvkfR1A6dHAMm8ETd13E1LSxXrEsQHSOFYWbr-7R2Sw@mail.gmail.com>

On Fri, Nov 02, 2018 at 05:23:26PM +0100, Arnd Bergmann wrote:
> On 11/2/18, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Fri, Nov 02, 2018 at 04:33:03PM +0100, Arnd Bergmann wrote:
> >> A patch that looks harmless causes the stack usage of the
> >> mlx5e_grp_sw_update_stats()
> >> function to drastically increase with x86 gcc-4.9 and higher (tested up to
> >> 8.1):
> >>
> >> drivers/net/ethernet/mellanox/mlx5/core/en_stats.c: In function
> >> ‘mlx5e_grp_sw_update_stats’:
> >> drivers/net/ethernet/mellanox/mlx5/core/en_stats.c:216:1: warning: the
> >> frame size of 1276 bytes is larger than 500 bytes [-Wframe-larger-than=]
> >
> > Why is the stack size so big here? The mlx5e_sw_stats is < 500 bytes
> > and all the other on-stack stuff looks pretty small?
> 
> I am not entirely sure, but my analysis indicates that gcc tries loop unrolling
> or some other optimization that leads to two copies on the stack.

Wow how strange

Did you consier adding a
  __attribute__((optimize("no-unroll-loops"))) 

type macro?

Jason

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox