Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [net 0/7][pull request] Intel Wired LAN Driver Fixes 2018-10-24
From: David Miller @ 2018-10-24 23:28 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann
In-Reply-To: <20181024214731.26036-1-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 24 Oct 2018 14:47:24 -0700

> This series contains fixes for the ice driver.
> 
> Anirudh fixes a namespace issue which was introduced with a previous
> patch to remove ice_netpoll.  Fixed up the device ID define names to
> align with the branding string names.  Use the capability count returned
> by the firmware, instead of calculating the count.  Introduced driver
> workarounds due to current firmware limitations.  Fixed the queue
> mapping for a VF, which needs to be set in the config and scatter queue
> modes.  Fixed the driver which is setup to handle link status events
> (LSE), even though the firmware does not have this feature yet, so add
> the ability to poll for link status changes while we wait for updated
> firmware.
> 
> The following are changes since commit 44adbac8f7217040be97928cd19998259d9d4418:
>   Merge branch 'work.tty-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
> and are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue 100GbE

Pulled, thanks Jeff.

^ permalink raw reply

* Re: [PATCH bpf 7/7] bpf: make direct packet write unclone more robust
From: Song Liu @ 2018-10-24 23:36 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Alexei Starovoitov, Networking
In-Reply-To: <a851d791-8477-75aa-f0ca-0400c841f987@iogearbox.net>

On Wed, Oct 24, 2018 at 3:08 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 10/24/2018 11:42 PM, Song Liu wrote:
> > On Wed, Oct 24, 2018 at 1:06 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >>
> >> Given this seems to be quite fragile and can easily slip through the
> >> cracks, lets make direct packet write more robust by requiring that
> >> future program types which allow for such write must provide a prologue
> >> callback. In case of XDP and sk_msg it's noop, thus add a generic noop
> >> handler there. The latter starts out with NULL data/data_end unconditionally
> >> when sg pages are shared.
> >>
> >> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> >> Acked-by: Alexei Starovoitov <ast@kernel.org>
> >> ---
> >>  kernel/bpf/verifier.c |  6 +++++-
> >>  net/core/filter.c     | 11 +++++++++++
> >>  2 files changed, 16 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >> index 5fc9a65..171a2c8 100644
> >> --- a/kernel/bpf/verifier.c
> >> +++ b/kernel/bpf/verifier.c
> >> @@ -5709,7 +5709,11 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
> >>         bool is_narrower_load;
> >>         u32 target_size;
> >>
> >> -       if (ops->gen_prologue) {
> >> +       if (ops->gen_prologue || env->seen_direct_write) {
> >> +               if (!ops->gen_prologue) {
> >> +                       verbose(env, "bpf verifier is misconfigured\n");
> >> +                       return -EINVAL;
> >> +               }
> >
> > nit: how about this?
> >
> > diff --git i/kernel/bpf/verifier.c w/kernel/bpf/verifier.c
> > index 6fbe7a8afed7..d35078024e35 100644
> > --- i/kernel/bpf/verifier.c
> > +++ w/kernel/bpf/verifier.c
> > @@ -5286,6 +5286,11 @@ static int convert_ctx_accesses(struct
> > bpf_verifier_env *env)
> >         bool is_narrower_load;
> >         u32 target_size;
> >
> > +       if (!ops->gen_prologue && env->seen_direct_write) {
> > +               verbose(env, "bpf verifier is misconfigured\n");
> > +               return -EINVAL;
> > +       }
> > +
> >         if (ops->gen_prologue) {
> >                 cnt = ops->gen_prologue(insn_buf, env->seen_direct_write,
> >                                         env->prog);
> >
>
> Hm, probably matter of different style preference, personally I'd prefer
> the one as is though.

Yeah, it is just a nitpick.

Thanks!

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply

* Re: [PATCH 0/3] PM: Renesas: Remove dummy runtime PM callbacks
From: Wolfram Sang @ 2018-10-25  0:20 UTC (permalink / raw)
  To: Jarkko Nikula, Magnus Damm
  Cc: linux-pm, linux-i2c, Wolfram Sang, netdev, David S . Miller,
	Sergei Shtylyov, linux-renesas-soc, linux-usb, Yoshihiro Shimoda
In-Reply-To: <20181024142200.GA5637@kunai>

[-- Attachment #1: Type: text/plain, Size: 313 bytes --]


> At least the I2C driver part of this was on my todo list as well (just a
> bit lower :/). I wanted to find out why they have been there in the
> first place. Do you know if such callbacks were needed "back in the
> days"?

I see now that you referenced the relevant commits in the patch
descriptions. Thanks!


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH 1/2] Bluetooth: Add quirk for reading BD_ADDR from fwnode property
From: Matthias Kaehlcke @ 2018-10-25  0:21 UTC (permalink / raw)
  To: Marcel Holtmann, Johan Hedberg, David S . Miller, Loic Poulain
  Cc: linux-bluetooth, netdev, linux-kernel, Balakrishna Godavarthi,
	Brian Norris, Dmitry Grinberg, Matthias Kaehlcke
In-Reply-To: <20181025002134.256791-1-mka@chromium.org>

Add HCI_QUIRK_USE_BDADDR_PROPERTY to allow controllers to retrieve
the public Bluetooth address from the firmware node property
'local-bd-address'. If quirk is set and the property does not exist
or is invalid the controller is marked as unconfigured.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
---
hci_dev_get_bd_addr_from_property() currently assumes that the
firmware node with 'local-bd-address' is from hdev->dev.parent, not
sure if this universally true. However if it is true for existing
device that might use this interface we can assume this for now
(unless there is a clear solution now), and cross the bridge of
finding an alternative when we actually encounter the situation.
One option could be to look for the first parent that has a fwnode.
---
 include/net/bluetooth/hci.h | 12 +++++++++++
 net/bluetooth/hci_core.c    | 42 +++++++++++++++++++++++++++++++++++++
 net/bluetooth/mgmt.c        |  6 ++++--
 3 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
index cdd9f1fe7cfa..a5d748099752 100644
--- a/include/net/bluetooth/hci.h
+++ b/include/net/bluetooth/hci.h
@@ -158,6 +158,18 @@ enum {
 	 */
 	HCI_QUIRK_INVALID_BDADDR,
 
+	/* When this quirk is set, the public Bluetooth address
+	 * initially reported by HCI Read BD Address command
+	 * is considered invalid. The public BD Address can be
+	 * specified in the fwnode property 'local-bd-address'.
+	 * If this property does not exist or is invalid controller
+	 * configuration is required before this device can be used.
+	 *
+	 * This quirk can be set before hci_register_dev is called or
+	 * during the hdev->setup vendor callback.
+	 */
+	HCI_QUIRK_USE_BDADDR_PROPERTY,
+
 	/* When this quirk is set, the duplicate filtering during
 	 * scanning is based on Bluetooth devices addresses. To allow
 	 * RSSI based updates, restart scanning if needed.
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 74b29c7d841c..97214262c4fb 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -30,6 +30,7 @@
 #include <linux/rfkill.h>
 #include <linux/debugfs.h>
 #include <linux/crypto.h>
+#include <linux/property.h>
 #include <asm/unaligned.h>
 
 #include <net/bluetooth/bluetooth.h>
@@ -1355,9 +1356,40 @@ int hci_inquiry(void __user *arg)
 	return err;
 }
 
+/**
+ * hci_dev_get_bd_addr_from_property - Get the Bluetooth Device Address
+ *				       (BD_ADDR) for a HCI device from
+ *				       a firmware node property.
+ * @hdev:	The HCI device
+ *
+ * Search the firmware node for 'local-bd-address'.
+ *
+ * All-zero BD addresses are rejected, because those could be properties
+ * that exist in the firmware tables, but were not updated by the firmware. For
+ * example, the DTS could define 'local-bd-address', with zero BD addresses.
+ */
+static int hci_dev_get_bd_addr_from_property(struct hci_dev *hdev)
+{
+	struct fwnode_handle *fwnode = dev_fwnode(hdev->dev.parent);
+	bdaddr_t ba;
+	int ret;
+
+	ret = fwnode_property_read_u8_array(fwnode, "local-bd-address",
+					    (u8 *)&ba, sizeof(ba));
+	if (ret < 0)
+		return ret;
+	if (!bacmp(&ba, BDADDR_ANY))
+		return -ENODATA;
+
+	hdev->public_addr = ba;
+
+	return 0;
+}
+
 static int hci_dev_do_open(struct hci_dev *hdev)
 {
 	int ret = 0;
+	bool bd_addr_set = false;
 
 	BT_DBG("%s %p", hdev->name, hdev);
 
@@ -1422,6 +1454,16 @@ static int hci_dev_do_open(struct hci_dev *hdev)
 		if (hdev->setup)
 			ret = hdev->setup(hdev);
 
+		if (test_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks)) {
+			if (!hci_dev_get_bd_addr_from_property(hdev))
+				if (hdev->set_bdaddr &&
+				    !hdev->set_bdaddr(hdev, &hdev->public_addr))
+					bd_addr_set = true;
+
+			if (!bd_addr_set)
+				hci_dev_set_flag(hdev, HCI_UNCONFIGURED);
+		}
+
 		/* The transport driver can set these quirks before
 		 * creating the HCI device or in its setup callback.
 		 *
diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
index 3bdc8f3ca259..3d9edb752403 100644
--- a/net/bluetooth/mgmt.c
+++ b/net/bluetooth/mgmt.c
@@ -551,7 +551,8 @@ static bool is_configured(struct hci_dev *hdev)
 	    !hci_dev_test_flag(hdev, HCI_EXT_CONFIGURED))
 		return false;
 
-	if (test_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks) &&
+	if ((test_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks) ||
+	     test_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks)) &&
 	    !bacmp(&hdev->public_addr, BDADDR_ANY))
 		return false;
 
@@ -566,7 +567,8 @@ static __le32 get_missing_options(struct hci_dev *hdev)
 	    !hci_dev_test_flag(hdev, HCI_EXT_CONFIGURED))
 		options |= MGMT_OPTION_EXTERNAL_CONFIG;
 
-	if (test_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks) &&
+	if ((test_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks) ||
+	     test_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks)) &&
 	    !bacmp(&hdev->public_addr, BDADDR_ANY))
 		options |= MGMT_OPTION_PUBLIC_ADDRESS;
 
-- 
2.19.1.568.g152ad8e336-goog

^ permalink raw reply related

* Re: [PATCH 1/4] mfd: altera-sysmgr: Add SOCFPGA System Manager
From: Lee Jones @ 2018-10-25  9:02 UTC (permalink / raw)
  To: thor.thayer
  Cc: peppe.cavallaro, dinguyen, linux, alexandre.torgue, joabreu,
	davem, mchehab+samsung, catalin.marinas, akpm, arnd, aisheng.dong,
	linux-kernel, netdev, linux-arm-kernel
In-Reply-To: <1537826946-18942-2-git-send-email-thor.thayer@linux.intel.com>

Arnd,

Would you be so kind as to look at this please?

On Mon, 24 Sep 2018, thor.thayer@linux.intel.com wrote:
> From: Thor Thayer <thor.thayer@linux.intel.com>
> 
> The SOCFPGA System Manager register block aggregates different
> peripheral functions into one area.
> On 32 bit ARM parts, the syscon driver is used for accesses.
> On 64 bit ARM parts, the System Manager can only be accessed by
> EL3 secure mode. Since a SMC call to EL3 is required, this new
> driver uses regmaps similar to syscon to handle the SMC call.
> 
> Since regmaps abstract out the underlying register access, the
> changes to drivers accessing the System Manager are minimal.
> 
> Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
> ---
>  MAINTAINERS                       |   6 +
>  drivers/mfd/Kconfig               |   9 ++
>  drivers/mfd/Makefile              |   1 +
>  drivers/mfd/altera-sysmgr.c       | 310 ++++++++++++++++++++++++++++++++++++++
>  include/linux/mfd/altera-sysmgr.h | 113 ++++++++++++++
>  5 files changed, 439 insertions(+)
>  create mode 100644 drivers/mfd/altera-sysmgr.c
>  create mode 100644 include/linux/mfd/altera-sysmgr.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index d9e6d86488df..bda6b2173cc6 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -692,6 +692,12 @@ L:	linux-gpio@vger.kernel.org
>  S:	Maintained
>  F:	drivers/gpio/gpio-altera.c
>  
> +ALTERA SYSTEM MANAGER DRIVER
> +M:	Thor Thayer <thor.thayer@linux.intel.com>
> +S:	Maintained
> +F:	drivers/mfd/altera-sysmgr.c
> +F:	include/linux/mfd/altera-sysgmr.h
> +
>  ALTERA SYSTEM RESOURCE DRIVER FOR ARRIA10 DEVKIT
>  M:	Thor Thayer <thor.thayer@linux.intel.com>
>  S:	Maintained
> diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
> index f3a5f8d02790..1192a25186c7 100644
> --- a/drivers/mfd/Kconfig
> +++ b/drivers/mfd/Kconfig
> @@ -29,6 +29,15 @@ config MFD_ALTERA_A10SR
>  	  accessing the external gpio extender (LEDs & buttons) and
>  	  power supply alarms (hwmon).
>  
> +config MFD_ALTERA_SYSMGR
> +	bool "Altera SOCFPGA System Manager"
> +	depends on (ARCH_SOCFPGA || ARCH_STRATIX10) && OF
> +	select MFD_SYSCON
> +	help
> +	  Select this to get System Manager support for all Altera branded
> +	  SOCFPGAs. The SOCFPGA System Manager handles all SOCFPGAs by
> +	  using syscon for ARM32 parts and SMC calls to EL3 for ARM64 parts.
> +
>  config MFD_ACT8945A
>  	tristate "Active-semi ACT8945A"
>  	select MFD_CORE
> diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile
> index 5856a9489cbd..bc1508e3ff7b 100644
> --- a/drivers/mfd/Makefile
> +++ b/drivers/mfd/Makefile
> @@ -232,6 +232,7 @@ obj-$(CONFIG_INTEL_SOC_PMIC_CHTDC_TI)	+= intel_soc_pmic_chtdc_ti.o
>  obj-$(CONFIG_MFD_MT6397)	+= mt6397-core.o
>  
>  obj-$(CONFIG_MFD_ALTERA_A10SR)	+= altera-a10sr.o
> +obj-$(CONFIG_MFD_ALTERA_SYSMGR) += altera-sysmgr.o
>  obj-$(CONFIG_MFD_SUN4I_GPADC)	+= sun4i-gpadc.o
>  
>  obj-$(CONFIG_MFD_STM32_LPTIMER)	+= stm32-lptimer.o
> diff --git a/drivers/mfd/altera-sysmgr.c b/drivers/mfd/altera-sysmgr.c
> new file mode 100644
> index 000000000000..29b0bfcb3f69
> --- /dev/null
> +++ b/drivers/mfd/altera-sysmgr.c
> @@ -0,0 +1,310 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + *  Copyright (C) 2017-2018, Intel Corporation.
> + *  Copyright (C) 2012 Freescale Semiconductor, Inc.
> + *  Copyright (C) 2012 Linaro Ltd.
> + *
> + *  Based on syscon driver.
> + */
> +
> +#include <linux/arm-smccc.h>
> +#include <linux/err.h>
> +#include <linux/io.h>
> +#include <linux/mfd/altera-sysmgr.h>
> +#include <linux/mfd/syscon.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/of_address.h>
> +#include <linux/of_platform.h>
> +#include <linux/regmap.h>
> +#include <linux/slab.h>
> +
> +static struct platform_driver sysmgr_driver;
> +
> +/**
> + * struct altr_sysmgr - Altera SOCFPGA System Manager
> + * @regmap: the regmap used for System Manager accesses.
> + * @base  : the base address for the System Manager
> + */
> +struct altr_sysmgr {
> +	struct regmap *regmap;
> +	void __iomem  *base;
> +};
> +
> +/**
> + * Only 1 instance of System Manager is needed but many
> + * consumers will want to access it with the matching
> + * functions below.
> + */
> +static struct altr_sysmgr *p_sysmgr;
> +
> +/**
> + * s10_protected_reg_write
> + * Write to a protected SMC register.
> + * @base: Base address of System Manager
> + * @reg:  Address offset of register
> + * @val:  Value to write
> + * Return: INTEL_SIP_SMC_STATUS_OK (0) on success
> + *	   INTEL_SIP_SMC_REG_ERROR on error
> + *	   INTEL_SIP_SMC_RETURN_UNKNOWN_FUNCTION if not supported
> + */
> +static int s10_protected_reg_write(void __iomem *base,
> +				   unsigned int reg, unsigned int val)
> +{
> +	struct arm_smccc_res result;
> +	unsigned long sysmgr_base = (unsigned long)base;
> +
> +	arm_smccc_smc(INTEL_SIP_SMC_REG_WRITE, sysmgr_base + reg,
> +		      val, 0, 0, 0, 0, 0, &result);
> +
> +	return (int)result.a0;
> +}
> +
> +/**
> + * s10_protected_reg_read
> + * Read the status of a protected SMC register
> + * @base: Base address of System Manager.
> + * @reg:  Address of register
> + * @val:  Value read.
> + * Return: INTEL_SIP_SMC_STATUS_OK (0) on success
> + *	   INTEL_SIP_SMC_REG_ERROR on error
> + *	   INTEL_SIP_SMC_RETURN_UNKNOWN_FUNCTION if not supported
> + */
> +static int s10_protected_reg_read(void __iomem *base,
> +				  unsigned int reg, unsigned int *val)
> +{
> +	struct arm_smccc_res result;
> +	unsigned long sysmgr_base = (unsigned long)base;
> +
> +	arm_smccc_smc(INTEL_SIP_SMC_REG_READ, sysmgr_base + reg,
> +		      0, 0, 0, 0, 0, 0, &result);
> +
> +	*val = (unsigned int)result.a1;
> +
> +	return (int)result.a0;
> +}
> +
> +static const struct regmap_config s10_sysmgr_regmap_cfg = {
> +	.name = "s10_sysmgr",
> +	.reg_bits = 32,
> +	.reg_stride = 4,
> +	.val_bits = 32,
> +	.reg_read = s10_protected_reg_read,
> +	.reg_write = s10_protected_reg_write,
> +	.fast_io = true,
> +	.use_single_rw = true,
> +};
> +
> +/**
> + * socfpga_is_s10
> + * Determine if running on Stratix10 platform.
> + * Return: True if running Stratix10, otherwise false.
> + */
> +static int socfpga_is_s10(void)
> +{
> +	return of_machine_is_compatible("altr,socfpga-stratix10");
> +}
> +
> +/**
> + * of_sysmgr_register
> + * Create and register the Altera System Manager regmap.
> + * Return: Pointer to new sysmgr on success.
> + *         Pointer error on failure.
> + */
> +static struct altr_sysmgr *of_sysmgr_register(struct device_node *np)
> +{
> +	struct altr_sysmgr *sysmgr;
> +	struct regmap *regmap;
> +	u32 reg_io_width;
> +	int ret;
> +	struct regmap_config sysmgr_config = s10_sysmgr_regmap_cfg;
> +	struct resource res;
> +
> +	if (!of_device_is_compatible(np, "altr,sys-mgr"))
> +		return ERR_PTR(-EINVAL);
> +
> +	sysmgr = kzalloc(sizeof(*sysmgr), GFP_KERNEL);
> +	if (!sysmgr)
> +		return ERR_PTR(-ENOMEM);
> +
> +	if (of_address_to_resource(np, 0, &res)) {
> +		ret = -ENOMEM;
> +		goto err_map;
> +	}
> +
> +	/* Need physical address for SMCC call */
> +	sysmgr->base = (void __iomem *)res.start;
> +
> +	/*
> +	 * search for reg-io-width property in DT. If it is not provided,
> +	 * default to 4 bytes. regmap_init will return an error if values
> +	 * are invalid so there is no need to check them here.
> +	 */
> +	ret = of_property_read_u32(np, "reg-io-width", &reg_io_width);
> +	if (ret)
> +		reg_io_width = 4;
> +
> +	sysmgr_config.reg_stride = reg_io_width;
> +	sysmgr_config.val_bits = reg_io_width * 8;
> +	sysmgr_config.max_register = resource_size(&res) - reg_io_width;
> +
> +	regmap = regmap_init(NULL, NULL, sysmgr->base, &sysmgr_config);
> +	if (IS_ERR(regmap)) {
> +		pr_err("regmap init failed\n");
> +		ret = PTR_ERR(regmap);
> +		goto err_map;
> +	}
> +
> +	sysmgr->regmap = regmap;
> +
> +	p_sysmgr = sysmgr;
> +
> +	return sysmgr;
> +
> +err_map:
> +	kfree(sysmgr);
> +	return ERR_PTR(ret);
> +}
> +
> +struct regmap *altr_sysmgr_node_to_regmap(struct device_node *np)
> +{
> +	struct altr_sysmgr *sysmgr = NULL;
> +
> +	if (!socfpga_is_s10())
> +		return syscon_node_to_regmap(np);
> +
> +	if (!p_sysmgr)
> +		sysmgr = of_sysmgr_register(np);
> +	else
> +		sysmgr = p_sysmgr;
> +
> +	if (IS_ERR_OR_NULL(sysmgr))
> +		return ERR_CAST(sysmgr);
> +
> +	return sysmgr->regmap;
> +}
> +EXPORT_SYMBOL_GPL(altr_sysmgr_node_to_regmap);
> +
> +struct regmap *altr_sysmgr_regmap_lookup_by_compatible(const char *s)
> +{
> +	struct device_node *sysmgr_np;
> +	struct regmap *regmap;
> +
> +	if (!socfpga_is_s10())
> +		return syscon_regmap_lookup_by_compatible(s);
> +
> +	sysmgr_np = of_find_compatible_node(NULL, NULL, s);
> +	if (!sysmgr_np)
> +		return ERR_PTR(-ENODEV);
> +
> +	regmap = altr_sysmgr_node_to_regmap(sysmgr_np);
> +	of_node_put(sysmgr_np);
> +
> +	return regmap;
> +}
> +EXPORT_SYMBOL_GPL(altr_sysmgr_regmap_lookup_by_compatible);
> +
> +static int sysmgr_match_pdevname(struct device *dev, void *data)
> +{
> +	return !strcmp(dev_name(dev), (const char *)data);
> +}
> +
> +struct regmap *altr_sysmgr_regmap_lookup_by_pdevname(const char *s)
> +{
> +	struct device *dev;
> +	struct altr_sysmgr *sysmgr;
> +
> +	if (!socfpga_is_s10())
> +		return syscon_regmap_lookup_by_pdevname(s);
> +
> +	dev = driver_find_device(&sysmgr_driver.driver, NULL, (void *)s,
> +				 sysmgr_match_pdevname);
> +	if (!dev)
> +		return ERR_PTR(-EPROBE_DEFER);
> +
> +	sysmgr = dev_get_drvdata(dev);
> +
> +	return sysmgr->regmap;
> +}
> +EXPORT_SYMBOL_GPL(altr_sysmgr_regmap_lookup_by_pdevname);
> +
> +struct regmap *altr_sysmgr_regmap_lookup_by_phandle(struct device_node *np,
> +						    const char *property)
> +{
> +	struct device_node *sysmgr_np;
> +	struct regmap *regmap;
> +
> +	if (!socfpga_is_s10())
> +		return syscon_regmap_lookup_by_phandle(np, property);
> +
> +	if (property)
> +		sysmgr_np = of_parse_phandle(np, property, 0);
> +	else
> +		sysmgr_np = np;
> +
> +	if (!sysmgr_np)
> +		return ERR_PTR(-ENODEV);
> +
> +	regmap = altr_sysmgr_node_to_regmap(sysmgr_np);
> +	of_node_put(sysmgr_np);
> +
> +	return regmap;
> +}
> +EXPORT_SYMBOL_GPL(altr_sysmgr_regmap_lookup_by_phandle);
> +
> +static int sysmgr_probe(struct platform_device *pdev)
> +{
> +	struct device *dev = &pdev->dev;
> +	struct altr_sysmgr *sysmgr;
> +	struct resource *res;
> +
> +	if (!socfpga_is_s10())
> +		return -ENODEV;
> +
> +	/* Skip Initialization if already created */
> +	if (p_sysmgr)
> +		goto finish;
> +
> +	sysmgr = of_sysmgr_register(pdev->dev.of_node);
> +	if (IS_ERR_OR_NULL(sysmgr)) {
> +		dev_err(dev, "regmap init failed\n");
> +		return -ENODEV;
> +	}
> +
> +finish:
> +	platform_set_drvdata(pdev, p_sysmgr);
> +
> +	dev_dbg(dev, "regmap %pR registered\n", res);
> +
> +	return 0;
> +}
> +
> +static const struct of_device_id altr_sysmgr_of_match[] = {
> +	{ .compatible = "altr,sys-mgr" },
> +	{},
> +};
> +MODULE_DEVICE_TABLE(of, altr_sysmgr_of_match);
> +
> +static struct platform_driver altr_sysmgr_driver = {
> +	.probe =  sysmgr_probe,
> +	.driver = {
> +		.name = "altr,system_manager",
> +		.of_match_table = altr_sysmgr_of_match,
> +	},
> +};
> +
> +static int __init altr_sysmgr_init(void)
> +{
> +	return platform_driver_register(&altr_sysmgr_driver);
> +}
> +core_initcall(altr_sysmgr_init);
> +
> +static void __exit altr_sysmgr_exit(void)
> +{
> +	platform_driver_unregister(&altr_sysmgr_driver);
> +}
> +module_exit(altr_sysmgr_exit);
> +
> +MODULE_AUTHOR("Thor Thayer <>");
> +MODULE_DESCRIPTION("SOCFPGA System Manager driver");
> +MODULE_LICENSE("GPL v2");
> diff --git a/include/linux/mfd/altera-sysmgr.h b/include/linux/mfd/altera-sysmgr.h
> new file mode 100644
> index 000000000000..b82116706319
> --- /dev/null
> +++ b/include/linux/mfd/altera-sysmgr.h
> @@ -0,0 +1,113 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2018 Intel Corporation
> + * Copyright (C) 2012 Freescale Semiconductor, Inc.
> + * Copyright (C) 2012 Linaro Ltd.
> + */
> +
> +#ifndef __LINUX_MFD_ALTERA_SYSMGR_H__
> +#define __LINUX_MFD_ALTERA_SYSMGR_H__
> +
> +#include <linux/err.h>
> +#include <linux/errno.h>
> +
> +struct device_node;
> +
> +#ifdef CONFIG_MFD_ALTERA_SYSMGR
> +struct regmap *altr_sysmgr_node_to_regmap(struct device_node *np);
> +struct regmap *altr_sysmgr_regmap_lookup_by_compatible(const char *s);
> +struct regmap *altr_sysmgr_regmap_lookup_by_pdevname(const char *s);
> +struct regmap *altr_sysmgr_regmap_lookup_by_phandle(struct device_node *np,
> +						    const char *property);
> +
> +/*
> + * Functions specified by ARM SMC Calling convention:
> + *
> + * FAST call executes atomic operations, returns when the requested operation
> + * has completed.
> + * STD call starts a operation which can be preempted by a non-secure
> + * interrupt.
> + *
> + * a0..a7 is used as register names in the descriptions below, on arm32
> + * that translates to r0..r7 and on arm64 to w0..w7.
> + */
> +
> +#define INTEL_SIP_SMC_STD_CALL_VAL(func_num) \
> +	ARM_SMCCC_CALL_VAL(ARM_SMCCC_STD_CALL, ARM_SMCCC_SMC_64, \
> +	ARM_SMCCC_OWNER_SIP, (func_num))
> +
> +#define INTEL_SIP_SMC_FAST_CALL_VAL(func_num) \
> +	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, ARM_SMCCC_SMC_64, \
> +	ARM_SMCCC_OWNER_SIP, (func_num))
> +
> +#define INTEL_SIP_SMC_RETURN_UNKNOWN_FUNCTION		0xFFFFFFFF
> +#define INTEL_SIP_SMC_STATUS_OK				0x0
> +#define INTEL_SIP_SMC_REG_ERROR				0x5
> +
> +/*
> + * Request INTEL_SIP_SMC_REG_READ
> + *
> + * Read a protected register using SMCCC
> + *
> + * Call register usage:
> + * a0: INTEL_SIP_SMC_REG_READ.
> + * a1: register address.
> + * a2-7: not used.
> + *
> + * Return status:
> + * a0: INTEL_SIP_SMC_STATUS_OK, INTEL_SIP_SMC_REG_ERROR, or
> + *     INTEL_SIP_SMC_RETURN_UNKNOWN_FUNCTION
> + * a1: Value in the register
> + * a2-3: not used.
> + */
> +#define INTEL_SIP_SMC_FUNCID_REG_READ 7
> +#define INTEL_SIP_SMC_REG_READ \
> +	INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_REG_READ)
> +
> +/*
> + * Request INTEL_SIP_SMC_REG_WRITE
> + *
> + * Write a protected register using SMCCC
> + *
> + * Call register usage:
> + * a0: INTEL_SIP_SMC_REG_WRITE.
> + * a1: register address
> + * a2: value to program into register.
> + * a3-7: not used.
> + *
> + * Return status:
> + * a0: INTEL_SIP_SMC_STATUS_OK, INTEL_SIP_SMC_REG_ERROR, or
> + *     INTEL_SIP_SMC_RETURN_UNKNOWN_FUNCTION
> + * a1-3: not used.
> + */
> +#define INTEL_SIP_SMC_FUNCID_REG_WRITE 8
> +#define INTEL_SIP_SMC_REG_WRITE \
> +	INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_REG_WRITE)
> +
> +#else
> +static inline struct regmap *altr_sysmgr_node_to_regmap(struct device_node *np)
> +{
> +	return ERR_PTR(-ENOTSUPP);
> +}
> +
> +static inline struct regmap *
> +altr_sysmgr_regmap_lookup_by_compatible(const char *s)
> +{
> +	return ERR_PTR(-ENOTSUPP);
> +}
> +
> +static inline struct regmap *
> +altr_sysmgr_regmap_lookup_by_pdevname(const char *s)
> +{
> +	return ERR_PTR(-ENOTSUPP);
> +}
> +
> +static inline struct regmap *
> +altr_sysmgr_regmap_lookup_by_phandle(struct device_node *np,
> +				     const char *property)
> +{
> +	return ERR_PTR(-ENOTSUPP);
> +}
> +#endif
> +
> +#endif /* __LINUX_MFD_ALTERA_SYSMGR_H__ */

-- 
Lee Jones [李琼斯]
Linaro Services Technical Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply

* RE: [PATCH] bonding:avoid repeated display of same link status change
From: Manish Kumar Singh @ 2018-10-25  9:21 UTC (permalink / raw)
  To: Michal Kubecek, Eric Dumazet
  Cc: Mahesh Bandewar (महेश बंडेवार),
	linux-netdev, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	David S. Miller, linux-kernel
In-Reply-To: <20181023163825.GB22291@unicorn.suse.cz>



> -----Original Message-----
> From: Michal Kubecek [mailto:mkubecek@suse.cz]
> Sent: 23 अक्तूबर 2018 22:08
> To: Eric Dumazet
> Cc: Mahesh Bandewar (महेश बंडेवार); Manish Kumar Singh; linux-netdev; Jay
> Vosburgh; Veaceslav Falico; Andy Gospodarek; David S. Miller; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH] bonding:avoid repeated display of same link status
> change
> 
> On Tue, Oct 23, 2018 at 06:26:14PM +0200, Michal Kubecek wrote:
> > On Tue, Oct 23, 2018 at 09:10:44AM -0700, Eric Dumazet wrote:
> > >
> > >
> > > On 10/23/2018 08:54 AM, Mahesh Bandewar (महेश बंडेवार) wrote:
> > >
> > > > Atomic operations are expensive (on certain architectures) and miimon
> > > > runs quite frequently. Is the added cost of these atomic operations
> > > > even worth just to avoid *duplicate info* messages? This seems like a
> > > > overkill!
> > >
> > > atomic_read() is a simple read, no atomic operation involved.
> > >
> > > Same remark for atomic_set()
> >
> > Which makes me wonder if the patch really needs atomic_t.
> 
> IMHO it does not. AFAICS multiple instances of bond_mii_monitor() cannot
> run simultaneously for the same bond so that there doesn't seem to be
> anything to collide with. (And if they could, we would need to test and
> set the flag atomically in bond_miimon_inspect().)
> 
Yes, Michal, we are inline with your understanding.
when the -original- patch was posted to upstream there was no -synchronization- nor -racing- addressing code was in read/write of this added filed, as we -never- saw need for either.

-only- writer of the added field is bond_mii_monitor.
-only- reader of the added field is bond_miimon_inspect.
-this writer & reader -never- can run concurrently.
-writer invokes the reader.

hence, imo uint_8 rtnl_needed is all what is needed; with bond_mii_monitor doing rtnl_needed = 1; and bond_miimon_inspect doing if rtnl_needed.

here is the gravity of the situation with multiple customers whose names including machine names redacted:

 4353 May 31 02:38:57 hostname kernel: ixgbe 0000:03:00.0: removed PHC on p2p1
 4354 May 31 02:38:57 hostname kernel: public: link status down for active interface p2p1, disabling it in 100 ms
 4355 May 31 02:38:57 hostname kernel: public: link status down for active interface p2p1, disabling it in 100 ms
 4356 May 31 02:38:57 hostname kernel: public: link status definitely down for interface p2p1, disabling it
 4357 May 31 02:38:57 hostname kernel: public: making interface p2p2 the new active one
 4358 May 31 02:38:59 hostname kernel: ixgbe 0000:03:00.0: registered PHC device on p2p1
 4359 May 31 02:39:00 hostname kernel: ixgbe 0000:03:00.0 p2p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
 4360 May 31 02:39:00 hostname kernel: public: link status up for interface p2p1, enabling it in 200 ms
 4361 May 31 02:39:00 hostname kernel: public: link status definitely up for interface p2p1, 10000 Mbps full duplex
 4362 May 31 02:45:37 hostname journal: Missed 217723 kernel messages
 4363 May 31 02:45:37 hostname kernel: public: link status down for active interface p2p2, disabling it in 100 ms
 	---------------------
                      11000+ APPROX SAME REPEATED MESSAGES in second
	---------------------
15877 May 31 02:45:37 hostname kernel: public: link status down for active interface p2p2, disabling it in 100 ms
15878 May 31 02:45:37 hostname kernel: public: link status definitely down for interface p2p2, disabling it
15879 May 31 02:45:37 hostname kernel: public: making interface p2p1 the new active one

Thanks,
Manish

> Michal Kubecek

^ permalink raw reply

* Re: [PATCH] bonding:avoid repeated display of same link status change
From: Michal Kubecek @ 2018-10-25  9:29 UTC (permalink / raw)
  To: Manish Kumar Singh
  Cc: Eric Dumazet,
	Mahesh Bandewar (महेश बंडेवार),
	linux-netdev, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	David S. Miller, linux-kernel
In-Reply-To: <bee18dd4-0012-44d0-9508-2987bfe521e5@default>

On Thu, Oct 25, 2018 at 02:21:05AM -0700, Manish Kumar Singh wrote:
> > From: Michal Kubecek [mailto:mkubecek@suse.cz]
> > IMHO it does not. AFAICS multiple instances of bond_mii_monitor() cannot
> > run simultaneously for the same bond so that there doesn't seem to be
> > anything to collide with. (And if they could, we would need to test and
> > set the flag atomically in bond_miimon_inspect().)
> > 
> Yes, Michal, we are inline with your understanding.
> when the -original- patch was posted to upstream there was no
> -synchronization- nor -racing- addressing code was in read/write of this
> added filed, as we -never- saw need for either.
> 
> -only- writer of the added field is bond_mii_monitor.
> -only- reader of the added field is bond_miimon_inspect.
> -this writer & reader -never- can run concurrently.
> -writer invokes the reader.
> 
> hence, imo uint_8 rtnl_needed is all what is needed; with bond_mii_monitor doing rtnl_needed = 1; and bond_miimon_inspect doing if rtnl_needed.
> 
> here is the gravity of the situation with multiple customers whose names including machine names redacted:
> 
>  4353 May 31 02:38:57 hostname kernel: ixgbe 0000:03:00.0: removed PHC on p2p1
>  4354 May 31 02:38:57 hostname kernel: public: link status down for active interface p2p1, disabling it in 100 ms
>  4355 May 31 02:38:57 hostname kernel: public: link status down for active interface p2p1, disabling it in 100 ms
>  4356 May 31 02:38:57 hostname kernel: public: link status definitely down for interface p2p1, disabling it
>  4357 May 31 02:38:57 hostname kernel: public: making interface p2p2 the new active one
>  4358 May 31 02:38:59 hostname kernel: ixgbe 0000:03:00.0: registered PHC device on p2p1
>  4359 May 31 02:39:00 hostname kernel: ixgbe 0000:03:00.0 p2p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
>  4360 May 31 02:39:00 hostname kernel: public: link status up for interface p2p1, enabling it in 200 ms
>  4361 May 31 02:39:00 hostname kernel: public: link status definitely up for interface p2p1, 10000 Mbps full duplex
>  4362 May 31 02:45:37 hostname journal: Missed 217723 kernel messages
>  4363 May 31 02:45:37 hostname kernel: public: link status down for active interface p2p2, disabling it in 100 ms
>  	---------------------
>                       11000+ APPROX SAME REPEATED MESSAGES in second
> 	---------------------
> 15877 May 31 02:45:37 hostname kernel: public: link status down for active interface p2p2, disabling it in 100 ms
> 15878 May 31 02:45:37 hostname kernel: public: link status definitely down for interface p2p2, disabling it
> 15879 May 31 02:45:37 hostname kernel: public: making interface p2p1 the new active one

When I was replying, I didn't know this was a v2 and I haven't seen the
v1 discussion. I have read it since and I think I understand Eric's
point now. The thing is that just adding e.g. u8 is OK as it is now.
However, someone could later add another u8 next to it which would also
be perfectly OK on its own but reads/writes to these two could collide
between each other.

And as pointed out by a colleague, even having atomic_t and u8 flag in
one 64-bit word could be a problem on architectures which cannot do an
atomic read/write from/to a 32-bit word (sparc seems to be one).

Michal Kubecek

^ permalink raw reply

* Re: Regression: kernel 4.14 an later very slow with many ipsec tunnels
From: Wolfgang Walter @ 2018-10-25  9:38 UTC (permalink / raw)
  To: netdev
  Cc: Florian Westphal, Steffen Klassert, David Miller, linux-kernel,
	torvalds, christophe.gouault, Greg KH
In-Reply-To: <1729915.dWWxddREcQ@stwm.de>

Hello,

there is now a new 4.19 which still has the big performance regression when 
many ipsec tunnels are configured (throughput and latency get worse by 10 to 
50 times) which makes any kernel > 4.9 unusable for our routers.

I still don't understand why a revert of the flow cache removal at least for 
the longterm kernels is that a bad option (maybe as a compile time option), 
especially as there is no workaround available.

We use linux in that scenario since more than 10 years so I'm really rather 
unhappy if not to say despaired that we will be stucked with 4.9 for an 
unforeseeable future.

We would have detected and reported that performance regression much earlier 
if not another bug in ipsec had prevented us from running 4.14 and later until 
end of august 2018 (See kernels > v4.12 oops/crash with ipsec-traffic: 
bisected to b838d5e1c5b6e57b10ec8af2268824041e3ea911: ipv4: mark DST_NOGC and 
remove the operation of dst_free()).

Am Donnerstag, 4. Oktober 2018, 15:57:52 schrieb Wolfgang Walter:
> Am Dienstag, 2. Oktober 2018, 23:35:36 schrieb Florian Westphal:

[snip]

> > Well, I'm not going to send a revert of the flowcache removal.
> >
> > 
> > I'm willing to experiment with alternatives to a full iteration of the
> > inexact list but thats it.
> 
> If this brings performance back to pre-removal, I'm fine with that. I'm even
> fine if it is slower by a factor of 2.
> 
> I think it is a serious regression, and there is no workaround, and therefor
> it cannot stay like that.
> 
> So I still hope that reverting is an option if no acceptable solution can be
> found.
> 
> > > correctly done, as it still would have to obey the original order of the
> > > rules (their priority).
> > 
> > Except that neither the priority or the order in which it was added
> > matters in case the selector doesn't match.
> 
> To match a packet one has to find all matching rules and chose that one with
> the lowest priority.
> 
> "indexing" by dst will not help much if you have a ruleset where a lot of
> rules sharing a dst. You also have to replicate rules with dsts that have a
> prefix oft another dst as they may habe a higher priority even if they are
> less specific.
> 
> Every such entry may again have such an "indexing" by dst. Only then this
> would be efficient.
>

[snip]

> 
> I wonder why there is no such thing for netfilter or the rules list in
> routing. nf does not have such a thing, either. This is the reason why I
> think that this is not that easy and for longterm kernel 4.14 the best
> solution will be a revert anyway.
> 
> Regards,

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts

^ permalink raw reply

* [PATCH] octeontx2-af: Use GFP_ATOMIC under spin lock
From: Wei Yongjun @ 2018-10-25  1:42 UTC (permalink / raw)
  To: Sunil Goutham, Linu Cherian, Geetha sowjanya, Jerin Jacob
  Cc: Wei Yongjun, netdev, kernel-janitors

The function nix_update_mce_list() is called from
nix_update_bcast_mce_list(), and a spin lock is held
here, so we should use GFP_ATOMIC instead.

Fixes: 4b05528ebf0c ("octeontx2-af: Update bcast list upon NIXLF alloc/free")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
index 8890c95..a618e48 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
@@ -1294,7 +1294,7 @@ static int nix_update_mce_list(struct nix_mce_list *mce_list,
 		return 0;
 
 	/* Add a new one to the list, at the tail */
-	mce = kzalloc(sizeof(*mce), GFP_KERNEL);
+	mce = kzalloc(sizeof(*mce), GFP_ATOMIC);
 	if (!mce)
 		return -ENOMEM;
 	mce->idx = idx;

^ permalink raw reply related

* BUG: KMSAN: uninit-value in selinux_socket_bind, selinux_socket_connect_helper
From: Kyungtae Kim @ 2018-10-25 10:35 UTC (permalink / raw)
  To: davem; +Cc: lifeasageek, syzkaller, threeearcat, linux-kernel, netdev

We report two crashes related (in v4.19-rc8) :
"BUG: KMSAN: uninit-value in selinux_socket_bind"
"BUG: KMSAN: uninit-value in selinux_socket_connect_helper”

kernel config: https://kt0755.github.io/etc/config-4.19-rc2.kmsan
repro: https://kt0755.github.io/etc/repro.b0e55.c

Since both crashes share the same issue, we just explain one of the two.
When the third argument of bind() (i.e., addrlen) is zero, in __sys_bind(),
data copy from user sockaddr to kernel sockaddr does not occur
(net/socket.c:186).
However, a subsequent function selinux_socket_bind() tries to read
the kernel sockaddr (address->sa_family) that was not initialized at all.

Crash log1
==================================================================
BUG: KMSAN: uninit-value in selinux_socket_bind+0x61b/0x1040
security/selinux/hooks.c:4643
CPU: 0 PID: 19070 Comm: syz-executor6 Not tainted 4.19.0-rc8+ #18
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x305/0x460 lib/dump_stack.c:113
 kmsan_report+0x1a2/0x2e0 mm/kmsan/kmsan.c:917
 __msan_warning+0x7d/0xe0 mm/kmsan/kmsan_instr.c:500
 selinux_socket_bind+0x61b/0x1040 security/selinux/hooks.c:4643
 security_socket_bind+0x127/0x200 security/security.c:1390
 __sys_bind+0x577/0x7e0 net/socket.c:1479
 __do_sys_bind net/socket.c:1494 [inline]
 __se_sys_bind+0x8d/0xb0 net/socket.c:1492
 __x64_sys_bind+0x4a/0x70 net/socket.c:1492
 do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
 entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x4497b9
Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fdddf1e9c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
RAX: ffffffffffffffda RBX: 00007fdddf1ea6cc RCX: 00000000004497b9
RDX: 0000000000000000 RSI: 0000000020000040 RDI: 0000000000000013
RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000000004c8 R14: 00000000006e8568 R15: 00007fdddf1ea700

Local variable description: ----address@__sys_bind
Variable was created at:
 __sys_bind+0x89/0x7e0 net/socket.c:1470
 __do_sys_bind net/socket.c:1494 [inline]
 __se_sys_bind+0x8d/0xb0 net/socket.c:1492
==================================================================

Crash log2
==================================================================
BUG: KMSAN: uninit-value in selinux_socket_connect_helper+0x55c/0x960
security/selinux/hooks.c:4775
CPU: 0 PID: 8234 Comm: syz-executor2 Not tainted 4.19.0-rc8+ #18
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x305/0x460 lib/dump_stack.c:113
 kmsan_report+0x1a2/0x2e0 mm/kmsan/kmsan.c:917
 __msan_warning+0x7d/0xe0 mm/kmsan/kmsan_instr.c:500
 selinux_socket_connect_helper+0x55c/0x960 security/selinux/hooks.c:4775
 selinux_socket_connect+0xbe/0x180 security/selinux/hooks.c:4834
 security_socket_connect+0x127/0x200 security/security.c:1395
 __sys_connect+0x577/0x850 net/socket.c:1660
 __do_sys_connect net/socket.c:1675 [inline]
 __se_sys_connect+0x8d/0xb0 net/socket.c:1672
 __x64_sys_connect+0x4a/0x70 net/socket.c:1672
 do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
 entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x4497b9
Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ff660d67c68 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 00007ff660d686cc RCX: 00000000004497b9
RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000013
RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000ae0 R14: 00000000006e8b80 R15: 00007ff660d68700

Local variable description: ----address@__sys_connect
Variable was created at:
 __sys_connect+0x89/0x850 net/socket.c:1647
 __do_sys_connect net/socket.c:1675 [inline]
 __se_sys_connect+0x8d/0xb0 net/socket.c:1672
==================================================================

We provide a simple patch below to stop them.
There are a few more lines that invoke move_addr_to_kernel(), but the
two of them
(bind and connect) seem to be the only cases to be corrected.


diff --git a/net/socket.c b/net/socket.c
index 390a8ec..de0931c2 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1475,7 +1475,7 @@ int __sys_bind(int fd, struct sockaddr __user
*umyaddr, int addrlen)
        sock = sockfd_lookup_light(fd, &err, &fput_needed);
        if (sock) {
                err = move_addr_to_kernel(umyaddr, addrlen, &address);
-               if (err >= 0) {
+               if (err > 0) {
                        err = security_socket_bind(sock,
                                                   (struct sockaddr *)&address,
                                                   addrlen);
@@ -1653,7 +1653,7 @@ int __sys_connect(int fd, struct sockaddr __user
*uservaddr, int addrlen)
        if (!sock)
                goto out;
        err = move_addr_to_kernel(uservaddr, addrlen, &address);
-       if (err < 0)
+       if (err <= 0)
                goto out_put;

        err =


Thanks,
Kyungtae Kim

^ permalink raw reply related

* [PATCH] dp83867: fix speed 10 in sgmii mode
From: Max Uvarov @ 2018-10-25 11:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: andrew, f.fainelli, netdev, Max Uvarov

For support 10Mps sped in SGMII mode DP83867_10M_SGMII_RATE_ADAPT bit
of DP83867_10M_SGMII_CFG register has to be cleared by software.
That does not affect speeds 100 and 1000 so can be done on init.

Signed-off-by: Max Uvarov <muvarov@gmail.com>
---
 http://www.ti.com/lit/ds/symlink/dp83867e.pdf
 page 87 for register bits
 and
 page 23 last sentance that bit needs to be cleared.

 drivers/net/phy/dp83867.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
index ab58224f897f..1df4da3f70a9 100644
--- a/drivers/net/phy/dp83867.c
+++ b/drivers/net/phy/dp83867.c
@@ -37,6 +37,7 @@
 #define DP83867_STRAP_STS1	0x006E
 #define DP83867_RGMIIDCTL	0x0086
 #define DP83867_IO_MUX_CFG	0x0170
+#define DP83867_10M_SGMII_CFG  0x016F
 
 #define DP83867_SW_RESET	BIT(15)
 #define DP83867_SW_RESTART	BIT(14)
@@ -79,6 +80,9 @@
 /* CFG4 bits */
 #define DP83867_CFG4_PORT_MIRROR_EN              BIT(0)
 
+/* 10M_SGMII_CFG bits */
+#define DP83867_10M_SGMII_RATE_ADAPT		 BIT(7)
+
 enum {
 	DP83867_PORT_MIRROING_KEEP,
 	DP83867_PORT_MIRROING_EN,
@@ -285,6 +289,24 @@ static int dp83867_config_init(struct phy_device *phydev)
 		}
 	}
 
+	if (phydev->interface == PHY_INTERFACE_MODE_SGMII) {
+		/* For support SPEED_10 in SGMII mode
+		 * DP83867_10M_SGMII_RATE_ADAPT bit
+		 * has to be cleared by software. That
+		 * does not affect SPEED_100 and
+		 * SPEED_1000.
+		 */
+		val = phy_read_mmd(phydev, DP83867_DEVADDR,
+				   DP83867_10M_SGMII_CFG);
+		val &= ~DP83867_10M_SGMII_RATE_ADAPT;
+		ret = phy_write_mmd(phydev, DP83867_DEVADDR,
+				    DP83867_10M_SGMII_CFG, val);
+		if (ret) {
+			WARN_ONCE(1, "dp83867: error DP83867_10M_SGMII_CFG read\n");
+			return ret;
+		}
+	}
+
 	/* Enable Interrupt output INT_OE in CFG3 register */
 	if (phy_interrupt_is_valid(phydev)) {
 		val = phy_read(phydev, DP83867_CFG3);
-- 
2.17.1

^ permalink raw reply related

* Re: BUG: KMSAN: uninit-value in selinux_socket_bind, selinux_socket_connect_helper
From: Alexander Potapenko @ 2018-10-25 11:07 UTC (permalink / raw)
  To: kt0755
  Cc: David Miller, lifeasageek, syzkaller, threeearcat, LKML,
	Networking, alexey.kodanev
In-Reply-To: <CAEAjamv0P63dTXEjzbDp6ahaNRuUiRPWTnj_N2RV7monBk4bHw@mail.gmail.com>

On Thu, Oct 25, 2018 at 12:35 PM Kyungtae Kim <kt0755@gmail.com> wrote:
>
> We report two crashes related (in v4.19-rc8) :
> "BUG: KMSAN: uninit-value in selinux_socket_bind"
> "BUG: KMSAN: uninit-value in selinux_socket_connect_helper”
>
> kernel config: https://kt0755.github.io/etc/config-4.19-rc2.kmsan
> repro: https://kt0755.github.io/etc/repro.b0e55.c
>
> Since both crashes share the same issue, we just explain one of the two.
> When the third argument of bind() (i.e., addrlen) is zero, in __sys_bind(),
> data copy from user sockaddr to kernel sockaddr does not occur
> (net/socket.c:186).
> However, a subsequent function selinux_socket_bind() tries to read
> the kernel sockaddr (address->sa_family) that was not initialized at all.
>
> Crash log1
> ==================================================================
> BUG: KMSAN: uninit-value in selinux_socket_bind+0x61b/0x1040
> security/selinux/hooks.c:4643
> CPU: 0 PID: 19070 Comm: syz-executor6 Not tainted 4.19.0-rc8+ #18
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x305/0x460 lib/dump_stack.c:113
>  kmsan_report+0x1a2/0x2e0 mm/kmsan/kmsan.c:917
>  __msan_warning+0x7d/0xe0 mm/kmsan/kmsan_instr.c:500
>  selinux_socket_bind+0x61b/0x1040 security/selinux/hooks.c:4643
>  security_socket_bind+0x127/0x200 security/security.c:1390
>  __sys_bind+0x577/0x7e0 net/socket.c:1479
>  __do_sys_bind net/socket.c:1494 [inline]
>  __se_sys_bind+0x8d/0xb0 net/socket.c:1492
>  __x64_sys_bind+0x4a/0x70 net/socket.c:1492
>  do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
>  entry_SYSCALL_64_after_hwframe+0x63/0xe7
> RIP: 0033:0x4497b9
> Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
> 01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007fdddf1e9c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
> RAX: ffffffffffffffda RBX: 00007fdddf1ea6cc RCX: 00000000004497b9
> RDX: 0000000000000000 RSI: 0000000020000040 RDI: 0000000000000013
> RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
> R13: 00000000000004c8 R14: 00000000006e8568 R15: 00007fdddf1ea700
>
> Local variable description: ----address@__sys_bind
> Variable was created at:
>  __sys_bind+0x89/0x7e0 net/socket.c:1470
>  __do_sys_bind net/socket.c:1494 [inline]
>  __se_sys_bind+0x8d/0xb0 net/socket.c:1492
> ==================================================================
>
> Crash log2
> ==================================================================
> BUG: KMSAN: uninit-value in selinux_socket_connect_helper+0x55c/0x960
> security/selinux/hooks.c:4775
> CPU: 0 PID: 8234 Comm: syz-executor2 Not tainted 4.19.0-rc8+ #18
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x305/0x460 lib/dump_stack.c:113
>  kmsan_report+0x1a2/0x2e0 mm/kmsan/kmsan.c:917
>  __msan_warning+0x7d/0xe0 mm/kmsan/kmsan_instr.c:500
>  selinux_socket_connect_helper+0x55c/0x960 security/selinux/hooks.c:4775
>  selinux_socket_connect+0xbe/0x180 security/selinux/hooks.c:4834
>  security_socket_connect+0x127/0x200 security/security.c:1395
>  __sys_connect+0x577/0x850 net/socket.c:1660
>  __do_sys_connect net/socket.c:1675 [inline]
>  __se_sys_connect+0x8d/0xb0 net/socket.c:1672
>  __x64_sys_connect+0x4a/0x70 net/socket.c:1672
>  do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
>  entry_SYSCALL_64_after_hwframe+0x63/0xe7
> RIP: 0033:0x4497b9
> Code: e8 8c 9f 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
> 01 f0 ff ff 0f 83 9b 6b fc ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007ff660d67c68 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
> RAX: ffffffffffffffda RBX: 00007ff660d686cc RCX: 00000000004497b9
> RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000013
> RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
> R13: 0000000000000ae0 R14: 00000000006e8b80 R15: 00007ff660d68700
>
> Local variable description: ----address@__sys_connect
> Variable was created at:
>  __sys_connect+0x89/0x850 net/socket.c:1647
>  __do_sys_connect net/socket.c:1675 [inline]
>  __se_sys_connect+0x8d/0xb0 net/socket.c:1672
> ==================================================================
>
> We provide a simple patch below to stop them.
> There are a few more lines that invoke move_addr_to_kernel(), but the
> two of them
> (bind and connect) seem to be the only cases to be corrected.
>
>
> diff --git a/net/socket.c b/net/socket.c
> index 390a8ec..de0931c2 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -1475,7 +1475,7 @@ int __sys_bind(int fd, struct sockaddr __user
> *umyaddr, int addrlen)
>         sock = sockfd_lookup_light(fd, &err, &fput_needed);
>         if (sock) {
>                 err = move_addr_to_kernel(umyaddr, addrlen, &address);
> -               if (err >= 0) {
> +               if (err > 0) {
>                         err = security_socket_bind(sock,
>                                                    (struct sockaddr *)&address,
>                                                    addrlen);
> @@ -1653,7 +1653,7 @@ int __sys_connect(int fd, struct sockaddr __user
> *uservaddr, int addrlen)
>         if (!sock)
>                 goto out;
>         err = move_addr_to_kernel(uservaddr, addrlen, &address);
> -       if (err < 0)
> +       if (err <= 0)
>                 goto out_put;
>
>         err =
I believe a better fix would be to add an |addrlen| check to
selinux_socket_bind(), which is illegally assuming
|address->sa_family| contains any reasonable data regardless of actual
address length.
I suspect the bug has been introduced in
https://github.com/torvalds/linux/commit/0f8db8cc73df60b3de9a5eebd8f117b56eff5b03
>
> Thanks,
> Kyungtae Kim
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply

* RE: [PATCH 3/3] usb: renesas_usbhs: Remove dummy runtime PM callbacks
From: Yoshihiro Shimoda @ 2018-10-25  2:45 UTC (permalink / raw)
  To: Jarkko Nikula, linux-pm@vger.kernel.org
  Cc: linux-i2c@vger.kernel.org, Wolfram Sang, netdev@vger.kernel.org,
	David S . Miller, Sergei Shtylyov,
	linux-renesas-soc@vger.kernel.org, linux-usb@vger.kernel.org
In-Reply-To: <20181024135134.28456-4-jarkko.nikula@linux.intel.com>

Hi Jarkko,

> From: Jarkko Nikula, Sent: Wednesday, October 24, 2018 10:52 PM
> 
> Platform drivers don't need dummy runtime PM callbacks that just return
> success in order to have runtime PM happening. This has changed since
> following commits:
> 
> 05aa55dddb9e ("PM / Runtime: Lenient generic runtime pm callbacks")
> 543f2503a956 ("PM / platform_bus: Allow runtime PM by default")
> 8b313a38ecff ("PM / Platform: Use generic runtime PM callbacks directly")
> 
> Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
> ---
> Only build tested.

Thank you for the patch!

Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>

> ---
>  drivers/usb/renesas_usbhs/common.c | 14 --------------
>  1 file changed, 14 deletions(-)
> 
> diff --git a/drivers/usb/renesas_usbhs/common.c b/drivers/usb/renesas_usbhs/common.c
> index a3e1290d682d..0e760f228dd8 100644
> --- a/drivers/usb/renesas_usbhs/common.c
> +++ b/drivers/usb/renesas_usbhs/common.c
> @@ -874,23 +874,9 @@ static int usbhsc_resume(struct device *dev)
>  	return 0;
>  }
> 
> -static int usbhsc_runtime_nop(struct device *dev)
> -{
> -	/* Runtime PM callback shared between ->runtime_suspend()
> -	 * and ->runtime_resume(). Simply returns success.
> -	 *
> -	 * This driver re-initializes all registers after
> -	 * pm_runtime_get_sync() anyway so there is no need
> -	 * to save and restore registers here.
> -	 */

I guess this code came from i2c-sh_mobile.c or sh_eth.c
and we didn't realize this code didn't need at that time (Oct 10 2011).

Best regards,
Yoshihiro Shimoda

> -	return 0;
> -}
> -
>  static const struct dev_pm_ops usbhsc_pm_ops = {
>  	.suspend		= usbhsc_suspend,
>  	.resume			= usbhsc_resume,
> -	.runtime_suspend	= usbhsc_runtime_nop,
> -	.runtime_resume		= usbhsc_runtime_nop,
>  };
> 
>  static struct platform_driver renesas_usbhs_driver = {
> --
> 2.19.1

^ permalink raw reply

* Re: [PATCH ghak90 (was ghak32) V4 03/10] audit: log container info of syscalls
From: Richard Guy Briggs @ 2018-10-25 12:22 UTC (permalink / raw)
  To: Paul Moore
  Cc: containers, linux-api, linux-audit, linux-fsdevel, linux-kernel,
	netdev, netfilter-devel, ebiederm, luto, carlos, dhowells, viro,
	simo, Eric Paris, Serge Hallyn
In-Reply-To: <166a9dae538.280e.85c95baa4474aabc7814e68940a78392@paul-moore.com>

On 2018-10-25 07:13, Paul Moore wrote:
> On October 25, 2018 1:43:16 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> > On 2018-10-24 16:55, Paul Moore wrote:
> >> On Wed, Oct 24, 2018 at 11:15 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> >>> On 2018-10-19 19:16, Paul Moore wrote:
> >>>> On Sun, Aug 5, 2018 at 4:32 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> >
> 
> ...
> 
> >
> >>>> However, I do care about the "op" field in this record.  It just
> >>>> doesn't make any sense; the way you are using it it is more of a
> >>>> context field than an operations field, and even then why is the
> >>>> context important from a logging and/or security perspective?  Drop it
> >>>> please.
> >>>
> >>> I'll rename it to whatever you like.  I'd suggest "ref=".  The reason I
> >>> think it is important is there are multiple sources that aren't always
> >>> obvious from the other records to which it is associated.  In the case
> >>> of ptrace and signals, there can be many target tasks listed (OBJ_PID)
> >>> with no other way to distinguish the matching audit container identifier
> >>> records all for one event.  This is in addition to the default syscall
> >>> container identifier record.  I'm not currently happy with the text
> >>> content to link the two, but that should be solvable (most obvious is
> >>> taret PID).  Throwing away this information seems shortsighted.
> >>
> >> It would be helpful if you could generate real audit events
> >> demonstrating the problems you are describing, as well as a more
> >> standard syscall event, so we can discuss some possible solutions.
> >
> > If the auditted process is in a container and it ptraces or signals
> > another process in a container, there will be two AUDIT_CONTAINER
> > records for the same event that won't be identified as to which record
> > belongs to which process or other record (SYSCALL vs 1+ OBJ_PID
> > records).  There could be many signals recorded, each with their own
> > OBJ_PID record.  The first is stored in the audit context and additional
> > ones are stored in a chained struct that can accommodate 16 entries each.
> >
> > (See audit_signal_info(), __audit_ptrace().)
> >
> > (As a side note, on code inspection it appears that a signal target
> > would get overwritten by a ptrace action if they were to happen in that
> > order.)
> 
> As requested above, please respond with real audit events generated by
> this patchset so that we can discuss possible solutions.

Ok, then we should be developping a test to test ptrace and signal
auditting in general since we don't have current experience/evidence
that those even work (or rip them out if not).

> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* Re: [PATCH ghak90 (was ghak32) V4 03/10] audit: log container info of syscalls
From: Richard Guy Briggs @ 2018-10-25 12:27 UTC (permalink / raw)
  To: Paul Moore
  Cc: luto, carlos, linux-api, containers, linux-kernel, viro, dhowells,
	linux-audit, netfilter-devel, ebiederm, simo, netdev,
	linux-fsdevel, Eric Paris, Serge Hallyn
In-Reply-To: <CAHC9VhR02siURgadQs0EikcOm0d_TPAi4jJYCx4NkA=wQ5rjSA@mail.gmail.com>

On 2018-10-25 06:49, Paul Moore wrote:
> On Thu, Oct 25, 2018 at 2:06 AM Steve Grubb <sgrubb@redhat.com> wrote:
> > On Wed, 24 Oct 2018 20:42:55 -0400
> > Richard Guy Briggs <rgb@redhat.com> wrote:
> > > On 2018-10-24 16:55, Paul Moore wrote:
> > > > On Wed, Oct 24, 2018 at 11:15 AM Richard Guy Briggs
> > > > <rgb@redhat.com> wrote:
> > > > > On 2018-10-19 19:16, Paul Moore wrote:
> > > > > > On Sun, Aug 5, 2018 at 4:32 AM Richard Guy Briggs
> > > > > > <rgb@redhat.com> wrote:
> 
> ...
> 
> > > > > > > +/*
> > > > > > > + * audit_log_contid - report container info
> > > > > > > + * @tsk: task to be recorded
> > > > > > > + * @context: task or local context for record
> > > > > > > + * @op: contid string description
> > > > > > > + */
> > > > > > > +int audit_log_contid(struct task_struct *tsk,
> > > > > > > +                            struct audit_context *context,
> > > > > > > char *op) +{
> > > > > > > +       struct audit_buffer *ab;
> > > > > > > +
> > > > > > > +       if (!audit_contid_set(tsk))
> > > > > > > +               return 0;
> > > > > > > +       /* Generate AUDIT_CONTAINER record with container ID
> > > > > > > */
> > > > > > > +       ab = audit_log_start(context, GFP_KERNEL,
> > > > > > > AUDIT_CONTAINER);
> > > > > > > +       if (!ab)
> > > > > > > +               return -ENOMEM;
> > > > > > > +       audit_log_format(ab, "op=%s contid=%llu",
> > > > > > > +                        op, audit_get_contid(tsk));
> > > > > > > +       audit_log_end(ab);
> > > > > > > +       return 0;
> > > > > > > +}
> > > > > > > +EXPORT_SYMBOL(audit_log_contid);
> > > > > >
> > > > > > As discussed in the previous iteration of the patch, I prefer
> > > > > > AUDIT_CONTAINER_ID here over AUDIT_CONTAINER.  If you feel
> > > > > > strongly about keeping it as-is with AUDIT_CONTAINER I suppose
> > > > > > I could live with that, but it is isn't my first choice.
> > > > >
> > > > > I don't have a strong opinion on this one, mildly preferring the
> > > > > shorter one only because it is shorter.
> > > >
> > > > We already have multiple AUDIT_CONTAINER* record types, so it seems
> > > > as though we should use "AUDIT_CONTAINER" as a prefix of sorts,
> > > > rather than a type itself.
> > >
> > > I'm fine with that.  I'd still like to hear Steve's input.  He had
> > > stronger opinions than me.
> >
> > The creation event should be separate and distinct from the continuing
> > use when its used as a supplemental record. IOW, binding the ID to a
> > container is part of the lifecycle and needs to be kept distinct.
> 
> Steve's comment is pretty ambiguous when it comes to AUDIT_CONTAINER
> vs AUDIT_CONTAINER_ID, but one could argue that AUDIT_CONTAINER_ID
> helps distinguish the audit container id marking record and gets to
> what I believe is the spirit of Steve's comment.  Taking this in
> context with my previous remarks, let's switch to using
> AUDIT_CONTAINER_ID.

I suspect Steve is mixing up AUDIT_CONTAINER_OP with AUDIT_CONTAINER_ID,
confusing the fact that they are two seperate records.  As a summary,
the suggested records are:
	CONTAINER_OP	audit container identifier creation
	CONTAINER	audit container identifier aux record to an event

and what Paul is suggesting (which is fine by me) is:
	CONTAINER_OP	audit container identifier creation event
	CONTAINER_ID	audit container identifier aux record to an event

Steve, please indicate you are fine with this.

> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* Re: [PATCH] octeontx2-af: Use GFP_ATOMIC under spin lock
From: Sunil Kovvuri @ 2018-10-25  4:18 UTC (permalink / raw)
  To: weiyongjun1
  Cc: Sunil Goutham, Linu Cherian, Geetha sowjanya, jerinj,
	Linux Netdev List, kernel-janitors
In-Reply-To: <1540431746-176759-1-git-send-email-weiyongjun1@huawei.com>

On Thu, Oct 25, 2018 at 7:02 AM Wei Yongjun <weiyongjun1@huawei.com> wrote:
>
> The function nix_update_mce_list() is called from
> nix_update_bcast_mce_list(), and a spin lock is held
> here, so we should use GFP_ATOMIC instead.
>
> Fixes: 4b05528ebf0c ("octeontx2-af: Update bcast list upon NIXLF alloc/free")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
> ---
>  drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
> index 8890c95..a618e48 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
> @@ -1294,7 +1294,7 @@ static int nix_update_mce_list(struct nix_mce_list *mce_list,
>                 return 0;
>
>         /* Add a new one to the list, at the tail */
> -       mce = kzalloc(sizeof(*mce), GFP_KERNEL);
> +       mce = kzalloc(sizeof(*mce), GFP_ATOMIC);
>         if (!mce)
>                 return -ENOMEM;
>         mce->idx = idx;
>

Thanks for pointing this and for the patch.
FYI, this driver is no where near to complete, after net-next is open we will
start submitting rest of the patches. There is a subsequent patch
which changes most of the locks to mutex from spinlock, which will
fix this issue as well.

It would be great if above patch is ignored for now, as in it's
current state driver is
not complete and usable. Otherwise also fine, I will change change/fix
this again.

Regards,
Sunil.

^ permalink raw reply

* Re: [PATCH RFC v3 0/3] Rlimit for module space
From: Michal Hocko @ 2018-10-25 14:18 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: daniel@iogearbox.net, jeyu@kernel.org, ard.biesheuvel@linaro.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	jannh@google.com, keescook@chromium.org, arjan@linux.intel.com,
	netdev@vger.kernel.org, tglx@linutronix.de,
	linux-mips@linux-mips.org, linux-s390@vger.kernel.org,
	x86@kernel.org, kristen@linux.intel.com, Dock, Deneen T,
	catalin.marinas@arm.com, mingo@redhat.com, will.deacon@arm.com
In-Reply-To: <d1fec827d028168047eafbac56e8e47d37cf7fb5.camel@intel.com>

On Thu 25-10-18 01:01:44, Edgecombe, Rick P wrote:
[...]
> FWIW, cgroups seems like a better solution than rlimit to me too. Maybe you all
> already know, but it looks like the cgroups vmalloc charge is done in the main
> page allocator and counts against the whole kernel memory limit.

I am not sure I understand what you are saying but let me clarify that
vmalloc memory is accounted against memory cgroup associated with the
user context calling vmalloc. All you need to do is to add __GFP_ACCOUNT
to the gfp mask. The only challenge here is the charged memory life
cycle. When does it get deallocated? In the worst case the memory is not
tight to any user context and as such it doesn't get deallocated by
killing all processes which could lead to memcg limit exhaustion.

> A user may want
> to have a higher kernel limit than the module space size, so it seems it isn't
> enough by itself and some new limit would need to be added.

If there is another restriction on this memory then you are right.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH 1/2] Bluetooth: Add quirk for reading BD_ADDR from fwnode property
From: Balakrishna Godavarthi @ 2018-10-25 14:36 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: Marcel Holtmann, Johan Hedberg, David S . Miller, Loic Poulain,
	linux-bluetooth, netdev, linux-kernel, Brian Norris,
	Dmitry Grinberg
In-Reply-To: <20181025002134.256791-2-mka@chromium.org>

On 2018-10-25 05:51, Matthias Kaehlcke wrote:
> Add HCI_QUIRK_USE_BDADDR_PROPERTY to allow controllers to retrieve
> the public Bluetooth address from the firmware node property
> 'local-bd-address'. If quirk is set and the property does not exist
> or is invalid the controller is marked as unconfigured.
> 
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> ---
> hci_dev_get_bd_addr_from_property() currently assumes that the
> firmware node with 'local-bd-address' is from hdev->dev.parent, not
> sure if this universally true. However if it is true for existing
> device that might use this interface we can assume this for now
> (unless there is a clear solution now), and cross the bridge of
> finding an alternative when we actually encounter the situation.
> One option could be to look for the first parent that has a fwnode.
> ---
>  include/net/bluetooth/hci.h | 12 +++++++++++
>  net/bluetooth/hci_core.c    | 42 +++++++++++++++++++++++++++++++++++++
>  net/bluetooth/mgmt.c        |  6 ++++--
>  3 files changed, 58 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
> index cdd9f1fe7cfa..a5d748099752 100644
> --- a/include/net/bluetooth/hci.h
> +++ b/include/net/bluetooth/hci.h
> @@ -158,6 +158,18 @@ enum {
>  	 */
>  	HCI_QUIRK_INVALID_BDADDR,
> 
> +	/* When this quirk is set, the public Bluetooth address
> +	 * initially reported by HCI Read BD Address command
> +	 * is considered invalid. The public BD Address can be
> +	 * specified in the fwnode property 'local-bd-address'.
> +	 * If this property does not exist or is invalid controller
> +	 * configuration is required before this device can be used.
> +	 *
> +	 * This quirk can be set before hci_register_dev is called or
> +	 * during the hdev->setup vendor callback.
> +	 */
> +	HCI_QUIRK_USE_BDADDR_PROPERTY,
> +
>  	/* When this quirk is set, the duplicate filtering during
>  	 * scanning is based on Bluetooth devices addresses. To allow
>  	 * RSSI based updates, restart scanning if needed.
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index 74b29c7d841c..97214262c4fb 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -30,6 +30,7 @@
>  #include <linux/rfkill.h>
>  #include <linux/debugfs.h>
>  #include <linux/crypto.h>
> +#include <linux/property.h>
>  #include <asm/unaligned.h>
> 
>  #include <net/bluetooth/bluetooth.h>
> @@ -1355,9 +1356,40 @@ int hci_inquiry(void __user *arg)
>  	return err;
>  }
> 
> +/**
> + * hci_dev_get_bd_addr_from_property - Get the Bluetooth Device 
> Address
> + *				       (BD_ADDR) for a HCI device from
> + *				       a firmware node property.
> + * @hdev:	The HCI device
> + *
> + * Search the firmware node for 'local-bd-address'.
> + *
> + * All-zero BD addresses are rejected, because those could be 
> properties
> + * that exist in the firmware tables, but were not updated by the 
> firmware. For
> + * example, the DTS could define 'local-bd-address', with zero BD 
> addresses.
> + */
> +static int hci_dev_get_bd_addr_from_property(struct hci_dev *hdev)
> +{
> +	struct fwnode_handle *fwnode = dev_fwnode(hdev->dev.parent);
> +	bdaddr_t ba;
> +	int ret;
> +
> +	ret = fwnode_property_read_u8_array(fwnode, "local-bd-address",
> +					    (u8 *)&ba, sizeof(ba));
> +	if (ret < 0)
> +		return ret;
> +	if (!bacmp(&ba, BDADDR_ANY))
> +		return -ENODATA;
> +
> +	hdev->public_addr = ba;
> +
> +	return 0;
> +}
> +
>  static int hci_dev_do_open(struct hci_dev *hdev)
>  {
>  	int ret = 0;
> +	bool bd_addr_set = false;
> 
>  	BT_DBG("%s %p", hdev->name, hdev);
> 
> @@ -1422,6 +1454,16 @@ static int hci_dev_do_open(struct hci_dev *hdev)
>  		if (hdev->setup)
>  			ret = hdev->setup(hdev);
> 
> +		if (test_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks)) {
> +			if (!hci_dev_get_bd_addr_from_property(hdev))
> +				if (hdev->set_bdaddr &&
> +				    !hdev->set_bdaddr(hdev, &hdev->public_addr))
> +					bd_addr_set = true;
> +
> +			if (!bd_addr_set)
> +				hci_dev_set_flag(hdev, HCI_UNCONFIGURED);
> +		}
> +
>  		/* The transport driver can set these quirks before
>  		 * creating the HCI device or in its setup callback.
>  		 *
> diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
> index 3bdc8f3ca259..3d9edb752403 100644
> --- a/net/bluetooth/mgmt.c
> +++ b/net/bluetooth/mgmt.c
> @@ -551,7 +551,8 @@ static bool is_configured(struct hci_dev *hdev)
>  	    !hci_dev_test_flag(hdev, HCI_EXT_CONFIGURED))
>  		return false;
> 
> -	if (test_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks) &&
> +	if ((test_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks) ||
> +	     test_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks)) &&
>  	    !bacmp(&hdev->public_addr, BDADDR_ANY))
>  		return false;
> 
> @@ -566,7 +567,8 @@ static __le32 get_missing_options(struct hci_dev 
> *hdev)
>  	    !hci_dev_test_flag(hdev, HCI_EXT_CONFIGURED))
>  		options |= MGMT_OPTION_EXTERNAL_CONFIG;
> 
> -	if (test_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks) &&
> +	if ((test_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks) ||
> +	     test_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks)) &&
>  	    !bacmp(&hdev->public_addr, BDADDR_ANY))
>  		options |= MGMT_OPTION_PUBLIC_ADDRESS;

Looks fine to me.

Reviewed-by: Balakrishna Godavarthi <bgodavar@codeaurora.org>
-- 
Regards
Balakrishna.

^ permalink raw reply

* Re: [PATCH 2/2] Bluetooth: btqcomsmd: use HCI_QUIRK_USE_BDADDR_PROPERTY
From: Balakrishna Godavarthi @ 2018-10-25 14:40 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: Marcel Holtmann, Johan Hedberg, David S . Miller, Loic Poulain,
	linux-bluetooth, netdev, linux-kernel, Brian Norris,
	Dmitry Grinberg
In-Reply-To: <20181025002134.256791-3-mka@chromium.org>

Hi Matthias,

On 2018-10-25 05:51, Matthias Kaehlcke wrote:
> Use the HCI_QUIRK_USE_BDADDR_PROPERTY quirk to let the HCI
> core handle the reading of 'local-bd-address'. With this there
> is no need to set HCI_QUIRK_INVALID_BDADDR, the case of a
> non-existing or invalid fwnode property is handled by the core
> code.
> 
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> ---
> I couldn't actually test the changes in this driver since I
> don't have a device with this controller. Could someone
> from Qualcomm help with this?
> ---
>  drivers/bluetooth/btqcomsmd.c | 28 +++-------------------------
>  1 file changed, 3 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/bluetooth/btqcomsmd.c 
> b/drivers/bluetooth/btqcomsmd.c
> index 7df3eed1ef5e..e5841602c4f1 100644
> --- a/drivers/bluetooth/btqcomsmd.c
> +++ b/drivers/bluetooth/btqcomsmd.c
> @@ -125,23 +125,10 @@ static int btqcomsmd_setup(struct hci_dev *hdev)
>  		return PTR_ERR(skb);
>  	kfree_skb(skb);
> 
> -	/* Devices do not have persistent storage for BD address. If no
> -	 * BD address has been retrieved during probe, mark the device
> -	 * as having an invalid BD address.
> +	/* Devices do not have persistent storage for BD address. Retrieve
> +	 * it from the firmware node property.
>  	 */
> -	if (!bacmp(&btq->bdaddr, BDADDR_ANY)) {
> -		set_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks);
> -		return 0;
> -	}
> -
> -	/* When setting a configured BD address fails, mark the device
> -	 * as having an invalid BD address.
> -	 */
> -	err = qca_set_bdaddr_rome(hdev, &btq->bdaddr);
> -	if (err) {
> -		set_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks);
> -		return 0;
> -	}
> +	set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks);
> 
>  	return 0;
>  }
> @@ -169,15 +156,6 @@ static int btqcomsmd_probe(struct platform_device 
> *pdev)
>  	if (IS_ERR(btq->cmd_channel))
>  		return PTR_ERR(btq->cmd_channel);
> 
> -	/* The local-bd-address property is usually injected by the
> -	 * bootloader which has access to the allocated BD address.
> -	 */
> -	if (!of_property_read_u8_array(pdev->dev.of_node, "local-bd-address",
> -				       (u8 *)&btq->bdaddr, sizeof(bdaddr_t))) {
> -		dev_info(&pdev->dev, "BD address %pMR retrieved from device-tree",
> -			 &btq->bdaddr);
> -	}
> -
>  	hdev = hci_alloc_dev();
>  	if (!hdev)
>  		return -ENOMEM;

nit: can be remove unused "bdaddr_t bdaddr" variable.
https://elixir.bootlin.com/linux/v4.19-rc8/source/drivers/bluetooth/btqcomsmd.c#L31
Apart from this, It looks ok to me.

Reviewed-by: Balakrishna Godavarthi <bgodavar@codeaurora.org>
-- 
Regards
Balakrishna.

^ permalink raw reply

* Re: [PATCH net-next v8 28/28] net: WireGuard secure network tunnel
From: Jason A. Donenfeld @ 2018-10-25 14:43 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: LKML, Netdev, Linux Crypto Mailing List, David Miller,
	Greg Kroah-Hartman
In-Reply-To: <20181020224706.GC14816@lunn.ch>

Hi Andrew,

Thanks for the review. Comments and fix links are inline below.

On Sun, Oct 21, 2018 at 12:47 AM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > +#define choose_node(parent, key)                                               \
> > +     parent->bit[(key[parent->bit_at_a] >> parent->bit_at_b) & 1]
> This should be a function, not a macro.

That prevents it from being used as an lvalue, which is mostly where
it's used, with the rcu-protected structure, so this will remain a
macro. I'll make it upper-case though.

> > +
> > +static void node_free_rcu(struct rcu_head *rcu)
> > +{
> > +     kfree(container_of(rcu, struct allowedips_node, rcu));
> > +}
> > +
> > +#define push_rcu(stack, p, len) ({                                             \
> > +             if (rcu_access_pointer(p)) {                                   \
> > +                     WARN_ON(IS_ENABLED(DEBUG) && (len) >= 128);            \
> > +                     stack[(len)++] = rcu_dereference_raw(p);               \
> > +             }                                                              \
> > +             true;                                                          \
> > +     })
>
> This also looks like it could be a function.

You're right. I've changed this now. That produces slightly worse code
on gcc 8.2, but that's not really hot path anyway, so it doesn't
matter.

> > +static void root_free_rcu(struct rcu_head *rcu)
> > +{
> > +     struct allowedips_node *node, *stack[128] = {
> > +             container_of(rcu, struct allowedips_node, rcu) };
> > +     unsigned int len = 1;
> > +
> > +     while (len > 0 && (node = stack[--len]) &&
> > +            push_rcu(stack, node->bit[0], len) &&
> > +            push_rcu(stack, node->bit[1], len))
> > +             kfree(node);
> > +}
> > +
> > +#define ref(p) rcu_access_pointer(p)
> > +#define deref(p) rcu_dereference_protected(*(p), lockdep_is_held(lock))
>
> Macros should be uppercase, or better still, functions.
>
> > +#define push(p) ({                                                             \
> > +             WARN_ON(IS_ENABLED(DEBUG) && len >= 128);                      \
> > +             stack[len++] = p;                                              \
> > +     })
>
> This one definitely should be upper case, to warn readers it has
> unexpected side effects.

I've made these little helper macros uppercase.

>
> > +
> > +static void walk_remove_by_peer(struct allowedips_node __rcu **top,
> > +                             struct wg_peer *peer, struct mutex *lock)
> > +{
> > +     struct allowedips_node __rcu **stack[128], **nptr;
> > +     struct allowedips_node *node, *prev;
> > +     unsigned int len;
> > +
> > +     if (unlikely(!peer || !ref(*top)))
> > +             return;
> > +
> > +     for (prev = NULL, len = 0, push(top); len > 0; prev = node) {
> > +             nptr = stack[len - 1];
> > +             node = deref(nptr);
> > +             if (!node) {
> > +                     --len;
> > +                     continue;
> > +             }
> > +             if (!prev || ref(prev->bit[0]) == node ||
> > +                 ref(prev->bit[1]) == node) {
> > +                     if (ref(node->bit[0]))
> > +                             push(&node->bit[0]);
> > +                     else if (ref(node->bit[1]))
> > +                             push(&node->bit[1]);
> > +             } else if (ref(node->bit[0]) == prev) {
> > +                     if (ref(node->bit[1]))
> > +                             push(&node->bit[1]);
> > +             } else {
> > +                     if (rcu_dereference_protected(node->peer,
> > +                             lockdep_is_held(lock)) == peer) {
> > +                             RCU_INIT_POINTER(node->peer, NULL);
> > +                             if (!node->bit[0] || !node->bit[1]) {
> > +                                     rcu_assign_pointer(*nptr,
> > +                                     deref(&node->bit[!ref(node->bit[0])]));
> > +                                     call_rcu_bh(&node->rcu, node_free_rcu);
> > +                                     node = deref(nptr);
> > +                             }
> > +                     }
> > +                     --len;
> > +             }
> > +     }
> > +}
> > +
> > +#undef ref
> > +#undef deref
> > +#undef push
> > +
> > +static __always_inline unsigned int fls128(u64 a, u64 b)
> > +{
> > +     return a ? fls64(a) + 64U : fls64(b);
> > +}
>
> Does the compiler actually get this wrong? Not inline it?

Actually for this function (in contrast with all of the other ones
marked as such, which really must have the annotation), recent gcc
gets it right, so I've removed the annotation. Nice catch.

> > +
> > +static __always_inline u8 common_bits(const struct allowedips_node *node,
> > +                                   const u8 *key, u8 bits)
> > +{
> > +     if (bits == 32)
> > +             return 32U - fls(*(const u32 *)node->bits ^ *(const u32 *)key);
> > +     else if (bits == 128)
> > +             return 128U - fls128(
> > +                     *(const u64 *)&node->bits[0] ^ *(const u64 *)&key[0],
> > +                     *(const u64 *)&node->bits[8] ^ *(const u64 *)&key[8]);
> > +     return 0;
> > +}
> > +
> > +/* This could be much faster if it actually just compared the common bits
> > + * properly, by precomputing a mask bswap(~0 << (32 - cidr)), and the rest, but
> > + * it turns out that common_bits is already super fast on modern processors,
> > + * even taking into account the unfortunate bswap. So, we just inline it like
> > + * this instead.
> > + */
> > +#define prefix_matches(node, key, bits)                                        \
> > +     (common_bits(node, key, bits) >= (node)->cidr)
>
> Could be a function.

I've made this a function now and confirmed codegen did not change
significantly.

> > +/* Returns a strong reference to a peer */
> > +static __always_inline struct wg_peer *
> > +lookup(struct allowedips_node __rcu *root, u8 bits, const void *be_ip)
> > +{
> > +     u8 ip[16] __aligned(__alignof(u64));
>
> You virtually never see aligned stack variables. This needs some sort
> of comment.

I've added a comment. The reason, if you're curious, is so that we can
pass it to fls64 on all platforms.

> > +__attribute__((nonnull(1))) static bool
> > +node_placement(struct allowedips_node __rcu *trie, const u8 *key, u8 cidr,
> > +            u8 bits, struct allowedips_node **rnode, struct mutex *lock)
> > +{
> > +     struct allowedips_node *node = rcu_dereference_protected(trie,
> > +                                             lockdep_is_held(lock));
> > +     struct allowedips_node *parent = NULL;
> > +     bool exact = false;
>
> Should there be a WARN_ON(!key) here, since the attribute will only
> detect problems at compile time, and maybe some runtime cases will get
> passed it?

Actually no, it's only used in one place, and that function is pretty
clear about checking beforehand, and even this function checks
implicitly. Looking at my commit log, I had added the annotation to
make clang-analyzer happy, because it couldn't (at the time) deduce
things correctly from rcu_access_pointer(p) checks. It looks like
recent clang-analyzer has fixed that, and besides I'm not sure the
kernel is in the business of adding annotations to work around
static-analyzer-bug-of-the-week. So I've gotten rid of the annotation.
Thanks for bringing my attention to it.

> > +             net_dbg_ratelimited("%s: Could not decrypt invalid cookie response\n",
> > +                                 wg->dev->name);
>
> It might be worth adding a netdev_dbg_ratelimited(), which takes a
> netdev as its first parameter, just line netdev_dbg().

That sounds like it could be useful. Though, I'm trying hard to make
the first patch submission _not_ need to touch any of the rest of the
networking stack. I've actually got a small list of interesting
networking stack changes that might be useful for WireGuard, but I
think I'd prefer to submit these after this is all merged, and each
one separately for a separate discussion, if that's okay with you.

> > +
> > +#if defined(CONFIG_PM_SLEEP) && !defined(CONFIG_ANDROID)
>
> I don't see any other code which uses this combination. Why is this
> needed?

WireGuard clears private key material before going to sleep, so that
ephemeral keys never live longer in ram than their expiration date.
This works well for mostly everything, except Android devices where
crazy out-of-tree wireless drivers (such as qcacld) will actually wake
and sleep the phone constantly, for a few milliseconds each wake, to
handle incoming packets. In this case, we must not zero out ephemeral
keys, so that these packets can actually be decrypted. There are no
other systems that have this characteristic, as far as I can tell, and
I'm certainly not willing to add a runtime configuration nob for that,
and it turns out that matching on CONFIG_ANDROID as such does the
right thing all of the time. If the landscape becomes more
complicated, we can visit it, probably in the form of passing reasons
through to pm_notification and related scaffolding. But in lieu of
this infrastructure someday existing, I prefer to do the
simple-and-always-works-anyway solution.

> > +     while (skb_queue_len(&peer->staged_packet_queue) > MAX_STAGED_PACKETS)
> > +             dev_kfree_skb(__skb_dequeue(&peer->staged_packet_queue));
>
> It would be good to have some stats counters in here. Maybe the
> standard interface statistics will cover it, otherwise ethtool -S.

Good point, I need to increment the drop counters there. I'll add
that; good catch.

Willy and I actually discussed around a year ago adding a series of
wireguard-specific error counters for all the various unusual things
that can happen. This is something I'll probably investigate in
earnest post-merge, as it's a bit of a "bell and whistle" thing, and
the existing counters now are already sufficient for most purposes.

> You should also get this code looked at by some of the queueing
> people. Rather than discarding, you might want to be applying back
> pressure to slow down the sender application?

Actually Toke, DaveT, and I have agonized quite a bit over the
queueing mechanisms in WireGuard, as well as two GSoC summers looking
at it with students. There's been lots of flent and rrul and tweaking
involved, and the entire queueing system has gone through probably 15
different revisions of trying things out and experimenting. At this
point, I think the way we're doing queueing and when we're dropping,
and those various factors, are taken care of pretty well. Future work
still involves considering shoehorning some fq_codel in a similar way
that wireless does, and a few other tweeks, but I think the current
system is good and stable and very much sufficient.

The particular instance you're pointing out though, with
staged_packet_queue, is not the relevant queue and isn't what you have
in mind with regards to backpressure. This is simply a very short
lived thing for dealing with packets before a handshake is made, and
for handling xmit being used from multiple cores, but it's not _the_
queueing system that's relevant in the actual buffering and latency
calculations.

A little bit of the queueing stuff is discussed in the paper I
presented at Netdev 2.2, if you're interested:
https://www.wireguard.com/papers/wireguard-netdev22.pdf Some of that
is outdated by now, but should give you some idea.

> > +static void kdf(u8 *first_dst, u8 *second_dst, u8 *third_dst, const u8 *data,
> > +             size_t first_len, size_t second_len, size_t third_len,
> > +             size_t data_len, const u8 chaining_key[NOISE_HASH_LEN])
> > +{
> > +     u8 output[BLAKE2S_HASH_SIZE + 1];
> > +     u8 secret[BLAKE2S_HASH_SIZE];
> > +
> > +     WARN_ON(IS_ENABLED(DEBUG) &&
> > +             (first_len > BLAKE2S_HASH_SIZE ||
> > +              second_len > BLAKE2S_HASH_SIZE ||
> > +              third_len > BLAKE2S_HASH_SIZE ||
> > +              ((second_len || second_dst || third_len || third_dst) &&
> > +               (!first_len || !first_dst)) ||
> > +              ((third_len || third_dst) && (!second_len || !second_dst))));
>
> Maybe split this up into a number of WARN_ON()s. At the moment, if it
> fires, you have little idea why, there are so many comparisons. Also,
> is this on the hot path? I guess not, since this is keys, not
> packets. Do you need to care about DEBUG here?

This is on the hot path, actually. Well, it's not on path of data
packets, but I do consider handshake packets to be fairly "warm". I'm
not especially keen on splitting that up into multiple warn_ons,
mostly because if that is ever hint, something is seriously wrong, and
the whole flow would likely then need auditing anyway.

Regards,
Jason

^ permalink raw reply

* Re: [PATCH ghak90 (was ghak32) V4 03/10] audit: log container info of syscalls
From: Paul Moore @ 2018-10-25  6:13 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: containers, linux-api, linux-audit, linux-fsdevel, linux-kernel,
	netdev, netfilter-devel, ebiederm, luto, carlos, dhowells, viro,
	simo, Eric Paris, Serge Hallyn
In-Reply-To: <20181025004255.zl7p7j6gztouh2hh@madcap2.tricolour.ca>

On October 25, 2018 1:43:16 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2018-10-24 16:55, Paul Moore wrote:
>> On Wed, Oct 24, 2018 at 11:15 AM Richard Guy Briggs <rgb@redhat.com> wrote:
>>> On 2018-10-19 19:16, Paul Moore wrote:
>>>> On Sun, Aug 5, 2018 at 4:32 AM Richard Guy Briggs <rgb@redhat.com> wrote:
>

...

>
>>>> However, I do care about the "op" field in this record.  It just
>>>> doesn't make any sense; the way you are using it it is more of a
>>>> context field than an operations field, and even then why is the
>>>> context important from a logging and/or security perspective?  Drop it
>>>> please.
>>>
>>> I'll rename it to whatever you like.  I'd suggest "ref=".  The reason I
>>> think it is important is there are multiple sources that aren't always
>>> obvious from the other records to which it is associated.  In the case
>>> of ptrace and signals, there can be many target tasks listed (OBJ_PID)
>>> with no other way to distinguish the matching audit container identifier
>>> records all for one event.  This is in addition to the default syscall
>>> container identifier record.  I'm not currently happy with the text
>>> content to link the two, but that should be solvable (most obvious is
>>> taret PID).  Throwing away this information seems shortsighted.
>>
>> It would be helpful if you could generate real audit events
>> demonstrating the problems you are describing, as well as a more
>> standard syscall event, so we can discuss some possible solutions.
>
> If the auditted process is in a container and it ptraces or signals
> another process in a container, there will be two AUDIT_CONTAINER
> records for the same event that won't be identified as to which record
> belongs to which process or other record (SYSCALL vs 1+ OBJ_PID
> records).  There could be many signals recorded, each with their own
> OBJ_PID record.  The first is stored in the audit context and additional
> ones are stored in a chained struct that can accommodate 16 entries each.
>
> (See audit_signal_info(), __audit_ptrace().)
>
> (As a side note, on code inspection it appears that a signal target
> would get overwritten by a ptrace action if they were to happen in that
> order.)

As requested above, please respond with real audit events generated by this patchset so that we can discuss possible solutions.

^ permalink raw reply

* Re: [PATCH net] net: sched: Remove TCA_OPTIONS from policy
From: Jiri Pirko @ 2018-10-25  6:31 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, davem, pupilla, David Ahern
In-Reply-To: <20181024153249.15374-1-dsahern@kernel.org>

Wed, Oct 24, 2018 at 05:32:49PM CEST, dsahern@kernel.org wrote:
>From: David Ahern <dsahern@gmail.com>
>
>Marco reported an error with hfsc:
>root@Calimero:~# tc qdisc add dev eth0 root handle 1:0 hfsc default 1
>Error: Attribute failed policy validation.
>
>Apparently a few implementations pass TCA_OPTIONS as a binary instead
>of nested attribute, so drop TCA_OPTIONS from the policy.

Yeah, this is nice example of a case, where I think it wouldn't hurt to
be a bit more strict. Apparently, the userspace app is buggy. It should
be fixed. Note that I'm aware of the bw compatibility.


>
>Fixes: 8b4c3cdd9dd8 ("net: sched: Add policy validation for tc attributes")
>Reported-by: Marco Berizzi <pupilla@libero.it>
>Signed-off-by: David Ahern <dsahern@gmail.com>
>---
> net/sched/sch_api.c | 1 -
> 1 file changed, 1 deletion(-)
>
>diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
>index 3dc0acf54245..be7cd140b2a3 100644
>--- a/net/sched/sch_api.c
>+++ b/net/sched/sch_api.c
>@@ -1309,7 +1309,6 @@ check_loop_fn(struct Qdisc *q, unsigned long cl, struct qdisc_walker *w)
> 
> const struct nla_policy rtm_tca_policy[TCA_MAX + 1] = {
> 	[TCA_KIND]		= { .type = NLA_STRING },
>-	[TCA_OPTIONS]		= { .type = NLA_NESTED },

A future developer might add this again. A comment why not would be good
here.


> 	[TCA_RATE]		= { .type = NLA_BINARY,
> 				    .len = sizeof(struct tc_estimator) },
> 	[TCA_STAB]		= { .type = NLA_NESTED },
>-- 
>2.11.0
>

^ permalink raw reply

* Re: [PATCH] netfilter: conntrack: fix calculation of next bucket number in early_drop
From: Vasiliy Khoruzhick @ 2018-10-25  6:37 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
	David S. Miller, netfilter-devel, coreteam, netdev, linux-kernel,
	stable
In-Reply-To: <a2fa755a-048b-459a-b526-2fe27cbb6574@arista.com>

On Wed, Oct 24, 2018 at 9:52 PM, Dmitry Safonov <dima@arista.com> wrote:
> On 10/25/18 4:48 AM, Vasily Khoruzhick wrote:
>>
>> If there's no entry to drop in bucket that corresponds to the hash,
>> early_drop() should look for it in other buckets. But since it increments
>> hash instead of bucket number, it actually looks in the same bucket 8
>> times: hsize is 16k by default (14 bits) and hash is 32-bit value, so
>> reciprocal_scale(hash, hsize) returns the same value for hash..hash+7 in
>> most cases.
>>
>> Fix it by increasing bucket number instead of hash and rename _hash
>> to bucket to avoid future confusion.
>>
>> Fixes: 3e86638e9a0b ("netfilter: conntrack: consider ct netns in
>> early_drop logic")
>> Cc: <stable@vger.kernel.org> # v4.7+
>> Signed-off-by: Vasily Khoruzhick <vasilykh@arista.com>
>
>
> Nice work!
>
> Reviewed-by: Dmitry Safonov <dima@arista.com>

Oops, found a bug in it - 'bucket' should be moved outside of the
loop. I'll send v2 tomorrow morning.

>
>> ---
>>   net/netfilter/nf_conntrack_core.c | 11 +++++++----
>>   1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/netfilter/nf_conntrack_core.c
>> b/net/netfilter/nf_conntrack_core.c
>> index ca1168d67fac..a04af246b184 100644
>> --- a/net/netfilter/nf_conntrack_core.c
>> +++ b/net/netfilter/nf_conntrack_core.c
>> @@ -1073,19 +1073,22 @@ static unsigned int early_drop_list(struct net
>> *net,
>>         return drops;
>>   }
>>   -static noinline int early_drop(struct net *net, unsigned int _hash)
>> +static noinline int early_drop(struct net *net, unsigned int hash)
>>   {
>>         unsigned int i;
>>         for (i = 0; i < NF_CT_EVICTION_RANGE; i++) {
>>                 struct hlist_nulls_head *ct_hash;
>> -               unsigned int hash, hsize, drops;
>> +               unsigned int bucket, hsize, drops;
>>                 rcu_read_lock();
>>                 nf_conntrack_get_ht(&ct_hash, &hsize);
>> -               hash = reciprocal_scale(_hash++, hsize);
>> +               if (!i)
>> +                       bucket = reciprocal_scale(hash, hsize);
>> +               else
>> +                       bucket = (bucket + 1) % hsize;
>>   -             drops = early_drop_list(net, &ct_hash[hash]);
>> +               drops = early_drop_list(net, &ct_hash[bucket]);
>>                 rcu_read_unlock();
>>                 if (drops) {
>>
>
> --
>           Dima

^ permalink raw reply

* Re: [PATCH ghak90 (was ghak32) V4 03/10] audit: log container info of syscalls
From: Steve Grubb @ 2018-10-25 15:57 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: Paul Moore, simo, carlos, linux-api, containers, linux-kernel,
	dhowells, linux-audit, netfilter-devel, ebiederm, luto, netdev,
	linux-fsdevel, Eric Paris, Serge Hallyn, viro
In-Reply-To: <20181025122732.4j4rbychjse3gemt@madcap2.tricolour.ca>

On Thu, 25 Oct 2018 08:27:32 -0400
Richard Guy Briggs <rgb@redhat.com> wrote:

> On 2018-10-25 06:49, Paul Moore wrote:
> > On Thu, Oct 25, 2018 at 2:06 AM Steve Grubb <sgrubb@redhat.com>
> > wrote:  
> > > On Wed, 24 Oct 2018 20:42:55 -0400
> > > Richard Guy Briggs <rgb@redhat.com> wrote:  
> > > > On 2018-10-24 16:55, Paul Moore wrote:  
> > > > > On Wed, Oct 24, 2018 at 11:15 AM Richard Guy Briggs
> > > > > <rgb@redhat.com> wrote:  
> > > > > > On 2018-10-19 19:16, Paul Moore wrote:  
> > > > > > > On Sun, Aug 5, 2018 at 4:32 AM Richard Guy Briggs
> > > > > > > <rgb@redhat.com> wrote:  
> > 
> > ...
> >   
> > > > > > > > +/*
> > > > > > > > + * audit_log_contid - report container info
> > > > > > > > + * @tsk: task to be recorded
> > > > > > > > + * @context: task or local context for record
> > > > > > > > + * @op: contid string description
> > > > > > > > + */
> > > > > > > > +int audit_log_contid(struct task_struct *tsk,
> > > > > > > > +                            struct audit_context
> > > > > > > > *context, char *op) +{
> > > > > > > > +       struct audit_buffer *ab;
> > > > > > > > +
> > > > > > > > +       if (!audit_contid_set(tsk))
> > > > > > > > +               return 0;
> > > > > > > > +       /* Generate AUDIT_CONTAINER record with
> > > > > > > > container ID */
> > > > > > > > +       ab = audit_log_start(context, GFP_KERNEL,
> > > > > > > > AUDIT_CONTAINER);
> > > > > > > > +       if (!ab)
> > > > > > > > +               return -ENOMEM;
> > > > > > > > +       audit_log_format(ab, "op=%s contid=%llu",
> > > > > > > > +                        op, audit_get_contid(tsk));
> > > > > > > > +       audit_log_end(ab);
> > > > > > > > +       return 0;
> > > > > > > > +}
> > > > > > > > +EXPORT_SYMBOL(audit_log_contid);  
> > > > > > >
> > > > > > > As discussed in the previous iteration of the patch, I
> > > > > > > prefer AUDIT_CONTAINER_ID here over AUDIT_CONTAINER.  If
> > > > > > > you feel strongly about keeping it as-is with
> > > > > > > AUDIT_CONTAINER I suppose I could live with that, but it
> > > > > > > is isn't my first choice.  
> > > > > >
> > > > > > I don't have a strong opinion on this one, mildly
> > > > > > preferring the shorter one only because it is shorter.  
> > > > >
> > > > > We already have multiple AUDIT_CONTAINER* record types, so it
> > > > > seems as though we should use "AUDIT_CONTAINER" as a prefix
> > > > > of sorts, rather than a type itself.  
> > > >
> > > > I'm fine with that.  I'd still like to hear Steve's input.  He
> > > > had stronger opinions than me.  
> > >
> > > The creation event should be separate and distinct from the
> > > continuing use when its used as a supplemental record. IOW,
> > > binding the ID to a container is part of the lifecycle and needs
> > > to be kept distinct.  
> > 
> > Steve's comment is pretty ambiguous when it comes to AUDIT_CONTAINER
> > vs AUDIT_CONTAINER_ID, but one could argue that AUDIT_CONTAINER_ID
> > helps distinguish the audit container id marking record and gets to
> > what I believe is the spirit of Steve's comment.  Taking this in
> > context with my previous remarks, let's switch to using
> > AUDIT_CONTAINER_ID.  
> 
> I suspect Steve is mixing up AUDIT_CONTAINER_OP with
> AUDIT_CONTAINER_ID, confusing the fact that they are two seperate
> records.  As a summary, the suggested records are:
> 	CONTAINER_OP	audit container identifier creation
> 	CONTAINER	audit container identifier aux record to an
> event
> 
> and what Paul is suggesting (which is fine by me) is:
> 	CONTAINER_OP	audit container identifier creation event
> 	CONTAINER_ID	audit container identifier aux record to
> an event
> 
> Steve, please indicate you are fine with this.

I thought it was:

CONTAINER_ID audit container identifier creation event
CONTAINER audit container identifier aux record to an event

Or vice versa. Don't mix up creation of the identifier with operations.

-Steve

^ permalink raw reply

* [PATCH v3 0/2] net: qcom/emac: add shared mdio bus support
From: Wang Dongsheng @ 2018-10-25  8:15 UTC (permalink / raw)
  To: timur, andrew; +Cc: netdev, Wang Dongsheng, yu.zheng, f.fainelli

The emac include MDIO controller, and the motherboard has more than one
PHY connected to an MDIO bus. So share the shared mii_bus for others MAC
device that not has MDIO bus connected.

Based on ACPI, since "phy-handle" cannot directly point to a _DSD
sub-package, so we use "phy-handle" to point an internal MDIO device port.
The port describes the phy address.

Tested: QDF2400 (ACPI), buildin/insmod/rmmod

V3:
 - Add "phy-handle" support.
 - Remove all of DT changes.

V2:
 - Separate patch.

Wang Dongsheng (2):
  net: qcom/emac: split phy_config to mdio bus create and get phy device
  net: qcom/emac: add phy-handle support for ACPI

 drivers/net/ethernet/qualcomm/emac/emac-phy.c | 183 ++++++++++++++----
 1 file changed, 142 insertions(+), 41 deletions(-)

-- 
2.18.0

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox