Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: WARNING in tracepoint_probe_register_prio (3)
From: syzbot @ 2019-08-16 12:32 UTC (permalink / raw)
  To: antoine.tenart, ard.biesheuvel, baruch, bigeasy, davem, gregkh,
	gustavo, jeyu, linux-kernel, mathieu.desnoyers, maxime.chevallier,
	mingo, netdev, paulmck, paulmck, rmk+kernel, rostedt,
	syzkaller-bugs, tglx
In-Reply-To: <000000000000ab6f84056c786b93@google.com>

syzbot has bisected this bug to:

commit ecb9f80db23a7ab09b46b298b404e41dd7aff6e6
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Aug 13 08:00:25 2019 +0000

     net/mvpp2: Replace tasklet with softirq hrtimer

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13ffb9ee600000
start commit:   ecb9f80d net/mvpp2: Replace tasklet with softirq hrtimer
git tree:       net-next
final crash:    https://syzkaller.appspot.com/x/report.txt?x=100079ee600000
console output: https://syzkaller.appspot.com/x/log.txt?x=17ffb9ee600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=d4cf1ffb87d590d7
dashboard link: https://syzkaller.appspot.com/bug?extid=774fddf07b7ab29a1e55
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=11b02a22600000

Reported-by: syzbot+774fddf07b7ab29a1e55@syzkaller.appspotmail.com
Fixes: ecb9f80db23a ("net/mvpp2: Replace tasklet with softirq hrtimer")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

^ permalink raw reply

* Re: [PATCH net-next, 2/6] PCI: hv: Add a Hyper-V PCI mini driver for software backchannel interface
From: Vitaly Kuznetsov @ 2019-08-16 12:27 UTC (permalink / raw)
  To: Haiyang Zhang, sashal@kernel.org, davem@davemloft.net,
	saeedm@mellanox.com, leon@kernel.org, eranbe@mellanox.com,
	lorenzo.pieralisi@arm.com, bhelgaas@google.com,
	linux-pci@vger.kernel.org, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org
  Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
	linux-kernel@vger.kernel.org
In-Reply-To: <1565809632-39138-3-git-send-email-haiyangz@microsoft.com>

Haiyang Zhang <haiyangz@microsoft.com> writes:

> This mini driver is a helper driver allows other drivers to
> have a common interface with the Hyper-V PCI frontend driver.
>
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> ---
>  MAINTAINERS                              |  1 +
>  drivers/pci/Kconfig                      |  1 +
>  drivers/pci/controller/Kconfig           |  7 ++++
>  drivers/pci/controller/Makefile          |  1 +
>  drivers/pci/controller/pci-hyperv-mini.c | 70 ++++++++++++++++++++++++++++++++
>  drivers/pci/controller/pci-hyperv.c      | 12 ++++--
>  include/linux/hyperv.h                   | 30 ++++++++++----
>  7 files changed, 111 insertions(+), 11 deletions(-)
>  create mode 100644 drivers/pci/controller/pci-hyperv-mini.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index e352550..c4962b9 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -7453,6 +7453,7 @@ F:	drivers/hid/hid-hyperv.c
>  F:	drivers/hv/
>  F:	drivers/input/serio/hyperv-keyboard.c
>  F:	drivers/pci/controller/pci-hyperv.c
> +F:	drivers/pci/controller/pci-hyperv-mini.c
>  F:	drivers/net/hyperv/
>  F:	drivers/scsi/storvsc_drv.c
>  F:	drivers/uio/uio_hv_generic.c
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index 2ab9240..bb852f5 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -182,6 +182,7 @@ config PCI_LABEL
>  config PCI_HYPERV
>          tristate "Hyper-V PCI Frontend"
>          depends on X86 && HYPERV && PCI_MSI && PCI_MSI_IRQ_DOMAIN && X86_64
> +	select PCI_HYPERV_MINI
>          help
>            The PCI device frontend driver allows the kernel to import arbitrary
>            PCI devices from a PCI backend to support PCI driver domains.
> diff --git a/drivers/pci/controller/Kconfig b/drivers/pci/controller/Kconfig
> index fe9f9f1..8e31cba 100644
> --- a/drivers/pci/controller/Kconfig
> +++ b/drivers/pci/controller/Kconfig
> @@ -281,5 +281,12 @@ config VMD
>  	  To compile this driver as a module, choose M here: the
>  	  module will be called vmd.
>  
> +config PCI_HYPERV_MINI
> +	tristate "Hyper-V PCI Mini"
> +	depends on X86 && HYPERV && PCI_MSI && PCI_MSI_IRQ_DOMAIN && X86_64
> +	help
> +	  The Hyper-V PCI Mini is a helper driver allows other drivers to
> +	  have a common interface with the Hyper-V PCI frontend driver.
> +

Out of pure curiosity, why not just export this interface from
PCI_HYPERV directly? Why do we need this stub?

>  source "drivers/pci/controller/dwc/Kconfig"
>  endmenu
> diff --git a/drivers/pci/controller/Makefile b/drivers/pci/controller/Makefile
> index d56a507..77e0132 100644
> --- a/drivers/pci/controller/Makefile
> +++ b/drivers/pci/controller/Makefile
> @@ -4,6 +4,7 @@ obj-$(CONFIG_PCIE_CADENCE_HOST) += pcie-cadence-host.o
>  obj-$(CONFIG_PCIE_CADENCE_EP) += pcie-cadence-ep.o
>  obj-$(CONFIG_PCI_FTPCI100) += pci-ftpci100.o
>  obj-$(CONFIG_PCI_HYPERV) += pci-hyperv.o
> +obj-$(CONFIG_PCI_HYPERV_MINI) += pci-hyperv-mini.o
>  obj-$(CONFIG_PCI_MVEBU) += pci-mvebu.o
>  obj-$(CONFIG_PCI_AARDVARK) += pci-aardvark.o
>  obj-$(CONFIG_PCI_TEGRA) += pci-tegra.o
> diff --git a/drivers/pci/controller/pci-hyperv-mini.c b/drivers/pci/controller/pci-hyperv-mini.c
> new file mode 100644
> index 0000000..9b6cd1c
> --- /dev/null
> +++ b/drivers/pci/controller/pci-hyperv-mini.c
> @@ -0,0 +1,70 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) Microsoft Corporation.
> + *
> + * Author:
> + *   Haiyang Zhang <haiyangz@microsoft.com>
> + *
> + * This mini driver is a helper driver allows other drivers to
> + * have a common interface with the Hyper-V PCI frontend driver.
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/hyperv.h>
> +
> +struct hyperv_pci_block_ops hvpci_block_ops;
> +EXPORT_SYMBOL(hvpci_block_ops);
> +
> +int hyperv_read_cfg_blk(struct pci_dev *dev, void *buf, unsigned int buf_len,
> +			unsigned int block_id, unsigned int *bytes_returned)
> +{
> +	if (!hvpci_block_ops.read_block)
> +		return -EOPNOTSUPP;
> +
> +	return hvpci_block_ops.read_block(dev, buf, buf_len, block_id,
> +					  bytes_returned);
> +}
> +EXPORT_SYMBOL(hyperv_read_cfg_blk);
> +
> +int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, unsigned int len,
> +			 unsigned int block_id)
> +{
> +	if (!hvpci_block_ops.write_block)
> +		return -EOPNOTSUPP;
> +
> +	return hvpci_block_ops.write_block(dev, buf, len, block_id);
> +}
> +EXPORT_SYMBOL(hyperv_write_cfg_blk);
> +
> +int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
> +				void (*block_invalidate)(void *context,
> +							 u64 block_mask))
> +{
> +	if (!hvpci_block_ops.reg_blk_invalidate)
> +		return -EOPNOTSUPP;
> +
> +	return hvpci_block_ops.reg_blk_invalidate(dev, context,
> +						  block_invalidate);
> +}
> +EXPORT_SYMBOL(hyperv_reg_block_invalidate);
> +
> +static void __exit exit_hv_pci_mini(void)
> +{
> +	pr_info("unloaded\n");
> +}
> +
> +static int __init init_hv_pci_mini(void)
> +{
> +	pr_info("loaded\n");
> +
> +	return 0;
> +}
> +
> +module_init(init_hv_pci_mini);
> +module_exit(exit_hv_pci_mini);
> +
> +MODULE_DESCRIPTION("Hyper-V PCI Mini");
> +MODULE_LICENSE("GPL v2");
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 57adeca..9c93ac2 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -983,7 +983,6 @@ int hv_read_config_block(struct pci_dev *pdev, void *buf, unsigned int len,
>  	*bytes_returned = comp_pkt.bytes_returned;
>  	return 0;
>  }
> -EXPORT_SYMBOL(hv_read_config_block);
>  
>  /**
>   * hv_pci_write_config_compl() - Invoked when a response packet for a write
> @@ -1070,7 +1069,6 @@ int hv_write_config_block(struct pci_dev *pdev, void *buf, unsigned int len,
>  
>  	return 0;
>  }
> -EXPORT_SYMBOL(hv_write_config_block);
>  
>  /**
>   * hv_register_block_invalidate() - Invoked when a config block invalidation
> @@ -1101,7 +1099,6 @@ int hv_register_block_invalidate(struct pci_dev *pdev, void *context,
>  	return 0;
>  
>  }
> -EXPORT_SYMBOL(hv_register_block_invalidate);
>  
>  /* Interrupt management hooks */
>  static void hv_int_desc_free(struct hv_pci_dev *hpdev,
> @@ -3045,10 +3042,19 @@ static int hv_pci_remove(struct hv_device *hdev)
>  static void __exit exit_hv_pci_drv(void)
>  {
>  	vmbus_driver_unregister(&hv_pci_drv);
> +
> +	hvpci_block_ops.read_block = NULL;
> +	hvpci_block_ops.write_block = NULL;
> +	hvpci_block_ops.reg_blk_invalidate = NULL;
>  }
>  
>  static int __init init_hv_pci_drv(void)
>  {
> +	/* Initialize PCI block r/w interface */
> +	hvpci_block_ops.read_block = hv_read_config_block;
> +	hvpci_block_ops.write_block = hv_write_config_block;
> +	hvpci_block_ops.reg_blk_invalidate = hv_register_block_invalidate;
> +
>  	return vmbus_driver_register(&hv_pci_drv);
>  }
>  
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index 9d37f8c..2afe6fd 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -1579,18 +1579,32 @@ struct vmpacket_descriptor *
>  	    pkt = hv_pkt_iter_next(channel, pkt))
>  
>  /*
> - * Functions for passing data between SR-IOV PF and VF drivers.  The VF driver
> + * Interface for passing data between SR-IOV PF and VF drivers. The VF driver
>   * sends requests to read and write blocks. Each block must be 128 bytes or
>   * smaller. Optionally, the VF driver can register a callback function which
>   * will be invoked when the host says that one or more of the first 64 block
>   * IDs is "invalid" which means that the VF driver should reread them.
>   */
>  #define HV_CONFIG_BLOCK_SIZE_MAX 128
> -int hv_read_config_block(struct pci_dev *dev, void *buf, unsigned int buf_len,
> -			 unsigned int block_id, unsigned int *bytes_returned);
> -int hv_write_config_block(struct pci_dev *dev, void *buf, unsigned int len,
> -			  unsigned int block_id);
> -int hv_register_block_invalidate(struct pci_dev *dev, void *context,
> -				 void (*block_invalidate)(void *context,
> -							  u64 block_mask));
> +
> +int hyperv_read_cfg_blk(struct pci_dev *dev, void *buf, unsigned int buf_len,
> +			unsigned int block_id, unsigned int *bytes_returned);
> +int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, unsigned int len,
> +			 unsigned int block_id);
> +int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
> +				void (*block_invalidate)(void *context,
> +							 u64 block_mask));
> +
> +struct hyperv_pci_block_ops {
> +	int (*read_block)(struct pci_dev *dev, void *buf, unsigned int buf_len,
> +			  unsigned int block_id, unsigned int *bytes_returned);
> +	int (*write_block)(struct pci_dev *dev, void *buf, unsigned int len,
> +			   unsigned int block_id);
> +	int (*reg_blk_invalidate)(struct pci_dev *dev, void *context,
> +				  void (*block_invalidate)(void *context,
> +							   u64 block_mask));
> +};
> +
> +extern struct hyperv_pci_block_ops hvpci_block_ops;
> +
>  #endif /* _HYPERV_H */

-- 
Vitaly

^ permalink raw reply

* Re: [RFC PATCH net-next 04/11] spi: spi-fsl-dspi: Cosmetic cleanup
From: Mark Brown @ 2019-08-16 12:21 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: h.feurstein, mlichvar, richardcochran, andrew, f.fainelli,
	linux-spi, netdev
In-Reply-To: <20190816004449.10100-5-olteanv@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 700 bytes --]

On Fri, Aug 16, 2019 at 03:44:42AM +0300, Vladimir Oltean wrote:
> This patch addresses some cosmetic issues:
> - Alignment
> - Typos
> - (Non-)use of BIT() and GENMASK() macros
> - Unused definitions
> - Unused includes
> - Abuse of ternary operator in detriment of readability
> - Reduce indentation level

This is difficult to review since there's a bunch of largely unrelated
changes all munged into one patch.  It'd be better to split this up so
each change makes one kind of fix, and better to do this separately to
the rest of the series.  In particular having alignment changes along
with other changes hurts reviewability as it's less immediately clear
what's a like for liken substitution.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [RFC PATCH net-next 03/11] spi: Add a PTP system timestamp to the transfer structure
From: Mark Brown @ 2019-08-16 12:18 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: h.feurstein, mlichvar, richardcochran, andrew, f.fainelli,
	linux-spi, netdev
In-Reply-To: <20190816004449.10100-4-olteanv@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 332 bytes --]

On Fri, Aug 16, 2019 at 03:44:41AM +0300, Vladimir Oltean wrote:

> @@ -842,6 +843,9 @@ struct spi_transfer {
>  
>  	u32		effective_speed_hz;
>  
> +	struct ptp_system_timestamp *ptp_sts;
> +	unsigned int	ptp_sts_word_offset;
> +

You've not documented these fields at all so it's not clear what the
intended usage is.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* r8169: Performance regression and latency instability
From: Juliana Rodrigueiro @ 2019-08-16 12:09 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, hkallweit1

Greetings!

During migration from kernel 3.14 to 4.19, we noticed a regression on 
the network performance. Under the exact same circumstances, the 
standard deviation of the latency is more than double than before on the 
Realtek RTL8111/8168B (10ec:8168) using the r8169 driver.

Kernel 3.14:
     # netperf -v 2 -P 0 -H <netserver-IP>,4 -I 99,5 -t omni -l 1 -- -O 
STDDEV_LATENCY -m 64K -d Send
     313.37

Kernel 4.19:
     # netperf -v 2 -P 0 -H <netserver-IP>,4 -I 99,5 -t omni -l 1 -- -O 
STDDEV_LATENCY -m 64K -d Send
     632.96

In contrast, we noticed small improvements in performance with other 
non-Realtek network cards (igb, tg3). Which suggested a possible driver 
related bug.

However after bisecting the code, I ended up with the following patch, 
which was introduced in kernel 4.17 and modifies net/ipv4:

     commit 0a6b2a1dc2a2105f178255fe495eb914b09cb37a
     Author: Eric Dumazet <edumazet@google.com>
     Date:   Mon Feb 19 11:56:47 2018 -0800

         tcp: switch to GSO being always on

Could you please help me to clarify, should GSO be always on on my 
device? Or does it just affect TCP? According to ethtool it is always 
off, "ethtool -K eth0 gso on" has no effect, unless I switch SG on.

     # ethtool -k eth0
     Offload parameters for eth0:
     Cannot get device udp large send offload settings: Operation not 
supported
     rx-checksumming: on
     tx-checksumming: off
     scatter-gather: off
     tcp-segmentation-offload: off
     udp-fragmentation-offload: off
     generic-segmentation-offload: off
     generic-receive-offload: on
     large-receive-offload: off

I validated that reverting "tcp: switch to GSO being always on" 
successfully brings back the better performance for the r8169 driver.

I'm sure that reverting that commit is not the optimal solution, so I 
would like to kindly ask for help to shed some light in this issue.

Best regards,
Juliana.

^ permalink raw reply

* Re: [PATCH bpf-next] libbpf: relicense bpf_helpers.h and bpf_endian.h
From: Jesper Dangaard Brouer @ 2019-08-16 12:10 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: brouer, bpf, netdev, andrii.nakryiko, kernel-team,
	Michael Holzheu, Naveen N . Rao, David S . Miller,
	Michal Rostecki, John Fastabend, Sargun Dhillon
In-Reply-To: <20190816054543.2215626-1-andriin@fb.com>

On Thu, 15 Aug 2019 22:45:43 -0700
Andrii Nakryiko <andriin@fb.com> wrote:

> bpf_helpers.h and bpf_endian.h contain useful macros and BPF helper
> definitions essential to almost every BPF program. Which makes them
> useful not just for selftests. To be able to expose them as part of
> libbpf, though, we need them to be dual-licensed as LGPL-2.1 OR
> BSD-2-Clause. This patch updates licensing of those two files.

I've already ACKed this, and is fine with (LGPL-2.1 OR BSD-2-Clause).

I just want to understand, why "BSD-2-Clause" and not "Apache-2.0" ?

The original argument was that this needed to be compatible with
"Apache-2.0", then why not simply add this in the "OR" ?

> Acked-by: Alexei Starovoitov <ast@kernel.org>
> Acked-by: Hechao Li <hechaol@fb.com>
> Acked-by: Martin KaFai Lau <kafai@fb.com>
> Acked-by: Andrey Ignatov <rdna@fb.com>
> Acked-by: Yonghong Song <yhs@fb.com>
> Acked-by: Lawrence Brakmo <brakmo@fb.com>
> Acked-by: Adam Barth <arb@fb.com>
> Acked-by: Roman Gushchin <guro@fb.com>
> Acked-by: Josef Bacik <jbacik@fb.com>
> Acked-by: Joe Stringer <joe@wand.net.nz>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Acked-by: David Ahern <dsahern@gmail.com>
> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

Confirming I acked this.

> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
> Acked-by: Lorenz Bauer <lmb@cloudflare.com>
> Acked-by: Adrian Ratiu <adrian.ratiu@collabora.com>
> Acked-by: Nikita V. Shirokov <tehnerd@tehnerd.com>
> Acked-by: Willem de Bruijn <willemb@google.com>
> Acked-by: Petar Penkov <ppenkov@google.com>
> Acked-by: Teng Qin <palmtenor@gmail.com>
> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Michal Rostecki <mrostecki@opensuse.org>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Sargun Dhillon <sargun@sargun.me>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
>  tools/testing/selftests/bpf/bpf_endian.h  | 2 +-
>  tools/testing/selftests/bpf/bpf_helpers.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/bpf_endian.h b/tools/testing/selftests/bpf/bpf_endian.h
> index 05f036df8a4c..ff3593b0ae03 100644
> --- a/tools/testing/selftests/bpf/bpf_endian.h
> +++ b/tools/testing/selftests/bpf/bpf_endian.h
> @@ -1,4 +1,4 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
>  #ifndef __BPF_ENDIAN__
>  #define __BPF_ENDIAN__
>  
> diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
> index 8b503ea142f0..6c4930bc6e2e 100644
> --- a/tools/testing/selftests/bpf/bpf_helpers.h
> +++ b/tools/testing/selftests/bpf/bpf_helpers.h
> @@ -1,4 +1,4 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
>  #ifndef __BPF_HELPERS_H
>  #define __BPF_HELPERS_H
>  



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [EXT] Re: [PATCH net-next 1/1] Added BASE-T1 PHY support to PHY Subsystem
From: Christian Herber @ 2019-08-16 12:05 UTC (permalink / raw)
  To: Heiner Kallweit, Andrew Lunn
  Cc: davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Florian Fainelli
In-Reply-To: <2ca68436-8e49-b0b2-2460-4fcac3094b09@gmail.com>

On 15.08.2019 18:34, Heiner Kallweit wrote:
> Caution: EXT Email
> 
> On 15.08.2019 17:56, Andrew Lunn wrote:
>> On Thu, Aug 15, 2019 at 03:32:29PM +0000, Christian Herber wrote:
>>> BASE-T1 is a category of Ethernet PHYs.
>>> They use a single copper pair for transmission.
>>> This patch add basic support for this category of PHYs.
>>> It coveres the discovery of abilities and basic configuration.
>>> It includes setting fixed speed and enabling auto-negotiation.
>>> BASE-T1 devices should always Clause-45 managed.
>>> Therefore, this patch extends phy-c45.c.
>>> While for some functions like auto-neogtiation different registers are
>>> used, the layout of these registers is the same for the used fields.
>>> Thus, much of the logic of basic Clause-45 devices can be reused.
>>>
>>> Signed-off-by: Christian Herber <christian.herber@nxp.com>
>>> ---
>>>   drivers/net/phy/phy-c45.c    | 113 +++++++++++++++++++++++++++++++----
>>>   drivers/net/phy/phy-core.c   |   4 +-
>>>   include/uapi/linux/ethtool.h |   2 +
>>>   include/uapi/linux/mdio.h    |  21 +++++++
>>>   4 files changed, 129 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/net/phy/phy-c45.c b/drivers/net/phy/phy-c45.c
>>> index b9d4145781ca..9ff0b8c785de 100644
>>> --- a/drivers/net/phy/phy-c45.c
>>> +++ b/drivers/net/phy/phy-c45.c
>>> @@ -8,13 +8,23 @@
>>>   #include <linux/mii.h>
>>>   #include <linux/phy.h>
>>>
>>> +#define IS_100BASET1(phy) (linkmode_test_bit( \
>>> +                       ETHTOOL_LINK_MODE_100baseT1_Full_BIT, \
>>> +                       (phy)->supported))
>>> +#define IS_1000BASET1(phy) (linkmode_test_bit( \
>>> +                        ETHTOOL_LINK_MODE_1000baseT1_Full_BIT, \
>>> +                        (phy)->supported))
>>
>> Hi Christian
>>
>> We already have the flag phydev->is_gigabit_capable. Maybe add a flag
>> phydev->is_t1_capable
>>
>>> +
>>> +static u32 get_aneg_ctrl(struct phy_device *phydev);
>>> +static u32 get_aneg_stat(struct phy_device *phydev);
>>
>> No forward declarations please. Put the code in the right order so
>> they are not needed.
>>
>> Thanks
>>
>>       Andrew
>>
> 
> For whatever reason I don't have the original mail in my netdev inbox (yet).
> 
> +       if (IS_100BASET1(phydev) || IS_1000BASET1(phydev))
> +               ctrl = MDIO_AN_BT1_CTRL;
> 
> Code like this could be problematic once a PHY supports one of the T1 modes
> AND normal modes. Then normal modes would be unusable.
> 
> I think this scenario isn't completely hypothetical. See the Aquantia
> AQCS109 that supports normal modes and (proprietary) 1000Base-T2.
> 
> Maybe we need separate versions of the generic functions for T1.
> Then it would be up to the PHY driver to decide when to use which
> version.
> 
> Heiner
> 

Integrating this with the existing driver or creating a new on is an 
interesting question. I came to the conclusion that it is most efficient 
to integrate to avoid all to much copy paste code.

So far, I am not aware of any device that supports T1 and something else 
at the same time. From a HW perspective, I also consider this quite 
unlikely. In the unlikely case that such a device comes up, support from 
the genphy driver would be limited to the BASE-T1 modes. But i would 
rather create the special case for the special device and cater the 
current mainstream support to the mainstream devices.

I think this boils down to a general strategy for the PHY framework, as 
this can happen for other classes of devices also like NGBASE-T1, 
MultiGBASE-T and future unknown devices. For now, I think the 
architecture is sufficiently scalable with a single c45 genphy driver

Christian

^ permalink raw reply

* Re: [EXT] Re: [PATCH net-next 1/1] Added BASE-T1 PHY support to PHY Subsystem
From: Christian Herber @ 2019-08-16 11:56 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20190815155613.GE15291@lunn.ch>

On 15.08.2019 17:56, Andrew Lunn wrote:
> Caution: EXT Email
> 
> On Thu, Aug 15, 2019 at 03:32:29PM +0000, Christian Herber wrote:
>> BASE-T1 is a category of Ethernet PHYs.
>> They use a single copper pair for transmission.
>> This patch add basic support for this category of PHYs.
>> It coveres the discovery of abilities and basic configuration.
>> It includes setting fixed speed and enabling auto-negotiation.
>> BASE-T1 devices should always Clause-45 managed.
>> Therefore, this patch extends phy-c45.c.
>> While for some functions like auto-neogtiation different registers are
>> used, the layout of these registers is the same for the used fields.
>> Thus, much of the logic of basic Clause-45 devices can be reused.
>>
>> Signed-off-by: Christian Herber <christian.herber@nxp.com>
>> ---
>>   drivers/net/phy/phy-c45.c    | 113 +++++++++++++++++++++++++++++++----
>>   drivers/net/phy/phy-core.c   |   4 +-
>>   include/uapi/linux/ethtool.h |   2 +
>>   include/uapi/linux/mdio.h    |  21 +++++++
>>   4 files changed, 129 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/net/phy/phy-c45.c b/drivers/net/phy/phy-c45.c
>> index b9d4145781ca..9ff0b8c785de 100644
>> --- a/drivers/net/phy/phy-c45.c
>> +++ b/drivers/net/phy/phy-c45.c
>> @@ -8,13 +8,23 @@
>>   #include <linux/mii.h>
>>   #include <linux/phy.h>
>>
>> +#define IS_100BASET1(phy) (linkmode_test_bit( \
>> +                        ETHTOOL_LINK_MODE_100baseT1_Full_BIT, \
>> +                        (phy)->supported))
>> +#define IS_1000BASET1(phy) (linkmode_test_bit( \
>> +                         ETHTOOL_LINK_MODE_1000baseT1_Full_BIT, \
>> +                         (phy)->supported))
> 
> Hi Christian
> 
> We already have the flag phydev->is_gigabit_capable. Maybe add a flag
> phydev->is_t1_capable
> 
>> +
>> +static u32 get_aneg_ctrl(struct phy_device *phydev);
>> +static u32 get_aneg_stat(struct phy_device *phydev);
> 
> No forward declarations please. Put the code in the right order so
> they are not needed.
> 
> Thanks
> 
>       Andrew
> 

Hi Andrew,

thanks for feedback. The use of an additional flag is a good proposal.
I was hesitant to touch the phydev structure.
I will add this along with removing the forward declaration in v2.

Regards,

Christian

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Jordan Glover @ 2019-08-16 11:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Alexei Starovoitov, Andy Lutomirski, Daniel Colascione, Song Liu,
	Kees Cook, Networking, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Lorenz Bauer, Jann Horn, Greg KH, Linux API,
	LSM List
In-Reply-To: <alpine.DEB.2.21.1908161158490.1873@nanos.tec.linutronix.de>

On Friday, August 16, 2019 9:59 AM, Thomas Gleixner <tglx@linutronix.de> wrote:

> On Fri, 16 Aug 2019, Jordan Glover wrote:
>
> > "systemd --user" service? Trying to do so will fail with:
> > "Failed to apply ambient capabilities (before UID change): Operation not permitted"
> > I think it's crucial to clear that point to avoid confusion in this discussion
> > where people are talking about different things.
> > On the other hand running "systemd --system" service with:
> > User=nobody
> > AmbientCapabilities=CAP_NET_ADMIN
> > is perfectly legit and clears some security concerns as only privileged user
> > can start such service.
>
> While we are at it, can we please stop looking at this from a systemd only
> perspective. There is a world outside of systemd.
>
> Thanks,
>
> tglx

If you define:

"systemd --user" == unprivileged process started by unprivileged user
"systemd --system" == process started by privileged user but run as another
user which keeps some of parent user privileges and drops others

you can get rid of "systemd" from the equation.

"systemd --user" was the example provided by Alexei when asked about the usecase
but his description didn't match what it does so it's not obvious what the real
usecase is. I'm sure there can be many more examples and systemd isn't important
here in particular beside to understand this specific example.

Jordan

^ permalink raw reply

* Re: [PATCH net] tunnel: fix dev null pointer dereference when send pkg larger than mtu in collect_md mode
From: Hangbin Liu @ 2019-08-16 10:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Stefano Brivio, wenxu, Alexei Starovoitov,
	David S . Miller
In-Reply-To: <fe103dee-bba8-1e0d-83b2-f91c2c37089d@gmail.com>

On Fri, Aug 16, 2019 at 10:23:55AM +0200, Eric Dumazet wrote:
> 
> 
> On 8/16/19 5:24 AM, Hangbin Liu wrote:
> > Hi Eric,
> > 
> > Thanks for the review.
> > On Thu, Aug 15, 2019 at 11:16:58AM +0200, Eric Dumazet wrote:
> >>> diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
> >>> index 38c02bb62e2c..c6713c7287df 100644
> >>> --- a/net/ipv4/ip_tunnel.c
> >>> +++ b/net/ipv4/ip_tunnel.c
> >>> @@ -597,6 +597,9 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
> >>>  		goto tx_error;
> >>>  	}
> >>>  
> >>> +	if (skb_dst(skb) && !skb_dst(skb)->dev)
> >>> +		skb_dst(skb)->dev = rt->dst.dev;
> >>> +
> >>
> >>
> >> IMO this looks wrong.
> >> This dst seems shared. 
> > 
> > If the dst is shared, it may cause some problem. Could you point me where the
> > dst may be shared possibly?
> >
> 
> dst are inherently shared.
> 
> This is why we have a refcount on them.
> 
> Only when the dst has been allocated by the current thread we can make changes on them.
> 

OK, I see now.

Then how about fix the issue in __icmp_send and decode_session{4,6}. The
fix in there is safe, as in __icmp_send() we only want to get net,
dev_net(skb_in->dev) could also do the work, just as icmp6_send() does.

For decode_session{4,6} the oif is also not needed in this scenario as this
is called by xfrm_decode_session_reverse(), we only need the skb_iif
fl4->flowi4_oif = reverse ? skb->skb_iif : oif;

I also need to check more code in OVS..

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 1510e951f451..95d803543df5 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -582,7 +582,11 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,

        if (!rt)
                goto out;
-       net = dev_net(rt->dst.dev);
+
+       if (skb_in->dev)
+               net = dev_net(skb_in->dev);
+       else
+               goto out;

        /*
         *      Find the original header. It is expected to be valid, of course.
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 8ca637a72697..ec94f5795ea4 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -3269,7 +3269,7 @@ decode_session4(struct sk_buff *skb, struct flowi *fl, bool reverse)
        struct flowi4 *fl4 = &fl->u.ip4;
        int oif = 0;

-       if (skb_dst(skb))
+       if (skb_dst(skb) && skb_dst(skb)->dev)
                oif = skb_dst(skb)->dev->ifindex;

        memset(fl4, 0, sizeof(struct flowi4));
@@ -3387,7 +3387,7 @@ decode_session6(struct sk_buff *skb, struct flowi *fl, bool reverse)

        nexthdr = nh[nhoff];

-       if (skb_dst(skb))
+       if (skb_dst(skb) && skb_dst(skb)->dev)
                oif = skb_dst(skb)->dev->ifindex;

        memset(fl6, 0, sizeof(struct flowi6));


Thanks
Hangbin

^ permalink raw reply related

* Re: [PATCH v2 00/10] Add definition for the number of standard PCI BARs
From: Andrew Murray @ 2019-08-16 10:51 UTC (permalink / raw)
  To: Denis Efremov
  Cc: Bjorn Helgaas, linux-kernel, linux-pci, Sebastian Ott,
	Gerald Schaefer, H. Peter Anvin, Giuseppe Cavallaro,
	Alexandre Torgue, Matt Porter, Alexandre Bounine, Peter Jones,
	Bartlomiej Zolnierkiewicz, Cornelia Huck, Alex Williamson,
	Jose Abreu, kvm, linux-fbdev, netdev, x86, linux-s390
In-Reply-To: <20190816092437.31846-1-efremov@linux.com>

On Fri, Aug 16, 2019 at 12:24:27PM +0300, Denis Efremov wrote:
> Code that iterates over all standard PCI BARs typically uses
> PCI_STD_RESOURCE_END, but this is error-prone because it requires
> "i <= PCI_STD_RESOURCE_END" rather than something like
> "i < PCI_STD_NUM_BARS". We could add such a definition and use it the same
> way PCI_SRIOV_NUM_BARS is used. There is already the definition
> PCI_BAR_COUNT for s390 only. Thus, this patchset introduces it globally.
> 
> Changes in v2:
>   - Reverse checks in pci_iomap_range,pci_iomap_wc_range.
>   - Refactor loops in vfio_pci to keep PCI_STD_RESOURCES.
>   - Add 2 new patches to replace the magic constant with new define.
>   - Split net patch in v1 to separate stmmac and dwc-xlgmac patches.
> 
> Denis Efremov (10):
>   PCI: Add define for the number of standard PCI BARs
>   s390/pci: Loop using PCI_STD_NUM_BARS
>   x86/PCI: Loop using PCI_STD_NUM_BARS
>   stmmac: pci: Loop using PCI_STD_NUM_BARS
>   net: dwc-xlgmac: Loop using PCI_STD_NUM_BARS
>   rapidio/tsi721: Loop using PCI_STD_NUM_BARS
>   efifb: Loop using PCI_STD_NUM_BARS
>   vfio_pci: Loop using PCI_STD_NUM_BARS
>   PCI: hv: Use PCI_STD_NUM_BARS
>   PCI: Use PCI_STD_NUM_BARS
> 
>  arch/s390/include/asm/pci.h                      |  5 +----
>  arch/s390/include/asm/pci_clp.h                  |  6 +++---
>  arch/s390/pci/pci.c                              | 16 ++++++++--------
>  arch/s390/pci/pci_clp.c                          |  6 +++---
>  arch/x86/pci/common.c                            |  2 +-
>  drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c |  4 ++--
>  drivers/net/ethernet/synopsys/dwc-xlgmac-pci.c   |  2 +-
>  drivers/pci/controller/pci-hyperv.c              | 10 +++++-----
>  drivers/pci/pci.c                                | 11 ++++++-----
>  drivers/pci/quirks.c                             |  4 ++--
>  drivers/rapidio/devices/tsi721.c                 |  2 +-
>  drivers/vfio/pci/vfio_pci.c                      | 11 +++++++----
>  drivers/vfio/pci/vfio_pci_config.c               | 10 ++++++----
>  drivers/vfio/pci/vfio_pci_private.h              |  4 ++--
>  drivers/video/fbdev/efifb.c                      |  2 +-
>  include/linux/pci.h                              |  2 +-
>  include/uapi/linux/pci_regs.h                    |  1 +
>  17 files changed, 51 insertions(+), 47 deletions(-)

I've come across a few more places where this change can be made. There
may be multiple instances in the same file, but only the first is shown
below:

drivers/misc/pci_endpoint_test.c:       for (bar = BAR_0; bar <= BAR_5; bar++) {
drivers/net/ethernet/intel/e1000/e1000_main.c:          for (i = BAR_1; i <= BAR_5; i++) {
drivers/net/ethernet/intel/ixgb/ixgb_main.c:    for (i = BAR_1; i <= BAR_5; i++) {
drivers/pci/controller/dwc/pci-dra7xx.c:        for (bar = BAR_0; bar <= BAR_5; bar++)
drivers/pci/controller/dwc/pci-layerscape-ep.c: for (bar = BAR_0; bar <= BAR_5; bar++)
drivers/pci/controller/dwc/pcie-artpec6.c:      for (bar = BAR_0; bar <= BAR_5; bar++)
drivers/pci/controller/dwc/pcie-designware-plat.c:      for (bar = BAR_0; bar <= BAR_5; bar++)
drivers/pci/endpoint/functions/pci-epf-test.c:  for (bar = BAR_0; bar <= BAR_5; bar++) {
include/linux/pci-epc.h:        u64     bar_fixed_size[BAR_5 + 1];
drivers/scsi/pm8001/pm8001_hwi.c:       for (bar = 0; bar < 6; bar++) {
drivers/scsi/pm8001/pm8001_init.c:      for (bar = 0; bar < 6; bar++) {
drivers/ata/sata_nv.c:  for (bar = 0; bar < 6; bar++)
drivers/video/fbdev/core/fbmem.c:       for (idx = 0, bar = 0; bar < PCI_ROM_RESOURCE; bar++) {
drivers/staging/gasket/gasket_core.c:   for (i = 0; i < GASKET_NUM_BARS; i++) {
drivers/tty/serial/8250/8250_pci.c:     for (i = 0; i < PCI_NUM_BAR_RESOURCES; i++) { <-----------

It looks like BARs are often iterated with PCI_NUM_BAR_RESOURCES, there
are a load of these too found with:

git grep PCI_ROM_RESOURCE | grep "< "

I'm happy to share patches if preferred.

Thanks,

Andrew Murray

> 
> -- 
> 2.21.0
> 

^ permalink raw reply

* RE: [PATCH v2] rtw88: pci: Move a mass of jobs in hw IRQ to soft IRQ
From: Tony Chuang @ 2019-08-16 10:44 UTC (permalink / raw)
  To: Jian-Hong Pan, Kalle Valo, David S . Miller
  Cc: linux-wireless@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux@endlessm.com
In-Reply-To: <20190816100903.7549-1-jian-hong@endlessm.com>

> From: Jian-Hong Pan
> 
> There is a mass of jobs between spin lock and unlock in the hardware
> IRQ which will occupy much time originally. To make system work more
> efficiently, this patch moves the jobs to the soft IRQ (bottom half) to
> reduce the time in hardware IRQ.
> 
> Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
> ---
> v2:
>  Change the spin_lock_irqsave/unlock_irqrestore to spin_lock/unlock in
>  rtw_pci_interrupt_handler. Because the interrupts are already disabled
>  in the hardware interrupt handler.
> 
>  drivers/net/wireless/realtek/rtw88/pci.c | 33 +++++++++++++++++++-----
>  1 file changed, 27 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/wireless/realtek/rtw88/pci.c
> b/drivers/net/wireless/realtek/rtw88/pci.c
> index 00ef229552d5..0740140d7e46 100644
> --- a/drivers/net/wireless/realtek/rtw88/pci.c
> +++ b/drivers/net/wireless/realtek/rtw88/pci.c
> @@ -866,12 +866,28 @@ static irqreturn_t rtw_pci_interrupt_handler(int irq,
> void *dev)
>  {
>  	struct rtw_dev *rtwdev = dev;
>  	struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
> -	u32 irq_status[4];
> 
>  	spin_lock(&rtwpci->irq_lock);
>  	if (!rtwpci->irq_enabled)
>  		goto out;
> 
> +	/* disable RTW PCI interrupt to avoid more interrupts before the end of
> +	 * thread function
> +	 */
> +	rtw_pci_disable_interrupt(rtwdev, rtwpci);

So basically it's to prevent back-to-back interrupts.

Nothing wrong about it, I just wondering why we don't like
back-to-back interrupts. Does it means that those interrupts
fired between irq_handler and threadfin would increase
much more time to consume them.

> +out:
> +	spin_unlock(&rtwpci->irq_lock);
> +
> +	return IRQ_WAKE_THREAD;
> +}
> +
> +static irqreturn_t rtw_pci_interrupt_threadfn(int irq, void *dev)
> +{
> +	struct rtw_dev *rtwdev = dev;
> +	struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
> +	unsigned long flags;
> +	u32 irq_status[4];
> +
>  	rtw_pci_irq_recognized(rtwdev, rtwpci, irq_status);
> 
>  	if (irq_status[0] & IMR_MGNTDOK)
> @@ -891,8 +907,11 @@ static irqreturn_t rtw_pci_interrupt_handler(int irq,
> void *dev)
>  	if (irq_status[0] & IMR_ROK)
>  		rtw_pci_rx_isr(rtwdev, rtwpci, RTW_RX_QUEUE_MPDU);
> 
> -out:
> -	spin_unlock(&rtwpci->irq_lock);
> +	/* all of the jobs for this interrupt have been done */
> +	spin_lock_irqsave(&rtwpci->irq_lock, flags);

I suggest to protect the ISRs. Because next patches will require
to check if the TX DMA path is empty. This means I will also add
this rtwpci->irq_lock to the TX path, and check if the skb_queue
does not have any pending SKBs not DMAed successfully.

> +	if (rtw_flag_check(rtwdev, RTW_FLAG_RUNNING))

Why check the flag here? Is there any racing or something?
Otherwise it looks to break the symmetry.

> +		rtw_pci_enable_interrupt(rtwdev, rtwpci);
> +	spin_unlock_irqrestore(&rtwpci->irq_lock, flags);
> 
>  	return IRQ_HANDLED;
>  }
> @@ -1152,8 +1171,10 @@ static int rtw_pci_probe(struct pci_dev *pdev,
>  		goto err_destroy_pci;
>  	}
> 
> -	ret = request_irq(pdev->irq, &rtw_pci_interrupt_handler,
> -			  IRQF_SHARED, KBUILD_MODNAME, rtwdev);
> +	ret = devm_request_threaded_irq(rtwdev->dev, pdev->irq,
> +					rtw_pci_interrupt_handler,
> +					rtw_pci_interrupt_threadfn,
> +					IRQF_SHARED, KBUILD_MODNAME, rtwdev);
>  	if (ret) {
>  		ieee80211_unregister_hw(hw);
>  		goto err_destroy_pci;
> @@ -1192,7 +1213,7 @@ static void rtw_pci_remove(struct pci_dev *pdev)
>  	rtw_pci_disable_interrupt(rtwdev, rtwpci);
>  	rtw_pci_destroy(rtwdev, pdev);
>  	rtw_pci_declaim(rtwdev, pdev);
> -	free_irq(rtwpci->pdev->irq, rtwdev);
> +	devm_free_irq(rtwdev->dev, rtwpci->pdev->irq, rtwdev);
>  	rtw_core_deinit(rtwdev);
>  	ieee80211_free_hw(hw);
>  }
> --
> 2.20.1

Yan-Hsuan


^ permalink raw reply

* [PATCH v2] tun: fix use-after-free when register netdev failed
From: Yang Yingliang @ 2019-08-16 11:00 UTC (permalink / raw)
  To: netdev
  Cc: jasowang, eric.dumazet, xiyou.wangcong, davem, yangyingliang,
	weiyongjun1

I got a UAF repport in tun driver when doing fuzzy test:

[  466.269490] ==================================================================
[  466.271792] BUG: KASAN: use-after-free in tun_chr_read_iter+0x2ca/0x2d0
[  466.271806] Read of size 8 at addr ffff888372139250 by task tun-test/2699
[  466.271810]
[  466.271824] CPU: 1 PID: 2699 Comm: tun-test Not tainted 5.3.0-rc1-00001-g5a9433db2614-dirty #427
[  466.271833] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[  466.271838] Call Trace:
[  466.271858]  dump_stack+0xca/0x13e
[  466.271871]  ? tun_chr_read_iter+0x2ca/0x2d0
[  466.271890]  print_address_description+0x79/0x440
[  466.271906]  ? vprintk_func+0x5e/0xf0
[  466.271920]  ? tun_chr_read_iter+0x2ca/0x2d0
[  466.271935]  __kasan_report+0x15c/0x1df
[  466.271958]  ? tun_chr_read_iter+0x2ca/0x2d0
[  466.271976]  kasan_report+0xe/0x20
[  466.271987]  tun_chr_read_iter+0x2ca/0x2d0
[  466.272013]  do_iter_readv_writev+0x4b7/0x740
[  466.272032]  ? default_llseek+0x2d0/0x2d0
[  466.272072]  do_iter_read+0x1c5/0x5e0
[  466.272110]  vfs_readv+0x108/0x180
[  466.299007]  ? compat_rw_copy_check_uvector+0x440/0x440
[  466.299020]  ? fsnotify+0x888/0xd50
[  466.299040]  ? __fsnotify_parent+0xd0/0x350
[  466.299064]  ? fsnotify_first_mark+0x1e0/0x1e0
[  466.304548]  ? vfs_write+0x264/0x510
[  466.304569]  ? ksys_write+0x101/0x210
[  466.304591]  ? do_preadv+0x116/0x1a0
[  466.304609]  do_preadv+0x116/0x1a0
[  466.309829]  do_syscall_64+0xc8/0x600
[  466.309849]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  466.309861] RIP: 0033:0x4560f9
[  466.309875] Code: 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
[  466.309889] RSP: 002b:00007ffffa5166e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000127
[  466.322992] RAX: ffffffffffffffda RBX: 0000000000400460 RCX: 00000000004560f9
[  466.322999] RDX: 0000000000000003 RSI: 00000000200008c0 RDI: 0000000000000003
[  466.323007] RBP: 00007ffffa516700 R08: 0000000000000004 R09: 0000000000000000
[  466.323014] R10: 0000000000000000 R11: 0000000000000206 R12: 000000000040cb10
[  466.323021] R13: 0000000000000000 R14: 00000000006d7018 R15: 0000000000000000
[  466.323057]
[  466.323064] Allocated by task 2605:
[  466.335165]  save_stack+0x19/0x80
[  466.336240]  __kasan_kmalloc.constprop.8+0xa0/0xd0
[  466.337755]  kmem_cache_alloc+0xe8/0x320
[  466.339050]  getname_flags+0xca/0x560
[  466.340229]  user_path_at_empty+0x2c/0x50
[  466.341508]  vfs_statx+0xe6/0x190
[  466.342619]  __do_sys_newstat+0x81/0x100
[  466.343908]  do_syscall_64+0xc8/0x600
[  466.345303]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  466.347034]
[  466.347517] Freed by task 2605:
[  466.348471]  save_stack+0x19/0x80
[  466.349476]  __kasan_slab_free+0x12e/0x180
[  466.350726]  kmem_cache_free+0xc8/0x430
[  466.351874]  putname+0xe2/0x120
[  466.352921]  filename_lookup+0x257/0x3e0
[  466.354319]  vfs_statx+0xe6/0x190
[  466.355498]  __do_sys_newstat+0x81/0x100
[  466.356889]  do_syscall_64+0xc8/0x600
[  466.358037]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  466.359567]
[  466.360050] The buggy address belongs to the object at ffff888372139100
[  466.360050]  which belongs to the cache names_cache of size 4096
[  466.363735] The buggy address is located 336 bytes inside of
[  466.363735]  4096-byte region [ffff888372139100, ffff88837213a100)
[  466.367179] The buggy address belongs to the page:
[  466.368604] page:ffffea000dc84e00 refcount:1 mapcount:0 mapping:ffff8883df1b4f00 index:0x0 compound_mapcount: 0
[  466.371582] flags: 0x2fffff80010200(slab|head)
[  466.372910] raw: 002fffff80010200 dead000000000100 dead000000000122 ffff8883df1b4f00
[  466.375209] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[  466.377778] page dumped because: kasan: bad access detected
[  466.379730]
[  466.380288] Memory state around the buggy address:
[  466.381844]  ffff888372139100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.384009]  ffff888372139180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.386131] >ffff888372139200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.388257]                                                  ^
[  466.390234]  ffff888372139280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.392512]  ffff888372139300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.394667] ==================================================================

tun_chr_read_iter() accessed the memory which freed by free_netdev()
called by tun_set_iff():

        CPUA                                     CPUB
    tun_set_iff()
      alloc_netdev_mqs()
      tun_attach()
                                            tun_chr_read_iter()
                                              tun_get()
      register_netdevice() <-- inject error
      tun_detach_all()
        synchronize_net()
                                              tun_do_read()
                                                tun_ring_recv()
                                                  schedule()
      free_netdev()
        netdev_freemem()
                                              tun_put()
                                                dev_put() <-- UAF

Move tun_set_real_num_queues() out of tun_attach() and call it
before register_netdevice() in tun_set_iff().

Call tun_attach() after register_netdevice() to make sure tfile->tun
is not published until the netdevice is registered. So the read/write
thread can not use the tun pointer that may freed by free_netdev().
(The tun and dev pointer are allocated by alloc_netdev_mqs(), they can
be freed by netdev_freemem().)

---
Changes in v2:
 - add a param in tun_set_real_num_queues()
 - move tun_set_real_num_queues() out of tun_attach()
 - call tun_set_real_num_queues() before register_netdevice()
 - call tun_attach() after register_netdevice()
---

Cc: Yang Yingliang <yangyingliang@huawei.com>
Fixes: eb0fb363f920 ("tuntap: attach queue 0 before registering netdevice")
Reported-by: Hulk Robot <hulkci@huawei.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 drivers/net/tun.c | 38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index db16d7a13e00..a19f864c5f8d 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -626,10 +626,11 @@ static inline bool tun_not_capable(struct tun_struct *tun)
 		!ns_capable(net->user_ns, CAP_NET_ADMIN);
 }
 
-static void tun_set_real_num_queues(struct tun_struct *tun)
+static void tun_set_real_num_queues(struct tun_struct *tun,
+				    unsigned int numqueues)
 {
-	netif_set_real_num_tx_queues(tun->dev, tun->numqueues);
-	netif_set_real_num_rx_queues(tun->dev, tun->numqueues);
+	netif_set_real_num_tx_queues(tun->dev, numqueues);
+	netif_set_real_num_rx_queues(tun->dev, numqueues);
 }
 
 static void tun_disable_queue(struct tun_struct *tun, struct tun_file *tfile)
@@ -708,7 +709,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
 		tun_flow_delete_by_queue(tun, tun->numqueues + 1);
 		/* Drop read queue */
 		tun_queue_purge(tfile);
-		tun_set_real_num_queues(tun);
+		tun_set_real_num_queues(tun, tun->numqueues);
 	} else if (tfile->detached && clean) {
 		tun = tun_enable_queue(tfile);
 		sock_put(&tfile->sk);
@@ -873,7 +874,6 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
 	rcu_assign_pointer(tfile->tun, tun);
 	rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
 	tun->numqueues++;
-	tun_set_real_num_queues(tun);
 out:
 	return err;
 }
@@ -2734,6 +2734,8 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		if (err < 0)
 			return err;
 
+		tun_set_real_num_queues(tun, tun->numqueues);
+
 		if (tun->flags & IFF_MULTI_QUEUE &&
 		    (tun->numqueues + tun->numdisabled > 1)) {
 			/* One or more queue has already been attached, no need
@@ -2828,14 +2830,18 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 			      (ifr->ifr_flags & TUN_FEATURES);
 
 		INIT_LIST_HEAD(&tun->disabled);
-		err = tun_attach(tun, file, false, ifr->ifr_flags & IFF_NAPI,
-				 ifr->ifr_flags & IFF_NAPI_FRAGS);
-		if (err < 0)
-			goto err_free_flow;
+
+		tun_set_real_num_queues(tun, tun->numqueues + 1);
 
 		err = register_netdevice(tun->dev);
 		if (err < 0)
-			goto err_detach;
+			/* register_netdevice() already called tun_free_netdev() */
+			goto err_free_dev;
+
+		err = tun_attach(tun, file, false, ifr->ifr_flags & IFF_NAPI,
+				 ifr->ifr_flags & IFF_NAPI_FRAGS);
+		if (err < 0)
+			goto err_unregister;
 	}
 
 	netif_carrier_on(tun->dev);
@@ -2851,14 +2857,10 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 	strcpy(ifr->ifr_name, tun->dev->name);
 	return 0;
 
-err_detach:
-	tun_detach_all(dev);
-	/* register_netdevice() already called tun_free_netdev() */
-	goto err_free_dev;
+err_unregister:
+	unregister_netdevice(dev);
+	return err;
 
-err_free_flow:
-	tun_flow_uninit(tun);
-	security_tun_dev_free_security(tun->security);
 err_free_stat:
 	free_percpu(tun->pcpu_stats);
 err_free_dev:
@@ -2979,6 +2981,8 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
 			goto unlock;
 		ret = tun_attach(tun, file, false, tun->flags & IFF_NAPI,
 				 tun->flags & IFF_NAPI_FRAGS);
+		if (!ret)
+			tun_set_real_num_queues(tun, tun->numqueues);
 	} else if (ifr->ifr_flags & IFF_DETACH_QUEUE) {
 		tun = rtnl_dereference(tfile->tun);
 		if (!tun || !(tun->flags & IFF_MULTI_QUEUE) || tfile->detached)
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf-next] libbpf: remove zc variable as it is not used
From: Magnus Karlsson @ 2019-08-16 10:26 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev; +Cc: bpf, yhs

The zc is not used in the xsk part of libbpf, so let us remove it. Not
good to have dead code lying around.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Reported-by: Yonghong Song <yhs@fb.com>
---
 tools/lib/bpf/xsk.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 680e630..9687da9 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -65,7 +65,6 @@ struct xsk_socket {
 	int xsks_map_fd;
 	__u32 queue_id;
 	char ifname[IFNAMSIZ];
-	bool zc;
 };
 
 struct xsk_nl_info {
@@ -608,8 +607,6 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 		goto out_mmap_tx;
 	}
 
-	xsk->zc = opts.flags & XDP_OPTIONS_ZEROCOPY;
-
 	if (!(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) {
 		err = xsk_setup_xdp_prog(xsk);
 		if (err)
-- 
2.7.4


^ permalink raw reply related

* [PATCH v2] rtw88: pci: Move a mass of jobs in hw IRQ to soft IRQ
From: Jian-Hong Pan @ 2019-08-16 10:09 UTC (permalink / raw)
  To: Yan-Hsuan Chuang, Kalle Valo, David S . Miller
  Cc: linux-wireless, netdev, linux-kernel, linux, Jian-Hong Pan
In-Reply-To: <CAPpJ_edibR0bxO0Pg=NAaRU8fGYheyN8NTv-gVyTDCJhE-iG5Q@mail.gmail.com>

There is a mass of jobs between spin lock and unlock in the hardware
IRQ which will occupy much time originally. To make system work more
efficiently, this patch moves the jobs to the soft IRQ (bottom half) to
reduce the time in hardware IRQ.

Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
---
v2:
 Change the spin_lock_irqsave/unlock_irqrestore to spin_lock/unlock in
 rtw_pci_interrupt_handler. Because the interrupts are already disabled
 in the hardware interrupt handler.

 drivers/net/wireless/realtek/rtw88/pci.c | 33 +++++++++++++++++++-----
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
index 00ef229552d5..0740140d7e46 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -866,12 +866,28 @@ static irqreturn_t rtw_pci_interrupt_handler(int irq, void *dev)
 {
 	struct rtw_dev *rtwdev = dev;
 	struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
-	u32 irq_status[4];
 
 	spin_lock(&rtwpci->irq_lock);
 	if (!rtwpci->irq_enabled)
 		goto out;
 
+	/* disable RTW PCI interrupt to avoid more interrupts before the end of
+	 * thread function
+	 */
+	rtw_pci_disable_interrupt(rtwdev, rtwpci);
+out:
+	spin_unlock(&rtwpci->irq_lock);
+
+	return IRQ_WAKE_THREAD;
+}
+
+static irqreturn_t rtw_pci_interrupt_threadfn(int irq, void *dev)
+{
+	struct rtw_dev *rtwdev = dev;
+	struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
+	unsigned long flags;
+	u32 irq_status[4];
+
 	rtw_pci_irq_recognized(rtwdev, rtwpci, irq_status);
 
 	if (irq_status[0] & IMR_MGNTDOK)
@@ -891,8 +907,11 @@ static irqreturn_t rtw_pci_interrupt_handler(int irq, void *dev)
 	if (irq_status[0] & IMR_ROK)
 		rtw_pci_rx_isr(rtwdev, rtwpci, RTW_RX_QUEUE_MPDU);
 
-out:
-	spin_unlock(&rtwpci->irq_lock);
+	/* all of the jobs for this interrupt have been done */
+	spin_lock_irqsave(&rtwpci->irq_lock, flags);
+	if (rtw_flag_check(rtwdev, RTW_FLAG_RUNNING))
+		rtw_pci_enable_interrupt(rtwdev, rtwpci);
+	spin_unlock_irqrestore(&rtwpci->irq_lock, flags);
 
 	return IRQ_HANDLED;
 }
@@ -1152,8 +1171,10 @@ static int rtw_pci_probe(struct pci_dev *pdev,
 		goto err_destroy_pci;
 	}
 
-	ret = request_irq(pdev->irq, &rtw_pci_interrupt_handler,
-			  IRQF_SHARED, KBUILD_MODNAME, rtwdev);
+	ret = devm_request_threaded_irq(rtwdev->dev, pdev->irq,
+					rtw_pci_interrupt_handler,
+					rtw_pci_interrupt_threadfn,
+					IRQF_SHARED, KBUILD_MODNAME, rtwdev);
 	if (ret) {
 		ieee80211_unregister_hw(hw);
 		goto err_destroy_pci;
@@ -1192,7 +1213,7 @@ static void rtw_pci_remove(struct pci_dev *pdev)
 	rtw_pci_disable_interrupt(rtwdev, rtwpci);
 	rtw_pci_destroy(rtwdev, pdev);
 	rtw_pci_declaim(rtwdev, pdev);
-	free_irq(rtwpci->pdev->irq, rtwdev);
+	devm_free_irq(rtwdev->dev, rtwpci->pdev->irq, rtwdev);
 	rtw_core_deinit(rtwdev);
 	ieee80211_free_hw(hw);
 }
-- 
2.20.1


^ permalink raw reply related

* RE: [PATCH net-next] r8152: divide the tx and rx bottom functions
From: Hayes Wang @ 2019-08-16 10:04 UTC (permalink / raw)
  To: Eric Dumazet, netdev@vger.kernel.org
  Cc: nic_swsd, linux-kernel@vger.kernel.org
In-Reply-To: <a262d73b-0e91-7610-c88f-9670cc6fd18d@gmail.com>

Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Friday, August 16, 2019 5:27 PM
[...]
> Maybe you would avoid messing with a tasklet (we really try to get rid
> of tasklets in general) using two NAPI, one for TX, one for RX.
> 
> Some drivers already use two NAPI, it is fine.
> 
> This might avoid the ugly dance in r8152_poll(),
> calling napi_schedule(napi) after napi_complete_done() !

The reason is that the USB device couldn't control
the interrupt of USB controller. That is, I couldn't
disable the interrupt before napi_schedule and
enable it after napi_complete_done. If the callback
function occurs during the following timing, it is
possible no one would schedule the napi again.

static int r8152_poll(struct napi_struct *napi, int budget)
{
	struct r8152 *tp = container_of(napi, struct r8152, napi);
	int work_done;

	work_done = rx_bottom(tp, budget);
	bottom_half(tp);

	--> callback occurs here and try to call napi_schedule

	napi_complete_done(napi, work_done)

That is, no tx or rx could be handled unless
something trigger napi_schedule.


Best Regards,
Hayes




^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Thomas Gleixner @ 2019-08-16  9:59 UTC (permalink / raw)
  To: Jordan Glover
  Cc: Alexei Starovoitov, Andy Lutomirski, Daniel Colascione, Song Liu,
	Kees Cook, Networking, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Lorenz Bauer, Jann Horn, Greg KH, Linux API,
	LSM List
In-Reply-To: <fkD3fs46a1YnR4lh0tEG-g3tDnDcyZuzji7bAUR9wujPLLl75ZhI8Yk-H1jZpSugO7qChVeCwxAMmxLdeoF2QFS3ZzuYlh7zmeZOmhDJxww=@protonmail.ch>

On Fri, 16 Aug 2019, Jordan Glover wrote:
> "systemd --user" service? Trying to do so will fail with:
> "Failed to apply ambient capabilities (before UID change): Operation not permitted"
> 
> I think it's crucial to clear that point to avoid confusion in this discussion
> where people are talking about different things.
> 
> On the other hand running "systemd --system" service with:
> 
> User=nobody
> AmbientCapabilities=CAP_NET_ADMIN
> 
> is perfectly legit and clears some security concerns as only privileged user
> can start such service.

While we are at it, can we please stop looking at this from a systemd only
perspective. There is a world outside of systemd.

Thanks,

	tglx

^ permalink raw reply

* Re: [PATCH v12 3/5] dt-bindings: can: tcan4x5x: Add DT bindings for TCAN4x5X driver
From: Marc Kleine-Budde @ 2019-08-16  9:38 UTC (permalink / raw)
  To: Dan Murphy, wg, davem; +Cc: linux-can, netdev, linux-kernel
In-Reply-To: <20190509161109.10499-3-dmurphy@ti.com>


[-- Attachment #1.1: Type: text/plain, Size: 3240 bytes --]

On 5/9/19 6:11 PM, Dan Murphy wrote:
> DT binding documentation for TI TCAN4x5x driver.
> 
> Signed-off-by: Dan Murphy <dmurphy@ti.com>
> ---
> 
> v12 - No changes - https://lore.kernel.org/patchwork/patch/1052300/
> 
> v11 - No changes - https://lore.kernel.org/patchwork/patch/1051178/
> v10 - No changes - https://lore.kernel.org/patchwork/patch/1050488/
> v9 - No Changes - https://lore.kernel.org/patchwork/patch/1050118/
> v8 - No Changes - https://lore.kernel.org/patchwork/patch/1047981/
> v7 - Made device state optional - https://lore.kernel.org/patchwork/patch/1047218/
> v6 - No changes - https://lore.kernel.org/patchwork/patch/1042445/
> 
>  .../devicetree/bindings/net/can/tcan4x5x.txt  | 37 +++++++++++++++++++
>  1 file changed, 37 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/can/tcan4x5x.txt
> 
> diff --git a/Documentation/devicetree/bindings/net/can/tcan4x5x.txt b/Documentation/devicetree/bindings/net/can/tcan4x5x.txt
> new file mode 100644
> index 000000000000..c388f7d9feb1
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/can/tcan4x5x.txt
> @@ -0,0 +1,37 @@
> +Texas Instruments TCAN4x5x CAN Controller
> +================================================
> +
> +This file provides device node information for the TCAN4x5x interface contains.
> +
> +Required properties:
> +	- compatible: "ti,tcan4x5x"
> +	- reg: 0
> +	- #address-cells: 1
> +	- #size-cells: 0
> +	- spi-max-frequency: Maximum frequency of the SPI bus the chip can
> +			     operate at should be less than or equal to 18 MHz.
> +	- data-ready-gpios: Interrupt GPIO for data and error reporting.
> +	- device-wake-gpios: Wake up GPIO to wake up the TCAN device.
> +
> +See Documentation/devicetree/bindings/net/can/m_can.txt for additional
> +required property details.
> +
> +Optional properties:
> +	- reset-gpios: Hardwired output GPIO. If not defined then software
> +		       reset.
> +	- device-state-gpios: Input GPIO that indicates if the device is in
> +			      a sleep state or if the device is active.
> +
> +Example:
> +tcan4x5x: tcan4x5x@0 {
> +		compatible = "ti,tcan4x5x";
> +		reg = <0>;
> +		#address-cells = <1>;
> +		#size-cells = <1>;
> +		spi-max-frequency = <10000000>;
> +		bosch,mram-cfg = <0x0 0 0 32 0 0 1 1>;
> +		data-ready-gpios = <&gpio1 14 GPIO_ACTIVE_LOW>;

Can you convert this into a proper interrupt property? E.g.:

>                 interrupt-parent = <&gpio4>;
>                 interrupts = <13 0x2>;

See:
https://elixir.bootlin.com/linux/latest/source/Documentation/devicetree/bindings/net/can/microchip,mcp251x.txt#L21
https://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git/tree/drivers/net/can/spi/mcp251x.c?h=mcp251x#n945

> +		device-state-gpios = <&gpio3 21 GPIO_ACTIVE_HIGH>;
> +		device-wake-gpios = <&gpio1 15 GPIO_ACTIVE_HIGH>;
> +		reset-gpios = <&gpio1 27 GPIO_ACTIVE_LOW>;
> +};

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Jordan Glover @ 2019-08-16  9:34 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Daniel Colascione, Song Liu, Kees Cook,
	Networking, bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team,
	Lorenz Bauer, Jann Horn, Greg KH, Linux API, LSM List
In-Reply-To: <20190815230808.2o2qe7a72cwdce2m@ast-mbp.dhcp.thefacebook.com>

On Thursday, August 15, 2019 11:08 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Thu, Aug 15, 2019 at 11:36:43AM -0700, Andy Lutomirski wrote:
>
> > On Thu, Aug 15, 2019 at 10:29 AM Alexei Starovoitov
> > alexei.starovoitov@gmail.com wrote:
> >
> > > On Thu, Aug 15, 2019 at 11:24:54AM +0000, Jordan Glover wrote:
> > >
> > > > systemd --user processes aren't "less privileged". The are COMPLETELY unprivileged.
> > > > Granting them cap_bpf is the same as granting it to every other unprivileged user
> > > > process. Also unprivileged user process can start systemd --user process with any
> > > > command they like.
> > >
> > > systemd itself is trusted. It's the same binary whether it runs as pid=1
> > > or as pid=123. One of the use cases is to make IPAddressDeny= work with --user.
> > > Subset of that feature already works with AmbientCapabilities=CAP_NET_ADMIN.
> > > CAP_BPF is a natural step in the same direction.
> >
> > I have the feeling that we're somehow speaking different languages.
> > What, precisely, do you mean when you say "systemd itself is trusted"?
> > Do you mean "the administrator trusts that the /lib/systemd/systemd
> > binary is not malicious"? Do you mean "the administrator trusts that
> > the running systemd process is not malicious"?
>
> please see
> https://github.com/systemd/systemd/commit/4c1567f29aeb60a6741874bca8a8e3a0bd69ed01
> I'm not advocating for or against this approach.
> Call it 'security hole' or 'better security'.
> There are two categories of people for any feature like this.
> My point that there is a demand to use bpf for non-root and CAP_NET_ADMIN
> level of privileges is acceptable.
> Another option is to relax all of bpf to CAP_NET_ADMIN instead of CAP_SYS_ADMIN.
> But CAP_BPF is clearly better way.
>

Do you realize it's not possible to grant CAP_NET_ADMIN or any other CAP in
"systemd --user" service? Trying to do so will fail with:
"Failed to apply ambient capabilities (before UID change): Operation not permitted"

I think it's crucial to clear that point to avoid confusion in this discussion
where people are talking about different things.

On the other hand running "systemd --system" service with:

User=nobody
AmbientCapabilities=CAP_NET_ADMIN

is perfectly legit and clears some security concerns as only privileged user
can start such service.

Jordan

^ permalink raw reply

* Re: [PATCH] rtw88: pci: Move a mass of jobs in hw IRQ to soft IRQ
From: Jian-Hong Pan @ 2019-08-16  9:27 UTC (permalink / raw)
  To: Tony Chuang
  Cc: Kalle Valo, David S . Miller, linux-wireless@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux@endlessm.com
In-Reply-To: <F7CD281DE3E379468C6D07993EA72F84D1892A52@RTITMBSVM04.realtek.com.tw>

Tony Chuang <yhchuang@realtek.com> 於 2019年8月16日 週五 下午4:07寫道：
>
> Hi,
>
> A few more questions below
>
> > > From: Jian-Hong Pan [mailto:jian-hong@endlessm.com]
> > >
> > > There is a mass of jobs between spin lock and unlock in the hardware
> > > IRQ which will occupy much time originally. To make system work more
> > > efficiently, this patch moves the jobs to the soft IRQ (bottom half) to
> > > reduce the time in hardware IRQ.
> > >
> > > Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
> > > ---
> > >  drivers/net/wireless/realtek/rtw88/pci.c | 36 +++++++++++++++++++-----
> > >  1 file changed, 29 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/net/wireless/realtek/rtw88/pci.c
> > > b/drivers/net/wireless/realtek/rtw88/pci.c
> > > index 00ef229552d5..355606b167c6 100644
> > > --- a/drivers/net/wireless/realtek/rtw88/pci.c
> > > +++ b/drivers/net/wireless/realtek/rtw88/pci.c
> > > @@ -866,12 +866,29 @@ static irqreturn_t rtw_pci_interrupt_handler(int
> > irq,
> > > void *dev)
> > >  {
> > >     struct rtw_dev *rtwdev = dev;
> > >     struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
> > > -   u32 irq_status[4];
> > > +   unsigned long flags;
> > >
> > > -   spin_lock(&rtwpci->irq_lock);
> > > +   spin_lock_irqsave(&rtwpci->irq_lock, flags);
>
> I think you can use 'spin_lock()' here as it's in IRQ context?

Ah!  You are right!  The interrupts are already disabled in the
interrupt handler.  So, there is no need to disable more once.  I can
tweak it.

> > >     if (!rtwpci->irq_enabled)
> > >             goto out;
> > >
> > > +   /* disable RTW PCI interrupt to avoid more interrupts before the end of
> > > +    * thread function
> > > +    */
> > > +   rtw_pci_disable_interrupt(rtwdev, rtwpci);
>
>
> Why do we need rtw_pci_disable_interrupt() here.
> Have you done any experiment and decided to add this.
> If you have can you share your results to me?

I got this idea "Avoid back to back interrupt" from Intel WiFi's driver.
https://elixir.bootlin.com/linux/v5.3-rc4/source/drivers/net/wireless/intel/iwlwifi/pcie/rx.c#L2071

So, I disable rtw_pci interrupt here in first half IRQ.  (Re-enable in
bottom half)

>
> > > +out:
> > > +   spin_unlock_irqrestore(&rtwpci->irq_lock, flags);
>
> spin_unlock()
>
> > > +
> > > +   return IRQ_WAKE_THREAD;
> > > +}
> > > +
> > > +static irqreturn_t rtw_pci_interrupt_threadfn(int irq, void *dev)
> > > +{
> > > +   struct rtw_dev *rtwdev = dev;
> > > +   struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
> > > +   unsigned long flags;
> > > +   u32 irq_status[4];
> > > +
> > >     rtw_pci_irq_recognized(rtwdev, rtwpci, irq_status);
> > >
> > >     if (irq_status[0] & IMR_MGNTDOK)
> > > @@ -891,8 +908,11 @@ static irqreturn_t rtw_pci_interrupt_handler(int
> > irq,
> > > void *dev)
> > >     if (irq_status[0] & IMR_ROK)
> > >             rtw_pci_rx_isr(rtwdev, rtwpci, RTW_RX_QUEUE_MPDU);
> > >
> > > -out:
> > > -   spin_unlock(&rtwpci->irq_lock);
> > > +   /* all of the jobs for this interrupt have been done */
> > > +   spin_lock_irqsave(&rtwpci->irq_lock, flags);
> >
> > Shouldn't we protect the ISRs above?
> >
> > This patch could actually reduce the time of IRQ.
> > But I think I need to further test it with PCI MSI interrupt.
> > https://patchwork.kernel.org/patch/11081539/
> >
> > Maybe we could drop the "rtw_pci_[enable/disable]_interrupt" when MSI
> > Is enabled with this patch.
> >
> > > +   if (rtw_flag_check(rtwdev, RTW_FLAG_RUNNING))
> > > +           rtw_pci_enable_interrupt(rtwdev, rtwpci);

Then, re-enable rtw_pci interrupt here in bottom half of the IRQ.
Here is the place where Intel WiFi re-enable interrupts.
https://elixir.bootlin.com/linux/v5.3-rc4/source/drivers/net/wireless/intel/iwlwifi/pcie/rx.c#L1959

Now, we can go back to the question "Shouldn't we protect the ISRs above?"

1. What does the lock: rtwpci->irq_lock protect for?
According to the code, seems only rtw_pci interrupt's state which is
enabled or not.

2. How about the ISRs you mentioned?
This part will only be executed if there is a fresh rtw_pci interrupt.
The first half already disabled rtw_pci interrupt, so there is no more
fresh rtw_pci interrupt until rtw_pci interrupt is enabled again.
Therefor, the rtwpci->irq_lock only wraps the rtw_pci interrupt
enablement.

If there is any more entry that I missed and will interfere, please let me know.

Thank you
Jian-Hong Pan

^ permalink raw reply

* Re: [PATCH net-next] r8152: divide the tx and rx bottom functions
From: Eric Dumazet @ 2019-08-16  9:27 UTC (permalink / raw)
  To: Hayes Wang, Eric Dumazet, netdev@vger.kernel.org
  Cc: nic_swsd, linux-kernel@vger.kernel.org
In-Reply-To: <0835B3720019904CB8F7AA43166CEEB2F18D47C8@RTITMBSVM03.realtek.com.tw>



On 8/16/19 11:08 AM, Hayes Wang wrote:
> Eric Dumazet [mailto:eric.dumazet@gmail.com]
>> Sent: Friday, August 16, 2019 4:20 PM
> [...]
>> Which callback ?
> 
> The USB device has two endpoints for Tx and Rx.
> If I submit tx or rx URB to the USB host controller,
> the relative callback functions would be called, when
> they are finished. For rx, it is read_bulk_callback.
> For tx, it is write_bulk_callback.
> 
>> After an idle period (no activity, no prior packets being tx-completed ...),
>> a packet is sent by the upper stack, enters the ndo_start_xmit() of a network
>> driver.
>>
>> This driver ndo_start_xmit() simply adds an skb to a local list, and returns.
> 
> Base on the current method (without tasklet), when
> ndo_start_xmit() is called, napi_schedule is called only
> if there is at least one free buffer (!list_empty(&tp->tx_free))
> to transmit the packet. Then, the flow would be as following.

Very uncommon naming conventions really :/


Maybe you would avoid messing with a tasklet (we really try to get rid
of tasklets in general) using two NAPI, one for TX, one for RX.

Some drivers already use two NAPI, it is fine.

This might avoid the ugly dance in r8152_poll(),
calling napi_schedule(napi) after napi_complete_done() !


^ permalink raw reply

* [PATCH v2 05/10] net: dwc-xlgmac: Loop using PCI_STD_NUM_BARS
From: Denis Efremov @ 2019-08-16  9:24 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Denis Efremov, Jose Abreu, netdev, linux-pci, linux-kernel
In-Reply-To: <20190816092437.31846-1-efremov@linux.com>

Refactor loops to use 'i < PCI_STD_NUM_BARS' instead of
'i <= PCI_STD_RESOURCE_END'.

Signed-off-by: Denis Efremov <efremov@linux.com>
---
 drivers/net/ethernet/synopsys/dwc-xlgmac-pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/synopsys/dwc-xlgmac-pci.c b/drivers/net/ethernet/synopsys/dwc-xlgmac-pci.c
index 386bafe74c3f..fa8604d7b797 100644
--- a/drivers/net/ethernet/synopsys/dwc-xlgmac-pci.c
+++ b/drivers/net/ethernet/synopsys/dwc-xlgmac-pci.c
@@ -34,7 +34,7 @@ static int xlgmac_probe(struct pci_dev *pcidev, const struct pci_device_id *id)
 		return ret;
 	}
 
-	for (i = 0; i <= PCI_STD_RESOURCE_END; i++) {
+	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
 		if (pci_resource_len(pcidev, i) == 0)
 			continue;
 		ret = pcim_iomap_regions(pcidev, BIT(i), XLGMAC_DRV_NAME);
-- 
2.21.0


^ permalink raw reply related

* [PATCH v2 04/10] stmmac: pci: Loop using PCI_STD_NUM_BARS
From: Denis Efremov @ 2019-08-16  9:24 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Denis Efremov, Giuseppe Cavallaro, Alexandre Torgue, netdev,
	linux-pci, linux-kernel
In-Reply-To: <20190816092437.31846-1-efremov@linux.com>

Refactor loops to use 'i < PCI_STD_NUM_BARS' instead of
'i <= PCI_STD_RESOURCE_END'.

Signed-off-by: Denis Efremov <efremov@linux.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
index 86f9c07a38cf..cfe496cdd78b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
@@ -258,7 +258,7 @@ static int stmmac_pci_probe(struct pci_dev *pdev,
 	}
 
 	/* Get the base address of device */
-	for (i = 0; i <= PCI_STD_RESOURCE_END; i++) {
+	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
 		if (pci_resource_len(pdev, i) == 0)
 			continue;
 		ret = pcim_iomap_regions(pdev, BIT(i), pci_name(pdev));
@@ -296,7 +296,7 @@ static void stmmac_pci_remove(struct pci_dev *pdev)
 
 	stmmac_dvr_remove(&pdev->dev);
 
-	for (i = 0; i <= PCI_STD_RESOURCE_END; i++) {
+	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
 		if (pci_resource_len(pdev, i) == 0)
 			continue;
 		pcim_iounmap_regions(pdev, BIT(i));
-- 
2.21.0


^ permalink raw reply related

* [PATCH v2 01/10] PCI: Add define for the number of standard PCI BARs
From: Denis Efremov @ 2019-08-16  9:24 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Denis Efremov, linux-kernel, linux-pci, Sebastian Ott,
	Gerald Schaefer, H. Peter Anvin, Giuseppe Cavallaro,
	Alexandre Torgue, Matt Porter, Alexandre Bounine, Peter Jones,
	Bartlomiej Zolnierkiewicz, Cornelia Huck, Alex Williamson,
	Andrew Murray, Jose Abreu, kvm, linux-fbdev, netdev, x86,
	linux-s390
In-Reply-To: <20190816092437.31846-1-efremov@linux.com>

Code that iterates over all standard PCI BARs typically uses
PCI_STD_RESOURCE_END. However, it requires the "unusual" loop condition
"i <= PCI_STD_RESOURCE_END" rather than something more standard like
"i < PCI_STD_NUM_BARS".

This patch adds the definition PCI_STD_NUM_BARS which is equivalent to
"PCI_STD_RESOURCE_END + 1" and updates loop conditions to use it.

Signed-off-by: Denis Efremov <efremov@linux.com>
---
 drivers/pci/quirks.c          | 2 +-
 include/linux/pci.h           | 2 +-
 include/uapi/linux/pci_regs.h | 1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 208aacf39329..02bdf3a0231e 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -475,7 +475,7 @@ static void quirk_extend_bar_to_page(struct pci_dev *dev)
 {
 	int i;
 
-	for (i = 0; i <= PCI_STD_RESOURCE_END; i++) {
+	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
 		struct resource *r = &dev->resource[i];
 
 		if (r->flags & IORESOURCE_MEM && resource_size(r) < PAGE_SIZE) {
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 9e700d9f9f28..7b9590d5dc2d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -76,7 +76,7 @@ enum pci_mmap_state {
 enum {
 	/* #0-5: standard PCI resources */
 	PCI_STD_RESOURCES,
-	PCI_STD_RESOURCE_END = 5,
+	PCI_STD_RESOURCE_END = PCI_STD_RESOURCES + PCI_STD_NUM_BARS - 1,
 
 	/* #6: expansion ROM resource */
 	PCI_ROM_RESOURCE,
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index f28e562d7ca8..68b571d491eb 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -34,6 +34,7 @@
  * of which the first 64 bytes are standardized as follows:
  */
 #define PCI_STD_HEADER_SIZEOF	64
+#define PCI_STD_NUM_BARS	6	/* Number of standard BARs */
 #define PCI_VENDOR_ID		0x00	/* 16 bits */
 #define PCI_DEVICE_ID		0x02	/* 16 bits */
 #define PCI_COMMAND		0x04	/* 16 bits */
-- 
2.21.0


^ permalink raw reply related

* [PATCH v2 00/10] Add definition for the number of standard PCI BARs
From: Denis Efremov @ 2019-08-16  9:24 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Denis Efremov, linux-kernel, linux-pci, Sebastian Ott,
	Gerald Schaefer, H. Peter Anvin, Giuseppe Cavallaro,
	Alexandre Torgue, Matt Porter, Alexandre Bounine, Peter Jones,
	Bartlomiej Zolnierkiewicz, Cornelia Huck, Alex Williamson,
	Andrew Murray, Jose Abreu, kvm, linux-fbdev, netdev, x86,
	linux-s390

Code that iterates over all standard PCI BARs typically uses
PCI_STD_RESOURCE_END, but this is error-prone because it requires
"i <= PCI_STD_RESOURCE_END" rather than something like
"i < PCI_STD_NUM_BARS". We could add such a definition and use it the same
way PCI_SRIOV_NUM_BARS is used. There is already the definition
PCI_BAR_COUNT for s390 only. Thus, this patchset introduces it globally.

Changes in v2:
  - Reverse checks in pci_iomap_range,pci_iomap_wc_range.
  - Refactor loops in vfio_pci to keep PCI_STD_RESOURCES.
  - Add 2 new patches to replace the magic constant with new define.
  - Split net patch in v1 to separate stmmac and dwc-xlgmac patches.

Denis Efremov (10):
  PCI: Add define for the number of standard PCI BARs
  s390/pci: Loop using PCI_STD_NUM_BARS
  x86/PCI: Loop using PCI_STD_NUM_BARS
  stmmac: pci: Loop using PCI_STD_NUM_BARS
  net: dwc-xlgmac: Loop using PCI_STD_NUM_BARS
  rapidio/tsi721: Loop using PCI_STD_NUM_BARS
  efifb: Loop using PCI_STD_NUM_BARS
  vfio_pci: Loop using PCI_STD_NUM_BARS
  PCI: hv: Use PCI_STD_NUM_BARS
  PCI: Use PCI_STD_NUM_BARS

 arch/s390/include/asm/pci.h                      |  5 +----
 arch/s390/include/asm/pci_clp.h                  |  6 +++---
 arch/s390/pci/pci.c                              | 16 ++++++++--------
 arch/s390/pci/pci_clp.c                          |  6 +++---
 arch/x86/pci/common.c                            |  2 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c |  4 ++--
 drivers/net/ethernet/synopsys/dwc-xlgmac-pci.c   |  2 +-
 drivers/pci/controller/pci-hyperv.c              | 10 +++++-----
 drivers/pci/pci.c                                | 11 ++++++-----
 drivers/pci/quirks.c                             |  4 ++--
 drivers/rapidio/devices/tsi721.c                 |  2 +-
 drivers/vfio/pci/vfio_pci.c                      | 11 +++++++----
 drivers/vfio/pci/vfio_pci_config.c               | 10 ++++++----
 drivers/vfio/pci/vfio_pci_private.h              |  4 ++--
 drivers/video/fbdev/efifb.c                      |  2 +-
 include/linux/pci.h                              |  2 +-
 include/uapi/linux/pci_regs.h                    |  1 +
 17 files changed, 51 insertions(+), 47 deletions(-)

-- 
2.21.0


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox