Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH] MAINTAINERS: update netsec driver
From: Ilias Apalodimas @ 2019-07-18 14:38 UTC (permalink / raw)
  To: netdev, jaswinder.singh, davem; +Cc: ard.biesheuvel, Ilias Apalodimas

Add myself to maintainers since i provided the XDP and page_pool
implementation

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 211ea3a199bd..64f659d8346c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14789,6 +14789,7 @@ F:	Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
 
 SOCIONEXT (SNI) NETSEC NETWORK DRIVER
 M:	Jassi Brar <jaswinder.singh@linaro.org>
+M:	Ilias Apalodimas <ilias.apalodimas@linaro.org>
 L:	netdev@vger.kernel.org
 S:	Maintained
 F:	drivers/net/ethernet/socionext/netsec.c
-- 
2.20.1


^ permalink raw reply related

* [PATCH] net: fec: generate warning when using deprecated phy reset
From: Sven Van Asbroeck @ 2019-07-18 14:34 UTC (permalink / raw)
  To: Fugang Duan; +Cc: David S . Miller, netdev, linux-kernel

Allowing the fec to reset its PHY via the phy-reset-gpios
devicetree property is deprecated. To improve developer
awareness, generate a warning whenever the deprecated
property is used.

Signed-off-by: Sven Van Asbroeck <TheSven73@gmail.com>
---
 drivers/net/ethernet/freescale/fec_main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 38f10f7dcbc3..00e1b5e4ef71 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3244,6 +3244,12 @@ static int fec_reset_phy(struct platform_device *pdev)
 	else if (!gpio_is_valid(phy_reset))
 		return 0;
 
+	/* Recommended way to provide a PHY reset:
+	 * - create a phy devicetree node, and link it to its fec (phy-handle)
+	 * - add your reset gpio to the phy devicetree node
+	 */
+	dev_warn(&pdev->dev, "devicetree: phy-reset-gpios is deprecated\n");
+
 	err = of_property_read_u32(np, "phy-reset-post-delay", &phy_post_delay);
 	/* valid reset duration should be less than 1s */
 	if (!err && phy_post_delay > 1000)
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH net] ipv6: Unlink sibling route in case of failure
From: David Ahern @ 2019-07-18 14:21 UTC (permalink / raw)
  To: Ido Schimmel, netdev; +Cc: davem, alexpe, mlxsw, Ido Schimmel
In-Reply-To: <20190717203933.3073-1-idosch@idosch.org>

On 7/17/19 2:39 PM, Ido Schimmel wrote:
> From: Ido Schimmel <idosch@mellanox.com>
> 
> When a route needs to be appended to an existing multipath route,
> fib6_add_rt2node() first appends it to the siblings list and increments
> the number of sibling routes on each sibling.
> 
> Later, the function notifies the route via call_fib6_entry_notifiers().
> In case the notification is vetoed, the route is not unlinked from the
> siblings list, which can result in a use-after-free.
> 
> Fix this by unlinking the route from the siblings list before returning
> an error.
> 
> Audited the rest of the call sites from which the FIB notification chain
> is called and could not find more problems.
> 
> Fixes: 2233000cba40 ("net/ipv6: Move call_fib6_entry_notifiers up for route adds")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
> ---
> Dave, this will not apply cleanly to stable trees due to recent changes
> in net-next. I can prepare another patch for stable if needed.
> ---
>  net/ipv6/ip6_fib.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 

Thanks for the fix, Ido. I can help with the ports as well.

Reviewed-by: David Ahern <dsahern@gmail.com>


^ permalink raw reply

* [PATCH bpf] tools/bpf: fix bpftool build with OUTPUT set
From: Ilya Leoshkevich @ 2019-07-18 14:20 UTC (permalink / raw)
  To: bpf, netdev, lmb; +Cc: gor, heiko.carstens, Ilya Leoshkevich
In-Reply-To: <CACAyw9-CWRHVH3TJ=Tke2x8YiLsH47sLCijdp=V+5M836R9aAA@mail.gmail.com>

Hi Lorenz,

I've been using the following patch for quite some time now.
Please let me know if it works for you.

Best regards,
Ilya

---

When OUTPUT is set, bpftool and libbpf put their objects into the same
directory, and since some of them have the same names, the collision
happens.

Fix by invoking libbpf build in a manner similar to $(call descend) -
descend itself cannot be used, since libbpf is a sibling, and not a
child, of bpftool.

Also, don't link bpftool with libbpf.a twice.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 tools/bpf/bpftool/Makefile | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index a7afea4dec47..2cbc3c166f44 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -15,23 +15,18 @@ else
 endif
 
 BPF_DIR = $(srctree)/tools/lib/bpf/
-
-ifneq ($(OUTPUT),)
-  BPF_PATH = $(OUTPUT)
-else
-  BPF_PATH = $(BPF_DIR)
-endif
-
-LIBBPF = $(BPF_PATH)libbpf.a
+BPF_PATH = $(objtree)/tools/lib/bpf
+LIBBPF = $(BPF_PATH)/libbpf.a
 
 BPFTOOL_VERSION := $(shell make --no-print-directory -sC ../../.. kernelversion)
 
 $(LIBBPF): FORCE
-	$(Q)$(MAKE) -C $(BPF_DIR) OUTPUT=$(OUTPUT) $(OUTPUT)libbpf.a
+	$(Q)mkdir -p $(BPF_PATH)
+	$(Q)$(MAKE) $(COMMAND_O) subdir=tools/lib/bpf -C $(BPF_DIR) $(LIBBPF)
 
 $(LIBBPF)-clean:
 	$(call QUIET_CLEAN, libbpf)
-	$(Q)$(MAKE) -C $(BPF_DIR) OUTPUT=$(OUTPUT) clean >/dev/null
+	$(Q)$(MAKE) $(COMMAND_O) subdir=tools/lib/bpf -C $(BPF_DIR) clean >/dev/null
 
 prefix ?= /usr/local
 bash_compdir ?= /usr/share/bash-completion/completions
@@ -112,7 +107,7 @@ $(OUTPUT)disasm.o: $(srctree)/kernel/bpf/disasm.c
 	$(QUIET_CC)$(COMPILE.c) -MMD -o $@ $<
 
 $(OUTPUT)bpftool: $(OBJS) $(LIBBPF)
-	$(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $^ $(LIBS)
+	$(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(OBJS) $(LIBS)
 
 $(OUTPUT)%.o: %.c
 	$(QUIET_CC)$(COMPILE.c) -MMD -o $@ $<
-- 
2.21.0


^ permalink raw reply related

* Re: [net-next 1/2] ipvs: batch __ip_vs_cleanup
From: Haishuang Yan @ 2019-07-18 14:16 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: David S. Miller, Pablo Neira Ayuso, Simon Horman, netdev,
	lvs-devel, linux-kernel, netfilter-devel
In-Reply-To: <alpine.LFD.2.21.1907152333300.5700@ja.home.ssi.bg>


> On 2019年7月16日, at 上午4:39, Julian Anastasov <ja@ssi.bg> wrote:
> 
> 
> 	Hello,
> 
> On Sat, 13 Jul 2019, Haishuang Yan wrote:
> 
>> It's better to batch __ip_vs_cleanup to speedup ipvs
>> connections dismantle.
>> 
>> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
>> ---
>> include/net/ip_vs.h             |  2 +-
>> net/netfilter/ipvs/ip_vs_core.c | 29 +++++++++++++++++------------
>> net/netfilter/ipvs/ip_vs_ctl.c  | 13 ++++++++++---
>> 3 files changed, 28 insertions(+), 16 deletions(-)
>> 
>> diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
>> index 3759167..93e7a25 100644
>> --- a/include/net/ip_vs.h
>> +++ b/include/net/ip_vs.h
>> @@ -1324,7 +1324,7 @@ static inline void ip_vs_control_del(struct ip_vs_conn *cp)
>> void ip_vs_control_net_cleanup(struct netns_ipvs *ipvs);
>> void ip_vs_estimator_net_cleanup(struct netns_ipvs *ipvs);
>> void ip_vs_sync_net_cleanup(struct netns_ipvs *ipvs);
>> -void ip_vs_service_net_cleanup(struct netns_ipvs *ipvs);
>> +void ip_vs_service_nets_cleanup(struct list_head *net_list);
>> 
>> /* IPVS application functions
>>  * (from ip_vs_app.c)
>> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
>> index 46f06f9..b4d79b7 100644
>> --- a/net/netfilter/ipvs/ip_vs_core.c
>> +++ b/net/netfilter/ipvs/ip_vs_core.c
>> @@ -2402,18 +2402,23 @@ static int __net_init __ip_vs_init(struct net *net)
>> 	return -ENOMEM;
>> }
>> 
>> -static void __net_exit __ip_vs_cleanup(struct net *net)
>> +static void __net_exit __ip_vs_cleanup_batch(struct list_head *net_list)
>> {
>> -	struct netns_ipvs *ipvs = net_ipvs(net);
>> -
>> -	ip_vs_service_net_cleanup(ipvs);	/* ip_vs_flush() with locks */
>> -	ip_vs_conn_net_cleanup(ipvs);
>> -	ip_vs_app_net_cleanup(ipvs);
>> -	ip_vs_protocol_net_cleanup(ipvs);
>> -	ip_vs_control_net_cleanup(ipvs);
>> -	ip_vs_estimator_net_cleanup(ipvs);
>> -	IP_VS_DBG(2, "ipvs netns %d released\n", ipvs->gen);
>> -	net->ipvs = NULL;
>> +	struct netns_ipvs *ipvs;
>> +	struct net *net;
>> +	LIST_HEAD(list);
>> +
>> +	ip_vs_service_nets_cleanup(net_list);	/* ip_vs_flush() with locks */
>> +	list_for_each_entry(net, net_list, exit_list) {
> 
> 	How much faster is to replace list_for_each_entry in
> ops_exit_list() with this one. IPVS can waste time in calls
> such as kthread_stop() and del_timer_sync() but I'm not sure
> we can solve it easily. What gain do you see in benchmarks?

Hi, 

As the following benchmark testing results show, there is a little performance improvement:

$  cat add_del_unshare.sh
#!/bin/bash

for i in `seq 1 100`
    do
     (for j in `seq 1 40` ; do  unshare -n ipvsadm -A -t 172.16.$i.$j:80 >/dev/null ; done) &
    done
wait; grep net_namespace /proc/slabinfo

Befor patch:
$  time sh add_del_unshare.sh
net_namespace       4020   4020   4736    6    8 : tunables    0    0    0 : slabdata    670    670      0

real    0m8.086s
user    0m2.025s
sys     0m36.956s

After patch:
$  time sh add_del_unshare.sh
net_namespace       4020   4020   4736    6    8 : tunables    0    0    0 : slabdata    670    670      0

real    0m7.623s
user    0m2.003s
sys     0m32.935s


> 
>> +		ipvs = net_ipvs(net);
>> +		ip_vs_conn_net_cleanup(ipvs);
>> +		ip_vs_app_net_cleanup(ipvs);
>> +		ip_vs_protocol_net_cleanup(ipvs);
>> +		ip_vs_control_net_cleanup(ipvs);
>> +		ip_vs_estimator_net_cleanup(ipvs);
>> +		IP_VS_DBG(2, "ipvs netns %d released\n", ipvs->gen);
>> +		net->ipvs = NULL;
>> +	}
>> }
> 
> Regards
> 
> --
> Julian Anastasov <ja@ssi.bg>
> 




^ permalink raw reply

* pull-request: wireless-drivers 2019-07-18
From: Kalle Valo @ 2019-07-18 14:03 UTC (permalink / raw)
  To: David Miller; +Cc: linux-wireless, netdev, linux-kernel

Hi Dave,

here are first fixes which have accumulated during the merge window.
This pull request is to net tree for 5.3. Please let me know if there
are any problems.

Kalle

The following changes since commit 76104862cccaeaa84fdd23e39f2610a96296291c:

  sky2: Disable MSI on P5W DH Deluxe (2019-07-14 13:45:54 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git tags/wireless-drivers-for-davem-2019-07-18

for you to fetch changes up to 41a531ffa4c5aeb062f892227c00fabb3b4a9c91:

  rt2x00usb: fix rx queue hang (2019-07-15 20:52:18 +0300)

----------------------------------------------------------------
wireless-drivers fixes for 5.3

First set of fixes for 5.3.

iwlwifi

* add new cards for 9000 and 20000 series and qu c-step devices

ath10k

* workaround an uninitialised variable warning

rt2x00

* fix rx queue hand on USB

----------------------------------------------------------------
Arnd Bergmann (1):
      ath10k: work around uninitialized vht_pfr variable

Ihab Zhaika (1):
      iwlwifi: add new cards for 9000 and 20000 series

Luca Coelho (1):
      iwlwifi: pcie: add support for qu c-step devices

Soeren Moch (1):
      rt2x00usb: fix rx queue hang

 drivers/net/wireless/ath/ath10k/mac.c           |  2 +
 drivers/net/wireless/intel/iwlwifi/cfg/22000.c  | 53 +++++++++++++++++++++++++
 drivers/net/wireless/intel/iwlwifi/iwl-config.h |  7 ++++
 drivers/net/wireless/intel/iwlwifi/iwl-csr.h    |  2 +
 drivers/net/wireless/intel/iwlwifi/pcie/drv.c   | 23 +++++++++++
 drivers/net/wireless/ralink/rt2x00/rt2x00usb.c  | 12 +++---
 6 files changed, 93 insertions(+), 6 deletions(-)

^ permalink raw reply

* Re: [PATCH] Signed-off-by: Peter Kosyh <p.kosyh@gmail.com>
From: David Ahern @ 2019-07-18 14:02 UTC (permalink / raw)
  To: Peter Kosyh; +Cc: davem, Shrijeet Mukherjee, netdev, linux-kernel
In-Reply-To: <20190718094114.13718-1-p.kosyh@gmail.com>

your subject line needs a proper Subject - a one-line summary of the
change starting with 'vrf:'. See examples from 'git log drivers/net/vrf.c'


On 7/18/19 3:41 AM, Peter Kosyh wrote:
> vrf_process_v4_outbound() and vrf_process_v6_outbound() do routing
> using ip/ipv6 addresses, but don't make sure the header is available in
> skb->data[] (skb_headlen() is less then header size).
> 
> The situation may occures while forwarding from MPLS layer to vrf, for
> example.

so the use case is a label pop with the nexthop as the VRF device?

> 
> So, this patch adds pskb_may_pull() calls in is_ip_tx_frame(), just before
> call to vrf_process_... functions.
> 
> Signed-off-by: Peter Kosyh <p.kosyh@gmail.com>
> ---
>  drivers/net/vrf.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
> index 54edf8956a25..d552f29a58d1 100644
> --- a/drivers/net/vrf.c
> +++ b/drivers/net/vrf.c
> @@ -292,13 +292,16 @@ static netdev_tx_t is_ip_tx_frame(struct sk_buff *skb, struct net_device *dev)
>  {
>  	switch (skb->protocol) {
>  	case htons(ETH_P_IP):
> +		if (!pskb_may_pull(skb, ETH_HLEN + sizeof(struct iphdr))
> +			break;

that check goes in vrf_process_v4_outbound.

>  		return vrf_process_v4_outbound(skb, dev);
>  	case htons(ETH_P_IPV6):
> +		if (!pskb_may_pull(skb, ETH_HLEN + sizeof(struct ipv6hdr))
> +			break;

that check goes in vrf_process_v6_outbound

leave this higher level sorter untouched.

>  		return vrf_process_v6_outbound(skb, dev);
> -	default:
> -		vrf_tx_error(dev, skb);
> -		return NET_XMIT_DROP;
>  	}
> +	vrf_tx_error(dev, skb);
> +	return NET_XMIT_DROP;
>  }
>  
>  static netdev_tx_t vrf_xmit(struct sk_buff *skb, struct net_device *dev)
> 


^ permalink raw reply

* Re: [PATCH] virtio-net: parameterize min ring num_free for virtio receive
From: Jason Wang @ 2019-07-18 14:01 UTC (permalink / raw)
  To: Michael S. Tsirkin, ? jiang
  Cc: davem@davemloft.net, ast@kernel.org, daniel@iogearbox.net,
	jakub.kicinski@netronome.com, hawk@kernel.org,
	john.fastabend@gmail.com, kafai@fb.com, songliubraving@fb.com,
	yhs@fb.com, virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	xdp-newbies@vger.kernel.org, bpf@vger.kernel.org,
	jiangran.jr@alibaba-inc.com
In-Reply-To: <20190718085836-mutt-send-email-mst@kernel.org>


On 2019/7/18 下午9:04, Michael S. Tsirkin wrote:
> On Thu, Jul 18, 2019 at 12:55:50PM +0000, ? jiang wrote:
>> This change makes ring buffer reclaim threshold num_free configurable
>> for better performance, while it's hard coded as 1/2 * queue now.
>> According to our test with qemu + dpdk, packet dropping happens when
>> the guest is not able to provide free buffer in avail ring timely.
>> Smaller value of num_free does decrease the number of packet dropping
>> during our test as it makes virtio_net reclaim buffer earlier.
>>
>> At least, we should leave the value changeable to user while the
>> default value as 1/2 * queue is kept.
>>
>> Signed-off-by: jiangkidd<jiangkidd@hotmail.com>
> That would be one reason, but I suspect it's not the
> true one. If you need more buffer due to jitter
> then just increase the queue size. Would be cleaner.
>
>
> However are you sure this is the reason for
> packet drops? Do you see them dropped by dpdk
> due to lack of space in the ring? As opposed to
> by guest?
>
>

Besides those, this patch depends on the user to choose a suitable 
threshold which is not good. You need either a good value with 
demonstrated numbers or something smarter.

Thanks


^ permalink raw reply

* Re: [PATCH] cxgb4: Prefer pcie_capability_read_word()
From: Bjorn Helgaas @ 2019-07-18 13:50 UTC (permalink / raw)
  To: Frederick Lawler; +Cc: vishal, netdev, linux-kernel, Bjorn Helgaas
In-Reply-To: <20190718020745.8867-1-fred@fredlawl.com>

On Wed, Jul 17, 2019 at 9:08 PM Frederick Lawler <fred@fredlawl.com> wrote:
>
> Commit 8c0d3a02c130 ("PCI: Add accessors for PCI Express Capability")
> added accessors for the PCI Express Capability so that drivers didn't
> need to be aware of differences between v1 and v2 of the PCI
> Express Capability.
>
> Replace pci_read_config_word() and pci_write_config_word() calls with
> pcie_capability_read_word() and pcie_capability_write_word().
>
> Signed-off-by: Frederick Lawler <fred@fredlawl.com>

Nice job on all these patches!  These all help avoid errors and
identify possibilities for refactoring.

If there were a cover letter for the series, I would have replied to
that, but for all of them:

Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>

If you post the series again for any reason, you can add that.
Otherwise, whoever applies them can add my reviewed-by.

> ---
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 6 ++----
>  drivers/net/ethernet/chelsio/cxgb4/t4_hw.c      | 9 +++------
>  2 files changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> index 715e4edcf4a2..98ff71434673 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> @@ -5441,7 +5441,6 @@ static int cxgb4_iov_configure(struct pci_dev *pdev, int num_vfs)
>                 char name[IFNAMSIZ];
>                 u32 devcap2;
>                 u16 flags;
> -               int pos;
>
>                 /* If we want to instantiate Virtual Functions, then our
>                  * parent bridge's PCI-E needs to support Alternative Routing
> @@ -5449,9 +5448,8 @@ static int cxgb4_iov_configure(struct pci_dev *pdev, int num_vfs)
>                  * and above.
>                  */
>                 pbridge = pdev->bus->self;
> -               pos = pci_find_capability(pbridge, PCI_CAP_ID_EXP);
> -               pci_read_config_word(pbridge, pos + PCI_EXP_FLAGS, &flags);
> -               pci_read_config_dword(pbridge, pos + PCI_EXP_DEVCAP2, &devcap2);
> +               pcie_capability_read_word(pbridge, PCI_EXP_FLAGS, &flags);
> +               pcie_capability_read_dword(pbridge, PCI_EXP_DEVCAP2, &devcap2);
>
>                 if ((flags & PCI_EXP_FLAGS_VERS) < 2 ||
>                     !(devcap2 & PCI_EXP_DEVCAP2_ARI)) {
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
> index f9b70be59792..346d7b59c50b 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
> @@ -7267,7 +7267,6 @@ int t4_fixup_host_params(struct adapter *adap, unsigned int page_size,
>         } else {
>                 unsigned int pack_align;
>                 unsigned int ingpad, ingpack;
> -               unsigned int pcie_cap;
>
>                 /* T5 introduced the separation of the Free List Padding and
>                  * Packing Boundaries.  Thus, we can select a smaller Padding
> @@ -7292,8 +7291,7 @@ int t4_fixup_host_params(struct adapter *adap, unsigned int page_size,
>                  * multiple of the Maximum Payload Size.
>                  */
>                 pack_align = fl_align;
> -               pcie_cap = pci_find_capability(adap->pdev, PCI_CAP_ID_EXP);
> -               if (pcie_cap) {
> +               if (pci_is_pcie(adap->pdev)) {
>                         unsigned int mps, mps_log;
>                         u16 devctl;
>
> @@ -7301,9 +7299,8 @@ int t4_fixup_host_params(struct adapter *adap, unsigned int page_size,
>                          * [bits 7:5] encodes sizes as powers of 2 starting at
>                          * 128 bytes.
>                          */
> -                       pci_read_config_word(adap->pdev,
> -                                            pcie_cap + PCI_EXP_DEVCTL,
> -                                            &devctl);
> +                       pcie_capability_read_word(adap->pdev, PCI_EXP_DEVCTL,
> +                                                 &devctl);
>                         mps_log = ((devctl & PCI_EXP_DEVCTL_PAYLOAD) >> 5) + 7;
>                         mps = 1 << mps_log;
>                         if (mps > pack_align)
> --
> 2.17.1
>

^ permalink raw reply

* Re: [PATCH] virtio-net: parameterize min ring num_free for virtio receive
From: Michael S. Tsirkin @ 2019-07-18 13:04 UTC (permalink / raw)
  To: ? jiang
  Cc: jasowang@redhat.com, davem@davemloft.net, ast@kernel.org,
	daniel@iogearbox.net, jakub.kicinski@netronome.com,
	hawk@kernel.org, john.fastabend@gmail.com, kafai@fb.com,
	songliubraving@fb.com, yhs@fb.com,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, xdp-newbies@vger.kernel.org,
	bpf@vger.kernel.org, jiangran.jr@alibaba-inc.com
In-Reply-To: <BYAPR14MB32056583C4963342F5D817C4A6C80@BYAPR14MB3205.namprd14.prod.outlook.com>

On Thu, Jul 18, 2019 at 12:55:50PM +0000, ? jiang wrote:
> This change makes ring buffer reclaim threshold num_free configurable
> for better performance, while it's hard coded as 1/2 * queue now.
> According to our test with qemu + dpdk, packet dropping happens when
> the guest is not able to provide free buffer in avail ring timely.
> Smaller value of num_free does decrease the number of packet dropping
> during our test as it makes virtio_net reclaim buffer earlier.
> 
> At least, we should leave the value changeable to user while the
> default value as 1/2 * queue is kept.
> 
> Signed-off-by: jiangkidd <jiangkidd@hotmail.com>

That would be one reason, but I suspect it's not the
true one. If you need more buffer due to jitter
then just increase the queue size. Would be cleaner.


However are you sure this is the reason for
packet drops? Do you see them dropped by dpdk
due to lack of space in the ring? As opposed to
by guest?


> ---
>  drivers/net/virtio_net.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 0d4115c9e20b..bc190dec6084 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -26,6 +26,9 @@
>  static int napi_weight = NAPI_POLL_WEIGHT;
>  module_param(napi_weight, int, 0444);
>  
> +static int min_numfree;
> +module_param(min_numfree, int, 0444);
> +
>  static bool csum = true, gso = true, napi_tx;
>  module_param(csum, bool, 0444);
>  module_param(gso, bool, 0444);
> @@ -1315,6 +1318,9 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>  	void *buf;
>  	int i;
>  
> +	if (!min_numfree)
> +		min_numfree = virtqueue_get_vring_size(rq->vq) / 2;
> +
>  	if (!vi->big_packets || vi->mergeable_rx_bufs) {
>  		void *ctx;
>  
> @@ -1331,7 +1337,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>  		}
>  	}
>  
> -	if (rq->vq->num_free > virtqueue_get_vring_size(rq->vq) / 2) {
> +	if (rq->vq->num_free > min_numfree) {
>  		if (!try_fill_recv(vi, rq, GFP_ATOMIC))
>  			schedule_delayed_work(&vi->refill, 0);
>  	}
> -- 
> 2.11.0

^ permalink raw reply

* Re: regression with napi/softirq ?
From: Sudip Mukherjee @ 2019-07-18 12:55 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, Peter Zijlstra (Intel), David S. Miller, netdev,
	linux-kernel
In-Reply-To: <8124bbe5-eaa8-2106-2695-4788ec0f6544@gmail.com>

On Thu, Jul 18, 2019 at 12:42 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 7/18/19 1:18 PM, Sudip Mukherjee wrote:
> > Hi Eric,
> >
> > On Thu, Jul 18, 2019 at 7:58 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>
> >>
> >>
> >> On 7/17/19 11:52 PM, Thomas Gleixner wrote:
> >>> Sudip,
> >>>
> >>> On Wed, 17 Jul 2019, Sudip Mukherjee wrote:
> >>>> On Wed, Jul 17, 2019 at 9:53 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >>>>> You can hack ksoftirq_running() to return always false to avoid this, but
> >>>>> that might cause application starvation and a huge packet buffer backlog
> >>>>> when the amount of incoming packets makes the CPU do nothing else than
> >>>>> softirq processing.
> >>>>
> >>>> I tried that now, it is better but still not as good as v3.8
> >>>> Now I am getting 375.9usec as the maximum time between raising the softirq
> >>>> and it starting to execute and packet drops still there.
> >>>>
> >>>> And just a thought, do you think there should be a CONFIG_ option for
> >>>> this feature of ksoftirqd_running() so that it can be disabled if needed
> >>>> by users like us?
> >>>
> >>> If at all then a sysctl to allow runtime control.
> >>>
> >
> > <snip>
> >
> >>
> >> ksoftirqd might be spuriously scheduled from tx path, when
> >> __qdisc_run() also reacts to need_resched().
> >>
> >> By raising NET_TX while we are processing NET_RX (say we send a TCP ACK packet
> >> in response to incoming packet), we force __do_softirq() to perform
> >> another loop, but before doing an other round, it will also check need_resched()
> >> and eventually call wakeup_softirqd()
> >>
> >> I wonder if following patch makes any difference.
> >>
> >> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> >> index 11c03cf4aa74b44663c74e0e3284140b0c75d9c4..ab736e974396394ae6ba409868aaea56a50ad57b 100644
> >> --- a/net/sched/sch_generic.c
> >> +++ b/net/sched/sch_generic.c
> >> @@ -377,6 +377,8 @@ void __qdisc_run(struct Qdisc *q)
> >>         int packets;
> >>
> >>         while (qdisc_restart(q, &packets)) {
> >> +               if (qdisc_is_empty(q))
> >> +                       break;
> >
> > unfortunately its v4.14.55 and qdisc_is_empty() is not yet introduced.
> > And I can not backport 28cff537ef2e ("net: sched: add empty status
> > flag for NOLOCK qdisc")
> > also as TCQ_F_NOLOCK is there. :(
> >
>
> On old kernels, you can simply use
>
> static inline bool qdisc_is_empty(struct Qdisc *q)
> {
>         return !qdisc_qlen(q);
> }
>

Thanks Eric. But there is no improvement in delay between
softirq_raise and softirq_entry with this change.
But moving to a later kernel (linus master branch? ) like Thomas has
said in the other mail might be difficult atm. I can definitely
move to v4.14.133 if that helps. Thomas ?


-- 
Regards
Sudip

^ permalink raw reply

* Re: regression with napi/softirq ?
From: Thomas Gleixner @ 2019-07-18 12:40 UTC (permalink / raw)
  To: Sudip Mukherjee
  Cc: Eric Dumazet, Peter Zijlstra (Intel), David S. Miller, netdev,
	linux-kernel
In-Reply-To: <CADVatmN6xNO1iMQ4ihsT5OqV2cuj2ajq+v00NrtUyOHkiKPo-Q@mail.gmail.com>

On Thu, 18 Jul 2019, Sudip Mukherjee wrote:
> On Thu, Jul 18, 2019 at 7:58 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > ksoftirqd might be spuriously scheduled from tx path, when
> > __qdisc_run() also reacts to need_resched().
> >
> > By raising NET_TX while we are processing NET_RX (say we send a TCP ACK packet
> > in response to incoming packet), we force __do_softirq() to perform
> > another loop, but before doing an other round, it will also check need_resched()
> > and eventually call wakeup_softirqd()
> >
> > I wonder if following patch makes any difference.
> >
> > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> > index 11c03cf4aa74b44663c74e0e3284140b0c75d9c4..ab736e974396394ae6ba409868aaea56a50ad57b 100644
> > --- a/net/sched/sch_generic.c
> > +++ b/net/sched/sch_generic.c
> > @@ -377,6 +377,8 @@ void __qdisc_run(struct Qdisc *q)
> >         int packets;
> >
> >         while (qdisc_restart(q, &packets)) {
> > +               if (qdisc_is_empty(q))
> > +                       break;
> 
> unfortunately its v4.14.55 and qdisc_is_empty() is not yet introduced.
> And I can not backport 28cff537ef2e ("net: sched: add empty status
> flag for NOLOCK qdisc")
> also as TCQ_F_NOLOCK is there. :(

Then please run the test on a current kernel. Backports can be discussed
once the problem is understood.

Thanks,

	tglx

^ permalink raw reply

* Re: [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed
From: Michael S. Tsirkin @ 2019-07-18 12:33 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <CAGxU2F5ybg1_8VhS=COMnxSKC4AcW4ZagYwNMi==d6-rNPgzsg@mail.gmail.com>

On Thu, Jul 18, 2019 at 09:52:41AM +0200, Stefano Garzarella wrote:
> On Wed, Jul 17, 2019 at 5:00 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Jul 17, 2019 at 01:30:30PM +0200, Stefano Garzarella wrote:
> > > Since now we are able to split packets, we can avoid limiting
> > > their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
> > > Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
> > > packet size.
> > >
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> >
> >
> > OK so this is kind of like GSO where we are passing
> > 64K packets to the vsock and then split at the
> > low level.
> 
> Exactly, something like that in the Host->Guest path, instead in the
> Guest->Host we use the entire 64K packet.
> 
> Thanks,
> Stefano

btw two allocations for each packet isn't great. How about
allocating the struct linearly with the data?
And all buffers are same length for you - so you can actually
do alloc_pages.
Allocating/freeing pages in a batch should also be considered.

-- 
MST

^ permalink raw reply

* KASAN: use-after-free Read in nr_insert_socket
From: syzbot @ 2019-07-18 12:18 UTC (permalink / raw)
  To: davem, linux-hams, linux-kernel, netdev, ralf, syzkaller-bugs,
	xiyou.wangcong

Hello,

syzbot found the following crash on:

HEAD commit:    a5b64700 fix: taprio: Change type of txtime-delay paramete..
git tree:       net
console output: https://syzkaller.appspot.com/x/log.txt?x=1588b458600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=87305c3ca9c25c70
dashboard link: https://syzkaller.appspot.com/bug?extid=9399c158fcc09b21d0d2
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=105a61a4600000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=153ef948600000

The bug was bisected to:

commit c8c8218ec5af5d2598381883acbefbf604e56b5e
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Thu Jun 27 21:30:58 2019 +0000

     netrom: fix a memory leak in nr_rx_frame()

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=159ef948600000
final crash:    https://syzkaller.appspot.com/x/report.txt?x=179ef948600000
console output: https://syzkaller.appspot.com/x/log.txt?x=139ef948600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+9399c158fcc09b21d0d2@syzkaller.appspotmail.com
Fixes: c8c8218ec5af ("netrom: fix a memory leak in nr_rx_frame()")

==================================================================
BUG: KASAN: use-after-free in atomic_read  
/./include/asm-generic/atomic-instrumented.h:26 [inline]
BUG: KASAN: use-after-free in refcount_inc_not_zero_checked+0x81/0x200  
/lib/refcount.c:123
Read of size 4 at addr ffff8880a5d3f380 by task swapper/1/0

CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.2.0+ #89
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  <IRQ>
  __dump_stack /lib/dump_stack.c:77 [inline]
  dump_stack+0x172/0x1f0 /lib/dump_stack.c:113
  print_address_description.cold+0xd4/0x306 /mm/kasan/report.c:351
  __kasan_report.cold+0x1b/0x36 /mm/kasan/report.c:482
  kasan_report+0x12/0x20 /mm/kasan/common.c:612
  check_memory_region_inline /mm/kasan/generic.c:185 [inline]
  check_memory_region+0x134/0x1a0 /mm/kasan/generic.c:192
  __kasan_check_read+0x11/0x20 /mm/kasan/common.c:92
  atomic_read /./include/asm-generic/atomic-instrumented.h:26 [inline]
  refcount_inc_not_zero_checked+0x81/0x200 /lib/refcount.c:123
  refcount_inc_checked+0x17/0x70 /lib/refcount.c:156
  sock_hold /./include/net/sock.h:649 [inline]
  sk_add_node /./include/net/sock.h:701 [inline]
  nr_insert_socket+0x2d/0xe0 /net/netrom/af_netrom.c:137
  nr_rx_frame+0x1605/0x1e80 /net/netrom/af_netrom.c:1023
  nr_loopback_timer+0x7b/0x170 /net/netrom/nr_loopback.c:59
  call_timer_fn+0x1ac/0x780 /kernel/time/timer.c:1322
  expire_timers /kernel/time/timer.c:1366 [inline]
  __run_timers /kernel/time/timer.c:1685 [inline]
  __run_timers /kernel/time/timer.c:1653 [inline]
  run_timer_softirq+0x697/0x17a0 /kernel/time/timer.c:1698
  __do_softirq+0x262/0x98c /kernel/softirq.c:292
  invoke_softirq /kernel/softirq.c:373 [inline]
  irq_exit+0x19b/0x1e0 /kernel/softirq.c:413
  exiting_irq /./arch/x86/include/asm/apic.h:537 [inline]
  smp_apic_timer_interrupt+0x1a3/0x610 /arch/x86/kernel/apic/apic.c:1095
  apic_timer_interrupt+0xf/0x20 /arch/x86/entry/entry_64.S:828
  </IRQ>
RIP: 0010:native_safe_halt+0xe/0x10 /./arch/x86/include/asm/irqflags.h:61
Code: e8 2b 7b fa eb 8a 90 90 90 90 90 90 e9 07 00 00 00 0f 00 2d d4 0e 57  
00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d c4 0e 57 00 fb f4 <c3> 90 55 48 89  
e5 41 57 41 56 41 55 41 54 53 e8 7e 27 2f fa e8 59
RSP: 0018:ffff8880a98e7d68 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
RAX: 1ffffffff11a5ca5 RBX: ffff8880a98ce340 RCX: 0000000000000000
RDX: dffffc0000000000 RSI: 0000000000000006 RDI: ffff8880a98cebcc
RBP: ffff8880a98e7d98 R08: ffff8880a98ce340 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000
R13: ffffffff89a29778 R14: 0000000000000000 R15: 0000000000000001
  arch_cpu_idle+0xa/0x10 /arch/x86/kernel/process.c:571
  default_idle_call+0x84/0xb0 /kernel/sched/idle.c:94
  cpuidle_idle_call /kernel/sched/idle.c:154 [inline]
  do_idle+0x413/0x760 /kernel/sched/idle.c:263
  cpu_startup_entry+0x1b/0x20 /kernel/sched/idle.c:354
  start_secondary+0x315/0x430 /arch/x86/kernel/smpboot.c:264
  secondary_startup_64+0xa4/0xb0 /arch/x86/kernel/head_64.S:243

Allocated by task 0:
  save_stack+0x23/0x90 /mm/kasan/common.c:69
  set_track /mm/kasan/common.c:77 [inline]
  __kasan_kmalloc /mm/kasan/common.c:487 [inline]
  __kasan_kmalloc.constprop.0+0xcf/0xe0 /mm/kasan/common.c:460
  kasan_kmalloc+0x9/0x10 /mm/kasan/common.c:501
  __do_kmalloc /mm/slab.c:3655 [inline]
  __kmalloc+0x163/0x780 /mm/slab.c:3664
  kmalloc /./include/linux/slab.h:557 [inline]
  sk_prot_alloc+0x23a/0x310 /net/core/sock.c:1603
  sk_alloc+0x39/0xf70 /net/core/sock.c:1657
  nr_make_new /net/netrom/af_netrom.c:476 [inline]
  nr_rx_frame+0x733/0x1e80 /net/netrom/af_netrom.c:959
  nr_loopback_timer+0x7b/0x170 /net/netrom/nr_loopback.c:59
  call_timer_fn+0x1ac/0x780 /kernel/time/timer.c:1322
  expire_timers /kernel/time/timer.c:1366 [inline]
  __run_timers /kernel/time/timer.c:1685 [inline]
  __run_timers /kernel/time/timer.c:1653 [inline]
  run_timer_softirq+0x697/0x17a0 /kernel/time/timer.c:1698
  __do_softirq+0x262/0x98c /kernel/softirq.c:292

Freed by task 11342:
  save_stack+0x23/0x90 /mm/kasan/common.c:69
  set_track /mm/kasan/common.c:77 [inline]
  __kasan_slab_free+0x102/0x150 /mm/kasan/common.c:449
  kasan_slab_free+0xe/0x10 /mm/kasan/common.c:457
  __cache_free /mm/slab.c:3425 [inline]
  kfree+0x10a/0x2c0 /mm/slab.c:3756
  sk_prot_free /net/core/sock.c:1640 [inline]
  __sk_destruct+0x4f7/0x6e0 /net/core/sock.c:1726
  sk_destruct+0x86/0xa0 /net/core/sock.c:1734
  __sk_free+0xfb/0x360 /net/core/sock.c:1745
  sk_free+0x42/0x50 /net/core/sock.c:1756
  sock_put /./include/net/sock.h:1725 [inline]
  sock_efree+0x61/0x80 /net/core/sock.c:2042
  skb_release_head_state+0xeb/0x260 /net/core/skbuff.c:652
  skb_release_all+0x16/0x60 /net/core/skbuff.c:663
  __kfree_skb /net/core/skbuff.c:679 [inline]
  kfree_skb /net/core/skbuff.c:697 [inline]
  kfree_skb+0x101/0x3c0 /net/core/skbuff.c:691
  nr_accept+0x570/0x720 /net/netrom/af_netrom.c:819
  __sys_accept4+0x34e/0x6a0 /net/socket.c:1750
  __do_sys_accept4 /net/socket.c:1785 [inline]
  __se_sys_accept4 /net/socket.c:1782 [inline]
  __x64_sys_accept4+0x97/0xf0 /net/socket.c:1782
  do_syscall_64+0xfd/0x6a0 /arch/x86/entry/common.c:296
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8880a5d3f300
  which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 128 bytes inside of
  2048-byte region [ffff8880a5d3f300, ffff8880a5d3fb00)
The buggy address belongs to the page:
page:ffffea0002974f80 refcount:1 mapcount:0 mapping:ffff8880aa400e00  
index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea00021bf808 ffffea000282a708 ffff8880aa400e00
raw: 0000000000000000 ffff8880a5d3e200 0000000100000003 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8880a5d3f280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff8880a5d3f300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff8880a5d3f380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                    ^
  ffff8880a5d3f400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8880a5d3f480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: regression with napi/softirq ?
From: Eric Dumazet @ 2019-07-18 11:42 UTC (permalink / raw)
  To: Sudip Mukherjee
  Cc: Thomas Gleixner, Peter Zijlstra (Intel), David S. Miller, netdev,
	linux-kernel
In-Reply-To: <CADVatmN6xNO1iMQ4ihsT5OqV2cuj2ajq+v00NrtUyOHkiKPo-Q@mail.gmail.com>



On 7/18/19 1:18 PM, Sudip Mukherjee wrote:
> Hi Eric,
> 
> On Thu, Jul 18, 2019 at 7:58 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>
>>
>> On 7/17/19 11:52 PM, Thomas Gleixner wrote:
>>> Sudip,
>>>
>>> On Wed, 17 Jul 2019, Sudip Mukherjee wrote:
>>>> On Wed, Jul 17, 2019 at 9:53 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>>>>> You can hack ksoftirq_running() to return always false to avoid this, but
>>>>> that might cause application starvation and a huge packet buffer backlog
>>>>> when the amount of incoming packets makes the CPU do nothing else than
>>>>> softirq processing.
>>>>
>>>> I tried that now, it is better but still not as good as v3.8
>>>> Now I am getting 375.9usec as the maximum time between raising the softirq
>>>> and it starting to execute and packet drops still there.
>>>>
>>>> And just a thought, do you think there should be a CONFIG_ option for
>>>> this feature of ksoftirqd_running() so that it can be disabled if needed
>>>> by users like us?
>>>
>>> If at all then a sysctl to allow runtime control.
>>>
> 
> <snip>
> 
>>
>> ksoftirqd might be spuriously scheduled from tx path, when
>> __qdisc_run() also reacts to need_resched().
>>
>> By raising NET_TX while we are processing NET_RX (say we send a TCP ACK packet
>> in response to incoming packet), we force __do_softirq() to perform
>> another loop, but before doing an other round, it will also check need_resched()
>> and eventually call wakeup_softirqd()
>>
>> I wonder if following patch makes any difference.
>>
>> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
>> index 11c03cf4aa74b44663c74e0e3284140b0c75d9c4..ab736e974396394ae6ba409868aaea56a50ad57b 100644
>> --- a/net/sched/sch_generic.c
>> +++ b/net/sched/sch_generic.c
>> @@ -377,6 +377,8 @@ void __qdisc_run(struct Qdisc *q)
>>         int packets;
>>
>>         while (qdisc_restart(q, &packets)) {
>> +               if (qdisc_is_empty(q))
>> +                       break;
> 
> unfortunately its v4.14.55 and qdisc_is_empty() is not yet introduced.
> And I can not backport 28cff537ef2e ("net: sched: add empty status
> flag for NOLOCK qdisc")
> also as TCQ_F_NOLOCK is there. :(
> 

On old kernels, you can simply use

static inline bool qdisc_is_empty(struct Qdisc *q)
{
	return !qdisc_qlen(q);
}


^ permalink raw reply

* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
From: Michael S. Tsirkin @ 2019-07-18 11:35 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <CAGxU2F6oo7Cou7t9o=gG2=wxHMKX9xYQXNxVtDYeHq5fyEhJWg@mail.gmail.com>

On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
> On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> > > On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > > > If the packets to sent to the guest are bigger than the buffer
> > > > > available, we can split them, using multiple buffers and fixing
> > > > > the length in the packet header.
> > > > > This is safe since virtio-vsock supports only stream sockets.
> > > > >
> > > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > >
> > > > So how does it work right now? If an app
> > > > does sendmsg with a 64K buffer and the other
> > > > side publishes 4K buffers - does it just stall?
> > >
> > > Before this series, the 64K (or bigger) user messages was split in 4K packets
> > > (fixed in the code) and queued in an internal list for the TX worker.
> > >
> > > After this series, we will queue up to 64K packets and then it will be split in
> > > the TX worker, depending on the size of the buffers available in the
> > > vring. (The idea was to allow EWMA or a configuration of the buffers size, but
> > > for now we postponed it)
> >
> > Got it. Using workers for xmit is IMHO a bad idea btw.
> > Why is it done like this?
> 
> Honestly, I don't know the exact reasons for this design, but I suppose
> that the idea was to have only one worker that uses the vring, and
> multiple user threads that enqueue packets in the list.
> This can simplify the code and we can put the user threads to sleep if
> we don't have "credit" available (this means that the receiver doesn't
> have space to receive the packet).


I think you mean the reverse: even without credits you can copy from
user and queue up data, then process it without waking up the user
thread.
Does it help though? It certainly adds up work outside of
user thread context which means it's not accounted for
correctly.

Maybe we want more VQs. Would help improve parallelism. The question
would then become how to map sockets to VQs. With a simple hash
it's easy to create collisions ...


> 
> What are the drawbacks in your opinion?
> 
> 
> Thanks,
> Stefano

- More pressure on scheduler
- Increased latency


-- 
MST

^ permalink raw reply

* Re: regression with napi/softirq ?
From: Sudip Mukherjee @ 2019-07-18 11:18 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, Peter Zijlstra (Intel), David S. Miller, netdev,
	linux-kernel
In-Reply-To: <052e43b6-26f8-3e46-784e-dc3c6a82bdf0@gmail.com>

Hi Eric,

On Thu, Jul 18, 2019 at 7:58 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 7/17/19 11:52 PM, Thomas Gleixner wrote:
> > Sudip,
> >
> > On Wed, 17 Jul 2019, Sudip Mukherjee wrote:
> >> On Wed, Jul 17, 2019 at 9:53 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >>> You can hack ksoftirq_running() to return always false to avoid this, but
> >>> that might cause application starvation and a huge packet buffer backlog
> >>> when the amount of incoming packets makes the CPU do nothing else than
> >>> softirq processing.
> >>
> >> I tried that now, it is better but still not as good as v3.8
> >> Now I am getting 375.9usec as the maximum time between raising the softirq
> >> and it starting to execute and packet drops still there.
> >>
> >> And just a thought, do you think there should be a CONFIG_ option for
> >> this feature of ksoftirqd_running() so that it can be disabled if needed
> >> by users like us?
> >
> > If at all then a sysctl to allow runtime control.
> >

<snip>

>
> ksoftirqd might be spuriously scheduled from tx path, when
> __qdisc_run() also reacts to need_resched().
>
> By raising NET_TX while we are processing NET_RX (say we send a TCP ACK packet
> in response to incoming packet), we force __do_softirq() to perform
> another loop, but before doing an other round, it will also check need_resched()
> and eventually call wakeup_softirqd()
>
> I wonder if following patch makes any difference.
>
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 11c03cf4aa74b44663c74e0e3284140b0c75d9c4..ab736e974396394ae6ba409868aaea56a50ad57b 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -377,6 +377,8 @@ void __qdisc_run(struct Qdisc *q)
>         int packets;
>
>         while (qdisc_restart(q, &packets)) {
> +               if (qdisc_is_empty(q))
> +                       break;

unfortunately its v4.14.55 and qdisc_is_empty() is not yet introduced.
And I can not backport 28cff537ef2e ("net: sched: add empty status
flag for NOLOCK qdisc")
also as TCQ_F_NOLOCK is there. :(


-- 
Regards
Sudip

^ permalink raw reply

* Re: [RFC v2] vhost: introduce mdev based hardware vhost backend
From: Jason Wang @ 2019-07-18 10:31 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: Alex Williamson, mst, maxime.coquelin, linux-kernel, kvm,
	virtualization, netdev, dan.daly, cunming.liang, zhihong.wang,
	idos, Rob Miller, Ariel Adam
In-Reply-To: <1b49aa84-2c1f-eec2-2809-711e1f2dd7de@redhat.com>


On 2019/7/10 下午3:22, Jason Wang wrote:
>> Yeah, that's a major concern. If it's true, is it something
>> that's not acceptable?
>
>
> I think not, but I don't know if any other one that care this.
>
>
>>
>>> And I do see some new RFC for VFIO to add more DMA API.
>> Is there any pointers?
>
>
> I don't remember the details, but it should be something related to 
> SVA support in recent intel IOMMU.


E.g this series:

https://www.spinics.net/lists/iommu/msg37146.html

Thanks


^ permalink raw reply

* [PATCH] Signed-off-by: Peter Kosyh <p.kosyh@gmail.com>
From: Peter Kosyh @ 2019-07-18 10:26 UTC (permalink / raw)
  To: p.kosyh; +Cc: davem, David Ahern, Shrijeet Mukherjee, netdev, linux-kernel

vrf_process_v4_outbound() and vrf_process_v6_outbound() do routing
using ip/ipv6 addresses, but don't make sure the header is available in
skb->data[] (skb_headlen() is less then header size).

The situation may occures while forwarding from MPLS layer to vrf, for example.

So, patch adds pskb_may_pull() calls in is_ip_tx_frame(), just before call
to vrf_process_... functions.

Signed-off-by: Peter Kosyh <p.kosyh@gmail.com>
---
 drivers/net/vrf.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 54edf8956a25..7c0200bc46f9 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -292,13 +292,16 @@ static netdev_tx_t is_ip_tx_frame(struct sk_buff *skb, struct net_device *dev)
 {
 	switch (skb->protocol) {
 	case htons(ETH_P_IP):
+		if (!pskb_may_pull(skb, ETH_HLEN + sizeof(struct iphdr)))
+			break;
 		return vrf_process_v4_outbound(skb, dev);
 	case htons(ETH_P_IPV6):
+		if (!pskb_may_pull(skb, ETH_HLEN + sizeof(struct ipv6hdr)))
+			break;
 		return vrf_process_v6_outbound(skb, dev);
-	default:
-		vrf_tx_error(dev, skb);
-		return NET_XMIT_DROP;
 	}
+	vrf_tx_error(dev, skb);
+	return NET_XMIT_DROP;
 }
 
 static netdev_tx_t vrf_xmit(struct sk_buff *skb, struct net_device *dev)
-- 
2.11.0


^ permalink raw reply related

* [RFC net] net: generate icmp redirects after netfilter forward hook
From: Florian Westphal @ 2019-07-18 10:24 UTC (permalink / raw)
  To: netdev; +Cc: netfilter-devel, Florian Westphal, Jason Muskat

Quoting from
https://bugzilla.kernel.org/show_bug.cgi?id=204155

 ----
1. Configure an PPPoE interface (e.g., ppp0) with an IPv6 address and
   a prefixlen of 64; e.g., 2001:0DB8::1/64.
2. Configure a Netfilter ip6table rule to drop IN to OUT forwarding of traffic
   across the PPP interface: ip6tables -A FORWARD -i ppp+ -o ppp+ -j DROP
3. Create a Netfilter NFLOG rule to log all outbound traffic.
4. From an Internet IPv6 Address initiate TCP traffic to an IPv6 address
   within the /64 IPv6 address space from step 1, but an IPv6 address that
   is NOT configured on that interface; e.g., ` nc 2001:0DB8::2 80`
5. Observe the NFLOG showing the Netfilter ip6table filter FORWARD rule
   is matched and therefore the traffic should be dropped.
6. Observe the traffic from step 4, that should have been dropped,
   resulted in an outbound ICMPv6 Redirect with a source IPv6 address of
   the PPP interface’s Local Link to the Internet IPv6 Address.
 ----

Problem is that we emit the redirect before passing the packet to the
netfilter FORWARD hook.   The same "problem" exists in ipv4.

There are various counter-arguments to changing this, e.g.  we would
still emit such redirect when packet is dropped later in the stack
(e.g. in POSTROUTING or qdisc).

We will also still emit e.g. ICMPV6 PKTTOOBIG error, as that occurs
before FORWARD as well, and moving that seems wrong (packet
has to be dropped).

The only argument that I can think of in favor of this change
is the lack of a proper alternative to filtering such traffic.

PREROUTING would work, but at that point we lack the
"packet will be forwarded from ppp0 to ppp0" information that
we only have available in FORWARD.

Compile tested only.

Cc: Jason Muskat <jason@metalcastle.ca>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/ipv4/ip_forward.c | 16 +++++------
 net/ipv6/ip6_output.c | 63 ++++++++++++++++++++++++-------------------
 2 files changed, 44 insertions(+), 35 deletions(-)

diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index 06f6f280b9ff..33c46470c966 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -69,6 +69,14 @@ static int ip_forward_finish(struct net *net, struct sock *sk, struct sk_buff *s
 	__IP_INC_STATS(net, IPSTATS_MIB_OUTFORWDATAGRAMS);
 	__IP_ADD_STATS(net, IPSTATS_MIB_OUTOCTETS, skb->len);
 
+	/*
+	 *	We now generate an ICMP HOST REDIRECT giving the route
+	 *	we calculated.
+	 */
+	if (IPCB(skb)->flags & IPSKB_DOREDIRECT && !opt->srr &&
+	    !skb_sec_path(skb))
+		ip_rt_send_redirect(skb);
+
 #ifdef CONFIG_NET_SWITCHDEV
 	if (skb->offload_l3_fwd_mark) {
 		consume_skb(skb);
@@ -143,14 +151,6 @@ int ip_forward(struct sk_buff *skb)
 	/* Decrease ttl after skb cow done */
 	ip_decrease_ttl(iph);
 
-	/*
-	 *	We now generate an ICMP HOST REDIRECT giving the route
-	 *	we calculated.
-	 */
-	if (IPCB(skb)->flags & IPSKB_DOREDIRECT && !opt->srr &&
-	    !skb_sec_path(skb))
-		ip_rt_send_redirect(skb);
-
 	if (net->ipv4.sysctl_ip_fwd_update_priority)
 		skb->priority = rt_tos2priority(iph->tos);
 
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 8e49fd62eea9..2dafd2da2926 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -383,11 +383,45 @@ static int ip6_forward_proxy_check(struct sk_buff *skb)
 static inline int ip6_forward_finish(struct net *net, struct sock *sk,
 				     struct sk_buff *skb)
 {
+	struct inet6_skb_parm *opt = IP6CB(skb);
 	struct dst_entry *dst = skb_dst(skb);
 
 	__IP6_INC_STATS(net, ip6_dst_idev(dst), IPSTATS_MIB_OUTFORWDATAGRAMS);
 	__IP6_ADD_STATS(net, ip6_dst_idev(dst), IPSTATS_MIB_OUTOCTETS, skb->len);
 
+	/* IPv6 specs say nothing about it, but it is clear that we cannot
+	   send redirects to source routed frames.
+	   We don't send redirects to frames decapsulated from IPsec.
+	 */
+	if (IP6CB(skb)->iif == dst->dev->ifindex &&
+	    opt->srcrt == 0 && !skb_sec_path(skb)) {
+		struct ipv6hdr *hdr = ipv6_hdr(skb);
+		struct in6_addr *target = NULL;
+		struct inet_peer *peer;
+		struct rt6_info *rt;
+
+		/*
+		 *	incoming and outgoing devices are the same
+		 *	send a redirect.
+		 */
+
+		rt = (struct rt6_info *) dst;
+		if (rt->rt6i_flags & RTF_GATEWAY)
+			target = &rt->rt6i_gateway;
+		else
+			target = &hdr->daddr;
+
+		peer = inet_getpeer_v6(net->ipv6.peers, &hdr->daddr, 1);
+
+		/* Limit redirects both by destination (here)
+		   and by source (inside ndisc_send_redirect)
+		 */
+		if (inet_peer_xrlim_allow(peer, 1*HZ))
+			ndisc_send_redirect(skb, target);
+		if (peer)
+			inet_putpeer(peer);
+	}
+
 #ifdef CONFIG_NET_SWITCHDEV
 	if (skb->offload_l3_fwd_mark) {
 		consume_skb(skb);
@@ -498,33 +532,8 @@ int ip6_forward(struct sk_buff *skb)
 	   send redirects to source routed frames.
 	   We don't send redirects to frames decapsulated from IPsec.
 	 */
-	if (IP6CB(skb)->iif == dst->dev->ifindex &&
-	    opt->srcrt == 0 && !skb_sec_path(skb)) {
-		struct in6_addr *target = NULL;
-		struct inet_peer *peer;
-		struct rt6_info *rt;
-
-		/*
-		 *	incoming and outgoing devices are the same
-		 *	send a redirect.
-		 */
-
-		rt = (struct rt6_info *) dst;
-		if (rt->rt6i_flags & RTF_GATEWAY)
-			target = &rt->rt6i_gateway;
-		else
-			target = &hdr->daddr;
-
-		peer = inet_getpeer_v6(net->ipv6.peers, &hdr->daddr, 1);
-
-		/* Limit redirects both by destination (here)
-		   and by source (inside ndisc_send_redirect)
-		 */
-		if (inet_peer_xrlim_allow(peer, 1*HZ))
-			ndisc_send_redirect(skb, target);
-		if (peer)
-			inet_putpeer(peer);
-	} else {
+	if (IP6CB(skb)->iif != dst->dev->ifindex ||
+	    opt->srcrt || skb_sec_path(skb)) {
 		int addrtype = ipv6_addr_type(&hdr->saddr);
 
 		/* This check is security critical. */
-- 
2.21.0


^ permalink raw reply related

* RE: [EXT] [PATCH] bnx2x: Prevent load reordering in tx completion processing
From: Manish Chopra @ 2019-07-18 10:12 UTC (permalink / raw)
  To: Brian King, GR-everest-linux-l2
  Cc: Sudarsana Reddy Kalluru, Ariel Elior, netdev@vger.kernel.org
In-Reply-To: <1563226910-21660-1-git-send-email-brking@linux.vnet.ibm.com>

> -----Original Message-----
> From: Brian King <brking@linux.vnet.ibm.com>
> Sent: Tuesday, July 16, 2019 3:12 AM
> To: GR-everest-linux-l2 <GR-everest-linux-l2@marvell.com>
> Cc: Sudarsana Reddy Kalluru <skalluru@marvell.com>; Ariel Elior
> <aelior@marvell.com>; netdev@vger.kernel.org; Brian King
> <brking@linux.vnet.ibm.com>
> Subject: [EXT] [PATCH] bnx2x: Prevent load reordering in tx completion
> processing
> 
> External Email
> 
> ----------------------------------------------------------------------
> This patch fixes an issue seen on Power systems with bnx2x which results in
> the skb is NULL WARN_ON in bnx2x_free_tx_pkt firing due to the skb pointer
> getting loaded in bnx2x_free_tx_pkt prior to the hw_cons load in
> bnx2x_tx_int. Adding a read memory barrier resolves the issue.
> 
> Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
> ---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> index 656ed80..e2be5a6 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> @@ -285,6 +285,9 @@ int bnx2x_tx_int(struct bnx2x *bp, struct
> bnx2x_fp_txdata *txdata)
>  	hw_cons = le16_to_cpu(*txdata->tx_cons_sb);
>  	sw_cons = txdata->tx_pkt_cons;
> 
> +	/* Ensure subsequent loads occur after hw_cons */
> +	smp_rmb();
> +
>  	while (sw_cons != hw_cons) {
>  		u16 pkt_cons;
> 
> --
> 1.8.3.1

Could you please explain a bit in detail what could have caused skb to NULL exactly ?
Curious that if skb would have been NULL for some reason it did not cause NULL pointer dereference in bnx2x_free_tx_pkt() on below call -

prefetch(&skb->end);

Which is prior to the said WARN_ON(!skb) in bnx2x_free_tx_pkt().

^ permalink raw reply

* [PATCH] Signed-off-by: Peter Kosyh <p.kosyh@gmail.com>
From: Peter Kosyh @ 2019-07-18  9:41 UTC (permalink / raw)
  To: p.kosyh; +Cc: davem, David Ahern, Shrijeet Mukherjee, netdev, linux-kernel

vrf_process_v4_outbound() and vrf_process_v6_outbound() do routing
using ip/ipv6 addresses, but don't make sure the header is available in
skb->data[] (skb_headlen() is less then header size).

The situation may occures while forwarding from MPLS layer to vrf, for
example.

So, this patch adds pskb_may_pull() calls in is_ip_tx_frame(), just before
call to vrf_process_... functions.

Signed-off-by: Peter Kosyh <p.kosyh@gmail.com>
---
 drivers/net/vrf.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 54edf8956a25..d552f29a58d1 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -292,13 +292,16 @@ static netdev_tx_t is_ip_tx_frame(struct sk_buff *skb, struct net_device *dev)
 {
 	switch (skb->protocol) {
 	case htons(ETH_P_IP):
+		if (!pskb_may_pull(skb, ETH_HLEN + sizeof(struct iphdr))
+			break;
 		return vrf_process_v4_outbound(skb, dev);
 	case htons(ETH_P_IPV6):
+		if (!pskb_may_pull(skb, ETH_HLEN + sizeof(struct ipv6hdr))
+			break;
 		return vrf_process_v6_outbound(skb, dev);
-	default:
-		vrf_tx_error(dev, skb);
-		return NET_XMIT_DROP;
 	}
+	vrf_tx_error(dev, skb);
+	return NET_XMIT_DROP;
 }
 
 static netdev_tx_t vrf_xmit(struct sk_buff *skb, struct net_device *dev)
-- 
2.11.0


^ permalink raw reply related

* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
From: Stefano Garzarella @ 2019-07-18  9:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <20190718041234-mutt-send-email-mst@kernel.org>

On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> > On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > > If the packets to sent to the guest are bigger than the buffer
> > > > available, we can split them, using multiple buffers and fixing
> > > > the length in the packet header.
> > > > This is safe since virtio-vsock supports only stream sockets.
> > > >
> > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > >
> > > So how does it work right now? If an app
> > > does sendmsg with a 64K buffer and the other
> > > side publishes 4K buffers - does it just stall?
> >
> > Before this series, the 64K (or bigger) user messages was split in 4K packets
> > (fixed in the code) and queued in an internal list for the TX worker.
> >
> > After this series, we will queue up to 64K packets and then it will be split in
> > the TX worker, depending on the size of the buffers available in the
> > vring. (The idea was to allow EWMA or a configuration of the buffers size, but
> > for now we postponed it)
>
> Got it. Using workers for xmit is IMHO a bad idea btw.
> Why is it done like this?

Honestly, I don't know the exact reasons for this design, but I suppose
that the idea was to have only one worker that uses the vring, and
multiple user threads that enqueue packets in the list.
This can simplify the code and we can put the user threads to sleep if
we don't have "credit" available (this means that the receiver doesn't
have space to receive the packet).

What are the drawbacks in your opinion?


Thanks,
Stefano

^ permalink raw reply

* WARNING: refcount bug in nr_rx_frame
From: syzbot @ 2019-07-18  9:28 UTC (permalink / raw)
  To: davem, linux-hams, linux-kernel, netdev, ralf, syzkaller-bugs,
	xiyou.wangcong

Hello,

syzbot found the following crash on:

HEAD commit:    0a8ad0ff Merge tag 'for-linus-5.3-ofs1' of git://git.kerne..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16f3cb70600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=88095c4f62402bcd
dashboard link: https://syzkaller.appspot.com/bug?extid=622bdabb128acc33427d
compiler:       clang version 9.0.0 (/home/glider/llvm/clang  
80fee25776c2fb61e74c1ecb1a523375c2500b69)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=140a8cafa00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=123b2e68600000

The bug was bisected to:

commit c8c8218ec5af5d2598381883acbefbf604e56b5e
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Thu Jun 27 21:30:58 2019 +0000

     netrom: fix a memory leak in nr_rx_frame()

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1479a100600000
final crash:    https://syzkaller.appspot.com/x/report.txt?x=1679a100600000
console output: https://syzkaller.appspot.com/x/log.txt?x=1279a100600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+622bdabb128acc33427d@syzkaller.appspotmail.com
Fixes: c8c8218ec5af ("netrom: fix a memory leak in nr_rx_frame()")

------------[ cut here ]------------
refcount_t: increment on 0; use-after-free.
WARNING: CPU: 1 PID: 0 at lib/refcount.c:156 refcount_inc_checked+0x4b/0x50  
/lib/refcount.c:156
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.2.0+ #33
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  <IRQ>
  __dump_stack /lib/dump_stack.c:77 [inline]
  dump_stack+0x1d8/0x2f8 /lib/dump_stack.c:113
  panic+0x29b/0x7d9 /kernel/panic.c:219
  __warn+0x22f/0x230 /kernel/panic.c:576
  report_bug+0x190/0x290 /lib/bug.c:186
  fixup_bug /arch/x86/kernel/traps.c:179 [inline]
  do_error_trap+0xd7/0x440 /arch/x86/kernel/traps.c:272
  do_invalid_op+0x36/0x40 /arch/x86/kernel/traps.c:291
  invalid_op+0x14/0x20 /arch/x86/entry/entry_64.S:1008
RIP: 0010:refcount_inc_checked+0x4b/0x50 /lib/refcount.c:156
Code: 3d d9 5e 94 05 01 75 08 e8 22 8f 11 fe 5b 5d c3 e8 1a 8f 11 fe c6 05  
c3 5e 94 05 01 48 c7 c7 49 ca 87 88 31 c0 e8 35 ca e2 fd <0f> 0b eb df 90  
55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 e4 e0
RSP: 0018:ffff8880aeb09b40 EFLAGS: 00010246
RAX: d6b548c1187c2900 RBX: ffff8880a149f6c0 RCX: ffff8880a98bc340
RDX: 0000000000000302 RSI: 0000000000000302 RDI: 0000000000000000
RBP: ffff8880aeb09b48 R08: ffffffff81600c14 R09: fffffbfff13bb2bb
R10: fffffbfff13bb2bb R11: 0000000000000000 R12: ffff88808240aa40
R13: dffffc0000000000 R14: 0000000000000004 R15: ffff8880a149f640
  sock_hold /./include/net/sock.h:649 [inline]
  sk_add_node /./include/net/sock.h:701 [inline]
  nr_insert_socket /net/netrom/af_netrom.c:137 [inline]
  nr_rx_frame+0x17bc/0x1e40 /net/netrom/af_netrom.c:1023
  nr_loopback_timer+0x6a/0x140 /net/netrom/nr_loopback.c:59
  call_timer_fn+0xec/0x200 /kernel/time/timer.c:1322
  expire_timers /kernel/time/timer.c:1366 [inline]
  __run_timers+0x7cd/0x9c0 /kernel/time/timer.c:1685
  run_timer_softirq+0x4a/0x90 /kernel/time/timer.c:1698
  __do_softirq+0x333/0x7c4 /./arch/x86/include/asm/paravirt.h:777
  invoke_softirq /kernel/softirq.c:373 [inline]
  irq_exit+0x227/0x230 /kernel/softirq.c:413
  exiting_irq /./arch/x86/include/asm/apic.h:537 [inline]
  smp_apic_timer_interrupt+0x113/0x280 /arch/x86/kernel/apic/apic.c:1095
  apic_timer_interrupt+0xf/0x20 /arch/x86/entry/entry_64.S:828
  </IRQ>
RIP: 0010:native_safe_halt+0xe/0x10 /./arch/x86/include/asm/irqflags.h:61
Code: 08 fa eb ae 89 d9 80 e1 07 80 c1 03 38 c1 7c ba 48 89 df e8 a4 04 08  
fa eb b0 90 90 e9 07 00 00 00 0f 00 2d e6 9c 58 00 fb f4 <c3> 90 e9 07 00  
00 00 0f 00 2d d6 9c 58 00 f4 c3 90 90 55 48 89 e5
RSP: 0018:ffff8880a98cfd38 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
RAX: 1ffffffff11950db RBX: ffff8880a98bc340 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: ffffffff812cde8a RDI: ffff8880a98bcb78
RBP: ffff8880a98cfd40 R08: ffff8880a98bcb90 R09: ffffed1015317869
R10: ffffed1015317869 R11: 0000000000000000 R12: 0000000000000001
R13: 1ffff11015317868 R14: dffffc0000000000 R15: dffffc0000000000
  arch_cpu_idle+0xa/0x10 /arch/x86/kernel/process.c:571
  default_idle_call+0x59/0xa0 /kernel/sched/idle.c:94
  cpuidle_idle_call /kernel/sched/idle.c:154 [inline]
  do_idle+0x180/0x780 /kernel/sched/idle.c:263
  cpu_startup_entry+0x25/0x30 /kernel/sched/idle.c:354
  start_secondary+0x3f4/0x490 /arch/x86/kernel/smpboot.c:264
  secondary_startup_64+0xa4/0xb0 /arch/x86/kernel/head_64.S:243
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jon Hunter @ 2019-07-18  9:16 UTC (permalink / raw)
  To: Jose Abreu, linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org
  Cc: Joao Pinto, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue, Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai,
	linux-tegra
In-Reply-To: <BN8PR12MB32661E919A8DEBC7095BAA12D3C80@BN8PR12MB3266.namprd12.prod.outlook.com>


On 18/07/2019 08:48, Jose Abreu wrote:
> From: Jon Hunter <jonathanh@nvidia.com>
> Date: Jul/17/2019, 19:58:53 (UTC+00:00)
> 
>> Let me know if you have any thoughts.
> 
> Can you try attached patch ?

Yes this did not help. I tried enabling the following but no more output
is seen.

CONFIG_DMA_API_DEBUG=y
CONFIG_DMA_API_DEBUG_SG=y

Have you tried using NFS on a board with this ethernet controller?

Cheers,
Jon

-- 
nvpublic

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox