Netdev List
 help / color / mirror / Atom feed
* RE: [EXT] [PATCH] bnx2x: Prevent load reordering in tx completion processing
From: Manish Chopra @ 2019-07-18 10:12 UTC (permalink / raw)
  To: Brian King, GR-everest-linux-l2
  Cc: Sudarsana Reddy Kalluru, Ariel Elior, netdev@vger.kernel.org
In-Reply-To: <1563226910-21660-1-git-send-email-brking@linux.vnet.ibm.com>

> -----Original Message-----
> From: Brian King <brking@linux.vnet.ibm.com>
> Sent: Tuesday, July 16, 2019 3:12 AM
> To: GR-everest-linux-l2 <GR-everest-linux-l2@marvell.com>
> Cc: Sudarsana Reddy Kalluru <skalluru@marvell.com>; Ariel Elior
> <aelior@marvell.com>; netdev@vger.kernel.org; Brian King
> <brking@linux.vnet.ibm.com>
> Subject: [EXT] [PATCH] bnx2x: Prevent load reordering in tx completion
> processing
> 
> External Email
> 
> ----------------------------------------------------------------------
> This patch fixes an issue seen on Power systems with bnx2x which results in
> the skb is NULL WARN_ON in bnx2x_free_tx_pkt firing due to the skb pointer
> getting loaded in bnx2x_free_tx_pkt prior to the hw_cons load in
> bnx2x_tx_int. Adding a read memory barrier resolves the issue.
> 
> Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
> ---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> index 656ed80..e2be5a6 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> @@ -285,6 +285,9 @@ int bnx2x_tx_int(struct bnx2x *bp, struct
> bnx2x_fp_txdata *txdata)
>  	hw_cons = le16_to_cpu(*txdata->tx_cons_sb);
>  	sw_cons = txdata->tx_pkt_cons;
> 
> +	/* Ensure subsequent loads occur after hw_cons */
> +	smp_rmb();
> +
>  	while (sw_cons != hw_cons) {
>  		u16 pkt_cons;
> 
> --
> 1.8.3.1

Could you please explain a bit in detail what could have caused skb to NULL exactly ?
Curious that if skb would have been NULL for some reason it did not cause NULL pointer dereference in bnx2x_free_tx_pkt() on below call -

prefetch(&skb->end);

Which is prior to the said WARN_ON(!skb) in bnx2x_free_tx_pkt().

^ permalink raw reply

* [PATCH] Signed-off-by: Peter Kosyh <p.kosyh@gmail.com>
From: Peter Kosyh @ 2019-07-18  9:41 UTC (permalink / raw)
  To: p.kosyh; +Cc: davem, David Ahern, Shrijeet Mukherjee, netdev, linux-kernel

vrf_process_v4_outbound() and vrf_process_v6_outbound() do routing
using ip/ipv6 addresses, but don't make sure the header is available in
skb->data[] (skb_headlen() is less then header size).

The situation may occures while forwarding from MPLS layer to vrf, for
example.

So, this patch adds pskb_may_pull() calls in is_ip_tx_frame(), just before
call to vrf_process_... functions.

Signed-off-by: Peter Kosyh <p.kosyh@gmail.com>
---
 drivers/net/vrf.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 54edf8956a25..d552f29a58d1 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -292,13 +292,16 @@ static netdev_tx_t is_ip_tx_frame(struct sk_buff *skb, struct net_device *dev)
 {
 	switch (skb->protocol) {
 	case htons(ETH_P_IP):
+		if (!pskb_may_pull(skb, ETH_HLEN + sizeof(struct iphdr))
+			break;
 		return vrf_process_v4_outbound(skb, dev);
 	case htons(ETH_P_IPV6):
+		if (!pskb_may_pull(skb, ETH_HLEN + sizeof(struct ipv6hdr))
+			break;
 		return vrf_process_v6_outbound(skb, dev);
-	default:
-		vrf_tx_error(dev, skb);
-		return NET_XMIT_DROP;
 	}
+	vrf_tx_error(dev, skb);
+	return NET_XMIT_DROP;
 }
 
 static netdev_tx_t vrf_xmit(struct sk_buff *skb, struct net_device *dev)
-- 
2.11.0


^ permalink raw reply related

* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
From: Stefano Garzarella @ 2019-07-18  9:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <20190718041234-mutt-send-email-mst@kernel.org>

On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> > On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > > If the packets to sent to the guest are bigger than the buffer
> > > > available, we can split them, using multiple buffers and fixing
> > > > the length in the packet header.
> > > > This is safe since virtio-vsock supports only stream sockets.
> > > >
> > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > >
> > > So how does it work right now? If an app
> > > does sendmsg with a 64K buffer and the other
> > > side publishes 4K buffers - does it just stall?
> >
> > Before this series, the 64K (or bigger) user messages was split in 4K packets
> > (fixed in the code) and queued in an internal list for the TX worker.
> >
> > After this series, we will queue up to 64K packets and then it will be split in
> > the TX worker, depending on the size of the buffers available in the
> > vring. (The idea was to allow EWMA or a configuration of the buffers size, but
> > for now we postponed it)
>
> Got it. Using workers for xmit is IMHO a bad idea btw.
> Why is it done like this?

Honestly, I don't know the exact reasons for this design, but I suppose
that the idea was to have only one worker that uses the vring, and
multiple user threads that enqueue packets in the list.
This can simplify the code and we can put the user threads to sleep if
we don't have "credit" available (this means that the receiver doesn't
have space to receive the packet).

What are the drawbacks in your opinion?


Thanks,
Stefano

^ permalink raw reply

* WARNING: refcount bug in nr_rx_frame
From: syzbot @ 2019-07-18  9:28 UTC (permalink / raw)
  To: davem, linux-hams, linux-kernel, netdev, ralf, syzkaller-bugs,
	xiyou.wangcong

Hello,

syzbot found the following crash on:

HEAD commit:    0a8ad0ff Merge tag 'for-linus-5.3-ofs1' of git://git.kerne..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16f3cb70600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=88095c4f62402bcd
dashboard link: https://syzkaller.appspot.com/bug?extid=622bdabb128acc33427d
compiler:       clang version 9.0.0 (/home/glider/llvm/clang  
80fee25776c2fb61e74c1ecb1a523375c2500b69)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=140a8cafa00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=123b2e68600000

The bug was bisected to:

commit c8c8218ec5af5d2598381883acbefbf604e56b5e
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Thu Jun 27 21:30:58 2019 +0000

     netrom: fix a memory leak in nr_rx_frame()

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1479a100600000
final crash:    https://syzkaller.appspot.com/x/report.txt?x=1679a100600000
console output: https://syzkaller.appspot.com/x/log.txt?x=1279a100600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+622bdabb128acc33427d@syzkaller.appspotmail.com
Fixes: c8c8218ec5af ("netrom: fix a memory leak in nr_rx_frame()")

------------[ cut here ]------------
refcount_t: increment on 0; use-after-free.
WARNING: CPU: 1 PID: 0 at lib/refcount.c:156 refcount_inc_checked+0x4b/0x50  
/lib/refcount.c:156
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.2.0+ #33
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  <IRQ>
  __dump_stack /lib/dump_stack.c:77 [inline]
  dump_stack+0x1d8/0x2f8 /lib/dump_stack.c:113
  panic+0x29b/0x7d9 /kernel/panic.c:219
  __warn+0x22f/0x230 /kernel/panic.c:576
  report_bug+0x190/0x290 /lib/bug.c:186
  fixup_bug /arch/x86/kernel/traps.c:179 [inline]
  do_error_trap+0xd7/0x440 /arch/x86/kernel/traps.c:272
  do_invalid_op+0x36/0x40 /arch/x86/kernel/traps.c:291
  invalid_op+0x14/0x20 /arch/x86/entry/entry_64.S:1008
RIP: 0010:refcount_inc_checked+0x4b/0x50 /lib/refcount.c:156
Code: 3d d9 5e 94 05 01 75 08 e8 22 8f 11 fe 5b 5d c3 e8 1a 8f 11 fe c6 05  
c3 5e 94 05 01 48 c7 c7 49 ca 87 88 31 c0 e8 35 ca e2 fd <0f> 0b eb df 90  
55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 e4 e0
RSP: 0018:ffff8880aeb09b40 EFLAGS: 00010246
RAX: d6b548c1187c2900 RBX: ffff8880a149f6c0 RCX: ffff8880a98bc340
RDX: 0000000000000302 RSI: 0000000000000302 RDI: 0000000000000000
RBP: ffff8880aeb09b48 R08: ffffffff81600c14 R09: fffffbfff13bb2bb
R10: fffffbfff13bb2bb R11: 0000000000000000 R12: ffff88808240aa40
R13: dffffc0000000000 R14: 0000000000000004 R15: ffff8880a149f640
  sock_hold /./include/net/sock.h:649 [inline]
  sk_add_node /./include/net/sock.h:701 [inline]
  nr_insert_socket /net/netrom/af_netrom.c:137 [inline]
  nr_rx_frame+0x17bc/0x1e40 /net/netrom/af_netrom.c:1023
  nr_loopback_timer+0x6a/0x140 /net/netrom/nr_loopback.c:59
  call_timer_fn+0xec/0x200 /kernel/time/timer.c:1322
  expire_timers /kernel/time/timer.c:1366 [inline]
  __run_timers+0x7cd/0x9c0 /kernel/time/timer.c:1685
  run_timer_softirq+0x4a/0x90 /kernel/time/timer.c:1698
  __do_softirq+0x333/0x7c4 /./arch/x86/include/asm/paravirt.h:777
  invoke_softirq /kernel/softirq.c:373 [inline]
  irq_exit+0x227/0x230 /kernel/softirq.c:413
  exiting_irq /./arch/x86/include/asm/apic.h:537 [inline]
  smp_apic_timer_interrupt+0x113/0x280 /arch/x86/kernel/apic/apic.c:1095
  apic_timer_interrupt+0xf/0x20 /arch/x86/entry/entry_64.S:828
  </IRQ>
RIP: 0010:native_safe_halt+0xe/0x10 /./arch/x86/include/asm/irqflags.h:61
Code: 08 fa eb ae 89 d9 80 e1 07 80 c1 03 38 c1 7c ba 48 89 df e8 a4 04 08  
fa eb b0 90 90 e9 07 00 00 00 0f 00 2d e6 9c 58 00 fb f4 <c3> 90 e9 07 00  
00 00 0f 00 2d d6 9c 58 00 f4 c3 90 90 55 48 89 e5
RSP: 0018:ffff8880a98cfd38 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
RAX: 1ffffffff11950db RBX: ffff8880a98bc340 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: ffffffff812cde8a RDI: ffff8880a98bcb78
RBP: ffff8880a98cfd40 R08: ffff8880a98bcb90 R09: ffffed1015317869
R10: ffffed1015317869 R11: 0000000000000000 R12: 0000000000000001
R13: 1ffff11015317868 R14: dffffc0000000000 R15: dffffc0000000000
  arch_cpu_idle+0xa/0x10 /arch/x86/kernel/process.c:571
  default_idle_call+0x59/0xa0 /kernel/sched/idle.c:94
  cpuidle_idle_call /kernel/sched/idle.c:154 [inline]
  do_idle+0x180/0x780 /kernel/sched/idle.c:263
  cpu_startup_entry+0x25/0x30 /kernel/sched/idle.c:354
  start_secondary+0x3f4/0x490 /arch/x86/kernel/smpboot.c:264
  secondary_startup_64+0xa4/0xb0 /arch/x86/kernel/head_64.S:243
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jon Hunter @ 2019-07-18  9:16 UTC (permalink / raw)
  To: Jose Abreu, linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org
  Cc: Joao Pinto, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue, Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai,
	linux-tegra
In-Reply-To: <BN8PR12MB32661E919A8DEBC7095BAA12D3C80@BN8PR12MB3266.namprd12.prod.outlook.com>


On 18/07/2019 08:48, Jose Abreu wrote:
> From: Jon Hunter <jonathanh@nvidia.com>
> Date: Jul/17/2019, 19:58:53 (UTC+00:00)
> 
>> Let me know if you have any thoughts.
> 
> Can you try attached patch ?

Yes this did not help. I tried enabling the following but no more output
is seen.

CONFIG_DMA_API_DEBUG=y
CONFIG_DMA_API_DEBUG_SG=y

Have you tried using NFS on a board with this ethernet controller?

Cheers,
Jon

-- 
nvpublic

^ permalink raw reply

* [PATCH bpf] selftests/bpf: fix "valid read map access into a read-only array 1" on s390
From: Ilya Leoshkevich @ 2019-07-18  9:13 UTC (permalink / raw)
  To: bpf, netdev; +Cc: gor, heiko.carstens, Ilya Leoshkevich

This test looks up a 32-bit map element and then loads it using a 64-bit
load. This does not work on s390, which is a big-endian machine.

Since the point of this test doesn't seem to be loading a smaller value
using a larger load, simply use a 32-bit load.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 tools/testing/selftests/bpf/verifier/array_access.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/verifier/array_access.c b/tools/testing/selftests/bpf/verifier/array_access.c
index bcb83196e459..f3c33e128709 100644
--- a/tools/testing/selftests/bpf/verifier/array_access.c
+++ b/tools/testing/selftests/bpf/verifier/array_access.c
@@ -226,7 +226,7 @@
 	BPF_LD_MAP_FD(BPF_REG_1, 0),
 	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
 	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
-	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+	BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
 	},
 	.fixup_map_array_ro = { 3 },
-- 
2.21.0


^ permalink raw reply related

* Re: [RFC PATCH 4/5] PTP: Add flag for non-periodic output
From: Felipe Balbi @ 2019-07-18  8:59 UTC (permalink / raw)
  To: Richard Cochran
  Cc: netdev, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, linux-kernel, Christopher S . Hall
In-Reply-To: <20190717173645.GD1464@localhost>


Hi,

Richard Cochran <richardcochran@gmail.com> writes:

> On Wed, Jul 17, 2019 at 09:49:13AM +0300, Felipe Balbi wrote:
>> No worries, I'll work on this after vacations (I'll off for 2 weeks
>> starting next week). I thought about adding a new IOCTL until I saw that
>> rsv field. Oh well :-)
>
> It would be great if you could fix up the PTP ioctls as a preface to
> your series.

no problem, anything in particular in mind? Just create new versions of
all the IOCTLs so we can actually use the reserved fields in the future?

cheers

-- 
balbi

^ permalink raw reply

* Re: [RFC PATCH 0/5] PTP: add support for Intel's TGPIO controller
From: Felipe Balbi @ 2019-07-18  8:58 UTC (permalink / raw)
  To: Richard Cochran
  Cc: netdev, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, linux-kernel, Christopher S . Hall
In-Reply-To: <20190717173915.GE1464@localhost>


Hi,

Richard Cochran <richardcochran@gmail.com> writes:

> On Wed, Jul 17, 2019 at 09:52:55AM +0300, Felipe Balbi wrote:
>> 
>> It's just a pin, like a GPIO. So it would be a PCB trace, flat flex,
>> copper wire... Anything, really.
>
> Cool.  Are there any Intel CPUs available that have this feature?

At least canon lake has it, but its BIOS doesn't enable it. This is
something that will be more important on future CPUs.

-- 
balbi

^ permalink raw reply

* Re: IP GRO verifies csum again?
From: Eric Dumazet @ 2019-07-18  8:27 UTC (permalink / raw)
  To: Jacob Wen, netdev; +Cc: herbert
In-Reply-To: <6cc12686-a13d-81a8-ad3c-4601397c900e@oracle.com>



On 7/18/19 9:49 AM, Jacob Wen wrote:
> Hi,
> 
> inet_gro_receive verifies IP csum but a NIC already did so and set CHECKSUM_UNNECESSARY.
> 
> 
> https://github.com/torvalds/linux/blob/v5.2/net/ipv4/af_inet.c#L1432-L1433
> 
> if (unlikely(ip_fast_csum((u8 *)iph, 5)))
> 
>         goto out_unlock;
> 
> 
> Is this a bug?
> 

This checksum validates the TCP one, which is the real cost, since we need to touch all
the packet.

We do not bother 'offloading' IPV4 checksum over 20 bytes or so, since in modern cpus,
having the cache line hot in cpu caches means the checksum is almost free.

Adding a test here would not always be a win, say for CHECKSUM_COMPLETE cases,
which we try to generalize in favor of old CHECKSUM_UNNECESSARY.

^ permalink raw reply

* RE: [EXT] [PATCH] qed: Prefer pcie_capability_read_word()
From: Michal Kalderon @ 2019-07-18  8:22 UTC (permalink / raw)
  To: Frederick Lawler, Ariel Elior, GR-everest-linux-l2
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	bhelgaas@google.com
In-Reply-To: <20190718020745.8867-7-fred@fredlawl.com>

> From: Frederick Lawler <fred@fredlawl.com>
> Sent: Thursday, July 18, 2019 5:08 AM
> 
> External Email
> 
> ----------------------------------------------------------------------
> Commit 8c0d3a02c130 ("PCI: Add accessors for PCI Express Capability") added
> accessors for the PCI Express Capability so that drivers didn't need to be
> aware of differences between v1 and v2 of the PCI Express Capability.
> 
> Replace pci_read_config_word() and pci_write_config_word() calls with
> pcie_capability_read_word() and pcie_capability_write_word().
> 
> Signed-off-by: Frederick Lawler <fred@fredlawl.com>
> ---
>  drivers/net/ethernet/qlogic/qed/qed_rdma.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
> b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
> index 7873d6dfd91f..8d8a920c3195 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
> @@ -530,9 +530,8 @@ static void qed_rdma_init_devinfo(struct qed_hwfn
> *p_hwfn,
>  	SET_FIELD(dev->dev_caps,
> QED_RDMA_DEV_CAP_LOCAL_INV_FENCE, 1);
> 
>  	/* Check atomic operations support in PCI configuration space. */
> -	pci_read_config_dword(cdev->pdev,
> -			      cdev->pdev->pcie_cap + PCI_EXP_DEVCTL2,
> -			      &pci_status_control);
> +	pcie_capability_read_dword(cdev->pdev, PCI_EXP_DEVCTL2,
> +				   &pci_status_control);
> 
>  	if (pci_status_control & PCI_EXP_DEVCTL2_LTR_EN)
>  		SET_FIELD(dev->dev_caps,
> QED_RDMA_DEV_CAP_ATOMIC_OP, 1);
> --
> 2.17.1

Thanks, 

Acked-by: Michal Kalderon <michal.kalderon@marvell.com>



^ permalink raw reply

* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
From: Michael S. Tsirkin @ 2019-07-18  8:13 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <CAGxU2F45v40qAOHkm1Hk2E69gCS0UwVgS5NS+tDXXuzdF4EixA@mail.gmail.com>

On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > If the packets to sent to the guest are bigger than the buffer
> > > available, we can split them, using multiple buffers and fixing
> > > the length in the packet header.
> > > This is safe since virtio-vsock supports only stream sockets.
> > >
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> >
> > So how does it work right now? If an app
> > does sendmsg with a 64K buffer and the other
> > side publishes 4K buffers - does it just stall?
> 
> Before this series, the 64K (or bigger) user messages was split in 4K packets
> (fixed in the code) and queued in an internal list for the TX worker.
> 
> After this series, we will queue up to 64K packets and then it will be split in
> the TX worker, depending on the size of the buffers available in the
> vring. (The idea was to allow EWMA or a configuration of the buffers size, but
> for now we postponed it)

Got it. Using workers for xmit is IMHO a bad idea btw.
Why is it done like this?

> Note: virtio-vsock only supports stream socket for now.
> 
> Thanks,
> Stefano

^ permalink raw reply

* Re: [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed
From: Stefano Garzarella @ 2019-07-18  7:52 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <20190717105703-mutt-send-email-mst@kernel.org>

On Wed, Jul 17, 2019 at 5:00 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jul 17, 2019 at 01:30:30PM +0200, Stefano Garzarella wrote:
> > Since now we are able to split packets, we can avoid limiting
> > their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
> > Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
> > packet size.
> >
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>
>
> OK so this is kind of like GSO where we are passing
> 64K packets to the vsock and then split at the
> low level.

Exactly, something like that in the Host->Guest path, instead in the
Guest->Host we use the entire 64K packet.

Thanks,
Stefano

^ permalink raw reply

* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
From: Stefano Garzarella @ 2019-07-18  7:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <20190717105336-mutt-send-email-mst@kernel.org>

On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > If the packets to sent to the guest are bigger than the buffer
> > available, we can split them, using multiple buffers and fixing
> > the length in the packet header.
> > This is safe since virtio-vsock supports only stream sockets.
> >
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>
> So how does it work right now? If an app
> does sendmsg with a 64K buffer and the other
> side publishes 4K buffers - does it just stall?

Before this series, the 64K (or bigger) user messages was split in 4K packets
(fixed in the code) and queued in an internal list for the TX worker.

After this series, we will queue up to 64K packets and then it will be split in
the TX worker, depending on the size of the buffers available in the
vring. (The idea was to allow EWMA or a configuration of the buffers size, but
for now we postponed it)

Note: virtio-vsock only supports stream socket for now.

Thanks,
Stefano

^ permalink raw reply

* IP GRO verifies csum again?
From: Jacob Wen @ 2019-07-18  7:49 UTC (permalink / raw)
  To: netdev; +Cc: herbert

Hi,

inet_gro_receive verifies IP csum but a NIC already did so and set 
CHECKSUM_UNNECESSARY.


https://github.com/torvalds/linux/blob/v5.2/net/ipv4/af_inet.c#L1432-L1433

if (unlikely(ip_fast_csum((u8 *)iph, 5)))

         goto out_unlock;


Is this a bug?


^ permalink raw reply

* RE: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jose Abreu @ 2019-07-18  7:48 UTC (permalink / raw)
  To: Jon Hunter, Jose Abreu, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org
  Cc: Joao Pinto, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue, Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai,
	linux-tegra
In-Reply-To: <29dcc161-f7c8-026e-c3cc-5adb04df128c@nvidia.com>

[-- Attachment #1: Type: text/plain, Size: 192 bytes --]

From: Jon Hunter <jonathanh@nvidia.com>
Date: Jul/17/2019, 19:58:53 (UTC+00:00)

> Let me know if you have any thoughts.

Can you try attached patch ?

---
Thanks,
Jose Miguel Abreu

[-- Attachment #2: 0001-net-stmmac-RX-Descriptors-need-to-be-clean-before-se.patch --]
[-- Type: application/octet-stream, Size: 2042 bytes --]

From 00bfde6f589e60ba1a2d0671c8ba0fcd0964d6e7 Mon Sep 17 00:00:00 2001
Message-Id: <00bfde6f589e60ba1a2d0671c8ba0fcd0964d6e7.1563435927.git.joabreu@synopsys.com>
From: Jose Abreu <joabreu@synopsys.com>
Date: Thu, 18 Jul 2019 09:42:31 +0200
Subject: [PATCH net] net: stmmac: RX Descriptors need to be clean before
 setting buffers

RX Descriptors are being cleaned after setting the buffers which may
lead to buffer addresses being wiped out.

Fix this by clearing earlier the RX Descriptors.

Reported-by: Jon Hunter <jonathanh@nvidia.com>
Fixes: 2af6106ae949 ("net: stmmac: Introducing support for Page Pool")
Signed-off-by: Jose Abreu <joabreu@synopsys.com>

---
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Jose Abreu <joabreu@synopsys.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: netdev@vger.kernel.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c7c9e5f162e6..5f1294ce0216 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1295,6 +1295,8 @@ static int init_dma_rx_desc_rings(struct net_device *dev, gfp_t flags)
 			  "(%s) dma_rx_phy=0x%08x\n", __func__,
 			  (u32)rx_q->dma_rx_phy);
 
+		stmmac_clear_rx_descriptors(priv, queue);
+
 		for (i = 0; i < DMA_RX_SIZE; i++) {
 			struct dma_desc *p;
 
@@ -1312,8 +1314,6 @@ static int init_dma_rx_desc_rings(struct net_device *dev, gfp_t flags)
 		rx_q->cur_rx = 0;
 		rx_q->dirty_rx = (unsigned int)(i - DMA_RX_SIZE);
 
-		stmmac_clear_rx_descriptors(priv, queue);
-
 		/* Setup the chained descriptor addresses */
 		if (priv->mode == STMMAC_CHAIN_MODE) {
 			if (priv->extend_desc)
-- 
2.7.4


^ permalink raw reply related

* [PATCH 2/3] liquidio: Replace vmalloc + memset with vzalloc
From: Chuhong Yuan @ 2019-07-18  7:45 UTC (permalink / raw)
  Cc: Derek Chickles, Satanand Burla, Felix Manlunas, David S . Miller,
	netdev, linux-kernel, Chuhong Yuan

Use vzalloc and vzalloc_node instead of using vmalloc and
vmalloc_node and then zeroing the allocated memory by
memset 0.
This simplifies the code.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
 drivers/net/ethernet/cavium/liquidio/request_manager.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/request_manager.c b/drivers/net/ethernet/cavium/liquidio/request_manager.c
index fcf20a8f92d9..032224178b64 100644
--- a/drivers/net/ethernet/cavium/liquidio/request_manager.c
+++ b/drivers/net/ethernet/cavium/liquidio/request_manager.c
@@ -218,15 +218,13 @@ int octeon_setup_iq(struct octeon_device *oct,
 		return 0;
 	}
 	oct->instr_queue[iq_no] =
-	    vmalloc_node(sizeof(struct octeon_instr_queue), numa_node);
+	    vzalloc_node(sizeof(struct octeon_instr_queue), numa_node);
 	if (!oct->instr_queue[iq_no])
 		oct->instr_queue[iq_no] =
-		    vmalloc(sizeof(struct octeon_instr_queue));
+		    vzalloc(sizeof(struct octeon_instr_queue));
 	if (!oct->instr_queue[iq_no])
 		return 1;
 
-	memset(oct->instr_queue[iq_no], 0,
-	       sizeof(struct octeon_instr_queue));
 
 	oct->instr_queue[iq_no]->q_index = q_index;
 	oct->instr_queue[iq_no]->app_ctx = app_ctx;
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v4 3/5] vsock/virtio: fix locking in virtio_transport_inc_tx_pkt()
From: Stefano Garzarella @ 2019-07-18  7:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <20190717105056-mutt-send-email-mst@kernel.org>

On Wed, Jul 17, 2019 at 4:51 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jul 17, 2019 at 01:30:28PM +0200, Stefano Garzarella wrote:
> > fwd_cnt and last_fwd_cnt are protected by rx_lock, so we should use
> > the same spinlock also if we are in the TX path.
> >
> > Move also buf_alloc under the same lock.
> >
> > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>
> Wait a second is this a bugfix?
> If it's used under the wrong lock won't values get corrupted?
> Won't traffic then stall or more data get to sent than
> credits?

Before this series, we only read vvs->fwd_cnt and vvs->buf_alloc in this
function, but using a different lock than the one used to write them.
I'm not sure if a corruption can happen, but if we want to avoid the
lock, we should use an atomic operation or memory barriers.

Since now we also need to update vvs->last_fwd_cnt, in order to limit the
credit message, I decided to take the same lock used to protect vvs->fwd_cnt
and vvs->last_fwd_cnt.


Thanks,
Stefano

>
> > ---
> >  include/linux/virtio_vsock.h            | 2 +-
> >  net/vmw_vsock/virtio_transport_common.c | 4 ++--
> >  2 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 49fc9d20bc43..4c7781f4b29b 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -35,7 +35,6 @@ struct virtio_vsock_sock {
> >
> >       /* Protected by tx_lock */
> >       u32 tx_cnt;
> > -     u32 buf_alloc;
> >       u32 peer_fwd_cnt;
> >       u32 peer_buf_alloc;
> >
> > @@ -43,6 +42,7 @@ struct virtio_vsock_sock {
> >       u32 fwd_cnt;
> >       u32 last_fwd_cnt;
> >       u32 rx_bytes;
> > +     u32 buf_alloc;
> >       struct list_head rx_queue;
> >  };
> >
> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > index a85559d4d974..34a2b42313b7 100644
> > --- a/net/vmw_vsock/virtio_transport_common.c
> > +++ b/net/vmw_vsock/virtio_transport_common.c
> > @@ -210,11 +210,11 @@ static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
> >
> >  void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
> >  {
> > -     spin_lock_bh(&vvs->tx_lock);
> > +     spin_lock_bh(&vvs->rx_lock);
> >       vvs->last_fwd_cnt = vvs->fwd_cnt;
> >       pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
> >       pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
> > -     spin_unlock_bh(&vvs->tx_lock);
> > +     spin_unlock_bh(&vvs->rx_lock);
> >  }
> >  EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt);
> >
> > --
> > 2.20.1

^ permalink raw reply

* Re: [PATCH] net: ethernet: fix error return code in ag71xx_probe()
From: Oleksij Rempel @ 2019-07-18  7:41 UTC (permalink / raw)
  To: Wei Yongjun, Jay Cliburn, Chris Snook; +Cc: netdev, kernel-janitors
In-Reply-To: <20190717115215.22965-1-weiyongjun1@huawei.com>



On 17.07.19 13:52, Wei Yongjun wrote:
> Fix to return error code -ENOMEM from the dmam_alloc_coherent() error
> handling case instead of 0, as done elsewhere in this function.
> 
> Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>

Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>

> ---
>   drivers/net/ethernet/atheros/ag71xx.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/atheros/ag71xx.c b/drivers/net/ethernet/atheros/ag71xx.c
> index 72a57c6cd254..446d62e93439 100644
> --- a/drivers/net/ethernet/atheros/ag71xx.c
> +++ b/drivers/net/ethernet/atheros/ag71xx.c
> @@ -1724,8 +1724,10 @@ static int ag71xx_probe(struct platform_device *pdev)
>   	ag->stop_desc = dmam_alloc_coherent(&pdev->dev,
>   					    sizeof(struct ag71xx_desc),
>   					    &ag->stop_desc_dma, GFP_KERNEL);
> -	if (!ag->stop_desc)
> +	if (!ag->stop_desc) {
> +		err = -ENOMEM;
>   		goto err_free;
> +	}
>   
>   	ag->stop_desc->data = 0;
>   	ag->stop_desc->ctrl = 0;
> 
> 
> 
> 

Kind regards,
Oleksij Rempel

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply

* Re: [PATCH net-next 00/12] mlx5 TLS TX HW offload support
From: Tariq Toukan @ 2019-07-18  7:40 UTC (permalink / raw)
  To: Jakub Kicinski, Tariq Toukan
  Cc: David Miller, netdev@vger.kernel.org, Eran Ben Elisha,
	Saeed Mahameed, Moshe Shemesh
In-Reply-To: <20190717104141.37333cc9@cakuba.netronome.com>



On 7/17/2019 8:41 PM, Jakub Kicinski wrote:
> On Sun, 7 Jul 2019 06:44:27 +0000, Tariq Toukan wrote:
>> On 7/6/2019 2:29 AM, David Miller wrote:
>>> From: Tariq Toukan <tariqt@mellanox.com>
>>> Date: Fri,  5 Jul 2019 18:30:10 +0300
>>>    
>>>> This series from Eran and me, adds TLS TX HW offload support to
>>>> the mlx5 driver.
>>>
>>> Series applied, please deal with any further feedback you get from
>>> Jakub et al.
>>
>> I will followup with patches addressing Jakub's feedback.
> 
> Ping.
> 

Hi Jakub,

I'm waiting for the window to open:
http://vger.kernel.org/~davem/net-next.html

Do you think these can already go to net as fixes?

Regards,
Tariq

^ permalink raw reply

* Re: [PATCH] net: ag71xx: fix return value check in ag71xx_probe()
From: Oleksij Rempel @ 2019-07-18  7:39 UTC (permalink / raw)
  To: Wei Yongjun, Jay Cliburn, Chris Snook; +Cc: netdev, kernel-janitors
In-Reply-To: <20190717115225.23047-1-weiyongjun1@huawei.com>



On 17.07.19 13:52, Wei Yongjun wrote:
> In case of error, the function of_get_mac_address() returns ERR_PTR()
> and never returns NULL. The NULL test in the return value check should
> be replaced with IS_ERR().
> 
> Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>

Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>

> ---
>   drivers/net/ethernet/atheros/ag71xx.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/atheros/ag71xx.c b/drivers/net/ethernet/atheros/ag71xx.c
> index 72a57c6cd254..3088a43e6436 100644
> --- a/drivers/net/ethernet/atheros/ag71xx.c
> +++ b/drivers/net/ethernet/atheros/ag71xx.c
> @@ -1732,9 +1732,9 @@ static int ag71xx_probe(struct platform_device *pdev)
>   	ag->stop_desc->next = (u32)ag->stop_desc_dma;
>   
>   	mac_addr = of_get_mac_address(np);
> -	if (mac_addr)
> +	if (!IS_ERR(mac_addr))
>   		memcpy(ndev->dev_addr, mac_addr, ETH_ALEN);
> -	if (!mac_addr || !is_valid_ether_addr(ndev->dev_addr)) {
> +	if (IS_ERR(mac_addr) || !is_valid_ether_addr(ndev->dev_addr)) {
>   		netif_err(ag, probe, ndev, "invalid MAC address, using random address\n");
>   		eth_random_addr(ndev->dev_addr);
>   	}
> 
> 
> 
> 

Kind regards,
Oleksij Rempel

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply

* Re: [PATCH ipsec v2 0/4] xfrm interface: bugs fixes
From: Steffen Klassert @ 2019-07-18  7:37 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: davem, netdev
In-Reply-To: <20190715100023.8475-1-nicolas.dichtel@6wind.com>

On Mon, Jul 15, 2019 at 12:00:19PM +0200, Nicolas Dichtel wrote:
> 
> Here is a bunch of bugs fixes. Some have been seen by code review and some when
> playing with x-netns.
> The details are in each patch.
> 
> v1 -> v2:
>  - add patch #3 and #4

Series applied, thanks Nicolas!


^ permalink raw reply

* Re: [PATCH AUTOSEL 5.2 226/249] selftests: bpf: fix inlines in test_lwt_seg6local
From: Jiri Benc @ 2019-07-18  7:36 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-kernel, stable, Yonghong Song, Daniel Borkmann,
	linux-kselftest, netdev, bpf, clang-built-linux
In-Reply-To: <20190717234757.GD3079@sasha-vm>

On Wed, 17 Jul 2019 19:47:57 -0400, Sasha Levin wrote:
> It fixes a bug, right?

A bug in selftests. And quite likely, it probably happens only with
some compiler versions.

I don't think patches only touching tools/testing/selftests/ qualify
for stable in general. They don't affect the end users.

 Jiri

^ permalink raw reply

* RE: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jose Abreu @ 2019-07-18  7:29 UTC (permalink / raw)
  To: Jon Hunter, Jose Abreu, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org
  Cc: Joao Pinto, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue, Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai,
	linux-tegra
In-Reply-To: <29dcc161-f7c8-026e-c3cc-5adb04df128c@nvidia.com>

From: Jon Hunter <jonathanh@nvidia.com>
Date: Jul/17/2019, 19:58:53 (UTC+00:00)

> I am seeing a boot regression on one of our Tegra boards with both
> mainline and -next. Bisecting is pointing to this commit and reverting
> this commit on top of mainline fixes the problem. Unfortunately, there
> is not much of a backtrace but what I have captured is below. 
> 
> Please note that this is seen on a system that is using NFS to mount
> the rootfs and the crash occurs right around the point the rootfs is
> mounted.
> 
> Let me know if you have any thoughts.
> 
> Cheers
> Jon 
> 
> [   12.221843] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> [   12.229485] CPU: 5 PID: 1 Comm: init Tainted: G S                5.2.0-11500-g916f562fb28a #18
> [   12.238076] Hardware name: NVIDIA Tegra186 P2771-0000 Development Board (DT)
> [   12.245105] Call trace:
> [   12.247548]  dump_backtrace+0x0/0x150
> [   12.251199]  show_stack+0x14/0x20
> [   12.254505]  dump_stack+0x9c/0xc4
> [   12.257809]  panic+0x13c/0x32c
> [   12.260853]  complete_and_exit+0x0/0x20
> [   12.264676]  do_group_exit+0x34/0x98
> [   12.268241]  get_signal+0x104/0x668
> [   12.271718]  do_notify_resume+0x2ac/0x380
> [   12.275716]  work_pending+0x8/0x10
> [   12.279109] SMP: stopping secondary CPUs
> [   12.283025] Kernel Offset: disabled
> [   12.286502] CPU features: 0x0002,20806000
> [   12.290499] Memory Limit: none
> [   12.293548] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
> 
> -- 
> nvpublic

You don't have any more data ? Can you activate DMA-API debug and check 
if there is any more info outputted ?

---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* Re: regression with napi/softirq ?
From: Eric Dumazet @ 2019-07-18  6:58 UTC (permalink / raw)
  To: Thomas Gleixner, Sudip Mukherjee
  Cc: Peter Zijlstra (Intel), David S. Miller, netdev, linux-kernel
In-Reply-To: <alpine.DEB.2.21.1907172345360.1778@nanos.tec.linutronix.de>



On 7/17/19 11:52 PM, Thomas Gleixner wrote:
> Sudip,
> 
> On Wed, 17 Jul 2019, Sudip Mukherjee wrote:
>> On Wed, Jul 17, 2019 at 9:53 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>>> You can hack ksoftirq_running() to return always false to avoid this, but
>>> that might cause application starvation and a huge packet buffer backlog
>>> when the amount of incoming packets makes the CPU do nothing else than
>>> softirq processing.
>>
>> I tried that now, it is better but still not as good as v3.8
>> Now I am getting 375.9usec as the maximum time between raising the softirq
>> and it starting to execute and packet drops still there.
>>
>> And just a thought, do you think there should be a CONFIG_ option for
>> this feature of ksoftirqd_running() so that it can be disabled if needed
>> by users like us?
> 
> If at all then a sysctl to allow runtime control.
>  
>> Can you please think of anything else that might have changed which I still need
>> to change to make the time comparable to v3.8..
> 
> Something with in that small range of:
> 
>  63592 files changed, 13783320 insertions(+), 5155492 deletions(-)
> 
> :)
> 
> Seriously, that can be anything.
> 
> Can you please test with Linus' head of tree and add some more
> instrumentation, so we can see what holds off softirqs from being
> processed. If the ksoftirqd enforcement is disabled, then the only reason
> can be a long lasting softirq disabled region. Tracing should tell.

ksoftirqd might be spuriously scheduled from tx path, when
__qdisc_run() also reacts to need_resched().

By raising NET_TX while we are processing NET_RX (say we send a TCP ACK packet
in response to incoming packet), we force __do_softirq() to perform
another loop, but before doing an other round, it will also check need_resched()
and eventually call wakeup_softirqd()

I wonder if following patch makes any difference.

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 11c03cf4aa74b44663c74e0e3284140b0c75d9c4..ab736e974396394ae6ba409868aaea56a50ad57b 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -377,6 +377,8 @@ void __qdisc_run(struct Qdisc *q)
        int packets;
 
        while (qdisc_restart(q, &packets)) {
+               if (qdisc_is_empty(q))
+                       break;
                /*
                 * Ordered by possible occurrence: Postpone processing if
                 * 1. we've exceeded packet quota


^ permalink raw reply related

* Re: [PATCH] net: dsa: sja1105: Release lock in error case
From: Yuehaibing @ 2019-07-18  6:53 UTC (permalink / raw)
  To: Hariprasad Kelam, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	David S. Miller, netdev, linux-kernel
In-Reply-To: <20190718022110.GA19222@hari-Inspiron-1545>

https://patchwork.ozlabs.org/patch/1133135/

This has been fixed.

On 2019/7/18 10:21, Hariprasad Kelam wrote:
> This patch adds release of unlock in fail case.
> 
> Issue identified by coccicheck
> 
> Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
> ---
>  net/dsa/tag_sja1105.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c
> index 1d96c9d..26363d7 100644
> --- a/net/dsa/tag_sja1105.c
> +++ b/net/dsa/tag_sja1105.c
> @@ -216,6 +216,7 @@ static struct sk_buff
>  		if (!skb) {
>  			dev_err_ratelimited(dp->ds->dev,
>  					    "Failed to copy stampable skb\n");
> +			spin_unlock(&sp->data->meta_lock);
>  			return NULL;
>  		}
>  		sja1105_transfer_meta(skb, meta);
> 


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox