Netdev List


* Re: [PATCH net-next 11/11] vhost_net: batch submitting XDP buffers to underlayer sockets
From: Michael S. Tsirkin @ 2018-09-07 16:13 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <ffb802e6-1505-e01e-f6d5-11cde8dace9b@redhat.com>

On Fri, Sep 07, 2018 at 03:41:52PM +0800, Jason Wang wrote:
> > > @@ -556,10 +667,14 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> > >   	size_t len, total_len = 0;
> > >   	int err;
> > >   	int sent_pkts = 0;
> > > +	bool bulking = (sock->sk->sk_sndbuf == INT_MAX);
> > What does bulking mean?
> 
> The name is misleading; it indicates whether we can do batching. For simplicity,
> I disable batching if sndbuf is not INT_MAX.

But what does batching have to do with sndbuf?

> > >   	for (;;) {
> > >   		bool busyloop_intr = false;
> > > +		if (nvq->done_idx == VHOST_NET_BATCH)
> > > +			vhost_tx_batch(net, nvq, sock, &msg);
> > > +
> > >   		head = get_tx_bufs(net, nvq, &msg, &out, &in, &len,
> > >   				   &busyloop_intr);
> > >   		/* On error, stop handling until the next kick. */
> > > @@ -577,14 +692,34 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> > >   			break;
> > >   		}
> > > -		vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
> > > -		vq->heads[nvq->done_idx].len = 0;
> > > -
> > >   		total_len += len;
> > > -		if (tx_can_batch(vq, total_len))
> > > -			msg.msg_flags |= MSG_MORE;
> > > -		else
> > > -			msg.msg_flags &= ~MSG_MORE;
> > > +
> > > +		/* For simplicity, TX batching is only enabled if
> > > +		 * sndbuf is unlimited.
> > What if sndbuf changes while this processing is going on?
> 
> We will get the correct sndbuf in the next run of handle_tx(). I think this
> is safe.

If it's safe why bother with special-casing INT_MAX?

-- 
MST
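
For readers following the hunk under discussion, here is a minimal userspace
sketch of the control flow: buffers are queued, a full batch is flushed before
the next buffer is taken, and batching is only attempted when sndbuf is
INT_MAX. The names tx_queue, tx_flush, tx_submit and BATCH_SIZE are
hypothetical stand-ins and not the vhost_net code; the sketch assumes sndbuf is
sampled on every call, which may not match how the patch reads it.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_SIZE 4 /* hypothetical stand-in for VHOST_NET_BATCH */

struct tx_queue {
	int pending[BATCH_SIZE]; /* buffers waiting to be submitted */
	size_t done_idx;         /* number of pending buffers */
	size_t flushes;          /* how many submissions happened */
	size_t sent;             /* total buffers submitted */
};

/* Submit everything queued so far in one go. */
static void tx_flush(struct tx_queue *q)
{
	if (q->done_idx == 0)
		return;
	q->sent += q->done_idx;
	q->flushes++;
	q->done_idx = 0;
}

/* Queue one buffer; batch only when sndbuf is unlimited (INT_MAX). */
static void tx_submit(struct tx_queue *q, int buf, int sndbuf)
{
	bool can_batch = (sndbuf == INT_MAX);

	/* Flush a full batch before taking the next buffer. */
	if (q->done_idx == BATCH_SIZE)
		tx_flush(q);

	q->pending[q->done_idx++] = buf;

	/* Without batching, every buffer is submitted immediately. */
	if (!can_batch)
		tx_flush(q);
}
```

In this sketch a caller would invoke tx_submit() once per buffer and call
tx_flush() once more after the loop to push out a partial batch.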


* [PATCH] ethernet:netronome:nfp:move spin_lock_bh to spin_lock in tasklet
From: jun qian @ 2018-09-07 16:21 UTC (permalink / raw)
  To: Jakub Kicinski, Dirk van der Merwe, Daniel Borkmann,
	Quentin Monnet
  Cc: oss-drivers, netdev, linux-kernel, jun qian

As you are already in a tasklet, it is unnecessary to call spin_lock_bh.

Signed-off-by: jun qian <hangdianqj@163.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index a8b9fbab5f73..084c983ec3c2 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -2075,10 +2075,10 @@ static void nfp_ctrl_poll(unsigned long arg)
 {
 	struct nfp_net_r_vector *r_vec = (void *)arg;
 
-	spin_lock_bh(&r_vec->lock);
+	spin_lock(&r_vec->lock);
 	nfp_net_tx_complete(r_vec->tx_ring, 0);
 	__nfp_ctrl_tx_queued(r_vec);
-	spin_unlock_bh(&r_vec->lock);
+	spin_unlock(&r_vec->lock);
 
 	nfp_ctrl_rx(r_vec);
 
-- 
2.17.1


* Re: [PATCH v2 net-next 3/4] net: make listified RX functions return number of good packets
From: Eric Dumazet @ 2018-09-07 11:43 UTC (permalink / raw)
  To: Edward Cree, Eric Dumazet, davem; +Cc: linux-net-drivers, netdev
In-Reply-To: <e184f2c1-fe28-ad6e-460d-950d8a363852@solarflare.com>



On 09/07/2018 03:44 AM, Edward Cree wrote:

> 
> Any suggestions on how to construct a test that will?
> 

Say 50 concurrent netperf -t TCP_RR -- -r 8000,8000

This way you have a mix of GRO-candidates, and non GRO ones (pure acks)

GRO sizes would be reasonable (not full size GRO packets).


* Network device suspend/resume
From: Lakshmi @ 2018-09-07 11:49 UTC (permalink / raw)
  To: Oliver Neukum, netdev

Hi,

I am bringing a kernel bugzilla bug here:
https://bugzilla.kernel.org/show_bug.cgi?id=196399

This issue occurred 2 months ago and we have not seen it again since. I am
wondering whether it will appear again. Can you confirm whether there is a bug
in network suspend/resume in this case?

One more instance is here:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4376/fi-skl-6600u/igt@gem_exec_suspend@basic-s4-devices.html
================================================================================
Network device is
0b95:7720 ASIX Electronics Corp. AX88772
USB network card
driver seems to be usbnet
=================================================================================
Bug 196399:

We have found out that since at least 4.11-rc1, some machines in the 
Intel GFX CI lab have been generating the following warning when 
suspending to s4 (suspend to disk):

[  287.212825] ------------[ cut here ]------------
[  287.212829] WARNING: CPU: 0 PID: 3165 at net/sched/sch_generic.c:316 
dev_watchdog+0x218/0x220
[  287.212830] Modules linked in: mcs7830 usbnet mii snd_hda_codec_hdmi 
snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal 
intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel 
snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core snd_pcm 
i2c_designware_platform i2c_designware_core mei_me mei prime_numbers 
i2c_hid pinctrl_sunrisepoint pinctrl_intel
[  287.212864] CPU: 0 PID: 3165 Comm: gem_exec_suspen Tainted: G     U 
        4.12.0-CI-CI_DRM_2829+ #1
[  287.212865] Hardware name: Dell Inc. XPS 13 9360/093TW6, BIOS 1.3.2 
01/18/2017
[  287.212867] task: ffff8801b4084f40 task.stack: ffffc900001d8000
[  287.212869] RIP: 0010:dev_watchdog+0x218/0x220
[  287.212870] RSP: 0018:ffff88027e403e38 EFLAGS: 00010292
[  287.212872] RAX: 000000000000005a RBX: 0000000000000000 RCX: 
0000000000000000
[  287.212874] RDX: 0000000000000002 RSI: ffffffff81cbcf89 RDI: 
ffffffff81c9c627
[  287.212875] RBP: ffff88027e403e68 R08: 0000000000000000 R09: 
0000000000000001
[  287.212876] R10: 0000000028e9c215 R11: 0000000000000000 R12: 
ffff88026e08a848
[  287.212877] R13: 0000000000000000 R14: ffff88026e050020 R15: 
0000000000000001
[  287.212878] FS:  00007f345056a8c0(0000) GS:ffff88027e400000(0000) 
knlGS:0000000000000000
[  287.212880] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  287.212881] CR2: 00000000008d7008 CR3: 00000001b4314000 CR4: 
00000000003406f0
[  287.212882] Call Trace:
[  287.212883]  <IRQ>
[  287.212886]  ? qdisc_rcu_free+0x40/0x40
[  287.212888]  ? qdisc_rcu_free+0x40/0x40
[  287.212891]  call_timer_fn+0x8e/0x370
[  287.212894]  ? qdisc_rcu_free+0x40/0x40
[  287.212896]  expire_timers+0x150/0x1f0
[  287.212899]  run_timer_softirq+0x7c/0x160
[  287.212903]  __do_softirq+0x116/0x4a0
[  287.212906]  irq_exit+0xa9/0xc0
[  287.212909]  smp_apic_timer_interrupt+0x38/0x50
[  287.212912]  apic_timer_interrupt+0x90/0xa0
[  287.212914] RIP: 0010:delay_tsc+0x33/0xc0
[  287.212916] RSP: 0018:ffffc900001dbcd8 EFLAGS: 00000286 ORIG_RAX: 
ffffffffffffff10
[  287.212918] RAX: 0000000080000000 RBX: 00000005964f23a0 RCX: 
0000000000000001
[  287.212919] RDX: 0000000080000001 RSI: ffffffff81c8e23a RDI: 
00000000ffffffff
[  287.212920] RBP: ffffc900001dbcf8 R08: 0000000000000000 R09: 
0000000000000001
[  287.212921] R10: 0000000000000000 R11: 0000000000000000 R12: 
000000059633478e
[  287.212922] R13: 0000000000249f13 R14: 0000000000000000 R15: 
ffff880272eac008
[  287.212924]  </IRQ>
[  287.212929]  ? delay_tsc+0x6b/0xc0
[  287.212932]  __delay+0xa/0x10
[  287.212934]  __const_udelay+0x31/0x40
[  287.212936]  hibernation_debug_sleep+0x20/0x30
[  287.212938]  hibernation_snapshot+0x2bc/0x5f0
[  287.212940]  hibernate+0x159/0x2f0
[  287.212943]  state_store+0xe0/0xf0
[  287.212947]  kobj_attr_store+0xf/0x20
[  287.212949]  sysfs_kf_write+0x40/0x50
[  287.212951]  kernfs_fop_write+0x130/0x1b0
[  287.212955]  __vfs_write+0x23/0x120
[  287.212957]  ? rcu_read_lock_sched_held+0x75/0x80
[  287.212959]  ? rcu_sync_lockdep_assert+0x2a/0x50
[  287.212961]  ? __sb_start_write+0xfa/0x1f0
[  287.212964]  vfs_write+0xc5/0x1d0
[  287.212966]  ? trace_hardirqs_on_caller+0xe7/0x1c0
[  287.212969]  SyS_write+0x44/0xb0
[  287.212972]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  287.212973] RIP: 0033:0x7f344ed4a4a0
[  287.212974] RSP: 002b:00007ffef50dfaa8 EFLAGS: 00000246 ORIG_RAX: 
0000000000000001
[  287.212977] RAX: ffffffffffffffda RBX: ffffffff81470683 RCX: 
00007f344ed4a4a0
[  287.212978] RDX: 0000000000000004 RSI: 000000000041d211 RDI: 
0000000000000006
[  287.212979] RBP: ffffc900001dbf88 R08: 00000000008d6a50 R09: 
0000000000000000
[  287.212980] R10: 0000000000000000 R11: 0000000000000246 R12: 
000000000041d211
[  287.212981] R13: 0000000000000006 R14: 0000000000000000 R15: 
0000000000000000
[  287.212984]  ? __this_cpu_preempt_check+0x13/0x20
[  287.212988] Code: 63 8e 18 04 00 00 eb 93 4c 89 f7 c6 05 77 5c 77 00 
01 e8 dc 7f fd ff 89 d9 48 89 c2 4c 89 f6 48 c7 c7 18 f4 cf 81 e8 f1 c4 
9d ff <0f> ff eb c3 0f 1f 40 00 48 c7 47 08 00 00 00 00 55 48 c7 07 00
[  287.213051] ---[ end trace b6016dcc7544a681 ]---

This is caught while running the intel-gpu-tools test named 
'igt@gem_exec_suspend@basic-s4-devices' on the following machines:

  - Intel Kaby Lake-R RVP: Failure rate 123/135 run(s) (91%), last occurrence:
    https://intel-gfx-ci.01.org/CI/CI_DRM_2828/fi-kbl-r/igt@gem_exec_suspend@basic-s4-devices.html
  - Intel Kaby Lake i7-7560u: Failure rate 196/305 run(s) (64%), last occurrence:
    https://intel-gfx-ci.01.org/CI/CI_DRM_2827/fi-kbl-7560u/igt@gem_exec_suspend@basic-s4-devices.html
  - Intel Skylake i7-6600u: Failure rate 23/75 run(s) (30%), last occurrence:
    https://intel-gfx-ci.01.org/CI/CI_DRM_2824/fi-skl-6600u/igt@gem_exec_suspend@basic-s4-devices.html
  - Intel Sandy Bridge i7-2600: Failure rate 10/293 run(s) (3%), last occurrence:
    https://intel-gfx-ci.01.org/CI/CI_DRM_2816/fi-snb-2600/igt@gem_exec_suspend@basic-s4-devices.html

We have plenty of other machines that do not trigger this warning at all.

The bug used to live in fd.o's bugzilla, but it had no business being 
there: https://bugs.freedesktop.org/show_bug.cgi?id=100125

Let me know if I can help in some ways.
---------------------------------------------------------------------
Intel Finland Oy
Registered Address: PL 281, 00181 Helsinki 
Business Identity Code: 0357606 - 4 
Domiciled in Helsinki 

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


* Re: [PATCH] ethernet:netronome:nfp:move spin_lock_bh to spin_lock in tasklet
From: Jakub Kicinski @ 2018-09-07 16:33 UTC (permalink / raw)
  To: jun qian
  Cc: Dirk van der Merwe, Daniel Borkmann, Quentin Monnet, oss-drivers,
	netdev, linux-kernel
In-Reply-To: <20180907162117.47361-1-hangdianqj@163.com>

On Fri,  7 Sep 2018 09:21:17 -0700, jun qian wrote:
> As you are already in a tasklet, it is unnecessary to call spin_lock_bh.
> 
> Signed-off-by: jun qian <hangdianqj@163.com>

We had more of those at some point, I wonder if you(r tool) can find
them :)

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>


* Re: [PATCH] ethernet:netronome:nfp:move spin_lock_bh to spin_lock in tasklet
From: Jakub Kicinski @ 2018-09-07 16:36 UTC (permalink / raw)
  To: jun qian
  Cc: Dirk van der Merwe, Daniel Borkmann, Quentin Monnet, oss-drivers,
	netdev, linux-kernel
In-Reply-To: <20180907162117.47361-1-hangdianqj@163.com>

On Fri,  7 Sep 2018 09:21:17 -0700, jun qian wrote:
> As you are already in a tasklet, it is unnecessary to call spin_lock_bh.
> 
> Signed-off-by: jun qian <hangdianqj@163.com>

FWIW you should put spaces after the colons.  It's generally a good
practice to look at the prefix previous authors used for a given piece
of code with 

git log -- $file_path

This would be a better subject:

nfp: replace spin_lock_bh with spin_lock in tasklet callback


* network device suspend/resume
From: Lakshmi @ 2018-09-07 11:57 UTC (permalink / raw)
  To: Oliver Neukum, netdev

Hi,

I am bringing kernel bugzilla bug 196399 here:
https://bugzilla.kernel.org/show_bug.cgi?id=196399
This issue last occurred two months ago and I wonder whether it will appear
again. Can you confirm whether there is any issue related to network device
suspend/resume?

The last instance is here:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4376/fi-skl-6600u/igt@gem_exec_suspend@basic-s4-devices.html

==================================================================
0b95:7720 ASIX Electronics Corp. AX88772
USB network card
driver seems to be usbnet
==================================================================
Bug Report: 196399
We have found out that since at least 4.11-rc1, some machines in the 
Intel GFX CI lab have been generating the following warning when 
suspending to s4 (suspend to disk):

[  287.212825] ------------[ cut here ]------------
[  287.212829] WARNING: CPU: 0 PID: 3165 at net/sched/sch_generic.c:316 
dev_watchdog+0x218/0x220
[  287.212830] Modules linked in: mcs7830 usbnet mii snd_hda_codec_hdmi 
snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal 
intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel 
snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core snd_pcm 
i2c_designware_platform i2c_designware_core mei_me mei prime_numbers 
i2c_hid pinctrl_sunrisepoint pinctrl_intel
[  287.212864] CPU: 0 PID: 3165 Comm: gem_exec_suspen Tainted: G     U 
        4.12.0-CI-CI_DRM_2829+ #1
[  287.212865] Hardware name: Dell Inc. XPS 13 9360/093TW6, BIOS 1.3.2 
01/18/2017
[  287.212867] task: ffff8801b4084f40 task.stack: ffffc900001d8000
[  287.212869] RIP: 0010:dev_watchdog+0x218/0x220
[  287.212870] RSP: 0018:ffff88027e403e38 EFLAGS: 00010292
[  287.212872] RAX: 000000000000005a RBX: 0000000000000000 RCX: 
0000000000000000
[  287.212874] RDX: 0000000000000002 RSI: ffffffff81cbcf89 RDI: 
ffffffff81c9c627
[  287.212875] RBP: ffff88027e403e68 R08: 0000000000000000 R09: 
0000000000000001
[  287.212876] R10: 0000000028e9c215 R11: 0000000000000000 R12: 
ffff88026e08a848
[  287.212877] R13: 0000000000000000 R14: ffff88026e050020 R15: 
0000000000000001
[  287.212878] FS:  00007f345056a8c0(0000) GS:ffff88027e400000(0000) 
knlGS:0000000000000000
[  287.212880] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  287.212881] CR2: 00000000008d7008 CR3: 00000001b4314000 CR4: 
00000000003406f0
[  287.212882] Call Trace:
[  287.212883]  <IRQ>
[  287.212886]  ? qdisc_rcu_free+0x40/0x40
[  287.212888]  ? qdisc_rcu_free+0x40/0x40
[  287.212891]  call_timer_fn+0x8e/0x370
[  287.212894]  ? qdisc_rcu_free+0x40/0x40
[  287.212896]  expire_timers+0x150/0x1f0
[  287.212899]  run_timer_softirq+0x7c/0x160
[  287.212903]  __do_softirq+0x116/0x4a0
[  287.212906]  irq_exit+0xa9/0xc0
[  287.212909]  smp_apic_timer_interrupt+0x38/0x50
[  287.212912]  apic_timer_interrupt+0x90/0xa0
[  287.212914] RIP: 0010:delay_tsc+0x33/0xc0
[  287.212916] RSP: 0018:ffffc900001dbcd8 EFLAGS: 00000286 ORIG_RAX: 
ffffffffffffff10
[  287.212918] RAX: 0000000080000000 RBX: 00000005964f23a0 RCX: 
0000000000000001
[  287.212919] RDX: 0000000080000001 RSI: ffffffff81c8e23a RDI: 
00000000ffffffff
[  287.212920] RBP: ffffc900001dbcf8 R08: 0000000000000000 R09: 
0000000000000001
[  287.212921] R10: 0000000000000000 R11: 0000000000000000 R12: 
000000059633478e
[  287.212922] R13: 0000000000249f13 R14: 0000000000000000 R15: 
ffff880272eac008
[  287.212924]  </IRQ>
[  287.212929]  ? delay_tsc+0x6b/0xc0
[  287.212932]  __delay+0xa/0x10
[  287.212934]  __const_udelay+0x31/0x40
[  287.212936]  hibernation_debug_sleep+0x20/0x30
[  287.212938]  hibernation_snapshot+0x2bc/0x5f0
[  287.212940]  hibernate+0x159/0x2f0
[  287.212943]  state_store+0xe0/0xf0
[  287.212947]  kobj_attr_store+0xf/0x20
[  287.212949]  sysfs_kf_write+0x40/0x50
[  287.212951]  kernfs_fop_write+0x130/0x1b0
[  287.212955]  __vfs_write+0x23/0x120
[  287.212957]  ? rcu_read_lock_sched_held+0x75/0x80
[  287.212959]  ? rcu_sync_lockdep_assert+0x2a/0x50
[  287.212961]  ? __sb_start_write+0xfa/0x1f0
[  287.212964]  vfs_write+0xc5/0x1d0
[  287.212966]  ? trace_hardirqs_on_caller+0xe7/0x1c0
[  287.212969]  SyS_write+0x44/0xb0
[  287.212972]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  287.212973] RIP: 0033:0x7f344ed4a4a0
[  287.212974] RSP: 002b:00007ffef50dfaa8 EFLAGS: 00000246 ORIG_RAX: 
0000000000000001
[  287.212977] RAX: ffffffffffffffda RBX: ffffffff81470683 RCX: 
00007f344ed4a4a0
[  287.212978] RDX: 0000000000000004 RSI: 000000000041d211 RDI: 
0000000000000006
[  287.212979] RBP: ffffc900001dbf88 R08: 00000000008d6a50 R09: 
0000000000000000
[  287.212980] R10: 0000000000000000 R11: 0000000000000246 R12: 
000000000041d211
[  287.212981] R13: 0000000000000006 R14: 0000000000000000 R15: 
0000000000000000
[  287.212984]  ? __this_cpu_preempt_check+0x13/0x20
[  287.212988] Code: 63 8e 18 04 00 00 eb 93 4c 89 f7 c6 05 77 5c 77 00 
01 e8 dc 7f fd ff 89 d9 48 89 c2 4c 89 f6 48 c7 c7 18 f4 cf 81 e8 f1 c4 
9d ff <0f> ff eb c3 0f 1f 40 00 48 c7 47 08 00 00 00 00 55 48 c7 07 00
[  287.213051] ---[ end trace b6016dcc7544a681 ]---

This is caught while running the intel-gpu-tools test named 
'igt@gem_exec_suspend@basic-s4-devices' on the following machines:

  - Intel Kaby Lake-R RVP: Failure rate 123/135 run(s) (91%), last occurrence:
    https://intel-gfx-ci.01.org/CI/CI_DRM_2828/fi-kbl-r/igt@gem_exec_suspend@basic-s4-devices.html
  - Intel Kaby Lake i7-7560u: Failure rate 196/305 run(s) (64%), last occurrence:
    https://intel-gfx-ci.01.org/CI/CI_DRM_2827/fi-kbl-7560u/igt@gem_exec_suspend@basic-s4-devices.html
  - Intel Skylake i7-6600u: Failure rate 23/75 run(s) (30%), last occurrence:
    https://intel-gfx-ci.01.org/CI/CI_DRM_2824/fi-skl-6600u/igt@gem_exec_suspend@basic-s4-devices.html
  - Intel Sandy Bridge i7-2600: Failure rate 10/293 run(s) (3%), last occurrence:
    https://intel-gfx-ci.01.org/CI/CI_DRM_2816/fi-snb-2600/igt@gem_exec_suspend@basic-s4-devices.html

We have plenty of other machines that do not trigger this warning at all.

The bug used to live in fd.o's bugzilla, but it had no business being 
there: https://bugs.freedesktop.org/show_bug.cgi?id=100125

Let me know if I can help in some ways.
=====================================================================

Lakshmi.


* Re: [PATCH 1/7] fix hnode refcounting
From: Jamal Hadi Salim @ 2018-09-07 12:13 UTC (permalink / raw)
  To: Al Viro; +Cc: netdev, Cong Wang, Jiri Pirko, stable
In-Reply-To: <20180907023529.GV19965@ZenIV.linux.org.uk>

On 2018-09-06 10:35 p.m., Al Viro wrote:
> On Thu, Sep 06, 2018 at 06:21:09AM -0400, Jamal Hadi Salim wrote:
[..]
> 
> Argh...  Unfortunately, there's this: in u32_delete() we have
>          if (root_ht) {
>                  if (root_ht->refcnt > 1) {
>                          *last = false;
>                          goto ret;
>                  }
>                  if (root_ht->refcnt == 1) {
>                          if (!ht_empty(root_ht)) {
>                                  *last = false;
>                                  goto ret;
>                          }
>                  }
>          }
> and that would need to be updated.  

It is not detrimental as you have it right now, but
you are right that an adjustment is needed...

Deleting of a root directly should not be allowed. But
you can flush a whole tp. Consider this:
--
sudo tc qdisc add dev $P ingress
sudo tc filter add dev $P parent ffff: protocol ip prio 10 \
u32 match ip protocol 1 0xff

Which creates root ht 800

You shouldn't be allowed to do this:
--
tc filter delete dev $P parent ffff: protocol ip prio 10 handle 800: u32
---

But you can delete the tp entirely as such:
---
tc filter delete dev $P parent ffff: protocol ip prio 10 u32
--

The latter will go via the destroy() path and flush all filters.

You should also be able to delete individual filters. ex:
$tc filter del dev $P parent ffff: prio 10 handle 800:0:800 u32

The code you are referring to matters when
the last filter is deleted - the caller needs to know,
so that it destroys root.

i.e. you should return last=true when the last filter is
deleted so root gets auto-deleted (just like it was auto-created)

> However, that logics is bloody odd
> to start with.  First of all, root_ht has come from
>         struct tc_u_hnode *root_ht = rtnl_dereference(tp->root);
> and the only place where it's ever modified is
>          rcu_assign_pointer(tp->root, root_ht);
> in u32_init(), where we'd bloody well checked that root_ht is non-NULL
> (see
>          if (root_ht == NULL)
>                  return -ENOBUFS;
> upstream of that place) and where that assignment is inevitable on the
> way to returning 0.  No matter what, if tp has passed u32_init() it
> will have non-NULL ->root, forever.  And there is no way for tcf_proto
> to be seen outside of tcf_proto_create() without ->init() having returned
> 0 - it gets freed before anyone sees it.
> 

Yes, the check for root_ht is not necessary - but the check for the
last filter (and testing for last) is needed.

> So this 'if (root_ht)' can't be false.  What's more, what the hell is the
> whole thing checking?  We are in u32_delete().  It's called (as ->delete())
> from tfilter_del_notify(), which is called from tc_del_tfilter().  If we
> return 0 with *last true, we follow up calling tcf_proto_destroy().
> OK, let's look at the logics in there:
> 	* if there are links to root hnode => false
> 	* if there's no links to root hnode and it has knodes => false
> (BTW, if we ever get there with root_ht->refcnt < 1, we are obviously screwed)
> 	* if there is a tcf_proto sharing tp->data => false (i.e. any filters
> with different prio - don't bother)
> 	* if tp is the only one with reference to tp->data and there are *any*
> knodes => false.
> 
> Any extra links can come only from knodes in a non-empty hnode.  And it's not
> a common case.  Shouldn't the whole thing be
> 	* shared tp->data => false
> 	* any non-empty hnode => false
> instead?  Perhaps even with the knode counter in tp->data, avoiding any loops
> in there, as well as the entire ht_empty()...
> 
> Now, in the very beginning of u32_delete() we have this:
>          struct tc_u_hnode *ht = arg;
> 	
>          if (ht == NULL)
>                  goto out;
> OK, but the call of ->delete() is
>          err = tp->ops->delete(tp, fh, last, extack);
> and arg == NULL seen in u32_delete() means fh == NULL in tfilter_del_notify().
> Which is called in
>          if (!fh) {
> 		...
> 	} else {
>                  bool last;
> 
>                  err = tfilter_del_notify(net, skb, n, tp, block,
>                                           q, parent, fh, false, &last,
>                                           extack);
> How can we ever get there with NULL fh?
>

Try:
tc filter delete dev $P parent ffff: protocol ip prio 10 u32
tcm handle is 0, so will hit that code path.

> The whole thing makes very little sense; looks like it used to live in
> u32_destroy() prior to commit 763dbf6328e41 ("net_sched: move the empty tp
> check from ->destroy() to ->delete()"), but looking at the rationale in
> that commit...  I don't see how it fixes anything - sure, now we remove
> tcf_proto from the list before calling ->destroy().  Without any RCU delays
> in between.  How could it possibly solve any issues with ->classify()
> called in parallel with ->destroy()?  cls_u32 (at least these days)
> does try to survive u32_destroy() in parallel with u32_classify();
> if any other classifiers do not, they are still broken and that commit
> has not done anything for them.
> 
> Anyway, adjusting 1/7 for that is trivial, but I would really like to
> understand what that code is doing...  Comments?
> 

Refer to above.

cheers,
jamal
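
As an illustration of the two-rule check Al Viro proposes in the quoted text
above (shared tp->data => false, any non-empty hnode => false), here is a
minimal userspace sketch. The struct layouts and the names hnode, common,
nr_knodes and is_last are hypothetical simplifications and not the cls_u32
structures; the sketch assumes every hash table, root included, is reachable
from one list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the cls_u32 structures. */
struct hnode {
	size_t nr_knodes;    /* filters (knodes) hanging off this hash table */
	struct hnode *next;  /* next hash table sharing the same common data */
};

struct common {
	int refcnt;          /* number of tcf_proto instances sharing this data */
	struct hnode *hlist; /* all hash tables, root included */
};

/* True when nothing is left behind, so the caller may destroy tp. */
static bool is_last(const struct common *c)
{
	const struct hnode *ht;

	if (c->refcnt > 1)             /* shared tp->data => false */
		return false;
	for (ht = c->hlist; ht; ht = ht->next)
		if (ht->nr_knodes > 0) /* any non-empty hnode => false */
			return false;
	return true;
}
```

With this shape, deleting the last filter leaves every nr_knodes at zero, so
is_last() returns true and root can be auto-deleted by the caller, which is the
behaviour described in the reply above.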


* [PATCH] nfp: replace spin_lock_bh with spin_lock in tasklet callback
From: jun qian @ 2018-09-07 17:01 UTC (permalink / raw)
  To: Jakub Kicinski, Dirk van der Merwe, Daniel Borkmann,
	Quentin Monnet
  Cc: oss-drivers, netdev, linux-kernel, jun qian

As you are already in a tasklet, it is unnecessary to call spin_lock_bh.

Signed-off-by: jun qian <hangdianqj@163.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index a8b9fbab5f73..084c983ec3c2 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -2075,10 +2075,10 @@ static void nfp_ctrl_poll(unsigned long arg)
 {
 	struct nfp_net_r_vector *r_vec = (void *)arg;
 
-	spin_lock_bh(&r_vec->lock);
+	spin_lock(&r_vec->lock);
 	nfp_net_tx_complete(r_vec->tx_ring, 0);
 	__nfp_ctrl_tx_queued(r_vec);
-	spin_unlock_bh(&r_vec->lock);
+	spin_unlock(&r_vec->lock);
 
 	nfp_ctrl_rx(r_vec);
 
-- 
2.17.1


* Re: [PATCH v3 2/2] net: ethernet: i40evf: fix underlying build error
From: Wang, Dongsheng @ 2018-09-07 17:14 UTC (permalink / raw)
  To: Sergei Shtylyov, jeffrey.t.kirsher@intel.com
  Cc: jacob.e.keller@intel.com, davem@davemloft.net,
	intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <59fd149c-bacc-c39c-b0d8-a4fb9366f26a@cogentembedded.com>

On 9/7/2018 11:33 PM, Sergei Shtylyov wrote:
> On 09/07/2018 02:19 PM, Wang Dongsheng wrote:
>
>> Can't have non-inline function in a header file.
>> There is a risk of "Multiple definition" from cross-including.
>>
>> Tested on: x86_64, make ARCH=i386
>>
>> Modules section .text:
>> i40e: 00019380 <__i40e_add_stat_strings>:
>> i40evf: 00006b00 <__i40e_add_stat_strings>:
>>
>> Buildin section .text:
>> i40e: c351ca60 <__i40e_add_stat_strings>:
>> i40evf: c354f2c0 <__i40e_add_stat_strings>:
>>
>> Signed-off-by: Wang Dongsheng <dongsheng.wang@hxt-semitech.com>
>> ---
>> V3: add static 
>> ---
>>  .../intel/i40evf/i40e_ethtool_stats.h         | 23 +-----------------
>>  .../ethernet/intel/i40evf/i40evf_ethtool.c    | 24 +++++++++++++++++++
>>  2 files changed, 25 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h b/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h
>> index 60b595dd8c39..62ab67a77753 100644
>> --- a/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h
>> +++ b/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h
>> @@ -181,29 +181,8 @@ i40evf_add_queue_stats(u64 **data, struct i40e_ring *ring)
>>  	*data += size;
>>  }
>>  
>> -/**
>> - * __i40e_add_stat_strings - copy stat strings into ethtool buffer
>> - * @p: ethtool supplied buffer
>> - * @stats: stat definitions array
>> - * @size: size of the stats array
>> - *
>> - * Format and copy the strings described by stats into the buffer pointed at
>> - * by p.
>> - **/
>>  static void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
>    There's no point to keeping *static* function in the header file (unless it's
> also *inline*).

Yes, we need it for now. There is a copy of this file in the i40e dir, and
a "Multiple definition" error will show up when we build in both i40e and
i40evf and remove this *static*.

Because the header file is only used in ethtool.c, we can keep this
static; another option is not to touch this header.

As I replied to Jacob's email earlier, we can do without touching i40evf at
all, because this header is only included in one file and not in any other.


Cheers,

Dongsheng

>> -				    const unsigned int size, ...)
>> -{
>> -	unsigned int i;
>> -
>> -	for (i = 0; i < size; i++) {
>> -		va_list args;
>> -
>> -		va_start(args, size);
>> -		vsnprintf(*p, ETH_GSTRING_LEN, stats[i].stat_string, args);
>> -		*p += ETH_GSTRING_LEN;
>> -		va_end(args);
>> -	}
>> -}
>> +				    const unsigned int size, ...);
>>  
>>  /**
>>   * 40e_add_stat_strings - copy stat strings into ethtool buffer
> [...]
>
> MBR, Sergei
>


* [PATCH] wimax: i2400m: remove unnecessary unlikely()
From: Igor Stoppa @ 2018-09-07 17:22 UTC (permalink / raw)
  To: Inaky Perez-Gonzalez
  Cc: igor.stoppa, Igor Stoppa, linux-wimax, David S. Miller, netdev,
	linux-kernel

WARN_ON() already contains an unlikely(), so it's not necessary to
wrap it into another.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Cc: linux-wimax@intel.com
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/net/wimax/i2400m/tx.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/wimax/i2400m/tx.c b/drivers/net/wimax/i2400m/tx.c
index f20886ade1cc..59c70d928249 100644
--- a/drivers/net/wimax/i2400m/tx.c
+++ b/drivers/net/wimax/i2400m/tx.c
@@ -655,8 +655,7 @@ void i2400m_tx_close(struct i2400m *i2400m)
 	padding = aligned_size - tx_msg_moved->size;
 	if (padding > 0) {
 		pad_buf = i2400m_tx_fifo_push(i2400m, padding, 0, 0);
-		if (unlikely(WARN_ON(pad_buf == NULL
-				     || pad_buf == TAIL_FULL))) {
+		if (WARN_ON(!pad_buf || pad_buf == TAIL_FULL)) {
 			/* This should not happen -- append should verify
 			 * there is always space left at least to append
 			 * tx_block_size */
-- 
2.17.1
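
To illustrate why the outer unlikely() is redundant, here is a minimal
userspace sketch of a WARN_ON-style macro that already evaluates to its
condition wrapped in unlikely(). It is loosely modeled on the kernel macro but
is not the kernel definition; MY_WARN_ON, warn_count and check_buf are
hypothetical names, and the statement expression is a GCC/Clang extension.

```c
#include <stdio.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

static int warn_count; /* hypothetical counter, for illustration only */

/* Userspace imitation of a WARN_ON-style macro: report when the condition
 * holds and yield its truth value, already wrapped in unlikely(). */
#define MY_WARN_ON(cond) ({                                          \
	int ret_warn_on_ = !!(cond);                                 \
	if (unlikely(ret_warn_on_)) {                                \
		warn_count++;                                        \
		fprintf(stderr, "WARNING at %s:%d\n",                \
			__FILE__, __LINE__);                         \
	}                                                            \
	unlikely(ret_warn_on_);                                      \
})

/* Returns -1 when the pointer is unusable, 0 otherwise. */
static int check_buf(const void *buf)
{
	if (MY_WARN_ON(!buf)) /* no extra unlikely() needed here */
		return -1;
	return 0;
}
```

Because the macro's value is already the result of unlikely(), wrapping the
call site in a second unlikely() gives the compiler no additional hint.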


* [PATCH] freescale: ethernet: remove unnecessary unlikely()
From: Igor Stoppa @ 2018-09-07 17:23 UTC (permalink / raw)
  To: Madalin Bucur
  Cc: igor.stoppa, Igor Stoppa, David S. Miller, netdev, linux-kernel

Both WARN_ON() and WARN_ONCE() already contain an unlikely(), so it's not
necessary to wrap it into another.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
Cc: Madalin Bucur <madalin.bucur@nxp.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 65a22cd9aef2..783134f1b779 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -1280,7 +1280,7 @@ static int dpaa_bman_release(const struct dpaa_bp *dpaa_bp,
 
 	err = bman_release(dpaa_bp->pool, bmb, cnt);
 	/* Should never occur, address anyway to avoid leaking the buffers */
-	if (unlikely(WARN_ON(err)) && dpaa_bp->free_buf_cb)
+	if (WARN_ON(err) && dpaa_bp->free_buf_cb)
 		while (cnt-- > 0)
 			dpaa_bp->free_buf_cb(dpaa_bp, &bmb[cnt]);
 
@@ -1704,10 +1704,8 @@ static struct sk_buff *contig_fd_to_skb(const struct dpaa_priv *priv,
 
 	skb = build_skb(vaddr, dpaa_bp->size +
 			SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
-	if (unlikely(!skb)) {
-		WARN_ONCE(1, "Build skb failure on Rx\n");
+	if (WARN_ONCE(!skb, "Build skb failure on Rx\n"))
 		goto free_buffer;
-	}
 	WARN_ON(fd_off != priv->rx_headroom);
 	skb_reserve(skb, fd_off);
 	skb_put(skb, qm_fd_get_length(fd));
@@ -1770,7 +1768,7 @@ static struct sk_buff *sg_fd_to_skb(const struct dpaa_priv *priv,
 			sz = dpaa_bp->size +
 				SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 			skb = build_skb(sg_vaddr, sz);
-			if (WARN_ON(unlikely(!skb)))
+			if (WARN_ON(!skb))
 				goto free_buffers;
 
 			skb->ip_summed = rx_csum_offload(priv, fd);
-- 
2.17.1


* [PATCH] ethernet: hnae: add unlikely() to assert()
From: Igor Stoppa @ 2018-09-07 17:26 UTC (permalink / raw)
  To: huangdaode
  Cc: igor.stoppa, Igor Stoppa, Yisen Zhuang, Salil Mehta,
	David S. Miller, netdev, linux-kernel

The assert() condition is likely to be true.

Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
Cc: huangdaode <huangdaode@hisilicon.com>
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/net/ethernet/hisilicon/hns/hnae.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.h b/drivers/net/ethernet/hisilicon/hns/hnae.h
index 08a750fb60c4..bd3c180a3fe9 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.h
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.h
@@ -47,7 +47,7 @@
 #ifndef assert
 #define assert(expr) \
 do { \
-	if (!(expr)) { \
+	if (unlikely(!(expr))) { \
 		pr_err("Assertion failed! %s, %s, %s, line %d\n", \
 			   #expr, __FILE__, __func__, __LINE__); \
 	} \
-- 
2.17.1
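
For readers unfamiliar with the hint, here is a minimal userspace sketch
of the same pattern. It assumes GCC or Clang, where unlikely() is
commonly built on __builtin_expect. The names soft_assert and
soft_assert_failures are hypothetical and are not part of the hnae driver.

```c
#include <stdio.h>

/* Userspace stand-in for the kernel's unlikely(); assumes GCC or Clang. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical counter so callers can observe failed assertions. */
static int soft_assert_failures;

/* Non-fatal assert: log and count the failure, then keep running. */
#define soft_assert(expr) \
do { \
	if (unlikely(!(expr))) { \
		fprintf(stderr, "Assertion failed! %s, %s, %s, line %d\n", \
			#expr, __FILE__, __func__, __LINE__); \
		soft_assert_failures++; \
	} \
} while (0)
```

The hint only affects code layout and branch prediction; the condition is
still evaluated exactly once and the behaviour is unchanged.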

^ permalink raw reply related

* [PATCH] net/core/filter: fix unused-variable warning
From: Anders Roxell @ 2018-09-07 12:50 UTC (permalink / raw)
  To: davem, ast, daniel, tehnerd; +Cc: netdev, linux-kernel, Anders Roxell

Building with CONFIG_INET=n will show the warning below:
net/core/filter.c: In function ‘____bpf_getsockopt’:
net/core/filter.c:4048:19: warning: unused variable ‘tp’ [-Wunused-variable]
  struct tcp_sock *tp;
                   ^~
net/core/filter.c:4046:31: warning: unused variable ‘icsk’ [-Wunused-variable]
  struct inet_connection_sock *icsk;
                               ^~~~
Move the variable declarations inside the {} block where they are used.

Fixes: 1e215300f138 ("bpf: add TCP_SAVE_SYN/TCP_SAVED_SYN options for bpf_(set|get)sockopt")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
---
 net/core/filter.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index d301134bca3a..0ae7185b2207 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4043,14 +4043,14 @@ static const struct bpf_func_proto bpf_setsockopt_proto = {
 BPF_CALL_5(bpf_getsockopt, struct bpf_sock_ops_kern *, bpf_sock,
 	   int, level, int, optname, char *, optval, int, optlen)
 {
-	struct inet_connection_sock *icsk;
 	struct sock *sk = bpf_sock->sk;
-	struct tcp_sock *tp;
 
 	if (!sk_fullsock(sk))
 		goto err_clear;
 #ifdef CONFIG_INET
 	if (level == SOL_TCP && sk->sk_prot->getsockopt == tcp_getsockopt) {
+		struct inet_connection_sock *icsk;
+		struct tcp_sock *tp;
 		switch (optname) {
 		case TCP_CONGESTION:
 			icsk = inet_csk(sk);
-- 
2.18.0
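
The same warning and fix can be reproduced outside the kernel. The sketch
below is an illustration only: FEATURE_TCP stands in for CONFIG_INET, and
struct conn and get_option() are hypothetical names. Because the local
variable is declared inside the conditionally compiled block, nothing is
left unused when the feature is compiled out.

```c
#include <stddef.h>

/* Hypothetical connection state, standing in for struct tcp_sock. */
struct conn {
	int cwnd;
};

/* Returns the option value, or -1 if it is unsupported or invalid. */
int get_option(const struct conn *c, int optname)
{
	if (c == NULL || optname < 0)
		return -1;
#ifdef FEATURE_TCP
	if (optname == 1) {
		/* Declared here, so there is no unused-variable warning
		 * when FEATURE_TCP is not defined.
		 */
		int cwnd = c->cwnd;

		return cwnd;
	}
#endif
	return -1;
}
```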

^ permalink raw reply related

* Re: linux-next: build failure after merge of the net-next tree
From: David Miller @ 2018-09-07 17:31 UTC (permalink / raw)
  To: jacob.e.keller
  Cc: sfr, netdev, linux-next, linux-kernel, jeffrey.t.kirsher,
	andrewx.bowers
In-Reply-To: <02874ECE860811409154E81DA85FBB5884C7B651@ORSMSX115.amr.corp.intel.com>

From: "Keller, Jacob E" <jacob.e.keller@intel.com>
Date: Fri, 7 Sep 2018 15:30:42 +0000

> There's some discussion about this going on in the intel-wired-lan
> mailing list.

I really want to see a pull request in my inbox fixing this by the end
of today or I'll apply a fix directly at my discretion.

^ permalink raw reply

* Re: [PATCH] ethernet:netronome:nfp:move spin_lock_bh to spin_lock in tasklet
From: David Miller @ 2018-09-07 17:35 UTC (permalink / raw)
  To: jakub.kicinski
  Cc: hangdianqj, dirk.vandermerwe, daniel, quentin.monnet, oss-drivers,
	netdev, linux-kernel
In-Reply-To: <20180907183636.48a681f4@cakuba>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Fri, 7 Sep 2018 18:36:36 +0200

> On Fri,  7 Sep 2018 09:21:17 -0700, jun qian wrote:
>> As you are already in a tasklet, it is unnecessary to call spin_lock_bh.
>> 
>> Signed-off-by: jun qian <hangdianqj@163.com>
> 
> FWIW you should put spaces after the colons.  It's generally a good
> practice to look at the prefix previous authors used for a given piece
> of code with 
> 
> git log -- $file_path
> 
> This would be a better subject:
> 
> nfp: replace spin_lock_bh with spin_lock in tasklet callback

Yes, please fix your subject styling.

^ permalink raw reply

* Re: [PATCH] nfp: replace spin_lock_bh with spin_lock in tasklet callback
From: Jakub Kicinski @ 2018-09-07 17:42 UTC (permalink / raw)
  To: jun qian
  Cc: Dirk van der Merwe, Daniel Borkmann, Quentin Monnet, oss-drivers,
	netdev, linux-kernel
In-Reply-To: <20180907170109.48150-1-hangdianqj@163.com>

On Fri,  7 Sep 2018 10:01:09 -0700, jun qian wrote:
> As you are already in a tasklet, it is unnecessary to call spin_lock_bh.
> 
> Signed-off-by: jun qian <hangdianqj@163.com>

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>

^ permalink raw reply

* [PATCH net-next] net: dsa: Expose tagging protocol to user-space
From: Florian Fainelli @ 2018-09-07 18:09 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, Andrew Lunn, Vivien Didelot, David S. Miller,
	open list

There is no way for user-space to know what a given DSA network device's
tagging protocol is. Expose this information through a dsa/tagging
attribute which reflects the tagging protocol currently in use.

This is helpful for configuration (e.g. "none" behaves dramatically
differently with respect to bridges) as well as for packet capture tools
when there is no proper Ethernet type available.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 Documentation/ABI/testing/sysfs-class-net-dsa |  7 +++
 net/dsa/dsa.c                                 | 43 +++++++++++++++++++
 net/dsa/dsa_priv.h                            |  1 +
 net/dsa/slave.c                               | 28 ++++++++++++
 4 files changed, 79 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-net-dsa

diff --git a/Documentation/ABI/testing/sysfs-class-net-dsa b/Documentation/ABI/testing/sysfs-class-net-dsa
new file mode 100644
index 000000000000..f240221e071e
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-net-dsa
@@ -0,0 +1,7 @@
+What:		/sys/class/net/<iface>/dsa/tagging
+Date:		August 2018
+KernelVersion:	4.20
+Contact:	netdev@vger.kernel.org
+Description:
+		String indicating the type of tagging protocol used by the
+		DSA slave network device.
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 9f3209ff7ffd..45f70859f550 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -70,6 +70,49 @@ const struct dsa_device_ops *dsa_device_ops[DSA_TAG_LAST] = {
 	[DSA_TAG_PROTO_NONE] = &none_ops,
 };
 
+const char *dsa_tag_protocol_to_str(const struct dsa_device_ops *ops)
+{
+	const char *protocol_name[DSA_TAG_LAST] = {
+#ifdef CONFIG_NET_DSA_TAG_BRCM
+		[DSA_TAG_PROTO_BRCM] = "brcm",
+#endif
+#ifdef CONFIG_NET_DSA_TAG_BRCM_PREPEND
+		[DSA_TAG_PROTO_BRCM_PREPEND] = "brcm-prepend",
+#endif
+#ifdef CONFIG_NET_DSA_TAG_DSA
+		[DSA_TAG_PROTO_DSA] = "dsa",
+#endif
+#ifdef CONFIG_NET_DSA_TAG_EDSA
+		[DSA_TAG_PROTO_EDSA] = "edsa",
+#endif
+#ifdef CONFIG_NET_DSA_TAG_KSZ
+		[DSA_TAG_PROTO_KSZ] = "ksz",
+#endif
+#ifdef CONFIG_NET_DSA_TAG_LAN9303
+		[DSA_TAG_PROTO_LAN9303] = "lan9303",
+#endif
+#ifdef CONFIG_NET_DSA_TAG_MTK
+		[DSA_TAG_PROTO_MTK] = "mtk",
+#endif
+#ifdef CONFIG_NET_DSA_TAG_QCA
+		[DSA_TAG_PROTO_QCA] = "qca",
+#endif
+#ifdef CONFIG_NET_DSA_TAG_TRAILER
+		[DSA_TAG_PROTO_TRAILER] = "trailer",
+#endif
+		[DSA_TAG_PROTO_NONE] = "none",
+	};
+	unsigned int i;
+
+	BUILD_BUG_ON(ARRAY_SIZE(protocol_name) != DSA_TAG_LAST);
+
+	for (i = 0; i < ARRAY_SIZE(dsa_device_ops); i++)
+		if (ops == dsa_device_ops[i])
+			return protocol_name[i];
+
+	return protocol_name[DSA_TAG_PROTO_NONE];
+}
+
 const struct dsa_device_ops *dsa_resolve_tag_protocol(int tag_protocol)
 {
 	const struct dsa_device_ops *ops;
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 3964c6f7a7c0..2868b5bb7e7d 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -86,6 +86,7 @@ struct dsa_slave_priv {
 /* dsa.c */
 const struct dsa_device_ops *dsa_resolve_tag_protocol(int tag_protocol);
 bool dsa_schedule_work(struct work_struct *work);
+const char *dsa_tag_protocol_to_str(const struct dsa_device_ops *ops);
 
 /* legacy.c */
 #if IS_ENABLED(CONFIG_NET_DSA_LEGACY)
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 1c45c1d6d241..3f840b6eea69 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1058,6 +1058,27 @@ static struct device_type dsa_type = {
 	.name	= "dsa",
 };
 
+static ssize_t tagging_show(struct device *d, struct device_attribute *attr,
+			    char *buf)
+{
+	struct net_device *dev = to_net_dev(d);
+	struct dsa_port *dp = dsa_slave_to_port(dev);
+
+	return sprintf(buf, "%s\n",
+		       dsa_tag_protocol_to_str(dp->cpu_dp->tag_ops));
+}
+static DEVICE_ATTR_RO(tagging);
+
+static struct attribute *dsa_slave_attrs[] = {
+	&dev_attr_tagging.attr,
+	NULL
+};
+
+static const struct attribute_group dsa_group = {
+	.name	= "dsa",
+	.attrs	= dsa_slave_attrs,
+};
+
 static void dsa_slave_phylink_validate(struct net_device *dev,
 				       unsigned long *supported,
 				       struct phylink_link_state *state)
@@ -1353,8 +1374,14 @@ int dsa_slave_create(struct dsa_port *port)
 		goto out_phy;
 	}
 
+	ret = sysfs_create_group(&slave_dev->dev.kobj, &dsa_group);
+	if (ret)
+		goto out_unreg;
+
 	return 0;
 
+out_unreg:
+	unregister_netdev(slave_dev);
 out_phy:
 	rtnl_lock();
 	phylink_disconnect_phy(p->dp->pl);
@@ -1378,6 +1405,7 @@ void dsa_slave_destroy(struct net_device *slave_dev)
 	rtnl_unlock();
 
 	dsa_slave_notify(slave_dev, DSA_PORT_UNREGISTER);
+	sysfs_remove_group(&slave_dev->dev.kobj, &dsa_group);
 	unregister_netdev(slave_dev);
 	phylink_destroy(dp->pl);
 	free_percpu(p->stats64);
-- 
2.17.1
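
The pointer-to-name lookup used by dsa_tag_protocol_to_str() can be tried
in userspace. This is a minimal sketch under assumed names: enum tag_proto,
struct tag_ops, ops_table and tag_ops_to_str() are all hypothetical
stand-ins, not the DSA definitions.

```c
#include <stddef.h>

enum tag_proto { TAG_PROTO_NONE, TAG_PROTO_A, TAG_PROTO_B, TAG_LAST };

/* Hypothetical ops structure; only its address matters here. */
struct tag_ops {
	int overhead;
};

static const struct tag_ops none_ops = { 0 };
static const struct tag_ops a_ops = { 4 };
static const struct tag_ops b_ops = { 8 };

/* Registered ops, indexed by protocol. */
static const struct tag_ops *const ops_table[TAG_LAST] = {
	[TAG_PROTO_NONE] = &none_ops,
	[TAG_PROTO_A] = &a_ops,
	[TAG_PROTO_B] = &b_ops,
};

/* Names, indexed the same way as ops_table. */
static const char *const proto_name[TAG_LAST] = {
	[TAG_PROTO_NONE] = "none",
	[TAG_PROTO_A] = "a",
	[TAG_PROTO_B] = "b",
};

/* Map an ops pointer back to its name; unknown pointers report "none". */
const char *tag_ops_to_str(const struct tag_ops *ops)
{
	size_t i;

	for (i = 0; i < TAG_LAST; i++)
		if (ops == ops_table[i])
			return proto_name[i];

	return proto_name[TAG_PROTO_NONE];
}
```

The sketch keeps the name table static and const so that it is not rebuilt
on the stack for every call; whether that matters for a sysfs read path is
a judgment call.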

^ permalink raw reply related

* [PATCH] r8169: set TxConfig register after TX / RX is enabled, just like RxConfig
From: Maciej S. Szmigiero @ 2018-09-07 18:15 UTC (permalink / raw)
  To: Realtek linux nic maintainers, David S. Miller
  Cc: netdev, linux-kernel, Azat Khuzhin, Heiner Kallweit

Commit 3559d81e76bf ("r8169: simplify rtl_hw_start_8169") changed the order
of two register writes:
1) Caused RxConfig to be written before TX / RX is enabled,
2) Caused TxConfig to be written before TX / RX is enabled.

At least on XIDs 10000000 ("RTL8169sb/8110sb") and
18000000 ("RTL8169sc/8110sc") such writes are ignored by the chip, leaving
values in these registers intact.

Change 1) was reverted by
commit 05212ba8132b42 ("r8169: set RxConfig after tx/rx is enabled for RTL8169sb/8110sb devices"),
however change 2) wasn't.

In practice, this caused TxConfig's "InterFrameGap time" and "Max DMA Burst
Size per Tx DMA Burst" bits to be zero, dramatically reducing TX performance
(in my tests it dropped from around 500Mbps to around 50Mbps).

This patch fixes the issue by moving the TxConfig register write a bit later
in the code so it happens after TX / RX is already enabled.

Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Fixes: 05212ba8132b42 ("r8169: set RxConfig after tx/rx is enabled for RTL8169sb/8110sb devices")
---
"Fixes" tag points to the RxConfig fix instead of the actual commit that
introduced this regression to maintain patch ordering since the RxConfig fix
partially affects the same code lines as this fix.

 drivers/net/ethernet/realtek/r8169.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index b935a18358cb..2ade3a27d7f1 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -4634,13 +4634,13 @@ static void rtl_hw_start(struct  rtl8169_private *tp)

 	rtl_set_rx_max_size(tp);
 	rtl_set_rx_tx_desc_registers(tp);
-	rtl_set_tx_config_registers(tp);
 	RTL_W8(tp, Cfg9346, Cfg9346_Lock);

 	/* Initially a 10 us delay. Turned it into a PCI commit. - FR */
 	RTL_R8(tp, IntrMask);
 	RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
 	rtl_init_rxcfg(tp);
+	rtl_set_tx_config_registers(tp);

 	rtl_set_rx_mode(tp->dev);
 	/* no early-rx interrupts */
-- 
2.17.0
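
To make the ordering problem concrete, here is a small software model of
the behaviour described above. It is not driver code: struct fake_nic and
both functions are hypothetical, and the model assumes only what the
commit message states, namely that configuration writes are ignored until
TX / RX is enabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a chip with one config register. */
struct fake_nic {
	bool txrx_enabled;
	uint32_t tx_config;
};

/* Models writing CmdTxEnb | CmdRxEnb to ChipCmd. */
void nic_enable_txrx(struct fake_nic *nic)
{
	nic->txrx_enabled = true;
}

/* Models a TxConfig write: silently dropped while TX / RX is disabled. */
void nic_write_tx_config(struct fake_nic *nic, uint32_t val)
{
	if (nic->txrx_enabled)
		nic->tx_config = val;
}
```

In this model, writing the register before nic_enable_txrx() leaves it at
its reset value, which corresponds to the zeroed bits observed in the
report; writing it afterwards takes effect.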

^ permalink raw reply related

* [PATCH net-next v2] net: sched: change tcf_del_walker() to take idrinfo->lock
From: Vlad Buslov @ 2018-09-07 13:51 UTC (permalink / raw)
  To: netdev, xiyou.wangcong; +Cc: jhs, jiri, davem, Vlad Buslov
In-Reply-To: <CAM_iQpVhTeKMQmt55zuZ3vTgtrMajzDo3H-3Nw7+UdrGJCJDrA@mail.gmail.com>

The action API was changed to work with actions and action_idr in a
concurrency-safe manner; however, tcf_del_walker() still uses actions
without taking a reference or idrinfo->lock first, and deletes them
directly, disregarding a possible concurrent delete.

Add the tc_action_wq workqueue to the action API. Implement
tcf_idr_release_unsafe(), which assumes external synchronization by the
caller and defers the blocking part of action cleanup to the tc_action_wq
workqueue. Extend tcf_action_cleanup() with an 'async' argument to indicate
that the function should free the action asynchronously.

Change tcf_del_walker() to take idrinfo->lock while iterating over actions
and use the new tcf_idr_release_unsafe() to release them while holding the
lock.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
 include/net/act_api.h |  1 +
 net/sched/act_api.c   | 73 ++++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 65 insertions(+), 9 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index c6f195b3c706..4c5117bc4afb 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -38,6 +38,7 @@ struct tc_action {
 	struct gnet_stats_queue __percpu *cpu_qstats;
 	struct tc_cookie	__rcu *act_cookie;
 	struct tcf_chain	*goto_chain;
+	struct work_struct	work;
 };
 #define tcf_index	common.tcfa_index
 #define tcf_refcnt	common.tcfa_refcnt
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 6f118d62c731..4ad9062c34b3 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -90,13 +90,38 @@ static void free_tcf(struct tc_action *p)
 	kfree(p);
 }
 
-static void tcf_action_cleanup(struct tc_action *p)
+static void tcf_action_free(struct tc_action *p)
+{
+	gen_kill_estimator(&p->tcfa_rate_est);
+	free_tcf(p);
+}
+
+static void tcf_action_free_work(struct work_struct *work)
+{
+	struct tc_action *p = container_of(work,
+					   struct tc_action,
+					   work);
+
+	tcf_action_free(p);
+}
+
+static struct workqueue_struct *tc_action_wq;
+
+static bool tcf_action_queue_work(struct work_struct *work, work_func_t func)
+{
+	INIT_WORK(work, func);
+	return queue_work(tc_action_wq, work);
+}
+
+static void tcf_action_cleanup(struct tc_action *p, bool async)
 {
 	if (p->ops->cleanup)
 		p->ops->cleanup(p);
 
-	gen_kill_estimator(&p->tcfa_rate_est);
-	free_tcf(p);
+	if (async)
+		tcf_action_queue_work(&p->work, tcf_action_free_work);
+	else
+		tcf_action_free(p);
 }
 
 static int __tcf_action_put(struct tc_action *p, bool bind)
@@ -109,7 +134,7 @@ static int __tcf_action_put(struct tc_action *p, bool bind)
 		idr_remove(&idrinfo->action_idr, p->tcfa_index);
 		spin_unlock(&idrinfo->lock);
 
-		tcf_action_cleanup(p);
+		tcf_action_cleanup(p, false);
 		return 1;
 	}
 
@@ -147,6 +172,24 @@ int __tcf_idr_release(struct tc_action *p, bool bind, bool strict)
 }
 EXPORT_SYMBOL(__tcf_idr_release);
 
+/* Release idr without obtaining idrinfo->lock. Caller must prevent any
+ * concurrent modifications of idrinfo->action_idr!
+ */
+
+static int tcf_idr_release_unsafe(struct tc_action *p)
+{
+	if (atomic_read(&p->tcfa_bindcnt) > 0)
+		return -EPERM;
+
+	if (refcount_dec_and_test(&p->tcfa_refcnt)) {
+		idr_remove(&p->idrinfo->action_idr, p->tcfa_index);
+		tcf_action_cleanup(p, true);
+		return ACT_P_DELETED;
+	}
+
+	return 0;
+}
+
 static size_t tcf_action_shared_attrs_size(const struct tc_action *act)
 {
 	struct tc_cookie *act_cookie;
@@ -262,20 +305,25 @@ static int tcf_del_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
 	if (nla_put_string(skb, TCA_KIND, ops->kind))
 		goto nla_put_failure;
 
+	spin_lock(&idrinfo->lock);
 	idr_for_each_entry_ul(idr, p, id) {
-		ret = __tcf_idr_release(p, false, true);
+		ret = tcf_idr_release_unsafe(p);
 		if (ret == ACT_P_DELETED) {
 			module_put(ops->owner);
 			n_i++;
 		} else if (ret < 0) {
-			goto nla_put_failure;
+			goto nla_put_failure_locked;
 		}
 	}
+	spin_unlock(&idrinfo->lock);
+
 	if (nla_put_u32(skb, TCA_FCNT, n_i))
 		goto nla_put_failure;
 	nla_nest_end(skb, nest);
 
 	return n_i;
+nla_put_failure_locked:
+	spin_unlock(&idrinfo->lock);
 nla_put_failure:
 	nla_nest_cancel(skb, nest);
 	return ret;
@@ -341,7 +389,7 @@ static int tcf_idr_delete_index(struct tcf_idrinfo *idrinfo, u32 index)
 						p->tcfa_index));
 			spin_unlock(&idrinfo->lock);
 
-			tcf_action_cleanup(p);
+			tcf_action_cleanup(p, false);
 			module_put(owner);
 			return 0;
 		}
@@ -1713,16 +1761,23 @@ static int __init tc_action_init(void)
 {
 	int err;
 
+	tc_action_wq = alloc_ordered_workqueue("tc_action_workqueue", 0);
+	if (!tc_action_wq)
+		return -ENOMEM;
+
 	err = register_pernet_subsys(&tcf_action_net_ops);
 	if (err)
-		return err;
+		goto err_register_pernet_subsys;
 
 	rtnl_register(PF_UNSPEC, RTM_NEWACTION, tc_ctl_action, NULL, 0);
 	rtnl_register(PF_UNSPEC, RTM_DELACTION, tc_ctl_action, NULL, 0);
 	rtnl_register(PF_UNSPEC, RTM_GETACTION, tc_ctl_action, tc_dump_action,
 		      0);
+	return err;
 
-	return 0;
+err_register_pernet_subsys:
+	destroy_workqueue(tc_action_wq);
+	return err;
 }
 
 subsys_initcall(tc_action_init);
-- 
2.7.5
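
The general idea of the patch above (drop the reference while holding the
lock, but postpone the part that may block until the lock is released) can
be sketched in single-threaded userspace C. Everything here is an
assumption made for illustration: the locked flag merely stands in for
idrinfo->lock, the deferred list stands in for the workqueue, and none of
the names exist in net/sched.

```c
#include <stdbool.h>
#include <stdlib.h>

struct action {
	int refcnt;
	struct action *next_deferred;
};

struct action_table {
	bool locked;             /* stands in for idrinfo->lock */
	struct action *deferred; /* stands in for the workqueue */
	int freed;               /* number of actions freed so far */
};

/* Drop one reference. Does nothing unless the caller "holds the lock".
 * Returns true if this was the last reference and the free was queued.
 */
bool action_release_locked(struct action_table *t, struct action *a)
{
	if (!t->locked || --a->refcnt > 0)
		return false;

	a->next_deferred = t->deferred;
	t->deferred = a;
	return true;
}

/* Free all queued actions. Refuses to run while the "lock" is held,
 * mirroring the rule that blocking cleanup must not run under a spinlock.
 */
int action_run_deferred(struct action_table *t)
{
	int n = 0;

	if (t->locked)
		return 0;

	while (t->deferred) {
		struct action *a = t->deferred;

		t->deferred = a->next_deferred;
		free(a);
		n++;
	}
	t->freed += n;
	return n;
}
```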

^ permalink raw reply related

* [iproute2 PATCH] bridge: Correct json output
From: Tobias Jungel @ 2018-09-07 13:42 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

The current implementation adds the configured VLANs as "vlan": [ ... ]
directly into an array. This is malformed JSON and fails to parse. This
patch creates an object to contain this key-value pair.

Test with:

ip l a type bridge
./bridge/bridge -j vlan | jq

fixes c7c1a1ef5

Signed-off-by: Tobias Jungel <tobias.jungel@bisdn.de>
---
 bridge/vlan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/bridge/vlan.c b/bridge/vlan.c
index 19a36b80..9376b985 100644
--- a/bridge/vlan.c
+++ b/bridge/vlan.c
@@ -632,6 +632,7 @@ void print_vlan_info(FILE *fp, struct rtattr *tb)
 	if (!is_json_context())
 		fprintf(fp, "%s", _SL_);
 
+	open_json_object(NULL);
 	open_json_array(PRINT_JSON, "vlan");
 
 	for (i = RTA_DATA(list); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
@@ -659,6 +660,7 @@ void print_vlan_info(FILE *fp, struct rtattr *tb)
 	}
 
 	close_json_array(PRINT_ANY, "\n");
+	close_json_object();
 }
 
 int do_vlan(int argc, char **argv)
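
The difference between the two outputs can be shown with a small
standalone formatter. This is a sketch only: format_vlan_entry() is a
hypothetical helper, not part of iproute2, and the exact surrounding
output of the bridge tool is assumed rather than reproduced.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Render VLAN ids as the single element of an outer JSON array.
 * With wrap_in_object the key-value pair is enclosed in {}, which is
 * what makes the result valid JSON. Returns 0 on success, -1 if the
 * buffer is too small.
 */
int format_vlan_entry(char *buf, size_t len, const int *vids, size_t n,
		      bool wrap_in_object)
{
	size_t off;
	size_t i;
	int ret;

	ret = snprintf(buf, len, "[%s\"vlan\":[", wrap_in_object ? "{" : "");
	if (ret < 0 || (size_t)ret >= len)
		return -1;
	off = (size_t)ret;

	for (i = 0; i < n; i++) {
		ret = snprintf(buf + off, len - off, "%s%d",
			       i ? "," : "", vids[i]);
		if (ret < 0 || (size_t)ret >= len - off)
			return -1;
		off += (size_t)ret;
	}

	ret = snprintf(buf + off, len - off, "]%s]", wrap_in_object ? "}" : "");
	if (ret < 0 || (size_t)ret >= len - off)
		return -1;

	return 0;
}
```

Without the wrapper the formatter yields ["vlan":[1,10]], which a JSON
parser rejects because an array element cannot be a bare key-value pair;
with it the result is [{"vlan":[1,10]}].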

^ permalink raw reply related

* [PATCH net-next v2] net: sched: cls_flower: dump offload count value
From: Vlad Buslov @ 2018-09-07 14:22 UTC (permalink / raw)
  To: netdev; +Cc: jakub.kicinski, jhs, xiyou.wangcong, jiri, davem, Vlad Buslov

Change flower in_hw_count type to fixed-size u32 and dump it as
TCA_FLOWER_IN_HW_COUNT. This change is necessary to properly test shared
blocks and re-offload functionality.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h    | 2 +-
 include/uapi/linux/pkt_cls.h | 2 ++
 net/sched/cls_flower.c       | 5 ++++-
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a6d00093f35e..d68ac55539a5 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -362,7 +362,7 @@ static inline void tcf_block_offload_dec(struct tcf_block *block, u32 *flags)
 }
 
 static inline void
-tc_cls_offload_cnt_update(struct tcf_block *block, unsigned int *cnt,
+tc_cls_offload_cnt_update(struct tcf_block *block, u32 *cnt,
 			  u32 *flags, bool add)
 {
 	if (add) {
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index be382fb0592d..401d0c1e612d 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -483,6 +483,8 @@ enum {
 	TCA_FLOWER_KEY_ENC_OPTS,
 	TCA_FLOWER_KEY_ENC_OPTS_MASK,
 
+	TCA_FLOWER_IN_HW_COUNT,
+
 	__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 6fd9bdd93796..4b8dd37dd4f8 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -98,7 +98,7 @@ struct cls_fl_filter {
 	struct list_head list;
 	u32 handle;
 	u32 flags;
-	unsigned int in_hw_count;
+	u32 in_hw_count;
 	struct rcu_work rwork;
 	struct net_device *hw_dev;
 };
@@ -1880,6 +1880,9 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, void *fh,
 	if (f->flags && nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags))
 		goto nla_put_failure;
 
+	if (nla_put_u32(skb, TCA_FLOWER_IN_HW_COUNT, f->in_hw_count))
+		goto nla_put_failure;
+
 	if (tcf_exts_dump(skb, &f->exts))
 		goto nla_put_failure;
 
-- 
2.7.5
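
As background on why a fixed-size type is wanted for a dumped value: a
netlink attribute is a small header (16-bit length, 16-bit type) followed
by the payload, so a u32 payload always produces an 8-byte attribute
regardless of architecture. The sketch below illustrates that layout in
userspace; struct attr_hdr and put_u32_attr() are hypothetical names that
mimic, but are not, the kernel's struct nlattr and nla_put_u32().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header with the same shape as a netlink attribute header. */
struct attr_hdr {
	uint16_t len;  /* header plus payload, in bytes */
	uint16_t type;
};

/* Append one u32 attribute to buf. Returns the number of bytes written,
 * or 0 if the buffer is too small.
 */
size_t put_u32_attr(unsigned char *buf, size_t buflen, uint16_t type,
		    uint32_t value)
{
	struct attr_hdr hdr = {
		.len = (uint16_t)(sizeof(struct attr_hdr) + sizeof(value)),
		.type = type,
	};

	if (buflen < hdr.len)
		return 0;

	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + sizeof(hdr), &value, sizeof(value));
	return hdr.len;
}
```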

^ permalink raw reply related

* [PATCH] mt76: another missing 'select' in Kconfig
From: valdis.kletnieks @ 2018-09-07 19:08 UTC (permalink / raw)
  To: Kalle Valo, David S. Miller, Lorenzo Bianconi
  Cc: netdev, linux-kernel, linux-wireless

commit b40b15e1521f7764ea8c68d5a00ecc971b673d21
Author: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Date:   Tue Jul 31 10:09:19 2018 +0200

    mt76: add usb support to mt76 layer

added a new mt76-usb.c for MT76x2U USB devices, but failed to wire
it up for MT76x0U USB devices. Add the missing 'select MT76_USB'.

Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
---
diff --git a/drivers/net/wireless/mediatek/mt76/Kconfig b/drivers/net/wireless/mediatek/mt76/Kconfig
index 6a270e759006..cec977b81305 100644
--- a/drivers/net/wireless/mediatek/mt76/Kconfig
+++ b/drivers/net/wireless/mediatek/mt76/Kconfig
@@ -17,6 +17,7 @@ config MT76x2_COMMON
 config MT76x0U
 	tristate "MediaTek MT76x0U (USB) support"
 	select MT76_CORE
+	select MT76_USB
 	depends on MAC80211
 	depends on USB
 	select MT76x02_LIB

^ permalink raw reply related

* Re: [PATCH net-next v2] net: sched: cls_flower: dump offload count value
From: Jakub Kicinski @ 2018-09-07 14:49 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem
In-Reply-To: <1536330141-10354-1-git-send-email-vladbu@mellanox.com>

On Fri,  7 Sep 2018 17:22:21 +0300, Vlad Buslov wrote:
> Change flower in_hw_count type to fixed-size u32 and dump it as
> TCA_FLOWER_IN_HW_COUNT. This change is necessary to properly test shared
> blocks and re-offload functionality.
> 
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> Acked-by: Jiri Pirko <jiri@mellanox.com>

LGTM, thanks for the respin!

^ permalink raw reply

* Allow bpf_perf_event_output to access packet data
From: Lorenz Bauer @ 2018-09-07 14:56 UTC (permalink / raw)
  To: netdev

Re-sent due to HTML e-mail mess up, apologies.

---------- Forwarded message ----------
From: Lorenz Bauer <lmb@cloudflare.com>
Date: 7 September 2018 at 15:53
Subject: Allow bpf_perf_event_output to access packet data
To: netdev@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>

Hello list,

I'm attempting to use bpf_perf_event_output to do packet sampling from XDP.

The code basically runs before our other XDP code, does a
perf_event_output with the full packet (for now) and then tail calls
into DDoS mitigation, etc.

I've just discovered that perf_event_output isn't allowed to access
packet data by the verifier. Is this something that could be allowed?

Best
Lorenz

-- 
Lorenz Bauer  |  Systems Engineer
25 Lavington St., London SE1 0NZ

www.cloudflare.com

^ permalink raw reply
