Netdev List
 help / color / mirror / Atom feed
* [PATCH v3 0/1] multi-threading device shutdown
From: Pavel Tatashin @ 2018-05-07 15:54 UTC (permalink / raw)
  To: pasha.tatashin, steven.sistare, daniel.m.jordan, linux-kernel,
	jeffrey.t.kirsher, intel-wired-lan, netdev, gregkh,
	alexander.duyck, tobin

Changelog
v2 - v3
	- Fixed warning from kbuild test.
	- Moved device_lock/device_unlock inside device_shutdown_tree().

v1 - v2
	- It turns out we cannot lock more than MAX_LOCK_DEPTH by a single
	  thread. (By default this value is 48), and is used to detect
	  deadlocks. So, I re-wrote the code to only lock one devices per
	  thread instead of pre-locking all devices by the main thread.
	- Addressed comments from Tobin C. Harding.
	- As suggested by Alexander Duyck removed ixgbe changes. It can be
	  done as a separate work scaling RTNL mutex.

Do a faster shutdown by calling dev->*->shutdown(dev) in parallel.
device_shutdown() calls these functions for every single device but
only using one thread.

Since, nothing else is running on the machine by the device_shutdown()
s called, there is no reason not to utilize all the available CPU
resources.

Pavel Tatashin (1):
  drivers core: multi-threading device shutdown

 drivers/base/core.c | 275 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 225 insertions(+), 50 deletions(-)

-- 
2.17.0

^ permalink raw reply

* Re: [PATCH v2 net-next 2/4] net: add skeleton of bpfilter kernel module
From: David Miller @ 2018-05-07 15:50 UTC (permalink / raw)
  To: laforge
  Cc: ast, daniel, torvalds, gregkh, luto, netdev, linux-kernel,
	kernel-team
In-Reply-To: <20180507152435.GE19042@nataraja>

From: Harald Welte <laforge@gnumonks.org>
Date: Mon, 7 May 2018 17:24:35 +0200

> But if the ruleset loads but behaves different than before (because e.g.
> it's executed from a completely different place in the stack), that's
> IMHO an absolute no-go that must be avoided at all cost.

That's not what we are doing nor proposing.  I'm sorry if you are
confused on this matter.

The base implementation we strive for will execute the BPF programs
from the existing netfilter hook points.

However, if semantically the effect is equal if we execute the BPF
program from XDP, we will allow that to happen as an optimization.

The BPF exection is where it is in these patches for the purposes of
bootstrapping the bpfilter project and easy testing/benchmarking/hacking.

I hope this clears up your confusion.

If you would like to become involved in hacking on bpfilter to help us
ensure more accurate compatability between existing iptables and what
bpfilter will execute for the same rule sets, we very much look
forward to your contributions and expertiece.

Thank you.

^ permalink raw reply

* INFO: task hung in flush_work
From: syzbot @ 2018-05-07 15:47 UTC (permalink / raw)
  To: davem, linux-kernel, linux-s390, netdev, syzkaller-bugs, ubraun

Hello,

syzbot found the following crash on:

HEAD commit:    8fb11a9a8d51 net/ipv6: rename rt6_next to fib6_next
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=12ca275b800000
kernel config:  https://syzkaller.appspot.com/x/.config?x=c416c61f3cd96be
dashboard link: https://syzkaller.appspot.com/bug?extid=2e7b6af5956e05e5cff7
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+2e7b6af5956e05e5cff7@syzkaller.appspotmail.com

netlink: 4 bytes leftover after parsing attributes in process  
`syz-executor7'.
netlink: 4 bytes leftover after parsing attributes in process  
`syz-executor7'.
INFO: task syz-executor4:17145 blocked for more than 120 seconds.
       Not tainted 4.17.0-rc3+ #33
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor4   D21736 17145   4542 0x80000002
Call Trace:
  context_switch kernel/sched/core.c:2848 [inline]
  __schedule+0x801/0x1e30 kernel/sched/core.c:3490
  schedule+0xef/0x430 kernel/sched/core.c:3549
  schedule_timeout+0x1b5/0x240 kernel/time/timer.c:1777
  do_wait_for_common kernel/sched/completion.c:83 [inline]
  __wait_for_common kernel/sched/completion.c:104 [inline]
  wait_for_common kernel/sched/completion.c:115 [inline]
  wait_for_completion+0x3e7/0x870 kernel/sched/completion.c:136
  flush_work+0x531/0x900 kernel/workqueue.c:2903
  smc_close_active+0x618/0x9c0 net/smc/smc_close.c:189
  smc_release+0x46b/0x610 net/smc/af_smc.c:141
  sock_release+0x96/0x1b0 net/socket.c:594
  sock_close+0x16/0x20 net/socket.c:1149
  __fput+0x34d/0x890 fs/file_table.c:209
  ____fput+0x15/0x20 fs/file_table.c:243
  task_work_run+0x1e4/0x290 kernel/task_work.c:113
  exit_task_work include/linux/task_work.h:22 [inline]
  do_exit+0x1aee/0x2730 kernel/exit.c:865
  do_group_exit+0x16f/0x430 kernel/exit.c:968
  get_signal+0x886/0x1960 kernel/signal.c:2469
  do_signal+0x98/0x2040 arch/x86/kernel/signal.c:810
  exit_to_usermode_loop+0x28a/0x310 arch/x86/entry/common.c:162
  prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
  do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455979
RSP: 002b:00007f4181b74ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 000000000072bf78 RCX: 0000000000455979
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000072bf78
RBP: 000000000072bf78 R08: 0000000000000000 R09: 000000000072bf50
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000a3e81f R14: 00007f4181b759c0 R15: 0000000000000001

Showing all locks held in the system:
2 locks held by khungtaskd/894:
  #0: 000000002a4a1b2a (rcu_read_lock){....}, at:  
check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
  #0: 000000002a4a1b2a (rcu_read_lock){....}, at: watchdog+0x1ff/0xf60  
kernel/hung_task.c:249
  #1: 00000000472c3276 (tasklist_lock){.+.+}, at:  
debug_show_all_locks+0xde/0x34a kernel/locking/lockdep.c:4470
2 locks held by getty/4468:
  #0: 0000000065ad3d93 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 00000000bfe7ad12 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4469:
  #0: 000000006f6b456f (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 00000000d44cbfd2 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4470:
  #0: 0000000039a0b4b8 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 00000000422d9092 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4471:
  #0: 0000000049ab501c (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 00000000b1883d82 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4472:
  #0: 00000000e473e0f9 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 00000000d6a5f6ee (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4473:
  #0: 00000000af39adc0 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 000000005b852d11 (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by getty/4474:
  #0: 00000000b68f2084 (&tty->ldisc_sem){++++}, at:  
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
  #1: 0000000034e0241f (&ldata->atomic_read_lock){+.+.}, at:  
n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131
2 locks held by kworker/0:3/4924:
  #0: 0000000053ed24fb ((wq_completion)"events"){+.+.}, at:  
__write_once_size include/linux/compiler.h:215 [inline]
  #0: 0000000053ed24fb ((wq_completion)"events"){+.+.}, at:  
arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
  #0: 0000000053ed24fb ((wq_completion)"events"){+.+.}, at: atomic64_set  
include/asm-generic/atomic-instrumented.h:40 [inline]
  #0: 0000000053ed24fb ((wq_completion)"events"){+.+.}, at: atomic_long_set  
include/asm-generic/atomic-long.h:57 [inline]
  #0: 0000000053ed24fb ((wq_completion)"events"){+.+.}, at: set_work_data  
kernel/workqueue.c:617 [inline]
  #0: 0000000053ed24fb ((wq_completion)"events"){+.+.}, at:  
set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
  #0: 0000000053ed24fb ((wq_completion)"events"){+.+.}, at:  
process_one_work+0xaef/0x1b50 kernel/workqueue.c:2116
  #1: 000000008a2387f6 ((work_completion)(&smc->tcp_listen_work)){+.+.}, at:  
process_one_work+0xb46/0x1b50 kernel/workqueue.c:2120
2 locks held by kworker/1:5/15372:
  #0: 0000000053ed24fb ((wq_completion)"events"){+.+.}, at:  
__write_once_size include/linux/compiler.h:215 [inline]
  #0: 0000000053ed24fb ((wq_completion)"events"){+.+.}, at:  
arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
  #0: 0000000053ed24fb ((wq_completion)"events"){+.+.}, at: atomic64_set  
include/asm-generic/atomic-instrumented.h:40 [inline]
  #0: 0000000053ed24fb ((wq_completion)"events"){+.+.}, at: atomic_long_set  
include/asm-generic/atomic-long.h:57 [inline]
  #0: 0000000053ed24fb ((wq_completion)"events"){+.+.}, at: set_work_data  
kernel/workqueue.c:617 [inline]
  #0: 0000000053ed24fb ((wq_completion)"events"){+.+.}, at:  
set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
  #0: 0000000053ed24fb ((wq_completion)"events"){+.+.}, at:  
process_one_work+0xaef/0x1b50 kernel/workqueue.c:2116
  #1: 00000000c29cd536 ((work_completion)(&smc->tcp_listen_work)){+.+.}, at:  
process_one_work+0xb46/0x1b50 kernel/workqueue.c:2120
1 lock held by syz-executor5/18174:
  #0: 00000000fe93fbb2 (sk_lock-AF_INET6){+.+.}, at: lock_sock  
include/net/sock.h:1474 [inline]
  #0: 00000000fe93fbb2 (sk_lock-AF_INET6){+.+.}, at:  
tls_sw_sendmsg+0x1b9/0x12b0 net/tls/tls_sw.c:384

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 894 Comm: khungtaskd Not tainted 4.17.0-rc3+ #33
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
  trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
  check_hung_task kernel/hung_task.c:132 [inline]
  check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
  watchdog+0xc10/0xf60 kernel/hung_task.c:249
  kthread+0x345/0x410 kernel/kthread.c:238
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.17.0-rc3+ #33
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:activate_task+0x0/0x2e0 kernel/sched/core.c:704
RSP: 0018:ffff8801dae07570 EFLAGS: 00000007
RAX: 0000000000000000 RBX: ffff8801d9a9e240 RCX: ffff8801dae07728
RDX: 0000000000000009 RSI: ffff8801d9a9e240 RDI: ffff8801dae2c680
RBP: ffff8801dae07598 R08: ffff88021fff8018 R09: 0000000000000000
R10: ffffed0043fff001 R11: ffff88021fff8017 R12: ffff8801dae2c680
R13: 0000000000000000 R14: ffff8801dae07728 R15: ffff8801dae07728
FS:  0000000000000000(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600400 CR3: 00000001b5403000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  <IRQ>
  ttwu_queue kernel/sched/core.c:1840 [inline]
  try_to_wake_up+0x870/0x1190 kernel/sched/core.c:2053
  wake_up_process+0x10/0x20 kernel/sched/core.c:2126
  process_timeout+0x31/0x40 kernel/time/timer.c:1730
  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
  expire_timers kernel/time/timer.c:1363 [inline]
  __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
  invoke_softirq kernel/softirq.c:365 [inline]
  irq_exit+0x1d1/0x200 kernel/softirq.c:405
  exiting_irq arch/x86/include/asm/apic.h:525 [inline]
  smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
  </IRQ>
RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:54
RSP: 0018:ffffffff88c07bc0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
RAX: dffffc0000000000 RBX: 1ffffffff1180f7b RCX: 0000000000000000
RDX: 1ffffffff11a3170 RSI: 0000000000000001 RDI: ffffffff88d18b80
RBP: ffffffff88c07bc0 R08: ffffed003b5c46c3 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff88c07c78 R14: ffffffff897c2260 R15: 0000000000000000
  arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
  default_idle+0xc2/0x440 arch/x86/kernel/process.c:354
  arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:345
  default_idle_call+0x6d/0x90 kernel/sched/idle.c:93
  cpuidle_idle_call kernel/sched/idle.c:153 [inline]
  do_idle+0x395/0x560 kernel/sched/idle.c:262
  cpu_startup_entry+0x104/0x120 kernel/sched/idle.c:368
  rest_init+0xe1/0xe4 init/main.c:441
  start_kernel+0x906/0x92d init/main.c:737
  x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:445
  x86_64_start_kernel+0x76/0x79 arch/x86/kernel/head64.c:426
  secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242
Code: 89 45 d0 e8 33 d1 63 00 48 8b 4d c8 4c 8b 45 d0 e9 2f ff ff ff 66 0f  
1f 44 00 00 55 48 89 e5 e8 a7 51 ff ff 5d c3 0f 1f 44 00 00 <48> b8 00 00  
00 00 00 fc ff df 55 48 89 e5 41 56 41 55 4c 8d 6e


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is  
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.
Note: all commands must start from beginning of the line in the email body.

^ permalink raw reply

* Re: [bpf-next v2 8/9] bpf: Provide helper to do forwarding lookups in kernel FIB table
From: David Miller @ 2018-05-07 15:36 UTC (permalink / raw)
  To: dsahern
  Cc: daniel, brouer, netdev, borkmann, ast, shm, roopa, toke,
	john.fastabend
In-Reply-To: <4f3037fb-cf76-784f-bc7c-55e6e69104e9@gmail.com>

From: David Ahern <dsahern@gmail.com>
Date: Mon, 7 May 2018 08:26:47 -0600

> On 5/7/18 8:10 AM, Daniel Borkmann wrote:
>> On 05/07/2018 03:35 PM, Jesper Dangaard Brouer wrote:
>>> On Thu,  3 May 2018 19:54:31 -0700 David Ahern <dsahern@gmail.com> wrote:
>>>
>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>> index 6877426c23a6..cf0d27acf1d1 100644
>>>> --- a/net/core/filter.c
>>>> +++ b/net/core/filter.c
>>> [...]
>>>> +static const struct bpf_func_proto bpf_xdp_fib_lookup_proto = {
>>>> +	.func		= bpf_xdp_fib_lookup,
>>>> +	.gpl_only	= true,
>>>
>>> Is it a deliberate choice to require BPF-progs using this helper to be
>>> GPL licensed?
>>>
>>> Asking as this seems to be the first network related helper with this
>>> requirement, while this is typical for tracing related helpers.
>> 
>> Good point, we should remove that. In networking it's only the perf event
>> output helpers tying into tracing bits. After all, if you do a route lookup
>> via netlink from user space there's no such restriction at all.
>> 
> 
> Networking symbols are typically exported GPL for modules. The person
> writing the code and exporting GPL is specifying a desire that only GPL
> licensed modules can link to the symbol.
> 
> Given the common analogy of modules and bpf programs, why can't a writer
> of a bpf helper specify a preference that only GPL licensed programs
> leverage a BPF helper?

I also think that for this particular set of helpers GPL is appropriate.

Yes, via netlink the same lookup can happen, but not with the same level
of performance and microsecond tuning we've done over the years on this
sophisticated trie lookup code.

Therefore, I think David's choice is very appropriate.

^ permalink raw reply

* Re: [PATCH v2 net-next 2/4] net: add skeleton of bpfilter kernel module
From: Harald Welte @ 2018-05-07 15:24 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, daniel, torvalds, gregkh, luto, netdev, linux-kernel,
	kernel-team
In-Reply-To: <20180503043604.1604587-3-ast@kernel.org>

Hi Alexei + netdev list,

On Wed, May 02, 2018 at 09:36:02PM -0700, Alexei Starovoitov wrote:
> Later bpfilter_process_sockopt() will be called from bpfilter hooks
> in get/setsockopt() to pass iptable commands into umh via bpfilter.ko

This is a part I'm quite heavily opposed to - at least at this point.
Unless bpfilter offered something that is semantically compatible to
what netfilter/iptables is currently implementing, I don't think
bpfilter should be [allowed to] overriding the iptables
{get,set}sockopt() calls.

I appreciate that people are working on a different architecture packet
filter than what we used to.   I also understand that there is a need
for backwards compatibility.  I still think it's wrong to offer that
compatibility on the {set,get}sockopt level, rather than on the
"iptables command line utility replacement" level.  But nevermind, you
guys have a different opinion on that, on which we can agree to
disagree.

However, no matter what you do, the most important part from the user
point of view is to make sure you don't break semantics.

netfilter/iptables semantics have an intricate notion abut when which
chain of which table is executed, in which order, at what particular
point of the packet traversal during the network stack.  The packet
filtering rulesets that people have created over more than 18 years
are based on those semantics.  If you offer the same interface, but not
that very same semantics, the packet filtering policies can an will
break - and they will break so in a hidden way.  To the user, it appears
as if the ruleset is loaded with the assumed semantics, but in reality
it isn't.

Unless you can replicate those semantics 1:1, I think it is not only
wrong to override the iptables sockopt interface, but it's outright
dangerous.

Having less matches/targets implemented than original iptables is
something that I believe is acceptable (and inevitable, at least in the
beginning).  If somebody tries to load a related ruleset with bpfilter
active, it will fail gracefully and the user can chose to not use that
match/target in his ruleset, or to not use bpfilter.

But if the ruleset loads but behaves different than before (because e.g.
it's executed from a completely different place in the stack), that's
IMHO an absolute no-go that must be avoided at all cost.  If that's the
case, you are actively breaking network security, rather than creating
it.

So I think there's only two ways to go:

a) replicate the exact semantics/order of the filter/mangle/raw/...
   tables and their chains, both among themselves as well as in terms of
   ordering with other parts of the network stack, or

b) not use the existing tables/chains with their pre-defined semantics
   but rather start new 'tables' which can then have different semantics
   as defined at the time of their implementation.

My apologies if I misunderstood something about bpfilter.  Feel free to
correct me where I'm wrong.  Thanks.

Regards,
	Harald
-- 
- Harald Welte <laforge@gnumonks.org>           http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)

^ permalink raw reply

* Re: The SO_BINDTODEVICE was set to the desired interface, but packets are received from all interfaces.
From: David Ahern @ 2018-05-07 15:23 UTC (permalink / raw)
  To: Paolo Abeni, Damir Mansurov, netdev
  Cc: Konstantin Ushakov, Alexandra N. Kossovsky, Andrey Dmitrov
In-Reply-To: <1525696890.2587.24.camel@redhat.com>

On 5/7/18 6:41 AM, Paolo Abeni wrote:
> Hi,
> On Mon, 2018-05-07 at 13:19 +0300, Damir Mansurov wrote:
>> After successful call of the setsockopt(SO_BINDTODEVICE) function to set 
>> data reception from only one interface, the data is still received from 
>> all interfaces. Function setsockopt() returns 0 but then recv() receives 
>> data from all available network interfaces.
>>
>> The problem is reproducible on linux kernels 4.14 - 4.16, but it does 
>> not on linux kernels 4.4, 4.13.
> 
> I think that the cause is commit:
> 
> commit fb74c27735f0a34e76dbf1972084e984ad2ea145
> Author: David Ahern <dsahern@gmail.com>
> Date:   Mon Aug 7 08:44:16 2017 -0700
> 
>     net: ipv4: add second dif to udp socket lookups
> 
> Something like the following should fix, but I'm unsure it preserves
> the intended semathics for 'sdif'. David, can you please have a look?
> Thanks!
> 
> Paolo
> ---
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index dd3102a37ef9..0d593d5c33cf 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -401,9 +401,9 @@ static int compute_score(struct sock *sk, struct net *net,
>  		bool dev_match = (sk->sk_bound_dev_if == dif ||
>  				  sk->sk_bound_dev_if == sdif);
>  
> -		if (exact_dif && !dev_match)
> +		if (!dev_match)
>  			return -1;
> -		if (sk->sk_bound_dev_if && dev_match)
> +		if (sk->sk_bound_dev_if)
>  			score += 4;
>  	}
>  
> 

yes, that does look like a mistake -- no match on sk_bound_dev_if should
fail the lookup.

Let me apply the diff and run my vrf tests to make sure they still work.

^ permalink raw reply

* [PATCH v2] net: dsa: drop some VLAs in switch.c
From: Salvatore Mesoraca @ 2018-05-07 15:23 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: linux-kernel, kernel-hardening, netdev, David S. Miller,
	Florian Fainelli, Kees Cook, Salvatore Mesoraca, Vivien Didelot,
	David Laight

We avoid 2 VLAs by using a pre-allocated field in dsa_switch.
We also try to avoid dynamic allocation whenever possible.

Link: http://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
Link: http://lkml.kernel.org/r/20180505185145.GB32630@lunn.ch

Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>
---
 include/net/dsa.h |  3 +++
 net/dsa/dsa2.c    | 14 ++++++++++++++
 net/dsa/switch.c  | 22 ++++++++++------------
 3 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 60fb4ec..576791d 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -256,6 +256,9 @@ struct dsa_switch {
 	/* Number of switch port queues */
 	unsigned int		num_tx_queues;
 
+	unsigned long		*bitmap;
+	unsigned long		_bitmap;
+
 	/* Dynamically allocated ports, keep last */
 	size_t num_ports;
 	struct dsa_port ports[];
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index adf50fb..cebf35f0 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -748,6 +748,20 @@ struct dsa_switch *dsa_switch_alloc(struct device *dev, size_t n)
 	if (!ds)
 		return NULL;
 
+	/* We avoid allocating memory outside dsa_switch
+	 * if it is not needed.
+	 */
+	if (n <= sizeof(ds->_bitmap) * 8) {
+		ds->bitmap = &ds->_bitmap;
+	} else {
+		ds->bitmap = devm_kzalloc(dev,
+					  BITS_TO_LONGS(n) *
+						sizeof(unsigned long),
+					  GFP_KERNEL);
+		if (unlikely(!ds->bitmap))
+			return NULL;
+	}
+
 	ds->dev = dev;
 	ds->num_ports = n;
 
diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index b935117..142b294 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -136,21 +136,20 @@ static int dsa_switch_mdb_add(struct dsa_switch *ds,
 {
 	const struct switchdev_obj_port_mdb *mdb = info->mdb;
 	struct switchdev_trans *trans = info->trans;
-	DECLARE_BITMAP(group, ds->num_ports);
 	int port;
 
 	/* Build a mask of Multicast group members */
-	bitmap_zero(group, ds->num_ports);
+	bitmap_zero(ds->bitmap, ds->num_ports);
 	if (ds->index == info->sw_index)
-		set_bit(info->port, group);
+		set_bit(info->port, ds->bitmap);
 	for (port = 0; port < ds->num_ports; port++)
 		if (dsa_is_dsa_port(ds, port))
-			set_bit(port, group);
+			set_bit(port, ds->bitmap);
 
 	if (switchdev_trans_ph_prepare(trans))
-		return dsa_switch_mdb_prepare_bitmap(ds, mdb, group);
+		return dsa_switch_mdb_prepare_bitmap(ds, mdb, ds->bitmap);
 
-	dsa_switch_mdb_add_bitmap(ds, mdb, group);
+	dsa_switch_mdb_add_bitmap(ds, mdb, ds->bitmap);
 
 	return 0;
 }
@@ -204,21 +203,20 @@ static int dsa_switch_vlan_add(struct dsa_switch *ds,
 {
 	const struct switchdev_obj_port_vlan *vlan = info->vlan;
 	struct switchdev_trans *trans = info->trans;
-	DECLARE_BITMAP(members, ds->num_ports);
 	int port;
 
 	/* Build a mask of VLAN members */
-	bitmap_zero(members, ds->num_ports);
+	bitmap_zero(ds->bitmap, ds->num_ports);
 	if (ds->index == info->sw_index)
-		set_bit(info->port, members);
+		set_bit(info->port, ds->bitmap);
 	for (port = 0; port < ds->num_ports; port++)
 		if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))
-			set_bit(port, members);
+			set_bit(port, ds->bitmap);
 
 	if (switchdev_trans_ph_prepare(trans))
-		return dsa_switch_vlan_prepare_bitmap(ds, vlan, members);
+		return dsa_switch_vlan_prepare_bitmap(ds, vlan, ds->bitmap);
 
-	dsa_switch_vlan_add_bitmap(ds, vlan, members);
+	dsa_switch_vlan_add_bitmap(ds, vlan, ds->bitmap);
 
 	return 0;
 }
-- 
1.9.1

^ permalink raw reply related

* pull request: bluetooth 2018-05-07
From: Johan Hedberg @ 2018-05-07 15:05 UTC (permalink / raw)
  To: davem; +Cc: linux-bluetooth, netdev

[-- Attachment #1: Type: text/plain, Size: 1189 bytes --]

Hi Dave,

Here are a few more Bluetooth fixes for the 4.17 kernel, all for the
btusb driver. Two relate to the needs_reset_resume table, and one is a
revert of a patch for Atheros 1525/QCA6174 which caused a regression for
some people.

Please let me know if there are any issues pulling. Thanks.

Johan

---
The following changes since commit 2cb5fb1454ef4990f44f3070226ee29201bd5c87:

  MAINTAINERS: add myself as SCTP co-maintainer (2018-04-29 22:49:45 -0400)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth.git for-upstream

for you to fetch changes up to 596b07a9a22656493726edf1739569102bd3e136:

  Bluetooth: btusb: Add Dell XPS 13 9360 to btusb_needs_reset_resume_table (2018-04-30 10:56:04 +0200)

----------------------------------------------------------------
Hans de Goede (3):
      Revert "Bluetooth: btusb: Fix quirk for Atheros 1525/QCA6174"
      Bluetooth: btusb: Only check needs_reset_resume DMI table for QCA rome chipsets
      Bluetooth: btusb: Add Dell XPS 13 9360 to btusb_needs_reset_resume_table

 drivers/bluetooth/btusb.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: Locking in network code
From: Stephen Hemminger @ 2018-05-07 14:48 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Jacob S. Moroni, Netdev
In-Reply-To: <CAKgT0UddSGs-d0cbQV4YN8RLEqa478C7eG3HNFf1Y-yivWPUFw@mail.gmail.com>

On Sun, 6 May 2018 09:16:26 -0700
Alexander Duyck <alexander.duyck@gmail.com> wrote:

> On Sun, May 6, 2018 at 6:43 AM, Jacob S. Moroni <mail@jakemoroni.com> wrote:
> > Hello,
> >
> > I have a stupid question regarding which variant of spin_lock to use
> > throughout the network stack, and inside RX handlers specifically.
> >
> > It's my understanding that skbuffs are normally passed into the stack
> > from soft IRQ context if the device is using NAPI, and hard IRQ
> > context if it's not using NAPI (and I guess process context too if the
> > driver does it's own workqueue thing).
> >
> > So, that means that handlers registered with netdev_rx_handler_register
> > may end up being called from any context.  
> 
> I am pretty sure the Rx handlers are all called from softirq context.
> The hard IRQ will just call netif_rx which will queue the packet up to
> be handles in the soft IRQ later.

The only exception is the netpoll code which runs stack in hardirq context.

> > However, the RX handler in the macvlan code calls ip_check_defrag,
> > which could eventually lead to a call to ip_defrag, which ends
> > up taking a regular spin_lock around the call to ip_frag_queue.
> >
> > Is this a risk of deadlock, and if not, why?
> >
> > What if you're running a system with one CPU and a packet fragment
> > arrives on a NAPI interface, then, while the spin_lock is held,
> > another fragment somehow arrives on another interface which does
> > its processing in hard IRQ context?
> >
> > --
> >   Jacob S. Moroni
> >   mail@jakemoroni.com  
> 
> Take a look at the netif_rx code and it should answer most of your
> questions. Basically everything is handed off from the hard IRQ to the
> soft IRQ via a backlog queue and then handled in net_rx_action.
> 
> - Alex

^ permalink raw reply

* Re: [iproute2-next  1/1] tipc: Add support to set and get MTU for UDP bearer
From: Stephen Hemminger @ 2018-05-07 14:46 UTC (permalink / raw)
  To: GhantaKrishnamurthy MohanKrishna
  Cc: tipc-discussion, jon.maloy, maloy, ying.xue, netdev, davem,
	dsahern
In-Reply-To: <1525691673-22966-1-git-send-email-mohan.krishna.ghanta.krishnamurthy@ericsson.com>

On Mon, 7 May 2018 13:14:33 +0200
GhantaKrishnamurthy MohanKrishna         <mohan.krishna.ghanta.krishnamurthy@ericsson.com> wrote:

> +				struct opt *opts)
> +{
> +	struct opt *opt;
> +
> +	if (!(opt = get_opt(opts, "media"))) {

You don't need to have assignment in conditional context in this case.
Please split the assignment and the if.

^ permalink raw reply

* Re: [PATCH iproute2] rdma: fix header files
From: Stephen Hemminger @ 2018-05-07 14:45 UTC (permalink / raw)
  To: David Ahern; +Cc: swise, netdev
In-Reply-To: <424b2502-e79e-fad7-255d-37a4ddd24068@gmail.com>

On Sat, 5 May 2018 09:17:31 -0600
David Ahern <dsahern@gmail.com> wrote:

> On 5/4/18 10:58 PM, Stephen Hemminger wrote:
> > On Fri, 4 May 2018 16:13:07 -0600
> > David Ahern <dsahern@gmail.com> wrote:
> >   
> >> On 5/4/18 3:56 PM, Stephen Hemminger wrote:  
> >>> All user api headers in iproute2 should be in include/uapi
> >>> so that script can be used to put correct sanitized kernel headers
> >>> there. And the header files for rdma must be a complete set; if one
> >>> header file includes another, all must be present.
> >>>
> >>> This fixes build on older distributions, and Windows Services
> >>> for Linux.
> >>>
> >>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >>> ---
> >>>  include/uapi/rdma/ib_user_sa.h                |   77 ++
> >>>  include/uapi/rdma/ib_user_verbs.h             | 1210 +++++++++++++++++
> >>>  .../uapi/rdma/rdma_netlink.h                  |   13 +
> >>>  .../uapi/rdma/rdma_user_cm.h                  |    6 +-
> >>>  4 files changed, 1303 insertions(+), 3 deletions(-)
> >>>  create mode 100644 include/uapi/rdma/ib_user_sa.h
> >>>  create mode 100644 include/uapi/rdma/ib_user_verbs.h
> >>>  rename {rdma/include => include}/uapi/rdma/rdma_netlink.h (95%)
> >>>  rename {rdma/include => include}/uapi/rdma/rdma_user_cm.h (98%)
> >>>     
> >>
> >> Stephen:
> >>
> >> Per a recent discussion the RDMA folks need to take ownership of the
> >> uapi files. RDMA features do not hit Dave's net-next tree so the rdma
> >> code can never hit iproute2-next during a dev cycle.  
> > 
> > I want all uapi headers in include/uapi because it avoids possible overlap problems,
> > During the linux-net/linus release cycle they should match what is Linus's tree.
> > 
> > During the net-next they can come from two sources.
> >   
> 
> That creates extra work for me for no reason.
> 
> You state above "user api headers in iproute2 should be in include/uapi
> so that script can be used to put correct sanitized kernel headers there."
> 
> With RDMA's development cycle that will *never* happen. With the
> exception of RDMA, all iproute2 features go through net-next and it is
> the right tree to pull updates from. Every time I sync from net-next the
> header files for rdma will have to be ignored, so they will never be
> updated through this mechanism which means the stated goal is not
> achievable.
> 
> As for linux-next, I will not sync header files to a tree that
> disappears; it breaks all traceability. Further, it seems to me that it
> does not really solve the problem. I forget the steps now but RDMA
> features have to hit some development tree before going to Linus, so
> there will be a delay with the headers.
> 
> Back in March we discussed options. iproute2 is nothing more than a
> delivery vehicle for rdmatool. Since it breaks everything else about the
> iproute2 and net-next association, the simplest option for everyone is
> for the rdma group to control syncing their own headers and putting them
> under rdma directory.

There are three different issues here:
1. What directory should rdma files be in. It really doesn't matter that much
   so lets keep them in rdma/incude/uapi.
2. For current master branch the set of include files does not match Linus's current
   tree.
3. The header files for rdma do not include all the referenced headers. My rule has
   been if foo.c includes uapi/foo.h and foo.h includes bar.h then bar.h must also
   be in the include/uapi directory.

I will spin a new patch addressing #2 and #3.

^ permalink raw reply

* [net-next 5/6] fm10k: warn if the stat size is unknown
From: Jeff Kirsher @ 2018-05-07 14:45 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180507144521.6979-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
index 0565d6f795e5..2bb7b71cf460 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
@@ -247,6 +247,8 @@ static void __fm10k_add_ethtool_stats(u64 **data, void *pointer,
 			*((*data)++) = *(u8 *)p;
 			break;
 		default:
+			WARN_ONCE(1, "unexpected stat size for %s",
+				  stats[i].stat_string);
 			*((*data)++) = 0;
 		}
 	}
-- 
2.14.3

^ permalink raw reply related

* [net-next 4/6] fm10k: use macro to avoid passing the array and size separately
From: Jeff Kirsher @ 2018-05-07 14:45 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180507144521.6979-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Avoid potential bugs with fm10k_add_stat_strings and
fm10k_add_ethtool_stats by using a macro to calculate the ARRAY_SIZE
when passing. This helps ensure that the size is always correct.

Note that it assumes we only pass static const fm10k_stat arrays, and
that evaluation of the argument won't have side effects.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 48 +++++++++++-------------
 1 file changed, 21 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
index fa7c026fe874..0565d6f795e5 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
@@ -134,8 +134,8 @@ enum {
 static const char fm10k_prv_flags[FM10K_PRV_FLAG_LEN][ETH_GSTRING_LEN] = {
 };
 
-static void fm10k_add_stat_strings(u8 **p, const struct fm10k_stats stats[],
-				   const unsigned int size, ...)
+static void __fm10k_add_stat_strings(u8 **p, const struct fm10k_stats stats[],
+				     const unsigned int size, ...)
 {
 	unsigned int i;
 
@@ -149,31 +149,28 @@ static void fm10k_add_stat_strings(u8 **p, const struct fm10k_stats stats[],
 	}
 }
 
+#define fm10k_add_stat_strings(p, stats, ...) \
+	__fm10k_add_stat_strings(p, stats, ARRAY_SIZE(stats), ## __VA_ARGS__)
+
 static void fm10k_get_stat_strings(struct net_device *dev, u8 *data)
 {
 	struct fm10k_intfc *interface = netdev_priv(dev);
 	unsigned int i;
 
-	fm10k_add_stat_strings(&data, fm10k_gstrings_net_stats,
-			       FM10K_NETDEV_STATS_LEN);
+	fm10k_add_stat_strings(&data, fm10k_gstrings_net_stats);
 
-	fm10k_add_stat_strings(&data, fm10k_gstrings_global_stats,
-			       FM10K_GLOBAL_STATS_LEN);
+	fm10k_add_stat_strings(&data, fm10k_gstrings_global_stats);
 
-	fm10k_add_stat_strings(&data, fm10k_gstrings_mbx_stats,
-			       FM10K_MBX_STATS_LEN);
+	fm10k_add_stat_strings(&data, fm10k_gstrings_mbx_stats);
 
 	if (interface->hw.mac.type != fm10k_mac_vf)
-		fm10k_add_stat_strings(&data, fm10k_gstrings_pf_stats,
-				       FM10K_PF_STATS_LEN);
+		fm10k_add_stat_strings(&data, fm10k_gstrings_pf_stats);
 
 	for (i = 0; i < interface->hw.mac.max_queues; i++) {
 		fm10k_add_stat_strings(&data, fm10k_gstrings_queue_stats,
-				       FM10K_QUEUE_STATS_LEN,
 				       "tx", i);
 
 		fm10k_add_stat_strings(&data, fm10k_gstrings_queue_stats,
-				       FM10K_QUEUE_STATS_LEN,
 				       "rx", i);
 	}
 }
@@ -219,9 +216,9 @@ static int fm10k_get_sset_count(struct net_device *dev, int sset)
 	}
 }
 
-static void fm10k_add_ethtool_stats(u64 **data, void *pointer,
-				    const struct fm10k_stats stats[],
-				    const unsigned int size)
+static void __fm10k_add_ethtool_stats(u64 **data, void *pointer,
+				      const struct fm10k_stats stats[],
+				      const unsigned int size)
 {
 	unsigned int i;
 	char *p;
@@ -255,6 +252,9 @@ static void fm10k_add_ethtool_stats(u64 **data, void *pointer,
 	}
 }
 
+#define fm10k_add_ethtool_stats(data, pointer, stats) \
+	__fm10k_add_ethtool_stats(data, pointer, stats, ARRAY_SIZE(stats))
+
 static void fm10k_get_ethtool_stats(struct net_device *netdev,
 				    struct ethtool_stats __always_unused *stats,
 				    u64 *data)
@@ -265,20 +265,16 @@ static void fm10k_get_ethtool_stats(struct net_device *netdev,
 
 	fm10k_update_stats(interface);
 
-	fm10k_add_ethtool_stats(&data, net_stats, fm10k_gstrings_net_stats,
-				FM10K_NETDEV_STATS_LEN);
+	fm10k_add_ethtool_stats(&data, net_stats, fm10k_gstrings_net_stats);
 
-	fm10k_add_ethtool_stats(&data, interface, fm10k_gstrings_global_stats,
-				FM10K_GLOBAL_STATS_LEN);
+	fm10k_add_ethtool_stats(&data, interface, fm10k_gstrings_global_stats);
 
 	fm10k_add_ethtool_stats(&data, &interface->hw.mbx,
-				fm10k_gstrings_mbx_stats,
-				FM10K_MBX_STATS_LEN);
+				fm10k_gstrings_mbx_stats);
 
 	if (interface->hw.mac.type != fm10k_mac_vf) {
 		fm10k_add_ethtool_stats(&data, interface,
-					fm10k_gstrings_pf_stats,
-					FM10K_PF_STATS_LEN);
+					fm10k_gstrings_pf_stats);
 	}
 
 	for (i = 0; i < interface->hw.mac.max_queues; i++) {
@@ -286,13 +282,11 @@ static void fm10k_get_ethtool_stats(struct net_device *netdev,
 
 		ring = interface->tx_ring[i];
 		fm10k_add_ethtool_stats(&data, ring,
-					fm10k_gstrings_queue_stats,
-					FM10K_QUEUE_STATS_LEN);
+					fm10k_gstrings_queue_stats);
 
 		ring = interface->rx_ring[i];
 		fm10k_add_ethtool_stats(&data, ring,
-					fm10k_gstrings_queue_stats,
-					FM10K_QUEUE_STATS_LEN);
+					fm10k_gstrings_queue_stats);
 	}
 }
 
-- 
2.14.3

^ permalink raw reply related

* [net-next 6/6] fm10k: don't protect fm10k_queue_mac_request by fm10k_host_mbx_ready
From: Jeff Kirsher @ 2018-05-07 14:45 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180507144521.6979-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

We don't actually need to check if the host mbx is ready when queuing
MAC requests, because these are not handled by a special queue which
queues up requests until the mailbox is capable of handling them.

Pull these requests outside the fm10k_host_mbx_ready() check, as it is
not necessary.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 0dc9f2dbc1ad..929f538d28bc 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -1537,12 +1537,12 @@ static void *fm10k_dfwd_add_station(struct net_device *dev,
 
 	glort = l2_accel->dglort + 1 + i;
 
-	if (fm10k_host_mbx_ready(interface)) {
+	if (fm10k_host_mbx_ready(interface))
 		hw->mac.ops.update_xcast_mode(hw, glort,
 					      FM10K_XCAST_MODE_NONE);
-		fm10k_queue_mac_request(interface, glort, sdev->dev_addr,
-					hw->mac.default_vid, true);
-	}
+
+	fm10k_queue_mac_request(interface, glort, sdev->dev_addr,
+				hw->mac.default_vid, true);
 
 	for (vid = fm10k_find_next_vlan(interface, 0);
 	     vid < VLAN_N_VID;
@@ -1583,12 +1583,12 @@ static void fm10k_dfwd_del_station(struct net_device *dev, void *priv)
 
 	glort = l2_accel->dglort + 1 + i;
 
-	if (fm10k_host_mbx_ready(interface)) {
+	if (fm10k_host_mbx_ready(interface))
 		hw->mac.ops.update_xcast_mode(hw, glort,
 					      FM10K_XCAST_MODE_NONE);
-		fm10k_queue_mac_request(interface, glort, sdev->dev_addr,
-					hw->mac.default_vid, false);
-	}
+
+	fm10k_queue_mac_request(interface, glort, sdev->dev_addr,
+				hw->mac.default_vid, false);
 
 	for (vid = fm10k_find_next_vlan(interface, 0);
 	     vid < VLAN_N_VID;
-- 
2.14.3

^ permalink raw reply related

* [net-next 3/6] fm10k: use variadic arguments to fm10k_add_stat_strings
From: Jeff Kirsher @ 2018-05-07 14:45 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180507144521.6979-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Instead of using a fixed prefix string we setup before each call to
fm10k_add_stat_strings, modify the helper to take variadic arguments and
pass them to vsnprintf. This requires changing the fm10k_stat strings to
take % format specifiers where necessary, but the resulting code is much
simpler.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 53 ++++++++++++------------
 1 file changed, 27 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
index f4cad9ffdc79..fa7c026fe874 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
@@ -6,6 +6,11 @@
 #include "fm10k.h"
 
 struct fm10k_stats {
+	/* The stat_string is expected to be a format string formatted using
+	 * vsnprintf by fm10k_add_stat_strings. Every member of a stats array
+	 * should use the same format specifiers as they will be formatted
+	 * using the same variadic arguments.
+	 */
 	char stat_string[ETH_GSTRING_LEN];
 	int sizeof_stat;
 	int stat_offset;
@@ -93,15 +98,13 @@ static const struct fm10k_stats fm10k_gstrings_mbx_stats[] = {
 	FM10K_MBX_STAT("mbx_rx_mbmem_pushed", rx_mbmem_pushed),
 };
 
-#define FM10K_QUEUE_STAT(_name, _stat) { \
-	.stat_string = _name, \
-	.sizeof_stat = FIELD_SIZEOF(struct fm10k_ring, _stat), \
-	.stat_offset = offsetof(struct fm10k_ring, _stat) \
-}
+/* per-queue ring statistics */
+#define FM10K_QUEUE_STAT(_name, _stat) \
+	FM10K_STAT_FIELDS(struct fm10k_ring, _name, _stat)
 
 static const struct fm10k_stats fm10k_gstrings_queue_stats[] = {
-	FM10K_QUEUE_STAT("packets", stats.packets),
-	FM10K_QUEUE_STAT("bytes", stats.bytes),
+	FM10K_QUEUE_STAT("%s_queue_%u_packets", stats.packets),
+	FM10K_QUEUE_STAT("%s_queue_%u_bytes", stats.bytes),
 };
 
 #define FM10K_GLOBAL_STATS_LEN ARRAY_SIZE(fm10k_gstrings_global_stats)
@@ -131,16 +134,18 @@ enum {
 static const char fm10k_prv_flags[FM10K_PRV_FLAG_LEN][ETH_GSTRING_LEN] = {
 };
 
-static void fm10k_add_stat_strings(u8 **p, const char *prefix,
-				   const struct fm10k_stats stats[],
-				   const unsigned int size)
+static void fm10k_add_stat_strings(u8 **p, const struct fm10k_stats stats[],
+				   const unsigned int size, ...)
 {
 	unsigned int i;
 
 	for (i = 0; i < size; i++) {
-		snprintf(*p, ETH_GSTRING_LEN, "%s%s",
-			 prefix, stats[i].stat_string);
+		va_list args;
+
+		va_start(args, size);
+		vsnprintf(*p, ETH_GSTRING_LEN, stats[i].stat_string, args);
 		*p += ETH_GSTRING_LEN;
+		va_end(args);
 	}
 }
 
@@ -149,31 +154,27 @@ static void fm10k_get_stat_strings(struct net_device *dev, u8 *data)
 	struct fm10k_intfc *interface = netdev_priv(dev);
 	unsigned int i;
 
-	fm10k_add_stat_strings(&data, "", fm10k_gstrings_net_stats,
+	fm10k_add_stat_strings(&data, fm10k_gstrings_net_stats,
 			       FM10K_NETDEV_STATS_LEN);
 
-	fm10k_add_stat_strings(&data, "", fm10k_gstrings_global_stats,
+	fm10k_add_stat_strings(&data, fm10k_gstrings_global_stats,
 			       FM10K_GLOBAL_STATS_LEN);
 
-	fm10k_add_stat_strings(&data, "", fm10k_gstrings_mbx_stats,
+	fm10k_add_stat_strings(&data, fm10k_gstrings_mbx_stats,
 			       FM10K_MBX_STATS_LEN);
 
 	if (interface->hw.mac.type != fm10k_mac_vf)
-		fm10k_add_stat_strings(&data, "", fm10k_gstrings_pf_stats,
+		fm10k_add_stat_strings(&data, fm10k_gstrings_pf_stats,
 				       FM10K_PF_STATS_LEN);
 
 	for (i = 0; i < interface->hw.mac.max_queues; i++) {
-		char prefix[ETH_GSTRING_LEN];
-
-		snprintf(prefix, ETH_GSTRING_LEN, "tx_queue_%u_", i);
-		fm10k_add_stat_strings(&data, prefix,
-				       fm10k_gstrings_queue_stats,
-				       FM10K_QUEUE_STATS_LEN);
+		fm10k_add_stat_strings(&data, fm10k_gstrings_queue_stats,
+				       FM10K_QUEUE_STATS_LEN,
+				       "tx", i);
 
-		snprintf(prefix, ETH_GSTRING_LEN, "rx_queue_%u_", i);
-		fm10k_add_stat_strings(&data, prefix,
-				       fm10k_gstrings_queue_stats,
-				       FM10K_QUEUE_STATS_LEN);
+		fm10k_add_stat_strings(&data, fm10k_gstrings_queue_stats,
+				       FM10K_QUEUE_STATS_LEN,
+				       "rx", i);
 	}
 }
 
-- 
2.14.3

^ permalink raw reply related

* [net-next 1/6] fm10k: setup VLANs for l2 accelerated macvlan interfaces
From: Jeff Kirsher @ 2018-05-07 14:45 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180507144521.6979-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

We have support for accelerating macvlan devices via the
.ndo_dfwd_add_station() netdev op. These accelerated macvlan MAC
addresses are stored in the l2_accel structure, separate from the
unicast or multicast address lists.

If a VLAN is added on top of the macvlan device by the stack, traffic
will not properly flow to the macvlan. This occurs because we fail to
setup the VLANs for l2_accel MAC addresses.

In the non-offloaded case the MAC address is added to the unicast
address list, and thus the normal setup for enabling VLANs works as
expected.

We also need to add VLANs marked from .ndo_vlan_rx_add_vid() into the
l2_accel MAC addresses. Otherwise, VLAN traffic will not properly be
received by the VLAN devices attached to the offloaded macvlan devices.

Fix this by adding necessary logic to setup VLANs not only for the
unicast and multicast addresses, but also the l2_accel list. We need
similar logic in dfwd_add_station, dfwd_del_station, fm10k_update_vid,
and fm10k_restore_rx_state.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 50 ++++++++++++++++++++++++-
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index c879af72bbf5..0dc9f2dbc1ad 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -907,7 +907,9 @@ static int fm10k_mc_vlan_unsync(struct net_device *netdev,
 static int fm10k_update_vid(struct net_device *netdev, u16 vid, bool set)
 {
 	struct fm10k_intfc *interface = netdev_priv(netdev);
+	struct fm10k_l2_accel *l2_accel = interface->l2_accel;
 	struct fm10k_hw *hw = &interface->hw;
+	u16 glort;
 	s32 err;
 	int i;
 
@@ -975,6 +977,22 @@ static int fm10k_update_vid(struct net_device *netdev, u16 vid, bool set)
 	if (err)
 		goto err_out;
 
+	/* Update L2 accelerated macvlan addresses */
+	if (l2_accel) {
+		for (i = 0; i < l2_accel->size; i++) {
+			struct net_device *sdev = l2_accel->macvlan[i];
+
+			if (!sdev)
+				continue;
+
+			glort = l2_accel->dglort + 1 + i;
+
+			fm10k_queue_mac_request(interface, glort,
+						sdev->dev_addr,
+						vid, set);
+		}
+	}
+
 	/* set VLAN ID prior to syncing/unsyncing the VLAN */
 	interface->vid = vid + (set ? VLAN_N_VID : 0);
 
@@ -1214,6 +1232,22 @@ void fm10k_restore_rx_state(struct fm10k_intfc *interface)
 
 		fm10k_queue_mac_request(interface, glort,
 					hw->mac.addr, vid, true);
+
+		/* synchronize macvlan addresses */
+		if (l2_accel) {
+			for (i = 0; i < l2_accel->size; i++) {
+				struct net_device *sdev = l2_accel->macvlan[i];
+
+				if (!sdev)
+					continue;
+
+				glort = l2_accel->dglort + 1 + i;
+
+				fm10k_queue_mac_request(interface, glort,
+							sdev->dev_addr,
+							vid, true);
+			}
+		}
 	}
 
 	/* update xcast mode before synchronizing addresses if host's mailbox
@@ -1430,7 +1464,7 @@ static void *fm10k_dfwd_add_station(struct net_device *dev,
 	struct fm10k_dglort_cfg dglort = { 0 };
 	struct fm10k_hw *hw = &interface->hw;
 	int size = 0, i;
-	u16 glort;
+	u16 vid, glort;
 
 	/* The hardware supported by fm10k only filters on the destination MAC
 	 * address. In order to avoid issues we only support offloading modes
@@ -1510,6 +1544,12 @@ static void *fm10k_dfwd_add_station(struct net_device *dev,
 					hw->mac.default_vid, true);
 	}
 
+	for (vid = fm10k_find_next_vlan(interface, 0);
+	     vid < VLAN_N_VID;
+	     vid = fm10k_find_next_vlan(interface, vid))
+		fm10k_queue_mac_request(interface, glort, sdev->dev_addr,
+					vid, true);
+
 	fm10k_mbx_unlock(interface);
 
 	return sdev;
@@ -1522,8 +1562,8 @@ static void fm10k_dfwd_del_station(struct net_device *dev, void *priv)
 	struct fm10k_dglort_cfg dglort = { 0 };
 	struct fm10k_hw *hw = &interface->hw;
 	struct net_device *sdev = priv;
+	u16 vid, glort;
 	int i;
-	u16 glort;
 
 	if (!l2_accel)
 		return;
@@ -1550,6 +1590,12 @@ static void fm10k_dfwd_del_station(struct net_device *dev, void *priv)
 					hw->mac.default_vid, false);
 	}
 
+	for (vid = fm10k_find_next_vlan(interface, 0);
+	     vid < VLAN_N_VID;
+	     vid = fm10k_find_next_vlan(interface, vid))
+		fm10k_queue_mac_request(interface, glort, sdev->dev_addr,
+					vid, false);
+
 	fm10k_mbx_unlock(interface);
 
 	/* record removal */
-- 
2.14.3

^ permalink raw reply related

* [net-next 2/6] fm10k: reduce duplicate fm10k_stat macro code
From: Jeff Kirsher @ 2018-05-07 14:45 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180507144521.6979-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Share some of the code for setting up fm10k_stat macros by implementing
an FM10K_STAT_FIELDS macro which we can use when setting up the type
specific macros.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 28 ++++++++++++------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
index eeac2b75a195..f4cad9ffdc79 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
@@ -11,12 +11,16 @@ struct fm10k_stats {
 	int stat_offset;
 };
 
-#define FM10K_NETDEV_STAT(_net_stat) { \
-	.stat_string = #_net_stat, \
-	.sizeof_stat = FIELD_SIZEOF(struct net_device_stats, _net_stat), \
-	.stat_offset = offsetof(struct net_device_stats, _net_stat) \
+#define FM10K_STAT_FIELDS(_type, _name, _stat) { \
+	.stat_string = _name, \
+	.sizeof_stat = FIELD_SIZEOF(_type, _stat), \
+	.stat_offset = offsetof(_type, _stat) \
 }
 
+/* netdevice statistics */
+#define FM10K_NETDEV_STAT(_net_stat) \
+	FM10K_STAT_FIELDS(struct net_device_stats, #_net_stat, _net_stat)
+
 static const struct fm10k_stats fm10k_gstrings_net_stats[] = {
 	FM10K_NETDEV_STAT(tx_packets),
 	FM10K_NETDEV_STAT(tx_bytes),
@@ -34,11 +38,9 @@ static const struct fm10k_stats fm10k_gstrings_net_stats[] = {
 
 #define FM10K_NETDEV_STATS_LEN	ARRAY_SIZE(fm10k_gstrings_net_stats)
 
-#define FM10K_STAT(_name, _stat) { \
-	.stat_string = _name, \
-	.sizeof_stat = FIELD_SIZEOF(struct fm10k_intfc, _stat), \
-	.stat_offset = offsetof(struct fm10k_intfc, _stat) \
-}
+/* General interface statistics */
+#define FM10K_STAT(_name, _stat) \
+	FM10K_STAT_FIELDS(struct fm10k_intfc, _name, _stat)
 
 static const struct fm10k_stats fm10k_gstrings_global_stats[] = {
 	FM10K_STAT("tx_restart_queue", restart_queue),
@@ -75,11 +77,9 @@ static const struct fm10k_stats fm10k_gstrings_pf_stats[] = {
 	FM10K_STAT("nodesc_drop", stats.nodesc_drop.count),
 };
 
-#define FM10K_MBX_STAT(_name, _stat) { \
-	.stat_string = _name, \
-	.sizeof_stat = FIELD_SIZEOF(struct fm10k_mbx_info, _stat), \
-	.stat_offset = offsetof(struct fm10k_mbx_info, _stat) \
-}
+/* mailbox statistics */
+#define FM10K_MBX_STAT(_name, _stat) \
+	FM10K_STAT_FIELDS(struct fm10k_mbx_info, _name, _stat)
 
 static const struct fm10k_stats fm10k_gstrings_mbx_stats[] = {
 	FM10K_MBX_STAT("mbx_tx_busy", tx_busy),
-- 
2.14.3

^ permalink raw reply related

* [net-next 0/6][pull request] 100GbE Intel Wired LAN Driver Updates 2018-05-07
From: Jeff Kirsher @ 2018-05-07 14:45 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene

This series contains updates to fm10k only.

Jake provides all the changes in the series, starting with adding
support for accelerated MACVLAN devices.  Reduced code duplication by
implementing a macro to be used when setting up the type specific
macros.  Avoided potential bugs with stats by using a macro to calculate
the array size when passing to ensure that the size is correct.

The following are changes since commit 90278871d4b0da39c84fc9aa4929b0809dc7cf3c:
  Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 100GbE

Jacob Keller (6):
  fm10k: setup VLANs for l2 accelerated macvlan interfaces
  fm10k: reduce duplicate fm10k_stat macro code
  fm10k: use variadic arguments to fm10k_add_stat_strings
  fm10k: use macro to avoid passing the array and size separately
  fm10k: warn if the stat size is unknown
  fm10k: don't protect fm10k_queue_mac_request by fm10k_host_mbx_ready

 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 115 +++++++++++------------
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  |  62 ++++++++++--
 2 files changed, 110 insertions(+), 67 deletions(-)

-- 
2.14.3

^ permalink raw reply

* Re: [PATCH net-next v2 1/3] ipv4: support sport and dport in RTM_GETROUTE
From: Roopa Prabhu @ 2018-05-07 14:42 UTC (permalink / raw)
  To: David Ahern; +Cc: David Miller, netdev, Nikolay Aleksandrov, Ido Schimmel
In-Reply-To: <a6027834-d056-5ff9-82e7-52d4ba2e90fb@cumulusnetworks.com>

On Sun, May 6, 2018 at 6:46 PM, David Ahern <dsa@cumulusnetworks.com> wrote:
> On 5/6/18 6:59 PM, Roopa Prabhu wrote:
>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>
>> This is a followup to fib rules sport, dport match support.
>> Having them supported in getroute makes it easier to test
>> fib rule lookups. Used by fib rule self tests. Before this patch
>> getroute used same skb to pass through the route lookup and
>> for the netlink getroute reply msg. This patch allocates separate
>> skb's to keep flow dissector happy.
>>
>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>> ---
>>  include/uapi/linux/rtnetlink.h |   2 +
>>  net/ipv4/route.c               | 151 ++++++++++++++++++++++++++++++-----------
>>  2 files changed, 115 insertions(+), 38 deletions(-)
>>
>> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
>> index 9b15005..630ecf4 100644
>> --- a/include/uapi/linux/rtnetlink.h
>> +++ b/include/uapi/linux/rtnetlink.h
>> @@ -327,6 +327,8 @@ enum rtattr_type_t {
>>       RTA_PAD,
>>       RTA_UID,
>>       RTA_TTL_PROPAGATE,
>> +     RTA_SPORT,
>> +     RTA_DPORT,
>
> If you are going to add sport and dport because of the potential for FIB
> rules, you need to add ip-proto as well. I realize existing code assumed
> UDP, but the FIB rules cover any IP proto. Yes, I know this makes the
> change much larger to generate tcp, udp as well as iphdr options; the
> joys of new features. ;-)


:) sure..like i mentioned in the cover letter..., i was thinking of
submitting follow up patches for more ip_proto.
since i will be spinning v3, let me see if i can include that too.



>
> I also suggest a comment that these new RTA attributes are used for
> GETROUTE only.


sure

>
> And you need to add the new entries to rtm_ipv4_policy.
>
>
>>       __RTA_MAX
>>  };

ack,

>>
>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>> index 1412a7b..e91ed62 100644
>> --- a/net/ipv4/route.c
>> +++ b/net/ipv4/route.c
>> @@ -2568,11 +2568,10 @@ struct rtable *ip_route_output_flow(struct net *net, struct flowi4 *flp4,
>>  EXPORT_SYMBOL_GPL(ip_route_output_flow);
>>
>>  /* called with rcu_read_lock held */
>> -static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
>> -                     struct flowi4 *fl4, struct sk_buff *skb, u32 portid,
>> -                     u32 seq)
>> +static int rt_fill_info(struct net *net, __be32 dst, __be32 src,
>> +                     struct rtable *rt, u32 table_id, struct flowi4 *fl4,
>> +                     struct sk_buff *skb, u32 portid, u32 seq)
>>  {
>> -     struct rtable *rt = skb_rtable(skb);
>>       struct rtmsg *r;
>>       struct nlmsghdr *nlh;
>>       unsigned long expires = 0;
>> @@ -2651,6 +2650,14 @@ static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
>>                       from_kuid_munged(current_user_ns(), fl4->flowi4_uid)))
>>               goto nla_put_failure;
>>
>> +     if (fl4->fl4_sport &&
>> +         nla_put_be16(skb, RTA_SPORT, fl4->fl4_sport))
>> +             goto nla_put_failure;
>> +
>> +     if (fl4->fl4_dport &&
>> +         nla_put_be16(skb, RTA_DPORT, fl4->fl4_dport))
>> +             goto nla_put_failure;
>
> Why return the attributes to the user? I can't see any value in that.
> UID option is not returned either so there is precedence.

hmm..i do see UID returned just 2 lines above. :)

In the least i think it will confirm that the kernel did see your attributes :).


>
>
>> +
>>       error = rt->dst.error;
>>
>>       if (rt_is_input_route(rt)) {
>> @@ -2668,7 +2675,7 @@ static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
>>                       }
>>               } else
>>  #endif
>> -                     if (nla_put_u32(skb, RTA_IIF, skb->dev->ifindex))
>> +                     if (nla_put_u32(skb, RTA_IIF, fl4->flowi4_iif))
>>                               goto nla_put_failure;
>>       }
>>
>> @@ -2683,35 +2690,86 @@ static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
>>       return -EMSGSIZE;
>>  }
>>
>> +static int nla_get_port(struct nlattr *attr, __be16 *port)
>> +{
>> +     int p = nla_get_be16(attr);
>
> __be16 p;
>
>> +
>> +     if (p <= 0 || p >= 0xffff)
>> +             return -EINVAL;
>
> This check is not needed by definition of be16.

ack, will fix the kbuild sparse warning also for both v4 and v6.


>
>> +
>> +     *port = p;
>> +     return 0;
>> +}
>> +
>> +static int inet_rtm_getroute_reply(struct sk_buff *in_skb, struct nlmsghdr *nlh,
>> +                                __be32 dst, __be32 src, struct flowi4 *fl4,
>> +                                struct rtable *rt, struct fib_result *res)
>> +{
>> +     struct net *net = sock_net(in_skb->sk);
>> +     struct rtmsg *rtm = nlmsg_data(nlh);
>> +     u32 table_id = RT_TABLE_MAIN;
>> +     struct sk_buff *skb;
>> +     int err = 0;
>> +
>> +     skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
>> +     if (!skb) {
>> +             err = -ENOMEM;
>> +             return err;
>> +     }
>
> just 'return -ENOMEM' and without the {}.

yep

>
>
>> +
>> +     if (rtm->rtm_flags & RTM_F_LOOKUP_TABLE)
>> +             table_id = res->table ? res->table->tb_id : 0;
>> +
>> +     if (rtm->rtm_flags & RTM_F_FIB_MATCH)
>> +             err = fib_dump_info(skb, NETLINK_CB(in_skb).portid,
>> +                                 nlh->nlmsg_seq, RTM_NEWROUTE, table_id,
>> +                                 rt->rt_type, res->prefix, res->prefixlen,
>> +                                 fl4->flowi4_tos, res->fi, 0);
>> +     else
>> +             err = rt_fill_info(net, dst, src, rt, table_id,
>> +                                fl4, skb, NETLINK_CB(in_skb).portid,
>> +                                nlh->nlmsg_seq);
>> +     if (err < 0)
>> +             goto errout;
>> +
>> +     return rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid);
>> +
>> +errout:
>> +     kfree_skb(skb);
>> +     return err;
>> +}
>> +
>>  static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
>>                            struct netlink_ext_ack *extack)
>>  {
>>       struct net *net = sock_net(in_skb->sk);
>> -     struct rtmsg *rtm;
>>       struct nlattr *tb[RTA_MAX+1];
>> +     __be16 sport = 0, dport = 0;
>>       struct fib_result res = {};
>>       struct rtable *rt = NULL;
>> +     struct sk_buff *skb;
>> +     struct rtmsg *rtm;
>>       struct flowi4 fl4;
>> +     struct iphdr *iph;
>> +     struct udphdr *udph;
>>       __be32 dst = 0;
>>       __be32 src = 0;
>> +     kuid_t uid;
>>       u32 iif;
>>       int err;
>>       int mark;
>> -     struct sk_buff *skb;
>> -     u32 table_id = RT_TABLE_MAIN;
>> -     kuid_t uid;
>>
>>       err = nlmsg_parse(nlh, sizeof(*rtm), tb, RTA_MAX, rtm_ipv4_policy,
>>                         extack);
>>       if (err < 0)
>> -             goto errout;
>> +             return err;
>>
>>       rtm = nlmsg_data(nlh);
>>
>>       skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
>>       if (!skb) {
>>               err = -ENOBUFS;
>> -             goto errout;
>> +             return err;
>
> just return -ENOBUFS
>

yes

>
>>       }
>>
>>       /* Reserve room for dummy headers, this skb can pass
>

^ permalink raw reply

* Re: [bpf-next v2 8/9] bpf: Provide helper to do forwarding lookups in kernel FIB table
From: David Ahern @ 2018-05-07 14:26 UTC (permalink / raw)
  To: Daniel Borkmann, Jesper Dangaard Brouer
  Cc: netdev, borkmann, ast, davem, shm, roopa, toke, john.fastabend
In-Reply-To: <247aa1c4-e860-db72-fbb4-9da6c3a3f84c@iogearbox.net>

On 5/7/18 8:10 AM, Daniel Borkmann wrote:
> On 05/07/2018 03:35 PM, Jesper Dangaard Brouer wrote:
>> On Thu,  3 May 2018 19:54:31 -0700 David Ahern <dsahern@gmail.com> wrote:
>>
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index 6877426c23a6..cf0d27acf1d1 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>> [...]
>>> +static const struct bpf_func_proto bpf_xdp_fib_lookup_proto = {
>>> +	.func		= bpf_xdp_fib_lookup,
>>> +	.gpl_only	= true,
>>
>> Is it a deliberate choice to require BPF-progs using this helper to be
>> GPL licensed?
>>
>> Asking as this seems to be the first network related helper with this
>> requirement, while this is typical for tracing related helpers.
> 
> Good point, we should remove that. In networking it's only the perf event
> output helpers tying into tracing bits. After all, if you do a route lookup
> via netlink from user space there's no such restriction at all.
> 

Networking symbols are typically exported GPL for modules. The person
writing the code and exporting GPL is specifying a desire that only GPL
licensed modules can link to the symbol.

Given the common analogy of modules and bpf programs, why can't a writer
of a bpf helper specify a preference that only GPL licensed programs
leverage a BPF helper?

^ permalink raw reply

* Re: [PATCH] net: 8390: Fix possible data races in __ei_get_stats
From: Eric Dumazet @ 2018-05-07 14:15 UTC (permalink / raw)
  To: Jia-Ju Bai, davem, fthain, joe; +Cc: netdev, linux-kernel
In-Reply-To: <20180507140809.28847-1-baijiaju1990@gmail.com>



On 05/07/2018 07:08 AM, Jia-Ju Bai wrote:
> The write operations to "dev->stats" are protected by 
> the spinlock on line 862-864, but the read operations to
> this data on line 858 and 867 are not protected by the spinlock.
> Thus, there may exist data races for "dev->stats".
> 
> To fix the data races, the read operations to "dev->stats" are 
> protected by the spinlock, and a local variable is used for return.
> 
> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
> ---
>  drivers/net/ethernet/8390/lib8390.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/8390/lib8390.c b/drivers/net/ethernet/8390/lib8390.c
> index c9c55c9eab9f..198952247d30 100644
> --- a/drivers/net/ethernet/8390/lib8390.c
> +++ b/drivers/net/ethernet/8390/lib8390.c
> @@ -852,19 +852,25 @@ static struct net_device_stats *__ei_get_stats(struct net_device *dev)
>  	unsigned long ioaddr = dev->base_addr;
>  	struct ei_device *ei_local = netdev_priv(dev);
>  	unsigned long flags;
> +	struct net_device_stats *stats;
> +
> +	spin_lock_irqsave(&ei_local->page_lock, flags);
>  
>  	/* If the card is stopped, just return the present stats. */
> -	if (!netif_running(dev))
> -		return &dev->stats;
> +	if (!netif_running(dev)) {
> +		stats = &dev->stats;
> +		spin_unlock_irqrestore(&ei_local->page_lock, flags);
> +		return stats;
> +	}
>  
> -	spin_lock_irqsave(&ei_local->page_lock, flags);
>  	/* Read the counter registers, assuming we are in page 0. */
>  	dev->stats.rx_frame_errors  += ei_inb_p(ioaddr + EN0_COUNTER0);
>  	dev->stats.rx_crc_errors    += ei_inb_p(ioaddr + EN0_COUNTER1);
>  	dev->stats.rx_missed_errors += ei_inb_p(ioaddr + EN0_COUNTER2);
> +	stats = &dev->stats;
>  	spin_unlock_irqrestore(&ei_local->page_lock, flags);
>  
> -	return &dev->stats;
> +	return stats;
>  }
>  
>  /*
> 

dev->stats is not a pointer, it is an array embedded in the 
struct net_device

So this patch is not needed, since dev->stats can not change.

^ permalink raw reply

* Re: [bpf-next v2 8/9] bpf: Provide helper to do forwarding lookups in kernel FIB table
From: Daniel Borkmann @ 2018-05-07 14:10 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, David Ahern
  Cc: netdev, borkmann, ast, davem, shm, roopa, toke, john.fastabend
In-Reply-To: <20180507153552.5063a4b2@redhat.com>

On 05/07/2018 03:35 PM, Jesper Dangaard Brouer wrote:
> On Thu,  3 May 2018 19:54:31 -0700 David Ahern <dsahern@gmail.com> wrote:
> 
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 6877426c23a6..cf0d27acf1d1 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
> [...]
>> +static const struct bpf_func_proto bpf_xdp_fib_lookup_proto = {
>> +	.func		= bpf_xdp_fib_lookup,
>> +	.gpl_only	= true,
> 
> Is it a deliberate choice to require BPF-progs using this helper to be
> GPL licensed?
> 
> Asking as this seems to be the first network related helper with this
> requirement, while this is typical for tracing related helpers.

Good point, we should remove that. In networking it's only the perf event
output helpers tying into tracing bits. After all, if you do a route lookup
via netlink from user space there's no such restriction at all.

^ permalink raw reply

* [PATCH] net: 8390: Fix possible data races in __ei_get_stats
From: Jia-Ju Bai @ 2018-05-07 14:08 UTC (permalink / raw)
  To: davem, fthain, joe; +Cc: netdev, linux-kernel, Jia-Ju Bai

The write operations to "dev->stats" are protected by 
the spinlock on line 862-864, but the read operations to
this data on line 858 and 867 are not protected by the spinlock.
Thus, there may exist data races for "dev->stats".

To fix the data races, the read operations to "dev->stats" are 
protected by the spinlock, and a local variable is used for return.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
---
 drivers/net/ethernet/8390/lib8390.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/8390/lib8390.c b/drivers/net/ethernet/8390/lib8390.c
index c9c55c9eab9f..198952247d30 100644
--- a/drivers/net/ethernet/8390/lib8390.c
+++ b/drivers/net/ethernet/8390/lib8390.c
@@ -852,19 +852,25 @@ static struct net_device_stats *__ei_get_stats(struct net_device *dev)
 	unsigned long ioaddr = dev->base_addr;
 	struct ei_device *ei_local = netdev_priv(dev);
 	unsigned long flags;
+	struct net_device_stats *stats;
+
+	spin_lock_irqsave(&ei_local->page_lock, flags);
 
 	/* If the card is stopped, just return the present stats. */
-	if (!netif_running(dev))
-		return &dev->stats;
+	if (!netif_running(dev)) {
+		stats = &dev->stats;
+		spin_unlock_irqrestore(&ei_local->page_lock, flags);
+		return stats;
+	}
 
-	spin_lock_irqsave(&ei_local->page_lock, flags);
 	/* Read the counter registers, assuming we are in page 0. */
 	dev->stats.rx_frame_errors  += ei_inb_p(ioaddr + EN0_COUNTER0);
 	dev->stats.rx_crc_errors    += ei_inb_p(ioaddr + EN0_COUNTER1);
 	dev->stats.rx_missed_errors += ei_inb_p(ioaddr + EN0_COUNTER2);
+	stats = &dev->stats;
 	spin_unlock_irqrestore(&ei_local->page_lock, flags);
 
-	return &dev->stats;
+	return stats;
 }
 
 /*
-- 
2.17.0

^ permalink raw reply related

* RE: [PATCH net-next] flow_dissector: do not rely on implicit casts
From: Jon Maloy @ 2018-05-07 13:54 UTC (permalink / raw)
  To: Paolo Abeni, netdev@vger.kernel.org; +Cc: David S. Miller
In-Reply-To: <163b504b851217e600d9b48aa7cba76b376a675a.1525687522.git.pabeni@redhat.com>

Acked-by: Jon Maloy <jon.maloy@ericsson.com>


> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Paolo Abeni
> Sent: Monday, May 07, 2018 06:06
> To: netdev@vger.kernel.org
> Cc: David S. Miller <davem@davemloft.net>
> Subject: [PATCH net-next] flow_dissector: do not rely on implicit casts
> 
> This change fixes a couple of type mismatch reported by the sparse tool,
> explicitly using the requested type for the offending arguments.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  include/net/tipc.h        | 4 ++--
>  net/core/flow_dissector.c | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/net/tipc.h b/include/net/tipc.h index
> 07670ec022a7..f0e7e6bc1bef 100644
> --- a/include/net/tipc.h
> +++ b/include/net/tipc.h
> @@ -44,11 +44,11 @@ struct tipc_basic_hdr {
>  	__be32 w[4];
>  };
> 
> -static inline u32 tipc_hdr_rps_key(struct tipc_basic_hdr *hdr)
> +static inline __be32 tipc_hdr_rps_key(struct tipc_basic_hdr *hdr)
>  {
>  	u32 w0 = ntohl(hdr->w[0]);
>  	bool keepalive_msg = (w0 & KEEPALIVE_MSG_MASK) ==
> KEEPALIVE_MSG_MASK;
> -	int key;
> +	__be32 key;
> 
>  	/* Return source node identity as key */
>  	if (likely(!keepalive_msg))
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index
> 030d4ca177fb..4fc1e84d77ec 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -1316,7 +1316,7 @@ u32 skb_get_poff(const struct sk_buff *skb)  {
>  	struct flow_keys_basic keys;
> 
> -	if (!skb_flow_dissect_flow_keys_basic(skb, &keys, 0, 0, 0, 0, 0))
> +	if (!skb_flow_dissect_flow_keys_basic(skb, &keys, NULL, 0, 0, 0, 0))
>  		return 0;
> 
>  	return __skb_get_poff(skb, skb->data, &keys, skb_headlen(skb));
> --
> 2.14.3

^ permalink raw reply

* Re: [bpf-next v2 8/9] bpf: Provide helper to do forwarding lookups in kernel FIB table
From: Jesper Dangaard Brouer @ 2018-05-07 13:35 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, borkmann, ast, davem, shm, roopa, toke, john.fastabend,
	brouer
In-Reply-To: <20180504025432.23451-9-dsahern@gmail.com>


On Thu,  3 May 2018 19:54:31 -0700 David Ahern <dsahern@gmail.com> wrote:

> diff --git a/net/core/filter.c b/net/core/filter.c
> index 6877426c23a6..cf0d27acf1d1 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
[...]
> +static const struct bpf_func_proto bpf_xdp_fib_lookup_proto = {
> +	.func		= bpf_xdp_fib_lookup,
> +	.gpl_only	= true,

Is it a deliberate choice to require BPF-progs using this helper to be
GPL licensed?

Asking as this seems to be the first network related helper with this
requirement, while this is typical for tracing related helpers.


> +	.ret_type	= RET_INTEGER,
> +	.arg1_type      = ARG_PTR_TO_CTX,
> +	.arg2_type      = ARG_PTR_TO_MEM,
> +	.arg3_type      = ARG_CONST_SIZE,
> +	.arg4_type	= ARG_ANYTHING,
> +};

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox