Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
From: Kirill Tkhai @ 2018-09-03 13:41 UTC (permalink / raw)
  To: Christian Brauner
  Cc: netdev, linux-kernel, davem, kuznet, yoshfuji, pombredanne,
	kstewart, gregkh, dsahern, fw, lucien.xin, jakub.kicinski, jbenc,
	nicolas.dichtel
In-Reply-To: <20180901013427.tj3t2mlik4t7hlt5@gmail.com>

On 01.09.2018 04:34, Christian Brauner wrote:
> On Thu, Aug 30, 2018 at 04:45:45PM +0200, Christian Brauner wrote:
>> On Thu, Aug 30, 2018 at 11:49:31AM +0300, Kirill Tkhai wrote:
>>> On 29.08.2018 21:13, Christian Brauner wrote:
>>>> Hi Kirill,
>>>>
>>>> Thanks for the question!
>>>>
>>>> On Wed, Aug 29, 2018 at 11:30:37AM +0300, Kirill Tkhai wrote:
>>>>> Hi, Christian,
>>>>>
>>>>> On 29.08.2018 02:18, Christian Brauner wrote:
>>>>>> From: Christian Brauner <christian@brauner.io>
>>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> A while back we introduced and enabled IFLA_IF_NETNSID in
>>>>>> RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led
>>>>>> to signficant performance increases since it allows userspace to avoid
>>>>>> taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the
>>>>>> interfaces from the netns associated with the netns_fd. Especially when a
>>>>>> lot of network namespaces are in use, using setns() becomes increasingly
>>>>>> problematic when performance matters.
>>>>>
>>>>> could you please give a real example, when setns()+socket(AF_NETLINK) cause
>>>>> problems with the performance? You should do this only once on application
>>>>> startup, and then you have created netlink sockets in any net namespaces you
>>>>> need. What is the problem here?
>>>>
>>>> So we have a daemon (LXD) that is often running thousands of containers.
>>>> When users issue a lxc list request against the daemon it returns a list
>>>> of all containers including all of the interfaces and addresses for each
>>>> container. To retrieve those addresses we currently rely on setns() +
>>>> getifaddrs() for each of those containers. That has horrible
>>>> performance.
>>>
>>> Could you please provide some numbers showing that setns()
>>> introduces signify performance decrease in the application?
>>
>> Sure, might take a few days++ though since I'm traveling.
> 
> Hey Kirill,
> 
> As promised here's some code [1] that compares performance. I basically
> did a setns() to the network namespace and called getifaddrs() and
> compared this to the scenario where I use the newly introduced property.
> I did this 1 million times and calculated the mean getifaddrs()
> retrieval time based on that.
> My patch cuts the time in half.
> 
> brauner@wittgenstein:~/netns_getifaddrs$ ./getifaddrs_perf 0 1178
> Mean time in microseconds (netnsid): 81
> Mean time in microseconds (setns): 162
> 
> Christian
> 
> I'm only appending the main file since the netsns_getifaddrs() code I
> used is pretty long:
> 
> [1]:
> 
> #define _GNU_SOURCE
> #define __STDC_FORMAT_MACROS
> #include <fcntl.h>
> #include <inttypes.h>
> #include <linux/types.h>
> #include <sched.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/stat.h>
> #include <sys/time.h>
> #include <sys/types.h>
> #include <unistd.h>
> 
> #include "netns_getifaddrs.h"
> #include "print_getifaddrs.h"
> 
> #define ITERATIONS 1000000
> #define SEC_TO_MICROSEC(x) (1000000 * (x))
> 
> int main(int argc, char *argv[])
> {
> 	int i, ret;
> 	__s32 netns_id;
> 	pid_t netns_pid;
> 	char path[1024];
> 	intmax_t times[ITERATIONS];
> 	struct timeval t1, t2;
> 	intmax_t time_in_mcs;
> 	int fret = EXIT_FAILURE;
> 	intmax_t sum = 0;
> 	int host_netns_fd = -1, netns_fd = -1;
> 
> 	struct ifaddrs *ifaddrs = NULL;
> 
> 	if (argc != 3)
> 		goto on_error;
> 
> 	netns_id = atoi(argv[1]);
> 	netns_pid = atoi(argv[2]);
> 	printf("%d\n", netns_id);
> 	printf("%d\n", netns_pid);
> 
> 	for (i = 0; i < ITERATIONS; i++) {
> 		ret = gettimeofday(&t1, NULL);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		ret = netns_getifaddrs(&ifaddrs, netns_id);
> 		freeifaddrs(ifaddrs);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		ret = gettimeofday(&t2, NULL);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		time_in_mcs = (SEC_TO_MICROSEC(t2.tv_sec) + t2.tv_usec) -
> 			      (SEC_TO_MICROSEC(t1.tv_sec) + t1.tv_usec);
> 		times[i] = time_in_mcs;
> 	}
> 
> 	for (i = 0; i < ITERATIONS; i++)
> 		sum += times[i];
> 
> 	printf("Mean time in microseconds (netnsid): %ju\n",
> 	       sum / ITERATIONS);
> 
> 	ret = snprintf(path, sizeof(path), "/proc/%d/ns/net", netns_pid);
> 	if (ret < 0 || (size_t)ret >= sizeof(path))
> 		goto on_error;
> 
> 	netns_fd = open(path, O_RDONLY | O_CLOEXEC);
> 	if (netns_fd < 0)
> 		goto on_error;
> 
> 	host_netns_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
> 	if (host_netns_fd < 0)
> 		goto on_error;
> 
> 	memset(times, 0, sizeof(times));
> 	for (i = 0; i < ITERATIONS; i++) {
> 		ret = gettimeofday(&t1, NULL);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		ret = setns(netns_fd, CLONE_NEWNET);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		ret = getifaddrs(&ifaddrs);
> 		freeifaddrs(ifaddrs);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		ret = gettimeofday(&t2, NULL);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		ret = setns(host_netns_fd, CLONE_NEWNET);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		time_in_mcs = (SEC_TO_MICROSEC(t2.tv_sec) + t2.tv_usec) -
> 			      (SEC_TO_MICROSEC(t1.tv_sec) + t1.tv_usec);
> 		times[i] = time_in_mcs;
> 	}
> 
> 	for (i = 0; i < ITERATIONS; i++)
> 		sum += times[i];
> 
> 	printf("Mean time in microseconds (setns): %ju\n",
> 	       sum / ITERATIONS);
> 
> 	fret = EXIT_SUCCESS;
> 
> on_error:
> 	if (netns_fd >= 0)
> 		close(netns_fd);
> 
> 	if (host_netns_fd >= 0)
> 		close(host_netns_fd);
> 
> 	exit(fret);
> }

But this is a synthetic test, while I asked about real workflow.
Is this real problem for lxd, and there is observed performance
decrease?

I see, there are already nsid use in existing code, but I have to say,
that adding new types of variables as a system call arguments make it
less modular. When you request RTM_GETADDR for a specific nsid, this
obligates the kernel to make everything unchangable during the call,
doesn't it?

We may look at existing code as example, what problems this may cause.
Look at do_setlink(). There are many different types of variables,
and all of them should be dereferenced atomically. So, all the function
is executed under global rtnl. And this causes delays in another config
places, which are sensitive to rtnl. So, adding more dimensions to RTM_GETADDR
may turn it in the same overloaded function as do_setlink() is. And one
day, when we reach the state, when we must rework all of this, we won't
be able to do this. I'm not sure, now is not too late.

I just say about this, because it's possible we should consider another
approach in rtnl communication in general, and stop to overload it.

Kirill

^ permalink raw reply

* WARNING: suspicious RCU usage (4)
From: syzbot @ 2018-09-03  9:21 UTC (permalink / raw)
  To: ast, daniel, linux-kernel, mingo, netdev, rostedt, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    360bd62dc494 Merge tag 'linux-watchdog-4.19-rc2' of git://..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11e96c92400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=531a917630d2a492
dashboard link: https://syzkaller.appspot.com/bug?extid=cd8dada2671a39f2f9e6
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1199de3e400000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=153a555a400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+cd8dada2671a39f2f9e6@syzkaller.appspotmail.com

IPVS: ftp: loaded support on port[0] = 21


=============================
=============================
WARNING: suspicious RCU usage
WARNING: suspicious RCU usage
4.19.0-rc1+ #218 Not tainted
4.19.0-rc1+ #218 Not tainted
-----------------------------
-----------------------------
include/linux/rcupdate.h:631 rcu_read_lock() used illegally while idle!
kernel/trace/bpf_trace.c:72 suspicious rcu_dereference_check() usage!

other info that might help us debug this:


other info that might help us debug this:


RCU used illegally from idle CPU!
rcu_scheduler_active = 2, debug_locks = 1

RCU used illegally from idle CPU!
rcu_scheduler_active = 2, debug_locks = 1
RCU used illegally from extended quiescent state!
RCU used illegally from extended quiescent state!
1 lock held by swapper/1/0:
1 lock held by swapper/0/0:
  #0:
  #0:
00000000c56a0d6a
00000000c56a0d6a
  (
  (
rcu_read_lock
rcu_read_lock
){....}
){....}
, at: trace_call_bpf+0xf8/0x640 kernel/trace/bpf_trace.c:46
, at: trace_call_bpf+0xf8/0x640 kernel/trace/bpf_trace.c:46

stack backtrace:

stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.0-rc1+ #218
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  lockdep_rcu_suspicious+0x14a/0x153 kernel/locking/lockdep.c:4537
  rcu_read_lock include/linux/rcupdate.h:630 [inline]
  trace_call_bpf+0x533/0x640 kernel/trace/bpf_trace.c:72
  perf_trace_run_bpf_submit+0x15c/0x3b0 kernel/events/core.c:8264
  perf_trace_preemptirq_template+0x3dd/0x650  
include/trace/events/preemptirq.h:14
  trace_irq_enable_rcuidle include/trace/events/preemptirq.h:40 [inline]
  trace_hardirqs_on+0x22e/0x2c0 kernel/trace/trace_preemptirq.c:25
  default_idle+0x8d/0x410 arch/x86/kernel/process.c:498
  arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:489
  default_idle_call+0x6d/0x90 kernel/sched/idle.c:93
  cpuidle_idle_call kernel/sched/idle.c:153 [inline]
  do_idle+0x3aa/0x580 kernel/sched/idle.c:262
  cpu_startup_entry+0x10c/0x120 kernel/sched/idle.c:368
  start_secondary+0x433/0x5d0 arch/x86/kernel/smpboot.c:271
  secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:242

CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-rc1+ #218
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
=============================
WARNING: suspicious RCU usage
Call Trace:
4.19.0-rc1+ #218 Not tainted
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
-----------------------------
include/linux/rcupdate.h:680 rcu_read_unlock() used illegally while idle!

other info that might help us debug this:

  lockdep_rcu_suspicious+0x14a/0x153 kernel/locking/lockdep.c:4537

RCU used illegally from idle CPU!
rcu_scheduler_active = 2, debug_locks = 1
  trace_call_bpf+0x4cb/0x640 kernel/trace/bpf_trace.c:72
RCU used illegally from extended quiescent state!
1 lock held by swapper/1/0:
  #0:
00000000c56a0d6a
  perf_trace_run_bpf_submit+0x15c/0x3b0 kernel/events/core.c:8264
  (
rcu_read_lock
){....}
, at: trace_call_bpf+0xf8/0x640 kernel/trace/bpf_trace.c:46

stack backtrace:
  perf_trace_preemptirq_template+0x3dd/0x650  
include/trace/events/preemptirq.h:14
  trace_irq_enable_rcuidle include/trace/events/preemptirq.h:40 [inline]
  trace_hardirqs_on_caller+0x227/0x2b0 kernel/trace/trace_preemptirq.c:51
  trace_hardirqs_on_thunk+0x1a/0x1c
  retint_kernel+0x10/0x10
RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:56
Code: c7 48 89 45 d8 e8 ca 39 e7 fa 48 8b 45 d8 e9 d2 fe ff ff 48 89 df e8  
b9 39 e7 fa eb 8a 90 90 90 90 90 90 90 55 48 89 e5 fb f4 <5d> c3 0f 1f 84  
00 00 00 00 00 55 48 89 e5 f4 5d c3 90 90 90 90 90
RSP: 0018:ffffffff88007bb8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
RAX: dffffc0000000000 RBX: 1ffffffff1000f7b RCX: 0000000000000000
RDX: 1ffffffff10237b8 RSI: 0000000000000001 RDI: ffffffff8811bdc0
RBP: ffffffff88007bb8 R08: ffffffff88075e00 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff88007c78 R14: 0000000000000000 R15: 0000000000000000
  arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
  default_idle+0xc2/0x410 arch/x86/kernel/process.c:498
  arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:489
  default_idle_call+0x6d/0x90 kernel/sched/idle.c:93
  cpuidle_idle_call kernel/sched/idle.c:153 [inline]
  do_idle+0x3aa/0x580 kernel/sched/idle.c:262
  cpu_startup_entry+0x10c/0x120 kernel/sched/idle.c:368
  rest_init+0xe1/0xe4 init/main.c:442
  start_kernel+0x913/0x94e init/main.c:739
  x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:452
  x86_64_start_kernel+0x76/0x79 arch/x86/kernel/head64.c:433
  secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:242
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.0-rc1+ #218
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  lockdep_rcu_suspicious+0x14a/0x153 kernel/locking/lockdep.c:4537
  rcu_read_unlock include/linux/rcupdate.h:679 [inline]
  trace_call_bpf+0x579/0x640 kernel/trace/bpf_trace.c:72
  perf_trace_run_bpf_submit+0x15c/0x3b0 kernel/events/core.c:8264
  perf_trace_preemptirq_template+0x3dd/0x650  
include/trace/events/preemptirq.h:14
  trace_irq_enable_rcuidle include/trace/events/preemptirq.h:40 [inline]
  trace_hardirqs_on+0x22e/0x2c0 kernel/trace/trace_preemptirq.c:25
  default_idle+0x8d/0x410 arch/x86/kernel/process.c:498
  arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:489
  default_idle_call+0x6d/0x90 kernel/sched/idle.c:93
  cpuidle_idle_call kernel/sched/idle.c:153 [inline]
  do_idle+0x3aa/0x580 kernel/sched/idle.c:262
  cpu_startup_entry+0x10c/0x120 kernel/sched/idle.c:368
  st


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* WARNING: suspicious RCU usage in trace_call_bpf
From: syzbot @ 2018-09-03  9:21 UTC (permalink / raw)
  To: ast, daniel, linux-kernel, mingo, netdev, rostedt, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    dc6417949297 Merge branch 'net_sched-reject-unknown-tcfa_a..
git tree:       bpf
console output: https://syzkaller.appspot.com/x/log.txt?x=10bc513e400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=531a917630d2a492
dashboard link: https://syzkaller.appspot.com/bug?extid=1c843dc17610ca4c764f
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13071951400000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=119e710a400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+1c843dc17610ca4c764f@syzkaller.appspotmail.com

random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)

=============================
WARNING: suspicious RCU usage
4.19.0-rc1+ #44 Not tainted
-----------------------------
include/linux/rcupdate.h:631 rcu_read_lock() used illegally while idle!

other info that might help us debug this:


RCU used illegally from idle CPU!
rcu_scheduler_active = 2, debug_locks = 1
RCU used illegally from extended quiescent state!
1 lock held by swapper/0/0:
  #0: 000000004b34587c (rcu_read_lock){....}, at: trace_call_bpf+0xf8/0x640  
kernel/trace/bpf_trace.c:46

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-rc1+ #44
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  lockdep_rcu_suspicious+0x14a/0x153 kernel/locking/lockdep.c:4537
  rcu_read_lock include/linux/rcupdate.h:630 [inline]
  trace_call_bpf+0x533/0x640 kernel/trace/bpf_trace.c:72
  perf_trace_run_bpf_submit+0x15c/0x3b0 kernel/events/core.c:8264
  perf_trace_preemptirq_template+0x3dd/0x650  
include/trace/events/preemptirq.h:14
  trace_irq_enable_rcuidle include/trace/events/preemptirq.h:40 [inline]
  trace_hardirqs_on_caller+0x227/0x2b0 kernel/trace/trace_preemptirq.c:51

=============================
WARNING: suspicious RCU usage
  trace_hardirqs_on_thunk+0x1a/0x1c
4.19.0-rc1+ #44 Not tainted
-----------------------------
  retint_kernel+0x10/0x10
kernel/trace/bpf_trace.c:72 suspicious rcu_dereference_check() usage!
RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:56

other info that might help us debug this:

Code: c7 48 89 45 d8 e8 ba 60 e7 fa 48 8b 45 d8 e9 d2 fe ff ff 48 89 df e8  
a9 60 e7 fa eb 8a 90 90 90 90 90 90 90 55 48 89 e5 fb f4 <5d> c3 0f 1f 84  
00 00 00 00 00 55 48 89 e5 f4 5d c3 90 90 90 90 90

RCU used illegally from idle CPU!
rcu_scheduler_active = 2, debug_locks = 1
RSP: 0018:ffffffff88007bb8 EFLAGS: 00000286
RCU used illegally from extended quiescent state!
  ORIG_RAX: ffffffffffffff13
1 lock held by swapper/1/0:
RAX: dffffc0000000000 RBX: 1ffffffff1000f7b RCX: 0000000000000000
  #0:
RDX: 1ffffffff10237b8 RSI: 0000000000000001 RDI: ffffffff8811bdc0
000000004b34587c
RBP: ffffffff88007bb8 R08: ffffffff88075e00 R09: 0000000000000000
  (
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
rcu_read_lock
R13: ffffffff88007c78 R14: 0000000000000000 R15: 0000000000000000
){....}
  arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
  default_idle+0xc2/0x410 arch/x86/kernel/process.c:498
, at: trace_call_bpf+0xf8/0x640 kernel/trace/bpf_trace.c:46

stack backtrace:
  arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:489
  default_idle_call+0x6d/0x90 kernel/sched/idle.c:93
  cpuidle_idle_call kernel/sched/idle.c:153 [inline]
  do_idle+0x3aa/0x580 kernel/sched/idle.c:262
  cpu_startup_entry+0x10c/0x120 kernel/sched/idle.c:368
  rest_init+0xe1/0xe4 init/main.c:442
  start_kernel+0x913/0x94e init/main.c:739
  x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:452
  x86_64_start_kernel+0x76/0x79 arch/x86/kernel/head64.c:433
  secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:242

CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.0-rc1+ #44
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
=============================
WARNING: suspicious RCU usage
Call Trace:
4.19.0-rc1+ #44 Not tainted
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
-----------------------------
include/linux/rcupdate.h:680 rcu_read_unlock() used illegally while idle!

other info that might help us debug this:


RCU used illegally from idle CPU!
rcu_scheduler_active = 2, debug_locks = 1
  lockdep_rcu_suspicious+0x14a/0x153 kernel/locking/lockdep.c:4537
RCU used illegally from extended quiescent state!
  trace_call_bpf+0x4cb/0x640 kernel/trace/bpf_trace.c:72
1 lock held by swapper/0/0:
  #0:
000000004b34587c
  (
  perf_trace_run_bpf_submit+0x15c/0x3b0 kernel/events/core.c:8264
rcu_read_lock
){....}
, at: trace_call_bpf+0xf8/0x640 kernel/trace/bpf_trace.c:46

stack backtrace:
  perf_trace_preemptirq_template+0x3dd/0x650  
include/trace/events/preemptirq.h:14
  trace_irq_enable_rcuidle include/trace/events/preemptirq.h:40 [inline]
  trace_hardirqs_on+0x22e/0x2c0 kernel/trace/trace_preemptirq.c:25
  default_idle+0x8d/0x410 arch/x86/kernel/process.c:498
  arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:489
  default_idle_call+0x6d/0x90 kernel/sched/idle.c:93
  cpuidle_idle_call kernel/sched/idle.c:153 [inline]
  do_idle+0x3aa/0x580 kernel/sched/idle.c:262
  cpu_startup_entry+0x10c/0x120 kernel/sched/idle.c:368
  start_secondary+0x433/0x5d0 arch/x86/kernel/smpboot.c:271
  secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:242
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-rc1+ #44
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  lockdep_rcu_suspicious+0x14a/0x153 kernel/locking/lockdep.c:4537
  rcu_read_unlock include/linux/rcupdate.h:679 [inline]
  trace_call_bpf+0x579/0x640 kernel/trace/bpf_trace.c:72
  perf_trace_run_bpf_submit+0x15c/0x3b0 kernel/events/core.c:8264
  perf_trace_preemptirq_template+0x3dd/0x650  
include/trace/events/preemptirq.h:14
  trace_irq_enable_rcuidle include/trace/events/preemptirq.h:40 [inline]
  trace_hardirqs_on_caller+0x227/0x2b0 kernel/trace/trace_preemptirq.c:51
  trace_hardirqs_on_thunk+0x1a/0x1c
  retint_kernel+0x10/0x10
RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:56
Code: c7 48 89 45 d8 e8 ba 60 e7 fa 48 8b 45 d8 e9 d2 fe ff ff 48 89 df e8  
a9 60 e7 fa eb 8a 90 90 90 90 90 90 90 55 48 89 e5 fb f4 <5d> c3 0f 1f 84  
00 00 00 00 00 55 48 89 e5 f4 5d c3 90 90 90 90 90
RSP: 0018:ffffffff88007bb8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
RAX: dffffc0000000000 RBX: 1ffffffff1000f7b RCX: 0000000000000000
RDX: 1ffffffff10237b8 RSI: 0000000000000001 RDI: ffffffff8811bdc0
RBP: ffffffff88007bb8 R08: ffffffff88075e00 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff88007c78 R14: 0000000000000000 R15: 0000000000000000
  arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
  default_idle+0xc2/0x410 arch/x86/kernel/process.c:498
  arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:489
  default_idle_call+0x6d/0x90 kernel/sched/idle.c:93
  cpuidle_idle_call kernel/sched/idle.c:153 [inline]
  do_idle+0x3aa/0x580 kernel/sched/idle.c:262
  cpu_startup_entry+0x10c/0x120 kernel/sched/idle.c:368
  rest_init+0xe1/0xe4 init/main.c:442
  start_kernel+0x913/0x94e init/main.c:739
  x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:452
  x86_64_start_kernel+0x76/0x79 arch/x86/kernel/head64.c:433
  secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:242


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH net-next v2 4/7] net: phy: mscc: read 'vsc8531,edge-slowdown' as an u32
From: Quentin Schulz @ 2018-09-03 13:37 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: davem, robh+dt, mark.rutland, f.fainelli, allan.nielsen, netdev,
	devicetree, linux-kernel, thomas.petazzoni
In-Reply-To: <20180903132756.GD4445@lunn.ch>

[-- Attachment #1: Type: text/plain, Size: 831 bytes --]

Hi Andrew,

On Mon, Sep 03, 2018 at 03:27:56PM +0200, Andrew Lunn wrote:
> On Mon, Sep 03, 2018 at 10:48:50AM +0200, Quentin Schulz wrote:
> > In the DT binding, it is specified nowhere that 'vsc8531,edge-slowdown'
> > is an u8, even though it's read as an u8 in the driver.
> 
> Hi Quentin
> 
> of_property_read_u8() will perform bounds checking. Is that important
> here?
> 

Just to be sure, we're talking here about making sure the value stored
in the DT is not bigger than the specified value (here an u8)? If so,
that isn't the reason why I'm suggesting those two patches.

Without /bits 8/ in the DT property, whatever were the values I put in
the property, I'd always get a 0. So I need to fix it either in the DT
(but Rob does not really like it) or in the driver.

Hope that makes sense.

Quentin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v2 6/7] dt-bindings: net: phy: mscc: vsc8531: remove compatible from required properties
From: Andrew Lunn @ 2018-09-03 13:36 UTC (permalink / raw)
  To: Quentin Schulz
  Cc: davem, robh+dt, mark.rutland, f.fainelli, allan.nielsen, netdev,
	devicetree, linux-kernel, thomas.petazzoni
In-Reply-To: <20180903133358.b2zsdytyjpentnuj@qschulz>

> What do we do? Do I put it in optional properties and refer to the
> aforementioned doc or do I just remove it as it's done in this patch?

Just remove it.

     Andrew

^ permalink raw reply

* Re: [PATCH v2 00/11] mscc: ocelot: add support for SerDes muxing configuration
From: Andrew Lunn @ 2018-09-03 13:34 UTC (permalink / raw)
  To: Quentin Schulz
  Cc: alexandre.belloni, ralf, paul.burton, jhogan, robh+dt,
	mark.rutland, davem, kishon, f.fainelli, allan.nielsen,
	linux-mips, devicetree, linux-kernel, netdev, thomas.petazzoni
In-Reply-To: <20180903093308.24366-1-quentin.schulz@bootlin.com>

> I suggest patches 1 and 8 go through MIPS tree, 2 to 5 and 11 go through
> net while the others (6, 7, 9 and 10) go through the generic PHY subsystem.

Hi Quentin

Are you expecting merge conflicts? If not, it might be simpler to gets
ACKs from each maintainer, and then merge it though one tree.

     Andrew

^ permalink raw reply

* Re: [PATCH net-next v2 6/7] dt-bindings: net: phy: mscc: vsc8531: remove compatible from required properties
From: Quentin Schulz @ 2018-09-03 13:33 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: davem, robh+dt, mark.rutland, f.fainelli, allan.nielsen, netdev,
	devicetree, linux-kernel, thomas.petazzoni
In-Reply-To: <20180903133020.GE4445@lunn.ch>

[-- Attachment #1: Type: text/plain, Size: 815 bytes --]

Hi Andrew,

On Mon, Sep 03, 2018 at 03:30:20PM +0200, Andrew Lunn wrote:
> On Mon, Sep 03, 2018 at 10:48:52AM +0200, Quentin Schulz wrote:
> > Compatible isn't a required property for PHYs so let's remove it from
> > the binding DT of the VSC8531 PHYs.
> 
> Hi Quentin
> 
> It is actually a optional property. So you could move it into that
> section of the binding documentation. But removing it is also O.K.
> 

It's a property that is already defined in
Documentation/devicetree/bindings/net/phy.txt

I don't think it's actually useful to redefine it there (on the same
principle that we don't redefine/re-explain the reg property).

What do we do? Do I put it in optional properties and refer to the
aforementioned doc or do I just remove it as it's done in this patch?

Thanks,
Quentin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v2 6/7] dt-bindings: net: phy: mscc: vsc8531: remove compatible from required properties
From: Andrew Lunn @ 2018-09-03 13:30 UTC (permalink / raw)
  To: Quentin Schulz
  Cc: davem, robh+dt, mark.rutland, f.fainelli, allan.nielsen, netdev,
	devicetree, linux-kernel, thomas.petazzoni
In-Reply-To: <20180903084853.18092-6-quentin.schulz@bootlin.com>

On Mon, Sep 03, 2018 at 10:48:52AM +0200, Quentin Schulz wrote:
> Compatible isn't a required property for PHYs so let's remove it from
> the binding DT of the VSC8531 PHYs.

Hi Quentin

It is actually a optional property. So you could move it into that
section of the binding documentation. But removing it is also O.K.

	Andrew

^ permalink raw reply

* Re: [PATCH net-next] veth: report NEWLINK event when moving the peer device in a new namespace
From: Thomas Haller @ 2018-09-03  9:10 UTC (permalink / raw)
  To: David Ahern, Lorenzo Bianconi; +Cc: David S. Miller, Network Development
In-Reply-To: <247de7c5-5046-81f4-d0b9-f6fcd505e8e8@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2516 bytes --]

Hi,

On Sat, 2018-09-01 at 17:45 -0600, David Ahern wrote:
> On 9/1/18 3:05 AM, Lorenzo Bianconi wrote:
> > 
> > I was thinking about the commit 38e01b30563a and then I realized I
> > misread the code
> > yesterday. The commit 38e01b30563a provides all relevant info but
> > it
> > emits the event
> > for veth1 (the device moved in the new namespace).
> > An userspace application will not receive that message if it
> > filters
> > events for just
> > a specific device (veth0 in this case) despite that some device
> > properties have changed
> > (since veth0 and veth1 are paired devices). To fix that behavior in
> > veth_notify routine
> > I emits a RTM_NEWLINK event for veth0.
> 
> Userspace is managing a veth a pair. It moves one of them to a new
> namespace and decides to filter messages related to that device
> before
> the move. If it did not filter the DELLINK message it would get the
> information you want. That to me is a userspace problem. What am I
> missing?

The userspace component which moves the veth peer is likely not the
same that is listening to these events.

> Fundamentally, your proposal means 3 messages when moving a device to
> a
> new namespace which is what I think is unnecessary.

You are correct, the necessary information is signaled in this case.

Usually, one can manage the information about one link by considering
only RTM_NEWLINK/RTM_DELLINK message for that link. That seems
desirable behavior of the netlink API, as it simplifies user space
implementations.

For example, libnl3's cache for links doesn't properly handles this,
and you end up with invalid data in the cache. You may call that a
userspace problem and bug. But does it really need to be that
complicated?

FWIW, a similar thing was also NACK'ed earlier:
  http://lists.openwall.net/netdev/2016/06/12/8



Paolo and Lorenzo pointed out another issue when moving the peer to a
third namespace:

>>>>>>>>>>>>>>>>>>>>>>>>>>>

ip netns del ns1
ip netns del ns2
ip netns del ns3
ip netns add ns1
ip netns add ns2
ip netns add ns3


ip -n ns1 link add dev veth0 type veth peer name veth1
ip -n ns1 monitor link &

ip -n ns1 link set veth1 netns ns2
# Deleted 9: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default 
#    link/ether 86:79:1d:c6:45:bc brd ff:ff:ff:ff:ff:ff new-netnsid 0 new-ifindex 9

ip -n ns2 link set veth1 netns ns3
# no event.

>>>>>>>>>>>>>>>>>>>>>>





best,
Thomas

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v2 4/7] net: phy: mscc: read 'vsc8531,edge-slowdown' as an u32
From: Andrew Lunn @ 2018-09-03 13:27 UTC (permalink / raw)
  To: Quentin Schulz
  Cc: davem, robh+dt, mark.rutland, f.fainelli, allan.nielsen, netdev,
	devicetree, linux-kernel, thomas.petazzoni
In-Reply-To: <20180903084853.18092-4-quentin.schulz@bootlin.com>

On Mon, Sep 03, 2018 at 10:48:50AM +0200, Quentin Schulz wrote:
> In the DT binding, it is specified nowhere that 'vsc8531,edge-slowdown'
> is an u8, even though it's read as an u8 in the driver.

Hi Quentin

of_property_read_u8() will perform bounds checking. Is that important
here?

	Thanks
		Andrew

^ permalink raw reply

* Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races
From: Jerome Brunet @ 2018-09-03  8:56 UTC (permalink / raw)
  To: Jose Abreu, netdev
  Cc: Martin Blumenstingl, David S. Miller, Joao Pinto,
	Giuseppe Cavallaro, Alexandre Torgue
In-Reply-To: <2c4709ce46639f6abcad2f62f50b5101fbc524c5.1535625356.git.joabreu@synopsys.com>

On Thu, 2018-08-30 at 11:37 +0100, Jose Abreu wrote:
> [ As for now this is only for testing! ]
> 
> This follows David Miller advice and tries to fix coalesce timer in
> multi-queue scenarios.
> 
> We are now using per-queue coalesce values and per-queue TX timer. This
> assumes that tx_queues == rx_queues, which can not be necessarly true.
> Official patch will need to have this fixed.
> 
> Coalesce timer default values was changed to 1ms and the coalesce frames
> to 25.
> 
> Tested in B2B setup between XGMAC2 and GMAC5.

Tested on Amlogic meson-axg-s400. No regression seen so far.
(arch/arm64/boot/dts/amlogic/meson-axg-s400.dts)

As far as I understand from the device tree parsing, this platform (and all
other amlogic platforms) use single queue.

---

Jose,

On another topic doing iperf3 test on amlogic's devices we seen a strange
behavior.

Doing Tx or Rx test usually works fine (700MBps to 900MBps depending on the
platform). However, when doing both Rx and Tx at the same time, We see the Tx
throughput dropping significantly (~30MBps) and lot of TCP retries.

Would you any idea what might be our problem ? or how to start investigating
this ?

^ permalink raw reply

* [PATCH net-next v2 6/7] dt-bindings: net: phy: mscc: vsc8531: remove compatible from required properties
From: Quentin Schulz @ 2018-09-03  8:48 UTC (permalink / raw)
  To: davem, robh+dt, mark.rutland, andrew, f.fainelli
  Cc: allan.nielsen, netdev, devicetree, linux-kernel, thomas.petazzoni,
	Quentin Schulz
In-Reply-To: <20180903084853.18092-1-quentin.schulz@bootlin.com>

Compatible isn't a required property for PHYs so let's remove it from
the binding DT of the VSC8531 PHYs.

Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
---
 Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt b/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt
index 0eedabe22cc3..664d9d0543fc 100644
--- a/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt
+++ b/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt
@@ -1,10 +1,5 @@
 * Microsemi - vsc8531 Giga bit ethernet phy
 
-Required properties:
-- compatible	: Should contain phy id as "ethernet-phy-idAAAA.BBBB"
-		  The PHY device uses the binding described in
-		  Documentation/devicetree/bindings/net/phy.txt
-
 Optional properties:
 - vsc8531,vddmac	: The vddmac in mV. Allowed values is listed
 			  in the first row of Table 1 (below).
-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next v2 2/7] net: phy: mscc: factorize function for getting LED mode from DT
From: Quentin Schulz @ 2018-09-03  8:48 UTC (permalink / raw)
  To: davem, robh+dt, mark.rutland, andrew, f.fainelli
  Cc: allan.nielsen, netdev, devicetree, linux-kernel, thomas.petazzoni,
	Quentin Schulz
In-Reply-To: <20180903084853.18092-1-quentin.schulz@bootlin.com>

Microsemi PHYs support different LED modes depending on the variant, so
let's factorize the code so we just have to give the supported modes
while the logic behind getting the mode remains identical.

Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
---

added in v2

 drivers/net/phy/mscc.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/mscc.c b/drivers/net/phy/mscc.c
index af433f226ef4..aa37e8547cd0 100644
--- a/drivers/net/phy/mscc.c
+++ b/drivers/net/phy/mscc.c
@@ -104,8 +104,24 @@ enum rgmii_rx_clock_delay {
 #define DOWNSHIFT_COUNT_MAX		  5
 
 #define MAX_LEDS			  4
+#define VSC85XX_SUPP_LED_MODES (BIT(VSC8531_LINK_ACTIVITY) | \
+				BIT(VSC8531_LINK_1000_ACTIVITY) | \
+				BIT(VSC8531_LINK_100_ACTIVITY) | \
+				BIT(VSC8531_LINK_10_ACTIVITY) | \
+				BIT(VSC8531_LINK_100_1000_ACTIVITY) | \
+				BIT(VSC8531_LINK_10_1000_ACTIVITY) | \
+				BIT(VSC8531_LINK_10_100_ACTIVITY) | \
+				BIT(VSC8531_DUPLEX_COLLISION) | \
+				BIT(VSC8531_COLLISION) | \
+				BIT(VSC8531_ACTIVITY) | \
+				BIT(VSC8531_AUTONEG_FAULT) | \
+				BIT(VSC8531_SERIAL_MODE) | \
+				BIT(VSC8531_FORCE_LED_OFF) | \
+				BIT(VSC8531_FORCE_LED_ON))
+
 struct vsc8531_private {
 	int rate_magic;
+	u16 supp_led_modes;
 	u8 leds_mode[MAX_LEDS];
 	u8 nleds;
 };
@@ -401,6 +417,7 @@ static int vsc85xx_dt_led_mode_get(struct phy_device *phydev,
 				   char *led,
 				   u8 default_mode)
 {
+	struct vsc8531_private *priv = phydev->priv;
 	struct device *dev = &phydev->mdio.dev;
 	struct device_node *of_node = dev->of_node;
 	u8 led_mode;
@@ -411,7 +428,7 @@ static int vsc85xx_dt_led_mode_get(struct phy_device *phydev,
 
 	led_mode = default_mode;
 	err = of_property_read_u8(of_node, led, &led_mode);
-	if (!err && (led_mode > 15 || led_mode == 7 || led_mode == 11)) {
+	if (!err && !(BIT(led_mode) & priv->supp_led_modes)) {
 		phydev_err(phydev, "DT %s invalid\n", led);
 		return -EINVAL;
 	}
@@ -655,6 +672,7 @@ static int vsc85xx_probe(struct phy_device *phydev)
 
 	vsc8531->rate_magic = rate_magic;
 	vsc8531->nleds = 2;
+	vsc8531->supp_led_modes = VSC85XX_SUPP_LED_MODES;
 
 	return vsc85xx_dt_led_modes_get(phydev, default_mode);
 }
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH v2] mt76: Enable NL80211_EXT_FEATURE_CQM_RSSI_LIST
From: Kristian Evensen @ 2018-09-03 13:07 UTC (permalink / raw)
  To: kvalo; +Cc: linux-wireless, Network Development, linux-kernel
In-Reply-To: <87ftyqwx2e.fsf@codeaurora.org>

Hi,

On Mon, Sep 3, 2018 at 3:05 PM Kalle Valo <kvalo@codeaurora.org> wrote:
>
> The changelog should be here, after the '---' line, so that git can
> automatically drop it. But I can fix it before I commit, but in the
> future please add it to the correct place.

Thanks for letting me know and thanks for fixing my error this time. I
was not aware of this requirement, but will remember about it in the
future.

BR,
Kristian

^ permalink raw reply

* Re: [PATCH v2] mt76: Enable NL80211_EXT_FEATURE_CQM_RSSI_LIST
From: Kalle Valo @ 2018-09-03 13:05 UTC (permalink / raw)
  To: Kristian Evensen; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <20180901083834.11201-1-kristian.evensen@gmail.com>

Kristian Evensen <kristian.evensen@gmail.com> writes:

> Enable the use of CQM_RSSI_LIST with mt76-devices. The change has been
> tested with the mt7602, mt7603 and mt7621 PCI wifi-cards. I passed a
> list of RSSI thresholds to the driver, and when disconnecting/connecting
> the antenna(s) I got an event each time the RSSI went above/below a
> threshold.
>
> While I have not been able to test the change with any of the mt76
> USB-devices (no access to a device), the RX RSSI management code is
> shared between the two device types. Thus, CQM should also work with the
> mt76 USB-devices.
>
> v1->v2:
> * Updated commit message. Thanks Kalle Valo, Arend van Spriel,
> Lorenzo Bianconi and Andrew Zaborowski.
>
> Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
> ---

The changelog should be here, after the '---' line, so that git can
automatically drop it. But I can fix it before I commit, but in the
future please add it to the correct place.

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches#changelog_missing

-- 
Kalle Valo

^ permalink raw reply

* [PATCH V2] net: qca_spi: Fix race condition in spi transfers
From: Stefan Wahren @ 2018-09-03 12:44 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, linux-kernel, Stefan Wahren

With performance optimization the spi transfer and messages from basic
register operations like qcaspi_read_register moved into the private
driver structure. But they weren't protected against mutual access
(e.g. between driver kthread and ethtool). So dumping the QCA7000
register via ethtool during network traffic could make spi_sync
hang forever, because the completion in spi_message is overwritten.

So revert the optimization completely.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
---
 drivers/net/ethernet/qualcomm/qca_7k.c  |  84 ++++++++++++------------
 drivers/net/ethernet/qualcomm/qca_spi.c | 110 +++++++++++++++++---------------
 drivers/net/ethernet/qualcomm/qca_spi.h |   5 --
 3 files changed, 97 insertions(+), 102 deletions(-)

Changes in V2:
- explain race in commit log more in detail

diff --git a/drivers/net/ethernet/qualcomm/qca_7k.c b/drivers/net/ethernet/qualcomm/qca_7k.c
index ffe7a16..e9914b6 100644
--- a/drivers/net/ethernet/qualcomm/qca_7k.c
+++ b/drivers/net/ethernet/qualcomm/qca_7k.c
@@ -43,41 +43,40 @@ qcaspi_spi_error(struct qcaspi *qca)
 int
 qcaspi_read_register(struct qcaspi *qca, u16 reg, u16 *result)
 {
-	__be16 rx_data;
 	__be16 tx_data;
-	struct spi_transfer *transfer;
-	struct spi_message *msg;
+	struct spi_transfer transfer[2];
+	struct spi_message msg;
 	int ret;
 
+	memset(transfer, 0, sizeof(transfer));
+
+	spi_message_init(&msg);
+
 	tx_data = cpu_to_be16(QCA7K_SPI_READ | QCA7K_SPI_INTERNAL | reg);
+	*result = 0;
+
+	transfer[0].tx_buf = &tx_data;
+	transfer[0].len = QCASPI_CMD_LEN;
+	transfer[1].rx_buf = result;
+	transfer[1].len = QCASPI_CMD_LEN;
+
+	spi_message_add_tail(&transfer[0], &msg);
 
 	if (qca->legacy_mode) {
-		msg = &qca->spi_msg1;
-		transfer = &qca->spi_xfer1;
-		transfer->tx_buf = &tx_data;
-		transfer->rx_buf = NULL;
-		transfer->len = QCASPI_CMD_LEN;
-		spi_sync(qca->spi_dev, msg);
-	} else {
-		msg = &qca->spi_msg2;
-		transfer = &qca->spi_xfer2[0];
-		transfer->tx_buf = &tx_data;
-		transfer->rx_buf = NULL;
-		transfer->len = QCASPI_CMD_LEN;
-		transfer = &qca->spi_xfer2[1];
+		spi_sync(qca->spi_dev, &msg);
+		spi_message_init(&msg);
 	}
-	transfer->tx_buf = NULL;
-	transfer->rx_buf = &rx_data;
-	transfer->len = QCASPI_CMD_LEN;
-	ret = spi_sync(qca->spi_dev, msg);
+	spi_message_add_tail(&transfer[1], &msg);
+	ret = spi_sync(qca->spi_dev, &msg);
 
 	if (!ret)
-		ret = msg->status;
+		ret = msg.status;
 
-	if (ret)
+	if (ret) {
 		qcaspi_spi_error(qca);
-	else
-		*result = be16_to_cpu(rx_data);
+	} else {
+		*result = be16_to_cpu(*result);
+	}
 
 	return ret;
 }
@@ -86,35 +85,32 @@ int
 qcaspi_write_register(struct qcaspi *qca, u16 reg, u16 value)
 {
 	__be16 tx_data[2];
-	struct spi_transfer *transfer;
-	struct spi_message *msg;
+	struct spi_transfer transfer[2];
+	struct spi_message msg;
 	int ret;
 
+	memset(&transfer, 0, sizeof(transfer));
+
+	spi_message_init(&msg);
+
 	tx_data[0] = cpu_to_be16(QCA7K_SPI_WRITE | QCA7K_SPI_INTERNAL | reg);
 	tx_data[1] = cpu_to_be16(value);
 
+	transfer[0].tx_buf = &tx_data[0];
+	transfer[0].len = QCASPI_CMD_LEN;
+	transfer[1].tx_buf = &tx_data[1];
+	transfer[1].len = QCASPI_CMD_LEN;
+
+	spi_message_add_tail(&transfer[0], &msg);
 	if (qca->legacy_mode) {
-		msg = &qca->spi_msg1;
-		transfer = &qca->spi_xfer1;
-		transfer->tx_buf = &tx_data[0];
-		transfer->rx_buf = NULL;
-		transfer->len = QCASPI_CMD_LEN;
-		spi_sync(qca->spi_dev, msg);
-	} else {
-		msg = &qca->spi_msg2;
-		transfer = &qca->spi_xfer2[0];
-		transfer->tx_buf = &tx_data[0];
-		transfer->rx_buf = NULL;
-		transfer->len = QCASPI_CMD_LEN;
-		transfer = &qca->spi_xfer2[1];
+		spi_sync(qca->spi_dev, &msg);
+		spi_message_init(&msg);
 	}
-	transfer->tx_buf = &tx_data[1];
-	transfer->rx_buf = NULL;
-	transfer->len = QCASPI_CMD_LEN;
-	ret = spi_sync(qca->spi_dev, msg);
+	spi_message_add_tail(&transfer[1], &msg);
+	ret = spi_sync(qca->spi_dev, &msg);
 
 	if (!ret)
-		ret = msg->status;
+		ret = msg.status;
 
 	if (ret)
 		qcaspi_spi_error(qca);
diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c b/drivers/net/ethernet/qualcomm/qca_spi.c
index 206f026..66b775d 100644
--- a/drivers/net/ethernet/qualcomm/qca_spi.c
+++ b/drivers/net/ethernet/qualcomm/qca_spi.c
@@ -99,22 +99,24 @@ static u32
 qcaspi_write_burst(struct qcaspi *qca, u8 *src, u32 len)
 {
 	__be16 cmd;
-	struct spi_message *msg = &qca->spi_msg2;
-	struct spi_transfer *transfer = &qca->spi_xfer2[0];
+	struct spi_message msg;
+	struct spi_transfer transfer[2];
 	int ret;
 
+	memset(&transfer, 0, sizeof(transfer));
+	spi_message_init(&msg);
+
 	cmd = cpu_to_be16(QCA7K_SPI_WRITE | QCA7K_SPI_EXTERNAL);
-	transfer->tx_buf = &cmd;
-	transfer->rx_buf = NULL;
-	transfer->len = QCASPI_CMD_LEN;
-	transfer = &qca->spi_xfer2[1];
-	transfer->tx_buf = src;
-	transfer->rx_buf = NULL;
-	transfer->len = len;
+	transfer[0].tx_buf = &cmd;
+	transfer[0].len = QCASPI_CMD_LEN;
+	transfer[1].tx_buf = src;
+	transfer[1].len = len;
 
-	ret = spi_sync(qca->spi_dev, msg);
+	spi_message_add_tail(&transfer[0], &msg);
+	spi_message_add_tail(&transfer[1], &msg);
+	ret = spi_sync(qca->spi_dev, &msg);
 
-	if (ret || (msg->actual_length != QCASPI_CMD_LEN + len)) {
+	if (ret || (msg.actual_length != QCASPI_CMD_LEN + len)) {
 		qcaspi_spi_error(qca);
 		return 0;
 	}
@@ -125,17 +127,20 @@ qcaspi_write_burst(struct qcaspi *qca, u8 *src, u32 len)
 static u32
 qcaspi_write_legacy(struct qcaspi *qca, u8 *src, u32 len)
 {
-	struct spi_message *msg = &qca->spi_msg1;
-	struct spi_transfer *transfer = &qca->spi_xfer1;
+	struct spi_message msg;
+	struct spi_transfer transfer;
 	int ret;
 
-	transfer->tx_buf = src;
-	transfer->rx_buf = NULL;
-	transfer->len = len;
+	memset(&transfer, 0, sizeof(transfer));
+	spi_message_init(&msg);
+
+	transfer.tx_buf = src;
+	transfer.len = len;
 
-	ret = spi_sync(qca->spi_dev, msg);
+	spi_message_add_tail(&transfer, &msg);
+	ret = spi_sync(qca->spi_dev, &msg);
 
-	if (ret || (msg->actual_length != len)) {
+	if (ret || (msg.actual_length != len)) {
 		qcaspi_spi_error(qca);
 		return 0;
 	}
@@ -146,23 +151,25 @@ qcaspi_write_legacy(struct qcaspi *qca, u8 *src, u32 len)
 static u32
 qcaspi_read_burst(struct qcaspi *qca, u8 *dst, u32 len)
 {
-	struct spi_message *msg = &qca->spi_msg2;
+	struct spi_message msg;
 	__be16 cmd;
-	struct spi_transfer *transfer = &qca->spi_xfer2[0];
+	struct spi_transfer transfer[2];
 	int ret;
 
+	memset(&transfer, 0, sizeof(transfer));
+	spi_message_init(&msg);
+
 	cmd = cpu_to_be16(QCA7K_SPI_READ | QCA7K_SPI_EXTERNAL);
-	transfer->tx_buf = &cmd;
-	transfer->rx_buf = NULL;
-	transfer->len = QCASPI_CMD_LEN;
-	transfer = &qca->spi_xfer2[1];
-	transfer->tx_buf = NULL;
-	transfer->rx_buf = dst;
-	transfer->len = len;
+	transfer[0].tx_buf = &cmd;
+	transfer[0].len = QCASPI_CMD_LEN;
+	transfer[1].rx_buf = dst;
+	transfer[1].len = len;
 
-	ret = spi_sync(qca->spi_dev, msg);
+	spi_message_add_tail(&transfer[0], &msg);
+	spi_message_add_tail(&transfer[1], &msg);
+	ret = spi_sync(qca->spi_dev, &msg);
 
-	if (ret || (msg->actual_length != QCASPI_CMD_LEN + len)) {
+	if (ret || (msg.actual_length != QCASPI_CMD_LEN + len)) {
 		qcaspi_spi_error(qca);
 		return 0;
 	}
@@ -173,17 +180,20 @@ qcaspi_read_burst(struct qcaspi *qca, u8 *dst, u32 len)
 static u32
 qcaspi_read_legacy(struct qcaspi *qca, u8 *dst, u32 len)
 {
-	struct spi_message *msg = &qca->spi_msg1;
-	struct spi_transfer *transfer = &qca->spi_xfer1;
+	struct spi_message msg;
+	struct spi_transfer transfer;
 	int ret;
 
-	transfer->tx_buf = NULL;
-	transfer->rx_buf = dst;
-	transfer->len = len;
+	memset(&transfer, 0, sizeof(transfer));
+	spi_message_init(&msg);
 
-	ret = spi_sync(qca->spi_dev, msg);
+	transfer.rx_buf = dst;
+	transfer.len = len;
 
-	if (ret || (msg->actual_length != len)) {
+	spi_message_add_tail(&transfer, &msg);
+	ret = spi_sync(qca->spi_dev, &msg);
+
+	if (ret || (msg.actual_length != len)) {
 		qcaspi_spi_error(qca);
 		return 0;
 	}
@@ -195,19 +205,23 @@ static int
 qcaspi_tx_cmd(struct qcaspi *qca, u16 cmd)
 {
 	__be16 tx_data;
-	struct spi_message *msg = &qca->spi_msg1;
-	struct spi_transfer *transfer = &qca->spi_xfer1;
+	struct spi_message msg;
+	struct spi_transfer transfer;
 	int ret;
 
+	memset(&transfer, 0, sizeof(transfer));
+
+	spi_message_init(&msg);
+
 	tx_data = cpu_to_be16(cmd);
-	transfer->len = sizeof(tx_data);
-	transfer->tx_buf = &tx_data;
-	transfer->rx_buf = NULL;
+	transfer.len = sizeof(cmd);
+	transfer.tx_buf = &tx_data;
+	spi_message_add_tail(&transfer, &msg);
 
-	ret = spi_sync(qca->spi_dev, msg);
+	ret = spi_sync(qca->spi_dev, &msg);
 
 	if (!ret)
-		ret = msg->status;
+		ret = msg.status;
 
 	if (ret)
 		qcaspi_spi_error(qca);
@@ -835,16 +849,6 @@ qcaspi_netdev_setup(struct net_device *dev)
 	qca = netdev_priv(dev);
 	memset(qca, 0, sizeof(struct qcaspi));
 
-	memset(&qca->spi_xfer1, 0, sizeof(struct spi_transfer));
-	memset(&qca->spi_xfer2, 0, sizeof(struct spi_transfer) * 2);
-
-	spi_message_init(&qca->spi_msg1);
-	spi_message_add_tail(&qca->spi_xfer1, &qca->spi_msg1);
-
-	spi_message_init(&qca->spi_msg2);
-	spi_message_add_tail(&qca->spi_xfer2[0], &qca->spi_msg2);
-	spi_message_add_tail(&qca->spi_xfer2[1], &qca->spi_msg2);
-
 	memset(&qca->txr, 0, sizeof(qca->txr));
 	qca->txr.count = TX_RING_MAX_LEN;
 }
diff --git a/drivers/net/ethernet/qualcomm/qca_spi.h b/drivers/net/ethernet/qualcomm/qca_spi.h
index fc4beb1..fc0e987 100644
--- a/drivers/net/ethernet/qualcomm/qca_spi.h
+++ b/drivers/net/ethernet/qualcomm/qca_spi.h
@@ -83,11 +83,6 @@ struct qcaspi {
 	struct tx_ring txr;
 	struct qcaspi_stats stats;
 
-	struct spi_message spi_msg1;
-	struct spi_message spi_msg2;
-	struct spi_transfer spi_xfer1;
-	struct spi_transfer spi_xfer2[2];
-
 	u8 *rx_buffer;
 	u32 buffer_size;
 	u8 sync;
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 3/3] bnxt_en: Do not adjust max_cp_rings by the ones used by RDMA.
From: Michael Chan @ 2018-09-03  8:23 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1535962999-10013-1-git-send-email-michael.chan@broadcom.com>

Currently, the driver adjusts the bp->hw_resc.max_cp_rings by the number
of MSIX vectors used by RDMA.  There is one code path in open that needs
to check the true max_cp_rings including any used by RDMA.  This code
is now checking for the reduced max_cp_rings which will fail when the
number of cp rings is very small.

To fix this in a clean way, we don't adjust max_cp_rings anymore.
Instead, we add a helper bnxt_get_max_func_cp_rings_for_en() to get the
reduced max_cp_rings when appropriate.

Fixes: ec86f14ea506 ("bnxt_en: Add ULP calls to stop and restart IRQs.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c       | 7 ++++---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h       | 2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 7 ++++---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c   | 5 -----
 4 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 6472ce4..cecbb1d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -5913,9 +5913,9 @@ unsigned int bnxt_get_max_func_cp_rings(struct bnxt *bp)
 	return bp->hw_resc.max_cp_rings;
 }
 
-void bnxt_set_max_func_cp_rings(struct bnxt *bp, unsigned int max)
+unsigned int bnxt_get_max_func_cp_rings_for_en(struct bnxt *bp)
 {
-	bp->hw_resc.max_cp_rings = max;
+	return bp->hw_resc.max_cp_rings - bnxt_get_ulp_msix_num(bp);
 }
 
 static unsigned int bnxt_get_max_func_irqs(struct bnxt *bp)
@@ -8631,7 +8631,8 @@ static void _bnxt_get_max_rings(struct bnxt *bp, int *max_rx, int *max_tx,
 
 	*max_tx = hw_resc->max_tx_rings;
 	*max_rx = hw_resc->max_rx_rings;
-	*max_cp = min_t(int, hw_resc->max_irqs, hw_resc->max_cp_rings);
+	*max_cp = min_t(int, bnxt_get_max_func_cp_rings_for_en(bp),
+			hw_resc->max_irqs);
 	*max_cp = min_t(int, *max_cp, hw_resc->max_stat_ctxs);
 	max_ring_grps = hw_resc->max_hw_ring_grps;
 	if (BNXT_CHIP_TYPE_NITRO_A0(bp) && BNXT_PF(bp)) {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index c4c77b9..bde3846 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1481,7 +1481,7 @@ int bnxt_hwrm_set_coal(struct bnxt *);
 unsigned int bnxt_get_max_func_stat_ctxs(struct bnxt *bp);
 void bnxt_set_max_func_stat_ctxs(struct bnxt *bp, unsigned int max);
 unsigned int bnxt_get_max_func_cp_rings(struct bnxt *bp);
-void bnxt_set_max_func_cp_rings(struct bnxt *bp, unsigned int max);
+unsigned int bnxt_get_max_func_cp_rings_for_en(struct bnxt *bp);
 int bnxt_get_avail_msix(struct bnxt *bp, int num);
 int bnxt_reserve_rings(struct bnxt *bp);
 void bnxt_tx_disable(struct bnxt *bp);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index 6d583bc..fcd085a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -451,7 +451,7 @@ static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs)
 
 	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_FUNC_VF_RESOURCE_CFG, -1, -1);
 
-	vf_cp_rings = hw_resc->max_cp_rings - bp->cp_nr_rings;
+	vf_cp_rings = bnxt_get_max_func_cp_rings_for_en(bp) - bp->cp_nr_rings;
 	vf_stat_ctx = hw_resc->max_stat_ctxs - bp->num_stat_ctxs;
 	if (bp->flags & BNXT_FLAG_AGG_RINGS)
 		vf_rx_rings = hw_resc->max_rx_rings - bp->rx_nr_rings * 2;
@@ -549,7 +549,8 @@ static int bnxt_hwrm_func_cfg(struct bnxt *bp, int num_vfs)
 	max_stat_ctxs = hw_resc->max_stat_ctxs;
 
 	/* Remaining rings are distributed equally amongs VF's for now */
-	vf_cp_rings = (hw_resc->max_cp_rings - bp->cp_nr_rings) / num_vfs;
+	vf_cp_rings = (bnxt_get_max_func_cp_rings_for_en(bp) -
+		       bp->cp_nr_rings) / num_vfs;
 	vf_stat_ctx = (max_stat_ctxs - bp->num_stat_ctxs) / num_vfs;
 	if (bp->flags & BNXT_FLAG_AGG_RINGS)
 		vf_rx_rings = (hw_resc->max_rx_rings - bp->rx_nr_rings * 2) /
@@ -643,7 +644,7 @@ static int bnxt_sriov_enable(struct bnxt *bp, int *num_vfs)
 	 */
 	vfs_supported = *num_vfs;
 
-	avail_cp = hw_resc->max_cp_rings - bp->cp_nr_rings;
+	avail_cp = bnxt_get_max_func_cp_rings_for_en(bp) - bp->cp_nr_rings;
 	avail_stat = hw_resc->max_stat_ctxs - bp->num_stat_ctxs;
 	avail_cp = min_t(int, avail_cp, avail_stat);
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
index deac73e..beee612 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
@@ -169,7 +169,6 @@ static int bnxt_req_msix_vecs(struct bnxt_en_dev *edev, int ulp_id,
 		edev->ulp_tbl[ulp_id].msix_requested = avail_msix;
 	}
 	bnxt_fill_msix_vecs(bp, ent);
-	bnxt_set_max_func_cp_rings(bp, max_cp_rings - avail_msix);
 	edev->flags |= BNXT_EN_FLAG_MSIX_REQUESTED;
 	return avail_msix;
 }
@@ -178,7 +177,6 @@ static int bnxt_free_msix_vecs(struct bnxt_en_dev *edev, int ulp_id)
 {
 	struct net_device *dev = edev->net;
 	struct bnxt *bp = netdev_priv(dev);
-	int max_cp_rings, msix_requested;
 
 	ASSERT_RTNL();
 	if (ulp_id != BNXT_ROCE_ULP)
@@ -187,9 +185,6 @@ static int bnxt_free_msix_vecs(struct bnxt_en_dev *edev, int ulp_id)
 	if (!(edev->flags & BNXT_EN_FLAG_MSIX_REQUESTED))
 		return 0;
 
-	max_cp_rings = bnxt_get_max_func_cp_rings(bp);
-	msix_requested = edev->ulp_tbl[ulp_id].msix_requested;
-	bnxt_set_max_func_cp_rings(bp, max_cp_rings + msix_requested);
 	edev->ulp_tbl[ulp_id].msix_requested = 0;
 	edev->flags &= ~BNXT_EN_FLAG_MSIX_REQUESTED;
 	if (netif_running(dev)) {
-- 
2.5.1

^ permalink raw reply related

* [PATCH net 2/3] bnxt_en: Clean up unused functions.
From: Michael Chan @ 2018-09-03  8:23 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1535962999-10013-1-git-send-email-michael.chan@broadcom.com>

Remove unused bnxt_subtract_ulp_resources().  Change
bnxt_get_max_func_irqs() to static since it is only locally used.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  1 -
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 15 ---------------
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h |  1 -
 4 files changed, 1 insertion(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 6a1baf3..6472ce4 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -5918,7 +5918,7 @@ void bnxt_set_max_func_cp_rings(struct bnxt *bp, unsigned int max)
 	bp->hw_resc.max_cp_rings = max;
 }
 
-unsigned int bnxt_get_max_func_irqs(struct bnxt *bp)
+static unsigned int bnxt_get_max_func_irqs(struct bnxt *bp)
 {
 	struct bnxt_hw_resc *hw_resc = &bp->hw_resc;
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index fefa011..c4c77b9 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1482,7 +1482,6 @@ unsigned int bnxt_get_max_func_stat_ctxs(struct bnxt *bp);
 void bnxt_set_max_func_stat_ctxs(struct bnxt *bp, unsigned int max);
 unsigned int bnxt_get_max_func_cp_rings(struct bnxt *bp);
 void bnxt_set_max_func_cp_rings(struct bnxt *bp, unsigned int max);
-unsigned int bnxt_get_max_func_irqs(struct bnxt *bp);
 int bnxt_get_avail_msix(struct bnxt *bp, int num);
 int bnxt_reserve_rings(struct bnxt *bp);
 void bnxt_tx_disable(struct bnxt *bp);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
index c37b284..deac73e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
@@ -220,21 +220,6 @@ int bnxt_get_ulp_msix_base(struct bnxt *bp)
 	return 0;
 }
 
-void bnxt_subtract_ulp_resources(struct bnxt *bp, int ulp_id)
-{
-	ASSERT_RTNL();
-	if (bnxt_ulp_registered(bp->edev, ulp_id)) {
-		struct bnxt_en_dev *edev = bp->edev;
-		unsigned int msix_req, max;
-
-		msix_req = edev->ulp_tbl[ulp_id].msix_requested;
-		max = bnxt_get_max_func_cp_rings(bp);
-		bnxt_set_max_func_cp_rings(bp, max - msix_req);
-		max = bnxt_get_max_func_stat_ctxs(bp);
-		bnxt_set_max_func_stat_ctxs(bp, max - 1);
-	}
-}
-
 static int bnxt_send_msg(struct bnxt_en_dev *edev, int ulp_id,
 			 struct bnxt_fw_msg *fw_msg)
 {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
index df48ac7..d9bea37 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
@@ -90,7 +90,6 @@ static inline bool bnxt_ulp_registered(struct bnxt_en_dev *edev, int ulp_id)
 
 int bnxt_get_ulp_msix_num(struct bnxt *bp);
 int bnxt_get_ulp_msix_base(struct bnxt *bp);
-void bnxt_subtract_ulp_resources(struct bnxt *bp, int ulp_id);
 void bnxt_ulp_stop(struct bnxt *bp);
 void bnxt_ulp_start(struct bnxt *bp);
 void bnxt_ulp_sriov_cfg(struct bnxt *bp, int num_vfs);
-- 
2.5.1

^ permalink raw reply related

* [PATCH net 1/3] bnxt_en: Fix firmware signaled resource change logic in open.
From: Michael Chan @ 2018-09-03  8:23 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1535962999-10013-1-git-send-email-michael.chan@broadcom.com>

When the driver detects that resources have changed during open, it
should reset the rx and tx rings to 0.  This will properly setup the
init sequence to initialize the default rings again.  We also need
to signal the RDMA driver to stop and clear its interrupts.  We then
call the RoCE driver to restart if a new set of default rings is
successfully reserved.

Fixes: 25e1acd6b92b ("bnxt_en: Notify firmware about IF state changes.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 8bb1e38..6a1baf3 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6684,6 +6684,8 @@ static int bnxt_hwrm_if_change(struct bnxt *bp, bool up)
 		hw_resc->resv_rx_rings = 0;
 		hw_resc->resv_hw_ring_grps = 0;
 		hw_resc->resv_vnics = 0;
+		bp->tx_nr_rings = 0;
+		bp->rx_nr_rings = 0;
 	}
 	return rc;
 }
@@ -8769,20 +8771,25 @@ static int bnxt_init_dflt_ring_mode(struct bnxt *bp)
 	if (bp->tx_nr_rings)
 		return 0;
 
+	bnxt_ulp_irq_stop(bp);
+	bnxt_clear_int_mode(bp);
 	rc = bnxt_set_dflt_rings(bp, true);
 	if (rc) {
 		netdev_err(bp->dev, "Not enough rings available.\n");
-		return rc;
+		goto init_dflt_ring_err;
 	}
 	rc = bnxt_init_int_mode(bp);
 	if (rc)
-		return rc;
+		goto init_dflt_ring_err;
+
 	bp->tx_nr_rings_per_tc = bp->tx_nr_rings;
 	if (bnxt_rfs_supported(bp) && bnxt_rfs_capable(bp)) {
 		bp->flags |= BNXT_FLAG_RFS;
 		bp->dev->features |= NETIF_F_NTUPLE;
 	}
-	return 0;
+init_dflt_ring_err:
+	bnxt_ulp_irq_restart(bp, rc);
+	return rc;
 }
 
 int bnxt_restore_pf_fw_resources(struct bnxt *bp)
-- 
2.5.1

^ permalink raw reply related

* [PATCH net 0/3] bnxt_en: Bug fixes.
From: Michael Chan @ 2018-09-03  8:23 UTC (permalink / raw)
  To: davem; +Cc: netdev

This short series fixes resource related logic in the driver, mostly
affecting the RDMA driver under corner cases.

Michael Chan (3):
  bnxt_en: Fix firmware signaled resource change logic in open.
  bnxt_en: Clean up unused functions.
  bnxt_en: Do not adjust max_cp_rings by the ones used by RDMA.

 drivers/net/ethernet/broadcom/bnxt/bnxt.c       | 22 +++++++++++++++-------
 drivers/net/ethernet/broadcom/bnxt/bnxt.h       |  3 +--
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c |  7 ++++---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c   | 20 --------------------
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h   |  1 -
 5 files changed, 20 insertions(+), 33 deletions(-)

-- 
2.5.1

^ permalink raw reply

* Re: [PATCH v2] xfrm6: call kfree_skb when skb is toobig
From: Steffen Klassert @ 2018-09-03  8:21 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: netdev, yoshfuji, kuznet, davem, herbert, eyal.birger, sd
In-Reply-To: <20180831113849.19909-1-cascardo@canonical.com>

On Fri, Aug 31, 2018 at 08:38:49AM -0300, Thadeu Lima de Souza Cascardo wrote:
> After commit d6990976af7c5d8f55903bfb4289b6fb030bf754 ("vti6: fix PMTU caching
> and reporting on xmit"), some too big skbs might be potentially passed down to
> __xfrm6_output, causing it to fail to transmit but not free the skb, causing a
> leak of skb, and consequentially a leak of dst references.
> 
> After running pmtu.sh, that shows as failure to unregister devices in a namespace:
> 
> [  311.397671] unregister_netdevice: waiting for veth_b to become free. Usage count = 1
> 
> The fix is to call kfree_skb in case of transmit failures.
> 
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
> Fixes: dd767856a36e ("xfrm6: Don't call icmpv6_send on local error")

Applied, thanks a lot!

^ permalink raw reply

* Re: [PATCH ipsec-next 1/2] xfrm: reset transport header back to network header after all input transforms ahave been applied
From: Steffen Klassert @ 2018-09-03  8:20 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: netdev, davem
In-Reply-To: <9cd59eabecd0263b3d8960f3c0c7de435094b82a.1535712205.git.sowmini.varadhan@oracle.com>

On Sun, Sep 02, 2018 at 04:18:42PM -0700, Sowmini Varadhan wrote:
> A policy may have been set up with multiple transforms (e.g., ESP
> and ipcomp). In this situation, the ingress IPsec processing
> iterates in xfrm_input() and applies each transform in turn,
> processing the nexthdr to find any additional xfrm that may apply.
> 
> This patch resets the transport header back to network header
> only after the last transformation so that subsequent xfrms
> can find the correct transport header.
> 
> Suggested-by: Steffen Klassert <steffen.klassert@secunet.com>
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>

This patch is a fix, so please add a proper fixes tag
so that it can be backported to the stable trees.
Same applied to the other patch in this series.

^ permalink raw reply

* [PATCH net-next] i40e: remove inline directive
From: YueHaibing @ 2018-09-03 12:36 UTC (permalink / raw)
  To: davem, jeffrey.t.kirsher
  Cc: linux-kernel, netdev, intel-wired-lan, YueHaibing

Fixes follow gcc warning:

drivers/net/ethernet/intel/i40e/i40e_ethtool.c: In function '__i40e_add_stat_strings':
drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h:193:20: error: function '__i40e_add_stat_strings' can never be inlined because it uses variable argument lists

Fixes: 8fd75c58a09a ("i40e: move ethtool stats boiler plate code to i40e_ethtool_stats.h")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h b/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
index bba1cb0..0290ade 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
@@ -190,7 +190,7 @@ i40e_add_queue_stats(u64 **data, struct i40e_ring *ring)
  * Format and copy the strings described by stats into the buffer pointed at
  * by p.
  **/
-static inline void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
+static void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
 				    const unsigned int size, ...)
 {
 	unsigned int i;
-- 
2.7.0

^ permalink raw reply related

* Re: [PATCH RFC net-next 00/11] udp gso
From: Steffen Klassert @ 2018-09-03  8:02 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Paolo Abeni, Sowmini Varadhan, Network Development,
	Willem de Bruijn
In-Reply-To: <CAF=yD-Ls-nn2Uh0bVveMGgS_Dsp7eeKpJvvo6rmuq1iuwwrrWg@mail.gmail.com>

On Fri, Aug 31, 2018 at 09:08:59AM -0400, Willem de Bruijn wrote:
> On Fri, Aug 31, 2018 at 5:09 AM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > Hi,
> >
> > On Tue, 2018-04-17 at 17:07 -0400, Willem de Bruijn wrote:
> > > That said, for negotiated flows an inverse GRO feature could
> > > conceivably be implemented to reduce rx stack traversal, too.
> > > Though due to interleaving of packets on the wire, it aggregation
> > > would be best effort, similar to TCP TSO and GRO using the
> > > PSH bit as packetization signal.
> >
> > Reviving this old thread, before I forgot again. I have some local
> > patches implementing UDP GRO in a dual way to current GSO_UDP_L4
> > implementation: several datagram with the same length are aggregated
> > into a single one, and the user space receive a single larger packet
> > instead of multiple ones. I hope quic can leverage such scenario, but I
> > really know nothing about the protocol.
> >
> > I measure roughly a 50% performance improvement with udpgso_bench in
> > respect to UDP GSO, and ~100% using a pktgen sender, and a reduced CPU
> > usage on the receiver[1]. Some additional hacking to the general GRO
> > bits is required to avoid useless socket lookups for ingress UDP
> > packets when UDP_GSO is not enabled.
> >
> > If there is interest on this topic, I can share some RFC patches
> > (hopefully somewhat next week).
> 
> As Eric pointed out, QUIC reception on mobile clients over the WAN
> may not see much gain. But apparently there is a non-trivial amount
> of traffic the other way, to servers. Again, WAN might limit whatever
> gain we get, but I do want to look into that. And there are other UDP high
> throughput workloads (with or without QUIC) between servers.
> 
> If you have patches, please do share them. I actually also have a rough
> patch that I did not consider ready to share yet. Based on Tom's existing
> socket lookup in udp_gro_receive to detect whether a local destination
> exists and whether it has set an option to support receiving coalesced
> payloads (along with a cmsg to share the segment size).
> 
> Converting udp_recvmsg to split apart gso packets to make this
> transparent seems to me to be too complex and not worth the effort.
> 
> If a local socket is not found in udp_gro_receive, this could also be
> tentative interpreted as a non-local path (with false positives), enabling
> transparent use of GRO + GSO batching on the forwarding path.

On forwarding, it might be even better to build packet lists instead
of merging the payload. This would save the cost of the merge at GRO
and the split at GSO (at least when the split happens in software).

I'm working on patches that builds such skb lists. The list is chained
at the frag_list pointer of the first skb, all subsequent skbs are linked
to the next pointer of the skb. It looks like this:

|---------|        |---------|        |---------|
|flow 1   |        |flow 1   |        |flow 1   |
|---------|        |---------|        |---------|
|frag_list|<-\     |frag_list|        |frag_list|
|---------|   \    |---------|        |---------|
|next     |    \---|next     |<-------|next     |
|---------|        |---------|        |---------|

Furthermore, I'm trying to integrate this with the new skb listification
work. Instead of just linking the packets into the lists as they arrive,
I try to sort them into flows, so that packets of the same flow can
travel together and lookups etc. are just done for one representative
skb of a given flow.

Here I build the packet lists like this:


|---------|        |---------|        |---------|
|flow 1   |        |flow 1   |        |flow 1   |
|---------|        |---------|        |---------|
|frag_list|<-\     |frag_list|        |frag_list|
|---------|   \    |---------|        |---------|
|next     |<-\ \---|next     |<-------|next     |
|---------|   \    |---------|        |---------|
              |
              |
              |    |---------|        |---------|        |---------|
              |    |flow 2   |        |flow 2   |        |flow 2   |
              |    |---------|        |---------|        |---------|
              |    |frag_list|<-\     |frag_list|        |frag_list|
              |    |---------|   \    |---------|        |---------|
              |----|next     |<-\ \---|next     |<-------|next     |
                   |---------|   \    |---------|        |---------|
                                 |
                                 |
                                 |    |---------|        |---------|       |---------|
                                 |    |flow 3   |        |flow 3   |       |flow 3   |
                                 |    |---------|        |---------|       |---------|
                                 |    |frag_list|<-\     |frag_list|       |frag_list|
                                 |    |---------|   \    |---------|       |---------|
                                 |----|next     |    \---|next     |<------|next     |
                                      |---------|        |---------|       |---------|

I've tried plumb it through to the forwarding/output path
and to the UDP receive queue. I have a basic working version,
but there are still unresolved things (UDP encapsulation etc).
So for now it is just my playground. But I  thought it might
make sense to share the idea here...

^ permalink raw reply

* Re: pull-request: mac80211 2018-09-03
From: Johannes Berg @ 2018-09-03 12:19 UTC (permalink / raw)
  To: David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20180903121546.27673-1-johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>

On Mon, 2018-09-03 at 14:15 +0200, Johannes Berg wrote:
> Hi Dave,
> 
> This time around for mac80211 I have a larger than usual number of
> fixes, in part because Luca dumped our (Intel's) patches out after
> quite a while - we'll try to make sure this doesn't happen again.
> 
> Shortlog below, as usual, each fix is pretty self-contained but it
> adds up to quite a bit overall.
> 
> Please pull and let me know if there's any problem.

Oh, forgot: if/when you merge this, could you pull it into net-next at
some point? I have a patch that depends on this but isn't really net
material - it can wait though, so no hurry.

johannes

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox