Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH V3 net 1/2] openvswitch: Properly set L4 keys on "later" IP fragments
From: David Miller @ 2019-08-28 21:54 UTC (permalink / raw)
  To: gvrose8192; +Cc: netdev, pshelar, joe
In-Reply-To: <1566917890-22304-1-git-send-email-gvrose8192@gmail.com>

From: Greg Rose <gvrose8192@gmail.com>
Date: Tue, 27 Aug 2019 07:58:09 -0700

> When IP fragments are reassembled before being sent to conntrack, the
> key from the last fragment is used.  Unless there are reordering
> issues, the last fragment received will not contain the L4 ports, so the
> key for the reassembled datagram won't contain them.  This patch updates
> the key once we have a reassembled datagram.
> 
> The handle_fragments() function works on L3 headers so we pull the L3/L4
> flow key update code from key_extract into a new function
> 'key_extract_l3l4'.  Then we add a another new function
> ovs_flow_key_update_l3l4() and export it so that it is accessible by
> handle_fragments() for conntrack packet reassembly.
> 
> Co-authored by: Justin Pettit <jpettit@ovn.org>
> Signed-off-by: Greg Rose <gvrose8192@gmail.com>

Applied with Co-authored-by fixed.

^ permalink raw reply

* Re: [PATCH V3 net 2/2] openvswitch: Clear the L4 portion of the key for "later" fragments.
From: David Miller @ 2019-08-28 21:54 UTC (permalink / raw)
  To: gvrose8192; +Cc: netdev, pshelar, joe, jpettit
In-Reply-To: <1566917890-22304-2-git-send-email-gvrose8192@gmail.com>

From: Greg Rose <gvrose8192@gmail.com>
Date: Tue, 27 Aug 2019 07:58:10 -0700

> From: Justin Pettit <jpettit@ovn.org>
> 
> Only the first fragment in a datagram contains the L4 headers.  When the
> Open vSwitch module parses a packet, it always sets the IP protocol
> field in the key, but can only set the L4 fields on the first fragment.
> The original behavior would not clear the L4 portion of the key, so
> garbage values would be sent in the key for "later" fragments.  This
> patch clears the L4 fields in that circumstance to prevent sending those
> garbage values as part of the upcall.
> 
> Signed-off-by: Justin Pettit <jpettit@ovn.org>

Applied.

^ permalink raw reply

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Alexei Starovoitov @ 2019-08-28 22:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, Alexei Starovoitov, Kees Cook, LSM List,
	James Morris, Jann Horn, Masami Hiramatsu, Steven Rostedt,
	David S. Miller, Daniel Borkmann, Network Development, bpf,
	kernel-team, Linux API
In-Reply-To: <20190828071421.GK2332@hirez.programming.kicks-ass.net>

On Wed, Aug 28, 2019 at 09:14:21AM +0200, Peter Zijlstra wrote:
> On Tue, Aug 27, 2019 at 04:01:08PM -0700, Andy Lutomirski wrote:
> 
> > > Tracing:
> > >
> > > CAP_BPF and perf_paranoid_tracepoint_raw() (which is kernel.perf_event_paranoid == -1)
> > > are necessary to:
> 
> That's not tracing, that's perf.
> 
> > > +bool cap_bpf_tracing(void)
> > > +{
> > > +       return capable(CAP_SYS_ADMIN) ||
> > > +              (capable(CAP_BPF) && !perf_paranoid_tracepoint_raw());
> > > +}
> 
> A whole long time ago, I proposed we introduce CAP_PERF or something
> along those lines; as a replacement for that horrible crap Android and
> Debian ship. But nobody was ever interested enough.
> 
> The nice thing about that is that you can then disallow perf/tracing in
> general, but tag the perf executable (and similar tools) with the
> capability so that unpriv users can still use it, but only limited
> through the tool, not the syscalls directly.

Exactly.
Similar motivation for CAP_BPF as well.

re: your first comment above.
I'm not sure what difference you see in words 'tracing' and 'perf'.
I really hope we don't partition the overall tracing category
into CAP_PERF and CAP_FTRACE only because these pieces are maintained
by different people.
On one side perf_event_open() isn't really doing tracing (as step by
step ftracing of function sequences), but perf_event_open() opens
an event and the sequence of events (may include IP) becomes a trace.
imo CAP_TRACING is the best name to descibe the privileged space
of operations possible via perf_event_open, ftrace, kprobe, stack traces, etc.

Another reason are kuprobes. They can be crated via perf_event_open
and via tracefs. Are they in CAP_PERF or in CAP_FTRACE ? In both, right?
Should then CAP_KPROBE be used ? that would be an overkill.
It would partition the space even further without obvious need.

Looking from BPF angle... BPF doesn't have integration with ftrace yet.
bpf_trace_printk is using ftrace mechanism, but that's 1% of ftrace.
In the long run I really like to see bpf using all of ftrace.
Whereas bpf is using a lot of 'perf'.
And extending some perf things in bpf specific way.
Take a look at how BPF_F_STACK_BUILD_ID. It's clearly perf/stack_tracing
feature that generic perf can use one day.
Currently it sits in bpf land and accessible via bpf only.
Though its bpf only today I categorize it under CAP_TRACING.

I think CAP_TRACING privilege should allow task to do all of perf_event_open,
kuprobe, stack trace, ftrace, and kallsyms.
We can think of some exceptions that should stay under CAP_SYS_ADMIN,
but most of the functionality available by 'perf' binary should be
usable with CAP_TRACING. 'perf' can do bpf too.
With CAP_BPF it would be all set.


^ permalink raw reply

* Re: general protection fault in tls_sk_proto_close (2)
From: Jakub Kicinski @ 2019-08-28 22:26 UTC (permalink / raw)
  To: john.fastabend
  Cc: syzbot, aviadye, borisp, daniel, davejwatson, davem, linux-kernel,
	netdev, syzkaller-bugs
In-Reply-To: <000000000000c3c461059127a1c4@google.com>

On Tue, 27 Aug 2019 23:38:07 -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    a55aa89a Linux 5.3-rc6
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=16c26ebc600000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=2a6a2b9826fdadf9
> dashboard link: https://syzkaller.appspot.com/bug?extid=7a6ee4d0078eac6bf782
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1112a4de600000
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+7a6ee4d0078eac6bf782@syzkaller.appspotmail.com

Hi John!

This is a loop where TLS calls it's own close function recursively.
It seems we must have gotten BPF installed on top of TLS, and then 
it handed TLS TLS'es own sk_proto via tcp_update_ulp().

Can BPF on top of TLS be prevented somehow?

Quick fix should probably be something like:

diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 43252a801c3f..3f4962756fa4 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -816,6 +816,9 @@ static void tls_update(struct sock *sk, struct proto *p)
 
        ctx = tls_get_ctx(sk);
        if (likely(ctx)) {
+               if (p->setsockopt == tls_setsockopt)
+                       return;
+
                ctx->sk_proto_close = p->close;
                ctx->sk_proto = p;
        } else {

> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault: 0000 [#1] PREEMPT SMP KASAN
> CPU: 1 PID: 10290 Comm: syz-executor.0 Not tainted 5.3.0-rc6 #120
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
> Google 01/01/2011
> RIP: 0010:tls_sk_proto_close+0xe5/0x990 net/tls/tls_main.c:298
> Code: 0f 85 3f 08 00 00 49 8b 84 24 c0 02 00 00 4d 8d 75 14 4c 89 f2 48 c1  
> ea 03 48 89 85 50 ff ff ff 48 b8 00 00 00 00 00 fc ff df <0f> b6 04 02 4c  
> 89 f2 83 e2 07 38 d0 7f 08 84 c0 0f 85 2e 06 00 00
> RSP: 0018:ffff88809b23fb90 EFLAGS: 00010203
> RAX: dffffc0000000000 RBX: dffffc0000000000 RCX: ffffffff862cb8db
> RDX: 0000000000000002 RSI: ffffffff862cb639 RDI: ffff8880a155ef00
> RBP: ffff88809b23fc48 R08: ffff888094344640 R09: ffffed10142abd9a
> R10: ffffed10142abd99 R11: ffff8880a155eccb R12: ffff8880a155ec40
> R13: 0000000000000000 R14: 0000000000000014 R15: 0000000000000001
> FS:  00005555556a8940(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f353458e000 CR3: 00000000a9174000 CR4: 00000000001406e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>   tls_sk_proto_close+0x35b/0x990 net/tls/tls_main.c:321
>   tcp_bpf_close+0x17c/0x390 net/ipv4/tcp_bpf.c:582
>   inet_release+0xed/0x200 net/ipv4/af_inet.c:427
>   inet6_release+0x53/0x80 net/ipv6/af_inet6.c:470
>   __sock_release+0xce/0x280 net/socket.c:590
>   sock_close+0x1e/0x30 net/socket.c:1268
>   __fput+0x2ff/0x890 fs/file_table.c:280
>   ____fput+0x16/0x20 fs/file_table.c:313
>   task_work_run+0x145/0x1c0 kernel/task_work.c:113
>   tracehook_notify_resume include/linux/tracehook.h:188 [inline]
>   exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
>   prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
>   syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
>   do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x413540
> Code: 01 f0 ff ff 0f 83 30 1b 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f  
> 44 00 00 83 3d 4d 2d 66 00 00 75 14 b8 03 00 00 00 0f 05 <48> 3d 01 f0 ff  
> ff 0f 83 04 1b 00 00 c3 48 83 ec 08 e8 0a fc ff ff
> RSP: 002b:00007fff5d481778 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
> RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000413540
> RDX: 0000001b2e520000 RSI: 0000000000000000 RDI: 0000000000000005
> RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffffffffffff
> R10: 0000000000000000 R11: 0000000000000246 R12: 000000000075bf20
> R13: 0000000000000003 R14: 0000000000761220 R15: ffffffffffffffff
> Modules linked in:
> ---[ end trace bdfd4385a0f1f76d ]---
> RIP: 0010:tls_sk_proto_close+0xe5/0x990 net/tls/tls_main.c:298
> Code: 0f 85 3f 08 00 00 49 8b 84 24 c0 02 00 00 4d 8d 75 14 4c 89 f2 48 c1  
> ea 03 48 89 85 50 ff ff ff 48 b8 00 00 00 00 00 fc ff df <0f> b6 04 02 4c  
> 89 f2 83 e2 07 38 d0 7f 08 84 c0 0f 85 2e 06 00 00
> RSP: 0018:ffff88809b23fb90 EFLAGS: 00010203
> RAX: dffffc0000000000 RBX: dffffc0000000000 RCX: ffffffff862cb8db
> RDX: 0000000000000002 RSI: ffffffff862cb639 RDI: ffff8880a155ef00
> RBP: ffff88809b23fc48 R08: ffff888094344640 R09: ffffed10142abd9a
> R10: ffffed10142abd99 R11: ffff8880a155eccb R12: ffff8880a155ec40
> R13: 0000000000000000 R14: 0000000000000014 R15: 0000000000000001
> FS:  00005555556a8940(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f353458e000 CR3: 00000000a9174000 CR4: 00000000001406e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> 
> 
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
> 
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> syzbot can test patches for this bug, for details see:
> https://goo.gl/tpsmEJ#testing-patches


^ permalink raw reply related

* Re: [net-next 10/15] i40e: fix hw_dbg usage in i40e_hmc_get_object_va
From: Jakub Kicinski @ 2019-08-28 22:38 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: davem, Mauro S. M. Rodrigues, netdev, nhorman, sassmann,
	Andrew Bowers
In-Reply-To: <20190828064407.30168-11-jeffrey.t.kirsher@intel.com>

On Tue, 27 Aug 2019 23:44:02 -0700, Jeff Kirsher wrote:
> @@ -982,6 +983,7 @@ i40e_status i40e_hmc_get_object_va(struct i40e_hmc_info *hmc_info,
>  	struct i40e_hmc_sd_entry *sd_entry;
>  	struct i40e_hmc_pd_entry *pd_entry;
>  	u32 pd_idx, pd_lmt, rel_pd_idx;
> +	struct i40e_hmc_info *hmc_info = &hw->hmc;

reverse xmas tree

>  	u64 obj_offset_in_fpm;
>  	u32 sd_idx, sd_lmt;
>  

^ permalink raw reply

* Re: [net-next 11/15] i40e: Implement debug macro hw_dbg using pr_debug
From: Jakub Kicinski @ 2019-08-28 22:39 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: davem, Mauro S. M. Rodrigues, netdev, nhorman, sassmann,
	Andrew Bowers
In-Reply-To: <20190828064407.30168-12-jeffrey.t.kirsher@intel.com>

On Tue, 27 Aug 2019 23:44:03 -0700, Jeff Kirsher wrote:
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_osdep.h b/drivers/net/ethernet/intel/i40e/i40e_osdep.h
> index a07574bff550..c0c9ce3eab23 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_osdep.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_osdep.h
> @@ -18,7 +18,12 @@
>   * actual OS primitives
>   */
>  
> -#define hw_dbg(hw, S, A...)	do {} while (0)
> +#define hw_dbg(hw, S, A...)							\
> +do {										\
> +	int domain = pci_domain_nr(((struct i40e_pf *)(hw)->back)->pdev->bus);	\
> +	pr_debug("i40e %04x:%02x:%02x.%x " S, domain, (hw)->bus.bus_id,		\
> +		 (hw)->bus.device, (hw)->bus.func, ## A);			\

This looks like open coded dev_dbg() / dev_name(), why?

> +} while (0)

^ permalink raw reply

* Re: [net-next 13/15] ixgbe: sync the first fragment unconditionally
From: Jakub Kicinski @ 2019-08-28 22:40 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: davem, Firo Yang, netdev, nhorman, sassmann, Alexander Duyck,
	Andrew Bowers
In-Reply-To: <20190828064407.30168-14-jeffrey.t.kirsher@intel.com>

On Tue, 27 Aug 2019 23:44:05 -0700, Jeff Kirsher wrote:
> Fixes: f3213d932173 ("ixgbe: Update driver to make use of DMA
> attributes in Rx path")

wrapped tag

^ permalink raw reply

* Re: [PATCH net] ibmvnic: Do not process reset during or after device removal
From: David Miller @ 2019-08-28 22:46 UTC (permalink / raw)
  To: tlfalcon; +Cc: netdev, linuxppc-dev
In-Reply-To: <1566922204-8770-1-git-send-email-tlfalcon@linux.ibm.com>

From: Thomas Falcon <tlfalcon@linux.ibm.com>
Date: Tue, 27 Aug 2019 11:10:04 -0500

> Currently, the ibmvnic driver will not schedule device resets
> if the device is being removed, but does not check the device
> state before the reset is actually processed. This leads to a race
> where a reset is scheduled with a valid device state but is
> processed after the driver has been removed, resulting in an oops.
> 
> Fix this by checking the device state before processing a queued
> reset event.
> 
> Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
> Tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
> Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>

Applied.

^ permalink raw reply

* Re: [net-next 00/15][pull request] 100GbE Intel Wired LAN Driver Updates 2019-08-26
From: David Miller @ 2019-08-28 22:47 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann
In-Reply-To: <20190827163832.8362-1-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 27 Aug 2019 09:38:17 -0700

> This series contains updates to ice driver only.

Pulled, but I have to agree with Jakub that using CPP ifdefs to control the
"static"'ness of a function is pushing the boundaries of good taste at best.

^ permalink raw reply

* [v1] net_sched: act_police: add 2 new attributes to support police 64bit rate and peakrate
From: David Dai @ 2019-08-28 22:51 UTC (permalink / raw)
  To: jhs, xiyou.wangcong, jiri, davem, netdev, linux-kernel; +Cc: zdai, zdai

For high speed adapter like Mellanox CX-5 card, it can reach upto
100 Gbits per second bandwidth. Currently htb already supports 64bit rate
in tc utility. However police action rate and peakrate are still limited
to 32bit value (upto 32 Gbits per second). Add 2 new attributes
TCA_POLICE_RATE64 and TCA_POLICE_RATE64 in kernel for 64bit support
so that tc utility can use them for 64bit rate and peakrate value to
break the 32bit limit, and still keep the backward binary compatibility.

Tested-by: David Dai <zdai@linux.vnet.ibm.com>
Signed-off-by: David Dai <zdai@linux.vnet.ibm.com>
---
 include/uapi/linux/pkt_cls.h |    2 ++
 net/sched/act_police.c       |   27 +++++++++++++++++++++++----
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index b057aee..eb4ea4d 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -159,6 +159,8 @@ enum {
 	TCA_POLICE_AVRATE,
 	TCA_POLICE_RESULT,
 	TCA_POLICE_TM,
+	TCA_POLICE_RATE64,
+	TCA_POLICE_PEAKRATE64,
 	TCA_POLICE_PAD,
 	__TCA_POLICE_MAX
 #define TCA_POLICE_RESULT TCA_POLICE_RESULT
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 49cec3e..ed5372e 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -40,6 +40,8 @@ static int tcf_police_walker(struct net *net, struct sk_buff *skb,
 	[TCA_POLICE_PEAKRATE]	= { .len = TC_RTAB_SIZE },
 	[TCA_POLICE_AVRATE]	= { .type = NLA_U32 },
 	[TCA_POLICE_RESULT]	= { .type = NLA_U32 },
+	[TCA_POLICE_RATE64]     = { .type = NLA_U64 },
+	[TCA_POLICE_PEAKRATE64] = { .type = NLA_U64 },
 };
 
 static int tcf_police_init(struct net *net, struct nlattr *nla,
@@ -58,6 +60,7 @@ static int tcf_police_init(struct net *net, struct nlattr *nla,
 	struct tcf_police_params *new;
 	bool exists = false;
 	u32 index;
+	u64 rate64, prate64;
 
 	if (nla == NULL)
 		return -EINVAL;
@@ -155,14 +158,18 @@ static int tcf_police_init(struct net *net, struct nlattr *nla,
 	}
 	if (R_tab) {
 		new->rate_present = true;
-		psched_ratecfg_precompute(&new->rate, &R_tab->rate, 0);
+		rate64 = tb[TCA_POLICE_RATE64] ?
+			 nla_get_u64(tb[TCA_POLICE_RATE64]) : 0;
+		psched_ratecfg_precompute(&new->rate, &R_tab->rate, rate64);
 		qdisc_put_rtab(R_tab);
 	} else {
 		new->rate_present = false;
 	}
 	if (P_tab) {
 		new->peak_present = true;
-		psched_ratecfg_precompute(&new->peak, &P_tab->rate, 0);
+		prate64 = tb[TCA_POLICE_PEAKRATE64] ?
+			  nla_get_u64(tb[TCA_POLICE_PEAKRATE64]) : 0;
+		psched_ratecfg_precompute(&new->peak, &P_tab->rate, prate64);
 		qdisc_put_rtab(P_tab);
 	} else {
 		new->peak_present = false;
@@ -313,10 +320,22 @@ static int tcf_police_dump(struct sk_buff *skb, struct tc_action *a,
 				      lockdep_is_held(&police->tcf_lock));
 	opt.mtu = p->tcfp_mtu;
 	opt.burst = PSCHED_NS2TICKS(p->tcfp_burst);
-	if (p->rate_present)
+	if (p->rate_present) {
 		psched_ratecfg_getrate(&opt.rate, &p->rate);
-	if (p->peak_present)
+		if ((police->params->rate.rate_bytes_ps >= (1ULL << 32)) &&
+		    nla_put_u64_64bit(skb, TCA_POLICE_RATE64,
+				      police->params->rate.rate_bytes_ps,
+				      TCA_POLICE_PAD))
+			goto nla_put_failure;
+	}
+	if (p->peak_present) {
 		psched_ratecfg_getrate(&opt.peakrate, &p->peak);
+		if ((police->params->peak.rate_bytes_ps >= (1ULL << 32)) &&
+		    nla_put_u64_64bit(skb, TCA_POLICE_PEAKRATE64,
+				      police->params->peak.rate_bytes_ps,
+				      TCA_POLICE_PAD))
+			goto nla_put_failure;
+	}
 	if (nla_put(skb, TCA_POLICE_TBF, sizeof(opt), &opt))
 		goto nla_put_failure;
 	if (p->tcfp_result &&
-- 
1.7.1


^ permalink raw reply related

* [v1] iproute2: police: support 64bit rate and peakrate in tc utility
From: David Dai @ 2019-08-28 22:52 UTC (permalink / raw)
  To: jhs, xiyou.wangcong, jiri, netdev, linux-kernel; +Cc: zdai, zdai

For high speed adapter like Mellanox CX-5 card, it can reach upto
100 Gbits per second bandwidth. Currently htb already supports 64bit rate
in tc utility. However police action rate and peakrate are still limited
to 32bit value (upto 32 Gbits per second). Taking advantage of the 2 new
attributes TCA_POLICE_RATE64 and TCA_POLICE_PEAKRATE64 from kernel,
tc can use them to break the 32bit limit, and still keep the backward 
binary compatibility.

Tested-by: David Dai <zdai@linux.vnet.ibm.com>
Signed-off-by: David Dai <zdai@linux.vnet.ibm.com>
---
 include/uapi/linux/pkt_cls.h |    2 +
 tc/m_police.c                |   64 +++++++++++++++++++++++++++--------------
 tc/tc_core.c                 |   29 +++++++++++++++++++
 tc/tc_core.h                 |    3 ++
 4 files changed, 76 insertions(+), 22 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index b057aee..eb4ea4d 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -159,6 +159,8 @@ enum {
 	TCA_POLICE_AVRATE,
 	TCA_POLICE_RESULT,
 	TCA_POLICE_TM,
+	TCA_POLICE_RATE64,
+	TCA_POLICE_PEAKRATE64,
 	TCA_POLICE_PAD,
 	__TCA_POLICE_MAX
 #define TCA_POLICE_RESULT TCA_POLICE_RESULT
diff --git a/tc/m_police.c b/tc/m_police.c
index 862a39f..abdbcce 100644
--- a/tc/m_police.c
+++ b/tc/m_police.c
@@ -71,6 +71,7 @@ static int act_parse_police(struct action_util *a, int *argc_p, char ***argv_p,
 	unsigned int linklayer = LINKLAYER_ETHERNET; /* Assume ethernet */
 	int Rcell_log =  -1, Pcell_log = -1;
 	struct rtattr *tail;
+	__u64 rate64 = 0, prate64 = 0;
 
 	if (a) /* new way of doing things */
 		NEXT_ARG();
@@ -121,11 +122,11 @@ static int act_parse_police(struct action_util *a, int *argc_p, char ***argv_p,
 			}
 		} else if (strcmp(*argv, "rate") == 0) {
 			NEXT_ARG();
-			if (p.rate.rate) {
+			if (rate64) {
 				fprintf(stderr, "Double \"rate\" spec\n");
 				return -1;
 			}
-			if (get_rate(&p.rate.rate, *argv)) {
+			if (get_rate64(&rate64, *argv)) {
 				explain1("rate");
 				return -1;
 			}
@@ -141,11 +142,11 @@ static int act_parse_police(struct action_util *a, int *argc_p, char ***argv_p,
 			}
 		} else if (matches(*argv, "peakrate") == 0) {
 			NEXT_ARG();
-			if (p.peakrate.rate) {
+			if (prate64) {
 				fprintf(stderr, "Double \"peakrate\" spec\n");
 				return -1;
 			}
-			if (get_rate(&p.peakrate.rate, *argv)) {
+			if (get_rate64(&prate64, *argv)) {
 				explain1("peakrate");
 				return -1;
 			}
@@ -189,23 +190,23 @@ action_ctrl_ok:
 	if (!ok)
 		return -1;
 
-	if (p.rate.rate && avrate)
+	if (rate64 && avrate)
 		return -1;
 
 	/* Must at least do late binding, use TB or ewma policing */
-	if (!p.rate.rate && !avrate && !p.index) {
+	if (!rate64 && !avrate && !p.index) {
 		fprintf(stderr, "\"rate\" or \"avrate\" MUST be specified.\n");
 		return -1;
 	}
 
 	/* When the TB policer is used, burst is required */
-	if (p.rate.rate && !buffer && !avrate) {
+	if (rate64 && !buffer && !avrate) {
 		fprintf(stderr, "\"burst\" requires \"rate\".\n");
 		return -1;
 	}
 
-	if (p.peakrate.rate) {
-		if (!p.rate.rate) {
+	if (prate64) {
+		if (!rate64) {
 			fprintf(stderr, "\"peakrate\" requires \"rate\".\n");
 			return -1;
 		}
@@ -215,22 +216,24 @@ action_ctrl_ok:
 		}
 	}
 
-	if (p.rate.rate) {
+	if (rate64) {
+		p.rate.rate = (rate64 >= (1ULL << 32)) ? ~0U : rate64;
 		p.rate.mpu = mpu;
 		p.rate.overhead = overhead;
-		if (tc_calc_rtable(&p.rate, rtab, Rcell_log, mtu,
-				   linklayer) < 0) {
+		if (tc_calc_rtable_64(&p.rate, rtab, Rcell_log, mtu,
+				   linklayer, rate64) < 0) {
 			fprintf(stderr, "POLICE: failed to calculate rate table.\n");
 			return -1;
 		}
-		p.burst = tc_calc_xmittime(p.rate.rate, buffer);
+		p.burst = tc_calc_xmittime(rate64, buffer);
 	}
 	p.mtu = mtu;
-	if (p.peakrate.rate) {
+	if (prate64) {
+		p.peakrate.rate = (prate64 >= (1ULL << 32)) ? ~0U : prate64;
 		p.peakrate.mpu = mpu;
 		p.peakrate.overhead = overhead;
-		if (tc_calc_rtable(&p.peakrate, ptab, Pcell_log, mtu,
-				   linklayer) < 0) {
+		if (tc_calc_rtable_64(&p.peakrate, ptab, Pcell_log, mtu,
+				   linklayer, prate64) < 0) {
 			fprintf(stderr, "POLICE: failed to calculate peak rate table.\n");
 			return -1;
 		}
@@ -238,10 +241,16 @@ action_ctrl_ok:
 
 	tail = addattr_nest(n, MAX_MSG, tca_id);
 	addattr_l(n, MAX_MSG, TCA_POLICE_TBF, &p, sizeof(p));
-	if (p.rate.rate)
+	if (rate64) {
 		addattr_l(n, MAX_MSG, TCA_POLICE_RATE, rtab, 1024);
-	if (p.peakrate.rate)
+		if (rate64 >= (1ULL << 32))
+			addattr64(n, MAX_MSG, TCA_POLICE_RATE64, rate64);
+	}
+	if (prate64) {
 		addattr_l(n, MAX_MSG, TCA_POLICE_PEAKRATE, ptab, 1024);
+		if (prate64 >= (1ULL << 32))
+			addattr64(n, MAX_MSG, TCA_POLICE_PEAKRATE64, prate64);
+	}
 	if (avrate)
 		addattr32(n, MAX_MSG, TCA_POLICE_AVRATE, avrate);
 	if (presult)
@@ -268,6 +277,7 @@ static int print_police(struct action_util *a, FILE *f, struct rtattr *arg)
 	struct rtattr *tb[TCA_POLICE_MAX+1];
 	unsigned int buffer;
 	unsigned int linklayer;
+	__u64 rate64, prate64;
 
 	if (arg == NULL)
 		return 0;
@@ -286,16 +296,26 @@ static int print_police(struct action_util *a, FILE *f, struct rtattr *arg)
 #endif
 	p = RTA_DATA(tb[TCA_POLICE_TBF]);
 
+	rate64 = p->rate.rate;
+	if (tb[TCA_POLICE_RATE64] &&
+	    RTA_PAYLOAD(tb[TCA_POLICE_RATE64]) >= sizeof(rate64))
+		rate64 = rta_getattr_u64(tb[TCA_POLICE_RATE64]);
+
 	fprintf(f, " police 0x%x ", p->index);
-	fprintf(f, "rate %s ", sprint_rate(p->rate.rate, b1));
-	buffer = tc_calc_xmitsize(p->rate.rate, p->burst);
+	fprintf(f, "rate %s ", sprint_rate(rate64, b1));
+	buffer = tc_calc_xmitsize(rate64, p->burst);
 	fprintf(f, "burst %s ", sprint_size(buffer, b1));
 	fprintf(f, "mtu %s ", sprint_size(p->mtu, b1));
 	if (show_raw)
 		fprintf(f, "[%08x] ", p->burst);
 
-	if (p->peakrate.rate)
-		fprintf(f, "peakrate %s ", sprint_rate(p->peakrate.rate, b1));
+	prate64 = p->peakrate.rate;
+	if (tb[TCA_POLICE_PEAKRATE64] &&
+	    RTA_PAYLOAD(tb[TCA_POLICE_PEAKRATE64]) >= sizeof(prate64))
+		prate64 = rta_getattr_u64(tb[TCA_POLICE_PEAKRATE64]);
+
+	if (prate64)
+		fprintf(f, "peakrate %s ", sprint_rate(prate64, b1));
 
 	if (tb[TCA_POLICE_AVRATE])
 		fprintf(f, "avrate %s ",
diff --git a/tc/tc_core.c b/tc/tc_core.c
index 8eb1122..498d35d 100644
--- a/tc/tc_core.c
+++ b/tc/tc_core.c
@@ -152,6 +152,35 @@ int tc_calc_rtable(struct tc_ratespec *r, __u32 *rtab,
 	return cell_log;
 }
 
+int tc_calc_rtable_64(struct tc_ratespec *r, __u32 *rtab,
+		   int cell_log, unsigned int mtu,
+		   enum link_layer linklayer, __u64 rate)
+{
+	int i;
+	unsigned int sz;
+	__u64 bps = rate;
+	unsigned int mpu = r->mpu;
+
+	if (mtu == 0)
+		mtu = 2047;
+
+	if (cell_log < 0) {
+		cell_log = 0;
+		while ((mtu >> cell_log) > 255)
+			cell_log++;
+	}
+
+	for (i = 0; i < 256; i++) {
+		sz = tc_adjust_size((i + 1) << cell_log, mpu, linklayer);
+		rtab[i] = tc_calc_xmittime(bps, sz);
+	}
+
+	r->cell_align =  -1;
+	r->cell_log = cell_log;
+	r->linklayer = (linklayer & TC_LINKLAYER_MASK);
+	return cell_log;
+}
+
 /*
    stab[pkt_len>>cell_log] = pkt_xmit_size>>size_log
  */
diff --git a/tc/tc_core.h b/tc/tc_core.h
index bd4a99f..40520e7 100644
--- a/tc/tc_core.h
+++ b/tc/tc_core.h
@@ -21,6 +21,9 @@ unsigned tc_calc_xmittime(__u64 rate, unsigned size);
 unsigned tc_calc_xmitsize(__u64 rate, unsigned ticks);
 int tc_calc_rtable(struct tc_ratespec *r, __u32 *rtab,
 		   int cell_log, unsigned mtu, enum link_layer link_layer);
+int tc_calc_rtable_64(struct tc_ratespec *r, __u32 *rtab,
+			int cell_log, unsigned mtu, enum link_layer link_layer,
+			__u64 rate);
 int tc_calc_size_table(struct tc_sizespec *s, __u16 **stab);
 
 int tc_setup_estimator(unsigned A, unsigned time_const, struct tc_estimator *est);
-- 
1.7.1


^ permalink raw reply related

* Re: [PATCH net] net: sched: act_sample: fix psample group handling on overwrite
From: David Miller @ 2019-08-28 22:54 UTC (permalink / raw)
  To: vladbu; +Cc: netdev, jhs, xiyou.wangcong, jiri, dcaratti
In-Reply-To: <20190827184938.1824-1-vladbu@mellanox.com>

From: Vlad Buslov <vladbu@mellanox.com>
Date: Tue, 27 Aug 2019 21:49:38 +0300

> Action sample doesn't properly handle psample_group pointer in overwrite
> case. Following issues need to be fixed:
> 
> - In tcf_sample_init() function RCU_INIT_POINTER() is used to set
>   s->psample_group, even though we neither setting the pointer to NULL, nor
>   preventing concurrent readers from accessing the pointer in some way.
>   Use rcu_swap_protected() instead to safely reset the pointer.
> 
> - Old value of s->psample_group is not released or deallocated in any way,
>   which results resource leak. Use psample_group_put() on non-NULL value
>   obtained with rcu_swap_protected().
> 
> - The function psample_group_put() that released reference to struct
>   psample_group pointed by rcu-pointer s->psample_group doesn't respect rcu
>   grace period when deallocating it. Extend struct psample_group with rcu
>   head and use kfree_rcu when freeing it.
> 
> Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action")
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Alexei Starovoitov @ 2019-08-28 22:55 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alexei Starovoitov, Kees Cook, LSM List, James Morris, Jann Horn,
	Peter Zijlstra, Masami Hiramatsu, Steven Rostedt, David S. Miller,
	Daniel Borkmann, Network Development, bpf, kernel-team, Linux API
In-Reply-To: <CALCETrW1o+Lazi2Ng6b9JN6jeJffgdW9f3HvqYhNo4TpHRXW=g@mail.gmail.com>

On Tue, Aug 27, 2019 at 11:12:29PM -0700, Andy Lutomirski wrote:
> >
> > >
> > > From the previous discussion, you want to make progress toward solving
> > > a lot of problems with CAP_BPF.  One of them was making BPF
> > > firewalling more generally useful. By making CAP_BPF grant the ability
> > > to read kernel memory, you will make administrators much more nervous
> > > to grant CAP_BPF.
> >
> > Andy, were your email hacked?
> > I explained several times that in this proposal
> > CAP_BPF _and_ CAP_TRACING _both_ are necessary to read kernel memory.
> > CAP_BPF alone is _not enough_.
> 
> You have indeed said this many times.  You've stated it as a matter of
> fact as though it cannot possibly discussed.  I'm asking you to
> justify it.

That's not how I see it.
I kept stating that both CAP_BPF and CAP_TRACING are necessary to read
kernel memory whereas you kept distorting my statement by dropping second
part and then making claims that "CAP_BPF grant the ability to read
kernel memory, you will make administrators much more nervous".

Just s/CAP_BPF/CAP_BPF and CAP_TRACING/ in this above sentence.
See that meaning suddenly changes?
Now administrators would be worried about tasks that have both at once.
They also would be worried about tasks that have CAP_TRACING alone,
because that's what allows probe_kernel_read().

> It seems like you are specifically trying to add a new switch to turn
> as much of BPF as possible on and off.  Why?

Didn't I explain it several times already with multiple examples
from systemd, daemons, bpftrace ?

Let's try again.
Take your laptop with linux distro.
You're the only user there. I'm assuming you're not sharing it with
partner and kids. This is my definition of 'single user system'.
You can sudo on it at any time, but obviously prefer to run as many
apps as possible without cap_sys_admin.
Now you found some awesome open source app on the web that monitors
the health of the kernel and will pop a nice message on a screen if
something is wrong. Currently this app needs root. You hesitate,
but the apps is so useful and it has strong upstream code review process
that you keep running it 24/7.
This is open source app. New versions come. You upgrade.
You have enough trust in that app that you keep running it as root.
But there is always a chance that new version doing accidentaly
something stupid as 'kill -9 -1'. It's an open source app at the end.

Now I come with this CAP* proposal to make this app safer.
I'm not making your system more secure and not making this app
more secure. I can only make your laptop safer for day to day work
by limiting the operations this app can do.
This particular app monitros the kernel via bpf and tracing.
Hence you can give it CAP_TRACING and CAP_BPF and drop the rest.

> > speaking of MDS... I already asked you to help investigate its
> > applicability with existing bpf exposure. Are you going to do that?
> 
> I am blissfully uninvolved in MDS, and I don't know all that much more
> about the overall mechanism than a random reader of tech news :)  ISTM
> there are two meaningful ways that BPF could be involved: a BPF
> program could leak info into the state exposed by MDS, or a BPF
> program could try to read that state.  From what little I understand,
> it's essentially inevitable that BPF leaks information into MDS state,
> and this is probably even controllable by an attacker that understands
> MDS in enough detail.    So the interesting questions are: can BPF be
> used to read MDS state and can BPF be used to leak information in a
> more useful way than the rest of the kernel to an attacker.

agree. that's exactly the question to ask.

> Keeping in mind that the kernel will flush MDS state on every exit to
> usermode, I think the most likely attack is to try to read MDS state
> with BPF.  This could happen, I suppose -- BPF programs can easily
> contain the usual speculation gadgets of "do something and read an
> address that depends on the outcome".  Fortunately, outside of
> bpf_probe_read(), AFAIK BPF programs can't directly touch user memory,
> and an attacker that is allowed to use bpf_probe_read() doesn't need
> MDS to read things.

true as well.
So what do we do with that sentence in Documentation/x86/mds.rst?
Nothing?
New hw bugs will keep coming.
All of them should get similar wording?
Your understanding of MDS and BPF is way above the average.
What other users suppose to do when they read such sentence?
I think they have no choice but to do kernel.unprivileged_bpf_disabled=1.
We, as a kernel community, are forcing the users into it.
Hence I really do not see a value in any proposal today that expands
unprivileged bpf usage.
Since kernel.unprivileged_bpf_disabled=1 all bpf is under cap_sys_admin.
It's not great from security and safety pov. Hence this CAP* proposal.

> > Please take a look at Jann's var1 exploit. Was it hard to run bpf prog
> > in controlled environment without test_run command ?
> >
> 
> Can you send me a link?

https://bugs.chromium.org/p/project-zero/issues/detail?id=1272
writeup_files.tar:kernel_leak_exploit_amd_pro_a8_9600_r7/bpf_stuff.c
Execution is as trivial as write(sockfd, "X", 1) line 405.


^ permalink raw reply

* Re: [PATCH net] tcp: inherit timestamp on mtu probe
From: David Miller @ 2019-08-28 22:57 UTC (permalink / raw)
  To: willemdebruijn.kernel; +Cc: netdev, edumazet, jakub.kicinski, willemb
In-Reply-To: <20190827190933.227725-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Tue, 27 Aug 2019 15:09:33 -0400

> From: Willem de Bruijn <willemb@google.com>
> 
> TCP associates tx timestamp requests with a byte in the bytestream.
> If merging skbs in tcp_mtu_probe, migrate the tstamp request.
> 
> Similar to MSG_EOR, do not allow moving a timestamp from any segment
> in the probe but the last. This to avoid merging multiple timestamps.
> 
> Tested with the packetdrill script at
> https://github.com/wdebruij/packetdrill/commits/mtu_probe-1
> 
> Link: http://patchwork.ozlabs.org/patch/1143278/#2232897
> Fixes: 4ed2d765dfac ("net-timestamp: TCP timestamping")
> Signed-off-by: Willem de Bruijn <willemb@google.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net-next 05/15] net: sgi: ioc3-eth: allocate space for desc rings only once
From: Jakub Kicinski @ 2019-08-28 22:58 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: Ralf Baechle, Paul Burton, James Hogan, David S. Miller,
	linux-mips, linux-kernel, netdev
In-Reply-To: <20190828140315.17048-6-tbogendoerfer@suse.de>

On Wed, 28 Aug 2019 16:03:04 +0200, Thomas Bogendoerfer wrote:
> Memory for descriptor rings are allocated/freed, when interface is
> brought up/down. Since the size of the rings is not changeable by
> hardware, we now allocate rings now during probe and free it, when
> device is removed.
> 
> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>

So the rings still get freed and allocated from ioc3_init()
but there's a set allocated from the start? I guess that makes 
some sense..

Most drivers will allocate rings in open() and free them in close().

^ permalink raw reply

* Re: [PATCH net] net/sched: pfifo_fast: fix wrong dereference in pfifo_fast_enqueue
From: David Miller @ 2019-08-28 22:58 UTC (permalink / raw)
  To: dcaratti; +Cc: xiyou.wangcong, jhs, jiri, netdev, pabeni, sbrivio, shuali
In-Reply-To: <d5a7a167ab57e035685445ee641840a0c5fd39ae.1566940693.git.dcaratti@redhat.com>

From: Davide Caratti <dcaratti@redhat.com>
Date: Tue, 27 Aug 2019 23:18:53 +0200

> Now that 'TCQ_F_CPUSTATS' bit can be cleared, depending on the value of
> 'TCQ_F_NOLOCK' bit in the parent qdisc, we can't assume anymore that
> per-cpu counters are there in the error path of skb_array_produce().
> Otherwise, the following splat can be seen:
 ...
> Fix this by testing the value of 'TCQ_F_CPUSTATS' bit in 'qdisc->flags',
> before dereferencing 'qdisc->cpu_qstats'.
> 
> Fixes: 8a53e616de29 ("net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too")
> CC: Paolo Abeni <pabeni@redhat.com>
> CC: Stefano Brivio <sbrivio@redhat.com>
> Reported-by: Li Shuang <shuali@redhat.com>
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>

Applied and queued up for v5.2 -stable, thanks.

^ permalink raw reply

* Re: [PATCH net-next] net/ncsi: add response handlers for PLDM over NC-SI
From: David Miller @ 2019-08-28 23:00 UTC (permalink / raw)
  To: benwei; +Cc: sam, netdev, linux-kernel, openbmc
In-Reply-To: <CH2PR15MB3686302D8210855E5AB643B1A3A00@CH2PR15MB3686.namprd15.prod.outlook.com>

From: Ben Wei <benwei@fb.com>
Date: Tue, 27 Aug 2019 23:03:53 +0000

> This patch adds handlers for PLDM over NC-SI command response.
> 
> This enables NC-SI driver recognizes the packet type so the responses don't get dropped as unknown packet type.
> 
> PLDM over NC-SI are not handled in kernel driver for now, but can be passed back to user space via Netlink for further handling.
> 
> Signed-off-by: Ben Wei <benwei@fb.com>

I don't know why but patchwork puts part of your patch into the commit message, see:

https://patchwork.ozlabs.org/patch/1154104/

It's probably an encoding issue or similar.

> +static int ncsi_rsp_handler_pldm(struct ncsi_request *nr) {
> +	return 0;
> +}
> +
>  static int ncsi_rsp_handler_netlink(struct ncsi_request *nr)  {

I know other functions in this file do it, but please put the openning
curly braces of a function on a separate line.

Thank you.

^ permalink raw reply

* Re: [PATCH net v3 0/2] r8152: fix side effect
From: David Miller @ 2019-08-28 23:02 UTC (permalink / raw)
  To: hayeswang; +Cc: netdev, nic_swsd, linux-kernel
In-Reply-To: <1394712342-15778-320-Taiwan-albertk@realtek.com>

From: Hayes Wang <hayeswang@realtek.com>
Date: Wed, 28 Aug 2019 09:51:40 +0800

> v3:
> Update the commit message for patch #1.
> 
> v2:
> Replace patch #2 with "r8152: remove calling netif_napi_del".
> 
> v1:
> The commit 0ee1f4734967 ("r8152: napi hangup fix after disconnect")
> add a check to avoid using napi_disable after netif_napi_del. However,
> the commit ffa9fec30ca0 ("r8152: set RTL8152_UNPLUG only for real
> disconnection") let the check useless.
> 
> Therefore, I revert commit 0ee1f4734967 ("r8152: napi hangup fix
> after disconnect") first, and add another patch to fix it.

Series applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next 06/15] net: sgi: ioc3-eth: get rid of ioc3_clean_rx_ring()
From: Jakub Kicinski @ 2019-08-28 23:02 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: Ralf Baechle, Paul Burton, James Hogan, David S. Miller,
	linux-mips, linux-kernel, netdev
In-Reply-To: <20190828140315.17048-7-tbogendoerfer@suse.de>

On Wed, 28 Aug 2019 16:03:05 +0200, Thomas Bogendoerfer wrote:
> Clean rx ring is just called once after a new ring is allocated, which
> is per definition clean. So there is not need for this function.
> 
> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
> ---
>  drivers/net/ethernet/sgi/ioc3-eth.c | 21 ---------------------
>  1 file changed, 21 deletions(-)
> 
> diff --git a/drivers/net/ethernet/sgi/ioc3-eth.c b/drivers/net/ethernet/sgi/ioc3-eth.c
> index 6ca560d4ab79..39631e067b71 100644
> --- a/drivers/net/ethernet/sgi/ioc3-eth.c
> +++ b/drivers/net/ethernet/sgi/ioc3-eth.c
> @@ -761,26 +761,6 @@ static void ioc3_mii_start(struct ioc3_private *ip)
>  	add_timer(&ip->ioc3_timer);
>  }
>  
> -static inline void ioc3_clean_rx_ring(struct ioc3_private *ip)
> -{
> -	struct ioc3_erxbuf *rxb;
> -	struct sk_buff *skb;
> -	int i;
> -
> -	for (i = ip->rx_ci; i & 15; i++) {
> -		ip->rx_skbs[ip->rx_pi] = ip->rx_skbs[ip->rx_ci];
> -		ip->rxr[ip->rx_pi++] = ip->rxr[ip->rx_ci++];
> -	}
> -	ip->rx_pi &= RX_RING_MASK;
> -	ip->rx_ci &= RX_RING_MASK;
> -
> -	for (i = ip->rx_ci; i != ip->rx_pi; i = (i + 1) & RX_RING_MASK) {
> -		skb = ip->rx_skbs[i];
> -		rxb = (struct ioc3_erxbuf *)(skb->data - RX_OFFSET);
> -		rxb->w0 = 0;

There's gotta be some purpose to setting this w0 word to zero no?
ioc3_rx() uses that to see if the descriptor is done, and dutifully
clears it after..

> -	}
> -}
> -
>  static inline void ioc3_clean_tx_ring(struct ioc3_private *ip)
>  {
>  	struct sk_buff *skb;
> @@ -860,7 +840,6 @@ static void ioc3_init_rings(struct net_device *dev)
>  	ioc3_free_rings(ip);
>  	ioc3_alloc_rings(dev);
>  
> -	ioc3_clean_rx_ring(ip);
>  	ioc3_clean_tx_ring(ip);
>  
>  	/* Now the rx ring base, consume & produce registers.  */


^ permalink raw reply

* Re: [PATCH net-next 09/15] net: sgi: ioc3-eth: split ring cleaning/freeing and allocation
From: Jakub Kicinski @ 2019-08-28 23:04 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: Ralf Baechle, Paul Burton, James Hogan, David S. Miller,
	linux-mips, linux-kernel, netdev
In-Reply-To: <20190828140315.17048-10-tbogendoerfer@suse.de>

On Wed, 28 Aug 2019 16:03:08 +0200, Thomas Bogendoerfer wrote:
> Do tx ring cleaning and freeing of rx buffers, when chip is shutdown and
> allocate buffers before bringing chip up.
> 
> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>

Ah, so you are moving the alloc/free to open/close, that's good.

^ permalink raw reply

* Re: [PATCH net 0/2] nfp: flower: fix bugs in merge tunnel encap code
From: David Miller @ 2019-08-28 23:08 UTC (permalink / raw)
  To: jakub.kicinski; +Cc: netdev, oss-drivers
In-Reply-To: <20190828055630.17331-1-jakub.kicinski@netronome.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Tue, 27 Aug 2019 22:56:28 -0700

> John says:
> 
> There are few bugs in the merge encap code that have come to light with
> recent driver changes. Effectively, flow bind callbacks were being
> registered twice when using internal ports (new 'busy' code triggers
> this). There was also an issue with neighbour notifier messages being
> ignored for internal ports.

Series applied and queued up for v5.2 -stable, thanks.

^ permalink raw reply

* Re: [PATCH] sky2: Disable MSI on yet another ASUS boards (P6Xxxx)
From: David Miller @ 2019-08-28 23:09 UTC (permalink / raw)
  To: tiwai; +Cc: mlindner, stephen, netdev, linux-kernel, swm
In-Reply-To: <20190828063119.22248-1-tiwai@suse.de>

From: Takashi Iwai <tiwai@suse.de>
Date: Wed, 28 Aug 2019 08:31:19 +0200

> A similar workaround for the suspend/resume problem is needed for yet
> another ASUS machines, P6X models.  Like the previous fix, the BIOS
> doesn't provide the standard DMI_SYS_* entry, so again DMI_BOARD_*
> entries are used instead.
> 
> Reported-and-tested-by: SteveM <swm@swm1.com>
> Signed-off-by: Takashi Iwai <tiwai@suse.de>

Applied, but this is getting suspicious.

It looks like MSI generally is not restored properly on resume on these
boards, so maybe there simply needs to be a generic PCI quirk for that?

^ permalink raw reply

* [PATCH][V2] arcnet: capmode: remove redundant assignment to pointer pkt
From: Colin King @ 2019-08-28 23:14 UTC (permalink / raw)
  To: Michael Grzeschik, David S . Miller, netdev; +Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Pointer pkt is being initialized with a value that is never read
and pkt is being re-assigned a little later on. The assignment is
redundant and hence can be removed.

Addresses-Coverity: ("Ununsed value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---

V2: fix typo in patch description, pkg -> pkt

---
 drivers/net/arcnet/capmode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/arcnet/capmode.c b/drivers/net/arcnet/capmode.c
index b780be6f41ff..c09b567845e1 100644
--- a/drivers/net/arcnet/capmode.c
+++ b/drivers/net/arcnet/capmode.c
@@ -44,7 +44,7 @@ static void rx(struct net_device *dev, int bufnum,
 {
 	struct arcnet_local *lp = netdev_priv(dev);
 	struct sk_buff *skb;
-	struct archdr *pkt = pkthdr;
+	struct archdr *pkt;
 	char *pktbuf, *pkthdrbuf;
 	int ofs;
 
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH net-next] net: phy: force phy suspend when calling phy_stop
From: David Miller @ 2019-08-28 23:17 UTC (permalink / raw)
  To: shenjian15
  Cc: andrew, f.fainelli, hkallweit1, sergei.shtylyov, netdev,
	forest.zhouchang, linuxarm
In-Reply-To: <1566956087-37096-1-git-send-email-shenjian15@huawei.com>

From: Jian Shen <shenjian15@huawei.com>
Date: Wed, 28 Aug 2019 09:34:47 +0800

> Some ethernet drivers may call phy_start() and phy_stop() from
> ndo_open() and ndo_close() respectively.
> 
> When network cable is unconnected, and operate like below:
> step 1: ifconfig ethX up -> ndo_open -> phy_start ->start
> autoneg, and phy is no link.
> step 2: ifconfig ethX down -> ndo_close -> phy_stop -> just stop
> phy state machine.
> 
> This patch forces phy suspend even phydev->link is off.
> 
> Signed-off-by: Jian Shen <shenjian15@huawei.com>

Applied.

^ permalink raw reply

* Re: [PATCH net v4 0/2] r8152: fix side effect
From: David Miller @ 2019-08-28 23:17 UTC (permalink / raw)
  To: hayeswang; +Cc: netdev, nic_swsd, linux-kernel
In-Reply-To: <1394712342-15778-323-Taiwan-albertk@realtek.com>

From: Hayes Wang <hayeswang@realtek.com>
Date: Wed, 28 Aug 2019 20:56:11 +0800

> v4:
> Add Fixes tag for both patch #1 and #2.

I applied v3, sorry.

I think it is OK as I will backport things to v5.2 -stable anyways.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox