Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] iproute2: devlink: use sys/queue.h from libbsd as a fallback
From: Stephen Hemminger @ 2019-07-26 18:29 UTC (permalink / raw)
  To: Sergei Trofimovich; +Cc: netdev
In-Reply-To: <20190724081838.18198-1-slyfox@gentoo.org>

On Wed, 24 Jul 2019 09:18:38 +0100
Sergei Trofimovich <slyfox@gentoo.org> wrote:

> On sys/queue.h does not exist linux-musl targets and
> fails build as:
> 
>     devlink.c:28:10: fatal error: sys/queue.h: No such file or directory
>        28 | #include <sys/queue.h>
>           |          ^~~~~~~~~~~~~
> 
> The change pulls in 'sys/queue.h' from libbsd in case
> system headers don't already provides it.
> 
> Tested on linux-musl and linux-glibc.
> 
> Bug: https://bugs.gentoo.org/690486
> CC: Stephen Hemminger <stephen@networkplumber.org>
> CC: netdev@vger.kernel.org
> Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
> ---

This is ugly and causes more maintainability issues.

Maybe just fix devlink not to depend on sys/queue.h at all.
It makes more sense to have common code style and usage across all of
iproute2.

We already have local version list.h, why continue with BSD stuff.

^ permalink raw reply

* Re: [PATCH RFC 0/4] Add support to directly attach BPF program to ftrace
From: Alexei Starovoitov @ 2019-07-26 18:39 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: LKML, bpf, Daniel Borkmann, Network Development, Steven Rostedt,
	kernel-team
In-Reply-To: <20190724135714.GA9945@google.com>

On Wed, Jul 24, 2019 at 09:57:14AM -0400, Joel Fernandes wrote:
> On Tue, Jul 23, 2019 at 03:11:10PM -0700, Alexei Starovoitov wrote:
> > > > > > I think allowing one tracepoint and disallowing another is pointless
> > > > > > from security point of view. Tracing bpf program can do bpf_probe_read
> > > > > > of anything.
> > > > >
> > > > > I think the assumption here is the user controls the program instructions at
> > > > > runtime, but that's not the case. The BPF program we are loading is not
> > > > > dynamically generated, it is built at build time and it is loaded from a
> > > > > secure verified partition, so even though it can do bpf_probe_read, it is
> > > > > still not something that the user can change.
> > > > 
> > > > so you're saying that by having a set of signed bpf programs which
> > > > instructions are known to be non-malicious and allowed set of tracepoints
> > > > to attach via selinux whitelist, such setup will be safe?
> > > > Have you considered how mix and match will behave?
> > > 
> > > Do you mean the effect of mixing tracepoints and programs? I have not
> > > considered this. I am Ok with further enforcing of this (only certain
> > > tracepoints can be attached to certain programs) if needed. What do
> > > you think? We could have a new bpf(2) syscall attribute specify which
> > > tracepoint is expected, or similar.
> > > 
> > > I wanted to walk you through our 2 usecases we are working on:
> > 
> > thanks for sharing the use case details. Appreciate it.
> 
> No problem and thanks for your thoughts.
> 
> > > 1. timeinstate: By hooking 2 programs onto sched_switch and cpu_frequency
> > > tracepoints, we are able to collect CPU power per-UID (specific app). Connor
> > > O'Brien is working on that.
> > > 
> > > 2. inode to file path mapping: By hooking onto VFS tracepoints we are adding to
> > > the android kernels, we can collect data when the kernel resolves a file path
> > > to a inode/device number. A BPF map stores the inode/dev number (key) and the
> > > path (value). We have usecases where we need a high speed lookup of this
> > > without having to scan all the files in the filesystem.
> > 
> > Can you share the link to vfs tracepoints you're adding?
> > Sounds like you're not going to attempt to upstream them knowing
> > Al's stance towards them?
> > May be there is a way we can do the feature you need, but w/o tracepoints?
> 
> Yes, given Al's stance I understand the patch is not upstreamable. The patch
> is here:
> For tracepoint:
> https://android.googlesource.com/kernel/common/+/27d3bfe20558d279041af403a887e7bdbdcc6f24%5E%21/

this is way more than tracepoint.

> For bpf program:
> https://android.googlesource.com/platform/system/bpfprogs/+/908f6cd718fab0de7a944f84628c56f292efeb17%5E%21/

what is unsafe_bpf_map_update_elem() in there?
The verifier comment sounds odd.
Could you describe the issue you see with the verifier?

> I intended to submit the tracepoint only for the Android kernels, however if
> there is an upstream solution to this then that's even better since upstream can
> benefit. Were you thinking of a BPF helper function to get this data?

I think the best way to evaluate the patches is whether they are upstreamable or not.
If they're not (like this case), it means that there is something wrong with their design
and if android decides to go with such approach it will only create serious issues long term.
Starting with the whole idea of dev+inode -> filepath cache.
dev+inode is not a unique identifier of the file.
In some filesystems two different files may have the same ino integer value.
Have you looked at 'struct file_handle' ? and name_to_handle_at ?
I think fhandle is the only way to get unique identifier of the file.
Could you please share more details why android needs this cache of dev+ino->path?
I guess something uses ino to find paths?
Sort of faster version of 'find -inum' ?


^ permalink raw reply

* Re: [patch iproute2 1/2] tc: action: fix crash caused by incorrect *argv check
From: Stephen Hemminger @ 2019-07-26 19:00 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, sthemmin, dsahern, alexanderk, mlxsw
In-Reply-To: <20190723193600.GA2315@nanopsycho.orion>

On Tue, 23 Jul 2019 21:36:00 +0200
Jiri Pirko <jiri@resnulli.us> wrote:

> Tue, Jul 23, 2019 at 07:54:01PM CEST, stephen@networkplumber.org wrote:
> >On Tue, 23 Jul 2019 13:25:37 +0200
> >Jiri Pirko <jiri@resnulli.us> wrote:
> >  
> >> From: Jiri Pirko <jiri@mellanox.com>
> >> 
> >> One cannot depend on *argv being null in case of no arg is left on the
> >> command line. For example in batch mode, this is not always true. Check
> >> argc instead to prevent crash.
> >> 
> >> Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
> >> Fixes: fd8b3d2c1b9b ("actions: Add support for user cookies")
> >> Signed-off-by: Jiri Pirko <jiri@mellanox.com>  
> >
> >Actually makeargs does NULL terminate the last arg so what input
> >to batchmode is breaking this?  
> 
> Interesting, there must be another but out there then.
> 
> My input is:
> filter add dev testdummy parent ffff: protocol all prio 11000 flower action drop
> filter add dev testdummy parent ffff: protocol ipv4 prio 1 flower dst_mac 11:22:33:44:55:66 action drop

This maybe related. Looks like the batchsize patches had issues.

# valgrind ./tc/tc -batch filter.bat 
==27348== Memcheck, a memory error detector
==27348== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==27348== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==27348== Command: ./tc/tc -batch filter.bat
==27348== 
==27348== Conditional jump or move depends on uninitialised value(s)
==27348==    at 0x4EE9C0C: getdelim (iogetdelim.c:59)
==27348==    by 0x152A37: getline (stdio.h:120)
==27348==    by 0x152A37: getcmdline (utils.c:1311)
==27348==    by 0x115543: batch (tc.c:358)
==27348==    by 0x4E9D09A: (below main) (libc-start.c:308)
==27348== 
==27348== Conditional jump or move depends on uninitialised value(s)
==27348==    at 0x152BE4: makeargs (utils.c:1359)
==27348==    by 0x115614: batch (tc.c:366)
==27348==    by 0x4E9D09A: (below main) (libc-start.c:308)
==27348== 
==27348== Conditional jump or move depends on uninitialised value(s)
==27348==    at 0x11EBFD: parse_action (m_action.c:225)
==27348==    by 0x13633E: flower_parse_opt (f_flower.c:1285)
==27348==    by 0x1190EB: tc_filter_modify (tc_filter.c:217)
==27348==    by 0x115674: batch (tc.c:404)
==27348==    by 0x4E9D09A: (below main) (libc-start.c:308)
==27348== 
==27348== Use of uninitialised value of size 8
==27348==    at 0x11EC0B: parse_action (m_action.c:225)
==27348==    by 0x13633E: flower_parse_opt (f_flower.c:1285)
==27348==    by 0x1190EB: tc_filter_modify (tc_filter.c:217)
==27348==    by 0x115674: batch (tc.c:404)
==27348==    by 0x4E9D09A: (below main) (libc-start.c:308)
==27348== 
Error: Parent Qdisc doesn't exists.
Error: Parent Qdisc doesn't exists.
Command failed filter.bat:1

^ permalink raw reply

* Re: [PATCH iproute2] iplink: document the 'link change' subcommand
From: Matteo Croce @ 2019-07-26 19:01 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, David Ahern
In-Reply-To: <20190726112514.4b0f63e4@hermes.lan>

On July 26, 2019 8:25:14 PM GMT+02:00, Stephen Hemminger <stephen@networkplumber.org> wrote:
> On Wed, 24 Jul 2019 21:12:18 +0200
> Matteo Croce <mcroce@redhat.com> wrote:
> 
> > ip link can set parameters both via the 'set' and 'change' keyword.
> > In fact, 'change' is an alias for 'set'.
> > Document this in the help and manpage.
> > 
> > Fixes: 1d93483985f0 ("iplink: use netlink for link configuration")
> > Signed-off-by: Matteo Croce <mcroce@redhat.com>
> 
> Probably just done originally for compatibility in some way with ip
> route.
> Not sure if it really needs to be documented.

Maybe not in the output help, but in the manpage we should state it.
-- 
Matteo Croce
per aspera ad upstream

^ permalink raw reply

* [PATCH net-next] r8169: align setting PME with vendor driver
From: Heiner Kallweit @ 2019-07-26 18:56 UTC (permalink / raw)
  To: Realtek linux nic maintainers, David Miller; +Cc: netdev@vger.kernel.org

Align setting PME with the vendor driver. PMEnable is writable on
RTL8169 only, on later chip versions it's read-only. PME_SIGNAL is
used on chip versions from RTL8168evl with the exception of the
RTL8168f family.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/ethernet/realtek/r8169_main.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index e1e1c89fb..5c337234b 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -1414,18 +1414,22 @@ static void __rtl8169_set_wol(struct rtl8169_private *tp, u32 wolopts)
 	}
 
 	switch (tp->mac_version) {
-	case RTL_GIGA_MAC_VER_02 ... RTL_GIGA_MAC_VER_17:
+	case RTL_GIGA_MAC_VER_02 ... RTL_GIGA_MAC_VER_06:
 		options = RTL_R8(tp, Config1) & ~PMEnable;
 		if (wolopts)
 			options |= PMEnable;
 		RTL_W8(tp, Config1, options);
 		break;
-	default:
+	case RTL_GIGA_MAC_VER_34:
+	case RTL_GIGA_MAC_VER_37:
+	case RTL_GIGA_MAC_VER_39 ... RTL_GIGA_MAC_VER_51:
 		options = RTL_R8(tp, Config2) & ~PME_SIGNAL;
 		if (wolopts)
 			options |= PME_SIGNAL;
 		RTL_W8(tp, Config2, options);
 		break;
+	default:
+		break;
 	}
 
 	rtl_lock_config_regs(tp);
-- 
2.22.0


^ permalink raw reply related

* Re: Driver support for Realtek RTL8125 2.5GB Ethernet
From: Heiner Kallweit @ 2019-07-26 19:05 UTC (permalink / raw)
  To: Jian-Hong Pan, Linux Netdev List, Yan-Hsuan Chuang
  Cc: Linux Upstreaming Team, Andrew Lunn, Florian Fainelli
In-Reply-To: <e178221e-4f48-b9b9-2451-048e8f4a0f9f@gmail.com>

On 24.07.2019 22:02, Heiner Kallweit wrote:
> On 24.07.2019 10:19, Jian-Hong Pan wrote:
>> Hi all,
>>
>> We have got a consumer desktop equipped with Realtek RTL8125 2.5GB
>> Ethernet [1] recently.  But, there is no related driver in mainline
>> kernel yet.  So, we can only use the vendor driver [2] and customize
>> it [3] right now.
>>
>> Is anyone working on an upstream driver for this hardware?
>>
> At least I'm not aware of any such work. Issue with Realtek is that
> they answer individual questions very quickly but company policy is
> to not release any datasheets or errata documentation.
> RTL8169 inherited a lot from RTL8139, so I would expect that the
> r8169 driver could be a good basis for a RTL8125 mainline driver.
> 
Meanwhile I had a look at the RTL8125 vendor driver. Most parts are
quite similar to RTL8168. However the PHY handling is quite weird.
2.5Gbps isn't covered by Clause 22, but instead of switching to
Clause 45 Realtek uses Clause 22 plus a proprietary chip register
(for controlling the 2.5Gbps mode) that doesn't seem to be accessible
via MDIO bus. This may make using phylib tricky.

>> [1] https://www.realtek.com/en/press-room/news-releases/item/realtek-launches-world-s-first-single-chip-2-5g-ethernet-controller-for-multiple-applications-including-gaming-solution
>> [2] https://www.realtek.com/en/component/zoo/category/network-interface-controllers-10-100-1000m-gigabit-ethernet-pci-express-software
>> [3] https://github.com/endlessm/linux/commit/da1e43f58850d272eb72f571524ed71fd237d32b
>>
>> Jian-Hong Pan
>>
> Heiner
> 
Heiner

^ permalink raw reply

* [PATCH net-next] ipv6: remove printk
From: Jonathan Lemon @ 2019-07-26 19:16 UTC (permalink / raw)
  To: netdev, davem; +Cc: kernel-team

ipv6_find_hdr() prints a non-rate limited error message
when it cannot find an ipv6 header at a specific offset.
This could be used as a DoS, so just remove it.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 net/ipv6/exthdrs_core.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/ipv6/exthdrs_core.c b/net/ipv6/exthdrs_core.c
index b358f1a4dd08..da46c4284676 100644
--- a/net/ipv6/exthdrs_core.c
+++ b/net/ipv6/exthdrs_core.c
@@ -197,10 +197,8 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
 		struct ipv6hdr _ip6, *ip6;
 
 		ip6 = skb_header_pointer(skb, *offset, sizeof(_ip6), &_ip6);
-		if (!ip6 || (ip6->version != 6)) {
-			printk(KERN_ERR "IPv6 header not found\n");
+		if (!ip6 || (ip6->version != 6))
 			return -EBADMSG;
-		}
 		start = *offset + sizeof(struct ipv6hdr);
 		nexthdr = ip6->nexthdr;
 	}
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH RFC 0/4] Add support to directly attach BPF program to ftrace
From: Joel Fernandes @ 2019-07-26 19:18 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: LKML, bpf, Daniel Borkmann, Network Development, Steven Rostedt,
	kernel-team
In-Reply-To: <20190726183954.oxzhkrwt4uhgl4gl@ast-mbp.dhcp.thefacebook.com>

On Fri, Jul 26, 2019 at 11:39:56AM -0700, Alexei Starovoitov wrote:
[snip]
> > > > 1. timeinstate: By hooking 2 programs onto sched_switch and cpu_frequency
> > > > tracepoints, we are able to collect CPU power per-UID (specific app). Connor
> > > > O'Brien is working on that.
> > > > 
> > > > 2. inode to file path mapping: By hooking onto VFS tracepoints we are adding to
> > > > the android kernels, we can collect data when the kernel resolves a file path
> > > > to a inode/device number. A BPF map stores the inode/dev number (key) and the
> > > > path (value). We have usecases where we need a high speed lookup of this
> > > > without having to scan all the files in the filesystem.
> > > 
> > > Can you share the link to vfs tracepoints you're adding?
> > > Sounds like you're not going to attempt to upstream them knowing
> > > Al's stance towards them?
> > > May be there is a way we can do the feature you need, but w/o tracepoints?
> > 
> > Yes, given Al's stance I understand the patch is not upstreamable. The patch
> > is here:
> > For tracepoint:
> > https://android.googlesource.com/kernel/common/+/27d3bfe20558d279041af403a887e7bdbdcc6f24%5E%21/
> 
> this is way more than tracepoint.

True there is some code that calls the tracepoint. I want to optimize it more
but lets see I am ready to think more about it before doing it this way,
based on your suggestions.

> > For bpf program:
> > https://android.googlesource.com/platform/system/bpfprogs/+/908f6cd718fab0de7a944f84628c56f292efeb17%5E%21/
> 
> what is unsafe_bpf_map_update_elem() in there?
> The verifier comment sounds odd.
> Could you describe the issue you see with the verifier?

Will dig out the verifier issue I was seeing. I was just trying to get a
prototype working so I did not go into verifier details much.

> > I intended to submit the tracepoint only for the Android kernels, however if
> > there is an upstream solution to this then that's even better since upstream can
> > benefit. Were you thinking of a BPF helper function to get this data?
> 
> I think the best way to evaluate the patches is whether they are upstreamable or not.
> If they're not (like this case), it means that there is something wrong with their design
> and if android decides to go with such approach it will only create serious issues long term.
> Starting with the whole idea of dev+inode -> filepath cache.
> dev+inode is not a unique identifier of the file.
> In some filesystems two different files may have the same ino integer value.
> Have you looked at 'struct file_handle' ? and name_to_handle_at ?
> I think fhandle is the only way to get unique identifier of the file.
> Could you please share more details why android needs this cache of dev+ino->path?

I will follow-up with you on this by email off the list, thanks.

thanks,

 - Joel


^ permalink raw reply

* Re: [patch iproute2 1/2] tc: action: fix crash caused by incorrect *argv check
From: Stephen Hemminger @ 2019-07-26 19:47 UTC (permalink / raw)
  To: Jiri Pirko, chrims; +Cc: netdev, sthemmin, dsahern, alexanderk, mlxsw
In-Reply-To: <20190723112538.10977-1-jiri@resnulli.us>

On Tue, 23 Jul 2019 13:25:37 +0200
Jiri Pirko <jiri@resnulli.us> wrote:

> From: Jiri Pirko <jiri@mellanox.com>
> 
> One cannot depend on *argv being null in case of no arg is left on the
> command line. For example in batch mode, this is not always true. Check
> argc instead to prevent crash.
> 
> Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
> Fixes: fd8b3d2c1b9b ("actions: Add support for user cookies")
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  tc/m_action.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tc/m_action.c b/tc/m_action.c
> index ab6bc0ad28ff..0f9c3a27795d 100644
> --- a/tc/m_action.c
> +++ b/tc/m_action.c
> @@ -222,7 +222,7 @@ done0:
>  				goto bad_val;
>  			}
>  
> -			if (*argv && strcmp(*argv, "cookie") == 0) {
> +			if (argc && strcmp(*argv, "cookie") == 0) {
>  				size_t slen;
>  
>  				NEXT_ARG();


The logic here is broken at end of file.

	do {
		if (getcmdline(&line_next, &len, stdin) == -1)
			lastline = true;

		largc_next = makeargs(line_next, largv_next, 100);
		bs_enabled_next = batchsize_enabled(largc_next, largv_next);
		if (bs_enabled) {
			struct batch_


getcmdline() will return -1 at end of file.
The code will call make_args on an uninitialized pointer.

I see lots of other unnecessary complexity in the whole batch logic.
It needs to be rewritten.

Rather than me fixing the code, I am probably going to revert.

commit 485d0c6001c4aa134b99c86913d6a7089b7b2ab0
Author: Chris Mi <chrism@mellanox.com>
Date:   Fri Jan 12 14:13:16 2018 +0900

    tc: Add batchsize feature for filter and actions

^ permalink raw reply

* [PATCH net-next 0/4] r8169: improve HW csum and TSO handling
From: Heiner Kallweit @ 2019-07-26 19:47 UTC (permalink / raw)
  To: Realtek linux nic maintainers, David Miller; +Cc: netdev@vger.kernel.org

This series:
- delegates more tasks from the driver to the core
- enables HW csum and TSO per default
- copies quirks for buggy chip versions from vendor driver

Heiner Kallweit (4):
  r8169: set GSO size and segment limits
  r8169: implement callback ndo_features_check
  r8169: remove r8169_csum_workaround
  r8169: enable HW csum and TSO

 drivers/net/ethernet/realtek/r8169_main.c | 128 +++++++++++-----------
 1 file changed, 61 insertions(+), 67 deletions(-)

-- 
2.22.0


^ permalink raw reply

* Re: [PATCH RFC 0/4] Add support to directly attach BPF program to ftrace
From: Joel Fernandes @ 2019-07-26 19:49 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Alexei Starovoitov, LKML, bpf, Daniel Borkmann,
	Network Development, Steven Rostedt, Cc: Android Kernel
In-Reply-To: <20190726191853.GA196514@google.com>

On Fri, Jul 26, 2019 at 3:18 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Fri, Jul 26, 2019 at 11:39:56AM -0700, Alexei Starovoitov wrote:
[snip]
> > > For bpf program:
> > > https://android.googlesource.com/platform/system/bpfprogs/+/908f6cd718fab0de7a944f84628c56f292efeb17%5E%21/
> >
> > what is unsafe_bpf_map_update_elem() in there?
> > The verifier comment sounds odd.
> > Could you describe the issue you see with the verifier?
>
> Will dig out the verifier issue I was seeing. I was just trying to get a
> prototype working so I did not go into verifier details much.

This is actually slightly old code, the actual function name is
bpf_map_update_elem_unsafe() .
 https://android.googlesource.com/platform/system/bpf/+/refs/heads/master/progs/include/bpf_helpers.h#39

This function came about because someone added a DEFINE_BPF_MAP macro
which defines BPF map accessors based on the type of the key and
value. So that's the "safe" variant:
https://android.googlesource.com/platform/system/bpf/+/refs/heads/master/progs/include/bpf_helpers.h#54
(added in commit
https://android.googlesource.com/platform/system/bpf/+/6564b8eac46fc27dde807a39856386d98d2471c3)

So the "safe" variant of the bpf_map_update_elem for us became a map
specific version with a prototype:
static inline __always_inline __unused int
bpf_##the_map##_update_elem(TypeOfKey* k, TypeOfValue* v, unsigned
long long flags)

Since I had not upgraded my BPF program to the "safe" variant, I had
to use the internal "unsafe" variant of the API (if that makes
sense..).

thanks Alexei!

- Joel

^ permalink raw reply

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
From: Allan W. Nielsen @ 2019-07-26 19:50 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Horatiu Vultur, Nikolay Aleksandrov, roopa, davem, bridge, netdev,
	linux-kernel
In-Reply-To: <20190726134613.GD18223@lunn.ch>

Hi All,

I'm working on the same project as Horatiu.

The 07/26/2019 15:46, Andrew Lunn wrote:
> My default, multicast should be flooded, and that includes the CPU
> port for a DSA driver. Adding an MDB entry allows for optimisations,
> limiting which ports a multicast frame goes out of. But it is just an
> optimisation.

Do you do this for all VLANs, or is there a way to only do this for VLANs that
the CPU is suppose to take part of?

I assume we could limit the behavioral to only do this for VLANs which the
Bridge interface is part of - but I'm not sure if this is what you suggest.

As you properly guessed, this model is quite different from what we are used to.
Just for the context: The ethernet switches done by Vitesse, which was acquired
by Microsemi, and now become Microchip, has until now been supported by a MIT
licensed API (running in user-space) and a protocol stack running on top of the
API. In this model we have been used to explicitly configure what packets should
go to the CPU. Typically this would be the MAC addresses of the interface it
self, multicast addresses required by the IP stack (4 and 6), and the broadcast
address. In this model, will only do this on VLANs which is configured as L3
interfaces.

We may be able to make it work by flood all multicast traffic by default, and
use a low priority CPU queue. But I'm having a hard time getting used to this
model (maybe time will help).

Is it considered required to include the CPU in all multicast flood masks? Or do
you know if this is different from driver to driver?

Alternative would it make sense to make this behavioral configurable?

/Allan

^ permalink raw reply

* [PATCH net-next 1/4] r8169: set GSO size and segment limits
From: Heiner Kallweit @ 2019-07-26 19:48 UTC (permalink / raw)
  To: Realtek linux nic maintainers, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <d347af97-0b46-6c71-37ef-46ce2b46f4df@gmail.com>

Set GSO max size and max segment number as in the vendor driver.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/ethernet/realtek/r8169_main.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 5c337234b..52a9e2d2f 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -569,6 +569,11 @@ enum rtl_rx_desc_bit {
 
 #define RsvdMask	0x3fffc000
 
+#define RTL_GSO_MAX_SIZE_V1	32000
+#define RTL_GSO_MAX_SEGS_V1	24
+#define RTL_GSO_MAX_SIZE_V2	64000
+#define RTL_GSO_MAX_SEGS_V2	64
+
 struct TxDesc {
 	__le32 opts1;
 	__le32 opts2;
@@ -6919,8 +6924,14 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		/* Disallow toggling */
 		dev->hw_features &= ~NETIF_F_HW_VLAN_CTAG_RX;
 
-	if (rtl_chip_supports_csum_v2(tp))
+	if (rtl_chip_supports_csum_v2(tp)) {
 		dev->hw_features |= NETIF_F_IPV6_CSUM | NETIF_F_TSO6;
+		dev->gso_max_size = RTL_GSO_MAX_SIZE_V2;
+		dev->gso_max_segs = RTL_GSO_MAX_SEGS_V2;
+	} else {
+		dev->gso_max_size = RTL_GSO_MAX_SIZE_V1;
+		dev->gso_max_segs = RTL_GSO_MAX_SEGS_V1;
+	}
 
 	dev->hw_features |= NETIF_F_RXALL;
 	dev->hw_features |= NETIF_F_RXFCS;
-- 
2.22.0



^ permalink raw reply related

* [PATCH net-next 2/4] r8169: implement callback ndo_features_check
From: Heiner Kallweit @ 2019-07-26 19:49 UTC (permalink / raw)
  To: Realtek linux nic maintainers, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <d347af97-0b46-6c71-37ef-46ce2b46f4df@gmail.com>

Implement callback ndo_features_check and move all feature checks there.
This will allow us to get rid of r8169_csum_workaround() completely in
a subsequent step. Like in the vendor driver disable HW csum for short
packets on RTL8168b.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/ethernet/realtek/r8169_main.c | 60 ++++++++++++++---------
 1 file changed, 36 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 52a9e2d2f..7e1b68b19 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -539,11 +539,11 @@ enum rtl_tx_desc_bit_1 {
 	TD1_GTSENV4	= (1 << 26),		/* Giant Send for IPv4 */
 	TD1_GTSENV6	= (1 << 25),		/* Giant Send for IPv6 */
 #define GTTCPHO_SHIFT			18
-#define GTTCPHO_MAX			0x7fU
+#define GTTCPHO_MAX			0x7f
 
 	/* Second doubleword. */
 #define TCPHO_SHIFT			18
-#define TCPHO_MAX			0x3ffU
+#define TCPHO_MAX			0x3ff
 #define TD1_MSS_SHIFT			18	/* MSS position (11 bits) */
 	TD1_IPv6_CS	= (1 << 28),		/* Calculate IPv6 checksum */
 	TD1_IPv4_CS	= (1 << 29),		/* Calculate IPv4 checksum */
@@ -5534,11 +5534,6 @@ static void r8169_csum_workaround(struct rtl8169_private *tp,
 		} while (segs);
 
 		dev_consume_skb_any(skb);
-	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
-		if (skb_checksum_help(skb) < 0)
-			goto drop;
-
-		rtl8169_start_xmit(skb, tp->dev);
 	} else {
 drop:
 		tp->dev->stats.tx_dropped++;
@@ -5595,13 +5590,6 @@ static bool rtl8169_tso_csum_v2(struct rtl8169_private *tp,
 	u32 mss = skb_shinfo(skb)->gso_size;
 
 	if (mss) {
-		if (transport_offset > GTTCPHO_MAX) {
-			netif_warn(tp, tx_err, tp->dev,
-				   "Invalid transport offset 0x%x for TSO\n",
-				   transport_offset);
-			return false;
-		}
-
 		switch (vlan_get_protocol(skb)) {
 		case htons(ETH_P_IP):
 			opts[0] |= TD1_GTSENV4;
@@ -5624,16 +5612,6 @@ static bool rtl8169_tso_csum_v2(struct rtl8169_private *tp,
 	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		u8 ip_protocol;
 
-		if (unlikely(rtl_test_hw_pad_bug(tp, skb)))
-			return !(skb_checksum_help(skb) || eth_skb_pad(skb));
-
-		if (transport_offset > TCPHO_MAX) {
-			netif_warn(tp, tx_err, tp->dev,
-				   "Invalid transport offset 0x%x\n",
-				   transport_offset);
-			return false;
-		}
-
 		switch (vlan_get_protocol(skb)) {
 		case htons(ETH_P_IP):
 			opts[1] |= TD1_IPv4_CS;
@@ -5790,6 +5768,39 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 	return NETDEV_TX_BUSY;
 }
 
+static netdev_features_t rtl8169_features_check(struct sk_buff *skb,
+						struct net_device *dev,
+						netdev_features_t features)
+{
+	int transport_offset = skb_transport_offset(skb);
+	struct rtl8169_private *tp = netdev_priv(dev);
+
+	if (skb_is_gso(skb)) {
+		if (transport_offset > GTTCPHO_MAX &&
+		    rtl_chip_supports_csum_v2(tp))
+			features &= ~NETIF_F_ALL_TSO;
+	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		if (skb->len < ETH_ZLEN) {
+			switch (tp->mac_version) {
+			case RTL_GIGA_MAC_VER_11:
+			case RTL_GIGA_MAC_VER_12:
+			case RTL_GIGA_MAC_VER_17:
+			case RTL_GIGA_MAC_VER_34:
+				features &= ~NETIF_F_CSUM_MASK;
+				break;
+			default:
+				break;
+			}
+		}
+
+		if (transport_offset > TCPHO_MAX &&
+		    rtl_chip_supports_csum_v2(tp))
+			features &= ~NETIF_F_CSUM_MASK;
+	}
+
+	return vlan_features_check(skb, features);
+}
+
 static void rtl8169_pcierr_interrupt(struct net_device *dev)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
@@ -6546,6 +6557,7 @@ static const struct net_device_ops rtl_netdev_ops = {
 	.ndo_stop		= rtl8169_close,
 	.ndo_get_stats64	= rtl8169_get_stats64,
 	.ndo_start_xmit		= rtl8169_start_xmit,
+	.ndo_features_check	= rtl8169_features_check,
 	.ndo_tx_timeout		= rtl8169_tx_timeout,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_change_mtu		= rtl8169_change_mtu,
-- 
2.22.0



^ permalink raw reply related

* [PATCH net-next 3/4] r8169: remove r8169_csum_workaround
From: Heiner Kallweit @ 2019-07-26 19:50 UTC (permalink / raw)
  To: Realtek linux nic maintainers, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <d347af97-0b46-6c71-37ef-46ce2b46f4df@gmail.com>

The loop in r8169_csum_workaround is called only if in
msdn_giant_send_check a copy of the skb header needs to be made and
we don't have enough memory. Let's simply drop the packet in that case
so that we can remove r8169_csum_workaround.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/ethernet/realtek/r8169_main.c | 39 ++---------------------
 1 file changed, 2 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 7e1b68b19..f77159540 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -5508,39 +5508,6 @@ static bool rtl_test_hw_pad_bug(struct rtl8169_private *tp, struct sk_buff *skb)
 	return skb->len < ETH_ZLEN && tp->mac_version == RTL_GIGA_MAC_VER_34;
 }
 
-static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
-				      struct net_device *dev);
-/* r8169_csum_workaround()
- * The hw limites the value the transport offset. When the offset is out of the
- * range, calculate the checksum by sw.
- */
-static void r8169_csum_workaround(struct rtl8169_private *tp,
-				  struct sk_buff *skb)
-{
-	if (skb_is_gso(skb)) {
-		netdev_features_t features = tp->dev->features;
-		struct sk_buff *segs, *nskb;
-
-		features &= ~(NETIF_F_SG | NETIF_F_IPV6_CSUM | NETIF_F_TSO6);
-		segs = skb_gso_segment(skb, features);
-		if (IS_ERR(segs) || !segs)
-			goto drop;
-
-		do {
-			nskb = segs;
-			segs = segs->next;
-			nskb->next = NULL;
-			rtl8169_start_xmit(nskb, tp->dev);
-		} while (segs);
-
-		dev_consume_skb_any(skb);
-	} else {
-drop:
-		tp->dev->stats.tx_dropped++;
-		dev_kfree_skb_any(skb);
-	}
-}
-
 /* msdn_giant_send_check()
  * According to the document of microsoft, the TCP Pseudo Header excludes the
  * packet length for IPv6 TCP large packets.
@@ -5688,10 +5655,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 	opts[0] = DescOwn;
 
 	if (rtl_chip_supports_csum_v2(tp)) {
-		if (!rtl8169_tso_csum_v2(tp, skb, opts)) {
-			r8169_csum_workaround(tp, skb);
-			return NETDEV_TX_OK;
-		}
+		if (!rtl8169_tso_csum_v2(tp, skb, opts))
+			goto err_dma_0;
 	} else {
 		rtl8169_tso_csum_v1(skb, opts);
 	}
-- 
2.22.0



^ permalink raw reply related

* [PATCH net-next 4/4] r8169: enable HW csum and TSO
From: Heiner Kallweit @ 2019-07-26 19:51 UTC (permalink / raw)
  To: Realtek linux nic maintainers, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <d347af97-0b46-6c71-37ef-46ce2b46f4df@gmail.com>

Enable HW csum and TSO per default except on known buggy chip versions.
Realtek confirmed that RTL8168evl has a HW issue with TSO.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/ethernet/realtek/r8169_main.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index f77159540..61a23aee0 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -6879,11 +6879,9 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	netif_napi_add(dev, &tp->napi, rtl8169_poll, NAPI_POLL_WEIGHT);
 
-	/* don't enable SG, IP_CSUM and TSO by default - it might not work
-	 * properly for all devices */
-	dev->features |= NETIF_F_RXCSUM |
-		NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX;
-
+	dev->features |= NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO |
+		NETIF_F_RXCSUM | NETIF_F_HW_VLAN_CTAG_TX |
+		NETIF_F_HW_VLAN_CTAG_RX;
 	dev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO |
 		NETIF_F_RXCSUM | NETIF_F_HW_VLAN_CTAG_TX |
 		NETIF_F_HW_VLAN_CTAG_RX;
@@ -6903,6 +6901,7 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	if (rtl_chip_supports_csum_v2(tp)) {
 		dev->hw_features |= NETIF_F_IPV6_CSUM | NETIF_F_TSO6;
+		dev->features |= NETIF_F_IPV6_CSUM | NETIF_F_TSO6;
 		dev->gso_max_size = RTL_GSO_MAX_SIZE_V2;
 		dev->gso_max_segs = RTL_GSO_MAX_SEGS_V2;
 	} else {
@@ -6910,6 +6909,13 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		dev->gso_max_segs = RTL_GSO_MAX_SEGS_V1;
 	}
 
+	/* RTL8168e-vl has a HW issue with TSO */
+	if (tp->mac_version == RTL_GIGA_MAC_VER_34) {
+		dev->vlan_features &= ~NETIF_F_ALL_TSO;
+		dev->hw_features &= ~NETIF_F_ALL_TSO;
+		dev->features &= ~NETIF_F_ALL_TSO;
+	}
+
 	dev->hw_features |= NETIF_F_RXALL;
 	dev->hw_features |= NETIF_F_RXFCS;
 
-- 
2.22.0



^ permalink raw reply related

* [PATCH] b43legacy: Remove pointless cond_resched() wrapper
From: Thomas Gleixner @ 2019-07-26 20:00 UTC (permalink / raw)
  To: netdev; +Cc: b43-dev, Larry Finger, Kalle Valo

cond_resched() can be used unconditionally. If CONFIG_PREEMPT is set, it
becomes a NOP scheduler wise.

Also the B43_BUG_ON() in that wrapper is a homebrewn variant of
__might_sleep() which is part of cond_resched() already.

Remove the wrapper and invoke cond_resched() directly.

Found while looking for CONFIG_PREEMPT dependent code treewide.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: netdev@vger.kernel.org
Cc: b43-dev@lists.infradead.org
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Larry Finger <Larry.Finger@lwfinger.net>
---
 drivers/net/wireless/broadcom/b43legacy/phy.c |   21 +++++----------------
 1 file changed, 5 insertions(+), 16 deletions(-)

--- a/drivers/net/wireless/broadcom/b43legacy/phy.c
+++ b/drivers/net/wireless/broadcom/b43legacy/phy.c
@@ -69,17 +69,6 @@ static const s8 b43legacy_tssi2dbm_g_tab
 
 static void b43legacy_phy_initg(struct b43legacy_wldev *dev);
 
-
-static inline
-void b43legacy_voluntary_preempt(void)
-{
-	B43legacy_BUG_ON(!(!in_atomic() && !in_irq() &&
-			  !in_interrupt() && !irqs_disabled()));
-#ifndef CONFIG_PREEMPT
-	cond_resched();
-#endif /* CONFIG_PREEMPT */
-}
-
 /* Lock the PHY registers against concurrent access from the microcode.
  * This lock is nonrecursive. */
 void b43legacy_phy_lock(struct b43legacy_wldev *dev)
@@ -1124,7 +1113,7 @@ static u16 b43legacy_phy_lo_b_r15_loop(s
 		ret += b43legacy_phy_read(dev, 0x002C);
 	}
 	local_irq_restore(flags);
-	b43legacy_voluntary_preempt();
+	cond_resched();
 
 	return ret;
 }
@@ -1253,7 +1242,7 @@ u16 b43legacy_phy_lo_g_deviation_subval(
 	}
 	ret = b43legacy_phy_read(dev, 0x002D);
 	local_irq_restore(flags);
-	b43legacy_voluntary_preempt();
+	cond_resched();
 
 	return ret;
 }
@@ -1591,7 +1580,7 @@ void b43legacy_phy_lo_g_measure(struct b
 			b43legacy_radio_write16(dev, 0x43, i);
 			b43legacy_radio_write16(dev, 0x52, phy->txctl2);
 			udelay(10);
-			b43legacy_voluntary_preempt();
+			cond_resched();
 
 			b43legacy_phy_set_baseband_attenuation(dev, j * 2);
 
@@ -1642,7 +1631,7 @@ void b43legacy_phy_lo_g_measure(struct b
 					      phy->txctl2
 					      | (3/*txctl1*/ << 4));
 			udelay(10);
-			b43legacy_voluntary_preempt();
+			cond_resched();
 
 			b43legacy_phy_set_baseband_attenuation(dev, j * 2);
 
@@ -1665,7 +1654,7 @@ void b43legacy_phy_lo_g_measure(struct b
 		b43legacy_phy_write(dev, 0x0812, (r27 << 8) | 0xA2);
 		udelay(2);
 		b43legacy_phy_write(dev, 0x0812, (r27 << 8) | 0xA3);
-		b43legacy_voluntary_preempt();
+		cond_resched();
 	} else
 		b43legacy_phy_write(dev, 0x0015, r27 | 0xEFA0);
 	b43legacy_phy_lo_adjust(dev, is_initializing);

^ permalink raw reply

* Re: [PATCH] can: ti_hecc: use timestamp based rx-offloading
From: Jeroen Hofstee @ 2019-07-26 20:10 UTC (permalink / raw)
  To: Marc Kleine-Budde, linux-can@vger.kernel.org
  Cc: Anant Gole, AnilKumar Ch, Wolfgang Grandegger, David S. Miller,
	open list:NETWORKING DRIVERS, open list
In-Reply-To: <04bdda38-79fa-c266-2a3c-1229a1fd8229@pengutronix.de>

Hello Marc,

On 7/26/19 1:48 PM, Marc Kleine-Budde wrote:
> On 4/29/19 2:03 PM, Jeroen Hofstee wrote:
>
>> @@ -744,8 +652,8 @@ static irqreturn_t ti_hecc_interrupt(int irq, void *dev_id)
>>   	struct net_device *ndev = (struct net_device *)dev_id;
>>   	struct ti_hecc_priv *priv = netdev_priv(ndev);
>>   	struct net_device_stats *stats = &ndev->stats;
>> -	u32 mbxno, mbx_mask, int_status, err_status;
>> -	unsigned long ack, flags;
>> +	u32 mbxno, mbx_mask, int_status, err_status, stamp;
>> +	unsigned long flags, rx_pending;
>>   
>>   	int_status = hecc_read(priv,
>>   		(priv->use_hecc1int) ? HECC_CANGIF1 : HECC_CANGIF0);
>> @@ -769,11 +677,11 @@ static irqreturn_t ti_hecc_interrupt(int irq, void *dev_id)
>>   			spin_lock_irqsave(&priv->mbx_lock, flags);
>>   			hecc_clear_bit(priv, HECC_CANME, mbx_mask);
>>   			spin_unlock_irqrestore(&priv->mbx_lock, flags);
>> -			stats->tx_bytes += hecc_read_mbx(priv, mbxno,
>> -						HECC_CANMCF) & 0xF;
>> +			stamp = hecc_read_stamp(priv, mbxno);
>> +			stats->tx_bytes += can_rx_offload_get_echo_skb(&priv->offload,
>> +									mbxno, stamp);
>>   			stats->tx_packets++;
>>   			can_led_event(ndev, CAN_LED_EVENT_TX);
>> -			can_get_echo_skb(ndev, mbxno);
>>   			--priv->tx_tail;
>>   		}
>>   
>> @@ -784,12 +692,11 @@ static irqreturn_t ti_hecc_interrupt(int irq, void *dev_id)
>>   		((priv->tx_head & HECC_TX_MASK) == HECC_TX_MASK)))
>>   			netif_wake_queue(ndev);
>>   
>> -		/* Disable RX mailbox interrupts and let NAPI reenable them */
>> -		if (hecc_read(priv, HECC_CANRMP)) {
>> -			ack = hecc_read(priv, HECC_CANMIM);
>> -			ack &= BIT(HECC_MAX_TX_MBOX) - 1;
>> -			hecc_write(priv, HECC_CANMIM, ack);
>> -			napi_schedule(&priv->napi);
>> +		/* offload RX mailboxes and let NAPI deliver them */
>> +		while ((rx_pending = hecc_read(priv, HECC_CANRMP))) {
>> +			can_rx_offload_irq_offload_timestamp(&priv->offload,
>> +							     rx_pending);
>> +			hecc_write(priv, HECC_CANRMP, rx_pending);
> Can prepare a patch to move the RMP writing into the mailbox_read()
> function. This makes the mailbox available a bit earlier and works
> better for memory pressure situations, where no skb can be allocated.

For my understanding, is there any other reason for alloc_can_skb,
to fail, besides being out of memory. I couldn't easily find an other
limit enforced on it.

If it is actually _moved_, as you suggested, it does loose the ability to
handle the newly received messages while the messages are read
in the same interrupt, so it needs to interrupt again. That will work,
but seems a bit a silly thing to do.

Perhaps we can do both? Mark the mailbox as available in
mailbox_read, so it is available as soon as possible and clear
them all in the irq (under the assumption that alloc_can_skb
failure means real big trouble, why would we want to keep the
old messages around and eventually ignore the new messages?).

Another question, not related to this patch, but this driver..
Most of the times the irq handles 1 or sometimes 2 messages.
Do you happen to know if it is possible to optionally delay the irq
a bit in the millisecond range, so more work can be done in a single
interrupt? Since there are now 28 rx hardware mailboxes available,
it shouldn't run out easily.

And as last, I guess you want a patch which applies to
linux-can-next/testing, right?

With kind regards,

Jeroen

^ permalink raw reply

* Re: [PATCH] net: key: af_key: Fix possible null-pointer dereferences in pfkey_send_policy_notify()
From: Jeremy Sowden @ 2019-07-26 20:15 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Jia-Ju Bai, herbert, davem, netdev, linux-kernel
In-Reply-To: <20190726094514.GD14601@gauss3.secunet.de>

[-- Attachment #1: Type: text/plain, Size: 2003 bytes --]

On 2019-07-26, at 11:45:14 +0200, Steffen Klassert wrote:
> On Wed, Jul 24, 2019 at 05:35:09PM +0800, Jia-Ju Bai wrote:
> > In pfkey_send_policy_notify(), there is an if statement on line 3081
> > to check whether xp is NULL:
> >     if (xp && xp->type != XFRM_POLICY_TYPE_MAIN)
> >
> > When xp is NULL, it is used by key_notify_policy() on line 3090:
> >     key_notify_policy(xp, ...)
> >         pfkey_xfrm_policy2msg_prep(xp) -- line 2211
> >             pfkey_xfrm_policy2msg_size(xp) -- line 2046
> >                 for (i=0; i<xp->xfrm_nr; i++) -- line 2026
> >                 t = xp->xfrm_vec + i; -- line 2027
> >     key_notify_policy(xp, ...)
> >         xp_net(xp) -- line 2231
> >             return read_pnet(&xp->xp_net); -- line 534
>
> Please don't quote random code lines, explain the problem instead.
>
> >
> > Thus, possible null-pointer dereferences may occur.
> >
> > To fix these bugs, xp is checked before calling key_notify_policy().
> >
> > These bugs are found by a static analysis tool STCheck written by
> > us.
> >
> > Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
> > ---
> >  net/key/af_key.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/net/key/af_key.c b/net/key/af_key.c
> > index b67ed3a8486c..ced54144d5fd 100644
> > --- a/net/key/af_key.c
> > +++ b/net/key/af_key.c
> > @@ -3087,6 +3087,8 @@ static int pfkey_send_policy_notify(struct xfrm_policy *xp, int dir, const struc
> >  	case XFRM_MSG_DELPOLICY:
> >  	case XFRM_MSG_NEWPOLICY:
> >  	case XFRM_MSG_UPDPOLICY:
> > +		if (!xp)
> > +			break;
>
> I think this can not happen. Who sends one of these notifications
> without a pointer to the policy?

I had a quick grep and found two places where km_policy_notify is passed
NULL as the policy:

  $ grep -rn '\<km_policy_notify(NULL,' net/
  net/xfrm/xfrm_user.c:2154:      km_policy_notify(NULL, 0, &c);
  net/key/af_key.c:2788:  km_policy_notify(NULL, 0, &c);

They occur in xfrm_flush_policy() and pfkey_spdflush() respectively.

J.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [REGRESSION] 5.3-rc1: r8169: remove 1000/Half from supported modes
From: Bernhard Held @ 2019-07-26 20:16 UTC (permalink / raw)
  To: linux-kernel, Heiner Kallweit, netdev; +Cc: David S. Miller

Hi Heiner,

with commit a6851c613fd7 "r8169: remove 1000/Half from supported modes" my RTL8111B GB-link stops working. It thinks that it established a link, however nothing is actually transmitted. Setting the mode with `mii-tool -F 100baseTx-HD` establishes a successful connection.

Reverting a6851c613fd7 (while considering the file name change) fixes autonegotiation for me.

Kind regards
Bernhard

^ permalink raw reply

* Re: [REGRESSION] 5.3-rc1: r8169: remove 1000/Half from supported modes
From: Heiner Kallweit @ 2019-07-26 20:24 UTC (permalink / raw)
  To: Bernhard Held, linux-kernel, netdev; +Cc: David S. Miller
In-Reply-To: <a291af45-310c-8b60-ae7e-392e73e3bad1@gmx.de>

On 26.07.2019 22:16, Bernhard Held wrote:
> Hi Heiner,
> 
> with commit a6851c613fd7 "r8169: remove 1000/Half from supported modes" my RTL8111B GB-link stops working. It thinks that it established a link, however nothing is actually transmitted. Setting the mode with `mii-tool -F 100baseTx-HD` establishes a successful connection.
> 
Can you provide standard ethtool output w/ and w/o this patch? Also a full dmesg output
with the patch would be helpful.
Is "100baseTx-HD" a typo and you mean GBit? And any special reason why you set half duplex?

> Reverting a6851c613fd7 (while considering the file name change) fixes autonegotiation for me.
> 
> Kind regards
> Bernhard
> 
Heiner

^ permalink raw reply

* [PATCH bpf-next 0/9] Revamp test_progs as a test running framework
From: Andrii Nakryiko @ 2019-07-26 20:37 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

This patch set makes a number of changes to test_progs selftest, which is
a collection of many other tests (and sometimes sub-tests as well), to provide
better testing experience and allow to start convering many individual test
programs under selftests/bpf into a single and convenient test runner.

Patch #1 fixes issue with Makefile, which makes prog_tests/test.h compiled as
a C code. This fix allows to change how test.h is generated, providing ability
to have more control on what and how tests are run.

Patch #2 changes how test.h is auto-generated, which allows to have test
definitions, instead of just running test functions. This gives ability to do
more complicated test run policies.

Patch #3 adds `-t <test-name>` and `-n <test-num>` selectors to run only
subset of tests.

Patch #4 adds libbpf_swap_print() API allowing to temporarily replace current
print callback and then set it back. This is necessary for some tests that
want more control over libbpf logging.

Patch #5 sets up and takes over libbpf logging from individual tests to
test_prog runner, adding -vv verbosity to capture debug output from libbpf.
This is useful when debugging failing tests.

Patch #6 furthers test output management and buffers it by default, emitting
log output only if test fails. This give succinct and clean default test
output. It's possible to bypass this behavior with -v flag, which will turn
off test output buffering.

Patch #7 adds support for sub-tests. It also enhances -t and -n selectors to
both support ability to specify sub-test selectors, as well as enhancing
number selector to accept sets of test, instead of just individual test
number.

Patch #8 converts bpf_verif_scale.c test to use sub-test APIs.

Patch #9 converts send_signal.c tests to use sub-test APIs.

Andrii Nakryiko (9):
  selftests/bpf: prevent headers to be compiled as C code
  selftests/bpf: revamp test_progs to allow more control
  selftests/bpf: add test selectors by number and name to test_progs
  libbpf: add libbpf_swap_print to get previous print func
  selftest/bpf: centralize libbpf logging management for test_progs
  selftests/bpf: abstract away test log output
  selftests/bpf: add sub-tests support for test_progs
  selftests/bpf: convert bpf_verif_scale.c to sub-tests API
  selftests/bpf: convert send_signal.c to use subtests

 tools/lib/bpf/libbpf.c                        |   8 +
 tools/lib/bpf/libbpf.h                        |   1 +
 tools/lib/bpf/libbpf.map                      |   5 +
 tools/testing/selftests/bpf/Makefile          |  14 +-
 .../selftests/bpf/prog_tests/bpf_obj_id.c     |   6 +-
 .../bpf/prog_tests/bpf_verif_scale.c          |  90 +++--
 .../bpf/prog_tests/get_stack_raw_tp.c         |   4 +-
 .../selftests/bpf/prog_tests/l4lb_all.c       |   2 +-
 .../selftests/bpf/prog_tests/map_lock.c       |  10 +-
 .../bpf/prog_tests/reference_tracking.c       |  15 +-
 .../selftests/bpf/prog_tests/send_signal.c    |  17 +-
 .../selftests/bpf/prog_tests/spinlock.c       |   2 +-
 .../bpf/prog_tests/stacktrace_build_id.c      |   4 +-
 .../bpf/prog_tests/stacktrace_build_id_nmi.c  |   4 +-
 .../selftests/bpf/prog_tests/xdp_noinline.c   |   3 +-
 tools/testing/selftests/bpf/test_progs.c      | 373 +++++++++++++++++-
 tools/testing/selftests/bpf/test_progs.h      |  45 ++-
 17 files changed, 501 insertions(+), 102 deletions(-)

-- 
2.17.1

^ permalink raw reply

* [PATCH bpf-next 1/9] selftests/bpf: prevent headers to be compiled as C code
From: Andrii Nakryiko @ 2019-07-26 20:37 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190726203747.1124677-1-andriin@fb.com>

Apprently listing header as a normal dependency for a binary output
makes it go through compilation as if it was C code. This currently
works without a problem, but in subsequent commits causes problems for
differently generated test.h for test_progs. Marking those headers as
order-only dependency solves the issue.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/testing/selftests/bpf/Makefile | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 11c9c62c3362..bb66cc4a7f34 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -235,7 +235,7 @@ PROG_TESTS_H := $(PROG_TESTS_DIR)/tests.h
 PROG_TESTS_FILES := $(wildcard prog_tests/*.c)
 test_progs.c: $(PROG_TESTS_H)
 $(OUTPUT)/test_progs: CFLAGS += $(TEST_PROGS_CFLAGS)
-$(OUTPUT)/test_progs: test_progs.c $(PROG_TESTS_H) $(PROG_TESTS_FILES)
+$(OUTPUT)/test_progs: test_progs.c $(PROG_TESTS_FILES) | $(PROG_TESTS_H)
 $(PROG_TESTS_H): $(PROG_TESTS_FILES) | $(PROG_TESTS_DIR)
 	$(shell ( cd prog_tests/; \
 		  echo '/* Generated header, do not edit */'; \
@@ -256,7 +256,7 @@ MAP_TESTS_H := $(MAP_TESTS_DIR)/tests.h
 MAP_TESTS_FILES := $(wildcard map_tests/*.c)
 test_maps.c: $(MAP_TESTS_H)
 $(OUTPUT)/test_maps: CFLAGS += $(TEST_MAPS_CFLAGS)
-$(OUTPUT)/test_maps: test_maps.c $(MAP_TESTS_H) $(MAP_TESTS_FILES)
+$(OUTPUT)/test_maps: test_maps.c $(MAP_TESTS_FILES) | $(MAP_TESTS_H)
 $(MAP_TESTS_H): $(MAP_TESTS_FILES) | $(MAP_TESTS_DIR)
 	$(shell ( cd map_tests/; \
 		  echo '/* Generated header, do not edit */'; \
@@ -277,7 +277,7 @@ VERIFIER_TESTS_H := $(VERIFIER_TESTS_DIR)/tests.h
 VERIFIER_TEST_FILES := $(wildcard verifier/*.c)
 test_verifier.c: $(VERIFIER_TESTS_H)
 $(OUTPUT)/test_verifier: CFLAGS += $(TEST_VERIFIER_CFLAGS)
-$(OUTPUT)/test_verifier: test_verifier.c $(VERIFIER_TESTS_H)
+$(OUTPUT)/test_verifier: test_verifier.c | $(VERIFIER_TEST_FILES) $(VERIFIER_TESTS_H)
 $(VERIFIER_TESTS_H): $(VERIFIER_TEST_FILES) | $(VERIFIER_TESTS_DIR)
 	$(shell ( cd verifier/; \
 		  echo '/* Generated header, do not edit */'; \
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf-next 2/9] selftests/bpf: revamp test_progs to allow more control
From: Andrii Nakryiko @ 2019-07-26 20:37 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190726203747.1124677-1-andriin@fb.com>

Refactor test_progs to allow better control on what's being run.
Also use argp to do argument parsing, so that it's easier to keep adding
more options.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/testing/selftests/bpf/Makefile     |  8 +--
 tools/testing/selftests/bpf/test_progs.c | 84 +++++++++++++++++++++---
 2 files changed, 77 insertions(+), 15 deletions(-)

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index bb66cc4a7f34..3bd0f4a0336a 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -239,14 +239,8 @@ $(OUTPUT)/test_progs: test_progs.c $(PROG_TESTS_FILES) | $(PROG_TESTS_H)
 $(PROG_TESTS_H): $(PROG_TESTS_FILES) | $(PROG_TESTS_DIR)
 	$(shell ( cd prog_tests/; \
 		  echo '/* Generated header, do not edit */'; \
-		  echo '#ifdef DECLARE'; \
 		  ls *.c 2> /dev/null | \
-			sed -e 's@\([^\.]*\)\.c@extern void test_\1(void);@'; \
-		  echo '#endif'; \
-		  echo '#ifdef CALL'; \
-		  ls *.c 2> /dev/null | \
-			sed -e 's@\([^\.]*\)\.c@test_\1();@'; \
-		  echo '#endif' \
+			sed -e 's@\([^\.]*\)\.c@DEFINE_TEST(\1)@'; \
 		 ) > $(PROG_TESTS_H))
 
 MAP_TESTS_DIR = $(OUTPUT)/map_tests
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index dae0819b1141..eea88ba59225 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -3,6 +3,7 @@
  */
 #include "test_progs.h"
 #include "bpf_rlimit.h"
+#include <argp.h>
 
 int error_cnt, pass_cnt;
 bool jit_enabled;
@@ -156,22 +157,89 @@ void *spin_lock_thread(void *arg)
 	pthread_exit(arg);
 }
 
-#define DECLARE
+/* extern declarations for test funcs */
+#define DEFINE_TEST(name) extern void test_##name();
 #include <prog_tests/tests.h>
-#undef DECLARE
+#undef DEFINE_TEST
 
-int main(int ac, char **av)
+struct prog_test_def {
+	const char *test_name;
+	void (*run_test)(void);
+};
+
+static struct prog_test_def prog_test_defs[] = {
+#define DEFINE_TEST(name) {	      \
+	.test_name = #name,	      \
+	.run_test = &test_##name,   \
+},
+#include <prog_tests/tests.h>
+#undef DEFINE_TEST
+};
+
+const char *argp_program_version = "test_progs 0.1";
+const char *argp_program_bug_address = "<bpf@vger.kernel.org>";
+const char argp_program_doc[] = "BPF selftests test runner";
+
+enum ARG_KEYS {
+	ARG_VERIFIER_STATS = 's',
+};
+	
+static const struct argp_option opts[] = {
+	{ "verifier-stats", ARG_VERIFIER_STATS, NULL, 0,
+	  "Output verifier statistics", },
+	{},
+};
+
+struct test_env {
+	bool verifier_stats;
+};
+
+static struct test_env env = {};
+
+static error_t parse_arg(int key, char *arg, struct argp_state *state)
 {
+	struct test_env *env = state->input;
+
+	switch (key) {
+	case ARG_VERIFIER_STATS:
+		env->verifier_stats = true;
+		break;
+	case ARGP_KEY_ARG:
+		argp_usage(state);
+		break;
+	case ARGP_KEY_END:
+		break;
+	default:
+		return ARGP_ERR_UNKNOWN;
+	}
+	return 0;
+}
+
+
+int main(int argc, char **argv)
+{
+	static const struct argp argp = {
+		.options = opts,
+		.parser = parse_arg,
+		.doc = argp_program_doc,
+	};
+	const struct prog_test_def *def;
+	int err, i;
+
+	err = argp_parse(&argp, argc, argv, 0, NULL, &env);
+	if (err)
+		return err;
+
 	srand(time(NULL));
 
 	jit_enabled = is_jit_enabled();
 
-	if (ac == 2 && strcmp(av[1], "-s") == 0)
-		verifier_stats = true;
+	verifier_stats = env.verifier_stats;
 
-#define CALL
-#include <prog_tests/tests.h>
-#undef CALL
+	for (i = 0; i < ARRAY_SIZE(prog_test_defs); i++) {
+		def = &prog_test_defs[i];
+		def->run_test();
+	}
 
 	printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt);
 	return error_cnt ? EXIT_FAILURE : EXIT_SUCCESS;
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf-next 3/9] selftests/bpf: add test selectors by number and name to test_progs
From: Andrii Nakryiko @ 2019-07-26 20:37 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190726203747.1124677-1-andriin@fb.com>

Add ability to specify either test number or test name substring to
narrow down a set of test to run.

Usage:
sudo ./test_progs -n 1
sudo ./test_progs -t attach_probe

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/testing/selftests/bpf/test_progs.c | 43 +++++++++++++++++++++---
 1 file changed, 39 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index eea88ba59225..6e04b9f83777 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -4,6 +4,7 @@
 #include "test_progs.h"
 #include "bpf_rlimit.h"
 #include <argp.h>
+#include <string.h>
 
 int error_cnt, pass_cnt;
 bool jit_enabled;
@@ -164,6 +165,7 @@ void *spin_lock_thread(void *arg)
 
 struct prog_test_def {
 	const char *test_name;
+	int test_num;
 	void (*run_test)(void);
 };
 
@@ -181,26 +183,49 @@ const char *argp_program_bug_address = "<bpf@vger.kernel.org>";
 const char argp_program_doc[] = "BPF selftests test runner";
 
 enum ARG_KEYS {
+	ARG_TEST_NUM = 'n',
+	ARG_TEST_NAME = 't',
 	ARG_VERIFIER_STATS = 's',
 };
 	
 static const struct argp_option opts[] = {
+	{ "num", ARG_TEST_NUM, "NUM", 0,
+	  "Run test number NUM only " },
+	{ "name", ARG_TEST_NAME, "NAME", 0,
+	  "Run tests with names containing NAME" },
 	{ "verifier-stats", ARG_VERIFIER_STATS, NULL, 0,
 	  "Output verifier statistics", },
 	{},
 };
 
 struct test_env {
+	int test_num_selector;
+	const char *test_name_selector;
 	bool verifier_stats;
 };
 
-static struct test_env env = {};
+static struct test_env env = {
+	.test_num_selector = -1,
+};
 
 static error_t parse_arg(int key, char *arg, struct argp_state *state)
 {
 	struct test_env *env = state->input;
 
 	switch (key) {
+	case ARG_TEST_NUM: {
+		int test_num;
+
+		errno = 0;
+		test_num = strtol(arg, NULL, 10);
+		if (errno)
+			return -errno;
+		env->test_num_selector = test_num;
+		break;
+	}
+	case ARG_TEST_NAME:
+		env->test_name_selector = arg;
+		break;
 	case ARG_VERIFIER_STATS:
 		env->verifier_stats = true;
 		break;
@@ -223,7 +248,7 @@ int main(int argc, char **argv)
 		.parser = parse_arg,
 		.doc = argp_program_doc,
 	};
-	const struct prog_test_def *def;
+	struct prog_test_def *test;
 	int err, i;
 
 	err = argp_parse(&argp, argc, argv, 0, NULL, &env);
@@ -237,8 +262,18 @@ int main(int argc, char **argv)
 	verifier_stats = env.verifier_stats;
 
 	for (i = 0; i < ARRAY_SIZE(prog_test_defs); i++) {
-		def = &prog_test_defs[i];
-		def->run_test();
+		test = &prog_test_defs[i];
+
+		test->test_num = i + 1;
+
+		if (env.test_num_selector >= 0 &&
+		    test->test_num != env.test_num_selector)
+			continue;
+		if (env.test_name_selector &&
+		    !strstr(test->test_name, env.test_name_selector))
+			continue;
+
+		test->run_test();
 	}
 
 	printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt);
-- 
2.17.1


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox