Netdev List


* Re: [PATCH net-next,v4 08/12] drivers: net: use flow block API
From: Edward Cree @ 2019-08-16 17:00 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netdev, netfilter-devel
In-Reply-To: <20190816010421.if6mbyl2n3fsujy4@salvia>

On 16/08/2019 02:04, Pablo Neira Ayuso wrote:
> On Wed, Aug 14, 2019 at 05:17:20PM +0100, Edward Cree wrote:
>> TBH I'm still not clear why you need a flow_block per subsystem, rather than
>>  just having multiple subsystems feed their offload requests through the same
>>  flow_block but with different enum tc_setup_type or enum tc_fl_command or
>>  some other indication that this is "netfilter" rather than "tc" asking for a
>>  tc_cls_flower_offload.
> In tc, the flow_block is set up when the ingress qdisc is
> registered. The usual scenario for most drivers is to have one single
> flow_block per registered ingress qdisc, this makes a 1:1 mapping
> between ingress qdisc and flow_block.
>
> Still, you can register two or more ingress qdiscs to make them share
> the same policy via 'tc block'. In that case all those qdiscs use one
> single flow_block. This makes a N:1 mapping between these qdisc
> ingress and the flow_block. This policy applies to all ingress qdiscs
> that are part of the same tc block. By 'tc block', I'm referring to the
> tcf_block structure.
>
> In netfilter, there are ingress basechains that are registered per
> device. Each basechain gets a flow_block when the basechain is
> registered. Shared blocks as in tcf_block are not yet supported, but
> it should not be hard to extend it to make this work.
>
> To reuse the same flow_block as entry point for all subsystems as you
> propose - assuming offloads for two or more subsystems are in place -
> then all of them would need to have the same block sharing
> configuration, which might not be the case, i.e. tc ingress might have
> eth0 and eth1 use the same policy via flow_block, while netfilter
> might have one basechain for eth0 and another for eth1 (no policy
> sharing).
Thank you, that's very helpful.
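
The sharing mismatch described above can be sketched in a few lines of
userspace C. This is only a toy model under stated assumptions: the
demo_* types and functions are hypothetical stand-ins, not the kernel's
struct flow_block or tcf_block, and the scenario is the eth0/eth1
example from the quoted text.

```c
#include <stddef.h>

/* Hypothetical stand-in for a policy entry point; NOT the kernel's struct flow_block. */
struct demo_block {
	int refcnt;	/* how many devices are bound to this block */
};

struct demo_dev {
	const char *name;
	struct demo_block *tc_block;	/* tc ingress binding */
	struct demo_block *nft_block;	/* netfilter basechain binding */
};

void demo_bind(struct demo_block **slot, struct demo_block *blk)
{
	*slot = blk;
	blk->refcnt++;
}

/*
 * Build the scenario from the mail: tc shares one block between eth0 and
 * eth1 (N:1), netfilter gives each device its own block (1:1).
 * blocks[0] is the shared tc block, blocks[1] and blocks[2] are per-device.
 */
void demo_setup(struct demo_dev devs[2], struct demo_block blocks[3])
{
	size_t i;

	for (i = 0; i < 3; i++)
		blocks[i].refcnt = 0;

	devs[0].name = "eth0";
	devs[1].name = "eth1";

	demo_bind(&devs[0].tc_block, &blocks[0]);
	demo_bind(&devs[1].tc_block, &blocks[0]);
	demo_bind(&devs[0].nft_block, &blocks[1]);
	demo_bind(&devs[1].nft_block, &blocks[2]);
}

/* Non-zero when both devices hang off the same block for that subsystem. */
int demo_tc_shared(const struct demo_dev *a, const struct demo_dev *b)
{
	return a->tc_block == b->tc_block;
}

int demo_nft_shared(const struct demo_dev *a, const struct demo_dev *b)
{
	return a->nft_block == b->nft_block;
}
```

In this model a device needs two independent block pointers to express
"shared for tc, not shared for netfilter". A single pointer per device
could only hold one of the two sharing configurations.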

>> This really needs a design document explaining what all the bits are, how
>>  they fit together, and why they need to be like that.
> I did not design this flow_block abstraction, this concept was already
> in place under a different name and I extended it to the ethtool/netfilter
> subsystems to avoid driver code duplication for offloads.
It's more the new implementation that you've created as part of this
 extension that I was asking about, although I agree that the
 abstraction that already existed is in need of documentation too.


* Re: [PATCH RFC net-next 3/3] net: dsa: mv88e6xxx: setup SERDES irq also for CPU/DSA ports
From: Marek Behun @ 2019-08-16 17:05 UTC (permalink / raw)
  To: Vivien Didelot; +Cc: netdev, Andrew Lunn, Vladimir Oltean, Florian Fainelli
In-Reply-To: <20190816122552.GC629@t480s.localdomain>

On Fri, 16 Aug 2019 12:25:52 -0400
Vivien Didelot <vivien.didelot@gmail.com> wrote:

> So now we have mv88e6xxx_setup_port() and mv88e6xxx_port_setup(), which both
> set up a port, differently, at different times. This is definitely error prone.

Hmm. I don't know how much of mv88e6xxx_setup_port() could be moved to
this new port_setup(), since there are other setup functions called in
mv88e6xxx_setup() that can possibly depend on what was done by
mv88e6xxx_setup_port().

Maybe the new DSA operations should be called .after_setup()
and .before_teardown(), and be called just once for the whole switch,
not for each port?


* [bpf-next,v2] selftests/bpf: fix race in test_tcp_rtt test
From: Petar Penkov @ 2019-08-16 17:08 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, sdf, Petar Penkov

From: Petar Penkov <ppenkov@google.com>

There is a race in this test between receiving the ACK for the
single-byte packet sent in the test, and reading the values from the
map.

This patch fixes this by having the client wait until there are no more
unacknowledged packets.

Before:
for i in {1..1000}; do ../net/in_netns.sh ./test_tcp_rtt; \
done | grep -c PASSED
< trimmed error messages >
993

After:
for i in {1..10000}; do ../net/in_netns.sh ./test_tcp_rtt; \
done | grep -c PASSED
10000

Fixes: b55873984dab ("selftests/bpf: test BPF_SOCK_OPS_RTT_CB")
Signed-off-by: Petar Penkov <ppenkov@google.com>
---
 tools/testing/selftests/bpf/test_tcp_rtt.c | 31 ++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_tcp_rtt.c b/tools/testing/selftests/bpf/test_tcp_rtt.c
index 90c3862f74a8..93916a69823e 100644
--- a/tools/testing/selftests/bpf/test_tcp_rtt.c
+++ b/tools/testing/selftests/bpf/test_tcp_rtt.c
@@ -6,6 +6,7 @@
 #include <sys/types.h>
 #include <sys/socket.h>
 #include <netinet/in.h>
+#include <netinet/tcp.h>
 #include <pthread.h>
 
 #include <linux/filter.h>
@@ -34,6 +35,30 @@ static void send_byte(int fd)
 		error(1, errno, "Failed to send single byte");
 }
 
+static int wait_for_ack(int fd, int retries)
+{
+	struct tcp_info info;
+	socklen_t optlen;
+	int i, err;
+
+	for (i = 0; i < retries; i++) {
+		optlen = sizeof(info);
+		err = getsockopt(fd, SOL_TCP, TCP_INFO, &info, &optlen);
+		if (err < 0) {
+			log_err("Failed to lookup TCP stats");
+			return err;
+		}
+
+		if (info.tcpi_unacked == 0)
+			return 0;
+
+		usleep(10);
+	}
+
+	log_err("Did not receive ACK");
+	return -1;
+}
+
 static int verify_sk(int map_fd, int client_fd, const char *msg, __u32 invoked,
 		     __u32 dsack_dups, __u32 delivered, __u32 delivered_ce,
 		     __u32 icsk_retransmits)
@@ -149,6 +174,11 @@ static int run_test(int cgroup_fd, int server_fd)
 			 /*icsk_retransmits=*/0);
 
 	send_byte(client_fd);
+	if (wait_for_ack(client_fd, 100) < 0) {
+		err = -1;
+		goto close_client_fd;
+	}
+
 
 	err += verify_sk(map_fd, client_fd, "first payload byte",
 			 /*invoked=*/2,
@@ -157,6 +187,7 @@ static int run_test(int cgroup_fd, int server_fd)
 			 /*delivered_ce=*/0,
 			 /*icsk_retransmits=*/0);
 
+close_client_fd:
 	close(client_fd);
 
 close_bpf_object:
-- 
2.23.0.rc1.153.gdeed80330f-goog



* Re: [PATCH bpf 0/6] tools: bpftool: fix printf()-like functions
From: Alexei Starovoitov @ 2019-08-16 17:11 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Network Development,
	oss-drivers
In-Reply-To: <10602447-213f-fce5-54c7-7952eb3e8712@netronome.com>

On Fri, Aug 16, 2019 at 9:41 AM Quentin Monnet
<quentin.monnet@netronome.com> wrote:
>
> 2019-08-15 22:08 UTC-0700 ~ Alexei Starovoitov
> <alexei.starovoitov@gmail.com>
> > On Thu, Aug 15, 2019 at 7:32 AM Quentin Monnet
> > <quentin.monnet@netronome.com> wrote:
> >>
> >> Hi,
> >> Because the "__printf()" attributes were used only where the functions are
> >> implemented, and not in header files, the checks have not been enforced on
> >> all the calls to printf()-like functions, and a number of errors slipped in
> >> bpftool over time.
> >>
> >> This set cleans up such errors, and then moves the "__printf()" attributes
> >> to header files, so that the checks are performed at all locations.
> >
> > Applied. Thanks
> >
>
> Thanks Alexei!
>
> I noticed the set was applied to the bpf-next tree, and not bpf. Just
> checking if this is intentional?

Yes. I don't see the _fix_ part in there.
Looks like cleanup to me.
I've also considered to push
commit d34b044038bf ("tools: bpftool: close prog FD before exit on
showing a single program")
to bpf-next as well.
That fd leak didn't feel that necessary to push to bpf tree
and risk merge conflicts... but I pushed it to bpf at the end.
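
For readers unfamiliar with the issue in the quoted cover letter: the
compiler only checks format strings against arguments at call sites that
can see the format attribute, which is why it belongs on the declaration
in a header. A minimal sketch follows, assuming GCC or Clang;
DEMO_PRINTF and demo_err() are hypothetical names for illustration and
are not bpftool's helpers.

```c
#include <stdarg.h>
#include <stdio.h>

/* Same shape as the kernel's __printf(a, b) macro, under a demo name. */
#define DEMO_PRINTF(a, b) __attribute__((format(printf, a, b)))

/*
 * "Header" part: putting the attribute on the declaration means every
 * caller that includes the header gets -Wformat checking.
 * Argument 3 is the format string, variadic arguments start at 4.
 */
int demo_err(char *buf, size_t len, const char *fmt, ...) DEMO_PRINTF(3, 4);

/* "Implementation" part: plain vsnprintf() wrapper. */
int demo_err(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);

	return n;
}
```

With the attribute visible, a call such as
demo_err(buf, sizeof(buf), "%s", 42) should draw a format warning when
built with -Wall, instead of misbehaving at run time.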


* Re: [bpf-next,v2] selftests/bpf: fix race in test_tcp_rtt test
From: Stanislav Fomichev @ 2019-08-16 17:14 UTC (permalink / raw)
  To: Petar Penkov; +Cc: netdev, bpf, davem, ast, daniel, sdf, Petar Penkov
In-Reply-To: <20190816170825.22500-1-ppenkov.kernel@gmail.com>

On 08/16, Petar Penkov wrote:
> From: Petar Penkov <ppenkov@google.com>
> 
> There is a race in this test between receiving the ACK for the
> single-byte packet sent in the test, and reading the values from the
> map.
> 
> This patch fixes this by having the client wait until there are no more
> unacknowledged packets.
Reviewed-by: Stanislav Fomichev <sdf@google.com>

Thanks!
> 
> Before:
> for i in {1..1000}; do ../net/in_netns.sh ./test_tcp_rtt; \
> done | grep -c PASSED
> < trimmed error messages >
> 993
> 
> After:
> for i in {1..10000}; do ../net/in_netns.sh ./test_tcp_rtt; \
> done | grep -c PASSED
> 10000
> 
> Fixes: b55873984dab ("selftests/bpf: test BPF_SOCK_OPS_RTT_CB")
> Signed-off-by: Petar Penkov <ppenkov@google.com>
> ---
>  tools/testing/selftests/bpf/test_tcp_rtt.c | 31 ++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/tools/testing/selftests/bpf/test_tcp_rtt.c b/tools/testing/selftests/bpf/test_tcp_rtt.c
> index 90c3862f74a8..93916a69823e 100644
> --- a/tools/testing/selftests/bpf/test_tcp_rtt.c
> +++ b/tools/testing/selftests/bpf/test_tcp_rtt.c
> @@ -6,6 +6,7 @@
>  #include <sys/types.h>
>  #include <sys/socket.h>
>  #include <netinet/in.h>
> +#include <netinet/tcp.h>
>  #include <pthread.h>
>  
>  #include <linux/filter.h>
> @@ -34,6 +35,30 @@ static void send_byte(int fd)
>  		error(1, errno, "Failed to send single byte");
>  }
>  
> +static int wait_for_ack(int fd, int retries)
> +{
> +	struct tcp_info info;
> +	socklen_t optlen;
> +	int i, err;
> +
> +	for (i = 0; i < retries; i++) {
> +		optlen = sizeof(info);
> +		err = getsockopt(fd, SOL_TCP, TCP_INFO, &info, &optlen);
> +		if (err < 0) {
> +			log_err("Failed to lookup TCP stats");
> +			return err;
> +		}
> +
> +		if (info.tcpi_unacked == 0)
> +			return 0;
> +
> +		usleep(10);
> +	}
> +
> +	log_err("Did not receive ACK");
> +	return -1;
> +}
> +
>  static int verify_sk(int map_fd, int client_fd, const char *msg, __u32 invoked,
>  		     __u32 dsack_dups, __u32 delivered, __u32 delivered_ce,
>  		     __u32 icsk_retransmits)
> @@ -149,6 +174,11 @@ static int run_test(int cgroup_fd, int server_fd)
>  			 /*icsk_retransmits=*/0);
>  
>  	send_byte(client_fd);
> +	if (wait_for_ack(client_fd, 100) < 0) {
> +		err = -1;
> +		goto close_client_fd;
> +	}
> +
>  
>  	err += verify_sk(map_fd, client_fd, "first payload byte",
>  			 /*invoked=*/2,
> @@ -157,6 +187,7 @@ static int run_test(int cgroup_fd, int server_fd)
>  			 /*delivered_ce=*/0,
>  			 /*icsk_retransmits=*/0);
>  
> +close_client_fd:
>  	close(client_fd);
>  
>  close_bpf_object:
> -- 
> 2.23.0.rc1.153.gdeed80330f-goog
> 


* Re: [PATCH bpf-next] libbpf: relicense bpf_helpers.h and bpf_endian.h
From: Greg KH @ 2019-08-16 17:15 UTC (permalink / raw)
  To: Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Andrii Nakryiko, bpf, netdev, andrii.nakryiko, kernel-team,
	Michael Holzheu, Naveen N . Rao, David S . Miller,
	Michal Rostecki, John Fastabend, Sargun Dhillon
In-Reply-To: <23a87525-acf5-7a7e-b7b6-3c47b9760eeb@iogearbox.net>

On Fri, Aug 16, 2019 at 05:29:27PM +0200, Daniel Borkmann wrote:
> On 8/16/19 2:10 PM, Jesper Dangaard Brouer wrote:
> > On Thu, 15 Aug 2019 22:45:43 -0700
> > Andrii Nakryiko <andriin@fb.com> wrote:
> > 
> > > bpf_helpers.h and bpf_endian.h contain useful macros and BPF helper
> > > definitions essential to almost every BPF program. Which makes them
> > > useful not just for selftests. To be able to expose them as part of
> > > libbpf, though, we need them to be dual-licensed as LGPL-2.1 OR
> > > BSD-2-Clause. This patch updates licensing of those two files.
> > 
> > I've already ACKed this, and am fine with (LGPL-2.1 OR BSD-2-Clause).
> > 
> > I just want to understand, why "BSD-2-Clause" and not "Apache-2.0" ?
> > 
> > The original argument was that this needed to be compatible with
> > "Apache-2.0", then why not simply add this in the "OR" ?
> 
> Its use is discouraged in the kernel tree, see also LICENSES/dual/Apache-2.0 (below) and
> statement wrt compatibility from https://www.apache.org/licenses/GPL-compatibility.html:
> 
>   Valid-License-Identifier: Apache-2.0
>   SPDX-URL: https://spdx.org/licenses/Apache-2.0.html
>   Usage-Guide:
>     Do NOT use. The Apache-2.0 is not GPL2 compatible. [...]

That is correct, don't use Apache-2 code in the kernel please.  Even as
a dual-license, it's a total mess.

Having this be BSD-2 is actually better, as it should be fine to use
with Apache 2 code, right?

Jesper, do you know of any license that BSD-2 is not compatible with
that is needed?

thanks,

greg k-h


* Re: [PATCH bpf 0/6] tools: bpftool: fix printf()-like functions
From: Quentin Monnet @ 2019-08-16 17:18 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Network Development,
	oss-drivers
In-Reply-To: <CAADnVQLPg8jEsUbKOxzQc5Q1BKrB=urSWiniGwsJhcm=UM7oKA@mail.gmail.com>

2019-08-16 10:11 UTC-0700 ~ Alexei Starovoitov
<alexei.starovoitov@gmail.com>
> On Fri, Aug 16, 2019 at 9:41 AM Quentin Monnet
> <quentin.monnet@netronome.com> wrote:
>>
>> 2019-08-15 22:08 UTC-0700 ~ Alexei Starovoitov
>> <alexei.starovoitov@gmail.com>
>>> On Thu, Aug 15, 2019 at 7:32 AM Quentin Monnet
>>> <quentin.monnet@netronome.com> wrote:
>>>>
>>>> Hi,
>>>> Because the "__printf()" attributes were used only where the functions are
>>>> implemented, and not in header files, the checks have not been enforced on
>>>> all the calls to printf()-like functions, and a number of errors slipped in
>>>> bpftool over time.
>>>>
>>>> This set cleans up such errors, and then moves the "__printf()" attributes
>>>> to header files, so that the checks are performed at all locations.
>>>
>>> Applied. Thanks
>>>
>>
>> Thanks Alexei!
>>
>> I noticed the set was applied to the bpf-next tree, and not bpf. Just
>> checking if this is intentional?
> 
> Yes. I don't see the _fix_ part in there.
> Looks like cleanup to me.
> I've also considered to push
> commit d34b044038bf ("tools: bpftool: close prog FD before exit on
> showing a single program")
> to bpf-next as well.
> That fd leak didn't feel that necessary to push to bpf tree
> and risk merge conflicts... but I pushed it to bpf at the end.
> 

Ok, thanks for explaining. I'll consider submitting this kind of patch
to bpf-next instead in the future.

Quentin


* Re: Unable to create htb tc classes more than 64K
From: Cong Wang @ 2019-08-16 17:45 UTC (permalink / raw)
  To: Akshat Kakkar; +Cc: NetFilter, lartc, netdev
In-Reply-To: <CAA5aLPhf1=wzQG0BAonhR3td-RhEmXaczug8n4hzXCzreb+52g@mail.gmail.com>

On Fri, Aug 16, 2019 at 5:49 AM Akshat Kakkar <akshat.1984@gmail.com> wrote:
>
> I want to have around 1 Million htb tc classes.
> The simple structure of an htb tc class allows having only 64K classes at once.

This is probably due to the limit of the class ID, which is 16 bits for the minor.
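
To make the 16-bit limit concrete: a tc handle is a 32-bit value with
the major number in the upper 16 bits and the minor number in the lower
16 bits. The sketch below mirrors the layout used by the TC_H_MAJ /
TC_H_MIN macros in the uapi headers, but the demo_* helpers are
hypothetical names written for this example. Note that tc parses handle
numbers as hexadecimal, so "100:1" is major 0x100, minor 0x1.

```c
#include <stdint.h>

#define DEMO_MINOR_BITS	16
#define DEMO_MINOR_MASK	0x0000FFFFu

/* Pack major:minor into one 32-bit handle. */
uint32_t demo_classid(uint16_t major, uint16_t minor)
{
	return ((uint32_t)major << DEMO_MINOR_BITS) | minor;
}

uint16_t demo_major(uint32_t handle)
{
	return (uint16_t)(handle >> DEMO_MINOR_BITS);
}

uint16_t demo_minor(uint32_t handle)
{
	return (uint16_t)(handle & DEMO_MINOR_MASK);
}

/* Number of distinct minor values available under a single major. */
uint32_t demo_minor_space(void)
{
	return 1u << DEMO_MINOR_BITS;
}
```

Since every class of one qdisc shares that qdisc's major number, the
minor field is what bounds the class count at roughly 64K per qdisc.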


> But, it is possible to make it more hierarchical using hierarchy of
> qdisc and classes.
> For this I tried something like this
>
> tc qdisc add dev eno2 root handle 100: htb
> tc class add dev eno2 parent 100: classid 100:1 htb rate 100Mbps
> tc class add dev eno2 parent 100: classid 100:2 htb rate 100Mbps
>
> tc qdisc add dev eno2 parent 100:1 handle 1: htb
> tc class add dev eno2 parent 1: classid 1:10 htb rate 100kbps
> tc class add dev eno2 parent 1: classid 1:20 htb rate 300kbps
>
> tc qdisc add dev eno2 parent 100:2 handle 2: htb
> tc class add dev eno2 parent 2: classid 2:10 htb rate 100kbps
> tc class add dev eno2 parent 2: classid 2:20 htb rate 300kbps
>
> What I want is something like:
> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> 0x00000001 fw flowid 1:10
> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> 0x00000002 fw flowid 1:20
> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> 0x00000003 fw flowid 2:10
> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> 0x00000004 fw flowid 2:20
>
> But I am unable to shape my traffic by any of 1:10, 1:20, 2:10 or 2:20.
>
> Can you please suggest, where is it going wrong?
> Is it not possible altogether?

A filter can only select classes on the same level; you are trying to
filter into the child classes, which doesn't work.

Thanks.


* Re: [PATCH bpf 0/6] tools: bpftool: fix printf()-like functions
From: Jakub Kicinski @ 2019-08-16 17:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Quentin Monnet, Alexei Starovoitov, Daniel Borkmann, bpf,
	Network Development, oss-drivers
In-Reply-To: <CAADnVQLPg8jEsUbKOxzQc5Q1BKrB=urSWiniGwsJhcm=UM7oKA@mail.gmail.com>

On Fri, 16 Aug 2019 10:11:12 -0700, Alexei Starovoitov wrote:
> On Fri, Aug 16, 2019 at 9:41 AM Quentin Monnet wrote:
> > 2019-08-15 22:08 UTC-0700 ~ Alexei Starovoitov
> > <alexei.starovoitov@gmail.com>  
> > > On Thu, Aug 15, 2019 at 7:32 AM Quentin Monnet
> > > <quentin.monnet@netronome.com> wrote:  
> > >>
> > >> Hi,
> > >> Because the "__printf()" attributes were used only where the functions are
> > >> implemented, and not in header files, the checks have not been enforced on
> > >> all the calls to printf()-like functions, and a number of errors slipped in
> > >> bpftool over time.
> > >>
> > >> This set cleans up such errors, and then moves the "__printf()" attributes
> > >> to header files, so that the checks are performed at all locations.  
> > >
> > > Applied. Thanks
> > >  
> >
> > Thanks Alexei!
> >
> > I noticed the set was applied to the bpf-next tree, and not bpf. Just
> > checking if this is intentional?  
> 
> Yes. I don't see the _fix_ part in there.

Mm.. these are not critical indeed, but patches 1 and 3 do fix a crash.
Perhaps those should have been a series on their own.

We'll recalibrate :)

> Looks like cleanup to me.
> I've also considered to push
> commit d34b044038bf ("tools: bpftool: close prog FD before exit on
> showing a single program")
> to bpf-next as well.
> That fd leak didn't feel that necessary to push to bpf tree
> and risk merge conflicts... but I pushed it to bpf at the end.



* Re: [PATCH net-next v7 5/6] flow_offload: support get multi-subsystem block
From: Jakub Kicinski @ 2019-08-16 17:56 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: wenxu, David Miller, Jiri Pirko, pablo@netfilter.org,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <vbfpnl55eyg.fsf@mellanox.com>

On Fri, 16 Aug 2019 15:04:44 +0000, Vlad Buslov wrote:
> >> [  401.511871] RSP: 002b:00007ffca2a9fad8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> >> [  401.511875] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fad892d30f8
> >> [  401.511878] RDX: 0000000000000002 RSI: 000055afeb072a90 RDI: 0000000000000001
> >> [  401.511881] RBP: 000055afeb072a90 R08: 00000000ffffffff R09: 000000000000000a
> >> [  401.511884] R10: 000055afeb058710 R11: 0000000000000246 R12: 0000000000000002
> >> [  401.511887] R13: 00007fad893a8780 R14: 0000000000000002 R15: 00007fad893a3740
> >>
> >> I don't think it is correct approach to try to call these callbacks with
> >> rcu protection because:
> >>
> >> - Cls API uses sleeping locks that cannot be used in rcu read section
> >>   (hence the included trace).
> >>
> >> - It assumes that all implementation of classifier ops reoffload() don't
> >>   sleep.
> >>
> >> - And that all driver offload callbacks (both block and classifier
> >>   setup) don't sleep, which is not the case.
> >>
> >> I don't see any straightforward way to fix this, besides using some
> >> other locking mechanism to protect block_ing_cb_list.
> >>
> >> Regards,
> >> Vlad  
> >
> > Maybe take the mutex flow_indr_block_ing_cb_lock for lookup, add and
> > delete on the callback lists? Add and delete only happen in the module
> > init case, and lookup is also infrequent (only on [un]register), so it
> > can be protected with the lock as well.
> 
> That should do the job. I'll send the patch.

Hi Vlad! 

While looking into this, would you mind also adding the missing
flow_block_cb_is_busy() calls in the indirect handlers in the drivers?

LMK if you're too busy, I don't want this to get forgotten :)


* Re: [net-next 01/15] ice: Implement ethtool ops for channels
From: Nguyen, Anthony L @ 2019-08-16 18:01 UTC (permalink / raw)
  To: jakub.kicinski@netronome.com
  Cc: nhorman@redhat.com, davem@davemloft.net, Kirsher, Jeffrey T,
	Bowers, AndrewX, netdev@vger.kernel.org, sassmann@redhat.com,
	Tieman, Henry W
In-Reply-To: <20190812152416.35f98091@cakuba.netronome.com>


On Mon, 2019-08-12 at 15:24 -0700, Jakub Kicinski wrote:
> On Mon, 12 Aug 2019 15:07:09 +0000, Nguyen, Anthony L wrote:
> > On Fri, 2019-08-09 at 14:15 -0700, Jakub Kicinski wrote:
> > > On Fri,  9 Aug 2019 11:31:25 -0700, Jeff Kirsher wrote:  
> > > > From: Henry Tieman <henry.w.tieman@intel.com>
> > > > 
> > > > Add code to query and set the number of queues on the primary
> > > > VSI for a PF. This is accessed from the 'ethtool -l' and
> > > > 'ethtool -L' commands, respectively.
> > > > 
> > > > Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
> > > > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> > > > Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
> > > > Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>  
> > > 
> > > If you're using the same IRQ vector for RX and TX queue the channel
> > > counts as combined. Looks like you are counting RX and TX separately
> > > here. That's incorrect.  
> > 
> > Hi Jakub,
> > 
> > The ice driver can support asymmetric queues.  We report these
> > separately, as opposed to combined, so that the user can specify a
> > different number of Rx and Tx queues.
> 
> If you have 20 IRQ vectors, 10 TX queues and 20 RX queues, and the first
> 10 RX queues share an IRQ vector with TX queues, the ethtool API counts
> them as 10 combined and 10 rx-only.
> 
> 10 tx-only and 20 rx-only would require 30 IRQ vectors.
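
The counting rule quoted above can be written down as a small
calculation. This is a sketch under one assumption that is not stated in
the thread: RX and TX queues are paired onto a shared vector whenever
both exist. The demo_* names are hypothetical and do not come from the
ice driver or the ethtool API.

```c
/* Channel counts in the sense of 'ethtool -l': one channel per IRQ vector. */
struct demo_channels {
	unsigned int combined;	/* vector services one RX and one TX queue */
	unsigned int rx_only;	/* vector services an RX queue only */
	unsigned int tx_only;	/* vector services a TX queue only */
};

/*
 * Assumption: queues are paired onto shared vectors as far as possible,
 * so the smaller of the two queue counts becomes the combined count.
 */
struct demo_channels demo_count_channels(unsigned int rx_queues,
					 unsigned int tx_queues)
{
	struct demo_channels ch;

	ch.combined = rx_queues < tx_queues ? rx_queues : tx_queues;
	ch.rx_only = rx_queues - ch.combined;
	ch.tx_only = tx_queues - ch.combined;

	return ch;
}

/* Every channel, whatever its kind, consumes exactly one IRQ vector. */
unsigned int demo_vectors_needed(const struct demo_channels *ch)
{
	return ch->combined + ch->rx_only + ch->tx_only;
}
```

For 20 RX and 10 TX queues this gives 10 combined plus 10 rx-only on 20
vectors, while reporting the same queues as 20 rx-only and 10 tx-only
would imply 30 vectors, matching the numbers in the quoted mail.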

Thanks for the feedback Jakub.  We are looking into this.

-Tony



* Re: linux-next: manual merge of the net-next tree with the kbuild tree
From: Kees Cook @ 2019-08-16 18:03 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Stephen Rothwell, David Miller, Networking, Masahiro Yamada,
	Linux Next Mailing List, Linux Kernel Mailing List,
	Andrii Nakryiko, Daniel Borkmann
In-Reply-To: <CAEf4BzY9dDZF-DBDmuQQz0Rcx3DNGvQn_GLr0Uar1PAbAf2iig@mail.gmail.com>

On Thu, Aug 15, 2019 at 10:21:29PM -0700, Andrii Nakryiko wrote:
> On Thu, Aug 15, 2019 at 7:42 PM Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> >
> > Hi all,
> >
> > Today's linux-next merge of the net-next tree got a conflict in:
> >
> >   scripts/link-vmlinux.sh
> >
> > between commit:
> >
> >   e167191e4a8a ("kbuild: Parameterize kallsyms generation and correct reporting")
> >
> > from the kbuild tree and commits:
> >
> >   341dfcf8d78e ("btf: expose BTF info through sysfs")
> >   7fd785685e22 ("btf: rename /sys/kernel/btf/kernel into /sys/kernel/btf/vmlinux")
> >
> > from the net-next tree.
> >
> > I fixed it up (I think - see below) and can carry the fix as necessary.
> 
> Thanks, Stephen! Looks good except one minor issue below.
> 
> > This is now fixed as far as linux-next is concerned, but any non trivial
> > conflicts should be mentioned to your upstream maintainer when your tree
> > is submitted for merging.  You may also want to consider cooperating
> > with the maintainer of the conflicting tree to minimise any particularly
> > complex conflicts.
> >
> > --
> > Cheers,
> > Stephen Rothwell
> >
> > diff --cc scripts/link-vmlinux.sh
> > index 2438a9faf3f1,c31193340108..000000000000
> > --- a/scripts/link-vmlinux.sh
> > +++ b/scripts/link-vmlinux.sh
> > @@@ -56,11 -56,10 +56,11 @@@ modpost_link(
> >   }
> >
> >   # Link of vmlinux
> > - # ${1} - optional extra .o files
> > - # ${2} - output file
> > + # ${1} - output file
> > + # ${@:2} - optional extra .o files
> >   vmlinux_link()
> >   {
> >  +      info LD ${2}
> 
> This needs to be ${1}.
> 
> >         local lds="${objtree}/${KBUILD_LDS}"
> >         local objects
> >
> > @@@ -139,18 -149,6 +150,18 @@@ kallsyms(
> >         ${CC} ${aflags} -c -o ${2} ${afile}
> >   }
> >
> >  +# Perform one step in kallsyms generation, including temporary linking of
> >  +# vmlinux.
> >  +kallsyms_step()
> >  +{
> >  +      kallsymso_prev=${kallsymso}
> >  +      kallsymso=.tmp_kallsyms${1}.o
> >  +      kallsyms_vmlinux=.tmp_vmlinux${1}
> >  +
> > -       vmlinux_link "${kallsymso_prev}" ${kallsyms_vmlinux}
> > ++      vmlinux_link ${kallsyms_vmlinux} "${kallsymso_prev}" ${btf_vmlinux_bin_o}

Good cleanup on the "optional .o files" reordering! With your ordering
change, I think the ""s around ${kallsymso_prev} here are no longer needed
(which makes it read a bit more nicely).

> >  +      kallsyms ${kallsyms_vmlinux} ${kallsymso}
> >  +}
> >  +
> >   # Create map file with all symbols from ${1}
> >   # See mksymap for additional details
> >   mksysmap()
> > @@@ -228,8 -227,14 +240,15 @@@ ${MAKE} -f "${srctree}/scripts/Makefile
> >   info MODINFO modules.builtin.modinfo
> >   ${OBJCOPY} -j .modinfo -O binary vmlinux.o modules.builtin.modinfo
> >
> > + btf_vmlinux_bin_o=""
> > + if [ -n "${CONFIG_DEBUG_INFO_BTF}" ]; then
> > +       if gen_btf .tmp_vmlinux.btf .btf.vmlinux.bin.o ; then
> > +               btf_vmlinux_bin_o=.btf.vmlinux.bin.o
> > +       fi
> > + fi
> > +
> >   kallsymso=""
> >  +kallsymso_prev=""
> >   kallsyms_vmlinux=""
> >   if [ -n "${CONFIG_KALLSYMS}" ]; then
> >
> > @@@ -268,11 -285,8 +287,7 @@@
> >         fi
> >   fi
> >
> > - vmlinux_link "${kallsymso}" vmlinux
> > -
> > - if [ -n "${CONFIG_DEBUG_INFO_BTF}" ]; then
> > -       gen_btf vmlinux
> > - fi
> >  -info LD vmlinux
> > + vmlinux_link vmlinux "${kallsymso}" "${btf_vmlinux_bin_o}"

And, I think, also not here for either trailing argument.

> >
> >   if [ -n "${CONFIG_BUILDTIME_EXTABLE_SORT}" ]; then
> >         info SORTEX vmlinux

-Kees

-- 
Kees Cook


* Re: [RFC bpf-next 0/3] tools: bpftool: add subcommand to count map entries
From: Edward Cree @ 2019-08-16 18:13 UTC (permalink / raw)
  To: Quentin Monnet, Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, netdev, oss-drivers
In-Reply-To: <031de7fd-caa7-9e66-861f-8e46e5bb8851@netronome.com>

On 15/08/2019 15:15, Quentin Monnet wrote:
> So if I understand correctly, we would use the bpf() syscall to trigger
> a run of such program on all map entries (for map implementing the new
> operation), and the context would include pointers to the key and the
> value for the entry being processed so we can count/sum/compute an
> average of the values or any other kind of processing?
Yep, that's pretty much exactly what I had in mind.

-Ed


* kernel BUG at include/linux/skbuff.h:LINE! (2)
From: syzbot @ 2019-08-16 18:38 UTC (permalink / raw)
  To: davem, linux-kernel, linux-sctp, marcelo.leitner, netdev, nhorman,
	syzkaller-bugs, vyasevich

Hello,

syzbot found the following crash on:

HEAD commit:    459c5fb4 Merge branch 'mscc-PTP-support'
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=13f2d33c600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=d4cf1ffb87d590d7
dashboard link: https://syzkaller.appspot.com/bug?extid=eb349eeee854e389c36d
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=111849e2600000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1442c25a600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+eb349eeee854e389c36d@syzkaller.appspotmail.com

------------[ cut here ]------------
kernel BUG at include/linux/skbuff.h:2225!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 9030 Comm: syz-executor649 Not tainted 5.3.0-rc3+ #134
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__skb_pull include/linux/skbuff.h:2225 [inline]
RIP: 0010:__skb_pull include/linux/skbuff.h:2222 [inline]
RIP: 0010:skb_pull_inline include/linux/skbuff.h:2231 [inline]
RIP: 0010:skb_pull+0xea/0x110 net/core/skbuff.c:1902
Code: 9d c8 00 00 00 49 89 dc 49 89 9d c8 00 00 00 e8 9c e5 dd fb 4c 89 e0 5b 41 5c 41 5d 41 5e 5d c3 45 31 e4 eb ea e8 86 e5 dd fb <0f> 0b e8 df 13 18 fc e9 44 ff ff ff e8 d5 13 18 fc eb 8a e8 ee 13
RSP: 0018:ffff88808ac96e10 EFLAGS: 00010293
RAX: ffff88809c546000 RBX: 0000000000000004 RCX: ffffffff8594a3a6
RDX: 0000000000000000 RSI: ffffffff8594a3fa RDI: 0000000000000004
RBP: ffff88808ac96e30 R08: ffff88809c546000 R09: fffffbfff14a8f4f
R10: fffffbfff14a8f4e R11: ffffffff8a547a77 R12: 0000000095e28bcc
R13: ffff88808ac97478 R14: 00000000ffff8880 R15: ffff88808ac97478
FS:  0000555556549880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000100 CR3: 0000000089c3c000 CR4: 00000000001406f0
Call Trace:
  sctp_inq_pop+0x2f1/0xd80 net/sctp/inqueue.c:202
  sctp_endpoint_bh_rcv+0x184/0x8d0 net/sctp/endpointola.c:385
  sctp_inq_push+0x1e4/0x280 net/sctp/inqueue.c:80
  sctp_rcv+0x2807/0x3590 net/sctp/input.c:256
  sctp6_rcv+0x17/0x30 net/sctp/ipv6.c:1049
  ip6_protocol_deliver_rcu+0x2fe/0x1660 net/ipv6/ip6_input.c:397
  ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:438
  NF_HOOK include/linux/netfilter.h:305 [inline]
  NF_HOOK include/linux/netfilter.h:299 [inline]
  ip6_input+0xe4/0x3f0 net/ipv6/ip6_input.c:447
  dst_input include/net/dst.h:442 [inline]
  ip6_sublist_rcv_finish+0x98/0x1e0 net/ipv6/ip6_input.c:84
  ip6_list_rcv_finish net/ipv6/ip6_input.c:118 [inline]
  ip6_sublist_rcv+0x80c/0xcf0 net/ipv6/ip6_input.c:282
  ipv6_list_rcv+0x373/0x4b0 net/ipv6/ip6_input.c:316
  __netif_receive_skb_list_ptype net/core/dev.c:5049 [inline]
  __netif_receive_skb_list_core+0x5fc/0x9d0 net/core/dev.c:5097
  __netif_receive_skb_list net/core/dev.c:5149 [inline]
  netif_receive_skb_list_internal+0x7eb/0xe60 net/core/dev.c:5244
  gro_normal_list.part.0+0x1e/0xb0 net/core/dev.c:5757
  gro_normal_list net/core/dev.c:5755 [inline]
  gro_normal_one net/core/dev.c:5769 [inline]
  napi_frags_finish net/core/dev.c:5782 [inline]
  napi_gro_frags+0xa6a/0xea0 net/core/dev.c:5855
  tun_get_user+0x2e98/0x3fa0 drivers/net/tun.c:1974
  tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2020
  call_write_iter include/linux/fs.h:1870 [inline]
  do_iter_readv_writev+0x5f8/0x8f0 fs/read_write.c:693
  do_iter_write fs/read_write.c:970 [inline]
  do_iter_write+0x184/0x610 fs/read_write.c:951
  vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015
  do_writev+0x15b/0x330 fs/read_write.c:1058
  __do_sys_writev fs/read_write.c:1131 [inline]
  __se_sys_writev fs/read_write.c:1128 [inline]
  __x64_sys_writev+0x75/0xb0 fs/read_write.c:1128
  do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x441b10
Code: 05 48 3d 01 f0 ff ff 0f 83 5d 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 83 3d 01 95 29 00 00 75 14 b8 14 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 34 09 fc ff c3 48 83 ec 08 e8 ba 2b 00 00
RSP: 002b:00007ffe63706b88 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 00007ffe63706ba0 RCX: 0000000000441b10
RDX: 0000000000000001 RSI: 00007ffe63706bd0 RDI: 00000000000000f0
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000004
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000122cb
R13: 0000000000402960 R14: 0000000000000000 R15: 0000000000000000
Modules linked in:
---[ end trace c37566c1c02066db ]---
RIP: 0010:__skb_pull include/linux/skbuff.h:2225 [inline]
RIP: 0010:__skb_pull include/linux/skbuff.h:2222 [inline]
RIP: 0010:skb_pull_inline include/linux/skbuff.h:2231 [inline]
RIP: 0010:skb_pull+0xea/0x110 net/core/skbuff.c:1902
Code: 9d c8 00 00 00 49 89 dc 49 89 9d c8 00 00 00 e8 9c e5 dd fb 4c 89 e0 5b 41 5c 41 5d 41 5e 5d c3 45 31 e4 eb ea e8 86 e5 dd fb <0f> 0b e8 df 13 18 fc e9 44 ff ff ff e8 d5 13 18 fc eb 8a e8 ee 13
RSP: 0018:ffff88808ac96e10 EFLAGS: 00010293
RAX: ffff88809c546000 RBX: 0000000000000004 RCX: ffffffff8594a3a6
RDX: 0000000000000000 RSI: ffffffff8594a3fa RDI: 0000000000000004
RBP: ffff88808ac96e30 R08: ffff88809c546000 R09: fffffbfff14a8f4f
R10: fffffbfff14a8f4e R11: ffffffff8a547a77 R12: 0000000095e28bcc
R13: ffff88808ac97478 R14: 00000000ffff8880 R15: ffff88808ac97478
FS:  0000555556549880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000100 CR3: 0000000089c3c000 CR4: 00000000001406f0


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH net-next v7 5/6] flow_offload: support get multi-subsystem block
From: Vlad Buslov @ 2019-08-16 18:44 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Vlad Buslov, wenxu, David Miller, Jiri Pirko, pablo@netfilter.org,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <20190816105627.57c1c2aa@cakuba.netronome.com>


On Fri 16 Aug 2019 at 20:56, Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> On Fri, 16 Aug 2019 15:04:44 +0000, Vlad Buslov wrote:
>> >> [  401.511871] RSP: 002b:00007ffca2a9fad8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
>> >> [  401.511875] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fad892d30f8
>> >> [  401.511878] RDX: 0000000000000002 RSI: 000055afeb072a90 RDI: 0000000000000001
>> >> [  401.511881] RBP: 000055afeb072a90 R08: 00000000ffffffff R09: 000000000000000a
>> >> [  401.511884] R10: 000055afeb058710 R11: 0000000000000246 R12: 0000000000000002
>> >> [  401.511887] R13: 00007fad893a8780 R14: 0000000000000002 R15: 00007fad893a3740
>> >>
>> >> I don't think it is the correct approach to try to call these callbacks with
>> >> rcu protection because:
>> >>
>> >> - Cls API uses sleeping locks that cannot be used in rcu read section
>> >>   (hence the included trace).
>> >>
>> >> - It assumes that all implementations of classifier ops reoffload() don't
>> >>   sleep.
>> >>
>> >> - And that all driver offload callbacks (both block and classifier
>> >>   setup) don't sleep, which is not the case.
>> >>
>> >> I don't see any straightforward way to fix this, besides using some
>> >> other locking mechanism to protect block_ing_cb_list.
>> >>
>> >> Regards,
>> >> Vlad  
>> >
>> > Maybe take the mutex flow_indr_block_ing_cb_lock for lookup, add, and delete
>> > on the callback lists? Add and delete only happen at module init, so the
>> > lookup is also infrequent (only on [un]register) and can be protected by
>> > the lock.
>> 
>> That should do the job. I'll send the patch.
>
> Hi Vlad! 
>
> While looking into this, would you mind also adding the missing
> flow_block_cb_is_busy() calls in the indirect handlers in the drivers?
>
> LMK if you're too busy, I don't want this to get forgotten :)

Hi Jakub,

Will do!
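
For reference, the locking scheme agreed on above can be sketched in
user-space C. This is only an illustration under stated assumptions:
pthread_mutex_t stands in for the kernel mutex flow_indr_block_ing_cb_lock,
and struct cb_entry, cb_add(), cb_del(), cb_call_all() and count_cb() are
hypothetical names, not the kernel API. The point it shows is that one
sleeping lock covers lookup, add and delete, so callbacks that may sleep
are never invoked from an RCU read-side section.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for one entry of the indirect block callback list. */
struct cb_entry {
	struct cb_entry *next;
	int (*cb)(void *cb_priv, int cmd);	/* allowed to sleep */
	void *cb_priv;
};

static struct cb_entry *cb_list;
/* Plays the role of flow_indr_block_ing_cb_lock in this sketch. */
static pthread_mutex_t cb_lock = PTHREAD_MUTEX_INITIALIZER;

/* Register a callback; returns 0 on success, -1 on allocation failure. */
int cb_add(int (*cb)(void *, int), void *cb_priv)
{
	struct cb_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->cb = cb;
	e->cb_priv = cb_priv;

	pthread_mutex_lock(&cb_lock);
	e->next = cb_list;
	cb_list = e;
	pthread_mutex_unlock(&cb_lock);
	return 0;
}

/* Unregister a callback; returns 0 if found and removed, -1 otherwise. */
int cb_del(int (*cb)(void *, int), void *cb_priv)
{
	struct cb_entry **pp, *e;
	int ret = -1;

	pthread_mutex_lock(&cb_lock);
	for (pp = &cb_list; (e = *pp) != NULL; pp = &e->next) {
		if (e->cb == cb && e->cb_priv == cb_priv) {
			*pp = e->next;
			free(e);
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&cb_lock);
	return ret;
}

/* Walk the list under the mutex; returns the number of callbacks invoked. */
int cb_call_all(int cmd)
{
	struct cb_entry *e;
	int n = 0;

	pthread_mutex_lock(&cb_lock);
	for (e = cb_list; e; e = e->next) {
		e->cb(e->cb_priv, cmd);
		n++;
	}
	pthread_mutex_unlock(&cb_lock);
	return n;
}

/* Example callback: counts how often it was invoked. */
int count_cb(void *cb_priv, int cmd)
{
	(void)cmd;
	++*(int *)cb_priv;
	return 0;
}
```

Because the mutex is held while the callbacks run, a callback in this
sketch must not call cb_add() or cb_del() itself.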

^ permalink raw reply

* Re: [RFC PATCH bpf-next 00/14] xdp_flow: Flow offload to XDP
From: Jakub Kicinski @ 2019-08-16 18:52 UTC (permalink / raw)
  To: Toshiaki Makita
  Cc: Stanislav Fomichev, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, David S. Miller,
	Jesper Dangaard Brouer, John Fastabend, Jamal Hadi Salim,
	Cong Wang, Jiri Pirko, netdev, bpf, William Tu
In-Reply-To: <da840b14-ab5b-91f1-df2f-6bdd0ed41173@gmail.com>

On Fri, 16 Aug 2019 10:28:10 +0900, Toshiaki Makita wrote:
> On 2019/08/16 4:22, Jakub Kicinski wrote:
> > There's a certain allure in bringing the in-kernel BPF translation
> > infrastructure forward. OTOH from system architecture perspective IMHO
> > it does seem like a task best handled in user space. bpfilter can replace
> > iptables completely, here we're looking at an acceleration relatively
> > loosely coupled with flower.  
> 
> I don't think it's loosely coupled. Emulating TC behavior in userspace
> is not so easy.
> 
> Think about recent multi-mask support in flower. Previously userspace could
> assume there is one mask and hash table for each preference in TC. After the
> change TC accepts different masks with the same pref. Such a change tends to
> break userspace emulation. It may ignore masks passed from flow insertion
> and use the mask remembered when the first flow of the pref is inserted. It
> may override the mask of all existing flows with the pref. It may fail to
> insert such flows. Any of them would result in unexpected wrong datapath
> handling which is critical.
> I think such an emulation layer needs to be updated in sync with TC.

Oh, so you're saying that if xdp_flow is merged all patches to
cls_flower and netfilter which affect flow offload will be required 
to update xdp_flow as well?

That's a question of policy. Technically the implementation in user
space is equivalent.

The advantage of user space implementation is that you can add more
to it and explore use cases which do not fit in the flow offload API,
but are trivial for BPF. Not to mention the obvious advantage of
decoupling the upgrade path.

Personally I'm not happy with the way this patch set messes with the
flow infrastructure. You should use the indirect callback
infrastructure instead, and that way you can build the whole thing
touching none of the flow offload core.
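
For readers following the multi-mask concern quoted above, here is a
minimal sketch of the failure mode, not code from either patch set. It
assumes a 32-bit flow key for brevity; struct flow and both lookup
functions are hypothetical names. The first function matches each flow
under its own mask, the second reuses the mask remembered from the first
flow of the pref, which is one of the emulation mistakes described.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow entry: a 32-bit key that is already masked with 'mask'. */
struct flow {
	uint32_t mask;
	uint32_t key;
};

/* Multi-mask behaviour: every flow is matched under its own mask. */
int lookup_multi_mask(const struct flow *flows, size_t n, uint32_t pkt)
{
	for (size_t i = 0; i < n; i++)
		if ((pkt & flows[i].mask) == flows[i].key)
			return (int)i;
	return -1;
}

/* Stale emulation: reuses the mask of the first flow for all flows. */
int lookup_first_mask_only(const struct flow *flows, size_t n, uint32_t pkt)
{
	for (size_t i = 0; i < n; i++)
		if ((pkt & flows[0].mask) == flows[i].key)
			return (int)i;
	return -1;
}
```

With one flow for 10.0.1.0/24 and a second for 10.2.0.0/16, a packet to
10.2.3.5 hits the second flow in lookup_multi_mask() but misses entirely
in lookup_first_mask_only(), because the /24 mask is applied to a /16 key.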

^ permalink raw reply

* Re: [PATCH v5 00/13] net: phy: adin: add support for Analog Devices PHYs
From: David Miller @ 2019-08-16 18:57 UTC (permalink / raw)
  To: alexandru.ardelean
  Cc: netdev, devicetree, linux-kernel, robh+dt, mark.rutland,
	f.fainelli, hkallweit1, andrew
In-Reply-To: <20190816131011.23264-1-alexandru.ardelean@analog.com>

From: Alexandru Ardelean <alexandru.ardelean@analog.com>
Date: Fri, 16 Aug 2019 16:09:58 +0300

> This changeset adds support for Analog Devices Industrial Ethernet PHYs.
> Particularly the PHYs this driver adds support for:
>  * ADIN1200 - Robust, Industrial, Low Power 10/100 Ethernet PHY
>  * ADIN1300 - Robust, Industrial, Low Latency 10/100/1000 Gigabit
>    Ethernet PHY
> 
> The 2 chips are register compatible with one another. The main
> difference being that ADIN1200 doesn't operate in gigabit mode.
> 
> The chips can be operated by the Generic PHY driver as well via the
> standard IEEE PHY registers (0x0000 - 0x000F) which are supported by the
> kernel as well. This assumes that configuration of the PHY has been done
> completely in HW, according to spec, i.e. no extra SW configuration
> required.
> 
> This changeset also implements the ability to configure the chips via SW
> registers.
> 
> Datasheets:
>   https://www.analog.com/media/en/technical-documentation/data-sheets/ADIN1300.pdf
>   https://www.analog.com/media/en/technical-documentation/data-sheets/ADIN1200.pdf
> 
> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>

Series applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next 0/3] net: phy: remove genphy_config_init
From: David Miller @ 2019-08-16 18:57 UTC (permalink / raw)
  To: hkallweit1
  Cc: andrew, f.fainelli, khilman, vivien.didelot, netdev,
	linux-amlogic
In-Reply-To: <95dfdb55-415c-c995-cba3-1902bdd46aec@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Thu, 15 Aug 2019 14:01:43 +0200

> Supported PHY features are either auto-detected or explicitly set.
> In both cases calling genphy_config_init isn't needed. All that
> genphy_config_init does is remove features that are set as
> supported but can't be auto-detected. Basically it duplicates the
> code in genphy_read_abilities. Therefore remove genphy_config_init.

Heiner, you will need to respin this series as the new adin driver
added a new call to genphy_config_init().

Thank you.

^ permalink raw reply

* Re: r8169: Performance regression and latency instability
From: Heiner Kallweit @ 2019-08-16 19:12 UTC (permalink / raw)
  To: Holger Hoffstätte, Eric Dumazet, Juliana Rodrigueiro, netdev
In-Reply-To: <792d3a56-32aa-afee-f2b4-1f867b9cf75f@applied-asynchrony.com>

On 16.08.2019 15:59, Holger Hoffstätte wrote:
> On 8/16/19 2:35 PM, Eric Dumazet wrote:
> ..snip..
>> I also see this relevant commit : I have no idea why SG would have any relation with TSO.
>>
>> commit a7eb6a4f2560d5ae64bfac98d79d11378ca2de6c
>> Author: Holger Hoffstätte <holger@applied-asynchrony.com>
>> Date:   Fri Aug 9 00:02:40 2019 +0200
>>
>>      r8169: fix performance issue on RTL8168evl
>>
>>      Disabling TSO but leaving SG active results in a significant
>>      performance drop. Therefore disable also SG on RTL8168evl.
>>      This restores the original performance.
>>
>>      Fixes: 93681cd7d94f ("r8169: enable HW csum and TSO")
>>      Signed-off-by: Holger Hoffstätte <holger@applied-asynchrony.com>
>>      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>>      Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> It does not - and admittedly none of this makes sense, but stay with me here.
> 
> The commit 93681cd7d94f to net-next enabled rx/tx HW checksumming and TSO
> by default, but disabled TSO for one specific chip revision - the most popular
> one, of course. Enabling rx/tx checksums by default while leaving SG on turned
> out to be the performance issue (~780 MBit max) that I found & fixed in the
> quoted commit. SG *can* be enabled when rx/tx checksumming is *dis*abled
> (I just verified again), we just had to sanitize the new default.
> 
> An alternative strategy could still be to (again?) disable everything by default
> and just let people manually enable whatever settings work for their random
> chip revision + BIOS combination. I'll let Heiner chime in here.
> 
> Basically these chips are dumpster fires and should not be used for anything
> ever, which of course means they are everywhere.
> 
> AFAICT none of this has anything to do with Juliana's problem..
> 
Indeed, here we're talking about changes in linux-next, and Juliana's issue is
with 4.19. However I'd appreciate if Juliana could test with linux-next and
different combinations of the NETIF_F_xxx features.

I have no immediate idea why the referenced GSO change affects r8169 but not
other chips / drivers.

> -h
> 
Heiner

^ permalink raw reply

* Re: IPv6 addr and route is gone after adding port to vrf (5.2.0+)
From: David Ahern @ 2019-08-16 19:15 UTC (permalink / raw)
  To: Ben Greear, netdev
In-Reply-To: <c55619f8-c565-d611-0261-c64fa7590274@candelatech.com>

On 8/16/19 1:13 PM, Ben Greear wrote:
> I have a problem with a VETH port when setting up a somewhat complicated
> VRF setup. I am losing the global IPv6 addr, and also the route,
> apparently
> when I add the veth device to a vrf.  From my script's output:

Either enslave the device before adding the address or enable the
retention of addresses:

sysctl -q -w net.ipv6.conf.all.keep_addr_on_down=1

^ permalink raw reply

* IPv6 addr and route is gone after adding port to vrf (5.2.0+)
From: Ben Greear @ 2019-08-16 19:13 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern

Hello,

I have a problem with a VETH port when setting up a somewhat complicated
VRF setup. I am losing the global IPv6 addr, and also the route, apparently
when I add the veth device to a vrf.  From my script's output:

### commands to set up the veth 'rddVR0'

./local/sbin/ip link set rddVR0 down
./local/sbin/ip -4 addr flush dev rddVR0
./local/sbin/ip -6 addr flush dev rddVR0
echo 1 > /proc/sys/net/ipv4/conf/rddVR0/forwarding
echo 1 > /proc/sys/net/ipv6/conf/rddVR0/forwarding
./local/sbin/ip link set rddVR0 up
./local/sbin/ip -4 addr add 10.2.127.1/24 broadcast 10.2.127.255 dev rddVR0
./local/sbin/ip -6 addr add 2001:3::1/64 scope global dev rddVR0
./local/sbin/ip -6 addr add fe80::d0f8:6fff:fe06:8ae/64 scope link dev rddVR0
RTNETLINK answers: File exists
./local/sbin/ip -6 route add 2001:3::1/64 dev rddVR0 table 10001
./local/sbin/ip -6 route add fe80::d0f8:6fff:fe06:8ae/64 dev rddVR0 table 10001
./local/sbin/ip route add 10.2.127.0/24 dev rddVR0 table 10001
echo 1 > /proc/sys/net/ipv4/conf/rddVR0/arp_filter

#printRoutes for table 10001
broadcast 10.2.1.0 dev eth1 proto kernel scope link src 10.2.1.1 linkdown
10.2.1.0/24 dev eth1 proto kernel scope link src 10.2.1.1 linkdown
local 10.2.1.1 dev eth1 proto kernel scope host src 10.2.1.1
broadcast 10.2.1.255 dev eth1 proto kernel scope link src 10.2.1.1 linkdown
broadcast 10.2.8.0 dev vap0000 proto kernel scope link src 10.2.8.1 linkdown
10.2.8.0/24 dev vap0000 proto kernel scope link src 10.2.8.1 linkdown
local 10.2.8.1 dev vap0000 proto kernel scope host src 10.2.8.1
broadcast 10.2.8.255 dev vap0000 proto kernel scope link src 10.2.8.1 linkdown
broadcast 10.2.9.0 dev vap0100 proto kernel scope link src 10.2.9.1 linkdown
10.2.9.0/24 dev vap0100 proto kernel scope link src 10.2.9.1 linkdown
local 10.2.9.1 dev vap0100 proto kernel scope host src 10.2.9.1
broadcast 10.2.9.255 dev vap0100 proto kernel scope link src 10.2.9.1 linkdown
10.2.127.0/24 dev rddVR0 scope link
2001:3::/64 dev rddVR0 metric 1024 pref medium
fe80::/64 dev rddVR0 metric 1024 pref medium

.... some other commands, route/ip is still there ....

#printRoutes for table 10001
broadcast 10.2.1.0 dev eth1 proto kernel scope link src 10.2.1.1 linkdown
10.2.1.0/24 dev eth1 proto kernel scope link src 10.2.1.1 linkdown
local 10.2.1.1 dev eth1 proto kernel scope host src 10.2.1.1
broadcast 10.2.1.255 dev eth1 proto kernel scope link src 10.2.1.1 linkdown
broadcast 10.2.8.0 dev vap0000 proto kernel scope link src 10.2.8.1 linkdown
10.2.8.0/24 dev vap0000 proto kernel scope link src 10.2.8.1 linkdown
local 10.2.8.1 dev vap0000 proto kernel scope host src 10.2.8.1
broadcast 10.2.8.255 dev vap0000 proto kernel scope link src 10.2.8.1 linkdown
broadcast 10.2.9.0 dev vap0100 proto kernel scope link src 10.2.9.1 linkdown
10.2.9.0/24 dev vap0100 proto kernel scope link src 10.2.9.1 linkdown
local 10.2.9.1 dev vap0100 proto kernel scope host src 10.2.9.1
broadcast 10.2.9.255 dev vap0100 proto kernel scope link src 10.2.9.1 linkdown
10.2.127.0/24 dev rddVR0 scope link
2001:3::/64 dev rddVR0 metric 1024 pref medium
fe80::/64 dev rddVR0 metric 1024 pref medium


./local/sbin/ip link set rddVR0 vrf vrf10001

#printRoutes for table 10001
broadcast 10.2.1.0 dev eth1 proto kernel scope link src 10.2.1.1 linkdown
10.2.1.0/24 dev eth1 proto kernel scope link src 10.2.1.1 linkdown
local 10.2.1.1 dev eth1 proto kernel scope host src 10.2.1.1
broadcast 10.2.1.255 dev eth1 proto kernel scope link src 10.2.1.1 linkdown
broadcast 10.2.8.0 dev vap0000 proto kernel scope link src 10.2.8.1 linkdown
10.2.8.0/24 dev vap0000 proto kernel scope link src 10.2.8.1 linkdown
local 10.2.8.1 dev vap0000 proto kernel scope host src 10.2.8.1
broadcast 10.2.8.255 dev vap0000 proto kernel scope link src 10.2.8.1 linkdown
broadcast 10.2.9.0 dev vap0100 proto kernel scope link src 10.2.9.1 linkdown
10.2.9.0/24 dev vap0100 proto kernel scope link src 10.2.9.1 linkdown
local 10.2.9.1 dev vap0100 proto kernel scope host src 10.2.9.1
broadcast 10.2.9.255 dev vap0100 proto kernel scope link src 10.2.9.1 linkdown
broadcast 10.2.127.0 dev rddVR0 proto kernel scope link src 10.2.127.1
10.2.127.0/24 dev rddVR0 proto kernel scope link src 10.2.127.1
local 10.2.127.1 dev rddVR0 proto kernel scope host src 10.2.127.1
broadcast 10.2.127.255 dev rddVR0 proto kernel scope link src 10.2.127.1
fe80::/64 dev rddVR0 proto kernel metric 256 pref medium
ff00::/8 dev rddVR0 metric 256 pref medium


#### Route is gone...
#### 2001:3::/64 dev rddVR0 metric 1024 pref medium


As far as I can tell, the same actions for a wifi AP interface do not hit this problem,
but not sure if that is luck or not at this point.

Any ideas what might be going on here?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* Re: [PATCH net,v5 2/2] netfilter: nf_tables: map basechain priority to hardware priority
From: Jakub Kicinski @ 2019-08-16 19:44 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, davem, netdev, marcelo.leitner, jiri, wenxu,
	saeedm, paulb, gerlitz.or
In-Reply-To: <20190816012410.31844-3-pablo@netfilter.org>

On Fri, 16 Aug 2019 03:24:10 +0200, Pablo Neira Ayuso wrote:
> This patch adds initial support for offloading basechains using the
> priority range from 1 to 65535. This is restricting the netfilter
> priority range to 16-bit integer since this is what most drivers assume
> so far from tc. It should be possible to extend this range of supported
> priorities later on once drivers are updated to support 32-bit
> integer priorities.
> 
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
> v5: fix clang warning by simplifying the mapping of hardware priorities
>     to basechain priority in the range of 1-65535. Zero is left behind
>     since some drivers do not support this, no negative basechain
>     priorities are used at this stage.

LGTM.
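
As an illustration of the range restriction described in the quoted patch,
a minimal sketch follows. It is not the code from the patch:
basechain_prio_to_hw() and the HW_PRIO_* constants are hypothetical names,
and returning -1 for an unsupported priority is an assumption made for the
example. It only shows that priorities 1..65535 map directly to a 16-bit
hardware priority, while zero and negative values are rejected.

```c
#include <stdint.h>

#define HW_PRIO_MIN 1
#define HW_PRIO_MAX 65535	/* largest 16-bit value */

/*
 * Hypothetical helper: map a basechain priority to a 16-bit hardware
 * priority. Returns 0 and stores the result on success, -1 if the
 * priority is outside 1..65535.
 */
int basechain_prio_to_hw(int32_t prio, uint16_t *hw_prio)
{
	if (prio < HW_PRIO_MIN || prio > HW_PRIO_MAX)
		return -1;
	*hw_prio = (uint16_t)prio;
	return 0;
}
```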

^ permalink raw reply

* RE: [PATCH net-next, 2/6] PCI: hv: Add a Hyper-V PCI mini driver for software backchannel interface
From: Haiyang Zhang @ 2019-08-16 19:50 UTC (permalink / raw)
  To: vkuznets
  Cc: KY Srinivasan, Stephen Hemminger, linux-kernel@vger.kernel.org,
	sashal@kernel.org, davem@davemloft.net, saeedm@mellanox.com,
	leon@kernel.org, eranbe@mellanox.com, lorenzo.pieralisi@arm.com,
	bhelgaas@google.com, linux-pci@vger.kernel.org,
	linux-hyperv@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <871rxl84ry.fsf@vitty.brq.redhat.com>



> -----Original Message-----
> From: Vitaly Kuznetsov <vkuznets@redhat.com>
> Sent: Friday, August 16, 2019 12:16 PM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: KY Srinivasan <kys@microsoft.com>; Stephen Hemminger
> <sthemmin@microsoft.com>; linux-kernel@vger.kernel.org;
> sashal@kernel.org; davem@davemloft.net; saeedm@mellanox.com;
> leon@kernel.org; eranbe@mellanox.com; lorenzo.pieralisi@arm.com;
> bhelgaas@google.com; linux-pci@vger.kernel.org; linux-
> hyperv@vger.kernel.org; netdev@vger.kernel.org
> Subject: RE: [PATCH net-next, 2/6] PCI: hv: Add a Hyper-V PCI mini driver for
> software backchannel interface
> 
> Haiyang Zhang <haiyangz@microsoft.com> writes:
> 
> >
> > The pci_hyperv driver can only be loaded on VMs on Hyper-V and Azure. Other
> > drivers like MLX5e will have a symbolic dependency on pci_hyperv if they
> > use functions exported by pci_hyperv. This dependency will cause other
> > drivers to fail to load on other platforms, like VMs on KVM. So we
> > created this mini driver, which can be loaded on any platform to
> > provide the symbolic dependency.
> 
> (/me wondering is there a nicer way around this, by using __weak or
> something like that...)
> 
> In case this stub is the best solution I'd suggest to rename it to something like
> PCI_HYPERV_INTERFACE to make it clear it is not a separate driver (_MINI
> makes me think so).

Thanks! I will consider those options.
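
For context on the __weak idea raised in the quoted text, here is a
minimal user-space sketch of a weak default definition, assuming GCC or
Clang. hv_backchannel_read() and read_config_block() are hypothetical
names, not functions from this series. The kernel's __weak macro expands
to the same attribute, but it is resolved at link time; whether that helps
with symbol dependencies between separately loaded modules is exactly the
open question above, and this sketch does not answer it.

```c
#include <stddef.h>

/*
 * Hypothetical default stub. A non-weak definition of the same symbol in
 * another object file would replace it at link time.
 */
__attribute__((weak)) int hv_backchannel_read(void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return -1;	/* backchannel not available */
}

/* Caller links against the symbol without caring which definition wins. */
int read_config_block(void *buf, size_t len)
{
	return hv_backchannel_read(buf, len);
}
```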

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Alexei Starovoitov @ 2019-08-16 19:52 UTC (permalink / raw)
  To: Jordan Glover
  Cc: Thomas Gleixner, Andy Lutomirski, Daniel Colascione, Song Liu,
	Kees Cook, Networking, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Lorenz Bauer, Jann Horn, Greg KH, Linux API,
	LSM List
In-Reply-To: <lGGTLXBsX3V6p1Z4TkdzAjxbNywaPS2HwX5WLleAkmXNcnKjTPpWnP6DnceSsy8NKt5NBRBbuoAb0woKTcDhJXVoFb7Ygk3Skfj8j6rVfMQ=@protonmail.ch>

On Fri, Aug 16, 2019 at 11:33:57AM +0000, Jordan Glover wrote:
> On Friday, August 16, 2019 9:59 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> > On Fri, 16 Aug 2019, Jordan Glover wrote:
> >
> > > "systemd --user" service? Trying to do so will fail with:
> > > "Failed to apply ambient capabilities (before UID change): Operation not permitted"
> > > I think it's crucial to clear that point to avoid confusion in this discussion
> > > where people are talking about different things.
> > > On the other hand running "systemd --system" service with:
> > > User=nobody
> > > AmbientCapabilities=CAP_NET_ADMIN
> > > is perfectly legit and clears some security concerns as only privileged user
> > > can start such service.
> >
> > While we are at it, can we please stop looking at this from a systemd only
> > perspective. There is a world outside of systemd.
> >
> > Thanks,
> >
> > tglx
> 
> If you define:
> 
> "systemd --user" == unprivileged process started by unprivileged user
> "systemd --system" == process started by privileged user but run as another
> user which keeps some of parent user privileges and drops others
> 
> you can get rid of "systemd" from the equation.
> 
> "systemd --user" was the example provided by Alexei when asked about the usecase
> but his description didn't match what it does so it's not obvious what the real
> usecase is. I'm sure there can be many more examples and systemd isn't important
> here in particular, besides understanding this specific example.

It's both of the above when 'systemd' is not taken literally.
To Thomas's earlier point: the use case is not only about systemd.
There are other container management systems.
I've used 'systemd-like' terminology as an attempt to explain that such
daemons are trusted signed binaries that can be run as pid=1.
Sometimes it's the latter:
"process started by privileged user but run as another user which keeps
some of parent user privileges and drops others".
Sometimes capability delegation to another container management daemon
is too cumbersome, so it's easier to use suid bit on that other daemon.
So it will become like the former:
"sort-of unprivileged process started by unprivileged user."
where daemon has suid and drops most of the capabilities as it starts.
Let's not focus on the model being good or bad security wise.
The point is that those are the use cases that folks are thinking about.
That secondary daemon can be full root just fine.
All outer and inner daemons can be root.
These daemons need to drop privileges to make the system safer ==
less prone to corruption due to bugs in themselves. Not necessarily security bugs.

^ permalink raw reply

* [PATCH net-next 0/2] net: phy: realtek: support NBase-T MMD EEE registers on RTL8125
From: Heiner Kallweit @ 2019-08-16 19:55 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org

Add missing EEE-related constants, including the new MMD EEE registers
for NBase-T / 802.3bz. Based on that emulate the new 802.3bz MMD EEE
registers for 2.5Gbps EEE on RTL8125.

Heiner Kallweit (2):
  net: phy: add EEE-related constants
  net: phy: realtek: support NBase-T MMD EEE registers on RTL8125

 drivers/net/phy/realtek.c | 45 +++++++++++++++++++++++++++++++++++++--
 include/uapi/linux/mdio.h | 10 +++++++++
 2 files changed, 53 insertions(+), 2 deletions(-)

-- 
2.22.1


^ permalink raw reply
