Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH bpf-next 3/3] bpftool: support loading flow dissector
From: Stanislav Fomichev @ 2018-11-07 23:34 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Stanislav Fomichev, netdev, linux-kselftest, ast, daniel, shuah,
	quentin.monnet, guro, jiong.wang, bhole_prashant_q7,
	john.fastabend, jbenc, treeze.taeung, yhs, osk, sandipan
In-Reply-To: <20181107152917.729ea83c@cakuba.netronome.com>

On 11/07, Jakub Kicinski wrote:
> On Wed, 7 Nov 2018 15:13:33 -0800, Stanislav Fomichev wrote:
> > On 11/07, Jakub Kicinski wrote:
> > > On Wed,  7 Nov 2018 14:43:56 -0800, Stanislav Fomichev wrote:  
> > > > bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
> > > >         key 0 0 0 0 \
> > > >         value pinned /sys/fs/bpf/flow/IP/0  
> > > 
> > > Where is that /0 coming from ?  Is that in source code?  I don't see
> > > libbpf adding it, maybe I'm missing something.  
> > libbpf adds that, that's a program instance:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/bpf/libbpf.c#n1744
> 
> Ugh, I was looking at bpf_object__pin() which uses names :(
> 
> We never use this multi-instance thing, and I don't think bpftool ever
> will, so IMHO it'd be good if we just re-did the pinning loop in
> bpftool.
I wonder whether I should just add special case to bpf_program__pin: don't
create a subdir when instances.nr == 1 (and just create a file pin for
single instance)? In that case I can continue to use libbpf and don't reinvent
the wheel. Any objections?

^ permalink raw reply

* Re: [PATCH bpf-next 2/2] bpftool: support loading flow dissector
From: Jakub Kicinski @ 2018-11-07 23:30 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Quentin Monnet, Stanislav Fomichev, netdev, linux-kselftest, ast,
	daniel, shuah, guro, jiong.wang, bhole_prashant_q7,
	john.fastabend, jbenc, treeze.taeung, yhs, osk, sandipan
In-Reply-To: <20181107231207.v425etmqm4v6rito@mini-arch>

On Wed, 7 Nov 2018 15:12:07 -0800, Stanislav Fomichev wrote:
> > > I agree, constructing the jmp_table is a bit fragile with all the
> > > dependencies on the order of the progs. I'll drop that and will send a
> > > v2 that pins all the programs from the obj file instead and offloads
> > > jmp_table construction to the user. So the supposed use case would be
> > > something like the following:
> > > 
> > > bpftool prog load bpf_flow.o /sys/fs/bpf/flow type flow_dissector  
> > 
> > Okay.  One more thing - how do we differentiate between mass pin and the
> > existing pin first behaviour?  Should we perhaps add a loadall command
> > or some flag?  
> In v2 I did by program type:
> * flow_dissector -> pin all
> * not flow_dissector -> pin first?
> 
> But we can have loadall or something like:
> load OBJ [pinfirst|pinall] FILE|DIR [type TYPE]
> 
> If we want to add user control, I'd go with loadall command,
> adding more optional flags in between is a mess..

I think user control would be good.  Agreed on loadall being better.

^ permalink raw reply

* Re: [PATCH bpf-next 3/3] bpftool: support loading flow dissector
From: Jakub Kicinski @ 2018-11-07 23:29 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Stanislav Fomichev, netdev, linux-kselftest, ast, daniel, shuah,
	quentin.monnet, guro, jiong.wang, bhole_prashant_q7,
	john.fastabend, jbenc, treeze.taeung, yhs, osk, sandipan
In-Reply-To: <20181107231333.n34ffy4dl6rtbs7e@mini-arch>

On Wed, 7 Nov 2018 15:13:33 -0800, Stanislav Fomichev wrote:
> On 11/07, Jakub Kicinski wrote:
> > On Wed,  7 Nov 2018 14:43:56 -0800, Stanislav Fomichev wrote:  
> > > bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
> > >         key 0 0 0 0 \
> > >         value pinned /sys/fs/bpf/flow/IP/0  
> > 
> > Where is that /0 coming from ?  Is that in source code?  I don't see
> > libbpf adding it, maybe I'm missing something.  
> libbpf adds that, that's a program instance:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/bpf/libbpf.c#n1744

Ugh, I was looking at bpf_object__pin() which uses names :(

We never use this multi-instance thing, and I don't think bpftool ever
will, so IMHO it'd be good if we just re-did the pinning loop in
bpftool.

^ permalink raw reply

* Re: [PATCH bpf-next 2/3] libbpf: cleanup after partial failure in bpf_object__pin
From: Stanislav Fomichev @ 2018-11-07 23:25 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Stanislav Fomichev, netdev, linux-kselftest, ast, daniel, shuah,
	quentin.monnet, guro, jiong.wang, bhole_prashant_q7,
	john.fastabend, jbenc, treeze.taeung, yhs, osk, sandipan
In-Reply-To: <20181107150605.59a8ac1c@cakuba.netronome.com>

On 11/07, Jakub Kicinski wrote:
> On Wed, 7 Nov 2018 15:00:21 -0800, Stanislav Fomichev wrote:
> > > > +err_unpin_programs:
> > > > +	bpf_object__for_each_program(prog, obj) {
> > > > +		char buf[PATH_MAX];
> > > > +		int len;
> > > > +
> > > > +		len = snprintf(buf, PATH_MAX, "%s/%s", path,
> > > > +			       prog->section_name);
> > > > +		if (len < 0)
> > > > +			continue;
> > > > +		else if (len >= PATH_MAX)
> > > > +			continue;
> > > > +
> > > > +		unlink(buf);  
> > > 
> > > I think that's no bueno, if pin failed because the file already exists
> > > you'll now remove that already existing file.  
> >
> > How about we check beforehand and bail early if we are going to
> > overwrite something?
> 
> Possible, although the most common way to handle situation like this in
> the kernel is to "continue the iteration in reverse" over the list.
> I.e. walk the list back.  I think the objects are on a double linked
> list.  You may need to add the appropriate foreach macro and helper..
That sounds more complicated than just ensuring that the top directory
for the pins doesn't exist and then rm -rf it on failure.
I'm thinking about copy-pasting rm_rf from perf
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/util.c#n119).
Thoughts?

Btw, current patch won't work because of those /0 added by bpf_program__pin.

^ permalink raw reply

* [PATCH iproute2] bridge: fdb: remove redundant dev string in show output
From: Roopa Prabhu @ 2018-11-07 23:14 UTC (permalink / raw)
  To: stephen; +Cc: netdev, phil, nikolay, dsahern

From: Roopa Prabhu <roopa@cumulusnetworks.com>

After commit 4abb8c723a64 ("bridge: fdb: Fix for missing
keywords in non-JSON output"), I am seeing a double print for dev
in bridge fdb show. eg:
"44:38:39:00:6a:82 dev dev bridge vlan 1 master bridge permanent"

this patch removes the redundant print.

Fixes: 4abb8c723a64 ("bridge: fdb: Fix for missing keywords in non-JSON output")
CC: Phil Sutter <phil@nwl.cc>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 bridge/fdb.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/bridge/fdb.c b/bridge/fdb.c
index f82938f..a5abc1b 100644
--- a/bridge/fdb.c
+++ b/bridge/fdb.c
@@ -181,13 +181,10 @@ int print_fdb(struct nlmsghdr *n, void *arg)
 				   "mac", "%s ", lladdr);
 	}
 
-	if (!filter_index && r->ndm_ifindex) {
-		if (!is_json_context())
-			fprintf(fp, "dev ");
+	if (!filter_index && r->ndm_ifindex)
 		print_color_string(PRINT_ANY, COLOR_IFNAME,
 				   "ifname", "dev %s ",
 				   ll_index_to_name(r->ndm_ifindex));
-	}
 
 	if (tb[NDA_DST]) {
 		int family = AF_INET;
-- 
2.1.4

^ permalink raw reply related

* Re: [PATCH bpf-next 3/3] bpftool: support loading flow dissector
From: Stanislav Fomichev @ 2018-11-07 23:13 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Stanislav Fomichev, netdev, linux-kselftest, ast, daniel, shuah,
	quentin.monnet, guro, jiong.wang, bhole_prashant_q7,
	john.fastabend, jbenc, treeze.taeung, yhs, osk, sandipan
In-Reply-To: <20181107150944.5d5480ba@cakuba.netronome.com>

On 11/07, Jakub Kicinski wrote:
> On Wed,  7 Nov 2018 14:43:56 -0800, Stanislav Fomichev wrote:
> > bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
> >         key 0 0 0 0 \
> >         value pinned /sys/fs/bpf/flow/IP/0
> 
> Where is that /0 coming from ?  Is that in source code?  I don't see
> libbpf adding it, maybe I'm missing something.
libbpf adds that, that's a program instance:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/bpf/libbpf.c#n1744

^ permalink raw reply

* Re: [PATCH bpf-next 2/2] bpftool: support loading flow dissector
From: Stanislav Fomichev @ 2018-11-07 23:12 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Quentin Monnet, Stanislav Fomichev, netdev, linux-kselftest, ast,
	daniel, shuah, guro, jiong.wang, bhole_prashant_q7,
	john.fastabend, jbenc, treeze.taeung, yhs, osk, sandipan
In-Reply-To: <20181107145122.170daba2@cakuba.netronome.com>

On 11/07, Jakub Kicinski wrote:
> On Wed, 7 Nov 2018 14:15:06 -0800, Stanislav Fomichev wrote:
> > On 11/07, Quentin Monnet wrote:
> > > 2018-11-07 12:32 UTC-0800 ~ Jakub Kicinski <jakub.kicinski@netronome.com>  
> > > > On Wed, 7 Nov 2018 20:08:53 +0000, Quentin Monnet wrote:  
> > > > > > +		err = bpf_obj_pin(bpf_program__fd(prog), pinfile);
> > > > > > +		if (err) {
> > > > > > +			p_err("failed to pin program %s",
> > > > > > +			      bpf_program__title(prog, false));
> > > > > > +			goto err_close_obj;
> > > > > > +		}  
> > > > > 
> > > > > I don't have the same opinion as Jakub for pinning :). I was hoping we
> > > > > could also load additional programs (for tail calls) for
> > > > > non-flow_dissector programs. Could this be an occasion to update the
> > > > > code in that direction?  
> > > > 
> > > > Do you mean having the bpftool construct an array for tail calling
> > > > automatically when loading an object?  Or do a "mass pin" of all
> > > > programs in an object file?
> > > > 
> > > > I'm not convinced about this strategy of auto assembling a tail call
> > > > array by assuming that a flow dissector object carries programs for
> > > > protocols in order (apart from the main program which doesn't have to
> > > > be first, for some reason).  
> > > 
> > > Not constructing the prog array, I don't think this should be the role of
> > > bpftool either. Much more a "mass pin", so that you have a link to each
> > > program loaded from the object file and can later add them to a prog array
> > > map with subsequent calls to bpftool.  
> 
> Makes sense, specific files named after index or program name/title?
> Program name may be nicer?
> 
> > I agree, constructing the jmp_table is a bit fragile with all the
> > dependencies on the order of the progs. I'll drop that and will send a
> > v2 that pins all the programs from the obj file instead and offloads
> > jmp_table construction to the user. So the supposed use case would be
> > something like the following:
> > 
> > bpftool prog load bpf_flow.o /sys/fs/bpf/flow type flow_dissector
> 
> Okay.  One more thing - how do we differentiate between mass pin and the
> existing pin first behaviour?  Should we perhaps add a loadall command
> or some flag?
In v2 I did by program type:
* flow_dissector -> pin all
* not flow_dissector -> pin first?

But we can have loadall or something like:
load OBJ [pinfirst|pinall] FILE|DIR [type TYPE]

If we want to add user control, I'd go with loadall command,
adding more optional flags in between is a mess..

> > bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
> > 	key ... value pinned /sys/fs/bpf/flow/<PROTO>/0
> 
> Interesting, why $dir/$title/0 ?  Perhaps $dir/$title is sufficient?
That's how bpf_program__pin does the pinning, for each program instance.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/bpf/libbpf.c#n1744

> 
> > bpftool map update ...
> > bpftool prog attach pinned /sys/fs/bpf/flow/flow_dissector/0 flow_dissector
> 

^ permalink raw reply

* Re: [PATCH bpf-next 3/3] bpftool: support loading flow dissector
From: Jakub Kicinski @ 2018-11-07 23:09 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: netdev, linux-kselftest, ast, daniel, shuah, quentin.monnet, guro,
	jiong.wang, bhole_prashant_q7, john.fastabend, jbenc,
	treeze.taeung, yhs, osk, sandipan
In-Reply-To: <20181107224356.73080-4-sdf@google.com>

On Wed,  7 Nov 2018 14:43:56 -0800, Stanislav Fomichev wrote:
> bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
>         key 0 0 0 0 \
>         value pinned /sys/fs/bpf/flow/IP/0

Where is that /0 coming from ?  Is that in source code?  I don't see
libbpf adding it, maybe I'm missing something.

^ permalink raw reply

* Re: [PATCH net-next] sock: Reset dst when changing sk_mark via setsockopt
From: David Barmann @ 2018-11-07 23:07 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet
In-Reply-To: <4a6b111a-c0d2-e12c-bc42-3c723a0297bc@gmail.com>

> On 11/07/2018 02:33 PM, David Barmann wrote:

> > When setting the SO_MARK socket option, the dst needs to be reset so
> > that a new route lookup is performed.
> > 
> > This fixes the case where an application wants to change routing by
> > setting a new sk_mark.  If this is done after some packets have already
> > been sent, the dst is cached and has no effect.
> > 

> > +		} else {
> > +			struct dst_entry *dst = sk_dst_get(sk);
> >  			sk->sk_mark = val;
> > +			sk_dst_reset(sk);
> 
> 
> sk_dst_get() and dst_release() seems extra overhead ?
> 
> net/ipv4/ip_sockglue.c do_ip_setsockopt() has a similar handling for IP_TOS,
> and it only calls sk_dst_reset() (If the new TOS is different than the current one)
> 
> 
> 
> So I would suggest :
> 
> 	if (!ns_capable(...)) {
>         	ret = -EPERM;
> 	} else if (val != sk->sk_mark) {
> 		sk->sk_mark = val;
> 		sk_dst_reset(sk);
> 	}


Thanks Eric, I'll redo that and resubmit.

-David

^ permalink raw reply

* Re: [PATCH bpf-next 2/3] libbpf: cleanup after partial failure in bpf_object__pin
From: Jakub Kicinski @ 2018-11-07 23:06 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Stanislav Fomichev, netdev, linux-kselftest, ast, daniel, shuah,
	quentin.monnet, guro, jiong.wang, bhole_prashant_q7,
	john.fastabend, jbenc, treeze.taeung, yhs, osk, sandipan
In-Reply-To: <20181107230021.4yr2h6fpzhvarfcj@mini-arch>

On Wed, 7 Nov 2018 15:00:21 -0800, Stanislav Fomichev wrote:
> > > +err_unpin_programs:
> > > +	bpf_object__for_each_program(prog, obj) {
> > > +		char buf[PATH_MAX];
> > > +		int len;
> > > +
> > > +		len = snprintf(buf, PATH_MAX, "%s/%s", path,
> > > +			       prog->section_name);
> > > +		if (len < 0)
> > > +			continue;
> > > +		else if (len >= PATH_MAX)
> > > +			continue;
> > > +
> > > +		unlink(buf);  
> > 
> > I think that's no bueno, if pin failed because the file already exists
> > you'll now remove that already existing file.  
>
> How about we check beforehand and bail early if we are going to
> overwrite something?

Possible, although the most common way to handle situation like this in
the kernel is to "continue the iteration in reverse" over the list.
I.e. walk the list back.  I think the objects are on a double linked
list.  You may need to add the appropriate foreach macro and helper..

^ permalink raw reply

* Re: [PATCH bpf-next 2/3] libbpf: cleanup after partial failure in bpf_object__pin
From: Stanislav Fomichev @ 2018-11-07 23:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Stanislav Fomichev, netdev, linux-kselftest, ast, daniel, shuah,
	quentin.monnet, guro, jiong.wang, bhole_prashant_q7,
	john.fastabend, jbenc, treeze.taeung, yhs, osk, sandipan
In-Reply-To: <20181107145648.66f47037@cakuba.netronome.com>

On 11/07, Jakub Kicinski wrote:
> On Wed,  7 Nov 2018 14:43:55 -0800, Stanislav Fomichev wrote:
> > bpftool will use bpf_object__pin in the next commit to pin all programs
> > and maps from the file; in case of a partial failure, we need to get
> > back to the clean state (undo previous program/map pins).
> > 
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  tools/lib/bpf/libbpf.c | 58 ++++++++++++++++++++++++++++++++++--------
> >  1 file changed, 48 insertions(+), 10 deletions(-)
> > 
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index d6e62e90e8d4..309abe7196f3 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -1803,14 +1803,17 @@ int bpf_object__pin(struct bpf_object *obj, const char *path)
> >  
> >  		len = snprintf(buf, PATH_MAX, "%s/%s", path,
> >  			       bpf_map__name(map));
> > -		if (len < 0)
> > -			return -EINVAL;
> > -		else if (len >= PATH_MAX)
> > -			return -ENAMETOOLONG;
> > +		if (len < 0) {
> > +			err = -EINVAL;
> > +			goto err_unpin_maps;
> > +		} else if (len >= PATH_MAX) {
> > +			err = -ENAMETOOLONG;
> > +			goto err_unpin_maps;
> > +		}
> >  
> >  		err = bpf_map__pin(map, buf);
> >  		if (err)
> > -			return err;
> > +			goto err_unpin_maps;
> >  	}
> >  
> >  	bpf_object__for_each_program(prog, obj) {
> > @@ -1819,17 +1822,52 @@ int bpf_object__pin(struct bpf_object *obj, const char *path)
> >  
> >  		len = snprintf(buf, PATH_MAX, "%s/%s", path,
> >  			       prog->section_name);
> > -		if (len < 0)
> > -			return -EINVAL;
> > -		else if (len >= PATH_MAX)
> > -			return -ENAMETOOLONG;
> > +		if (len < 0) {
> > +			err = -EINVAL;
> > +			goto err_unpin_programs;
> > +		} else if (len >= PATH_MAX) {
> > +			err = -ENAMETOOLONG;
> > +			goto err_unpin_programs;
> > +		}
> >  
> >  		err = bpf_program__pin(prog, buf);
> >  		if (err)
> > -			return err;
> > +			goto err_unpin_programs;
> >  	}
> >  
> >  	return 0;
> > +
> > +err_unpin_programs:
> > +	bpf_object__for_each_program(prog, obj) {
> > +		char buf[PATH_MAX];
> > +		int len;
> > +
> > +		len = snprintf(buf, PATH_MAX, "%s/%s", path,
> > +			       prog->section_name);
> > +		if (len < 0)
> > +			continue;
> > +		else if (len >= PATH_MAX)
> > +			continue;
> > +
> > +		unlink(buf);
> 
> I think that's no bueno, if pin failed because the file already exists
> you'll now remove that already existing file.
How about we check beforehand and bail early if we are going to
overwrite something?

> > +	}
> > +
> > +err_unpin_maps:
> > +	bpf_map__for_each(map, obj) {
> > +		char buf[PATH_MAX];
> > +		int len;
> > +
> > +		len = snprintf(buf, PATH_MAX, "%s/%s", path,
> > +			       bpf_map__name(map));
> > +		if (len < 0)
> > +			continue;
> > +		else if (len >= PATH_MAX)
> > +			continue;
> > +
> > +		unlink(buf);
> > +	}
> > +
> > +	return err;
> >  }
> >  
> >  void bpf_object__close(struct bpf_object *obj)
> 

^ permalink raw reply

* Re: [PATCH net-next] sock: Reset dst when changing sk_mark via setsockopt
From: Eric Dumazet @ 2018-11-07 22:58 UTC (permalink / raw)
  To: David Barmann, netdev
In-Reply-To: <20181107223327.GA21018@konacove.com>



On 11/07/2018 02:33 PM, David Barmann wrote:
> When setting the SO_MARK socket option, the dst needs to be reset so
> that a new route lookup is performed.
> 
> This fixes the case where an application wants to change routing by
> setting a new sk_mark.  If this is done after some packets have already
> been sent, the dst is cached and has no effect.
> 
> Signed-off-by: David Barmann <david.barmann@stackpath.com>
> ---
>  net/core/sock.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 6fcc4bc07d19..187badac24a3 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -950,10 +950,14 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
>  			clear_bit(SOCK_PASSSEC, &sock->flags);
>  		break;
>  	case SO_MARK:
> -		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
> +		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) {
>  			ret = -EPERM;
> -		else
> +		} else {
> +			struct dst_entry *dst = sk_dst_get(sk);
>  			sk->sk_mark = val;
> +			sk_dst_reset(sk);


sk_dst_get() and dst_release() seems extra overhead ?

net/ipv4/ip_sockglue.c do_ip_setsockopt() has a similar handling for IP_TOS,
and it only calls sk_dst_reset() (If the new TOS is different than the current one)



So I would suggest :

	if (!ns_capable(...)) {
        	ret = -EPERM;
	} else if (val != sk->sk_mark) {
		sk->sk_mark = val;
		sk_dst_reset(sk);
	}

^ permalink raw reply

* [PATCH] staging: net: ipv4: tcp_vegas: fixed checks and warnings
From: Suraj Singh @ 2018-11-08  8:29 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji; +Cc: netdev, linux-kernel, edumazet, Suraj Singh

Fixed checks and warnings in TCP Vegas

Signed-off-by: Suraj Singh <suraj1998@gmail.com>
---
 net/ipv4/tcp_vegas.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/tcp_vegas.c b/net/ipv4/tcp_vegas.c
index ee113ff..0f65cbe 100644
--- a/net/ipv4/tcp_vegas.c
+++ b/net/ipv4/tcp_vegas.c
@@ -139,8 +139,7 @@ void tcp_vegas_state(struct sock *sk, u8 ca_state)
 }
 EXPORT_SYMBOL_GPL(tcp_vegas_state);
 
-/*
- * If the connection is idle and we are restarting,
+/* If the connection is idle and we are restarting,
  * then we don't want to do any Vegas calculations
  * until we get fresh RTT samples.  So when we
  * restart, we reset our Vegas state to a clean
@@ -223,7 +222,7 @@ static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 acked)
 			 * and the window we would like to have. This quantity
 			 * is the "Diff" from the Arizona Vegas papers.
 			 */
-			diff = tp->snd_cwnd * (rtt-vegas->baseRTT) / vegas->baseRTT;
+			diff = tp->snd_cwnd * (rtt - vegas->baseRTT) / vegas->baseRTT;
 
 			if (diff > gamma && tcp_in_slow_start(tp)) {
 				/* Going too fast. Time to slow down
@@ -237,7 +236,7 @@ static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 acked)
 				 * truncation robs us of full link
 				 * utilization.
 				 */
-				tp->snd_cwnd = min(tp->snd_cwnd, (u32)target_cwnd+1);
+				tp->snd_cwnd = min(tp->snd_cwnd, (u32)target_cwnd + 1);
 				tp->snd_ssthresh = tcp_vegas_ssthresh(tp);
 
 			} else if (tcp_in_slow_start(tp)) {
@@ -254,8 +253,7 @@ static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 acked)
 					 * we slow down.
 					 */
 					tp->snd_cwnd--;
-					tp->snd_ssthresh
-						= tcp_vegas_ssthresh(tp);
+					tp->snd_ssthresh = tcp_vegas_ssthresh(tp);
 				} else if (diff < alpha) {
 					/* We don't have enough extra packets
 					 * in the network, so speed up.
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH bpf-next 2/3] libbpf: cleanup after partial failure in bpf_object__pin
From: Jakub Kicinski @ 2018-11-07 22:56 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: netdev, linux-kselftest, ast, daniel, shuah, quentin.monnet, guro,
	jiong.wang, bhole_prashant_q7, john.fastabend, jbenc,
	treeze.taeung, yhs, osk, sandipan
In-Reply-To: <20181107224356.73080-3-sdf@google.com>

On Wed,  7 Nov 2018 14:43:55 -0800, Stanislav Fomichev wrote:
> bpftool will use bpf_object__pin in the next commit to pin all programs
> and maps from the file; in case of a partial failure, we need to get
> back to the clean state (undo previous program/map pins).
> 
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  tools/lib/bpf/libbpf.c | 58 ++++++++++++++++++++++++++++++++++--------
>  1 file changed, 48 insertions(+), 10 deletions(-)
> 
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index d6e62e90e8d4..309abe7196f3 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -1803,14 +1803,17 @@ int bpf_object__pin(struct bpf_object *obj, const char *path)
>  
>  		len = snprintf(buf, PATH_MAX, "%s/%s", path,
>  			       bpf_map__name(map));
> -		if (len < 0)
> -			return -EINVAL;
> -		else if (len >= PATH_MAX)
> -			return -ENAMETOOLONG;
> +		if (len < 0) {
> +			err = -EINVAL;
> +			goto err_unpin_maps;
> +		} else if (len >= PATH_MAX) {
> +			err = -ENAMETOOLONG;
> +			goto err_unpin_maps;
> +		}
>  
>  		err = bpf_map__pin(map, buf);
>  		if (err)
> -			return err;
> +			goto err_unpin_maps;
>  	}
>  
>  	bpf_object__for_each_program(prog, obj) {
> @@ -1819,17 +1822,52 @@ int bpf_object__pin(struct bpf_object *obj, const char *path)
>  
>  		len = snprintf(buf, PATH_MAX, "%s/%s", path,
>  			       prog->section_name);
> -		if (len < 0)
> -			return -EINVAL;
> -		else if (len >= PATH_MAX)
> -			return -ENAMETOOLONG;
> +		if (len < 0) {
> +			err = -EINVAL;
> +			goto err_unpin_programs;
> +		} else if (len >= PATH_MAX) {
> +			err = -ENAMETOOLONG;
> +			goto err_unpin_programs;
> +		}
>  
>  		err = bpf_program__pin(prog, buf);
>  		if (err)
> -			return err;
> +			goto err_unpin_programs;
>  	}
>  
>  	return 0;
> +
> +err_unpin_programs:
> +	bpf_object__for_each_program(prog, obj) {
> +		char buf[PATH_MAX];
> +		int len;
> +
> +		len = snprintf(buf, PATH_MAX, "%s/%s", path,
> +			       prog->section_name);
> +		if (len < 0)
> +			continue;
> +		else if (len >= PATH_MAX)
> +			continue;
> +
> +		unlink(buf);

I think that's no bueno, if pin failed because the file already exists
you'll now remove that already existing file.

> +	}
> +
> +err_unpin_maps:
> +	bpf_map__for_each(map, obj) {
> +		char buf[PATH_MAX];
> +		int len;
> +
> +		len = snprintf(buf, PATH_MAX, "%s/%s", path,
> +			       bpf_map__name(map));
> +		if (len < 0)
> +			continue;
> +		else if (len >= PATH_MAX)
> +			continue;
> +
> +		unlink(buf);
> +	}
> +
> +	return err;
>  }
>  
>  void bpf_object__close(struct bpf_object *obj)

^ permalink raw reply

* Re: [PATCH bpf-next 2/2] bpftool: support loading flow dissector
From: Jakub Kicinski @ 2018-11-07 22:51 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Quentin Monnet, Stanislav Fomichev, netdev, linux-kselftest, ast,
	daniel, shuah, guro, jiong.wang, bhole_prashant_q7,
	john.fastabend, jbenc, treeze.taeung, yhs, osk, sandipan
In-Reply-To: <20181107221506.ha4c7wr4p2zswgjd@mini-arch>

On Wed, 7 Nov 2018 14:15:06 -0800, Stanislav Fomichev wrote:
> On 11/07, Quentin Monnet wrote:
> > 2018-11-07 12:32 UTC-0800 ~ Jakub Kicinski <jakub.kicinski@netronome.com>  
> > > On Wed, 7 Nov 2018 20:08:53 +0000, Quentin Monnet wrote:  
> > > > > +		err = bpf_obj_pin(bpf_program__fd(prog), pinfile);
> > > > > +		if (err) {
> > > > > +			p_err("failed to pin program %s",
> > > > > +			      bpf_program__title(prog, false));
> > > > > +			goto err_close_obj;
> > > > > +		}  
> > > > 
> > > > I don't have the same opinion as Jakub for pinning :). I was hoping we
> > > > could also load additional programs (for tail calls) for
> > > > non-flow_dissector programs. Could this be an occasion to update the
> > > > code in that direction?  
> > > 
> > > Do you mean having the bpftool construct an array for tail calling
> > > automatically when loading an object?  Or do a "mass pin" of all
> > > programs in an object file?
> > > 
> > > I'm not convinced about this strategy of auto assembling a tail call
> > > array by assuming that a flow dissector object carries programs for
> > > protocols in order (apart from the main program which doesn't have to
> > > be first, for some reason).  
> > 
> > Not constructing the prog array, I don't think this should be the role of
> > bpftool either. Much more a "mass pin", so that you have a link to each
> > program loaded from the object file and can later add them to a prog array
> > map with subsequent calls to bpftool.  

Makes sense, specific files named after index or program name/title?
Program name may be nicer?

> I agree, constructing the jmp_table is a bit fragile with all the
> dependencies on the order of the progs. I'll drop that and will send a
> v2 that pins all the programs from the obj file instead and offloads
> jmp_table construction to the user. So the supposed use case would be
> something like the following:
> 
> bpftool prog load bpf_flow.o /sys/fs/bpf/flow type flow_dissector

Okay.  One more thing - how do we differentiate between mass pin and the
existing pin first behaviour?  Should we perhaps add a loadall command
or some flag?

> bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
> 	key ... value pinned /sys/fs/bpf/flow/<PROTO>/0

Interesting, why $dir/$title/0 ?  Perhaps $dir/$title is sufficient?

> bpftool map update ...
> bpftool prog attach pinned /sys/fs/bpf/flow/flow_dissector/0 flow_dissector

^ permalink raw reply

* [net-next 11/12] e1000e: allow non-monotonic SYSTIM readings
From: Jeff Kirsher @ 2018-11-07 22:48 UTC (permalink / raw)
  To: davem
  Cc: Miroslav Lichvar, netdev, nhorman, sassmann, Richard Cochran,
	Jeff Kirsher
In-Reply-To: <20181107224830.9737-1-jeffrey.t.kirsher@intel.com>

From: Miroslav Lichvar <mlichvar@redhat.com>

It seems with some NICs supported by the e1000e driver a SYSTIM reading
may occasionally be few microseconds before the previous reading and if
enabled also pass e1000e_sanitize_systim() without reaching the maximum
number of rereads, even if the function is modified to check three
consecutive readings (i.e. it doesn't look like a double read error).
This causes an underflow in the timecounter and the PHC time jumps hours
ahead.

This was observed on 82574, I217 and I219. The fastest way to reproduce
it is to run a program that continuously calls the PTP_SYS_OFFSET ioctl
on the PHC.

Modify e1000e_phc_gettime() to use timecounter_cyc2time() instead of
timecounter_read() in order to allow non-monotonic SYSTIM readings and
prevent the PHC from jumping.

Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000e/ptp.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/ptp.c b/drivers/net/ethernet/intel/e1000e/ptp.c
index 37c76945ad9b..e1f821edbc21 100644
--- a/drivers/net/ethernet/intel/e1000e/ptp.c
+++ b/drivers/net/ethernet/intel/e1000e/ptp.c
@@ -173,10 +173,14 @@ static int e1000e_phc_gettime(struct ptp_clock_info *ptp, struct timespec64 *ts)
 	struct e1000_adapter *adapter = container_of(ptp, struct e1000_adapter,
 						     ptp_clock_info);
 	unsigned long flags;
-	u64 ns;
+	u64 cycles, ns;
 
 	spin_lock_irqsave(&adapter->systim_lock, flags);
-	ns = timecounter_read(&adapter->tc);
+
+	/* Use timecounter_cyc2time() to allow non-monotonic SYSTIM readings */
+	cycles = adapter->cc.read(&adapter->cc);
+	ns = timecounter_cyc2time(&adapter->tc, cycles);
+
 	spin_unlock_irqrestore(&adapter->systim_lock, flags);
 
 	*ts = ns_to_timespec64(ns);
@@ -232,9 +236,12 @@ static void e1000e_systim_overflow_work(struct work_struct *work)
 						     systim_overflow_work.work);
 	struct e1000_hw *hw = &adapter->hw;
 	struct timespec64 ts;
+	u64 ns;
 
-	adapter->ptp_clock_info.gettime64(&adapter->ptp_clock_info, &ts);
+	/* Update the timecounter */
+	ns = timecounter_read(&adapter->tc);
 
+	ts = ns_to_timespec64(ns);
 	e_dbg("SYSTIM overflow check at %lld.%09lu\n",
 	      (long long) ts.tv_sec, ts.tv_nsec);
 
-- 
2.19.1

^ permalink raw reply related

* [net-next 10/12] igc: Tidy up some white space
From: Jeff Kirsher @ 2018-11-07 22:48 UTC (permalink / raw)
  To: davem; +Cc: Dan Carpenter, netdev, nhorman, sassmann, Jeff Kirsher
In-Reply-To: <20181107224830.9737-1-jeffrey.t.kirsher@intel.com>

From: Dan Carpenter <dan.carpenter@oracle.com>

I just cleaned up a couple small white space issues.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igc/igc_main.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index f176540f5ed7..827eda044d97 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -1108,7 +1108,7 @@ static struct sk_buff *igc_build_skb(struct igc_ring *rx_ring,
 
 	/* update pointers within the skb to store the data */
 	skb_reserve(skb, IGC_SKB_PAD);
-	 __skb_put(skb, size);
+	__skb_put(skb, size);
 
 	/* update buffer offset */
 #if (PAGE_SIZE < 8192)
@@ -1160,9 +1160,9 @@ static struct sk_buff *igc_construct_skb(struct igc_ring *rx_ring,
 				(va + headlen) - page_address(rx_buffer->page),
 				size, truesize);
 #if (PAGE_SIZE < 8192)
-	rx_buffer->page_offset ^= truesize;
+		rx_buffer->page_offset ^= truesize;
 #else
-	rx_buffer->page_offset += truesize;
+		rx_buffer->page_offset += truesize;
 #endif
 	} else {
 		rx_buffer->pagecnt_bias++;
@@ -1668,8 +1668,8 @@ static bool igc_clean_tx_irq(struct igc_q_vector *q_vector, int napi_budget)
 				tx_buffer->next_to_watch,
 				jiffies,
 				tx_buffer->next_to_watch->wb.status);
-				netif_stop_subqueue(tx_ring->netdev,
-						    tx_ring->queue_index);
+			netif_stop_subqueue(tx_ring->netdev,
+					    tx_ring->queue_index);
 
 			/* we are about to reset, no point in enabling stuff */
 			return true;
-- 
2.19.1

^ permalink raw reply related

* [net-next 12/12] igc: Clean up code
From: Jeff Kirsher @ 2018-11-07 22:48 UTC (permalink / raw)
  To: davem; +Cc: Sasha Neftin, netdev, nhorman, sassmann, Jeff Kirsher
In-Reply-To: <20181107224830.9737-1-jeffrey.t.kirsher@intel.com>

From: Sasha Neftin <sasha.neftin@intel.com>

Address few community comments.
Remove unused code, will be added per demand.
Remove blank lines and unneeded includes.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igc/igc.h      |  9 ---------
 drivers/net/ethernet/intel/igc/igc_main.c | 15 ---------------
 2 files changed, 24 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index cdf18a5d9e08..3b00b109b34a 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -5,23 +5,14 @@
 #define _IGC_H_
 
 #include <linux/kobject.h>
-
 #include <linux/pci.h>
 #include <linux/netdevice.h>
 #include <linux/vmalloc.h>
-
 #include <linux/ethtool.h>
-
 #include <linux/sctp.h>
 
 #define IGC_ERR(args...) pr_err("igc: " args)
 
-#define PFX "igc: "
-
-#include <linux/timecounter.h>
-#include <linux/net_tstamp.h>
-#include <linux/ptp_clock_kernel.h>
-
 #include "igc_hw.h"
 
 /* main */
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 827eda044d97..d002055c0623 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -1699,20 +1699,6 @@ static bool igc_clean_tx_irq(struct igc_q_vector *q_vector, int napi_budget)
 	return !!budget;
 }
 
-/**
- * igc_ioctl - I/O control method
- * @netdev: network interface device structure
- * @ifreq: frequency
- * @cmd: command
- */
-static int igc_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
-{
-	switch (cmd) {
-	default:
-		return -EOPNOTSUPP;
-	}
-}
-
 /**
  * igc_up - Open the interface and prepare it to handle traffic
  * @adapter: board private structure
@@ -3445,7 +3431,6 @@ static const struct net_device_ops igc_netdev_ops = {
 	.ndo_set_mac_address	= igc_set_mac,
 	.ndo_change_mtu		= igc_change_mtu,
 	.ndo_get_stats		= igc_get_stats,
-	.ndo_do_ioctl		= igc_ioctl,
 };
 
 /* PCIe configuration access */
-- 
2.19.1

^ permalink raw reply related

* [net-next 09/12] igc: fix error return handling from call to netif_set_real_num_tx_queues
From: Jeff Kirsher @ 2018-11-07 22:48 UTC (permalink / raw)
  To: davem; +Cc: Colin Ian King, netdev, nhorman, sassmann, Jeff Kirsher
In-Reply-To: <20181107224830.9737-1-jeffrey.t.kirsher@intel.com>

From: Colin Ian King <colin.king@canonical.com>

The call to netif_set_real_num_tx_queues is not assigning the error
return to variable err even though the next line checks err for an
error.  Fix this by adding the missing err assignment.

Detected by CoverityScan, CID#1474551 ("Logically dead code")

Fixes: 3df25e4c1e66 ("igc: Add interrupt support")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igc/igc_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 529c3e952621..f176540f5ed7 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -3358,7 +3358,7 @@ static int __igc_open(struct net_device *netdev, bool resuming)
 		goto err_req_irq;
 
 	/* Notify the stack of the actual queue counts. */
-	netif_set_real_num_tx_queues(netdev, adapter->num_tx_queues);
+	err = netif_set_real_num_tx_queues(netdev, adapter->num_tx_queues);
 	if (err)
 		goto err_set_queues;
 
-- 
2.19.1

^ permalink raw reply related

* [net-next 08/12] igc: Remove set but not used variable 'pci_using_dac'
From: Jeff Kirsher @ 2018-11-07 22:48 UTC (permalink / raw)
  To: davem; +Cc: YueHaibing, netdev, nhorman, sassmann, Jeff Kirsher
In-Reply-To: <20181107224830.9737-1-jeffrey.t.kirsher@intel.com>

From: YueHaibing <yuehaibing@huawei.com>

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/intel/igc/igc_main.c: In function 'igc_probe':
drivers/net/ethernet/intel/igc/igc_main.c:3535:11: warning:
 variable 'pci_using_dac' set but not used [-Wunused-but-set-variable]

It never used since introduction in commit
d89f88419f99 ("igc: Add skeletal frame for Intel(R) 2.5G Ethernet Controller support")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igc/igc_main.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 615a5fcd5a00..529c3e952621 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -3532,19 +3532,16 @@ static int igc_probe(struct pci_dev *pdev,
 	struct net_device *netdev;
 	struct igc_hw *hw;
 	const struct igc_info *ei = igc_info_tbl[ent->driver_data];
-	int err, pci_using_dac;
+	int err;
 
 	err = pci_enable_device_mem(pdev);
 	if (err)
 		return err;
 
-	pci_using_dac = 0;
 	err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
 	if (!err) {
 		err = dma_set_coherent_mask(&pdev->dev,
 					    DMA_BIT_MASK(64));
-		if (!err)
-			pci_using_dac = 1;
 	} else {
 		err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
 		if (err) {
-- 
2.19.1

^ permalink raw reply related

* [net-next 07/12] igc: Remove set but not used variables 'ctrl_ext, link_mode'
From: Jeff Kirsher @ 2018-11-07 22:48 UTC (permalink / raw)
  To: davem; +Cc: YueHaibing, netdev, nhorman, sassmann, Jeff Kirsher
In-Reply-To: <20181107224830.9737-1-jeffrey.t.kirsher@intel.com>

From: YueHaibing <yuehaibing@huawei.com>

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/intel/igc/igc_base.c: In function 'igc_init_phy_params_base':
drivers/net/ethernet/intel/igc/igc_base.c:240:6: warning:
 variable 'ctrl_ext' set but not used [-Wunused-but-set-variable]
  u32 ctrl_ext;

drivers/net/ethernet/intel/igc/igc_base.c: In function 'igc_get_invariants_base':
drivers/net/ethernet/intel/igc/igc_base.c:290:6: warning:
 variable 'link_mode' set but not used [-Wunused-but-set-variable]
  u32 link_mode = 0;

It never used since introduction in
commit c0071c7aa5fe ("igc: Add HW initialization code")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igc/igc_base.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_base.c b/drivers/net/ethernet/intel/igc/igc_base.c
index 832da609d9a7..df40af759542 100644
--- a/drivers/net/ethernet/intel/igc/igc_base.c
+++ b/drivers/net/ethernet/intel/igc/igc_base.c
@@ -237,7 +237,6 @@ static s32 igc_init_phy_params_base(struct igc_hw *hw)
 {
 	struct igc_phy_info *phy = &hw->phy;
 	s32 ret_val = 0;
-	u32 ctrl_ext;
 
 	if (hw->phy.media_type != igc_media_type_copper) {
 		phy->type = igc_phy_none;
@@ -247,8 +246,6 @@ static s32 igc_init_phy_params_base(struct igc_hw *hw)
 	phy->autoneg_mask	= AUTONEG_ADVERTISE_SPEED_DEFAULT_2500;
 	phy->reset_delay_us	= 100;
 
-	ctrl_ext = rd32(IGC_CTRL_EXT);
-
 	/* set lan id */
 	hw->bus.func = (rd32(IGC_STATUS) & IGC_STATUS_FUNC_MASK) >>
 			IGC_STATUS_FUNC_SHIFT;
@@ -287,8 +284,6 @@ static s32 igc_init_phy_params_base(struct igc_hw *hw)
 static s32 igc_get_invariants_base(struct igc_hw *hw)
 {
 	struct igc_mac_info *mac = &hw->mac;
-	u32 link_mode = 0;
-	u32 ctrl_ext = 0;
 	s32 ret_val = 0;
 
 	switch (hw->device_id) {
@@ -302,9 +297,6 @@ static s32 igc_get_invariants_base(struct igc_hw *hw)
 
 	hw->phy.media_type = igc_media_type_copper;
 
-	ctrl_ext = rd32(IGC_CTRL_EXT);
-	link_mode = ctrl_ext & IGC_CTRL_EXT_LINK_MODE_MASK;
-
 	/* mac initialization and operations */
 	ret_val = igc_init_mac_params_base(hw);
 	if (ret_val)
-- 
2.19.1

^ permalink raw reply related

* [net-next 04/12] ixgbevf: add support for software timestamps
From: Jeff Kirsher @ 2018-11-07 22:48 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, Jeff Kirsher
In-Reply-To: <20181107224830.9737-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Add a call to skb_tx_timestamp in the ixgbevf_tx_map function. This
enables software timestamping for packets sent over this device driver.
The call is placed just prior to when we notify hardware of the new
packet, in order to software timestamp as close as possible to when the
hardware will transmit.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 5e47ede7e832..196b890467b2 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -4016,6 +4016,8 @@ static void ixgbevf_tx_map(struct ixgbevf_ring *tx_ring,
 	/* set the timestamp */
 	first->time_stamp = jiffies;
 
+	skb_tx_timestamp(skb);
+
 	/* Force memory writes to complete before letting h/w know there
 	 * are new descriptors to fetch.  (Only applicable for weak-ordered
 	 * memory model archs, such as IA-64).
-- 
2.19.1

^ permalink raw reply related

* [net-next 06/12] i40e/ixgbe/igb: fail on new WoL flag setting WAKE_MAGICSECURE
From: Jeff Kirsher @ 2018-11-07 22:48 UTC (permalink / raw)
  To: davem; +Cc: Todd Fujinaka, netdev, nhorman, sassmann, Jeff Kirsher
In-Reply-To: <20181107224830.9737-1-jeffrey.t.kirsher@intel.com>

From: Todd Fujinaka <todd.fujinaka@intel.com>

There's a new flag for setting WoL filters that is only
enabled on one manufacturer's NICs, and it's not ours. Fail
with EOPNOTSUPP.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c   | 3 ++-
 drivers/net/ethernet/intel/igb/igb_ethtool.c     | 2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 3 ++-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 9f8464f80783..9c1211ad2c6b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -2377,7 +2377,8 @@ static int i40e_set_wol(struct net_device *netdev, struct ethtool_wolinfo *wol)
 		return -EOPNOTSUPP;
 
 	/* only magic packet is supported */
-	if (wol->wolopts && (wol->wolopts != WAKE_MAGIC))
+	if (wol->wolopts && (wol->wolopts != WAKE_MAGIC)
+			  | (wol->wolopts != WAKE_FILTER))
 		return -EOPNOTSUPP;
 
 	/* is this a new value? */
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 5acf3b743876..c57671068245 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -2113,7 +2113,7 @@ static int igb_set_wol(struct net_device *netdev, struct ethtool_wolinfo *wol)
 {
 	struct igb_adapter *adapter = netdev_priv(netdev);
 
-	if (wol->wolopts & (WAKE_ARP | WAKE_MAGICSECURE))
+	if (wol->wolopts & (WAKE_ARP | WAKE_MAGICSECURE | WAKE_FILTER))
 		return -EOPNOTSUPP;
 
 	if (!(adapter->flags & IGB_FLAG_WOL_SUPPORTED))
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index 732b1e6ecc43..acba067cc15a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -2206,7 +2206,8 @@ static int ixgbe_set_wol(struct net_device *netdev, struct ethtool_wolinfo *wol)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 
-	if (wol->wolopts & (WAKE_PHY | WAKE_ARP | WAKE_MAGICSECURE))
+	if (wol->wolopts & (WAKE_PHY | WAKE_ARP | WAKE_MAGICSECURE |
+			    WAKE_FILTER))
 		return -EOPNOTSUPP;
 
 	if (ixgbe_wol_exclusion(adapter, wol))
-- 
2.19.1

^ permalink raw reply related

* [net-next 03/12] ixgbe: allow IPsec Tx offload in VEPA mode
From: Jeff Kirsher @ 2018-11-07 22:48 UTC (permalink / raw)
  To: davem; +Cc: Shannon Nelson, netdev, nhorman, sassmann, Jeff Kirsher
In-Reply-To: <20181107224830.9737-1-jeffrey.t.kirsher@intel.com>

From: Shannon Nelson <shannon.nelson@oracle.com>

When it's possible that the PF might end up trying to send a
packet to one of its own VFs, we have to forbid IPsec offload
because the device drops the packets into a black hole.
See commit 47b6f50077e6 ("ixgbe: disallow IPsec Tx offload
when in SR-IOV mode") for more info.

This really is only necessary when the device is in the default
VEB mode.  If instead the device is running in VEPA mode,
the packets will go through the encryption engine and out the
MAC/PHY as normal, and get "hairpinned" as needed by the switch.

So let's not block IPsec offload when in VEPA mode.  To get
there with the ixgbe device, use the handy 'bridge' command:
	bridge link set dev eth1 hwmode vepa

Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index fd1b0546fd67..4d77f42e035c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -4,6 +4,7 @@
 #include "ixgbe.h"
 #include <net/xfrm.h>
 #include <crypto/aead.h>
+#include <linux/if_bridge.h>
 
 #define IXGBE_IPSEC_KEY_BITS  160
 static const char aes_gcm_name[] = "rfc4106(gcm(aes))";
@@ -693,7 +694,8 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
 	} else {
 		struct tx_sa tsa;
 
-		if (adapter->num_vfs)
+		if (adapter->num_vfs &&
+		    adapter->bridge_mode != BRIDGE_MODE_VEPA)
 			return -EOPNOTSUPP;
 
 		/* find the first unused index */
-- 
2.19.1

^ permalink raw reply related

* [net-next 05/12] intel-ethernet: software timestamp skbs as late as possible
From: Jeff Kirsher @ 2018-11-07 22:48 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, Jeff Kirsher
In-Reply-To: <20181107224830.9737-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Many of the Intel Ethernet drivers call skb_tx_timestamp() earlier than
necessary. Move the calls to this function to the latest point possible,
just prior to notifying hardware of the new Tx packet when we bump the
tail register.

This affects i40e, iavf, igb, igc, and ixgbe.

The e100, e1000, e1000e, fm10k, and ice drivers already call the
skb_tx_timestamp() function just prior to indicating the Tx packet to
hardware, so they do not need to be changed.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 4 ++--
 drivers/net/ethernet/intel/iavf/iavf_txrx.c   | 4 ++--
 drivers/net/ethernet/intel/igb/igb_main.c     | 4 ++--
 drivers/net/ethernet/intel/igc/igc_main.c     | 4 ++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 4 ++--
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index aef3c89ee79c..1384a5a006a4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -3473,6 +3473,8 @@ static inline int i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 	tx_desc->cmd_type_offset_bsz =
 			build_ctob(td_cmd, td_offset, size, td_tag);
 
+	skb_tx_timestamp(skb);
+
 	/* Force memory writes to complete before letting h/w know there
 	 * are new descriptors to fetch.
 	 *
@@ -3652,8 +3654,6 @@ static netdev_tx_t i40e_xmit_frame_ring(struct sk_buff *skb,
 	if (tsyn)
 		tx_flags |= I40E_TX_FLAGS_TSYN;
 
-	skb_tx_timestamp(skb);
-
 	/* always enable CRC insertion offload */
 	td_cmd |= I40E_TX_DESC_CMD_ICRC;
 
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index fb9bfad96daf..3b1dc77ae368 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -2343,6 +2343,8 @@ static inline void iavf_tx_map(struct iavf_ring *tx_ring, struct sk_buff *skb,
 	tx_desc->cmd_type_offset_bsz =
 			build_ctob(td_cmd, td_offset, size, td_tag);
 
+	skb_tx_timestamp(skb);
+
 	/* Force memory writes to complete before letting h/w know there
 	 * are new descriptors to fetch.
 	 *
@@ -2461,8 +2463,6 @@ static netdev_tx_t iavf_xmit_frame_ring(struct sk_buff *skb,
 	if (tso < 0)
 		goto out_drop;
 
-	skb_tx_timestamp(skb);
-
 	/* always enable CRC insertion offload */
 	td_cmd |= IAVF_TX_DESC_CMD_ICRC;
 
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 5df88ad8ac81..4584ebc9e8fe 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6019,6 +6019,8 @@ static int igb_tx_map(struct igb_ring *tx_ring,
 	/* set the timestamp */
 	first->time_stamp = jiffies;
 
+	skb_tx_timestamp(skb);
+
 	/* Force memory writes to complete before letting h/w know there
 	 * are new descriptors to fetch.  (Only applicable for weak-ordered
 	 * memory model archs, such as IA-64).
@@ -6147,8 +6149,6 @@ netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb,
 	else if (!tso)
 		igb_tx_csum(tx_ring, first);
 
-	skb_tx_timestamp(skb);
-
 	if (igb_tx_map(tx_ring, first, hdr_len))
 		goto cleanup_tx_tstamp;
 
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 9d85707e8a81..615a5fcd5a00 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -865,6 +865,8 @@ static int igc_tx_map(struct igc_ring *tx_ring,
 	/* set the timestamp */
 	first->time_stamp = jiffies;
 
+	skb_tx_timestamp(skb);
+
 	/* Force memory writes to complete before letting h/w know there
 	 * are new descriptors to fetch.  (Only applicable for weak-ordered
 	 * memory model archs, such as IA-64).
@@ -959,8 +961,6 @@ static netdev_tx_t igc_xmit_frame_ring(struct sk_buff *skb,
 	first->bytecount = skb->len;
 	first->gso_segs = 1;
 
-	skb_tx_timestamp(skb);
-
 	/* record initial flags and protocol */
 	first->tx_flags = tx_flags;
 	first->protocol = protocol;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index aeda1834e66a..cfb83687c3d8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8269,6 +8269,8 @@ static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
 	/* set the timestamp */
 	first->time_stamp = jiffies;
 
+	skb_tx_timestamp(skb);
+
 	/*
 	 * Force memory writes to complete before letting h/w know there
 	 * are new descriptors to fetch.  (Only applicable for weak-ordered
@@ -8646,8 +8648,6 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
 		}
 	}
 
-	skb_tx_timestamp(skb);
-
 #ifdef CONFIG_PCI_IOV
 	/*
 	 * Use the l2switch_enable flag - would be false if the DMA
-- 
2.19.1

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox