Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next] net: Clarify dev_weight documentation for LRO and GRO_HW.
From: Marcelo Ricardo Leitner @ 2017-12-19 22:26 UTC (permalink / raw)
  To: Michael Chan; +Cc: davem, netdev
In-Reply-To: <1513717976-24263-1-git-send-email-michael.chan@broadcom.com>

On Tue, Dec 19, 2017 at 04:12:56PM -0500, Michael Chan wrote:
> Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>

Thanks!

> ---
>  Documentation/sysctl/net.txt | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
> index b67044a..35c62f5 100644
> --- a/Documentation/sysctl/net.txt
> +++ b/Documentation/sysctl/net.txt
> @@ -95,7 +95,9 @@ dev_weight
>  --------------
>  
>  The maximum number of packets that kernel can handle on a NAPI interrupt,
> -it's a Per-CPU variable.
> +it's a Per-CPU variable. For drivers that support LRO or GRO_HW, a hardware
> +aggregated packet is counted as one packet in this context.
> +
>  Default: 64
>  
>  dev_weight_rx_bias
> -- 
> 1.8.3.1
> 

^ permalink raw reply

* Re: [RFC] hv_netvsc: automatically name slave VF network device
From: Jakub Kicinski @ 2017-12-19 22:24 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Stephen Hemminger, Jiri Pirko
In-Reply-To: <20171219140659.39f6cc1c@xeon-e3>

On Tue, 19 Dec 2017 14:06:59 -0800, Stephen Hemminger wrote:
> > > > > I assume you mean the modern application is udev, and it works
> > > > > but the name is meaningless because it based of synthetic PCI
> > > > > information. The PCI host adapter is simulated for pass through
> > > > > devices. Names like enp12s0.
> > > > > 
> > > > > Since every passthrough VF device on Hyper-V/Azure has a matching
> > > > > synthetic network device with same mac address. It is best to
> > > > > have the relationship shown in the name.        
> > > > 
> > > > How about we make the VF drivers expose "vf" as phys_port_name?
> > > > Then systemd/udev should glue that onto the name regardless of
> > > > how the VF is used.      
> > > 
> > > One of the goals was not to modify in any way other drivers (like VF).    
> > 
> > Why?  Do you have out-of-tree drivers you can't change or some such?  
> 
> This needs to work on enterprise distributions; plus it is not good
> practice to introduce random changes into partners like Mellanox
> drivers.

Are we talking about Linux or Windows kernel here?  I don't think
this requires hypervisor changes?  The notion of a "partner" and
changing drivers by people who are not employed by a vendor being 
bad practice sounds entirely foreign to me.

If we agree that marking VF interfaces is useful (and I think 
it is) then we should mark them always, not only when they are 
enslaved to a magic bond.  And the natural way of doing that 
seems to be phys_port_name.

^ permalink raw reply

* Re: [PATCH v2 iproute2 net-next] erspan: add erspan version II support
From: David Ahern @ 2017-12-19 22:17 UTC (permalink / raw)
  To: William Tu, netdev
In-Reply-To: <1513386366-38095-1-git-send-email-u9012063@gmail.com>

On 12/15/17 6:06 PM, William Tu wrote:
> @@ -343,6 +355,22 @@ get_failed:
>  				invarg("invalid erspan index\n", *argv);
>  			if (erspan_idx & ~((1<<20) - 1) || erspan_idx == 0)
>  				invarg("erspan index must be > 0 and <= 20-bit\n", *argv);
> +		} else if (strcmp(*argv, "erspan_ver") == 0) {
> +			NEXT_ARG();
> +			if (get_u8(&erspan_ver, *argv, 0))
> +				invarg("invalid erspan version\n", *argv);
> +			if (erspan_ver != 1 && erspan_ver != 2)
> +				invarg("erspan version must be 1 or 2\n", *argv);
> +		} else if (strcmp(*argv, "erspan_dir") == 0) {
> +			NEXT_ARG();
> +			if (get_u8(&erspan_dir, *argv, 0))
> +				invarg("invalid erspan direction\n", *argv);
> +			if (erspan_dir != 0 && erspan_dir != 1)
> +				invarg("erspan direction must be 0(Ingress) or 1(Egress)\n", *argv);

Why not allow ingress and egress as strings and using matches()?
e.g.,  '... erspan_dir in[gress] ...' or '... erspan_dir e[gress] ...'

Seems much more intuitive than remembering "0" = ingress and "1" = egress.

> @@ -374,8 +402,15 @@ get_failed:
>  		addattr_l(n, 1024, IFLA_GRE_TTL, &ttl, 1);
>  		addattr_l(n, 1024, IFLA_GRE_TOS, &tos, 1);
>  		addattr32(n, 1024, IFLA_GRE_FWMARK, fwmark);
> -		if (erspan_idx != 0)
> -			addattr32(n, 1024, IFLA_GRE_ERSPAN_INDEX, erspan_idx);
> +		if (erspan_ver) {
> +			addattr8(n, 1024, IFLA_GRE_ERSPAN_VER, erspan_ver);
> +			if (erspan_ver == 1 && erspan_idx != 0) {
> +				addattr32(n, 1024, IFLA_GRE_ERSPAN_INDEX, erspan_idx);
> +			} else if (erspan_ver == 2) {
> +				addattr8(n, 1024, IFLA_GRE_ERSPAN_DIR, erspan_dir);
> +				addattr16(n, 1024, IFLA_GRE_ERSPAN_HWID, erspan_hwid);
> +			}
> +		}
>  	} else {
>  		addattr_l(n, 1024, IFLA_GRE_COLLECT_METADATA, NULL, 0);
>  	}
> @@ -514,7 +549,25 @@ static void gre_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
>  	if (tb[IFLA_GRE_ERSPAN_INDEX]) {
>  		__u32 erspan_idx = rta_getattr_u32(tb[IFLA_GRE_ERSPAN_INDEX]);
>  
> -		fprintf(f, "erspan_index %u ", erspan_idx);
> +		print_uint(PRINT_ANY, "erspan_index", "erspan_index %u", erspan_idx);
> +	}
> +
> +	if (tb[IFLA_GRE_ERSPAN_VER]) {
> +		__u8 erspan_ver = rta_getattr_u8(tb[IFLA_GRE_ERSPAN_VER]);
> +
> +		print_uint(PRINT_ANY, "erspan_ver", "erspan_ver %u", erspan_ver);
> +	}
> +
> +	if (tb[IFLA_GRE_ERSPAN_DIR]) {
> +		__u8 erspan_dir = rta_getattr_u8(tb[IFLA_GRE_ERSPAN_DIR]);
> +
> +		print_uint(PRINT_ANY, "erspan_dir", "erspan_dir %u", erspan_dir);

similarly, here can convert the 0/1 to a human string.

Same comments for the changes to link_gre6.c

^ permalink raw reply

* Re: [RFC] hv_netvsc: automatically name slave VF network device
From: Stephen Hemminger @ 2017-12-19 22:06 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, Stephen Hemminger, Jiri Pirko
In-Reply-To: <20171219135529.62800475@cakuba.netronome.com>

On Tue, 19 Dec 2017 13:55:29 -0800
Jakub Kicinski <kubakici@wp.pl> wrote:

> On Tue, 19 Dec 2017 13:29:49 -0800, Stephen Hemminger wrote:
> > > > biosdevname is dead, gone and wouldn't work on Azure (it dumpster
> > > > dives in /dev/mem).      
> > > 
> > > Hm, I haven't worked on biosdevname myself, but AFAIU it also falls 
> > > back to information from the PCI VPD, which could be populated by 
> > > the hypervisor.    
> > 
> > VPD never had any useful standard are info.
> > The rules used by udev come off sysfs, see:
> >   https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/  
> 
> Yes, the current VPD info looks quite limited, although it is
> extendable.
> 
> > > > I assume you mean the modern application is udev, and it works
> > > > but the name is meaningless because it based of synthetic PCI
> > > > information. The PCI host adapter is simulated for pass through
> > > > devices. Names like enp12s0.
> > > > 
> > > > Since every passthrough VF device on Hyper-V/Azure has a matching
> > > > synthetic network device with same mac address. It is best to
> > > > have the relationship shown in the name.      
> > > 
> > > How about we make the VF drivers expose "vf" as phys_port_name?
> > > Then systemd/udev should glue that onto the name regardless of
> > > how the VF is used.    
> > 
> > One of the goals was not to modify in any way other drivers (like VF).  
> 
> Why?  Do you have out-of-tree drivers you can't change or some such?

This needs to work on enterprise distributions; plus it is not good practice
to introduce random changes into partners like Mellanox drivers.

^ permalink raw reply

* Re: [PATCH v4 04/36] nds32: Kernel booting and initialization
From: Randy Dunlap @ 2017-12-19 22:01 UTC (permalink / raw)
  To: Greentime Hu, greentime-MUIXKm3Oiri1Z/+hSey0Gg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, arnd-r2nGTMty4D4,
	linux-arch-u79uwXL29TY76Z2rM5mHXA, tglx-hfZtesqFncYOwBW4kG4KsQ,
	jason-NLaQJdtUoK4Be96aLqz0jA, marc.zyngier-5wv7dgnIgG8,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A, netdev-u79uwXL29TY76Z2rM5mHXA,
	deanbo422-Re5JQEeQqe8AvxtiuMwx3w,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
	dhowells-H+wXaHxf7aLQT0dZR+AlfA, will.deacon-5wv7dgnIgG8,
	daniel.lezcano-QSEj5FYQhm4dnm+yROfE0A,
	linux-serial-u79uwXL29TY76Z2rM5mHXA,
	geert.uytterhoeven-Re5JQEeQqe8AvxtiuMwx3w,
	linus.walleij-QSEj5FYQhm4dnm+yROfE0A, mark.rutland-5wv7dgnIgG8,
	greg-U8xfFu+wG4EAvxtiuMwx3w, ren_guo-Y+KPrCd2zL4AvxtiuMwx3w,
	pombredanne-od1rfyK75/E
  Cc: Vincent Chen
In-Reply-To: <935ff034982f077b2e6f5eeccd6fe2110614fc9c.1513577007.git.green.hu-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On 12/17/2017 10:46 PM, Greentime Hu wrote:
> From: Greentime Hu <greentime-MUIXKm3Oiri1Z/+hSey0Gg@public.gmane.org>
> 
> This patch includes the kernel startup code. It can get dtb pointer
> passed from bootloader. It will create a temp mapping by tlb
> instructions at beginning and goto start_kernel.
> 
> Signed-off-by: Vincent Chen <vincentc-MUIXKm3Oiri1Z/+hSey0Gg@public.gmane.org>
> Signed-off-by: Greentime Hu <greentime-MUIXKm3Oiri1Z/+hSey0Gg@public.gmane.org>
> ---
>  arch/nds32/kernel/head.S  |  189 ++++++++++++++++++++++
>  arch/nds32/kernel/setup.c |  383 +++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 572 insertions(+)
>  create mode 100644 arch/nds32/kernel/head.S
>  create mode 100644 arch/nds32/kernel/setup.c
> 

> diff --git a/arch/nds32/kernel/setup.c b/arch/nds32/kernel/setup.c
> new file mode 100644
> index 0000000..7718c58
> --- /dev/null
> +++ b/arch/nds32/kernel/setup.c
> @@ -0,0 +1,383 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2005-2017 Andes Technology Corporation
> +

[snip]

> +struct cache_info L1_cache_info[2];
> +static void __init dump_cpu_info(int cpu)
> +{
> +	int i, p = 0;
> +	char str[sizeof(hwcap_str) + 16];
> +
> +	for (i = 0; hwcap_str[i]; i++) {
> +		if (elf_hwcap & (1 << i)) {
> +			sprintf(str + p, "%s ", hwcap_str[i]);
> +			p += strlen(hwcap_str[i]) + 1;
> +		}
> +	}
> +
> +	pr_info("CPU%d Featuretures: %s\n", cpu, str);

	               Features:


-- 
~Randy
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Linux ECN Handling
From: Steve Ibanez @ 2017-12-19 22:00 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Eric Dumazet, Yuchung Cheng, Daniel Borkmann, Netdev,
	Florian Westphal, Mohammad Alizadeh, Lawrence Brakmo
In-Reply-To: <CADVnQynAqMYzi+Kt5RjbW+ma_99nGyDPubSd6bYamQGsK5f+7A@mail.gmail.com>

Hi Neal,

I managed to track down the code path that the unACKed CWR packet is
taking. The tcp_rcv_established() function calls tcp_ack_snd_check()
at the end of step5 and then the return statement indicated below is
invoked, which prevents the __tcp_ack_snd_check() function from
running.

static inline void tcp_ack_snd_check(struct sock *sk)
{
        if (!inet_csk_ack_scheduled(sk)) {
                /* We sent a data segment already. */
                return;   /* <=== here */
        }
        __tcp_ack_snd_check(sk, 1);
}

So somehow tcp_ack_snd_check() thinks that a data segment was already
sent when in fact it wasn't. Do you see a way around this issue?

Thanks,
-Steve

On Tue, Dec 19, 2017 at 7:28 AM, Neal Cardwell <ncardwell@google.com> wrote:
> On Tue, Dec 19, 2017 at 12:16 AM, Steve Ibanez <sibanez@stanford.edu> wrote:
>>
>> Hi Neal,
>>
>> I started looking into this receiver ACKing issue today. Strangely,
>> when I tried adding printk statements at the top of the
>> tcp_v4_do_rcv(), tcp_rcv_established(), __tcp_ack_snd_check() and
>> tcp_send_delayed_ack() functions they were never executed on the
>> machine running the iperf3 server (i.e. the destination of the flows).
>> Maybe the iperf3 server is using its own TCP stack?
>>
>> In any case, the ACKing problem is reproducible using just normal
>> iperf for which I do see my printk statements being executed. I can
>> now confirm that when the CWR marked packet (for which no ACK is sent)
>> arrives at the receiver, the __tcp_ack_snd_check() function is never
>> invoked; and hence neither is the tcp_send_delayed_ack() function.
>> Hopefully this helps narrow down where the issue might be? I started
>> adding some printk statements into the tcp_rcv_established() function,
>> but I'm not sure where the best places to look would be so I wanted to
>> ask your advice on this.
>
> Thanks for the detailed report!
>
> As a next step to narrow down why the CWR-marked packet is not acked,
> I would suggest adding printk statements at the bottom of
> tcp_rcv_established() in all the spots where we have a goto or return
> that would cause us to short-circuit and not reach the
> tcp_ack_snd_check() at the bottom of the function. This could be much
> like your existing nice debugging printks, that log any time we get to
> that spot and if (sk->sk_family == AF_INET && th->cwr). And these
> could be in the following spots (marked "here"):
>
> slow_path:
>         if (len < (th->doff << 2) || tcp_checksum_complete(skb))
>                 goto csum_error;                /* <=== here */
>
>         if (!th->ack && !th->rst && !th->syn)
>                 goto discard;                /* <=== here */
>
>         /*
>          *      Standard slow path.
>          */
>
>         if (!tcp_validate_incoming(sk, skb, th, 1))
>                 return;                /* <=== here */
>
> step5:
>         if (tcp_ack(sk, skb, FLAG_SLOWPATH | FLAG_UPDATE_TS_RECENT) < 0)
>                 goto discard;                /* <=== here */
>
>
> thanks,
> neal

^ permalink raw reply

* Re: [RFC] hv_netvsc: automatically name slave VF network device
From: Jakub Kicinski @ 2017-12-19 21:55 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Stephen Hemminger, Jiri Pirko
In-Reply-To: <20171219132949.57926170@xeon-e3>

On Tue, 19 Dec 2017 13:29:49 -0800, Stephen Hemminger wrote:
> > > biosdevname is dead, gone and wouldn't work on Azure (it dumpster
> > > dives in /dev/mem).    
> > 
> > Hm, I haven't worked on biosdevname myself, but AFAIU it also falls 
> > back to information from the PCI VPD, which could be populated by 
> > the hypervisor.  
> 
> VPD never had any useful standard are info.
> The rules used by udev come off sysfs, see:
>   https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/

Yes, the current VPD info looks quite limited, although it is
extendable.

> > > I assume you mean the modern application is udev, and it works
> > > but the name is meaningless because it based of synthetic PCI
> > > information. The PCI host adapter is simulated for pass through
> > > devices. Names like enp12s0.
> > > 
> > > Since every passthrough VF device on Hyper-V/Azure has a matching
> > > synthetic network device with same mac address. It is best to
> > > have the relationship shown in the name.    
> > 
> > How about we make the VF drivers expose "vf" as phys_port_name?
> > Then systemd/udev should glue that onto the name regardless of
> > how the VF is used.  
> 
> One of the goals was not to modify in any way other drivers (like VF).

Why?  Do you have out-of-tree drivers you can't change or some such?

^ permalink raw reply

* Re: [pull request][for-next 00/11] Mellanox, mlx5 E-Switch updates 2017-12-19
From: Marcelo Ricardo Leitner @ 2017-12-19 21:54 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Doug Ledford, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Leon Romanovsky
In-Reply-To: <20171219203340.2600-1-saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

On Tue, Dec 19, 2017 at 12:33:29PM -0800, Saeed Mahameed wrote:
> Hi Dave and Doug,
> 
> ==============
> This series includes updates for mlx5 E-Switch infrastructures,
> to be merged into net-next and rdma-next trees.
> 
> Mark's patches provide E-Switch refactoring that generalize the mlx5
> E-Switch vf representors interfaces and data structures. The serious is
> mainly focused on moving ethernet (netdev) specific representors logic out
> of E-Switch (eswitch.c) into mlx5e representor module (en_rep.c), which
> provides better separation and allows future support for other types of vf
> representors (e.g. RDMA).
> 
> Gal's patches at the end of this serious, provide a simple syntax fix and
> two other patches that handles vport ingress/egress ACL steering name
> spaces to be aligned with the Firmware/Hardware specs.

Patch 10 actually looks quite worrysome if a card would support only
one ingress or egress, but if all of them support both, then it should
be fine yes. Is that possible, to support only one direction?

  Marcelo
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v3 0/3] kallsyms: don't leak address
From: Tobin C. Harding @ 2017-12-19 21:39 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Tobin C. Harding, Steven Rostedt, Tycho Andersen, Linus Torvalds,
	Kees Cook, Andrew Morton, Daniel Borkmann, Masahiro Yamada,
	Alexei Starovoitov, linux-kernel, Network Development,
	Joe Perches

This set plugs a kernel address leak that occurs if kallsyms symbol
look up fails. This set was prompted by a leaking address found using
scripts/leaking_addresses.pl on a PowerPC machine in the wild.

$ perl scripts/leaking_addresses.pl		[address sanitized]
...
/proc/8025/task/8025/stack: [<0000000000000000>] 0xc0000001XXXXXXXX

$ uname -r
4.4.0-79-powerpc64-smp


Patch set does not change behaviour when KALLSYMS is not defined
(suggested by Linus).

Comments on version 1 indicated that current behaviour may be useful for
debugging. This version adds a kernel command-line parameter in order to
be able to preserve current behaviour (print raw address if kallsyms
symbol look up fails). (Command-line parameter suggested by Steve.)

New command-line parameter is documented only in the kernel-doc for
kallsyms functions sprint_symbol() and sprint_symbol_no_offset(). Is
this sufficient? Perhaps an entry in printk-formats.txt also?

Patch 1 - return error code if symbol look up fails unless new
	  command-line parameter 'insecure_print_all_symbols' is enabled.
Patch 2 - print <symbol not found> to buffer if symbol look up returns
	  an error.
Patch 3 - maintain current behaviour in ftrace.

thanks,
Tobin.

v3:
 - Remove const string and use ternary operator (suggested by Joe
   Perches) 

v2:
 - Add kernel command-line parameter.
 - Remove unnecessary function.
 - Fix broken ftrace code (and actually build and test ftrace code).

All code tested.

Tobin C. Harding (3):
  kallsyms: don't leak address when symbol not found
  vsprintf: print <symbol not found> if symbol not found
  trace: print address if symbol not found

 kernel/kallsyms.c                | 31 +++++++++++++++++++++++++------
 kernel/trace/trace.h             | 24 ++++++++++++++++++++++++
 kernel/trace/trace_events_hist.c |  6 +++---
 kernel/trace/trace_output.c      |  2 +-
 lib/vsprintf.c                   |  9 +++++----
 5 files changed, 58 insertions(+), 14 deletions(-)

-- 
2.7.4

^ permalink raw reply

* [PATCH v3 3/3] trace: print address if symbol not found
From: Tobin C. Harding @ 2017-12-19 21:39 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Tobin C. Harding, Steven Rostedt, Tycho Andersen, Linus Torvalds,
	Kees Cook, Andrew Morton, Daniel Borkmann, Masahiro Yamada,
	Alexei Starovoitov, linux-kernel, Network Development,
	Joe Perches
In-Reply-To: <1513719564-13249-1-git-send-email-me@tobin.cc>

Fixes behaviour modified by: commit 40eee173a35e ("kallsyms: don't leak
address when symbol not found")

Previous patch changed behaviour of kallsyms function sprint_symbol() to
return an error code instead of printing the address if a symbol was not
found. Ftrace relies on the original behaviour. We should not break
tracing when applying the previous patch. We can maintain the original
behaviour by checking the return code on calls to sprint_symbol() and
friends.

Check return code and print actual address on error (i.e symbol not
found).

Signed-off-by: Tobin C. Harding <me@tobin.cc>
---
 kernel/trace/trace.h             | 24 ++++++++++++++++++++++++
 kernel/trace/trace_events_hist.c |  6 +++---
 kernel/trace/trace_output.c      |  2 +-
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 2a6d0325a761..881b1a577d75 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1814,4 +1814,28 @@ static inline void trace_event_eval_update(struct trace_eval_map **map, int len)
 
 extern struct trace_iterator *tracepoint_print_iter;
 
+static inline int
+trace_sprint_symbol(char *buffer, unsigned long address)
+{
+	int ret;
+
+	ret = sprint_symbol(buffer, address);
+	if (ret == -1)
+		ret = sprintf(buffer, "0x%lx", address);
+
+	return ret;
+}
+
+static inline int
+trace_sprint_symbol_no_offset(char *buffer, unsigned long address)
+{
+	int ret;
+
+	ret = sprint_symbol_no_offset(buffer, address);
+	if (ret == -1)
+		ret = sprintf(buffer, "0x%lx", address);
+
+	return ret;
+}
+
 #endif /* _LINUX_KERNEL_TRACE_H */
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 1e1558c99d56..ca523327c058 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -982,7 +982,7 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
 			return;
 
 		seq_printf(m, "%*c", 1 + spaces, ' ');
-		sprint_symbol(str, stacktrace_entries[i]);
+		trace_sprint_symbol(str, stacktrace_entries[i]);
 		seq_printf(m, "%s\n", str);
 	}
 }
@@ -1014,12 +1014,12 @@ hist_trigger_entry_print(struct seq_file *m,
 			seq_printf(m, "%s: %llx", field_name, uval);
 		} else if (key_field->flags & HIST_FIELD_FL_SYM) {
 			uval = *(u64 *)(key + key_field->offset);
-			sprint_symbol_no_offset(str, uval);
+			trace_sprint_symbol_no_offset(str, uval);
 			seq_printf(m, "%s: [%llx] %-45s", field_name,
 				   uval, str);
 		} else if (key_field->flags & HIST_FIELD_FL_SYM_OFFSET) {
 			uval = *(u64 *)(key + key_field->offset);
-			sprint_symbol(str, uval);
+			trace_sprint_symbol(str, uval);
 			seq_printf(m, "%s: [%llx] %-55s", field_name,
 				   uval, str);
 		} else if (key_field->flags & HIST_FIELD_FL_EXECNAME) {
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 90db994ac900..f3c3a0a60f72 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -365,7 +365,7 @@ seq_print_sym_offset(struct trace_seq *s, const char *fmt,
 #ifdef CONFIG_KALLSYMS
 	const char *name;
 
-	sprint_symbol(str, address);
+	trace_sprint_symbol(str, address);
 	name = kretprobed(str);
 
 	if (name && strlen(name)) {
-- 
2.7.4

^ permalink raw reply related

* [PATCH v3 2/3] vsprintf: print <symbol not found> if symbol not found
From: Tobin C. Harding @ 2017-12-19 21:39 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Tobin C. Harding, Steven Rostedt, Tycho Andersen, Linus Torvalds,
	Kees Cook, Andrew Morton, Daniel Borkmann, Masahiro Yamada,
	Alexei Starovoitov, linux-kernel, Network Development,
	Joe Perches
In-Reply-To: <1513719564-13249-1-git-send-email-me@tobin.cc>

Depends on: commit 40eee173a35e ("kallsyms: don't leak address when
symbol not found")

Currently vsprintf for specifiers %p[SsB] relies on the behaviour of
kallsyms (sprint_symbol()) and prints the actual address if a symbol is
not found. Previous patch changes this behaviour so that sprint_symbol()
returns an error if symbol not found. With this patch in place we can
print a sanitized message '<symbol not found>' instead of leaking the
address.

Print '<symbol not found>' for printk specifier %p[sSB] if symbol look
up fails.

Signed-off-by: Tobin C. Harding <me@tobin.cc>
---
 lib/vsprintf.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 01c3957b2de6..503402a44ffe 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -674,6 +674,7 @@ char *symbol_string(char *buf, char *end, void *ptr,
 	unsigned long value;
 #ifdef CONFIG_KALLSYMS
 	char sym[KSYM_SYMBOL_LEN];
+	int ret;
 #endif
 
 	if (fmt[1] == 'R')
@@ -682,13 +683,13 @@ char *symbol_string(char *buf, char *end, void *ptr,
 
 #ifdef CONFIG_KALLSYMS
 	if (*fmt == 'B')
-		sprint_backtrace(sym, value);
+		ret = sprint_backtrace(sym, value);
 	else if (*fmt != 'f' && *fmt != 's')
-		sprint_symbol(sym, value);
+		ret = sprint_symbol(sym, value);
 	else
-		sprint_symbol_no_offset(sym, value);
+		ret = sprint_symbol_no_offset(sym, value);
 
-	return string(buf, end, sym, spec);
+	return string(buf, end, ret == -1 ? "<symbol not found>" : sym, spec);
 #else
 	return special_hex_number(buf, end, value, sizeof(void *));
 #endif
-- 
2.7.4

^ permalink raw reply related

* [PATCH v3 1/3] kallsyms: don't leak address when symbol not found
From: Tobin C. Harding @ 2017-12-19 21:39 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Tobin C. Harding, Steven Rostedt, Tycho Andersen, Linus Torvalds,
	Kees Cook, Andrew Morton, Daniel Borkmann, Masahiro Yamada,
	Alexei Starovoitov, linux-kernel, Network Development,
	Joe Perches
In-Reply-To: <1513719564-13249-1-git-send-email-me@tobin.cc>

Currently if kallsyms_lookup() fails to find the symbol then the address
is printed. This potentially leaks sensitive information but is useful
for debugging. We would like to stop the leak but keep the current
behaviour when needed for debugging. To achieve this we can add a
command-line parameter that if enabled maintains the current
behaviour. If the command-line parameter is not enabled we can return an
error instead of printing the address giving the calling code the option
of how to handle the look up failure.

Add command-line parameter 'insecure_print_all_symbols'. If parameter is
not enabled return an error value instead of printing the raw address.

Signed-off-by: Tobin C. Harding <me@tobin.cc>
---
 kernel/kallsyms.c | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index d5fa4116688a..2707cf751437 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -383,6 +383,16 @@ int lookup_symbol_attrs(unsigned long addr, unsigned long *size,
 	return lookup_module_symbol_attrs(addr, size, offset, modname, name);
 }
 
+/* Enables printing of raw address when symbol look up fails */
+static bool insecure_print_all_symbols;
+
+static int __init enable_insecure_print_all_symbols(char *unused)
+{
+	insecure_print_all_symbols = true;
+	return 0;
+}
+early_param("insecure_print_all_symbols", enable_insecure_print_all_symbols);
+
 /* Look up a kernel symbol and return it in a text buffer. */
 static int __sprint_symbol(char *buffer, unsigned long address,
 			   int symbol_offset, int add_offset)
@@ -394,8 +404,15 @@ static int __sprint_symbol(char *buffer, unsigned long address,
 
 	address += symbol_offset;
 	name = kallsyms_lookup(address, &size, &offset, &modname, buffer);
-	if (!name)
-		return sprintf(buffer, "0x%lx", address - symbol_offset);
+	if (insecure_print_all_symbols) {
+		if (!name)
+			return sprintf(buffer, "0x%lx", address - symbol_offset);
+	} else {
+		if (!name) {
+			buffer[0] = '\0';
+			return -1;
+		}
+	}
 
 	if (name != buffer)
 		strcpy(buffer, name);
@@ -417,8 +434,9 @@ static int __sprint_symbol(char *buffer, unsigned long address,
  * @address: address to lookup
  *
  * This function looks up a kernel symbol with @address and stores its name,
- * offset, size and module name to @buffer if possible. If no symbol was found,
- * just saves its @address as is.
+ * offset, size and module name to @buffer if possible. If no symbol was found
+ * returns -1 unless kernel command-line parameter 'insecure_print_all_symbols'
+ * is enabled, in which case saves @address as is to buffer.
  *
  * This function returns the number of bytes stored in @buffer.
  */
@@ -434,8 +452,9 @@ EXPORT_SYMBOL_GPL(sprint_symbol);
  * @address: address to lookup
  *
  * This function looks up a kernel symbol with @address and stores its name
- * and module name to @buffer if possible. If no symbol was found, just saves
- * its @address as is.
+ * and module name to @buffer if possible. If no symbol was found, returns -1
+ * unless kernel command-line parameter 'insecure_print_all_symbols' is enabled,
+ * in which case saves @address as is to buffer.
  *
  * This function returns the number of bytes stored in @buffer.
  */
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next (after merge)] netdevsim: bpf: align to cls_bpf changes
From: Jakub Kicinski @ 2017-12-19 21:35 UTC (permalink / raw)
  To: netdev, davem; +Cc: oss-drivers, jiri, daniel, Jakub Kicinski

cls_bpf no longer takes care of offload tracking.  Make sure
netdevsim performs necessary checks.  This fixes a warning
caused by TC trying to remove a filter it has not added.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
Dave, this will have to be applied to net-next as soon as
the previous series comes in via net.  netdevsim doesn't
exist in net/master, so I can't fix it there :(  I'm not
sure how to handle this situation best...
---
 drivers/net/netdevsim/bpf.c                 | 25 ++++++++++++++-----------
 tools/testing/selftests/bpf/test_offload.py |  4 ++--
 2 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index 7799942ed349..c977fece64a3 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -107,6 +107,7 @@ int nsim_bpf_setup_tc_block_cb(enum tc_setup_type type,
 	struct tc_cls_bpf_offload *cls_bpf = type_data;
 	struct bpf_prog *prog = cls_bpf->prog;
 	struct netdevsim *ns = cb_priv;
+	struct bpf_prog *oldprog;
 
 	if (type != TC_SETUP_CLSBPF ||
 	    !tc_can_offload(ns->netdev) ||
@@ -114,25 +115,27 @@ int nsim_bpf_setup_tc_block_cb(enum tc_setup_type type,
 	    cls_bpf->common.chain_index)
 		return -EOPNOTSUPP;
 
-	if (nsim_xdp_offload_active(ns))
-		return -EBUSY;
-
 	if (!ns->bpf_tc_accept)
 		return -EOPNOTSUPP;
 	/* Note: progs without skip_sw will probably not be dev bound */
 	if (prog && !prog->aux->offload && !ns->bpf_tc_non_bound_accept)
 		return -EOPNOTSUPP;
 
-	switch (cls_bpf->command) {
-	case TC_CLSBPF_REPLACE:
-		return nsim_bpf_offload(ns, prog, true);
-	case TC_CLSBPF_ADD:
-		return nsim_bpf_offload(ns, prog, false);
-	case TC_CLSBPF_DESTROY:
-		return nsim_bpf_offload(ns, NULL, true);
-	default:
+	if (cls_bpf->command != TC_CLSBPF_OFFLOAD)
 		return -EOPNOTSUPP;
+
+	oldprog = cls_bpf->oldprog;
+
+	/* Don't remove if oldprog doesn't match driver's state */
+	if (ns->bpf_offloaded != oldprog) {
+		oldprog = NULL;
+		if (!cls_bpf->prog)
+			return 0;
+		if (ns->bpf_offloaded)
+			return -EBUSY;
 	}
+
+	return nsim_bpf_offload(ns, cls_bpf->prog, oldprog);
 }
 
 int nsim_bpf_disable_tc(struct netdevsim *ns)
diff --git a/tools/testing/selftests/bpf/test_offload.py b/tools/testing/selftests/bpf/test_offload.py
index 3914f7a4585a..c940505c2978 100755
--- a/tools/testing/selftests/bpf/test_offload.py
+++ b/tools/testing/selftests/bpf/test_offload.py
@@ -647,8 +647,8 @@ samples = ["sample_ret0.o"]
 
     start_test("Test asking for TC offload of two filters...")
     sim.cls_bpf_add_filter(obj, da=True, skip_sw=True)
-    sim.cls_bpf_add_filter(obj, da=True, skip_sw=True)
-    # The above will trigger a splat until TC cls_bpf drivers are fixed
+    ret, _ = sim.cls_bpf_add_filter(obj, da=True, skip_sw=True, fail=False)
+    fail(ret == 0, "Managed to offload two TC filters at the same time")
 
     sim.tc_flush_filters(bound=2, total=2)
 
-- 
2.15.1

^ permalink raw reply related

* [PATCH net 2/2] nfp: bpf: keep track of the offloaded program
From: Jakub Kicinski @ 2017-12-19 21:32 UTC (permalink / raw)
  To: netdev; +Cc: daniel, jiri, oss-drivers, Jakub Kicinski
In-Reply-To: <20171219213214.1084-1-jakub.kicinski@netronome.com>

After TC offloads were converted to callbacks we have no choice
but keep track of the offloaded filter in the driver.

The check for nn->dp.bpf_offload_xdp was a stop gap solution
to make sure failed TC offload won't disable XDP, it's no longer
necessary.  nfp_net_bpf_offload() will return -EBUSY on
TC vs XDP conflicts.

Fixes: 3f7889c4c79b ("net: sched: cls_bpf: call block callbacks for offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
Dave, this will conflict on nfp_bpf_vnic_free() with net-next.
The function was removed since stack will stop XDP now, but
we need to add it back with just the WARN and free on @bv.
---
 drivers/net/ethernet/netronome/nfp/bpf/main.c | 47 ++++++++++++++++++++++++---
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  8 +++++
 2 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index a4cf62ba4604..13190aa09faf 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -82,10 +82,33 @@ static const char *nfp_bpf_extra_cap(struct nfp_app *app, struct nfp_net *nn)
 	return nfp_net_ebpf_capable(nn) ? "BPF" : "";
 }
 
+static int
+nfp_bpf_vnic_alloc(struct nfp_app *app, struct nfp_net *nn, unsigned int id)
+{
+	int err;
+
+	nn->app_priv = kzalloc(sizeof(struct nfp_bpf_vnic), GFP_KERNEL);
+	if (!nn->app_priv)
+		return -ENOMEM;
+
+	err = nfp_app_nic_vnic_alloc(app, nn, id);
+	if (err)
+		goto err_free_priv;
+
+	return 0;
+err_free_priv:
+	kfree(nn->app_priv);
+	return err;
+}
+
 static void nfp_bpf_vnic_free(struct nfp_app *app, struct nfp_net *nn)
 {
+	struct nfp_bpf_vnic *bv = nn->app_priv;
+
 	if (nn->dp.bpf_offload_xdp)
 		nfp_bpf_xdp_offload(app, nn, NULL);
+	WARN_ON(bv->tc_prog);
+	kfree(bv);
 }
 
 static int nfp_bpf_setup_tc_block_cb(enum tc_setup_type type,
@@ -93,6 +116,9 @@ static int nfp_bpf_setup_tc_block_cb(enum tc_setup_type type,
 {
 	struct tc_cls_bpf_offload *cls_bpf = type_data;
 	struct nfp_net *nn = cb_priv;
+	struct bpf_prog *oldprog;
+	struct nfp_bpf_vnic *bv;
+	int err;
 
 	if (type != TC_SETUP_CLSBPF ||
 	    !tc_can_offload(nn->dp.netdev) ||
@@ -100,8 +126,6 @@ static int nfp_bpf_setup_tc_block_cb(enum tc_setup_type type,
 	    cls_bpf->common.protocol != htons(ETH_P_ALL) ||
 	    cls_bpf->common.chain_index)
 		return -EOPNOTSUPP;
-	if (nn->dp.bpf_offload_xdp)
-		return -EBUSY;
 
 	/* Only support TC direct action */
 	if (!cls_bpf->exts_integrated ||
@@ -113,7 +137,22 @@ static int nfp_bpf_setup_tc_block_cb(enum tc_setup_type type,
 	if (cls_bpf->command != TC_CLSBPF_OFFLOAD)
 		return -EOPNOTSUPP;
 
-	return nfp_net_bpf_offload(nn, cls_bpf->prog, cls_bpf->oldprog);
+	bv = nn->app_priv;
+	oldprog = cls_bpf->oldprog;
+
+	/* Don't remove if oldprog doesn't match driver's state */
+	if (bv->tc_prog != oldprog) {
+		oldprog = NULL;
+		if (!cls_bpf->prog)
+			return 0;
+	}
+
+	err = nfp_net_bpf_offload(nn, cls_bpf->prog, oldprog);
+	if (err)
+		return err;
+
+	bv->tc_prog = cls_bpf->prog;
+	return 0;
 }
 
 static int nfp_bpf_setup_tc_block(struct net_device *netdev,
@@ -161,7 +200,7 @@ const struct nfp_app_type app_bpf = {
 
 	.extra_cap	= nfp_bpf_extra_cap,
 
-	.vnic_alloc	= nfp_app_nic_vnic_alloc,
+	.vnic_alloc	= nfp_bpf_vnic_alloc,
 	.vnic_free	= nfp_bpf_vnic_free,
 
 	.setup_tc	= nfp_bpf_setup_tc,
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 082a15f6dfb5..57b6043177a3 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -172,6 +172,14 @@ struct nfp_prog {
 	struct list_head insns;
 };
 
+/**
+ * struct nfp_bpf_vnic - per-vNIC BPF priv structure
+ * @tc_prog:	currently loaded cls_bpf program
+ */
+struct nfp_bpf_vnic {
+	struct bpf_prog *tc_prog;
+};
+
 int nfp_bpf_jit(struct nfp_prog *prog);
 
 extern const struct bpf_ext_analyzer_ops nfp_bpf_analyzer_ops;
-- 
2.15.1

^ permalink raw reply related

* [PATCH net 1/2] cls_bpf: fix offload assumptions after callback conversion
From: Jakub Kicinski @ 2017-12-19 21:32 UTC (permalink / raw)
  To: netdev; +Cc: daniel, jiri, oss-drivers, Jakub Kicinski
In-Reply-To: <20171219213214.1084-1-jakub.kicinski@netronome.com>

cls_bpf used to take care of tracking what offload state a filter
is in, i.e. it would track if offload request succeeded or not.
This information would then be used to issue correct requests to
the driver, e.g. requests for statistics only on offloaded filters,
removing only filters which were offloaded, using add instead of
replace if previous filter was not added etc.

This tracking of offload state no longer functions with the new
callback infrastructure.  There could be multiple entities trying
to offload the same filter.

Throw out all the tracking and corresponding commands and simply
pass to the drivers both old and new bpf program.  Drivers will
have to deal with offload state tracking by themselves.

Fixes: 3f7889c4c79b ("net: sched: cls_bpf: call block callbacks for offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/main.c | 12 +---
 include/net/pkt_cls.h                         |  5 +-
 net/sched/cls_bpf.c                           | 93 +++++++++++----------------
 3 files changed, 43 insertions(+), 67 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index e379b78e86ef..a4cf62ba4604 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -110,16 +110,10 @@ static int nfp_bpf_setup_tc_block_cb(enum tc_setup_type type,
 		return -EOPNOTSUPP;
 	}
 
-	switch (cls_bpf->command) {
-	case TC_CLSBPF_REPLACE:
-		return nfp_net_bpf_offload(nn, cls_bpf->prog, true);
-	case TC_CLSBPF_ADD:
-		return nfp_net_bpf_offload(nn, cls_bpf->prog, false);
-	case TC_CLSBPF_DESTROY:
-		return nfp_net_bpf_offload(nn, NULL, true);
-	default:
+	if (cls_bpf->command != TC_CLSBPF_OFFLOAD)
 		return -EOPNOTSUPP;
-	}
+
+	return nfp_net_bpf_offload(nn, cls_bpf->prog, cls_bpf->oldprog);
 }
 
 static int nfp_bpf_setup_tc_block(struct net_device *netdev,
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 0105445cab83..8e08b6da72f3 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -694,9 +694,7 @@ struct tc_cls_matchall_offload {
 };
 
 enum tc_clsbpf_command {
-	TC_CLSBPF_ADD,
-	TC_CLSBPF_REPLACE,
-	TC_CLSBPF_DESTROY,
+	TC_CLSBPF_OFFLOAD,
 	TC_CLSBPF_STATS,
 };
 
@@ -705,6 +703,7 @@ struct tc_cls_bpf_offload {
 	enum tc_clsbpf_command command;
 	struct tcf_exts *exts;
 	struct bpf_prog *prog;
+	struct bpf_prog *oldprog;
 	const char *name;
 	bool exts_integrated;
 	u32 gen_flags;
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 6fe798c2df1a..8d78e7f4ecc3 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -42,7 +42,6 @@ struct cls_bpf_prog {
 	struct list_head link;
 	struct tcf_result res;
 	bool exts_integrated;
-	bool offloaded;
 	u32 gen_flags;
 	struct tcf_exts exts;
 	u32 handle;
@@ -148,33 +147,37 @@ static bool cls_bpf_is_ebpf(const struct cls_bpf_prog *prog)
 }
 
 static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
-			       enum tc_clsbpf_command cmd)
+			       struct cls_bpf_prog *oldprog)
 {
-	bool addorrep = cmd == TC_CLSBPF_ADD || cmd == TC_CLSBPF_REPLACE;
 	struct tcf_block *block = tp->chain->block;
-	bool skip_sw = tc_skip_sw(prog->gen_flags);
 	struct tc_cls_bpf_offload cls_bpf = {};
+	struct cls_bpf_prog *obj;
+	bool skip_sw;
 	int err;
 
+	skip_sw = prog && tc_skip_sw(prog->gen_flags);
+	obj = prog ?: oldprog;
+
 	tc_cls_common_offload_init(&cls_bpf.common, tp);
-	cls_bpf.command = cmd;
-	cls_bpf.exts = &prog->exts;
-	cls_bpf.prog = prog->filter;
-	cls_bpf.name = prog->bpf_name;
-	cls_bpf.exts_integrated = prog->exts_integrated;
-	cls_bpf.gen_flags = prog->gen_flags;
+	cls_bpf.command = TC_CLSBPF_OFFLOAD;
+	cls_bpf.exts = &obj->exts;
+	cls_bpf.prog = prog ? prog->filter : NULL;
+	cls_bpf.oldprog = oldprog ? oldprog->filter : NULL;
+	cls_bpf.name = obj->bpf_name;
+	cls_bpf.exts_integrated = obj->exts_integrated;
+	cls_bpf.gen_flags = obj->gen_flags;
 
 	err = tc_setup_cb_call(block, NULL, TC_SETUP_CLSBPF, &cls_bpf, skip_sw);
-	if (addorrep) {
+	if (prog) {
 		if (err < 0) {
-			cls_bpf_offload_cmd(tp, prog, TC_CLSBPF_DESTROY);
+			cls_bpf_offload_cmd(tp, oldprog, prog);
 			return err;
 		} else if (err > 0) {
 			prog->gen_flags |= TCA_CLS_FLAGS_IN_HW;
 		}
 	}
 
-	if (addorrep && skip_sw && !(prog->gen_flags & TCA_CLS_FLAGS_IN_HW))
+	if (prog && skip_sw && !(prog->gen_flags & TCA_CLS_FLAGS_IN_HW))
 		return -EINVAL;
 
 	return 0;
@@ -183,38 +186,17 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
 static int cls_bpf_offload(struct tcf_proto *tp, struct cls_bpf_prog *prog,
 			   struct cls_bpf_prog *oldprog)
 {
-	struct cls_bpf_prog *obj = prog;
-	enum tc_clsbpf_command cmd;
-	bool skip_sw;
-	int ret;
-
-	skip_sw = tc_skip_sw(prog->gen_flags) ||
-		(oldprog && tc_skip_sw(oldprog->gen_flags));
-
-	if (oldprog && oldprog->offloaded) {
-		if (!tc_skip_hw(prog->gen_flags)) {
-			cmd = TC_CLSBPF_REPLACE;
-		} else if (!tc_skip_sw(prog->gen_flags)) {
-			obj = oldprog;
-			cmd = TC_CLSBPF_DESTROY;
-		} else {
-			return -EINVAL;
-		}
-	} else {
-		if (tc_skip_hw(prog->gen_flags))
-			return skip_sw ? -EINVAL : 0;
-		cmd = TC_CLSBPF_ADD;
-	}
-
-	ret = cls_bpf_offload_cmd(tp, obj, cmd);
-	if (ret)
-		return ret;
+	if (prog && oldprog && prog->gen_flags != oldprog->gen_flags)
+		return -EINVAL;
 
-	obj->offloaded = true;
-	if (oldprog)
-		oldprog->offloaded = false;
+	if (prog && tc_skip_hw(prog->gen_flags))
+		prog = NULL;
+	if (oldprog && tc_skip_hw(oldprog->gen_flags))
+		oldprog = NULL;
+	if (!prog && !oldprog)
+		return 0;
 
-	return 0;
+	return cls_bpf_offload_cmd(tp, prog, oldprog);
 }
 
 static void cls_bpf_stop_offload(struct tcf_proto *tp,
@@ -222,25 +204,26 @@ static void cls_bpf_stop_offload(struct tcf_proto *tp,
 {
 	int err;
 
-	if (!prog->offloaded)
-		return;
-
-	err = cls_bpf_offload_cmd(tp, prog, TC_CLSBPF_DESTROY);
-	if (err) {
+	err = cls_bpf_offload_cmd(tp, NULL, prog);
+	if (err)
 		pr_err("Stopping hardware offload failed: %d\n", err);
-		return;
-	}
-
-	prog->offloaded = false;
 }
 
 static void cls_bpf_offload_update_stats(struct tcf_proto *tp,
 					 struct cls_bpf_prog *prog)
 {
-	if (!prog->offloaded)
-		return;
+	struct tcf_block *block = tp->chain->block;
+	struct tc_cls_bpf_offload cls_bpf = {};
+
+	tc_cls_common_offload_init(&cls_bpf.common, tp);
+	cls_bpf.command = TC_CLSBPF_STATS;
+	cls_bpf.exts = &prog->exts;
+	cls_bpf.prog = prog->filter;
+	cls_bpf.name = prog->bpf_name;
+	cls_bpf.exts_integrated = prog->exts_integrated;
+	cls_bpf.gen_flags = prog->gen_flags;
 
-	cls_bpf_offload_cmd(tp, prog, TC_CLSBPF_STATS);
+	tc_setup_cb_call(block, NULL, TC_SETUP_CLSBPF, &cls_bpf, false);
 }
 
 static int cls_bpf_init(struct tcf_proto *tp)
-- 
2.15.1

^ permalink raw reply related

* [PATCH net 0/2] cls_bpf: fix offload state tracking with block callbacks
From: Jakub Kicinski @ 2017-12-19 21:32 UTC (permalink / raw)
  To: netdev; +Cc: daniel, jiri, oss-drivers, Jakub Kicinski

Hi!

After introduction of block callbacks classifiers can no longer track
offload state.  cls_bpf used to do that in an attempt to move common
code from drivers to the core.  Remove that functionality and fix
drivers.

The user-visible bug this is fixing is that trying to offload a second
filter would trigger a spurious DESTROY and in turn disable the already
installed one.


Jakub Kicinski (2):
  cls_bpf: fix offload assumptions after callback conversion
  nfp: bpf: keep track of the offloaded program

 drivers/net/ethernet/netronome/nfp/bpf/main.c | 55 ++++++++++++----
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  8 +++
 include/net/pkt_cls.h                         |  5 +-
 net/sched/cls_bpf.c                           | 93 +++++++++++----------------
 4 files changed, 92 insertions(+), 69 deletions(-)

-- 
2.15.1

^ permalink raw reply

* Re: [RFC] hv_netvsc: automatically name slave VF network device
From: Stephen Hemminger @ 2017-12-19 21:29 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, Stephen Hemminger, Jiri Pirko
In-Reply-To: <20171219131816.70645a7b@cakuba.netronome.com>

On Tue, 19 Dec 2017 13:18:16 -0800
Jakub Kicinski <kubakici@wp.pl> wrote:

> On Tue, 19 Dec 2017 12:44:25 -0800, Stephen Hemminger wrote:
> > On Tue, 19 Dec 2017 12:32:34 -0800
> > Jakub Kicinski <kubakici@wp.pl> wrote:
> >   
> > > On Tue, 19 Dec 2017 11:35:37 -0800, Stephen Hemminger wrote:    
> > > > Rename the VF device to ethX_vf based on the ethX as the
> > > > synthetic device.  This eliminates the need for delay on setup,
> > > > and the PCI (udev based) naming is not reproducible on Hyper-V
> > > > anyway. The name of the VF does not matter since all control
> > > > operations take place the primary device. It does make the
> > > > user experience better to associate the names.
> > > > 
> > > > Based on feedback from all.systems.go talk.
> > > > The downside is that it requires exporting a symbol from netdev
> > > > core which makes it harder to backport.
> > > > 
> > > > Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>      
> > > 
> > > Why do you have to name the devices in the kernel space in the first
> > > place? :/  Why don't upstream the correct change to biosdevname like
> > > hardware vendors do?    
> > 
> > biosdevname is dead, gone and wouldn't work on Azure (it dumpster dives in /dev/mem).  
> 
> Hm, I haven't worked on biosdevname myself, but AFAIU it also falls 
> back to information from the PCI VPD, which could be populated by 
> the hypervisor.

VPD never had any useful standard are info.
The rules used by udev come off sysfs, see:
  https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/

> 
> > I assume you mean the modern application is udev, and it works but the name is meaningless
> > because it based of synthetic PCI information. The PCI host adapter is simulated
> > for pass through devices. Names like enp12s0.
> > 
> > Since every passthrough VF device on Hyper-V/Azure has a matching synthetic
> > network device with same mac address. It is best to have the relationship
> > shown in the name.  
> 
> How about we make the VF drivers expose "vf" as phys_port_name?
> Then systemd/udev should glue that onto the name regardless of
> how the VF is used.

One of the goals was not to modify in any way other drivers (like VF).

> 
> > > Your VF setup is really _not_ special, I don't understand why we are 
> > > OK with ignoring the standard practices.  Real enterprise distroes
> > > are very careful never to break the naming of interfaces and they keep
> > > the naming policy in user space.  Playing tricks in the kernel has every
> > > chance of breaking existing user setups.    
> > 
> > Actually, Systemd folks said "naming policy is in userspace only because
> > kernel can't get it right". Also there is no uniformity in userspace
> > there are at least 5 systems trying to do network setup. And most of
> > them depend on eth0 (yes still). Fixing userspace is impossible.  

^ permalink raw reply

* [PATCH v2] net: amd-xgbe: Get rid of custom hex_dump_to_buffer()
From: Andy Shevchenko @ 2017-12-19 21:22 UTC (permalink / raw)
  To: Tom Lendacky, David S . Miller, netdev; +Cc: Andy Shevchenko

Get rid of yet another custom hex_dump_to_buffer().

The output is slightly changed, i.e. each byte followed by white space.

Note, we don't use print_hex_dump() here since the original code uses
nedev_dbg().

Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 24 +++++++-----------------
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index a74a8fbad53a..7a3ebfd236f5 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -2930,9 +2930,8 @@ void xgbe_dump_rx_desc(struct xgbe_prv_data *pdata, struct xgbe_ring *ring,
 void xgbe_print_pkt(struct net_device *netdev, struct sk_buff *skb, bool tx_rx)
 {
 	struct ethhdr *eth = (struct ethhdr *)skb->data;
-	unsigned char *buf = skb->data;
 	unsigned char buffer[128];
-	unsigned int i, j;
+	unsigned int i;
 
 	netdev_dbg(netdev, "\n************** SKB dump ****************\n");
 
@@ -2943,22 +2942,13 @@ void xgbe_print_pkt(struct net_device *netdev, struct sk_buff *skb, bool tx_rx)
 	netdev_dbg(netdev, "Src MAC addr: %pM\n", eth->h_source);
 	netdev_dbg(netdev, "Protocol: %#06hx\n", ntohs(eth->h_proto));
 
-	for (i = 0, j = 0; i < skb->len;) {
-		j += snprintf(buffer + j, sizeof(buffer) - j, "%02hhx",
-			      buf[i++]);
-
-		if ((i % 32) == 0) {
-			netdev_dbg(netdev, "  %#06x: %s\n", i - 32, buffer);
-			j = 0;
-		} else if ((i % 16) == 0) {
-			buffer[j++] = ' ';
-			buffer[j++] = ' ';
-		} else if ((i % 4) == 0) {
-			buffer[j++] = ' ';
-		}
+	for (i = 0; i < skb->len; i += 32) {
+		unsigned int len = min(skb->len - i, 32U);
+
+		hex_dump_to_buffer(&skb->data[i], len, 32, 1,
+				   buffer, sizeof(buffer), false);
+		netdev_dbg(netdev, "  %#06x: %s\n", i, buffer);
 	}
-	if (i % 32)
-		netdev_dbg(netdev, "  %#06x: %s\n", i - (i % 32), buffer);
 
 	netdev_dbg(netdev, "\n************** SKB dump ****************\n");
 }
-- 
2.15.1

^ permalink raw reply related

* Re: [RFC] hv_netvsc: automatically name slave VF network device
From: Jakub Kicinski @ 2017-12-19 21:18 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Stephen Hemminger, Jiri Pirko
In-Reply-To: <20171219124425.56033614@xeon-e3>

On Tue, 19 Dec 2017 12:44:25 -0800, Stephen Hemminger wrote:
> On Tue, 19 Dec 2017 12:32:34 -0800
> Jakub Kicinski <kubakici@wp.pl> wrote:
> 
> > On Tue, 19 Dec 2017 11:35:37 -0800, Stephen Hemminger wrote:  
> > > Rename the VF device to ethX_vf based on the ethX as the
> > > synthetic device.  This eliminates the need for delay on setup,
> > > and the PCI (udev based) naming is not reproducible on Hyper-V
> > > anyway. The name of the VF does not matter since all control
> > > operations take place the primary device. It does make the
> > > user experience better to associate the names.
> > > 
> > > Based on feedback from all.systems.go talk.
> > > The downside is that it requires exporting a symbol from netdev
> > > core which makes it harder to backport.
> > > 
> > > Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>    
> > 
> > Why do you have to name the devices in the kernel space in the first
> > place? :/  Why don't upstream the correct change to biosdevname like
> > hardware vendors do?  
> 
> biosdevname is dead, gone and wouldn't work on Azure (it dumpster dives in /dev/mem).

Hm, I haven't worked on biosdevname myself, but AFAIU it also falls 
back to information from the PCI VPD, which could be populated by 
the hypervisor.

> I assume you mean the modern application is udev, and it works but the name is meaningless
> because it based of synthetic PCI information. The PCI host adapter is simulated
> for pass through devices. Names like enp12s0.
> 
> Since every passthrough VF device on Hyper-V/Azure has a matching synthetic
> network device with same mac address. It is best to have the relationship
> shown in the name.

How about we make the VF drivers expose "vf" as phys_port_name?
Then systemd/udev should glue that onto the name regardless of
how the VF is used.

> > Your VF setup is really _not_ special, I don't understand why we are 
> > OK with ignoring the standard practices.  Real enterprise distroes
> > are very careful never to break the naming of interfaces and they keep
> > the naming policy in user space.  Playing tricks in the kernel has every
> > chance of breaking existing user setups.  
> 
> Actually, Systemd folks said "naming policy is in userspace only because
> kernel can't get it right". Also there is no uniformity in userspace
> there are at least 5 systems trying to do network setup. And most of
> them depend on eth0 (yes still). Fixing userspace is impossible.

^ permalink raw reply

* Re: [PATCH v1] net: amd-xgbe: Get rid of custom hex_dump_to_buffer()
From: Tom Lendacky @ 2017-12-19 21:16 UTC (permalink / raw)
  To: Andy Shevchenko, David S. Miller, netdev
In-Reply-To: <20171219180231.13347-1-andriy.shevchenko@linux.intel.com>

On 12/19/2017 12:02 PM, Andy Shevchenko wrote:
> Get rid of yet another custom hex_dump_to_buffer().
> 
> The output is slightly changed, i.e. each byte followed by white space.
> 
> Note, we don't use print_hex_dump() here since the original code uses
> nedev_dbg().
> 
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> ---
>  drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 23 ++++++-----------------
>  1 file changed, 6 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> index a74a8fbad53a..fc58dc43a5bd 100644
> --- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> +++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
> @@ -2930,9 +2930,8 @@ void xgbe_dump_rx_desc(struct xgbe_prv_data *pdata, struct xgbe_ring *ring,
>  void xgbe_print_pkt(struct net_device *netdev, struct sk_buff *skb, bool tx_rx)
>  {
>  	struct ethhdr *eth = (struct ethhdr *)skb->data;
> -	unsigned char *buf = skb->data;
>  	unsigned char buffer[128];
> -	unsigned int i, j;
> +	unsigned int i;
>  
>  	netdev_dbg(netdev, "\n************** SKB dump ****************\n");
>  
> @@ -2943,22 +2942,12 @@ void xgbe_print_pkt(struct net_device *netdev, struct sk_buff *skb, bool tx_rx)
>  	netdev_dbg(netdev, "Src MAC addr: %pM\n", eth->h_source);
>  	netdev_dbg(netdev, "Protocol: %#06hx\n", ntohs(eth->h_proto));
>  
> -	for (i = 0, j = 0; i < skb->len;) {
> -		j += snprintf(buffer + j, sizeof(buffer) - j, "%02hhx",
> -			      buf[i++]);
> -
> -		if ((i % 32) == 0) {
> -			netdev_dbg(netdev, "  %#06x: %s\n", i - 32, buffer);
> -			j = 0;
> -		} else if ((i % 16) == 0) {
> -			buffer[j++] = ' ';
> -			buffer[j++] = ' ';
> -		} else if ((i % 4) == 0) {
> -			buffer[j++] = ' ';
> -		}
> +	for (i = 0; i < skb->len; i += 32) {
> +		unsigned int len = min(skb->len - i, 32U);
> +
> +		hex_dump_to_buffer(&skb->data[i], len, 32, 1, buffer, 128, false);

I'd prefer to see sizeof(buffer) vs the hard-coded 128 here.  Also, this
line exceeds 80 characters, so it should be split.

With those changes:

Acked-by: Tom Lendacky <thomas.lendacky@amd.com>

> +		netdev_dbg(netdev, "  %#06x: %s\n", i, buffer);
>  	}
> -	if (i % 32)
> -		netdev_dbg(netdev, "  %#06x: %s\n", i - (i % 32), buffer);
>  
>  	netdev_dbg(netdev, "\n************** SKB dump ****************\n");
>  }
> 

^ permalink raw reply

* [PATCH net-next] net: Clarify dev_weight documentation for LRO and GRO_HW.
From: Michael Chan @ 2017-12-19 21:12 UTC (permalink / raw)
  To: davem; +Cc: netdev

Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 Documentation/sysctl/net.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index b67044a..35c62f5 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -95,7 +95,9 @@ dev_weight
 --------------
 
 The maximum number of packets that kernel can handle on a NAPI interrupt,
-it's a Per-CPU variable.
+it's a Per-CPU variable. For drivers that support LRO or GRO_HW, a hardware
+aggregated packet is counted as one packet in this context.
+
 Default: 64
 
 dev_weight_rx_bias
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net V2] net: always reevalulate autoflowlabel setting for reset packet
From: Shaohua Li @ 2017-12-19 20:58 UTC (permalink / raw)
  To: netdev, davem; +Cc: Kernel Team, Shaohua Li, Martin KaFai Lau

From: Shaohua Li <shli@fb.com>

ipv6_pinfo.autoflowlabel is set in sock creation. Later if we change
sysctl.ip6.auto_flowlabels, the ipv6_pinfo.autoflowlabel isn't changed,
so the sock will keep the old behavior in terms of auto flowlabel. Reset
packet is suffering from this problem, because reset packset is sent
from a special control socket, which is created at boot time. Since
sysctl.ipv6.auto_flowlabels is 2 by default, the control socket will
always have its ipv6_pinfo.autoflowlabel set, even after user set
sysctl.ipv6.auto_flowlabels to 1, so reset packset will always have
flowlabel.

To fix this, we always reevaluate autoflowlabel setting for reset
packet. Normal sock has the same issue too, but since the
sysctl.ipv6.auto_flowlabels is usually set at host startup, this isn't a
big issue for normal sock.

Cc: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 net/ipv6/tcp_ipv6.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 7178476..5fba548 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -787,9 +787,11 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
 	struct net *net = sk ? sock_net(sk) : dev_net(skb_dst(skb)->dev);
 	struct sock *ctl_sk = net->ipv6.tcp_sk;
 	unsigned int tot_len = sizeof(struct tcphdr);
+	struct ipv6_pinfo *np = inet6_sk(ctl_sk);
 	struct dst_entry *dst;
 	__be32 *topt;
 
+	np->autoflowlabel = ip6_default_np_autolabel(net);
 	if (tsecr)
 		tot_len += TCPOLEN_TSTAMP_ALIGNED;
 #ifdef CONFIG_TCP_MD5SIG
-- 
2.9.5

^ permalink raw reply related

* Re: [PATCH net] openvswitch: Fix pop_vlan action for double tagged frames
From: Eric Garver @ 2017-12-19 20:59 UTC (permalink / raw)
  To: Jiri Benc, netdev, ovs-dev
In-Reply-To: <20171219204247.GD25853@dev-rhel7>

On Tue, Dec 19, 2017 at 03:42:47PM -0500, Eric Garver wrote:
> On Tue, Dec 19, 2017 at 08:39:29PM +0100, Jiri Benc wrote:
> > On Tue, 19 Dec 2017 13:57:53 -0500, Eric Garver wrote:
> > > --- a/net/openvswitch/flow.c
> > > +++ b/net/openvswitch/flow.c
> > > @@ -559,8 +559,9 @@ static int parse_nsh(struct sk_buff *skb, struct sw_flow_key *key)
> > >   *      of a correct length, otherwise the same as skb->network_header.
> > >   *      For other key->eth.type values it is left untouched.
> > >   *
> > > - *    - skb->protocol: the type of the data starting at skb->network_header.
> > > - *      Equals to key->eth.type.
> > > + *    - skb->protocol: For Ethernet, the ethertype or VLAN TPID.
> > > + *      For non-Ethernet, the type of the data starting at skb->network_header
> > > + *      (also equal to key->eth.type).
> > >   */
> > >  static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
> > >  {
> > > @@ -579,6 +580,7 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
> > >  			return -EINVAL;
> > >  
> > >  		skb_reset_network_header(skb);
> > > +		key->eth.type = skb->protocol;
> > >  	} else {
> > >  		eth = eth_hdr(skb);
> > >  		ether_addr_copy(key->eth.src, eth->h_source);
> > > @@ -592,15 +594,14 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key)
> > >  		if (unlikely(parse_vlan(skb, key)))
> > >  			return -ENOMEM;
> > >  
> > > -		skb->protocol = parse_ethertype(skb);
> > > -		if (unlikely(skb->protocol == htons(0)))
> > > +		key->eth.type = parse_ethertype(skb);
> > > +		if (unlikely(key->eth.type == htons(0)))
> > >  			return -ENOMEM;
> > >  
> > >  		skb_reset_network_header(skb);
> > >  		__skb_push(skb, skb->data - skb_mac_header(skb));
> > >  	}
> > >  	skb_reset_mac_len(skb);
> > > -	key->eth.type = skb->protocol;
> > >  
> > >  	/* Network layer. */
> > >  	if (key->eth.type == htons(ETH_P_IP)) {
> > 
> > Unfortunately, this does not work. key_extract must set skb->protocol
> > even for Ethernet frames that come from a mixed L2/L3 tunnel. Such
> > packets will have key->mac_proto set to MAC_PROTO_ETHERNET and
> > skb->protocol set to ETH_P_TEB (see key_extract_mac_proto). In
> > key_extract, skb->protocol has to be correctly set to the dissected
> > value.
> 
> AFAICS, it's always overridden to ETH_P_TEB on output by
> ovs_vport_send() and that's the sole reason it works today.
> 
> For dissecting, the L2 case is currently setting skb->protocol to the
> real ethertype (e.g. 0x800) not ETH_P_TEB. For RX from tunnel case it'll
> indeed be ETH_P_TEB.

Of course after I hit send I realize what you were saying. I follow now
why skb->protocol needs to be the real ethertype.

> > 
> > Which means that we have to check for the existence of inner vlan tag
> > (by checking key->eth.cvlan.tci or, perhaps better, by returning it
> > from parse_vlan) and set skb->protocol accordingly.

I'll see what we can do here. Thanks Jiri.

^ permalink raw reply

* [PATCH bpf-next] libbpf: Fix build errors.
From: David Miller @ 2017-12-19 20:53 UTC (permalink / raw)
  To: daniel; +Cc: netdev


These elf object pieces are of type Elf64_Xword and therefore could be
"long long" on some builds.

Cast to "long long" and use printf format %lld to deal with this since
we are building with -Werror=format.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 5b83875..e9c4b7c 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -910,8 +910,9 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
 				   GELF_R_SYM(rel.r_info));
 			return -LIBBPF_ERRNO__FORMAT;
 		}
-		pr_debug("relo for %ld value %ld name %d\n",
-			 rel.r_info >> 32, sym.st_value, sym.st_name);
+		pr_debug("relo for %lld value %lld name %d\n",
+			 (long long) (rel.r_info >> 32),
+			 (long long) sym.st_value, sym.st_name);
 
 		if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx) {
 			pr_warning("Program '%s' contains non-map related relo data pointing to section %u\n",

^ permalink raw reply related

* Re: thunderx sgmii interface hang
From: Andrew Lunn @ 2017-12-19 20:52 UTC (permalink / raw)
  To: Tim Harvey; +Cc: Sunil Goutham, netdev
In-Reply-To: <CAJ+vNU0yYkoAGZ6sU9VGfA6A04KU4s3rzZSb=1uxsqwKHioePQ@mail.gmail.com>

On Mon, Dec 18, 2017 at 01:53:47PM -0800, Tim Harvey wrote:
> On Wed, Dec 13, 2017 at 11:43 AM, Andrew Lunn <andrew@lunn.ch> wrote:
> >> The nic appears to work fine (pings, TCP etc) up until a performance
> >> test is attempted.
> >> When an iperf bandwidth test is attempted the nic ends up in a state
> >> where truncated-ip packets are being sent out (per a tcpdump from
> >> another board):
> >
> > Hi Tim
> >
> > Are pause frames supported? Have you tried turning them off?
> >
> > Can you reproduce the issue with UDP? Or is it TCP only?
> >
> 
> Andrew,
> 
> Pause frames don't appear to be supported yet and the issue occurs
> when using UDP as well as TCP. I'm not clear what the best way to
> troubleshoot this is.

Hi Tim

Is pause being negotiated? In theory, it should not be. The PHY should
not offer it, if the MAC has not enabled it. But some PHY drivers are
probably broken and offer pause when they should not.

Also, can you trigger the issue using UDP at say 75% the maximum
bandwidth. That should be low enough that the peer never even tries to
use pause.

All this pause stuff is just a stab in the dark. Something else to try
is to turn off various forms off acceleration, ethtook -K, and see if
that makes a difference.

     Andrew

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox