Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next] rxrpc: Add /proc/net/rxrpc/peers to display peer list
From: David Miller @ 2018-10-16  5:53 UTC (permalink / raw)
  To: dhowells; +Cc: netdev, linux-afs, linux-kernel
In-Reply-To: <153959946389.27408.16748794272430961849.stgit@warthog.procyon.org.uk>

From: David Howells <dhowells@redhat.com>
Date: Mon, 15 Oct 2018 11:31:03 +0100

> Add /proc/net/rxrpc/peers to display the list of peers currently active.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>

Applied.

^ permalink raw reply

* [PATCH v2] net: fix warning in af_unix
From: Kyeongdon Kim @ 2018-10-16  5:57 UTC (permalink / raw)
  To: davem, ktkhai, viro, garsilva, jbaron, dvlasenk, xiyou.wangcong
  Cc: netdev, linux-kernel, kyeongdon.kim

This fixes the "'hash' may be used uninitialized in this function"

net/unix/af_unix.c:1041:20: warning: 'hash' may be used uninitialized in this function [-Wmaybe-uninitialized]
  addr->hash = hash ^ sk->sk_type;

Signed-off-by: Kyeongdon Kim <kyeongdon.kim@lge.com>
---
 net/unix/af_unix.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index d1edfa3..98d34fb 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -225,6 +225,8 @@ static inline void unix_release_addr(struct unix_address *addr)
 
 static int unix_mkname(struct sockaddr_un *sunaddr, int len, unsigned int *hashp)
 {
+	*hashp = 0;
+
 	if (len <= sizeof(short) || len > sizeof(*sunaddr))
 		return -EINVAL;
 	if (!sunaddr || sunaddr->sun_family != AF_UNIX)
-- 
2.6.2

^ permalink raw reply related

* Re: [PATCH net-next,v3] hv_netvsc: fix vf serial matching with pci slot info
From: David Miller @ 2018-10-16  5:58 UTC (permalink / raw)
  To: haiyangz, haiyangz
  Cc: netdev, kys, sthemmin, olaf, vkuznets, devel, linux-kernel
In-Reply-To: <20181015190615.30628-1-haiyangz@linuxonhyperv.com>

From: Haiyang Zhang <haiyangz@linuxonhyperv.com>
Date: Mon, 15 Oct 2018 19:06:15 +0000

> From: Haiyang Zhang <haiyangz@microsoft.com>
> 
> The VF device's serial number is saved as a string in PCI slot's
> kobj name, not the slot->number. This patch corrects the netvsc
> driver, so the VF device can be successfully paired with synthetic
> NIC.
> 
> Fixes: 00d7ddba1143 ("hv_netvsc: pair VF based on serial number")
> Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] sctp: use the pmtu from the icmp packet to update transport pathmtu
From: Marcelo Ricardo Leitner @ 2018-10-15 22:27 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, linux-sctp, davem, Neil Horman
In-Reply-To: <de1ad572beb97767e535b7cf60da7b1de6cbfd4f.1539604709.git.lucien.xin@gmail.com>

On Mon, Oct 15, 2018 at 07:58:29PM +0800, Xin Long wrote:
> Other than asoc pmtu sync from all transports, sctp_assoc_sync_pmtu
> is also processing transport pmtu_pending by icmp packets. But it's
> meaningless to use sctp_dst_mtu(t->dst) as new pmtu for a transport.
> 
> The right pmtu value should come from the icmp packet, and it would
> be saved into transport->mtu_info in this patch and used later when
> the pmtu sync happens in sctp_sendmsg_to_asoc or sctp_packet_config.
> 
> Besides, without this patch, as pmtu can only be updated correctly
> when receiving a icmp packet and no place is holding sock lock, it
> will take long time if the sock is busy with sending packets.
> 
> Note that it doesn't process transport->mtu_info in .release_cb(),
> as there is no enough information for pmtu update, like for which
> asoc or transport. It is not worth traversing all asocs to check
> pmtu_pending. So unlike tcp, sctp does this in tx path, for which
> mtu_info needs to be atomic_t.
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

> ---
>  include/net/sctp/structs.h | 2 ++
>  net/sctp/associola.c       | 3 ++-
>  net/sctp/input.c           | 1 +
>  net/sctp/output.c          | 6 ++++++
>  4 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index 28a7c8e..a11f937 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -876,6 +876,8 @@ struct sctp_transport {
>  	unsigned long sackdelay;
>  	__u32 sackfreq;
>  
> +	atomic_t mtu_info;
> +
>  	/* When was the last time that we heard from this transport? We use
>  	 * this to pick new active and retran paths.
>  	 */
> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> index 297d9cf..a827a1f 100644
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -1450,7 +1450,8 @@ void sctp_assoc_sync_pmtu(struct sctp_association *asoc)
>  	/* Get the lowest pmtu of all the transports. */
>  	list_for_each_entry(t, &asoc->peer.transport_addr_list, transports) {
>  		if (t->pmtu_pending && t->dst) {
> -			sctp_transport_update_pmtu(t, sctp_dst_mtu(t->dst));
> +			sctp_transport_update_pmtu(t,
> +						   atomic_read(&t->mtu_info));
>  			t->pmtu_pending = 0;
>  		}
>  		if (!pmtu || (t->pathmtu < pmtu))
> diff --git a/net/sctp/input.c b/net/sctp/input.c
> index 9bbc5f9..5c36a99 100644
> --- a/net/sctp/input.c
> +++ b/net/sctp/input.c
> @@ -395,6 +395,7 @@ void sctp_icmp_frag_needed(struct sock *sk, struct sctp_association *asoc,
>  		return;
>  
>  	if (sock_owned_by_user(sk)) {
> +		atomic_set(&t->mtu_info, pmtu);
>  		asoc->pmtu_pending = 1;
>  		t->pmtu_pending = 1;
>  		return;
> diff --git a/net/sctp/output.c b/net/sctp/output.c
> index 7f849b0..67939ad 100644
> --- a/net/sctp/output.c
> +++ b/net/sctp/output.c
> @@ -120,6 +120,12 @@ void sctp_packet_config(struct sctp_packet *packet, __u32 vtag,
>  			sctp_assoc_sync_pmtu(asoc);
>  	}
>  
> +	if (asoc->pmtu_pending) {
> +		if (asoc->param_flags & SPP_PMTUD_ENABLE)
> +			sctp_assoc_sync_pmtu(asoc);
> +		asoc->pmtu_pending = 0;
> +	}
> +
>  	/* If there a is a prepend chunk stick it on the list before
>  	 * any other chunks get appended.
>  	 */
> -- 
> 2.1.0
> 

^ permalink raw reply

* Re: [PATCH net] rxrpc: Fix a missing rxrpc_put_peer() in the error_report handler
From: David Miller @ 2018-10-16  6:14 UTC (permalink / raw)
  To: dhowells; +Cc: netdev, linux-afs, linux-kernel
In-Reply-To: <153963944115.2618.17691876338723925930.stgit@warthog.procyon.org.uk>

From: David Howells <dhowells@redhat.com>
Date: Mon, 15 Oct 2018 22:37:21 +0100

> Fix a missing call to rxrpc_put_peer() on the main path through the
> rxrpc_error_report() function.  This manifests itself as a ref leak
> whenever an ICMP packet or other error comes in.
> 
> In commit f334430316e7, the hand-off of the ref to a work item was removed
> and was not replaced with a put.
> 
> Fixes: f334430316e7 ("rxrpc: Fix error distribution")
> Signed-off-by: David Howells <dhowells@redhat.com>

Applied.

^ permalink raw reply

* Re: [PATCH bpf-next 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO
From: Daniel Borkmann @ 2018-10-15 22:30 UTC (permalink / raw)
  To: Yonghong Song, ast, kafai, netdev; +Cc: kernel-team
In-Reply-To: <20181012185424.2378502-3-yhs@fb.com>

On 10/12/2018 08:54 PM, Yonghong Song wrote:
> This patch adds BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO
> support to the type section. BTF_KIND_FUNC_PROTO is used
> to specify the type of a function pointer. With this,
> BTF has a complete set of C types (except float).
> 
> BTF_KIND_FUNC is used to specify the signature of a
> defined subprogram. BTF_KIND_FUNC_PROTO can be referenced
> by another type, e.g., a pointer type, and BTF_KIND_FUNC
> type cannot be referenced by another type.
> 
> For both BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO types,
> the func return type is in t->type (where t is a
> "struct btf_type" object). The func args are an array of
> u32s immediately following object "t".
> 
> As a concrete example, for the C program below,
>   $ cat test.c
>   int foo(int (*bar)(int)) { return bar(5); }
> with latest llvm trunk built with Debug mode, we have
>   $ clang -target bpf -g -O2 -mllvm -debug-only=btf -c test.c
>   Type Table:
>   [1] FUNC name_off=1 info=0x0c000001 size/type=2
>           param_type=3
>   [2] INT name_off=11 info=0x01000000 size/type=4
>           desc=0x01000020
>   [3] PTR name_off=0 info=0x02000000 size/type=4
>   [4] FUNC_PROTO name_off=0 info=0x0d000001 size/type=2
>           param_type=2
> 
>   String Table:
>   0 :
>   1 : foo
>   5 : .text
>   11 : int
>   15 : test.c
>   22 : int foo(int (*bar)(int)) { return bar(5); }
> 
>   FuncInfo Table:
>   sec_name_off=5
>           insn_offset=<Omitted> type_id=1
> 
>   ...
> 
> (Eventually we shall have bpftool to dump btf information
>  like the above.)
> 
> Function "foo" has a FUNC type (type_id = 1).
> The parameter of "foo" has type_id 3 which is PTR->FUNC_PROTO,
> where FUNC_PROTO refers to function pointer "bar".

Should also "bar" be part of the string table (at least at some point in future)?
Iow, if verifier hints to an issue in the program when it would for example walk
pointers and rewrite ctx access, then it could dump the var name along with it.
It might be useful as well in combination with 22 from str table, when annotating
the source. We might need support for variadic functions, though. How is LLVM
handling the latter with the recent BTF support?

> In FuncInfo Table, for section .text, the function,
> with to-be-determined offset (marked as <Omitted>),
> has type_id=1 which refers to a FUNC type.
> This way, the function signature is
> available to both kernel and user space.
> Here, the insn offset is not available during the dump time
> as relocation is resolved pretty late in the compilation process.
> 
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> Signed-off-by: Yonghong Song <yhs@fb.com>

^ permalink raw reply

* Re: Fw: [Bug 201423] New: eth0: hw csum failure
From: Fabio Rossi @ 2018-10-15 22:28 UTC (permalink / raw)
  To: Eric Dumazet, Stephen Hemminger; +Cc: netdev
In-Reply-To: <CANn89iLA+rdFNXXdzogLHF1FqYg3CjpwXJbscWTJ8Bk8bN2Scw@mail.gmail.com>



On 15 October 2018 17:41:47 CEST, Eric Dumazet <edumazet@google.com> wrote:
>On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger
><stephen@networkplumber.org> wrote:
>>
>>
>>
>> Begin forwarded message:
>>
>> Date: Sun, 14 Oct 2018 10:42:48 +0000
>> From: bugzilla-daemon@bugzilla.kernel.org
>> To: stephen@networkplumber.org
>> Subject: [Bug 201423] New: eth0: hw csum failure
>>
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=201423
>>
>>             Bug ID: 201423
>>            Summary: eth0: hw csum failure
>>            Product: Networking
>>            Version: 2.5
>>     Kernel Version: 4.19.0-rc7
>>           Hardware: Intel
>>                 OS: Linux
>>               Tree: Mainline
>>             Status: NEW
>>           Severity: normal
>>           Priority: P1
>>          Component: Other
>>           Assignee: stephen@networkplumber.org
>>           Reporter: rossi.f@inwind.it
>>         Regression: No
>>
>> I have a P6T DELUXE V2 motherboard and using the sky2 driver for the
>ethernet
>> ports. I get the following error message:
>>
>> [  433.727397] eth0: hw csum failure
>> [  433.727406] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.19.0-rc7
>#19
>> [  433.727406] Hardware name: System manufacturer System Product
>Name/P6T
>> DELUXE V2, BIOS 1202    12/22/2010
>> [  433.727407] Call Trace:
>> [  433.727409]  <IRQ>
>> [  433.727415]  dump_stack+0x46/0x5b
>> [  433.727419]  __skb_checksum_complete+0xb0/0xc0
>> [  433.727423]  tcp_v4_rcv+0x528/0xb60
>> [  433.727426]  ? ipt_do_table+0x2d0/0x400
>> [  433.727429]  ip_local_deliver_finish+0x5a/0x110
>> [  433.727430]  ip_local_deliver+0xe1/0xf0
>> [  433.727431]  ? ip_sublist_rcv_finish+0x60/0x60
>> [  433.727432]  ip_rcv+0xca/0xe0
>> [  433.727434]  ? ip_rcv_finish_core.isra.0+0x300/0x300
>> [  433.727436]  __netif_receive_skb_one_core+0x4b/0x70
>> [  433.727438]  netif_receive_skb_internal+0x4e/0x130
>> [  433.727439]  napi_gro_receive+0x6a/0x80
>> [  433.727442]  sky2_poll+0x707/0xd20
>> [  433.727446]  ? rcu_check_callbacks+0x1b4/0x900
>> [  433.727447]  net_rx_action+0x237/0x380
>> [  433.727449]  __do_softirq+0xdc/0x1e0
>> [  433.727452]  irq_exit+0xa9/0xb0
>> [  433.727453]  do_IRQ+0x45/0xc0
>> [  433.727455]  common_interrupt+0xf/0xf
>> [  433.727456]  </IRQ>
>> [  433.727459] RIP: 0010:cpuidle_enter_state+0x124/0x200
>> [  433.727461] Code: 53 60 89 c3 e8 dd 90 ad ff 65 8b 3d 96 58 a7 7e
>e8 d1 8f
>> ad ff 31 ff 49 89 c4 e8 27 99 ad ff fb 48 ba cf f7 53 e3 a5 9b c4 20
><4c> 89 e1
>> 4c 29 e9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f 48
>> [  433.727462] RSP: 0000:ffffc900000a3e98 EFLAGS: 00000282 ORIG_RAX:
>> ffffffffffffffde
>> [  433.727463] RAX: ffff880237b1f280 RBX: 0000000000000004 RCX:
>> 000000000000001f
>> [  433.727464] RDX: 20c49ba5e353f7cf RSI: 000000002fe419c1 RDI:
>> 0000000000000000
>> [  433.727465] RBP: ffff880237b263a0 R08: 0000000000000714 R09:
>> 000000650512105d
>> [  433.727465] R10: 00000000ffffffff R11: 0000000000000342 R12:
>> 00000064fc2a8b1c
>> [  433.727466] R13: 00000064fc25b35f R14: 0000000000000004 R15:
>> ffffffff8204af20
>> [  433.727468]  ? cpuidle_enter_state+0x119/0x200
>> [  433.727471]  do_idle+0x1bf/0x200
>> [  433.727473]  cpu_startup_entry+0x6a/0x70
>> [  433.727475]  start_secondary+0x17f/0x1c0
>> [  433.727476]  secondary_startup_64+0xa4/0xb0
>> [  441.662954] eth0: hw csum failure
>> [  441.662959] CPU: 4 PID: 4347 Comm: radeon_cs:0 Not tainted
>4.19.0-rc7 #19
>> [  441.662960] Hardware name: System manufacturer System Product
>Name/P6T
>> DELUXE V2, BIOS 1202    12/22/2010
>> [  441.662960] Call Trace:
>> [  441.662963]  <IRQ>
>> [  441.662968]  dump_stack+0x46/0x5b
>> [  441.662972]  __skb_checksum_complete+0xb0/0xc0
>> [  441.662975]  tcp_v4_rcv+0x528/0xb60
>> [  441.662979]  ? ipt_do_table+0x2d0/0x400
>> [  441.662981]  ip_local_deliver_finish+0x5a/0x110
>> [  441.662983]  ip_local_deliver+0xe1/0xf0
>> [  441.662985]  ? ip_sublist_rcv_finish+0x60/0x60
>> [  441.662986]  ip_rcv+0xca/0xe0
>> [  441.662988]  ? ip_rcv_finish_core.isra.0+0x300/0x300
>> [  441.662990]  __netif_receive_skb_one_core+0x4b/0x70
>> [  441.662993]  netif_receive_skb_internal+0x4e/0x130
>> [  441.662994]  napi_gro_receive+0x6a/0x80
>> [  441.662998]  sky2_poll+0x707/0xd20
>> [  441.663000]  net_rx_action+0x237/0x380
>> [  441.663002]  __do_softirq+0xdc/0x1e0
>> [  441.663005]  irq_exit+0xa9/0xb0
>> [  441.663007]  do_IRQ+0x45/0xc0
>> [  441.663009]  common_interrupt+0xf/0xf
>> [  441.663010]  </IRQ>
>> [  441.663012] RIP: 0010:merge+0x22/0xb0
>> [  441.663014] Code: c3 31 c0 c3 90 90 90 90 41 56 41 55 41 54 55 48
>89 d5 53
>> 48 89 cb 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0
><48> 85 c9
>> 74 70 48 85 d2 74 6b 49 89 fd 49 89 f6 49 89 e4 eb 14 48
>> [  441.663015] RSP: 0018:ffffc9000090b988 EFLAGS: 00000246 ORIG_RAX:
>> ffffffffffffffde
>> [  441.663017] RAX: 0000000000000000 RBX: ffff88021ab2d408 RCX:
>> ffff88021ab2d408
>> [  441.663018] RDX: ffff88021ab2d388 RSI: ffffffffa021c440 RDI:
>> 0000000000000000
>> [  441.663019] RBP: ffff88021ab2d388 R08: 0000000000005ecf R09:
>> 0000000000008500
>> [  441.663020] R10: ffffea000877ec00 R11: ffff880236803500 R12:
>> ffffffffa021c440
>> [  441.663021] R13: ffff88021ab2d448 R14: 0000000000000004 R15:
>> ffffc9000090b9e0
>> [  441.663048]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120
>[radeon]
>> [  441.663063]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120
>[radeon]
>> [  441.663065]  ? merge+0x57/0xb0
>> [  441.663080]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120
>[radeon]
>> [  441.663082]  list_sort+0x8b/0x230
>> [  441.663094]  radeon_cs_parser_fini+0xdf/0x110 [radeon]
>> [  441.663110]  radeon_cs_ioctl+0x2a4/0x710 [radeon]
>> [  441.663113]  ? __switch_to_asm+0x34/0x70
>> [  441.663114]  ? __switch_to_asm+0x40/0x70
>> [  441.663130]  ? radeon_cs_parser_init+0x20/0x20 [radeon]
>> [  441.663141]  drm_ioctl_kernel+0xa3/0xe0 [drm]
>> [  441.663149]  drm_ioctl+0x2e2/0x380 [drm]
>> [  441.663164]  ? radeon_cs_parser_init+0x20/0x20 [radeon]
>> [  441.663168]  ? page_add_new_anon_rmap+0x42/0x70
>> [  441.663171]  do_vfs_ioctl+0x9a/0x600
>> [  441.663173]  ksys_ioctl+0x35/0x60
>> [  441.663175]  __x64_sys_ioctl+0x11/0x20
>> [  441.663177]  do_syscall_64+0x3d/0xf0
>> [  441.663179]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [  441.663180] RIP: 0033:0x7f9377377f37
>> [  441.663182] Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 c4 18
>c3 e8 ad
>> db 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 0f 05
><48> 3d 01
>> f0 ff ff 73 01 c3 48 8b 0d 21 4f 2c 00 f7 d8 64 89 01 48
>> [  441.663183] RSP: 002b:00007f92c3130d28 EFLAGS: 00000246 ORIG_RAX:
>> 0000000000000010
>> [  441.663185] RAX: ffffffffffffffda RBX: 0000564498327ec0 RCX:
>> 00007f9377377f37
>> [  441.663186] RDX: 0000564498337ec8 RSI: 00000000c0206466 RDI:
>> 0000000000000010
>> [  441.663186] RBP: 0000564498337ec8 R08: 0000000000000000 R09:
>> 0000000000000000
>> [  441.663187] R10: 0000000000000000 R11: 0000000000000246 R12:
>> 00000000c0206466
>> [  441.663188] R13: 0000000000000010 R14: 0000000000000000 R15:
>> 0000564497a38120
>> [  462.833418] eth0: hw csum failure
>> [  462.833428] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.19.0-rc7
>#19
>> [  462.833429] Hardware name: System manufacturer System Product
>Name/P6T
>> DELUXE V2, BIOS 1202    12/22/2010
>> [  462.833429] Call Trace:
>> [  462.833432]  <IRQ>
>> [  462.833438]  dump_stack+0x46/0x5b
>> [  462.833442]  __skb_checksum_complete+0xb0/0xc0
>> [  462.833446]  tcp_v4_rcv+0x528/0xb60
>> [  462.833449]  ? ipt_do_table+0x2d0/0x400
>> [  462.833452]  ip_local_deliver_finish+0x5a/0x110
>> [  462.833454]  ip_local_deliver+0xe1/0xf0
>> [  462.833455]  ? ip_sublist_rcv_finish+0x60/0x60
>> [  462.833457]  ip_rcv+0xca/0xe0
>> [  462.833459]  ? ip_rcv_finish_core.isra.0+0x300/0x300
>> [  462.833461]  __netif_receive_skb_one_core+0x4b/0x70
>> [  462.833464]  netif_receive_skb_internal+0x4e/0x130
>> [  462.833466]  napi_gro_receive+0x6a/0x80
>> [  462.833469]  sky2_poll+0x707/0xd20
>> [  462.833471]  net_rx_action+0x237/0x380
>> [  462.833474]  __do_softirq+0xdc/0x1e0
>> [  462.833477]  irq_exit+0xa9/0xb0
>> [  462.833479]  do_IRQ+0x45/0xc0
>> [  462.833481]  common_interrupt+0xf/0xf
>> [  462.833482]  </IRQ>
>> [  462.833486] RIP: 0010:cpuidle_enter_state+0x124/0x200
>> [  462.833488] Code: 53 60 89 c3 e8 dd 90 ad ff 65 8b 3d 96 58 a7 7e
>e8 d1 8f
>> ad ff 31 ff 49 89 c4 e8 27 99 ad ff fb 48 ba cf f7 53 e3 a5 9b c4 20
><4c> 89 e1
>> 4c 29 e9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f 48
>> [  462.833489] RSP: 0018:ffffc900000a3e98 EFLAGS: 00000282 ORIG_RAX:
>> ffffffffffffffde
>> [  462.833491] RAX: ffff880237b1f280 RBX: 0000000000000004 RCX:
>> 000000000000001f
>> [  462.833492] RDX: 20c49ba5e353f7cf RSI: 000000002fe419c1 RDI:
>> 0000000000000000
>> [  462.833493] RBP: ffff880237b263a0 R08: 0000000000000000 R09:
>> 0000000000000000
>> [  462.833494] R10: 00000000ffffffff R11: 0000000000000273 R12:
>> 0000006bc3052131
>> [  462.833495] R13: 0000006bc2f99f57 R14: 0000000000000004 R15:
>> ffffffff8204af20
>> [  462.833498]  ? cpuidle_enter_state+0x119/0x200
>> [  462.833503]  do_idle+0x1bf/0x200
>> [  462.833506]  cpu_startup_entry+0x6a/0x70
>> [  462.833510]  start_secondary+0x17f/0x1c0
>> [  462.833513]  secondary_startup_64+0xa4/0xb0
>>
>> Something is changed between 4.17.12 and 4.18, after bisecting the
>problem I
>> got the following first bad commit:
>>
>> commit 88078d98d1bb085d72af8437707279e203524fa5
>> Author: Eric Dumazet <edumazet@google.com>
>> Date:   Wed Apr 18 11:43:15 2018 -0700
>>
>>     net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
>>
>>     After working on IP defragmentation lately, I found that some
>large
>>     packets defeat CHECKSUM_COMPLETE optimization because of NIC
>adding
>>     zero paddings on the last (small) fragment.
>>
>>     While removing the padding with pskb_trim_rcsum(), we set
>skb->ip_summed
>>     to CHECKSUM_NONE, forcing a full csum validation, even if all
>prior
>>     fragments had CHECKSUM_COMPLETE set.
>>
>>     We can instead compute the checksum of the part we are trimming,
>>     usually smaller than the part we keep.
>>
>>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>
>
>Thanks for bisecting !
>
>This commit is known to expose some NIC/driver bugs.
>
>Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f
>("net: sungem: fix rx checksum support")  for one driver needing a fix.
>
>I assume SKY2_HW_NEW_LE is not set on your NIC ?

here is the chip found on my motherboard so it seems that flag is not set 

# dmesg | grep sky2

[    0.545661] sky2: driver version 1.30 [    0.545781] sky2 0000:06:00.0: Yukon-2 EC Ultra chip revision 3 [    0.546067] sky2 0000:06:00.0 eth0: addr 00:24:8c:xx:xx:xx [    0.546188] sky2 0000:04:00.0: Yukon-2 EC Ultra chip revision 3 [    0.546484] sky2 0000:04:00.0 eth1: addr 00:24:8c:xx:xx:xx [   38.043074] sky2 0000:06:00.0 eth0: enabling interface [   39.842161] sky2 0000:06:00.0 eth0: Link is up at 100 Mbps, full duplex, flow control rx

^ permalink raw reply

* Re: [PATCH bpf-next 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO
From: Daniel Borkmann @ 2018-10-15 22:36 UTC (permalink / raw)
  To: Yonghong Song, ast, kafai, netdev; +Cc: kernel-team
In-Reply-To: <20181012185424.2378502-3-yhs@fb.com>

On 10/12/2018 08:54 PM, Yonghong Song wrote:
[...]
> +static bool btf_name_valid_identifier(const struct btf *btf, u32 offset)
> +{
> +	/* offset must be valid */
> +	const char *src = &btf->strings[offset];
> +
> +	if (!isalpha(*src) && *src != '_')
> +		return false;
> +
> +	src++;
> +	while (*src) {
> +		if (!isalnum(*src) && *src != '_')
> +			return false;
> +		src++;
> +	}
> +
> +	return true;
> +}

Should there be an upper name length limit like KSYM_NAME_LEN? (Is it implied
by the kvmalloc() limit?)

>  static const char *btf_name_by_offset(const struct btf *btf, u32 offset)
>  {
>  	if (!offset)
> @@ -747,7 +782,9 @@ static bool env_type_is_resolve_sink(const struct btf_verifier_env *env,
>  		/* int, enum or void is a sink */
>  		return !btf_type_needs_resolve(next_type);
>  	case RESOLVE_PTR:
> -		/* int, enum, void, struct or array is a sink for ptr */
> +		/* int, enum, void, struct, array or func_ptoto is a sink
> +		 * for ptr
> +		 */
>  		return !btf_type_is_modifier(next_type) &&
>  			!btf_type_is_ptr(next_type);
>  	case RESOLVE_STRUCT_OR_ARRAY:

^ permalink raw reply

* Re: [PATCH bpf-next] tools: bpftool: add map create command
From: Jakub Kicinski @ 2018-10-15 22:53 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: daniel, netdev, oss-drivers
In-Reply-To: <20181015195805.7xob34egcs3pqvag@ast-mbp.dhcp.thefacebook.com>

On Mon, 15 Oct 2018 12:58:07 -0700, Alexei Starovoitov wrote:
> > > >  	fprintf(stderr,
> > > >  		"Usage: %s %s { show | list }   [MAP]\n"
> > > > +		"       %s %s create     FILE type TYPE key KEY_SIZE value VALUE_SIZE \\\n"
> > > > +		"                              entries MAX_ENTRIES [name NAME] [flags FLAGS] \\\n"
> > > > +		"                              [dev NAME]\n"    
> > > 
> > > I suspect as soon as bpftool has an ability to create standalone maps
> > > some folks will start relying on such interface.  
> > 
> > That'd be cool, do you see any real life use cases where its useful
> > outside of corner case testing?  
> 
> In our XDP use case we have an odd protocol for different apps to share
> common prog_array that is pinned in bpffs.
> If cmdline creation of it via bpftool was available that would have been
> an option to consider. Not saying that it would have been a better option.
> Just another option.

I see, I didn't think of prog arrays.

> > > Therefore I'd like to request to make 'name' argument to be mandatory.  
> > 
> > Will do in v2!  
> 
> thx!
>  
> > > I think in the future we will require BTF to be mandatory too.
> > > We need to move towards more transparent and debuggable infra.
> > > Do you think requiring json description of key/value would be managable to implement?
> > > Then bpftool could convert it to BTF and the map full be fully defined.
> > > I certainly understand that bpf prog can disregard the key/value layout today,
> > > but we will make verifier to enforce that in the future too.  
> > 
> > I was hoping that we can leave BTF support as a future extension, and
> > then once we have the option for the verifier to enforce BTF (a sysctl?)
> > the bpftool map create without a BTF will get rejected as one would
> > expect.    
> 
> right. something like sysctl in the future.
> 
> > IOW it's fine not to make BTF required at bpftool level and
> > leave it to system configuration.
> > 
> > I'd love to implement the BTF support right away, but I'm not sure I
> > can afford that right now time-wise.  The whole map create command is
> > pretty trivial, but for BTF we don't even have a way of dumping it
> > AFAICT.  We can pretty print values, but what is the format in which to
> > express the BTF itself?  We could do JSON, do we use an external
> > library?  Should we have a separate BTF command for that?  
> 
> I prefer standard C type description for both input and output :)
> Anyway that wasn't a request for you to do it now. More of the feature
> request for somebody to put on todo list :)

Oh, okay :)  

I will wait for John's patches to get merged and post v2, otherwise
we'd conflict on the man page.

^ permalink raw reply

* Re: [PATCH bpf-next 0/2] IPv6 sk-lookup fixes
From: Alexei Starovoitov @ 2018-10-15 23:10 UTC (permalink / raw)
  To: Joe Stringer; +Cc: daniel, ast, netdev
In-Reply-To: <20181015172746.6475-1-joe@wand.net.nz>

On Mon, Oct 15, 2018 at 10:27:44AM -0700, Joe Stringer wrote:
> This series includes a couple of fixups for the IPv6 socket lookup
> helper, to make the API more consistent (always supply all arguments in
> network byte-order) and to allow its use when IPv6 is compiled as a
> module.

Applied, Thanks

^ permalink raw reply

* Re: [PATCH bpf-next 05/13] bpf: get better bpf_prog ksyms based on btf func type_id
From: Martin Lau @ 2018-10-15 23:12 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Alexei Starovoitov, daniel@iogearbox.net, netdev@vger.kernel.org,
	Kernel Team
In-Reply-To: <20181012185446.2379289-1-yhs@fb.com>

On Fri, Oct 12, 2018 at 11:54:42AM -0700, Yonghong Song wrote:
> This patch added interface to load a program with the following
> additional information:
>    . prog_btf_fd
>    . func_info and func_info_len
> where func_info will provides function range and type_id
> corresponding to each function.
> 
> If verifier agrees with function range provided by the user,
> the bpf_prog ksym for each function will use the func name
> provided in the type_id, which is supposed to provide better
> encoding as it is not limited by 16 bytes program name
> limitation and this is better for bpf program which contains
> multiple subprograms.
> 
> The bpf_prog_info interface is also extended to
> return btf_id and jited_func_types, so user spaces can
> print out the function prototype for each jited function.
Some nits.

> 
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  include/linux/bpf.h          |  2 +
>  include/linux/bpf_verifier.h |  1 +
>  include/linux/btf.h          |  2 +
>  include/uapi/linux/bpf.h     | 11 +++++
>  kernel/bpf/btf.c             | 16 +++++++
>  kernel/bpf/core.c            |  9 ++++
>  kernel/bpf/syscall.c         | 86 +++++++++++++++++++++++++++++++++++-
>  kernel/bpf/verifier.c        | 50 +++++++++++++++++++++
>  8 files changed, 176 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 9b558713447f..e9c63ffa01af 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -308,6 +308,8 @@ struct bpf_prog_aux {
>  	void *security;
>  #endif
>  	struct bpf_prog_offload *offload;
> +	struct btf *btf;
> +	u32 type_id; /* type id for this prog/func */
>  	union {
>  		struct work_struct work;
>  		struct rcu_head	rcu;
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 9e8056ec20fa..e84782ec50ac 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -201,6 +201,7 @@ static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
>  struct bpf_subprog_info {
>  	u32 start; /* insn idx of function entry point */
>  	u16 stack_depth; /* max. stack depth used by this function */
> +	u32 type_id; /* btf type_id for this subprog */
>  };
>  
>  /* single container for all structs
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index e076c4697049..90e91b52aa90 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -46,5 +46,7 @@ void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
>  		       struct seq_file *m);
>  int btf_get_fd_by_id(u32 id);
>  u32 btf_id(const struct btf *btf);
> +bool is_btf_func_type(const struct btf *btf, u32 type_id);
> +const char *btf_get_name_by_id(const struct btf *btf, u32 type_id);
>  
>  #endif
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index f9187b41dff6..7ebbf4f06a65 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -332,6 +332,9 @@ union bpf_attr {
>  		 * (context accesses, allowed helpers, etc).
>  		 */
>  		__u32		expected_attach_type;
> +		__u32		prog_btf_fd;	/* fd pointing to BTF type data */
> +		__u32		func_info_len;	/* func_info length */
> +		__aligned_u64	func_info;	/* func type info */
>  	};
>  
>  	struct { /* anonymous struct used by BPF_OBJ_* commands */
> @@ -2585,6 +2588,9 @@ struct bpf_prog_info {
>  	__u32 nr_jited_func_lens;
>  	__aligned_u64 jited_ksyms;
>  	__aligned_u64 jited_func_lens;
> +	__u32 btf_id;
> +	__u32 nr_jited_func_types;
> +	__aligned_u64 jited_func_types;
>  } __attribute__((aligned(8)));
>  
>  struct bpf_map_info {
> @@ -2896,4 +2902,9 @@ struct bpf_flow_keys {
>  	};
>  };
>  
> +struct bpf_func_info {
> +	__u32	insn_offset;
> +	__u32	type_id;
> +};
> +
>  #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 794a185f11bf..85b8eeccddbd 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -486,6 +486,15 @@ static const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id)
>  	return btf->types[type_id];
>  }
>  
> +bool is_btf_func_type(const struct btf *btf, u32 type_id)
> +{
> +	const struct btf_type *type = btf_type_by_id(btf, type_id);
> +
> +	if (!type || BTF_INFO_KIND(type->info) != BTF_KIND_FUNC)
> +		return false;
> +	return true;
> +}
Can btf_type_is_func() (from patch 2) be reused?
The btf_type_by_id() can be done by the caller.
I don't think it worths to add a similar helper
for just one user for now.

The !type check can be added to btf_type_is_func() if
it is needed.

> +
>  /*
>   * Regular int is not a bit field and it must be either
>   * u8/u16/u32/u64.
> @@ -2579,3 +2588,10 @@ u32 btf_id(const struct btf *btf)
>  {
>  	return btf->id;
>  }
> +
> +const char *btf_get_name_by_id(const struct btf *btf, u32 type_id)
> +{
> +	const struct btf_type *t = btf_type_by_id(btf, type_id);
> +
> +	return btf_name_by_offset(btf, t->name_off);
> +}
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 3f5bf1af0826..add3866a120e 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -27,6 +27,7 @@
>  #include <linux/random.h>
>  #include <linux/moduleloader.h>
>  #include <linux/bpf.h>
> +#include <linux/btf.h>
>  #include <linux/frame.h>
>  #include <linux/rbtree_latch.h>
>  #include <linux/kallsyms.h>
> @@ -387,6 +388,7 @@ bpf_get_prog_addr_region(const struct bpf_prog *prog,
>  static void bpf_get_prog_name(const struct bpf_prog *prog, char *sym)
>  {
>  	const char *end = sym + KSYM_NAME_LEN;
> +	const char *func_name;
>  
>  	BUILD_BUG_ON(sizeof("bpf_prog_") +
>  		     sizeof(prog->tag) * 2 +
> @@ -401,6 +403,13 @@ static void bpf_get_prog_name(const struct bpf_prog *prog, char *sym)
>  
>  	sym += snprintf(sym, KSYM_NAME_LEN, "bpf_prog_");
>  	sym  = bin2hex(sym, prog->tag, sizeof(prog->tag));
> +
> +	if (prog->aux->btf) {
> +		func_name = btf_get_name_by_id(prog->aux->btf, prog->aux->type_id);
> +		snprintf(sym, (size_t)(end - sym), "_%s", func_name);
> +		return;
> +	}
> +
>  	if (prog->aux->name[0])
>  		snprintf(sym, (size_t)(end - sym), "_%s", prog->aux->name);
>  	else
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 4f416234251f..aa4688a1a137 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1120,6 +1120,7 @@ static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
>  		/* bpf_prog_free_id() must be called first */
>  		bpf_prog_free_id(prog, do_idr_lock);
>  		bpf_prog_kallsyms_del_all(prog);
> +		btf_put(prog->aux->btf);
>  
>  		call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
>  	}
> @@ -1343,8 +1344,45 @@ bpf_prog_load_check_attach_type(enum bpf_prog_type prog_type,
>  	}
>  }
>  
> +static int prog_check_btf(const struct bpf_prog *prog, const struct btf *btf,
> +			  union bpf_attr *attr)
> +{
> +	struct bpf_func_info __user *uinfo, info;
> +	int i, nfuncs, usize;
> +	u32 prev_offset;
> +
> +	usize = sizeof(struct bpf_func_info);
> +	if (attr->func_info_len % usize)
> +		return -EINVAL;
> +
> +	/* func_info section should have increasing and valid insn_offset
> +	 * and type should be BTF_KIND_FUNC.
> +	 */
> +	nfuncs = attr->func_info_len / usize;
> +	uinfo = u64_to_user_ptr(attr->func_info);
> +	for (i = 0; i < nfuncs; i++) {
> +		if (copy_from_user(&info, uinfo, usize))
> +			return -EFAULT;
> +
> +		if (!is_btf_func_type(btf, info.type_id))
> +			return -EINVAL;
> +
> +		if (i == 0) {
> +			if (info.insn_offset)
> +				return -EINVAL;
> +			prev_offset = 0;
> +		} else if (info.insn_offset < prev_offset) {
> +			return -EINVAL;
> +		}
> +
> +		prev_offset = info.insn_offset;
> +	}
> +
> +	return 0;
> +}
> +
>  /* last field in 'union bpf_attr' used by this command */
> -#define	BPF_PROG_LOAD_LAST_FIELD expected_attach_type
> +#define	BPF_PROG_LOAD_LAST_FIELD func_info
>  
>  static int bpf_prog_load(union bpf_attr *attr)
>  {
> @@ -1431,6 +1469,27 @@ static int bpf_prog_load(union bpf_attr *attr)
>  	if (err)
>  		goto free_prog;
>  
> +	/* copy func_info before verifier which may make
> +	 * some adjustment.
> +	 */
Is it a left over comment?  I don't see the intention of
copying func_info to avoid verifier modification from below.
I could be missing something.

or should the comments be moved to the new "check_btf_func()" below?

> +	if (attr->func_info_len) {
> +		struct btf *btf;
> +
> +		btf = btf_get_by_fd(attr->prog_btf_fd);
> +		if (IS_ERR(btf)) {
> +			err = PTR_ERR(btf);
> +			goto free_prog;
> +		}
> +
> +		err = prog_check_btf(prog, btf, attr);
> +		if (err) {
> +			btf_put(btf);
> +			goto free_prog;
> +		}
> +
> +		prog->aux->btf = btf;
> +	}
> +
>  	/* run eBPF verifier */
>  	err = bpf_check(&prog, attr);
>  	if (err < 0)
> @@ -1463,6 +1522,7 @@ static int bpf_prog_load(union bpf_attr *attr)
>  	bpf_prog_kallsyms_del_subprogs(prog);
>  	free_used_maps(prog->aux);
>  free_prog:
> +	btf_put(prog->aux->btf);
>  	bpf_prog_uncharge_memlock(prog);
>  free_prog_sec:
>  	security_bpf_prog_free(prog->aux);
> @@ -2108,6 +2168,30 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  		}
>  	}
>  
> +	if (prog->aux->btf) {
> +		info.btf_id = btf_id(prog->aux->btf);
> +
> +		ulen = info.nr_jited_func_types;
> +		info.nr_jited_func_types = prog->aux->func_cnt;
> +		if (info.nr_jited_func_types && ulen) {
> +			if (bpf_dump_raw_ok()) {
> +				u32 __user *user_types;
> +				u32 func_type, i;
> +
> +				ulen = min_t(u32, info.nr_jited_func_types,
> +					     ulen);
> +				user_types = u64_to_user_ptr(info.jited_func_types);
> +				for (i = 0; i < ulen; i++) {
> +					func_type = prog->aux->func[i]->aux->type_id;
> +					if (put_user(func_type, &user_types[i]))
> +						return -EFAULT;
> +				}
> +			} else {
> +				info.jited_func_types = 0;
> +			}
> +		}
> +	}
> +
>  done:
>  	if (copy_to_user(uinfo, &info, info_len) ||
>  	    put_user(info_len, &uattr->info.info_len))
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 3f93a548a642..97c408e84322 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -4589,6 +4589,50 @@ static int check_cfg(struct bpf_verifier_env *env)
>  	return ret;
>  }
>  
> +static int check_btf_func(struct bpf_prog *prog, struct bpf_verifier_env *env,
> +			  union bpf_attr *attr)
> +{
> +	struct bpf_func_info *data;
> +	int i, nfuncs, ret = 0;
> +
> +	if (!attr->func_info_len)
> +		return 0;
> +
> +	nfuncs = attr->func_info_len / sizeof(struct bpf_func_info);
> +	if (env->subprog_cnt != nfuncs) {
> +		verbose(env, "number of funcs in func_info does not match verifier\n");
> +		return -EINVAL;
> +	}
> +
> +	data = kvmalloc(attr->func_info_len, GFP_KERNEL | __GFP_NOWARN);
> +	if (!data) {
> +		verbose(env, "no memory to allocate attr func_info\n");
> +		return -ENOMEM;
> +	}
> +
> +	if (copy_from_user(data, u64_to_user_ptr(attr->func_info),
> +			   attr->func_info_len)) {
> +		verbose(env, "memory copy error for attr func_info\n");
> +		ret = -EFAULT;
> +		goto cleanup;
> +		}
Extra indentation.

> +
> +	for (i = 0; i < nfuncs; i++) {
> +		if (env->subprog_info[i].start != data[i].insn_offset) {
> +			verbose(env, "func_info subprog start (%d) does not match verifier (%d)\n",
> +				env->subprog_info[i].start, data[i].insn_offset);
The printing args are swapped? env->subprog_info[i].start should
go to the "verifier (%d)"?

and s/%d/%u/

> +			ret = -EINVAL;
> +			goto cleanup;
> +		}
> +		env->subprog_info[i].type_id = data[i].type_id;
> +	}
> +
> +	prog->aux->type_id = data[0].type_id;
> +cleanup:
> +	kvfree(data);
> +	return ret;
> +}
> +
>  /* check %cur's range satisfies %old's */
>  static bool range_within(struct bpf_reg_state *old,
>  			 struct bpf_reg_state *cur)
> @@ -5873,6 +5917,8 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>  		func[i]->aux->name[0] = 'F';
>  		func[i]->aux->stack_depth = env->subprog_info[i].stack_depth;
>  		func[i]->jit_requested = 1;
> +		func[i]->aux->btf = prog->aux->btf;
> +		func[i]->aux->type_id = env->subprog_info[i].type_id;
>  		func[i] = bpf_int_jit_compile(func[i]);
>  		if (!func[i]->jited) {
>  			err = -ENOTSUPP;
> @@ -6307,6 +6353,10 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
>  	if (ret < 0)
>  		goto skip_full_check;
>  
> +	ret = check_btf_func(env->prog, env, attr);
> +	if (ret < 0)
> +		goto skip_full_check;
> +
>  	ret = do_check(env);
>  	if (env->cur_state) {
>  		free_verifier_state(env->cur_state, true);
> -- 
> 2.17.1
> 

^ permalink raw reply

* Re: [bpf-next PATCH v3 0/2] bpftool support for sockmap use cases
From: Alexei Starovoitov @ 2018-10-15 23:17 UTC (permalink / raw)
  To: John Fastabend; +Cc: jakub.kicinski, ast, daniel, netdev
In-Reply-To: <20181015181857.8673.46183.stgit@john-Precision-Tower-5810>

On Mon, Oct 15, 2018 at 11:19:44AM -0700, John Fastabend wrote:
> The first patch adds support for attaching programs to maps. This is
> needed to support sock{map|hash} use from bpftool. Currently, I carry
> around custom code to do this so doing it using standard bpftool will
> be great.
> 
> The second patch adds a compat mode to ignore non-zero entries in
> the map def. This allows using bpftool with maps that have a extra
> fields that the user knows can be ignored. This is needed to work
> correctly with maps being loaded by other tools or directly via
> syscalls.
> 
> v3: add bash completion and doc updates for --mapcompat

Applied, Thanks

^ permalink raw reply

* [PATCH bpf-next v2] tools: bpftool: add map create command
From: Jakub Kicinski @ 2018-10-15 23:30 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev, Jakub Kicinski

Add a way of creating maps from user space.  The command takes
as parameters most of the attributes of the map creation system
call command.  After map is created its pinned to bpffs.  This makes
it possible to easily and dynamically (without rebuilding programs)
test various corner cases related to map creation.

Map type names are taken from bpftool's array used for printing.
In general these days we try to make use of libbpf type names, but
there are no map type names in libbpf as of today.

As with most features I add the motivation is testing (offloads) :)

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 .../bpf/bpftool/Documentation/bpftool-map.rst |  15 ++-
 tools/bpf/bpftool/Documentation/bpftool.rst   |   4 +-
 tools/bpf/bpftool/bash-completion/bpftool     |  38 +++++-
 tools/bpf/bpftool/common.c                    |  21 ++++
 tools/bpf/bpftool/main.h                      |   1 +
 tools/bpf/bpftool/map.c                       | 110 +++++++++++++++++-
 6 files changed, 183 insertions(+), 6 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
index a6258bc8ec4f..3497f2d80328 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
@@ -15,13 +15,15 @@ SYNOPSIS
 	*OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-f** | **--bpffs** } }
 
 	*COMMANDS* :=
-	{ **show** | **list** | **dump** | **update** | **lookup** | **getnext** | **delete**
-	| **pin** | **help** }
+	{ **show** | **list** | **create** | **dump** | **update** | **lookup** | **getnext**
+	| **delete** | **pin** | **help** }
 
 MAP COMMANDS
 =============
 
 |	**bpftool** **map { show | list }**   [*MAP*]
+|	**bpftool** **map create**     *FILE* **type** *TYPE* **key** *KEY_SIZE* **value** *VALUE_SIZE* \
+|		**entries** *MAX_ENTRIES* **name** *NAME* [**flags** *FLAGS*] [**dev** *NAME*]
 |	**bpftool** **map dump**       *MAP*
 |	**bpftool** **map update**     *MAP*  **key** *DATA*   **value** *VALUE* [*UPDATE_FLAGS*]
 |	**bpftool** **map lookup**     *MAP*  **key** *DATA*
@@ -36,6 +38,11 @@ MAP COMMANDS
 |	*PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* }
 |	*VALUE* := { *DATA* | *MAP* | *PROG* }
 |	*UPDATE_FLAGS* := { **any** | **exist** | **noexist** }
+|	*TYPE* := { **hash** | **array** | **prog_array** | **perf_event_array** | **percpu_hash**
+|		| **percpu_array** | **stack_trace** | **cgroup_array** | **lru_hash**
+|		| **lru_percpu_hash** | **lpm_trie** | **array_of_maps** | **hash_of_maps**
+|		| **devmap** | **sockmap** | **cpumap** | **xskmap** | **sockhash**
+|		| **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage** }
 
 DESCRIPTION
 ===========
@@ -47,6 +54,10 @@ DESCRIPTION
 		  Output will start with map ID followed by map type and
 		  zero or more named attributes (depending on kernel version).
 
+	**bpftool map create** *FILE* **type** *TYPE* **key** *KEY_SIZE* **value** *VALUE_SIZE*  **entries** *MAX_ENTRIES* **name** *NAME* [**flags** *FLAGS*] [**dev** *NAME*]
+		  Create a new map with given parameters and pin it to *bpffs*
+		  as *FILE*.
+
 	**bpftool map dump**    *MAP*
 		  Dump all entries in a given *MAP*.
 
diff --git a/tools/bpf/bpftool/Documentation/bpftool.rst b/tools/bpf/bpftool/Documentation/bpftool.rst
index 65488317fefa..04cd4f92ab89 100644
--- a/tools/bpf/bpftool/Documentation/bpftool.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool.rst
@@ -22,8 +22,8 @@ SYNOPSIS
 	| { **-j** | **--json** } [{ **-p** | **--pretty** }] }
 
 	*MAP-COMMANDS* :=
-	{ **show** | **list** | **dump** | **update** | **lookup** | **getnext** | **delete**
-	| **pin** | **event_pipe** | **help** }
+	{ **show** | **list** | **create** | **dump** | **update** | **lookup** | **getnext**
+	| **delete** | **pin** | **event_pipe** | **help** }
 
 	*PROG-COMMANDS* := { **show** | **list** | **dump jited** | **dump xlated** | **pin**
 	| **load** | **attach** | **detach** | **help** }
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index ac85207cba8d..c56545e87b0d 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -387,6 +387,42 @@ _bpftool()
                             ;;
                     esac
                     ;;
+                create)
+                    case $prev in
+                        $command)
+                            _filedir
+                            return 0
+                            ;;
+                        type)
+                            COMPREPLY=( $( compgen -W 'hash array prog_array \
+                                perf_event_array percpu_hash percpu_array \
+                                stack_trace cgroup_array lru_hash \
+                                lru_percpu_hash lpm_trie array_of_maps \
+                                hash_of_maps devmap sockmap cpumap xskmap \
+                                sockhash cgroup_storage reuseport_sockarray \
+                                percpu_cgroup_storage' -- \
+                                                   "$cur" ) )
+                            return 0
+                            ;;
+                        key|value|flags|name|entries)
+                            return 0
+                            ;;
+                        dev)
+                            _sysfs_get_netdevs
+                            return 0
+                            ;;
+                        *)
+                            _bpftool_once_attr 'type'
+                            _bpftool_once_attr 'key'
+                            _bpftool_once_attr 'value'
+                            _bpftool_once_attr 'entries'
+                            _bpftool_once_attr 'name'
+                            _bpftool_once_attr 'flags'
+                            _bpftool_once_attr 'dev'
+                            return 0
+                            ;;
+                    esac
+                    ;;
                 lookup|getnext|delete)
                     case $prev in
                         $command)
@@ -500,7 +536,7 @@ _bpftool()
                 *)
                     [[ $prev == $object ]] && \
                         COMPREPLY=( $( compgen -W 'delete dump getnext help \
-                            lookup pin event_pipe show list update' -- \
+                            lookup pin event_pipe show list update create' -- \
                             "$cur" ) )
                     ;;
             esac
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index b3a0709ea7ed..3318da8060bd 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -618,3 +618,24 @@ void print_dev_json(__u32 ifindex, __u64 ns_dev, __u64 ns_inode)
 		jsonw_string_field(json_wtr, "ifname", name);
 	jsonw_end_object(json_wtr);
 }
+
+int parse_u32_arg(int *argc, char ***argv, __u32 *val, const char *what)
+{
+	char *endptr;
+
+	NEXT_ARGP();
+
+	if (*val) {
+		p_err("%s already specified", what);
+		return -1;
+	}
+
+	*val = strtoul(**argv, &endptr, 0);
+	if (*endptr) {
+		p_err("can't parse %s as %s", **argv, what);
+		return -1;
+	}
+	NEXT_ARGP();
+
+	return 0;
+}
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 91fd697303cb..28ee769bd11b 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -139,6 +139,7 @@ int do_cgroup(int argc, char **arg);
 int do_perf(int argc, char **arg);
 int do_net(int argc, char **arg);
 
+int parse_u32_arg(int *argc, char ***argv, __u32 *val, const char *what);
 int prog_parse_fd(int *argc, char ***argv);
 int map_parse_fd(int *argc, char ***argv);
 int map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 *info_len);
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 9f5de48f8a99..7bf38f0e152e 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -36,6 +36,7 @@
 #include <fcntl.h>
 #include <linux/err.h>
 #include <linux/kernel.h>
+#include <net/if.h>
 #include <stdbool.h>
 #include <stdio.h>
 #include <stdlib.h>
@@ -94,6 +95,17 @@ static bool map_is_map_of_progs(__u32 type)
 	return type == BPF_MAP_TYPE_PROG_ARRAY;
 }
 
+static int map_type_from_str(const char *type)
+{
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(map_type_name); i++)
+		/* Don't allow prefixing in case of possible future shadowing */
+		if (map_type_name[i] && !strcmp(map_type_name[i], type))
+			return i;
+	return -1;
+}
+
 static void *alloc_value(struct bpf_map_info *info)
 {
 	if (map_is_per_cpu(info->type))
@@ -1058,6 +1070,92 @@ static int do_pin(int argc, char **argv)
 	return err;
 }
 
+static int do_create(int argc, char **argv)
+{
+	struct bpf_create_map_attr attr = { NULL, };
+	const char *pinfile;
+	int err, fd;
+
+	if (!REQ_ARGS(7))
+		return -1;
+	pinfile = GET_ARG();
+
+	while (argc) {
+		if (!REQ_ARGS(2))
+			return -1;
+
+		if (is_prefix(*argv, "type")) {
+			NEXT_ARG();
+
+			if (attr.map_type) {
+				p_err("map type already specified");
+				return -1;
+			}
+
+			attr.map_type = map_type_from_str(*argv);
+			if ((int)attr.map_type < 0) {
+				p_err("unrecognized map type: %s", *argv);
+				return -1;
+			}
+			NEXT_ARG();
+		} else if (is_prefix(*argv, "name")) {
+			NEXT_ARG();
+			attr.name = GET_ARG();
+		} else if (is_prefix(*argv, "key")) {
+			if (parse_u32_arg(&argc, &argv, &attr.key_size,
+					  "key size"))
+				return -1;
+		} else if (is_prefix(*argv, "value")) {
+			if (parse_u32_arg(&argc, &argv, &attr.value_size,
+					  "value size"))
+				return -1;
+		} else if (is_prefix(*argv, "entries")) {
+			if (parse_u32_arg(&argc, &argv, &attr.max_entries,
+					  "max entries"))
+				return -1;
+		} else if (is_prefix(*argv, "flags")) {
+			if (parse_u32_arg(&argc, &argv, &attr.map_flags,
+					  "flags"))
+				return -1;
+		} else if (is_prefix(*argv, "dev")) {
+			NEXT_ARG();
+
+			if (attr.map_ifindex) {
+				p_err("offload device already specified");
+				return -1;
+			}
+
+			attr.map_ifindex = if_nametoindex(*argv);
+			if (!attr.map_ifindex) {
+				p_err("unrecognized netdevice '%s': %s",
+				      *argv, strerror(errno));
+				return -1;
+			}
+			NEXT_ARG();
+		}
+	}
+
+	if (!attr.name) {
+		p_err("map name not specified");
+		return -1;
+	}
+
+	fd = bpf_create_map_xattr(&attr);
+	if (fd < 0) {
+		p_err("map create failed: %s", strerror(errno));
+		return -1;
+	}
+
+	err = do_pin_fd(fd, pinfile);
+	close(fd);
+	if (err)
+		return err;
+
+	if (json_output)
+		jsonw_null(json_wtr);
+	return 0;
+}
+
 static int do_help(int argc, char **argv)
 {
 	if (json_output) {
@@ -1067,6 +1165,9 @@ static int do_help(int argc, char **argv)
 
 	fprintf(stderr,
 		"Usage: %s %s { show | list }   [MAP]\n"
+		"       %s %s create     FILE type TYPE key KEY_SIZE value VALUE_SIZE \\\n"
+		"                              entries MAX_ENTRIES name NAME [flags FLAGS] \\\n"
+		"                              [dev NAME]\n"
 		"       %s %s dump       MAP\n"
 		"       %s %s update     MAP  key DATA value VALUE [UPDATE_FLAGS]\n"
 		"       %s %s lookup     MAP  key DATA\n"
@@ -1081,11 +1182,17 @@ static int do_help(int argc, char **argv)
 		"       " HELP_SPEC_PROGRAM "\n"
 		"       VALUE := { DATA | MAP | PROG }\n"
 		"       UPDATE_FLAGS := { any | exist | noexist }\n"
+		"       TYPE := { hash | array | prog_array | perf_event_array | percpu_hash |\n"
+		"                 percpu_array | stack_trace | cgroup_array | lru_hash |\n"
+		"                 lru_percpu_hash | lpm_trie | array_of_maps | hash_of_maps |\n"
+		"                 devmap | sockmap | cpumap | xskmap | sockhash |\n"
+		"                 cgroup_storage | reuseport_sockarray | percpu_cgroup_storage }\n"
 		"       " HELP_SPEC_OPTIONS "\n"
 		"",
 		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
 		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
-		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2]);
+		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
+		bin_name, argv[-2]);
 
 	return 0;
 }
@@ -1101,6 +1208,7 @@ static const struct cmd cmds[] = {
 	{ "delete",	do_delete },
 	{ "pin",	do_pin },
 	{ "event_pipe",	do_event_pipe },
+	{ "create",	do_create },
 	{ 0 }
 };
 
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH bpf-next 2/3] bpf: emit RECORD_MMAP events for bpf prog load/unload
From: Song Liu @ 2018-10-15 23:33 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: acme, Peter Zijlstra, Alexei Starovoitov, David S . Miller,
	Daniel Borkmann, Networking, kernel-team
In-Reply-To: <20180921221343.g52n7c4edisvice3@ast-mbp>

On Fri, Sep 21, 2018 at 3:15 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Sep 21, 2018 at 09:25:00AM -0300, Arnaldo Carvalho de Melo wrote:
> >
> > > I have considered adding MUNMAP to match existing MMAP, but went
> > > without it because I didn't want to introduce new bit in perf_event_attr
> > > and emit these new events in a misbalanced conditional way for prog load/unload.
> > > Like old perf is asking kernel for mmap events via mmap bit, so prog load events
> >
> > By prog load events you mean that old perf, having perf_event_attr.mmap = 1 ||
> > perf_event_attr.mmap2 = 1 will cause the new kernel to emit
> > PERF_RECORD_MMAP records for the range of addresses that a BPF program
> > is being loaded on, right?
>
> right. it would be weird when prog load events are there, but not unload.
>
> > > will be in perf.data, but old perf report won't recognize them anyway.
> >
> > Why not? It should lookup the symbol and find it in the rb_tree of maps,
> > with a DSO name equal to what was in the PERF_RECORD_MMAP emitted by the
> > BPF core, no? It'll be an unresolved symbol, but a resolved map.
> >
> > > Whereas new perf would certainly want to catch bpf events and will set
> > > both mmap and mumap bits.
> >
> > new perf with your code will find a symbol, not a map, because your code
> > catches a special case PERF_RECORD_MMAP and instead of creating a
> > 'struct map' will create a 'struct symbol' and insert it in the kallsyms
> > 'struct map', right?
>
> right.
> bpf progs are more similar to kernel functions than to modules.
> For modules it makes sense to create a new map and insert symbols into it.
> For bpf JITed images there is no DSO to parse.
> Single bpf elf file may contain multiple bpf progs and each prog may contain
> multiple bpf functions. They will be loaded at different time and
> will have different life time.
>
> > In theory the old perf should catch the PERF_RECORD_MMAP with a string
> > in the filename part and insert a new map into the kernel mmap rb_tree,
> > and then samples would be resolved to this map, but since there is no
> > backing DSO with a symtab, it would stop at that, just stating that the
> > map is called NAME-OF-BPF-PROGRAM. This is all from memory, possibly
> > there is something in there that makes it ignore this PERF_RECORD_MMAP
> > emitted by the BPF kernel code when loading a new program.
>
> In /proc/kcore there is already a section for module range.
> Hence when perf processes bpf load/unload events the map is already created.
> Therefore the patch 3 only searches for it and inserts new symbol into it.
>
> In that sense the reuse of RECORD_MMAP event for bpf progs is indeed
> not exactly clean, since no new map is created.
> It's probably better to introduce PERF_RECORD_[INSERT|ERASE]_KSYM events ?
>
> Such event potentially can be used for offline ksym resolution.
> perf could process /proc/kallsyms during perf record and emit all of them
> as synthetic PERF_RECORD_INSERT_KSYM into perf.data, so perf report can run
> on a different server and still find the right symbols.
>
> I guess, we can do bpf specific events too and keep RECORD_MMAP as-is.
> How about single PERF_RECORD_BPF event with internal flag for load/unload ?
>
> > Right, that is another unfortunate state of affairs, kernel module
> > load/unload should already be supported, reported by the kernel via a
> > proper PERF_RECORD_MODULE_LOAD/UNLOAD
>
> I agree with Peter here. It would nice, but low priority.
> modules are mostly static. Loaded once and stay there.
>
> > There is another longstanding TODO list entry: PERF_RECORD_MMAP records
> > should include a build-id, to avoid either userspace getting confused
> > when there is an update of some mmap DSO, for long running sessions, for
> > instance, or to have to scan the just recorded perf.data file for DSOs
> > with samples to then read it from the file system (more races).
> >
> > Have you ever considered having a build-id for bpf objects that could be
> > used here?
>
> build-id concept is not applicable to bpf.
> bpf elf files on the disc don't have good correlation with what is
> running in the kernel. bpf bytestream is converted and optimized
> by the verifier. Then JITed.
> So debug info left in .o file and original bpf bytestream in .o are
> mostly useless.
> For bpf programs we have 'program tag'. It is computed over original
> bpf bytestream, so both kernel and user space can compute it.
> In libbcc we use /var/tmp/bcc/bpf_prog_TAG/ directory to store original
> source code of the program, so users looking at kernel stack traces
> with bpf_prog_TAG can find the source.
> It's similar to build-id, but not going to help perf to annotate
> actual x86 instructions inside JITed image and show src code.
> Since JIT runs in the kernel this problem cannot be solved by user space only.
> It's a difficult problem and we have a plan to tackle that,
> but it's step 2. A bunch of infra is needed on bpf side to
> preserve the association during src_in_C -> original bpf insns ->
> translated bpf insns -> JITed asm.
> Then report it back to user space and then teach perf to properly annotate progs.
>
> > Will the JITed code from some BPF bytecode be different if the same
> > bytecode is JITed in several machines, all having the exact same
> > hardware?
>
> Yes. JITed code will be different depending on sysctl knobs (like const blinding)
> So the same original bpf byte stream loaded at different times
> may have different JITed image.
>
> Even without security features like blinding the JIT can be different.
> the bpf maps are separate from bpf progs. The bpf map is created first.
> Then the same bpf instruction stream (depending on map type that it uses)
> may be optimized by the verifier differently causing difference in JIT.
>
> > > (instead of passing kallsyms's bpf prog name in event->mmap.filename)
> > > but bpf functions don't have their own prog_id. Multiple bpf funcs
> > > with different JITed blobs are considered to be a part of single prog_id.
> > > So as a step one I'm only extending RECORD_MMAP with addr and kallsym
> > > name of bpf function/prog.
> > > As a step two the plan is to add notification mechanism for prog load/unload
> > > that will include prog_id and design new synthetic RECORD_* events in
> > > perf user space that will contain JITed code, line info, BTF, etc.
> >
> > So, will the kernel JIT a bytecode, load it somewhere and run it, then,
> > when unloading it, keep it somewhere (a filesystem with some limits,
> > etc) so that at some later time (with some timeouts) tooling can, using
> > its id/buildid cookie get the contents and symbol table to have a better
> > annotation experience?
>
> yes. The plan is to let perf fetch JITed image via BPF_OBJ_GET_INFO_BY_FD cmd
> during perf record run and store it inside perf.data in synthetic records.
> Then perf report can annotate it later.

Hi Peter and Arnaldo,

I am working with Alexei on the idea of fetching BPF program information via
BPF_OBJ_GET_INFO_BY_FD cmd. I added PERF_RECORD_BPF_EVENT
to perf_event_type, and dumped these events to perf event ring buffer.

I found that perf will not process event until the end of perf-record:

root@virt-test:~# ~/perf record -ag -- sleep 10
...... 10 seconds later
[ perf record: Woken up 34 times to write data ]
machine__process_bpf_event: prog_id 6 loaded
machine__process_bpf_event: prog_id 6 unloaded
[ perf record: Captured and wrote 9.337 MB perf.data (93178 samples) ]

In this example, the bpf program was loaded and then unloaded in
another terminal. When machine__process_bpf_event() processes
the load event, the bpf program is already unloaded. Therefore,
machine__process_bpf_event() will not be able to get information
about the program via BPF_OBJ_GET_INFO_BY_FD cmd.

To solve this problem, we will need to run BPF_OBJ_GET_INFO_BY_FD
as soon as perf get the event from kernel. I looked around the perf
code for a while. But I haven't found a good example where some
events are processed before the end of perf-record. Could you
please help me with this?

Thanks,
Song

^ permalink raw reply

* Re: [PATCH bpf-next v2] tools: bpftool: add map create command
From: Alexei Starovoitov @ 2018-10-15 23:41 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: daniel, oss-drivers, netdev
In-Reply-To: <20181015233036.2822-1-jakub.kicinski@netronome.com>

On Mon, Oct 15, 2018 at 04:30:36PM -0700, Jakub Kicinski wrote:
> Add a way of creating maps from user space.  The command takes
> as parameters most of the attributes of the map creation system
> call command.  After map is created its pinned to bpffs.  This makes
> it possible to easily and dynamically (without rebuilding programs)
> test various corner cases related to map creation.
> 
> Map type names are taken from bpftool's array used for printing.
> In general these days we try to make use of libbpf type names, but
> there are no map type names in libbpf as of today.
> 
> As with most features I add the motivation is testing (offloads) :)
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>

Applied, Thanks

^ permalink raw reply

* [PATCH] wlcore: Fix the return value in case of error in 'wlcore_vendor_cmd_smart_config_start()'
From: Christophe JAILLET @ 2018-10-16  7:39 UTC (permalink / raw)
  To: kvalo, davem, tony
  Cc: linux-wireless, netdev, linux-kernel, kernel-janitors,
	Christophe JAILLET

We return 0 unconditionally at the end of
'wlcore_vendor_cmd_smart_config_start()'.
However, 'ret' is set to some error codes in several error handling paths
and we already return some error codes at the beginning of the function.

Return 'ret' instead to propagate the error code.

Fixes: 80ff8063e87c ("wlcore: handle smart config vendor commands")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
 drivers/net/wireless/ti/wlcore/vendor_cmd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ti/wlcore/vendor_cmd.c b/drivers/net/wireless/ti/wlcore/vendor_cmd.c
index dbe78d8491ef..7f34ec077ee5 100644
--- a/drivers/net/wireless/ti/wlcore/vendor_cmd.c
+++ b/drivers/net/wireless/ti/wlcore/vendor_cmd.c
@@ -70,7 +70,7 @@ wlcore_vendor_cmd_smart_config_start(struct wiphy *wiphy,
 out:
 	mutex_unlock(&wl->mutex);
 
-	return 0;
+	return ret;
 }
 
 static int
-- 
2.17.1

^ permalink raw reply related

* [PATCH net 0/3] nfp: fix pedit set action offloads
From: Jakub Kicinski @ 2018-10-15 23:52 UTC (permalink / raw)
  To: davem; +Cc: netdev, oss-drivers, Jakub Kicinski

Hi,

Pieter says:

This set fixes set actions when using multiple pedit actions with
partial masks and with multiple keys per pedit action. Additionally
it fixes set ipv6 pedit action offloads when using it in combination
with other header keys.

The problem would only trigger if one combines multiple pedit actions
of the same type with partial masks, e.g.:

$ tc filter add dev netdev protocol ip parent ffff: \
    flower indev netdev \
    ip_proto tcp \
    action pedit ex munge \ 
    ip src set 11.11.11.11 retain 65535 munge \
    ip src set 22.22.22.22 retain 4294901760 pipe \
    csum ip and tcp pipe \
    mirred egress redirect dev netdev

Pieter Jansen van Vuuren (3):
  nfp: flower: fix pedit set actions for multiple partial masks
  nfp: flower: fix multiple keys per pedit action
  nfp: flower: use offsets provided by pedit instead of index for ipv6

 .../ethernet/netronome/nfp/flower/action.c    | 51 ++++++++++++-------
 1 file changed, 33 insertions(+), 18 deletions(-)

-- 
2.17.1

^ permalink raw reply

* [PATCH net 1/3] nfp: flower: fix pedit set actions for multiple partial masks
From: Jakub Kicinski @ 2018-10-15 23:52 UTC (permalink / raw)
  To: davem; +Cc: netdev, oss-drivers, Pieter Jansen van Vuuren
In-Reply-To: <20181015235225.17574-1-jakub.kicinski@netronome.com>

From: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>

Previously we did not correctly change headers when using multiple
pedit actions with partial masks. We now take this into account and
no longer just commit the last pedit action.

Fixes: c0b1bd9a8b8a ("nfp: add set ipv4 header action flower offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../net/ethernet/netronome/nfp/flower/action.c    | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index 46ba0cf257c6..91de7a9b0190 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -429,12 +429,14 @@ nfp_fl_set_ip4(const struct tc_action *action, int idx, u32 off,
 
 	switch (off) {
 	case offsetof(struct iphdr, daddr):
-		set_ip_addr->ipv4_dst_mask = mask;
-		set_ip_addr->ipv4_dst = exact;
+		set_ip_addr->ipv4_dst_mask |= mask;
+		set_ip_addr->ipv4_dst &= ~mask;
+		set_ip_addr->ipv4_dst |= exact & mask;
 		break;
 	case offsetof(struct iphdr, saddr):
-		set_ip_addr->ipv4_src_mask = mask;
-		set_ip_addr->ipv4_src = exact;
+		set_ip_addr->ipv4_src_mask |= mask;
+		set_ip_addr->ipv4_src &= ~mask;
+		set_ip_addr->ipv4_src |= exact & mask;
 		break;
 	default:
 		return -EOPNOTSUPP;
@@ -451,8 +453,9 @@ static void
 nfp_fl_set_ip6_helper(int opcode_tag, int idx, __be32 exact, __be32 mask,
 		      struct nfp_fl_set_ipv6_addr *ip6)
 {
-	ip6->ipv6[idx % 4].mask = mask;
-	ip6->ipv6[idx % 4].exact = exact;
+	ip6->ipv6[idx % 4].mask |= mask;
+	ip6->ipv6[idx % 4].exact &= ~mask;
+	ip6->ipv6[idx % 4].exact |= exact & mask;
 
 	ip6->reserved = cpu_to_be16(0);
 	ip6->head.jump_id = opcode_tag;
-- 
2.17.1

^ permalink raw reply related

* [PATCH net 2/3] nfp: flower: fix multiple keys per pedit action
From: Jakub Kicinski @ 2018-10-15 23:52 UTC (permalink / raw)
  To: davem; +Cc: netdev, oss-drivers, Pieter Jansen van Vuuren
In-Reply-To: <20181015235225.17574-1-jakub.kicinski@netronome.com>

From: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>

Previously we only allowed a single header key per pedit action to
change the header. This used to result in the last header key in the
pedit action to overwrite previous headers. We now keep track of them
and allow multiple header keys per pedit action.

Fixes: c0b1bd9a8b8a ("nfp: add set ipv4 header action flower offload")
Fixes: 354b82bb320e ("nfp: add set ipv6 source and destination address")
Fixes: f8b7b0a6b113 ("nfp: add set tcp and udp header action flower offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../net/ethernet/netronome/nfp/flower/action.c   | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index 91de7a9b0190..c39d7fdf73e6 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -544,7 +544,7 @@ nfp_fl_pedit(const struct tc_action *action, struct tc_cls_flower_offload *flow,
 	struct nfp_fl_set_eth set_eth;
 	enum pedit_header_type htype;
 	int idx, nkeys, err;
-	size_t act_size;
+	size_t act_size = 0;
 	u32 offset, cmd;
 	u8 ip_proto = 0;
 
@@ -602,7 +602,9 @@ nfp_fl_pedit(const struct tc_action *action, struct tc_cls_flower_offload *flow,
 		act_size = sizeof(set_eth);
 		memcpy(nfp_action, &set_eth, act_size);
 		*a_len += act_size;
-	} else if (set_ip_addr.head.len_lw) {
+	}
+	if (set_ip_addr.head.len_lw) {
+		nfp_action += act_size;
 		act_size = sizeof(set_ip_addr);
 		memcpy(nfp_action, &set_ip_addr, act_size);
 		*a_len += act_size;
@@ -610,10 +612,12 @@ nfp_fl_pedit(const struct tc_action *action, struct tc_cls_flower_offload *flow,
 		/* Hardware will automatically fix IPv4 and TCP/UDP checksum. */
 		*csum_updated |= TCA_CSUM_UPDATE_FLAG_IPV4HDR |
 				nfp_fl_csum_l4_to_flag(ip_proto);
-	} else if (set_ip6_dst.head.len_lw && set_ip6_src.head.len_lw) {
+	}
+	if (set_ip6_dst.head.len_lw && set_ip6_src.head.len_lw) {
 		/* TC compiles set src and dst IPv6 address as a single action,
 		 * the hardware requires this to be 2 separate actions.
 		 */
+		nfp_action += act_size;
 		act_size = sizeof(set_ip6_src);
 		memcpy(nfp_action, &set_ip6_src, act_size);
 		*a_len += act_size;
@@ -626,6 +630,7 @@ nfp_fl_pedit(const struct tc_action *action, struct tc_cls_flower_offload *flow,
 		/* Hardware will automatically fix TCP/UDP checksum. */
 		*csum_updated |= nfp_fl_csum_l4_to_flag(ip_proto);
 	} else if (set_ip6_dst.head.len_lw) {
+		nfp_action += act_size;
 		act_size = sizeof(set_ip6_dst);
 		memcpy(nfp_action, &set_ip6_dst, act_size);
 		*a_len += act_size;
@@ -633,13 +638,16 @@ nfp_fl_pedit(const struct tc_action *action, struct tc_cls_flower_offload *flow,
 		/* Hardware will automatically fix TCP/UDP checksum. */
 		*csum_updated |= nfp_fl_csum_l4_to_flag(ip_proto);
 	} else if (set_ip6_src.head.len_lw) {
+		nfp_action += act_size;
 		act_size = sizeof(set_ip6_src);
 		memcpy(nfp_action, &set_ip6_src, act_size);
 		*a_len += act_size;
 
 		/* Hardware will automatically fix TCP/UDP checksum. */
 		*csum_updated |= nfp_fl_csum_l4_to_flag(ip_proto);
-	} else if (set_tport.head.len_lw) {
+	}
+	if (set_tport.head.len_lw) {
+		nfp_action += act_size;
 		act_size = sizeof(set_tport);
 		memcpy(nfp_action, &set_tport, act_size);
 		*a_len += act_size;
-- 
2.17.1

^ permalink raw reply related

* [PATCH net 3/3] nfp: flower: use offsets provided by pedit instead of index for ipv6
From: Jakub Kicinski @ 2018-10-15 23:52 UTC (permalink / raw)
  To: davem; +Cc: netdev, oss-drivers, Pieter Jansen van Vuuren
In-Reply-To: <20181015235225.17574-1-jakub.kicinski@netronome.com>

From: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>

Previously when populating the set ipv6 address action, we incorrectly
made use of pedit's key index to determine which 32bit word should be
set. We now calculate which word has been selected based on the offset
provided by the pedit action.

Fixes: 354b82bb320e ("nfp: add set ipv6 source and destination address")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../ethernet/netronome/nfp/flower/action.c    | 26 +++++++++++--------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index c39d7fdf73e6..7a1e9cd9cc62 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -450,12 +450,12 @@ nfp_fl_set_ip4(const struct tc_action *action, int idx, u32 off,
 }
 
 static void
-nfp_fl_set_ip6_helper(int opcode_tag, int idx, __be32 exact, __be32 mask,
+nfp_fl_set_ip6_helper(int opcode_tag, u8 word, __be32 exact, __be32 mask,
 		      struct nfp_fl_set_ipv6_addr *ip6)
 {
-	ip6->ipv6[idx % 4].mask |= mask;
-	ip6->ipv6[idx % 4].exact &= ~mask;
-	ip6->ipv6[idx % 4].exact |= exact & mask;
+	ip6->ipv6[word].mask |= mask;
+	ip6->ipv6[word].exact &= ~mask;
+	ip6->ipv6[word].exact |= exact & mask;
 
 	ip6->reserved = cpu_to_be16(0);
 	ip6->head.jump_id = opcode_tag;
@@ -468,6 +468,7 @@ nfp_fl_set_ip6(const struct tc_action *action, int idx, u32 off,
 	       struct nfp_fl_set_ipv6_addr *ip_src)
 {
 	__be32 exact, mask;
+	u8 word;
 
 	/* We are expecting tcf_pedit to return a big endian value */
 	mask = (__force __be32)~tcf_pedit_mask(action, idx);
@@ -476,17 +477,20 @@ nfp_fl_set_ip6(const struct tc_action *action, int idx, u32 off,
 	if (exact & ~mask)
 		return -EOPNOTSUPP;
 
-	if (off < offsetof(struct ipv6hdr, saddr))
+	if (off < offsetof(struct ipv6hdr, saddr)) {
 		return -EOPNOTSUPP;
-	else if (off < offsetof(struct ipv6hdr, daddr))
-		nfp_fl_set_ip6_helper(NFP_FL_ACTION_OPCODE_SET_IPV6_SRC, idx,
+	} else if (off < offsetof(struct ipv6hdr, daddr)) {
+		word = (off - offsetof(struct ipv6hdr, saddr)) / sizeof(exact);
+		nfp_fl_set_ip6_helper(NFP_FL_ACTION_OPCODE_SET_IPV6_SRC, word,
 				      exact, mask, ip_src);
-	else if (off < offsetof(struct ipv6hdr, daddr) +
-		       sizeof(struct in6_addr))
-		nfp_fl_set_ip6_helper(NFP_FL_ACTION_OPCODE_SET_IPV6_DST, idx,
+	} else if (off < offsetof(struct ipv6hdr, daddr) +
+		       sizeof(struct in6_addr)) {
+		word = (off - offsetof(struct ipv6hdr, daddr)) / sizeof(exact);
+		nfp_fl_set_ip6_helper(NFP_FL_ACTION_OPCODE_SET_IPV6_DST, word,
 				      exact, mask, ip_dst);
-	else
+	} else {
 		return -EOPNOTSUPP;
+	}
 
 	return 0;
 }
-- 
2.17.1

^ permalink raw reply related

* pull-request: bpf-next 2018-10-16
From: Daniel Borkmann @ 2018-10-16  0:33 UTC (permalink / raw)
  To: davem; +Cc: daniel, ast, netdev

Hi David,

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Convert BPF sockmap and kTLS to both use a new sk_msg API and enable
   sk_msg BPF integration for the latter, from Daniel and John.

2) Enable BPF syscall side to indicate for maps that they do not support
   a map lookup operation as opposed to just missing key, from Prashant.

3) Add bpftool map create command which after map creation pins the
   map into bpf fs for further processing, from Jakub.

4) Add bpftool support for attaching programs to maps allowing sock_map
   and sock_hash to be used from bpftool, from John.

5) Improve syscall BPF map update/delete path for map-in-map types to
   wait a RCU grace period for pending references to complete, from Daniel.

6) Couple of follow-up fixes for the BPF socket lookup to get it
   enabled also when IPv6 is compiled as a module, from Joe.

7) Fix a generic-XDP bug to handle the case when the Ethernet header
   was mangled and thus update skb's protocol and data, from Jesper.

8) Add a missing BTF header length check between header copies from
   user space, from Wenwen.

9) Minor fixups in libbpf to use __u32 instead u32 types and include
   proper perf_event.h uapi header instead of perf internal one, from Yonghong.

10) Allow to pass user-defined flags through EXTRA_CFLAGS and EXTRA_LDFLAGS
    to bpftool's build, from Jiri.

11) BPF kselftest tweaks to add LWTUNNEL to config fragment and to install
    with_addr.sh script from flow dissector selftest, from Anders.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git

Thanks a lot!

----------------------------------------------------------------

The following changes since commit 071a234ad744ab9a1e9c948874d5f646a2964734:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next (2018-10-08 23:42:44 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git 

for you to fetch changes up to 0b592b5a01bef5416472ec610d3191e019c144a5:

  tools: bpftool: add map create command (2018-10-15 16:39:21 -0700)

----------------------------------------------------------------
Alexei Starovoitov (5):
      Merge branch 'unsupported-map-lookup'
      Merge branch 'xdp-vlan'
      Merge branch 'sockmap_and_ktls'
      Merge branch 'ipv6_sk_lookup_fixes'
      Merge branch 'bpftool_sockmap'

Anders Roxell (2):
      selftests: bpf: add config fragment LWTUNNEL
      selftests: bpf: install script with_addr.sh

Daniel Borkmann (5):
      tcp, ulp: enforce sock_owned_by_me upon ulp init and cleanup
      tcp, ulp: remove ulp bits from sockmap
      bpf, sockmap: convert to generic sk_msg interface
      tls: convert to generic sk_msg interface
      bpf, doc: add maintainers entry to related files

Daniel Colascione (1):
      bpf: wait for running BPF programs when updating map-in-map

Jakub Kicinski (1):
      tools: bpftool: add map create command

Jesper Dangaard Brouer (3):
      net: fix generic XDP to handle if eth header was mangled
      bpf: make TC vlan bpf_helpers avail to selftests
      selftests/bpf: add XDP selftests for modifying and popping VLAN headers

Jiri Olsa (2):
      bpftool: Allow to add compiler flags via EXTRA_CFLAGS variable
      bpftool: Allow add linker flags via EXTRA_LDFLAGS variable

Joe Stringer (3):
      bpf: Fix dev pointer dereference from sk_skb
      bpf: Allow sk_lookup with IPv6 module
      bpf: Fix IPv6 dport byte-order in bpf_sk_lookup

John Fastabend (5):
      tls: replace poll implementation with read hook
      tls: add bpf support to sk_msg handling
      bpf: add tls support for testing in test_sockmap
      bpf: bpftool, add support for attaching programs to maps
      bpf: bpftool, add flag to allow non-compat map definitions

Prashant Bhole (6):
      bpf: error handling when map_lookup_elem isn't supported
      bpf: return EOPNOTSUPP when map lookup isn't supported
      tools/bpf: bpftool, split the function do_dump()
      tools/bpf: bpftool, print strerror when map lookup error occurs
      selftests/bpf: test_verifier, change names of fixup maps
      selftests/bpf: test_verifier, check bpf_map_lookup_elem access in bpf prog

Wenwen Wang (1):
      bpf: btf: Fix a missing check bug

Yonghong Song (1):
      tools/bpf: use proper type and uapi perf_event.h header for libbpf

 MAINTAINERS                                      |   10 +
 include/linux/bpf.h                              |   33 +-
 include/linux/bpf_types.h                        |    2 +-
 include/linux/filter.h                           |   21 -
 include/linux/skmsg.h                            |  410 ++++
 include/net/addrconf.h                           |    5 +
 include/net/sock.h                               |    4 -
 include/net/tcp.h                                |   28 +-
 include/net/tls.h                                |   24 +-
 kernel/bpf/Makefile                              |    5 -
 kernel/bpf/arraymap.c                            |    2 +-
 kernel/bpf/btf.c                                 |    3 +
 kernel/bpf/core.c                                |    2 -
 kernel/bpf/sockmap.c                             | 2629 ----------------------
 kernel/bpf/stackmap.c                            |    2 +-
 kernel/bpf/syscall.c                             |   28 +-
 kernel/bpf/xskmap.c                              |    2 +-
 net/Kconfig                                      |   11 +
 net/core/Makefile                                |    2 +
 net/core/dev.c                                   |   14 +
 net/core/filter.c                                |  290 +--
 net/core/skmsg.c                                 |  802 +++++++
 net/core/sock.c                                  |   61 -
 net/core/sock_map.c                              | 1002 +++++++++
 net/ipv4/Makefile                                |    1 +
 net/ipv4/tcp_bpf.c                               |  655 ++++++
 net/ipv4/tcp_ulp.c                               |   73 +-
 net/ipv6/af_inet6.c                              |    1 +
 net/strparser/Kconfig                            |    4 +-
 net/tls/Kconfig                                  |    1 +
 net/tls/tls_device.c                             |    2 +-
 net/tls/tls_main.c                               |   11 +-
 net/tls/tls_sw.c                                 |  900 +++++---
 tools/bpf/bpftool/Documentation/bpftool-map.rst  |   15 +-
 tools/bpf/bpftool/Documentation/bpftool-prog.rst |   11 +
 tools/bpf/bpftool/Documentation/bpftool.rst      |   10 +-
 tools/bpf/bpftool/Makefile                       |    9 +-
 tools/bpf/bpftool/bash-completion/bpftool        |   59 +-
 tools/bpf/bpftool/common.c                       |   21 +
 tools/bpf/bpftool/main.c                         |    7 +-
 tools/bpf/bpftool/main.h                         |    4 +-
 tools/bpf/bpftool/map.c                          |  212 +-
 tools/bpf/bpftool/prog.c                         |  101 +-
 tools/lib/bpf/Makefile                           |    2 +-
 tools/lib/bpf/bpf.h                              |    3 +
 tools/lib/bpf/libbpf.c                           |   35 +-
 tools/lib/bpf/libbpf.h                           |    2 +
 tools/testing/selftests/bpf/Makefile             |    8 +-
 tools/testing/selftests/bpf/bpf_helpers.h        |    4 +
 tools/testing/selftests/bpf/config               |    1 +
 tools/testing/selftests/bpf/test_sockmap.c       |   89 +
 tools/testing/selftests/bpf/test_verifier.c      |  501 +++--
 tools/testing/selftests/bpf/test_xdp_vlan.c      |  292 +++
 tools/testing/selftests/bpf/test_xdp_vlan.sh     |  195 ++
 54 files changed, 4962 insertions(+), 3659 deletions(-)
 create mode 100644 include/linux/skmsg.h
 delete mode 100644 kernel/bpf/sockmap.c
 create mode 100644 net/core/skmsg.c
 create mode 100644 net/core/sock_map.c
 create mode 100644 net/ipv4/tcp_bpf.c
 create mode 100644 tools/testing/selftests/bpf/test_xdp_vlan.c
 create mode 100755 tools/testing/selftests/bpf/test_xdp_vlan.sh

^ permalink raw reply

* Re: [PATCH ipsec-next] xfrm: use complete IPv6 addresses for hash
From: Steffen Klassert @ 2018-10-16  8:26 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: Herbert Xu, David S. Miller, netdev, linux-kernel
In-Reply-To: <20181012122444.0448FA0ED9@unicorn.suse.cz>

On Fri, Oct 12, 2018 at 02:24:44PM +0200, Michal Kubecek wrote:
> In some environments it is common that many hosts share the same lower half
> of their IPv6 addresses (in particular ::1). As __xfrm6_addr_hash() and
> __xfrm6_daddr_saddr_hash() calculate the hash only from the lower halves,
> as much as 1/3 of the hosts ends up in one hashtable chain which harms the
> performance.
> 
> Use complete IPv6 addresses when calculating the hashes. Rather than just
> adding two more words to the xor, use jhash2() for consistency with
> __xfrm6_pref_hash() and __xfrm6_dpref_spref_hash().
> 
> Signed-off-by: Michal Kubecek <mkubecek@suse.cz>

Applied to ipsec-next, thanks a lot!

^ permalink raw reply

* Re: [PATCH bpf-next v2 7/8] bpf: add tls support for testing in test_sockmap
From: Andrey Ignatov @ 2018-10-16  0:42 UTC (permalink / raw)
  To: Daniel Borkmann, john.fastabend@gmail.com
  Cc: alexei.starovoitov@gmail.com, Dave Watson, netdev@vger.kernel.org
In-Reply-To: <20181013004603.3747-8-daniel@iogearbox.net>

Hi Daniel and John!

Daniel Borkmann <daniel@iogearbox.net> [Fri, 2018-10-12 17:46 -0700]:
> From: John Fastabend <john.fastabend@gmail.com>
> 
> This adds a --ktls option to test_sockmap in order to enable the
> combination of ktls and sockmap to run, which makes for another
> batch of 648 test cases for both in combination.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  tools/testing/selftests/bpf/test_sockmap.c | 89 ++++++++++++++++++++++++++++++
>  1 file changed, 89 insertions(+)
> 
> diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
> index ac7de38..10a5fa8 100644
> --- a/tools/testing/selftests/bpf/test_sockmap.c
> +++ b/tools/testing/selftests/bpf/test_sockmap.c
> @@ -71,6 +71,7 @@ int txmsg_start;
>  int txmsg_end;
>  int txmsg_ingress;
>  int txmsg_skb;
> +int ktls;
>  
>  static const struct option long_options[] = {
>  	{"help",	no_argument,		NULL, 'h' },
> @@ -92,6 +93,7 @@ static const struct option long_options[] = {
>  	{"txmsg_end",	required_argument,	NULL, 'e'},
>  	{"txmsg_ingress", no_argument,		&txmsg_ingress, 1 },
>  	{"txmsg_skb", no_argument,		&txmsg_skb, 1 },
> +	{"ktls", no_argument,			&ktls, 1 },
>  	{0, 0, NULL, 0 }
>  };
>  
> @@ -112,6 +114,76 @@ static void usage(char *argv[])
>  	printf("\n");
>  }
>  
> +#define TCP_ULP 31
> +#define TLS_TX 1
> +#define TLS_RX 2
> +#include <linux/tls.h>

This breaks selftest build for me:
  test_sockmap.c:120:23: fatal error: linux/tls.h: No such file or directory
   #include <linux/tls.h>
                         ^
  compilation terminated.

Should include/uapi/linux/tls.h be copied to tools/ not to depend on
host headers?

> +
> +char *sock_to_string(int s)
> +{
> +	if (s == c1)
> +		return "client1";
> +	else if (s == c2)
> +		return "client2";
> +	else if (s == s1)
> +		return "server1";
> +	else if (s == s2)
> +		return "server2";
> +	else if (s == p1)
> +		return "peer1";
> +	else if (s == p2)
> +		return "peer2";
> +	else
> +		return "unknown";
> +}
> +
> +static int sockmap_init_ktls(int verbose, int s)
> +{
> +	struct tls12_crypto_info_aes_gcm_128 tls_tx = {
> +		.info = {
> +			.version     = TLS_1_2_VERSION,
> +			.cipher_type = TLS_CIPHER_AES_GCM_128,
> +		},
> +	};
> +	struct tls12_crypto_info_aes_gcm_128 tls_rx = {
> +		.info = {
> +			.version     = TLS_1_2_VERSION,
> +			.cipher_type = TLS_CIPHER_AES_GCM_128,
> +		},
> +	};
> +	int so_buf = 6553500;
> +	int err;
> +
> +	err = setsockopt(s, 6, TCP_ULP, "tls", sizeof("tls"));
> +	if (err) {
> +		fprintf(stderr, "setsockopt: TCP_ULP(%s) failed with error %i\n", sock_to_string(s), err);
> +		return -EINVAL;
> +	}
> +	err = setsockopt(s, SOL_TLS, TLS_TX, (void *)&tls_tx, sizeof(tls_tx));
> +	if (err) {
> +		fprintf(stderr, "setsockopt: TLS_TX(%s) failed with error %i\n", sock_to_string(s), err);
> +		return -EINVAL;
> +	}
> +	err = setsockopt(s, SOL_TLS, TLS_RX, (void *)&tls_rx, sizeof(tls_rx));
> +	if (err) {
> +		fprintf(stderr, "setsockopt: TLS_RX(%s) failed with error %i\n", sock_to_string(s), err);
> +		return -EINVAL;
> +	}
> +	err = setsockopt(s, SOL_SOCKET, SO_SNDBUF, &so_buf, sizeof(so_buf));
> +	if (err) {
> +		fprintf(stderr, "setsockopt: (%s) failed sndbuf with error %i\n", sock_to_string(s), err);
> +		return -EINVAL;
> +	}
> +	err = setsockopt(s, SOL_SOCKET, SO_RCVBUF, &so_buf, sizeof(so_buf));
> +	if (err) {
> +		fprintf(stderr, "setsockopt: (%s) failed rcvbuf with error %i\n", sock_to_string(s), err);
> +		return -EINVAL;
> +	}
> +
> +	if (verbose)
> +		fprintf(stdout, "socket(%s) kTLS enabled\n", sock_to_string(s));
> +	return 0;
> +}
>  static int sockmap_init_sockets(int verbose)
>  {
>  	int i, err, one = 1;
> @@ -456,6 +528,21 @@ static int sendmsg_test(struct sockmap_options *opt)
>  	else
>  		rx_fd = p2;
>  
> +	if (ktls) {
> +		/* Redirecting into non-TLS socket which sends into a TLS
> +		 * socket is not a valid test. So in this case lets not
> +		 * enable kTLS but still run the test.
> +		 */
> +		if (!txmsg_redir || (txmsg_redir && txmsg_ingress)) {
> +			err = sockmap_init_ktls(opt->verbose, rx_fd);
> +			if (err)
> +				return err;
> +		}
> +		err = sockmap_init_ktls(opt->verbose, c1);
> +		if (err)
> +			return err;
> +	}
> +
>  	rxpid = fork();
>  	if (rxpid == 0) {
>  		if (opt->drop_expected)
> @@ -907,6 +994,8 @@ static void test_options(char *options)
>  		strncat(options, "ingress,", OPTSTRING);
>  	if (txmsg_skb)
>  		strncat(options, "skb,", OPTSTRING);
> +	if (ktls)
> +		strncat(options, "ktls,", OPTSTRING);
>  }
>  
>  static int __test_exec(int cgrp, int test, struct sockmap_options *opt)
> -- 
> 2.9.5
> 

-- 
Andrey Ignatov

^ permalink raw reply

* Re: [PATCH bpf-next v2 7/8] bpf: add tls support for testing in test_sockmap
From: Daniel Borkmann @ 2018-10-16  0:48 UTC (permalink / raw)
  To: Andrey Ignatov, john.fastabend@gmail.com
  Cc: alexei.starovoitov@gmail.com, Dave Watson, netdev@vger.kernel.org
In-Reply-To: <20181016004243.GA95609@rdna-mbp.dhcp.thefacebook.com>

On 10/16/2018 02:42 AM, Andrey Ignatov wrote:
> Hi Daniel and John!
> 
> Daniel Borkmann <daniel@iogearbox.net> [Fri, 2018-10-12 17:46 -0700]:
>> From: John Fastabend <john.fastabend@gmail.com>
>>
>> This adds a --ktls option to test_sockmap in order to enable the
>> combination of ktls and sockmap to run, which makes for another
>> batch of 648 test cases for both in combination.
>>
>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> ---
>>  tools/testing/selftests/bpf/test_sockmap.c | 89 ++++++++++++++++++++++++++++++
>>  1 file changed, 89 insertions(+)
>>
>> diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
>> index ac7de38..10a5fa8 100644
>> --- a/tools/testing/selftests/bpf/test_sockmap.c
>> +++ b/tools/testing/selftests/bpf/test_sockmap.c
>> @@ -71,6 +71,7 @@ int txmsg_start;
>>  int txmsg_end;
>>  int txmsg_ingress;
>>  int txmsg_skb;
>> +int ktls;
>>  
>>  static const struct option long_options[] = {
>>  	{"help",	no_argument,		NULL, 'h' },
>> @@ -92,6 +93,7 @@ static const struct option long_options[] = {
>>  	{"txmsg_end",	required_argument,	NULL, 'e'},
>>  	{"txmsg_ingress", no_argument,		&txmsg_ingress, 1 },
>>  	{"txmsg_skb", no_argument,		&txmsg_skb, 1 },
>> +	{"ktls", no_argument,			&ktls, 1 },
>>  	{0, 0, NULL, 0 }
>>  };
>>  
>> @@ -112,6 +114,76 @@ static void usage(char *argv[])
>>  	printf("\n");
>>  }
>>  
>> +#define TCP_ULP 31
>> +#define TLS_TX 1
>> +#define TLS_RX 2
>> +#include <linux/tls.h>
> 
> This breaks selftest build for me:
>   test_sockmap.c:120:23: fatal error: linux/tls.h: No such file or directory
>    #include <linux/tls.h>
>                          ^
>   compilation terminated.
> 
> Should include/uapi/linux/tls.h be copied to tools/ not to depend on
> host headers?

Good point, yes, that should happen; will send a fix tomorrow morning.

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH] virtio_net: enable tx after resuming from suspend
From: Jason Wang @ 2018-10-16  8:53 UTC (permalink / raw)
  To: ake
  Cc: Michael S. Tsirkin, David S. Miller, virtualization, netdev,
	linux-kernel
In-Reply-To: <e2baaccc-4ead-e61d-fc1e-d79435012e1c@igel.co.jp>


On 2018/10/15 下午6:08, ake wrote:
>
> On 2018年10月12日 18:18, ake wrote:
>>
>> On 2018年10月12日 17:23, Jason Wang wrote:
>>>
>>> On 2018年10月12日 12:30, ake wrote:
>>>> On 2018年10月11日 22:06, Jason Wang wrote:
>>>>> On 2018年10月11日 18:22, ake wrote:
>>>>>> On 2018年10月11日 18:44, Jason Wang wrote:
>>>>>>> On 2018年10月11日 15:51, Ake Koomsin wrote:
>>>>>>>> commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
>>>>>>>> disabled the virtio tx before going to suspend to avoid a use after
>>>>>>>> free.
>>>>>>>> However, after resuming, it causes the virtio_net device to lose its
>>>>>>>> network connectivity.
>>>>>>>>
>>>>>>>> To solve the issue, we need to enable tx after resuming.
>>>>>>>>
>>>>>>>> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during
>>>>>>>> reset")
>>>>>>>> Signed-off-by: Ake Koomsin <ake@igel.co.jp>
>>>>>>>> ---
>>>>>>>>      drivers/net/virtio_net.c | 1 +
>>>>>>>>      1 file changed, 1 insertion(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>>>> index dab504ec5e50..3453d80f5f81 100644
>>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>>> @@ -2256,6 +2256,7 @@ static int virtnet_restore_up(struct
>>>>>>>> virtio_device *vdev)
>>>>>>>>          }
>>>>>>>>            netif_device_attach(vi->dev);
>>>>>>>> +    netif_start_queue(vi->dev);
>>>>>>> I believe this is duplicated with netif_tx_wake_all_queues() in
>>>>>>> netif_device_attach() above?
>>>>>> Thank you for your review.
>>>>>>
>>>>>> If both netif_tx_wake_all_queues() and netif_start_queue() result in
>>>>>> clearing __QUEUE_STATE_DRV_XOFF, then is it possible that some
>>>>>> conditions in netif_device_attach() is not satisfied?
>>>>> Yes, maybe. One case I can see now is when the device is down, in this
>>>>> case netif_device_attach() won't try to wakeup the queue.
>>>>>
>>>>>>     Without
>>>>>> netif_start_queue(), the virtio_net device does not resume properly
>>>>>> after waking up.
>>>>> How do you trigger the issue? Just do suspend/resume?
>>>> Yes, simply suspend and resume.
>>>>
>>>> Here is how I trigger the issue:
>>>>
>>>> 1) Start the Virtual Machine Manager GUI program.
>>>> 2) Create a guest Linux OS. Make sure that the guest OS kernel is
>>>>      >= 4.12. Make sure that it uses virtio_net as its network device.
>>>>      In addition, make sure that the video adapter is VGA. Otherwise,
>>>>      waking up with the virtual power button does not work.
>>>> 3) After installing the guest OS, log in, and test the network
>>>>      connectivity by ping the host machine.
>>>> 4) Suspend. After this, the screen is blank.
>>>> 5) Resume by hitting the virtual power button. The login screen
>>>>      appears again.
>>>> 6) Log in again. The guest loses its network connection.
>>>>
>>>> In my test:
>>>> Guest: Ubuntu 16.04/18.04 with kernel 4.15.0-36-generic
>>>> Host: Ubuntu 16.04 with kernel 4.15.0-36-generic/4.4.0-137-generic
>>> I can not reproduce this issue if virtio-net interface is up in guest
>>> before the suspend. I'm using net-next.git and qemu master. But I do
>>> reproduce when virtio-net interface is down in guest before suspend,
>>> after resume, even if I make it up, the network is still lost.
>>>
>>> I think the interface is up in your case, but please confirm this.
>> If you mean the interface state before I hit the suspend button,
>> the answer is yes. The interface is up before I suspend the guest
>> machine.
>>
>> Note that my current QEMU version is QEMU emulator version 2.5.0
>> (Debian 1:2.5+dfsg-5ubuntu10.32).
>>
>> I will try with net-next.git and qemu master later and see if I can
>> reproduce the issue.
> Update. I tried with net-next and qemu master. Interestingly, the result
> is different from yours. The network is lost even if the virtio_net
> interface is up before suspending.
>
> Host: Ubuntu 16.04 with net-next kernel (default configuration)
> Guest: Ubuntu 18.04 with net-next kernel (default configuration)
> Qemu: master
> Qemu command:
> qemu-system-x86_64 -cpu host -m 2048 -enable-kvm \
> -bios /usr/share/OVMF/OVMF_CODE.fd \
> -drive file=/var/lib/libvirt/images/virtio_test.qcow2,if=virtio \
> -netdev user,id=hostnet0 \
> -device virtio-net-pci,netdev=hostnet0 \
> -device VGA,id=video0,vgamem_mb=16 \
> -global PIIX4_PM.disable_s3=1 \
> -global PIIX4_PM.disable_s4=1 -monitor stdio


Interesting, just notice you're using userspace network. To isolate the 
issue, can you retry with e.g tap or e1000 to make sure it's not a fault 
of slirp or virito-net?

Thanks

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox