Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next] net: stmmac: Implement logic to automatically select HW Interface
From: David Miller @ 2018-04-23  0:59 UTC (permalink / raw)
  To: Jose.Abreu
  Cc: netdev, Joao.Pinto, Vitor.Soares, peppe.cavallaro,
	alexandre.torgue
In-Reply-To: <9d14a1f97e0b963125c898d571ff44547660e9a9.1524151243.git.joabreu@synopsys.com>

From: Jose Abreu <Jose.Abreu@synopsys.com>
Date: Thu, 19 Apr 2018 16:24:15 +0100

> @@ -0,0 +1,216 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR MIT)
> +// Copyright (c) 2018 Synopsys, Inc. and/or its affiliates.
> +// stmmac HW Interface Handling

Please do not use C++ style comments for anything past the
SPDX identifier line.

Thank you.

^ permalink raw reply

* KASAN: slab-out-of-bounds Read in __sctp_v6_cmp_addr
From: syzbot @ 2018-04-23  1:02 UTC (permalink / raw)
  To: davem, linux-kernel, linux-sctp, netdev, nhorman, syzkaller-bugs,
	vyasevich

Hello,

syzbot hit the following crash on upstream commit
83beed7b2b26f232d782127792dd0cd4362fdc41 (Fri Apr 20 17:56:32 2018 +0000)
Merge branch 'fixes' of  
git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=cd494c1dd681d4d93ebb

So far this crash happened 305 times on net-next, upstream.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=6684817483628544
syzkaller reproducer:  
https://syzkaller.appspot.com/x/repro.syz?id=6321732692475904
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=5381423422767104
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=1808800213120130118
compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+cd494c1dd681d4d93ebb@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.
If you forward the report, please keep this part and the footer.

==================================================================
BUG: KASAN: slab-out-of-bounds in ipv6_addr_equal include/net/ipv6.h:507  
[inline]
BUG: KASAN: slab-out-of-bounds in __sctp_v6_cmp_addr+0x4c7/0x530  
net/sctp/ipv6.c:580
Read of size 8 at addr ffff8801b58626d0 by task syzkaller106428/4452

CPU: 1 PID: 4452 Comm: syzkaller106428 Not tainted 4.17.0-rc1+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
  ipv6_addr_equal include/net/ipv6.h:507 [inline]
  __sctp_v6_cmp_addr+0x4c7/0x530 net/sctp/ipv6.c:580
  sctp_inet6_cmp_addr+0x169/0x1a0 net/sctp/ipv6.c:898
  sctp_bind_addr_conflict+0x28c/0x470 net/sctp/bind_addr.c:368
  sctp_get_port_local+0x9fc/0x1540 net/sctp/socket.c:7515
  sctp_do_bind+0x21c/0x5f0 net/sctp/socket.c:435
  sctp_bindx_add+0x90/0x1a0 net/sctp/socket.c:529
  sctp_setsockopt_bindx+0x2ad/0x320 net/sctp/socket.c:1058
  sctp_setsockopt+0x12c4/0x7000 net/sctp/socket.c:4227
  sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3039
  __sys_setsockopt+0x1bd/0x390 net/socket.c:1903
  __do_sys_setsockopt net/socket.c:1914 [inline]
  __se_sys_setsockopt net/socket.c:1911 [inline]
  __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911
  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x445839
RSP: 002b:00007fbe3f0fdd98 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 00000000006dac24 RCX: 0000000000445839
RDX: 0000000000000064 RSI: 0000000000000084 RDI: 0000000000000004
RBP: 00000000006dac20 R08: 0000000000000010 R09: 000000000000a6fe
R10: 00000000205ba000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffc1404827f R14: 00007fbe3f0fe9c0 R15: 0000000000000003

Allocated by task 4452:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
  __do_kmalloc_node mm/slab.c:3682 [inline]
  __kmalloc_node+0x47/0x70 mm/slab.c:3689
  kmalloc_node include/linux/slab.h:554 [inline]
  kvmalloc_node+0x6b/0x100 mm/util.c:421
  kvmalloc include/linux/mm.h:550 [inline]
  vmemdup_user+0x2d/0xa0 mm/util.c:186
  sctp_setsockopt_bindx+0x5d/0x320 net/sctp/socket.c:1022
  sctp_setsockopt+0x12c4/0x7000 net/sctp/socket.c:4227
  sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3039
  __sys_setsockopt+0x1bd/0x390 net/socket.c:1903
  __do_sys_setsockopt net/socket.c:1914 [inline]
  __se_sys_setsockopt net/socket.c:1911 [inline]
  __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911
  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2818:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
  __cache_free mm/slab.c:3498 [inline]
  kfree+0xd9/0x260 mm/slab.c:3813
  single_release+0x8f/0xb0 fs/seq_file.c:609
  __fput+0x34d/0x890 fs/file_table.c:209
  ____fput+0x15/0x20 fs/file_table.c:243
  task_work_run+0x1e4/0x290 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:191 [inline]
  exit_to_usermode_loop+0x2bd/0x310 arch/x86/entry/common.c:166
  prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
  do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8801b58626c0
  which belongs to the cache kmalloc-32 of size 32
The buggy address is located 16 bytes inside of
  32-byte region [ffff8801b58626c0, ffff8801b58626e0)
The buggy address belongs to the page:
page:ffffea0006d61880 count:1 mapcount:0 mapping:ffff8801b5862000  
index:0xffff8801b5862fc1
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffff8801b5862000 ffff8801b5862fc1 0000000100000032
raw: ffffea0006ddd1e0 ffffea0006dd2860 ffff8801da8001c0 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8801b5862580: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
  ffff8801b5862600: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
> ffff8801b5862680: fb fb fb fb fc fc fc fc 00 00 fc fc fc fc fc fc
                                                  ^
  ffff8801b5862700: 00 00 00 00 fc fc fc fc 00 00 04 fc fc fc fc fc
  ffff8801b5862780: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
==================================================================


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkaller@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is  
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
If you want to test a patch for this bug, please reply with:
#syz test: git://repo/address.git branch
and provide the patch inline or as an attachment.
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.
Note: all commands must start from beginning of the line in the email body.

^ permalink raw reply

* KASAN: null-ptr-deref Read in refcount_inc_not_zero
From: syzbot @ 2018-04-23  1:02 UTC (permalink / raw)
  To: davem, dvlasenk, linux-kernel, netdev, syzkaller-bugs,
	xiaolou4617, xiyou.wangcong

Hello,

syzbot hit the following crash on upstream commit
285848b0f4074f04ab606f1e5dca296482033d54 (Sun Apr 22 04:20:48 2018 +0000)
Merge tag 'random_for_linus_stable' of  
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=6a35cd2d9559c909d570

So far this crash happened 1772 times on upstream.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5975533900791808
syzkaller reproducer:  
https://syzkaller.appspot.com/x/repro.syz?id=4813418829709312
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=5008564225572864
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=1808800213120130118
compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+6a35cd2d9559c909d570@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.
If you forward the report, please keep this part and the footer.

random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
==================================================================
BUG: KASAN: null-ptr-deref in atomic_read  
include/asm-generic/atomic-instrumented.h:21 [inline]
BUG: KASAN: null-ptr-deref in refcount_inc_not_zero+0x8f/0x2d0  
lib/refcount.c:120
Read of size 4 at addr 0000000000000004 by task syzkaller633288/4488

CPU: 0 PID: 4488 Comm: syzkaller633288 Not tainted 4.17.0-rc1+ #12
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  kasan_report_error mm/kasan/report.c:352 [inline]
  kasan_report.cold.7+0x6d/0x2fe mm/kasan/report.c:412
  check_memory_region_inline mm/kasan/kasan.c:260 [inline]
  check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
  kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272
  atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
  refcount_inc_not_zero+0x8f/0x2d0 lib/refcount.c:120
  refcount_inc+0x15/0x70 lib/refcount.c:153
  llc_sap_hold include/net/llc.h:116 [inline]
  llc_ui_release+0xba/0x2b0 net/llc/af_llc.c:207
  sock_release+0x96/0x1b0 net/socket.c:594
  sock_close+0x16/0x20 net/socket.c:1149
  __fput+0x34d/0x890 fs/file_table.c:209
  ____fput+0x15/0x20 fs/file_table.c:243
  task_work_run+0x1e4/0x290 kernel/task_work.c:113
  exit_task_work include/linux/task_work.h:22 [inline]
  do_exit+0x1aee/0x2730 kernel/exit.c:865
  do_group_exit+0x16f/0x430 kernel/exit.c:968
  __do_sys_exit_group kernel/exit.c:979 [inline]
  __se_sys_exit_group kernel/exit.c:977 [inline]
  __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:977
  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x43e878
RSP: 002b:00007ffd854075f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000043e878
RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
RBP: 00000000004be220 R08: 00000000000000e7 R09: ffffffffffffffd0
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00000000006cc160 R14: 0000000000000000 R15: 0000000000000000
==================================================================


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkaller@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is  
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
If you want to test a patch for this bug, please reply with:
#syz test: git://repo/address.git branch
and provide the patch inline or as an attachment.
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.
Note: all commands must start from beginning of the line in the email body.

^ permalink raw reply

* Re: [PATCH] [net] ipv6: sr: fix NULL pointer dereference in seg6_do_srh_encap()- v4 pkts
From: David Miller @ 2018-04-23  1:06 UTC (permalink / raw)
  To: amsalam20; +Cc: dlebrun, kuznet, yoshfuji, netdev, linux-kernel
In-Reply-To: <1524232685-1203-1-git-send-email-amsalam20@gmail.com>

From: Ahmed Abdelsalam <amsalam20@gmail.com>
Date: Fri, 20 Apr 2018 15:58:05 +0200

> In case of seg6 in encap mode, seg6_do_srh_encap() calls set_tun_src()
> in order to set the src addr of outer IPv6 header.
> 
> The net_device is required for set_tun_src(). However calling ip6_dst_idev()
> on dst_entry in case of IPv4 traffic results on the following bug.
> 
> Using just dst->dev should fix this BUG.
 ...
> Fixes: 8936ef7604c11 ipv6: sr: fix NULL pointer dereference when setting encap source address

Please format your Fixes: tag properly next time.  The commit header
text should be enclosed by (" ").  I fixed it up for you this time.

> Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH] hv_netvsc: select needed ucs2_string routine
From: David Miller @ 2018-04-23  1:07 UTC (permalink / raw)
  To: stephen; +Cc: netdev, sthemmin
In-Reply-To: <20180420154847.29476-1-sthemmin@microsoft.com>

From: Stephen Hemminger <stephen@networkplumber.org>
Date: Fri, 20 Apr 2018 08:48:47 -0700

> The conversion of rndis friendly name to utf8 uses a standard
> kernel routine which is optional in config. Therefore build
> would fail for some configurations. Resolve by selecting needed
> library.
> 
> Fixes: 0fe554a46a0f ("hv_netvsc: propogate Hyper-V friendly name into interface alias")
> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>

Please put 'net' or 'net-next' in your Subject lines as appropriate.

I guessed that this is net-next since the Fixes commit only
exists there.

Applied.

^ permalink raw reply

* Re: [PATCH for-rc] uapi: Fix SPDX tags for files referring to the 'OpenIB.org' license
From: David Miller @ 2018-04-23  1:08 UTC (permalink / raw)
  To: jgg
  Cc: linux-rdma, kstewart, pombredanne, gregkh, tglx, swinslow,
	santosh.shilimkar, netdev, linux-kernel, davejwatson
In-Reply-To: <20180420154910.GA2519@ziepe.ca>

From: Jason Gunthorpe <jgg@mellanox.com>
Date: Fri, 20 Apr 2018 09:49:10 -0600

> Based on discussion with Kate Stewart this license is not a
> BSD-2-Clause, but is now formally identified as Linux-OpenIB
> by SPDX.
> 
> The key difference between the licenses is in the 'warranty'
> paragraph.
> 
> if_infiniband.h refers to the 'OpenIB.org' license, but
> does not include the text, instead it links to an obsolete
> web site that contains a license that matches the BSD-2-Clause
> SPX. There is no 'three clause' version of the OpenIB.org
> license.
> 
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Re: [PATCH net] strparser: Do not call mod_delayed_work with a timeout of LONG_MAX
From: David Miller @ 2018-04-23  1:10 UTC (permalink / raw)
  To: doronrk; +Cc: tj, netdev
In-Reply-To: <20180420191111.683209-1-doronrk@fb.com>

From: Doron Roberts-Kedes <doronrk@fb.com>
Date: Fri, 20 Apr 2018 12:11:11 -0700

> struct sock's sk_rcvtimeo is initialized to
> LONG_MAX/MAX_SCHEDULE_TIMEOUT in sock_init_data. Calling
> mod_delayed_work with a timeout of LONG_MAX causes spurious execution of
> the work function. timer->expires is set equal to jiffies + LONG_MAX.
> When timer_base->clk falls behind the current value of jiffies,
> the delta between timer_base->clk and jiffies + LONG_MAX causes the
> expiration to be in the past. Returning early from strp_start_timer if
> timeo == LONG_MAX solves this problem.
> 
> Found while testing net/tls_sw recv path.
> 
> Fixes: 43a0c6751a322847 ("strparser: Stream parser for messages")
> Reviewed-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCHv4 net 0/3] net: sched: ife: malformed ife packet fixes
From: David Miller @ 2018-04-23  1:12 UTC (permalink / raw)
  To: aring; +Cc: yotam.gi, jhs, xiyou.wangcong, jiri, yuvalm, netdev, kernel
In-Reply-To: <20180420191505.27633-1-aring@mojatatu.com>

From: Alexander Aring <aring@mojatatu.com>
Date: Fri, 20 Apr 2018 15:15:02 -0400

> As promised at netdev 2.2 tc workshop I am working on adding scapy support for
> tdc testing. It is still work in progress. I will submit the patches to tdc
> later (they are not in good shape yet). The good news is I have been able to
> find bugs which normal packet testing would not be able to find.
> With fuzzy testing I was able to craft certain malformed packets that IFE
> action was not able to deal with. This patch set fixes those bugs.

Series applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net] ibmvnic: Clean actual number of RX or TX pools
From: David Miller @ 2018-04-23  1:13 UTC (permalink / raw)
  To: tlfalcon; +Cc: netdev, nfont, jallen, linuxppc-dev
In-Reply-To: <1524252332-10272-1-git-send-email-tlfalcon@linux.vnet.ibm.com>

From: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Date: Fri, 20 Apr 2018 14:25:32 -0500

> Avoid using value stored in the login response buffer when
> cleaning TX and RX buffer pools since these could be inconsistent
> depending on the device state. Instead use the field in the driver's
> private data that tracks the number of active pools.
> 
> Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>

Applied.

^ permalink raw reply

* Re: pull-request: bpf 2018-04-21
From: David Miller @ 2018-04-23  1:16 UTC (permalink / raw)
  To: daniel; +Cc: ast, netdev
In-Reply-To: <20180421002224.3881-1-daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Sat, 21 Apr 2018 02:22:24 +0200

> The following pull-request contains BPF updates for your *net* tree.
> 
> The main changes are:
> 
> 1) Fix a deadlock between mm->mmap_sem and bpf_event_mutex when
>    one task is detaching a BPF prog via perf_event_detach_bpf_prog()
>    and another one dumping through bpf_prog_array_copy_info(). For
>    the latter we move the copy_to_user() out of the bpf_event_mutex
>    lock to fix it, from Yonghong.
> 
> 2) Fix test_sock and test_sock_addr.sh failures. The former was
>    hitting rlimit issues and the latter required ping to specify
>    the address family, from Yonghong.
> 
> 3) Remove a dead check in sockmap's sock_map_alloc(), from Jann.
> 
> 4) Add generated files to BPF kselftests gitignore that were previously
>    missed, from Anders.
> 
> Please consider pulling these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Pulled, thanks Daniel.

^ permalink raw reply

* [PATCH net] ipv6: add RTA_TABLE and RTA_PREFSRC to rtm_ipv6_policy
From: Eric Dumazet @ 2018-04-23  1:29 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

KMSAN reported use of uninit-value that I tracked to lack
of proper size check on RTA_TABLE attribute.

I also believe RTA_PREFSRC lacks a similar check.

Fixes: 86872cb57925 ("[IPv6] route: FIB6 configuration using struct fib6_config")
Fixes: c3968a857a6b ("ipv6: RTA_PREFSRC support for ipv6 route source address selection")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
---
 net/ipv6/route.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 49b954d6d0fa44ea0c4427e2918b3ab9c1610fe0..cde7d8251377c1a115e02c46843d361d3c0b4313 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3975,6 +3975,7 @@ void rt6_mtu_change(struct net_device *dev, unsigned int mtu)
 
 static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] = {
 	[RTA_GATEWAY]           = { .len = sizeof(struct in6_addr) },
+	[RTA_PREFSRC]		= { .len = sizeof(struct in6_addr) },
 	[RTA_OIF]               = { .type = NLA_U32 },
 	[RTA_IIF]		= { .type = NLA_U32 },
 	[RTA_PRIORITY]          = { .type = NLA_U32 },
@@ -3986,6 +3987,7 @@ static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] = {
 	[RTA_EXPIRES]		= { .type = NLA_U32 },
 	[RTA_UID]		= { .type = NLA_U32 },
 	[RTA_MARK]		= { .type = NLA_U32 },
+	[RTA_TABLE]		= { .type = NLA_U32 },
 };
 
 static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related

* Re: [PATCH 2/6] rhashtable: remove incorrect comment on r{hl, hash}table_walk_enter()
From: NeilBrown @ 2018-04-23  1:39 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Thomas Graf, netdev, linux-kernel
In-Reply-To: <20180419032237.yjvjuh6n7n6tggtr@gondor.apana.org.au>

[-- Attachment #1: Type: text/plain, Size: 3166 bytes --]

On Thu, Apr 19 2018, Herbert Xu wrote:

> On Thu, Apr 19, 2018 at 08:56:28AM +1000, NeilBrown wrote:
>>
>> I don't want to do that - I just want the documentation to be correct
>> (or at least, not be blatantly incorrect).  The function does not sleep,
>> and is safe to call with spin locks held.
>> Do we need to spell out when it can be called?  If so, maybe:
>> 
>>    This function may be called from any process context, including
>>    non-preemptable context, but cannot be called from interrupts.
>
> Just to make it perfectly clear, how about "cannot be called from
> softirq or hardirq context"? Previously the not able to sleep part
> completely ruled out any ambiguity but the new wording could confuse
> people into thinking that this can be called from softirq context
> where it would be unsafe if mixed with process context usage.
>

Sound fair.  Could you Ack the following?  Then I'll resend all the
patches that have an ack.
I've realised that the "further improve stability of rhashtable_walk"
patch isn't actually complete, so I'll withdraw that for now.

Thanks,
NeilBrown

From e16c037398b6c057787437f3a0aaa7fd44c3bee3 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.com>
Date: Mon, 16 Apr 2018 11:05:39 +1000
Subject: [PATCH] rhashtable: Revise incorrect comment on
 r{hl,hash}table_walk_enter()

Neither rhashtable_walk_enter() or rhltable_walk_enter() sleep, though
they do take a spinlock without irq protection.
So revise the comments to accurately state the contexts in which
these functions can be called.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 include/linux/rhashtable.h | 5 +++--
 lib/rhashtable.c           | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 87d443a5b11d..4e1f535c2034 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -1268,8 +1268,9 @@ static inline int rhashtable_walk_init(struct rhashtable *ht,
  * For a completely stable walk you should construct your own data
  * structure outside the hash table.
  *
- * This function may sleep so you must not call it from interrupt
- * context or with spin locks held.
+ * This function may be called from any process context, including
+ * non-preemptable context, but cannot be called from softirq or
+ * hardirq context.
  *
  * You must call rhashtable_walk_exit after this function returns.
  */
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 2b2b79974b61..6d490f51174e 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -668,8 +668,9 @@ EXPORT_SYMBOL_GPL(rhashtable_insert_slow);
  * For a completely stable walk you should construct your own data
  * structure outside the hash table.
  *
- * This function may sleep so you must not call it from interrupt
- * context or with spin locks held.
+ * This function may be called from any process context, including
+ * non-preemptable context, but cannot be called from softirq or
+ * hardirq context.
  *
  * You must call rhashtable_walk_exit after this function returns.
  */
-- 
2.14.0.rc0.dirty


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply related

* Re: [PATCH 1/6] rhashtable: remove outdated comments about grow_decision etc
From: NeilBrown @ 2018-04-23  1:41 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, tgraf, netdev, linux-kernel
In-Reply-To: <20180418.214737.1146004592657647139.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 1134 bytes --]

On Wed, Apr 18 2018, David Miller wrote:

> From: NeilBrown <neilb@suse.com>
> Date: Thu, 19 Apr 2018 09:09:05 +1000
>
>> On Wed, Apr 18 2018, Herbert Xu wrote:
>> 
>>> On Wed, Apr 18, 2018 at 04:47:01PM +1000, NeilBrown wrote:
>>>> grow_decision and shink_decision no longer exist, so remove
>>>> the remaining references to them.
>>>> 
>>>> Signed-off-by: NeilBrown <neilb@suse.com>
>>>
>>> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
>> 
>> Thanks.  Is that Ack sufficient for this patch to go upstream, or is
>> there something else that I need to do?
>
> One patch being ACK'd does not release the whole series to be applied
> and the whole series will be treated as a complete unit for that
> purpose.
>
> So if discussion is holding up one patch in the series, it holds up
> the entire series.
>
> So get the entire series in acceptable condition, or submit only one
> change at a time individually and wait for that one to be accepted
> before you submit and ask for feedback on the next one.
>
> I hope that makes things clear for you.

Yes, nice and clear.  Thanks a lot!

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 2/2] netns: isolate seqnums to use per-netns locks
From: kbuild test robot @ 2018-04-23  2:39 UTC (permalink / raw)
  To: Christian Brauner
  Cc: kbuild-all, ebiederm, davem, netdev, linux-kernel, avagin, ktkhai,
	serge, gregkh, Christian Brauner
In-Reply-To: <20180418152106.18519-3-christian.brauner@ubuntu.com>

[-- Attachment #1: Type: text/plain, Size: 966 bytes --]

Hi Christian,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Christian-Brauner/netns-uevent-performance-tweaks/20180420-013717
config: alpha-alldefconfig (attached as .config)
compiler: alpha-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=alpha 

All errors (new ones prefixed by >>):

   kernel/ksysfs.o: In function `uevent_seqnum_show':
>> (.text+0x18c): undefined reference to `get_ns_uevent_seqnum_by_vpid'
   (.text+0x19c): undefined reference to `get_ns_uevent_seqnum_by_vpid'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6435 bytes --]

^ permalink raw reply

* Re: [PATCH bpf-next v3 3/9] bpf/verifier: refine retval R0 state for bpf_get_stack helper
From: Yonghong Song @ 2018-04-23  2:46 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180422235538.5tqayfahfeqanfou@ast-mbp>



On 4/22/18 4:55 PM, Alexei Starovoitov wrote:
> On Fri, Apr 20, 2018 at 03:18:36PM -0700, Yonghong Song wrote:
>> The special property of return values for helpers bpf_get_stack
>> and bpf_probe_read_str are captured in verifier.
>> Both helpers return a negative error code or
>> a length, which is equal to or smaller than the buffer
>> size argument. This additional information in the
>> verifier can avoid the condition such as "retval > bufsize"
>> in the bpf program. For example, for the code blow,
>>      usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
>>      if (usize < 0 || usize > max_len)
>>          return 0;
>> The verifier may have the following errors:
>>      52: (85) call bpf_get_stack#65
>>       R0=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R1_w=ctx(id=0,off=0,imm=0)
>>       R2_w=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R3_w=inv800 R4_w=inv256
>>       R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
>>       R9_w=inv800 R10=fp0,call_-1
>>      53: (bf) r8 = r0
>>      54: (bf) r1 = r8
>>      55: (67) r1 <<= 32
>>      56: (bf) r2 = r1
>>      57: (77) r2 >>= 32
>>      58: (25) if r2 > 0x31f goto pc+33
>>       R0=inv(id=0) R1=inv(id=0,smax_value=9223372032559808512,
>>                           umax_value=18446744069414584320,
>>                           var_off=(0x0; 0xffffffff00000000))
>>       R2=inv(id=0,umax_value=799,var_off=(0x0; 0x3ff))
>>       R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
>>       R8=inv(id=0) R9=inv800 R10=fp0,call_-1
>>      59: (1f) r9 -= r8
>>      60: (c7) r1 s>>= 32
>>      61: (bf) r2 = r7
>>      62: (0f) r2 += r1
>>      math between map_value pointer and register with unbounded
>>      min value is not allowed
>> The failure is due to llvm compiler optimization where register "r2",
>> which is a copy of "r1", is tested for condition while later on "r1"
>> is used for map_ptr operation. The verifier is not able to track such
>> inst sequence effectively.
>>
>> Without the "usize > max_len" condition, there is no llvm optimization
>> and the below generated code passed verifier:
>>      52: (85) call bpf_get_stack#65
>>       R0=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R1_w=ctx(id=0,off=0,imm=0)
>>       R2_w=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R3_w=inv800 R4_w=inv256
>>       R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
>>       R9_w=inv800 R10=fp0,call_-1
>>      53: (b7) r1 = 0
>>      54: (bf) r8 = r0
>>      55: (67) r8 <<= 32
>>      56: (c7) r8 s>>= 32
>>      57: (6d) if r1 s> r8 goto pc+24
>>       R0=inv(id=0,umax_value=800) R1=inv0 R6=ctx(id=0,off=0,imm=0)
>>       R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
>>       R8=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff)) R9=inv800
>>       R10=fp0,call_-1
>>      58: (bf) r2 = r7
>>      59: (0f) r2 += r8
>>      60: (1f) r9 -= r8
>>      61: (bf) r1 = r6
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   kernel/bpf/verifier.c | 27 +++++++++++++++++++++++++++
>>   1 file changed, 27 insertions(+)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index aba9425..3c8bb92 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -164,6 +164,8 @@ struct bpf_call_arg_meta {
>>   	bool pkt_access;
>>   	int regno;
>>   	int access_size;
>> +	s64 msize_smax_value;
>> +	u64 msize_umax_value;
>>   };
>>   
>>   static DEFINE_MUTEX(bpf_verifier_lock);
>> @@ -2027,6 +2029,14 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
>>   		err = check_helper_mem_access(env, regno - 1,
>>   					      reg->umax_value,
>>   					      zero_size_allowed, meta);
>> +
>> +		if (!err && !!meta) {
> 
> Please drop !! in the above.
> 
> Also what happens when
> if (!tnum_is_const(reg->var_off))
>    meta = NULL;
> ?
> it seems two new fields of meta will stay zero initialized
> that later do_refine_retval_range() will set R0->umax_value = 0
> which seems incorrect.

Thanks for catching this. In do_refine_retval_range(), if meta is NULL,
the function should just return. Otherwise, a page fault will happen.

> 
>> +			/* remember the mem_size which may be used later
>> +			 * to refine return values.
>> +			 */
>> +			meta->msize_smax_value = reg->smax_value;
>> +			meta->msize_umax_value = reg->umax_value;
>> +		}
>>   	}
>>   
>>   	return err;
>> @@ -2333,6 +2343,21 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
>>   	return 0;
>>   }
>>   
>> +static void do_refine_retval_range(struct bpf_reg_state *regs, int ret_type,
>> +				   int func_id,
>> +				   struct bpf_call_arg_meta *meta)
>> +{
>> +	struct bpf_reg_state *ret_reg = &regs[BPF_REG_0];
>> +
>> +	if (ret_type != RET_INTEGER ||
>> +	    (func_id != BPF_FUNC_get_stack &&
>> +	     func_id != BPF_FUNC_probe_read_str))
>> +		return;
>> +
>> +	ret_reg->smax_value = meta->msize_smax_value;
>> +	ret_reg->umax_value = meta->msize_umax_value;
>> +}
>> +
>>   static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn_idx)
>>   {
>>   	const struct bpf_func_proto *fn = NULL;
>> @@ -2456,6 +2481,8 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
>>   		return -EINVAL;
>>   	}
>>   
>> +	do_refine_retval_range(regs, fn->ret_type, func_id, &meta);
>> +
>>   	err = check_map_func_compatibility(env, meta.map_ptr, func_id);
>>   	if (err)
>>   		return err;
>> -- 
>> 2.9.5
>>

^ permalink raw reply

* Re: [PATCH bpf-next v3 4/9] bpf/verifier: improve register value range tracking with ARSH
From: Yonghong Song @ 2018-04-23  2:49 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180423001615.wlxnlp6xdquzrntt@ast-mbp>



On 4/22/18 5:16 PM, Alexei Starovoitov wrote:
> On Fri, Apr 20, 2018 at 03:18:37PM -0700, Yonghong Song wrote:
>> When helpers like bpf_get_stack returns an int value
>> and later on used for arithmetic computation, the LSH and ARSH
>> operations are often required to get proper sign extension into
>> 64-bit. For example, without this patch:
>>      54: R0=inv(id=0,umax_value=800)
>>      54: (bf) r8 = r0
>>      55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
>>      55: (67) r8 <<= 32
>>      56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
>>      56: (c7) r8 s>>= 32
>>      57: R8=inv(id=0)
>> With this patch:
>>      54: R0=inv(id=0,umax_value=800)
>>      54: (bf) r8 = r0
>>      55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
>>      55: (67) r8 <<= 32
>>      56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
>>      56: (c7) r8 s>>= 32
>>      57: R8=inv(id=0, umax_value=800,var_off=(0x0; 0x3ff))
>> With better range of "R8", later on when "R8" is added to other register,
>> e.g., a map pointer or scalar-value register, the better register
>> range can be derived and verifier failure may be avoided.
>>
>> In our later example,
>>      ......
>>      usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
>>      if (usize < 0)
>>          return 0;
>>      ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
>>      ......
>> Without improving ARSH value range tracking, the register representing
>> "max_len - usize" will have smin_value equal to S64_MIN and will be
>> rejected by verifier.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   kernel/bpf/verifier.c | 26 ++++++++++++++++++++++++++
>>   1 file changed, 26 insertions(+)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 3c8bb92..01c215d 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -2975,6 +2975,32 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
>>   		/* We may learn something more from the var_off */
>>   		__update_reg_bounds(dst_reg);
>>   		break;
>> +	case BPF_ARSH:
>> +		if (umax_val >= insn_bitness) {
>> +			/* Shifts greater than 31 or 63 are undefined.
>> +			 * This includes shifts by a negative number.
>> +			 */
>> +			mark_reg_unknown(env, regs, insn->dst_reg);
>> +			break;
>> +		}
>> +		if (dst_reg->smin_value < 0)
>> +			dst_reg->smin_value >>= umin_val;
>> +		else
>> +			dst_reg->smin_value >>= umax_val;
>> +		if (dst_reg->smax_value < 0)
>> +			dst_reg->smax_value >>= umax_val;
>> +		else
>> +			dst_reg->smax_value >>= umin_val;
>> +		if (src_known)
>> +			dst_reg->var_off = tnum_rshift(dst_reg->var_off,
>> +						       umin_val);
>> +		else
>> +			dst_reg->var_off = tnum_rshift(tnum_unknown, umin_val);
>> +		dst_reg->umin_value >>= umax_val;
>> +		dst_reg->umax_value >>= umin_val;
>> +		/* We may learn something more from the var_off */
>> +		__update_reg_bounds(dst_reg);
> 
> I'm struggling to understand how these bounds are computed.
> Could you add examples in the comments?

Okay, let me try to add some comments for better understanding.

> In particular if dst_reg is unknown (tnum.mask == -1)
> the above tnum_rshift() will clear upper bits and will make it
> 64-bit positive, but that doesn't seem correct.
> What am I missing?

Considering this is arith shift, we probably should just have
dst_reg->var_off = tnum_unknown to be conservative.

I could miss something here as well. Let me try to write more
detailed explanation, hopefully to cover all corner cases.

^ permalink raw reply

* bpf: samples/bpf/sockex2:  Assertion `setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, prog_fd, sizeof(prog_fd[0])) == 0' failed.
From: Wang Sheng-Hui @ 2018-04-23  2:56 UTC (permalink / raw)
  To: ast, daniel, netdev

Sorry to trouble you!

I run samples/bpf/sockex2 failed with 
"Assertion `setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, prog_fd, sizeof(prog_fd[0])) == 0' failed."

Then I run " strace ./sockex2" and got:
...
bind(3, {sa_family=AF_PACKET, sll_protocol=htons(ETH_P_ALL), sll_ifindex=if_nametoindex("lo"), sll_hatype=ARPHRD_NETROM, sll_pkttype=PACKET_HOST, sll_halen=0}, 20) = 0
setsockopt(3, SOL_SOCKET, SO_ATTACH_BPF, [0], 4) = -1 EINVAL (Invalid argument)
write(2, "sockex2: /root/linux/samples/bpf"..., 156sockex2: /root/linux/samples/bpf/sockex2_user.c:35: main: Assertion `setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, prog_fd, sizeof(prog_fd[0])) == 0' failed.
) = 156
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb8ec4bf000
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
getpid()                                = 3513
gettid()                                = 3513
tgkill(3513, 3513, SIGABRT)             = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=3513, si_uid=0} ---
+++ killed by SIGABRT +++
Aborted


I tried to find out the root cause but failed:
sockex2 is similar with sockex1, except it run with large bpf code and hash map.
Both can pass the program load phase and open raw socket, so I think there is nothing
wrong with memcharge and other checks.
While sockex1 can work, sockext2 failed with BPF attach to socket.
I read though the sock attach code path and didn't find any specific check to report failures. 

Can someone help to fix the failure, please?

Thanks,
shenghui

^ permalink raw reply

* Re: [PATCH bpf-next v3 8/9] tools/bpf: add a test for bpf_get_stack with raw tracepoint prog
From: Yonghong Song @ 2018-04-23  2:57 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180423002301.lnhdn6oueweqztvb@ast-mbp>



On 4/22/18 5:23 PM, Alexei Starovoitov wrote:
> On Fri, Apr 20, 2018 at 03:18:41PM -0700, Yonghong Song wrote:
>> The test attached a raw_tracepoint program to sched/sched_switch.
>> It tested to get stack for user space, kernel space and user
>> space with build_id request. It also tested to get user
>> and kernel stack into the same buffer with back-to-back
>> bpf_get_stack helper calls.
>>
>> Whenever the kernel stack is available, the user space
>> application will check to ensure that the kernel function
>> for raw_tracepoint ___bpf_prog_run is part of the stack.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   tools/testing/selftests/bpf/Makefile               |   3 +-
>>   tools/testing/selftests/bpf/test_get_stack_rawtp.c | 102 ++++++++++++++++++
>>   tools/testing/selftests/bpf/test_progs.c           | 115 +++++++++++++++++++++
>>   3 files changed, 219 insertions(+), 1 deletion(-)
>>   create mode 100644 tools/testing/selftests/bpf/test_get_stack_rawtp.c
>>
>> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
>> index 0b72cc7..54e9e74 100644
>> --- a/tools/testing/selftests/bpf/Makefile
>> +++ b/tools/testing/selftests/bpf/Makefile
>> @@ -32,7 +32,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
>>   	test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o \
>>   	sample_map_ret0.o test_tcpbpf_kern.o test_stacktrace_build_id.o \
>>   	sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o test_adjust_tail.o \
>> -	test_btf_haskv.o test_btf_nokv.o
>> +	test_btf_haskv.o test_btf_nokv.o test_get_stack_rawtp.o
>>   
>>   # Order correspond to 'make run_tests' order
>>   TEST_PROGS := test_kmod.sh \
>> @@ -56,6 +56,7 @@ $(TEST_GEN_PROGS_EXTENDED): $(OUTPUT)/libbpf.a
>>   $(OUTPUT)/test_dev_cgroup: cgroup_helpers.c
>>   $(OUTPUT)/test_sock: cgroup_helpers.c
>>   $(OUTPUT)/test_sock_addr: cgroup_helpers.c
>> +$(OUTPUT)/test_progs: trace_helpers.c
>>   
>>   .PHONY: force
>>   
>> diff --git a/tools/testing/selftests/bpf/test_get_stack_rawtp.c b/tools/testing/selftests/bpf/test_get_stack_rawtp.c
>> new file mode 100644
>> index 0000000..ba1dcf9
>> --- /dev/null
>> +++ b/tools/testing/selftests/bpf/test_get_stack_rawtp.c
>> @@ -0,0 +1,102 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +#include <linux/bpf.h>
>> +#include "bpf_helpers.h"
>> +
>> +/* Permit pretty deep stack traces */
>> +#define MAX_STACK_RAWTP 100
>> +struct stack_trace_t {
>> +	int pid;
>> +	int kern_stack_size;
>> +	int user_stack_size;
>> +	int user_stack_buildid_size;
>> +	__u64 kern_stack[MAX_STACK_RAWTP];
>> +	__u64 user_stack[MAX_STACK_RAWTP];
>> +	struct bpf_stack_build_id user_stack_buildid[MAX_STACK_RAWTP];
>> +};
>> +
>> +struct bpf_map_def SEC("maps") perfmap = {
>> +	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
>> +	.key_size = sizeof(int),
>> +	.value_size = sizeof(__u32),
>> +	.max_entries = 2,
>> +};
>> +
>> +struct bpf_map_def SEC("maps") stackdata_map = {
>> +	.type = BPF_MAP_TYPE_PERCPU_ARRAY,
>> +	.key_size = sizeof(__u32),
>> +	.value_size = sizeof(struct stack_trace_t),
>> +	.max_entries = 1,
>> +};
>> +
>> +/* Allocate per-cpu space twice the needed. For the code below
>> + *   usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
>> + *   if (usize < 0)
>> + *     return 0;
>> + *   ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
>> + *
>> + * If we have value_size = MAX_STACK_RAWTP * sizeof(__u64),
>> + * verifier will complain that access "raw_data + usize"
>> + * with size "max_len - usize" may be out of bound.
>> + * The maximum "raw_data + usize" is "raw_data + max_len"
>> + * and the maximum "max_len - usize" is "max_len", verifier
>> + * concludes that the maximum buffer access range is
>> + * "raw_data[0...max_len * 2 - 1]" and hence reject the program.
>> + *
>> + * Doubling the to-be-used max buffer size can fix this verifier
>> + * issue and avoid complicated C programming massaging.
>> + * This is an acceptable workaround since there is one entry here.
>> + */
>> +struct bpf_map_def SEC("maps") rawdata_map = {
>> +	.type = BPF_MAP_TYPE_PERCPU_ARRAY,
>> +	.key_size = sizeof(__u32),
>> +	.value_size = MAX_STACK_RAWTP * sizeof(__u64) * 2,
>> +	.max_entries = 1,
>> +};
>> +
>> +SEC("tracepoint/sched/sched_switch")
>> +int bpf_prog1(void *ctx)
>> +{
>> +	int max_len, max_buildid_len, usize, ksize, total_size;
>> +	struct stack_trace_t *data;
>> +	void *raw_data;
>> +	__u32 key = 0;
>> +
>> +	data = bpf_map_lookup_elem(&stackdata_map, &key);
>> +	if (!data)
>> +		return 0;
>> +
>> +	max_len = MAX_STACK_RAWTP * sizeof(__u64);
>> +	max_buildid_len = MAX_STACK_RAWTP * sizeof(struct bpf_stack_build_id);
>> +	data->pid = bpf_get_current_pid_tgid();
>> +	data->kern_stack_size = bpf_get_stack(ctx, data->kern_stack,
>> +					      max_len, 0);
>> +	data->user_stack_size = bpf_get_stack(ctx, data->user_stack, max_len,
>> +					    BPF_F_USER_STACK);
>> +	data->user_stack_buildid_size = bpf_get_stack(
>> +		ctx, data->user_stack_buildid, max_buildid_len,
>> +		BPF_F_USER_STACK | BPF_F_USER_BUILD_ID);
>> +	bpf_perf_event_output(ctx, &perfmap, 0, data, sizeof(*data));
>> +
>> +	/* write both kernel and user stacks to the same buffer */
>> +	raw_data = bpf_map_lookup_elem(&rawdata_map, &key);
>> +	if (!raw_data)
>> +		return 0;
>> +
>> +	usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
>> +	if (usize < 0)
>> +		return 0;
>> +
>> +	ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
>> +	if (ksize < 0)
> 
> may be instead of teaching verifier about ARSH (which doesn't look
> straighforward) such use case can be done as:
> u32 max_len, usize, ksize;
> ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
> if ((int)ksize < 0)

Just tried, it does not work. The compiler generates code like:

47: (b7) r9 = 800
48: (bf) r1 = r6
49: (bf) r2 = r7
50: (b7) r3 = 800
51: (b7) r4 = 256
52: (85) call bpf_get_stack#66
  R0=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R1_w=ctx(id=0,off=0,imm=0) 
R2_w=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R3_w=inv800 R4_w=inv256 
R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0) 
R9_w=inv800 R10=fp0,call_-1
53: (b7) r1 = 0
54: (bf) r8 = r0
55: (67) r8 <<= 32
56: (c7) r8 s>>= 32
57: (6d) if r1 s> r8 goto pc+27
  R0=inv(id=0,umax_value=800) R1=inv0 R6=ctx(id=0,off=0,imm=0) 
R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0) 
R8=inv(id=0,umax_value=9223372036854775807,var_off=(0x0; 
0x7fffffffffffffff)) R9=inv800 R10=fp0,call_-1
58: (1f) r9 -= r8
59: (bf) r1 = r8
60: (67) r1 <<= 32
61: (77) r1 >>= 32
62: (bf) r2 = r7
63: (0f) r2 += r1
64: (bf) r1 = r6
65: (bf) r3 = r9
66: (b7) r4 = 0
67: (85) call bpf_get_stack#66
R3 min value is negative, either use unsigned or 'var &= const'

Basically, the compiler does lsh/arsh for "int" value to do the 
comparison and then get this value does a lsh/rsh.
So it looks like we still need arsh.

> 
> That's certainly suboptimal and very much non obvious to program
> developers, but at least it can unblock the bpf_get_stack part
> landing and proper ARSH support can be added later?
> Just a thought.
> 
>> +		return 0;
>> +
>> +	total_size = usize + ksize;
>> +	if (total_size > 0 && total_size <= max_len)
>> +		bpf_perf_event_output(ctx, &perfmap, 0, raw_data, total_size);
>> +
>> +	return 0;
>> +}
> 
> the rest of the test looks great. Thank you for adding such exhaustive test.
> 

^ permalink raw reply

* Re: [PATCH bpf-next v3 9/9] tools/bpf: add a test for bpf_get_stack with tracepoint prog
From: Yonghong Song @ 2018-04-23  2:58 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180423002732.6fw45mevsz3bixkq@ast-mbp>



On 4/22/18 5:27 PM, Alexei Starovoitov wrote:
> On Fri, Apr 20, 2018 at 03:18:42PM -0700, Yonghong Song wrote:
>> The test_stacktrace_map and test_stacktrace_build_id are
>> enhanced to call bpf_get_stack in the helper to get the
>> stack trace as well.  The stack traces from bpf_get_stack
>> and bpf_get_stackid are compared to ensure that for the
>> same stack as represented as the same hash, their ip addresses
>> or build id's must be the same.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   tools/testing/selftests/bpf/test_progs.c           | 63 +++++++++++++++++++---
>>   .../selftests/bpf/test_stacktrace_build_id.c       | 20 ++++++-
>>   tools/testing/selftests/bpf/test_stacktrace_map.c  | 20 +++++--
>>   3 files changed, 92 insertions(+), 11 deletions(-)
>>
>> diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
>> index dad4c3f..06b922a 100644
>> --- a/tools/testing/selftests/bpf/test_progs.c
>> +++ b/tools/testing/selftests/bpf/test_progs.c
>> @@ -897,11 +897,40 @@ static int compare_map_keys(int map1_fd, int map2_fd)
>>   	return 0;
>>   }
>>   
>> +static int compare_stack_ips(int smap_fd, int amap_fd, int stack_trace_len)
>> +{
>> +	__u32 key, next_key, *cur_key_p, *next_key_p;
>> +	char val_buf1[stack_trace_len], val_buf2[stack_trace_len];
> 
> the kernel is trying to get rid of VLAs.
> test_progs.c already uses them, but if possible let's not
> add more uses of them.

okay, try to get rid of these two VLAs.

> Other than that looks great.

^ permalink raw reply

* [PATCH bpf-next] bpf: fix virtio-net's length calc for XDP_PASS
From: Nikita V. Shirokov @ 2018-04-23  4:16 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Michael S. Tsirkin,
	Jason Wang
  Cc: netdev, David Ahern, Nikita V. Shirokov

In commit 6870de435b90 ("bpf: make virtio compatible w/ 
bpf_xdp_adjust_tail") i didn't account for vi->hdr_len during new 
packet's length calculation after bpf_prog_run in receive_mergeable. 
because of this all packets, if they were passed to the kernel, 
were truncated by 12 bytes.

Fixes:6870de435b90 ("bpf: make virtio compatible w/ bpf_xdp_adjust_tail")
Reported-by: David Ahern <dsahern@gmail.com>

Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com>
---

Notes:
    unfortunately it looks like that xdp_tx is still broken because
    fix by Jason (introduced in "XDP_TX for virtio_net not working in recent kernel?
    " thread) haven't landed yet)

 drivers/net/virtio_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 779a4f798522..08ac2cc986aa 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -761,7 +761,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 			/* recalculate len if xdp.data or xdp.data_end were
 			 * adjusted
 			 */
-			len = xdp.data_end - xdp.data;
+			len = xdp.data_end - xdp.data + vi->hdr_len;
 			/* We can only create skb based on xdp_page. */
 			if (unlikely(xdp_page != page)) {
 				rcu_read_unlock();
-- 
2.15.1

^ permalink raw reply related

* Re: [PATCH net] team: check team dev npinfo when adding a port only
From: kbuild test robot @ 2018-04-23  4:17 UTC (permalink / raw)
  To: Xin Long; +Cc: kbuild-all, network dev, davem, Jiri Pirko, stephen hemminger
In-Reply-To: <ac96d2737077b41d1e7cd68164d881faa18f413f.1524395280.git.lucien.xin@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 6108 bytes --]

Hi Xin,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net/master]

url:    https://github.com/0day-ci/linux/commits/Xin-Long/team-check-team-dev-npinfo-when-adding-a-port-only/20180423-114310
config: x86_64-randconfig-x011-201816 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/net/team/team.c: In function 'team_port_add':
>> drivers/net/team/team.c:1221:17: error: 'struct net_device' has no member named 'npinfo'; did you mean 'vlan_info'?
     if (team->dev->npinfo) {
                    ^~~~~~
                    vlan_info

vim +1221 drivers/net/team/team.c

  1136	
  1137	static void __team_port_change_port_added(struct team_port *port, bool linkup);
  1138	static int team_dev_type_check_change(struct net_device *dev,
  1139					      struct net_device *port_dev);
  1140	
  1141	static int team_port_add(struct team *team, struct net_device *port_dev,
  1142				 struct netlink_ext_ack *extack)
  1143	{
  1144		struct net_device *dev = team->dev;
  1145		struct team_port *port;
  1146		char *portname = port_dev->name;
  1147		int err;
  1148	
  1149		if (port_dev->flags & IFF_LOOPBACK) {
  1150			NL_SET_ERR_MSG(extack, "Loopback device can't be added as a team port");
  1151			netdev_err(dev, "Device %s is loopback device. Loopback devices can't be added as a team port\n",
  1152				   portname);
  1153			return -EINVAL;
  1154		}
  1155	
  1156		if (team_port_exists(port_dev)) {
  1157			NL_SET_ERR_MSG(extack, "Device is already a port of a team device");
  1158			netdev_err(dev, "Device %s is already a port "
  1159					"of a team device\n", portname);
  1160			return -EBUSY;
  1161		}
  1162	
  1163		if (port_dev->features & NETIF_F_VLAN_CHALLENGED &&
  1164		    vlan_uses_dev(dev)) {
  1165			NL_SET_ERR_MSG(extack, "Device is VLAN challenged and team device has VLAN set up");
  1166			netdev_err(dev, "Device %s is VLAN challenged and team device has VLAN set up\n",
  1167				   portname);
  1168			return -EPERM;
  1169		}
  1170	
  1171		err = team_dev_type_check_change(dev, port_dev);
  1172		if (err)
  1173			return err;
  1174	
  1175		if (port_dev->flags & IFF_UP) {
  1176			NL_SET_ERR_MSG(extack, "Device is up. Set it down before adding it as a team port");
  1177			netdev_err(dev, "Device %s is up. Set it down before adding it as a team port\n",
  1178				   portname);
  1179			return -EBUSY;
  1180		}
  1181	
  1182		port = kzalloc(sizeof(struct team_port) + team->mode->port_priv_size,
  1183			       GFP_KERNEL);
  1184		if (!port)
  1185			return -ENOMEM;
  1186	
  1187		port->dev = port_dev;
  1188		port->team = team;
  1189		INIT_LIST_HEAD(&port->qom_list);
  1190	
  1191		port->orig.mtu = port_dev->mtu;
  1192		err = dev_set_mtu(port_dev, dev->mtu);
  1193		if (err) {
  1194			netdev_dbg(dev, "Error %d calling dev_set_mtu\n", err);
  1195			goto err_set_mtu;
  1196		}
  1197	
  1198		memcpy(port->orig.dev_addr, port_dev->dev_addr, port_dev->addr_len);
  1199	
  1200		err = team_port_enter(team, port);
  1201		if (err) {
  1202			netdev_err(dev, "Device %s failed to enter team mode\n",
  1203				   portname);
  1204			goto err_port_enter;
  1205		}
  1206	
  1207		err = dev_open(port_dev);
  1208		if (err) {
  1209			netdev_dbg(dev, "Device %s opening failed\n",
  1210				   portname);
  1211			goto err_dev_open;
  1212		}
  1213	
  1214		err = vlan_vids_add_by_dev(port_dev, dev);
  1215		if (err) {
  1216			netdev_err(dev, "Failed to add vlan ids to device %s\n",
  1217					portname);
  1218			goto err_vids_add;
  1219		}
  1220	
> 1221		if (team->dev->npinfo) {
  1222			err = team_port_enable_netpoll(team, port);
  1223			if (err) {
  1224				netdev_err(dev, "Failed to enable netpoll on device %s\n",
  1225					   portname);
  1226				goto err_enable_netpoll;
  1227			}
  1228		}
  1229	
  1230		if (!(dev->features & NETIF_F_LRO))
  1231			dev_disable_lro(port_dev);
  1232	
  1233		err = netdev_rx_handler_register(port_dev, team_handle_frame,
  1234						 port);
  1235		if (err) {
  1236			netdev_err(dev, "Device %s failed to register rx_handler\n",
  1237				   portname);
  1238			goto err_handler_register;
  1239		}
  1240	
  1241		err = team_upper_dev_link(team, port, extack);
  1242		if (err) {
  1243			netdev_err(dev, "Device %s failed to set upper link\n",
  1244				   portname);
  1245			goto err_set_upper_link;
  1246		}
  1247	
  1248		err = __team_option_inst_add_port(team, port);
  1249		if (err) {
  1250			netdev_err(dev, "Device %s failed to add per-port options\n",
  1251				   portname);
  1252			goto err_option_port_add;
  1253		}
  1254	
  1255		netif_addr_lock_bh(dev);
  1256		dev_uc_sync_multiple(port_dev, dev);
  1257		dev_mc_sync_multiple(port_dev, dev);
  1258		netif_addr_unlock_bh(dev);
  1259	
  1260		port->index = -1;
  1261		list_add_tail_rcu(&port->list, &team->port_list);
  1262		team_port_enable(team, port);
  1263		__team_compute_features(team);
  1264		__team_port_change_port_added(port, !!netif_carrier_ok(port_dev));
  1265		__team_options_change_check(team);
  1266	
  1267		netdev_info(dev, "Port device %s added\n", portname);
  1268	
  1269		return 0;
  1270	
  1271	err_option_port_add:
  1272		team_upper_dev_unlink(team, port);
  1273	
  1274	err_set_upper_link:
  1275		netdev_rx_handler_unregister(port_dev);
  1276	
  1277	err_handler_register:
  1278		team_port_disable_netpoll(port);
  1279	
  1280	err_enable_netpoll:
  1281		vlan_vids_del_by_dev(port_dev, dev);
  1282	
  1283	err_vids_add:
  1284		dev_close(port_dev);
  1285	
  1286	err_dev_open:
  1287		team_port_leave(team, port);
  1288		team_port_set_orig_dev_addr(port);
  1289	
  1290	err_port_enter:
  1291		dev_set_mtu(port_dev, port->orig.mtu);
  1292	
  1293	err_set_mtu:
  1294		kfree(port);
  1295	
  1296		return err;
  1297	}
  1298	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 30326 bytes --]

^ permalink raw reply

* Re: [PATCH bpf-next v3 4/9] bpf/verifier: improve register value range tracking with ARSH
From: Alexei Starovoitov @ 2018-04-23  4:19 UTC (permalink / raw)
  To: Yonghong Song; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <e42ee666-dfbc-23e1-63c4-d7c548a2d3ee@fb.com>

On Sun, Apr 22, 2018 at 07:49:13PM -0700, Yonghong Song wrote:
> 
> 
> On 4/22/18 5:16 PM, Alexei Starovoitov wrote:
> > On Fri, Apr 20, 2018 at 03:18:37PM -0700, Yonghong Song wrote:
> > > When helpers like bpf_get_stack returns an int value
> > > and later on used for arithmetic computation, the LSH and ARSH
> > > operations are often required to get proper sign extension into
> > > 64-bit. For example, without this patch:
> > >      54: R0=inv(id=0,umax_value=800)
> > >      54: (bf) r8 = r0
> > >      55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
> > >      55: (67) r8 <<= 32
> > >      56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
> > >      56: (c7) r8 s>>= 32
> > >      57: R8=inv(id=0)
> > > With this patch:
> > >      54: R0=inv(id=0,umax_value=800)
> > >      54: (bf) r8 = r0
> > >      55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
> > >      55: (67) r8 <<= 32
> > >      56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
> > >      56: (c7) r8 s>>= 32
> > >      57: R8=inv(id=0, umax_value=800,var_off=(0x0; 0x3ff))
> > > With better range of "R8", later on when "R8" is added to other register,
> > > e.g., a map pointer or scalar-value register, the better register
> > > range can be derived and verifier failure may be avoided.
> > > 
> > > In our later example,
> > >      ......
> > >      usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
> > >      if (usize < 0)
> > >          return 0;
> > >      ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
> > >      ......
> > > Without improving ARSH value range tracking, the register representing
> > > "max_len - usize" will have smin_value equal to S64_MIN and will be
> > > rejected by verifier.
> > > 
> > > Signed-off-by: Yonghong Song <yhs@fb.com>
> > > ---
> > >   kernel/bpf/verifier.c | 26 ++++++++++++++++++++++++++
> > >   1 file changed, 26 insertions(+)
> > > 
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index 3c8bb92..01c215d 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -2975,6 +2975,32 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
> > >   		/* We may learn something more from the var_off */
> > >   		__update_reg_bounds(dst_reg);
> > >   		break;
> > > +	case BPF_ARSH:
> > > +		if (umax_val >= insn_bitness) {
> > > +			/* Shifts greater than 31 or 63 are undefined.
> > > +			 * This includes shifts by a negative number.
> > > +			 */
> > > +			mark_reg_unknown(env, regs, insn->dst_reg);
> > > +			break;
> > > +		}
> > > +		if (dst_reg->smin_value < 0)
> > > +			dst_reg->smin_value >>= umin_val;
> > > +		else
> > > +			dst_reg->smin_value >>= umax_val;
> > > +		if (dst_reg->smax_value < 0)
> > > +			dst_reg->smax_value >>= umax_val;
> > > +		else
> > > +			dst_reg->smax_value >>= umin_val;
> > > +		if (src_known)
> > > +			dst_reg->var_off = tnum_rshift(dst_reg->var_off,
> > > +						       umin_val);
> > > +		else
> > > +			dst_reg->var_off = tnum_rshift(tnum_unknown, umin_val);
> > > +		dst_reg->umin_value >>= umax_val;
> > > +		dst_reg->umax_value >>= umin_val;
> > > +		/* We may learn something more from the var_off */
> > > +		__update_reg_bounds(dst_reg);
> > 
> > I'm struggling to understand how these bounds are computed.
> > Could you add examples in the comments?
> 
> Okay, let me try to add some comments for better understanding.
> 
> > In particular if dst_reg is unknown (tnum.mask == -1)
> > the above tnum_rshift() will clear upper bits and will make it
> > 64-bit positive, but that doesn't seem correct.
> > What am I missing?
> 
> Considering this is arith shift, we probably should just have
> dst_reg->var_off = tnum_unknown to be conservative.
> 
> I could miss something here as well. Let me try to write more
> detailed explanation, hopefully to cover all corner cases.

Is there a use case for !src_known ?
I think test_verifier should have 100% line coverage of verifier.c
and every 'if' condition in the verifier needs to have real use case
behind it.
It's still on my todo list to get rid of [su][min|max]_value tracking
that was introduced without solid justification.

^ permalink raw reply

* Re: [PATCH net] team: check team dev npinfo when adding a port only
From: kbuild test robot @ 2018-04-23  4:20 UTC (permalink / raw)
  To: Xin Long; +Cc: kbuild-all, network dev, davem, Jiri Pirko, stephen hemminger
In-Reply-To: <ac96d2737077b41d1e7cd68164d881faa18f413f.1524395280.git.lucien.xin@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 6041 bytes --]

Hi Xin,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net/master]

url:    https://github.com/0day-ci/linux/commits/Xin-Long/team-check-team-dev-npinfo-when-adding-a-port-only/20180423-114310
config: i386-randconfig-x071-201816 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/net/team/team.c: In function 'team_port_add':
>> drivers/net/team/team.c:1221:15: error: 'struct net_device' has no member named 'npinfo'
     if (team->dev->npinfo) {
                  ^~

vim +1221 drivers/net/team/team.c

  1136	
  1137	static void __team_port_change_port_added(struct team_port *port, bool linkup);
  1138	static int team_dev_type_check_change(struct net_device *dev,
  1139					      struct net_device *port_dev);
  1140	
  1141	static int team_port_add(struct team *team, struct net_device *port_dev,
  1142				 struct netlink_ext_ack *extack)
  1143	{
  1144		struct net_device *dev = team->dev;
  1145		struct team_port *port;
  1146		char *portname = port_dev->name;
  1147		int err;
  1148	
  1149		if (port_dev->flags & IFF_LOOPBACK) {
  1150			NL_SET_ERR_MSG(extack, "Loopback device can't be added as a team port");
  1151			netdev_err(dev, "Device %s is loopback device. Loopback devices can't be added as a team port\n",
  1152				   portname);
  1153			return -EINVAL;
  1154		}
  1155	
  1156		if (team_port_exists(port_dev)) {
  1157			NL_SET_ERR_MSG(extack, "Device is already a port of a team device");
  1158			netdev_err(dev, "Device %s is already a port "
  1159					"of a team device\n", portname);
  1160			return -EBUSY;
  1161		}
  1162	
  1163		if (port_dev->features & NETIF_F_VLAN_CHALLENGED &&
  1164		    vlan_uses_dev(dev)) {
  1165			NL_SET_ERR_MSG(extack, "Device is VLAN challenged and team device has VLAN set up");
  1166			netdev_err(dev, "Device %s is VLAN challenged and team device has VLAN set up\n",
  1167				   portname);
  1168			return -EPERM;
  1169		}
  1170	
  1171		err = team_dev_type_check_change(dev, port_dev);
  1172		if (err)
  1173			return err;
  1174	
  1175		if (port_dev->flags & IFF_UP) {
  1176			NL_SET_ERR_MSG(extack, "Device is up. Set it down before adding it as a team port");
  1177			netdev_err(dev, "Device %s is up. Set it down before adding it as a team port\n",
  1178				   portname);
  1179			return -EBUSY;
  1180		}
  1181	
  1182		port = kzalloc(sizeof(struct team_port) + team->mode->port_priv_size,
  1183			       GFP_KERNEL);
  1184		if (!port)
  1185			return -ENOMEM;
  1186	
  1187		port->dev = port_dev;
  1188		port->team = team;
  1189		INIT_LIST_HEAD(&port->qom_list);
  1190	
  1191		port->orig.mtu = port_dev->mtu;
  1192		err = dev_set_mtu(port_dev, dev->mtu);
  1193		if (err) {
  1194			netdev_dbg(dev, "Error %d calling dev_set_mtu\n", err);
  1195			goto err_set_mtu;
  1196		}
  1197	
  1198		memcpy(port->orig.dev_addr, port_dev->dev_addr, port_dev->addr_len);
  1199	
  1200		err = team_port_enter(team, port);
  1201		if (err) {
  1202			netdev_err(dev, "Device %s failed to enter team mode\n",
  1203				   portname);
  1204			goto err_port_enter;
  1205		}
  1206	
  1207		err = dev_open(port_dev);
  1208		if (err) {
  1209			netdev_dbg(dev, "Device %s opening failed\n",
  1210				   portname);
  1211			goto err_dev_open;
  1212		}
  1213	
  1214		err = vlan_vids_add_by_dev(port_dev, dev);
  1215		if (err) {
  1216			netdev_err(dev, "Failed to add vlan ids to device %s\n",
  1217					portname);
  1218			goto err_vids_add;
  1219		}
  1220	
> 1221		if (team->dev->npinfo) {
  1222			err = team_port_enable_netpoll(team, port);
  1223			if (err) {
  1224				netdev_err(dev, "Failed to enable netpoll on device %s\n",
  1225					   portname);
  1226				goto err_enable_netpoll;
  1227			}
  1228		}
  1229	
  1230		if (!(dev->features & NETIF_F_LRO))
  1231			dev_disable_lro(port_dev);
  1232	
  1233		err = netdev_rx_handler_register(port_dev, team_handle_frame,
  1234						 port);
  1235		if (err) {
  1236			netdev_err(dev, "Device %s failed to register rx_handler\n",
  1237				   portname);
  1238			goto err_handler_register;
  1239		}
  1240	
  1241		err = team_upper_dev_link(team, port, extack);
  1242		if (err) {
  1243			netdev_err(dev, "Device %s failed to set upper link\n",
  1244				   portname);
  1245			goto err_set_upper_link;
  1246		}
  1247	
  1248		err = __team_option_inst_add_port(team, port);
  1249		if (err) {
  1250			netdev_err(dev, "Device %s failed to add per-port options\n",
  1251				   portname);
  1252			goto err_option_port_add;
  1253		}
  1254	
  1255		netif_addr_lock_bh(dev);
  1256		dev_uc_sync_multiple(port_dev, dev);
  1257		dev_mc_sync_multiple(port_dev, dev);
  1258		netif_addr_unlock_bh(dev);
  1259	
  1260		port->index = -1;
  1261		list_add_tail_rcu(&port->list, &team->port_list);
  1262		team_port_enable(team, port);
  1263		__team_compute_features(team);
  1264		__team_port_change_port_added(port, !!netif_carrier_ok(port_dev));
  1265		__team_options_change_check(team);
  1266	
  1267		netdev_info(dev, "Port device %s added\n", portname);
  1268	
  1269		return 0;
  1270	
  1271	err_option_port_add:
  1272		team_upper_dev_unlink(team, port);
  1273	
  1274	err_set_upper_link:
  1275		netdev_rx_handler_unregister(port_dev);
  1276	
  1277	err_handler_register:
  1278		team_port_disable_netpoll(port);
  1279	
  1280	err_enable_netpoll:
  1281		vlan_vids_del_by_dev(port_dev, dev);
  1282	
  1283	err_vids_add:
  1284		dev_close(port_dev);
  1285	
  1286	err_dev_open:
  1287		team_port_leave(team, port);
  1288		team_port_set_orig_dev_addr(port);
  1289	
  1290	err_port_enter:
  1291		dev_set_mtu(port_dev, port->orig.mtu);
  1292	
  1293	err_set_mtu:
  1294		kfree(port);
  1295	
  1296		return err;
  1297	}
  1298	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27063 bytes --]

^ permalink raw reply

* Re: [PATCH bpf-next v3 4/9] bpf/verifier: improve register value range tracking with ARSH
From: Yonghong Song @ 2018-04-23  4:31 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180423041901.44xlyekpw3kehh7v@ast-mbp>



On 4/22/18 9:19 PM, Alexei Starovoitov wrote:
> On Sun, Apr 22, 2018 at 07:49:13PM -0700, Yonghong Song wrote:
>>
>>
>> On 4/22/18 5:16 PM, Alexei Starovoitov wrote:
>>> On Fri, Apr 20, 2018 at 03:18:37PM -0700, Yonghong Song wrote:
>>>> When helpers like bpf_get_stack returns an int value
>>>> and later on used for arithmetic computation, the LSH and ARSH
>>>> operations are often required to get proper sign extension into
>>>> 64-bit. For example, without this patch:
>>>>       54: R0=inv(id=0,umax_value=800)
>>>>       54: (bf) r8 = r0
>>>>       55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
>>>>       55: (67) r8 <<= 32
>>>>       56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
>>>>       56: (c7) r8 s>>= 32
>>>>       57: R8=inv(id=0)
>>>> With this patch:
>>>>       54: R0=inv(id=0,umax_value=800)
>>>>       54: (bf) r8 = r0
>>>>       55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
>>>>       55: (67) r8 <<= 32
>>>>       56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
>>>>       56: (c7) r8 s>>= 32
>>>>       57: R8=inv(id=0, umax_value=800,var_off=(0x0; 0x3ff))
>>>> With better range of "R8", later on when "R8" is added to other register,
>>>> e.g., a map pointer or scalar-value register, the better register
>>>> range can be derived and verifier failure may be avoided.
>>>>
>>>> In our later example,
>>>>       ......
>>>>       usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
>>>>       if (usize < 0)
>>>>           return 0;
>>>>       ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
>>>>       ......
>>>> Without improving ARSH value range tracking, the register representing
>>>> "max_len - usize" will have smin_value equal to S64_MIN and will be
>>>> rejected by verifier.
>>>>
>>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>>> ---
>>>>    kernel/bpf/verifier.c | 26 ++++++++++++++++++++++++++
>>>>    1 file changed, 26 insertions(+)
>>>>
>>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>>> index 3c8bb92..01c215d 100644
>>>> --- a/kernel/bpf/verifier.c
>>>> +++ b/kernel/bpf/verifier.c
>>>> @@ -2975,6 +2975,32 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
>>>>    		/* We may learn something more from the var_off */
>>>>    		__update_reg_bounds(dst_reg);
>>>>    		break;
>>>> +	case BPF_ARSH:
>>>> +		if (umax_val >= insn_bitness) {
>>>> +			/* Shifts greater than 31 or 63 are undefined.
>>>> +			 * This includes shifts by a negative number.
>>>> +			 */
>>>> +			mark_reg_unknown(env, regs, insn->dst_reg);
>>>> +			break;
>>>> +		}
>>>> +		if (dst_reg->smin_value < 0)
>>>> +			dst_reg->smin_value >>= umin_val;
>>>> +		else
>>>> +			dst_reg->smin_value >>= umax_val;
>>>> +		if (dst_reg->smax_value < 0)
>>>> +			dst_reg->smax_value >>= umax_val;
>>>> +		else
>>>> +			dst_reg->smax_value >>= umin_val;
>>>> +		if (src_known)
>>>> +			dst_reg->var_off = tnum_rshift(dst_reg->var_off,
>>>> +						       umin_val);
>>>> +		else
>>>> +			dst_reg->var_off = tnum_rshift(tnum_unknown, umin_val);
>>>> +		dst_reg->umin_value >>= umax_val;
>>>> +		dst_reg->umax_value >>= umin_val;
>>>> +		/* We may learn something more from the var_off */
>>>> +		__update_reg_bounds(dst_reg);
>>>
>>> I'm struggling to understand how these bounds are computed.
>>> Could you add examples in the comments?
>>
>> Okay, let me try to add some comments for better understanding.
>>
>>> In particular if dst_reg is unknown (tnum.mask == -1)
>>> the above tnum_rshift() will clear upper bits and will make it
>>> 64-bit positive, but that doesn't seem correct.
>>> What am I missing?
>>
>> Considering this is arith shift, we probably should just have
>> dst_reg->var_off = tnum_unknown to be conservative.
>>
>> I could miss something here as well. Let me try to write more
>> detailed explanation, hopefully to cover all corner cases.
> 
> Is there a use case for !src_known ?

For typical bpf programs, the shift amount should always be known...
If src_known is true, it must be dealing custom packets or custom
data structures in tracing, etc.


> I think test_verifier should have 100% line coverage of verifier.c
> and every 'if' condition in the verifier needs to have real use case
> behind it.
> It's still on my todo list to get rid of [su][min|max]_value tracking
> that was introduced without solid justification.
> 

^ permalink raw reply

* Re: [PATCH bpf-next v3 4/9] bpf/verifier: improve register value range tracking with ARSH
From: Alexei Starovoitov @ 2018-04-23  4:40 UTC (permalink / raw)
  To: Yonghong Song; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <8a76b492-e01a-d79e-3dbe-5a1e6b0e60ce@fb.com>

On Sun, Apr 22, 2018 at 09:31:19PM -0700, Yonghong Song wrote:
> 
> 
> On 4/22/18 9:19 PM, Alexei Starovoitov wrote:
> > On Sun, Apr 22, 2018 at 07:49:13PM -0700, Yonghong Song wrote:
> > > 
> > > 
> > > On 4/22/18 5:16 PM, Alexei Starovoitov wrote:
> > > > On Fri, Apr 20, 2018 at 03:18:37PM -0700, Yonghong Song wrote:
> > > > > When helpers like bpf_get_stack returns an int value
> > > > > and later on used for arithmetic computation, the LSH and ARSH
> > > > > operations are often required to get proper sign extension into
> > > > > 64-bit. For example, without this patch:
> > > > >       54: R0=inv(id=0,umax_value=800)
> > > > >       54: (bf) r8 = r0
> > > > >       55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
> > > > >       55: (67) r8 <<= 32
> > > > >       56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
> > > > >       56: (c7) r8 s>>= 32
> > > > >       57: R8=inv(id=0)
> > > > > With this patch:
> > > > >       54: R0=inv(id=0,umax_value=800)
> > > > >       54: (bf) r8 = r0
> > > > >       55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
> > > > >       55: (67) r8 <<= 32
> > > > >       56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
> > > > >       56: (c7) r8 s>>= 32
> > > > >       57: R8=inv(id=0, umax_value=800,var_off=(0x0; 0x3ff))
> > > > > With better range of "R8", later on when "R8" is added to other register,
> > > > > e.g., a map pointer or scalar-value register, the better register
> > > > > range can be derived and verifier failure may be avoided.
> > > > > 
> > > > > In our later example,
> > > > >       ......
> > > > >       usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
> > > > >       if (usize < 0)
> > > > >           return 0;
> > > > >       ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
> > > > >       ......
> > > > > Without improving ARSH value range tracking, the register representing
> > > > > "max_len - usize" will have smin_value equal to S64_MIN and will be
> > > > > rejected by verifier.
> > > > > 
> > > > > Signed-off-by: Yonghong Song <yhs@fb.com>
> > > > > ---
> > > > >    kernel/bpf/verifier.c | 26 ++++++++++++++++++++++++++
> > > > >    1 file changed, 26 insertions(+)
> > > > > 
> > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > > index 3c8bb92..01c215d 100644
> > > > > --- a/kernel/bpf/verifier.c
> > > > > +++ b/kernel/bpf/verifier.c
> > > > > @@ -2975,6 +2975,32 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env,
> > > > >    		/* We may learn something more from the var_off */
> > > > >    		__update_reg_bounds(dst_reg);
> > > > >    		break;
> > > > > +	case BPF_ARSH:
> > > > > +		if (umax_val >= insn_bitness) {
> > > > > +			/* Shifts greater than 31 or 63 are undefined.
> > > > > +			 * This includes shifts by a negative number.
> > > > > +			 */
> > > > > +			mark_reg_unknown(env, regs, insn->dst_reg);
> > > > > +			break;
> > > > > +		}
> > > > > +		if (dst_reg->smin_value < 0)
> > > > > +			dst_reg->smin_value >>= umin_val;
> > > > > +		else
> > > > > +			dst_reg->smin_value >>= umax_val;
> > > > > +		if (dst_reg->smax_value < 0)
> > > > > +			dst_reg->smax_value >>= umax_val;
> > > > > +		else
> > > > > +			dst_reg->smax_value >>= umin_val;
> > > > > +		if (src_known)
> > > > > +			dst_reg->var_off = tnum_rshift(dst_reg->var_off,
> > > > > +						       umin_val);
> > > > > +		else
> > > > > +			dst_reg->var_off = tnum_rshift(tnum_unknown, umin_val);
> > > > > +		dst_reg->umin_value >>= umax_val;
> > > > > +		dst_reg->umax_value >>= umin_val;
> > > > > +		/* We may learn something more from the var_off */
> > > > > +		__update_reg_bounds(dst_reg);
> > > > 
> > > > I'm struggling to understand how these bounds are computed.
> > > > Could you add examples in the comments?
> > > 
> > > Okay, let me try to add some comments for better understanding.
> > > 
> > > > In particular if dst_reg is unknown (tnum.mask == -1)
> > > > the above tnum_rshift() will clear upper bits and will make it
> > > > 64-bit positive, but that doesn't seem correct.
> > > > What am I missing?
> > > 
> > > Considering this is arith shift, we probably should just have
> > > dst_reg->var_off = tnum_unknown to be conservative.
> > > 
> > > I could miss something here as well. Let me try to write more
> > > detailed explanation, hopefully to cover all corner cases.
> > 
> > Is there a use case for !src_known ?
> 
> For typical bpf programs, the shift amount should always be known...
> If src_known is true, it must be dealing custom packets or custom
> data structures in tracing, etc.

In the example it was <<= 32 and s>>= 32 only because newly
introduced helper returns signed 32-bit integer that is later
used in the math. I have a hard time imagining useful C code that
needs arithmetic shift with a variable.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox