Netdev List
 help / color / mirror / Atom feed
* Re: WARNING in strp_data_ready
From: Tom Herbert @ 2017-12-28  1:19 UTC (permalink / raw)
  To: Ozgur
  Cc: Dmitry Vyukov, John Fastabend, syzbot, David S. Miller,
	Eric Biggers, LKML, Linux Kernel Network Developers,
	syzkaller-bugs@googlegroups.com, Tom Herbert, Cong Wang
In-Reply-To: <926421514406019@web43j.yandex.ru>

On Wed, Dec 27, 2017 at 12:20 PM, Ozgur <ozgur@goosey.org> wrote:
>
>
> 27.12.2017, 23:14, "Dmitry Vyukov" <dvyukov@google.com>:
>> On Wed, Dec 27, 2017 at 9:08 PM, Ozgur <ozgur@goosey.org> wrote:
>>>  27.12.2017, 22:21, "Dmitry Vyukov" <dvyukov@google.com>:
>>>>  On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>>>   Did you try the patch I posted?
>>>>
>>>>  Hi Tom,
>>>
>>>  Hello Dmitry,
>>>
>>>>  No. And I didn't know I need to. Why?
>>>>  If you think the patch needs additional testing, you can ask syzbot to
>>>>  test it. See https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot
>>>>  Otherwise proceed with committing it. Or what are we waiting for?
>>>>
>>>>  Thanks
>>>
>>>  I think we need to fixed patch for crash, in fact check to patch code and test solve the bug.
>>>  How do test it because there is no patch in the following bug?
>>
>> Hi Ozgur,
>>
>> I am not sure I completely understand what you mean. But the
>> reproducer for this bug (which one can use for testing) is here:
>> https://groups.google.com/forum/#!topic/syzkaller-bugs/Kxs05ziCpgY
>> Tom also mentions there is some patch for this, but I don't know where
>> it is, it doesn't seem to be referenced from this thread.
>
> Hello Dmitry,
>
> Ah, I'm sorry I don't seen Tom mail and I don't have a patch not tested :)
> I think Tom send patch to only you and are you tested?
>
> kcmsock.c will change and strp_data_ready I think locked.
>
> Tom, please send a patch for me? I can test and inform you.
>
Hi Ozgur,

I reposted the patches as RFC "kcm: Fix lockdep issue". Please test if you can!

Thanks,
Tom

> Regards
>
> Ozgur
>
>>>  The fix patch should be for this net/kcm/kcmsock.c file and lock functions must be added calling sk_data_ready ().
>>>  Regards
>>>
>>>  Ozgur
>>>
>>>>>   On Wed, Dec 27, 2017 at 10:25 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>>>   On Wed, Dec 6, 2017 at 4:44 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>>>>>   <john.fastabend@gmail.com> wrote:
>>>>>>>>>   On 10/24/2017 08:20 AM, syzbot wrote:
>>>>>>>>>>   Hello,
>>>>>>>>>>
>>>>>>>>>>   syzkaller hit the following crash on 73d3393ada4f70fa3df5639c8d438f2f034c0ecb
>>>>>>>>>>   git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
>>>>>>>>>>   compiler: gcc (GCC) 7.1.1 20170620
>>>>>>>>>>   .config is attached
>>>>>>>>>>   Raw console output is attached.
>>>>>>>>>>   C reproducer is attached
>>>>>>>>>>   syzkaller reproducer is attached. See https://goo.gl/kgGztJ
>>>>>>>>>>   for information about syzkaller reproducers
>>>>>>>>>>
>>>>>>>>>>   WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_me include/net/sock.h:1505 [inline]
>>>>>>>>>>   WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_user include/net/sock.h:1511 [inline]
>>>>>>>>>>   WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
>>>>>>>>>>   Kernel panic - not syncing: panic_on_warn set ...
>>>>>>>>>>
>>>>>>>>>>   CPU: 0 PID: 2996 Comm: syzkaller142210 Not tainted 4.14.0-rc5+ #138
>>>>>>>>>>   Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>>>>>>>>>>   Call Trace:
>>>>>>>>>>    <IRQ>
>>>>>>>>>>    __dump_stack lib/dump_stack.c:16 [inline]
>>>>>>>>>>    dump_stack+0x194/0x257 lib/dump_stack.c:52
>>>>>>>>>>    panic+0x1e4/0x417 kernel/panic.c:181
>>>>>>>>>>    __warn+0x1c4/0x1d9 kernel/panic.c:542
>>>>>>>>>>    report_bug+0x211/0x2d0 lib/bug.c:183
>>>>>>>>>>    fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
>>>>>>>>>>    do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
>>>>>>>>>>    do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
>>>>>>>>>>    do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
>>>>>>>>>>    do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
>>>>>>>>>>    invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
>>>>>>>>>>   RIP: 0010:sock_owned_by_me include/net/sock.h:1505 [inline]
>>>>>>>>>>   RIP: 0010:sock_owned_by_user include/net/sock.h:1511 [inline]
>>>>>>>>>>   RIP: 0010:strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
>>>>>>>>>>   RSP: 0018:ffff8801db206b18 EFLAGS: 00010206
>>>>>>>>>>   RAX: ffff8801d1e02080 RBX: ffff8801dad74c48 RCX: 0000000000000000
>>>>>>>>>>   RDX: 0000000000000100 RSI: ffff8801d29fa0a0 RDI: ffffffff85cbede0
>>>>>>>>>>   RBP: ffff8801db206b38 R08: 0000000000000005 R09: 1ffffffff0ce0bcd
>>>>>>>>>>   R10: ffff8801db206a00 R11: dffffc0000000000 R12: ffff8801d29fa000
>>>>>>>>>>   R13: ffff8801dad74c50 R14: ffff8801d4350a92 R15: 0000000000000001
>>>>>>>>>>    psock_data_ready+0x56/0x70 net/kcm/kcmsock.c:353
>>>>>>>>>
>>>>>>>>>   Looks like KCM is calling sk_data_ready() without first taking the
>>>>>>>>>   sock lock.
>>>>>>>>>
>>>>>>>>>   /* Called with lower sock held */
>>>>>>>>>   static void kcm_rcv_strparser(struct strparser *strp, struct sk_buff *skb)
>>>>>>>>>   {
>>>>>>>>>    [...]
>>>>>>>>>           if (kcm_queue_rcv_skb(&kcm->sk, skb)) {
>>>>>>>>>
>>>>>>>>>   In this case kcm->sk is not the same lock the comment is referring to.
>>>>>>>>>   And kcm_queue_rcv_skb() will eventually call sk_data_ready().
>>>>>>>>>
>>>>>>>>>   @Tom, how about wrapping the sk_data_ready call in {lock|release}_sock?
>>>>>>>>>   I don't have anything better in mind immediately.
>>>>>>>>   The sock locks are taken in reverse order in the send path so so
>>>>>>>>   grabbing kcm sock lock with lower lock held to call sk_data_ready may
>>>>>>>>   lead to deadlock like I think.
>>>>>>>>
>>>>>>>>   It might be possible to change the order in the send path to do this.
>>>>>>>>   Something like:
>>>>>>>>
>>>>>>>>   trylock on lower socket lock
>>>>>>>>   -if trylock fails
>>>>>>>>     - release kcm sock lock
>>>>>>>>     - lock lower sock
>>>>>>>>     - lock kcm sock
>>>>>>>>   - call sendpage locked function
>>>>>>>>
>>>>>>>>   I admit that dealing with two levels of socket locks in the data path
>>>>>>>>   is quite a pain :-)
>>>>>>>
>>>>>>>   up
>>>>>>>
>>>>>>>   still happening and we've lost 50K+ test VMs on this
>>>>>>
>>>>>>   up
>>>>>>
>>>>>>   Still happens and number of crashes crossed 60K, can we do something
>>>>>>   with this please?

^ permalink raw reply

* [PATCH RFC 2/2] strparser: Call sock_owned_by_user_nocheck
From: Tom Herbert @ 2017-12-28  1:16 UTC (permalink / raw)
  To: davem; +Cc: netdev, dvyukov, ozgur, Tom Herbert
In-Reply-To: <20171228011635.23562-1-tom@quantonium.net>

strparser wants to check socket ownership without producing any
warnings. As indicated by the comment in the code, it is permissible
for owned_by_user to return true.

Fixes: 43a0c6751a322847 ("strparser: Stream parser for messages")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Tom Herbert <tom@quantonium.net>
---
 net/strparser/strparser.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
index c5fda15ba319..1fdab5c4eda8 100644
--- a/net/strparser/strparser.c
+++ b/net/strparser/strparser.c
@@ -401,7 +401,7 @@ void strp_data_ready(struct strparser *strp)
 	 * allows a thread in BH context to safely check if the process
 	 * lock is held. In this case, if the lock is held, queue work.
 	 */
-	if (sock_owned_by_user(strp->sk)) {
+	if (sock_owned_by_user_nocheck(strp->sk)) {
 		queue_work(strp_wq, &strp->work);
 		return;
 	}
-- 
2.11.0

^ permalink raw reply related

* [PATCH RFC 1/2] sock: Add sock_owned_by_user_nocheck
From: Tom Herbert @ 2017-12-28  1:16 UTC (permalink / raw)
  To: davem; +Cc: netdev, dvyukov, ozgur, Tom Herbert
In-Reply-To: <20171228011635.23562-1-tom@quantonium.net>

This allows checking socket lock ownership with producing lockdep
warnings.

Signed-off-by: Tom Herbert <tom@quantonium.net>
---
 include/net/sock.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index 6c1db823f8b9..66fd3951e6f3 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1515,6 +1515,11 @@ static inline bool sock_owned_by_user(const struct sock *sk)
 	return sk->sk_lock.owned;
 }
 
+static inline bool sock_owned_by_user_nocheck(const struct sock *sk)
+{
+	return sk->sk_lock.owned;
+}
+
 /* no reclassification while locks are held */
 static inline bool sock_allow_reclassification(const struct sock *csk)
 {
-- 
2.11.0

^ permalink raw reply related

* [PATCH RFC 0/2] kcm: Fix lockdep issue
From: Tom Herbert @ 2017-12-28  1:16 UTC (permalink / raw)
  To: davem; +Cc: netdev, dvyukov, ozgur, Tom Herbert

When sock_owned_by_user returns true in strparser. Fix is to add and
call sock_owned_by_user_nocheck since the check for owned by user is
not an error condition in this case.

Fixes: 43a0c6751a322847 ("strparser: Stream parser for messages")
Reported-by: syzbot <syzkaller@googlegroups.com>

Tom Herbert (2):
  sock: Add sock_owned_by_user_nocheck
  strparser: Call sock_owned_by_user_nocheck

 include/net/sock.h        | 5 +++++
 net/strparser/strparser.c | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

-- 
2.11.0

^ permalink raw reply

* [RFC PATCH net-next] net: hns3: hns3_get_channels() can be static
From: kbuild test robot @ 2017-12-28  1:09 UTC (permalink / raw)
  To: Peng Li
  Cc: kbuild-all, netdev, Mingguang Qu, Yisen Zhuang, Salil Mehta,
	linux-kernel
In-Reply-To: <201712280935.vFeMCw6c%fengguang.wu@intel.com>


Fixes: 482d2e9c1cc7 ("net: hns3: add support to query tqps number")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
---
 hns3_ethtool.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 2ae4d397..379c01d 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -876,8 +876,8 @@ static int hns3_nway_reset(struct net_device *netdev)
 	return genphy_restart_aneg(phy);
 }
 
-void hns3_get_channels(struct net_device *netdev,
-		       struct ethtool_channels *ch)
+static void hns3_get_channels(struct net_device *netdev,
+			      struct ethtool_channels *ch)
 {
 	struct hnae3_handle *h = hns3_get_handle(netdev);
 

^ permalink raw reply related

* [net-next:master 731/763] drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c:852:6: sparse: symbol 'hns3_get_channels' was not declared. Should it be static?
From: kbuild test robot @ 2017-12-28  1:09 UTC (permalink / raw)
  To: Peng Li
  Cc: kbuild-all, netdev, Mingguang Qu, Yisen Zhuang, Salil Mehta,
	linux-kernel

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   55b07a65e15bea6e253a907dacaf89b61fe504ca
commit: 482d2e9c1cc7c0e154464e3e052db09e5e62541f [731/763] net: hns3: add support to query tqps number
reproduce:
        # apt-get install sparse
        git checkout 482d2e9c1cc7c0e154464e3e052db09e5e62541f
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)


Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply

* Re: [PATCH 1/4] libbpf: add function to setup XDP
From: Toshiaki Makita @ 2017-12-28  0:59 UTC (permalink / raw)
  To: Eric Leblond, Alexei Starovoitov; +Cc: netdev, daniel, linux-kernel
In-Reply-To: <20171227180229.1926-2-eric@regit.org>

On 2017/12/28 3:02, Eric Leblond wrote:
> Most of the code is taken from set_link_xdp_fd() in bpf_load.c and
> slightly modified to be library compliant.
> 
> Signed-off-by: Eric Leblond <eric@regit.org>
> ---
...
> +int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
...
> +	if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
> +		ret = -errno;
> +		goto cleanup;
> +	}
> +
> +	addrlen = sizeof(sa);
> +	if (getsockname(sock, (struct sockaddr *)&sa, &addrlen) < 0) {
> +		ret = errno;

forgot to prepend '-'?

> +		goto cleanup;
> +	}
> +
> +	if (addrlen != sizeof(sa)) {
> +		ret = errno;

errno is not set?

> +		goto cleanup;
> +	}

-- 
Toshiaki Makita

^ permalink raw reply

* Re: [PATCHv3 0/2] capability controlled user-namespaces
From: Mahesh Bandewar (महेश बंडेवार) @ 2017-12-28  0:45 UTC (permalink / raw)
  To: mtk.manpages
  Cc: James Morris, LKML, Netdev, Kernel-hardening, Linux API,
	Kees Cook, Serge Hallyn, Eric W . Biederman, Eric Dumazet,
	David Miller, Mahesh Bandewar
In-Reply-To: <CAKgNAkgzyUT5-Gb7COjz5Um0f=ViH8s8ap4Jh9vvtyS8G=iDkw@mail.gmail.com>

On Wed, Dec 27, 2017 at 12:23 PM, Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
> Hello Mahesh,
>
> On 27 December 2017 at 18:09, Mahesh Bandewar (महेश बंडेवार)
> <maheshb@google.com> wrote:
>> Hello James,
>>
>> Seems like I missed your name to be added into the review of this
>> patch series. Would you be willing be pull this into the security
>> tree? Serge Hallyn has already ACKed it.
>
> We seem to have no formal documentation/specification of this feature.
> I think that should be written up before this patch goes into
> mainline...
>
absolutely. I have added enough information into the Documentation dir
relevant to this feature (please look at the  individual patches),
that could be used. I could help if needed.

thanks,
--mahesh..

> Cheers,
>
> Michael
>
>
>>
>> On Tue, Dec 5, 2017 at 2:30 PM, Mahesh Bandewar <mahesh@bandewar.net> wrote:
>>> From: Mahesh Bandewar <maheshb@google.com>
>>>
>>> TL;DR version
>>> -------------
>>> Creating a sandbox environment with namespaces is challenging
>>> considering what these sandboxed processes can engage into. e.g.
>>> CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few.
>>> Current form of user-namespaces, however, if changed a bit can allow
>>> us to create a sandbox environment without locking down user-
>>> namespaces.
>>>
>>> Detailed version
>>> ----------------
>>>
>>> Problem
>>> -------
>>> User-namespaces in the current form have increased the attack surface as
>>> any process can acquire capabilities which are not available to them (by
>>> default) by performing combination of clone()/unshare()/setns() syscalls.
>>>
>>>     #define _GNU_SOURCE
>>>     #include <stdio.h>
>>>     #include <sched.h>
>>>     #include <netinet/in.h>
>>>
>>>     int main(int ac, char **av)
>>>     {
>>>         int sock = -1;
>>>
>>>         printf("Attempting to open RAW socket before unshare()...\n");
>>>         sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
>>>         if (sock < 0) {
>>>             perror("socket() SOCK_RAW failed: ");
>>>         } else {
>>>             printf("Successfully opened RAW-Sock before unshare().\n");
>>>             close(sock);
>>>             sock = -1;
>>>         }
>>>
>>>         if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) {
>>>             perror("unshare() failed: ");
>>>             return 1;
>>>         }
>>>
>>>         printf("Attempting to open RAW socket after unshare()...\n");
>>>         sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
>>>         if (sock < 0) {
>>>             perror("socket() SOCK_RAW failed: ");
>>>         } else {
>>>             printf("Successfully opened RAW-Sock after unshare().\n");
>>>             close(sock);
>>>             sock = -1;
>>>         }
>>>
>>>         return 0;
>>>     }
>>>
>>> The above example shows how easy it is to acquire NET_RAW capabilities
>>> and once acquired, these processes could take benefit of above mentioned
>>> or similar issues discovered/undiscovered with malicious intent. Note
>>> that this is just an example and the problem/solution is not limited
>>> to NET_RAW capability *only*.
>>>
>>> The easiest fix one can apply here is to lock-down user-namespaces which
>>> many of the distros do (i.e. don't allow users to create user namespaces),
>>> but unfortunately that prevents everyone from using them.
>>>
>>> Approach
>>> --------
>>> Introduce a notion of 'controlled' user-namespaces. Every process on
>>> the host is allowed to create user-namespaces (governed by the limit
>>> imposed by per-ns sysctl) however, mark user-namespaces created by
>>> sandboxed processes as 'controlled'. Use this 'mark' at the time of
>>> capability check in conjunction with a global capability whitelist.
>>> If the capability is not whitelisted, processes that belong to
>>> controlled user-namespaces will not be allowed.
>>>
>>> Once a user-ns is marked as 'controlled'; all its child user-
>>> namespaces are marked as 'controlled' too.
>>>
>>> A global whitelist is list of capabilities governed by the
>>> sysctl which is available to (privileged) user in init-ns to modify
>>> while it's applicable to all controlled user-namespaces on the host.
>>>
>>> Marking user-namespaces controlled without modifying the whitelist is
>>> equivalent of the current behavior. The default value of whitelist includes
>>> all capabilities so that the compatibility is maintained. However it gives
>>> admins fine-grained ability to control various capabilities system wide
>>> without locking down user-namespaces.
>>>
>>> Please see individual patches in this series.
>>>
>>> Mahesh Bandewar (2):
>>>   capability: introduce sysctl for controlled user-ns capability whitelist
>>>   userns: control capabilities of some user namespaces
>>>
>>>  Documentation/sysctl/kernel.txt | 21 +++++++++++++++++
>>>  include/linux/capability.h      |  7 ++++++
>>>  include/linux/user_namespace.h  | 25 ++++++++++++++++++++
>>>  kernel/capability.c             | 52 +++++++++++++++++++++++++++++++++++++++++
>>>  kernel/sysctl.c                 |  5 ++++
>>>  kernel/user_namespace.c         |  4 ++++
>>>  security/commoncap.c            |  8 +++++++
>>>  7 files changed, 122 insertions(+)
>>>
>>> --
>>> 2.15.0.531.g2ccb3012c9-goog
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-api" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply

* pull-request: bpf-next 2017-12-28
From: Daniel Borkmann @ 2017-12-28  0:18 UTC (permalink / raw)
  To: davem; +Cc: daniel, ast, netdev

Hi David,

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Fix incorrect state pruning related to recognition of zero initialized
   stack slots, where stacksafe exploration would mistakenly return a
   positive pruning verdict too early ignoring other slots, from Gianluca.

2) Various BPF to BPF calls related follow-up fixes. Fix an off-by-one
   in maximum call depth check, and rework maximum stack depth tracking
   logic to fix a bypass of the total stack size check reported by Jann.
   Also fix a bug in arm64 JIT where prog->jited_len was uninitialized.
   Addition of various test cases to BPF selftests, from Alexei.

3) Addition of a BPF selftest to test_verifier that is related to BPF to
   BPF calls which demonstrates a late caller stack size increase and
   thus out of bounds access. Fixed above in 2). Test case from Jann.

4) Addition of correlating BPF helper calls, BPF to BPF calls as well
   as BPF maps to bpftool xlated dump in order to allow for better
   BPF program introspection and debugging, from Daniel.

5) Fixing several bugs in BPF to BPF calls kallsyms handling in order
   to get it actually to work for subprogs, from Daniel.

6) Extending sparc64 JIT support for BPF to BPF calls and fix a couple
   of build errors for libbpf on sparc64, from David.

7) Allow narrower context access for BPF dev cgroup typed programs in
   order to adapt to LLVM code generation. Also adjust memlock rlimit
   in the test_dev_cgroup BPF selftest, from Yonghong.

8) Add netdevsim Kconfig entry to BPF selftests since test_offload.py
   relies on netdevsim device being available, from Jakub.

9) Reduce scope of xdp_do_generic_redirect_map() to being static,
   from Xiongwei.

10) Minor cleanups and spelling fixes in BPF verifier, from Colin.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git

Thanks a lot & have a happy new year!

----------------------------------------------------------------

The following changes since commit 962b582785b60a2b420b0636ad762959c72406f6:

  cxgb4: Simplify PCIe Completion Timeout setting (2017-12-18 15:12:57 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git 

for you to fetch changes up to 624588d9d6cc0a1a270a65fb4d5220f1ceddcf38:

  Merge branch 'bpf-stack-depth-tracking-fixes' (2017-12-27 18:36:24 +0100)

----------------------------------------------------------------
Alexei Starovoitov (5):
      bpf: arm64: fix uninitialized variable
      Merge branch 'bpftool-improvements-kallsymfix'
      bpf: fix maximum stack depth tracking logic
      selftests/bpf: additional stack depth tests
      bpf: fix max call depth check

Colin Ian King (2):
      bpf: fix spelling mistake: "funcation"-> "function"
      bpf: make function skip_callee static and return NULL rather than 0

Daniel Borkmann (3):
      bpf: fix kallsyms handling for subprogs
      bpf: allow for correlation of maps and helpers in dump
      Merge branch 'bpf-stack-depth-tracking-fixes'

David Miller (2):
      libbpf: Fix build errors.
      bpf: sparc64: Add JIT support for multi-function programs.

Gianluca Borello (1):
      bpf: fix stacksafe exploration when comparing states

Jakub Kicinski (1):
      selftests/bpf: add netdevsim to config

Jann Horn (1):
      bpf: selftest for late caller stack size increase

Xiongwei Song (1):
      bpf: make function xdp_do_generic_redirect_map() static

Yonghong Song (2):
      bpf/cgroup: fix a verification error for a CGROUP_DEVICE type prog
      tools/bpf: adjust rlimit RLIMIT_MEMLOCK for test_dev_cgroup

 arch/arm64/net/bpf_jit_comp.c                 |   1 +
 arch/sparc/net/bpf_jit_comp_64.c              |  44 ++++-
 include/linux/bpf_verifier.h                  |   1 +
 include/linux/filter.h                        |   9 +
 include/uapi/linux/bpf.h                      |   3 +-
 kernel/bpf/cgroup.c                           |  15 +-
 kernel/bpf/core.c                             |   4 +-
 kernel/bpf/disasm.c                           |  65 +++++--
 kernel/bpf/disasm.h                           |  29 +++-
 kernel/bpf/syscall.c                          |  93 +++++++++-
 kernel/bpf/verifier.c                         | 126 +++++++++++---
 net/core/filter.c                             |   5 +-
 tools/bpf/bpftool/prog.c                      | 181 ++++++++++++++++++-
 tools/lib/bpf/libbpf.c                        |   5 +-
 tools/testing/selftests/bpf/config            |   1 +
 tools/testing/selftests/bpf/test_dev_cgroup.c |   9 +-
 tools/testing/selftests/bpf/test_verifier.c   | 241 ++++++++++++++++++++++++++
 17 files changed, 764 insertions(+), 68 deletions(-)

^ permalink raw reply

* Re: [Patch net-next] net_sched: remove the unsafe __skb_array_empty()
From: John Fastabend @ 2017-12-27 23:53 UTC (permalink / raw)
  To: Cong Wang; +Cc: Linux Kernel Network Developers, Jakub Kicinski
In-Reply-To: <CAM_iQpX493zk7nAgv8xo0XN8sT9g5SUnrBC76Fvamc9Otfhfqg@mail.gmail.com>

On 12/27/2017 10:29 AM, Cong Wang wrote:
> On Sat, Dec 23, 2017 at 10:57 PM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> On 12/22/2017 12:31 PM, Cong Wang wrote:
>>> I understand why you had it, but it is just not safe. You don't want
>>> to achieve performance gain by crashing system, right?
>>
>> huh? So my point is the patch you submit here is not a
>> real fix but a work around. To peek the head of a consumer/producer ring
>> without a lock, _should_ be fine. This _should_ work as well with
>> consumer or producer operations happening at the same time. After some
>> digging the issue is in the ptr_ring code.
> 
> 
> The comments disagree with you:
> 
> /* Might be slightly faster than skb_array_empty below, but only safe if the
>  * array is never resized. Also, callers invoking this in a loop must take care
>  * to use a compiler barrier, for example cpu_relax().
>  */
> 
> If the comments are right, you miss a barrier here too.
> 

The comment talks about resizing arrays not consumer/producer operations
so I think it is OK. We don't call skb_array_empty on the same skb_array
in a loop here, its different skb arrays. So probably don't need a barrier.

> 
>>
>> The peek code (what empty check calls) is the following,
>>
>> static inline void *__ptr_ring_peek(struct ptr_ring *r)
>> {
>>         if (likely(r->size))
>>                 return r->queue[r->consumer_head];
>>         return NULL;
>> }
>>
>> So what the splat is detecting is consumer head being 'out of bounds'.
>> This happens because ptr_ring_discard_one increments the consumer_head
>> and then checks to see if it overran the array size. If above peek
>> happens after the increment, but before the size check we get the
>> splat. There are two ways, as far as I can see, to fix this. First
>> do the check before incrementing the consumer head. Or the easier
>> fix,
>>
>> --- a/include/linux/ptr_ring.h
>> +++ b/include/linux/ptr_ring.h
>> @@ -438,7 +438,7 @@ static inline int ptr_ring_consume_batched_bh(struct
>> ptr_ring *r,
>>
>>  static inline void **__ptr_ring_init_queue_alloc(unsigned int size,
>> gfp_t gfp)
>>  {
>> -       return kcalloc(size, sizeof(void *), gfp);
>> +       return kcalloc(size + 1, sizeof(void *), gfp);
>>  }
>>
>> With Jakub's help (Thanks!) I was able to reproduce the original splat
>> and also verify the above removes it.
>>
>> To be clear "resizing" a skb_array only refers to changing the actual
>> array size not adding/removing elements.
> 
> I never look into the implementation, just simply trust the comments.
> 
> At least the comments above __skb_array_empty() need to improve.
> > 
>>
>>>
>>>>
>>>> Although its not logical IMO to have both reset and dequeue running at
>>>> the same time. Some skbs would get through others would get sent, sort
>>>> of a mess. I don't see how it can be an issue. The never resized bit
>>>> in the documentation is referring to resizing the ring size _not_ popping
>>>> off elements of the ring. array_empty just reads the consumer head.
>>>> The only ring resizing in pfifo fast should be at init and destroy where
>>>> enqueue/dequeue should be disconnected by then. Although based on the
>>>> trace I missed a case.
>>>
>>>
>>> Both pfifo_fast_reset() and pfifo_fast_dequeue() call
>>> skb_array_consume_bh(), so there is no difference w.r.t. resizing.
>>>
>>
>> Sorry not following.
>>
>>> And ->reset() is called in qdisc_graft() too. Let's say we have htb+pfifo_fast,
>>> htb_graft() calls qdisc_replace() which calls qdisc_reset() on pfifo_fast,
>>> so clearly pfifo_fast_reset() can run with pfifo_fast_dequeue()
>>> concurrently.
>>
>> Yes and this _should_ be perfectly fine for pfifo_fast. I'm wondering
>> though if this API can be cleaned up. What are the paths that do a reset
>> without a destroy.. Do we really need to have this pattern where reset
>> is called then later destroy. Seems destroy could do the entire cleanup
>> and this would simplify things. None of this has to do with the splat
>> though.
> 
> I don't follow your point any more.> 
> We are talking about ->reset() race with ->dequeue() which is the
> cause of the bug, right?> 

Yes. But it should be OK for qdiscs to have reset and dequeue run
in parallel. So the bug is in ptr_ring.

> If you expect ->reset() runs in parallel with ->dequeue() for pfifo_fast,
> why did you even mention synchronize_net() from the beginning???

A mistake probably I am tracking two separate issues. First this one
with skb_array out of bounds access and the tx_action being called
while destroy is in progress. When I was first looking at this I thought
the splat was from the tx_action issue so probably started talking
about how to sync destroy and tx_action paths. Either way it was
a mistake to mention in this thread.

> Also you changed the code too, to adjust rcu grace period.
> 
> 
>>
>>>
>>>
>>>>
>>>> I think the right fix is to only call reset/destroy patterns after
>>>> waiting a grace period and for all tx_action calls in-flight to
>>>> complete. This is also better going forward for more complex qdiscs.
>>>
>>> But we don't even have rcu read lock in TX BH, do we?
>>>
>>> Also, people certainly don't like yet another synchronize_net()...
>>>
>>
>> This needs a fix and is a _real_ bug, but removing __skb_array_empty()
>> doesn't help solve this at all. Will work on a fix after the holiday
>> break. The fix here is to ensure the destroy is not going to happen
>> while tx_action is in-flight. Can be done with qdisc_run and checking
>> correct bits in lockless case.
> 
> Sounds like you missed a lot of things with your "lockless" patches....
> First qdisc rcu callback, second rcu read lock in TX BH...

A couple things yes. TX BH, ptr_ring bug, skb list free fixed in
another patch. I assume the qdisc rcu callbacks you talk about are
the TX BH or something else?

The other possible bug, still need to do analysis, is if we need to
add back the qdisc destroy rcu callback. Or if there is a grace period
in all cases. My guess is current code paths in pfifo fast don't have
an issue but future work will.

> 
> My quick one-line fix is to amend this bug before you going deeper
> in this rabbit hole.
> 

Nope its not a good fix and we are in net-next timeframe here so lets
fix the root cause in ptr_ring. Will post a patch in the next couple
days after PTO. Its minor issue and can wait.

.John

^ permalink raw reply

* [PATCH net-next] nfp: bpf: allocate vNIC priv for keeping track of the offloaded program
From: Jakub Kicinski @ 2017-12-27 23:36 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, Jakub Kicinski

After TC offloads were converted to callbacks we have no choice
but keep track of the offloaded filter in the driver.

Since this change came a little late in the release cycle
there were a number of conflicts and allocation of vNIC priv
structure seems to have slipped away in linux-next.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/bpf/main.c | 30 ++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index 214b02a3acdd..4b63167906ca 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -84,6 +84,33 @@ static const char *nfp_bpf_extra_cap(struct nfp_app *app, struct nfp_net *nn)
 	return nfp_net_ebpf_capable(nn) ? "BPF" : "";
 }
 
+static int
+nfp_bpf_vnic_alloc(struct nfp_app *app, struct nfp_net *nn, unsigned int id)
+{
+	int err;
+
+	nn->app_priv = kzalloc(sizeof(struct nfp_bpf_vnic), GFP_KERNEL);
+	if (!nn->app_priv)
+		return -ENOMEM;
+
+	err = nfp_app_nic_vnic_alloc(app, nn, id);
+	if (err)
+		goto err_free_priv;
+
+	return 0;
+err_free_priv:
+	kfree(nn->app_priv);
+	return err;
+}
+
+static void nfp_bpf_vnic_free(struct nfp_app *app, struct nfp_net *nn)
+{
+	struct nfp_bpf_vnic *bv = nn->app_priv;
+
+	WARN_ON(bv->tc_prog);
+	kfree(bv);
+}
+
 static int nfp_bpf_setup_tc_block_cb(enum tc_setup_type type,
 				     void *type_data, void *cb_priv)
 {
@@ -286,7 +313,8 @@ const struct nfp_app_type app_bpf = {
 
 	.extra_cap	= nfp_bpf_extra_cap,
 
-	.vnic_alloc	= nfp_app_nic_vnic_alloc,
+	.vnic_alloc	= nfp_bpf_vnic_alloc,
+	.vnic_free	= nfp_bpf_vnic_free,
 
 	.setup_tc	= nfp_bpf_setup_tc,
 	.tc_busy	= nfp_bpf_tc_busy,
-- 
2.15.1

^ permalink raw reply related

* pull-request: bpf 2017-12-28
From: Daniel Borkmann @ 2017-12-27 23:27 UTC (permalink / raw)
  To: davem; +Cc: daniel, ast, netdev

Hi David,

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Two small fixes for bpftool. Fix otherwise broken output if any of
   the system calls failed when listing maps in json format and instead
   of bailing out, skip maps or progs that disappeared between fetching
   next id and getting an fd for that id, both from Jakub.

2) Small fix in BPF selftests to respect LLC passed from command line
   when testing for -mcpu=probe presence, from Quentin.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!

----------------------------------------------------------------

The following changes since commit c50b7c473f609189da3bccd28ee5dcf3b55109cd:

  Merge branch 'net-zerocopy-fixes' (2017-12-21 15:00:59 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 

for you to fetch changes up to aee657460a8ce66443d5e7413046d02d7b2165db:

  Merge branch 'bpf-bpftool-various-fixes' (2017-12-23 01:09:53 +0100)

----------------------------------------------------------------
Daniel Borkmann (1):
      Merge branch 'bpf-bpftool-various-fixes'

Jakub Kicinski (2):
      tools: bpftool: maps: close json array on error paths of show
      tools: bpftool: protect against races with disappearing objects

Quentin Monnet (1):
      selftests/bpf: fix Makefile for passing LLC to the command line

 tools/bpf/bpftool/map.c              | 8 +++++---
 tools/bpf/bpftool/prog.c             | 2 ++
 tools/testing/selftests/bpf/Makefile | 2 +-
 3 files changed, 8 insertions(+), 4 deletions(-)

^ permalink raw reply

* Re: [PATCH net-next 5/6] arm64: dts: marvell: mcbin: enable the fourth network interface
From: Russell King - ARM Linux @ 2017-12-27 23:20 UTC (permalink / raw)
  To: Antoine Tenart
  Cc: davem, kishon, andrew, jason, sebastian.hesselbarth,
	gregory.clement, mw, stefanc, ymarkman, thomas.petazzoni,
	miquel.raynal, nadavh, netdev, linux-arm-kernel, linux-kernel,
	Jon Nettleton
In-Reply-To: <20171227224252.GB2626@kwain>

On Wed, Dec 27, 2017 at 11:42:52PM +0100, Antoine Tenart wrote:
> Hi Russell,
> 
> On Wed, Dec 27, 2017 at 10:24:01PM +0000, Russell King - ARM Linux wrote:
> > On Wed, Dec 27, 2017 at 11:14:45PM +0100, Antoine Tenart wrote:
> > >  
> > > +&cps_eth2 {
> > > +	/* CPS Lane 5 */
> > > +	status = "okay";
> > > +	phy-mode = "2500base-x";
> > > +	/* Generic PHY, providing serdes lanes */
> > > +	phys = <&cps_comphy5 2>;
> > > +};
> > > +
> > 
> > This is wrong.  This lane is connected to a SFP cage which can support
> > more than just 2500base-X.  Tying it in this way to 2500base-X means
> > that this port does not support conenctions at 1000base-X, despite
> > that's one of the most popular and more standardised speeds.
> 
> What do you suggest to describe this in the dt, to enable a port using
> the current PPv2 driver?

I don't - I'm merely pointing out that you're bodging support for the
SFP cage rather than productively discussing phylink for mvpp2.

As far as I remember, the discussion stalled at this point:

- You think there's modes that mvpp2 supports that are not supportable
  if you use phylink.

- I've described what phylink supports, and I've asked you for details
  about what you can't support.

Unfortunately, no details have been forthcoming, and no further
discussion has occurred - the ball is entirely in your court to
progress this issue since I requested information from you and that
is where things seem to have stalled.

The result is that, with your patch, you're locking the port to 2.5G
speeds, meaning that only 4.3Mbps Fibrechannel SFPs can be used with
the port, and it can only be used with another device that supports
2.5G speeds.  You can't use a copper RJ45 module, and you can't use
a standard 1000base-X module either in this configuration.

What I'm most concerned about, given the bindings for comphy that
have been merged, is that Free Electrons is pushing forward seemingly
with no regard to the requirement that the serdes lanes are dynamically
reconfigurable, and that's a basic requirement for SFP, and for the
88x3310 PHYs on the Macchiatobin platform.

So, my question to you is: what is Free Electrons plans to properly
support the ethernet ports on the Macchiatobin platform?


For those on the Cc list who don't know, phylink is part of full support
for SFP and SFP+ cages, sponsored (in terms of hardware including SFP
modules) by SolidRun on both SolidRun's Clearfog and Macchiatobin
platforms, supporting a wide range of SFP modules including:

- 1G Optical ethernet modules (duplex and bidi modules)
- 10/100/1G RJ45 modules
- 10G SFP+ modules
- 2.5Gbase-X using 4.3Mbps Fibrechannel modules
- Direct attach cables

There is work ongoing between Florian, Andrew and myself to switch DSA
to phylink, and have SFP modules working with both Marvell and Broadcom
DSA switches.

Phylink and SFP was already merged into mainline, and has been usable
(provided that the network driver is converted to phylink rather than
phylib) since 4.14-rc1.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply

* Re: [PATCH bpf-next v2 6/8] bpf: offload: report device information for offloaded programs
From: Jakub Kicinski @ 2017-12-27 23:18 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: netdev, alexei.starovoitov, ktkhai, oss-drivers,
	Eric W . Biederman
In-Reply-To: <09ae3b1b-e79b-98c0-e1d1-31fcdab9df40@iogearbox.net>

On Wed, 27 Dec 2017 18:26:16 +0100, Daniel Borkmann wrote:
> On 12/21/2017 10:01 PM, Jakub Kicinski wrote:
> > Report to the user ifindex and namespace information of offloaded
> > programs.  If device has disappeared return -ENODEV.  Specify the
> > namespace using dev/inode combination.
> > 
> > CC: Eric W. Biederman <ebiederm@xmission.com>
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
> > ---
> > v2:
> >  - take RTNL lock to grab a coherent snapshot of device state
> >    (ifindex vs name space) and avoid races with name space
> >    moves (based on Eric's comment on Kirill's patch to
> >    peernet2id_alloc()).
> > ---
> >  fs/nsfs.c                      |  2 +-
> >  include/linux/bpf.h            |  2 ++
> >  include/linux/proc_ns.h        |  1 +
> >  include/uapi/linux/bpf.h       |  3 +++
> >  kernel/bpf/offload.c           | 44 ++++++++++++++++++++++++++++++++++++++++++
> >  kernel/bpf/syscall.c           |  6 ++++++
> >  tools/include/uapi/linux/bpf.h |  3 +++
> >  7 files changed, 60 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/nsfs.c b/fs/nsfs.c
> > index 7c6f76d29f56..e50628675935 100644
> > --- a/fs/nsfs.c
> > +++ b/fs/nsfs.c
> > @@ -51,7 +51,7 @@ static void nsfs_evict(struct inode *inode)
> >  	ns->ops->put(ns);
> >  }
> >  
> > -static void *__ns_get_path(struct path *path, struct ns_common *ns)
> > +void *__ns_get_path(struct path *path, struct ns_common *ns)
> >  {
> >  	struct vfsmount *mnt = nsfs_mnt;
> >  	struct dentry *dentry;
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 9a916ab34299..7810ae57b357 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -531,6 +531,8 @@ static inline struct bpf_prog *bpf_prog_get_type(u32 ufd,
> >  
> >  int bpf_prog_offload_compile(struct bpf_prog *prog);
> >  void bpf_prog_offload_destroy(struct bpf_prog *prog);
> > +int bpf_prog_offload_info_fill(struct bpf_prog_info *info,
> > +			       struct bpf_prog *prog);
> >  
> >  #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
> >  int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr);
> > diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
> > index 2ff18c9840a7..1733359cf713 100644
> > --- a/include/linux/proc_ns.h
> > +++ b/include/linux/proc_ns.h
> > @@ -76,6 +76,7 @@ static inline int ns_alloc_inum(struct ns_common *ns)
> >  
> >  extern struct file *proc_ns_fget(int fd);
> >  #define get_proc_ns(inode) ((struct ns_common *)(inode)->i_private)
> > +extern void *__ns_get_path(struct path *path, struct ns_common *ns);
> >  extern void *ns_get_path(struct path *path, struct task_struct *task,
> >  			const struct proc_ns_operations *ns_ops);
> >  
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index d01f1cb3cfc0..72b37fc3bc0c 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -921,6 +921,9 @@ struct bpf_prog_info {
> >  	__u32 nr_map_ids;
> >  	__aligned_u64 map_ids;
> >  	char name[BPF_OBJ_NAME_LEN];
> > +	__u32 ifindex;
> > +	__u64 netns_dev;
> > +	__u64 netns_ino;
> >  } __attribute__((aligned(8)));
> >  
> >  struct bpf_map_info {
> > diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
> > index 1e6064ea3609..4d50000bd1e3 100644
> > --- a/kernel/bpf/offload.c
> > +++ b/kernel/bpf/offload.c
> > @@ -16,9 +16,11 @@
> >  #include <linux/bpf.h>
> >  #include <linux/bpf_verifier.h>
> >  #include <linux/bug.h>
> > +#include <linux/kdev_t.h>
> >  #include <linux/list.h>
> >  #include <linux/netdevice.h>
> >  #include <linux/printk.h>
> > +#include <linux/proc_ns.h>
> >  #include <linux/rtnetlink.h>
> >  #include <linux/rwsem.h>
> >  
> > @@ -181,6 +183,48 @@ int bpf_prog_offload_compile(struct bpf_prog *prog)
> >  	return bpf_prog_offload_translate(prog);
> >  }
> >  
> > +int bpf_prog_offload_info_fill(struct bpf_prog_info *info,
> > +			       struct bpf_prog *prog)
> > +{
> > +	struct bpf_dev_offload *offload;
> > +	struct inode *ns_inode;
> > +	struct path ns_path;
> > +	int ifindex, err;
> > +	struct net *net;
> > +
> > +again:
> > +	rtnl_lock();
> > +	down_read(&bpf_devs_lock);
> > +
> > +	offload = prog->aux->offload;
> > +	if (!offload) {
> > +		up_read(&bpf_devs_lock);
> > +		rtnl_unlock();
> > +		return -ENODEV;
> > +	}
> > +
> > +	ifindex = offload->netdev->ifindex;
> > +	net = dev_net(offload->netdev);
> > +	get_net(net); /* __ns_get_path() drops the reference */
> > +
> > +	up_read(&bpf_devs_lock);
> > +	rtnl_unlock();
> > +
> > +	err = PTR_ERR_OR_ZERO(__ns_get_path(&ns_path, &net->ns));
> > +	if (err) {
> > +		if (err == -EAGAIN)
> > +			goto again;
> > +		return err;
> > +	}
> > +	ns_inode = ns_path.dentry->d_inode;
> > +
> > +	info->ifindex = ifindex;
> > +	info->netns_dev = new_encode_dev(ns_inode->i_sb->s_dev);
> > +	info->netns_ino = ns_inode->i_ino;  
> 
> I think here, we're still missing a:
> 
> 	path_put(&ns_path);
> 
> Otherwise we're leaking that reference, no?

You're right.  Thanks for the detailed suggestion, let me try to meld
it into this series.

> Looking at perf, it does similar fetch of dev/ino pair through
> perf_fill_ns_link_info(). Only difference is that task / ns_ops
> are used to lookup the actual ns from a task, which we already
> have in our case (netns) through netdev.
> 
> I've been thinking, could we make this more generic and move the
> bits into fs/nsfs.c by having the retry itself in ns_get_path_cb()?
> 
> What I mean is:
> 
> void *ns_get_path_cb(struct path *path, ns_get_path_helper_t *ns_get_cb,
>                      void *private_data)
> {
>         struct ns_common *ns;
>         void *ret;
> 
> again:
>         ns = ns_get_cb(private_data);
>         if (!ns)
>                 return ERR_PTR(-ENOENT);
> 
>         ret = __ns_get_path(path, ns);
>         if (IS_ERR(ret) && PTR_ERR(ret) == -EAGAIN)
>                 goto again;
>         return ret;
> }
> 
> And then in bpf_prog_offload_info_fill(), we can just move the fetching
> and holding ref of net into the callback and do something similar to:
> 
> struct path ns_path;
> struct inode *ns_inode;
> struct cb_info = {
>         .offload = prog->aux->offload,
>         .info    = info,
> };
> void *ret;
> 
> ret = ns_get_path_cb(&ns_path, bpf_prog_offload_get_cb, &cb_info);
> if (!ret) {
>         ns_inode = ns_path.dentry->d_inode;
>         info->netns_dev = new_encode_dev(ns_inode->i_sb->s_dev);
>         info->netns_ino = ns_inode->i_ino;
>         path_put(&ns_path);
> }
> 
> And bpf_prog_offload_get_cb() would either return &net->ns in case
> of valid offload and can fill in the ifindex into info since we hold
> rtnl there, or otherwise return NULL, so we get -ENOENT back.
> 
> This would allow to keep __ns_get_path() (including retry) private and
> have a more generic, reusable API similar to ns_get_path() that just
> has a cb for getting ns_common.
> 
> (Rest of the series with feedback from Kirill included looks good to me.)

^ permalink raw reply

* Re: [PATCH v2 bpf-next 0/3] bpf: two stack check fixes
From: Daniel Borkmann @ 2017-12-27 23:08 UTC (permalink / raw)
  To: Alexei Starovoitov, David S . Miller; +Cc: Jann Horn, netdev, kernel-team
In-Reply-To: <20171225211542.3155576-1-ast@kernel.org>

On 12/25/2017 10:15 PM, Alexei Starovoitov wrote:
> Jann reported an issue with stack depth tracking. Fix it and add tests.
> Also fix off-by-one error in MAX_CALL_FRAMES check.
> This set is on top of Jann's "selftest for late caller stack size increase" test.
> 
> Alexei Starovoitov (3):
>   bpf: fix maximum stack depth tracking logic
>   selftests/bpf: additional stack depth tests
>   bpf: fix max call depth check
> 
>  include/linux/bpf_verifier.h                |   1 +
>  kernel/bpf/verifier.c                       |  86 +++++++++++----
>  tools/testing/selftests/bpf/test_verifier.c | 156 ++++++++++++++++++++++++++++
>  3 files changed, 225 insertions(+), 18 deletions(-)

Applied to bpf-next, thanks Alexei!

^ permalink raw reply

* Re: [PATCH] bpf: selftest for late caller stack size increase
From: Daniel Borkmann @ 2017-12-27 23:07 UTC (permalink / raw)
  To: Jann Horn, Alexei Starovoitov; +Cc: netdev, linux-kernel
In-Reply-To: <20171222181235.158636-1-jannh@google.com>

On 12/22/2017 07:12 PM, Jann Horn wrote:
> This checks that it is not possible to bypass the total stack size check in
> update_stack_depth() by calling a function that uses a large amount of
> stack memory *before* using a large amount of stack memory in the caller.
> 
> Currently, the first added testcase causes a rejection as expected, but
> the second testcase is (AFAICS incorrectly) accepted:
> 
> [...]
> #483/p calls: stack overflow using two frames (post-call access) FAIL
> Unexpected success to load!
> 0: (85) call pc+2
> caller:
>  R10=fp0,call_-1
> callee:
>  frame1: R1=ctx(id=0,off=0,imm=0) R10=fp0,call_0
> 3: (72) *(u8 *)(r10 -300) = 0
> 4: (b7) r0 = 0
> 5: (95) exit
> returning from callee:
>  frame1: R0_w=inv0 R1=ctx(id=0,off=0,imm=0) R10=fp0,call_0
> to caller at 1:
>  R0_w=inv0 R10=fp0,call_-1
> 
> from 5 to 1: R0=inv0 R10=fp0,call_-1
> 1: (72) *(u8 *)(r10 -300) = 0
> 2: (95) exit
> processed 6 insns, stack depth 300+300
> [...]
> Summary: 704 PASSED, 1 FAILED
> 
> AFAICS the JIT-generated code for the second testcase shows that this
> really causes the stack pointer to be decremented by 300+300:
> 
> first function:
> 00000000  55                push rbp
> 00000001  4889E5            mov rbp,rsp
> 00000004  4881EC58010000    sub rsp,0x158
> 0000000B  4883ED28          sub rbp,byte +0x28
> [...]
> 00000025  E89AB3AFE5        call 0xffffffffe5afb3c4
> 0000002A  C685D4FEFFFF00    mov byte [rbp-0x12c],0x0
> [...]
> 00000041  4883C528          add rbp,byte +0x28
> 00000045  C9                leave
> 00000046  C3                ret
> 
> second function:
> 00000000  55                push rbp
> 00000001  4889E5            mov rbp,rsp
> 00000004  4881EC58010000    sub rsp,0x158
> 0000000B  4883ED28          sub rbp,byte +0x28
> [...]
> 00000025  C685D4FEFFFF00    mov byte [rbp-0x12c],0x0
> [...]
> 0000003E  4883C528          add rbp,byte +0x28
> 00000042  C9                leave
> 00000043  C3                ret
> 
> Signed-off-by: Jann Horn <jannh@google.com>

Applied to bpf-next, thanks a lot Jann!

^ permalink raw reply

* Re: [PATCH 3/4] libbpf: break loop earlier
From: Daniel Borkmann @ 2017-12-27 23:05 UTC (permalink / raw)
  To: Eric Leblond, Alexei Starovoitov; +Cc: netdev, linux-kernel
In-Reply-To: <1514406645.31134.29.camel@regit.org>

On 12/27/2017 09:30 PM, Eric Leblond wrote:
> On Wed, 2017-12-27 at 11:00 -0800, Alexei Starovoitov wrote:
>> On Wed, Dec 27, 2017 at 07:02:28PM +0100, Eric Leblond wrote:
>>> Get out of the loop when we have a match.
>>>
>>> Signed-off-by: Eric Leblond <eric@regit.org>
>>> ---
>>>  tools/lib/bpf/libbpf.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>>> index 5fe8aaa2123e..d263748aa341 100644
>>> --- a/tools/lib/bpf/libbpf.c
>>> +++ b/tools/lib/bpf/libbpf.c
>>> @@ -412,6 +412,7 @@ bpf_object__init_prog_names(struct bpf_object
>>> *obj)
>>>  					   prog->section_name);
>>>  				return -LIBBPF_ERRNO__LIBELF;
>>>  			}
>>> +			break;
>>
>> why this is needed?
> 
> It was just cosmetic, no related bug.
> 
>> The top of the loop is:
>>  for (si = 0; si < symbols->d_size / sizeof(GElf_Sym) && !name;
>>
>> so as soon as name is found the loop will exit.
> 
> OK, I've missed that. Please disregard this patch.

Ok, please respin the series one last time w/o this patch then.

Please also keep the already given ACKs on the other patches.

Thanks, Eric!

^ permalink raw reply

* Re: [PATCH net-next 09/10] net: qualcomm: rmnet: Add support for TX checksum offload
From: kbuild test robot @ 2017-12-27 22:56 UTC (permalink / raw)
  To: Subash Abhinov Kasiviswanathan
  Cc: kbuild-all, davem, netdev, Subash Abhinov Kasiviswanathan
In-Reply-To: <1514341685-11262-10-git-send-email-subashab@codeaurora.org>

Hi Subash,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Subash-Abhinov-Kasiviswanathan/net-qualcomm-rmnet-Enable-csum-offloads/20171228-041216
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)


vim +187 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c

   106	
   107	#if IS_ENABLED(CONFIG_IPV6)
   108	static int
   109	rmnet_map_ipv6_dl_csum_trailer(struct sk_buff *skb,
   110				       struct rmnet_map_dl_csum_trailer *csum_trailer)
   111	{
   112		u16 ip_pseudo_payload_csum, pseudo_csum, ip6_hdr_csum, *csum_field;
   113		u16 csum_value, ip6_payload_csum, csum_value_final;
   114		struct ipv6hdr *ip6h;
   115		void *txporthdr;
   116		u32 length;
   117	
   118		ip6h = (struct ipv6hdr *)(skb->data);
   119	
   120		txporthdr = skb->data + sizeof(struct ipv6hdr);
   121		csum_field = rmnet_map_get_csum_field(ip6h->nexthdr, txporthdr);
   122	
   123		if (!csum_field)
   124			return -EPROTONOSUPPORT;
   125	
   126		csum_value = ~ntohs(csum_trailer->csum_value);
   127		ip6_hdr_csum = ~ntohs(ip_compute_csum(ip6h,
   128				      (int)(txporthdr - (void *)(skb->data))));
   129		ip6_payload_csum = csum16_sub(csum_value, ip6_hdr_csum);
   130	
   131		length = (ip6h->nexthdr == IPPROTO_UDP) ?
   132			 ntohs(((struct udphdr *)txporthdr)->len) :
   133			 ntohs(ip6h->payload_len);
   134		pseudo_csum = ~ntohs(csum_ipv6_magic(&ip6h->saddr, &ip6h->daddr,
   135				     length, ip6h->nexthdr, 0));
   136		ip_pseudo_payload_csum = csum16_add(ip6_payload_csum, pseudo_csum);
   137	
 > 138		csum_value_final = ~csum16_sub(ip_pseudo_payload_csum,
 > 139					       ntohs(*csum_field));
   140	
   141		if (unlikely(csum_value_final == 0)) {
   142			switch (ip6h->nexthdr) {
   143			case IPPROTO_UDP:
   144				/* RFC 2460 section 8.1
   145				 * DL6 One's complement rule for UDP checksum 0
   146				 */
   147				csum_value_final = ~csum_value_final;
   148				break;
   149	
   150			case IPPROTO_TCP:
   151				/* DL6 Non-RFC compliant TCP checksum found */
   152				if (*csum_field == 0xFFFF)
   153					csum_value_final = ~csum_value_final;
   154				break;
   155			}
   156		}
   157	
   158		if (csum_value_final == ntohs(*csum_field))
   159			return 0;
   160		else
   161			return -EINVAL;
   162	}
   163	#endif
   164	
   165	static void rmnet_map_complement_ipv4_txporthdr_csum_field(void *iphdr)
   166	{
   167		struct iphdr *ip4h = (struct iphdr *)iphdr;
   168		void *txphdr;
   169		u16 *csum;
   170	
   171		txphdr = iphdr + ip4h->ihl * 4;
   172	
   173		if (ip4h->protocol == IPPROTO_TCP || ip4h->protocol == IPPROTO_UDP) {
   174			csum = (u16 *)rmnet_map_get_csum_field(ip4h->protocol, txphdr);
   175			*csum = ~(*csum);
   176		}
   177	}
   178	
   179	static void
   180	rmnet_map_ipv4_ul_csum_header(void *iphdr,
   181				      struct rmnet_map_ul_csum_header *ul_header,
   182				      struct sk_buff *skb)
   183	{
   184		struct iphdr *ip4h = (struct iphdr *)iphdr;
   185		u16 *hdr = (u16 *)ul_header;
   186	
 > 187		ul_header->csum_start_offset = htons((u16)(skb_transport_header(skb) -
   188							   (unsigned char *)iphdr));
   189		ul_header->csum_insert_offset = skb->csum_offset;
   190		ul_header->csum_enabled = 1;
   191		if (ip4h->protocol == IPPROTO_UDP)
   192			ul_header->udp_ip4_ind = 1;
   193		else
   194			ul_header->udp_ip4_ind = 0;
   195	
   196		/* Changing remaining fields to network order */
   197		hdr++;
 > 198		*hdr = htons(*hdr);
   199	
   200		skb->ip_summed = CHECKSUM_NONE;
   201	
   202		rmnet_map_complement_ipv4_txporthdr_csum_field(iphdr);
   203	}
   204	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply

* Re: [RFC PATCH bpf-next v2 4/4] error-injection: Support fault injection framework
From: Alexei Starovoitov @ 2017-12-27 22:49 UTC (permalink / raw)
  To: Masami Hiramatsu, Alexei Starovoitov
  Cc: Josef Bacik, rostedt, mingo, davem, netdev, linux-kernel, ast,
	kernel-team, daniel, linux-btrfs, darrick.wong, Josef Bacik,
	Akinobu Mita
In-Reply-To: <20171227170910.5ac1074bc86341f194130119@kernel.org>

On 12/27/17 12:09 AM, Masami Hiramatsu wrote:
> On Tue, 26 Dec 2017 18:12:56 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
>> On Tue, Dec 26, 2017 at 04:48:25PM +0900, Masami Hiramatsu wrote:
>>> Support in-kernel fault-injection framework via debugfs.
>>> This allows you to inject a conditional error to specified
>>> function using debugfs interfaces.
>>>
>>> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
>>> ---
>>>  Documentation/fault-injection/fault-injection.txt |    5 +
>>>  kernel/Makefile                                   |    1
>>>  kernel/fail_function.c                            |  169 +++++++++++++++++++++
>>>  lib/Kconfig.debug                                 |   10 +
>>>  4 files changed, 185 insertions(+)
>>>  create mode 100644 kernel/fail_function.c
>>>
>>> diff --git a/Documentation/fault-injection/fault-injection.txt b/Documentation/fault-injection/fault-injection.txt
>>> index 918972babcd8..6243a588dd71 100644
>>> --- a/Documentation/fault-injection/fault-injection.txt
>>> +++ b/Documentation/fault-injection/fault-injection.txt
>>> @@ -30,6 +30,11 @@ o fail_mmc_request
>>>    injects MMC data errors on devices permitted by setting
>>>    debugfs entries under /sys/kernel/debug/mmc0/fail_mmc_request
>>>
>>> +o fail_function
>>> +
>>> +  injects error return on specific functions by setting debugfs entries
>>> +  under /sys/kernel/debug/fail_function. No boot option supported.
>>
>> I like it.
>> Could you document it a bit better?
>
> Yes, I will do in next series.
>
>> In particular retval is configurable, but without an example no one
>> will be able to figure out how to use it.
>
> Ah, right. BTW, as I pointed in the covermail, should we store the
> expected error value range into the injectable list? e.g.
>
> ALLOW_ERROR_INJECTION(open_ctree, -1, -MAX_ERRNO)
>
> And provide APIs to check/get it.

I'm afraid such check would be too costly.
Right now we have only two functions marked but I expect hundreds more
will be added in the near future as soon as developers realize the
potential of such error injection.
All of ALLOW_ERROR_INJECTION marks add 8 byte overhead each to .data.
Multiple by 1k and we have 8k of data spent on marks.
If we add max/min range marks that doubles it for very little use.
I think marking function only is enough.

^ permalink raw reply

* Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry
From: Alexei Starovoitov @ 2017-12-27 22:46 UTC (permalink / raw)
  To: Masami Hiramatsu, Alexei Starovoitov
  Cc: Josef Bacik, rostedt, mingo, davem, netdev, linux-kernel, ast,
	kernel-team, daniel, linux-btrfs, darrick.wong, Josef Bacik,
	Akinobu Mita
In-Reply-To: <20171227145628.53f68f391b2108d6df118ca7@kernel.org>

On 12/26/17 9:56 PM, Masami Hiramatsu wrote:
> On Tue, 26 Dec 2017 17:57:32 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
>> On Tue, Dec 26, 2017 at 04:46:59PM +0900, Masami Hiramatsu wrote:
>>> Check whether error injectable event is on function entry or not.
>>> Currently it checks the event is ftrace-based kprobes or not,
>>> but that is wrong. It should check if the event is on the entry
>>> of target function. Since error injection will override a function
>>> to just return with modified return value, that operation must
>>> be done before the target function starts making stackframe.
>>>
>>> As a side effect, bpf error injection is no need to depend on
>>> function-tracer. It can work with sw-breakpoint based kprobe
>>> events too.
>>>
>>> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
>>> ---
>>>  kernel/trace/Kconfig        |    2 --
>>>  kernel/trace/bpf_trace.c    |    6 +++---
>>>  kernel/trace/trace_kprobe.c |    8 +++++---
>>>  kernel/trace/trace_probe.h  |   12 ++++++------
>>>  4 files changed, 14 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
>>> index ae3a2d519e50..6400e1bf97c5 100644
>>> --- a/kernel/trace/Kconfig
>>> +++ b/kernel/trace/Kconfig
>>> @@ -533,9 +533,7 @@ config FUNCTION_PROFILER
>>>  config BPF_KPROBE_OVERRIDE
>>>  	bool "Enable BPF programs to override a kprobed function"
>>>  	depends on BPF_EVENTS
>>> -	depends on KPROBES_ON_FTRACE
>>>  	depends on HAVE_KPROBE_OVERRIDE
>>> -	depends on DYNAMIC_FTRACE_WITH_REGS
>>>  	default n
>>>  	help
>>>  	 Allows BPF to override the execution of a probed function and
>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>>> index f6d2327ecb59..d663660f8392 100644
>>> --- a/kernel/trace/bpf_trace.c
>>> +++ b/kernel/trace/bpf_trace.c
>>> @@ -800,11 +800,11 @@ int perf_event_attach_bpf_prog(struct perf_event *event,
>>>  	int ret = -EEXIST;
>>>
>>>  	/*
>>> -	 * Kprobe override only works for ftrace based kprobes, and only if they
>>> -	 * are on the opt-in list.
>>> +	 * Kprobe override only works if they are on the function entry,
>>> +	 * and only if they are on the opt-in list.
>>>  	 */
>>>  	if (prog->kprobe_override &&
>>> -	    (!trace_kprobe_ftrace(event->tp_event) ||
>>> +	    (!trace_kprobe_on_func_entry(event->tp_event) ||
>>>  	     !trace_kprobe_error_injectable(event->tp_event)))
>>>  		return -EINVAL;
>>>
>>> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
>>> index 91f4b57dab82..265e3e27e8dc 100644
>>> --- a/kernel/trace/trace_kprobe.c
>>> +++ b/kernel/trace/trace_kprobe.c
>>> @@ -88,13 +88,15 @@ static nokprobe_inline unsigned long trace_kprobe_nhit(struct trace_kprobe *tk)
>>>  	return nhit;
>>>  }
>>>
>>> -int trace_kprobe_ftrace(struct trace_event_call *call)
>>> +bool trace_kprobe_on_func_entry(struct trace_event_call *call)
>>>  {
>>>  	struct trace_kprobe *tk = (struct trace_kprobe *)call->data;
>>> -	return kprobe_ftrace(&tk->rp.kp);
>>> +
>>> +	return kprobe_on_func_entry(tk->rp.kp.addr, tk->rp.kp.symbol_name,
>>> +				    tk->rp.kp.offset);
>>
>> That would be nice, but did you test this?
>
> Yes, because the jprobe, which was only official user of modifying execution
> path using kprobe, did same way to check. (and kretprobe also does it)
>
>> My understanding that kprobe will restore all regs and
>> here we need to override return ip _and_ value.
>
> yes, no problem. kprobe restore all regs from pt_regs, including regs->ip.
>
>> Could you add a patch with the test the way Josef did
>> or describe the steps to test this new mode?
>
> Would you mean below patch? If so, it should work without any change.
>
>  [PATCH v10 4/5] samples/bpf: add a test for bpf_override_return

yeah. I expect bpf_override_return test to work as-is.
I'm asking for the test for new functionality added by this patch.
In particular kprobe on func entry without ftrace.
How did you test it?
and how I can repeat the test?
I'm still not sure that it works correctly.

^ permalink raw reply

* Re: [PATCH net-next 5/6] arm64: dts: marvell: mcbin: enable the fourth network interface
From: Antoine Tenart @ 2017-12-27 22:42 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Antoine Tenart, davem, kishon, andrew, jason,
	sebastian.hesselbarth, gregory.clement, mw, stefanc, ymarkman,
	thomas.petazzoni, miquel.raynal, nadavh, netdev, linux-arm-kernel,
	linux-kernel
In-Reply-To: <20171227222401.GT10595@n2100.armlinux.org.uk>

Hi Russell,

On Wed, Dec 27, 2017 at 10:24:01PM +0000, Russell King - ARM Linux wrote:
> On Wed, Dec 27, 2017 at 11:14:45PM +0100, Antoine Tenart wrote:
> >  
> > +&cps_eth2 {
> > +	/* CPS Lane 5 */
> > +	status = "okay";
> > +	phy-mode = "2500base-x";
> > +	/* Generic PHY, providing serdes lanes */
> > +	phys = <&cps_comphy5 2>;
> > +};
> > +
> 
> This is wrong.  This lane is connected to a SFP cage which can support
> more than just 2500base-X.  Tying it in this way to 2500base-X means
> that this port does not support conenctions at 1000base-X, despite
> that's one of the most popular and more standardised speeds.

What do you suggest to describe this in the dt, to enable a port using
the current PPv2 driver?

Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply

* [PATCH ipsec] xfrm: skip policies marked as dead while rehashing
From: Florian Westphal @ 2017-12-27 22:25 UTC (permalink / raw)
  To: netdev
  Cc: steffen.klassert, syzkaller-bugs, Florian Westphal, Herbert Xu,
	Timo Teras, Christophe Gouault
In-Reply-To: <001a1140f578d314ec0560d9f29a@google.com>

syzkaller triggered following KASAN splat:

BUG: KASAN: slab-out-of-bounds in xfrm_hash_rebuild+0xdbe/0xf00 net/xfrm/xfrm_policy.c:618
read of size 2 at addr ffff8801c8e92fe4 by task kworker/1:1/23 [..]
Workqueue: events xfrm_hash_rebuild [..]
 __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:428
 xfrm_hash_rebuild+0xdbe/0xf00 net/xfrm/xfrm_policy.c:618
 process_one_work+0xbbf/0x1b10 kernel/workqueue.c:2112
 worker_thread+0x223/0x1990 kernel/workqueue.c:2246 [..]

The reproducer triggers:
1016                 if (error) {
1017                         list_move_tail(&walk->walk.all, &x->all);
1018                         goto out;
1019                 }

in xfrm_policy_walk() via pfkey (it sets tiny rcv space, dump
callback returns -ENOBUFS).

In this case, *walk is located the pfkey socket struct, so this socket
becomes visible in the global policy list.

It looks like this is intentional -- phony walker has walk.dead set to 1
and all other places skip such "policies".

Ccing original authors of the two commits that seem to expose this
issue (first patch missed ->dead check, second patch adds pfkey
sockets to policies dumper list).

Fixes: 880a6fab8f6ba5b ("xfrm: configure policy hash table thresholds by netlink")
Fixes: 12a169e7d8f4b1c ("ipsec: Put dumpers on the dump list")
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Timo Teras <timo.teras@iki.fi>
Cc: Christophe Gouault <christophe.gouault@6wind.com>
Reported-by: syzbot <bot+c028095236fcb6f4348811565b75084c754dc729@syzkaller.appspotmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/xfrm/xfrm_policy.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 9542975eb2f9..181bc6181789 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -609,7 +609,8 @@ static void xfrm_hash_rebuild(struct work_struct *work)
 
 	/* re-insert all policies by order of creation */
 	list_for_each_entry_reverse(policy, &net->xfrm.policy_all, walk.all) {
-		if (xfrm_policy_id2dir(policy->index) >= XFRM_POLICY_MAX) {
+		if (policy->walk.dead ||
+		    xfrm_policy_id2dir(policy->index) >= XFRM_POLICY_MAX) {
 			/* skip socket policies */
 			continue;
 		}
-- 
2.13.6

^ permalink raw reply related

* Re: [PATCH net-next 5/6] arm64: dts: marvell: mcbin: enable the fourth network interface
From: Russell King - ARM Linux @ 2017-12-27 22:24 UTC (permalink / raw)
  To: Antoine Tenart
  Cc: davem, kishon, andrew, jason, sebastian.hesselbarth,
	gregory.clement, mw, stefanc, ymarkman, thomas.petazzoni,
	miquel.raynal, nadavh, netdev, linux-arm-kernel, linux-kernel
In-Reply-To: <20171227221446.18459-6-antoine.tenart@free-electrons.com>

On Wed, Dec 27, 2017 at 11:14:45PM +0100, Antoine Tenart wrote:
> This patch enables the fourth network interface on the Marvell
> Macchiatobin. It is configured in the 2500Base-X PHY mode.
> 
> Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
> ---
>  arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
> index b3350827ee55..c51efd511324 100644
> --- a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
> +++ b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
> @@ -279,6 +279,14 @@
>  	phys = <&cps_comphy0 1>;
>  };
>  
> +&cps_eth2 {
> +	/* CPS Lane 5 */
> +	status = "okay";
> +	phy-mode = "2500base-x";
> +	/* Generic PHY, providing serdes lanes */
> +	phys = <&cps_comphy5 2>;
> +};
> +

This is wrong.  This lane is connected to a SFP cage which can support
more than just 2500base-X.  Tying it in this way to 2500base-X means
that this port does not support conenctions at 1000base-X, despite
that's one of the most popular and more standardised speeds.

So, I feel I have to NAK this patch in its current form.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply

* [PATCH net-next 6/6] arm64: dts: marvell: add Ethernet aliases
From: Antoine Tenart @ 2017-12-27 22:14 UTC (permalink / raw)
  To: davem, kishon, andrew, jason, sebastian.hesselbarth,
	gregory.clement, linux
  Cc: Yan Markman, mw, stefanc, thomas.petazzoni, miquel.raynal, nadavh,
	netdev, linux-arm-kernel, linux-kernel, Antoine Tenart
In-Reply-To: <20171227221446.18459-1-antoine.tenart@free-electrons.com>

From: Yan Markman <ymarkman@marvell.com>

This patch adds Ethernet aliases in the Marvell Aramada 7040 DB, 8040 DB
and 8040 mcbin device trees so that the bootloader setup the MAC
addresses correctly.

Signed-off-by: Yan Markman <ymarkman@marvell.com>
[Antoine: commit message, small fixes]
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 arch/arm64/boot/dts/marvell/armada-7040-db.dts    | 6 ++++++
 arch/arm64/boot/dts/marvell/armada-8040-db.dts    | 7 +++++++
 arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts | 7 +++++++
 3 files changed, 20 insertions(+)

diff --git a/arch/arm64/boot/dts/marvell/armada-7040-db.dts b/arch/arm64/boot/dts/marvell/armada-7040-db.dts
index 52b5341cb270..62b83416b30c 100644
--- a/arch/arm64/boot/dts/marvell/armada-7040-db.dts
+++ b/arch/arm64/boot/dts/marvell/armada-7040-db.dts
@@ -61,6 +61,12 @@
 		reg = <0x0 0x0 0x0 0x80000000>;
 	};
 
+	aliases {
+		ethernet0 = &cpm_eth0;
+		ethernet1 = &cpm_eth1;
+		ethernet2 = &cpm_eth2;
+	};
+
 	cpm_reg_usb3_0_vbus: cpm-usb3-0-vbus {
 		compatible = "regulator-fixed";
 		regulator-name = "usb3h0-vbus";
diff --git a/arch/arm64/boot/dts/marvell/armada-8040-db.dts b/arch/arm64/boot/dts/marvell/armada-8040-db.dts
index d97b72bed662..d9fffde64c44 100644
--- a/arch/arm64/boot/dts/marvell/armada-8040-db.dts
+++ b/arch/arm64/boot/dts/marvell/armada-8040-db.dts
@@ -61,6 +61,13 @@
 		reg = <0x0 0x0 0x0 0x80000000>;
 	};
 
+	aliases {
+		ethernet0 = &cpm_eth0;
+		ethernet1 = &cpm_eth2;
+		ethernet2 = &cps_eth0;
+		ethernet3 = &cps_eth1;
+	};
+
 	cpm_reg_usb3_0_vbus: cpm-usb3-0-vbus {
 		compatible = "regulator-fixed";
 		regulator-name = "cpm-usb3h0-vbus";
diff --git a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
index c51efd511324..7a23bb279e1c 100644
--- a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
+++ b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
@@ -62,6 +62,13 @@
 		reg = <0x0 0x0 0x0 0x80000000>;
 	};
 
+	aliases {
+		ethernet0 = &cpm_eth0;
+		ethernet1 = &cps_eth0;
+		ethernet2 = &cps_eth1;
+		ethernet3 = &cps_eth2;
+	};
+
 	/* Regulator labels correspond with schematics */
 	v_3_3: regulator-3-3v {
 		compatible = "regulator-fixed";
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next 5/6] arm64: dts: marvell: mcbin: enable the fourth network interface
From: Antoine Tenart @ 2017-12-27 22:14 UTC (permalink / raw)
  To: davem, kishon, andrew, jason, sebastian.hesselbarth,
	gregory.clement, linux
  Cc: Antoine Tenart, mw, stefanc, ymarkman, thomas.petazzoni,
	miquel.raynal, nadavh, netdev, linux-arm-kernel, linux-kernel
In-Reply-To: <20171227221446.18459-1-antoine.tenart@free-electrons.com>

This patch enables the fourth network interface on the Marvell
Macchiatobin. It is configured in the 2500Base-X PHY mode.

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
index b3350827ee55..c51efd511324 100644
--- a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
+++ b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
@@ -279,6 +279,14 @@
 	phys = <&cps_comphy0 1>;
 };
 
+&cps_eth2 {
+	/* CPS Lane 5 */
+	status = "okay";
+	phy-mode = "2500base-x";
+	/* Generic PHY, providing serdes lanes */
+	phys = <&cps_comphy5 2>;
+};
+
 &cps_pinctrl {
 	cps_spi1_pins: spi1-pins {
 		marvell,pins = "mpp12", "mpp13", "mpp14", "mpp15", "mpp16";
-- 
2.14.3

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox