Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: general protection fault in xsk_poll
From: Björn Töpel @ 2019-08-20  9:17 UTC (permalink / raw)
  To: Hillf Danton, syzbot
  Cc: ast, bpf, daniel, davem, hawk, jakub.kicinski, john.fastabend,
	jonathan.lemon, kafai, linux-kernel, magnus.karlsson, netdev,
	songliubraving, syzkaller-bugs, xdp-newbies, yhs
In-Reply-To: <20190820033154.9112-1-hdanton@sina.com>

On 2019-08-20 05:31, Hillf Danton wrote:
> 
> On Mon, 19 Aug 2019 18:18:06 -0700
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:    da657043 Add linux-next specific files for 20190819
>> git tree:       linux-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=16af124c600000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=739a9b3ab3d8c770
>> dashboard link: https://syzkaller.appspot.com/bug?extid=c82697e3043781e08802
>> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=109e1922600000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1445bf02600000
>>
>> The bug was bisected to:
>>
>> commit 77cd0d7b3f257fd0e3096b4fdcff1a7d38e99e10
>> Author: Magnus Karlsson <magnus.karlsson@intel.com>
>> Date:   Wed Aug 14 07:27:17 2019 +0000
>>
>>       xsk: add support for need_wakeup flag in AF_XDP rings
>>
>> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=15e1ea4c600000
>> final crash:    https://syzkaller.appspot.com/x/report.txt?x=17e1ea4c600000
>> console output: https://syzkaller.appspot.com/x/log.txt?x=13e1ea4c600000
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
>> Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
>>
>> kasan: CONFIG_KASAN_INLINE enabled
>> kasan: GPF could be caused by NULL-ptr deref or user memory access
>> general protection fault: 0000 [#1] PREEMPT SMP KASAN
>> CPU: 1 PID: 7959 Comm: syz-executor611 Not tainted 5.3.0-rc5-next-20190819 #68
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> Google 01/01/2011
>> RIP: 0010:xsk_poll+0x95/0x540 net/xdp/xsk.c:386
>> Code: 80 3c 02 00 0f 85 70 04 00 00 4c 8b a3 88 04 00 00 48 b8 00 00 00 00
>> 00 fc ff df 49 8d bc 24 96 00 00 00 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48
>> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 bf 03 00 00
>> RSP: 0018:ffff8880926f7850 EFLAGS: 00010207
>> RAX: dffffc0000000000 RBX: ffff88809a141700 RCX: ffffffff859b07aa
>> RDX: 0000000000000012 RSI: ffffffff859b07c4 RDI: 0000000000000096
>> RBP: ffff8880926f7880 R08: ffff88809698a580 R09: ffffed1013428329
>> R10: ffffed1013428328 R11: ffff88809a141947 R12: 0000000000000000
>> R13: 0000000000000304 R14: ffff888095d4d840 R15: ffff888092bdd020
>> FS:  0000555557529880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000020000280 CR3: 0000000098281000 CR4: 00000000001406e0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Call Trace:
>>    sock_poll+0x15e/0x480 net/socket.c:1256
>>    vfs_poll include/linux/poll.h:90 [inline]
>>    do_pollfd fs/select.c:859 [inline]
>>    do_poll fs/select.c:907 [inline]
>>    do_sys_poll+0x7c2/0xde0 fs/select.c:1001
>>    __do_sys_ppoll fs/select.c:1101 [inline]
>>    __se_sys_ppoll fs/select.c:1081 [inline]
>>    __x64_sys_ppoll+0x259/0x310 fs/select.c:1081
>>    do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
>>    entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> RIP: 0033:0x440159
>> Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7
>> 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
>> ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
>> RSP: 002b:00007ffd9fbd16e8 EFLAGS: 00000246 ORIG_RAX: 000000000000010f
>> RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440159
>> RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000020000280
>> RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8
>> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004019e0
>> R13: 0000000000401a70 R14: 0000000000000000 R15: 0000000000000000
>> Modules linked in:
>> ---[ end trace da907175426b4065 ]---
> 
> Add umem check.
> 
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -381,9 +381,9 @@ static unsigned int xsk_poll(struct file
>   	struct sock *sk = sock->sk;
>   	struct xdp_sock *xs = xdp_sk(sk);
>   	struct net_device *dev = xs->dev;
> -	struct xdp_umem *umem = xs->umem;
> +	struct xdp_umem *umem = READ_ONCE(xs->umem);
>   
> -	if (umem->need_wakeup)
> +	if (umem && umem->need_wakeup)
>   		dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
>   						umem->need_wakeup);
>   
> --
> 

Thanks!

What do you think about making it a bit more generic, like:

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index ee4428a892fa..08bed5e92af4 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -356,13 +356,20 @@ static int xsk_generic_xmit(struct sock *sk, 
struct msghdr *m,
  	return err;
  }

+static bool xsk_is_bound(struct xdp_sock *xs)
+{
+	struct net_device *dev = READ_ONCE(xs->dev);
+
+	return dev && xs->state == XSK_BOUND;
+}
+
  static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t 
total_len)
  {
  	bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
  	struct sock *sk = sock->sk;
  	struct xdp_sock *xs = xdp_sk(sk);

-	if (unlikely(!xs->dev))
+	if (unlikely(!xsk_is_bound(xs)))
  		return -ENXIO;
  	if (unlikely(!(xs->dev->flags & IFF_UP)))
  		return -ENETDOWN;
@@ -383,6 +390,9 @@ static unsigned int xsk_poll(struct file *file, 
struct socket *sock,
  	struct net_device *dev = xs->dev;
  	struct xdp_umem *umem = xs->umem;

+	if (unlikely(!xsk_is_bound(xs)))
+		return mask;
+
  	if (umem->need_wakeup)
  		dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
  						umem->need_wakeup);
@@ -417,7 +427,7 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
  {
  	struct net_device *dev = xs->dev;

-	if (!dev || xs->state != XSK_BOUND)
+	if (!xsk_is_bound(xs))
  		return;

  	xs->state = XSK_UNBOUND;

^ permalink raw reply related

* Re: Help needed - Kernel lockup while running ipsec
From: Florian Westphal @ 2019-08-20  9:23 UTC (permalink / raw)
  To: Vakul Garg; +Cc: Florian Westphal, netdev@vger.kernel.org
In-Reply-To: <DB7PR04MB4620C6E770C97AB14A04A1D98BAB0@DB7PR04MB4620.eurprd04.prod.outlook.com>

Vakul Garg <vakul.garg@nxp.com> wrote:
> > > With kernel 4.14.122, I am getting a kernel softlockup while running single
> > static ipsec tunnel.
> > > The problem reproduces mostly after running 8-10 hours of ipsec encap
> > test (on my dual core arm board).
> > >
> > > I found that in function xfrm_policy_lookup_bytype(), the policy in variable
> > 'ret' shows refcnt=0 under problem situation.
> > > This creates an infinite loop in  xfrm_policy_lookup_bytype() and hence the
> > lockup.
> > >
> > > Can some body please provide me pointers about 'refcnt'?
> > > Is it legitimate for 'refcnt' to become '0'? Under what condition can it
> > become '0'?
> > 
> > Yes, when policy is destroyed and the last user calls
> > xfrm_pol_put() which will invoke call_rcu to free the structure.
> 
> It seems that policy reference count never gets decremented during packet ipsec encap.
> It is getting incremented for every frame that hits the policy.
> In setkey -DP output, I see refcnt to be wrapping around after '0'.

Thats a bug.  Does this affect 4.14 only or does this happen on current
tree as well?

^ permalink raw reply

* Re: [PATCH -next] bpf: Use PTR_ERR_OR_ZERO in xsk_map_inc()
From: Björn Töpel @ 2019-08-20  9:25 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Björn Töpel, YueHaibing, Karlsson, Magnus,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, Netdev,
	bpf, kernel-janitors
In-Reply-To: <20190820085547.GE4451@kadam>

On Tue, 20 Aug 2019 at 10:59, Dan Carpenter <dan.carpenter@oracle.com> wrote:
>
> On Tue, Aug 20, 2019 at 09:28:26AM +0200, Björn Töpel wrote:
> > For future patches: Prefix AF_XDP socket work with "xsk:" and use "PATCH
> > bpf-next" to let the developers know what tree you're aiming for.
>
> There are over 300 trees in linux-next.  It impossible to try remember
> everyone's trees.  No one else has this requirement.
>

Net/bpf are different, and I wanted to point that out to lessen the
burden for the maintainers. It's documented in:

Documentation/bpf/bpf_devel_QA.rst.
Documentation/networking/netdev-FAQ.rst

> Maybe add it as an option to get_maintainer.pl --tree <hash> then that
> would be very easy.
>

Yes, improved tooling would help.


Cheers,
Björn

> regards,
> dan carpenter
>

^ permalink raw reply

* RE: Help needed - Kernel lockup while running ipsec
From: Vakul Garg @ 2019-08-20  9:26 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev@vger.kernel.org
In-Reply-To: <20190820092303.GM2588@breakpoint.cc>



> -----Original Message-----
> From: Florian Westphal <fw@strlen.de>
> Sent: Tuesday, August 20, 2019 2:53 PM
> To: Vakul Garg <vakul.garg@nxp.com>
> Cc: Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
> Subject: Re: Help needed - Kernel lockup while running ipsec
> 
> Vakul Garg <vakul.garg@nxp.com> wrote:
> > > > With kernel 4.14.122, I am getting a kernel softlockup while
> > > > running single
> > > static ipsec tunnel.
> > > > The problem reproduces mostly after running 8-10 hours of ipsec
> > > > encap
> > > test (on my dual core arm board).
> > > >
> > > > I found that in function xfrm_policy_lookup_bytype(), the policy
> > > > in variable
> > > 'ret' shows refcnt=0 under problem situation.
> > > > This creates an infinite loop in  xfrm_policy_lookup_bytype() and
> > > > hence the
> > > lockup.
> > > >
> > > > Can some body please provide me pointers about 'refcnt'?
> > > > Is it legitimate for 'refcnt' to become '0'? Under what condition
> > > > can it
> > > become '0'?
> > >
> > > Yes, when policy is destroyed and the last user calls
> > > xfrm_pol_put() which will invoke call_rcu to free the structure.
> >
> > It seems that policy reference count never gets decremented during packet
> ipsec encap.
> > It is getting incremented for every frame that hits the policy.
> > In setkey -DP output, I see refcnt to be wrapping around after '0'.
> 
> Thats a bug.  Does this affect 4.14 only or does this happen on current tree
> as well?
 
I am yet to try it on 4.19.
Can you help me with the right fix? Which part of code should it get decremented?
I am not conversant with xfrm code.


^ permalink raw reply

* RE: Help needed - Kernel lockup while running ipsec
From: Vakul Garg @ 2019-08-20  9:30 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev@vger.kernel.org
In-Reply-To: <DB7PR04MB4620487074796FBC015AFD098BAB0@DB7PR04MB4620.eurprd04.prod.outlook.com>



> > -----Original Message-----
> > From: Florian Westphal <fw@strlen.de>
> > Sent: Tuesday, August 20, 2019 2:53 PM
> > To: Vakul Garg <vakul.garg@nxp.com>
> > Cc: Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
> > Subject: Re: Help needed - Kernel lockup while running ipsec
> >
> > Vakul Garg <vakul.garg@nxp.com> wrote:
> > > > > With kernel 4.14.122, I am getting a kernel softlockup while
> > > > > running single
> > > > static ipsec tunnel.
> > > > > The problem reproduces mostly after running 8-10 hours of ipsec
> > > > > encap
> > > > test (on my dual core arm board).
> > > > >
> > > > > I found that in function xfrm_policy_lookup_bytype(), the policy
> > > > > in variable
> > > > 'ret' shows refcnt=0 under problem situation.
> > > > > This creates an infinite loop in  xfrm_policy_lookup_bytype() and
> > > > > hence the
> > > > lockup.
> > > > >
> > > > > Can some body please provide me pointers about 'refcnt'?
> > > > > Is it legitimate for 'refcnt' to become '0'? Under what condition
> > > > > can it
> > > > become '0'?
> > > >
> > > > Yes, when policy is destroyed and the last user calls
> > > > xfrm_pol_put() which will invoke call_rcu to free the structure.
> > >
> > > It seems that policy reference count never gets decremented during
> packet
> > ipsec encap.
> > > It is getting incremented for every frame that hits the policy.
> > > In setkey -DP output, I see refcnt to be wrapping around after '0'.
> >
> > Thats a bug.  Does this affect 4.14 only or does this happen on current tree
> > as well?
> 
> I am yet to try it on 4.19.

Correction: I am yet to try it on current tree.

> Can you help me with the right fix? Which part of code should it get
> decremented?
> I am not conversant with xfrm code.


^ permalink raw reply

* [PATCH bpf-next v2 0/5] bpf: list BTF objects loaded on system
From: Quentin Monnet @ 2019-08-20  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet

Hi,
This set adds a new command BPF_BTF_GET_NEXT_ID to the bpf() system call,
adds the relevant API function in libbpf, and uses it in bpftool to list
all BTF objects loaded on the system (and to dump the ids of maps and
programs associated with them, if any).

The main motivation of listing BTF objects is introspection and debugging
purposes. By getting BPF program and map information, it should already be
possible to list all BTF objects associated to at least one map or one
program. But there may be unattached BTF objects, held by a file descriptor
from a user space process only, and we may want to list them too.

As a side note, it also turned useful for examining the BTF objects
attached to offloaded programs, which would not show in program information
because the BTF id is not copied when retrieving such info. A fix is in
progress on that side.

v2:
- Rebase patch with new libbpf function on top of Andrii's changes
  regarding libbpf versioning.

Quentin Monnet (5):
  bpf: add new BPF_BTF_GET_NEXT_ID syscall command
  tools: bpf: synchronise BPF UAPI header with tools
  libbpf: refactor bpf_*_get_next_id() functions
  libbpf: add bpf_btf_get_next_id() to cycle through BTF objects
  tools: bpftool: implement "bpftool btf show|list"

 include/linux/bpf.h                           |   3 +
 include/uapi/linux/bpf.h                      |   1 +
 kernel/bpf/btf.c                              |   4 +-
 kernel/bpf/syscall.c                          |   4 +
 .../bpf/bpftool/Documentation/bpftool-btf.rst |   7 +
 tools/bpf/bpftool/bash-completion/bpftool     |  20 +-
 tools/bpf/bpftool/btf.c                       | 342 +++++++++++++++++-
 tools/include/uapi/linux/bpf.h                |   1 +
 tools/lib/bpf/bpf.c                           |  24 +-
 tools/lib/bpf/bpf.h                           |   1 +
 tools/lib/bpf/libbpf.map                      |   2 +
 11 files changed, 389 insertions(+), 20 deletions(-)

-- 
2.17.1

^ permalink raw reply

* [PATCH bpf-next v2 1/5] bpf: add new BPF_BTF_GET_NEXT_ID syscall command
From: Quentin Monnet @ 2019-08-20  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet
In-Reply-To: <20190820093154.14042-1-quentin.monnet@netronome.com>

Add a new command for the bpf() system call: BPF_BTF_GET_NEXT_ID is used
to cycle through all BTF objects loaded on the system.

The motivation is to be able to inspect (list) all BTF objects presents
on the system.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/linux/bpf.h      | 3 +++
 include/uapi/linux/bpf.h | 1 +
 kernel/bpf/btf.c         | 4 ++--
 kernel/bpf/syscall.c     | 4 ++++
 4 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 15ae49862b82..5b9d22338606 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -24,6 +24,9 @@ struct seq_file;
 struct btf;
 struct btf_type;
 
+extern struct idr btf_idr;
+extern spinlock_t btf_idr_lock;
+
 /* map is generic key/value storage optionally accesible by eBPF programs */
 struct bpf_map_ops {
 	/* funcs callable from userspace (via syscall) */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0ef594ac3899..8aa6126f0b6e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -106,6 +106,7 @@ enum bpf_cmd {
 	BPF_TASK_FD_QUERY,
 	BPF_MAP_LOOKUP_AND_DELETE_ELEM,
 	BPF_MAP_FREEZE,
+	BPF_BTF_GET_NEXT_ID,
 };
 
 enum bpf_map_type {
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 5fcc7a17eb5a..e716a64b2f7f 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -195,8 +195,8 @@
 	     i < btf_type_vlen(struct_type);					\
 	     i++, member++)
 
-static DEFINE_IDR(btf_idr);
-static DEFINE_SPINLOCK(btf_idr_lock);
+DEFINE_IDR(btf_idr);
+DEFINE_SPINLOCK(btf_idr_lock);
 
 struct btf {
 	void *data;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index cf8052b016e7..c0f62fd67c6b 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2884,6 +2884,10 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 		err = bpf_obj_get_next_id(&attr, uattr,
 					  &map_idr, &map_idr_lock);
 		break;
+	case BPF_BTF_GET_NEXT_ID:
+		err = bpf_obj_get_next_id(&attr, uattr,
+					  &btf_idr, &btf_idr_lock);
+		break;
 	case BPF_PROG_GET_FD_BY_ID:
 		err = bpf_prog_get_fd_by_id(&attr);
 		break;
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf-next v2 2/5] tools: bpf: synchronise BPF UAPI header with tools
From: Quentin Monnet @ 2019-08-20  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet
In-Reply-To: <20190820093154.14042-1-quentin.monnet@netronome.com>

Synchronise the bpf.h header under tools, to report the addition of the
new BPF_BTF_GET_NEXT_ID syscall command for bpf().

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 tools/include/uapi/linux/bpf.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 0ef594ac3899..8aa6126f0b6e 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -106,6 +106,7 @@ enum bpf_cmd {
 	BPF_TASK_FD_QUERY,
 	BPF_MAP_LOOKUP_AND_DELETE_ELEM,
 	BPF_MAP_FREEZE,
+	BPF_BTF_GET_NEXT_ID,
 };
 
 enum bpf_map_type {
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf-next v2 3/5] libbpf: refactor bpf_*_get_next_id() functions
From: Quentin Monnet @ 2019-08-20  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet
In-Reply-To: <20190820093154.14042-1-quentin.monnet@netronome.com>

In preparation for the introduction of a similar function for retrieving
the id of the next BTF object, consolidate the code from
bpf_prog_get_next_id() and bpf_map_get_next_id() in libbpf.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 tools/lib/bpf/bpf.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index c7d7993c44bb..1439e99c9be5 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -568,7 +568,7 @@ int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr)
 	return ret;
 }
 
-int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id)
+static int bpf_obj_get_next_id(__u32 start_id, __u32 *next_id, int cmd)
 {
 	union bpf_attr attr;
 	int err;
@@ -576,26 +576,21 @@ int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id)
 	memset(&attr, 0, sizeof(attr));
 	attr.start_id = start_id;
 
-	err = sys_bpf(BPF_PROG_GET_NEXT_ID, &attr, sizeof(attr));
+	err = sys_bpf(cmd, &attr, sizeof(attr));
 	if (!err)
 		*next_id = attr.next_id;
 
 	return err;
 }
 
-int bpf_map_get_next_id(__u32 start_id, __u32 *next_id)
+int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id)
 {
-	union bpf_attr attr;
-	int err;
-
-	memset(&attr, 0, sizeof(attr));
-	attr.start_id = start_id;
-
-	err = sys_bpf(BPF_MAP_GET_NEXT_ID, &attr, sizeof(attr));
-	if (!err)
-		*next_id = attr.next_id;
+	return bpf_obj_get_next_id(start_id, next_id, BPF_PROG_GET_NEXT_ID);
+}
 
-	return err;
+int bpf_map_get_next_id(__u32 start_id, __u32 *next_id)
+{
+	return bpf_obj_get_next_id(start_id, next_id, BPF_MAP_GET_NEXT_ID);
 }
 
 int bpf_prog_get_fd_by_id(__u32 id)
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf-next v2 4/5] libbpf: add bpf_btf_get_next_id() to cycle through BTF objects
From: Quentin Monnet @ 2019-08-20  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet
In-Reply-To: <20190820093154.14042-1-quentin.monnet@netronome.com>

Add an API function taking a BTF object id and providing the id of the
next BTF object in the kernel. This can be used to list all BTF objects
loaded on the system.

v2:
- Rebase on top of Andrii's changes regarding libbpf versioning.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 tools/lib/bpf/bpf.c      | 5 +++++
 tools/lib/bpf/bpf.h      | 1 +
 tools/lib/bpf/libbpf.map | 2 ++
 3 files changed, 8 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 1439e99c9be5..cbb933532981 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -593,6 +593,11 @@ int bpf_map_get_next_id(__u32 start_id, __u32 *next_id)
 	return bpf_obj_get_next_id(start_id, next_id, BPF_MAP_GET_NEXT_ID);
 }
 
+int bpf_btf_get_next_id(__u32 start_id, __u32 *next_id)
+{
+	return bpf_obj_get_next_id(start_id, next_id, BPF_BTF_GET_NEXT_ID);
+}
+
 int bpf_prog_get_fd_by_id(__u32 id)
 {
 	union bpf_attr attr;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index ff42ca043dc8..0db01334740f 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -156,6 +156,7 @@ LIBBPF_API int bpf_prog_test_run(int prog_fd, int repeat, void *data,
 				 __u32 *retval, __u32 *duration);
 LIBBPF_API int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id);
 LIBBPF_API int bpf_map_get_next_id(__u32 start_id, __u32 *next_id);
+LIBBPF_API int bpf_btf_get_next_id(__u32 start_id, __u32 *next_id);
 LIBBPF_API int bpf_prog_get_fd_by_id(__u32 id);
 LIBBPF_API int bpf_map_get_fd_by_id(__u32 id);
 LIBBPF_API int bpf_btf_get_fd_by_id(__u32 id);
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 4e72df8e98ba..664ce8e7a60e 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -186,4 +186,6 @@ LIBBPF_0.0.4 {
 } LIBBPF_0.0.3;
 
 LIBBPF_0.0.5 {
+	global:
+		bpf_btf_get_next_id;
 } LIBBPF_0.0.4;
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf-next v2 5/5] tools: bpftool: implement "bpftool btf show|list"
From: Quentin Monnet @ 2019-08-20  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet
In-Reply-To: <20190820093154.14042-1-quentin.monnet@netronome.com>

Add a "btf list" (alias: "btf show") subcommand to bpftool in order to
dump all BTF objects loaded on a system.

When running the command, hash tables are built in bpftool to retrieve
all the associations between BTF objects and BPF maps and programs. This
allows for printing all such associations when listing the BTF objects.

The command is added at the top of the subcommands for "bpftool btf", so
that typing only "bpftool btf" also comes down to listing the programs.
We could not have this with the previous command ("dump"), which
required a BTF object id, so it should not break any previous behaviour.
This also makes the "btf" command behaviour consistent with "prog" or
"map".

Bash completion is updated to use "bpftool btf" instead of "bpftool
prog" to list the BTF ids, as it looks more consistent.

Example output (plain):

    # bpftool btf show
    9: size 2989B  prog_ids 21  map_ids 15
    17: size 2847B  prog_ids 36  map_ids 30,29,28
    26: size 2847B

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../bpf/bpftool/Documentation/bpftool-btf.rst |   7 +
 tools/bpf/bpftool/bash-completion/bpftool     |  20 +-
 tools/bpf/bpftool/btf.c                       | 342 +++++++++++++++++-
 3 files changed, 363 insertions(+), 6 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-btf.rst b/tools/bpf/bpftool/Documentation/bpftool-btf.rst
index 6694a0fc8f99..39615f8e145b 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-btf.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-btf.rst
@@ -19,6 +19,7 @@ SYNOPSIS
 BTF COMMANDS
 =============
 
+|	**bpftool** **btf** { **show** | **list** } [**id** *BTF_ID*]
 |	**bpftool** **btf dump** *BTF_SRC* [**format** *FORMAT*]
 |	**bpftool** **btf help**
 |
@@ -29,6 +30,12 @@ BTF COMMANDS
 
 DESCRIPTION
 ===========
+	**bpftool btf { show | list }** [**id** *BTF_ID*]
+		  Show information about loaded BTF objects. If a BTF ID is
+		  specified, show information only about given BTF object,
+		  otherwise list all BTF objects currently loaded on the
+		  system.
+
 	**bpftool btf dump** *BTF_SRC*
 		  Dump BTF entries from a given *BTF_SRC*.
 
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 4549fd424069..2ffd351f9dbf 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -73,8 +73,8 @@ _bpftool_get_prog_tags()
 
 _bpftool_get_btf_ids()
 {
-    COMPREPLY+=( $( compgen -W "$( bpftool -jp prog 2>&1 | \
-        command sed -n 's/.*"btf_id": \(.*\),\?$/\1/p' )" -- "$cur" ) )
+    COMPREPLY+=( $( compgen -W "$( bpftool -jp btf 2>&1 | \
+        command sed -n 's/.*"id": \(.*\),$/\1/p' )" -- "$cur" ) )
 }
 
 _bpftool_get_obj_map_names()
@@ -670,7 +670,7 @@ _bpftool()
                                 map)
                                     _bpftool_get_map_ids
                                     ;;
-                                dump)
+                                $command)
                                     _bpftool_get_btf_ids
                                     ;;
                             esac
@@ -698,9 +698,21 @@ _bpftool()
                             ;;
                     esac
                     ;;
+                show|list)
+                    case $prev in
+                        $command)
+                            COMPREPLY+=( $( compgen -W "id" -- "$cur" ) )
+                            ;;
+                        id)
+                            _bpftool_get_btf_ids
+                            ;;
+                    esac
+                    return 0
+                    ;;
                 *)
                     [[ $prev == $object ]] && \
-                        COMPREPLY=( $( compgen -W 'dump help' -- "$cur" ) )
+                        COMPREPLY=( $( compgen -W 'dump help show list' \
+                            -- "$cur" ) )
                     ;;
             esac
             ;;
diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c
index 8805637f1a7e..9a9376d1d3df 100644
--- a/tools/bpf/bpftool/btf.c
+++ b/tools/bpf/bpftool/btf.c
@@ -11,6 +11,7 @@
 #include <bpf.h>
 #include <libbpf.h>
 #include <linux/btf.h>
+#include <linux/hashtable.h>
 
 #include "btf.h"
 #include "json_writer.h"
@@ -35,6 +36,16 @@ static const char * const btf_kind_str[NR_BTF_KINDS] = {
 	[BTF_KIND_DATASEC]	= "DATASEC",
 };
 
+struct btf_attach_table {
+	DECLARE_HASHTABLE(table, 16);
+};
+
+struct btf_attach_point {
+	__u32 obj_id;
+	__u32 btf_id;
+	struct hlist_node hash;
+};
+
 static const char *btf_int_enc_str(__u8 encoding)
 {
 	switch (encoding) {
@@ -522,6 +533,330 @@ static int do_dump(int argc, char **argv)
 	return err;
 }
 
+static int btf_parse_fd(int *argc, char ***argv)
+{
+	unsigned int id;
+	char *endptr;
+	int fd;
+
+	if (!is_prefix(*argv[0], "id")) {
+		p_err("expected 'id', got: '%s'?", **argv);
+		return -1;
+	}
+	NEXT_ARGP();
+
+	id = strtoul(**argv, &endptr, 0);
+	if (*endptr) {
+		p_err("can't parse %s as ID", **argv);
+		return -1;
+	}
+	NEXT_ARGP();
+
+	fd = bpf_btf_get_fd_by_id(id);
+	if (fd < 0)
+		p_err("can't get BTF object by id (%u): %s",
+		      id, strerror(errno));
+
+	return fd;
+}
+
+static void delete_btf_table(struct btf_attach_table *tab)
+{
+	struct btf_attach_point *obj;
+	struct hlist_node *tmp;
+
+	unsigned int bkt;
+
+	hash_for_each_safe(tab->table, bkt, tmp, obj, hash) {
+		hash_del(&obj->hash);
+		free(obj);
+	}
+}
+
+static int
+build_btf_type_table(struct btf_attach_table *tab, enum bpf_obj_type type,
+		     void *info, __u32 *len)
+{
+	static const char * const names[] = {
+		[BPF_OBJ_UNKNOWN]	= "unknown",
+		[BPF_OBJ_PROG]		= "prog",
+		[BPF_OBJ_MAP]		= "map",
+	};
+	struct btf_attach_point *obj_node;
+	__u32 btf_id, id = 0;
+	int err;
+	int fd;
+
+	while (true) {
+		switch (type) {
+		case BPF_OBJ_PROG:
+			err = bpf_prog_get_next_id(id, &id);
+			break;
+		case BPF_OBJ_MAP:
+			err = bpf_map_get_next_id(id, &id);
+			break;
+		default:
+			err = -1;
+			p_err("unexpected object type: %d", type);
+			goto err_free;
+		}
+		if (err) {
+			if (errno == ENOENT) {
+				err = 0;
+				break;
+			}
+			p_err("can't get next %s: %s%s", names[type],
+			      strerror(errno),
+			      errno == EINVAL ? " -- kernel too old?" : "");
+			goto err_free;
+		}
+
+		switch (type) {
+		case BPF_OBJ_PROG:
+			fd = bpf_prog_get_fd_by_id(id);
+			break;
+		case BPF_OBJ_MAP:
+			fd = bpf_map_get_fd_by_id(id);
+			break;
+		default:
+			err = -1;
+			p_err("unexpected object type: %d", type);
+			goto err_free;
+		}
+		if (fd < 0) {
+			if (errno == ENOENT)
+				continue;
+			p_err("can't get %s by id (%u): %s", names[type], id,
+			      strerror(errno));
+			err = -1;
+			goto err_free;
+		}
+
+		memset(info, 0, *len);
+		err = bpf_obj_get_info_by_fd(fd, info, len);
+		close(fd);
+		if (err) {
+			p_err("can't get %s info: %s", names[type],
+			      strerror(errno));
+			goto err_free;
+		}
+
+		switch (type) {
+		case BPF_OBJ_PROG:
+			btf_id = ((struct bpf_prog_info *)info)->btf_id;
+			break;
+		case BPF_OBJ_MAP:
+			btf_id = ((struct bpf_map_info *)info)->btf_id;
+			break;
+		default:
+			err = -1;
+			p_err("unexpected object type: %d", type);
+			goto err_free;
+		}
+		if (!btf_id)
+			continue;
+
+		obj_node = calloc(1, sizeof(*obj_node));
+		if (!obj_node) {
+			p_err("failed to allocate memory: %s", strerror(errno));
+			goto err_free;
+		}
+
+		obj_node->obj_id = id;
+		obj_node->btf_id = btf_id;
+		hash_add(tab->table, &obj_node->hash, obj_node->btf_id);
+	}
+
+	return 0;
+
+err_free:
+	delete_btf_table(tab);
+	return err;
+}
+
+static int
+build_btf_tables(struct btf_attach_table *btf_prog_table,
+		 struct btf_attach_table *btf_map_table)
+{
+	struct bpf_prog_info prog_info;
+	__u32 prog_len = sizeof(prog_info);
+	struct bpf_map_info map_info;
+	__u32 map_len = sizeof(map_info);
+	int err = 0;
+
+	err = build_btf_type_table(btf_prog_table, BPF_OBJ_PROG, &prog_info,
+				   &prog_len);
+	if (err)
+		return err;
+
+	err = build_btf_type_table(btf_map_table, BPF_OBJ_MAP, &map_info,
+				   &map_len);
+	if (err) {
+		delete_btf_table(btf_prog_table);
+		return err;
+	}
+
+	return 0;
+}
+
+static void
+show_btf_plain(struct bpf_btf_info *info, int fd,
+	       struct btf_attach_table *btf_prog_table,
+	       struct btf_attach_table *btf_map_table)
+{
+	struct btf_attach_point *obj;
+	int n;
+
+	printf("%u: ", info->id);
+	printf("size %uB", info->btf_size);
+
+	n = 0;
+	hash_for_each_possible(btf_prog_table->table, obj, hash, info->id) {
+		if (obj->btf_id == info->id)
+			printf("%s%u", n++ == 0 ? "  prog_ids " : ",",
+			       obj->obj_id);
+	}
+
+	n = 0;
+	hash_for_each_possible(btf_map_table->table, obj, hash, info->id) {
+		if (obj->btf_id == info->id)
+			printf("%s%u", n++ == 0 ? "  map_ids " : ",",
+			       obj->obj_id);
+	}
+
+	printf("\n");
+}
+
+static void
+show_btf_json(struct bpf_btf_info *info, int fd,
+	      struct btf_attach_table *btf_prog_table,
+	      struct btf_attach_table *btf_map_table)
+{
+	struct btf_attach_point *obj;
+
+	jsonw_start_object(json_wtr);	/* btf object */
+	jsonw_uint_field(json_wtr, "id", info->id);
+	jsonw_uint_field(json_wtr, "size", info->btf_size);
+
+	jsonw_name(json_wtr, "prog_ids");
+	jsonw_start_array(json_wtr);	/* prog_ids */
+	hash_for_each_possible(btf_prog_table->table, obj, hash,
+			       info->id) {
+		if (obj->btf_id == info->id)
+			jsonw_uint(json_wtr, obj->obj_id);
+	}
+	jsonw_end_array(json_wtr);	/* prog_ids */
+
+	jsonw_name(json_wtr, "map_ids");
+	jsonw_start_array(json_wtr);	/* map_ids */
+	hash_for_each_possible(btf_map_table->table, obj, hash,
+			       info->id) {
+		if (obj->btf_id == info->id)
+			jsonw_uint(json_wtr, obj->obj_id);
+	}
+	jsonw_end_array(json_wtr);	/* map_ids */
+	jsonw_end_object(json_wtr);	/* btf object */
+}
+
+static int
+show_btf(int fd, struct btf_attach_table *btf_prog_table,
+	 struct btf_attach_table *btf_map_table)
+{
+	struct bpf_btf_info info = {};
+	__u32 len = sizeof(info);
+	int err;
+
+	err = bpf_obj_get_info_by_fd(fd, &info, &len);
+	if (err) {
+		p_err("can't get BTF object info: %s", strerror(errno));
+		return -1;
+	}
+
+	if (json_output)
+		show_btf_json(&info, fd, btf_prog_table, btf_map_table);
+	else
+		show_btf_plain(&info, fd, btf_prog_table, btf_map_table);
+
+	return 0;
+}
+
+static int do_show(int argc, char **argv)
+{
+	struct btf_attach_table btf_prog_table;
+	struct btf_attach_table btf_map_table;
+	int err, fd = -1;
+	__u32 id = 0;
+
+	if (argc == 2) {
+		fd = btf_parse_fd(&argc, &argv);
+		if (fd < 0)
+			return -1;
+	}
+
+	if (argc) {
+		if (fd >= 0)
+			close(fd);
+		return BAD_ARG();
+	}
+
+	hash_init(btf_prog_table.table);
+	hash_init(btf_map_table.table);
+	err = build_btf_tables(&btf_prog_table, &btf_map_table);
+	if (err) {
+		if (fd >= 0)
+			close(fd);
+		return err;
+	}
+
+	if (fd >= 0) {
+		err = show_btf(fd, &btf_prog_table, &btf_map_table);
+		close(fd);
+		goto exit_free;
+	}
+
+	if (json_output)
+		jsonw_start_array(json_wtr);	/* root array */
+
+	while (true) {
+		err = bpf_btf_get_next_id(id, &id);
+		if (err) {
+			if (errno == ENOENT) {
+				err = 0;
+				break;
+			}
+			p_err("can't get next BTF object: %s%s",
+			      strerror(errno),
+			      errno == EINVAL ? " -- kernel too old?" : "");
+			err = -1;
+			break;
+		}
+
+		fd = bpf_btf_get_fd_by_id(id);
+		if (fd < 0) {
+			if (errno == ENOENT)
+				continue;
+			p_err("can't get BTF object by id (%u): %s",
+			      id, strerror(errno));
+			err = -1;
+			break;
+		}
+
+		err = show_btf(fd, &btf_prog_table, &btf_map_table);
+		close(fd);
+		if (err)
+			break;
+	}
+
+	if (json_output)
+		jsonw_end_array(json_wtr);	/* root array */
+
+exit_free:
+	delete_btf_table(&btf_prog_table);
+	delete_btf_table(&btf_map_table);
+
+	return err;
+}
+
 static int do_help(int argc, char **argv)
 {
 	if (json_output) {
@@ -530,7 +865,8 @@ static int do_help(int argc, char **argv)
 	}
 
 	fprintf(stderr,
-		"Usage: %s btf dump BTF_SRC [format FORMAT]\n"
+		"Usage: %s btf { show | list } [id BTF_ID]\n"
+		"       %s btf dump BTF_SRC [format FORMAT]\n"
 		"       %s btf help\n"
 		"\n"
 		"       BTF_SRC := { id BTF_ID | prog PROG | map MAP [{key | value | kv | all}] | file FILE }\n"
@@ -539,12 +875,14 @@ static int do_help(int argc, char **argv)
 		"       " HELP_SPEC_PROGRAM "\n"
 		"       " HELP_SPEC_OPTIONS "\n"
 		"",
-		bin_name, bin_name);
+		bin_name, bin_name, bin_name);
 
 	return 0;
 }
 
 static const struct cmd cmds[] = {
+	{ "show",	do_show },
+	{ "list",	do_show },
 	{ "help",	do_help },
 	{ "dump",	do_dump },
 	{ 0 }
-- 
2.17.1


^ permalink raw reply related

* Re: Help needed - Kernel lockup while running ipsec
From: Florian Westphal @ 2019-08-20  9:38 UTC (permalink / raw)
  To: Vakul Garg; +Cc: Florian Westphal, netdev@vger.kernel.org
In-Reply-To: <DB7PR04MB4620487074796FBC015AFD098BAB0@DB7PR04MB4620.eurprd04.prod.outlook.com>

Vakul Garg <vakul.garg@nxp.com> wrote:
> 
> 
> > -----Original Message-----
> > From: Florian Westphal <fw@strlen.de>
> > Sent: Tuesday, August 20, 2019 2:53 PM
> > To: Vakul Garg <vakul.garg@nxp.com>
> > Cc: Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
> > Subject: Re: Help needed - Kernel lockup while running ipsec
> > 
> > Vakul Garg <vakul.garg@nxp.com> wrote:
> > > > > With kernel 4.14.122, I am getting a kernel softlockup while
> > > > > running single
> > > > static ipsec tunnel.
> > > > > The problem reproduces mostly after running 8-10 hours of ipsec
> > > > > encap
> > > > test (on my dual core arm board).
> > > > >
> > > > > I found that in function xfrm_policy_lookup_bytype(), the policy
> > > > > in variable
> > > > 'ret' shows refcnt=0 under problem situation.
> > > > > This creates an infinite loop in  xfrm_policy_lookup_bytype() and
> > > > > hence the
> > > > lockup.
> > > > >
> > > > > Can some body please provide me pointers about 'refcnt'?
> > > > > Is it legitimate for 'refcnt' to become '0'? Under what condition
> > > > > can it
> > > > become '0'?
> > > >
> > > > Yes, when policy is destroyed and the last user calls
> > > > xfrm_pol_put() which will invoke call_rcu to free the structure.
> > >
> > > It seems that policy reference count never gets decremented during packet
> > ipsec encap.
> > > It is getting incremented for every frame that hits the policy.
> > > In setkey -DP output, I see refcnt to be wrapping around after '0'.
> > 
> > Thats a bug.  Does this affect 4.14 only or does this happen on current tree
> > as well?
>  
> I am yet to try it on 4.19.
> Can you help me with the right fix? Which part of code should it get decremented?
> I am not conversant with xfrm code.

Normally policy reference counts get decremented when the skb is free'd, via dst
destruction (xfrm_dst_destroy()).

Do you see a dst leak as well?


^ permalink raw reply

* Re: [PATCH -next] bpf: Use PTR_ERR_OR_ZERO in xsk_map_inc()
From: Dan Carpenter @ 2019-08-20  9:44 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Björn Töpel, YueHaibing, Karlsson, Magnus,
	Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, Netdev,
	bpf, kernel-janitors
In-Reply-To: <CAJ+HfNhRf+=yN6eOOZ1zp8=VicT-k6nHLO6r+f__O5X3M+N=ug@mail.gmail.com>

On Tue, Aug 20, 2019 at 11:25:29AM +0200, Björn Töpel wrote:
> On Tue, 20 Aug 2019 at 10:59, Dan Carpenter <dan.carpenter@oracle.com> wrote:
> >
> > On Tue, Aug 20, 2019 at 09:28:26AM +0200, Björn Töpel wrote:
> > > For future patches: Prefix AF_XDP socket work with "xsk:" and use "PATCH
> > > bpf-next" to let the developers know what tree you're aiming for.
> >
> > There are over 300 trees in linux-next.  It impossible to try remember
> > everyone's trees.  No one else has this requirement.
> >
> 
> Net/bpf are different, and I wanted to point that out to lessen the
> burden for the maintainers. It's documented in:
> 
> Documentation/bpf/bpf_devel_QA.rst.
> Documentation/networking/netdev-FAQ.rst

Ah...  I hadn't realized that BPF patches were confusing to Dave.

I actually do keep track of net and net-next.  I do quite a bit of extra
stuff for netdev patches.  So what about if we used [PATCH] for bpf and
[PATCH net] and [PATCH net-next] for networking?

I will do that.

regards,
dan carpenter

^ permalink raw reply

* Re: [PATCH net-next v3 2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts
From: Miroslav Lichvar @ 2019-08-20  9:49 UTC (permalink / raw)
  To: Hubert Feurstein
  Cc: netdev, linux-kernel, Andrew Lunn, Richard Cochran,
	Florian Fainelli, Heiner Kallweit, Vladimir Oltean,
	David S. Miller
In-Reply-To: <20190820084833.6019-3-hubert.feurstein@vahle.at>

On Tue, Aug 20, 2019 at 10:48:31AM +0200, Hubert Feurstein wrote:

> +	/* PTP offset compensation:
> +	 * After the MDIO access is completed (from the chip perspective), the
> +	 * switch chip will snapshot the PHC timestamp. To make sure our system
> +	 * timestamp corresponds to the PHC timestamp, we have to add the
> +	 * duration of this MDIO access to sts->post_ts. Linuxptp's phc2sys
> +	 * takes the average of pre_ts and post_ts to calculate the final
> +	 * system timestamp. With this in mind, we have to add ptp_sts_offset
> +	 * twice to post_ts, in order to not introduce an constant time offset.
> +	 */
> +	if (sts)
> +		timespec64_add_ns(&sts->post_ts, 2 * bus->ptp_sts_offset);

This correction looks good to me.

Is the MDIO write delay constant in reality, or does it at least have
an upper bound? That is, is it always true that the post_ts timestamp
does not point to a time before the PHC timestamp was actually taken?

This is important to not break the estimation of maximum error in the
measured offset. Applications using the ioctl may assume that the
maximum error is (post_ts-pre_ts)/2 (i.e. half of the delay printed by
phc2sys). That would not work if the delay could be occasionally 50
microseconds for instance, i.e. the post_ts timestamp would be earlier
than the PHC timestamp.

-- 
Miroslav Lichvar

^ permalink raw reply

* Re: [PATCH] can: flexcan: free error skb if enqueueing failed
From: Sean Nyekjaer @ 2019-08-20  9:49 UTC (permalink / raw)
  To: Martin Hundebøll, Wolfgang Grandegger, Marc Kleine-Budde,
	linux-can
  Cc: David S . Miller, netdev, Joakim Zhang
In-Reply-To: <20190715185308.104333-1-martin@geanix.com>

CC'ing Joakim Zhang

On 15/07/2019 20.53, Martin Hundebøll wrote:
> If the call to can_rx_offload_queue_sorted() fails, the passed skb isn't
> consumed, so the caller must do so.
> 
> Fixes: 30164759db1b ("can: flexcan: make use of rx-offload's irq_offload_fifo")
> Signed-off-by: Martin Hundebøll <martin@geanix.com>
> ---
>   drivers/net/can/flexcan.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
> index 1c66fb2ad76b..21f39e805d42 100644
> --- a/drivers/net/can/flexcan.c
> +++ b/drivers/net/can/flexcan.c
> @@ -688,7 +688,8 @@ static void flexcan_irq_bus_err(struct net_device *dev, u32 reg_esr)
>   	if (tx_errors)
>   		dev->stats.tx_errors++;
>   
> -	can_rx_offload_queue_sorted(&priv->offload, skb, timestamp);
> +	if (can_rx_offload_queue_sorted(&priv->offload, skb, timestamp))
> +		kfree_skb(skb);
>   }
>   
>   static void flexcan_irq_state(struct net_device *dev, u32 reg_esr)
> @@ -732,7 +733,8 @@ static void flexcan_irq_state(struct net_device *dev, u32 reg_esr)
>   	if (unlikely(new_state == CAN_STATE_BUS_OFF))
>   		can_bus_off(dev);
>   
> -	can_rx_offload_queue_sorted(&priv->offload, skb, timestamp);
> +	if (can_rx_offload_queue_sorted(&priv->offload, skb, timestamp))
> +		kfree_skb(skb);
>   }
>   
>   static inline struct flexcan_priv *rx_offload_to_priv(struct can_rx_offload *offload)
> 

^ permalink raw reply

* [PATCH bpf-next] bpf: add BTF ids in procfs for file descriptors to BTF objects
From: Quentin Monnet @ 2019-08-20  9:52 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet

Implement the show_fdinfo hook for BTF FDs file operations, and make it
print the id and the size of the BTF object. This allows for a quick
retrieval of the BTF id from its FD; or it can help understanding what
type of object (BTF) the file descriptor points to.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 kernel/bpf/btf.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 5fcc7a17eb5a..39e184f1b27c 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3376,6 +3376,19 @@ void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
 	btf_type_ops(t)->seq_show(btf, t, type_id, obj, 0, m);
 }
 
+#ifdef CONFIG_PROC_FS
+static void bpf_btf_show_fdinfo(struct seq_file *m, struct file *filp)
+{
+	const struct btf *btf = filp->private_data;
+
+	seq_printf(m,
+		   "btf_id:\t%u\n"
+		   "data_size:\t%u\n",
+		   btf->id,
+		   btf->data_size);
+}
+#endif
+
 static int btf_release(struct inode *inode, struct file *filp)
 {
 	btf_put(filp->private_data);
@@ -3383,6 +3396,9 @@ static int btf_release(struct inode *inode, struct file *filp)
 }
 
 const struct file_operations btf_fops = {
+#ifdef CONFIG_PROC_FS
+	.show_fdinfo	= bpf_btf_show_fdinfo,
+#endif
 	.release	= btf_release,
 };
 
-- 
2.17.1


^ permalink raw reply related

* RE: Help needed - Kernel lockup while running ipsec
From: Vakul Garg @ 2019-08-20  9:52 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev@vger.kernel.org
In-Reply-To: <20190820093800.GN2588@breakpoint.cc>



> -----Original Message-----
> From: Florian Westphal <fw@strlen.de>
> Sent: Tuesday, August 20, 2019 3:08 PM
> To: Vakul Garg <vakul.garg@nxp.com>
> Cc: Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
> Subject: Re: Help needed - Kernel lockup while running ipsec
> 
> Vakul Garg <vakul.garg@nxp.com> wrote:
> >
> >
> > > -----Original Message-----
> > > From: Florian Westphal <fw@strlen.de>
> > > Sent: Tuesday, August 20, 2019 2:53 PM
> > > To: Vakul Garg <vakul.garg@nxp.com>
> > > Cc: Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
> > > Subject: Re: Help needed - Kernel lockup while running ipsec
> > >
> > > Vakul Garg <vakul.garg@nxp.com> wrote:
> > > > > > With kernel 4.14.122, I am getting a kernel softlockup while
> > > > > > running single
> > > > > static ipsec tunnel.
> > > > > > The problem reproduces mostly after running 8-10 hours of
> > > > > > ipsec encap
> > > > > test (on my dual core arm board).
> > > > > >
> > > > > > I found that in function xfrm_policy_lookup_bytype(), the
> > > > > > policy in variable
> > > > > 'ret' shows refcnt=0 under problem situation.
> > > > > > This creates an infinite loop in  xfrm_policy_lookup_bytype()
> > > > > > and hence the
> > > > > lockup.
> > > > > >
> > > > > > Can some body please provide me pointers about 'refcnt'?
> > > > > > Is it legitimate for 'refcnt' to become '0'? Under what
> > > > > > condition can it
> > > > > become '0'?
> > > > >
> > > > > Yes, when policy is destroyed and the last user calls
> > > > > xfrm_pol_put() which will invoke call_rcu to free the structure.
> > > >
> > > > It seems that policy reference count never gets decremented during
> > > > packet
> > > ipsec encap.
> > > > It is getting incremented for every frame that hits the policy.
> > > > In setkey -DP output, I see refcnt to be wrapping around after '0'.
> > >
> > > Thats a bug.  Does this affect 4.14 only or does this happen on
> > > current tree as well?
> >
> > I am yet to try it on 4.19.
> > Can you help me with the right fix? Which part of code should it get
> decremented?
> > I am not conversant with xfrm code.
> 
> Normally policy reference counts get decremented when the skb is free'd, via
> dst destruction (xfrm_dst_destroy()).
> 
> Do you see a dst leak as well?

Can you please guide me how to detect it?

(I am checking refcount on recent kernel and will let you know.)

^ permalink raw reply

* Re: [PATCH net-next 3/6] net: dsa: Delete the VID from the upstream port as well
From: Vladimir Oltean @ 2019-08-20  9:54 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: Florian Fainelli, Andrew Lunn, Ido Schimmel, Roopa Prabhu,
	nikolay, David S. Miller, netdev
In-Reply-To: <20190820015138.GB975@t480s.localdomain>

Hi Vivien!

On Tue, 20 Aug 2019 at 08:51, Vivien Didelot <vivien.didelot@gmail.com> wrote:
>
> Vladimir,
>
> On Tue, 20 Aug 2019 02:59:59 +0300, Vladimir Oltean <olteanv@gmail.com> wrote:
> > Commit b2f81d304cee ("net: dsa: add CPU and DSA ports as VLAN members")
> > is littering a lot. After deleting a VLAN added on a DSA port, it still
> > remains installed in the hardware filter of the upstream port. Fix this.
>
> Littering a lot, really?
>
> FYI we are not removing the target VLAN from the hardware yet because it would
> be too expensive to cache data in DSA core in order to know if the VID is not
> used by any other slave port of the fabric anymore, and thus safe to remove.
>
> Keeping the VID programmed for DSA and CPU ports is simpler for the moment,
> as an hardware VLAN with only these ports as members is unlikely to harm.
>
> >
> > Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
> > ---
> >  net/dsa/switch.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/net/dsa/switch.c b/net/dsa/switch.c
> > index 09d9286b27cc..84ab2336131e 100644
> > --- a/net/dsa/switch.c
> > +++ b/net/dsa/switch.c
> > @@ -295,11 +295,20 @@ static int dsa_switch_vlan_del(struct dsa_switch *ds,
> >                              struct dsa_notifier_vlan_info *info)
> >  {
> >       const struct switchdev_obj_port_vlan *vlan = info->vlan;
> > +     int port;
> >
> >       if (!ds->ops->port_vlan_del)
> >               return -EOPNOTSUPP;
> >
> > +     /* Build a mask of VLAN members */
> > +     bitmap_zero(ds->bitmap, ds->num_ports);
> >       if (ds->index == info->sw_index)
> > +             set_bit(info->port, ds->bitmap);
> > +     for (port = 0; port < ds->num_ports; port++)
> > +             if (dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))
> > +                     set_bit(port, ds->bitmap);
> > +
> > +     for_each_set_bit(port, ds->bitmap, ds->num_ports)
> >               return ds->ops->port_vlan_del(ds, info->port, vlan);
>
> You return right away from the loop? You use info->port instead of port?
>
> >
> >       return 0;
>
> Even if you patch wasn't badly broken, "bridge vlan del" targeting a single
> switch port would also remove the VLAN from the CPU port and thus breaking
> offloaded 802.1q. It would also remove it from the DSA ports interconnecting
> multiple switches, thus breaking the 802.1q conduit for the whole fabric.
>
> So you're not fixing anything here, but you're breaking single-chip and
> cross-chip hardware VLAN. Seriously wtf is this patch?
>
> NAK!
>
>
>         Vivien

I can agree that this isn't one of my brightest moments. But at least
we get to see Cunningham's law in action :)
When dsa_8021q is cleaning up the switch's VLAN table for the bridge
to use it, it is good to really clean it up, i.e. not leave any VLAN
installed on the upstream ports.
But I think this is just an academical concern at this point. In
vlan_filtering mode, the CPU port will accept VLAN frames with the
dsa_8021q ID's, but they will eventually get dropped due to no
destination. The real breaker is the pvid change. If something like
patch 4/6 gets accepted I will drop this one.

Regards,
-Vladimir

^ permalink raw reply

* Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
From: Christophe de Dinechin @ 2019-08-20  9:58 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Alex Williamson, Jiri Pirko, David S . Miller, Kirti Wankhede,
	Cornelia Huck, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	cjia@nvidia.com, netdev@vger.kernel.org
In-Reply-To: <AM0PR05MB48668B6221E477A873688CDBD1AB0@AM0PR05MB4866.eurprd05.prod.outlook.com>


Parav Pandit writes:

> + Dave.
>
> Hi Jiri, Dave, Alex, Kirti, Cornelia,
>
> Please provide your feedback on it, how shall we proceed?
>
> Hence, I would like to discuss below options.
>
> Option-1: mdev index
> Introduce an optional mdev index/handle as u32 during mdev create time.
> User passes mdev index/handle as input.
>
> phys_port_name=mIndex=m%u
> mdev_index will be available in sysfs as mdev attribute for udev to name the mdev's netdev.
>
> example mdev create command:
> UUID=$(uuidgen)
> echo $UUID index=10 > /sys/class/net/ens2f0/mdev_supported_types/mlx5_core_mdev/create
> example netdevs:
> repnetdev=ens2f0_m10	/*ens2f0 is parent PF's netdevice */
> mdev_netdev=enm10
>
> Pros:
> 1. mdevctl and any other existing tools are unaffected.
> 2. netdev stack, ovs and other switching platforms are unaffected.
> 3. achieves unique phys_port_name for representor netdev
> 4. achieves unique mdev eth netdev name for the mdev using udev/systemd extension.
> 5. Aligns well with mdev and netdev subsystem and similar to existing sriov bdf's.
>
> Option-2: shorter mdev name
> Extend mdev to have shorter mdev device name in addition to UUID.
> such as 'foo', 'bar'.
> Mdev will continue to have UUID.
> phys_port_name=mdev_name
>
> Pros:
> 1. All same as option-1, except mdevctl needs upgrade for newer usage.
> It is common practice to upgrade iproute2 package along with the kernel.
> Similar practice to be done with mdevctl.
> 2. Newer users of mdevctl who wants to work with non_UUID names, will use newer mdevctl/tools.
> Cons:
> 1. Dual naming scheme of mdev might affect some of the existing tools.
> It's unclear how/if it actually affects.
> mdevctl [2] is very recently developed and can be enhanced for dual naming scheme.
>
> Option-3: mdev uuid alias
> Instead of shorter mdev name or mdev index, have alpha-numeric name alias.
> Alias is an optional mdev sysfs attribute such as 'foo', 'bar'.
> example mdev create command:
> UUID=$(uuidgen)
> echo $UUID alias=foo > /sys/class/net/ens2f0/mdev_supported_types/mlx5_core_mdev/create
> example netdevs:
> examle netdevs:
> repnetdev = ens2f0_mfoo
> mdev_netdev=enmfoo
>
> Pros:
> 1. All same as option-1.
> 2. Doesn't affect existing mdev naming scheme.
> Cons:
> 1. Index scheme of option-1 is better which can number large number of mdevs with fewer characters, simplifying the management tool.

I believe that Alex pointed out another "Cons" to all three options,
which is that it forces user-space to resolve potential race conditions
when creating an index or short name or alias.

Also, what happens if `index=10` is not provided on the command-line?
Does that make the device unusable for your purpose?


--
Cheers,
Christophe de Dinechin (IRC c3d)

^ permalink raw reply

* Re: [PATCH] ipvs: change type of delta and previous_delta in ip_vs_seq.
From: Julian Anastasov @ 2019-08-20 10:00 UTC (permalink / raw)
  To: zhang kai; +Cc: Wensong Zhang, Simon Horman, lvs-devel, netdev
In-Reply-To: <20190820003718.GA16620@toolchain>


	Hello,

	Cc list trimmed...

On Tue, 20 Aug 2019, zhang kai wrote:

> In NAT forwarding mode, Applications may decrease the size of packets,
> and TCP sequences will get smaller, so both of variables will be negetive
> values in this case.

	As long as nobody cares about their sign, the type should not
matter. You can not solve all signed/unsigned mismatches with such
small patch. Or you are seeing some problem, may be in debug?

> 
> Signed-off-by: zhang kai <zhangkaiheb@126.com>
> ---
>  include/net/ip_vs.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
> index 3759167f91f5..de7e75063c7c 100644
> --- a/include/net/ip_vs.h
> +++ b/include/net/ip_vs.h
> @@ -346,8 +346,8 @@ enum ip_vs_sctp_states {
>   */
>  struct ip_vs_seq {
>  	__u32			init_seq;	/* Add delta from this seq */
> -	__u32			delta;		/* Delta in sequence numbers */
> -	__u32			previous_delta;	/* Delta in sequence numbers
> +	__s32			delta;		/* Delta in sequence numbers */
> +	__s32			previous_delta;	/* Delta in sequence numbers
>  						 * before last resized pkt */
>  };
>  
> -- 
> 2.17.1

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: [PATCH net-next v2 6/9] net: macsec: hardware offloading infrastructure
From: Antoine Tenart @ 2019-08-20 10:01 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: Igor Russkikh, Andrew Lunn, Antoine Tenart, davem@davemloft.net,
	f.fainelli@gmail.com, hkallweit1@gmail.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	thomas.petazzoni@bootlin.com, alexandre.belloni@bootlin.com,
	allan.nielsen@microchip.com, camelia.groza@nxp.com,
	Simon Edelhaus, Pavel Belous
In-Reply-To: <20190816132959.GC8697@bistromath.localdomain>

Hi Sabrina,

On Fri, Aug 16, 2019 at 03:29:59PM +0200, Sabrina Dubroca wrote:
> 2019-08-13, 16:18:40 +0000, Igor Russkikh wrote:
> > On 13.08.2019 16:17, Andrew Lunn wrote:
> 
> > That could be a strong limitation in
> > cases when user sees HW macsec offload is broken or work differently, and he/she
> > wants to replace it with SW one.
> 
> Agreed, I think an offload that cannot be disabled is quite problematic.
> 
> > MACSec is a complex feature, and it may happen something is missing in HW.
> > Trivial example is 256bit encryption, which is not always a musthave in HW
> > implementations.
> 
> +1
> 
> > 2) I think, Antoine, its not totally true that otherwise the user macsec API
> > will be broken/changed. netlink api is the same, the only thing we may want to
> > add is an optional parameter to force selection of SW macsec engine.
> 
> Yes, I think we need an offload on/off parameter (and IMO it should
> probably be off by default). Then, if offloading is requested but
> cannot be satisfied (unsupported key length, too many SAs, etc), or if
> incompatible settings are requested (mixing offloaded and
> non-offloaded SCs on a device that cannot do it), return an error.
> 
> If we also export that offload parameter during netlink dumps, we can
> inspect the state of the system, which helps for debugging.

So it seems the ability to enable or disable the offloading on a given
interface is the main missing feature. I'll add that, however I'll
probably (at least at first):

- Have the interface to be fully offloaded or fully handled in s/w (with
  errors being thrown if a given configuration isn't supported). Having
  both at the same time on a given interface would be tricky because of
  the MACsec validation parameter.

- Won't allow to enable/disable the offloading of there are rules in
  place, as we're not sure the same rules would be accepted by the other
  implementation.

I'm not sure if we should allow to mix the implementations on a given
physical interface (by having two MACsec interfaces attached) as the
validation would be impossible to do (we would have no idea if a
packet was correctly handled by the offloading part or just not being
a MACsec packet in the first place, in Rx).

I agree the offloading should be disabled by default, and only enabled
by an user explicitly.

Thanks,
Antoine

-- 
Antoine Ténart, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* Re: [PATCH net-next v2 6/9] net: macsec: hardware offloading infrastructure
From: Antoine Tenart @ 2019-08-20 10:03 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Igor Russkikh, Antoine Tenart, davem@davemloft.net,
	sd@queasysnail.net, f.fainelli@gmail.com, hkallweit1@gmail.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	thomas.petazzoni@bootlin.com, alexandre.belloni@bootlin.com,
	allan.nielsen@microchip.com, camelia.groza@nxp.com,
	Simon Edelhaus, Pavel Belous
In-Reply-To: <20190813162823.GH15047@lunn.ch>

Hi Andrew,

On Tue, Aug 13, 2019 at 06:28:23PM +0200, Andrew Lunn wrote:
> > 1) With current implementation it's impossible to install SW macsec engine onto
> > the device which supports HW offload. That could be a strong limitation in
> > cases when user sees HW macsec offload is broken or work differently, and he/she
> > wants to replace it with SW one.
> > MACSec is a complex feature, and it may happen something is missing in HW.
> > Trivial example is 256bit encryption, which is not always a musthave in HW
> > implementations.
> 
> It would also be nice to add extra information to the netlink API to
> indicate if HW or SW is being used. In other places where we offload
> to accelerators we have such additional information.

Agreed, in addition to being able to enable/disable the offloading we
should have a way to know if a MACsec interface is being offloaded or
not.

Thanks!
Antoine

-- 
Antoine Ténart, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* [PATCH bpf-next] xsk: proper socket state check in xsk_poll
From: Björn Töpel @ 2019-08-20 10:04 UTC (permalink / raw)
  To: syzbot+c82697e3043781e08802, ast, daniel, netdev
  Cc: bjorn.topel, bpf, davem, hawk, jakub.kicinski, john.fastabend,
	jonathan.lemon, kafai, linux-kernel, magnus.karlsson,
	songliubraving, syzkaller-bugs, xdp-newbies, yhs, hdanton
In-Reply-To: <0000000000009167320590823a8c@google.com>

From: Björn Töpel <bjorn.topel@intel.com>

The poll() implementation for AF_XDP sockets did not perform the
proper state checks, prior accessing the socket umem. This patch fixes
that by performing a xsk_is_bound() check.

Suggested-by: Hillf Danton <hdanton@sina.com>
Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/xdp/xsk.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index ee4428a892fa..08bed5e92af4 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -356,13 +356,20 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 	return err;
 }
 
+static bool xsk_is_bound(struct xdp_sock *xs)
+{
+	struct net_device *dev = READ_ONCE(xs->dev);
+
+	return dev && xs->state == XSK_BOUND;
+}
+
 static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 {
 	bool need_wait = !(m->msg_flags & MSG_DONTWAIT);
 	struct sock *sk = sock->sk;
 	struct xdp_sock *xs = xdp_sk(sk);
 
-	if (unlikely(!xs->dev))
+	if (unlikely(!xsk_is_bound(xs)))
 		return -ENXIO;
 	if (unlikely(!(xs->dev->flags & IFF_UP)))
 		return -ENETDOWN;
@@ -383,6 +390,9 @@ static unsigned int xsk_poll(struct file *file, struct socket *sock,
 	struct net_device *dev = xs->dev;
 	struct xdp_umem *umem = xs->umem;
 
+	if (unlikely(!xsk_is_bound(xs)))
+		return mask;
+
 	if (umem->need_wakeup)
 		dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
 						umem->need_wakeup);
@@ -417,7 +427,7 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
 {
 	struct net_device *dev = xs->dev;
 
-	if (!dev || xs->state != XSK_BOUND)
+	if (!xsk_is_bound(xs))
 		return;
 
 	xs->state = XSK_UNBOUND;
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH net-next v2 6/9] net: macsec: hardware offloading infrastructure
From: Antoine Tenart @ 2019-08-20 10:07 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: Antoine Tenart, Igor Russkikh, davem@davemloft.net,
	andrew@lunn.ch, f.fainelli@gmail.com, hkallweit1@gmail.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	thomas.petazzoni@bootlin.com, alexandre.belloni@bootlin.com,
	allan.nielsen@microchip.com, camelia.groza@nxp.com,
	Simon Edelhaus, Pavel Belous
In-Reply-To: <20190816132500.GA8697@bistromath.localdomain>

Hi Sabrina,

On Fri, Aug 16, 2019 at 03:25:00PM +0200, Sabrina Dubroca wrote:
> 2019-08-13, 10:58:17 +0200, Antoine Tenart wrote:
> > 
> > As for the need for xmit / handle_frame ops (for a MAC w/ MACsec
> > offloading), I'd say the xmit / handle_frame ops of the real net device
> > driver could be used as the one of the MACsec virtual interface do not
> > do much (regardless of the implementation choice discussed above).
> 
> There's no "handle_frame" op on a real device. macsec_handle_frame is
> an rx_handler specificity that grabs packets from a real device and
> sends them into a virtual device stacked on top of it. A real device
> just hands packets over to the stack via NAPI.

Right, so if there is a need for a custom implementation of xmit /
hamdle_frame we could let the offloading driver provide it. I'll
probably let the first MAC implementation add it, as we agreed not to
add an API with no user (and mine is done in the PHY).

Thanks!
Antoine

-- 
Antoine Ténart, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* Re: [PATCH net-next v2 5/9] net: phy: add MACsec ops in phy_device
From: Antoine Tenart @ 2019-08-20 10:07 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Antoine Tenart, davem, sd, andrew, hkallweit1, netdev,
	linux-kernel, thomas.petazzoni, alexandre.belloni, allan.nielsen,
	camelia.groza, Simon.Edelhaus
In-Reply-To: <1521a28b-a0af-b3fb-d1bf-af82ec2f3d47@gmail.com>

Hi Florian,

On Wed, Aug 14, 2019 at 04:15:03PM -0700, Florian Fainelli wrote:
> On 8/8/19 7:05 AM, Antoine Tenart wrote:
> > This patch adds a reference to MACsec ops in the phy_device, to allow
> > PHYs to support offloading MACsec operations. The phydev lock will be
> > held while calling those helpers.
> > 
> > Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
> > ---
> >  include/linux/phy.h | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/include/linux/phy.h b/include/linux/phy.h
> > index 462b90b73f93..6947a19587e4 100644
> > --- a/include/linux/phy.h
> > +++ b/include/linux/phy.h
> > @@ -22,6 +22,10 @@
> >  #include <linux/workqueue.h>
> >  #include <linux/mod_devicetable.h>
> >  
> > +#ifdef CONFIG_MACSEC
> > +#include <net/macsec.h>
> > +#endif
> 
> #if IS_ENABLED(CONFIG_MACSEC)
> 
> > +
> >  #include <linux/atomic.h>
> >  
> >  #define PHY_DEFAULT_FEATURES	(SUPPORTED_Autoneg | \
> > @@ -345,6 +349,7 @@ struct phy_c45_device_ids {
> >   * attached_dev: The attached enet driver's device instance ptr
> >   * adjust_link: Callback for the enet controller to respond to
> >   * changes in the link state.
> > + * macsec_ops: MACsec offloading ops.
> >   *
> >   * speed, duplex, pause, supported, advertising, lp_advertising,
> >   * and autoneg are used like in mii_if_info
> > @@ -438,6 +443,11 @@ struct phy_device {
> >  
> >  	void (*phy_link_change)(struct phy_device *, bool up, bool do_carrier);
> >  	void (*adjust_link)(struct net_device *dev);
> > +
> > +#if defined(CONFIG_MACSEC)
> > +	/* MACsec management functions */
> > +	const struct macsec_ops *macsec_ops;
> > +#endif
> 
> #if IS_ENABLED(CONFIG_MACSEC)
> 
> likewise.

I'll fix it.

Thanks!
Antoine

-- 
Antoine Ténart, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox