Netdev List
* [PATCH ipsec-next 0/7] ipsec: add TCP encapsulation support (RFC 8229)
From: Sabrina Dubroca @ 2019-08-21 21:46 UTC (permalink / raw)
  To: netdev; +Cc: Herbert Xu, Steffen Klassert, Sabrina Dubroca

This patchset introduces support for TCP encapsulation of IKE and ESP
messages, as defined by RFC 8229 [0]. It is an evolution of what
Herbert Xu proposed in January 2018 [1] that addresses the main
criticism against it, by not interfering with the TCP implementation
at all. The networking stack now has infrastructure for this: TCP ULPs
and Stream Parsers.

The first patches are preparation and refactoring, and the final patch
adds the feature.

The main omission in this submission is IPv6 support. ESP
encapsulation over UDP with IPv6 is currently not supported in the
kernel either, as UDP encapsulation is aimed at NAT traversal, and NAT
is not frequently used with IPv6.

Some of the code is taken directly, or slightly modified, from Herbert
Xu's original submission [1]. The ULP and strparser pieces are
new. This work was presented and discussed at the IPsec workshop and
netdev 0x13 conference [2] in Prague, last March.

An equivalent of patch #1 (skbuff: Avoid sleeping in
skb_send_sock_locked) is already present in other trees (but not
ipsec-next) as commit bd95e678e0f6 ("bpf: sockmap, fix use after free
from sleep in psock backlog workqueue"). I'm only including it here so
that this patchset works correctly on top of ipsec-next/master.

No changes in the patchset since the RFC.

[0] https://tools.ietf.org/html/rfc8229
[1] https://patchwork.ozlabs.org/patch/859107/
[2] https://netdevconf.org/0x13/session.html?talk-ipsec-encap
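
For readers unfamiliar with RFC 8229 [0], the wire format that the cover letter above refers to can be sketched in userspace C. This is only an illustration of the framing as I understand it from the RFC, not code from the patchset: each message on the TCP stream carries a 16-bit big-endian length that counts the length field itself, and IKE messages are additionally preceded by a 4-byte all-zero non-ESP marker. All names below (frame_esp, frame_ike, classify) are hypothetical, and the sketch ignores the "IKETCP" stream prefix and keepalives; payloads shorter than 4 bytes are simply treated as invalid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LEN_FIELD      2   /* 16-bit big-endian length, counts itself */
#define NON_ESP_MARKER 4   /* four zero bytes that precede an IKE message */

enum msg_kind { MSG_INCOMPLETE, MSG_INVALID, MSG_IKE, MSG_ESP };

/* Write the length prefix for a framed message of `total` bytes. */
static void put_len(uint8_t *out, size_t total)
{
	out[0] = (uint8_t)(total >> 8);
	out[1] = (uint8_t)(total & 0xff);
}

/* Hypothetical helper: frame one ESP packet. Returns the number of bytes
 * written, or 0 if it fits neither the buffer nor a 16-bit length. */
static size_t frame_esp(uint8_t *out, size_t cap,
			const uint8_t *esp, size_t esp_len)
{
	size_t total;

	assert(out && esp);
	if (esp_len > UINT16_MAX - LEN_FIELD)
		return 0;
	total = LEN_FIELD + esp_len;
	if (total > cap)
		return 0;
	put_len(out, total);
	memcpy(out + LEN_FIELD, esp, esp_len);
	return total;
}

/* Hypothetical helper: frame one IKE message, with the non-ESP marker. */
static size_t frame_ike(uint8_t *out, size_t cap,
			const uint8_t *ike, size_t ike_len)
{
	size_t total;

	assert(out && ike);
	if (ike_len > UINT16_MAX - LEN_FIELD - NON_ESP_MARKER)
		return 0;
	total = LEN_FIELD + NON_ESP_MARKER + ike_len;
	if (total > cap)
		return 0;
	put_len(out, total);
	memset(out + LEN_FIELD, 0, NON_ESP_MARKER);
	memcpy(out + LEN_FIELD + NON_ESP_MARKER, ike, ike_len);
	return total;
}

/* Classify the first framed message in buf. On MSG_IKE or MSG_ESP,
 * *msg_len receives the full framed length including the prefix. */
static enum msg_kind classify(const uint8_t *buf, size_t avail,
			      size_t *msg_len)
{
	static const uint8_t marker[NON_ESP_MARKER];
	size_t len;

	assert(buf && msg_len);
	if (avail < LEN_FIELD)
		return MSG_INCOMPLETE;
	len = ((size_t)buf[0] << 8) | buf[1];
	if (len < LEN_FIELD + NON_ESP_MARKER)
		return MSG_INVALID;	/* simplification of this sketch */
	if (avail < len)
		return MSG_INCOMPLETE;	/* wait for more stream data */
	*msg_len = len;
	/* An SPI of zero is reserved in ESP, so four zero bytes mean IKE. */
	return memcmp(buf + LEN_FIELD, marker, NON_ESP_MARKER) ?
		MSG_ESP : MSG_IKE;
}
```

A receiver built on a stream parser would call something like classify() repeatedly on the reassembled byte stream, consuming msg_len bytes after each complete message.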

Herbert Xu (1):
  skbuff: Avoid sleeping in skb_send_sock_locked

Sabrina Dubroca (6):
  net: add queue argument to __skb_wait_for_more_packets and
    __skb_{,try_}recv_datagram
  xfrm: introduce xfrm_trans_queue_net
  xfrm: add route lookup to xfrm4_rcv_encap
  esp4: prepare esp_input_done2 for non-UDP encapsulation
  esp4: split esp_output_udp_encap and introduce esp_output_encap
  xfrm: add espintcp (RFC 8229)

 include/linux/skbuff.h    |  11 +-
 include/net/espintcp.h    |  38 +++
 include/net/xfrm.h        |   4 +
 include/uapi/linux/udp.h  |   1 +
 net/core/datagram.c       |  26 +-
 net/core/skbuff.c         |   1 +
 net/ipv4/esp4.c           | 262 ++++++++++++++++++--
 net/ipv4/udp.c            |   3 +-
 net/ipv4/xfrm4_protocol.c |   9 +
 net/unix/af_unix.c        |   7 +-
 net/xfrm/Kconfig          |   9 +
 net/xfrm/Makefile         |   1 +
 net/xfrm/espintcp.c       | 505 ++++++++++++++++++++++++++++++++++++++
 net/xfrm/xfrm_input.c     |  21 +-
 net/xfrm/xfrm_policy.c    |   7 +
 net/xfrm/xfrm_state.c     |   3 +
 16 files changed, 862 insertions(+), 46 deletions(-)
 create mode 100644 include/net/espintcp.h
 create mode 100644 net/xfrm/espintcp.c

-- 
2.22.0


^ permalink raw reply

* Re: [PATCH net] ixgbe: fix double clean of tx descriptors with xdp
From: William Tu @ 2019-08-21 21:38 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Ilya Maximets, Björn Töpel, Netdev, LKML, bpf,
	David S. Miller, Magnus Karlsson, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Jeff Kirsher,
	intel-wired-lan, Eelco Chaudron
In-Reply-To: <CAKgT0UcCKiM1Ys=vWxctprN7fzWcBCk-PCuKB-8=RThM=CqLSQ@mail.gmail.com>

On Wed, Aug 21, 2019 at 9:57 AM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Wed, Aug 21, 2019 at 9:22 AM Ilya Maximets <i.maximets@samsung.com> wrote:
> >
> > On 21.08.2019 4:17, Alexander Duyck wrote:
> > > On Tue, Aug 20, 2019 at 8:58 AM Ilya Maximets <i.maximets@samsung.com> wrote:
> > >>
> > >> On 20.08.2019 18:35, Alexander Duyck wrote:
> > >>> On Tue, Aug 20, 2019 at 8:18 AM Ilya Maximets <i.maximets@samsung.com> wrote:
> > >>>>
> > >>>> Tx code doesn't clear the descriptor status after cleaning.
> > >>>> So, if the budget is larger than the number of used elems in a ring, some
> > >>>> descriptors will be accounted twice and xsk_umem_complete_tx will move
> > >>>> prod_tail far beyond the prod_head, breaking the completion queue ring.
> > >>>>
> > >>>> Fix that by limiting the number of descriptors to clean by the number
> > >>>> of used descriptors in the tx ring.
> > >>>>
> > >>>> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support")
> > >>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> > >>>
> > >>> I'm not sure this is the best way to go. My preference would be to
> > >>> have something in the ring that would prevent us from racing which I
> > >>> don't think this really addresses. I am pretty sure this code is safe
> > >>> on x86 but I would be worried about weak ordered systems such as
> > >>> PowerPC.
> > >>>
> > >>> It might make sense to look at adding the eop_desc logic like we have
> > >>> in the regular path with a proper barrier before we write it and after
> > >>> we read it. So for example we could hold off on writing the bytecount
> > >>> value until the end of an iteration and call smp_wmb before we write
> > >>> it. Then on the cleanup we could read it and if it is non-zero we take
> > >>> an smp_rmb before proceeding further to process the Tx descriptor and
> > >>> clearing the value. Otherwise this code is going to just keep popping
> > >>> up with issues.
> > >>
> > >> But, unlike regular case, xdp zero-copy xmit and clean for particular
> > >> tx ring always happens in the same NAPI context and even on the same
> > >> CPU core.
> > >>
> > >> I saw the 'eop_desc' manipulations in regular case and yes, we could
> > >> use 'next_to_watch' field just as a flag of descriptor existence,
> > >> but it seems unnecessarily complicated. Am I missing something?
> > >>
> > >
> > > So is it always in the same NAPI context? I forgot; I was thinking
> > > that somehow the socket could possibly make use of XDP for transmit.
> >
> > AF_XDP socket only triggers tx interrupt on ndo_xsk_async_xmit() which
> > is used in zero-copy mode. Real xmit happens inside
> > ixgbe_poll()
> >  -> ixgbe_clean_xdp_tx_irq()
> >     -> ixgbe_xmit_zc()
> >
> > It should not be possible to bind another XDP socket to the same netdev
> > queue.
> >
> > It is also possible to xmit frames in xdp_ring while performing XDP_TX/REDIRECT
> > actions. REDIRECT could happen from different netdev with different NAPI
> > context, but this operation is bound to specific CPU core and each core has
> > its own xdp_ring.
> >
> > However, I'm not an expert here.
> > Björn, maybe you could comment on this?
> >
> > >
> > > As far as the logic to use I would be good with just using a value you
> > > are already setting such as the bytecount value. All that would need
> > > to happen is to guarantee that the value is cleared in the Tx path. So
> > > if you clear the bytecount in ixgbe_clean_xdp_tx_irq you could
> > > theoretically just use that as well to flag that a descriptor has been
> > > populated and is ready to be cleaned. Assuming the logic about this
> > > all being in the same NAPI context anyway you wouldn't need to mess
> > > with the barrier stuff I mentioned before.
> >
> > Checking the number of used descs, i.e. next_to_use - next_to_clean,
> > makes iteration in this function logically equal to the iteration inside
> > 'ixgbe_xsk_clean_tx_ring()'. Do you think we need to change the latter
> > function too to follow same 'bytecount' approach? I don't like having
> > two different ways to determine number of used descriptors in the same file.
> >
> > Best regards, Ilya Maximets.
>
> As far as ixgbe_clean_xdp_tx_irq() vs ixgbe_xsk_clean_tx_ring(), I
> would say that if you got rid of budget and framed things more like
> how ixgbe_xsk_clean_tx_ring was framed with the ntc != ntu being
> obvious I would prefer to see us go that route.
>
> Really there is no need for budget in ixgbe_clean_xdp_tx_irq() if you
> are going to be working with a static ntu value since you will only
> ever process one iteration through the ring anyway. It might make more
> sense if you just went through and got rid of budget and i, and
> instead used ntc and ntu like what was done in
> ixgbe_xsk_clean_tx_ring().
>
> Thanks.
>
> - Alex

I'm not familiar with the driver details.
I tested this patch against the issue reported on the OVS mailing list:
https://www.mail-archive.com/ovs-dev@openvswitch.org/msg35362.html
and indeed the problem goes away. But I saw a huge performance drop:
my AF_XDP tx performance drops from >9Mpps to <5Mpps.

Tested using kernel 5.3.0-rc3+
03:00.0 Ethernet controller: Intel Corporation Ethernet Controller
10-Gigabit X540-AT2 (rev 01)
Subsystem: Intel Corporation Ethernet 10G 2P X540-t Adapter
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+

Regards,
William
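
The idea discussed in this thread, bounding the clean loop by the number of descriptors actually in flight (ntu - ntc) rather than by the budget alone, can be sketched as follows. This is a userspace illustration under assumptions, not the ixgbe code: struct tx_ring, ring_used() and ring_clean() are hypothetical names, and next_to_use == next_to_clean is taken to mean an empty ring.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of a tx ring's indices. */
struct tx_ring {
	uint16_t count;		/* number of descriptors in the ring */
	uint16_t next_to_use;	/* ntu: next slot the xmit path fills */
	uint16_t next_to_clean;	/* ntc: next slot the clean path frees */
};

/* Descriptors queued but not yet cleaned, with index wraparound. */
static uint16_t ring_used(const struct tx_ring *r)
{
	assert(r->count > 0);
	assert(r->next_to_use < r->count && r->next_to_clean < r->count);
	return (uint16_t)((r->next_to_use + r->count - r->next_to_clean) %
			  r->count);
}

/* Clean at most min(budget, used) descriptors and return how many were
 * cleaned. Without the clamp, a budget larger than the used count would
 * walk over already-cleaned descriptors and count them a second time. */
static uint16_t ring_clean(struct tx_ring *r, uint16_t budget)
{
	uint16_t used = ring_used(r);
	uint16_t n = budget < used ? budget : used;

	r->next_to_clean = (uint16_t)((r->next_to_clean + n) % r->count);
	return n;
}
```

With a static ntu snapshot the loop can never run more than once around the ring, which is why the budget becomes redundant in the variant Alexander suggests.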

^ permalink raw reply

* Re: [PATCH 24/38] cls_u32: Convert tc_u_common->handle_idr to XArray
From: Jakub Kicinski @ 2019-08-21 21:38 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: netdev
In-Reply-To: <20190821212542.GB21442@bombadil.infradead.org>

On Wed, 21 Aug 2019 14:25:42 -0700, Matthew Wilcox wrote:
> On Wed, Aug 21, 2019 at 02:13:08PM -0700, Jakub Kicinski wrote:
> > On Tue, 20 Aug 2019 15:32:45 -0700, Matthew Wilcox wrote:  
> > > @@ -305,8 +306,12 @@ static void *u32_get(struct tcf_proto *tp, u32 handle)
> > >  /* Protected by rtnl lock */
> > >  static u32 gen_new_htid(struct tc_u_common *tp_c, struct tc_u_hnode *ptr)
> > >  {
> > > -	int id = idr_alloc_cyclic(&tp_c->handle_idr, ptr, 1, 0x7FF, GFP_KERNEL);
> > > -	if (id < 0)
> > > +	int err;
> > > +	u32 id;
> > > +
> > > +	err = xa_alloc_cyclic(&tp_c->ht_xa, &id, ptr, XA_LIMIT(0, 0x7ff),
> > > +			&tp_c->ht_next, GFP_KERNEL);  
> > 
> > nit: indentation seems off here and a couple of other places.  
> 
> what indentation rule does the networking stack use?  i just leave the
> cursor where my editor puts it, which seems to be two tabs.

Oh, match opening bracket..

	err = xa_alloc_cyclic(&tp_c->ht_xa, &id, ptr, XA_LIMIT(0, 0x7ff),
			      &tp_c->ht_next, GFP_KERNEL);

^ permalink raw reply

* Re: [net-next 00/15][pull request] 40GbE Intel Wired LAN Driver Updates 2019-08-21
From: Jakub Kicinski @ 2019-08-21 21:31 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, netdev, nhorman, sassmann
In-Reply-To: <20190821201623.5506-1-jeffrey.t.kirsher@intel.com>

On Wed, 21 Aug 2019 13:16:08 -0700, Jeff Kirsher wrote:
> This series contains updates to i40e driver only.

Patch 12 should really be squashed into 13, 7 and 9 could also be
combined. But not a big deal, I guess.

^ permalink raw reply

* Re: [PATCH 24/38] cls_u32: Convert tc_u_common->handle_idr to XArray
From: Matthew Wilcox @ 2019-08-21 21:25 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev
In-Reply-To: <20190821141308.54313c30@cakuba.netronome.com>

On Wed, Aug 21, 2019 at 02:13:08PM -0700, Jakub Kicinski wrote:
> On Tue, 20 Aug 2019 15:32:45 -0700, Matthew Wilcox wrote:
> > @@ -305,8 +306,12 @@ static void *u32_get(struct tcf_proto *tp, u32 handle)
> >  /* Protected by rtnl lock */
> >  static u32 gen_new_htid(struct tc_u_common *tp_c, struct tc_u_hnode *ptr)
> >  {
> > -	int id = idr_alloc_cyclic(&tp_c->handle_idr, ptr, 1, 0x7FF, GFP_KERNEL);
> > -	if (id < 0)
> > +	int err;
> > +	u32 id;
> > +
> > +	err = xa_alloc_cyclic(&tp_c->ht_xa, &id, ptr, XA_LIMIT(0, 0x7ff),
> > +			&tp_c->ht_next, GFP_KERNEL);
> 
> nit: indentation seems off here and a couple of other places.

what indentation rule does the networking stack use?  i just leave the
cursor where my editor puts it, which seems to be two tabs.

^ permalink raw reply

* [PATCH] net/ncsi: Fix the payload copying for the request coming from Netlink
From: Justin.Lee1 @ 2019-08-21 21:24 UTC (permalink / raw)
  To: netdev, openbmc, linux-kernel, sam, davem

The request coming from Netlink should use the OEM generic handler.

The standard command handler expects payload in bytes/words/dwords
but the actual payload is stored in data if the request is coming from Netlink.

Signed-off-by: Justin Lee <justin.lee1@dell.com>

---
 net/ncsi/ncsi-cmd.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/ncsi/ncsi-cmd.c b/net/ncsi/ncsi-cmd.c
index eab4346..0187e65 100644
--- a/net/ncsi/ncsi-cmd.c
+++ b/net/ncsi/ncsi-cmd.c
@@ -309,14 +309,21 @@ static struct ncsi_request *ncsi_alloc_command(struct ncsi_cmd_arg *nca)
 
 int ncsi_xmit_cmd(struct ncsi_cmd_arg *nca)
 {
+	struct ncsi_cmd_handler *nch = NULL;
 	struct ncsi_request *nr;
+	unsigned char type;
 	struct ethhdr *eh;
-	struct ncsi_cmd_handler *nch = NULL;
 	int i, ret;
 
+	/* Use OEM generic handler for Netlink request */
+	if (nca->req_flags == NCSI_REQ_FLAG_NETLINK_DRIVEN)
+		type = NCSI_PKT_CMD_OEM;
+	else
+		type = nca->type;
+
 	/* Search for the handler */
 	for (i = 0; i < ARRAY_SIZE(ncsi_cmd_handlers); i++) {
-		if (ncsi_cmd_handlers[i].type == nca->type) {
+		if (ncsi_cmd_handlers[i].type == type) {
 			if (ncsi_cmd_handlers[i].handler)
 				nch = &ncsi_cmd_handlers[i];
 			else
-- 
2.9.3
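
The effect of the patch above on handler selection can be sketched in isolation. This is a minimal userspace model, not the NCSI code: the table, the constants (CMD_SELECT, CMD_OEM, REQ_FLAG_NETLINK) and the handler functions are hypothetical stand-ins, and the only behaviour taken from the patch is that a Netlink-driven request is looked up under the OEM type instead of its own type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the NCSI packet types and request flag. */
enum { CMD_SELECT = 0x01, CMD_OEM = 0x50 };
enum { REQ_FLAG_NETLINK = 0x2 };

struct cmd_arg {
	unsigned char type;
	unsigned int req_flags;
};

typedef int (*cmd_handler_fn)(const struct cmd_arg *arg);

struct cmd_handler {
	unsigned char type;
	cmd_handler_fn handler;
};

static int handle_select(const struct cmd_arg *arg) { (void)arg; return 1; }
static int handle_oem(const struct cmd_arg *arg) { (void)arg; return 2; }

static const struct cmd_handler handlers[] = {
	{ CMD_SELECT, handle_select },
	{ CMD_OEM, handle_oem },
};

/* Pick the handler for a request. Netlink-driven requests carry a raw
 * payload, so they are routed to the generic OEM handler regardless of
 * the command type they name. */
static const struct cmd_handler *find_handler(const struct cmd_arg *arg)
{
	unsigned char type;
	size_t i;

	assert(arg);
	type = arg->req_flags == REQ_FLAG_NETLINK ? CMD_OEM : arg->type;

	for (i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++) {
		if (handlers[i].type == type)
			return handlers[i].handler ? &handlers[i] : NULL;
	}
	return NULL;
}
```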

^ permalink raw reply related

* Re: [PATCH 24/38] cls_u32: Convert tc_u_common->handle_idr to XArray
From: Jakub Kicinski @ 2019-08-21 21:13 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: netdev
In-Reply-To: <20190820223259.22348-25-willy@infradead.org>

On Tue, 20 Aug 2019 15:32:45 -0700, Matthew Wilcox wrote:
> @@ -305,8 +306,12 @@ static void *u32_get(struct tcf_proto *tp, u32 handle)
>  /* Protected by rtnl lock */
>  static u32 gen_new_htid(struct tc_u_common *tp_c, struct tc_u_hnode *ptr)
>  {
> -	int id = idr_alloc_cyclic(&tp_c->handle_idr, ptr, 1, 0x7FF, GFP_KERNEL);
> -	if (id < 0)
> +	int err;
> +	u32 id;
> +
> +	err = xa_alloc_cyclic(&tp_c->ht_xa, &id, ptr, XA_LIMIT(0, 0x7ff),
> +			&tp_c->ht_next, GFP_KERNEL);

nit: indentation seems off here and a couple of other places.

^ permalink raw reply

* Re: libbpf distro packaging
From: Jiri Olsa @ 2019-08-21 21:09 UTC (permalink / raw)
  To: Julia Kartseva
  Cc: Andrii Nakryiko, labbott@redhat.com, acme@kernel.org,
	debian-kernel@lists.debian.org, netdev@vger.kernel.org,
	Andrii Nakryiko, Andrey Ignatov, Alexei Starovoitov,
	Yonghong Song, jolsa@kernel.org
In-Reply-To: <A770810D-591E-4292-AEFA-563724B6D6CB@fb.com>

On Tue, Aug 20, 2019 at 10:27:23PM +0000, Julia Kartseva wrote:
> 
> 
> On 8/19/19, 11:08 AM, "Julia Kartseva" <hex@fb.com> wrote:
> 
>     On 8/13/19, 11:24 AM, "Andrii Nakryiko" <andrii.nakryiko@gmail.com> wrote:
>     
>         On Tue, Aug 13, 2019 at 5:26 AM Jiri Olsa <jolsa@redhat.com> wrote:
>         >
>         > On Mon, Aug 12, 2019 at 07:04:12PM +0000, Julia Kartseva wrote:
>         > > I would like to bring up libbpf publishing discussion started at [1].
>         > > The present state of things is that libbpf is built from the kernel tree, e.g. [2]
>         > > for Debian and [3] for Fedora, whereas the better way would be to have a
>         > > package built from the github mirror. The advantages of the latter:
>         > > - Consistent, ABI matching versioning across distros
>         > > - The mirror has integration tests
>         > > - No need for a kernel tree to build a package
>         > > - Changes can be merged directly to github without waiting for them to be
>         > > merged through bpf-next -> net-next -> main
>         > > There is a PR introducing a libbpf.spec which can be used as a starting point: [4]
>         > > Any comments regarding the spec itself can be posted there.
>         > > In the future it may be used as a source of truth.
>         > > Please consider switching libbpf packaging to the github mirror instead
>         > > of the kernel tree.
>         > > Thanks
>         > >
>         > > [1] https://lists.iovisor.org/g/iovisor-dev/message/1521
>         > > [2] https://packages.debian.org/sid/libbpf4.19
>         > > [3] http://rpmfind.net/linux/RPM/fedora/devel/rawhide/x86_64/l/libbpf-5.3.0-0.rc2.git0.1.fc31.x86_64.html
>         > > [4] https://github.com/libbpf/libbpf/pull/64
>         >
>         > hi,
>         > Fedora has libbpf as kernel-tools subpackage, so I think
>         > we'd need to create new package and deprecate the current
>         >
>         > but I like the ABI stability by using github .. how's actually
>         > the sync (in both directions) with kernel sources going on?
>         
>         Sync is always in one direction, from kernel sources into Github repo.
>         Right now it's triggered by a human (usually me), but we are using a
>         script that automates entire process (see
>         https://github.com/libbpf/libbpf/blob/master/scripts/sync-kernel.sh).
>         It cherry-picks relevant commits from kernel, transforms them to match
>         Github's file layout and re-applies those changes to Github repo.
>         
>         There is never a sync from Github back to kernel, but Github repo
>         contains some extra stuff that's not in kernel. E.g., the script I
>         mentioned, plus Github's Makefile is different, because it can't rely
>         on kernel's kbuild setup.
> 
> Hi Jiri,
> I'm curious if you have any comments regarding the sync procedure described
> by Andrii, or if there is anything else you'd like us to address so Fedora
> can be switched to libbpf built from the github mirror?

hi,
yea, I think it's ok.. just need to check the implications
for rhel packaging and I'll let you know

jirka

^ permalink raw reply

* [PATCH bpf] bpf: fix precision tracking in presence of bpf2bpf calls
From: Alexei Starovoitov @ 2019-08-21 21:07 UTC (permalink / raw)
  To: davem; +Cc: daniel, netdev, bpf, kernel-team

While adding extra tests for precision tracking and extra infra
to adjust verifier heuristics the existing test
"calls: cross frame pruning - liveness propagation" started to fail.
The root cause is the same as described in the verifier.c comment:

 * Also if parent's curframe > frame where backtracking started,
 * the verifier need to mark registers in both frames, otherwise callees
 * may incorrectly prune callers. This is similar to
 * commit 7640ead93924 ("bpf: verifier: make sure callees don't prune with caller differences")
 * For now backtracking falls back into conservative marking.

Turned out though that returning -ENOTSUPP from backtrack_insn() and
doing mark_all_scalars_precise() in the current parentage chain is not enough.
Depending on how is_state_visited() heuristic is creating parentage chain
it's possible that callee will incorrectly prune caller.
Fix the issue by setting precise=true earlier and more aggressively.
Before this fix the precision tracking _within_ functions that don't do
bpf2bpf calls would still work. Whereas now precision tracking is completely
disabled when bpf2bpf calls are present anywhere in the program.

No difference in cilium tests (they don't have bpf2bpf calls).
No difference in test_progs though some of them have bpf2bpf calls,
but precision tracking wasn't effective there.

Fixes: b5dc0163d8fd ("bpf: precise scalar_value tracking")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
Separate set of tests and infra for them will go into bpf-next.
---
 kernel/bpf/verifier.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c84d83f86141..b5c14c9d7b98 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -985,9 +985,6 @@ static void __mark_reg_unbounded(struct bpf_reg_state *reg)
 	reg->smax_value = S64_MAX;
 	reg->umin_value = 0;
 	reg->umax_value = U64_MAX;
-
-	/* constant backtracking is enabled for root only for now */
-	reg->precise = capable(CAP_SYS_ADMIN) ? false : true;
 }
 
 /* Mark a register as having a completely unknown (scalar) value. */
@@ -1014,7 +1011,11 @@ static void mark_reg_unknown(struct bpf_verifier_env *env,
 			__mark_reg_not_init(regs + regno);
 		return;
 	}
-	__mark_reg_unknown(regs + regno);
+	regs += regno;
+	__mark_reg_unknown(regs);
+	/* constant backtracking is enabled for root without bpf2bpf calls */
+	regs->precise = env->subprog_cnt > 1 || !env->allow_ptr_leaks ?
+			true : false;
 }
 
 static void __mark_reg_not_init(struct bpf_reg_state *reg)
-- 
2.20.0
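
The condition the patch installs in mark_reg_unknown() can be read as a small truth table. The helper below is a hypothetical restatement for illustration only; it assumes, as I understand the verifier, that subprog_cnt counts the main program as one subprogram (so a value above 1 means bpf2bpf calls are present) and that allow_ptr_leaks reflects whether the loader is privileged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical restatement of the patched expression: a freshly unknown
 * scalar starts out precise (i.e. precision tracking is effectively off)
 * unless the program is loaded by root and has no bpf2bpf calls. */
static bool initial_precise(unsigned int subprog_cnt, bool allow_ptr_leaks)
{
	assert(subprog_cnt >= 1);	/* the main program always counts */
	return subprog_cnt > 1 || !allow_ptr_leaks;
}
```

Only the (single subprogram, privileged) case leaves precise unset, which matches the commit message: precision tracking is now completely disabled when bpf2bpf calls are present anywhere in the program.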


^ permalink raw reply related

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
From: Toke Høiland-Jørgensen @ 2019-08-21 21:07 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Stephen Hemminger, Daniel Borkmann, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, Networking, bpf
In-Reply-To: <CAEf4BzZxb7qZabw6aDVaTqnhr3AGtwEo+DbuBR9U9tJr+qVuyg@mail.gmail.com>

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> iproute2 uses its own bpf loader to load eBPF programs, which has
>> evolved separately from libbpf. Since we are now standardising on
>> libbpf, this becomes a problem as iproute2 is slowly accumulating
>> feature incompatibilities with libbpf-based loaders. In particular,
>> iproute2 has its own (expanded) version of the map definition struct,
>> which makes it difficult to write programs that can be loaded with both
>> custom loaders and iproute2.
>>
>> This series seeks to address this by converting iproute2 to using libbpf
>> for all its bpf needs. This version is an early proof-of-concept RFC, to
>> get some feedback on whether people think this is the right direction.
>>
>> What this series does is the following:
>>
>> - Updates the libbpf map definition struct to match that of iproute2
>>   (patch 1).
>
>
> Hi Toke,
>
> Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
> totally in support of making iproute2 use libbpf to load/initialize
> BPF programs. But I'm against adding iproute2-specific fields to
> libbpf's bpf_map_def definitions to support this.
>
> I've proposed the plan of extending libbpf's supported features so
> that it can be used to load iproute2-style BPF programs earlier,
> please see discussions in [0] and [1].

Yeah, I've seen that discussion, and agree that longer term this is
probably a better way to do map-in-map definitions.

However, I view your proposal as complementary to this series: we'll
probably also want the BTF-based definition to work with iproute2, and
that means iproute2 needs to be ported to libbpf. But iproute2 needs to
be backwards compatible with the format it supports now, and, well, this
series is the simplest way to achieve that IMO :)

> I think instead of emulating iproute2 way of matching everything based
> on user-specified internal IDs, which doesn't provide good user
> experience and is quite easy to get wrong, we should support same
> scenarios with better declarative syntax and in a less error-prone
> way. I believe we can do that by relying on BTF more heavily (again,
> please check some of my proposals in [0], [1], and discussion with
> Daniel in those threads). It will feel more natural and be more
> straightforward to follow. It would be great if you can lend a hand in
> implementing pieces of that plan!
>
> I'm currently on vacation, so my availability is very sparse, but I'd
> be happy to discuss this further, if need be.

Happy to collaborate on your proposal when you're back from vacation;
but as I said above, I believe this is a complementary longer-term
thing...

-Toke

^ permalink raw reply

* Re: [PATCH net-next v2 0/3] net: dsa: mt7530: Convert to PHYLINK and add support for port 5
From: Andrew Lunn @ 2019-08-21 21:05 UTC (permalink / raw)
  To: René van Dorst
  Cc: Sean Wang, Vivien Didelot, Florian Fainelli, David S . Miller,
	Matthias Brugger, netdev, linux-arm-kernel, linux-mediatek,
	John Crispin, linux-mips, Frank Wunderlich
In-Reply-To: <20190821144547.15113-1-opensource@vdorst.com>

On Wed, Aug 21, 2019 at 04:45:44PM +0200, René van Dorst wrote:
> 1. net: dsa: mt7530: Convert to PHYLINK API
>    This patch converts mt7530 to PHYLINK API.
> 2. dt-bindings: net: dsa: mt7530: Add support for port 5
> 3. net: dsa: mt7530: Add support for port 5
>    These 2 patches add support for port 5 of the switch.
> 
> v1->v2:
>  * Mostly phylink improvements after review.

Hi René

You are addressing comments mostly from Russell King. It would have been
good to Cc: him on the patchset.

Andrew

^ permalink raw reply

* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
From: Toke Høiland-Jørgensen @ 2019-08-21 21:00 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Stephen Hemminger, Daniel Borkmann, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, David Miller,
	Jesper Dangaard Brouer, netdev, bpf
In-Reply-To: <20190821192611.xmciiiqjpkujjup7@ast-mbp.dhcp.thefacebook.com>

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Tue, Aug 20, 2019 at 01:47:01PM +0200, Toke Høiland-Jørgensen wrote:
>> iproute2 uses its own bpf loader to load eBPF programs, which has
>> evolved separately from libbpf. Since we are now standardising on
>> libbpf, this becomes a problem as iproute2 is slowly accumulating
>> feature incompatibilities with libbpf-based loaders. In particular,
>> iproute2 has its own (expanded) version of the map definition struct,
>> which makes it difficult to write programs that can be loaded with both
>> custom loaders and iproute2.
>> 
>> This series seeks to address this by converting iproute2 to using libbpf
>> for all its bpf needs. This version is an early proof-of-concept RFC, to
>> get some feedback on whether people think this is the right direction.
>> 
>> What this series does is the following:
>> 
>> - Updates the libbpf map definition struct to match that of iproute2
>>   (patch 1).
>> - Adds functionality to libbpf to support automatic pinning of maps when
>>   loading an eBPF program, while re-using pinned maps if they already
>>   exist (patches 2-3).
>> - Modifies iproute2 to make it possible to compile it against libbpf
>>   without affecting any existing functionality (patch 4).
>> - Changes the iproute2 eBPF loader to use libbpf for loading XDP
>>   programs (patch 5).
>> 
>> 
>> As this is an early PoC, there are still a few missing pieces before
>> this can be merged. Including (but probably not limited to):
>> 
>> - Consolidate the map definition struct in the bpf_helpers.h file in the
>>   kernel tree. This contains a different, and incompatible, update to
>>   the struct. Since the iproute2 version has actually been released for
>>   use outside the kernel tree (and thus is subject to API stability
>>   constraints), I think it makes the most sense to keep that, and port
>>   the selftests to use it.
>
> It sounds like you're implying that existing libbpf format is not
> uapi.

No, that's not what I meant... See below.

> It is and we cannot break it.
> If patch 1 means breakage for existing pre-compiled .o that won't load
> with new libbpf then we cannot use this method.
> Recompiling .o with new libbpf definition of bpf_map_def isn't an option.
> libbpf has to be smart before/after and recognize both old and iproute2 format.

The libbpf.h definition of struct bpf_map_def is compatible with the one
used in iproute2. In libbpf.h, the struct only contains five fields
(type, key_size, value_size, max_entries and flags), and iproute2 adds
another 4 (id, pinning, inner_id and inner_idx; these are the ones in
patch 1 in this series).

The issue I was alluding to above is that the bpf_helpers.h file in the
kernel selftests directory *also* extends the bpf_map_def struct, and
adds two *different* fields (inner_map_idx and numa_mode). The former is
used to implement the same map-in-map definition functionality that
iproute2 has, but with different semantics. The latter is additional to
that, and I'm planning to add that to this series.

Since bpf_helpers.h is *not* part of libbpf (yet), this will make it
possible to keep API (and ABI) compatibility with both iproute2 and
libbpf. As in, old .o files will still load with libbpf after this
series, they just won't be able to use the new automatic pinning
feature.
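
To make the compatibility argument concrete, here is a rough sketch of how a loader could accept both layouts. It is not libbpf's or iproute2's actual code: the struct names and read_map_def() are hypothetical, the field names are the ones listed above, and the uint32_t field types are an assumption. The point is only that the extended struct starts with the five base fields, so an old definition is a valid prefix of a new one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The five fields of the base map definition, as listed above. */
struct base_map_def {
	uint32_t type;
	uint32_t key_size;
	uint32_t value_size;
	uint32_t max_entries;
	uint32_t flags;
};

/* Base fields followed by the four iproute2 additions. */
struct ext_map_def {
	uint32_t type;
	uint32_t key_size;
	uint32_t value_size;
	uint32_t max_entries;
	uint32_t flags;
	uint32_t id;
	uint32_t pinning;
	uint32_t inner_id;
	uint32_t inner_idx;
};

/* Hypothetical loader step: read one definition of def_size bytes from
 * an object file's maps section. Fields the object does not provide are
 * left zero, so old .o files load but get no pinning. Returns 0 on
 * success, -1 if the definition is too short to be valid. */
static int read_map_def(struct ext_map_def *dst, const void *src,
			size_t def_size)
{
	assert(dst && src);
	if (def_size < sizeof(struct base_map_def))
		return -1;
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, def_size < sizeof(*dst) ? def_size : sizeof(*dst));
	return 0;
}
```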

-Toke

^ permalink raw reply

* Re: [PATCH net] net: cpsw: fix NULL pointer exception in the probe error path
From: David Miller @ 2019-08-21 21:00 UTC (permalink / raw)
  To: antoine.tenart
  Cc: grygorii.strashko, netdev, linux-omap, linux-kernel,
	maxime.chevallier
In-Reply-To: <20190821144123.22248-1-antoine.tenart@bootlin.com>

From: Antoine Tenart <antoine.tenart@bootlin.com>
Date: Wed, 21 Aug 2019 16:41:23 +0200

> In certain cases when the probe function fails the error path calls
> cpsw_remove_dt() before calling platform_set_drvdata(). This is an
> issue as cpsw_remove_dt() uses platform_get_drvdata() to retrieve the
> cpsw_common data and leads to a NULL pointer exception. This patch
> fixes it by calling platform_set_drvdata() earlier in the probe.
> 
> Fixes: 83a8471ba255 ("net: ethernet: ti: cpsw: refactor probe to group common hw initialization")
> Reported-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH net-next] net: stmmac: dwc-qos: use devm_platform_ioremap_resource() to simplify code
From: David Miller @ 2019-08-21 20:53 UTC (permalink / raw)
  To: yuehaibing
  Cc: peppe.cavallaro, alexandre.torgue, joabreu, khilman,
	mcoquelin.stm32, linux-kernel, netdev, linux-arm-kernel,
	linux-amlogic, linux-stm32
In-Reply-To: <20190821135701.46780-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Wed, 21 Aug 2019 21:57:01 +0800

> Use devm_platform_ioremap_resource() to simplify the code a bit.
> This is detected by coccinelle.
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net: stmmac: dwmac-anarion: use devm_platform_ioremap_resource() to simplify code
From: David Miller @ 2019-08-21 20:53 UTC (permalink / raw)
  To: yuehaibing
  Cc: peppe.cavallaro, alexandre.torgue, joabreu, khilman,
	mcoquelin.stm32, linux-kernel, netdev, linux-arm-kernel,
	linux-amlogic, linux-stm32
In-Reply-To: <20190821135550.55200-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Wed, 21 Aug 2019 21:55:50 +0800

> Use devm_platform_ioremap_resource() to simplify the code a bit.
> This is detected by coccinelle.
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net: stmmac: dwmac-meson: use devm_platform_ioremap_resource() to simplify code
From: David Miller @ 2019-08-21 20:53 UTC (permalink / raw)
  To: yuehaibing
  Cc: peppe.cavallaro, alexandre.torgue, joabreu, khilman,
	mcoquelin.stm32, linux-kernel, netdev, linux-arm-kernel,
	linux-amlogic, linux-stm32
In-Reply-To: <20190821135406.26200-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Wed, 21 Aug 2019 21:54:06 +0800

> Use devm_platform_ioremap_resource() to simplify the code a bit.
> This is detected by coccinelle.
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net: stmmac: dwmac-meson8b: use devm_platform_ioremap_resource() to simplify code
From: David Miller @ 2019-08-21 20:53 UTC (permalink / raw)
  To: yuehaibing
  Cc: peppe.cavallaro, alexandre.torgue, joabreu, khilman,
	mcoquelin.stm32, linux-kernel, netdev, linux-arm-kernel,
	linux-amlogic, linux-stm32
In-Reply-To: <20190821135130.68636-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Wed, 21 Aug 2019 21:51:30 +0800

> Use devm_platform_ioremap_resource() to simplify the code a bit.
> This is detected by coccinelle.
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Applied.

* Re: [PATCH net-next] net: systemport: use devm_platform_ioremap_resource() to simplify code
From: David Miller @ 2019-08-21 20:53 UTC (permalink / raw)
  To: yuehaibing
  Cc: opendmb, f.fainelli, bcm-kernel-feedback-list, linux-kernel,
	netdev
In-Reply-To: <20190821134613.23276-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Wed, 21 Aug 2019 21:46:13 +0800

> Use devm_platform_ioremap_resource() to simplify the code a bit.
> This is detected by coccinelle.
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Applied.

* Re: [PATCH net-next] net: bcmgenet: use devm_platform_ioremap_resource() to simplify code
From: David Miller @ 2019-08-21 20:52 UTC (permalink / raw)
  To: yuehaibing
  Cc: opendmb, f.fainelli, bcm-kernel-feedback-list, linux-kernel,
	netdev
In-Reply-To: <20190821134131.57780-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Wed, 21 Aug 2019 21:41:31 +0800

> Use devm_platform_ioremap_resource() to simplify the code a bit.
> This is detected by coccinelle.
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Applied.

* Re: [PATCH net-next] pxa168_eth: use devm_platform_ioremap_resource() to simplify code
From: David Miller @ 2019-08-21 20:52 UTC (permalink / raw)
  To: yuehaibing; +Cc: andrew, mcgrof, tglx, ynezz, linux-kernel, netdev
In-Reply-To: <20190821133854.4308-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Wed, 21 Aug 2019 21:38:54 +0800

> Use devm_platform_ioremap_resource() to simplify the code a bit.
> This is detected by coccinelle.
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Applied.

* Re: [PATCH net-next] net: mvneta: use devm_platform_ioremap_resource() to simplify code
From: David Miller @ 2019-08-21 20:52 UTC (permalink / raw)
  To: yuehaibing; +Cc: bigeasy, linux-kernel, netdev
In-Reply-To: <20190821133302.72880-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Wed, 21 Aug 2019 21:33:02 +0800

> Use devm_platform_ioremap_resource() to simplify the code a bit.
> This is detected by coccinelle.
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Applied.

* Re: [PATCH net-next] net: fec: use devm_platform_ioremap_resource() to simplify code
From: David Miller @ 2019-08-21 20:50 UTC (permalink / raw)
  To: yuehaibing; +Cc: fugang.duan, linux-kernel, netdev
In-Reply-To: <20190821132945.19648-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Wed, 21 Aug 2019 21:29:45 +0800

> Use devm_platform_ioremap_resource() to simplify the code a bit.
> This is detected by coccinelle.
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Applied.

* Re: [PATCH 0/1] pull request for net: batman-adv 2019-08-21
From: David Miller @ 2019-08-21 20:50 UTC (permalink / raw)
  To: sw; +Cc: netdev, b.a.t.m.a.n
In-Reply-To: <20190821133015.12778-1-sw@simonwunderlich.de>

From: Simon Wunderlich <sw@simonwunderlich.de>
Date: Wed, 21 Aug 2019 15:30:14 +0200

> here is a pull request with Eric's bugfix from last week which we would
> like to have integrated into net. We didn't get anything else, so it's
> a short one this time. :)
> 
> Please pull or let me know of any problem!

Pulled, thanks Simon.

* Re: [PATCH net-next] ezchip: nps_enet: use devm_platform_ioremap_resource() to simplify code
From: David Miller @ 2019-08-21 20:43 UTC (permalink / raw)
  To: yuehaibing; +Cc: ynezz, tglx, gregkh, linux-kernel, netdev
In-Reply-To: <20190821130509.71916-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Wed, 21 Aug 2019 21:05:09 +0800

> Use devm_platform_ioremap_resource() to simplify the code a bit.
> This is detected by coccinelle.
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Applied.

* Re: [PATCH net-next] cirrus: cs89x0: use devm_platform_ioremap_resource() to simplify code
From: David Miller @ 2019-08-21 20:43 UTC (permalink / raw)
  To: yuehaibing; +Cc: linux-kernel, netdev
In-Reply-To: <20190821130241.58276-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Wed, 21 Aug 2019 21:02:41 +0800

> Use devm_platform_ioremap_resource() to simplify the code a bit.
> This is detected by coccinelle.
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Applied.

