Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v2] seg6_iptunnel: Refactor seg6_lwt_headroom out of uapi header
From: David Miller @ 2020-07-31 21:25 UTC (permalink / raw)
  To: ioanaruxandra.stancioi
  Cc: david.lebrun, kuznet, yoshfuji, netdev, linux-kernel, elver,
	glider, stancioi
In-Reply-To: <20200731074032.293456-1-ioanaruxandra.stancioi@gmail.com>

From: Ioana-Ruxandra Stancioi <ioanaruxandra.stancioi@gmail.com>
Date: Fri, 31 Jul 2020 07:40:32 +0000

> --- a/net/ipv6/seg6_iptunnel.c
> +++ b/net/ipv6/seg6_iptunnel.c
> @@ -27,6 +27,23 @@
>  #include <net/seg6_hmac.h>
>  #endif
>  
> +static inline size_t seg6_lwt_headroom(struct seg6_iptunnel_encap *tuninfo)
> +{

Please remove the inline tag when you move the function here, we do
not use the inline keyword in foo.c files.

Thank you.

^ permalink raw reply

* Re: [PATCH net] net: gre: recompute gre csum for sctp over gre tunnels
From: Lorenzo Bianconi @ 2020-07-31 21:29 UTC (permalink / raw)
  To: Lorenzo Bianconi; +Cc: netdev, davem, tom, marcelo.leitner
In-Reply-To: <6722d2c4fe1b9b376a277b3f35cdf3eb3345874e.1596218124.git.lorenzo@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 2513 bytes --]

> The GRE tunnel can be used to transport traffic that does not rely on a
> Internet checksum (e.g. SCTP). The issue can be triggered creating a GRE
> or GRETAP tunnel and transmitting SCTP traffic ontop of it where CRC
> offload has been disabled. In order to fix the issue we need to
> recompute the GRE csum in gre_gso_segment() not relying on the inner
> checksum.
> The issue is still present when we have the CRC offload enabled.
> In this case we need to disable the CRC offload if we require GRE
> checksum since otherwise skb_checksum() will report a wrong value.
> 
> Fixes: 4749c09c37030 ("gre: Call gso_make_checksum")

I put the wrong Fixes tag, reviewing it I guess the right one is:

Fixes: 90017accff61 ("sctp: Add GSO support")

sorry for the noise.

Regards,
Lorenzo

> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  net/ipv4/gre_offload.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
> index 2e6d1b7a7bc9..e0a246575887 100644
> --- a/net/ipv4/gre_offload.c
> +++ b/net/ipv4/gre_offload.c
> @@ -15,12 +15,12 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
>  				       netdev_features_t features)
>  {
>  	int tnl_hlen = skb_inner_mac_header(skb) - skb_transport_header(skb);
> +	bool need_csum, need_recompute_csum, gso_partial;
>  	struct sk_buff *segs = ERR_PTR(-EINVAL);
>  	u16 mac_offset = skb->mac_header;
>  	__be16 protocol = skb->protocol;
>  	u16 mac_len = skb->mac_len;
>  	int gre_offset, outer_hlen;
> -	bool need_csum, gso_partial;
>  
>  	if (!skb->encapsulation)
>  		goto out;
> @@ -41,6 +41,7 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
>  	skb->protocol = skb->inner_protocol;
>  
>  	need_csum = !!(skb_shinfo(skb)->gso_type & SKB_GSO_GRE_CSUM);
> +	need_recompute_csum = skb->csum_not_inet;
>  	skb->encap_hdr_csum = need_csum;
>  
>  	features &= skb->dev->hw_enc_features;
> @@ -98,7 +99,15 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
>  		}
>  
>  		*(pcsum + 1) = 0;
> -		*pcsum = gso_make_checksum(skb, 0);
> +		if (need_recompute_csum && !skb_is_gso(skb)) {
> +			__wsum csum;
> +
> +			csum = skb_checksum(skb, gre_offset,
> +					    skb->len - gre_offset, 0);
> +			*pcsum = csum_fold(csum);
> +		} else {
> +			*pcsum = gso_make_checksum(skb, 0);
> +		}
>  	} while ((skb = skb->next));
>  out:
>  	return segs;
> -- 
> 2.26.2
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH net-next] cxgb4: Add support to flash firmware config image
From: Rahul Lakkireddy @ 2020-07-31 21:17 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Ganji Aravind, netdev, davem, vishal
In-Reply-To: <20200731110008.598a8ea7@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On Friday, July 07/31/20, 2020 at 11:00:08 -0700, Jakub Kicinski wrote:
> On Fri, 31 Jul 2020 16:39:04 +0530 Ganji Aravind wrote:
> > On Thursday, July 07/30/20, 2020 at 16:23:35 -0700, Jakub Kicinski wrote:
> > > On Thu, 30 Jul 2020 20:41:38 +0530 Ganji Aravind wrote:  
> > > > Update set_flash to flash firmware configuration image
> > > > to flash region.  
> > > 
> > > And the reason why you need to flash some .ini files separately is?  
> > 
> > Hi Jakub,
> > 
> > The firmware config file contains information on how the firmware
> > should distribute the hardware resources among NIC and
> > Upper Layer Drivers(ULD), like iWARP, crypto, filtering, etc.
> > 
> > The firmware image comes with an in-built default config file that
> > distributes resources among the NIC and all the ULDs. However, in
> > some cases, where we don't want to run a particular ULD, or if we
> > want to redistribute the resources, then we'd modify the firmware
> > config file and then firmware will redistribute those resources
> > according to the new configuration. So, if firmware finds this
> > custom config file in flash, it reads this first. Otherwise, it'll
> > continue initializing the adapter with its own in-built default
> > config file.
> 
> Sounds like something devlink could be extended to do.
> 
> Firmware update interface is not for configuration.

I thought /lib/firmware is where firmware related files need to be
placed and ethtool --flash needs to be used to program them to
their respective locations on adapter's flash.

Note that we don't have devlink support in our driver yet. And, we're
a bit confused here on why this already existing ethtool feature needs
to be duplicated to devlink.

Thanks,
Rahul

^ permalink raw reply

* RE: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet driver
From: Asmaa Mnebhi @ 2020-07-31 21:38 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David Thompson, netdev@vger.kernel.org, davem@davemloft.net,
	kuba@kernel.org, Jiri Pirko
In-Reply-To: <20200731195458.GA1843538@lunn.ch>



> -----Original Message-----
> From: Andrew Lunn <andrew@lunn.ch>
> Sent: Friday, July 31, 2020 3:55 PM
> To: Asmaa Mnebhi <Asmaa@mellanox.com>
> Cc: David Thompson <dthompson@mellanox.com>;
> netdev@vger.kernel.org; davem@davemloft.net; kuba@kernel.org; Jiri
> Pirko <jiri@mellanox.com>
> Subject: Re: [PATCH net-next] Add Mellanox BlueField Gigabit Ethernet
> driver
> 
> On Fri, Jul 31, 2020 at 06:54:04PM +0000, Asmaa Mnebhi wrote:
> 
> Hi Asmaa
> 
> Please don't send HTML obfusticated emails to mailing lists.

My apologies! 

> 
> > > +static int mlxbf_gige_mdio_read(struct mii_bus *bus, int phy_add,
> > > +int
> >
> > > +phy_reg) {
> >
> > > +         struct mlxbf_gige *priv = bus->priv;
> >
> > > +         u32 cmd;
> >
> > > +         u32 ret;
> >
> > > +
> >
> > > +         /* If the lock is held by something else, drop the request.
> >
> > > +         * If the lock is cleared, that means the busy bit was cleared.
> >
> > > +         */
> >
> >
> >
> > How can this happen? The mdio core has a mutex which prevents parallel
> access?
> >
> >
> >
> > This is a HW Lock. It is an actual register. So another HW entity can
> > be holding that lock and reading/changing the values in the HW registers.
> 
> You have not explains how that can happen? Is there something in the driver
> i missed which takes a backdoor to read/write MDIO transactions?

Ah ok! There is a HW entity (called YU) within the BlueField which is connected to the PHY device.
I think the YU is what you are calling "backdoor" here. The YU contains several registers which control reads/writes
To the PHY. So it is like an extra layer for reading MDIO registers. One of the YU registers is the gateway register (aka GW or
MLXBF_GIGE_MDIO_GW_OFFSET in the code). If the GW register's LOCK bit is not cleared, we cannot write anything to the actual PHY MDIO registers.
Did I answer your question?

> 
> > > +         ret = mlxbf_gige_mdio_poll_bit(priv,
> > > + MLXBF_GIGE_MDIO_GW_LOCK_MASK);
> >
> > > +         if (ret)
> >
> > > +                       return -EBUSY;
> >
> >
> >
> > PHY drivers are not going to like that. They are not going to retry.
> > What is likely to happen is that phylib moves into the ERROR state,
> > and the PHY driver grinds to a halt.
> >
> >
> >
> > This is a fairly quick HW transaction. So I don’t think it would cause
> > and issue for the PHY drivers. In this case, we use the micrel
> > KSZ9031. We haven’t seen issues.
> 
> So you have happy to debug hard to find and reproduce issues when it does
> happen? Or would you like to spend a little bit of time now and just prevent
> it happening at all?

I think I misunderstood your comment. Did you ask why we are polling here? Or that we shouldn't be returning -EBUSY?
> 
>      Andrew

^ permalink raw reply

* RE: [RFC PATCH bpf-next 0/3] Add a new bpf helper for FDB lookup
From: John Fastabend @ 2020-07-31 21:52 UTC (permalink / raw)
  To: Yoshiki Komachi, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Jakub Kicinski, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, KP Singh, Roopa Prabhu, Nikolay Aleksandrov,
	David Ahern
  Cc: Yoshiki Komachi, netdev, bridge, bpf
In-Reply-To: <1596170660-5582-1-git-send-email-komachi.yoshiki@gmail.com>

Yoshiki Komachi wrote:
> This series adds a new bpf helper for doing FDB lookup in the kernel
> tables from XDP programs. This helps users to accelerate Linux bridge
> with XDP.
> 
> In the past, XDP generally required users to reimplement their own
> networking functionalities with specific manners of BPF programming
> by themselves, hindering its potential uses. IMO, bpf helpers to
> access networking stacks in kernel help to mitigate the programming
> costs because users reuse mature Linux networking feature more easily.
> 
> The previous commit 87f5fc7e48dd ("bpf: Provide helper to do forwarding
> lookups in kernel FIB table") have already added a bpf helper for access
> FIB in the kernel tables from XDP programs. As a next step, this series
> introduces the API for FDB lookup. In the future, other bpf helpers for
> learning and VLAN filtering will also be required in order to realize
> fast XDP-based bridge although these are not included in this series.

Just to clarify for myself. I expect that with just the helpers here
we should only expect static configurations to work, e.g. any learning
and/or aging is not likely to work if we do redirects in the XDP path.

Then next to get a learning/filtering/aging we would need to have a
set of bridge helpers to get that functionality as well? I believe
I'm just repeating what you are saying above, but wanted to check.

Then my next question is can we see some performance numbers? These
things are always trade-off between performance and ease of
use, but would be good to know roughly what we are looking at vs
a native XDP bridge functionality.

Do you have a use case for a static bridge setup? Nothing wrong to
stage things IMO if the 'real' use case needs learning and filtering.

I guess to get STP working you would minimally need learning and
aging?

Thanks,
John

^ permalink raw reply

* RE: [PATCH bpf-next] selftests/bpf: fix spurious test failures in core_retro selftest
From: John Fastabend @ 2020-07-31 21:55 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20200731204957.2047119-1-andriin@fb.com>

Andrii Nakryiko wrote:
> core_retro selftest uses BPF program that's triggered on sys_enter
> system-wide, but has no protection from some unrelated process doing syscall
> while selftest is running. This leads to occasional test failures with
> unexpected PIDs being returned. Fix that by filtering out all processes that
> are not test_progs process.
> 
> Fixes: fcda189a5133 ("selftests/bpf: Add test relying only on CO-RE and no recent kernel features")
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply

* Re: [PATCH net] net: gre: recompute gre csum for sctp over gre tunnels
From: Marcelo Ricardo Leitner @ 2020-07-31 22:04 UTC (permalink / raw)
  To: Lorenzo Bianconi; +Cc: netdev, davem, lorenzo.bianconi, tom
In-Reply-To: <6722d2c4fe1b9b376a277b3f35cdf3eb3345874e.1596218124.git.lorenzo@kernel.org>

On Fri, Jul 31, 2020 at 08:12:05PM +0200, Lorenzo Bianconi wrote:
> The GRE tunnel can be used to transport traffic that does not rely on a
> Internet checksum (e.g. SCTP). The issue can be triggered creating a GRE
> or GRETAP tunnel and transmitting SCTP traffic ontop of it where CRC
> offload has been disabled. In order to fix the issue we need to
> recompute the GRE csum in gre_gso_segment() not relying on the inner
> checksum.
> The issue is still present when we have the CRC offload enabled.
> In this case we need to disable the CRC offload if we require GRE
> checksum since otherwise skb_checksum() will report a wrong value.
> 
> Fixes: 4749c09c37030 ("gre: Call gso_make_checksum")
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

^ permalink raw reply

* Re: Bug: ip utility fails to show routes with large # of multipath next-hops
From: Ashutosh Grewal @ 2020-07-31 22:26 UTC (permalink / raw)
  To: David Ahern; +Cc: Ido Schimmel, davem, netdev
In-Reply-To: <a7652bf8-e7fc-a76f-0fa7-2457128e2abc@gmail.com>

Thanks Ido and David for your confirmation and insight.

-- Ashutosh

On Wed, Jul 29, 2020 at 8:17 AM David Ahern <dsahern@gmail.com> wrote:
>
> On 7/29/20 5:43 AM, Ido Schimmel wrote:
> > On Tue, Jul 28, 2020 at 05:52:44PM -0700, Ashutosh Grewal wrote:
> >> Hello David and all,
> >>
> >> I hope this is the correct way to report a bug.
> >
> > Sure
> >
> >>
> >> I observed this problem with 256 v4 next-hops or 128 v6 next-hops (or
> >> 128 or so # of v4 next-hops with labels).
> >>
> >> Here is an example -
> >>
> >> root@a6be8c892bb7:/# ip route show 2.2.2.2
> >> Error: Buffer too small for object.
> >> Dump terminated
> >>
> >> Kernel details (though I recall running into the same problem on 4.4*
> >> kernel as well) -
> >> root@ubuntu-vm:/# uname -a
> >> Linux ch1 5.4.0-33-generic #37-Ubuntu SMP Thu May 21 12:53:59 UTC 2020
> >> x86_64 x86_64 x86_64 GNU/Linux
> >>
> >> I think the problem may be to do with the size of the skbuf being
> >> allocated as part of servicing the netlink request.
> >>
> >> static int netlink_dump(struct sock *sk)
> >> {
> >>   <snip>
> >>
> >>                 skb = alloc_skb(...)
> >
> > Yes, I believe you are correct. You will get an skb of size 4K and it
> > can't fit the entire RTA_MULTIPATH attribute with all the nested
> > nexthops. Since it's a single attribute it cannot be split across
> > multiple messages.
>
> yep, well known problem.
>
> >
> > Looking at the code, I think a similar problem was already encountered
> > with IFLA_VFINFO_LIST. See commit c7ac8679bec9 ("rtnetlink: Compute and
> > store minimum ifinfo dump size").
> >
> > Maybe we can track the maximum number of IPv4/IPv6 nexthops during
> > insertion and then consult it to adjust 'min_dump_alloc' for
> > RTM_GETROUTE.
>
> That seems better than the current design for GETLINK which walks all
> devices to determine max dump size. Not sure how you will track that
> efficiently though - add is easy, delete is not.
>
> >
> > It's a bit complicated for IPv6 because you can append nexthops, but I
> > believe anyone using so many nexthops is already using RTA_MULTIPATH to
> > insert them, so we can simplify.
>
> I hope so.
>
> >
> > David, what do you think? You have a better / simpler idea? Maybe one
> > day everyone will be using the new nexthop API and this won't be needed
> > :)
>
> exactly. You won't have this problem with separate nexthops since each
> one is small (< 4k) and the group (multipath) is a set of ids, not the
> full set of attributes.

^ permalink raw reply

* Re: [PATCH v2 net-next 2/9] mptcp: token: move retry to caller
From: Mat Martineau @ 2020-07-31 22:37 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, edumazet, matthieu.baerts, pabeni
In-Reply-To: <20200730192558.25697-3-fw@strlen.de>

On Thu, 30 Jul 2020, Florian Westphal wrote:

> Once syncookie support is added, no state will be stored anymore when the
> syn/ack is generated in syncookie mode.
>
> When the ACK comes back, the generated key will be taken from the TCP ACK,
> the token is re-generated and inserted into the token tree.
>
> This means we can't retry with a new key when the token is already taken
> in the syncookie case.
>
> Therefore, move the retry logic to the caller to prepare for syncookie
> support in mptcp.
>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
> net/mptcp/subflow.c |  9 ++++++++-
> net/mptcp/token.c   | 12 ++++--------
> 2 files changed, 12 insertions(+), 9 deletions(-)

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

--
Mat Martineau
Intel

^ permalink raw reply

* Re: [PATCH v2 net-next 3/9] mptcp: subflow: split subflow_init_req
From: Mat Martineau @ 2020-07-31 22:37 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, edumazet, matthieu.baerts, pabeni
In-Reply-To: <20200730192558.25697-4-fw@strlen.de>

On Thu, 30 Jul 2020, Florian Westphal wrote:

> When syncookie support is added, we will need to add a variant of
> subflow_init_req() helper.  It will do almost same thing except
> that it will not compute/add a token to the mptcp token tree.
>
> To avoid excess copy&paste, this commit splits away part of the
> code into a new helper, __subflow_init_req, that can then be re-used
> from the 'no insert' function added in a followup change.
>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
> net/mptcp/subflow.c | 32 ++++++++++++++++++++++----------
> 1 file changed, 22 insertions(+), 10 deletions(-)
>

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

--
Mat Martineau
Intel

^ permalink raw reply

* Re: [PATCH v2 net-next 4/9] mptcp: rename and export mptcp_subflow_request_sock_ops
From: Mat Martineau @ 2020-07-31 22:38 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, edumazet, matthieu.baerts, pabeni
In-Reply-To: <20200730192558.25697-5-fw@strlen.de>

On Thu, 30 Jul 2020, Florian Westphal wrote:

> syncookie code path needs to create an mptcp request sock.
>
> Prepare for this and add mptcp prefix plus needed export of ops struct.
>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
> include/net/mptcp.h |  1 +
> net/mptcp/subflow.c | 11 ++++++-----
> 2 files changed, 7 insertions(+), 5 deletions(-)

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

--
Mat Martineau
Intel

^ permalink raw reply

* Re: [PATCH v2 net-next 5/9] mptcp: subflow: add mptcp_subflow_init_cookie_req helper
From: Mat Martineau @ 2020-07-31 22:38 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, edumazet, matthieu.baerts, pabeni
In-Reply-To: <20200730192558.25697-6-fw@strlen.de>

On Thu, 30 Jul 2020, Florian Westphal wrote:

> Will be used to initialize the mptcp request socket when a MP_CAPABLE
> request was handled in syncookie mode, i.e. when a TCP ACK containing a
> MP_CAPABLE option is a valid syncookie value.
>
> Normally (non-cookie case), MPTCP will generate a unique 32 bit connection
> ID and stores it in the MPTCP token storage to be able to retrieve the
> mptcp socket for subflow joining.
>
> In syncookie case, we do not want to store any state, so just generate the
> unique ID and use it in the reply.
>
> This means there is a small window where another connection could generate
> the same token.
>
> When Cookie ACK comes back, we check that the token has not been registered
> in the mean time.  If it was, the connection needs to fall back to TCP.
>
> Changes in v2:
> - use req->syncookie instead of passing 'want_cookie' arg to ->init_req()
>   (Eric Dumazet)
>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
> include/net/mptcp.h  | 10 +++++++++
> net/mptcp/protocol.h |  1 +
> net/mptcp/subflow.c  | 50 +++++++++++++++++++++++++++++++++++++++++++-
> net/mptcp/token.c    | 26 +++++++++++++++++++++++
> 4 files changed, 86 insertions(+), 1 deletion(-)

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

--
Mat Martineau
Intel

^ permalink raw reply

* Re: [PATCH v2 net-next 7/9] mptcp: enable JOIN requests even if cookies are in use
From: Mat Martineau @ 2020-07-31 22:39 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, edumazet, matthieu.baerts, pabeni
In-Reply-To: <20200730192558.25697-8-fw@strlen.de>

On Thu, 30 Jul 2020, Florian Westphal wrote:

> JOIN requests do not work in syncookie mode -- for HMAC validation, the
> peers nonce and the mptcp token (to obtain the desired connection socket
> the join is for) are required, but this information is only present in the
> initial syn.
>
> So either we need to drop all JOIN requests once a listening socket enters
> syncookie mode, or we need to store enough state to reconstruct the request
> socket later.
>
> This adds a state table (1024 entries) to store the data present in the
> MP_JOIN syn request and the random nonce used for the cookie syn/ack.
>
> When a MP_JOIN ACK passed cookie validation, the table is consulted
> to rebuild the request socket from it.
>
> An alternate approach would be to "cancel" syn-cookie mode and force
> MP_JOIN to always use a syn queue entry.
>
> However, doing so brings the backlog over the configured queue limit.
>
> v2: use req->syncookie, not (removed) want_cookie arg
>
> Suggested-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
> net/ipv4/syncookies.c  |   6 ++
> net/mptcp/Makefile     |   1 +
> net/mptcp/ctrl.c       |   1 +
> net/mptcp/protocol.h   |  20 +++++++
> net/mptcp/subflow.c    |  14 +++++
> net/mptcp/syncookies.c | 132 +++++++++++++++++++++++++++++++++++++++++
> 6 files changed, 174 insertions(+)
> create mode 100644 net/mptcp/syncookies.c

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

--
Mat Martineau
Intel

^ permalink raw reply

* Re: [PATCH v2 net-next 8/9] selftests: mptcp: make 2nd net namespace use tcp syn cookies unconditionally
From: Mat Martineau @ 2020-07-31 22:39 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, edumazet, matthieu.baerts, pabeni
In-Reply-To: <20200730192558.25697-9-fw@strlen.de>

On Thu, 30 Jul 2020, Florian Westphal wrote:

> check we can establish connections also when syn cookies are in use.
>
> Check that
> MPTcpExtMPCapableSYNRX and MPTcpExtMPCapableACKRX increase for each
> MPTCP test.
>
> Check TcpExtSyncookiesSent and TcpExtSyncookiesRecv increase in netns2.
>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
> .../selftests/net/mptcp/mptcp_connect.sh      | 47 +++++++++++++++++++
> 1 file changed, 47 insertions(+)

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

--
Mat Martineau
Intel

^ permalink raw reply

* Re: [PATCH v2 net-next 9/9] selftests: mptcp: add test cases for mptcp join tests with syn cookies
From: Mat Martineau @ 2020-07-31 22:39 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, edumazet, matthieu.baerts, pabeni
In-Reply-To: <20200730192558.25697-10-fw@strlen.de>

On Thu, 30 Jul 2020, Florian Westphal wrote:

> Also add test cases with MP_JOIN when tcp_syncookies sysctl is 2 (i.e.,
> syncookies are always-on).
>
> While at it, also print the test number and add the test number
> to the pcap files that can be generated optionally.
>
> This makes it easier to match the pcap to the test case.
>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
> .../testing/selftests/net/mptcp/mptcp_join.sh | 66 ++++++++++++++++++-
> 1 file changed, 64 insertions(+), 2 deletions(-)

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

--
Mat Martineau
Intel

^ permalink raw reply

* Re: [PATCH] [net/ethtool] ethnl_set_linkmodes: remove redundant null check
From: Michal Kubecek @ 2020-07-31 13:54 UTC (permalink / raw)
  To: Gaurav Singh
  Cc: David S. Miller, Jakub Kicinski, Florian Fainelli, Andrew Lunn,
	Oleksij Rempel, YueHaibing, Aya Levin,
	open list:NETWORKING [GENERAL], open list
In-Reply-To: <20200731045908.32466-1-gaurav1086@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1337 bytes --]

On Fri, Jul 31, 2020 at 12:58:44AM -0400, Gaurav Singh wrote:
> info cannot be NULL here since its being accessed earlier
> in the function: nlmsg_parse(info->nlhdr...). Remove this
> redundant NULL check.

This is what the static checker tells you but it could still mean the
other place is missing the check. The actual reason why this check is
superfluous is that the function is only used as ->doit() handler which
is never called with null info.

> Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>

The subject should rather start with "ethtool: " (instead of "[net/ethtool] ").

For the change itself:

Reviewed-by: Michal Kubecek <mkubecek@suse.cz>

Michal

> ---
>  net/ethtool/linkmodes.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/ethtool/linkmodes.c b/net/ethtool/linkmodes.c
> index fd4f3e58c6f6..b595d87fa880 100644
> --- a/net/ethtool/linkmodes.c
> +++ b/net/ethtool/linkmodes.c
> @@ -406,8 +406,7 @@ int ethnl_set_linkmodes(struct sk_buff *skb, struct genl_info *info)
>  
>  	ret = __ethtool_get_link_ksettings(dev, &ksettings);
>  	if (ret < 0) {
> -		if (info)
> -			GENL_SET_ERR_MSG(info, "failed to retrieve link settings");
> +		GENL_SET_ERR_MSG(info, "failed to retrieve link settings");
>  		goto out_ops;
>  	}
>  
> -- 
> 2.17.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [bpf PATCH v2 1/5] bpf: sock_ops ctx access may stomp registers in corner case
From: John Fastabend @ 2020-07-31 22:46 UTC (permalink / raw)
  To: Daniel Borkmann, John Fastabend, kafai, ast; +Cc: netdev, bpf
In-Reply-To: <546828c9-a6bb-57d3-9a9d-83f4e0131163@iogearbox.net>

Daniel Borkmann wrote:
> On 7/29/20 6:22 PM, John Fastabend wrote:
> > I had a sockmap program that after doing some refactoring started spewing
> > this splat at me:
> > 
> > [18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> > [...]
> > [18610.807359] Call Trace:
> > [18610.807370]  ? 0xffffffffc114d0d5
> > [18610.807382]  __cgroup_bpf_run_filter_sock_ops+0x7d/0xb0
> > [18610.807391]  tcp_connect+0x895/0xd50
> > [18610.807400]  tcp_v4_connect+0x465/0x4e0
> > [18610.807407]  __inet_stream_connect+0xd6/0x3a0
> > [18610.807412]  ? __inet_stream_connect+0x5/0x3a0
> > [18610.807417]  inet_stream_connect+0x3b/0x60
> > [18610.807425]  __sys_connect+0xed/0x120
> > 
> > After some debugging I was able to build this simple reproducer,
> > 
> >   __section("sockops/reproducer_bad")
> >   int bpf_reproducer_bad(struct bpf_sock_ops *skops)
> >   {
> >          volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> >          return 0;
> >   }
> > 
> > And along the way noticed that below program ran without splat,
> > 
> > __section("sockops/reproducer_good")
> > int bpf_reproducer_good(struct bpf_sock_ops *skops)
> > {
> >          volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> >          volatile __maybe_unused __u32 family;
> > 
> >          compiler_barrier();
> > 
> >          family = skops->family;
> >          return 0;
> > }
> > 
> > So I decided to check out the code we generate for the above two
> > programs and noticed each generates the BPF code you would expect,
> > 
> > 0000000000000000 <bpf_reproducer_bad>:
> > ;       volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> >         0:       r1 = *(u32 *)(r1 + 96)
> >         1:       *(u32 *)(r10 - 4) = r1
> > ;       return 0;
> >         2:       r0 = 0
> >         3:       exit
> > 
> > 0000000000000000 <bpf_reproducer_good>:
> > ;       volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> >         0:       r2 = *(u32 *)(r1 + 96)
> >         1:       *(u32 *)(r10 - 4) = r2
> > ;       family = skops->family;
> >         2:       r1 = *(u32 *)(r1 + 20)
> >         3:       *(u32 *)(r10 - 8) = r1
> > ;       return 0;
> >         4:       r0 = 0
> >         5:       exit
> > 
> > So we get reasonable assembly, but still something was causing the null
> > pointer dereference. So, we load the programs and dump the xlated version
> > observing that line 0 above 'r* = *(u32 *)(r1 +96)' is going to be
> > translated by the skops access helpers.
> > 
> > int bpf_reproducer_bad(struct bpf_sock_ops * skops):
> > ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> >     0: (61) r1 = *(u32 *)(r1 +28)
> >     1: (15) if r1 == 0x0 goto pc+2
> >     2: (79) r1 = *(u64 *)(r1 +0)
> >     3: (61) r1 = *(u32 *)(r1 +2340)
> > ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> >     4: (63) *(u32 *)(r10 -4) = r1
> > ; return 0;
> >     5: (b7) r0 = 0
> >     6: (95) exit
> > 
> > int bpf_reproducer_good(struct bpf_sock_ops * skops):
> > ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> >     0: (61) r2 = *(u32 *)(r1 +28)
> >     1: (15) if r2 == 0x0 goto pc+2
> >     2: (79) r2 = *(u64 *)(r1 +0)
> >     3: (61) r2 = *(u32 *)(r2 +2340)
> > ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
> >     4: (63) *(u32 *)(r10 -4) = r2
> > ; family = skops->family;
> >     5: (79) r1 = *(u64 *)(r1 +0)
> >     6: (69) r1 = *(u16 *)(r1 +16)
> > ; family = skops->family;
> >     7: (63) *(u32 *)(r10 -8) = r1
> > ; return 0;
> >     8: (b7) r0 = 0
> >     9: (95) exit
> > 
> > Then we look at lines 0 and 2 above. In the good case we do the zero
> > check in r2 and then load 'r1 + 0' at line 2. Do a quick cross-check
> > into the bpf_sock_ops check and we can confirm that is the 'struct
> > sock *sk' pointer field. But, in the bad case,
> > 
> >     0: (61) r1 = *(u32 *)(r1 +28)
> >     1: (15) if r1 == 0x0 goto pc+2
> >     2: (79) r1 = *(u64 *)(r1 +0)
> > 
> > Oh no, we read 'r1 +28' into r1, this is skops->fullsock and then in
> > line 2 we read the 'r1 +0' as a pointer. Now jumping back to our spat,
> > 
> > [18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> > 
> > The 0x01 makes sense because that is exactly the fullsock value. And
> > its not a valid dereference so we splat.
> > 
> > To fix we need to guard the case when a program is doing a sock_ops field
> > access with src_reg == dst_reg. This is already handled in the load case
> > where the ctx_access handler uses a tmp register being careful to
> > store the old value and restore it. To fix the get case test if
> > src_reg == dst_reg and in this case do the is_fullsock test in the
> > temporary register. Remembering to restore the temporary register before
> > writing to either dst_reg or src_reg to avoid smashing the pointer into
> > the struct holding the tmp variable.
> > 
> > Adding this inline code to test_tcpbpf_kern will now be generated
> > correctly from,
> > 
> >    9: r2 = *(u32 *)(r2 + 96)
> > 
> > to xlated code,

I have this in my logs at line 12,

                *(u64 *)(r2 +32) = r9
> >    13: (61) r9 = *(u32 *)(r2 +28)
> >    14: (15) if r9 == 0x0 goto pc+4
> >    15: (79) r9 = *(u64 *)(r2 +32)
> >    16: (79) r2 = *(u64 *)(r2 +0)
> >    17: (61) r2 = *(u32 *)(r2 +2348)
> >    18: (05) goto pc+1
> >    19: (79) r9 = *(u64 *)(r2 +32)
> 
> The diff below looks good to me, but I'm confused on this one above. I'm probably
> missing something, but given this is the dst == src case with the r2 register, where
> in the dump do we first saves the content of r9 into the scratch tmp store?
> Line 19 seems to restore it, but the save is missing, no?
> 
> Please double check whether this was just omitted, but I would really like to have
> the commit message 100% correct as it otherwise causes confusion when we stare at it
> again a month later wrt what was the original intention.

off-by-one on the cut'n'paste into the commit message. Let me send a v3
with a correction to the commit. I do want this to be correct.

^ permalink raw reply

* Re: [PATCH bpf-next v3 27/29] bpf: eliminate rlimit-based memory accounting infra for bpf maps
From: Song Liu @ 2020-07-31 22:47 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: bpf, Networking, Alexei Starovoitov, Daniel Borkmann, Kernel Team,
	open list
In-Reply-To: <20200730212310.2609108-28-guro@fb.com>

On Thu, Jul 30, 2020 at 2:27 PM Roman Gushchin <guro@fb.com> wrote:
>
> Remove rlimit-based accounting infrastructure code, which is not used
> anymore.
>
> Signed-off-by: Roman Gushchin <guro@fb.com>

The code is good, so

Acked-by: Song Liu <songliubraving@fb.com>

However, I am still nervous as we deprecate memlock.

> ---
>  include/linux/bpf.h                           | 12 ----
>  kernel/bpf/syscall.c                          | 64 +------------------
>  .../selftests/bpf/progs/map_ptr_kern.c        |  5 --
>  3 files changed, 2 insertions(+), 79 deletions(-)

[...]

^ permalink raw reply

* Re: [PATCH bpf-next v3 01/29] bpf: memcg-based memory accounting for bpf progs
From: Song Liu @ 2020-07-31 22:48 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: bpf, Networking, Alexei Starovoitov, Daniel Borkmann, Kernel Team,
	open list
In-Reply-To: <20200730212310.2609108-2-guro@fb.com>

On Thu, Jul 30, 2020 at 2:28 PM Roman Gushchin <guro@fb.com> wrote:
>
> Include memory used by bpf programs into the memcg-based accounting.
> This includes the memory used by programs itself, auxiliary data
> and statistics.
>
> Signed-off-by: Roman Gushchin <guro@fb.com>

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply

* RE: [PATCH v2 bpf-next 1/5] bpf: add support for forced LINK_DETACH command
From: John Fastabend @ 2020-07-31 23:19 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Song Liu
In-Reply-To: <20200731182830.286260-2-andriin@fb.com>

Andrii Nakryiko wrote:
> Add LINK_DETACH command to force-detach bpf_link without destroying it. It has
> the same behavior as auto-detaching of bpf_link due to cgroup dying for
> bpf_cgroup_link or net_device being destroyed for bpf_xdp_link. In such case,
> bpf_link is still a valid kernel object, but is defuncts and doesn't hold BPF
> program attached to corresponding BPF hook. This functionality allows users
> with enough access rights to manually force-detach attached bpf_link without
> killing respective owner process.
> 
> This patch implements LINK_DETACH for cgroup, xdp, and netns links, mostly
> re-using existing link release handling code.
> 
> Acked-by: Song Liu <songliubraving@fb.com>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---

Looks necessary otherwise we have no way, as far as I read it, to delete
an XDP program and go back the no prog state after link_{create|update}.

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply

* RE: [PATCH v2 bpf-next 2/5] libbpf: add bpf_link detach APIs
From: John Fastabend @ 2020-07-31 23:22 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Song Liu
In-Reply-To: <20200731182830.286260-3-andriin@fb.com>

Andrii Nakryiko wrote:
> Add low-level bpf_link_detach() API. Also add higher-level bpf_link__detach()
> one.
> 
> Acked-by: Song Liu <songliubraving@fb.com>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---

I like it nice and straightforward.

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply

* RE: [PATCH v2 bpf-next 3/5] selftests/bpf: add link detach tests for cgroup, netns, and xdp bpf_links
From: John Fastabend @ 2020-07-31 23:29 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Song Liu
In-Reply-To: <20200731182830.286260-4-andriin@fb.com>

Andrii Nakryiko wrote:
> Add bpf_link__detach() testing to selftests for cgroup, netns, and xdp
> bpf_links.
> 
> Acked-by: Song Liu <songliubraving@fb.com>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply

* RE: [PATCH v2 bpf-next 4/5] tools/bpftool: add `link detach` subcommand
From: John Fastabend @ 2020-07-31 23:32 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Song Liu
In-Reply-To: <20200731182830.286260-5-andriin@fb.com>

Andrii Nakryiko wrote:
> Add ability to force-detach BPF link. Also add missing error message, if
> specified link ID is wrong.
> 
> Acked-by: Song Liu <songliubraving@fb.com>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply

* RE: [PATCH v2 bpf-next 5/5] tools/bpftool: add documentation and bash-completion for `link detach`
From: John Fastabend @ 2020-07-31 23:35 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Song Liu
In-Reply-To: <20200731182830.286260-6-andriin@fb.com>

Andrii Nakryiko wrote:
> Add info on link detach sub-command to man page. Add detach to bash-completion
> as well.
> 
> Acked-by: Song Liu <songliubraving@fb.com>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---

Acked-by: John Fastabend <john.fastabend@gmail.com.

^ permalink raw reply

* Re: [PATCH net] net/sched: The error lable position is corrected in ct_init_module
From: David Miller @ 2020-07-31 23:36 UTC (permalink / raw)
  To: liujian56; +Cc: jhs, xiyou.wangcong, jiri, kuba, paulb, netdev
In-Reply-To: <20200730081428.35904-1-liujian56@huawei.com>

From: Liu Jian <liujian56@huawei.com>
Date: Thu, 30 Jul 2020 16:14:28 +0800

> From: liujian <liujian56@huawei.com>
> 
> Exchange the positions of the err_tbl_init and err_register labels in
> ct_init_module function.
> 
> Fixes: c34b961a2492 ("net/sched: act_ct: Create nf flow table per zone")
> Signed-off-by: liujian <liujian56@huawei.com>

Applied, thank you.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox