Netdev List
 help / color / mirror / Atom feed
* Re: bpf memory model. Was: [PATCH v4 bpf-next 1/9] bpf: introduce bpf_spin_lock
From: Paul E. McKenney @ 2019-01-30 18:36 UTC (permalink / raw)
  To: Will Deacon
  Cc: Alexei Starovoitov, Peter Zijlstra, Alexei Starovoitov, davem,
	daniel, jakub.kicinski, netdev, kernel-team, mingo, jannh
In-Reply-To: <20190130181100.GA18558@fuggles.cambridge.arm.com>

On Wed, Jan 30, 2019 at 06:11:00PM +0000, Will Deacon wrote:
> Hi Alexei,
> 
> On Mon, Jan 28, 2019 at 01:56:24PM -0800, Alexei Starovoitov wrote:
> > On Mon, Jan 28, 2019 at 10:24:08AM +0100, Peter Zijlstra wrote:
> > > On Fri, Jan 25, 2019 at 04:17:26PM -0800, Alexei Starovoitov wrote:
> > > > What I want to avoid is to define the whole execution ordering model upfront.
> > > > We cannot say that BPF ISA is weakly ordered like alpha.
> > > > Most of the bpf progs are written and running on x86. We shouldn't
> > > > twist bpf developer's arm by artificially relaxing memory model.
> > > > BPF memory model is equal to memory model of underlying architecture.
> > > > What we can do is to make it bpf progs a bit more portable with
> > > > smp_rmb instructions, but we must not force weak execution on the developer.
> > > 
> > > Well, I agree with only introducing bits you actually need, and my
> > > smp_rmb() example might have been poorly chosen, smp_load_acquire() /
> > > smp_store_release() might have been a far more useful example.
> > > 
> > > But I disagree with the last part; we have to pick a model now;
> > > otherwise you'll pain yourself into a corner.
> > > 
> > > Also; Alpha isn't very relevant these days; however ARM64 does seem to
> > > be gaining a lot of attention and that is very much a weak architecture.
> > > Adding strongly ordered assumptions to BPF now, will penalize them in
> > > the long run.
> > 
> > arm64 is gaining attention just like riscV is gaining it too.
> > BPF jit for arm64 is very solid, while BPF jit for riscV is being worked on.
> > BPF is not picking sides in CPU HW and ISA battles.
> 
> It's not about picking a side, it's about providing an abstraction of the
> various CPU architectures out there so that the programmer doesn't need to
> worry about where their program may run. Hell, even if you just said "eBPF
> follows x86 semantics" that would be better than saying nothing (and then we
> could have a discussion about whether x86 semantics are really what you
> want).

To reinforce this point, the Linux-kernel memory model (tools/memory-model)
is that abstraction for the Linux kernel.  Why not just use that for BPF?

							Thanx, Paul

> > Memory model is CPU HW design decision. BPF ISA cannot dictate HW design.
> > We're not saying today that BPF is strongly ordered.
> > BPF load/stores are behaving differently on x86 vs arm64.
> > We can add new instructions, but we cannot 'define' how load/stores behave
> > from memory model perspective.
> > For example, take atomicity of single byte load/store.
> > Not all archs have them atomic, but we cannot say to bpf programmers
> > to always assume non-atomic byte loads.
> 
> Hmm, I don't think this is a good approach to take for the future of eBPF.
> Assuming that a desirable property of an eBPF program is portability between
> CPU architectures, then you're effectively forcing the programmer to "assume
> the worst", where the worst is almost certainly unusable for practical
> purposes.
> 
> One easy thing you could consider would be to allow tagging of an eBPF
> program with its supported target architectures (the JIT will refuse to
> accept it for other architectures). This would at least prevent remove the
> illusion of portability and force the programmer to be explicit.
> 
> However, I think we'd much better off if we defined some basic ordering
> primitives such as relaxed and RCpc-style acquire/release operations
> (including atomic RmW), single-copy atomic memory accesses up to the native
> machine size and a full-fence instruction. If your program uses something
> that the underlying arch doesn't support, then it is rejected (e.g. 64-bit
> READ_ONCE on a 32-bit arch)
> 
> That should map straightforwardly to all modern architectures and allow for
> efficient codegen on x86 and arm64. It would probably require a bunch of new
> BPF instructions that would be defined to be atomic (you already have XADD
> as a relaxed atomic add).
> 
> Apologies if this sounds patronising, but I'm keen to help figure out the
> semantics *now* so that we don't end up having to infer them later on, which
> is the usual painful case for memory models. I suspect Peter and Paul would
> also prefer to attack it that way around. I appreciate that the temptation
> is to avoid the problem by deferring to the underlying hardware memory
> model, but I think that will create more problems than it solves and we're
> here to help you get this right.
> 
> Will
> 


^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH net] ixgbe: fix potential RX buffer starvation for AF_XDP
From: Jakub Kicinski @ 2019-01-30 18:22 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Magnus Karlsson, Network Development, Björn Töpel,
	intel-wired-lan, Magnus Karlsson
In-Reply-To: <CAJ+HfNg=vUqUszLX+C6LvYRq8J_X_UiU4x63FCaXWqdRjV5DGA@mail.gmail.com>

On Wed, 30 Jan 2019 15:16:15 +0100, Björn Töpel wrote:
> Den ons 30 jan. 2019 kl 10:35 skrev Magnus Karlsson:
> >
> > On Tue, Jan 29, 2019 at 8:15 PM Jakub Kicinski wrote: 
> > >
> > > I may not understand the problem fully, but isn't it kind of normal
> > > that if you create a ring empty you'll never receive packets?  And it
> > > should be reasonably easy to catch while writing an app from scratch
> > > (i.e. it behaves deterministically).  
> >
> > Agree that this should be the normal behavior for a NIC. The question
> > is how to get out of this situation.There are two options: punt this
> > to the application writer or fix this in the driver. I chose to fix
> > the driver since this removes complexity in the application.
> >  
> 
> Magnus' fix addresses a race/timing issue. At zero-copy initialization
> point, if the fill ring was empty, the driver (both i40e and ixgbe)
> would stop retrying to "allocate" zero-copy frames from the fill
> ring. So, frames would never be received, even if the fill ring was
> filled at a later point.
> 
> If the driver runs-dry in terms of Rx buffer if one or more frames has
> been received, the driver will retry polling the fill-ring. However at
> initialization point, if the fill-ring was empty, the driver would
> just give up and never retry.
> 
> As Magnus stated, there is no "notify the kernel that the items has
> appeared in the fill ring" (other than a HW mechanism where the tail
> pointer is a door bell) on the Rx side, so for the Intel drivers it's
> up to the driver to solve this.

Oh, I see, so its a missing piece in an otherwise very fault tolerant
implementation :)  Okay, no objection.

The nfp prototype I did simply fails the program load if there are not
enough buffers to do the initial fill, but that's just a personal
preference.  I tend to return errors in unclear situations more than
most people.

^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH] iavf: Use printf instead of gnu_printf for iavf_debug_d
From: Bowers, AndrewX @ 2019-01-30 18:20 UTC (permalink / raw)
  To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	intel-wired-lan@lists.osuosl.org
In-Reply-To: <20190110042157.8445-1-natechancellor@gmail.com>

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Nathan Chancellor
> Sent: Wednesday, January 9, 2019 8:22 PM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: netdev@vger.kernel.org; Nick Desaulniers <ndesaulniers@google.com>;
> linux-kernel@vger.kernel.org; Miguel Ojeda
> <miguel.ojeda.sandonis@gmail.com>; intel-wired-lan@lists.osuosl.org;
> Nathan Chancellor <natechancellor@gmail.com>; David S. Miller
> <davem@davemloft.net>
> Subject: [Intel-wired-lan] [PATCH] iavf: Use printf instead of gnu_printf for
> iavf_debug_d
> 
> Clang warns:
> 
> In file included from drivers/net/ethernet/intel/iavf/iavf_main.c:4:
> In file included from drivers/net/ethernet/intel/iavf/iavf.h:37:
> In file included from drivers/net/ethernet/intel/iavf/iavf_type.h:8:
> drivers/net/ethernet/intel/iavf/iavf_osdep.h:49:18: warning: 'format'
> attribute argument not supported: gnu_printf [-Wignored-attributes]
>         __attribute__ ((format(gnu_printf, 3, 4)));
>                         ^
> 1 warning generated.
> 
> We can convert from gnu_printf to printf without any side effects for two
> reasons:
> 
> 1. All iavf_debug instances use standard printf formats, as pointed out
>    by Miguel Ojeda at the below link, meaning gnu_printf is not strictly
>    required.
> 
> 2. However, GCC has aliased printf to gnu_printf on Linux since at least
>    2010 based on git history.
> 
>    From gcc/c-family/c-format.c:
> 
>    /* Attributes such as "printf" are equivalent to those such as
>       "gnu_printf" unless this is overridden by a target.  */
>    static const target_ovr_attr gnu_target_overrides_format_attributes[] =
>    {
>      { "gnu_printf",   "printf" },
>      { "gnu_scanf",    "scanf" },
>      { "gnu_strftime", "strftime" },
>      { "gnu_strfmon",  "strfmon" },
>      { NULL,           NULL }
>    };
> 
> The mentioned override only happens on Windows (mingw32). Changing
> from gnu_printf to printf is a no-op for GCC and stops Clang from warning.
> 
> Link: https://github.com/ClangBuiltLinux/linux/issues/111
> Suggested-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
> ---
>  drivers/net/ethernet/intel/iavf/iavf_osdep.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply

* Re: [PATCH net 0/4] various compat ioctl fixes
From: David Miller @ 2019-01-30 18:20 UTC (permalink / raw)
  To: viro; +Cc: johannes, netdev, robert
In-Reply-To: <20190130154009.GJ2217@ZenIV.linux.org.uk>

From: Al Viro <viro@zeniv.linux.org.uk>
Date: Wed, 30 Jan 2019 15:40:09 +0000

> On Mon, Jan 28, 2019 at 10:32:30PM +0100, Johannes Berg wrote:
> 
>> At the same time, fixing all this _completely_ is not very realistic, it
>> would require passing the ifreq size through to lots of places and
>> making the user copy there take the size rather than sizeof(ifreq),
>> obviously the very least to the method decnet uses, i.e. sock->ioctl() I
>> think, but clearly that affects every other protocol too.
>> This was what my previous patch had done partially for the directly
>> handled ioctls (the revert of which is the first patch in this series).
>> 
>> > From what I can see this looks like probably the simplest way to
>> > fix this in net and -stable currently.
>> 
>> I tend to agree, at least to fix the regression.
>> 
>> We can still deliberate separately if we want to fix decnet for compat
>> or if nobody cares now. But perhaps better decnet broken (quite
>> obviously and detectably) like it basically always was, than IP broken
>> (subtly, if your struct ends up landing at the end of a page).
>> 
>> Al, care to speak up about this here?
> 
> Umm...  Short-term I don't see anything better; long-term I would really
> like to see compat_alloc_user_space()/copy_in_user() crap gone and
> copyin-copyout for anything more or less generic lifted up as far as
> cleanly possible, but let's not mix it with regression fixing.

It's a real shame, I thought it was a super clever solution to that
problem space at the time we added it.

> So for the lack of better short-term solutions,
> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
> on the series.

Ok, series applied, thanks everyone.

I'll queue this up for -stable too.

^ permalink raw reply

* [PATCH] selftests: net: fix "from" match test in fib_rule_tests.sh
From: Marcelo Henrique Cerri @ 2019-01-30 18:19 UTC (permalink / raw)
  To: David S. Miller, Shuah Khan, netdev, linux-kselftest,
	linux-kernel
  Cc: Marcelo Henrique Cerri

Fix the IPv4 address of the dummy0 interface and ensure that ip_forward
is enabled in the network space to get a valid response when checking
for routes between the gateway and other hosts.

Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>
---
 tools/testing/selftests/net/fib_rule_tests.sh | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/fib_rule_tests.sh b/tools/testing/selftests/net/fib_rule_tests.sh
index d4cfb6a7a086..552a9784e759 100755
--- a/tools/testing/selftests/net/fib_rule_tests.sh
+++ b/tools/testing/selftests/net/fib_rule_tests.sh
@@ -54,9 +54,11 @@ setup()
 
 	$IP link add dummy0 type dummy
 	$IP link set dev dummy0 up
-	$IP address add 198.51.100.1/24 dev dummy0
+	$IP address add 192.51.100.1/24 dev dummy0
 	$IP -6 address add 2001:db8:1::1/64 dev dummy0
 
+	ip netns exec testns sysctl -w net.ipv4.ip_forward=1
+
 	set +e
 }
 
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH bpf-next v4 5/7] samples/bpf: Add a "force" flag to XDP samples
From: Maciej Fijalkowski @ 2019-01-30 18:12 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: daniel, ast, netdev, jakub.kicinski
In-Reply-To: <20190129123405.787746f7@redhat.com>

On Tue, 29 Jan 2019 12:34:05 +0100
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> On Tue, 29 Jan 2019 09:00:00 +0100
> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> 
> > On Mon, 28 Jan 2019 20:16:11 +0100
> > Maciej Fijalkowski <maciejromanfijalkowski@gmail.com> wrote:
> >   
> > > From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > > 
> > > Make xdp samples consistent with iproute2 behavior and set the
> > > XDP_FLAGS_UPDATE_IF_NOEXIST by default when setting the xdp program on
> > > interface. Provide an option for user to force the program loading,
> > > which as a result will not include the mentioned flag in
> > > bpf_set_link_xdp_fd call.    
> > 
> > I like the idea, but what is the error message users get after this
> > change?  
> 
> $ sudo ./xdp1 mlx5p1 &
> [1] 9768
> 
> $ sudo ./xdp1 mlx5p1 
> link set xdp fd failed
> 
> This error message is a little too generic to be a good user experience.
> The kernel (in dev_change_xdp_fd) will return errno -EBUSY (-16), but
> we don't use or report the return value in these sample programs.
> 
> If my QA see this error message, I will still get an error report
> bugzilla that I need to spend time on investigating.  Can we please
> improve this error message? 
> 
> If you are really cool you get inspired by (or use) libbpf_strerror()
> code avail in tools/lib/bpf/libbpf_errno.c.  Default strerror(EBUSY)
> will return "Device or resource busy", but maybe we can do slightly
> better and report something more meaningful for this XDP context.
> 
I'll post a v5 with libbpf_strerror() usage when bpf_set_link_xdp_fd failed in
samples but at this point it will only give us a standard "device or resource
busy" error, so if we need some more meaningful message that libbpf will give
us then I guess we need to define a new libbpf_errno entry (as well as entry in
libbpf_strerror_table for this new errno value) and set the errno in
bpf_set_link_xdp_fd in case of a failure?

^ permalink raw reply

* Re: bpf memory model. Was: [PATCH v4 bpf-next 1/9] bpf: introduce bpf_spin_lock
From: Will Deacon @ 2019-01-30 18:11 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Alexei Starovoitov, davem, daniel, jakub.kicinski,
	netdev, kernel-team, mingo, Paul McKenney, jannh
In-Reply-To: <20190128215623.6eqskzhklydhympa@ast-mbp>

Hi Alexei,

On Mon, Jan 28, 2019 at 01:56:24PM -0800, Alexei Starovoitov wrote:
> On Mon, Jan 28, 2019 at 10:24:08AM +0100, Peter Zijlstra wrote:
> > On Fri, Jan 25, 2019 at 04:17:26PM -0800, Alexei Starovoitov wrote:
> > > What I want to avoid is to define the whole execution ordering model upfront.
> > > We cannot say that BPF ISA is weakly ordered like alpha.
> > > Most of the bpf progs are written and running on x86. We shouldn't
> > > twist bpf developer's arm by artificially relaxing memory model.
> > > BPF memory model is equal to memory model of underlying architecture.
> > > What we can do is to make it bpf progs a bit more portable with
> > > smp_rmb instructions, but we must not force weak execution on the developer.
> > 
> > Well, I agree with only introducing bits you actually need, and my
> > smp_rmb() example might have been poorly chosen, smp_load_acquire() /
> > smp_store_release() might have been a far more useful example.
> > 
> > But I disagree with the last part; we have to pick a model now;
> > otherwise you'll pain yourself into a corner.
> > 
> > Also; Alpha isn't very relevant these days; however ARM64 does seem to
> > be gaining a lot of attention and that is very much a weak architecture.
> > Adding strongly ordered assumptions to BPF now, will penalize them in
> > the long run.
> 
> arm64 is gaining attention just like riscV is gaining it too.
> BPF jit for arm64 is very solid, while BPF jit for riscV is being worked on.
> BPF is not picking sides in CPU HW and ISA battles.

It's not about picking a side, it's about providing an abstraction of the
various CPU architectures out there so that the programmer doesn't need to
worry about where their program may run. Hell, even if you just said "eBPF
follows x86 semantics" that would be better than saying nothing (and then we
could have a discussion about whether x86 semantics are really what you
want).

> Memory model is CPU HW design decision. BPF ISA cannot dictate HW design.
> We're not saying today that BPF is strongly ordered.
> BPF load/stores are behaving differently on x86 vs arm64.
> We can add new instructions, but we cannot 'define' how load/stores behave
> from memory model perspective.
> For example, take atomicity of single byte load/store.
> Not all archs have them atomic, but we cannot say to bpf programmers
> to always assume non-atomic byte loads.

Hmm, I don't think this is a good approach to take for the future of eBPF.
Assuming that a desirable property of an eBPF program is portability between
CPU architectures, then you're effectively forcing the programmer to "assume
the worst", where the worst is almost certainly unusable for practical
purposes.

One easy thing you could consider would be to allow tagging of an eBPF
program with its supported target architectures (the JIT will refuse to
accept it for other architectures). This would at least prevent remove the
illusion of portability and force the programmer to be explicit.

However, I think we'd much better off if we defined some basic ordering
primitives such as relaxed and RCpc-style acquire/release operations
(including atomic RmW), single-copy atomic memory accesses up to the native
machine size and a full-fence instruction. If your program uses something
that the underlying arch doesn't support, then it is rejected (e.g. 64-bit
READ_ONCE on a 32-bit arch)

That should map straightforwardly to all modern architectures and allow for
efficient codegen on x86 and arm64. It would probably require a bunch of new
BPF instructions that would be defined to be atomic (you already have XADD
as a relaxed atomic add).

Apologies if this sounds patronising, but I'm keen to help figure out the
semantics *now* so that we don't end up having to infer them later on, which
is the usual painful case for memory models. I suspect Peter and Paul would
also prefer to attack it that way around. I appreciate that the temptation
is to avoid the problem by deferring to the underlying hardware memory
model, but I think that will create more problems than it solves and we're
here to help you get this right.

Will

^ permalink raw reply

* Re: [PATCH -next] mISDN: hfcsusb: Fix potential NULL pointer dereference
From: David Miller @ 2019-01-30 18:10 UTC (permalink / raw)
  To: yuehaibing; +Cc: isdn, gustavo, bigeasy, linux-kernel, netdev
In-Reply-To: <20190130101902.19744-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Wed, 30 Jan 2019 18:19:02 +0800

> There is a potential NULL pointer dereference in case
> kzalloc() fails and returns NULL.
> 
> Fixes: 69f52adb2d53 ("mISDN: Add HFC USB driver")
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> ---
>  drivers/isdn/hardware/mISDN/hfcsusb.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/isdn/hardware/mISDN/hfcsusb.c b/drivers/isdn/hardware/mISDN/hfcsusb.c
> index 124ff53..5660d5a 100644
> --- a/drivers/isdn/hardware/mISDN/hfcsusb.c
> +++ b/drivers/isdn/hardware/mISDN/hfcsusb.c
> @@ -263,6 +263,8 @@ hfcsusb_ph_info(struct hfcsusb *hw)
>  	int i;
>  
>  	phi = kzalloc(struct_size(phi, bch, dch->dev.nrbchan), GFP_ATOMIC);
> +	if (!phi)
> +		return;

If we fail with an error and do not perform the operation we were requested to
make, we must return an error to the caller, and the caller must do something
reasonable with that error (perhaps return it to it's caller) and so on and
so forth.

^ permalink raw reply

* Re: [PATCH net-next 0/5] mlxsw: spectrum_acl: Include delta bits into hashtable key
From: David Miller @ 2019-01-30 18:07 UTC (permalink / raw)
  To: idosch; +Cc: netdev, jiri, mlxsw
In-Reply-To: <20190130085813.32161-1-idosch@mellanox.com>

From: Ido Schimmel <idosch@mellanox.com>
Date: Wed, 30 Jan 2019 08:58:29 +0000

> The Spectrum-2 ASIC allows multiple rules to use the same mask provided
> that the difference between their masks is small enough (up to 8
> consecutive delta bits). A more detailed explanation is provided in
> merge commit 756cd36626f7 ("Merge branch
> 'mlxsw-Introduce-algorithmic-TCAM-support'").
> 
> These delta bits are part of the rule's key and therefore rules that
> only differ in their delta bits can be inserted with the same A-TCAM
> mask. In case two rules share the same key and only differ in their
> priority, then the second will spill to the C-TCAM.
> 
> Current code does not take the delta bits into account when checking for
> duplicate rules, which leads to unnecessary spillage to the C-TCAM.
> This may result in reduced scale and performance.
> 
> Patch #1 includes the delta bits in the rule's key to avoid the above
> mentioned problem.
> 
> Patch #2 adds a tracepoint when a rule is inserted into the C-TCAM.
> 
> Patches #3-#5 add test cases to make sure unnecessary spillage into the
> C-TCAM does not occur.

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH bpf-next v4 6/7] libbpf: Add a support for getting xdp prog id on ifindex
From: Maciej Fijalkowski @ 2019-01-30 17:53 UTC (permalink / raw)
  To: John Fastabend; +Cc: daniel, ast, netdev, jakub.kicinski, brouer
In-Reply-To: <a9a2cef6-9628-39de-2b05-eb3f88d8ed85@gmail.com>

On Mon, 28 Jan 2019 13:07:49 -0800
John Fastabend <john.fastabend@gmail.com> wrote:

> On 1/28/19 11:16 AM, Maciej Fijalkowski wrote:
> > From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > 
> > Since we have a dedicated netlink attributes for xdp setup on a
> > particular interface, it is now possible to retrieve the program id that
> > is currently attached to the interface. The use case is targeted for
> > sample xdp programs, which will store the program id just after loading
> > bpf program onto iface. On shutdown, the sample will make sure that it
> > can unload the program by querying again the iface and verifying that
> > both program id's matches.
> > 
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > ---  
> 
> small nit.
> 
> > +
> > +int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags)
> > +{
> > +	struct xdp_id_md xdp_id = {};
> > +	int sock, ret;
> > +	__u32 nl_pid;
> > +	__u32 mask;
> > +
> > +	if (flags & ~XDP_FLAGS_MASK)
> > +		return -EINVAL;
> > +
> > +	/* Check whether the single {HW,DRV,SKB} mode is set */
> > +	flags &= XDP_FLAGS_MODES;
> > +	mask = flags - 1;
> > +	if (flags && flags & mask)
> > +		return -EINVAL;
> > +
> > +	sock = libbpf_netlink_open(&nl_pid);
> > +	if (sock < 0)
> > +		return sock;
> > +
> > +	xdp_id.ifindex = ifindex;
> > +	xdp_id.flags = flags;
> > +
> > +	ret = libbpf_nl_get_link(sock, nl_pid, get_xdp_id, &xdp_id);
> > +	*prog_id = xdp_id.id;  
> 
> just a nit but should we really set prog_id from user if there is
> an error. Probably friendlier not to set caller data in error
> case.
>
Agree, besides checking the return value from libbpf_nl_get_link() we should
also check that we have assigned a value to xdp_id.id so that would mean we
have gone through whole get_xdp_id(). I'll post a v5 within several hours with
following check:

if (!ret && xdp_id.id)
	*prog_id = xdp_id;

Hope that makes sense.
> > +
> > +	close(sock);
> > +	return ret;
> > +}
> > +
> >  int libbpf_nl_get_link(int sock, unsigned int nl_pid,
> >  		       libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
> >  {
> >   
> 


^ permalink raw reply

* [PATCH net 2/9] net/smc: allow 16 byte pnetids in netlink policy
From: Ursula Braun @ 2019-01-30 17:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20190130175108.103221-1-ubraun@linux.ibm.com>

From: Hans Wippel <hwippel@linux.ibm.com>

Currently, users can only send pnetids with a maximum length of 15 bytes
over the SMC netlink interface although the maximum pnetid length is 16
bytes. This patch changes the SMC netlink policy to accept 16 byte
pnetids.

Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/smc_pnet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/smc/smc_pnet.c b/net/smc/smc_pnet.c
index 7cb3e4f07c10..632c3109dee5 100644
--- a/net/smc/smc_pnet.c
+++ b/net/smc/smc_pnet.c
@@ -27,7 +27,7 @@
 static struct nla_policy smc_pnet_policy[SMC_PNETID_MAX + 1] = {
 	[SMC_PNETID_NAME] = {
 		.type = NLA_NUL_STRING,
-		.len = SMC_MAX_PNETID_LEN - 1
+		.len = SMC_MAX_PNETID_LEN
 	},
 	[SMC_PNETID_ETHNAME] = {
 		.type = NLA_NUL_STRING,
-- 
2.16.4


^ permalink raw reply related

* [PATCH net 5/9] net/smc: recvmsg and splice_read should return 0 after shutdown
From: Ursula Braun @ 2019-01-30 17:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20190130175108.103221-1-ubraun@linux.ibm.com>

From: Karsten Graul <kgraul@linux.ibm.com>

When a socket was connected and is now shut down for read, return 0 to
indicate end of data in recvmsg and splice_read (like TCP) and do not
return ENOTCONN. This behavior is required by the socket api.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/af_smc.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index c4e56602e0c6..b04a813fc865 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -1505,6 +1505,11 @@ static int smc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 
 	smc = smc_sk(sk);
 	lock_sock(sk);
+	if (sk->sk_state == SMC_CLOSED && (sk->sk_shutdown & RCV_SHUTDOWN)) {
+		/* socket was connected before, no more data to read */
+		rc = 0;
+		goto out;
+	}
 	if ((sk->sk_state == SMC_INIT) ||
 	    (sk->sk_state == SMC_LISTEN) ||
 	    (sk->sk_state == SMC_CLOSED))
@@ -1840,7 +1845,11 @@ static ssize_t smc_splice_read(struct socket *sock, loff_t *ppos,
 
 	smc = smc_sk(sk);
 	lock_sock(sk);
-
+	if (sk->sk_state == SMC_CLOSED && (sk->sk_shutdown & RCV_SHUTDOWN)) {
+		/* socket was connected before, no more data to read */
+		rc = 0;
+		goto out;
+	}
 	if (sk->sk_state == SMC_INIT ||
 	    sk->sk_state == SMC_LISTEN ||
 	    sk->sk_state == SMC_CLOSED)
-- 
2.16.4


^ permalink raw reply related

* [PATCH net 7/9] net/smc: call smc_cdc_msg_send() under send_lock
From: Ursula Braun @ 2019-01-30 17:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20190130175108.103221-1-ubraun@linux.ibm.com>

From: Karsten Graul <kgraul@linux.ibm.com>

Call smc_cdc_msg_send() under the connection send_lock to make sure all
send operations for one connection are serialized.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/smc_cdc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
index db83332ac1c8..1c5333d494e9 100644
--- a/net/smc/smc_cdc.c
+++ b/net/smc/smc_cdc.c
@@ -125,7 +125,10 @@ static int smcr_cdc_get_slot_and_msg_send(struct smc_connection *conn)
 	if (rc)
 		return rc;
 
-	return smc_cdc_msg_send(conn, wr_buf, pend);
+	spin_lock_bh(&conn->send_lock);
+	rc = smc_cdc_msg_send(conn, wr_buf, pend);
+	spin_unlock_bh(&conn->send_lock);
+	return rc;
 }
 
 int smc_cdc_get_slot_and_msg_send(struct smc_connection *conn)
-- 
2.16.4


^ permalink raw reply related

* [PATCH net 9/9] net/smc: fix use of variable in cleared area
From: Ursula Braun @ 2019-01-30 17:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20190130175108.103221-1-ubraun@linux.ibm.com>

From: Karsten Graul <kgraul@linux.ibm.com>

Do not use pend->idx as index for the arrays because its value is
located in the cleared area. Use the existing local variable instead.
Without this fix the wrong area might be cleared.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/smc_wr.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index c2694750a6a8..1dc88c32d6bb 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -218,10 +218,10 @@ int smc_wr_tx_put_slot(struct smc_link *link,
 		u32 idx = pend->idx;
 
 		/* clear the full struct smc_wr_tx_pend including .priv */
-		memset(&link->wr_tx_pends[pend->idx], 0,
-		       sizeof(link->wr_tx_pends[pend->idx]));
-		memset(&link->wr_tx_bufs[pend->idx], 0,
-		       sizeof(link->wr_tx_bufs[pend->idx]));
+		memset(&link->wr_tx_pends[idx], 0,
+		       sizeof(link->wr_tx_pends[idx]));
+		memset(&link->wr_tx_bufs[idx], 0,
+		       sizeof(link->wr_tx_bufs[idx]));
 		test_and_clear_bit(idx, link->wr_tx_mask);
 		return 1;
 	}
-- 
2.16.4


^ permalink raw reply related

* [PATCH net 8/9] net/smc: use device link provided in qp_context
From: Ursula Braun @ 2019-01-30 17:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20190130175108.103221-1-ubraun@linux.ibm.com>

From: Karsten Graul <kgraul@linux.ibm.com>

The device field of the IB event structure does not always point to the
SMC IB device. Load the pointer from the qp_context which is always
provided to smc_ib_qp_event_handler() in the priv field. And for qp
events the affected port is given in the qp structure of the ibevent,
derive it from there.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/smc_ib.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index e519ef29c0ff..76487a16934e 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -289,8 +289,8 @@ int smc_ib_create_protection_domain(struct smc_link *lnk)
 
 static void smc_ib_qp_event_handler(struct ib_event *ibevent, void *priv)
 {
-	struct smc_ib_device *smcibdev =
-		(struct smc_ib_device *)ibevent->device;
+	struct smc_link *lnk = (struct smc_link *)priv;
+	struct smc_ib_device *smcibdev = lnk->smcibdev;
 	u8 port_idx;
 
 	switch (ibevent->event) {
@@ -298,7 +298,7 @@ static void smc_ib_qp_event_handler(struct ib_event *ibevent, void *priv)
 	case IB_EVENT_GID_CHANGE:
 	case IB_EVENT_PORT_ERR:
 	case IB_EVENT_QP_ACCESS_ERR:
-		port_idx = ibevent->element.port_num - 1;
+		port_idx = ibevent->element.qp->port - 1;
 		set_bit(port_idx, &smcibdev->port_event_mask);
 		schedule_work(&smcibdev->port_event_work);
 		break;
-- 
2.16.4


^ permalink raw reply related

* [PATCH net 6/9] net/smc: do not wait under send_lock
From: Ursula Braun @ 2019-01-30 17:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20190130175108.103221-1-ubraun@linux.ibm.com>

From: Karsten Graul <kgraul@linux.ibm.com>

smc_cdc_get_free_slot() might wait for free transfer buffers when using
SMC-R. This wait should not be done under the send_lock, which is a
spin_lock. This fixes a cpu loop in parallel threads waiting for the
send_lock.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/smc_tx.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index f99951f3f7fd..36af3de731b9 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -488,25 +488,23 @@ static int smcr_tx_sndbuf_nonempty(struct smc_connection *conn)
 	struct smc_wr_buf *wr_buf;
 	int rc;
 
-	spin_lock_bh(&conn->send_lock);
 	rc = smc_cdc_get_free_slot(conn, &wr_buf, &pend);
 	if (rc < 0) {
 		if (rc == -EBUSY) {
 			struct smc_sock *smc =
 				container_of(conn, struct smc_sock, conn);
 
-			if (smc->sk.sk_err == ECONNABORTED) {
-				rc = sock_error(&smc->sk);
-				goto out_unlock;
-			}
+			if (smc->sk.sk_err == ECONNABORTED)
+				return sock_error(&smc->sk);
 			rc = 0;
 			if (conn->alert_token_local) /* connection healthy */
 				mod_delayed_work(system_wq, &conn->tx_work,
 						 SMC_TX_WORK_DELAY);
 		}
-		goto out_unlock;
+		return rc;
 	}
 
+	spin_lock_bh(&conn->send_lock);
 	if (!conn->local_tx_ctrl.prod_flags.urg_data_present) {
 		rc = smc_tx_rdma_writes(conn);
 		if (rc) {
-- 
2.16.4


^ permalink raw reply related

* [PATCH net 4/9] net/smc: don't wait for send buffer space when data was already sent
From: Ursula Braun @ 2019-01-30 17:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20190130175108.103221-1-ubraun@linux.ibm.com>

From: Karsten Graul <kgraul@linux.ibm.com>

When there is no more send buffer space and at least 1 byte was already
sent then return to user space. The wait is only done when no data was
sent by the sendmsg() call.
This fixes smc_tx_sendmsg() which tried to always send all user data and
started to wait for free send buffer space when needed. During this wait
the user space program was blocked in the sendmsg() call and hence not
able to receive incoming data. When both sides were in such a situation
then the connection stalled forever.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/smc_tx.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index d8366ed51757..f99951f3f7fd 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -165,12 +165,11 @@ int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len)
 			conn->local_tx_ctrl.prod_flags.urg_data_pending = 1;
 
 		if (!atomic_read(&conn->sndbuf_space) || conn->urg_tx_pend) {
+			if (send_done)
+				return send_done;
 			rc = smc_tx_wait(smc, msg->msg_flags);
-			if (rc) {
-				if (send_done)
-					return send_done;
+			if (rc)
 				goto out_err;
-			}
 			continue;
 		}
 
-- 
2.16.4


^ permalink raw reply related

* [PATCH net 3/9] net/smc: prevent races between smc_lgr_terminate() and smc_conn_free()
From: Ursula Braun @ 2019-01-30 17:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20190130175108.103221-1-ubraun@linux.ibm.com>

From: Karsten Graul <kgraul@linux.ibm.com>

To prevent races between smc_lgr_terminate() and smc_conn_free() add an
extra check of the lgr field before accessing it, and cancel a delayed
free_work when a new smc connection is created.
This fixes the problem that free_work cleared the lgr variable but
smc_lgr_terminate() or smc_conn_free() still access it in parallel.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/smc_core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 35c1cdc93e1c..097c798983ca 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -128,6 +128,8 @@ static void smc_lgr_unregister_conn(struct smc_connection *conn)
 {
 	struct smc_link_group *lgr = conn->lgr;
 
+	if (!lgr)
+		return;
 	write_lock_bh(&lgr->conns_lock);
 	if (conn->alert_token_local) {
 		__smc_lgr_unregister_conn(conn);
@@ -628,6 +630,8 @@ int smc_conn_create(struct smc_sock *smc, bool is_smcd, int srv_first_contact,
 			local_contact = SMC_REUSE_CONTACT;
 			conn->lgr = lgr;
 			smc_lgr_register_conn(conn); /* add smc conn to lgr */
+			if (delayed_work_pending(&lgr->free_work))
+				cancel_delayed_work(&lgr->free_work);
 			write_unlock_bh(&lgr->conns_lock);
 			break;
 		}
-- 
2.16.4


^ permalink raw reply related

* [PATCH net 1/9] net/smc: fix another sizeof to int comparison
From: Ursula Braun @ 2019-01-30 17:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20190130175108.103221-1-ubraun@linux.ibm.com>

Comparing an int to a size, which is unsigned, causes the int to become
unsigned, giving the wrong result. kernel_sendmsg can return a negative
error code.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/smc_clc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/smc/smc_clc.c b/net/smc/smc_clc.c
index 776e9dfc915d..d53fd588d1f5 100644
--- a/net/smc/smc_clc.c
+++ b/net/smc/smc_clc.c
@@ -378,7 +378,7 @@ int smc_clc_send_decline(struct smc_sock *smc, u32 peer_diag_info)
 	vec.iov_len = sizeof(struct smc_clc_msg_decline);
 	len = kernel_sendmsg(smc->clcsock, &msg, &vec, 1,
 			     sizeof(struct smc_clc_msg_decline));
-	if (len < sizeof(struct smc_clc_msg_decline))
+	if (len < 0 || len < sizeof(struct smc_clc_msg_decline))
 		len = -EPROTO;
 	return len > 0 ? 0 : len;
 }
-- 
2.16.4


^ permalink raw reply related

* [PATCH net 0/9] net/smc: fixes 2019-01-30
From: Ursula Braun @ 2019-01-30 17:50 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun

Dave,

here are some fixes in different areas of the smc code for the net
tree.

Thanks, Ursula

Hans Wippel (1):
  net/smc: allow 16 byte pnetids in netlink policy

Karsten Graul (7):
  net/smc: prevent races between smc_lgr_terminate() and smc_conn_free()
  net/smc: don't wait for send buffer space when data was already sent
  net/smc: recvmsg and splice_read should return 0 after shutdown
  net/smc: do not wait under send_lock
  net/smc: call smc_cdc_msg_send() under send_lock
  net/smc: use device link provided in qp_context
  net/smc: fix use of variable in cleared area

Ursula Braun (1):
  net/smc: fix another sizeof to int comparison

 net/smc/af_smc.c   | 11 ++++++++++-
 net/smc/smc_cdc.c  |  5 ++++-
 net/smc/smc_clc.c  |  2 +-
 net/smc/smc_core.c |  4 ++++
 net/smc/smc_ib.c   |  6 +++---
 net/smc/smc_pnet.c |  2 +-
 net/smc/smc_tx.c   | 17 +++++++----------
 net/smc/smc_wr.c   |  8 ++++----
 8 files changed, 34 insertions(+), 21 deletions(-)

-- 
2.16.4


^ permalink raw reply

* RE: [PATCH] arm64: dts: lx2160aqds: Add mdio mux nodes
From: Leo Li @ 2019-01-30 17:48 UTC (permalink / raw)
  To: Pankaj Bansal, Shawn Guo, Andrew Lunn, Florian Fainelli
  Cc: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org
In-Reply-To: <VI1PR0401MB24968C7FA8CE3CEB291FDC94F1900@VI1PR0401MB2496.eurprd04.prod.outlook.com>



> -----Original Message-----
> From: Pankaj Bansal
> Sent: Wednesday, January 30, 2019 4:37 AM
> To: Pankaj Bansal <pankaj.bansal@nxp.com>; Shawn Guo
> <shawnguo@kernel.org>; Leo Li <leoyang.li@nxp.com>; Andrew Lunn
> <andrew@lunn.ch>; Florian Fainelli <f.fainelli@gmail.com>
> Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> Subject: RE: [PATCH] arm64: dts: lx2160aqds: Add mdio mux nodes
> 
> HI Shawn/Leo,
> 
> Can you please review this patch and include it in your tree ?
> 
> Regards,
> Pankaj Bansal
> 
> -----Original Message-----
> From: Pankaj Bansal [mailto:pankaj.bansal@nxp.com]
> Sent: Thursday, 15 November, 2018 05:42 PM
> To: Shawn Guo <shawnguo@kernel.org>; Leo Li <leoyang.li@nxp.com>;
> Andrew Lunn <andrew@lunn.ch>; Florian Fainelli <f.fainelli@gmail.com>
> Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org; Pankaj
> Bansal <pankaj.bansal@nxp.com>
> Subject: [PATCH] arm64: dts: lx2160aqds: Add mdio mux nodes
> 
> The two external MDIO buses used to communicate with phy devices that
> are external to SOC are muxed in LX2160AQDS board.
> 
> These buses can be routed to any one of the eight IO slots on LX2160AQDS
> board depending on value in fpga register 0x54.
> 
> Additionally the external MDIO1 is used to communicate to the onboard
> RGMII phy devices.
> 
> The mdio1 is controlled by bits 4-7 of fpga register and mdio2 is controlled by
> bits 0-3 of fpga register.
> 
> Signed-off-by: Pankaj Bansal <pankaj.bansal@nxp.com>
> ---
> 
> Notes:
>     This patch depends on following patches:
>     [1]https://patchwork.kernel.org/cover/10658863/
>     [2]https://patchwork.codeaurora.org/patch/637861/
> 
>  .../boot/dts/freescale/fsl-lx2160a-qds.dts   | 116 +++++++++++++++++
>  .../boot/dts/freescale/fsl-lx2160a.dtsi      |  23 ++++
>  2 files changed, 139 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
> b/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
> index 8a0305a2b778..39aa2731ddfa 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
> +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
> @@ -54,6 +54,121 @@
>  &i2c0 {
>  	status = "okay";
> 
> +	fpga@66 {
> +		compatible = "fsl,lx2160aqds-fpga", "fsl,fpga-qixis-i2c";
> +		reg = <0x66>;
> +		#address-cells = <1>;
> +		#size-cells = <0>;
> +
> +		mdio-mux-1@54 {

No compatible for this node?  What is the binding used with this node?

> +			mdio-parent-bus = <&emdio1>;
> +			reg = <0x54>;		 /* BRDCFG4 */
> +			mux-mask = <0xf8>;      /* EMI1_MDIO */
> +			#address-cells=<1>;
> +			#size-cells = <0>;
> +
> +			mdio@0 {
> +				reg = <0x00>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@40 {
> +				reg = <0x40>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@c0 {
> +				reg = <0xc0>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@c8 {
> +				reg = <0xc8>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@d0 {
> +				reg = <0xd0>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@d8 {
> +				reg = <0xd8>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@e0 {
> +				reg = <0xe0>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@e8 {
> +				reg = <0xe8>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@f0 {
> +				reg = <0xf0>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@f8 {
> +				reg = <0xf8>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +		};
> +
> +		mdio-mux-2@54 {

Same comment as the previous one.

> +			mdio-parent-bus = <&emdio2>;
> +			reg = <0x54>;		 /* BRDCFG4 */
> +			mux-mask = <0x07>;      /* EMI2_MDIO */
> +			#address-cells=<1>;
> +			#size-cells = <0>;
> +
> +			mdio@0 {
> +				reg = <0x00>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@1 {
> +				reg = <0x01>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@2 {
> +				reg = <0x02>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@3 {
> +				reg = <0x03>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@4 {
> +				reg = <0x04>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@5 {
> +				reg = <0x05>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@6 {
> +				reg = <0x06>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@7 {
> +				reg = <0x07>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +		};
> +	};
> +
>  	i2c-mux@77 {
>  		compatible = "nxp,pca9547";
>  		reg = <0x77>;
> @@ -118,3 +233,4 @@
>  &usb1 {
>  	status = "okay";
>  };
> +

No need to add this new line.

> diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> index 6ce0677c3096..518882b05f03 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> @@ -780,5 +780,28 @@
>  				     <GIC_SPI 209 IRQ_TYPE_LEVEL_HIGH>;
>  			dma-coherent;
>  		};
> +		/* TODO: WRIOP (CCSR?) */

What is this TODO?  Can we address it now?

> +		/* WRIOP0: 0x8B8_0000, E-MDIO1: 0x1_6000 */
> +		emdio1: mdio@0x8B96000 {
> +			compatible = "fsl,fman-memac-mdio";

This node doesn't actually match the binding Documentation/devicetree/bindings/net/fsl-fman.txt

> +			reg = <0x0 0x8B96000 0x0 0x1000>;
> +			device_type = "mdio";	/* TODO: is this necessary? */

Not needed

> +			little-endian;	/* force the driver in LE mode */
> +
> +			/* Not necessary on the QDS, but needed on the
> RDB*/

Why?  And we shouldn't discuss boards in the soc dtsi file.

> +			#address-cells = <1>;
> +			#size-cells = <0>;
> +		};
> +		/* WRIOP0: 0x8B8_0000, E-MDIO2: 0x1_7000 */
> +		emdio2: mdio@0x8B97000 {
> +			compatible = "fsl,fman-memac-mdio";
> +			reg = <0x0 0x8B97000 0x0 0x1000>;
> +			device_type = "mdio";	/* TODO: is this necessary? */
> +			little-endian;	/* force the driver in LE mode */
> +
> +			#address-cells = <1>;
> +			#size-cells = <0>;
> +		};
>  	};
>  };
> +
> --
> 2.17.1


^ permalink raw reply

* Re: [RFC 03/14] net: hstats: add basic/core functionality
From: Jakub Kicinski @ 2019-01-30 17:44 UTC (permalink / raw)
  To: David Ahern
  Cc: davem, oss-drivers, netdev, jiri, f.fainelli, andrew, mkubecek,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch
In-Reply-To: <e5635cee-4a70-624f-2b1b-5f2411c75979@gmail.com>

On Tue, 29 Jan 2019 21:18:28 -0700, David Ahern wrote:
> On 1/28/19 4:44 PM, Jakub Kicinski wrote:
> > @@ -4946,6 +4964,9 @@ static size_t if_nlmsg_stats_size(const struct net_device *dev,
> >  		rcu_read_unlock();
> >  	}
> >  
> > +	if (stats_attr_valid(filter_mask, IFLA_STATS_LINK_HSTATS, 0))  
> 
> filter_mask is populated by RTEXT_FILTER_ from
> include/uapi/linux/rtnetlink.h

ext_filter_mask is from IFLA_EXT_MASK and has RTEXT_FILTER_ bits set.
Here the mask is from struct if_stats_msg::filter_mask of RTM_GETSTATS.
Am I missing the point? :S

> > +		size += rtnl_get_link_hstats_size(dev);  
> 
> rtnl_get_link_hstats_size == __rtnl_get_link_hstats can return < 0.

Ups!  Thank you!

In general how much do you dislike this code? :)

> > +
> >  	return size;
> >  }
> >  
> >   
> 


^ permalink raw reply

* Re: Crashes in skb clone/allocation in 4.19.18
From: Edward Cree @ 2019-01-30 17:37 UTC (permalink / raw)
  To: Ivan Babrou, netdev
  Cc: David S. Miller, Eric Dumazet, Ignat Korchagin, Shawn Bohrer,
	Jakub Sitnicki
In-Reply-To: <90051606-1883-7dc7-fe4f-3bb135e816ae@solarflare.com>

On 30/01/19 17:33, Edward Cree wrote:
> On 30/01/19 16:51, Ivan Babrou wrote:
>> Hey,
>>
>> We've upgraded some machines from 4.19.13 to 4.19.18 and some of them
>> crashed with the following:
>>
>> [ 2313.192006] general protection fault: 0000 [#1] SMP PTI
>> [ 2313.205924] CPU: 32 PID: 65437 Comm: nginx-fl Tainted: G
>> O      4.19.18-cloudflare-2019.1.8 #2019.1.8
>> [ 2313.224973] Hardware name: Quanta Computer Inc. QuantaPlex
>> T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
>> [ 2313.243400] RIP: 0010:kmem_cache_alloc_node+0x178/0x1f0
>> [ 2313.257768] Code: 89 fa 4c 89 f6 e8 68 40 a1 00 4c 8b 55 00 58 4d
>> 85 d2 75 d6 e9 6f ff ff ff 41 8b 59 20 48 8d 4a 01 4c 89 f8 49 8b 39
>> 4c 01 fb <48> 33 1b 49 33 99 38 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0
>> 0f 84
>> [ 2313.295550] RSP: 0000:ffff94457f903b48 EFLAGS: 00010202
>> [ 2313.310352] RAX: 08b82daf1f57da0e RBX: 08b82daf1f57da0e RCX: 00000000005ff72d
>> [ 2313.327189] RDX: 00000000005ff72c RSI: 0000000000480220 RDI: 0000000000026e40
>> [ 2313.344029] RBP: ffff94457f04d680 R08: ffff94457f926e40 R09: ffff94457f04d680
>> [ 2313.360912] R10: 000004ce652a0026 R11: 0000000000000000 R12: 0000000000480220
>> [ 2313.377857] R13: 00000000ffffffff R14: ffffffffb1ab3ab7 R15: 08b82daf1f57da0e
>> [ 2313.394820] FS:  00007fdea755c780(0000) GS:ffff94457f900000(0000)
>> knlGS:0000000000000000
>> [ 2313.412887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 2313.428581] CR2: 000055acc3cf517b CR3: 000000201b1ea003 CR4: 00000000003606e0
>> [ 2313.445753] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 2313.462843] perf: interrupt took too long (8028 > 7291), lowering
>> kernel.perf_event_max_sample_rate to 24000
>> [ 2313.462867] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 2313.500216] Call Trace:
>> [ 2313.512833]  <IRQ>
>> [ 2313.524748]  __alloc_skb+0x57/0x1d0
>> [ 2313.537934]  __tcp_send_ack.part.48+0x2f/0x100
>> [ 2313.551845]  tcp_rcv_established+0x550/0x640
>> [ 2313.565394]  tcp_v4_do_rcv+0x12a/0x1e0
>> [ 2313.578322]  tcp_v4_rcv+0xadc/0xbd0
>> [ 2313.590993]  ip_local_deliver_finish+0x5d/0x1d0
>> [ 2313.604727]  ip_local_deliver+0x6b/0xe0
>> [ 2313.617782]  ? ip_sublist_rcv+0x200/0x200
>> [ 2313.630415] perf: interrupt took too long (10040 > 10035), lowering
>> kernel.perf_event_max_sample_rate to 19000
>> [ 2313.630948]  ip_rcv+0x52/0xd0
>> [ 2313.662850]  ? ip_rcv_core.isra.22+0x2b0/0x2b0
>> [ 2313.662857]  __netif_receive_skb_one_core+0x52/0x70
>> [ 2313.690860]  netif_receive_skb_internal+0x34/0xe0
>> [ 2313.690883]  efx_rx_deliver+0x11a/0x180 [sfc]
>> [ 2313.717780]  ? __efx_rx_packet+0x1ef/0x730 [sfc]
>> [ 2313.717786]  ? __queue_work+0x103/0x3e0
>> [ 2313.743118]  ? efx_poll+0x35e/0x460 [sfc]
>> [ 2313.743125]  ? net_rx_action+0x138/0x360
>> [ 2313.767356]  ? __do_softirq+0xd8/0x2d2
>> [ 2313.767362]  ? irq_exit+0xb4/0xc0
>> [ 2313.790680]  ? do_IRQ+0x85/0xd0
>> [ 2313.790688]  ? common_interrupt+0xf/0xf
>> [ 2313.790694]  </IRQ>
> Something odd is going on.  As far as I can tell from this call trace
>  (which has some weirdness in it; any chance you could reproduce with
>  frame pointers or a lower build optimisation level?) you're in the
>  normal sfc receive path (under efx_process_channel(), although that's
>  one of the functions that hasn't made it into the stack trace), which
>  means you should have a channel->rx_list, and thus efx_rx_deliver()
>  should be putting the packet on that list rather than calling
>  netif_receive_skb().
>
> I don't know how, or if, that could be related to the crash you're
>  getting, but it might be worth looking into.
> (It can't be the whole story, as your other crash is on a mlx5e and
>  AFAIK they don't use list-RX yet.  Though, confusingly, an entry for
>  ip_sublist_rcv still makes it into both stack traces.)
>
> Maybe it's secondary damage from a wild pointer or other mm problem
>  letting memory get scribbled on.
>
> -Ed
Aaaand as Lance has just pointed out, you're running the out-of-tree
 sfc driver, which doesn't have list RX yet.  Disregard the above.

-Ed

^ permalink raw reply

* Re: Crashes in skb clone/allocation in 4.19.18
From: Ivan Babrou @ 2019-01-30 17:34 UTC (permalink / raw)
  To: Lance Richardson
  Cc: Cong Wang, Linux Kernel Network Developers, David S. Miller,
	Eric Dumazet, Ignat Korchagin, Shawn Bohrer, Jakub Sitnicki
In-Reply-To: <CADuNpCyv=nh8iMrRp=T1qqi3j6pfZJyfqKagqDtAo6+ux-b1xg@mail.gmail.com>

On Wed, Jan 30, 2019 at 9:28 AM Lance Richardson <lance604@gmail.com> wrote:
>
> On Wed, Jan 30, 2019 at 12:17 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > On Wed, Jan 30, 2019 at 8:54 AM Ivan Babrou <ivan@cloudflare.com> wrote:
> > >
> > > Hey,
> > >
> > > We've upgraded some machines from 4.19.13 to 4.19.18 and some of them
> > > crashed with the following:
> > >
> > > [ 2313.192006] general protection fault: 0000 [#1] SMP PTI
> > > [ 2313.205924] CPU: 32 PID: 65437 Comm: nginx-fl Tainted: G
> > > O      4.19.18-cloudflare-2019.1.8 #2019.1.8
>
> "Tainted: GO" appears to mean that an out-of tree kernel module was
> loaded. If so, information about that module and whether the crash
> occurs when it hasn't been loaded might be of interest.

That module is Solarflare NIC driver. On in-tree Mellanox we've only
seen skb_clone crashes.

>    - Lance
>
> > > [ 2313.224973] Hardware name: Quanta Computer Inc. QuantaPlex
> > > T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
> > > [ 2313.243400] RIP: 0010:kmem_cache_alloc_node+0x178/0x1f0
> >
> > This looks more like an mm bug than a networking one.
> >
> > Also, it is always helpful if you can map the RIP to source code,
> > using scripts/faddr2line or scripts/decode_stacktrace.sh.
> >
> >
> > Thanks.
> >

^ permalink raw reply

* Re: Crashes in skb clone/allocation in 4.19.18
From: Edward Cree @ 2019-01-30 17:33 UTC (permalink / raw)
  To: Ivan Babrou, netdev
  Cc: David S. Miller, Eric Dumazet, Ignat Korchagin, Shawn Bohrer,
	Jakub Sitnicki
In-Reply-To: <CABWYdi1UrpkV28HyenPZgqPnn+_sPqxT4XoP_HED2DC0ixxG-w@mail.gmail.com>

On 30/01/19 16:51, Ivan Babrou wrote:
> Hey,
>
> We've upgraded some machines from 4.19.13 to 4.19.18 and some of them
> crashed with the following:
>
> [ 2313.192006] general protection fault: 0000 [#1] SMP PTI
> [ 2313.205924] CPU: 32 PID: 65437 Comm: nginx-fl Tainted: G
> O      4.19.18-cloudflare-2019.1.8 #2019.1.8
> [ 2313.224973] Hardware name: Quanta Computer Inc. QuantaPlex
> T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
> [ 2313.243400] RIP: 0010:kmem_cache_alloc_node+0x178/0x1f0
> [ 2313.257768] Code: 89 fa 4c 89 f6 e8 68 40 a1 00 4c 8b 55 00 58 4d
> 85 d2 75 d6 e9 6f ff ff ff 41 8b 59 20 48 8d 4a 01 4c 89 f8 49 8b 39
> 4c 01 fb <48> 33 1b 49 33 99 38 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0
> 0f 84
> [ 2313.295550] RSP: 0000:ffff94457f903b48 EFLAGS: 00010202
> [ 2313.310352] RAX: 08b82daf1f57da0e RBX: 08b82daf1f57da0e RCX: 00000000005ff72d
> [ 2313.327189] RDX: 00000000005ff72c RSI: 0000000000480220 RDI: 0000000000026e40
> [ 2313.344029] RBP: ffff94457f04d680 R08: ffff94457f926e40 R09: ffff94457f04d680
> [ 2313.360912] R10: 000004ce652a0026 R11: 0000000000000000 R12: 0000000000480220
> [ 2313.377857] R13: 00000000ffffffff R14: ffffffffb1ab3ab7 R15: 08b82daf1f57da0e
> [ 2313.394820] FS:  00007fdea755c780(0000) GS:ffff94457f900000(0000)
> knlGS:0000000000000000
> [ 2313.412887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2313.428581] CR2: 000055acc3cf517b CR3: 000000201b1ea003 CR4: 00000000003606e0
> [ 2313.445753] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2313.462843] perf: interrupt took too long (8028 > 7291), lowering
> kernel.perf_event_max_sample_rate to 24000
> [ 2313.462867] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2313.500216] Call Trace:
> [ 2313.512833]  <IRQ>
> [ 2313.524748]  __alloc_skb+0x57/0x1d0
> [ 2313.537934]  __tcp_send_ack.part.48+0x2f/0x100
> [ 2313.551845]  tcp_rcv_established+0x550/0x640
> [ 2313.565394]  tcp_v4_do_rcv+0x12a/0x1e0
> [ 2313.578322]  tcp_v4_rcv+0xadc/0xbd0
> [ 2313.590993]  ip_local_deliver_finish+0x5d/0x1d0
> [ 2313.604727]  ip_local_deliver+0x6b/0xe0
> [ 2313.617782]  ? ip_sublist_rcv+0x200/0x200
> [ 2313.630415] perf: interrupt took too long (10040 > 10035), lowering
> kernel.perf_event_max_sample_rate to 19000
> [ 2313.630948]  ip_rcv+0x52/0xd0
> [ 2313.662850]  ? ip_rcv_core.isra.22+0x2b0/0x2b0
> [ 2313.662857]  __netif_receive_skb_one_core+0x52/0x70
> [ 2313.690860]  netif_receive_skb_internal+0x34/0xe0
> [ 2313.690883]  efx_rx_deliver+0x11a/0x180 [sfc]
> [ 2313.717780]  ? __efx_rx_packet+0x1ef/0x730 [sfc]
> [ 2313.717786]  ? __queue_work+0x103/0x3e0
> [ 2313.743118]  ? efx_poll+0x35e/0x460 [sfc]
> [ 2313.743125]  ? net_rx_action+0x138/0x360
> [ 2313.767356]  ? __do_softirq+0xd8/0x2d2
> [ 2313.767362]  ? irq_exit+0xb4/0xc0
> [ 2313.790680]  ? do_IRQ+0x85/0xd0
> [ 2313.790688]  ? common_interrupt+0xf/0xf
> [ 2313.790694]  </IRQ>
Something odd is going on.  As far as I can tell from this call trace
 (which has some weirdness in it; any chance you could reproduce with
 frame pointers or a lower build optimisation level?) you're in the
 normal sfc receive path (under efx_process_channel(), although that's
 one of the functions that hasn't made it into the stack trace), which
 means you should have a channel->rx_list, and thus efx_rx_deliver()
 should be putting the packet on that list rather than calling
 netif_receive_skb().

I don't know how, or if, that could be related to the crash you're
 getting, but it might be worth looking into.
(It can't be the whole story, as your other crash is on a mlx5e and
 AFAIK they don't use list-RX yet.  Though, confusingly, an entry for
 ip_sublist_rcv still makes it into both stack traces.)

Maybe it's secondary damage from a wild pointer or other mm problem
 letting memory get scribbled on.

-Ed
> [ 2313.823837] Modules linked in: tun xt_connlimit nf_conncount xt_bpf
> xt_hashlimit cls_flow cls_u32 sch_htb sch_fq md_mod dm_crypt
> algif_skcipher af_alg dm_mod dax ip6table_nat nf_nat_ipv6
> ip6table_mangle ip6table_security ip6table_raw xt_nat iptable_nat
> nf_nat_ipv4 nf_nat xt_TPROXY nf_tproxy_ipv6 nf_tproxy_ipv4 xt_connmark
> iptable_mangle xt_owner xt_CT xt_socket nf_socket_ipv4 nf_socket_ipv6
> iptable_raw ip6table_filter ip6_tables nfnetlink_log xt_NFLOG
> xt_tcpudp xt_comment xt_conntrack nf_conntrack nf_defrag_ipv6
> nf_defrag_ipv4 xt_mark xt_multiport xt_set iptable_filter bpfilter
> ip_set_hash_netport ip_set_hash_net ip_set_hash_ip ip_set nfnetlink
> 8021q garp mrp stp llc sb_edac x86_pkg_temp_thermal kvm_intel kvm
> irqbypass crc32_pclmul crc32c_intel pcbc aesni_intel aes_x86_64
> ipmi_ssif crypto_simd cryptd
> [ 2313.952153]  sfc(O) glue_helper igb i2c_algo_bit ipmi_si mdio dca
> ipmi_devintf ipmi_msghandler efivarfs ip_tables x_tables
> [ 2313.952238] ---[ end trace 477d8e3081c605f6 ]---
>
> Some nodes also crashed in skb_clone, rather than __alloc_skb:
>
> [ 3810.686137] general protection fault: 0000 [#1] SMP PTI
> [ 3810.694579] CPU: 64 PID: 69338 Comm: nginx-fl Not tainted
> 4.19.18-cloudflare-2019.1.8 #2019.1.8
> [ 3810.706589] Hardware name: Quanta Cloud Technology Inc. QuantaPlex
> T42S-2U(LBG-4) ^S5SZ090028/T42S-2U MB (Lewisburg-4), BIOS 3A11.Q10
> 06/29/2018
> [ 3810.726475] RIP: 0010:kmem_cache_alloc+0x89/0x1c0
> [ 3810.734701] Code: 82 72 49 83 78 10 00 4d 8b 30 0f 84 0e 01 00 00
> 4d 85 f6 0f 84 05 01 00 00 41 8b 5f 20 48 8d 4a 01 4c 89 f0 49 8b 3f
> 4c 01 f3 <48> 33 1b 49 33 9f 38 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0
> 74 b2
> [ 3810.761088] RSP: 0000:ffff99723fe03730 EFLAGS: 00010282
> [ 3810.770132] RAX: f0382d8aebf1ae68 RBX: f0382d8aebf1ae68 RCX: 0000000001cb61cf
> [ 3810.781105] RDX: 0000000001cb61ce RSI: 0000000000480020 RDI: 0000000000027550
> [ 3810.792012] RBP: ffff99723f19d500 R08: ffff99723fe27550 R09: 00000000000005dc
> [ 3810.802820] R10: ffff9992227c0000 R11: 0000000000004000 R12: 0000000000480020
> [ 3810.813589] R13: ffffffff8dcb5f7d R14: f0382d8aebf1ae68 R15: ffff99723f19d500
> [ 3810.824382] FS:  00007f2a8863c780(0000) GS:ffff99723fe00000(0000)
> knlGS:0000000000000000
> [ 3810.836189] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3810.845662] CR2: 000055820762eecd CR3: 00000019eb850003 CR4: 00000000007606e0
> [ 3810.856567] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 3810.867600] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 3810.878554] PKRU: 55555554
> [ 3810.884787] Call Trace:
> [ 3810.890601]  <IRQ>
> [ 3810.896116]  skb_clone+0x4d/0xb0
> [ 3810.902712]  dev_queue_xmit_nit+0xd9/0x260
> [ 3810.910181]  dev_hard_start_xmit+0x69/0x1f0
> [ 3810.917784]  __dev_queue_xmit+0x6f7/0x8a0
> [ 3810.925172]  ? eth_header+0x26/0xc0
> [ 3810.932053]  ip_finish_output2+0x193/0x400
> [ 3810.939670]  ? ip_finish_output+0x139/0x270
> [ 3810.947241]  ip_output+0x6c/0xe0
> [ 3810.953844]  ? ip_append_data.part.51+0xc0/0xc0
> [ 3810.961802]  __tcp_transmit_skb+0x511/0xaa0
> [ 3810.969420]  __tcp_retransmit_skb+0x19c/0x7c0
> [ 3810.977209]  ? tcp_current_mss+0x57/0xa0
> [ 3810.984493]  tcp_retransmit_skb+0x12/0x80
> [ 3810.991894]  tcp_xmit_retransmit_queue.part.50+0x147/0x240
> [ 3811.000754]  tcp_ack+0x9c4/0x11b0
> [ 3811.007416]  tcp_rcv_established+0x190/0x640
> [ 3811.015065]  ? tcp_v4_inbound_md5_hash+0x69/0x160
> [ 3811.023106]  tcp_v4_do_rcv+0x12a/0x1e0
> [ 3811.030190]  tcp_v4_rcv+0xadc/0xbd0
> [ 3811.037009]  ip_local_deliver_finish+0x5d/0x1d0
> [ 3811.044859]  ip_local_deliver+0x6b/0xe0
> [ 3811.051999]  ? ip_sublist_rcv+0x200/0x200
> [ 3811.059325]  ip_rcv+0x52/0xd0
> [ 3811.065595]  ? ip_rcv_core.isra.22+0x2b0/0x2b0
> [ 3811.073361]  __netif_receive_skb_one_core+0x52/0x70
> [ 3811.081621]  netif_receive_skb_internal+0x34/0xe0
> [ 3811.089652]  napi_gro_receive+0xba/0xe0
> [ 3811.096969]  mlx5e_handle_rx_cqe+0x1eb/0x530 [mlx5_core]
> [ 3811.105545]  ? skb_release_head_state+0x5c/0xb0
> [ 3811.113447]  mlx5e_poll_rx_cq+0xc8/0x910 [mlx5_core]
> [ 3811.121652]  mlx5e_napi_poll+0xb1/0xc60 [mlx5_core]
> [ 3811.129574]  net_rx_action+0x138/0x360
> [ 3811.136266]  __do_softirq+0xd8/0x2d2
> [ 3811.142679]  irq_exit+0xb4/0xc0
> [ 3811.148578]  do_IRQ+0x85/0xd0
> [ 3811.154254]  common_interrupt+0xf/0xf
> [ 3811.160585]  </IRQ>
> [ 3811.165319] RIP: 0033:0x5581e1551ca0
> [ 3811.171546] Code: e8 10 41 ff 24 ee 81 7c ca 04 ff ff fe ff 0f 83
> 87 1c 00 00 8b 03 0f b6 cc 0f b6 e8 83 c3 04 c1 e8 10 41 ff 24 ee 48
> 8b 2c c2 <48> 89 2c ca 8b 03 0f b6 cc 0f b6 e8 83 c3 04 c1 e8 10 41 ff
> 24 ee
> [ 3811.195925] RSP: 002b:00007ffdd615ebc0 EFLAGS: 00000246 ORIG_RAX:
> ffffffffffffffde
> [ 3811.206319] RAX: 0000000000000000 RBX: 00000000406c9058 RCX: 000000000000000b
> [ 3811.216321] RDX: 000000004099cdc8 RSI: fffffffb40c07eb0 RDI: 000000004183d738
> [ 3811.226277] RBP: fffffff444c8c5c0 R08: 000000004099cdc8 R09: 00000000425ce3d8
> [ 3811.236340] R10: 0000000044c8c5c0 R11: 000000004139cbb0 R12: 0000000000000000
> [ 3811.246349] R13: 00005581ead6a9e0 R14: 000000004166afe8 R15: 00000000406c90f8
> [ 3811.256320] Modules linked in: tun xt_connlimit nf_conncount xt_bpf
> xt_hashlimit cls_flow cls_u32 sch_htb sch_fq md_mod dm_crypt
> algif_skcipher af_alg dm_mod dax ip6table_nat nf_nat_ipv6
> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> ip6_tables xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_TPROXY
> nf_tproxy_ipv6 nf_tproxy_ipv4 xt_connmark iptable_mangle xt_owner
> xt_CT xt_socket nf_socket_ipv4 nf_socket_ipv6 iptable_raw
> nfnetlink_log xt_NFLOG xt_tcpudp xt_comment xt_conntrack nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 xt_mark xt_multiport xt_set
> iptable_filter bpfilter ip_set_hash_netport ip_set_hash_net
> ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc skx_edac
> x86_pkg_temp_thermal kvm_intel kvm irqbypass ipmi_ssif crc32_pclmul
> crc32c_intel pcbc aesni_intel aes_x86_64 crypto_simd mlx5_core
> [ 3811.351698]  cryptd xhci_pci tpm_crb mlxfw glue_helper ioatdma
> devlink ipmi_si xhci_hcd dca ipmi_devintf ipmi_msghandler tpm_tis
> tpm_tis_core tpm efivarfs ip_tables x_tables
> [ 3811.375161] ---[ end trace 1a7795bb39a63cf7 ]---
>
> Is this know? Could it be related to this commit:
>
> * https://github.com/torvalds/linux/commit/598e57e029290be3e7f8f87ff908091a5a22ed2f
>
> Thanks!


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox