Netdev List
* Re: [PATCH net 0/3] net: stmmac: Misc fixes
From: Corentin Labbe @ 2019-01-30 15:17 UTC (permalink / raw)
  To: Jose Abreu
  Cc: netdev, Joao Pinto, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue
In-Reply-To: <cover.1548859967.git.joabreu@synopsys.com>

On Wed, Jan 30, 2019 at 03:54:18PM +0100, Jose Abreu wrote:
> Some misc fixes for stmmac targeting -net.
> 
> Cc: Joao Pinto <jpinto@synopsys.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
> Cc: Alexandre Torgue <alexandre.torgue@st.com>
> 
> Jose Abreu (3):
>   net: stmmac: Fallback to Platform Data clock in Watchdog conversion
>   net: stmmac: Send TSO packets always from Queue 0
>   net: stmmac: Disable EEE mode earlier in XMIT callback
> 
>  drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 14 ++++++++++----
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c    | 17 +++++++++++++----
>  include/linux/stmmac.h                               |  1 +
>  3 files changed, 24 insertions(+), 8 deletions(-)
> 
> -- 
> 2.7.4
> 

Hello

Could you CC linux-kernel@vger.kernel.org when you send patches (as asked by get_maintainer.pl)?
If you leave it out, your patchset is neither stored on lore nor handled by the checkbots that read lkml.

Thanks
Regards


* Re: [PATCH 0/1] add MDIO bus multiplexer driven by a regmap device
From: Andrew Lunn @ 2019-01-30 15:22 UTC (permalink / raw)
  To: Pankaj Bansal; +Cc: Florian Fainelli, netdev@vger.kernel.org, Varun Sethi
In-Reply-To: <20190130164644.3948-1-pankaj.bansal@nxp.com>

On Wed, Jan 30, 2019 at 11:21:57AM +0000, Pankaj Bansal wrote:
> Add support for an MDIO bus multiplexer controlled by a regmap device,
> like an FPGA.
> 
> These APIs are an extension of the existing driver
> drivers/net/phy/mdio-mux-mmioreg.c.
> 
> The problem with the mmioreg driver is that it can operate only on
> memory-mapped devices. If the device that controls MDIO muxing is itself
> controlled over i2c or spi, it will not work.
> 
> Therefore, add APIs that a regmap device can use to control the MDIO mux.
> 
> Tested on a NXP LX2160AQDS board which uses the "QIXIS" FPGA attached to
> the i2c bus.
> 
> This is my second attempt at this.
> In my previous approach I wrote a separate driver for the regmap APIs. But
> then I realized that it is not meant to control a specific device.
> It is meant to control some registers of the parent device. Therefore, IMO
> this should not be a platform driver and there should not be any
> "compatible" property with which this driver is associated.

Hi Pankaj

It is not clear to me how you actually use this. You also need to
document the device tree binding. Writing that documentation may well
make it clear how this should be used.

Do you have patches adding support for this to the LX2160AQDS?  Seeing
that would also help.

Thanks
	Andrew


* Re: [PATCH 1/1] netdev/phy: add MDIO bus multiplexer driven by a regmap
From: Andrew Lunn @ 2019-01-30 15:23 UTC (permalink / raw)
  To: Pankaj Bansal; +Cc: Florian Fainelli, netdev@vger.kernel.org
In-Reply-To: <20190130164644.3948-2-pankaj.bansal@nxp.com>

> +++ b/drivers/net/phy/mdio-mux-regmap.c
> @@ -0,0 +1,170 @@
> +// SPDX-License-Identifier: GPL-2.0+

..

> +MODULE_LICENSE("GPL v2");

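These are not consistent: the SPDX tag says GPL-2.0+ (version 2 or
later), but MODULE_LICENSE("GPL v2") declares version 2 only.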

      Andrew


* Re: [PATCH] net: wireless: prefix header search paths with $(srctree)/
From: Hin-Tak Leung @ 2019-01-30 15:25 UTC (permalink / raw)
  To: Kalle Valo, linux-wireless, Masahiro Yamada
  Cc: netdev, Larry Finger, linux-kernel
In-Reply-To: <883460220.650537.1548861942189.ref@mail.yahoo.com>

--------------------------------------------
On Fri, 25/1/19, Masahiro Yamada <yamada.masahiro@socionext.com> wrote:

> Currently, the Kbuild core manipulates header search paths in a crazy
> way [1].
>
> To fix this mess, I want all Makefiles to add explicit $(srctree)/ to
> the search paths in the srctree. Some Makefiles are already written in
> that way, but not all. The goal of this work is to make the notation
> consistent, and finally get rid of the gross hacks.
>
> Having whitespaces after -I does not matter since commit 48f6e3cf5bc6
> ("kbuild: do not drop -I without parameter").
>
> I also removed one header search path in:
>
>   drivers/net/wireless/broadcom/brcm80211/brcmutil/Makefile
>
> I was able to compile without it.
>
> [1]: https://patchwork.kernel.org/patch/9632347/
 
> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>

Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>

Looks okay for the rtl818x parts.


* Re: [PATCH 1/1] netdev/phy: add MDIO bus multiplexer driven by a regmap
From: Andrew Lunn @ 2019-01-30 15:33 UTC (permalink / raw)
  To: Pankaj Bansal; +Cc: Florian Fainelli, netdev@vger.kernel.org
In-Reply-To: <20190130164644.3948-2-pankaj.bansal@nxp.com>

On Wed, Jan 30, 2019 at 11:22:00AM +0000, Pankaj Bansal wrote:
> Add support for an MDIO bus multiplexer controlled by a regmap
> device, like an FPGA.
> 
> Tested on a NXP LX2160AQDS board which uses the "QIXIS" FPGA
> attached to the i2c bus.
> 
> Signed-off-by: Pankaj Bansal <pankaj.bansal@nxp.com>
> ---
>  drivers/net/phy/Makefile          |   2 +-
>  drivers/net/phy/mdio-mux-regmap.c | 170 ++++++++++++++++++++++++++++
>  include/linux/mdio-mux.h          |  20 ++++
>  3 files changed, 191 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
> index f41b14115fde..16145973a42f 100644
> --- a/drivers/net/phy/Makefile
> +++ b/drivers/net/phy/Makefile
> @@ -25,7 +25,7 @@ obj-$(CONFIG_PHYLIB)		+= libphy.o
>  obj-$(CONFIG_MDIO_BCM_IPROC)	+= mdio-bcm-iproc.o
>  obj-$(CONFIG_MDIO_BCM_UNIMAC)	+= mdio-bcm-unimac.o
>  obj-$(CONFIG_MDIO_BITBANG)	+= mdio-bitbang.o
> -obj-$(CONFIG_MDIO_BUS_MUX)	+= mdio-mux.o
> +obj-$(CONFIG_MDIO_BUS_MUX)	+= mdio-mux.o mdio-mux-regmap.o

Please add a new Kconfig symbol for it. And think about the
depends/select statement, since you need regmap.

> +/* MDIO multiplexing switch function
> + *
> + * This function is called by the mdio-mux layer when it thinks the mdio bus
> + * multiplexer needs to switch.
> + *
> + * 'current_child' is the current value of the mux register (masked via
> + * s->mask).
> + *
> + * 'desired_child' is the value of the 'reg' property of the target child MDIO
> + * node.
> + *
> + * The first time this function is called, current_child == -1.
> + *
> + * If current_child == desired_child, then the mux is already set to the
> + * correct bus.
> + */

Please use kerneldoc formatting for this function documentation.
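
As a rough illustration of the kerneldoc layout being asked for, the
comment above could be restructured along these lines. The function name
mdio_mux_regmap_switch_fn, the struct mux_state type and its fields are
assumptions made for this sketch (the quoted hunk does not show them), and
the mux register is simulated with a plain variable so that the example
builds in user space.

```c
/* Hypothetical stand-in for the driver's private state; not from the patch. */
struct mux_state {
	unsigned int reg;	/* simulated mux register */
	unsigned int mask;	/* bits of 'reg' that select the child bus */
};

/**
 * mdio_mux_regmap_switch_fn() - Select the desired child MDIO bus.
 * @current_child: Current value of the mux register (masked via s->mask),
 *                 or -1 the first time this function is called.
 * @desired_child: Value of the 'reg' property of the target child MDIO node.
 * @data: Pointer to the private state (hypothetical struct mux_state here).
 *
 * Called by the mdio-mux layer when it thinks the MDIO bus multiplexer
 * needs to switch. If @current_child == @desired_child, the mux is already
 * set to the correct bus and nothing is written.
 *
 * Return: 0 on success.
 */
static int mdio_mux_regmap_switch_fn(int current_child, int desired_child,
				     void *data)
{
	struct mux_state *s = data;

	if (current_child == desired_child)
		return 0;

	/* Update only the masked bits, leaving the others untouched. */
	s->reg = (s->reg & ~s->mask) | ((unsigned int)desired_child & s->mask);
	return 0;
}
```

The points that matter for kerneldoc are the opening "/**", the
"name() - summary" first line, one "@arg:" line per parameter, and a
"Return:" section. The function body is only there to make the sketch
complete.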

> +int mdio_mux_regmap_init(struct device *dev,
> +			 struct device_node *mux_node,
> +			 void **data)
> +{

> +	/* Verify that the 'reg' property of each child MDIO bus does not
> +	 * set any bits outside of the 'mask'.
> +	 */
> +	for_each_available_child_of_node(mux_node, child) {
> +		ret = of_property_read_u32(child, "reg", &val);
> +		if (ret) {
> +			dev_err(dev, "mdio-mux child node %pOF is missing a 'reg' property\n", child);

You can probably remove "mdio-mux child node " making the line < 80,
but still retain the meaning. The child node name should be sufficient
to identify it.


> +			of_node_put(child);
> +			return -ENODEV;
> +		}
> +		if (val & ~s->mask) {
> +			dev_err(dev, "mdio-mux child node %pOF has a 'reg' value with unmasked bits\n", child);

Same here.

     Andrew
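
For readers following the review, the check in the quoted loop, with the
shortened messages suggested above, can be sketched in user space roughly
as follows. struct child_node and check_children() are hypothetical names
invented for this sketch, fprintf() stands in for dev_err(), and the node
name stands in for what %pOF would print. Returning -ENODEV for the mask
failure is an assumption, since the quoted hunk only shows that code for
the missing 'reg' case.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a device tree child node. */
struct child_node {
	const char *name;	/* what %pOF would print in the kernel */
	int has_reg;		/* non-zero if a 'reg' property is present */
	unsigned int reg;	/* value of the 'reg' property */
};

/*
 * Check that every child has a 'reg' value and that it sets no bits
 * outside 'mask'. Returns 0 if all children are valid and -ENODEV
 * otherwise (assumed for both failure cases in this sketch).
 */
static int check_children(const struct child_node *children, int count,
			  unsigned int mask)
{
	int i;

	for (i = 0; i < count; i++) {
		const struct child_node *c = &children[i];

		if (!c->has_reg) {
			/* Shortened message: the node name identifies it. */
			fprintf(stderr, "%s is missing a 'reg' property\n",
				c->name);
			return -ENODEV;
		}
		if (c->reg & ~mask) {
			fprintf(stderr,
				"%s has a 'reg' value with unmasked bits\n",
				c->name);
			return -ENODEV;
		}
	}
	return 0;
}
```

With a mask of 0x0f, a child with reg = 0x8 passes and a child with
reg = 0x10 is rejected, because bit 4 lies outside the mask.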


* Re: WoL broken in r8169.c since kernel 4.19
From: Marc Haber @ 2019-01-30 15:37 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: netdev@vger.kernel.org
In-Reply-To: <638767f7-5701-309b-97b1-a83f774afbab@gmail.com>

On Tue, Jan 29, 2019 at 10:20:48PM +0100, Heiner Kallweit wrote:
> one more attempt, could you please test the following with 4.19 or 4.20
> (w/o the other debug patches) ?

With the following patch, the machine wakes up fine on a WoL magic
packet:

nux-4.20.5/drivers/net/ethernet/realtek/r8169.c   2019-01-30 16:03:00.090841076 +0100
+++ orig/linux-4.20.5/drivers/net/ethernet/realtek/r8169.c      2019-01-26 09:20:52.000000000 +0100
@@ -1418,7 +1418,6 @@

 #define WAKE_ANY (WAKE_PHY | WAKE_MAGIC | WAKE_UCAST | WAKE_BCAST | WAKE_MCAST)

-#if 0
 static u32 __rtl8169_get_wol(struct rtl8169_private *tp)
 {
        u8 options;
@@ -1453,7 +1452,6 @@

        return wolopts;
 }
-#endif

 static void rtl8169_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
 {
@@ -1463,8 +1461,6 @@
        wol->supported = WAKE_ANY;
        wol->wolopts = tp->saved_wolopts;
        rtl_unlock_work(tp);
-
-       pr_info("get_wol: 0x%08x\n", wol->wolopts);
 }

 static void __rtl8169_set_wol(struct rtl8169_private *tp, u32 wolopts)
@@ -1540,8 +1536,6 @@
        struct rtl8169_private *tp = netdev_priv(dev);
        struct device *d = tp_to_dev(tp);

-       pr_info("set_wol: 0x%08x\n", wol->wolopts);
-
        if (wol->wolopts & ~WAKE_ANY)
                return -EINVAL;

@@ -4174,7 +4168,7 @@
 {
        struct phy_device *phydev;

-       if (!device_may_wakeup(tp_to_dev(tp)))
+       if (!__rtl8169_get_wol(tp))
                return false;

        /* phydev may not be attached to netdevice */
@@ -7372,6 +7366,8 @@
                return rc;
        }

+       tp->saved_wolopts = __rtl8169_get_wol(tp);
+
        mutex_init(&tp->wk.mutex);
        u64_stats_init(&tp->rx_stats.syncp);
        u64_stats_init(&tp->tx_stats.syncp);

I'll send the dmesg output to you by private e-mail.

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


* Re: [PATCH net 0/4] various compat ioctl fixes
From: Al Viro @ 2019-01-30 15:40 UTC (permalink / raw)
  To: Johannes Berg; +Cc: David Miller, netdev, robert
In-Reply-To: <149d1ddec433d7cb766c99eeb78b220b33090287.camel@sipsolutions.net>

On Mon, Jan 28, 2019 at 10:32:30PM +0100, Johannes Berg wrote:

> At the same time, fixing all this _completely_ is not very realistic, it
> would require passing the ifreq size through to lots of places and
> making the user copy there take the size rather than sizeof(ifreq),
> obviously the very least to the method decnet uses, i.e. sock->ioctl() I
> think, but clearly that affects every other protocol too.
> This was what my previous patch had done partially for the directly
> handled ioctls (the revert of which is the first patch in this series).
> 
> > From what I can see this looks like probably the simplest way to
> > fix this in net and -stable currently.
> 
> I tend to agree, at least to fix the regression.
> 
> We can still deliberate separately if we want to fix decnet for compat
> or if nobody cares now. But perhaps better decnet broken (quite
> obviously and detectably) like it basically always was, than IP broken
> (subtly, if your struct ends up landing at the end of a page).
> 
> Al, care to speak up about this here?

Umm...  Short-term I don't see anything better; long-term I would really
like to see compat_alloc_user_space()/copy_in_user() crap gone and
copyin-copyout for anything more or less generic lifted up as far as
cleanly possible, but let's not mix it with regression fixing.

So for the lack of better short-term solutions,
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
on the series.


* Re: [PATCH RFC RFT net-next 03/10] net: dsa: mv88e6060: Replace REG_WRITE macro
From: Andrew Lunn @ 2019-01-30 15:42 UTC (permalink / raw)
  To: Pavel Machek; +Cc: netdev, Vivien Didelot, Florian Fainelli
In-Reply-To: <20190130092451.GA22071@amd>

On Wed, Jan 30, 2019 at 10:24:51AM +0100, Pavel Machek wrote:
> On Wed 2019-01-30 01:37:51, Andrew Lunn wrote:
> > The REG_WRITE macro contains a return statement, making it not very
> > safe. Remove it by inlining the code.
> 
> Not bad, but maybe there should be dev_err() or something in case of
> reg_write() returns an error?

Hi Pavel

An error is always returned to the caller. It should be the caller who
handles error recovery, and if need be prints an error message. The
only time we would print an error message is in a void function, when
we cannot return an error code.

I've also followed what we do in mv88e6xxx. It works fine there, so I
don't see the need to do anything different here.

      Andrew
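
To make the point about the hidden return concrete, here is a minimal
user-space sketch contrasting the two styles. It is not the driver's code:
reg_write() is a fake that fails for register numbers above 31, the macro
is a simplified version of the pattern being removed, and
setup_with_macro() and setup_inlined() are hypothetical callers.

```c
#include <errno.h>

/* Hypothetical fake of the driver's register write, for illustration:
 * fails with -EIO for register numbers above 31, succeeds otherwise.
 */
static int reg_write(int addr, int reg, unsigned int val)
{
	(void)addr;
	(void)val;
	return reg > 31 ? -EIO : 0;
}

/* Macro style: the 'return' is hidden inside the macro body. */
#define REG_WRITE(addr, reg, val)			\
	do {						\
		int __ret = reg_write(addr, reg, val);	\
		if (__ret < 0)				\
			return __ret;			\
	} while (0)

static int setup_with_macro(int *steps_done)
{
	*steps_done = 0;
	REG_WRITE(0x10, 40, 0x1);	/* leaves the function with -EIO here */
	*steps_done = 1;		/* not reached in this example */
	return 0;
}

/* Inlined style: the early exit is visible at the call site. */
static int setup_inlined(int *steps_done)
{
	int ret;

	*steps_done = 0;
	ret = reg_write(0x10, 40, 0x1);
	if (ret)
		return ret;	/* the caller decides how to recover or report */
	*steps_done = 1;
	return 0;
}
```

Both variants hand the error back to the caller, so no message needs to be
printed at this level. The difference is only that the inlined form does
not hide control flow behind a macro.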


* Re: [PATCHv3 2/6] Documentation: dt: socfpga: Add S10 System Manager binding
From: Rob Herring @ 2019-01-30 15:51 UTC (permalink / raw)
  To: thor.thayer
  Cc: lee.jones, arnd, dinguyen, linux, catalin.marinas, will.deacon,
	peppe.cavallaro, alexandre.torgue, joabreu, davem,
	mcoquelin.stm32, mchehab+samsung, mark.rutland, bjorn.andersson,
	olof, devicetree, linux-kernel, linux-arm-kernel, netdev,
	Thor Thayer
In-Reply-To: <1548713655-25940-3-git-send-email-thor.thayer@linux.intel.com>

On Mon, 28 Jan 2019 16:14:11 -0600, thor.thayer@linux.intel.com wrote:
> From: Thor Thayer <thor.thayer@linux.intel.com>
> 
> Add the device tree bindings for the Stratix10 System Manager.
> 
> Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
> ---
> v2  New compatible string and usage for Stratix10
> v3  No change
> ---
>  .../devicetree/bindings/arm/altera/socfpga-system.txt        | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 

Please add Acked-by/Reviewed-by tags when posting new versions. However,
there's no need to repost patches *only* to add the tags. The upstream
maintainer will do that for acks received on the version they apply.

If a tag was not added on purpose, please state why and what changed.


* Re: [PATCH net-next v2 10/12] net: dsa: Wire up multicast IGMP snooping attribute notification
From: Andrew Lunn @ 2019-01-30 16:06 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, vivien.didelot, davem, idosch, jiri, ilias.apalodimas,
	ivan.khoronzhuk, roopa, nikolay
In-Reply-To: <20190130005548.2212-11-f.fainelli@gmail.com>

On Tue, Jan 29, 2019 at 04:55:46PM -0800, Florian Fainelli wrote:
> The bridge can at runtime be configured with or without IGMP snooping
> enabled but we were not processing the switchdev attribute that notifies
> about that toggle, do this now.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
>  include/net/dsa.h  |  2 ++
>  net/dsa/dsa_priv.h | 11 +++++++++++
>  net/dsa/port.c     | 13 +++++++++++++
>  net/dsa/slave.c    |  4 ++++
>  net/dsa/switch.c   | 28 ++++++++++++++++++++++++++++
>  5 files changed, 58 insertions(+)
> 
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index 7f2a668ef2cc..2ee1ede7df5c 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -425,6 +425,8 @@ struct dsa_switch_ops {
>  	/*
>  	 * Multicast database
>  	 */
> +	int	(*port_multicast_toggle)(struct dsa_switch *ds, int port,
> +					 bool mc_disabled);


Hi Florian

Looks like there is an extra tab in there?

      Andrew


* Re: [Patch net] xfrm: destroy xfrm_state synchronously on net exit path
From: Cong Wang @ 2019-01-30 16:33 UTC (permalink / raw)
  To: Linux Kernel Network Developers; +Cc: syzbot, Steffen Klassert
In-Reply-To: <20190130062737.15504-1-xiyou.wangcong@gmail.com>

On Tue, Jan 29, 2019 at 10:27 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> diff --git a/net/ipv6/xfrm6_tunnel.c b/net/ipv6/xfrm6_tunnel.c
> index f5b4febeaa25..08bf374a80eb 100644
> --- a/net/ipv6/xfrm6_tunnel.c
> +++ b/net/ipv6/xfrm6_tunnel.c
> @@ -344,8 +344,7 @@ static void __net_exit xfrm6_tunnel_net_exit(struct net *net)
>         struct xfrm6_tunnel_net *xfrm6_tn = xfrm6_tunnel_pernet(net);
>         unsigned int i;
>
> -       xfrm_state_flush(net, IPSEC_PROTO_ANY, false);
> -       xfrm_flush_gc();
> +       xfrm_state_flush(net, IPSEC_PROTO_ANY, false, true);

Well... We still have to wait for work scheduled from other call paths.

I will send v2.


* Re: [PATCH bpf-next v3 1/4] bpf: add plumbing for BPF_LWT_ENCAP_IP in bpf_lwt_push_encap
From: Peter Oskolkov @ 2019-01-30 16:39 UTC (permalink / raw)
  To: David Ahern
  Cc: Peter Oskolkov, Alexei Starovoitov, Daniel Borkmann, netdev,
	Willem de Bruijn
In-Reply-To: <f4140440-c172-b06d-10a9-2143ec1d9d06@gmail.com>

On Tue, Jan 29, 2019 at 7:29 PM David Ahern <dsahern@gmail.com> wrote:
>
> On 1/28/19 6:12 PM, Peter Oskolkov wrote
> > @@ -2583,7 +2594,15 @@ enum bpf_ret_code {
> >       BPF_DROP = 2,
> >       /* 3-6 reserved */
> >       BPF_REDIRECT = 7,
> > -     /* >127 are reserved for prog type specific return codes */
> > +     /* >127 are reserved for prog type specific return codes.
> > +      *
> > +      * BPF_LWT_REROUTE: used by BPF_PROG_TYPE_LWT_IN and
> > +      *    BPF_PROG_TYPE_LWT_XMIT to indicate that skb's dst
> > +      *    has changed and appropriate dst_input() or dst_output()
> > +      *    action has to be taken (this is an L3 redirect, as
> > +      *    opposed to L2 redirect represented by BPF_REDIRECT above).
> > +      */
> > +     BPF_LWT_REROUTE = 128,
> >  };
>
> What happens if a program pushes a new header onto the skb and does not
> return BPF_LWT_REROUTE?
>
> Might be better to move the route lookup and dst swap to run_lwt_bpf and
> only do it if the program returns BPF_LWT_REROUTE. That allows calling
> bpf_push_ip_encap without requiring a route lookup. That might be fine
> as long as their is not a protocol mismatch (ipv4 packet gets an ipv6
> header or vice versa). But then, I think you have the mismatch problem
> now if the program does not return BPF_LWT_REROUTE.

Makes sense, thanks for the suggestion. I'll send a v4 later today.

Thanks,
Peter
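
The flow suggested above (do the route lookup and dst swap only when the
program returns BPF_LWT_REROUTE) could look roughly like the user-space
sketch below. It is an illustration under stated assumptions, not the
planned v4: struct fake_skb, fake_reroute() and handle_prog_result() are
hypothetical names, and the error codes chosen for BPF_DROP and for
unknown values are assumptions of this sketch.

```c
#include <errno.h>

/* Return codes as listed in the quoted enum; BPF_OK = 0 comes from the
 * upstream uapi header rather than from the quoted hunk.
 */
enum bpf_ret_code {
	BPF_OK = 0,
	BPF_DROP = 2,
	BPF_REDIRECT = 7,
	BPF_LWT_REROUTE = 128,
};

/* Hypothetical bookkeeping so the sketch can be observed; not kernel code. */
struct fake_skb {
	int route_lookups;	/* how many times a route lookup was done */
};

/* Stand-in for the route lookup and dst swap. */
static int fake_reroute(struct fake_skb *skb)
{
	skb->route_lookups++;
	return 0;
}

/*
 * Sketch of the suggested flow: the lookup runs only when the program
 * asks for it by returning BPF_LWT_REROUTE. BPF_REDIRECT is left out of
 * this sketch and falls into the default case.
 */
static int handle_prog_result(struct fake_skb *skb, int ret)
{
	switch (ret) {
	case BPF_OK:
		return 0;		/* continue on the existing dst */
	case BPF_LWT_REROUTE:
		return fake_reroute(skb);
	case BPF_DROP:
		return -EPERM;		/* assumed error code for a drop */
	default:
		return -EINVAL;		/* unknown code: treated as an error */
	}
}
```

With this shape, a program that pushes a header but returns BPF_OK never
triggers a lookup, which is exactly the case where the protocol mismatch
raised above would still need separate handling.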


* Crashes in skb clone/allocation in 4.19.18
From: Ivan Babrou @ 2019-01-30 16:51 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Eric Dumazet, Ignat Korchagin, Shawn Bohrer,
	Jakub Sitnicki

Hey,

We've upgraded some machines from 4.19.13 to 4.19.18 and some of them
crashed with the following:

[ 2313.192006] general protection fault: 0000 [#1] SMP PTI
[ 2313.205924] CPU: 32 PID: 65437 Comm: nginx-fl Tainted: G
O      4.19.18-cloudflare-2019.1.8 #2019.1.8
[ 2313.224973] Hardware name: Quanta Computer Inc. QuantaPlex
T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
[ 2313.243400] RIP: 0010:kmem_cache_alloc_node+0x178/0x1f0
[ 2313.257768] Code: 89 fa 4c 89 f6 e8 68 40 a1 00 4c 8b 55 00 58 4d
85 d2 75 d6 e9 6f ff ff ff 41 8b 59 20 48 8d 4a 01 4c 89 f8 49 8b 39
4c 01 fb <48> 33 1b 49 33 99 38 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0
0f 84
[ 2313.295550] RSP: 0000:ffff94457f903b48 EFLAGS: 00010202
[ 2313.310352] RAX: 08b82daf1f57da0e RBX: 08b82daf1f57da0e RCX: 00000000005ff72d
[ 2313.327189] RDX: 00000000005ff72c RSI: 0000000000480220 RDI: 0000000000026e40
[ 2313.344029] RBP: ffff94457f04d680 R08: ffff94457f926e40 R09: ffff94457f04d680
[ 2313.360912] R10: 000004ce652a0026 R11: 0000000000000000 R12: 0000000000480220
[ 2313.377857] R13: 00000000ffffffff R14: ffffffffb1ab3ab7 R15: 08b82daf1f57da0e
[ 2313.394820] FS:  00007fdea755c780(0000) GS:ffff94457f900000(0000)
knlGS:0000000000000000
[ 2313.412887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2313.428581] CR2: 000055acc3cf517b CR3: 000000201b1ea003 CR4: 00000000003606e0
[ 2313.445753] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2313.462843] perf: interrupt took too long (8028 > 7291), lowering
kernel.perf_event_max_sample_rate to 24000
[ 2313.462867] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2313.500216] Call Trace:
[ 2313.512833]  <IRQ>
[ 2313.524748]  __alloc_skb+0x57/0x1d0
[ 2313.537934]  __tcp_send_ack.part.48+0x2f/0x100
[ 2313.551845]  tcp_rcv_established+0x550/0x640
[ 2313.565394]  tcp_v4_do_rcv+0x12a/0x1e0
[ 2313.578322]  tcp_v4_rcv+0xadc/0xbd0
[ 2313.590993]  ip_local_deliver_finish+0x5d/0x1d0
[ 2313.604727]  ip_local_deliver+0x6b/0xe0
[ 2313.617782]  ? ip_sublist_rcv+0x200/0x200
[ 2313.630415] perf: interrupt took too long (10040 > 10035), lowering
kernel.perf_event_max_sample_rate to 19000
[ 2313.630948]  ip_rcv+0x52/0xd0
[ 2313.662850]  ? ip_rcv_core.isra.22+0x2b0/0x2b0
[ 2313.662857]  __netif_receive_skb_one_core+0x52/0x70
[ 2313.690860]  netif_receive_skb_internal+0x34/0xe0
[ 2313.690883]  efx_rx_deliver+0x11a/0x180 [sfc]
[ 2313.717780]  ? __efx_rx_packet+0x1ef/0x730 [sfc]
[ 2313.717786]  ? __queue_work+0x103/0x3e0
[ 2313.743118]  ? efx_poll+0x35e/0x460 [sfc]
[ 2313.743125]  ? net_rx_action+0x138/0x360
[ 2313.767356]  ? __do_softirq+0xd8/0x2d2
[ 2313.767362]  ? irq_exit+0xb4/0xc0
[ 2313.790680]  ? do_IRQ+0x85/0xd0
[ 2313.790688]  ? common_interrupt+0xf/0xf
[ 2313.790694]  </IRQ>
[ 2313.823837] Modules linked in: tun xt_connlimit nf_conncount xt_bpf
xt_hashlimit cls_flow cls_u32 sch_htb sch_fq md_mod dm_crypt
algif_skcipher af_alg dm_mod dax ip6table_nat nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw xt_nat iptable_nat
nf_nat_ipv4 nf_nat xt_TPROXY nf_tproxy_ipv6 nf_tproxy_ipv4 xt_connmark
iptable_mangle xt_owner xt_CT xt_socket nf_socket_ipv4 nf_socket_ipv6
iptable_raw ip6table_filter ip6_tables nfnetlink_log xt_NFLOG
xt_tcpudp xt_comment xt_conntrack nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 xt_mark xt_multiport xt_set iptable_filter bpfilter
ip_set_hash_netport ip_set_hash_net ip_set_hash_ip ip_set nfnetlink
8021q garp mrp stp llc sb_edac x86_pkg_temp_thermal kvm_intel kvm
irqbypass crc32_pclmul crc32c_intel pcbc aesni_intel aes_x86_64
ipmi_ssif crypto_simd cryptd
[ 2313.952153]  sfc(O) glue_helper igb i2c_algo_bit ipmi_si mdio dca
ipmi_devintf ipmi_msghandler efivarfs ip_tables x_tables
[ 2313.952238] ---[ end trace 477d8e3081c605f6 ]---

Some nodes also crashed in skb_clone, rather than __alloc_skb:

[ 3810.686137] general protection fault: 0000 [#1] SMP PTI
[ 3810.694579] CPU: 64 PID: 69338 Comm: nginx-fl Not tainted
4.19.18-cloudflare-2019.1.8 #2019.1.8
[ 3810.706589] Hardware name: Quanta Cloud Technology Inc. QuantaPlex
T42S-2U(LBG-4) ^S5SZ090028/T42S-2U MB (Lewisburg-4), BIOS 3A11.Q10
06/29/2018
[ 3810.726475] RIP: 0010:kmem_cache_alloc+0x89/0x1c0
[ 3810.734701] Code: 82 72 49 83 78 10 00 4d 8b 30 0f 84 0e 01 00 00
4d 85 f6 0f 84 05 01 00 00 41 8b 5f 20 48 8d 4a 01 4c 89 f0 49 8b 3f
4c 01 f3 <48> 33 1b 49 33 9f 38 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0
74 b2
[ 3810.761088] RSP: 0000:ffff99723fe03730 EFLAGS: 00010282
[ 3810.770132] RAX: f0382d8aebf1ae68 RBX: f0382d8aebf1ae68 RCX: 0000000001cb61cf
[ 3810.781105] RDX: 0000000001cb61ce RSI: 0000000000480020 RDI: 0000000000027550
[ 3810.792012] RBP: ffff99723f19d500 R08: ffff99723fe27550 R09: 00000000000005dc
[ 3810.802820] R10: ffff9992227c0000 R11: 0000000000004000 R12: 0000000000480020
[ 3810.813589] R13: ffffffff8dcb5f7d R14: f0382d8aebf1ae68 R15: ffff99723f19d500
[ 3810.824382] FS:  00007f2a8863c780(0000) GS:ffff99723fe00000(0000)
knlGS:0000000000000000
[ 3810.836189] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3810.845662] CR2: 000055820762eecd CR3: 00000019eb850003 CR4: 00000000007606e0
[ 3810.856567] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3810.867600] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3810.878554] PKRU: 55555554
[ 3810.884787] Call Trace:
[ 3810.890601]  <IRQ>
[ 3810.896116]  skb_clone+0x4d/0xb0
[ 3810.902712]  dev_queue_xmit_nit+0xd9/0x260
[ 3810.910181]  dev_hard_start_xmit+0x69/0x1f0
[ 3810.917784]  __dev_queue_xmit+0x6f7/0x8a0
[ 3810.925172]  ? eth_header+0x26/0xc0
[ 3810.932053]  ip_finish_output2+0x193/0x400
[ 3810.939670]  ? ip_finish_output+0x139/0x270
[ 3810.947241]  ip_output+0x6c/0xe0
[ 3810.953844]  ? ip_append_data.part.51+0xc0/0xc0
[ 3810.961802]  __tcp_transmit_skb+0x511/0xaa0
[ 3810.969420]  __tcp_retransmit_skb+0x19c/0x7c0
[ 3810.977209]  ? tcp_current_mss+0x57/0xa0
[ 3810.984493]  tcp_retransmit_skb+0x12/0x80
[ 3810.991894]  tcp_xmit_retransmit_queue.part.50+0x147/0x240
[ 3811.000754]  tcp_ack+0x9c4/0x11b0
[ 3811.007416]  tcp_rcv_established+0x190/0x640
[ 3811.015065]  ? tcp_v4_inbound_md5_hash+0x69/0x160
[ 3811.023106]  tcp_v4_do_rcv+0x12a/0x1e0
[ 3811.030190]  tcp_v4_rcv+0xadc/0xbd0
[ 3811.037009]  ip_local_deliver_finish+0x5d/0x1d0
[ 3811.044859]  ip_local_deliver+0x6b/0xe0
[ 3811.051999]  ? ip_sublist_rcv+0x200/0x200
[ 3811.059325]  ip_rcv+0x52/0xd0
[ 3811.065595]  ? ip_rcv_core.isra.22+0x2b0/0x2b0
[ 3811.073361]  __netif_receive_skb_one_core+0x52/0x70
[ 3811.081621]  netif_receive_skb_internal+0x34/0xe0
[ 3811.089652]  napi_gro_receive+0xba/0xe0
[ 3811.096969]  mlx5e_handle_rx_cqe+0x1eb/0x530 [mlx5_core]
[ 3811.105545]  ? skb_release_head_state+0x5c/0xb0
[ 3811.113447]  mlx5e_poll_rx_cq+0xc8/0x910 [mlx5_core]
[ 3811.121652]  mlx5e_napi_poll+0xb1/0xc60 [mlx5_core]
[ 3811.129574]  net_rx_action+0x138/0x360
[ 3811.136266]  __do_softirq+0xd8/0x2d2
[ 3811.142679]  irq_exit+0xb4/0xc0
[ 3811.148578]  do_IRQ+0x85/0xd0
[ 3811.154254]  common_interrupt+0xf/0xf
[ 3811.160585]  </IRQ>
[ 3811.165319] RIP: 0033:0x5581e1551ca0
[ 3811.171546] Code: e8 10 41 ff 24 ee 81 7c ca 04 ff ff fe ff 0f 83
87 1c 00 00 8b 03 0f b6 cc 0f b6 e8 83 c3 04 c1 e8 10 41 ff 24 ee 48
8b 2c c2 <48> 89 2c ca 8b 03 0f b6 cc 0f b6 e8 83 c3 04 c1 e8 10 41 ff
24 ee
[ 3811.195925] RSP: 002b:00007ffdd615ebc0 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffffde
[ 3811.206319] RAX: 0000000000000000 RBX: 00000000406c9058 RCX: 000000000000000b
[ 3811.216321] RDX: 000000004099cdc8 RSI: fffffffb40c07eb0 RDI: 000000004183d738
[ 3811.226277] RBP: fffffff444c8c5c0 R08: 000000004099cdc8 R09: 00000000425ce3d8
[ 3811.236340] R10: 0000000044c8c5c0 R11: 000000004139cbb0 R12: 0000000000000000
[ 3811.246349] R13: 00005581ead6a9e0 R14: 000000004166afe8 R15: 00000000406c90f8
[ 3811.256320] Modules linked in: tun xt_connlimit nf_conncount xt_bpf
xt_hashlimit cls_flow cls_u32 sch_htb sch_fq md_mod dm_crypt
algif_skcipher af_alg dm_mod dax ip6table_nat nf_nat_ipv6
ip6table_mangle ip6table_security ip6table_raw ip6table_filter
ip6_tables xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_TPROXY
nf_tproxy_ipv6 nf_tproxy_ipv4 xt_connmark iptable_mangle xt_owner
xt_CT xt_socket nf_socket_ipv4 nf_socket_ipv6 iptable_raw
nfnetlink_log xt_NFLOG xt_tcpudp xt_comment xt_conntrack nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 xt_mark xt_multiport xt_set
iptable_filter bpfilter ip_set_hash_netport ip_set_hash_net
ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc skx_edac
x86_pkg_temp_thermal kvm_intel kvm irqbypass ipmi_ssif crc32_pclmul
crc32c_intel pcbc aesni_intel aes_x86_64 crypto_simd mlx5_core
[ 3811.351698]  cryptd xhci_pci tpm_crb mlxfw glue_helper ioatdma
devlink ipmi_si xhci_hcd dca ipmi_devintf ipmi_msghandler tpm_tis
tpm_tis_core tpm efivarfs ip_tables x_tables
[ 3811.375161] ---[ end trace 1a7795bb39a63cf7 ]---

Is this known? Could it be related to this commit:

* https://github.com/torvalds/linux/commit/598e57e029290be3e7f8f87ff908091a5a22ed2f

Thanks!


* Re: Crashes in skb clone/allocation in 4.19.18
From: Eric Dumazet @ 2019-01-30 17:00 UTC (permalink / raw)
  To: Ivan Babrou, netdev
  Cc: David S. Miller, Eric Dumazet, Ignat Korchagin, Shawn Bohrer,
	Jakub Sitnicki
In-Reply-To: <CABWYdi1UrpkV28HyenPZgqPnn+_sPqxT4XoP_HED2DC0ixxG-w@mail.gmail.com>



On 01/30/2019 08:51 AM, Ivan Babrou wrote:
> Hey,
> 
> We've upgraded some machines from 4.19.13 to 4.19.18 and some of them
> crashed with the following:
> 
> [ 2313.192006] general protection fault: 0000 [#1] SMP PTI
> [ 2313.205924] CPU: 32 PID: 65437 Comm: nginx-fl Tainted: G
> O      4.19.18-cloudflare-2019.1.8 #2019.1.8
> [ 2313.224973] Hardware name: Quanta Computer Inc. QuantaPlex
> T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
> [ 2313.243400] RIP: 0010:kmem_cache_alloc_node+0x178/0x1f0
> [ 2313.257768] Code: 89 fa 4c 89 f6 e8 68 40 a1 00 4c 8b 55 00 58 4d
> 85 d2 75 d6 e9 6f ff ff ff 41 8b 59 20 48 8d 4a 01 4c 89 f8 49 8b 39
> 4c 01 fb <48> 33 1b 49 33 99 38 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0
> 0f 84
> [ 2313.295550] RSP: 0000:ffff94457f903b48 EFLAGS: 00010202
> [ 2313.310352] RAX: 08b82daf1f57da0e RBX: 08b82daf1f57da0e RCX: 00000000005ff72d
> [ 2313.327189] RDX: 00000000005ff72c RSI: 0000000000480220 RDI: 0000000000026e40
> [ 2313.344029] RBP: ffff94457f04d680 R08: ffff94457f926e40 R09: ffff94457f04d680
> [ 2313.360912] R10: 000004ce652a0026 R11: 0000000000000000 R12: 0000000000480220
> [ 2313.377857] R13: 00000000ffffffff R14: ffffffffb1ab3ab7 R15: 08b82daf1f57da0e
> [ 2313.394820] FS:  00007fdea755c780(0000) GS:ffff94457f900000(0000)
> knlGS:0000000000000000
> [ 2313.412887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2313.428581] CR2: 000055acc3cf517b CR3: 000000201b1ea003 CR4: 00000000003606e0
> [ 2313.445753] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2313.462843] perf: interrupt took too long (8028 > 7291), lowering
> kernel.perf_event_max_sample_rate to 24000
> [ 2313.462867] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2313.500216] Call Trace:
> [ 2313.512833]  <IRQ>
> [ 2313.524748]  __alloc_skb+0x57/0x1d0
> [ 2313.537934]  __tcp_send_ack.part.48+0x2f/0x100
> [ 2313.551845]  tcp_rcv_established+0x550/0x640
> [ 2313.565394]  tcp_v4_do_rcv+0x12a/0x1e0
> [ 2313.578322]  tcp_v4_rcv+0xadc/0xbd0
> [ 2313.590993]  ip_local_deliver_finish+0x5d/0x1d0
> [ 2313.604727]  ip_local_deliver+0x6b/0xe0
> [ 2313.617782]  ? ip_sublist_rcv+0x200/0x200
> [ 2313.630415] perf: interrupt took too long (10040 > 10035), lowering
> kernel.perf_event_max_sample_rate to 19000
> [ 2313.630948]  ip_rcv+0x52/0xd0
> [ 2313.662850]  ? ip_rcv_core.isra.22+0x2b0/0x2b0
> [ 2313.662857]  __netif_receive_skb_one_core+0x52/0x70
> [ 2313.690860]  netif_receive_skb_internal+0x34/0xe0
> [ 2313.690883]  efx_rx_deliver+0x11a/0x180 [sfc]
> [ 2313.717780]  ? __efx_rx_packet+0x1ef/0x730 [sfc]
> [ 2313.717786]  ? __queue_work+0x103/0x3e0
> [ 2313.743118]  ? efx_poll+0x35e/0x460 [sfc]
> [ 2313.743125]  ? net_rx_action+0x138/0x360
> [ 2313.767356]  ? __do_softirq+0xd8/0x2d2
> [ 2313.767362]  ? irq_exit+0xb4/0xc0
> [ 2313.790680]  ? do_IRQ+0x85/0xd0
> [ 2313.790688]  ? common_interrupt+0xf/0xf
> [ 2313.790694]  </IRQ>
> [ 2313.823837] Modules linked in: tun xt_connlimit nf_conncount xt_bpf
> xt_hashlimit cls_flow cls_u32 sch_htb sch_fq md_mod dm_crypt
> algif_skcipher af_alg dm_mod dax ip6table_nat nf_nat_ipv6
> ip6table_mangle ip6table_security ip6table_raw xt_nat iptable_nat
> nf_nat_ipv4 nf_nat xt_TPROXY nf_tproxy_ipv6 nf_tproxy_ipv4 xt_connmark
> iptable_mangle xt_owner xt_CT xt_socket nf_socket_ipv4 nf_socket_ipv6
> iptable_raw ip6table_filter ip6_tables nfnetlink_log xt_NFLOG
> xt_tcpudp xt_comment xt_conntrack nf_conntrack nf_defrag_ipv6
> nf_defrag_ipv4 xt_mark xt_multiport xt_set iptable_filter bpfilter
> ip_set_hash_netport ip_set_hash_net ip_set_hash_ip ip_set nfnetlink
> 8021q garp mrp stp llc sb_edac x86_pkg_temp_thermal kvm_intel kvm
> irqbypass crc32_pclmul crc32c_intel pcbc aesni_intel aes_x86_64
> ipmi_ssif crypto_simd cryptd
> [ 2313.952153]  sfc(O) glue_helper igb i2c_algo_bit ipmi_si mdio dca
> ipmi_devintf ipmi_msghandler efivarfs ip_tables x_tables
> [ 2313.952238] ---[ end trace 477d8e3081c605f6 ]---
> 
> Some nodes also crashed in skb_clone, rather than __alloc_skb:
> 
> [ 3810.686137] general protection fault: 0000 [#1] SMP PTI
> [ 3810.694579] CPU: 64 PID: 69338 Comm: nginx-fl Not tainted
> 4.19.18-cloudflare-2019.1.8 #2019.1.8
> [ 3810.706589] Hardware name: Quanta Cloud Technology Inc. QuantaPlex
> T42S-2U(LBG-4) ^S5SZ090028/T42S-2U MB (Lewisburg-4), BIOS 3A11.Q10
> 06/29/2018
> [ 3810.726475] RIP: 0010:kmem_cache_alloc+0x89/0x1c0
> [ 3810.734701] Code: 82 72 49 83 78 10 00 4d 8b 30 0f 84 0e 01 00 00
> 4d 85 f6 0f 84 05 01 00 00 41 8b 5f 20 48 8d 4a 01 4c 89 f0 49 8b 3f
> 4c 01 f3 <48> 33 1b 49 33 9f 38 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0
> 74 b2
> [ 3810.761088] RSP: 0000:ffff99723fe03730 EFLAGS: 00010282
> [ 3810.770132] RAX: f0382d8aebf1ae68 RBX: f0382d8aebf1ae68 RCX: 0000000001cb61cf
> [ 3810.781105] RDX: 0000000001cb61ce RSI: 0000000000480020 RDI: 0000000000027550
> [ 3810.792012] RBP: ffff99723f19d500 R08: ffff99723fe27550 R09: 00000000000005dc
> [ 3810.802820] R10: ffff9992227c0000 R11: 0000000000004000 R12: 0000000000480020
> [ 3810.813589] R13: ffffffff8dcb5f7d R14: f0382d8aebf1ae68 R15: ffff99723f19d500
> [ 3810.824382] FS:  00007f2a8863c780(0000) GS:ffff99723fe00000(0000)
> knlGS:0000000000000000
> [ 3810.836189] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3810.845662] CR2: 000055820762eecd CR3: 00000019eb850003 CR4: 00000000007606e0
> [ 3810.856567] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 3810.867600] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 3810.878554] PKRU: 55555554
> [ 3810.884787] Call Trace:
> [ 3810.890601]  <IRQ>
> [ 3810.896116]  skb_clone+0x4d/0xb0
> [ 3810.902712]  dev_queue_xmit_nit+0xd9/0x260
> [ 3810.910181]  dev_hard_start_xmit+0x69/0x1f0
> [ 3810.917784]  __dev_queue_xmit+0x6f7/0x8a0
> [ 3810.925172]  ? eth_header+0x26/0xc0
> [ 3810.932053]  ip_finish_output2+0x193/0x400
> [ 3810.939670]  ? ip_finish_output+0x139/0x270
> [ 3810.947241]  ip_output+0x6c/0xe0
> [ 3810.953844]  ? ip_append_data.part.51+0xc0/0xc0
> [ 3810.961802]  __tcp_transmit_skb+0x511/0xaa0
> [ 3810.969420]  __tcp_retransmit_skb+0x19c/0x7c0
> [ 3810.977209]  ? tcp_current_mss+0x57/0xa0
> [ 3810.984493]  tcp_retransmit_skb+0x12/0x80
> [ 3810.991894]  tcp_xmit_retransmit_queue.part.50+0x147/0x240
> [ 3811.000754]  tcp_ack+0x9c4/0x11b0
> [ 3811.007416]  tcp_rcv_established+0x190/0x640
> [ 3811.015065]  ? tcp_v4_inbound_md5_hash+0x69/0x160
> [ 3811.023106]  tcp_v4_do_rcv+0x12a/0x1e0
> [ 3811.030190]  tcp_v4_rcv+0xadc/0xbd0
> [ 3811.037009]  ip_local_deliver_finish+0x5d/0x1d0
> [ 3811.044859]  ip_local_deliver+0x6b/0xe0
> [ 3811.051999]  ? ip_sublist_rcv+0x200/0x200
> [ 3811.059325]  ip_rcv+0x52/0xd0
> [ 3811.065595]  ? ip_rcv_core.isra.22+0x2b0/0x2b0
> [ 3811.073361]  __netif_receive_skb_one_core+0x52/0x70
> [ 3811.081621]  netif_receive_skb_internal+0x34/0xe0
> [ 3811.089652]  napi_gro_receive+0xba/0xe0
> [ 3811.096969]  mlx5e_handle_rx_cqe+0x1eb/0x530 [mlx5_core]
> [ 3811.105545]  ? skb_release_head_state+0x5c/0xb0
> [ 3811.113447]  mlx5e_poll_rx_cq+0xc8/0x910 [mlx5_core]
> [ 3811.121652]  mlx5e_napi_poll+0xb1/0xc60 [mlx5_core]
> [ 3811.129574]  net_rx_action+0x138/0x360
> [ 3811.136266]  __do_softirq+0xd8/0x2d2
> [ 3811.142679]  irq_exit+0xb4/0xc0
> [ 3811.148578]  do_IRQ+0x85/0xd0
> [ 3811.154254]  common_interrupt+0xf/0xf
> [ 3811.160585]  </IRQ>
> [ 3811.165319] RIP: 0033:0x5581e1551ca0
> [ 3811.171546] Code: e8 10 41 ff 24 ee 81 7c ca 04 ff ff fe ff 0f 83
> 87 1c 00 00 8b 03 0f b6 cc 0f b6 e8 83 c3 04 c1 e8 10 41 ff 24 ee 48
> 8b 2c c2 <48> 89 2c ca 8b 03 0f b6 cc 0f b6 e8 83 c3 04 c1 e8 10 41 ff
> 24 ee
> [ 3811.195925] RSP: 002b:00007ffdd615ebc0 EFLAGS: 00000246 ORIG_RAX:
> ffffffffffffffde
> [ 3811.206319] RAX: 0000000000000000 RBX: 00000000406c9058 RCX: 000000000000000b
> [ 3811.216321] RDX: 000000004099cdc8 RSI: fffffffb40c07eb0 RDI: 000000004183d738
> [ 3811.226277] RBP: fffffff444c8c5c0 R08: 000000004099cdc8 R09: 00000000425ce3d8
> [ 3811.236340] R10: 0000000044c8c5c0 R11: 000000004139cbb0 R12: 0000000000000000
> [ 3811.246349] R13: 00005581ead6a9e0 R14: 000000004166afe8 R15: 00000000406c90f8
> [ 3811.256320] Modules linked in: tun xt_connlimit nf_conncount xt_bpf
> xt_hashlimit cls_flow cls_u32 sch_htb sch_fq md_mod dm_crypt
> algif_skcipher af_alg dm_mod dax ip6table_nat nf_nat_ipv6
> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> ip6_tables xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_TPROXY
> nf_tproxy_ipv6 nf_tproxy_ipv4 xt_connmark iptable_mangle xt_owner
> xt_CT xt_socket nf_socket_ipv4 nf_socket_ipv6 iptable_raw
> nfnetlink_log xt_NFLOG xt_tcpudp xt_comment xt_conntrack nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 xt_mark xt_multiport xt_set
> iptable_filter bpfilter ip_set_hash_netport ip_set_hash_net
> ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc skx_edac
> x86_pkg_temp_thermal kvm_intel kvm irqbypass ipmi_ssif crc32_pclmul
> crc32c_intel pcbc aesni_intel aes_x86_64 crypto_simd mlx5_core
> [ 3811.351698]  cryptd xhci_pci tpm_crb mlxfw glue_helper ioatdma
> devlink ipmi_si xhci_hcd dca ipmi_devintf ipmi_msghandler tpm_tis
> tpm_tis_core tpm efivarfs ip_tables x_tables
> [ 3811.375161] ---[ end trace 1a7795bb39a63cf7 ]---
> 
> Is this known? Could it be related to this commit:
> 
> * https://github.com/torvalds/linux/commit/598e57e029290be3e7f8f87ff908091a5a22ed2f
> 

I do not believe this commit could explain these crashes.

Given there are about 580 commits between 4.19.13 and 4.19.18, a bisection might be the easiest way
to find the problem.
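
For scale, a bisection over roughly 580 commits should need on the order of
ten build-and-test rounds. The sketch below computes that bound and lists, as
comments, the usual git bisect commands. It assumes a linux-stable checkout in
which the v4.19.13 and v4.19.18 tags exist; bisect_rounds is an illustrative
helper name, not part of git, and a tree carrying local patches may need extra
care at each step.

```shell
#!/bin/sh
# Upper bound on build-and-test rounds when bisecting n commits: ceil(log2(n)).
bisect_rounds() {
    n=$1
    rounds=0
    span=1
    # Double the covered span until it reaches n; each doubling is one round.
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        rounds=$((rounds + 1))
    done
    echo "$rounds"
}

# Typical session, run by hand in the kernel tree (left as comments on purpose):
#   git bisect start
#   git bisect bad v4.19.18
#   git bisect good v4.19.13
#   ... build, boot and test, then mark each step with
#   "git bisect good" or "git bisect bad" until git names the first bad commit
#   git bisect reset

bisect_rounds 580
```

With crashes that only show up on some machines, each "good" verdict probably
needs a longer soak time than each "bad" one.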

Thanks.


^ permalink raw reply

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
From: John David Anglin @ 2019-01-30 17:08 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev
In-Reply-To: <20190123002240.GF3634@lunn.ch>

On 2019-01-22 7:22 p.m., Andrew Lunn wrote:
> From my Espressobin
>
> cat /proc/interrupts
> ...
>  44:          0          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-g1-atu-prob
>  46:          0          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-g1-vtu-prob
>  48:         38         24  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
>  51:          0          1  mv88e6xxx-g2   1 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:11
>  52:          0          0  mv88e6xxx-g2   2 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:12
>  53:         38         23  mv88e6xxx-g2   3 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:13
>
> These are PHY interrupts.
If we come back to my trying to use the INTn pin on the Espressobin, I
have found that clearing and resetting the interrupt
enable bits in the global control register (offset 0x4) restarts link
detection when the device is stuck.  This suggests that the
INTn connection to MPP2_23 is low when the GIC interrupt is enabled
on this pin.  Possibly, this is caused by the fact
that EEIntEn is set to 1 on reset.  INTn then goes low when EEPROM
loading is done.  Another possibility might be race conditions
in processing interrupts.

Thoughts?

Dave

-- 
John David Anglin  dave.anglin@bell.net



^ permalink raw reply

* selftests: net: test_vxlan_fdb_changelink.sh: expected two remotes after link set [FAIL]
From: Naresh Kamboju @ 2019-01-30 17:13 UTC (permalink / raw)
  To: petrm; +Cc: Netdev

The newly added selftest net: test_vxlan_fdb_changelink.sh fails on
4.19, 4.14, 4.9 and 4.4.
It passes on 4.20, mainline and -next.
This test case was added when the kselftest version was updated to 4.20.

selftests: net: test_vxlan_fdb_changelink.sh
expected two remotes after fdb append [ OK ]
expected two remotes after link set [FAIL]
not ok 1.. selftests: net: test_vxlan_fdb_changelink.sh [FAIL]
selftests: net_test_vxlan_fdb_changelink.sh [FAIL]

Test output with set -x:
------------------------------
+ ./test_vxlan_fdb_changelink.sh
+ ip link add name vx up type vxlan id 2000 dstport 4789
+ bridge fdb ap dev vx 00:00:00:00:00:00 dst 192.0.2.20 self permanent
+ bridge fdb ap dev vx 00:00:00:00:00:00 dst 192.0.2.30 self permanent
+ check_remotes 'fdb append'
+ local 'what=fdb append'
+ shift
++ wc -l
++ grep 00:00:00:00:00:00
++ bridge fdb sh dev vx
+ local N=2
+ echo -ne 'expected two remotes after fdb append\t'
expected two remotes after fdb append + [[ 2 != 2 ]]
+ echo '[ OK ]'
[ OK ]
+ ip link set dev vx type vxlan remote 192.0.2.30
+ check_remotes 'link set'
+ local 'what=link set'
+ shift
++ bridge fdb sh dev vx
++ grep 00:00:00:00:00:00
++ wc -l
+ local N=3
+ echo -ne 'expected two remotes after link set\t'
expected two remotes after link set + [[ 3 != 2 ]]
+ echo '[FAIL]'
[FAIL]
+ EXIT_STATUS=1
+ ip link del dev vx
+ exit 1
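
The check that fails can be read back from the trace above. The following is
a sketch reconstructed from that trace only, written in portable sh rather
than copied from the selftest, so details may differ from the real script. It
counts the all-zeros FDB entries on vx and expects exactly two; in the failing
run the count is 3 after the link set.

```shell
#!/bin/sh
# Reconstructed from the "set -x" output; not the verbatim selftest source.
# Relies on the iproute2 "bridge" tool and on a vxlan device named vx.
check_remotes() {
    what=$1
    # Count default (all-zeros MAC) entries, one per remote.
    n=$(bridge fdb sh dev vx | grep -c 00:00:00:00:00:00)
    printf 'expected two remotes after %s\t' "$what"
    if [ "$n" -ne 2 ]; then
        echo "[FAIL]"
        EXIT_STATUS=1
    else
        echo "[ OK ]"
    fi
}
```

A count of 3 suggests the link set added a new default entry for 192.0.2.30
instead of leaving the existing pair in place.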

Test results comparison,
------------------------
https://qa-reports.linaro.org/_/comparetest/?project=22&project=6&project=58&project=135&project=141&project=40&project=23&suite=kselftest&test=net_test_vxlan_fdb_changelink.sh

Full test log,
https://lkft.validation.linaro.org/scheduler/job/584159#L10161
https://lkft.validation.linaro.org/scheduler/job/590328#L1286

^ permalink raw reply

* Re: Crashes in skb clone/allocation in 4.19.18
From: Cong Wang @ 2019-01-30 17:15 UTC (permalink / raw)
  To: Ivan Babrou
  Cc: Linux Kernel Network Developers, David S. Miller, Eric Dumazet,
	Ignat Korchagin, Shawn Bohrer, Jakub Sitnicki
In-Reply-To: <CABWYdi1UrpkV28HyenPZgqPnn+_sPqxT4XoP_HED2DC0ixxG-w@mail.gmail.com>

On Wed, Jan 30, 2019 at 8:54 AM Ivan Babrou <ivan@cloudflare.com> wrote:
>
> Hey,
>
> We've upgraded some machines from 4.19.13 to 4.19.18 and some of them
> crashed with the following:
>
> [ 2313.192006] general protection fault: 0000 [#1] SMP PTI
> [ 2313.205924] CPU: 32 PID: 65437 Comm: nginx-fl Tainted: G
> O      4.19.18-cloudflare-2019.1.8 #2019.1.8
> [ 2313.224973] Hardware name: Quanta Computer Inc. QuantaPlex
> T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
> [ 2313.243400] RIP: 0010:kmem_cache_alloc_node+0x178/0x1f0

This looks more like an mm bug than a networking one.

Also, it is always helpful if you can map the RIP to source code,
using scripts/faddr2line or scripts/decode_stacktrace.sh.
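
As a rough sketch of that step: the helper below pulls the func+offset/size
token out of a kernel-mode RIP line so it can be handed to faddr2line.
rip_symbol is an illustrative name, and the commented commands assume a
vmlinux with debug info from the exact build that crashed and an oops saved
to a file called oops.txt (both assumptions, not details from the report).

```shell
#!/bin/sh
# Extract the "func+0xoff/0xsize" token from a kernel-mode "RIP: 0010:" line.
# Reads oops text on stdin; quoted ("> ") lines work as well.
rip_symbol() {
    sed -n 's/^.*RIP: 0010:\([^ ]*\).*$/\1/p'
}

line='[ 2313.243400] RIP: 0010:kmem_cache_alloc_node+0x178/0x1f0'
printf '%s\n' "$line" | rip_symbol

# Then, by hand, from the top of the matching kernel build tree:
#   scripts/faddr2line vmlinux kmem_cache_alloc_node+0x178/0x1f0
#   scripts/decode_stacktrace.sh vmlinux < oops.txt
```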


Thanks.


> [ 2313.257768] Code: 89 fa 4c 89 f6 e8 68 40 a1 00 4c 8b 55 00 58 4d
> 85 d2 75 d6 e9 6f ff ff ff 41 8b 59 20 48 8d 4a 01 4c 89 f8 49 8b 39
> 4c 01 fb <48> 33 1b 49 33 99 38 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0
> 0f 84
> [ 2313.295550] RSP: 0000:ffff94457f903b48 EFLAGS: 00010202
> [ 2313.310352] RAX: 08b82daf1f57da0e RBX: 08b82daf1f57da0e RCX: 00000000005ff72d
> [ 2313.327189] RDX: 00000000005ff72c RSI: 0000000000480220 RDI: 0000000000026e40
> [ 2313.344029] RBP: ffff94457f04d680 R08: ffff94457f926e40 R09: ffff94457f04d680
> [ 2313.360912] R10: 000004ce652a0026 R11: 0000000000000000 R12: 0000000000480220
> [ 2313.377857] R13: 00000000ffffffff R14: ffffffffb1ab3ab7 R15: 08b82daf1f57da0e
> [ 2313.394820] FS:  00007fdea755c780(0000) GS:ffff94457f900000(0000)
> knlGS:0000000000000000
> [ 2313.412887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2313.428581] CR2: 000055acc3cf517b CR3: 000000201b1ea003 CR4: 00000000003606e0
> [ 2313.445753] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2313.462843] perf: interrupt took too long (8028 > 7291), lowering
> kernel.perf_event_max_sample_rate to 24000
> [ 2313.462867] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2313.500216] Call Trace:
> [ 2313.512833]  <IRQ>
> [ 2313.524748]  __alloc_skb+0x57/0x1d0
> [ 2313.537934]  __tcp_send_ack.part.48+0x2f/0x100
> [ 2313.551845]  tcp_rcv_established+0x550/0x640
> [ 2313.565394]  tcp_v4_do_rcv+0x12a/0x1e0
> [ 2313.578322]  tcp_v4_rcv+0xadc/0xbd0
> [ 2313.590993]  ip_local_deliver_finish+0x5d/0x1d0
> [ 2313.604727]  ip_local_deliver+0x6b/0xe0
> [ 2313.617782]  ? ip_sublist_rcv+0x200/0x200
> [ 2313.630415] perf: interrupt took too long (10040 > 10035), lowering
> kernel.perf_event_max_sample_rate to 19000
> [ 2313.630948]  ip_rcv+0x52/0xd0
> [ 2313.662850]  ? ip_rcv_core.isra.22+0x2b0/0x2b0
> [ 2313.662857]  __netif_receive_skb_one_core+0x52/0x70
> [ 2313.690860]  netif_receive_skb_internal+0x34/0xe0
> [ 2313.690883]  efx_rx_deliver+0x11a/0x180 [sfc]
> [ 2313.717780]  ? __efx_rx_packet+0x1ef/0x730 [sfc]
> [ 2313.717786]  ? __queue_work+0x103/0x3e0
> [ 2313.743118]  ? efx_poll+0x35e/0x460 [sfc]
> [ 2313.743125]  ? net_rx_action+0x138/0x360
> [ 2313.767356]  ? __do_softirq+0xd8/0x2d2
> [ 2313.767362]  ? irq_exit+0xb4/0xc0
> [ 2313.790680]  ? do_IRQ+0x85/0xd0
> [ 2313.790688]  ? common_interrupt+0xf/0xf
> [ 2313.790694]  </IRQ>
> [ 2313.823837] Modules linked in: tun xt_connlimit nf_conncount xt_bpf
> xt_hashlimit cls_flow cls_u32 sch_htb sch_fq md_mod dm_crypt
> algif_skcipher af_alg dm_mod dax ip6table_nat nf_nat_ipv6
> ip6table_mangle ip6table_security ip6table_raw xt_nat iptable_nat
> nf_nat_ipv4 nf_nat xt_TPROXY nf_tproxy_ipv6 nf_tproxy_ipv4 xt_connmark
> iptable_mangle xt_owner xt_CT xt_socket nf_socket_ipv4 nf_socket_ipv6
> iptable_raw ip6table_filter ip6_tables nfnetlink_log xt_NFLOG
> xt_tcpudp xt_comment xt_conntrack nf_conntrack nf_defrag_ipv6
> nf_defrag_ipv4 xt_mark xt_multiport xt_set iptable_filter bpfilter
> ip_set_hash_netport ip_set_hash_net ip_set_hash_ip ip_set nfnetlink
> 8021q garp mrp stp llc sb_edac x86_pkg_temp_thermal kvm_intel kvm
> irqbypass crc32_pclmul crc32c_intel pcbc aesni_intel aes_x86_64
> ipmi_ssif crypto_simd cryptd
> [ 2313.952153]  sfc(O) glue_helper igb i2c_algo_bit ipmi_si mdio dca
> ipmi_devintf ipmi_msghandler efivarfs ip_tables x_tables
> [ 2313.952238] ---[ end trace 477d8e3081c605f6 ]---
>
> Some nodes also crashed in skb_clone, rather than __alloc_skb:
>
> [ 3810.686137] general protection fault: 0000 [#1] SMP PTI
> [ 3810.694579] CPU: 64 PID: 69338 Comm: nginx-fl Not tainted
> 4.19.18-cloudflare-2019.1.8 #2019.1.8
> [ 3810.706589] Hardware name: Quanta Cloud Technology Inc. QuantaPlex
> T42S-2U(LBG-4) ^S5SZ090028/T42S-2U MB (Lewisburg-4), BIOS 3A11.Q10
> 06/29/2018
> [ 3810.726475] RIP: 0010:kmem_cache_alloc+0x89/0x1c0
> [ 3810.734701] Code: 82 72 49 83 78 10 00 4d 8b 30 0f 84 0e 01 00 00
> 4d 85 f6 0f 84 05 01 00 00 41 8b 5f 20 48 8d 4a 01 4c 89 f0 49 8b 3f
> 4c 01 f3 <48> 33 1b 49 33 9f 38 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0
> 74 b2
> [ 3810.761088] RSP: 0000:ffff99723fe03730 EFLAGS: 00010282
> [ 3810.770132] RAX: f0382d8aebf1ae68 RBX: f0382d8aebf1ae68 RCX: 0000000001cb61cf
> [ 3810.781105] RDX: 0000000001cb61ce RSI: 0000000000480020 RDI: 0000000000027550
> [ 3810.792012] RBP: ffff99723f19d500 R08: ffff99723fe27550 R09: 00000000000005dc
> [ 3810.802820] R10: ffff9992227c0000 R11: 0000000000004000 R12: 0000000000480020
> [ 3810.813589] R13: ffffffff8dcb5f7d R14: f0382d8aebf1ae68 R15: ffff99723f19d500
> [ 3810.824382] FS:  00007f2a8863c780(0000) GS:ffff99723fe00000(0000)
> knlGS:0000000000000000
> [ 3810.836189] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3810.845662] CR2: 000055820762eecd CR3: 00000019eb850003 CR4: 00000000007606e0
> [ 3810.856567] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 3810.867600] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 3810.878554] PKRU: 55555554
> [ 3810.884787] Call Trace:
> [ 3810.890601]  <IRQ>
> [ 3810.896116]  skb_clone+0x4d/0xb0
> [ 3810.902712]  dev_queue_xmit_nit+0xd9/0x260
> [ 3810.910181]  dev_hard_start_xmit+0x69/0x1f0
> [ 3810.917784]  __dev_queue_xmit+0x6f7/0x8a0
> [ 3810.925172]  ? eth_header+0x26/0xc0
> [ 3810.932053]  ip_finish_output2+0x193/0x400
> [ 3810.939670]  ? ip_finish_output+0x139/0x270
> [ 3810.947241]  ip_output+0x6c/0xe0
> [ 3810.953844]  ? ip_append_data.part.51+0xc0/0xc0
> [ 3810.961802]  __tcp_transmit_skb+0x511/0xaa0
> [ 3810.969420]  __tcp_retransmit_skb+0x19c/0x7c0
> [ 3810.977209]  ? tcp_current_mss+0x57/0xa0
> [ 3810.984493]  tcp_retransmit_skb+0x12/0x80
> [ 3810.991894]  tcp_xmit_retransmit_queue.part.50+0x147/0x240
> [ 3811.000754]  tcp_ack+0x9c4/0x11b0
> [ 3811.007416]  tcp_rcv_established+0x190/0x640
> [ 3811.015065]  ? tcp_v4_inbound_md5_hash+0x69/0x160
> [ 3811.023106]  tcp_v4_do_rcv+0x12a/0x1e0
> [ 3811.030190]  tcp_v4_rcv+0xadc/0xbd0
> [ 3811.037009]  ip_local_deliver_finish+0x5d/0x1d0
> [ 3811.044859]  ip_local_deliver+0x6b/0xe0
> [ 3811.051999]  ? ip_sublist_rcv+0x200/0x200
> [ 3811.059325]  ip_rcv+0x52/0xd0
> [ 3811.065595]  ? ip_rcv_core.isra.22+0x2b0/0x2b0
> [ 3811.073361]  __netif_receive_skb_one_core+0x52/0x70
> [ 3811.081621]  netif_receive_skb_internal+0x34/0xe0
> [ 3811.089652]  napi_gro_receive+0xba/0xe0
> [ 3811.096969]  mlx5e_handle_rx_cqe+0x1eb/0x530 [mlx5_core]
> [ 3811.105545]  ? skb_release_head_state+0x5c/0xb0
> [ 3811.113447]  mlx5e_poll_rx_cq+0xc8/0x910 [mlx5_core]
> [ 3811.121652]  mlx5e_napi_poll+0xb1/0xc60 [mlx5_core]
> [ 3811.129574]  net_rx_action+0x138/0x360
> [ 3811.136266]  __do_softirq+0xd8/0x2d2
> [ 3811.142679]  irq_exit+0xb4/0xc0
> [ 3811.148578]  do_IRQ+0x85/0xd0
> [ 3811.154254]  common_interrupt+0xf/0xf
> [ 3811.160585]  </IRQ>
> [ 3811.165319] RIP: 0033:0x5581e1551ca0
> [ 3811.171546] Code: e8 10 41 ff 24 ee 81 7c ca 04 ff ff fe ff 0f 83
> 87 1c 00 00 8b 03 0f b6 cc 0f b6 e8 83 c3 04 c1 e8 10 41 ff 24 ee 48
> 8b 2c c2 <48> 89 2c ca 8b 03 0f b6 cc 0f b6 e8 83 c3 04 c1 e8 10 41 ff
> 24 ee
> [ 3811.195925] RSP: 002b:00007ffdd615ebc0 EFLAGS: 00000246 ORIG_RAX:
> ffffffffffffffde
> [ 3811.206319] RAX: 0000000000000000 RBX: 00000000406c9058 RCX: 000000000000000b
> [ 3811.216321] RDX: 000000004099cdc8 RSI: fffffffb40c07eb0 RDI: 000000004183d738
> [ 3811.226277] RBP: fffffff444c8c5c0 R08: 000000004099cdc8 R09: 00000000425ce3d8
> [ 3811.236340] R10: 0000000044c8c5c0 R11: 000000004139cbb0 R12: 0000000000000000
> [ 3811.246349] R13: 00005581ead6a9e0 R14: 000000004166afe8 R15: 00000000406c90f8
> [ 3811.256320] Modules linked in: tun xt_connlimit nf_conncount xt_bpf
> xt_hashlimit cls_flow cls_u32 sch_htb sch_fq md_mod dm_crypt
> algif_skcipher af_alg dm_mod dax ip6table_nat nf_nat_ipv6
> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> ip6_tables xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_TPROXY
> nf_tproxy_ipv6 nf_tproxy_ipv4 xt_connmark iptable_mangle xt_owner
> xt_CT xt_socket nf_socket_ipv4 nf_socket_ipv6 iptable_raw
> nfnetlink_log xt_NFLOG xt_tcpudp xt_comment xt_conntrack nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 xt_mark xt_multiport xt_set
> iptable_filter bpfilter ip_set_hash_netport ip_set_hash_net
> ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc skx_edac
> x86_pkg_temp_thermal kvm_intel kvm irqbypass ipmi_ssif crc32_pclmul
> crc32c_intel pcbc aesni_intel aes_x86_64 crypto_simd mlx5_core
> [ 3811.351698]  cryptd xhci_pci tpm_crb mlxfw glue_helper ioatdma
> devlink ipmi_si xhci_hcd dca ipmi_devintf ipmi_msghandler tpm_tis
> tpm_tis_core tpm efivarfs ip_tables x_tables
> [ 3811.375161] ---[ end trace 1a7795bb39a63cf7 ]---
>
> Is this known? Could it be related to this commit:
>
> * https://github.com/torvalds/linux/commit/598e57e029290be3e7f8f87ff908091a5a22ed2f
>
> Thanks!

^ permalink raw reply

* Re: Crashes in skb clone/allocation in 4.19.18
From: Lance Richardson @ 2019-01-30 17:28 UTC (permalink / raw)
  To: Cong Wang
  Cc: Ivan Babrou, Linux Kernel Network Developers, David S. Miller,
	Eric Dumazet, Ignat Korchagin, Shawn Bohrer, Jakub Sitnicki
In-Reply-To: <CAM_iQpU9TqPp4SKLa0Z=kbnaJzd=PsgcFKMX25W9YYrJp2658g@mail.gmail.com>

On Wed, Jan 30, 2019 at 12:17 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Wed, Jan 30, 2019 at 8:54 AM Ivan Babrou <ivan@cloudflare.com> wrote:
> >
> > Hey,
> >
> > We've upgraded some machines from 4.19.13 to 4.19.18 and some of them
> > crashed with the following:
> >
> > [ 2313.192006] general protection fault: 0000 [#1] SMP PTI
> > [ 2313.205924] CPU: 32 PID: 65437 Comm: nginx-fl Tainted: G
> > O      4.19.18-cloudflare-2019.1.8 #2019.1.8

"Tainted: GO" appears to mean that an out-of tree kernel module was
loaded. If so, information about that module and whether the crash
occurs when it hasn't been loaded might be of interest.
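
To check this on a running machine, the taint state can also be read as a
bitmask from /proc/sys/kernel/tainted. The sketch below decodes a few of the
documented bits; taint_letters is an illustrative helper name and the bit
list is deliberately partial (see Documentation/admin-guide/tainted-kernels
for the full table).

```shell
#!/bin/sh
# Print the taint letters for a few bits of a /proc/sys/kernel/tainted value.
taint_letters() {
    mask=$1
    out=""
    # bit:letter pairs; only a subset of the documented flags is listed.
    #   0  P  proprietary module loaded
    #   9  W  kernel issued a warning
    #  12  O  out-of-tree module loaded
    #  13  E  unsigned module loaded
    for pair in 0:P 9:W 12:O 13:E; do
        bit=${pair%%:*}
        letter=${pair##*:}
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="$out$letter"
        fi
    done
    echo "${out:-none}"
}

# On a live system: taint_letters "$(cat /proc/sys/kernel/tainted)"
taint_letters 4096
```

The module list in the oops marks the responsible module the same way, as in
"sfc(O)".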

   - Lance

> > [ 2313.224973] Hardware name: Quanta Computer Inc. QuantaPlex
> > T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
> > [ 2313.243400] RIP: 0010:kmem_cache_alloc_node+0x178/0x1f0
>
> This looks more like an mm bug than a networking one.
>
> Also, it is always helpful if you can map the RIP to source code,
> using scripts/faddr2line or scripts/decode_stacktrace.sh.
>
>
> Thanks.
>

^ permalink raw reply

* Re: net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs
From: Andrew Lunn @ 2019-01-30 17:28 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev
In-Reply-To: <f13cf118-79cb-0430-8605-53a662fe53d7@bell.net>

On Wed, Jan 30, 2019 at 12:08:39PM -0500, John David Anglin wrote:
> On 2019-01-22 7:22 p.m., Andrew Lunn wrote:
> > From my Espressobin
> >
> > cat /proc/interrupts
> > ...
> >  44:          0          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-g1-atu-prob
> >  46:          0          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-g1-vtu-prob
> >  48:         38         24  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
> >  51:          0          1  mv88e6xxx-g2   1 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:11
> >  52:          0          0  mv88e6xxx-g2   2 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:12
> >  53:         38         23  mv88e6xxx-g2   3 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:13
> >
> > These are PHY interrupts.
> If we come back to my trying to use the INTn pin on the Espressobin, I
> have found that clearing and resetting the interrupt
> enable bits in the global control register (offset 0x4) restarts link
> detection when the device is stuck.  This suggests that the
> INTn connection to MPP2_23 is low when the GIC interrupt is enabled
> on this pin.  Possibly, this is caused by the fact
> that EEIntEn is set to 1 on reset.  INTn then goes low when EEPROM
> loading is done.  Another possibility might be race conditions
> in processing interrupts.
> 
> Thoughts?

Hi David

You need active low interrupts. Without them, I think you are always
going to have race conditions which will cause interrupts to get
stuck/lost.

I would suggest you remove the interrupt from your device tree and use
the mv88e6xxx polling method. If I remember correctly, it currently
polls 10 times per second, so PHY link up/down is going to be 5 times faster
on average than when phylib is polling the PHY.

   Andrew

^ permalink raw reply

* Re: Crashes in skb clone/allocation in 4.19.18
From: Edward Cree @ 2019-01-30 17:33 UTC (permalink / raw)
  To: Ivan Babrou, netdev
  Cc: David S. Miller, Eric Dumazet, Ignat Korchagin, Shawn Bohrer,
	Jakub Sitnicki
In-Reply-To: <CABWYdi1UrpkV28HyenPZgqPnn+_sPqxT4XoP_HED2DC0ixxG-w@mail.gmail.com>

On 30/01/19 16:51, Ivan Babrou wrote:
> Hey,
>
> We've upgraded some machines from 4.19.13 to 4.19.18 and some of them
> crashed with the following:
>
> [ 2313.192006] general protection fault: 0000 [#1] SMP PTI
> [ 2313.205924] CPU: 32 PID: 65437 Comm: nginx-fl Tainted: G
> O      4.19.18-cloudflare-2019.1.8 #2019.1.8
> [ 2313.224973] Hardware name: Quanta Computer Inc. QuantaPlex
> T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
> [ 2313.243400] RIP: 0010:kmem_cache_alloc_node+0x178/0x1f0
> [ 2313.257768] Code: 89 fa 4c 89 f6 e8 68 40 a1 00 4c 8b 55 00 58 4d
> 85 d2 75 d6 e9 6f ff ff ff 41 8b 59 20 48 8d 4a 01 4c 89 f8 49 8b 39
> 4c 01 fb <48> 33 1b 49 33 99 38 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0
> 0f 84
> [ 2313.295550] RSP: 0000:ffff94457f903b48 EFLAGS: 00010202
> [ 2313.310352] RAX: 08b82daf1f57da0e RBX: 08b82daf1f57da0e RCX: 00000000005ff72d
> [ 2313.327189] RDX: 00000000005ff72c RSI: 0000000000480220 RDI: 0000000000026e40
> [ 2313.344029] RBP: ffff94457f04d680 R08: ffff94457f926e40 R09: ffff94457f04d680
> [ 2313.360912] R10: 000004ce652a0026 R11: 0000000000000000 R12: 0000000000480220
> [ 2313.377857] R13: 00000000ffffffff R14: ffffffffb1ab3ab7 R15: 08b82daf1f57da0e
> [ 2313.394820] FS:  00007fdea755c780(0000) GS:ffff94457f900000(0000)
> knlGS:0000000000000000
> [ 2313.412887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2313.428581] CR2: 000055acc3cf517b CR3: 000000201b1ea003 CR4: 00000000003606e0
> [ 2313.445753] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2313.462843] perf: interrupt took too long (8028 > 7291), lowering
> kernel.perf_event_max_sample_rate to 24000
> [ 2313.462867] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2313.500216] Call Trace:
> [ 2313.512833]  <IRQ>
> [ 2313.524748]  __alloc_skb+0x57/0x1d0
> [ 2313.537934]  __tcp_send_ack.part.48+0x2f/0x100
> [ 2313.551845]  tcp_rcv_established+0x550/0x640
> [ 2313.565394]  tcp_v4_do_rcv+0x12a/0x1e0
> [ 2313.578322]  tcp_v4_rcv+0xadc/0xbd0
> [ 2313.590993]  ip_local_deliver_finish+0x5d/0x1d0
> [ 2313.604727]  ip_local_deliver+0x6b/0xe0
> [ 2313.617782]  ? ip_sublist_rcv+0x200/0x200
> [ 2313.630415] perf: interrupt took too long (10040 > 10035), lowering
> kernel.perf_event_max_sample_rate to 19000
> [ 2313.630948]  ip_rcv+0x52/0xd0
> [ 2313.662850]  ? ip_rcv_core.isra.22+0x2b0/0x2b0
> [ 2313.662857]  __netif_receive_skb_one_core+0x52/0x70
> [ 2313.690860]  netif_receive_skb_internal+0x34/0xe0
> [ 2313.690883]  efx_rx_deliver+0x11a/0x180 [sfc]
> [ 2313.717780]  ? __efx_rx_packet+0x1ef/0x730 [sfc]
> [ 2313.717786]  ? __queue_work+0x103/0x3e0
> [ 2313.743118]  ? efx_poll+0x35e/0x460 [sfc]
> [ 2313.743125]  ? net_rx_action+0x138/0x360
> [ 2313.767356]  ? __do_softirq+0xd8/0x2d2
> [ 2313.767362]  ? irq_exit+0xb4/0xc0
> [ 2313.790680]  ? do_IRQ+0x85/0xd0
> [ 2313.790688]  ? common_interrupt+0xf/0xf
> [ 2313.790694]  </IRQ>
Something odd is going on.  As far as I can tell from this call trace
 (which has some weirdness in it; any chance you could reproduce with
 frame pointers or a lower build optimisation level?) you're in the
 normal sfc receive path (under efx_process_channel(), although that's
 one of the functions that hasn't made it into the stack trace), which
 means you should have a channel->rx_list, and thus efx_rx_deliver()
 should be putting the packet on that list rather than calling
 netif_receive_skb().

I don't know how, or if, that could be related to the crash you're
 getting, but it might be worth looking into.
(It can't be the whole story, as your other crash is on a mlx5e and
 AFAIK they don't use list-RX yet.  Though, confusingly, an entry for
 ip_sublist_rcv still makes it into both stack traces.)

Maybe it's secondary damage from a wild pointer or other mm problem
 letting memory get scribbled on.

-Ed
> [ 2313.823837] Modules linked in: tun xt_connlimit nf_conncount xt_bpf
> xt_hashlimit cls_flow cls_u32 sch_htb sch_fq md_mod dm_crypt
> algif_skcipher af_alg dm_mod dax ip6table_nat nf_nat_ipv6
> ip6table_mangle ip6table_security ip6table_raw xt_nat iptable_nat
> nf_nat_ipv4 nf_nat xt_TPROXY nf_tproxy_ipv6 nf_tproxy_ipv4 xt_connmark
> iptable_mangle xt_owner xt_CT xt_socket nf_socket_ipv4 nf_socket_ipv6
> iptable_raw ip6table_filter ip6_tables nfnetlink_log xt_NFLOG
> xt_tcpudp xt_comment xt_conntrack nf_conntrack nf_defrag_ipv6
> nf_defrag_ipv4 xt_mark xt_multiport xt_set iptable_filter bpfilter
> ip_set_hash_netport ip_set_hash_net ip_set_hash_ip ip_set nfnetlink
> 8021q garp mrp stp llc sb_edac x86_pkg_temp_thermal kvm_intel kvm
> irqbypass crc32_pclmul crc32c_intel pcbc aesni_intel aes_x86_64
> ipmi_ssif crypto_simd cryptd
> [ 2313.952153]  sfc(O) glue_helper igb i2c_algo_bit ipmi_si mdio dca
> ipmi_devintf ipmi_msghandler efivarfs ip_tables x_tables
> [ 2313.952238] ---[ end trace 477d8e3081c605f6 ]---
>
> Some nodes also crashed in skb_clone, rather than __alloc_skb:
>
> [ 3810.686137] general protection fault: 0000 [#1] SMP PTI
> [ 3810.694579] CPU: 64 PID: 69338 Comm: nginx-fl Not tainted
> 4.19.18-cloudflare-2019.1.8 #2019.1.8
> [ 3810.706589] Hardware name: Quanta Cloud Technology Inc. QuantaPlex
> T42S-2U(LBG-4) ^S5SZ090028/T42S-2U MB (Lewisburg-4), BIOS 3A11.Q10
> 06/29/2018
> [ 3810.726475] RIP: 0010:kmem_cache_alloc+0x89/0x1c0
> [ 3810.734701] Code: 82 72 49 83 78 10 00 4d 8b 30 0f 84 0e 01 00 00
> 4d 85 f6 0f 84 05 01 00 00 41 8b 5f 20 48 8d 4a 01 4c 89 f0 49 8b 3f
> 4c 01 f3 <48> 33 1b 49 33 9f 38 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0
> 74 b2
> [ 3810.761088] RSP: 0000:ffff99723fe03730 EFLAGS: 00010282
> [ 3810.770132] RAX: f0382d8aebf1ae68 RBX: f0382d8aebf1ae68 RCX: 0000000001cb61cf
> [ 3810.781105] RDX: 0000000001cb61ce RSI: 0000000000480020 RDI: 0000000000027550
> [ 3810.792012] RBP: ffff99723f19d500 R08: ffff99723fe27550 R09: 00000000000005dc
> [ 3810.802820] R10: ffff9992227c0000 R11: 0000000000004000 R12: 0000000000480020
> [ 3810.813589] R13: ffffffff8dcb5f7d R14: f0382d8aebf1ae68 R15: ffff99723f19d500
> [ 3810.824382] FS:  00007f2a8863c780(0000) GS:ffff99723fe00000(0000)
> knlGS:0000000000000000
> [ 3810.836189] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3810.845662] CR2: 000055820762eecd CR3: 00000019eb850003 CR4: 00000000007606e0
> [ 3810.856567] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 3810.867600] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 3810.878554] PKRU: 55555554
> [ 3810.884787] Call Trace:
> [ 3810.890601]  <IRQ>
> [ 3810.896116]  skb_clone+0x4d/0xb0
> [ 3810.902712]  dev_queue_xmit_nit+0xd9/0x260
> [ 3810.910181]  dev_hard_start_xmit+0x69/0x1f0
> [ 3810.917784]  __dev_queue_xmit+0x6f7/0x8a0
> [ 3810.925172]  ? eth_header+0x26/0xc0
> [ 3810.932053]  ip_finish_output2+0x193/0x400
> [ 3810.939670]  ? ip_finish_output+0x139/0x270
> [ 3810.947241]  ip_output+0x6c/0xe0
> [ 3810.953844]  ? ip_append_data.part.51+0xc0/0xc0
> [ 3810.961802]  __tcp_transmit_skb+0x511/0xaa0
> [ 3810.969420]  __tcp_retransmit_skb+0x19c/0x7c0
> [ 3810.977209]  ? tcp_current_mss+0x57/0xa0
> [ 3810.984493]  tcp_retransmit_skb+0x12/0x80
> [ 3810.991894]  tcp_xmit_retransmit_queue.part.50+0x147/0x240
> [ 3811.000754]  tcp_ack+0x9c4/0x11b0
> [ 3811.007416]  tcp_rcv_established+0x190/0x640
> [ 3811.015065]  ? tcp_v4_inbound_md5_hash+0x69/0x160
> [ 3811.023106]  tcp_v4_do_rcv+0x12a/0x1e0
> [ 3811.030190]  tcp_v4_rcv+0xadc/0xbd0
> [ 3811.037009]  ip_local_deliver_finish+0x5d/0x1d0
> [ 3811.044859]  ip_local_deliver+0x6b/0xe0
> [ 3811.051999]  ? ip_sublist_rcv+0x200/0x200
> [ 3811.059325]  ip_rcv+0x52/0xd0
> [ 3811.065595]  ? ip_rcv_core.isra.22+0x2b0/0x2b0
> [ 3811.073361]  __netif_receive_skb_one_core+0x52/0x70
> [ 3811.081621]  netif_receive_skb_internal+0x34/0xe0
> [ 3811.089652]  napi_gro_receive+0xba/0xe0
> [ 3811.096969]  mlx5e_handle_rx_cqe+0x1eb/0x530 [mlx5_core]
> [ 3811.105545]  ? skb_release_head_state+0x5c/0xb0
> [ 3811.113447]  mlx5e_poll_rx_cq+0xc8/0x910 [mlx5_core]
> [ 3811.121652]  mlx5e_napi_poll+0xb1/0xc60 [mlx5_core]
> [ 3811.129574]  net_rx_action+0x138/0x360
> [ 3811.136266]  __do_softirq+0xd8/0x2d2
> [ 3811.142679]  irq_exit+0xb4/0xc0
> [ 3811.148578]  do_IRQ+0x85/0xd0
> [ 3811.154254]  common_interrupt+0xf/0xf
> [ 3811.160585]  </IRQ>
> [ 3811.165319] RIP: 0033:0x5581e1551ca0
> [ 3811.171546] Code: e8 10 41 ff 24 ee 81 7c ca 04 ff ff fe ff 0f 83
> 87 1c 00 00 8b 03 0f b6 cc 0f b6 e8 83 c3 04 c1 e8 10 41 ff 24 ee 48
> 8b 2c c2 <48> 89 2c ca 8b 03 0f b6 cc 0f b6 e8 83 c3 04 c1 e8 10 41 ff
> 24 ee
> [ 3811.195925] RSP: 002b:00007ffdd615ebc0 EFLAGS: 00000246 ORIG_RAX:
> ffffffffffffffde
> [ 3811.206319] RAX: 0000000000000000 RBX: 00000000406c9058 RCX: 000000000000000b
> [ 3811.216321] RDX: 000000004099cdc8 RSI: fffffffb40c07eb0 RDI: 000000004183d738
> [ 3811.226277] RBP: fffffff444c8c5c0 R08: 000000004099cdc8 R09: 00000000425ce3d8
> [ 3811.236340] R10: 0000000044c8c5c0 R11: 000000004139cbb0 R12: 0000000000000000
> [ 3811.246349] R13: 00005581ead6a9e0 R14: 000000004166afe8 R15: 00000000406c90f8
> [ 3811.256320] Modules linked in: tun xt_connlimit nf_conncount xt_bpf
> xt_hashlimit cls_flow cls_u32 sch_htb sch_fq md_mod dm_crypt
> algif_skcipher af_alg dm_mod dax ip6table_nat nf_nat_ipv6
> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> ip6_tables xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_TPROXY
> nf_tproxy_ipv6 nf_tproxy_ipv4 xt_connmark iptable_mangle xt_owner
> xt_CT xt_socket nf_socket_ipv4 nf_socket_ipv6 iptable_raw
> nfnetlink_log xt_NFLOG xt_tcpudp xt_comment xt_conntrack nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 xt_mark xt_multiport xt_set
> iptable_filter bpfilter ip_set_hash_netport ip_set_hash_net
> ip_set_hash_ip ip_set nfnetlink 8021q garp mrp stp llc skx_edac
> x86_pkg_temp_thermal kvm_intel kvm irqbypass ipmi_ssif crc32_pclmul
> crc32c_intel pcbc aesni_intel aes_x86_64 crypto_simd mlx5_core
> [ 3811.351698]  cryptd xhci_pci tpm_crb mlxfw glue_helper ioatdma
> devlink ipmi_si xhci_hcd dca ipmi_devintf ipmi_msghandler tpm_tis
> tpm_tis_core tpm efivarfs ip_tables x_tables
> [ 3811.375161] ---[ end trace 1a7795bb39a63cf7 ]---
>
> Is this known? Could it be related to this commit:
>
> * https://github.com/torvalds/linux/commit/598e57e029290be3e7f8f87ff908091a5a22ed2f
>
> Thanks!


^ permalink raw reply

* Re: Crashes in skb clone/allocation in 4.19.18
From: Ivan Babrou @ 2019-01-30 17:34 UTC (permalink / raw)
  To: Lance Richardson
  Cc: Cong Wang, Linux Kernel Network Developers, David S. Miller,
	Eric Dumazet, Ignat Korchagin, Shawn Bohrer, Jakub Sitnicki
In-Reply-To: <CADuNpCyv=nh8iMrRp=T1qqi3j6pfZJyfqKagqDtAo6+ux-b1xg@mail.gmail.com>

On Wed, Jan 30, 2019 at 9:28 AM Lance Richardson <lance604@gmail.com> wrote:
>
> On Wed, Jan 30, 2019 at 12:17 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > On Wed, Jan 30, 2019 at 8:54 AM Ivan Babrou <ivan@cloudflare.com> wrote:
> > >
> > > Hey,
> > >
> > > We've upgraded some machines from 4.19.13 to 4.19.18 and some of them
> > > crashed with the following:
> > >
> > > [ 2313.192006] general protection fault: 0000 [#1] SMP PTI
> > > [ 2313.205924] CPU: 32 PID: 65437 Comm: nginx-fl Tainted: G
> > > O      4.19.18-cloudflare-2019.1.8 #2019.1.8
>
> "Tainted: GO" appears to mean that an out-of-tree kernel module was
> loaded. If so, information about that module and whether the crash
> occurs when it hasn't been loaded might be of interest.

That module is the Solarflare NIC driver. On machines using the in-tree
Mellanox driver we've only seen skb_clone crashes.
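
For reference when reading the "Tainted:" field, here is a minimal sketch that decodes the flag letters from an oops header line. The meanings given for P, G, O, E and W follow the kernel's tainted-kernels documentation; the helper name decode_taint and the limited set of letters covered are choices made for this illustration and are not part of any kernel tool.

```python
# Subset of taint letters; anything else is reported as not covered here.
TAINT_LETTERS = {
    "P": "proprietary module was loaded",
    "G": "no proprietary module was loaded",
    "O": "out-of-tree module was loaded",
    "E": "unsigned module was loaded",
    "W": "kernel issued a warning earlier",
}


def decode_taint(line):
    """Return (letter, meaning) pairs found after 'Tainted:' in an oops line."""
    flags = line.partition("Tainted:")[2]
    found = []
    for token in flags.split():
        # The flag field ends where the kernel version string begins.
        if not (token.isalpha() and token.isupper()):
            break
        found.extend(token)
    return [(c, TAINT_LETTERS.get(c, "not covered in this sketch")) for c in found]


header = ("CPU: 32 PID: 65437 Comm: nginx-fl Tainted: G O      "
          "4.19.18-cloudflare-2019.1.8 #2019.1.8")
for letter, meaning in decode_taint(header):
    print(letter, "-", meaning)
```

Applied to the header line from this report, the sketch yields G and O, which matches Lance's reading that an out-of-tree module was loaded.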

>    - Lance
>
> > > [ 2313.224973] Hardware name: Quanta Computer Inc. QuantaPlex
> > > T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
> > > [ 2313.243400] RIP: 0010:kmem_cache_alloc_node+0x178/0x1f0
> >
> > This looks more like an mm bug than a networking one.
> >
> > Also, it is always helpful if you can map the RIP to source code,
> > using scripts/faddr2line or scripts/decode_stacktrace.sh.
> >
> >
> > Thanks.
> >
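
Both scripts mentioned above take the symbol in the func+offset/size form printed in the trace; scripts/faddr2line is invoked as "scripts/faddr2line vmlinux kmem_cache_alloc_node+0x178/0x1f0" and needs a vmlinux with debug info matching the crashed kernel. As a small illustration of that notation only, the sketch below pulls the function name, offset and function size out of a trace line. The regular expression and the parse_symbol helper are assumptions made for this example, not something the kernel scripts expose.

```python
import re

# func+0xOFFSET/0xSIZE as printed in oops call traces and RIP lines.
SYM_RE = re.compile(
    r"(?P<func>[A-Za-z_][\w.]*)\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)


def parse_symbol(text):
    """Return (function, offset, size) from a trace line, or None if absent."""
    m = SYM_RE.search(text)
    if m is None:
        return None
    return m.group("func"), int(m.group("off"), 16), int(m.group("size"), 16)


print(parse_symbol("RIP: 0010:kmem_cache_alloc_node+0x178/0x1f0"))
print(parse_symbol("[ 2313.537934]  __tcp_send_ack.part.48+0x2f/0x100"))
```

The offset is the byte distance of the faulting or return address from the start of the function, and the size is the length of the function, so an offset close to the size points near the end of the function body.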

^ permalink raw reply

* Re: Crashes in skb clone/allocation in 4.19.18
From: Edward Cree @ 2019-01-30 17:37 UTC (permalink / raw)
  To: Ivan Babrou, netdev
  Cc: David S. Miller, Eric Dumazet, Ignat Korchagin, Shawn Bohrer,
	Jakub Sitnicki
In-Reply-To: <90051606-1883-7dc7-fe4f-3bb135e816ae@solarflare.com>

On 30/01/19 17:33, Edward Cree wrote:
> On 30/01/19 16:51, Ivan Babrou wrote:
>> Hey,
>>
>> We've upgraded some machines from 4.19.13 to 4.19.18 and some of them
>> crashed with the following:
>>
>> [ 2313.192006] general protection fault: 0000 [#1] SMP PTI
>> [ 2313.205924] CPU: 32 PID: 65437 Comm: nginx-fl Tainted: G
>> O      4.19.18-cloudflare-2019.1.8 #2019.1.8
>> [ 2313.224973] Hardware name: Quanta Computer Inc. QuantaPlex
>> T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
>> [ 2313.243400] RIP: 0010:kmem_cache_alloc_node+0x178/0x1f0
>> [ 2313.257768] Code: 89 fa 4c 89 f6 e8 68 40 a1 00 4c 8b 55 00 58 4d
>> 85 d2 75 d6 e9 6f ff ff ff 41 8b 59 20 48 8d 4a 01 4c 89 f8 49 8b 39
>> 4c 01 fb <48> 33 1b 49 33 99 38 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0
>> 0f 84
>> [ 2313.295550] RSP: 0000:ffff94457f903b48 EFLAGS: 00010202
>> [ 2313.310352] RAX: 08b82daf1f57da0e RBX: 08b82daf1f57da0e RCX: 00000000005ff72d
>> [ 2313.327189] RDX: 00000000005ff72c RSI: 0000000000480220 RDI: 0000000000026e40
>> [ 2313.344029] RBP: ffff94457f04d680 R08: ffff94457f926e40 R09: ffff94457f04d680
>> [ 2313.360912] R10: 000004ce652a0026 R11: 0000000000000000 R12: 0000000000480220
>> [ 2313.377857] R13: 00000000ffffffff R14: ffffffffb1ab3ab7 R15: 08b82daf1f57da0e
>> [ 2313.394820] FS:  00007fdea755c780(0000) GS:ffff94457f900000(0000)
>> knlGS:0000000000000000
>> [ 2313.412887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 2313.428581] CR2: 000055acc3cf517b CR3: 000000201b1ea003 CR4: 00000000003606e0
>> [ 2313.445753] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 2313.462843] perf: interrupt took too long (8028 > 7291), lowering
>> kernel.perf_event_max_sample_rate to 24000
>> [ 2313.462867] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 2313.500216] Call Trace:
>> [ 2313.512833]  <IRQ>
>> [ 2313.524748]  __alloc_skb+0x57/0x1d0
>> [ 2313.537934]  __tcp_send_ack.part.48+0x2f/0x100
>> [ 2313.551845]  tcp_rcv_established+0x550/0x640
>> [ 2313.565394]  tcp_v4_do_rcv+0x12a/0x1e0
>> [ 2313.578322]  tcp_v4_rcv+0xadc/0xbd0
>> [ 2313.590993]  ip_local_deliver_finish+0x5d/0x1d0
>> [ 2313.604727]  ip_local_deliver+0x6b/0xe0
>> [ 2313.617782]  ? ip_sublist_rcv+0x200/0x200
>> [ 2313.630415] perf: interrupt took too long (10040 > 10035), lowering
>> kernel.perf_event_max_sample_rate to 19000
>> [ 2313.630948]  ip_rcv+0x52/0xd0
>> [ 2313.662850]  ? ip_rcv_core.isra.22+0x2b0/0x2b0
>> [ 2313.662857]  __netif_receive_skb_one_core+0x52/0x70
>> [ 2313.690860]  netif_receive_skb_internal+0x34/0xe0
>> [ 2313.690883]  efx_rx_deliver+0x11a/0x180 [sfc]
>> [ 2313.717780]  ? __efx_rx_packet+0x1ef/0x730 [sfc]
>> [ 2313.717786]  ? __queue_work+0x103/0x3e0
>> [ 2313.743118]  ? efx_poll+0x35e/0x460 [sfc]
>> [ 2313.743125]  ? net_rx_action+0x138/0x360
>> [ 2313.767356]  ? __do_softirq+0xd8/0x2d2
>> [ 2313.767362]  ? irq_exit+0xb4/0xc0
>> [ 2313.790680]  ? do_IRQ+0x85/0xd0
>> [ 2313.790688]  ? common_interrupt+0xf/0xf
>> [ 2313.790694]  </IRQ>
> Something odd is going on.  As far as I can tell from this call trace
>  (which has some weirdness in it; any chance you could reproduce with
>  frame pointers or a lower build optimisation level?) you're in the
>  normal sfc receive path (under efx_process_channel(), although that's
>  one of the functions that hasn't made it into the stack trace), which
>  means you should have a channel->rx_list, and thus efx_rx_deliver()
>  should be putting the packet on that list rather than calling
>  netif_receive_skb().
>
> I don't know how, or if, that could be related to the crash you're
>  getting, but it might be worth looking into.
> (It can't be the whole story, as your other crash is on a mlx5e and
>  AFAIK they don't use list-RX yet.  Though, confusingly, an entry for
>  ip_sublist_rcv still makes it into both stack traces.)
>
> Maybe it's secondary damage from a wild pointer or other mm problem
>  letting memory get scribbled on.
>
> -Ed
Aaaand as Lance has just pointed out, you're running the out-of-tree
 sfc driver, which doesn't have list RX yet.  Disregard the above.

-Ed
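
On the wild-pointer theory: the register dump shows RAX, RBX and R15 all holding 08b82daf1f57da0e, and a general protection fault on x86-64 is what one would expect when such a value is dereferenced, because it is not a canonical address. The sketch below checks canonicality under the assumption of 48-bit virtual addresses (4-level paging); the is_canonical helper is written for this illustration only.

```python
def is_canonical(addr, va_bits=48):
    """True if bits 63..(va_bits-1) of a 64-bit address are all equal."""
    top = (addr & 0xFFFFFFFFFFFFFFFF) >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


# Values taken from the register dump quoted above.
for name, value in [
    ("RBX", 0x08B82DAF1F57DA0E),
    ("RSP", 0xFFFF94457F903B48),
    ("FS", 0x00007FDEA755C780),
]:
    print(name, hex(value), "canonical" if is_canonical(value) else "non-canonical")
```

RSP and FS come out canonical while RBX does not, which fits a corrupted pointer being followed inside the allocator rather than a plain NULL dereference.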

^ permalink raw reply

* Re: [RFC 03/14] net: hstats: add basic/core functionality
From: Jakub Kicinski @ 2019-01-30 17:44 UTC (permalink / raw)
  To: David Ahern
  Cc: davem, oss-drivers, netdev, jiri, f.fainelli, andrew, mkubecek,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch
In-Reply-To: <e5635cee-4a70-624f-2b1b-5f2411c75979@gmail.com>

On Tue, 29 Jan 2019 21:18:28 -0700, David Ahern wrote:
> On 1/28/19 4:44 PM, Jakub Kicinski wrote:
> > @@ -4946,6 +4964,9 @@ static size_t if_nlmsg_stats_size(const struct net_device *dev,
> >  		rcu_read_unlock();
> >  	}
> >  
> > +	if (stats_attr_valid(filter_mask, IFLA_STATS_LINK_HSTATS, 0))  
> 
> filter_mask is populated by RTEXT_FILTER_ from
> include/uapi/linux/rtnetlink.h

ext_filter_mask is from IFLA_EXT_MASK and has RTEXT_FILTER_ bits set.
Here the mask is from struct if_stats_msg::filter_mask of RTM_GETSTATS.
Am I missing the point? :S
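
To make the distinction concrete: the RTM_GETSTATS filter_mask uses one bit per IFLA_STATS_* attribute, via IFLA_STATS_FILTER_BIT(ATTR) defined as (1 << (ATTR - 1)) in include/uapi/linux/if_link.h. Below is a Python transliteration of that macro and of the stats_attr_valid() check, as a sketch. The value 6 for IFLA_STATS_LINK_HSTATS is an assumption for illustration; the number the RFC actually assigns is not shown in this thread.

```python
IFLA_STATS_LINK_64 = 1
IFLA_STATS_LINK_XSTATS = 2
IFLA_STATS_LINK_HSTATS = 6  # hypothetical value, for illustration only


def stats_filter_bit(attr):
    """Mirror of IFLA_STATS_FILTER_BIT(ATTR): one bit per stats attribute."""
    return 1 << (attr - 1)


def stats_attr_valid(mask, attrid, idxattr):
    """Attribute is requested in mask and, when resuming a dump, is the resume point."""
    return bool(mask & stats_filter_bit(attrid)) and (not idxattr or idxattr == attrid)


mask = stats_filter_bit(IFLA_STATS_LINK_64) | stats_filter_bit(IFLA_STATS_LINK_HSTATS)
print(stats_attr_valid(mask, IFLA_STATS_LINK_HSTATS, 0))
print(stats_attr_valid(mask, IFLA_STATS_LINK_XSTATS, 0))
```

These bits are unrelated to the RTEXT_FILTER_* bits carried in IFLA_EXT_MASK, even though both end up in variables with "filter mask" in their names.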

> > +		size += rtnl_get_link_hstats_size(dev);  
> 
> rtnl_get_link_hstats_size == __rtnl_get_link_hstats can return < 0.

Oops!  Thank you!
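
A short sketch of why the negative return matters, assuming size is a 64-bit size_t and the helper returns a negative errno as an int. The value -22 (EINVAL) is only an example, and add_to_size_t is a name invented here to model the unsigned addition.

```python
SIZE_T_MASK = (1 << 64) - 1  # assumes a 64-bit size_t


def add_to_size_t(size, delta):
    """Model 'size += delta' where size is size_t and delta is a signed int."""
    return (size + delta) & SIZE_T_MASK


# A negative return silently shrinks the computed message size ...
print(add_to_size_t(100, -22))
# ... or wraps it to a huge value when size is smaller than the error magnitude.
print(hex(add_to_size_t(10, -22)))
```

Either outcome gives the caller a wrong buffer size estimate, so the error needs to be checked before it is added.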

In general how much do you dislike this code? :)

> > +
> >  	return size;
> >  }
> >  
> >   
> 


^ permalink raw reply

* RE: [PATCH] arm64: dts: lx2160aqds: Add mdio mux nodes
From: Leo Li @ 2019-01-30 17:48 UTC (permalink / raw)
  To: Pankaj Bansal, Shawn Guo, Andrew Lunn, Florian Fainelli
  Cc: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org
In-Reply-To: <VI1PR0401MB24968C7FA8CE3CEB291FDC94F1900@VI1PR0401MB2496.eurprd04.prod.outlook.com>



> -----Original Message-----
> From: Pankaj Bansal
> Sent: Wednesday, January 30, 2019 4:37 AM
> To: Pankaj Bansal <pankaj.bansal@nxp.com>; Shawn Guo
> <shawnguo@kernel.org>; Leo Li <leoyang.li@nxp.com>; Andrew Lunn
> <andrew@lunn.ch>; Florian Fainelli <f.fainelli@gmail.com>
> Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> Subject: RE: [PATCH] arm64: dts: lx2160aqds: Add mdio mux nodes
> 
> Hi Shawn/Leo,
> 
> Can you please review this patch and include it in your tree?
> 
> Regards,
> Pankaj Bansal
> 
> -----Original Message-----
> From: Pankaj Bansal [mailto:pankaj.bansal@nxp.com]
> Sent: Thursday, 15 November, 2018 05:42 PM
> To: Shawn Guo <shawnguo@kernel.org>; Leo Li <leoyang.li@nxp.com>;
> Andrew Lunn <andrew@lunn.ch>; Florian Fainelli <f.fainelli@gmail.com>
> Cc: netdev@vger.kernel.org; linux-arm-kernel@lists.infradead.org; Pankaj
> Bansal <pankaj.bansal@nxp.com>
> Subject: [PATCH] arm64: dts: lx2160aqds: Add mdio mux nodes
> 
> The two external MDIO buses, used to communicate with PHY devices
> external to the SoC, are muxed on the LX2160AQDS board.
> 
> These buses can be routed to any one of the eight IO slots on LX2160AQDS
> board depending on value in fpga register 0x54.
> 
> Additionally the external MDIO1 is used to communicate to the onboard
> RGMII phy devices.
> 
> The mdio1 is controlled by bits 4-7 of fpga register and mdio2 is controlled by
> bits 0-3 of fpga register.
> 
> Signed-off-by: Pankaj Bansal <pankaj.bansal@nxp.com>
> ---
> 
> Notes:
>     This patch depends on following patches:
>     [1]https://patchwork.kernel.org/cover/10658863/
>     [2]https://patchwork.codeaurora.org/patch/637861/
> 
>  .../boot/dts/freescale/fsl-lx2160a-qds.dts   | 116 +++++++++++++++++
>  .../boot/dts/freescale/fsl-lx2160a.dtsi      |  23 ++++
>  2 files changed, 139 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts b/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
> index 8a0305a2b778..39aa2731ddfa 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
> +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
> @@ -54,6 +54,121 @@
>  &i2c0 {
>  	status = "okay";
> 
> +	fpga@66 {
> +		compatible = "fsl,lx2160aqds-fpga", "fsl,fpga-qixis-i2c";
> +		reg = <0x66>;
> +		#address-cells = <1>;
> +		#size-cells = <0>;
> +
> +		mdio-mux-1@54 {

No compatible for this node?  What is the binding used with this node?

> +			mdio-parent-bus = <&emdio1>;
> +			reg = <0x54>;		 /* BRDCFG4 */
> +			mux-mask = <0xf8>;      /* EMI1_MDIO */
> +			#address-cells=<1>;
> +			#size-cells = <0>;
> +
> +			mdio@0 {
> +				reg = <0x00>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@40 {
> +				reg = <0x40>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@c0 {
> +				reg = <0xc0>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@c8 {
> +				reg = <0xc8>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@d0 {
> +				reg = <0xd0>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@d8 {
> +				reg = <0xd8>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@e0 {
> +				reg = <0xe0>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@e8 {
> +				reg = <0xe8>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@f0 {
> +				reg = <0xf0>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@f8 {
> +				reg = <0xf8>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +		};
> +
> +		mdio-mux-2@54 {

Same comment as the previous one.

> +			mdio-parent-bus = <&emdio2>;
> +			reg = <0x54>;		 /* BRDCFG4 */
> +			mux-mask = <0x07>;      /* EMI2_MDIO */
> +			#address-cells=<1>;
> +			#size-cells = <0>;
> +
> +			mdio@0 {
> +				reg = <0x00>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@1 {
> +				reg = <0x01>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@2 {
> +				reg = <0x02>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@3 {
> +				reg = <0x03>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@4 {
> +				reg = <0x04>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@5 {
> +				reg = <0x05>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@6 {
> +				reg = <0x06>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +			mdio@7 {
> +				reg = <0x07>;
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +			};
> +		};
> +	};
> +
>  	i2c-mux@77 {
>  		compatible = "nxp,pca9547";
>  		reg = <0x77>;
> @@ -118,3 +233,4 @@
>  &usb1 {
>  	status = "okay";
>  };
> +

No need to add this new line.

> diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> index 6ce0677c3096..518882b05f03 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> @@ -780,5 +780,28 @@
>  				     <GIC_SPI 209 IRQ_TYPE_LEVEL_HIGH>;
>  			dma-coherent;
>  		};
> +		/* TODO: WRIOP (CCSR?) */

What is this TODO?  Can we address it now?

> +		/* WRIOP0: 0x8B8_0000, E-MDIO1: 0x1_6000 */
> +		emdio1: mdio@0x8B96000 {
> +			compatible = "fsl,fman-memac-mdio";

This node doesn't actually match the binding in Documentation/devicetree/bindings/net/fsl-fman.txt.

> +			reg = <0x0 0x8B96000 0x0 0x1000>;
> +			device_type = "mdio";	/* TODO: is this necessary? */

Not needed

> +			little-endian;	/* force the driver in LE mode */
> +
> +			/* Not necessary on the QDS, but needed on the RDB*/

Why?  And we shouldn't discuss boards in the soc dtsi file.

> +			#address-cells = <1>;
> +			#size-cells = <0>;
> +		};
> +		/* WRIOP0: 0x8B8_0000, E-MDIO2: 0x1_7000 */
> +		emdio2: mdio@0x8B97000 {
> +			compatible = "fsl,fman-memac-mdio";
> +			reg = <0x0 0x8B97000 0x0 0x1000>;
> +			device_type = "mdio";	/* TODO: is this necessary? */
> +			little-endian;	/* force the driver in LE mode */
> +
> +			#address-cells = <1>;
> +			#size-cells = <0>;
> +		};
>  	};
>  };
> +
> --
> 2.17.1


^ permalink raw reply

