Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v2 net 1/3] sctp: change return type of sctp_get_port_local
From: Marcelo Ricardo Leitner @ 2019-09-12 14:51 UTC (permalink / raw)
  To: Mao Wenan
  Cc: vyasevich, nhorman, davem, linux-sctp, netdev, linux-kernel,
	kernel-janitors
In-Reply-To: <20190912040219.67517-2-maowenan@huawei.com>

On Thu, Sep 12, 2019 at 12:02:17PM +0800, Mao Wenan wrote:
> Currently sctp_get_port_local() returns a long
> which is either 0,1 or a pointer casted to long.
> It's neither of the callers use the return value since
> commit 62208f12451f ("net: sctp: simplify sctp_get_port").
> Now two callers are sctp_get_port and sctp_do_bind,
> they actually assumend a casted to an int was the same as
> a pointer casted to a long, and they don't save the return
> value just check whether it is zero or non-zero, so
> it would better change return type from long to int for
> sctp_get_port_local.
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")

This Fixes tag is not needed. It's just a cleanup.
Maybe Dave can remove it for us.

Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

Thanks.

> Signed-off-by: Mao Wenan <maowenan@huawei.com>
> ---
>  net/sctp/socket.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 9d1f83b10c0a..5e1934c48709 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -309,7 +309,7 @@ static int sctp_bind(struct sock *sk, struct sockaddr *addr, int addr_len)
>  	return retval;
>  }
>  
> -static long sctp_get_port_local(struct sock *, union sctp_addr *);
> +static int sctp_get_port_local(struct sock *, union sctp_addr *);
>  
>  /* Verify this is a valid sockaddr. */
>  static struct sctp_af *sctp_sockaddr_af(struct sctp_sock *opt,
> @@ -7998,7 +7998,7 @@ static void sctp_unhash(struct sock *sk)
>  static struct sctp_bind_bucket *sctp_bucket_create(
>  	struct sctp_bind_hashbucket *head, struct net *, unsigned short snum);
>  
> -static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
> +static int sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
>  {
>  	struct sctp_sock *sp = sctp_sk(sk);
>  	bool reuse = (sk->sk_reuse || sp->reuse);
> @@ -8108,7 +8108,7 @@ static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
>  
>  			if (sctp_bind_addr_conflict(&ep2->base.bind_addr,
>  						    addr, sp2, sp)) {
> -				ret = (long)sk2;
> +				ret = 1;
>  				goto fail_unlock;
>  			}
>  		}
> @@ -8180,7 +8180,7 @@ static int sctp_get_port(struct sock *sk, unsigned short snum)
>  	addr.v4.sin_port = htons(snum);
>  
>  	/* Note: sk->sk_num gets filled in if ephemeral port request. */
> -	return !!sctp_get_port_local(sk, &addr);
> +	return sctp_get_port_local(sk, &addr);
>  }
>  
>  /*
> -- 
> 2.20.1
> 

^ permalink raw reply

* Re: [PATCH 04/11] net: phylink: switch to using fwnode_gpiod_get_index()
From: Russell King - ARM Linux admin @ 2019-09-12 13:52 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Linus Walleij, Dmitry Torokhov, Mika Westerberg,
	linux-kernel@vger.kernel.org, open list:GPIO SUBSYSTEM,
	Andrew Lunn, David S. Miller, Florian Fainelli, Heiner Kallweit,
	netdev
In-Reply-To: <20190912134429.GZ2680@smile.fi.intel.com>

On Thu, Sep 12, 2019 at 04:44:29PM +0300, Andy Shevchenko wrote:
> On Thu, Sep 12, 2019 at 10:41:43AM +0100, Linus Walleij wrote:
> > On Wed, Sep 11, 2019 at 10:51 AM Dmitry Torokhov
> > <dmitry.torokhov@gmail.com> wrote:
> > 
> > > If we are willing to sacrifice the custom label for the GPIO that
> > > fwnode_gpiod_get_index() allows us to set, then there are several
> > > drivers that could actually use gpiod_get() API.
> > 
> > We have:
> > gpiod_set_consumer_name(gpiod, "name");
> > to deal with that so no sacrifice is needed.
> 
> Thank for this hint!

Would it be possible to improve your email etiquette, and move this
discussion to a more appropriate subject line, so I don't have to keep
checking these emails, in case you _do_ talk about something relevent
to the original patch that the subject line refers to?

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply

* RE: [PATCH][PATCH net-next] hv_netvsc: Add the support of hibernation
From: Haiyang Zhang @ 2019-09-12 13:50 UTC (permalink / raw)
  To: Dexuan Cui, KY Srinivasan, Stephen Hemminger, sashal@kernel.org,
	davem@davemloft.net, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Michael Kelley
In-Reply-To: <1568245063-69693-1-git-send-email-decui@microsoft.com>



> -----Original Message-----
> From: Dexuan Cui <decui@microsoft.com>
> Sent: Wednesday, September 11, 2019 7:38 PM
> To: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; Stephen Hemminger
> <sthemmin@microsoft.com>; sashal@kernel.org; davem@davemloft.net;
> linux-hyperv@vger.kernel.org; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; Michael Kelley <mikelley@microsoft.com>
> Cc: Dexuan Cui <decui@microsoft.com>
> Subject: [PATCH][PATCH net-next] hv_netvsc: Add the support of hibernation
> 
> The existing netvsc_detach() and netvsc_attach() APIs make it easy to
> implement the suspend/resume callbacks.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
> 
> This patch is basically a pure Hyper-V specific change and it has a
> build dependency on the commit 271b2224d42f ("Drivers: hv: vmbus:
> Implement
> suspend/resume for VSC drivers for hibernation"), which is on Sasha Levin's
> Hyper-V tree's hyperv-next branch:
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.k
> ernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Fhyperv%2Flinux.git%2
> Flog%2F%3Fh%3Dhyperv-
> next&amp;data=02%7C01%7Chaiyangz%40microsoft.com%7C0b84e40c446
> 648cc35fb08d737110e28%7C72f988bf86f141af91ab2d7cd011db47%7C1%7
> C0%7C637038418703112019&amp;sdata=qd7DGFCJZ%2BDTix0VGcCe1JucV
> O97E0gILpVpcxlA6EE%3D&amp;reserved=0
> 
> I request this patch should go through Sasha's tree rather than the
> net-next tree.
> 
>  drivers/net/hyperv/hyperv_net.h |  3 +++
>  drivers/net/hyperv/netvsc_drv.c | 59
> +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 62 insertions(+)
> 
> diff --git a/drivers/net/hyperv/hyperv_net.h
> b/drivers/net/hyperv/hyperv_net.h
> index ecc9af0..b8763ee 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -952,6 +952,9 @@ struct net_device_context {
>  	u32 vf_alloc;
>  	/* Serial number of the VF to team with */
>  	u32 vf_serial;
> +
> +	/* Used to temporarily save the config info across hibernation */
> +	struct netvsc_device_info *saved_netvsc_dev_info;
>  };
> 
>  /* Per channel data */
> diff --git a/drivers/net/hyperv/netvsc_drv.c
> b/drivers/net/hyperv/netvsc_drv.c
> index afdcc56..f920959 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -2392,6 +2392,63 @@ static int netvsc_remove(struct hv_device *dev)
>  	return 0;
>  }
> 
> +static int netvsc_suspend(struct hv_device *dev)
> +{
> +	struct net_device_context *ndev_ctx;
> +	struct net_device *vf_netdev, *net;
> +	struct netvsc_device *nvdev;
> +	int ret;
> +
> +	net = hv_get_drvdata(dev);
> +
> +	ndev_ctx = netdev_priv(net);
> +	cancel_delayed_work_sync(&ndev_ctx->dwork);
> +
> +	rtnl_lock();
> +
> +	nvdev = rtnl_dereference(ndev_ctx->nvdev);
> +	if (nvdev == NULL) {
> +		ret = -ENODEV;
> +		goto out;
> +	}
> +
> +	cancel_work_sync(&nvdev->subchan_work);

This looks redundant because netvsc_detach() cancels subchan_work.

Thanks,
- Haiyang


^ permalink raw reply

* Re: [PATCH 04/11] net: phylink: switch to using fwnode_gpiod_get_index()
From: Andy Shevchenko @ 2019-09-12 13:44 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Dmitry Torokhov, Russell King - ARM Linux admin, Mika Westerberg,
	linux-kernel@vger.kernel.org, open list:GPIO SUBSYSTEM,
	Andrew Lunn, David S. Miller, Florian Fainelli, Heiner Kallweit,
	netdev
In-Reply-To: <CACRpkdbTErKxFBr__tj391FHwUTxC7ZF_m94tC8-VHzaynBsnw@mail.gmail.com>

On Thu, Sep 12, 2019 at 10:41:43AM +0100, Linus Walleij wrote:
> On Wed, Sep 11, 2019 at 10:51 AM Dmitry Torokhov
> <dmitry.torokhov@gmail.com> wrote:
> 
> > If we are willing to sacrifice the custom label for the GPIO that
> > fwnode_gpiod_get_index() allows us to set, then there are several
> > drivers that could actually use gpiod_get() API.
> 
> We have:
> gpiod_set_consumer_name(gpiod, "name");
> to deal with that so no sacrifice is needed.

Thank for this hint!

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply

* Re: memory leak in ppp_write
From: Takeshi Misawa @ 2019-09-12 13:34 UTC (permalink / raw)
  To: syzbot
  Cc: ast, bpf, daniel, davem, kafai, linux-kernel, linux-ppp, netdev,
	paulus, songliubraving, syzkaller-bugs, yhs
In-Reply-To: <000000000000594c700591433550@google.com>

[-- Attachment #1: Type: text/plain, Size: 57 bytes --]

#syz test: https://github.com/google/kasan.git 6525771f


[-- Attachment #2: 0001-ppp-Fix-memory-leak-in-ppp_write.patch --]
[-- Type: text/x-diff, Size: 2454 bytes --]

From e4ff0d04a4b8dd6da3dfb9135235ae5360ce86e6 Mon Sep 17 00:00:00 2001
From: Takeshi Misawa <jeliantsurux@gmail.com>
Date: Wed, 11 Sep 2019 22:18:43 +0900
Subject: [PATCH] ppp: Fix memory leak in ppp_write

When ppp is closing, __ppp_xmit_process() failed to enqueue skb
and skb allocated in ppp_write() is leaked.

syzbot reported :
BUG: memory leak
unreferenced object 0xffff88812a17bc00 (size 224):
  comm "syz-executor673", pid 6952, jiffies 4294942888 (age 13.040s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000d110fff9>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
    [<00000000d110fff9>] slab_post_alloc_hook mm/slab.h:522 [inline]
    [<00000000d110fff9>] slab_alloc_node mm/slab.c:3262 [inline]
    [<00000000d110fff9>] kmem_cache_alloc_node+0x163/0x2f0 mm/slab.c:3574
    [<000000002d616113>] __alloc_skb+0x6e/0x210 net/core/skbuff.c:197
    [<000000000167fc45>] alloc_skb include/linux/skbuff.h:1055 [inline]
    [<000000000167fc45>] ppp_write+0x48/0x120 drivers/net/ppp/ppp_generic.c:502
    [<000000009ab42c0b>] __vfs_write+0x43/0xa0 fs/read_write.c:494
    [<00000000086b2e22>] vfs_write fs/read_write.c:558 [inline]
    [<00000000086b2e22>] vfs_write+0xee/0x210 fs/read_write.c:542
    [<00000000a2b70ef9>] ksys_write+0x7c/0x130 fs/read_write.c:611
    [<00000000ce5e0fdd>] __do_sys_write fs/read_write.c:623 [inline]
    [<00000000ce5e0fdd>] __se_sys_write fs/read_write.c:620 [inline]
    [<00000000ce5e0fdd>] __x64_sys_write+0x1e/0x30 fs/read_write.c:620
    [<00000000d9d7b370>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:296
    [<0000000006e6d506>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix this by freeing skb, if ppp is closing.

Reported-by: syzbot+d9c8bf24e56416d7ce2c@syzkaller.appspotmail.com
Signed-off-by: Takeshi Misawa <jeliantsurux@gmail.com>
---
 drivers/net/ppp/ppp_generic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index a30e41a56085..9a1b006904a7 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -1415,6 +1415,8 @@ static void __ppp_xmit_process(struct ppp *ppp, struct sk_buff *skb)
 			netif_wake_queue(ppp->dev);
 		else
 			netif_stop_queue(ppp->dev);
+	} else {
+		kfree_skb(skb);
 	}
 	ppp_xmit_unlock(ppp);
 }
-- 
2.17.1


^ permalink raw reply related

* [PATCH v4 0/2] ethtool: implement Energy Detect Powerdown support via phy-tunable
From: Alexandru Ardelean @ 2019-09-12 16:28 UTC (permalink / raw)
  To: netdev, devicetree, linux-kernel
  Cc: davem, robh+dt, mark.rutland, f.fainelli, hkallweit1, andrew,
	mkubecek, Alexandru Ardelean

This changeset proposes a new control for PHY tunable to control Energy
Detect Power Down.

The `phy_tunable_id` has been named `ETHTOOL_PHY_EDPD` since it looks like
this feature is common across other PHYs (like EEE), and defining
`ETHTOOL_PHY_ENERGY_DETECT_POWER_DOWN` seems too long.
    
The way EDPD works, is that the RX block is put to a lower power mode,
except for link-pulse detection circuits. The TX block is also put to low
power mode, but the PHY wakes-up periodically to send link pulses, to avoid
lock-ups in case the other side is also in EDPD mode.
    
Currently, there are 2 PHY drivers that look like they could use this new
PHY tunable feature: the `adin` && `micrel` PHYs.

This series updates only the `adin` PHY driver to support this new feature,
as this chip has been tested. A change for `micrel` can be proposed after a
discussion of the PHY-tunable API is resolved.

Alexandru Ardelean (2):
  ethtool: implement Energy Detect Powerdown support via phy-tunable
  net: phy: adin: implement Energy Detect Powerdown mode via phy-tunable

 drivers/net/phy/adin.c       | 61 ++++++++++++++++++++++++++++++++++++
 include/uapi/linux/ethtool.h | 22 +++++++++++++
 net/core/ethtool.c           |  6 ++++
 3 files changed, 89 insertions(+)

-- 

Changelog v3 -> v4:
* impose the TX interval unit for EDPD to be milliseconds; the point was
  raised by Michal; this should allow for intervals:
   - as small as 1 millisecond, which does not sound like a power-saver
   - as large as 65 seconds, which sounds like a lot to wait for a link to
     come up

Changelog v2 -> v3:
* implement Andrew's review comments:
  1. for patch `ethtool: implement Energy Detect Powerdown support via
     phy-tunable`
     - ETHTOOL_PHY_EDPD_DFLT_TX_INTERVAL == 0xffff
     - ETHTOOL_PHY_EDPD_NO_TX            == 0xfffe
     - added comment in include/uapi/linux/ethtool.h
  2. for patch `net: phy: adin: implement Energy Detect Powerdown mode via
     phy-tunable`
     - added comments about interval & units for the ADIN PHY
     - in `adin_set_edpd()`: add a switch statement of all the valid values
     - in `adin_get_edpd()`: return `ETHTOOL_PHY_EDPD_DFLT_TX_INTERVAL`
       since the PHY only supports a single TX-interval value (1 second)

Changelog v1 -> v2:
* initial series was made up of 2 sub-series: 1 for kernel & 1 for ethtool
  in userspace; v2 contains only the kernel series

2.20.1


^ permalink raw reply

* [PATCH v4 2/2] net: phy: adin: implement Energy Detect Powerdown mode via phy-tunable
From: Alexandru Ardelean @ 2019-09-12 16:28 UTC (permalink / raw)
  To: netdev, devicetree, linux-kernel
  Cc: davem, robh+dt, mark.rutland, f.fainelli, hkallweit1, andrew,
	mkubecek, Alexandru Ardelean
In-Reply-To: <20190912162812.402-1-alexandru.ardelean@analog.com>

This driver becomes the first user of the kernel's `ETHTOOL_PHY_EDPD`
phy-tunable feature.
EDPD is also enabled by default on PHY config_init, but can be disabled via
the phy-tunable control.

When enabling EDPD, it's also a good idea (for the ADIN PHYs) to enable TX
periodic pulses, so that in case the other PHY is also on EDPD mode, there
is no lock-up situation where both sides are waiting for the other to
transmit.

Via the phy-tunable control, TX pulses can be disabled if specifying 0
`tx-interval` via ethtool.

The ADIN PHY supports only fixed 1 second intervals; they cannot be
configured. That is why the acceptable values are 1,
ETHTOOL_PHY_EDPD_DFLT_TX_MSECS and ETHTOOL_PHY_EDPD_NO_TX (which disables
TX pulses).

Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
---
 drivers/net/phy/adin.c | 61 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/drivers/net/phy/adin.c b/drivers/net/phy/adin.c
index 4dec83df048d..9afeee67675b 100644
--- a/drivers/net/phy/adin.c
+++ b/drivers/net/phy/adin.c
@@ -26,6 +26,11 @@
 
 #define ADIN1300_RX_ERR_CNT			0x0014
 
+#define ADIN1300_PHY_CTRL_STATUS2		0x0015
+#define   ADIN1300_NRG_PD_EN			BIT(3)
+#define   ADIN1300_NRG_PD_TX_EN			BIT(2)
+#define   ADIN1300_NRG_PD_STATUS		BIT(1)
+
 #define ADIN1300_PHY_CTRL2			0x0016
 #define   ADIN1300_DOWNSPEED_AN_100_EN		BIT(11)
 #define   ADIN1300_DOWNSPEED_AN_10_EN		BIT(10)
@@ -328,12 +333,62 @@ static int adin_set_downshift(struct phy_device *phydev, u8 cnt)
 			    ADIN1300_DOWNSPEEDS_EN);
 }
 
+static int adin_get_edpd(struct phy_device *phydev, u16 *tx_interval)
+{
+	int val;
+
+	val = phy_read(phydev, ADIN1300_PHY_CTRL_STATUS2);
+	if (val < 0)
+		return val;
+
+	if (ADIN1300_NRG_PD_EN & val) {
+		if (val & ADIN1300_NRG_PD_TX_EN)
+			/* default is 1 second */
+			*tx_interval = ETHTOOL_PHY_EDPD_DFLT_TX_MSECS;
+		else
+			*tx_interval = ETHTOOL_PHY_EDPD_NO_TX;
+	} else {
+		*tx_interval = ETHTOOL_PHY_EDPD_DISABLE;
+	}
+
+	return 0;
+}
+
+static int adin_set_edpd(struct phy_device *phydev, u16 tx_interval)
+{
+	u16 val;
+
+	if (tx_interval == ETHTOOL_PHY_EDPD_DISABLE)
+		return phy_clear_bits(phydev, ADIN1300_PHY_CTRL_STATUS2,
+				(ADIN1300_NRG_PD_EN | ADIN1300_NRG_PD_TX_EN));
+
+	val = ADIN1300_NRG_PD_EN;
+
+	switch (tx_interval) {
+	case 1000: /* 1 second */
+		/* fallthrough */
+	case ETHTOOL_PHY_EDPD_DFLT_TX_MSECS:
+		val |= ADIN1300_NRG_PD_TX_EN;
+		/* fallthrough */
+	case ETHTOOL_PHY_EDPD_NO_TX:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return phy_modify(phydev, ADIN1300_PHY_CTRL_STATUS2,
+			  (ADIN1300_NRG_PD_EN | ADIN1300_NRG_PD_TX_EN),
+			  val);
+}
+
 static int adin_get_tunable(struct phy_device *phydev,
 			    struct ethtool_tunable *tuna, void *data)
 {
 	switch (tuna->id) {
 	case ETHTOOL_PHY_DOWNSHIFT:
 		return adin_get_downshift(phydev, data);
+	case ETHTOOL_PHY_EDPD:
+		return adin_get_edpd(phydev, data);
 	default:
 		return -EOPNOTSUPP;
 	}
@@ -345,6 +400,8 @@ static int adin_set_tunable(struct phy_device *phydev,
 	switch (tuna->id) {
 	case ETHTOOL_PHY_DOWNSHIFT:
 		return adin_set_downshift(phydev, *(const u8 *)data);
+	case ETHTOOL_PHY_EDPD:
+		return adin_set_edpd(phydev, *(const u16 *)data);
 	default:
 		return -EOPNOTSUPP;
 	}
@@ -368,6 +425,10 @@ static int adin_config_init(struct phy_device *phydev)
 	if (rc < 0)
 		return rc;
 
+	rc = adin_set_edpd(phydev, 1);
+	if (rc < 0)
+		return rc;
+
 	phydev_dbg(phydev, "PHY is using mode '%s'\n",
 		   phy_modes(phydev->interface));
 
-- 
2.20.1


^ permalink raw reply related

* [PATCH v4 1/2] ethtool: implement Energy Detect Powerdown support via phy-tunable
From: Alexandru Ardelean @ 2019-09-12 16:28 UTC (permalink / raw)
  To: netdev, devicetree, linux-kernel
  Cc: davem, robh+dt, mark.rutland, f.fainelli, hkallweit1, andrew,
	mkubecek, Alexandru Ardelean
In-Reply-To: <20190912162812.402-1-alexandru.ardelean@analog.com>

The `phy_tunable_id` has been named `ETHTOOL_PHY_EDPD` since it looks like
this feature is common across other PHYs (like EEE), and defining
`ETHTOOL_PHY_ENERGY_DETECT_POWER_DOWN` seems too long.

The way EDPD works, is that the RX block is put to a lower power mode,
except for link-pulse detection circuits. The TX block is also put to low
power mode, but the PHY wakes-up periodically to send link pulses, to avoid
lock-ups in case the other side is also in EDPD mode.

Currently, there are 2 PHY drivers that look like they could use this new
PHY tunable feature: the `adin` && `micrel` PHYs.

The ADIN's datasheet mentions that TX pulses are at intervals of 1 second
default each, and they can be disabled. For the Micrel KSZ9031 PHY, the
datasheet does not mention whether they can be disabled, but mentions that
they can modified.

The way this change is structured, is similar to the PHY tunable downshift
control:
* a `ETHTOOL_PHY_EDPD_DFLT_TX_MSECS` value is exposed to cover a default
  TX interval; some PHYs could specify a certain value that makes sense
* `ETHTOOL_PHY_EDPD_NO_TX` would disable TX when EDPD is enabled
* `ETHTOOL_PHY_EDPD_DISABLE` will disable EDPD

As noted by the `ETHTOOL_PHY_EDPD_DFLT_TX_MSECS` the interval unit is 1
millisecond, which should cover a reasonable range of intervals:
 - from 1 millisecond, which does not sound like much of a power-saver
 - to ~65 seconds which is quite a lot to wait for a link to come up when
   plugging a cable

Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
---
 include/uapi/linux/ethtool.h | 22 ++++++++++++++++++++++
 net/core/ethtool.c           |  6 ++++++
 2 files changed, 28 insertions(+)

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index dd06302aa93e..8938b76c4ee3 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -259,10 +259,32 @@ struct ethtool_tunable {
 #define ETHTOOL_PHY_FAST_LINK_DOWN_ON	0
 #define ETHTOOL_PHY_FAST_LINK_DOWN_OFF	0xff
 
+/* Energy Detect Power Down (EDPD) is a feature supported by some PHYs, where
+ * the PHY's RX & TX blocks are put into a low-power mode when there is no
+ * link detected (typically cable is un-plugged). For RX, only a minimal
+ * link-detection is available, and for TX the PHY wakes up to send link pulses
+ * to avoid any lock-ups in case the peer PHY may also be running in EDPD mode.
+ *
+ * Some PHYs may support configuration of the wake-up interval for TX pulses,
+ * and some PHYs may support only disabling TX pulses entirely. For the latter
+ * a special value is required (ETHTOOL_PHY_EDPD_NO_TX) so that this can be
+ * configured from userspace (should the user want it).
+ *
+ * The interval units for TX wake-up are in milliseconds, since this should
+ * cover a reasonable range of intervals:
+ *  - from 1 millisecond, which does not sound like much of a power-saver
+ *  - to ~65 seconds which is quite a lot to wait for a link to come up when
+ *    plugging a cable
+ */
+#define ETHTOOL_PHY_EDPD_DFLT_TX_MSECS		0xffff
+#define ETHTOOL_PHY_EDPD_NO_TX			0xfffe
+#define ETHTOOL_PHY_EDPD_DISABLE		0
+
 enum phy_tunable_id {
 	ETHTOOL_PHY_ID_UNSPEC,
 	ETHTOOL_PHY_DOWNSHIFT,
 	ETHTOOL_PHY_FAST_LINK_DOWN,
+	ETHTOOL_PHY_EDPD,
 	/*
 	 * Add your fresh new phy tunable attribute above and remember to update
 	 * phy_tunable_strings[] in net/core/ethtool.c
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 6288e69e94fc..c763106c73fc 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -133,6 +133,7 @@ phy_tunable_strings[__ETHTOOL_PHY_TUNABLE_COUNT][ETH_GSTRING_LEN] = {
 	[ETHTOOL_ID_UNSPEC]     = "Unspec",
 	[ETHTOOL_PHY_DOWNSHIFT]	= "phy-downshift",
 	[ETHTOOL_PHY_FAST_LINK_DOWN] = "phy-fast-link-down",
+	[ETHTOOL_PHY_EDPD]	= "phy-energy-detect-power-down",
 };
 
 static int ethtool_get_features(struct net_device *dev, void __user *useraddr)
@@ -2451,6 +2452,11 @@ static int ethtool_phy_tunable_valid(const struct ethtool_tunable *tuna)
 		    tuna->type_id != ETHTOOL_TUNABLE_U8)
 			return -EINVAL;
 		break;
+	case ETHTOOL_PHY_EDPD:
+		if (tuna->len != sizeof(u16) ||
+		    tuna->type_id != ETHTOOL_TUNABLE_U16)
+			return -EINVAL;
+		break;
 	default:
 		return -EINVAL;
 	}
-- 
2.20.1


^ permalink raw reply related

* [PATCH net] net: dsa: Fix load order between DSA drivers and taggers
From: Andrew Lunn @ 2019-09-12 13:16 UTC (permalink / raw)
  To: David Miller; +Cc: Florian Fainelli, Vivien Didelot, netdev, Andrew Lunn

The DSA core, DSA taggers and DSA drivers all make use of
module_init(). Hence they get initialised at device_initcall() time.
The ordering is non-deterministic. It can be a DSA driver is bound to
a device before the needed tag driver has been initialised, resulting
in the message:

No tagger for this switch

Rather than have this be fatal, return -EPROBE_DEFER so that it is
tried again later once all the needed drivers have been loaded.

Fixes: d3b8c04988ca ("dsa: Add boilerplate helper to register DSA tag driver modules")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---

I did wonder if we should play with the core and tag drivers and make
them use subsystem_initcall(), but EPROBE_DEFER seems to be the more
preferred solution nowadays.
---
 net/dsa/dsa2.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 3abd173ebacb..96f787cf9b6e 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -623,6 +623,8 @@ static int dsa_port_parse_cpu(struct dsa_port *dp, struct net_device *master)
 	tag_protocol = ds->ops->get_tag_protocol(ds, dp->index);
 	tag_ops = dsa_tag_driver_get(tag_protocol);
 	if (IS_ERR(tag_ops)) {
+		if (PTR_ERR(tag_ops) == -ENOPROTOOPT)
+			return -EPROBE_DEFER;
 		dev_warn(ds->dev, "No tagger for this switch\n");
 		return PTR_ERR(tag_ops);
 	}
-- 
2.23.0


^ permalink raw reply related

* [PATCH net-next 2/2] mlxsw: spectrum_buffers: Add the ability to query the CPU port's shared buffer
From: Ido Schimmel @ 2019-09-12 13:07 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, shalomt, mlxsw, Ido Schimmel
In-Reply-To: <20190912130740.22061-1-idosch@idosch.org>

From: Shalom Toledo <shalomt@mellanox.com>

While debugging packet loss towards the CPU, it is useful to be able to
query the CPU port's shared buffer quotas and occupancy.

Since the CPU port has no ingress buffers, all the shared buffers ingress
information will be cleared.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 .../mellanox/mlxsw/spectrum_buffers.c         | 51 ++++++++++++++++---
 1 file changed, 43 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
index 888ba4300bcc..b9eeae37a4dc 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
@@ -250,6 +250,10 @@ static int mlxsw_sp_sb_pm_occ_clear(struct mlxsw_sp *mlxsw_sp, u8 local_port,
 		&mlxsw_sp->sb_vals->pool_dess[pool_index];
 	char sbpm_pl[MLXSW_REG_SBPM_LEN];
 
+	if (local_port == MLXSW_PORT_CPU_PORT &&
+	    des->dir == MLXSW_REG_SBXX_DIR_INGRESS)
+		return 0;
+
 	mlxsw_reg_sbpm_pack(sbpm_pl, local_port, des->pool, des->dir,
 			    true, 0, 0);
 	return mlxsw_reg_trans_query(mlxsw_sp->core, MLXSW_REG(sbpm), sbpm_pl,
@@ -273,6 +277,10 @@ static int mlxsw_sp_sb_pm_occ_query(struct mlxsw_sp *mlxsw_sp, u8 local_port,
 	char sbpm_pl[MLXSW_REG_SBPM_LEN];
 	struct mlxsw_sp_sb_pm *pm;
 
+	if (local_port == MLXSW_PORT_CPU_PORT &&
+	    des->dir == MLXSW_REG_SBXX_DIR_INGRESS)
+		return 0;
+
 	pm = mlxsw_sp_sb_pm_get(mlxsw_sp, local_port, pool_index);
 	mlxsw_reg_sbpm_pack(sbpm_pl, local_port, des->pool, des->dir,
 			    false, 0, 0);
@@ -1085,6 +1093,11 @@ int mlxsw_sp_sb_port_pool_set(struct mlxsw_core_port *mlxsw_core_port,
 	u32 max_buff;
 	int err;
 
+	if (local_port == MLXSW_PORT_CPU_PORT) {
+		NL_SET_ERR_MSG_MOD(extack, "Changing CPU port's threshold is forbidden");
+		return -EINVAL;
+	}
+
 	err = mlxsw_sp_sb_threshold_in(mlxsw_sp, pool_index,
 				       threshold, &max_buff, extack);
 	if (err)
@@ -1130,6 +1143,11 @@ int mlxsw_sp_sb_tc_pool_bind_set(struct mlxsw_core_port *mlxsw_core_port,
 	u32 max_buff;
 	int err;
 
+	if (local_port == MLXSW_PORT_CPU_PORT) {
+		NL_SET_ERR_MSG_MOD(extack, "Changing CPU port's binding is forbidden");
+		return -EINVAL;
+	}
+
 	if (dir != mlxsw_sp->sb_vals->pool_dess[pool_index].dir) {
 		NL_SET_ERR_MSG_MOD(extack, "Binding egress TC to ingress pool and vice versa is forbidden");
 		return -EINVAL;
@@ -1187,6 +1205,11 @@ static void mlxsw_sp_sb_sr_occ_query_cb(struct mlxsw_core *mlxsw_core,
 	     local_port < mlxsw_core_max_ports(mlxsw_core); local_port++) {
 		if (!mlxsw_sp->ports[local_port])
 			continue;
+		if (local_port == MLXSW_PORT_CPU_PORT) {
+			/* Ingress quotas are not supported for the CPU port */
+			masked_count++;
+			continue;
+		}
 		for (i = 0; i < MLXSW_SP_SB_ING_TC_COUNT; i++) {
 			cm = mlxsw_sp_sb_cm_get(mlxsw_sp, local_port, i,
 						MLXSW_REG_SBXX_DIR_INGRESS);
@@ -1222,7 +1245,7 @@ int mlxsw_sp_sb_occ_snapshot(struct mlxsw_core *mlxsw_core,
 	char *sbsr_pl;
 	u8 masked_count;
 	u8 local_port_1;
-	u8 local_port = 0;
+	u8 local_port;
 	int i;
 	int err;
 	int err2;
@@ -1231,8 +1254,8 @@ int mlxsw_sp_sb_occ_snapshot(struct mlxsw_core *mlxsw_core,
 	if (!sbsr_pl)
 		return -ENOMEM;
 
+	local_port = MLXSW_PORT_CPU_PORT;
 next_batch:
-	local_port++;
 	local_port_1 = local_port;
 	masked_count = 0;
 	mlxsw_reg_sbsr_pack(sbsr_pl, false);
@@ -1243,7 +1266,11 @@ int mlxsw_sp_sb_occ_snapshot(struct mlxsw_core *mlxsw_core,
 	for (; local_port < mlxsw_core_max_ports(mlxsw_core); local_port++) {
 		if (!mlxsw_sp->ports[local_port])
 			continue;
-		mlxsw_reg_sbsr_ingress_port_mask_set(sbsr_pl, local_port, 1);
+		if (local_port != MLXSW_PORT_CPU_PORT) {
+			/* Ingress quotas are not supported for the CPU port */
+			mlxsw_reg_sbsr_ingress_port_mask_set(sbsr_pl,
+							     local_port, 1);
+		}
 		mlxsw_reg_sbsr_egress_port_mask_set(sbsr_pl, local_port, 1);
 		for (i = 0; i < mlxsw_sp->sb_vals->pool_count; i++) {
 			err = mlxsw_sp_sb_pm_occ_query(mlxsw_sp, local_port, i,
@@ -1264,8 +1291,10 @@ int mlxsw_sp_sb_occ_snapshot(struct mlxsw_core *mlxsw_core,
 				    cb_priv);
 	if (err)
 		goto out;
-	if (local_port < mlxsw_core_max_ports(mlxsw_core))
+	if (local_port < mlxsw_core_max_ports(mlxsw_core)) {
+		local_port++;
 		goto next_batch;
+	}
 
 out:
 	err2 = mlxsw_reg_trans_bulk_wait(&bulk_list);
@@ -1282,7 +1311,7 @@ int mlxsw_sp_sb_occ_max_clear(struct mlxsw_core *mlxsw_core,
 	LIST_HEAD(bulk_list);
 	char *sbsr_pl;
 	unsigned int masked_count;
-	u8 local_port = 0;
+	u8 local_port;
 	int i;
 	int err;
 	int err2;
@@ -1291,8 +1320,8 @@ int mlxsw_sp_sb_occ_max_clear(struct mlxsw_core *mlxsw_core,
 	if (!sbsr_pl)
 		return -ENOMEM;
 
+	local_port = MLXSW_PORT_CPU_PORT;
 next_batch:
-	local_port++;
 	masked_count = 0;
 	mlxsw_reg_sbsr_pack(sbsr_pl, true);
 	for (i = 0; i < MLXSW_SP_SB_ING_TC_COUNT; i++)
@@ -1302,7 +1331,11 @@ int mlxsw_sp_sb_occ_max_clear(struct mlxsw_core *mlxsw_core,
 	for (; local_port < mlxsw_core_max_ports(mlxsw_core); local_port++) {
 		if (!mlxsw_sp->ports[local_port])
 			continue;
-		mlxsw_reg_sbsr_ingress_port_mask_set(sbsr_pl, local_port, 1);
+		if (local_port != MLXSW_PORT_CPU_PORT) {
+			/* Ingress quotas are not supported for the CPU port */
+			mlxsw_reg_sbsr_ingress_port_mask_set(sbsr_pl,
+							     local_port, 1);
+		}
 		mlxsw_reg_sbsr_egress_port_mask_set(sbsr_pl, local_port, 1);
 		for (i = 0; i < mlxsw_sp->sb_vals->pool_count; i++) {
 			err = mlxsw_sp_sb_pm_occ_clear(mlxsw_sp, local_port, i,
@@ -1319,8 +1352,10 @@ int mlxsw_sp_sb_occ_max_clear(struct mlxsw_core *mlxsw_core,
 				    &bulk_list, NULL, 0);
 	if (err)
 		goto out;
-	if (local_port < mlxsw_core_max_ports(mlxsw_core))
+	if (local_port < mlxsw_core_max_ports(mlxsw_core)) {
+		local_port++;
 		goto next_batch;
+	}
 
 out:
 	err2 = mlxsw_reg_trans_bulk_wait(&bulk_list);
-- 
2.21.0


^ permalink raw reply related

* [PATCH net-next 1/2] mlxsw: spectrum: Register CPU port with devlink
From: Ido Schimmel @ 2019-09-12 13:07 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, shalomt, mlxsw, Ido Schimmel
In-Reply-To: <20190912130740.22061-1-idosch@idosch.org>

From: Shalom Toledo <shalomt@mellanox.com>

Register CPU port with devlink.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/core.c    | 33 +++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/core.h    |  5 ++
 .../net/ethernet/mellanox/mlxsw/spectrum.c    | 47 +++++++++++++++++++
 3 files changed, 85 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 963a2b4b61b1..94f83d2be17e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -1892,6 +1892,39 @@ void mlxsw_core_port_fini(struct mlxsw_core *mlxsw_core, u8 local_port)
 }
 EXPORT_SYMBOL(mlxsw_core_port_fini);
 
+int mlxsw_core_cpu_port_init(struct mlxsw_core *mlxsw_core,
+			     void *port_driver_priv,
+			     const unsigned char *switch_id,
+			     unsigned char switch_id_len)
+{
+	struct devlink *devlink = priv_to_devlink(mlxsw_core);
+	struct mlxsw_core_port *mlxsw_core_port =
+					&mlxsw_core->ports[MLXSW_PORT_CPU_PORT];
+	struct devlink_port *devlink_port = &mlxsw_core_port->devlink_port;
+	int err;
+
+	mlxsw_core_port->local_port = MLXSW_PORT_CPU_PORT;
+	mlxsw_core_port->port_driver_priv = port_driver_priv;
+	devlink_port_attrs_set(devlink_port, DEVLINK_PORT_FLAVOUR_CPU,
+			       0, false, 0, switch_id, switch_id_len);
+	err = devlink_port_register(devlink, devlink_port, MLXSW_PORT_CPU_PORT);
+	if (err)
+		memset(mlxsw_core_port, 0, sizeof(*mlxsw_core_port));
+	return err;
+}
+EXPORT_SYMBOL(mlxsw_core_cpu_port_init);
+
+void mlxsw_core_cpu_port_fini(struct mlxsw_core *mlxsw_core)
+{
+	struct mlxsw_core_port *mlxsw_core_port =
+					&mlxsw_core->ports[MLXSW_PORT_CPU_PORT];
+	struct devlink_port *devlink_port = &mlxsw_core_port->devlink_port;
+
+	devlink_port_unregister(devlink_port);
+	memset(mlxsw_core_port, 0, sizeof(*mlxsw_core_port));
+}
+EXPORT_SYMBOL(mlxsw_core_cpu_port_fini);
+
 void mlxsw_core_port_eth_set(struct mlxsw_core *mlxsw_core, u8 local_port,
 			     void *port_driver_priv, struct net_device *dev)
 {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index b65a17d49e43..5d7d2ab6d155 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -177,6 +177,11 @@ int mlxsw_core_port_init(struct mlxsw_core *mlxsw_core, u8 local_port,
 			 const unsigned char *switch_id,
 			 unsigned char switch_id_len);
 void mlxsw_core_port_fini(struct mlxsw_core *mlxsw_core, u8 local_port);
+int mlxsw_core_cpu_port_init(struct mlxsw_core *mlxsw_core,
+			     void *port_driver_priv,
+			     const unsigned char *switch_id,
+			     unsigned char switch_id_len);
+void mlxsw_core_cpu_port_fini(struct mlxsw_core *mlxsw_core);
 void mlxsw_core_port_eth_set(struct mlxsw_core *mlxsw_core, u8 local_port,
 			     void *port_driver_priv, struct net_device *dev);
 void mlxsw_core_port_ib_set(struct mlxsw_core *mlxsw_core, u8 local_port,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 91e4792bb7e7..1fc73a9ad84d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -3872,6 +3872,46 @@ static void mlxsw_sp_port_remove(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 	mlxsw_core_port_fini(mlxsw_sp->core, local_port);
 }
 
+static int mlxsw_sp_cpu_port_create(struct mlxsw_sp *mlxsw_sp)
+{
+	struct mlxsw_sp_port *mlxsw_sp_port;
+	int err;
+
+	mlxsw_sp_port = kzalloc(sizeof(*mlxsw_sp_port), GFP_KERNEL);
+	if (!mlxsw_sp_port)
+		return -ENOMEM;
+
+	mlxsw_sp_port->mlxsw_sp = mlxsw_sp;
+	mlxsw_sp_port->local_port = MLXSW_PORT_CPU_PORT;
+
+	mlxsw_sp->ports[MLXSW_PORT_CPU_PORT] = mlxsw_sp_port;
+
+	err = mlxsw_core_cpu_port_init(mlxsw_sp->core,
+				       mlxsw_sp_port,
+				       mlxsw_sp->base_mac,
+				       sizeof(mlxsw_sp->base_mac));
+	if (err) {
+		dev_err(mlxsw_sp->bus_info->dev, "Failed to initialize core CPU port\n");
+		goto err_core_cpu_port_init;
+	}
+
+	return err;
+
+err_core_cpu_port_init:
+	mlxsw_sp->ports[MLXSW_PORT_CPU_PORT] = NULL;
+	kfree(mlxsw_sp_port);
+	return err;
+}
+
+static void mlxsw_sp_cpu_port_remove(struct mlxsw_sp *mlxsw_sp)
+{
+	struct mlxsw_sp_port *mlxsw_sp_port = mlxsw_sp->ports[0];
+
+	mlxsw_core_cpu_port_fini(mlxsw_sp->core);
+	mlxsw_sp->ports[MLXSW_PORT_CPU_PORT] = NULL;
+	kfree(mlxsw_sp_port);
+}
+
 static bool mlxsw_sp_port_created(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 {
 	return mlxsw_sp->ports[local_port] != NULL;
@@ -3884,6 +3924,7 @@ static void mlxsw_sp_ports_remove(struct mlxsw_sp *mlxsw_sp)
 	for (i = 1; i < mlxsw_core_max_ports(mlxsw_sp->core); i++)
 		if (mlxsw_sp_port_created(mlxsw_sp, i))
 			mlxsw_sp_port_remove(mlxsw_sp, i);
+	mlxsw_sp_cpu_port_remove(mlxsw_sp);
 	kfree(mlxsw_sp->port_to_module);
 	kfree(mlxsw_sp->ports);
 }
@@ -3908,6 +3949,10 @@ static int mlxsw_sp_ports_create(struct mlxsw_sp *mlxsw_sp)
 		goto err_port_to_module_alloc;
 	}
 
+	err = mlxsw_sp_cpu_port_create(mlxsw_sp);
+	if (err)
+		goto err_cpu_port_create;
+
 	for (i = 1; i < max_ports; i++) {
 		/* Mark as invalid */
 		mlxsw_sp->port_to_module[i] = -1;
@@ -3931,6 +3976,8 @@ static int mlxsw_sp_ports_create(struct mlxsw_sp *mlxsw_sp)
 	for (i--; i >= 1; i--)
 		if (mlxsw_sp_port_created(mlxsw_sp, i))
 			mlxsw_sp_port_remove(mlxsw_sp, i);
+	mlxsw_sp_cpu_port_remove(mlxsw_sp);
+err_cpu_port_create:
 	kfree(mlxsw_sp->port_to_module);
 err_port_to_module_alloc:
 	kfree(mlxsw_sp->ports);
-- 
2.21.0


^ permalink raw reply related

* [PATCH net-next 0/2] mlxsw: spectrum_buffers: Add the ability to query the CPU port's shared buffer
From: Ido Schimmel @ 2019-09-12 13:07 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, shalomt, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Shalom says:

While debugging packet loss towards the CPU, it is useful to be able to
query the CPU port's shared buffer quotas and occupancy.

Patch #1 registers the CPU port with devlink.

Patch #2 adds the ability to query the CPU port's shared buffer quotas and
occupancy.

Shalom Toledo (2):
  mlxsw: spectrum: Register CPU port with devlink
  mlxsw: spectrum_buffers: Add the ability to query the CPU port's
    shared buffer

 drivers/net/ethernet/mellanox/mlxsw/core.c    | 33 ++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/core.h    |  5 ++
 .../net/ethernet/mellanox/mlxsw/spectrum.c    | 47 +++++++++++++++++
 .../mellanox/mlxsw/spectrum_buffers.c         | 51 ++++++++++++++++---
 4 files changed, 128 insertions(+), 8 deletions(-)

-- 
2.21.0


^ permalink raw reply

* Re: [PATCH net v2 01/11] net: core: limit nested device depth
From: Taehee Yoo @ 2019-09-12 12:49 UTC (permalink / raw)
  To: David Miller
  Cc: Netdev, j.vosburgh, vfalico, Andy Gospodarek,
	Jiří Pírko, sd, Roopa Prabhu, saeedm, manishc,
	rahulv, kys, haiyangz, sthemmin, sashal, hare, varun, ubraun,
	kgraul, Jay Vosburgh
In-Reply-To: <20190912.133717.257813019167130934.davem@davemloft.net>

On Thu, 12 Sep 2019 at 20:37, David Miller <davem@davemloft.net> wrote:
>
> From: Taehee Yoo <ap420073@gmail.com>
> Date: Thu, 12 Sep 2019 19:14:37 +0900
>
> > On Thu, 12 Sep 2019 at 18:38, David Miller <davem@davemloft.net> wrote:
> >>
> >> From: Taehee Yoo <ap420073@gmail.com>
> >> Date: Thu, 12 Sep 2019 12:56:19 +0900
> >>
> >> > I tested with this reproducer commands without lockdep.
> >> >
> >> >     ip link add dummy0 type dummy
> >> >     ip link add link dummy0 name vlan1 type vlan id 1
> >> >     ip link set vlan1 up
> >> >
> >> >     for i in {2..200}
> >> >     do
> >> >             let A=$i-1
> >> >
> >> >             ip link add name vlan$i link vlan$A type vlan id $i
> >> >     done
> >> >     ip link del vlan1 <-- this command is added.
> >>
> >> Is there any other device type which allows arbitrary nesting depth
> >> in this manner other than VLAN?  Perhaps it is the VLAN nesting
> >> depth that we should limit instead of all of this extra code.
> >
> > Below device types have the same problem.
> > VLAN, BONDING, TEAM, VXLAN, MACVLAN, and MACSEC.
> > All the below test commands reproduce a panic.
>
> I think then we need to move the traversals over to a iterative
> rather than recursive algorithm.

Just to clarify, I have a question.

There are two recursive routines in the code.
a) netdev_walk_all_{lower/upper}_dev() that are used to traversal
all of their lower or upper devices.
b) VLAN, BONDING, TEAM, VXLAN, MACVLAN, and MACSEC
modules internally handle their lower/upper devices recursively
when an event is received such as unregistering.

what is the routine that you mentioned?

^ permalink raw reply

* Re: [patch net-next rfc 3/7] net: rtnetlink: add commands to add and delete alternative ifnames
From: Jiri Pirko @ 2019-09-12 11:59 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Michal Kubecek, netdev, David Miller, Jakub Kicinski, David Ahern,
	Stephen Hemminger, dcbw, Andrew Lunn, parav, Saeed Mahameed,
	mlxsw
In-Reply-To: <20190830170342.GR2312@nanopsycho>

Fri, Aug 30, 2019 at 07:03:42PM CEST, jiri@resnulli.us wrote:
>Fri, Aug 30, 2019 at 04:35:23PM CEST, roopa@cumulusnetworks.com wrote:

[...]

>>
>>so to summarize, i think we have discussed the following options to
>>update a netlink list attribute so far:
>>(a) encode an optional attribute/flag in the list attribute in
>>RTM_SETLINK to indicate if it is a add or del
>>(b) Use a flag in RTM_SETLINK and RTM_DELINK to indicate add/del
>>(close to bridge vlan add/del)
>
>Nope, bridge vlan add/del is done according to the cmd, not any flag.
>
>
>>(c) introduce a separate generic msg type to add/del to a list
>>attribute (IIUC this does need a separate msg type per subsystem or
>>netlink API)

Getting back to this, sorry.

Thinking about it for some time, a,b,c have all their issues. Why can't
we have another separate cmd as I originally proposed in this RFC? Does
anyone have any argument against it? Could you please describe?

Because otherwise, I don't feel comfortable going to any of a,b,c :(

Thanks!


^ permalink raw reply

* Re: [PATCH v2 net-next 3/7] net: dsa: sja1105: Switch to hardware operations for PTP
From: Vladimir Oltean @ 2019-09-12 11:58 UTC (permalink / raw)
  To: David Miller; +Cc: f.fainelli, vivien.didelot, andrew, richardcochran, netdev
In-Reply-To: <20190912.133753.1473374980190418320.davem@davemloft.net>

On 12/09/2019, David Miller <davem@davemloft.net> wrote:
> From: Vladimir Oltean <olteanv@gmail.com>
> Date: Thu, 12 Sep 2019 11:17:11 +0100
>
>> Hi Dave,
>>
>> On 12/09/2019, David Miller <davem@davemloft.net> wrote:
>>> From: Vladimir Oltean <olteanv@gmail.com>
>>> Date: Tue, 10 Sep 2019 04:34:57 +0300
>>>
>>>>  static int sja1105_ptp_adjfine(struct ptp_clock_info *ptp, long
>>>> scaled_ppm)
>>>>  {
>>>>  	struct sja1105_private *priv = ptp_to_sja1105(ptp);
>>>> +	const struct sja1105_regs *regs = priv->info->regs;
>>>>  	s64 clkrate;
>>>> +	int rc;
>>>  ..
>>>> -static int sja1105_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
>>>> -{
>>>> -	struct sja1105_private *priv = ptp_to_sja1105(ptp);
>>>> +	rc = sja1105_spi_send_int(priv, SPI_WRITE, regs->ptpclkrate,
>>>> +				  &clkrate, 4);
>>>
>>> You're sending an arbitrary 4 bytes of a 64-bit value.  This works on
>>> little
>>> endian
>>> but will not on big endian.
>>>
>>> Please properly copy this clkrate into a "u32" variable and pass that
>>> into
>>> sja1105_spi_send_int().
>>>
>>> It also seems to suggest that you want to use abs() to perform that
>>> weird
>>> centering around 1 << 31 calculation.
>>>
>>> Thank you.
>>>
>>
>> It looks 'wrong' but it isn't. The driver uses the 'packing' framework
>> (lib/packing.c) which is endian-agnostic (converts between CPU and
>> peripheral endianness) and operates on u64 as the CPU word size. On
>> the contrary, u32 would not work with the 'packing' API in its current
>> form, but I don't see yet any reasons to extend it (packing64,
>> packing32 etc).
>
> That's extremely unintuitive and makes auditing patches next to impossible.
>

Through static analysis maybe you're right - I don't yet grasp what it
takes to prove an API is used correctly at build time, but I can look
at how others are doing it.
At runtime, there is sanity checking throughout and all the bugs I've
had while calling packing() incorrectly I caught them right away.
spi_send_int in particular is just a wrapper for packing an N byte
sized word (which fits in u64) in bits [8*N-1, 0] of the buffer, as
per peripheral memory ordering quirks. This is perhaps the trivial
case that can be handled through other APIs as well, but there are
times when I need to pack an u64 into a bit field that crosses even
64-bit boundaries. Combine that with weird byte ordering, and the
sja1105 driver would have simply not existed if it had to open-code
all of that. The API's high-level goal is for readers to be able to
follow along smoothly with the register manual.
All I'm saying is that I'm willing to make the packing API more
sane/checkable, but at the moment I just don't see the better
alternative.

Thanks,
-Vladimir

^ permalink raw reply

* Re: [PATCH] sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()'
From: David Miller @ 2019-09-12 11:56 UTC (permalink / raw)
  To: christophe.jaillet
  Cc: vyasevich, nhorman, marcelo.leitner, linux-sctp, netdev,
	linux-kernel, kernel-janitors
In-Reply-To: <20190911160239.10734-1-christophe.jaillet@wanadoo.fr>

From: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Date: Wed, 11 Sep 2019 18:02:39 +0200

> The '.exit' functions from 'pernet_operations' structure should be marked
> as __net_exit, not __net_init.
> 
> Fixes: 8e2d61e0aed2 ("sctp: fix race on protocol/netns initialization")
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net v2 01/11] net: core: limit nested device depth
From: Taehee Yoo @ 2019-09-12 11:54 UTC (permalink / raw)
  To: David Miller
  Cc: Netdev, j.vosburgh, vfalico, Andy Gospodarek,
	Jiří Pírko, sd, Roopa Prabhu, saeedm, manishc,
	rahulv, kys, haiyangz, sthemmin, sashal, hare, varun, ubraun,
	kgraul, Jay Vosburgh
In-Reply-To: <20190912.133717.257813019167130934.davem@davemloft.net>

On Thu, 12 Sep 2019 at 20:37, David Miller <davem@davemloft.net> wrote:
>
> From: Taehee Yoo <ap420073@gmail.com>
> Date: Thu, 12 Sep 2019 19:14:37 +0900
>
> > On Thu, 12 Sep 2019 at 18:38, David Miller <davem@davemloft.net> wrote:
> >>
> >> From: Taehee Yoo <ap420073@gmail.com>
> >> Date: Thu, 12 Sep 2019 12:56:19 +0900
> >>
> >> > I tested with this reproducer commands without lockdep.
> >> >
> >> >     ip link add dummy0 type dummy
> >> >     ip link add link dummy0 name vlan1 type vlan id 1
> >> >     ip link set vlan1 up
> >> >
> >> >     for i in {2..200}
> >> >     do
> >> >             let A=$i-1
> >> >
> >> >             ip link add name vlan$i link vlan$A type vlan id $i
> >> >     done
> >> >     ip link del vlan1 <-- this command is added.
> >>
> >> Is there any other device type which allows arbitrary nesting depth
> >> in this manner other than VLAN?  Perhaps it is the VLAN nesting
> >> depth that we should limit instead of all of this extra code.
> >
> > Below device types have the same problem.
> > VLAN, BONDING, TEAM, VXLAN, MACVLAN, and MACSEC.
> > All the below test commands reproduce a panic.
>
> I think then we need to move the traversals over to a iterative
> rather than recursive algorithm.

I agree with that.
So I will check it out then send a v3 patchset.

Thank you!

^ permalink raw reply

* Re: [PATCH] cxgb4: Fix spelling typos
From: David Miller @ 2019-09-12 11:51 UTC (permalink / raw)
  To: arkadiusz; +Cc: vishal, netdev, linux-kernel
In-Reply-To: <20190910204901.11741-1-arkadiusz@drabczyk.org>

From: Arkadiusz Drabczyk <arkadiusz@drabczyk.org>
Date: Tue, 10 Sep 2019 22:49:01 +0200

> Fix several spelling typos in comments in t4_hw.c.
> 
> Signed-off-by: Arkadiusz Drabczyk <arkadiusz@drabczyk.org>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH] ixgbe: Fix secpath usage for IPsec TX offload.
From: David Miller @ 2019-09-12 11:43 UTC (permalink / raw)
  To: steffen.klassert
  Cc: jeffrey.t.kirsher, intel-wired-lan, michael, snelson, netdev
In-Reply-To: <20190912110144.GS2879@gauss3.secunet.de>

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Thu, 12 Sep 2019 13:01:44 +0200

> The ixgbe driver currently does IPsec TX offloading
> based on an existing secpath. However, the secpath
> can also come from the RX side, in this case it is
> misinterpreted for TX offload and the packets are
> dropped with a "bad sa_idx" error. Fix this by using
> the xfrm_offload() function to test for TX offload.
> 
> Fixes: 592594704761 ("ixgbe: process the Tx ipsec offload")
> Reported-by: Michael Marley <michael@michaelmarley.com>
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

I'll apply this directly and queue it up for -stable, thanks.

^ permalink raw reply

* Re: [Patch net] sch_sfb: fix a crash in sfb_destroy()
From: David Miller @ 2019-09-12 11:42 UTC (permalink / raw)
  To: torvalds
  Cc: xiyou.wangcong, eric.dumazet, netdev, syzbot+d5870a903591faaca4ae,
	jhs, jiri
In-Reply-To: <CAHk-=whO37+O-mohvMODnD57ppCsK3Bcv8oHzSBvmwJbsT54cA@mail.gmail.com>

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Thu, 12 Sep 2019 11:31:06 +0100

> It depends on what you want to do, of course. Do you want to make sure
> each user is being very careful? Or do you want to make the interfaces
> easy to use without _having_ to be careful? There are arguments both
> ways, but we've tended to move more towards a "easy to use" model than
> the "be careful" one.

Yes, I think allowing NULL or error pointers on free/destroy makes sense.

^ permalink raw reply

* Re: [PATCH v2 net-next 3/7] net: dsa: sja1105: Switch to hardware operations for PTP
From: David Miller @ 2019-09-12 11:37 UTC (permalink / raw)
  To: olteanv; +Cc: f.fainelli, vivien.didelot, andrew, richardcochran, netdev
In-Reply-To: <CA+h21hoBQ=4pSCgwcYWErA7k7BQ02LMun_qZ476-bB4eY6RjjQ@mail.gmail.com>

From: Vladimir Oltean <olteanv@gmail.com>
Date: Thu, 12 Sep 2019 11:17:11 +0100

> Hi Dave,
> 
> On 12/09/2019, David Miller <davem@davemloft.net> wrote:
>> From: Vladimir Oltean <olteanv@gmail.com>
>> Date: Tue, 10 Sep 2019 04:34:57 +0300
>>
>>>  static int sja1105_ptp_adjfine(struct ptp_clock_info *ptp, long
>>> scaled_ppm)
>>>  {
>>>  	struct sja1105_private *priv = ptp_to_sja1105(ptp);
>>> +	const struct sja1105_regs *regs = priv->info->regs;
>>>  	s64 clkrate;
>>> +	int rc;
>>  ..
>>> -static int sja1105_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
>>> -{
>>> -	struct sja1105_private *priv = ptp_to_sja1105(ptp);
>>> +	rc = sja1105_spi_send_int(priv, SPI_WRITE, regs->ptpclkrate,
>>> +				  &clkrate, 4);
>>
>> You're sending an arbitrary 4 bytes of a 64-bit value.  This works on little
>> endian
>> but will not on big endian.
>>
>> Please properly copy this clkrate into a "u32" variable and pass that into
>> sja1105_spi_send_int().
>>
>> It also seems to suggest that you want to use abs() to perform that weird
>> centering around 1 << 31 calculation.
>>
>> Thank you.
>>
> 
> It looks 'wrong' but it isn't. The driver uses the 'packing' framework
> (lib/packing.c) which is endian-agnostic (converts between CPU and
> peripheral endianness) and operates on u64 as the CPU word size. On
> the contrary, u32 would not work with the 'packing' API in its current
> form, but I don't see yet any reasons to extend it (packing64,
> packing32 etc).

That's extremely unintuitive and makes auditing patches next to impossible.

^ permalink raw reply

* Re: [PATCH net v2 01/11] net: core: limit nested device depth
From: David Miller @ 2019-09-12 11:37 UTC (permalink / raw)
  To: ap420073
  Cc: netdev, j.vosburgh, vfalico, andy, jiri, sd, roopa, saeedm,
	manishc, rahulv, kys, haiyangz, sthemmin, sashal, hare, varun,
	ubraun, kgraul, jay.vosburgh
In-Reply-To: <CAMArcTWMjTsZB8Ssx+hVMK-3-XozZw7AqVE62-H+zrJ+doC5Lw@mail.gmail.com>

From: Taehee Yoo <ap420073@gmail.com>
Date: Thu, 12 Sep 2019 19:14:37 +0900

> On Thu, 12 Sep 2019 at 18:38, David Miller <davem@davemloft.net> wrote:
>>
>> From: Taehee Yoo <ap420073@gmail.com>
>> Date: Thu, 12 Sep 2019 12:56:19 +0900
>>
>> > I tested with this reproducer commands without lockdep.
>> >
>> >     ip link add dummy0 type dummy
>> >     ip link add link dummy0 name vlan1 type vlan id 1
>> >     ip link set vlan1 up
>> >
>> >     for i in {2..200}
>> >     do
>> >             let A=$i-1
>> >
>> >             ip link add name vlan$i link vlan$A type vlan id $i
>> >     done
>> >     ip link del vlan1 <-- this command is added.
>>
>> Is there any other device type which allows arbitrary nesting depth
>> in this manner other than VLAN?  Perhaps it is the VLAN nesting
>> depth that we should limit instead of all of this extra code.
> 
> Below device types have the same problem.
> VLAN, BONDING, TEAM, VXLAN, MACVLAN, and MACSEC.
> All the below test commands reproduce a panic.

I think then we need to move the traversals over to a iterative
rather than recursive algorithm.

^ permalink raw reply

* [patch iproute2-next v4 2/2] devlink: implement flash status monitoring
From: Jiri Pirko @ 2019-09-12 11:29 UTC (permalink / raw)
  To: netdev; +Cc: stephen, dsahern, jakub.kicinski, saeedm, mlxsw, f.fainelli
In-Reply-To: <20190912112938.2292-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Listen to status notifications coming from kernel during flashing and
put them on stdout to inform user about the status.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
v3->v4:
- rebased (traps, pr_x conversion)
v2->v3:
- added example in man
v1->v2:
- fixed endless loop bug in case of no notifications
---
 devlink/devlink.c      | 215 ++++++++++++++++++++++++++++++++++++++++-
 devlink/mnlg.c         |   5 +
 devlink/mnlg.h         |   1 +
 man/man8/devlink-dev.8 |  11 +++
 4 files changed, 228 insertions(+), 4 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 31c319e3ef7a..00ebfb598515 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -25,6 +25,7 @@
 #include <linux/devlink.h>
 #include <libmnl/libmnl.h>
 #include <netinet/ether.h>
+#include <sys/types.h>
 
 #include "SNAPSHOT.h"
 #include "list.h"
@@ -96,6 +97,18 @@ pr_out_sp(unsigned int num, const char *fmt, ...)
 	g_new_line_count = 0;			\
 }
 
+static void __attribute__((format(printf, 1, 2)))
+pr_out_tty(const char *fmt, ...)
+{
+	va_list ap;
+
+	if (!isatty(STDOUT_FILENO))
+		return;
+	va_start(ap, fmt);
+	vprintf(fmt, ap);
+	va_end(ap);
+}
+
 static void __pr_out_indent_inc(void)
 {
 	if (g_indent_level + INDENT_STR_STEP > INDENT_STR_MAXLEN)
@@ -135,9 +148,8 @@ static int _mnlg_socket_recv_run(struct mnlg_socket *nlg,
 	return 0;
 }
 
-static int _mnlg_socket_sndrcv(struct mnlg_socket *nlg,
-			       const struct nlmsghdr *nlh,
-			       mnl_cb_t data_cb, void *data)
+static int _mnlg_socket_send(struct mnlg_socket *nlg,
+			     const struct nlmsghdr *nlh)
 {
 	int err;
 
@@ -146,6 +158,18 @@ static int _mnlg_socket_sndrcv(struct mnlg_socket *nlg,
 		pr_err("Failed to call mnlg_socket_send\n");
 		return -errno;
 	}
+	return 0;
+}
+
+static int _mnlg_socket_sndrcv(struct mnlg_socket *nlg,
+			       const struct nlmsghdr *nlh,
+			       mnl_cb_t data_cb, void *data)
+{
+	int err;
+
+	err = _mnlg_socket_send(nlg, nlh);
+	if (err)
+		return err;
 	return _mnlg_socket_recv_run(nlg, data_cb, data);
 }
 
@@ -2830,9 +2854,151 @@ static void cmd_dev_flash_help(void)
 	pr_err("Usage: devlink dev flash DEV file PATH [ component NAME ]\n");
 }
 
+
+struct cmd_dev_flash_status_ctx {
+	struct dl *dl;
+	char *last_msg;
+	char *last_component;
+	uint8_t not_first:1,
+		last_pc:1,
+		received_end:1,
+		flash_done:1;
+};
+
+static int nullstrcmp(const char *str1, const char *str2)
+{
+	if (str1 && str2)
+		return strcmp(str1, str2);
+	if (!str1 && !str2)
+		return 0;
+	return str1 ? 1 : -1;
+}
+
+static int cmd_dev_flash_status_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct cmd_dev_flash_status_ctx *ctx = data;
+	struct dl_opts *opts = &ctx->dl->opts;
+	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+	const char *component = NULL;
+	uint64_t done = 0, total = 0;
+	const char *msg = NULL;
+	const char *bus_name;
+	const char *dev_name;
+
+	if (genl->cmd != DEVLINK_CMD_FLASH_UPDATE_STATUS &&
+	    genl->cmd != DEVLINK_CMD_FLASH_UPDATE_END)
+		return MNL_CB_STOP;
+
+	mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+	if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME])
+		return MNL_CB_ERROR;
+	bus_name = mnl_attr_get_str(tb[DEVLINK_ATTR_BUS_NAME]);
+	dev_name = mnl_attr_get_str(tb[DEVLINK_ATTR_DEV_NAME]);
+	if (strcmp(bus_name, opts->bus_name) ||
+	    strcmp(dev_name, opts->dev_name))
+		return MNL_CB_ERROR;
+
+	if (genl->cmd == DEVLINK_CMD_FLASH_UPDATE_END && ctx->not_first) {
+		pr_out("\n");
+		free(ctx->last_msg);
+		free(ctx->last_component);
+		ctx->received_end = 1;
+		return MNL_CB_STOP;
+	}
+
+	if (tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_MSG])
+		msg = mnl_attr_get_str(tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_MSG]);
+	if (tb[DEVLINK_ATTR_FLASH_UPDATE_COMPONENT])
+		component = mnl_attr_get_str(tb[DEVLINK_ATTR_FLASH_UPDATE_COMPONENT]);
+	if (tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_DONE])
+		done = mnl_attr_get_u64(tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_DONE]);
+	if (tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_TOTAL])
+		total = mnl_attr_get_u64(tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_TOTAL]);
+
+	if (!nullstrcmp(msg, ctx->last_msg) &&
+	    !nullstrcmp(component, ctx->last_component) &&
+	    ctx->last_pc && ctx->not_first) {
+		pr_out_tty("\b\b\b\b\b"); /* clean percentage */
+	} else {
+		if (ctx->not_first)
+			pr_out("\n");
+		if (component) {
+			pr_out("[%s] ", component);
+			free(ctx->last_component);
+			ctx->last_component = strdup(component);
+		}
+		if (msg) {
+			pr_out("%s", msg);
+			free(ctx->last_msg);
+			ctx->last_msg = strdup(msg);
+		}
+	}
+	if (total) {
+		pr_out_tty(" %3lu%%", (done * 100) / total);
+		ctx->last_pc = 1;
+	} else {
+		ctx->last_pc = 0;
+	}
+	fflush(stdout);
+	ctx->not_first = 1;
+
+	return MNL_CB_STOP;
+}
+
+static int cmd_dev_flash_fds_process(struct cmd_dev_flash_status_ctx *ctx,
+				     struct mnlg_socket *nlg_ntf,
+				     int pipe_r)
+{
+	int nlfd = mnlg_socket_get_fd(nlg_ntf);
+	fd_set fds[3];
+	int fdmax;
+	int i;
+	int err;
+	int err2;
+
+	for (i = 0; i < 3; i++)
+		FD_ZERO(&fds[i]);
+	FD_SET(pipe_r, &fds[0]);
+	fdmax = pipe_r + 1;
+	FD_SET(nlfd, &fds[0]);
+	if (nlfd >= fdmax)
+		fdmax = nlfd + 1;
+
+	while (select(fdmax, &fds[0], &fds[1], &fds[2], NULL) < 0) {
+		if (errno == EINTR)
+			continue;
+		pr_err("select() failed\n");
+		return -errno;
+	}
+	if (FD_ISSET(nlfd, &fds[0])) {
+		err = _mnlg_socket_recv_run(nlg_ntf,
+					    cmd_dev_flash_status_cb, ctx);
+		if (err)
+			return err;
+	}
+	if (FD_ISSET(pipe_r, &fds[0])) {
+		err = read(pipe_r, &err2, sizeof(err2));
+		if (err == -1) {
+			pr_err("Failed to read pipe\n");
+			return -errno;
+		}
+		if (err2)
+			return err2;
+		ctx->flash_done = 1;
+	}
+	return 0;
+}
+
+
 static int cmd_dev_flash(struct dl *dl)
 {
+	struct cmd_dev_flash_status_ctx ctx = {.dl = dl,};
+	struct mnlg_socket *nlg_ntf;
 	struct nlmsghdr *nlh;
+	int pipe_r, pipe_w;
+	int pipe_fds[2];
+	pid_t pid;
 	int err;
 
 	if (dl_argv_match(dl, "help") || dl_no_arg(dl)) {
@@ -2848,7 +3014,48 @@ static int cmd_dev_flash(struct dl *dl)
 	if (err)
 		return err;
 
-	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+	nlg_ntf = mnlg_socket_open(DEVLINK_GENL_NAME, DEVLINK_GENL_VERSION);
+	if (!nlg_ntf)
+		return err;
+
+	err = _mnlg_socket_group_add(nlg_ntf, DEVLINK_GENL_MCGRP_CONFIG_NAME);
+	if (err)
+		return err;
+
+	err = pipe(pipe_fds);
+	if (err == -1)
+		return -errno;
+	pipe_r = pipe_fds[0];
+	pipe_w = pipe_fds[1];
+
+	pid = fork();
+	if (pid == -1) {
+		close(pipe_r);
+		close(pipe_w);
+		return -errno;
+	} else if (!pid) {
+		/* In child, just execute the flash and pass returned
+		 * value through pipe once it is done.
+		 */
+		close(pipe_r);
+		err = _mnlg_socket_send(dl->nlg, nlh);
+		write(pipe_w, &err, sizeof(err));
+		close(pipe_w);
+		exit(0);
+	}
+	close(pipe_w);
+
+	do {
+		err = cmd_dev_flash_fds_process(&ctx, nlg_ntf, pipe_r);
+		if (err)
+			goto out;
+	} while (!ctx.flash_done || (ctx.not_first && !ctx.received_end));
+
+	err = _mnlg_socket_recv_run(dl->nlg, NULL, NULL);
+out:
+	close(pipe_r);
+	mnlg_socket_close(nlg_ntf);
+	return err;
 }
 
 static int cmd_dev(struct dl *dl)
diff --git a/devlink/mnlg.c b/devlink/mnlg.c
index ee125df042f0..c7d25e8713a1 100644
--- a/devlink/mnlg.c
+++ b/devlink/mnlg.c
@@ -320,3 +320,8 @@ void mnlg_socket_close(struct mnlg_socket *nlg)
 	free(nlg->buf);
 	free(nlg);
 }
+
+int mnlg_socket_get_fd(struct mnlg_socket *nlg)
+{
+	return mnl_socket_get_fd(nlg->nl);
+}
diff --git a/devlink/mnlg.h b/devlink/mnlg.h
index 4d1babf3b4c2..61bc5a3f31aa 100644
--- a/devlink/mnlg.h
+++ b/devlink/mnlg.h
@@ -23,5 +23,6 @@ int mnlg_socket_recv_run(struct mnlg_socket *nlg, mnl_cb_t data_cb, void *data);
 int mnlg_socket_group_add(struct mnlg_socket *nlg, const char *group_name);
 struct mnlg_socket *mnlg_socket_open(const char *family_name, uint8_t version);
 void mnlg_socket_close(struct mnlg_socket *nlg);
+int mnlg_socket_get_fd(struct mnlg_socket *nlg);
 
 #endif /* _MNLG_H_ */
diff --git a/man/man8/devlink-dev.8 b/man/man8/devlink-dev.8
index 1804463b2321..1021ee8d064c 100644
--- a/man/man8/devlink-dev.8
+++ b/man/man8/devlink-dev.8
@@ -244,6 +244,17 @@ Sets the parameter internal_error_reset of specified devlink device to true.
 devlink dev reload pci/0000:01:00.0
 .RS 4
 Performs hot reload of specified devlink device.
+.RE
+.PP
+devlink dev flash pci/0000:01:00.0 file firmware.bin
+.RS 4
+Flashes the specified devlink device with provided firmware file name. If the driver supports it, user gets updates about the flash status. For example:
+.br
+Preparing to flash
+.br
+Flashing 100%
+.br
+Flashing done
 
 .SH SEE ALSO
 .BR devlink (8),
-- 
2.20.1


^ permalink raw reply related

* [patch iproute2-next v4 1/2] devlink: implement flash update status monitoring
From: Jiri Pirko @ 2019-09-12 11:29 UTC (permalink / raw)
  To: netdev; +Cc: stephen, dsahern, jakub.kicinski, saeedm, mlxsw, f.fainelli
In-Reply-To: <20190912112938.2292-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Kernel sends notifications about flash update status, so implement these
messages for monitoring.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
v3->v4:
- rebased (traps)
---
 devlink/devlink.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 2f084c020765..31c319e3ef7a 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -443,6 +443,10 @@ static const enum mnl_attr_data_type devlink_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_HEALTH_REPORTER_RECOVER_COUNT] = MNL_TYPE_U64,
 	[DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS] = MNL_TYPE_U64,
 	[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = MNL_TYPE_U64,
+	[DEVLINK_ATTR_FLASH_UPDATE_COMPONENT] = MNL_TYPE_STRING,
+	[DEVLINK_ATTR_FLASH_UPDATE_STATUS_MSG] = MNL_TYPE_STRING,
+	[DEVLINK_ATTR_FLASH_UPDATE_STATUS_DONE] = MNL_TYPE_U64,
+	[DEVLINK_ATTR_FLASH_UPDATE_STATUS_TOTAL] = MNL_TYPE_U64,
 	[DEVLINK_ATTR_STATS] = MNL_TYPE_NESTED,
 	[DEVLINK_ATTR_TRAP_NAME] = MNL_TYPE_STRING,
 	[DEVLINK_ATTR_TRAP_ACTION] = MNL_TYPE_U8,
@@ -3878,6 +3882,9 @@ static const char *cmd_name(uint8_t cmd)
 	case DEVLINK_CMD_REGION_SET: return "set";
 	case DEVLINK_CMD_REGION_NEW: return "new";
 	case DEVLINK_CMD_REGION_DEL: return "del";
+	case DEVLINK_CMD_FLASH_UPDATE: return "begin";
+	case DEVLINK_CMD_FLASH_UPDATE_END: return "end";
+	case DEVLINK_CMD_FLASH_UPDATE_STATUS: return "status";
 	case DEVLINK_CMD_TRAP_GET: return "get";
 	case DEVLINK_CMD_TRAP_SET: return "set";
 	case DEVLINK_CMD_TRAP_NEW: return "new";
@@ -3914,6 +3921,10 @@ static const char *cmd_obj(uint8_t cmd)
 	case DEVLINK_CMD_REGION_NEW:
 	case DEVLINK_CMD_REGION_DEL:
 		return "region";
+	case DEVLINK_CMD_FLASH_UPDATE:
+	case DEVLINK_CMD_FLASH_UPDATE_END:
+	case DEVLINK_CMD_FLASH_UPDATE_STATUS:
+		return "flash";
 	case DEVLINK_CMD_TRAP_GET:
 	case DEVLINK_CMD_TRAP_SET:
 	case DEVLINK_CMD_TRAP_NEW:
@@ -3948,6 +3959,29 @@ static bool cmd_filter_check(struct dl *dl, uint8_t cmd)
 	return false;
 }
 
+static void pr_out_flash_update(struct dl *dl, struct nlattr **tb)
+{
+	__pr_out_handle_start(dl, tb, true, false);
+
+	if (tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_MSG])
+		pr_out_str(dl, "msg",
+			   mnl_attr_get_str(tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_MSG]));
+
+	if (tb[DEVLINK_ATTR_FLASH_UPDATE_COMPONENT])
+		pr_out_str(dl, "component",
+			   mnl_attr_get_str(tb[DEVLINK_ATTR_FLASH_UPDATE_COMPONENT]));
+
+	if (tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_DONE])
+		pr_out_u64(dl, "done",
+			   mnl_attr_get_u64(tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_DONE]));
+
+	if (tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_TOTAL])
+		pr_out_u64(dl, "total",
+			   mnl_attr_get_u64(tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_TOTAL]));
+
+	pr_out_handle_end(dl);
+}
+
 static void pr_out_region(struct dl *dl, struct nlattr **tb);
 static void pr_out_trap(struct dl *dl, struct nlattr **tb, bool array);
 static void pr_out_trap_group(struct dl *dl, struct nlattr **tb, bool array);
@@ -4006,6 +4040,15 @@ static int cmd_mon_show_cb(const struct nlmsghdr *nlh, void *data)
 		pr_out_mon_header(genl->cmd);
 		pr_out_region(dl, tb);
 		break;
+	case DEVLINK_CMD_FLASH_UPDATE: /* fall through */
+	case DEVLINK_CMD_FLASH_UPDATE_END: /* fall through */
+	case DEVLINK_CMD_FLASH_UPDATE_STATUS:
+		mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+		if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME])
+			return MNL_CB_ERROR;
+		pr_out_mon_header(genl->cmd);
+		pr_out_flash_update(dl, tb);
+		break;
 	case DEVLINK_CMD_TRAP_GET: /* fall through */
 	case DEVLINK_CMD_TRAP_SET: /* fall through */
 	case DEVLINK_CMD_TRAP_NEW: /* fall through */
-- 
2.20.1


^ permalink raw reply related

* [patch iproute2-next v4 0/2] devlink: couple forgotten flash patches
From: Jiri Pirko @ 2019-09-12 11:29 UTC (permalink / raw)
  To: netdev; +Cc: stephen, dsahern, jakub.kicinski, saeedm, mlxsw, f.fainelli

From: Jiri Pirko <jiri@mellanox.com>

I was under impression they are already merged, but apparently they are
not. I just rebased them on top of current iproute2 net-next tree.

Jiri Pirko (2):
  devlink: implement flash update status monitoring
  devlink: implement flash status monitoring

 devlink/devlink.c      | 258 ++++++++++++++++++++++++++++++++++++++++-
 devlink/mnlg.c         |   5 +
 devlink/mnlg.h         |   1 +
 man/man8/devlink-dev.8 |  11 ++
 4 files changed, 271 insertions(+), 4 deletions(-)

-- 
2.20.1


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox