* Re: [PATCH net-next 1/4] dpaa2-eth: Use a single page per Rx buffer
From: Ilias Apalodimas @ 2019-02-05 9:35 UTC (permalink / raw)
To: Ioana Ciocoi Radulescu
Cc: netdev@vger.kernel.org, davem@davemloft.net, Ioana Ciornei,
brouer@redhat.com
In-Reply-To: <1549299625-28399-2-git-send-email-ruxandra.radulescu@nxp.com>
Hi Ioana,
Can you share any results on XDP (XDP_DROP is usually useful for the hardware
capabilities).
> Instead of allocating page fragments via the network stack,
> use the page allocator directly. For now, we consume one page
> for each Rx buffer.
>
> With the new memory model we are free to consider adding more
> XDP support.
>
> Performance decreases slightly in some IP forwarding cases.
> No visible effect on termination traffic. The driver memory
> footprint increases as a result of this change, but it is
> still small enough to not really matter.
>
> Another side effect is that now Rx buffer alignment requirements
> are naturally satisfied without any additional actions needed.
> Remove alignment related code, except in the buffer layout
> information conveyed to MC, as hardware still needs to know the
> alignment value we guarantee.
>
> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
> Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
> ---
> drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c | 61 +++++++++++++-----------
> drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h | 21 +++-----
> 2 files changed, 38 insertions(+), 44 deletions(-)
>
> diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
> index 04925c7..6e58de6 100644
> --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
> +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
> @@ -86,16 +86,16 @@ static void free_rx_fd(struct dpaa2_eth_priv *priv,
> for (i = 1; i < DPAA2_ETH_MAX_SG_ENTRIES; i++) {
> addr = dpaa2_sg_get_addr(&sgt[i]);
> sg_vaddr = dpaa2_iova_to_virt(priv->iommu_domain, addr);
> - dma_unmap_single(dev, addr, DPAA2_ETH_RX_BUF_SIZE,
> - DMA_BIDIRECTIONAL);
> + dma_unmap_page(dev, addr, DPAA2_ETH_RX_BUF_SIZE,
> + DMA_BIDIRECTIONAL);
>
> - skb_free_frag(sg_vaddr);
> + free_pages((unsigned long)sg_vaddr, 0);
> if (dpaa2_sg_is_final(&sgt[i]))
> break;
> }
>
> free_buf:
> - skb_free_frag(vaddr);
> + free_pages((unsigned long)vaddr, 0);
> }
>
> /* Build a linear skb based on a single-buffer frame descriptor */
> @@ -109,7 +109,7 @@ static struct sk_buff *build_linear_skb(struct dpaa2_eth_channel *ch,
>
> ch->buf_count--;
>
> - skb = build_skb(fd_vaddr, DPAA2_ETH_SKB_SIZE);
> + skb = build_skb(fd_vaddr, DPAA2_ETH_RX_BUF_RAW_SIZE);
> if (unlikely(!skb))
> return NULL;
>
> @@ -144,19 +144,19 @@ static struct sk_buff *build_frag_skb(struct dpaa2_eth_priv *priv,
> /* Get the address and length from the S/G entry */
> sg_addr = dpaa2_sg_get_addr(sge);
> sg_vaddr = dpaa2_iova_to_virt(priv->iommu_domain, sg_addr);
> - dma_unmap_single(dev, sg_addr, DPAA2_ETH_RX_BUF_SIZE,
> - DMA_BIDIRECTIONAL);
> + dma_unmap_page(dev, sg_addr, DPAA2_ETH_RX_BUF_SIZE,
> + DMA_BIDIRECTIONAL);
>
> sg_length = dpaa2_sg_get_len(sge);
>
> if (i == 0) {
> /* We build the skb around the first data buffer */
> - skb = build_skb(sg_vaddr, DPAA2_ETH_SKB_SIZE);
> + skb = build_skb(sg_vaddr, DPAA2_ETH_RX_BUF_RAW_SIZE);
> if (unlikely(!skb)) {
> /* Free the first SG entry now, since we already
> * unmapped it and obtained the virtual address
> */
> - skb_free_frag(sg_vaddr);
> + free_pages((unsigned long)sg_vaddr, 0);
>
> /* We still need to subtract the buffers used
> * by this FD from our software counter
> @@ -211,9 +211,9 @@ static void free_bufs(struct dpaa2_eth_priv *priv, u64 *buf_array, int count)
>
> for (i = 0; i < count; i++) {
> vaddr = dpaa2_iova_to_virt(priv->iommu_domain, buf_array[i]);
> - dma_unmap_single(dev, buf_array[i], DPAA2_ETH_RX_BUF_SIZE,
> - DMA_BIDIRECTIONAL);
> - skb_free_frag(vaddr);
> + dma_unmap_page(dev, buf_array[i], DPAA2_ETH_RX_BUF_SIZE,
> + DMA_BIDIRECTIONAL);
> + free_pages((unsigned long)vaddr, 0);
I got some hardware/XDP related questions since i have no idea how the hardware
works. From what i understand on the code, you are trying to manage the buffer
from the hw buffer manager and if the fails you unmap and free the pages.
Since XDP relies on recycling for speed (hint check xdp_return_buff()), is it
possible to recycle the buffer if the hw fails ?
> }
> }
>
> @@ -378,16 +378,16 @@ static void dpaa2_eth_rx(struct dpaa2_eth_priv *priv,
> return;
> }
>
> - dma_unmap_single(dev, addr, DPAA2_ETH_RX_BUF_SIZE,
> - DMA_BIDIRECTIONAL);
> + dma_unmap_page(dev, addr, DPAA2_ETH_RX_BUF_SIZE,
> + DMA_BIDIRECTIONAL);
> skb = build_linear_skb(ch, fd, vaddr);
> } else if (fd_format == dpaa2_fd_sg) {
> WARN_ON(priv->xdp_prog);
>
> - dma_unmap_single(dev, addr, DPAA2_ETH_RX_BUF_SIZE,
> - DMA_BIDIRECTIONAL);
> + dma_unmap_page(dev, addr, DPAA2_ETH_RX_BUF_SIZE,
> + DMA_BIDIRECTIONAL);
> skb = build_frag_skb(priv, ch, buf_data);
> - skb_free_frag(vaddr);
> + free_pages((unsigned long)vaddr, 0);
> percpu_extras->rx_sg_frames++;
> percpu_extras->rx_sg_bytes += dpaa2_fd_get_len(fd);
> } else {
> @@ -903,7 +903,7 @@ static int add_bufs(struct dpaa2_eth_priv *priv,
> {
> struct device *dev = priv->net_dev->dev.parent;
> u64 buf_array[DPAA2_ETH_BUFS_PER_CMD];
> - void *buf;
> + struct page *page;
> dma_addr_t addr;
> int i, err;
>
> @@ -911,14 +911,16 @@ static int add_bufs(struct dpaa2_eth_priv *priv,
> /* Allocate buffer visible to WRIOP + skb shared info +
> * alignment padding
> */
> - buf = napi_alloc_frag(dpaa2_eth_buf_raw_size(priv));
> - if (unlikely(!buf))
> + /* allocate one page for each Rx buffer. WRIOP sees
> + * the entire page except for a tailroom reserved for
> + * skb shared info
> + */
> + page = dev_alloc_pages(0);
> + if (!page)
> goto err_alloc;
>
> - buf = PTR_ALIGN(buf, priv->rx_buf_align);
> -
> - addr = dma_map_single(dev, buf, DPAA2_ETH_RX_BUF_SIZE,
> - DMA_BIDIRECTIONAL);
> + addr = dma_map_page(dev, page, 0, DPAA2_ETH_RX_BUF_SIZE,
> + DMA_BIDIRECTIONAL);
> if (unlikely(dma_mapping_error(dev, addr)))
> goto err_map;
>
> @@ -926,7 +928,7 @@ static int add_bufs(struct dpaa2_eth_priv *priv,
>
> /* tracing point */
> trace_dpaa2_eth_buf_seed(priv->net_dev,
> - buf, dpaa2_eth_buf_raw_size(priv),
> + page, DPAA2_ETH_RX_BUF_RAW_SIZE,
> addr, DPAA2_ETH_RX_BUF_SIZE,
> bpid);
> }
> @@ -948,7 +950,7 @@ static int add_bufs(struct dpaa2_eth_priv *priv,
> return i;
>
> err_map:
> - skb_free_frag(buf);
> + __free_pages(page, 0);
> err_alloc:
> /* If we managed to allocate at least some buffers,
> * release them to hardware
> @@ -2134,6 +2136,7 @@ static int set_buffer_layout(struct dpaa2_eth_priv *priv)
> {
> struct device *dev = priv->net_dev->dev.parent;
> struct dpni_buffer_layout buf_layout = {0};
> + u16 rx_buf_align;
> int err;
>
> /* We need to check for WRIOP version 1.0.0, but depending on the MC
> @@ -2142,9 +2145,9 @@ static int set_buffer_layout(struct dpaa2_eth_priv *priv)
> */
> if (priv->dpni_attrs.wriop_version == DPAA2_WRIOP_VERSION(0, 0, 0) ||
> priv->dpni_attrs.wriop_version == DPAA2_WRIOP_VERSION(1, 0, 0))
> - priv->rx_buf_align = DPAA2_ETH_RX_BUF_ALIGN_REV1;
> + rx_buf_align = DPAA2_ETH_RX_BUF_ALIGN_REV1;
> else
> - priv->rx_buf_align = DPAA2_ETH_RX_BUF_ALIGN;
> + rx_buf_align = DPAA2_ETH_RX_BUF_ALIGN;
>
> /* tx buffer */
> buf_layout.private_data_size = DPAA2_ETH_SWA_SIZE;
> @@ -2184,7 +2187,7 @@ static int set_buffer_layout(struct dpaa2_eth_priv *priv)
> /* rx buffer */
> buf_layout.pass_frame_status = true;
> buf_layout.pass_parser_result = true;
> - buf_layout.data_align = priv->rx_buf_align;
> + buf_layout.data_align = rx_buf_align;
> buf_layout.data_head_room = dpaa2_eth_rx_head_room(priv);
> buf_layout.private_data_size = 0;
> buf_layout.options = DPNI_BUF_LAYOUT_OPT_PARSER_RESULT |
> diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h
> index 31fe486..da3d039 100644
> --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h
> +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h
> @@ -63,9 +63,11 @@
> /* Hardware requires alignment for ingress/egress buffer addresses */
> #define DPAA2_ETH_TX_BUF_ALIGN 64
>
> -#define DPAA2_ETH_RX_BUF_SIZE 2048
> -#define DPAA2_ETH_SKB_SIZE \
> - (DPAA2_ETH_RX_BUF_SIZE + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
> +#define DPAA2_ETH_RX_BUF_RAW_SIZE PAGE_SIZE
> +#define DPAA2_ETH_RX_BUF_TAILROOM \
> + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
> +#define DPAA2_ETH_RX_BUF_SIZE \
> + (DPAA2_ETH_RX_BUF_RAW_SIZE - DPAA2_ETH_RX_BUF_TAILROOM)
>
> /* Hardware annotation area in RX/TX buffers */
> #define DPAA2_ETH_RX_HWA_SIZE 64
> @@ -343,7 +345,6 @@ struct dpaa2_eth_priv {
> bool rx_tstamp; /* Rx timestamping enabled */
>
> u16 tx_qdid;
> - u16 rx_buf_align;
> struct fsl_mc_io *mc_io;
> /* Cores which have an affine DPIO/DPCON.
> * This is the cpu set on which Rx and Tx conf frames are processed
> @@ -418,15 +419,6 @@ enum dpaa2_eth_rx_dist {
> DPAA2_ETH_RX_DIST_CLS
> };
>
> -/* Hardware only sees DPAA2_ETH_RX_BUF_SIZE, but the skb built around
> - * the buffer also needs space for its shared info struct, and we need
> - * to allocate enough to accommodate hardware alignment restrictions
> - */
> -static inline unsigned int dpaa2_eth_buf_raw_size(struct dpaa2_eth_priv *priv)
> -{
> - return DPAA2_ETH_SKB_SIZE + priv->rx_buf_align;
> -}
> -
> static inline
> unsigned int dpaa2_eth_needed_headroom(struct dpaa2_eth_priv *priv,
> struct sk_buff *skb)
> @@ -451,8 +443,7 @@ unsigned int dpaa2_eth_needed_headroom(struct dpaa2_eth_priv *priv,
> */
> static inline unsigned int dpaa2_eth_rx_head_room(struct dpaa2_eth_priv *priv)
> {
> - return priv->tx_data_offset + DPAA2_ETH_TX_BUF_ALIGN -
> - DPAA2_ETH_RX_HWA_SIZE;
> + return priv->tx_data_offset - DPAA2_ETH_RX_HWA_SIZE;
> }
>
> int dpaa2_eth_set_hash(struct net_device *net_dev, u64 flags);
> --
> 2.7.4
>
^ permalink raw reply
* Re: [PATCH net-next v3 04/16] net: sched: traverse chains in block with tcf_get_next_chain()
From: Jiri Pirko @ 2019-02-05 9:35 UTC (permalink / raw)
To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, davem, ast, daniel
In-Reply-To: <20190204123301.4223-5-vladbu@mellanox.com>
Mon, Feb 04, 2019 at 01:32:49PM CET, vladbu@mellanox.com wrote:
>All users of block->chain_list rely on rtnl lock and assume that no new
>chains are added when traversing the list. Use tcf_get_next_chain() to
>traverse chain list without relying on rtnl mutex. This function iterates
>over chains by taking reference to current iterator chain only and doesn't
>assume external synchronization of chain list.
>
>Don't take reference to all chains in block when flushing and use
>tcf_get_next_chain() to safely iterate over chain list instead. Remove
>tcf_block_put_all_chains() that is no longer used.
>
>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
^ permalink raw reply
* Re: [PATCH net-next v3 05/16] net: sched: protect chain template accesses with block lock
From: Jiri Pirko @ 2019-02-05 9:42 UTC (permalink / raw)
To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, davem, ast, daniel
In-Reply-To: <20190204123301.4223-6-vladbu@mellanox.com>
Mon, Feb 04, 2019 at 01:32:50PM CET, vladbu@mellanox.com wrote:
>When cls API is called without protection of rtnl lock, parallel
>modification of chain is possible, which means that chain template can be
>changed concurrently in certain circumstances. For example, when chain is
>'deleted' by new user-space chain API, the chain might continue to be used
>if it is referenced by actions, and can be 're-created' again by user. In
>such case same chain structure is reused and its template is changed. To
>protect from described scenario, cache chain template while holding block
>lock. Introduce standalone tc_chain_notify_delete() function that works
>with cached template values, instead of chains themselves.
>
>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
^ permalink raw reply
* [PATCH v3 1/2] dt-bindings: net: add MDIO bus multiplexer driven by a regmap device
From: Pankaj Bansal @ 2019-02-05 10:05 UTC (permalink / raw)
To: Andrew Lunn, Florian Fainelli; +Cc: netdev@vger.kernel.org, Pankaj Bansal
In-Reply-To: <20190205153014.3807-1-pankaj.bansal@nxp.com>
Add support for an MDIO bus multiplexer controlled by a regmap
device, like an FPGA.
Tested on a NXP LX2160AQDS board which uses the "QIXIS" FPGA
attached to the i2c bus.
Signed-off-by: Pankaj Bansal <pankaj.bansal@nxp.com>
---
Notes:
V3:
- No change
V2:
- New file describing the device tree bindings for regmap controlled devices'
mdio mux
.../bindings/net/mdio-mux-regmap.txt | 167 +++++++++++++++++
1 file changed, 167 insertions(+)
diff --git a/Documentation/devicetree/bindings/net/mdio-mux-regmap.txt b/Documentation/devicetree/bindings/net/mdio-mux-regmap.txt
new file mode 100644
index 000000000000..8968f317965f
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/mdio-mux-regmap.txt
@@ -0,0 +1,167 @@
+Properties for an MDIO bus multiplexer controlled by a regmap
+
+This is a special case of a MDIO bus multiplexer. A regmap device,
+like an FPGA, is used to control which child bus is connected. The mdio-mux
+node must be a child of the device that is controlled by a regmap.
+The driver currently only supports devices with upto 32-bit registers.
+
+Required properties in addition to the generic multiplexer properties:
+
+- reg : integer, contains the offset of the register that controls the bus
+ multiplexer. it can be 32 bit number.
+
+- mux-mask : integer, contains an 32 bit mask that specifies which
+ bits in the register control the actual bus multiplexer. The
+ 'reg' property of each child mdio-mux node must be constrained by
+ this mask.
+
+Example 1:
+
+The FPGA node defines a i2c connected FPGA with a register space of 0x30 bytes.
+For the "EMI2" MDIO bus, register 0x54 (BRDCFG4) controls the mux on that bus.
+A bitmask of 0x07 means that bits 0, 1 and 2 (bit 0 is lsb) are the bits on
+BRDCFG4 that control the actual mux.
+
+i2c@2000000 {
+ compatible = "fsl,vf610-i2c";
+ #address-cells = <1>;
+ #size-cells = <0>;
+ reg = <0x0 0x2000000 0x0 0x10000>;
+ interrupts = <0 34 0x4>; // Level high type
+ clock-names = "i2c";
+ clocks = <&clockgen 4 7>;
+ fsl-scl-gpio = <&gpio2 15 0>;
+ status = "okay";
+
+ /* The FPGA node */
+ fpga@66 {
+ compatible = "fsl,lx2160aqds-fpga", "fsl,fpga-qixis-i2c";
+ reg = <0x66>; // fpga device address on i2c bus
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ mdio-mux-2@54 {
+ mdio-parent-bus = <&emdio2>; /* MDIO bus */
+ reg = <0x54>; /* BRDCFG4 */
+ mux-mask = <0x07>; /* EMI2_MDIO */
+ #address-cells=<1>;
+ #size-cells = <0>;
+
+ mdio@0 { // Slot 1
+ reg = <0x00>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ ethernet-phy@@1 {
+ reg = <1>;
+ compatible = "ethernet-phy-id0210.7441";
+ };
+
+ ethernet-phy@@0 {
+ reg = <0>;
+ compatible = "ethernet-phy-id0210.7441";
+ };
+ };
+
+ mdio@1 { // Slot 2
+ reg = <0x01>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ };
+
+ mdio@2 { // Slot 3
+ reg = <0x02>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ };
+ };
+ };
+};
+
+/* The parent MDIO bus. */
+emdio2: mdio@0x8B97000 {
+ compatible = "fsl,fman-memac-mdio";
+ reg = <0x0 0x8B97000 0x0 0x1000>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ little-endian;
+};
+
+Example 2:
+
+The FPGA node defines a memory mapped FPGA with a register space of 0x100 bytes.
+For the "EMI1" MDIO bus, register 0x54 (BRDCFG4) controls the mux on that bus.
+A bitmask of 0xe0 means that bits 5, 6 and 7 (bit 0 is lsb) are the bits on
+BRDCFG4 that control the actual mux.
+
+ifc: ifc@1530000 {
+ compatible = "fsl,ifc", "simple-bus";
+ reg = <0x0 0x1530000 0x0 0x10000>;
+ interrupts = <GIC_SPI 75 IRQ_TYPE_LEVEL_HIGH>;
+ #address-cells = <2>;
+ #size-cells = <1>;
+ /* NOR, NAND Flashes and FPGA on board */
+ ranges = <0x0 0x0 0x0 0x60000000 0x08000000
+ 0x2 0x0 0x0 0x7e800000 0x00010000
+ 0x3 0x0 0x0 0x7fb00000 0x00000100>;
+ status = "okay";
+
+ /* The FPGA node */
+ fpga: board-control@3,0 {
+ compatible = "fsl,ls1021aqds-fpga", "fsl,fpga-qixis";
+ reg = <0x3 0x0 0x0000100>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ mdio-mux-1@54 {
+ mdio-parent-bus = <&emdio1>; /* MDIO bus */
+ reg = <0x54>; /* BRDCFG4 */
+ mux-mask = <0xe0>; /* EMI1_MDIO */
+ #address-cells=<1>;
+ #size-cells = <0>;
+
+ mdio@0 { // Onboard PHYs rgmii_phy1
+ reg = <0x00>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ ethernet-phy@@1 {
+ reg = <1>;
+ compatible = "ethernet-phy-ieee802.3-c22";
+ };
+ };
+
+ mdio@20 { // Onboard PHYs rgmii_phy2
+ reg = <0x20>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ ethernet-phy@2 {
+ reg = <0x2>;
+ compatible = "ethernet-phy-ieee802.3-c22";
+ };
+ };
+
+ mdio@40 { // Onboard PHYs rgmii_phy3
+ reg = <0x40>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+ ethernet-phy@3 {
+ reg = <0x3>;
+ compatible = "ethernet-phy-ieee802.3-c22";
+ };
+ };
+ };
+ };
+};
+
+/* The parent MDIO bus. */
+emdio1: mdio@2d24000 {
+ compatible = "gianfar";
+ device_type = "mdio";
+ #address-cells = <1>;
+ #size-cells = <0>;
+ reg = <0x0 0x2d24000 0x0 0x4000>,
+ <0x0 0x2d10030 0x0 0x4>;
+};
--
2.17.1
^ permalink raw reply related
* [PATCH v3 0/2] add MDIO bus multiplexer driven by a regmap device
From: Pankaj Bansal @ 2019-02-05 10:05 UTC (permalink / raw)
To: Andrew Lunn, Florian Fainelli
Cc: netdev@vger.kernel.org, Pankaj Bansal, Varun Sethi
Add support for an MDIO bus multiplexer controlled by a regmap device, like an FPGA.
These apis is an extension of the existing driver drivers/net/phy/mdio-mux-mmioreg.c.
The problem with mmioreg driver is that it can operate only on memory mapped devices.
but if we have a device that controls mdio muxing and that device is controlled using i2c or spi, then it will not work.
Therefore, added apis that can be used by regmap device to control mdio mux.
Tested on a NXP LX2160AQDS board which uses the "QIXIS" FPGA attached to the i2c bus.
This is my second attempt at this.
In my previous approach i wrote a separate driver for regmap apis.
But then i realized that it is not meant to control a specific device.
It is meant to control some registers of parent device.
Therefore, IMO this should not be a Platform driver and there should not be any "compatible" property to which this driver is associated.
The previous approach patches and discussion can be accessed here:
https://www.mail-archive.com/netdev@vger.kernel.org/msg252744.html
Cc: Varun Sethi <V.Sethi@nxp.com>
---
Notes:
V2:
- https://www.mail-archive.com/netdev@vger.kernel.org/msg282176.html
V1:
- https://www.spinics.net/lists/netdev/msg548027.html
Pankaj Bansal (2):
dt-bindings: net: add MDIO bus multiplexer driven by a regmap device
netdev/phy: add MDIO bus multiplexer driven by a regmap
.../bindings/net/mdio-mux-regmap.txt | 167 +++++++++++++++++
drivers/net/phy/Kconfig | 14 ++
drivers/net/phy/Makefile | 1 +
drivers/net/phy/mdio-mux-regmap.c | 169 ++++++++++++++++++
include/linux/mdio-mux.h | 34 ++++
5 files changed, 385 insertions(+)
create mode 100644 Documentation/devicetree/bindings/net/mdio-mux-regmap.txt
create mode 100644 drivers/net/phy/mdio-mux-regmap.c
--
2.17.1
^ permalink raw reply
* [PATCH v3 2/2] netdev/phy: add MDIO bus multiplexer driven by a regmap
From: Pankaj Bansal @ 2019-02-05 10:05 UTC (permalink / raw)
To: Andrew Lunn, Florian Fainelli; +Cc: netdev@vger.kernel.org, Pankaj Bansal
In-Reply-To: <20190205153014.3807-1-pankaj.bansal@nxp.com>
Add support for an MDIO bus multiplexer controlled by a regmap
device, like an FPGA.
Tested on a NXP LX2160AQDS board which uses the "QIXIS" FPGA
attached to the i2c bus.
Signed-off-by: Pankaj Bansal <pankaj.bansal@nxp.com>
---
Notes:
V3:
- make the return type of uninit function to void
- reorder the Make file change to be alphabetically sorted
- fix the compilation error when mdio-mux-regmap is built as module
by adding CONFIG_MDIO_BUS_MUX_REGMAP_MODULE macro in mdio-mux.h
V2:
- Added Kconfig entry for regmap based mdio mux
- restrict the comment lines to 80 chars
- use kerneldoc formatting for this function documentation.
drivers/net/phy/Kconfig | 14 +++
drivers/net/phy/Makefile | 1 +
drivers/net/phy/mdio-mux-regmap.c | 169 ++++++++++++++++++++++++++++
include/linux/mdio-mux.h | 34 ++++++
4 files changed, 218 insertions(+)
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 3d187cd50eb0..93ef2505caba 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -87,6 +87,20 @@ config MDIO_BUS_MUX_MMIOREG
Currently, only 8/16/32 bits registers are supported.
+config MDIO_BUS_MUX_REGMAP
+ tristate "regmap device controlled MDIO bus multiplexers"
+ depends on OF_MDIO && REGMAP
+ select MDIO_BUS_MUX
+ help
+ This module provides a driver for MDIO bus multiplexers that
+ are controlled via a regmap device, like an FPGA connected to i2c bus
+ or spi bus or memory mapped FPGA.
+ The multiplexer connects one of several child MDIO busses to a
+ parent bus. Child bus selection is under the control of one of
+ the FPGA's registers.
+
+ Currently, only 32 bits registers are supported.
+
config MDIO_CAVIUM
tristate
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 5805c0b7d60e..99737128de8a 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -29,6 +29,7 @@ obj-$(CONFIG_MDIO_BUS_MUX) += mdio-mux.o
obj-$(CONFIG_MDIO_BUS_MUX_BCM_IPROC) += mdio-mux-bcm-iproc.o
obj-$(CONFIG_MDIO_BUS_MUX_GPIO) += mdio-mux-gpio.o
obj-$(CONFIG_MDIO_BUS_MUX_MMIOREG) += mdio-mux-mmioreg.o
+obj-$(CONFIG_MDIO_BUS_MUX_REGMAP) += mdio-mux-regmap.o
obj-$(CONFIG_MDIO_CAVIUM) += mdio-cavium.o
obj-$(CONFIG_MDIO_GPIO) += mdio-gpio.o
obj-$(CONFIG_MDIO_HISI_FEMAC) += mdio-hisi-femac.o
diff --git a/drivers/net/phy/mdio-mux-regmap.c b/drivers/net/phy/mdio-mux-regmap.c
new file mode 100644
index 000000000000..bbec43b30f3c
--- /dev/null
+++ b/drivers/net/phy/mdio-mux-regmap.c
@@ -0,0 +1,169 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+/* Simple regmap based MDIO MUX driver
+ *
+ * Copyright 2018-2019 NXP
+ *
+ * Based on mdio-mux-mmioreg.c by Timur Tabi
+ *
+ * Author:
+ * Pankaj Bansal <pankaj.bansal@nxp.com>
+ */
+
+#include <linux/platform_device.h>
+#include <linux/device.h>
+#include <linux/of_mdio.h>
+#include <linux/module.h>
+#include <linux/phy.h>
+#include <linux/mdio-mux.h>
+#include <linux/regmap.h>
+
+struct mdio_mux_regmap_state {
+ void *mux_handle;
+ struct device *dev;
+ struct regmap *regmap;
+ u32 mux_reg;
+ u32 mask;
+};
+
+/**
+ * mdio_mux_regmap_switch_fn - This function is called by the mdio-mux layer
+ * when it thinks the mdio bus multiplexer needs
+ * to switch.
+ * @current_child: current value of the mux register (masked via s->mask).
+ * @desired_child: value of the 'reg' property of the target child MDIO node.
+ * @data: Private data used by this switch_fn passed to mdio_mux_init function
+ * via mdio_mux_init(.., .., .., .., data, ..).
+ *
+ * The first time this function is called, current_child == -1.
+ * If current_child == desired_child, then the mux is already set to the
+ * correct bus.
+ */
+static int mdio_mux_regmap_switch_fn(int current_child, int desired_child,
+ void *data)
+{
+ struct mdio_mux_regmap_state *s = data;
+ bool change;
+ int ret;
+
+ ret = regmap_update_bits_check(s->regmap,
+ s->mux_reg,
+ s->mask,
+ desired_child,
+ &change);
+
+ if (ret)
+ return ret;
+ if (change)
+ dev_dbg(s->dev, "%s %d -> %d\n", __func__, current_child,
+ desired_child);
+ return ret;
+}
+
+/**
+ * mdio_mux_regmap_init - control MDIO bus muxing using regmap constructs.
+ * @dev: device with which regmap construct is associated.
+ * @mux_node: mdio bus mux node that contains parent mdio bus phandle.
+ * This node also contains sub nodes, where each subnode denotes
+ * a child mdio bus. All the child mdio buses are muxed, i.e. at a
+ * time only one of the child mdio buses can be used.
+ * @data: to store the address of data allocated by this function
+ */
+int mdio_mux_regmap_init(struct device *dev,
+ struct device_node *mux_node,
+ void **data)
+{
+ struct device_node *child;
+ struct mdio_mux_regmap_state *s;
+ int ret;
+ u32 val;
+
+ dev_dbg(dev, "probing node %pOF\n", mux_node);
+
+ s = devm_kzalloc(dev, sizeof(*s), GFP_KERNEL);
+ if (!s)
+ return -ENOMEM;
+
+ s->regmap = dev_get_regmap(dev, NULL);
+ if (IS_ERR(s->regmap)) {
+ dev_err(dev, "Failed to get parent regmap\n");
+ return PTR_ERR(s->regmap);
+ }
+
+ ret = of_property_read_u32(mux_node, "reg", &s->mux_reg);
+ if (ret) {
+ dev_err(dev, "missing or invalid reg property\n");
+ return -ENODEV;
+ }
+
+ /* Test Register read write */
+ ret = regmap_read(s->regmap, s->mux_reg, &val);
+ if (ret) {
+ dev_err(dev, "error while reading reg\n");
+ return ret;
+ }
+
+ ret = regmap_write(s->regmap, s->mux_reg, val);
+ if (ret) {
+ dev_err(dev, "error while writing reg\n");
+ return ret;
+ }
+
+ ret = of_property_read_u32(mux_node, "mux-mask", &s->mask);
+ if (ret) {
+ dev_err(dev, "missing or invalid mux-mask property\n");
+ return -ENODEV;
+ }
+
+ /* Verify that the 'reg' property of each child MDIO bus does not
+ * set any bits outside of the 'mask'.
+ */
+ for_each_available_child_of_node(mux_node, child) {
+ ret = of_property_read_u32(child, "reg", &val);
+ if (ret) {
+ dev_err(dev, "%pOF is missing a 'reg' property\n",
+ child);
+ of_node_put(child);
+ return -ENODEV;
+ }
+ if (val & ~s->mask) {
+ dev_err(dev,
+ "%pOF has a 'reg' value with unmasked bits\n",
+ child);
+ of_node_put(child);
+ return -ENODEV;
+ }
+ }
+
+ ret = mdio_mux_init(dev, mux_node, mdio_mux_regmap_switch_fn,
+ &s->mux_handle, s, NULL);
+ if (ret) {
+ if (ret != -EPROBE_DEFER)
+ dev_err(dev, "failed to register mdio-mux bus %pOF\n",
+ mux_node);
+ return ret;
+ }
+
+ *data = s;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(mdio_mux_regmap_init);
+
+/**
+ * mdio_mux_regmap_uninit - relinquish the control of MDIO bus muxing using
+ * regmap constructs.
+ * @data: address of data allocated by mdio_mux_regmap_init
+ */
+void mdio_mux_regmap_uninit(void *data)
+{
+ struct mdio_mux_regmap_state *s = data;
+
+ mdio_mux_uninit(s->mux_handle);
+}
+EXPORT_SYMBOL_GPL(mdio_mux_regmap_uninit);
+
+MODULE_AUTHOR("Pankaj Bansal <pankaj.bansal@nxp.com>");
+MODULE_DESCRIPTION("regmap based MDIO MUX driver");
+MODULE_LICENSE("GPL");
+
diff --git a/include/linux/mdio-mux.h b/include/linux/mdio-mux.h
index a5d58f221939..bb5d0e2774bf 100644
--- a/include/linux/mdio-mux.h
+++ b/include/linux/mdio-mux.h
@@ -29,4 +29,38 @@ int mdio_mux_init(struct device *dev,
void mdio_mux_uninit(void *mux_handle);
+#if defined(CONFIG_MDIO_BUS_MUX_REGMAP) || \
+ defined(CONFIG_MDIO_BUS_MUX_REGMAP_MODULE)
+/**
+ * mdio_mux_regmap_init - control MDIO bus muxing using regmap constructs.
+ * @dev: device with which regmap construct is associated.
+ * @mux_node: mdio bus mux node that contains parent mdio bus phandle.
+ * This node also contains sub nodes, where each subnode denotes
+ * a child mdio bus. All the child mdio buses are muxed, i.e. at a
+ * time only one of the child mdio buses can be used.
+ * @data: to store the address of data allocated by this function
+ */
+int mdio_mux_regmap_init(struct device *dev,
+ struct device_node *mux_node,
+ void **data);
+
+/**
+ * mdio_mux_regmap_uninit - relinquish the control of MDIO bus muxing using
+ * regmap constructs.
+ * @data: address of data allocated by mdio_mux_regmap_init
+ */
+void mdio_mux_regmap_uninit(void *data);
+#else /* CONFIG_MDIO_BUS_MUX_REGMAP(_MODULE) */
+static inline int mdio_mux_regmap_init(struct device *dev,
+ struct device_node *mux_node,
+ void **data)
+{
+ return -ENODEV;
+}
+
+static inline void mdio_mux_regmap_uninit(void *data)
+{
+}
+#endif /* CONFIG_MDIO_BUS_MUX_REGMAP(_MODULE) */
+
#endif /* __LINUX_MDIO_MUX_H */
--
2.17.1
^ permalink raw reply related
* Re: [PATCH net-next v3 06/16] net: sched: protect filter_chain list with filter_chain_lock mutex
From: Jiri Pirko @ 2019-02-05 10:04 UTC (permalink / raw)
To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, davem, ast, daniel
In-Reply-To: <20190204123301.4223-7-vladbu@mellanox.com>
Mon, Feb 04, 2019 at 01:32:51PM CET, vladbu@mellanox.com wrote:
>Extend tcf_chain with new filter_chain_lock mutex. Always lock the chain
>when accessing filter_chain list, instead of relying on rtnl lock.
>Dereference filter_chain with tcf_chain_dereference() lockdep macro to
>verify that all users of chain_list have the lock taken.
>
>Rearrange tp insert/remove code in tc_new_tfilter/tc_del_tfilter to execute
>all necessary code while holding chain lock in order to prevent
>invalidation of chain_info structure by potential concurrent change. This
>also serializes calls to tcf_chain0_head_change(), which allows head change
>callbacks to rely on filter_chain_lock for synchronization instead of rtnl
>mutex.
>
>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
^ permalink raw reply
* [PATCH net-next v3] net: dsa: mv88e6xxx: Prevent suspend to RAM
From: Miquel Raynal @ 2019-02-05 11:07 UTC (permalink / raw)
To: Andrew Lunn, Vivien Didelot, Florian Fainelli, David S. Miller
Cc: netdev, linux-kernel, Thomas Petazzoni, Gregory Clement,
Antoine Tenart, Maxime Chevallier, Nadav Haklai, Miquel Raynal
On one hand, the mv88e6xxx driver has a work queue called in loop
which will attempt register accesses after MDIO bus suspension, that
entirely freezes the platform during suspend.
On the other hand, the DSA core is not ready yet to support suspend to
RAM operation because so far there is no way to recover reliably the
switch configuration.
To avoid the kernel to freeze when suspending with a switch driven by
the mv88e6xxx driver, we choose to prevent the driver suspension and
in the same way, the whole platform.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
Changes since v1/v2
===================
* After having discussed with Vivien and Andrew, it seems that
preventing the mv88e6xxx driver to suspend is the best option we
have as of today. I removed the code saving the switch rules at
driver level, I forgot about saving them at DSA level, so here are
just two dummy PM hooks that prevent suspension.
For people interested in picking-up what has been written and
continue working on it, my initial proposal is available there:
https://lore.kernel.org/netdev/20190130104606.31639abb@xps13/T/
drivers/net/dsa/mv88e6xxx/chip.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 8a517d8fb9d1..28764d6d7388 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -4651,6 +4651,21 @@ static const void *pdata_device_get_match_data(struct device *dev)
return NULL;
}
+/* There is no suspend to RAM support at DSA level yet, the switch configuration
+ * would be lost after a power cycle so prevent it to be suspended.
+ */
+static int __maybe_unused mv88e6xxx_suspend(struct device *dev)
+{
+ return -EOPNOTSUPP;
+}
+
+static int __maybe_unused mv88e6xxx_resume(struct device *dev)
+{
+ return 0;
+}
+
+static SIMPLE_DEV_PM_OPS(mv88e6xxx_pm_ops, mv88e6xxx_suspend, mv88e6xxx_resume);
+
static int mv88e6xxx_probe(struct mdio_device *mdiodev)
{
struct dsa_mv88e6xxx_pdata *pdata = mdiodev->dev.platform_data;
@@ -4835,6 +4850,7 @@ static struct mdio_driver mv88e6xxx_driver = {
.mdiodrv.driver = {
.name = "mv88e6085",
.of_match_table = mv88e6xxx_of_match,
+ .pm = &mv88e6xxx_pm_ops,
},
};
--
2.19.1
^ permalink raw reply related
* Re: [PATCH net] net: cls_flower: Remove filter from mask before freeing it
From: Petr Machata @ 2019-02-05 11:18 UTC (permalink / raw)
To: David Miller
Cc: netdev@vger.kernel.org, jhs@mojatatu.com,
xiyou.wangcong@gmail.com, jiri@resnulli.us, Ido Schimmel,
Paul Blakey
In-Reply-To: <20190204.092124.2218605529866288555.davem@davemloft.net>
David Miller <davem@davemloft.net> writes:
> From: Petr Machata <petrm@mellanox.com>
> Date: Mon, 4 Feb 2019 14:50:38 +0000
>
>> In fl_change(), when adding a new rule (i.e. fold == NULL), a driver may
>> reject the new rule, for example due to resource exhaustion. By that
>> point, the new rule was already assigned a mask, and it was added to
>> that mask's hash table. The clean-up path that's invoked as a result of
>> the rejection however neglects to undo the hash table addition, and
>> proceeds to free the new rule, thus leaving a dangling pointer in the
>> hash table.
>>
>> Fix by removing fnew from the mask's hash table before it is freed.
>>
>> Fixes: 35cc3cefc4de ("net/sched: cls_flower: Reject duplicated rules
>> also under skip_sw")
>
> Please do not break up lone Fixes: tag lines in the future, I fixed it
> up for you this time.
Sorry about that and thanks.
^ permalink raw reply
* Re: [PATCH net-next v3 07/16] net: sched: introduce reference counting for tcf_proto
From: Jiri Pirko @ 2019-02-05 11:22 UTC (permalink / raw)
To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, davem, ast, daniel
In-Reply-To: <20190204123301.4223-8-vladbu@mellanox.com>
Mon, Feb 04, 2019 at 01:32:52PM CET, vladbu@mellanox.com wrote:
>In order to remove dependency on rtnl lock and allow concurrent tcf_proto
>modification, extend tcf_proto with reference counter. Implement helper
>get/put functions for tcf proto and use them to modify cls API to always
>take reference to tcf_proto while using it. Only release reference to
>parent chain after releasing last reference to tp.
>
>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
^ permalink raw reply
* Re: [ovs-dev] [PATCH net-next V2 1/1] openvswitch: Declare ovs key structures using macros
From: Eli Britstein @ 2019-02-05 12:02 UTC (permalink / raw)
To: David Miller, yihung.wei@gmail.com
Cc: pshelar@ovn.org, dev@openvswitch.org, netdev@vger.kernel.org,
simon.horman@netronome.com
In-Reply-To: <20190204.120727.1732794761205124286.davem@davemloft.net>
On 2/4/2019 10:07 PM, David Miller wrote:
> From: Yi-Hung Wei <yihung.wei@gmail.com>
> Date: Mon, 4 Feb 2019 10:47:18 -0800
>
>> For example, to see how 'struct ovs_key_ipv6' is defined, now we need
>> to trace how OVS_KEY_IPV6_FIELDS is defined, and how OVS_KEY_FIELD_ARR
>> and OVS_KEY_FIELD defined. I think it makes the header file to be
>> more complicated.
> I completely agree.
>
> Unless this is totally unavoidable, I do not want to apply a patch
> which makes reading and auditing the networking code more difficult.
This technique is discussed for example in
https://stackoverflow.com/questions/6635851/real-world-use-of-x-macros,
and I found existing examples of using it in the kernel tree:
__BPF_FUNC_MAPPER in commit ebb676daa1a34 ("bpf: Print function name in
addition to function id")
__AAL_STAT_ITEMS and __SONET_ITEMS in commit 607ca46e97a1b ("UAPI:
(Scripted) Disintegrate include/linux"), the successor of commit
1da177e4c3f4 ("Linux-2.6.12-rc2"). I didn't dig deeper to the past.
I can agree it makes that H file a bit more complicated, but for sure
less than ## macros that are widely used.
However, I think the alternatives of generating such defines by some
scripts, or having the fields in more than one place are even worse, so
it is a kind of unavoidable.
Please reconsider regarding applying this patch.
^ permalink raw reply
* [PATCH v3 net-next 05/11] devlink: Add health set command
From: Eran Ben Elisha @ 2019-02-05 12:08 UTC (permalink / raw)
To: netdev, David S. Miller
Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin,
Eran Ben Elisha
In-Reply-To: <1549368532-17089-1-git-send-email-eranbe@mellanox.com>
Add devlink health set command, in order to set configuration parameters
for a specific reporter.
Supported parameters are:
- graceful_period: Time interval between auto recoveries (in msec)
- auto_recover: Determines if the devlink shall execute recover upon
receiving error for the reporter
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
include/uapi/linux/devlink.h | 1 +
net/core/devlink.c | 36 ++++++++++++++++++++++++++++++++++++
2 files changed, 37 insertions(+)
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index d8f20d6ce139..b03065a99884 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -97,6 +97,7 @@ enum devlink_command {
DEVLINK_CMD_INFO_GET, /* can dump */
DEVLINK_CMD_HEALTH_REPORTER_GET,
+ DEVLINK_CMD_HEALTH_REPORTER_SET,
/* add new commands above here */
__DEVLINK_CMD_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 69c58047d82f..f61aaa1578bb 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4711,6 +4711,33 @@ devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg,
return msg->len;
}
+static int
+devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb,
+ struct genl_info *info)
+{
+ struct devlink *devlink = info->user_ptr[0];
+ struct devlink_health_reporter *reporter;
+
+ reporter = devlink_health_reporter_get_from_info(devlink, info);
+ if (!reporter)
+ return -EINVAL;
+
+ if (!reporter->ops->recover &&
+ (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] ||
+ info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]))
+ return -EOPNOTSUPP;
+
+ if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD])
+ reporter->graceful_period =
+ nla_get_u64(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]);
+
+ if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER])
+ reporter->auto_recover =
+ nla_get_u8(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]);
+
+ return 0;
+}
+
static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -4737,6 +4764,8 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_REGION_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_REGION_SNAPSHOT_ID] = { .type = NLA_U32 },
[DEVLINK_ATTR_HEALTH_REPORTER_NAME] = { .type = NLA_NUL_STRING },
+ [DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = { .type = NLA_U64 },
+ [DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] = { .type = NLA_U8 },
};
static const struct genl_ops devlink_nl_ops[] = {
@@ -4988,6 +5017,13 @@ static const struct genl_ops devlink_nl_ops[] = {
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
/* can be retrieved by unprivileged users */
},
+ {
+ .cmd = DEVLINK_CMD_HEALTH_REPORTER_SET,
+ .doit = devlink_nl_cmd_health_reporter_set_doit,
+ .policy = devlink_nl_policy,
+ .flags = GENL_ADMIN_PERM,
+ .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+ },
};
static struct genl_family devlink_nl_family __ro_after_init = {
--
2.17.1
^ permalink raw reply related
* [PATCH v3 net-next 11/11] devlink: Add Documentation/networking/devlink-health.txt
From: Eran Ben Elisha @ 2019-02-05 12:08 UTC (permalink / raw)
To: netdev, David S. Miller
Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin,
Eran Ben Elisha
In-Reply-To: <1549368532-17089-1-git-send-email-eranbe@mellanox.com>
From: Aya Levin <ayal@mellanox.com>
This patch adds a new file to add information about devlink health
mechanism.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
Documentation/networking/devlink-health.txt | 86 +++++++++++++++++++++
1 file changed, 86 insertions(+)
create mode 100644 Documentation/networking/devlink-health.txt
diff --git a/Documentation/networking/devlink-health.txt b/Documentation/networking/devlink-health.txt
new file mode 100644
index 000000000000..1db3fbea0831
--- /dev/null
+++ b/Documentation/networking/devlink-health.txt
@@ -0,0 +1,86 @@
+The health mechanism is targeted for Real Time Alerting, in order to know when
+something bad had happened to a PCI device
+- Provide alert debug information
+- Self healing
+- If problem needs vendor support, provide a way to gather all needed debugging
+ information.
+
+The main idea is to unify and centralize driver health reports in the
+generic devlink instance and allow the user to set different
+attributes of the health reporting and recovery procedures.
+
+The devlink health reporter:
+Device driver creates a "health reporter" per each error/health type.
+Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error)
+or unknown (driver specific).
+For each registered health reporter a driver can issue error/health reports
+asynchronously. All health reports handling is done by devlink.
+Device driver can provide specific callbacks for each "health reporter", e.g.
+ - Recovery procedures
+ - Diagnostics and object dump procedures
+ - OOB initial parameters
+Different parts of the driver can register different types of health reporters
+with different handlers.
+
+Once an error is reported, devlink health will do the following actions:
+ * A log is being send to the kernel trace events buffer
+ * Health status and statistics are being updated for the reporter instance
+ * Object dump is being taken and saved at the reporter instance (as long as
+ there is no other dump which is already stored)
+ * Auto recovery attempt is being done. Depends on:
+ - Auto-recovery configuration
+ - Grace period vs. time passed since last recover
+
+The user interface:
+User can access/change each reporter's parameters and driver specific callbacks
+via devlink, e.g per error type (per health reporter)
+ - Configure reporter's generic parameters (like: disable/enable auto recovery)
+ - Invoke recovery procedure
+ - Run diagnostics
+ - Object dump
+
+The devlink health interface (via netlink):
+DEVLINK_CMD_HEALTH_REPORTER_GET
+ Retrieves status and configuration info per DEV and reporter.
+DEVLINK_CMD_HEALTH_REPORTER_SET
+ Allows reporter-related configuration setting.
+DEVLINK_CMD_HEALTH_REPORTER_RECOVER
+ Triggers a reporter's recovery procedure.
+DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE
+ Retrieves diagnostics data from a reporter on a device.
+DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET
+ Retrieves the last stored dump. Devlink health
+ saves a single dump. If an dump is not already stored by the devlink
+ for this reporter, devlink generates a new dump.
+ dump output is defined by the reporter.
+DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR
+ Clears the last saved dump file for the specified reporter.
+
+
+ netlink
+ +--------------------------+
+ | |
+ | + |
+ | | |
+ +--------------------------+
+ |request for ops
+ |(diagnose,
+ mlx5_core devlink |recover,
+ |dump)
++--------+ +--------------------------+
+| | | reporter| |
+| | | +---------v----------+ |
+| | ops execution | | | |
+| <----------------------------------+ | |
+| | | | | |
+| | | + ^------------------+ |
+| | | | request for ops |
+| | | | (recover, dump) |
+| | | | |
+| | | +-+------------------+ |
+| | health report | | health handler | |
+| +-------------------------------> | |
+| | | +--------------------+ |
+| | health reporter create | |
+| +----------------------------> |
++--------+ +--------------------------+
--
2.17.1
^ permalink raw reply related
* [PATCH v3 net-next 08/11] devlink: Add health dump {get,clear} commands
From: Eran Ben Elisha @ 2019-02-05 12:08 UTC (permalink / raw)
To: netdev, David S. Miller
Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin,
Eran Ben Elisha
In-Reply-To: <1549368532-17089-1-git-send-email-eranbe@mellanox.com>
Add devlink health dump commands, in order to run an dump operation
over a specific reporter.
The supported operations are dump_get in order to get last saved
dump (if not exist, dump now) and dump_clear to clear last saved
dump.
It is expected from driver's callback for diagnose command to fill it
via the devlink fmsg API. Devlink will parse it and convert it to
netlink nla API in order to pass it to the user.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
include/uapi/linux/devlink.h | 2 ++
net/core/devlink.c | 63 ++++++++++++++++++++++++++++++++++++
2 files changed, 65 insertions(+)
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 09be37137841..72d9f7c89190 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -100,6 +100,8 @@ enum devlink_command {
DEVLINK_CMD_HEALTH_REPORTER_SET,
DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE,
+ DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET,
+ DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR,
/* add new commands above here */
__DEVLINK_CMD_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 40efb2afd75a..8ef8d23e6f71 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4790,6 +4790,53 @@ static int devlink_nl_cmd_health_reporter_diagnose_doit(struct sk_buff *skb,
return err;
}
+static int devlink_nl_cmd_health_reporter_dump_get_doit(struct sk_buff *skb,
+ struct genl_info *info)
+{
+ struct devlink *devlink = info->user_ptr[0];
+ struct devlink_health_reporter *reporter;
+ int err;
+
+ reporter = devlink_health_reporter_get_from_info(devlink, info);
+ if (!reporter)
+ return -EINVAL;
+
+ if (!reporter->ops->dump)
+ return -EOPNOTSUPP;
+
+ mutex_lock(&reporter->dump_lock);
+ err = devlink_health_do_dump(reporter, NULL);
+ if (err)
+ goto out;
+
+ err = devlink_fmsg_snd(reporter->dump_fmsg, info,
+ DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET, 0);
+
+out:
+ mutex_unlock(&reporter->dump_lock);
+ return err;
+}
+
+static int
+devlink_nl_cmd_health_reporter_dump_clear_doit(struct sk_buff *skb,
+ struct genl_info *info)
+{
+ struct devlink *devlink = info->user_ptr[0];
+ struct devlink_health_reporter *reporter;
+
+ reporter = devlink_health_reporter_get_from_info(devlink, info);
+ if (!reporter)
+ return -EINVAL;
+
+ if (!reporter->ops->dump)
+ return -EOPNOTSUPP;
+
+ mutex_lock(&reporter->dump_lock);
+ devlink_health_dump_clear(reporter);
+ mutex_unlock(&reporter->dump_lock);
+ return 0;
+}
+
static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -5090,6 +5137,22 @@ static const struct genl_ops devlink_nl_ops[] = {
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
+ {
+ .cmd = DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET,
+ .doit = devlink_nl_cmd_health_reporter_dump_get_doit,
+ .policy = devlink_nl_policy,
+ .flags = GENL_ADMIN_PERM,
+ .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
+ DEVLINK_NL_FLAG_NO_LOCK,
+ },
+ {
+ .cmd = DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR,
+ .doit = devlink_nl_cmd_health_reporter_dump_clear_doit,
+ .policy = devlink_nl_policy,
+ .flags = GENL_ADMIN_PERM,
+ .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK |
+ DEVLINK_NL_FLAG_NO_LOCK,
+ },
};
static struct genl_family devlink_nl_family __ro_after_init = {
--
2.17.1
^ permalink raw reply related
* [PATCH v3 net-next 10/11] net/mlx5e: Add tx timeout support for mlx5e tx reporter
From: Eran Ben Elisha @ 2019-02-05 12:08 UTC (permalink / raw)
To: netdev, David S. Miller
Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin,
Eran Ben Elisha
In-Reply-To: <1549368532-17089-1-git-send-email-eranbe@mellanox.com>
With this patch, ndo_tx_timeout callback will be redirected to the tx
reporter in order to detect a tx timeout error and report it to the
devlink health. (The watchdog detects tx timeouts, but the driver verify
the issue still exists before launching any recover method).
In addition, recover from tx timeout in case of lost interrupt was added
to the tx reporter recover method. The tx timeout recover from lost
interrupt is not a new feature in the driver, this patch re-organize the
functionality and move it to the tx reporter recovery flow.
tx timeout example:
(with auto_recover set to false, if set to true, the manual recover and
diagnose sections are irrelevant)
$cat /sys/kernel/debug/tracing/trace
...
devlink_health_report: bus_name=pci dev_name=0000:00:09.0
driver_name=mlx5_core reporter_name=tx: TX timeout on queue: 0, SQ: 0x8a,
CQ: 0x35, SQ Cons: 0x2 SQ Prod: 0x2, usecs since last trans: 14912000
$devlink health show
pci/0000:00:09.0:
name tx
state healthy #err 1 #recover 0 last_dump_ts N/A
parameters:
grace_period 500 auto_recover false
$devlink health diagnose pci/0000:00:09.0 reporter tx -j -p
{
"SQs": [ {
"sqn": 138,
"HW state": 1,
"stopped": true
},{
"sqn": 142,
"HW state": 1,
"stopped": false
} ]
}
$devlink health diagnose pci/0000:00:09.0 reporter tx
SQs:
sqn: 138 HW state: 1 stopped: true
sqn: 142 HW state: 1 stopped: false
$devlink health recover pci/0000:00:09 reporter tx
$devlink health show
pci/0000:00:09.0:
name tx
state healthy #err 1 #recover 1 last_dump_ts N/A
parameters:
grace_period 500 auto_recover false
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
.../ethernet/mellanox/mlx5/core/en/reporter.h | 1 +
.../mellanox/mlx5/core/en/reporter_tx.c | 38 +++++++++++++
.../net/ethernet/mellanox/mlx5/core/en_main.c | 53 ++++++-------------
3 files changed, 55 insertions(+), 37 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
index 1c90059db5f8..e78e92753d73 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
@@ -10,5 +10,6 @@
int mlx5e_tx_reporter_create(struct mlx5e_priv *priv);
void mlx5e_tx_reporter_destroy(struct mlx5e_priv *priv);
void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq);
+int mlx5e_tx_reporter_timeout(struct mlx5e_txqsq *sq);
#endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 35568e4820de..0aebfb377cf0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -126,6 +126,44 @@ void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq)
&err_ctx);
}
+static int mlx5e_tx_reporter_timeout_recover(struct mlx5e_txqsq *sq)
+{
+ struct mlx5_eq_comp *eq = sq->cq.mcq.eq;
+ u32 eqe_count;
+ int ret;
+
+ netdev_err(sq->channel->netdev, "EQ 0x%x: Cons = 0x%x, irqn = 0x%x\n",
+ eq->core.eqn, eq->core.cons_index, eq->core.irqn);
+
+ eqe_count = mlx5_eq_poll_irq_disabled(eq);
+ ret = eqe_count ? true : false;
+ if (!eqe_count) {
+ clear_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
+ return ret;
+ }
+
+ netdev_err(sq->channel->netdev, "Recover %d eqes on EQ 0x%x\n",
+ eqe_count, eq->core.eqn);
+ sq->channel->stats->eq_rearm++;
+ return ret;
+}
+
+int mlx5e_tx_reporter_timeout(struct mlx5e_txqsq *sq)
+{
+ char err_str[MLX5E_TX_REPORTER_PER_SQ_MAX_LEN];
+ struct mlx5e_tx_err_ctx err_ctx;
+
+ err_ctx.sq = sq;
+ err_ctx.recover = mlx5e_tx_reporter_timeout_recover;
+ sprintf(err_str,
+ "TX timeout on queue: %d, SQ: 0x%x, CQ: 0x%x, SQ Cons: 0x%x SQ Prod: 0x%x, usecs since last trans: %u\n",
+ sq->channel->ix, sq->sqn, sq->cq.mcq.cqn, sq->cc, sq->pc,
+ jiffies_to_usecs(jiffies - sq->txq->trans_start));
+
+ return devlink_health_report(sq->channel->priv->tx_reporter, err_str,
+ &err_ctx);
+}
+
/* state lock cannot be grabbed within this function.
* It can cause a dead lock or a read-after-free.
*/
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 93e1f037b019..d81ebba7c04e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4116,31 +4116,13 @@ netdev_features_t mlx5e_features_check(struct sk_buff *skb,
return features;
}
-static bool mlx5e_tx_timeout_eq_recover(struct net_device *dev,
- struct mlx5e_txqsq *sq)
-{
- struct mlx5_eq_comp *eq = sq->cq.mcq.eq;
- u32 eqe_count;
-
- netdev_err(dev, "EQ 0x%x: Cons = 0x%x, irqn = 0x%x\n",
- eq->core.eqn, eq->core.cons_index, eq->core.irqn);
-
- eqe_count = mlx5_eq_poll_irq_disabled(eq);
- if (!eqe_count)
- return false;
-
- netdev_err(dev, "Recover %d eqes on EQ 0x%x\n", eqe_count, eq->core.eqn);
- sq->channel->stats->eq_rearm++;
- return true;
-}
-
static void mlx5e_tx_timeout_work(struct work_struct *work)
{
struct mlx5e_priv *priv = container_of(work, struct mlx5e_priv,
tx_timeout_work);
- struct net_device *dev = priv->netdev;
- bool reopen_channels = false;
- int i, err;
+ bool report_failed = false;
+ int err;
+ int i;
rtnl_lock();
mutex_lock(&priv->state_lock);
@@ -4149,31 +4131,22 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
goto unlock;
for (i = 0; i < priv->channels.num * priv->channels.params.num_tc; i++) {
- struct netdev_queue *dev_queue = netdev_get_tx_queue(dev, i);
+ struct netdev_queue *dev_queue =
+ netdev_get_tx_queue(priv->netdev, i);
struct mlx5e_txqsq *sq = priv->txq2sq[i];
if (!netif_xmit_stopped(dev_queue))
continue;
- netdev_err(dev,
- "TX timeout on queue: %d, SQ: 0x%x, CQ: 0x%x, SQ Cons: 0x%x SQ Prod: 0x%x, usecs since last trans: %u\n",
- i, sq->sqn, sq->cq.mcq.cqn, sq->cc, sq->pc,
- jiffies_to_usecs(jiffies - dev_queue->trans_start));
-
- /* If we recover a lost interrupt, most likely TX timeout will
- * be resolved, skip reopening channels
- */
- if (!mlx5e_tx_timeout_eq_recover(dev, sq)) {
- clear_bit(MLX5E_SQ_STATE_ENABLED, &sq->state);
- reopen_channels = true;
- }
+ if (mlx5e_tx_reporter_timeout(sq))
+ report_failed = true;
}
- if (!reopen_channels)
+ if (!report_failed)
goto unlock;
- mlx5e_close_locked(dev);
- err = mlx5e_open_locked(dev);
+ mlx5e_close_locked(priv->netdev);
+ err = mlx5e_open_locked(priv->netdev);
if (err)
netdev_err(priv->netdev,
"mlx5e_open_locked failed recovering from a tx_timeout, err(%d).\n",
@@ -4189,6 +4162,12 @@ static void mlx5e_tx_timeout(struct net_device *dev)
struct mlx5e_priv *priv = netdev_priv(dev);
netdev_err(dev, "TX timeout detected\n");
+
+ if (IS_ERR_OR_NULL(priv->tx_reporter)) {
+ netdev_err_once(priv->netdev, "tx timeout will not be handled, no valid tx reporter\n");
+ return;
+ }
+
queue_work(priv->wq, &priv->tx_timeout_work);
}
--
2.17.1
^ permalink raw reply related
* [PATCH v3 net-next 04/11] devlink: Add health get command
From: Eran Ben Elisha @ 2019-02-05 12:08 UTC (permalink / raw)
To: netdev, David S. Miller
Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin,
Eran Ben Elisha
In-Reply-To: <1549368532-17089-1-git-send-email-eranbe@mellanox.com>
Add devlink health get command to provide reporter/s data for user space.
Add the ability to get data per reporter or dump data from all available
reporters.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
include/uapi/linux/devlink.h | 11 +++
net/core/devlink.c | 149 +++++++++++++++++++++++++++++++++++
2 files changed, 160 insertions(+)
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 076692209a9b..d8f20d6ce139 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -96,6 +96,8 @@ enum devlink_command {
DEVLINK_CMD_INFO_GET, /* can dump */
+ DEVLINK_CMD_HEALTH_REPORTER_GET,
+
/* add new commands above here */
__DEVLINK_CMD_MAX,
DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
@@ -310,6 +312,15 @@ enum devlink_attr {
DEVLINK_ATTR_FMSG_OBJ_NAME, /* string */
DEVLINK_ATTR_FMSG_OBJ_VALUE_TYPE, /* u8 */
DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA, /* dynamic */
+
+ DEVLINK_ATTR_HEALTH_REPORTER, /* nested */
+ DEVLINK_ATTR_HEALTH_REPORTER_NAME, /* string */
+ DEVLINK_ATTR_HEALTH_REPORTER_STATE, /* u8 */
+ DEVLINK_ATTR_HEALTH_REPORTER_ERR, /* u64 */
+ DEVLINK_ATTR_HEALTH_REPORTER_RECOVER, /* u64 */
+ DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS, /* u64 */
+ DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD, /* u64 */
+ DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER, /* u8 */
/* add new attributes above here, update the policy in devlink.c */
__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index f45eb914597e..69c58047d82f 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4571,6 +4571,146 @@ int devlink_health_report(struct devlink_health_reporter *reporter,
}
EXPORT_SYMBOL_GPL(devlink_health_report);
+static struct devlink_health_reporter *
+devlink_health_reporter_get_from_info(struct devlink *devlink,
+ struct genl_info *info)
+{
+ char *reporter_name;
+
+ if (!info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_NAME])
+ return NULL;
+
+ reporter_name =
+ nla_data(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_NAME]);
+ return devlink_health_reporter_find_by_name(devlink, reporter_name);
+}
+
+static int
+devlink_nl_health_reporter_fill(struct sk_buff *msg,
+ struct devlink *devlink,
+ struct devlink_health_reporter *reporter,
+ enum devlink_command cmd, u32 portid,
+ u32 seq, int flags)
+{
+ struct nlattr *reporter_attr;
+ void *hdr;
+
+ hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
+ if (!hdr)
+ return -EMSGSIZE;
+
+ if (devlink_nl_put_handle(msg, devlink))
+ goto genlmsg_cancel;
+
+ reporter_attr = nla_nest_start(msg, DEVLINK_ATTR_HEALTH_REPORTER);
+ if (!reporter_attr)
+ goto genlmsg_cancel;
+ if (nla_put_string(msg, DEVLINK_ATTR_HEALTH_REPORTER_NAME,
+ reporter->ops->name))
+ goto reporter_nest_cancel;
+ if (nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_STATE,
+ reporter->health_state))
+ goto reporter_nest_cancel;
+ if (nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_ERR,
+ reporter->error_count, DEVLINK_ATTR_PAD))
+ goto reporter_nest_cancel;
+ if (nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_RECOVER,
+ reporter->recovery_count, DEVLINK_ATTR_PAD))
+ goto reporter_nest_cancel;
+ if (nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD,
+ reporter->graceful_period,
+ DEVLINK_ATTR_PAD))
+ goto reporter_nest_cancel;
+ if (nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER,
+ reporter->auto_recover))
+ goto reporter_nest_cancel;
+ if (reporter->dump_fmsg &&
+ nla_put_u64_64bit(msg, DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS,
+ jiffies_to_msecs(reporter->dump_ts),
+ DEVLINK_ATTR_PAD))
+ goto reporter_nest_cancel;
+
+ nla_nest_end(msg, reporter_attr);
+ genlmsg_end(msg, hdr);
+ return 0;
+
+reporter_nest_cancel:
+ nla_nest_end(msg, reporter_attr);
+genlmsg_cancel:
+ genlmsg_cancel(msg, hdr);
+ return -EMSGSIZE;
+}
+
+static int devlink_nl_cmd_health_reporter_get_doit(struct sk_buff *skb,
+ struct genl_info *info)
+{
+ struct devlink *devlink = info->user_ptr[0];
+ struct devlink_health_reporter *reporter;
+ struct sk_buff *msg;
+ int err;
+
+ reporter = devlink_health_reporter_get_from_info(devlink, info);
+ if (!reporter)
+ return -EINVAL;
+
+ msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+ if (!msg)
+ return -ENOMEM;
+
+ err = devlink_nl_health_reporter_fill(msg, devlink, reporter,
+ DEVLINK_CMD_HEALTH_REPORTER_GET,
+ info->snd_portid, info->snd_seq,
+ 0);
+ if (err) {
+ nlmsg_free(msg);
+ return err;
+ }
+
+ return genlmsg_reply(msg, info);
+}
+
+static int
+devlink_nl_cmd_health_reporter_get_dumpit(struct sk_buff *msg,
+ struct netlink_callback *cb)
+{
+ struct devlink_health_reporter *reporter;
+ struct devlink *devlink;
+ int start = cb->args[0];
+ int idx = 0;
+ int err;
+
+ mutex_lock(&devlink_mutex);
+ list_for_each_entry(devlink, &devlink_list, list) {
+ if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+ continue;
+ mutex_lock(&devlink->lock);
+ list_for_each_entry(reporter, &devlink->reporter_list,
+ list) {
+ if (idx < start) {
+ idx++;
+ continue;
+ }
+ err = devlink_nl_health_reporter_fill(msg, devlink,
+ reporter,
+ DEVLINK_CMD_HEALTH_REPORTER_GET,
+ NETLINK_CB(cb->skb).portid,
+ cb->nlh->nlmsg_seq,
+ NLM_F_MULTI);
+ if (err) {
+ mutex_unlock(&devlink->lock);
+ goto out;
+ }
+ idx++;
+ }
+ mutex_unlock(&devlink->lock);
+ }
+out:
+ mutex_unlock(&devlink_mutex);
+
+ cb->args[0] = idx;
+ return msg->len;
+}
+
static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -4596,6 +4736,7 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_PARAM_VALUE_CMODE] = { .type = NLA_U8 },
[DEVLINK_ATTR_REGION_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_REGION_SNAPSHOT_ID] = { .type = NLA_U32 },
+ [DEVLINK_ATTR_HEALTH_REPORTER_NAME] = { .type = NLA_NUL_STRING },
};
static const struct genl_ops devlink_nl_ops[] = {
@@ -4839,6 +4980,14 @@ static const struct genl_ops devlink_nl_ops[] = {
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
/* can be retrieved by unprivileged users */
},
+ {
+ .cmd = DEVLINK_CMD_HEALTH_REPORTER_GET,
+ .doit = devlink_nl_cmd_health_reporter_get_doit,
+ .dumpit = devlink_nl_cmd_health_reporter_get_dumpit,
+ .policy = devlink_nl_policy,
+ .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+ /* can be retrieved by unprivileged users */
+ },
};
static struct genl_family devlink_nl_family __ro_after_init = {
--
2.17.1
^ permalink raw reply related
* [PATCH v3 net-next 01/11] devlink: Add devlink formatted message (fmsg) API
From: Eran Ben Elisha @ 2019-02-05 12:08 UTC (permalink / raw)
To: netdev, David S. Miller
Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin,
Eran Ben Elisha
In-Reply-To: <1549368532-17089-1-git-send-email-eranbe@mellanox.com>
Devlink fmsg is a mechanism to pass descriptors between drivers and
devlink, in json-like format. The API allows the driver to add nested
attributes such as object, object pair and value array, in addition to
attributes such as name and value.
Driver can use this API to fill the fmsg context in a format which will be
translated by the devlink to the netlink message later.
There is no memory allocation in advance (other than the initial list
head), and it dynamically allocates messages descriptors and add them to
the list on the fly.
When it needs to send the data using SKBs to the netlink layer, it
fragments the data between different SKBs. In order to do this
fragmentation, it uses virtual nests attributes, to avoid actual
nesting use which cannot be divided between different SKBs.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
include/net/devlink.h | 149 +++++++++++
include/uapi/linux/devlink.h | 8 +
net/core/devlink.c | 483 +++++++++++++++++++++++++++++++++++
3 files changed, 640 insertions(+)
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 74d992a68a06..7c5722e816aa 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -448,6 +448,8 @@ struct devlink_info_req;
typedef void devlink_snapshot_data_dest_t(const void *data);
+struct devlink_fmsg;
+
struct devlink_ops {
int (*reload)(struct devlink *devlink, struct netlink_ext_ack *extack);
int (*port_type_set)(struct devlink_port *devlink_port,
@@ -639,6 +641,37 @@ int devlink_info_version_running_put(struct devlink_info_req *req,
const char *version_name,
const char *version_value);
+int devlink_fmsg_obj_nest_start(struct devlink_fmsg *fmsg);
+int devlink_fmsg_obj_nest_end(struct devlink_fmsg *fmsg);
+
+int devlink_fmsg_pair_nest_start(struct devlink_fmsg *fmsg, const char *name);
+int devlink_fmsg_pair_nest_end(struct devlink_fmsg *fmsg);
+
+int devlink_fmsg_arr_pair_nest_start(struct devlink_fmsg *fmsg,
+ const char *name);
+int devlink_fmsg_arr_pair_nest_end(struct devlink_fmsg *fmsg);
+
+int devlink_fmsg_bool_put(struct devlink_fmsg *fmsg, bool value);
+int devlink_fmsg_u8_put(struct devlink_fmsg *fmsg, u8 value);
+int devlink_fmsg_u32_put(struct devlink_fmsg *fmsg, u32 value);
+int devlink_fmsg_u64_put(struct devlink_fmsg *fmsg, u64 value);
+int devlink_fmsg_string_put(struct devlink_fmsg *fmsg, const char *value);
+int devlink_fmsg_binary_put(struct devlink_fmsg *fmsg, const void *value,
+ u16 value_len);
+
+int devlink_fmsg_bool_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ bool value);
+int devlink_fmsg_u8_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ u8 value);
+int devlink_fmsg_u32_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ u32 value);
+int devlink_fmsg_u64_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ u64 value);
+int devlink_fmsg_string_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ const char *value);
+int devlink_fmsg_binary_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ const void *value, u16 value_len);
+
#else
static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
@@ -971,6 +1004,122 @@ devlink_info_version_running_put(struct devlink_info_req *req,
{
return 0;
}
+
+static inline int
+devlink_fmsg_obj_nest_start(struct devlink_fmsg *fmsg)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_obj_nest_end(struct devlink_fmsg *fmsg)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_pair_nest_start(struct devlink_fmsg *fmsg, const char *name)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_pair_nest_end(struct devlink_fmsg *fmsg)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_arr_pair_nest_start(struct devlink_fmsg *fmsg,
+ const char *name)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_arr_pair_nest_end(struct devlink_fmsg *fmsg)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_bool_put(struct devlink_fmsg *fmsg, bool value)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_u8_put(struct devlink_fmsg *fmsg, u8 value)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_u32_put(struct devlink_fmsg *fmsg, u32 value)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_u64_put(struct devlink_fmsg *fmsg, u64 value)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_string_put(struct devlink_fmsg *fmsg, const char *value)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_binary_put(struct devlink_fmsg *fmsg, const void *value,
+ u16 value_len)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_bool_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ bool value)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_u8_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ u8 value)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_u32_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ u32 value)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_u64_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ u64 value)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_string_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ const char *value)
+{
+ return 0;
+}
+
+static inline int
+devlink_fmsg_binary_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ const void *value, u16 value_len)
+{
+ return 0;
+}
#endif
#if IS_REACHABLE(CONFIG_NET_DEVLINK)
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 054b2d1a4537..076692209a9b 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -302,6 +302,14 @@ enum devlink_attr {
DEVLINK_ATTR_SB_POOL_CELL_SIZE, /* u32 */
+ DEVLINK_ATTR_FMSG, /* nested */
+ DEVLINK_ATTR_FMSG_OBJ_NEST_START, /* flag */
+ DEVLINK_ATTR_FMSG_PAIR_NEST_START, /* flag */
+ DEVLINK_ATTR_FMSG_ARR_NEST_START, /* flag */
+ DEVLINK_ATTR_FMSG_NEST_END, /* flag */
+ DEVLINK_ATTR_FMSG_OBJ_NAME, /* string */
+ DEVLINK_ATTR_FMSG_OBJ_VALUE_TYPE, /* u8 */
+ DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA, /* dynamic */
/* add new attributes above here, update the policy in devlink.c */
__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 52bf27491fb8..20bc28326ded 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3878,6 +3878,489 @@ static int devlink_nl_cmd_info_get_dumpit(struct sk_buff *msg,
return msg->len;
}
+struct devlink_fmsg_item {
+ struct list_head list;
+ int attrtype;
+ u8 nla_type;
+ u16 len;
+ int value[0];
+};
+
+struct devlink_fmsg {
+ struct list_head item_list;
+};
+
+static struct devlink_fmsg *devlink_fmsg_alloc(void)
+{
+ struct devlink_fmsg *fmsg;
+
+ fmsg = kzalloc(sizeof(*fmsg), GFP_KERNEL);
+ if (!fmsg)
+ return NULL;
+
+ INIT_LIST_HEAD(&fmsg->item_list);
+
+ return fmsg;
+}
+
+static void devlink_fmsg_free(struct devlink_fmsg *fmsg)
+{
+ struct devlink_fmsg_item *item, *tmp;
+
+ list_for_each_entry_safe(item, tmp, &fmsg->item_list, list) {
+ list_del(&item->list);
+ kfree(item);
+ }
+ kfree(fmsg);
+}
+
+static int devlink_fmsg_nest_common(struct devlink_fmsg *fmsg,
+ int attrtype)
+{
+ struct devlink_fmsg_item *item;
+
+ item = kzalloc(sizeof(*item), GFP_KERNEL);
+ if (!item)
+ return -ENOMEM;
+
+ item->attrtype = attrtype;
+ list_add_tail(&item->list, &fmsg->item_list);
+
+ return 0;
+}
+
+int devlink_fmsg_obj_nest_start(struct devlink_fmsg *fmsg)
+{
+ return devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_OBJ_NEST_START);
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_obj_nest_start);
+
+static int devlink_fmsg_nest_end(struct devlink_fmsg *fmsg)
+{
+ return devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_NEST_END);
+}
+
+int devlink_fmsg_obj_nest_end(struct devlink_fmsg *fmsg)
+{
+ return devlink_fmsg_nest_end(fmsg);
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_obj_nest_end);
+
+#define DEVLINK_FMSG_MAX_SIZE (GENLMSG_DEFAULT_SIZE - GENL_HDRLEN - NLA_HDRLEN)
+
+static int devlink_fmsg_put_name(struct devlink_fmsg *fmsg, const char *name)
+{
+ struct devlink_fmsg_item *item;
+
+ if (strlen(name) + 1 > DEVLINK_FMSG_MAX_SIZE)
+ return -EMSGSIZE;
+
+ item = kzalloc(sizeof(*item) + strlen(name) + 1, GFP_KERNEL);
+ if (!item)
+ return -ENOMEM;
+
+ item->nla_type = NLA_NUL_STRING;
+ item->len = strlen(name) + 1;
+ item->attrtype = DEVLINK_ATTR_FMSG_OBJ_NAME;
+ memcpy(&item->value, name, item->len);
+ list_add_tail(&item->list, &fmsg->item_list);
+
+ return 0;
+}
+
+int devlink_fmsg_pair_nest_start(struct devlink_fmsg *fmsg, const char *name)
+{
+ int err;
+
+ err = devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_PAIR_NEST_START);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_put_name(fmsg, name);
+ if (err)
+ return err;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_pair_nest_start);
+
+int devlink_fmsg_pair_nest_end(struct devlink_fmsg *fmsg)
+{
+ return devlink_fmsg_nest_end(fmsg);
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_pair_nest_end);
+
+int devlink_fmsg_arr_pair_nest_start(struct devlink_fmsg *fmsg,
+ const char *name)
+{
+ int err;
+
+ err = devlink_fmsg_pair_nest_start(fmsg, name);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_nest_common(fmsg, DEVLINK_ATTR_FMSG_ARR_NEST_START);
+ if (err)
+ return err;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_arr_pair_nest_start);
+
+int devlink_fmsg_arr_pair_nest_end(struct devlink_fmsg *fmsg)
+{
+ int err;
+
+ err = devlink_fmsg_nest_end(fmsg);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_nest_end(fmsg);
+ if (err)
+ return err;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_arr_pair_nest_end);
+
+static int devlink_fmsg_put_value(struct devlink_fmsg *fmsg,
+ const void *value, u16 value_len,
+ u8 value_nla_type)
+{
+ struct devlink_fmsg_item *item;
+
+ if (value_len > DEVLINK_FMSG_MAX_SIZE)
+ return -EMSGSIZE;
+
+ item = kzalloc(sizeof(*item) + value_len, GFP_KERNEL);
+ if (!item)
+ return -ENOMEM;
+
+ item->nla_type = value_nla_type;
+ item->len = value_len;
+ item->attrtype = DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA;
+ memcpy(&item->value, value, item->len);
+ list_add_tail(&item->list, &fmsg->item_list);
+
+ return 0;
+}
+
+int devlink_fmsg_bool_put(struct devlink_fmsg *fmsg, bool value)
+{
+ return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_FLAG);
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_bool_put);
+
+int devlink_fmsg_u8_put(struct devlink_fmsg *fmsg, u8 value)
+{
+ return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_U8);
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_u8_put);
+
+int devlink_fmsg_u32_put(struct devlink_fmsg *fmsg, u32 value)
+{
+ return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_U32);
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_u32_put);
+
+int devlink_fmsg_u64_put(struct devlink_fmsg *fmsg, u64 value)
+{
+ return devlink_fmsg_put_value(fmsg, &value, sizeof(value), NLA_U64);
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_u64_put);
+
+int devlink_fmsg_string_put(struct devlink_fmsg *fmsg, const char *value)
+{
+ return devlink_fmsg_put_value(fmsg, value, strlen(value) + 1,
+ NLA_NUL_STRING);
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_string_put);
+
+int devlink_fmsg_binary_put(struct devlink_fmsg *fmsg, const void *value,
+ u16 value_len)
+{
+ return devlink_fmsg_put_value(fmsg, value, value_len, NLA_BINARY);
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_binary_put);
+
+int devlink_fmsg_bool_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ bool value)
+{
+ int err;
+
+ err = devlink_fmsg_pair_nest_start(fmsg, name);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_bool_put(fmsg, value);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_pair_nest_end(fmsg);
+ if (err)
+ return err;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_bool_pair_put);
+
+int devlink_fmsg_u8_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ u8 value)
+{
+ int err;
+
+ err = devlink_fmsg_pair_nest_start(fmsg, name);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_u8_put(fmsg, value);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_pair_nest_end(fmsg);
+ if (err)
+ return err;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_u8_pair_put);
+
+int devlink_fmsg_u32_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ u32 value)
+{
+ int err;
+
+ err = devlink_fmsg_pair_nest_start(fmsg, name);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_u32_put(fmsg, value);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_pair_nest_end(fmsg);
+ if (err)
+ return err;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_u32_pair_put);
+
+int devlink_fmsg_u64_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ u64 value)
+{
+ int err;
+
+ err = devlink_fmsg_pair_nest_start(fmsg, name);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_u64_put(fmsg, value);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_pair_nest_end(fmsg);
+ if (err)
+ return err;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_u64_pair_put);
+
+int devlink_fmsg_string_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ const char *value)
+{
+ int err;
+
+ err = devlink_fmsg_pair_nest_start(fmsg, name);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_string_put(fmsg, value);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_pair_nest_end(fmsg);
+ if (err)
+ return err;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_string_pair_put);
+
+int devlink_fmsg_binary_pair_put(struct devlink_fmsg *fmsg, const char *name,
+ const void *value, u16 value_len)
+{
+ int err;
+
+ err = devlink_fmsg_pair_nest_start(fmsg, name);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_binary_put(fmsg, value, value_len);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_pair_nest_end(fmsg);
+ if (err)
+ return err;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_fmsg_binary_pair_put);
+
+static int
+devlink_fmsg_item_fill_type(struct devlink_fmsg_item *msg, struct sk_buff *skb)
+{
+ switch (msg->nla_type) {
+ case NLA_FLAG:
+ case NLA_U8:
+ case NLA_U32:
+ case NLA_U64:
+ case NLA_NUL_STRING:
+ case NLA_BINARY:
+ return nla_put_u8(skb, DEVLINK_ATTR_FMSG_OBJ_VALUE_TYPE,
+ msg->nla_type);
+ default:
+ return -EINVAL;
+ }
+}
+
+static int
+devlink_fmsg_item_fill_data(struct devlink_fmsg_item *msg, struct sk_buff *skb)
+{
+ int attrtype = DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA;
+ u8 tmp;
+
+ switch (msg->nla_type) {
+ case NLA_FLAG:
+ /* Always provide flag data, regardless of its value */
+ tmp = *(bool *) msg->value;
+
+ return nla_put_u8(skb, attrtype, tmp);
+ case NLA_U8:
+ return nla_put_u8(skb, attrtype, *(u8 *) msg->value);
+ case NLA_U32:
+ return nla_put_u32(skb, attrtype, *(u32 *) msg->value);
+ case NLA_U64:
+ return nla_put_u64_64bit(skb, attrtype, *(u64 *) msg->value,
+ DEVLINK_ATTR_PAD);
+ case NLA_NUL_STRING:
+ return nla_put_string(skb, attrtype, (char *) &msg->value);
+ case NLA_BINARY:
+ return nla_put(skb, attrtype, msg->len, (void *) &msg->value);
+ default:
+ return -EINVAL;
+ }
+}
+
+static int
+devlink_fmsg_prepare_skb(struct devlink_fmsg *fmsg, struct sk_buff *skb,
+ int *start)
+{
+ struct devlink_fmsg_item *item;
+ struct nlattr *fmsg_nlattr;
+ int i = 0;
+ int err;
+
+ fmsg_nlattr = nla_nest_start(skb, DEVLINK_ATTR_FMSG);
+ if (!fmsg_nlattr)
+ return -EMSGSIZE;
+
+ list_for_each_entry(item, &fmsg->item_list, list) {
+ if (i < *start) {
+ i++;
+ continue;
+ }
+
+ switch (item->attrtype) {
+ case DEVLINK_ATTR_FMSG_OBJ_NEST_START:
+ case DEVLINK_ATTR_FMSG_PAIR_NEST_START:
+ case DEVLINK_ATTR_FMSG_ARR_NEST_START:
+ case DEVLINK_ATTR_FMSG_NEST_END:
+ err = nla_put_flag(skb, item->attrtype);
+ break;
+ case DEVLINK_ATTR_FMSG_OBJ_VALUE_DATA:
+ err = devlink_fmsg_item_fill_type(item, skb);
+ if (err)
+ break;
+ err = devlink_fmsg_item_fill_data(item, skb);
+ break;
+ case DEVLINK_ATTR_FMSG_OBJ_NAME:
+ err = nla_put_string(skb, item->attrtype,
+ (char *) &item->value);
+ break;
+ default:
+ err = -EINVAL;
+ break;
+ }
+ if (!err)
+ *start = ++i;
+ else
+ break;
+ }
+
+ nla_nest_end(skb, fmsg_nlattr);
+ return err;
+}
+
+static int devlink_fmsg_snd(struct devlink_fmsg *fmsg,
+ struct genl_info *info,
+ enum devlink_command cmd, int flags)
+{
+ struct nlmsghdr *nlh;
+ struct sk_buff *skb;
+ bool last = false;
+ int index = 0;
+ void *hdr;
+ int err;
+
+ while (!last) {
+ int tmp_index = index;
+
+ skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+ if (!skb)
+ return -ENOMEM;
+
+ hdr = genlmsg_put(skb, info->snd_portid, info->snd_seq,
+ &devlink_nl_family, flags | NLM_F_MULTI, cmd);
+ if (!hdr) {
+ err = -EMSGSIZE;
+ goto nla_put_failure;
+ }
+
+ err = devlink_fmsg_prepare_skb(fmsg, skb, &index);
+ if (!err)
+ last = true;
+ else if (err != -EMSGSIZE || tmp_index == index)
+ goto nla_put_failure;
+
+ genlmsg_end(skb, hdr);
+ err = genlmsg_reply(skb, info);
+ if (err)
+ return err;
+ }
+
+ skb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+ if (!skb)
+ return -ENOMEM;
+ nlh = nlmsg_put(skb, info->snd_portid, info->snd_seq,
+ NLMSG_DONE, 0, flags | NLM_F_MULTI);
+ if (!nlh) {
+ err = -EMSGSIZE;
+ goto nla_put_failure;
+ }
+ err = genlmsg_reply(skb, info);
+ if (err)
+ return err;
+
+ return 0;
+
+nla_put_failure:
+ nlmsg_free(skb);
+ return err;
+}
+
static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
--
2.17.1
^ permalink raw reply related
* [PATCH v3 net-next 09/11] net/mlx5e: Add tx reporter support
From: Eran Ben Elisha @ 2019-02-05 12:08 UTC (permalink / raw)
To: netdev, David S. Miller
Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin,
Eran Ben Elisha
In-Reply-To: <1549368532-17089-1-git-send-email-eranbe@mellanox.com>
Add mlx5e tx reporter to devlink health reporters. This reporter will be
responsible for diagnosing, reporting and recovering of tx errors.
This patch declares the TX reporter operations and creates it using the
devlink health API. Currently, this reporter supports reporting and
recovering from send error CQE only. In addition, it adds diagnose
information for the open SQs.
For a local SQ recover (due to driver error report), in case of SQ recover
failure, the recover operation will be considered as a failure.
For a full tx recover, an attempt to close and open the channels will be
done. If this one passed successfully, it will be considered as a
successful recover.
The SQ recover from error CQE flow is not a new feature in the driver,
this patch re-organize the functions and adapt them for the devlink
health API. For this purpose, move code from en_main.c to a new file
named reporter_tx.c.
Diagnose output:
$devlink health diagnose pci/0000:00:09.0 reporter tx -j -p
{
"SQs": [ {
"sqn": 138,
"HW state": 1,
"stopped": false
},{
"sqn": 142,
"HW state": 1,
"stopped": false
} ]
}
$devlink health diagnose pci/0000:00:09.0 reporter tx
SQs:
sqn: 138 HW state: 1 stopped: false
sqn: 142 HW state: 1 stopped: false
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
.../net/ethernet/mellanox/mlx5/core/Makefile | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/en.h | 18 +-
.../ethernet/mellanox/mlx5/core/en/reporter.h | 14 +
.../mellanox/mlx5/core/en/reporter_tx.c | 259 ++++++++++++++++++
.../net/ethernet/mellanox/mlx5/core/en_main.c | 136 +--------
.../net/ethernet/mellanox/mlx5/core/en_tx.c | 5 +-
6 files changed, 306 insertions(+), 128 deletions(-)
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 9de9abacf7f6..6bb2a860b15b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -22,7 +22,7 @@ mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
#
mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
en_tx.o en_rx.o en_dim.o en_txrx.o en/xdp.o en_stats.o \
- en_selftest.o en/port.o en/monitor_stats.o
+ en_selftest.o en/port.o en/monitor_stats.o en/reporter_tx.o
#
# Netdev extra
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 6dd74ef69389..66e510b16243 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -387,10 +387,7 @@ struct mlx5e_txqsq {
struct mlx5e_channel *channel;
int txq_ix;
u32 rate_limit;
- struct mlx5e_txqsq_recover {
- struct work_struct recover_work;
- u64 last_recover;
- } recover;
+ struct work_struct recover_work;
} ____cacheline_aligned_in_smp;
struct mlx5e_dma_info {
@@ -683,6 +680,13 @@ struct mlx5e_rss_params {
u8 hfunc;
};
+struct mlx5e_modify_sq_param {
+ int curr_state;
+ int next_state;
+ int rl_update;
+ int rl_index;
+};
+
struct mlx5e_priv {
/* priv data path fields - start */
struct mlx5e_txqsq *txq2sq[MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC];
@@ -738,6 +742,7 @@ struct mlx5e_priv {
#ifdef CONFIG_MLX5_EN_TLS
struct mlx5e_tls *tls;
#endif
+ struct devlink_health_reporter *tx_reporter;
};
struct mlx5e_profile {
@@ -868,6 +873,11 @@ void mlx5e_set_rq_type(struct mlx5_core_dev *mdev, struct mlx5e_params *params);
void mlx5e_init_rq_type_params(struct mlx5_core_dev *mdev,
struct mlx5e_params *params);
+int mlx5e_modify_sq(struct mlx5_core_dev *mdev, u32 sqn,
+ struct mlx5e_modify_sq_param *p);
+void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq);
+void mlx5e_tx_disable_queue(struct netdev_queue *txq);
+
static inline bool mlx5e_tunnel_inner_ft_supported(struct mlx5_core_dev *mdev)
{
return (MLX5_CAP_ETH(mdev, tunnel_stateless_gre) &&
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
new file mode 100644
index 000000000000..1c90059db5f8
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2019 Mellanox Technologies. */
+
+#ifndef __MLX5E_EN_REPORTER_H
+#define __MLX5E_EN_REPORTER_H
+
+#include <linux/mlx5/driver.h>
+#include "en.h"
+
+int mlx5e_tx_reporter_create(struct mlx5e_priv *priv);
+void mlx5e_tx_reporter_destroy(struct mlx5e_priv *priv);
+void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq);
+
+#endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
new file mode 100644
index 000000000000..35568e4820de
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -0,0 +1,259 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2019 Mellanox Technologies. */
+
+#include <net/devlink.h>
+#include "reporter.h"
+#include "lib/eq.h"
+
+#define MLX5E_TX_REPORTER_PER_SQ_MAX_LEN 256
+
+struct mlx5e_tx_err_ctx {
+ int (*recover)(struct mlx5e_txqsq *sq);
+ struct mlx5e_txqsq *sq;
+};
+
+static int mlx5e_wait_for_sq_flush(struct mlx5e_txqsq *sq)
+{
+ unsigned long exp_time = jiffies + msecs_to_jiffies(2000);
+
+ while (time_before(jiffies, exp_time)) {
+ if (sq->cc == sq->pc)
+ return 0;
+
+ msleep(20);
+ }
+
+ netdev_err(sq->channel->netdev,
+ "Wait for SQ 0x%x flush timeout (sq cc = 0x%x, sq pc = 0x%x)\n",
+ sq->sqn, sq->cc, sq->pc);
+
+ return -ETIMEDOUT;
+}
+
+static void mlx5e_reset_txqsq_cc_pc(struct mlx5e_txqsq *sq)
+{
+ WARN_ONCE(sq->cc != sq->pc,
+ "SQ 0x%x: cc (0x%x) != pc (0x%x)\n",
+ sq->sqn, sq->cc, sq->pc);
+ sq->cc = 0;
+ sq->dma_fifo_cc = 0;
+ sq->pc = 0;
+}
+
+static int mlx5e_sq_to_ready(struct mlx5e_txqsq *sq, int curr_state)
+{
+ struct mlx5_core_dev *mdev = sq->channel->mdev;
+ struct net_device *dev = sq->channel->netdev;
+ struct mlx5e_modify_sq_param msp = {0};
+ int err;
+
+ msp.curr_state = curr_state;
+ msp.next_state = MLX5_SQC_STATE_RST;
+
+ err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
+ if (err) {
+ netdev_err(dev, "Failed to move sq 0x%x to reset\n", sq->sqn);
+ return err;
+ }
+
+ memset(&msp, 0, sizeof(msp));
+ msp.curr_state = MLX5_SQC_STATE_RST;
+ msp.next_state = MLX5_SQC_STATE_RDY;
+
+ err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
+ if (err) {
+ netdev_err(dev, "Failed to move sq 0x%x to ready\n", sq->sqn);
+ return err;
+ }
+
+ return 0;
+}
+
+static int mlx5e_tx_reporter_err_cqe_recover(struct mlx5e_txqsq *sq)
+{
+ struct mlx5_core_dev *mdev = sq->channel->mdev;
+ struct net_device *dev = sq->channel->netdev;
+ u8 state;
+ int err;
+
+ if (!test_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state))
+ return 0;
+
+ err = mlx5_core_query_sq_state(mdev, sq->sqn, &state);
+ if (err) {
+ netdev_err(dev, "Failed to query SQ 0x%x state. err = %d\n",
+ sq->sqn, err);
+ return err;
+ }
+
+ if (state != MLX5_SQC_STATE_ERR) {
+ netdev_err(dev, "SQ 0x%x not in ERROR state\n", sq->sqn);
+ return -EINVAL;
+ }
+
+ mlx5e_tx_disable_queue(sq->txq);
+
+ err = mlx5e_wait_for_sq_flush(sq);
+ if (err)
+ return err;
+
+ /* At this point, no new packets will arrive from the stack as TXQ is
+ * marked with QUEUE_STATE_DRV_XOFF. In addition, NAPI cleared all
+ * pending WQEs. SQ can safely reset the SQ.
+ */
+
+ err = mlx5e_sq_to_ready(sq, state);
+ if (err)
+ return err;
+
+ mlx5e_reset_txqsq_cc_pc(sq);
+ sq->stats->recover++;
+ mlx5e_activate_txqsq(sq);
+
+ return 0;
+}
+
+void mlx5e_tx_reporter_err_cqe(struct mlx5e_txqsq *sq)
+{
+ char err_str[MLX5E_TX_REPORTER_PER_SQ_MAX_LEN];
+ struct mlx5e_tx_err_ctx err_ctx = {0};
+
+ err_ctx.sq = sq;
+ err_ctx.recover = mlx5e_tx_reporter_err_cqe_recover;
+ sprintf(err_str, "ERR CQE on SQ: 0x%x", sq->sqn);
+
+ devlink_health_report(sq->channel->priv->tx_reporter, err_str,
+ &err_ctx);
+}
+
+/* state lock cannot be grabbed within this function.
+ * It can cause a dead lock or a read-after-free.
+ */
+int mlx5e_tx_reporter_recover_from_ctx(struct mlx5e_tx_err_ctx *err_ctx)
+{
+ return err_ctx->recover(err_ctx->sq);
+}
+
+static int mlx5e_tx_reporter_recover_all(struct mlx5e_priv *priv)
+{
+ int err;
+
+ rtnl_lock();
+ mutex_lock(&priv->state_lock);
+ mlx5e_close_locked(priv->netdev);
+ err = mlx5e_open_locked(priv->netdev);
+ mutex_unlock(&priv->state_lock);
+ rtnl_unlock();
+
+ return err;
+}
+
+static int mlx5e_tx_reporter_recover(struct devlink_health_reporter *reporter,
+ void *context)
+{
+ struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
+ struct mlx5e_tx_err_ctx *err_ctx = context;
+
+ return err_ctx ? mlx5e_tx_reporter_recover_from_ctx(err_ctx) :
+ mlx5e_tx_reporter_recover_all(priv);
+}
+
+static int
+mlx5e_tx_reporter_build_diagnose_output(struct devlink_fmsg *fmsg,
+ u32 sqn, u8 state, bool stopped)
+{
+ int err;
+
+ err = devlink_fmsg_obj_nest_start(fmsg);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_u32_pair_put(fmsg, "sqn", sqn);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_u8_pair_put(fmsg, "HW state", state);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_bool_pair_put(fmsg, "stopped", stopped);
+ if (err)
+ return err;
+
+ err = devlink_fmsg_obj_nest_end(fmsg);
+ if (err)
+ return err;
+
+ return 0;
+}
+
+static int mlx5e_tx_reporter_diagnose(struct devlink_health_reporter *reporter,
+ struct devlink_fmsg *fmsg)
+{
+ struct mlx5e_priv *priv = devlink_health_reporter_priv(reporter);
+ int i, err = 0;
+
+ mutex_lock(&priv->state_lock);
+
+ if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
+ goto unlock;
+
+ err = devlink_fmsg_arr_pair_nest_start(fmsg, "SQs");
+ if (err)
+ goto unlock;
+
+ for (i = 0; i < priv->channels.num * priv->channels.params.num_tc;
+ i++) {
+ struct mlx5e_txqsq *sq = priv->txq2sq[i];
+ u8 state;
+
+ err = mlx5_core_query_sq_state(priv->mdev, sq->sqn, &state);
+ if (err)
+ break;
+
+ err = mlx5e_tx_reporter_build_diagnose_output(fmsg, sq->sqn,
+ state,
+ netif_xmit_stopped(sq->txq));
+ if (err)
+ break;
+ }
+ err = devlink_fmsg_arr_pair_nest_end(fmsg);
+ if (err)
+ goto unlock;
+
+unlock:
+ mutex_unlock(&priv->state_lock);
+ return err;
+}
+
+static const struct devlink_health_reporter_ops mlx5_tx_reporter_ops = {
+ .name = "tx",
+ .recover = mlx5e_tx_reporter_recover,
+ .diagnose = mlx5e_tx_reporter_diagnose,
+};
+
+#define MLX5_REPORTER_TX_GRACEFUL_PERIOD 500
+
+int mlx5e_tx_reporter_create(struct mlx5e_priv *priv)
+{
+ struct mlx5_core_dev *mdev = priv->mdev;
+ struct devlink *devlink = priv_to_devlink(mdev);
+
+ priv->tx_reporter =
+ devlink_health_reporter_create(devlink, &mlx5_tx_reporter_ops,
+ MLX5_REPORTER_TX_GRACEFUL_PERIOD,
+ true, priv);
+ if (IS_ERR_OR_NULL(priv->tx_reporter))
+ netdev_warn(priv->netdev,
+ "Failed to create tx reporter, err = %ld\n",
+ PTR_ERR(priv->tx_reporter));
+ return PTR_ERR_OR_ZERO(priv->tx_reporter);
+}
+
+void mlx5e_tx_reporter_destroy(struct mlx5e_priv *priv)
+{
+ if (IS_ERR_OR_NULL(priv->tx_reporter))
+ return;
+
+ devlink_health_reporter_destroy(priv->tx_reporter);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 099d307e6f25..93e1f037b019 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -51,6 +51,7 @@
#include "en/xdp.h"
#include "lib/eq.h"
#include "en/monitor_stats.h"
+#include "en/reporter.h"
struct mlx5e_rq_param {
u32 rqc[MLX5_ST_SZ_DW(rqc)];
@@ -1159,7 +1160,7 @@ static int mlx5e_alloc_txqsq_db(struct mlx5e_txqsq *sq, int numa)
return 0;
}
-static void mlx5e_sq_recover(struct work_struct *work);
+static void mlx5e_tx_err_cqe_work(struct work_struct *recover_work);
static int mlx5e_alloc_txqsq(struct mlx5e_channel *c,
int txq_ix,
struct mlx5e_params *params,
@@ -1181,7 +1182,7 @@ static int mlx5e_alloc_txqsq(struct mlx5e_channel *c,
sq->uar_map = mdev->mlx5e_res.bfreg.map;
sq->min_inline_mode = params->tx_min_inline_mode;
sq->stats = &c->priv->channel_stats[c->ix].sq[tc];
- INIT_WORK(&sq->recover.recover_work, mlx5e_sq_recover);
+ INIT_WORK(&sq->recover_work, mlx5e_tx_err_cqe_work);
if (MLX5_IPSEC_DEV(c->priv->mdev))
set_bit(MLX5E_SQ_STATE_IPSEC, &sq->state);
if (mlx5_accel_is_tls_device(c->priv->mdev))
@@ -1269,15 +1270,8 @@ static int mlx5e_create_sq(struct mlx5_core_dev *mdev,
return err;
}
-struct mlx5e_modify_sq_param {
- int curr_state;
- int next_state;
- bool rl_update;
- int rl_index;
-};
-
-static int mlx5e_modify_sq(struct mlx5_core_dev *mdev, u32 sqn,
- struct mlx5e_modify_sq_param *p)
+int mlx5e_modify_sq(struct mlx5_core_dev *mdev, u32 sqn,
+ struct mlx5e_modify_sq_param *p)
{
void *in;
void *sqc;
@@ -1375,17 +1369,7 @@ static int mlx5e_open_txqsq(struct mlx5e_channel *c,
return err;
}
-static void mlx5e_reset_txqsq_cc_pc(struct mlx5e_txqsq *sq)
-{
- WARN_ONCE(sq->cc != sq->pc,
- "SQ 0x%x: cc (0x%x) != pc (0x%x)\n",
- sq->sqn, sq->cc, sq->pc);
- sq->cc = 0;
- sq->dma_fifo_cc = 0;
- sq->pc = 0;
-}
-
-static void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq)
+void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq)
{
sq->txq = netdev_get_tx_queue(sq->channel->netdev, sq->txq_ix);
clear_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state);
@@ -1394,7 +1378,7 @@ static void mlx5e_activate_txqsq(struct mlx5e_txqsq *sq)
netif_tx_start_queue(sq->txq);
}
-static inline void netif_tx_disable_queue(struct netdev_queue *txq)
+void mlx5e_tx_disable_queue(struct netdev_queue *txq)
{
__netif_tx_lock_bh(txq);
netif_tx_stop_queue(txq);
@@ -1410,7 +1394,7 @@ static void mlx5e_deactivate_txqsq(struct mlx5e_txqsq *sq)
/* prevent netif_tx_wake_queue */
napi_synchronize(&c->napi);
- netif_tx_disable_queue(sq->txq);
+ mlx5e_tx_disable_queue(sq->txq);
/* last doorbell out, godspeed .. */
if (mlx5e_wqc_has_room_for(wq, sq->cc, sq->pc, 1)) {
@@ -1430,6 +1414,7 @@ static void mlx5e_close_txqsq(struct mlx5e_txqsq *sq)
struct mlx5_rate_limit rl = {0};
cancel_work_sync(&sq->dim.work);
+ cancel_work_sync(&sq->recover_work);
mlx5e_destroy_sq(mdev, sq->sqn);
if (sq->rate_limit) {
rl.rate = sq->rate_limit;
@@ -1439,105 +1424,12 @@ static void mlx5e_close_txqsq(struct mlx5e_txqsq *sq)
mlx5e_free_txqsq(sq);
}
-static int mlx5e_wait_for_sq_flush(struct mlx5e_txqsq *sq)
-{
- unsigned long exp_time = jiffies + msecs_to_jiffies(2000);
-
- while (time_before(jiffies, exp_time)) {
- if (sq->cc == sq->pc)
- return 0;
-
- msleep(20);
- }
-
- netdev_err(sq->channel->netdev,
- "Wait for SQ 0x%x flush timeout (sq cc = 0x%x, sq pc = 0x%x)\n",
- sq->sqn, sq->cc, sq->pc);
-
- return -ETIMEDOUT;
-}
-
-static int mlx5e_sq_to_ready(struct mlx5e_txqsq *sq, int curr_state)
+static void mlx5e_tx_err_cqe_work(struct work_struct *recover_work)
{
- struct mlx5_core_dev *mdev = sq->channel->mdev;
- struct net_device *dev = sq->channel->netdev;
- struct mlx5e_modify_sq_param msp = {0};
- int err;
-
- msp.curr_state = curr_state;
- msp.next_state = MLX5_SQC_STATE_RST;
-
- err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
- if (err) {
- netdev_err(dev, "Failed to move sq 0x%x to reset\n", sq->sqn);
- return err;
- }
-
- memset(&msp, 0, sizeof(msp));
- msp.curr_state = MLX5_SQC_STATE_RST;
- msp.next_state = MLX5_SQC_STATE_RDY;
-
- err = mlx5e_modify_sq(mdev, sq->sqn, &msp);
- if (err) {
- netdev_err(dev, "Failed to move sq 0x%x to ready\n", sq->sqn);
- return err;
- }
-
- return 0;
-}
-
-static void mlx5e_sq_recover(struct work_struct *work)
-{
- struct mlx5e_txqsq_recover *recover =
- container_of(work, struct mlx5e_txqsq_recover,
- recover_work);
- struct mlx5e_txqsq *sq = container_of(recover, struct mlx5e_txqsq,
- recover);
- struct mlx5_core_dev *mdev = sq->channel->mdev;
- struct net_device *dev = sq->channel->netdev;
- u8 state;
- int err;
-
- err = mlx5_core_query_sq_state(mdev, sq->sqn, &state);
- if (err) {
- netdev_err(dev, "Failed to query SQ 0x%x state. err = %d\n",
- sq->sqn, err);
- return;
- }
-
- if (state != MLX5_RQC_STATE_ERR) {
- netdev_err(dev, "SQ 0x%x not in ERROR state\n", sq->sqn);
- return;
- }
-
- netif_tx_disable_queue(sq->txq);
-
- if (mlx5e_wait_for_sq_flush(sq))
- return;
-
- /* If the interval between two consecutive recovers per SQ is too
- * short, don't recover to avoid infinite loop of ERR_CQE -> recover.
- * If we reached this state, there is probably a bug that needs to be
- * fixed. let's keep the queue close and let tx timeout cleanup.
- */
- if (jiffies_to_msecs(jiffies - recover->last_recover) <
- MLX5E_SQ_RECOVER_MIN_INTERVAL) {
- netdev_err(dev, "Recover SQ 0x%x canceled, too many error CQEs\n",
- sq->sqn);
- return;
- }
-
- /* At this point, no new packets will arrive from the stack as TXQ is
- * marked with QUEUE_STATE_DRV_XOFF. In addition, NAPI cleared all
- * pending WQEs. SQ can safely reset the SQ.
- */
- if (mlx5e_sq_to_ready(sq, state))
- return;
+ struct mlx5e_txqsq *sq = container_of(recover_work, struct mlx5e_txqsq,
+ recover_work);
- mlx5e_reset_txqsq_cc_pc(sq);
- sq->stats->recover++;
- recover->last_recover = jiffies;
- mlx5e_activate_txqsq(sq);
+ mlx5e_tx_reporter_err_cqe(sq);
}
static int mlx5e_open_icosq(struct mlx5e_channel *c,
@@ -3236,6 +3128,7 @@ static void mlx5e_cleanup_nic_tx(struct mlx5e_priv *priv)
{
int tc;
+ mlx5e_tx_reporter_destroy(priv);
for (tc = 0; tc < priv->profile->max_tc; tc++)
mlx5e_destroy_tis(priv->mdev, priv->tisn[tc]);
}
@@ -4953,6 +4846,7 @@ static int mlx5e_init_nic_tx(struct mlx5e_priv *priv)
#ifdef CONFIG_MLX5_CORE_EN_DCB
mlx5e_dcbnl_initialize(priv);
#endif
+ mlx5e_tx_reporter_create(priv);
return 0;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 598ad7e4d5c9..189211295e48 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -513,8 +513,9 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
&sq->state)) {
mlx5e_dump_error_cqe(sq,
(struct mlx5_err_cqe *)cqe);
- queue_work(cq->channel->priv->wq,
- &sq->recover.recover_work);
+ if (!IS_ERR_OR_NULL(cq->channel->priv->tx_reporter))
+ queue_work(cq->channel->priv->wq,
+ &sq->recover_work);
}
stats->cqe_err++;
}
--
2.17.1
^ permalink raw reply related
* [PATCH v3 net-next 02/11] devlink: Add health reporter create/destroy functionality
From: Eran Ben Elisha @ 2019-02-05 12:08 UTC (permalink / raw)
To: netdev, David S. Miller
Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin,
Eran Ben Elisha
In-Reply-To: <1549368532-17089-1-git-send-email-eranbe@mellanox.com>
Devlink health reporter is an instance for reporting, diagnosing and
recovering from run time errors discovered by the reporters.
Define it's data structure and supported operations.
In addition, expose devlink API to create and destroy a reporter.
Each devlink instance will hold it's own reporters list.
As part of the allocation, driver shall provide a set of callbacks which
will be used by devlink in order to handle health reports and user
commands related to this reporter. In addition, driver is entitled to
provide some priv pointer, which can be fetched from the reporter by
devlink_health_reporter_priv function.
For each reporter, devlink will hold a metadata of statistics,
dump msg and status.
For passing dumps and diagnose data to the user-space, it will use devlink
fmsg API.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
include/net/devlink.h | 53 +++++++++++++++++++++++++
net/core/devlink.c | 92 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 145 insertions(+)
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 7c5722e816aa..3dfe30235878 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -30,6 +30,7 @@ struct devlink {
struct list_head param_list;
struct list_head region_list;
u32 snapshot_id;
+ struct list_head reporter_list;
struct devlink_dpipe_headers *dpipe_headers;
const struct devlink_ops *ops;
struct device *dev;
@@ -449,6 +450,27 @@ struct devlink_info_req;
typedef void devlink_snapshot_data_dest_t(const void *data);
struct devlink_fmsg;
+struct devlink_health_reporter;
+
+/**
+ * struct devlink_health_reporter_ops - Reporter operations
+ * @name: reporter name
+ * @recover: callback to recover from reported error
+ * if priv_ctx is NULL, run a full recover
+ * @dump: callback to dump an object
+ * if priv_ctx is NULL, run a full dump
+ * @diagnose: callback to diagnose the current status
+ */
+
+struct devlink_health_reporter_ops {
+ char *name;
+ int (*recover)(struct devlink_health_reporter *reporter,
+ void *priv_ctx);
+ int (*dump)(struct devlink_health_reporter *reporter,
+ struct devlink_fmsg *fmsg, void *priv_ctx);
+ int (*diagnose)(struct devlink_health_reporter *reporter,
+ struct devlink_fmsg *fmsg);
+};
struct devlink_ops {
int (*reload)(struct devlink *devlink, struct netlink_ext_ack *extack);
@@ -672,6 +694,17 @@ int devlink_fmsg_string_pair_put(struct devlink_fmsg *fmsg, const char *name,
int devlink_fmsg_binary_pair_put(struct devlink_fmsg *fmsg, const char *name,
const void *value, u16 value_len);
+struct devlink_health_reporter *
+devlink_health_reporter_create(struct devlink *devlink,
+ const struct devlink_health_reporter_ops *ops,
+ u64 graceful_period, bool auto_recover,
+ void *priv);
+void
+devlink_health_reporter_destroy(struct devlink_health_reporter *reporter);
+
+void *
+devlink_health_reporter_priv(struct devlink_health_reporter *reporter);
+
#else
static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
@@ -1120,6 +1153,26 @@ devlink_fmsg_binary_pair_put(struct devlink_fmsg *fmsg, const char *name,
{
return 0;
}
+
+static inline struct devlink_health_reporter *
+devlink_health_reporter_create(struct devlink *devlink,
+ const struct devlink_health_reporter_ops *ops,
+ u64 graceful_period, bool auto_recover,
+ void *priv)
+{
+ return NULL;
+}
+
+static inline void
+devlink_health_reporter_destroy(struct devlink_health_reporter *reporter)
+{
+}
+
+static inline void *
+devlink_health_reporter_priv(struct devlink_health_reporter *reporter)
+{
+ return NULL;
+}
#endif
#if IS_REACHABLE(CONFIG_NET_DEVLINK)
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 20bc28326ded..9192f5e10741 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4361,6 +4361,97 @@ static int devlink_fmsg_snd(struct devlink_fmsg *fmsg,
return err;
}
+struct devlink_health_reporter {
+ struct list_head list;
+ void *priv;
+ const struct devlink_health_reporter_ops *ops;
+ struct devlink *devlink;
+ u64 graceful_period;
+ bool auto_recover;
+ u8 health_state;
+};
+
+void *
+devlink_health_reporter_priv(struct devlink_health_reporter *reporter)
+{
+ return reporter->priv;
+}
+EXPORT_SYMBOL_GPL(devlink_health_reporter_priv);
+
+static struct devlink_health_reporter *
+devlink_health_reporter_find_by_name(struct devlink *devlink,
+ const char *reporter_name)
+{
+ struct devlink_health_reporter *reporter;
+
+ list_for_each_entry(reporter, &devlink->reporter_list, list)
+ if (!strcmp(reporter->ops->name, reporter_name))
+ return reporter;
+ return NULL;
+}
+
+/**
+ * devlink_health_reporter_create - create devlink health reporter
+ *
+ * @devlink: devlink
+ * @ops: ops
+ * @graceful_period: to avoid recovery loops, in msecs
+ * @auto_recover: auto recover when error occurs
+ * @priv: priv
+ */
+struct devlink_health_reporter *
+devlink_health_reporter_create(struct devlink *devlink,
+ const struct devlink_health_reporter_ops *ops,
+ u64 graceful_period, bool auto_recover,
+ void *priv)
+{
+ struct devlink_health_reporter *reporter;
+
+ mutex_lock(&devlink->lock);
+ if (devlink_health_reporter_find_by_name(devlink, ops->name)) {
+ reporter = ERR_PTR(-EEXIST);
+ goto unlock;
+ }
+
+ if (WARN_ON(auto_recover && !ops->recover) ||
+ WARN_ON(graceful_period && !ops->recover)) {
+ reporter = ERR_PTR(-EINVAL);
+ goto unlock;
+ }
+
+ reporter = kzalloc(sizeof(*reporter), GFP_KERNEL);
+ if (!reporter) {
+ reporter = ERR_PTR(-ENOMEM);
+ goto unlock;
+ }
+
+ reporter->priv = priv;
+ reporter->ops = ops;
+ reporter->devlink = devlink;
+ reporter->graceful_period = graceful_period;
+ reporter->auto_recover = auto_recover;
+ list_add_tail(&reporter->list, &devlink->reporter_list);
+unlock:
+ mutex_unlock(&devlink->lock);
+ return reporter;
+}
+EXPORT_SYMBOL_GPL(devlink_health_reporter_create);
+
+/**
+ * devlink_health_reporter_destroy - destroy devlink health reporter
+ *
+ * @reporter: devlink health reporter to destroy
+ */
+void
+devlink_health_reporter_destroy(struct devlink_health_reporter *reporter)
+{
+ mutex_lock(&reporter->devlink->lock);
+ list_del(&reporter->list);
+ mutex_unlock(&reporter->devlink->lock);
+ kfree(reporter);
+}
+EXPORT_SYMBOL_GPL(devlink_health_reporter_destroy);
+
static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -4669,6 +4760,7 @@ struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
INIT_LIST_HEAD(&devlink->resource_list);
INIT_LIST_HEAD(&devlink->param_list);
INIT_LIST_HEAD(&devlink->region_list);
+ INIT_LIST_HEAD(&devlink->reporter_list);
mutex_init(&devlink->lock);
return devlink;
}
--
2.17.1
^ permalink raw reply related
* [PATCH v3 net-next 00/11] Devlink health reporting and recovery system
From: Eran Ben Elisha @ 2019-02-05 12:08 UTC (permalink / raw)
To: netdev, David S. Miller
Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin,
Eran Ben Elisha
The health mechanism is targeted for Real Time Alerting, in order to know when
something bad had happened to a PCI device
- Provide alert debug information
- Self healing
- If problem needs vendor support, provide a way to gather all needed debugging
information.
The main idea is to unify and centralize driver health reports in the
generic devlink instance and allow the user to set different
attributes of the health reporting and recovery procedures.
The devlink health reporter:
Device driver creates a "health reporter" per each error/health type.
Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error)
or unknown (driver specific).
For each registered health reporter a driver can issue error/health reports
asynchronously. All health reports handling is done by devlink.
Device driver can provide specific callbacks for each "health reporter", e.g.
- Recovery procedures
- Diagnostics and object dump procedures
- OOB initial attributes
Different parts of the driver can register different types of health reporters
with different handlers.
Once an error is reported, devlink health will do the following actions:
* A log is being send to the kernel trace events buffer
* Health status and statistics are being updated for the reporter instance
* Object dump is being taken and saved at the reporter instance (as long as
there is no other dump which is already stored)
* Auto recovery attempt is being done. Depends on:
- Auto-recovery configuration
- Grace period vs. time passed since last recover
The user interface:
User can access/change each reporter attributes and driver specific callbacks
via devlink, e.g per error type (per health reporter)
- Configure reporter's generic attributes (like: Disable/enable auto recovery)
- Invoke recovery procedure
- Run diagnostics
- Object dump
The devlink health interface (via netlink):
DEVLINK_CMD_HEALTH_REPORTER_GET
Retrieves status and configuration info per DEV and reporter.
DEVLINK_CMD_HEALTH_REPORTER_SET
Allows reporter-related configuration setting.
DEVLINK_CMD_HEALTH_REPORTER_RECOVER
Triggers a reporter's recovery procedure.
DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE
Retrieves diagnostics data from a reporter on a device.
DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET
Retrieves the last stored dump. Devlink health
saves a single dump. If an dump is not already stored by the devlink
for this reporter, devlink generates a new dump.
dump output is defined by the reporter.
DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR
Clears the last saved dump file for the specified reporter.
netlink
+--------------------------+
| |
| + |
| | |
+--------------------------+
|request for ops
|(diagnose,
mlx5_core devlink |recover,
|dump)
+--------+ +--------------------------+
| | | reporter| |
| | | +---------v----------+ |
| | ops execution | | | |
| <----------------------------------+ | |
| | | | | |
| | | + ^------------------+ |
| | | | request for ops |
| | | | (recover, dump) |
| | | | |
| | | +-+------------------+ |
| | health report | | health handler | |
| +-------------------------------> | |
| | | +--------------------+ |
| | health reporter create | |
| +----------------------------> |
+--------+ +--------------------------+
In this patchset, mlx5e TX reporter is implemented.
Cmdline format:
devlink health show [DEV reporter REPORTE_NAME]
devlink health recover DEV reporter REPORTER_NAME
devlink health diagnose DEV reporter REPORTER_NAME
devlink health dump show DEV reporter REPORTER_NAME
devlink health dump clear DEV reporter REPORTER_NAME
devlink health set DEV reporter REPORTER_NAME NAME VALUE
Cmdline examples:
$devlink health show
pci/0000:00:09.0:
name tx
state healthy #err 1 #recover 0 last_dump_ts N/A
parameters:
grace_period 500 auto_recover false
$devlink health diagnose pci/0000:00:09.0 reporter tx -j -p
{
"SQs": [ {
"sqn": 138,
"HW state": 1,
"stopped": false
},{
"sqn": 142,
"HW state": 1,
"stopped": false
} ]
}
$devlink health diagnose pci/0000:00:09.0 reporter tx
SQs:
sqn: 138 HW state: 1 stopped: false
sqn: 142 HW state: 1 stopped: false
$devlink health recover pci/0000:00:09 reporter tx
$devlink health set pci/0000:00:09.0 reporter tx grace_period 3500
$devlink health set pci/0000:00:09.0 reporter tx auto_recover false
Aya Levin (1):
devlink: Add Documentation/networking/devlink-health.txt
Eran Ben Elisha (10):
devlink: Add devlink formatted message (fmsg) API
devlink: Add health reporter create/destroy functionality
devlink: Add health report functionality
devlink: Add health get command
devlink: Add health set command
devlink: Add health recover command
devlink: Add health diagnose command
devlink: Add health dump {get,clear} commands
net/mlx5e: Add tx reporter support
net/mlx5e: Add tx timeout support for mlx5e tx reporter
Documentation/networking/devlink-health.txt | 86 ++
.../net/ethernet/mellanox/mlx5/core/Makefile | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/en.h | 18 +-
.../ethernet/mellanox/mlx5/core/en/reporter.h | 15 +
.../mellanox/mlx5/core/en/reporter_tx.c | 297 +++++
.../net/ethernet/mellanox/mlx5/core/en_main.c | 189 +---
.../net/ethernet/mellanox/mlx5/core/en_tx.c | 5 +-
include/net/devlink.h | 211 ++++
include/trace/events/devlink.h | 62 +
include/uapi/linux/devlink.h | 24 +
net/core/devlink.c | 1008 +++++++++++++++++
11 files changed, 1752 insertions(+), 165 deletions(-)
create mode 100644 Documentation/networking/devlink-health.txt
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter.h
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
--
2.17.1
^ permalink raw reply
* [PATCH v3 net-next 03/11] devlink: Add health report functionality
From: Eran Ben Elisha @ 2019-02-05 12:08 UTC (permalink / raw)
To: netdev, David S. Miller
Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin,
Eran Ben Elisha
In-Reply-To: <1549368532-17089-1-git-send-email-eranbe@mellanox.com>
Upon error discover, every driver can report it to the devlink health
mechanism via devlink_health_report function, using the appropriate
reporter registered to it. Driver can pass error specific context which
will be delivered to it as part of the dump / recovery callbacks.
Once an error is reported, devlink health will do the following actions:
* A log is being send to the kernel trace events buffer
* Health status and statistics are being updated for the reporter instance
* Object dump is being taken and stored at the reporter instance (as long
as there is no other dump which is already stored)
* Auto recovery attempt is being done. Depends on:
- Auto Recovery configuration
- Grace period vs. Time since last recover
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
include/net/devlink.h | 9 +++
include/trace/events/devlink.h | 62 +++++++++++++++++
net/core/devlink.c | 119 +++++++++++++++++++++++++++++++++
3 files changed, 190 insertions(+)
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 3dfe30235878..c12ad6e9095d 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -704,6 +704,8 @@ devlink_health_reporter_destroy(struct devlink_health_reporter *reporter);
void *
devlink_health_reporter_priv(struct devlink_health_reporter *reporter);
+int devlink_health_report(struct devlink_health_reporter *reporter,
+ const char *msg, void *priv_ctx);
#else
@@ -1173,6 +1175,13 @@ devlink_health_reporter_priv(struct devlink_health_reporter *reporter)
{
return NULL;
}
+
+static inline int
+devlink_health_report(struct devlink_health_reporter *reporter,
+ const char *msg, void *priv_ctx)
+{
+ return 0;
+}
#endif
#if IS_REACHABLE(CONFIG_NET_DEVLINK)
diff --git a/include/trace/events/devlink.h b/include/trace/events/devlink.h
index 44acfbca1266..bc09c3408912 100644
--- a/include/trace/events/devlink.h
+++ b/include/trace/events/devlink.h
@@ -46,6 +46,65 @@ TRACE_EVENT(devlink_hwmsg,
(int) __entry->len, __get_dynamic_array(buf), __entry->len)
);
+TRACE_EVENT(devlink_health_report,
+ TP_PROTO(const struct devlink *devlink, const char *reporter_name,
+ const char *msg),
+
+ TP_ARGS(devlink, reporter_name, msg),
+
+ TP_STRUCT__entry(
+ __string(bus_name, devlink->dev->bus->name)
+ __string(dev_name, dev_name(devlink->dev))
+ __string(driver_name, devlink->dev->driver->name)
+ __string(reporter_name, msg)
+ __string(msg, msg)
+ ),
+
+ TP_fast_assign(
+ __assign_str(bus_name, devlink->dev->bus->name);
+ __assign_str(dev_name, dev_name(devlink->dev));
+ __assign_str(driver_name, devlink->dev->driver->name);
+ __assign_str(reporter_name, reporter_name);
+ __assign_str(msg, msg);
+ ),
+
+ TP_printk("bus_name=%s dev_name=%s driver_name=%s reporter_name=%s: %s",
+ __get_str(bus_name), __get_str(dev_name),
+ __get_str(driver_name), __get_str(reporter_name),
+ __get_str(msg))
+);
+
+TRACE_EVENT(devlink_health_recover_aborted,
+ TP_PROTO(const struct devlink *devlink, const char *reporter_name,
+ bool health_state, u64 time_since_last_recover),
+
+ TP_ARGS(devlink, reporter_name, health_state, time_since_last_recover),
+
+ TP_STRUCT__entry(
+ __string(bus_name, devlink->dev->bus->name)
+ __string(dev_name, dev_name(devlink->dev))
+ __string(driver_name, devlink->dev->driver->name)
+ __string(reporter_name, reporter_name)
+ __field(bool, health_state)
+ __field(u64, time_since_last_recover)
+ ),
+
+ TP_fast_assign(
+ __assign_str(bus_name, devlink->dev->bus->name);
+ __assign_str(dev_name, dev_name(devlink->dev));
+ __assign_str(driver_name, devlink->dev->driver->name);
+ __assign_str(reporter_name, reporter_name);
+ __entry->health_state = health_state;
+ __entry->time_since_last_recover = time_since_last_recover;
+ ),
+
+ TP_printk("bus_name=%s dev_name=%s driver_name=%s reporter_name=%s: health_state=%d time_since_last_recover=%llu recover aborted",
+ __get_str(bus_name), __get_str(dev_name),
+ __get_str(driver_name), __get_str(reporter_name),
+ __entry->health_state,
+ __entry->time_since_last_recover)
+);
+
#endif /* _TRACE_DEVLINK_H */
/* This part must be outside protection */
@@ -64,6 +123,9 @@ static inline void trace_devlink_hwmsg(const struct devlink *devlink,
{
}
+static inline void trace_devlink_health(const char *msg)
+{
+}
#endif /* _TRACE_DEVLINK_H */
#endif
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 9192f5e10741..f45eb914597e 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4366,9 +4366,20 @@ struct devlink_health_reporter {
void *priv;
const struct devlink_health_reporter_ops *ops;
struct devlink *devlink;
+ struct devlink_fmsg *dump_fmsg;
+ struct mutex dump_lock; /* lock parallel read/write from dump buffers */
u64 graceful_period;
bool auto_recover;
u8 health_state;
+ u64 dump_ts;
+ u64 error_count;
+ u64 recovery_count;
+ u64 last_recovery_ts;
+};
+
+enum devlink_health_reporter_state {
+ DEVLINK_HEALTH_REPORTER_STATE_HEALTHY,
+ DEVLINK_HEALTH_REPORTER_STATE_ERROR,
};
void *
@@ -4430,6 +4441,7 @@ devlink_health_reporter_create(struct devlink *devlink,
reporter->devlink = devlink;
reporter->graceful_period = graceful_period;
reporter->auto_recover = auto_recover;
+ mutex_init(&reporter->dump_lock);
list_add_tail(&reporter->list, &devlink->reporter_list);
unlock:
mutex_unlock(&devlink->lock);
@@ -4448,10 +4460,117 @@ devlink_health_reporter_destroy(struct devlink_health_reporter *reporter)
mutex_lock(&reporter->devlink->lock);
list_del(&reporter->list);
mutex_unlock(&reporter->devlink->lock);
+ if (reporter->dump_fmsg)
+ devlink_fmsg_free(reporter->dump_fmsg);
kfree(reporter);
}
EXPORT_SYMBOL_GPL(devlink_health_reporter_destroy);
+static int
+devlink_health_reporter_recover(struct devlink_health_reporter *reporter,
+ void *priv_ctx)
+{
+ int err;
+
+ if (!reporter->ops->recover)
+ return -EOPNOTSUPP;
+
+ err = reporter->ops->recover(reporter, priv_ctx);
+ if (err)
+ return err;
+
+ reporter->recovery_count++;
+ reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_HEALTHY;
+ reporter->last_recovery_ts = jiffies;
+
+ return 0;
+}
+
+static void
+devlink_health_dump_clear(struct devlink_health_reporter *reporter)
+{
+ if (!reporter->dump_fmsg)
+ return;
+ devlink_fmsg_free(reporter->dump_fmsg);
+ reporter->dump_fmsg = NULL;
+}
+
+static int devlink_health_do_dump(struct devlink_health_reporter *reporter,
+ void *priv_ctx)
+{
+ int err;
+
+ if (!reporter->ops->dump)
+ return 0;
+
+ if (reporter->dump_fmsg)
+ return 0;
+
+ reporter->dump_fmsg = devlink_fmsg_alloc();
+ if (!reporter->dump_fmsg) {
+ err = -ENOMEM;
+ return err;
+ }
+
+ err = devlink_fmsg_obj_nest_start(reporter->dump_fmsg);
+ if (err)
+ goto dump_err;
+
+ err = reporter->ops->dump(reporter, reporter->dump_fmsg,
+ priv_ctx);
+ if (err)
+ goto dump_err;
+
+ err = devlink_fmsg_obj_nest_end(reporter->dump_fmsg);
+ if (err)
+ goto dump_err;
+
+ reporter->dump_ts = jiffies;
+
+ return 0;
+
+dump_err:
+ devlink_health_dump_clear(reporter);
+ return err;
+}
+
+int devlink_health_report(struct devlink_health_reporter *reporter,
+ const char *msg, void *priv_ctx)
+{
+ struct devlink *devlink = reporter->devlink;
+
+ /* write a log message of the current error */
+ WARN_ON(!msg);
+ trace_devlink_health_report(devlink, reporter->ops->name, msg);
+ reporter->error_count++;
+
+ /* abort if the previous error wasn't recovered */
+ if (reporter->auto_recover &&
+ (reporter->health_state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY ||
+ jiffies - reporter->last_recovery_ts <
+ msecs_to_jiffies(reporter->graceful_period))) {
+ trace_devlink_health_recover_aborted(devlink,
+ reporter->ops->name,
+ reporter->health_state,
+ jiffies -
+ reporter->last_recovery_ts);
+ return -ECANCELED;
+ }
+
+ reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_ERROR;
+
+ mutex_lock(&reporter->dump_lock);
+ /* store current dump of current error, for later analysis */
+ devlink_health_do_dump(reporter, priv_ctx);
+ mutex_unlock(&reporter->dump_lock);
+
+ if (reporter->auto_recover)
+ return devlink_health_reporter_recover(reporter, priv_ctx);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_health_report);
+
static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
--
2.17.1
^ permalink raw reply related
* [PATCH v3 net-next 07/11] devlink: Add health diagnose command
From: Eran Ben Elisha @ 2019-02-05 12:08 UTC (permalink / raw)
To: netdev, David S. Miller
Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin,
Eran Ben Elisha
In-Reply-To: <1549368532-17089-1-git-send-email-eranbe@mellanox.com>
Add devlink health diagnose command, in order to run a diagnose
operation over a specific reporter.
It is expected from driver's callback for diagnose command to fill it
via the devlink fmsg API. Devlink will parse it and convert it to
netlink nla API in order to pass it to the user.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
include/uapi/linux/devlink.h | 1 +
net/core/devlink.c | 46 ++++++++++++++++++++++++++++++++++++
2 files changed, 47 insertions(+)
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index a3a97e6edad8..09be37137841 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -99,6 +99,7 @@ enum devlink_command {
DEVLINK_CMD_HEALTH_REPORTER_GET,
DEVLINK_CMD_HEALTH_REPORTER_SET,
DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
+ DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE,
/* add new commands above here */
__DEVLINK_CMD_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 27593bbc0348..40efb2afd75a 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4751,6 +4751,45 @@ static int devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb,
return devlink_health_reporter_recover(reporter, NULL);
}
+static int devlink_nl_cmd_health_reporter_diagnose_doit(struct sk_buff *skb,
+ struct genl_info *info)
+{
+ struct devlink *devlink = info->user_ptr[0];
+ struct devlink_health_reporter *reporter;
+ struct devlink_fmsg *fmsg;
+ int err;
+
+ reporter = devlink_health_reporter_get_from_info(devlink, info);
+ if (!reporter)
+ return -EINVAL;
+
+ if (!reporter->ops->diagnose)
+ return -EOPNOTSUPP;
+
+ fmsg = devlink_fmsg_alloc();
+ if (!fmsg)
+ return -ENOMEM;
+
+ err = devlink_fmsg_obj_nest_start(fmsg);
+ if (err)
+ goto out;
+
+ err = reporter->ops->diagnose(reporter, fmsg);
+ if (err)
+ goto out;
+
+ err = devlink_fmsg_obj_nest_end(fmsg);
+ if (err)
+ goto out;
+
+ err = devlink_fmsg_snd(fmsg, info,
+ DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE, 0);
+
+out:
+ devlink_fmsg_free(fmsg);
+ return err;
+}
+
static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -5044,6 +5083,13 @@ static const struct genl_ops devlink_nl_ops[] = {
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
+ {
+ .cmd = DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE,
+ .doit = devlink_nl_cmd_health_reporter_diagnose_doit,
+ .policy = devlink_nl_policy,
+ .flags = GENL_ADMIN_PERM,
+ .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+ },
};
static struct genl_family devlink_nl_family __ro_after_init = {
--
2.17.1
^ permalink raw reply related
* [PATCH v3 net-next 06/11] devlink: Add health recover command
From: Eran Ben Elisha @ 2019-02-05 12:08 UTC (permalink / raw)
To: netdev, David S. Miller
Cc: Saeed Mahameed, Jiri Pirko, Moshe Shemesh, Aya Levin,
Eran Ben Elisha
In-Reply-To: <1549368532-17089-1-git-send-email-eranbe@mellanox.com>
Add devlink health recover command to the uapi, in order to allow the user
to execute a recover operation over a specific reporter.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
include/uapi/linux/devlink.h | 1 +
net/core/devlink.c | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index b03065a99884..a3a97e6edad8 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -98,6 +98,7 @@ enum devlink_command {
DEVLINK_CMD_HEALTH_REPORTER_GET,
DEVLINK_CMD_HEALTH_REPORTER_SET,
+ DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
/* add new commands above here */
__DEVLINK_CMD_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index f61aaa1578bb..27593bbc0348 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -4738,6 +4738,19 @@ devlink_nl_cmd_health_reporter_set_doit(struct sk_buff *skb,
return 0;
}
+static int devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb,
+ struct genl_info *info)
+{
+ struct devlink *devlink = info->user_ptr[0];
+ struct devlink_health_reporter *reporter;
+
+ reporter = devlink_health_reporter_get_from_info(devlink, info);
+ if (!reporter)
+ return -EINVAL;
+
+ return devlink_health_reporter_recover(reporter, NULL);
+}
+
static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -5024,6 +5037,13 @@ static const struct genl_ops devlink_nl_ops[] = {
.flags = GENL_ADMIN_PERM,
.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
},
+ {
+ .cmd = DEVLINK_CMD_HEALTH_REPORTER_RECOVER,
+ .doit = devlink_nl_cmd_health_reporter_recover_doit,
+ .policy = devlink_nl_policy,
+ .flags = GENL_ADMIN_PERM,
+ .internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+ },
};
static struct genl_family devlink_nl_family __ro_after_init = {
--
2.17.1
^ permalink raw reply related
* [net-next, PATCH] net: stmmac: fix ptp timestamping on Rx on gmac4
From: Ilias Apalodimas @ 2019-02-05 12:15 UTC (permalink / raw)
To: alexandre.torgue
Cc: peppe.cavallaro, davem, mcoquelin.stm32, netdev, joabreu,
Ilias Apalodimas
The current driver only enables Pdelay_Req and Pdelay_Resp when
HWTSTAMP_FILTER_PTP_V2_EVENT, HWTSTAMP_FILTER_PTP_V1_L4_EVENT or
HWTSTAMP_FILTER_PTP_V2_L4_EVENT is requested. This results in ptp sync on
slave mode to report 'received SYNC without timestamp' when using ptp4l.
Although the hardware can support Sync, Pdelay_Req and Pdelay_resp by
setting bit14 annd bits 17/16 to 01 this leaves Delay_Req timestamps out.
Fix this by enabling all event and general messages timestamps.
This includes SYNC, Follow_Up, Delay_Req, Delay_Resp, Pdelay_Req,
Pdelay_Resp and Pdelay_Resp_Follow_Up messages.
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 25 ++++++++---------------
drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h | 9 ++++++--
2 files changed, 16 insertions(+), 18 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 517555b..effff17 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -597,12 +597,13 @@ static int stmmac_hwtstamp_set(struct net_device *dev, struct ifreq *ifr)
case HWTSTAMP_FILTER_PTP_V1_L4_EVENT:
/* PTP v1, UDP, any kind of event packet */
config.rx_filter = HWTSTAMP_FILTER_PTP_V1_L4_EVENT;
- /* take time stamp for all event messages */
- if (xmac)
- snap_type_sel = PTP_GMAC4_TCR_SNAPTYPSEL_1;
- else
- snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
-
+ /* 'xmac' hardware can support Sync, Pdelay_Req and
+ * Pdelay_resp by setting bit14 and bits17/16 to 01
+ * This leaves Delay_Req timestamps out.
+ * Enable all events *and* general purpose message
+ * timestamping
+ */
+ snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
ptp_over_ipv4_udp = PTP_TCR_TSIPV4ENA;
ptp_over_ipv6_udp = PTP_TCR_TSIPV6ENA;
break;
@@ -633,10 +634,7 @@ static int stmmac_hwtstamp_set(struct net_device *dev, struct ifreq *ifr)
config.rx_filter = HWTSTAMP_FILTER_PTP_V2_L4_EVENT;
ptp_v2 = PTP_TCR_TSVER2ENA;
/* take time stamp for all event messages */
- if (xmac)
- snap_type_sel = PTP_GMAC4_TCR_SNAPTYPSEL_1;
- else
- snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
+ snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
ptp_over_ipv4_udp = PTP_TCR_TSIPV4ENA;
ptp_over_ipv6_udp = PTP_TCR_TSIPV6ENA;
@@ -669,12 +667,7 @@ static int stmmac_hwtstamp_set(struct net_device *dev, struct ifreq *ifr)
/* PTP v2/802.AS1 any layer, any kind of event packet */
config.rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT;
ptp_v2 = PTP_TCR_TSVER2ENA;
- /* take time stamp for all event messages */
- if (xmac)
- snap_type_sel = PTP_GMAC4_TCR_SNAPTYPSEL_1;
- else
- snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
-
+ snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
ptp_over_ipv4_udp = PTP_TCR_TSIPV4ENA;
ptp_over_ipv6_udp = PTP_TCR_TSIPV6ENA;
ptp_over_ethernet = PTP_TCR_TSIPENA;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h
index ecccf89..e852821 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h
@@ -59,9 +59,14 @@
#define PTP_TCR_TSEVNTENA BIT(14)
/* Enable Snapshot for Messages Relevant to Master */
#define PTP_TCR_TSMSTRENA BIT(15)
-/* Select PTP packets for Taking Snapshots */
+/* Select PTP packets for Taking Snapshots
+ * On gmac4 specifically:
+ * Enable SYNC, Pdelay_Req, Pdelay_Resp when TSEVNTENA is enabled.
+ * or
+ * Enable SYNC, Follow_Up, Delay_Req, Delay_Resp, Pdelay_Req, Pdelay_Resp,
+ * Pdelay_Resp_Follow_Up if TSEVNTENA is disabled
+ */
#define PTP_TCR_SNAPTYPSEL_1 BIT(16)
-#define PTP_GMAC4_TCR_SNAPTYPSEL_1 GENMASK(17, 16)
/* Enable MAC address for PTP Frame Filtering */
#define PTP_TCR_TSENMACADDR BIT(18)
--
2.7.4
^ permalink raw reply related
* RE: [PATCH v2] arm64: dts: lx2160aqds: Add mdio mux nodes
From: Pankaj Bansal @ 2019-02-05 12:23 UTC (permalink / raw)
To: Leo Li, shawnguo@kernel.org
Cc: andrew@lunn.ch, f.fainelli@gmail.com, netdev@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, David Miller
In-Reply-To: <20190204.085056.439064547458115258.davem@davemloft.net>
Hi Shawn/Leo,
If you have no more comments, can you please merge this path in your branch?
in same branch in which you have accepted LX2160AQDS board patches.
Regards,
Pankaj Bansal
> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Monday, 4 February, 2019 10:21 PM
> To: Pankaj Bansal <pankaj.bansal@nxp.com>
> Cc: shawnguo@kernel.org; Leo Li <leoyang.li@nxp.com>; andrew@lunn.ch;
> f.fainelli@gmail.com; netdev@vger.kernel.org; linux-arm-
> kernel@lists.infradead.org
> Subject: Re: [PATCH v2] arm64: dts: lx2160aqds: Add mdio mux nodes
>
> From: Pankaj Bansal <pankaj.bansal@nxp.com>
> Date: Mon, 4 Feb 2019 08:51:57 +0000
>
> > The two external MDIO buses used to communicate with phy devices that
> > are external to SOC are muxed in LX2160AQDS board.
> >
> > These buses can be routed to any one of the eight IO slots on
> > LX2160AQDS board depending on value in fpga register 0x54.
> >
> > Additionally the external MDIO1 is used to communicate to the onboard
> > RGMII phy devices.
> >
> > The mdio1 is controlled by bits 4-7 of fpga register and mdio2 is
> > controlled by bits 0-3 of fpga register.
> >
> > Signed-off-by: Pankaj Bansal <pankaj.bansal@nxp.com>
>
> Am I applying this to my networking tree or are the ARM folks taking this?
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox