devicetree.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v13 0/3] add hisilicon hip04 ethernet driver
@ 2015-01-14  6:34 Ding Tianhong
  2015-01-14  6:34 ` [PATCH net-next v13 1/3] Documentation: add Device tree bindings for Hisilicon hip04 ethernet Ding Tianhong
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Ding Tianhong @ 2015-01-14  6:34 UTC (permalink / raw)
  To: arnd, robh+dt, davem, grant.likely, agraf
  Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
	zhangfei.gao, netdev, devicetree, linux

v13:
- Fix the problem of alignment parameters for function and checkpatch warming.

v12:
- According Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
  for hip04 ethernet.

v11:
- Add ethtool support for tx coalecse getting and setting, the xmit_more
  is not supported for this patch, but I think it could work for hip04,
  will support it later after some tests for performance better.

  Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
  it looks that the performance and latency is more better by tx_coalesce_frames/usecs.

  - Before:
    $ ping 192.168.1.1 ...
    === 192.168.1.1 ping statistics ===
    24 packets transmitted, 24 received, 0% packet loss, time 22999ms
    rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms

    $ iperf -c 192.168.1.1 ...
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0- 1.0 sec   115 MBytes   945 Mbits/sec

  - After:
    $ ping 192.168.1.1 ...
    === 192.168.1.1 ping statistics ===
    24 packets transmitted, 24 received, 0% packet loss, time 22999ms
    rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms

    $ iperf -c 192.168.1.1 ...
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0- 1.0 sec   115 MBytes   965 Mbits/sec

v10:
- According Arnd's suggestion, remove the skb_orphan and use the hrtimer
  for the cleanup of the TX queue and add some modification for the hip04
  drivers.
  1) drop the broken skb_orphan call
  2) drop the workqueue
  3) batch cleanup based on tx_coalesce_frames/usecs for better throughput
  4) use a reasonable default tx timeout (200us, could be shorted
     based on measurements) with a range timer
  5) fix napi poll function return value
  6) use a lockless queue for cleanup

v9:
- There is no tx completion interrupts to free DMAd Tx packets, it means taht
  we rely on new tx packets arriving to run the destructors of completed packets,
  which open up space in their sockets's send queues. Sometimes we don't get such
  new packets causing Tx to stall, a single UDP transmitter is a good example of
  this situation, so we need a clean up workqueue to reclaims completed packets,
  the workqueue will only free the last packets which is already stay for several jiffies.
  Also fix some format cleanups.

v8:
- Use poll to reclaim xmitted buffer as workaround since no tx done interrupt 

v7:
- Remove select NET_CORE in 0002

v6:
- Suggest by Russell: Use netdev_sent_queue & netdev_completed_queue to solve latency issue 
  Also shorten the period of timer, which is used to wakeup the queue since no
  tx completed interrupt.

v5:
- no big change, fix typo

v4:
- Modify accoringly to the suggetion from Arnd, Florian, Eric, David
  Use of_parse_phandle_with_fixed_args & syscon_node_to_regmap get ppe info
  Add skb_orphan() and tx_timer for reclaim since no tx_finished interrupt
  Update timeout, and move of_phy_connect to probe to reuse open/stop

v3:
- Suggest from Arnd, use syscon & regmap_write/read to replace static void __iomem *ppebase.
  Modify hisilicon-hip04-net.txt accrordingly to suggestion from Florian and Sergei.

v2:
- Got many suggestions from Russell, Arnd, Florian, Mark and Sergei
  Remove memcpy, use dma_map/unmap_single, use dma_alloc_coherent rather than dma_pool, etc.
  Refer property in ethernet.txt, change ppe description, etc.

Ding Tianhong (1):
  net: hisilicon: new hip04 ethernet driver

Zhangfei Gao (2):
  Documentation: add Device tree bindings for Hisilicon hip04 ethernet
  net: hisilicon: new hip04 MDIO driver

 .../bindings/net/hisilicon-hip04-net.txt           |  88 ++
 drivers/net/ethernet/hisilicon/Kconfig             |   9 +
 drivers/net/ethernet/hisilicon/Makefile            |   1 +
 drivers/net/ethernet/hisilicon/hip04_eth.c         | 969 +++++++++++++++++++++
 drivers/net/ethernet/hisilicon/hip04_mdio.c        | 186 ++++
 5 files changed, 1253 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_mdio.c

-- 
1.8.0

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH net-next v13 1/3] Documentation: add Device tree bindings for Hisilicon hip04 ethernet
  2015-01-14  6:34 [PATCH net-next v13 0/3] add hisilicon hip04 ethernet driver Ding Tianhong
@ 2015-01-14  6:34 ` Ding Tianhong
  2015-01-14  6:34 ` [PATCH net-next v13 2/3] net: hisilicon: new hip04 MDIO driver Ding Tianhong
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 25+ messages in thread
From: Ding Tianhong @ 2015-01-14  6:34 UTC (permalink / raw)
  To: arnd, robh+dt, davem, grant.likely, agraf
  Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
	zhangfei.gao, netdev, devicetree, linux

From: Zhangfei Gao <zhangfei.gao@linaro.org>

This patch adds the Device Tree bindings for the Hisilicon hip04
Ethernet controller, including 100M / 1000M controller.

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 .../bindings/net/hisilicon-hip04-net.txt           | 88 ++++++++++++++++++++++
 1 file changed, 88 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt

diff --git a/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
new file mode 100644
index 0000000..988fc69
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
@@ -0,0 +1,88 @@
+Hisilicon hip04 Ethernet Controller
+
+* Ethernet controller node
+
+Required properties:
+- compatible: should be "hisilicon,hip04-mac".
+- reg: address and length of the register set for the device.
+- interrupts: interrupt for the device.
+- port-handle: <phandle port channel>
+	phandle, specifies a reference to the syscon ppe node
+	port, port number connected to the controller
+	channel, recv channel start from channel * number (RX_DESC_NUM)
+- phy-mode: see ethernet.txt [1].
+
+Optional properties:
+- phy-handle: see ethernet.txt [1].
+
+[1] Documentation/devicetree/bindings/net/ethernet.txt
+
+
+* Ethernet ppe node:
+Control rx & tx fifos of all ethernet controllers.
+Have 2048 recv channels shared by all ethernet controllers, only if no overlap.
+Each controller's recv channel start from channel * number (RX_DESC_NUM).
+
+Required properties:
+- compatible: "hisilicon,hip04-ppe", "syscon".
+- reg: address and length of the register set for the device.
+
+
+* MDIO bus node:
+
+Required properties:
+
+- compatible: should be "hisilicon,hip04-mdio".
+- Inherits from MDIO bus node binding [2]
+[2] Documentation/devicetree/bindings/net/phy.txt
+
+Example:
+	mdio {
+		compatible = "hisilicon,hip04-mdio";
+		reg = <0x28f1000 0x1000>;
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		phy0: ethernet-phy@0 {
+			compatible = "ethernet-phy-ieee802.3-c22";
+			reg = <0>;
+			marvell,reg-init = <18 0x14 0 0x8001>;
+		};
+
+		phy1: ethernet-phy@1 {
+			compatible = "ethernet-phy-ieee802.3-c22";
+			reg = <1>;
+			marvell,reg-init = <18 0x14 0 0x8001>;
+		};
+	};
+
+	ppe: ppe@28c0000 {
+		compatible = "hisilicon,hip04-ppe", "syscon";
+		reg = <0x28c0000 0x10000>;
+	};
+
+	fe: ethernet@28b0000 {
+		compatible = "hisilicon,hip04-mac";
+		reg = <0x28b0000 0x10000>;
+		interrupts = <0 413 4>;
+		phy-mode = "mii";
+		port-handle = <&ppe 31 0>;
+	};
+
+	ge0: ethernet@2800000 {
+		compatible = "hisilicon,hip04-mac";
+		reg = <0x2800000 0x10000>;
+		interrupts = <0 402 4>;
+		phy-mode = "sgmii";
+		port-handle = <&ppe 0 1>;
+		phy-handle = <&phy0>;
+	};
+
+	ge8: ethernet@2880000 {
+		compatible = "hisilicon,hip04-mac";
+		reg = <0x2880000 0x10000>;
+		interrupts = <0 410 4>;
+		phy-mode = "sgmii";
+		port-handle = <&ppe 8 2>;
+		phy-handle = <&phy1>;
+	};
-- 
1.8.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net-next v13 2/3] net: hisilicon: new hip04 MDIO driver
  2015-01-14  6:34 [PATCH net-next v13 0/3] add hisilicon hip04 ethernet driver Ding Tianhong
  2015-01-14  6:34 ` [PATCH net-next v13 1/3] Documentation: add Device tree bindings for Hisilicon hip04 ethernet Ding Tianhong
@ 2015-01-14  6:34 ` Ding Tianhong
  2015-01-14  6:34 ` [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver Ding Tianhong
  2015-01-14 10:19 ` [PATCH net-next v13 0/3] add hisilicon " Alexander Graf
  3 siblings, 0 replies; 25+ messages in thread
From: Ding Tianhong @ 2015-01-14  6:34 UTC (permalink / raw)
  To: arnd, robh+dt, davem, grant.likely, agraf
  Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
	zhangfei.gao, netdev, devicetree, linux

From: Zhangfei Gao <zhangfei.gao@linaro.org>

Hisilicon hip04 platform mdio driver
Reuse Marvell phy drivers/net/phy/marvell.c

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/Kconfig      |   9 ++
 drivers/net/ethernet/hisilicon/Makefile     |   1 +
 drivers/net/ethernet/hisilicon/hip04_mdio.c | 186 ++++++++++++++++++++++++++++
 3 files changed, 196 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_mdio.c

diff --git a/drivers/net/ethernet/hisilicon/Kconfig b/drivers/net/ethernet/hisilicon/Kconfig
index e942173..a54d897 100644
--- a/drivers/net/ethernet/hisilicon/Kconfig
+++ b/drivers/net/ethernet/hisilicon/Kconfig
@@ -24,4 +24,13 @@ config HIX5HD2_GMAC
 	help
 	  This selects the hix5hd2 mac family network device.
 
+config HIP04_ETH
+	tristate "HISILICON P04 Ethernet support"
+	select PHYLIB
+	select MARVELL_PHY
+	select MFD_SYSCON
+	---help---
+	  If you wish to compile a kernel for a hardware with hisilicon p04 SoC and
+	  want to use the internal ethernet then you should answer Y to this.
+
 endif # NET_VENDOR_HISILICON
diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 9175e846..40115a7 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -3,3 +3,4 @@
 #
 
 obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o
+obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_mdio.c b/drivers/net/ethernet/hisilicon/hip04_mdio.c
new file mode 100644
index 0000000..b3bac25
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_mdio.c
@@ -0,0 +1,186 @@
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/io.h>
+#include <linux/of_mdio.h>
+#include <linux/delay.h>
+
+#define MDIO_CMD_REG		0x0
+#define MDIO_ADDR_REG		0x4
+#define MDIO_WDATA_REG		0x8
+#define MDIO_RDATA_REG		0xc
+#define MDIO_STA_REG		0x10
+
+#define MDIO_START		BIT(14)
+#define MDIO_R_VALID		BIT(1)
+#define MDIO_READ	        (BIT(12) | BIT(11) | MDIO_START)
+#define MDIO_WRITE	        (BIT(12) | BIT(10) | MDIO_START)
+
+struct hip04_mdio_priv {
+	void __iomem *base;
+};
+
+#define WAIT_TIMEOUT 10
+static int hip04_mdio_wait_ready(struct mii_bus *bus)
+{
+	struct hip04_mdio_priv *priv = bus->priv;
+	int i;
+
+	for (i = 0; readl_relaxed(priv->base + MDIO_CMD_REG) & MDIO_START; i++) {
+		if (i == WAIT_TIMEOUT)
+			return -ETIMEDOUT;
+		msleep(20);
+	}
+
+	return 0;
+}
+
+static int hip04_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
+{
+	struct hip04_mdio_priv *priv = bus->priv;
+	u32 val;
+	int ret;
+
+	ret = hip04_mdio_wait_ready(bus);
+	if (ret < 0)
+		goto out;
+
+	val = regnum | (mii_id << 5) | MDIO_READ;
+	writel_relaxed(val, priv->base + MDIO_CMD_REG);
+
+	ret = hip04_mdio_wait_ready(bus);
+	if (ret < 0)
+		goto out;
+
+	val = readl_relaxed(priv->base + MDIO_STA_REG);
+	if (val & MDIO_R_VALID) {
+		dev_err(bus->parent, "SMI bus read not valid\n");
+		ret = -ENODEV;
+		goto out;
+	}
+
+	val = readl_relaxed(priv->base + MDIO_RDATA_REG);
+	ret = val & 0xFFFF;
+out:
+	return ret;
+}
+
+static int hip04_mdio_write(struct mii_bus *bus, int mii_id,
+			    int regnum, u16 value)
+{
+	struct hip04_mdio_priv *priv = bus->priv;
+	u32 val;
+	int ret;
+
+	ret = hip04_mdio_wait_ready(bus);
+	if (ret < 0)
+		goto out;
+
+	writel_relaxed(value, priv->base + MDIO_WDATA_REG);
+	val = regnum | (mii_id << 5) | MDIO_WRITE;
+	writel_relaxed(val, priv->base + MDIO_CMD_REG);
+out:
+	return ret;
+}
+
+static int hip04_mdio_reset(struct mii_bus *bus)
+{
+	int temp, i;
+
+	for (i = 0; i < PHY_MAX_ADDR; i++) {
+		hip04_mdio_write(bus, i, 22, 0);
+		temp = hip04_mdio_read(bus, i, MII_BMCR);
+		if (temp < 0)
+			continue;
+
+		temp |= BMCR_RESET;
+		if (hip04_mdio_write(bus, i, MII_BMCR, temp) < 0)
+			continue;
+	}
+
+	mdelay(500);
+	return 0;
+}
+
+static int hip04_mdio_probe(struct platform_device *pdev)
+{
+	struct resource *r;
+	struct mii_bus *bus;
+	struct hip04_mdio_priv *priv;
+	int ret;
+
+	bus = mdiobus_alloc_size(sizeof(struct hip04_mdio_priv));
+	if (!bus) {
+		dev_err(&pdev->dev, "Cannot allocate MDIO bus\n");
+		return -ENOMEM;
+	}
+
+	bus->name = "hip04_mdio_bus";
+	bus->read = hip04_mdio_read;
+	bus->write = hip04_mdio_write;
+	bus->reset = hip04_mdio_reset;
+	snprintf(bus->id, MII_BUS_ID_SIZE, "%s-mii", dev_name(&pdev->dev));
+	bus->parent = &pdev->dev;
+	priv = bus->priv;
+
+	r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	priv->base = devm_ioremap_resource(&pdev->dev, r);
+	if (IS_ERR(priv->base)) {
+		ret = PTR_ERR(priv->base);
+		goto out_mdio;
+	}
+
+	ret = of_mdiobus_register(bus, pdev->dev.of_node);
+	if (ret < 0) {
+		dev_err(&pdev->dev, "Cannot register MDIO bus (%d)\n", ret);
+		goto out_mdio;
+	}
+
+	platform_set_drvdata(pdev, bus);
+
+	return 0;
+
+out_mdio:
+	mdiobus_free(bus);
+	return ret;
+}
+
+static int hip04_mdio_remove(struct platform_device *pdev)
+{
+	struct mii_bus *bus = platform_get_drvdata(pdev);
+
+	mdiobus_unregister(bus);
+	mdiobus_free(bus);
+
+	return 0;
+}
+
+static const struct of_device_id hip04_mdio_match[] = {
+	{ .compatible = "hisilicon,hip04-mdio" },
+	{ }
+};
+MODULE_DEVICE_TABLE(of, hip04_mdio_match);
+
+static struct platform_driver hip04_mdio_driver = {
+	.probe = hip04_mdio_probe,
+	.remove = hip04_mdio_remove,
+	.driver = {
+		.name = "hip04-mdio",
+		.owner = THIS_MODULE,
+		.of_match_table = hip04_mdio_match,
+	},
+};
+
+module_platform_driver(hip04_mdio_driver);
+
+MODULE_DESCRIPTION("HISILICON P04 MDIO interface driver");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:hip04-mdio");
-- 
1.8.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
  2015-01-14  6:34 [PATCH net-next v13 0/3] add hisilicon hip04 ethernet driver Ding Tianhong
  2015-01-14  6:34 ` [PATCH net-next v13 1/3] Documentation: add Device tree bindings for Hisilicon hip04 ethernet Ding Tianhong
  2015-01-14  6:34 ` [PATCH net-next v13 2/3] net: hisilicon: new hip04 MDIO driver Ding Tianhong
@ 2015-01-14  6:34 ` Ding Tianhong
  2015-01-14  8:06   ` Joe Perches
                     ` (5 more replies)
  2015-01-14 10:19 ` [PATCH net-next v13 0/3] add hisilicon " Alexander Graf
  3 siblings, 6 replies; 25+ messages in thread
From: Ding Tianhong @ 2015-01-14  6:34 UTC (permalink / raw)
  To: arnd, robh+dt, davem, grant.likely, agraf
  Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
	zhangfei.gao, netdev, devicetree, linux

Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
The controller has no tx done interrupt, reclaim xmitted buffer in the poll.

v13: Fix the problem of alignment parameters for function and checkpatch warming.

v12: According Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
     for hip04 ethernet.

v11: Add ethtool support for tx coalecse getting and setting, the xmit_more
     is not supported for this patch, but I think it could work for hip04,
     will support it later after some tests for performance better.

     Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
     it looks that the performance and latency is more better by tx_coalesce_frames/usecs.

     - Before:
     $ ping 192.168.1.1 ...
     === 192.168.1.1 ping statistics ===
     24 packets transmitted, 24 received, 0% packet loss, time 22999ms
     rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms

     $ iperf -c 192.168.1.1 ...
     [ ID] Interval       Transfer     Bandwidth
     [  3]  0.0- 1.0 sec   115 MBytes   945 Mbits/sec

     - After:
     $ ping 192.168.1.1 ...
     === 192.168.1.1 ping statistics ===
     24 packets transmitted, 24 received, 0% packet loss, time 22999ms
     rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms

     $ iperf -c 192.168.1.1 ...
     [ ID] Interval       Transfer     Bandwidth
     [  3]  0.0- 1.0 sec   115 MBytes   965 Mbits/sec

v10: According David Miller and Arnd Bergmann's suggestion, add some modification
     for v9 version
     - drop the workqueue
     - batch cleanup based on tx_coalesce_frames/usecs for better throughput
     - use a reasonable default tx timeout (200us, could be shorted
       based on measurements) with a range timer
     - fix napi poll function return value
     - use a lockless queue for cleanup

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/Makefile    |   2 +-
 drivers/net/ethernet/hisilicon/hip04_eth.c | 969 +++++++++++++++++++++++++++++
 2 files changed, 970 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c

diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 40115a7..6c14540 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -3,4 +3,4 @@
 #
 
 obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o
-obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
+obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
new file mode 100644
index 0000000..525214e
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -0,0 +1,969 @@
+
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/etherdevice.h>
+#include <linux/platform_device.h>
+#include <linux/interrupt.h>
+#include <linux/ktime.h>
+#include <linux/of_address.h>
+#include <linux/phy.h>
+#include <linux/of_mdio.h>
+#include <linux/of_net.h>
+#include <linux/mfd/syscon.h>
+#include <linux/regmap.h>
+
+#define PPE_CFG_RX_ADDR			0x100
+#define PPE_CFG_POOL_GRP		0x300
+#define PPE_CFG_RX_BUF_SIZE		0x400
+#define PPE_CFG_RX_FIFO_SIZE		0x500
+#define PPE_CURR_BUF_CNT		0xa200
+
+#define GE_DUPLEX_TYPE			0x08
+#define GE_MAX_FRM_SIZE_REG		0x3c
+#define GE_PORT_MODE			0x40
+#define GE_PORT_EN			0x44
+#define GE_SHORT_RUNTS_THR_REG		0x50
+#define GE_TX_LOCAL_PAGE_REG		0x5c
+#define GE_TRANSMIT_CONTROL_REG		0x60
+#define GE_CF_CRC_STRIP_REG		0x1b0
+#define GE_MODE_CHANGE_REG		0x1b4
+#define GE_RECV_CONTROL_REG		0x1e0
+#define GE_STATION_MAC_ADDRESS		0x210
+#define PPE_CFG_CPU_ADD_ADDR		0x580
+#define PPE_CFG_MAX_FRAME_LEN_REG	0x408
+#define PPE_CFG_BUS_CTRL_REG		0x424
+#define PPE_CFG_RX_CTRL_REG		0x428
+#define PPE_CFG_RX_PKT_MODE_REG		0x438
+#define PPE_CFG_QOS_VMID_GEN		0x500
+#define PPE_CFG_RX_PKT_INT		0x538
+#define PPE_INTEN			0x600
+#define PPE_INTSTS			0x608
+#define PPE_RINT			0x604
+#define PPE_CFG_STS_MODE		0x700
+#define PPE_HIS_RX_PKT_CNT		0x804
+
+/* REG_INTERRUPT */
+#define RCV_INT				BIT(10)
+#define RCV_NOBUF			BIT(8)
+#define RCV_DROP			BIT(7)
+#define TX_DROP				BIT(6)
+#define DEF_INT_ERR			(RCV_NOBUF | RCV_DROP | TX_DROP)
+#define DEF_INT_MASK			(RCV_INT | DEF_INT_ERR)
+
+/* TX descriptor config */
+#define TX_FREE_MEM			BIT(0)
+#define TX_READ_ALLOC_L3		BIT(1)
+#define TX_FINISH_CACHE_INV		BIT(2)
+#define TX_CLEAR_WB			BIT(4)
+#define TX_L3_CHECKSUM			BIT(5)
+#define TX_LOOP_BACK			BIT(11)
+
+/* RX error */
+#define RX_PKT_DROP			BIT(0)
+#define RX_L2_ERR			BIT(1)
+#define RX_PKT_ERR			(RX_PKT_DROP | RX_L2_ERR)
+
+#define SGMII_SPEED_1000		0x08
+#define SGMII_SPEED_100			0x07
+#define SGMII_SPEED_10			0x06
+#define MII_SPEED_100			0x01
+#define MII_SPEED_10			0x00
+
+#define GE_DUPLEX_FULL			BIT(0)
+#define GE_DUPLEX_HALF			0x00
+#define GE_MODE_CHANGE_EN		BIT(0)
+
+#define GE_TX_AUTO_NEG			BIT(5)
+#define GE_TX_ADD_CRC			BIT(6)
+#define GE_TX_SHORT_PAD_THROUGH		BIT(7)
+
+#define GE_RX_STRIP_CRC			BIT(0)
+#define GE_RX_STRIP_PAD			BIT(3)
+#define GE_RX_PAD_EN			BIT(4)
+
+#define GE_AUTO_NEG_CTL			BIT(0)
+
+#define GE_RX_INT_THRESHOLD		BIT(6)
+#define GE_RX_TIMEOUT			0x04
+
+#define GE_RX_PORT_EN			BIT(1)
+#define GE_TX_PORT_EN			BIT(2)
+
+#define PPE_CFG_STS_RX_PKT_CNT_RC	BIT(12)
+
+#define PPE_CFG_RX_PKT_ALIGN		BIT(18)
+#define PPE_CFG_QOS_VMID_MODE		BIT(14)
+#define PPE_CFG_QOS_VMID_GRP_SHIFT	8
+
+#define PPE_CFG_RX_FIFO_FSFU		BIT(11)
+#define PPE_CFG_RX_DEPTH_SHIFT		16
+#define PPE_CFG_RX_START_SHIFT		0
+#define PPE_CFG_RX_CTRL_ALIGN_SHIFT	11
+
+#define PPE_CFG_BUS_LOCAL_REL		BIT(14)
+#define PPE_CFG_BUS_BIG_ENDIEN		BIT(0)
+
+#define RX_DESC_NUM			128
+#define TX_DESC_NUM			256
+#define TX_NEXT(N)			(((N) + 1) & (TX_DESC_NUM-1))
+#define RX_NEXT(N)			(((N) + 1) & (RX_DESC_NUM-1))
+
+#define GMAC_PPE_RX_PKT_MAX_LEN		379
+#define GMAC_MAX_PKT_LEN		1516
+#define GMAC_MIN_PKT_LEN		31
+#define RX_BUF_SIZE			1600
+#define RESET_TIMEOUT			1000
+#define TX_TIMEOUT			(6 * HZ)
+
+#define DRV_NAME			"hip04-ether"
+#define DRV_VERSION			"v1.0"
+
+#define HIP04_MAX_TX_COALESCE_USECS	200
+#define HIP04_MIN_TX_COALESCE_USECS	100
+#define HIP04_MAX_TX_COALESCE_FRAMES	200
+#define HIP04_MIN_TX_COALESCE_FRAMES	100
+
+struct tx_desc {
+	u32 send_addr;
+	u32 send_size;
+	u32 next_addr;
+	u32 cfg;
+	u32 wb_addr;
+} __aligned(64);
+
+struct rx_desc {
+	u16 reserved_16;
+	u16 pkt_len;
+	u32 reserve1[3];
+	u32 pkt_err;
+	u32 reserve2[4];
+};
+
+struct hip04_priv {
+	void __iomem *base;
+	int phy_mode;
+	int chan;
+	unsigned int port;
+	unsigned int speed;
+	unsigned int duplex;
+	unsigned int reg_inten;
+
+	struct napi_struct napi;
+	struct net_device *ndev;
+
+	struct tx_desc *tx_desc;
+	dma_addr_t tx_desc_dma;
+	struct sk_buff *tx_skb[TX_DESC_NUM];
+	dma_addr_t tx_phys[TX_DESC_NUM];
+	unsigned int tx_head;
+
+	int tx_coalesce_frames;
+	int tx_coalesce_usecs;
+	struct hrtimer tx_coalesce_timer;
+
+	unsigned char *rx_buf[RX_DESC_NUM];
+	dma_addr_t rx_phys[RX_DESC_NUM];
+	unsigned int rx_head;
+	unsigned int rx_buf_size;
+
+	struct device_node *phy_node;
+	struct phy_device *phy;
+	struct regmap *map;
+	struct work_struct tx_timeout_task;
+
+	/* written only by tx cleanup */
+	unsigned int tx_tail ____cacheline_aligned_in_smp;
+};
+
+static inline unsigned int tx_count(unsigned int head, unsigned int tail)
+{
+	return (head - tail) % (TX_DESC_NUM - 1);
+}
+
+static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	priv->speed = speed;
+	priv->duplex = duplex;
+
+	switch (priv->phy_mode) {
+	case PHY_INTERFACE_MODE_SGMII:
+		if (speed == SPEED_1000)
+			val = SGMII_SPEED_1000;
+		else if (speed == SPEED_100)
+			val = SGMII_SPEED_100;
+		else
+			val = SGMII_SPEED_10;
+		break;
+	case PHY_INTERFACE_MODE_MII:
+		if (speed == SPEED_100)
+			val = MII_SPEED_100;
+		else
+			val = MII_SPEED_10;
+		break;
+	default:
+		netdev_warn(ndev, "not supported mode\n");
+		val = MII_SPEED_10;
+		break;
+	}
+	writel_relaxed(val, priv->base + GE_PORT_MODE);
+
+	val = duplex ? GE_DUPLEX_FULL : GE_DUPLEX_HALF;
+	writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
+
+	val = GE_MODE_CHANGE_EN;
+	writel_relaxed(val, priv->base + GE_MODE_CHANGE_REG);
+}
+
+static void hip04_reset_ppe(struct hip04_priv *priv)
+{
+	u32 val, tmp, timeout = 0;
+
+	do {
+		regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
+		regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
+		if (timeout++ > RESET_TIMEOUT)
+			break;
+	} while (val & 0xfff);
+}
+
+static void hip04_config_fifo(struct hip04_priv *priv)
+{
+	u32 val;
+
+	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
+	val |= PPE_CFG_STS_RX_PKT_CNT_RC;
+	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
+
+	val = BIT(priv->port);
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val);
+
+	val = priv->port << PPE_CFG_QOS_VMID_GRP_SHIFT;
+	val |= PPE_CFG_QOS_VMID_MODE;
+	writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
+
+	val = RX_BUF_SIZE;
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val);
+
+	val = RX_DESC_NUM << PPE_CFG_RX_DEPTH_SHIFT;
+	val |= PPE_CFG_RX_FIFO_FSFU;
+	val |= priv->chan << PPE_CFG_RX_START_SHIFT;
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_FIFO_SIZE, val);
+
+	val = NET_IP_ALIGN << PPE_CFG_RX_CTRL_ALIGN_SHIFT;
+	writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
+
+	val = PPE_CFG_RX_PKT_ALIGN;
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
+
+	val = PPE_CFG_BUS_LOCAL_REL | PPE_CFG_BUS_BIG_ENDIEN;
+	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
+
+	val = GMAC_PPE_RX_PKT_MAX_LEN;
+	writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
+
+	val = GMAC_MAX_PKT_LEN;
+	writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
+
+	val = GMAC_MIN_PKT_LEN;
+	writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
+
+	val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
+	val |= GE_TX_AUTO_NEG | GE_TX_ADD_CRC | GE_TX_SHORT_PAD_THROUGH;
+	writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
+
+	val = GE_RX_STRIP_CRC;
+	writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
+
+	val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
+	val |= GE_RX_STRIP_PAD | GE_RX_PAD_EN;
+	writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
+
+	val = GE_AUTO_NEG_CTL;
+	writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
+}
+
+static void hip04_mac_enable(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* enable tx & rx */
+	val = readl_relaxed(priv->base + GE_PORT_EN);
+	val |= GE_RX_PORT_EN | GE_TX_PORT_EN;
+	writel_relaxed(val, priv->base + GE_PORT_EN);
+
+	/* clear rx int */
+	val = RCV_INT;
+	writel_relaxed(val, priv->base + PPE_RINT);
+
+	/* config recv int */
+	val = GE_RX_INT_THRESHOLD | GE_RX_TIMEOUT;
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
+
+	/* enable interrupt */
+	priv->reg_inten = DEF_INT_MASK;
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+}
+
+static void hip04_mac_disable(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* disable int */
+	priv->reg_inten &= ~(DEF_INT_MASK);
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+	/* disable tx & rx */
+	val = readl_relaxed(priv->base + GE_PORT_EN);
+	val &= ~(GE_RX_PORT_EN | GE_TX_PORT_EN);
+	writel_relaxed(val, priv->base + GE_PORT_EN);
+}
+
+static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	writel(phys, priv->base + PPE_CFG_CPU_ADD_ADDR);
+}
+
+static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys);
+}
+
+static u32 hip04_recv_cnt(struct hip04_priv *priv)
+{
+	return readl(priv->base + PPE_HIS_RX_PKT_CNT);
+}
+
+static void hip04_update_mac_address(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
+		       priv->base + GE_STATION_MAC_ADDRESS);
+	writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
+			(ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
+		       priv->base + GE_STATION_MAC_ADDRESS + 4);
+}
+
+static int hip04_set_mac_address(struct net_device *ndev, void *addr)
+{
+	eth_mac_addr(ndev, addr);
+	hip04_update_mac_address(ndev);
+	return 0;
+}
+
+static int hip04_tx_reclaim(struct net_device *ndev, bool force)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	unsigned tx_tail = priv->tx_tail;
+	struct tx_desc *desc;
+	unsigned int bytes_compl = 0, pkts_compl = 0;
+	unsigned int count;
+
+	smp_rmb();
+	count = tx_count(ACCESS_ONCE(priv->tx_head), tx_tail);
+	if (count == 0)
+		goto out;
+
+	while (count) {
+		desc = &priv->tx_desc[tx_tail];
+		if (desc->send_addr != 0) {
+			if (force)
+				desc->send_addr = 0;
+			else
+				break;
+		}
+
+		if (priv->tx_phys[tx_tail]) {
+			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
+					 priv->tx_skb[tx_tail]->len,
+					 DMA_TO_DEVICE);
+			priv->tx_phys[tx_tail] = 0;
+		}
+		pkts_compl++;
+		bytes_compl += priv->tx_skb[tx_tail]->len;
+		dev_kfree_skb(priv->tx_skb[tx_tail]);
+		priv->tx_skb[tx_tail] = NULL;
+		tx_tail = TX_NEXT(tx_tail);
+		count--;
+	}
+
+	priv->tx_tail = tx_tail;
+	smp_wmb(); /* Ensure tx_tail visible to xmit */
+
+out:
+	if (pkts_compl || bytes_compl)
+		netdev_completed_queue(ndev, pkts_compl, bytes_compl);
+
+	if (unlikely(netif_queue_stopped(ndev)) && (count < (TX_DESC_NUM - 1)))
+		netif_wake_queue(ndev);
+
+	return count;
+}
+
+static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int tx_head = priv->tx_head, count;
+	struct tx_desc *desc = &priv->tx_desc[tx_head];
+	dma_addr_t phys;
+
+	smp_rmb();
+	count = tx_count(tx_head, ACCESS_ONCE(priv->tx_tail));
+	if (count == (TX_DESC_NUM - 1)) {
+		netif_stop_queue(ndev);
+		return NETDEV_TX_BUSY;
+	}
+
+	phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
+	if (dma_mapping_error(&ndev->dev, phys)) {
+		dev_kfree_skb(skb);
+		return NETDEV_TX_OK;
+	}
+
+	priv->tx_skb[tx_head] = skb;
+	priv->tx_phys[tx_head] = phys;
+	desc->send_addr = cpu_to_be32(phys);
+	desc->send_size = cpu_to_be32(skb->len);
+	desc->cfg = cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV);
+	phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
+	desc->wb_addr = cpu_to_be32(phys);
+	skb_tx_timestamp(skb);
+
+	hip04_set_xmit_desc(priv, phys);
+	priv->tx_head = TX_NEXT(tx_head);
+	count++;
+	netdev_sent_queue(ndev, skb->len);
+
+	stats->tx_bytes += skb->len;
+	stats->tx_packets++;
+
+	/* Ensure tx_head update visible to tx reclaim */
+	smp_wmb();
+
+	/* queue is getting full, better start cleaning up now */
+	if (count >= priv->tx_coalesce_frames) {
+		if (napi_schedule_prep(&priv->napi)) {
+			/* disable rx interrupt and timer */
+			priv->reg_inten &= ~(RCV_INT);
+			writel_relaxed(DEF_INT_MASK & ~RCV_INT,
+				       priv->base + PPE_INTEN);
+			hrtimer_cancel(&priv->tx_coalesce_timer);
+			__napi_schedule(&priv->napi);
+		}
+	} else if (!hrtimer_is_queued(&priv->tx_coalesce_timer)) {
+		/* cleanup not pending yet, start a new timer */
+		hrtimer_start_expires(&priv->tx_coalesce_timer,
+				      HRTIMER_MODE_REL);
+	}
+
+	return NETDEV_TX_OK;
+}
+
+static int hip04_rx_poll(struct napi_struct *napi, int budget)
+{
+	struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
+	struct net_device *ndev = priv->ndev;
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int cnt = hip04_recv_cnt(priv);
+	struct rx_desc *desc;
+	struct sk_buff *skb;
+	unsigned char *buf;
+	bool last = false;
+	dma_addr_t phys;
+	int rx = 0;
+	int tx_remaining;
+	u16 len;
+	u32 err;
+
+	while (cnt && !last) {
+		buf = priv->rx_buf[priv->rx_head];
+		skb = build_skb(buf, priv->rx_buf_size);
+		if (unlikely(!skb))
+			net_dbg_ratelimited("build_skb failed\n");
+
+		dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
+				 RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_phys[priv->rx_head] = 0;
+
+		desc = (struct rx_desc *)skb->data;
+		len = be16_to_cpu(desc->pkt_len);
+		err = be32_to_cpu(desc->pkt_err);
+
+		if (0 == len) {
+			dev_kfree_skb_any(skb);
+			last = true;
+		} else if ((err & RX_PKT_ERR) || (len >= GMAC_MAX_PKT_LEN)) {
+			dev_kfree_skb_any(skb);
+			stats->rx_dropped++;
+			stats->rx_errors++;
+		} else {
+			skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+			skb_put(skb, len);
+			skb->protocol = eth_type_trans(skb, ndev);
+			napi_gro_receive(&priv->napi, skb);
+			stats->rx_packets++;
+			stats->rx_bytes += len;
+			rx++;
+		}
+
+		buf = netdev_alloc_frag(priv->rx_buf_size);
+		if (!buf)
+			goto done;
+		phys = dma_map_single(&ndev->dev, buf,
+				      RX_BUF_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(&ndev->dev, phys))
+			goto done;
+		priv->rx_buf[priv->rx_head] = buf;
+		priv->rx_phys[priv->rx_head] = phys;
+		hip04_set_recv_desc(priv, phys);
+
+		priv->rx_head = RX_NEXT(priv->rx_head);
+		if (rx >= budget)
+			goto done;
+
+		if (--cnt == 0)
+			cnt = hip04_recv_cnt(priv);
+	}
+
+	if (!(priv->reg_inten & RCV_INT)) {
+		/* enable rx interrupt */
+		priv->reg_inten |= RCV_INT;
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+	}
+	napi_complete(napi);
+done:
+	/* clean up tx descriptors and start a new timer if necessary */
+	tx_remaining = hip04_tx_reclaim(ndev, false);
+	if (rx < budget && tx_remaining)
+		hrtimer_start_expires(&priv->tx_coalesce_timer, HRTIMER_MODE_REL);
+
+	return rx;
+}
+
+static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
+{
+	struct net_device *ndev = (struct net_device *)dev_id;
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct net_device_stats *stats = &ndev->stats;
+	u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
+
+	if (!ists)
+		return IRQ_NONE;
+
+	writel_relaxed(DEF_INT_MASK, priv->base + PPE_RINT);
+
+	if (unlikely(ists & DEF_INT_ERR)) {
+		if (ists & (RCV_NOBUF | RCV_DROP))
+			stats->rx_errors++;
+			stats->rx_dropped++;
+			netdev_err(ndev, "rx drop\n");
+		if (ists & TX_DROP) {
+			stats->tx_dropped++;
+			netdev_err(ndev, "tx drop\n");
+		}
+	}
+
+	if (ists & RCV_INT && napi_schedule_prep(&priv->napi)) {
+		/* disable rx interrupt */
+		priv->reg_inten &= ~(RCV_INT);
+		writel_relaxed(DEF_INT_MASK & ~RCV_INT, priv->base + PPE_INTEN);
+		hrtimer_cancel(&priv->tx_coalesce_timer);
+		__napi_schedule(&priv->napi);
+	}
+
+	return IRQ_HANDLED;
+}
+
+enum hrtimer_restart tx_done(struct hrtimer *hrtimer)
+{
+	struct hip04_priv *priv;
+
+	priv = container_of(hrtimer, struct hip04_priv, tx_coalesce_timer);
+
+	if (napi_schedule_prep(&priv->napi)) {
+		/* disable rx interrupt */
+		priv->reg_inten &= ~(RCV_INT);
+		writel_relaxed(DEF_INT_MASK & ~RCV_INT, priv->base + PPE_INTEN);
+		__napi_schedule(&priv->napi);
+	}
+
+	return HRTIMER_NORESTART;
+}
+
+static void hip04_adjust_link(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct phy_device *phy = priv->phy;
+
+	if ((priv->speed != phy->speed) || (priv->duplex != phy->duplex)) {
+		hip04_config_port(ndev, phy->speed, phy->duplex);
+		phy_print_status(phy);
+	}
+}
+
+static int hip04_mac_open(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->rx_head = 0;
+	priv->tx_head = 0;
+	priv->tx_tail = 0;
+	hip04_reset_ppe(priv);
+
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		dma_addr_t phys;
+
+		phys = dma_map_single(&ndev->dev, priv->rx_buf[i],
+				      RX_BUF_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(&ndev->dev, phys))
+			return -EIO;
+
+		priv->rx_phys[i] = phys;
+		hip04_set_recv_desc(priv, phys);
+	}
+
+	if (priv->phy)
+		phy_start(priv->phy);
+
+	netdev_reset_queue(ndev);
+	netif_start_queue(ndev);
+	hip04_mac_enable(ndev);
+	napi_enable(&priv->napi);
+
+	return 0;
+}
+
+static int hip04_mac_stop(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	napi_disable(&priv->napi);
+	netif_stop_queue(ndev);
+	hip04_mac_disable(ndev);
+	hip04_tx_reclaim(ndev, true);
+	hip04_reset_ppe(priv);
+
+	if (priv->phy)
+		phy_stop(priv->phy);
+
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		if (priv->rx_phys[i]) {
+			dma_unmap_single(&ndev->dev, priv->rx_phys[i],
+					 RX_BUF_SIZE, DMA_FROM_DEVICE);
+			priv->rx_phys[i] = 0;
+		}
+	}
+
+	return 0;
+}
+
+static void hip04_timeout(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	schedule_work(&priv->tx_timeout_task);
+}
+
+static void hip04_tx_timeout_task(struct work_struct *work)
+{
+	struct hip04_priv *priv;
+
+	priv = container_of(work, struct hip04_priv, tx_timeout_task);
+	hip04_mac_stop(priv->ndev);
+	hip04_mac_open(priv->ndev);
+}
+
+static struct net_device_stats *hip04_get_stats(struct net_device *ndev)
+{
+	return &ndev->stats;
+}
+
+static int hip04_get_coalesce(struct net_device *netdev,
+			      struct ethtool_coalesce *ec)
+{
+	struct hip04_priv *priv = netdev_priv(netdev);
+
+	ec->tx_coalesce_usecs = priv->tx_coalesce_usecs;
+	ec->tx_max_coalesced_frames = priv->tx_coalesce_frames;
+
+	return 0;
+}
+
+static int hip04_set_coalesce(struct net_device *netdev,
+			      struct ethtool_coalesce *ec)
+{
+	struct hip04_priv *priv = netdev_priv(netdev);
+
+	/* Check not supported parameters  */
+	if ((ec->rx_max_coalesced_frames) || (ec->rx_coalesce_usecs_irq) ||
+	    (ec->rx_max_coalesced_frames_irq) || (ec->tx_coalesce_usecs_irq) ||
+	    (ec->use_adaptive_rx_coalesce) || (ec->use_adaptive_tx_coalesce) ||
+	    (ec->pkt_rate_low) || (ec->rx_coalesce_usecs_low) ||
+	    (ec->rx_max_coalesced_frames_low) || (ec->tx_coalesce_usecs_high) ||
+	    (ec->tx_max_coalesced_frames_low) || (ec->pkt_rate_high) ||
+	    (ec->tx_coalesce_usecs_low) || (ec->rx_coalesce_usecs_high) ||
+	    (ec->rx_max_coalesced_frames_high) || (ec->rx_coalesce_usecs) ||
+	    (ec->tx_max_coalesced_frames_irq) ||
+	    (ec->stats_block_coalesce_usecs) ||
+	    (ec->tx_max_coalesced_frames_high) || (ec->rate_sample_interval))
+		return -EOPNOTSUPP;
+
+	if ((ec->tx_coalesce_usecs > HIP04_MAX_TX_COALESCE_USECS ||
+	     ec->tx_coalesce_usecs < HIP04_MIN_TX_COALESCE_USECS) ||
+	    (ec->tx_max_coalesced_frames > HIP04_MAX_TX_COALESCE_FRAMES ||
+	     ec->tx_max_coalesced_frames < HIP04_MIN_TX_COALESCE_FRAMES))
+		return -EINVAL;
+
+	priv->tx_coalesce_usecs = ec->tx_coalesce_usecs;
+	priv->tx_coalesce_frames = ec->tx_max_coalesced_frames;
+
+	return 0;
+}
+
+static void hip04_get_drvinfo(struct net_device *netdev,
+			      struct ethtool_drvinfo *drvinfo)
+{
+	strlcpy(drvinfo->driver, DRV_NAME, sizeof(drvinfo->driver));
+	strlcpy(drvinfo->version, DRV_VERSION, sizeof(drvinfo->version));
+}
+
+static struct ethtool_ops hip04_ethtool_ops = {
+	.get_coalesce		= hip04_get_coalesce,
+	.set_coalesce		= hip04_set_coalesce,
+	.get_drvinfo		= hip04_get_drvinfo,
+};
+
+static struct net_device_ops hip04_netdev_ops = {
+	.ndo_open		= hip04_mac_open,
+	.ndo_stop		= hip04_mac_stop,
+	.ndo_get_stats		= hip04_get_stats,
+	.ndo_start_xmit		= hip04_mac_start_xmit,
+	.ndo_set_mac_address	= hip04_set_mac_address,
+	.ndo_tx_timeout         = hip04_timeout,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_change_mtu		= eth_change_mtu,
+};
+
+static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->tx_desc = dma_alloc_coherent(d,
+					   TX_DESC_NUM * sizeof(struct tx_desc),
+					   &priv->tx_desc_dma, GFP_KERNEL);
+	if (!priv->tx_desc)
+		return -ENOMEM;
+
+	priv->rx_buf_size = RX_BUF_SIZE +
+			    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		priv->rx_buf[i] = netdev_alloc_frag(priv->rx_buf_size);
+		if (!priv->rx_buf[i])
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void hip04_free_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	for (i = 0; i < RX_DESC_NUM; i++)
+		if (priv->rx_buf[i])
+			put_page(virt_to_head_page(priv->rx_buf[i]));
+
+	for (i = 0; i < TX_DESC_NUM; i++)
+		if (priv->tx_skb[i])
+			dev_kfree_skb_any(priv->tx_skb[i]);
+
+	dma_free_coherent(d, TX_DESC_NUM * sizeof(struct tx_desc),
+			  priv->tx_desc, priv->tx_desc_dma);
+}
+
+static int hip04_mac_probe(struct platform_device *pdev)
+{
+	struct device *d = &pdev->dev;
+	struct device_node *node = d->of_node;
+	struct of_phandle_args arg;
+	struct net_device *ndev;
+	struct hip04_priv *priv;
+	struct resource *res;
+	unsigned int irq;
+	ktime_t txtime;
+	int ret;
+
+	ndev = alloc_etherdev(sizeof(struct hip04_priv));
+	if (!ndev)
+		return -ENOMEM;
+
+	priv = netdev_priv(ndev);
+	priv->ndev = ndev;
+	platform_set_drvdata(pdev, ndev);
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	priv->base = devm_ioremap_resource(d, res);
+	if (IS_ERR(priv->base)) {
+		ret = PTR_ERR(priv->base);
+		goto init_fail;
+	}
+
+	ret = of_parse_phandle_with_fixed_args(node, "port-handle", 2, 0, &arg);
+	if (ret < 0) {
+		dev_warn(d, "no port-handle\n");
+		goto init_fail;
+	}
+
+	priv->port = arg.args[0];
+	priv->chan = arg.args[1] * RX_DESC_NUM;
+
+	hrtimer_init(&priv->tx_coalesce_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+
+	/* BQL will try to keep the TX queue as short as possible, but it can't
+	 * be faster than tx_coalesce_usecs, so we need a fast timeout here,
+	 * but also long enough to gather up enough frames to ensure we don't
+	 * get more interrupts than necessary.
+	 * 200us is enough for 16 frames of 1500 bytes at gigabit ethernet rate
+	 */
+	priv->tx_coalesce_frames = TX_DESC_NUM * 3 / 4;
+	priv->tx_coalesce_usecs = 200;
+	/* allow timer to fire after half the time at the earliest */
+	txtime = ktime_set(0, priv->tx_coalesce_usecs * NSEC_PER_USEC / 2);
+	hrtimer_set_expires_range(&priv->tx_coalesce_timer, txtime, txtime);
+	priv->tx_coalesce_timer.function = tx_done;
+
+	priv->map = syscon_node_to_regmap(arg.np);
+	if (IS_ERR(priv->map)) {
+		dev_warn(d, "no syscon hisilicon,hip04-ppe\n");
+		ret = PTR_ERR(priv->map);
+		goto init_fail;
+	}
+
+	priv->phy_mode = of_get_phy_mode(node);
+	if (priv->phy_mode < 0) {
+		dev_warn(d, "not find phy-mode\n");
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq <= 0) {
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	ret = devm_request_irq(d, irq, hip04_mac_interrupt,
+			       0, pdev->name, ndev);
+	if (ret) {
+		netdev_err(ndev, "devm_request_irq failed\n");
+		goto init_fail;
+	}
+
+	priv->phy_node = of_parse_phandle(node, "phy-handle", 0);
+	if (priv->phy_node) {
+		priv->phy = of_phy_connect(ndev, priv->phy_node,
+					   &hip04_adjust_link,
+					   0, priv->phy_mode);
+		if (!priv->phy) {
+			ret = -EPROBE_DEFER;
+			goto init_fail;
+		}
+	}
+
+	INIT_WORK(&priv->tx_timeout_task, hip04_tx_timeout_task);
+
+	ether_setup(ndev);
+	ndev->netdev_ops = &hip04_netdev_ops;
+	ndev->ethtool_ops = &hip04_ethtool_ops;
+	ndev->watchdog_timeo = TX_TIMEOUT;
+	ndev->priv_flags |= IFF_UNICAST_FLT;
+	ndev->irq = irq;
+	netif_napi_add(ndev, &priv->napi, hip04_rx_poll, NAPI_POLL_WEIGHT);
+	SET_NETDEV_DEV(ndev, &pdev->dev);
+
+	hip04_reset_ppe(priv);
+	if (priv->phy_mode == PHY_INTERFACE_MODE_MII)
+		hip04_config_port(ndev, SPEED_100, DUPLEX_FULL);
+
+	hip04_config_fifo(priv);
+	random_ether_addr(ndev->dev_addr);
+	hip04_update_mac_address(ndev);
+
+	ret = hip04_alloc_ring(ndev, d);
+	if (ret) {
+		netdev_err(ndev, "alloc ring fail\n");
+		goto alloc_fail;
+	}
+
+	ret = register_netdev(ndev);
+	if (ret) {
+		free_netdev(ndev);
+		goto alloc_fail;
+	}
+
+	return 0;
+
+alloc_fail:
+	hip04_free_ring(ndev, d);
+init_fail:
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+	return ret;
+}
+
+static int hip04_remove(struct platform_device *pdev)
+{
+	struct net_device *ndev = platform_get_drvdata(pdev);
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct device *d = &pdev->dev;
+
+	if (priv->phy)
+		phy_disconnect(priv->phy);
+
+	hip04_free_ring(ndev, d);
+	unregister_netdev(ndev);
+	free_irq(ndev->irq, ndev);
+	of_node_put(priv->phy_node);
+	cancel_work_sync(&priv->tx_timeout_task);
+	free_netdev(ndev);
+
+	return 0;
+}
+
+static const struct of_device_id hip04_mac_match[] = {
+	{ .compatible = "hisilicon,hip04-mac" },
+	{ }
+};
+
+MODULE_DEVICE_TABLE(of, hip04_mac_match);
+
+static struct platform_driver hip04_mac_driver = {
+	.probe	= hip04_mac_probe,
+	.remove	= hip04_remove,
+	.driver	= {
+		.name		= DRV_NAME,
+		.owner		= THIS_MODULE,
+		.of_match_table	= hip04_mac_match,
+	},
+};
+module_platform_driver(hip04_mac_driver);
+
+MODULE_DESCRIPTION("HISILICON P04 Ethernet driver");
-- 
1.8.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
  2015-01-14  6:34 ` [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver Ding Tianhong
@ 2015-01-14  8:06   ` Joe Perches
  2015-01-15  2:56     ` Ding Tianhong
  2015-01-14  8:53   ` Arnd Bergmann
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Joe Perches @ 2015-01-14  8:06 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: arnd, robh+dt, davem, grant.likely, agraf, sergei.shtylyov,
	linux-arm-kernel, eric.dumazet, xuwei5, zhangfei.gao, netdev,
	devicetree, linux

On Wed, 2015-01-14 at 14:34 +0800, Ding Tianhong wrote:
> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
> The controller has no tx done interrupt, reclaim xmitted buffer in the poll.

Single comment:

> diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
[]
> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
> +{
[]
> +	while (cnt && !last) {
> +		buf = priv->rx_buf[priv->rx_head];
> +		skb = build_skb(buf, priv->rx_buf_size);
> +		if (unlikely(!skb))
> +			net_dbg_ratelimited("build_skb failed\n");
> +
> +		dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
> +				 RX_BUF_SIZE, DMA_FROM_DEVICE);
> +		priv->rx_phys[priv->rx_head] = 0;
> +
> +		desc = (struct rx_desc *)skb->data;

Perhaps if (!skb) is possible, there shouldn't be
a dereference of that known null here.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
  2015-01-14  6:34 ` [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver Ding Tianhong
  2015-01-14  8:06   ` Joe Perches
@ 2015-01-14  8:53   ` Arnd Bergmann
  2015-01-15  2:54     ` Ding Tianhong
  2015-01-14 16:34   ` Eric Dumazet
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Arnd Bergmann @ 2015-01-14  8:53 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: robh+dt, davem, grant.likely, agraf, sergei.shtylyov,
	linux-arm-kernel, eric.dumazet, xuwei5, zhangfei.gao, netdev,
	devicetree, linux

On Wednesday 14 January 2015 14:34:14 Ding Tianhong wrote:
> +#define HIP04_MAX_TX_COALESCE_USECS    200
> +#define HIP04_MIN_TX_COALESCE_USECS    100
> +#define HIP04_MAX_TX_COALESCE_FRAMES   200
> +#define HIP04_MIN_TX_COALESCE_FRAMES   100

It's not important, but in case you are creating another version of the
patch, maybe the allowed range can be extended somewhat. The example values
I picked when I sent my suggestion were really made up. It's great if
they work fine, but users might want to  tune this far more depending on
their workloads,  How about these

#define HIP04_MAX_TX_COALESCE_USECS    100000
#define HIP04_MIN_TX_COALESCE_USECS    1
#define HIP04_MAX_TX_COALESCE_FRAMES   (TX_DESC_NUM - 1)
#define HIP04_MIN_TX_COALESCE_FRAMES   1

	Arnd

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v13 0/3] add hisilicon hip04 ethernet driver
  2015-01-14  6:34 [PATCH net-next v13 0/3] add hisilicon hip04 ethernet driver Ding Tianhong
                   ` (2 preceding siblings ...)
  2015-01-14  6:34 ` [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver Ding Tianhong
@ 2015-01-14 10:19 ` Alexander Graf
  2015-01-15  8:37   ` Ding Tianhong
  3 siblings, 1 reply; 25+ messages in thread
From: Alexander Graf @ 2015-01-14 10:19 UTC (permalink / raw)
  To: Ding Tianhong, arnd, robh+dt, davem, grant.likely
  Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
	zhangfei.gao, netdev, devicetree, linux



On 14.01.15 07:34, Ding Tianhong wrote:
> v13:
> - Fix the problem of alignment parameters for function and checkpatch warming.
> 
> v12:
> - According Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
>   for hip04 ethernet.
> 
> v11:
> - Add ethtool support for tx coalecse getting and setting, the xmit_more
>   is not supported for this patch, but I think it could work for hip04,
>   will support it later after some tests for performance better.
> 
>   Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
>   it looks that the performance and latency is more better by tx_coalesce_frames/usecs.
> 
>   - Before:
>     $ ping 192.168.1.1 ...
>     === 192.168.1.1 ping statistics ===
>     24 packets transmitted, 24 received, 0% packet loss, time 22999ms
>     rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms
> 
>     $ iperf -c 192.168.1.1 ...
>     [ ID] Interval       Transfer     Bandwidth
>     [  3]  0.0- 1.0 sec   115 MBytes   945 Mbits/sec
> 
>   - After:
>     $ ping 192.168.1.1 ...
>     === 192.168.1.1 ping statistics ===
>     24 packets transmitted, 24 received, 0% packet loss, time 22999ms
>     rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms
> 
>     $ iperf -c 192.168.1.1 ...
>     [ ID] Interval       Transfer     Bandwidth
>     [  3]  0.0- 1.0 sec   115 MBytes   965 Mbits/sec

Using v11 on top of 3.18 and after generating some traffic on the
network as well as compiling code on ahci in parallel I ran into the
following OOPS. Any idea what exactly is going on?

>From a 10000 feet perspective it looks like two problems to me

  1) Allocation failure doesn't get handled properly somewhere
  2) We fail to allocate with order=0 - I don't see why


Alex


[44085.652301] ld: page allocation failure: order:0, mode:0x120
[44085.671314] CPU: 0 PID: 5121 Comm: ld Not tainted 3.18.0-5-lpae #1
[44085.692110] [<c0335814>] (unwind_backtrace) from [<c032fcc8>]
(show_stack+0x18/0x1c)
[44085.718109] [<c032fcc8>] (show_stack) from [<c0a50afc>]
(dump_stack+0x88/0x98)
[44085.742360] [<c0a50afc>] (dump_stack) from [<c0443f98>]
(warn_alloc_failed+0xdc/0x124)
[44085.768938] [<c0443f98>] (warn_alloc_failed) from [<c04469ec>]
(__alloc_pages_nodemask+0x7ac/0xa8c)
[44085.799305] [<c04469ec>] (__alloc_pages_nodemask) from [<c0959a88>]
(__netdev_alloc_frag+0xac/0x17c)
[44085.829994] [<c0959a88>] (__netdev_alloc_frag) from [<bf00051c>]
(hip04_rx_poll+0x168/0x3cc [hip04_eth])
[44085.861830] [<bf00051c>] (hip04_rx_poll [hip04_eth]) from
[<c096c0fc>] (net_rx_action+0x15c/0x258)
[44085.891897] [<c096c0fc>] (net_rx_action) from [<c0369508>]
(__do_softirq+0x138/0x2dc)
[44085.918171] [<c0369508>] (__do_softirq) from [<c0369920>]
(irq_exit+0x80/0xb8)
[44085.942408] [<c0369920>] (irq_exit) from [<c03b3ee0>]
(__handle_domain_irq+0x68/0xa8)
[44085.968685] [<c03b3ee0>] (__handle_domain_irq) from [<c03086ac>]
(hip04_handle_irq+0x38/0x68)
[44085.997296] [<c03086ac>] (hip04_handle_irq) from [<c0a56d48>]
(__irq_usr+0x48/0x60)
[44086.022977] Exception stack(0xd3fbbfb0 to 0xd3fbbff8)
[44086.039925] bfa0:                                     03fa94f8
01fc2598 01fc2598 00000000
[44086.067357] bfc0: 03fa9458 40482000 00000000 400878b0 004e46e8
00000040 01f61f80 00000013
[44086.094785] bfe0: fbad2488 bed4c270 403aeb5c 403bbe84 200e0010 ffffffff
[44086.116970] Mem-info:
[44086.124597] DMA per-cpu:
[44086.133098] CPU    0: hi:  186, btch:  31 usd: 191
[44086.149167] HighMem per-cpu:
[44086.158829] CPU    0: hi:  186, btch:  31 usd:  10
[44086.174919] active_anon:78994 inactive_anon:57435 isolated_anon:0
[44086.174919]  active_file:721847 inactive_file:559931 isolated_file:0
[44086.174919]  unevictable:20 dirty:567 writeback:0 unstable:0
[44086.174919]  free:2674014 slab_reclaimable:41072 slab_unreclaimable:5704
[44086.174919]  mapped:13107 shmem:42709 pagetables:933 bounce:0
[44086.174919]  free_cma:16216
[44086.286552] DMA free:1088kB min:3012kB low:3764kB high:4516kB
active_anon:584kB inactive_anon:2948kB active_file:183224kB
inactive_file:183268kB unevictable:16kB isolated(anon):0kB
isolated(file):0kB present:778240kB managed:569024kB mlocked:16kB
dirty:28kB writeback:0kB mapped:864kB shmem:2948kB
slab_reclaimable:164288kB slab_unreclaimable:22816kB kernel_stack:776kB
pagetables:3732kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? no
[44086.427835] lowmem_reserve[]: 0 0 15624 15624
[44086.442687] HighMem free:10694968kB min:512kB low:21708kB
high:42908kB active_anon:315392kB inactive_anon:226792kB
active_file:2704164kB inactive_file:2056456kB unevictable:64kB
isolated(anon):0kB isolated(file):0kB present:15998976kB
managed:15998976kB mlocked:64kB dirty:2240kB writeback:0kB
mapped:51564kB shmem:167888kB slab_reclaimable:0kB
slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB
bounce:59318392kB free_cma:64864kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
[44086.590672] lowmem_reserve[]: 0 0 0 0
[44086.603158] DMA: 0*4kB 0*8kB 0*16kB 0*32kB 1*64kB (R) 0*128kB 0*256kB
0*512kB 1*1024kB (R) 0*2048kB 0*4096kB = 1088kB
[44086.639330] HighMem: 6*4kB (UMC) 2*8kB (U) 3*16kB (UMC) 5*32kB (UC)
3*64kB (UMC) 1*128kB (C) 1*256kB (M) 57*512kB (UM) 25*1024kB (UMC)
7*2048kB (UMC) 2594*4096kB (MRC) = 10694968kB
[44086.694225] Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
[44086.722524] 1324446 total pagecache pages
[44086.735973] 33 pages in swap cache
[44086.747384] Swap cache stats: add 741, delete 708, find 0/0
[44086.766072] Free swap  = 497068kB
[44086.777185] Total swap = 500032kB
[44087.151253] 4194304 pages of RAM
[44087.162085] 2674498 free pages
[44087.172327] 52304 reserved pages
[44087.183150] 46573 slab pages
[44087.192810] 1004058 pages shared
[44087.203632] 33 pages swap cached
[44087.217489] Unable to handle kernel paging request at virtual address
b0000000
[44087.241776] pgd = c0303000
[44087.250887] [b0000000] *pgd=80000010306003, *pmd=00000000
[44087.269133] Internal error: Oops: 2a06 [#1] SMP ARM
[44087.285499] Modules linked in: fuse dm_mod uio_pdrv_genirq uio
nls_iso8859_1 nls_cp437 vfat fat sg st sr_mod cdrom af_packet squashfs
loop virtio_blk virtio_ring virtio brd marvell ahci_platform
libahci_platform libahci hip04_mdio gpio_dwapb hip04_eth [last unloaded:
dm_mod]
[44087.368299] CPU: 0 PID: 9 Comm: rcuos/0 Not tainted 3.18.0-5-lpae #1
[44087.389614] task: e88a46c0 ti: e88ac000 task.ti: e88ac000
[44087.407749] PC is at v7_dma_inv_range+0x34/0x4c
[44087.422951] LR is at dma_cache_maint_page+0x9c/0x118
[44087.439609] pc : [<c0340b94>]    lr : [<c033a67c>]    psr: 400e0013
[44087.439609] sp : e88addc0  ip : c0340c2c  fp : 00000002
[44087.478104] r10: 00000000  r9 : c0e73580  r8 : c0e778c4
[44087.495630] r7 : c0fdfb80  r6 : c0f20780  r5 : 00000000  r4 : 00000640
[44087.517524] r3 : 0000003f  r2 : 00000040  r1 : b0000640  r0 : b0000000
[44087.539424] Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM
Segment kernel
[44087.563939] Control: 30c5387d  Table: 21305b40  DAC: fffffffd
[44087.583213] Process rcuos/0 (pid: 9, stack limit = 0xe88ac238)
[44087.602777] Stack: (0xe88addc0 to 0xe88ae000)
[44087.617398] ddc0: 00000000 00000000 e88a4708 e8ae059c e8ae0000
c0e778c4 c0fdfb80 00000640
[44087.644829] dde0: 00000000 df7f9000 00000002 c033a91c c0340c2c
00000700 df9da740 e8ae059c
[44087.672259] de00: e8ae0000 000012ac 00000038 db903700 e8ae02d8
c0fdfb80 00000001 bf000498
[44087.699689] de20: 00000640 00000002 00000000 c0e9c5b0 e88ac000
00000101 c0e7ab04 00000000
[44087.727119] de40: 00000040 bf001e28 c0f77800 e8ae059c 00000000
00000040 0000012c df9da740
[44087.754549] de60: c0e6d100 00000101 e88ac000 c096c0fc c0f245e4
df9da748 007aae3c e88ac000
[44087.781978] de80: c0f26038 c0f2410d e88a5850 00000003 c0e6d08c
00000002 e88ac000 00000002
[44087.809407] dea0: c0f24314 00000101 e1f4d4c0 c0369508 00000000
c0e94fd4 0000000c c0e6d080
[44087.836837] dec0: c0e65448 0000000a c0f34000 c0e6d100 007aae3b
e88ac030 c0a62b84 00208040
[44087.864266] dee0: a00e0013 600e0013 df9d7680 000001ff 00000001
df9d7744 00000000 c0e73630
[44087.891696] df00: e1f4d4c0 c036976c e88ac010 c0369830 db903700
df9d7680 e88ac000 c03bc8cc
[44087.919125] df20: 00000000 d3fedb80 000b7df0 00000000 e88a46c0
c039db48 e88adf38 e88adf38
[44087.946554] df40: 00000000 e8838940 00000000 df9d7680 c03bc74c
00000000 00000000 00000000
[44087.973984] df60: 00000000 c0380398 00000000 00000000 00000001
df9d7680 00000000 00000000
[44088.001413] df80: e88adf80 e88adf80 00000000 00000000 e88adf90
e88adf90 e88adfac e8838940
[44088.028843] dfa0: c03802b8 00000000 00000000 c032bbc8 00000000
00000000 00000000 00000000
[44088.056272] dfc0: 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000
[44088.083701] dfe0: 00000000 00000000 00000000 00000000 00000013
00000000 e88adff4 00000000
[44088.111150] [<c0340b94>] (v7_dma_inv_range) from [<c033a67c>]
(dma_cache_maint_page+0x9c/0x118)
[44088.140339] [<c033a67c>] (dma_cache_maint_page) from [<c033a91c>]
(__dma_page_dev_to_cpu+0x90/0x110)
[44088.170996] [<c033a91c>] (__dma_page_dev_to_cpu) from [<bf000498>]
(hip04_rx_poll+0xe4/0x3cc [hip04_eth])
[44088.203119] [<bf000498>] (hip04_rx_poll [hip04_eth]) from
[<c096c0fc>] (net_rx_action+0x15c/0x258)
[44088.233185] [<c096c0fc>] (net_rx_action) from [<c0369508>]
(__do_softirq+0x138/0x2dc)
[44088.259457] [<c0369508>] (__do_softirq) from [<c036976c>]
(do_softirq+0x5c/0x64)
[44088.284270] [<c036976c>] (do_softirq) from [<c0369830>]
(__local_bh_enable_ip+0xbc/0xc0)
[44088.311428] [<c0369830>] (__local_bh_enable_ip) from [<c03bc8cc>]
(rcu_nocb_kthread+0x180/0x5bc)
[44088.340910] [<c03bc8cc>] (rcu_nocb_kthread) from [<c0380398>]
(kthread+0xe0/0xf8)
[44088.366026] [<c0380398>] (kthread) from [<c032bbc8>]
(ret_from_fork+0x14/0x20)
[44088.390260] Code: 1e070f3e e1110003 e1c11003 1e071f3e (ee070f36)
[44088.410808] ---[ end trace ccf8b617d193d373 ]---
[44088.426324] Kernel panic - not syncing: Fatal exception in interrupt
[44088.447652] Rebooting in 100 seconds..Reboot failed -- System halted

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
  2015-01-14  6:34 ` [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver Ding Tianhong
  2015-01-14  8:06   ` Joe Perches
  2015-01-14  8:53   ` Arnd Bergmann
@ 2015-01-14 16:34   ` Eric Dumazet
       [not found]     ` <1421253246.11734.17.camel-XN9IlZ5yJG9HTL0Zs8A6p/gx64E7kk8eUsxypvmhUTTZJqsBc5GL+g@public.gmane.org>
  2015-01-15  4:39   ` Joe Perches
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2015-01-14 16:34 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: arnd, robh+dt, davem, grant.likely, agraf, sergei.shtylyov,
	linux-arm-kernel, xuwei5, zhangfei.gao, netdev, devicetree, linux

On Wed, 2015-01-14 at 14:34 +0800, Ding Tianhong wrote:
> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
> The controller has no tx done interrupt, reclaim xmitted buffer in the poll.
> 
> v13: Fix the problem of alignment parameters for function and checkpatch warming.
> 
> v12: According Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
>      for hip04 ethernet.
> 
> v11: Add ethtool support for tx coalecse getting and setting, the xmit_more
>      is not supported for this patch, but I think it could work for hip04,
>      will support it later after some tests for performance better.
> 
>      Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
>      it looks that the performance and latency is more better by tx_coalesce_frames/usecs.
> 
>      - Before:
>      $ ping 192.168.1.1 ...
>      === 192.168.1.1 ping statistics ===
>      24 packets transmitted, 24 received, 0% packet loss, time 22999ms
>      rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms
> 
>      $ iperf -c 192.168.1.1 ...
>      [ ID] Interval       Transfer     Bandwidth
>      [  3]  0.0- 1.0 sec   115 MBytes   945 Mbits/sec
> 
>      - After:
>      $ ping 192.168.1.1 ...
>      === 192.168.1.1 ping statistics ===
>      24 packets transmitted, 24 received, 0% packet loss, time 22999ms
>      rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms
> 
>      $ iperf -c 192.168.1.1 ...
>      [ ID] Interval       Transfer     Bandwidth
>      [  3]  0.0- 1.0 sec   115 MBytes   965 Mbits/sec
> 
> v10: According David Miller and Arnd Bergmann's suggestion, add some modification
>      for v9 version
>      - drop the workqueue
>      - batch cleanup based on tx_coalesce_frames/usecs for better throughput
>      - use a reasonable default tx timeout (200us, could be shorted
>        based on measurements) with a range timer
>      - fix napi poll function return value
>      - use a lockless queue for cleanup
> 
> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> ---
>  drivers/net/ethernet/hisilicon/Makefile    |   2 +-
>  drivers/net/ethernet/hisilicon/hip04_eth.c | 969 +++++++++++++++++++++++++++++
>  2 files changed, 970 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c
> 
> diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
> index 40115a7..6c14540 100644
> --- a/drivers/net/ethernet/hisilicon/Makefile
> +++ b/drivers/net/ethernet/hisilicon/Makefile
> @@ -3,4 +3,4 @@
>  #
>  
>  obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o
> -obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
> +obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o
> diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
> new file mode 100644
> index 0000000..525214e
> --- /dev/null
> +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
> @@ -0,0 +1,969 @@
> +
> +/* Copyright (c) 2014 Linaro Ltd.
> + * Copyright (c) 2014 Hisilicon Limited.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/etherdevice.h>
> +#include <linux/platform_device.h>
> +#include <linux/interrupt.h>
> +#include <linux/ktime.h>
> +#include <linux/of_address.h>
> +#include <linux/phy.h>
> +#include <linux/of_mdio.h>
> +#include <linux/of_net.h>
> +#include <linux/mfd/syscon.h>
> +#include <linux/regmap.h>
> +
> +#define PPE_CFG_RX_ADDR			0x100
> +#define PPE_CFG_POOL_GRP		0x300
> +#define PPE_CFG_RX_BUF_SIZE		0x400
> +#define PPE_CFG_RX_FIFO_SIZE		0x500
> +#define PPE_CURR_BUF_CNT		0xa200
> +
> +#define GE_DUPLEX_TYPE			0x08
> +#define GE_MAX_FRM_SIZE_REG		0x3c
> +#define GE_PORT_MODE			0x40
> +#define GE_PORT_EN			0x44
> +#define GE_SHORT_RUNTS_THR_REG		0x50
> +#define GE_TX_LOCAL_PAGE_REG		0x5c
> +#define GE_TRANSMIT_CONTROL_REG		0x60
> +#define GE_CF_CRC_STRIP_REG		0x1b0
> +#define GE_MODE_CHANGE_REG		0x1b4
> +#define GE_RECV_CONTROL_REG		0x1e0
> +#define GE_STATION_MAC_ADDRESS		0x210
> +#define PPE_CFG_CPU_ADD_ADDR		0x580
> +#define PPE_CFG_MAX_FRAME_LEN_REG	0x408
> +#define PPE_CFG_BUS_CTRL_REG		0x424
> +#define PPE_CFG_RX_CTRL_REG		0x428
> +#define PPE_CFG_RX_PKT_MODE_REG		0x438
> +#define PPE_CFG_QOS_VMID_GEN		0x500
> +#define PPE_CFG_RX_PKT_INT		0x538
> +#define PPE_INTEN			0x600
> +#define PPE_INTSTS			0x608
> +#define PPE_RINT			0x604
> +#define PPE_CFG_STS_MODE		0x700
> +#define PPE_HIS_RX_PKT_CNT		0x804
> +
> +/* REG_INTERRUPT */
> +#define RCV_INT				BIT(10)
> +#define RCV_NOBUF			BIT(8)
> +#define RCV_DROP			BIT(7)
> +#define TX_DROP				BIT(6)
> +#define DEF_INT_ERR			(RCV_NOBUF | RCV_DROP | TX_DROP)
> +#define DEF_INT_MASK			(RCV_INT | DEF_INT_ERR)
> +
> +/* TX descriptor config */
> +#define TX_FREE_MEM			BIT(0)
> +#define TX_READ_ALLOC_L3		BIT(1)
> +#define TX_FINISH_CACHE_INV		BIT(2)
> +#define TX_CLEAR_WB			BIT(4)
> +#define TX_L3_CHECKSUM			BIT(5)
> +#define TX_LOOP_BACK			BIT(11)
> +
> +/* RX error */
> +#define RX_PKT_DROP			BIT(0)
> +#define RX_L2_ERR			BIT(1)
> +#define RX_PKT_ERR			(RX_PKT_DROP | RX_L2_ERR)
> +
> +#define SGMII_SPEED_1000		0x08
> +#define SGMII_SPEED_100			0x07
> +#define SGMII_SPEED_10			0x06
> +#define MII_SPEED_100			0x01
> +#define MII_SPEED_10			0x00
> +
> +#define GE_DUPLEX_FULL			BIT(0)
> +#define GE_DUPLEX_HALF			0x00
> +#define GE_MODE_CHANGE_EN		BIT(0)
> +
> +#define GE_TX_AUTO_NEG			BIT(5)
> +#define GE_TX_ADD_CRC			BIT(6)
> +#define GE_TX_SHORT_PAD_THROUGH		BIT(7)
> +
> +#define GE_RX_STRIP_CRC			BIT(0)
> +#define GE_RX_STRIP_PAD			BIT(3)
> +#define GE_RX_PAD_EN			BIT(4)
> +
> +#define GE_AUTO_NEG_CTL			BIT(0)
> +
> +#define GE_RX_INT_THRESHOLD		BIT(6)
> +#define GE_RX_TIMEOUT			0x04
> +
> +#define GE_RX_PORT_EN			BIT(1)
> +#define GE_TX_PORT_EN			BIT(2)
> +
> +#define PPE_CFG_STS_RX_PKT_CNT_RC	BIT(12)
> +
> +#define PPE_CFG_RX_PKT_ALIGN		BIT(18)
> +#define PPE_CFG_QOS_VMID_MODE		BIT(14)
> +#define PPE_CFG_QOS_VMID_GRP_SHIFT	8
> +
> +#define PPE_CFG_RX_FIFO_FSFU		BIT(11)
> +#define PPE_CFG_RX_DEPTH_SHIFT		16
> +#define PPE_CFG_RX_START_SHIFT		0
> +#define PPE_CFG_RX_CTRL_ALIGN_SHIFT	11
> +
> +#define PPE_CFG_BUS_LOCAL_REL		BIT(14)
> +#define PPE_CFG_BUS_BIG_ENDIEN		BIT(0)
> +
> +#define RX_DESC_NUM			128
> +#define TX_DESC_NUM			256
> +#define TX_NEXT(N)			(((N) + 1) & (TX_DESC_NUM-1))
> +#define RX_NEXT(N)			(((N) + 1) & (RX_DESC_NUM-1))
> +
> +#define GMAC_PPE_RX_PKT_MAX_LEN		379
> +#define GMAC_MAX_PKT_LEN		1516
> +#define GMAC_MIN_PKT_LEN		31
> +#define RX_BUF_SIZE			1600
> +#define RESET_TIMEOUT			1000
> +#define TX_TIMEOUT			(6 * HZ)
> +
> +#define DRV_NAME			"hip04-ether"
> +#define DRV_VERSION			"v1.0"
> +
> +#define HIP04_MAX_TX_COALESCE_USECS	200
> +#define HIP04_MIN_TX_COALESCE_USECS	100
> +#define HIP04_MAX_TX_COALESCE_FRAMES	200
> +#define HIP04_MIN_TX_COALESCE_FRAMES	100
> +
> +struct tx_desc {
> +	u32 send_addr;

	__be32 send_adddr; ?

> +	u32 send_size;

	__be32

> +	u32 next_addr;
	__be32

> +	u32 cfg;
	__be32

> +	u32 wb_addr;
	__be32 wb_addr ?

> +} __aligned(64);
> +
> +struct rx_desc {
> +	u16 reserved_16;
> +	u16 pkt_len;
> +	u32 reserve1[3];
> +	u32 pkt_err;
> +	u32 reserve2[4];
> +};
> +
> +struct hip04_priv {
> +	void __iomem *base;
> +	int phy_mode;
> +	int chan;
> +	unsigned int port;
> +	unsigned int speed;
> +	unsigned int duplex;
> +	unsigned int reg_inten;
> +
> +	struct napi_struct napi;
> +	struct net_device *ndev;
> +
> +	struct tx_desc *tx_desc;
> +	dma_addr_t tx_desc_dma;
> +	struct sk_buff *tx_skb[TX_DESC_NUM];
> +	dma_addr_t tx_phys[TX_DESC_NUM];

This is not an efficient way to store skb/phys, as for each skb, info
will be store in 2 separate cache lines.

It would be better to use a 

struct hip04_tx_desc {
   struct sk_buff   *skb;
   dma_addr_t       phys;
} 

> +	unsigned int tx_head;
> +
> +	int tx_coalesce_frames;
> +	int tx_coalesce_usecs;
> +	struct hrtimer tx_coalesce_timer;
> +
> +	unsigned char *rx_buf[RX_DESC_NUM];
> +	dma_addr_t rx_phys[RX_DESC_NUM];

Same thing here : Use a struct to get better data locality.

> +	unsigned int rx_head;
> +	unsigned int rx_buf_size;
> +
> +	struct device_node *phy_node;
> +	struct phy_device *phy;
> +	struct regmap *map;
> +	struct work_struct tx_timeout_task;
> +
> +	/* written only by tx cleanup */
> +	unsigned int tx_tail ____cacheline_aligned_in_smp;
> +};
> +
> +static inline unsigned int tx_count(unsigned int head, unsigned int tail)
> +{
> +	return (head - tail) % (TX_DESC_NUM - 1);
> +}
> +
> +static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	u32 val;
> +
> +	priv->speed = speed;
> +	priv->duplex = duplex;
> +
> +	switch (priv->phy_mode) {
> +	case PHY_INTERFACE_MODE_SGMII:
> +		if (speed == SPEED_1000)
> +			val = SGMII_SPEED_1000;
> +		else if (speed == SPEED_100)
> +			val = SGMII_SPEED_100;
> +		else
> +			val = SGMII_SPEED_10;
> +		break;
> +	case PHY_INTERFACE_MODE_MII:
> +		if (speed == SPEED_100)
> +			val = MII_SPEED_100;
> +		else
> +			val = MII_SPEED_10;
> +		break;
> +	default:
> +		netdev_warn(ndev, "not supported mode\n");
> +		val = MII_SPEED_10;
> +		break;
> +	}
> +	writel_relaxed(val, priv->base + GE_PORT_MODE);
> +
> +	val = duplex ? GE_DUPLEX_FULL : GE_DUPLEX_HALF;
> +	writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
> +
> +	val = GE_MODE_CHANGE_EN;
> +	writel_relaxed(val, priv->base + GE_MODE_CHANGE_REG);
> +}
> +
> +static void hip04_reset_ppe(struct hip04_priv *priv)
> +{
> +	u32 val, tmp, timeout = 0;
> +
> +	do {
> +		regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
> +		regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
> +		if (timeout++ > RESET_TIMEOUT)
> +			break;
> +	} while (val & 0xfff);
> +}
> +
> +static void hip04_config_fifo(struct hip04_priv *priv)
> +{
> +	u32 val;
> +
> +	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
> +	val |= PPE_CFG_STS_RX_PKT_CNT_RC;
> +	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
> +
> +	val = BIT(priv->port);
> +	regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val);
> +
> +	val = priv->port << PPE_CFG_QOS_VMID_GRP_SHIFT;
> +	val |= PPE_CFG_QOS_VMID_MODE;
> +	writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
> +
> +	val = RX_BUF_SIZE;
> +	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val);
> +
> +	val = RX_DESC_NUM << PPE_CFG_RX_DEPTH_SHIFT;
> +	val |= PPE_CFG_RX_FIFO_FSFU;
> +	val |= priv->chan << PPE_CFG_RX_START_SHIFT;
> +	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_FIFO_SIZE, val);
> +
> +	val = NET_IP_ALIGN << PPE_CFG_RX_CTRL_ALIGN_SHIFT;
> +	writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
> +
> +	val = PPE_CFG_RX_PKT_ALIGN;
> +	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
> +
> +	val = PPE_CFG_BUS_LOCAL_REL | PPE_CFG_BUS_BIG_ENDIEN;
> +	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
> +
> +	val = GMAC_PPE_RX_PKT_MAX_LEN;
> +	writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
> +
> +	val = GMAC_MAX_PKT_LEN;
> +	writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
> +
> +	val = GMAC_MIN_PKT_LEN;
> +	writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
> +
> +	val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
> +	val |= GE_TX_AUTO_NEG | GE_TX_ADD_CRC | GE_TX_SHORT_PAD_THROUGH;
> +	writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
> +
> +	val = GE_RX_STRIP_CRC;
> +	writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
> +
> +	val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
> +	val |= GE_RX_STRIP_PAD | GE_RX_PAD_EN;
> +	writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
> +
> +	val = GE_AUTO_NEG_CTL;
> +	writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
> +}
> +
> +static void hip04_mac_enable(struct net_device *ndev)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	u32 val;
> +
> +	/* enable tx & rx */
> +	val = readl_relaxed(priv->base + GE_PORT_EN);
> +	val |= GE_RX_PORT_EN | GE_TX_PORT_EN;
> +	writel_relaxed(val, priv->base + GE_PORT_EN);
> +
> +	/* clear rx int */
> +	val = RCV_INT;
> +	writel_relaxed(val, priv->base + PPE_RINT);
> +
> +	/* config recv int */
> +	val = GE_RX_INT_THRESHOLD | GE_RX_TIMEOUT;
> +	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
> +
> +	/* enable interrupt */
> +	priv->reg_inten = DEF_INT_MASK;
> +	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
> +}
> +
> +static void hip04_mac_disable(struct net_device *ndev)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	u32 val;
> +
> +	/* disable int */
> +	priv->reg_inten &= ~(DEF_INT_MASK);
> +	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
> +
> +	/* disable tx & rx */
> +	val = readl_relaxed(priv->base + GE_PORT_EN);
> +	val &= ~(GE_RX_PORT_EN | GE_TX_PORT_EN);
> +	writel_relaxed(val, priv->base + GE_PORT_EN);
> +}
> +
> +static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
> +{
> +	writel(phys, priv->base + PPE_CFG_CPU_ADD_ADDR);
> +}
> +
> +static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
> +{
> +	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys);
> +}
> +
> +static u32 hip04_recv_cnt(struct hip04_priv *priv)
> +{
> +	return readl(priv->base + PPE_HIS_RX_PKT_CNT);
> +}
> +
> +static void hip04_update_mac_address(struct net_device *ndev)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +
> +	writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
> +		       priv->base + GE_STATION_MAC_ADDRESS);
> +	writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
> +			(ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
> +		       priv->base + GE_STATION_MAC_ADDRESS + 4);
> +}
> +
> +static int hip04_set_mac_address(struct net_device *ndev, void *addr)
> +{
> +	eth_mac_addr(ndev, addr);
> +	hip04_update_mac_address(ndev);
> +	return 0;
> +}
> +
> +static int hip04_tx_reclaim(struct net_device *ndev, bool force)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	unsigned tx_tail = priv->tx_tail;
> +	struct tx_desc *desc;
> +	unsigned int bytes_compl = 0, pkts_compl = 0;
> +	unsigned int count;
> +
> +	smp_rmb();
> +	count = tx_count(ACCESS_ONCE(priv->tx_head), tx_tail);
> +	if (count == 0)
> +		goto out;
> +
> +	while (count) {
> +		desc = &priv->tx_desc[tx_tail];
> +		if (desc->send_addr != 0) {
> +			if (force)
> +				desc->send_addr = 0;
> +			else
> +				break;
> +		}
> +
> +		if (priv->tx_phys[tx_tail]) {
> +			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
> +					 priv->tx_skb[tx_tail]->len,
> +					 DMA_TO_DEVICE);
> +			priv->tx_phys[tx_tail] = 0;
> +		}
> +		pkts_compl++;
> +		bytes_compl += priv->tx_skb[tx_tail]->len;
> +		dev_kfree_skb(priv->tx_skb[tx_tail]);
> +		priv->tx_skb[tx_tail] = NULL;
> +		tx_tail = TX_NEXT(tx_tail);
> +		count--;
> +	}
> +
> +	priv->tx_tail = tx_tail;
> +	smp_wmb(); /* Ensure tx_tail visible to xmit */
> +
> +out:
> +	if (pkts_compl || bytes_compl)

Testing bytes_compl should be enough : There is no way pkt_compl could
be 0 if bytes_compl is not 0.

> +		netdev_completed_queue(ndev, pkts_compl, bytes_compl);
> +
> +	if (unlikely(netif_queue_stopped(ndev)) && (count < (TX_DESC_NUM - 1)))
> +		netif_wake_queue(ndev);
> +
> +	return count;
> +}
> +
> +static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	struct net_device_stats *stats = &ndev->stats;
> +	unsigned int tx_head = priv->tx_head, count;
> +	struct tx_desc *desc = &priv->tx_desc[tx_head];
> +	dma_addr_t phys;
> +
> +	smp_rmb();
> +	count = tx_count(tx_head, ACCESS_ONCE(priv->tx_tail));
> +	if (count == (TX_DESC_NUM - 1)) {
> +		netif_stop_queue(ndev);
> +		return NETDEV_TX_BUSY;
> +	}
> +
> +	phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
> +	if (dma_mapping_error(&ndev->dev, phys)) {
> +		dev_kfree_skb(skb);
> +		return NETDEV_TX_OK;
> +	}
> +
> +	priv->tx_skb[tx_head] = skb;
> +	priv->tx_phys[tx_head] = phys;
> +	desc->send_addr = cpu_to_be32(phys);
> +	desc->send_size = cpu_to_be32(skb->len);
> +	desc->cfg = cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV);
> +	phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
> +	desc->wb_addr = cpu_to_be32(phys);
> +	skb_tx_timestamp(skb);
> +
> +	hip04_set_xmit_desc(priv, phys);
> +	priv->tx_head = TX_NEXT(tx_head);
> +	count++;

Starting from this point, skb might already have been freed by TX
completion.

Its racy to access skb->len 

> +	netdev_sent_queue(ndev, skb->len);
> +
> +	stats->tx_bytes += skb->len;
> +	stats->tx_packets++;
> +
> +	/* Ensure tx_head update visible to tx reclaim */
> +	smp_wmb();
> +
> +	/* queue is getting full, better start cleaning up now */
> +	if (count >= priv->tx_coalesce_frames) {
> +		if (napi_schedule_prep(&priv->napi)) {
> +			/* disable rx interrupt and timer */
> +			priv->reg_inten &= ~(RCV_INT);
> +			writel_relaxed(DEF_INT_MASK & ~RCV_INT,
> +				       priv->base + PPE_INTEN);
> +			hrtimer_cancel(&priv->tx_coalesce_timer);
> +			__napi_schedule(&priv->napi);
> +		}
> +	} else if (!hrtimer_is_queued(&priv->tx_coalesce_timer)) {
> +		/* cleanup not pending yet, start a new timer */
> +		hrtimer_start_expires(&priv->tx_coalesce_timer,
> +				      HRTIMER_MODE_REL);
> +	}
> +
> +	return NETDEV_TX_OK;
> +}
> +
> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
> +{
> +	struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
> +	struct net_device *ndev = priv->ndev;
> +	struct net_device_stats *stats = &ndev->stats;
> +	unsigned int cnt = hip04_recv_cnt(priv);
> +	struct rx_desc *desc;
> +	struct sk_buff *skb;
> +	unsigned char *buf;
> +	bool last = false;
> +	dma_addr_t phys;
> +	int rx = 0;
> +	int tx_remaining;
> +	u16 len;
> +	u32 err;
> +
> +	while (cnt && !last) {
> +		buf = priv->rx_buf[priv->rx_head];
> +		skb = build_skb(buf, priv->rx_buf_size);
> +		if (unlikely(!skb))
> +			net_dbg_ratelimited("build_skb failed\n");

Well, is skb is NULL, you're crashing later...
You really have to address a memory allocation error much better than
that !

> +
> +		dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
> +				 RX_BUF_SIZE, DMA_FROM_DEVICE);
> +		priv->rx_phys[priv->rx_head] = 0;
> +
> +		desc = (struct rx_desc *)skb->data;
> +		len = be16_to_cpu(desc->pkt_len);
> +		err = be32_to_cpu(desc->pkt_err);
> +
> +		if (0 == len) {
> +			dev_kfree_skb_any(skb);
> +			last = true;
> +		} else if ((err & RX_PKT_ERR) || (len >= GMAC_MAX_PKT_LEN)) {
> +			dev_kfree_skb_any(skb);
> +			stats->rx_dropped++;
> +			stats->rx_errors++;
> +		} else {
> +			skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
> +			skb_put(skb, len);
> +			skb->protocol = eth_type_trans(skb, ndev);
> +			napi_gro_receive(&priv->napi, skb);
> +			stats->rx_packets++;
> +			stats->rx_bytes += len;
> +			rx++;
> +		}
> +
> +		buf = netdev_alloc_frag(priv->rx_buf_size);
> +		if (!buf)
> +			goto done;

Same problem here : In case of memory allocation error, your driver is
totally screwed.

> +		phys = dma_map_single(&ndev->dev, buf,
> +				      RX_BUF_SIZE, DMA_FROM_DEVICE);
> +		if (dma_mapping_error(&ndev->dev, phys))
> +			goto done;

Same problem here : You really have to recover properly.

> +		priv->rx_buf[priv->rx_head] = buf;
> +		priv->rx_phys[priv->rx_head] = phys;
> +		hip04_set_recv_desc(priv, phys);
> +
> +		priv->rx_head = RX_NEXT(priv->rx_head);
> +		if (rx >= budget)
> +			goto done;
> +
> +		if (--cnt == 0)
> +			cnt = hip04_recv_cnt(priv);
> +	}
> +
> +	if (!(priv->reg_inten & RCV_INT)) {
> +		/* enable rx interrupt */
> +		priv->reg_inten |= RCV_INT;
> +		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
> +	}
> +	napi_complete(napi);
> +done:
> +	/* clean up tx descriptors and start a new timer if necessary */
> +	tx_remaining = hip04_tx_reclaim(ndev, false);
> +	if (rx < budget && tx_remaining)
> +		hrtimer_start_expires(&priv->tx_coalesce_timer, HRTIMER_MODE_REL);
> +
> +	return rx;
> +}
> +

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
  2015-01-14  8:53   ` Arnd Bergmann
@ 2015-01-15  2:54     ` Ding Tianhong
  2015-01-15  9:05       ` Arnd Bergmann
  0 siblings, 1 reply; 25+ messages in thread
From: Ding Tianhong @ 2015-01-15  2:54 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: robh+dt, davem, grant.likely, agraf, sergei.shtylyov,
	linux-arm-kernel, eric.dumazet, xuwei5, zhangfei.gao, netdev,
	devicetree, linux

On 2015/1/14 16:53, Arnd Bergmann wrote:
> On Wednesday 14 January 2015 14:34:14 Ding Tianhong wrote:
>> +#define HIP04_MAX_TX_COALESCE_USECS    200
>> +#define HIP04_MIN_TX_COALESCE_USECS    100
>> +#define HIP04_MAX_TX_COALESCE_FRAMES   200
>> +#define HIP04_MIN_TX_COALESCE_FRAMES   100
> 
> It's not important, but in case you are creating another version of the
> patch, maybe the allowed range can be extended somewhat. The example values
> I picked when I sent my suggestion were really made up. It's great if
> they work fine, but users might want to  tune this far more depending on
> their workloads,  How about these
> 
> #define HIP04_MAX_TX_COALESCE_USECS    100000
> #define HIP04_MIN_TX_COALESCE_USECS    1
> #define HIP04_MAX_TX_COALESCE_FRAMES   (TX_DESC_NUM - 1)
> #define HIP04_MIN_TX_COALESCE_FRAMES   1
> 

Is it really ok that the so big range may break the driver and hip04 could not work fine?
I am not sure it is ok, I will fix it in next version.
 
Ding

> 	Arnd
> 
> .
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
  2015-01-14  8:06   ` Joe Perches
@ 2015-01-15  2:56     ` Ding Tianhong
  0 siblings, 0 replies; 25+ messages in thread
From: Ding Tianhong @ 2015-01-15  2:56 UTC (permalink / raw)
  To: Joe Perches
  Cc: arnd, robh+dt, davem, grant.likely, agraf, sergei.shtylyov,
	linux-arm-kernel, eric.dumazet, xuwei5, zhangfei.gao, netdev,
	devicetree, linux

On 2015/1/14 16:06, Joe Perches wrote:
> On Wed, 2015-01-14 at 14:34 +0800, Ding Tianhong wrote:
>> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
>> The controller has no tx done interrupt, reclaim xmitted buffer in the poll.
> 
> Single comment:
> 
>> diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
> []
>> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
>> +{
> []
>> +	while (cnt && !last) {
>> +		buf = priv->rx_buf[priv->rx_head];
>> +		skb = build_skb(buf, priv->rx_buf_size);
>> +		if (unlikely(!skb))
>> +			net_dbg_ratelimited("build_skb failed\n");
>> +
>> +		dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
>> +				 RX_BUF_SIZE, DMA_FROM_DEVICE);
>> +		priv->rx_phys[priv->rx_head] = 0;
>> +
>> +		desc = (struct rx_desc *)skb->data;
> 
> Perhaps if (!skb) is possible, there shouldn't be
> a dereference of that known null here.
> 
Yes, we should return and do something to avoid oops, thanks.

Ding

> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
  2015-01-14  6:34 ` [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver Ding Tianhong
                     ` (2 preceding siblings ...)
  2015-01-14 16:34   ` Eric Dumazet
@ 2015-01-15  4:39   ` Joe Perches
       [not found]     ` <1421296742.25598.3.camel-6d6DIl74uiNBDgjK7y7TUQ@public.gmane.org>
  2015-01-15  9:53   ` Arnd Bergmann
  2015-01-19 18:11   ` Alexander Graf
  5 siblings, 1 reply; 25+ messages in thread
From: Joe Perches @ 2015-01-15  4:39 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: arnd, robh+dt, davem, grant.likely, agraf, sergei.shtylyov,
	linux-arm-kernel, eric.dumazet, xuwei5, zhangfei.gao, netdev,
	devicetree, linux

On Wed, 2015-01-14 at 14:34 +0800, Ding Tianhong wrote:
> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
> The controller has no tx done interrupt, reclaim xmitted buffer in the poll.

Mostly trivial comments:

> +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c

[]

> +#define GMAC_MAX_PKT_LEN		1516

[]

> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
> +{
[]
> +	while (cnt && !last) {
[]
> +		desc = (struct rx_desc *)skb->data;
> +		len = be16_to_cpu(desc->pkt_len);
> +		err = be32_to_cpu(desc->pkt_err);
> +
> +		if (0 == len) {
> +			dev_kfree_skb_any(skb);
> +			last = true;
> +		} else if ((err & RX_PKT_ERR) || (len >= GMAC_MAX_PKT_LEN)) {

Is this ">=" correct?  Maybe it should be ">" ?

[]

> +static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
> +{
> +	struct net_device *ndev = (struct net_device *)dev_id;

Unnecessary cast of void *

[]

> +static int hip04_set_coalesce(struct net_device *netdev,
> +			      struct ethtool_coalesce *ec)
> +{
> +	struct hip04_priv *priv = netdev_priv(netdev);
> +
> +	/* Check not supported parameters  */
> +	if ((ec->rx_max_coalesced_frames) || (ec->rx_coalesce_usecs_irq) ||
> +	    (ec->rx_max_coalesced_frames_irq) || (ec->tx_coalesce_usecs_irq) ||
> +	    (ec->use_adaptive_rx_coalesce) || (ec->use_adaptive_tx_coalesce) ||
> +	    (ec->pkt_rate_low) || (ec->rx_coalesce_usecs_low) ||
> +	    (ec->rx_max_coalesced_frames_low) || (ec->tx_coalesce_usecs_high) ||
> +	    (ec->tx_max_coalesced_frames_low) || (ec->pkt_rate_high) ||
> +	    (ec->tx_coalesce_usecs_low) || (ec->rx_coalesce_usecs_high) ||
> +	    (ec->rx_max_coalesced_frames_high) || (ec->rx_coalesce_usecs) ||
> +	    (ec->tx_max_coalesced_frames_irq) ||
> +	    (ec->stats_block_coalesce_usecs) ||
> +	    (ec->tx_max_coalesced_frames_high) || (ec->rate_sample_interval))
> +		return -EOPNOTSUPP;

Rather than a somewhat haphazard mix of these values, 
this might be simpler to read as something like:

	/* Check not supported parameters  */
	if (ec->pkt_rate_low ||
	    ec->pkt_rate_high ||

	    ec->use_adaptive_rx_coalesce ||
	    ec->rx_coalesce_usecs ||
	    ec->rx_coalesce_usecs_low ||
	    ec->rx_coalesce_usecs_high ||
	    ec->rx_coalesce_usecs_irq ||
	    ec->rx_max_coalesced_frames ||
	    ec->rx_max_coalesced_frames_low ||
	    ec->rx_max_coalesced_frames_high ||
	    ec->rx_max_coalesced_frames_irq ||
	    
	    ec->use_adaptive_tx_coalesce ||
	    ec->tx_coalesce_usecs_low ||
	    ec->tx_coalesce_usecs_high ||
	    ec->tx_max_coalesced_frames_low ||
	    ec->tx_max_coalesced_frames_high ||
	    ec->tx_max_coalesced_frames_irq ||

	    ec->stats_block_coalesce_usecs ||
	    ec->rate_sample_interval)
		return -EOPNOTSUPP;


> +static void hip04_free_ring(struct net_device *ndev, struct device *d)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	int i;
> +
> +	for (i = 0; i < RX_DESC_NUM; i++)
> +		if (priv->rx_buf[i])
> +			put_page(virt_to_head_page(priv->rx_buf[i]));

It's generally nicer to use braces around
for loops with single ifs.

> +
> +	for (i = 0; i < TX_DESC_NUM; i++)
> +		if (priv->tx_skb[i])
> +			dev_kfree_skb_any(priv->tx_skb[i]);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v13 0/3] add hisilicon hip04 ethernet driver
  2015-01-14 10:19 ` [PATCH net-next v13 0/3] add hisilicon " Alexander Graf
@ 2015-01-15  8:37   ` Ding Tianhong
  2015-01-15  9:42     ` Arnd Bergmann
  0 siblings, 1 reply; 25+ messages in thread
From: Ding Tianhong @ 2015-01-15  8:37 UTC (permalink / raw)
  To: Alexander Graf, arnd, robh+dt, davem, grant.likely
  Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
	zhangfei.gao, netdev, devicetree, linux

On 2015/1/14 18:19, Alexander Graf wrote:
> 
> 
> On 14.01.15 07:34, Ding Tianhong wrote:
>> v13:
>> - Fix the problem of alignment parameters for function and checkpatch warming.
>>
>> v12:
>> - According Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
>>   for hip04 ethernet.
>>
>> v11:
>> - Add ethtool support for tx coalecse getting and setting, the xmit_more
>>   is not supported for this patch, but I think it could work for hip04,
>>   will support it later after some tests for performance better.
>>
>>   Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
>>   it looks that the performance and latency is more better by tx_coalesce_frames/usecs.
>>
>>   - Before:
>>     $ ping 192.168.1.1 ...
>>     === 192.168.1.1 ping statistics ===
>>     24 packets transmitted, 24 received, 0% packet loss, time 22999ms
>>     rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms
>>
>>     $ iperf -c 192.168.1.1 ...
>>     [ ID] Interval       Transfer     Bandwidth
>>     [  3]  0.0- 1.0 sec   115 MBytes   945 Mbits/sec
>>
>>   - After:
>>     $ ping 192.168.1.1 ...
>>     === 192.168.1.1 ping statistics ===
>>     24 packets transmitted, 24 received, 0% packet loss, time 22999ms
>>     rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms
>>
>>     $ iperf -c 192.168.1.1 ...
>>     [ ID] Interval       Transfer     Bandwidth
>>     [  3]  0.0- 1.0 sec   115 MBytes   965 Mbits/sec
> 
> Using v11 on top of 3.18 and after generating some traffic on the
> network as well as compiling code on ahci in parallel I ran into the
> following OOPS. Any idea what exactly is going on?
> 
>>From a 10000 feet perspective it looks like two problems to me
> 
>   1) Allocation failure doesn't get handled properly somewhere
>   2) We fail to allocate with order=0 - I don't see why
> 
is it easy to repetition this bug? how big is your memory on your board, is it happened in your previous hip04 driver?

ding

> 
> Alex
> 
> 
> [44085.652301] ld: page allocation failure: order:0, mode:0x120
> [44085.671314] CPU: 0 PID: 5121 Comm: ld Not tainted 3.18.0-5-lpae #1
> [44085.692110] [<c0335814>] (unwind_backtrace) from [<c032fcc8>]
> (show_stack+0x18/0x1c)
> [44085.718109] [<c032fcc8>] (show_stack) from [<c0a50afc>]
> (dump_stack+0x88/0x98)
> [44085.742360] [<c0a50afc>] (dump_stack) from [<c0443f98>]
> (warn_alloc_failed+0xdc/0x124)
> [44085.768938] [<c0443f98>] (warn_alloc_failed) from [<c04469ec>]
> (__alloc_pages_nodemask+0x7ac/0xa8c)
> [44085.799305] [<c04469ec>] (__alloc_pages_nodemask) from [<c0959a88>]
> (__netdev_alloc_frag+0xac/0x17c)
> [44085.829994] [<c0959a88>] (__netdev_alloc_frag) from [<bf00051c>]
> (hip04_rx_poll+0x168/0x3cc [hip04_eth])
> [44085.861830] [<bf00051c>] (hip04_rx_poll [hip04_eth]) from
> [<c096c0fc>] (net_rx_action+0x15c/0x258)
> [44085.891897] [<c096c0fc>] (net_rx_action) from [<c0369508>]
> (__do_softirq+0x138/0x2dc)
> [44085.918171] [<c0369508>] (__do_softirq) from [<c0369920>]
> (irq_exit+0x80/0xb8)
> [44085.942408] [<c0369920>] (irq_exit) from [<c03b3ee0>]
> (__handle_domain_irq+0x68/0xa8)
> [44085.968685] [<c03b3ee0>] (__handle_domain_irq) from [<c03086ac>]
> (hip04_handle_irq+0x38/0x68)
> [44085.997296] [<c03086ac>] (hip04_handle_irq) from [<c0a56d48>]
> (__irq_usr+0x48/0x60)
> [44086.022977] Exception stack(0xd3fbbfb0 to 0xd3fbbff8)
> [44086.039925] bfa0:                                     03fa94f8
> 01fc2598 01fc2598 00000000
> [44086.067357] bfc0: 03fa9458 40482000 00000000 400878b0 004e46e8
> 00000040 01f61f80 00000013
> [44086.094785] bfe0: fbad2488 bed4c270 403aeb5c 403bbe84 200e0010 ffffffff
> [44086.116970] Mem-info:
> [44086.124597] DMA per-cpu:
> [44086.133098] CPU    0: hi:  186, btch:  31 usd: 191
> [44086.149167] HighMem per-cpu:
> [44086.158829] CPU    0: hi:  186, btch:  31 usd:  10
> [44086.174919] active_anon:78994 inactive_anon:57435 isolated_anon:0
> [44086.174919]  active_file:721847 inactive_file:559931 isolated_file:0
> [44086.174919]  unevictable:20 dirty:567 writeback:0 unstable:0
> [44086.174919]  free:2674014 slab_reclaimable:41072 slab_unreclaimable:5704
> [44086.174919]  mapped:13107 shmem:42709 pagetables:933 bounce:0
> [44086.174919]  free_cma:16216
> [44086.286552] DMA free:1088kB min:3012kB low:3764kB high:4516kB
> active_anon:584kB inactive_anon:2948kB active_file:183224kB
> inactive_file:183268kB unevictable:16kB isolated(anon):0kB
> isolated(file):0kB present:778240kB managed:569024kB mlocked:16kB
> dirty:28kB writeback:0kB mapped:864kB shmem:2948kB
> slab_reclaimable:164288kB slab_unreclaimable:22816kB kernel_stack:776kB
> pagetables:3732kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB
> pages_scanned:0 all_unreclaimable? no
> [44086.427835] lowmem_reserve[]: 0 0 15624 15624
> [44086.442687] HighMem free:10694968kB min:512kB low:21708kB
> high:42908kB active_anon:315392kB inactive_anon:226792kB
> active_file:2704164kB inactive_file:2056456kB unevictable:64kB
> isolated(anon):0kB isolated(file):0kB present:15998976kB
> managed:15998976kB mlocked:64kB dirty:2240kB writeback:0kB
> mapped:51564kB shmem:167888kB slab_reclaimable:0kB
> slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB
> bounce:59318392kB free_cma:64864kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
> [44086.590672] lowmem_reserve[]: 0 0 0 0
> [44086.603158] DMA: 0*4kB 0*8kB 0*16kB 0*32kB 1*64kB (R) 0*128kB 0*256kB
> 0*512kB 1*1024kB (R) 0*2048kB 0*4096kB = 1088kB
> [44086.639330] HighMem: 6*4kB (UMC) 2*8kB (U) 3*16kB (UMC) 5*32kB (UC)
> 3*64kB (UMC) 1*128kB (C) 1*256kB (M) 57*512kB (UM) 25*1024kB (UMC)
> 7*2048kB (UMC) 2594*4096kB (MRC) = 10694968kB
> [44086.694225] Node 0 hugepages_total=0 hugepages_free=0
> hugepages_surp=0 hugepages_size=2048kB
> [44086.722524] 1324446 total pagecache pages
> [44086.735973] 33 pages in swap cache
> [44086.747384] Swap cache stats: add 741, delete 708, find 0/0
> [44086.766072] Free swap  = 497068kB
> [44086.777185] Total swap = 500032kB
> [44087.151253] 4194304 pages of RAM
> [44087.162085] 2674498 free pages
> [44087.172327] 52304 reserved pages
> [44087.183150] 46573 slab pages
> [44087.192810] 1004058 pages shared
> [44087.203632] 33 pages swap cached
> [44087.217489] Unable to handle kernel paging request at virtual address
> b0000000
> [44087.241776] pgd = c0303000
> [44087.250887] [b0000000] *pgd=80000010306003, *pmd=00000000
> [44087.269133] Internal error: Oops: 2a06 [#1] SMP ARM
> [44087.285499] Modules linked in: fuse dm_mod uio_pdrv_genirq uio
> nls_iso8859_1 nls_cp437 vfat fat sg st sr_mod cdrom af_packet squashfs
> loop virtio_blk virtio_ring virtio brd marvell ahci_platform
> libahci_platform libahci hip04_mdio gpio_dwapb hip04_eth [last unloaded:
> dm_mod]
> [44087.368299] CPU: 0 PID: 9 Comm: rcuos/0 Not tainted 3.18.0-5-lpae #1
> [44087.389614] task: e88a46c0 ti: e88ac000 task.ti: e88ac000
> [44087.407749] PC is at v7_dma_inv_range+0x34/0x4c
> [44087.422951] LR is at dma_cache_maint_page+0x9c/0x118
> [44087.439609] pc : [<c0340b94>]    lr : [<c033a67c>]    psr: 400e0013
> [44087.439609] sp : e88addc0  ip : c0340c2c  fp : 00000002
> [44087.478104] r10: 00000000  r9 : c0e73580  r8 : c0e778c4
> [44087.495630] r7 : c0fdfb80  r6 : c0f20780  r5 : 00000000  r4 : 00000640
> [44087.517524] r3 : 0000003f  r2 : 00000040  r1 : b0000640  r0 : b0000000
> [44087.539424] Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM
> Segment kernel
> [44087.563939] Control: 30c5387d  Table: 21305b40  DAC: fffffffd
> [44087.583213] Process rcuos/0 (pid: 9, stack limit = 0xe88ac238)
> [44087.602777] Stack: (0xe88addc0 to 0xe88ae000)
> [44087.617398] ddc0: 00000000 00000000 e88a4708 e8ae059c e8ae0000
> c0e778c4 c0fdfb80 00000640
> [44087.644829] dde0: 00000000 df7f9000 00000002 c033a91c c0340c2c
> 00000700 df9da740 e8ae059c
> [44087.672259] de00: e8ae0000 000012ac 00000038 db903700 e8ae02d8
> c0fdfb80 00000001 bf000498
> [44087.699689] de20: 00000640 00000002 00000000 c0e9c5b0 e88ac000
> 00000101 c0e7ab04 00000000
> [44087.727119] de40: 00000040 bf001e28 c0f77800 e8ae059c 00000000
> 00000040 0000012c df9da740
> [44087.754549] de60: c0e6d100 00000101 e88ac000 c096c0fc c0f245e4
> df9da748 007aae3c e88ac000
> [44087.781978] de80: c0f26038 c0f2410d e88a5850 00000003 c0e6d08c
> 00000002 e88ac000 00000002
> [44087.809407] dea0: c0f24314 00000101 e1f4d4c0 c0369508 00000000
> c0e94fd4 0000000c c0e6d080
> [44087.836837] dec0: c0e65448 0000000a c0f34000 c0e6d100 007aae3b
> e88ac030 c0a62b84 00208040
> [44087.864266] dee0: a00e0013 600e0013 df9d7680 000001ff 00000001
> df9d7744 00000000 c0e73630
> [44087.891696] df00: e1f4d4c0 c036976c e88ac010 c0369830 db903700
> df9d7680 e88ac000 c03bc8cc
> [44087.919125] df20: 00000000 d3fedb80 000b7df0 00000000 e88a46c0
> c039db48 e88adf38 e88adf38
> [44087.946554] df40: 00000000 e8838940 00000000 df9d7680 c03bc74c
> 00000000 00000000 00000000
> [44087.973984] df60: 00000000 c0380398 00000000 00000000 00000001
> df9d7680 00000000 00000000
> [44088.001413] df80: e88adf80 e88adf80 00000000 00000000 e88adf90
> e88adf90 e88adfac e8838940
> [44088.028843] dfa0: c03802b8 00000000 00000000 c032bbc8 00000000
> 00000000 00000000 00000000
> [44088.056272] dfc0: 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000
> [44088.083701] dfe0: 00000000 00000000 00000000 00000000 00000013
> 00000000 e88adff4 00000000
> [44088.111150] [<c0340b94>] (v7_dma_inv_range) from [<c033a67c>]
> (dma_cache_maint_page+0x9c/0x118)
> [44088.140339] [<c033a67c>] (dma_cache_maint_page) from [<c033a91c>]
> (__dma_page_dev_to_cpu+0x90/0x110)
> [44088.170996] [<c033a91c>] (__dma_page_dev_to_cpu) from [<bf000498>]
> (hip04_rx_poll+0xe4/0x3cc [hip04_eth])
> [44088.203119] [<bf000498>] (hip04_rx_poll [hip04_eth]) from
> [<c096c0fc>] (net_rx_action+0x15c/0x258)
> [44088.233185] [<c096c0fc>] (net_rx_action) from [<c0369508>]
> (__do_softirq+0x138/0x2dc)
> [44088.259457] [<c0369508>] (__do_softirq) from [<c036976c>]
> (do_softirq+0x5c/0x64)
> [44088.284270] [<c036976c>] (do_softirq) from [<c0369830>]
> (__local_bh_enable_ip+0xbc/0xc0)
> [44088.311428] [<c0369830>] (__local_bh_enable_ip) from [<c03bc8cc>]
> (rcu_nocb_kthread+0x180/0x5bc)
> [44088.340910] [<c03bc8cc>] (rcu_nocb_kthread) from [<c0380398>]
> (kthread+0xe0/0xf8)
> [44088.366026] [<c0380398>] (kthread) from [<c032bbc8>]
> (ret_from_fork+0x14/0x20)
> [44088.390260] Code: 1e070f3e e1110003 e1c11003 1e071f3e (ee070f36)
> [44088.410808] ---[ end trace ccf8b617d193d373 ]---
> [44088.426324] Kernel panic - not syncing: Fatal exception in interrupt
> [44088.447652] Rebooting in 100 seconds..Reboot failed -- System halted
> 
> .
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
  2015-01-15  2:54     ` Ding Tianhong
@ 2015-01-15  9:05       ` Arnd Bergmann
  0 siblings, 0 replies; 25+ messages in thread
From: Arnd Bergmann @ 2015-01-15  9:05 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: robh+dt, davem, grant.likely, agraf, sergei.shtylyov,
	linux-arm-kernel, eric.dumazet, xuwei5, zhangfei.gao, netdev,
	devicetree, linux

On Thursday 15 January 2015 10:54:24 Ding Tianhong wrote:
> On 2015/1/14 16:53, Arnd Bergmann wrote:
> > On Wednesday 14 January 2015 14:34:14 Ding Tianhong wrote:
> >> +#define HIP04_MAX_TX_COALESCE_USECS    200
> >> +#define HIP04_MIN_TX_COALESCE_USECS    100
> >> +#define HIP04_MAX_TX_COALESCE_FRAMES   200
> >> +#define HIP04_MIN_TX_COALESCE_FRAMES   100
> > 
> > It's not important, but in case you are creating another version of the
> > patch, maybe the allowed range can be extended somewhat. The example values
> > I picked when I sent my suggestion were really made up. It's great if
> > they work fine, but users might want to  tune this far more depending on
> > their workloads,  How about these
> > 
> > #define HIP04_MAX_TX_COALESCE_USECS    100000
> > #define HIP04_MIN_TX_COALESCE_USECS    1
> > #define HIP04_MAX_TX_COALESCE_FRAMES   (TX_DESC_NUM - 1)
> > #define HIP04_MIN_TX_COALESCE_FRAMES   1
> > 
> 
> Is it really ok that the so big range may break the driver and hip04 could not work fine?
> I am not sure it is ok, I will fix it in next version.

Obviously, performance will suffer when you pick a bad setting for
a given workload. If the numbers are too low, you will increase
the CPU load but get better latency, so I see nothing wrong in
allowing '1' as the minimum. For the descriptor number, you can't
go above TX_DESC_NUM, but there is nothing wrong in going close to
it, it will just mean that the timer should fire before you get
there and you again get more interrupts.

The 100ms maximum delay is a bit extreme, and will definitely impact
TX latency in some workloads if a user sets this, but the system
will should still be usable, and I couldn't think of a better
limit. Feel free to set this to e.g. 10ms or 1ms if you feel more
comfortable with that.

For reference, with TX_DESC_NUM=256 and 1500 byte frames, you have
up to 300ms worth of data in a full tx queue on a 10mbit link,
or 3ms respectively for a 1gbit link. Because of BQL, the actual
queue length will normally be much shorter than this, but on a
tx-only workload won't shrink below the currently used
tx_coalesce_usecs value.

	Arnd

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v13 0/3] add hisilicon hip04 ethernet driver
  2015-01-15  8:37   ` Ding Tianhong
@ 2015-01-15  9:42     ` Arnd Bergmann
  2015-01-15 12:39       ` Alexander Graf
  0 siblings, 1 reply; 25+ messages in thread
From: Arnd Bergmann @ 2015-01-15  9:42 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Alexander Graf, robh+dt, davem, grant.likely, sergei.shtylyov,
	linux-arm-kernel, eric.dumazet, xuwei5, zhangfei.gao, netdev,
	devicetree, linux

On Thursday 15 January 2015 16:37:23 Ding Tianhong wrote:
> On 2015/1/14 18:19, Alexander Graf wrote:
> > 
> >>From a 10000 feet perspective it looks like two problems to me
> > 
> >   1) Allocation failure doesn't get handled properly somewhere

This is the bug that Eric pointed out as well.

> >   2) We fail to allocate with order=0 - I don't see why

GFP_ATOMIC. When allocating from a the napi poll function in softirq
context, you have to use nonblocking allocations, which occasionally
fail. This should not cause any harm other than dropped packets.

> is it easy to repetition this bug? how big is your memory on your board,
> is it happened in your previous hip04 driver?

It should be independent of memory size, but may be more likely if you
don't have swap space configured.

	Arnd

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
  2015-01-14  6:34 ` [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver Ding Tianhong
                     ` (3 preceding siblings ...)
  2015-01-15  4:39   ` Joe Perches
@ 2015-01-15  9:53   ` Arnd Bergmann
  2015-01-19 18:11   ` Alexander Graf
  5 siblings, 0 replies; 25+ messages in thread
From: Arnd Bergmann @ 2015-01-15  9:53 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ding Tianhong, robh+dt, davem, grant.likely, agraf, devicetree,
	linux, sergei.shtylyov, eric.dumazet, netdev, xuwei5,
	zhangfei.gao

On Wednesday 14 January 2015 14:34:14 Ding Tianhong wrote:
> +static int hip04_set_coalesce(struct net_device *netdev,
> +                             struct ethtool_coalesce *ec)
> +{
> +       struct hip04_priv *priv = netdev_priv(netdev);
> +
> +       /* Check not supported parameters  */
> +       if ((ec->rx_max_coalesced_frames) || (ec->rx_coalesce_usecs_irq) ||
> +           (ec->rx_max_coalesced_frames_irq) || (ec->tx_coalesce_usecs_irq) ||
> +           (ec->use_adaptive_rx_coalesce) || (ec->use_adaptive_tx_coalesce) ||
> +           (ec->pkt_rate_low) || (ec->rx_coalesce_usecs_low) ||
> +           (ec->rx_max_coalesced_frames_low) || (ec->tx_coalesce_usecs_high) ||
> +           (ec->tx_max_coalesced_frames_low) || (ec->pkt_rate_high) ||
> +           (ec->tx_coalesce_usecs_low) || (ec->rx_coalesce_usecs_high) ||
> +           (ec->rx_max_coalesced_frames_high) || (ec->rx_coalesce_usecs) ||
> +           (ec->tx_max_coalesced_frames_irq) ||
> +           (ec->stats_block_coalesce_usecs) ||
> +           (ec->tx_max_coalesced_frames_high) || (ec->rate_sample_interval))
> +               return -EOPNOTSUPP;
> +
> +       if ((ec->tx_coalesce_usecs > HIP04_MAX_TX_COALESCE_USECS ||
> +            ec->tx_coalesce_usecs < HIP04_MIN_TX_COALESCE_USECS) ||
> +           (ec->tx_max_coalesced_frames > HIP04_MAX_TX_COALESCE_FRAMES ||
> +            ec->tx_max_coalesced_frames < HIP04_MIN_TX_COALESCE_FRAMES))
> +               return -EINVAL;
> +
> +       priv->tx_coalesce_usecs = ec->tx_coalesce_usecs;
> +       priv->tx_coalesce_frames = ec->tx_max_coalesced_frames;
> +
> +       return 0;
> +}

I just looked at this again and noticed that you fail to actually use the
tx_coalesce_usecs value anywhere, since you don't call hrtimer_set_expires_range()
any more.

Also, I guess it would be nice use the rx_coalesce_usecs_low and
tx_coalesce_usecs_high numbers instead of just a single number, as
hrtimer_set_expires_range takes two values already. Just do something
like 

	hrtimer_set_expires_range(&priv->tx_coalesce_timer,
				   priv->rx_coalesce_usecs_low,
				   priv->rx_coalesce_usecs_high -
				   priv->rx_coalesce_usecs_low);

and make sure the 'low' value is smaller than the 'high' one.

> +       priv->tx_coalesce_frames = TX_DESC_NUM * 3 / 4;
> +       priv->tx_coalesce_usecs = 200;

Related, instead of putting the raw numbers here, how about
adding the defaults along with the other macros we talked about:

#define HIP04_MAX_TX_COALESCE_USECS		10000
#define HIP04_MIN_TX_COALESCE_USECS		1
#define HIP04_MAX_TX_COALESCE_FRAMES		(TX_DESC_NUM - 1)
#define HIP04_MIN_TX_COALESCE_FRAMES		1

#define HIP04_DEFAULT_TX_COALESCE_USECS_LOW	80
#define HIP04_DEFAULT_TX_COALESCE_USECS_HIGH	200
#define HIP04_DEFAULT_TX_COALESCE_FRAMES	(TX_DESC_NUM / 2)

	Arnd

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
       [not found]     ` <1421296742.25598.3.camel-6d6DIl74uiNBDgjK7y7TUQ@public.gmane.org>
@ 2015-01-15 10:28       ` Ding Tianhong
  0 siblings, 0 replies; 25+ messages in thread
From: Ding Tianhong @ 2015-01-15 10:28 UTC (permalink / raw)
  To: Joe Perches
  Cc: arnd-r2nGTMty4D4, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	agraf-l3A5Bk7waGM,
	sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	xuwei5-C8/M+/jPZTeaMJb+Lgu22Q,
	zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-lFZ/pmaqli7XmaaqVzeoHQ

On 2015/1/15 12:39, Joe Perches wrote:
> On Wed, 2015-01-14 at 14:34 +0800, Ding Tianhong wrote:
>> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
>> The controller has no tx done interrupt, reclaim xmitted buffer in the poll.
> 
> Mostly trivial comments:
> 
>> +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
> 
> []
> 
>> +#define GMAC_MAX_PKT_LEN		1516
> 
> []
> 
>> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
>> +{
> []
>> +	while (cnt && !last) {
> []
>> +		desc = (struct rx_desc *)skb->data;
>> +		len = be16_to_cpu(desc->pkt_len);
>> +		err = be32_to_cpu(desc->pkt_err);
>> +
>> +		if (0 == len) {
>> +			dev_kfree_skb_any(skb);
>> +			last = true;
>> +		} else if ((err & RX_PKT_ERR) || (len >= GMAC_MAX_PKT_LEN)) {
> 
> Is this ">=" correct?  Maybe it should be ">" ?
> 
> []
> 
>> +static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
>> +{
>> +	struct net_device *ndev = (struct net_device *)dev_id;
> 
> Unnecessary cast of void *
> 
> []
> 
>> +static int hip04_set_coalesce(struct net_device *netdev,
>> +			      struct ethtool_coalesce *ec)
>> +{
>> +	struct hip04_priv *priv = netdev_priv(netdev);
>> +
>> +	/* Check not supported parameters  */
>> +	if ((ec->rx_max_coalesced_frames) || (ec->rx_coalesce_usecs_irq) ||
>> +	    (ec->rx_max_coalesced_frames_irq) || (ec->tx_coalesce_usecs_irq) ||
>> +	    (ec->use_adaptive_rx_coalesce) || (ec->use_adaptive_tx_coalesce) ||
>> +	    (ec->pkt_rate_low) || (ec->rx_coalesce_usecs_low) ||
>> +	    (ec->rx_max_coalesced_frames_low) || (ec->tx_coalesce_usecs_high) ||
>> +	    (ec->tx_max_coalesced_frames_low) || (ec->pkt_rate_high) ||
>> +	    (ec->tx_coalesce_usecs_low) || (ec->rx_coalesce_usecs_high) ||
>> +	    (ec->rx_max_coalesced_frames_high) || (ec->rx_coalesce_usecs) ||
>> +	    (ec->tx_max_coalesced_frames_irq) ||
>> +	    (ec->stats_block_coalesce_usecs) ||
>> +	    (ec->tx_max_coalesced_frames_high) || (ec->rate_sample_interval))
>> +		return -EOPNOTSUPP;
> 
> Rather than a somewhat haphazard mix of these values, 
> this might be simpler to read as something like:
> 
> 	/* Check not supported parameters  */
> 	if (ec->pkt_rate_low ||
> 	    ec->pkt_rate_high ||
> 
> 	    ec->use_adaptive_rx_coalesce ||
> 	    ec->rx_coalesce_usecs ||
> 	    ec->rx_coalesce_usecs_low ||
> 	    ec->rx_coalesce_usecs_high ||
> 	    ec->rx_coalesce_usecs_irq ||
> 	    ec->rx_max_coalesced_frames ||
> 	    ec->rx_max_coalesced_frames_low ||
> 	    ec->rx_max_coalesced_frames_high ||
> 	    ec->rx_max_coalesced_frames_irq ||
> 	    
> 	    ec->use_adaptive_tx_coalesce ||
> 	    ec->tx_coalesce_usecs_low ||
> 	    ec->tx_coalesce_usecs_high ||
> 	    ec->tx_max_coalesced_frames_low ||
> 	    ec->tx_max_coalesced_frames_high ||
> 	    ec->tx_max_coalesced_frames_irq ||
> 
> 	    ec->stats_block_coalesce_usecs ||
> 	    ec->rate_sample_interval)
> 		return -EOPNOTSUPP;
> 
> 
>> +static void hip04_free_ring(struct net_device *ndev, struct device *d)
>> +{
>> +	struct hip04_priv *priv = netdev_priv(ndev);
>> +	int i;
>> +
>> +	for (i = 0; i < RX_DESC_NUM; i++)
>> +		if (priv->rx_buf[i])
>> +			put_page(virt_to_head_page(priv->rx_buf[i]));
> 
> It's generally nicer to use braces around
> for loops with single ifs.
> 
>> +
>> +	for (i = 0; i < TX_DESC_NUM; i++)
>> +		if (priv->tx_skb[i])
>> +			dev_kfree_skb_any(priv->tx_skb[i]);
> 
> 
> 

OK, fix them later, thanks.

Ding

> .
> 


--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
       [not found]     ` <1421253246.11734.17.camel-XN9IlZ5yJG9HTL0Zs8A6p/gx64E7kk8eUsxypvmhUTTZJqsBc5GL+g@public.gmane.org>
@ 2015-01-15 10:29       ` Ding Tianhong
  2015-01-15 12:30         ` Arnd Bergmann
  0 siblings, 1 reply; 25+ messages in thread
From: Ding Tianhong @ 2015-01-15 10:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: arnd-r2nGTMty4D4, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	agraf-l3A5Bk7waGM,
	sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	xuwei5-C8/M+/jPZTeaMJb+Lgu22Q,
	zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-lFZ/pmaqli7XmaaqVzeoHQ

On 2015/1/15 0:34, Eric Dumazet wrote:
> On Wed, 2015-01-14 at 14:34 +0800, Ding Tianhong wrote:
>> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
>> The controller has no tx done interrupt, reclaim xmitted buffer in the poll.
>>
>> v13: Fix the problem of alignment parameters for function and checkpatch warming.
>>
>> v12: According Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
>>      for hip04 ethernet.
>>
>> v11: Add ethtool support for tx coalecse getting and setting, the xmit_more
>>      is not supported for this patch, but I think it could work for hip04,
>>      will support it later after some tests for performance better.
>>
>>      Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
>>      it looks that the performance and latency is more better by tx_coalesce_frames/usecs.
>>
>>      - Before:
>>      $ ping 192.168.1.1 ...
>>      === 192.168.1.1 ping statistics ===
>>      24 packets transmitted, 24 received, 0% packet loss, time 22999ms
>>      rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms
>>
>>      $ iperf -c 192.168.1.1 ...
>>      [ ID] Interval       Transfer     Bandwidth
>>      [  3]  0.0- 1.0 sec   115 MBytes   945 Mbits/sec
>>
>>      - After:
>>      $ ping 192.168.1.1 ...
>>      === 192.168.1.1 ping statistics ===
>>      24 packets transmitted, 24 received, 0% packet loss, time 22999ms
>>      rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms
>>
>>      $ iperf -c 192.168.1.1 ...
>>      [ ID] Interval       Transfer     Bandwidth
>>      [  3]  0.0- 1.0 sec   115 MBytes   965 Mbits/sec
>>
>> v10: According David Miller and Arnd Bergmann's suggestion, add some modification
>>      for v9 version
>>      - drop the workqueue
>>      - batch cleanup based on tx_coalesce_frames/usecs for better throughput
>>      - use a reasonable default tx timeout (200us, could be shorted
>>        based on measurements) with a range timer
>>      - fix napi poll function return value
>>      - use a lockless queue for cleanup
>>
>> Signed-off-by: Zhangfei Gao <zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>> Signed-off-by: Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org>
>> Signed-off-by: Ding Tianhong <dingtianhong-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
>> ---
>>  drivers/net/ethernet/hisilicon/Makefile    |   2 +-
>>  drivers/net/ethernet/hisilicon/hip04_eth.c | 969 +++++++++++++++++++++++++++++
>>  2 files changed, 970 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c
>>
>> diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
>> index 40115a7..6c14540 100644
>> --- a/drivers/net/ethernet/hisilicon/Makefile
>> +++ b/drivers/net/ethernet/hisilicon/Makefile
>> @@ -3,4 +3,4 @@
>>  #
>>  
>>  obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o
>> -obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
>> +obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o
>> diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
>> new file mode 100644
>> index 0000000..525214e
>> --- /dev/null
>> +++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
>> @@ -0,0 +1,969 @@
>> +
>> +/* Copyright (c) 2014 Linaro Ltd.
>> + * Copyright (c) 2014 Hisilicon Limited.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + */
>> +
>> +#include <linux/module.h>
>> +#include <linux/etherdevice.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/interrupt.h>
>> +#include <linux/ktime.h>
>> +#include <linux/of_address.h>
>> +#include <linux/phy.h>
>> +#include <linux/of_mdio.h>
>> +#include <linux/of_net.h>
>> +#include <linux/mfd/syscon.h>
>> +#include <linux/regmap.h>
>> +
>> +#define PPE_CFG_RX_ADDR			0x100
>> +#define PPE_CFG_POOL_GRP		0x300
>> +#define PPE_CFG_RX_BUF_SIZE		0x400
>> +#define PPE_CFG_RX_FIFO_SIZE		0x500
>> +#define PPE_CURR_BUF_CNT		0xa200
>> +
>> +#define GE_DUPLEX_TYPE			0x08
>> +#define GE_MAX_FRM_SIZE_REG		0x3c
>> +#define GE_PORT_MODE			0x40
>> +#define GE_PORT_EN			0x44
>> +#define GE_SHORT_RUNTS_THR_REG		0x50
>> +#define GE_TX_LOCAL_PAGE_REG		0x5c
>> +#define GE_TRANSMIT_CONTROL_REG		0x60
>> +#define GE_CF_CRC_STRIP_REG		0x1b0
>> +#define GE_MODE_CHANGE_REG		0x1b4
>> +#define GE_RECV_CONTROL_REG		0x1e0
>> +#define GE_STATION_MAC_ADDRESS		0x210
>> +#define PPE_CFG_CPU_ADD_ADDR		0x580
>> +#define PPE_CFG_MAX_FRAME_LEN_REG	0x408
>> +#define PPE_CFG_BUS_CTRL_REG		0x424
>> +#define PPE_CFG_RX_CTRL_REG		0x428
>> +#define PPE_CFG_RX_PKT_MODE_REG		0x438
>> +#define PPE_CFG_QOS_VMID_GEN		0x500
>> +#define PPE_CFG_RX_PKT_INT		0x538
>> +#define PPE_INTEN			0x600
>> +#define PPE_INTSTS			0x608
>> +#define PPE_RINT			0x604
>> +#define PPE_CFG_STS_MODE		0x700
>> +#define PPE_HIS_RX_PKT_CNT		0x804
>> +
>> +/* REG_INTERRUPT */
>> +#define RCV_INT				BIT(10)
>> +#define RCV_NOBUF			BIT(8)
>> +#define RCV_DROP			BIT(7)
>> +#define TX_DROP				BIT(6)
>> +#define DEF_INT_ERR			(RCV_NOBUF | RCV_DROP | TX_DROP)
>> +#define DEF_INT_MASK			(RCV_INT | DEF_INT_ERR)
>> +
>> +/* TX descriptor config */
>> +#define TX_FREE_MEM			BIT(0)
>> +#define TX_READ_ALLOC_L3		BIT(1)
>> +#define TX_FINISH_CACHE_INV		BIT(2)
>> +#define TX_CLEAR_WB			BIT(4)
>> +#define TX_L3_CHECKSUM			BIT(5)
>> +#define TX_LOOP_BACK			BIT(11)
>> +
>> +/* RX error */
>> +#define RX_PKT_DROP			BIT(0)
>> +#define RX_L2_ERR			BIT(1)
>> +#define RX_PKT_ERR			(RX_PKT_DROP | RX_L2_ERR)
>> +
>> +#define SGMII_SPEED_1000		0x08
>> +#define SGMII_SPEED_100			0x07
>> +#define SGMII_SPEED_10			0x06
>> +#define MII_SPEED_100			0x01
>> +#define MII_SPEED_10			0x00
>> +
>> +#define GE_DUPLEX_FULL			BIT(0)
>> +#define GE_DUPLEX_HALF			0x00
>> +#define GE_MODE_CHANGE_EN		BIT(0)
>> +
>> +#define GE_TX_AUTO_NEG			BIT(5)
>> +#define GE_TX_ADD_CRC			BIT(6)
>> +#define GE_TX_SHORT_PAD_THROUGH		BIT(7)
>> +
>> +#define GE_RX_STRIP_CRC			BIT(0)
>> +#define GE_RX_STRIP_PAD			BIT(3)
>> +#define GE_RX_PAD_EN			BIT(4)
>> +
>> +#define GE_AUTO_NEG_CTL			BIT(0)
>> +
>> +#define GE_RX_INT_THRESHOLD		BIT(6)
>> +#define GE_RX_TIMEOUT			0x04
>> +
>> +#define GE_RX_PORT_EN			BIT(1)
>> +#define GE_TX_PORT_EN			BIT(2)
>> +
>> +#define PPE_CFG_STS_RX_PKT_CNT_RC	BIT(12)
>> +
>> +#define PPE_CFG_RX_PKT_ALIGN		BIT(18)
>> +#define PPE_CFG_QOS_VMID_MODE		BIT(14)
>> +#define PPE_CFG_QOS_VMID_GRP_SHIFT	8
>> +
>> +#define PPE_CFG_RX_FIFO_FSFU		BIT(11)
>> +#define PPE_CFG_RX_DEPTH_SHIFT		16
>> +#define PPE_CFG_RX_START_SHIFT		0
>> +#define PPE_CFG_RX_CTRL_ALIGN_SHIFT	11
>> +
>> +#define PPE_CFG_BUS_LOCAL_REL		BIT(14)
>> +#define PPE_CFG_BUS_BIG_ENDIEN		BIT(0)
>> +
>> +#define RX_DESC_NUM			128
>> +#define TX_DESC_NUM			256
>> +#define TX_NEXT(N)			(((N) + 1) & (TX_DESC_NUM-1))
>> +#define RX_NEXT(N)			(((N) + 1) & (RX_DESC_NUM-1))
>> +
>> +#define GMAC_PPE_RX_PKT_MAX_LEN		379
>> +#define GMAC_MAX_PKT_LEN		1516
>> +#define GMAC_MIN_PKT_LEN		31
>> +#define RX_BUF_SIZE			1600
>> +#define RESET_TIMEOUT			1000
>> +#define TX_TIMEOUT			(6 * HZ)
>> +
>> +#define DRV_NAME			"hip04-ether"
>> +#define DRV_VERSION			"v1.0"
>> +
>> +#define HIP04_MAX_TX_COALESCE_USECS	200
>> +#define HIP04_MIN_TX_COALESCE_USECS	100
>> +#define HIP04_MAX_TX_COALESCE_FRAMES	200
>> +#define HIP04_MIN_TX_COALESCE_FRAMES	100
>> +
>> +struct tx_desc {
>> +	u32 send_addr;
> 
> 	__be32 send_adddr; ?
> 
>> +	u32 send_size;
> 
> 	__be32
> 
>> +	u32 next_addr;
> 	__be32
> 
>> +	u32 cfg;
> 	__be32
> 
>> +	u32 wb_addr;
> 	__be32 wb_addr ?
> 
>> +} __aligned(64);
>> +
>> +struct rx_desc {
>> +	u16 reserved_16;
>> +	u16 pkt_len;
>> +	u32 reserve1[3];
>> +	u32 pkt_err;
>> +	u32 reserve2[4];
>> +};
>> +
>> +struct hip04_priv {
>> +	void __iomem *base;
>> +	int phy_mode;
>> +	int chan;
>> +	unsigned int port;
>> +	unsigned int speed;
>> +	unsigned int duplex;
>> +	unsigned int reg_inten;
>> +
>> +	struct napi_struct napi;
>> +	struct net_device *ndev;
>> +
>> +	struct tx_desc *tx_desc;
>> +	dma_addr_t tx_desc_dma;
>> +	struct sk_buff *tx_skb[TX_DESC_NUM];
>> +	dma_addr_t tx_phys[TX_DESC_NUM];
> 
> This is not an efficient way to store skb/phys, as for each skb, info
> will be store in 2 separate cache lines.
> 
> It would be better to use a 
> 
> struct hip04_tx_desc {
>    struct sk_buff   *skb;
>    dma_addr_t       phys;
> } 
> 
>> +	unsigned int tx_head;
>> +
>> +	int tx_coalesce_frames;
>> +	int tx_coalesce_usecs;
>> +	struct hrtimer tx_coalesce_timer;
>> +
>> +	unsigned char *rx_buf[RX_DESC_NUM];
>> +	dma_addr_t rx_phys[RX_DESC_NUM];
> 
> Same thing here : Use a struct to get better data locality.
> 
>> +	unsigned int rx_head;
>> +	unsigned int rx_buf_size;
>> +
>> +	struct device_node *phy_node;
>> +	struct phy_device *phy;
>> +	struct regmap *map;
>> +	struct work_struct tx_timeout_task;
>> +
>> +	/* written only by tx cleanup */
>> +	unsigned int tx_tail ____cacheline_aligned_in_smp;
>> +};
>> +
>> +static inline unsigned int tx_count(unsigned int head, unsigned int tail)
>> +{
>> +	return (head - tail) % (TX_DESC_NUM - 1);
>> +}
>> +
>> +static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
>> +{
>> +	struct hip04_priv *priv = netdev_priv(ndev);
>> +	u32 val;
>> +
>> +	priv->speed = speed;
>> +	priv->duplex = duplex;
>> +
>> +	switch (priv->phy_mode) {
>> +	case PHY_INTERFACE_MODE_SGMII:
>> +		if (speed == SPEED_1000)
>> +			val = SGMII_SPEED_1000;
>> +		else if (speed == SPEED_100)
>> +			val = SGMII_SPEED_100;
>> +		else
>> +			val = SGMII_SPEED_10;
>> +		break;
>> +	case PHY_INTERFACE_MODE_MII:
>> +		if (speed == SPEED_100)
>> +			val = MII_SPEED_100;
>> +		else
>> +			val = MII_SPEED_10;
>> +		break;
>> +	default:
>> +		netdev_warn(ndev, "not supported mode\n");
>> +		val = MII_SPEED_10;
>> +		break;
>> +	}
>> +	writel_relaxed(val, priv->base + GE_PORT_MODE);
>> +
>> +	val = duplex ? GE_DUPLEX_FULL : GE_DUPLEX_HALF;
>> +	writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
>> +
>> +	val = GE_MODE_CHANGE_EN;
>> +	writel_relaxed(val, priv->base + GE_MODE_CHANGE_REG);
>> +}
>> +
>> +static void hip04_reset_ppe(struct hip04_priv *priv)
>> +{
>> +	u32 val, tmp, timeout = 0;
>> +
>> +	do {
>> +		regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
>> +		regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
>> +		if (timeout++ > RESET_TIMEOUT)
>> +			break;
>> +	} while (val & 0xfff);
>> +}
>> +
>> +static void hip04_config_fifo(struct hip04_priv *priv)
>> +{
>> +	u32 val;
>> +
>> +	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
>> +	val |= PPE_CFG_STS_RX_PKT_CNT_RC;
>> +	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
>> +
>> +	val = BIT(priv->port);
>> +	regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val);
>> +
>> +	val = priv->port << PPE_CFG_QOS_VMID_GRP_SHIFT;
>> +	val |= PPE_CFG_QOS_VMID_MODE;
>> +	writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
>> +
>> +	val = RX_BUF_SIZE;
>> +	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val);
>> +
>> +	val = RX_DESC_NUM << PPE_CFG_RX_DEPTH_SHIFT;
>> +	val |= PPE_CFG_RX_FIFO_FSFU;
>> +	val |= priv->chan << PPE_CFG_RX_START_SHIFT;
>> +	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_FIFO_SIZE, val);
>> +
>> +	val = NET_IP_ALIGN << PPE_CFG_RX_CTRL_ALIGN_SHIFT;
>> +	writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
>> +
>> +	val = PPE_CFG_RX_PKT_ALIGN;
>> +	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
>> +
>> +	val = PPE_CFG_BUS_LOCAL_REL | PPE_CFG_BUS_BIG_ENDIEN;
>> +	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
>> +
>> +	val = GMAC_PPE_RX_PKT_MAX_LEN;
>> +	writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
>> +
>> +	val = GMAC_MAX_PKT_LEN;
>> +	writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
>> +
>> +	val = GMAC_MIN_PKT_LEN;
>> +	writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
>> +
>> +	val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
>> +	val |= GE_TX_AUTO_NEG | GE_TX_ADD_CRC | GE_TX_SHORT_PAD_THROUGH;
>> +	writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
>> +
>> +	val = GE_RX_STRIP_CRC;
>> +	writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
>> +
>> +	val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
>> +	val |= GE_RX_STRIP_PAD | GE_RX_PAD_EN;
>> +	writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
>> +
>> +	val = GE_AUTO_NEG_CTL;
>> +	writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
>> +}
>> +
>> +static void hip04_mac_enable(struct net_device *ndev)
>> +{
>> +	struct hip04_priv *priv = netdev_priv(ndev);
>> +	u32 val;
>> +
>> +	/* enable tx & rx */
>> +	val = readl_relaxed(priv->base + GE_PORT_EN);
>> +	val |= GE_RX_PORT_EN | GE_TX_PORT_EN;
>> +	writel_relaxed(val, priv->base + GE_PORT_EN);
>> +
>> +	/* clear rx int */
>> +	val = RCV_INT;
>> +	writel_relaxed(val, priv->base + PPE_RINT);
>> +
>> +	/* config recv int */
>> +	val = GE_RX_INT_THRESHOLD | GE_RX_TIMEOUT;
>> +	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
>> +
>> +	/* enable interrupt */
>> +	priv->reg_inten = DEF_INT_MASK;
>> +	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
>> +}
>> +
>> +static void hip04_mac_disable(struct net_device *ndev)
>> +{
>> +	struct hip04_priv *priv = netdev_priv(ndev);
>> +	u32 val;
>> +
>> +	/* disable int */
>> +	priv->reg_inten &= ~(DEF_INT_MASK);
>> +	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
>> +
>> +	/* disable tx & rx */
>> +	val = readl_relaxed(priv->base + GE_PORT_EN);
>> +	val &= ~(GE_RX_PORT_EN | GE_TX_PORT_EN);
>> +	writel_relaxed(val, priv->base + GE_PORT_EN);
>> +}
>> +
>> +static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
>> +{
>> +	writel(phys, priv->base + PPE_CFG_CPU_ADD_ADDR);
>> +}
>> +
>> +static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
>> +{
>> +	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys);
>> +}
>> +
>> +static u32 hip04_recv_cnt(struct hip04_priv *priv)
>> +{
>> +	return readl(priv->base + PPE_HIS_RX_PKT_CNT);
>> +}
>> +
>> +static void hip04_update_mac_address(struct net_device *ndev)
>> +{
>> +	struct hip04_priv *priv = netdev_priv(ndev);
>> +
>> +	writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
>> +		       priv->base + GE_STATION_MAC_ADDRESS);
>> +	writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
>> +			(ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
>> +		       priv->base + GE_STATION_MAC_ADDRESS + 4);
>> +}
>> +
>> +static int hip04_set_mac_address(struct net_device *ndev, void *addr)
>> +{
>> +	eth_mac_addr(ndev, addr);
>> +	hip04_update_mac_address(ndev);
>> +	return 0;
>> +}
>> +
>> +static int hip04_tx_reclaim(struct net_device *ndev, bool force)
>> +{
>> +	struct hip04_priv *priv = netdev_priv(ndev);
>> +	unsigned tx_tail = priv->tx_tail;
>> +	struct tx_desc *desc;
>> +	unsigned int bytes_compl = 0, pkts_compl = 0;
>> +	unsigned int count;
>> +
>> +	smp_rmb();
>> +	count = tx_count(ACCESS_ONCE(priv->tx_head), tx_tail);
>> +	if (count == 0)
>> +		goto out;
>> +
>> +	while (count) {
>> +		desc = &priv->tx_desc[tx_tail];
>> +		if (desc->send_addr != 0) {
>> +			if (force)
>> +				desc->send_addr = 0;
>> +			else
>> +				break;
>> +		}
>> +
>> +		if (priv->tx_phys[tx_tail]) {
>> +			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
>> +					 priv->tx_skb[tx_tail]->len,
>> +					 DMA_TO_DEVICE);
>> +			priv->tx_phys[tx_tail] = 0;
>> +		}
>> +		pkts_compl++;
>> +		bytes_compl += priv->tx_skb[tx_tail]->len;
>> +		dev_kfree_skb(priv->tx_skb[tx_tail]);
>> +		priv->tx_skb[tx_tail] = NULL;
>> +		tx_tail = TX_NEXT(tx_tail);
>> +		count--;
>> +	}
>> +
>> +	priv->tx_tail = tx_tail;
>> +	smp_wmb(); /* Ensure tx_tail visible to xmit */
>> +
>> +out:
>> +	if (pkts_compl || bytes_compl)
> 
> Testing bytes_compl should be enough : There is no way pkt_compl could
> be 0 if bytes_compl is not 0.
> 
>> +		netdev_completed_queue(ndev, pkts_compl, bytes_compl);
>> +
>> +	if (unlikely(netif_queue_stopped(ndev)) && (count < (TX_DESC_NUM - 1)))
>> +		netif_wake_queue(ndev);
>> +
>> +	return count;
>> +}
>> +
>> +static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>> +{
>> +	struct hip04_priv *priv = netdev_priv(ndev);
>> +	struct net_device_stats *stats = &ndev->stats;
>> +	unsigned int tx_head = priv->tx_head, count;
>> +	struct tx_desc *desc = &priv->tx_desc[tx_head];
>> +	dma_addr_t phys;
>> +
>> +	smp_rmb();
>> +	count = tx_count(tx_head, ACCESS_ONCE(priv->tx_tail));
>> +	if (count == (TX_DESC_NUM - 1)) {
>> +		netif_stop_queue(ndev);
>> +		return NETDEV_TX_BUSY;
>> +	}
>> +
>> +	phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
>> +	if (dma_mapping_error(&ndev->dev, phys)) {
>> +		dev_kfree_skb(skb);
>> +		return NETDEV_TX_OK;
>> +	}
>> +
>> +	priv->tx_skb[tx_head] = skb;
>> +	priv->tx_phys[tx_head] = phys;
>> +	desc->send_addr = cpu_to_be32(phys);
>> +	desc->send_size = cpu_to_be32(skb->len);
>> +	desc->cfg = cpu_to_be32(TX_CLEAR_WB | TX_FINISH_CACHE_INV);
>> +	phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
>> +	desc->wb_addr = cpu_to_be32(phys);
>> +	skb_tx_timestamp(skb);
>> +
>> +	hip04_set_xmit_desc(priv, phys);
>> +	priv->tx_head = TX_NEXT(tx_head);
>> +	count++;
> 
> Starting from this point, skb might already have been freed by TX
> completion.
> 
> Its racy to access skb->len 
> 
>> +	netdev_sent_queue(ndev, skb->len);
>> +
>> +	stats->tx_bytes += skb->len;
>> +	stats->tx_packets++;
>> +
>> +	/* Ensure tx_head update visible to tx reclaim */
>> +	smp_wmb();
>> +
>> +	/* queue is getting full, better start cleaning up now */
>> +	if (count >= priv->tx_coalesce_frames) {
>> +		if (napi_schedule_prep(&priv->napi)) {
>> +			/* disable rx interrupt and timer */
>> +			priv->reg_inten &= ~(RCV_INT);
>> +			writel_relaxed(DEF_INT_MASK & ~RCV_INT,
>> +				       priv->base + PPE_INTEN);
>> +			hrtimer_cancel(&priv->tx_coalesce_timer);
>> +			__napi_schedule(&priv->napi);
>> +		}
>> +	} else if (!hrtimer_is_queued(&priv->tx_coalesce_timer)) {
>> +		/* cleanup not pending yet, start a new timer */
>> +		hrtimer_start_expires(&priv->tx_coalesce_timer,
>> +				      HRTIMER_MODE_REL);
>> +	}
>> +
>> +	return NETDEV_TX_OK;
>> +}
>> +
>> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
>> +{
>> +	struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
>> +	struct net_device *ndev = priv->ndev;
>> +	struct net_device_stats *stats = &ndev->stats;
>> +	unsigned int cnt = hip04_recv_cnt(priv);
>> +	struct rx_desc *desc;
>> +	struct sk_buff *skb;
>> +	unsigned char *buf;
>> +	bool last = false;
>> +	dma_addr_t phys;
>> +	int rx = 0;
>> +	int tx_remaining;
>> +	u16 len;
>> +	u32 err;
>> +
>> +	while (cnt && !last) {
>> +		buf = priv->rx_buf[priv->rx_head];
>> +		skb = build_skb(buf, priv->rx_buf_size);
>> +		if (unlikely(!skb))
>> +			net_dbg_ratelimited("build_skb failed\n");
> 
> Well, is skb is NULL, you're crashing later...
> You really have to address a memory allocation error much better than
> that !
> 
>> +
>> +		dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
>> +				 RX_BUF_SIZE, DMA_FROM_DEVICE);
>> +		priv->rx_phys[priv->rx_head] = 0;
>> +
>> +		desc = (struct rx_desc *)skb->data;
>> +		len = be16_to_cpu(desc->pkt_len);
>> +		err = be32_to_cpu(desc->pkt_err);
>> +
>> +		if (0 == len) {
>> +			dev_kfree_skb_any(skb);
>> +			last = true;
>> +		} else if ((err & RX_PKT_ERR) || (len >= GMAC_MAX_PKT_LEN)) {
>> +			dev_kfree_skb_any(skb);
>> +			stats->rx_dropped++;
>> +			stats->rx_errors++;
>> +		} else {
>> +			skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
>> +			skb_put(skb, len);
>> +			skb->protocol = eth_type_trans(skb, ndev);
>> +			napi_gro_receive(&priv->napi, skb);
>> +			stats->rx_packets++;
>> +			stats->rx_bytes += len;
>> +			rx++;
>> +		}
>> +
>> +		buf = netdev_alloc_frag(priv->rx_buf_size);
>> +		if (!buf)
>> +			goto done;
> 
> Same problem here : In case of memory allocation error, your driver is
> totally screwed.
> 
>> +		phys = dma_map_single(&ndev->dev, buf,
>> +				      RX_BUF_SIZE, DMA_FROM_DEVICE);
>> +		if (dma_mapping_error(&ndev->dev, phys))
>> +			goto done;
> 
> Same problem here : You really have to recover properly.
> 
>> +		priv->rx_buf[priv->rx_head] = buf;
>> +		priv->rx_phys[priv->rx_head] = phys;
>> +		hip04_set_recv_desc(priv, phys);
>> +
>> +		priv->rx_head = RX_NEXT(priv->rx_head);
>> +		if (rx >= budget)
>> +			goto done;
>> +
>> +		if (--cnt == 0)
>> +			cnt = hip04_recv_cnt(priv);
>> +	}
>> +
>> +	if (!(priv->reg_inten & RCV_INT)) {
>> +		/* enable rx interrupt */
>> +		priv->reg_inten |= RCV_INT;
>> +		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
>> +	}
>> +	napi_complete(napi);
>> +done:
>> +	/* clean up tx descriptors and start a new timer if necessary */
>> +	tx_remaining = hip04_tx_reclaim(ndev, false);
>> +	if (rx < budget && tx_remaining)
>> +		hrtimer_start_expires(&priv->tx_coalesce_timer, HRTIMER_MODE_REL);
>> +
>> +	return rx;
>> +}
>> +

Yes, thanks, fix them later.

Ding

> 
> 
> .
> 


--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
  2015-01-15 10:29       ` Ding Tianhong
@ 2015-01-15 12:30         ` Arnd Bergmann
  0 siblings, 0 replies; 25+ messages in thread
From: Arnd Bergmann @ 2015-01-15 12:30 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Eric Dumazet, robh+dt, davem, grant.likely, agraf,
	sergei.shtylyov, linux-arm-kernel, xuwei5, zhangfei.gao, netdev,
	devicetree, linux

On Thursday 15 January 2015 18:29:55 Ding Tianhong wrote:
> 
> Yes, thanks, fix them later.

Please note that now that as v13 has now shown up in net-next, I guess
all other patches need to be based on top of what is already merged,
and address one issue at a time.

There is one more issue I saw during build testing, which is the
lack of a MODULE_LICENSE  statement for the main driver, although
there is one in the mdio driver.

	Arnd

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v13 0/3] add hisilicon hip04 ethernet driver
  2015-01-15  9:42     ` Arnd Bergmann
@ 2015-01-15 12:39       ` Alexander Graf
       [not found]         ` <54B7B4E7.8030108-l3A5Bk7waGM@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Alexander Graf @ 2015-01-15 12:39 UTC (permalink / raw)
  To: Arnd Bergmann, Ding Tianhong
  Cc: robh+dt, davem, grant.likely, sergei.shtylyov, linux-arm-kernel,
	eric.dumazet, xuwei5, zhangfei.gao, netdev, devicetree, linux



On 15.01.15 10:42, Arnd Bergmann wrote:
> On Thursday 15 January 2015 16:37:23 Ding Tianhong wrote:
>> On 2015/1/14 18:19, Alexander Graf wrote:
>>>
>>> >From a 10000 feet perspective it looks like two problems to me
>>>
>>>   1) Allocation failure doesn't get handled properly somewhere
> 
> This is the bug that Eric pointed out as well.
> 
>>>   2) We fail to allocate with order=0 - I don't see why
> 
> GFP_ATOMIC. When allocating from a the napi poll function in softirq
> context, you have to use nonblocking allocations, which occasionally
> fail. This should not cause any harm other than dropped packets.
> 
>> is it easy to repetition this bug? how big is your memory on your board,
>> is it happened in your previous hip04 driver?
> 
> It should be independent of memory size, but may be more likely if you
> don't have swap space configured.

With the previous driver I was unable to get this far - I ended up in
random memory corruption and had a ~90% packet loss after about an hour
of uptime.

I'm not sure whether it's easy to reproduce, I merely started up a few
VMs, did some disk I/O and started to compile QEMU in the background ;).

I'll happily give your follow up patch that's going to fix the memory
allocation problems a try though.


Alex

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v13 0/3] add hisilicon hip04 ethernet driver
       [not found]         ` <54B7B4E7.8030108-l3A5Bk7waGM@public.gmane.org>
@ 2015-01-15 12:56           ` Ding Tianhong
  0 siblings, 0 replies; 25+ messages in thread
From: Ding Tianhong @ 2015-01-15 12:56 UTC (permalink / raw)
  To: Alexander Graf, Arnd Bergmann
  Cc: robh+dt-DgEjT+Ai2ygdnm+yROfE0A, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	xuwei5-C8/M+/jPZTeaMJb+Lgu22Q,
	zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-lFZ/pmaqli7XmaaqVzeoHQ

On 2015/1/15 20:39, Alexander Graf wrote:
> 
> 
> On 15.01.15 10:42, Arnd Bergmann wrote:
>> On Thursday 15 January 2015 16:37:23 Ding Tianhong wrote:
>>> On 2015/1/14 18:19, Alexander Graf wrote:
>>>>
>>>> >From a 10000 feet perspective it looks like two problems to me
>>>>
>>>>   1) Allocation failure doesn't get handled properly somewhere
>>
>> This is the bug that Eric pointed out as well.
>>
>>>>   2) We fail to allocate with order=0 - I don't see why
>>
>> GFP_ATOMIC. When allocating from a the napi poll function in softirq
>> context, you have to use nonblocking allocations, which occasionally
>> fail. This should not cause any harm other than dropped packets.
>>
>>> is it easy to repetition this bug? how big is your memory on your board,
>>> is it happened in your previous hip04 driver?
>>
>> It should be independent of memory size, but may be more likely if you
>> don't have swap space configured.
> 
> With the previous driver I was unable to get this far - I ended up in
> random memory corruption and had a ~90% packet loss after about an hour
> of uptime.
> 
> I'm not sure whether it's easy to reproduce, I merely started up a few
> VMs, did some disk I/O and started to compile QEMU in the background ;).
> 
> I'll happily give your follow up patch that's going to fix the memory
> allocation problems a try though.
> 
Thanks for your work, I am sure we could make it much more better:)

Ding

> 
> Alex
> 
> .
> 


--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
  2015-01-14  6:34 ` [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver Ding Tianhong
                     ` (4 preceding siblings ...)
  2015-01-15  9:53   ` Arnd Bergmann
@ 2015-01-19 18:11   ` Alexander Graf
       [not found]     ` <54BD48BF.3050707-l3A5Bk7waGM@public.gmane.org>
  5 siblings, 1 reply; 25+ messages in thread
From: Alexander Graf @ 2015-01-19 18:11 UTC (permalink / raw)
  To: Ding Tianhong, arnd, robh+dt, davem, grant.likely
  Cc: sergei.shtylyov, linux-arm-kernel, eric.dumazet, xuwei5,
	zhangfei.gao, netdev, devicetree, linux

On 01/14/15 07:34, Ding Tianhong wrote:
> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
> The controller has no tx done interrupt, reclaim xmitted buffer in the poll.
>
> v13: Fix the problem of alignment parameters for function and checkpatch warming.
>
> v12: According Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
>       for hip04 ethernet.
>
> v11: Add ethtool support for tx coalecse getting and setting, the xmit_more
>       is not supported for this patch, but I think it could work for hip04,
>       will support it later after some tests for performance better.
>
>       Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
>       it looks that the performance and latency is more better by tx_coalesce_frames/usecs.
>
>       - Before:
>       $ ping 192.168.1.1 ...
>       === 192.168.1.1 ping statistics ===
>       24 packets transmitted, 24 received, 0% packet loss, time 22999ms
>       rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms
>
>       $ iperf -c 192.168.1.1 ...
>       [ ID] Interval       Transfer     Bandwidth
>       [  3]  0.0- 1.0 sec   115 MBytes   945 Mbits/sec
>
>       - After:
>       $ ping 192.168.1.1 ...
>       === 192.168.1.1 ping statistics ===
>       24 packets transmitted, 24 received, 0% packet loss, time 22999ms
>       rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms
>
>       $ iperf -c 192.168.1.1 ...
>       [ ID] Interval       Transfer     Bandwidth
>       [  3]  0.0- 1.0 sec   115 MBytes   965 Mbits/sec
>
> v10: According David Miller and Arnd Bergmann's suggestion, add some modification
>       for v9 version
>       - drop the workqueue
>       - batch cleanup based on tx_coalesce_frames/usecs for better throughput
>       - use a reasonable default tx timeout (200us, could be shorted
>         based on measurements) with a range timer
>       - fix napi poll function return value
>       - use a lockless queue for cleanup
>
> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> ---
>   drivers/net/ethernet/hisilicon/Makefile    |   2 +-
>   drivers/net/ethernet/hisilicon/hip04_eth.c | 969 +++++++++++++++++++++++++++++
>   2 files changed, 970 insertions(+), 1 deletion(-)
>   create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c
>
> diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
> index 40115a7..6c14540 100644
> --- a/drivers/net/ethernet/hisilicon/Makefile
> +++ b/drivers/net/ethernet/hisilicon/Makefile
> @@ -3,4 +3,4 @@
>   #
>   
>   obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o
> -obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
> +obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o

[...]

> +static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
> +{
> +	struct net_device *ndev = (struct net_device *)dev_id;
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	struct net_device_stats *stats = &ndev->stats;
> +	u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
> +
> +	if (!ists)
> +		return IRQ_NONE;
> +
> +	writel_relaxed(DEF_INT_MASK, priv->base + PPE_RINT);
> +
> +	if (unlikely(ists & DEF_INT_ERR)) {
> +		if (ists & (RCV_NOBUF | RCV_DROP))
> +			stats->rx_errors++;
> +			stats->rx_dropped++;
> +			netdev_err(ndev, "rx drop\n");

After hammering on the box a bit again, I'm in a situation where I get 
lots of

[302398.232603] hip04-ether e28b0000.ethernet eth0: rx drop
[302398.377309] hip04-ether e28b0000.ethernet eth0: rx drop
[302398.395198] hip04-ether e28b0000.ethernet eth0: rx drop
[302398.466118] hip04-ether e28b0000.ethernet eth0: rx drop
[302398.659009] hip04-ether e28b0000.ethernet eth0: rx drop
[302399.053389] hip04-ether e28b0000.ethernet eth0: rx drop
[302399.122067] hip04-ether e28b0000.ethernet eth0: rx drop
[302399.268192] hip04-ether e28b0000.ethernet eth0: rx drop
[302399.286081] hip04-ether e28b0000.ethernet eth0: rx drop
[302399.594201] hip04-ether e28b0000.ethernet eth0: rx drop
[302399.683416] hip04-ether e28b0000.ethernet eth0: rx drop
[302399.701307] hip04-ether e28b0000.ethernet eth0: rx drop

and I really am getting a lot of drops - I can't even ping the machine 
anymore.

However, as it is there's a good chance the machine is simply 
unreachable because it's busy writing to the UART, and even if not all 
useful messages indicating anything have scrolled out. I really don't 
think you should emit any message over and over again to the user. Once 
or twice is enough.

Please make sure to rate limit it.


Alex

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
       [not found]     ` <54BD48BF.3050707-l3A5Bk7waGM@public.gmane.org>
@ 2015-01-19 20:34       ` Arnd Bergmann
  2015-01-20  2:15         ` Ding Tianhong
  0 siblings, 1 reply; 25+ messages in thread
From: Arnd Bergmann @ 2015-01-19 20:34 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Ding Tianhong, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	xuwei5-C8/M+/jPZTeaMJb+Lgu22Q,
	zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-lFZ/pmaqli7XmaaqVzeoHQ

On Monday 19 January 2015 19:11:11 Alexander Graf wrote:
> 
> After hammering on the box a bit again, I'm in a situation where I get 
> lots of
> 
> [302398.232603] hip04-ether e28b0000.ethernet eth0: rx drop
> [302398.377309] hip04-ether e28b0000.ethernet eth0: rx drop
> [302398.395198] hip04-ether e28b0000.ethernet eth0: rx drop
> [302398.466118] hip04-ether e28b0000.ethernet eth0: rx drop
> [302398.659009] hip04-ether e28b0000.ethernet eth0: rx drop
> [302399.053389] hip04-ether e28b0000.ethernet eth0: rx drop
> [302399.122067] hip04-ether e28b0000.ethernet eth0: rx drop
> [302399.268192] hip04-ether e28b0000.ethernet eth0: rx drop
> [302399.286081] hip04-ether e28b0000.ethernet eth0: rx drop
> [302399.594201] hip04-ether e28b0000.ethernet eth0: rx drop
> [302399.683416] hip04-ether e28b0000.ethernet eth0: rx drop
> [302399.701307] hip04-ether e28b0000.ethernet eth0: rx drop
> 
> and I really am getting a lot of drops - I can't even ping the machine 
> anymore.
> 
> However, as it is there's a good chance the machine is simply 
> unreachable because it's busy writing to the UART, and even if not all 
> useful messages indicating anything have scrolled out. I really don't 
> think you should emit any message over and over again to the user. Once 
> or twice is enough.
>
> Please make sure to rate limit it.

I would argue that packet loss is not an error condition at all
and you should not print this at netdev_err() level. You could make
this a netdev_dbg(), or just make it silent because it's already
counted in the statistics.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
  2015-01-19 20:34       ` Arnd Bergmann
@ 2015-01-20  2:15         ` Ding Tianhong
       [not found]           ` <54BDBA29.10703-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Ding Tianhong @ 2015-01-20  2:15 UTC (permalink / raw)
  To: Arnd Bergmann, Alexander Graf
  Cc: robh+dt-DgEjT+Ai2ygdnm+yROfE0A, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	xuwei5-C8/M+/jPZTeaMJb+Lgu22Q,
	zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-lFZ/pmaqli7XmaaqVzeoHQ

On 2015/1/20 4:34, Arnd Bergmann wrote:
> On Monday 19 January 2015 19:11:11 Alexander Graf wrote:
>>
>> After hammering on the box a bit again, I'm in a situation where I get 
>> lots of
>>
>> [302398.232603] hip04-ether e28b0000.ethernet eth0: rx drop
>> [302398.377309] hip04-ether e28b0000.ethernet eth0: rx drop
>> [302398.395198] hip04-ether e28b0000.ethernet eth0: rx drop
>> [302398.466118] hip04-ether e28b0000.ethernet eth0: rx drop
>> [302398.659009] hip04-ether e28b0000.ethernet eth0: rx drop
>> [302399.053389] hip04-ether e28b0000.ethernet eth0: rx drop
>> [302399.122067] hip04-ether e28b0000.ethernet eth0: rx drop
>> [302399.268192] hip04-ether e28b0000.ethernet eth0: rx drop
>> [302399.286081] hip04-ether e28b0000.ethernet eth0: rx drop
>> [302399.594201] hip04-ether e28b0000.ethernet eth0: rx drop
>> [302399.683416] hip04-ether e28b0000.ethernet eth0: rx drop
>> [302399.701307] hip04-ether e28b0000.ethernet eth0: rx drop
>>
>> and I really am getting a lot of drops - I can't even ping the machine 
>> anymore.
>>
>> However, as it is there's a good chance the machine is simply 
>> unreachable because it's busy writing to the UART, and even if not all 
>> useful messages indicating anything have scrolled out. I really don't 
>> think you should emit any message over and over again to the user. Once 
>> or twice is enough.
>>
>> Please make sure to rate limit it.
> 
> I would argue that packet loss is not an error condition at all
> and you should not print this at netdev_err() level. You could make
> this a netdev_dbg(), or just make it silent because it's already
> counted in the statistics.
> 

I think something wrong with Graf's board, I will try to make it happen on my board, and 
in any case I will add rate limit to xx_drop and change to dbg log level.

Thanks 
Ding

> 	Arnd
> 
> .
> 


--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
       [not found]           ` <54BDBA29.10703-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2015-01-20 12:01             ` Arnd Bergmann
  2015-01-20 18:12               ` Joe Perches
  0 siblings, 1 reply; 25+ messages in thread
From: Arnd Bergmann @ 2015-01-20 12:01 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: Ding Tianhong, Alexander Graf, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-lFZ/pmaqli7XmaaqVzeoHQ,
	sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA, xuwei5-C8/M+/jPZTeaMJb+Lgu22Q,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A, davem-fT/PcQaiUtIeIZ0/mPfg9Q

On Tuesday 20 January 2015 10:15:05 Ding Tianhong wrote:
> On 2015/1/20 4:34, Arnd Bergmann wrote:
> > On Monday 19 January 2015 19:11:11 Alexander Graf wrote:
> >>
> >> After hammering on the box a bit again, I'm in a situation where I get 
> >> lots of
> >>
> >> [302398.232603] hip04-ether e28b0000.ethernet eth0: rx drop
> >> [302398.377309] hip04-ether e28b0000.ethernet eth0: rx drop
> >> [302398.395198] hip04-ether e28b0000.ethernet eth0: rx drop
> >> [302398.466118] hip04-ether e28b0000.ethernet eth0: rx drop
> >> [302398.659009] hip04-ether e28b0000.ethernet eth0: rx drop
> >> [302399.053389] hip04-ether e28b0000.ethernet eth0: rx drop
> >> [302399.122067] hip04-ether e28b0000.ethernet eth0: rx drop
> >> [302399.268192] hip04-ether e28b0000.ethernet eth0: rx drop
> >> [302399.286081] hip04-ether e28b0000.ethernet eth0: rx drop
> >> [302399.594201] hip04-ether e28b0000.ethernet eth0: rx drop
> >> [302399.683416] hip04-ether e28b0000.ethernet eth0: rx drop
> >> [302399.701307] hip04-ether e28b0000.ethernet eth0: rx drop
> >>
> >> and I really am getting a lot of drops - I can't even ping the machine 
> >> anymore.
> >>
> >> However, as it is there's a good chance the machine is simply 
> >> unreachable because it's busy writing to the UART, and even if not all 
> >> useful messages indicating anything have scrolled out. I really don't 
> >> think you should emit any message over and over again to the user. Once 
> >> or twice is enough.
> >>
> >> Please make sure to rate limit it.
> > 
> > I would argue that packet loss is not an error condition at all
> > and you should not print this at netdev_err() level. You could make
> > this a netdev_dbg(), or just make it silent because it's already
> > counted in the statistics.
> > 
> 
> I think something wrong with Graf's board, I will try to make it happen on my board, and 
> in any case I will add rate limit to xx_drop and change to dbg log level.

I've looked at the interrupt handling in more detail and came up
with this patch. Please review, and forward if you are happy with the
changes.

Alex, could you try if this solves your problem?

8<----
Subject: [PATCH] net/hip04: refactor interrupt masking

The hip04 ethernet driver currently acknowledges all interrupts directly
in the interrupt handler, and leaves all interrupts except the RX data
enabled the whole time. This causes multiple problems:

- When more packets come in between the original interrupt and the
  NAPI poll function, we will get an extraneous RX interrupt as soon
  as interrupts are enabled again.

- The two error interrupts are by definition combining all errors that
  may have happened since the last time they were handled, but just
  acknowledging the irq without dealing with the cause of the condition
  makes it come back immediately. In particular, when NAPI is intentionally
  stalling the rx queue, this causes a storm of "rx dropped" messages.  

- The access to the 'reg_inten' field in hip_priv is used for serializing
  access, but is in fact racy itself.

To deal with these issues, the driver is changed to only acknowledge
the IRQ at the point when it is being handled. The RX interrupts now get
acked right before reading the number of received frames to keep spurious
interrupts to the absolute minimum without losing interrupts.

As there is no tx-complete interrupt, only an informational tx-dropped
one, we now ack that in the tx reclaim handler, hoping that clearing
the descriptors has also eliminated the error condition.

The only place that reads the reg_inten now just relies on
napi_schedule_prep() to return whether NAPI is active or not, and
the poll() function is then going to ack and reenabled the IRQ if
necessary.

Finally, when we disable interrupts, they are now all disabled
together, in order to simplify the logic and to avoid getting
rx-dropped interrupts when NAPI is in control of the rx queue.

Signed-off-by: Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org>

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
index 525214ef5984..83773247a368 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -56,6 +56,8 @@
 #define RCV_DROP			BIT(7)
 #define TX_DROP				BIT(6)
 #define DEF_INT_ERR			(RCV_NOBUF | RCV_DROP | TX_DROP)
+#define DEF_INT_RX			(RCV_INT | RCV_NOBUF | RCV_DROP)
+#define DEF_INT_TX			(TX_DROP)
 #define DEF_INT_MASK			(RCV_INT | DEF_INT_ERR)
 
 /* TX descriptor config */
@@ -154,7 +156,6 @@ struct hip04_priv {
 	unsigned int port;
 	unsigned int speed;
 	unsigned int duplex;
-	unsigned int reg_inten;
 
 	struct napi_struct napi;
 	struct net_device *ndev;
@@ -303,17 +304,15 @@ static void hip04_mac_enable(struct net_device *ndev)
 	val |= GE_RX_PORT_EN | GE_TX_PORT_EN;
 	writel_relaxed(val, priv->base + GE_PORT_EN);
 
-	/* clear rx int */
-	val = RCV_INT;
-	writel_relaxed(val, priv->base + PPE_RINT);
+	/* clear prior interrupts */
+	writel_relaxed(DEF_INT_MASK, priv->base + PPE_RINT);
 
 	/* config recv int */
 	val = GE_RX_INT_THRESHOLD | GE_RX_TIMEOUT;
 	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
 
 	/* enable interrupt */
-	priv->reg_inten = DEF_INT_MASK;
-	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+	writel_relaxed(DEF_INT_MASK, priv->base + PPE_INTEN);
 }
 
 static void hip04_mac_disable(struct net_device *ndev)
@@ -322,8 +321,7 @@ static void hip04_mac_disable(struct net_device *ndev)
 	u32 val;
 
 	/* disable int */
-	priv->reg_inten &= ~(DEF_INT_MASK);
-	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+	writel_relaxed(0, priv->base + PPE_INTEN);
 
 	/* disable tx & rx */
 	val = readl_relaxed(priv->base + GE_PORT_EN);
@@ -403,6 +401,8 @@ static int hip04_tx_reclaim(struct net_device *ndev, bool force)
 	priv->tx_tail = tx_tail;
 	smp_wmb(); /* Ensure tx_tail visible to xmit */
 
+	writel_relaxed(DEF_INT_TX, priv->base + PPE_RINT);
+
 out:
 	if (pkts_compl || bytes_compl)
 		netdev_completed_queue(ndev, pkts_compl, bytes_compl);
@@ -458,9 +458,7 @@ static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
 	if (count >= priv->tx_coalesce_frames) {
 		if (napi_schedule_prep(&priv->napi)) {
 			/* disable rx interrupt and timer */
-			priv->reg_inten &= ~(RCV_INT);
-			writel_relaxed(DEF_INT_MASK & ~RCV_INT,
-				       priv->base + PPE_INTEN);
+			writel_relaxed(0, priv->base + PPE_INTEN);
 			hrtimer_cancel(&priv->tx_coalesce_timer);
 			__napi_schedule(&priv->napi);
 		}
@@ -478,7 +476,7 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget)
 	struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
 	struct net_device *ndev = priv->ndev;
 	struct net_device_stats *stats = &ndev->stats;
-	unsigned int cnt = hip04_recv_cnt(priv);
+	unsigned int cnt;
 	struct rx_desc *desc;
 	struct sk_buff *skb;
 	unsigned char *buf;
@@ -489,6 +487,10 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget)
 	u16 len;
 	u32 err;
 
+	/* acknowledge interrupts and read status */
+	writel_relaxed(DEF_INT_RX, priv->base + PPE_RINT);
+	cnt = hip04_recv_cnt(priv);
+
 	while (cnt && !last) {
 		buf = priv->rx_buf[priv->rx_head];
 		skb = build_skb(buf, priv->rx_buf_size);
@@ -539,11 +541,8 @@ static int hip04_rx_poll(struct napi_struct *napi, int budget)
 			cnt = hip04_recv_cnt(priv);
 	}
 
-	if (!(priv->reg_inten & RCV_INT)) {
-		/* enable rx interrupt */
-		priv->reg_inten |= RCV_INT;
-		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
-	}
+	/* enable rx interrupt */
+	writel_relaxed(DEF_INT_MASK, priv->base + PPE_INTEN);
 	napi_complete(napi);
 done:
 	/* clean up tx descriptors and start a new timer if necessary */
@@ -564,23 +563,21 @@ static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
 	if (!ists)
 		return IRQ_NONE;
 
-	writel_relaxed(DEF_INT_MASK, priv->base + PPE_RINT);
-
 	if (unlikely(ists & DEF_INT_ERR)) {
-		if (ists & (RCV_NOBUF | RCV_DROP))
+		if (ists & (RCV_NOBUF | RCV_DROP)) {
 			stats->rx_errors++;
 			stats->rx_dropped++;
-			netdev_err(ndev, "rx drop\n");
+			netdev_dbg(ndev, "rx drop\n");
+		}
 		if (ists & TX_DROP) {
 			stats->tx_dropped++;
-			netdev_err(ndev, "tx drop\n");
+			netdev_dbg(ndev, "tx drop\n");
 		}
 	}
 
-	if (ists & RCV_INT && napi_schedule_prep(&priv->napi)) {
-		/* disable rx interrupt */
-		priv->reg_inten &= ~(RCV_INT);
-		writel_relaxed(DEF_INT_MASK & ~RCV_INT, priv->base + PPE_INTEN);
+	if (napi_schedule_prep(&priv->napi)) {
+		/* disable interrupt */
+		writel_relaxed(0, priv->base + PPE_INTEN);
 		hrtimer_cancel(&priv->tx_coalesce_timer);
 		__napi_schedule(&priv->napi);
 	}
@@ -596,8 +593,7 @@ enum hrtimer_restart tx_done(struct hrtimer *hrtimer)
 
 	if (napi_schedule_prep(&priv->napi)) {
 		/* disable rx interrupt */
-		priv->reg_inten &= ~(RCV_INT);
-		writel_relaxed(DEF_INT_MASK & ~RCV_INT, priv->base + PPE_INTEN);
+		writel_relaxed(0, priv->base + PPE_INTEN);
 		__napi_schedule(&priv->napi);
 	}
 

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver
  2015-01-20 12:01             ` Arnd Bergmann
@ 2015-01-20 18:12               ` Joe Perches
  0 siblings, 0 replies; 25+ messages in thread
From: Joe Perches @ 2015-01-20 18:12 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Ding Tianhong,
	Alexander Graf, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-lFZ/pmaqli7XmaaqVzeoHQ,
	sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA, xuwei5-C8/M+/jPZTeaMJb+Lgu22Q,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A, davem-fT/PcQaiUtIeIZ0/mPfg9Q

On Tue, 2015-01-20 at 13:01 +0100, Arnd Bergmann wrote:
> On Tuesday 20 January 2015 10:15:05 Ding Tianhong wrote:
> > On 2015/1/20 4:34, Arnd Bergmann wrote:
> > > On Monday 19 January 2015 19:11:11 Alexander Graf wrote:
> > >>
> > >> After hammering on the box a bit again, I'm in a situation where I get 
> > >> lots of
> > >>
> > >> [302398.232603] hip04-ether e28b0000.ethernet eth0: rx drop
> > >> [302398.377309] hip04-ether e28b0000.ethernet eth0: rx drop
> > >> [302398.395198] hip04-ether e28b0000.ethernet eth0: rx drop
> > >> [302398.466118] hip04-ether e28b0000.ethernet eth0: rx drop
> > >> [302398.659009] hip04-ether e28b0000.ethernet eth0: rx drop
> > >> [302399.053389] hip04-ether e28b0000.ethernet eth0: rx drop
> > >> [302399.122067] hip04-ether e28b0000.ethernet eth0: rx drop
> > >> [302399.268192] hip04-ether e28b0000.ethernet eth0: rx drop
> > >> [302399.286081] hip04-ether e28b0000.ethernet eth0: rx drop
> > >> [302399.594201] hip04-ether e28b0000.ethernet eth0: rx drop
> > >> [302399.683416] hip04-ether e28b0000.ethernet eth0: rx drop
> > >> [302399.701307] hip04-ether e28b0000.ethernet eth0: rx drop
> > >>
> > >> and I really am getting a lot of drops - I can't even ping the machine 
> > >> anymore.
> > >>
> > >> However, as it is there's a good chance the machine is simply 
> > >> unreachable because it's busy writing to the UART, and even if not all 
> > >> useful messages indicating anything have scrolled out. I really don't 
> > >> think you should emit any message over and over again to the user. Once 
> > >> or twice is enough.

[]

> The hip04 ethernet driver currently acknowledges all interrupts directly
> in the interrupt handler, and leaves all interrupts except the RX data
> enabled the whole time. This causes multiple problems:
[]
> diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
[]
> @@ -564,23 +563,21 @@ static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
>  	if (!ists)
>  		return IRQ_NONE;
>  
> -	writel_relaxed(DEF_INT_MASK, priv->base + PPE_RINT);
> -
>  	if (unlikely(ists & DEF_INT_ERR)) {
> -		if (ists & (RCV_NOBUF | RCV_DROP))
> +		if (ists & (RCV_NOBUF | RCV_DROP)) {
>  			stats->rx_errors++;
>  			stats->rx_dropped++;
> -			netdev_err(ndev, "rx drop\n"
> +			netdev_dbg(ndev, "rx drop\n");
> +		}
>  		if (ists & TX_DROP) {
>  			stats->tx_dropped++;
> -			netdev_err(ndev, "tx drop\n");
> +			netdev_dbg(ndev, "tx drop\n");
>  		}
>  	}
>  

While these are dubious messages to output at all, it
probably would benefit to use net_ratelimit() before the
netdev_dbg() and maybe output the counter as well:

		if (...) {
			stats++
			if (net_ratelimit())
				netdev_dbg(ndev, "[rt]x drop: %u\n", stats);
		}


--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2015-01-20 18:12 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-01-14  6:34 [PATCH net-next v13 0/3] add hisilicon hip04 ethernet driver Ding Tianhong
2015-01-14  6:34 ` [PATCH net-next v13 1/3] Documentation: add Device tree bindings for Hisilicon hip04 ethernet Ding Tianhong
2015-01-14  6:34 ` [PATCH net-next v13 2/3] net: hisilicon: new hip04 MDIO driver Ding Tianhong
2015-01-14  6:34 ` [PATCH net-next v13 3/3] net: hisilicon: new hip04 ethernet driver Ding Tianhong
2015-01-14  8:06   ` Joe Perches
2015-01-15  2:56     ` Ding Tianhong
2015-01-14  8:53   ` Arnd Bergmann
2015-01-15  2:54     ` Ding Tianhong
2015-01-15  9:05       ` Arnd Bergmann
2015-01-14 16:34   ` Eric Dumazet
     [not found]     ` <1421253246.11734.17.camel-XN9IlZ5yJG9HTL0Zs8A6p/gx64E7kk8eUsxypvmhUTTZJqsBc5GL+g@public.gmane.org>
2015-01-15 10:29       ` Ding Tianhong
2015-01-15 12:30         ` Arnd Bergmann
2015-01-15  4:39   ` Joe Perches
     [not found]     ` <1421296742.25598.3.camel-6d6DIl74uiNBDgjK7y7TUQ@public.gmane.org>
2015-01-15 10:28       ` Ding Tianhong
2015-01-15  9:53   ` Arnd Bergmann
2015-01-19 18:11   ` Alexander Graf
     [not found]     ` <54BD48BF.3050707-l3A5Bk7waGM@public.gmane.org>
2015-01-19 20:34       ` Arnd Bergmann
2015-01-20  2:15         ` Ding Tianhong
     [not found]           ` <54BDBA29.10703-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2015-01-20 12:01             ` Arnd Bergmann
2015-01-20 18:12               ` Joe Perches
2015-01-14 10:19 ` [PATCH net-next v13 0/3] add hisilicon " Alexander Graf
2015-01-15  8:37   ` Ding Tianhong
2015-01-15  9:42     ` Arnd Bergmann
2015-01-15 12:39       ` Alexander Graf
     [not found]         ` <54B7B4E7.8030108-l3A5Bk7waGM@public.gmane.org>
2015-01-15 12:56           ` Ding Tianhong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).